Chai – Arithmetic and Organization

As you read along, why not try writing some programs and seeing how they evaluate in this version of Chai? You can do so at our live web server.

Organization

Last time, we implemented a simple interpreter for a language which exclusively computed sums of numbers. Before we continue with more interesting language forms, we need to give ourselves some meat to work with.

Specifically, we need to add more primitive operations (+, -, *, /, and, or, not, …), and modularize our files so we can keep track of things better. After we cover the new syntax forms, this post will be admittedly be technical; we’ll spend most of our efforts here designing look-up tables, and implementing simple type-checking for primitive operations. Afterward, we will be able to add new primitive operations and data types without modifying our interpreter, and only slightly modifying our parser. The point of all this is to hide the messy code which handles primitives, so that we can better focus on more interesting language forms. For those readers without a neurotic desire to organize code (and hence no desire to read the entire post), just remember that in the future, we will add new primitives without detailing the implementation. For reference, the source code is available on this blog’s Google Code page.

The first change we’d like to make is to move our abstract syntax tree to a separate file, “chai-ast.rkt.” In order to have all the generated functions (constructors and field accessors) available for the rest of our files to use, we need a special “provide” command:

[sourcecode language=”css”](provide (all-defined-out))
(require plai)

;; a chai-expression
(define-type chai-expr
[num (val number?)]
[sum (lhs chai-expr?) (rhs chai-expr?)]))[/sourcecode]

So (all-defined-out) is a special function which provides all functions defined in this file. Now, instead of having separate language form for each primitive operation (say, a sum, a diff, a mult, a div, etc), we can abstract this to a “primitive” form. In other words, our “sum” line changes to this:

[sourcecode language=”css”] [prim (id symbol?) (args list?)] [/sourcecode]

We also recognize that we will want to include unary primitives, and ternary primitives, so we have an identifying symbol and a list of arguments. In other words, our new EBNF syntax looks like:

expr = number
     | (prim expr expr...) 

prim = + | - | * | /

Where the “expr…” notation means we allow zero or more following things which are exprs. For now, we do not have any null-ary operations (with zero arguments). Finally, we’d like to include boolean-valued expressions and boolean algebra as well, so this extends our syntax to:

expr = number
     | true
     | false
     | (prim expr expr...) 

prim = + | - | * | / | and | or | not

Note that the syntax does not keep track of and information associated with the operations, like type or arity. Syntactically, it is perfectly okay to write the program: “(not 1 2 3)”. This is well-formed, because the first thing is a primitive symbol, and the other things are valid arguments. We will leave the semantics of these primitive operations to the interpreter.

Implementing all of this in our abstract syntax tree is straightforward:

[sourcecode language=”css”];; a chai expression
(define-type chai-expr
[num (val number?)]
[bool (val boolean?)]
[prim (id symbol?) (args list?)])[/sourcecode]

Now let us turn to the parser, which is about as simple as it was last time. Again, we assume that we have already converted an input string into an s-expression (via the utility functions in chai-utils.rkt).

[sourcecode language=”css”];; parse-expr: s-expr -> chai-expr
;; parse an s-expression into a chai expression,
;; or throw an error if it is not well-formed
(define (parse-expr input-expr)
(cond [(number? input-expr) (num input-expr)]
[(eq? ‘true input-expr) (bool #t)]
[(eq? ‘false input-expr) (bool #f)]
[(and (list? input-expr)
(primitive-symbol? (first input-expr)))
(prim (first input-expr)
(map parse-expr (rest input-expr)))]
[else (error (string-append
"parse: unsupported language form: "
(s-expr->string input-expr)))]))[/sourcecode]

The first three cases of the “cond” expression are trivial. In the fourth, we introduce a new function, which determines whether the first thing in our s-expression is a primitive symbol. Of course, we actually need to write this function, but before we do that let’s come up with a good way to structure a table with information about primitives.

We create a new file called “chai-primitives.rkt” just for this purpose. In it, we’ll have a look-up table with some information about primitives:

[sourcecode language=”css”];; a helper function for the primitive operation table
(define (make-op op) (λ (arglist) (apply op arglist)))

;; primitive operation lookup-table
;; symbol -> (list op num-args (list type1? type2? …))
(define primitives-table
(hasheq ‘+ (list (make-op +) 2 (list number? number?))
‘- (list (make-op -) 2 (list number? number?))
‘* (list (make-op *) 2 (list number? number?))
‘/ (list (make-op /) 2
(list number?
(λ (x) (if (equal? 0 x)
(error "division by zero!")
(number? x)))))
‘or (list (λ (args) (ormap (λ (x) x) args))
2 (list boolean? boolean?))
‘and (list (λ (args) (andmap (λ (x) x) args))
2 (list boolean? boolean?))
‘not (list (make-op not) 1 (list boolean?))))[/sourcecode]

“hasheq” creates a new hash table where the “eq?” function is used to compare keys. In this case, the keys are Racket symbols, and “eq?” happens to work very well with symbols. Associated to each  symbol is a list of three things. First, the operation that will be applied to the arguments. Since each function has to be applied to a packaged list of values, we will construct the “operation” via make-op (with the exception of and and or, for quirky Racket reasons), which creates a function that applies the given operation to a list of arguments. Second, we have a number representing the allowed number of arguments; and finally, we have a list of functions. In this list, the nth function is used to determine whether the nth argument has the correct type (and they are applied in order). This allows us to package important information here, such as the condition that the second argument of division is never zero. This sort of ho-hum logic would severely clutter our interpreter.

But instead of allowing our other files to access the primitives table willy-nilly, we should create an extra layer of protection, which will make it easier for us to maintain our code in the future. Specifically, we want a function which determines if a given symbol is a key in the table, and a function to retrieve the contents of the table. We do this as follows:

[sourcecode language=”css”];; primitive-symbol?: symbol -> boolean
;; return true if the given symbol is a primitive
(define (primitive-symbol? id)
(hash-has-key? primitives-table id))

;; get-prim: symbol -> function number (list function)
;; lookup the data associated with a primitive
(define (get-prim id)
(apply values (hash-ref primitives-table id)))[/sourcecode]

The bit with “(apply values … )” takes the items in the list resulting from the hash-lookup, and returns them as multiple values. Note that in Racket, functions can have multiple return values. As we will see, this decision makes our code shorter. Instead of writing code to extract the pieces of the list, we just require the caller to bind all three return values to variables at the same time. Our foresight tells us that when we are type-checking, we will need all of this information at the same time, so requiring us to bind them all is no hindrance. Providing these two functions gives us all we need to continue with our interpreter.

The actual interpreter is quite simple now. There are only three cases:

[sourcecode language=”css”];; interp: chai-expr -> chai-expr
;; interpret a chai-expression
(define (interp input-expr)
(type-case chai-expr input-expr
[num (val) val]
[bool (val) val]
[prim (id args)
(let* ([interped-args (map interp args)]
[operation (type-check/get-op id interped-args)])
(operation interpreted-args))]))[/sourcecode]

At this point, we note that at some point we will want to perform type-checking and retrieve the required primitive operation given its identifier. We also recognize that this code belongs elsewhere. So we defer the logic to a function called “type-check/get-op”, which accepts the primitive operation identifier, and the list of arguments (after they’ve themselves been interpreted!), and returns the function which may then be applied to a list of values, throwing an error if there is an incorrect number of arguments or if any has the wrong type. The rest of the logic that goes into the interpreter is quite clear.

Type Checking

We write a ghostly shell of a type-checker as follows:

[sourcecode language=”css”];; type-check/get-op: symbol list -> function
;; check the types and number of the arguments for the primitive
;; ‘id’, returning the primitive operation on success, and
;; throwing an error on failure
(define (type-check/get-op id args)
(let-values ([(op num-args-allowed arg-types) (get-prim id)])
op))[/sourcecode]

For starters, we just return the operation as it is. Note that let-values allows us to bind all three results of the call to “get-prim” simultaneously.

To fill in the gaps, we should count the number of arguments in the “args” variable, and check that against “num-args-allowed”:

[sourcecode language=”css”];; type-check/get-op: symbol list -> function
(define (type-check/get-op id args)
(let-values
([(num-args-received) (length args)]
[(op num-args-allowed arg-types) (get-prim id)])
(if (not (equal? num-args-received num-args-allowed))
(error ‘type-check
"~a expected ~a args, but received ~a."
id num-args-allowed num-args-received)
op)))[/sourcecode]

Note that we need the additional parentheses around (num-args-received) in the binding clause, because let-values expects each clause to have a list of identifiers. In addition, the “error” function accepts a formatting string similar to the kind encountered in C’s printf function. However, here the “~a” variable allows one to write out whatever value one wants. The details of Racket’s flavor of formatted strings is in the documentation here. All we need to know here is that “~a” works for strings and numbers.

Now that we’ve checked that there are the correct number of arguments, let us ensure they have the right type. To do this, we simply apply each element of “arg-types” (a list of boolean-valued functions) to the corresponding element of “args”:

[sourcecode language=”css”](andmap (λ (a b) (a b)) arg-types args)[/sourcecode]

Here “(λ (a b) (a b))” is an anonymous function which accepts two arguments, and applies the first argument as a function call on the second argument. If we “map” this lambda expression over the list of arg-types and args, we will get a list of booleans representing whether the arguments have the correct types! Going slightly further, “andmap” performs such a “map,” and returns true if and only if each value in the resulting map is true (probably short-cutting by terminating upon finding a single false value). Thus, this expression is true precisely when the arguments type-check!

Putting this back into our type-check/get-op function, we are finished:

[sourcecode language=”css”];; type-check/get-op: symbol list -> function
(define (type-check/get-op id args)
(let-values ([(num-args-received) (length args)]
[(op num-args-allowed types) (get-prim id)])
(if (not (equal? num-args-received num-args-allowed))
(error ‘type-check
"~a expected ~a args, but received ~a."
id num-args-allowed num-args-received)
(let ([good-args? (andmap (λ (a b) (a b)) types args)])
(if good-args?
op
(error ‘type-check
"one of the arguments to ~a has the wrong type: ~a"
id (string-join (map chai-expr->string args) ", ")))))))[/sourcecode]

[We apologize for the sloppy indenting in the last two lines; it is better than forcing the reader to scroll sideways. For a more pleasant indentation scheme (with colors, parenthesis matching, and more!), view the file in DrRacket, the standard Racket IDE.]

So that’s it! Note now that we may add in new primitive operations without changing any of the code in the parser, type-checker, or interpreter! From now on, when we need a new primitive, we will simply add it and mention its name. We will not bother the reader with its implementation, because it is nothing more than another entry in the look-up table, which is out of sight and out of mind.

Adequately many test cases are provided in the source code, so that the user may verify the interpreter’s correctness. Of course, as we repeat time and time again, the full source code for this post is available on this blog’s Google Code page. Additionally, we have a server accepting and evaluating your every program! Just amble right on over to the Chai Interpreters page to give it a swing.

Next time,  we’ll get started with the first big consideration in our language: how to handle scope. We’ll introduce a language form that binds values to variables, and come up with a system for referencing them. Until then!

Chai – The Most Basic Interpreter

While you read along, why not try evaluating programs through our interactive interpreter? As usual, the source code for all the work done in this post is available at this blog’s Google Code page. For more information on Racket, see the Quick Introduction to Racket or the more extensive Racket Guide.

Our Very First Language

A journey of a thousand miles starts with adding numbers.

Recall from our introduction that our theoretical design cycle for new features in Chai is as follows:

  • Add new syntax forms
  • Decide how those forms should be interpreted

So let us implement the most basic language we can imagine, with just two language forms: one for numbers and one for adding numbers. By starting with something so simple, we will have a skeleton of a parser and interpreter to which we may add new features incrementally, and we won’t get bogged down in the details of more complicated ideas.

So let’s decide what programs will look like!

To rigorously define what expressions will look like, we use a nice standard for descriptions called Extended Backus-Naur Form (EBNF). For those familiar with the theory of computation, this is very closely related to the notations for defining context-free grammars. EBNF notation is quite easy to learn by imitation, and we will teach it by example. Our first attempt at a syntax tree might look something like this:

expr = number
     | (+ number number)

Here the pipe symbol, |, should be mentally replaced with the word “or.” For now, every program will consist of a single “expr,” which is short for expression. In words, an expression can either be a number, or a sum of two numbers. For the sake of this language, we allow “number” to be anything that Racket considers a number (we will get to this soon), and we ignore additional whitespace in between these tokens.

Finally, one notices the odd placement of parentheses and the plus symbol. The parentheses represent an application of a binary operation, and the position of the plus is a notational standard called Polish notation. These aspects of the syntax may seem odd at first, but they happen to make our lives much simpler by eliminating all the hard parts of parsing expressions. And, of course, every logician knows the advantage of Polish notation: there is absolutely no ambiguity in how to read an expression. These subtle details rear their ugly heads in compiler design, and we may come back to them in the distant future. But for now we ask that the reader accept it, and get used to it, because it turns out the entire Racket language (and Lisp family of languages) is based on this notation.

Here are a three examples of well-formed programs we could write in this early version of Chai:

7
(+ 14 9)
(+   4   10 )

Unfortunately, there are some very simple programs we cannot yet write in Chai. For instance, the following program does not fit into our specification:

(+ 1 (+ 2 3))

Rigorously speaking, this form is not included in the above EBNF program syntax, because the arguments to a sum may only be numbers, and not other expressions. With the obvious self-referential modification, we fix this.

expr = number
     | (+ expr expr)

Now we may chain additions indefinitely. With the appropriate extension, we could easily extend this to include all binary operations we desire. For instance:

binop = + | - | * | / | ^ | < | ...

expr = number
     | (binop expr expr)

We will do this in the future, but for now, let’s translate these two syntactic forms into Racket code.

Define-Type, and Type-Case

In order to implement our syntax tree, we’d like some sort of internal representation for a language form. Of course, we could just package everything as a list of strings, but our foresight tells us this would get old quick. A structured solution is relatively straightforward; and the code itself turns out to look just like our EBNF tree, but with some additional names for things. In short: we want a datatype that represents each language form, and encapsulates the types of its arguments. In the language of compilers, such a datatype is called an abstract syntax tree (sometimes abbreviated to AST). Here is our first AST for Chai, implemented in Racket.

;; a chai expression
(define-type chai-expr
  [num (val number?)]
  [sum (lhs chai-expr?) (rhs chai-expr?)])

We explain the notation for those of us not so familiar with Racket: the semicolons start a comment line, which is not part of the program. The variable names in Racket (also called identifiers) allow for many characters that other languages prohibit: hyphens, slashes, question marks, etc., are all valid to use in identifiers. So for those experienced with other languages, don’t mistakenly think these are subtraction operations! Finally, the (define-type chai-expr …) language form defines a new type called ‘chai-expr’, and the following sub-expressions are subtypes. For Java and C/C++ users, this is essentially a shorthand for a bunch of public classes which all inherit the chai-expr interface (an empty interface, that is). The define-type form was specifically designed for creating these abstract syntax trees, and it does a pretty good job of it. Here, each subtype looks like

[type-name (field-1 type-1?) (field-2 type-2?) ...]

where ‘type-name’ and ‘field-j’ are identifiers for all j, and ‘type-j?’ are functions which accept one argument, and return true if the argument is of a specific type. For instance, ‘number?’ is a function which accepts one argument and returns true if and only if the argument is a number (as defined by Racket). For user-defined types, Racket creates these ‘type?’ functions automatically. In addition, Racket creates functions for us to create new instances of these types and access their fields.

This is easier to explain with examples. For instance, I could create some objects and then access their various fields:

> (define y (num 1))
> (num-val y)
1
> (define x (sum (num 2) (num 3)))
> (sum-lhs x)
(num 2)
> (num-val (sum-rhs x))
3

So ‘num-val’ accesses the ‘val’ field of the ‘num’ type, and so forth for each type we create. This is fine and dandy, but most of the time we won’t know whether our given piece of data is a ‘num’ or a ‘sum’ type. Instead, we will just know it is a ‘chai-expr’ and we’ll have to figure out which subtype it corresponds to. Luckily, Racket has our back again, and provides us with the ‘type-case’ language form. We might use it like this:

> (define my-expr (num 5))
> (type-case chai-expr my-expr
    [num (val) (string-append 
                 "I got a num! It was "
                 (number->string val))]
    [sum (lhs rhs) "I got a sum! This is rad!"]
    [else "Getting here is an existential crisis."])
"I got a num! It was 5"

The first argument (for us, ‘chai-expr’) describes which type to inspect, the second is the argument of that type (here, a ‘num’ object), and the subsequent [bracketed] clauses provide cases for each subtype one wants to consider, optionally with an ‘else’ clause which we included superfluously. The Racket interpreter determines which clause is appropriate (here, the ‘num’ clause), and binds the actual arguments of the input (in this case, 5) to the parenthetic field identifiers (in this case, (val)), so that one may use them in the following expression (here, the call to string-append).

As it turns out, define-type and type-case is all the machinery we need to get things working. But before we continue, I should mention that these two functions are not native to Racket. In fact, they come from a package called plai, and they were created using Racket’s nice system for macros. In other words, in Racket one can write new language forms for Racket! We won’t cover those here, but any programming enthusiasts out there might have a lot of fun with exploring the possibilities therein.

The Parser

Once we can translate an expression into branches of our abstract syntax tree, we will find that writing the actual interpreter is extremely easy. So let’s do the hard part first: parsing.

Of course, our choice of Racket-like syntax was in part because parsing such a syntax is relatively easy. In particular, Racket has a nice function that takes a string and converts it into a list of symbols, strings, and numbers (and other certain primitive data types). For instance,

> (read (open-input-string "(hello (there! 7 (+ 2 4)))"))
'(hello (there! 7 (+ 2 4)))

Here the leading quote is shorthand for Racket’s “quote” function. The official name for a quoted expression (and what “read” outputs) is s-expression. An s-expression is either a simple value (number, string of characters, boolean, or symbol), or a list of s-expressions. All words within a quoted expression are interpreted as symbols For more on quote, see the Racket Guide’s section on it.

So with an appropriate call to read, we can take a user’s input string and get an organized list of values. In our source code, we implement a more complex read function, which is available in the “chai-utils.rkt” source file on this blog’s Google Code page. With some additional checks to ensure there is exactly one expression to be read, we call this function “read/expect-single.” Its details are decidedly uninteresting, but if the reader is curious, one may find its internals displayed in the aforementioned source file. Similarly, we wrote a function called “expr->string” which accepts an s-expression and prints it out as a string.

As the reader might anticipate, once we have our inputs in the form of a list, parsing becomes a recursive cake-walk. Specifically, we would start with a shell of a function:

;; parse-expr: s-expr -> chai-expr
;; parse an s-expression into a chai expression, or throw an error
;;  if it is not well-formed
(define (parse-expr input-expr)
  (cond [(number? input-expr) <do something>]
        [(list? input-expr) <do something>]
        [else (error (string-append
                       "parse: unsupported language form: "
                       (expr->string input-expr)))]))

So parse-expr accepts an s-expression, and spits out a well-formed chai-expression, defaulting to an error. Here we use the “cond” language form, which is the Racket analogue to “switch” in C/C++/Java. For everyone else, it allows us to string together a number of conditions without writing cumbersomely many nested if/then/else statements. In each branch of the cond, we narrow down what our possible expression could be. If it satisfies number?, then our expression is a num, and if it satisfies list?, it is likely a sum. From here filling in the <do something> parts is easy: we simply construct the appropriate types:

;; parse-expr: s-expr -> chai-expr
;; parse an s-expression into a chai expression, or throw an error
;;  if it is not well-formed
(define (parse-expr input-expr)
  (cond [(number? input-expr) (num input-expr)]
        [(list? input-expr) (sum (parse-expr (second input-expr))
                                 (parse-expr (third input-expr)))]
        [else (error (string-append
                       "parse: unsupported language form: "
                       (expr->string input-expr)))]))

Here, “second” and “third” extract the second and third elements of the list, and parse-expr recursively evaluates the arguments (as we noted above when we said a sum had the syntax (+ expr expr)). However, it appears we missed something big: what if the list doesn’t have three elements in it? Someone could try to run the program “(square 7),” expecting a result of 49. This is certainly not a sum, but as of now our parser doesn’t make any distinctions. So we need to add a few more checks. Here is the complete parser, with all appropriate conditions checked and combined using the “and” function:

;; parse-expr: s-expr -> chai-expr
;; parse an s-expression into a chai expression, or throw an error
;;  if it is not well-formed
(define (parse-expr input-expr)
  (cond [(number? input-expr) (num input-expr)]
        [(and (list? input-expr)
              (eq? (first input-expr) '+)
              (eq? 3 (length input-expr)))
         (sum (parse-expr (second input-expr))
              (parse-expr (third input-expr)))]
        [else (error (string-append
                       "parse: unsupported language form: "
                       (expr->string input-expr)))]))

We check to make sure the first thing in the list is the symbol ‘+, and that the list has exactly three elements. Then we can be sure that the user meant to put in a sum.

The parse-expr function, along with our following “interp” function and the tests for both functions, will be stored in the “chai-basic.rkt” source code file on this blog’s Google Code page.

The Interpreter

At this point, we’ve parsed our expressions into the chai-expr datatype, and so now a simple application of type-case is all we need to interpret them. Indeed, the interp function practically writes itself:

;; interp: chai-expr -> number
;; interpret a chai-expression
(define (interp input-expr)
  (type-case chai-expr input-expr
    [num (val) val]
    [sum (lhs rhs) (+ (interp lhs) (interp rhs))]))

We don’t need to do any more conditional checking, because we know that anything fed to interp is well-formed. Later, specifically once we add variable references, interp will become much more interesting.

Here the plus function is Racket’s plus. Of course, it seems a bit silly to use + to interpret +, but remember that the point of this series is not to write a language from the ground up, but to get things rolling as quickly as possible, so that we may analyze the more interesting features of programming language semantics. Simply put, arithmetic is boring. We include it simply for familiarity, and because it makes good fodder for writing test cases to ensure our interpreter acts as it should.

Finally, we add one additional function which executes the entire parse/interp chain:

;; evaluate/chai: string -> any
;; perform entire the parse/interp chain
(define (evaluate/chai input) (interp (parse input)))

And (at the top of our source file), we “provide” the function so that other Racket programs may access it, specifically with the command (require “chai-basic.rkt”). All of our interpreters in this series will adhere to the same externally-facing interface.

(provide evaluate/chai)

So there we have it! If the reader downloads the source files, he can interpret expressions through Racket’s interactive interpreter. Additionally, the reader is invited to visit our website, where we have set up a program to receive and evaluate chai-programs through the internet. In the future we will store all of our online interpreters here, so one will be able to access Chai at all stages of its development.

Next time, we will finish off a full set of arithmetic operations, and start looking at variables. Until then!

Chai – Designing a Programming Language

The Journey Begins

We want to design a programming language!

That is certainly a lofty ambition, but one might ask, why bother? There are programming languages already out there that are way better than anything one mathematician-slash-programmer could do. Surprisingly, the answer is not so that we can have a programming language! Instead, we follow our inner mathematicians; we wish to study the structure of programming languages.

By doing so, we will learn how and why programming languages work, and we might even get some experience writing programs along the way. For the non-programmers out there, don’t be afraid. The process of designing a language is gradual, and as a result we’ll repeatedly practice the most basic features of our language, and then slowly add more features until we have a lean mean number-crunching machine. If one has a bit of programming knowledge already, then the journey through this series of posts should provide a good intuition for other programming languages, helping one determine the right questions to ask when a program runs amok. For the mathematicians, this series will stray a bit from our theoretical comfort zones, but we will use mathematical tools from computing theory to describe and analyze languages and programs. So this series should have a little something for everyone.

A Rough Outline

In order to study programming languages we need to study the issues which are common to all languages, and then decide how to customize those to fit our needs. At the highest level, a programming language has two parts: syntax and semantics. The logicians out there are already familiar with such words, but we will define them here.

A language’s syntax is the set of rules for producing well-formed programs. Syntaxes show up everywhere. In the language of arithmetic, the expression “1+” is not well-formed, because we broke the syntactical rule of the addition operator having a number before and after it. In the language of currency, one cannot lend a friend “5$2.01” dollars, because we all know the dollar-sign goes at the beginning (or end) of a number. For programs, our syntax will consist of rules for arithmetic, binding variables, constructing functions, and a number of other constructions with names we will learn later. Every form of communication has a syntax, and so forming syntactically correct statements is a prerequisite for conveying their meaning.

Second, a language’s semantics are the rules for evaluating well-formed programs. For us, this will include the precedence rules of arithmetic (and other) operations, variable substitution methods, scope, types, and many other issues. The study of semantics is mathematically the richest area of programming language theory. Researchers define and compare semantic models, and describe their relationships with underlying mathematical concepts, such as categories or models (both very rich and popular contemporary fields). For the majority of this series, we will stay away from such a rigorous mathematical analysis, mainly because the author is not familiar enough with these topics (yet). In any case, we will spend most of our time worrying about how to implement the correct semantic rules in our language.

And that raises the obvious question: how do we “make” a programming language do anything? The answer is (of course) we write a program! Specifically, we will design our new language, which we designate as the source language, by writing a program to interpret it. This program, which we henceforth dub an interpreter, will be written in a language we are familiar with, which we designate as the implementation language. In other words, our programming language will be completely determined by what the interpreter does to well-formed programs. If we design it correctly, then this will coincide with our theoretical idea of what it should do.

Note that writing an interpreter is one of two possible approaches. We could instead write a program called a compiler that translates our source language into machine code, which is specific to a physical computer. After compiling, the user may then ditch the compiler, and use the compiled code as she wishes (with our approach, she needs the interpreter every time she wants to run the program). Compilers have a number of other, more technical problems associated with them (see our post on Register Allocation [upcoming]), but it has the advantage of producing programs which run quickly. Since we are only interested in studying how languages are built, and not in running fast programs, we can live without a compiler and forget its issues. In the future, we may do another series on designing a compiler, during which we focus on the special challenges faced down that deeper rabbit hole.

So as we start to design our programming language, we will follow a recipe for each new feature:

  • Theoretically:
    • Add new syntax forms
    • Decide how those forms should be interpreted (determine their semantics)
  • In code:
    • Accept a program in textual form
    • Translate the textual form into an intermediate representation, while checking that it is well-formed (parse the input)
    • Interpret the intermediate representation (run the program)
    • Output the results to the user

Racket

As we write these interpreters, we will make the source code available on this blog’s Google Code page. However, as a special treat, we will also set up a web server for the reader to practice writing programs! In fact, this web server is already up and running. Later, that page will have links to forms where the user can type in a program, hit “submit,” and the server will run the program and report the result. This will save any non-programmers the trouble of downloading and installing Racket (our implementation language) and running the interpreter from its source. (Of course, the user is always welcome to do so. Racket is a wonderful language to use!)

We Almost Forgot the Most Important Part!

Before we do anything else, we need a name for our language. The names of popular programming languages are generally short, slick, and easy to pronounce. While the Java programming language started a coffee theme which would also be appropriate for mathematicians, this author loves tea, and we find it sad that no popular languages are named after tea. Hence, we name our new language “Chai.”

To sum our discussion up, we are about to embark on a mission: we will write a Chai interpreter in Racket, and learn about programming languages along the way. Next time we kick things off with the simplest version of Chai, one that exclusively features arithmetic. Until then!