*As you read along, why not try writing some programs and seeing how they evaluate in this version of Chai? You can do so at our live web server.*

## Organization

Last time, we implemented a simple interpreter for a language which exclusively computed sums of numbers. Before we continue with more interesting language forms, we need to give ourselves some meat to work with.

Specifically, we need to add more primitive operations (+, -, *, /, and, or, not, …), and modularize our files so we can keep track of things better. After we cover the new syntax forms, this post will be admittedly be technical; we’ll spend most of our efforts here designing look-up tables, and implementing simple type-checking for primitive operations. Afterward, we will be able to add new primitive operations and data types without modifying our interpreter, and only slightly modifying our parser. The point of all this is to hide the messy code which handles primitives, so that we can better focus on more interesting language forms. For those readers without a neurotic desire to organize code (and hence no desire to read the entire post), just remember that in the future, we will add new primitives without detailing the implementation. For reference, the source code is available on this blog’s Google Code page.

The first change we’d like to make is to move our abstract syntax tree to a separate file, “chai-ast.rkt.” In order to have all the generated functions (constructors and field accessors) available for the rest of our files to use, we need a special “provide” command:

(provide (all-defined-out)) (require plai) ;; a chai-expression (define-type chai-expr [num (val number?)] [sum (lhs chai-expr?) (rhs chai-expr?)]))

So (all-defined-out) is a special function which provides all functions defined in this file. Now, instead of having separate language form for each primitive operation (say, a sum, a diff, a mult, a div, etc), we can abstract this to a “primitive” form. In other words, our “sum” line changes to this:

[prim (id symbol?) (args list?)]

We also recognize that we will want to include unary primitives, and ternary primitives, so we have an identifying symbol and a list of arguments. In other words, our new EBNF syntax looks like:

expr = number | (prim expr expr...) prim = + | - | * | /

Where the “expr…” notation means we allow zero or more following things which are exprs. For now, we do not have any null-ary operations (with zero arguments). Finally, we’d like to include boolean-valued expressions and boolean algebra as well, so this extends our syntax to:

expr = number | true | false | (prim expr expr...) prim = + | - | * | / | and | or | not

Note that the syntax does not keep track of and information associated with the operations, like type or arity. Syntactically, it is perfectly okay to write the program: “(not 1 2 3)”. This is well-formed, because the first thing is a primitive symbol, and the other things are valid arguments. We will leave the semantics of these primitive operations to the interpreter.

Implementing all of this in our abstract syntax tree is straightforward:

;; a chai expression (define-type chai-expr [num (val number?)] [bool (val boolean?)] [prim (id symbol?) (args list?)])

Now let us turn to the parser, which is about as simple as it was last time. Again, we assume that we have already converted an input string into an s-expression (via the utility functions in chai-utils.rkt).

;; parse-expr: s-expr -> chai-expr ;; parse an s-expression into a chai expression, ;; or throw an error if it is not well-formed (define (parse-expr input-expr) (cond [(number? input-expr) (num input-expr)] [(eq? 'true input-expr) (bool #t)] [(eq? 'false input-expr) (bool #f)] [(and (list? input-expr) (primitive-symbol? (first input-expr))) (prim (first input-expr) (map parse-expr (rest input-expr)))] [else (error (string-append "parse: unsupported language form: " (s-expr->string input-expr)))]))

The first three cases of the “cond” expression are trivial. In the fourth, we introduce a new function, which determines whether the first thing in our s-expression is a primitive symbol. Of course, we actually need to write this function, but before we do that let’s come up with a good way to structure a table with information about primitives.

We create a new file called “chai-primitives.rkt” just for this purpose. In it, we’ll have a look-up table with some information about primitives:

;; a helper function for the primitive operation table (define (make-op op) (λ (arglist) (apply op arglist))) ;; primitive operation lookup-table ;; symbol -> (list op num-args (list type1? type2? ...)) (define primitives-table (hasheq '+ (list (make-op +) 2 (list number? number?)) '- (list (make-op -) 2 (list number? number?)) '* (list (make-op *) 2 (list number? number?)) '/ (list (make-op /) 2 (list number? (λ (x) (if (equal? 0 x) (error "division by zero!") (number? x))))) 'or (list (λ (args) (ormap (λ (x) x) args)) 2 (list boolean? boolean?)) 'and (list (λ (args) (andmap (λ (x) x) args)) 2 (list boolean? boolean?)) 'not (list (make-op not) 1 (list boolean?))))

“hasheq” creates a new hash table where the “eq?” function is used to compare keys. In this case, the keys are Racket symbols, and “eq?” happens to work very well with symbols. Associated to each symbol is a list of three things. First, the operation that will be applied to the arguments. Since each function has to be applied to a packaged list of values, we will construct the “operation” via make-op (with the exception of and and or, for quirky Racket reasons), which creates a function that applies the given operation to a list of arguments. Second, we have a number representing the allowed number of arguments; and finally, we have a list of functions. In this list, the th function is used to determine whether the th argument has the correct type (and they are applied in order). This allows us to package important information here, such as the condition that the second argument of division is never zero. This sort of ho-hum logic would severely clutter our interpreter.

But instead of allowing our other files to access the primitives table willy-nilly, we should create an extra layer of protection, which will make it easier for us to maintain our code in the future. Specifically, we want a function which determines if a given symbol is a key in the table, and a function to retrieve the contents of the table. We do this as follows:

;; primitive-symbol?: symbol -> boolean ;; return true if the given symbol is a primitive (define (primitive-symbol? id) (hash-has-key? primitives-table id)) ;; get-prim: symbol -> function number (list function) ;; lookup the data associated with a primitive (define (get-prim id) (apply values (hash-ref primitives-table id)))

The bit with “(apply values … )” takes the items in the list resulting from the hash-lookup, and returns them as multiple values. Note that in Racket, functions *can *have multiple return values. As we will see, this decision makes our code shorter. Instead of writing code to extract the pieces of the list, we just require the caller to bind all three return values to variables at the same time. Our foresight tells us that when we are type-checking, we will need all of this information at the same time, so requiring us to bind them all is no hindrance. Providing these two functions gives us all we need to continue with our interpreter.

The actual interpreter is quite simple now. There are only three cases:

;; interp: chai-expr -> chai-expr ;; interpret a chai-expression (define (interp input-expr) (type-case chai-expr input-expr [num (val) val] [bool (val) val] [prim (id args) (let* ([interped-args (map interp args)] [operation (type-check/get-op id interped-args)]) (operation interpreted-args))]))

At this point, we note that at some point we will want to perform type-checking and retrieve the required primitive operation given its identifier. We also recognize that this code belongs elsewhere. So we defer the logic to a function called “type-check/get-op”, which accepts the primitive operation identifier, and the list of arguments (after they’ve themselves been interpreted!), and returns the function which may then be applied to a list of values, throwing an error if there is an incorrect number of arguments or if any has the wrong type. The rest of the logic that goes into the interpreter is quite clear.

## Type Checking

We write a ghostly shell of a type-checker as follows:

;; type-check/get-op: symbol list -> function ;; check the types and number of the arguments for the primitive ;; 'id', returning the primitive operation on success, and ;; throwing an error on failure (define (type-check/get-op id args) (let-values ([(op num-args-allowed arg-types) (get-prim id)]) op))

For starters, we just return the operation as it is. Note that let-values allows us to bind all three results of the call to “get-prim” simultaneously.

To fill in the gaps, we should count the number of arguments in the “args” variable, and check that against “num-args-allowed”:

;; type-check/get-op: symbol list -> function (define (type-check/get-op id args) (let-values ([(num-args-received) (length args)] [(op num-args-allowed arg-types) (get-prim id)]) (if (not (equal? num-args-received num-args-allowed)) (error 'type-check "~a expected ~a args, but received ~a." id num-args-allowed num-args-received) op)))

Note that we need the additional parentheses around (num-args-received) in the binding clause, because let-values expects each clause to have a list of identifiers. In addition, the “error” function accepts a formatting string similar to the kind encountered in C’s printf function. However, here the “~a” variable allows one to write out whatever value one wants. The details of Racket’s flavor of formatted strings is in the documentation here. All we need to know here is that “~a” works for strings and numbers.

Now that we’ve checked that there are the correct number of arguments, let us ensure they have the right type. To do this, we simply apply each element of “arg-types” (a list of boolean-valued functions) to the corresponding element of “args”:

(andmap (λ (a b) (a b)) arg-types args)

Here “(λ (a b) (a b))” is an anonymous function which accepts two arguments, and applies the first argument as a function call on the second argument. If we “map” this lambda expression over the list of arg-types and args, we will get a list of booleans representing whether the arguments have the correct types! Going slightly further, “andmap” performs such a “map,” and returns true if and only if each value in the resulting map is true (probably short-cutting by terminating upon finding a single false value). Thus, this expression is true precisely when the arguments type-check!

Putting this back into our type-check/get-op function, we are finished:

;; type-check/get-op: symbol list -> function (define (type-check/get-op id args) (let-values ([(num-args-received) (length args)] [(op num-args-allowed types) (get-prim id)]) (if (not (equal? num-args-received num-args-allowed)) (error 'type-check "~a expected ~a args, but received ~a." id num-args-allowed num-args-received) (let ([good-args? (andmap (λ (a b) (a b)) types args)]) (if good-args? op (error 'type-check "one of the arguments to ~a has the wrong type: ~a" id (string-join (map chai-expr->string args) ", ")))))))

[We apologize for the sloppy indenting in the last two lines; it is better than forcing the reader to scroll sideways. For a more pleasant indentation scheme (with colors, parenthesis matching, and more!), view the file in DrRacket, the standard Racket IDE.]

So that’s it! Note now that we may add in new primitive operations without changing any of the code in the parser, type-checker, or interpreter! From now on, when we need a new primitive, we will simply add it and mention its name. We will not bother the reader with its implementation, because it is nothing more than another entry in the look-up table, which is out of sight and out of mind.

Adequately many test cases are provided in the source code, so that the user may verify the interpreter’s correctness. Of course, as we repeat time and time again, the full source code for this post is available on this blog’s Google Code page. Additionally, we have a server accepting and evaluating your every program! Just amble right on over to the Chai Interpreters page to give it a swing.

Next time, we’ll get started with the first big consideration in our language: how to handle scope. We’ll introduce a language form that binds values to variables, and come up with a system for referencing them. Until then!