P vs. NP, A Primer (And a Proof Written in Racket)

Decidability Versus Efficiency

In the early days of computing theory, the important questions were primarily about decidability. What sorts of problems are beyond the power of a Turing machine to solve? As we saw in our last primer on Turing machines, the halting problem is such an example: it can never be solved in a finite amount of time by a Turing machine. However, more recently (in the past half-century) the focus of computing theory has shifted away from possibility in favor of determining feasibility. In particular, we want to know how fast we can solve a particular problem on a Turing machine. The goal is to design efficient algorithms for important real-world problems, and to know when it is impossible to design a more efficient algorithm than what we already have. From a mathematical perspective, that often means we would like to find the boundary problems: those which are “just barely” too hard to admit an efficient algorithm. Of course, this requires us to first define what it means for an algorithm to be “efficient.”

These questions are intimately intertwined with big-O analysis, which we presented in our very first primer on this blog, and the definitions we investigate in the sequel require fluency in such notation. That being said, we try to give colloquial analogues to the most important ideas.

The Class P

The general goal here is to describe classes of problems that roughly correspond to how efficiently one can solve them. The first such class (and by all rights, the smallest class we will consider) is called P, which stands for “polynomial.”

Definition: P is the set of languages which can be decided by a Turing machine in $ O(n^k)$ for some $ k$, where $ n$ is the input size of the problem. In the common notation:

$ \displaystyle \textup{P} = \bigcup_{k \in \mathbb{N}} \textup{TIME}(n^k)$

Where $ \textup{TIME}(n^k)$ is the set of problems decidable in $ O(n^k)$ time. Recall that a language is any set of strings (any subset of $ \Sigma^*$, as we have been investigating in our past computing theory primers and our post about metrics on words).

We don’t have too much to say about P itself, except that many common problems are in P. The real difficulty is in proving that some problem does not lie in P.

Here is an example this author has used to explain to elementary school children what it’s like to not be in P. Consider a deck of 52 playing cards, and your goal is to sort the cards by suit and then by rank (as it is when you open a fresh pack of cards). There is one obvious way to do this that children teach themselves quite readily: just look for the Ace of spades, and put it on the bottom of a new stack. Then find the two of spades, and continue in this manner until the deck is sorted. This algorithm takes $ O(n^2)$ time, since in the very worst case the deck is sorted in reverse order, and if you’re too dim to realize it, you look through the entire deck at every step.

Of course, there are much faster ways to do it than the approach we detailed above, and it is a well-known theorem that sorting is $ \Theta(n \log(n))$. That is, there is both a lower bound and an upper bound of $ n \log(n)$ for the general sorting problem.

Now let’s investigate a ridiculous algorithm for sorting cards. What if at each step, we shuffled the deck, and then checked to see if it magically became sorted due to our shuffling? Forgetting for a moment that randomness comes into play, we would expect to find the correct sorting once in every $ 52!$ shuffles, and each check of sortedness takes 52 steps. For a deck of $ n$ cards, this would hence take time $ O(n! \cdot n)$, which would take so long even for 52 cards that the sun would go out before you could finish. But what if we didn’t know a better way?
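
To see just how hopeless this is, here is a minimal Racket sketch of the “shuffle until it happens to be sorted” idea. Racket’s built-in shuffle does the random reordering for us; the function names here are our own, chosen for illustration.

;; sorted?: list of numbers -> boolean
;; check that a list is in nondecreasing order with one linear scan
(define (sorted? lst)
  (or (empty? lst)
      (empty? (rest lst))
      (and (<= (first lst) (second lst))
           (sorted? (rest lst)))))

;; shuffle-sort: list of numbers -> list of numbers
;; keep reshuffling until the "deck" happens to come out sorted
(define (shuffle-sort lst)
  (if (sorted? lst)
      lst
      (shuffle-sort (shuffle lst))))

Calling (shuffle-sort '(5 2 4 1 3)) will eventually return '(1 2 3 4 5), but for a 52-card deck we would not recommend waiting around for it to finish.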

For an example of a problem where we don’t know a better way, consider the problem of graph coloring. Is there a Turing machine which, when given as input a graph $ G$ and a number $ k$, can determine in polynomial time whether $ G$ admits a $ k$-coloring? There is an obvious algorithm to decide the problem: there are only finitely many choices of assignments of vertices to colors, and we can simply check them all. In fact, there are $ k^n$ of them, where $ n$ is the number of vertices of $ G$.

Unfortunately, this algorithm is not polynomial in runtime: we would have to enumerate all of the different colorings, and check whether each is a valid coloring; this process takes $ O(n^2 k^n)$ time (there are $ k^n$ colorings, and checking one requires inspecting every edge), which is far from polynomial for any fixed $ k \geq 2$.

But the true challenge is this: how do we know there is no “faster” algorithm? Of all the crazy wild ideas one could have to solve a problem, how do we know none can be so clever that they reduce the running time to be a polynomial in $ n$?

In fact, we don’t know for sure that there isn’t! This is the heart of the open problem which is succinctly called “P vs. NP”.

The Class NP

While P is a class of “easy” problems from one perspective (problems that can be solved quickly, even in the worst case), being a member of NP is another measure of “easiness,” but from a different perspective.

Definition: NP is the class of problems which have a polynomial-time verifier. That is, a problem $ A$ is in NP if there is a polynomial-time Turing machine $ M$ such that $ w \in A$ if and only if there exists a string $ c$, called a certificate, for which $ M$ accepts the input $ (w, c)$. In other words, $ M$ uses the certificate $ c$ to verify that $ w \in A$.

This definition is a bit hard to swallow, but examples clarify the situation greatly. For the problem of graph coloring, we note that a certificate would simply be a list of pairs $ (v_i, n_i)$ which give a coloring of the graph $ G$. It is quite trivial to define a polynomial-time Turing machine that ensures the coloring of $ G$ is valid. Hence, graph coloring is in NP. This is the case with most problems in NP: a proof that $ w \in A$ is hard to find, but easy to verify once you have it.
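
To make the verification step concrete, here is a hedged Racket sketch of such a verifier. The representations are assumptions made purely for illustration: the graph is a list of edges '((u v) ...) and the certificate is an association list '((vertex color) ...). A full verifier would also confirm that at most $ k$ distinct colors appear, which is just as easy.

;; color-of: coloring, vertex -> color
(define (color-of coloring v)
  (second (assoc v coloring)))

;; valid-coloring?: list of edges, coloring -> boolean
;; every edge must join two differently-colored vertices
(define (valid-coloring? edges coloring)
  (andmap (λ (edge)
            (not (equal? (color-of coloring (first edge))
                         (color-of coloring (second edge)))))
          edges))

For example, (valid-coloring? '((a b) (b c)) '((a 1) (b 2) (c 1))) evaluates to #t, and the whole check is a single pass over the edges.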

There is another definition of NP which is often useful, and it gives a reason for prefixing the “polynomial” part of P with an N.

Definition: NP is the set of problems which are solvable by a nondeterministic Turing machine in polynomial time.

For the motivating picture behind “nondeterministic Turing machines,” we turn to an analogy. Imagine you have an infinite number of computers running in parallel, and they can communicate instantaneously. What sorts of problems could we solve in polynomial time with such a machine? We could certainly solve graph coloring: simply have each machine try one of the $ k^n$ different colorings, and have the entire machine halt as soon as one machine finds a valid coloring (or, if every machine finishes without finding one, we can safely reject: the graph is not $ k$-colorable).

So we can reformulate the definition in set notation as:

$ \displaystyle \textup{NP} = \bigcup_{k \in \mathbb{N}} \textup{NTIME}(n^k)$

Here $ \textup{NTIME}(f(n))$ is the set of all languages which can be solved in time $ f(n)$ on a nondeterministic Turing machine.

In other words, “NP” stands for “nondeterministic polynomial-time” problems. And in fact this definition is equivalent to the existence of a polynomial-time verifier, as in our first definition. To see this, note that we can construct a nondeterministic machine that enumerates all possible certificates (returning to our analogy, one on each of the infinitely numerous parallel computers), and then tests each one using the polynomial-time verifier. Since each branch runs a polynomial-time algorithm, the whole Turing machine runs in (nondeterministic) polynomial time. On the other hand, if some branch of the nondeterministic computation accepts within polynomial time, then the sequence of configurations along that branch has polynomial length, and a deterministic Turing machine can simply check that this sequence follows the machine’s transition rules and ends in an accepting state. The sequence of configurations thus serves as a certificate.

One might ask why we care about infinitely parallel computers. In reality, we can only have finitely many computations going at once, so why bother with this silly mathematical uselessness? As it often turns out in mathematics, it is useful to think about such structures simply because they capture the essence of what we wish to study in a concise and formal manner. For complexity theory, nondeterministic Turing machines capture the level of complexity we wish to understand: in a sense, it’s the “next level up” from polynomial-time decidable problems.

K-Clique and 3-Sat

We have two more examples of problems in NP which will be important later in this post: the problems of 3-Satisfiability and finding a k-Clique.

Definition: Let $ \varphi(x_1, \dots, x_n)$ be a propositional formula in $ n$ boolean variables. $ \varphi$ is satisfiable if there exists an assignment of the variables $ x_1, \dots, x_n \to \left \{ \textup{T}, \textup{F} \right \}$ that makes $ \varphi$ true.

For example, we could have the formula

$ \displaystyle (x \vee y \vee \overline{z}) \wedge (\overline{x} \vee \overline{y} \vee z)$

And this is satisfiable by the assignment of $ x = \textup{T}, y = \textup{F}, z = \textup{T}$.

It should be obvious to the reader now that determining whether a formula is satisfiable is in NP: a certificate is just a list of variables mapping to true or false, and checking that the formula is satisfied by a given assignment can be done in polynomial time.

In 3-Satisfiability (usually shortened to 3-Sat), we simply restrict the form of $ \varphi$ and ask the same question. The form is called conjunctive normal form, and colloquially it is a bunch of clauses joined with “and,” where each clause is a bunch of variables (and their negations) connected by “or”. Moreover, the “3” in 3-Sat requires that each clause contain exactly three literals. For example, the formula given above would be a valid input to 3-Sat.
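
To see concretely how cheap the verification is, here is a minimal Racket sketch of checking an assignment against a formula in conjunctive normal form. The representations are our own choices for illustration: a formula is a list of clauses, a clause is a list of literals, a literal is either a symbol or (not symbol), and an assignment is an association list from symbols to booleans.

;; literal-true?: assignment, literal -> boolean
(define (literal-true? assignment literal)
  (if (and (pair? literal) (equal? (first literal) 'not))
      (not (second (assoc (second literal) assignment)))
      (second (assoc literal assignment))))

;; satisfies?: assignment, CNF formula -> boolean
;; every clause must contain at least one true literal
(define (satisfies? assignment formula)
  (andmap (λ (clause)
            (ormap (λ (lit) (literal-true? assignment lit)) clause))
          formula))

Encoding the formula from above as '((x y (not z)) ((not x) (not y) z)), the call (satisfies? '((x #t) (y #f) (z #t)) '((x y (not z)) ((not x) (not y) z))) evaluates to #t, and the running time is clearly linear in the size of the formula.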

k-Clique, on the other hand, is a question about graphs. Given a graph $ G$ and a positive integer $ k$, determine whether $ G$ contains a complete subgraph of $ k$ vertices. (In a complete graph, there is an edge connecting each pair of vertices; the name “clique” is motivated by the party problem, in the sense that there is a “clique” of friends at the party who are all mutually friends with each other.)

As expected, a certificate that a graph has a $ k$-Clique is just a list of the $ k$ vertices in the clique, and checking that every pair of listed vertices has a connecting edge requires inspecting $ O(k^2)$ pairs, each of which can be checked in time polynomial in the size of $ G$. In total, the verification is polynomial in the size of the input.
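
As a hedged sketch along the same lines (again assuming an edge-list representation chosen only for illustration), the verifier might look like:

;; edge?: list of edges, vertex, vertex -> boolean
(define (edge? edges u v)
  (and (or (member (list u v) edges)
           (member (list v u) edges))
       #t))

;; clique?: list of edges, list of vertices -> boolean
;; every pair of distinct listed vertices must be connected
(define (clique? edges vertices)
  (andmap (λ (u)
            (andmap (λ (v) (or (equal? u v) (edge? edges u v)))
                    vertices))
          vertices))

For instance, (clique? '((a b) (b c) (a c)) '(a b c)) evaluates to #t.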

NP-Completeness

As it turns out, the problems of 3-Satisfiability and k-Clique are quite special (as is graph coloring). They belong to a special sub-class of NP called NP-complete. Before we can define what NP-complete means, we have to be able to compare problems in NP.

Definition: Given two languages $ A, B$, we say $ A \leq_p B$, or $ A$ is polynomial-time reducible to $ B$ if there exists a computable function $ f: \Sigma^* \to \Sigma^*$ such that $ w \in A$ if and only if $ f(w) \in B$, and $ f$ can be computed in polynomial time.

We have seen this same sort of idea with mapping reducibility in our last primer on Turing machines. Given a language $ B$ that we wanted to show as undecidable, we could show that if we had a Turing machine which decided $ B$, we could solve the halting problem. This is precisely the same idea: given a solution for $ B$ and an input for $ A$, we can construct in polynomial time an input for $ B$, use a decider for $ B$ to solve it, and then output accordingly. The only new thing is that the conversion of the input must happen in polynomial time.

Of course, this discussion was the proof of a clarifying lemma:

Lemma: If $ B \in \textup{P}$ and $ A \leq_p B$, then $ A \in \textup{P}$.

The proof is immediate, as $ B$ can be solved in polynomial time, and the conversion function runs in polynomial time. We leave the construction of an explicit Turing machine to decide $ A$ as an exercise to the reader.

To phrase things more colloquially, $ A \leq_p B$ is true if $ A$ is an “easier” problem than $ B$, hence justifying the less-than notation.

And now for the amazing part: there are problems in NP which are harder than all other problems in NP.

Definition: A language $ A \in \textup{NP}$ is called NP-complete if for all problems $ B \in \textup{NP}$, $ B \leq_p A$.

In other words, all problems in NP reduce to an NP-complete problem in polynomial time. In fact, we get another nice fact about NP-completeness that mirrors our observation about P above:

Lemma: If $ A$ is NP-complete and $ B \in \textup{NP}$ with $ A \leq_p B$, then $ B$ is NP-complete.

Obviously the composition of two polynomial-time reductions is a polynomial-time reduction, so we can conclude that all problems in NP which reduce to $ A$ also reduce to $ B$.

The cautious reader should be rather hesitant to believe that NP-complete problems should even exist. There is no reason we can’t come up with harder and harder problems, so why should there be a point after which we can’t quickly verify a solution?

Well, Stephen Cook proved in 1971 that there is an NP-complete problem, and shortly thereafter many more were found. Today, there are thousands of known NP-complete problems.

Perhaps unsurprisingly, Cook’s original NP-complete problem was N-Satisfiability (i.e., Satisfiability with no restriction on the number of literals per clause or on the form of the formula). Unfortunately the proof is quite difficult. We point the reader to the relevant Wikipedia page, and briefly mention the outline of a proof.

Given a nondeterministic polynomial-time Turing machine, we can bound the length of each branch of computation (and hence the amount of tape it can touch) by $ n^k$ for some fixed $ k$. Then, for a single branch of computation, we create an $ n^k$ by $ n^k$ table of the configurations of the Turing machine (the i,j cell recording the contents of the j-th tape cell at the i-th step of that branch). From this, we can construct a monstrously enormous (yet polynomial in size) formula which has a satisfying assignment if and only if the Turing machine halts on some branch of computation in an accept state. The Wikipedia page mentioned above contains a table of the formulas needed to do this. In short, the formula traces the computation of the machine at each step, and ensures the transition function is honestly followed, the tape is reliably updated, and the tape head moves in the correct direction.

The reason we say it’s unsurprising that Satisfiability is NP-complete is that it’s commonly believed every aspect of mathematics boils down to pure logic, although the process of doing so is entrenched in gory detail every step of the way. So it’s understandable that all problems in NP reduce to a problem about logic which is also in NP. We speculate that other complexity classes likely have “complete” problems that are essentially questions about logical formulas.

A New Way to Find NP-Complete Problems

Now that we have established the NP-completeness of Satisfiability, we can do the same for other problems by reduction from a known NP-complete problem. First, we claim that 3-Satisfiability is NP-complete, and we leave the proof as an exercise to the reader (hint: reduce from regular Satisfiability by putting the formula into the right form).

Now given that 3-Sat is NP-complete, we will prove that k-Clique is NP-complete by reduction from 3-Sat (in fact our conversion function will work for any formula in conjunctive normal form, but 3-Sat is enough).

Theorem: k-Clique is NP-complete.

Proof. Given a formula $ \varphi$ in conjunctive normal form, we construct an instance of k-Clique as follows. First, let $ k$ be the number of clauses in $ \varphi$. Construct a graph $ G_{\varphi}$ by creating a vertex for each literal term in $ \varphi$, and (to help visualization) organize them into columns by their originating clause, and label the vertex with its corresponding literal. Introduce an edge connecting two terms $ a, b$ in different columns when the formula $ a \wedge b$ is not a contradiction. In other words, it cannot be of the form $ x \wedge \overline{x}$ for some variable $ x$.

As a concrete example, the formula

$ \displaystyle (x \vee \overline{y} \vee z) \wedge (\overline{x} \vee \overline{z} \vee w)$

converts to a graph with one column of three vertices for each clause (one vertex per literal), and an edge between every pair of non-conflicting literals in different columns.

We claim that $ \varphi$ has a satisfying assignment of variables if and only if $ G_{\varphi}$ has a k-clique. Supposing there is a valid assignment of variables in $ \varphi$, then each clause must contain at least one literal which is true (giving $ k$ true literals in total, one per clause). This translates to $ k$ vertices in $ G_{\varphi}$, one in each column, and since these literals are simultaneously true, no two of them conflict, so $ G_{\varphi}$ has an edge connecting each pair of the $ k$ vertices. Conversely, suppose $ G_{\varphi}$ has a k-clique. By our construction, two vertices in the same column are never connected by an edge, and hence this k-clique must have exactly one vertex in every column. If a chosen vertex is labeled with a negated variable, assign that variable the value $ F$ (so that the literal evaluates to true), and otherwise assign it the value $ T$. This gives a satisfying assignment of the variables of $ \varphi$, since each clause will evaluate to true under this assignment.

The final part of the proof is that the conversion function runs in polynomial time, and we claim this is obvious from the construction: if $ \varphi$ has $ n$ literals, then we create $ O(n)$ vertices and $ O(n^2)$ edges. The creation of each vertex and edge can be done in constant time, as can the verification that two literals do not conflict. $ \square$

Of course, this is a question about the possibilities of computers. Instead of giving a theoretical proof, why not just write a program to compute the conversion? Well we did just that, and the main body of the code turned out to be quite tidy:

;; n-sat-to-clique: formula -> (listof edge)
;; transform an input to n-sat to an input for clique
;; assume the input expression is in CNF
(define (n-sat-to-clique expr)
  (let* ([conjuncts (∧-conjuncts expr)]
         [columns (map (λ (x) (∨-disjuncts x)) conjuncts)]
         [labeled-columns (label-graph columns 1)]
         [possible-edges (all-pairs-in-distinct-lists labeled-columns)])
     (list->set (filter no-conflict? possible-edges))))

We have a data structure for a general formula (and provide a function to compute the conjunctive normal form of any expression), and a data structure for a graph (essentially, a list of pairs of labelled vertices), and so the process of checking all possible edges, and filtering out those which have no conflict, clearly takes $ O(n^2)$ time.
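
To give a flavor of those helpers, here are hypothetical sketches of no-conflict? and all-pairs-in-distinct-lists. The actual versions live in the repository linked below and may differ; in particular, representing a labeled vertex as (list label literal) is an assumption made here purely for illustration.

;; no-conflict?: candidate edge (a pair of labeled vertices) -> boolean
;; two literals conflict exactly when one is the negation of the other
(define (no-conflict? edge)
  (let ([a (second (first edge))]
        [b (second (second edge))])
    (not (or (equal? a (list 'not b))
             (equal? b (list 'not a))))))

;; all-pairs-in-distinct-lists: list of labeled columns -> list of candidate edges
;; pair every vertex with every vertex appearing in a later column
(define (all-pairs-in-distinct-lists columns)
  (if (empty? columns)
      empty
      (append
       (for*/list ([v (first columns)]
                   [w (apply append (rest columns))])
         (list v w))
       (all-pairs-in-distinct-lists (rest columns)))))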

The rest of the code required to run the function above is available on this blog’s Github page.

Other NP-complete Problems, and P =? NP

In the real world NP-complete problems show up everywhere. Application domains include cryptography, financial portfolio and investment management, scheduling and operation dynamics, packing problems, chem- and bioinformatics, guarding art galleries, circuit design, compiler design, and even modelling social networks.

There are even many questions one can ask about games that turn out to be NP-complete in complexity. For instance, many questions about the classic game of Tetris are NP-complete, along with Minesweeper, FreeCell, Lemmings, Battleship, and Mastermind.

Now the big unsolved question is: does P = NP? In other words, can any of these seemingly hard problems be solved quickly? The simple fact is, if the answer is yes to one such problem, then the answer is yes not only to all NP-complete problems, but to all problems in NP (as we saw in our lemma earlier). This is the heart of the million-dollar question that has been crowned the most difficult open problem in computer science to date. Almost everyone agrees that P and NP should not be equal, but nobody can prove it.

Of course, common people love to talk about P and NP because of all of the wild things that would happen if we suddenly discovered that P = NP. All widely used security systems would fail, internet transactions would no longer be safe, efficiency in industry would increase by orders of magnitude, we’d unlock the secrets of the human genome, we’d quickly solve open mathematical problems and find staggeringly ginormicon primes (ginormicon = gigantic + enormous + decepticon), governments will topple, rioting in the streets, the artificial intelligence singularity will occur, etc., etc.

But all of this is just media hype. The likely reality is that some problems are simply too hard to be solved in polynomial time in general, just as there are probably problems which have no polynomial-time verifiers (i.e., problems outside of NP), excluding the trivial problems which are undecidable. In the end, it’s just a matter of time until mathematics sorts everything out, or proves that it’s impossible to do so. Indeed, just two years ago this author remembers waking up to the news that there was a 100 page proof that P is not equal to NP, and moreover that Stephen Cook himself considered it a serious attempt. Unfortunately it turned out to have irreparable flaws, but failure made it no less exciting: this is how great mathematics is made.

On the other hand, there are cranks out there who have, for the last few years, been convinced that they have proved P = NP, but are ignorant of their own flaws and the rest of the world’s criticism. May Gauss have mercy on their mathematical souls.

Until next time!

Chai – Functions with Applications

As usual, we invite the reader to follow along with the source code from this blog’s Google Code page, and to test programs with the interactive online interpreter.

One interpretation of a "Chai function."

Reserved Words, Values, and Ifs

In our last version of Chai we noticed a few things that needed fixing, so we’ll address those before moving on. First, consider the following chai program:

(with true false (or true true))

The user, being as odd as users will be, wants to bind the value “false” to the variable whose name is “true,” in a dirty attempt to trick somebody. The last version of Chai (see Chai – Environments and Variables) interpreted this expression as “true,” and rightfully so in this mathematician’s opinion, since truth is not something which can be changed!

On the other hand, something just doesn’t feel right about this program. We can either allow such programs to be written (and wag a stern finger at all those who do!), or forbid the program from continuing once we find such a transgression. The latter makes more sense, and it introduces reserved words into Chai. As of the end of this post, our list of reserved words contains:

true false with if end pair fun

And we will add to this list as necessary. It’s ambiguous whether we should include things like +, -, *, /, etc. Someone may want to redefine what “plus” means, and that might not be an awful thing, say, once every two thousand years. For now, we will leave it alone. At least, we may stop someone from writing the appalling program:

(with with 1 (with with (with with with with) with))

All this requires of our code is that during parsing, whenever we inspect a variable binding (or a list of variable bindings, or a list of function arguments), we ensure none are reserved words. This is straightforward enough, and the interested reader may inspect the source code.

Next, we realized ahead of time the need to discriminate between the input to our interpreter and its output. This is a direct result of our desire to add support for functions and basic lists. Now we have a new EBNF for “Chai values”:

chai-value = number
           | boolean
           | string
           | (pair chai-value chai-value)
           | end

One might notice that we have added strings, pairs (lists are nested series of pairs), and end, a null-value, named so to signify the terminating value of a list. In addition, we’ve added primitives for a whole host of operations, including: =, !=, <=, >=, <, >, pair (pair construction), end (list termination), first and rest (pair extraction), and end? (testing if a list is empty). We will certainly add more in the future, but for now this is enough.
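
For instance, with these primitives in place a program like

(first (rest (pair 1 (pair 2 end))))

should evaluate to the chai-value 2.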

In Racket code, we have the following new data type:

(define-type chai-value
  [num-v (val number?)]
  [bool-v (val boolean?)]
  [str-v (val string?)]
  [pair-v (head chai-value?) (tail chai-value?)]
  [end-v])

The new discrimination between Chai expressions and Chai values affects every branch of our interpreter and in every primitive operation, but only in how we accept and package up our data types. The interested reader can check out the changes in the source code; we will not discuss them here. (But we will mention that none of the changes occur in the type-checker. Nice.)

Finally, we’ve added “if” expressions for control flow. In the EBNF, an if looks like:

expr = ...
     | (if expr expr expr)

Where the first expression is the “test” expression, the second is the “then” expression, and the third is the “else” expression. This translates directly to Racket as:

(define-type chai-expr
   ...
   [if? (test chai-expr?) (then-branch chai-expr?) (else-branch chai-expr?)])

Because the symbol ‘if’ conflicts with the Racket language form, we rename it ‘if?’; note that in Chai programs, the language form is still “if”. The parser is straightforward for if expressions, so we’ll jump straight to the interpreter. The new clause is:

(define (interp input-expr env)
  (type-case chai-expr input-expr
    ...
    [if? (test-expr then-branch else-branch)
         (let ([test-result (interp test-expr env)])
           (type-case chai-value test-result
             [bool-v (test-value) (interp (if test-value then-branch else-branch) env)]
             [else (error 'interp "test expression was not a boolean: ~a"
                          (chai-expr->string test-expr))]))]))

Note that we need to ensure the result of evaluating the test-expression is a boolean, and we ignore the unevaluated branch. For instance, the following program will run in Chai:

(if true 7 (/ 1 0))

even when the else branch will throw an error as a standalone program.

Functions

Finally, we’ve reached a very important milestone in Chai: figuring out what to do with functions. Let’s start with the basic syntax. In EBNF, a function has the form

expr = ...
     | (fun (vars ...) expr)

Where “vars …” indicates the presence of zero or more identifiers representing input variables, which are collectively referred to as “vars”. For instance, here is a Chai program which evaluates to the identity function:

(fun (x) x)

In Chai, we will treat functions the same way we treat any other primitive value. One can pass them around as if they were a number. For instance, here is a function which accepts one argument $ x$, and returns a function which accepts one argument $ y$ and adds $ x$ to $ y$:

(fun (x) (fun (y) (+ x y)))

Applying this function to a number returns a new function. In some languages, functions are required to be defined “at the top level,” i.e., given special names and put in special places. Since Chai expressions still consist of a single expression, we don’t have anywhere else to put them. So if one wants to “name” a function one can, but as it is most functions will be “anonymous.”

Speaking of function applications, they look like:

expr = ...
     | (expr expr ...)

Where the first expression should result in a function, and the remaining zero or more expressions should result in the arguments to the function. Note that syntactically, these “shoulds” are irrelevant.

So here is a program which applies the previous example, which we name “make+=”, to some values:

(with make+= (fun (x) (fun (y) (+ x y)))
   (with +=7 (make+= 7)
     (pair (+=7 9) (pair (+=7 14) end))))

The advanced readers will recognize this as an explicit form of “currying” (accepting a partial list of arguments). We may in the future decide to have Chai support implicit currying, so that we may rewrite the above program as something like:

(with +=7 (+ 7)
  (pair (+=7 9) (pair (+=7 14) end)))

But for now let’s focus on the implementation.

The Parser

In the parser, our new branches look like:

[(and (equal? 'fun first-thing)
      (equal? 3 list-len)
      (list? (second input-expr)))
 (let ([vars (check-vars (second input-expr))]
       [body (parse-expr (third input-expr))])
   (function vars body))]
[else
 (app (parse-expr first-thing)
      (map parse-expr (rest input-expr)))]))] 

To elaborate, these two branches are wrapped within the condition that the expression is a list. If the first element of the list is not a reserved keyword like “with”, “if”, or “fun”, we assume the expression is a function application. Indeed, we could have some weird expressions that evaluate to functions, so we can’t discriminate any further here.

For functions, we have to ensure some things about the list of input variables. For instance, we can’t allow the user to repeat the same variable name twice, or use any reserved words, so we encapsulate both of those checks into a function called “check-vars”:

;; check-vars: listof symbol -> listof symbol
;; check a list of variable names for invalidity
(define (check-vars names)
  (begin
    (map (λ (name) (when (reserved? name) (parse:bad-identifier name))) names)
    (when (has-duplicates? names) (error 'parse "list of variables has duplicates: ~a" names))
    names))

We leave the implementation of has-duplicates? as an exercise to the reader (with the solution in chai-utils.rkt), and reserved? is just a membership test in a list of symbols (in chai-ast.rkt).
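
For reference, one possible implementation of has-duplicates? (the version in chai-utils.rkt may well differ) is a short recursive membership test:

;; has-duplicates?: listof symbol -> boolean
;; does any symbol occur again later in the list?
(define (has-duplicates? names)
  (cond [(empty? names) #f]
        [(member (first names) (rest names)) #t]
        [else (has-duplicates? (rest names))]))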

While it has its details, the parser is the easy part of implementing functions. We didn’t actually have to make any important decisions. We’ll see the semantics are a bit more complicated.

Scope, and Closures

In general, the scope of a variable $ x$ refers to the places in the program where we can inspect the value of $ x$. For instance, in the following program $ x$ is only in scope within the “with” expression.

(with y (+ x y)
  (with x 7 y))

So, in fact, the reference to $ x$ before the nested with statement is invalid, and this program will raise an error upon evaluation. Note that $ y$ is in scope in the nested with. Recalling how our interpreter works, this is obvious, because we only augment the environment with new variables after encountering their definition in a with.

To illustrate the confusion arising in discussions about scope, we have to investigate functions. In particular, the definition of a function can refer to a variable defined somewhere else, but the body of the function isn’t evaluated until it’s actually called. Since functions can be passed around like numbers, it raises ambiguity as to which variable we should refer to during a function application.

For instance, we could write the following program and ask what it should do:

(with my-function (with x 33
                     (fun (y) (+ x y)))
   (with x 44 (my-function 55)))

Should this program evaluate to 88 or 99? Before continuing, make a guess as to what will happen (and try to argue why your choice is the best option), and verify it by running the program through our interactive interpreter.

In fact, there is no completely “correct” answer to this question, and it highlights the differences between lexical and dynamic scope. In the latter, a variable will always refer to the most recent binding which is in scope. In other words, if Chai had dynamic binding, the program above would evaluate to 99. On the other hand, if we implemented lexical scope, “my-function” would retain the information about what variables were in scope during its definition, and so the resulting call would evaluate to 88.

The engaged reader will have by now verified that Chai has lexical scope, and not dynamic scope (in the future, we may provide an alternative version of Chai which has dynamic scope). Most contemporary programming languages implement lexical scope, because it just so happens that otherwise the logic flow in a program is harder to determine just by reading it. However, some popular languages originally had dynamic scope, and later decided to tack on support for lexical scoping. Sometimes lexical scope is referred to as static scope.

In order to support lexical scope in Chai, we have to now discriminate between a function definition, and the function definition combined with the variables that are in scope during definition. In other words, we need the result of interpreting a function definition to contain both pieces of information. In particular, we call this piece of data a closure.

Now the chai-value datatype has an additional branch:

chai-value = number
           | boolean
           | string
           | (pair chai-value chai-value)
           | end
           | (closure environment (vars...) chai-expr)

Where the environment is our datatype for variable bindings, the vars… is the same list of function variables from our function definitions, and the chai-expr is the body expression, in unevaluated form. (Indeed, we can’t evaluate the body until it’s applied to some arguments!)

So this adds one more branch to our type-case:

(define-type chai-value
   ...
   [closure-v (env environment?) (vars (listof symbol?)) (body chai-expr?)])

Now we are ready to interpret functions, and see what we can do with them.

The Interpreter

In fact, interpreting function definitions is easy! Since we have already been keeping track of the environment all along, we can package it up in a closure in a single line:

(define (interp input-expr env)
  (type-case chai-expr input-expr
     ...
    [function (vars body) (closure-v env vars body)]))

Applying a function is quite a bit more work. In particular, we have two things we need to check before believing that the user knows how to write programs:

  • Is the first thing in a function application a function? (Does it evaluate to a closure?)
  • Does the number of provided arguments match the number of accepted arguments?

If either of these checks fails, we need to raise an error. Assuming both requirements are met, we can perform the actual function application as follows:

(let* ([interped-args (map (λ (arg) (interp arg env)) args)]
       [new-env (foldl add-binding closure-env vars interped-args)])
  (interp body new-env))

where args, vars, closure-env, and body are all the expected parts of our function application and closure. Note that we specifically augment the closure environment, and not the current environment, with the new bindings for the function arguments. This is the key to lexical scope, whereas dynamic scope would use the current environment.

With all of the case-checking, the branch in our interpreter is a bit longer:

(define (interp input-expr env)
  (type-case chai-expr input-expr
    ...
   [app (clo args)
         (let ([expected-clo (interp clo env)])
           (type-case chai-value expected-clo
              [closure-v (closure-env vars body)
                         (let ([arg-count (length args)]
                               [var-count (length vars)])
                           (when (not (eq? arg-count var-count))
                             (error 'interp "expected ~a args for a function application, but got ~a" var-count arg-count))
                           (let* ([interped-args (map (λ (arg) (interp arg env)) args)]
                                  [new-env (foldl add-binding closure-env vars interped-args)])
                             (interp body new-env)))]
              [else (error 'interp "invalid function in function application: ~a" (chai-expr->string expected-clo))]))]))

Try to follow the logic above. Even for the author, determining what someone else’s code does is a nontrivial process, so don’t worry if it’s overwhelming at first. Just remember that most of it is error-checking, and the key part is the three lines where we actually perform the function application. Of course, to get an even better understanding, try to run programs in our interactive interpreter which hit every error case, and compare that with the flow of logic in the code above.

Recursion, But Not Quite Easy Enough

The experienced reader might be asking herself “Why haven’t we implemented any loops yet? We can’t really do anything interesting without loops.” And she is correct! A programming language cannot be Turing-complete (informally, have the true power of a computer) if it can’t loop or recurse, that is, express programs which run indefinitely.

As of now, we couldn’t, say, write the function map (see our functional programming primer, A Taste of Racket, for a definition). If we try, we get a nasty error:

(with map (fun (f list)
            (if (end? list)
                end
                (pair (f (first list)) (map f (rest list)))))
  (map (fun (x) (+ 1 x)) (pair 7 (pair 6 (pair 5 end)))))

It complains that “map” is not bound in the current environment, which is a shame, because map is a really great function to have around, and we can’t build it yet (well, that’s not quite true, but it wouldn’t be as straightforward as what we have above).

It turns out that even though we can’t quite do this kind of recursion with names, we can still achieve infinite loops in Chai! We challenge the reader to figure out how to do it, and see what our interactive interpreter does in response. As a hint, the shortest solution the author has found is a mere 33 characters long, including spaces and parentheses, and it only requires functions and applications.

Some Further Issues to Address

We recall saying we would like the user to be able to redefine what “+” means via a with expression. However, upon trying to evaluate the following program which “adds” two functions, we get an error:

(with + (fun (f g) (fun (arg) (+ (f arg) (g arg))))
  ((+ (fun (x) (+ x 1)) (fun (y) (* y  2))) 7))

In order to fix this, we need to stop distinguishing between primitive operations and functions which are bound to symbols in our environment. In other words, we want to pre-populate the environment with bindings for primitive functions, which can be overwritten by the programmer. This will simplify our code significantly, since we can move even more of the logic outside of the interpreter and parser, and think of everything as a function application. This is something we have to look forward to in the future.

As long as we’re thinking of everything as a function application, why don’t we simplify Chai even further so that we only have functions and applications? No numbers, no primitives, no conditionals. What sort of language would we get? We’ll see just what happens with a little intermezzo next time. Of course, we will only do this temporarily to explore its theoretical properties, and then we’ll return to the civilized world.

Until then!

A Taste of Racket

or, How I Learned to Love Functional Programming

We recognize that not every reader has an appreciation for functional programming. Yet here on this blog, we’ve done most of our work in languages teeming with functional paradigms. It’s time for us to take a stand and shout from the digital mountaintops, “I love functional programming!” In fact, functional programming was part of this author’s inspiration for Math ∩ Programming.

And so, to help the reader discover the joys of functional programming, we present an introduction to programming in Racket, with a focus on why functional programming is amazing, and a functional solution to a problem on Project Euler. As usual, the entire source code for the examples in this post is available on this blog’s Github page.

Lists

Lists are the datatype of choice for functional programmers. In particular, a list is either the empty list or a pair of objects:

list = empty
     | (x list)

Here (x list) denotes a pair: the first thing in the pair, “x”, is any object, and the second thing is another list. In Racket, all lists have this form, and they are constructed with a special function called “cons”. For instance, the following program outputs the list containing 7 and 8, in that order.

(cons 7 (cons 8 empty))

The reader will soon get used to the syntax: every function application looks like (function arguments…), including the arithmetic operators.

This paradigm is familiar; in imperative (“normal”) programming, this is called a linked list, and is generally perceived as slower than lists based on array structures. We will see why shortly, but in general this is only true for some uncommon operations.

Of course, Racket has a shorthand for constructing lists, which doesn’t require one to write “cons” a hundred times:

(list 7 8)

gives us the same list as before, without the nested parentheses. Now, if we wanted to add a single element to the front of this list, “cons” is still the best tool for the job. If the variable “my-list” is bound to a list, we would call

(cons 6 my-list)

to add the new element. For lists of things which are just numbers, symbols, or strings , there is an additional shorthand, using the “quote” macro:

'(1 2 3 4 "hello" my-symbol!)

Note that Racket does not require that such lists be homogeneous, and the quote form automatically converts each element to the proper type for us.

To access the elements of a list, we only need two functions: first and rest. The “first” function accepts a list and returns the first element in it, while “rest” returns the tail of the list excluding the first element. This naturally fits with the “cons” structure of a list. For instance, if “my-list” is a variable containing (cons 8 (cons 9 empty)), then first and rest act as follows:

> (first my-list)
8
> (rest my-list)
'(9) 
> (first (rest my-list))
9

In particular, we can access any element of a list with sufficiently many calls to first and rest. But for most problems this is unnecessary. We are about to discover far more elegant methods for working with lists. This is where functional programming truly shines.

Double-List, Triple-List, and Sub-List

Suppose we want to take a list of numbers and double each number. If we just have what we know now about lists, we can write a function to do this. The general function definition looks like:

(define (function-name arg1 arg2 ...)
  body-expression)

To be completely clear, the return value of a Racket function is whatever the body expression evaluates to, and we are allowed to recursively call the function. Indeed, this is the only way we will ever loop in Racket (although it has some looping constructs, we frown upon their use).

And so the definition for doubling a list is naturally:

(define (double-list my-list)
  (if (empty? my-list) empty
      (cons (* 2 (first my-list))
            (double-list (rest my-list)))))

In words, we have two cases: if my-list is empty, then there is nothing to double, so we return a new empty list (well, all empty lists are equal, so it’s the same empty list). This is often referred to as the “base case.” If my-list is nonempty, we construct a new list by doubling the first element, and then recursively  doubling the remaining list. Eventually this algorithm will hit the end of the list, and ultimately it will return a new list with each element of my-list doubled.

Of course, the mathematicians will immediately recognize induction at work. If the program handles the base case and the induction step correctly, then it will correctly operate on a list of any length!

Indeed, we may test double-list:

> (double-list empty)
'()
> (double-list '(1 2 3 5))
'(2 4 6 10)

And we are confident that it works. Now say we wanted to instead triple the elements of the list. We could rewrite this function with but a single change:

(define (triple-list my-list)
  (if (empty? my-list) empty
      (cons (* 3 (first my-list))
            (triple-list (rest my-list)))))

It’s painfully obvious that coding up two such functions is a waste of time. In fact, at 136 characters, I’m repeating more than 93% of the code (eight characters changing “doub” to “trip” and one character changing 2 to 3). What a waste! We should instead refactor this code to accept a new argument: the number we want to multiply by.

(define (multiply-list my-list n)
  (if (empty? my-list) empty
      (cons (* n (first my-list))
            (multiply-list (rest my-list) n))))

This is much better, but now instead of multiplying the elements of our list by some fixed number, we want to subtract 7 from each element (arbitrary, I know, but we’re going somewhere). Now I have to write a whole new function to subtract things!

(define (sub-list my-list n)
  (if (empty? my-list) empty
      (cons (- (first my-list) n)
            (sub-list (rest my-list) n))))

Of course, we have the insight to make it generic and accept any subtraction argument, but this is the problem we tried to avoid by writing multiply-list! We obviously need to step things up a notch.

Map

In all of this work above, we’ve only been changing one thing: the operation applied to each element of the list. Let’s create a new function, which accepts as input a list and a function which operates on each element of the list. This special operation is called map, and it is only possible to make because Racket treats functions like any other kind of value (they can be passed as arguments to functions, and returned as values; functions are first class objects).

The implementation of map should look very familiar by now:

(define (map f my-list)
  (if (empty? my-list) empty
      (cons (f (first my-list))
            (map f (rest my-list)))))

In particular, we may now define all of our old functions in terms of map! For instance,

(define (double x) (* 2 x))
(define (double-list2 my-list) (map double my-list))

Or, even better, we may take advantage of Racket’s anonymous functions, which are also called “lambda expressions,” to implement the body of double-list in a single line:

(map (λ (x) (* 2 x)) my-list)

Here the λ character has special binding in the Dr. Racket programming environment (Alt-\), and one can alternatively write the string “lambda” in its place.

With map we have opened quite a large door. Given any function at all, we may extend that function to operate on a list of values using map. Consider the imperative equivalent:

for (i = 0; i < list.length; i++):
   list.set(i, f(list.get(i)))

Every time we want to loop over a list, we have to deal with all of this indexing nonsense (not to mention the extra code needed to make a new list if we don’t want to mutate the existing list, and that iterator shortcuts prohibit mutation). And the Racket haters will have to concede, the imperative method has just as many parentheses 🙂

Of course, we must note that map always creates a new list. In fact, in languages that are so-called “purely” functional, it is impossible to change the value of a variable (i.e., there is no such thing as mutation). The advantages and disadvantages of this approach are beyond the scope of this post, but we will likely cover them in the future.

Fold and Filter

Of course, map is just one kind of loop we might want. For instance, what if we wanted to sum up all of the numbers in a list, or pick out the positive ones? This is where fold and filter come into play.

Fold is a function which reduces a list to a single value. To do this, it accepts a list, an initial value, and a function which accepts precisely two arguments and outputs a single value. It then uses this to combine the elements of a list. Its implementation is not that different from map:

(define (fold f val my-list)
  (if (empty? my-list) val
      (fold f
            (f val (first my-list))
            (rest my-list))))

Here the base case is to simply return “val”, while the induction step is to combine “val” with the first element of the list using “f”, and then recursively apply fold to the remainder of the list. Now we may implement our desired summing function as

(define (sum my-list) (fold + 0 my-list))

And similarly, make a multiplying function:

(define (prod my-list) (fold * 1 my-list))

Notice now that we’ve extracted the essential pieces of the problem: what operation to apply, and what the base value is. In fact, this is the only relevant information to the summing and multiplying functions. In other words, we couldn’t possibly make our code any simpler!

Finally, filter is a function which selects specific values from a list. It accepts a selection function, which accepts one argument and returns true or false, and the list to select from. Its implementation is again straightforward induction:

(define (filter select? my-list)
  (if (empty? my-list) empty
      (let ([elt (first my-list)]
            [the-rest (rest my-list)])
        (if (select? elt)
            (cons elt (filter select? the-rest))
            (filter select? the-rest)))))

To avoid superfluous calls to “first” and “rest”, we use Racket’s “let” form to bind some variables. Logically, the base case is again to return an empty list, while the inductive step depends on the result of applying “select?” to the first element in our list. If the result is true, we include it in the resulting list, recursively calling filter to look for other elements. If the result is false, we simply skip it, recursively calling filter to continue our search.

Now, to find the positive numbers in a list, we may simply use filter:

(filter (λ (x) (> x 0)) my-list)

Again, the only relevant pieces of this algorithm are the selection function and the list to search through.

There is one important variant of fold: the combining function may care about the order in which it’s applied to the elements of the list, so we might require that folding begin at the end of a list instead of the beginning. For this reason, fold functions are usually distinguished as left- or right-folds. Of course, Racket has map, fold, and filter built in, but the fold functions are named “foldl” and “foldr”. We have implemented foldl, and we leave foldr as an exercise to the reader.

A Brighter, More Productive World

Any loop we ever want to implement can be done with the appropriate calls to map, fold, and filter. We will illustrate this by solving a Project Euler problem, specifically problem 67. For those too lazy to click a link, the problem is to find the maximal sum of paths from the apex to the base of a triangular grid of numbers. For example, consider the following triangle:

   3
  7 4
 2 4 6
8 5 9 3

Here the maximal path is 3,7,4,9, whose sum is 23. In the problem, we are provided with a file containing a triangle with 100 rows ($ 2^{99}$ paths!) and are asked to find the maximal path.

First, we recognize a trick. Suppose that by travelling the optimal route in the triangle above, we arrived in the second to last row at the number 2. Then we would know precisely how to continue: simply choose the larger of the two next values. We may reduce this triangle to the following:

      3
   7     4
 10   13   15

where we replace the second-to-last row with the sum of the entries and the larger of the two possible subsequent steps. Now, performing the reduction again on this reduced triangle, we get

   3
20   19

And performing the reduction one last time, we arrive at our final, maximal answer of 23.

All we need to do now is translate this into a sequence of maps, folds, and filters.

First, we need to be able to select the “maximum of pairs” of a given row. To do this, we convert a row into a list of successive pairs. Considering the intended audience, this is a rather complicated fold operation, and certainly the hardest part of the whole problem. We will let the reader investigate the code to understand it.

;; row->pairs: list of numbers -> list of successive pairs of numbers
(define (row->pairs row)
  (if (equal? (length row) 1)
      (list row)
      (let ([first-pair (list (list (first row) (second row)))]
            [remaining-row (rest (rest row))]
            [fold-function (λ (so-far next)
                             (let ([prev-pair (first so-far)])
                               (cons (list (second prev-pair) next) so-far)))])
        (reverse (fold fold-function first-pair remaining-row)))))

All we will say is that we need to change the base case so that it is the first pair we want, and then apply the fold to the remaining elements of the row. This turns a list like ‘(1 2 3 4) into ‘((1 2) (2 3) (3 4)).

Next, we need to be able to determine which of these pairs in a given row are maximal. This is a simple map, which we extend to work not just with pairs, but with lists of any size:

;; maxes: list of lists of numbers -> list of maxes of each list
(define (maxes pairs)
  (map (λ (lst) (apply max lst)) pairs))

Next, we combine the two operations into a “reduce” function:

;; reduce: combine maxes with row->pairs
(define (reduce row) (maxes (row-&gt;pairs row)))

Finally, we have the bulk of the algorithm. We need a helper “zip” function:

;; zip: list of lists -> list
;; turn something like '((1 2 3) (5 6 7)) into '((1 5) (2 6) (3 7))
(define (zip lists)
  (if (empty? (first lists)) empty
      (cons (map first lists)
            (zip (map rest lists))))) 

and the function which computes the maximal path, which is just a nice fold. The main bit of logic is the lambda we pass to fold.

;; max-path: list of rows -> number
;; takes the upside-down triangle and returns the max path
(define (max-path triangle)
  (fold (λ (cumulative-maxes new-row)
          (reduce (map sum (zip (list new-row cumulative-maxes)))))
        (make-list (length (first triangle)) 0)
        triangle))

In particular, given the previously computed maxima, and a new row, we combine the two rows by adding the two lists together element-wise (mapping sum over a zipped list), and then we reduce the result. The initial value provided to fold is an appropriately-sized list of zeros, and the rest falls through.

With an extra bit of code to read in the big input file, we compute the answer to be 7273, and eat a cookie.
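
For the curious, here is a hedged sketch of that extra bit of glue. The file name and whitespace-separated format are assumptions based on the problem statement, and recall that max-path wants the triangle upside-down, longest row first.

;; read-triangle: filename -> list of rows of numbers
(define (read-triangle filename)
  (map (λ (line) (map string->number (string-split line)))
       (file->lines filename)))

;; (max-path (reverse (read-triangle "triangle.txt")))  ; => 7273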

Of course, we split this problem up into much smaller pieces just to show how map and fold are used. Functions like zip are usually built in to languages (though I haven’t found an analogue in Racket through the docs), and the maxes function would likely not be separated from the rest. The point is: the code is short without sacrificing modularity, readability, or extensibility. In fact, most of the algorithm we came up with translates directly to code! If we were to try the same thing in an imperative language, we would likely store everything in an array with pointers floating around, and our heads would spin with index shenanigans. Yet none of that has anything to do with the actual algorithm!

And that is why functional programming is beautiful.

As usual, the entire source code for the examples in this post is available on this blog’s Github page.

Until next time!

Addendum: Consider, if you will, the following solutions from others who solved the same problem on Project Euler:

C/C++:

#define numlines 100

int main()
{

  	int** lines = new int*[numlines];
  	int linelen[numlines];

  	for(int i=0; i<numlines; i++) linelen[i] = i+1;

   	ifstream in_triangle("triangle.txt");

   	// read in the textfile
   	for (int i=0;i<numlines;i++)
   	{
   		lines[i] = new int[i+1];
   		linelen[i] = i+1;
   		for (int j=0; j<i+1; j++)
   		{
       		in_triangle>>lines[i][j];
       	}
       	in_triangle.ignore(1,'\n');
   	}

   	int routes1[numlines];
   	int routes2[numlines];

   	for (int i=0;i<numlines;i++) routes1[i] = lines[numlines-1][i];

   	//find the best way
   	for (int i=numlines-2;i>=0;i--)
   	{
     	for(int j=0;j<i+1;j++)
   		{
   			if(routes1[j] > routes1[j+1])
   			{
      			routes2[j] = routes1[j] + lines[i][j];
            } else
            {
            	routes2[j] = routes1[j+1] + lines[i][j];
            }
     	}
     	for (int i=0;i<numlines;i++) routes1[i] = routes2[i];
   	}
   	cout<<"the sum is: "<<routes1[0]<<endl;
}

PHP:

<?php
 $cont = file_get_contents("triangle.txt");
 $lines = explode("\n",$cont);
 $bRow = explode(" ",$lines[count($lines)-1]);
 for($i=count($lines)-1; $i>0; $i--)
 {
   $tRow = explode(" ",$lines[$i-1]);
   for($j=0; $j<count($tRow); $j++)
   {
      if($bRow[$j] > $bRow[$j+1])
         $tRow[$j] += $bRow[$j];
      else
         $tRow[$j] += $bRow[$j+1];
   }
   if($i==1)
    echo $tRow[0];
   $bRow = $tRow;
 }
?>

J (We admit to having no idea what is going on in programs written in J):

{. (+ 0: ,~ 2: >./\ ])/ m

Python:

import sys

l = [[0] + [int(x) for x in line.split()] + [0] for line in sys.stdin]

for i in range(1, len(l)):
    for j in range(1, i+2):
        l[i][j] += max(l[i-1][j-1], l[i-1][j])
print max(l[-1])

Haskell:

module Main where
import IO

main = do
  triangle <- openFile "triangle.txt" ReadMode
              >>= hGetContents
  print . foldr1 reduce .
      map ((map read) . words) . lines $ triangle
    where reduce a b = zipWith (+) a (zipWith max b (tail b)) 

Ah, foldr! map! zip! Haskell is clearly functional 🙂 Now there is a lot more going on here (currying, function composition, IO monads) that is far beyond the scope of this post, and admittedly it could be written more clearly, but the algorithm is essentially the same as what we have.

Chai – Environments and Variables

As usual, we encourage the reader to follow along with our online interpreter for this version of Chai. In addition, the source code for this version of Chai is available on this blog’s Google Code page. So go ahead: experiment, discover, and play!

The Dr. Racket programming environment (not to scale, color, font, or positioning) 🙂

Variables, Variables, Variables!

A land without variables would be quite ugly. Runtimes would be sluggish, superfluous arithmetic expressions would litter every corner, and the architects (we coders) would be quite melancholy. Want to reuse a value or function? Forget about it. Want to inspect a randomly generated number more than once? Yeah right. Want to have code that someone else can actually read? Dream on.

Luckily, we’re on an exodus to a better place. A land of loops, first-class functions, and all sorts of other neat language forms. Let’s begin by implementing variables and variable references.

Substitution, Schmubstitution

The first naive approach to variable references would go something like this: whenever the programmer binds a variable, say “x” is bound to 7, the interpreter goes ahead and replaces every occurrence of the symbol “x” with the value 7. So a (pseudocode) program like

bind x to 7
bind y to 8
x + y

would first evaluate to

bind y to 8
7 + y

and then to

7 + 8

and finally result in 15. Of course, this is fine for simple arithmetic expressions, but what if we want to write a program like

bind x to 1
print x
bind x to (x + 1)
print x
...

This kind of thing happens all the time in (procedural) programs, specifically when we're looping over a counting variable. With our current substitution method, this program would (loosely speaking) evaluate to something counter-intuitive:

print 1
bind x to (1 + 1)
print 1
...

This is obviously not what the programmer has in mind. So what we really want with a variable is to capture the current value of a variable at the time it is referenced. In other words, we want to keep track of the scope of our variables. Plainly speaking, a variable’s scope is the part of the program where its values may be accessed. Usually, and for the remainder of this post, such “parts of programs” will be nested expressions. Note that we will go into more depth about scope in future posts, in particular the difference between dynamic scope and lexical scope, but for now the difference is moot, because our programs consist of but a single expression.

Of course there are ways to avoid conflicting substitutions, but they are more work than the solution we will arrive at anyway. So let's skip the banal direct substitution method and go straight to a more civilized discussion.

Environments

Environments will be our alternative to substitution. An environment is simply a mapping from variable names to values which is updated as a program runs and new variable bindings are introduced. In other words, as our program evaluates it has a new piece of data floating around with it. Returning to our problematic program above, it might evaluate as follows:

bind x to 1
print x
bind x to (x + 1)
print x

In the first step, we add “x” to the environment and give it a value of one:

[environment: x -> 1]
print x
bind x to (x+1)
print x

When the print statement is reached, the interpreter looks up the appropriate value of $ x$ in the environment, and uses that value. After the subsequent bind statement, the evaluation looks like

[environment: x -> 2, x -> 1]
print x

In the environment, we simply use the first binding of $ x$ as our replacement value. The astute reader might ask: “Why bother keeping the first binding of $ x$ around at all?” This is a reasonable question in the simple pseudocode language we have here, but once we implement our syntax form for Chai, we will find virtue in this peculiar decision. In general, the argument in favor of environments will become stronger and stronger as we add more features to Chai. So without further ado, let us design the syntax form for variable bindings and references.

New Syntax: with and ref

Variable references are easy. We simply add a new language form that consists of any valid identifier (here, a symbol as judged by Racket). Our EBNF grammar now includes the line:

expr = ...
     | variable-reference

where a variable reference is a string of characters which is recognized as an identifier by Racket. Note that for our purposes we don’t actually care about these kinds of details, and the Racket specification is fairly liberal with identifiers. This means we can use all of the nice descriptive variable names like “number->string”. The specification is more explicit when we implement the additional line of code in our AST:

[ref (id symbol?)]

For binding new variables, we will make variable scope explicit. In particular, we want the programmer to be responsible for telling the interpreter where variables are in scope. This extra burden on the programmer is not so bad. In most languages, the scope of a variable is determined by a set of rules, and here we just trade memorization for explicitness and ease of writing the interpreter. So now we introduce the “with” form, which binds a single variable and describes its scope. Its EBNF:

expr = ...
     | (with variable-name expr expr)

The first sub-expression corresponds to the value of the new variable, and the second sub-expression is the "body" of the with. In particular, the result of the "with" expression is the result of the body expression, after adding the new binding. In our interpreter, the body will be the only place where references to this particular variable are valid. For example,

(with x (+ 3 4) (* x x))

evaluates to 49, but the following program, while well-formed, might sensibly raise an error upon evaluation:

(with x 2 (with y 3 z))

Since “z” is a free variable. Here the environment only contains “x” and “y”, and “z” is unbound. Another language designer (say, a designer of Mathematica) might reasonably wish to run programs with free variables evaluating trivially to themselves. Unfortunately, doing this right requires a lot of overhead which we are simply not equipped to handle yet. To a mathematician, for instance, the program “(+ 1 (+ 2 z))” should simplify to “(+ 3 z)”, but our primitive expressions are not set up to allow for simplification. There’s a reason Mathematica is so expensive! So we will take the easy route and raise an error whenever we encounter an unbound variable.

In Racket, our abstract syntax tree now includes the line:

[with (var symbol?) (bound-expr chai-expr?) (body chai-expr?)]

Note that as our language becomes more complex, we can make these language forms more convenient for the programmer. For instance, we may want to bind any number of expressions in a single “with,” instead of nesting an expression for each (way too much typing). This will come in due time. But first, let’s translate all of our ideas into code.

The Parser

Our entire abstract syntax tree now looks like this:

;; a chai expression, as defined in our blog post
(define-type chai-expr
  [num (val number?)]
  [bool (val boolean?)]
  [prim (id symbol?) (args list?)]
  [ref (id symbol?)]
  [with (var symbol?) (bound-expr chai-expr?) (body chai-expr?)])

Implementing this in our parser is straightforward, but we have some structural changes from last time:

;; parse:unsupported: s-expr -> error
;; raise an error for unsupported language forms
(define (parse:unsupported input-expr)
  (error (string-append "parse: unsupported language form: "
                        (s-expr->string input-expr))))

;; parse-expr: s-expr -> chai-expr
;; parse an s-expression into a chai expression, or
;;  throw an error if it is not well-formed
(define (parse-expr input-expr)
  (cond [(number? input-expr) (num input-expr)]
        [(eq? 'true input-expr) (bool #t)]
        [(eq? 'false input-expr) (bool #f)]
        [(symbol? input-expr) (ref input-expr)]
        [(list? input-expr)
         (let ([first-sym (first input-expr)]
               [list-len (length input-expr)])
           (cond [(primitive-symbol? first-sym)
                  (prim first-sym (map parse-expr (rest input-expr)))]
                 [(and (equal? 'with first-sym) (equal? list-len 4))
                  (let ([id (second input-expr)]
                        [bound-expr (parse-expr (third input-expr))]
                        [body-expr (parse-expr (fourth input-expr))])
                    (if (symbol? id) (with id bound-expr body-expr)
                        (error 'parse "bad variable name: ~a" id)))]
                 [else (parse:unsupported input-expr)]))]
        [else (parse:unsupported input-expr)]))

The new pieces here are the symbol? case (for variable references) and the with branch. Since we tire of writing (list? input-expr) and (first input-expr), etc., we extract those pieces first (anticipating we may need them for future language forms), and then do our usual check for a primitive operation. Finally, we recursively parse the "third" and "fourth" expressions, the binding expression and body expression, respectively, and pack them into a "with" type, raising an error if the variable we're trying to bind is not a symbol. For instance, we certainly wouldn't want to allow the user to rebind things like numbers or parentheses. Though that's not actually possible with this implementation, we should sensibly signal an error in the parser, since such an attempt is not well-formed.
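For instance, here are the sorts of checks we run (a sketch; it assumes + and * are among the primitive symbols from before):

(parse-expr '(with x (+ 3 4) (* x x)))
;; => (with 'x (prim '+ (list (num 3) (num 4)))
;;             (prim '* (list (ref 'x) (ref 'x))))

(parse-expr '(with 5 1 2))
;; => parse: bad variable name: 5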

With the relevant, added test cases, we see our parser is correct, and we turn to the interpreter.

The Interpreter

Now that we need to carry around an environment, we’d like to change our interpreter’s signature to:

;; interp: chai-expr environment -> chai-expr
;; interpret a chai-expression
(define (interp input-expr environment)
   ... )

This raises the obvious question: how are we going to represent an environment? There are many ways, but in particular we don’t want to bind ourselves to a particular implementation (no pun intended). So before we get to the interpreter, let’s develop a “chai-environment.rkt” module which provides an interface for manipulating environments.

This interface should provide functions to create a new set of bindings, add a binding to a list of bindings, and lookup a binding by its identifier. This corresponds neatly to three functions:

;; new-environment: -> environment
;; create a new, empty environment
(define (new-environment) ... )

;; add-binding: environment symbol any -> environment
;; add a new binding to the environment
(define (add-binding old-env id val) ... )

;; lookup-binding: environment id -> any
;; lookup a binding, raising an error if the requested binding is not found.
(define (lookup-binding environment sought-id) ... )

We will again use define-type to create a data-type for bindings, and type-case to access them.

;; the bindings type, a linked list of binding values
(define-type environment
  [binding (id symbol?) (value any/c) (rest environment?)]
  [empty-env])

So an “environment” is just the head of a list, containing both a key/value pair, and a list of successive bindings. Implementing the above functions is now easy. For the first two, we just have

;; new-environment: -> environment
;; create a new, empty environment
(define (new-environment)
  (empty-env))

;; add-binding: environment symbol any -> environment
;; add a new binding to the environment
(define (add-binding old-env id val)
  (binding id val old-env))
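For example, binding x to 2 and then y to 3 builds a two-element chain (shown here structurally):

(add-binding (add-binding (new-environment) 'x 2) 'y 3)
;; => (binding 'y 3 (binding 'x 2 (empty-env)))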

We just invoke the appropriate constructor, and in the future if we decide to change our environment structure (say, look things up with hash-tables), the rest of our code is unaffected by the change.

Finally, the look-up function actually requires a little bit of logic:

;; lookup-binding: environment id -> any
;; lookup a binding, throwing an error if the requested
;;  binding is not found.
(define (lookup-binding env sought-id)
  (type-case environment env
    [empty-env () (error 'lookup-binding
                         "id ~a is unbound in the environment"
                         sought-id)]
    [binding (id val rest-env) (if (eq? id sought-id) val
                               (lookup-binding rest-env sought-id))]))

In particular, we recursively search through the list of bindings, comparing each id to the sought-id, and returning the associated value if it is found. Otherwise, we raise an error.
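As an aside, nothing outside this module should care how environments are represented. If we later swapped in hash tables, only these three functions would change; a hypothetical sketch (not Chai's code) might look like:

;; a hypothetical hash-table-backed environment (a sketch, not Chai's code)
(define (new-environment) (hash))

(define (add-binding old-env id val)
  (hash-set old-env id val))  ;; an immutable hash; returns a new environment

(define (lookup-binding env sought-id)
  (hash-ref env sought-id
            (λ () (error 'lookup-binding
                         "id ~a is unbound in the environment"
                         sought-id))))

Since lookup-binding only ever consults the most recent binding of an identifier, overwriting the old entry in the hash gives the same observable behavior as the linked list.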

Bringing this back to our interpreter, we need to alter our interp function to accept a new parameter:

;; interp: chai-expr environment -> chai-expr
;; interpret a chai-expression
(define (interp input-expr env) ... )

Now, when we see a variable reference, we just use our lookup function to extract the correct value. In the type-case, this adds the new line:

[ref (id) (lookup-binding env id)]

And finally, the with case looks like

[with (var bound-expr body-expr)
      (let ([value (interp bound-expr env)])
        (interp body-expr (add-binding env var value)))]

where we recursively interpret the bound expression, and then interpret the body expression in an augmented environment containing the new binding.

So, our entire interp function is as follows:

;; interp: chai-expr environment -> chai-expr
;; interpret a chai-expression
(define (interp input-expr env)
  (type-case chai-expr input-expr
    [num (val) val]
    [bool (val) val]
    [ref (id) (lookup-binding env id)]
    [with (var bound-expr body-expr)
          (let ([value (interp bound-expr env)])
            (interp body-expr (add-binding env var value)))]
    [prim (id args)
          (let* ([interpreted-args (map (λ (arg) (interp arg env)) args)]
                 [operation (type-check/get-op id interpreted-args)])
            (operation interpreted-args))]))

Note that in order to maintain our other language forms (in particular, prim), we need to alter our recursive calls to interp to pass along the environment appropriately.
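As a quick sanity check (the expected error text comes from lookup-binding above):

(interp (parse-expr '(with x (+ 3 4) (* x x))) (new-environment))
;; => 49

(interp (parse-expr '(with x 2 (with y 3 z))) (new-environment))
;; => lookup-binding: id z is unbound in the environment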

And that’s all there is to it! As usual, we have a host of test cases which prove the correctness of the program, and this code is available on this blog’s Google Code page. And, of course, the reader is invited to evaluate programs at this blog’s interactive online interpreter. Happy coding!

Observations, and a Peek into the Future

Before we close, we note some interesting features of Chai so far. First, we can do some weird things with variables. In particular, while we can’t yet overwrite bindings of primitive operations, we can use primitive symbols as variable names. Here is a bizarre program that actually runs:

(with + 7 (+ + +))

Technically, we haven’t “overwritten” the plus primitive operator, since it still functions as we expect it to. In the future, we may experiment with allowing the programmer to locally overshadow primitive operations with their own functions (if only just for fun).

Second, we recognize a very important choice we made in designing the with form. Specifically, in the interpreter, we first interpret the bound expression, and then bind it to the variable. The astute reader might ask, why bother? Why don’t we just leave the expression unevaluated until something references it? For instance, this would allow us to write programs like:

(with x (/ 1 0) (+ 2 3))

Which would run without error. It turns out that this choice has a name! It’s called “laziness,” and it’s used as the default in some significant programming languages, including Haskell. Haskell has a number of other very interesting features, especially pertaining to the study of logic and semantics. For instance, the Haskell compiler refuses to run a program unless it can prove that the program outputs the correct type at each step, and hence guarantee that the program cannot perform certain misbehaviors. We find this all quite fascinating, and add it to our lengthy list of Wonderful Things to Learn and Implement.
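For the curious, a call-by-name flavor of laziness would only take a small hypothetical change to two branches of the type-case. This is not what Chai does, and it glosses over the scope questions we deferred earlier, since the stored expression would be evaluated in whatever environment is live at the reference site:

;; a hypothetical lazy (call-by-name) variant; not Chai's behavior
;; store the unevaluated chai-expr in the environment...
[with (var bound-expr body-expr)
      (interp body-expr (add-binding env var bound-expr))]

;; ...and only interpret it when (and each time) it is referenced
[ref (id) (interp (lookup-binding env id) env)]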

Until then!