The Journey Begins
We want to design a programming language!
That is certainly a lofty ambition, but one might ask, why bother? There are programming languages already out there that are way better than anything one mathematician-slash-programmer could do. Surprisingly, the answer is not so that we can have a programming language! Instead, we follow our inner mathematicians; we wish to study the structure of programming languages.
By doing so, we will learn how and why programming languages work, and we might even get some experience writing programs along the way. For the non-programmers out there, don’t be afraid. The process of designing a language is gradual, and as a result we’ll repeatedly practice the most basic features of our language, and then slowly add more features until we have a lean mean number-crunching machine. If one has a bit of programming knowledge already, then the journey through this series of posts should provide a good intuition for other programming languages, helping one determine the right questions to ask when a program runs amok. For the mathematicians, this series will stray a bit from our theoretical comfort zones, but we will use mathematical tools from computing theory to describe and analyze languages and programs. So this series should have a little something for everyone.
A Rough Outline
In order to study programming languages we need to study the issues which are common to all languages, and then decide how to customize those to fit our needs. At the highest level, a programming language has two parts: syntax and semantics. The logicians out there are already familiar with such words, but we will define them here.
A language’s syntax is the set of rules for producing well-formed programs. Syntaxes show up everywhere. In the language of arithmetic, the expression “1+” is not well-formed, because we broke the syntactical rule of the addition operator having a number before and after it. In the language of currency, one cannot lend a friend “5$2.01” dollars, because we all know the dollar-sign goes at the beginning (or end) of a number. For programs, our syntax will consist of rules for arithmetic, binding variables, constructing functions, and a number of other constructions with names we will learn later. Every form of communication has a syntax, and so forming syntactically correct statements is a prerequisite for conveying their meaning.
Second, a language’s semantics are the rules for evaluating well-formed programs. For us, this will include the precedence rules of arithmetic (and other) operations, variable substitution methods, scope, types, and many other issues. The study of semantics is mathematically the richest area of programming language theory. Researchers define and compare semantic models, and describe their relationships with underlying mathematical concepts, such as categories or models (both very rich and popular contemporary fields). For the majority of this series, we will stay away from such a rigorous mathematical analysis, mainly because the author is not familiar enough with these topics (yet). In any case, we will spend most of our time worrying about how to implement the correct semantic rules in our language.
And that raises the obvious question: how do we “make” a programming language do anything? The answer is (of course) we write a program! Specifically, we will design our new language, which we designate as the source language, by writing a program to interpret it. This program, which we henceforth dub an interpreter, will be written in a language we are familiar with, which we designate as the implementation language. In other words, our programming language will be completely determined by what the interpreter does to well-formed programs. If we design it correctly, then this will coincide with our theoretical idea of what it should do.
Note that writing an interpreter is one of two possible approaches. We could instead write a program called a compiler that translates our source language into machine code, which is specific to a physical computer. After compiling, the user may then ditch the compiler, and use the compiled code as she wishes (with our approach, she needs the interpreter every time she wants to run the program). Compilers have a number of other, more technical problems associated with them (see our post on Register Allocation [upcoming]), but it has the advantage of producing programs which run quickly. Since we are only interested in studying how languages are built, and not in running fast programs, we can live without a compiler and forget its issues. In the future, we may do another series on designing a compiler, during which we focus on the special challenges faced down that deeper rabbit hole.
So as we start to design our programming language, we will follow a recipe for each new feature:
- Theoretically:
- Add new syntax forms
- Decide how those forms should be interpreted (determine their semantics)
- In code:
- Accept a program in textual form
- Translate the textual form into an intermediate representation, while checking that it is well-formed (parse the input)
- Interpret the intermediate representation (run the program)
- Output the results to the user
As we write these interpreters, we will make the source code available on this blog’s Google Code page. However, as a special treat, we will also set up a web server for the reader to practice writing programs! In fact, this web server is already up and running. Later, that page will have links to forms where the user can type in a program, hit “submit,” and the server will run the program and report the result. This will save any non-programmers the trouble of downloading and installing Racket (our implementation language) and running the interpreter from its source. (Of course, the user is always welcome to do so. Racket is a wonderful language to use!)
We Almost Forgot the Most Important Part!
Before we do anything else, we need a name for our language. The names of popular programming languages are generally short, slick, and easy to pronounce. While the Java programming language started a coffee theme which would also be appropriate for mathematicians, this author loves tea, and we find it sad that no popular languages are named after tea. Hence, we name our new language “Chai.”
To sum our discussion up, we are about to embark on a mission: we will write a Chai interpreter in Racket, and learn about programming languages along the way. Next time we kick things off with the simplest version of Chai, one that exclusively features arithmetic. Until then!