|Making Hybrid Images||Neural Networks
|Bezier Curves and Picasso||Computing Homology||Probably Approximately
Correct – A Formal Theory
In my book I discuss the importance of context in reading and writing mathematics. An early step in becoming comfortable with math is deciphering the syntax of mathematical expressions. Another is in connecting the symbols to their semantic meanings. Embedded in these is the subproblem of knowing what to call the commonly used symbols. The more abstract you go, the more exotic the symbols tend to get.
Wikipedia has an excellent list for deciphering those symbols that have a typically well-understood meaning, like and . There is another list for common associations of Greek letters in science and math, along with the corresponding English/Latin list. There’s also a great website called Detexify that guesses the name of a symbol from your drawing. It’s a great way to lookup a confusing symbol.
To augment these resources, I’ll describe a few context-clues I’ve picked up over the years, and my first-instinct contextual association of each Greek letter. I wonder if there should be a database of such contextual associations.
Variables represent a word with a related starting letter or sound. E.g., for “function,” or for “time.” Greek letters do this too. For example, (lower case pi) is the Greek “p”, and it might be used for a “projection” function. Or $\latex Gamma$ (capital gamma, the Greek “G”) for a “graph.” This can help, for example, when trying to determine the type of the variable . In many cases you can quickly deduce it’s a vector.
Capital letters and lower case letters are usually related. For example, might be a member of the set . A function might be constructed from another function in such a way that all the information in is preserved, but is somehow “bigger” (e.g., the relationship between the probability density function and the cumulative density function). For this reason, it can help to know greek counterparts, e.g., that is the Greek lower case of $\latex Sigma$.
Adjacent letters are often related, both withinin and across alphabets. The variables are often used together to represent different parts of the same object, while letters like are used to represent a semantically different part of the object. For example, carries a strong association that are fixed constants and are unknown variables. Greek does this too. A triangle might have its side-lengths as , and for each side length the opposite angle to that side gets the corresponding first three Greek letters .
Fonts can imply semantics. The blackboard-bold font represents systems of numbers, as in . The lower-case fraktur font represents ideals in ring theory, particularly prime ideals, like . Calligraphic fonts like are used for higher-order structures after the context of lower-case and capital letters are already set, like categories (calligraphic) of sets (capital) of elements (lower case). I have seen some cases where sans-serif fonts are used in this role when calligraphic fonts are taken.
Greek alphabet first impulse associations
These are bound to be biased and incomplete. I’m interested to hear your associations. Which symbols always seem to be used in the same way, or come attached with the same loose association?
- (lower case alpha) – a generic variable; an angle of a triangle
- (lower case beta) – a generic variable; a different angle of a triangle
- (capital gamma) – the gamma function; the graph of a function as a set of pairs; a graph in the sense of graph theory
- (lower case gamma) – a closed curve (e.g., integrating over the real or complex plane)
- (capital delta) – a change; the max degree of a graph
- (lower case delta) – a change; a small positive value that you may choose; an impulse
- (lower case epsilon) – a small arbitrary positive real number; an error rate
- (lower case zeta) – the Riemann zeta function; a complex variable (like z)
- (lower case eta) – an error rate; a step size in an algorithm
- (capital theta) – an angle; asymptotic equivalence
- (lower case theta) – an angle
- (lower case iota) – the inclusion function; an injective function or embedding
- (lower case kappa) – curvature; connectivity; conditioning
- (capital lambda) – matrix of eigenvalues
- (lower case lambda) – an eigenvalue of a matrix
- (lower case mu) – a measure; a mean;
- (lower case nu) – None. I hate this letter because I think it’s hard to draw and it’s too close to v.
- (capital xi, like “ksee”) – None. It’s a weird letter that only a math troll would love. Too close to and conflicts with drawing bars on top of variables.
- (lower case xi) – a complex variable; I often draw it like a lightning bolt with a hat.
- (capital pi) – a product;
- (lower case pi) – the constant~3.14; a projection
- – projection; a rate; a rotation; a correlation
- (upper case sigma) – sum; an alphabet of symbols; covariance matrix
- (lower case sigma) – standard deviation; a symmetry; a reflection; a sign +/-1.
- (lower case tau) – a symmetry; a translation; if I really want to use but it would be confusing, so I draw it standing like a bird holding up one of its legs.
- (capital upsilon) – ??
- (lower case upsilon) – Too close to “u,v” not used.
- (capital phi, like “fai”, though many pronounce “fee”) – a general function; a potential function
- or (lower case phi) – a function/morphism; golden ratio; totient function
- (lower case chi, like “kai”) – a character; a statistic; the indicator function of an event; Euler characteristic
- (capital psi, like “sai”) – a function/morphism;
- (lower case psi) – a function/morphism
- (capital omega) – a lower bound
- (lower case omega) – a lower bound; a complex cube root of 1
This essay is a slightly modified version of the closing chapter of A Programmer’s Introduction to Mathematics.
We are no longer constrained by pencil and paper. The symbolic shuffle should no longer be taken for granted as the fundamental mechanism for understanding quantity and change. Math needs a new interface.
–Bret Victor, “Kill Math”
Math is a human activity. It’s messy and beautiful, complicated and elegant, useful and bull-headedly frustrating. But in reading this book, A Programmer’s Introduction to Mathematics, my dream is that you have found the attitude, confidence, and enough prerequisite knowledge to continue to engage with mathematics beyond these pages. I hope that you will find the same joy that I have in the combination of math and programming.
In these closing words, I’d like to explore a vision for how mathematics and software can grow together. Much of our effort in this book involved understanding notation, and using our imagination to picture arguments written on paper. In contrast, there’s a growing movement that challenges mathematics to grow beyond its life on a chalkboard.
One of the most visible proponents of this view is Bret Victor. If you haven’t heard of him or seen his fantastic talks, please stop reading now and go watch his talk, “Inventing on Principle.” It’s worth every minute. Victor’s central thesis is that creators must have an immediate connection to their work. As such, Victor finds it preposterous that programmers often have to write code, compile, run, debug, and repeat every time they make a change. Programmers shouldn’t need to simulate a machine inside their head when designing a program—there’s a machine sitting right there that can perform the logic perfectly!
Victor reinforces his grand, yet soft-spoken ideas with astounding prototypes. But his ideas are deeper than a flashy user interface. Victor holds a deep reverence for ideas and enabling creativity. He doesn’t want to fundamentally change the way people interact with their music library. He wants to fundamentally change the way people create new ideas. He wants to enable humans to think thoughts that could not previously have been thought at all. You might wonder what one could possibly mean by “think new thoughts,” but fifteen minutes of Victor’s talk will show you and make disbelieve how we could have possibly made do without the typical software write-compile-run loop. His demonstrations rival the elegance of the finest mathematical proofs.
Just as Lamport’s structured proof hierarchies and automated assistants are his key to navigating complex proofs, and similarly to how Atiyah’s most effective tool is a tour of ideas that pique his interest, Victor feels productive when he has an immediate connection with his work. A large part of it is having the thing you’re creating react to modifications in real time. Another aspect is simultaneously seeing all facets relevant to your inquiry. Rather than watch a programmed car move over time, show the entire trajectory for a given control sequence, the view updating as the control sequence updates. Victor demonstrates this to impressive effect. (It’s amusing to see an audience’s wild applause for this, when the same people might easily have groaned as students being asked to plot the trajectories of a differential equation, despite the two concepts being identical. No doubt it is related to the use of a video game.)
It should not surprise you, then, that Victor despises mathematical notation. In his essay “Kill Math,” Victor argues that a pencil and paper is the most antiquated and unhelpful medium for using mathematics. Victor opines on what a shame it is that so much knowledge is only accessible to those who have the unnatural ability to manipulate symbols on paper. How many good ideas were never thought because of that high bar?
One obvious reason for the ubiquity of mathematical notation is an accident of history’s most efficient information distribution systems, the printing press and later the text-based internet. But given our fantastic new technology—virtual reality, precise sensors, machine learning algorithms, brain-computer interfaces—how is it that mathematics is left in the dust? Victor asks all these questions and more.
I have to tread carefully here, because mathematics is a large part of my identity. When I hear “kill math,” my lizard brain shoots sparks of anger. For me, this is a religious issue deeper than my favorite text editor. Even as I try to remain objective and tactful, take what I say with a grain of salt.
Overall, I agree with Victor’s underlying sentiment. Lots of people struggle with math, and a better user interface for mathematics would immediately usher in a new age of enlightenment. This isn’t an idle speculation. It has happened time and time again throughout history. The Persian mathematician Muhammad ibn Musa al-Khwarizmi invented algebra (though without the symbols for it) which revolutionized mathematics, elevating it above arithmetic and classical geometry, quickly scaling the globe. Make no mistake, the invention of algebra literally enabled average people to do contemporarily advanced mathematics. I’m surprised Victor does not reference algebra as a perfect example of a tool for thinking new thoughts, even if before arguing its time has passed.
And it only gets better, deeper, and more nuanced. Shortly after the printing press was invented French mathematicians invented modern symbolic notation for algebra, allowing mathematics to scale up in complexity. Symbolic algebra was a new user interface that birthed countless new thoughts. Without this, for example, mathematicians would never have discovered the connections between algebra and geometry that are so prevalent in modern mathematics and which lay the foundation of modern physics. Later came the invention of set theory, and shortly after category theory, which were each new and improved user interfaces that allowed mathematicians to express deeper, more unified, and more nuanced ideas than was previously possible.
Meanwhile, many of Victor’s examples of good use of his prototypes are “happy accidents.” By randomly fiddling with parameters (and immediately observing the effect), Victor stumbles upon ideas that would never occur without the immediacy. To be sure, serendipity occurs in mathematics as well. Recall Andrew Wiles fumbling in his dark room looking for a light switch. Many creative aspects of mathematics involve luck, good fortune, and “eureka” moments, but there is nowhere near the same immediacy.
Immediacy makes it dreadfully easy to explore examples, which is one of the most important techniques I hope you take away from this book! But what algebraic notation and its successors bring to the table beyond happenstance is to scale in complexity beyond the problem at hand. While algebra limits you in some ways—you can’t see the solutions to the equations as you write them—it frees you in other ways. You need not know how to find the roots of a polynomial before you can study them. You need not have a complete description of a group before you start finding useful homomorphisms. As Sir Arthur Eddington said, group theory studies operations that are as unknown as the quantities that they operate on. We didn’t need to understand precisely how matrices correspond to linear maps before studying them, as might be required to provide a useful interface meeting Victor’s standards. Indeed, it was algebraic grouping and rearranging (with cognitive load reduced by passing it off to paper) that provided the derivation of matrices in the first place.
Then there are the many “interfaces” that we’ve even seen in this book: geometry and the Cartesian plane, graphs with vertices and edges, pyramids of balls with arrows, drawings of arcs that we assert are hyperbolic curves, etc. Mathematical notation goes beyond “symbol manipulation,” because any picture you draw to reason about a mathematical object is literally mathematical notation.
Ultimately I agree with Victor on his underlying principles. We share inspirations from computing history. We appreciate many of the same successes. However, I see a few ways Victor’s work falls short of enabling new modes of thought, particularly insofar as it aims to replace mathematical notation. I’ll outline the desiderata I think a new interface for mathematics must support if it hopes to replace notation.
- Counterfactual reasoning: The interface must support reasoning about things that cannot logically exist.
- Meaning assignment: The interface must support assigning arbitrary semantic meaning to objects.
- Flexible complexity: The interface should be as accessible to a child learning algebra as to a working mathematician.
- Incrementalism: Adapting the interface to study a topic must not require encoding extensive prior knowledge about that topic.
The last two properties are of particular importance for any interface. Important interfaces throughout history satisfy the last two, including spoken language, writing, most tools for making art and music, spreadsheets, touchscreens and computer mice, keyboards (layouts of buttons and toggles in general, of which QWERTY is one), and even the classic text editors vim and emacs—anyone can use them in a basic fashion, while experts dazzle us with them.
Let’s briefly explore each desired property.
Because mathematical reasoning can be counterfactual, any system for doing mathematics must allow for the possibility that the object being reasoned about cannot logically exist. We’ve seen this time and again in this book when we do proof by contradiction: we assume to the contrary that some object exists, and we conclude via logic that 1 = 2 or some other false statement, and then , which we handled as concretely as we would throw a ball, suddenly never existed to begin with. There is no largest prime, but I can naively assume that there is and explore what happens when I square it. Importantly, the interface need not encode counterfactual reasoning literally. It simply needs to support the task of counterfactual reasoning by a human.
Lumped in with this is population reasoning. I need to be able to reason about the entire class of all possible objects satisfying some properties. The set of all algorithms that compute a function (even if no such algorithm exists), or the set of all distance-preserving functions of an arbitrary space. These kinds of deductions are necessary to organize and synthesize ideas from disparate areas of math together (connecting us to “Flexible complexity” below).
A different view is that a useful interface for mathematics must necessarily allow the mathematician to make mistakes. But part of the point of a new interface was to avoid the mistakes and uncertainty that pencil and paper make frequent! It’s not entirely clear to me whether counterfactual reasoning necessarily enables mistakes. It may benefit from a tradeoff between the two extremes.
One of the handiest parts of mathematical notation is being able to draw an arbitrary symbol and imbue it with arbitrary semantic meaning. is a natural number by fiat. I can write and overload which multiplication means what. I can define a new type of arrow on the fly and say “this means injective map.”
This concept is familiar in software, but the defining feature in mathematics is that one need not know how to implement it to assert it and then study it. This ties in with “Incrementalism” below. Anything I can draw, I can give logical meaning.
Ideally the interface also makes the assignment and management of meaning easy. That is, if I’ve built up an exploration of a problem involving pennies on a table, I should easily be able to change those pennies to be coins of arbitrary unknown denomination. And then allow them to be negative-valued coins. And then give them a color as an additional property. And it should be easy to recall what semantics are applied to which objects later. If each change requires me to redo large swaths of work (as many programs built specifically to explore such a problem would), the interface will limit me. With algebraic notation, I could simply add another index, or pull out a colored pencil (or pretend it’s a color with shading), and continue as before. In real life I just say the word, even if doing so makes the problem drastically more difficult.
Music is something that exhibits flexible complexity. A child raps the keys of a piano and makes sounds. So too does Ray Charles, though his technique is multifaceted and deliberate.
Mathematics has similar dynamic range that can accommodate the novice and the expert alike. Anyone can make basic sense of numbers and unknowns. Young children can understand and generate simple proofs. With a decent grasp of algebra, one can compute difficult sums. Experts use algebra to develop theories of physics, write computer programs with provable guarantees, and reallocate their investment portfolios for maximum profit.
On the other hand, most visual interactive explorations of mathematics—as impressive and fun as they are—are single use. Their design focuses on a small universe of applicable ideas, and the interface is more about guiding you toward a particular realization than providing a tool. These are commendable, but when the experience is over one returns to pencil and paper.
The closest example of an interface I’ve seen that meets the kind of flexible complexity I ask of a replacement for mathematics is Ken Perlin’s Chalktalk. Pegged as a “digital presentation and communication language,” the user may draw anything they wish. If the drawing is recognized by the system, it becomes interactive according to some pre-specified rules. For example, draw a circle at the end of a line, and it turns into a pendulum you can draw to swing around. Different pieces are coupled together by drawing arrows; one can plot the displacement of the pendulum by connecting it via an arrow to a plotting widget. Perlin displays similar interactions between matrices, logical circuits, and various sliders and dials.
Chalktalk falls short in that your ability to use it is limited by what has been explicitly programmed into it as a behavior. If you don’t draw the pendulum just right, or you try to connect a pendulum via an arrow to a component that doesn’t understand its output, you hit a wall. To explain to the interface what you mean, you write a significant amount of code. This isn’t a deal breaker, but rather where I personally found the interface struggling to keep up with my desires and imagination. What’s so promising about Chalktalk is that it allows one to offset the mental task of keeping track of interactions that algebraic notation leaves to manual bookkeeping.
Incrementalism means that if I want to repurpose a tool for a new task, I don’t already need to be an expert in the target task to use the tool on it. If I’ve learned to use a paintbrush to paint a flower on a canvas, I need no woodworking expertise to paint a fence. Likewise, if I want to use a new interface for math to study an optimization problem, using the interface shouldn’t require me to solve the problem in advance. Algebra allows me to pose and reason about an unknown optimum of a function; so must any potential replacement for algebra.
Geometry provides an extended example. One could develop a system in which to study classical geometry, and many such systems exist (Geogebra is a popular one, and quite useful in its own right!). You could enable this system to draw and transform various shapes on demand. You can phrase theorems from Euclidean geometry in it, and explore examples with an immediate observation of the effect of any operation.
Now suppose we want to study parallel lines; it may be as clear as the day from simulations that two parallel lines never intersect, but does this fact follow from the inherent properties of a line? Or is it an artifact of the implementation of the simulation? As we remember, efficient geometry algorithms can suffer from numerical instability or fail to behave properly on certain edge cases. Perhaps parallel lines intersect, but simply very far away and the interface doesn’t display it well? Or maybe an interface that does display far away things happens to make non-intersecting lines appear to intersect due to the limitations of our human eyes and the resolution of the screen.
In this system, could one study the possibility of a geometry in which parallel lines always intersect? With the hindsight of our Chapter on group theory, we know such geometries exist (projective geometry has this property), but suppose this was an unknown conjecture. To repurpose our conventional interface for studying geometry would seem to require defining a correct model for the alternative geometry in advance. Worse, it might require us to spend weeks or months fretting over the computational details of that model. We might hard-code an intersection point, effectively asserting that intersections exist. But then we need to specify how two such hard-coded points interact in a compatible fashion, and decide how to render them in a useful way. If it doesn’t work as expected, did we mess up the implementation, or is it an interesting feature of the model? All this fuss before we even know whether this model is worth studying!
This is mildly unfair, as the origins of hyperbolic geometry did, in fact, come from concrete models. The point is that the inventors of this model were able to use the sorts of indirect tools that precede computer-friendly representations. They didn’t need a whole class of new insights to begin. If the model fails to meet expectations early on, they can throw it out without expending the effort that would have gone into representing it within our hypothetical interface.
On the Shoulders of Giants
Most of my objections boil down to the need to create abstractions not explicitly programmed into the interface. Mathematics is a language, and it’s expressiveness is a core feature. Like language, humans use it primarily to communicate to one another. Like writing, humans use it to record thoughts in times of inspiration, so that memory can be offset to paper and insights can be reproduced faithfully later. Paraphrasing Thurston, mathematics only exists in the social fabric of the people who do it. An interface purporting to replace mathematical notation must build on the shoulders of the existing mathematics community. As Isaac Newton said, “If I have seen further it is by standing on the shoulders of giants.”
The value of Victor’s vision lies in showing us what we struggle to see in our minds. Now let’s imagine an interface that satisfies our desiderata, but also achieves immediacy with one’s work. I can do little more than sketch a dream, but here it is.
Let’s explore a puzzle played on an infinite chessboard, which I first learned from mathematician Zvezdelina Stankova via the YouTube channel Numberphile. You start with an integer grid , and in each grid cell you can have a person or no person. The people are called “clones” because they are allowed to take the following action: if cells and are both empty, then the clone in cell can split into two clones, which now occupy spaces , leaving space vacant. You start with three clones in “prison” cells , and the goal is to determine if there is a finite sequence of moves, after which all clones are outside the prison. For this reason, Stankova calls the puzzle “Escape of the Clones.”
Suppose that our dream interface is sufficiently expressive that it can encode the rules of this puzzle, and even simulate attempts to solve it. If the interface is not explicitly programmed to do this, it would already be a heroic accomplishment of meaning assignment and flexible complexity.
Now after playing with it for a long time, you start to get a feeling that it is impossible to free the clones. We want to use the interface to prove this, and we can’t already know the solution to do so. This is incrementalism.
If we were to follow in Stankova’s footsteps, we’d employ two of the mathematician’s favorite tools: proof by contradiction and the concept of an invariant. The invariant would be the sum of some weights assigned to the initial clones: the clone in cell has weight 1, and the clone in cells each get weight 1/2. To be an invariant, a clone’s splitting action needs to preserve weight. A simple way to do this is to simply have the cloning operation split a clone’s current weight in half. So a clone in cell with weight 1/2 splits into two clones in cells each of weight 1/4. We can encode this in the interface, and the interface can verify for us that the invariant is indeed an invariant. In particular, the weight of a clone depends only on its position, so that the weight of a clone in position is . The interface would determine this and tell us. This is immediacy.
Then we can, with the aid of the interface, compute the weight-sum of any given configuration. The starting region’s weight is 2, and it remains 2 after any sequence of operations. It dawns on us to try filling the entire visible region outside the prison with clones. We have assumed to the contrary that an escape sequence exists, in which the worst case is that it fills up vast regions of the plane. The interface informs us that our egregiously crowded region has weight 1.998283. We then ask the interface to fill the entire complement of the prison with clones (even though that is illegal; the rules imply you must have a finite sequence of moves!). It informs us that weight is also 2. We realize that if any cell is cloneless, as must be true after a finite number of moves, we will have violated the invariant. This is counterfactual reasoning.
Frankly, an interface that isn’t explicitly programmed to explore this specific proof—yet enables an exploration that can reveal it in a more profound way than paper, pencil, and pondering could—sounds so intractable that I am tempted to scrap this entire essay in utter disbelief. How can an interface be so expressive without simply becoming a general-purpose programming language? What would prevent it from displaying the same problems that started this inquiry? What precisely is it about the nature of human conversation that makes it so difficult to explain the tweaks involved in exploring a concept to a machine?
While we may never understand such deep questions, it’s clear that abstract logic puzzles and their proofs provide an excellent test bed for proposals. Mathematical puzzles are limited, but rich enough to guide the design of a proposed interface. Games involve simple explanations for humans with complex analyses (flexible complexity), drastically different semantics for abstract objects like chessboards and clones (meaning assignment), there are many games which to this day still have limited understanding by experts (incrementalism), and the insights in many games involve reasoning about hypothetical solutions (counterfactual reasoning).
In his book “The Art of Doing Science and Engineering,” the mathematician and computer scientist Richard Hamming put this difficulty into words quite nicely,
It has rarely proved practical to produce exactly the same product by machines as we produced by hand. Indeed, one of the major items in the conversion from hand to machine production is the imaginative redesign of an equivalent product. Thus in thinking of mechanizing a large organization, it won’t work if you try to keep things in detail exactly the same, rather there must be a larger give-and-take if there is to be a significant success. You must get the essentials of the job in mind and then design the mechanization to do that job rather than trying to mechanize the current version—if you want a significant success in the long run.
Hamming’s attitude about an “equivalent product” summarizes the frustration of writing software. What customers want differs from what they say they want. Automating manual human processes requires arduously encoding the loose judgments made by humans—often inconsistent and based on folk lore and experience. Software almost always falls short of really solving your problem. Accommodating the shortcomings requires a whole extra layer of process.
We write programs to manage our files, and in doing so we lose much of the spatial reasoning that helps people remember where things are. The equivalent product is that the files are stored and retrievable. On the other hand, for mathematics the equivalent product is human understanding. This should be no surprise by now, provided you’ve come to understand the point of view espoused throughout this book. In this it deviates from software. We don’t want to retrieve the files, we want to understand the meaning behind their contents.
My imagination may thus defeat itself by failing to give any ground. If a new interface is to replace pencil and paper mathematics, must I give up the ease of some routine mathematical tasks? Or remove them from my thinking style entirely? Presuming I can achieve the same sorts of understanding—though I couldn’t say how—the method of arrival shouldn’t matter. And yet, this attitude ignores my experience entirely. The manner of insight you gain when doing mathematics is deeply intertwined with the method of inquiry. That’s precisely why Victor’s prototypes allow him to think new thoughts!
Mathematics succeeds only insofar as it advances human understanding. Pencil and paper may be the wrong tool for the next generation of great thinkers. But if we hope to enable future insights, we must understand how and why the existing tools facilitated the great ideas of the past. We must imbue the best features of history into whatever we build. If you, dear programmer, want to build those tools, I hope you will incorporate the lessons and insights of mathematics.
The Second Edition of A Programmer’s Introduction to Mathematics is now available on Amazon.
The second edition includes a multitude of fixes to typos and some technical errata, thanks to my readers who submitted over 200 errata. Readers who provided names are credited in the front matter.
I also added new exercises, and three new appendices: a notation table to augment the notation index, a summary of elementary formal logic, and an annotated list of further reading. I also had another editor read the book, resulting in my rewriting a few proofs and reorganizing a bit. I also read the book once more to tighten up the prose.
If you bought the first edition, you probably won’t benefit much from buying the second. Note the pdf ebook is now pay what you want (suggested price $20). I felt that with the improvements made on this second pass, the book is in a good enough state for me to consider it done, and to move on to other projects.
For now I’m working on two untitled family projects, and I’d like to get back to blogging regularly as I let the outlines for my next two book ideas simmer and I slowly chip away at them. For what it’s worth, it seems to be getting harder to keep up the writing as work demands ramp up and I try to maintain a healthy work-life balance.
In celebration of the second edition I’ll be posting (a slightly modified version of) the closing essay to the book today.
Recently my employer (Google) forced me to switch to Mercurial instead of my usual version control system, git. The process of switching sparked a few discussions between me and my colleagues about the value of various version control systems. A question like “what benefit does git provide over Mercurial” yielded no clear answers, suggesting many developers don’t know. An informal Twitter survey didn’t refute this claim.
A distinguished value git provides me is the ability to sculpt code changes into stories. It does this by
- Allowing changes to be as granular as possible.
- Providing good tools for manipulating changes.
- Treating the change history as a first-class object that can be manipulated.
- Making branching cheap, simple, and transparent.
This might be summed up by some as “git lets you rewrite history,” but to me it’s much more. Working with code is nonlinear by nature, which makes changes hard to communicate well. Wielding git well lets me easily draft, edit, decompose, mix, and recombine changes with ease. Thus, I can narrate a large change in a way that’s easy to review, reduces review cycle time, and makes hindsight clear. Each pull request is a play, each commit a scene, each hunk a line of dialogue or stage direction, and git is a director’s tool.
The rest of this article will expand on these ideas. For those interested in learning more about git from a technical perspective, I enjoyed John Wiegley’s Git from the Ground Up. That short book will make rigorous many of the terms I will use more loosely in this article, but basic git familiarity is all that’s required to understand the gist of this article. My philosophy of using git is surely unoriginal. It is no doubt influenced by the engineers I’ve worked with and the things I’ve read. At best, my thoughts here refine and incrementally expand upon what I’ve picked up from others.
If Lisp’s great insight was that code is data that programmers can take advantage of that with metaprogramming, then git’s great insight is that code changes are data and programmers can take advantage of that with metachanges. Changes are the data you produce while working on features and bug fixes. Metachanges are the changes you make to your changes to ready them for review. Embracing metachanges enables better cleanliness and clearer communication. Git supports metachanges with few limits, and without sacrificing flexibility.
For instance, if you want, you can treat git like a replacement for Dropbox. You keep a single default branch, you do
git pull, edit code, and run
git add --all && git commit -m "do stuff" && git push. This saves all your work and pushes it to the server. You could even alias this to
git save. I admit that I basically do this for projects of no real importance.
Such sloppy usage violates my personal philosophy of git. That is, you should tell a clear story with your commits. It’s not just that a commit message like “do stuff” is useless, it’s that the entire unit of work is smashed into one change and can’t easily be teased apart or understood incrementally.
This is problematic for code review, which is a crucial part of software development. The cost and cognitive burden of a unit of code review scales superlinearly with the amount of code to review (citation needed, but this is my personal experience). However, sometimes large code reviews are necessary. Large refactors, extensive testing scenarios, and complex features often cannot be split into distinct changesets or pull requests. In addition, most continuous integration frameworks require that after every merge of a changeset or pull request, all tests pass and the product is deployable. That means you can’t submit an isolated changeset that causes tests to fail or performs partial refactors without doing more work and introducing more opportunities to make mistakes.
In light of this, I want to reduce the review burden for my reviewers, and encourage people I’m reviewing for to reduce the burden for me. This is a human toll. The best way to help someone understand a complex change is to break it into the smallest possible reasonable, meaningful units of change, and then compose the pieces together in a logical way. That is, to tell a story.
Git enables this by distinguishing between units of change. The most atomic unit of change is a hunk, which is a diff of a subset of lines of a file. A set of hunks (possibly across many files) are assembled into a commit, which includes an arbitrary message that describes the commit. Commits are assembled linearly into a branch which can then be merged with the commits in another branch, often the master/default/golden branch. (Technically commits are nodes in a tree, each commit having a unique parent, and a branch is just a reference to a commit, but the spirit of what I said is correct.) On GitHub, the concept of a pull request wraps a list of commits on a branch with a review and approval process, where it shows you the list of commits in order.
Take advantage of these three levels of specificity: use hunks to arrange your thoughts, commits to voice a command, and pull requests to direct the ensemble.
In particular, as a feature implementer, you can reduce review burden by separating the various concerns composing a feature into different commits. The pull request as a whole might consist of the core feature work, tests, some incidental cleanup-as-you-go, and some opportunistic refactoring. These can each go in different commits, and the core feature work usually comprises much less than half the total code to review.
Splitting even the core feature work into smaller commits makes reviewing much easier. For example, your commits for a feature might suggestively look like the following (where the top is the first commit and the bottom is the last commit):
9d7a191 - read the user's full name from the database
7d5c212 - unit tests for user name reading
cdb37c5 - include user's name in the user/ API response
7c5c62c - unit tests for user/ API name field
7b4ca44 - display the user's name on the profile page
8e72535 - integration test to verify name displayed
9bdf5b8 - sanitize the name field on submission
e11201b - unit tests for name submission
341abdc - refactor name -> full_name
331bcb2 - style fixes
Each unit is small and the reviewer reads the commits one at a time, in the order you present them. Then the reviewer approves them all as a whole or asks for revisions. This style results in faster reviews than if all the code is included in one commit because the reviewer need not reconstruct your story from scratch. Code is inherently laid out nonlinearly, so it’s hard to control what the reviewer sees and in what order. By crafting your pull request well, you draw attention to certain aspects of the change before showing its otherwise confusing implications. A story commit style is a natural way to achieve this.
This is the key benefit of the story approach. You get better control over how your code reviewer reads your code. This reduces cognitive burden for the reviewer and increases the likelihood that bugs are found during review.
There are also less obvious benefits that can have an outsized impact. Explaining your work as a story prompts you to think critically about your changes, suggesting redesigns and helping you catch errors using the same principle behind rubber duck debugging. Moreover, by revealing your thought process, the reviewer understands you better and can suggest better improvements. If all your code is lumped into one change, it’s easy for a reviewer to second guess the rationale behind a particular line of code. If that line is intentional and included in a small commit with a message that makes it clear it was intended (and for what reason), the question is preemptively answered and a bit of trust is built. Finally, the more you practice organizing your work as a clean story, the easier it is for your work to actually become that clean story. You learn to quickly assemble more efficient plans to get work done. You end up revising your work less often, or at least less often for stupid reasons.
By its design and tooling, git makes crafting narrative code changes easy. The tools that enable this are the staging area (a.k.a. the index), branches, cherry-picking, and interactive rebasing.
The staging area (or index) is a feature that allows you to mark parts of the changes in your workspace as “to be included in the next commit.” In other words, git has three possible states of a change: committed, staged for the next commit, and not yet staged. By default
git diff shows only what has not been staged, and
git diff --staged allows you to see only what’s staged, but not committed.
I find the staging area to be incredibly powerful for sorting out my partial work messes. I make messes because I don’t always know in advance if some tentative change is what I really want. I often have to make some additional changes to see how it plays out, and if I run into any roadblocks, how I would resolve them.
The staging area helps me be flexible: rather than commit the changes and undo them later (see next paragraph), I can experiment with the mess, get it to a good place to start committing, and then repeatedly stage the first subset of changes I want to group into a commit,
git commit, and keep staging. This is seamless with editor support. I use vim-gitgutter, which allows me to simply navigate to the block I want to stage, type
,ha (“leader hunk add”), continue until I’m ready to commit, then drop to the command line and run
git commit. Recall, the “hunk” is the smallest unit of change git supports (a minimal subset of lines changing a single file), and the three layer “hunk-commit-pull” hierarchy of changes provides three layers of commitment support: hunks are what I organize into a commit (what I am ready to “commit” to being included in the feature I’m working on), commits are minimal semantic units of change that can be comprehended by a reviewer in the context of the larger feature, and a pull request is the smallest semantic unit of “approvable work” (feature work that maintains repository-wide continuous integration invariants).
Of course, I can’t always be right. Sometimes I will make some commits and realize I want to go back and do it differently. This is where branching, rebasing, and cherry-picking come in. The simplest case is when I made a mistake in something I committed. Then I can do an interactive rebase to basically go back to the point in time when I made the mistake, correct it, and go back to the present. Alternatively, I can fix the change now, commit it, and use interactive rebasing to combine the two commits post hoc. Provided you don’t run into any merge conflicts between these commits and other commits in your branch, this is seamless. I can also leave unstaged things like extra logging, notes, and debug code, or commit them with a magic string, and run a script before pushing that removes all commits from my branch whose message contains the magic string.
Another kind of failure is when I realize—after finishing half the work—that I can split the feature work into two smaller approvable units. In that case, I can extract the commits from my current branch to a new branch in a variety of ways (branch+rebase or cherry-pick), and then prepare the two separate (or dependent) pull requests.
This is impossible without keeping a fine-grained commit history while you’re developing. Otherwise you have to go back and manually split commits, which is more time consuming. Incidentally, this is a pain point of mine when using Perforce and Mercurial. In those systems the “commit” is the smallest approvable unit, which you can amend by including all local changes or none. While they provide some support for splitting bigger changes into smaller changes post hoc, I’ve yet to do this confidently and easily, because you have to go down from the entire change back to hunks. Git commits group hunks into semantically meaningful (and named) units that go together when reorganized. In my view, when others say a benefit of git is that “branches are cheap,” simple and fast reorganization is the true benefit. “Cheap branching” is a means to that end.
A third kind of mistake is one missed even by review. Such errors make it to the master branch, and start causing bugs in production. Having a clean commit history with small commits is helpful here because it allows one to easily rolled back the minimal bad change without rolling back the entire pull request it was contained in (though you can if you want).
Finally, the benefits of easy review are redoubled when looking at project history outside the context of a review. You can point new engineers who want to implement a feature to a similar previous feature, and rather than have them see all your code lumped into one blob, they can see the evolution of the feature in its cleanest and clearest form. Best practices, like testing and refactoring and when to make abstractions, are included in the story.
There is a famous quote by Hal Abelson, that “Programs must be written for people to read, and only incidentally for machines to execute.” The same view guides my philosophy for working with revisions, that code changes must be written for people to read and incidentally to change codebases. Now that you’ve had a nice groan, let me ask you to reflect on this hyperbole the next time you encounter a confusing code review.