Searching for RH Counterexamples — Setting up Pytest

Some mathy-programmy people tell me they want to test their code, but struggle to get set up with a testing framework. I suspect it’s due to a mix of:

  • There are too many choices with a blank slate.
  • Making slightly wrong choices early on causes things to fail in unexpected ways.

I suspect the same concerns apply to general project organization and architecture. Because Python is popular for mathy-programmies, I’ll build a Python project that shows how I organize my projects and test my code, and how that shapes the design and evolution of my software. I will use Python 3.8 and pytest, and you can find the final code on Github.

For this project, we’ll take advice from John Baez and explore a question that glibly aims to disprove the Riemann Hypothesis:

A CHALLENGE:

Let σ(n) be the sum of divisors of n. There are infinitely many n with σ(n)/(n ln(ln(n))) > 1.781. Can you find one? If you can find n > 5040 with σ(n)/(n ln(ln(n))) > 1.782, you’ll have disproved the Riemann Hypothesis.

I don’t expect you can disprove the Riemann Hypothesis this way, but I’d like to see numbers that make σ(n)/(n ln(ln(n))) big. It seems the winners are all multiples of 2520, so try those. The best one between 5040 and a million is n = 10080, which only gives 1.755814.

https://twitter.com/johncarlosbaez/status/1149700802371608576

Initializing the Project

One of the hardest parts of software is setting up your coding environment. If you use an integrated development environment (IDE), project setup is bespoke to each IDE. I dislike this approach, because what you learn when using the IDE is not useful outside the IDE. When I first learned to program (Java), I was shackled to Eclipse for years because I didn’t know how to compile and run Java programs without it. Instead, we’ll do everything from scratch, using only the terminal/shell and standard Python tools. I will also ignore the random extra steps and minutiae I’ve built up over the years to deal with minor issues. If you’re interested in those and why I do them, leave a comment and I might follow up with a second article.

This article assumes you are familiar with the basics of Python syntax, and know how to open a terminal and enter basic commands (like ls, cd, mkdir, rm). Along the way, I will link to specific git commits that show the changes, so that you can see how the project unfolds with each twist and turn.

I’ll start by creating a fresh Python project that does nothing. We set up the base directory riemann-divisor-sum, initialize git, create a readme, and track it in git (git add + git commit).

mkdir riemann-divisor-sum
cd riemann-divisor-sum
git init .
echo "# Divisor Sums for the Riemann Hypothesis" > README.md
git add README.md
git commit -m "add empty README.md"

Next I create a Github project at https://github.com/j2kun/riemann-divisor-sum (the Github project name need not match the local directory name, but keeping them the same avoids confusion), and push the project up to Github.

git remote add origin git@github.com:j2kun/riemann-divisor-sum.git
# my default branch is named "main" rather than "master" (see below)
git push -u origin main

Note, if you’re a new Github user, the “default branch name” when creating a new project may be “master.” I like “main” because it’s shorter, clearer, and nicer. If you want to change your default branch name, you can update to git version 2.28 or later and add the following to your ~/.gitconfig file.

[init]
    defaultBranch = main
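
Equivalently, you can have git write that setting for you from the command line:

git config --global init.defaultBranch main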

Here is what the project looks like on Github as of this single commit.

Pytest

Next I’ll install the pytest library which will run our project’s tests. First I’ll show what a failing test looks like, by setting up a trivial program with an un-implemented function, and a corresponding test. For ultimate simplicity, we’ll use Python’s built-in assert for the test lines. Here’s the commit.

# in the terminal
mkdir riemann
mkdir tests


# create riemann/divisor.py containing:
'''Compute the sum of divisors of a number.'''

def divisor_sum(n: int) -> int:
    raise ValueError("Not implemented.")


# create tests/divisor_test.py containing:
from riemann.divisor import divisor_sum

def test_sum_of_divisors_of_72():
    assert 195 == divisor_sum(72)

Next we install and configure Pytest. At this point, since we’re introducing a dependency, we need a project-specific place to store that dependency. All dependencies related to a project should be explicitly declared and isolated. This page helps explain why. Python’s standard tool is the virtual environment. When you “activate” the virtual environment, it temporarily (for the duration of the shell session or until you run deactivate) points all Python tools and libraries to the virtual environment.

virtualenv -p python3.8 venv
source venv/bin/activate

# shows the location of the overridden python binary path
which python
# outputs: /Users/jeremy/riemann-divisor-sum/venv/bin/python

Now we can use pip as normal and it will install to venv. To declare and isolate the dependency, we write the output of pip freeze to a file called requirements.txt, and it can be reinstalled using pip install -r requirements.txt. Try deleting your venv directory, recreating it, and reinstalling the dependencies this way.

pip install pytest
pip freeze > requirements.txt
git add requirements.txt
git commit -m "requirements: add pytest"

# example to wipe and reinstall
# deactivate
# rm -rf venv
# virtualenv -p python3.8 venv
# source venv/bin/activate
# pip install -r requirements.txt

As an aside, at this step you may notice git mentions venv is an untracked directory. You can ignore this, or add venv to a .gitignore file to tell git to ignore it, as in this commit. We will also have to configure pytest to ignore venv shortly.
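
For reference, a minimal .gitignore for this project needs only two entries (the second will make sense shortly):

venv/
__pycache__/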

When we run pytest (with no arguments) from the base directory, we see our first error:

    from riemann.divisor import divisor_sum
E   ModuleNotFoundError: No module named 'riemann'

Module import issues are a common stumbling block for new Python users. In order to make a directory into a Python module (strictly speaking, a package), it needs an __init__.py file, even if it’s empty. Any code in this file will be run the first time the module is imported in a Python runtime. We add one to both the code and test directories in this commit.
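
If you’re following along, the fix is just:

touch riemann/__init__.py
touch tests/__init__.py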

When we run pytest (with no arguments), it recursively searches the directory tree looking for files named like *_test.py and test_*.py, loads them, and treats every function inside those files whose name is prefixed with “test” as a test. Non-“test” functions can be defined and used as helpers to set up complex tests. Pytest then runs the tests and reports the failures. For me this looks like

Our first test failure.

Our implementation is intentionally wrong for demonstration purposes. When a test passes, pytest will report it quietly as a “.” by default. See these docs for more info on different ways to run the pytest binary and configure its output report.

In this basic pytest setup, you can put test files wherever you want, name the files and test methods appropriately, and use assert to implement the tests themselves. As long as your modules are set up properly, as long as imports are absolute (see this page for gory details on absolute vs. relative imports), and as long as you run pytest from the base directory, pytest will find the tests and run them.

Since pytest searches all directories for tests, this includes venv and __pycache__, which magically appears when you create Python modules (I add __pycache__ to gitignore). Sometimes package developers will include test code, and pytest will then run those tests, which often fail or clutter the output. A virtual environment also gets large as you install big dependencies (like numpy, scipy, pandas), so this makes pytest slow to search for tests to run. To alleviate this, the norecursedirs option tells pytest to skip directories. Since it’s tedious to specify norecursedirs every time you run pytest, you can make this the default behavior by storing the option in a configuration file recognized by pytest, such as setup.cfg. I did it in this commit.
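
For reference, the relevant section of setup.cfg looks something like this (the exact list of directories is up to you):

[tool:pytest]
norecursedirs = venv __pycache__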

Some other command line options that I use all the time:

  • pytest test/dir to test only files in that directory, or pytest test/dir/test_file.py to test only tests in that file.
  • pytest -k STR to only run tests whose name contains “STR”
  • pytest -s to see any logs or print statements inside tested code
  • pytest -s to allow the pdb/ipdb debugger to function and step through a failing test.

Building up the project

Now let’s build up the project. My general flow is as follows:

  1. Decide what work to do next.
  2. Sketch out the interface for that work.
  3. Write some basic (failing, usually lightweight) tests that will pass when the work is done.
  4. Do the work.
  5. Add more nuanced tests if needed, based on what is learned during the work.
  6. Repeat until the work is done.

This strategy is sometimes called “the design recipe,” and I first heard about it from my undergraduate programming professor John Clements at Cal Poly, via the book “How to Design Programs.” Even if I don’t always use it, I find it’s a useful mental framework for getting things done.

For this project, I want to search through positive integers, and for each one I want to compute a divisor sum, do some other arithmetic, and compare that against some other number. I suspect divisor sum computations will be the hard/interesting part, but to start I will code up a slow/naive implementation with some working tests, confirm my understanding of the end-to-end problem, and then improve the pieces as needed.

In this commit, I implement the naive divisor sum code and tests. Note the commit also shows how to tell pytest to test for a raised exception. In this commit I implement the main search routine and confirm John’s claim about n=10080 (thanks for the test case!).
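
If you’re not clicking through to the commits, here is a sketch of what the naive implementation and the exception test might look like (the commit’s exact code may differ; I’m assuming nonpositive inputs raise a ValueError):

# riemann/divisor.py, naive version (a sketch)
def divisor_sum(n: int) -> int:
    '''Compute the sum of divisors of a positive integer.'''
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return sum(d for d in range(1, n + 1) if n % d == 0)


# tests/divisor_test.py, showing how to test for a raised exception
import pytest

from riemann.divisor import divisor_sum

def test_divisor_sum_raises_on_nonpositive_input():
    with pytest.raises(ValueError):
        divisor_sum(0)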

These tests already showcase a few testing best practices:

  • Test only one behavior at a time. Each test has exactly one assertion in it. This is good practice because when a test fails you won’t have to dig around to figure out exactly what went wrong.
  • Use the tests to help you define the interface, and then only test against that interface. The hard part about writing clean and clear software is defining clean and clear interfaces that work together well and hide details. Math does this very well, because definitions like \sigma(n) do not depend on how n is represented. In fact, math really doesn’t have “representations” of its objects—or more precisely, switching representations is basically free, so we don’t dwell on it. In software, we have to choose excruciatingly detailed representations for everything, and so we rely on the software to hide those details as much as possible. The easiest way to tell if you did it well is to try to use the interface and only the interface, and tests are an excuse to do that; the effort isn’t wasted, because the tests also check your work.

Improving Efficiency

Next, I want to confirm John’s claim that n=10080 is the best example between 5041 and a million. However, my existing code is too slow. Running the tests added in this commit seems to take forever.

We profile to confirm our suspected hotspot:

>>> import cProfile
>>> from riemann.counterexample_search import best_witness
>>> cProfile.run('best_witness(10000)')
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
...
54826    3.669    0.000    3.669    0.000 divisor.py:10(<genexpr>)

As expected, computing divisor sums is the bottleneck. No surprise there because it makes the search take quadratic time. Before changing the implementation, I want to add a few more tests. I copied data for the first 50 integers from OEIS and used pytest’s parametrize feature since the test bodies are all the same. This commit does it.
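
A sketch of what parametrize looks like, abbreviating the OEIS data (A000203) to the first five values:

# tests/divisor_test.py, using parametrize (data abbreviated)
import pytest

from riemann.divisor import divisor_sum

@pytest.mark.parametrize("n, expected", [
    (1, 1),
    (2, 3),
    (3, 4),
    (4, 7),
    (5, 6),
])
def test_divisor_sum_against_oeis(n, expected):
    assert expected == divisor_sum(n)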

Now I can work on improving the runtime of the divisor sum computation step. Originally, I thought I’d have to compute the prime factorization to use this trick that exploits the multiplicativity of \sigma(n), but then I found this approach due to Euler in 1751 that provides a recursive formula for the sum and skips the prime factorization. Since we’re searching over all integers, this allows us to trade off the runtime of each \sigma(n) computation against the storage cost of past \sigma(n) computations. I tried it in this commit, using Python’s built-in lru_cache wrapper to memoize the computation. The nice thing about this is that our tests are already there, and the interface for divisor_sum doesn’t change. This is on purpose, so that the caller of divisor_sum (in this case tests, also client code in real life) need not update when we improve the implementation. I also ran into a couple of stumbling blocks implementing the algorithm (I swapped the order of the if statements here), and the tests made it clear I messed up.
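
For the curious, here is a sketch of Euler’s recurrence with lru_cache (not the commit’s exact code). The recurrence sums \sigma(n - p) over the generalized pentagonal numbers p, with signs following the pattern +, +, -, -, and substitutes n itself whenever p = n:

# riemann/divisor.py, recursive version (a sketch, not the exact commit)
from functools import lru_cache

@lru_cache(maxsize=None)
def divisor_sum(n: int) -> int:
    '''Euler's 1751 recurrence for the sum of divisors of n.'''
    if n == 1:
        return 1
    total = 0
    j = 1
    while True:
        # the two generalized pentagonal numbers for index j
        for p in (j * (3 * j - 1) // 2, j * (3 * j + 1) // 2):
            if p > n:
                return total
            sign = 1 if j % 2 == 1 else -1
            # when p == n, the recurrence substitutes n for sigma(0)
            total += sign * (n if p == n else divisor_sum(n - p))
        j += 1
        # note: the recursion on n - 1 goes deep, which is exactly the
        # recursion-depth problem described below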

However, there are two major problems with that implementation.

  1. The code is still too slow. best_witness(100000) takes about 50 seconds to run, almost all of which is in divisor_sum.
  2. Python hits its recursion depth limit, and so the client code needs to eagerly populate the divisor_sum cache, which violates encapsulation. The caller should not know anything about the implementation, nor need to act in a specific way to accommodate hidden implementation details.

I also realized after implementing it that despite the extra storage space, the runtime is still O(n^{3/2}), because each divisor-sum call requires O(n^{1/2}) iterations of the loop. This is just as slow as a naive loop that checks divisibility of integers up to \sqrt{n}. Also, a naive loop allows me to plug in a cool project called numba that automatically speeds up simple Python code by compiling it in place. Incidentally, numba is known to not work with lru_cache, so I can’t tack it onto my existing implementation.

So I added numba as a dependency and drastically simplified the implementation. Now the tests run in 8 seconds, and in a few minutes I can upgrade John’s claim that n=10080 is the best example between 5041 and a million, to the best example between 5041 and ten million.
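
The simplified implementation is essentially the naive \sqrt{n} loop with a numba decorator on top; roughly like this (a sketch, not the exact commit):

# riemann/divisor.py, numba version (a sketch)
from numba import njit

@njit
def divisor_sum(n):
    '''Sum the divisors of n by checking candidates up to sqrt(n).'''
    total = 0
    i = 1
    while i * i <= n:
        if n % i == 0:
            total += i
            if i != n // i:
                total += n // i  # add the paired divisor n/i once
        i += 1
    return total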

Next up

This should get you started with a solid pytest setup for your own project, but there is a lot more to say about how to organize and run tests, what kinds of tests to write, and how that all changes as your project evolves.

For this project, we now know that the divisor-sum computation is the bottleneck. We also know that the interesting parts of this project are yet to come. We want to explore the patterns in what makes these numbers large. One way we could go about this is to split the project into two components: one that builds/manages a database of divisor sums, and another that analyzes the divisor sums in various ways. The next article will show how the database setup works. When we identify relevant patterns, we can modify the search strategy to optimize for that. As far as testing goes, this would prompt us to have an interface layer between the two systems, and to add fakes or mocks to test the components in isolation.

After that, there’s the process of automating test running, adding tests for code quality/style, computing code coverage, adding a type-hint checker test, writing tests that generate other tests, etc.

If you’re interested, let me know which topics to continue with. I do feel a bit silly putting so much pomp and circumstance around such a simple computation, but hopefully the simplicity of the core logic makes the design and testing aspects of the project clearer and easier to understand.

Taylor Series and Accelerometers

In my book, A Programmer’s Introduction to Mathematics, I describe the Taylor Series as a “hammer for every nail.” I learned about another nail in the design of modern smartphone accelerometers from “Eight Amazing Engineering Stories” by Hammack, Ryan, and Ziech, which I’ll share here.

These accelerometers are designed using a system involving three plates, which correspond to two capacitors. First, a quick recap of my (limited) understanding of how capacitors work. A capacitor involving two conductive plates looks like this:

A two-plate capacitor hooked up to a battery. (source)

The voltage provided by the battery pushes electrons along the negative direction, or equivalently pushes “charge” along the positive direction (see the difference between charge flow and electron flow). These electrons build up in the plate labeled -Q, and the difference in charge across the two plates generates an electric field. If that electric field is strong enough, the electrons can jump the gap to the positive plate and complete the circuit. Otherwise, the plate reaches “capacity” and current stops flowing. Whether the jump happens or the current stops depends on the area of the plate A, the distance between the plates d, and the properties of the material between the plates; the last is called the “dielectric constant” \varepsilon. (Nb., I’m not sure why it doesn’t depend on the material the plate is composed of, but I imagine it’s smooshed into the dielectric constant if necessary.) This relationship is summarized by the formula

\displaystyle C = \frac{\varepsilon A}{d}

Then, an external event can cause the plates to move close enough together so that the electrons can jump the gap and current can begin to flow. This discharges the negatively charged plate.

A naive, Taylor-series-free accelerometer could work as follows:

  1. Allow the negatively charged plate to wobble a little bit by fixing just one end of the plate, pictured like a diving board (a cantilever).
  2. The amount of wobble will be proportional to the force of acceleration due to Hooke’s law for springs.
  3. When displaced by a distance of \delta, the capacitance in the plate changes to C = \frac{\varepsilon A}{d - \delta}.
  4. Use the amount of discharge to tell how much the plate displaced.

This is able to measure the force of acceleration in one dimension, and so three of these devices are arranged in perpendicular axes to allow one to measure acceleration in 3-dimensional space.

The problem with this design is that C = \frac{\varepsilon A}{d - \delta} is a nonlinear change in capacitance with respect to the amount of displacement. To see how nonlinear, expand this as a Taylor series:

\displaystyle \begin{aligned} C &= \frac{\varepsilon A}{d - \delta} \\ &= \frac{\varepsilon A}{d} \left ( \frac{1}{1 -  \frac{\delta}{d}} \right ) \\ &= \frac{\varepsilon A}{d} \left ( 1 + \frac{\delta}{d} + \left ( \frac{\delta}{d} \right )^2 + O_{\delta \to 0}(\delta^3) \right ) \end{aligned}

I’m using the big-O notation O_{\delta \to 0}(\delta^3) to more rigorously say that I’m “ignoring” all cubic and higher terms. I can do this because in these engineering systems (I’m taking Hammack at his word here), the quantity (\delta / d)^2 is meaningfully large, but later terms like (\delta / d)^3 are negligibly small. Of course, this is only true when the displacement \delta is very small compared to d, which is why the big-O has a subscript \delta \to 0.

Apparently, working backwards through the nonlinearity in the capacitance change is difficult enough to warrant changing the design of the system. (I don’t know why this is difficult, but I imagine it has to do with the engineering constraints of measurement devices; please do chime in if you know more)

The system design that avoids this is a three-plate system instead of a two-plate system.

A three-plate system design. The middle plate wobbles between two charged plates. (source)

In this system, the middle plate moves back and forth between two stationary plates that are connected to a voltage source. As it moves away from one and closer to the other, the increased capacitance on one side is balanced by the decreased capacitance on the other. The Taylor series below shows how the squared terms cancel exactly, leaving a linear leading term.

If C_1 = \frac{\varepsilon A}{d - \delta} represents the changed capacitance of the left plate (the middle plate moves closer to it), and C_2 = \frac{\varepsilon A}{d + \delta} represents the right plate (the middle plate moves farther from it), then we expand the difference in capacitances via Taylor series (using the Taylor series for 1/(1-x) for both, but in the 1 + \delta/d case it’s 1 / (1 - (-x))).

\displaystyle \begin{aligned} C_1 - C_2 &= \frac{\varepsilon A}{d - \delta} - \frac{\varepsilon A}{d + \delta} \\ &= \frac{\varepsilon A}{d} \left ( \frac{1}{1 - \frac{\delta}{d}} - \frac{1}{1 + \frac{\delta}{d}} \right ) \\ &= \frac{\varepsilon A}{d} \left ( 1 + \frac{\delta}{d} + \left ( \frac{\delta}{d} \right )^2 + O_{\delta \to 0}(\delta^3) - 1 + \frac{\delta}{d} - \left ( \frac{\delta}{d} \right )^2 + O_{\delta \to 0}(\delta^3) \right ) \\ &= \frac{\varepsilon A}{d} \left ( \frac{2\delta}{d} + O_{\delta \to 0}(\delta^3) \right ) \end{aligned}

Again, since the cubic and higher terms are negligibly small, we can “ignore” those parts. What remains is a linear response to the change in the middle plate’s displacement. This makes it significantly easier to measure. Because we’re measuring the difference in capacitance, this design is called a “differential capacitor.”
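
A quick numerical sanity check (with made-up values \varepsilon A = 1 and d = 1) confirms that the error of the linear approximation shrinks cubically with \delta:

# illustrative values only: eps*A = 1, d = 1
d = 1.0
for delta in (0.1, 0.01, 0.001):
    exact = 1 / (d - delta) - 1 / (d + delta)  # C1 - C2, up to the eps*A factor
    linear = 2 * delta / d ** 2                # the Taylor approximation
    print(delta, exact - linear)               # error shrinks like delta**3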

Though the math is tidy in retrospect, I marvel at how one might have conceived of this design from scratch. Did the inventor notice the symmetries in the Taylor series approximations could be arranged to negate each other? Was there some other sort of “physical intuition” at play?

Until next time!

Contextual Symbols in Math

In my book I discuss the importance of context in reading and writing mathematics. An early step in becoming comfortable with math is deciphering the syntax of mathematical expressions. Another is in connecting the symbols to their semantic meanings. Embedded in these is the subproblem of knowing what to call the commonly used symbols. The more abstract you go, the more exotic the symbols tend to get.

Wikipedia has an excellent list for deciphering those symbols that have a typically well-understood meaning, like \otimes and \mathbb{Q}. There is another list for common associations of Greek letters in science and math, along with the corresponding English/Latin list. There’s also a great website called Detexify that guesses the name of a symbol from your drawing. It’s a great way to lookup a confusing symbol.

To augment these resources, I’ll describe a few context-clues I’ve picked up over the years, and my first-instinct contextual association of each Greek letter. I wonder if there should be a database of such contextual associations.

Context clues

Variables often represent a word with a related starting letter or sound. E.g., f for “function,” or t for “time.” Greek letters do this too. For example, \pi (lower case pi) is the Greek “p”, and it might be used for a “projection” function. Or \Gamma (capital gamma, the Greek “G”) for a “graph.” This can help, for example, when trying to determine the type of the variable v. In many cases you can quickly deduce it’s a vector.

Capital letters and lower case letters are usually related. For example, a might be a member of the set A. A function F might be constructed from another function f in such a way that all the information in f is preserved, but F is somehow “bigger” (e.g., the relationship between the probability density function and the cumulative distribution function). For this reason, it can help to know Greek counterparts, e.g., that \sigma is the Greek lower case of \Sigma.

Adjacent letters are often related, both within and across alphabets. The variables a, b, c are often used together to represent different parts of the same object, while letters like x, y, z are used to represent a semantically different part of the object. For example, ax + by + cz = 0 carries a strong association that a, b, c are fixed constants and x,y,z are unknown variables. Greek does this too. A triangle might have its side-lengths as a, b, c, and for each side length the opposite angle to that side gets the corresponding first three Greek letters \alpha, \beta, \gamma.

Fonts can imply semantics. The blackboard-bold font represents systems of numbers, as in \mathbb{Q, R, C, H, Z}. The lower-case fraktur font represents ideals in ring theory, particularly prime ideals, like \mathfrak{a, b, g, h}. Calligraphic fonts like \mathcal{C} are used for higher-order structures after the context of lower-case and capital letters is already set, like categories (calligraphic) of sets (capital) of elements (lower case). I have seen some cases where sans-serif fonts are used in this role when calligraphic fonts are taken.

Greek alphabet first impulse associations

Pronunciation rules from MIT

These are bound to be biased and incomplete. I’m interested to hear your associations. Which symbols always seem to be used in the same way, or come attached with the same loose association?

  • \alpha (lower case alpha) – a generic variable; an angle of a triangle
  • \beta (lower case beta) – a generic variable; a different angle of a triangle
  • \Gamma (capital gamma) – the gamma function; the graph of a function as a set of pairs; a graph in the sense of graph theory
  • \gamma (lower case gamma) – a closed curve (e.g., integrating over the real or complex plane)
  • \Delta (capital delta) – a change; the max degree of a graph
  • \delta (lower case delta) – a change; a small positive value that you may choose; an impulse
  • \varepsilon (lower case epsilon) – a small arbitrary positive real number; an error rate
  • \zeta (lower case zeta) – the Riemann zeta function; a complex variable (like z)
  • \eta (lower case eta) – an error rate; a step size in an algorithm
  • \Theta (capital theta) – an angle; asymptotic equivalence
  • \theta (lower case theta) – an angle
  • \iota (lower case iota) – the inclusion function; an injective function or embedding
  • \kappa (lower case kappa) – curvature; connectivity; conditioning
  • \Lambda (capital lambda) – matrix of eigenvalues
  • \lambda (lower case lambda) – an eigenvalue of a matrix
  • \mu (lower case mu) – a measure; a mean;
  • \nu (lower case nu) – None. I hate this letter because I think it’s hard to draw and it’s too close to v.
  • \Xi (capital xi, like “ksee”) – None. It’s a weird letter that only a math troll would love. Too close to \equiv and conflicts with drawing bars on top of variables.
  • \xi (lower case xi) – a complex variable; I often draw it like a lightning bolt with a hat.
  • \Pi (capital pi) – a product;
  • \pi (lower case pi) – the constant ≈ 3.14; a projection
  • \rho (lower case rho) – projection; a rate; a rotation; a correlation
  • \Sigma (upper case sigma) – sum; an alphabet of symbols; covariance matrix
  • \sigma (lower case sigma) – standard deviation; a symmetry; a reflection; a sign +/-1.
  • \tau (lower case tau) – a symmetry; a translation; what I use when I really want \pi but it would be confusing; I draw it standing like a bird holding up one of its legs.
  • \Upsilon (capital upsilon) – ??
  • \upsilon (lower case upsilon) – None. Too close to “u” and “v” to be used.
  • \Phi (capital phi, like “fai”, though many pronounce “fee”) – a general function; a potential function
  • \phi or \varphi (lower case phi) – a function/morphism; golden ratio; totient function
  • \chi (lower case chi, like “kai”) – a character; a statistic; the indicator function of an event; Euler characteristic
  • \Psi (capital psi, like “sai”) – a function/morphism;
  • \psi (lower case psi) – a function/morphism
  • \Omega (capital omega) – a lower bound
  • \omega (lower case omega) – a lower bound; a complex cube root of 1

Musings on A New Interface for Mathematics

This essay is a slightly modified version of the closing chapter of A Programmer’s Introduction to Mathematics.

We are no longer constrained by pencil and paper. The symbolic shuffle should no longer be taken for granted as the fundamental mechanism for understanding quantity and change. Math needs a new interface.

–Bret Victor, “Kill Math”

Math is a human activity. It’s messy and beautiful, complicated and elegant, useful and bull-headedly frustrating. But in reading this book, A Programmer’s Introduction to Mathematics, my dream is that you have found the attitude, confidence, and enough prerequisite knowledge to continue to engage with mathematics beyond these pages. I hope that you will find the same joy that I have in the combination of math and programming.

In these closing words, I’d like to explore a vision for how mathematics and software can grow together. Much of our effort in this book involved understanding notation, and using our imagination to picture arguments written on paper. In contrast, there’s a growing movement that challenges mathematics to grow beyond its life on a chalkboard.

One of the most visible proponents of this view is Bret Victor. If you haven’t heard of him or seen his fantastic talks, please stop reading now and go watch his talk, “Inventing on Principle.” It’s worth every minute. Victor’s central thesis is that creators must have an immediate connection to their work. As such, Victor finds it preposterous that programmers often have to write code, compile, run, debug, and repeat every time they make a change. Programmers shouldn’t need to simulate a machine inside their head when designing a program—there’s a machine sitting right there that can perform the logic perfectly!

Victor reinforces his grand, yet soft-spoken ideas with astounding prototypes. But his ideas are deeper than a flashy user interface. Victor holds a deep reverence for ideas and enabling creativity. He doesn’t want to fundamentally change the way people interact with their music library. He wants to fundamentally change the way people create new ideas. He wants to enable humans to think thoughts that could not previously have been thought at all. You might wonder what one could possibly mean by “think new thoughts,” but fifteen minutes of Victor’s talk will show you, and make you disbelieve how we ever made do with the typical software write-compile-run loop. His demonstrations rival the elegance of the finest mathematical proofs.

Just as Lamport’s structured proof hierarchies and automated assistants are his key to navigating complex proofs, and similarly to how Atiyah’s most effective tool is a tour of ideas that pique his interest, Victor feels productive when he has an immediate connection with his work. A large part of it is having the thing you’re creating react to modifications in real time. Another aspect is simultaneously seeing all facets relevant to your inquiry. Rather than watch a programmed car move over time, show the entire trajectory for a given control sequence, the view updating as the control sequence updates. Victor demonstrates this to impressive effect. (It’s amusing to see an audience’s wild applause for this, when the same people might easily have groaned as students being asked to plot the trajectories of a differential equation, despite the two concepts being identical. No doubt it is related to the use of a video game.)

It should not surprise you, then, that Victor despises mathematical notation. In his essay “Kill Math,” Victor argues that a pencil and paper is the most antiquated and unhelpful medium for using mathematics. Victor opines on what a shame it is that so much knowledge is only accessible to those who have the unnatural ability to manipulate symbols on paper. How many good ideas were never thought because of that high bar?

One obvious reason for the ubiquity of mathematical notation is an accident of history’s most efficient information distribution systems, the printing press and later the text-based internet. But given our fantastic new technology—virtual reality, precise sensors, machine learning algorithms, brain-computer interfaces—how is it that mathematics is left in the dust? Victor asks all these questions and more.

I have to tread carefully here, because mathematics is a large part of my identity. When I hear “kill math,” my lizard brain shoots sparks of anger. For me, this is a religious issue deeper than my favorite text editor. Even as I try to remain objective and tactful, take what I say with a grain of salt.

Overall, I agree with Victor’s underlying sentiment. Lots of people struggle with math, and a better user interface for mathematics would immediately usher in a new age of enlightenment. This isn’t an idle speculation. It has happened time and time again throughout history. The Persian mathematician Muhammad ibn Musa al-Khwarizmi invented algebra (though without the symbols for it), which revolutionized mathematics, elevating it above arithmetic and classical geometry and quickly spreading across the globe. Make no mistake, the invention of algebra literally enabled average people to do contemporarily advanced mathematics. I’m surprised Victor does not reference algebra as a perfect example of a tool for thinking new thoughts, even if only as a prelude to arguing that its time has passed.

And it only gets better, deeper, and more nuanced. Shortly after the printing press was invented, French mathematicians invented modern symbolic notation for algebra, allowing mathematics to scale up in complexity. Symbolic algebra was a new user interface that birthed countless new thoughts. Without this, for example, mathematicians would never have discovered the connections between algebra and geometry that are so prevalent in modern mathematics and which lay the foundation of modern physics. Later came the invention of set theory, and shortly after category theory, which were each new and improved user interfaces that allowed mathematicians to express deeper, more unified, and more nuanced ideas than was previously possible.

Meanwhile, many of Victor’s examples of good use of his prototypes are “happy accidents.” By randomly fiddling with parameters (and immediately observing the effect), Victor stumbles upon ideas that would never occur without the immediacy. To be sure, serendipity occurs in mathematics as well. Recall Andrew Wiles fumbling in his dark room looking for a light switch. Many creative aspects of mathematics involve luck, good fortune, and “eureka” moments, but there is nowhere near the same immediacy.

Immediacy makes it dreadfully easy to explore examples, which is one of the most important techniques I hope you take away from this book! But what algebraic notation and its successors bring to the table beyond happenstance is to scale in complexity beyond the problem at hand. While algebra limits you in some ways—you can’t see the solutions to the equations as you write them—it frees you in other ways. You need not know how to find the roots of a polynomial before you can study them. You need not have a complete description of a group before you start finding useful homomorphisms. As Sir Arthur Eddington said, group theory studies operations that are as unknown as the quantities that they operate on. We didn’t need to understand precisely how matrices correspond to linear maps before studying them, as might be required to provide a useful interface meeting Victor’s standards. Indeed, it was algebraic grouping and rearranging (with cognitive load reduced by passing it off to paper) that provided the derivation of matrices in the first place.

Then there are the many “interfaces” that we’ve even seen in this book: geometry and the Cartesian plane, graphs with vertices and edges, pyramids of balls with arrows, drawings of arcs that we assert are hyperbolic curves, etc. Mathematical notation goes beyond “symbol manipulation,” because any picture you draw to reason about a mathematical object is literally mathematical notation.

Ultimately I agree with Victor on his underlying principles. We share inspirations from computing history. We appreciate many of the same successes. However, I see a few ways Victor’s work falls short of enabling new modes of thought, particularly insofar as it aims to replace mathematical notation. I’ll outline the desiderata I think a new interface for mathematics must support if it hopes to replace notation.

  1. Counterfactual reasoning: The interface must support reasoning about things that cannot logically exist.
  2. Meaning assignment: The interface must support assigning arbitrary semantic meaning to objects.
  3. Flexible complexity: The interface should be as accessible to a child learning algebra as to a working mathematician.
  4. Incrementalism: Adapting the interface to study a topic must not require encoding extensive prior knowledge about that topic.

The last two properties are of particular importance for any interface. Important interfaces throughout history satisfy the last two, including spoken language, writing, most tools for making art and music, spreadsheets, touchscreens and computer mice, keyboards (layouts of buttons and toggles in general, of which QWERTY is one), and even the classic text editors vim and emacs—anyone can use them in a basic fashion, while experts dazzle us with them.

Let’s briefly explore each desired property.

Counterfactual Reasoning

Because mathematical reasoning can be counterfactual, any system for doing mathematics must allow for the possibility that the object being reasoned about cannot logically exist. We’ve seen this time and again in this book when we do proof by contradiction: we assume to the contrary that some object {A} exists, and we conclude via logic that 1 = 2 or some other false statement, and then {A}, which we handled as concretely as we would throw a ball, suddenly never existed to begin with. There is no largest prime, but I can naively assume that there is and explore what happens when I square it. Importantly, the interface need not encode counterfactual reasoning literally. It simply needs to support the task of counterfactual reasoning by a human.

Lumped in with this is population reasoning. I need to be able to reason about the entire class of all possible objects satisfying some properties. The set of all algorithms that compute a function (even if no such algorithm exists), or the set of all distance-preserving functions of an arbitrary space. These kinds of deductions are necessary to organize and synthesize ideas from disparate areas of math together (connecting us to “Flexible complexity” below).

A different view is that a useful interface for mathematics must necessarily allow the mathematician to make mistakes. But part of the point of a new interface was to avoid the mistakes and uncertainty that pencil and paper make frequent! It’s not entirely clear to me whether counterfactual reasoning necessarily enables mistakes. It may benefit from a tradeoff between the two extremes.

Meaning Assignment

One of the handiest parts of mathematical notation is being able to draw an arbitrary symbol and imbue it with arbitrary semantic meaning. {N} is a natural number by fiat. I can write {f(ab) = f(a)f(b)} and overload which multiplication means what. I can define a new type of arrow {\hookrightarrow} on the fly and say “this means injective map.”

This concept is familiar in software, but the defining feature in mathematics is that one need not know how to implement it to assert it and then study it. This ties in with “Incrementalism” below. Anything I can draw, I can give logical meaning.

Ideally the interface also makes the assignment and management of meaning easy. That is, if I’ve built up an exploration of a problem involving pennies on a table, I should easily be able to change those pennies to be coins of arbitrary unknown denomination. And then allow them to be negative-valued coins. And then give them a color as an additional property. And it should be easy to recall what semantics are applied to which objects later. If each change requires me to redo large swaths of work (as many programs built specifically to explore such a problem would), the interface will limit me. With algebraic notation, I could simply add another index, or pull out a colored pencil (or pretend it’s a color with shading), and continue as before. In real life I just say the word, even if doing so makes the problem drastically more difficult.

Flexible Complexity

Music is something that exhibits flexible complexity. A child raps the keys of a piano and makes sounds. So too does Ray Charles, though his technique is multifaceted and deliberate.

Mathematics has a similar dynamic range that can accommodate the novice and the expert alike. Anyone can make basic sense of numbers and unknowns. Young children can understand and generate simple proofs. With a decent grasp of algebra, one can compute difficult sums. Experts use algebra to develop theories of physics, write computer programs with provable guarantees, and reallocate their investment portfolios for maximum profit.

On the other hand, most visual interactive explorations of mathematics—as impressive and fun as they are—are single use. Their design focuses on a small universe of applicable ideas, and the interface is more about guiding you toward a particular realization than providing a tool. These are commendable, but when the experience is over one returns to pencil and paper.

The closest example of an interface I’ve seen that meets the kind of flexible complexity I ask of a replacement for mathematics is Ken Perlin’s Chalktalk. Pegged as a “digital presentation and communication language,” the user may draw anything they wish. If the drawing is recognized by the system, it becomes interactive according to some pre-specified rules. For example, draw a circle at the end of a line, and it turns into a pendulum you can drag to swing around. Different pieces are coupled together by drawing arrows; one can plot the displacement of the pendulum by connecting it via an arrow to a plotting widget. Perlin displays similar interactions between matrices, logical circuits, and various sliders and dials.

Chalktalk falls short in that your ability to use it is limited by what has been explicitly programmed into it as a behavior. If you don’t draw the pendulum just right, or you try to connect a pendulum via an arrow to a component that doesn’t understand its output, you hit a wall. To explain to the interface what you mean, you write a significant amount of code. This isn’t a deal breaker, but rather where I personally found the interface struggling to keep up with my desires and imagination. What’s so promising about Chalktalk is that it allows one to offset the mental task of keeping track of interactions that algebraic notation leaves to manual bookkeeping.

Incrementalism

Incrementalism means that if I want to repurpose a tool for a new task, I don’t already need to be an expert in the target task to use the tool on it. If I’ve learned to use a paintbrush to paint a flower on a canvas, I need no woodworking expertise to paint a fence. Likewise, if I want to use a new interface for math to study an optimization problem, using the interface shouldn’t require me to solve the problem in advance. Algebra allows me to pose and reason about an unknown optimum of a function; so must any potential replacement for algebra.

Geometry provides an extended example. One could develop a system in which to study classical geometry, and many such systems exist (Geogebra is a popular one, and quite useful in its own right!). You could enable this system to draw and transform various shapes on demand. You can phrase theorems from Euclidean geometry in it, and explore examples with an immediate observation of the effect of any operation.

Now suppose we want to study parallel lines; it may be as clear as day from simulations that two parallel lines never intersect, but does this fact follow from the inherent properties of a line? Or is it an artifact of the implementation of the simulation? As we remember, efficient geometry algorithms can suffer from numerical instability or fail to behave properly on certain edge cases. Perhaps parallel lines intersect, but simply very far away and the interface doesn’t display it well? Or maybe an interface that does display far away things happens to make non-intersecting lines appear to intersect due to the limitations of our human eyes and the resolution of the screen.

In this system, could one study the possibility of a geometry in which parallel lines always intersect? With the hindsight of our Chapter on group theory, we know such geometries exist (projective geometry has this property), but suppose this was an unknown conjecture. To repurpose our conventional interface for studying geometry would seem to require defining a correct model for the alternative geometry in advance. Worse, it might require us to spend weeks or months fretting over the computational details of that model. We might hard-code an intersection point, effectively asserting that intersections exist. But then we need to specify how two such hard-coded points interact in a compatible fashion, and decide how to render them in a useful way. If it doesn’t work as expected, did we mess up the implementation, or is it an interesting feature of the model? All this fuss before we even know whether this model is worth studying!

This is mildly unfair, as the origins of hyperbolic geometry did, in fact, come from concrete models. The point is that the inventors of this model were able to use the sorts of indirect tools that precede computer-friendly representations. They didn’t need a whole class of new insights to begin. If the model fails to meet expectations early on, they can throw it out without expending the effort that would have gone into representing it within our hypothetical interface.

On the Shoulders of Giants

Most of my objections boil down to the need to create abstractions not explicitly programmed into the interface. Mathematics is a language, and its expressiveness is a core feature. Like language, humans use it primarily to communicate with one another. Like writing, humans use it to record thoughts in times of inspiration, so that memory can be offset to paper and insights can be reproduced faithfully later. Paraphrasing Thurston, mathematics only exists in the social fabric of the people who do it. An interface purporting to replace mathematical notation must build on the shoulders of the existing mathematics community. As Isaac Newton said, “If I have seen further it is by standing on the shoulders of giants.”

The value of Victor’s vision lies in showing us what we struggle to see in our minds. Now let’s imagine an interface that satisfies our desiderata, but also achieves immediacy with one’s work. I can do little more than sketch a dream, but here it is.

Let’s explore a puzzle played on an infinite chessboard, which I first learned from mathematician Zvezdelina Stankova via the YouTube channel Numberphile. You start with an integer grid {\mathbb{N} \times \mathbb{N}}, and in each grid cell {(i, j)} you can have a person or no person. The people are called “clones” because they are allowed to take the following action: if cells {(i+1, j)} and {(i, j+1)} are both empty, then the clone in cell {(i, j)} can split into two clones, which now occupy spaces {(i+1, j), (i, j+1)}, leaving space {(i, j)} vacant. You start with three clones in “prison” cells {(1, 1), (1, 2), (2, 1)}, and the goal is to determine if there is a finite sequence of moves, after which all clones are outside the prison. For this reason, Stankova calls the puzzle “Escape of the Clones.”


An example move in “Escape of the Clones” whereby the solid-bordered clone transforms into the two dotted-border clones.


The starting configuration for the puzzle.

Suppose that our dream interface is sufficiently expressive that it can encode the rules of this puzzle, and even simulate attempts to solve it. If the interface is not explicitly programmed to do this, it would already be a heroic accomplishment of meaning assignment and flexible complexity.

Now after playing with it for a long time, you start to get a feeling that it is impossible to free the clones. We want to use the interface to prove this, and we can’t already know the solution to do so. This is incrementalism.

If we were to follow in Stankova’s footsteps, we’d employ two of the mathematician’s favorite tools: proof by contradiction and the concept of an invariant. The invariant would be the sum of some weights assigned to the initial clones: the clone in cell {(1, 1)} has weight 1, and the clone in cells {(1, 2), (2, 1)} each get weight 1/2. To be an invariant, a clone’s splitting action needs to preserve weight. A simple way to do this is to simply have the cloning operation split a clone’s current weight in half. So a clone in cell {(2, 1)} with weight 1/2 splits into two clones in cells {(2, 2), (3, 1)} each of weight 1/4. We can encode this in the interface, and the interface can verify for us that the invariant is indeed an invariant. In particular, the weight of a clone depends only on its position, so that the weight of a clone in position {(i,j)} is {2^{-(i + j - 2)}}. The interface would determine this and tell us. This is immediacy.

Then we can, with the aid of the interface, compute the weight-sum of any given configuration. The starting region’s weight is 2, and it remains 2 after any sequence of operations. It dawns on us to try filling the entire visible region outside the prison with clones. We have assumed to the contrary that an escape sequence exists, in which the worst case is that it fills up vast regions of the plane. The interface informs us that our egregiously crowded region has weight 1.998283. We then ask the interface to fill the entire complement of the prison with clones (even though that is illegal; the rules imply you must have a finite sequence of moves!). It informs us that weight is also 2. We realize that if any cell is cloneless, as must be true after a finite number of moves, we will have violated the invariant. This is counterfactual reasoning.
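
Here is a sketch of that weight bookkeeping in ordinary Python (the dream interface doesn’t exist, but the arithmetic does):

# the weight of a clone at (i, j) is 2^-(i + j - 2)
def weight(i, j):
    return 2.0 ** -(i + j - 2)

prison = {(1, 1), (1, 2), (2, 1)}
n = 50  # a large finite region; the full quarter-plane has weight 4
outside_prison = sum(
    weight(i, j)
    for i in range(1, n + 1)
    for j in range(1, n + 1)
    if (i, j) not in prison
)
print(outside_prison)  # approaches 2 as n grows, the prison's exact weight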

Frankly, an interface that isn’t explicitly programmed to explore this specific proof—yet enables an exploration that can reveal it in a more profound way than paper, pencil, and pondering could—sounds so intractable that I am tempted to scrap this entire essay in utter disbelief. How can an interface be so expressive without simply becoming a general-purpose programming language? What would prevent it from displaying the same problems that started this inquiry? What precisely is it about the nature of human conversation that makes it so difficult to explain the tweaks involved in exploring a concept to a machine?

While we may never understand such deep questions, it’s clear that abstract logic puzzles and their proofs provide an excellent test bed for proposals. Mathematical puzzles are limited, but rich enough to guide the design of a proposed interface. Games involve simple explanations for humans with complex analyses (flexible complexity), drastically different semantics for abstract objects like chessboards and clones (meaning assignment), there are many games which to this day still have limited understanding by experts (incrementalism), and the insights in many games involve reasoning about hypothetical solutions (counterfactual reasoning).

In his book “The Art of Doing Science and Engineering,” the mathematician and computer scientist Richard Hamming put this difficulty into words quite nicely,

It has rarely proved practical to produce exactly the same product by machines as we produced by hand. Indeed, one of the major items in the conversion from hand to machine production is the imaginative redesign of an equivalent product. Thus in thinking of mechanizing a large organization, it won’t work if you try to keep things in detail exactly the same, rather there must be a larger give-and-take if there is to be a significant success. You must get the essentials of the job in mind and then design the mechanization to do that job rather than trying to mechanize the current version—if you want a significant success in the long run.

Hamming’s attitude about an “equivalent product” summarizes the frustration of writing software. What customers want differs from what they say they want. Automating manual human processes requires arduously encoding the loose judgments made by humans—often inconsistent and based on folk lore and experience. Software almost always falls short of really solving your problem. Accommodating the shortcomings requires a whole extra layer of process.

We write programs to manage our files, and in doing so we lose much of the spatial reasoning that helps people remember where things are. The equivalent product is that the files are stored and retrievable. On the other hand, for mathematics the equivalent product is human understanding. This should be no surprise by now, provided you’ve come to understand the point of view espoused throughout this book. In this it deviates from software. We don’t want to retrieve the files, we want to understand the meaning behind their contents.

My imagination may thus defeat itself by failing to give any ground. If a new interface is to replace pencil and paper mathematics, must I give up the ease of some routine mathematical tasks? Or remove them from my thinking style entirely? Presuming I can achieve the same sorts of understanding—though I couldn’t say how—the method of arrival shouldn’t matter. And yet, this attitude ignores my experience entirely. The manner of insight you gain when doing mathematics is deeply intertwined with the method of inquiry. That’s precisely why Victor’s prototypes allow him to think new thoughts!

Mathematics succeeds only insofar as it advances human understanding. Pencil and paper may be the wrong tool for the next generation of great thinkers. But if we hope to enable future insights, we must understand how and why the existing tools facilitated the great ideas of the past. We must imbue the best features of history into whatever we build. If you, dear programmer, want to build those tools, I hope you will incorporate the lessons and insights of mathematics.

Until then!