Searching for RH Counterexamples — Setting up Pytest

Some mathy-programmy people tell me they want to test their code, but struggle to get set up with a testing framework. I suspect it’s due to a mix of:

  • There are too many choices with a blank slate.
  • Making slightly wrong choices early on causes things to fail in unexpected ways.

I suspect the same concerns apply to general project organization and architecture. Because Python is popular for mathy-programmies, I’ll build a Python project that shows how I organize my projects and and test my code, and how that shapes the design and evolution of my software. I will use Python 3.8 and pytest, and you can find the final code on Github.

For this project, we’ll take advice from John Baez and explore a question that glibly aims to disprove the Riemann Hypothesis:


Let σ(n) be the sum of divisors of n. There are infinitely many n with σ(n)/(n ln(ln(n)) > 1.781. Can you find one? If you can find n > 5040 with σ(n)/(n ln(ln(n)) > 1.782, you’ll have disproved the Riemann Hypothesis.

I don’t expect you can disprove the Riemann Hypothesis this way, but I’d like to see numbers that make σ(n)/(n ln(ln(n)) big. It seems the winners are all multiples of 2520, so try those. The best one between 5040 and a million is n = 10080, which only gives 1.755814.

Initializing the Project

One of the hardest parts of software is setting up your coding environment. If you use an integrated development environment (IDE), project setup is bespoke to each IDE. I dislike this approach, because what you learn when using the IDE is not useful outside the IDE. When I first learned to program (Java), I was shackled to Eclipse for years because I didn’t know how to compile and run Java programs without it. Instead, we’ll do everything from scratch, using only the terminal/shell and standard Python tools. I will also ignore random extra steps and minutiae I’ve built up over the years to deal with minor issues. If you’re interested in that and why I do them, leave a comment and I might follow up with a second article.

This article assumes you are familiar with the basics of Python syntax, and know how to open a terminal and enter basic commands (like ls, cd, mkdir, rm). Along the way, I will link to specific git commits that show the changes, so that you can see how the project unfolds with each twist and turn.

I’ll start by creating a fresh Python project that does nothing. We set up the base directory riemann-divisor-sum, initialize git, create a readme, and track it in git (git add + git commit).

mkdir riemann-divisor-sum
cd riemann-divisor-sum
git init .
echo "# Divisor Sums for the Riemann Hypothesis" >
git add
git commit -m "add empty"

Next I create a Github project at (the name riemann-divisor-sum does not need to be the same, but I think it’s good), and push the project up to Github.

git remote add origin
# instead of "master", my default branch is really "main"
git push -u origin master   

Note, if you’re a new Github user, the “default branch name” when creating a new project may be “master.” I like “main” because it’s shorter, clearer, and nicer. If you want to change your default branch name, you can update to git version 2.28 and add the following to your ~/.gitconfig file.

    defaultBranch = main

Here is what the project looks like on Github as of this single commit.


Next I’ll install the pytest library which will run our project’s tests. First I’ll show what a failing test looks like, by setting up a trivial program with an un-implemented function, and a corresponding test. For ultimate simplicity, we’ll use Python’s built-in assert for the test lines. Here’s the commit.

# in the terminal
mkdir riemann
mkdir tests

# create riemann/ containing:
'''Compute the sum of divisors of a number.'''

def divisor_sum(n: int) -> int:
    raise ValueError("Not implemented.")

# create tests/ containing:
from riemann.divisor import divisor_sum

def test_sum_of_divisors_of_72():
    assert 195 == divisor_sum(72)

Next we install and configure Pytest. At this point, since we’re introducing a dependency, we need a project-specific place to store that dependency. All dependencies related to a project should be explicitly declared and isolated. This page helps explain why. Python’s standard tool is the virtual environment. When you “activate” the virtual environment, it temporarily (for the duration of the shell session or until you run deactivate) points all Python tools and libraries to the virtual environment.

virtualenv -p python3.8 venv
source venv/bin/activate

# shows the location of the overridden python binary path
which python
# outputs: /Users/jeremy/riemann-divisor-sum/venv/bin/python

Now we can use pip as normal and it will install to venv. To declare and isolate the dependency, we write the output of pip freeze to a file called requirements.txt, and it can be reinstalled using pip install -r requirements.txt. Try deleting your venv directory, recreating it, and reinstalling the dependencies this way.

pip install pytest
pip freeze > requirements.txt
git add requirements.txt
git commit -m "requirements: add pytest"

# example to wipe and reinstall
# deactivate
# rm -rf venv
# virtualenv -p python3.8 venv
# source venv/bin/activate
# pip install -r requirements.txt

As an aside, at this step you may notice git mentions venv is an untracked directory. You can ignore this, or add venv to a .gitignore file to tell git to ignore it, as in this commit. We will also have to configure pytest to ignore venv shortly.

When we run pytest (with no arguments) from the base directory, we see our first error:

    from riemann.divisor import divisor_sum
E   ModuleNotFoundError: No module named 'riemann'

Module import issues are a common stumbling block for new Python users. In order to make a directory into a Python module, it needs an file, even if it’s empty. Any code in this file will be run the first time the module is imported in a Python runtime. We add one to both the code and test directories in this commit.

When we run pytest (with no arguments), it recursively searches the directory tree looking for files like * and test_*.py loads them, and treats every method inside those files that are prefixed with “test” as a test. Non-“test” methods can be defined and used as helpers to set up complex tests. Pytest then runs the tests, and reports the failures. For me this looks like

Our first test failure.

Our implementation is intentionally wrong for demonstration purposes. When a test passes, pytest will report it quietly as a “.” by default. See these docs for more info on different ways to run the pytest binary and configure its output report.

In this basic pytest setup, you can put test files wherever you want, name the files and test methods appropriately, and use assert to implement the tests themselves. As long as your modules are set up properly, as long as imports are absolute (see this page for gory details on absolute vs. relative imports), and as long as you run pytest from the base directory, pytest will find the tests and run them.

Since pytest searches all directories for tests, this includes venv and __pycache__, which magically appears when you create python modules (I add __pycache__ to gitignore). Sometimes package developers will include test code, and pytest will then run those tests, which often fail or clutter the output. A virtual environment also gets large as you install big dependencies (like numpy, scipy, pandas), so this makes pytest slow to search for tests to run. To alleviate, the --norecursedirs command line flag tells pytest to skip directories. Since it’s tedious to type --norecursedirs='venv __pycache__' every time you run pytest, you can make this the default behavior by storing the option in a configuration file recognized by pytest, such as setup.cfg. I did it in this commit.

Some other command line options that I use all the time:

  • pytest test/dir to test only files in that directory, or pytest test/dir/ to test only tests in that file.
  • pytest -k STR to only run tests whose name contains “STR”
  • pytest -s to see see any logs or print statements inside tested code
  • pytest -s to allow the pdb/ipdb debugger to function and step through a failing test.

Building up the project

Now let’s build up the project. My general flow is as follows:

  1. Decide what work to do next.
  2. Sketch out the interface for that work.
  3. Write some basic (failing, usually lightweight) tests that will pass when the work is done.
  4. Do the work.
  5. Add more nuanced tests if needed, based on what is learned during the work.
  6. Repeat until the work is done.

This strategy is sometimes called “the design recipe,” and I first heard about it from my undergraduate programming professor John Clements at Cal Poly, via the book “How to Design Programs.” Even if I don’t always use it, I find it’s a useful mental framework for getting things done.

For this project, I want to search through positive integers, and for each one I want to compute a divisor sum, do some other arithmetic, and compare that against some other number. I suspect divisor sum computations will be the hard/interesting part, but to start I will code up a slow/naive implementation with some working tests, confirm my understanding of the end-to-end problem, and then improve the pieces as needed.

In this commit, I implement the naive divisor sum code and tests. Note the commit also shows how to tell pytest to test for a raised exception. In this commit I implement the main search routine and confirm John’s claim about n=10080 (thanks for the test case!).

These tests already showcase a few testing best practices:

  • Test only one behavior at a time. Each test has exactly one assertion in it. This is good practice because when a test fails you won’t have to dig around to figure out exactly what went wrong.
  • Use the tests to help you define the interface, and then only test against that interface. The hard part about writing clean and clear software is defining clean and clear interfaces that work together well and hide details. Math does this very well, because definitions like \sigma(n) do not depend on how n is represented. In fact, math really doesn’t have “representations” of its objects—or more precisely, switching representations is basically free, so we don’t dwell on it. In software, we have to choose excruciatingly detailed representations for everything, and so we rely on the software to hide those details as much as possible. The easiest way to tell if you did it well is to try to use the interface and only the interface, and tests are an excuse to do that, which is not wasted effort by virtue of being run to check your work.

Improving Efficiency

Next, I want to confirm John’s claim that n=10080 is the best example between 5041 and a million. However, my existing code is too slow. Running the tests added in this commit seems to take forever.

We profile to confirm our suspected hotspot:

>>> import cProfile
>>> from riemann.counterexample_search import best_witness
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
54826    3.669    0.000    3.669    0.000<genexpr>)

As expected, computing divisor sums is the bottleneck. No surprise there because it makes the search take quadratic time. Before changing the implementation, I want to add a few more tests. I copied data for the first 50 integers from OEIS and used pytest’s parameterize feature since the test bodies are all the same. This commit does it.

Now I can work on improving the runtime of the divisor sum computation step. Originally, I thought I’d have to compute the prime factorization to use this trick that exploits the multiplicativity of \sigma(n), but then I found this approach due to Euler in 1751 that provides a recursive formula for the sum and skips the prime factorization. Since we’re searching over all integers, this allows us to trade off the runtime of each \sigma(n) computation against the storage cost of past \sigma(n) computations. I tried it in this commit, using python’s built-in LRU-cache wrapper to memoize the computation. The nice thing about this is that our tests are already there, and the interface for divisor_sum doesn’t change. This is on purpose, so that the caller of divisor_sum (in this case tests, also client code in real life) need not update when we improve the implementation. I also ran into a couple of stumbling blocks implementing the algorithm (I swapped the order of the if statements here), and the tests made it clear I messed up.

However, there are two major problems with that implementation.

  1. The code is still too slow. best_witness(100000) takes about 50 seconds to run, almost all of which is in divisor_sum.
  2. Python hits its recursion depth limit, and so the client code needs to eagerly populate the divisor_sum cache, which is violates encapsulation. The caller should not know anything about the implementation, nor need to act in a specific way to accommodate hidden implementation details.

I also realized after implementing it that despite the extra storage space, the runtime is still O(n^{3/2}), because each divisor-sum call requires O(n^{1/2}) iterations of the loop. This is just as slow as a naive loop that checks divisibility of integers up to \sqrt{n}. Also, a naive loop allows me to plug in a cool project called numba that automatically speeds up simple Python code by compiling it in place. Incidentally, numba is known to not work with lru_cache, so I can’t tack it on my existing implementation.

So I added numba as a dependency and drastically simplified the implementation. Now the tests run in 8 seconds, and in a few minutes I can upgrade John’s claim that n=10080 is the best example between 5041 and a million, to the best example between 5041 and ten million.

Next up

This should get you started with a solid pytest setup for your own project, but there is a lot more to say about how to organize and run tests, what kinds of tests to write, and how that all changes as your project evolves.

For this project, we now know that the divisor-sum computation is the bottleneck. We also know that the interesting parts of this project are yet to come. We want to explore the patterns in what makes these numbers large. One way we could go about this is to split the project into two components: one that builds/manages a database of divisor sums, and another that analyzes the divisor sums in various ways. The next article will show how the database set up works. When we identify relevant patterns, we can modify the search strategy to optimize for that. As far as testing goes, this would prompt us to have an interface layer between the two systems, and to add fakes or mocks to test the components in isolation.

After that, there’s the process of automating test running, adding tests for code quality/style, computing code coverage, adding a type-hint checker test, writing tests that generate other tests, etc.

If you’re interested, let me know which topics to continue with. I do feel a bit silly putting so much pomp and circumstance around such a simple computation, but hopefully the simplicity of the core logic makes the design and testing aspects of the project clearer and easier to understand.

Taylor Series and Accelerometers

In my book, A Programmer’s Introduction to Mathematics, I describe the Taylor Series as a “hammer for every nail.” I learned about another nail in the design of modern smartphone accelerometers from “Eight Amazing Engineering Stories” by Hammack, Ryan, and Ziech, which I’ll share here.

These accelerometers are designed using a system involving three plates, which correspond to two capacitors. A quick recap on my (limited) understanding of how capacitors work. A capacitor involving two conductive plates looks like this:

A two-plate capacitor hooked up to a battery. (source)

The voltage provided by the battery pushes electrons along the negative direction, or equivalently pushing “charge” along the positive direction (see the difference between charge flow and election flow). These elections build up in the plate labeled -Q, and the difference in charge across the two plates generates an electric field. If that electric field is strong enough, the electrons can jump the gap to the positive plate and complete the circuit. Otherwise, the plate reaches “capacity” and current stops flowing. Whether the jump happens or the current stops depends on the area of the plate A, the distance between the plates d, and the properties of the material between the plates, the last one is called the “dielectric constant” \varepsilon. (Nb., I’m not sure why it doesn’t depend on the material the plate is composed of, but I imagine it’s smooshed into the dielectric constant if necessary) This relationship is summarized by the formula

\displaystyle C = \frac{\varepsilon A}{d}

Then, an external event can cause the plates to move close enough together so that the electrons can jump the gap and current can begin to flow. This discharges the negatively charged plate.

A naive, Taylor-series-free accelerometer could work as follows:

  1. Allow the negatively charged plate to wobble a little bit by fixing just one end of the plate, pictured like a diving board (a cantilever).
  2. The amount of wobble will be proportional to the force of acceleration due to Hooke’s law for springs.
  3. When displaced by a distance of \delta, the capacitance in the plate changes to C = \frac{\varepsilon A}{d - \delta}.
  4. Use the amount of discharge to tell how much the plate displaced.

This is able to measure the force of acceleration in one dimension, and so thee of these devices are arranged in perpendicular axes to allow one to measure acceleration in 3-dimensional space.

The problem with this design is that C = \frac{\varepsilon A}{d - \delta} is a nonlinear change in capacitance with respect to the amount of displacement. To see how nonlinear, expand this as a Taylor series:

\displaystyle \begin{aligned} C &= \frac{\varepsilon A}{d - \delta} \\ &= \frac{\varepsilon A}{d} \left ( \frac{1}{1 -  \frac{\delta}{d}} \right ) \\ &= \frac{\varepsilon A}{d} \left ( 1 + \frac{\delta}{d} + \left ( \frac{\delta}{d} \right )^2 + O_{\delta \to 0}(\delta^3) \right ) \end{aligned}

I’m using the big-O notation O_{\delta \to 0}(\delta^3) to more rigorously say that I’m “ignoring” all cubic and higher terms. I can do this because in these engineering systems (I’m taking Hammack at his word here), the quantity (\delta / d)^2 is meaningfully large, but later terms like (\delta / d)^3 are negligibly small. Of course, this is only true when the displacement \delta is very small compared to d, which is why the big-O has a subscript \delta \to 0.

Apparently, working backwards through the nonlinearity in the capacitance change is difficult enough to warrant changing the design of the system. (I don’t know why this is difficult, but I imagine it has to do with the engineering constraints of measurement devices; please do chime in if you know more)

The system design that avoids this is a three-plate system instead of a two-plate system.

A three-plate system design. The middle plate wobbles between two charged plates. (source)

In this system, the middle plate moves back and forth between two stationary plates that are connected to a voltage source. As it moves away from one and closer to the other, the increased capacitance on one side is balanced by the decreased capacitance on the other. The Taylor series shows how these two changes cancel out on the squared term only.

If C_1 = \frac{\varepsilon A}{d - \delta} represents the changed capacitance of the left plate (the middle plate moves closer to it), and C_2 = \frac{\varepsilon A}{d + \delta} represents the right plate (the middle plate moves farther from it), then we expand the difference in capacitances via Taylor series (using the Taylor series for 1/(1-x) for both, but in the 1 + \delta/d case it’s 1 / (1 - (-x))).

\displaystyle \begin{aligned} C_1 - C_2 &= \frac{\varepsilon A}{d - \delta} - \frac{\varepsilon A}{d + \delta} \\ &= \frac{\varepsilon A}{d} \left ( \frac{1}{1 - \frac{\delta}{d}} - \frac{1}{1 + \frac{\delta}{d}} \right ) \\ &= \frac{\varepsilon A}{d} \left ( 1 + \frac{\delta}{d} + \left ( \frac{\delta}{d} \right )^2 + O_{\delta \to 0}(\delta^3) - 1 + \frac{\delta}{d} - \left ( \frac{\delta}{d} \right )^2 + O_{\delta \to 0}(\delta^3) \right ) \\ &= \frac{\varepsilon A}{d} \left ( \frac{2\delta}{d} + O_{\delta \to 0}(\delta^3) \right ) \end{aligned}

Again, since the cubic and higher terms are negligibly small, we can “ignore” those parts. What remains is a linear response to the change in the middle plate’s displacement. This makes it significantly easier to measure. Because we’re measuring the difference in capacitance, this design is called a “differential capacitor.”

Though the math is tidy in retrospect, I marvel at how one might have conceived of this design from scratch. Did the inventor notice the symmetries in the Taylor series approximations could be arranged to negate each other? Was there some other sort of “physical intuition” at play?

Until next time!

Silent Duels—Parsing the Construction

Last time we discussed the setup for the silent duel problem: two players taking actions in [0,1], player 1 gets n chances to act, player 2 gets m, and each knows their probability of success when they act.

The solution is in a paper of Rodrigo Restrepo from the 1950s. In this post I’ll start detailing how I study this paper, and talk through my thought process for approaching a bag of theorems and proofs. If you want to follow along, I re-typeset the paper on Github.

Game Theory Basics

The Introduction starts with a summary of the setting of game theory. I remember most of this so I will just summarize the basics of the field. Skip ahead if you already know what the minimax theorem is, and what I mean when I say the “value” of a game.

A two-player game consists of a set of actions for each player—which may be finite or infinite, and need not be the same for both players—and a payoff function for each possible choice of actions. The payoff function is interpreted as the “utility” that player 1 gains and player 2 loses. If the payoff is negative, you interpret it as player 1 losing utility to player 2. Utility is just a fancy way of picking a common set of units for what each player treasures in their heart of hearts. Often it’s stated as money and we assume both players value cash the same way. Games in which the utility is always “one player gains exactly the utility lost by the other player” are called zero-sum.

With a finite set of actions, the payoff function is a table. For rock-paper-scissors the table is:

Rock, paper: -1
Rock, scissors: 1
Rock, rock: 0
Paper, paper: 0
Paper, scissors: -1
Paper, rock: 1
Scissors, paper: 1
Scissors, scissors: 0
Scissors, rock: -1

You could arrange this in a matrix and analyze the structure of the matrix, but we won’t. It doesn’t apply to our forthcoming setting where the players have infinitely many strategies.

A strategy is a possibly-randomized algorithm (whose inputs are just the data of the game, not including any past history of play) that outputs an action. In some games, the optimal strategy is to choose a single action no matter what your opponent does. This is sometimes called a pure, dominating strategy, not because it dominates your opponent, but because it’s better than all of your other options no matter what your opponent does. The output action is deterministic.

However, as with rock-paper-scissors, the optimal strategy for most interesting games requires each player to act randomly according to a fixed distribution. Such strategies are called mixed or randomized. For rock-paper-scissors, the optimal strategy is to choose rock, paper, and scissors with equal probability.  Computers are only better than humans at rock-paper-scissors because humans are bad at behaving consistently and uniformly random.

The famous minimax theorem says that every two-player zero-sum game has an optimal strategy for each player, which is possibly randomized. This strategy is optimal in the sense that it maximizes your expected winnings no matter what your opponent does. However, if your opponent is playing a particularly suboptimal strategy, the minimax solution might not be as good as a solution that takes advantage of the opponent’s dumb choices. A uniform random rock-paper-scissors strategy is not optimal if your opponent always plays “rock.”  However, the optimal strategy doesn’t need special knowledge or space to store information about past play. If you played against God, you would blindly use the minimax strategy and God would have no upper hand. I wonder if the pope would have excommunicated me for saying that in the 1600’s.

The expected winnings for player 1 when both players play a minimax-optimal strategy is called the value of the game, and this number is unique (even if there are possibly multiple optimal strategies). If a game is symmetric—meaning both players have the same actions and the payoff function is symmetric—then the value is guaranteed to be zero. The game is fair.

The version of the minimax theorem that most people use (in particular, the version that often comes up in theoretical computer science) shows that finding an optimal strategy is equivalent to solving a linear program. This is great because it means that any such (finite) game is easy to solve. You don’t need insight; just compile and run. The minimax theorem is also true for sufficiently well-behaved continuous action spaces. The silent duel is well-behaved, so our goal is to compute an explicit, easy-to-implement strategy that the minimax theorem guarantees exists. As a side note, here is an example of a poorly-behaved game with no minimax optimum.

While the minimax theorem guarantees optimal strategies and a value, the concept of the “value” of the game has an independent definition:

Let X, Y be finite sets of actions for players 1, 2 respectively, and p(x), q(y) be strategies, i.e., probability distributions over X and Y so that p(x) is the probability that x is chosen. Let \Psi(x, y) be the payoff function for the game. The value of the game is a real number v such that there exist two strategies p, q with the two following properties. First, for every fixed y \in Y,

\displaystyle \sum_{x \in X} p(x) \Psi(x, y) \geq v

(no matter what player 2 does, player 1’s strategy guarantees at least v payoff), and for every fixed x \in X,

\displaystyle \sum_{y \in Y} q(y) \Psi(x, y) \leq v

(no matter what player 1 does, player 2’s strategy prevents a loss of more than v).

Since silent duels are continuous, Restrepo opens the paper with the corresponding definition for continuous games. Here a probability distribution is the same thing as a “positive measure with total measure 1.” Restrepo uses F and G for the strategies, and the corresponding statement of expected payoff for player 1 is that, for all fixed actions y \in Y,

\displaystyle \int \Psi(x, y) dF(x) \geq v

And likewise, for all x \in X,

\displaystyle \int \Psi(x, y) dG(y) \leq v

All of this background gets us through the very first paragraph of the Restrepo paper. As I elaborate in my book, this is par for the course for math papers, because written math is optimized for experts already steeped in the context. Restrepo assumes the reader knows basic game theory so we can get on to the details of his construction, at which point he slows down considerably to focus on the details.

Description of the Optimal Strategies

Starting in section 2, Restrepo describes the construction of the optimal strategy, but first he explains the formal details of the setting of the game. We already know the two players are taking n and m actions between 0 \leq t \leq 1, but we also fix the probability of success. Player 1 knows a distribution P(t) on [0,1] for which P(t) is the probability of success when acting at time t. Likewise, player 2 has a possibly different distribution Q(t), and (crucially) P(t), Q(t) both increase continuously on [0,1]. (In section 3 he clarifies further that P satisfies P(0) = 0, P(1) = 1, and P'(t) > 0, likewise for Q(t).) Moreover, both players know both P, Q. One could say that each player has an estimate of their opponent’s firing accuracy, and wants to be optimal compared to that estimate.

The payoff function \Psi(x, y) is defined informally as: 1 if Player one succeeds before Player 2, -1 if Player 2 succeeds first, and 0 if both players exhaust their actions before the end and none succeed. Though Restrepo does not state it, if the players act and succeed at the same time—say both players fire at time t=1—the payoff should also be zero. We’ll see how this is converted to a more formal (and cumbersome!) mathematical definition in a future post.

Next we’ll describe the statement of the fully general optimal strategy (which will be essentially meaningless, but have some notable features we can infer information from), and get a sneak peek at how to build this strategy algorithmically. Then we’ll see a simplified example of the optimal strategy.

The optimal strategy presented depends only on the values n, m (the number of actions each player gets) and their success probability distributions P, Q. For player 1, the strategy splits up [0,1] into subintervals

\displaystyle [a_i, a_{i+1}] \qquad 0 < a_1 < a_2, < \cdots < a_n < a_{n+1} = 1

Crucially, this strategy ignores the initial interval [0, a_1]. In each other subinterval Player 1 attempts an action at a time chosen by a probability distribution specific to that interval, independently of previous attempts. But no matter what, there is some initial wait time during which no action will ever be taken. This makes sense: if player 1 fired at time 0, it is a guaranteed wasted shot. Likewise, firing at time 0.000001 is basically wasted (due to continuity, unless P(t) is obnoxiously steep early on).

Likewise for player 2, the optimal strategy is determined by numbers b_1, \dots, b_m resulting in m intervals [b_j, b_{j+1}] with b_{m+1} = 1.

The difficult part of the construction is describing the distributions dictating when a player should act during an interval. It’s difficult because an interval for player 1 and player 2 can overlap partially. Maybe a_2 = 0.5, a_3 = 0.75 and b_1 = 0.25, b_2 = 0.6. Player 1 knows that Player 2 (using their corresponding minimax strategy) must act before time t = 0.6, and gets another chance after that time. This suggests that the distribution determining when Player 1 should act within [a_2, a_3] may have a discontinuous jump at t = 0.6.

Call F_i the distribution for Player 1 to act in the interval [a_i, a_{i+1}]. Since it is a continuous distribution, Restrepo uses F_i for the cumulative distribution function and dF_i for the probability density function. Then these functions are defined by (note this should be mostly meaningless for the moment)

\displaystyle dF_i(x_i) = \begin{cases} h_i f^*(x_i) dx_i & \textup{ if } a_i < x_i < a_{i+1} \\ 0 & \textup{ if } x_i \not \in [a_i, a_{i+1}] \\ \end{cases}

where f^* is defined as

\displaystyle f^*(t) = \prod_{b_j > t} \left [ 1 - Q(b_j) \right ] \frac{Q'(t)}{Q^2(t) P(t)}.

The constants h_i and h_{i+1} are related by the equation

\displaystyle h_i = [1 - D_i] h_{i+1},


\displaystyle D_i = \int_{a_i}^{a_{i+1}} P(t) dF_i(t)

What can we glean from this mashup of symbols? The first is that (obviously) the distribution is zero outside the interval [a_i, a_{i+1}]. Within it, there is this mysterious h_i that is related to the h_{i+1} used to define the next interval’s probability. This suggests we will likely build up the strategy in reverse starting with F_n as the “base case” (if n=1, then it is the only one).

Next, we notice the curious definition of f^*. It unsurprisingly requires knowledge of both P and Q, but the coefficient is strangely chosen: it’s a product over all failure probabilities (1 - Q(b_j)) of all interval-starts happening later for the opponent.

[Side note: it’s very important that this is a constant; when I first read this, I thought that it was \prod_{b_j > t}[1 - Q(t)], which makes the eventual task of integrating f^* much harder.]

Finally, the last interval (the one ending at t=1) may include the option to simply “wait for a guaranteed hit,” which Restrepo calls a “discrete mass of \alpha at t=1.” That is, F_n may have a different representation than the rest. Indeed, at the end of the paper we will find that Restrepo gives a base-case definition for h_n that allows us to bootstrap the construction.

Player 2’s strategy is the same as Player 1’s, but replacing the roles of P, Q, n, m, a_i, b_j in the obvious way.

The symmetric example

As with most math research, the best way to parse a complicated definition or construction is to simplify the different aspects of the problem until they become tractable. One way to do this is to have only a single action for both players, with P = Q. Restrepo provides a more general example to demonstrate, which results in the five most helpful lines in the paper. I’ll reproduce them here verbatim:

EXAMPLE. Symmetric Game: P(t) = Q(t), and n = m. In this case the two
players have the same optimal strategies; \alpha = 0, and a_k = b_k, k=1, \dots, n. Furthermore

\displaystyle \begin{aligned} P(a_{n-k}) &= \frac{1}{2k+3} & k = 0, 1, \dots, n-1, \\ dF_{n-k}(t) &= \frac{1}{4(k+1)} \frac{P'(t)}{P^3(t)} dt & a_{n-k} < t < a_{n-k+1}. \end{aligned}

Saying \alpha = 0 means there is no “wait until t=1 to guarantee a hit”, which makes intuitive sense. You’d only want to do that if your opponent has exhausted all their actions before the end, which is only likely to happen if they have fewer actions than you do.

When Restrepo writes P(a_{n-k}) = \frac{1}{2k+3}, there are a few things happening. First, we confirm that we’re working backwards from a_n. Second, he’s implicitly saying “choose a_{n-k} such that P(a_{n-k}) has the desired cumulative density.” After a bit of reflection, there’s no other way to specify the a_i except implicitly: we don’t have a formula for P to lean on.

Finally, the definition of the density function dF_{n-k}(t) helps us understand under what conditions the probability function would be increasing or decreasing from the start of the interval to the end. Looking at the expression P'(t) / P^3(t), we can see that polynomials will result in an expression dominated by 1/t^k for some k, which is decreasing. By taking the derivative, an increasing density would have to be built from a P satisfying P''(t) P(t) - 3(P'(t))^2 > 0. However, I wasn’t able to find any examples that satisfy this. Polynomials, square roots, logs and exponentials, all seem to result in decreasing density functions.

Finally, we’ll plot two examples. The first is the most reductive: P(t) = Q(t) = t, and n = m = 1. In this case n=1, and there is only one term k=0, for which a_n = 1/3. Then dF_1(t) = 1/4t^3. (For verification, note the integral of dF_1 on [1/3, 1] is indeed 1).


With just one action and P(t) = Q(t) = t, the region before t=1/3 has zero probability, and the probability decreases from 6.75 to 1/4.

Note that the reason a_n = 1/3 is so nice is that P(t) is so simple. If P(t) were, say, t^2, then a_n should shift to being \sqrt{1/3}. If P(t) were more complicated, we’d have to invert it (or use an approximate search) to find the location a_n for which P(a_n) = 1/3.

Next, we loosen the example to let n=m=4, still with P(t) = Q(t) = t. In this case, we have the same final interval [1/3,1]. The new actions all occur in the time before t=1/3, in the intervals [1/5, 1/3], [1/7, 1/5], [1/9,1/7]. If there were more actions, we’d get smaller inverse-of-odd-spaced intervals approaching zero. The probability densities are now steeper versions of the same 1/4t^3, with the constant getting smaller to compensate for the fact that 1/t^3 gets larger and maintain the normalized distribution. For example, the earliest interval results in \int_{1/9}^{1/7} \frac{1}{16t^3} dt = 1. Closer to zero the densities are somewhat shallower compared to the size of the interval; for example in [1/9, 1/7], the density toward the beginning of the interval is only about twice as large as the density toward the end.


The combination of the four F_i’s for the four intervals in which actions are taken. This is a complete description of the optimal strategy for our simple symmetric version of the silent duel.

Since the early intervals are getting smaller and smaller as we add more actions, the optimal strategy will resemble a burst of action at the beginning, gradually tapering off as the accuracy increases and we work through our budget. This is an explicit tradeoff between the value of winning (lots of early, low probability attempts) and keeping some actions around for the end where you’re likely to succeed.

Next step: get to the example from the general theorem

At this point, we’ve parsed the general statement of the theorem, and while much of it is still mysterious, we extracted some useful qualitative information from the statement, and tinkered with some simple examples.

At this point, I have confidence that the simple symmetric example Restrepo provided is correct; it passed some basic unit tests, like that each dF_i is normalized. My next task in fully understanding the paper is to be able to derive the symmetric example from the general construction. We’ll do this next time, and include a program that constructs the optimal solution for any input.

Until then!


Visualizing an Assassin Puzzle

Over at Math3ma, Tai-Danae Bradley shared the following puzzle, which she also featured in a fantastic (spoiler-free) YouTube video. If you’re seeing this for the first time, watch the video first.

Consider a square in the xy-plane, and let A (an “assassin”) and T (a “target”) be two arbitrary-but-fixed points within the square. Suppose that the square behaves like a billiard table, so that any ray (a.k.a “shot”) from the assassin will bounce off the sides of the square, with the angle of incidence equaling the angle of reflection.

Puzzle: Is it possible to block any possible shot from A to T by placing a finite number of points in the square?

This puzzle found its way to me through Tai-Danae’s video, via category theorist Emily Riehl, via a talk by the recently deceased Fields Medalist Maryam Mirzakhani, who studied the problem in more generality. I’m not familiar with her work, but knowing mathematicians it’s probably set in an arbitrary complex n-manifold.

See Tai-Danae’s post for a proof, which left such an impression on me I had to dig deeper. In this post I’ll discuss a visualization I made—now posted at the end of Tai-Danae’s article—as well as here and below (to avoid spoilers). In the visualization, mouse movement chooses the firing direction for the assassin, and the target is in green. Dragging the target with the mouse updates the position of the guards. The source code is on Github.


The visualization uses d3 library, which was made for visualizations that dynamically update with data. I use it because it can draw SVGs real nice.

The meat of the visualization is in two geometric functions.

  1. Decompose a ray into a series of line segments—its path as it bounces off the walls—stopping if it intersects any of the points in the plane.
  2. Compute the optimal position of the guards, given the boundary square and the positions of the assassin and target.

Both of these functions, along with all the geometry that supports them, is in geometry.js. The rest of the demo is defined in main.js, in which I oafishly trample over d3 best practices to arrive miraculously at a working product. Critiques welcome 🙂

As with most programming and software problems, the key to implementing these functions while maintaining your sanity is breaking it down into manageable pieces. Incrementalism is your friend.

Vectors, rays, rectangles, and ray splitting

We start at the bottom with a Vector class with helpful methods for adding, scaling, and computing norms and inner products.

function innerProduct(a, b) {
  return a.x * b.x + a.y * b.y;

class Vector {
  constructor(x, y) {
    this.x = x;
    this.y = y;

  normalized() { ... }
  norm() { ... }
  add(vector) { ... }
  subtract(vector) { ... }
  scale(length) { ... }
  distance(vector) { ... }
  midpoint(b) { ... }

This allows one to compute the distance between two points, e.g., with vector.subtract(otherVector).norm().

Next we define a class for a ray, which is represented by its center (a vector) and a direction (a vector).

class Ray {
  constructor(center, direction, length=100000) { = center;
    this.length = length;

    if (direction.x == 0 && direction.y == 0) {
      throw "Can't have zero direction";
    this.direction = direction.normalized();

  endpoint() {

  intersects(point) {
    let shiftedPoint = point.subtract(;
    let signedLength = innerProduct(shiftedPoint, this.direction);
    let projectedVector = this.direction.scale(signedLength);
    let differenceVector = shiftedPoint.subtract(projectedVector);

    if (signedLength > 0
        && this.length > signedLength
        && differenceVector.norm() < intersectionRadius) {
      return projectedVector.add(;
    } else {
      return null;

The ray must be finite for us to draw it, but the length we've chosen is so large that, as you can see in the visualization, it's effectively infinite. Feel free to scale it up even longer.


The interesting bit is the intersection function. We want to compute whether a ray intersects a point. To do this, we use the inner product as a decision rule to compute the distance of a point from a line. If that distance is very small, we say they intersect.

In our demo points are not infinitesimal, but rather have a small radius described by intersectionRadius. For the sake of being able to see anything we set this to 3 pixels. If it’s too small the demo will look bad. The ray won’t stop when it should appear to stop, and it can appear to hit the target when it doesn’t.

Next up we have a class for a Rectangle, which is where the magic happens. The boilerplate and helper methods:

class Rectangle {
  constructor(bottomLeft, topRight) {
    this.bottomLeft = bottomLeft;
    this.topRight = topRight;

  topLeft() { ... }
  center() { ... }
  width() { .. }
  height() { ... }
  contains(vector) { ... }

The function rayToPoints that splits a ray into line segments from bouncing depends on three helper functions:

  1. rayIntersection: Compute the intersection point of a ray with the rectangle.
  2. isOnVerticalWall: Determine if a point is on a vertical or horizontal wall of the rectangle, raising an error if neither.
  3. splitRay: Split a ray into a line segment and a shorter ray that’s “bounced” off the wall of the rectangle.

(2) is trivial, computing some x- and y-coordinate distances up to some error tolerance. (1) involves parameterizing the ray and checking one of four inequalities. If the bottom left of the rectangle is (x_1, y_1) and the top right is (x_2, y_2) and the ray is written as \{ (c_1 + t v_1, c_2 + t v_2) \mid t > 0 \}, then—with some elbow grease—the following four equations provide all possibilities, with some special cases for vertical or horizontal rays:

\displaystyle \begin{aligned} c_2 + t v_2 &= y_2 & \textup{ and } \hspace{2mm} & x_1 \leq c_1 + t v_1 \leq x_2 & \textup{ (intersects top)} \\ c_2 + t v_2 &= y_1 & \textup{ and } \hspace{2mm} & x_1 \leq c_1 + t v_1 \leq x_2 & \textup{ (intersects bottom)} \\ c_1 + t v_1 &= x_1 & \textup{ and } \hspace{2mm} & y_1 \leq c_2 + t v_2 \leq y_2 & \textup{ (intersects left)} \\ c_1 + t v_1 &= x_2 & \textup{ and } \hspace{2mm} & y_1 \leq c_2 + t v_2 \leq y_2 & \textup{ (intersects right)} \\ \end{aligned}

In code:

  rayIntersection(ray) {
    let c1 =;
    let c2 =;
    let v1 = ray.direction.x;
    let v2 = ray.direction.y;
    let x1 = this.bottomLeft.x;
    let y1 = this.bottomLeft.y;
    let x2 = this.topRight.x;
    let y2 = this.topRight.y;

    // ray is vertically up or down
    if (epsilon > Math.abs(v1)) {
      return new Vector(c1, (v2 > 0 ? y2 : y1));

    // ray is horizontally left or right
    if (epsilon > Math.abs(v2)) {
      return new Vector((v1 > 0 ? x2 : x1), c2);

    let tTop = (y2 - c2) / v2;
    let tBottom = (y1 - c2) / v2;
    let tLeft = (x1 - c1) / v1;
    let tRight = (x2 - c1) / v1;

    // Exactly one t value should be both positive and result in a point
    // within the rectangle

    let tValues = [tTop, tBottom, tLeft, tRight];
    for (let i = 0; i  epsilon && this.contains(intersection)) {
        return intersection;

    throw "Unexpected error: ray never intersects rectangle!";

Next, splitRay splits a ray into a single line segment and the “remaining” ray, by computing the ray’s intersection with the rectangle, and having the “remaining” ray mirror the direction of approach with a new center that lies on the wall of the rectangle. The new ray length is appropriately shorter. If we run out of ray length, we simply return a segment with a null ray.

  splitRay(ray) {
    let segment = [, this.rayIntersection(ray)];
    let segmentLength = segment[0].subtract(segment[1]).norm();
    let remainingLength = ray.length - segmentLength;

    if (remainingLength < 10) {
      return {
        segment: [, ray.endpoint()],
        ray: null

    let vertical = this.isOnVerticalWall(segment[1]);
    let newRayDirection = null;

    if (vertical) {
      newRayDirection = new Vector(-ray.direction.x, ray.direction.y);
    } else {
      newRayDirection = new Vector(ray.direction.x, -ray.direction.y);

    let newRay = new Ray(segment[1], newRayDirection, length=remainingLength);
    return {
      segment: segment,
      ray: newRay

As you have probably guessed, rayToPoints simply calls splitRay over and over again until the ray hits an input “stopping point”—a guard, the target, or the assassin—or else our finite ray length has been exhausted. The output is a list of points, starting from the original ray’s center, for which adjacent pairs are interpreted as line segments to draw.

  rayToPoints(ray, stoppingPoints) {
    let points = [];
    let remainingRay = ray;

    while (remainingRay) {
      // check if the ray would hit any guards or the target
      if (stoppingPoints) {
        let hardStops = => remainingRay.intersects(p))
          .filter(p => p != null);
        if (hardStops.length > 0) {
          // find first intersection and break
          let closestStop = remainingRay.closestToCenter(hardStops);

      let rayPieces = this.splitRay(remainingRay);
      remainingRay = rayPieces.ray;

    return points;

That’s sufficient to draw the shot emanating from the assassin. This method is called every time the mouse moves.

Optimal guards

The function to compute the optimal position of the guards takes as input the containing rectangle, the assassin, and the target, and produces as output a list of 16 points.

 * Compute the 16 optimal guards to prevent the assassin from hitting the
 * target.
function computeOptimalGuards(square, assassin, target) {

If you read Tai-Danae’s proof, you’ll know that this construction is to

  1. Compute mirrors of the target across the top, the right, and the top+right of the rectangle. Call this resulting thing the 4-mirrored-targets.
  2. Replicate the 4-mirrored-targets four times, by translating three of the copies left by the entire width of the 4-mirrored-targets shape, down by the entire height, and both left-and-down.
  3. Now you have 16 copies of the target, and one assassin. This gives 16 line segments from assassin-to-target-copy. Place a guard at the midpoint of each of these line segments.
  4. Finally, apply the reverse translation and reverse mirroring to return the guards to the original square.

Due to WordPress being a crappy blogging platform I need to migrate off of, the code snippets below have been magically disappearing. I’ve included links to github lines as well.

Step 1 (after adding simple helper functions on Rectangle to do the mirroring):

  // First compute the target copies in the 4 mirrors
  let target1 = target.copy();
  let target2 = square.mirrorTop(target);
  let target3 = square.mirrorRight(target);
  let target4 = square.mirrorTop(square.mirrorRight(target));
  target1.guardLabel = 1;
  target2.guardLabel = 2;
  target3.guardLabel = 3;
  target4.guardLabel = 4;

Step 2:

  // for each mirrored target, compute the four two-square-length translates
  let mirroredTargets = [target1, target2, target3, target4];
  let horizontalShift = 2 * square.width();
  let verticalShift = 2 * square.height();
  let translateLeft = new Vector(-horizontalShift, 0);
  let translateRight = new Vector(horizontalShift, 0);
  let translateUp = new Vector(0, verticalShift);
  let translateDown = new Vector(0, -verticalShift);

  let translatedTargets = [];
  for (let i = 0; i < mirroredTargets.length; i++) {
    let target = mirroredTargets[i];

Step 3, computing the midpoints:

  // compute the midpoints between the assassin and each translate
  let translatedMidpoints = [];
  for (let i = 0; i  t.midpoint(assassin)));

Step 4, returning the guards back to the original square, is harder than it seems, because the midpoint of an assassin-to-target-copy segment might not be in the same copy of the square as the target-copy being fired at. This means you have to detect which square copy the midpoint lands in, and use that to determine which operations are required to invert. This results in the final block of this massive function.

  // determine which of the four possible translates the midpoint is in
  // and reverse the translation. Since midpoints can end up in completely
  // different copies of the square, we have to check each one for all cases.
  function untranslate(point) {
    if (point.x  square.bottomLeft.y) {
      return point.add(translateRight);
    } else if (point.x >= square.bottomLeft.x && point.y <= square.bottomLeft.y) {
      return point.add(translateUp);
    } else if (point.x < square.bottomLeft.x && point.y <= square.bottomLeft.y) {
      return point.add(translateRight).add(translateUp);
    } else {
      return point;

  // undo the translations to get the midpoints back to the original 4-mirrored square.
  let untranslatedMidpoints = [];
  for (let i = 0; i  square.topRight.x && point.y > square.topRight.y) {
      return square.mirrorTop(square.mirrorRight(point));
    } else if (point.x > square.topRight.x && point.y <= square.topRight.y) {
      return square.mirrorRight(point);
    } else if (point.x  square.topRight.y) {
      return square.mirrorTop(point);
    } else {
      return point;


And that’s all there is to it!

Improvements, if I only had the time

There are a few improvements I’d like to make to this puzzle, but haven’t made the time (I’m writing a book, after all!).

  1. Be able to drag the guards around.
  2. Create new guards from an empty set of guards, with a button to “reveal” the solution.
  3. Include a toggle that, when pressed, darkens the entire region of the square that can be hit by the assassin. For example, this would allow you to see if the target is in the only possible safe spot, or if there are multiple safe spots for a given configuration.
  4. Perhaps darken the vulnerable spots by the number of possible paths that hit it, up to some limit.
  5. The most complicated one: generalize to an arbitrary polygon (convex or not!), for which there may be no optional solution. The visualization would allow you to look for a solution using 2-4.

Pull requests are welcome if you attempt any of these improvements.

Until next time!