# Searching for RH Counterexamples — Deploying with Docker

We’re ironically searching for counterexamples to the Riemann Hypothesis.

In this article we’ll deploy the application on a server, so that it can search for RH counterexamples even when I close my laptop.

## Servers and containers

When deploying applications to servers, reproducibility is crucial. You don’t want your application to depend on the details of the computer it’s running on. This is a higher-level version of the same principle behind Python virtual environments, but it applies to collections of programs, possibly written in different languages and running on different computers. In our case, we have a postgres database, the pgmp extension, the populate_database program, and plans for a web server.

The principle of an application not depending on the system it’s running on is called hermeticity (the noun form of hermetic, meaning air-tight). Hermeticity is good for the following reasons. When a server crashes, you don’t have to remember what you did to set it up. Instead you run a build/install that works on any machine. Newcomers also don’t have to guess what unknown aspects of a running server are sensitive. Another benefit is that you can test it on your local machine identically to how it will run in production. It also allows you to easily migrate from one cloud provider to another, which allows you to defer expensive commitments until you have more information about your application’s needs. Finally, if you have multiple applications running on the same server, you don’t want to have their needs conflict with each other, which can happen easily if two applications have dependencies that transitively depend on different versions of the same software. This is called “dependency hell.” In all of these, you protect yourself from becoming dependent on arbitrary choices you made before you knew better.

One industry-strength approach to hermeticity is to use containers. A container is a virtual machine devoted to running a single program, with explicitly-defined exposure to the outside world. We will set up three containers: one to run the database, one for the search application, and (later) one for a web server. We’ll start by deploying them all on the same machine, but could also deploy them to different machines. Docker is a popular containerization system. Before going on, I should stress that Docker, while I like it, is not sacred by any means. In a decade Docker may disappear, but the principle of hermeticity and the need for reproducible deployments will persist.

Docker allows you to describe your container by first starting from an existing (trusted) container, such as one that has an operating system and postgres already installed, and extend it for your application. This includes installing dependencies, fetching the application code from git, copying files into the container from the host system, exposing the container’s network ports, and launching the application. You save the commands that accomplish that in a Dockerfile with some special syntax. To deploy it, you copy the Dockerfile to the server (say, via git) and run docker commands to launch the container. You only have to get the Dockerfile right once, you can test it locally, and then it will work on any server just the same. The only caveat I’ve seen here is that if you migrate to a server with a different processor architecture, the install script (in our case, pip install numba) may fail to find a pre-compiled binary for the target architecture, and it may fall back to compiling from source, which can add additional requirements or force you to change which OS your container is derived from.

This reduces our “set up a new server” script to just a few operations: (1) install docker (2) fetch the repository (3) launch the docker containers from their respective Dockerfiles. In my experience, writing a Dockerfile is no small task, but figuring out how to install stuff is awful in all cases, and doing it for Docker gives you an artifact tracing the steps, and a reasonable expectation of not having to do it again.

Thankfully, you dear readers can skip my head-banging and see the Dockerfiles after I figured it out.

## The Postgres Dockerfile

This commit adds a Dockerfile for the database, and makes some small changes to the project to allow it to run. It has only 15 lines, but it took me a few hours to figure out. The process was similar to installing confusing software on your own machine: try to install, see some error like "missing postgres.h“, go hunt around on the internet to figure out what you have to install to get past the error, and repeat.

Let’s go through each line of the Dockerfile.

FROM postgres:12


The first line defines the container image that this container starts from, which is officially maintained by the Postgres team. Looking at their Dockerfile, it starts from debian:buster-slim, which is a Debian Linux instance that is “slimmed” down to be suitable for docker containers, meaning it has few packages pre-installed. Most importantly, “Debian” tells us what package manager to use (apt-get) in our Dockerfile.

It’s also worth noting at this point that, when docker builds the container, each command in a docker file results in a new image. An image is a serialized copy of all the data in a docker container, so that it can be started or extended easily. And if you change a line halfway through your Dockerfile, docker only has to rebuild images from that step onward. You can publish images on the web, and other docker users can use them as a base. This is like forking a project on Github, and is exactly what happens when Docker executes FROM postgres:12.

ENV POSTGRES_USER docker
ENV POSTGRES_DB divisor


These lines declare configuration for the database that the base postgres image will create when the container is started. The variable names are described in the “Environment Variables” section of the Postgres image’s documentation. The ENV command tells docker to instantiate environment variables (like the PATH variable in a terminal shell), that running programs can access. I’m insecurely showing the password and username here because the server the docker containers will run on won’t yet expose anything to the outside world. Later in this post you will see how to pass an environment variable from the docker command line when the container is run, and you would use something close to that to set configuration secrets securely.

RUN apt-get update \
&& apt-get install -y pgxnclient build-essential libgmp3-dev postgresql-server-dev-12 libmpc-dev


The RUN command allows you to run any shell command you’d like, in this case a command to update apt and install the dependencies needed to build the pgmp extension. This includes gcc and make via build-essential, and the gmp-specific libraries.

RUN apt-get install -y python3.7 python3-setuptools python3-pip python-pip python3.7-dev \
&& pip3 install wheel \
&& pip install six


Next we do something a bit strange. We install python3.7 and pip (because we will need to pip3 install our project’s requirements.txt), but also python2’s pip. Here’s what’s going on. The pgmp postgres extension needs to be built from source, and it has a dependency on python2.7 and the python2-six library. So the first RUN line here installs all the python-related tools we need.

RUN pgxn install pgmp


Then we install the pgmp extension.

COPY . /divisor
WORKDIR "/divisor"


These next two lines copy the current directory on the host machine to the container’s file system, and sets the working directory for all future commands to that directory. Note that whenever the contents of our project change, docker needs to rebuild the image from this step because any subsequent steps like pip install -r requirements.txt might have a different outcome.

RUN python3 -m pip install --upgrade pip
RUN pip3 install -r requirements.txt


Next we upgrade pip (which is oddly required for the numba dependency, though I can’t re-find the Github issue where I discovered this) and install the python dependencies for the project. The only reason this is required is because we included the database schema setup in the python script riemann/postgres_batabase.py. So this makes the container a bit more complicated than absolutely necessary. It can be improved later if need be.

ENV PGUSER=docker
ENV PGDATABASE=divisor


These next lines are environment variables used by the psycopg2 python library to infer how to connect to postgres if no database spec is passed in. It would be nice if this was shared with the postgres environment variables, but duplicating it is no problem.

COPY setup_schema.sh /docker-entrypoint-initdb.d/


The last line copies a script to a special directory specified by the base postgres Dockerfile. The base dockerfile specifies that any scripts in this directory will be run when the container is started up. In our case, we just call the (idempotent) command to create the database. In a normal container we might specify a command to run when the container is started (our search container, defined next, will do this), but the postgres base image handles this for us by starting the postgres database and exposing the right ports.

Finally we can build and run the container

docker build -t divisordb -f divisordb.Dockerfile .
#  ... lots of output ...

docker run -d -p 5432:5432 --name divisordb divisordb:latest


After the docker build command—which will take a while—you will be able to see the built images by running docker images, and the final image will have a special tag divisordb. The run command additionally tells docker to run the container as a daemon (a.k.a. in the background) with -d and to -p to publish port 5432 on the host machine and map it to 5432 on the container. This allows external programs and programs on other computers to talk to the container by hitting 0.0.0.0:5432. It also allows other containers to talk to this container, but as we’ll see shortly that requires a bit more work, because inside a container 0.0.0.0 means the container, not the host machine.

Finally, one can run the following code on the host machine to check that the database is accepting connections.

pg_isready --host 0.0.0.0 --username docker --port 5432 --dbname divisor


If you want to get into the database to run queries, you can run psql with the same flags as pg_isready, or manually enter the container with docker exec -it divisordb bash and run psql from there.

psql --host 0.0.0.0 --username docker --port 5432 --dbname divisor
divisor=# \d
List of relations
Schema |        Name        | Type  | Owner
--------+--------------------+-------+--------
public | riemanndivisorsums | table | docker
public | searchmetadata     | table | docker
(2 rows)


Look at that. You wanted to disprove the Riemann Hypothesis, and here you are running docker containers.

## The Search Container

Next we’ll add a container for the main search application. Before we do this, it will help to make the main entry point to the program a little bit simpler. This commit modifies populate_database.py‘s main routine to use argparse and some sensible defaults. Now we can run the application with just python -m riemann.populate_database.

Then the Dockerfile for the search part is defined in this commit. I’ve copied it below. It’s much simpler than the database, but somehow took just as long for me to build as the database Dockerfile, because I originally chose a base image called “alpine” that is (unknown to me at the time) really bad for Python if your dependencies have compiled C code, like numba does.

FROM python:3.7-slim-buster

RUN apt-get update \
&& apt-get install -y build-essential libgmp3-dev libmpc-dev

COPY . /divisor
WORKDIR "/divisor"

RUN pip3 install -r requirements.txt

ENV PGUSER=docker
ENV PGDATABASE=divisor

ENTRYPOINT ["python3", "-m", "riemann.populate_database"]


The base image is again Debian, with Python3.7 pre-installed.

Then we can build it and (almost) run it

docker build -t divisorsearch -f divisorsearch.Dockerfile .
docker run -d --name divisorsearch --env PGHOST="$PGHOST" divisorsearch:latest  What’s missing here is the PGHOST environment variable, which psycopg2 uses to find the database. The problem is, inside the container “localhost” and 0.0.0.0 are interpreted by the operating system to mean the container itself, not the host machine. To get around this problem, docker maintains IP addresses for each docker container, and uses those to route network requests between containers. The docker inspect command exposes information about this. Here’s a sample of the output $ docker inspect divisordb
[
{
"Id": "f731a78bde50be3de1d77ae1cff6d23c7fe21d4dbe6a82b31332c3ef3f6bbbb4",
"Path": "docker-entrypoint.sh",
"Args": [
"postgres"
],
"State": {
"Status": "running",
"Running": true,
"Paused": false,
...
},
...
"NetworkSettings": {
...
"Ports": {
"5432/tcp": [
{
"HostIp": "0.0.0.0",
"HostPort": "5432"
}
]
},
...
...
}
}
]


The part that matters for us is the ip address, and the following extracts it to the environment variable PGHOST.

export PGHOST=(docker inspect -f "{{ .NetworkSettings.IPAddress }}" divisordb)  Once the two containers are running—see docker ps for the running containers, docker ps -a to see any containers that were killed due to an error, and docker logs to see the container’s logged output—you can check the database to see it’s being populated. divisor=# select * from SearchMetadata order by start_time desc limit 10; start_time | end_time | search_state_type | starting_search_state | ending_search_state ----------------------------+----------------------------+-------------------------------+-----------------------+--------------------- 2020-12-27 03:10:01.256996 | 2020-12-27 03:10:03.594773 | SuperabundantEnumerationIndex | 29,1541 | 31,1372 2020-12-27 03:09:59.160157 | 2020-12-27 03:10:01.253247 | SuperabundantEnumerationIndex | 26,705 | 29,1541 2020-12-27 03:09:52.035991 | 2020-12-27 03:09:59.156464 | SuperabundantEnumerationIndex | 1,0 | 26,705  ## Ship it! I have an AWS account, so let’s use Amazon for this. Rather than try the newfangled beanstalks or lightsails or whatever AWS-specific frameworks they’re trying to sell, for now I’ll provision a single Ubuntu EC2 server and run everything on it. I picked a t2.micro for testing (which is free). There’s a bit of setup to configure and launch the server—such as picking the server image, downloading an ssh key, and finding the IP address. I’ll skip those details since they are not (yet) relevant to the engineering process. Once I have my server, I can ssh in, install docker, git clone the project, and run the deploy script. # install docker, see get.docker.com curl -fsSL https://get.docker.com -o get-docker.sh sudo sh get-docker.sh sudo usermod -aG docker ubuntu # log out and log back in git clone https://github.com/j2kun/riemann-divisor-sum && cd riemann-divisor-sum bash deploy.sh  And it works! Sadly, within an hour the divisorsearch container crashes because the instance runs out of RAM and CPU. Upgrading to a t2.medium (4 GiB RAM), it goes for about 2 hours before exhausting RAM. We could profile it and find the memory hotspots, but instead let’s apply a theorem due to billionaire mathematician Jim Simons: throw money at the problem. Upgrading to a r5.large (16 GiB RAM), and it runs comfortably all day. Four days later, logging back into the VM and and I notice things are sluggish, even though the docker instance isn’t exhausting the total available RAM or CPU. docker stats also shows low CPU usage on divisorsearch. The database shows that it has only got up to 75 divisors, which is just as far as it got when I ran it (not in Docker) on my laptop for a few hours in the last article. Something is amiss, and we’ll explore what happened next time. ## Notes A few notes on improvements that didn’t make it into this article. In our deployment, we rebuild the docker containers each time, even when nothing changes. What one could do instead is store the built images in what’s called a container registry, and pull them instead of re-building them on every deploy. This would only save us a few minutes of waiting, but is generally good practice. We could also use docker compose and a corresponding configuration file to coordinate launching a collection of containers that have dependencies on each other. For our case, the divisorsearch container depended on the divisordb container, and our startup script added a sleep 5 to ensure the latter was running before starting the former. docker compose would automatically handle that, as well as the configuration for naming, resource limits, etc. With only two containers it’s not that much more convenient, given that docker compose is an extra layer of indirection to learn that hides the lower-level commands. In this article we deployed a single database container and a single “search” container. Most of the time the database container is sitting idle while the search container does its magic. If we wanted to scale up, an obvious way would be to have multiple workers. But it would require some decent feature work. A sketch: reorganize the SearchMetadata table so that it contains a state attribute, like “not started”, “started”, or “finished,” then add functionality so that a worker (atomically) asks for the oldest “not started” block and updates the row’s state to “started.” When a worker finishes a block, it updates the database and marks the block as finished. If no “not started” blocks are found, the worker proceeds to create some number of new “not started” blocks. There are details to be ironed out around race conditions between multiple workers, but Postgres is designed to make such things straightforward. Finally, we could reduce the database size by keeping track of a summary of a search block instead of storing all the data in the block. For example, we could record the n and witness_value corresponding to the largest witness_value in a block, instead of saving every n and every witness_value. In order for this to be usable—i.e., for us to be able to say “we checked all possible $n < M$ and found no counterexamples”—we’d want to provide a means to verify the approach, say, by randomly verifying the claimed maxima of a random subset of blocks. However, doing this also precludes the ability to analyze patterns that look at all the data. So it’s a tradeoff. # Optimization Models for Subset Cover In a recent newsletter article I complained about how researchers mislead about the applicability of their work. I gave SAT solvers as an example. People provided interesting examples in response, but what was new to me was the concept of SMT (Satisfiability Modulo Theories), an extension to SAT. SMT seems to have more practical uses than vanilla SAT (see the newsletter for details). I wanted to take some time to explore SMT solvers, and I landed on Z3, an open-source SMT solver from Microsoft. In particular, I wanted to compare it to ILP (Integer Linear Programing) solvers, which I know relatively well. I picked a problem that I thought would work better for SAT-ish solvers than for ILPs: subset covering (explained in the next section). If ILP still wins against Z3, then that would be not so great for the claim that SMT is a production strength solver. All the code used for this post is on Github. ## Subset covering A subset covering is a kind of combinatorial design, which can be explained in terms of magic rings. An adventurer stumbles upon a chest full of magic rings. Each ring has a magical property, but some pairs of rings, when worn together on the same hand, produce a combined special magical effect distinct to that pair. The adventurer would like to try all pairs of rings to catalogue the magical interactions. With only five fingers, how can we minimize the time spent trying on rings? Mathematically, the rings can be described as a set $X$ of size $n$. We want to choose a family $F$ of subsets of $X$, with each subset having size 5 (five fingers), such that each subset of $X$ of size 2 (pairs of rings) is contained in some subset of $F$. And we want $F$ to be as small as possible. Subset covering is not a “production worthy” problem. Rather, I could imagine it’s useful in some production settings, but I haven’t heard of one where it is actually used. I can imagine, for instance, that a cluster of machines has some bug occurring seemingly at random for some point-to-point RPCs, and in tracking down the problem, you want to deploy a test change to subsets of servers to observe the bug occurring. Something like an experiment design problem. If you generalize the “5” in “5 fingers” to an arbitrary positive integer $k$, and the “2” in “2 rings” to $l < k$, then we have the general subset covering problem. Define $M(n, k, l)$ to be the minimal number of subsets of size $k$ needed to cover all subsets of size $l$. This problem was studied by Erdős, with a conjecture subsequently proved by Vojtěch Rödl, that asymptotically $M(n,k,l)$ grows like $\binom{n}{l} / \binom{k}{l}$. Additional work by Joel Spencer showed that a greedy algorithm is essentially optimal. However, all of the constructive algorithms in these proofs involve enumerating all $\binom{n}{k}$ subsets of $X$. This wouldn’t scale very well. You can alternatively try a “random” method, incurring a typically $\log(r)$ factor of additional sets required to cover a $1 - 1/r$ fraction of the needed subsets. This is practical, but imperfect. To the best of my knowledge, there is no exact algorithm, that both achieves the minimum and is efficient in avoiding constructing all $\binom{n}{k}$ subsets. So let’s try using an SMT solver. I’ll be using the Python library for Z3. ## Baseline: brute force Z3 For a baseline, let’s start with a simple Z3 model that enumerates all the possible subsets that could be chosen. This leads to an exceedingly simple model to compare the complex models against. Define boolean variables $\textup{Choice}_S$ which is true if and only if the subset $S$ is chosen (I call this a “choice set”). Define boolean variables $\textup{Hit}_T$ which is true if the subset $T$ (I call this a “hit set”) is contained in a chosen choice set. Then the subset cover problem can be defined by two sets of implications. First, if $\textup{Choice}_S$ is true, then so must all $\textup{Hit}_T$ for $T \subset S$. E.g., for $S = \{ 1, 2, 3 \}$ and $l=2$, we get \displaystyle \begin{aligned} \textup{Choice}_{(1,2,3)} &\implies \textup{Hit}_{(1,2)} \\ \textup{Choice}_{(1,2,3)} &\implies \textup{Hit}_{(1,3)} \\ \textup{Choice}_{(1,2,3)} &\implies \textup{Hit}_{(2,3)} \end{aligned} In Python this looks like the following (note this program has some previously created lookups and data structures containing the variables) for choice_set in choice_sets: for hit_set_key in combinations(choice_set.elements, hit_set_size): hit_set = hit_set_lookup[hit_set_key] implications.append( z3.Implies(choice_set.variable, hit_set.variable))  Second, if $\textup{Hit}_T$ is true, it must be that some $\textup{Choice}_S$ is true for some $S$ containing $T$ as a subset. For example, \displaystyle \begin{aligned} \textup{Hit}_{(1,2)} &\implies \\ & \textup{Choice}_{(1,2,3,4)} \textup{ OR} \\ & \textup{Choice}_{(1,2,3,5)} \textup{ OR} \\ & \textup{Choice}_{(1,2,4,5)} \textup{ OR } \cdots \\ \end{aligned} In code, for hit_set in hit_sets.values(): relevant_choice_set_vars = [ choice_set.variable for choice_set in hit_set_to_choice_set_lookup[hit_set] ] implications.append( z3.Implies( hit_set.variable, z3.Or(*relevant_choice_set_vars)))  Next, in this experiment we’re allowing the caller to specify the number of choice sets to try, and the solver should either return SAT or UNSAT. From that, we can use a binary search to find the optimal number of sets to pick. Thus, we have to limit the number of $\textup{Choice}_S$ that are allowed to be true and false. Z3 supports boolean cardinality constraints, apparently with a special solver to handle problems that have them. Otherwise, the process of encoding cardinality constraints as SAT formulas is not trivial (and the subject of active research). But the code is simple enough: args = [cs.variable for cs in choice_sets] + [parameters.num_choice_sets] choice_sets_at_most = z3.AtMost(*args) choice_sets_at_least = z3.AtLeast(*args)  Finally, we must assert that every $\textup{Hit}_T$ is true. solver = z3.Solver() for hit_set in hit_sets.values(): solver.add(hit_set.variable) for impl in implications: solver.add(impl) solver.add(choice_sets_at_most) solver.add(choice_sets_at_least)  Running it for $n=7, k=3, l=2$, and seven choice sets (which is optimal), we get >>> SubsetCoverZ3BruteForce().solve( SubsetCoverParameters( num_elements=7, choice_set_size=3, hit_set_size=2, num_choice_sets=7)) [(0, 1, 3), (0, 2, 4), (0, 5, 6), (1, 2, 6), (1, 4, 5), (2, 3, 5), (3, 4, 6)] SubsetCoverSolution(status=<SolveStatus.SOLVED: 1>, solve_time_seconds=0.018305063247680664)  Interestingly, Z3 refuses to solve marginally larger instances. For instance, I tried the following and Z3 times out around $n=12, k=6$ (about 8k choice sets): from math import comb for n in range(8, 16): k = int(n / 2) l = 3 max_num_sets = int(2 * comb(n, l) / comb(k, l)) params = SubsetCoverParameters( num_elements=n, choice_set_size=k, hit_set_size=l, num_choice_sets=max_num_sets) print_table( params, SubsetCoverZ3BruteForce().solve(params), header=(n==8))  After taking a long time to generate the larger models, Z3 exceeds my 15 minute time limit, suggesting exponential growth: status solve_time_seconds num_elements choice_set_size hit_set_size num_choice_sets SolveStatus.SOLVED 0.0271 8 4 3 28 SolveStatus.SOLVED 0.0346 9 4 3 42 SolveStatus.SOLVED 0.0735 10 5 3 24 SolveStatus.SOLVED 0.1725 11 5 3 33 SolveStatus.SOLVED 386.7376 12 6 3 22 SolveStatus.UNKNOWN 900.1419 13 6 3 28 SolveStatus.UNKNOWN 900.0160 14 7 3 20 SolveStatus.UNKNOWN 900.0794 15 7 3 26  ## An ILP model Next we’ll see an ILP model for the sample problem. Note there are two reasons I expect the ILP model to fall short. First, the best solver I have access to is SCIP, which, despite being quite good is, in my experience, about an order of magnitude slower than commercial alternatives like Gurobi. Second, I think this sort of problem seems to not be very well suited to ILPs. It would take quite a bit longer to explain why (maybe another post, if you’re interested), but in short well-formed ILPs have easily found feasible solutions (this one does not), and the LP-relaxation of the problem should be as tight as possible. I don’t think my formulation is very tight, but it’s possible there is a better formulation. Anyway, the primary difference in my ILP model from brute force is that the number of choice sets is fixed in advance, and the members of the choice sets are model variables. This allows us to avoid enumerating all choice sets in the model. In particular, $\textup{Member}_{S,i} \in \{ 0, 1 \}$ is a binary variable that is 1 if and only if element $i$ is assigned to be in set $S$. And $\textup{IsHit}_{T, S} \in \{0, 1\}$ is 1 if and only if the hit set $T$ is a subset of $S$. Here “$S$” is an index over the subsets, rather than the set itself, because we don’t know what elements are in $S$ while building the model. For the constraints, each choice set $S$ must have size $k$: $\displaystyle \sum_{i \in X} \textup{Member}_{S, i} = k$ Each hit set $T$ must be hit by at least one choice set: $\displaystyle \sum_{S} \textup{IsHit}_{T, S} \geq 1$ Now the tricky constraint. If a hit set $T$ is hit by a specific choice set $S$ (i.e., $\textup{IsHit}_{T, S} = 1$) then all the elements in $T$ must also be members of $S$. $\displaystyle \sum_{i \in T} \textup{Member}_{S, i} \geq l \cdot \textup{IsHit}_{T, S}$ This one works by the fact that the left-hand side (LHS) is bounded from below by 0 and bounded from above by $l = |T|$. Then $\textup{IsHit}_{T, S}$ acts as a switch. If it is 0, then the constraint is vacuous since the LHS is always non-negative. If $\textup{IsHit}_{T, S} = 1$, then the right-hand side (RHS) is $l = |T|$ and this forces all variables on the LHS to be 1 to achieve it. Because we fixed the number of choice sets as a parameter, the objective is 1, and all we’re doing is looking for a feasible solution. The full code is here. On the same simple example as the brute force >>> SubsetCoverILP().solve( SubsetCoverParameters( num_elements=7, choice_set_size=3, hit_set_size=2, num_choice_sets=7)) [(0, 1, 3), (0, 2, 6), (0, 4, 5), (1, 2, 4), (1, 5, 6), (2, 3, 5), (3, 4, 6)] SubsetCoverSolution(status=<SolveStatus.SOLVED: 1>, solve_time_seconds=0.1065816879272461)  It finds the same solution in about 10x the runtime as the brute force Z3 model, though still well under one second. On the “scaling” example, it fares much worse. With a timeout of 15 minutes, it solves $n=8,$ decently fast, $n=9,12$ slowly, and times out on the rest. status solve_time_seconds num_elements choice_set_size hit_set_size num_choice_sets SolveStatus.SOLVED 1.9969 8 4 3 28 SolveStatus.SOLVED 306.4089 9 4 3 42 SolveStatus.UNKNOWN 899.8842 10 5 3 24 SolveStatus.UNKNOWN 899.4849 11 5 3 33 SolveStatus.SOLVED 406.9502 12 6 3 22 SolveStatus.UNKNOWN 902.7807 13 6 3 28 SolveStatus.UNKNOWN 900.0826 14 7 3 20 SolveStatus.UNKNOWN 900.0731 15 7 3 26  ## A Z3 Boolean Cardinality Model The next model uses Z3. It keeps the concept of Member and Hit variables, but they are boolean instead of integer. It also replaces the linear constraints with implications. The constraint that forces a Hit set’s variable to be true when some Choice set contains all its elements is (for each $S, T$) $\displaystyle \left ( \bigwedge_{i \in T} \textup{Member}_{S, i} \right ) \implies \textup{IsHit}_T$ Conversely, A Hit set’s variable being true implies its members are in some choice set. $\displaystyle \textup{IsHit}_T \implies \bigvee_{S} \bigwedge_{i \in T} \textup{Member}_{S, i}$ Finally, we again use boolean cardinality constraints AtMost and AtLeast so that each choice set has the right size. The results are much better than the ILP: it solves all of the instances in under 3 seconds status solve_time_seconds num_elements choice_set_size hit_set_size num_choice_sets SolveStatus.SOLVED 0.0874 8 4 3 28 SolveStatus.SOLVED 0.1861 9 4 3 42 SolveStatus.SOLVED 0.1393 10 5 3 24 SolveStatus.SOLVED 0.2845 11 5 3 33 SolveStatus.SOLVED 0.2032 12 6 3 22 SolveStatus.SOLVED 1.3661 13 6 3 28 SolveStatus.SOLVED 0.8639 14 7 3 20 SolveStatus.SOLVED 2.4877 15 7 3 26  ## A Z3 integer model Z3 supports implications on integer equation equalities, so we can try a model that leverages this by essentially converting the boolean model to one where the variables are 0-1 integers, and the constraints are implications on equality of integer formulas (all of the form “variable = 1”). I expect this to perform worse than the boolean model, even though the formulation is almost identical. The details of the model are here, and it’s so similar to the boolean model above that it needs no extra explanation. The runtime is much worse, but surprisingly it still does better than the ILP model. status solve_time_seconds num_elements choice_set_size hit_set_size num_choice_sets SolveStatus.SOLVED 2.1129 8 4 3 28 SolveStatus.SOLVED 14.8728 9 4 3 42 SolveStatus.SOLVED 7.6247 10 5 3 24 SolveStatus.SOLVED 25.0607 11 5 3 33 SolveStatus.SOLVED 30.5626 12 6 3 22 SolveStatus.SOLVED 63.2780 13 6 3 28 SolveStatus.SOLVED 57.0777 14 7 3 20 SolveStatus.SOLVED 394.5060 15 7 3 26  ## Harder instances So far all the instances we’ve been giving the solvers are “easy” in a sense. In particular, we’ve guaranteed there’s a feasible solution, and it’s easy to find. We’re giving roughly twice as many sets as are needed. There are two ways to make this problem harder. One is to test on unsatisfiable instances, which can be harder because the solver has to prove it can’t work. Another is to test on satisfiable instances that are hard to find, such as those satisfiable instances where the true optimal number of choice sets is given as the input parameter. The hardest unsatisfiable instances are also the ones where the number of choice sets allowed is one less than optimal. Let’s test those situations. Since $M(7, 3, 2) = 7$, we can try with 7 choice sets and 6 choice sets. For 7 choice sets (the optimal value), all the solvers do relatively well method status solve_time_seconds num_elements choice_set_size hit_set_size num_choice_sets SubsetCoverILP SOLVED 0.0843 7 3 2 7 SubsetCoverZ3Integer SOLVED 0.0938 7 3 2 7 SubsetCoverZ3BruteForce SOLVED 0.0197 7 3 2 7 SubsetCoverZ3Cardinality SOLVED 0.0208 7 3 2 7  For 6, the ILP struggles to prove it’s infeasible, and the others do comparatively much better (at least 17x better). method status solve_time_seconds num_elements choice_set_size hit_set_size num_choice_sets SubsetCoverILP INFEASIBLE 120.8593 7 3 2 6 SubsetCoverZ3Integer INFEASIBLE 3.0792 7 3 2 6 SubsetCoverZ3BruteForce INFEASIBLE 0.3384 7 3 2 6 SubsetCoverZ3Cardinality INFEASIBLE 7.5781 7 3 2 6  This seems like hard evidence that Z3 is better than ILPs for this problem (and it is), but note that the same test on $n=8$ fails for all models. They can all quickly prove $8 < M(8, 3, 2) \leq 11$, but time out after twenty minutes when trying to determine if $M(8, 3, 2) \in \{ 9, 10 \}$. Note that $k=3, l=2$ is the least complex choice for the other parameters, so it seems like there’s not much hope to find $M(n, k, l)$ for any seriously large parameters, like, say, $k=6$. ## Thoughts These experiments suggest what SMT solvers can offer above and beyond ILP solvers. Disjunctions and implications are notoriously hard to model in an ILP. You often need to add additional special variables, or do tricks like the one I did that only work in some situations and which can mess with the efficiency of the solver. With SMT, implications are trivial to model, and natively supported by the solver. Aside from reading everything I could find on Z3, there seems to be little advice on modeling to help the solver run faster. There is a ton of literature for this in ILP solvers, but if any readers see obvious problems with my SMT models, please chime in! I’d love to hear from you. Even without that, I am pretty impressed by how fast the solves finish for this subset cover problem (which this experiment has shown me is apparently a very hard problem). However, there’s an elephant in the room. These models are all satisfiability/feasibility checks on a given solution. What is not tested here is optimization, in the sense of having the number of choice sets used be minimized directly by the solver. In a few experiments on even simpler models, z3 optimization is quite slow. And while I know how I’d model the ILP version of the optimization problem, given that it’s quite slow to find a feasible instance when the optimal number of sets is given as a parameter, it seems unlikely that it will be fast when asked to optimize. I will have to try that another time to be sure. Also, I’d like to test the ILP models on Gurobi, but I don’t have a personal license. There’s also the possibility that I can come up with a much better ILP formulation, say, with a tighter LP relaxation. But these will have to wait for another time. In the end, this experiment has given me some more food for thought, and concrete first-hand experience, on the use of SMT solvers. # Searching for RH Counterexamples — Unbounded Integers We’re ironically searching for counterexamples to the Riemann Hypothesis. In the last article, we improved our naive search from “try all positive integers” to enumerate a subset of integers (superabundant numbers), which RH counterexamples are guaranteed to be among. These numbers grow large, fast, and we quickly reached the limit of what 64 bit integers can store. Unbounded integer arithmetic is possible on computers, but it requires a special software implementation. In brief, you represent numbers in base-N for some large N (say, $2^{32}$), and then use a 32-bit integer for each digit. Arithmetic on such quantities emulates a ripple-carry adder, which naturally requires linear time in the number of digits of each operand. Artem Golubin has a nice explanation of how Python does it internally. So Python can handle unbounded integer arithmetic, but neither numba nor our database engine do. Those both crash when exceeding 64-bit integers This is a problem because we won’t be able to store the results of our search without being able to put it in a database. This leaves us with a classic software engineering problem. What’s the path forward? ## Exploring Alternatives The impulse answer is to do as little as possible to make the damn thing work. In a situation where the software you’re writing is a prototype, and you expect it to be rewritten from scratch in the future, this is an acceptable attitude. That said, experienced engineers would caution you that, all too often, such “prototypes” are copy-pasted to become janky mission-critical systems for years. In pretending this is the “real thing,” let’s do what real engineers would do and scope out some alternatives before diving in. The two aspects are our database and the use of numba for performance. Let’s start with the database. A quick and dirty option: store all numbers as text strings in the database. There’s no limit on the size of the number in that case. The benefit: we don’t need to use a different database engine, and most of our code stays the same. The cost: we can’t use numeric operations in database queries, which would make further analysis and fetching awkward. In particular, we can’t even apply sorting operations, since text strings are sorted lexicographically (e.g., 100, 25) while numbers are sorted by magnitude (25, 100). Note, we applied this “numbers as text” idea to the problem of serializing the search state, and it was hacky there, too. A second option is to find a database engine with direct support for unbounded-integer arithmetic. The benefit: fast database queries and the confidence that it will support future use cases well. The cost: if our existing sqlite-based interface doesn’t work with the new database engine, we’d have to write another implementation of our database interface. For numba, we have at least three options. First, fall back to native python arithmetic, which is slow. Second, implement arbitrary-precision arithmetic in Python in a way that numba can compile it. Third, find (or implement) a C-implementation of arbitrary precision integer arithmetic, provide Python bindings, and optionally see if it can work with (or replace) numba. As I write this I haven’t yet tried any of these options. My intuition tells me the best way to go would be to find “proper” support for arbitrary precision integers. For the database, I recall that the Postgres database engine supports various extensions, for example this extension that adds support for geographic objects. Postgres’s extension framework demonstrates an important software engineering principle that many of the best projects follow: “closed for modification, open for extension.” That is, Postgres is designed so that others can contribute new features to Postgres without requiring the Postgres team to do anything special—specifically, they don’t have to change Postgres to accommodate it. The name for this sometimes goes by extensions, or plug-ins, hooks, or (at a lower level) callbacks. Github Actions is a good example of this. Geographic objects are almost certainly more complicated than arbitrary precision integers, so chances are good a Postgres extension exists for the latter. Incorporating it would involve migrating to Postgres, finding and installing that extension, and then converting the C library representation above to whatever representation Postgres accepts in a query. A good route will also ensure that we need not change our tests too much, since all we’re doing here is modifying implementations. We’ll see how well that holds up. ## gmp and pgmp After some digging, I found GMP (GNU Multiple Precision), a C library written by Torbjörn Granlund. It has a Python bindings library called gmpy that allows Python to use an “mpz” (“Multiple Precision $\mathbb{Z}$“) type as a drop-in replacement for Python integers. And I found a PostgreSQL extension called pgmp. The standard Python library for Postgres is psycopg2, which was written by the same person who wrote pgmp, Daniele Varrazzo. To start, I ran a timing test of gmpy, which proves to be as fast as numba. This pull request has the details. It took a small bit of kicking to get pgmp to install, but then I made a test database that uses the new column type mpz and stores the value $2^{513}$. postgres=# create database pgmp_test; CREATE DATABASE postgres=# \connect pgmp_test; You are now connected to database "pgmp_test" as user "jeremy". pgmp_test=# CREATE EXTENSION pgmp; CREATE EXTENSION pgmp_test=# create table test_table (id int4, value mpz); CREATE TABLE pgmp_test=# insert into test_table pgmp_test-# values (1, 2::mpz ^ 513); INSERT 0 1 pgmp_test=# select * from test_table; id | value ----+------------------------------------------------------------------------------------------------------------------------------------------------------------- 1 | 26815615859885194199148049996411692254958731641184786755447122887443528060147093953603748596333806855380063716372972101707507765623893139892867298012168192 (1 row)  Now I’m pretty confident this approach will work. This pull request includes the necessary commits to add a postgres implementation of our database interface, add tests (which is a minor nuisance). Then this pull request converts the main divisor computation functions to use gmpy, and this final commit converts the main program to use the postgres database. This exposed one bug, that I wasn’t converting the new mpz types properly in the postgres sql query. This commit fixes it, and this commit adds a regression test to catch that specific error going forward. ## Results and next steps With all that work, I ran the counterexample search for a few hours. When I stopped it, it had checked all possibly-superabundant numbers whose prime factorizations have at most 75 prime factors, including multiplicity. Since all possible counterexamples to the RH must be superabundant, and all superabundant numbers have the aforementioned special prime factorization, we can say it more simply. I ruled out all positive integers whose prime factorization has at most 75 factors. The top 10 are: divisor=# select n, witness_value from RiemannDivisorSums where witness_value > 1.7 and n > 5040 order by witness_value desc limit 10; n | witness_value ----------------------------------------------------------------------------------------------------------------------------------------------------+-------------------- 7837096340441581730115353880089927210115664131849557062713735873563599357074419807246597145310377220030504976899588686851652680862494751024960000 | 1.7679071291526642 49445402778811241199465955079431717413978953513246416799455746836363402883750282695562127099750014006501608687063651021146073696293342277760000 | 1.767864530684858 24722701389405620599732977539715858706989476756623208399727873418181701441875141347781063549875007003250804343531825510573036848146671138880000 | 1.767645098171234 157972532839652527793820942745788234549453525601426251755449670403716942120607931934703281468849885004797471843653837128262216282087355520000 | 1.7676163327497005 2149800120817880052150693699105726844086041457097670295628510732015800125380447073720092482597826695934852551611463087875916247664927925120000 | 1.767592584103948 340743319149633988265884951308257704787637570949980741857118951024504319872800861184634658491755531305674129430416899428332725254891076131520000 | 1.767582883432923 23511289021324745190346061640269781630346992395548671188141207620690798071223259421739791435931131660091514930698766060554958042587484253074880000 | 1.7674462177172812 507950266365442211555694349664913937458049921547994378634886400011951582381375986928306371282475514484879330686989829994412271003496320000 | 1.7674395010995763 78986266419826263896910471372894117274726762800713125877724835201858471060303965967351640734424942502398735921826918564131108141043677760000 | 1.7674104158678667 6868370993028370773644388815034271067367544591366358771976072626248562700895997040639273107341299348034672688854514657750531142699450240000 | 1.7674059308384011  This is new. We’ve found quite a few numbers that have a better witness value than $n = 10080$ which achieves ~1.7558. The best is 78370963404415817301153538800899272101156641318495 57062713735873563599357074419807246597145310377220 030504976899588686851652680862494751024960000 which achieves ~1.7679. Recall the 1.781 threshold needed to be a RH counterexample. We’re about 50% of the way toward disproving RH. How much more work could it take? But seriously, what’s next with this project? For one, even though we have some monstrous numbers and their divisor sums and witness values, it’s hard to see the patterns in them through a SQL queries. It would be nicer to make some pretty plots. I could also take a step back and see what could be improved from a software engineering standpoint. For one, not all parts of the application are tested, and tests aren’t automatically run when I make changes. This enabled the bug above where I didn’t properly convert mpz types before passing them to SQL upsert statements. For two, while I have been using type annotations in some places, they aren’t checked, and the switch to mpz has almost certainly made many of the type hints incorrect. I could fix that and set up a test that type checks. Finally, in the interest of completeness, I could set up a front end that displays some summary of the data, and then deploy the whole application so that it has a continuously-running background search, along with a website that shows how far along the search is. Based on how long the SQL statement to find the top 10 witness values took, this would also likely require some caching, which fits snugly in the class of typical software engineering problems. Let me know what you’re interested in. # Searching for RH Counterexamples — Search Strategies We’re glibly searching for counterexamples to the Riemann Hypothesis, to trick you into learning about software engineering principles. In the first two articles we configured a testing framework and showed how to hide implementation choices behind an interface. Next, we’ll improve the algorithm’s core routine. As before, I’ll link to specific git commits in the final code repository to show how the project evolves. ## Superabundant numbers A superabundant number $n$ is one which has “maximal relative divisor sums” in the following sense: for all $m < n$, $\displaystyle \frac{\sigma(m)}{m} < \frac{\sigma(n)}{n}$ where $\sigma(n)$ is the sum of the divisors of $n$. Erdős and Alaoglu proved in 1944 (“On highly composite and similar numbers“) that superabundant numbers have a specific prime decomposition, in which all initial primes occur with non-increasing exponents $\displaystyle n = \prod_{i=1}^k (p_i)^{a_i},$ where $p_i$ is the i-th prime, and $a_1 \geq a_2 \geq \dots \geq a_k \geq 1$. With two exceptions ($n=4, 36$), $a_k = 1$. Here’s a rough justification for why superabundant numbers should have a decomposition like this. If you want a number with many divisors (compared to the size of the number), you want to pack as many combinations of small primes into the decomposition of your number as possible. Using all 2’s leads to not enough combinations—only $m+1$ divisors for $2^m$—but using 2′ and 3’s you get $(r+1)(s+1)$ for $2^r3^s$. Using more 3’s trades off a larger number $n$ for the benefit of a larger $\sigma(n)$ (up to $r=s$). The balance between getting more distinct factor combinations and a larger $n$ favors packing the primes in there. Though numbers of this form are not necessarily superabundant, this gives us an enumeration strategy better than trying all numbers. Enumerate over tuples corresponding to the exponents of the prime decomposition (non-increasing lists of integers), and save those primes to make it easier to compute the divisor sum. Non-increasing lists of integers can be enumerated in the order of their sum, and for each sum $N$, the set of non-increasing lists of integers summing to $N$ is called the partitions of $N$. There is a simple algorithm to compute them, implemented in this commit. Note this does not enumerate them in order of the magnitude of the number $\prod_{i=1}^k (p_i)^{a_i}$. The implementation for the prime-factorization-based divisor sum computation is in this commit. In addition, to show some alternative methods of testing, we used the hypothesis library to autogenerate tests. It chooses a random (limited size) prime factorization, and compares the prime-factorization-based algorithm to the naive algorithm. There’s a bit of setup code involved, but as a result we get dozens of tests and more confidence it’s right. ## Search Strategies We now have two search strategies over the space of natural numbers, though one is obviously better. We may come up with a third, so it makes sense to separate the search strategy from the main application by an interface. Generally, if you have a hard-coded implementation, and you realize that you need to change it in a significant way, that’s a good opportunity to extract it and hide it behind an interface. A good interface choice is a bit tricky here, however. In the original implementation, we could say, “process the batch of numbers (search for counterexamples) between 1 and 2 million.” When that batch is saved to the database, we would start on the next batch, and all the batches would be the same size, so (ignoring that computing $\sigma(n)$ the old way takes longer as $n$ grows) each batch required roughly the same time to run. The new search strategy doesn’t have a sensible way to do this. You can’t say “start processing from K” because we don’t know how to easily get from K to the parameter of the enumeration corresponding to K (if one exists). This is partly because our enumeration isn’t monotonic increasing ($2^1 3^1 5^1 = 30$ comes before $2^4 = 16$). And partly because even if we did have a scheme, it would almost certainly require us to compute a prime factorization, which is slow. It would be better if we could save the data from the latest step of the enumeration, and load it up when starting the next batch of the search. This scheme suggests a nicely generic interface for stopping and restarting a search from a particular spot. The definition of a “spot,” and how to start searching from that spot, are what’s hidden by the interface. Here’s a first pass. SearchState = TypeVar('SearchState') class SearchStrategy(ABC): @abstractmethod def starting_from(self, search_state: SearchState) -> SearchStrategy: '''Reset the search strategy to search from a given state.''' pass @abstractmethod def search_state(self) -> SearchState: '''Get an object describing the current state of the enumeration.''' pass @abstractmethod def next_batch(self, batch_size: int) -> List[RiemannDivisorSum]: '''Process the next batch of Riemann Divisor Sums''' pass  Note that SearchState is defined as a generic type variable because we cannot say anything about its structure yet. The implementation class is responsible for defining what constitutes a search state, and getting the search strategy back to the correct step of the enumeration given the search state as input. Later I realized we do need some structure on the SearchState—the ability to serialize it for storage in the database—so we elevated it to an interface later. Also note that we are making SearchStrategy own the job of computing the Riemann divisor sums. This is because the enumeration details and the algorithm to compute the divisor sums are now coupled. For the exhaustive search strategy it was “integers n, naively loop over smaller divisors.” In the new strategy it’s “prime factorizations, prime-factorization-based divisor sum.” We could decouple this, but there is little reason to now because the implementations are still in 1-1 correspondence. This commit implements the old search strategy in terms of this interface, and this commit implements the new search strategy. In the latter, I use pytest.parameterize to test against the interface and parameterize over the implementations. The last needed bit is the ability to store and recover the search state in between executions of the main program. This requires a second database table. The minimal thing we could do is just store and update a single row for each search strategy, providing the search state as of the last time the program was run and stopped. This would do, but in my opinion an append-only log is a better design for such a table. That is, each batch computed will have a record containing the timestamp the batch started and finished, along with the starting and ending search state. We can use the largest timestamp for a given search strategy to pick up where we left off across program runs. One can imagine this being the basis for an application like folding@home or the BOINC family of projects, where a database stores chunks of a larger computation (ranges of a search space), clients can request chunks to complete, and they are assembled into a complete database. In this case we might want to associate the chunk metadata with the computed results (say, via a foreign key). That would require a bit of work from what we have now, but note that the interfaces would remain reusable for this. For now, we will just incorporate the basic table approach. It is completed in this pull request, and tying it into the main search routine is done in this commit. However, when running it with the superabundant search strategy, we immediately run into a problem. Superabundant numbers grow too fast, and within a few small batches of size 100 we quickly exceed the 64 bits available to numba and sqlite to store the relevant data. >>> fac = partition_to_prime_factorization(partitions_of_n(16)[167]) >>> fac2 = [p**d for (p, d) in fac] >>> fac2 [16, 81, 625, 2401, 11, 13, 17, 19, 23, 29, 31, 37] >>> math.log2(reduce(lambda x,y: x*y, fac2)) 65.89743638933722  Running populate_database.py results in the error  python -m riemann.populate_database db.sqlite3 SuperabundantSearchStrategy 100
Searching with strategy SuperabundantSearchStrategy
Starting from search state SuperabundantEnumerationIndex(level=1, index_in_level=0)
Computed [1,0, 10,4] in 0:00:03.618798
Computed [10,4, 12,6] in 0:00:00.031451
Computed [12,6, 13,29] in 0:00:00.031518
Computed [13,29, 14,28] in 0:00:00.041464
Computed [14,28, 14,128] in 0:00:00.041674
Computed [14,128, 15,93] in 0:00:00.034419
...
OverflowError: Python int too large to convert to SQLite INTEGER


We’ll see what we can do about this in a future article, but meanwhile we do get some additional divisor sums for these large numbers, and 10080 is still the best.

sqlite> select n, witness_value
from RiemannDivisorSums
where witness_value > 1.7 and n > 5040
order by witness_value desc
limit 10;

10080|1.7558143389253
55440|1.75124651488749
27720|1.74253672381383
7560|1.73991651920276
15120|1.73855867428903
160626866400|1.73744669257158
321253732800|1.73706925385011
110880|1.73484901030336
6983776800|1.73417642212953
720720|1.73306535623807