Searching for RH Counterexamples — Setting up Pytest

Some mathy-programmy people tell me they want to test their code, but struggle to get set up with a testing framework. I suspect it’s due to a mix of:

  • There are too many choices with a blank slate.
  • Making slightly wrong choices early on causes things to fail in unexpected ways.

I suspect the same concerns apply to general project organization and architecture. Because Python is popular for mathy-programmies, I’ll build a Python project that shows how I organize my projects and and test my code, and how that shapes the design and evolution of my software. I will use Python 3.8 and pytest, and you can find the final code on Github.

For this project, we’ll take advice from John Baez and explore a question that glibly aims to disprove the Riemann Hypothesis:

A CHALLENGE:

Let σ(n) be the sum of divisors of n. There are infinitely many n with σ(n)/(n ln(ln(n)) > 1.781. Can you find one? If you can find n > 5040 with σ(n)/(n ln(ln(n)) > 1.782, you’ll have disproved the Riemann Hypothesis.

I don’t expect you can disprove the Riemann Hypothesis this way, but I’d like to see numbers that make σ(n)/(n ln(ln(n)) big. It seems the winners are all multiples of 2520, so try those. The best one between 5040 and a million is n = 10080, which only gives 1.755814.

https://twitter.com/johncarlosbaez/status/1149700802371608576

Initializing the Project

One of the hardest parts of software is setting up your coding environment. If you use an integrated development environment (IDE), project setup is bespoke to each IDE. I dislike this approach, because what you learn when using the IDE is not useful outside the IDE. When I first learned to program (Java), I was shackled to Eclipse for years because I didn’t know how to compile and run Java programs without it. Instead, we’ll do everything from scratch, using only the terminal/shell and standard Python tools. I will also ignore random extra steps and minutiae I’ve built up over the years to deal with minor issues. If you’re interested in that and why I do them, leave a comment and I might follow up with a second article.

This article assumes you are familiar with the basics of Python syntax, and know how to open a terminal and enter basic commands (like ls, cd, mkdir, rm). Along the way, I will link to specific git commits that show the changes, so that you can see how the project unfolds with each twist and turn.

I’ll start by creating a fresh Python project that does nothing. We set up the base directory riemann-divisor-sum, initialize git, create a readme, and track it in git (git add + git commit).

mkdir riemann-divisor-sum
cd riemann-divisor-sum
git init .
echo "# Divisor Sums for the Riemann Hypothesis" > README.md
git add README.md
git commit -m "add empty README.md"

Next I create a Github project at https://github.com/j2kun/riemann-divisor-sum (the name riemann-divisor-sum does not need to be the same, but I think it’s good), and push the project up to Github.

git remote add origin git@github.com:j2kun/riemann-divisor-sum.git
# instead of "master", my default branch is really "main"
git push -u origin master   

Note, if you’re a new Github user, the “default branch name” when creating a new project may be “master.” I like “main” because it’s shorter, clearer, and nicer. If you want to change your default branch name, you can update to git version 2.28 and add the following to your ~/.gitconfig file.

[init]
    defaultBranch = main

Here is what the project looks like on Github as of this single commit.

Pytest

Next I’ll install the pytest library which will run our project’s tests. First I’ll show what a failing test looks like, by setting up a trivial program with an un-implemented function, and a corresponding test. For ultimate simplicity, we’ll use Python’s built-in assert for the test lines. Here’s the commit.

# in the terminal
mkdir riemann
mkdir tests


# create riemann/divisor.py containing:
'''Compute the sum of divisors of a number.'''

def divisor_sum(n: int) -> int:
    raise ValueError("Not implemented.")


# create tests/divisor_test.py containing:
from riemann.divisor import divisor_sum

def test_sum_of_divisors_of_72():
    assert 195 == divisor_sum(72)

Next we install and configure Pytest. At this point, since we’re introducing a dependency, we need a project-specific place to store that dependency. All dependencies related to a project should be explicitly declared and isolated. This page helps explain why. Python’s standard tool is the virtual environment. When you “activate” the virtual environment, it temporarily (for the duration of the shell session or until you run deactivate) points all Python tools and libraries to the virtual environment.

virtualenv -p python3.8 venv
source venv/bin/activate

# shows the location of the overridden python binary path
which python
# outputs: /Users/jeremy/riemann-divisor-sum/venv/bin/python

Now we can use pip as normal and it will install to venv. To declare and isolate the dependency, we write the output of pip freeze to a file called requirements.txt, and it can be reinstalled using pip install -r requirements.txt. Try deleting your venv directory, recreating it, and reinstalling the dependencies this way.

pip install pytest
pip freeze > requirements.txt
git add requirements.txt
git commit -m "requirements: add pytest"

# example to wipe and reinstall
# deactivate
# rm -rf venv
# virtualenv -p python3.8 venv
# source venv/bin/activate
# pip install -r requirements.txt

As an aside, at this step you may notice git mentions venv is an untracked directory. You can ignore this, or add venv to a .gitignore file to tell git to ignore it, as in this commit. We will also have to configure pytest to ignore venv shortly.

When we run pytest (with no arguments) from the base directory, we see our first error:

    from riemann.divisor import divisor_sum
E   ModuleNotFoundError: No module named 'riemann'

Module import issues are a common stumbling block for new Python users. In order to make a directory into a Python module, it needs an __init__.py file, even if it’s empty. Any code in this file will be run the first time the module is imported in a Python runtime. We add one to both the code and test directories in this commit.

When we run pytest (with no arguments), it recursively searches the directory tree looking for files like *_test.py and test_*.py loads them, and treats every method inside those files that are prefixed with “test” as a test. Non-“test” methods can be defined and used as helpers to set up complex tests. Pytest then runs the tests, and reports the failures. For me this looks like

Our first test failure.

Our implementation is intentionally wrong for demonstration purposes. When a test passes, pytest will report it quietly as a “.” by default. See these docs for more info on different ways to run the pytest binary and configure its output report.

In this basic pytest setup, you can put test files wherever you want, name the files and test methods appropriately, and use assert to implement the tests themselves. As long as your modules are set up properly, as long as imports are absolute (see this page for gory details on absolute vs. relative imports), and as long as you run pytest from the base directory, pytest will find the tests and run them.

Since pytest searches all directories for tests, this includes venv and __pycache__, which magically appears when you create python modules (I add __pycache__ to gitignore). Sometimes package developers will include test code, and pytest will then run those tests, which often fail or clutter the output. A virtual environment also gets large as you install big dependencies (like numpy, scipy, pandas), so this makes pytest slow to search for tests to run. To alleviate, the --norecursedirs command line flag tells pytest to skip directories. Since it’s tedious to type --norecursedirs='venv __pycache__' every time you run pytest, you can make this the default behavior by storing the option in a configuration file recognized by pytest, such as setup.cfg. I did it in this commit.

Some other command line options that I use all the time:

  • pytest test/dir to test only files in that directory, or pytest test/dir/test_file.py to test only tests in that file.
  • pytest -k STR to only run tests whose name contains “STR”
  • pytest -s to see see any logs or print statements inside tested code
  • pytest -s to allow the pdb/ipdb debugger to function and step through a failing test.

Building up the project

Now let’s build up the project. My general flow is as follows:

  1. Decide what work to do next.
  2. Sketch out the interface for that work.
  3. Write some basic (failing, usually lightweight) tests that will pass when the work is done.
  4. Do the work.
  5. Add more nuanced tests if needed, based on what is learned during the work.
  6. Repeat until the work is done.

This strategy is sometimes called “the design recipe,” and I first heard about it from my undergraduate programming professor John Clements at Cal Poly, via the book “How to Design Programs.” Even if I don’t always use it, I find it’s a useful mental framework for getting things done.

For this project, I want to search through positive integers, and for each one I want to compute a divisor sum, do some other arithmetic, and compare that against some other number. I suspect divisor sum computations will be the hard/interesting part, but to start I will code up a slow/naive implementation with some working tests, confirm my understanding of the end-to-end problem, and then improve the pieces as needed.

In this commit, I implement the naive divisor sum code and tests. Note the commit also shows how to tell pytest to test for a raised exception. In this commit I implement the main search routine and confirm John’s claim about n=10080 (thanks for the test case!).

These tests already showcase a few testing best practices:

  • Test only one behavior at a time. Each test has exactly one assertion in it. This is good practice because when a test fails you won’t have to dig around to figure out exactly what went wrong.
  • Use the tests to help you define the interface, and then only test against that interface. The hard part about writing clean and clear software is defining clean and clear interfaces that work together well and hide details. Math does this very well, because definitions like \sigma(n) do not depend on how n is represented. In fact, math really doesn’t have “representations” of its objects—or more precisely, switching representations is basically free, so we don’t dwell on it. In software, we have to choose excruciatingly detailed representations for everything, and so we rely on the software to hide those details as much as possible. The easiest way to tell if you did it well is to try to use the interface and only the interface, and tests are an excuse to do that, which is not wasted effort by virtue of being run to check your work.

Improving Efficiency

Next, I want to confirm John’s claim that n=10080 is the best example between 5041 and a million. However, my existing code is too slow. Running the tests added in this commit seems to take forever.

We profile to confirm our suspected hotspot:

>>> import cProfile
>>> from riemann.counterexample_search import best_witness
>>> cProfile.run('best_witness(10000)')
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
...
54826    3.669    0.000    3.669    0.000 divisor.py:10(<genexpr>)

As expected, computing divisor sums is the bottleneck. No surprise there because it makes the search take quadratic time. Before changing the implementation, I want to add a few more tests. I copied data for the first 50 integers from OEIS and used pytest’s parameterize feature since the test bodies are all the same. This commit does it.

Now I can work on improving the runtime of the divisor sum computation step. Originally, I thought I’d have to compute the prime factorization to use this trick that exploits the multiplicativity of \sigma(n), but then I found this approach due to Euler in 1751 that provides a recursive formula for the sum and skips the prime factorization. Since we’re searching over all integers, this allows us to trade off the runtime of each \sigma(n) computation against the storage cost of past \sigma(n) computations. I tried it in this commit, using python’s built-in LRU-cache wrapper to memoize the computation. The nice thing about this is that our tests are already there, and the interface for divisor_sum doesn’t change. This is on purpose, so that the caller of divisor_sum (in this case tests, also client code in real life) need not update when we improve the implementation. I also ran into a couple of stumbling blocks implementing the algorithm (I swapped the order of the if statements here), and the tests made it clear I messed up.

However, there are two major problems with that implementation.

  1. The code is still too slow. best_witness(100000) takes about 50 seconds to run, almost all of which is in divisor_sum.
  2. Python hits its recursion depth limit, and so the client code needs to eagerly populate the divisor_sum cache, which is violates encapsulation. The caller should not know anything about the implementation, nor need to act in a specific way to accommodate hidden implementation details.

I also realized after implementing it that despite the extra storage space, the runtime is still O(n^{3/2}), because each divisor-sum call requires O(n^{1/2}) iterations of the loop. This is just as slow as a naive loop that checks divisibility of integers up to \sqrt{n}. Also, a naive loop allows me to plug in a cool project called numba that automatically speeds up simple Python code by compiling it in place. Incidentally, numba is known to not work with lru_cache, so I can’t tack it on my existing implementation.

So I added numba as a dependency and drastically simplified the implementation. Now the tests run in 8 seconds, and in a few minutes I can upgrade John’s claim that n=10080 is the best example between 5041 and a million, to the best example between 5041 and ten million.

Next up

This should get you started with a solid pytest setup for your own project, but there is a lot more to say about how to organize and run tests, what kinds of tests to write, and how that all changes as your project evolves.

For this project, we now know that the divisor-sum computation is the bottleneck. We also know that the interesting parts of this project are yet to come. We want to explore the patterns in what makes these numbers large. One way we could go about this is to split the project into two components: one that builds/manages a database of divisor sums, and another that analyzes the divisor sums in various ways. The next article will show how the database set up works. When we identify relevant patterns, we can modify the search strategy to optimize for that. As far as testing goes, this would prompt us to have an interface layer between the two systems, and to add fakes or mocks to test the components in isolation.

After that, there’s the process of automating test running, adding tests for code quality/style, computing code coverage, adding a type-hint checker test, writing tests that generate other tests, etc.

If you’re interested, let me know which topics to continue with. I do feel a bit silly putting so much pomp and circumstance around such a simple computation, but hopefully the simplicity of the core logic makes the design and testing aspects of the project clearer and easier to understand.

My LaTeX Workflow: latexmk, ShareLaTeX, and StackEdit

Over the last year or so I’ve gradually spent more and more of my time typing math. Be it lecture notes, papers, or blog posts, I think in the last two years I’ve typed vastly more dollar signs (TeX math mode delimiters) than in the rest of my life combined. As is the natural inclination for most programmers, I’ve tried lots of different ways to optimize my workflow and minimize the amount of typing, configuring, file duplicating, and compiler-wrestling I do in my day-to-day routine.

I’ve arrived at what I feel is a stable state. Here’s what I use.

First, my general setup. At home I run OS X Mavericks (10.9.5), and I carry a Chromebook with me to campus and when I travel.

For on-the-fly note taking

I haven’t found a better tool than StackEdit.

stackedit-logo

Mindset: somewhere in between writing an email with one or two bits of notation (just write TeX source and hope they can read it) and writing a document that needs to look good. These are documents for which you have no figures, don’t want to keep track of sections and theorem numbering, and have no serious bibliography.

Use cases:

  • In class notes: where I need to type fast and can sacrifice on prettiness. Any other workflow besides Markdown with TeX support is just awfully slow, because the boilerplate of LaTeX proper involves so much typing (\begin{theorem} \end{theorem}, etc.)
  • Notes during talks: these notes usually have fewer formulas and more sentences, but the ability to use notation when I want it really helps.
  • Short drafts of proofs: when I want to send something technical yet informal to a colleague, but it’s in such a draft phase that I’m more concerned about the idea being right—and on paper—than whether it looks good.

Awesome features: I can access documents from Google Drive. Integration with Dropbox (which they have) is not enough because I don’t have Dropbox on every computer I use (Chromebook, public/friends’ computers). Also, you can configure Google Drive to open markdown files with StackEdit by default (otherwise Drive can’t open them at all).

How it could improve: The service gets sluggish with longer documents, and sometimes the preview page jumps around like crazy when you have lots of offset equations. Sometimes it seems like it recompiles the whole document when you only change one paragraph, and so the preview can be useless to look at while you’re typing. I recently discovered you can turn off features you don’t use in the settings, so that might speed things up.

Also, any time something needs to be aligned (such as a matrix or piecewise notation), you have to type \begin{}’s and \end{}’s, so that slows down the typing. It would be nice to have some shortcuts like \matrix[2,3]{1,3,4,4,6,8} or at least an abbreviation for \begin and \end (\b{} and \e{}, maybe?). Also some special support for (and shortcuts for) theorem/proof styling would be nice, but not necessary. Right now I embolden the Theorem and italicize the Proof., and end with a tombstone \square on a line by itself. I don’t see a simple way to make a theorem/proof environment with minimal typing, but it does occur to me as an inefficiency; the less time I can spend highlighting and formatting things the better.

Caveats: Additional features, such as exporting from StackEdit to pdf requires you to become a donor ($5/year, a more than fair price for the amount I use it). I would find the service significantly less useful if I could not export to pdf.

For work while travelling

My favorite so far is ShareLaTeX.

sharelatexI’ve used a bunch of online TeX editors, most notably Overleaf (formerly WriteLaTeX). They’re both pretty solid, but a few features tip me toward ShareLaTeX. I’ll italicize these things below.

Mindset: An editor I can use on my Chromebook or a public machine, yet still access my big papers and projects in progress. Needs support for figures, bibliographies, the whole shebang. Basically I need a browser replacement for a desktop LaTeX setup. I generally do not need collaboration services, because the de facto standard among everyone I’ve ever interacted with is that you can only expect people to have Dropbox. You cannot expect them to sign up for online services just to work with you.

Use cases:

  • Drafting actual research papers
  • Writing slides/talks

Awesome features: Dropbox integration! This is crucial, because I (and everyone I know) does their big collaborative projects using Dropbox. ShareLaTeX (unlike Overleaf) has seamless Dropbox integration. The only caveat is that ShareLaTeX only accesses Dropbox files that are in a specially-named folder. This causes me to use a bunch of symbolic links that would be annoying to duplicate if I got a new machine.

Other than that, ShareLaTeX (like Overleaf) has tons of templates, all the usual libraries, great customer support, and great collaborative features for the once in a blue moon that someone else uses ShareLaTeX.

Vim commands. The problem is that they don’t go far enough here. They don’t support vim-style word-wrapping (gq), and they leave out things like backward search (? instead of /) and any : commands you tend to use.

Github integration. Though literally no mathematicians I know use Github for anything related to research, I think that with the right features Github could become the “right” solution to paper management. The way people store and “archive” their work is horrendous, and everyone can agree a waste of time. I have lots of ideas for how Github could improve academics’ lives and the lives of the users of their research, too many to list here without derailing the post. The point is that ShareLaTeX having Github integration is forward thinking and makes ShareLaTeX more attractive.

How it could improve: Better vim command support. It seems like many of these services are viewed by their creators as a complete replacement for offline work, when really (for me) it’s a temporary substitute that needs to operate seamlessly with my other services. So basically the more seamless integration it has with services I use, the better.

Caveats: Integration comes at a premium of $8/month for students, and $15/month for non-students.

Work at home

This is where we get into the nitty gritty of terminal tools. Because naively writing papers in TeX on a desktop has a lot of lame steps and tricks. There are (multiple types of) bibliography files to manage, you have to run like four commands to compile a document, and the TeX compiler errors are often nonsense.

I used to have a simple script to compile;display;clean for me, but then I came across the latexmk program. What you can do is configure latexmk to automatically recompile when a change is made to a source file, and then you can configure a pdf viewer (like Skim) to update when the pdf changes. So instead of the workflow being “Write. Compile. View. Repeat,” It’s “Compile. View. Write until done.”

Of course lots of random TeX distributions come with crusty GUIs that (with configuration) do what latexmk does. But I love my vim, and you have your favorite editor, too. The key part is that latexmk and Skim don’t care what editor you use.

For reference, here’s how I got it all configured on OS X Mavericks.

  1. Install latexmk (move the perl script downloadable from their website to anywhere on your $PATH).
  2. Add alias latexmk='latexmk.pl -pvc' to your .profile. The -pvc flag makes latexmk watch for changes.
  3. Add the following to a new file called .latexmkrc in your home directory (it says: I only do pdfs and use Skim to preview):
    $pdf_mode = 1;
    $postscript_mode = 0;
    $dvi_mode = 0;
    $pdf_previewer = "open -a /Applications/Skim.app";
    $clean_ext = "paux lox pdfsync out";
  4. Install Skim.
  5. In Skim’s preferences, go to the Sync tab and check the box “Check for file changes.”
  6. Run the following from the command line, which prevents Skim from asking (once for each file!) whether you want to auto reload that file:
    $ defaults write -app Skim SKAutoReloadFileUpdate -boolean true

Now the workflow is: browse to your working directory; run latexmk yourfile.tex (this will open Skim); open the tex document in your editor; write. When you save the file, it will automatically recompile and display in Skim. Since it’s OS X, you can scroll through the pdf without switching window focus, so you don’t even have to click back into the terminal window to continue typing.

Finally, I have two lines in my .vimrc to auto-save every second that the document is idle (or when the window loses focus) so that I don’t have to type :w every time I want the updates to display. To make this happen only when you open a tex file, add these lines instead to ~/.vim/ftplugin/tex.vim

set updatetime=1000
autocmd CursorHoldI,CursorHold,BufLeave,FocusLost silent! wall

Caveats: I haven’t figured out how to configure latexmk to do anything more complicated than this. Apparently it’s possible to get it setup to work with “Sync support,” which means essentially you can go back and forth between the source file lines and the corresponding rendered document lines by clicking places. I think reverse search (pdf->vim) isn’t possible with regular vim (it is apparently with macvim), but forward search (vim->pdf) is if you’re willing to install some plugins and configure some files. So here is the place where Skim does care what editor you use. I haven’t yet figured out how to do it, but it’s not a feature I care much for.


One deficiency I’ve found: there’s no good bibliography manager. Sorry, Mendeley, I really can’t function with you. I’ll just be hand-crafting my own bib files until I find or make a better solution.

Have any great tools you use for science and paper writing? I’d love to hear about them.