Negacyclic Polynomial Multiplication

In this article I’ll cover three techniques to compute special types of polynomial products that show up in lattice cryptography and fully homomorphic encryption. Namely, the negacyclic polynomial product, which is the product of two polynomials in the quotient ring $\mathbb{Z}[x] / (x^N + 1)$. As a precursor to the negacyclic product, we’ll cover the simpler cyclic product.

All of the Python code written for this article is on GitHub.

The DFT and Cyclic Polynomial Multiplication

A recent program gallery piece showed how single-variable polynomial multiplication could be implemented using the Discrete Fourier Transform (DFT). This boils down to two observations:

  1. The product of two polynomials $f, g$ can be computed via the convolution of the coefficients of $f$ and $g$.
  2. The Convolution Theorem, which says that the Fourier transform of a convolution of two signals $f, g$ is the point-wise product of the Fourier transforms of the two signals. (The same holds for the DFT)

This provides a much faster polynomial product operation than one could implement using the naïve polynomial multiplication algorithm (though see the last section for an implementation anyway). The DFT can be used to speed up large integer multiplication as well.

A caveat with normal polynomial multiplication is that one needs to pad the input coefficient lists with enough zeros so that the convolution doesn’t “wrap around.” That padding results in the output having length at least as large as the sum of the degrees of $f$ and $g$ (see the program gallery piece for more details).

If you don’t pad the polynomials, instead you get what’s called a cyclic polynomial product. More concretely, if the two input polynomials $f, g$ are represented by coefficient lists $(f_0, f_1, \dots, f_{N-1}), (g_0, g_1, \dots, g_{N-1})$ of length $N$ (implying the inputs are degree at most $N-1$, i.e., the lists may end in a tail of zeros), then the Fourier Transform technique computes

\[ f(x) \cdot g(x) \mod (x^N – 1) \]

This modulus is in the sense of a quotient ring $\mathbb{Z}[x] / (x^N – 1)$, where $(x^N – 1)$ denotes the ring ideal generated by $x^N-1$, i.e., all polynomials that are evenly divisible by $x^N – 1$. A particularly important interpretation of this quotient ring is achieved by interpreting the ideal generator $x^N – 1$ as an equation $x^N – 1 = 0$, also known as $x^N = 1$. To get the canonical ring element corresponding to any polynomial $h(x) \in \mathbb{Z}[x]$, you “set” $x^N = 1$ and reduce the polynomial until there are no more terms with degree bigger than $N-1$. For example, if $N=5$ then $x^{10} + x^6 – x^4 + x + 2 = -x^4 + 2x + 3$ (the $x^{10}$ becomes 1, and $x^6 = x$).

To prove the DFT product computes a product in this particular ring, note how the convolution theorem produces the following formula, where $\textup{fprod}(f, g)$ denotes the process of taking the Fourier transform of the two coefficient lists, multiplying them entrywise, and taking a (properly normalized) inverse FFT, and $\textup{fprod}(f, g)(j)$ is the $j$-th coefficient of the output polynomial:

\[ \textup{fprod}(f, g)(j) = \sum_{k=0}^{N-1} f_k g_{j-k \textup{ mod } N} \]

In words, the output polynomial coefficient $j$ equals the sum of all products of pairs of coefficients whose indices sum to $j$ when considered “wrapping around” $N$. Fixing $j=1$ as an example, $\textup{fprod}(f, g)(1) = f_0 g_1 + f_1g_0 + f_2 g_{N-1} + f_3 g_{N-2} + \dots$. This demonstrates the “set $x^N = 1$” interpretation above: the term $f_2 g_{N-1}$ corresponds to the product $f_2x^2 \cdot g_{N-1}x^{N-1}$, which contributes to the $x^1$ term of the polynomial product if and only if $x^{2 + N-1} = x$, if and only if $x^N = 1$.

To achieve this in code, we simply use the version of the code from the program gallery piece, but fix the size of the arrays given to numpy.fft.fft in advance. We will also, for simplicity, assume the $N$ one wishes to use is a power of 2. The resulting code is significantly simpler than the original program gallery code (we omit zero-padding to length $N$ for brevity).

import numpy
from numpy.fft import fft, ifft

def cyclic_polymul(p1, p2, N):
    """Multiply two integer polynomials modulo (x^N - 1).

    p1 and p2 are arrays of coefficients in degree-increasing order.
    assert len(p1) == len(p2) == N
    product = fft(p1) * fft(p2)
    inverted = ifft(product)
    return numpy.round(numpy.real(inverted)).astype(numpy.int32)

As a side note, there’s nothing that stops this from working with polynomials that have real or complex coefficients, but so long as we use small magnitude integer coefficients and round at the end, I don’t have to worry about precision issues (hat tip to Brad Lucier for suggesting an excellent paper by Colin Percival, “Rapid multiplication modulo the sum and difference of highly composite numbers“, which covers these precision issues in detail).

Negacyclic polynomials, DFT with duplication

Now the kind of polynomial quotient ring that shows up in cryptography is critically not $\mathbb{Z}[x]/(x^N-1)$, because that ring has enough easy-to-reason-about structure that it can’t hide secrets. Instead, cryptographers use the ring $\mathbb{Z}[x]/(x^N+1)$ (the minus becomes a plus), which is believed to be more secure for cryptography—although I don’t have a great intuitive grasp on why.

The interpretation is similar here as before, except we “set” $x^N = -1$ instead of $x^N = 1$ in our reductions. Repeating the above example, if $N=5$ then $x^{10} + x^6 – x^4 + x + 2 = -x^4 + 3$ (the $x^{10}$ becomes $(-1)^2 = 1$, and $x^6 = -x$). It’s called negacyclic because as a term $x^k$ passes $k \geq N$, it cycles back to $x^0 = 1$, but with a sign flip.

The negacyclic polynomial multiplication can’t use the DFT without some special hacks. The first and simplest hack is to double the input lists with a negation. That is, starting from $f(x) \in \mathbb{Z}[x]/(x^N+1)$, we can define $f^*(x) = f(x) – x^Nf(x)$ in a different ring $\mathbb{Z}[x]/(x^{2N} – 1)$ (and similarly for $g^*$ and $g$).

Before seeing how this causes the DFT to (almost) compute a negacyclic polynomial product, some math wizardry. The ring $\mathbb{Z}[x]/(x^{2N} – 1)$ is special because it contains our negacyclic ring as a subring. Indeed, because the polynomial $x^{2N} – 1$ factors as $(x^N-1)(x^N+1)$, and because these two factors are coprime in $\mathbb{Z}[x]/(x^{2N} – 1)$, the Chinese remainder theorem (aka Sun-tzu’s theorem) generalizes to polynomial rings and says that any polynomial in $\mathbb{Z}[x]/(x^{2N} – 1)$ is uniquely determined by its remainders when divided by $(x^N-1)$ and $(x^N+1)$. Another way to say it is that the ring $\mathbb{Z}[x]/(x^{2N} – 1)$ factors as a direct product of the two rings $\mathbb{Z}[x]/(x^{N} – 1)$ and $\mathbb{Z}[x]/(x^{N} + 1)$.

Now mapping a polynomial $f(x)$ from the bigger ring $(x^{2N} – 1)$ to the smaller ring $(x^{N}+1)$ involves taking a remainder of $f(x)$ when dividing by $x^{N}+1$ (“setting” $x^N = -1$ and reducing). There are many possible preimage mappings, depending on what your goal is. In this case, we actually intentionally choose a non preimage mapping, because in general to compute a preimage requires solving a system of congruences in the larger polynomial ring. So instead we choose $f(x) \mapsto f^*(x) = f(x) – x^Nf(x) = -f(x)(x^N – 1)$, which maps back down to $2f(x)$ in $\mathbb{Z}[x]/(x^{N} + 1)$. This preimage mapping has a particularly nice structure, in that you build it by repeating the polynomial’s coefficients twice and flipping the sign of the second half. It’s easy to see that the product $f^*(x) g^*(x)$ maps down to $4f(x)g(x)$.

So if we properly account for these extra constant factors floating around, our strategy to perform negacyclic polynomial multiplication is to map $f$ and $g$ up to the larger ring as described, compute their cyclic product (modulo $x^{2N} – 1$) using the FFT, and then the result should be a degree $2N-1$ polynomial which can be reduced with one more modular reduction step to the right degree $N-1$ negacyclic product, i.e., setting $x^N = -1$, which materializes as taking the second half of the coefficients, flipping their signs, and adding them to the corresponding coefficients in the first half.

The code for this is:

def negacyclic_polymul_preimage_and_map_back(p1, p2):
    p1_preprocessed = numpy.concatenate([p1, -p1])
    p2_preprocessed = numpy.concatenate([p2, -p2])
    product = fft(p1_preprocessed) * fft(p2_preprocessed)
    inverted = ifft(product)
    rounded = numpy.round(numpy.real(inverted)).astype(p1.dtype)
    return (rounded[: p1.shape[0]] - rounded[p1.shape[0] :]) // 4

However, this chosen mapping hides another clever trick. The product of the two preimages has enough structure that we can “read” the result off without doing the full “set $x^N = -1$” reduction step. Mapping $f$ and $g$ up to $f^*, g^*$ and taking their product modulo $(x^{2N} – 1)$ gives

\[ \begin{aligned} f^*g^* &= -f(x^N-1) \cdot -g(x^N – 1) \\ &= fg (x^N-1)^2 \\ &= fg(x^{2N} – 2x^N + 1) \\ &= fg(2 – 2x^N) \\ &= 2(fg – x^Nfg) \end{aligned} \]

This has the same syntactical format as the original mapping $f \mapsto f – x^Nf$, with an extra factor of 2, and so its coefficients also have the form “repeat the coefficients and flip the sign of the second half” (times two). We can then do the “inverse mapping” by reading only the first half of the coefficients and dividing by 2.

def negacyclic_polymul_use_special_preimage(p1, p2):
    p1_preprocessed = numpy.concatenate([p1, -p1])
    p2_preprocessed = numpy.concatenate([p2, -p2])
    product = fft(p1_preprocessed) * fft(p2_preprocessed)
    inverted = ifft(product)
    rounded = numpy.round(0.5 * numpy.real(inverted)).astype(p1.dtype)
    return rounded[: p1.shape[0]]

Our chosen mapping $f \mapsto f-x^Nf$ is not particularly special, except that it uses a small number of pre and post-processing operations. For example, if you instead used the mapping $f \mapsto 2f + x^Nf$ (which would map back to $f$ exactly), then the FFT product would result in $5fg + 4x^Nfg$ in the larger ring. You can still read off the coefficients as before, but you’d have to divide by 5 instead of 2 (which, the superstitious would say, is harder). It seems that “double and negate” followed by “halve and take first half” is the least amount of pre/post processing possible.

Negacyclic polynomials with a “twist”

The previous section identified a nice mapping (or embedding) of the input polynomials into a larger ring. But studying that shows some symmetric structure in the FFT output. I.e., the coefficients of $f$ and $g$ are repeated twice, with some scaling factors. It also involves taking an FFT of two $2N$-dimensional vectors when we start from two $N$-dimensional vectors.

This sort of situation should make you think that we can do this more efficiently, either by using a smaller size FFT or by packing some data into the complex part of the input, and indeed we can do both.

[Aside: it’s well known that if all the entries of an FFT input are real, then the result also has symmetry that can be exploted for efficiency by reframing the problem as a size-N/2 FFT in some cases, and just removing half the FFT algorithm’s steps in other cases, see Wikipedia for more]

This technique was explained in Fast multiplication and its applications (pdf link) by Daniel Bernstein, a prominent cryptographer who specializes in cryptography performance, and whose work appears in widely-used standards like TLS, OpenSSH, and he designed a commonly used elliptic curve for cryptography.

[Aside: Bernstein cites this technique as using something called the “Tangent FFT (pdf link).” This is a drop-in FFT replacement he invented that is faster than previous best (split-radix FFT), and Bernstein uses it mainly to give a precise expression for the number of operations required to do the multiplication end to end. We will continue to use the numpy FFT implementation, since in this article I’m just focusing on how to express negacyclic multiplication in terms of the FFT. Also worth noting both the Tangent FFT and “Fast multiplication” papers frame their techniques—including FFT algorithm implementations!—in terms of polynomial ring factorizations and mappings. Be still, my beating cardioid.]

In terms of polynomial mappings, we start from the ring $\mathbb{R}[x] / (x^N + 1)$, where $N$ is a power of 2. We then pick a reversible mapping from $\mathbb{R}[x]/(x^N + 1) \to \mathbb{C}[x]/(x^{N/2} – 1)$ (note the field change from real to complex), apply the FFT to the image of the mapping, and reverse appropriately it at the end.

One such mapping takes two steps, first mapping $\mathbb{R}[x]/(x^N + 1) \to \mathbb{C}[x]/(x^{N/2} – i)$ and then from $\mathbb{C}[x]/(x^{N/2} – i) \to \mathbb{C}[x]/(x^{N/2} – 1)$. The first mapping is as easy as the last section, because $(x^N + 1) = (x^{N/2} + i) (x^{N/2} – i)$, and so we can just set $x^{N/2} = i$ and reduce the polynomial. This as the effect of making the second half of the polynomial’s coefficients become the complex part of the first half of the coefficients.

The second mapping is more nuanced, because we’re not just reducing via factorization. And we can’t just map $i \mapsto 1$ generically, because that would reduce complex numbers down to real values. Instead, we observe that (momentarily using an arbitrary degree $k$ instead of $N/2$), for any polynomial $f \in \mathbb{C}[x]$, the remainder of $f \mod x^k-i$ uniquely determines the remainder of $f \mod x^k – 1$ via the change of variables $x \mapsto \omega_{4k} x$, where $\omega_{4k}$ is a $4k$-th primitive root of unity $\omega_{4k} = e^{\frac{2 \pi i}{4k}}$. Spelling this out in more detail: if $f(x) \in \mathbb{C}[x]$ has remainder $f(x) = g(x) + h(x)(x^k – i)$ for some polynomial $h(x)$, then

\[ \begin{aligned} f(\omega_{4k}x) &= g(\omega_{4k}x) + h(\omega_{4k}x)((\omega_{4k}x)^{k} – i) \\ &= g(\omega_{4k}x) + h(\omega_{4k}x)(e^{\frac{\pi i}{2}} x^k – i) \\ &= g(\omega_{4k}x) + i h(\omega_{4k}x)(x^k – 1) \\ &= g(\omega_{4k}x) \mod (x^k – 1) \end{aligned} \]

Translating this back to $k=N/2$, the mapping from $\mathbb{C}[x]/(x^{N/2} – i) \to \mathbb{C}[x]/(x^{N/2} – 1)$ is $f(x) \mapsto f(\omega_{2N}x)$. And if $f = f_0 + f_1x + \dots + f_{N/2 – 1}x^{N/2 – 1}$, then the mapping involves multiplying each coefficient $f_k$ by $\omega_{2N}^k$.

When you view polynomials as if they were a simple vector of their coefficients, then this operation $f(x) \mapsto f(\omega_{k}x)$ looks like $(a_0, a_1, \dots, a_n) \mapsto (a_0, \omega_{k} a_1, \dots, \omega_k^n a_n)$. Bernstein calls the operation a twist of $\mathbb{C}^n$, which I mused about in this Mathstodon thread.

What’s most important here is that each of these transformations are invertible. The first because the top half coefficients end up in the complex parts of the polynomial, and the second because the mapping $f(x) \mapsto f(\omega_{2N}^{-1}x)$ is an inverse. Together, this makes the preprocessing and postprocessing exact inverses of each other. The code is then

def negacyclic_polymul_complex_twist(p1, p2):
    n = p2.shape[0]
    primitive_root = primitive_nth_root(2 * n)
    root_powers = primitive_root ** numpy.arange(n // 2)

    p1_preprocessed = (p1[: n // 2] + 1j * p1[n // 2 :]) * root_powers
    p2_preprocessed = (p2[: n // 2] + 1j * p2[n // 2 :]) * root_powers

    p1_ft = fft(p1_preprocessed)
    p2_ft = fft(p2_preprocessed)
    prod = p1_ft * p2_ft
    ifft_prod = ifft(prod)
    ifft_rotated = ifft_prod * primitive_root ** numpy.arange(0, -n // 2, -1)

    return numpy.round(
        numpy.concatenate([numpy.real(ifft_rotated), numpy.imag(ifft_rotated)])

And so, at the cost of a bit more pre- and postprocessing, we can negacyclically multiply two degree $N-1$ polynomials using an FFT of length $N/2$. In theory, no information is wasted and this is optimal.

And finally, a simple matrix multiplication

The last technique I wanted to share is not based on the FFT, but it’s another method for doing negacyclic polynomial multiplication that has come in handy in situations where I am unable to use FFTs. I call it the Toeplitz method, because one of the polynomials is converted to a Toeplitz matrix. Sometimes I hear it referred to as a circulant matrix technique, but due to the negacyclic sign flip, I don’t think it’s a fully accurate term.

The idea is to put the coefficients of one polynomial $f(x) = f_0 + f_1x + \dots + f_{N-1}x^{N-1}$ into a matrix as follows:

\[ \begin{pmatrix} f_0 & -f_{N-1} & \dots & -f_1 \\ f_1 & f_0 & \dots & -f_2 \\ \vdots & \vdots & \ddots & \vdots \\ f_{N-1} & f_{N-2} & \dots & f_0 \end{pmatrix} \]

The polynomial coefficients are written down in the first column unchanged, then in each subsequent column, the coefficients are cyclically shifted down one, and the term that wraps around the top has its sign flipped. When the second polynomial is treated as a vector of its coefficients, say, $g(x) = g_0 + g_1x + \dots + g_{N-1}x^{N-1}$, then the matrix-vector product computes their negacyclic product (as a vector of coefficients):

\[ \begin{pmatrix} f_0 & -f_{N-1} & \dots & -f_1 \\ f_1 & f_0 & \dots & -f_2 \\ \vdots & \vdots & \ddots & \vdots \\ f_{N-1} & f_{N-2} & \dots & f_0 \end{pmatrix} \begin{pmatrix} g_0 \\ g_1 \\ \vdots \\ g_{N-1} \end{pmatrix} \]

This works because each row $j$ corresponds to one output term $x^j$, and the cyclic shift for that row accounts for the degree-wrapping, with the sign flip accounting for the negacyclic part. (If there were no sign attached, this method could be used to compute a cyclic polynomial product).

The Python code for this is

def cylic_matrix(c: numpy.array) -> numpy.ndarray:
    """Generates a cyclic matrix with each row of the input shifted.

    For input: [1, 2, 3], generates the following matrix:

        [[1 2 3]
         [2 3 1]
         [3 1 2]]
    c = numpy.asarray(c).ravel()
    a, b = numpy.ogrid[0 : len(c), 0 : -len(c) : -1]
    indx = a + b
    return c[indx]

def negacyclic_polymul_toeplitz(p1, p2):
    n = len(p1)

    # Generates a sign matrix with 1s below the diagonal and -1 above.
    up_tri = numpy.tril(numpy.ones((n, n), dtype=int), 0)
    low_tri = numpy.triu(numpy.ones((n, n), dtype=int), 1) * -1
    sign_matrix = up_tri + low_tri

    cyclic_matrix = cylic_matrix(p1)
    toeplitz_p1 = sign_matrix * cyclic_matrix
    return numpy.matmul(toeplitz_p1, p2)

Obviously on most hardware this would be less efficient than an FFT-based method (and there is some relationship between circulant matrices and Fourier Transforms, see Wikipedia). But in some cases—when the polynomials are small, or one of the two polynomials is static, or a particular hardware choice doesn’t handle FFTs with high-precision floats very well, or you want to take advantage of natural parallelism in the matrix-vector product—this method can be useful. It’s also simpler to reason about.

Until next time!

Polynomial Multiplication Using the FFT

Problem: Compute the product of two polynomials efficiently.


import numpy
from numpy.fft import fft, ifft

def poly_mul(p1, p2):
    """Multiply two polynomials.

    p1 and p2 are arrays of coefficients in degree-increasing order.
    deg1 = p1.shape[0] - 1
    deg2 = p1.shape[0] - 1
    # Would be 2*(deg1 + deg2) + 1, but the next-power-of-2 handles the +1
    total_num_pts = 2 * (deg1 + deg2)
    next_power_of_2 = 1 << (total_num_pts - 1).bit_length()

    ff_p1 = fft(numpy.pad(p1, (0, next_power_of_2 - p1.shape[0])))
    ff_p2 = fft(numpy.pad(p2, (0, next_power_of_2 - p2.shape[0])))
    product = ff_p1 * ff_p2
    inverted = ifft(product)
    rounded = numpy.round(numpy.real(inverted)).astype(numpy.int32)
    return numpy.trim_zeros(rounded, trim='b')

Discussion: The Fourier Transform has a lot of applications to science, and I’ve covered it on this blog before, see the Signal Processing section of Main Content. But it also has applications to fast computational mathematics.

The naive algorithm for multiplying two polynomials is the “grade-school” algorithm most readers will already be familiar with (see e.g., this page), but for large polynomials that algorithm is slow. It requires $O(n^2)$ arithmetic operations to multiply two polynomials of degree $n$.

This short tip shows a different approach, which is based on the idea of polynomial interpolation. As a side note, I show the basic theory of polynomial interpolation in chapter 2 of my book, A Programmer’s Introduction to Mathematics, along with an application to cryptography called “Secret Sharing.”

The core idea is that given $n+1$ distinct evaluations of a polynomial $p(x)$ (i.e., points $(x, p(x))$ with different $x$ inputs), you can reconstruct the coefficients of $p(x)$ exactly. And if you have two such point sets for two different polynomials $p(x), q(x)$, a valid point set of the product $(pq)(x)$ is the product of the points that have the same $x$ inputs.

\[ \begin{aligned} p(x) &= \{ (x_0, p(x_0)), (x_1, p(x_1)), \dots, (x_n, p(x_n)) \} \\ q(x) &= \{ (x_0, q(x_0)), (x_1, q(x_1)), \dots, (x_n, q(x_n)) \} \\ (pq)(x) &= \{ (x_0, p(x_0)q(x_0)), (x_1, p(x_1)q(x_1)), \dots, (x_n, p(x_n)q(x_n)) \} \end{aligned} \]

The above uses $=$ loosely to represent that the polynomial $p$ can be represented by the point set on the right hand side.

So given two polynomials $p(x), q(x)$ in their coefficient forms, one can first convert them to their point forms, multiply the points, and then reconstruct the resulting product.

The problem is that the two conversions, both to and from the coefficient form, are inefficient for arbitrary choices of points $x_0, \dots, x_n$. The trick comes from choosing special points, in such a way that the intermediate values computed in the conversion steps can be reused. This is where the Fourier Transform comes in: choose $x_0 = \omega_{N}$, the complex-N-th root of unity, and $x_k = \omega_N^k$ as its exponents. $N$ is required to be large enough so that $\omega_N$’s exponents have at least $2n+1$ distinct values required for interpolating a degree-at-most-$2n$ polynomial, and because we’re doing the Fourier Transform, it will naturally be “the next largest power of 2” bigger than the degree of the product polynomial.

Then one has to observe that, by its very formula, the Fourier Transform is exactly the evaluation of a polynomial at the powers of the $N$-th root of unity! In formulas: if $a = (a_0, \dots, a_{n-1})$ is a list of real numbers define $p_a(x) = a_0 + a_1x + \dots + a_{n-1}x^{n-1}$. Then $\mathscr{F}(a)(k)$, the Fourier Transform of $a$ at index $k$, is equal to $p_a(\omega_n^k)$. These notes by Denis Pankratov have more details showing that the Fourier Transform formula is a polynomial evaluation (see Section 3), and this YouTube video by Reducible also has a nice exposition. This interpretation of the FT as polynomial evaluation seems to inspire quite a few additional techniques for computing the Fourier Transform that I plan to write about in the future.

The last step is to reconstruct the product polynomial from the product of the two point sets, but because the Fourier Transform is an invertible function (and linear, too), the inverse Fourier Transform does exactly that: given a list of the $n$ evaluations of a polynomial at $\omega_n^k, k=0, \dots, n-1$, return the coefficients of the polynomial.

This all fits together into the code above:

  1. Pad the input coefficient lists with zeros, so that the lists are a power of 2 and at least 1 more than the degree of the output product polynomial.
  2. Compute the FFT of the two padded polynomials.
  3. Multiply the result pointwise.
  4. Compute the IFFT of the product.
  5. Round the resulting (complex) values back to integers.

Hey, wait a minute! What about precision issues?

They are a problem when you have large numbers or large polynomials, because the intermediate values in steps 2-4 can lose precision due to the floating point math involved. This short note of Richard Fateman discusses some of those issues, and two paths forward include: deal with it somehow, or use an integer-exact analogue called the Number Theoretic Transform (which itself has issues I’ll discuss in a future, longer article).

Postscript: I’m not sure how widely this technique is used. It appears the NTL library uses the polynomial version of Karatsuba multiplication instead (though it implements FFT elsewhere). However, I know for sure that much software involved in doing fully homomorphic encryption rely on the FFT for performance reasons, and the ones that don’t instead use the NTT.

Searching for RH Counterexamples — Search Strategies

We’re glibly searching for counterexamples to the Riemann Hypothesis, to trick you into learning about software engineering principles. In the first two articles we configured a testing framework and showed how to hide implementation choices behind an interface. Next, we’ll improve the algorithm’s core routine. As before, I’ll link to specific git commits in the final code repository to show how the project evolves.

Superabundant numbers

A superabundant number $ n$ is one which has “maximal relative divisor sums” in the following sense: for all $ m < n$,

$ \displaystyle \frac{\sigma(m)}{m} < \frac{\sigma(n)}{n}$

where $ \sigma(n)$ is the sum of the divisors of $ n$.

Erdős and Alaoglu proved in 1944 (“On highly composite and similar numbers“) that superabundant numbers have a specific prime decomposition, in which all initial primes occur with non-increasing exponents

$ \displaystyle n = \prod_{i=1}^k (p_i)^{a_i},$

where $ p_i$ is the i-th prime, and $ a_1 \geq a_2 \geq \dots \geq a_k \geq 1$. With two exceptions ($ n=4, 36$), $ a_k = 1$.

Here’s a rough justification for why superabundant numbers should have a decomposition like this. If you want a number with many divisors (compared to the size of the number), you want to pack as many combinations of small primes into the decomposition of your number as possible. Using all 2’s leads to not enough combinations—only $ m+1$ divisors for $ 2^m$—but using 2′ and 3’s you get $ (r+1)(s+1)$ for $ 2^r3^s$. Using more 3’s trades off a larger number $ n$ for the benefit of a larger $ \sigma(n)$ (up to $ r=s$). The balance between getting more distinct factor combinations and a larger $ n$ favors packing the primes in there.

Though numbers of this form are not necessarily superabundant, this gives us an enumeration strategy better than trying all numbers. Enumerate over tuples corresponding to the exponents of the prime decomposition (non-increasing lists of integers), and save those primes to make it easier to compute the divisor sum.

Non-increasing lists of integers can be enumerated in the order of their sum, and for each sum $ N$, the set of non-increasing lists of integers summing to $ N$ is called the partitions of $ N$. There is a simple algorithm to compute them, implemented in this commit. Note this does not enumerate them in order of the magnitude of the number $ \prod_{i=1}^k (p_i)^{a_i}$.

The implementation for the prime-factorization-based divisor sum computation is in this commit. In addition, to show some alternative methods of testing, we used the hypothesis library to autogenerate tests. It chooses a random (limited size) prime factorization, and compares the prime-factorization-based algorithm to the naive algorithm. There’s a bit of setup code involved, but as a result we get dozens of tests and more confidence it’s right.

Search Strategies

We now have two search strategies over the space of natural numbers, though one is obviously better. We may come up with a third, so it makes sense to separate the search strategy from the main application by an interface. Generally, if you have a hard-coded implementation, and you realize that you need to change it in a significant way, that’s a good opportunity to extract it and hide it behind an interface.

A good interface choice is a bit tricky here, however. In the original implementation, we could say, “process the batch of numbers (search for counterexamples) between 1 and 2 million.” When that batch is saved to the database, we would start on the next batch, and all the batches would be the same size, so (ignoring that computing $ \sigma(n)$ the old way takes longer as $ n$ grows) each batch required roughly the same time to run.

The new search strategy doesn’t have a sensible way to do this. You can’t say “start processing from K” because we don’t know how to easily get from K to the parameter of the enumeration corresponding to K (if one exists). This is partly because our enumeration isn’t monotonic increasing ($ 2^1 3^1 5^1 = 30$ comes before $ 2^4 = 16$). And partly because even if we did have a scheme, it would almost certainly require us to compute a prime factorization, which is slow. It would be better if we could save the data from the latest step of the enumeration, and load it up when starting the next batch of the search.

This scheme suggests a nicely generic interface for stopping and restarting a search from a particular spot. The definition of a “spot,” and how to start searching from that spot, are what’s hidden by the interface. Here’s a first pass.

SearchState = TypeVar('SearchState')

class SearchStrategy(ABC):
    def starting_from(self, search_state: SearchState) -> SearchStrategy:
        '''Reset the search strategy to search from a given state.'''

    def search_state(self) -> SearchState:
        '''Get an object describing the current state of the enumeration.'''

    def next_batch(self, batch_size: int) -> List[RiemannDivisorSum]:
        '''Process the next batch of Riemann Divisor Sums'''

Note that SearchState is defined as a generic type variable because we cannot say anything about its structure yet. The implementation class is responsible for defining what constitutes a search state, and getting the search strategy back to the correct step of the enumeration given the search state as input. Later I realized we do need some structure on the SearchState—the ability to serialize it for storage in the database—so we elevated it to an interface later.

Also note that we are making SearchStrategy own the job of computing the Riemann divisor sums. This is because the enumeration details and the algorithm to compute the divisor sums are now coupled. For the exhaustive search strategy it was “integers n, naively loop over smaller divisors.” In the new strategy it’s “prime factorizations, prime-factorization-based divisor sum.” We could decouple this, but there is little reason to now because the implementations are still in 1-1 correspondence.

This commit implements the old search strategy in terms of this interface, and this commit implements the new search strategy. In the latter, I use pytest.parameterize to test against the interface and parameterize over the implementations.

The last needed bit is the ability to store and recover the search state in between executions of the main program. This requires a second database table. The minimal thing we could do is just store and update a single row for each search strategy, providing the search state as of the last time the program was run and stopped. This would do, but in my opinion an append-only log is a better design for such a table. That is, each batch computed will have a record containing the timestamp the batch started and finished, along with the starting and ending search state. We can use the largest timestamp for a given search strategy to pick up where we left off across program runs.

One can imagine this being the basis for an application like folding@home or the BOINC family of projects, where a database stores chunks of a larger computation (ranges of a search space), clients can request chunks to complete, and they are assembled into a complete database. In this case we might want to associate the chunk metadata with the computed results (say, via a foreign key). That would require a bit of work from what we have now, but note that the interfaces would remain reusable for this. For now, we will just incorporate the basic table approach. It is completed in this pull request, and tying it into the main search routine is done in this commit.

However, when running it with the superabundant search strategy, we immediately run into a problem. Superabundant numbers grow too fast, and within a few small batches of size 100 we quickly exceed the 64 bits available to numba and sqlite to store the relevant data.

>>> fac = partition_to_prime_factorization(partitions_of_n(16)[167])
>>> fac2 = [p**d for (p, d) in fac]
>>> fac2
[16, 81, 625, 2401, 11, 13, 17, 19, 23, 29, 31, 37]
>>> math.log2(reduce(lambda x,y: x*y, fac2))

Running results in the error

$ python -m riemann.populate_database db.sqlite3 SuperabundantSearchStrategy 100
Searching with strategy SuperabundantSearchStrategy
Starting from search state SuperabundantEnumerationIndex(level=1, index_in_level=0)
Computed [1,0, 10,4] in 0:00:03.618798
Computed [10,4, 12,6] in 0:00:00.031451
Computed [12,6, 13,29] in 0:00:00.031518
Computed [13,29, 14,28] in 0:00:00.041464
Computed [14,28, 14,128] in 0:00:00.041674
Computed [14,128, 15,93] in 0:00:00.034419
OverflowError: Python int too large to convert to SQLite INTEGER

We’ll see what we can do about this in a future article, but meanwhile we do get some additional divisor sums for these large numbers, and 10080 is still the best.

sqlite> select n, witness_value 
from RiemannDivisorSums 
where witness_value > 1.7 and n > 5040 
order by witness_value desc 
limit 10;


Zero Knowledge Proofs for NP

Last time, we saw a specific zero-knowledge proof for graph isomorphism. This introduced us to the concept of an interactive proof, where you have a prover and a verifier sending messages back and forth, and the prover is trying to prove a specific claim to the verifier.

A zero-knowledge proof is a special kind of interactive proof in which the prover has some secret piece of knowledge that makes it very easy to verify a disputed claim is true. The prover’s goal, then, is to convince the verifier (a polynomial-time algorithm) that the claim is true without revealing any knowledge at all about the secret.

In this post we’ll see that, using a bit of cryptography, zero-knowledge proofs capture a much wider class of problems than graph isomorphism. Basically, if you believe that cryptography exists, every problem whose answers can be easily verified have zero-knowledge proofs (i.e., all of the class NP). Here are a bunch of examples. For each I’ll phrase the problem as a question, and then say what sort of data the prover’s secret could be.

  • Given a boolean formula, is there an assignment of variables making it true? Secret: a satisfying assignment to the variables.
  • Given a set of integers, is there a subset whose sum is zero? Secret: such a subset.
  • Given a graph, does it have a 3-coloring? Secret: a valid 3-coloring.
  • Given a boolean circuit, can it produce a specific output? Secret: a choice of inputs that produces the output.

The common link among all of these problems is that they are NP-hard (graph isomorphism isn’t known to be NP-hard). For us this means two things: (1) we think these problems are actually hard, so the verifier can’t solve them, and (2) if you show that one of them has a zero-knowledge proof, then they all have zero-knowledge proofs.

We’re going to describe and implement a zero-knowledge proof for graph 3-colorability, and in the next post we’ll dive into the theoretical definitions and talk about the proof that the scheme we present is zero-knowledge. As usual, all of the code used in making this post is available in a repository on this blog’s Github page. In the follow up to this post, we’ll dive into more nitty gritty details about the proof that this works, and study different kinds of zero-knowledge.

One-way permutations

In a recent program gallery post we introduced the Blum-Blum-Shub pseudorandom generator. A pseudorandom generator is simply an algorithm that takes as input a short random string of length $ s$ and produces as output a longer string, say, of length $ 3s$. This output string should not be random, but rather “indistinguishable” from random in a sense we’ll make clear next time. The underlying function for this generator is the “modular squaring” function $ x \mapsto x^2 \mod M$, for some cleverly chosen $ M$. The $ M$ is chosen in such a way that makes this mapping a permutation. So this function is more than just a pseudorandom generator, it’s a one-way permutation.

If you have a primality-checking algorithm on hand (we do), then preparing the Blum-Blum-Shub algorithm is only about 15 lines of code.

def goodPrime(p):
    return p % 4 == 3 and probablyPrime(p, accuracy=100)

def findGoodPrime(numBits=512):
    candidate = 1

    while not goodPrime(candidate):
        candidate = random.getrandbits(numBits)

    return candidate

def makeModulus(numBits=512):
    return findGoodPrime(numBits) * findGoodPrime(numBits)

def blum_blum_shub(modulusLength=512):
    modulus = makeModulus(numBits=modulusLength)

    def f(inputInt):
        return pow(inputInt, 2, modulus)

    return f

The interested reader should check out the proof gallery post for more details about this generator. For us, having a one-way permutation is the important part (and we’re going to defer the formal definition of “one-way” until next time, just think “hard to get inputs from outputs”).

The other concept we need, which is related to a one-way permutation, is the notion of a hardcore predicate. Let $ G(x)$ be a one-way permutation, and let $ f(x) = b$ be a function that produces a single bit from a string. We say that $ f$ is a hardcore predicate for $ G$ if you can’t reliably compute $ f(x)$ when given only $ G(x)$.

Hardcore predicates are important because there are many one-way functions for which, when given the output, you can guess part of the input very reliably, but not the rest (e.g., if $ g$ is a one-way function, $ (x, y) \mapsto (x, g(y))$ is also one-way, but the $ x$ part is trivially guessable). So a hardcore predicate formally measures, when given the output of a one-way function, what information derived from the input is hard to compute.

In the case of Blum-Blum-Shub, one hardcore predicate is simply the parity of the input bits.

def parity(n):
    return sum(int(x) for x in bin(n)[2:]) % 2

Bit Commitment Schemes

A core idea that will makes zero-knowledge proofs work for NP is the ability for the prover to publicly “commit” to a choice, and later reveal that choice in a way that makes it infeasible to fake their commitment. This will involve not just the commitment to a single bit of information, but also the transmission of auxiliary data that is provably infeasible to fake.

Our pair of one-way permutation $ G$ and hardcore predicate $ f$ comes in very handy. Let’s say I want to commit to a bit $ b \in \{ 0,1 \}$. Let’s fix a security parameter that will measure how hard it is to change my commitment post-hoc, say $ n = 512$. My process for committing is to draw a random string $ x$ of length $ n$, and send you the pair $ (G(x), f(x) \oplus b)$, where $ \oplus$ is the XOR operator on two bits.

The guarantee of a one-way permutation with a hardcore predicate is that if you only see $ G(x)$, you can’t guess $ f(x)$ with any reasonable edge over random guessing. Moreover, if you fix a bit $ b$, and take an unpredictably random bit $ y$, the XOR $ b \oplus y$ is also unpredictably random. In other words, if $ f(x)$ is hardcore, then so is $ x \mapsto f(x) \oplus b$ for a fixed bit $ b$. Finally, to reveal my commitment, I just send the string $ x$ and let you independently compute $ (G(x), f(x) \oplus b)$. Since $ G$ is a permutation, that $ x$ is the only $ x$ that could have produced the commitment I sent you earlier.

Here’s a Python implementation of this scheme. We start with a generic base class for a commitment scheme.

class CommitmentScheme(object):
    def __init__(self, oneWayPermutation, hardcorePredicate, securityParameter):
            oneWayPermutation: int -&gt; int
            hardcorePredicate: int -&gt; {0, 1}
        self.oneWayPermutation = oneWayPermutation
        self.hardcorePredicate = hardcorePredicate
        self.securityParameter = securityParameter

        # a random string of length `self.securityParameter` used only once per commitment
        self.secret = self.generateSecret()

    def generateSecret(self):
        raise NotImplemented

    def commit(self, x):
        raise NotImplemented

    def reveal(self):
        return self.secret

Note that the “reveal” step is always simply to reveal the secret. Here’s the implementation subclass. We should also note that the security string should be chosen at random anew for every bit you wish to commit to. In this post we won’t reuse CommitmentScheme objects anyway.

class BBSBitCommitmentScheme(CommitmentScheme):
    def generateSecret(self):
        # the secret is a random quadratic residue
        self.secret = self.oneWayPermutation(random.getrandbits(self.securityParameter))
        return self.secret

    def commit(self, bit):
        unguessableBit = self.hardcorePredicate(self.secret)
        return (
            unguessableBit ^ bit,  # python xor

One important detail is that the Blum-Blum-Shub one-way permutation is only a permutation when restricted to quadratic residues. As such, we generate our secret by shooting a random string through the one-way permutation to get a random residue. In fact this produces a uniform random residue, since the Blum-Blum-Shub modulus is chosen in such a way that ensures every residue has exactly four square roots.

Here’s code to check the verification is correct.

class BBSBitCommitmentVerifier(object):
    def __init__(self, oneWayPermutation, hardcorePredicate):
        self.oneWayPermutation = oneWayPermutation
        self.hardcorePredicate = hardcorePredicate

    def verify(self, securityString, claimedCommitment):
        trueBit = self.decode(securityString, claimedCommitment)
        unguessableBit = self.hardcorePredicate(securityString)  # wasteful, whatever
        return claimedCommitment == (
            unguessableBit ^ trueBit,  # python xor

    def decode(self, securityString, claimedCommitment):
        unguessableBit = self.hardcorePredicate(securityString)
        return claimedCommitment[1] ^ unguessableBit

and an example of using it

if __name__ == "__main__":
    import blum_blum_shub
    securityParameter = 10
    oneWayPerm = blum_blum_shub.blum_blum_shub(securityParameter)
    hardcorePred = blum_blum_shub.parity

    print('Bit commitment')
    scheme = BBSBitCommitmentScheme(oneWayPerm, hardcorePred, securityParameter)
    verifier = BBSBitCommitmentVerifier(oneWayPerm, hardcorePred)

    for _ in range(10):
        bit = random.choice([0, 1])
        commitment = scheme.commit(bit)
        secret = scheme.reveal()
        trueBit = verifier.decode(secret, commitment)
        valid = verifier.verify(secret, commitment)

        print('{} == {}? {}; {} {}'.format(bit, trueBit, valid, secret, commitment))

Example output:

1 == 1? True; 524 (5685, 0)
1 == 1? True; 149 (22201, 1)
1 == 1? True; 476 (34511, 1)
1 == 1? True; 927 (14243, 1)
1 == 1? True; 608 (23947, 0)
0 == 0? True; 964 (7384, 1)
0 == 0? True; 373 (23890, 0)
0 == 0? True; 620 (270, 1)
1 == 1? True; 926 (12390, 0)
0 == 0? True; 708 (1895, 0)

As an exercise, write a program to verify that no other input to the Blum-Blum-Shub one-way permutation gives a valid verification. Test it on a small security parameter like $ n=10$.

It’s also important to point out that the verifier needs to do some additional validation that we left out. For example, how does the verifier know that the revealed secret actually is a quadratic residue? In fact, detecting quadratic residues is believed to be hard! To get around this, we could change the commitment scheme reveal step to reveal the random string that was used as input to the permutation to get the residue (cf. BBSCommitmentScheme.generateSecret for the random string that needs to be saved/revealed). Then the verifier could generate the residue in the same way. As an exercise, upgrade the bit commitment an verifier classes to reflect this.

In order to get a zero-knowledge proof for 3-coloring, we need to be able to commit to one of three colors, which requires two bits. So let’s go overkill and write a generic integer commitment scheme. It’s simple enough: specify a bound on the size of the integers, and then do an independent bit commitment for every bit.

class BBSIntCommitmentScheme(CommitmentScheme):
    def __init__(self, numBits, oneWayPermutation, hardcorePredicate, securityParameter=512):
            A commitment scheme for integers of a prespecified length `numBits`. Applies the
            Blum-Blum-Shub bit commitment scheme to each bit independently.
        self.schemes = [BBSBitCommitmentScheme(oneWayPermutation, hardcorePredicate, securityParameter)
                        for _ in range(numBits)]
        super().__init__(oneWayPermutation, hardcorePredicate, securityParameter)

    def generateSecret(self):
        self.secret = [x.secret for x in self.schemes]
        return self.secret

    def commit(self, integer):
        # first pad bits to desired length
        integer = bin(integer)[2:].zfill(len(self.schemes))
        bits = [int(bit) for bit in integer]
        return [scheme.commit(bit) for scheme, bit in zip(self.schemes, bits)]

And the corresponding verifier

class BBSIntCommitmentVerifier(object):
    def __init__(self, numBits, oneWayPermutation, hardcorePredicate):
        self.verifiers = [BBSBitCommitmentVerifier(oneWayPermutation, hardcorePredicate)
                          for _ in range(numBits)]

    def decodeBits(self, secrets, bitCommitments):
        return [v.decode(secret, commitment) for (v, secret, commitment) in
                zip(self.verifiers, secrets, bitCommitments)]

    def verify(self, secrets, bitCommitments):
        return all(
            bitVerifier.verify(secret, commitment)
            for (bitVerifier, secret, commitment) in
            zip(self.verifiers, secrets, bitCommitments)

    def decode(self, secrets, bitCommitments):
        decodedBits = self.decodeBits(secrets, bitCommitments)
        return int(''.join(str(bit) for bit in decodedBits))

A sample usage:

if __name__ == "__main__":
    import blum_blum_shub
    securityParameter = 10
    oneWayPerm = blum_blum_shub.blum_blum_shub(securityParameter)
    hardcorePred = blum_blum_shub.parity

    print('Int commitment')
    scheme = BBSIntCommitmentScheme(10, oneWayPerm, hardcorePred)
    verifier = BBSIntCommitmentVerifier(10, oneWayPerm, hardcorePred)
    choices = list(range(1024))
    for _ in range(10):
        theInt = random.choice(choices)
        commitments = scheme.commit(theInt)
        secrets = scheme.reveal()
        trueInt = verifier.decode(secrets, commitments)
        valid = verifier.verify(secrets, commitments)

        print('{} == {}? {}; {} {}'.format(theInt, trueInt, valid, secrets, commitments))

And a sample output:

527 == 527? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 0), (342, 1), (54363, 1), (63975, 0), (5426, 0), (9124, 1), (23973, 0), (44832, 0), (33044, 0), (68501, 0)]
67 == 67? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 1), (342, 1), (54363, 1), (63975, 1), (5426, 0), (9124, 1), (23973, 1), (44832, 1), (33044, 0), (68501, 0)]
729 == 729? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 0), (342, 1), (54363, 0), (63975, 1), (5426, 0), (9124, 0), (23973, 0), (44832, 1), (33044, 1), (68501, 0)]
441 == 441? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 1), (342, 0), (54363, 0), (63975, 0), (5426, 1), (9124, 0), (23973, 0), (44832, 1), (33044, 1), (68501, 0)]
614 == 614? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 0), (342, 1), (54363, 1), (63975, 1), (5426, 1), (9124, 1), (23973, 1), (44832, 0), (33044, 0), (68501, 1)]
696 == 696? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 0), (342, 1), (54363, 0), (63975, 0), (5426, 1), (9124, 0), (23973, 0), (44832, 1), (33044, 1), (68501, 1)]
974 == 974? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 0), (342, 0), (54363, 0), (63975, 1), (5426, 0), (9124, 1), (23973, 0), (44832, 0), (33044, 0), (68501, 1)]
184 == 184? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 1), (342, 1), (54363, 0), (63975, 0), (5426, 1), (9124, 0), (23973, 0), (44832, 1), (33044, 1), (68501, 1)]
136 == 136? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 1), (342, 1), (54363, 0), (63975, 0), (5426, 0), (9124, 1), (23973, 0), (44832, 1), (33044, 1), (68501, 1)]
632 == 632? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 0), (342, 1), (54363, 1), (63975, 1), (5426, 1), (9124, 0), (23973, 0), (44832, 1), (33044, 1), (68501, 1)]

Before we move on, we should note that this integer commitment scheme “blows up” the secret by quite a bit. If you have a security parameter $ s$ and an integer with $ n$ bits, then the commitment uses roughly $ sn$ bits. A more efficient method would be to simply use a good public-key encryption scheme, and then reveal the secret key used to encrypt the message. While we implemented such schemes previously on this blog, I thought it would be more fun to do something new.

A zero-knowledge proof for 3-coloring

First, a high-level description of the protocol. The setup: the prover has a graph $ G$ with $ n$ vertices $ V$ and $ m$ edges $ E$, and also has a secret 3-coloring of the vertices $ \varphi: V \to \{ 0, 1, 2 \}$. Recall, a 3-coloring is just an assignment of colors to vertices (in this case the colors are 0,1,2) so that no two adjacent vertices have the same color.

So the prover has a coloring $ \varphi$ to be kept secret, but wants to prove that $ G$ is 3-colorable. The idea is for the verifier to pick a random edge $ (u,v)$, and have the prover reveal the colors of $ u$ and $ v$. However, if we run this protocol only once, there’s nothing to stop the prover from just lying and picking two distinct colors. If we allow the verifier to run the protocol many times, and the prover actually reveals the colors from their secret coloring, then after roughly $ |V|$ rounds the verifier will know the entire coloring. Each step reveals more knowledge.

We can fix this with two modifications.

  1. The prover first publicly commits to the coloring using a commitment scheme. Then when the verifier asks for the colors of the two vertices of a random edge, he can rest assured that the prover fixed a coloring that does not depend on the verifier’s choice of edge.
  2. The prover doesn’t reveal colors from their secret coloring, but rather from a random permutation of the secret coloring. This way, when the verifier sees colors, they’re equally likely to see any two colors, and all the verifier will know is that those two colors are different.

So the scheme is: prover commits to a random permutation of the true coloring and sends it to the verifier; the verifier asks for the true colors of a given edge; the prover provides those colors and the secrets to their commitment scheme so the verifier can check.

The key point is that now the verifier has to commit to a coloring, and if the coloring isn’t a proper 3-coloring the verifier has a reasonable chance of picking an improperly colored edge (a one-in-$ |E|$ chance, which is at least $ 1/|V|^2$). On the other hand, if the coloring is proper, then the verifier will always query a properly colored edge, and it’s zero-knowledge because the verifier is equally likely to see every pair of colors. So the verifier will always accept, but won’t know anything more than that the edge it chose is properly colored. Repeating this $ |V|^2$-ish times, with high probability it’ll have queried every edge and be certain the coloring is legitimate.

Let’s implement this scheme. First the data types. As in the previous post, graphs are represented by edge lists, and a coloring is represented by a dictionary mapping a vertex to 0, 1, or 2 (the “colors”).

# a graph is a list of edges, and for simplicity we'll say
# every vertex shows up in some edge
exampleGraph = [
    (1, 2),
    (1, 4),
    (1, 3),
    (2, 5),
    (2, 5),
    (3, 6),
    (5, 6)

exampleColoring = {
    1: 0,
    2: 1,
    3: 2,
    4: 1,
    5: 2,
    6: 0,

Next, the Prover class that implements that half of the protocol. We store a list of integer commitment schemes for each vertex whose color we need to commit to, and send out those commitments.

class Prover(object):
    def __init__(self, graph, coloring, oneWayPermutation=ONE_WAY_PERMUTATION, hardcorePredicate=HARDCORE_PREDICATE):
        self.graph = [tuple(sorted(e)) for e in graph]
        self.coloring = coloring
        self.vertices = list(range(1, numVertices(graph) + 1))
        self.oneWayPermutation = oneWayPermutation
        self.hardcorePredicate = hardcorePredicate
        self.vertexToScheme = None

    def commitToColoring(self):
        self.vertexToScheme = {
            v: commitment.BBSIntCommitmentScheme(
                2, self.oneWayPermutation, self.hardcorePredicate
            ) for v in self.vertices

        permutation = randomPermutation(3)
        permutedColoring = {
            v: permutation[self.coloring[v]] for v in self.vertices

        return {v: s.commit(permutedColoring[v])
                for (v, s) in self.vertexToScheme.items()}

    def revealColors(self, u, v):
        u, v = min(u, v), max(u, v)
        if not (u, v) in self.graph:
            raise Exception('Must query an edge!')

        return (

In commitToColoring we randomly permute the underlying colors, and then compose that permutation with the secret coloring, committing to each resulting color independently. In revealColors we reveal only those colors for a queried edge. Note that we don’t actually need to store the permuted coloring, because it’s implicitly stored in the commitments.

It’s crucial that we reject any query that doesn’t correspond to an edge. If we don’t reject such queries then the verifier can break the protocol! In particular, by querying non-edges you can determine which pairs of nodes have the same color in the secret coloring. You can then chain these together to partition the nodes into color classes, and so color the graph. (After seeing the Verifier class below, implement this attack as an exercise).

Here’s the corresponding Verifier:

class Verifier(object):
    def __init__(self, graph, oneWayPermutation, hardcorePredicate):
        self.graph = [tuple(sorted(e)) for e in graph]
        self.oneWayPermutation = oneWayPermutation
        self.hardcorePredicate = hardcorePredicate
        self.committedColoring = None
        self.verifier = commitment.BBSIntCommitmentVerifier(2, oneWayPermutation, hardcorePredicate)

    def chooseEdge(self, committedColoring):
        self.committedColoring = committedColoring
        self.chosenEdge = random.choice(self.graph)
        return self.chosenEdge

    def accepts(self, revealed):
        revealedColors = []

        for (w, bitSecrets) in zip(self.chosenEdge, revealed):
            trueColor = self.verifier.decode(bitSecrets, self.committedColoring[w])
            if not self.verifier.verify(bitSecrets, self.committedColoring[w]):
                return False

        return revealedColors[0] != revealedColors[1]

As expected, in the acceptance step the verifier decodes the true color of the edge it queried, and accepts if and only if the commitment was valid and the edge is properly colored.

Here’s the whole protocol, which is syntactically very similar to the one for graph isomorphism.

def runProtocol(G, coloring, securityParameter=512):
    oneWayPermutation = blum_blum_shub.blum_blum_shub(securityParameter)
    hardcorePredicate = blum_blum_shub.parity

    prover = Prover(G, coloring, oneWayPermutation, hardcorePredicate)
    verifier = Verifier(G, oneWayPermutation, hardcorePredicate)

    committedColoring = prover.commitToColoring()
    chosenEdge = verifier.chooseEdge(committedColoring)

    revealed = prover.revealColors(*chosenEdge)
    revealedColors = (
        verifier.verifier.decode(revealed[0], committedColoring[chosenEdge[0]]),
        verifier.verifier.decode(revealed[1], committedColoring[chosenEdge[1]]),
    isValid = verifier.accepts(revealed)

    print("{} != {} and commitment is valid? {}".format(
        revealedColors[0], revealedColors[1], isValid

    return isValid

And an example of running it

if __name__ == "__main__":
    for _ in range(30):
        runProtocol(exampleGraph, exampleColoring, securityParameter=10)

Here’s the output

0 != 2 and commitment is valid? True
1 != 0 and commitment is valid? True
1 != 2 and commitment is valid? True
2 != 0 and commitment is valid? True
1 != 2 and commitment is valid? True
2 != 0 and commitment is valid? True
0 != 2 and commitment is valid? True
0 != 2 and commitment is valid? True
0 != 1 and commitment is valid? True
0 != 1 and commitment is valid? True
2 != 1 and commitment is valid? True
0 != 2 and commitment is valid? True
2 != 0 and commitment is valid? True
2 != 0 and commitment is valid? True
1 != 0 and commitment is valid? True
1 != 0 and commitment is valid? True
0 != 2 and commitment is valid? True
2 != 1 and commitment is valid? True
0 != 2 and commitment is valid? True
0 != 2 and commitment is valid? True
2 != 1 and commitment is valid? True
1 != 0 and commitment is valid? True
1 != 0 and commitment is valid? True
2 != 1 and commitment is valid? True
2 != 1 and commitment is valid? True
1 != 0 and commitment is valid? True
0 != 2 and commitment is valid? True
1 != 2 and commitment is valid? True
1 != 2 and commitment is valid? True
0 != 1 and commitment is valid? True

So while we haven’t proved it rigorously, we’ve seen the zero-knowledge proof for graph 3-coloring. This automatically gives us a zero-knowledge proof for all of NP, because given any NP problem you can just convert it to the equivalent 3-coloring problem and solve that. Of course, the blowup required to convert a random NP problem to 3-coloring can be polynomially large, which makes it unsuitable for practice. But the point is that this gives us a theoretical justification for which problems have zero-knowledge proofs in principle. Now that we’ve established that you can go about trying to find the most efficient protocol for your favorite problem.

Anticipatory notes

When we covered graph isomorphism last time, we said that a simulator could, without participating in the zero-knowledge protocol or knowing the secret isomorphism, produce a transcript that was drawn from the same distribution of messages as the protocol produced. That was all that it needed to be “zero-knowledge,” because anything the verifier could do with its protocol transcript, the simulator could do too.

We can do exactly the same thing for 3-coloring, exploiting the same “reverse order” trick where the simulator picks the random edge first, then chooses the color commitment post-hoc.

Unfortunately, both there and here I’m short-changing you, dear reader. The elephant in the room is that our naive simulator assumes the verifier is playing by the rules! If you want to define security, you have to define it against a verifier who breaks the protocol in an arbitrary way. For example, the simulator should be able to produce an equivalent transcript even if the verifier deterministically picks an edge, or tries to pick a non-edge, or tries to send gibberish. It takes a lot more work to prove security against an arbitrary verifier, but the basic setup is that the simulator can no longer make choices for the verifier, but rather has to invoke the verifier subroutine as a black box. (To compensate, the requirements on the simulator are relaxed quite a bit; more on that next time)

Because an implementation of such a scheme would involve a lot of validation, we’re going to defer the discussion to next time. We also need to be more specific about the different kinds of zero-knowledge, since we won’t be able to achieve perfect zero-knowledge with the simulator drawing from an identical distribution, but rather a computationally indistinguishable distribution.

We’ll define all this rigorously next time, and discuss the known theoretical implications and limitations. Next time will be cuffs-off theory, baby!

Until then!