A Spectral Analysis of Moore Graphs

For fixed integers r > 0, and odd g, a Moore graph is an r-regular graph of girth g which has the minimum number of vertices n among all such graphs with the same regularity and girth.

(Recall, A the girth of a graph is the length of its shortest cycle, and it’s regular if all its vertices have the same degree)

Problem (Hoffman-Singleton): Find a useful constraint on the relationship between n and r for Moore graphs of girth 5 and degree r.

Note: Excluding trivial Moore graphs with girth g=3 and degree r=2, there are only two known Moore graphs: (a) the Petersen graph and (b) this crazy graph:


The solution to the problem shows that there are only a few cases left to check.

Solution: It is easy to show that the minimum number of vertices of a Moore graph of girth 5 and degree r is 1 + r + r(r-1) = r^2 + 1. Just consider the tree:


This is the tree example for r = 3, but the argument should be clear for any r from the branching pattern of the tree: 1 + r + r(r-1)

Provided n = r^2 + 1, we will prove that r must be either 3, 7, or 57. The technique will be to analyze the eigenvalues of a special matrix derived from the Moore graph.

Let A be the adjacency matrix of the supposed Moore graph with these properties. Let B = A^2 = (b_{i,j}). Using the girth and regularity we know:

  • b_{i,i} = r since each vertex has degree r.
  • b_{i,j} = 0 if (i,j) is an edge of G, since any walk of length 2 from i to j would be able to use such an edge and create a cycle of length 3 which is less than the girth.
  • b_{i,j} = 1 if (i,j) is not an edge, because (using the tree idea above), every two vertices non-adjacent vertices have a unique neighbor in common.

Let J_n be the n \times n matrix of all 1’s and I_n the identity matrix. Then

\displaystyle B = rI_n + J_n - I_n - A.

We use this matrix equation to generate two equations whose solutions will restrict r. Since A is a real symmetric matrix is has an orthonormal basis of eigenvectors v_1, \dots, v_n with eigenvalues \lambda_1 , \dots, \lambda_n. Moreover, by regularity we know one of these vectors is the all 1’s vector, with eigenvalue r. Call this v_1 = (1, \dots, 1), \lambda_1 = r. By orthogonality of v_1 with the other v_i, we know that J_nv_i = 0. We also know that, since A is an adjacency matrix with zeros on the diagonal, the trace of A is \sum_i \lambda_i = 0.

Multiply the matrices in the equation above by any v_i, i > 1 to get

\displaystyle \begin{aligned}A^2v_i &= rv_i - v_i - Av_i \\ \lambda_i^2v_i &= rv_i - v_i - \lambda_i v_i \end{aligned}

Rearranging and factoring out v_i gives \lambda_i^2 - \lambda_i - (r+1) = 0. Let z = 4r - 3, then the non-r eigenvalues must be one of the two roots: \mu_1 = (-1 + \sqrt{z}) / 2 or \mu_2 = (-1 - \sqrt{z})/2.

Say that \mu_1 occurs a times and \mu_2 occurs b times, then n = a + b + 1. So we have the following equations.

\displaystyle \begin{aligned} a + b + 1 &= n \\ r + a \mu_1 + b\mu_2 &= 0 \end{aligned}

From this equation you can easily derive that \sqrt{z} is an integer, and as a consequence r = (m^2 + 3) / 4 for some integer m. With a tiny bit of extra algebra, this gives

\displaystyle m(m^3 - 2m - 16(a-b)) = 15

Implying that m divides 15, meaning m \in \{ 1, 3, 5, 15\}, and as a consequence r \in \{ 1, 3, 7, 57\}.


Discussion: This is a strikingly clever use of spectral graph theory to answer a question about combinatorics. Spectral graph theory is precisely that, the study of what linear algebra can tell us about graphs. For an deeper dive into spectral graph theory, see the guest post I wrote on With High Probability.

If you allow for even girth, there are a few extra (infinite families of) Moore graphs, see Wikipedia for a list.

With additional techniques, one can also disprove the existence of any Moore graphs that are not among the known ones, with the exception of a possible Moore graph of girth 5 and degree 57 on n = 3250 vertices. It is unknown whether such a graph exists, but if it does, it is known that

You should go out and find it or prove it doesn’t exist.

Hungry for more applications of linear algebra to combinatorics and computer science? The book Thirty-Three Miniatures is a fantastically entertaining book of linear algebra gems (it’s where I found the proof in this post). The exposition is lucid, and the chapters are short enough to read on my daily train commute.


Voltage, Temperature, and Harmonic Functions

This is a guest post by my friend and colleague Samantha Davies. Samantha is a math Ph.D student at the University of Washington, and a newly minted math blogger. Go check out her blog, With High Probability.

If I said “let’s talk about temperature and voltage”, you might be interested, but few would react the same if instead I suggested an umbrella term: harmonic functions.

Harmonic functions are often introduced as foreign operators that follow a finicky set of rules, when really they’re extremely natural phenomena that everyone has seen before. Why the disconnect then? A complex analysis class won’t take the time to talk about potential energy storage, and a book on data science won’t discuss fluid dynamics. Emphasis is placed on instances of harmonic functions in one setting, instead of looking more broadly at the whole class of functions.

Screenshot 2016-09-16 09.24.25.png

ams.org ©Disney/Pixar

Understanding harmonic functions on both discrete and continuous structures leads to a better appreciation of any setting where these functions are found, whether that’s animation, heat distribution, a network of political leanings, random walks, certain fluid flows, or electrical circuits.


Harmonic originated as a descriptor for something that’s undergoing harmonic motion. If a spring with an attached mass is stretched or compressed, releasing that spring results in the system moving up and down. The mass undergoes harmonic motion until the system returns to its equilibrium. These repetitive movements exhibited by the mass are known as oscillations. In physics lingo, a simple harmonic motion is a type of oscillation where the force that restores an object to its equilibrium is directly proportional to the displacement.



So, take the mass on a spring as an example: there’s some restoring force which works to bring the spring back to where it would naturally hang still. The system has more potential energy when it’s really stretched out from its equilibrium position, or really smooshed together from its equilibrium position (it’s more difficult to pull a spring really far out or push a spring really close together).

Equations representing harmonic motion have solutions which can be written as functions of sine and cosine, and these became known as harmonics.  Higher dimensional parallels of harmonics satisfy a certain partial differential equation called Laplace’s equation. As mathematicians are so skilled at using the same name for way too many things, today any real valued function with continuous second partial derivatives which satisfies Laplace’s equation is called a continuous harmonic function.

\Delta denotes the Laplace operator, also called the Laplacian. A continuous function u defined on some region \Omega satisfies Laplace’s equation when \Delta u=0, for all points in \Omega. My choice of the word “region” here is technical: being harmonic only makes sense in open, connected, nonempty sets.

The Laplacian is most easily described in terms of Cartesian coordinates, where it’s the sum of all u‘s unmixed second partial derivatives:

\begin{aligned} \Delta u(v)=\sum\limits_{i=1}^n \frac{\partial^2 f}{\partial x_i^2}(v). \end{aligned}

For example, u(x,y)=e^x \sin y is harmonic on all of \mathbb{R}^2 because

\begin{aligned} \Delta u(x,y)= \frac{\partial^2 f}{\partial x^2} \left (e^x \sin y\right ) + \frac{\partial^2 f}{\partial y^2} \left (e^x \sin y \right ) =e^x \sin y -e^x \sin y=0. \end{aligned}

Ironically, \cos(x) and \sin(x) are not harmonic in any region in \mathbb{R}, even though they define harmonic motion. Taking the second derivatives, \frac{d^2}{dx^2} \cos(x)=-\cos(x) and \frac{d^2}{dx^2} \sin(x)=-\sin(x) are only 0 at isolated points. In fact, harmonic functions on \mathbb{R} are exactly the linear functions.

The discrete Laplace operator is an analogue used on discrete structures like grids or graphs. These structures are all basically a set of vertices, where some pairs of vertices have an edge connecting them. A subset of vertices must be designated as the boundary, where the function has specified values called the boundary condition. The rest of the vertices are called interior.

A function u on a graph assigns a value to each interior vertex, while satisfying the boundary condition. There’s also a function \gamma which assigns a value to each edge in a way such that for every vertex, the sum of its outgoing edges’ values is 1 (without loss of generality, otherwise normalize arbitrary nonnegative weights). Keep in mind that the edge weight function is not necessarily symmetric, as it’s not required that \gamma(v,w)=\gamma(w,v). The discrete Laplacian is still denoted \Delta, and the discrete form of Laplace’s equation is described by:

\begin{aligned} (\Delta u)(v) = \sum\limits_{w \in N(v)} \gamma(v,w)\left( u(v)-u(w)\right)=0, \end{aligned}

where N(v), denotes the neighbors of v, which are the vertices connected to v by an edge.

Below is an example of a harmonic function \alpha on a grid which has symmetric edge weights. The function value is specified on the black boundary vertices, while the white interior vertices take on values that satisfy the discrete form of Laplace’s equation.


symmetric edge weights \\ function values

You can easily check that \alpha is harmonic at the corner vertices, but here’s the grosser computation for the 2 most central vertices on the grid (call the left one v_1 and the other v_2):

\begin{aligned} (\Delta \alpha)(v_1) &= \sum\limits_{w \in N(v_1)} \gamma(v_1,w)\left( \alpha(v_1)-\alpha(w)\right)=\frac{1}{4} \left( \frac{24}{55}-0\right)+\frac{1}{4} \left(\frac{24}{55}+4 \right)+\frac{3}{8} \left(\frac{24}{55}-\frac{64}{55}\right)+\frac{1}{8} \left(\frac{24}{55}-8\right )=0,\\ (\Delta \alpha)(v_2) &= \sum\limits_{w \in N(v_2)} \gamma(v_2,w)\left( \alpha(v_2)-\alpha(w)\right)=\frac{1}{8}  \left(\frac{64}{55}-8 \right )+\frac{3}{8} \left(\frac{64}{55}-0\right)+\frac{3}{8} \left(\frac{64}{55}-\frac{24}{55}\right)+\frac{1}{8} \left(\frac{64}{55}-0\right)=0. \end{aligned}

Properties of harmonic functions

The definition of a harmonic function is not enlightening… it’s just an equation. But, harmonic functions satisfy an array of properties that are much more helpful in eliciting intuition about their behavior. These properties also make the functions particularly friendly to work with in mathematical contexts.

Mean value property

The mean value property states that the value of a harmonic function at an interior point is the average of the function’s values around the point. Of course, the definition of “average” depends on the setting. For example, in \mathbb{R} harmonic functions are just linear functions. So given any linear function, say f(x)=3x-2, the value at a point can be found by averaging values on an interval around it:  f(1)= \frac14 \int_{-1}^{3} (3x-2)=1.

In \mathbb{R}^n, if u is harmonic in a region Ω which contains a ball centered around a, then u(a) is the average of u on the surface of the ball. So in the complex plane, or equivalently \mathbb{R}^2, if the circle has radius r_0, then for all 0 <r<r_0:

\begin{aligned} u(a)=\frac{1}{2 \pi} \int\limits_{0}^{2 \pi} u(a+re^{i \theta}) d \theta. \end{aligned}

In the discrete case, a harmonic function’s value on an interior vertex is the average of all the vertex’s neighbors’ values.  If u is harmonic at interior vertex v then the mean value property states that:

\begin{aligned} u(v)=\sum\limits_{w \in N(v)} \gamma(v,w)u(w). \end{aligned}

In the picture below, the black vertices are the boundary while the white vertices are interior, and the harmonic function \alpha has been defined on the graph. Here, \gamma(v,w)=\frac{1}{\deg(v)}, so this is an example with edge weights that are not necessarily symmetric. For instance, \gamma(v,w)=\frac14 while \gamma(w,v)=\frac13.


Foundations of Data Science pg 156

It’s easy to verify \alpha satisfies the mean value property at all interior vertices, but here’s the check for v:

\begin{aligned} \alpha(v)=\sum\limits_{x \in N(v)} \gamma(v,x)\alpha(x)=\frac14 \cdot 5+\frac14 \cdot 5+\frac14 \cdot 5+\frac14 \cdot 1=4. \end{aligned}

For the settings considered here, the mean value property is actually equivalent with being harmonic. If u is a continuous function which satisfies the mean value property on a region \Omega, then u is harmonic in \Omega (proof in appendix). Any function \alpha on a graph which satisfies the mean value property also satisfies the discrete Laplacian; just rearrange the mean value property to see this.

Maximum principle

The mean value property leads directly to the maximum/ minimum principle for harmonic functions. This principle says that a nonconstant harmonic function on a closed and bounded region must attain its maximum and minimum on the boundary. It’s a consequence that’s easy to see for continuous and discrete settings: if a max/min were attained at an interior point, that interior point is also the average of some set of surrounding points. But it’s impossible for an average to be strictly greater than or less than all of its surrounding points unless the function is constant.



Above are plots of x^2+y^2 and x^2-y^2 on the set [-3,3] \times [-3,3]. x^2+y^2 is nowhere harmonic and x^2-y^2 is harmonic everywhere, which means that x^2-y^2 must satisfy the maximum/ minimum principle but there’s no guarantee that x^2+y^2 does. Looking at the graph above, x^2+y^2  doesn’t satisfy the the minimum principle because the minimum of x^2+y^2 on the domain is 0, which is achieved at the interior point (0,0). On the same set, the maximum of x^2-y^2 is 9, which is achieved at boundary points (3,0) and (-3,0), and the minimum of x^2-y^2 is -9, which is achieved at boundary points (0,3) and (0,-3).


If a harmonic function is continuous on the boundary of a closed and bounded set and harmonic in the interior, then the values of the interior are uniquely determined by the values of the boundary. The proof of this is identical for the discrete and continuous case: if u_1 and u_2 were two such functions described above with the same boundary conditions, then u_1-u_2 is harmonic and its boundary values are 0. By the max/min principle, all the interior values are also 0, so u_1 = u_2 on the whole set.

Solution to a Dirichlet problem

Dirichlet problem is the problem of finding a function which solves some partial differential equation through out a region given its values on the region’s  boundary. When considering harmonic functions in the discrete or continuous setting, the PDE to be solved is the appropriate form of Laplace’s equation.

So, given a region and a set of values to take on the boundary, does there exist a function which takes the designated values on the boundary and is harmonic in the interior of the region? The answer in the continuous setting is yes, so long as the boundary condition is sufficiently smooth. In the discrete setting, it’s much more complicated, especially on networks with infinitely many vertices. For most practical instances though, like grids or finite graphs with edge weight function \gamma(v,w)=\frac{1}{\deg(v)}, the answer is still yes.


I’ll go into detail about a harmonic function on a continuous structure, steady state temperature, and on a discrete structure, voltage. These phenomena are already familiar to us, but weaving previous knowledge with a new appreciation for harmonic functions is crucial to understanding how they, and other natural instances of harmonic functions, behave.


Suppose you have an insulated object, like a wire, plate, or cube, and apply some temperature on its boundary. The temperature on the boundary will affect the temperature of the interior, and eventually these interior temperatures stabilize to some distribution which doesn’t change as time continues. This is known as the steady state temperature distribution.

A temperature function u of spatial variables x_i and time variable t, satisfies the heat equation

\begin{aligned} \frac{\partial u}{\partial t}-\alpha \Delta u=0, \end{aligned}

where \alpha>0 is a constant which represents the material’s ability to store and conduct thermal energy (these modules for complex analysis have a nice derivation of the heat equation ). The requirement that the temperature distribution is steady state yields the necessary condition that \frac{\partial u}{\partial t}=0 for all t,x_i. So, functions solving the heat equation must satisfy \Delta u=0, which means they are harmonic.

Intuitively, this is no surprise because any continuous function which satisfies the mean value property is harmonic. Still, temperature is a great example because it creates a familiar visual for a continuous harmonic function. Below is a picture of a steady state temperature distribution on a square metal plate of length \pi millimeters, where the bottom edge has temperature set to \pi ℃ and all other edges are set to 0℃.



Since in continuous settings the Dirichlet problem always has a unique solution for harmonic functions, there’s a unique temperature function which takes the designated values on the boundary and is harmonic in the interior of the region. That function is:

\begin{aligned} u(x,y)=\sum\limits_{n=0}^{\infty} \frac{4}{2n+1} \sin((2n+1) x) \left (\frac{\sinh((2n+1)(\pi-y))}{\sinh((2n+1) \pi)} \right ). \end{aligned}

(Here’s the derivation of that equation, but it’s too much Diff EQs for here). So the value of u at any point in the interior, like (1.5,0.5), can be calculated:

 \begin{aligned} u(1.5,0.5) = \sum\limits_{n=0}^{\infty} \frac{4}{2n+1} \sin((2n+1) 1.5) \left (\frac{\sinh((2n+1)(\pi-0.5))}{\sinh((2n+1) \pi)} \right ) \approx 2.17. \end{aligned}


An electrical network can easily be abstracted to a graph. Any point in the network where circuit elements meet is a vertex, and the wires (or other channels) are edges. Each wire has some amount of conductance. As the name suggests, conductance is a measure for how easy it is for electric charge to pass through the wire.

Edges in the graph are assigned nonnegative values representing the wires’ conductance relative to the total conductance emanating from the vertex. These values are the edge weights in the graph, and again they’re not necessarily symmetric. If c_{v,w} is the conductance of the edge between v and w and c_v=\sum_{x:N(v)}c_{v,x} is the total conductance emanating from v then, \gamma(v,w)=\frac{c_{v,w}}{c_v} while \gamma(w,v)=\frac{c_{v,w}}{c_w}.

The ground node in a network is an arbitrary node in the circuit picked to have an electrical potential energy per charge of 0 volts. So, the voltage at a node is the difference in electrical potential energy per charge between that node and the ground node.

The picture below shows an electrical network after the positive and negative terminals of a battery have been attached to two nodes. In this network, the conductance for all the wires is the same, so \gamma(v,w)=\frac{1}{\deg(v)} for all edges (v,w). When the positive and negative terminals of a battery are attached to two nodes in a circuit, a flow of electric charge, called current, is created and carried by electrons or ions.

The vertices representing the nodes attached to the battery are the boundary vertices, and here the node with the negative terminal attached to it as the ground node. After the battery is attached, voltage is induced on the rest of the nodes. The voltage taken on by each node minimizes the total energy lost as heat in the system.


source: Simons Foundation

The reader can quickly double check that the function is harmonic at all the interior points, or equivalently showing that the mean value property holds. After a couple extra facts about current it’s easy to show that the voltage at a node is always the average of its neighbors’ voltages, making voltage a harmonic function (proof in appendix). 

Wrap it up

At the end of the day, understanding harmonic functions makes you a better scientist (or better animation artist, apparently). If you want to understand phenomena like potential energy, data modeling, fluid flow, etc, then you’ve got to understand the math that controls it! Check out the appendix for proofs and extra facts that fell somewhere in the intersection of “things I didn’t think you’d care about” and “things I felt guilty leaving out”.


There are tons of references with more information about discrete harmonic functions, but I’m specifically interesting in those relating random walks and electrical networks. Here is a book whose 5th chapter presents some introductory material. Also this, from the Simons Foundation, and that, a paper from Benjamini and Lovász, are sources which dive deeper. The piece from the Simons Foundation further explains the relation between harmonic functions and minimizing the loss of energy in a network, like the voltage example with heat.

Again, here is an article about harmonic functions in animation.

These complex analysis modules, have sections about temperature distribution and fluid flow. They’re a pretty good tool for all your complex analysis needs, including some sections on harmonic functions in complex analysis.

Harmonic \Longleftrightarrow mean value property

This is obvious for the discrete case, as the discrete Laplacian can be rewritten to show that the function’s value at a point is just the average of its neighbors. Given a continuous function, u, which satisfies the mean value property throughout a disk, there exists (by Poisson’s formula) a function, v, harmonic in the same disk and equal to u on the boundary. By uniqueness, u=v through out the disk and u is harmonic in the disk. For anyone concerned about a region which isn’t a disk, the general fact is that the Dirichlet problem for harmonic function always has a solution.

Proof that voltage is a harmonic function

Voltage satisfies the mean value property and is therefore harmonic. As before c_{u,v} is the conductance of the edge between u and v and c_u is the total conductance emanating from u, c_u =\sum_{x:N(u)}c_{u,x} . Let p_{u,v}=\frac{c_{uv,}}{c_u} be the edge value on the edge from u to v in the graph.

Let i_{u,x} denote the current flowing through node u to node x, v_{u} denote the voltage at node u, c_{u,x} denote the conductance of the edge between u and x, and r_{u,x}=\frac{1}{c_{u,x}} denote the resistance of the edge between u and x. Ohm’s law relates current, resistance, and voltage by stating that

\begin{aligned} i_{u,x} = \frac{v_u-v_x}{r_{u,x}}=\left (v_u-v_x \right )c_{u,x} \end{aligned}.

Kirchoff’s law states that the total current from a node is 0, so for any u in the network

\begin{aligned} \sum\limits_{x} i_{u,x}=0. \end{aligned}

Now, we’re ready to show that voltage is a harmonic function.

\begin{aligned} \sum\limits_{x} i_{u,x}=0 \qquad &\Longleftrightarrow \qquad \sum\limits_{x} \left ( v_u-v_x \right ) c_{u,x}=0 \\ \Longleftrightarrow \qquad  \sum\limits_{x}  v_u c_{u,x}= \sum\limits_{x} v_x  c_{u,x} \qquad &\Longleftrightarrow \qquad v_u c_u=\sum\limits_{x} v_x p_{u,x}c_{u}\\ &\Longleftrightarrow \qquad v_u=\sum\limits_{x} v_x p_{u,x}. \end{aligned}

Guest post, “What’s up with graph Laplacians?”

I wrote a guest post for my friend Samantha Davies’s blog, With High Probability. It’s called, What’s up with graph Laplacians?

Go check it out!

Zero-Knowledge: Definitions and Theory

The next Monday, when the fathers were all back at work, we kids were playing in a field. One kid says to me, “See that bird? What kind of bird is that?” I said, “I haven’t the slightest idea what kind of a bird it is.” He says, “It’s a brown-throated thrush. Your father doesn’t teach you anything!” But it was the opposite. He had already taught me: “See that bird?” he says. “It’s a Spencer’s warbler.” (I knew he didn’t know the real name.) “Well, in Italian, it’s a Chutto Lapittida. In Portuguese, it’s a Bom da Peida. In Chinese, it’s a Chung-long-tah, and in Japanese, it’s a Katano Tekeda. You can know the name of that bird in all the languages of the world, but when you’re finished, you’ll know absolutely nothing whatever about the bird. You’ll only know about humans in different places, and what they call the bird. So let’s look at the bird and see what it’s doing—that’s what counts.” I learned very early the difference between knowing the name of something and knowing something.

In the first post in this series, we defined and implemented a simple zero-knowledge proof for graph isomorphism. In the second post, we saw a different protocol with a much heftier tagline: it gives a zero knowledge proof for any problem in the class NP. These two protocols used the same underlying framework—an interaction between a prover and a verifier—but they were actually very different.

Indeed, the graph isomorphism proof was “perfect” in two senses. First, it didn’t require any assumptions about cryptography. Nobody knows how to prove the Blum-Blum-Shub function is actually a one-way permutation (at the least, this would imply \textup{P} \neq \textup{NP}, so it’s probably hard to prove).

Second, in the graph isomorphism proof, the simulator produced transcripts of the protocol that came from the exact same distribution as the true transcripts created by the prover and verifier. This is why we called it zero-knowledge; anything the verifier can do with the output of the protocol, the simulator could do too. The verifier can’t be making use of the prover’s secret knowledge, since that information isn’t even in the simulator’s universe but the simulator can still compute what the verifier can.

But I didn’t tell you precisely why the 3-coloring protocol is a zero-knowledge proof, and in particular why it’s different from the graph isomorphism protocol. I also hinted that I had been misleading you, dear reader, as to the full 3-coloring proof of zero-knowledge. So in this post we’ll get into those nitty-gritty definitions that make the math rigorous. We’ll give a short sketch of the proof of zero-knowledge (the full proof would take many pages, not because it’s hard but because there are a lot of annoying details). And then we’ll give an overview of the landscape of theorems and conjectures about zero knowledge.

Information vs. knowledge

You can’t understand where the following definitions come from without the crucial distinction between information and knowledge from the computer scientist’s perspective. Information concerns how many essential bits are encoded in a message, and nothing more. In particular, information is not the same as computational complexity, the required amount of computational resources required to actually do something. Knowledge, on the other hand, refers to the computational abilities you gain with the information provided.

Here’s an example in layman’s terms: say I give you a zero-knowledge proof that cancer can be cured using a treatment that takes only five days. Even though I might thoroughly convince you my cure works by exhibiting patients with vanishing tumors, you’ll still struggle to find a cure. This is despite the fact that there might be more bits of information relayed in the messages sent during my “zero-knowledge proof” than the number of bits needed to describe the cure! On the other hand, every proof that 1+1=2 is a zero-knowledge proof, because it’s not computationally difficult to prove this on your own in the first place. You don’t gain any new computational powers even if I tell you flat out what the proof is.

Knowledge is Richard Feynman’s worldview, information is knowing the many names for a bird.

From this perspective information theory is abstruse, though it’s much easier to prove theoretical limits on computation in terms of information theory than in terms of computational complexity. However, the theoretical limits of computation are lower that the limits of information theory, which is the essence of what enables cryptography (from an information theory standpoint, all public-key cryptography is broken!). Reframing what’s possible in terms of computation opens a treasure chest of useful algorithms.

From here we’ll explore three definitions of zero-knowledge. First, we’ll see “perfect” zero knowledge, which is believed to be too strong to be practical. We’ll give a sketch of the “right” proof that graph isomorphism has a perfect zero-knowledge proof. Then we’ll see two relaxations to “statistical” and “computational” zero knowledge. We’ll discuss their theoretical relationships to each other and the rest of the world from a high level.

Perfect zero-knowledge

Recall from our first post, the interactive protocol for graph isomorphism has a very nice property. Say you took all the messages sent back and forth between the prover and verifier, and say you think of those messages as a random outcome of a probabilistic event. Then a simulator could have produced a set of messages drawn from the same distribution, without needing to see the messages from the protocol in the first place! So anything that the verifier could compute as a result of participating in the protocol, the simulator could compute without needing the protocol in the first place. The simulator just needs to assume the truth of the claim. This type of zero-knowledge is called perfect zero-knowledge.

Let’s distill this into some notation. In order to do this, we’ll shift to the theoretical way of discussing a “problem” as a language, i.e., a set of strings. For example, for 3-coloring you fix a method for describing graphs as strings (it doesn’t really matter which method you use), and the language is the set of strings

\displaystyle \{ G : G \textup{ has a valid 3-coloring } \}

Membership in the set is the same as saying an instance of the problem “does G have a 3-coloring?” has a yes answer. Throughout the entire discussion we will fix a generic way to encode any object as a binary string, and we’ll implicitly equate G with the string representing G, and call them both G. In this post I’ll always use L for a language.

Now say that P is an algorithm for the prover (a probabilistic Turing machine of arbitrary complexity), V is the analogous polynomial-time algorithm for the verifier, and M(P,V) is the resulting concatenated string of messages from one run of a zero-knowledge proof. Because P and V may be randomized, M(P,V) is a distribution over strings \{ 0, 1\}^m, and one run of the protocol is equivalent to drawing y \sim \{0,1\}^m according to this distribution. This just means: run an instance of the the protocol, which is random, and encode the output messages as a single string.

Now define a simulator as any probabilistic polynomial-time turing machine S. We interpret the simulator as trying to reproduce the output of the protocol without having access to the protocol’s messages, but with computational limits. The prover, verifier, and simulator all take as input the claim x of length n, and the verifier and simulator run in time polynomial in n. Reminder: in order to be an interactive proof for the language L, if x \in L then the verifier must be convinced with probability 1, and if x \not \in L then the verifier must reject with probability at least 1/2.

Definition (not quite correct): An interactive proof M(P,V) for a language L is said to have perfect zero-knowledge if some simulator S exists such that for every input x \in L, the distributions of the random variables M(P,V)(x) and S(x) are equal.

This definition isn’t quite correct, but it’s correct in spirit so that’s why we used it in our first post. Now let’s see a bit more detail about what really makes a zero-knowledge proof. Let’s remind ourselves of the argument for graph isomorphism, and then see why it’s not strong enough. The protocol we described is summed up as: prover has G_1, G_2, and sends a random permutation of G_1 to the verifier, calls it H. The verifier picks a random integer k which is either 1 or 2, sends it the prover. The prover produces an isomorphism between G_k and H, and sends the isomorphism back to the verifier, who checks to make sure the isomorphism is valid and accepts (otherwise, rejects).

In our argument, we said the simulator could simulate messages by picking k for the verifier before choosing the random permutation H, and this is the nit we need to pick. The problem is that the prover doesn’t trust the verifier, but the simulator does trust the verifier, and performs actions for the verifier that are faithful to the protocol!

What if a naughty verifier were able to glean more information by misbehaving in the protocol? Say, instead of randomly choosing k as 1 or 2, the verifier might always just pick k=1. Could such a misbehavior give the verifier more knowledge than strictly following the protocol?

Sure, maybe if the verifier did that, he wouldn’t end up with a valid proof (his acceptances and rejections may not be valid or sound), but it might be worth it if he can get a small bit of information about the hidden isomorphism. The goal is to treat the verifier like a hacker, but our simulator only simulates honest verifiers.

So a real zero-knowledge proof needs to account for a misbehaving verifier. One way to do this is to give the simulator black-box access to the verifier V, and prove that the distribution of messages produced by the simulator S(V, x) is equal to the true distribution of messages produced in the actual protocol with the same (possibly misbehaving) verifier, M(P,V)(x). Note that S can still know about the protocol that V is supposed to follow, but he can’t know exactly how V is defined.

Unfortunately, there is no known zero-knowledge proof for a nontrivial problem that satisfies this stronger criterion for perfect zero knowledge, and the graph isomorphism proof in particular doesn’t satisfy it. The way to fix it, which I’ll just say informally, is to allow the simulator to fail with probability at most 1/2. So to summarize, the simulator gets black-box access to the verifier and is guaranteed an input x \in L, and has to sample from the exact distribution of messages as the real protocol, but the simulator is allowed to say “KAPUT!” at most half the time, when it thinks the output will be bad. The distribution of simulator outputs (conditioned on no-KAPUT) should be equal to the true distribution of outputs.

Now we can sketch the real graph isomorphism proof. The simulator is defined as follows:

  1. Pick i \in \{ 1, 2 \} randomly and let H be a random permutation of G_i.
  2. Invoke the verifier V(H) as a subroutine with input H, and call the output k, which is either 1 or 2.
  3. If k \neq i, output “KAPUT!”, otherwise invoke V as a subroutine with the isomorphism H \to G_i as input.

Now to show the message distributions are equal. The message sent in step 2 is produced with a different process from the actual zero knowledge protocol (since the prover just takes a random isomorphic copy of G_1), but since we know G_1, G_2 are isomorphic, the distribution over H is the same in both cases.

Now when we invoke the verifier subroutine, we can’t know what the verifier will choose. But the point is that regardless of what the verifier does, there’s still a 50% chance it will pick k=i since we chose i randomly and independently of the definition of V.

If you ignore that we need to prove a few probability-theory facts (like that the choice of H really is uniformly random), then this completes the proof.


Statistical and computational zero-knowledge

While perfect zero-knowledge is the strongest form of zero-knowledge, it’s believed to be too strong. A more tractable definition is called statistical zero-knowledge, and for this we relax the requirement that the message distributions are equal to the requirement that they’re “very similar.” Informally, by “very similar” I mean that if the output strings have length m, then the sum of the differences of the probabilities is “basically” exponentially small in m.

More formally, if D_1, D_2 are two distributions over \{0,1\}^m, then we say they’re statistically close if the following distance function vanishes (asymptotically in m) faster than 1/m^c for any c > 0.

\displaystyle d(D_1, D_2) = \sum_{x \in \{0,1\}^m} \left | \Pr_{y \sim D_1}[y=x] - \Pr_{y \sim D_2}[y=x] \right |

This distance function just sums, for every outcome, the difference between the probabilities of those outcomes. In other words, being statistically close means that the two distributions can’t disagree on any particular input too much, nor can they disagree on most the outputs more than a tiny bit.

In little-o notation, statistically close means d(D_1, D_2) = o(1/m^c) for every constant c > 0. It could be like 2^{-m} or something vanishing much slower like 1/m^{\log \log m}. A function of m which has this property is called negligible, and it’s the standard choice of a security guarantee for cryptography.

Definition: Using the same notation as perfect zero-knowledge, an interactive proof is said to have statistical zero-knowledge if there is a simulator such that for every input x \in L, the distributions of M(P,V)(x) and S(x) are statistically close.

Interestingly, for this definition and the next (computational zero-knowledge), giving the simulator the power to output KAPUT with probability 1/2 doesn’t add any power. It turns out that any KAPUTful simulator can be transformed into a KAPUTless simulator with polynomial overhead. A sketch of how this works: run the simulator until either it does not KAPUT, or else you’ve tried n^2 times with all KAPUTs. The latter event happens with exponentially small probability, so in that case the KAPUTless simulator outputs a uniformly random string (it’s a “don’t care”). This maintains statistical closeness (and, as we’ll see, computational indistinguishability).

Finally, the weakest form of zero-knowledge isn’t so much a property of the two message distributions, but a property of a computationally limited algorithm trying to distinguish between them. We model this by having a probabilistic polynomial-time algorithm A which accepts as input the ability to draw from a distribution, and A produces as output a single bit. We interpret this bit as A‘s claim that the two distributions are “different.A‘s goal is to answer correctly. If A can do that with non-negligible probability, then A wins and “distinguishes” between the two input distributions. So we’ll call A the distinguisher.  Note that A can do things like draw a \textup{poly}(m) number of samples and do any polynomial-time computation with those samples.

Definition: An interactive proof M(P,V) is said to have computational zero-knowledge if there is a simulator S such that for every input x, and for every probabilistic polynomial-time algorithm A, the following probability is negligible.

\displaystyle \Pr[A(M(P,V)(x)) \neq A(S(x))]

In words, A fails to distinguish between M(P,V)(x) and S(x) with a significant (non-negligible) edge over random guessing. Another way to say it is that the distributions A(M(P,V)(x)) and A(S(x)) are statistically close. So the simulator can’t necessarily produce a statistically close distribution on messages, but the simulator can produce a distribution that fools a computationally limited observer.

If general, two distributions D_1, D_2 are called computationally indistinguishable if they don’t have a distinguisher. I.e., no polynomial-time adversary can distinguish between them with non-negligible probability, in the same way we described above.

So our landscape consists of three classes of problems nested as follows:

Perfect ZK \subset statistical ZK \subset Computational ZK

An interesting question we’ll discuss by towards the end of this post is whether these classes of problems are actually different.

Back to 3-coloring

This computational distinguishing property shows up all over cryptography. The reason is simple: almost all cryptographic hardness assumptions can be rephrased as the computational indistinguishability properties. The hardness of one-way functions, predicting pseudorandom generators, all of it is the inability of a polynomial-time adversary to solve a problem.

From this perspective, we can already see why it should be obvious that the zero-knowledge proof for 3-coloring has computational zero knowledge: we used crypto to commit to a coloring, and revealed the secret key later. If the verifier could break the zero-knowledge aspect, then they could defeat the underlying cryptographic primitive. The formal proof of computational zero-knowledge is a drawn-out, detail-oriented quest to entrench this idea in formalism and fit the simulator definition.

Nevertheless, it’s interesting to see where precisely the assumption makes its way into the simulator. Reminder of the protocol:

  1. The prover has a 3-coloring, and publicly commits to a random permutation of that coloring, sends it to the verifier.
  2. The verifier picks an edge uniformly at random, sends it to the prover.
  3. The prover reveals the colors for that edge, sends them to the verifier.
  4. The verifier checks that the revealed colors were indeed the ones committed to, and accepts if they are different colors.

Here’s the graph 3-coloring simulator, which involves some KAPUTing:

  1. Pick n colors c_1, \dots, c_n \in \{ 1,2,3 \} uniformly at random. Commit to these as the “coloring.”
  2. Invoke the verifier with the committed “coloring” as input, and receive an edge (i,j) as output.
  3. If c_i = c_j, KAPUT, else, run the verifier as a subroutine with the true c_i, c_j.

Note that if an honest verifier picks the edge randomly, then it will be properly colored with probability 2/3, and the simulator wins.

A dishonest verifier is trickier, because it gets access to the entire committed coloring. A priori there might be a devious way to select a bad edge. So let’s suppose for the sake of contradiction that the verifier has a method for choosing an improperly colored edge. While it takes a few steps to get here (and a lot more detail), in the end this means that the verifier can determine, with probability better at least \frac{1}{3} + \frac{1}{n^c} for some constant c, the difference between a commitment of the bit zero and a commitment of the bit one.

In other words, we can use a verifier that breaks zero-knowledge of this protocol as a bit-commitment-scheme breaker! But the verifier is a probabilistic polynomial-time algorithm, which contradicts the security assumption of a bit-commitment scheme.

Actually, I don’t think I ever formally said what the security assumption for a bit commitment scheme is, since in the previous posts I hadn’t provided the definition of computational indistinguishability. So here it is: a bit-commitment scheme, which is defined by a one-way permutation G with a hard-core predicate f, maps b \mapsto (G(x), f(x) \oplus b) and has the property that the distributions committing to zero and one are computationally indistinguishable. That is, the following two distributions:

\textup{commit}_n(0) = \{ (G(x), f(x) \oplus 0): x \in \{ 0,1 \}^n \}

\textup{commit}_n(1) = \{ (G(x), f(x) \oplus 1): x \in \{ 0,1 \}^n \}

So the verifier can pick a bad edge with probability as most 1/3 + \textup{negligible}, which is far less than 1/2. Then you can use the trick we showed earlier to get a good enough simulator that never KAPUTs. And that proves computational zero-knowledge.

Theorems and limitations

Here’s an interesting theorem:

Theorem: Any zero-knowledge protocol which doesn’t use randomization (on both sides) can be solved in randomized polynomial time (is in BPP).

So the random choices made by the prover/verifier are actually essential to the novelty of this concept.

I haven’t yet said anything about statistical zero-knowledge beyond the definition. It turns out there are a lot of interesting problems with statistical zero-knowledge proofs. I should really expand the discussion of statistical zero knowledge into its own post, but I want to wrap up this series and explore some other projects.

One such problem is pretty meta: the problem of telling whether two distributions are statistically close. In fact, this problem is complete for the class of problems with statistical zero-knowledge proofs, in the same sense that 3-coloring is NP-complete; any problem which has a statistical zero-knowledge proof can be reduced to a question about statistical closeness of two specific distributions. The class of problems with statistical zero-knowledge proofs is usually shortened to SZK (along with PZK for perfect zero-knowledge and CZK for computational zero-knowledge).

In addition to being interesting from a theoretical perspective, it’s a very practical question to study: if you can sample from a distribution efficiently, which properties of that distribution can you estimate efficiently? The completeness result says that the inherent complexity of this class of problems is captured by statistical zero-knowledge proofs.

As a theoretical object of study, SZK has massive advantages over perfect/computational zero-knowledge. In particular,

Theorem: Every interactive proof that has statistical zero-knowledge against an honest verifier can be transformed into a statistical zero-knowledge proof against an arbitrary adversary.

So the entire “gotcha” content of this post, which we had to worry about with perfect zero-knowledge, doesn’t exist for statistical zero-knowledge. It still exists for computational zero-knowledge, as far as I know, because it’s not know that every computational zero-knowledge proof also has statistical zero-knowledge (a stronger property).

On another front, people are relatively convinced that perfect zero-knowledge is too restrictive to be useful. This comes from the following theorem due to Fortnow

Theorem: If an NP-complete problem has a statistical zero-knowledge proof, then the polynomial hierarchy collapses.

Without going into detail about what the polynomial hierarchy is, suffice it to say that most people believe it’s unlikely. And because this means that SZK probably does not contain all of NP, we can ask natural followup questions like, can problems in SZK be solved by polynomial-time quantum algorithms? It’s also open whether these can solve NP-complete problems.

Statistical zero-knowledge can be specialized even further.

Theorem: Every problem with a statistical zero-knowledge proof has a statistical zero-knowledge proof in which the only messages the verifier sends are random coin flips.

This is a so-called “public coin” protocol, which makes zero-knowledge proofs much easier to reason about.

For further reading, see this introduction by Salil Vadhan, arguably the world’s leading expert on the topic. Particularly interesting is sections 3-5 in which Vadhan describes more detail about how zero-knowledge has been used as an “interface” for techniques to pass between complexity theory and cryptography.

For further reading with on the excruciating details of the definitions and proofs presented in this series, see Oded Goldreich’s Foundations of Cryptography (Volume I, Chapter 4). I have mixed feelings about this book because, despite the fact that I’ve found it enlightening, the book somehow manages to be simultaneously dense and scattered. It leaves no snag unsmoothed, for better or worse.

Until next time!