Featured Posts

Making Hybrid Images Neural Networks
and Backpropagation
Elliptic Curves
and Cryptography
bezier-dog-image-small circle-wedge-sphere baby_programmers
Bezier Curves and Picasso Computing Homology Probably Approximately
Correct – A Formal Theory
of Learning

Voltage, Temperature, and Harmonic Functions

This is a guest post by my friend and colleague Samantha Davies. Samantha is a math Ph.D student at the University of Washington, and a newly minted math blogger. Go check out her blog, With High Probability.

If I said “let’s talk about temperature and voltage”, you might be interested, but few would react the same if instead I suggested an umbrella term: harmonic functions.

Harmonic functions are often introduced as foreign operators that follow a finicky set of rules, when really they’re extremely natural phenomena that everyone has seen before. Why the disconnect then? A complex analysis class won’t take the time to talk about potential energy storage, and a book on data science won’t discuss fluid dynamics. Emphasis is placed on instances of harmonic functions in one setting, instead of looking more broadly at the whole class of functions.

Screenshot 2016-09-16 09.24.25.png

ams.org ©Disney/Pixar

Understanding harmonic functions on both discrete and continuous structures leads to a better appreciation of any setting where these functions are found, whether that’s animation, heat distribution, a network of political leanings, random walks, certain fluid flows, or electrical circuits.


Harmonic originated as a descriptor for something that’s undergoing harmonic motion. If a spring with an attached mass is stretched or compressed, releasing that spring results in the system moving up and down. The mass undergoes harmonic motion until the system returns to its equilibrium. These repetitive movements exhibited by the mass are known as oscillations. In physics lingo, a simple harmonic motion is a type of oscillation where the force that restores an object to its equilibrium is directly proportional to the displacement.



So, take the mass on a spring as an example: there’s some restoring force which works to bring the spring back to where it would naturally hang still. The system has more potential energy when it’s really stretched out from its equilibrium position, or really smooshed together from its equilibrium position (it’s more difficult to pull a spring really far out or push a spring really close together).

Equations representing harmonic motion have solutions which can be written as functions of sine and cosine, and these became known as harmonics.  Higher dimensional parallels of harmonics satisfy a certain partial differential equation called Laplace’s equation. As mathematicians are so skilled at using the same name for way too many things, today any real valued function with continuous second partial derivatives which satisfies Laplace’s equation is called a continuous harmonic function.

\Delta denotes the Laplace operator, also called the Laplacian. A continuous function u defined on some region \Omega satisfies Laplace’s equation when \Delta u=0, for all points in \Omega. My choice of the word “region” here is technical: being harmonic only makes sense in open, connected, nonempty sets.

The Laplacian is most easily described in terms of Cartesian coordinates, where it’s the sum of all u‘s unmixed second partial derivatives:

\begin{aligned} \Delta u(v)=\sum\limits_{i=1}^n \frac{\partial^2 f}{\partial x_i^2}(v). \end{aligned}

For example, u(x,y)=e^x \sin y is harmonic on all of \mathbb{R}^2 because

\begin{aligned} \Delta u(x,y)= \frac{\partial^2 f}{\partial x^2} \left (e^x \sin y\right ) + \frac{\partial^2 f}{\partial y^2} \left (e^x \sin y \right ) =e^x \sin y -e^x \sin y=0. \end{aligned}

Ironically, \cos(x) and \sin(x) are not harmonic in any region in \mathbb{R}, even though they define harmonic motion. Taking the second derivatives, \frac{d^2}{dx^2} \cos(x)=-\cos(x) and \frac{d^2}{dx^2} \sin(x)=-\sin(x) are only 0 at isolated points. In fact, harmonic functions on \mathbb{R} are exactly the linear functions.

The discrete Laplace operator is an analogue used on discrete structures like grids or graphs. These structures are all basically a set of vertices, where some pairs of vertices have an edge connecting them. A subset of vertices must be designated as the boundary, where the function has specified values called the boundary condition. The rest of the vertices are called interior.

A function u on a graph assigns a value to each interior vertex, while satisfying the boundary condition. There’s also a function \gamma which assigns a value to each edge in a way such that for every vertex, the sum of its outgoing edges’ values is 1 (without loss of generality, otherwise normalize arbitrary nonnegative weights). Keep in mind that the edge weight function is not necessarily symmetric, as it’s not required that \gamma(v,w)=\gamma(w,v). The discrete Laplacian is still denoted \Delta, and the discrete form of Laplace’s equation is described by:

\begin{aligned} (\Delta u)(v) = \sum\limits_{w \in N(v)} \gamma(v,w)\left( u(v)-u(w)\right)=0, \end{aligned}

where N(v), denotes the neighbors of v, which are the vertices connected to v by an edge.

Below is an example of a harmonic function \alpha on a grid which has symmetric edge weights. The function value is specified on the black boundary vertices, while the white interior vertices take on values that satisfy the discrete form of Laplace’s equation.


symmetric edge weights \\ function values

You can easily check that \alpha is harmonic at the corner vertices, but here’s the grosser computation for the 2 most central vertices on the grid (call the left one v_1 and the other v_2):

\begin{aligned} (\Delta \alpha)(v_1) &= \sum\limits_{w \in N(v_1)} \gamma(v_1,w)\left( \alpha(v_1)-\alpha(w)\right)=\frac{1}{4} \left( \frac{24}{55}-0\right)+\frac{1}{4} \left(\frac{24}{55}+4 \right)+\frac{3}{8} \left(\frac{24}{55}-\frac{64}{55}\right)+\frac{1}{8} \left(\frac{24}{55}-8\right )=0,\\ (\Delta \alpha)(v_2) &= \sum\limits_{w \in N(v_2)} \gamma(v_2,w)\left( \alpha(v_2)-\alpha(w)\right)=\frac{1}{8}  \left(\frac{64}{55}-8 \right )+\frac{3}{8} \left(\frac{64}{55}-0\right)+\frac{3}{8} \left(\frac{64}{55}-\frac{24}{55}\right)+\frac{1}{8} \left(\frac{64}{55}-0\right)=0. \end{aligned}

Properties of harmonic functions

The definition of a harmonic function is not enlightening… it’s just an equation. But, harmonic functions satisfy an array of properties that are much more helpful in eliciting intuition about their behavior. These properties also make the functions particularly friendly to work with in mathematical contexts.

Mean value property

The mean value property states that the value of a harmonic function at an interior point is the average of the function’s values around the point. Of course, the definition of “average” depends on the setting. For example, in \mathbb{R} harmonic functions are just linear functions. So given any linear function, say f(x)=3x-2, the value at a point can be found by averaging values on an interval around it:  f(1)= \frac14 \int_{-1}^{3} (3x-2)=1.

In \mathbb{R}^n, if u is harmonic in a region Ω which contains a ball centered around a, then u(a) is the average of u on the surface of the ball. So in the complex plane, or equivalently \mathbb{R}^2, if the circle has radius r_0, then for all 0 <r<r_0:

\begin{aligned} u(a)=\frac{1}{2 \pi} \int\limits_{0}^{2 \pi} u(a+re^{i \theta}) d \theta. \end{aligned}

In the discrete case, a harmonic function’s value on an interior vertex is the average of all the vertex’s neighbors’ values.  If u is harmonic at interior vertex v then the mean value property states that:

\begin{aligned} u(v)=\sum\limits_{w \in N(v)} \gamma(v,w)u(w). \end{aligned}

In the picture below, the black vertices are the boundary while the white vertices are interior, and the harmonic function \alpha has been defined on the graph. Here, \gamma(v,w)=\frac{1}{\deg(v)}, so this is an example with edge weights that are not necessarily symmetric. For instance, \gamma(v,w)=\frac14 while \gamma(w,v)=\frac13.


Foundations of Data Science pg 156

It’s easy to verify \alpha satisfies the mean value property at all interior vertices, but here’s the check for v:

\begin{aligned} \alpha(v)=\sum\limits_{x \in N(v)} \gamma(v,x)\alpha(x)=\frac14 \cdot 5+\frac14 \cdot 5+\frac14 \cdot 5+\frac14 \cdot 1=4. \end{aligned}

For the settings considered here, the mean value property is actually equivalent with being harmonic. If u is a continuous function which satisfies the mean value property on a region \Omega, then u is harmonic in \Omega (proof in appendix). Any function \alpha on a graph which satisfies the mean value property also satisfies the discrete Laplacian; just rearrange the mean value property to see this.

Maximum principle

The mean value property leads directly to the maximum/ minimum principle for harmonic functions. This principle says that a nonconstant harmonic function on a closed and bounded region must attain its maximum and minimum on the boundary. It’s a consequence that’s easy to see for continuous and discrete settings: if a max/min were attained at an interior point, that interior point is also the average of some set of surrounding points. But it’s impossible for an average to be strictly greater than or less than all of its surrounding points unless the function is constant.



Above are plots of x^2+y^2 and x^2-y^2 on the set [-3,3] \times [-3,3]. x^2+y^2 is nowhere harmonic and x^2-y^2 is harmonic everywhere, which means that x^2-y^2 must satisfy the maximum/ minimum principle but there’s no guarantee that x^2+y^2 does. Looking at the graph above, x^2+y^2  doesn’t satisfy the the minimum principle because the minimum of x^2+y^2 on the domain is 0, which is achieved at the interior point (0,0). On the same set, the maximum of x^2-y^2 is 9, which is achieved at boundary points (3,0) and (-3,0), and the minimum of x^2-y^2 is -9, which is achieved at boundary points (0,3) and (0,-3).


If a harmonic function is continuous on the boundary of a closed and bounded set and harmonic in the interior, then the values of the interior are uniquely determined by the values of the boundary. The proof of this is identical for the discrete and continuous case: if u_1 and u_2 were two such functions described above with the same boundary conditions, then u_1-u_2 is harmonic and its boundary values are 0. By the max/min principle, all the interior values are also 0, so u_1 = u_2 on the whole set.

Solution to a Dirichlet problem

Dirichlet problem is the problem of finding a function which solves some partial differential equation through out a region given its values on the region’s  boundary. When considering harmonic functions in the discrete or continuous setting, the PDE to be solved is the appropriate form of Laplace’s equation.

So, given a region and a set of values to take on the boundary, does there exist a function which takes the designated values on the boundary and is harmonic in the interior of the region? The answer in the continuous setting is yes, so long as the boundary condition is sufficiently smooth. In the discrete setting, it’s much more complicated, especially on networks with infinitely many vertices. For most practical instances though, like grids or finite graphs with edge weight function \gamma(v,w)=\frac{1}{\deg(v)}, the answer is still yes.


I’ll go into detail about a harmonic function on a continuous structure, steady state temperature, and on a discrete structure, voltage. These phenomena are already familiar to us, but weaving previous knowledge with a new appreciation for harmonic functions is crucial to understanding how they, and other natural instances of harmonic functions, behave.


Suppose you have an insulated object, like a wire, plate, or cube, and apply some temperature on its boundary. The temperature on the boundary will affect the temperature of the interior, and eventually these interior temperatures stabilize to some distribution which doesn’t change as time continues. This is known as the steady state temperature distribution.

A temperature function u of spatial variables x_i and time variable t, satisfies the heat equation

\begin{aligned} \frac{\partial u}{\partial t}-\alpha \Delta u=0, \end{aligned}

where \alpha>0 is a constant which represents the material’s ability to store and conduct thermal energy (these modules for complex analysis have a nice derivation of the heat equation ). The requirement that the temperature distribution is steady state yields the necessary condition that \frac{\partial u}{\partial t}=0 for all t,x_i. So, functions solving the heat equation must satisfy \Delta u=0, which means they are harmonic.

Intuitively, this is no surprise because any continuous function which satisfies the mean value property is harmonic. Still, temperature is a great example because it creates a familiar visual for a continuous harmonic function. Below is a picture of a steady state temperature distribution on a square metal plate of length \pi millimeters, where the bottom edge has temperature set to \pi ℃ and all other edges are set to 0℃.



Since in continuous settings the Dirichlet problem always has a unique solution for harmonic functions, there’s a unique temperature function which takes the designated values on the boundary and is harmonic in the interior of the region. That function is:

\begin{aligned} u(x,y)=\sum\limits_{n=0}^{\infty} \frac{4}{2n+1} \sin((2n+1) x) \left (\frac{\sinh((2n+1)(\pi-y))}{\sinh((2n+1) \pi)} \right ). \end{aligned}

(Here’s the derivation of that equation, but it’s too much Diff EQs for here). So the value of u at any point in the interior, like (1.5,0.5), can be calculated:

 \begin{aligned} u(1.5,0.5) = \sum\limits_{n=0}^{\infty} \frac{4}{2n+1} \sin((2n+1) 1.5) \left (\frac{\sinh((2n+1)(\pi-0.5))}{\sinh((2n+1) \pi)} \right ) \approx 2.17. \end{aligned}


An electrical network can easily be abstracted to a graph. Any point in the network where circuit elements meet is a vertex, and the wires (or other channels) are edges. Each wire has some amount of conductance. As the name suggests, conductance is a measure for how easy it is for electric charge to pass through the wire.

Edges in the graph are assigned nonnegative values representing the wires’ conductance relative to the total conductance emanating from the vertex. These values are the edge weights in the graph, and again they’re not necessarily symmetric. If c_{v,w} is the conductance of the edge between v and w and c_v=\sum_{x:N(v)}c_{v,x} is the total conductance emanating from v then, \gamma(v,w)=\frac{c_{v,w}}{c_v} while \gamma(w,v)=\frac{c_{v,w}}{c_w}.

The ground node in a network is an arbitrary node in the circuit picked to have an electrical potential energy per charge of 0 volts. So, the voltage at a node is the difference in electrical potential energy per charge between that node and the ground node.

The picture below shows an electrical network after the positive and negative terminals of a battery have been attached to two nodes. In this network, the conductance for all the wires is the same, so \gamma(v,w)=\frac{1}{\deg(v)} for all edges (v,w). When the positive and negative terminals of a battery are attached to two nodes in a circuit, a flow of electric charge, called current, is created and carried by electrons or ions.

The vertices representing the nodes attached to the battery are the boundary vertices, and here the node with the negative terminal attached to it as the ground node. After the battery is attached, voltage is induced on the rest of the nodes. The voltage taken on by each node minimizes the total energy lost as heat in the system.


source: Simons Foundation

The reader can quickly double check that the function is harmonic at all the interior points, or equivalently showing that the mean value property holds. After a couple extra facts about current it’s easy to show that the voltage at a node is always the average of its neighbors’ voltages, making voltage a harmonic function (proof in appendix). 

Wrap it up

At the end of the day, understanding harmonic functions makes you a better scientist (or better animation artist, apparently). If you want to understand phenomena like potential energy, data modeling, fluid flow, etc, then you’ve got to understand the math that controls it! Check out the appendix for proofs and extra facts that fell somewhere in the intersection of “things I didn’t think you’d care about” and “things I felt guilty leaving out”.


There are tons of references with more information about discrete harmonic functions, but I’m specifically interesting in those relating random walks and electrical networks. Here is a book whose 5th chapter presents some introductory material. Also this, from the Simons Foundation, and that, a paper from Benjamini and Lovász, are sources which dive deeper. The piece from the Simons Foundation further explains the relation between harmonic functions and minimizing the loss of energy in a network, like the voltage example with heat.

Again, here is an article about harmonic functions in animation.

These complex analysis modules, have sections about temperature distribution and fluid flow. They’re a pretty good tool for all your complex analysis needs, including some sections on harmonic functions in complex analysis.

Harmonic \Longleftrightarrow mean value property

This is obvious for the discrete case, as the discrete Laplacian can be rewritten to show that the function’s value at a point is just the average of its neighbors. Given a continuous function, u, which satisfies the mean value property throughout a disk, there exists (by Poisson’s formula) a function, v, harmonic in the same disk and equal to u on the boundary. By uniqueness, u=v through out the disk and u is harmonic in the disk. For anyone concerned about a region which isn’t a disk, the general fact is that the Dirichlet problem for harmonic function always has a solution.

Proof that voltage is a harmonic function

Voltage satisfies the mean value property and is therefore harmonic. As before c_{u,v} is the conductance of the edge between u and v and c_u is the total conductance emanating from u, c_u =\sum_{x:N(u)}c_{u,x} . Let p_{u,v}=\frac{c_{uv,}}{c_u} be the edge value on the edge from u to v in the graph.

Let i_{u,x} denote the current flowing through node u to node x, v_{u} denote the voltage at node u, c_{u,x} denote the conductance of the edge between u and x, and r_{u,x}=\frac{1}{c_{u,x}} denote the resistance of the edge between u and x. Ohm’s law relates current, resistance, and voltage by stating that

\begin{aligned} i_{u,x} = \frac{v_u-v_x}{r_{u,x}}=\left (v_u-v_x \right )c_{u,x} \end{aligned}.

Kirchoff’s law states that the total current from a node is 0, so for any u in the network

\begin{aligned} \sum\limits_{x} i_{u,x}=0. \end{aligned}

Now, we’re ready to show that voltage is a harmonic function.

\begin{aligned} \sum\limits_{x} i_{u,x}=0 \qquad &\Longleftrightarrow \qquad \sum\limits_{x} \left ( v_u-v_x \right ) c_{u,x}=0 \\ \Longleftrightarrow \qquad  \sum\limits_{x}  v_u c_{u,x}= \sum\limits_{x} v_x  c_{u,x} \qquad &\Longleftrightarrow \qquad v_u c_u=\sum\limits_{x} v_x p_{u,x}c_{u}\\ &\Longleftrightarrow \qquad v_u=\sum\limits_{x} v_x p_{u,x}. \end{aligned}

Guest post, “What’s up with graph Laplacians?”

I wrote a guest post for my friend Samantha Davies’s blog, With High Probability. It’s called, What’s up with graph Laplacians?

Go check it out!

Zero-Knowledge: Definitions and Theory

The next Monday, when the fathers were all back at work, we kids were playing in a field. One kid says to me, “See that bird? What kind of bird is that?” I said, “I haven’t the slightest idea what kind of a bird it is.” He says, “It’s a brown-throated thrush. Your father doesn’t teach you anything!” But it was the opposite. He had already taught me: “See that bird?” he says. “It’s a Spencer’s warbler.” (I knew he didn’t know the real name.) “Well, in Italian, it’s a Chutto Lapittida. In Portuguese, it’s a Bom da Peida. In Chinese, it’s a Chung-long-tah, and in Japanese, it’s a Katano Tekeda. You can know the name of that bird in all the languages of the world, but when you’re finished, you’ll know absolutely nothing whatever about the bird. You’ll only know about humans in different places, and what they call the bird. So let’s look at the bird and see what it’s doing—that’s what counts.” I learned very early the difference between knowing the name of something and knowing something.

In the first post in this series, we defined and implemented a simple zero-knowledge proof for graph isomorphism. In the second post, we saw a different protocol with a much heftier tagline: it gives a zero knowledge proof for any problem in the class NP. These two protocols used the same underlying framework—an interaction between a prover and a verifier—but they were actually very different.

Indeed, the graph isomorphism proof was “perfect” in two senses. First, it didn’t require any assumptions about cryptography. Nobody knows how to prove the Blum-Blum-Shub function is actually a one-way permutation (at the least, this would imply \textup{P} \neq \textup{NP}, so it’s probably hard to prove).

Second, in the graph isomorphism proof, the simulator produced transcripts of the protocol that came from the exact same distribution as the true transcripts created by the prover and verifier. This is why we called it zero-knowledge; anything the verifier can do with the output of the protocol, the simulator could do too. The verifier can’t be making use of the prover’s secret knowledge, since that information isn’t even in the simulator’s universe but the simulator can still compute what the verifier can.

But I didn’t tell you precisely why the 3-coloring protocol is a zero-knowledge proof, and in particular why it’s different from the graph isomorphism protocol. I also hinted that I had been misleading you, dear reader, as to the full 3-coloring proof of zero-knowledge. So in this post we’ll get into those nitty-gritty definitions that make the math rigorous. We’ll give a short sketch of the proof of zero-knowledge (the full proof would take many pages, not because it’s hard but because there are a lot of annoying details). And then we’ll give an overview of the landscape of theorems and conjectures about zero knowledge.

Information vs. knowledge

You can’t understand where the following definitions come from without the crucial distinction between information and knowledge from the computer scientist’s perspective. Information concerns how many essential bits are encoded in a message, and nothing more. In particular, information is not the same as computational complexity, the required amount of computational resources required to actually do something. Knowledge, on the other hand, refers to the computational abilities you gain with the information provided.

Here’s an example in layman’s terms: say I give you a zero-knowledge proof that cancer can be cured using a treatment that takes only five days. Even though I might thoroughly convince you my cure works by exhibiting patients with vanishing tumors, you’ll still struggle to find a cure. This is despite the fact that there might be more bits of information relayed in the messages sent during my “zero-knowledge proof” than the number of bits needed to describe the cure! On the other hand, every proof that 1+1=2 is a zero-knowledge proof, because it’s not computationally difficult to prove this on your own in the first place. You don’t gain any new computational powers even if I tell you flat out what the proof is.

Knowledge is Richard Feynman’s worldview, information is knowing the many names for a bird.

From this perspective information theory is abstruse, though it’s much easier to prove theoretical limits on computation in terms of information theory than in terms of computational complexity. However, the theoretical limits of computation are lower that the limits of information theory, which is the essence of what enables cryptography (from an information theory standpoint, all public-key cryptography is broken!). Reframing what’s possible in terms of computation opens a treasure chest of useful algorithms.

From here we’ll explore three definitions of zero-knowledge. First, we’ll see “perfect” zero knowledge, which is believed to be too strong to be practical. We’ll give a sketch of the “right” proof that graph isomorphism has a perfect zero-knowledge proof. Then we’ll see two relaxations to “statistical” and “computational” zero knowledge. We’ll discuss their theoretical relationships to each other and the rest of the world from a high level.

Perfect zero-knowledge

Recall from our first post, the interactive protocol for graph isomorphism has a very nice property. Say you took all the messages sent back and forth between the prover and verifier, and say you think of those messages as a random outcome of a probabilistic event. Then a simulator could have produced a set of messages drawn from the same distribution, without needing to see the messages from the protocol in the first place! So anything that the verifier could compute as a result of participating in the protocol, the simulator could compute without needing the protocol in the first place. The simulator just needs to assume the truth of the claim. This type of zero-knowledge is called perfect zero-knowledge.

Let’s distill this into some notation. In order to do this, we’ll shift to the theoretical way of discussing a “problem” as a language, i.e., a set of strings. For example, for 3-coloring you fix a method for describing graphs as strings (it doesn’t really matter which method you use), and the language is the set of strings

\displaystyle \{ G : G \textup{ has a valid 3-coloring } \}

Membership in the set is the same as saying an instance of the problem “does G have a 3-coloring?” has a yes answer. Throughout the entire discussion we will fix a generic way to encode any object as a binary string, and we’ll implicitly equate G with the string representing G, and call them both G. In this post I’ll always use L for a language.

Now say that P is an algorithm for the prover (a probabilistic Turing machine of arbitrary complexity), V is the analogous polynomial-time algorithm for the verifier, and M(P,V) is the resulting concatenated string of messages from one run of a zero-knowledge proof. Because P and V may be randomized, M(P,V) is a distribution over strings \{ 0, 1\}^m, and one run of the protocol is equivalent to drawing y \sim \{0,1\}^m according to this distribution. This just means: run an instance of the the protocol, which is random, and encode the output messages as a single string.

Now define a simulator as any probabilistic polynomial-time turing machine S. We interpret the simulator as trying to reproduce the output of the protocol without having access to the protocol’s messages, but with computational limits. The prover, verifier, and simulator all take as input the claim x of length n, and the verifier and simulator run in time polynomial in n. Reminder: in order to be an interactive proof for the language L, if x \in L then the verifier must be convinced with probability 1, and if x \not \in L then the verifier must reject with probability at least 1/2.

Definition (not quite correct): An interactive proof M(P,V) for a language L is said to have perfect zero-knowledge if some simulator S exists such that for every input x \in L, the distributions of the random variables M(P,V)(x) and S(x) are equal.

This definition isn’t quite correct, but it’s correct in spirit so that’s why we used it in our first post. Now let’s see a bit more detail about what really makes a zero-knowledge proof. Let’s remind ourselves of the argument for graph isomorphism, and then see why it’s not strong enough. The protocol we described is summed up as: prover has G_1, G_2, and sends a random permutation of G_1 to the verifier, calls it H. The verifier picks a random integer k which is either 1 or 2, sends it the prover. The prover produces an isomorphism between G_k and H, and sends the isomorphism back to the verifier, who checks to make sure the isomorphism is valid and accepts (otherwise, rejects).

In our argument, we said the simulator could simulate messages by picking k for the verifier before choosing the random permutation H, and this is the nit we need to pick. The problem is that the prover doesn’t trust the verifier, but the simulator does trust the verifier, and performs actions for the verifier that are faithful to the protocol!

What if a naughty verifier were able to glean more information by misbehaving in the protocol? Say, instead of randomly choosing k as 1 or 2, the verifier might always just pick k=1. Could such a misbehavior give the verifier more knowledge than strictly following the protocol?

Sure, maybe if the verifier did that, he wouldn’t end up with a valid proof (his acceptances and rejections may not be valid or sound), but it might be worth it if he can get a small bit of information about the hidden isomorphism. The goal is to treat the verifier like a hacker, but our simulator only simulates honest verifiers.

So a real zero-knowledge proof needs to account for a misbehaving verifier. One way to do this is to give the simulator black-box access to the verifier V, and prove that the distribution of messages produced by the simulator S(V, x) is equal to the true distribution of messages produced in the actual protocol with the same (possibly misbehaving) verifier, M(P,V)(x). Note that S can still know about the protocol that V is supposed to follow, but he can’t know exactly how V is defined.

Unfortunately, there is no known zero-knowledge proof for a nontrivial problem that satisfies this stronger criterion for perfect zero knowledge, and the graph isomorphism proof in particular doesn’t satisfy it. The way to fix it, which I’ll just say informally, is to allow the simulator to fail with probability at most 1/2. So to summarize, the simulator gets black-box access to the verifier and is guaranteed an input x \in L, and has to sample from the exact distribution of messages as the real protocol, but the simulator is allowed to say “KAPUT!” at most half the time, when it thinks the output will be bad. The distribution of simulator outputs (conditioned on no-KAPUT) should be equal to the true distribution of outputs.

Now we can sketch the real graph isomorphism proof. The simulator is defined as follows:

  1. Pick i \in \{ 1, 2 \} randomly and let H be a random permutation of G_i.
  2. Invoke the verifier V(H) as a subroutine with input H, and call the output k, which is either 1 or 2.
  3. If k \neq i, output “KAPUT!”, otherwise invoke V as a subroutine with the isomorphism H \to G_i as input.

Now to show the message distributions are equal. The message sent in step 2 is produced with a different process from the actual zero knowledge protocol (since the prover just takes a random isomorphic copy of G_1), but since we know G_1, G_2 are isomorphic, the distribution over H is the same in both cases.

Now when we invoke the verifier subroutine, we can’t know what the verifier will choose. But the point is that regardless of what the verifier does, there’s still a 50% chance it will pick k=i since we chose i randomly and independently of the definition of V.

If you ignore that we need to prove a few probability-theory facts (like that the choice of H really is uniformly random), then this completes the proof.


Statistical and computational zero-knowledge

While perfect zero-knowledge is the strongest form of zero-knowledge, it’s believed to be too strong. A more tractable definition is called statistical zero-knowledge, and for this we relax the requirement that the message distributions are equal to the requirement that they’re “very similar.” Informally, by “very similar” I mean that if the output strings have length m, then the sum of the differences of the probabilities is “basically” exponentially small in m.

More formally, if D_1, D_2 are two distributions over \{0,1\}^m, then we say they’re statistically close if the following distance function vanishes (asymptotically in m) faster than 1/m^c for any c > 0.

\displaystyle d(D_1, D_2) = \sum_{x \in \{0,1\}^m} \left | \Pr_{y \sim D_1}[y=x] - \Pr_{y \sim D_2}[y=x] \right |

This distance function just sums, for every outcome, the difference between the probabilities of those outcomes. In other words, being statistically close means that the two distributions can’t disagree on any particular input too much, nor can they disagree on most the outputs more than a tiny bit.

In little-o notation, statistically close means d(D_1, D_2) = o(1/m^c) for every constant c > 0. It could be like 2^{-m} or something vanishing much slower like 1/m^{\log \log m}. A function of m which has this property is called negligible, and it’s the standard choice of a security guarantee for cryptography.

Definition: Using the same notation as perfect zero-knowledge, an interactive proof is said to have statistical zero-knowledge if there is a simulator such that for every input x \in L, the distributions of M(P,V)(x) and S(x) are statistically close.

Interestingly, for this definition and the next (computational zero-knowledge), giving the simulator the power to output KAPUT with probability 1/2 doesn’t add any power. It turns out that any KAPUTful simulator can be transformed into a KAPUTless simulator with polynomial overhead. A sketch of how this works: run the simulator until either it does not KAPUT, or else you’ve tried n^2 times with all KAPUTs. The latter event happens with exponentially small probability, so in that case the KAPUTless simulator outputs a uniformly random string (it’s a “don’t care”). This maintains statistical closeness (and, as we’ll see, computational indistinguishability).

Finally, the weakest form of zero-knowledge isn’t so much a property of the two message distributions, but a property of a computationally limited algorithm trying to distinguish between them. We model this by having a probabilistic polynomial-time algorithm A which accepts as input the ability to draw from a distribution, and A produces as output a single bit. We interpret this bit as A‘s claim that the two distributions are “different.A‘s goal is to answer correctly. If A can do that with non-negligible probability, then A wins and “distinguishes” between the two input distributions. So we’ll call A the distinguisher.  Note that A can do things like draw a \textup{poly}(m) number of samples and do any polynomial-time computation with those samples.

Definition: An interactive proof M(P,V) is said to have computational zero-knowledge if there is a simulator S such that for every input x, and for every probabilistic polynomial-time algorithm A, the following probability is negligible.

\displaystyle \Pr[A(M(P,V)(x)) \neq A(S(x))]

In words, A fails to distinguish between M(P,V)(x) and S(x) with a significant (non-negligible) edge over random guessing. Another way to say it is that the distributions A(M(P,V)(x)) and A(S(x)) are statistically close. So the simulator can’t necessarily produce a statistically close distribution on messages, but the simulator can produce a distribution that fools a computationally limited observer.

If general, two distributions D_1, D_2 are called computationally indistinguishable if they don’t have a distinguisher. I.e., no polynomial-time adversary can distinguish between them with non-negligible probability, in the same way we described above.

So our landscape consists of three classes of problems nested as follows:

Perfect ZK \subset statistical ZK \subset Computational ZK

An interesting question we’ll discuss by towards the end of this post is whether these classes of problems are actually different.

Back to 3-coloring

This computational distinguishing property shows up all over cryptography. The reason is simple: almost all cryptographic hardness assumptions can be rephrased as the computational indistinguishability properties. The hardness of one-way functions, predicting pseudorandom generators, all of it is the inability of a polynomial-time adversary to solve a problem.

From this perspective, we can already see why it should be obvious that the zero-knowledge proof for 3-coloring has computational zero knowledge: we used crypto to commit to a coloring, and revealed the secret key later. If the verifier could break the zero-knowledge aspect, then they could defeat the underlying cryptographic primitive. The formal proof of computational zero-knowledge is a drawn-out, detail-oriented quest to entrench this idea in formalism and fit the simulator definition.

Nevertheless, it’s interesting to see where precisely the assumption makes its way into the simulator. Reminder of the protocol:

  1. The prover has a 3-coloring, and publicly commits to a random permutation of that coloring, sends it to the verifier.
  2. The verifier picks an edge uniformly at random, sends it to the prover.
  3. The prover reveals the colors for that edge, sends them to the verifier.
  4. The verifier checks that the revealed colors were indeed the ones committed to, and accepts if they are different colors.

Here’s the graph 3-coloring simulator, which involves some KAPUTing:

  1. Pick n colors c_1, \dots, c_n \in \{ 1,2,3 \} uniformly at random. Commit to these as the “coloring.”
  2. Invoke the verifier with the committed “coloring” as input, and receive an edge (i,j) as output.
  3. If c_i = c_j, KAPUT, else, run the verifier as a subroutine with the true c_i, c_j.

Note that if an honest verifier picks the edge randomly, then it will be properly colored with probability 2/3, and the simulator wins.

A dishonest verifier is trickier, because it gets access to the entire committed coloring. A priori there might be a devious way to select a bad edge. So let’s suppose for the sake of contradiction that the verifier has a method for choosing an improperly colored edge. While it takes a few steps to get here (and a lot more detail), in the end this means that the verifier can determine, with probability better at least \frac{1}{3} + \frac{1}{n^c} for some constant c, the difference between a commitment of the bit zero and a commitment of the bit one.

In other words, we can use a verifier that breaks zero-knowledge of this protocol as a bit-commitment-scheme breaker! But the verifier is a probabilistic polynomial-time algorithm, which contradicts the security assumption of a bit-commitment scheme.

Actually, I don’t think I ever formally said what the security assumption for a bit commitment scheme is, since in the previous posts I hadn’t provided the definition of computational indistinguishability. So here it is: a bit-commitment scheme, which is defined by a one-way permutation G with a hard-core predicate f, maps b \mapsto (G(x), f(x) \oplus b) and has the property that the distributions committing to zero and one are computationally indistinguishable. That is, the following two distributions:

\textup{commit}_n(0) = \{ (G(x), f(x) \oplus 0): x \in \{ 0,1 \}^n \}

\textup{commit}_n(1) = \{ (G(x), f(x) \oplus 1): x \in \{ 0,1 \}^n \}

So the verifier can pick a bad edge with probability as most 1/3 + \textup{negligible}, which is far less than 1/2. Then you can use the trick we showed earlier to get a good enough simulator that never KAPUTs. And that proves computational zero-knowledge.

Theorems and limitations

Here’s an interesting theorem:

Theorem: Any zero-knowledge protocol which doesn’t use randomization (on both sides) can be solved in randomized polynomial time (is in BPP).

So the random choices made by the prover/verifier are actually essential to the novelty of this concept.

I haven’t yet said anything about statistical zero-knowledge beyond the definition. It turns out there are a lot of interesting problems with statistical zero-knowledge proofs. I should really expand the discussion of statistical zero knowledge into its own post, but I want to wrap up this series and explore some other projects.

One such problem is pretty meta: the problem of telling whether two distributions are statistically close. In fact, this problem is complete for the class of problems with statistical zero-knowledge proofs, in the same sense that 3-coloring is NP-complete; any problem which has a statistical zero-knowledge proof can be reduced to a question about statistical closeness of two specific distributions. The class of problems with statistical zero-knowledge proofs is usually shortened to SZK (along with PZK for perfect zero-knowledge and CZK for computational zero-knowledge).

In addition to being interesting from a theoretical perspective, it’s a very practical question to study: if you can sample from a distribution efficiently, which properties of that distribution can you estimate efficiently? The completeness result says that the inherent complexity of this class of problems is captured by statistical zero-knowledge proofs.

As a theoretical object of study, SZK has massive advantages over perfect/computational zero-knowledge. In particular,

Theorem: Every interactive proof that has statistical zero-knowledge against an honest verifier can be transformed into a statistical zero-knowledge proof against an arbitrary adversary.

So the entire “gotcha” content of this post, which we had to worry about with perfect zero-knowledge, doesn’t exist for statistical zero-knowledge. It still exists for computational zero-knowledge, as far as I know, because it’s not know that every computational zero-knowledge proof also has statistical zero-knowledge (a stronger property).

On another front, people are relatively convinced that perfect zero-knowledge is too restrictive to be useful. This comes from the following theorem due to Fortnow

Theorem: If an NP-complete problem has a statistical zero-knowledge proof, then the polynomial hierarchy collapses.

Without going into detail about what the polynomial hierarchy is, suffice it to say that most people believe it’s unlikely. And because this means that SZK probably does not contain all of NP, we can ask natural followup questions like, can problems in SZK be solved by polynomial-time quantum algorithms? It’s also open whether these can solve NP-complete problems.

Statistical zero-knowledge can be specialized even further.

Theorem: Every problem with a statistical zero-knowledge proof has a statistical zero-knowledge proof in which the only messages the verifier sends are random coin flips.

This is a so-called “public coin” protocol, which makes zero-knowledge proofs much easier to reason about.

For further reading, see this introduction by Salil Vadhan, arguably the world’s leading expert on the topic. Particularly interesting is sections 3-5 in which Vadhan describes more detail about how zero-knowledge has been used as an “interface” for techniques to pass between complexity theory and cryptography.

For further reading with on the excruciating details of the definitions and proofs presented in this series, see Oded Goldreich’s Foundations of Cryptography (Volume I, Chapter 4). I have mixed feelings about this book because, despite the fact that I’ve found it enlightening, the book somehow manages to be simultaneously dense and scattered. It leaves no snag unsmoothed, for better or worse.

Until next time!

Zero Knowledge Proofs for NP

Last time, we saw a specific zero-knowledge proof for graph isomorphism. This introduced us to the concept of an interactive proof, where you have a prover and a verifier sending messages back and forth, and the prover is trying to prove a specific claim to the verifier.

A zero-knowledge proof is a special kind of interactive proof in which the prover has some secret piece of knowledge that makes it very easy to verify a disputed claim is true. The prover’s goal, then, is to convince the verifier (a polynomial-time algorithm) that the claim is true without revealing any knowledge at all about the secret.

In this post we’ll see that, using a bit of cryptography, zero-knowledge proofs capture a much wider class of problems than graph isomorphism. Basically, if you believe that cryptography exists, every problem whose answers can be easily verified have zero-knowledge proofs (i.e., all of the class NP). Here are a bunch of examples. For each I’ll phrase the problem as a question, and then say what sort of data the prover’s secret could be.

  • Given a boolean formula, is there an assignment of variables making it true? Secret: a satisfying assignment to the variables.
  • Given a set of integers, is there a subset whose sum is zero? Secret: such a subset.
  • Given a graph, does it have a 3-coloring? Secret: a valid 3-coloring.
  • Given a boolean circuit, can it produce a specific output? Secret: a choice of inputs that produces the output.

The common link among all of these problems is that they are NP-hard (graph isomorphism isn’t known to be NP-hard). For us this means two things: (1) we think these problems are actually hard, so the verifier can’t solve them, and (2) if you show that one of them has a zero-knowledge proof, then they all have zero-knowledge proofs.

We’re going to describe and implement a zero-knowledge proof for graph 3-colorability, and in the next post we’ll dive into the theoretical definitions and talk about the proof that the scheme we present is zero-knowledge. As usual, all of the code used in making this post is available in a repository on this blog’s Github page.

One-way permutations

In a recent program gallery post we introduced the Blum-Blum-Shub pseudorandom generator. A pseudorandom generator is simply an algorithm that takes as input a short random string of length s and produces as output a longer string, say, of length 3s. This output string should not be random, but rather “indistinguishable” from random in a sense we’ll make clear next time. The underlying function for this generator is the “modular squaring” function x \mapsto x^2 \mod M, for some cleverly chosen M. The M is chosen in such a way that makes this mapping a permutation. So this function is more than just a pseudorandom generator, it’s a one-way permutation.

If you have a primality-checking algorithm on hand (we do), then preparing the Blum-Blum-Shub algorithm is only about 15 lines of code.

def goodPrime(p):
    return p % 4 == 3 and probablyPrime(p, accuracy=100)

def findGoodPrime(numBits=512):
    candidate = 1

    while not goodPrime(candidate):
        candidate = random.getrandbits(numBits)

    return candidate

def makeModulus(numBits=512):
    return findGoodPrime(numBits) * findGoodPrime(numBits)

def blum_blum_shub(modulusLength=512):
    modulus = makeModulus(numBits=modulusLength)

    def f(inputInt):
        return pow(inputInt, 2, modulus)

    return f

The interested reader should check out the proof gallery post for more details about this generator. For us, having a one-way permutation is the important part (and we’re going to defer the formal definition of “one-way” until next time, just think “hard to get inputs from outputs”).

The other concept we need, which is related to a one-way permutation, is the notion of a hardcore predicate. Let G(x) be a one-way permutation, and let f(x) = b be a function that produces a single bit from a string. We say that f is a hardcore predicate for G if you can’t reliably compute f(x) when given only G(x).

Hardcore predicates are important because there are many one-way functions for which, when given the output, you can guess part of the input very reliably, but not the rest (e.g., if g is a one-way function, (x, y) \mapsto (x, g(y)) is also one-way, but the x part is trivially guessable). So a hardcore predicate formally measures, when given the output of a one-way function, what information derived from the input is hard to compute.

In the case of Blum-Blum-Shub, one hardcore predicate is simply the parity of the input bits.

def parity(n):
    return sum(int(x) for x in bin(n)[2:]) % 2

Bit Commitment Schemes

A core idea that will makes zero-knowledge proofs work for NP is the ability for the prover to publicly “commit” to a choice, and later reveal that choice in a way that makes it infeasible to fake their commitment. This will involve not just the commitment to a single bit of information, but also the transmission of auxiliary data that is provably infeasible to fake.

Our pair of one-way permutation G and hardcore predicate f comes in very handy. Let’s say I want to commit to a bit b \in \{ 0,1 \}. Let’s fix a security parameter that will measure how hard it is to change my commitment post-hoc, say n = 512. My process for committing is to draw a random string x of length n, and send you the pair (G(x), f(x) \oplus b), where \oplus is the XOR operator on two bits.

The guarantee of a one-way permutation with a hardcore predicate is that if you only see G(x), you can’t guess f(x) with any reasonable edge over random guessing. Moreover, if you fix a bit b, and take an unpredictably random bit y, the XOR b \oplus y is also unpredictably random. In other words, if f(x) is hardcore, then so is x \mapsto f(x) \oplus b for a fixed bit b. Finally, to reveal my commitment, I just send the string x and let you independently compute (G(x), f(x) \oplus b). Since G is a permutation, that x is the only x that could have produced the commitment I sent you earlier.

Here’s a Python implementation of this scheme. We start with a generic base class for a commitment scheme.

class CommitmentScheme(object):
    def __init__(self, oneWayPermutation, hardcorePredicate, securityParameter):
            oneWayPermutation: int -> int
            hardcorePredicate: int -> {0, 1}
        self.oneWayPermutation = oneWayPermutation
        self.hardcorePredicate = hardcorePredicate
        self.securityParameter = securityParameter

        # a random string of length `self.securityParameter` used only once per commitment
        self.secret = self.generateSecret()

    def generateSecret(self):
        raise NotImplemented

    def commit(self, x):
        raise NotImplemented

    def reveal(self):
        return self.secret

Note that the “reveal” step is always simply to reveal the secret. Here’s the implementation subclass. We should also note that the security string should be chosen at random anew for every bit you wish to commit to. In this post we won’t reuse CommitmentScheme objects anyway.

class BBSBitCommitmentScheme(CommitmentScheme):
    def generateSecret(self):
        # the secret is a random quadratic residue
        self.secret = self.oneWayPermutation(random.getrandbits(self.securityParameter))
        return self.secret

    def commit(self, bit):
        unguessableBit = self.hardcorePredicate(self.secret)
        return (
            unguessableBit ^ bit,  # python xor

One important detail is that the Blum-Blum-Shub one-way permutation is only a permutation when restricted to quadratic residues. As such, we generate our secret by shooting a random string through the one-way permutation to get a random residue. In fact this produces a uniform random residue, since the Blum-Blum-Shub modulus is chosen in such a way that ensures every residue has exactly four square roots.

Here’s code to check the verification is correct.

class BBSBitCommitmentVerifier(object):
    def __init__(self, oneWayPermutation, hardcorePredicate):
        self.oneWayPermutation = oneWayPermutation
        self.hardcorePredicate = hardcorePredicate

    def verify(self, securityString, claimedCommitment):
        trueBit = self.decode(securityString, claimedCommitment)
        unguessableBit = self.hardcorePredicate(securityString)  # wasteful, whatever
        return claimedCommitment == (
            unguessableBit ^ trueBit,  # python xor

    def decode(self, securityString, claimedCommitment):
        unguessableBit = self.hardcorePredicate(securityString)
        return claimedCommitment[1] ^ unguessableBit

and an example of using it

if __name__ == "__main__":
    import blum_blum_shub
    securityParameter = 10
    oneWayPerm = blum_blum_shub.blum_blum_shub(securityParameter)
    hardcorePred = blum_blum_shub.parity

    print('Bit commitment')
    scheme = BBSBitCommitmentScheme(oneWayPerm, hardcorePred, securityParameter)
    verifier = BBSBitCommitmentVerifier(oneWayPerm, hardcorePred)

    for _ in range(10):
        bit = random.choice([0, 1])
        commitment = scheme.commit(bit)
        secret = scheme.reveal()
        trueBit = verifier.decode(secret, commitment)
        valid = verifier.verify(secret, commitment)

        print('{} == {}? {}; {} {}'.format(bit, trueBit, valid, secret, commitment))

Example output:

1 == 1? True; 524 (5685, 0)
1 == 1? True; 149 (22201, 1)
1 == 1? True; 476 (34511, 1)
1 == 1? True; 927 (14243, 1)
1 == 1? True; 608 (23947, 0)
0 == 0? True; 964 (7384, 1)
0 == 0? True; 373 (23890, 0)
0 == 0? True; 620 (270, 1)
1 == 1? True; 926 (12390, 0)
0 == 0? True; 708 (1895, 0)

As an exercise, write a program to verify that no other input to the Blum-Blum-Shub one-way permutation gives a valid verification. Test it on a small security parameter like n=10.

It’s also important to point out that the verifier needs to do some additional validation that we left out. For example, how does the verifier know that the revealed secret actually is a quadratic residue? In fact, detecting quadratic residues is believed to be hard! To get around this, we could change the commitment scheme reveal step to reveal the random string that was used as input to the permutation to get the residue (cf. BBSCommitmentScheme.generateSecret for the random string that needs to be saved/revealed). Then the verifier could generate the residue in the same way. As an exercise, upgrade the bit commitment an verifier classes to reflect this.

In order to get a zero-knowledge proof for 3-coloring, we need to be able to commit to one of three colors, which requires two bits. So let’s go overkill and write a generic integer commitment scheme. It’s simple enough: specify a bound on the size of the integers, and then do an independent bit commitment for every bit.

class BBSIntCommitmentScheme(CommitmentScheme):
    def __init__(self, numBits, oneWayPermutation, hardcorePredicate, securityParameter=512):
            A commitment scheme for integers of a prespecified length `numBits`. Applies the
            Blum-Blum-Shub bit commitment scheme to each bit independently.
        self.schemes = [BBSBitCommitmentScheme(oneWayPermutation, hardcorePredicate, securityParameter)
                        for _ in range(numBits)]
        super().__init__(oneWayPermutation, hardcorePredicate, securityParameter)

    def generateSecret(self):
        self.secret = [x.secret for x in self.schemes]
        return self.secret

    def commit(self, integer):
        # first pad bits to desired length
        integer = bin(integer)[2:].zfill(len(self.schemes))
        bits = [int(bit) for bit in integer]
        return [scheme.commit(bit) for scheme, bit in zip(self.schemes, bits)]

And the corresponding verifier

class BBSIntCommitmentVerifier(object):
    def __init__(self, numBits, oneWayPermutation, hardcorePredicate):
        self.verifiers = [BBSBitCommitmentVerifier(oneWayPermutation, hardcorePredicate)
                          for _ in range(numBits)]

    def decodeBits(self, secrets, bitCommitments):
        return [v.decode(secret, commitment) for (v, secret, commitment) in
                zip(self.verifiers, secrets, bitCommitments)]

    def verify(self, secrets, bitCommitments):
        return all(
            bitVerifier.verify(secret, commitment)
            for (bitVerifier, secret, commitment) in
            zip(self.verifiers, secrets, bitCommitments)

    def decode(self, secrets, bitCommitments):
        decodedBits = self.decodeBits(secrets, bitCommitments)
        return int(''.join(str(bit) for bit in decodedBits))

A sample usage:

if __name__ == "__main__":
    import blum_blum_shub
    securityParameter = 10
    oneWayPerm = blum_blum_shub.blum_blum_shub(securityParameter)
    hardcorePred = blum_blum_shub.parity

    print('Int commitment')
    scheme = BBSIntCommitmentScheme(10, oneWayPerm, hardcorePred)
    verifier = BBSIntCommitmentVerifier(10, oneWayPerm, hardcorePred)
    choices = list(range(1024))
    for _ in range(10):
        theInt = random.choice(choices)
        commitments = scheme.commit(theInt)
        secrets = scheme.reveal()
        trueInt = verifier.decode(secrets, commitments)
        valid = verifier.verify(secrets, commitments)

        print('{} == {}? {}; {} {}'.format(theInt, trueInt, valid, secrets, commitments))

And a sample output:

527 == 527? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 0), (342, 1), (54363, 1), (63975, 0), (5426, 0), (9124, 1), (23973, 0), (44832, 0), (33044, 0), (68501, 0)]
67 == 67? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 1), (342, 1), (54363, 1), (63975, 1), (5426, 0), (9124, 1), (23973, 1), (44832, 1), (33044, 0), (68501, 0)]
729 == 729? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 0), (342, 1), (54363, 0), (63975, 1), (5426, 0), (9124, 0), (23973, 0), (44832, 1), (33044, 1), (68501, 0)]
441 == 441? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 1), (342, 0), (54363, 0), (63975, 0), (5426, 1), (9124, 0), (23973, 0), (44832, 1), (33044, 1), (68501, 0)]
614 == 614? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 0), (342, 1), (54363, 1), (63975, 1), (5426, 1), (9124, 1), (23973, 1), (44832, 0), (33044, 0), (68501, 1)]
696 == 696? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 0), (342, 1), (54363, 0), (63975, 0), (5426, 1), (9124, 0), (23973, 0), (44832, 1), (33044, 1), (68501, 1)]
974 == 974? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 0), (342, 0), (54363, 0), (63975, 1), (5426, 0), (9124, 1), (23973, 0), (44832, 0), (33044, 0), (68501, 1)]
184 == 184? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 1), (342, 1), (54363, 0), (63975, 0), (5426, 1), (9124, 0), (23973, 0), (44832, 1), (33044, 1), (68501, 1)]
136 == 136? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 1), (342, 1), (54363, 0), (63975, 0), (5426, 0), (9124, 1), (23973, 0), (44832, 1), (33044, 1), (68501, 1)]
632 == 632? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 0), (342, 1), (54363, 1), (63975, 1), (5426, 1), (9124, 0), (23973, 0), (44832, 1), (33044, 1), (68501, 1)]

Before we move on, we should note that this integer commitment scheme “blows up” the secret by quite a bit. If you have a security parameter s and an integer with n bits, then the commitment uses roughly sn bits. A more efficient method would be to simply use a good public-key encryption scheme, and then reveal the secret key used to encrypt the message. While we implemented such schemes previously on this blog, I thought it would be more fun to do something new.

A zero-knowledge proof for 3-coloring

First, a high-level description of the protocol. The setup: the prover has a graph G with n vertices V and m edges E, and also has a secret 3-coloring of the vertices \varphi: V \to \{ 0, 1, 2 \}. Recall, a 3-coloring is just an assignment of colors to vertices (in this case the colors are 0,1,2) so that no two adjacent vertices have the same color.

So the prover has a coloring \varphi to be kept secret, but wants to prove that G is 3-colorable. The idea is for the verifier to pick a random edge (u,v), and have the prover reveal the colors of u and v. However, if we run this protocol only once, there’s nothing to stop the prover from just lying and picking two distinct colors. If we allow the verifier to run the protocol many times, and the prover actually reveals the colors from their secret coloring, then after roughly |V| rounds the verifier will know the entire coloring. Each step reveals more knowledge.

We can fix this with two modifications.

  1. The prover first publicly commits to the coloring using a commitment scheme. Then when the verifier asks for the colors of the two vertices of a random edge, he can rest assured that the prover fixed a coloring that does not depend on the verifier’s choice of edge.
  2. The prover doesn’t reveal colors from their secret coloring, but rather from a random permutation of the secret coloring. This way, when the verifier sees colors, they’re equally likely to see any two colors, and all the verifier will know is that those two colors are different.

So the scheme is: prover commits to a random permutation of the true coloring and sends it to the verifier; the verifier asks for the true colors of a given edge; the prover provides those colors and the secrets to their commitment scheme so the verifier can check.

The key point is that now the verifier has to commit to a coloring, and if the coloring isn’t a proper 3-coloring the verifier has a reasonable chance of picking an improperly colored edge (a one-in-|E| chance, which is at least 1/|V|^2). On the other hand, if the coloring is proper, then the verifier will always query a properly colored edge, and it’s zero-knowledge because the verifier is equally likely to see every pair of colors. So the verifier will always accept, but won’t know anything more than that the edge it chose is properly colored. Repeating this |V|^2-ish times, with high probability it’ll have queried every edge and be certain the coloring is legitimate.

Let’s implement this scheme. First the data types. As in the previous post, graphs are represented by edge lists, and a coloring is represented by a dictionary mapping a vertex to 0, 1, or 2 (the “colors”).

# a graph is a list of edges, and for simplicity we'll say
# every vertex shows up in some edge
exampleGraph = [
    (1, 2),
    (1, 4),
    (1, 3),
    (2, 5),
    (2, 5),
    (3, 6),
    (5, 6)

exampleColoring = {
    1: 0,
    2: 1,
    3: 2,
    4: 1,
    5: 2,
    6: 0,

Next, the Prover class that implements that half of the protocol. We store a list of integer commitment schemes for each vertex whose color we need to commit to, and send out those commitments.

class Prover(object):
    def __init__(self, graph, coloring, oneWayPermutation=ONE_WAY_PERMUTATION, hardcorePredicate=HARDCORE_PREDICATE):
        self.graph = [tuple(sorted(e)) for e in graph]
        self.coloring = coloring
        self.vertices = list(range(1, numVertices(graph) + 1))
        self.oneWayPermutation = oneWayPermutation
        self.hardcorePredicate = hardcorePredicate
        self.vertexToScheme = None

    def commitToColoring(self):
        self.vertexToScheme = {
            v: commitment.BBSIntCommitmentScheme(
                2, self.oneWayPermutation, self.hardcorePredicate
            ) for v in self.vertices

        permutation = randomPermutation(3)
        permutedColoring = {
            v: permutation[self.coloring[v]] for v in self.vertices

        return {v: s.commit(permutedColoring[v])
                for (v, s) in self.vertexToScheme.items()}

    def revealColors(self, u, v):
        u, v = min(u, v), max(u, v)
        if not (u, v) in self.graph:
            raise Exception('Must query an edge!')

        return (

In commitToColoring we randomly permute the underlying colors, and then compose that permutation with the secret coloring, committing to each resulting color independently. In revealColors we reveal only those colors for a queried edge. Note that we don’t actually need to store the permuted coloring, because it’s implicitly stored in the commitments.

It’s crucial that we reject any query that doesn’t correspond to an edge. If we don’t reject such queries then the verifier can break the protocol! In particular, by querying non-edges you can determine which pairs of nodes have the same color in the secret coloring. You can then chain these together to partition the nodes into color classes, and so color the graph. (After seeing the Verifier class below, implement this attack as an exercise).

Here’s the corresponding Verifier:

class Verifier(object):
    def __init__(self, graph, oneWayPermutation, hardcorePredicate):
        self.graph = [tuple(sorted(e)) for e in graph]
        self.oneWayPermutation = oneWayPermutation
        self.hardcorePredicate = hardcorePredicate
        self.committedColoring = None
        self.verifier = commitment.BBSIntCommitmentVerifier(2, oneWayPermutation, hardcorePredicate)

    def chooseEdge(self, committedColoring):
        self.committedColoring = committedColoring
        self.chosenEdge = random.choice(self.graph)
        return self.chosenEdge

    def accepts(self, revealed):
        revealedColors = []

        for (w, bitSecrets) in zip(self.chosenEdge, revealed):
            trueColor = self.verifier.decode(bitSecrets, self.committedColoring[w])
            if not self.verifier.verify(bitSecrets, self.committedColoring[w]):
                return False

        return revealedColors[0] != revealedColors[1]

As expected, in the acceptance step the verifier decodes the true color of the edge it queried, and accepts if and only if the commitment was valid and the edge is properly colored.

Here’s the whole protocol, which is syntactically very similar to the one for graph isomorphism.

def runProtocol(G, coloring, securityParameter=512):
    oneWayPermutation = blum_blum_shub.blum_blum_shub(securityParameter)
    hardcorePredicate = blum_blum_shub.parity

    prover = Prover(G, coloring, oneWayPermutation, hardcorePredicate)
    verifier = Verifier(G, oneWayPermutation, hardcorePredicate)

    committedColoring = prover.commitToColoring()
    chosenEdge = verifier.chooseEdge(committedColoring)

    revealed = prover.revealColors(*chosenEdge)
    revealedColors = (
        verifier.verifier.decode(revealed[0], committedColoring[chosenEdge[0]]),
        verifier.verifier.decode(revealed[1], committedColoring[chosenEdge[1]]),
    isValid = verifier.accepts(revealed)

    print("{} != {} and commitment is valid? {}".format(
        revealedColors[0], revealedColors[1], isValid

    return isValid

And an example of running it

if __name__ == "__main__":
    for _ in range(30):
        runProtocol(exampleGraph, exampleColoring, securityParameter=10)

Here’s the output

0 != 2 and commitment is valid? True
1 != 0 and commitment is valid? True
1 != 2 and commitment is valid? True
2 != 0 and commitment is valid? True
1 != 2 and commitment is valid? True
2 != 0 and commitment is valid? True
0 != 2 and commitment is valid? True
0 != 2 and commitment is valid? True
0 != 1 and commitment is valid? True
0 != 1 and commitment is valid? True
2 != 1 and commitment is valid? True
0 != 2 and commitment is valid? True
2 != 0 and commitment is valid? True
2 != 0 and commitment is valid? True
1 != 0 and commitment is valid? True
1 != 0 and commitment is valid? True
0 != 2 and commitment is valid? True
2 != 1 and commitment is valid? True
0 != 2 and commitment is valid? True
0 != 2 and commitment is valid? True
2 != 1 and commitment is valid? True
1 != 0 and commitment is valid? True
1 != 0 and commitment is valid? True
2 != 1 and commitment is valid? True
2 != 1 and commitment is valid? True
1 != 0 and commitment is valid? True
0 != 2 and commitment is valid? True
1 != 2 and commitment is valid? True
1 != 2 and commitment is valid? True
0 != 1 and commitment is valid? True

So while we haven’t proved it rigorously, we’ve seen the zero-knowledge proof for graph 3-coloring. This automatically gives us a zero-knowledge proof for all of NP, because given any NP problem you can just convert it to the equivalent 3-coloring problem and solve that. Of course, the blowup required to convert a random NP problem to 3-coloring can be polynomially large, which makes it unsuitable for practice. But the point is that this gives us a theoretical justification for which problems have zero-knowledge proofs in principle. Now that we’ve established that you can go about trying to find the most efficient protocol for your favorite problem.

Anticipatory notes

When we covered graph isomorphism last time, we said that a simulator could, without participating in the zero-knowledge protocol or knowing the secret isomorphism, produce a transcript that was drawn from the same distribution of messages as the protocol produced. That was all that it needed to be “zero-knowledge,” because anything the verifier could do with its protocol transcript, the simulator could do too.

We can do exactly the same thing for 3-coloring, exploiting the same “reverse order” trick where the simulator picks the random edge first, then chooses the color commitment post-hoc.

Unfortunately, both there and here I’m short-changing you, dear reader. The elephant in the room is that our naive simulator assumes the verifier is playing by the rules! If you want to define security, you have to define it against a verifier who breaks the protocol in an arbitrary way. For example, the simulator should be able to produce an equivalent transcript even if the verifier deterministically picks an edge, or tries to pick a non-edge, or tries to send gibberish. It takes a lot more work to prove security against an arbitrary verifier, but the basic setup is that the simulator can no longer make choices for the verifier, but rather has to invoke the verifier subroutine as a black box. (To compensate, the requirements on the simulator are relaxed quite a bit; more on that next time)

Because an implementation of such a scheme would involve a lot of validation, we’re going to defer the discussion to next time. We also need to be more specific about the different kinds of zero-knowledge, since we won’t be able to achieve perfect zero-knowledge with the simulator drawing from an identical distribution, but rather a computationally indistinguishable distribution.

We’ll define all this rigorously next time, and discuss the known theoretical implications and limitations. Next time will be cuffs-off theory, baby!

Until then!