# Zero Knowledge Proofs for NP

Last time, we saw a specific zero-knowledge proof for graph isomorphism. This introduced us to the concept of an interactive proof, where you have a prover and a verifier sending messages back and forth, and the prover is trying to prove a specific claim to the verifier.

A zero-knowledge proof is a special kind of interactive proof in which the prover has some secret piece of knowledge that makes it very easy to verify a disputed claim is true. The prover’s goal, then, is to convince the verifier (a polynomial-time algorithm) that the claim is true without revealing any knowledge at all about the secret.

In this post we’ll see that, using a bit of cryptography, zero-knowledge proofs capture a much wider class of problems than graph isomorphism. Basically, if you believe that cryptography exists, every problem whose answers can be easily verified have zero-knowledge proofs (i.e., all of the class NP). Here are a bunch of examples. For each I’ll phrase the problem as a question, and then say what sort of data the prover’s secret could be.

• Given a boolean formula, is there an assignment of variables making it true? Secret: a satisfying assignment to the variables.
• Given a set of integers, is there a subset whose sum is zero? Secret: such a subset.
• Given a graph, does it have a 3-coloring? Secret: a valid 3-coloring.
• Given a boolean circuit, can it produce a specific output? Secret: a choice of inputs that produces the output.

The common link among all of these problems is that they are NP-hard (graph isomorphism isn’t known to be NP-hard). For us this means two things: (1) we think these problems are actually hard, so the verifier can’t solve them, and (2) if you show that one of them has a zero-knowledge proof, then they all have zero-knowledge proofs.

We’re going to describe and implement a zero-knowledge proof for graph 3-colorability, and in the next post we’ll dive into the theoretical definitions and talk about the proof that the scheme we present is zero-knowledge. As usual, all of the code used in making this post is available in a repository on this blog’s Github page.

## One-way permutations

In a recent program gallery post we introduced the Blum-Blum-Shub pseudorandom generator. A pseudorandom generator is simply an algorithm that takes as input a short random string of length $s$ and produces as output a longer string, say, of length $3s$. This output string should not be random, but rather “indistinguishable” from random in a sense we’ll make clear next time. The underlying function for this generator is the “modular squaring” function $x \mapsto x^2 \mod M$, for some cleverly chosen $M$. The $M$ is chosen in such a way that makes this mapping a permutation. So this function is more than just a pseudorandom generator, it’s a one-way permutation.

If you have a primality-checking algorithm on hand (we do), then preparing the Blum-Blum-Shub algorithm is only about 15 lines of code.

def goodPrime(p):
return p % 4 == 3 and probablyPrime(p, accuracy=100)

def findGoodPrime(numBits=512):
candidate = 1

while not goodPrime(candidate):
candidate = random.getrandbits(numBits)

return candidate

def makeModulus(numBits=512):
return findGoodPrime(numBits) * findGoodPrime(numBits)

def blum_blum_shub(modulusLength=512):
modulus = makeModulus(numBits=modulusLength)

def f(inputInt):
return pow(inputInt, 2, modulus)

return f

The interested reader should check out the proof gallery post for more details about this generator. For us, having a one-way permutation is the important part (and we’re going to defer the formal definition of “one-way” until next time, just think “hard to get inputs from outputs”).

The other concept we need, which is related to a one-way permutation, is the notion of a hardcore predicate. Let $G(x)$ be a one-way permutation, and let $f(x) = b$ be a function that produces a single bit from a string. We say that $f$ is a hardcore predicate for $G$ if you can’t reliably compute $f(x)$ when given only $G(x)$.

Hardcore predicates are important because there are many one-way functions for which, when given the output, you can guess part of the input very reliably, but not the rest (e.g., if $g$ is a one-way function, $(x, y) \mapsto (x, g(y))$ is also one-way, but the $x$ part is trivially guessable). So a hardcore predicate formally measures, when given the output of a one-way function, what information derived from the input is hard to compute.

In the case of Blum-Blum-Shub, one hardcore predicate is simply the parity of the input bits.

def parity(n):
return sum(int(x) for x in bin(n)[2:]) % 2

## Bit Commitment Schemes

A core idea that will makes zero-knowledge proofs work for NP is the ability for the prover to publicly “commit” to a choice, and later reveal that choice in a way that makes it infeasible to fake their commitment. This will involve not just the commitment to a single bit of information, but also the transmission of auxiliary data that is provably infeasible to fake.

Our pair of one-way permutation $G$ and hardcore predicate $f$ comes in very handy. Let’s say I want to commit to a bit $b \in \{ 0,1 \}$. Let’s fix a security parameter that will measure how hard it is to change my commitment post-hoc, say $n = 512$. My process for committing is to draw a random string $x$ of length $n$, and send you the pair $(G(x), f(x) \oplus b)$, where $\oplus$ is the XOR operator on two bits.

The guarantee of a one-way permutation with a hardcore predicate is that if you only see $G(x)$, you can’t guess $f(x)$ with any reasonable edge over random guessing. Moreover, if you fix a bit $b$, and take an unpredictably random bit $y$, the XOR $b \oplus y$ is also unpredictably random. In other words, if $f(x)$ is hardcore, then so is $x \mapsto f(x) \oplus b$ for a fixed bit $b$. Finally, to reveal my commitment, I just send the string $x$ and let you independently compute $(G(x), f(x) \oplus b)$. Since $G$ is a permutation, that $x$ is the only $x$ that could have produced the commitment I sent you earlier.

Here’s a Python implementation of this scheme. We start with a generic base class for a commitment scheme.

class CommitmentScheme(object):
def __init__(self, oneWayPermutation, hardcorePredicate, securityParameter):
'''
oneWayPermutation: int -> int
hardcorePredicate: int -> {0, 1}
'''
self.oneWayPermutation = oneWayPermutation
self.hardcorePredicate = hardcorePredicate
self.securityParameter = securityParameter

# a random string of length self.securityParameter used only once per commitment
self.secret = self.generateSecret()

def generateSecret(self):
raise NotImplemented

def commit(self, x):
raise NotImplemented

def reveal(self):
return self.secret

Note that the “reveal” step is always simply to reveal the secret. Here’s the implementation subclass. We should also note that the security string should be chosen at random anew for every bit you wish to commit to. In this post we won’t reuse CommitmentScheme objects anyway.

class BBSBitCommitmentScheme(CommitmentScheme):
def generateSecret(self):
# the secret is a random quadratic residue
self.secret = self.oneWayPermutation(random.getrandbits(self.securityParameter))
return self.secret

def commit(self, bit):
unguessableBit = self.hardcorePredicate(self.secret)
return (
self.oneWayPermutation(self.secret),
unguessableBit ^ bit,  # python xor
)

One important detail is that the Blum-Blum-Shub one-way permutation is only a permutation when restricted to quadratic residues. As such, we generate our secret by shooting a random string through the one-way permutation to get a random residue. In fact this produces a uniform random residue, since the Blum-Blum-Shub modulus is chosen in such a way that ensures every residue has exactly four square roots.

Here’s code to check the verification is correct.

class BBSBitCommitmentVerifier(object):
def __init__(self, oneWayPermutation, hardcorePredicate):
self.oneWayPermutation = oneWayPermutation
self.hardcorePredicate = hardcorePredicate

def verify(self, securityString, claimedCommitment):
trueBit = self.decode(securityString, claimedCommitment)
unguessableBit = self.hardcorePredicate(securityString)  # wasteful, whatever
return claimedCommitment == (
self.oneWayPermutation(securityString),
unguessableBit ^ trueBit,  # python xor
)

def decode(self, securityString, claimedCommitment):
unguessableBit = self.hardcorePredicate(securityString)
return claimedCommitment[1] ^ unguessableBit

and an example of using it

if __name__ == "__main__":
import blum_blum_shub
securityParameter = 10
oneWayPerm = blum_blum_shub.blum_blum_shub(securityParameter)
hardcorePred = blum_blum_shub.parity

print('Bit commitment')
scheme = BBSBitCommitmentScheme(oneWayPerm, hardcorePred, securityParameter)
verifier = BBSBitCommitmentVerifier(oneWayPerm, hardcorePred)

for _ in range(10):
bit = random.choice([0, 1])
commitment = scheme.commit(bit)
secret = scheme.reveal()
trueBit = verifier.decode(secret, commitment)
valid = verifier.verify(secret, commitment)

print('{} == {}? {}; {} {}'.format(bit, trueBit, valid, secret, commitment))

Example output:

1 == 1? True; 524 (5685, 0)
1 == 1? True; 149 (22201, 1)
1 == 1? True; 476 (34511, 1)
1 == 1? True; 927 (14243, 1)
1 == 1? True; 608 (23947, 0)
0 == 0? True; 964 (7384, 1)
0 == 0? True; 373 (23890, 0)
0 == 0? True; 620 (270, 1)
1 == 1? True; 926 (12390, 0)
0 == 0? True; 708 (1895, 0)

As an exercise, write a program to verify that no other input to the Blum-Blum-Shub one-way permutation gives a valid verification. Test it on a small security parameter like $n=10$.

It’s also important to point out that the verifier needs to do some additional validation that we left out. For example, how does the verifier know that the revealed secret actually is a quadratic residue? In fact, detecting quadratic residues is believed to be hard! To get around this, we could change the commitment scheme reveal step to reveal the random string that was used as input to the permutation to get the residue (cf. BBSCommitmentScheme.generateSecret for the random string that needs to be saved/revealed). Then the verifier could generate the residue in the same way. As an exercise, upgrade the bit commitment an verifier classes to reflect this.

In order to get a zero-knowledge proof for 3-coloring, we need to be able to commit to one of three colors, which requires two bits. So let’s go overkill and write a generic integer commitment scheme. It’s simple enough: specify a bound on the size of the integers, and then do an independent bit commitment for every bit.

class BBSIntCommitmentScheme(CommitmentScheme):
def __init__(self, numBits, oneWayPermutation, hardcorePredicate, securityParameter=512):
'''
A commitment scheme for integers of a prespecified length numBits. Applies the
Blum-Blum-Shub bit commitment scheme to each bit independently.
'''
self.schemes = [BBSBitCommitmentScheme(oneWayPermutation, hardcorePredicate, securityParameter)
for _ in range(numBits)]
super().__init__(oneWayPermutation, hardcorePredicate, securityParameter)

def generateSecret(self):
self.secret = [x.secret for x in self.schemes]
return self.secret

def commit(self, integer):
# first pad bits to desired length
integer = bin(integer)[2:].zfill(len(self.schemes))
bits = [int(bit) for bit in integer]
return [scheme.commit(bit) for scheme, bit in zip(self.schemes, bits)]

And the corresponding verifier

class BBSIntCommitmentVerifier(object):
def __init__(self, numBits, oneWayPermutation, hardcorePredicate):
self.verifiers = [BBSBitCommitmentVerifier(oneWayPermutation, hardcorePredicate)
for _ in range(numBits)]

def decodeBits(self, secrets, bitCommitments):
return [v.decode(secret, commitment) for (v, secret, commitment) in
zip(self.verifiers, secrets, bitCommitments)]

def verify(self, secrets, bitCommitments):
return all(
bitVerifier.verify(secret, commitment)
for (bitVerifier, secret, commitment) in
zip(self.verifiers, secrets, bitCommitments)
)

def decode(self, secrets, bitCommitments):
decodedBits = self.decodeBits(secrets, bitCommitments)
return int(''.join(str(bit) for bit in decodedBits))

A sample usage:

if __name__ == "__main__":
import blum_blum_shub
securityParameter = 10
oneWayPerm = blum_blum_shub.blum_blum_shub(securityParameter)
hardcorePred = blum_blum_shub.parity

print('Int commitment')
scheme = BBSIntCommitmentScheme(10, oneWayPerm, hardcorePred)
verifier = BBSIntCommitmentVerifier(10, oneWayPerm, hardcorePred)
choices = list(range(1024))
for _ in range(10):
theInt = random.choice(choices)
commitments = scheme.commit(theInt)
secrets = scheme.reveal()
trueInt = verifier.decode(secrets, commitments)
valid = verifier.verify(secrets, commitments)

print('{} == {}? {}; {} {}'.format(theInt, trueInt, valid, secrets, commitments))

And a sample output:

527 == 527? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 0), (342, 1), (54363, 1), (63975, 0), (5426, 0), (9124, 1), (23973, 0), (44832, 0), (33044, 0), (68501, 0)]
67 == 67? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 1), (342, 1), (54363, 1), (63975, 1), (5426, 0), (9124, 1), (23973, 1), (44832, 1), (33044, 0), (68501, 0)]
729 == 729? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 0), (342, 1), (54363, 0), (63975, 1), (5426, 0), (9124, 0), (23973, 0), (44832, 1), (33044, 1), (68501, 0)]
441 == 441? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 1), (342, 0), (54363, 0), (63975, 0), (5426, 1), (9124, 0), (23973, 0), (44832, 1), (33044, 1), (68501, 0)]
614 == 614? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 0), (342, 1), (54363, 1), (63975, 1), (5426, 1), (9124, 1), (23973, 1), (44832, 0), (33044, 0), (68501, 1)]
696 == 696? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 0), (342, 1), (54363, 0), (63975, 0), (5426, 1), (9124, 0), (23973, 0), (44832, 1), (33044, 1), (68501, 1)]
974 == 974? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 0), (342, 0), (54363, 0), (63975, 1), (5426, 0), (9124, 1), (23973, 0), (44832, 0), (33044, 0), (68501, 1)]
184 == 184? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 1), (342, 1), (54363, 0), (63975, 0), (5426, 1), (9124, 0), (23973, 0), (44832, 1), (33044, 1), (68501, 1)]
136 == 136? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 1), (342, 1), (54363, 0), (63975, 0), (5426, 0), (9124, 1), (23973, 0), (44832, 1), (33044, 1), (68501, 1)]
632 == 632? True; [25461, 56722, 25739, 2268, 1185, 18226, 46375, 8907, 54979, 23095] [(29616, 0), (342, 1), (54363, 1), (63975, 1), (5426, 1), (9124, 0), (23973, 0), (44832, 1), (33044, 1), (68501, 1)]

Before we move on, we should note that this integer commitment scheme “blows up” the secret by quite a bit. If you have a security parameter $s$ and an integer with $n$ bits, then the commitment uses roughly $sn$ bits. A more efficient method would be to simply use a good public-key encryption scheme, and then reveal the secret key used to encrypt the message. While we implemented such schemes previously on this blog, I thought it would be more fun to do something new.

## A zero-knowledge proof for 3-coloring

First, a high-level description of the protocol. The setup: the prover has a graph $G$ with $n$ vertices $V$ and $m$ edges $E$, and also has a secret 3-coloring of the vertices $\varphi: V \to \{ 0, 1, 2 \}$. Recall, a 3-coloring is just an assignment of colors to vertices (in this case the colors are 0,1,2) so that no two adjacent vertices have the same color.

So the prover has a coloring $\varphi$ to be kept secret, but wants to prove that $G$ is 3-colorable. The idea is for the verifier to pick a random edge $(u,v)$, and have the prover reveal the colors of $u$ and $v$. However, if we run this protocol only once, there’s nothing to stop the prover from just lying and picking two distinct colors. If we allow the verifier to run the protocol many times, and the prover actually reveals the colors from their secret coloring, then after roughly $|V|$ rounds the verifier will know the entire coloring. Each step reveals more knowledge.

We can fix this with two modifications.

1. The prover first publicly commits to the coloring using a commitment scheme. Then when the verifier asks for the colors of the two vertices of a random edge, he can rest assured that the prover fixed a coloring that does not depend on the verifier’s choice of edge.
2. The prover doesn’t reveal colors from their secret coloring, but rather from a random permutation of the secret coloring. This way, when the verifier sees colors, they’re equally likely to see any two colors, and all the verifier will know is that those two colors are different.

So the scheme is: prover commits to a random permutation of the true coloring and sends it to the verifier; the verifier asks for the true colors of a given edge; the prover provides those colors and the secrets to their commitment scheme so the verifier can check.

The key point is that now the verifier has to commit to a coloring, and if the coloring isn’t a proper 3-coloring the verifier has a reasonable chance of picking an improperly colored edge (a one-in-$|E|$ chance, which is at least $1/|V|^2$). On the other hand, if the coloring is proper, then the verifier will always query a properly colored edge, and it’s zero-knowledge because the verifier is equally likely to see every pair of colors. So the verifier will always accept, but won’t know anything more than that the edge it chose is properly colored. Repeating this $|V|^2$-ish times, with high probability it’ll have queried every edge and be certain the coloring is legitimate.

Let’s implement this scheme. First the data types. As in the previous post, graphs are represented by edge lists, and a coloring is represented by a dictionary mapping a vertex to 0, 1, or 2 (the “colors”).

# a graph is a list of edges, and for simplicity we'll say
# every vertex shows up in some edge
exampleGraph = [
(1, 2),
(1, 4),
(1, 3),
(2, 5),
(2, 5),
(3, 6),
(5, 6)
]

exampleColoring = {
1: 0,
2: 1,
3: 2,
4: 1,
5: 2,
6: 0,
}

Next, the Prover class that implements that half of the protocol. We store a list of integer commitment schemes for each vertex whose color we need to commit to, and send out those commitments.

class Prover(object):
def __init__(self, graph, coloring, oneWayPermutation=ONE_WAY_PERMUTATION, hardcorePredicate=HARDCORE_PREDICATE):
self.graph = [tuple(sorted(e)) for e in graph]
self.coloring = coloring
self.vertices = list(range(1, numVertices(graph) + 1))
self.oneWayPermutation = oneWayPermutation
self.hardcorePredicate = hardcorePredicate
self.vertexToScheme = None

def commitToColoring(self):
self.vertexToScheme = {
v: commitment.BBSIntCommitmentScheme(
2, self.oneWayPermutation, self.hardcorePredicate
) for v in self.vertices
}

permutation = randomPermutation(3)
permutedColoring = {
v: permutation[self.coloring[v]] for v in self.vertices
}

return {v: s.commit(permutedColoring[v])
for (v, s) in self.vertexToScheme.items()}

def revealColors(self, u, v):
u, v = min(u, v), max(u, v)
if not (u, v) in self.graph:
raise Exception('Must query an edge!')

return (
self.vertexToScheme[u].reveal(),
self.vertexToScheme[v].reveal(),
)

In commitToColoring we randomly permute the underlying colors, and then compose that permutation with the secret coloring, committing to each resulting color independently. In revealColors we reveal only those colors for a queried edge. Note that we don’t actually need to store the permuted coloring, because it’s implicitly stored in the commitments.

It’s crucial that we reject any query that doesn’t correspond to an edge. If we don’t reject such queries then the verifier can break the protocol! In particular, by querying non-edges you can determine which pairs of nodes have the same color in the secret coloring. You can then chain these together to partition the nodes into color classes, and so color the graph. (After seeing the Verifier class below, implement this attack as an exercise).

Here’s the corresponding Verifier:

class Verifier(object):
def __init__(self, graph, oneWayPermutation, hardcorePredicate):
self.graph = [tuple(sorted(e)) for e in graph]
self.oneWayPermutation = oneWayPermutation
self.hardcorePredicate = hardcorePredicate
self.committedColoring = None
self.verifier = commitment.BBSIntCommitmentVerifier(2, oneWayPermutation, hardcorePredicate)

def chooseEdge(self, committedColoring):
self.committedColoring = committedColoring
self.chosenEdge = random.choice(self.graph)
return self.chosenEdge

def accepts(self, revealed):
revealedColors = []

for (w, bitSecrets) in zip(self.chosenEdge, revealed):
trueColor = self.verifier.decode(bitSecrets, self.committedColoring[w])
revealedColors.append(trueColor)
if not self.verifier.verify(bitSecrets, self.committedColoring[w]):
return False

return revealedColors[0] != revealedColors[1]

As expected, in the acceptance step the verifier decodes the true color of the edge it queried, and accepts if and only if the commitment was valid and the edge is properly colored.

Here’s the whole protocol, which is syntactically very similar to the one for graph isomorphism.

def runProtocol(G, coloring, securityParameter=512):
oneWayPermutation = blum_blum_shub.blum_blum_shub(securityParameter)
hardcorePredicate = blum_blum_shub.parity

prover = Prover(G, coloring, oneWayPermutation, hardcorePredicate)
verifier = Verifier(G, oneWayPermutation, hardcorePredicate)

committedColoring = prover.commitToColoring()
chosenEdge = verifier.chooseEdge(committedColoring)

revealed = prover.revealColors(*chosenEdge)
revealedColors = (
verifier.verifier.decode(revealed[0], committedColoring[chosenEdge[0]]),
verifier.verifier.decode(revealed[1], committedColoring[chosenEdge[1]]),
)
isValid = verifier.accepts(revealed)

print("{} != {} and commitment is valid? {}".format(
revealedColors[0], revealedColors[1], isValid
))

return isValid

And an example of running it

if __name__ == "__main__":
for _ in range(30):
runProtocol(exampleGraph, exampleColoring, securityParameter=10)

Here’s the output

0 != 2 and commitment is valid? True
1 != 0 and commitment is valid? True
1 != 2 and commitment is valid? True
2 != 0 and commitment is valid? True
1 != 2 and commitment is valid? True
2 != 0 and commitment is valid? True
0 != 2 and commitment is valid? True
0 != 2 and commitment is valid? True
0 != 1 and commitment is valid? True
0 != 1 and commitment is valid? True
2 != 1 and commitment is valid? True
0 != 2 and commitment is valid? True
2 != 0 and commitment is valid? True
2 != 0 and commitment is valid? True
1 != 0 and commitment is valid? True
1 != 0 and commitment is valid? True
0 != 2 and commitment is valid? True
2 != 1 and commitment is valid? True
0 != 2 and commitment is valid? True
0 != 2 and commitment is valid? True
2 != 1 and commitment is valid? True
1 != 0 and commitment is valid? True
1 != 0 and commitment is valid? True
2 != 1 and commitment is valid? True
2 != 1 and commitment is valid? True
1 != 0 and commitment is valid? True
0 != 2 and commitment is valid? True
1 != 2 and commitment is valid? True
1 != 2 and commitment is valid? True
0 != 1 and commitment is valid? True

So while we haven’t proved it rigorously, we’ve seen the zero-knowledge proof for graph 3-coloring. This automatically gives us a zero-knowledge proof for all of NP, because given any NP problem you can just convert it to the equivalent 3-coloring problem and solve that. Of course, the blowup required to convert a random NP problem to 3-coloring can be polynomially large, which makes it unsuitable for practice. But the point is that this gives us a theoretical justification for which problems have zero-knowledge proofs in principle. Now that we’ve established that you can go about trying to find the most efficient protocol for your favorite problem.

## Anticipatory notes

When we covered graph isomorphism last time, we said that a simulator could, without participating in the zero-knowledge protocol or knowing the secret isomorphism, produce a transcript that was drawn from the same distribution of messages as the protocol produced. That was all that it needed to be “zero-knowledge,” because anything the verifier could do with its protocol transcript, the simulator could do too.

We can do exactly the same thing for 3-coloring, exploiting the same “reverse order” trick where the simulator picks the random edge first, then chooses the color commitment post-hoc.

Unfortunately, both there and here I’m short-changing you, dear reader. The elephant in the room is that our naive simulator assumes the verifier is playing by the rules! If you want to define security, you have to define it against a verifier who breaks the protocol in an arbitrary way. For example, the simulator should be able to produce an equivalent transcript even if the verifier deterministically picks an edge, or tries to pick a non-edge, or tries to send gibberish. It takes a lot more work to prove security against an arbitrary verifier, but the basic setup is that the simulator can no longer make choices for the verifier, but rather has to invoke the verifier subroutine as a black box. (To compensate, the requirements on the simulator are relaxed quite a bit; more on that next time)

Because an implementation of such a scheme would involve a lot of validation, we’re going to defer the discussion to next time. We also need to be more specific about the different kinds of zero-knowledge, since we won’t be able to achieve perfect zero-knowledge with the simulator drawing from an identical distribution, but rather a computationally indistinguishable distribution.

We’ll define all this rigorously next time, and discuss the known theoretical implications and limitations. Next time will be cuffs-off theory, baby!

Until then!

# Zero Knowledge Proofs — A Primer

In this post we’ll get a strong taste for zero knowledge proofs by exploring the graph isomorphism problem in detail. In the next post, we’ll see how this relates to cryptography and the bigger picture. The goal of this post is to get a strong understanding of the terms “prover,” “verifier,” and “simulator,” and “zero knowledge” in the context of a specific zero-knowledge proof. Then next time we’ll see how the same concepts (though not the same proof) generalizes to a cryptographically interesting setting.

## Graph isomorphism

Let’s start with an extended example. We are given two graphs $G_1, G_2$, and we’d like to know whether they’re isomorphic, meaning they’re the same graph, but “drawn” different ways.

The problem of telling if two graphs are isomorphic seems hard. The pictures above, which are all different drawings of the same graph (or are they?), should give you pause if you thought it was easy.

To add a tiny bit of formalism, a graph $G$ is a list of edges, and each edge $(u,v)$ is a pair of integers between 1 and the total number of vertices of the graph, say $n$. Using this representation, an isomorphism between $G_1$ and $G_2$ is a permutation $\pi$ of the numbers $\{1, 2, \dots, n \}$ with the property that $(i,j)$ is an edge in $G_1$ if and only if $(\pi(i), \pi(j))$ is an edge of $G_2$. You swap around the labels on the vertices, and that’s how you get from one graph to another isomorphic one.

Given two arbitrary graphs as input on a large number of vertices $n$, nobody knows of an efficient—i.e., polynomial time in $n$—algorithm that can always decide whether the input graphs are isomorphic. Even if you promise me that the inputs are isomorphic, nobody knows of an algorithm that could construct an isomorphism. (If you think about it, such an algorithm could be used to solve the decision problem!)

## A game

Now let’s play a game. In this game, we’re given two enormous graphs on a billion nodes. I claim they’re isomorphic, and I want to prove it to you. However, my life’s fortune is locked behind these particular graphs (somehow), and if you actually had an isomorphism between these two graphs you could use it to steal all my money. But I still want to convince you that I do, in fact, own all of this money, because we’re about to start a business and you need to know I’m not broke.

Is there a way for me to convince you beyond a reasonable doubt that these two graphs are indeed isomorphic? And moreover, could I do so without you gaining access to my secret isomorphism? It would be even better if I could guarantee you learn nothing about my isomorphism or any isomorphism, because even the slightest chance that you can steal my money is out of the question.

Zero knowledge proofs have exactly those properties, and here’s a zero knowledge proof for graph isomorphism. For the record, $G_1$ and $G_2$ are public knowledge, (common inputs to our protocol for the sake of tracking runtime), and the protocol itself is common knowledge. However, I have an isomorphism $f: G_1 \to G_2$ that you don’t know.

Step 1: I will start by picking one of my two graphs, say $G_1$, mixing up the vertices, and sending you the resulting graph. In other words, I send you a graph $H$ which is chosen uniformly at random from all isomorphic copies of $G_1$. I will save the permutation $\pi$ that I used to generate $H$ for later use.

Step 2: You receive a graph $H$ which you save for later, and then you randomly pick an integer $t$ which is either 1 or 2, with equal probability on each. The number $t$ corresponds to your challenge for me to prove $H$ is isomorphic to $G_1$ or $G_2$. You send me back $t$, with the expectation that I will provide you with an isomorphism between $H$ and $G_t$.

Step 3: Indeed, I faithfully provide you such an isomorphism. If I you send me $t=1$, I’ll give you back $\pi^{-1} : H \to G_1$, and otherwise I’ll give you back $f \circ \pi^{-1}: H \to G_2$. Because composing a fixed permutation with a uniformly random permutation is again a uniformly random permutation, in either case I’m sending you a uniformly random permutation.

Step 4: You receive a permutation $g$, and you can use it to verify that $H$ is isomorphic to $G_t$. If the permutation I sent you doesn’t work, you’ll reject my claim, and if it does, you’ll accept my claim.

Before we analyze, here’s some Python code that implements the above scheme. You can find the full, working example in a repository on this blog’s Github page.

First, a few helper functions for generating random permutations (and turning their list-of-zero-based-indices form into a function-of-positive-integers form)

import random

def randomPermutation(n):
L = list(range(n))
random.shuffle(L)
return L

def makePermutationFunction(L):
return lambda i: L[i - 1] + 1

def makeInversePermutationFunction(L):
return lambda i: 1 + L.index(i - 1)

def applyIsomorphism(G, f):
return [(f(i), f(j)) for (i, j) in G]

Here’s a class for the Prover, the one who knows the isomorphism and wants to prove it while keeping the isomorphism secret:

class Prover(object):
def __init__(self, G1, G2, isomorphism):
'''
isomomorphism is a list of integers representing
an isomoprhism from G1 to G2.
'''
self.G1 = G1
self.G2 = G2
self.n = numVertices(G1)
assert self.n == numVertices(G2)

self.isomorphism = isomorphism
self.state = None

def sendIsomorphicCopy(self):
isomorphism = randomPermutation(self.n)
pi = makePermutationFunction(isomorphism)

H = applyIsomorphism(self.G1, pi)

self.state = isomorphism
return H

def proveIsomorphicTo(self, graphChoice):
randomIsomorphism = self.state
piInverse = makeInversePermutationFunction(randomIsomorphism)

if graphChoice == 1:
return piInverse
else:
f = makePermutationFunction(self.isomorphism)
return lambda i: f(piInverse(i))

The prover has two methods, one for each round of the protocol. The first creates an isomorphic copy of $G_1$, and the second receives the challenge and produces the requested isomorphism.

And here’s the corresponding class for the verifier

class Verifier(object):
def __init__(self, G1, G2):
self.G1 = G1
self.G2 = G2
self.n = numVertices(G1)
assert self.n == numVertices(G2)

def chooseGraph(self, H):
choice = random.choice([1, 2])
self.state = H, choice
return choice

def accepts(self, isomorphism):
'''
Return True if and only if the given isomorphism
is a valid isomorphism between the randomly
chosen graph in the first step, and the H presented
by the Prover.
'''
H, choice = self.state
graphToCheck = [self.G1, self.G2][choice - 1]
f = isomorphism

isValidIsomorphism = (graphToCheck == applyIsomorphism(H, f))
return isValidIsomorphism

Then the protocol is as follows:

def runProtocol(G1, G2, isomorphism):
p = Prover(G1, G2, isomorphism)
v = Verifier(G1, G2)

H = p.sendIsomorphicCopy()
choice = v.chooseGraph(H)
witnessIsomorphism = p.proveIsomorphicTo(choice)

return v.accepts(witnessIsomorphism)

Analysis: Let’s suppose for a moment that everyone is honestly following the rules, and that $G_1, G_2$ are truly isomorphic. Then you’ll always accept my claim, because I can always provide you with an isomorphism. Now let’s suppose that, actually I’m lying, the two graphs aren’t isomorphic, and I’m trying to fool you into thinking they are. What’s the probability that you’ll rightfully reject my claim?

Well, regardless of what I do, I’m sending you a graph $H$ and you get to make a random choice of $t = 1, 2$ that I can’t control. If $H$ is only actually isomorphic to either $G_1$ or $G_2$ but not both, then so long as you make your choice uniformly at random, half of the time I won’t be able to produce a valid isomorphism and you’ll reject. And unless you can actually tell which graph $H$ is isomorphic to—an open problem, but let’s say you can’t—then probability 1/2 is the best you can do.

Maybe the probability 1/2 is a bit unsatisfying, but remember that we can amplify this probability by repeating the protocol over and over again. So if you want to be sure I didn’t cheat and get lucky to within a probability of one-in-one-trillion, you only need to repeat the protocol 30 times. To be surer than the chance of picking a specific atom at random from all atoms in the universe, only about 400 times.

If you want to feel small, think of the number of atoms in the universe. If you want to feel big, think of its logarithm.

Here’s the code that repeats the protocol for assurance.

def convinceBeyondDoubt(G1, G2, isomorphism, errorTolerance=1e-20):
probabilityFooled = 1

while probabilityFooled &gt; errorTolerance:
result = runProtocol(G1, G2, isomorphism)
assert result
probabilityFooled *= 0.5
print(probabilityFooled)

Running it, we see it succeeds

\$ python graph-isomorphism.py
0.5
0.25
0.125
0.0625
0.03125
...
&amp;lt;SNIP&amp;gt;
...
1.3552527156068805e-20
6.776263578034403e-21

So it’s clear that this protocol is convincing.

But how can we be sure that there’s no leakage of knowledge in the protocol? What does “leakage” even mean? That’s where this topic is the most difficult to nail down rigorously, in part because there are at least three a priori different definitions! The idea we want to capture is that anything that you can efficiently compute after the protocol finishes (i.e., you have the content of the messages sent to you by the prover) you could have computed efficiently given only the two graphs $G_1, G_2$, and the claim that they are isomorphic.

Another way to say it is that you may go through the verification process and feel happy and confident that the two graphs are isomorphic. But because it’s a zero-knowledge proof, you can’t do anything with that information more than you could have done if you just took the assertion on blind faith. I’m confident there’s a joke about religion lurking here somewhere, but I’ll just trust it’s funny and move on.

In the next post we’ll expand on this “leakage” notion, but before we get there it should be clear that the graph isomorphism protocol will have the strongest possible “no-leakage” property we can come up with. Indeed, in the first round the prover sends a uniform random isomorphic copy of $G_1$ to the verifier, but the verifier can compute such an isomorphism already without the help of the prover. The verifier can’t necessarily find the isomorphism that the prover used in retrospect, because the verifier can’t solve graph isomorphism. Instead, the point is that the probability space of “$G_1$ paired with an $H$ made by the prover” and the probability space of “$G_1$ paired with $H$ as made by the verifier” are equal. No information was leaked by the prover.

For the second round, again the permutation $\pi$ used by the prover to generate $H$ is uniformly random. Since composing a fixed permutation with a uniform random permutation also results in a uniform random permutation, the second message sent by the prover is uniformly random, and so again the verifier could have constructed a similarly random permutation alone.

Let’s make this explicit with a small program. We have the honest protocol from before, but now I’m returning the set of messages sent by the prover, which the verifier can use for additional computation.

def messagesFromProtocol(G1, G2, isomorphism):
p = Prover(G1, G2, isomorphism)
v = Verifier(G1, G2)

H = p.sendIsomorphicCopy()
choice = v.chooseGraph(H)
witnessIsomorphism = p.proveIsomorphicTo(choice)

return [H, choice, witnessIsomorphism]

To say that the protocol is zero-knowledge (again, this is still colloquial) is to say that anything that the verifier could compute, given as input the return value of this function along with $G_1, G_2$ and the claim that they’re isomorphic, the verifier could also compute given only $G_1, G_2$ and the claim that $G_1, G_2$ are isomorphic.

It’s easy to prove this, and we’ll do so with a python function called simulateProtocol.

def simulateProtocol(G1, G2):
# Construct data drawn from the same distribution as what is
# returned by messagesFromProtocol
choice = random.choice([1, 2])
G = [G1, G2][choice - 1]
n = numVertices(G)

isomorphism = randomPermutation(n)
pi = makePermutationFunction(isomorphism)
H = applyIsomorphism(G, pi)

return H, choice, pi

The claim is that the distribution of outputs to messagesFromProtocol and simulateProtocol are equal. But simulateProtocol will work regardless of whether $G_1, G_2$ are isomorphic. Of course, it’s not convincing to the verifier because the simulating function made the choices in the wrong order, choosing the graph index before making $H$. But the distribution that results is the same either way.

So if you were to use the actual Prover/Verifier protocol outputs as input to another algorithm (say, one which tries to compute an isomorphism of $G_1 \to G_2$), you might as well use the output of your simulator instead. You’d have no information beyond hard-coding the assumption that $G_1, G_2$ are isomorphic into your program. Which, as I mentioned earlier, is no help at all.

In this post we covered one detailed example of a zero-knowledge proof. Next time we’ll broaden our view and see the more general power of zero-knowledge (that it captures all of NP), and see some specific cryptographic applications. Keep in mind the preceding discussion, because we’re going to re-use the terms “prover,” “verifier,” and “simulator” to mean roughly the same things as the classes Prover, Verifier and the function simulateProtocol.

Until then!

# Markov Chain Monte Carlo Without all the Bullshit

I have a little secret: I don’t like the terminology, notation, and style of writing in statistics. I find it unnecessarily complicated. This shows up when trying to read about Markov Chain Monte Carlo methods. Take, for example, the abstract to the Markov Chain Monte Carlo article in the Encyclopedia of Biostatistics.

Markov chain Monte Carlo (MCMC) is a technique for estimating by simulation the expectation of a statistic in a complex model. Successive random selections form a Markov chain, the stationary distribution of which is the target distribution. It is particularly useful for the evaluation of posterior distributions in complex Bayesian models. In the Metropolis–Hastings algorithm, items are selected from an arbitrary “proposal” distribution and are retained or not according to an acceptance rule. The Gibbs sampler is a special case in which the proposal distributions are conditional distributions of single components of a vector parameter. Various special cases and applications are considered.

I can only vaguely understand what the author is saying here (and really only because I know ahead of time what MCMC is). There are certainly references to more advanced things than what I’m going to cover in this post. But it seems very difficult to find an explanation of Markov Chain Monte Carlo without superfluous jargon. The “bullshit” here is the implicit claim of an author that such jargon is needed. Maybe it is to explain advanced applications (like attempts to do “inference in Bayesian networks”), but it is certainly not needed to define or analyze the basic ideas.

So to counter, here’s my own explanation of Markov Chain Monte Carlo, inspired by the treatment of John Hopcroft and Ravi Kannan.

## The Problem is Drawing from a Distribution

Markov Chain Monte Carlo is a technique to solve the problem of sampling from a complicated distribution. Let me explain by the following imaginary scenario. Say I have a magic box which can estimate probabilities of baby names very well. I can give it a string like “Malcolm” and it will tell me the exact probability $p_{\textup{Malcolm}}$ that you will choose this name for your next child. So there’s a distribution $D$ over all names, it’s very specific to your preferences, and for the sake of argument say this distribution is fixed and you don’t get to tamper with it.

Now comes the problem: I want to efficiently draw a name from this distribution $D$. This is the problem that Markov Chain Monte Carlo aims to solve. Why is it a problem? Because I have no idea what process you use to pick a name, so I can’t simulate that process myself. Here’s another method you could try: generate a name $x$ uniformly at random, ask the machine for $p_x$, and then flip a biased coin with probability $p_x$ and use $x$ if the coin lands heads. The problem with this is that there are exponentially many names! The variable here is the number of bits needed to write down a name $n = |x|$. So either the probabilities $p_x$ will be exponentially small and I’ll be flipping for a very long time to get a single name, or else there will only be a few names with nonzero probability and it will take me exponentially many draws to find them. Inefficiency is the death of me.

So this is a serious problem! Let’s restate it formally just to be clear.

Definition (The sampling problem):  Let $D$ be a distribution over a finite set $X$. You are given black-box access to the probability distribution function $p(x)$ which outputs the probability of drawing $x \in X$ according to $D$. Design an efficient randomized algorithm $A$ which outputs an element of $X$ so that the probability of outputting $x$ is approximately $p(x)$. More generally, output a sample of elements from $X$ drawn according to $p(x)$.

Assume that $A$ has access to only fair random coins, though this allows one to efficiently simulate flipping a biased coin of any desired probability.

Notice that with such an algorithm we’d be able to do things like estimate the expected value of some random variable $f : X \to \mathbb{R}$. We could take a large sample $S \subset X$ via the solution to the sampling problem, and then compute the average value of $f$ on that sample. This is what a Monte Carlo method does when sampling is easy. In fact, the Markov Chain solution to the sampling problem will allow us to do the sampling and the estimation of $\mathbb{E}(f)$ in one fell swoop if you want.

But the core problem is really a sampling problem, and “Markov Chain Monte Carlo” would be more accurately called the “Markov Chain Sampling Method.” So let’s see why a Markov Chain could possibly help us.

## Random Walks, the “Markov Chain” part of MCMC

Markov Chain is essentially a fancy term for a random walk on a graph.

You give me a directed graph $G = (V,E)$, and for each edge $e = (u,v) \in E$ you give me a number $p_{u,v} \in [0,1]$. In order to make a random walk make sense, the $p_{u,v}$ need to satisfy the following constraint:

For any vertex $x \in V$, the set all values $p_{x,y}$ on outgoing edges $(x,y)$ must sum to 1, i.e. form a probability distribution.

If this is satisfied then we can take a random walk on $G$ according to the probabilities as follows: start at some vertex $x_0$. Then pick an outgoing edge at random according to the probabilities on the outgoing edges, and follow it to $x_1$. Repeat if possible.

I say “if possible” because an arbitrary graph will not necessarily have any outgoing edges from a given vertex. We’ll need to impose some additional conditions on the graph in order to apply random walks to Markov Chain Monte Carlo, but in any case the idea of randomly walking is well-defined, and we call the whole object $(V,E, \{ p_e \}_{e \in E})$Markov chain.

Here is an example where the vertices in the graph correspond to emotional states.

An example Markov chain; image source http://www.mathcs.emory.edu/~cheung/

In statistics land, they take the “state” interpretation of a random walk very seriously. They call the edge probabilities “state-to-state transitions.”

The main theorem we need to do anything useful with Markov chains is the stationary distribution theorem (sometimes called the “Fundamental Theorem of Markov Chains,” and for good reason). What it says intuitively is that for a very long random walk, the probability that you end at some vertex $v$ is independent of where you started! All of these probabilities taken together is called the stationary distribution of the random walk, and it is uniquely determined by the Markov chain.

However, for the reasons we stated above (“if possible”), the stationary distribution theorem is not true of every Markov chain. The main property we need is that the graph $G$ is strongly connected. Recall that a directed graph is called connected if, when you ignore direction, there is a path from every vertex to every other vertex. It is called strongly connected if you still get paths everywhere when considering direction. If we additionally require the stupid edge-case-catcher that no edge can have zero probability, then strong connectivity (of one component of a graph) is equivalent to the following property:

For every vertex $v \in V(G)$, an infinite random walk started at $v$ will return to $v$ with probability 1.

In fact it will return infinitely often. This property is called the persistence of the state $v$ by statisticians. I dislike this term because it appears to describe a property of a vertex, when to me it describes a property of the connected component containing that vertex. In any case, since in Markov Chain Monte Carlo we’ll be picking the graph to walk on (spoiler!) we will ensure the graph is strongly connected by design.

Finally, in order to describe the stationary distribution in a more familiar manner (using linear algebra), we will write the transition probabilities as a matrix $A$ where entry $a_{j,i} = p_{(i,j)}$ if there is an edge $(i,j) \in E$ and zero otherwise. Here the rows and columns correspond to vertices of $G$, and each column $i$ forms the probability distribution of going from state $i$ to some other state in one step of the random walk. Note $A$ is the transpose of the weighted adjacency matrix of the directed weighted graph $G$ where the weights are the transition probabilities (the reason I do it this way is because matrix-vector multiplication will have the matrix on the left instead of the right; see below).

This matrix allows me to describe things nicely using the language of linear algebra. In particular if you give me a basis vector $e_i$ interpreted as “the random walk currently at vertex $i$,” then $Ae_i$ gives a vector whose $j$-th coordinate is the probability that the random walk would be at vertex $j$ after one more step in the random walk. Likewise, if you give me a probability distribution $q$ over the vertices, then $Aq$ gives a probability vector interpreted as follows:

If a random walk is in state $i$ with probability $q_i$, then the $j$-th entry of $Aq$ is the probability that after one more step in the random walk you get to vertex $j$.

Interpreted this way, the stationary distribution is a probability distribution $\pi$ such that $A \pi = \pi$, in other words $\pi$ is an eigenvector of $A$ with eigenvalue 1.

A quick side note for avid readers of this blog: this analysis of a random walk is exactly what we did back in the early days of this blog when we studied the PageRank algorithm for ranking webpages. There we called the matrix $A$ “a web matrix,” did random walks on it, and found a special eigenvalue whose eigenvector was a “stationary distribution” that we used to rank web pages (this used something called the Perron-Frobenius theorem, which says a random-walk matrix has that special eigenvector). There we described an algorithm to actually find that eigenvector by iteratively multiplying $A$. The following theorem is essentially a variant of this algorithm but works under weaker conditions; for the web matrix we added additional “fake” edges that give the needed stronger conditions.

Theorem: Let $G$ be a strongly connected graph with associated edge probabilities $\{ p_e \}_e \in E$ forming a Markov chain. For a probability vector $x_0$, define $x_{t+1} = Ax_t$ for all $t \geq 1$, and let $v_t$ be the long-term average $v_t = \frac1t \sum_{s=1}^t x_s$. Then:

1. There is a unique probability vector $\pi$ with $A \pi = \pi$.
2. For all $x_0$, the limit $\lim_{t \to \infty} v_t = \pi$.

Proof. Since $v_t$ is a probability vector we just want to show that $|Av_t - v_t| \to 0$ as $t \to \infty$. Indeed, we can expand this quantity as

\displaystyle \begin{aligned} Av_t - v_t &=\frac1t (Ax_0 + Ax_1 + \dots + Ax_{t-1}) - \frac1t (x_0 + \dots + x_{t-1}) \\ &= \frac1t (x_t - x_0) \end{aligned}

But $x_t, x_0$ are unit vectors, so their difference is at most 2, meaning $|Av_t - v_t| \leq \frac2t \to 0$. Now it’s clear that this does not depend on $v_0$. For uniqueness we will cop out and appeal to the Perron-Frobenius theorem that says any matrix of this form has a unique such (normalized) eigenvector.

$\square$

One additional remark is that, in addition to computing the stationary distribution by actually computing this average or using an eigensolver, one can analytically solve for it as the inverse of a particular matrix. Define $B = A-I_n$, where $I_n$ is the $n \times n$ identity matrix. Let $C$ be $B$ with a row of ones appended to the bottom and the topmost row removed. Then one can show (quite opaquely) that the last column of $C^{-1}$ is $\pi$. We leave this as an exercise to the reader, because I’m pretty sure nobody uses this method in practice.

One final remark is about why we need to take an average over all our $x_t$ in the theorem above. There is an extra technical condition one can add to strong connectivity, called aperiodicity, which allows one to beef up the theorem so that $x_t$ itself converges to the stationary distribution. Rigorously, aperiodicity is the property that, regardless of where you start your random walk, after some sufficiently large number of steps $n$ the random walk has a positive probability of being at every vertex at every subsequent step. As an example of a graph where aperiodicity fails: an undirected cycle on an even number of vertices. In that case there will only be a positive probability of being at certain vertices every other step, and averaging those two long term sequences gives the actual stationary distribution.

Image source: Wikipedia

One way to guarantee that your Markov chain is aperiodic is to ensure there is a positive probability of staying at any vertex. I.e., that your graph has a self-loop. This is what we’ll do in the next section.

## Constructing a graph to walk on

Recall that the problem we’re trying to solve is to draw from a distribution over a finite set $X$ with probability function $p(x)$. The MCMC method is to construct a Markov chain whose stationary distribution is exactly $p$, even when you just have black-box access to evaluating $p$. That is, you (implicitly) pick a graph $G$ and (implicitly) choose transition probabilities for the edges to make the stationary distribution $p$. Then you take a long enough random walk on $G$ and output the $x$ corresponding to whatever state you land on.

The easy part is coming up with a graph that has the right stationary distribution (in fact, “most” graphs will work). The hard part is to come up with a graph where you can prove that the convergence of a random walk to the stationary distribution is fast in comparison to the size of $X$. Such a proof is beyond the scope of this post, but the “right” choice of a graph is not hard to understand.

The one we’ll pick for this post is called the Metropolis-Hastings algorithm. The input is your black-box access to $p(x)$, and the output is a set of rules that implicitly define a random walk on a graph whose vertex set is $X$.

It works as follows: you pick some way to put $X$ on a lattice, so that each state corresponds to some vector in $\{ 0,1, \dots, n\}^d$. Then you add (two-way directed) edges to all neighboring lattice points. For $n=5, d=2$ it would look like this:

And for $d=3, n \in \{2,3\}$ it would look like this:

You have to be careful here to ensure the vertices you choose for $X$ are not disconnected, but in many applications $X$ is naturally already a lattice.

Now we have to describe the transition probabilities. Let $r$ be the maximum degree of a vertex in this lattice ($r=2d$). Suppose we’re at vertex $i$ and we want to know where to go next. We do the following:

1. Pick neighbor $j$ with probability $1/r$ (there is some chance to stay at $i$).
2. If you picked neighbor $j$ and $p(j) \geq p(i)$ then deterministically go to $j$.
3. Otherwise, $p(j) < p(i)$, and you go to $j$ with probability $p(j) / p(i)$.

We can state the probability weight $p_{i,j}$ on edge $(i,j)$ more compactly as

$\displaystyle p_{i,j} = \frac1r \min(1, p(j) / p(i)) \\ p_{i,i} = 1 - \sum_{(i,j) \in E(G); j \neq i} p_{i,j}$

It is easy to check that this is indeed a probability distribution for each vertex $i$. So we just have to show that $p(x)$ is the stationary distribution for this random walk.

Here’s a fact to do that: if a probability distribution $v$ with entries $v(x)$ for each $x \in X$ has the property that $v(x)p_{x,y} = v(y)p_{y,x}$ for all $x,y \in X$, the $v$ is the stationary distribution. To prove it, fix $x$ and take the sum of both sides of that equation over all $y$. The result is exactly the equation $v(x) = \sum_{y} v(y)p_{y,x}$, which is the same as $v = Av$. Since the stationary distribution is the unique vector satisfying this equation, $v$ has to be it.

Doing this with out chosen $p(i)$ is easy, since $p(i)p_{i,j}$ and $p(i)p_{j,i}$ are both equal to $\frac1r \min(p(i), p(j))$ by applying a tiny bit of algebra to the definition. So we’re done! One can just randomly walk according to these probabilities and get a sample.

## Last words

The last thing I want to say about MCMC is to show that you can estimate the expected value of a function $\mathbb{E}(f)$ simultaneously while random-walking through your Metropolis-Hastings graph (or any graph whose stationary distribution is $p(x)$). By definition the expected value of $f$ is $\sum_x f(x) p(x)$.

Now what we can do is compute the average value of $f(x)$ just among those states we’ve visited during our random walk. With a little bit of extra work you can show that this quantity will converge to the true expected value of $f$ at about the same time that the random walk converges to the stationary distribution. (Here the “about” means we’re off by a constant factor depending on $f$). In order to prove this you need some extra tools I’m too lazy to write about in this post, but the point is that it works.

The reason I did not start by describing MCMC in terms of estimating the expected value of a function is because the core problem is a sampling problem. Moreover, there are many applications of MCMC that need nothing more than a sample. For example, MCMC can be used to estimate the volume of an arbitrary (maybe high dimensional) convex set. See these lecture notes of Alistair Sinclair for more.

If demand is popular enough, I could implement the Metropolis-Hastings algorithm in code (it wouldn’t be industry-strength, but perhaps illuminating? I’m not so sure…).

Until next time!

# Zero-One Laws for Random Graphs

Last time we saw a number of properties of graphs, such as connectivity, where the probability that an Erdős–Rényi random graph $G(n,p)$ satisfies the property is asymptotically either zero or one. And this zero or one depends on whether the parameter $p$ is above or below a universal threshold (that depends only on $n$ and the property in question).

To remind the reader, the Erdős–Rényi random “graph” $G(n,p)$ is a distribution over graphs that you draw from by including each edge independently with probability $p$. Last time we saw that the existence of an isolated vertex has a sharp threshold at $(\log n) / n$, meaning if $p$ is asymptotically smaller than the threshold there will certainly be isolated vertices, and if $p$ is larger there will certainly be no isolated vertices. We also gave a laundry list of other properties with such thresholds.

One might want to study this phenomenon in general. Even if we might not be able to find all the thresholds we want for a given property, can we classify which properties have thresholds and which do not?

The answer turns out to be mostly yes! For large classes of properties, there are proofs that say things like, “either this property holds with probability tending to one, or it holds with probability tending to zero.” These are called “zero-one laws,” and they’re sort of meta theorems. We’ll see one such theorem in this post relating to constant edge-probabilities in random graphs, and we’ll remark on another at the end.

## Sentences about graphs in first order logic

A zero-one law generally works by defining a class of properties, and then applying a generic first/second moment-type argument to every property in the class.

So first we define what kinds of properties we’ll discuss. We’ll pick a large class: anything that can be expressed in first-order logic in the language of graphs. That is, any finite logical statement that uses existential and universal quantifiers over variables, and whose only relation (test) is whether an edge exists between two vertices. We’ll call this test $e(x,y)$. So you write some sentence $P$ in this language, and you take a graph $G$, and you can ask $P(G) = 1$, whether the graph satisfies the sentence.

This seems like a really large class of properties, and it is, but let’s think carefully about what kinds of properties can be expressed this way. Clearly the existence of a triangle can be written this way, it’s just the sentence

$\exists x,y,z : e(x,y) \wedge e(y,z) \wedge e(x,z)$

I’m using $\wedge$ for AND, and $\vee$ for OR, and $\neg$ for NOT. Similarly, one can express the existence of a clique of size $k$, or the existence of an independent set of size $k$, or a path of a fixed length, or whether there is a vertex of maximal degree $n-1$.

Here’s a question: can we write a formula which will be true for a graph if and only if it’s connected? Well such a formula seems like it would have to know about how many vertices there are in the graph, so it could say something like “for all $x,y$ there is a path from $x$ to $y$.” It seems like you’d need a family of such formulas that grows with $n$ to make anything work. But this isn’t a proof; the question remains whether there is some other tricky way to encode connectivity.

But as it turns out, connectivity is not a formula you can express in propositional logic. We won’t prove it here, but we will note at the end of the article that connectivity is in a different class of properties that you can prove has a similar zero-one law.

## The zero-one law for first order logic

So the theorem about first-order expressible sentences is as follows.

Theorem: Let $P$ be a property of graphs that can be expressed in the first order language of graphs (with the $e(x,y)$ relation). Then for any constant $p$, the probability that $P$ holds in $G(n,p)$ has a limit of zero or one as $n \to \infty$.

Proof. We’ll prove the simpler case of $p=1/2$, but the general case is analogous. Given such a graph $G$ drawn from $G(n,p)$, what we’ll do is define a countably infinite family of propositional formulas $\varphi_{k,l}$, and argue that they form a sort of “basis” for all first-order sentences about graphs.

First let’s describe the $\varphi_{k,l}$. For any $k,l \in \mathbb{N}$, the sentence will assert that for every set of $k$ vertices and every set of $l$ vertices, there is some other vertex connected to the first $k$ but not the last $l$.

$\displaystyle \varphi_{k,l} : \forall x_1, \dots, x_k, y_1, \dots, y_l \exists z : \\ e(z,x_1) \wedge \dots \wedge e(z,x_k) \wedge \neg e(z,y_1) \wedge \dots \wedge \neg e(z,y_l)$.

In other words, these formulas encapsulate every possible incidence pattern for a single vertex. It is a strange set of formulas, but they have a very nice property we’re about to get to. So for a fixed $\varphi_{k,l}$, what is the probability that it’s false on $n$ vertices? We want to give an upper bound and hence show that the formula is true with probability approaching 1. That is, we want to show that all the $\varphi_{k,l}$ are true with probability tending to 1.

Computing the probability: we have $\binom{n}{k} \binom{n-k}{l}$ possibilities to choose these sets, and the probability that some other fixed vertex $z$ has the good connections is $2^{-(k+l)}$ so the probability $z$ is not good is $1 - 2^{-(k+l)}$, and taking a product over all choices of $z$ gives the probability that there is some bad vertex $z$ with an exponent of $(n - (k + l))$. Combining all this together gives an upper bound of $\varphi_{k,l}$ being false of:

$\displaystyle \binom{n}{k}\binom{n-k}{l} (1-2^{-k-1})^{n-k-l}$

And $k, l$ are constant, so the left two terms are polynomials while the rightmost term is an exponentially small function, and this implies that the whole expression tends to zero, as desired.

Break from proof.

## A bit of model theory

So what we’ve proved so far is that the probability of every formula of the form $\varphi_{k,l}$ being satisfied in $G(n,1/2)$ tends to 1.

Now look at the set of all such formulas

$\displaystyle \Phi = \{ \varphi_{k,l} : k,l \in \mathbb{N} \}$

We ask: is there any graph which satisfies all of these formulas? Certainly it cannot be finite, because a finite graph would not be able to satisfy formulas with sufficiently large values of $l, k > n$. But indeed, there is a countably infinite graph that works. It’s called the Rado graph, pictured below.

The Rado graph has some really interesting properties, such as that it contains every finite and countably infinite graph as induced subgraphs. Basically this means, as far as countably infinite graphs go, it’s the big momma of all graphs. It’s the graph in a very concrete sense of the word. It satisfies all of the formulas in $\Phi$, and in fact it’s uniquely determined by this, meaning that if any other countably infinite graph satisfies all the formulas in $\Phi$, then that graph is isomorphic to the Rado graph.

But for our purposes (proving a zero-one law), there’s a better perspective than graph theory on this object. In the logic perspective, the set $\Phi$ is called a theory, meaning a set of statements that you consider “axioms” in some logical system. And we’re asking whether there any model realizing the theory. That is, is there some logical system with a semantic interpretation (some mathematical object based on numbers, or sets, or whatever) that satisfies all the axioms?

A good analogy comes from the rational numbers, because they satisfy a similar property among all ordered sets. In fact, the rational numbers are the unique countable, ordered set with the property that it has no biggest/smallest element and is dense. That is, in the ordering there is always another element between any two elements you want. So the theorem says if you have two countable sets with these properties, then they are actually isomorphic as ordered sets, and they are isomorphic to the rational numbers.

So, while we won’t prove that the Rado graph is a model for our theory $\Phi$, we will use that fact to great benefit. One consequence of having a theory with a model is that the theory is consistent, meaning it can’t imply any contradictions. Another fact is that this theory $\Phi$ is complete. Completeness means that any formula or it’s negation is logically implied by the theory. Note these are syntactical implications (using standard rules of propositional logic), and have nothing to do with the model interpreting the theory.

The proof that $\Phi$ is complete actually follows from the uniqueness of the Rado graph as the only countable model of $\Phi$. Suppose the contrary, that $\Phi$ is not consistent, then there has to be some formula $\psi$ that is not provable, and it’s negation is also not provable, by starting from $\Phi$. Now extend $\Phi$ in two ways: by adding $\psi$ and by adding $\neg \psi$. Both of the new theories are still countable, and by a theorem from logic this means they both still have countable models. But both of these new models are also countable models of $\Phi$, so they have to both be the Rado graph. But this is very embarrassing for them, because we assumed they disagree on the truth of $\psi$.

So now we can go ahead and prove the zero-one law theorem.

Given an arbitrary property $\varphi \not \in \Psi$. Now either $\varphi$ or it’s negation can be derived from $\Phi$. Without loss of generality suppose it’s $\varphi$. Take all the formulas from the theory you need to derive $\varphi$, and note that since it is a proof in propositional logic you will only finitely many such $\varphi_{k,l}$. Now look at the probabilities of the $\varphi_{k,l}$: they are all true with probability tending to 1, so the implied statement of the proof of $\varphi$ (i.e., $\varphi$ itself) must also hold with probability tending to 1. And we’re done!

$\square$

If you don’t like model theory, there is another “purely combinatorial” proof of the zero-one law using something called Ehrenfeucht–Fraïssé games. It is a bit longer, though.

## Other zero-one laws

One might naturally ask two questions: what if your probability is not constant, and what other kinds of properties have zero-one laws? Both great questions.

For the first, there are some extra theorems. I’ll just describe one that has always seemed very strange to me. If your probability is of the form $p = n^{-\alpha}$ but $\alpha$ is irrational, then the zero-one law still holds! This is a theorem of Baldwin-Shelah-Spencer, and it really makes you wonder why irrational numbers would be so well behaved while rational numbers are not :)

For the second question, there is another theorem about monotone properties of graphs. Monotone properties come in two flavors, so called “increasing” and “decreasing.” I’ll describe increasing monotone properties and the decreasing counterpart should be obvious. A property is called monotone increasing if adding edges can never destroy the property. That is, with an empty graph you don’t have the property (or maybe you do), and as you start adding edges eventually you suddenly get the property, but then adding more edges can’t cause you to lose the property again. Good examples of this include connectivity, or the existence of a triangle.

So the theorem is that there is an identical zero-one law for monotone properties. Great!

It’s not so often that you get to see these neat applications of logic and model theory to graph theory and (by extension) computer science. But when you do get to apply them they seem very powerful and mysterious. I think it’s a good thing.

Until next time!