A parlor trick for SET

Tai-Danae Bradley is one of the hosts of PBS Infinite Series, a delightful series of vignettes into fun parts of math. The video below is about the same of SET, a favorite among mathematicians. Specifically, Tai-Danae explains how SET cards lie in (using more technical jargon) a vector space over a finite field, and that valid sets correspond to lines. If you don’t immediately know how this would work, watch the video.

In this post I want to share a parlor trick for SET that I originally heard from Charlotte Chan. It uses the same ideas from the video above, which I’ll only review briefly.

In the game of SET you see a board of cards like the following, and players look for sets.

SetCards

Image source: theboardgamefamily.com

A valid set is a triple of cards where, feature by feature, the characteristics on the cards are either all the same or all different. A valid set above is {one empty blue oval, two solid blue ovals, three shaded blue ovals}. The feature of “fill” is different on all the cards, but the feature of “color” is the same, etc.

In a game of SET, the cards are dealt in order from a shuffled deck, players race to claim sets, removing the set if it’s valid, and three cards are dealt to replace the removed set. Eventually the deck is exhausted and the game is over, and the winner is the player who collected the most sets.

There are a handful of mathematical tricks you can use to help you search for sets faster, but the parlor trick in this post adds a fun variant to the end of the game.

Play the game of SET normally, but when you get down to the last card in the deck, don’t reveal it. Keep searching for sets until everyone agrees no visible sets are left. Then you start the variant: the first player to guess the last un-dealt card in the deck gets a bonus set.

The math comes in when you discover that you don’t need to guess, or remember anything about the game that was just played! A clever stranger could walk into the room at the end of the game and win the bonus point.

Theorem: As long as every player claimed a valid set throughout the game, the information on the remaining board uniquely determines the last (un-dealt) card.

Before we get to the proof, some reminders. Recall that there are four features on a SET card, each of which has three options. Enumerate the options for each feature (e.g., {Squiggle, Oval, Diamond} = {0, 1, 2}).

While we will not need the geometry induced by this, this implies each card is a vector in the vector space $ \mathbb{F}_3^4$, where $ \mathbb{F}_3 = \mathbb{Z}/3\mathbb{Z}$ is the finite field of three elements, and the exponent means “dimension 4.” As Tai-Danae points out in the video, each SET is an affine line in this vector space. For example, if this is the enumeration:

joyofset

Source: “The Joy of Set

Then using the enumeration, a set might be given by

$ \displaystyle \{ (1, 1, 1, 1), (1, 2, 0, 1), (1, 0, 2, 1) \}$

The crucial feature for us is that the vector-sum (using the modular field arithmetic on each entry) of the cards in a valid set is the zero vector $ (0, 0, 0, 0)$. This is because $ 1+1+1 = 0, 2+2+2 = 0,$ and $ 1+2+3=0$ are all true mod 3.

Proof of Theorem. Consider the vector-valued invariant $ S_t$ equal to the sum of the remaining cards after $ t$ sets have been taken. At the beginning of the game the deck has 81 cards that can be partitioned into valid sets. Because each valid set sums to the zero vector, $ S_0 = (0, 0, 0, 0)$. Removing a valid set via normal play does not affect the invariant, because you’re subtracting a set of vectors whose sum is zero. So $ S_t = 0$ for all $ t$.

At the end of the game, the invariant still holds even if there are no valid sets left to claim. Let $ x$ be the vector corresponding to the last un-dealt card, and $ c_1, \dots, c_n$ be the remaining visible cards. Then $ x + \sum_{i=1}^n c_i = (0,0,0,0)$, meaning $ x = -\sum_{i=1}^n c_i$.

$ \square$

I would provide an example, but I want to encourage everyone to play a game of SET and try it out live!

Charlotte, who originally showed me this trick, was quick enough to compute this sum in her head. So were the other math students we played SET with. It’s a bit easier than it seems since you can do the sum feature by feature. Even though I’ve known about this trick for years, I still require a piece of paper and a few minutes.

Because this is Math Intersect Programming, the reader is encouraged to implement this scheme as an exercise, and simulate a game of SET by removing randomly chosen valid sets to verify experimentally that this scheme works.

Until next time!

Earthmover Distance

Problem: Compute distance between points with uncertain locations (given by samples, or differing observations, or clusters).

For example, if I have the following three “points” in the plane, as indicated by their colors, which is closer, blue to green, or blue to red?

example-points.png

It’s not obvious, and there are multiple factors at work: the red points have fewer samples, but we can be more certain about the position; the blue points are less certain, but the closest non-blue point to a blue point is green; and the green points are equally plausibly “close to red” and “close to blue.” The centers of masses of the three sample sets are close to an equilateral triangle. In our example the “points” don’t overlap, but of course they could. And in particular, there should probably be a nonzero distance between two points whose sample sets have the same center of mass, as below. The distance quantifies the uncertainty.

same-centers.png

All this is to say that it’s not obvious how to define a distance measure that is consistent with perceptual ideas of what geometry and distance should be.

Solution (Earthmover distance): Treat each sample set $ A$ corresponding to a “point” as a discrete probability distribution, so that each sample $ x \in A$ has probability mass $ p_x = 1 / |A|$. The distance between $ A$ and $ B$ is the optional solution to the following linear program.

Each $ x \in A$ corresponds to a pile of dirt of height $ p_x$, and each $ y \in B$ corresponds to a hole of depth $ p_y$. The cost of moving a unit of dirt from $ x$ to $ y$ is the Euclidean distance $ d(x, y)$ between the points (or whatever hipster metric you want to use).

Let $ z_{x, y}$ be a real variable corresponding to an amount of dirt to move from $ x \in A$ to $ y \in B$, with cost $ d(x, y)$. Then the constraints are:

  • Each $ z_{x, y} \geq 0$, so dirt only moves from $ x$ to $ y$.
  • Every pile $ x \in A$ must vanish, i.e. for each fixed $ x \in A$, $ \sum_{y \in B} z_{x,y} = p_x$.
  • Likewise, every hole $ y \in B$ must be completely filled, i.e. $ \sum_{y \in B} z_{x,y} = p_y$.

The objective is to minimize the cost of doing this: $ \sum_{x, y \in A \times B} d(x, y) z_{x, y}$.

In python, using the ortools library (and leaving out a few docstrings and standard import statements, full code on Github):

from ortools.linear_solver import pywraplp

def earthmover_distance(p1, p2):
    dist1 = {x: count / len(p1) for (x, count) in Counter(p1).items()}
    dist2 = {x: count / len(p2) for (x, count) in Counter(p2).items()}
    solver = pywraplp.Solver('earthmover_distance', pywraplp.Solver.GLOP_LINEAR_PROGRAMMING)

    variables = dict()

    # for each pile in dist1, the constraint that says all the dirt must leave this pile
    dirt_leaving_constraints = defaultdict(lambda: 0)

    # for each hole in dist2, the constraint that says this hole must be filled
    dirt_filling_constraints = defaultdict(lambda: 0)

    # the objective
    objective = solver.Objective()
    objective.SetMinimization()

    for (x, dirt_at_x) in dist1.items():
        for (y, capacity_of_y) in dist2.items():
            amount_to_move_x_y = solver.NumVar(0, solver.infinity(), 'z_{%s, %s}' % (x, y))
            variables[(x, y)] = amount_to_move_x_y
            dirt_leaving_constraints[x] += amount_to_move_x_y
            dirt_filling_constraints[y] += amount_to_move_x_y
            objective.SetCoefficient(amount_to_move_x_y, euclidean_distance(x, y))

    for x, linear_combination in dirt_leaving_constraints.items():
        solver.Add(linear_combination == dist1[x])

    for y, linear_combination in dirt_filling_constraints.items():
        solver.Add(linear_combination == dist2[y])

    status = solver.Solve()
    if status not in [solver.OPTIMAL, solver.FEASIBLE]:
        raise Exception('Unable to find feasible solution')

    return objective.Value()

Discussion: I’ve heard about this metric many times as a way to compare probability distributions. For example, it shows up in an influential paper about fairness in machine learning, and a few other CS theory papers related to distribution testing.

One might ask: why not use other measures of dissimilarity for probability distributions (Chi-squared statistic, Kullback-Leibler divergence, etc.)? One answer is that these other measures only give useful information for pairs of distributions with the same support. An example from a talk of Justin Solomon succinctly clarifies what Earthmover distance achieves

Screen Shot 2018-03-03 at 6.11.00 PM.png

Also, why not just model the samples using, say, a normal distribution, and then compute the distance based on the parameters of the distributions? That is possible, and in fact makes for a potentially more efficient technique, but you lose some information by doing this. Ignoring that your data might not be approximately normal (it might have some curvature), with Earthmover distance, you get point-by-point details about how each data point affects the outcome.

This kind of attention to detail can be very important in certain situations. One that I’ve been paying close attention to recently is the problem of studying gerrymandering from a mathematical perspective. Justin Solomon of MIT is a champion of the Earthmover distance (see his fascinating talk here for more, with slides) which is just one topic in a field called “optimal transport.”

This has the potential to be useful in redistricting because of the nature of the redistricting problem. As I wrote previously, discussions of redistricting are chock-full of geometry—or at least geometric-sounding language—and people are very concerned with the apparent “compactness” of a districting plan. But the underlying data used to perform redistricting isn’t very accurate. The people who build the maps don’t have precise data on voting habits, or even locations where people live. Census tracts might not be perfectly aligned, and data can just plain have errors and uncertainty in other respects. So the data that district-map-drawers care about is uncertain much like our point clouds. With a theory of geometry that accounts for uncertainty (and the Earthmover distance is the “distance” part of that), one can come up with more robust, better tools for redistricting.

Solomon’s website has a ton of resources about this, under the names of “optimal transport” and “Wasserstein metric,” and his work extends from computing distances to computing important geometric values like the barycenter, computational advantages like parallelism.

Others in the field have come up with transparency techniques to make it clearer how the Earthmover distance relates to the geometry of the underlying space. This one is particularly fun because the explanations result in a path traveled from the start to the finish, and by setting up the underlying metric in just such a way, you can watch the distribution navigate a maze to get to its target. I like to imagine tiny ants carrying all that dirt.

Screen Shot 2018-03-03 at 6.15.50 PM.png

Finally, work of Shirdhonkar and Jacobs provide approximation algorithms that allow linear-time computation, instead of the worst-case-cubic runtime of a linear solver.

A Spectral Analysis of Moore Graphs

For fixed integers $ r > 0$, and odd $ g$, a Moore graph is an $ r$-regular graph of girth $ g$ which has the minimum number of vertices $ n$ among all such graphs with the same regularity and girth.

(Recall, A the girth of a graph is the length of its shortest cycle, and it’s regular if all its vertices have the same degree)

Problem (Hoffman-Singleton): Find a useful constraint on the relationship between $ n$ and $ r$ for Moore graphs of girth $ 5$ and degree $ r$.

Note: Excluding trivial Moore graphs with girth $ g=3$ and degree $ r=2$, there are only two known Moore graphs: (a) the Petersen graph and (b) this crazy graph:

hoffman_singleton_graph_circle2

The solution to the problem shows that there are only a few cases left to check.

Solution: It is easy to show that the minimum number of vertices of a Moore graph of girth $ 5$ and degree $ r$ is $ 1 + r + r(r-1) = r^2 + 1$. Just consider the tree:

500px-petersen-as-moore-svg

This is the tree example for $ r = 3$, but the argument should be clear for any $ r$ from the branching pattern of the tree: $ 1 + r + r(r-1)$

Provided $ n = r^2 + 1$, we will prove that $ r$ must be either $ 3, 7,$ or $ 57$. The technique will be to analyze the eigenvalues of a special matrix derived from the Moore graph.

Let $ A$ be the adjacency matrix of the supposed Moore graph with these properties. Let $ B = A^2 = (b_{i,j})$. Using the girth and regularity we know:

  • $ b_{i,i} = r$ since each vertex has degree $ r$.
  • $ b_{i,j} = 0$ if $ (i,j)$ is an edge of $ G$, since any walk of length 2 from $ i$ to $ j$ would be able to use such an edge and create a cycle of length 3 which is less than the girth.
  • $ b_{i,j} = 1$ if $ (i,j)$ is not an edge, because (using the tree idea above), every two vertices non-adjacent vertices have a unique neighbor in common.

Let $ J_n$ be the $ n \times n$ matrix of all 1’s and $ I_n$ the identity matrix. Then

$ \displaystyle B = rI_n + J_n – I_n – A.$

We use this matrix equation to generate two equations whose solutions will restrict $ r$. Since $ A$ is a real symmetric matrix is has an orthonormal basis of eigenvectors $ v_1, \dots, v_n$ with eigenvalues $ \lambda_1 , \dots, \lambda_n$. Moreover, by regularity we know one of these vectors is the all 1’s vector, with eigenvalue $ r$. Call this $ v_1 = (1, \dots, 1), \lambda_1 = r$. By orthogonality of $ v_1$ with the other $ v_i$, we know that $ J_nv_i = 0$. We also know that, since $ A$ is an adjacency matrix with zeros on the diagonal, the trace of $ A$ is $ \sum_i \lambda_i = 0$.

Multiply the matrices in the equation above by any $ v_i$, $ i > 1$ to get

$ \displaystyle \begin{aligned}A^2v_i &= rv_i – v_i – Av_i \\ \lambda_i^2v_i &= rv_i – v_i – \lambda_i v_i \end{aligned}$

Rearranging and factoring out $ v_i$ gives $ \lambda_i^2 – \lambda_i – (r+1) = 0$. Let $ z = 4r – 3$, then the non-$ r$ eigenvalues must be one of the two roots: $ \mu_1 = (-1 + \sqrt{z}) / 2$ or $ \mu_2 = (-1 – \sqrt{z})/2$.

Say that $ \mu_1$ occurs $ a$ times and $ \mu_2$ occurs $ b$ times, then $ n = a + b + 1$. So we have the following equations.

$ \displaystyle \begin{aligned} a + b + 1 &= n \\ r + a \mu_1 + b\mu_2 &= 0 \end{aligned}$

From this equation you can easily derive that $ \sqrt{z}$ is an integer, and as a consequence $ r = (m^2 + 3) / 4$ for some integer $ m$. With a tiny bit of extra algebra, this gives

$ \displaystyle m(m^3 – 2m – 16(a-b)) = 15$

Implying that $ m$ divides $ 15$, meaning $ m \in \{ 1, 3, 5, 15\}$, and as a consequence $ r \in \{ 1, 3, 7, 57\}$.

$ \square$

Discussion: This is a strikingly clever use of spectral graph theory to answer a question about combinatorics. Spectral graph theory is precisely that, the study of what linear algebra can tell us about graphs. For an deeper dive into spectral graph theory, see the guest post I wrote on With High Probability.

If you allow for even girth, there are a few extra (infinite families of) Moore graphs, see Wikipedia for a list.

With additional techniques, one can also disprove the existence of any Moore graphs that are not among the known ones, with the exception of a possible Moore graph of girth $ 5$ and degree $ 57$ on $ n = 3250$ vertices. It is unknown whether such a graph exists, but if it does, it is known that

You should go out and find it or prove it doesn’t exist.

Hungry for more applications of linear algebra to combinatorics and computer science? The book Thirty-Three Miniatures is a fantastically entertaining book of linear algebra gems (it’s where I found the proof in this post). The exposition is lucid, and the chapters are short enough to read on my daily train commute.

Zero Knowledge Proofs — A Primer

In this post we’ll get a strong taste for zero knowledge proofs by exploring the graph isomorphism problem in detail. In the next post, we’ll see how this relates to cryptography and the bigger picture. The goal of this post is to get a strong understanding of the terms “prover,” “verifier,” and “simulator,” and “zero knowledge” in the context of a specific zero-knowledge proof. Then next time we’ll see how the same concepts (though not the same proof) generalizes to a cryptographically interesting setting.

Graph isomorphism

Let’s start with an extended example. We are given two graphs $ G_1, G_2$, and we’d like to know whether they’re isomorphic, meaning they’re the same graph, but “drawn” different ways.

The problem of telling if two graphs are isomorphic seems hard. The pictures above, which are all different drawings of the same graph (or are they?), should give you pause if you thought it was easy.

To add a tiny bit of formalism, a graph $ G$ is a list of edges, and each edge $ (u,v)$ is a pair of integers between 1 and the total number of vertices of the graph, say $ n$. Using this representation, an isomorphism between $ G_1$ and $ G_2$ is a permutation $ \pi$ of the numbers $ \{1, 2, \dots, n \}$ with the property that $ (i,j)$ is an edge in $ G_1$ if and only if $ (\pi(i), \pi(j))$ is an edge of $ G_2$. You swap around the labels on the vertices, and that’s how you get from one graph to another isomorphic one.

Given two arbitrary graphs as input on a large number of vertices $ n$, nobody knows of an efficient—i.e., polynomial time in $ n$—algorithm that can always decide whether the input graphs are isomorphic. Even if you promise me that the inputs are isomorphic, nobody knows of an algorithm that could construct an isomorphism. (If you think about it, such an algorithm could be used to solve the decision problem!)

A game

Now let’s play a game. In this game, we’re given two enormous graphs on a billion nodes. I claim they’re isomorphic, and I want to prove it to you. However, my life’s fortune is locked behind these particular graphs (somehow), and if you actually had an isomorphism between these two graphs you could use it to steal all my money. But I still want to convince you that I do, in fact, own all of this money, because we’re about to start a business and you need to know I’m not broke.

Is there a way for me to convince you beyond a reasonable doubt that these two graphs are indeed isomorphic? And moreover, could I do so without you gaining access to my secret isomorphism? It would be even better if I could guarantee you learn nothing about my isomorphism or any isomorphism, because even the slightest chance that you can steal my money is out of the question.

Zero knowledge proofs have exactly those properties, and here’s a zero knowledge proof for graph isomorphism. For the record, $ G_1$ and $ G_2$ are public knowledge, (common inputs to our protocol for the sake of tracking runtime), and the protocol itself is common knowledge. However, I have an isomorphism $ f: G_1 \to G_2$ that you don’t know.

Step 1: I will start by picking one of my two graphs, say $ G_1$, mixing up the vertices, and sending you the resulting graph. In other words, I send you a graph $ H$ which is chosen uniformly at random from all isomorphic copies of $ G_1$. I will save the permutation $ \pi$ that I used to generate $ H$ for later use.

Step 2: You receive a graph $ H$ which you save for later, and then you randomly pick an integer $ t$ which is either 1 or 2, with equal probability on each. The number $ t$ corresponds to your challenge for me to prove $ H$ is isomorphic to $ G_1$ or $ G_2$. You send me back $ t$, with the expectation that I will provide you with an isomorphism between $ H$ and $ G_t$.

Step 3: Indeed, I faithfully provide you such an isomorphism. If I you send me $ t=1$, I’ll give you back $ \pi^{-1} : H \to G_1$, and otherwise I’ll give you back $ f \circ \pi^{-1}: H \to G_2$. Because composing a fixed permutation with a uniformly random permutation is again a uniformly random permutation, in either case I’m sending you a uniformly random permutation.

Step 4: You receive a permutation $ g$, and you can use it to verify that $ H$ is isomorphic to $ G_t$. If the permutation I sent you doesn’t work, you’ll reject my claim, and if it does, you’ll accept my claim.

Before we analyze, here’s some Python code that implements the above scheme. You can find the full, working example in a repository on this blog’s Github page.

First, a few helper functions for generating random permutations (and turning their list-of-zero-based-indices form into a function-of-positive-integers form)

import random

def randomPermutation(n):
    L = list(range(n))
    random.shuffle(L)
    return L

def makePermutationFunction(L):
    return lambda i: L[i - 1] + 1

def makeInversePermutationFunction(L):
    return lambda i: 1 + L.index(i - 1)

def applyIsomorphism(G, f):
    return [(f(i), f(j)) for (i, j) in G]

Here’s a class for the Prover, the one who knows the isomorphism and wants to prove it while keeping the isomorphism secret:

class Prover(object):
    def __init__(self, G1, G2, isomorphism):
        '''
            isomomorphism is a list of integers representing
            an isomoprhism from G1 to G2.
        '''
        self.G1 = G1
        self.G2 = G2
        self.n = numVertices(G1)
        assert self.n == numVertices(G2)

        self.isomorphism = isomorphism
        self.state = None

    def sendIsomorphicCopy(self):
        isomorphism = randomPermutation(self.n)
        pi = makePermutationFunction(isomorphism)

        H = applyIsomorphism(self.G1, pi)

        self.state = isomorphism
        return H

    def proveIsomorphicTo(self, graphChoice):
        randomIsomorphism = self.state
        piInverse = makeInversePermutationFunction(randomIsomorphism)

        if graphChoice == 1:
            return piInverse
        else:
            f = makePermutationFunction(self.isomorphism)
            return lambda i: f(piInverse(i))

The prover has two methods, one for each round of the protocol. The first creates an isomorphic copy of $ G_1$, and the second receives the challenge and produces the requested isomorphism.

And here’s the corresponding class for the verifier

class Verifier(object):
    def __init__(self, G1, G2):
        self.G1 = G1
        self.G2 = G2
        self.n = numVertices(G1)
        assert self.n == numVertices(G2)

    def chooseGraph(self, H):
        choice = random.choice([1, 2])
        self.state = H, choice
        return choice

    def accepts(self, isomorphism):
        '''
            Return True if and only if the given isomorphism
            is a valid isomorphism between the randomly
            chosen graph in the first step, and the H presented
            by the Prover.
        '''
        H, choice = self.state
        graphToCheck = [self.G1, self.G2][choice - 1]
        f = isomorphism

        isValidIsomorphism = (graphToCheck == applyIsomorphism(H, f))
        return isValidIsomorphism

Then the protocol is as follows:

def runProtocol(G1, G2, isomorphism):
    p = Prover(G1, G2, isomorphism)
    v = Verifier(G1, G2)

    H = p.sendIsomorphicCopy()
    choice = v.chooseGraph(H)
    witnessIsomorphism = p.proveIsomorphicTo(choice)

    return v.accepts(witnessIsomorphism)

Analysis: Let’s suppose for a moment that everyone is honestly following the rules, and that $ G_1, G_2$ are truly isomorphic. Then you’ll always accept my claim, because I can always provide you with an isomorphism. Now let’s suppose that, actually I’m lying, the two graphs aren’t isomorphic, and I’m trying to fool you into thinking they are. What’s the probability that you’ll rightfully reject my claim?

Well, regardless of what I do, I’m sending you a graph $ H$ and you get to make a random choice of $ t = 1, 2$ that I can’t control. If $ H$ is only actually isomorphic to either $ G_1$ or $ G_2$ but not both, then so long as you make your choice uniformly at random, half of the time I won’t be able to produce a valid isomorphism and you’ll reject. And unless you can actually tell which graph $ H$ is isomorphic to—an open problem, but let’s say you can’t—then probability 1/2 is the best you can do.

Maybe the probability 1/2 is a bit unsatisfying, but remember that we can amplify this probability by repeating the protocol over and over again. So if you want to be sure I didn’t cheat and get lucky to within a probability of one-in-one-trillion, you only need to repeat the protocol 30 times. To be surer than the chance of picking a specific atom at random from all atoms in the universe, only about 400 times.

If you want to feel small, think of the number of atoms in the universe. If you want to feel big, think of its logarithm.

Here’s the code that repeats the protocol for assurance.

def convinceBeyondDoubt(G1, G2, isomorphism, errorTolerance=1e-20):
    probabilityFooled = 1

    while probabilityFooled > errorTolerance:
        result = runProtocol(G1, G2, isomorphism)
        assert result
        probabilityFooled *= 0.5
        print(probabilityFooled)

Running it, we see it succeeds

$ python graph-isomorphism.py
0.5
0.25
0.125
0.0625
0.03125
 ...
<SNIP>
 ...
1.3552527156068805e-20
6.776263578034403e-21

So it’s clear that this protocol is convincing.

But how can we be sure that there’s no leakage of knowledge in the protocol? What does “leakage” even mean? That’s where this topic is the most difficult to nail down rigorously, in part because there are at least three a priori different definitions! The idea we want to capture is that anything that you can efficiently compute after the protocol finishes (i.e., you have the content of the messages sent to you by the prover) you could have computed efficiently given only the two graphs $ G_1, G_2$, and the claim that they are isomorphic.

Another way to say it is that you may go through the verification process and feel happy and confident that the two graphs are isomorphic. But because it’s a zero-knowledge proof, you can’t do anything with that information more than you could have done if you just took the assertion on blind faith. I’m confident there’s a joke about religion lurking here somewhere, but I’ll just trust it’s funny and move on.

In the next post we’ll expand on this “leakage” notion, but before we get there it should be clear that the graph isomorphism protocol will have the strongest possible “no-leakage” property we can come up with. Indeed, in the first round the prover sends a uniform random isomorphic copy of $ G_1$ to the verifier, but the verifier can compute such an isomorphism already without the help of the prover. The verifier can’t necessarily find the isomorphism that the prover used in retrospect, because the verifier can’t solve graph isomorphism. Instead, the point is that the probability space of “$ G_1$ paired with an $ H$ made by the prover” and the probability space of “$ G_1$ paired with $ H$ as made by the verifier” are equal. No information was leaked by the prover.

For the second round, again the permutation $ \pi$ used by the prover to generate $ H$ is uniformly random. Since composing a fixed permutation with a uniform random permutation also results in a uniform random permutation, the second message sent by the prover is uniformly random, and so again the verifier could have constructed a similarly random permutation alone.

Let’s make this explicit with a small program. We have the honest protocol from before, but now I’m returning the set of messages sent by the prover, which the verifier can use for additional computation.

def messagesFromProtocol(G1, G2, isomorphism):
    p = Prover(G1, G2, isomorphism)
    v = Verifier(G1, G2)

    H = p.sendIsomorphicCopy()
    choice = v.chooseGraph(H)
    witnessIsomorphism = p.proveIsomorphicTo(choice)

    return [H, choice, witnessIsomorphism]

To say that the protocol is zero-knowledge (again, this is still colloquial) is to say that anything that the verifier could compute, given as input the return value of this function along with $ G_1, G_2$ and the claim that they’re isomorphic, the verifier could also compute given only $ G_1, G_2$ and the claim that $ G_1, G_2$ are isomorphic.

It’s easy to prove this, and we’ll do so with a python function called simulateProtocol.

def simulateProtocol(G1, G2):
    # Construct data drawn from the same distribution as what is
    # returned by messagesFromProtocol
    choice = random.choice([1, 2])
    G = [G1, G2][choice - 1]
    n = numVertices(G)

    isomorphism = randomPermutation(n)
    pi = makePermutationFunction(isomorphism)
    H = applyIsomorphism(G, pi)

    return H, choice, pi

The claim is that the distribution of outputs to messagesFromProtocol and simulateProtocol are equal. But simulateProtocol will work regardless of whether $ G_1, G_2$ are isomorphic. Of course, it’s not convincing to the verifier because the simulating function made the choices in the wrong order, choosing the graph index before making $ H$. But the distribution that results is the same either way.

So if you were to use the actual Prover/Verifier protocol outputs as input to another algorithm (say, one which tries to compute an isomorphism of $ G_1 \to G_2$), you might as well use the output of your simulator instead. You’d have no information beyond hard-coding the assumption that $ G_1, G_2$ are isomorphic into your program. Which, as I mentioned earlier, is no help at all.

In this post we covered one detailed example of a zero-knowledge proof. Next time we’ll broaden our view and see the more general power of zero-knowledge (that it captures all of NP), and see some specific cryptographic applications. Keep in mind the preceding discussion, because we’re going to re-use the terms “prover,” “verifier,” and “simulator” to mean roughly the same things as the classes Prover, Verifier and the function simulateProtocol.

Until then!