A Spectral Analysis of Moore Graphs

For fixed integers r > 0, and odd g, a Moore graph is an r-regular graph of girth g which has the minimum number of vertices n among all such graphs with the same regularity and girth.

(Recall, A the girth of a graph is the length of its shortest cycle, and it’s regular if all its vertices have the same degree)

Problem (Hoffman-Singleton): Find a useful constraint on the relationship between n and r for Moore graphs of girth 5 and degree r.

Note: Excluding trivial Moore graphs with girth g=3 and degree r=2, there are only two known Moore graphs: (a) the Petersen graph and (b) this crazy graph:

hoffman_singleton_graph_circle2

The solution to the problem shows that there are only a few cases left to check.

Solution: It is easy to show that the minimum number of vertices of a Moore graph of girth 5 and degree r is 1 + r + r(r-1) = r^2 + 1. Just consider the tree:

500px-petersen-as-moore-svg

This is the tree example for r = 3, but the argument should be clear for any r from the branching pattern of the tree: 1 + r + r(r-1)

Provided n = r^2 + 1, we will prove that r must be either 3, 7, or 57. The technique will be to analyze the eigenvalues of a special matrix derived from the Moore graph.

Let A be the adjacency matrix of the supposed Moore graph with these properties. Let B = A^2 = (b_{i,j}). Using the girth and regularity we know:

  • b_{i,i} = r since each vertex has degree r.
  • b_{i,j} = 0 if (i,j) is an edge of G, since any walk of length 2 from i to j would be able to use such an edge and create a cycle of length 3 which is less than the girth.
  • b_{i,j} = 1 if (i,j) is not an edge, because (using the tree idea above), every two vertices non-adjacent vertices have a unique neighbor in common.

Let J_n be the n \times n matrix of all 1’s and I_n the identity matrix. Then

\displaystyle B = rI_n + J_n - I_n - A.

We use this matrix equation to generate two equations whose solutions will restrict r. Since A is a real symmetric matrix is has an orthonormal basis of eigenvectors v_1, \dots, v_n with eigenvalues \lambda_1 , \dots, \lambda_n. Moreover, by regularity we know one of these vectors is the all 1’s vector, with eigenvalue r. Call this v_1 = (1, \dots, 1), \lambda_1 = r. By orthogonality of v_1 with the other v_i, we know that J_nv_i = 0. We also know that, since A is an adjacency matrix with zeros on the diagonal, the trace of A is \sum_i \lambda_i = 0.

Multiply the matrices in the equation above by any v_i, i > 1 to get

\displaystyle \begin{aligned}A^2v_i &= rv_i - v_i - Av_i \\ \lambda_i^2v_i &= rv_i - v_i - \lambda_i v_i \end{aligned}

Rearranging and factoring out v_i gives \lambda_i^2 - \lambda_i - (r+1) = 0. Let z = 4r - 3, then the non-r eigenvalues must be one of the two roots: \mu_1 = (-1 + \sqrt{z}) / 2 or \mu_2 = (-1 - \sqrt{z})/2.

Say that \mu_1 occurs a times and \mu_2 occurs b times, then n = a + b + 1. So we have the following equations.

\displaystyle \begin{aligned} a + b + 1 &= n \\ r + a \mu_1 + b\mu_2 &= 0 \end{aligned}

From this equation you can easily derive that \sqrt{z} is an integer, and as a consequence r = (m^2 + 3) / 4 for some integer m. With a tiny bit of extra algebra, this gives

\displaystyle m(m^3 - 2m - 16(a-b)) = 15

Implying that m divides 15, meaning m \in \{ 1, 3, 5, 15\}, and as a consequence r \in \{ 1, 3, 7, 57\}.

\square

Discussion: This is a strikingly clever use of spectral graph theory to answer a question about combinatorics. Spectral graph theory is precisely that, the study of what linear algebra can tell us about graphs. For an deeper dive into spectral graph theory, see the guest post I wrote on With High Probability.

If you allow for even girth, there are a few extra (infinite families of) Moore graphs, see Wikipedia for a list.

With additional techniques, one can also disprove the existence of any Moore graphs that are not among the known ones, with the exception of a possible Moore graph of girth 5 and degree 57 on n = 3250 vertices. It is unknown whether such a graph exists, but if it does, it is known that

You should go out and find it or prove it doesn’t exist.

Hungry for more applications of linear algebra to combinatorics and computer science? The book Thirty-Three Miniatures is a fantastically entertaining book of linear algebra gems (it’s where I found the proof in this post). The exposition is lucid, and the chapters are short enough to read on my daily train commute.

Advertisements

Zero Knowledge Proofs — A Primer

In this post we’ll get a strong taste for zero knowledge proofs by exploring the graph isomorphism problem in detail. In the next post, we’ll see how this relates to cryptography and the bigger picture. The goal of this post is to get a strong understanding of the terms “prover,” “verifier,” and “simulator,” and “zero knowledge” in the context of a specific zero-knowledge proof. Then next time we’ll see how the same concepts (though not the same proof) generalizes to a cryptographically interesting setting.

Graph isomorphism

Let’s start with an extended example. We are given two graphs G_1, G_2, and we’d like to know whether they’re isomorphic, meaning they’re the same graph, but “drawn” different ways.

The problem of telling if two graphs are isomorphic seems hard. The pictures above, which are all different drawings of the same graph (or are they?), should give you pause if you thought it was easy.

To add a tiny bit of formalism, a graph G is a list of edges, and each edge (u,v) is a pair of integers between 1 and the total number of vertices of the graph, say n. Using this representation, an isomorphism between G_1 and G_2 is a permutation \pi of the numbers \{1, 2, \dots, n \} with the property that (i,j) is an edge in G_1 if and only if (\pi(i), \pi(j)) is an edge of G_2. You swap around the labels on the vertices, and that’s how you get from one graph to another isomorphic one.

Given two arbitrary graphs as input on a large number of vertices n, nobody knows of an efficient—i.e., polynomial time in n—algorithm that can always decide whether the input graphs are isomorphic. Even if you promise me that the inputs are isomorphic, nobody knows of an algorithm that could construct an isomorphism. (If you think about it, such an algorithm could be used to solve the decision problem!)

A game

Now let’s play a game. In this game, we’re given two enormous graphs on a billion nodes. I claim they’re isomorphic, and I want to prove it to you. However, my life’s fortune is locked behind these particular graphs (somehow), and if you actually had an isomorphism between these two graphs you could use it to steal all my money. But I still want to convince you that I do, in fact, own all of this money, because we’re about to start a business and you need to know I’m not broke.

Is there a way for me to convince you beyond a reasonable doubt that these two graphs are indeed isomorphic? And moreover, could I do so without you gaining access to my secret isomorphism? It would be even better if I could guarantee you learn nothing about my isomorphism or any isomorphism, because even the slightest chance that you can steal my money is out of the question.

Zero knowledge proofs have exactly those properties, and here’s a zero knowledge proof for graph isomorphism. For the record, G_1 and G_2 are public knowledge, (common inputs to our protocol for the sake of tracking runtime), and the protocol itself is common knowledge. However, I have an isomorphism f: G_1 \to G_2 that you don’t know.

Step 1: I will start by picking one of my two graphs, say G_1, mixing up the vertices, and sending you the resulting graph. In other words, I send you a graph H which is chosen uniformly at random from all isomorphic copies of G_1. I will save the permutation \pi that I used to generate H for later use.

Step 2: You receive a graph H which you save for later, and then you randomly pick an integer t which is either 1 or 2, with equal probability on each. The number t corresponds to your challenge for me to prove H is isomorphic to G_1 or G_2. You send me back t, with the expectation that I will provide you with an isomorphism between H and G_t.

Step 3: Indeed, I faithfully provide you such an isomorphism. If I you send me t=1, I’ll give you back \pi^{-1} : H \to G_1, and otherwise I’ll give you back f \circ \pi^{-1}: H \to G_2. Because composing a fixed permutation with a uniformly random permutation is again a uniformly random permutation, in either case I’m sending you a uniformly random permutation.

Step 4: You receive a permutation g, and you can use it to verify that H is isomorphic to G_t. If the permutation I sent you doesn’t work, you’ll reject my claim, and if it does, you’ll accept my claim.

Before we analyze, here’s some Python code that implements the above scheme. You can find the full, working example in a repository on this blog’s Github page.

First, a few helper functions for generating random permutations (and turning their list-of-zero-based-indices form into a function-of-positive-integers form)

import random

def randomPermutation(n):
    L = list(range(n))
    random.shuffle(L)
    return L

def makePermutationFunction(L):
    return lambda i: L[i - 1] + 1

def makeInversePermutationFunction(L):
    return lambda i: 1 + L.index(i - 1)

def applyIsomorphism(G, f):
    return [(f(i), f(j)) for (i, j) in G]

Here’s a class for the Prover, the one who knows the isomorphism and wants to prove it while keeping the isomorphism secret:

class Prover(object):
    def __init__(self, G1, G2, isomorphism):
        '''
            isomomorphism is a list of integers representing
            an isomoprhism from G1 to G2.
        '''
        self.G1 = G1
        self.G2 = G2
        self.n = numVertices(G1)
        assert self.n == numVertices(G2)

        self.isomorphism = isomorphism
        self.state = None

    def sendIsomorphicCopy(self):
        isomorphism = randomPermutation(self.n)
        pi = makePermutationFunction(isomorphism)

        H = applyIsomorphism(self.G1, pi)

        self.state = isomorphism
        return H

    def proveIsomorphicTo(self, graphChoice):
        randomIsomorphism = self.state
        piInverse = makeInversePermutationFunction(randomIsomorphism)

        if graphChoice == 1:
            return piInverse
        else:
            f = makePermutationFunction(self.isomorphism)
            return lambda i: f(piInverse(i))

The prover has two methods, one for each round of the protocol. The first creates an isomorphic copy of G_1, and the second receives the challenge and produces the requested isomorphism.

And here’s the corresponding class for the verifier

class Verifier(object):
    def __init__(self, G1, G2):
        self.G1 = G1
        self.G2 = G2
        self.n = numVertices(G1)
        assert self.n == numVertices(G2)

    def chooseGraph(self, H):
        choice = random.choice([1, 2])
        self.state = H, choice
        return choice

    def accepts(self, isomorphism):
        '''
            Return True if and only if the given isomorphism
            is a valid isomorphism between the randomly
            chosen graph in the first step, and the H presented
            by the Prover.
        '''
        H, choice = self.state
        graphToCheck = [self.G1, self.G2][choice - 1]
        f = isomorphism

        isValidIsomorphism = (graphToCheck == applyIsomorphism(H, f))
        return isValidIsomorphism

Then the protocol is as follows:

def runProtocol(G1, G2, isomorphism):
    p = Prover(G1, G2, isomorphism)
    v = Verifier(G1, G2)

    H = p.sendIsomorphicCopy()
    choice = v.chooseGraph(H)
    witnessIsomorphism = p.proveIsomorphicTo(choice)

    return v.accepts(witnessIsomorphism)

Analysis: Let’s suppose for a moment that everyone is honestly following the rules, and that G_1, G_2 are truly isomorphic. Then you’ll always accept my claim, because I can always provide you with an isomorphism. Now let’s suppose that, actually I’m lying, the two graphs aren’t isomorphic, and I’m trying to fool you into thinking they are. What’s the probability that you’ll rightfully reject my claim?

Well, regardless of what I do, I’m sending you a graph H and you get to make a random choice of t = 1, 2 that I can’t control. If H is only actually isomorphic to either G_1 or G_2 but not both, then so long as you make your choice uniformly at random, half of the time I won’t be able to produce a valid isomorphism and you’ll reject. And unless you can actually tell which graph H is isomorphic to—an open problem, but let’s say you can’t—then probability 1/2 is the best you can do.

Maybe the probability 1/2 is a bit unsatisfying, but remember that we can amplify this probability by repeating the protocol over and over again. So if you want to be sure I didn’t cheat and get lucky to within a probability of one-in-one-trillion, you only need to repeat the protocol 30 times. To be surer than the chance of picking a specific atom at random from all atoms in the universe, only about 400 times.

If you want to feel small, think of the number of atoms in the universe. If you want to feel big, think of its logarithm.

Here’s the code that repeats the protocol for assurance.

def convinceBeyondDoubt(G1, G2, isomorphism, errorTolerance=1e-20):
    probabilityFooled = 1

    while probabilityFooled > errorTolerance:
        result = runProtocol(G1, G2, isomorphism)
        assert result
        probabilityFooled *= 0.5
        print(probabilityFooled)

Running it, we see it succeeds

$ python graph-isomorphism.py
0.5
0.25
0.125
0.0625
0.03125
 ...
<SNIP>
 ...
1.3552527156068805e-20
6.776263578034403e-21

So it’s clear that this protocol is convincing.

But how can we be sure that there’s no leakage of knowledge in the protocol? What does “leakage” even mean? That’s where this topic is the most difficult to nail down rigorously, in part because there are at least three a priori different definitions! The idea we want to capture is that anything that you can efficiently compute after the protocol finishes (i.e., you have the content of the messages sent to you by the prover) you could have computed efficiently given only the two graphs G_1, G_2, and the claim that they are isomorphic.

Another way to say it is that you may go through the verification process and feel happy and confident that the two graphs are isomorphic. But because it’s a zero-knowledge proof, you can’t do anything with that information more than you could have done if you just took the assertion on blind faith. I’m confident there’s a joke about religion lurking here somewhere, but I’ll just trust it’s funny and move on.

In the next post we’ll expand on this “leakage” notion, but before we get there it should be clear that the graph isomorphism protocol will have the strongest possible “no-leakage” property we can come up with. Indeed, in the first round the prover sends a uniform random isomorphic copy of G_1 to the verifier, but the verifier can compute such an isomorphism already without the help of the prover. The verifier can’t necessarily find the isomorphism that the prover used in retrospect, because the verifier can’t solve graph isomorphism. Instead, the point is that the probability space of “G_1 paired with an H made by the prover” and the probability space of “G_1 paired with H as made by the verifier” are equal. No information was leaked by the prover.

For the second round, again the permutation \pi used by the prover to generate H is uniformly random. Since composing a fixed permutation with a uniform random permutation also results in a uniform random permutation, the second message sent by the prover is uniformly random, and so again the verifier could have constructed a similarly random permutation alone.

Let’s make this explicit with a small program. We have the honest protocol from before, but now I’m returning the set of messages sent by the prover, which the verifier can use for additional computation.

def messagesFromProtocol(G1, G2, isomorphism):
    p = Prover(G1, G2, isomorphism)
    v = Verifier(G1, G2)

    H = p.sendIsomorphicCopy()
    choice = v.chooseGraph(H)
    witnessIsomorphism = p.proveIsomorphicTo(choice)

    return [H, choice, witnessIsomorphism]

To say that the protocol is zero-knowledge (again, this is still colloquial) is to say that anything that the verifier could compute, given as input the return value of this function along with G_1, G_2 and the claim that they’re isomorphic, the verifier could also compute given only G_1, G_2 and the claim that G_1, G_2 are isomorphic.

It’s easy to prove this, and we’ll do so with a python function called simulateProtocol.

def simulateProtocol(G1, G2):
    # Construct data drawn from the same distribution as what is
    # returned by messagesFromProtocol
    choice = random.choice([1, 2])
    G = [G1, G2][choice - 1]
    n = numVertices(G)

    isomorphism = randomPermutation(n)
    pi = makePermutationFunction(isomorphism)
    H = applyIsomorphism(G, pi)

    return H, choice, pi

The claim is that the distribution of outputs to messagesFromProtocol and simulateProtocol are equal. But simulateProtocol will work regardless of whether G_1, G_2 are isomorphic. Of course, it’s not convincing to the verifier because the simulating function made the choices in the wrong order, choosing the graph index before making H. But the distribution that results is the same either way.

So if you were to use the actual Prover/Verifier protocol outputs as input to another algorithm (say, one which tries to compute an isomorphism of G_1 \to G_2), you might as well use the output of your simulator instead. You’d have no information beyond hard-coding the assumption that G_1, G_2 are isomorphic into your program. Which, as I mentioned earlier, is no help at all.

In this post we covered one detailed example of a zero-knowledge proof. Next time we’ll broaden our view and see the more general power of zero-knowledge (that it captures all of NP), and see some specific cryptographic applications. Keep in mind the preceding discussion, because we’re going to re-use the terms “prover,” “verifier,” and “simulator” to mean roughly the same things as the classes Prover, Verifier and the function simulateProtocol.

Until then!

Hashing to Estimate the Size of a Stream

Problem: Estimate the number of distinct items in a data stream that is too large to fit in memory.

Solution: (in python)

import random

def randomHash(modulus):
   a, b = random.randint(0,modulus-1), random.randint(0,modulus-1)
   def f(x):
      return (a*x + b) % modulus
   return f

def average(L):
   return sum(L) / len(L)

def numDistinctElements(stream, numParallelHashes=10):
   modulus = 2**20
   hashes = [randomHash(modulus) for _ in range(numParallelHashes)]
   minima = [modulus] * numParallelHashes
   currentEstimate = 0

   for i in stream:
      hashValues = [h(i) for h in hashes]
      for i, newValue in enumerate(hashValues):
         if newValue < minima[i]:
            minima[i] = newValue

      currentEstimate = modulus / average(minima)

      yield currentEstimate

Discussion: The technique used here is to use random hash functions. The central idea is the same as the general principle presented in our recent post on hashing for load balancing. In particular, if you have an algorithm that works under the assumption that the data is uniformly random, then the same algorithm will work (up to a good approximation) if you process the data through a randomly chosen hash function.

So if we assume the data in the stream consists of N uniformly random real numbers between zero and one, what we would do is the following. Maintain a single number x_{\textup{min}} representing the minimum element in the list, and update it every time we encounter a smaller number in the stream. A simple probability calculation or an argument by symmetry shows that the expected value of the minimum is 1/(N+1). So your estimate would be 1/(x_{\textup{min}}+1). (The extra +1 does not change much as we’ll see.) One can spend some time thinking about the variance of this estimate (indeed, our earlier post is great guidance for how such a calculation would work), but since the data is not random we need to do more work. If the elements are actually integers between zero and k, then this estimate can be scaled by k and everything basically works out the same.

Processing the data through a hash function h chosen randomly from a 2-universal family (and we proved in the aforementioned post that this modulus thing is 2-universal) makes the outputs “essentially random” enough to have the above technique work with some small loss in accuracy. And to reduce variance, you can process the stream in parallel with many random hash functions. This rough sketch results in the code above. Indeed, before I state a formal theorem, let’s see the above code in action. First on truly random data:

S = [random.randint(1,2**20) for _ in range(10000)]

for k in range(10,301,10):
   for est in numDistinctElements(S, k):
      pass
   print(abs(est))

# output
18299.75567190227
7940.7497160166595
12034.154552410098
12387.19432959244
15205.56844547564
8409.913113220158
8057.99978043693
9987.627098464103
10313.862295081966
9084.872639057356
10952.745228373375
10360.569781803211
11022.469475216301
9741.250165892501
11474.896038520465
10538.452261306533
10068.793492995934
10100.266495424627
9780.532155130093
8806.382800033594
10354.11482578643
10001.59202254498
10623.87031408308
9400.404915767062
10710.246772348424
10210.087633885101
9943.64709187974
10459.610972568578
10159.60175069326
9213.120899718839

As you can see the output is never off by more than a factor of 2. Now with “adversarial data.”

S = range(10000) #[random.randint(1,2**20) for _ in range(10000)]

for k in range(10,301,10):
   for est in numDistinctElements(S, k):
      pass
   print(abs(est))

# output

12192.744186046511
15935.80547112462
10167.188106011634
12977.425742574258
6454.364151175674
7405.197740112994
11247.367453263867
4261.854392115023
8453.228233608026
7706.717624577393
7582.891328643745
5152.918628936483
1996.9365093316926
8319.20208545846
3259.0787592465967
6812.252720480753
4975.796789951151
8456.258064516129
8851.10133724288
7317.348220516398
10527.871485943775
3999.76974425661
3696.2999065091117
8308.843106180666
6740.999794281012
8468.603733730935
5728.532232608959
5822.072220349402
6382.349459544548
8734.008940222673

The estimates here are off by a factor of up to 5, and this estimate seems to get better as the number of hash functions used increases. The formal theorem is this:

Theorem: If S is the set of distinct items in the stream and n = |S| and m > 100 n, then with probability at least 2/3 the estimate m / x_{\textup{min}} is between n/6 and 6n.

We omit the proof (see below for references and better methods). As a quick analysis, since we’re only storing a constant number of integers at any given step, the algorithm has space requirement O(\log m) = O(\log n), and each step takes time polynomial in \log(m) to update in each step (since we have to compute multiplication and modulus of m).

This method is just the first ripple in a lake of research on this topic. The general area is called “streaming algorithms,” or “sublinear algorithms.” This particular problem, called cardinality estimation, is related to a family of problems called estimating frequency moments. The literature gets pretty involved in the various tradeoffs between space requirements and processing time per stream element.

As far as estimating cardinality goes, the first major results were due to Flajolet and Martin in 1983, where they provided a slightly more involved version of the above algorithm, which uses logarithmic space.

Later revisions to the algorithm (2003) got the space requirement down to O(\log \log n), which is exponentially better than our solution. And further tweaks and analysis improved the variance bounds to something like a multiplicative factor of \sqrt{m}. This is called the HyperLogLog algorithm, and it has been tested in practice at Google.

Finally, a theoretically optimal algorithm (achieving an arbitrarily good estimate with logarithmic space) was presented and analyzed by Kane et al in 2010.

A Quasipolynomial Time Algorithm for Graph Isomorphism: The Details

Update 2017-01-09: Laci claims to have found a workaround to the previously posted error, and the claim is again quasipolynoimal time! Updated arXiv paper to follow.

Update 2017-01-04: Laci has posted an update on his paper. The short version is that one small step of his analysis was not quite correct, and the result is that his algorithm is sub-exponential, but not quasipolynomial time. The fact that this took over a year to sort out is a testament to the difficulty of the mathematics and the skill of the mathematicians involved. Even the revised result is still a huge step forward in the analysis of graph isomorphism. Finally, this should reinforce how we should think about progress in mathematics: it comes in fits and starts, with occasional steps backward.

Update 2015-12-13: Laci has posted a preprint on arXiv. It’s quite terse, but anyone who is comfortable with the details sketched in this article should have a fine time (albeit a long time)  reading it.

Update 2015-11-21: Ken Regan and Dick Lipton posted an article with some more details, and a high level overview of how the techniques fit into the larger picture of CS theory.

Update 2015-11-16: Laci has posted the talk on his website. It’s an hour and a half long, and I encourage you to watch it if you have the time 🙂

Laszlo Babai has claimed an astounding theorem, that the Graph Isomorphism problem can be solved in quasipolynomial time (now outdated; see Update 2017-01-04 above). On Tuesday I was at Babai’s talk on this topic (he has yet to release a preprint), and I’ve compiled my notes here. As in Babai’s talk, familiarity with basic group theory and graph theory is assumed, and if you’re a casual (i.e., math-phobic) reader looking to understand what the fuss is all about, this is probably not the right post for you. This post is research level theoretical computer science. We’re here for the juicy, glorious details.

Note: this blog post will receive periodic updates as my understanding of the details improve.

Laci during his lecture.

Laci during his lecture. Photo taken by me.

Standing room only at Laci's talk.

Standing room only at Laci’s talk. My advisor in the bottom right, my coauthor mid-left with the thumbs. Various famous researchers spottable elsewhere.

Background on Graph Isomorphism

I’ll start by giving a bit of background into why Graph Isomorphism (hereafter, GI) is such a famous problem, and why this result is important. If you’re familiar with graph isomorphism and the basics of complexity theory, skip to the next section where I get into the details.

GI is the following problem: given two graphs G = (V_G, E_G), H = (V_H, E_H), determine whether the graphs are isomorphic, that is, whether there is a bijection f : V_G \to V_H such that u,v are connected in G if and only if f(u), f(v) are connected in H. Informally, GI asks whether it’s easy to tell from two drawings of a graph whether the drawings actually represent the same graph. If you’re wondering why this problem might be hard, check out the following picture of the same graph drawn in three different ways.

GI-example

Indeed, a priori the worst-case scenario is that one would have to try all n! ways to rearrange the nodes of the first graph and see if one rearrangement achieves the second graph. The best case scenario is that one can solve this problem efficiently, that is, with an algorithm whose worst-case runtime on graphs with n nodes and m edges is polynomial in n and m (this would show that GI is in the class P). However, nobody knows whether there is a polynomial time algorithm for GI, and it’s been a big open question in CS theory for over forty years. This is the direction that Babai is making progress toward, showing that there are efficient algorithms. He didn’t get a polynomial time algorithm, but he got something quite close, which is why everyone is atwitter.

It turns out that telling whether two graphs are isomorphic has practical value in some applications. I hear rumor that chemists use it to search through databases of chemicals for one with certain properties (one way to think of a chemical compound is as a graph). I also hear that some people use graph isomorphism to compare files, do optical character recognition, and analyze social networks, but it seems highly probable to me that GI is not the central workhorse (or even a main workhorse) in these fields. Nevertheless, the common understanding is that pretty much anybody who needs to solve GI on a practical level can do so efficiently. The heuristics work well. Even in Babai’s own words, getting better worst-case algorithms for GI is purely a theoretical enterprise.

So if GI isn’t vastly important for real life problems, why are TCS researchers so excited about it?

Well it’s known that GI is in the class NP, meaning if two graphs are isomorphic you can give me a short proof that I can verify in polynomial time (the proof is just a description of the function f : V_G \to V_H). And if you’ll recall that inside NP there is this class called NP-complete, which are the “hardest” problems in NP. Now most problems in NP that we care about are also NP-complete, but it turns out GI is not known to be NP-complete either. Now, for all we know P = NP and then the question about GI is moot, but in the scenario that most people believe P and NP are different, so it leaves open the question of where GI lies: does it have efficient algorithms or not?

So we have a problem which is in NP, it’s not known to be in P, and it’s not known to be NP-complete. One obvious objection is that it might be neither. In fact, there’s a famous theorem of Ladner that says if P is different from NP, then there must be problems in NP, not in P, and not NP-complete. Such problems are called “NP-intermediate.” It’s perfectly reasonable that GI is one of these problems. But there’s a bit of a catch.

See, Ladner’s theorem doesn’t provide a natural problem which is NP intermediate; what Ladner did in his theorem was assume P is not NP, and then use that assumption to invent a new problem that he could prove is NP intermediate. If you come up with a problem whose only purpose is to prove a theorem, then the problem is deemed unnatural. In fact, there is no known “natural” NP-intermediate problem (assuming P is not NP). The pattern in CS theory is actually that if we find a problem that might be NP-intermediate, someone later finds an efficient algorithm for it or proves it’s NP-complete. There is a small and dwindling list of such problems. I say dwindling because not so long ago the problem of telling whether an integer is prime was in this list. The symptoms are that one first solves the problem on many large classes of special cases (this is true of GI) or one gets a nice quasipolynomial-time algorithm (Babai’s claimed new result), and then finally it falls into P. In fact, there is even stronger evidence against it being NP-complete: if GI were NP-complete, the polynomial hierarchy would collapse. To the layperson, the polynomial hierarchy is abstruse complexity theoretic technical hoo-hah, but suffice it to say that most experts believe the hierarchy does not collapse, so this counts as evidence.

So indeed, it could be that GI will become the first ever problem which is NP-intermediate (assuming P is not NP), but from historical patterns it seems more likely that it will fall into P. So people are excited because it’s tantalizing: everyone believes it should be in P, but nobody can prove it. It’s right at the edge of the current state of knowledge about the theoretical capabilities and limits of computation.

This is the point at which I will start assuming some level of mathematical maturity.

The Main Result

The specific claim about graph isomorphism being made is the following:

Theorem: There is a deterministic algorithm for GI which runs in time 2^{O(\log^c(n))} for some constant c.

This is an improvement over the best previously known algorithm which had runtime 2^{\sqrt{n \log n}}. Note the \sqrt{n} in the exponent has been eliminated, which is a huge difference. Quantities which are exponential in some power of a logarithm are called “quasipolynomial.”

But the main result is actually a quasipolynomial time algorithm for a different, more general problem called string automorphism. In this context, given a set Xstring is a function from X to some finite alphabet (really it is a coloring of X, but we are going to use colorings in the usual sense later so we have to use a new name here). If the set X is given a linear ordering then strings on X really correspond to strings of length |X| over the alphabet. We will call strings x,y \in X.

Now given a set X and a group G acting on X, there is a natural action of G on strings over X, denoted x^\sigma, by permuting the indices x^{\sigma}(i) = x(\sigma(i)). So you can ask the natural question: given two strings x,y and a representation of a group G acting on X by a set of generating permutations of G, is there a \sigma \in G with x^\sigma = y? This problem is called the string isomorphism problem, and it’s clearly in NP.

Now if you call \textup{ISO}_G(x,y) the set of all permutations in G that map x to y, and you call \textup{Aut}_G(x) = \textup{ISO}_G(x,x), then the actual theorem Babai claims to have proved is the following.

Theorem: Given a generating set for a group G of permutations of a set X and a string x, there is a quasipolynomial time algorithm which produces a generating set of the subgroup \textup{Aut}_G(x) of G, i.e. the string automorphisms of x that lie in in G.

It is not completely obvious that GI reduces to the automorphism problem, but I will prove it does in the next section. Furthermore, the overview of Babai’s proof of the theorem follows an outline laid out by Eugene Luks in 1982, which involves a divide-and-conquer method for splitting the analysis of \textup{Aut}_G(x) into simpler and simpler subgroups as they are found.

Luks’s program

Eugene Luks was the first person to incorporate “serious group theory” (Babai’s words) into the study of graph isomorphism. Why would group theory help in a question about graphs? Let me explain with a lemma.

Lemma: GI is polynomial-time reducible to the problem of computing, given a graph X, a list of generators for the automorphism group of G, denoted \textup{Aut}(X).

Proof. Without loss of generality suppose X_1, X_2 are connected graphs. If we want to decide whether X_1, X_2 are isomorphic, we may form the disjoint union X = X_1 \cup X_2. It is easy to see that X_1 and X_2 are isomorphic if and only if some \sigma \in \textup{Aut}(X) swaps X_1 and X_2. Indeed, if any automorphism with this property exists, every generating set of \textup{Aut}(G) must contain one.

\square

Similarly, the string isomorphism problem reduces to the problem of computing a generating set for \textup{Aut}_G(x) using a similar reduction to the one above. As a side note, while \textup{ISO}_G(x,y) can be exponentially large as a set, it is either the empty set, or a coset of \textup{Aut}_G(x) by any element of \textup{ISO}_G(x,y). So there are group-theoretic connections between the automorphism group of a string and the isomorphisms between two strings.

But more importantly, computing the automorphism group of a graph reduces to computing the automorphism subgroup of a particular group for a given string in the following way. Given a graph X on a vertex set V = \{ 1, 2, \dots, v \} write X as a binary string on the set of unordered pairs Z = \binom{V}{2} by mapping (i,j) \to 1 if and only if i and j are connected by an edge. The alphabet size is 2. Then \textup{Aut}(X) (automorphisms of the graph) induces an action on strings as a subgroup G_X of \textup{Aut}(Z) (automorphisms of strings). These induced automorphisms are exactly those which preserve proper encodings of a graph. Moreover, any string automorphism in G_X is an automorphism of X and vice versa. Note that since Z is larger than V by a factor of v^2, the subgroup G_X is much smaller than all of \textup{Aut}(Z).

Moreover, \textup{Aut}(X) sits inside the full symmetry group \textup{Sym}(V) of V, the vertex set of the starting graph, and \textup{Sym}(V) also induces an action G_V on Z. The inclusion is

\displaystyle \textup{Aut}(X) \subset \textup{Sym}(V)

induces

\displaystyle G_X = \textup{Aut}(\textup{Enc}(X)) \subset G_V \subset \textup{Aut}(Z)

I.e.,

Call G = G_V the induced subgroup of permutations of strings-as-graphs. Now we just have some subgroup of permutations G of \textup{Aut}(Z), and we want to find a generating set for \textup{Aut}_G(x) (where x happens to be the encoding of a graph). That is exactly the string automorphism problem. Reduction complete.

Now the basic idea to compute \textup{Aut}_G(x) is to start from the assumption that \textup{Aut}_G(x) = G. We know it’s a subgroup, so it could be all of G; in terms of GI if this assumption were true it would mean the starting graph was the complete graph, but for string automorphism in general G can be whatever. Then we try to refute this belief by finding additional structure in \textup{Aut}_G(x), either by breaking it up into smaller pieces (say, orbits) or by constructing automorphisms in it. That additional structure allows us to break up G in a way that \textup{Aut}_G(x) is a subgroup of the product of the corresponding factors of G.

The analogy that Babai used, which goes back to graphs, is the following. If you have a graph X and you want to know its automorphisms, one thing you can do is to partition the vertices by degree. You know that an automorphism has to preserve the degree of an individual vertex, so in particular you can break up the assumption that \textup{Aut}(X) = \textup{Sym}(V) into the fact that \textup{Aut}(X) must be a subgroup of the product of the symmetry groups of the pieces of the partition; then you recurse. In this way you’ve hugely reduced the total number of automorphisms you need to consider. When the degrees get small enough you can brute-force search for automorphisms (and there is some brute-force searching in putting the pieces back together). But of course this approach fails miserably in the first step you start with a regular graph, so one would need to look for other kinds of structure.

One example is an equitable partition, which is a partition of vertices by degree relative to vertices in other blocks of the partition. So a vertex which has degree 3 but two degree 2 neighbors would be in a different block than a vertex with degree 3 and only 1 neighbor of degree 2. Finding these equitable partitions (which can be done in polynomial time) is one of the central tools used to attack GI. As an example of why it can be very helpful: in many regimes a Erdos-Renyi random graph has asymptotically almost surely a coarsest equitable partition which consists entirely of singletons. This is despite the fact that the degree sequences themselves are tightly constrained with high probability. This means that, if you’re given two Erdos-Renyi random graphs and you want to know whether they’re isomorphic, you can just separately compute the coarsest equitable partition for each one and see if the singleton blocks match up. That is your isomorphism.

Even still, there are many worst case graphs that resist being broken up by an equitable partition. A hard example is known as the Johnson graph, which we’ll return to later.

For strings the sorts of structures to look for are even more refined than equitable partitions of graphs, because the automorphism group of a graph can be partitioned into orbits which preserve the block structure of an equitable partition. But it still turns out that Johnson graphs admit parts of the automorphism group that can’t be broken up much by orbits.

The point is that when some useful substructure is found, it will “make progress” toward the result by breaking the problem into many pieces (say, n^{\log n} pieces) where each piece has size 9/10 the size of the original. So you get a recursion in the amount of time needed which looks like f(n) \leq n^{\log n} f(9n/10). If you call q(n) = n^{\log n} the quasipolynomial factor, then solving the recurrence gives f(n) \leq q(n)^{O(\log n)} which only adds an extra log factor in the exponent. So you keep making progress until the problem size is polylogarithmic, and then you brute force it and put the pieces back together in quasipolynomial time.

Two main lemmas, which are theorems in their own right

This is where the details start to get difficult, in part because Babai jumped back and forth between thinking of the object as a graph and as a string. The point of this in the lecture was to illustrate both where the existing techniques for solving GI (which were in terms of finding canonical graph substructures in graphs) break down.

The central graph-theoretic picture is that of “individualizing” a vertex by breaking it off from an existing equitable partition, which then breaks the equitable partition structure so you need to do some more (polytime) work to further refine it into an equitable partition again. But the point is that you can take all the vertices in a block, pick all possible ways to individualize them by breaking them into smaller blocks. If you traverse these possibilities in a canonical order, you will eventually get down to a partition of singletons, which is your “canonical labeling” of the graph. And if you do this process with two different graphs and you get to different canonical labelings, you had to have started with non-isomorphic graphs.

The problem is that when you get to a coarsest equitable partition, you may end up with blocks of size \sqrt{n}, meaning you have an exponential number of individualizations to check. This is the case with Johnson graphs, and in fact if you have a Johnson graph J(m,t) which has \binom{m}{t} vertices and you individualize fewer than m/10t if them, then you will only get down to blocks of size polynomially smaller than \binom{m}{t}, which is too big if you want to brute force check all individualizations of a block.

The main combinatorial lemma that Babai proves to avoid this problem is that the Johnson graphs are the only obstacle to finding efficient partitions.

Theorem (Babai 15): If X is a regular graph on m vertices, then by individualizing a polylog number of vertices we can find one of the three following things:

  1. A canonical coloring with each color class having at most 90% of all the nodes.
  2. A canonical equipartition of some subset of the vertices that has at least 90% of the nodes (i.e. a big color class from (1)).
  3. A canonically embedded Johnson graph on at least 90% of the nodes.

[Edit: I think that what Babai means by a “canonical coloring” is an equitable partition of the nodes (not to be confused with an equipartition), but I am not entirely sure. I have changed the language to reflect more clearly what was in the talk as opposed to what I think I understood from doing additional reading.]

The first two are apparently the “easy” cases in the sense that they allow for simple recursion that has already been known before Babai’s work. The hard part is establishing the last case (and this is the theorem whose proof sketch he has deferred for two more weeks). But once you have such a Johnson graph your life is much better, because (for a reason I did not understand) you can recurse on a problem of size roughly the square root of the starting size.

In discussing Johnson graphs, Babai said they were a source of “unspeakable misery” for people who want to solve GI quickly. At the same time, it is a “curse and a blessing,” as once you’ve found a Johnson graph embedded in your problem you can recurse to much smaller instances. This routine to find one of these three things is called the “split-or-Johnson” routine.

The analogue for strings (I believe this is true, but I’m a bit fuzzy on this part) is to find a “canonical” k-ary relational structure (where k is polylog in size) with some additional condition on the size of alternating subgroups of the automorphism group of the k-ary relational structure. Then you can “individualize” the points in the base of this relational structure and find analogous partitions and embedded Johnson schemes (a special kind of combinatorial design).

One important fact to note is that the split-or-Johnson routine breaks down at \log^3(n) size, and Babai has counterexamples that say his result is tight, so getting GI in P would have to bypass this barrier with a different idea.

The second crucial lemma has to do with giant homomorphisms, and this is the method by which Babai constructs automorphisms that bound \textup{Aut}_G(x) from below. As opposed to the split-or-Johnson lemma, which finds structure to bound the group from above by breaking it into simpler pieces. Fair warning: one thing I don’t yet understand is how these two routines interact in the final algorithm. My guess is they are essentially run in parallel (or alternate), but that guess is as good as wild.

Definition: A homomorphism \varphi: G \to S_m is called giant if the image of G is either the alternating group A_n or else all of S_m. I.e. \varphi is surjective, or almost so. Let \textup{Stab}_G(x) denote the stabilizer subgroup of x \in G. Then x is called “affected” by \varphi if \varphi|_{\textup{Stab}_g(x)} is not giant.

The central tool in Babai’s algorithm is the dichotomy between points that are affected and those that are not. The ability to decide this property in quasipolynomial time is what makes the divide and conquer work.

There is one deep theorem he uses that relates affected points to giant homomorphisms:

Theorem (Unaffected Stabilizer Theorem): Let \varphi: G \to S_m be a giant homomorphism and U \subset G the set of unaffected elements. Let G_{(U)} be the pointwise stabilizer of U, and suppose that m > \textup{max}(8, 2 + \log_2 n). Then the restriction \varphi : G_{(U)} \to S_m is still giant.

Babai claimed this was a nontrivial theorem, not because the proof is particularly difficult, but because it depends on the classification of finite simple groups. He claimed it was a relatively straightforward corollary, but it appears that this does not factor into the actual GI algorithm constructively, but only as an assumption that a certain loop invariant will hold.

To recall, we started with this assumption that \textup{Aut}_G(x) was the entire symmetry group we started with, which is in particular an assumption that the inclusion \varphi \to S_m is giant. Now you want to refute this hypothesis, but you can’t look at all of S_m because even the underlying set m has too many subsets to check. But what you can do is pick a test set A \subset [m] where |A| is polylogarithmic in size, and test whether the restriction of \varphi to the test set is giant in \textup{Sym}(A). If it is giant, we call A full.

Theorem (Babai 15): One can test the property of a polylogarithmic size test set being full in quasipolynomial time in m.

Babai claimed it should be surprising that fullness is a quasipolynomial time testable property, but more surprising is that regardless of whether it’s full or not, we can construct an explicit certificate of fullness or non-fullness. In the latter case, one can come up with an explicit subgroup which contains the image of the projection of the automorphism group onto the symmetry group of the test set. In addition, given two test sets A,B, one can efficiently compare the action between the two different test sets. And finding these non-full test sets is what allows one to construct the k-ary relations. So the output of this lower bound technique informs the upper bound technique of how to proceed.

The other outcome is that A could be full, and coming up with a certificate of fullness is harder. The algorithm sketched below claims to do it, and it involves finding enough “independent” automorphisms to certify that the projection is giant.

Now once you try all possible test sets, which gives \binom{m}{k}^2 many certificates (a quasipolynomial number), one has to aggregate them into a full automorphism of x, which Babai assured us was a group theoretic exercise.

The algorithm to test fullness (and construct a certificate) he called the Local Certificates Algorithm. It was sketched as follows: you are given as input a set A and a group G_A \subset G being the setwise stabilizer of A under \psi_A : G_A \to \textup{Sym}(A). Now let W be the group elements affected by \psi_A. You can be sure that at least one point is affected. Now you stabilize on W and get a refined subgroup of G_A, which you can use to compute newly affected elements, growing W in each step. By the unaffected stabilizer theorem, this preserves gianthood. Furthermore, in each step you get layers of W, and all of the stabilizers respect the structure of the previous layers. Babai described this as adding on “layers of a beard.”

The termination of this is either when W stops growing, in which case the projection is giant and W is our certificate of fullness (i.e. we get a rich family of automorphisms that are actually in our target automorphism group), or else we discover the projected ceases to be giant and W is our certificate of non-fullness. Indeed, the subgroup generated by these layers is a subgroup of \textup{Aut}_G(x), and the subgroup generated by the elements of a non-fullness certificate contain the automorphism group.

Not enough details?

This was supposed to be just a high-level sketch of the algorithm, and Babai is giving two more talks elaborating on the details. Unfortunately, I won’t be able to make it to his second talk in which he’ll discuss some of the core group theoretic ideas that go into the algorithm. I will, however, make it to his third talk in which he will sketch the proof of the split-or-Johnson routine. That is in two weeks from the time of this writing, and I will update this post with any additional insights then.

Babai has not yet released a preprint, and when I asked him he said “soon, soon.” Until then 🙂

This blog post is based on my personal notes from Laszlo Babai’s lecture at the University of Chicago on November 10, 2015. At the time of this writing, Babai’s work has not been peer reviewed, and my understanding of his lectures has large gaps and may be faulty. Do not put your life in danger based on information in this post.