NP-hard does not mean hard

When NP-hardness pops up on the internet, say because some silly blogger wants to write about video games, it’s often tempting to conclude that the problem being proved NP-hard is actually very hard!

“Scientists proved Super Mario is NP-hard? I always knew there was a reason I wasn’t very good at it!” Sorry, these two are unrelated. NP-hardness means hard in a narrow sense this post should hopefully make clear. After that, we’ll explore what “hard” means in a mathematical sense that you can apply beyond NP-hardness to inform your work as a programmer.

When a problem is NP-hard, that simply means that the problem is sufficiently expressive that you can use the problem to express logic. By which I mean boolean formulas using AND, OR, and NOT. In the Super Mario example, the “problem” is a bundle of (1) the controls for the player (2) the allowed tiles and characters that make up a level, and (3) the goal of getting from the start to the end. Logic formulas are encoded in the creation of a level, and solving the problem (completing the level) is the same as finding conditions to make the logical formula true.

mario-clause-gadget

The clause gadget for the original Super Mario Brothers, encoding an OR of three variables.

In this sense, NP-hardness doesn’t make all of Super Mario hard. The levels designed to encode logical formulas are contrived, convoluted, and contorted. They abuse the rules of the game in order to cram boolean logic into it. These are worst case levels. It’s using Mario for a completely unintended purpose, not unlike hacking. And so NP-hardness is a worst case claim.

To reiterate, NP-hardness means that Super Mario has expressive power. So expressive that it can emulate other problems we believe are hard in the worst case. And, because the goal of mathematical “hardness” is to reason about the limitations of algorithms, being able to solve Super Mario in full generality implies you can solve any hard subproblem, no matter how ridiculous the level design.

The P != NP conjecture says that there’s no polynomial time algorithm to determine whether boolean logic formulas are satisfiable, and so as a consequence Super Mario (in full generality) also has no polynomial time algorithm.

That being said, in reality Super Mario levels do not encode logical formulas! If you use the knowledge that real-world Super Mario levels are designed in the way they are (to be solvable, fun), then you can solve Super Mario with algorithms. There are many examples.

In general, the difficulty of a problem for humans is unrelated to the difficulty for algorithms. Consider multiplication of integers. This is a trivial problem for computers to solve, but humans tend to struggle with it. It’s an amazing feat to be able to multiply two 7 digit numbers in less than 5 seconds, whereas computers can multiply two thousand-digit numbers in milliseconds.

Meanwhile, protein folding is known to be an NP-hard problem, but it’s been turned into a game sufficiently easy for humans to solve that players have contributed to scientific research. Indeed, even some of the most typically cited NP-hard problems, like traveling salesman, have heuristic, practical algorithmic solutions that allow one to solve them (very close to optimally) in hours on inputs as large as every city on earth.

So the mathematical notions of hardness are quite disconnected from practical notions of hardness. This is not even to mention that some NP-hard problems can be efficiently approximated to within any desired accuracy.

Let’s dig into the math a bit more. “Hardness” is a family of ideas about comparisons between problems based on reusability of algorithmic solutions. Loosely speaking, a problem R is hard with respect to a class of problems C if an algorithm solving R can be easily transformed into an algorithm solving any problem in C. You have to say what kinds of transformations are allowed, and the transformation can be different for different target problems in C, but that’s the basic idea.

In the Super Mario example, if you want to solve logical formulas, you can transform a hypothetically perfect mario-level-playing algorithm into a logic solver by encoding the formula as a level and running the mario-level-playing algorithm on it as a black box. Add an if statement to the end to translate “level can/can’t be finished” to “formula can/can’t be satisfied,” and the transformation is complete. It’s important for NP-hardness that the transformation only takes polynomial time. Other kinds of hardness might admit more or restrict to fewer resources.

And so this is what makes Mario NP-hard, because boolean logic satisfiability is NP-hard. Any problem in NP can be solved by a boolean logic solver, and hence also by a mario-level-player. The fact that boolean logic solving is NP-hard is a difficult theorem to prove. But if we assume it’s true, you can compose the transformations to get from any NP problem to Super Mario.

As a simple example of a different kind of hardness, you can let C be the class of problems solvable using only a finite amount of memory (independent of the input). You have probably heard of this class of problems by another name, but I’ll keep you guessing until the end of the post. A C-hard problem R is one for which an algorithmic solution can be repurposed to solve any finite-memory-solvable problem.

We have to be careful: if the transformation between solutions allows us polynomial time (in the size of the input) like it did for NP-hardness, then we might have enough time in the transformation alone to solve the entire problem, removing the need for a solution to R in the first place! For this reason, we have to limit the amount of work that can be done in the transformation. We get a choice here that influences how interesting or useful the definition of hardness is, but let’s just pick one and say that the transformation can only use finite time (independent of the input).

To be fair, I actually don’t know if there are any hard problems with respect to this definition. There probably are, but chances are good that they are not members of C, and that’s where the definition of hardness gets really interesting. If you have a problem in C which is also C-hard, it’s called complete for C. And once you’ve found a complete problem, from a theoretical perspective you’re a winner. You’ve found a problem which epitomizes the difficulty of solving problems in C. And so it’s a central aim of researchers studying a complexity class to find complete problems. As they say in the business, “ABC: always be completing.”

As a more concrete and interesting example, the class P of all polynomial-time solvable problems has a complete problem. Here the transformations are a bit up in the air. They could either be logarithmic-space computations, or what’s called NC, which can be thought of as poly-logarithmic time (very fast) parallel computations. I only mention NC because it allows you to say “P-complete problems are hard to parallelize.”

Regardless of the choice, there are a number of very useful problems known to be P-complete. The first is the Circuit Value Problem, given a circuit (described by its gates and wires using any reasonable encoding) and an input to the circuit, what is the output?

Others include linear programming (optimize this linear function with respect to linear constraints), data compression (does the compressed version of a string s using Lempel–Ziv–Welch contain a string t?), and type inference for partial types. There are many more in this compendium of Greenlaw et al. Each one is expressive enough to encode any instance of the other, and any instance of any problem in P. It’s quite curious to think that gzip can solve linear programs, but that’s surely no curiouser than super mario levels encoding boolean logic.

Just as with NP-hardness, when a problem is P-hard that doesn’t automatically mean it’s easy or hard for humans, or that typical instances can’t be easily parallelized. P-hardness is also a worst case guarantee.

Studying P-completeness is helpful in the same way NP-completeness is helpful. Completeness informs you about whether you should hope to find a perfect solution or be content with approximations and heuristics (or incorporate problem context to make it easier). Knowing a problem is P-complete means you should not expect perfect efficient parallel algorithms, or perfect efficient algorithms that use severely limited space. Knowing a problem is NP-hard means you should not expect a perfect polynomial time solution. In other words, if you are forced to work with those restrictions, the game becomes one of tradeoffs. Hardness and completeness focus and expedite your work, and clarify a principled decision making process.

Until next time!

P.S. The class of problems solvable in a finite amount of memory is just the class of regular languages. The “finite memory” is the finite state machine used to solve them.

Advertisements

Zero Knowledge Proofs — A Primer

In this post we’ll get a strong taste for zero knowledge proofs by exploring the graph isomorphism problem in detail. In the next post, we’ll see how this relates to cryptography and the bigger picture. The goal of this post is to get a strong understanding of the terms “prover,” “verifier,” and “simulator,” and “zero knowledge” in the context of a specific zero-knowledge proof. Then next time we’ll see how the same concepts (though not the same proof) generalizes to a cryptographically interesting setting.

Graph isomorphism

Let’s start with an extended example. We are given two graphs G_1, G_2, and we’d like to know whether they’re isomorphic, meaning they’re the same graph, but “drawn” different ways.

The problem of telling if two graphs are isomorphic seems hard. The pictures above, which are all different drawings of the same graph (or are they?), should give you pause if you thought it was easy.

To add a tiny bit of formalism, a graph G is a list of edges, and each edge (u,v) is a pair of integers between 1 and the total number of vertices of the graph, say n. Using this representation, an isomorphism between G_1 and G_2 is a permutation \pi of the numbers \{1, 2, \dots, n \} with the property that (i,j) is an edge in G_1 if and only if (\pi(i), \pi(j)) is an edge of G_2. You swap around the labels on the vertices, and that’s how you get from one graph to another isomorphic one.

Given two arbitrary graphs as input on a large number of vertices n, nobody knows of an efficient—i.e., polynomial time in n—algorithm that can always decide whether the input graphs are isomorphic. Even if you promise me that the inputs are isomorphic, nobody knows of an algorithm that could construct an isomorphism. (If you think about it, such an algorithm could be used to solve the decision problem!)

A game

Now let’s play a game. In this game, we’re given two enormous graphs on a billion nodes. I claim they’re isomorphic, and I want to prove it to you. However, my life’s fortune is locked behind these particular graphs (somehow), and if you actually had an isomorphism between these two graphs you could use it to steal all my money. But I still want to convince you that I do, in fact, own all of this money, because we’re about to start a business and you need to know I’m not broke.

Is there a way for me to convince you beyond a reasonable doubt that these two graphs are indeed isomorphic? And moreover, could I do so without you gaining access to my secret isomorphism? It would be even better if I could guarantee you learn nothing about my isomorphism or any isomorphism, because even the slightest chance that you can steal my money is out of the question.

Zero knowledge proofs have exactly those properties, and here’s a zero knowledge proof for graph isomorphism. For the record, G_1 and G_2 are public knowledge, (common inputs to our protocol for the sake of tracking runtime), and the protocol itself is common knowledge. However, I have an isomorphism f: G_1 \to G_2 that you don’t know.

Step 1: I will start by picking one of my two graphs, say G_1, mixing up the vertices, and sending you the resulting graph. In other words, I send you a graph H which is chosen uniformly at random from all isomorphic copies of G_1. I will save the permutation \pi that I used to generate H for later use.

Step 2: You receive a graph H which you save for later, and then you randomly pick an integer t which is either 1 or 2, with equal probability on each. The number t corresponds to your challenge for me to prove H is isomorphic to G_1 or G_2. You send me back t, with the expectation that I will provide you with an isomorphism between H and G_t.

Step 3: Indeed, I faithfully provide you such an isomorphism. If I you send me t=1, I’ll give you back \pi^{-1} : H \to G_1, and otherwise I’ll give you back f \circ \pi^{-1}: H \to G_2. Because composing a fixed permutation with a uniformly random permutation is again a uniformly random permutation, in either case I’m sending you a uniformly random permutation.

Step 4: You receive a permutation g, and you can use it to verify that H is isomorphic to G_t. If the permutation I sent you doesn’t work, you’ll reject my claim, and if it does, you’ll accept my claim.

Before we analyze, here’s some Python code that implements the above scheme. You can find the full, working example in a repository on this blog’s Github page.

First, a few helper functions for generating random permutations (and turning their list-of-zero-based-indices form into a function-of-positive-integers form)

import random

def randomPermutation(n):
    L = list(range(n))
    random.shuffle(L)
    return L

def makePermutationFunction(L):
    return lambda i: L[i - 1] + 1

def makeInversePermutationFunction(L):
    return lambda i: 1 + L.index(i - 1)

def applyIsomorphism(G, f):
    return [(f(i), f(j)) for (i, j) in G]

Here’s a class for the Prover, the one who knows the isomorphism and wants to prove it while keeping the isomorphism secret:

class Prover(object):
    def __init__(self, G1, G2, isomorphism):
        '''
            isomomorphism is a list of integers representing
            an isomoprhism from G1 to G2.
        '''
        self.G1 = G1
        self.G2 = G2
        self.n = numVertices(G1)
        assert self.n == numVertices(G2)

        self.isomorphism = isomorphism
        self.state = None

    def sendIsomorphicCopy(self):
        isomorphism = randomPermutation(self.n)
        pi = makePermutationFunction(isomorphism)

        H = applyIsomorphism(self.G1, pi)

        self.state = isomorphism
        return H

    def proveIsomorphicTo(self, graphChoice):
        randomIsomorphism = self.state
        piInverse = makeInversePermutationFunction(randomIsomorphism)

        if graphChoice == 1:
            return piInverse
        else:
            f = makePermutationFunction(self.isomorphism)
            return lambda i: f(piInverse(i))

The prover has two methods, one for each round of the protocol. The first creates an isomorphic copy of G_1, and the second receives the challenge and produces the requested isomorphism.

And here’s the corresponding class for the verifier

class Verifier(object):
    def __init__(self, G1, G2):
        self.G1 = G1
        self.G2 = G2
        self.n = numVertices(G1)
        assert self.n == numVertices(G2)

    def chooseGraph(self, H):
        choice = random.choice([1, 2])
        self.state = H, choice
        return choice

    def accepts(self, isomorphism):
        '''
            Return True if and only if the given isomorphism
            is a valid isomorphism between the randomly
            chosen graph in the first step, and the H presented
            by the Prover.
        '''
        H, choice = self.state
        graphToCheck = [self.G1, self.G2][choice - 1]
        f = isomorphism

        isValidIsomorphism = (graphToCheck == applyIsomorphism(H, f))
        return isValidIsomorphism

Then the protocol is as follows:

def runProtocol(G1, G2, isomorphism):
    p = Prover(G1, G2, isomorphism)
    v = Verifier(G1, G2)

    H = p.sendIsomorphicCopy()
    choice = v.chooseGraph(H)
    witnessIsomorphism = p.proveIsomorphicTo(choice)

    return v.accepts(witnessIsomorphism)

Analysis: Let’s suppose for a moment that everyone is honestly following the rules, and that G_1, G_2 are truly isomorphic. Then you’ll always accept my claim, because I can always provide you with an isomorphism. Now let’s suppose that, actually I’m lying, the two graphs aren’t isomorphic, and I’m trying to fool you into thinking they are. What’s the probability that you’ll rightfully reject my claim?

Well, regardless of what I do, I’m sending you a graph H and you get to make a random choice of t = 1, 2 that I can’t control. If H is only actually isomorphic to either G_1 or G_2 but not both, then so long as you make your choice uniformly at random, half of the time I won’t be able to produce a valid isomorphism and you’ll reject. And unless you can actually tell which graph H is isomorphic to—an open problem, but let’s say you can’t—then probability 1/2 is the best you can do.

Maybe the probability 1/2 is a bit unsatisfying, but remember that we can amplify this probability by repeating the protocol over and over again. So if you want to be sure I didn’t cheat and get lucky to within a probability of one-in-one-trillion, you only need to repeat the protocol 30 times. To be surer than the chance of picking a specific atom at random from all atoms in the universe, only about 400 times.

If you want to feel small, think of the number of atoms in the universe. If you want to feel big, think of its logarithm.

Here’s the code that repeats the protocol for assurance.

def convinceBeyondDoubt(G1, G2, isomorphism, errorTolerance=1e-20):
    probabilityFooled = 1

    while probabilityFooled > errorTolerance:
        result = runProtocol(G1, G2, isomorphism)
        assert result
        probabilityFooled *= 0.5
        print(probabilityFooled)

Running it, we see it succeeds

$ python graph-isomorphism.py
0.5
0.25
0.125
0.0625
0.03125
 ...
<SNIP>
 ...
1.3552527156068805e-20
6.776263578034403e-21

So it’s clear that this protocol is convincing.

But how can we be sure that there’s no leakage of knowledge in the protocol? What does “leakage” even mean? That’s where this topic is the most difficult to nail down rigorously, in part because there are at least three a priori different definitions! The idea we want to capture is that anything that you can efficiently compute after the protocol finishes (i.e., you have the content of the messages sent to you by the prover) you could have computed efficiently given only the two graphs G_1, G_2, and the claim that they are isomorphic.

Another way to say it is that you may go through the verification process and feel happy and confident that the two graphs are isomorphic. But because it’s a zero-knowledge proof, you can’t do anything with that information more than you could have done if you just took the assertion on blind faith. I’m confident there’s a joke about religion lurking here somewhere, but I’ll just trust it’s funny and move on.

In the next post we’ll expand on this “leakage” notion, but before we get there it should be clear that the graph isomorphism protocol will have the strongest possible “no-leakage” property we can come up with. Indeed, in the first round the prover sends a uniform random isomorphic copy of G_1 to the verifier, but the verifier can compute such an isomorphism already without the help of the prover. The verifier can’t necessarily find the isomorphism that the prover used in retrospect, because the verifier can’t solve graph isomorphism. Instead, the point is that the probability space of “G_1 paired with an H made by the prover” and the probability space of “G_1 paired with H as made by the verifier” are equal. No information was leaked by the prover.

For the second round, again the permutation \pi used by the prover to generate H is uniformly random. Since composing a fixed permutation with a uniform random permutation also results in a uniform random permutation, the second message sent by the prover is uniformly random, and so again the verifier could have constructed a similarly random permutation alone.

Let’s make this explicit with a small program. We have the honest protocol from before, but now I’m returning the set of messages sent by the prover, which the verifier can use for additional computation.

def messagesFromProtocol(G1, G2, isomorphism):
    p = Prover(G1, G2, isomorphism)
    v = Verifier(G1, G2)

    H = p.sendIsomorphicCopy()
    choice = v.chooseGraph(H)
    witnessIsomorphism = p.proveIsomorphicTo(choice)

    return [H, choice, witnessIsomorphism]

To say that the protocol is zero-knowledge (again, this is still colloquial) is to say that anything that the verifier could compute, given as input the return value of this function along with G_1, G_2 and the claim that they’re isomorphic, the verifier could also compute given only G_1, G_2 and the claim that G_1, G_2 are isomorphic.

It’s easy to prove this, and we’ll do so with a python function called simulateProtocol.

def simulateProtocol(G1, G2):
    # Construct data drawn from the same distribution as what is
    # returned by messagesFromProtocol
    choice = random.choice([1, 2])
    G = [G1, G2][choice - 1]
    n = numVertices(G)

    isomorphism = randomPermutation(n)
    pi = makePermutationFunction(isomorphism)
    H = applyIsomorphism(G, pi)

    return H, choice, pi

The claim is that the distribution of outputs to messagesFromProtocol and simulateProtocol are equal. But simulateProtocol will work regardless of whether G_1, G_2 are isomorphic. Of course, it’s not convincing to the verifier because the simulating function made the choices in the wrong order, choosing the graph index before making H. But the distribution that results is the same either way.

So if you were to use the actual Prover/Verifier protocol outputs as input to another algorithm (say, one which tries to compute an isomorphism of G_1 \to G_2), you might as well use the output of your simulator instead. You’d have no information beyond hard-coding the assumption that G_1, G_2 are isomorphic into your program. Which, as I mentioned earlier, is no help at all.

In this post we covered one detailed example of a zero-knowledge proof. Next time we’ll broaden our view and see the more general power of zero-knowledge (that it captures all of NP), and see some specific cryptographic applications. Keep in mind the preceding discussion, because we’re going to re-use the terms “prover,” “verifier,” and “simulator” to mean roughly the same things as the classes Prover, Verifier and the function simulateProtocol.

Until then!

Parameterizing the Vertex Cover Problem

I’m presenting a paper later this week at the Matheamtical Foundations of Computer Science 2014 in Budapest, Hungary. This conference is an interesting mix of logic and algorithms that aims to bring together researchers from these areas to discuss their work. And right away the first session on the first day focused on an area I know is important but have little experience with: fixed parameter complexity. From what I understand it’s not that popular of a topic at major theory conferences in the US (there appears to be only one paper on it at this year’s FOCS conference), but the basic ideas are worth knowing.

The basic idea is pretty simple: some hard computational problems become easier (read, polynomial-time solvable) if you fix some parameters involved to constants. Preferably small constants. For example, finding cliques of size k in a graph is NP-hard if k is a parameter, but if you fix k to a constant then you can check all possible subsets of size k in O(n^k) time. This is kind of a silly example because there are much faster ways to find triangles than checking all O(n^3) subsets of vertices, but part of the point of fixed-parameter complexity is to find the fastest algorithms in these fixed-parameter settings. Since in practice parameters are often small [citation needed], this analysis can provide useful practical algorithmic alternatives to heuristics or approximate solutions.

One important tool in the theory of fixed-parameter tractability is the idea of a kernel. I think it’s an unfortunate term because it’s massively overloaded in mathematics, but the idea is to take a problem instance with the parameter k, and carve out “easy” regions of the instance (often reducing k as you go) until the runtime of the trivial brute force algorithm only depends on k and not on the size of the input. The point is that the solution you get on this “carved out” instance is either the same as the original, or can be extended back to the original with little extra work. There is a more formal definition we’ll state, but there is a canonical example that gives a great illustration.

Consider the vertex cover problem. That is, you give me a graph G = (V,E) and a number k and I have to determine if there is a subset of \leq k vertices of G that touch all of the edges in E. This problem is fixed-parameter tractable because, as with k-clique one can just check all subsets of size k. The kernel approach we’ll show now is much smarter.

What you do is the following. As long as your graph has a vertex of degree > k, you remove it and reduce k by 1. This is because a vertex of degree > k will always be chosen for a vertex cover. If it’s not, then you need to include all of its neighbors to cover its edges, but there are > k neighbors and your vertex cover is constrained by size k. And so you can automatically put this high-degree vertex in your cover, and use induction on the smaller graph.

Once you can’t remove any more vertices there are two cases. In the case that there are more than k^2 edges, you output that there is no vertex cover. Indeed, if you only get k vertices in your cover and you removed all vertices of degree > k, then each can cover at most k edges, giving a total of at most k^2. Otherwise, if there are at most k^2 edges, then you can remove all the isolated vertices and show that there are only \leq 2k^2 vertices left. This is because each edge touches only two vertices, so in the worst case they’re all distinct. This smaller subgraph is called a kernel of the vertex cover, and the fact that its size depends only on k is the key. So you can look at all 2^{2k^2} = O(1) subsets to determine if there’s a cover of the size you want. If you find a cover of the kernel, you add back in all the large-degree vertices you deleted and you’re done.

Now, even for small k this is a pretty bad algorithm (k=5 gives 2^{50} subsets to inspect), but with more detailed analysis you can do significantly better. In particular, the best known bound reduces vertex cover to a kernel of size 2k - c \log(k) vertices for any constant c you specify. Getting \log(k) vertices is known to imply P = NP, and with more detailed complexity assumptions it’s even hard to get a graph with fewer than O(k^{2-\varepsilon}) edges for any \varepsilon > 0. These are all relatively recent results whose associated papers I have not read.

Even with these hardness results, there are two reasons why this kind of analysis is useful. The first is that it gives us a clearer picture of the complexity of these problems. In particular, the reduction we showed for vertex cover gives a time O(2^{2k^2} + n + m)-time algorithm, which you can then compare directly to the trivial O(n^k) time brute force algorithm and measure the difference. Indeed, if k = o(\sqrt{(k/2) log(n)}) then the kernelized approach is faster.

The second reason is that the kernel approach usually results in simple and quick checks for negative answers to a problem. In particular, if you want to check for k-sized set covers in a graph in the real world, this analysis shows that the first thing you should do is check if the kernel has size > k^2. If so, you can immediately give a “no” answer. So useful kernels can provide insight into the structure of a problem that can be turned into heuristic tools even when it doesn’t help you solve the problem exactly.

So now let’s just see the prevailing definition of a “kernelization” of a problem. This comes from the text of Downey and Fellows.

Definition: kernelization of a parameterized problem L (formally, a language where each string x is paired with a positive integer k) is a \textup{poly}(|x|, k)-time algorithm that converts instances (x,k) into instances (x', k') with the following three properties.

  • (x,k) is a yes instance of L if and only if (x', k') is.
  • |x'| \leq f(k) for some computable function f: \mathbb{N} \to \mathbb{N}.
  • k' \leq g(k) for some computable function g: \mathbb{N} \to \mathbb{N}.

The output (x', k') is called a kernel, and the problem is said to admit a polynomial kernel if f(k) = O(k^c) for some constant c.

So we showed that vertex cover admits a polynomial kernel (in fact, a quadratic one).

Now the nice theorem is that a problem is fixed-parameter tractable if and only if it admits a polynomial kernel. Finding a kernel is conceptually easier because, like in vertex cover, it allows you to introduce additional assumptions on the structure of the instances you’re working with. But more importantly from a theoretical standpoint, measuring the size and complexity of kernels for NP-hard problems gives us a way to discriminate among problems within NP. That and the chance to get some more practical tools for NP-hard problems makes parameterized complexity more interesting than it sounds at first.

Until next time!