An Update on “Coloring Resilient Graphs”

A while back I announced a preprint of a paper on coloring graphs with certain resilience properties. I’m pleased to announce that it’s been accepted to the Mathematical Foundations of Computer Science 2014, which is being held in Budapest this year. Since we first published the preprint we’ve actually proved some additional results about resilience, and so I’ll expand some of the details here. I think it makes for a nicer overall picture, and in my opinion it gives a little more justification that resilient coloring is interesting, at least in contrast to other resilience problems.

Resilient SAT

Recall that a “resilient” yes-instance of a combinatorial problem is one which remains a yes-instance when you add or remove some constraints. The way we formalized this for SAT was by fixing variables to arbitrary values. Then the question is how resilient does an instance need to be in order to actually find a certificate for it? In more detail,

Definition: $ r$-resilient $ k$-SAT formulas are satisfiable formulas in $ k$-CNF form (conjunctions of clauses, where each clause is a disjunction of three literals) such that for all choices of $ r$ variables, every way to fix those variables yields a satisfiable formula.

For example, the following 3-CNF formula is 1-resilient:

$ \displaystyle (a \vee b \vee c) \wedge (a \vee \overline{b} \vee \overline{c}) \wedge (\overline{a} \vee \overline{b} \vee c)$

The idea is that resilience may impose enough structure on a SAT formula that it becomes easy to tell if it’s satisfiable at all. Unfortunately for SAT (though this is definitely not the case for coloring), there are only two possibilities. Either the instances are so resilient that they never existed in the first place (they’re vacuously trivial), or the instances are NP-hard. The first case is easy: there are no $ k$-resilient $ k$-SAT formulas. Indeed, if you’re allowed to fix $ k$ variables to arbitrary values, then you can just pick a clause and set all its variables to false. So no formula can ever remain satisfiable under that condition.

The second case is when the resilience is strictly less than the clause size, i.e. $ r$-resilient $ k$-SAT for $ 0 \leq r < k$. In this case the problem of finding a satisfying assignment is NP-hard. We’ll show this via a sequence of reductions which start at 3-SAT, and they’ll involve two steps: increasing the clause size and resilience, and decreasing the clause size and resilience. The trick is in balancing which parts are increased and decreased. I call the first step the “blowing up” lemma, and the second part the “shrinking down” lemma.

Blowing Up and Shrinking Down

Here’s the intuition behind the blowing up lemma. If you give me a regular (unresilient) 3-SAT formula $ \varphi$, what I can do is make a copy of $ \varphi$ with a new set of variables and OR the two things together. Call this $ \varphi^1 \vee \varphi^2$. This is clearly logically equivalent to the original formula; if you give me a satisfying assignment for the ORed thing, I can just see which of the two clauses are satisfied and use that sub-assignment for $ \varphi$, and conversely if you can satisfy $ \varphi$ it doesn’t matter what truth values you choose for the new set of variables. And further you can transform the ORed formula into a 6-SAT formula in polynomial time. Just apply deMorgan’s rules for distributing OR across AND.

Now the choice of a new set of variables allows us to give some resilient. If you fix one variable to the value of your choice, I can always just work with the other set of variables. Your manipulation doesn’t change the satisfiability of the ORed formula, because I’ve added all of this redundancy. So we took a 3-SAT formula and turned it into a 1-resilient 6-SAT formula.

The idea generalizes to the blowing up lemma, which says that you can measure the effects of a blowup no matter what you start with. More formally, if $ s$ is the number of copies of variables you make, $ k$ is the clause size of the starting formula $ \varphi$, and $ r$ is the resilience of $ \varphi$, then blowing up gives you an $ [(r+1)s – 1]$-resilient $ (sk)$-SAT formula. The argument is almost identical to the example above the resilience is more general. Specifically, if you fix fewer than $ (r+1)s$ variables, then the pigeonhole principle guarantees that one of the $ s$ copies of variables has at most $ r$ fixed values, and we can just work with that set of variables (i.e., this small part of the big ORed formula is satisfiable if $ \varphi$ was $ r$-resilient).

The shrinking down lemma is another trick that is similar to the reduction from $ k$-SAT to 3-SAT. There you take a clause like $ v \vee w \vee x \vee y \vee z$ and add new variables $ z_i$ to break up the clause in to clauses of size 3 as follows:

$ \displaystyle (v \vee w \vee z_1) \wedge (\neg z_1 \vee x \vee z_2) \wedge (\neg z_2 \vee y \vee z)$

These are equivalent because your choice of truth values for the $ z_i$ tell me which of these sub-clauses to look for a true literal of the old variables. I.e. if you choose $ z_1 = T, z_2 = F$ then you have to pick either $ y$ or $ z$ to be true. And it’s clear that if you’re willing to double the number of variables (a linear blowup) you can always get a $ k$-clause down to an AND of 3-clauses.

So the shrinking down reduction does the same thing, except we only split clauses in half. For a clause $ C$, call $ C[:k/2]$ the first half of a clause and $ C[k/2:]$ the second half (you can see how my Python training corrupts my notation preference). Then to shrink a clause $ C_i$ down from size $ k$ to size $ \lceil k/2 \rceil + 1$ (1 for the new variable), add a variable $ z_i$ and break $ C_i$ into

$ \displaystyle (C_i[:k/2] \vee z_i) \wedge (\neg z_i \vee C[k/2:])$

and just AND these together for all clauses. Call the original formula $ \varphi$ and the transformed one $ \psi$. The formulas are logically equivalent for the same reason that the $ k$-to-3-SAT reduction works, and it’s already in the right CNF form. So resilience is all we have to measure. The claim is that the resilience is $ q = \min(r, \lfloor k/2 \rfloor)$, where $ r$ is the resilience of $ \varphi$.

The reason for this is that if all the fixed variables are old variables (not $ z_i$), then nothing changes and the resilience of the original $ \phi$ keeps us safe. And each $ z_i$ we fix has no effect except to force us to satisfy a variable in one of the two halves. So there is this implication that if you fix a $ z_i$ you have to also fix a regular variable. Because we can’t guarantee anything if we fix more than $ r$ regular variables, we’d have to stop before fixing $ r$ of the $ z_i$. And because these new clauses have size $ k/2 + 1$, we can’t do this more than $ k/2$ times or else we risk ruining an entire clause. So this give the definition of $ q$. So this proves the shrinking down lemma.

Resilient SAT is always hard

The blowing up and shrinking down lemmas can be used to show that $ r$-resilient $ k$-SAT is NP-hard for all $ r < k$. What we do is reduce from 3-SAT to an $ r$-resilient $ k$-SAT instance in such a way that the 3-SAT formula is satisfiable if and only if the transformed formula is resiliently satisfiable.

What makes these two lemmas work together is that shrinking down shrinks the clause size just barely less than the resilience, and blowing up increases resilience just barely more than it increases clause size. So we can combine these together to climb from 3-SAT up to some high resilience and satisfiability, and then iteratively shrink down until we hit our target.

One might worry that it will take an exponential number of reductions (or a few reductions of exponential size) to get from 3-SAT to the $ (r,k)$ of our choice, but we have a construction that does it in at most four steps, with only a linear initial blowup from 3-SAT to $ r$-resilient $ 3(r+1)$-SAT. Then, to deal with the odd ceilings and floors in the shrinking down lemma, you have to find a suitable larger $ k$ to reduce to (by padding with useless variables, which cannot make the problem easier). And you choose this $ k$ so that you only need at most two applications of shrinking down to get to $ (k-1)$-resilient $ k$-SAT. Our preprint has the gory details (which has an inelegant part that is not worth writing here), but in the end you show that $ (k-1)$-resilient $ k$-SAT is hard, and since that’s the maximal amount of resilience before the problem becomes vacuously trivial, all smaller resilience values are also hard.

So how does this relate to coloring?

I’m happy about this result not just because it answers an open question I’m honestly curious about, but also because it shows that resilient coloring is more interesting. Basically this proves that satisfiability is so hard that no amount of resilience can make it easier in the worst case. But coloring has a gradient of difficulty. Once you get to order $ k^2$ resilience for $ k$-colorable graphs, the coloring problem can be solved efficiently by a greedy algorithm (and it’s not a vacuously empty class of graphs). Another thing on the side is that we use the hardness of resilient SAT to get the hardness results we have for coloring.

If you really want to stretch the implications, you might argue that this says something like “coloring is somewhat easier than SAT,” because we found a quantifiable axis along which SAT remains difficult while coloring crumbles. The caveat is that fixing colors of vertices is not exactly comparable to fixing values of truth assignments (since we are fixing lots of instances by fixing a variable), but at least it’s something concrete.

Coloring is still mostly open, and recently I’ve been going to talks where people are discussing startlingly similar ideas for things like Hamiltonian cycles. So that makes me happy.

Until next time!

Midwest Theory (of Computing) Day at Purdue University

I’ll be giving a talk at Purdue University on Saturday, May 3 as part of the 65th Midwest Theory Day. If any readers happen to live in West Lafayette, Indiana and are interested in hearing about some of my recent research, you can register for free by April 28 (one week from today). Lunch and snacks are provided, and the other talks will certainly be interesting too.

Here’s the title and abstract for my talk:

Resilient Coloring and Other Combinatorial Problems

A good property of a problem instance is that it’s easy to solve. And even better property is resilience: that the instance remains easy to solve under arbitrary (but minor) perturbations. We informally define the resilience of an instance of a combinatorial problem, and discuss recent work on resilient promise problems, including resilient satisfiability and resilient graph coloring.
Hope to see you there!

On Coloring Resilient Graphs

I’m pleased to announce that another paper of mine is finished. This one just got accepted to MFCS 2014, which is being held in Budapest this year (this whole research thing is exciting!). This is joint work with my advisor, Lev Reyzin. As with my first paper, I’d like to explain things here on my blog a bit more informally than a scholarly article allows.

A Recent History of Graph Coloring

One of the first important things you learn when you study graphs is that coloring graphs is hard. Remember that coloring a graph with $ k$ colors means that you assign each vertex a color (a number in $ \left \{ 1, 2, \dots, k \right \}$) so that no vertex is adjacent to a vertex of the same color (no edge is monochromatic). In fact, even deciding whether a graph can be colored with just $ 3$ colors (not to mention finding such a coloring) has no known polynomial time algorithm. It’s what’s called NP-hard, which means that almost everyone believes it’s hopeless to solve efficiently in the worst case.

One might think that there’s some sort of gradient to this problem, that as the graphs get more “complicated” it becomes algorithmically harder to figure out how colorable they are. There are some notions of “simplicity” and “complexity” for graphs, but they hardly fall on a gradient. Just to give the reader an idea, here are some ways to make graph coloring easy:

  • Make sure your graph is planar. Then deciding 4-colorability is easy because the answer is always yes.
  • Make sure your graph is triangle-free and planar. Then finding a 3-coloring is easy.
  • Make sure your graph is perfect (which again requires knowledge about how colorable it is).
  • Make sure your graph has tree-width or clique-width bounded by a constant.
  • Make sure your graph doesn’t have a certain kind of induced subgraph (such as having no induced paths of length 4 or 5).

Let me emphasize that these results are very difficult and tricky to compare. The properties are inherently discrete (either perfect or imperfect, planar or not planar). The fact that the world has not yet agreed upon a universal measure of complexity for graphs (or at least one that makes graph coloring easy to understand) is not a criticism of the chef but a testament to the challenge and intrigue of the dish.

Coloring general graphs is much bleaker, where the focus has turned to approximations. You can’t “approximate” the answer to whether a graph is colorable, so now the key here is that we are actually trying to find an approximate coloring. In particular, if you’re given some graph $ G$ and you don’t know the minimum number of colors needed to color it (say it’s $ \chi(G)$, this is called the chromatic number), can you easily color it with what turns out to be, say, $ 2 \chi(G)$ colors?

Garey and Johnson (the gods of NP-hardness) proved this problem is again hard. In fact, they proved that you can’t do better than twice the number of colors. This might not seem so bad in practice, but the story gets worse. This lower bound was improved by Zuckerman, building on the work of Håstad, to depend on the size of the graph! That is, unless $ P=NP$, all efficient algorithms will use asymptotically more than $ \chi(G) n^{1 – \varepsilon}$ colors for any $ \varepsilon > 0$ in the worst case, where $ n$ is the number of vertices of $ G$. So the best you can hope for is being off by something like a multiplicative factor of $ n / \log n$. You can actually achieve this (it’s nontrivial and takes a lot of work), but it carries that aura of pity for the hopeful graph colorer.

The next avenue is to assume you know the chromatic number of your graph, and see how well you can do then. For example: if you are given the promise that a graph $ G$ is 3-colorable, can you efficiently find a coloring with 8 colors? The best would be if you could find a coloring with 4 colors, but this is already known to be NP-hard.

The best upper bounds, algorithms to find approximate colorings of 3-colorable graphs, also pitifully depend on the size of the graph. Remember I say pitiful not to insult the researchers! This decades-long line of work was extremely difficult and deserves the highest praise. It’s just frustrating that the best known algorithm to color a 3-colorable graph requires as many as $ n^{0.2}$ colors. At least it bypasses the barrier of $ n^{1 – \varepsilon}$ mentioned above, so we know that knowing the chromatic number actually does help.

The lower bounds are a bit more hopeful; it’s known to be NP-hard to color a $ k$-colorable graph using $ 2^{\sqrt[3]{k}}$ colors if $ k$ is sufficiently large. There are a handful of other linear lower bounds that work for all $ k \geq 3$, but to my knowledge this is the best asymptotic result. The big open problem (which I doubt many people have their eye on considering how hard it seems) is to find an upper bound depending only on $ k$. I wonder offhand whether a ridiculous bound like $ k^{k^k}$ colors would be considered progress, and I bet it would.

Our Idea: Resilience

So without big breakthroughs on the front of approximate graph coloring, we propose a new front for investigation. The idea is that we consider graphs which are not only colorable, but remain colorable under the adversarial operation of adding a few new edges. More formally,

Definition: A graph $ G = (V,E)$ is called $ r$-resiliently $ k$-colorable if two properties hold

  1. $ G$ is $ k$-colorable.
  2. For any set $ E’$ of $ r$ edges disjoint from $ E$, the graph $ G’ = (V, E \cup E’)$ is $ k$-colorable.

The simplest nontrivial example of this is 1-resiliently 3-colorable graphs. That is a graph that is 3-colorable and remains 3-colorable no matter which new edge you add. And the question we ask of this example: is there a polynomial time algorithm to 3-color a 1-resiliently 3-colorable graph? We prove in our paper that this is actually NP-hard, but it’s not a trivial thing to see.

The chief benefit of thinking about resiliently colorable graphs is that it provides a clear gradient of complexity from general graphs (zero-resilient) to the empty graph (which is $ (\binom{k+1}{2} – 1)$-resiliently $ k$-colorable). We know that the most complex case is NP-hard, and maximally resilient graphs are trivially colorable. So finding the boundary where resilience makes things easy can shed new light on graph coloring.

Indeed, we argue in the paper that lots of important graphs have stronger resilience properties than one might expect. For example, here are the resilience properties of some famous graphs.

From left to right: the Petersen graph, 2-resiliently 3-colorable; the Dürer graph, 4-resiliently 4-colorable; the Grötzsch graph, 4-resiliently 4-colorable; and the Chvátal graph, 3-resiliently 4-colorable. These are all maximally resilient (no graph is more resilient than stated) and chromatic (no graph is colorable with fewer colors)

From left to right: the Petersen graph, 2-resiliently 3-colorable; the Dürer graph, 4-resiliently 4-colorable; the Grötzsch graph, 4-resiliently 4-colorable; and the Chvátal graph, 3-resiliently 4-colorable. These are all maximally resilient (no graph is more resilient than stated) and chromatic (no graph is colorable with fewer colors)

If I were of a mind to do applied graph theory, I would love to know about the resilience properties of graphs that occur in the wild. For example, the reader probably knows the problem of register allocation is a natural graph coloring problem. I would love to know the resilience properties of such graphs, with the dream that they might be resilient enough on average to admit efficient coloring algorithms.

Unfortunately the only way that I know how to compute resilience properties is via brute-force search, and of course this only works for small graphs and small $ k$. If readers are interested I could post such a program (I wrote it in vanilla python), but for now I’ll just post a table I computed on the proportion of small graphs that have various levels of resilience (note this includes graphs that vacuously satisfy the definition).

Percentage of k-colorable graphs on 6 vertices which are n-resilient
k\n       1       2       3       4
  ----------------------------------------
3       58.0    22.7     5.9     1.7
4       93.3    79.3    58.0    35.3
5       99.4    98.1    94.8    89.0
6      100.0   100.0   100.0   100.0

Percentage of k-colorable graphs on 7 vertices which are n-resilient
k\n       1       2       3       4
  ----------------------------------------
3       38.1     8.2     1.2     0.3
4       86.7    62.6    35.0    14.9
5       98.7    95.6    88.5    76.2
6       99.9    99.7    99.2    98.3

Percentage of k-colorable graphs on 8 vertices which are n-resilient
k\n       1       2       3       4
  ----------------------------------------
3       21.3     2.1     0.2     0.0
4       77.6    44.2    17.0     4.5

The idea is this: if this trend continues, that only some small fraction of all 3-colorable graphs are, say, 2-resiliently 3-colorable graphs, then it should be easy to color them. Why? Because resilience imposes structure on the graphs, and that structure can hopefully be realized in a way that allows us to color easily. We don’t know how to characterize that structure yet, but we can give some structural implications for sufficiently resilient graphs.

For example, a 7-resiliently 5-colorable graph can’t have any subgraphs on 6 vertices with $ \binom{6}{2} – 7$ edges, or else we can add enough edges to get a 6-clique which isn’t 5-colorable. This gives an obvious general property about the sizes of subgraphs in resilient graphs, but as a more concrete instance let’s think about 2-resilient 3-colorable graphs $ G$. This property says that no set of 4 vertices may have more than $ 4 = \binom{4}{2} – 2$ edges in $ G$. This rules out 4-cycles and non-isolated triangles, but is it enough to make 3-coloring easy? We can say that $ G$ is a triangle-free graph and a bunch of disjoint triangles, but it’s known 3-colorable non-planar triangle-free graphs can have arbitrarily large chromatic number, and so the coloring problem is hard. Moreover, 2-resilience isn’t enough to make $ G$ planar. It’s not hard to construct a non-planar counterexample, but proving it’s 2-resilient is a tedious task I relegated to my computer.

Speaking of which, the problem of how to determine whether a $ k$-colorable graph is $ r$-resiliently $ k$-colorable is open. Is this problem even in NP? It certainly seems not to be, but if it had a nice characterization or even stronger necessary conditions than above, we might be able to use them to find efficient coloring algorithms.

In our paper we begin to fill in a table whose completion would characterize the NP-hardness of coloring resilient graphs

table

The known complexity of k-coloring r-resiliently k-colorable graphs

Ignoring the technical notion of 2-to-1 hardness, the paper accomplishes this as follows. First, we prove some relationships between cells. In particular, if a cell is NP-hard then so are all the cells to the left and below it. So our Theorem 1, that 3-coloring 1-resiliently 3-colorable graphs is NP-hard, gives us the entire black region, though more trivial arguments give all except the (3,1) cell. Also, if a cell is in P (it’s easy to $ k$-color graphs with that resilience), then so are all cells above and to its right. We prove that $ k$-coloring $ \binom{k}{2}$-resiliently $ k$-colorable graphs is easy. This is trivial: no vertex may have degree greater than $ k-1$, and the greedy algorithm can color such graphs with $ k$ colors. So that gives us the entire light gray region.

There is one additional lower bound comes from the fact that it’s NP-hard to $ 2^{\sqrt[3]{k}}$-color a $ k$-colorable graph. In particular, we prove that if you have any function $ f(k)$ that makes it NP-hard to $ f(k)$-color a $ k$-colorable graph, then it is NP-hard to $ f(k)$-color an $ (f(k) – k)$-resiliently $ f(k)$-colorable graph. The exponential lower bound hence gives us a nice linear lower bound, and so we have the following “sufficiently zoomed out” picture of the table

zoomed-out

The zoomed out version of the classification table above.

The paper contains the details of how these observations are proved, in addition to the NP-hardness proof for 1-resiliently 3-colorable graphs. This leaves the following open problems:

  • Get an unconditional, concrete linear resilience lower bound for hardness.
  • Find an algorithm that colors graphs that are less resilient than $ O(k^2)$. Even determining specific cells like (4,5) or (5,9) would likely give enough insight for this.
  • Classify the tantalizing (3,2) cell (determine if it’s hard or easy to 3-color a 2-resiliently 3-colorable graph) or even better the (4,2) cell.
  • Find a way to relate resilient coloring back to general coloring. For example, if such and such cell is hard, then you can’t approximate k-coloring to within so many colors.

But Wait, There’s More!

Though this paper focuses on graph coloring, our idea of resilience doesn’t stop there (and this is one reason I like it so much!). One can imagine a notion of resilience for almost any combinatorial problem. If you’re trying to satisfy boolean formulas, you can define resilience to mean that you fix the truth value of some variable (we do this in the paper to build up to our main NP-hardness result of 3-coloring 1-resiliently 3-colorable graphs). You can define resilient set cover to allow the removal of some sets. And any other sort of graph-based problem (Traveling salesman, max cut, etc) can be resiliencified by adding or removing edges, whichever makes the problem more constrained.

So this resilience notion is quite general, though it’s hard to define precisely in a general fashion. There is a general framework called Constraint Satisfaction Problems (CSPs), but resilience here seem too general. A CSP is literally just a bunch of objects which can be assigned some set of values, and a set of constraints (k-ary 0-1-valued functions) that need to all be true for the problem to succeed. If we were to define resilience by “adding any constraint” to a given CSP, then there’s nothing to stop us from adding the negation of an existing constraint (or even the tautologically unsatisfiable constraint!). This kind of resilience would be a vacuous definition, and even if we try to rule out these edge cases, I can imagine plenty of weird things that might happen in their stead. That doesn’t mean there isn’t a nice way to generalize resilience to CSPs, but it would probably involve some sort of “constraint class” of acceptable constraints, and I don’t know a reasonable property to impose on the constraint class to make things work.

So there’s lots of room for future work here. It’s exciting to think where it will take me.

Until then!