Problem: Express a boolean logic formula using polynomials. I.e., if an input variable $x$ is set to $0$, that is interpreted as false, while $x = 1$ is interpreted as true. The output of the polynomial should be 1 or 0 according to whether the formula is true or false as a whole.
Solution: You can do this using a single polynomial.
Illustrating with an example: the formula is $\neg[(a \vee b) \wedge (\neg c \vee d)]$, also known as
not((a or b) and (not c or d))
The trick is to use multiplication for “and” and $1 - x$ for “not.” So $a \wedge b$ would be $ab$, and $\neg a$ would be $1 - a$. Indeed, if you have two binary variables $x$ and $y$ then $xy$ is 1 precisely when both are 1, and zero when either variable is zero. Likewise, $1 - x$ is one if $x$ is zero and zero if $x$ is one.
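Combine this with deMorgan’s rule to get any formula. $a \vee b = \neg(\neg a \wedge \neg b)$ translates to $1 - (1-a)(1-b)$. For our example above,
$f(a, b, c, d) = 1 - (1 - (1-a)(1-b))(1 - c(1-d))$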
Which expands to
$1 - a - b + ab + (1-d)(ac + bc - abc)$
If you plug in, for example, $a = 1, b = 0, c = 1, d = 0$ you get True in the original formula (because “not c or d” is False), and likewise the polynomial is
$1 - 1 - 0 + 0 + (1 - 0)(1 + 0 - 0) = 1$
You can verify the rest work yourself, using the following table as a guide:
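One way to check every case mechanically is to enumerate all sixteen assignments. The following is a minimal Python sketch, not part of the original derivation; the function names `formula`, `poly`, and `expanded` are made up for illustration, and `expanded` assumes the expansion $1 - a - b + ab + (1-d)(ac + bc - abc)$.

```python
from itertools import product

def formula(a, b, c, d):
    # The boolean formula not((a or b) and (not c or d)).
    return not ((a or b) and ((not c) or d))

def poly(a, b, c, d):
    # AND -> product, NOT x -> 1 - x, OR via De Morgan's rule.
    a_or_b = 1 - (1 - a) * (1 - b)
    notc_or_d = 1 - c * (1 - d)
    return 1 - a_or_b * notc_or_d

def expanded(a, b, c, d):
    # Expanded form of the same polynomial.
    return 1 - a - b + a * b + (1 - d) * (a * c + b * c - a * b * c)

# Print the truth table and check that all three agree on every input.
for a, b, c, d in product((0, 1), repeat=4):
    truth = int(formula(bool(a), bool(b), bool(c), bool(d)))
    assert poly(a, b, c, d) == expanded(a, b, c, d) == truth
    print(a, b, c, d, "->", truth)
```

The loop doubles as a truth table: each printed row lists the four inputs and the common value of the formula and the polynomial.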
Discussion: This trick is used all over CS theory to embed boolean logic within polynomials, and it makes the name “boolean algebra” obvious, because it’s just a subset of normal algebra.
Moreover, since boolean satisfiability (the problem of algorithmically determining if a boolean formula has a satisfying assignment, i.e. a choice of variables evaluating to true) is NP-hard, this can be used to show that certain problems relating to multivariable polynomials are also hard. For example, finding roots of multivariable polynomials (even if you knew nothing about algebraic geometry) is hard because you’d run into NP-hardness by simply considering the subset of polynomials coming from boolean formulas.
Here’s a more interesting example, related to the kinds of optimization problems that show up in modern machine learning. Say you want to optimize a polynomial subject to a set of quadratic equality constraints. This is NP-hard. Here’s why.
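Let $\varphi$ be a boolean formula, and $f_\varphi$ its corresponding polynomial. First, each variable $x_i$ used in the polynomial can be restricted to binary values via the constraint $x_i(x_i - 1) = 0$. Then the maximum of $f_\varphi$ subject to these constraints is 1 exactly when $\varphi$ is satisfiable, so an efficient optimizer would decide satisfiability.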
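To make the encoding concrete, here is a small brute-force sketch in Python. The formulas, the names `f_phi`, `g`, `feasible`, and `max_on_feasible`, and the candidate grid are all assumptions chosen for illustration; the constraint $x_i(x_i - 1) = 0$ is the usual way to force a real variable to be 0 or 1. The maximum of the polynomial over the feasible points is 1 exactly when the formula is satisfiable.

```python
from itertools import product

def f_phi(x1, x2, x3):
    # Polynomial for (x1 or x2) and (not x1 or x3) and (not x2 or not x3).
    c1 = 1 - (1 - x1) * (1 - x2)
    c2 = 1 - x1 * (1 - x3)
    c3 = 1 - x2 * x3
    return c1 * c2 * c3

def g(x):
    # Polynomial for the unsatisfiable formula (x and not x).
    return x * (1 - x)

def feasible(point):
    # Quadratic equality constraints x_i * (x_i - 1) = 0, one per variable.
    return all(x * (x - 1) == 0 for x in point)

def max_on_feasible(f, n, grid=(-1, 0, 0.5, 1, 2)):
    # Brute force over a small candidate grid; only 0/1 points survive.
    points = [p for p in product(grid, repeat=n) if feasible(p)]
    return max(f(*p) for p in points)

print(max_on_feasible(f_phi, 3))  # 1: the formula is satisfiable
print(max_on_feasible(g, 1))      # 0: no satisfying assignment exists
```

This is exponential-time enumeration, of course; the point is only that a fast optimizer for this kind of problem would decide satisfiability.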
You can even show NP-hardness if the target function to optimize is only quadratic. As an exercise, one can express the subset sum problem as a quadratic programming problem using similar choices for the constraints. According to this writeup you can even express subset sum as a quadratic program with linear constraints.
The moral of the story is simply that multivariable polynomials can encode arbitrary boolean logic.
A while back I announced a preprint of a paper on coloring graphs with certain resilience properties. I’m pleased to announce that it’s been accepted to the Mathematical Foundations of Computer Science 2014, which is being held in Budapest this year. Since we first published the preprint we’ve actually proved some additional results about resilience, and so I’ll expand some of the details here. I think it makes for a nicer overall picture, and in my opinion it gives a little more justification that resilient coloring is interesting, at least in contrast to other resilience problems.
Recall that a “resilient” yes-instance of a combinatorial problem is one which remains a yes-instance when you add or remove some constraints. The way we formalized this for SAT was by fixing variables to arbitrary values. Then the question is how resilient does an instance need to be in order to actually find a certificate for it? In more detail,
Definition: $r$-resilient $k$-SAT formulas are satisfiable formulas in $k$-CNF form (conjunctions of clauses, where each clause is a disjunction of $k$ literals) such that for all choices of $r$ variables, every way to fix those variables yields a satisfiable formula.
For example, the following 3-CNF formula is 1-resilient:
The idea is that resilience may impose enough structure on a SAT formula that it becomes easy to tell if it’s satisfiable at all. Unfortunately for SAT (though this is definitely not the case for coloring), there are only two possibilities. Either the instances are so resilient that they never existed in the first place (they’re vacuously trivial), or the instances are NP-hard. The first case is easy: there are no $k$-resilient $k$-SAT formulas. Indeed, if you’re allowed to fix $k$ variables to arbitrary values, then you can just pick a clause and fix its variables so that every literal in it is false. So no formula can ever remain satisfiable under that condition.
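For small formulas the definition can be checked by brute force. The sketch below is only an illustration under assumed conventions: clauses are lists of signed integers (`3` for a variable, `-3` for its negation), the function names are hypothetical, and the formula `phi` is an example chosen here, not necessarily one from the paper.

```python
from itertools import combinations, product

def satisfiable(clauses, n, fixed=None):
    """Brute-force SAT check over variables 1..n, with some pinned by `fixed`."""
    fixed = fixed or {}
    free = [v for v in range(1, n + 1) if v not in fixed]
    for values in product((False, True), repeat=len(free)):
        assign = dict(fixed)
        assign.update(zip(free, values))
        if all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def is_resilient(clauses, n, r):
    """True if every way of fixing any r variables leaves a satisfiable formula."""
    for chosen in combinations(range(1, n + 1), r):
        for values in product((False, True), repeat=r):
            if not satisfiable(clauses, n, dict(zip(chosen, values))):
                return False
    return True

# (a or b or c) and (a or not b or not c) and (not a or not b or c)
phi = [[1, 2, 3], [1, -2, -3], [-1, -2, 3]]
print(is_resilient(phi, 3, 1))  # True
print(is_resilient(phi, 3, 3))  # False: fixing all of a clause's literals to false kills it
```

The second call illustrates the vacuous case: with resilience equal to the clause size, an adversary can always falsify a whole clause.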
The second case is when the resilience is strictly less than the clause size, i.e. $r$-resilient $k$-SAT for $r < k$. In this case the problem of finding a satisfying assignment is NP-hard. We’ll show this via a sequence of reductions which start at 3-SAT, and they’ll involve two steps: increasing the clause size and resilience, and decreasing the clause size and resilience. The trick is in balancing which parts are increased and decreased. I call the first step the “blowing up” lemma, and the second part the “shrinking down” lemma.
Blowing Up and Shrinking Down
Here’s the intuition behind the blowing up lemma. If you give me a regular (unresilient) 3-SAT formula $\varphi$, what I can do is make a copy of $\varphi$ with a new set of variables and OR the two things together. Call this $\varphi^1 \vee \varphi^2$. This is clearly satisfiable exactly when the original formula is; if you give me a satisfying assignment for the ORed thing, I can just see which of the two copies is satisfied and use that sub-assignment for $\varphi$, and conversely if you can satisfy $\varphi$ it doesn’t matter what truth values you choose for the new set of variables. And further you can transform the ORed formula into a 6-SAT formula in polynomial time. Just apply the distributive law for OR across AND.
Now the choice of a new set of variables gives us some resilience. If you fix one variable to the value of your choice, I can always just work with the other set of variables. Your manipulation doesn’t change the satisfiability of the ORed formula, because I’ve added all of this redundancy. So we took a 3-SAT formula and turned it into a 1-resilient 6-SAT formula.
The idea generalizes to the blowing up lemma, which says that you can measure the effects of a blowup no matter what you start with. More formally, if $s$ is the number of copies of variables you make, $k$ is the clause size of the starting formula $\varphi$, and $r$ is the resilience of $\varphi$, then blowing up gives you an $[(r+1)s - 1]$-resilient $(sk)$-SAT formula. The argument is almost identical to the example above, except the resilience is more general. Specifically, if you fix fewer than $(r+1)s$ variables, then the pigeonhole principle guarantees that one of the $s$ copies of variables has at most $r$ fixed values, and we can just work with that set of variables (i.e., this small part of the big ORed formula is satisfiable if $\varphi$ was $r$-resilient).
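The blow-up construction can be sketched in a few lines of Python. This is an illustration under assumptions, not the paper’s code: clauses are lists of signed integers, copy number `j` of a formula on `n` variables uses variables `v + j*n`, and the names `blow_up`, `satisfiable`, and the example `phi` are hypothetical. Distributing OR across AND produces one output clause for every way of picking one clause from each copy.

```python
from itertools import product

def blow_up(clauses, n, s):
    """OR together s variable-disjoint copies of a CNF and distribute.

    A k-CNF with m clauses becomes an (s*k)-CNF with m**s clauses,
    which is polynomial in m for a constant number of copies s.
    """
    def shift(lit, j):
        return lit + j * n if lit > 0 else lit - j * n
    copies = [[[shift(l, j) for l in c] for c in clauses] for j in range(s)]
    return [sum(choice, []) for choice in product(*copies)], s * n

def satisfiable(clauses, n, fixed=None):
    """Brute-force SAT check with the variables in `fixed` pinned."""
    fixed = fixed or {}
    free = [v for v in range(1, n + 1) if v not in fixed]
    for values in product((False, True), repeat=len(free)):
        assign = {**fixed, **dict(zip(free, values))}
        if all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

# A 3-CNF that forces variable 1 to be true, so it is not 1-resilient.
phi = [[1, 2, 3], [1, 2, -3], [1, -2, 3], [1, -2, -3]]
big, big_n = blow_up(phi, 3, 2)

print(satisfiable(phi, 3, {1: False}))      # False
print(satisfiable(big, big_n, {1: False}))  # True: the second copy still works
```

Fixing variable 1 breaks the original formula, but the blown-up formula survives because the untouched copy can still be satisfied.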
The shrinking down lemma is another trick that is similar to the reduction from $k$-SAT to 3-SAT. There you take a clause like $x_1 \vee x_2 \vee x_3 \vee x_4 \vee x_5$ and add new variables $z_i$ to break up the clause into clauses of size 3 as follows:
$(x_1 \vee x_2 \vee z_1) \wedge (\neg z_1 \vee x_3 \vee z_2) \wedge (\neg z_2 \vee x_4 \vee x_5)$
These are equivalent because your choice of truth values for the $z_i$ tells me in which of these sub-clauses to look for a true literal of the old variables. I.e. if you choose $z_1 = F$ then you have to pick either $x_1$ or $x_2$ to be true. And it’s clear that if you’re willing to double the number of variables (a linear blowup) you can always get a $k$-clause down to an AND of 3-clauses.
So the shrinking down reduction does the same thing, except we only split clauses in half. For a clause $C$, call $C[:k/2]$ the first half of a clause and $C[k/2:]$ the second half (you can see how my Python training corrupts my notation preference). Then to shrink a clause $C_i$ down from size $k$ to size $\lceil k/2 \rceil + 1$ (1 for the new variable), add a variable $z_i$ and break $C_i$ into
$(C_i[:k/2] \vee z_i) \wedge (\neg z_i \vee C_i[k/2:])$
and just AND these together for all clauses. Call the original formula $\varphi$ and the transformed one $\psi$. The formulas are logically equivalent for the same reason that the $k$-to-3-SAT reduction works, and it’s already in the right CNF form. So resilience is all we have to measure. The claim is that the resilience is $q = \min(r, \lfloor k/2 \rfloor)$, where $r$ is the resilience of $\varphi$.
The reason for this is that if all the fixed variables are old variables (not $z_i$), then nothing changes and the resilience of the original $\varphi$ keeps us safe. And each $z_i$ we fix has no effect except to force us to satisfy a variable in one of the two halves. So there is this implication that if you fix a $z_i$ you have to also fix a regular variable. Because we can’t guarantee anything if we fix more than $r$ regular variables, we’d have to stop before fixing $r$ of the $z_i$. And because these new clauses have size $k/2 + 1$, we can’t do this more than $k/2$ times or else we risk ruining an entire clause. So this gives the definition of $q$, and it proves the shrinking down lemma.
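The clause-splitting step is short enough to write out. The following Python sketch is an illustration only: clauses are lists of signed integers, the fresh variable for clause number `i` is numbered `n + i + 1`, and the name `shrink_down` is hypothetical. It performs the transformation but does not measure resilience.

```python
def shrink_down(clauses, n):
    """Split each clause in half, linking the halves with a fresh variable.

    Clause C_i becomes (C_i[:h] or z_i) and (not z_i or C_i[h:]),
    where h = len(C_i) // 2. For clauses of size k the new clauses
    have size at most ceil(k / 2) + 1.
    """
    out = []
    for i, clause in enumerate(clauses):
        z = n + i + 1            # fresh variable for this clause
        h = len(clause) // 2
        out.append(clause[:h] + [z])
        out.append([-z] + clause[h:])
    return out, n + len(clauses)

six, n6 = shrink_down([[1, 2, 3, 4, 5, 6]], 6)
print(six)   # [[1, 2, 3, 7], [-7, 4, 5, 6]]

five, n5 = shrink_down([[1, 2, 3, 4, 5]], 5)
print(five)  # [[1, 2, 6], [-6, 3, 4, 5]]
```

For odd clause sizes the two halves are unequal, which is where the floors and ceilings in the lemma come from.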
Resilient SAT is always hard
The blowing up and shrinking down lemmas can be used to show that $r$-resilient $k$-SAT is NP-hard for all $r < k$. What we do is reduce from 3-SAT to an $r$-resilient $k$-SAT instance in such a way that the 3-SAT formula is satisfiable if and only if the transformed formula is resiliently satisfiable.
What makes these two lemmas work together is that shrinking down shrinks the clause size just barely less than the resilience, and blowing up increases resilience just barely more than it increases clause size. So we can combine these together to climb from 3-SAT up to some high resilience and clause size, and then iteratively shrink down until we hit our target.
One might worry that it will take an exponential number of reductions (or a few reductions of exponential size) to get from 3-SAT to the $(r, k)$ of our choice, but we have a construction that does it in at most four steps, with only a linear initial blowup from 3-SAT to $r$-resilient $3(r+1)$-SAT. Then, to deal with the odd ceilings and floors in the shrinking down lemma, you have to find a suitable larger $k$ to reduce to (by padding with useless variables, which cannot make the problem easier). And you choose this $k$ so that you only need at most two applications of shrinking down to get to $(k-1)$-resilient $k$-SAT. Our preprint has the gory details (which have an inelegant part that is not worth writing here), but in the end you show that $(k-1)$-resilient $k$-SAT is hard, and since that’s the maximal amount of resilience before the problem becomes vacuously trivial, all smaller resilience values are also hard.
So how does this relate to coloring?
I’m happy about this result not just because it answers an open question I’m honestly curious about, but also because it shows that resilient coloring is more interesting. Basically this proves that satisfiability is so hard that no amount of resilience can make it easier in the worst case. But coloring has a gradient of difficulty. Once you get to order $k^2$ resilience for $k$-colorable graphs, the coloring problem can be solved efficiently by a greedy algorithm (and it’s not a vacuously empty class of graphs). Another thing on the side is that we use the hardness of resilient SAT to get the hardness results we have for coloring.
If you really want to stretch the implications, you might argue that this says something like “coloring is somewhat easier than SAT,” because we found a quantifiable axis along which SAT remains difficult while coloring crumbles. The caveat is that fixing colors of vertices is not exactly comparable to fixing values of truth assignments (since we are fixing lots of instances by fixing a variable), but at least it’s something concrete.
Coloring is still mostly open, and recently I’ve been going to talks where people are discussing startlingly similar ideas for things like Hamiltonian cycles. So that makes me happy.
In a previous post we introduced a learning model called Probably Approximately Correct (PAC). We saw an example of a concept class that was easy to learn: intervals on the real line (and more generally, if you did the exercise, axis-aligned rectangles in a fixed dimension).
One of the primary goals of studying models of learning is to figure out what is learnable and what is not learnable in the various models. So as a technical aside in our study of learning theory, this post presents the standard example of a problem that isn’t learnable in the PAC model we presented last time. Afterward we’ll see that allowing the learner to be more expressive can be helpful, and by doing so we can make this unlearnable problem learnable.
Addendum: This post is dishonest in the following sense. The original definition I presented of PAC-learning is not considered the “standard” version, precisely because it forces the learning algorithm to produce hypotheses from the concept class it’s trying to learn. As this post shows, that prohibits us from learning concept classes that should be easy to learn. So to quell any misconceptions, we’re not saying that 3-term DNF formulas (defined below) are not PAC-learnable, just that they’re not PAC-learnable under the definition we gave in the previous post. In other words, we’ve set up a straw man (or, done some good mathematics) in order to illustrate why we need to add the extra bit about hypothesis classes to the definition at the end of this post.
3-Term DNF Formulas
Readers of this blog will probably have encountered a boolean formula before. A boolean formula is just a syntactic way to describe some condition (like, exactly one of these two things has to be true) using variables and logical connectives. The best way to recall it is by example: the following boolean formula encodes the “exclusive or” of two variables.
$(x \wedge \overline{y}) \vee (\overline{x} \wedge y)$
The wedge $\wedge$ denotes a logical AND and the vee $\vee$ denotes a logical OR. A bar above a variable represents a negation of a variable. (Please don’t ask me why the official technical way to write AND and OR is in all caps, I feel like I’m yelling math at people.)
In general a boolean formula has literals, which we can always denote by an $x_i$ or the negation $\overline{x_i}$, and connectives $\wedge$ and $\vee$, and parentheses to denote order. It’s a simple fact that any logical formula can be encoded using just these tools, but rather than try to learn general boolean formulas we look at formulas in a special form.
Definition: A formula is in three-term disjunctive normal form (DNF) if it has the form $C_1 \vee C_2 \vee C_3$ where each $C_i$ is an AND of some number of literals.
Readers who enjoyed our P vs NP primer will recall a related form of formulas: the 3-CNF form, where the “three” meant that each clause had exactly three literals and the “C” means the clauses are connected with ANDs. This is a sort of dual normal form: there are only three clauses, each clause can have any number of variables, and the roles of AND and OR are switched. In fact, if you just distribute the $\vee$’s in a 3-term DNF formula using the distributive law, you’ll get an equivalent 3-CNF formula. The restriction of our hypotheses to 3-term DNFs will be the crux of the difficulty: it’s not that we can’t learn DNF formulas, we just can’t learn them if we are forced to express our hypothesis as a 3-term DNF as well.
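The conversion from a 3-term DNF to a 3-CNF is just the distributive law applied to every choice of one literal per term. Here is a minimal Python sketch under assumed conventions: a term or clause is a list of signed integers, an assignment is a dict from variable number to bool, and the function names and the example `terms` are hypothetical.

```python
from itertools import product

def dnf3_to_cnf3(t1, t2, t3):
    """Distribute OR over AND: (T1 or T2 or T3) becomes the AND of
    (a or b or c) over every choice of a in T1, b in T2, c in T3."""
    return [list(choice) for choice in product(t1, t2, t3)]

def eval_dnf(terms, x):
    return any(all(x[abs(l)] == (l > 0) for l in t) for t in terms)

def eval_cnf(clauses, x):
    return all(any(x[abs(l)] == (l > 0) for l in c) for c in clauses)

# (x1 and not x2) or (x2 and x3) or (not x1 and not x3)
terms = ([1, -2], [2, 3], [-1, -3])
cnf = dnf3_to_cnf3(*terms)

# The two formulas agree on all eight assignments.
for values in product((False, True), repeat=3):
    x = dict(zip((1, 2, 3), values))
    assert eval_dnf(terms, x) == eval_cnf(cnf, x)
print(len(cnf), "clauses, each with 3 literals")
```

Some of the resulting clauses repeat a variable or contain a literal and its negation; they are harmless, and the number of clauses is the product of the term sizes, so it stays polynomial.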
The way we’ll prove that 3-term DNF formulas “can’t be learned” in the PAC model is by an NP-hardness reduction. That is, we’ll show that if we could learn 3-term DNFs in the PAC model, then we’d be able to efficiently solve NP-hard problems with high probability. The official conjecture we’d be violating is that RP is different from NP. RP is the class of problems that you can solve in polynomial time with randomness if you can never have false positives, and the probability of a false negative is at most 1/2. Our “RP” algorithm will be a PAC-learning algorithm.
The NP-complete problem we’ll reduce from is graph 3-coloring. So if you give me a graph, I’ll produce an instance of the 3-term DNF PAC-learning problem in such a way that finding a hypothesis with low error corresponds to a valid 3-coloring of the graph. Since PAC-learning ensures that you are highly likely to find a low-error hypothesis, the existence of a PAC-learning algorithm will constitute an RP algorithm to solve this NP-complete problem.
In more detail, an “instance” of the 3-term DNF problem comes in the form of a distribution over some set of labeled examples. In this case the “set” is the set of all possible truth assignments to the variables, where we fix the number of variables to suit our needs, along with a choice of a target 3-term DNF to be learned. Then you’d have to define the distribution over these examples.
But we’ll actually do something a bit slicker. We’ll take our graph $G$, we’ll construct a set $S_G$ of labeled truth assignments, and we’ll define the distribution $D$ to be the uniform distribution over those truth assignments used in $S_G$. Then, if there happens to be a 3-term DNF that coincidentally labels the truth assignments in $S_G$ exactly how we labeled them, and we set the allowed error $\varepsilon$ to be small enough, a PAC-learning algorithm will find a consistent hypothesis (and it will correspond to a valid 3-coloring of $G$). Otherwise, no algorithm would be able to come up with a low-error hypothesis, so if our purported learning algorithm outputs a bad hypothesis we’d be certain (with high probability) that it was not bad luck but that the examples are not consistent with any 3-term DNF (and hence there is no valid 3-coloring of $G$).
This general outline has nothing to do with graphs, and so you may have guessed that the technique is commonly used to prove learning problems are hard: come up with a set of labeled examples, and a purported PAC-learning algorithm would have to come up with a hypothesis consistent with all the examples, which translates back to a solution to your NP-hard problem.
Now we can describe the reduction from graphs to labeled examples. The intuition is simple: each term in the 3-term DNF should correspond to a color class, and so any two adjacent vertices should correspond to an example that cannot be true. The clauses will correspond to…
For a graph $G$ with $n$ nodes $v_1, \dots, v_n$ and a set of undirected edges $E$, we construct a set of examples with positive labels $S^+$ and one with negative examples $S^-$. The examples are truth assignments to $n$ variables, which we label $x_1, \dots, x_n$, and we identify a truth assignment with the $\{0,1\}$-valued vector $(x_1, x_2, \dots, x_n)$ in the usual way (true is 1, false is 0).
The positive examples $S^+$ are simple: for each $v_i$ add a truth assignment $x_i = F, x_j = T$ for $j \neq i$. I.e., the binary vector is $(1, \dots, 1, 0, 1, \dots, 1)$, and the zero is in the $i$-th position.
The negative examples $S^-$ come from the edges. For each edge $(v_i, v_j) \in E$, we add the example with a zero in the $i$-th and $j$-th components and ones everywhere else. Here is an example graph and the corresponding positive and negative examples:
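As a small computational illustration, the construction can be written as follows. This is a sketch under assumptions: vertices are numbered from 0 rather than 1, the function name `examples_from_graph` is hypothetical, and the graph (a triangle with one extra vertex hanging off it) is an example chosen here, not the one from the original figure.

```python
def examples_from_graph(n, edges):
    """Return (positive, negative) examples for a graph on vertices 0..n-1."""
    # One positive example per vertex: a single zero at that vertex.
    positive = [tuple(0 if k == i else 1 for k in range(n)) for i in range(n)]
    # One negative example per edge: zeros at both endpoints.
    negative = [tuple(0 if k in (i, j) else 1 for k in range(n)) for i, j in edges]
    return positive, negative

# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
pos, neg = examples_from_graph(4, [(0, 1), (1, 2), (0, 2), (2, 3)])
print(pos)  # [(0, 1, 1, 1), (1, 0, 1, 1), (1, 1, 0, 1), (1, 1, 1, 0)]
print(neg)  # [(0, 0, 1, 1), (1, 0, 0, 1), (0, 1, 0, 1), (1, 1, 0, 0)]
```

The number of examples is the number of vertices plus the number of edges, so the reduction is clearly polynomial in the size of the graph.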
Claim: $G$ is 3-colorable if and only if the corresponding examples are consistent with some 3-term DNF formula $\varphi$.
Again, consistent just means that $\varphi$ is satisfied by every truth assignment in $S^+$ and unsatisfied by every example in $S^-$. Since we chose our distribution to be uniform over $S^+ \cup S^-$, we don’t care what $\varphi$ does elsewhere.
Indeed, if $G$ is three-colorable we can fix some valid 3-coloring with colors red, blue, and yellow. We can construct a 3-term DNF $\varphi$ that does what we need. Let $T_R$ be the AND of all the literals $x_i$ for which vertex $v_i$ is not red. For each red vertex $v_i$, the corresponding example in $S^+$ will satisfy $T_R$, because we put a zero in the $i$-th position and ones everywhere else, and $x_i$ does not appear in $T_R$. Similarly, no example in $S^-$ will make $T_R$ true because to do so both vertices in the corresponding edge would have to be red.
To drive this last point home say there are three vertices and your edge is $(v_1, v_2)$. Then the corresponding negative example is $(0, 0, 1)$. Unless both $v_1$ and $v_2$ are colored red, one of $x_1, x_2$ will have to be ANDed as part of $T_R$. But the example has a zero for both $x_1$ and $x_2$, so $T_R$ would not be satisfied.
Do the same thing for blue and yellow, and OR the terms together to get $\varphi = T_R \vee T_B \vee T_Y$. Since the argument is symmetric in the colors, we get a consistent 3-term DNF.
On the other hand, say there is a consistent 3-term DNF $\varphi$. We need to construct a three coloring of $G$. It goes in largely the same way: label the clauses $\varphi = T_R \vee T_B \vee T_Y$ for Red, Blue, and Yellow, and then color a vertex $v_i$ the color of the clause that is satisfied by the corresponding example in $S^+$. There must be some clause that does this because $\varphi$ is consistent with $S^+$, and if there are multiple you can pick a valid color arbitrarily. Now we argue why no edge can be monochromatic. Suppose there were such an edge $(v_i, v_j)$, and both $v_i$ and $v_j$ are colored, say, blue. Look at the clause $T_B$: since $v_i$ and $v_j$ are both blue, the positive examples corresponding to those vertices (with a 0 in the single index and 1’s everywhere else) both make $T_B$ true. Since those two positive examples differ in both their $i$-th and $j$-th positions, $T_B$ can’t have any of the literals $x_i, \overline{x_i}, x_j, \overline{x_j}$. But then the negative example for the edge would satisfy $T_B$ because it has 1’s everywhere except $i, j$! This means that the formula doesn’t consistently classify the negative examples, a contradiction. This proves the Claim.
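Both directions of the Claim can be exercised on a tiny graph. The Python below is a sketch under assumptions, not the proof itself: vertices are numbered from 0 while variables are numbered from 1, a term is a list of signed integers, and the graph, the coloring, and all function names are hypothetical choices made for illustration.

```python
def dnf_from_coloring(coloring, colors=("R", "B", "Y")):
    """Term for color c: AND of x_i over every vertex NOT colored c."""
    return {c: [i + 1 for i, col in enumerate(coloring) if col != c] for c in colors}

def satisfies(term, example):
    # Literal l > 0 needs a 1 in position l; l < 0 needs a 0 in position -l.
    return all((example[abs(l) - 1] == 1) == (l > 0) for l in term)

def dnf_value(terms, example):
    return any(satisfies(t, example) for t in terms.values())

def coloring_from_dnf(terms, n):
    """Color each vertex by some term its positive example satisfies."""
    coloring = []
    for i in range(n):
        example = tuple(0 if k == i else 1 for k in range(n))
        coloring.append(next(c for c, t in terms.items() if satisfies(t, example)))
    return coloring

# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
pos = [tuple(0 if k == i else 1 for k in range(4)) for i in range(4)]
neg = [tuple(0 if k in e else 1 for k in range(4)) for e in edges]

terms = dnf_from_coloring(["R", "B", "Y", "R"])
assert all(dnf_value(terms, x) for x in pos)      # every positive example accepted
assert not any(dnf_value(terms, x) for x in neg)  # every negative example rejected

recovered = coloring_from_dnf(terms, 4)
assert all(recovered[i] != recovered[j] for i, j in edges)
print(recovered)  # ['R', 'B', 'Y', 'R']
```

The first pair of assertions is the forward direction (coloring to consistent DNF), and the last one is the reverse direction (consistent DNF back to a proper coloring).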
Now we just need to show a few more details to finish the proof. In particular, we need to observe that the number of examples we generate is polynomial in the size of the graph $G$; that the learning algorithm would still run in polynomial time in the size of the input graph (indeed, this depends on our choice of the learning parameters); and that we only need to pick $\delta < 1/2$ and $\varepsilon \leq 1/(2|S^+ \cup S^-|)$ in order to enforce that an efficient PAC-learner would generate a hypothesis consistent with all the examples. Indeed, if a hypothesis errs on even one example, it will have error at least $1/|S^+ \cup S^-|$, which is too big.
Everything’s not Lost
This might seem a bit depressing for PAC-learning, that we can’t even hope to learn 3-term DNF formulas. But we will give a sketch of why this is mostly not a problem with PAC but a problem with DNFs.
In particular, the difficulty comes in forcing a PAC-learning algorithm to express its hypothesis as a 3-term DNF, as opposed to what we might argue is a more natural representation. As we observed, distributing the ORs in a 3-term DNF produces a 3-CNF formula (an AND of clauses where each clause is an OR of exactly three literals). Indeed, one can PAC-learn 3-CNF formulas efficiently, and it suffices to show that one can learn formulas which are just ANDs of literals. Then you can blow up the number of variables only polynomially larger to get 3-CNFs. ANDs of literals are just called “conjunctions,” so the problem is to PAC-learn conjunctions. The idea that works is the same one as in our first post on PAC where we tried to learn intervals: just pick the “smallest” hypothesis that is consistent with all the examples you’ve seen so far. We leave a formal proof as an (involved) exercise to the reader.
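The “smallest consistent hypothesis” idea for conjunctions can be sketched as the standard elimination procedure: start with the conjunction of all $2n$ literals and delete every literal that some positive example falsifies. The Python below is an illustration under assumptions (literals as signed integers, hypothetical function names, and a made-up sample labeled by the target $x_1 \wedge \overline{x_3}$); it is not a proof of PAC-learnability.

```python
def learn_conjunction(n, examples):
    """Return the most specific conjunction consistent with the positive examples."""
    # Start with every literal: x_1..x_n and their negations.
    literals = set(range(1, n + 1)) | set(range(-n, 0))
    for x, label in examples:
        if label:
            # Keep only the literals this positive example satisfies.
            literals = {l for l in literals if (x[abs(l) - 1] == 1) == (l > 0)}
    return literals

def predict(literals, x):
    return all((x[abs(l) - 1] == 1) == (l > 0) for l in literals)

# Sample labeled by the hypothetical target x1 AND (NOT x3).
sample = [
    ((1, 0, 0), True),
    ((1, 1, 0), True),
    ((0, 1, 0), False),
    ((1, 1, 1), False),
]
h = learn_conjunction(3, sample)
print(sorted(h))  # [-3, 1], i.e. x1 AND (NOT x3)
assert all(predict(h, x) == label for x, label in sample)
```

Negative examples are never used to update the hypothesis; the learned conjunction only ever accepts points the target also accepts, so it rejects them automatically.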
The important thing to note is that a concept class $\mathsf{C}$ (the thing we’re trying to learn) might be hard to learn if you’re constrained to work within $\mathsf{C}$. If you’re allowed more expressive hypotheses (in this case, arbitrary boolean formulas), then learning $\mathsf{C}$ suddenly becomes tractable. This compels us to add an additional caveat to the PAC definition from our first post.
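Definition: A concept class $\mathsf{C}$ over a set $X$ is efficiently PAC-learnable using the hypothesis class $\mathsf{H}$ if there exists an algorithm $A(\varepsilon, \delta)$ with access to a query function for $\mathsf{C}$ and runtime $O(\text{poly}(1/\varepsilon, 1/\delta))$, such that for all $c \in \mathsf{C}$, all distributions $D$ over $X$, and all $0 < \delta, \varepsilon < 1/2$, the probability that $A$ produces a hypothesis $h \in \mathsf{H}$ with error at most $\varepsilon$ is at least $1 - \delta$.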
And with that we’ll end this extended side note. The next post in this series will introduce and analyze a fascinating notion of dimension for concept classes, the Vapnik-Chervonenkis dimension.