Parameterizing the Vertex Cover Problem

I'm presenting a paper later this week at the Mathematical Foundations of Computer Science 2014 conference in Budapest, Hungary. This conference is an interesting mix of logic and algorithms that aims to bring together researchers from these areas to discuss their work. And right away the first session on the first day focused on an area I know is important but have little experience with: fixed parameter complexity. From what I understand it's not that popular a topic at major theory conferences in the US (there appears to be only one paper on it at this year's FOCS conference), but the basic ideas are worth knowing.

The basic idea is pretty simple: some hard computational problems become easier (read, polynomial-time solvable) if you fix some parameters involved to constants. Preferably small constants. For example, finding a clique of size $ k$ in a graph is NP-hard when $ k$ is part of the input, but if you fix $ k$ to a constant then you can check all possible subsets of size $ k$ in $ O(n^k)$ time. This is kind of a silly example because there are much faster ways to find triangles than checking all $ O(n^3)$ subsets of vertices, but part of the point of fixed-parameter complexity is to find the fastest algorithms in these fixed-parameter settings. Since in practice parameters are often small [citation needed], this analysis can provide useful practical algorithmic alternatives to heuristics or approximate solutions.
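
To make the brute force concrete, here's a minimal Python sketch (the function name and graph representation are my own, just for illustration) that looks for a clique of a fixed size $ k$ by checking every subset of $ k$ vertices:

```python
from itertools import combinations

def has_clique(vertices, edges, k):
    """Brute force: check every one of the O(n^k) subsets of k vertices.

    `edges` is a set of frozensets {u, v} representing undirected edges.
    """
    return any(
        all(frozenset(pair) in edges for pair in combinations(subset, 2))
        for subset in combinations(vertices, k)
    )

# A triangle with one pendant vertex attached: it has a 3-clique but no 4-clique.
V = [1, 2, 3, 4]
E = {frozenset(e) for e in [(1, 2), (2, 3), (1, 3), (3, 4)]}
print(has_clique(V, E, 3), has_clique(V, E, 4))  # True False
```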

One important tool in the theory of fixed-parameter tractability is the idea of a kernel. I think it’s an unfortunate term because it’s massively overloaded in mathematics, but the idea is to take a problem instance with the parameter $ k$, and carve out “easy” regions of the instance (often reducing $ k$ as you go) until the runtime of the trivial brute force algorithm only depends on $ k$ and not on the size of the input. The point is that the solution you get on this “carved out” instance is either the same as the original, or can be extended back to the original with little extra work. There is a more formal definition we’ll state, but there is a canonical example that gives a great illustration.

Consider the vertex cover problem. That is, you give me a graph $ G = (V,E)$ and a number $ k$ and I have to determine if there is a subset of $ \leq k$ vertices of $ G$ that touches every edge in $ E$. This problem turns out to be fixed-parameter tractable. As with $ k$-clique, one can naively check all subsets of size $ k$, but the kernel approach we'll show now is much smarter.

What you do is the following. As long as your graph has a vertex of degree $ > k$, you remove it and reduce $ k$ by 1. This is because a vertex of degree $ > k$ must be in any vertex cover of size at most $ k$: if it's not, then you need to include all of its neighbors to cover its edges, but it has more than $ k$ neighbors and your cover is only allowed $ k$ vertices. And so you can automatically put this high-degree vertex in your cover, and use induction on the smaller graph.

Once you can't remove any more vertices there are two cases. In the case that there are more than $ k^2$ edges remaining, you output that there is no vertex cover of the required size. Indeed, if you only get $ k$ vertices in your cover and you removed all vertices of degree $ > k$, then each can cover at most $ k$ edges, giving a total of at most $ k^2$. Otherwise, if there are at most $ k^2$ edges, then you can remove all the isolated vertices and show that there are only $ \leq 2k^2$ vertices left. This is because each edge touches only two vertices, so in the worst case they're all distinct. This smaller subgraph is called a kernel of the vertex cover instance, and the fact that its size depends only on $ k$ is the key. So you can look at all $ 2^{2k^2} = O(1)$ subsets to determine if there's a cover of the size you want. If you find a cover of the kernel, you add back in all the high-degree vertices you deleted and you're done.
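
Here's a rough Python sketch of the whole procedure, assuming the graph is handed to us as a dictionary mapping each vertex to the set of its neighbors (the representation and function names are mine, not from any particular library):

```python
from itertools import combinations

def kernelize_vertex_cover(adj, k):
    """Apply the high-degree rule: any vertex of degree > k must be in the cover.

    `adj` maps each vertex to the set of its neighbors. Returns the reduced
    graph and the reduced budget, or None if the instance is provably a "no".
    """
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    while k >= 0:
        v = next((u for u in adj if len(adj[u]) > k), None)
        if v is None:
            break
        for u in adj[v]:               # v is forced into the cover, so delete it
            adj[u].discard(v)
        del adj[v]
        k -= 1
    adj = {v: nbrs for v, nbrs in adj.items() if nbrs}  # drop isolated vertices
    edges = {frozenset((u, w)) for u in adj for w in adj[u]}
    if k < 0 or len(edges) > k * k:    # more than k^2 edges left: answer is "no"
        return None
    return adj, k

def has_vertex_cover(adj, k):
    """Decide whether a cover of size <= k exists, by brute force on the kernel."""
    kernel = kernelize_vertex_cover(adj, k)
    if kernel is None:
        return False
    kernel_adj, k = kernel
    edges = {frozenset((u, w)) for u in kernel_adj for w in kernel_adj[u]}
    vertices = list(kernel_adj)
    return any(
        all(edge & set(cover) for edge in edges)
        for size in range(min(k, len(vertices)) + 1)
        for cover in combinations(vertices, size)
    )

# A star with five leaves has a vertex cover of size 1 (the center vertex).
star = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}}
print(has_vertex_cover(star, 1))  # True
```

The brute force at the end only ever ranges over subsets of the kernel, so the exponential part of the cost depends on $ k$ alone.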

Now, even for small $ k$ this is a pretty bad algorithm ($ k=5$ gives $ 2^{50}$ subsets to inspect), but with more detailed analysis you can do significantly better. In particular, the best known bound reduces vertex cover to a kernel of size $ 2k - c \log(k)$ vertices for any constant $ c$ you specify. Getting a kernel with only $ O(\log(k))$ vertices is known to imply P = NP, and under more detailed complexity assumptions it's even impossible to get a kernel with $ O(k^{2-\varepsilon})$ edges for any $ \varepsilon > 0$. These are all relatively recent results whose associated papers I have not read.

Even with these hardness results, there are two reasons why this kind of analysis is useful. The first is that it gives us a clearer picture of the complexity of these problems. In particular, the reduction we showed for vertex cover gives an $ O(2^{2k^2} + n + m)$-time algorithm, which you can then compare directly to the trivial $ O(n^k)$-time brute force algorithm and measure the difference. Indeed, if $ k = o(\log(n))$ then the kernelized approach is faster.
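
To see why, compare the dominant terms: $ 2^{2k^2} \leq n^k$ exactly when $ 2k^2 \leq k \log_2(n)$, that is, when $ k \leq \frac{1}{2} \log_2(n)$.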

The second reason is that the kernel approach usually results in simple and quick checks for negative answers to a problem. In particular, if you want to check for vertex covers of size $ k$ in a graph in the real world, this analysis shows that the first thing you should do is check whether the reduced graph has more than $ k^2$ edges. If so, you can immediately give a "no" answer. So useful kernels can provide insight into the structure of a problem that can be turned into heuristic tools, even when they don't help you solve the problem exactly.

So now let’s just see the prevailing definition of a “kernelization” of a problem. This comes from the text of Downey and Fellows.

Definition: A kernelization of a parameterized problem $ L$ (formally, a language where each string $ x$ is paired with a positive integer $ k$) is a $ \textup{poly}(|x|, k)$-time algorithm that converts instances $ (x,k)$ into instances $ (x', k')$ with the following three properties.

  • $ (x,k)$ is a yes instance of $ L$ if and only if $ (x', k')$ is.
  • $ |x'| \leq f(k)$ for some computable function $ f: \mathbb{N} \to \mathbb{N}$.
  • $ k' \leq g(k)$ for some computable function $ g: \mathbb{N} \to \mathbb{N}$.

The output $ (x', k')$ is called a kernel, and the problem is said to admit a polynomial kernel if $ f(k) = O(k^c)$ for some constant $ c$.

So we showed that vertex cover admits a polynomial kernel (in fact, a quadratic one).

Now the nice theorem is that a (decidable) problem is fixed-parameter tractable if and only if it admits a kernel, though not necessarily a polynomial one. Finding a kernel is often conceptually easier than designing a fast parameterized algorithm directly because, like in vertex cover, it allows you to introduce additional assumptions on the structure of the instances you're working with. But more importantly from a theoretical standpoint, measuring the size and complexity of kernels for NP-hard problems gives us a way to discriminate among problems within NP. That and the chance to get some more practical tools for NP-hard problems make parameterized complexity more interesting than it sounds at first.

Until next time!

Conferences, Summer Work, and an Advisor

I've been spending a little less time on my blog recently than I'd like to, but for good reason: I've been attending two weeks of research conferences, I'm getting ready for a summer internship in cybersecurity, and I've finally chosen an advisor.

Visions, STOC, and CCC

The Simons Institute at UC Berkeley

I've been taking a break from the Midwest for the last two weeks to attend some of this year's major computer science theory conferences. The first (which is already over) was the Simons Institute Symposium on the Visions of the Theory of Computing. It wasn't a typical computer science theory conference, because the Simons Institute exists to bring techniques from computer science to other fields. So there were many talks in the life sciences, ranging from neurobiology (modelling the brain as a computer) to nanochemistry (using DNA to engineer computers) to physics (quantum computing). These kinds of sciences are interesting to me, but pretty far beyond my knowledge base. Other talks were on social networks (and the web), designing markets to be efficient, and computational privacy.

All of these talks shared a common theme: how can we apply the computational "lens" to everything? Interestingly, there were some very deep philosophical questions that computational insight can give answers to. For instance, Christos Papadimitriou gave a talk on evolution and sex, and raised an interesting question: what is the place of sex in evolution? Briefly, the problem with sex is as follows. Say you have a perfect human: perfect health, perfect genes, perfect intelligence, perfect everything. Because of sex, all of this human's offspring are guaranteed to be imperfect. If the "goal" of evolution is to promote the fitness of a species, it seems that sexual reproduction works against us. Why, then, is sexual reproduction the norm and asexual reproduction the exception?

Indeed, the same patterns hold in computer science. Genetic algorithms are the analogue of sexual reproduction, while simulated annealing (roughly, local search with random mutations) is the analogue of asexual reproduction. And there we can actually experiment, and (without insulting anyone too heavily) it's largely the case that the asexual methods outperform the sexual ones.

But with computational (mathematical) models, we can actually analyze these processes to see what's going on. And in fact, genetic algorithms are not optimizing a fitness function. Instead, they are simultaneously optimizing fitness and entropy in the gene pool. One might argue this allows for maximal adaptability, and there are some connections to learning-theory methods that mirror evolution (see the Multiplicative Weights Update Algorithm, sketched below).
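
For the curious, here's a bare-bones Python sketch of the multiplicative weights update rule (the function name, learning rate, and toy loss sequence are my own inventions, purely for illustration): a pool of "experts" carries weights, and each round every weight is scaled down in proportion to the loss its expert incurred.

```python
import random

def multiplicative_weights(losses, eta=0.5):
    """Multiplicative weights update: penalize each expert's weight by its loss.

    `losses[t][i]` is the loss (in [0, 1]) of expert i in round t, and `eta`
    is the learning rate. Returns the final weights as a distribution.
    """
    weights = [1.0] * len(losses[0])
    for round_losses in losses:
        weights = [w * (1 - eta * loss) for w, loss in zip(weights, round_losses)]
    total = sum(weights)
    return [w / total for w in weights]

# Toy example: expert 0 usually suffers small losses, expert 1 large ones.
random.seed(0)
toy_losses = [[0.2 * random.random(), 0.2 + 0.8 * random.random()] for _ in range(50)]
print(multiplicative_weights(toy_losses))  # almost all weight ends up on expert 0
```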

This is pretty cool to think about, but more deeply there are philosophical claims about how we can apply the computational lens to other fields. The idea is that computer scientists can create and analyze generative models which reflect the empirical phenomena we observe. Indeed, this mentality is in our blood. It stems from (in the words of my advisor) a “deeply religious” view that everything is computation. And even beyond “big data” and other popular trends in the news, researchers in other fields are increasingly appreciative of the algorithmic mindset computer scientists have to offer.

The other two conferences, the Symposium on the Theory of Computing (STOC) and the Conference on Computational Complexity (CCC), are much more technical. STOC is a whirlwind of a conference, with around 15 twenty-minute talks per day (30 in total, with two running in parallel at a time). And I admit that I understand very little. As I write this I'm lost during a talk on an online version of the Steiner tree problem. But even so, it gives a great glimpse into the frontier of research, and tells me what things I should be looking at in my own work.

One new notion that seems omnipresent at STOC this year is the concept of differential privacy. The idea stems from recent controversies in which "anonymized" database information released to the public was de-anonymized in clever ways. So the question becomes: is it possible to release useful information from a database without revealing much about any individual entry? "Information" could mean simple things like means and medians, or more complicated things like principal components. It's a question about the mechanism for releasing data, and it's inherently a computational problem: we want it to be (probabilistically) hard to accurately reconstruct the underlying data from the information provided. At a very high level, the idea is to add random noise to the data as you compute, and modify your methods to be resilient to such noise. There is a flurry of work being done on this topic, with a wealth of open problems waiting to be tackled. A related idea is that of homomorphic encryption, in which the user sends encrypted data to the compute server, the compute server computes things without decrypting the data, and then the user decrypts the result of the computation locally. The server, then, can have computational power far beyond that of the user, but the user need not relinquish sensitive information to access that power.

A schematic example of differential privacy at work. Image credit: Microsoft Research.
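
As a toy illustration of that "add noise as you compute" idea, here is a sketch of the standard Laplace mechanism for releasing a count (the function name and the little dataset are my own, and this is just the textbook mechanism, not anything specific from the talks):

```python
import random

def private_count(records, predicate, epsilon=0.5):
    """Release a count with Laplace noise of scale (sensitivity / epsilon).

    A counting query has sensitivity 1: adding or removing one record changes
    the true answer by at most 1. Adding Laplace(1/epsilon) noise then gives
    epsilon-differential privacy. Laplace noise is sampled here as the
    difference of two exponential random variables with the same mean.
    """
    true_count = sum(1 for record in records if predicate(record))
    scale = 1.0 / epsilon                        # sensitivity / epsilon
    noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
    return true_count + noise

# Toy query against a made-up database: how many ages are over 40?
ages = [23, 45, 31, 67, 52, 38, 49]
print(private_count(ages, lambda age: age > 40))  # true answer is 4, plus noise
```

The noise scale grows as the privacy parameter epsilon shrinks, so stronger privacy guarantees mean noisier answers.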

Another popular topic (which I know even less about) is the pair of ideas called communication complexity and information complexity. Roughly, this field is concerned with two communicating parties, each holding private information, trying to compute a function that depends on both of their inputs. The communication complexity question is how they can do this while spending the least amount of resources on communication. The information complexity concern is to keep as much information private as possible, both from the other party involved and from any nefarious outsiders listening in on the conversation.

It's amazing to me how much privacy concerns have permeated theoretical computer science. The usual picture of a mathematician is of a wiry-haired, chalk-dust-covered professor with no concern for the world outside his or her own mind. But in reality, we are actively working on the problems faced by average users, to protect them against the dangers of small groups of individuals making bad decisions with their sensitive data. Anyone worried about the upcoming dystopia posed by the masses of available data is apparently not informed about the magical powers of modern cryptography. It seems like it will only be a few years before these kinds of systems become feasible for large-scale applications. Maybe I'll be lucky enough to have a hand in that 🙂

The CCC starts Wednesday, and it will likely be concerned with more "pure" kinds of theoretical computer science research. I'm pretty excited for it, too (although I know by the end I will be exhausted; I'm already feeling the strain today).

MIT Lincoln Lab

This summer (actually, in a few days), I'll be at MIT's Lincoln Lab to work on problems in cybersecurity. I still don't know quite what I'll be working on (and I probably won't be at liberty to describe it when I do find out), but vaguely the problem is finding the right representations for massive amounts of data so that one can reason about it.

I know that the idea of the government analyzing the world's internet activity is a scary thought, but I'm going to do my best to go into this with an open mind. At the very least, I'll be familiar enough with the kind of work they do there to decide whether it violates any of my principles. I had minor worries about this last year when I went to work at Lawrence Livermore, but by the time I got there it was quite obvious that my work would keep me far, far away from weapons development and closer to problems in renewable energy.

And at least it will be fun to live on the east coast for a little while. In anticipation I’ve joined the Boston Python Users group, so it will be fun to go to some of their project nights and talk math and programming with like-minded people in the area.

Advisor

Finally, I'm quite pleased to announce that I have secured an advisor for my PhD program! His name is Lev Reyzin, and he's a learning theorist. (For the last year or so I've been meaning to start talking about learning theory on this blog.) That means that I'm officially in the field of theoretical computer science.

I was a little bit late in choosing an advisor, mostly because I meandered around algebraic geometry and algebraic topology and logic before deciding that theoretical computer science was the right field for me.

It’s exciting to finally have an advisor because it implies a few things:

  • I’ve passed all of my prelim exams.
  • I can focus on one area and start to deeply understand important problems in theoretical computer science.
  • My graduate life will shift from mainly course-taking to mainly research. Research is much more fun.

Lev is also a good fit for me both because I'm interested in learning theory, and because he has nontrivial experience in industry-focused research. He's done work at Yahoo! Research and Google Research, in particular on improving their algorithms for placing news stories and ads on the front page. The formal name for this problem is the multi-armed bandit problem, and I'll definitely be writing about it on this blog. While I'd probably be totally happy doing pure learning theory, I think it's fun and worthwhile to look at wider applications. So I have a lot of optimism that the next few years will be fun and productive.

Until next time!