Parameterizing the Vertex Cover Problem

I’m presenting a paper later this week at Mathematical Foundations of Computer Science 2014 in Budapest, Hungary. This conference is an interesting mix of logic and algorithms that aims to bring together researchers from these areas to discuss their work. And right away the first session on the first day focused on an area I know is important but have little experience with: fixed parameter complexity. From what I understand it’s not that popular a topic at major theory conferences in the US (there appears to be only one paper on it at this year’s FOCS conference), but the basic ideas are worth knowing.

The basic idea is pretty simple: some hard computational problems become easier (read: polynomial-time solvable) if you fix some of the parameters involved to constants. Preferably small constants. For example, finding a clique of size k in a graph is NP-hard when k is part of the input, but if you fix k to a constant then you can check all possible subsets of size k in O(n^k) time. This is kind of a silly example because there are much faster ways to find triangles than checking all O(n^3) subsets of vertices, but part of the point of fixed-parameter complexity is to find the fastest algorithms in these fixed-parameter settings. Since in practice parameters are often small [citation needed], this analysis can provide useful practical algorithmic alternatives to heuristics or approximate solutions.
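To make the brute force concrete, here’s a minimal Python sketch of the fixed-k clique check. The adjacency-set representation and the function name are my own choices for illustration, not anything standardized.

```python
from itertools import combinations

def has_k_clique(neighbors, k):
    """Brute-force check for a clique of size k: try all O(n^k) subsets.

    `neighbors` is assumed to map each vertex to the set of its neighbors.
    """
    for candidate in combinations(neighbors, k):
        if all(v in neighbors[u] for u, v in combinations(candidate, 2)):
            return True
    return False
```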

One important tool in the theory of fixed-parameter tractability is the idea of a kernel. I think it’s an unfortunate term because it’s massively overloaded in mathematics, but the idea is to take a problem instance with the parameter k, and carve out “easy” regions of the instance (often reducing k as you go) until the runtime of the trivial brute force algorithm only depends on k and not on the size of the input. The point is that the solution you get on this “carved out” instance is either the same as the original, or can be extended back to the original with little extra work. There is a more formal definition we’ll state, but there is a canonical example that gives a great illustration.

Consider the vertex cover problem. That is, you give me a graph G = (V,E) and a number k, and I have to determine if there is a subset of \leq k vertices of G that touches all of the edges in E. This problem is fixed-parameter tractable because, as with k-clique, one can just check all subsets of size k. The kernel approach we’ll show now is much smarter.

What you do is the following. As long as your graph has a vertex of degree > k, you remove it and reduce k by 1. This is because a vertex of degree > k must be chosen for any vertex cover of size \leq k. If it’s not, then you need to include all of its neighbors to cover its edges, but there are more than k neighbors and your vertex cover is constrained to size k. And so you can automatically put this high-degree vertex in your cover, and use induction on the smaller graph.

Once you can’t remove any more vertices there are two cases. In the case that there are more than k^2 edges remaining, you output that there is no vertex cover. Indeed, if you only get k vertices in your cover and you removed all vertices of degree > k, then each can cover at most k edges, giving a total of at most k^2. Otherwise, if there are at most k^2 edges, then you can remove all the isolated vertices and show that at most 2k^2 vertices remain. This is because each edge touches only two vertices, so in the worst case they’re all distinct. This smaller subgraph is called a kernel of the vertex cover instance, and the fact that its size depends only on k is the key. So you can look at all 2^{2k^2} = O(1) subsets to determine if there’s a cover of the size you want. If you find a cover of the kernel, you add back in all the high-degree vertices you deleted and you’re done.
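Here’s a rough Python sketch of this kernelization, under the assumption that the graph is given as a set of two-element frozensets; the function name and the return convention are my own, chosen only to make the two reduction rules concrete.

```python
def vertex_cover_kernel(edges, k):
    """Kernelize (G, k) for vertex cover using the two rules above.

    Returns (kernel_edges, reduced_k, forced_vertices), or None when we can
    already conclude there is no cover of size at most k.
    """
    edges = set(edges)
    forced = set()  # high-degree vertices that must be in any small cover
    while True:
        degree = {}
        for e in edges:
            for v in e:
                degree[v] = degree.get(v, 0) + 1
        high = next((v for v, d in degree.items() if d > k), None)
        if high is None:
            break
        forced.add(high)                        # rule 1: take it, reduce k
        edges = {e for e in edges if high not in e}
        k -= 1
        if k < 0:
            return None
    if len(edges) > k * k:                      # rule 2: too many edges left
        return None
    return edges, k, forced
```

From here you would brute force over subsets of the at most 2k^2 vertices touching the kernel’s edges, and add the forced vertices back in to recover a cover of the original graph.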

Now, even for small k this is a pretty bad algorithm (k=5 gives 2^{50} subsets to inspect), but with more detailed analysis you can do significantly better. In particular, the best known bound reduces vertex cover to a kernel of size 2k - c \log(k) vertices for any constant c you specify. Getting \log(k) vertices is known to imply P = NP, and with more detailed complexity assumptions it’s even hard to get a graph with fewer than O(k^{2-\varepsilon}) edges for any \varepsilon > 0. These are all relatively recent results whose associated papers I have not read.

Even with these hardness results, there are two reasons why this kind of analysis is useful. The first is that it gives us a clearer picture of the complexity of these problems. In particular, the reduction we showed for vertex cover gives an O(2^{2k^2} + n + m)-time algorithm, which you can then compare directly to the trivial O(n^k)-time brute force algorithm and measure the difference. Indeed, comparing exponents, the kernelized approach wins whenever 2k^2 < k \log_2 n, i.e., whenever k < \frac{1}{2} \log_2 n.

The second reason is that the kernel approach usually results in simple and quick checks for negative answers to a problem. In particular, if you want to check for k-sized vertex covers in a graph in the real world, this analysis shows that the first thing you should do is run the reduction and check whether the kernel has more than k^2 edges. If so, you can immediately give a “no” answer. So useful kernels can provide insight into the structure of a problem that can be turned into heuristic tools even when it doesn’t help you solve the problem exactly.

So now let’s just see the prevailing definition of a “kernelization” of a problem. This comes from the text of Downey and Fellows.

Definition: A kernelization of a parameterized problem L (formally, a language where each string x is paired with a positive integer k) is a \textup{poly}(|x|, k)-time algorithm that converts instances (x,k) into instances (x', k') with the following three properties.

  • (x,k) is a yes instance of L if and only if (x', k') is.
  • |x'| \leq f(k) for some computable function f: \mathbb{N} \to \mathbb{N}.
  • k' \leq g(k) for some computable function g: \mathbb{N} \to \mathbb{N}.

The output (x', k') is called a kernel, and the problem is said to admit a polynomial kernel if f(k) = O(k^c) for some constant c.

So we showed that vertex cover admits a polynomial kernel (in fact, a quadratic one).

Now the nice theorem is that a (decidable) problem is fixed-parameter tractable if and only if it admits a kernel of some computable size; admitting a polynomial kernel is a stronger property. Finding a kernel is conceptually easier because, like in vertex cover, it allows you to introduce additional assumptions on the structure of the instances you’re working with. But more importantly from a theoretical standpoint, measuring the size and complexity of kernels for NP-hard problems gives us a way to discriminate among problems within NP. That and the chance to get some more practical tools for NP-hard problems makes parameterized complexity more interesting than it sounds at first.

Until next time!

An Update on “Coloring Resilient Graphs”

A while back I announced a preprint of a paper on coloring graphs with certain resilience properties. I’m pleased to announce that it’s been accepted to Mathematical Foundations of Computer Science 2014, which is being held in Budapest this year. Since we first published the preprint we’ve actually proved some additional results about resilience, and so I’ll expand on some of the details here. I think it makes for a nicer overall picture, and in my opinion it gives a little more justification that resilient coloring is interesting, at least in contrast to other resilience problems.

Resilient SAT

Recall that a “resilient” yes-instance of a combinatorial problem is one which remains a yes-instance when you add or remove some constraints. The way we formalized this for SAT was by fixing variables to arbitrary values. Then the question is: how resilient does an instance need to be before we can actually find a certificate for it efficiently? In more detail,

Definition: r-resilient k-SAT formulas are satisfiable formulas in k-CNF form (conjunctions of clauses, where each clause is a disjunction of k literals) such that for all choices of r variables, every way to fix those variables yields a satisfiable formula.

For example, the following 3-CNF formula is 1-resilient:

\displaystyle (a \vee b \vee c) \wedge (a \vee \overline{b} \vee \overline{c}) \wedge (\overline{a} \vee \overline{b} \vee c)

The idea is that resilience may impose enough structure on a SAT formula that it becomes easy to tell if it’s satisfiable at all. Unfortunately for SAT (though this is definitely not the case for coloring), there are only two possibilities. Either the instances are so resilient that they never existed in the first place (they’re vacuously trivial), or the problem is NP-hard. The first case is easy: there are no k-resilient k-SAT formulas. Indeed, if you’re allowed to fix k variables to arbitrary values, then you can just pick a clause and fix its variables so that every literal in it is false. So no formula can ever remain satisfiable under that condition.

The second case is when the resilience is strictly less than the clause size, i.e. r-resilient k-SAT for 0 \leq r < k. In this case the problem of finding a satisfying assignment is NP-hard. We’ll show this via a sequence of reductions which start at 3-SAT, and they’ll involve two steps: increasing the clause size and resilience, and decreasing the clause size and resilience. The trick is in balancing which parts are increased and decreased. I call the first step the “blowing up” lemma, and the second part the “shrinking down” lemma.

Blowing Up and Shrinking Down

Here’s the intuition behind the blowing up lemma. If you give me a regular (unresilient) 3-SAT formula \varphi, what I can do is make a copy of \varphi with a new set of variables and OR the two things together. Call this \varphi^1 \vee \varphi^2. This is clearly equivalent to the original formula for satisfiability purposes; if you give me a satisfying assignment for the ORed thing, I can just see which of the two copies is satisfied and use that sub-assignment for \varphi, and conversely if you can satisfy \varphi it doesn’t matter what truth values you choose for the new set of variables. And further you can transform the ORed formula into a 6-SAT formula in polynomial time. Just apply the distributive law to push the OR across the ANDs.

Now the choice of a new set of variables allows us to get some resilience. If you fix one variable to the value of your choice, I can always just work with the other set of variables. Your manipulation doesn’t change the satisfiability of the ORed formula, because I’ve added all of this redundancy. So we took a 3-SAT formula and turned it into a 1-resilient 6-SAT formula.

The idea generalizes to the blowing up lemma, which says that you can measure the effects of a blowup no matter what you start with. More formally, if s is the number of copies of the variables you make, k is the clause size of the starting formula \varphi, and r is the resilience of \varphi, then blowing up gives you an [(r+1)s - 1]-resilient (sk)-SAT formula. The argument is almost identical to the example above, but the resilience claim is more general. Specifically, if you fix fewer than (r+1)s variables, then the pigeonhole principle guarantees that one of the s copies of the variables has at most r fixed values, and we can just work with that set of variables (i.e., this small part of the big ORed formula is satisfiable if \varphi was r-resilient).
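As a sanity check, here’s a small Python sketch of the blowing up step. The clause encoding (a clause is a tuple of (variable, sign) pairs, and copy i renames a variable v to (v, i)) is an ad hoc choice of mine for illustration only.

```python
from itertools import product

def blow_up(clauses, s):
    """OR together s disjoint-variable copies of a CNF formula, then
    redistribute the OR over the ANDs to land back in CNF.

    Per the blowing up lemma, an r-resilient k-SAT input should become an
    [(r+1)s - 1]-resilient (sk)-SAT output.
    """
    copies = [
        [tuple(((var, i), sign) for var, sign in clause) for clause in clauses]
        for i in range(s)
    ]
    # distributivity: (AND_j C_j) OR (AND_j D_j) = AND_{j,j'} (C_j OR D_j')
    return [tuple(lit for clause in choice for lit in clause)
            for choice in product(*copies)]
```

(This naive distribution turns m clauses into m^s clauses, so it is only meant to illustrate the logic, not the efficiency of the construction in the paper.)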

The shrinking down lemma is another trick that is similar to the reduction from k-SAT to 3-SAT. There you take a clause like v \vee w \vee x \vee y \vee z and add new variables z_i to break up the clause into clauses of size 3 as follows:

\displaystyle (v \vee w \vee z_1) \wedge (\neg z_1 \vee x \vee z_2) \wedge (\neg z_2 \vee y \vee z)

These are equivalent because your choice of truth values for the z_i tells me in which of these sub-clauses to look for a true literal among the old variables. I.e. if you choose z_1 = T, z_2 = F then the middle clause forces you to make x true. And it’s clear that if you’re willing to roughly double the number of variables (a linear blowup) you can always get a k-clause down to an AND of 3-clauses.
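For reference, here’s what that standard chain-splitting looks like in code, using an ad hoc (variable, sign) encoding for literals; fresh is any callable that hands out unused variable names (both conventions are my own choices for this sketch).

```python
def split_into_3cnf(clause, fresh):
    """Split one long clause into an AND of 3-clauses chained together by
    new variables, as in the k-SAT to 3-SAT reduction sketched above."""
    clause = list(clause)
    if len(clause) <= 3:
        return [tuple(clause)]
    z = fresh()
    # (l1 OR l2 OR z) AND split(NOT z OR l3 OR ... OR lk)
    head = (clause[0], clause[1], (z, True))
    return [head] + split_into_3cnf([(z, False)] + clause[2:], fresh)
```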

So the shrinking down reduction does the same thing, except we only split clauses in half. For a clause C, call C[:k/2] the first half of a clause and C[k/2:] the second half (you can see how my Python training corrupts my notation preference). Then to shrink a clause C_i down from size k to size \lceil k/2 \rceil + 1 (1 for the new variable), add a variable z_i and break C_i into

\displaystyle (C_i[:k/2] \vee z_i) \wedge (\neg z_i \vee C_i[k/2:])

and just AND these together for all clauses. Call the original formula \varphi and the transformed one \psi. The formulas are logically equivalent for the same reason that the k-to-3-SAT reduction works, and it’s already in the right CNF form. So resilience is all we have to measure. The claim is that the resilience is q = \min(r, \lfloor k/2 \rfloor), where r is the resilience of \varphi.
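The same sketch style works for the shrinking down step (again with the ad hoc (variable, sign) literal encoding, and fresh handing out the new z_i):

```python
def shrink_down(clauses, fresh):
    """Split every clause in half around one new variable, as in the shrinking
    down lemma: C_i becomes (C_i[:k/2] OR z_i) AND (NOT z_i OR C_i[k/2:])."""
    result = []
    for clause in clauses:
        clause = list(clause)
        z = fresh()
        half = len(clause) // 2
        result.append(tuple(clause[:half]) + ((z, True),))
        result.append(((z, False),) + tuple(clause[half:]))
    return result
```

For example, fresh could be itertools.count(n).__next__ for an integer n larger than any existing variable name, assuming the original variables are integers.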

The reason for this is that if all the fixed variables are old variables (not z_i), then nothing changes and the resilience of the original \varphi keeps us safe. And each z_i we fix has no effect except to force us to satisfy a variable in one of the two halves of its clause. So fixing a z_i is effectively like also fixing a regular variable. Because we can’t guarantee anything if we fix more than r regular variables, we have to stop before fixing r of the z_i. And because these new clauses have size roughly k/2 + 1, we can’t do this more than k/2 times or else we risk ruining an entire clause. This gives the definition of q, and it proves the shrinking down lemma.

Resilient SAT is always hard

The blowing up and shrinking down lemmas can be used to show that r-resilient k-SAT is NP-hard for all r < k. What we do is reduce from 3-SAT to an r-resilient k-SAT instance in such a way that the 3-SAT formula is satisfiable if and only if the transformed formula is resiliently satisfiable.

What makes these two lemmas work together is that shrinking down cuts the clause size in half while barely reducing the resilience, and blowing up increases the resilience just barely more than it increases the clause size. So we can combine them to climb from 3-SAT up to some high resilience and clause size, and then iteratively shrink down until we hit our target.

One might worry that it would take an exponential number of reductions (or a few reductions of exponential size) to get from 3-SAT to the (r,k) of our choice, but we have a construction that does it in at most four steps, with only a linear initial blowup from 3-SAT to r-resilient 3(r+1)-SAT. Then, to deal with the odd ceilings and floors in the shrinking down lemma, you have to find a suitable larger k to reduce to (by padding with useless variables, which cannot make the problem easier). And you choose this k so that you only need at most two applications of shrinking down to get to (k-1)-resilient k-SAT. Our preprint has the gory details (including an inelegant part that is not worth reproducing here), but in the end we show that (k-1)-resilient k-SAT is hard, and since that’s the maximal amount of resilience before the problem becomes vacuously trivial, all smaller resilience values are also hard.

So how does this relate to coloring?

I’m happy about this result not just because it answers an open question I’m honestly curious about, but also because it shows that resilient coloring is more interesting. Basically this proves that satisfiability is so hard that no amount of resilience can make it easier in the worst case. But coloring has a gradient of difficulty. Once you get to order k^2 resilience for k-colorable graphs, the coloring problem can be solved efficiently by a greedy algorithm (and it’s not a vacuously empty class of graphs). As a side note, we also use the hardness of resilient SAT to get the hardness results we have for coloring.

If you really want to stretch the implications, you might argue that this says something like “coloring is somewhat easier than SAT,” because we found a quantifiable axis along which SAT remains difficult while coloring crumbles. The caveat is that fixing colors of vertices is not exactly comparable to fixing values of truth assignments (since we are fixing lots of instances by fixing a variable), but at least it’s something concrete.

Coloring is still mostly open, and recently I’ve been going to talks where people are discussing startlingly similar ideas for things like Hamiltonian cycles. So that makes me happy.

Until next time!

When Greedy Algorithms are Good Enough: Submodularity and the (1 – 1/e)-Approximation

Greedy algorithms are among the simplest and most intuitive algorithms known to humans. Their name essentially gives their description: do the thing that looks best right now, and repeat until nothing looks good anymore or you’re forced to stop. Some of the best situations in computer science are also when greedy algorithms are optimal or near-optimal. There is a beautiful theory of this situation, known as the theory of matroids. We haven’t covered matroids on this blog (at some point we will), but in this post we will focus on the next best thing: when the greedy algorithm guarantees a reasonably good approximation to the optimal solution.

This situation isn’t hard to formalize, and we’ll make it as abstract as possible. Say you have a set of objects X, and you’re looking to find the “best” subset S \subset X. Here “best” is just measured by a fixed (known, efficiently computable) objective function f : 2^X \to \mathbb{R}. That is, f accepts as input subsets of X and outputs numbers so that better subsets have larger numbers. Then the goal is to find a subset S maximizing f(S).

In this generality the problem is clearly impossible. You’d have to check all subsets to be sure you didn’t miss the best one. So what conditions do we need on either X or f or both that makes this problem tractable? There are plenty you could try, but one very rich property is submodularity.

The Submodularity Condition

I think the simplest way to explain submodularity is in terms of coverage. Say you’re starting a new radio show and you have to choose which radio stations to broadcast from to reach the largest number of listeners. For simplicity say each radio station has one tower it broadcasts from, and you have a good estimate of the number of listeners you would reach if you broadcast from a given tower. For more simplicity, say it costs the same to broadcast from each tower, and your budget restricts you to a maximum of ten stations to broadcast from. So the question is: how do you pick towers to maximize your overall reach?

The hidden condition here is that some towers overlap in which listeners they reach. So if you broadcast from two towers in the same city, a listener who has access to both will just pick one or the other. In other words, there’s a diminished benefit to picking two overlapping towers if you already have chosen one.

In our version of the problem, picking both of these towers has some small amount of “overkill.”

This “diminishing returns” condition is a general idea you can impose on any function that takes in subsets of a given set and produces numbers. If X is a set then for what seems like a strange reason we denote the set of all subsets of X by 2^X. So we can state this condition more formally,

Definition: Let X be a finite set. A function f: 2^X \to \mathbb{R} is called submodular if for all subsets S \subset T \subset X and all x \in X \setminus T,

\displaystyle f(S \cup \{ x \}) - f(S) \geq f(T \cup \{ x \}) - f(T)

In other words, if f measures “benefit,” then the marginal benefit of adding x to S is at least as high as the marginal benefit of adding it to T. Since S \subset T and x are all arbitrary, this is as general as one could possibly make it.
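To make the radio-tower picture concrete, here’s a toy coverage function in Python (the listener sets are entirely made up), together with a check of the diminishing-returns inequality.

```python
# each tower reaches a (made-up) set of listeners
towers = {
    'downtown': {1, 2, 3, 4, 5},
    'uptown':   {4, 5, 6, 7},
    'suburb':   {7, 8, 9},
}

def coverage(chosen):
    """f(S) = number of distinct listeners reached by the chosen towers."""
    reached = set()
    for tower in chosen:
        reached |= towers[tower]
    return len(reached)

S, T, x = {'downtown'}, {'downtown', 'uptown'}, 'suburb'
# the marginal benefit of x can only shrink as the base set grows
assert coverage(S | {x}) - coverage(S) >= coverage(T | {x}) - coverage(T)
```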

Before we start doing things with submodular functions, let’s explore some basic properties. The first is an equivalent definition of submodularity.

Proposition: f is submodular if and only if for all A, B \subset X, it holds that

\displaystyle f(A \cap B) + f(A \cup B) \leq f(A) + f(B).

Proof. If we assume f has the condition from this proposition, then we can set A=T, B=S \cup \{ x \}, and the formula just works out. Conversely, if we have the condition from the definition, then using the fact that A \cap B \subset B we can inductively apply the inequality to each element of A \setminus B to get

\displaystyle f(A \cup B) - f(B) \leq f(A) - f(A \cap B)

\square

Next, we can tweak and combine submodular functions to get more submodular functions. In particular, non-negative linear combinations of submodular functions are submodular. In other words, if f_1, \dots, f_k are submodular on the same set X, and \alpha_1, \dots, \alpha_k are all non-negative reals, then \alpha_1 f_1 + \dots + \alpha_k f_k is also a submodular function on X. It’s an easy exercise in applying the definition to see why this is true. This is important because when we’re designing objectives to maximize, we can design them by making some simple submodular pieces, and then picking an appropriate combination of those pieces.

The second property we need to impose on a submodular function is monotonicity. That is, as your sets get more elements added to them, their value under f only goes up. In other words, f is monotone if S \subset T implies f(S) \leq f(T). An interesting property of functions that are both submodular and monotone is that the truncation of such a function is also submodular and monotone. In other words, \textup{min}(f(S), c) is still monotone submodular when f is monotone submodular and c is a constant.

Submodularity and Monotonicity Give 1 – 1/e

The wonderful thing about submodular functions is that we have a lot of great algorithmic guarantees for working with them. We’ll prove right now that the problem of maximizing a monotone submodular function subject to a size constraint (while it might be hard to solve exactly in general) can be approximated pretty well by the greedy algorithm.

Here’s the algorithmic setup. I give you a finite set X and an efficient black-box to evaluate f(S) for any subset S \subset X you want. I promise you that f is monotone and submodular. Now I give you an integer k between 1 and the size of X, and your task is to quickly find a set S of size k for which f(S) is maximal among all subsets of size k. That is, you design an algorithm that will work for any k, X, f and runs in time polynomial in the size of X and in k.

In general this problem is NP-hard, meaning you’re not going to find a solution that works in the worst case (if you do, don’t call me; just claim your million dollar prize). So how well can we approximate the optimal value for f(S) by a different set of size k? The beauty is that, if your function is monotone and submodular, you can guarantee to get within 63% of the optimum. The hope (and reality) is that in practice it will often perform much better, but still this is pretty good! More formally,

Theorem: Let f be a monotone, submodular, non-negative function on X. The greedy algorithm, which starts with S as the empty set and at every step picks an element x which maximizes the marginal benefit f(S \cup \{ x \}) - f(S), provides a set S that achieves a (1- 1/e)-approximation of the optimum.
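In code, the greedy algorithm from the theorem is about as short as you’d expect. Here’s a minimal Python sketch, where f is any black-box set function you believe to be monotone and submodular (the function name is mine).

```python
def greedy_max(f, X, k):
    """Greedily build a set of size at most k, at each step adding the element
    with the largest marginal benefit f(S | {x}) - f(S)."""
    S = set()
    for _ in range(k):
        remaining = set(X) - S
        if not remaining:
            break
        best = max(remaining, key=lambda x: f(S | {x}) - f(S))
        S.add(best)
    return S
```

On the toy coverage data from earlier, greedy_max(coverage, towers, 2) picks ‘downtown’ and then ‘suburb’, which happens to be optimal for that tiny instance; in general the theorem only promises a 1 - 1/e fraction of the optimum.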

We’ll prove this in just a little bit more generality, and the generality is quite useful. If we call S_1, S_2, \dots, S_l the sets chosen by the greedy algorithm (where now we might run the greedy algorithm for l > k steps), then for all l, k, we have

\displaystyle f(S_l) \geq \left ( 1 - e^{-l/k} \right ) \max_{T: |T| \leq k} f(T)

This allows us to run the algorithm for more than k steps to get a better approximation by sets of larger size, and quantify how much better the guarantee on that approximation would be. It’s like an algorithmic way of hedging your risk. So let’s prove it.

Proof. Let’s set up some notation first. Fix your l and k, call S_i the set chosen by the greedy algorithm at step i, and call S^* the optimal subset of size k. Further call \textup{OPT} the value of the best set f(S^*). Call x_1^*, \dots, x_k^* the elements of S^* (the order is irrelevant). Now for every i < l monotonicity gives us f(S^*) \leq f(S^* \cup S_i). We can unravel this into a sum of marginal gains of adding single elements. The first step is

\displaystyle f(S^* \cup S_i) = f(S^* \cup S_i) - f(\{ x_1^*, \dots, x_{k-1}^* \} \cup S_i) + f(\{ x_1^*, \dots, x_{k-1}^* \} \cup S_i)

The second step removes x_{k-1}^* from the last term, the third removes x_{k-2}^*, and so on until we have removed all of S^* and get this sum

\displaystyle f(S^* \cup S_i) = f(S_i) + \sum_{j=1}^k \left ( f(S_i \cup \{ x_1^*, \dots, x_j^* \}) - f(S_i \cup \{ x_1^*, \dots, x_{j-1}^* \} ) \right )

Now, applying submodularity, we can change all of these marginal benefits of “adding one more S^* element to S_i already with some S^* stuff” to “adding one more S^* element to just S_i.” In symbols, the equation above is at most

\displaystyle f(S_i) + \sum_{x \in S^*} \left ( f(S_i \cup \{ x \}) - f(S_i) \right )

and because S_{i+1} is greedily chosen to maximize the marginal benefit of adding a single element, the above is at most

\displaystyle f(S_i) + \sum_{x \in S^*} \left ( f(S_{i+1}) - f(S_i) \right ) = f(S_i) + k(f(S_{i+1}) - f(S_i))

Chaining all of these together, we have f(S^*) - f(S_i) \leq k(f(S_{i+1}) - f(S_i)). If we let a_{i} = f(S^*) - f(S_i), then this inequality can be rewritten as a_{i+1} \leq (1 - 1/k) a_{i}. By induction, a_l \leq (1 - 1/k)^l a_0. Now use the fact that a_0 \leq f(S^*) and the common inequality 1-x \leq e^{-x} to get

\displaystyle a_l = f(S^*) - f(S_l) \leq e^{-l/k} f(S^*)

And rearranging gives f(S_l) \geq (1 - e^{-l/k}) f(S^*).

\square

Setting l=k gives the approximation bound we promised. But note that allowing the greedy algorithm to run longer can give much stronger guarantees, though it requires you to sacrifice the cardinality constraint. 1 - 1/e is about 63%, but doubling the size of S gives about an 86% approximation guarantee. This is great for people in the real world, because you can quantify the gains you’d get by relaxing the constraints imposed on you (which are rarely set in stone).

So this is really great! We have quantifiable guarantees on a stupidly simple algorithm, and the setting is super general. And so if you have your problem and you manage to prove your function is submodular (this is often the hardest part), then you are likely to get this nice guarantee.

Extensions and Variations

This result on monotone submodular functions is just one part of a vast literature on finding approximation algorithms for submodular functions in various settings. In closing this post we’ll survey some of the highlights and provide references.

What we did in this post was maximize a monotone submodular function subject to a cardinality constraint |S| \leq k. There are three basic variations we could do: we could drop constraints and see whether we can still get guarantees, we could look at minimization instead of maximization, and we could modify the kinds of constraints we impose on the solution.

There are a ton of different kinds of constraints, and we’ll discuss two. The first is where you need to get a certain value f(S) \geq q, and you want to find the smallest set that achieves this value. Laurence Wolsey (who proved a lot of these theorems) showed in 1982 that a slight variant of the greedy algorithm can achieve a set whose size is a multiplicative factor of 1 + \log (\max_x f(\{ x \})) worse than the optimum.

The second kind of constraint is a generalization of a cardinality constraint called a knapsack constraint. This means that each item x \in X has a cost, and you have a finite budget with which to spend on elements you add to S. One might expect this natural extension of the greedy algorithm to work: pick the element which maximizes the ratio of increasing the value of f to the cost (within your available budget). Unfortunately this algorithm can perform arbitrarily poorly, but there are two fun caveats. The first is that if you do both this augmented greedy algorithm and the greedy algorithm that ignores costs, then at least one of these can’t do too poorly. Specifically, one of them has to get at least a 30% approximation. This was shown by Leskovec et al in 2007. The second is that if you’re willing to spend more time in your greedy step by choosing the best subset of size 3, then you can get back to the 1-1/e approximation. This was shown by Sviridenko in 2004.
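Here’s a hedged Python sketch of that combined heuristic: run both the cost-blind greedy and the benefit-per-cost greedy under the budget and keep whichever scores better, which is the pairing the Leskovec et al. result is about. The function names and interface are my own, and the sketch assumes all costs are positive.

```python
def budgeted_greedy(f, X, cost, budget):
    """Knapsack-constrained greedy: return the better of the cost-blind greedy
    and the benefit-per-cost greedy."""
    def run(score):
        S, spent = set(), 0
        while True:
            affordable = [x for x in X
                          if x not in S and spent + cost[x] <= budget]
            if not affordable:
                return S
            best = max(affordable, key=lambda x: score(S, x))
            S.add(best)
            spent += cost[best]

    plain = run(lambda S, x: f(S | {x}) - f(S))
    ratio = run(lambda S, x: (f(S | {x}) - f(S)) / cost[x])
    return max((plain, ratio), key=f)
```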

Now we could try dropping the monotonicity constraint. In this setting cardinality constraints are also superfluous, because it could be that the very large sets have low values. Now it turns out that if f has no other restrictions (in particular, if it’s allowed to be negative), then even telling whether there’s a set S with f(S) > 0 is NP-hard, but the optimum could be arbitrarily large and positive when it exists. But if you require that f is non-negative, then you can get a 1/3-approximation; if you’re willing to add randomness you can get 2/5 in expectation; and with more subtle constraints you can get up to a 1/2 approximation. Anything better is NP-hard. Feige, Mirrokni, and Vondrak have a nice FOCS paper on this.

Next, sticking with submodular but not necessarily monotone functions, we could try to minimize the value of f(S) instead. It turns out that this problem always has an efficient solution, but the only algorithm I have heard of to solve it involves a very sophisticated technique called the ellipsoid algorithm. This is heavily related to linear programming and convex optimization, something which I hope to cover in more detail on this blog.

Finally, there are many interesting variations in the algorithmic procedure. For example, one could require that the elements are provided in some order (the streaming setting), and you have to pick at each step whether to put the element in your set or not. Alternatively, the objective functions might not be known ahead of time and you have to try to pick elements to jointly maximize them as they are revealed. These two settings have connections to bandit learning problems, which we’ve covered before on this blog. See this survey of Krause and Golovin for more on the connections, which also contains the main proof used in this post.

Indeed, despite the fact that many of the big results were proved in the ’80s, the analysis of submodular functions is still a big research topic. There was even a paper posted just the other day on the arXiv about its relation to ad serving! And wouldn’t you know, they proved a (1-1/e)-approximation for their setting. There’s just something about 1-1/e.

Until next time!

AMS Network Science Mathematical Research Community

I don’t usually write promotional posts because I don’t enjoy reading them as much as I enjoy reading the technical posts. But I know that a lot of early graduate students and undergraduates read my blog, and this would be of interest to many of them.

I just got back from Utah yesterday where I attended a 5-day workshop run by the American Mathematical Society, called the Network Science Mathematical Research Community (MRC).

The point of the program is to bring graduate students and early career folks together from all over the country to start new collaborations. The AMS runs multiple MRC sessions every year, and this year the topics ranged from network science to quantum physics. We had a group of about 20 people, including statisticians, applied mathematicians, computer scientists, and a handful of pure combinatorialists. We self-organized into groups of four, and spent pretty much all day for the next four days eating great food, thinking about problems, proving theorems, enjoying the view, and discussing our ideas with the three extremely smart, successful, and amiable organizers. There were also career panels every evening that were, in my opinion, better than the average career panel.

The network science group (you can see me peeking out from the back, just left of center).

Anyway, it was a really fun and valuable experience, and the AMS pays for everything and a bag of chips (if by chips you mean more travel money to meet up with your collaborators and a ticket to the AMS Joint Mathematics Meeting the following January). I’m excited to blog about the work that comes out of this, as network science is right up there with the coolest of topics in math and programming.

So if you’re eligible, keep an eye out for next year’s program.