In a recent newsletter article I complained about how researchers mislead about the applicability of their work. I gave SAT solvers as an example. People provided interesting examples in response, but what was new to me was the concept of SMT (Satisfiability Modulo Theories), an extension to SAT. SMT seems to have more practical uses than vanilla SAT (see the newsletter for details).

I wanted to take some time to explore SMT solvers, and I landed on Z3, an open-source SMT solver from Microsoft. In particular, I wanted to compare it to ILP (Integer Linear Programing) solvers, which I know relatively well. I picked a problem that I thought would work better for SAT-ish solvers than for ILPs: subset covering (explained in the next section). If ILP still wins against Z3, then that would be not so great for the claim that SMT is a production strength solver.

All the code used for this post is on Github.

Subset covering

A subset covering is a kind of combinatorial design, which can be explained in terms of magic rings.

An adventurer stumbles upon a chest full of magic rings. Each ring has a magical property, but some pairs of rings, when worn together on the same hand, produce a combined special magical effect distinct to that pair.

The adventurer would like to try all pairs of rings to catalogue the magical interactions. With only five fingers, how can we minimize the time spent trying on rings?

Mathematically, the rings can be described as a set $ X$ of size $ n$. We want to choose a family $ F$ of subsets of $ X$, with each subset having size 5 (five fingers), such that each subset of $ X$ of size 2 (pairs of rings) is contained in some subset of $ F$. And we want $ F$ to be as small as possible.

Subset covering is not a “production worthy” problem. Rather, I could imagine it’s useful in some production settings, but I haven’t heard of one where it is actually used. I can imagine, for instance, that a cluster of machines has some bug occurring seemingly at random for some point-to-point RPCs, and in tracking down the problem, you want to deploy a test change to subsets of servers to observe the bug occurring. Something like an experiment design problem.

If you generalize the “5” in “5 fingers” to an arbitrary positive integer $ k$, and the “2” in “2 rings” to $ l < k$, then we have the general subset covering problem. Define $ M(n, k, l)$ to be the minimal number of subsets of size $ k$ needed to cover all subsets of size $ l$. This problem was studied by Erdős, with a conjecture subsequently proved by Vojtěch Rödl, that asymptotically $ M(n,k,l)$ grows like $ \binom{n}{l} / \binom{k}{l}$. Additional work by Joel Spencer showed that a greedy algorithm is essentially optimal.

However, all of the constructive algorithms in these proofs involve enumerating all $ \binom{n}{k}$ subsets of $ X$. This wouldn’t scale very well. You can alternatively try a “random” method, incurring a typically $ \log(r)$ factor of additional sets required to cover a $ 1 – 1/r$ fraction of the needed subsets. This is practical, but imperfect.

To the best of my knowledge, there is no exact algorithm, that both achieves the minimum and is efficient in avoiding constructing all $ \binom{n}{k}$ subsets. So let’s try using an SMT solver. I’ll be using the Python library for Z3.

Baseline: brute force Z3

For a baseline, let’s start with a simple Z3 model that enumerates all the possible subsets that could be chosen. This leads to an exceedingly simple model to compare the complex models against.

Define boolean variables $ \textup{Choice}_S$ which is true if and only if the subset $ S$ is chosen (I call this a “choice set”). Define boolean variables $ \textup{Hit}_T$ which is true if the subset $ T$ (I call this a “hit set”) is contained in a chosen choice set. Then the subset cover problem can be defined by two sets of implications.

First, if $ \textup{Choice}_S$ is true, then so must all $ \textup{Hit}_T$ for $ T \subset S$. E.g., for $ S = \{ 1, 2, 3 \}$ and $ l=2$, we get

$$\displaystyle \begin{aligned} \textup{Choice}_{(1,2,3)} &\implies \textup{Hit}_{(1,2)} \\\ \textup{Choice}_{(1,2,3)} &\implies \textup{Hit}_{(1,3)} \\\ \textup{Choice}_{(1,2,3)} &\implies \textup{Hit}_{(2,3)} \end{aligned}$$

In Python this looks like the following (note this program has some previously created lookups and data structures containing the variables)

for choice_set in choice_sets:
  for hit_set_key in combinations(choice_set.elements, hit_set_size):
    hit_set = hit_set_lookup[hit_set_key]
    implications.append(
      z3.Implies(choice_set.variable, hit_set.variable))

Second, if $ \textup{Hit}_T$ is true, it must be that some $ \textup{Choice}_S$ is true for some $ S$ containing $ T$ as a subset. For example,

$$\displaystyle \begin{aligned} \textup{Hit}_{(1,2)} &\implies \\\ & \textup{Choice}_{(1,2,3,4)} \textup{ OR} \\\ & \textup{Choice}_{(1,2,3,5)} \textup{ OR} \\\ & \textup{Choice}_{(1,2,4,5)} \textup{ OR } \cdots \\\ \end{aligned}$$

In code,

for hit_set in hit_sets.values():
  relevant_choice_set_vars = [
    choice_set.variable
    for choice_set in hit_set_to_choice_set_lookup[hit_set]
  ]
  implications.append(
    z3.Implies(
      hit_set.variable, 
      z3.Or(*relevant_choice_set_vars)))

Next, in this experiment we’re allowing the caller to specify the number of choice sets to try, and the solver should either return SAT or UNSAT. From that, we can use a binary search to find the optimal number of sets to pick. Thus, we have to limit the number of $ \textup{Choice}_S$ that are allowed to be true and false. Z3 supports boolean cardinality constraints, apparently with a special solver to handle problems that have them. Otherwise, the process of encoding cardinality constraints as SAT formulas is not trivial (and the subject of active research). But the code is simple enough:

args = [cs.variable for cs in choice_sets] + [parameters.num_choice_sets]
choice_sets_at_most = z3.AtMost(*args)
choice_sets_at_least = z3.AtLeast(*args)

Finally, we must assert that every $ \textup{Hit}_T$ is true.

solver = z3.Solver()
for hit_set in hit_sets.values():
  solver.add(hit_set.variable)

for impl in implications:
  solver.add(impl)

solver.add(choice_sets_at_most)
solver.add(choice_sets_at_least)

Running it for $ n=7, k=3, l=2$, and seven choice sets (which is optimal), we get

>>> SubsetCoverZ3BruteForce().solve(
  SubsetCoverParameters(
    num_elements=7,
    choice_set_size=3,
    hit_set_size=2,
    num_choice_sets=7))
[(0, 1, 3), (0, 2, 4), (0, 5, 6), (1, 2, 6), (1, 4, 5), (2, 3, 5), (3, 4, 6)]
SubsetCoverSolution(status=<SolveStatus.SOLVED: 1>, solve_time_seconds=0.018305063247680664)

Interestingly, Z3 refuses to solve marginally larger instances. For instance, I tried the following and Z3 times out around $ n=12, k=6$ (about 8k choice sets):

from math import comb

for n in range(8, 16):
  k = int(n / 2)
  l = 3
  max_num_sets = int(2 * comb(n, l) / comb(k, l))
  params = SubsetCoverParameters(
    num_elements=n,
    choice_set_size=k,
    hit_set_size=l,                                   
    num_choice_sets=max_num_sets)

    print_table(
      params, 
      SubsetCoverZ3BruteForce().solve(params), 
      header=(n==8))

After taking a long time to generate the larger models, Z3 exceeds my 15 minute time limit, suggesting exponential growth:

status               solve_time_seconds  num_elements  choice_set_size  hit_set_size  num_choice_sets
SolveStatus.SOLVED   0.0271              8             4                3             28
SolveStatus.SOLVED   0.0346              9             4                3             42
SolveStatus.SOLVED   0.0735              10            5                3             24
SolveStatus.SOLVED   0.1725              11            5                3             33
SolveStatus.SOLVED   386.7376            12            6                3             22
SolveStatus.UNKNOWN  900.1419            13            6                3             28
SolveStatus.UNKNOWN  900.0160            14            7                3             20
SolveStatus.UNKNOWN  900.0794            15            7                3             26

An ILP model

Next we’ll see an ILP model for the sample problem. Note there are two reasons I expect the ILP model to fall short. First, the best solver I have access to is SCIP, which, despite being quite good is, in my experience, about an order of magnitude slower than commercial alternatives like Gurobi. Second, I think this sort of problem seems to not be very well suited to ILPs. It would take quite a bit longer to explain why (maybe another post, if you’re interested), but in short well-formed ILPs have easily found feasible solutions (this one does not), and the LP-relaxation of the problem should be as tight as possible. I don’t think my formulation is very tight, but it’s possible there is a better formulation.

Anyway, the primary difference in my ILP model from brute force is that the number of choice sets is fixed in advance, and the members of the choice sets are model variables. This allows us to avoid enumerating all choice sets in the model.

In particular, $ \textup{Member}_{S,i} \in \{ 0, 1 \}$ is a binary variable that is 1 if and only if element $ i$ is assigned to be in set $ S$. And $ \textup{IsHit}_{T, S} \in \{0, 1\}$ is 1 if and only if the hit set $ T$ is a subset of $ S$. Here “$ S$” is an index over the subsets, rather than the set itself, because we don’t know what elements are in $ S$ while building the model.

For the constraints, each choice set $ S$ must have size $ k$:

$$\displaystyle \sum_{i \in X} \textup{Member}_{S, i} = k$$

Each hit set $ T$ must be hit by at least one choice set:

$$\displaystyle \sum_{S} \textup{IsHit}_{T, S} \geq 1$$

Now the tricky constraint. If a hit set $ T$ is hit by a specific choice set $ S$ (i.e., $ \textup{IsHit}_{T, S} = 1$) then all the elements in $ T$ must also be members of $ S$.

$$\displaystyle \sum_{i \in T} \textup{Member}_{S, i} \geq l \cdot \textup{IsHit}_{T, S}$$

This one works by the fact that the left-hand side (LHS) is bounded from below by 0 and bounded from above by $ l = |T|$. Then $ \textup{IsHit}_{T, S}$ acts as a switch. If it is 0, then the constraint is vacuous since the LHS is always non-negative. If $ \textup{IsHit}_{T, S} = 1$, then the right-hand side (RHS) is $ l = |T|$ and this forces all variables on the LHS to be 1 to achieve it.

Because we fixed the number of choice sets as a parameter, the objective is 1, and all we’re doing is looking for a feasible solution. The full code is here.

On the same simple example as the brute force

>>> SubsetCoverILP().solve(
  SubsetCoverParameters(
    num_elements=7,
    choice_set_size=3,
    hit_set_size=2,
    num_choice_sets=7))
[(0, 1, 3), (0, 2, 6), (0, 4, 5), (1, 2, 4), (1, 5, 6), (2, 3, 5), (3, 4, 6)]
SubsetCoverSolution(status=<SolveStatus.SOLVED: 1>, solve_time_seconds=0.1065816879272461)

It finds the same solution in about 10x the runtime as the brute force Z3 model, though still well under one second.

On the “scaling” example, it fares much worse. With a timeout of 15 minutes, it solves $ n=8,$ decently fast, $ n=9,12$ slowly, and times out on the rest.

status               solve_time_seconds  num_elements  choice_set_size  hit_set_size  num_choice_sets
SolveStatus.SOLVED   1.9969              8             4                3             28
SolveStatus.SOLVED   306.4089            9             4                3             42
SolveStatus.UNKNOWN  899.8842            10            5                3             24
SolveStatus.UNKNOWN  899.4849            11            5                3             33
SolveStatus.SOLVED   406.9502            12            6                3             22
SolveStatus.UNKNOWN  902.7807            13            6                3             28
SolveStatus.UNKNOWN  900.0826            14            7                3             20
SolveStatus.UNKNOWN  900.0731            15            7                3             26

A Z3 Boolean Cardinality Model

The next model uses Z3. It keeps the concept of Member and Hit variables, but they are boolean instead of integer. It also replaces the linear constraints with implications. The constraint that forces a Hit set’s variable to be true when some Choice set contains all its elements is (for each $ S, T$)

$ \displaystyle \left ( \bigwedge_{i \in T} \textup{Member}_{S, i} \right ) \implies \textup{IsHit}_T$

Conversely, A Hit set’s variable being true implies its members are in some choice set.

$$\displaystyle \textup{IsHit}_T \implies \bigvee_{S} \bigwedge_{i \in T} \textup{Member}_{S, i}$$

Finally, we again use boolean cardinality constraints AtMost and AtLeast so that each choice set has the right size.

The results are much better than the ILP: it solves all of the instances in under 3 seconds

status              solve_time_seconds  num_elements  choice_set_size  hit_set_size  num_choice_sets
SolveStatus.SOLVED  0.0874              8             4                3             28
SolveStatus.SOLVED  0.1861              9             4                3             42
SolveStatus.SOLVED  0.1393              10            5                3             24
SolveStatus.SOLVED  0.2845              11            5                3             33
SolveStatus.SOLVED  0.2032              12            6                3             22
SolveStatus.SOLVED  1.3661              13            6                3             28
SolveStatus.SOLVED  0.8639              14            7                3             20
SolveStatus.SOLVED  2.4877              15            7                3             26

A Z3 integer model

Z3 supports implications on integer equation equalities, so we can try a model that leverages this by essentially converting the boolean model to one where the variables are 0-1 integers, and the constraints are implications on equality of integer formulas (all of the form “variable = 1”).

I expect this to perform worse than the boolean model, even though the formulation is almost identical. The details of the model are here, and it’s so similar to the boolean model above that it needs no extra explanation.

The runtime is much worse, but surprisingly it still does better than the ILP model.

status              solve_time_seconds  num_elements  choice_set_size  hit_set_size  num_choice_sets
SolveStatus.SOLVED  2.1129              8             4                3             28
SolveStatus.SOLVED  14.8728             9             4                3             42
SolveStatus.SOLVED  7.6247              10            5                3             24
SolveStatus.SOLVED  25.0607             11            5                3             33
SolveStatus.SOLVED  30.5626             12            6                3             22
SolveStatus.SOLVED  63.2780             13            6                3             28
SolveStatus.SOLVED  57.0777             14            7                3             20
SolveStatus.SOLVED  394.5060            15            7                3             26

Harder instances

So far all the instances we’ve been giving the solvers are “easy” in a sense. In particular, we’ve guaranteed there’s a feasible solution, and it’s easy to find. We’re giving roughly twice as many sets as are needed. There are two ways to make this problem harder. One is to test on unsatisfiable instances, which can be harder because the solver has to prove it can’t work. Another is to test on satisfiable instances that are hard to find, such as those satisfiable instances where the true optimal number of choice sets is given as the input parameter. The hardest unsatisfiable instances are also the ones where the number of choice sets allowed is one less than optimal.

Let’s test those situations. Since $ M(7, 3, 2) = 7$, we can try with 7 choice sets and 6 choice sets.

For 7 choice sets (the optimal value), all the solvers do relatively well

method                    status  solve_time_seconds  num_elements  choice_set_size  hit_set_size  num_choice_sets
SubsetCoverILP            SOLVED  0.0843              7             3                2             7
SubsetCoverZ3Integer      SOLVED  0.0938              7             3                2             7
SubsetCoverZ3BruteForce   SOLVED  0.0197              7             3                2             7
SubsetCoverZ3Cardinality  SOLVED  0.0208              7             3                2             7

For 6, the ILP struggles to prove it’s infeasible, and the others do comparatively much better (at least 17x better).

method                    status      solve_time_seconds  num_elements  choice_set_size  hit_set_size  num_choice_sets
SubsetCoverILP            INFEASIBLE  120.8593            7             3                2             6
SubsetCoverZ3Integer      INFEASIBLE  3.0792              7             3                2             6
SubsetCoverZ3BruteForce   INFEASIBLE  0.3384              7             3                2             6
SubsetCoverZ3Cardinality  INFEASIBLE  7.5781              7             3                2             6

This seems like hard evidence that Z3 is better than ILPs for this problem (and it is), but note that the same test on $ n=8$ fails for all models. They can all quickly prove $ 8 < M(8, 3, 2) \leq 11$, but time out after twenty minutes when trying to determine if $ M(8, 3, 2) \in \{ 9, 10 \}$. Note that $ k=3, l=2$ is the least complex choice for the other parameters, so it seems like there’s not much hope to find $ M(n, k, l)$ for any seriously large parameters, like, say, $ k=6$.

Thoughts

These experiments suggest what SMT solvers can offer above and beyond ILP solvers. Disjunctions and implications are notoriously hard to model in an ILP. You often need to add additional special variables, or do tricks like the one I did that only work in some situations and which can mess with the efficiency of the solver. With SMT, implications are trivial to model, and natively supported by the solver.

Aside from reading everything I could find on Z3, there seems to be little advice on modeling to help the solver run faster. There is a ton of literature for this in ILP solvers, but if any readers see obvious problems with my SMT models, please chime in! I’d love to hear from you. Even without that, I am pretty impressed by how fast the solves finish for this subset cover problem (which this experiment has shown me is apparently a very hard problem).

However, there’s an elephant in the room. These models are all satisfiability/feasibility checks on a given solution. What is not tested here is optimization, in the sense of having the number of choice sets used be minimized directly by the solver. In a few experiments on even simpler models, z3 optimization is quite slow. And while I know how I’d model the ILP version of the optimization problem, given that it’s quite slow to find a feasible instance when the optimal number of sets is given as a parameter, it seems unlikely that it will be fast when asked to optimize. I will have to try that another time to be sure.

Also, I’d like to test the ILP models on Gurobi, but I don’t have a personal license. There’s also the possibility that I can come up with a much better ILP formulation, say, with a tighter LP relaxation. But these will have to wait for another time.

In the end, this experiment has given me some more food for thought, and concrete first-hand experience, on the use of SMT solvers.