# When Greedy Algorithms are Perfect: the Matroid

Greedy algorithms are by far one of the easiest and most well-understood algorithmic techniques. There is a wealth of variations, but at its core the greedy algorithm optimizes something using the natural rule, “pick what looks best” at any step. So a greedy routing algorithm would say to a routing problem: “You want to visit all these locations with minimum travel time? Let’s start by going to the closest one. And from there to the next closest one. And so on.”

Because greedy algorithms are so simple, researchers have naturally made a big effort to understand their performance. Under what conditions will they actually solve the problem we’re trying to solve, or at least get close? In a previous post we gave some easy-to-state conditions under which greedy gives a good approximation, but the obvious question remains: can we characterize when greedy algorithms give an optimal solution to a problem?

The answer is yes, and the framework that enables us to do this is called a matroid. That is, if we can phrase the problem we’re trying to solve as a matroid, then the greedy algorithm is guaranteed to be optimal. Let’s start with an example when greedy is provably optimal: the minimum spanning tree problem. Throughout the article we’ll assume the reader is familiar with the very basics of linear algebra and graph theory (though we’ll remind ourselves what a minimum spanning tree is shortly). For a refresher, this blog has primers on both subjects. But first, some history.

## History

Matroids were first introduced by Hassler Whitney in 1935, and independently discovered a little later by B.L. van der Waerden (a big name in combinatorics). They were both interested in devising a general description of “independence,” the properties of which are strikingly similar when specified in linear algebra and graph theory. Since then the study of matroids has blossomed into a large and beautiful theory, one part of which is the characterization of the greedy algorithm: greedy is optimal on a problem if and only if the problem can be represented as a matroid. Mathematicians have also characterized which matroids can be modeled as spanning trees of graphs (we will see this momentarily). As such, matroids have become a standard topic in the theory and practice of algorithms.

## Minimum Spanning Trees

It is often natural in an undirected graph $G = (V,E)$ to find a connected subset of edges that touch every vertex. As an example, if you’re working on a power network you might want to identify a “backbone” of the network so that you can use the backbone to cheaply travel from any node to any other node. Similarly, in a routing network (like the internet) it costs a lot of money to lay down cable, it’s in the interest of the internet service providers to design analogous backbones into their infrastructure.

A minimal subset of edges in a backbone like this is guaranteed to form a tree. This is simply because if you have a cycle in your subgraph then removing any edge on that cycle doesn’t break connectivity or the fact that you can get from any vertex to any other (and trees are the maximal subgraphs without cycles). As such, these “backbones” are called spanning trees. “Span” here means that you can get from any vertex to any other vertex, and it suggests the connection to linear algebra that we’ll describe later, and it’s a simple property of a tree that there is a unique path between any two vertices in the tree.

An example of a spanning tree

When your edges $e \in E$ have nonnegative weights $w_e \in \mathbb{R}^{\geq 0}$, we can further ask to find a minimum cost spanning tree. The cost of a spanning tree $T$ is just the sum of its edges, and it’s important enough of a definition to offset.

Definition: A minimum spanning tree $T$ of a weighted graph $G$ (with weights $w_e \geq 0$ for $e \in E$) is a spanning tree which minimizes the quantity

$w(T) = \sum_{e \in T} w_e$

There are a lot of algorithms to find minimal spanning trees, but one that will lead us to matroids is Kruskal’s algorithm. It’s quite simple. We’ll maintain a forest $F$ in $G$, which is just a subgraph consisting of a bunch of trees that may or may not be connected. At the beginning $F$ is just all the vertices with no edges. And then at each step we add to $F$ the edge $e$ whose weight is smallest and also does not introduce any cycles into $F$. If the input graph $G$ is connected then this will always produce a minimal spanning tree.

Theorem: Kruskal’s algorithm produces a minimal spanning tree of a connected graph.

Proof. Call $F_t$ the forest produced at step $t$ of the algorithm. Then $F_0$ is the set of all vertices of $G$ and $F_{n-1}$ is the final forest output by Kruskal’s (as a quick exercise, prove all spanning trees on $n$ vertices have $n-1$ edges, so we will stop after $n-1$ rounds). It’s clear that $F_{n-1}$ is a tree because the algorithm guarantees no $F_i$ will have a cycle. And any tree with $n-1$ edges is necessarily a spanning tree, because if some vertex were left out then there would be $n-1$ edges on a subgraph of $n-1$ vertices, necessarily causing a cycle somewhere in that subgraph.

Now we’ll prove that $F_{n-1}$ has minimal cost. We’ll prove this in a similar manner to the general proof for matroids. Indeed, say you had a tree $T$ whose cost is strictly less than that of $F_{n-1}$ (we can also suppose that $T$ is minimal, but this is not necessary). Pick the minimal weight edge $e \in T$ that is not in $F_{n-1}$. Adding $e$ to $F_{n-1}$ introduces a unique cycle $C$ in $F_{n-1}$. This cycle has some strange properties. First, $e$ has the highest cost of any edge on $C$. For otherwise, Kruskal’s algorithm would have chosen it before the heavier weight edges. Second, there is another edge in $C$ that’s not in $T$ (because $T$ was a tree it can’t have the entire cycle). Call such an edge $e'$. Now we can remove $e'$ from $F_{n-1}$ and add $e$. This can only increase the total cost of $F_{n-1}$, but this transformation produces a tree with one more edge in common with $T$ than before. This contradicts that $T$ had strictly lower weight than $F_{n-1}$, because repeating the process we described would eventually transform $F_{n-1}$ into $T$ exactly, while only increasing the total cost.

$\square$

Just to recap, we defined sets of edges to be “good” if they did not contain a cycle, and a spanning tree is a maximal set of edges with this property. In this scenario, the greedy algorithm performed optimally at finding a spanning tree with minimal total cost.

## Columns of Matrices

Now let’s consider a different kind of problem. Say I give you a matrix like this one:

$\displaystyle A = \begin{pmatrix} 2 & 0 & 1 & -1 & 0 \\ 0 & -4 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 7 \end{pmatrix}$

In the standard interpretation of linear algebra, this matrix represents a linear function $f$ from one vector space $V$ to another $W$, with the basis $(v_1, \dots, v_5)$ of $V$ being represented by columns and the basis $(w_1, w_2, w_3)$ of $W$ being represented by the rows. Column $j$ tells you how to write $f(v_j)$ as a linear combination of the $w_i$, and in so doing uniquely defines $f$.

Now one thing we want to calculate is the rank of this matrix. That is, what is the dimension of the image of $V$ under $f$? By linear algebraic arguments we know that this is equivalent to asking “how many linearly independent columns of $A$ can we find”? An interesting consequence is that if you have two sets of columns that are both linearly independent and maximally so (adding any other column to either set would necessarily introduce a dependence in that set), then these two sets have the same size. This is part of why the rank of a matrix is well-defined.

If we were to give the columns of $A$ costs, then we could ask about finding the minimal-cost maximally-independent column set. It sounds like a mouthful, but it’s exactly the same idea as with spanning trees: we want a set of vectors that spans the whole column space of $A$, but contains no “cycles” (linearly dependent combinations), and we want the cheapest such set.

So we have two kinds of “independence systems” that seem to be related. One interesting question we can ask is whether these kinds of independence systems are “the same” in a reasonable way. Hardcore readers of this blog may see the connection quite quickly. For any graph $G = (V,E)$, there is a natural linear map from $E$ to $V$, so that a linear dependence among the columns (edges) corresponds to a cycle in $G$. This map is called the incidence matrix by combinatorialists and the first boundary map by topologists.

The map is easy to construct: for each edge $e = (v_i,v_j)$ you add a column with a 1 in the $j$-th row and a $-1$ in the $i$-th row. Then taking a sum of edges gives you zero if and only if the edges form a cycle. So we can think of a set of edges as “independent” if they don’t contain a cycle. It’s a little bit less general than independence over $\mathbb{R}$, but you can make it exactly the same kind of independence if you change your field from real numbers to $\mathbb{Z}/2\mathbb{Z}$. We won’t do this because it will detract from our end goal (to analyze greedy algorithms in realistic settings), but for further reading this survey of Oxley assumes that perspective.

So with the recognition of how similar these notions of independence are, we are ready to define matroids.

## The Matroid

So far we’ve seen two kinds of independence: “sets of edges with no cycles” (also called forests) and “sets of linearly independent vectors.” Both of these share two trivial properties: there are always nonempty independent sets, and every subset of an independent set is independent. We will call any family of subsets with this property an independence system.

Definition: Let $X$ be a finite set. An independence system over $X$ is a family $\mathscr{I}$ of subsets of $X$ with the following two properties.

1. $\mathscr{I}$ is nonempty.
2. If $I \in \mathscr{I}$, then so is every subset of $I$.

This is too general to characterize greedy algorithms, so we need one more property shared by our examples. There are a few things we do, but here’s one nice property that turns out to be enough.

Definition: A matroid $M = (X, \mathscr{I})$ is a set $X$ and an independence system $\mathscr{I}$ over $X$ with the following property:

If $A, B$ are in $\mathscr{I}$ with $|A| = |B| + 1$, then there is an element $x \in A \setminus B$ such that $B \cup \{ a \} \in \mathscr{I}$.

In other words, this property says if I have an independent set that is not maximally independent, I can grow the set by adding some suitably-chosen element from a larger independent set. We’ll call this the extension property. For a warmup exercise, let’s prove that the extension property is equivalent to the following (assuming the other properties of a matroid):

For every subset $Y \subset X$, all maximal independent sets contained in $Y$ have equal size.

Proof. For one direction, if you have two maximal sets $A, B \subset Y \subset X$ that are not the same size (say $A$ is bigger), then you can take any subset of $A$ whose size is exactly $|B| + 1$, and use the extension property to make $B$ larger, a contradiction. For the other direction, say that I know all maximal independent sets of any $Y \subset X$ have the same size, and you give me $A, B \subset X$. I need to find an $a \in A \setminus B$ that I can add to $B$ and keep it independent. What I do is take the subset $Y = A \cup B$. Now the sizes of $A, B$ don’t change, but $B$ can’t be maximal inside $Y$ because it’s smaller than $A$ ($A$ might not be maximal either, but it’s still independent). And the only way to extend $B$ is by adding something from $A$, as desired.

$\square$

So we can use the extension property and the cardinality property interchangeably when talking about matroids. Continuing to connect matroid language to linear algebra and graph theory, the maximal independent sets of a matroid are called bases, the size of any basis is the rank of the matroid, and the minimal dependent sets are called circuits. In fact, you can characterize matroids in terms of the properties of their circuits, which are dual to the properties of bases (and hence all independent sets) in a very concrete sense.

But while you could spend all day characterizing the many kinds of matroids and comatroids out there, we are still faced with the task of seeing how the greedy algorithm performs on a matroid. That is, suppose that your matroid $M = (X, \mathscr{I})$ has a nonnegative real number $w(x)$ associated with each $x \in X$. And suppose we had a black-box function to determine if a given set $S \subset X$ is independent. Then the greedy algorithm maintains a set $B$, and at every step adds a minimum weight element that maintains the independence of $B$. If we measure the cost of a subset by the sum of the weights of its elements, then the question is whether the greedy algorithm finds a minimum weight basis of the matroid.

The answer is even better than yes. In fact, the answer is that the greedy algorithm performs perfectly if and only if the problem is a matroid! More rigorously,

Theorem: Suppose that $M = (X, \mathscr{I})$ is an independence system, and that we have a black-box algorithm to determine whether a given set is independent. Define the greedy algorithm to iteratively adds the cheapest element of $X$ that maintains independence. Then the greedy algorithm produces a maximally independent set $S$ of minimal cost for every nonnegative cost function on $X$, if and only if $M$ is a matroid.

It’s clear that the algorithm will produce a set that is maximally independent. The only question is whether what it produces has minimum weight among all maximally independent sets. We’ll break the theorem into the two directions of the “if and only if”:

Part 1: If $M$ is a matroid, then greedy works perfectly no matter the cost function.
Part 2: If greedy works perfectly for every cost function, then $M$ is a matroid.

Proof of Part 1.

Call the cost function $w : X \to \mathbb{R}^{\geq 0}$, and suppose that the greedy algorithm picks elements $B = \{ x_1, x_2, \dots, x_r \}$ (in that order). It’s easy to see that $w(x_1) \leq w(x_2) \leq \dots \leq w(x_r)$. Now if you give me any list of $r$ independent elements $y_1, y_2, \dots, y_r \in X$ that has $w(y_1) \leq \dots \leq w(y_r)$, I claim that $w(x_i) \leq w(y_i)$ for all $i$. This proves what we want, because if there were a basis of size $r$ with smaller weight, sorting its elements by weight would give a list contradicting this claim.

To prove the claim, suppose to the contrary that it were false, and for some $k$ we have $w(x_k) > w(y_k)$. Moreover, pick the smallest $k$ for which this is true. Note $k > 1$, and so we can look at the special sets $S = \{ x_1, \dots, x_{k-1} \}$ and $T = \{ y_1, \dots, y_k \}$. Now $|T| = |S|+1$, so by the matroid property there is some $j$ between $1$ and $r$ so that $S \cup \{ y_j \}$ is an independent set (and $y_j$ is not in $S$). But then $w(y_j) \leq w(y_k) < w(x_k)$, and so the greedy algorithm would have picked $y_j$ before it picks $x_k$ (and the strict inequality means they’re different elements). This contradicts how the greedy algorithm runs, and hence proves the claim.

Proof of Part 2.

We’ll prove this contrapositively as follows. Suppose we have our independence system and it doesn’t satisfy the last matroid condition. Then we’ll construct a special weight function that causes the greedy algorithm to fail. So let $A,B$ be independent sets with $|A| = |B| + 1$, but for every $a \in A \setminus B$ adding $a$ to $B$ never gives you an independent set.

Now what we’ll do is define our weight function so that the greedy algorithm picks the elements we want in the order we want (roughly). In particular, we’ll assign all elements of $A \cap B$ a tiny weight we’ll call $w_1$. For elements of $B - A$ we’ll use $w_2$, and for $A - B$ we’ll use $w_3$, with $w_4$ for everything else. In a more compact notation:

We need two things for this weight function to screw up the greedy algorithm. The first is that $w_1 < w_2 < w_3 < w_4$, so that greedy picks the elements in the order we want. Note that this means it’ll first pick all of $A \cap B$, and then all of $B - A$, and by assumption it won’t be able to pick anything from $A - B$, but since $B$ is assumed to be non-maximal, we have to pick at least one element from $X - (A \cup B)$ and pay $w_4$ for it.

So the second thing we want is that the cost of doing greedy is worse than picking any maximally independent set that contains $A$ (and we know that there has to be some maximal independent set containing $A$). In other words, if we call $m$ the size of a maximally independent set, we want

$\displaystyle |A \cap B| w_1 + |B-A|w_2 + (m - |B|)w_4 > |A \cap B|w_1 + |A-B|w_3 + (m-|A|)w_4$

This can be rearranged (using the fact that $|A| = |B|+1$) to

$\displaystyle w_4 > |A-B|w_3 - |B-A|w_2$

The point here is that the greedy picks too many elements of weight $w_4$, since if we were to start by taking all of $A$ (instead of all of $B$), then we could get by with one fewer. That might not be optimal, but it’s better than greedy and that’s enough for the proof.

So we just need to make $w_4$ large enough to make this inequality hold, while still maintaining $w_2 < w_3$. There are probably many ways to do this, and here’s one. Pick some $0 < \varepsilon < 1$, and set

It’s trivial that $w_1 < w_2$ and $w_3 < w_4$. For the rest we need some observations. First, the fact that $|A-B| = |B-A| + 1$ implies that $w_2 < w_3$. Second, both $|A-B|$ and $|B-A|$ are nonempty, since otherwise the second property of independence systems would contradict our assumption that augmenting $B$ with elements of $A$ breaks independence. Using this, we can divide by these quantities to get

$\displaystyle w_4 = 2 > 1 = \frac{|A-B|(1 + \varepsilon)}{|A-B|} - \frac{|B-A|\varepsilon}{|B-A|}$

This proves the claim and finishes the proof.

$\square$

As a side note, we proved everything here with respect to minimizing the sum of the weights, but one can prove an identical theorem for maximization. The only part that’s really different is picking the clever weight function in part 2. In fact, you can convert between the two by defining a new weight function that subtracts the old weights from some fixed number $N$ that is larger than any of the original weights. So these two problems really are the same thing.

This is pretty amazing! So if you can prove your problem is a matroid then you have an awesome algorithm automatically. And if you run the greedy algorithm for fun and it seems like it works all the time, then that may be hinting that your problem is a matroid. This is one of the best situations one could possibly hope for.

But as usual, there are a few caveats to consider. They are both related to efficiency. The first is the black box algorithm for determining if a set is independent. In a problem like minimum spanning tree or finding independent columns of a matrix, there are polynomial time algorithms for determining independence. These two can both be done, for example, with Gaussian elimination. But there’s nothing to stop our favorite matroid from requiring an exponential amount of time to check if a set is independent. This makes greedy all but useless, since we need to check for independence many times in every round.

Another, perhaps subtler, issue is that the size of the ground set $X$ might be exponentially larger than the rank of the matroid. In other words, at every step our greedy algorithm needs to find a new element to add to the set it’s building up. But there could be such a huge ocean of candidates, all but a few of which break independence. In practice an algorithm might be working with $X$ implicitly, so we could still hope to solve the problem if we had enough knowledge to speed up the search for a new element.

There are still other concerns. For example, a naive approach to implementing greedy takes quadratic time, since you may have to look through every element of $X$ to find the minimum-cost guy to add. What if you just have to have faster runtime than $O(n^2)$? You can still be interested in finding more efficient algorithms that still perform perfectly, and to the best of my knowledge there’s nothing that says that greedy is the only exact algorithm for your favorite matroid. And then there are models where you don’t have direct/random access to the input, and lots of other ways that you can improve on greedy. But those stories are for another time.

Until then!

# Linear Programming and the Most Affordable Healthy Diet — Part 1

Optimization is by far one of the richest ways to apply computer science and mathematics to the real world. Everybody is looking to optimize something: companies want to maximize profits, factories want to maximize efficiency, investors want to minimize risk, the list just goes on and on. The mathematical tools for optimization are also some of the richest mathematical techniques. They form the cornerstone of an entire industry known as operations research, and advances in this field literally change the world.

The mathematical field is called combinatorial optimization, and the name comes from the goal of finding optimal solutions more efficiently than an exhaustive search through every possibility. This post will introduce the most central problem in all of combinatorial optimization, known as the linear program. Even better, we know how to efficiently solve linear programs, so in future posts we’ll write a program that computes the most affordable diet while meeting the recommended health standard.

## Generalizing a Specific Linear Program

Most optimization problems have two parts: an objective function, the thing we want to maximize or minimize, and constraints, rules we must abide by to ensure we get a valid solution. As a simple example you may want to minimize the amount of time you spend doing your taxes (objective function), but you certainly can’t spend a negative amount of time on them (a constraint).

The following more complicated example is the centerpiece of this post. Most people want to minimize the amount of money spent on food. At the same time, one needs to maintain a certain level of nutrition. For males ages 19-30, the United States National Institute for Health recommends 3.7 liters of water per day, 1,000 milligrams of calcium per day, 90 milligrams of vitamin C per day, etc.

We can set up this nutrition problem mathematically, just using a few toy variables. Say we had the option to buy some combination of oranges, milk, and broccoli. Some rough estimates [1] give the following content/costs of these foods. For 0.272 USD you can get 100 grams of orange, containing a total of 53.2mg of calcium, 40mg of vitamin C, and 87g of water. For 0.100 USD you can get 100 grams of whole milk, containing 276mg of calcium, 0mg of vitamin C, and 87g of water. Finally, for 0.381 USD you can get 100 grams of broccoli containing 47mg of calcium, 89.2mg of vitamin C, and 91g of water. Here’s a table summarizing this information:

Nutritional content and prices for 100g of three foods

Food         calcium(mg)     vitamin C(mg)      water(g)   price(USD/100g)
Broccoli     47              89.2               91         0.381
Whole milk   276             0                  87         0.100
Oranges      40              53.2               87         0.272

Some observations: broccoli is more expensive but gets the most of all three nutrients, whole milk doesn’t have any vitamin C but gets a ton of calcium for really cheap, and oranges are a somewhere in between. So you could probably tinker with the quantities and figure out what the cheapest healthy diet is. The problem is what happens when we incorporate hundreds or thousands of food items and tens of nutrient recommendations. This simple example is just to help us build up a nice formality.

So let’s continue doing that. If we denote by $b$ the number of 100g units of broccoli we decide to buy, and $m$ the amount of milk and $r$ the amount of oranges, then we can write the daily cost of food as

$\displaystyle \text{cost}(b,m,r) = 0.381 b + 0.1 m + 0.272 r$

In the interest of being compact (and again, building toward the general linear programming formulation) we can extract the price information into a single cost vector $c = (0.381, 0.1, 0.272)$, and likewise write our variables as a vector $x = (b,m,r)$. We’re implicitly fixing an ordering on the variables that is maintained throughout the problem, but the choice of ordering doesn’t matter. Now the cost function is just the inner product (dot product) of the cost vector and the variable vector $\left \langle c,x \right \rangle$. For some reason lots of people like to write this as $c^Tx$, where $c^T$ denotes the transpose of a matrix, and we imagine that $c$ and $x$ are matrices of size $3 \times 1$. I’ll stick to using the inner product bracket notation.

Now for each type of food we get a specific amount of each nutrient, and the sum of those nutrients needs to be bigger than the minimum recommendation. For example, we want at least 1,000 mg of calcium per day, so we require that $1000 \leq 47b + 276m + 40r$. Likewise, we can write out a table of the constraints by looking at the columns of our table above.

$\displaystyle \begin{matrix} 91b & + & 87m & + & 87r & \geq & 3700 & \text{(water)}\\ 47b & + & 276m & + & 40r & \geq & 1000 & \text{(calcium)} \\ 89.2b & + & 0m & + & 53.2r & \geq & 90 & \text{(vitamin C)} \end{matrix}$

In the same way that we extracted the cost data into a vector to separate it from the variables, we can extract all of the nutrient data into a matrix $A$, and the recommended minimums into a vector $v$. Traditionally the letter $b$ is used for the minimums vector, but for now we’re using $b$ for broccoli.

$A = \begin{pmatrix} 91 & 87 & 87 \\ 47 & 276 & 40 \\ 89.2 & 0 & 53.2 \end{pmatrix}$

$v = \begin{pmatrix} 3700 \\ 1000 \\ 90 \end{pmatrix}$

And now the constraint is that $Ax \geq v$, where the $\geq$ means “greater than or equal to in every coordinate.” So now we can write down the more general form of the problem for our specific matrices and vectors. That is, our problem is to minimize $\left \langle c,x \right \rangle$ subject to the constraint that $Ax \geq v$. This is often written in offset form to contrast it with variations we’ll see in a bit:

$\displaystyle \text{minimize} \left \langle c,x \right \rangle \\ \text{subject to the constraint } Ax \geq v$

In general there’s no reason you can’t have a “negative” amount of one variable. In this problem you can’t buy negative broccoli, so we’ll add the constraints to ensure the variables are nonnegative. So our final form is

$\displaystyle \text{minimize} \left \langle c,x \right \rangle \\ \text{subject to } Ax \geq v \\ \text{and } x \geq 0$

In general, if you have an $m \times n$ matrix $A$, a “minimums” vector $v \in \mathbb{R}^m$, and a cost vector $c \in \mathbb{R}^n$, the problem of finding the vector $x$ that minimizes the cost function while meeting the constraints is called a linear programming problem or simply a linear program.

To satiate the reader’s burning curiosity, the solution for our calcium/vitamin C problem is roughly $x = (1.01, 41.47, 0)$. That is, you should have about 100g of broccoli and 4.2kg of milk (like 4 liters), and skip the oranges entirely. The daily cost is about 4.53 USD. If this seems awkwardly large, it’s because there are cheaper ways to get water than milk.

100g of broccoli (image source: 100-grams.blogspot.com)

## Duality

Now that we’ve seen the general form a linear program and a cute example, we can ask the real meaty question: is there an efficient algorithm that solves arbitrary linear programs? Despite how widely applicable these problems seem, the answer is yes!

But before we can describe the algorithm we need to know more about linear programs. For example, say you have some vector $x$ which satisfies your constraints. How can you tell if it’s optimal? Without such a test we’d have no way to know when to terminate our algorithm. Another problem is that we’ve phrased the problem in terms of minimization, but what about problems where we want to maximize things? Can we use the same algorithm that finds minima to find maxima as well?

Both of these problems are neatly answered by the theory of duality. In mathematics in general, the best way to understand what people mean by “duality” is that one mathematical object uniquely determines two different perspectives, each useful in its own way. And typically a duality theorem provides one with an efficient way to transform one perspective into the other, and relate the information you get from both perspectives. A theory of duality is considered beautiful because it gives you truly deep insight into the mathematical object you care about.

In linear programming duality is between maximization and minimization. In particular, every maximization problem has a unique “dual” minimization problem, and vice versa. The really interesting thing is that the variables you’re trying to optimize in one form correspond to the contraints in the other form! Here’s how one might discover such a beautiful correspondence. We’ll use a made up example with small numbers to make things easy.

So you have this optimization problem

$\displaystyle \begin{matrix} \text{minimize} & 4x_1+3x_2+9x_3 & \\ \text{subject to} & x_1+x_2+x_3 & \geq 6 \\ & 2x_1+x_3 & \geq 2 \\ & x_2+x_3 & \geq 1 & \\ & x_1,x_2,x_3 & \geq 0 \end{matrix}$

Just for giggles let’s write out what $A$ and $c$ are.

$\displaystyle A = \begin{pmatrix} 1 & 1 & 1 \\ 2 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}, c = (4,3,9), v = (6,2,1)$

Say you want to come up with a lower bound on the optimal solution to your problem. That is, you want to know that you can’t make $4x_1 + 3x_2 + 9x_3$ smaller than some number $m$. The constraints can help us derive such lower bounds. In particular, every variable has to be nonnegative, so we know that $4x_1 + 3x_2 + 9x_3 \geq x_1 + x_2 + x_3 \geq 6$, and so 6 is a lower bound on our optimum. Likewise,

\displaystyle \begin{aligned}4x_1+3x_2+9x_3 & \geq 4x_1+4x_3+3x_2+3x_3 \\ &=2(2x_1 + x_3)+3(x_2+x_3) \\ & \geq 2 \cdot 2 + 3 \cdot 1 \\ &=7\end{aligned}

and that’s an even better lower bound than 6. We could try to write this approach down in general: find some numbers $y_1, y_2, y_3$ that we’ll use for each constraint to form

$\displaystyle y_1(\text{constraint 1}) + y_2(\text{constraint 2}) + y_3(\text{constraint 3})$

To make it a valid lower bound we need to ensure that the coefficients of each of the $x_i$ are smaller than the coefficients in the objective function (i.e. that the coefficient of $x_1$ ends up less than 4). And to make it the best lower bound possible we want to maximize what the right-hand-size of the inequality would be: $y_1 6 + y_2 2 + y_3 1$. If you write out these equations and the constraints you get our “lower bound” problem written as

$\displaystyle \begin{matrix} \text{maximize} & 6y_1 + 2y_2 + y_3 & \\ \text{subject to} & y_1 + 2y_2 & \leq 4 \\ & y_1 + y_3 & \leq 3 \\ & y_1+y_2 + y_3 & \leq 9 \\ & y_1,y_2,y_3 & \geq 0 \end{matrix}$

And wouldn’t you know, the matrix providing the constraints is $A^T$, and the vectors $c$ and $v$ switched places.

$\displaystyle A^T = \begin{pmatrix} 1 & 2 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix}$

This is no coincidence. All linear programs can be transformed in this way, and it would be a useful exercise for the reader to turn the above maximization problem back into a minimization problem by the same technique (computing linear combinations of the constraints to make upper bounds). You’ll be surprised to find that you get back to the original minimization problem! This is part of what makes it “duality,” because the dual of the dual is the original thing again. Often, when we fix the “original” problem, we call it the primal form to distinguish it from the dual form. Usually the primal problem is the one that is easy to interpret.

(Note: because we’re done with broccoli for now, we’re going to use $b$ to denote the constraint vector that used to be $v$.)

Now say you’re given the data of a linear program for minimization, that is the vectors $c, b$ and matrix $A$ for the problem, “minimize $\left \langle c, x \right \rangle$ subject to $Ax \geq b; x \geq 0$.” We can make a general definition: the dual linear program is the maximization problem “maximize $\left \langle b, y \right \rangle$ subject to $A^T y \leq c, y \geq 0$.” Here $y$ is the new set of variables and the superscript T denotes the transpose of the matrix. The constraint for the dual is often written $y^T A \leq c^T$, again identifying vectors with a single-column matrices, but I find the swamp of transposes pointless and annoying (why do things need to be columns?).

Now we can actually prove that the objective function for the dual provides a bound on the objective function for the original problem. It’s obvious from the work we’ve done, which is why it’s called the weak duality theorem.

Weak Duality Theorem: Let $c, A, b$ be the data of a linear program in the primal form (the minimization problem) whose objective function is $\left \langle c, x \right \rangle$. Recall that the objective function of the dual (maximization) problem is $\left \langle b, y \right \rangle$. If $x,y$ are feasible solutions (satisfy the constraints of their respective problems), then

$\left \langle b, y \right \rangle \leq \left \langle c, x \right \rangle$

In other words, the maximum of the dual is a lower bound on the minimum of the primal problem and vice versa. Moreover, any feasible solution for one provides a bound on the other.

Proof. The proof is pleasingly simple. Just inspect the quantity $\left \langle A^T y, x \right \rangle = \left \langle y, Ax \right \rangle$. The constraints from the definitions of the primal and dual give us that

$\left \langle y, b \right \rangle \leq \left \langle y, Ax \right \rangle = \left \langle A^Ty, x \right \rangle \leq \left \langle c,x \right \rangle$

The inequalities follow from the linear algebra fact that if the $u$ in $\left \langle u,v \right \rangle$ is nonnegative, then you can only increase the size of the product by increasing the components of $v$. This is why we need the nonnegativity constraints.

In fact, the world is much more pleasing. There is a theorem that says the two optimums are equal!

Strong Duality Theorem: If there are any solutions $x,y$ to the primal (minimization) problem and the dual (maximization) problem, respectively, then the two problems also have optimal solutions $x^*, y^*$, and two candidate solutions $x^*, y^*$ are optimal if and only if they produce equal objective values $\left \langle c, x^* \right \rangle = \left \langle y^*, b \right \rangle$.

The proof of this theorem is a bit more convoluted than the weak duality theorem, and the key technique is a lemma of Farkas and its variations. See the second half of these notes for a full proof. The nice thing is that this theorem gives us a way to tell if an algorithm to solve linear programs is done: maintain a pair of feasible solutions to the primal and dual problems, improve them by some rule, and stop when the two solutions give equal objective values. The hard part, then, is finding a principled and guaranteed way to improve a given pair of solutions.

On the other hand, you can also prove the strong duality theorem by inventing an algorithm that provably terminates. We’ll see such an algorithm, known as the simplex algorithm in the next post. Sneak peek: it’s a lot like Gaussian elimination. Then we’ll use the algorithm (or an equivalent industry-strength version) to solve a much bigger nutrition problem.

In fact, you can do a bit better than the strong duality theorem, in terms of coming up with a stopping condition for a linear programming algorithm. You can observe that an optimal solution implies further constraints on the relationship between the primal and the dual problems. In particular, this is called the complementary slackness conditions, and they essentially say that if an optimal solution to the primal has a positive variable then the corresponding constraint in the dual problem must be tight (is an equality) to get an optimal solution to the dual. The contrapositive says that if some constraint is slack, or a strict inequality, then either the corresponding variable is zero or else the solution is not optimal. More formally,

Theorem (Complementary Slackness Conditions): Let $A, c, b$ be the data of the primal form of a linear program, “minimize $\left \langle c, x \right \rangle$ subject to $Ax \geq b, x \geq 0$.” Then $x^*, y^*$ are optimal solutions to the primal and dual problems if any only if all of the following conditions hold.

• $x^*, y^*$ are both feasible for their respective problems.
• Whenever $x^*_i > 0$ the corresponding constraint $A^T_i y^* = c_i$ is an equality.
• Whenever $y^*_j > 0$ the corresponding constraint $A_j x^* = b_j$ is an equality.

Here we denote by $M_i$ the $i$-th row of the matrix $M$ and $v_i$ to denote the $i$-th entry of a vector. Another way to write the condition using vectors instead of English is

$\left \langle x^*, A^T y^* - c \right \rangle = 0$
$\left \langle y^*, Ax^* - b \right \rangle$

The proof follows from the duality theorems, and just involves pushing around some vector algebra. See section 6.2 of these notes.

One can interpret complementary slackness in linear programs in a lot of different ways. For us, it will simply be a termination condition for an algorithm: one can efficiently check all of these conditions for the nonzero variables and stop if they’re all satisfied or if we find a variable that violates a slackness condition. Indeed, in more mature optimization analyses, the slackness condition that is more egregiously violated can provide evidence for where a candidate solution can best be improved. For a more intricate and detailed story about how to interpret the complementary slackness conditions, see Section 4 of these notes by Joel Sobel.

Finally, before we close we should note there are geometric ways to think about linear programming. I have my preferred visualization in my head, but I have yet to find a suitable animation on the web that replicates it. Here’s one example in two dimensions. The set of constraints define a convex geometric region in the plane

The constraints define a convex area of “feasible solutions.” Image source: Wikipedia.

Now the optimization function $f(x) = \left \langle c,x \right \rangle$ is also a linear function, and if you fix some output value $y = f(x)$ this defines a line in the plane. As $y$ changes, the line moves along its normal vector (that is, all these fixed lines are parallel). Now to geometrically optimize the target function, we can imagine starting with the line $f(x) = 0$, and sliding it along its normal vector in the direction that keeps it in the feasible region. We can keep sliding it in this direction, and the maximum of the function is just the last instant that this line intersects the feasible region. If none of the constraints are parallel to the family of lines defined by $f$, then this is guaranteed to occur at a vertex of the feasible region. Otherwise, there will be a family of optima lying anywhere on the line segment of last intersection.

In higher dimensions, the only change is that the lines become affine subspaces of dimension $n-1$. That means in three dimensions you’re sliding planes, in four dimensions you’re sliding 3-dimensional hyperplanes, etc. The facts about the last intersection being a vertex or a “line segment” still hold. So as we’ll see next time, successful algorithms for linear programming in practice take advantage of this observation by efficiently traversing the vertices of this convex region. We’ll see this in much more detail in the next post.

Until then!

# (Finite) Fields — A Primer

So far on this blog we’ve given some introductory notes on a few kinds of algebraic structures in mathematics (most notably groups and rings, but also monoids). Fields are the next natural step in the progression.

If the reader is comfortable with rings, then a field is extremely simple to describe: they’re just commutative rings with 0 and 1, where every nonzero element has a multiplicative inverse. We’ll give a list of all of the properties that go into this “simple” definition in a moment, but an even more simple way to describe a field is as a place where “arithmetic makes sense.” That is, you get operations for $+,-, \cdot , /$ which satisfy the expected properties of addition, subtraction, multiplication, and division. So whatever the objects in your field are (and sometimes they are quite weird objects), they behave like usual numbers in a very concrete sense.

So here’s the official definition of a field. We call a set $F$ a field if it is endowed with two binary operations addition ($+$) and multiplication ($\cdot$, or just symbol juxtaposition) that have the following properties:

• There is an element we call 0 which is the identity for addition.
• Addition is commutative and associative.
• Every element $a \in F$ has a corresponding additive inverse $b$ (which may equal $a$) for which $a + b = 0$.

These three properties are just the axioms of a (commutative) group, so we continue:

• There is an element we call 1 (distinct from 0) which is the identity for multiplication.
• Multiplication is commutative and associative.
• Every nonzero element $a \in F$ has a corresponding multiplicative inverse $b$ (which may equal $a$) for which $ab = 1$.
• Addition and multiplication distribute across each other as we expect.

If we exclude the existence of multiplicative inverses, these properties make $F$ a commutative ring, and so we have the following chain of inclusions that describes it all

$\displaystyle \textup{Fields} \subset \textup{Commutative Rings} \subset \textup{Rings} \subset \textup{Commutative Groups} \subset \textup{Groups}$

The standard examples of fields are the real numbers $\mathbb{R}$, the rationals $\mathbb{Q}$, and the complex numbers $\mathbb{C}$. But of course there are many many more. The first natural question to ask about fields is: what can they look like?

For example, can there be any finite fields? A field $F$ which as a set has only finitely many elements?

As we saw in our studies of groups and rings, the answer is yes! The simplest example is the set of integers modulo some prime $p$. We call them $\mathbb{Z} / p \mathbb{Z},$ or sometimes just $\mathbb{Z}/p$ for short, and let’s rederive what we know about them now.

As a set, $\mathbb{Z}/p$ consists of the integers $\left \{ 0, 1, \dots, p-1 \right \}$. The addition and multiplication operations are easy to define, they’re just usual addition and multiplication followed by a modulus. That is, we add by $a + b \mod p$ and multiply with $ab \mod p$. This thing is clearly a commutative ring (because the integers form a commutative ring), so to show this is a field we need to show that everything has a multiplicative inverse.

There is a nice fact that allows us to do this: an element $a$ has an inverse if and only if the only way for it to divide zero is the trivial way $0a = 0$. Here’s a proof. For one direction, suppose $a$ divides zero nontrivially, that is there is some $c \neq 0$ with $ac = 0$. Then if $a$ had an inverse $b$, then $0 = b(ac) = (ba)c = c$, but that’s very embarrassing for $c$ because it claimed to be nonzero. Now suppose $a$ only divides zero in the trivial way. Then look at all possible ways to multiply $a$ by other nonzero elements of $F$. No two can give you the same result because if $ax = ay$ then (without using multiplicative inverses) $a(x-y) = 0$, but we know that $a$ can only divide zero in the trivial way so $x=y$. In other words, the map “multiplication by $a$” is injective. Because the set of nonzero elements of $F$ is finite you have to hit everything (the map is in fact a bijection), and some $x$ will give you $ax = 1$.

Now let’s use this fact on $\mathbb{Z}/p$ in the obvious way. Since $p$ is a prime, there are no two smaller numbers $a, b < p$ so that $ab = p$. But in $\mathbb{Z}/p$ the number $p$ is equivalent to zero (mod $p$)! So $\mathbb{Z}/p$ has no nontrivial zero divisors, and so every element has an inverse, and so it’s a finite field with $p$ elements.

The next question is obvious: can we get finite fields of other sizes? The answer turns out to be yes, but you can’t get finite fields of any size. Let’s see why.

## Characteristics and Vector Spaces

Say you have a finite field $k$ (lower-case k is the standard letter for a field, so let’s forget about $F$). Beacuse the field is finite, if you take 1 and keep adding it to itself you’ll eventually run out of field elements. That is, $n = 1 + 1 + \dots + 1 = 0$ at some point. How do I know it’s zero and doesn’t keep cycling never hitting zero? Well if at two points $n = m \neq 0$, then $n-m = 0$ is a time where you hit zero, contradicting the claim.

Now we define $\textup{char}(k)$, the characteristic of $k$, to be the smallest $n$ (sums of 1 with itself) for which $n = 0$. If there is no such $n$ (this can happen if $k$ is infinite, but doesn’t always happen for infinite fields), then we say the characteristic is zero. It would probably make more sense to say the characteristic is infinite, but that’s just the way it is. Of course, for finite fields the characteristic is always positive. So what can we say about this number? We have seen lots of example where it’s prime, but is it always prime? It turns out the answer is yes!

For if $ab = n = \textup{char}(k)$ is composite, then by the minimality of $n$ we get $a,b \neq 0$, but $ab = n = 0$. This can’t happen by our above observation, because being a zero divisor means you have no inverse! Contradiction, sucker.

But it might happen that there are elements of $k$ that can’t be written as $1 + 1 + \dots + 1$ for any number of terms. We’ll construct examples in a minute (in fact, we’ll classify all finite fields), but we already have a lot of information about what those fields might look like. Indeed, since every field has 1 in it, we just showed that every finite field contains a smaller field (a subfield) of all the ways to add 1 to itself. Since the characteristic is prime, the subfield is a copy of $\mathbb{Z}/p$ for $p = \textup{char}(k)$. We call this special subfield the prime subfield of $k$.

The relationship between the possible other elements of $k$ and the prime subfield is very neat. Because think about it: if $k$ is your field and $F$ is your prime subfield, then the elements of $k$ can interact with $F$ just like any other field elements. But if we separate $k$ from $F$ (make a separate copy of $F$), and just think of $k$ as having addition, then the relationship with $F$ is that of a vector space! In fact, whenever you have two fields $k \subset k'$, the latter has the structure of a vector space over the former.

Back to finite fields, $k$ is a vector space over its prime subfield, and now we can impose all the power and might of linear algebra against it. What’s it’s dimension? Finite because $k$ is a finite set! Call the dimension $m$, then we get a basis $v_1, \dots, v_m$. Then the crucial part: every element of $k$ has a unique representation in terms of the basis. So they are expanded in the form

$\displaystyle f_1v_1 + \dots + f_mv_m$

where the $f_i$ come from $F$. But now, since these are all just field operations, every possible choice for the $f_i$ has to give you a different field element. And how many choices are there for the $f_i$? Each one has exactly $|F| = \textup{char}(k) = p$. And so by counting we get that $k$ has $p^m$ many elements.

This is getting exciting quickly, but we have to pace ourselves! This is a constraint on the possible size of a finite field, but can we realize it for all choices of $p, m$? The answer is again yes, and in the next section we’ll see how.  But reader be warned: the formal way to do it requires a little bit of familiarity with ideals in rings to understand the construction. I’ll try to avoid too much technical stuff, but if you don’t know what an ideal is, you should expect to get lost (it’s okay, that’s the nature of learning new math!).

## Constructing All Finite Fields

Let’s describe a construction. Take a finite field $k$ of characteristic $p$, and say you want to make a field of size $p^m$. What we need to do is construct a field extension, that is, find a bigger field containing $k$ so that the vector space dimension of our new field over $k$ is exactly $m$.

What you can do is first form the ring of polynomials with coefficients in $k$. This ring is usually denoted $k[x]$, and it’s easy to check it’s a ring (polynomial addition and multiplication are defined in the usual way). Now if I were speaking to a mathematician I would say, “From here you take an irreducible monic polynomial $p(x)$ of degree $m$, and quotient your ring by the principal ideal generated by $p$. The result is the field we want!”

In less compact terms, the idea is exactly the same as modular arithmetic on integers. Instead of doing arithmetic with integers modulo some prime (an irreducible integer), we’re doing arithmetic with polynomials modulo some irreducible polynomial $p(x)$. Now you see the reason I used $p$ for a polynomial, to highlight the parallel thought process. What I mean by “modulo a polynomial” is that you divide some element $f$ in your ring by $p$ as much as you can, until the degree of the remainder is smaller than the degree of $p(x)$, and that’s the element of your quotient. The Euclidean algorithm guarantees that we can do this no matter what $k$ is (in the formal parlance, $k[x]$ is called a Euclidean domain for this very reason). In still other words, the “quotient structure” tells us that two polynomials $f, g \in k[x]$ are considered to be the same in $k[x] / p$ if and only if $f - g$ is divisible by $p$. This is actually the same definition for $\mathbb{Z}/p$, with polynomials replacing numbers, and if you haven’t already you can start to imagine why people decided to study rings in general.

Let’s do a specific example to see what’s going on. Say we’re working with $k = \mathbb{Z}/3$ and we want to compute a field of size $27 = 3^3$. First we need to find a monic irreducible polynomial of degree $3$. For now, I just happen to know one: $p(x) = x^3 - x + 1$. In fact, we can check it’s irreducible, because to be reducible it would have to have a linear factor and hence a root in $\mathbb{Z}/3$. But it’s easy to see that if you compute $p(0), p(1), p(2)$ and take (mod 3) you never get zero.

So I’m calling this new ring

$\displaystyle \frac{\mathbb{Z}/3[x]}{(x^3 - x + 1)}$

It happens to be a field, and we can argue it with a whole lot of ring theory. First, we know an irreducible element of this ring is also prime (because the ring is a unique factorization domain), and prime elements generate maximal ideals (because it’s a principal ideal domain), and if you quotient by a maximal ideal you get a field (true of all rings).

But if we want to avoid that kind of argument and just focus on this ring, we can explicitly construct inverses. Say you have a polynomial $f(x)$, and for illustration purposes we’ll choose $f(x) = x^4 + x^2 - 1$. Now in the quotient ring we could do polynomial long division to find remainders, but another trick is just to notice that the quotient is equivalent to the condition that $x^3 = x - 1$. So we can reduce $f(x)$ by applying this rule to $x^4 = x^3 x$ to get

$\displaystyle f(x) = x^2 + x(x-1) - 1 = 2x^2 - x - 1$

Now what’s the inverse of $f(x)$? Well we need a polynomial $g(x) = ax^2 + bx + c$ whose product with $f$ gives us something which is equivalent to 1, after you reduce by $x^3 - x + 1$. A few minutes of algebra later and you’ll discover that this is equivalent to the following polynomial being identically 1

$\displaystyle (a-b+2c)x^2 + (-3a+b-c)x + (a - 2b - 2c) = 1$

In other words, we get a system of linear equations which we need to solve:

\displaystyle \begin{aligned} a & - & b & + & 2c & = 0 \\ -3a & + & b & - & c &= 0 \\ a & - & 2b & - & 2c &= 1 \end{aligned}

And from here you can solve with your favorite linear algebra techniques. This is a good exercise for working in fields, because you get to abuse the prime subfield being characteristic 3 to say terrifying things like $-1 = 2$ and $6b = 0$. The end result is that the inverse polynomial is $2x^2 + x + 1$, and if you were really determined you could write a program to compute these linear systems for any input polynomial and ensure they’re all solvable. We prefer the ring theoretic proof.

In any case, it’s clear that taking a polynomial ring like this and quotienting by a monic irreducible polynomial gives you a field. We just control the size of that field by choosing the degree of the irreducible polynomial to our satisfaction. And that’s how we get all finite fields!

## One Last Word on Irreducible Polynomials

One thing we’ve avoided is the question of why irreducible monic polynomials exist of all possible degrees $m$ over any $\mathbb{Z}/p$ (and as a consequence we can actually construct finite fields of all possible sizes).

The answer requires a bit of group theory to prove this, but it turns out that the polynomial $x^{p^m} - x$ has all degree $m$ monic irreducible polynomials as factors. But perhaps a better question (for computer scientists) is how do we work over a finite field in practice? One way is to work with polynomial arithmetic as we described above, but this has some downsides: it requires us to compute these irreducible monic polynomials (which doesn’t sound so hard, maybe), to do polynomial long division every time we add, subtract, or multiply, and to compute inverses by solving a linear system.

But we can do better for some special finite fields, say where the characteristic is 2 (smells like binary) or we’re only looking at $F_{p^2}$. The benefit there is that we aren’t forced to use polynomials. We can come up with some other kind of structure (say, matrices of a special form) which happens to have the same field structure and makes computing operations relatively painless. We’ll see how this is done in the future, and see it applied to cryptography when we continue with our series on elliptic curve cryptography.

Until then!

# Fixing Bugs in “Computing Homology”

A few awesome readers have posted comments in Computing Homology to the effect of, “Your code is not quite correct!” And they’re right! Despite the almost year since that post’s publication, I haven’t bothered to test it for more complicated simplicial complexes, or even the basic edge cases! When I posted it the mathematics just felt so solid to me that it had to be right (the irony is rich, I know).

As such I’m apologizing for my lack of rigor and explaining what went wrong, the fix, and giving some test cases. As of the publishing of this post, the Github repository for Computing Homology has been updated with the correct code, and some more examples.

The main subroutine was the simultaneousReduce function which I’ll post in its incorrectness below

def simultaneousReduce(A, B):
if A.shape[1] != B.shape[0]:
raise Exception("Matrices have the wrong shape.")

numRows, numCols = A.shape # col reduce A

i,j = 0,0
while True:
if i >= numRows or j >= numCols:
break

if A[i][j] == 0:
nonzeroCol = j
while nonzeroCol < numCols and A[i,nonzeroCol] == 0:
nonzeroCol += 1

if nonzeroCol == numCols:
j += 1
continue

colSwap(A, j, nonzeroCol)
rowSwap(B, j, nonzeroCol)

pivot = A[i,j]
scaleCol(A, j, 1.0 / pivot)
scaleRow(B, j, 1.0 / pivot)

for otherCol in range(0, numCols):
if otherCol == j:
continue
if A[i, otherCol] != 0:
scaleAmt = -A[i, otherCol]
colCombine(A, otherCol, j, scaleAmt)
rowCombine(B, j, otherCol, -scaleAmt)

i += 1; j+= 1

return A,B


It’s a beast of a function, and the persnickety detail was just as beastly: this snippet should have an $i += 1$ instead of a $j$.

if nonzeroCol == numCols:
j += 1
continue


This is simply what happens when we’re looking for a nonzero entry in a row to use as a pivot for the corresponding column, but we can’t find one and have to move to the next row. A stupid error on my part that would be easily caught by proper test cases.

The next mistake is a mathematical misunderstanding. In short, the simultaneous column/row reduction process is not enough to get the $\partial_{k+1}$ matrix into the right form! Let’s see this with a nice example, a triangulation of the Mobius band. There are a number of triangulations we could use, many of which are seen in these slides. The one we’ll use is the following.

It’s first and second boundary maps are as follows (in code, because latex takes too much time to type out)

mobiusD1 = numpy.array([
[-1,-1,-1,-1, 0, 0, 0, 0, 0, 0],
[ 1, 0, 0, 0,-1,-1,-1, 0, 0, 0],
[ 0, 1, 0, 0, 1, 0, 0,-1,-1, 0],
[ 0, 0, 0, 1, 0, 0, 1, 0, 1, 1],
])

mobiusD2 = numpy.array([
[ 1, 0, 0, 0, 1],
[ 0, 0, 0, 1, 0],
[-1, 0, 0, 0, 0],
[ 0, 0, 0,-1,-1],
[ 0, 1, 0, 0, 0],
[ 1,-1, 0, 0, 0],
[ 0, 0, 0, 0, 1],
[ 0, 1, 1, 0, 0],
[ 0, 0,-1, 1, 0],
[ 0, 0, 1, 0, 0],
])


And if we were to run the above code on it we’d get a first Betti number of zero (which is incorrect, it’s first homology group has rank 1). Here are the reduced matrices.

>>> A1, B1 = simultaneousReduce(mobiusD1, mobiusD2)
>>> A1
array([[1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]])
>>> B1
array([[ 0,  0,  0,  0,  0],
[ 0,  0,  0,  0,  0],
[ 0,  0,  0,  0,  0],
[ 0,  0,  0,  0,  0],
[ 0,  1,  0,  0,  0],
[ 1, -1,  0,  0,  0],
[ 0,  0,  0,  0,  1],
[ 0,  1,  1,  0,  0],
[ 0,  0, -1,  1,  0],
[ 0,  0,  1,  0,  0]])


The first reduced matrix looks fine; there’s nothing we can do to improve it. But the second one is not quite fully reduced! Notice that rows 5, 8 and 10 are not linearly independent. So we need to further row-reduce the nonzero part of this matrix before we can read off the true rank in the way we described last time. This isn’t so hard (we just need to reuse the old row-reduce function we’ve been using), but why is this allowed? It’s just because the corresponding column operations for those row operations are operating on columns of all zeros! So we need not worry about screwing up the work we did in column reducing the first matrix, as long as we only work with the nonzero rows of the second.

Of course, nothing is stopping us from ignoring the “corresponding” column operations, since we know we’re already done there. So we just have to finish row reducing this matrix.

This changes our bettiNumber function by adding a single call to a row-reduce function which we name so as to be clear what’s happening. The resulting function is

def bettiNumber(d_k, d_kplus1):
A, B = numpy.copy(d_k), numpy.copy(d_kplus1)
simultaneousReduce(A, B)
finishRowReducing(B)

dimKChains = A.shape[1]
kernelDim = dimKChains - numPivotCols(A)
imageDim = numPivotRows(B)

return kernelDim - imageDim


And running this on our Mobius band example gives:

>>> bettiNumber(mobiusD1, mobiusD2))
1


As desired. Just to make sure things are going swimmingly under the hood, we can check to see how finishRowReducing does after calling simultaneousReduce

>>> simultaneousReduce(mobiusD1, mobiusD2)
(array([[1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]]), array([[ 0,  0,  0,  0,  0],
[ 0,  0,  0,  0,  0],
[ 0,  0,  0,  0,  0],
[ 0,  0,  0,  0,  0],
[ 0,  1,  0,  0,  0],
[ 1, -1,  0,  0,  0],
[ 0,  0,  0,  0,  1],
[ 0,  1,  1,  0,  0],
[ 0,  0, -1,  1,  0],
[ 0,  0,  1,  0,  0]]))
>>> finishRowReducing(mobiusD2)
array([[1, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 1, 0],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]])


Indeed, finishRowReducing finishes row reducing the second boundary matrix. Note that it doesn’t preserve how the rows of zeros lined up with the pivot columns of the reduced version of $\partial_1$ as it did in the previous post, but since in the end we’re only counting pivots it doesn’t matter how we switch rows. The “zeros lining up” part is just for a conceptual understanding of how the image lines up with the kernel for a valid simplicial complex.

In fixing this issue we’ve also fixed an issue another commenter mentioned, that you couldn’t blindly plug in the zero matrix for $\partial_0$ and get zeroth homology (which is the same thing as connected components). After our fix you can.

Of course there still might be bugs, but I have so many drafts lined up on this blog (and research papers to write, experiments to run, theorems to prove), that I’m going to put off writing a full test suite. I’ll just have to update this post with new bug fixes as they come. There’s just so much math and so little time :) But extra kudos to my amazing readers who were diligent enough to run examples and spot my error. I’m truly blessed to have you on my side.

Also note that this isn’t the most efficient way to represent the simplicial complex data, or the most efficient row reduction algorithm. If you’re going to run the code on big inputs, I suggest you take advantage of sparse matrix algorithms for doing this sort of stuff. You can represent the simplices as entries in a dictionary and do all sorts of clever optimizations to make the algorithm effectively linear time in the number of simplices.

Until next time!