Concrete Examples of Quantum Gates

So far in this series we’ve seen a lot of motivation and defined basic ideas of what a quantum circuit is. But on rereading my posts, I think we would all benefit from some concreteness.

“Local” operations

So by now we’ve understood that quantum circuits consist of a sequence of gates A_1, \dots, A_k, where each A_i is an 8-by-8 matrix that operates “locally” on some choice of three (or fewer) qubits. And in your head you imagine starting with some state vector v and applying each A_i locally to its three qubits until the end when you measure the state and get some classical output.

But the point I want to make is that A_i actually changes the whole state vector v, because the three qubits it acts “locally” on are part of the entire basis. Here’s an example. Suppose we have three qubits and they’re in the state

\displaystyle v = \frac{1}{\sqrt{14}} (e_{001} + 2e_{011} - 3e_{101})

Recall we abbreviate basis states by subscripting them by binary strings, so e_{011} = e_0 \otimes e_1 \otimes e_1, and a valid state is any unit vector over the 2^3 = 8 possible basis elements. As a vector, this state is \frac{1}{\sqrt{14}} (0,1,0,2,0,-3,0,0)

Say we apply the gate A that swaps the first and third qubits. “Locally” this gate has the following matrix:

\displaystyle \begin{pmatrix} 1&0&0&0 \\ 0&0&1&0 \\ 0&1&0&0 \\ 0&0&0&1 \end{pmatrix}

where we index the rows and columns by the relevant strings in lexicographic order: 00, 01, 10, 11. So this operation leaves e_{00} and e_{11} the same while swapping the other two. However, as an operation on three qubits the operation looks quite different. And it’s sort of hard to describe a general way to write it down as a matrix because of the choice of indices. There are three different perspectives.

Perspective 1: if the qubits being operated on are sequential (like, the third, fourth, and fifth qubits), then we can write the matrix as I_{2^{a}} \otimes A \otimes I_{2^{b}} where a tensor product of matrices is the Kronecker product and a + b + \log \textup{dim}(A) = n (the number of qubits adds up). Then the final operation looks like a “tiled product” of identity matrices by A, but it’s a pain to write out. Let me hurt my self for your sake, dear reader.

tensormatrix1

And each copy of A \otimes I_{2^b} looks like

tiled-piece

 

That’s a mess, but if you write it out for our example of swapping the first and third qubits of a three-qubit register you get the following:

example-3swap2

And this makes sense: the gate changes any entry of the state vector that has values for the first and third qubit that are different. This is what happens to our state:

\displaystyle v = \frac{1}{\sqrt{14}} (0,1,0,2,0,-3,0,0) \mapsto \frac{1}{\sqrt{14}} (0,0,0,0,1,-3,2,0)

Perspective 2: just assume every operation works on the first three qubits, and wrap each operation A in between an operation that swaps the first three qubits with the desired three. So like BAB for B a swap operation. Then the matrix form looks a bit simpler, and it just means we permute the columns of the matrix form we gave above so that it just has the form A \otimes I_a. This allows one to retain a shred of sanity when trying to envision the matrix for an operation that acts on three qubits that are not sequential. The downside is that to actually use this perspective in an analysis you have to carry around the extra baggage of these permutation matrices. So one might use this as a simplifying assumption (a “without loss of generality” statement).

Perspective 3: ignore matrices and write things down in a summation form. So if \sigma is the permutation that swaps 1 and 3 and leaves the other indices unchanged, we can write the general operation on a state v = \sum_{x \in \{ 0,1 \}^n } a_x e_{x} as Av = \sum_{x \in \{ 0,1 \}^n} a_x e_{\sigma(x)}.

The third option is probably the nicest way to do things, but it’s important to keep the matrix view in mind for many reasons. Just one quick reason: “errors” in quantum gates (that are meant to approximately compute something) compound linearly in the number of gates because the operations are linear. This is a key reason that allows one to design quantum analogues of error correcting codes.

So we’ve established that the basic (atomic) quantum gates are “local” in the sense that they operate on a fixed number of qubits, but they are not local in the sense that they can screw up the entire state vector.

A side note on the meaning of “local”

When I was chugging through learning this stuff (and I still have far to go), I wanted to come up with an alternate characterization of the word “local” so that I would feel better about using the word “local.” Mathematicians are as passionate about word choice as programmers are about text editors. In particular, for a long time I was ignorantly convinced that quantum gates that act on a small number of qubits don’t affect the marginal distribution of measurement outcomes for other qubits. That is, I thought that if A acts on qubits 1,2,3, then Av and v have the same probability of a measurement producing a 1 in index 4, 5, etc, conditioned on fixing a measurement outcome for qubits 1,2,3. In notation, if x is a random variable whose values are binary strings and v is a state vector, I’ll call x \sim v the random process of measuring a state vector v and getting a string x, then my claim was that the following was true for every b_1, b_2, b_3 \in \{0,1\} and every 1 \leq i \leq n:

\displaystyle \begin{aligned}\Pr_{x \sim v}&[x_i = 1 \mid x_1 = b_1, x_2 = b_2, x_3 = b_3] = \\ \Pr_{x \sim Av}&[x_i = 1 \mid x_1 = b_1, x_2 = b_2, x_3 = b_3] \end{aligned}

You could try to prove this, and you would fail because it’s false. In fact, it’s even false if A acts on only a single qubit! Because it’s so tedious to write out all of the notation, I decided to write a program to illustrate the counterexample. (The most brazenly dedicated readers will try to prove this false fact and identify where the proof fails.)

import numpy

H = (1/(2**0.5)) * numpy.array([[1,1], [1,-1]])
I = numpy.identity(4)
A = numpy.kron(H,I)

Here H is the 2 by 2 Hadamard matrix, which operates on a single qubit and maps e_0 \mapsto \frac{e_0 + e_1}{\sqrt{2}}, and e_1 \mapsto \frac{e_0 - e_1}{\sqrt{2}}. This matrix is famous for many reasons, but one simple use as a quantum gate is to generate uniform random coin flips. In particular, measuring He_0 outputs 1 and 0 with equal probability.

So in the code sample above, A is the mapping which applies the Hadamard operation to the first qubit and leaves the other qubits alone.

Then we compute some arbitrary input state vector w

def normalize(z):
   return (1.0 / (sum(abs(z)**2) ** 0.5)) * z

v = numpy.arange(1,9)
w = normalize(v)

And now we write a function to compute the probability of some query conditioned on some fixed bits. We simply sum up the square norms of all of the relevant indices in the state vector.

def condProb(state, query={}, fixed={}):
   num = 0
   denom = 0
   dim = int(math.log2(len(state)))

   for x in itertools.product([0,1], repeat=dim):
      if any(x[index] != b for (index,b) in fixed.items()):
         continue

      i = sum(d << i for (i,d) in enumerate(reversed(x)))
      denom += abs(state[i])**2
      if all(x[index] == b for (index, b) in query.items()):
         num += abs(state[i]) ** 2

   if num == 0:
      return 0 

   return num / denom

So if the query is query = {1:0} and the fixed thing is fixed = {0:0}, then this will compute the probability that the measurement results in the second qubit being zero conditioned on the first qubit also being zero.

And the result:

Aw = A.dot(w)
query = {1:0}
fixed = {0:0}
print((condProb(w, query, fixed), condProb(Aw, query, fixed)))
# (0.16666666666666666, 0.29069767441860467)

So they are not equal in general.

Also, in general we won’t work explicitly with full quantum gate matrices, since for n qubits the have size 2^{2n} which is big. But for finding counterexamples to guesses and false intuition, it’s a great tool.

Some important gates on 1-3 qubits

Let’s close this post with concrete examples of quantum gates. Based on the above discussion, we can write out the 2 x 2 or 4 x 4 matrix form of the operation and understand that it can apply to any two qubits in the state of a quantum program. Gates are most interesting when they’re operating on entangled qubits, and that will come out when we visit our first quantum algorithm next time, but for now we will just discuss at a naive level how they operate on the basis vectors.

Hadamard gate:

We introduced the Hadamard gate already, but I’ll reiterate it here.

Let H be the following 2 by 2 matrix, which operates on a single qubit and maps e_0 \mapsto \frac{e_0 + e_1}{\sqrt{2}}, and e_1 \mapsto \frac{e_0 - e_1}{\sqrt{2}}.

\displaystyle H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}

One can use H to generate uniform random coin flips. In particular, measuring He_0 outputs 1 and 0 with equal probability.

Quantum NOT gate:

Let X be the 2 x 2 matrix formed by swapping the columns of the identity matrix.

\displaystyle X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}

This gate is often called the “Pauli-X” gate by physicists. This matrix is far too simple to be named after a person, and I can only imagine it is still named after a person for the layer of obfuscation that so often makes people feel smarter (same goes for the Pauli-Y and Pauli-Z gates, but we’ll get to those when we need them).

If we’re thinking of e_0 as the boolean value “false” and e_1 as the boolean value “true”, then the quantum NOT gate simply swaps those two states. In particular, note that composing a Hadamard and a quantum NOT gate can have interesting effects: XH(e_0) = H(e_0), but XH(e_1) \neq H(e_1). In the second case, the minus sign is the culprit. Which brings us to…

Phase shift gate:

Given an angle \theta, we can “shift the phase” of one qubit by an angle of \theta using the 2 x 2 matrix R_{\theta}.

\displaystyle R_{\theta} = \begin{pmatrix} 1 & 0 \\ 0 & e^{i \theta} \end{pmatrix}

“Phase” is a term physicists like to use for angles. Since the coefficients of a quantum state vector are complex numbers, and since complex numbers can be thought of geometrically as vectors with direction and magnitude, it makes sense to “rotate” the coefficient of a single qubit. So R_{\theta} does nothing to e_0 and it rotates the coefficient of e_1 by an angle of \theta.

Continuing in our theme of concreteness, if I have the state vector v = \frac{1}{\sqrt{204}} (1,2,3,4,5,6,7,8) and I apply a rotation of pi to the second qubit, then my operation is the matrix I_2 \otimes R_{\pi} \otimes I_2 which maps e_{i0k} \mapsto e_{i0k} and e_{i1k} \mapsto -e_{i1k}. That would map the state v to (1,2,-3,-4,5,6,-7,-8).

If we instead used the rotation by \pi/2 we would get the output state (1,2,3i, 4i, 5, 6, 7i, 8i).

Quantum AND/OR gate:

In the last post in this series we gave the quantum AND gate and left the quantum OR gate as an exercise. Rather than write out the matrix again, let me remind you of this gate using a description of the effect on the basis e_{ijk} where i,j,k \in \{ 0,1 \}. Recall that we need three qubits in order to make the operation reversible (which is a consequence of all unitary gates being unitary matrices). Some notation: \oplus is the XOR of two bits, and \wedge is AND, \vee is OR. The quantum AND gate maps

\displaystyle e_{ijk} \mapsto e_{ij(k \oplus (i \wedge j))}

In words, the third coordinate is XORed with the AND of the first two coordinates. We think of the third coordinate as a “scratchwork” qubit which is maybe prepared ahead of time to be in state zero.

Simiarly, the quantum OR gate maps e_{ijk} \mapsto e_{ij(k \oplus (i \vee j))}. As we saw last time these combined with the quantum NOT gate (and some modest number of scratchwork qubits) allows quantum circuits to simulate any classical circuit.

Controlled-* gate:

The last example in this post is a meta-gate that represents a conditional branching. If we’re given a gate A acting on k qubits, then we define the controlled-A to be an operation which acts on k+1 qubits. Let’s call the added qubit “qubit zero.” Then controlled-A does nothing if qubit zero is in state 0, and applies A if qubit zero is in state 1. Qubit zero is generally called the “control qubit.”

The matrix representing this operation decomposes into blocks if the control qubit is actually the first qubit (or you rearrange).

\displaystyle \begin{pmatrix} I_{2^{k-1}} & 0 \\ 0 & A \end{pmatrix}

A common example of this is the controlled-NOT gate, often abbreviated CNOT, and it has the matrix

\displaystyle \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}

Looking forward

Okay let’s take a step back and evaluate our life choices. So far we’ve spent a few hours of our time motivating quantum computing, explaining the details of qubits and quantum circuits, and seeing examples of concrete quantum gates and studying measurement. I’ve hopefully hammered into your head the notion that quantum states which aren’t pure tensors (i.e. entangled) are where the “weirdness” of quantum computing comes from. But we haven’t seen any examples of quantum algorithms yet!

Next time we’ll see our first example of an algorithm that is genuinely quantum. We won’t tackle factoring yet, but we will see quantum “weirdness” in action.

Until then!

Computing Homology

Update: the mistakes made in the code posted here are fixed and explained in a subsequent post (one minor code bug was fixed here, and a less minor conceptual bug is fixed in the linked post).

In our last post in this series on topology, we defined the homology group. Specifically, we built up a topological space as a simplicial complex (a mess of triangles glued together), we defined an algebraic way to represent collections of simplices called chains as vectors in a vector space, we defined the boundary homomorphism \partial_k as a linear map on chains, and finally defined the homology groups as the quotient vector spaces

\displaystyle H_k(X) = \frac{\textup{ker} \partial_k}{\textup{im} \partial_{k+1}}.

The number of holes in X was just the dimension of this quotient space.

In this post we will be quite a bit more explicit. Because the chain groups are vector spaces and the boundary mappings are linear maps, they can be represented as matrices whose dimensions depend on our simplicial complex structure. Better yet, if we have explicit representations of our chains by way of a basis, then we can use row-reduction techniques to write the matrix in a standard form.

Of course the problem arises when we want to work with two matrices simultaneously (to compute the kernel-mod-image quotient above). This is not computationally any more difficult, but it requires some theoretical fiddling. We will need to dip a bit deeper into our linear algebra toolboxes to see how it works, so the rusty reader should brush up on their linear algebra before continuing (or at least take some time to sort things out if or when confusion strikes).

Without further ado, let’s do an extended example and work our ways toward a general algorithm. As usual, all of the code used for this post is available on this blog’s Github page.

Two Big Matrices

Recall our example simplicial complex from last time.

circle-wedge-sphere

We will compute H_1 of this simplex (which we saw last time was \mathbb{Q}) in a more algorithmic way than we did last time.

Once again, we label the vertices 0-4 so that the extra “arm” has vertex 4 in the middle, and its two endpoints are 0 and 2. This gave us orientations on all of the simplices, and the following chain groups. Since the vertex labels (and ordering) are part of the data of a simplicial complex, we have made no choices in writing these down.

\displaystyle C_0(X) = \textup{span} \left \{ 0,1,2,3,4 \right \}

\displaystyle C_1(X) = \textup{span} \left \{ [0,1], [0,2], [0,3], [0,4], [1,2], [1,3],[2,3],[2,4] \right \}

\displaystyle C_2(X) = \textup{span} \left \{ [0,1,2], [0,1,3], [0,2,3], [1,2,3] \right \}

Now given our known definitions of \partial_k as an alternating sum from last time, we can give a complete specification of the boundary map as a matrix. For \partial_1, this would be

\displaystyle \partial_1 = \bordermatrix{  & [0,1] & [0,2] & [0,3] & [0,4] & [1,2] & [1,3] & [2,3] & [2,4] \cr  0 & -1 & -1 & -1 & -1 & 0 & 0 & 0 & 0\cr  1 & 1 & 0 & 0 & 0 & -1 & -1 & 0 & 0\cr  2 & 0 & 1 & 0 & 0 & 1 & 0 & -1 & -1 \cr  3 & 0 & 0 & 1 & 0 & 0 & 1 & 1 & 0 \cr  4 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 },

where the row labels are the basis for C_0(X) and the column labels are the basis for C_1(X). Similarly, \partial_2 is

\displaystyle \partial_2 = \bordermatrix{  & [0,1,2] & [0,1,3] & [0,2,3] & [1,2,3] \cr  [0,1] & 1 & 1 & 0 & 0\cr  [0,2] & -1 & 0 & 1 & 0\cr  [0,3] & 0 & -1 & -1 & 0\cr  [0,4] & 0 & 0 & 0 & 0\cr  [1,2] & 1 & 0 & 0 & 1\cr  [1,3] & 0 & 1 & 0 & -1\cr  [2,3] & 0 & 0 & 1 & 1\cr  [2,4] & 0 & 0 & 0 & 0}

The reader is encouraged to check that these matrices are written correctly by referring to the formula for \partial as given last time.

Remember the crucial property of \partial, that \partial^2 = \partial_k \partial_{k+1} = 0. Indeed, the composition of the two boundary maps just corresponds to the matrix product of the two matrices, and one can verify by hand that the above two matrices multiply to the zero matrix.

We know from basic linear algebra how to compute the kernel of a linear map expressed as a matrix: column reduce and inspect the columns of zeros. Since the process of row reducing is really a change of basis, we can encapsulate the reduction inside a single invertible matrix A, which, when left-multiplied by \partial, gives us the reduced form of the latter. So write the reduced form of \partial_1 as \partial_1 A.

However, now we’re using two different sets of bases for the shared vector space involved in \partial_1 and \partial_2. In general, it will no longer be the case that \partial_kA\partial_{k+1} = 0. The way to alleviate this is to perform the “corresponding” change of basis in \partial_{k+1}. To make this idea more transparent, we return to the basics.

Changing Two Matrices Simultaneously

Recall that a matrix M represents a linear map between two vector spaces f : V \to W. The actual entries of M depend crucially on the choice of a basis for the domain and codomain. Indeed, if v_i form a basis for V and w_j for W, then the k-th column of the matrix representation M is defined to be the coefficients of the representation of f(v_k) in terms of the w_j. We hope to have nailed this concept down firmly in our first linear algebra primer.

Recall further that row operations correspond to changing a basis for the codomain, and column operations correspond to changing a basis for the domain. For example, the idea of swapping columns i,j in M gives a new matrix which is the representation of f with respect to the (ordered) basis for V which swaps the order of v_i , v_j. Similar things happen for all column operations (they all correspond to manipulations of the basis for V), while analogously row operations implicitly transform the basis for the codomain. Note, though, that the connection between row operations and transformations of the basis for the codomain are slightly more complicated than they are for the column operations. We will explicitly see how it works later in the post.

And so if we’re working with two maps A: U \to V and B: V \to W, and we change a basis for V in B via column reductions, then in order to be consistent, we need to change the basis for V in A via “complementary” row reductions. That is, if we call the change of basis matrix Q, then we’re implicitly sticking Q in between the composition BA to get (BQ)A. This is not the same map as BA, but we can make it the same map by adding a Q^{-1} in the right place:

\displaystyle BA = B(QQ^{-1})A = (BQ)(Q^{-1}A)

Indeed, whenever Q is a change of basis matrix so is Q^{-1} (trivially), and moreover the operations that Q performs on the columns of B are precisely the operations that Q^{-1} performs on the rows of A (this is because elementary row operations take different forms when multiplied on the left or right).

Coming back to our boundary operators, we want a canonical way to view the image of \partial_{k+1} as sitting inside the kernel of \partial_k. If we go ahead and use column reductions to transform \partial_k into a form where the kernel is easy to read off (as the columns consisting entirely of zeroes), then the corresponding row operations, when performed on \partial_{k+1} will tell us exactly the image of \partial_{k+1} inside the kernel of \partial_k.

This last point is true precisely because \textup{im} \partial_{k+1} \subset \textup{ker} \partial_k. This fact guarantees that the irrelevant rows of the reduced version of \partial_{k+1} are all zero.

Let’s go ahead and see this in action on our two big matrices above. For \partial_1, the column reduction matrix is

\displaystyle A =  \begin{pmatrix}  0 & 1 & 0 & 0 & 1 & 1 & 0 & 0\\  0 & 0 & 1 & 0 & -1 & 0 & 1 & 1\\  0 & 0 & 0 & 1 & 0 & -1 & -1 & 0\\  -1 & -1 & -1 & -1 & 0 & 0 & 0 & -1\\  0 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\  0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\  0 & 0 & 0 & 0 & 0 & 0 & 1 & 0\\  0 & 0 & 0 & 0 & 0 & 0 & 0 & 1  \end{pmatrix}

And the product \partial_1 A is

\displaystyle \partial_1 A =  \begin{pmatrix}  1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\  0 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\  0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\  0 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\  -1 & -1 & -1 & -1 & 0 & 0 & 0 & 0  \end{pmatrix}

Now the inverse of A, which is the corresponding basis change for \partial_2, is

\displaystyle A^{-1} =  \begin{pmatrix}  -1 & -1 & -1 & -1 & -0 & -0 & -0 & -0\\  1 & 0 & 0 & 0 & -1 & -1 & 0 & 0\\  0 & 1 & 0 & 0 & 1 & 0 & -1 & -1\\  0 & 0 & 1 & 0 & 0 & 1 & 1 & 0\\  0 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\  0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\  0 & 0 & 0 & 0 & 0 & 0 & 1 & 0\\  0 & 0 & 0 & 0 & 0 & 0 & 0 & 1  \end{pmatrix}

and the corresponding reduced form of \partial_2 is

\displaystyle A^{-1} \partial_2 =  \begin{pmatrix}  0 & 0 & 0 & 0\\  0 & 0 & 0 & 0\\  0 & 0 & 0 & 0\\  0 & 0 & 0 & 0\\  1 & 0 & 0 & 1\\  0 & 1 & 0 & -1\\  0 & 0 & 1 & 1\\  0 & 0 & 0 & 0  \end{pmatrix}

As a side note, we got these matrices by slightly modifying the code from our original post on row reduction to output the change of basis matrix in addition to performing row reduction. It turns out one can implement column reduction as row reduction of the transpose, and the change of basis matrix you get from this process will be the transpose of the change of basis matrix you want (by (AB)^\textup{T} = (B^\textup{T}A^\textup{T})). Though the code is particularly ad-hoc, we include it with the rest of the code used in this post on this blog’s Github page.

Now let’s inspect the two matrices \partial_1 A and A^{-1} \partial_2 more closely. The former has four “pivots” left over, and this corresponds to the rank of the matrix being 4. Moreover, the four basis vectors representing the columns with nonzero pivots, which we’ll call v_1, v_2, v_3, v_4 (we don’t care what their values are), span a complementary subspace to the kernel of \partial_1. Hence, the remaining four vectors (which we’ll call v_5, v_6, v_7, v_8) span the kernel. In particular, this says that the kernel has dimension 4.

On the other hand, we performed the same transformation of the basis of C_1(X) for \partial_2. Looking at the matrix that resulted, we see that the first four rows and the last row (representing v_1, v_2, v_3, v_4, v_8) are entirely zeros and so the image of \partial_2 intersects their span trivially. and the remaining three rows (representing v_5, v_6, v_7) have nonzero pivots. This tells us exactly that the image of \partial_2 is spanned by v_5, v_6, v_7.

And now, the coup de grâce, the quotient to get homology is simply

\displaystyle \frac{ \textup{span} \left \{ v_5, v_6, v_7, v_8 \right \}}{ \textup{span} \left \{ v_5, v_6, v_7 \right \}} = \textup{span} \left \{ v_8 \right \}

And the dimension of the homology group is 1, as desired.

The General Algorithm

It is no coincidence that things worked out at nicely as they did. The process we took of simultaneously rewriting two matrices with respect to a common basis is the bulk of the algorithm to compute homology. Since we’re really only interested in the dimensions of the homology groups, we just need to count pivots. If the number of pivots arising in \partial_k is y and the number of pivots arising in \partial_{k+1} is z, and the dimension of C_k(X) is n, then the dimension is exactly

(n-y) - z = \textup{dim}(\textup{ker} \partial_k) - \textup{dim}(\textup{im}\partial_{k+1})

And it is no coincidence that the pivots lined up so nicely to allow us to count dimensions this way. It is a minor exercise to prove it formally, but the fact that the composition \partial_k \partial_{k+1} = 0 implies that the reduced version of \partial_{k+1} will have an almost reduced row-echelon form (the only difference being the rows of zeros interspersed above, below, and possibly between pivot rows).

As the reader may have guessed at this point, we don’t actually need to compute A and A^{-1}. Instead of this, we can perform the column/row reductions simultaneously on the two matrices. The above analysis helped us prove the algorithm works, and with that guarantee we can throw out the analytical baggage and just compute the damn thing.

Indeed, assuming the input is already processed as two matrices representing the boundary operators with respect to the standard bases of the chain groups, computing homology is only slightly more difficult than row reducing in the first place. Putting our homology where our mouth is, we’ve implemented the algorithm in Python. As usual, the entire code used in this post is available on this blog’s Github page.

The first step is writing auxiliary functions to do elementary row and column operations on matrices. For this post, we will do everything in numpy (which makes the syntax shorter than standard Python syntax, but dependent on the numpy library).

import numpy

def rowSwap(A, i, j):
   temp = numpy.copy(A[i, :])
   A[i, :] = A[j, :]
   A[j, :] = temp

def colSwap(A, i, j):
   temp = numpy.copy(A[:, i])
   A[:, i] = A[:, j]
   A[:, j] = temp

def scaleCol(A, i, c):
   A[:, i] *= c*numpy.ones(A.shape[0])

def scaleRow(A, i, c):
   A[i, :] *= c*numpy.ones(A.shape[1])

def colCombine(A, addTo, scaleCol, scaleAmt):
   A[:, addTo] += scaleAmt * A[:, scaleCol]

def rowCombine(A, addTo, scaleRow, scaleAmt):
   A[addTo, :] += scaleAmt * A[scaleRow, :]

From here, the main meat of the algorithm is doing column reduction on one matrix, and applying the corresponding row operations on the other.

def simultaneousReduce(A, B):
   if A.shape[1] != B.shape[0]:
      raise Exception("Matrices have the wrong shape.")

   numRows, numCols = A.shape # col reduce A

   i,j = 0,0
   while True:
      if i >= numRows or j >= numCols:
         break

      if A[i][j] == 0:
         nonzeroCol = j
         while nonzeroCol < numCols and A[i,nonzeroCol] == 0:
            nonzeroCol += 1

         if nonzeroCol == numCols:
            i += 1
            continue

         colSwap(A, j, nonzeroCol)
         rowSwap(B, j, nonzeroCol)

      pivot = A[i,j]
      scaleCol(A, j, 1.0 / pivot)
      scaleRow(B, j, 1.0 / pivot)

      for otherCol in range(0, numCols):
         if otherCol == j:
            continue
         if A[i, otherCol] != 0:
            scaleAmt = -A[i, otherCol]
            colCombine(A, otherCol, j, scaleAmt)
            rowCombine(B, j, otherCol, -scaleAmt)

      i += 1; j+= 1

   return A,B

This more or less parallels the standard algorithm for row-reduction (with the caveat that all the indices are swapped to do column-reduction). The only somewhat confusing line is the call to rowCombine, which explicitly realizes the corresponding row operation as the inverse of the performed column operation. Note that for row operations, the correspondence between operations on the basis and operations on the rows is not as direct as it is for columns. What’s given above is the true correspondence. Writing down lots of examples will reveal why, and we leave that as an exercise to the reader.

Then the actual algorithm to compute homology is just a matter of counting pivots. Here are two pivot-counting functions in a typical numpy fashion:

def numPivotCols(A):
   z = numpy.zeros(A.shape[0])
   return [numpy.all(A[:, j] == z) for j in range(A.shape[1])].count(False)

def numPivotRows(A):
   z = numpy.zeros(A.shape[1])
   return [numpy.all(A[i, :] == z) for i in range(A.shape[0])].count(False)

And the final function is just:

def bettiNumber(d_k, d_kplus1):
   A, B = numpy.copy(d_k), numpy.copy(d_kplus1)
   simultaneousReduce(A, B)

   dimKChains = A.shape[1]
   kernelDim = dimKChains - numPivotCols(A)
   imageDim = numPivotRows(B)

   return kernelDim - imageDim

And there we have it! We’ve finally tackled the beast, and written a program to compute algebraic features of a topological space.

The reader may be curious as to why we didn’t come up with a more full-bodied representation of a simplicial complex and write an algorithm which accepts a simplicial complex and computes all of its homology groups. We’ll leave this direct approach as a (potentially long) exercise to the reader, because coming up in this series we are going to do one better. Instead of computing the homology groups of just one simplicial complex using by repeating one algorithm many times, we’re going to compute all the homology groups of a whole family of simplicial complexes in a single bound. This family of simplicial complexes will be constructed from a data set, and so, in grandiose words, we will compute the topological features of data.

If it sounds exciting, that’s because it is! We’ll be exploring a cutting-edge research field known as persistent homology, and we’ll see some of the applications of this theory to data analysis.

Until then!

Linear Algebra – A Primer

Story Time

Linear algebra was founded around the same time as Calculus (think Leibniz, circa 1700) solely for the purpose of solving general systems of linear equations. The coefficients of a system were written in a grid form, with rows corresponding to equations and columns to the unknown variables. Using a computational tool called the determinant (an awkward, but computable formula involving only the coefficients of the equations in a system), researchers were able to solve these systems, opening a world of information about the positions of celestial bodies and large-scale measurements (of geodesic arcs) on the surface of the earth.

By the 1850’s, Arthur Cayley was representing matrices as abstract objects. He defined matrix multiplication and nurtured matrix theory as its own field, recognizing a vast wealth of theoretical knowledge underlying the theory of determinants. Around turn of the century, a formal system of vector algebra was invented which relied heavily on interpreting matrices as so-called linear transformations. Linear transformations are intuitively those maps of everyday space (\mathbb{R}^n) which preserve “linear” things. Specifically, they send lines to lines, planes to planes, etc., and they preserve the origin (one which does not preserve the origin is very similar but has a different name; see Affine Transformation). Soon enough the mathematical focus shifted to the foundations of such an algebra, and later with the advent of computers to rapid calculations in one.

Motivations

Linear algebra sits at the crossroads of many areas of mathematics. Keeping close to its roots, linear algebra is primarily a tool for computation. Unsurprisingly, a huge chunk of mathematical research has been solely to phrase things in terms of matrices and their associated linear transformations. For instance, an undirected graph on n vertices can be modeled as a matrix of integer entries, with the i,j entry containing the number of edges from vertex i to vertex j. This is called the adjacency matrix of a graph. Suddenly, a wealth of information about the graph translates to simple matrix computations. For instance, we can compute the number of paths from one vertex to another of length m as the appropriate entry of A^m. (more formally,these are walks, which are allowed to repeat edge traversals and visited vertices)

Even in advanced, purely theoretical mathematics, objects are commonly represented in terms of coordinates in some vector space, and are subsequently studied using all of the great things we know about linear transformations and their matrices. And so, without further ado, we will present the terminology and working concepts necessary for the content elsewhere in this blog.

Vector Spaces

The setting for all of linear algebra is in some vector space. Intuitively this is just a collection of objects, which we call vectors, with some rules on how you can combine vectors to get other vectors. This treatment wouldn’t do that idea justice without an axiomatic definition, so here it is.

Definition: A vector space is a quadruple (V, F, +, \cdot), where V is a set of vectors (points in our space), F is a scalar field (coefficients), +:V \times V \to V is a commutative, associative operation to combine vectors, and \cdot: F \times V \to V is an operation to “scale” vectors. In addition, we need the following properties to hold:

  • Addition and multiplication distribute (as we are used to with traditional algebra).
  • There must be an additive identity, which we call 0, giving 0 + v = v for all v \in V.
  • Every vector must have an additive inverse (every v has some w with v + w = 0).

This is a lot to swallow at first, but it is general for a good reason: there are tons of different kinds of vector spaces! Many of these are surprising and counter-intuitive. For our purposes, however, we may stick with the nice, small vector spaces. So here is a simplified definition that will suffice:

Definition: vector space is a set V of vectors which are fixed-length lists of real numbers (v_1, v_2, \dots , v_n) \in \mathbb{R}^n, where addition between vectors is componentwise, we may scale vectors by any real number, and the following properties hold:

  • Addition and multiplication distribute (as above).
  • (0,0,0, \dots, 0) is the additive identity.
  • (-v_1, -v_2, \dots , -v_n) is the unique additive inverse of (v_1, v_2, \dots , v_n).

Hopefully this is much more familiar to what we think of as “vectors,” and with the understanding that we are viewing it as a vector space, we just call it \mathbb{R}^n. The closure of operations gives us a nice way to characterize “any combination” of vectors in a vector space.

Definition: A linear combination of vectors in a vector space V is the vector

a_1v_1 + a_2v_2 + \dots + a_kv_k

for some positive integer k, scalars a_i, and vectors v_i.

We may speak of the span of a set of vectors as the set of all possible linear combinations of those vectors. Furthermore, we call a set of vectors linearly independent if no vector in the list is in the span of the others. For example, (1,0,0), (0,1,0), and (0,0,1) are linearly independent in \mathbb{R}^3. Specifically, (1,0,0) cannot be written as a(0,1,0) + b(0,0,1) = (0,a,b) for any scalars a,b \in F, and the other two vectors are similarly so.

As usual, we may describe subspaces of a vector space, which are just subsets of V which are themselves vector spaces with the inherited operations. The simplest examples of these are lines, planes, and hyperplanes through the origin in \mathbb{R}^n. Consequently, we may identify \mathbb{R}^n as a subspace of \mathbb{R}^m for any n \leq m.

One of the first things we want to ask about a vector space is “how big is it?” While most instances of vector spaces we will see have uncountably many elements, we can characterize “size” in terms of a different metric: the size of a basis.

Definition: A list of vectors (v_1, v_2, \dots v_n) is a basis for V if its elements are linearly independent, and their span is V. The dimension of a vector space is the length of any basis.

For \mathbb{R}^n, and similarly all finite-dimensional vector spaces, it is easy to prove that all bases have the same length, and hence dimension is well-defined. Further, \mathbb{R}^n admits a very natural basis, often called the standard basis:

e_1 = (1,0, \dots, 0)
e_2 = (0,1, \dots, 0)
\vdots
e_n = (0,0, \dots, 1)

These are best visualized as the coordinate axes in \mathbb{R}^n, and it strokes our intuition as to what a basis should be, because any vector in \mathbb{R}^n can be broken down uniquely into a sum of scalar multiples of these unit coordinates. Indeed, this is true of any basis (due to linear independence). Given a fixed basis for V, every vector v \in V may be uniquely written as a linear combination of basis vectors.

Linear Transformations and their Matrix Representations

Moving quickly toward the heart of linear algebra, we may speak of linear transformations (interchangeably, linear maps) between two vector spaces:

Definition: A function f : V \to W is a linear map if it preserves the operations of addition and scalar multiplication. In other words, for all v, w \in V, c \in F, f(v+w) = f(v)+f(w) and f(cv) = cf(v).

Examples are bountiful; some geometrically inspired ones include rotations about the origin, shears, and scalings. These are functions you’d likely see in an image manipulation program like photoshop. From this we can prove a few basic facts, like that every linear map sends 0 to 0 and additive inverses to additive inverses (try it as an exercise).

One remarkable fact that helps us characterize linear maps is that every linear map is determined completely by what it does to a basis. Since every vector x \in V is a linear combination of basis elements, say x=a_1v_1 + \dots + a_nv_n, we see that a linear map plays nicely:

f(x) = f(a_1v_1 + \dots + a_nv_n) = a_1f(v_1) + \dots + a_nf(v_n)

In other words, if we know what f does to a basis, then we know everything about f. In order to aid our computations, we write what f does to each basis vector in a tabular form. To elaborate on the vague word “does,” we need to also fix a basis of our target vector space W, say (w_1, \dots , w_m), and describe each f(v_i) in terms of this basis. We write it in tabular form, as follows:

\begin{pmatrix} | & | & \mathbf{ } & | \\ f(v_1) & f(v_2) & \dots & f(v_n) \\ | & | & \mathbf{ } & | \end{pmatrix}

The jth column corresponds to f(v_j), and the ith row corresponds to the ith coefficient in the expansion of f(v_j) in terms of the basis for W. Here the vertical bars indicate that each element is a column of scalars. We will do an extended example to make this clear.

Consider the map f on \mathbb{R}^3 defined as (x,y,z) \mapsto (y,x,2z+y). It is easy to check this map is linear, and using the standard basis we see that

f(1,0,0) = (0,1,0),
f(0,1,0) = (1,0,1), and
f(0,0,1) = (0,0,2).

or,

f(e_1) = e_2f(e_2) = e_1 + e_3, and f(e_3) = 2e_3.

Hence, the matrix representation of f with respect to the standard basis is

A = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 2 \end{pmatrix}

Now we see that if we take a (column) vector x, and multiply it on the left by our matrix A, the resulting vector is precisely the coordinate representation of f(x) with respect to the basis for W. In fact, the rules for matrix multiplication were constructed very particularly so that this would be the case. In this way, we may arbitrarily switch between viewing f as a transformation and a vector computation. Compositions of linear maps translate to multiplication of two matrices, and matrix inversion (if it exists) is precisely function inversion.

Of course, there are many different bases we could have chosen. Even though we are going from \mathbb{R}^3 \to \mathbb{R}^3, the column basis could be different from the row basis. Fortunately for our purposes, we are not going to consider what basis is appropriate to choose. All that matters is that fixing a basis, the matrix representation of a linear map is unique, and so we may interchange the notation freely. Even so, the truly interesting things about matrices are those properties which are true no matter which basis we prefer to use.

Eigenvectors and Eigenvalues

Definition: A scalar \lambda \in F is an eigenvalue for the linear map A if there exists a non-zero vector v \in V with Av = \lambda v. Any such vector v which satisfies this equation is said to be an eigenvector of A corresponding to \lambda.

Eigenvectors and eigenvalues have a huge number of applications, including facial recognition software, geology, quantum mechanics, and web search. So being able to find them quickly is of great significance to researchers and engineers. What’s interesting is that while eigenvectors depend on a choice of basis, eigenvalues do not. We prove this now:

Proposition: If A and B are different representations of the same linear map, then any eigenvalue of B is an eigenvalue of A.

Proof. It turns out that the process of “changing a basis” can be boiled down to matrix multiplication. Specifically, if A and B are two different matrix representations of the same linear map, we have the existence of some invertible matrix P such that A = PBP^{-1}, or AP = PB. As a result, if v is an eigenvector for B corresponding to the eigenvalue \lambda, then for some APv = PBv = P \lambda v = \lambda Pv and so A(Pv) = \lambda(Pv), and Pv is an eigenvector for A corresponding to \lambda as well. This proves that eigenvalues are invariant with respect to a change of basis, as desired. \square

The point of this is that we can choose whatever basis we want to work with, and compute the eigenvalues where we’re most comfortable. For instance, if we choose a basis that gives the following diagonal representation,

A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}

then we can just eyeball that the eigenvalues are 1, 2, and 3. In fact, there are some very deep theorems in linear algebra that concern the existence and uniqueness of certain matrix representations. For a more in-depth treatment, see Axler, Linear Algebra Done Right. We will cover all the necessary information in the relevant posts, but until then, we are absolutely pooped from typing. Until next time!