Computing Homology

Update: the mistakes made in the code posted here are fixed and explained in a subsequent post (one minor code bug was fixed here, and a less minor conceptual bug is fixed in the linked post).

In our last post in this series on topology, we defined the homology group. Specifically, we built up a topological space as a simplicial complex (a mess of triangles glued together), we defined an algebraic way to represent collections of simplices called chains as vectors in a vector space, we defined the boundary homomorphism $ \partial_k$ as a linear map on chains, and finally defined the homology groups as the quotient vector spaces

$ \displaystyle H_k(X) = \frac{\textup{ker} \partial_k}{\textup{im} \partial_{k+1}}$.

The number of holes in $ X$ was just the dimension of this quotient space.

In this post we will be quite a bit more explicit. Because the chain groups are vector spaces and the boundary mappings are linear maps, they can be represented as matrices whose dimensions depend on our simplicial complex structure. Better yet, if we have explicit representations of our chains by way of a basis, then we can use row-reduction techniques to write the matrix in a standard form.

Of course the problem arises when we want to work with two matrices simultaneously (to compute the kernel-mod-image quotient above). This is not computationally any more difficult, but it requires some theoretical fiddling. We will need to dip a bit deeper into our linear algebra toolboxes to see how it works, so the rusty reader should brush up on their linear algebra before continuing (or at least take some time to sort things out if or when confusion strikes).

Without further ado, let’s do an extended example and work our ways toward a general algorithm. As usual, all of the code used for this post is available on this blog’s Github page.

Two Big Matrices

Recall our example simplicial complex from last time.


We will compute $ H_1$ of this simplex (which we saw last time was $ \mathbb{Q}$) in a more algorithmic way than we did last time.

Once again, we label the vertices 0-4 so that the extra “arm” has vertex 4 in the middle, and its two endpoints are 0 and 2. This gave us orientations on all of the simplices, and the following chain groups. Since the vertex labels (and ordering) are part of the data of a simplicial complex, we have made no choices in writing these down.

$ \displaystyle C_0(X) = \textup{span} \left \{ 0,1,2,3,4 \right \}$

$ \displaystyle C_1(X) = \textup{span} \left \{ [0,1], [0,2], [0,3], [0,4], [1,2], [1,3],[2,3],[2,4] \right \}$

$ \displaystyle C_2(X) = \textup{span} \left \{ [0,1,2], [0,1,3], [0,2,3], [1,2,3] \right \}$

Now given our known definitions of $ \partial_k$ as an alternating sum from last time, we can give a complete specification of the boundary map as a matrix. For $ \partial_1$, this would be

where the row labels are the basis for $ C_0(X)$ and the column labels are the basis for $ C_1(X)$. Similarly, $ \partial_2$ is

The reader is encouraged to check that these matrices are written correctly by referring to the formula for $ \partial$ as given last time.

Remember the crucial property of $ \partial$, that $ \partial^2 = \partial_k \partial_{k+1} = 0$. Indeed, the composition of the two boundary maps just corresponds to the matrix product of the two matrices, and one can verify by hand that the above two matrices multiply to the zero matrix.

We know from basic linear algebra how to compute the kernel of a linear map expressed as a matrix: column reduce and inspect the columns of zeros. Since the process of row reducing is really a change of basis, we can encapsulate the reduction inside a single invertible matrix $ A$, which, when left-multiplied by $ \partial$, gives us the reduced form of the latter. So write the reduced form of $ \partial_1$ as $ \partial_1 A$.

However, now we’re using two different sets of bases for the shared vector space involved in $ \partial_1$ and $ \partial_2$. In general, it will no longer be the case that $ \partial_kA\partial_{k+1} = 0$. The way to alleviate this is to perform the “corresponding” change of basis in $ \partial_{k+1}$. To make this idea more transparent, we return to the basics.

Changing Two Matrices Simultaneously

Recall that a matrix $ M$ represents a linear map between two vector spaces $ f : V \to W$. The actual entries of $ M$ depend crucially on the choice of a basis for the domain and codomain. Indeed, if $ v_i$ form a basis for $ V$ and $ w_j$ for $ W$, then the $ k$-th column of the matrix representation $ M$ is defined to be the coefficients of the representation of $ f(v_k)$ in terms of the $ w_j$. We hope to have nailed this concept down firmly in our first linear algebra primer.

Recall further that row operations correspond to changing a basis for the codomain, and column operations correspond to changing a basis for the domain. For example, the idea of swapping columns $ i,j$ in $ M$ gives a new matrix which is the representation of $ f$ with respect to the (ordered) basis for $ V$ which swaps the order of $ v_i , v_j$. Similar things happen for all column operations (they all correspond to manipulations of the basis for $ V$), while analogously row operations implicitly transform the basis for the codomain. Note, though, that the connection between row operations and transformations of the basis for the codomain are slightly more complicated than they are for the column operations. We will explicitly see how it works later in the post.

And so if we’re working with two maps $ A: U \to V$ and $ B: V \to W$, and we change a basis for $ V$ in $ B$ via column reductions, then in order to be consistent, we need to change the basis for $ V$ in $ A$ via “complementary” row reductions. That is, if we call the change of basis matrix $ Q$, then we’re implicitly sticking $ Q$ in between the composition $ BA$ to get $ (BQ)A$. This is not the same map as $ BA$, but we can make it the same map by adding a $ Q^{-1}$ in the right place:

$ \displaystyle BA = B(QQ^{-1})A = (BQ)(Q^{-1}A)$

Indeed, whenever $ Q$ is a change of basis matrix so is $ Q^{-1}$ (trivially), and moreover the operations that $ Q$ performs on the columns of $ B$ are precisely the operations that $ Q^{-1}$ performs on the rows of $ A$ (this is because elementary row operations take different forms when multiplied on the left or right).

Coming back to our boundary operators, we want a canonical way to view the image of $ \partial_{k+1}$ as sitting inside the kernel of $ \partial_k$. If we go ahead and use column reductions to transform $ \partial_k$ into a form where the kernel is easy to read off (as the columns consisting entirely of zeroes), then the corresponding row operations, when performed on $ \partial_{k+1}$ will tell us exactly the image of $ \partial_{k+1}$ inside the kernel of $ \partial_k$.

This last point is true precisely because $ \textup{im} \partial_{k+1} \subset \textup{ker} \partial_k$. This fact guarantees that the irrelevant rows of the reduced version of $ \partial_{k+1}$ are all zero.

Let’s go ahead and see this in action on our two big matrices above. For $ \partial_1$, the column reduction matrix is

$ \displaystyle A =
0 & 1 & 0 & 0 & 1 & 1 & 0 & 0\\
0 & 0 & 1 & 0 & -1 & 0 & 1 & 1\\
0 & 0 & 0 & 1 & 0 & -1 & -1 & 0\\
-1 & -1 & -1 & -1 & 0 & 0 & 0 & -1\\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1

And the product $ \partial_1 A$ is

$ \displaystyle \partial_1 A =
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\
-1 & -1 & -1 & -1 & 0 & 0 & 0 & 0

Now the inverse of $ A$, which is the corresponding basis change for $ \partial_2$, is

$ \displaystyle A^{-1} =
-1 & -1 & -1 & -1 & -0 & -0 & -0 & -0\\
1 & 0 & 0 & 0 & -1 & -1 & 0 & 0\\
0 & 1 & 0 & 0 & 1 & 0 & -1 & -1\\
0 & 0 & 1 & 0 & 0 & 1 & 1 & 0\\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix} $

and the corresponding reduced form of $ \partial_2$ is

$ \displaystyle A^{-1} \partial_2 =
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
1 & 0 & 0 & 1\\
0 & 1 & 0 & -1\\
0 & 0 & 1 & 1\\
0 & 0 & 0 & 0

As a side note, we got these matrices by slightly modifying the code from our original post on row reduction to output the change of basis matrix in addition to performing row reduction. It turns out one can implement column reduction as row reduction of the transpose, and the change of basis matrix you get from this process will be the transpose of the change of basis matrix you want (by $ (AB)^\textup{T} = (B^\textup{T}A^\textup{T})$). Though the code is particularly ad-hoc, we include it with the rest of the code used in this post on this blog’s Github page.

Now let’s inspect the two matrices $ \partial_1 A$ and $ A^{-1} \partial_2$ more closely. The former has four “pivots” left over, and this corresponds to the rank of the matrix being 4. Moreover, the four basis vectors representing the columns with nonzero pivots, which we’ll call $ v_1, v_2, v_3, v_4$ (we don’t care what their values are), span a complementary subspace to the kernel of $ \partial_1$. Hence, the remaining four vectors (which we’ll call $ v_5, v_6, v_7, v_8$) span the kernel. In particular, this says that the kernel has dimension 4.

On the other hand, we performed the same transformation of the basis of $ C_1(X)$ for $ \partial_2$. Looking at the matrix that resulted, we see that the first four rows and the last row (representing $ v_1, v_2, v_3, v_4, v_8$) are entirely zeros and so the image of $ \partial_2$ intersects their span trivially. and the remaining three rows (representing $ v_5, v_6, v_7$) have nonzero pivots. This tells us exactly that the image of $ \partial_2$ is spanned by $ v_5, v_6, v_7$.

And now, the coup de grâce, the quotient to get homology is simply

$ \displaystyle \frac{ \textup{span} \left \{ v_5, v_6, v_7, v_8 \right \}}{ \textup{span} \left \{ v_5, v_6, v_7 \right \}} = \textup{span} \left \{ v_8 \right \}$

And the dimension of the homology group is 1, as desired.

The General Algorithm

It is no coincidence that things worked out at nicely as they did. The process we took of simultaneously rewriting two matrices with respect to a common basis is the bulk of the algorithm to compute homology. Since we’re really only interested in the dimensions of the homology groups, we just need to count pivots. If the number of pivots arising in $ \partial_k$ is $ y$ and the number of pivots arising in $ \partial_{k+1}$ is $ z$, and the dimension of $ C_k(X)$ is $ n$, then the dimension is exactly

$ (n-y) – z = \textup{dim}(\textup{ker} \partial_k) – \textup{dim}(\textup{im}\partial_{k+1})$

And it is no coincidence that the pivots lined up so nicely to allow us to count dimensions this way. It is a minor exercise to prove it formally, but the fact that the composition $ \partial_k \partial_{k+1} = 0$ implies that the reduced version of $ \partial_{k+1}$ will have an almost reduced row-echelon form (the only difference being the rows of zeros interspersed above, below, and possibly between pivot rows).

As the reader may have guessed at this point, we don’t actually need to compute $ A$ and $ A^{-1}$. Instead of this, we can perform the column/row reductions simultaneously on the two matrices. The above analysis helped us prove the algorithm works, and with that guarantee we can throw out the analytical baggage and just compute the damn thing.

Indeed, assuming the input is already processed as two matrices representing the boundary operators with respect to the standard bases of the chain groups, computing homology is only slightly more difficult than row reducing in the first place. Putting our homology where our mouth is, we’ve implemented the algorithm in Python. As usual, the entire code used in this post is available on this blog’s Github page.

The first step is writing auxiliary functions to do elementary row and column operations on matrices. For this post, we will do everything in numpy (which makes the syntax shorter than standard Python syntax, but dependent on the numpy library).

import numpy

def rowSwap(A, i, j):
   temp = numpy.copy(A[i, :])
   A[i, :] = A[j, :]
   A[j, :] = temp

def colSwap(A, i, j):
   temp = numpy.copy(A[:, i])
   A[:, i] = A[:, j]
   A[:, j] = temp

def scaleCol(A, i, c):
   A[:, i] *= c*numpy.ones(A.shape[0])

def scaleRow(A, i, c):
   A[i, :] *= c*numpy.ones(A.shape[1])

def colCombine(A, addTo, scaleCol, scaleAmt):
   A[:, addTo] += scaleAmt * A[:, scaleCol]

def rowCombine(A, addTo, scaleRow, scaleAmt):
   A[addTo, :] += scaleAmt * A[scaleRow, :]

From here, the main meat of the algorithm is doing column reduction on one matrix, and applying the corresponding row operations on the other.

def simultaneousReduce(A, B):
   if A.shape[1] != B.shape[0]:
      raise Exception("Matrices have the wrong shape.")

   numRows, numCols = A.shape # col reduce A

   i,j = 0,0
   while True:
      if i >= numRows or j >= numCols:

      if A[i][j] == 0:
         nonzeroCol = j
         while nonzeroCol < numCols and A[i,nonzeroCol] == 0:
            nonzeroCol += 1

         if nonzeroCol == numCols:
            i += 1

         colSwap(A, j, nonzeroCol)
         rowSwap(B, j, nonzeroCol)

      pivot = A[i,j]
      scaleCol(A, j, 1.0 / pivot)
      scaleRow(B, j, 1.0 / pivot)

      for otherCol in range(0, numCols):
         if otherCol == j:
         if A[i, otherCol] != 0:
            scaleAmt = -A[i, otherCol]
            colCombine(A, otherCol, j, scaleAmt)
            rowCombine(B, j, otherCol, -scaleAmt)

      i += 1; j+= 1

   return A,B

This more or less parallels the standard algorithm for row-reduction (with the caveat that all the indices are swapped to do column-reduction). The only somewhat confusing line is the call to rowCombine, which explicitly realizes the corresponding row operation as the inverse of the performed column operation. Note that for row operations, the correspondence between operations on the basis and operations on the rows is not as direct as it is for columns. What’s given above is the true correspondence. Writing down lots of examples will reveal why, and we leave that as an exercise to the reader.

Then the actual algorithm to compute homology is just a matter of counting pivots. Here are two pivot-counting functions in a typical numpy fashion:

def numPivotCols(A):
   z = numpy.zeros(A.shape[0])
   return [numpy.all(A[:, j] == z) for j in range(A.shape[1])].count(False)

def numPivotRows(A):
   z = numpy.zeros(A.shape[1])
   return [numpy.all(A[i, :] == z) for i in range(A.shape[0])].count(False)

And the final function is just:

def bettiNumber(d_k, d_kplus1):
   A, B = numpy.copy(d_k), numpy.copy(d_kplus1)
   simultaneousReduce(A, B)

   dimKChains = A.shape[1]
   kernelDim = dimKChains - numPivotCols(A)
   imageDim = numPivotRows(B)

   return kernelDim - imageDim

And there we have it! We’ve finally tackled the beast, and written a program to compute algebraic features of a topological space.

The reader may be curious as to why we didn’t come up with a more full-bodied representation of a simplicial complex and write an algorithm which accepts a simplicial complex and computes all of its homology groups. We’ll leave this direct approach as a (potentially long) exercise to the reader, because coming up in this series we are going to do one better. Instead of computing the homology groups of just one simplicial complex using by repeating one algorithm many times, we’re going to compute all the homology groups of a whole family of simplicial complexes in a single bound. This family of simplicial complexes will be constructed from a data set, and so, in grandiose words, we will compute the topological features of data.

If it sounds exciting, that’s because it is! We’ll be exploring a cutting-edge research field known as persistent homology, and we’ll see some of the applications of this theory to data analysis.

Until then!

Double Angle Trigonometric Formulas

Problem: Derive the double angle identities

$ \sin(2\theta) = 2\sin(\theta)\cos(\theta)\\ \cos(2\theta) = \cos^2(\theta) – \sin^2(\theta)$

Solution: Recall from linear algebra how one rotates a point in the plane. The matrix of rotation (derived by seeing where $ (1,0)$ and $ (0,1)$ go under a rotation by $ \theta$, and writing those coordinates in the columns) is

$ A = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix}$

Next, note that to rotate a point twice by $ \theta$, we simply multiply the point (as a vector) by $ A$ twice. That is, multiply by $ A^2$:

$ AAv = A^2v$

Computing $ A^2$ gives the following matrix:

$ A^2 = \begin{pmatrix} \cos^2(\theta) – \sin^2(\theta) & -2\sin(\theta)\cos(\theta) \\ 2\sin(\theta)\cos(\theta) & \cos^2(\theta) – \sin^2(\theta) \end{pmatrix}$

But rotating twice by $ \theta$ is the same as rotating once by $ 2\theta$, so we have the equality:

$ \begin{pmatrix} \cos(2\theta) & -\sin(2\theta) \\ \sin(2\theta) & \cos(2\theta) \end{pmatrix} = \begin{pmatrix} \cos^2(\theta) – \sin^2(\theta) & -2\sin(\theta)\cos(\theta) \\ 2\sin(\theta)\cos(\theta) & \cos^2(\theta) – \sin^2(\theta) \end{pmatrix}$

The matrices are equal, so they must be equal entrywise, giving the identities we desire. $ \square$

Discussion: There are (painful, messy) ways to derive these identities by drawing triangles on the unit circle and cultishly chanting “soh-cah-toa.” The key idea in this proof that one might study geometric transformations, and it is a truly mature viewpoint of mathematics. Specifically, over the last two hundred years the field of mathematics has changed focus from the study of mathematical “things” to the study of transformations of mathematical things. This proof is an elementary example of the power such perspective can provide. If you want to be really high-brow, start asking about transformations of transformations of things, and transformations of those transformations, and recurse until you’re doing something original.

Inner Product Spaces – A Primer

This primer is a precursor to a post on eigenfaces, a technique for facial recognition. For a more basic introduction to linear algebra, see our first primer on the subject.


Vector spaces alone are not enough to do a lot of the interesting things we’d like them to do. Since a vector space is a generalization of Euclidean space, it is natural for us to investigate more specific types of vector spaces which are more akin to Euclidean space. In particular, we want to include the notion of a dot product. By admitting additional structure to a vector space, we may perform more computations, and hopefully get more interesting results.

Again, since we are developing mathematics for use in computing, all vector spaces in this post are finite-dimensional, and the field under question is always $ \mathbb{R}$ or $ \mathbb{C}$, but usually the former. It suffices to recognize the vast amount of work and results in infinite-dimensional spaces, but we will not need it here.

Inner Product Spaces

The standard dot product operation in Euclidean space is defined as

$ \left \langle (x_1, \dots , x_n), (y_1, \dots, y_n) \right \rangle = \sum \limits_{i = 1}^n x_iy_i$

So we take the dot product and generalize it to an operation on an arbitrary vector space.

Definition: An inner product on a vector space $ V$ over a field $ F$ is a function $ \left \langle \cdot, \cdot \right \rangle : V \times V \to F$ which satisfies the following properties for all $ x,y,z \in V, c \in F$:

  • $ \left \langle x,y \right \rangle = \overline{ \left \langle y, x\right \rangle }$, where the bar denotes complex conjugate. For real fields, this is just symmetry in the arguments.
  • $ \left \langle cx, y \right \rangle = c \left \langle x,y \right \rangle$
  • $ \left \langle x+y, z \right \rangle = \left \langle x,z \right \rangle + \left \langle y,z \right \rangle$
  • $ \left \langle x,x \right \rangle \geq 0$ and $ \left \langle x,x \right \rangle = 0$ if and only if $ x=0$.

We recommend the novice reader invent some simple and wild operations on two vectors, and confirm or deny that they are inner products.

Notice that the second and third conditions imply that if the second argument of an inner product is fixed, then the resulting map $ V \to F$ is a linear map (since it maps vectors to the underlying field, it has a special name: a linear functional). We leave it as an exercise to the reader to investigate linearity facts about the map resulting from fixing the first argument (hint: things get conjugated).

We call a vector space with an associated inner product and inner product space. Of course, the most natural example of an inner product space is any Euclidean space $ \mathbb{R}^n$ with the dot product. However, there are many other interesting inner products, including ones which involve matrix multiplication, integration, and random variables. As interesting as they may be (and though the results we develop here hold for them), we have no need for them at this point in time. We will stick entirely to inner product spaces embedded in $ \mathbb{R}^n$, and the standard dot product will suffice.

Now from any inner product we may induce a norm on $ V$, which is colloquially the “distance” of a vector from the origin.

Definition: A norm on $ V$ is a function $ \left \| \cdot \right \| : V \to R$ which satisfies the following for all $ x,y \in V, c \in F$:

  • $ \left \| x \right \| \geq 0$, with equality if and only if $ x=0$
  • $ \left \| cx \right \| = |c| \left \| x \right \|$
  • $ \left \| x+y \right \| \leq \left \| x \right \| + \left \| y \right \|$ (the infamous triangle inequality)

If we recall the standard Euclidean norm, we see that it is just $ \left \| x \right \| = \sqrt{\left \langle x,x \right \rangle}$. Indeed, for any inner product this definition satisfies the axioms of a norm, and so it is a natural generalization of “distance” between points in any inner product space.

In particular, those readers familiar with topology (or at least metric space analysis), will immediately recognize that a norm induces a metric on $ V$, defined by $ d(x,y)= \left \| x-y \right \| $, where of course $ x-y$ is the vector between $ x$ and $ y$. Hence, every inner product space has a very rigid (metrized) topology and a fruitful geometry.

Additionally, any vector of norm 1 is called a unit vector, or a normal vector.

Now we look at vectors which have interesting relationships under inner products: specifically, when $ \left \langle x,y \right \rangle = 0$.

Orthogonality and Orthonormal Bases

Definition: Two vectors $ x,y \in V$ are orthogonal if $ \left \langle x,y \right \rangle = 0$. A set $ S$ of vectors is called orthogonal if the vectors in $ S$ are pairwise orthogonal.

Orthogonal vectors naturally generalize perpendicularity in Euclidean space. In particular, two vectors in $ \mathbb{R}^n$ are orthogonal if and only if the subspaces (lines) spanned by them are perpendicular.

The simplest examples of orthogonal vectors are the standard basis vectors $ (1,0, \dots, 0) \dots (0, \dots ,1)$, with respect to the usual dot product. Although, any scalar multiple of a basis vector $ \lambda e_i$ may replace $ e_i$ while still preserving the orthogonality of the set.

Orthogonality gives a new insight into the nature of inner products. Specifically, $ \left \langle x,y \right \rangle$ gives (almost) the length of the projection of $ x$ onto $ y$. In other words, $ \left \langle x,y \right \rangle$ is the component of $ x$ that points in the direction of $ y$ (scaled by the length of $ y$).

Now we may define projection maps to get “projection onto $ y$” more faithfully than a plain inner product:

Definition: The projection of $ x$ onto $ y$, denoted $ p_y(x)$, is the map $ V \to V$ defined by $ p_y(x) = \left \langle x, y \right \rangle \frac{y}{\left \| y \right \|^2}$.

The reader may easily verify that the vectors $ p_y(x)$ and $ x-p_y(x)$ are indeed orthogonal, though the computation is awkward. This will come in handy later when we want to build orthogonal sets of vectors.

In addition, we may obviously decompose any vector $ v$ into its two orthogonal components with respect to another vector $ w$ via this projection: $ v = (v-p_w(v)) + p_w(v)$.

In addition to orthogonality, the standard basis vectors have norm 1. We call such a basis for an inner product space an orthonormal basis (a portmanteau of orthogonal and normal). We commonly denote an orthonormal set of vectors $ (e_1, \dots, e_n)$, perhaps a ritualistic attempt to summon the power of the standard basis vectors.

Given an orthonormal basis for an inner product space V $ (e_1, \dots, e_n)$, we may decompose any vector into its basis representation rather easily:

$ \displaystyle v = p_{e_1}(v) + \dots + p_{e_n}(v) = \left \langle v,e_1 \right \rangle e_1 + \dots + \left \langle v,e_n \right \rangle e_n$,

Since the norm of each $ e_i$ is 1, we may skip the division by the square norm of $ e_i$ in the projection maps. Now, recalling that every vector can be written uniquely as a linear combination of the basis vectors, we see that the inner products $ \left \langle v, e_i \right \rangle$ are precisely those coefficients. The recognition of this fact leads us to an important theorem, with a necessary preliminary definition:

Definition: Two inner product spaces $ V, W$ are isometric if there exists a linear isomorphism $ f:V \to W$ which preserves the inner products in the respective spaces. i.e., $ \left \langle x, y \right \rangle_V = \left \langle f(x), f(y) \right \rangle_W$ for all $ x,y \in V$.

Whereas, linear isomorphisms between vector spaces are the mathematically rigorous way of saying the two vector spaces are identical, isometric vector spaces give the “sameness” of the inner products. Hence, isometric vector spaces have equivalent metrics, topologies, and geometries, with merely a different choice of basis and representation of the vectors. Now, as we had that every finite-dimensional vector space was isomorphic to $ \mathbb{R}^n$, we will soon see that every finite-dimensional inner product space is isometric to $ \mathbb{R}^n$ with the usual dot product.

Theorem: Any finite dimensional real inner product space $ V$ which has an orthonormal basis is isometric to $ \mathbb{R}^n$ with the usual dot product.

Proof: Define $ f: \mathbb{R}^n \to V$ by $ f((x_1, \dots, x_n)) = x_1e_1 + \dots + x_ne_n$. Now, by computation, we see that

$ \left \langle f((a_1, \dots , a_n)), f((b_1, \dots , b_n)) \right \rangle$
$ = \left \langle a_1e_1 + \dots + a_ne_n, b_1e_1 + \dots + b_ne_n \right \rangle$
$ = \sum \limits_{i=1}^n a_i \left \langle e_i, b_1e_1 + \dots + b_ne_n \right \rangle$
$ = \sum \limits_{i=1}^n \sum \limits_{j=1}^n a_i b_j \left \langle e_i, e_j \right \rangle$
$ = a_1b_1 + \dots a_nb_n = \left \langle (a_1, \dots a_n), (b_1, \dots b_n) \right \rangle \square$

Now, all we must prove is that every finite-dimensional inner product space has an orthonormal basis.

Theorem (Gram-Schmidt): Every basis of a finite-dimensional inner product space may be transformed into an orthonormal basis.

Proof: We do so by construction, using the previously introduced projection maps $ p_y(x)$. Given a basis $ (x_1, \dots , x_n)$, we compute the following algorithm:

  1. $ \displaystyle e_1 = \frac{x_1}{\left \| x_1 \right \|}$
  2. For each $ i = 2, \dots , n$:
    $ \ $

    1. Let $ z_i = \sum \limits_{j=1}^{i-1} p_{e_j}(x_i)$
      $ \ $
    2. $ \displaystyle e_i = \frac{x_i – z_i}{\left \| x_i – z_i \right \|}$

The reader may verify computationally that this process produces orthonormal vectors, but the argument is better phrased geometrically: each $ z_i$ is the projection of some new vector onto the subspace generated by all previously computed orthonormal vectors. By subtracting $ x_i – z_i$, we take the part of $ x_i$ that is orthogonal to all vectors in that subspace. This is guaranteed to be nonzero because of the linear independence of our original list. And so, $ e_i$ is orthogonal to every vector preceding it in the algorithm (with indices $ j < i$). Finally, we normalize each $ e_i$ to make them unit vectors, and we are done. $ \square$

We note that this algorithm takes $ O(n^3)$, since we may compute the $ O(n^2)$ needed inner products ahead of time, and then there remains $ O(n)$ steps to compute each of the $ e_i$.

This result now proves that every real finite-dimensional inner product space is isometric to $ \mathbb{R}^n$. With this new insight, we may effectively do all our work in $ \mathbb{R}^n$ with the usual dot product, realizing that the results there hold for all relevant inner product spaces. In other words, our “simplification” at the beginning of this post, restricting our work to $ \mathbb{R}^n$, was not at all a simplification. Proving statements about $ \mathbb{R}^n$ gives us equivalent statements about all real finite-dimensional inner product spaces. Wonderful!

Bases of Eigenvectors

There is one more important topic we wish to discuss here: the importance of bases of eigenvectors. In particular, given a linear operator $ A: V \to V$, if one has a basis of eigenvectors for $ A$, then $ A$ has a diagonal representation.

In particular, if $ A$ has a basis of eigenvectors $ (v_1, \dots , v_n)$, then the expansion of $ Av_i$ in terms of the basis vectors is just $ \lambda_i v_i$, where $ \lambda_i$ is the corresponding eigenvalue. Thus, the matrix corresponding to $ A$ looks like:

$ \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n

Here we count multiplicity of eigenvalues.

The existence of a diagonal representation for a matrix has interesting computational implications. For instance, we often wish to take high powers of a matrix, such as in counting paths in a graph, or working with graphical computations. Unfortunately, each successive matrix multiplication takes $ O(n^2)$ computations. If we wish to compute $ A^m$, this takes $ O(mn^2)$ time. However, if the matrix has a diagonal representation, we may spend the $ O(n^2)$ it takes to convert the matrix to its diagonal form, take powers of the diagonal matrix by simply taking the powers of the diagonal entries, and then convert it back. Indeed, multiplying two diagonal matrices together is just as easy as multiplying the diagonal entries together, as the reader may verify. This optimization reduces computation to $ O(n^2 + mn) = O(nm)$, since we are assuming $ m$ is very large.

Of course, then we are left with the problem of quickly computing eigenvectors. What worries us even more is that we might not have a basis of eigenvectors (some matrices don’t have any!). We instead take a slightly different route, which serves our purposes better. Specifically, we will be using this information to compute eigenvectors of symmetric matrices ($ A = A^{\textup{T}}$). For this, we refer to a grand theorem:

The Spectral Theorem: Every real symmetric matrix has an orthonormal basis consisting of eigenvectors.

The proof goes beyond the scope of this post (see: characteristic polynomials and self-adjoint operators), but it is very useful for us. In particular, by finding these eigenvectors we may both have a diagonal representation for our matrix, and also compute projections in a jiffy! We will see the astounding applications of this quite soon.

So Even with two primers on linear algebra, we have still only scratched the surface of this wonderful subject. In the future we may continue this series of primers by discussing the linear algebra inherent in many optimization problems. Be sure to look out for it.

Until next time!

Linear Algebra – A Primer

Story Time

Linear algebra was founded around the same time as Calculus (think Leibniz, circa 1700) solely for the purpose of solving general systems of linear equations. The coefficients of a system were written in a grid form, with rows corresponding to equations and columns to the unknown variables. Using a computational tool called the determinant (an awkward, but computable formula involving only the coefficients of the equations in a system), researchers were able to solve these systems, opening a world of information about the positions of celestial bodies and large-scale measurements (of geodesic arcs) on the surface of the earth.

By the 1850’s, Arthur Cayley was representing matrices as abstract objects. He defined matrix multiplication and nurtured matrix theory as its own field, recognizing a vast wealth of theoretical knowledge underlying the theory of determinants. Around turn of the century, a formal system of vector algebra was invented which relied heavily on interpreting matrices as so-called linear transformations. Linear transformations are intuitively those maps of everyday space ($ \mathbb{R}^n$) which preserve “linear” things. Specifically, they send lines to lines, planes to planes, etc., and they preserve the origin (one which does not preserve the origin is very similar but has a different name; see Affine Transformation). Soon enough the mathematical focus shifted to the foundations of such an algebra, and later with the advent of computers to rapid calculations in one.


Linear algebra sits at the crossroads of many areas of mathematics. Keeping close to its roots, linear algebra is primarily a tool for computation. Unsurprisingly, a huge chunk of mathematical research has been solely to phrase things in terms of matrices and their associated linear transformations. For instance, an undirected graph on $ n$ vertices can be modeled as a matrix of integer entries, with the $ i,j$ entry containing the number of edges from vertex $ i$ to vertex $ j$. This is called the adjacency matrix of a graph. Suddenly, a wealth of information about the graph translates to simple matrix computations. For instance, we can compute the number of paths from one vertex to another of length $ m$ as the appropriate entry of $ A^m$. (more formally,these are walks, which are allowed to repeat edge traversals and visited vertices)

Even in advanced, purely theoretical mathematics, objects are commonly represented in terms of coordinates in some vector space, and are subsequently studied using all of the great things we know about linear transformations and their matrices. And so, without further ado, we will present the terminology and working concepts necessary for the content elsewhere in this blog.

Vector Spaces

The setting for all of linear algebra is in some vector space. Intuitively this is just a collection of objects, which we call vectors, with some rules on how you can combine vectors to get other vectors. This treatment wouldn’t do that idea justice without an axiomatic definition, so here it is.

Definition: A vector space is a quadruple $ (V, F, +, \cdot)$, where $ V$ is a set of vectors (points in our space), $ F$ is a scalar field (coefficients), $ +:V \times V \to V$ is a commutative, associative operation to combine vectors, and $ \cdot: F \times V \to V$ is an operation to “scale” vectors. In addition, we need the following properties to hold:

  • Addition and multiplication distribute (as we are used to with traditional algebra).
  • There must be an additive identity, which we call $ 0$, giving $ 0 + v = v$ for all $ v \in V$.
  • Every vector must have an additive inverse (every $ v$ has some $ w$ with $ v + w = 0$).

This is a lot to swallow at first, but it is general for a good reason: there are tons of different kinds of vector spaces! Many of these are surprising and counter-intuitive. For our purposes, however, we may stick with the nice, small vector spaces. So here is a simplified definition that will suffice:

Definition: vector space is a set $ V$ of vectors which are fixed-length lists of real numbers $ (v_1, v_2, \dots , v_n) \in \mathbb{R}^n$, where addition between vectors is componentwise, we may scale vectors by any real number, and the following properties hold:

  • Addition and multiplication distribute (as above).
  • $ (0,0,0, \dots, 0)$ is the additive identity.
  • $ (-v_1, -v_2, \dots , -v_n)$ is the unique additive inverse of $ (v_1, v_2, \dots , v_n)$.

Hopefully this is much more familiar to what we think of as “vectors,” and with the understanding that we are viewing it as a vector space, we just call it $ \mathbb{R}^n$. The closure of operations gives us a nice way to characterize “any combination” of vectors in a vector space.

Definition: A linear combination of vectors in a vector space $ V$ is the vector

$ a_1v_1 + a_2v_2 + \dots + a_kv_k$

for some positive integer $ k$, scalars $ a_i$, and vectors $ v_i$.

We may speak of the span of a set of vectors as the set of all possible linear combinations of those vectors. Furthermore, we call a set of vectors linearly independent if no vector in the list is in the span of the others. For example, $ (1,0,0), (0,1,0),$ and $ (0,0,1)$ are linearly independent in $ \mathbb{R}^3$. Specifically, $ (1,0,0)$ cannot be written as $ a(0,1,0) + b(0,0,1) = (0,a,b)$ for any scalars $ a,b \in F$, and the other two vectors are similarly so.

As usual, we may describe subspaces of a vector space, which are just subsets of $ V$ which are themselves vector spaces with the inherited operations. The simplest examples of these are lines, planes, and hyperplanes through the origin in $ \mathbb{R}^n$. Consequently, we may identify $ \mathbb{R}^n$ as a subspace of $ \mathbb{R}^m$ for any $ n \leq m$.

One of the first things we want to ask about a vector space is “how big is it?” While most instances of vector spaces we will see have uncountably many elements, we can characterize “size” in terms of a different metric: the size of a basis.

Definition: A list of vectors $ (v_1, v_2, \dots v_n)$ is a basis for $ V$ if its elements are linearly independent, and their span is $ V$. The dimension of a vector space is the length of any basis.

For $ \mathbb{R}^n$, and similarly all finite-dimensional vector spaces, it is easy to prove that all bases have the same length, and hence dimension is well-defined. Further, $ \mathbb{R}^n$ admits a very natural basis, often called the standard basis:

$ e_1 = (1,0, \dots, 0)$
$ e_2 = (0,1, \dots, 0)$
$ \vdots$
$ e_n = (0,0, \dots, 1)$

These are best visualized as the coordinate axes in $ \mathbb{R}^n$, and it strokes our intuition as to what a basis should be, because any vector in $ \mathbb{R}^n$ can be broken down uniquely into a sum of scalar multiples of these unit coordinates. Indeed, this is true of any basis (due to linear independence). Given a fixed basis for $ V$, every vector $ v \in V$ may be uniquely written as a linear combination of basis vectors.

Linear Transformations and their Matrix Representations

Moving quickly toward the heart of linear algebra, we may speak of linear transformations (interchangeably, linear maps) between two vector spaces:

Definition: A function $ f : V \to W$ is a linear map if it preserves the operations of addition and scalar multiplication. In other words, for all $ v, w \in V, c \in F, f(v+w) = f(v)+f(w)$ and $ f(cv) = cf(v)$.

Examples are bountiful; some geometrically inspired ones include rotations about the origin, shears, and scalings. These are functions you’d likely see in an image manipulation program like photoshop. From this we can prove a few basic facts, like that every linear map sends $ 0$ to $ 0$ and additive inverses to additive inverses (try it as an exercise).

One remarkable fact that helps us characterize linear maps is that every linear map is determined completely by what it does to a basis. Since every vector $ x \in V$ is a linear combination of basis elements, say $ x=a_1v_1 + \dots + a_nv_n$, we see that a linear map plays nicely:

$ f(x) = f(a_1v_1 + \dots + a_nv_n) = a_1f(v_1) + \dots + a_nf(v_n)$

In other words, if we know what $ f$ does to a basis, then we know everything about $ f$. In order to aid our computations, we write what $ f$ does to each basis vector in a tabular form. To elaborate on the vague word “does,” we need to also fix a basis of our target vector space $ W$, say $ (w_1, \dots , w_m)$, and describe each $ f(v_i)$ in terms of this basis. We write it in tabular form, as follows:

$ \begin{pmatrix} | & | & \mathbf{ } & | \\ f(v_1) & f(v_2) & \dots & f(v_n) \\ | & | & \mathbf{ } & | \end{pmatrix}$

The $ j$th column corresponds to $ f(v_j)$, and the $ i$th row corresponds to the $ i$th coefficient in the expansion of $ f(v_j)$ in terms of the basis for $ W$. Here the vertical bars indicate that each element is a column of scalars. We will do an extended example to make this clear.

Consider the map $ f$ on $ \mathbb{R}^3$ defined as $ (x,y,z) \mapsto (y,x,2z+y)$. It is easy to check this map is linear, and using the standard basis we see that

$ f(1,0,0) = (0,1,0)$,
$ f(0,1,0) = (1,0,1)$, and
$ f(0,0,1) = (0,0,2)$.


$ f(e_1) = e_2$, $ f(e_2) = e_1 + e_3$, and $ f(e_3) = 2e_3$.

Hence, the matrix representation of $ f$ with respect to the standard basis is

$ A = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 2 \end{pmatrix}$

Now we see that if we take a (column) vector $ x$, and multiply it on the left by our matrix $ A$, the resulting vector is precisely the coordinate representation of $ f(x)$ with respect to the basis for $ W$. In fact, the rules for matrix multiplication were constructed very particularly so that this would be the case. In this way, we may arbitrarily switch between viewing $ f$ as a transformation and a vector computation. Compositions of linear maps translate to multiplication of two matrices, and matrix inversion (if it exists) is precisely function inversion.

Of course, there are many different bases we could have chosen. Even though we are going from $ \mathbb{R}^3 \to \mathbb{R}^3$, the column basis could be different from the row basis. Fortunately for our purposes, we are not going to consider what basis is appropriate to choose. All that matters is that fixing a basis, the matrix representation of a linear map is unique, and so we may interchange the notation freely. Even so, the truly interesting things about matrices are those properties which are true no matter which basis we prefer to use.

Eigenvectors and Eigenvalues

Definition: A scalar $ \lambda \in F$ is an eigenvalue for the linear map $ A$ if there exists a non-zero vector $ v \in V$ with $ Av = \lambda v$. Any such vector $ v$ which satisfies this equation is said to be an eigenvector of $ A$ corresponding to $ \lambda$.

Eigenvectors and eigenvalues have a huge number of applications, including facial recognition software, geology, quantum mechanics, and web search. So being able to find them quickly is of great significance to researchers and engineers. What’s interesting is that while eigenvectors depend on a choice of basis, eigenvalues do not. We prove this now:

Proposition: If $ A$ and $ B$ are different representations of the same linear map, then any eigenvalue of $ B$ is an eigenvalue of $ A$.

Proof. It turns out that the process of “changing a basis” can be boiled down to matrix multiplication. Specifically, if $ A$ and $ B$ are two different matrix representations of the same linear map, we have the existence of some invertible matrix $ P$ such that $ A = PBP^{-1}$, or $ AP = PB$. As a result, if $ v$ is an eigenvector for $ B$ corresponding to the eigenvalue $ \lambda$, then for some $ APv = PBv = P \lambda v = \lambda Pv$ and so $ A(Pv) = \lambda(Pv)$, and $ Pv$ is an eigenvector for $ A$ corresponding to $ \lambda$ as well. This proves that eigenvalues are invariant with respect to a change of basis, as desired. $ \square$

The point of this is that we can choose whatever basis we want to work with, and compute the eigenvalues where we’re most comfortable. For instance, if we choose a basis that gives the following diagonal representation,

$ A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}$

then we can just eyeball that the eigenvalues are 1, 2, and 3. In fact, there are some very deep theorems in linear algebra that concern the existence and uniqueness of certain matrix representations. For a more in-depth treatment, see Axler, Linear Algebra Done Right. We will cover all the necessary information in the relevant posts, but until then, we are absolutely pooped from typing. Until next time!