A Spectral Analysis of Moore Graphs

For fixed integers r > 0 and odd g, a Moore graph is an r-regular graph of girth g which has the minimum number of vertices n among all r-regular graphs of girth g.

(Recall, the girth of a graph is the length of its shortest cycle, and a graph is regular if all of its vertices have the same degree.)

Problem (Hoffman-Singleton): Find a useful constraint on the relationship between n and r for Moore graphs of girth 5 and degree r.

Note: Excluding the trivial Moore graphs (the complete graphs, with girth 3, and the odd cycles, with degree 2), there are only two known Moore graphs: (a) the Petersen graph and (b) this crazy graph, the Hoffman-Singleton graph:

[Image: the Hoffman-Singleton graph]

The solution to the problem shows that there are only a few cases left to check.

Solution: It is easy to show that the minimum number of vertices of a Moore graph of girth 5 and degree r is 1 + r + r(r-1) = r^2 + 1. Just consider the tree:

[Image: the Petersen graph, drawn as a tree rooted at one vertex]

This is the tree for r = 3, but the branching pattern makes the count clear for any r: 1 root, plus its r neighbors, plus the r(r-1) vertices at distance two.

Provided n = r^2 + 1, we will prove that r must be either 3, 7, or 57. The technique will be to analyze the eigenvalues of a special matrix derived from the Moore graph.

Let A be the adjacency matrix of the supposed Moore graph with these properties. Let B = A^2 = (b_{i,j}). Using the girth and regularity we know:

  • b_{i,i} = r since each vertex has degree r.
  • b_{i,j} = 0 if (i,j) is an edge of G: a walk of length 2 from i to j, together with the edge (i,j) itself, would form a cycle of length 3, which is less than the girth.
  • b_{i,j} = 1 if (i,j) is not an edge, because (using the tree idea above) every two non-adjacent vertices have exactly one neighbor in common.

Let J_n be the n \times n matrix of all 1’s and I_n the identity matrix. Then

\displaystyle B = rI_n + J_n - I_n - A.
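As a sanity check, this identity can be verified numerically for the Petersen graph (r = 3, n = 10). A sketch using numpy; the edges below are the standard pentagon-plus-pentagram construction of the Petersen graph:

```python
import numpy as np

n, r = 10, 3
A = np.zeros((n, n), dtype=int)
for i in range(5):
    for u, v in [(i, (i + 1) % 5),           # outer pentagon
                 (i, i + 5),                 # spokes
                 (i + 5, (i + 2) % 5 + 5)]:  # inner pentagram
        A[u, v] = A[v, u] = 1

I, J = np.eye(n, dtype=int), np.ones((n, n), dtype=int)
assert (A @ A == r * I + J - I - A).all()
```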

We use this matrix equation to generate two equations whose solutions will restrict r. Since A is a real symmetric matrix, it has an orthonormal basis of eigenvectors v_1, \dots, v_n with eigenvalues \lambda_1 , \dots, \lambda_n. Moreover, by regularity we know one of these vectors is the all 1’s vector, with eigenvalue r. Call this v_1 = (1, \dots, 1), \lambda_1 = r. By orthogonality of v_1 with the other v_i, we know that J_nv_i = 0. We also know that, since A is an adjacency matrix with zeros on the diagonal, the trace of A is \sum_i \lambda_i = 0.

Multiply the matrix equation above by any v_i, i > 1, to get

\displaystyle \begin{aligned}A^2v_i &= rv_i - v_i - Av_i \\ \lambda_i^2v_i &= rv_i - v_i - \lambda_i v_i \end{aligned}

Rearranging and factoring out v_i gives \lambda_i^2 + \lambda_i - (r-1) = 0. Let z = 4r - 3; then by the quadratic formula, each non-r eigenvalue must be one of the two roots: \mu_1 = (-1 + \sqrt{z}) / 2 or \mu_2 = (-1 - \sqrt{z})/2.

Say that \mu_1 occurs a times and \mu_2 occurs b times, then n = a + b + 1. So we have the following equations.

\displaystyle \begin{aligned} a + b + 1 &= n \\ r + a \mu_1 + b\mu_2 &= 0 \end{aligned}
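For the Petersen graph (r = 3, so z = 9 and \mu_1 = 1, \mu_2 = -2) we can check both equations numerically; a sketch with numpy, reusing the pentagon-plus-pentagram construction of the Petersen graph:

```python
import numpy as np

A = np.zeros((10, 10), dtype=int)
for i in range(5):
    for u, v in [(i, (i + 1) % 5), (i, i + 5), (i + 5, (i + 2) % 5 + 5)]:
        A[u, v] = A[v, u] = 1

evals = np.linalg.eigvalsh(A)
a = int(np.isclose(evals, 1).sum())    # multiplicity of mu_1 = 1
b = int(np.isclose(evals, -2).sum())   # multiplicity of mu_2 = -2
assert a + b + 1 == 10                 # a = 5, b = 4
assert 3 + a * 1 + b * (-2) == 0       # the trace equation
```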

From these equations you can derive that \sqrt{z} must be an integer: using \mu_1 + \mu_2 = -1 and \mu_1 - \mu_2 = \sqrt{z}, the second equation becomes 2r - (a+b) + (a-b)\sqrt{z} = 0. If a = b this forces r = a = (n-1)/2 = r^2/2, i.e. r = 2, the trivial pentagon; otherwise we can solve for \sqrt{z}, which is then rational, and a rational square root of an integer is an integer. Writing z = m^2, we get r = (m^2 + 3) / 4 for some integer m. With a tiny bit of extra algebra, this gives

\displaystyle m(m^3 - 2m - 16(a-b)) = 15

Implying that m divides 15, meaning m \in \{ 1, 3, 5, 15\}, and as a consequence r = (m^2+3)/4 \in \{ 1, 3, 7, 57\}. Since a 1-regular graph has no cycles at all, the possibilities of interest are r = 3, 7, 57.

\square
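The last step is mechanical enough to check in a couple of lines of Python, enumerating the divisors of 15 and the corresponding degrees:

```python
ms = [m for m in range(1, 16) if 15 % m == 0]  # m must divide 15
rs = [(m * m + 3) // 4 for m in ms]            # r = (m^2 + 3) / 4

assert ms == [1, 3, 5, 15]
assert rs == [1, 3, 7, 57]
```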

Discussion: This is a strikingly clever use of spectral graph theory to answer a question about combinatorics. Spectral graph theory is precisely that: the study of what linear algebra can tell us about graphs. For a deeper dive into spectral graph theory, see the guest post I wrote on With High Probability.

If you allow for even girth, there are a few extra (infinite families of) Moore graphs, see Wikipedia for a list.

With additional techniques, one can also disprove the existence of any Moore graphs that are not among the known ones, with the exception of a possible Moore graph of girth 5 and degree 57 on n = 3250 vertices. It is unknown whether such a graph exists, but if it does, it is known that it cannot be vertex-transitive.

You should go out and find it or prove it doesn’t exist.

Hungry for more applications of linear algebra to combinatorics and computer science? The book Thirty-Three Miniatures is a fantastically entertaining book of linear algebra gems (it’s where I found the proof in this post). The exposition is lucid, and the chapters are short enough to read on my daily train commute.


Inner Product Spaces – A Primer

This primer is a precursor to a post on eigenfaces, a technique for facial recognition. For a more basic introduction to linear algebra, see our first primer on the subject.

Motivations

Vector spaces alone are not enough to do a lot of the interesting things we’d like them to do. Since a vector space is a generalization of Euclidean space, it is natural for us to investigate more specific types of vector spaces which are more akin to Euclidean space. In particular, we want to include the notion of a dot product. By admitting additional structure to a vector space, we may perform more computations, and hopefully get more interesting results.

Again, since we are developing mathematics for use in computing, all vector spaces in this post are finite-dimensional, and the field under question is always \mathbb{R} or \mathbb{C}, but usually the former. We note the vast amount of work and results on infinite-dimensional spaces, but we will not need any of it here.

Inner Product Spaces

The standard dot product operation in Euclidean space is defined as

\left \langle (x_1, \dots , x_n), (y_1, \dots, y_n) \right \rangle = \sum \limits_{i = 1}^n x_iy_i

So we take the dot product and generalize it to an operation on an arbitrary vector space.

Definition: An inner product on a vector space V over a field F is a function \left \langle \cdot, \cdot \right \rangle : V \times V \to F which satisfies the following properties for all x,y,z \in V, c \in F:

  • \left \langle x,y \right \rangle = \overline{ \left \langle y, x\right \rangle }, where the bar denotes complex conjugate. For real fields, this is just symmetry in the arguments.
  • \left \langle cx, y \right \rangle = c \left \langle x,y \right \rangle
  • \left \langle x+y, z \right \rangle = \left \langle x,z \right \rangle + \left \langle y,z \right \rangle
  • \left \langle x,x \right \rangle \geq 0 and \left \langle x,x \right \rangle = 0 if and only if x=0.

We recommend the novice reader invent some simple and wild operations on two vectors, and confirm or deny that they are inner products.

Notice that the second and third conditions imply that if the second argument of an inner product is fixed, then the resulting map V \to F is a linear map (since it maps vectors to the underlying field, it has a special name: a linear functional). We leave it as an exercise to the reader to investigate linearity facts about the map resulting from fixing the first argument (hint: things get conjugated).

We call a vector space with an associated inner product an inner product space. Of course, the most natural example of an inner product space is any Euclidean space \mathbb{R}^n with the dot product. However, there are many other interesting inner products, including ones which involve matrix multiplication, integration, and random variables. As interesting as they may be (and though the results we develop here hold for them), we have no need for them at this point in time. We will stick entirely to inner product spaces embedded in \mathbb{R}^n, and the standard dot product will suffice.

Now from any inner product we may induce a norm on V, which is colloquially the “distance” of a vector from the origin.

Definition: A norm on V is a function \left \| \cdot \right \| : V \to \mathbb{R} which satisfies the following for all x,y \in V, c \in F:

  • \left \| x \right \| \geq 0, with equality if and only if x=0
  • \left \| cx \right \| = |c| \left \| x \right \|
  • \left \| x+y \right \| \leq \left \| x \right \| + \left \| y \right \| (the infamous triangle inequality)

If we recall the standard Euclidean norm, we see that it is just \left \| x \right \| = \sqrt{\left \langle x,x \right \rangle}. Indeed, for any inner product this definition satisfies the axioms of a norm, and so it is a natural generalization of “distance” between points in any inner product space.
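In code, the induced norm is a single line on top of an inner product. A minimal sketch in Python, using the standard dot product on lists of numbers:

```python
import math

def inner(x, y):
    """The standard dot product on R^n."""
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    """The norm induced by the inner product: ||x|| = sqrt(<x, x>)."""
    return math.sqrt(inner(x, x))

assert norm([3, 4]) == 5.0  # the familiar 3-4-5 right triangle
```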

In particular, those readers familiar with topology (or at least metric space analysis), will immediately recognize that a norm induces a metric on V, defined by d(x,y)= \left \| x-y \right \| , where of course x-y is the vector between x and y. Hence, every inner product space has a very rigid (metrized) topology and a fruitful geometry.

Additionally, any vector of norm 1 is called a unit vector, or a normal vector.

Now we look at vectors which have interesting relationships under inner products: specifically, when \left \langle x,y \right \rangle = 0.

Orthogonality and Orthonormal Bases

Definition: Two vectors x,y \in V are orthogonal if \left \langle x,y \right \rangle = 0. A set S of vectors is called orthogonal if the vectors in S are pairwise orthogonal.

Orthogonal vectors naturally generalize perpendicularity in Euclidean space. In particular, two vectors in \mathbb{R}^n are orthogonal if and only if the subspaces (lines) spanned by them are perpendicular.

The simplest examples of orthogonal vectors are the standard basis vectors (1,0, \dots, 0) \dots (0, \dots ,1), with respect to the usual dot product. Moreover, any nonzero scalar multiple of a basis vector \lambda e_i may replace e_i while still preserving the orthogonality of the set.

Orthogonality gives a new insight into the nature of inner products. Specifically, \left \langle x,y \right \rangle gives (almost) the length of the projection of x onto y. In other words, \left \langle x,y \right \rangle is the component of x that points in the direction of y (scaled by the length of y).

Now we may define projection maps to get “projection onto y” more faithfully than a plain inner product:

Definition: The projection of x onto y, denoted p_y(x), is the map V \to V defined by p_y(x) = \left \langle x, y \right \rangle \frac{y}{\left \| y \right \|^2}.

The reader may easily verify that the vectors p_y(x) and x-p_y(x) are indeed orthogonal, though the computation is awkward. This will come in handy later when we want to build orthogonal sets of vectors.

In addition, we may obviously decompose any vector v into its two orthogonal components with respect to another vector w via this projection: v = (v-p_w(v)) + p_w(v).
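A small sketch of the projection map in Python (standard dot product assumed), verifying that the two components of the decomposition are orthogonal:

```python
def inner(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def project(x, y):
    """p_y(x) = <x, y> y / ||y||^2"""
    c = inner(x, y) / inner(y, y)
    return [c * yi for yi in y]

v, w = [2.0, 3.0], [1.0, 0.0]
p = project(v, w)                              # the component of v along w
residual = [vi - pi for vi, pi in zip(v, p)]   # the component orthogonal to w
assert inner(residual, w) == 0.0
```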

In addition to orthogonality, the standard basis vectors have norm 1. We call such a basis for an inner product space an orthonormal basis (a portmanteau of orthogonal and normal). We commonly denote an orthonormal set of vectors (e_1, \dots, e_n), perhaps a ritualistic attempt to summon the power of the standard basis vectors.

Given an orthonormal basis for an inner product space V (e_1, \dots, e_n), we may decompose any vector into its basis representation rather easily:

\displaystyle v = p_{e_1}(v) + \dots + p_{e_n}(v) = \left \langle v,e_1 \right \rangle e_1 + \dots + \left \langle v,e_n \right \rangle e_n.

Since the norm of each e_i is 1, we may skip the division by the square norm of e_i in the projection maps. Now, recalling that every vector can be written uniquely as a linear combination of the basis vectors, we see that the inner products \left \langle v, e_i \right \rangle are precisely those coefficients. The recognition of this fact leads us to an important theorem, with a necessary preliminary definition:

Definition: Two inner product spaces V, W are isometric if there exists a linear isomorphism f:V \to W which preserves the inner products in the respective spaces. i.e., \left \langle x, y \right \rangle_V = \left \langle f(x), f(y) \right \rangle_W for all x,y \in V.

Whereas a linear isomorphism between vector spaces is the mathematically rigorous way of saying two vector spaces are identical, an isometry additionally says the inner products are "the same." Hence, isometric vector spaces have equivalent metrics, topologies, and geometries, with merely a different choice of basis and representation of the vectors. Now, as we had that every finite-dimensional vector space was isomorphic to \mathbb{R}^n, we will soon see that every finite-dimensional inner product space is isometric to \mathbb{R}^n with the usual dot product.

Theorem: Any finite dimensional real inner product space V which has an orthonormal basis is isometric to \mathbb{R}^n with the usual dot product.

Proof: Define f: \mathbb{R}^n \to V by f((x_1, \dots, x_n)) = x_1e_1 + \dots + x_ne_n. Now, by computation, we see that

\left \langle f((a_1, \dots , a_n)), f((b_1, \dots , b_n)) \right \rangle
= \left \langle a_1e_1 + \dots + a_ne_n, b_1e_1 + \dots + b_ne_n \right \rangle
= \sum \limits_{i=1}^n a_i \left \langle e_i, b_1e_1 + \dots + b_ne_n \right \rangle
= \sum \limits_{i=1}^n \sum \limits_{j=1}^n a_i b_j \left \langle e_i, e_j \right \rangle
= a_1b_1 + \dots + a_nb_n = \left \langle (a_1, \dots, a_n), (b_1, \dots, b_n) \right \rangle \square

Now, all we must prove is that every finite-dimensional inner product space has an orthonormal basis.

Theorem (Gram-Schmidt): Every basis of a finite-dimensional inner product space may be transformed into an orthonormal basis.

Proof: We do so by construction, using the previously introduced projection maps p_y(x). Given a basis (x_1, \dots , x_n), we compute the following algorithm:

  1. \displaystyle e_1 = \frac{x_1}{\left \| x_1 \right \|}
  2. For each i = 2, \dots , n:
    1. Let z_i = \sum \limits_{j=1}^{i-1} p_{e_j}(x_i)
    2. \displaystyle e_i = \frac{x_i - z_i}{\left \| x_i - z_i \right \|}

The reader may verify computationally that this process produces orthonormal vectors, but the argument is better phrased geometrically: each z_i is the projection of some new vector onto the subspace generated by all previously computed orthonormal vectors. By subtracting x_i - z_i, we take the part of x_i that is orthogonal to all vectors in that subspace. This is guaranteed to be nonzero because of the linear independence of our original list. And so, e_i is orthogonal to every vector preceding it in the algorithm (with indices j < i). Finally, we normalize each e_i to make them unit vectors, and we are done. \square

We note that this algorithm takes O(n^3) time: over the whole run there are O(n^2) projections to compute, and each requires an inner product and a vector subtraction, each costing O(n).
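Here is a direct transcription of the algorithm in Python with numpy (a sketch; it assumes the input vectors are linearly independent, so the normalization never divides by zero):

```python
import numpy as np

def gram_schmidt(basis):
    """Orthonormalize a list of linearly independent numpy vectors."""
    es = []
    for x in basis:
        # z is the projection of x onto the span of the e's computed so far
        z = sum((e @ x) * e for e in es)
        w = x - z                          # the part of x orthogonal to that span
        es.append(w / np.linalg.norm(w))   # normalize to a unit vector
    return es

es = gram_schmidt([np.array([1.0, 1.0, 0.0]),
                   np.array([1.0, 0.0, 1.0]),
                   np.array([0.0, 1.0, 1.0])])
G = np.array([[e1 @ e2 for e2 in es] for e1 in es])
assert np.allclose(G, np.eye(3))  # pairwise orthogonal, each of norm 1
```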

This result now proves that every real finite-dimensional inner product space is isometric to \mathbb{R}^n. With this new insight, we may effectively do all our work in \mathbb{R}^n with the usual dot product, realizing that the results there hold for all relevant inner product spaces. In other words, our “simplification” at the beginning of this post, restricting our work to \mathbb{R}^n, was not at all a simplification. Proving statements about \mathbb{R}^n gives us equivalent statements about all real finite-dimensional inner product spaces. Wonderful!

Bases of Eigenvectors

There is one more important topic we wish to discuss here: the importance of bases of eigenvectors. In particular, given a linear operator A: V \to V, if one has a basis of eigenvectors for A, then A has a diagonal representation.

Indeed, if A has a basis of eigenvectors (v_1, \dots , v_n), then the expansion of Av_i in terms of the basis vectors is just \lambda_i v_i, where \lambda_i is the corresponding eigenvalue. Thus, the matrix corresponding to A looks like:

\begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n  \end{pmatrix}

Here we count multiplicity of eigenvalues.

The existence of a diagonal representation for a matrix has interesting computational implications. For instance, we often wish to take high powers of a matrix, such as in counting paths in a graph, or working with graphical computations. Unfortunately, each successive (naive) matrix multiplication takes O(n^3) operations, so computing A^m directly takes O(mn^3) time. However, if the matrix has a diagonal representation, we may spend the O(n^3) it takes to convert the matrix to its diagonal form, take powers of the diagonal matrix by simply taking powers of the diagonal entries, and then convert it back. Indeed, multiplying two diagonal matrices together is just as easy as multiplying the diagonal entries together, as the reader may verify. This optimization reduces the computation to O(n^3 + n \log m), which is essentially independent of m.
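A sketch of this trick with numpy, for a symmetric matrix (where a diagonalization with an orthonormal eigenbasis is guaranteed, as we discuss next):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])              # symmetric, hence diagonalizable
vals, vecs = np.linalg.eigh(A)          # A = vecs @ diag(vals) @ vecs.T

m = 12
Am = vecs @ np.diag(vals ** m) @ vecs.T # power only the diagonal entries
assert np.allclose(Am, np.linalg.matrix_power(A, m))
```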

Of course, then we are left with the problem of quickly computing eigenvectors. What worries us even more is that we might not have a basis of eigenvectors (some matrices don’t have one!). We instead take a slightly different route, which serves our purposes better. Specifically, we will be using this information to compute eigenvectors of symmetric matrices (A = A^{\textup{T}}). For this, we refer to a grand theorem:

The Spectral Theorem: Every real symmetric matrix has an orthonormal basis consisting of eigenvectors.

The proof goes beyond the scope of this post (see: characteristic polynomials and self-adjoint operators), but it is very useful for us. In particular, by finding these eigenvectors we may both have a diagonal representation for our matrix, and also compute projections in a jiffy! We will see the astounding applications of this quite soon.

Even with two primers on linear algebra, we have still only scratched the surface of this wonderful subject. In the future we may continue this series of primers by discussing the linear algebra inherent in many optimization problems. Be sure to look out for it.

Until next time!

Google’s Page Rank – The Final Product

Dangling Nodes and Non-Uniqueness

Recall where we left off last time. Given a web W with no dangling nodes, the link matrix for W has 1 as an eigenvalue, and if the corresponding eigenspace has dimension 1, then any associated eigenvector gives a ranking of the pages in W which is consistent with our goals.

The first problem is that if there is a dangling node, our link matrix has a column of all zeros, and is no longer column-stochastic. In fact, such a non-negative matrix which has columns summing to 1 or columns of all zeros is called column-substochastic. We cannot guarantee that 1 is an eigenvalue, with the obvious counterexample being the zero matrix.

Second, as we saw last time, webs which have disconnected subwebs admit eigenspaces of large dimension, and hence our derived rankings are not unique.

We will fix both of these problems in one fell swoop: by adjusting the link matrix to make all entries positive.

The motivation for this comes from the knowledge of a particular theorem, called the Perron-Frobenius Theorem. While the general statement says a great deal, here are the parts we need:

Theorem: Let A be a positive, column-stochastic matrix. Then \lambda = 1 is the maximal eigenvalue of A, in that all other eigenvalues are strictly smaller in magnitude. Further, the eigenspace associated to \lambda = 1 has dimension 1, and it contains an eigenvector whose entries are all positive. This unique eigenvalue and eigenvector (up to scaling) are called the Perron eigenvalue and eigenvector.

We won’t prove this theorem, because it requires a lot of additional background. But we will use it to make life great. All we need is a positive attitude.

A Drunkard’s Surf

Any tinkering with the link matrix must be done in a sensible way. Unfortunately, we can’t “add” new links to our web without destroying its original meaning. So we need to view the resulting link matrix in a different light. Enter probability theory.

So say you’re drunk. You’re surfing the web and at every page you click a random link. Suppose further that every page is just a list of links, so it’s not harder to find some than others, and you’re equally likely to pick any link on the site. If you continue surfing for a long time, you’d expect to see pages with lots of backlinks more often than those with few backlinks. As you sober up, you might realize that this is a great way to characterize how important a webpage is! You quickly write down the probabilities for an example web into a matrix, with each i,j entry being the probability that you click on a link from page j to go to page i.

This is a bit of drunken mathematical gold! We’ve constructed precisely the same link matrix for a web, but found from a different perspective. Unfortunately, after more surfing you end up on a page that has no links. You cannot proceed, so you randomly type in a URL into the address bar, and continue on your merry way. Soon enough, you realize that you aren’t seeing the same webpages as you did in the first walk. This random URL must have taken you to a different connected component of the web. Brilliant! Herein we have the solution to both of our problems: add in some factor of random jumping.

To do this, and yet maintain column-stochasticity, we need to proportionally scale the elements of our matrix. Let A be our original link matrix, and B be the n \times n matrix with all entries 1/n. Then form a new matrix:

C = pB + (1-p)A, 0 \leq p \leq 1

In words, C has a factor of egalitarianism proportional to p. All we need to verify is that C is still column-stochastic, and this is clear since each column sum looks like the following for a fixed j (and a_{i,j} denotes the corresponding entry of A):

\sum \limits_{i=1}^n(\frac{p}{n} + (1-p)a_{i,j}) = p\sum \limits_{i=1}^n \frac{1}{n} + (1-p)\sum \limits_{i=1}^na_{i,j} = p + (1-p) = 1
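A quick numerical check of this computation, with a small made-up column-stochastic link matrix A (the entries below are illustrative, not from any particular web):

```python
import numpy as np

A = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 1.0],
              [0.5, 0.5, 0.0]])   # each column sums to 1
p = 0.15
n = A.shape[0]
B = np.ones((n, n)) / n           # the all-(1/n) matrix
C = p * B + (1 - p) * A

assert np.allclose(C.sum(axis=0), 1.0)  # still column-stochastic
assert (C > 0).all()                    # and now strictly positive
```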

So, applying the Perron-Frobenius theorem to our new matrix (where the value p becomes a parameter in need of tuning), the maximal eigenvalue \lambda = 1 for this web has an eigenspace of dimension 1. Picking any eigenvector within that eigenspace and then normalizing it to a unit vector with positive entries, we have our ranking algorithm!

Aside: note that the assumption that the eigenvector has all positive entries is not unfounded. This follows as a result of our new matrix C being irreducible. The details of this implication are unnecessary, as the Perron-Frobenius theorem provides the positivity of the Perron eigenvector. Furthermore, all other eigenvectors (corresponding to any other eigenvalues) must have an entry which is either negative or non-real. Hence, we only have one available eigenvector to choose for any valid ranking.

Computing the Beast

A link matrix for a web of any reasonable size is massive. As of June 20th, 2011, Google has an indexed database of close to 46 billion web pages, and in its infancy at Stanford, PageRank was tested on a web of merely 24 million pages. Clearly, one big matrix cannot fit in the memory of a single machine. But before we get to the details of optimizing for scale, let us compute page ranks for webs of modest sizes.

The problem of computing eigenvectors was first studied around the beginning of the 1900s. The first published method was called the power method, and it involves approximating the limit of the sequence

\displaystyle v_{n+1} = \frac{Cv_n}{||Cv_n||}

for some arbitrary initial starting vector v_0. Intuitively, when we apply C to v, it “pulls” v toward each of its eigenvectors proportionally to their associated eigenvalues. In other words, the largest eigenvalue dominates. So an element of this sequence which has a high index will have been pulled closer and closer to an eigenvector corresponding to the largest eigenvalue (in absolute value), and hence approximate that eigenvector. Since we don’t want our vector to grow arbitrarily in size, we normalize it at each step. A more detailed analysis exists which formalizes this concept, but we only need to utilize it.

Under a few assumptions this sequence is guaranteed to converge. Specifically, we require that the matrix is real-valued, there exists a dominant eigenvalue, and the starting vector has a nonzero component in the direction of an eigenvector corresponding to the dominant eigenvalue. Luckily for us, existence is taken care of by the Perron-Frobenius theorem, real-valuedness by construction, and we may trivially pick an initial vector of all 1s. Here it is, in runnable Python (with numpy) rather than pseudocode:

import numpy as np

def power_method(C, epsilon=1e-10):
    v = np.ones(C.shape[0])
    while True:
        previous = v
        v = np.dot(C, v)
        v = v / np.linalg.norm(v)
        if np.linalg.norm(v - previous) < epsilon:
            return v

The only thing we have to worry about is the inefficiency in computing v = Cv, when C is a matrix that is dense, in the sense that there are very few 0’s. That means computing Cv will take \Theta(n^2) multiplications, even though there are far fewer links in the original link matrix A. For a reasonable web of 24 million pages, this is mind-bogglingly slow. So we will replace this line with its original derivation C = pB + (1-p)A, noting that Bv is the constant vector whose every entry is \frac{1}{n}\sum_i v_i:

   v = (p/n)*np.sum(v) + (1-p)*np.dot(A, v)

With the recognition that there exist special algorithms for sparse matrix multiplication, this computation is much faster.

Finally, our choice of epsilon is important, because we have yet to speak of how fast this algorithm converges. According to the analysis (linked) above, we can say a lot more than big-O type runtime complexity. The sequence actually converges geometrically, in the sense that each v_k is closer to the limit than v_{k-1} by a factor of r, 0 < r < 1, which is proven to be \frac{|\lambda_2|}{|\lambda_1|}, the ratio of the second and first dominant eigenvalues. If \lambda_2 is very close to \lambda_1, then the method will converge slowly. However, according to the research done on PageRank, the value of r usually sits at about 0.85, making the convergence rather speedy. So picking reasonably small values of epsilon (say, 10^{-10}) will not kill us.

Implementation

Of course, writing pseudocode is child’s play compared to actually implementing a real algorithm. For pedagogical purposes we chose to write PageRank in Mathematica, which has some amazing visualization features for graphs and plots, and built-in computation rules for sparse matrix multiplication. Furthermore, Mathematica represents graphs as a single object, so we can have very readable code. You can download the entire Mathematica notebook presented here from this blog’s Github page.

The code for the algorithm itself is not even fifteen lines of code (though it’s wrapped here to fit). We make heavy use of the built in functions for manipulating graph objects, and you should read the Mathematica documentation on graphs for more detailed information. But it’s pretty self-explanatory, black-box type code.

rulesToPairs[i_ -> j_] := {i,j};
SetAttributes[rulesToPairs, Listable];

PageRank[graph_, p_] := Module[{v, n, prev, edgeRules, degrees,
                                linkMatrixRules, linkMatrix},
   edgeRules = EdgeRules[graph];
   degrees = VertexOutDegree[graph];
   n = VertexCount[graph];

   (* setting up the sparse array as a list of rules *)
   linkMatrixRules =
      Table[{pt[[2]], pt[[1]]} -> 1/degrees[[pt[[1]]]],
            {pt, rulesToPairs[edgeRules]}];
   linkMatrix = SparseArray[linkMatrixRules, {n, n}];

   v = Table[1.0, {n}];
   While[True,
      prev = v;
      v = (p/n)*Total[v] + (1-p)*Dot[linkMatrix, v];
      v = v/Norm[v];
      If[Norm[v-prev] < 10^(-10), Break[]]
   ];

   Return[Round[N[v], 0.001]]
];

And now to test it, we simply provide it a graph object, which might look like

Graph[{1->2, 2->3, 3->4, 4->2, 4->1, 3->1, 2->4}]

And it spits out the appropriate answer. Now, the output of PageRank is just a list of numbers between 0 and 1, and it’s not very easy to see what’s going on as you change the parameter p. So we have some visualization code that gives very pretty pictures. In particular, we set the size of each vertex to be its page rank. Page rank values are conveniently within the appropriate range for the VertexSize option.

visualizePageRanks[G_, p_] := Module[{ranks},
   ranks = PageRank[G,p];
   Show[
     Graph[
      EdgeRules[G], 
      VertexSize -> Table[i -> ranks[[i]], {i, VertexCount[G]}],
      VertexLabels -> "Name",
      ImagePadding -> 10
     ]
   ]
];

And we have some random graphs to work with:

randomGraph[numVertices_, numEdges_] :=
  RandomGraph[{numVertices, numEdges},
              DirectedEdges -> True,
              VertexLabels -> "Name",
              ImagePadding -> 10]

Here’s the result for a random graph on 10 vertices and 30 edges, with p = 0.25.

The vertices are sized proportional to their page rank

Furthermore, using Mathematica’s neat (but slow) commands for animation, we can see what happens as we vary the parameter p between zero and one:

Pretty neat!

Surprisingly enough (given that this is our first try implementing PageRank), the algorithm scales without issue. A web of ten-thousand vertices and thirty-thousand edges takes a mere four seconds on an Atom 1.6 GHz processor with a gig of RAM. Unfortunately (and this is where Mathematica starts to show its deficiencies) the RandomGraph command doesn’t support constructions of graphs with as few as 100,000 vertices and 200,000 edges. We leave it as an exercise to the reader to test the algorithm on larger datasets (hint: construct a uniformly distributed list of random rules, then count up the out-degrees of each vertex, and modify the existing code to accept these as parameters).

To give a better idea of how the algorithm works with respect to varying parameters, we have the following two graphs. The first is a runtime plot for random graphs where the number of vertices is fixed at 100 and the number of edges varies between 100 and 1000. Interestingly enough, the algorithm seems to run quickest when there are close to twice the number of edges as there are vertices.

The graph dips around 200 edges, twice the number of vertices

Next, we investigate the effects of varying p, the egalitarianism factor, between 0 and 1 for random graphs with 100 vertices and 200 edges. Unsurprisingly, the runtime is fastest when we are completely egalitarian, and p is close to 1.

The more egalitarian we are willing to be, the faster the ranking is performed.

Google reportedly used p = 0.15, so there probably were not significant gains in performance from the tuning of that parameter alone. Further, the structure of the web is not uniform; obviously there are large link hubs and many smaller sites which have relatively few links. With a bit more research into the actual density of links in the internet, we could do much better simulations. However, this sort of testing is beyond the scope of this blog.

So there you have it! A fully-functional implementation of PageRank, which scales as well as one could hope a prototype to. Feel free to play around with the provided code (assuming you don’t mind Mathematica, which is honestly a very nice language), and comment with your findings!

Next time we’ll wrap up this series with a discussion of the real-world pitfalls of PageRank. We will likely stray away from mathematics, but a discussion of the consequences of such a high-profile ranking algorithm is necessary for completeness.

Page Rank Series
An Introduction
A First Attempt
The Final Product
Why It Doesn’t Work Anymore

Linear Algebra – A Primer

Story Time

Linear algebra was founded around the same time as Calculus (think Leibniz, circa 1700) solely for the purpose of solving general systems of linear equations. The coefficients of a system were written in a grid form, with rows corresponding to equations and columns to the unknown variables. Using a computational tool called the determinant (an awkward, but computable formula involving only the coefficients of the equations in a system), researchers were able to solve these systems, opening a world of information about the positions of celestial bodies and large-scale measurements (of geodesic arcs) on the surface of the earth.

By the 1850’s, Arthur Cayley was representing matrices as abstract objects. He defined matrix multiplication and nurtured matrix theory as its own field, recognizing a vast wealth of theoretical knowledge underlying the theory of determinants. Around the turn of the century, a formal system of vector algebra was invented which relied heavily on interpreting matrices as so-called linear transformations. Linear transformations are intuitively those maps of everyday space (\mathbb{R}^n) which preserve “linear” things. Specifically, they send lines to lines, planes to planes, etc., and they preserve the origin (one which does not preserve the origin is very similar but has a different name; see Affine Transformation). Soon enough the mathematical focus shifted to the foundations of such an algebra, and later with the advent of computers to rapid calculations in one.

Motivations

Linear algebra sits at the crossroads of many areas of mathematics. Keeping close to its roots, linear algebra is primarily a tool for computation. Unsurprisingly, a huge chunk of mathematical research has been devoted to phrasing problems in terms of matrices and their associated linear transformations. For instance, an undirected graph on n vertices can be modeled as a matrix of integer entries, with the i,j entry containing the number of edges between vertex i and vertex j. This is called the adjacency matrix of the graph. Suddenly, a wealth of information about the graph translates to simple matrix computations. For instance, we can count the paths of length m from one vertex to another as the appropriate entry of A^m. (More formally, these are walks, which are allowed to repeat edges and visited vertices.)
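The walk-counting trick can be sketched numerically. Here is a minimal example (our own, not from the post) using a 4-cycle as the graph:

```python
# Counting walks with powers of the adjacency matrix, using the 4-cycle
# on vertices 0-1-2-3 (edges 0-1, 1-2, 2-3, 3-0) as a made-up example.
import numpy as np

A = np.array([
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
])

# The (i, j) entry of A^m counts the walks of length m from i to j.
A2 = np.linalg.matrix_power(A, 2)
A3 = np.linalg.matrix_power(A, 3)
print(A2[0, 2])  # 2: the walks 0-1-2 and 0-3-2
print(A3[0, 1])  # 4: the walks 0-1-0-1, 0-1-2-1, 0-3-0-1, 0-3-2-1
```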

Even in advanced, purely theoretical mathematics, objects are commonly represented in terms of coordinates in some vector space, and are subsequently studied using all of the great things we know about linear transformations and their matrices. And so, without further ado, we will present the terminology and working concepts necessary for the content elsewhere in this blog.

Vector Spaces

The setting for all of linear algebra is in some vector space. Intuitively this is just a collection of objects, which we call vectors, with some rules on how you can combine vectors to get other vectors. This treatment wouldn’t do that idea justice without an axiomatic definition, so here it is.

Definition: A vector space is a quadruple (V, F, +, \cdot), where V is a set of vectors (points in our space), F is a scalar field (coefficients), +:V \times V \to V is a commutative, associative operation to combine vectors, and \cdot: F \times V \to V is an operation to “scale” vectors. In addition, we need the following properties to hold:

  • Addition and multiplication distribute (as we are used to with traditional algebra).
  • There must be an additive identity, which we call 0, giving 0 + v = v for all v \in V.
  • Every vector must have an additive inverse (every v has some w with v + w = 0).

This is a lot to swallow at first, but it is general for a good reason: there are tons of different kinds of vector spaces! Many of these are surprising and counter-intuitive. For our purposes, however, we may stick with the nice, small vector spaces. So here is a simplified definition that will suffice:

Definition: A vector space is a set V of vectors which are fixed-length lists of real numbers (v_1, v_2, \dots , v_n) \in \mathbb{R}^n, where addition between vectors is componentwise, we may scale vectors by any real number, and the following properties hold:

  • Addition and multiplication distribute (as above).
  • (0,0,0, \dots, 0) is the additive identity.
  • (-v_1, -v_2, \dots , -v_n) is the unique additive inverse of (v_1, v_2, \dots , v_n).

Hopefully this is much closer to what we think of as “vectors,” and with the understanding that we are viewing it as a vector space, we just call it \mathbb{R}^n. The closure of the operations gives us a nice way to characterize “any combination” of vectors in a vector space.
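As a quick sketch (with function names of our own choosing), the componentwise operations can be written directly on Python tuples:

```python
# Componentwise addition and scaling: the two operations that make
# R^n a vector space. The names add/scale are ours, for illustration.
def add(v, w):
    return tuple(a + b for a, b in zip(v, w))

def scale(c, v):
    return tuple(c * a for a in v)

v, w = (1.0, 2.0, 3.0), (4.0, 5.0, 6.0)
print(add(v, w))               # (5.0, 7.0, 9.0)
print(scale(2.0, v))           # (2.0, 4.0, 6.0)
print(add(v, scale(-1.0, v)))  # the additive inverse: (0.0, 0.0, 0.0)
```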

Definition: A linear combination of vectors in a vector space V is the vector

a_1v_1 + a_2v_2 + \dots + a_kv_k

for some positive integer k, scalars a_i, and vectors v_i.

We may speak of the span of a set of vectors as the set of all possible linear combinations of those vectors. Furthermore, we call a set of vectors linearly independent if no vector in the list is in the span of the others. For example, (1,0,0), (0,1,0), and (0,0,1) are linearly independent in \mathbb{R}^3. Specifically, (1,0,0) cannot be written as a(0,1,0) + b(0,0,1) = (0,a,b) for any scalars a,b \in \mathbb{R}, and similarly for the other two vectors.
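One way to test linear independence numerically (our choice of method, not something the post prescribes) is to check that the matrix having the vectors as rows has rank equal to the number of vectors:

```python
# Vectors are linearly independent exactly when the matrix having them
# as rows has full rank, i.e. rank equal to the number of vectors.
import numpy as np

def independent(vectors):
    return np.linalg.matrix_rank(np.array(vectors)) == len(vectors)

print(independent([(1, 0, 0), (0, 1, 0), (0, 0, 1)]))  # True
print(independent([(1, 0, 0), (0, 1, 0), (1, 1, 0)]))  # False: third = first + second
```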

As usual, we may describe subspaces of a vector space, which are just subsets of V which are themselves vector spaces with the inherited operations. The simplest examples of these are lines, planes, and hyperplanes through the origin in \mathbb{R}^n. Consequently, we may identify \mathbb{R}^n as a subspace of \mathbb{R}^m for any n \leq m.

One of the first things we want to ask about a vector space is “how big is it?” While most instances of vector spaces we will see have uncountably many elements, we can characterize “size” in terms of a different metric: the size of a basis.

Definition: A list of vectors (v_1, v_2, \dots, v_n) is a basis for V if its elements are linearly independent, and their span is V. The dimension of a vector space is the length of any basis.

For \mathbb{R}^n, and similarly all finite-dimensional vector spaces, it is easy to prove that all bases have the same length, and hence dimension is well-defined. Further, \mathbb{R}^n admits a very natural basis, often called the standard basis:

e_1 = (1,0, \dots, 0)
e_2 = (0,1, \dots, 0)
\vdots
e_n = (0,0, \dots, 1)

These are best visualized as the coordinate axes in \mathbb{R}^n, and they match our intuition as to what a basis should be, because any vector in \mathbb{R}^n can be broken down uniquely into a sum of scalar multiples of these unit coordinate vectors. Indeed, this is true of any basis (due to linear independence). Given a fixed basis for V, every vector v \in V may be uniquely written as a linear combination of basis vectors.
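Finding the unique coefficients of a vector with respect to a basis amounts to solving a linear system whose columns are the basis vectors; here is a small sketch (the basis is made up for illustration):

```python
import numpy as np

# The columns of this matrix are a (non-standard) basis for R^2:
# the vectors (1, 1) and (1, -1).
basis = np.array([[1.0, 1.0],
                  [1.0, -1.0]])
v = np.array([3.0, 1.0])

# The solution is unique because the basis vectors are independent.
coeffs = np.linalg.solve(basis, v)
print(coeffs)  # [2. 1.], i.e. v = 2*(1,1) + 1*(1,-1)
```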

Linear Transformations and their Matrix Representations

Moving quickly toward the heart of linear algebra, we may speak of linear transformations (interchangeably, linear maps) between two vector spaces:

Definition: A function f : V \to W is a linear map if it preserves the operations of addition and scalar multiplication. In other words, for all v, w \in V, c \in F, f(v+w) = f(v)+f(w) and f(cv) = cf(v).

Examples are bountiful; some geometrically inspired ones include rotations about the origin, shears, and scalings. These are functions you’d likely see in an image manipulation program like Photoshop. From this we can prove a few basic facts, like that every linear map sends 0 to 0 and additive inverses to additive inverses (try it as an exercise).

One remarkable fact that helps us characterize linear maps is that every linear map is determined completely by what it does to a basis. Since every vector x \in V is a linear combination of basis elements, say x=a_1v_1 + \dots + a_nv_n, we see that a linear map plays nicely:

f(x) = f(a_1v_1 + \dots + a_nv_n) = a_1f(v_1) + \dots + a_nf(v_n)

In other words, if we know what f does to a basis, then we know everything about f. In order to aid our computations, we write what f does to each basis vector in a tabular form. To elaborate on the vague word “does,” we need to also fix a basis of our target vector space W, say (w_1, \dots , w_m), and describe each f(v_i) in terms of this basis. We write it in tabular form, as follows:

\begin{pmatrix} | & | & \mathbf{ } & | \\ f(v_1) & f(v_2) & \dots & f(v_n) \\ | & | & \mathbf{ } & | \end{pmatrix}

The jth column corresponds to f(v_j), and the ith row corresponds to the ith coefficient in the expansion of f(v_j) in terms of the basis for W. Here the vertical bars indicate that each element is a column of scalars. We will do an extended example to make this clear.

Consider the map f on \mathbb{R}^3 defined as (x,y,z) \mapsto (y,x,2z+y). It is easy to check this map is linear, and using the standard basis we see that

f(1,0,0) = (0,1,0),
f(0,1,0) = (1,0,1), and
f(0,0,1) = (0,0,2).

or,

f(e_1) = e_2, f(e_2) = e_1 + e_3, and f(e_3) = 2e_3.

Hence, the matrix representation of f with respect to the standard basis is

A = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 2 \end{pmatrix}

Now we see that if we take a (column) vector x, and multiply it on the left by our matrix A, the resulting vector is precisely the coordinate representation of f(x) with respect to the basis for W. In fact, the rules for matrix multiplication were constructed very particularly so that this would be the case. In this way, we may arbitrarily switch between viewing f as a transformation and a vector computation. Compositions of linear maps translate to multiplication of two matrices, and matrix inversion (if it exists) is precisely function inversion.
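To double-check the worked example numerically, we can compare multiplying by A with applying f directly (a sketch using NumPy):

```python
import numpy as np

# The matrix representation of f(x, y, z) = (y, x, 2z + y) in the
# standard basis, as computed above.
A = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 1, 2]])

def f(x, y, z):
    return np.array([y, x, 2 * z + y])

v = np.array([5.0, -2.0, 3.0])  # an arbitrary test vector
print(A @ v)  # [-2.  5.  4.]
print(f(*v))  # the same: matrix-vector multiplication agrees with the map
```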

Of course, there are many different bases we could have chosen. Even though we are going from \mathbb{R}^3 \to \mathbb{R}^3, the column basis could be different from the row basis. Fortunately for our purposes, we are not going to consider what basis is appropriate to choose. All that matters is that fixing a basis, the matrix representation of a linear map is unique, and so we may interchange the notation freely. Even so, the truly interesting things about matrices are those properties which are true no matter which basis we prefer to use.

Eigenvectors and Eigenvalues

Definition: A scalar \lambda \in F is an eigenvalue for the linear map A if there exists a non-zero vector v \in V with Av = \lambda v. Any such vector v which satisfies this equation is said to be an eigenvector of A corresponding to \lambda.

Eigenvectors and eigenvalues have a huge number of applications, including facial recognition software, geology, quantum mechanics, and web search. So being able to find them quickly is of great significance to researchers and engineers. What’s interesting is that while eigenvectors depend on a choice of basis, eigenvalues do not. We prove this now:

Proposition: If A and B are different representations of the same linear map, then any eigenvalue of B is an eigenvalue of A.

Proof. It turns out that the process of “changing a basis” can be boiled down to matrix multiplication. Specifically, if A and B are two different matrix representations of the same linear map, we have the existence of some invertible matrix P such that A = PBP^{-1}, or AP = PB. As a result, if v is an eigenvector for B corresponding to the eigenvalue \lambda, then APv = PBv = P \lambda v = \lambda Pv, and so A(Pv) = \lambda(Pv). Since P is invertible and v \neq 0, the vector Pv is nonzero, and so Pv is an eigenvector for A corresponding to \lambda as well. This proves that eigenvalues are invariant with respect to a change of basis, as desired. \square
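The proposition can also be checked numerically with a made-up example: pick any B and any invertible P, form A = PBP^{-1}, and compare eigenvalues:

```python
import numpy as np

B = np.array([[2.0, 1.0],
              [0.0, 3.0]])    # upper triangular: eigenvalues 2 and 3
P = np.array([[1.0, 2.0],
              [1.0, 1.0]])    # invertible (determinant -1)
A = P @ B @ np.linalg.inv(P)  # the same linear map in a different basis

eig_A = np.sort(np.linalg.eigvals(A))
eig_B = np.sort(np.linalg.eigvals(B))
print(eig_A)  # [2. 3.] up to rounding
print(eig_B)  # [2. 3.]
```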

The point of this is that we can choose whatever basis we want to work with, and compute the eigenvalues where we’re most comfortable. For instance, if we choose a basis that gives the following diagonal representation,

A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}

then we can just eyeball that the eigenvalues are 1, 2, and 3. In fact, there are some very deep theorems in linear algebra that concern the existence and uniqueness of certain matrix representations. For a more in-depth treatment, see Axler, Linear Algebra Done Right. We will cover all the necessary information in the relevant posts, but until then, we are absolutely pooped from typing. Until next time!