Facebook Page, Google+ Community, and Whispers of Guest Posts

I recently decided to add a few new ways for readers to keep up with what’s going on at Math ∩ Programming. These come in the form of a Google+ community and a Facebook group. I also have a seasoned Twitter feed (@MathProgramming) that I check regularly and post links to.

gplus-community fbpage

I plan to regularly post updates to these pages whenever I write a new post. Everyone is welcome to join/subscribe/follow these things, and hopefully they can generate some interesting discussion above and beyond what goes into my posts.

Aside from that, I’m having some discussions with my colleagues about the potential for guest posts on various topics. These include topics in elliptic curves, financial mathematics, and game theory. In at least one of these topics there are already working programs floating around. Keep an eye out for it (I’m super excited to see how they turn out), and with luck I can recruit more of my colleagues to join the fun.

Until next time!

Teaching Mathematics – Graph Theory

Community Service

Mathematics is supposed to be a process of discovery. Definitions, propositions, and methods of proof don’t come from nowhere, although after the fact (when presented in a textbook) they often seem to. As opposed to a textbook, real maths is highly non-linear. It took mathematicians quite a lot of fuss to come up with the quadratic formula, and even simple geometric conjectures were for the longest time the subject of hot debate.

I feel like if I’m going to be a teacher worth anyone’s time, I have to let students in on the secret that questions guide mathematics. This urge to teach is especially strong at the high school level, where it is generally agreed that “mathematics education” is a farce.

And so, as the only community service I do regularly (and too seldom, at that), I go to local high schools and middle schools and give “lectures” on mathematics. Though I have ideas for a lot of lectures I could give, and wish I had more than just an hour to work with a class, I usually stick to a particularly intuitive lecture on graph theory. I will reproduce one such lecture here, picking out the best of the student’s innovation that I can remember. Regular text paraphrases what I speak and what is written on the board, quoted text is student response, and square brackets [ ] contain commentary.

As a note to the reader, this will serve as a very detailed introduction to Graph Theory, as opposed to the terse primers I’ve been providing thus far.

Two Puzzles

Today we are going to do three things:

  1. Think about some puzzles,
  2. Do some mathematics, and
  3. Use math to change the world.

So here are the two puzzles:

First, [after asking a student to provide her name, I invent a city name based on it] imagine you’re the mayor of Erintown. In Erintown there are seven very old and beautiful bridges, and as mayor you’d like to promote their prominence in tourism. To do this, you wish to provide a route through the city which crosses every bridge exactly once, never visiting the same bridge twice. The seven bridges are arranged as follows [a much more detailed picture than what one draws on a white board]:

Bridges of Erintown

[The informed reader will recognize this immediately as the Seven Bridges of Königsberg problem, which historically founded graph theory, and was solved by Leonard Euler in the 18th century. But honestly, what (potentially immature) high school student is interested in a problem with a name like that? As we will see throughout the post, personalization (and the engagement inherent in it) is essential to the success of the lecture.]

Unfortunately, after a few tries you are unable to find a route which works. Hence the first puzzle is: does such a route exist? If not, how can we prove it?

[At this point, we clarify some rules of the puzzle. High school students are adept at producing loopholes, and rightfully they enjoy doing so. So typically we talk about swimming, aircraft, traversing bridges halfway, teleportation, etc., banishing each possibility as it comes up. This is an important step, because in part the whole point of the mathematical formulation of this problem is to eliminate these possibilities from consideration. We very much need to rephrase this problem entirely in our minds to extract the aspects we care about and discard the rest. Even when in real life swimming is an option, our mathematical formulation must ignore swimming, and hence we must design it appropriately (and hopefully elegantly).]

For the second problem, say you’re at a party of one hundred people. At this party, someone decides to start tallying up who at the party is friends with whomelse (he’s one of those guys, a drama king). He shows his list to you, and you notice that there are two people at the party who have the same number of friends at the party. The thought occurs to you that this will always be the case, no matter how many people attend and who is friends with whom.

So the second puzzle is: at a party of $ n$ people, must it be true that there exist two people with the same number of friends at the party?

[Again, we have the requisite loopholes, like whether there are stalkers at the party, and whether you are friends with yourself. The former drives us to distinguish that we want “symmetric” friendships, i.e. if you are friends with someone then they are friends with you. The former translates to undirected edges later, while the latter hints at simple graphs. Both are usually made clear by appealing to the rules of Facebook friendship. Finally, we might make the clarification that there are at least two people at the party, in order to prevent a discussion of vacuously true statements.]

Now take five minutes and try to solve these problems, by yourself, with a friend, or with a group, however you feel most comfortable tackling a problem. [They never get very far, but at least once I’ve encountered a student who knew of the Seven Bridges problem ahead of time, spoiling much of the fun and thoroughly confusing the rest of the class.]

New Mathematics

[After five to ten minutes pass and the group is quiet again] So, who thinks they’ve solved the first problem? [hands raise, most proclaiming impossibility; those who try to explain their reasoning mostly resign to awkward case-checking or saying they just couldn’t find one that worked] And what about the second? [nobody raises a hand, most enjoy thinking about the bridges problem because it is very visual. In classrooms blessed with a SmartBoard, I can have a number of them come up to the front and attempt to draw a route with their finger (and hitting undo when they invariably fail, so that I don’t have to redraw the diagram every time).]

So, it appears that we haven’t come up with a good solution for either problem. Now a mathematician might say at this point, “screw this, I’m going to make up my own math to solve it!” And that’s what we’re going to try to do.

The first step is to compare the problems: what is similar and what is different? [Discussion ensues, but often times the students don’t understand what I’m looking for, and usually the problem is that they’re trying to come up with the “correct” answer instead of making observations; it is a curse of the schooling paradigm. Additional leading questions include:] what are the subjects of our study? How are they related? does it matter where you walk on a landmass between visiting bridges? Does it matter where the people in the party are standing? [And the most important question] Is there a better way to draw these problems?

[Soon enough students make the right observations, that our drawings of the two problems are almost identical!] It doesn’t actually matter how big or where the landmasses are, since all we care about is the order in which we cross bridges. Hence we can compress the landmasses down to dots! Additionally, we can just draw people as dots, and arrange them in any way we wish. Then, the bridges and friendships become lines connecting the dots. This yields a much nicer picture for the bridges problem, and a similar one for the party problem.

Our new form of the seven bridges problem.

By writing the problem this way, we have distilled out the relevant facts: all we really care about is the structure of how these things are connected. Unfortunately we have one problem: we don’t have names for these things! We certainly don’t want to call them bridges and landmasses, or people and friendships, because we want the picture to apply to both problems at the same time.

So to start, what would we call the picture as a whole? Appeal to your imagination about what it looks like. [Though this part is sometimes difficult, especially at the middle-school level, eventually someone calls out something truly clever] “How about a constellation?”

I like that! So here we are, this is our invention:

Constellation Theory

What are we going to call the individual dots? “Stars!” And what about the lines connecting them? “How about…connections?” Okay. So here is our first definition:

Definition: A constellation has three parts:

  1. A set of stars $ S$ [we just accept the intuitive definition of a set without issue],
  2. A set of connections $ C$,
  3. A function $ f: C \to S \times S$ which accepts a connection and tells us which two stars it is connected to.

[Before the third, I ask the class whether the first two alone are enough. If I get nods, I draw a random collection of dots and lines, with the lines not at all connected to the dots, and they see we need some statement of incidence.]

Don’t be afraid of the third part (even if you don’t know what a function is), it’s just a formality that uses other (well-established) maths to make our definition consistent. Math can sometimes be a notational nightmare, but all this means is that we can take any connection and easily say which two stars it connects. Since we will always draw constellations as a picture, we can just use the picture as our “function.”

Now can someone remind me again what we were looking for in the bridges problem? Right, a route through the city that hits every bridge exactly once. First we need to translate a “route” into our language of constellations. Does anyone have a good name? [After a few generic suggestions like “trail,” “path,” and “route,” we settle on the imaginative “waltz”.] This gives our second definition:

Definition: A waltz through a constellation $ (S,C,f)$ is a list of alternating stars and connections, which we label $ (s_1, c_1, s_2, c_2, \dots, s_{n-1}, c_{n-1}, s_n)$, where the $ i$th connection $ c_i$ is connected to its neighboring stars in the list $ s_i, s_{i+1}$. In terms of our function, $ f(c_i) = (s_i,s_{i+1})$ for each $ i=1 \dots n-1$.

This is just a way to write out on paper what the waltz is. [I label the seven bridges picture and provide an example.] You who suggested the name “waltz,” what is your name? “Phil.” What is your last name, Phil? “Osman.” Great! Now we have another definition:

Definition: An Osmannian waltz through a constellation is a waltz which uses each connection in $ C$ exactly once.

[A few giggles resound when they realize I’m incorporating the student’s name into the definition.] Now can somebody rephrase the original problem in terms of constellation theory? “We want to find out if there is an Osmannian waltz in that particular constellation.” Excellent!

Now let’s turn our attention to the party problem. Can someone remind me what we were trying to find out about parties? “Whether there are two people who have the same number of friends.” Right, whether that has to be the case for any party. Now in terms of constellations, what is that? “The number of connections at each star.” Great. What’s your name? “Olivia.” Olivia, what’s your last name? “Bisel.” Okay. Here’s another definition:

Definition: The Bisel-degree of a star is the number of connections in $ C$ connected to it.

Now there are a couple of other details we have to consider. Specifically, in a general constellation we never ruled out multiple connections connecting the same two stars. And we never said a connection can’t go from a star to itself. But we must rule these out to make a sensible party problem. So we will call a constellation which rules out doubled connections and self-connections simple. [For the sake of time, we just provide a name, and it’s not that imaginative of a property anyway.]

So can someone translate the party problem into the language of simple constellations? “It’s whether every simple constellation has to have two stars with equal Bisel-degree.” Wonderful!

Now that we have a working language, let’s take another ten minutes to try to solve the problems. But this time, you aren’t allowed to use “bridges,” “landmasses,” “people,” or “friendships” anymore, you have to use the terms we invented. [The students still don’t get far, especially on the bridges problem, but every now and then a student solves the party problem. As they work, I give subtle hints, like, “what happens if you add extra connections or remove some? Does it work then? What aspect of the intrinsic structure have you altered by doing this? Try lots of examples!”]

[After bringing the class back together] So who thinks they’ve solved the first problem? [a few hands raise] “I think it has something to do with whether the Bisel-degree is even or odd.” Interesting. Did you get much further than that? “No…” Okay. What about the party problem? “I think I have it. So if everybody had a different number of friends, then one person would have to have no friends and someone would have to be everyone’s friend, but that can’t happen.”

Did everyone hear that? [I reiterate on the board in detail, explaining the idea behind proof by contradiction, and drawing a picture of the resulting constellation.] This is a very elegant proof. And if anyone can come up with a solid proof of the bridge problem, I have no doubt your teacher would give ample extra credit.

Changing the World

Now, for the mathematician this is enough. This new mathematical object, a constellation, is full of wonderful patterns that we could spend our entire lives thinking about (and many have done just this). However, it’s probably true that most of you aren’t going to become mathematicians. So let’s try to think of things in the real world that we can model as constellations. Any ideas?

[The students suggest a variety of different (and usually complex) ideas, including trade between nation-states, distributions of power among people, and the structure of galaxies. For each example, I usually have to verbally augment our representation of a graph (for the sake of time), bringing constellations with numerically labeled edges or directed connections. Since I always tell the students to “think bigger,” they inevitably say “galaxies,” and I have to explain why that doesn’t work, because the whole point of constellations (the mathematical ones) is that the relative sizes and positions of the stars don’t matter, whereas at the galactic level that completely determines the connection (which is invariably gravitational pull). We apologize for the terminological confusion.

[Eventually, we might get to the cases of modelling all roads and intersections, after which I claim that is exactly how Google Maps (and all other mapping/directions software) works. Sometimes they take the “friendship” hint and recognize Facebook as a constellation, and we often begin to talk about friend suggestions and degrees of separation. Finally, (and this is the main example I wish to work toward), we model the entire internet as a constellation, with directed connections corresponding to links between web sites. Then I talk about how Google based their company on the soundness of this particular model, making 25 billion dollars and changing the world. We do not discuss software representations of constellations, nor algorithms to extract data from them (this would be a whole course worth of information, at least).

[Depending on the amount of remaining time, I either provide the proof of the party problem, if the students didn’t solve it on their own, or continue with anecdotes about Google’s PageRank and its pitfalls. I don’t usually give the proof of the seven bridges problem, but if pressed a short sketch of the proof is easy. More often than not, the bell ends my lecture before I’m ready anyway.]

[I should emphasize that the proof of the party problem is the most exciting moment of the entire lecture. There is often a few students that have an audible “whoa!” and I’ve even received a standing ovation. This tells me that basic elegant proofs are easily within reach of a freshman high school student, and even the obviously “popular” girls have admitted to me it’s cool.]

Reflections

This lecture has generally been successful among students for three obvious reasons.

First, it is exploratory. People intrinsically like puzzles. In sharp contrast to the typical high-school style of memorization and repetition, the students drive the method! Of course, they do so in a discordant, chaotic way, but this can just as easily be said of the same students English essays or history papers. They simply have less practice with this particular kind of argument, and so they are expectantly less coherent.

Unfortunately the shortage of time forces me to guide them much more directly than I should. The amount of content we cover in an hour lecture really deserves a week of discussion, formulation and reformulation, and debate, with as little intervention as possible. I would absolutely love a chance to work with them for an extended period of time to see how it plays out. And of course, it would be much less linear, and we’d explore the questions of the students interest. However, I do fear that they might prefer a more rigid structure, being used to the humdrum of their education heretofore.

Second, the lecture is personal. I don’t have them make up names for nothing. Mathematics is a generative subject. The students not only need to see that, but also experience the process of taking an intuitive idea and nailing it down (often with more logical rigor than anything they’ve experienced in math to date). Inventing a name for the resulting definition secures the idea in their memory, and accentuates the notion that this thing is unique to their special classroom community. Since they don’t yet have the practice or motivation to make their own proofs (the ultimate mathematical self-expression), this is the next best thing. But most of all, naming the concepts is fun!

So I frame the problems in their imaginations, not history. I give them no false pretense for why we are doing this. The puzzle is a means to its own end, and we only later discover that our work is applicable. For a large portion of mathematical history, one might argue, this is how progress worked. Certainly Leonhard Euler anticipated neither Facebook’s social graph nor Google Maps in the mid 18th century. This is the easiest way to impress upon them that many different things have similar patterns in their structure, so even studying a trivial thing can be very enlightening. The puzzles really are worth doing for their own sake.

Compare this to being given the definitions and propositions in the established mathematical language. To an untrained, uninterested student, this is not only confusing, but boring beyond belief! They don’t have the prerequisite intuition for why the definition is needed, and so they are left mindlessly following along at best, and dozing off at worst.

Third, the lecture is a conversation. While ultimately I have to dish out judgement on their suggestions (this name doesn’t make sense, that idea doesn’t pan out for this reason, etc.), I make an honest effort to explain why and reiterate our goals, showing the discrepancy, and then requesting another suggestion. Unfortunately that keeps it as a bona-fide lecture, but if I had a week, the students would ideally critique each other’s work.

At the same time, I do (to some bounded level) entertain their admittedly immature suggestions. When they keep insisting swimming or long jumping across the river, I usually quip with something like, “Pretend the tourist is your grandmother. Would she swim that far?” If not to just deny their question, this reminds them that the original puzzle was meant for all tourists, including the “weakest link” as far as swimming goes. Even when they riposte with “Yes. My grandmother is a body-builder,” I dismiss it with a smirk and a wave of the hand, perhaps sarcastically saying, “Okay.” I feel that such a level of humorous improvisation is necessary, both to keep the students on their toes and to mirror their creativity with my own, thus fueling it and directing it toward the mathematics.

Of course, I might not always spend so much time with such verbal fencing. But for a first exposure to real mathematics, and to establish my role in part as an equal but more so as an obstacle, I deem it necessary. I need to be the out-witter, so that when they exhaust their loopholes, they have no choice but to beat me by solving the problem. Given a lengthier period of time (taking into consideration the students’ maturity level), I would gradually transition to a more pointed focus on the problems, and make it clear when silliness is appropriate. With luck and planning, interest in the problems would compensate for a perhaps dull state of order.

If I Had a Class

Sometimes I entertain the thought that I might end up teaching high school, and that with the providence of the school’s administration I could have my own elective course called “Real Math,” or something perhaps more enticing to the skeptical student (“Math Soup for the Teenage Soul”? “Math as Art”? “Mathematical Composition”?).

This course would start with a week of lectures similar to the one detailed above, and then alternate each week between some objective curricula (likely the basics of set theory and methods of proof) and an exploratory topic. The latter might likely start off as more explorations into graph theory (I’d debate whether to replace their invented names with the established language) and then continue into other basic topics. During the exploratory week, students would present and critique arguments in front of the class. The problems come from a list of problems given at the beginning of the week or the students’ own minds. And though I’d prefer most of the problems being the students’ own, it’s likely that initially most of the problems would come from me. Partial solutions, interesting observations, and even the process of an incorrect solution would all be presentation-worthy.

And finally, perhaps the only original idea I would have for this course, the students would each keep a journal. It would double as a notebook for their own investigation of the material brought up during exploratory weeks and a portfolio to turn in for grading. Its grading would be largely subjective, but the students would have to display some level of effort in terms of the depth to which they explore a particular problem and the number of problems attempted. As the year would progress, I would get to know the students’ ability levels and work tendencies much more clearly, and would thus have a more refined and personalized grading method.

Of course, not long after building up these ideas in my imagination, I came across the essay A Mathematician’s Lament, in which Paul Lockhart mercilessly (and rightfully) berates the current state of mathematics education in America. He more or less advocates the kind of teaching style I propose, and then argues that today’s mathematics teachers cannot play such a role for a lack of their own love for mathematics, and would not want to, because teaching this way is extremely hard! It requires much more work than the average teacher is paid to do, and there is no time for it when you’re berated by scripted curricula and standards and have six classes a day.

After repeating the lecture above for five classes of students in the course of a single day, I certainly agree with Lockhart on the difficulty of this teaching method. Though I feel I have a natural knack for presentation and engagement, to handle the standard number of students per day expected of American teachers is quite tiring (and I am a lively and strapping young lad!). It’s clear that if mathematics education is going to be fixed, one big part will be in taking teachers out of the classroom for preparation, training, and reflection. That being said, I consider it my duty to take every opportunity to do a lecture, while it provides me both with intellectual joy and the satisfaction of beneficence.

Google’s Page Rank – Why it Doesn’t Work Anymore

A Bully By Any Other Name

From the New York Times:

“Shopping online in late July, Clarabelle Rodriguez typed the name of her favorite eyeglass brand into Google’s search bar.

In moments, she found the perfect frames — made by a French company called Lafont — on a Web site that looked snazzy and stood at the top of the search results. Not the tippy-top, where the paid ads are found, but under those, on Google’s version of the gold-medal podium, where the most relevant and popular site is displayed.

Ms. Rodriguez placed an order for both the Lafonts and a set of doctor-prescribed Ciba Vision contact lenses on that site, DecorMyEyes.com. The total cost was $361.97.

It was the start of what Ms. Rodriguez would later describe as one of the most maddening and miserable experiences of her life…” [continue reading]

For those without the patience to read the seven page article, the owner of DecorMyEyes.com, Vitaly Borker, deliberately mischarged his clients, and then bullied any who complained. Constantly dishing out vulgar threats, Borker committed wire-fraud, impersonation, and stalking as part of his business strategy.

Of course, every angry customer went directly to online web forums and business review companies to complain. Unbeknownst to them, this was Borker’s hope. Every review posted about DecorMyEyes.com, no matter how negative, was another backlink to boost the site’s page rank. After enough negative press, sunglasses product pages from DecorMyEyes.com soon showed up even higher on search results than the websites of their designers!

Shortly after the New York Times’s article stirred up a serious conversation about DecorMyEyes’s business practice, the police struck up an investigation, and Borker was promptly arrested.

The Honest Algorithm

This story illustrates a compelling point. Once Google released the information behind their ranking algorithm (of course, it happened before Google was even a company), people could take advantage of it! A staggeringly large number of services exist to boost your page rank (129 million results on Google search). And so the intended way to get a high page rank (others like your content and link to it) is superseded by some method of manufacturing backlinks.

Unfortunately for the rest of us, PageRank seems to be an honest algorithm. It only judges pages appropriately when the pages aren’t competing to be judged. Once Google became a popular option for search, rankings could make or break a start-up tech company. It was only the natural response to exploit PageRank and boost business, but of course this undermines the assumptions of the algorithm.

It is certainly obvious at this point that while PageRank may be a critical component to Google’s overall ranking algorithm, it is certainly not the only factor Google considers. Its plausible that Google has very many alternative ranking criteria which trump PageRank. Undoubtedly, Google was forced to come up with these criteria specifically to combat sites like DecorMyEyes.com, and identify manufactured links.

And so this opens the floor for discussion: what alternative ranking systems would you consider? Can you think of easy ways to identify these maliciously manufactured links? This seems in general to be a hard problem, unless the pages providing the additional links are blatantly obvious.

Other than a potential discussion in the comments, that wraps up the series on PageRank. We hope the readers have enjoyed it!

Page Rank Series
An Introduction
A First Attempt
The Final Product
Why It Doesn’t Work Anymore

Google’s Page Rank – The Final Product

Dangling Nodes and Non-Uniqueness

Recall where we left off last time. Given a web $ W$ with no dangling nodes, the link matrix for $ W$ has 1 as an eigenvalue, and if the corresponding eigenspace has dimension 1, then any associated eigenvector gives a ranking of the pages in $ W$ which is consistent with our goals.

The first problem is that if there is a dangling node, our link matrix has a column of all zeros, and is no longer column-stochastic. In fact, such a non-negative matrix which has columns summing to 1 or columns of all zeros is called column-substochastic. We cannot guarantee that 1 is an eigenvalue, with the obvious counterexample being the zero matrix.

Second, as we saw last time, webs which have disconnected subwebs admit eigenspaces of large dimension, and hence our derived rankings are not unique.

We will fix both of these problems in one fell swoop: by adjusting the link matrix to make all entries positive.

The motivation for this comes from the knowledge of a particular theorem, called the Perron-Frobenius Theorem. While the general statement says a great deal, here are the parts we need:

Theorem: Let $ A$ be a positive, column-stochastic matrix. Then there exists a maximal eigenvalue $ 0 < \lambda \leq 1$ such that all other eigenvalues are strictly smaller in magnitude. Further, the eigenspace associated to $ \lambda$ has dimension 1. This unique eigenvalue and eigenvector (up to scaling) are called the Perron eigenvalue and eigenvector.

We won’t prove this theorem, because it requires a lot of additional background. But we will use it to make life great. All we need is a positive attitude.

A Drunkard’s Surf

Any tinkering with the link matrix must be done in a sensible way. Unfortunately, we can’t “add” new links to our web without destroying its original meaning. So we need to view the resulting link matrix in a different light. Enter probability theory.

So say you’re drunk. You’re surfing the web and at every page you click a random link. Suppose further that every page is just a list of links, so it’s not harder to find some than others, and you’re equally likely to pick any link on the site. If you continue surfing for a long time, you’d expect to see pages with lots of backlinks more often than those with few backlinks. As you sober up, you might realize that this is a great way to characterize how important a webpage is! You quickly write down the probabilities for an example web into a matrix, with each $ i,j$ entry being the probability that you click on a link from page $ j$ to go to page $ i$.

This is a bit of drunken mathematical gold! We’ve constructed precisely the same link matrix for a web, but found from a different perspective. Unfortunately, after more surfing you end up on a page that has no links. You cannot proceed, so you randomly type in a URL into the address bar, and continue on your merry way. Soon enough, you realize that you aren’t seeing the same webpages as you did in the first walk. This random URL must have taken you to a different connected component of the web. Brilliant! Herein we have the solution to both of our problems: add in some factor of random jumping.

To do this, and yet maintain column-stochasticity, we need to proportionally scale the elements of our matrix. Let $ A$ be our original link matrix, and $ B$ be the $ n \times n$ matrix with all entries $ 1/n$. Then form a new matrix:

$ C = pB + (1-p)A, 0 \leq p \leq 1$

In words, $ C$ has a factor of egalitarianism proportional to $ p$. All we need to verify is that $ C$ is still column-stochastic, and this is clear since each column sum looks like the following for a fixed $ j$ (and $ a_{i,j}$ denotes the corresponding entry of $ A$):

$ \sum \limits_{i=1}^n(\frac{p}{n} + (1-p)a_{i,j}) = p\sum \limits_{i=1}^n \frac{1}{n} + (1-p)\sum \limits_{i=1}^na_{i,j} = p + (1-p) = 1$

So, applying the Perron-Frobenius theorem to our new matrix (where the value $ p$ becomes a parameter in need of tuning), there is a unique largest positive eigenvalue $ \lambda \leq 1$ for this web with an eigenspace of dimension 1. Picking any eigenvector within that eigenspace and then normalizing it to a unit vector with positive entries, we have our ranking algorithm!

Aside: note that the assumption that the eigenvalue has all positive entries is not unfounded. This follows as a result of our new matrix $ C$ being irreducible. The details of this implication are unnecessary, as the Perron-Frobenius theorem provides the positivity of the Perron eigenvector. Furthermore, all other eigenvectors (corresponding to any eigenvalues) must have an entry which is either negative or non-real. Hence, we only have one available eigenvector to choose for any valid ranking.

Computing the Beast

A link matrix for a web of any reasonable size is massive. As of June 20th, 2011, Google has an indexed database of close to 46 billion web pages, and in its infancy at Stanford, PageRank was tested on a web of merely 24 million pages. Clearly, one big matrix cannot fit in the memory of a single machine. But before we get to the details of optimizing for scale, let us compute page ranks for webs of modest sizes.

The problem of computing eigenvectors was first studied around the beginning of the 1900s. The first published method was called the power method, and it involves approximating the limit of the sequence

$ \displaystyle v_{n+1} = \frac{Cv_n}{||Cv_n||}$

for some arbitrary initial starting vector $ v_0$. Intuitively, when we apply $ C$ to $ v$, it “pulls” $ v$ toward each of its eigenvectors proportionally to their associated eigenvalues. In other words, the largest eigenvalue dominates. So an element of this sequence which has a high index will have been pulled closer and closer to an eigenvector corresponding to the largest eigenvalue (in absolute value), and hence approximate that eigenvector. Since we don’t want our vector to grow arbitrarily in size, we normalize it at each step. A more detailed analysis exists which formalizes this concept, but we only need to utilize it.

Under a few assumptions this sequence is guaranteed to converge. Specifically, we require that the matrix is real-valued, there exists a dominant eigenvalue, and the starting vector has a nonzero component in the direction of an eigenvector corresponding to the dominant eigenvalue. Luckily for us, existence is taken care of by the Perron-Frobenius theorem, real-valuedness by construction, and we may trivially pick an initial vector of all 1s. Here is the pseudocode:

C <- link matrix
v <- (1,1,...,1)

while true:
   previous = v
   v = dot(C, v)
   v /= norm(v)
   break if norm(v-previous) < epsilon

return v

The only thing we have to worry about is the inefficiency in computing $ v = Cv$, when $ C$ is a matrix that is dense, in the sense that there are very few 0’s. That means computing $ Cv$ will take $ \Theta(n^2)$ multiplications, even though there are far fewer links in the original link matrix $ A$. For a reasonable web of 24 million pages, this is mind-bogglingly slow. So we will replace this line with its original derivation:

   v = (p/n)*v + (1-p)*dot(A,v)

With the recognition that there exist special algorithms for sparse matrix multiplication, this computation is much faster.

Finally, our choice of epsilon is important, because we have yet to speak of how fast this algorithm converges. According to the analysis (linked) above, we can say a lot more than big-O type runtime complexity. The sequence actually converges geometrically, in the sense that each $ v_k$ is closer to the limit than $ v_{k-1}$ by a factor of $ r, 0 < r < 1$, which is proven to be $ \frac{|\lambda_2|}{|\lambda_1|}$, the ratio of the second and first dominant eigenvalues. If $ \lambda_2$ is very close to $ \lambda_1$, then the method will converge slowly. However, according to the research done on PageRank, the value of $ r$ usually sits at about 0.85, making the convergence rather speedy. So picking reasonably small values of epsilon (say, $ 10^{-10}$) will not kill us.

Implementation

Of course, writing pseudocode is child’s play compared to actually implementing a real algorithm. For pedantic purposes we chose to write PageRank in Mathematica, which has some amazing visualization features for graphs and plots, and built-in computation rules for sparse matrix multiplication. Furthermore, Mathematica represents graphs as a single object, so we can have very readable code. You can download the entire Mathematica notebook presented here from this blog’s Github page.

The code for the algorithm itself is not even fifteen lines of code (though it’s wrapped here to fit). We make heavy use of the built in functions for manipulating graph objects, and you should read the Mathematica documentation on graphs for more detailed information. But it’s pretty self-explanatory, black-box type code.

rulesToPairs[i_ -> j_] := {i,j};
SetAttributes[rulesToPairs, Listable];

PageRank[graph_, p_] := Module[{v, n, prev, edgeRules, degrees,
                                linkMatrixRules, linkMatrix},
   edgeRules = EdgeRules[graph];
   degrees = VertexCount[graph];
   n = VertexCount[graph];

   (* setting up the sparse array as a list of rules *)
   linkMatrixRules =
      Table[{pt[[2]],pt[[1]]} -> 1/degrees[[pt[[1]]]]],
            {pt, rulesToPairs[edgeRules]}];
   linkMatrix = SparseArray[linkMatrixRules, {n, n}];

   v = Table[1.0, {n}];
   While[True,
      prev = v;
      v = (p/n) + (1-p)Dot[linkMatrix, v];
      v = v/Norm[v];
      If[Norm[v-prev] < 10^(-10), Break[]]
   ];

   Return[Round[N[v], 0.001]]
];

And now to test it, we simply provide it a graph object, which might look like

Graph[{1->2, 2->3, 3->4, 4->2, 4->1, 3->1, 2->4}]

And it spits out the appropriate answer. Now, the output of PageRank is just a list of numbers between 0 and 1, and it’s not very easy to see what’s going on as you change the parameter $ p$. So we have some visualization code that gives very pretty pictures. In particular, we set the size of each vertex to be its page rank. Page rank values are conveniently within the appropriate range for the VertexSize option.

visualizePageRanks[G_, p_] := Module[{ranks},
   ranks = PageRank[G,p];
   Show[
     Graph[
      EdgeRules[G], 
      VertexSize -> Table[i -> ranks[[i]], {i, VertexCount[G]}],
      VertexLabels -> "Name",
      ImagePadding -> 10
     ]
   ]
];

And we have some random graphs to work with:

randomGraph[numVertices_, numEdges_] :=
  RandomGraph[{numVertices, numEdges},
              DirectedEdges -> True,
              VertexLabels -> "Name",
              ImagePadding -> 10]

Here’s the result for a random graph on 10 vertices and 30 edges, with $ p = 0.25$.

The vertices are sized proportional to their page rank

Furthermore, using Mathematica’s neat (but slow) commands for animation, we can see what happens as we vary the parameter $ p$ between zero and one:

Pretty neat!

Surprisingly enough (given that this is our first try implementing PageRank), the algorithm scales without issue. A web of ten-thousand vertices and thirty-thousand edges takes a mere four seconds on an Atom 1.6 GHz processor with a gig of RAM. Unfortunately (and this is where Mathematica starts to show its deficiencies) the RandomGraph command doesn’t support constructions of graphs with as few as 100,000 vertices and 200,000 edges. We leave it as an exercise to the reader to test the algorithm on larger datasets (hint: construct a uniformly distributed list of random rules, then count up the out-degrees of each vertex, and modify the existing code to accept these as parameters).

To give a better idea of how the algorithm works with respect to varying parameters, we have the following two graphs. The first is a runtime plot for random graphs where the number of vertices is fixed at 100 and the number of edges varies between 100 and 1000. Interestingly enough, the algorithm seems to run quickest when there are close to twice the number of edges as there are vertices.

The graph dips around 200 edges, twice the number of vertices

Next, we investigate the effects of varying $ p$, the egalitarianism factor, between 0 and 1 for random graphs with 100 vertices and 200 edges. Unsurprisingly, the runtime is fastest when we are completely egalitarian, and $ p$ is close to 1.

The more egalitarian we are willing to be, the faster the ranking is performed.

Google reportedly used $ p = 0.15$, so there probably were not significant gains in performance from the tuning of that parameter alone. Further, the structure of the web is not uniform; obviously there are large link hubs and many smaller sites which have relatively few links. With a bit more research into the actual density of links in the internet, we could do much better simulations. However, this sort of testing is beyond the scope of this blog.

So there you have it! A fully-functional implementation of PageRank, which scales as well as one could hope a prototype to. Feel free to play around with the provided code (assuming you don’t mind Mathematica, which is honestly a very nice language), and comment with your findings!

Next time we’ll wrap up this series with a discussion of the real-world pitfalls of PageRank. We will likely stray away from mathematics, but the consequences of such a high-profile ranking algorithm is necessary for completeness.

Page Rank Series
An Introduction
A First Attempt
The Final Product
Why It Doesn’t Work Anymore