The Erdős-Rényi Random Graph

During the 1950’s the famous mathematician Paul Erdős and Alfred Rényi put forth the concept of a random graph and in the subsequent years of study transformed the world of combinatorics. The random graph is the perfect example of a good mathematical definition: it’s simple, has surprisingly intricate structure, and yields many applications.

In this post we’ll explore basic facts about random graphs, slowly detail a proof on their applications to graph theory, and explore their more interesting properties computationally (a prelude to proofs about their structure). We assume the reader is familiar with the definition of a graph, which we’ve written about at length for non-mathematical audiences, and has some familiarity with undergraduate-level probability and combinatorics for the more mathematical sections of the post. We’ll do our best to remind the reader of these prerequisites as we go, and we welcome any clarification questions in the comment section.

The Erdős-Rényi Model

The definition of a random graph is simple enough that we need not defer it to the technical section of the article.

Definition: Given a positive integer n and a probability value 0 \leq p \leq 1, define the graph G(n,p) to be the undirected graph on n vertices whose edges are chosen as follows. For all pairs of vertices v,w there is an edge (v,w) with probability p.

Of course, there is no single random graph. What we’ve described here is a process for constructing a graph. We create a set of n vertices, and for each possible pair of vertices we flip a coin (often a biased coin) to determine if we should add an edge connecting them. Indeed, every graph can be made by this process if one is sufficiently lucky (or unlucky), but it’s very unlikely that we will have no edges at all if p is large enough. So G(n,p) is really a probability distribution over the set of all possible graphs on n vertices. We commit a horrendous abuse of notation by saying G or G(n,p) is a random graph instead of saying that G is sampled from the distribution. The reader will get used to it in time.

Why Do We Care?

Random graphs of all sorts (not just Erdős’s model) find applications in two very different worlds. The first is pure combinatorics, and the second is in the analysis of networks of all kinds.

In combinatorics we often wonder if graphs exist with certain properties. For instance, in graph theory we have the notion of graph colorability: can we color the vertices of a graph with k colors so that none of its edges are monochromatic? (See this blog’s primer on graph coloring for more) Indeed, coloring is known to be a very difficult problem on general graphs. The problem of determining whether a graph can be colored with a fixed number of colors has no known efficient algorithm; it is NP-complete. Even worse, much of our intuition about graphs fails for graph coloring. We would expect that sparse-looking graphs can be colored with fewer colors than dense graphs. One naive way to measure sparsity of a graph is to measure the length of its shortest cycle (recall that a cycle is a path which starts and ends at the same vertex). This measurement is called the girth of a graph. But Paul Erdős proved using random graphs, as we will momentarily, that for any choice of integers g,k there are graphs of girth \geq g which cannot be colored with fewer than k colors. Preventing short cycles in graphs doesn’t make coloring easier.

The role that random graphs play in this picture is to give us ways to ensure the existence of graphs with certain properties, even if we don’t know how to construct an example of such a graph. Indeed, for every theorem proved using random graphs, there is a theorem (or open problem) concerning how to algorithmically construct those graphs which are known to exist.

Pure combinatorics may not seem very useful to the real world, but models of random graphs (even those beyond the Erdős-Rényi model) are quite relevant. Here is a simple example. One can take a Facebook user u and form a graph of that users network of immediate friends N(u) (excluding u itself), where vertices are people and two people are connected by an edge if they are mutual friends; call this the user’s friendship neighborhood. It turns out that the characteristics of the average Facebook user’s friendship neighborhood resembles a random graph. So understanding random graphs helps us understand the structure of small networks of friends. If we’re particularly insightful, we can do quite useful things like identify anomalies, such as duplicitous accounts, which deviate quite far from the expected model. They can also help discover trends or identify characteristics that can allow for more accurate ad targeting. For more details on how such an idea is translated into mathematics and code, see Modularity (we plan to talk about modularity on this blog in the near future; lots of great linear algebra there!).

Random graphs, when they exhibit observed phenomena, have important philosophical consequences. From a bird’s-eye view, there are two camps of scientists. The first are those who care about leveraging empirically observed phenomena to solve problems. Many statisticians fit into this realm: they do wonderful magic with data fit to certain distributions, but they often don’t know and don’t care whether the data they use truly has their assumed properties. The other camp is those who want to discover generative models for the data with theoretical principles. This is more like theoretical physics, where we invent an arguably computational notion of gravity whose consequences explain our observations.

For applied purposes, the Erdős-Rényi random graph model is in the second camp. In particular, if something fits in the Erdős-Rényi model, then it’s both highly structured (as we will make clear in the sequel) and “essentially random.” Bringing back the example of Facebook, this says that most people in the average user’s immediate friendship neighborhood are essentially the same and essentially random in their friendships among the friends of u. This is not quite correct, but it’s close enough to motivate our investigation of random graph models. Indeed, even Paul Erdős in his landmark paper mentioned that equiprobability among all vertices in a graph is unrealistic. See this survey for a more thorough (and advanced!) overview, and we promise to cover models which better represent social phenomena in the future.

So lets go ahead and look at a technical proof using random graphs from combinatorics, and then write some programs to generate random graphs.

Girth and Chromatic Number, and Counting Triangles

As a fair warning, this proof has a lot of moving parts. Skip to the next section if you’re eager to see some programs.

Say we have a k and a g, and we wonder whether a graph can exist which simultaneously has no cycles of length less than g (the girth) and needs at least k colors to color. The following theorem settles this question affirmatively.  A bit of terminology: the chromatic number of a graph G, denoted \chi(G), is the smallest number of colors needed to properly color G.

Theorem: For any natural numbers k,g, there exist graphs of chromatic number at least k and girth at least g.

Proof. Taking our cue from random graphs, let’s see what the probability is that a random graph G(n,p) on n vertices will have our desired properties. Or easier, what’s the chance that it will not have the right properties? This is essentially a fancy counting argument, but it’s nicer if we phrase it in the language of probability theory.

The proof has a few twists and turns for those uninitiated to the probabilistic method of proof. First, we will look at an arbitrary G(n,p) (where n,p are variable) and ask two questions about it: what is the expected number of short cycles, and what is the expected “independence number” (which we will see is related to coloring). We’ll then pick a value of p, depending crucially on n, which makes both of these expectations small. Next, we’ll use the fact that if the probability that G(n,p) doesn’t have our properties is strictly less than 1, then there has to be some instance in our probability space which has those properties (if no instance had the property, then the probability would be one!). Though we will not know what the graphs look like, their existence is enough to prove the theorem.

So let’s start with cycles. If we’re given a desired girth of g, the expected number of cycles of length \leq g in G(n,p) can be bounded by (np)^{g+1}/(np-1). To see this, the two main points are how to count the number of ways to pick k vertices in order to form a cycle, and a typical fact about sums of powers. Indeed, we can think of a cycle of length k as a way to seat a choice of k people around a circular table. There are \binom{n}{k} possible groups of people, and (k-1)! ways to seat one group. If we fix k and let n grow, then the product \binom{n}{k}(k-1)! will eventually be smaller than n^k (actually, this happens almost immediately in almost all cases). For each such choice of an ordering, the probability of the needed edges occurring to form a cycle is p^j, since all edges must occur independently of each other with probability p.

So the probability that we get a cycle of j vertices is

\displaystyle \binom{n}{j}(j-1)!p^j

And by the reasoning above we can bound this by n^jp^j. Summing over all numbers j = 3, \dots, g (we are secretly using the union bound), we bound the expected number of cycles of length \leq g from above:

\displaystyle \sum_{j=3}^g n^j p^j < \sum_{j=0}^g n^j p^j = \frac{(np)^{g+1}}{np - 1}

Since we want relatively few cycles to occur, we want it to be the case that the last quantity, (np)^{j+1}/(np-1), goes to zero as n goes to infinity. One trick is to pick p depending on n. If p = n^l, our upper bound becomes n^{(l+1)(g+1)} / (n^{1+l} - 1), and if we want the quantity to tend to zero it must be that (l+1)(g+1) < 1. Solving this we get that -1 < l < \frac{1}{g+1} - 1 < 0. Pick such an l (it doesn’t matter which), and keep this in mind: for our choice of p, the expected number of cycles goes to zero as n tends to infinity.

On the other hand, we want to make sure that such a graph has high chromatic number. To do this we’ll look at a related property: the size of the largest independent set. An independent set of a graph G = (V,E) is a set of vertices S \subset V so that there are no edges in E between vertices of S. We call \alpha(G) the size of the largest independent set. The values \alpha(G) and \chi(G) are related, because any time you have an independent set you can color all the vertices with a single color. In particular, this proves the inequality \chi(G) \alpha(G) \geq n, the number of vertices of G, or equivalently \chi(G) \geq n / \alpha(G). So if we want to ensure \chi(G) is large, it suffices to show \alpha(G) is small (rigorously, \alpha(G) \leq n / k implies \chi(G) \geq k).

The expected number of independent sets (again using the union bound) is at most the product of the number of possible independent sets and the probability of one of these having no edges. We let r be arbitrary and look at independent sets of size r Since there are \binom{n}{r} sets and each has a probability (1-p)^{\binom{r}{2}} of being independent, we get the probability that there is an independent set of size r is bounded by

\displaystyle \textup{P}(\alpha(G) \geq r) \leq \binom{n}{r}(1-p)^{\binom{r}{2}}

We use the fact that 1-x < e^{-x} for all x to translate the (1-p) part. Combining this with the usual \binom{n}{r} \leq n^r we get the probability of having an independent set of size r at most

\displaystyle \textup{P}(\alpha(G) \geq r) \leq \displaystyle n^re^{-pr(r-1)/2}

Now again we want to pick r so that this quantity goes to zero asymptotically, and it’s not hard to see that r = \frac{3}{p}\log(n) is good enough. With a little arithmetic we get the probability is at most n^{(1-a)/2}, where a > 1.

So now we have two statements: the expected number of short cycles goes to zero, and the probability that there is an independent set of size at least r goes to zero. If we pick a large enough n, then the expected number of short cycles is less than n/5, and using Markov’s inequality we see that the probability that there are more than n/2 cycles of length at most g is strictly less than 1/2. At the same time, if we pick a large enough n then \alpha(G) \geq r with probability strictly less than 1/2. Combining these two (once more with the union bound), we get

\textup{P}(\textup{at least } n/2 \textup{ cycles of length } \leq g \textup{ and } \alpha(G) \geq r) < 1

Now we can conclude that for all sufficiently large n there has to be a graph on at least n vertices which has neither of these two properties! Pick one and call it G. Now G still has cycles of length \leq g, but we can fix that by removing a vertex from each short cycle (it doesn’t matter which). Call this new graph G'. How does this operation affect independent sets, i.e. what is \alpha(G')? Well removing vertices can only decrease the size of the largest independent set. So by our earlier inequality, and calling n' the number of vertices of G', we can make a statement about the chromatic number:

\displaystyle \chi(G') \geq \frac{n'}{\alpha(G')} \geq \frac{n/2}{\log(n) 3/p} = \frac{n/2}{3n^l \log(n)} = \frac{n^{1-l}}{6 \log(n)}

Since -1 < l < 0 the numerator grows asymptotically faster than the denominator, and so for sufficiently large n the chromatic number will be larger than any k we wish. Hence we have found a graph with girth at least g and chromatic number at least k.


Connected Components

The statistical properties of a random graph are often quite easy to reason about. For instance, the degree of each vertex in G(n,p) is np in expectation. Local properties like this are easy, but global properties are a priori very mysterious. One natural question we can ask in this vein is: when is G(n,p) connected? We would very much expect the answer to depend on how p changes in relation to n. For instance, p might look like p(n) = 1/n^2 or \log(n) / n or something similar. We could ask the following question:

As n tends to infinity, what limiting proportion of random graphs G(n,p) are connected?

Certainly for some p(n) which are egregiously small (for example, p(n) = 0), the answer will be that no graphs are connected. On the other extreme, if p(n) = 1 then all graphs will be connected (and complete graphs, at that!). So our goal is to study the transition phase between when the graphs are disconnected and when they are connected. A priori this boundary could be a gradual slope, where the proportion grows from zero to one, or it could be a sharp jump. Next time, we’ll formally state and prove the truth, but for now let’s see if we can figure out which answer to expect by writing an exploratory program.

We wrote the code for this post in Python, and as usual it is all available for download on this blog’s Github page.

We start with a very basic Node class to represent each vertex in a graph, and a function to generate random graphs

import random
class Node:
   def __init__(self, index):
      self.index = index
      self.neighbors = []

   def __repr__(self):
      return repr(self.index)

def randomGraph(n,p):
   vertices = [Node(i) for i in range(n)]
   edges = [(i,j) for i in xrange(n) for j in xrange(i) if random.random() < p]

   for (i,j) in edges:

   return vertices

The randomGraph function simply creates a list of edges chosen uniformly at random from all possible edges, and then constructs the corresponding graph. Next we have a familiar sight: the depth-first search. We use it to compute the graph components one by one (until all vertices have been found in some run of a DFS).

def dfsComponent(node, visited):
   for v in node.neighbors:
      if v not in visited:
         dfsComponent(v, visited)

def connectedComponents(vertices):
   components = []
   cumulativeVisited = set()

   for v in vertices:
      if v not in cumulativeVisited:
        componentVisited = set([v])
        dfsComponent(v, componentVisited)

        cumulativeVisited |= componentVisited

   return components

The dfsComponent function simply searches in breadth-first fashion, adding every vertex it finds to the “visited” set.  The connectedComponents function keeps track of the list of components found so far, as well as the cumulative set of all vertices found in any run of bfsComponent. Hence, as we iterate through the vertices we can ignore vertices we’ve found in previous runs of bfsComponent. The “x |= y” notation is python shorthand for updating x via a union with y.

Finally, we can make a graph of the largest component of (independently generated) random graphs as the probability of an edge varies.

def sizeOfLargestComponent(vertices):
   return max(len(c) for c in connectedComponents(vertices))

def graphLargestComponentSize(n, theRange):
   return [(p, sizeOfLargestComponent(randomGraph(n, p))) for p in theRange]

Running this code and plotting it for p varying from zero to 0.5 gives the following graph.


The blue line is the size of the largest component, and the red line gives a moving average estimate of the data.  As we can see, there is a very sharp jump peaking at p=0.1 at which point the whole graph is connected. It would appear there is a relatively quick “phase transition” happening here. Let’s zoom in closer on the interesting part.


It looks like the transition begins around 0.02, and finishes at around 0.1. Interesting… Let’s change the parameters a bit, and increase the size of the graph. Here’s the same chart (in the same range of p values) for a graph with a hundred vertices.


Now the phase transition appears to have shifted to about (0.01, 0.05), which is about multiplying the endpoints of the phase transition interval above by 1/2. The plot thickens… Once more, let’s move up to a graph on 500 vertices.


Again it’s too hard to see, so let’s zoom in.


This one looks like the transition starts at 0.002 and ends at 0.01. This is a 5-fold decrease from the previous one, and we increased the number of vertices by 5. Could this be a pattern? Here’s a conjecture to formalize it:

Conjecture: The random graph G(n,p) enters a phase transition at p=1/n and becomes connected almost surely at p=5/n.

This is not quite rigorous enough to be a true conjecture, but it sums up our intuition that we’ve learned so far. Just to back this up even further, here’s an animation showing the progression of the phase transition as n = 20 \dots 500 in steps of twenty. Note that the p range is changing to maintain our conjectured window.


Looks pretty good. Next time we’ll see some formal mathematics validating our intuition (albeit reformulated in a nicer way), and we’ll continue to investigate other random graph models.

Until then!

Ramsey Number Lower Bound

Define the Ramsey number R(k,m) to be the minimum number n of vertices required of the complete graph K_n so that for any two-coloring (red, blue) of the edges of K_n one of two things will happen:

  • There is a red k-clique; that is, a complete subgraph of k vertices for which all edges are red.
  • There is a blue m-clique.

It is known that these numbers are always finite, but it is very difficult to compute them exactly.

Problem: Prove that the Ramsey number R(m,m) > n whenever n,m satisfy

\displaystyle \binom{n}{m}2^{1-\binom{m}{2}} < 1

Solution: Color the edges of K_n uniformly at random (that is, each edge has probability 1/2 of being colored red). For any complete subgraph G = K_m, define by event A_G the event that G is monochromatic (its edges are either all red or all blue).

Now the probability that A_G occurs (where G is fixed ahead of time) is easy to compute:

\displaystyle \textup{Pr}(A_G) = \left (\frac{1}{2} \right)^{\binom{m}{2} - 1} = 2^{1-\binom{m}{2}}

Since there are \binom{n}{m} possible subgraphs with m vertices, The probability that for some G the event A_G occurs is at most

\displaystyle \binom{n}{m}2^{1-\binom{m}{2}}

Whenever this quantity is strictly less than 1 (by assumption) then there is a positive probability that no event A_G will occur. That is, there is a positive probability that a random coloring will have no monochromatic subgraph K_m. So there must exist such a coloring, and the Ramsey number R(m,m) must be larger than n. \square

Discussion: This proof (originally due to Erdős) is a classic example of the so-called probabilistic method. In particular, we create a probability space from the object we wish to study, and then we make claims about the probability of joint events.

While it seems quite simple in nature, the probabilistic method has been successfully applied to a wide variety of problems in mathematics. For instance, there is an elegant proof in complexity theory that \textup{BPP} \subset \textup{P/poly} which uses this same method. The probabilistic method has been applied to loads of problems in combinatorics, number theory, and graph theory, and it forms the foundation of the area of random graph theory (which is the setting in which one studies social networks). Perhaps unsurprisingly, there is also a proof of the fundamental theorem of algebra that uses the probabilistic method.

n-Colorability is Equivalent to Finite n-Colorability (A Formal Logic Proof)

Warning: this proof requires a bit of familiarity with the terminology of propositional logic and graph theory.

Problem: Let G be an infinite graph. Show that G is n-colorable if and only if every finite subgraph G_0 \subset G is n-colorable.

Solution: One of the many equivalent versions of the Compactness Theorem for the propositional calculus states that if \Sigma \subset \textup{Prop}(A), where A is a set of propositional atoms, then \Sigma is satisfiable if and only if any finite subset of \Sigma is satisfiable. (Recall that a set of propositions is satisfiable if for some truth valuation t:A \to \left \{ 0,1 \right \} the unique extension \hat{t}:\Sigma \to \left \{ 0,1 \right \} satisfies \hat{t}(p) = 1 for all p \in \Sigma. The function t is called a model for \Sigma). This is equivalent to the Completeness Theorem, which says that if one can use \Sigma to prove p, then every model of \Sigma satisfies \hat{t}(p) = 1. Both are fundamental results in logic.

And so we will convert this graph coloring problem into a logical set of propositions, and use the Compactness Theorem against it. We want a set of propositions \Sigma which has a model if and only if the corresponding graph is n-colorable. Then we will use the n-colorability of finite subgraphs, to show that all finite subsets of \Sigma have models, and this implies by the compactness theorem that \Sigma has a model, so the original infinite graph is n-colorable.

We may think of a coloring of a graph G as a function on the set of vertices: c:V \to \left \{ 1, 2, \dots, n \right \}. Define our set of propositional atoms as A = V \times \left \{ 1, 2, \dots, n \right \}. In other words, we identify a proposition p_{v,i} to each vertex and possible color. So we will define three sets of propositions using these atoms, which codify the conditions of a valid coloring:

  • \left \{ p_{v,1} \vee p_{v,2} \vee \dots \vee p_{v,n} : v \in V \right \} i.e. every vertex must have some color,
  • \left \{ \lnot (p_{v,i} \wedge p_{v,j}) : i,j = 1, \dots, n, i \neq j, v \in V \right \} i.e. no vertex may have two colors, and
  • \left \{ \lnot (p_{v,i} \wedge p_{w,i}) : \textup{whenever } (v,w) \textup{ is an edge in } G \right \} i.e. no two adjacent vertices may have the same color.

Let \Sigma be the union of the above three sets. Take \Sigma_0 \subset \Sigma to be any finite set of the above propositions. Let V_0 be the finite subset of vertices of G which are involved in some proposition of \Sigma_0 (i.e., p_{v,i} \in \Sigma_0 if and only if v \in V_0). Since every proposition involves finitely many atoms, V_0 is finite, and hence the subgraph of vertices of V_0 is n-colorable, with some coloring c: V_0 \to \left \{ 1, 2, \dots, n \right \}. We claim that this c induces a model on \Sigma_0.

Define a valuation t:A \to \left \{ 0,1 \right \} as follows. If v \notin V_0, then we (arbitrarily) choose t(p_{v,i}) = 1. If v \in V_0 and c(v) = i then t(p_{v,1}) = 1. Finally, if v \in V_0 and c(v) \neq i then t(p_{v,i} = 0).

Clearly each of the possible propositions in the above three sets is true under the extension \hat{t}, and so \Sigma_0 has a model. Since \Sigma_0 was arbitrary, \Sigma is finitely satisfiable. So by the Compactness Theorem, \Sigma is satisfiable, and any model s for \Sigma gives a valid graph coloring, simply by choosing i such that the proposition p_{v,i} satisfies s(p_{v,i}) = 1. Our construction forces that such a proposition exists, and hence G is n-colorable. _\square.

Note that without translating this into a logical system, we would be left with the mess of combining n-colorable finite graphs into larger n-colorable finite graphs. The number of cases we imagine encountering are mind-boggling. Indeed, there is probably a not-so-awkward graph theoretical approach to this problem, but this proof exemplifies the elegance that occurs when two different fields of math interact.

Graph Coloring, or Proof by Crayon

This is a precursor to a post which will actually use graph coloring to do interesting computational things. Even so, there are many fascinating ideas and theorems that result from graph coloring, so we devote an entire post just to it. For those who need an additional primer on the basic ideas of graph theory, see our gentle primer on the subject.

Why, Paris Hath Colour Enough.

How many colors are required to color the provinces of Costa Rica?

How many colors are required to color the provinces of Costa Rica?

A common visual aid for maps is to color the regions of the map differently, so that no two regions which share a border also share a color. For example, to the right is a map of the provinces of Costa Rica (where the author is presently spending his vacation). It is colored with eight different colors, one for each province. Of course, before the days of 24-bit “true color,” designing maps was a long and detailed process, and coloring them was expensive. It was in a cartographer’s financial interest to minimize the number of colors used for any given map. This raises the obvious question, what is the minimum number of colors required to color Costa Rica in a way that no two provinces which share a common border share a common color?

With a bit of fiddling, the reader will see that the answer is obviously three. The astute reader will go further to notice that this map contains a “triangle” of three provinces which pairwise share common borders. Hence even these three alone cannot be colored with just two.

This problem becomes much trickier, and much more interesting to we mathematicians, as the number of provinces increases. So let us consider the map of the arrondissements of Paris.

The twenty arrondissements of Paris, which are arranged in a strangely nautilusy spiral.

With so many common borders, we require a more mathematically-oriented description of the problem. Enter graph theory.

Let us construct an undirected graph P (for Paris), in which the vertices are the arrondissements, and two vertices are connected by an edge if and only if they share a common border. In general, when we color graphs we do not consider two regions to share a border if their borders intersect in a single point, as in the four corners of the United States. Performing this construction on Paris, we get the following graph. Recognizing that the positions of the corresponding vertices are irrelevant, we shift them around to have a nicely spread-out picture.

Now that the relationships between arrondissements are decidedly unambiguous, we may rigorously define the problem of coloring a graph.

Definition: A coloring of a graph G = (V,E,\varphi ) is a map f: V \to \left \{ 1 \dots n \right \}, such that if v,w \in V are connected by an edge, then f(v) \neq f(w). We call n the size of a coloring, and if G has a coloring of size n we say that G is n-colorable, or that it has an n-coloring.

Definition: The chromatic number of a graph G, denoted \chi(G) is the smallest positive number k such that G is k-colorable.

Our first map of Costa Rica was 3-colorable, and we saw indeed that \chi(G) = 3.

We now turn our attention to coloring Paris. To do so, we use the following “greedy” algorithm. For the sake of the pseudocode, we represent colors by natural numbers.

Given a set of vertices V of a graph G:
   For each vertex w in V:
      let S_w be the set of colors which have been previously
        assigned to neighbors of w
      Color w with the smallest number that is not in S_w

Obviously, the quality of the coloring depends on the order in which we investigate the vertices. Further, there always exists an ordering of the vertices for which this greedy algorithm produces a coloring which achieves the minimum required number of colors, \chi(G). Applying this to Pairs, we get the following 4-coloring:

Translating this back into its original form, we now have a pretty picture of Paris.

The Parisians will be thrilled when they find out they’re at most 4-colorable.

But this is not enough. Can we do it with just three colors? The call of the wild beckons! We challenge the reader to find a 3-coloring of Paris, or prove its impossibility. In the mean time, we have other matters to attend to.

We note (without proof) that to determine \chi(G) algorithmically is NP-complete in general. Colloquially this means a fast solution is believed not to exist, and to find one (or prove this belief is true) would immediately make one the most famous mathematician in the world. In other words, it’s very, very hard. The only general solutions take at worst an exponential or factorial amount of time with regards to the number of vertices in the graph. In other words, trying to run a general solution on a graph of 50 vertices is likely to take 50! (a 65-digit number) steps to finish, far beyond the computational capacity of any computer in the foreseeable future.

We Are But Planar Fellows, Sir.

Now we turn to a different aspect of graph theory, which somewhat simplifies the coloring process. The maps we have been investigating thus far are special. Specifically, their representations as graphs admit drawings in which no two edges cross. This is obvious by our construction, but on the other hand we recognize that not all graphs have such representations. For instance, try to draw the following graph so that no two edges cross:

If it seems hard, that’s because it’s impossible. The proof of this fact requires a bit more work, and we point the reader in the direction of Wagner’s Theorem. For now, we simply recognize that not all graphs have such drawings. This calls for a definition!

Definition: A simple graph is planar if it can be embedded in \mathbb{R}^2, or, equivalently, if it can be drawn on a piece of paper in such a way that any two intersecting edges do so only at a vertex.

As we have seen, the graphs corresponding to maps of arrondissements or provinces or states are trivially planar. And after a lifetime of cartography, we might notice that every map we attempt to color admits a 4-coloring. We now have a grand conjecture, that every planar graph is 4-colorable.

The audacity of such a claim! You speak from experience, sir, but experience certainly does not imply proof. If we are to seriously tackle this problem, let us try a simpler statement first: that every planar graph is 5-colorable. In order to do so, we need a few lemmas:

Lemma: If a graph G has e edges, then \displaystyle 2e = \sum \limits_{v \in V} \textup{deg}(v).

Proof. Since the degree of a vertex is just the number of edges touching it, and every edge touches exactly two vertices, by counting up the degrees of each vertex we count twice the number of edges. \square

Lemma: If a planar graph G has e edges and f faces (regions bounded by edges, or the exterior of a graph) then 2e \geq 3f, and f \leq (2/3) e.

Proof. In a planar graph, every face is bounded by at least three edges (by definition), and every edge touches at most two faces. Hence, 3f counts each edge at most twice, while 2e counts each face at least three times. \square.

We also recognize without proof the Euler Characteristic Formula for Planar Graphs, (which has an equally easy proof, provided you understand what an edge contraction is): n-e+f=2, where G has n vertices, e edges, and f faces.

Lemma: Every planar graph has a vertex of degree less than 6.

Proof. Supposing otherwise, we may arrive at a contradiction in Euler’s formula: 2 = v - e + f \leq v - e + (2/3) e, so rearranging terms we get e \leq 3v - 6, and so 2e \leq 6v - 12. Recall that 2e is the sum of the degrees of the vertices of G. If every vertex has degree at least 6, then 6v \leq 2e \leq 6v - 12, which is just silly. \square.

This is all we need for the five color theorem, so without further ado:

Theorem: Every planar graph admits a 5-coloring.

Proof. Clearly every graph on fewer than 6 vertices has a 5-coloring. We proceed by induction on the number of vertices.

Suppose to the contrary that G is a graph on n vertices which requires at least 6 colors. By our lemma above, G has a vertex x of degree less than 6. Removing x from G, the inductive hypothesis provides a 5-coloring of the remaining graph on n-1 vertices. We simply need to color x with one of these five colors, and the theorem is proved.

If x has fewer than 5 neighbors or its neighbors use fewer than 5 colors, then we may assign it the color that is not used in any of its neighbors. Otherwise, we may assume x has five neighbors, and each neighbor has a distinct color (presumably requiring x to have the sixth, new color). Let v_1, v_2, v_3, v_4, v_5 be the neighbors of x, labelled in some clockwise order around x, and say they have colors 1,2,3,4,5, respectively.

Consider G_{1,3}, the subgraph of G which contains only vertices of colors 1 and 3. There are two cases: either v_1 is in the same connected component as v_3 (there exists a path connecting them which does not go through x), or not. If not, then we may take the component which contains v_1, and switch the colors 1 and 3. Then x is connected to two vertices of color 3, and we may color it 1.

Otherwise, there is a path from v_1 to v_3. This path is special, because with x, the path v_1 v_3 x completely encapsulates v_2 with vertices of colors 1 and 3 (and x). The planarity of the graph implies, then, that v_2 cannot be connected to either of v_4 or v_5. Hence, we may repeat the color-swapping  argument on G_{2,4}, the subgraph of colors 2 and 4, to give x color 2. \square.

So that was entirely done with elementary arguments, and is a pretty neat proof. Of course, we really want the head of the four-color theorem on a mantle, but he is a much more elusive beast.

Computers Doing Proofs? Blasphemy.

But that is what it took to finish off the four-color theorem. Despite a hundred years of research and diversions, it took a brute-force computer search of 1,936 configurations, and over 1200 computer hours, to prove that no counterexample to the four-color theorem exists.

It was proved in 1976 by Appel and Haken, and it was very controversial for a long time, in principle because no human could verify that the computer hadn’t made an error. On the other hand, it was not hard to verify the correctness of the program itself, but after 1200 CPU hours, who can guarantee a bit wasn’t mistakenly flipped by some cosmic rays or a magnetic malfunction? Luckily, the proof was further simplified and reproved over the next twenty years, so today the problem is widely believed to be solved.

So indeed it is true! We may color every map with just four colors! Unfortunately, even a sketch of the proof is far beyond the scope of this blog. We are content to marvel at the work ethic of other mathematicians.

Next time, we will discuss the applications of graph coloring to computer problems, such as job scheduling and register allocation in compilers.

Until then!