Anti-Coordination Games and Stable Graph Colorings

My First Paper

I’m pleased to announce that my first paper, titled “Anti-Coordination Games and Stable Colorings,” has been accepted for publication! The venue is the Symposium on Algorithmic Game Theory, which will take place in Aachen, Germany this October. A professor of mine once told me that everyone puts their first few publications on a pedestal, so I’ll do my best to keep things down to earth by focusing on the contents of the paper and not my swirling cocktail of pride. The point of this post is to explain the results of my work to a non-research-level audience; the research level folks will likely feel more comfortable reading the paper itself. So here we’ll spend significantly longer explaining the proofs and the concepts, and significantly less time on previous work.

I will assume familiarity with basic graph theory (we have a gentle introduction to that here) and NP-completeness proofs (again, see our primer). We’ll give a quick reminder about the latter when we get to it.

Anti-Coordination Games on Graphs

The central question in the paper is how to find stable strategy profiles for anti-coordination games played on graphs. This section will flush out exactly what all of that means.

The easiest way to understand the game is in terms of fashion. Imagine there is a group of people. Every day they choose their outfits individually and interact with their friends. If any pair of friends happens to choose the same clothing, then they both suffer some embarrassment. We can alternatively say that whenever two friends anti-coordinate their outfits, they each get some kind of reward. If not being embarrassed is your kind of reward, then these really are equivalent. Not every pair of people are friends, so perhaps the most important aspect of this problem is how the particular friendship network considered affects their interactions. This kind of game is called an anti-coordination game, and the network of friends makes it a “game on a graph.” We’ll make this more rigorous shortly.

We can ask questions like, if everyone is acting independently and greedily will their choices converge over time to a single choice of outfit? If so how quickly? How much better could a centralized fashion-planner who knows the entire friendship network fare in choosing outfits? Is the problem of finding a best strategy for picking outfits computationally hard? What if some pairs of people want to coordinate their outfits and others don’t? What if caring about another’s fashion is only one-sided in some cases?

Already this problem is rooted in the theory of social networks, but the concept of an anti-coordination game played on a graph is quite broad, and the relevance of this model to the real world comes from the generality of a graph. For example, one may consider the trading networks of various countries; in this case not all countries are trading partners, and it is beneficial to produce different commodities than your trading partners so that you actually benefit from the interaction. Likewise, neighboring radio towers want to emit signals on differing wavelengths to minimize interference, and commuters want to pick different roadways to minimize traffic. These are all examples of this model which we’re about to formalize.

In place of our “network of friends,” the game entails a graph $G = (V,E)$ in which each player is represented by a vertex, and there is an edge between two vertices whenever the corresponding players are trying to anti-coordinate. We will use the terms player and vertex interchangeably. For now the graph is undirected, but later we will relax this assumption and work with directed graphs. In place of “outfits” we’ll have a generic set of strategies denoted by the numbers $1, \dots, k$, and each vertex will choose a strategy from this set. In one round of the game, each vertex $v$ chooses a strategy, and this defines a function $f : V \to \left \{ 1, \dots, k \right \}$ from the set of vertices to the set of strategies. Then the payoff of a vertex $v$ in a round, which we denote $\mu_f(v)$, is the number of neighbors of $v$ which have chosen a different strategy than $v$. That is, it is

$\displaystyle \mu_f(v) = \sum_{(v,w) \in E} \mathbf{1}_{\left \{ f(v) \neq f(w) \right \}}$

Where $\mathbf{1}_{A}$ denotes the indicator function for the event $A$, which assumes a value of 1 when the event occurs and 0 otherwise. Here is an example of an instance of the game. We have three strategies, denoted by colors, and the payoff for the vertex labeled $v$ is three.

If this game is played over many many rounds, we can ask if there is a so-called Nash equilibrium. That is, is there a choice of strategies for the players so that no single player will have an incentive to change in future rounds, assuming nobody else changes? We restrict even further to thinking about pure strategy Nash equilibria, which means there are no probabilistic choices made in choosing a strategy. Equivalently, a pure strategy equilibrium is just a choice of a strategy for each vertex which doesn’t change across rounds. In terms of the graph, we call a strategy function $f$ which is a Nash equilibrium a stable equilibrium (or, as will be made clear in the next paragraph, a stable coloring). It must satisfy the property that no vertex can increase its payoff by switching to a different strategy. So our question now becomes: how can we find a stable coloring which as good as possible for all players involved? Slightly more generally, we call a Nash equilibrium a strictly stable equilibrium (or a strictly stable coloring) if every vertex would strictly decrease its payoff by switching to another strategy. As opposed to a plain old stable coloring where one could have the same payoff by switching strategies, if any player tries to switch strategy then it will get a necessarily worse payoff. Though it’s not at all clear now, we will see that this distinction is the difference between computational tractability and infeasibility.

We can see a very clear connection between this game and graph coloring. Here an edge produces a payoff of 1 for each of its two vertices if and only if it’s properly colored. And so if the strategy choice function $f$ is also a proper coloring, this will produce the largest possible payoff for all vertices in the graph. But it may not be the case that (for a fixed set of strategies) the graph is properly colorable, and we already know that finding a proper coloring with more than two colors is a computationally hard problem. So this isn’t a viable avenue for solving our fashion game. In any case, the connection confuses us enough to interchangeably call the strategy choice function $f$coloring of $G$.

As an interesting side note, a slight variation of this game was actually tested on humans (with money as payoff!) to see how well they could do. Each human player was only shown the strategies of their neighbors, and received $5 for every round in which they collectively arrived at a proper coloring of the graph. See this article in Science for more details. Since our game allows for the presence of improperly colored edges, we could instead propose to find an assignment of colors to vertices which maximizes the sum of the payoffs of all players. In this vein, we define the social welfare of a graph and a coloring, denoted $W(G,f)$, to be the sum of the payoffs for all vertices $\sum_v \mu_f(v)$. This is a natural quantity one wants to analyze. Unfortunately, even in the case of two strategies, this quantity is computationally difficult (NP-hard) to maximize. It’s a version of the MAX-CUT problem, in which we try to separate the graph into two sets $X, Y$ such that the largest number of edges crosses from $X$ to $Y$. The correspondence between the two problems is seen by having $X$ represent those vertices which get strategy 1 and $Y$ represent strategy 2. So we can’t hope to find an efficient algorithm maximizing social welfare. The next natural question is: can we find stable or strictly stable colorings at all? Will they even necessarily exist? The answers to these questions form the main results of our paper. An Algorithm for Stable Colorings, and the Price of Anarchy It turns out that there is a very simple greedy algorithm for finding stable colorings of a graph. We state it in the form of a proposition. By stable $k$-coloring we mean a stable coloring of the graph with $k$ colors (strategies). Proposition: For every graph $G$ and every $k \geq 1$, $G$ admits a stable $k$-coloring, and such a coloring can be found in polynomial time. Proof. The proof operates by using the social welfare function as a so-called potential function. That is, a change in a player’s strategy which results in a higher payoff results in a higher value of the social welfare function. It is easy to see why: if a player $v$ changes to a color that helps him, then it will result in more properly colored edges (adjacent to $v$) than there were before. This means that more of $v$‘s neighbors receive an additional 1 unit of payoff than those that lost 1 as a result of $v$‘s switch. We call a vertex which has the potential to improve its payoff unhappy, and one which cannot improve its payoff happy. And so our algorithm to find a stable coloring simply finds some unhappy vertex, switches its color to the most uncommon color among its neighbors, and repeats the process until all vertices are happy. Indeed, this is a local maximum of the social welfare function, and the very definition of a stable coloring. $\square$ So that was nice, but we might ask: how much worse is the social welfare arrived at by this algorithm than the optimal social welfare? How much do we stand to lose by accepting the condemnation of NP-hardness and settling for the greedy solution we found? More precisely, if we call $Q$ the set of stable colorings and $C$ the set of all possible colorings, what is the value of $\displaystyle \frac{\max_{c' \in C} W(G, c')}{\min_{c \in Q} W(G, c)}$ This is a well-studied quantity in many games, called the price of anarchy. The name comes from the thought: what do we stand to gain by having a central authority, who can see the entire network topology and decide what is best for us, manage our strategies? The alternative (anarchy) is to have each player in the game act as selfishly and rationally as possible without complete information. It turns out that as the number of strategies grows large in our anti-coordination game, there is no price of anarchy. For our game this obviously depends on the choice of graph, but we know what it is and we formally state the result as a proposition: Proposition: For any graph, the price of anarchy for the $k$ strategy anti-coordination game is at most $k/(k-1)$ and this value is actually achieved by some instances of the game. Proof. The pigeonhole principle says that every vertex can always achieve at least a $(k-1)/k$ fraction of its maximum possible payoff. Specifically, if a vertex $v_i$ can’t achieve a proper coloring, then every color must be accounted for among $v_i$‘s neighbors. Choosing the minimally occurring color will give $v_i$ at least a payoff of $d_i(k-1)/k$ where $d_i$ is the number of neighbors of $v_i$. Since every stable coloring has to satisfy the condition that no vertex can do any better than the strategy it already has, even in the worst stable coloring every vertex already has chosen such a minority color. Since the maximum payoff is twice the number of edges $2 |E|$, and the sum of the degrees $\sum_i d_i = 2 |E|$, we have that the price of anarchy is at most $\displaystyle \frac{2|E|}{\frac{k-1}{k} \sum_i d_i} = \frac{k}{k-1}$ Indeed, we can’t do any better than this in general, because the following graph gives an example where the price of anarchy exactly meets this bound. An instance of the anti-coordination game with 5 strategies which meets the upper bound on price of anarchy. This example can easily be generalized to work with arbitrary $k$. We leave the details as an exercise to the reader. $\square$ Strictly Stable Colorings are Hard to Find Perhaps surprisingly, the relatively minor change of adding strictness is enough to make computability intractable. We’ll give an explicit proof of this, but first let’s recall what it means to be intractable. Recall that a problem is in NP if there is an efficient (read, polynomial-time) algorithm which can verify a solution to the problem is actually a solution. For example, the problem of proper graph $k$-coloring is in NP, because if someone gives you a purported coloring all you have to do is verify that each of the $O(n^2)$ edges are properly colored. Similarly, the problem of strictly stable coloring is in NP; all one need do is verify that no choice of a different color for any vertex improves its payoff, and it is trivial to come up with an algorithm which checks this. Next, call a problem $A$ NP-hard if a solution to $A$ allows you to solve any problem in NP. More formally, $A$ being NP-hard means that there is a polynomial-time reduction from any problem in NP $B$ to $A$ in the following (rough) sense: there is a polynomial-time computable function (i.e. deterministic program) $f$ which takes inputs for $B$ and transforms them into inputs for $A$ such that: $w$ is a solvable instance of $B$ is if and only if $f(w)$ is solvable for $A$. This is not a completely formal definition (see this primer on NP-completeness for a more serious treatment), but it’s good enough for this post. In order to prove a problem $C$ is NP-hard, all you need to do is come up with a polynomial-time reduction from a known NP-hard problem $A$ to your new problem $C$. The composition of the reduction used for $A$ can be composed with the reduction for $C$ to get a new reduction proving $C$ is NP-hard. Finally, we call a problem NP-complete if it is both in NP and NP-hard. One natural question to ask is: if we don’t already know of any NP-hard problems, how can we prove anything is NP-hard? The answer is: it’s very hard, but it was done once and we don’t need to do it again (but if you really want to, see these notes). As a result, we have generated a huge list of problems that are NP-complete, and unless P = NP none of these algorithms have polynomial-time algorithms to solve them. We need two examples of NP-hard problems for this paper: graph coloring, and boolean satisfiability. Since we assume the reader is familiar with the former, we recall the latter. Given a set of variables $x_i$, we can form a boolean formula over those variables of the form $\varphi = C_1 \wedge C_2 \wedge \dots \wedge C_m$ where each clause $C_i$ is a disjunction of three literals (negated or unnegated variables). For example, $C_i = (x_2 \vee \bar{x_5} \vee \bar{x_9})$ might be one clause. Here interpret a formula as the $x_i$ having the value true or false, the horizontal bars denoting negation, the wedges $\wedge$ meaning “and” and the vees $\vee$ meaning “or.” We call this particular form conjunctive normal form. A formula $\varphi$ is called satisfiable if there is a choice of true and false assignments to the variables which makes the entire logical formula true. The problem of determining whether there is any satisfying assignment of such a formula, called 3-SAT, is NP-hard. Going back to strictly stable equilibria and anti-coordination games, we will prove that the problem of determining whether a graph has a strictly stable coloring with $k$ colors is NP-hard. As a consequence, finding such an equilibrium is NP-hard. Since the problem is also in NP, it is in fact NP-complete. Theorem: For all $k \geq 2$, the problem of determining whether a graph $G$ has a strictly stable coloring with $k$ colors is NP-complete. Proof. The hardest case is $k =2$, but $k \geq 3$ is a nice warmup to understand how a reduction works, so we start there. The $k \geq 3$ part works by reducing from graph coloring. That is, our reduction will take an input to the graph $k$-coloring problem (a graph $G$ whose $k$-colorability is in question) and we produce a graph $G'$ such that $G$ is $k$-colorable if and only if $G'$ has a strictly stable coloring with $k$ colors. Since graph coloring is hard for $k \geq 3$, this will prove our problem is NP-hard. More specifically, we will construct $G'$ in such a way that the strictly stable colorings also happen to be proper colorings! So finding a strictly stable coloring of $G'$ will immediately give us a proper coloring of $G$. The construction of $G'$ is quite straightforward. We start with $G' = G$, and then for each edge $e = (u,v)$ we add a new subgraph which we call $H_e$ that looks like: By $K_{k-2}$ we mean the complete graph on $k-2$ vertices (all possible edges are present), and the vertices $u,v$ are adjacent to all vertices of the $H_e = K_{k-2}$ part. That is, the graph $H_e \cup \left \{ u,v \right \}$ is the complete graph on $k$ vertices. Now if the original graph $G$ was $k$-colorable, then we can use the same colors for the corresponding vertices in $G'$, and extend to a proper coloring (and hence a strictly stable equilibrium) of all of $G'$. Indeed, for any $H_e$ we can use one different color for each vertex of the $K_{k-2}$ part if we don’t use either of the colors used for $u,v$, then we’ll have a proper coloring. On the other hand, if $G'$ has a strictly stable equilibrium, then no edge $e$ which originally came from $G$ can be improperly colored. If some edge $e = (u,v)$ were improperly colored, then there would be some vertex in the corresponding $H_e$ which is not strictly stable. To see this, notice that in the $k$ vertices among $H_e \cup \left \{ u,v \right \}$ there can be at most $k-1$ colors used, and so any vertex will always be able to switch to that color without hurting his payoff. That is, the coloring might be stable, but it won’t be strictly so. So strictly stable colorings are the same as proper colorings, and we already see that the subgraph $G \subset G'$ is $k$-colorable, completing the reduction. Well that was a bit of a cheap trick, but it shows the difficulty of working with strictly stable equilibria: preventing vertices from changing with no penalty is hard! What’s worse is that it’s still hard even if there are only two colors. The reduction here is a lot more complicated, so we’ll give a sketch of how it works. The reduction is from 3-SAT. So given a boolean formula $\varphi = C_1 \wedge \dots \wedge C_m$ we produce a graph $G$ so that $\varphi$ has a satisfying assignment if and only if $G$ has a strictly stable coloring with two colors. The principle part of the reduction is the following gadget which represents the logic inherent in a clause. We pulled the figure directly from our paper, since the caption gives a good explanation of how it works. To reiterate, the two “appendages” labeled by $x$ correspond to the literal $x$, and the choice of colors for these vertices correspond to truth assignments in $\varphi$. In particular, if the two vertices have the same color, then the literal is assigned true. Of course, we have to ensure that any $x$‘s showing up in other clause gadgets agree, and any $\bar{x}$‘s will have opposite truth values. That’s what the following two gadgets do: The gadget on the left enforces x’s to have the same truth assignment across gadgets (otherwise the center vertex won’t be in strict equilibrium). The gadget on the right enforces two literals to be opposites. And if we stitch the clauses together in a particular way (using the two gadgets above) then we will guarantee consistency across all of the literals. All that’s left to check is that the clauses do what they’re supposed to. That is, we need it to be the case that if all of the literals in a clause gadget are “false,” then we can’t complete the coloring to be strictly stable, and otherwise we can. Indeed, the following diagram gives all possible cases of this up to symmetry: The last figure deserves an explanation: if the three literals are all false, then we can pick any color we wish for $v_1$, and its two remaining neighbors must both have the same color (or else $v_1$ is not in strict equilibrium). Call this color $a$, and using the same reasoning call the neighbors of $v_2$ and $v_3$ $b$ and $c$, respectively. Now by the pigeonhole principle, either $a=b, b=c,$ or $b=c$. Suppose without loss of generality that $a=b$, then the edge labeled $(a,b)$ will have the $a$ part not in strict equilibrium (it will have two neighbors of its same color and only one of the other color). This shows that no strict equilibrium can exist. The reduction then works by taking a satisfying assignment for the variables, coloring the literals in $G$ appropriately, and extending to a strictly stable equilibrium of all of $G$. Conversely, if $G$ has a strictly stable coloring, then the literals must be consistent and each clause must be fully colorable, which the above diagram shows is the same as the clauses being satisfiable. So all of $\varphi$ is satisfiable and we’re done (excluding a few additional details we describe in the paper). $\square$ Directed Graphs and Cooperation That was the main result of our paper, but we go on to describe some interesting generalizations. Since this post is getting quite long, we’ll just give a quick overview of the interesting parts. The rest of the paper is dedicated to directed graphs, where we define the payoff of a directed edge $(u,v)$ to go to the $u$ player if $u$ and $v$ anti-coordinate, but $v$ gets nothing. Here the computational feasibility is even worse than it was in the undirected case, but the structure is noticeably more interesting. For the former, not only is in NP-hard to compute strictly stable colorings, it’s even NP-hard to do so in the non-strict case! One large part of the reason for this is that stable colorings might not even exist: a directed 3-cycle has no stable equilibrium. We use this fact as a tool in our reductions to prove the following theorem. Theorem: For all $k \geq 2$, determining whether a directed graph has a stable$latex k\$-coloring is NP-complete.

See section 5 of our paper for a full proof.

To address the interesting structure that arises in the directed case, we observe that we can use a directed graph to simulate the desire of one vertex to actually cooperate with another. To see this for two colors, instead of adding an edge $(u,v)$ we add a proxy edge $u'$ and directed edges $(u,u'), (u',v)$. To be in equilibrium, the proxy has no choice but to anti-coordinate with $v$, and this will give $u$ more incentive to cooperate with $v$ by anti-cooperating with its proxy. This can be extended to $k$ colors by using an appropriately (acyclically) directed copy of $K_{k-1}$.

Thoughts, and Future Work

While the results in this paper are nice, and I’m particularly proud that I came up with a novel NP-hardness reduction, it is unfortunate that the only results in this paper were hardness results. Because of the ubiquity of NP-hard problems, it’s far more impressive to have an algorithm which actually does something (approximate a good solution, do well under some relaxed assumption, do well in expectation with some randomness) than to prove something is NP-hard. To get an idea of the tone set by researchers, NP-hardness results are often called “negative” results (in the sense that they give a “no” answer to the question of whether there is an efficient algorithm) and finding an algorithm that does something is called a positive result. That being said the technique of using two separate vertices to represent a single literal in a reduction proof is a nice trick, and I have since used it in another research paper, so I’m happy with my work.

On the positive side, though, there is some interesting work to be done. We could look at varying types of payoff structures, where instead of a binary payoff it is a function of the colors involved (say, $|i - j|$. Another interesting direction is to consider distributed algorithms (where each player operates independently and in parallel) and see what kinds of approximations of the optimal payoff can be achieved in that setting. Yet another direction favored by a combinatorialist is to generalize the game to hypergraphs, which makes me wonder what type of payoff structure is appropriate (payoff of 1 for a rainbow edge? a non-monochromatic edge?). There is also some more work that can be done in inspecting the relationship between cooperation and anti-cooperation in the directed version. Though I don’t have any immediate open questions about it, it’s a very interesting phenomenon.

In any event, I’m currently scheduled to give three talks about the results in this paper (one at the conference venue in Germany, and two at my department’s seminars). Here’s to starting off my research career!

The Erdős-Rényi Random Graph

During the 1950′s the famous mathematician Paul Erdős and Alfred Rényi put forth the concept of a random graph and in the subsequent years of study transformed the world of combinatorics. The random graph is the perfect example of a good mathematical definition: it’s simple, has surprisingly intricate structure, and yields many applications.

In this post we’ll explore basic facts about random graphs, slowly detail a proof on their applications to graph theory, and explore their more interesting properties computationally (a prelude to proofs about their structure). We assume the reader is familiar with the definition of a graph, which we’ve written about at length for non-mathematical audiences, and has some familiarity with undergraduate-level probability and combinatorics for the more mathematical sections of the post. We’ll do our best to remind the reader of these prerequisites as we go, and we welcome any clarification questions in the comment section.

The Erdős-Rényi Model

The definition of a random graph is simple enough that we need not defer it to the technical section of the article.

Definition: Given a positive integer $n$ and a probability value $0 \leq p \leq 1$, define the graph $G(n,p)$ to be the undirected graph on $n$ vertices whose edges are chosen as follows. For all pairs of vertices $v,w$ there is an edge $(v,w)$ with probability $p$.

Of course, there is no single random graph. What we’ve described here is a process for constructing a graph. We create a set of $n$ vertices, and for each possible pair of vertices we flip a coin (often a biased coin) to determine if we should add an edge connecting them. Indeed, every graph can be made by this process if one is sufficiently lucky (or unlucky), but it’s very unlikely that we will have no edges at all if $p$ is large enough. So $G(n,p)$ is really a probability distribution over the set of all possible graphs on $n$ vertices. We commit a horrendous abuse of notation by saying $G$ or $G(n,p)$ is a random graph instead of saying that $G$ is sampled from the distribution. The reader will get used to it in time.

Why Do We Care?

Random graphs of all sorts (not just Erdős’s model) find applications in two very different worlds. The first is pure combinatorics, and the second is in the analysis of networks of all kinds.

In combinatorics we often wonder if graphs exist with certain properties. For instance, in graph theory we have the notion of graph colorability: can we color the vertices of a graph with $k$ colors so that none of its edges are monochromatic? (See this blog’s primer on graph coloring for more) Indeed, coloring is known to be a very difficult problem on general graphs. The problem of determining whether a graph can be colored with a fixed number of colors has no known efficient algorithm; it is NP-complete. Even worse, much of our intuition about graphs fails for graph coloring. We would expect that sparse-looking graphs can be colored with fewer colors than dense graphs. One naive way to measure sparsity of a graph is to measure the length of its shortest cycle (recall that a cycle is a path which starts and ends at the same vertex). This measurement is called the girth of a graph. But Paul Erdős proved using random graphs, as we will momentarily, that for any choice of integers $g,k$ there are graphs of girth $\geq g$ which cannot be colored with fewer than $k$ colors. Preventing short cycles in graphs doesn’t make coloring easier.

The role that random graphs play in this picture is to give us ways to ensure the existence of graphs with certain properties, even if we don’t know how to construct an example of such a graph. Indeed, for every theorem proved using random graphs, there is a theorem (or open problem) concerning how to algorithmically construct those graphs which are known to exist.

Pure combinatorics may not seem very useful to the real world, but models of random graphs (even those beyond the Erdős-Rényi model) are quite relevant. Here is a simple example. One can take a Facebook user $u$ and form a graph of that users network of immediate friends $N(u)$ (excluding $u$ itself), where vertices are people and two people are connected by an edge if they are mutual friends; call this the user’s friendship neighborhood. It turns out that the characteristics of the average Facebook user’s friendship neighborhood resembles a random graph. So understanding random graphs helps us understand the structure of small networks of friends. If we’re particularly insightful, we can do quite useful things like identify anomalies, such as duplicitous accounts, which deviate quite far from the expected model. They can also help discover trends or identify characteristics that can allow for more accurate ad targeting. For more details on how such an idea is translated into mathematics and code, see Modularity (we plan to talk about modularity on this blog in the near future; lots of great linear algebra there!).

Random graphs, when they exhibit observed phenomena, have important philosophical consequences. From a bird’s-eye view, there are two camps of scientists. The first are those who care about leveraging empirically observed phenomena to solve problems. Many statisticians fit into this realm: they do wonderful magic with data fit to certain distributions, but they often don’t know and don’t care whether the data they use truly has their assumed properties. The other camp is those who want to discover generative models for the data with theoretical principles. This is more like theoretical physics, where we invent an arguably computational notion of gravity whose consequences explain our observations.

For applied purposes, the Erdős-Rényi random graph model is in the second camp. In particular, if something fits in the Erdős-Rényi model, then it’s both highly structured (as we will make clear in the sequel) and “essentially random.” Bringing back the example of Facebook, this says that most people in the average user’s immediate friendship neighborhood are essentially the same and essentially random in their friendships among the friends of $u$. This is not quite correct, but it’s close enough to motivate our investigation of random graph models. Indeed, even Paul Erdős in his landmark paper mentioned that equiprobability among all vertices in a graph is unrealistic. See this survey for a more thorough (and advanced!) overview, and we promise to cover models which better represent social phenomena in the future.

So lets go ahead and look at a technical proof using random graphs from combinatorics, and then write some programs to generate random graphs.

Girth and Chromatic Number, and Counting Triangles

As a fair warning, this proof has a lot of moving parts. Skip to the next section if you’re eager to see some programs.

Say we have a $k$ and a $g$, and we wonder whether a graph can exist which simultaneously has no cycles of length less than $g$ (the girth) and needs at least $k$ colors to color. The following theorem settles this question affirmatively.  A bit of terminology: the chromatic number of a graph $G$, denoted $\chi(G)$, is the smallest number of colors needed to properly color $G$.

Theorem: For any natural numbers $k,g$, there exist graphs of chromatic number at least $k$ and girth at least $g$.

Proof. Taking our cue from random graphs, let’s see what the probability is that a random graph $G(n,p)$ on $n$ vertices will have our desired properties. Or easier, what’s the chance that it will not have the right properties? This is essentially a fancy counting argument, but it’s nicer if we phrase it in the language of probability theory.

The proof has a few twists and turns for those uninitiated to the probabilistic method of proof. First, we will look at an arbitrary $G(n,p)$ (where $n,p$ are variable) and ask two questions about it: what is the expected number of short cycles, and what is the expected “independence number” (which we will see is related to coloring). We’ll then pick a value of $p$, depending crucially on $n$, which makes both of these expectations small. Next, we’ll use the fact that if the probability that $G(n,p)$ doesn’t have our properties is strictly less than 1, then there has to be some instance in our probability space which has those properties (if no instance had the property, then the probability would be one!). Though we will not know what the graphs look like, their existence is enough to prove the theorem.

So let’s start with cycles. If we’re given a desired girth of $g$, the expected number of cycles of length $\leq g$ in $G(n,p)$ can be bounded by $(np)^{g+1}/(np-1)$. To see this, the two main points are how to count the number of ways to pick $k$ vertices in order to form a cycle, and a typical fact about sums of powers. Indeed, we can think of a cycle of length $k$ as a way to seat a choice of $k$ people around a circular table. There are $\binom{n}{k}$ possible groups of people, and $(k-1)!$ ways to seat one group. If we fix $k$ and let $n$ grow, then the product $\binom{n}{k}(k-1)!$ will eventually be smaller than $n^k$ (actually, this happens almost immediately in almost all cases). For each such choice of an ordering, the probability of the needed edges occurring to form a cycle is $p^j$, since all edges must occur independently of each other with probability $p$.

So the probability that we get a cycle of $j$ vertices is

$\displaystyle \binom{n}{j}(j-1)!p^j$

And by the reasoning above we can bound this by $n^jp^j$. Summing over all numbers $j = 3, \dots, g$ (we are secretly using the union bound), we bound the expected number of cycles of length $\leq g$ from above:

$\displaystyle \sum_{j=3}^g n^j p^j < \sum_{j=0}^g n^j p^j = \frac{(np)^{g+1}}{np - 1}$

Since we want relatively few cycles to occur, we want it to be the case that the last quantity, $(np)^{j+1}/(np-1)$, goes to zero as $n$ goes to infinity. One trick is to pick $p$ depending on $n$. If $p = n^l$, our upper bound becomes $n^{(l+1)(g+1)} / (n^{1+l} - 1)$, and if we want the quantity to tend to zero it must be that $(l+1)(g+1) < 1$. Solving this we get that $-1 < l < \frac{1}{g+1} - 1 < 0$. Pick such an $l$ (it doesn’t matter which), and keep this in mind: for our choice of $p$, the expected number of cycles goes to zero as $n$ tends to infinity.

On the other hand, we want to make sure that such a graph has high chromatic number. To do this we’ll look at a related property: the size of the largest independent set. An independent set of a graph $G = (V,E)$ is a set of vertices $S \subset V$ so that there are no edges in $E$ between vertices of $S$. We call $\alpha(G)$ the size of the largest independent set. The values $\alpha(G)$ and $\chi(G)$ are related, because any time you have an independent set you can color all the vertices with a single color. In particular, this proves the inequality $\chi(G) \alpha(G) \geq n$, the number of vertices of $G$, or equivalently $\chi(G) \geq n / \alpha(G)$. So if we want to ensure $\chi(G)$ is large, it suffices to show $\alpha(G)$ is small (rigorously, $\alpha(G) \leq n / k$ implies $\chi(G) \geq k$).

The expected number of independent sets (again using the union bound) is at most the product of the number of possible independent sets and the probability of one of these having no edges. We let $r$ be arbitrary and look at independent sets of size $r$ Since there are $\binom{n}{r}$ sets and each has a probability $(1-p)^{\binom{r}{2}}$ of being independent, we get the probability that there is an independent set of size $r$ is bounded by

$\displaystyle \textup{P}(\alpha(G) \geq r) \leq \binom{n}{r}(1-p)^{\binom{r}{2}}$

We use the fact that $1-x < e^{-x}$ for all $x$ to translate the $(1-p)$ part. Combining this with the usual $\binom{n}{r} \leq n^r$ we get the probability of having an independent set of size $r$ at most

$\displaystyle \textup{P}(\alpha(G) \geq r) \leq \displaystyle n^re^{-pr(r-1)/2}$

Now again we want to pick $r$ so that this quantity goes to zero asymptotically, and it’s not hard to see that $r = \frac{3}{p}\log(n)$ is good enough. With a little arithmetic we get the probability is at most $n^{(1-a)/2}$, where $a > 1$.

So now we have two statements: the expected number of short cycles goes to zero, and the probability that there is an independent set of size at least $r$ goes to zero. If we pick a large enough $n$, then the expected number of short cycles is less than $n/5$, and using Markov’s inequality we see that the probability that there are more than $n/2$ cycles of length at most $g$ is strictly less than 1/2. At the same time, if we pick a large enough $n$ then $\alpha(G) \geq r$ with probability strictly less than 1/2. Combining these two (once more with the union bound), we get

$\textup{P}(\textup{at least } n/2 \textup{ cycles of length } \leq g \textup{ and } \alpha(G) \geq r) < 1$

Now we can conclude that for all sufficiently large $n$ there has to be a graph on at least $n$ vertices which has neither of these two properties! Pick one and call it $G$. Now $G$ still has cycles of length $\leq g$, but we can fix that by removing a vertex from each short cycle (it doesn’t matter which). Call this new graph $G'$. How does this operation affect independent sets, i.e. what is $\alpha(G')$? Well removing vertices can only decrease the size of the largest independent set. So by our earlier inequality, and calling $n'$ the number of vertices of $G'$, we can make a statement about the chromatic number:

$\displaystyle \chi(G') \geq \frac{n'}{\alpha(G')} \geq \frac{n/2}{\log(n) 3/p} = \frac{n/2}{3n^l \log(n)} = \frac{n^{1-l}}{6 \log(n)}$

Since $-1 < l < 0$ the numerator grows asymptotically faster than the denominator, and so for sufficiently large $n$ the chromatic number will be larger than any $k$ we wish. Hence we have found a graph with girth at least $g$ and chromatic number at least $k$.

$\square$

Connected Components

The statistical properties of a random graph are often quite easy to reason about. For instance, the degree of each vertex in $G(n,p)$ is $np$ in expectation. Local properties like this are easy, but global properties are a priori very mysterious. One natural question we can ask in this vein is: when is $G(n,p)$ connected? We would very much expect the answer to depend on how $p$ changes in relation to $n$. For instance, $p$ might look like $p(n) = 1/n^2$ or $\log(n) / n$ or something similar. We could ask the following question:

As $n$ tends to infinity, what limiting proportion of random graphs $G(n,p)$ are connected?

Certainly for some $p(n)$ which are egregiously small (for example, $p(n) = 0$), the answer will be that no graphs are connected. On the other extreme, if $p(n) = 1$ then all graphs will be connected (and complete graphs, at that!). So our goal is to study the transition phase between when the graphs are disconnected and when they are connected. A priori this boundary could be a gradual slope, where the proportion grows from zero to one, or it could be a sharp jump. Next time, we’ll formally state and prove the truth, but for now let’s see if we can figure out which answer to expect by writing an exploratory program.

We wrote the code for this post in Python, and as usual it is all available for download on this blog’s Github page.

We start with a very basic Node class to represent each vertex in a graph, and a function to generate random graphs

import random
class Node:
def __init__(self, index):
self.index = index
self.neighbors = []

def __repr__(self):
return repr(self.index)

def randomGraph(n,p):
vertices = [Node(i) for i in range(n)]
edges = [(i,j) for i in xrange(n) for j in xrange(i) if random.random() < p]

for (i,j) in edges:
vertices[i].neighbors.append(vertices[j])
vertices[j].neighbors.append(vertices[i])

return vertices


The randomGraph function simply creates a list of edges chosen uniformly at random from all possible edges, and then constructs the corresponding graph. Next we have a familiar sight: the depth-first search. We use it to compute the graph components one by one (until all vertices have been found in some run of a DFS).

def dfsComponent(node, visited):
for v in node.neighbors:
if v not in visited:
dfsComponent(v, visited)

def connectedComponents(vertices):
components = []
cumulativeVisited = set()

for v in vertices:
if v not in cumulativeVisited:
componentVisited = set([v])
dfsComponent(v, componentVisited)

components.append(componentVisited)
cumulativeVisited |= componentVisited

return components


The dfsComponent function simply searches in breadth-first fashion, adding every vertex it finds to the “visited” set.  The connectedComponents function keeps track of the list of components found so far, as well as the cumulative set of all vertices found in any run of bfsComponent. Hence, as we iterate through the vertices we can ignore vertices we’ve found in previous runs of bfsComponent. The “x |= y” notation is python shorthand for updating x via a union with y.

Finally, we can make a graph of the largest component of (independently generated) random graphs as the probability of an edge varies.

def sizeOfLargestComponent(vertices):
return max(len(c) for c in connectedComponents(vertices))

def graphLargestComponentSize(n, theRange):
return [(p, sizeOfLargestComponent(randomGraph(n, p))) for p in theRange]


Running this code and plotting it for $p$ varying from zero to 0.5 gives the following graph.

The blue line is the size of the largest component, and the red line gives a moving average estimate of the data.  As we can see, there is a very sharp jump peaking at $p=0.1$ at which point the whole graph is connected. It would appear there is a relatively quick “phase transition” happening here. Let’s zoom in closer on the interesting part.

It looks like the transition begins around 0.02, and finishes at around 0.1. Interesting… Let’s change the parameters a bit, and increase the size of the graph. Here’s the same chart (in the same range of $p$ values) for a graph with a hundred vertices.

Now the phase transition appears to have shifted to about $(0.01, 0.05)$, which is about multiplying the endpoints of the phase transition interval above by 1/2. The plot thickens… Once more, let’s move up to a graph on 500 vertices.

Again it’s too hard to see, so let’s zoom in.

This one looks like the transition starts at 0.002 and ends at 0.01. This is a 5-fold decrease from the previous one, and we increased the number of vertices by 5. Could this be a pattern? Here’s a conjecture to formalize it:

Conjecture: The random graph $G(n,p)$ enters a phase transition at $p=1/n$ and becomes connected almost surely at $p=5/n$.

This is not quite rigorous enough to be a true conjecture, but it sums up our intuition that we’ve learned so far. Just to back this up even further, here’s an animation showing the progression of the phase transition as $n = 20 \dots 500$ in steps of twenty. Note that the $p$ range is changing to maintain our conjectured window.

Looks pretty good. Next time we’ll see some formal mathematics validating our intuition (albeit reformulated in a nicer way), and we’ll continue to investigate other random graph models.

Until then!

The graph is among the most common data structures in computer science, and it’s unsurprising that a staggeringly large amount of time has been dedicated to developing algorithms on graphs. Indeed, many problems in areas ranging from sociology, linguistics, to chemistry and artificial intelligence can be translated into questions about graphs. It’s no stretch to say that graphs are truly ubiquitous. Even more, common problems often concern the existence and optimality of paths from one vertex to another with certain properties.

Of course, in order to find paths with certain properties one must first be able to search through graphs in a structured way. And so we will start our investigation of graph search algorithms with the most basic kind of graph search algorithms: the depth-first and breadth-first search. These kinds of algorithms existed in mathematics long before computers were around. The former was ostensibly invented by a man named Pierre Tremaux, who lived around the same time as the world’s first formal algorithm designer Ada Lovelace. The latter was formally discovered much later by Edward F. Moore in the 50′s. Both were discovered in the context of solving mazes.

These two algorithms nudge gently into the realm of artificial intelligence, because at any given step they will decide which path to inspect next, and with minor modifications we can “intelligently” decide which to inspect next.

Of course, this primer will expect the reader is familiar with the basic definitions of graph theory, but as usual we provide introductory primers on this blog. In addition, the content of this post will be very similar to our primer on trees, so the familiar reader may benefit from reading that post as well. As usual, all of the code used in this post is available on this blog’s Github page.

The Basic Graph Data Structure

We present the basic data structure in both mathematical terms and explicitly in Python.

Definition: A directed graph is a triple $G = (V, E, \varphi)$ where $V$ is a set of vertices, $E$ is a set of edges, and $\varphi: E \to V \times V$ is the adjacency function, which specifies which edges connect which vertices (here edges are ordered pairs, and hence directed).

We will often draw pictures of graphs instead of explicitly specifying their adjacency functions:

That is, the vertex set is $\left \{ A,B,C,D,E,F \right \}$, the edge set (which is unlabeled in the picture) has size 6, and the adjacency function just formalizes which edges connect which vertices.

An undirected graph is a graph in which the edges are (for lack of a better word) undirected. There are two ways to realize this rigorously: one can view the codomain of the adjacency function $\varphi$ as the set of subsets of size 2 of $V$ as opposed to ordered pairs, or one can enforce that whenever $\varphi(e) = (v_1, v_2)$ is a directed edge then so is $(v_2, v_1)$. In our implementations we will stick to directed graphs, and our data structure will extend nicely to use the second definition if undirectedness is needed.

For the purpose of finding paths and simplicity in our derivations, we will impose two further conditions. First, the graphs must be simple. That is, no graph may have more than one edge between two vertices or self-adjacent vertices (that is, $(v,v)$ can not be an edge for any vertex $v$). Second, the graphs must be connected. That is, all pairs of vertices must have a path connecting them.

Our implementation of these algorithms will use a variation of the following Python data structure. This code should be relatively self-explanatory, but the beginning reader should consult our primers on the Python programming language for a more thorough explanation of classes and lists.

class Node:
def __init__(self, value):
self.value = value
self.adjacentNodes = []

The entire graph will be accessible by having a reference to one vertex (this is guaranteed by connectivity). The vertex class is called Node, and it will contain as its data a value of arbitrary type and a list of neighboring vertices. For now the edges are just implicitly defined (there is an edge from $v$ to $w$ if the latter shows up in the “adjacentNodes” list of the former), but once we need edges with associated values we will have to improve this data structure to one similar to what we used in our post on neural networks.

That is, one must update the adjacentNodes attribute of each Node by hand to add edges to the graph. There are other data structures for graphs that allow one to refer to any vertex at any time, but for our purposes the constraint of not being able to do that is more enlightening. The algorithms we investigate will use no more and no less than what we have. At each stage we will inspect the vertex we currently have access to, and pick some action based on that.

Enough talk. Let’s jump right in!

Depth-First Search

For the remainder of this section, the goal is to determine if there is a vertex in the graph with the associated value being the integer 6. This can also be phrased more precisely as the question: “is there a path from the given node to a node with value 6?” (For connected. undirected graphs the two questions are equivalent.)

Our first algorithm will solve this problem quite nicely, and is called the depth-first search. Depth-first search is inherently a recursion:

1. Start at a vertex.
2. Pick any unvisited vertex adjacent to the current vertex, and check to see if this is the goal.
3. If not, recursively apply the depth-first search to that vertex, ignoring any vertices that have already been visited.
4. Repeat until all adjacent vertices have been visited.

This is probably the most natural algorithm in terms of descriptive simplicity. Indeed, in the case that our graph is a tree, this algorithm is precisely the preorder traversal.

Aside from keeping track of which nodes have already been visited, the algorithm is equally simple:

def depthFirst(node, soughtValue):
if node.value == soughtValue:
return True



Of course, supposing 6 is not found, any graph which contains a cycle (or any nontrivial, connected, undirected graph) will cause this algorithm to loop infinitely. In addition, and this is a more subtle engineering problem, graphs with a large number of vertices will cause this function to crash by exceeding the maximum number of allowed nested function calls.

To avoid the first problem we can add an extra parameter to the function: a Python set type which contains the set of Nodes which have already been visited. Python sets are the computational analogue of mathematical sets, meaning that their contents are unordered and have no duplicates. And functionally Python sets have fast checks for inclusion and addition operations, so this fits the bill quite nicely.

The updated code is straightforward:

def depthFirst(node, soughtValue, visitedNodes):
if node.value == soughtValue:
return True

return True

return False


There are a few tricky things going on in this code snippet. First, after checking the current Node for the sought value, we add the current Node to the set of visitedNodes. While subsequently iterating over the adjacent Nodes we check to make sure the Node has not been visited before recursing. Since Python passes these sets by reference, changes made to visitedNodes deep in the recursion persist after the recursive call ends. That is, much the same as lists in Python, these updates mutate the object.

Second, this algorithm is not crystal clear on how the return values operate. Each recursive call returns True or False, but because there are arbitrarily many recursive calls made at each vertex, we can’t simply return the result of a recursive call. Instead, we can only know that we’re done entirely when a recursive call specifically returns True (hence the test for it in the if statement). Finally, after all recursive calls have terminated (and they’re all False), the end of the function defaults to returning False; in this case the sought vertex was never found.

Let’s try running our fixed code with some simple examples. In the following example, we have stored in the variable G the graph given in the picture in the previous section.

>>> depthFirst(G, "A")
True
>>> depthFirst(G, "E")
True
>>> depthFirst(G, "F")
True
>>> depthFirst(G, "X")
False

Of course, this still doesn’t fix the problem with too many recursive calls; a graph which is too large will cause an error because Python limits the number of recursive calls allowed. The next and final step is a standard method of turning a recursive algorithm into a non-recursive one, and it requires the knowledge of a particular data structure called a stack.

In the abstract, one might want a structure which stores a collection of items and has the following operations:

• You can quickly add something to the structure.
• You can quickly remove the most recently added item.

Such a data structure is called a stack. By “quickly,” we really mean that these operations are required to run in constant time with respect to the size of the list (it shouldn’t take longer to add an item to a long list than to a short one). One imagines a stack of pancakes, where you can either add a pancake to the top or remove one from the top; no matter how many times we remove pancakes, the one on top is always the one that was most recently added to the stack. These two operations are called push (to add) and pop (to remove). As a completely irrelevant aside, this is not the only algorithmic or mathematical concept based on pancakes.

In other languages, one might have to implement such a data structure by hand. Luckily for us, Python’s lists double as stacks (although in the future we plan some primers on data structure design). Specifically, the append function of a list is the push operation, and Python lists have a special operation uncoincidentally called pop which removes the last element from a list and returns it to the caller.

Here is some example code showing this in action:

>>> L = [1,2,3]
>>> L.append(9)
>>> L
[1,2,3,9]
>>> L.pop()
9
>>> L
[1,2,3]

Note that pop modifies the list and returns a value, while push/append only modifies the list.

It turns out that the order in which we visit the vertices in the recursive version of the depth-first search is the same as if we had done the following. At each vertex, push the adjacent vertices onto a stack in the reverse order that you iterate through the list. To choose which vertex to process next, simple pop whatever is on top of the stack, and process it (taking the stack with you as you go). Once again, we have to worry about which vertices have already been visited, and that part of the algorithm remains unchanged.

For example, say we have the following graph:

Starting at vertex 1, which is adjacent to vertices 2, 3 and 4, we push 4 onto the stack, then 3, then 2. Next, we pop vertex 2 and iteratively process 2. At this point the picture looks like this:

Since 2 is connected to 4 and also 5, we push 5 and then 4 onto the stack. Note that 4 is in the stack twice (this is okay, since we are maintaining a set of visited vertices). Now the important part is that since we added vertex 5 and 4 after adding vertex 3, those will be processed before vertex 3. That is, the neighbors of more recently visited vertices (vertex 2) have a preference in being processed over the remaining neighbors of earlier ones (vertex 1). This is precisely the idea of recursion: we don’t finish the recursive call until all of the neighbors are processed, and that in turn requires the processing of all of the neighbors’ neighbors, and so on.

As a quick side note: it should be clear by now that the order in which we visit adjacent nodes is completely arbitrary in both versions of this algorithm. There is no inherent ordering on the edges of a vertex in a graph, and so adding them in reverse order is simply a way for us to mentally convince ourselves that the same preference rules apply with the stack as with recursion. That is, whatever order we visit them in the recursive version, we must push them onto the stack in the opposite order to get an identical algorithm. But in isolation neither algorithm requires a particular order. So henceforth, we will stop adding things in “reverse” order in the stack version.

Now the important part is that once we have converted the recursive algorithm into one based on a stack, we can remove the need for recursion entirely. Instead, we use a loop that terminates when the stack is empty:

def depthFirst(startingNode, soughtValue):
visitedNodes = set()
stack = [startingNode]

while len(stack) > 0:
node = stack.pop()
if node in visitedNodes:
continue

if node.value == soughtValue:
return True

if n not in visitedNodes:
stack.append(n)
return False


This author particularly hates the use of “continue” in while loops, but its use here is better than any alternative this author can think of. For those unfamiliar: whenever a Python program encounters the continue statement in a loop, it skips the remainder of the body of the loop and begins the next iteration. One can also combine the last three lines of code into one using the lists’s extend function in combination with a list comprehension. This should be an easy exercise for the reader.

Moreover, note that this version of the algorithm removes the issue with the return values. It is quite easy to tell when we’ve found the required node or determined it is not in the graph: if the loop terminates naturally (that is, without hitting a return statement), then the sought value doesn’t exist.

The reliance of this algorithm on a data structure is not an uncommon thing. In fact, the next algorithm we will see cannot be easily represented as a recursive phenomenon; the order of traversal is simply too different. Instead, it will be almost identical to the stack-form of the depth-first search, but substituting a queue for a stack.

As the name suggests, the breadth-first search operates in the “opposite” way from the depth-first search. Intuitively the breadth-first search prefers to visit the neighbors of earlier visited nodes before the neighbors of more recently visited ones. Let us reexamine the example we used in the depth-first search to see this change in action.

Starting again with vertex 1, we add 4, 3, and 2 (in that order) to our data structure, but now we prefer the first thing added to our data structure instead of the last. That is, in the next step we visit vertex 4 instead of vertex 2. Since vertex 4 is adjacent to nobody, the recursion ends and we continue with vertex 3.

Now vertex 3 is adjacent to 5, so we add 5 to the data structure. At this point the state of the algorithm can be displayed like this:

The “Data Structure” has the most recently added items on top. A red “x” denotes a vertex which has already been visited by the algorithm at this stage.

That is, and this is the important bit, we process vertex 2 before we process vertex 5. Notice the pattern here: after processing vertex 1, we processed all of the neighbors of vertex 1 before processing any vertices not immediately adjacent to vertex one. This is where the “breadth” part distinguishes this algorithm from the “depth” part. Metaphorically, a breadth-first search algorithm will look all around a vertex before continuing on into the depths of a graph, while the depth-first search will dive straight to the bottom of the ocean before looking at where it is. Perhaps one way to characterize these algorithms is to call breadth-first cautious, and depth-first hasty. Indeed, there are more formal ways to make these words even more fitting that we will discuss in the future.

The way that we’ll make these rules rigorous is in the data-structure version of the algorithm: instead of using a stack we’ll use a queue. Again in the abstract, a queue is a data structure for which we’d like the following properties:

• We can quickly add items to the queue.
• We can quickly remove the least recently added item.

The operations on a queue are usually called enqueue (for additions) and dequeue (for removals).

Again, Python’s lists have operations that functionally make them queues, but the analogue of the enqueue operation is not efficient (specifically, it costs $O(n)$ for a list of size $n$). So instead we will use Python’s special deque class (pronounced “deck”). Deques are nice because they allow fast addition and removal from both “ends” of the structure. That is, deques specify a “left” end and a “right” end, and there are constant-time operations to add and remove from both the left and right ends.

Hence the enqueue operation we will use for a deque is called “appendleft,” and the dequeue operation is (unfortunately) called “pop.”

>>> from collections import deque
>>> queue = deque()
>>> queue.appendleft(7)
>>> queue.appendleft(4)
>>> queue
[4,7]
>>> queue.pop()
7
>>> queue
[4]

Note that a deque can also operate as a stack (it also has an append function with functions as the push operation). So in the following code for the breadth-first search, the only modification required to make it a depth-first search is to change the word “appendleft” to “append” (and to update the variable names from “queue” to “stack”).

And so the code for the breadth-first search algorithm is essentially identical:

from collections import deque

visitedNodes = set()
queue = deque([startingNode])

while len(queue) > 0:
node = queue.pop()
if node in visitedNodes:
continue

if node.value == soughtValue:
return True

if n not in visitedNodes:
queue.appendleft(n)
return False


As in the depth-first search, one can combine the last three lines into one using the deque’s extendleft function.

We leave it to the reader to try some examples of running this algorithm (we repeated the example for the depth-first search in our code, but omit it for brevity).

Generalizing

After all of this exploration, it is clear that the depth-first search and the breadth-first search are truly the same algorithm. Indeed, the only difference is in the data structure, and this can be abstracted out of the entire procedure. Say that we have some data structure that has three operations: add, remove, and len (the Pythonic function for “query the size”). Then we can make a search algorithm that uses this structure without knowing how it works on the inside. Since words like stack, queue, and heap are already taken for specific data structures, we’ll call this arbitrary data structure a pile. The algorithm might look like the following in Python:

def search(startingNode, soughtValue, pile):
visitedNodes = set()
nodePile = pile()

while len(nodePile) > 0:
node = nodePile.remove()
if node in visitedNodes:
continue

if node.value == soughtValue:
return True

if n not in visitedNodes:
return False


Note that the argument “pile” passed to this function is the constructor for the data type, and one of the first things we do is call it to create a new instance of the data structure for use in the rest of the function.

And now, if we wanted, we could recreate the depth-first search and breadth-first search as special cases of this algorithm. Unfortunately this would require us to add new methods to a deque, which is protected from such devious modifications by the Python runtime system. Instead, we can create a wrapper class as follows:

from collections import deque

class MyStack(deque):
self.append(item)

def remove(self):
return self.pop()

depthFirst = lambda node, val: search(node, val, MyStack)


And this clearly replicates the depth-first search algorithm. We leave the replication of the breadth-first algorithm as a trivial exercise (one need only modify two lines of the above code!).

It is natural to wonder what other kinds of magical data structures we could plug into this generic search algorithm. As it turns out, in the next post in this series we will investigate algorithms which do just that. The data structure we use will be much more complicated (a priority queue), and it will make use of additional information we assume away for this post. In particular, they will make informed decisions about which vertex to visit next at each step of the algorithm. We will also investigate some applications of these two algorithms next time, and hopefully we will see a good example of how they apply to artificial intelligence used in games.

Until then!

Ramsey Number Lower Bound

Define the Ramsey number $R(k,m)$ to be the minimum number $n$ of vertices required of the complete graph $K_n$ so that for any two-coloring (red, blue) of the edges of $K_n$ one of two things will happen:

• There is a red $k$-clique; that is, a complete subgraph of $k$ vertices for which all edges are red.
• There is a blue $m$-clique.

It is known that these numbers are always finite, but it is very difficult to compute them exactly.

Problem: Prove that the Ramsey number $R(m,m) > n$ whenever $n,m$ satisfy

$\displaystyle \binom{n}{m}2^{1-\binom{m}{2}} < 1$

Solution: Color the edges of $K_n$ uniformly at random (that is, each edge has probability 1/2 of being colored red). For any complete subgraph $G = K_m$, define by event $A_G$ the event that $G$ is monochromatic (its edges are either all red or all blue).

Now the probability that $A_G$ occurs (where $G$ is fixed ahead of time) is easy to compute:

$\displaystyle \textup{Pr}(A_G) = \left (\frac{1}{2} \right)^{\binom{m}{2} - 1} = 2^{1-\binom{m}{2}}$

Since there are $\binom{n}{m}$ possible subgraphs with $m$ vertices, The probability that for some $G$ the event $A_G$ occurs is at most

$\displaystyle \binom{n}{m}2^{1-\binom{m}{2}}$

Whenever this quantity is strictly less than 1 (by assumption) then there is a positive probability that no event $A_G$ will occur. That is, there is a positive probability that a random coloring will have no monochromatic subgraph $K_m$. So there must exist such a coloring, and the Ramsey number $R(m,m)$ must be larger than $n$. $\square$

Discussion: This proof (originally due to Erdős) is a classic example of the so-called probabilistic method. In particular, we create a probability space from the object we wish to study, and then we make claims about the probability of joint events.

While it seems quite simple in nature, the probabilistic method has been successfully applied to a wide variety of problems in mathematics. For instance, there is an elegant proof in complexity theory that $\textup{BPP} \subset \textup{P/poly}$ which uses this same method. The probabilistic method has been applied to loads of problems in combinatorics, number theory, and graph theory, and it forms the foundation of the area of random graph theory (which is the setting in which one studies social networks). Perhaps unsurprisingly, there is also a proof of the fundamental theorem of algebra that uses the probabilistic method.