A Rook Game

Problem: Two players take turns moving a rook on an 8×8 chessboard. The rook is only allowed to move south or west (but not both in a single turn), and may move any number of squares in the chosen direction on a turn. The loser is the player who first cannot move the rook. What is the optimal play for any starting position?

rook-board

Solution: Take advantage of the symmetry of the board. If the rook is not on the diagonal, the optimal strategy is to move it to the diagonal. Then when the other player moves it off, your next move is to move it back to the diagonal. If your opponent starts their turn with the rook always on the diagonal, then you will never lose, and by the symmetry of the board you can always move the rook back to the diagonal. This provides an optimal algorithm for either player. In particular, if the rook starts on a square that is not on the diagonal, then player 1 can guarantee a win, and otherwise player 2 can.

Symmetry is one of the most powerful tools in all of mathematics, and this is a simple albeit illustrative example of its usage.

Cauchy-Schwarz Inequality (and Amplification)

Problem: Prove that for vectors v, w in an inner product space, the inequality

\displaystyle |\left \langle v, w \right \rangle | \leq \| v \| \| w \|

Solution: There is an elementary proof of the Cauchy-Schwarz inequality (see the Wikipedia article), and this proof is essentially the same. What makes this proof stand out is its insightful technique, which I first read about on Terry Tao’s blog. He calls it “textbook,” and maybe it is for an analyst, but it’s still very elegant.

We start by observing another inequality we know to be true, that \| v - w \|^2 = \left \langle v - w, v - w \right \rangle \geq 0, since norms are by definition nonnegative. By the properties of a complex inner product we can expand to get

\displaystyle \| v \|^2 - 2 \textup{Re}(\left \langle v,w \right \rangle) + \| w \|^2 \geq 0

or equivalently

\displaystyle \textup{Re}(\left \langle v,w \right \rangle) \leq \frac{1}{2} \| v \|^2 + \frac{1}{2} \| w \|^2

This inequality is close to the one we’re looking for, but ‘weaker’ because the inequality we seek squeezes inside the inequality we have. That is,

\displaystyle \textup{Re}(\left \langle v,w \right \rangle) \leq |\left \langle v, w \right \rangle | \leq \| v \| \| w \| \leq \frac{1}{2} \| v \|^2 + \frac{1}{2} \| w \|^2

The first inequality is trivial (a complex number is always greater than its real part), the second is the inequality we seek to prove, and the third is a consequence of the arithmetic-geometric mean inequality. And so we have an inequality we’d like to “tighten” to get the true theorem. We do this by tightening each side of the inequality separately, and we do each by exploiting symmetries in the expressions involved.

First, we observe that norms of vectors are preserved by (complex) rotations v \mapsto e^{i \theta}v, but the real part is not. Since this inequality is true no matter which vectors we choose, we can choose \theta to our advantage. That is,

\displaystyle \textup{Re}(\left \langle e^{i \theta}v, w \right \rangle) \leq \frac{1}{2} \| e^{i \theta}v \|^2 + \frac{1}{2} \| w \|^2

And by properties of inner products and norms (pulling out scalars) and the fact that |e^{i\theta}| = 1, we can simplify to

\displaystyle \textup{Re}(e^{i \theta}\left \langle v,w \right \rangle) \leq \frac{1}{2}\| v \|^2 + \frac{1}{2} \| w \|^2

where \theta is arbitrary. Since we want to maximize the left hand side as much as possible, we can choose \theta to be whatever is required to make the number real. Then the real part is just the absolute value of the number itself, and we have

\displaystyle \left |\langle v,w \right \rangle | \leq \frac{1}{2} \| v \|^2 + \frac{1}{2} \| w \|^2

Now we tighten the right hand side by exploiting a symmetry in inner products: the transformation (v,w) \mapsto (\lambda v, \frac{1}{\lambda} w) preserves the left hand side (since |\lambda / \bar{\lambda}| = 1) but not the right. And so by the same reasoning, we can transform the above inequality into

\displaystyle \left |\langle v,w \right \rangle | \leq \frac{\lambda^2}{2} \| v \|^2 + \frac{1}{2 \lambda^2} \| w \|^2

And by plugging in \lambda = \sqrt{\| w \| / \| v \|} (indeed, this minimizes the expression for nonzero v,w) we get exactly the Cauchy-Schwarz inequality, as desired.

\square

This technique is termed “amplification” by Tao, and in his blog post he gives quite a few more advanced examples in harmonic and functional analysis (which are far beyond the scope of this blog).   The asymmetrical symmetry we took advantage of is a sort of “arbitrage” (again Terry’s clever choice of words) to take a weak fact and boost it to a stronger fact. And while the details of this proof are quite trivial, the technique of actively looking for one-sided symmetries is difficult to forget.

Ramsey Number Lower Bound

Define the Ramsey number R(k,m) to be the minimum number n of vertices required of the complete graph K_n so that for any two-coloring (red, blue) of the edges of K_n one of two things will happen:

  • There is a red k-clique; that is, a complete subgraph of k vertices for which all edges are red.
  • There is a blue m-clique.

It is known that these numbers are always finite, but it is very difficult to compute them exactly.

Problem: Prove that the Ramsey number R(m,m) > n whenever n,m satisfy

\displaystyle \binom{n}{m}2^{1-\binom{m}{2}} < 1

Solution: Color the edges of K_n uniformly at random (that is, each edge has probability 1/2 of being colored red). For any complete subgraph G = K_m, define by event A_G the event that G is monochromatic (its edges are either all red or all blue).

Now the probability that A_G occurs (where G is fixed ahead of time) is easy to compute:

\displaystyle \textup{Pr}(A_G) = \left (\frac{1}{2} \right)^{\binom{m}{2} - 1} = 2^{1-\binom{m}{2}}

Since there are \binom{n}{m} possible subgraphs with m vertices, The probability that for some G the event A_G occurs is at most

\displaystyle \binom{n}{m}2^{1-\binom{m}{2}}

Whenever this quantity is strictly less than 1 (by assumption) then there is a positive probability that no event A_G will occur. That is, there is a positive probability that a random coloring will have no monochromatic subgraph K_m. So there must exist such a coloring, and the Ramsey number R(m,m) must be larger than n. \square

Discussion: This proof (originally due to Erdős) is a classic example of the so-called probabilistic method. In particular, we create a probability space from the object we wish to study, and then we make claims about the probability of joint events.

While it seems quite simple in nature, the probabilistic method has been successfully applied to a wide variety of problems in mathematics. For instance, there is an elegant proof in complexity theory that \textup{BPP} \subset \textup{P/poly} which uses this same method. The probabilistic method has been applied to loads of problems in combinatorics, number theory, and graph theory, and it forms the foundation of the area of random graph theory (which is the setting in which one studies social networks). Perhaps unsurprisingly, there is also a proof of the fundamental theorem of algebra that uses the probabilistic method.

There are Infinitely Many Primes (Erdős)

Problem: Prove there are infinitely many primes

Solution: Denote by \pi(n) the number of primes less than or equal to n. We will give a lower bound on \pi(n) which increases without bound as n \to \infty.

Note that every number n can be factored as the product of a square free number r (a number which no square divides) and a square s^2. In particular, to find s^2 recognize that 1 is a square dividing n, and there are finitely many squares dividing n. So there must be a largest one, and then r = n/s^2. We will give a bound on the number of such products rs^2 which are less than or equal to n.

If we want to count the number of square-free numbers less than n, we can observe that each square free number is a product of distinct primes, and so (as in our false proof that there are finitely many primes) each square-free number corresponds to a subset of primes. At worst, we can allow these primes to be as large as n (for example, if n itself is prime), so there are no more than 2^{\pi(n)} such subsets.

Similarly, there are at most \sqrt{n} square numbers less than n, since if x > \sqrt{n} then x^2 > n.

At worst the two numbers r, s^2 will be unrelated, so the total number of factorizations rs^2 is at most the product 2^{\pi(n)}\sqrt{n}. In other words,

2^{\pi(n)}\sqrt{n} \geq n

The rest is algebra: divide by \sqrt{n} and take logarithms to see that \pi(n) \geq \frac{1}{2} \log(n). Since \log(n) is unbounded as n grows, so must \pi(n). Hence, there are infinitely many primes.

Discussion: This is a classic analytical argument originally discovered by Paul Erdős. One of the classical ways to investigate the properties of prime numbers is to try to estimate \pi(n). In fact, much of the work of the famous number theorists of the past involved giving good approximations of \pi(n) in terms of logarithms. This usually involved finding good upper bounds and lower bounds and limits. Erdős’s proof is entirely in this spirit, although there are much closer and more accurate lower and upper bounds. In this proof we include a lot of values which are not actually valid factorizations (many larger choices of r, s^2 will have their product larger than n). But for the purposes of proving there are infinitely many primes, this bound is about as elegant as one can find.