# Optimally Stacking the Deck – Texas Hold ‘Em

Main Theorem: There exist optimal stackings for standard two-player Texas Hold ‘Em.

## A Puzzle is Solved (and then some!)

It’s been quite a while since we first formulated the idea of an optimal stacking. In the meantime, we’ve gotten distracted with graduate school, preliminary exams, and the host of other interesting projects going on here at Math ∩ Programming. And so, months later, after traversing the homotopic hills of topology and projective plains of algebra, we’ve finally found time to solve the problem. And we found quite a bit more than we expected! But first, let’s recap the ideas from our original post.

For a reminder of the rules of Texas Hold ‘Em, here’s a silly tutorial video from PokerStars.com. For the sake of this post, the betting rules will not matter.

## Optimal Stackings

In card games, especially gambling games, trust is rarely given freely. Instead, we incorporate rituals into standard play that assure fairness. One of the most common such rituals is “cutting the deck.” The player who shuffles the deck passes the deck to a second player, who cuts the deck in half, placing the bottom half above the top half. Assuming the two players are not colluding and there is no sleight of hand, this ritual absolves the dealer of any guilt. Even if the dealer had stacked the deck to deal himself favorable cards, the randomness of the cut will necessarily ruin his efforts. Moreover, the cutter is often chosen to be the player to the dealer’s right (the last one dealt, in most games) to make it all the more difficult for two players to collude.

An optimal stacking circumvents the tacit fairness assumed after a deck is cut. In particular, we search for an ordering of the cards in the deck which is so mischievous that no matter where it is cut, a specific player always wins.

This author doesn’t condone cheating in card games, nor does he plan to use the discoveries in this article for anything more than a parlor trick. Instead, the existence or nonexistence of an optimal stacking is an interesting combinatorial property of card games, and it appears to be a measure of complexity. As we saw last time, simple games like Blind Man’s Bluff (sometimes called Indian Poker) and Kuhn Poker do not have optimal stackings. Adding complexity to these games, we invented Kicsi Poker and discovered it has 11 total distinct optimal stackings (up to cyclic permutations of the deck). Moreover, depending on whether a card was “burned” during the deal, one player had significantly more optimal stackings than the other player (nine for player 1, and two for player 2).

But nobody plays Kicsi Poker, so the true goal is to determine whether Texas Hold ‘Em has optimal stackings. Because we found optimal stackings for Kicsi Poker after adding complexity, we reckoned that adding more complexity would not rule out the possibility of optimal stackings. Hence we supplied the following conjecture, which is now a theorem:

Theorem: There exist optimal stackings for standard two-player Texas Hold ‘Em.

In fact, we find much more than this, but we should formalize the notion a bit further and describe our method before constructing such a stacking. The remainder of this section will be abstract and seemingly unnecessary for the casual reader. We hold the contention that a proper mathematical analysis should be independent of the real world, and with the right definitions we can logically extend optimal stackings to other situations.

Definition: A cutting permutation $\sigma$ of a list $(1, 2, \dots, n)$ is a permutation in $S_n$ with the cycle form:

$(1\ 2\ \dots\ n)^k$

and for a fixed $0 \leq k < n$, we call $\sigma$ the cut at the $k$-th position.

If the reader is not familiar with the symmetric groups, this is simply a formalization of the intuitive idea of a cut. For example, to “cut” the list $(1, 2, 3, 4, 5)$ at the third position is simply the permutation $(1\ 2\ 3\ 4\ 5)^3$, which has cycle form $(1\ 4\ 2\ 5\ 3)$. Applying this results in the list $(4,5,1,2,3)$, as expected. Moreover, every cut can be achieved by iterating the process of putting the top card on the bottom of the deck. Hence, the set of all cutting permutations of a 52-card deck is the cyclic group $\mathbb{Z}/52\mathbb{Z}$.
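
As a quick sanity check, the cut operation is easy to play with in code. Here is a minimal Python sketch (the function names `cut` and `top_to_bottom` are ours, purely for illustration) verifying both the example above and the fact that every cut is an iterate of the single-card move:

```python
def cut(deck, k):
    """Cut at the k-th position: the top k cards move to the bottom."""
    return deck[k:] + deck[:k]

def top_to_bottom(deck):
    """Move a single card from the top of the deck to the bottom."""
    return deck[1:] + deck[:1]

deck = [1, 2, 3, 4, 5]
print(cut(deck, 3))  # [4, 5, 1, 2, 3]

# every cut is an iterate of the single-card move, so cuts form a cyclic group
d = list(deck)
for k in range(len(deck)):
    assert cut(deck, k) == d
    d = top_to_bottom(d)
```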

Since any list imposes an ordering on its contents, we can apply cutting permutations to any list.

Definition: A cut of a list $(c_1, \dots , c_n)$ is the list $(c_{\sigma(1)}, \dots, c_{\sigma(n)})$ for some cutting permutation $\sigma$. If the list is denoted $l$, we denote the cut $\sigma(l)$.

Specifically to card games, we speak of cutting a deck as the process of replacing a list of cards with its cut. We can now define an optimal stacking.

Definition: Fix a game $G$ played with a set of $k$ cards and $n$ players. We say that $G$ has an optimal stacking for player $i$ if there exists an ordering of the cards $D = (c_1, \dots, c_k)$ such that for every cutting permutation $\sigma$, player $i$ can force a win when $\sigma(D)$ is dealt.

Moreover, a game is said simply to have an optimal stacking if it has one for some player. We do not yet know of games which have optimal stackings for some players but not others, and we conjecture that no natural games do (that would make the game unfair for some players, although this author has always found asymmetric games interesting). It also helps to assume in general that all players are infinitely rational, a standard assumption in game theory.

Note that we do not want to assume the cheating player knows the order of the deck ahead of time. Instead, an optimal stacking should allow the player to force a win simply by logical play and the knowledge that the stacking is optimal. Specifically, for a game where players have individual hands, the cheating player should not need to know the contents of the other players’ hands in order to win. In Texas Hold ‘Em this becomes a non-issue: if a player knows he is being dealt an optimal stacking, his hand will always end up winning, so there is no reason for him to fold, even if his chances seem slim to none by the usual probabilistic analysis.

Before we get back to Texas Hold ‘Em, we take a moment to prove a nice fact.

Proposition: The (slightly simplified) game of Hearts does not have an optimal stacking.

Proof. First, we must clarify that a “win” in Hearts is to either collect 0 points in a round or shoot the moon, and such a score must be forced in order to have an optimal stacking. We allow no passing of cards. Moreover, we restrict our game of Hearts to a single round, so that the winner of the game is the one who takes the smallest number of points. Hence, if one cannot shoot the moon, one must try to minimize the number of points taken. In particular, there is no reason for one player to “take one for the team” by trying to stop another player from shooting the moon unless the first player can guarantee having the smallest number of points at the end.

The crux of the proof is that all cards in the deck are dealt before play begins. Suppose first that the goal is simply to collect 0 points, and suppose that some stacking $D$ wins for player 1 no matter where it is cut. A cut at position 1 shifts all of the hands right by one player, so player 4 receives player 1’s guaranteed hand and can force collecting 0 points. Since the stacking is optimal, player 1 retains his guarantee in the cut deck as well, and repeating the argument, a cut at position 2 gives such guarantees to players 3 and 4 while player 1 still keeps his own. But if players 1, 3, and 4 all collect 0 points, then player 2 must take every point, that is, shoot the moon, giving all other players 26 points and contradicting their guarantees.

If, on the other hand, player 1 can guarantee shooting the moon, the same argument shows that a cut at position 1 gives that guarantee to player 4. This is a contradiction since two players cannot shoot the moon simultaneously, nor can one shoot the moon while the other takes zero points. $\square$

Even extending the definition of a win to “collecting the minimum number of points” does not produce an optimal stacking. The same proof applies, along with the observation that 26 (the total number of points) is not divisible by 4 (the number of players), or more coarsely that the Queen of Spades alone represents 13 points and can only be taken by one player.

One aspect of this proof contradicts the claim that the existence of an optimal stacking is a measure of complexity. In particular, Hearts is considered a rather complex game (certainly more complex than Kicsi Poker), and yet it does not have an optimal stacking. Moreover, with a suitable definition of a “win,” we can likely extend this proof to any game which deals out the entire deck to all players before starting (and where all players can see all their cards at all times).

While we do not have a formal rebuttal, it may instead be the case that the existence of an optimal stacking measures the ability for a game to diverge. In other words, in a game like Texas Hold ‘Em, one can start with a poor hand and later improve it. On the other hand, it could simply be that Texas Hold ‘Em has a complex deal: the five community cards, along with the dead cards burned in between each round, give the game enough “space” to allow for an optimal stacking.

But enough big-picture speculation. It’s time for the real prize.

## Optimal Stackings in Texas Hold ‘Em

Before we get into the program, we should give an example of an optimal stacking, and hence a proof of the main theorem. Here it is:

Th5d7cAd3cQsJc4h6c8c9dTc6hQc8d9sJh5c7dAhAc9h2cKh5sJs8s7h2d
7s9c4dKd8hQd6d3sKs5h2hAs4s2sTs6sQhJd3h4cTd3dKc

p1     p2      community       p1 hand     p2 hand    winner
Th7c | 5dAd | Qs Jc 4h 8c Tc | One Pair  | High Card | 1
5dAd | 7c3c | Jc 4h 6c 9d 6h | One Pair  | One Pair  | 1
7c3c | AdQs | 4h 6c 8c Tc Qc | Flush     | One Pair  | 1
AdQs | 3cJc | 6c 8c 9d 6h 8d | Two Pair  | Two Pair  | 1
3cJc | Qs4h | 8c 9d Tc Qc 9s | Flush     | Two Pair  | 1
Qs4h | Jc6c | 9d Tc 6h 8d Jh | Straight  | Two Pair  | 1
Jc6c | 4h8c | Tc 6h Qc 9s 5c | Flush     | High Card | 1
4h8c | 6c9d | 6h Qc 8d Jh 7d | One Pair  | One Pair  | 1
6c9d | 8cTc | Qc 8d 9s 5c Ah | One Pair  | One Pair  | 1
8cTc | 9d6h | 8d 9s Jh 7d Ac | Straight  | One Pair  | 1
9d6h | TcQc | 9s Jh 5c Ah 9h | Trips     | One Pair  | 1
TcQc | 6h8d | Jh 5c 7d Ac 2c | Flush     | High Card | 1
6h8d | Qc9s | 5c 7d Ah 9h Kh | Straight  | One Pair  | 1
Qc9s | 8dJh | 7d Ah Ac 2c 5s | One Pair  | One Pair  | 1
8dJh | 9s5c | Ah Ac 9h Kh Js | Two Pair  | Two Pair  | 1
9s5c | Jh7d | Ac 9h 2c 5s 8s | Two Pair  | High Card | 1
Jh7d | 5cAh | 9h 2c Kh Js 7h | Two Pair  | High Card | 1
5cAh | 7dAc | 2c Kh 5s 8s 2d | Two Pair  | One Pair  | 1
7dAc | Ah9h | Kh 5s Js 7h 7s | Trips     | One Pair  | 1
Ah9h | Ac2c | 5s Js 8s 2d 9c | One Pair  | One Pair  | 1
Ac2c | 9hKh | Js 8s 7h 7s 4d | One Pair  | One Pair  | 1
9hKh | 2c5s | 8s 7h 2d 9c Kd | Two Pair  | One Pair  | 1
2c5s | KhJs | 7h 2d 7s 4d 8h | Two Pair  | One Pair  | 1
KhJs | 5s8s | 2d 7s 9c Kd Qd | One Pair  | High Card | 1
5s8s | Js7h | 7s 9c 4d 8h 6d | Straight  | One Pair  | 1
Js7h | 8s2d | 9c 4d Kd Qd 3s | High Card | High Card | 1
8s2d | 7h7s | 4d Kd 8h 6d Ks | Two Pair  | Two Pair  | 1
7h7s | 2d9c | Kd 8h Qd 3s 5h | One Pair  | High Card | 1
2d9c | 7s4d | 8h Qd 6d Ks 2h | One Pair  | High Card | 1
7s4d | 9cKd | Qd 6d 3s 5h As | Straight  | High Card | 1
9cKd | 4d8h | 6d 3s Ks 2h 4s | One Pair  | One Pair  | 1
4d8h | KdQd | 3s Ks 5h As 2s | Straight  | One Pair  | 1
KdQd | 8h6d | Ks 5h 2h 4s Ts | One Pair  | High Card | 1
8h6d | Qd3s | 5h 2h As 2s 6s | Two Pair  | One Pair  | 1
Qd3s | 6dKs | 2h As 4s Ts Qh | One Pair  | High Card | 1
6dKs | 3s5h | As 4s 2s 6s Jd | Flush     | Flush     | 1
3s5h | Ks2h | 4s 2s Ts Qh 3h | One Pair  | One Pair  | 1
Ks2h | 5hAs | 2s Ts 6s Jd 4c | One Pair  | High Card | 1
5hAs | 2h4s | Ts 6s Qh 3h Td | One Pair  | One Pair  | 1
2h4s | As2s | 6s Qh Jd 4c 3d | One Pair  | High Card | 1
As2s | 4sTs | Qh Jd 3h Td Kc | Straight  | One Pair  | 1
4sTs | 2s6s | Jd 3h 4c 3d Th | Two Pair  | One Pair  | 1
2s6s | TsQh | 3h 4c Td Kc 5d | Straight  | One Pair  | 1
TsQh | 6sJd | 4c Td 3d Th 7c | Trips     | One Pair  | 1
6sJd | Qh3h | Td 3d Kc 5d Ad | Flush     | One Pair  | 1
Qh3h | Jd4c | 3d Kc Th 7c 3c | Trips     | One Pair  | 1
Jd4c | 3hTd | Kc Th 5d Ad Qs | Straight  | One Pair  | 1
3hTd | 4c3d | Th 5d 7c 3c Jc | Two Pair  | One Pair  | 1
4c3d | TdKc | 5d 7c Ad Qs 4h | One Pair  | High Card | 1
TdKc | 3dTh | 7c Ad 3c Jc 6c | Flush     | One Pair  | 1
3dTh | Kc5d | Ad 3c Qs 4h 8c | One Pair  | High Card | 1
Kc5d | Th7c | 3c Qs Jc 6c 9d | High Card | High Card | 1

The first two columns show the pocket cards dealt to player 1 and player 2, respectively, and the fourth and fifth columns give their final hands, again respectively.

Immediately, the reader will recognize that there are 52! possible stackings of the deck, a number with 68 digits. Searching them all, even at supercomputer speed, would easily take longer than the life of the universe (at a trillion stackings per second, the estimate is on the order of $10^{48}$ years). So we cannot possibly look through them all.
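
These figures are easy to check with a quick back-of-envelope computation, sketched here in Python:

```python
import math

stackings = math.factorial(52)
print(len(str(stackings)))  # 68 digits

# at a trillion stackings checked per second
years = stackings / 1e12 / (3600 * 24 * 365)
print(f"{years:.1e}")  # about 2.6e+48 years
```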

The idea behind the local search is simple, and we’ve seen it before on this blog when we looked at decrypting substitution ciphers. We use steepest ascent in the discrete space of all orderings of the deck, where the function to optimize is the number of cuts that win for the first player dealt. Neighbors consist of the set of stackings obtainable by swapping pairs of cards, and we use random restarts to avoid local maxima. Moreover, we parallelized the algorithm to run on a desktop supercomputer, access to which is graciously provided by the UIC mathematical computer science department. The result is a lightning-fast search.

In other words, our algorithm starts with a random shuffle of the deck. We compute the value of a deck by dealing out a game for each of the 52 cuts of the deck, and seeing how many times player 1 wins. Call that value $v$. Then we compare $v$ with the value of all neighboring decks, where a neighbor is obtained by swapping any two cards in the deck. If any such stacking has a higher value, we take the stacking with the highest value, and repeat the process. Once we get to the point where no such neighboring deck has a higher value, we stop and output the final stacking. Then we start the whole process over again with a new random deck, and stop when we find an optimal stacking (or continue searching for more).
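
In skeleton form, a single steepest-ascent run is only a few lines. Here is a hypothetical Python rendition on a toy problem (none of these names appear in our posted code; the deck-specific neighbor and value functions would replace the toy ones):

```python
import random

def steepest_ascent(start, neighbors, value, max_steps=1000):
    """Move to the best-valued neighbor until no neighbor improves the value."""
    posn, v = start, value(start)
    for _ in range(max_steps):
        best = max(neighbors(posn), key=value)
        if value(best) <= v:
            break  # local maximum reached
        posn, v = best, value(best)
    return posn

# toy problem: maximize -(x - 3)^2 over the integers, neighbors are x +/- 1
f = lambda x: -(x - 3) ** 2
print(steepest_ascent(random.randint(-100, 100), lambda x: [x - 1, x + 1], f))  # 3
```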

Note our choice of 2 player Texas Hold ‘Em is arbitrary, as is our preference for player 1 to win. We have found a number of optimal stackings for player 2 as well, and it’s not hard to imagine there are optimal stackings for Hold ‘Em with any number of players. We leave that for future research.

We chose to implement this algorithm in C++. Now this is one of the few times this author would willingly program in C/C++, because frankly the language is a jungle. A professor of mine once said “You can write programs fast, or you can write fast programs.” What he meant by that was you can write programs in a language that lets you express ideas concisely and clearly, or you can spend days laboriously working in C/C++ and end up with a program that will leave all other programs in the dust. Because at first we expected optimal stackings to be sparse, we imagined speed would be quite important, and that we’d have to analyze billions of different decks before we could find an optimal stacking. In hindsight this was probably unnecessary, but we do admittedly wallow in pride when quoting statistics about how fast the program runs.

All of the code we used is available for free on this blog’s Google Code page. We also include a text file containing the list of about 100,000 optimal stackings we found, and similar tables as above detailing the hands dealt for each cut.

In any case, we separated the code into three parts. First we implemented a general steepest-ascent framework. It’s general enough that the open-minded reader could easily use it to do steepest ascent for any problem; one need only implement the functions to generate neighbors and compute the value of a position. To give a pedantic (yet unrealistic) example, our posted code includes an instance of optimizing a polynomial function in one variable.

Second, we implemented a poker-hand evaluator and neighbor generator. In this part, we borrowed the lookup-table/perfect hashing methods first discovered by Kevin Suffecool and augmented by Paul Senzee. We do note that we take the “slow” method of evaluating a seven card hand, checking all 21 five-card poker hands and choosing the best. Otherwise, enlarging the lookup-table to cover seven-card hands would incur more than a 2,000 times increase in memory use, with only a speedup factor of 10 to 15 times. Considering that we’re running this on someone else’s personal supercomputer, we found it prudent to accept a smaller memory footprint. And even so, on the supercomputer we can evaluate about 7 million seven-card hands per second, down from about 87 million five-card hands per second. We gain the potential to go up to 87 million seven-card hands per second with the added memory footprint, but as it turns out we can find plenty of optimal stackings with the slower version.
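
To make the “slow” method concrete, here is a Python sketch of scoring seven cards by checking all $\binom{7}{5} = 21$ five-card subsets. The helper name `eval7` and the toy scorer are ours; `eval5` stands in for a real five-card evaluator such as Suffecool's, which we do not reproduce here:

```python
from itertools import combinations

def eval7(cards, eval5):
    """Score a 7-card hand: the best (lowest) score over all 21 5-card subsets."""
    subsets = list(combinations(cards, 5))
    assert len(subsets) == 21  # C(7, 5) = 21
    return min(eval5(list(hand)) for hand in subsets)

# with a toy scorer preferring large sums, the best subset of [3..9] scores -35
print(eval7([3, 4, 5, 6, 7, 8, 9], lambda hand: -sum(hand)))  # -35
```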

Third, we parallelized the program using OpenMP to run on 12 threads, speeding it up by a factor of 12 for sufficiently large problem sizes. For smaller problem sizes, we see a speedup factor of roughly 10x, a reasonable 80% parallel efficiency.

In the next section we detail the important aspects of the code.

## Enter the Jungle

Unfortunately, a primer on C++ is far beyond the scope of this blog. The best introductory text we know of is Prata’s C++ Primer Plus, but short of reading this behemoth and becoming a guru, the reader may safely skip ahead to the next section. An equally strong understanding of the algorithm itself can also be gained from our Python post on cryptanalysis with ngrams, though the program is admittedly much slower.

The general steepest ascent algorithm features a virtual C++ class called Position. A position has three functions: value(), neighbors(), and show(), the last of which returns a string containing a textual representation of a Position. Here are snippets of the code, taken from my hillclimb.h and hillclimb.cpp files:

class Position {
public:
    Position();
    virtual ~Position();
    virtual double value() = 0;
    virtual std::vector<Position *> *neighbors() = 0;
    virtual std::string show() = 0;
};

We also define the “hillclimb” function which operates on positions:

Position* hillclimb(Position* posn, const int numSteps);

One important note is that we require the caller of the neighbors function to assume ownership of all the pointers in the returned vector, along with the returned vector itself. This author has a minor suspicion that this isn’t the proper way to do things in C++, but nobody has remarked otherwise.

Then the hillclimb function need know nothing about poker to work:

Position* hillclimb(Position* posn, const int numSteps) {
    double value = posn->value(), nextValue;
    int i;
    bool foundBigger;
    Position* original = posn; // the caller owns this one; never delete it

    vector<Position *> *nbrs;
    vector<Position *>::iterator itr;

    for (i = 0; i < numSteps; i++) {
        nbrs = posn->neighbors(); // allocates neighboring Positions
        Position* previous = posn;

        foundBigger = false;
        for (itr = nbrs->begin(); itr != nbrs->end(); itr++) {
            nextValue = (*itr)->value();

            if (nextValue > value) {
                foundBigger = true;
                value = nextValue;
                posn = *itr;
            }
        }

        // free the computed neighbors, except the one we moved to
        for (itr = nbrs->begin(); itr != nbrs->end(); itr++)
            if (*itr != posn)
                delete *itr;

        delete nbrs;

        // free the position we just moved away from, to avoid leaking it
        if (foundBigger && previous != original)
            delete previous;

        if (!foundBigger)
            break;
    }

    return posn;
}

Note that in the parallelization step we had to avoid the use of iterators, but that only changes the code superficially by adding additional indices.

For a short example on how to use this framework, the reader should investigate our quarticopt.h and quarticopt.cpp to maximize a quartic polynomial, included on this blog’s Google Code page with the rest of the code for this post.

The poker part required a bit more care, but it rests on a simple idea. First, we note that even though there are a lot of poker hands, many are equivalent up to scoring. For example, there are 1,020 different hands of the form AKQJ9 which score “high card,” one for each choice of suits that does not make the hand a flush. But none of these hands is any better than another. In other words, there are only 7,462 distinct five-card poker hand values, far fewer than the roughly 2.6 million possible five-card hands.
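
Both counts are quick to verify by brute force. A Python check (independent of any scoring code):

```python
from itertools import product
from math import comb

# suit assignments of A-K-Q-J-9 that do not form a flush
non_flush = [s for s in product("cdhs", repeat=5) if len(set(s)) > 1]
print(len(non_flush))  # 1020, i.e., 4^5 minus the 4 flushes

# total number of five-card hands from a 52-card deck
print(comb(52, 5))  # 2598960
```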

The poker scoring function hence takes as input a five-card hand and computes a number between 1 and 7,462 representing the absolute score of the hand (smaller is better). It does this through a combination of accessing a pre-generated lookup table for flushes and high-card hands, and a perfect hashing technique for the remaining hands. The interested reader can see Suffecool’s website for more details. The technique is quite novel, and results in a lightning-fast scoring algorithm.

Finally, we generate neighbors and score hands in the obvious (albeit tedious) way:

vector<Position *> *Deck::neighbors() {
    vector<Position *> *listOfNeighbors = new vector<Position *>;
    int i, j, temp;

    for (i = 0; i < DECK_SIZE - 1; i++) {
        for (j = i + 1; j < DECK_SIZE; j++) {
            // copy to a new Deck
            Deck *neighboringDeck = new Deck(*this);

            // swap a pair of cards
            temp = neighboringDeck->cards[i];
            neighboringDeck->cards[i] = neighboringDeck->cards[j];
            neighboringDeck->cards[j] = temp;

            listOfNeighbors->push_back(neighboringDeck);
        }
    }

    return listOfNeighbors;
}

double Deck::value() {
    int cutPosn, count = 0, score1, score2;
    int p1hand[7], p2hand[7];

    for (cutPosn = 0; cutPosn < DECK_SIZE; cutPosn++) {
        // pocket cards alternate between the players (offsets 0-3); then a
        // burn (+4), the flop (+5, +6, +7), a burn (+8), the turn (+9), a
        // burn (+10), and the river (+11), shared by both players
        p1hand[0] = cards[cutPosn];
        p1hand[1] = cards[(cutPosn + 2) % DECK_SIZE];
        p1hand[2] = cards[(cutPosn + 5) % DECK_SIZE];
        p1hand[3] = cards[(cutPosn + 6) % DECK_SIZE];
        p1hand[4] = cards[(cutPosn + 7) % DECK_SIZE];
        p1hand[5] = cards[(cutPosn + 9) % DECK_SIZE];
        p1hand[6] = cards[(cutPosn + 11) % DECK_SIZE];

        p2hand[0] = cards[(cutPosn + 1) % DECK_SIZE];
        p2hand[1] = cards[(cutPosn + 3) % DECK_SIZE];
        p2hand[2] = cards[(cutPosn + 5) % DECK_SIZE];
        p2hand[3] = cards[(cutPosn + 6) % DECK_SIZE];
        p2hand[4] = cards[(cutPosn + 7) % DECK_SIZE];
        p2hand[5] = cards[(cutPosn + 9) % DECK_SIZE];
        p2hand[6] = cards[(cutPosn + 11) % DECK_SIZE];

        // smaller scores are better
        score1 = eval7Hand(p1hand);
        score2 = eval7Hand(p2hand);

        if (score1 < score2)
            count++;
    }

    return count;
}


Here “eval7Hand” evaluates a seven-card poker hand by finding its best five-card sub-hand as described above (recall that smaller scores are better). We note that the next time we work on this problem, we will likely extend the code to work for any number of hands and any number of players, replacing the ugly assignment statements above with a few loops. Moreover, we leave out an additional optimization from this code snippet for brevity.
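
For the curious, such loops might look like the following Python sketch. The `deal_offsets` helper is hypothetical (it appears nowhere in our code), but for two players the offsets it produces match the assignments above:

```python
def deal_offsets(num_players):
    """Offsets into the cut deck for each player's 7 cards (2 pocket + 5 community)."""
    # pocket cards: players alternate, two rounds of dealing
    pockets = [[p + r * num_players for r in range(2)] for p in range(num_players)]
    burn = 2 * num_players                      # one burned card before the flop
    community = [burn + 1, burn + 2, burn + 3,  # flop
                 burn + 5,                      # turn (after another burn)
                 burn + 7]                      # river (after another burn)
    return [sorted(pocket + community) for pocket in pockets]

print(deal_offsets(2))
# [[0, 2, 5, 6, 7, 9, 11], [1, 3, 5, 6, 7, 9, 11]]
```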

Running the main application gives an output similar to the large table above for each discovered optimal stacking.

## Parallelization

A thorough introduction to parallel computation is again (sadly) beyond the scope of this post. Luckily, we chose the arguably simplest possible library for parallelization: OpenMP. With OpenMP, the serial code looks almost identical to the parallel code, with the exception of a few sagely placed pragmas (code annotations). For instance, our main function parallelizes at the top-most level, so that each thread performs steepest ascent trials independently of the others. From this perspective, the steepest ascent algorithm is pleasingly parallel. The code for that is:

int main(int argc, char* argv[]) {
    [...]

    vector<Deck *> optimalStackings;
    #pragma omp parallel shared(optimalStackings)
    {
        Deck *bestSoFar = new Deck(), *result;
        int i;

        #pragma omp for schedule(static) nowait
        for (i = 0; i < numTrials; i++) {
            // heap-allocate the starting deck, since hillclimb may return it
            // unchanged and we later delete whatever it returns
            Deck *startingDeck = new Deck();
            result = (Deck *)hillclimb(startingDeck, trialLength);
            if (result != startingDeck)
                delete startingDeck;

            if (result->value() == 52) {
                #pragma omp critical
                {
                    cout << result->show() << "\n";
                    optimalStackings.push_back(result);
                }

                if (bestSoFar->value() < 52)
                    delete bestSoFar;

                bestSoFar = result;
            } else if (bestSoFar->value() < result->value()) {
                delete bestSoFar;
                bestSoFar = result;
            } else {
                delete result;
            }
        }
    }

    [...]

    return 0;
}


For an explanation of the pragmas and their syntax, see this reference/tutorial from Lawrence Livermore National Labs. In the posted code, we tried two different levels of parallelization. First, we parallelized at the top-most level, as above. Second, we parallelized at the finest level, so that all threads cooperate to evaluate the score of a single deck and to compute its neighbors. The former method turned out to be somewhat more efficient for large problem sizes; specifically, over 1,000 iterations of steepest ascent, it was 17% faster. The speedup comes from the significantly lower system overhead of scheduling the threads, and from the fact that evaluating a single deck is very fast, so that at the end of every loop the thread synchronization time begins to outweigh the deck evaluation time. However, for smaller problem sizes the efficiency seems to drop off at a slower rate. We include a summary table of all of these details in a file called “interesting-properties.txt” in the code archive.

We also include the full source code for the fine-granularity parallelization, with the mindset that it may perform better in other application domains, and it serves as an example implementation.

## Results

Running the algorithm overnight on the supercomputer gave us a list of approximately 100,000 distinct optimal stackings for two-player Hold ‘Em. In particular, we discovered the following interesting properties of stackings:

• Optimal stackings are plentiful. In particular, about one in six randomly chosen decks can be improved to an optimal stacking.
• The number of steps in the hill-climbing process is small. Specifically, in all of our tests we never witnessed a deck which required more than 15 steps to reach a local maximum, though we have no proof that this upper bound is sharp. Moreover, some decks were optimized in as few as 5 steps.
• There are optimal stackings both for player 1 and player 2.

We include the list of about 100,000 optimal stackings for the reader’s viewing pleasure, along with the game tables for their corresponding cuts, in a text file on this blog’s Google Code page. It’s a relatively large file (compressed ~70MB, uncompressed ~450MB). We note the possibility that some of these decks are redundant, but the probability of that happening is decidedly small. For a smaller set of examples, one can download the source code and look for a list of a thousand optimal stackings in the “decks” directory. We encourage the reader to try one out at a friendly neighborhood poker game. And finally, here is a list of ten optimal stackings, so that casual readers can avoid downloading and navigating the directory.

[AcQhJd8h9c9h6d3cTc3dQdKhJs6h3hThQc7d3sJc4h2h6c8d7c5c7s8cAd4dTd9sKs5h8s2c4c2d2s5sAhKd9dKcQs4s5dAs7hJhTs6s]
[Ac7cTd6d3c4c9hQc7h9d8hJc8c5sTc6hAdQs9s7s2cKc2s2d4s8d6sAh5d8sJs3sQhKd3hQdJh5cKh4h3d5hTsJd7dKs9c2h4dTh6cAs]

## Future Work

We are very interested in the existence or non-existence of optimal stackings for other games. We encourage the interested reader (say, as a programming exercise) to implement a steepest-ascent algorithm for optimal stackings in a different game, and post the results in a comment. Some ideas for other games include poker variants like Omaha and Lowball, along with completely unrelated games like Blackjack or Rummy. Might we even go so far as to investigate non-standard card games like Magic: the Gathering (a favorite during my youth)? A critical step in the analysis of most non-poker games would be to formalize what it means to force a win, or relax the definition to make it sensible.

But there is still much we can ask about Hold ‘Em. For instance, how hard is it to find optimal stackings for any given player when there are $n$ players? More specifically, how fast does the complexity grow when you add more players? Are there significantly more semi-optimal stackings, in which we allow ties for player 1? What sorts of interesting patterns occur in optimal or semi-optimal stackings? Many of these sorts of questions could be answered by minor modifications to the source code provided. We leave that as an exercise for the reader, until we find enough time ourselves to go through and find optimal stackings which we believe to exist for any reasonable number of players and any position of the winning player.

Until next time!

# In Place Uniform Shuffle

Problem: Write a program that shuffles a list. Do so without using more than a constant amount of extra space and linear time in the size of the list.

Solution: (in Python)

import random
random.seed()

def shuffle(myList):
    n = len(myList)
    for i in range(n):
        j = random.randint(i, n - 1)  # randint is inclusive on both ends
        myList[i], myList[j] = myList[j], myList[i]

Discussion: Using a computer to shuffle a deck of cards is nontrivial at first glance, for the following reasons. First, computers are deterministic and precise. A computer can’t “haphazardly” spread the cards on a table and mix them around for a while, nor can it use a “riffle” technique until it’s satisfied the cards are random enough. These are human techniques, with inherent sloppiness caused by our limited-precision dexterity. Another difficulty is that we have to give a mathematical guarantee that the resulting distribution of shuffles is uniform. It certainly isn’t when humans shuffle cards, so this adds a new level of difficulty.

We note that there are many gambling companies whose integrity is based on the validity of their shuffling algorithms (and hence, the fairness of their games). But despite the simplicity of our algorithm above, there are still some companies who got it wrong. So we need to take a close look at the right way to solve this problem.

Before we get there, we note that this problem generalizes to constructing random permutations. While it’s easier to understand a problem based on a deck of cards, generating random permutations is really the useful thing we’re getting out of this.

This page gives an example of how not to shuffle cards. We will derive the correct way. If we have a list of $n$ elements and a good shuffling algorithm, then each element has a uniform probability of $1/n$ of ending up in the first position in the list. Once we’ve chosen such an element, we can recursively operate on the remaining $n-1$ elements, randomly choosing which element goes in the second spot, where each has a chance of $1/(n-1)$ to get there. Note that this means in the first stage we pick a random integer uniformly between 1 and $n$, and in the second stage we pick an integer between 2 and $n$. Inductively, if we have already processed the first $i$ cards, then we need to pick a random integer uniformly between $i+1$ and $n$ to decide which of the remaining cards goes in the $(i+1)$-th spot. Note that subtracting 1 from all of these randomly chosen numbers gives us the right zero-based indices.

We make one further note: the order of the unprocessed cards is totally irrelevant. That means that if, say, during the first stage we want the 5th element to go in the first spot, we can simply swap the fifth and first elements in the list. Since we’re picking uniformly distributed numbers, we still have an equal chance to pick any one of the remaining cards in later stages. And of course, sometimes we will be swapping an element with itself, which is the same as not swapping at all.
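
For a tiny deck we can even verify uniformity exhaustively: enumerate every possible sequence of “random” picks for a three-element list and check that each of the $3! = 6$ permutations arises exactly once. A Python sketch (the name `swap_shuffle` is ours):

```python
from itertools import product
from collections import Counter

def swap_shuffle(picks):
    """Run the swapping procedure on [0, 1, 2] with a fixed sequence of picks."""
    lst = [0, 1, 2]
    for i, j in enumerate(picks):
        lst[i], lst[j] = lst[j], lst[i]
    return tuple(lst)

# at stage i the pick j ranges over i..n-1, so there are 3 * 2 * 1 sequences
counts = Counter(swap_shuffle(picks)
                 for picks in product(range(3), range(1, 3), range(2, 3)))
print(sorted(counts.items()))  # each of the 6 permutations appears exactly once
```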

Taking all of this into consideration, we have the following pseudocode:

on input list L:
    for i in range(0, n-1) inclusive:
        j = random(i, n-1) inclusive
        swap L[i], L[j]

As we showed above, this pseudocode translates quite nicely to Python, and it obviously satisfies the requirements: it runs in linear time, since we visit each position in the list once and each swap takes constant time, and the swaps use only a constant amount of extra space. On the other hand, implementations in functional languages are a bit more difficult, and if the language is purely functional, it can’t be done “in place.” I’d usually be the last one to admit functional languages aren’t the best tool for every job, but there you have it.

# Busy Beavers, and the Quest for Big Numbers

## Finding Bigger Numbers, a Measure of Human Intellectual Progress

Before we get into the nitty-gritty mathematics, I’d like to explore the philosophical and historical insights that one can draw from the study of large numbers.

That may seem odd at first. What does one even mean by “studying” a large number? Of course, I don’t mean we stare at the number 1,000,000,000,000, which is quite large, and wonder how mankind can benefit from its elusive properties. What I really mean is that scientific and mathematical discoveries are very closely tied to our collective ability to describe large numbers.

That may seem even odder, but let’s enjoy a short historical digression to illustrate the point. Note that much of this historical information is borrowed from an essay by Scott Aaronson. His treatment of Busy Beavers is wonderfully entertaining, but where he assumes the reader has no background knowledge of computing theory, we will assume the reader is familiar with the relevant computational topics in our primers.

In the age of the Greeks, it was generally believed that some quantities were beyond counting. Things like the number of grains of sand in the desert were forfeit to the realm of “infinity.” But in the third century B.C.E., Archimedes recognized that they weren’t beyond counting. And in his treatise The Sand Reckoner, he developed a rudimentary notion of exponentials, and was able to provide an upper bound on such mysterious numbers:

There are some [...] who think that the number of the sand is infinite in multitude [...] again there are some who, without regarding it as infinite, yet think that no number has been named which is great enough to exceed its multitude [...] But I will try to show you [numbers that] exceed not only the number of the mass of sand equal in magnitude to the earth [...] but also that of a mass equal in magnitude to the universe.

He proceeded to give an upper bound on the number of grains of sand needed to fill the universe: $10^{63}$. Now this was a quite large number, and certainly beyond most people’s ability to visualize in quantity. But by the time Arabic numerals and algebra became common worldly knowledge in the Middle Ages, exponentiation became a paradigm trivially expressed, allowing people to write such large numbers as $10^{10^{10}}$ with the same ease we do today. These sorts of counting exercises became the topic of mathematical folklore (think of the tale of rice filling a chessboard), and they run amok in contemporary discussions of finance, statistics, physics, economics, and computer science.

Let’s take a moment to investigate exactly how ridiculously huge exponentials are. And in spite of the awe we show for such gargantuan quantities, we foreshadow that exponentials are far from the biggest numbers we can grapple with.

Let’s take a large digit, say, 9, and repeat it a thousand times to make a number. By all rights, this is a big number! It has a whopping one thousand digits, and each decimal place is as large as it can be. Now, consider the meager $9^9$. Given a few annoying minutes, we could actually perform the eight multiplications and get this number exactly; we’ll save you the trouble, it’s 387,420,489, a number with nine digits. Now, let’s inspect $9^{9^9}$. This number, requiring us to write only one more digit on the page, is $9^{387,420,489}$, a number with a staggering 369,693,100 decimal digits! Raising 9 to this power once more, to get $9^{9^{9^9}}$, is already too large for Wolfram Alpha to give much information about.
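Digit counts like these are easy to get wrong, so here is a quick sanity check in Python (integers are arbitrary precision, so $9^9$ is exact; we count the digits of $9^{9^9}$ with a logarithm rather than building the number):

```python
from math import log10

a = 9 ** 9
# Eight multiplications of 9 give a nine-digit number.
assert a == 387_420_489 and len(str(a)) == 9

# A positive integer m has floor(log10(m)) + 1 decimal digits,
# so 9**a has floor(a * log10(9)) + 1 of them.
digits = int(a * log10(9)) + 1
print(digits)  # 369693100
```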

Now, come the 20th century, we see two big jumps in the expression of large numbers. The first is the Ackermann Function. For the purposes of this post, we can think of it as simply a sequence of numbers, which is constructed as follows. The first number in the sequence, $A(1)$, is simply $1+1$. The next is $A(2) = 2*2$, then $A(3) = 3^3$, and so on. Of course, defining “and so on” is tough, because we don’t have commonly known names for such operations. The fourth number in this sequence, $A(4)$, is 4 raised to the power of itself four times. I.e., $A(4) = 4^{4^{4^4}}$. We call this operation a tetration of the number 4 to the height of 4, and denote it $4 \uparrow \uparrow 4$. This isn’t so hard to visualize, but while it’s a huge number in itself (it has roughly $10^{154}$ decimal digits), the best is yet to come. The fifth element of this sequence, $A(5)$, is the nested tetration of 5 with itself 5 times. In other words, it is

$A(5) = 5 \uparrow \uparrow (5 \uparrow \uparrow (5 \uparrow \uparrow (5 \uparrow \uparrow 5)))$

For convenience, we give this the notation $5 \uparrow \uparrow \uparrow 5$, and call it pentation. Of course, $A(6)$ would then be 6 hexated to the height of 6, and so on forever. The brave reader will take this to its extreme, and imagine how huge $A(A(5))$ is… Mind blowing.
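The sequence above can be written down directly with a recursive “hyperoperation” function, where level 1 is addition, level 2 multiplication, level 3 exponentiation, level 4 tetration, and so on. The following is a sketch (the function names are our own, not standard library ones); only the first three values of $A$ are feasible to evaluate, since $A(4)$ already bottoms out in astronomically many repeated additions:

```python
def hyper(a, n, b):
    """The n-th hyperoperation: hyper(a, 1, b) = a + b,
    hyper(a, 2, b) = a * b, hyper(a, 3, b) = a ** b,
    hyper(a, 4, b) = tetration, hyper(a, 5, b) = pentation, ..."""
    if n == 1:
        return a + b
    if b == 1:
        return a
    # Each level is the previous level iterated b times.
    return hyper(a, n - 1, hyper(a, n, b - 1))

def A(k):
    """The sequence from the post: A(1) = 1+1, A(2) = 2*2, A(3) = 3^3, ..."""
    return hyper(k, k, k)

print(A(1), A(2), A(3))  # 2 4 27
```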

Of course, one might ask: where the hell did this beast come from? What are Ackermann numbers good for anyway? The Ackermann Function and its associated sequence, it turns out, serve as important counterexamples in the theory of computable functions. The familiar reader will recognize the claim that the Ackermann Function is computable, but not primitive recursive. For the rest of the world, here’s a historical explanation of its origin.

Alan Turing, the inventor of the Turing machine.

Back in the days before computers (the early 20th century), mathematicians were fiercely interested in the theory of computing things. The common visitor to this blog will recall our primers on Finite Automata, Turing Machines, and such things as the Game of Life. It might not be a surprise to hear that there are many many different models of computation, and before real computers actually came along it was the job of mathematicians to compare them, sorting out which were “more useful” or “more expressive” than others. In the end, the Turing machine model was actually used to create rudimentary computers, which have since evolved into the supercomputers and iPads we have today. The mathematicians exploring computation during this era were at the forefront of a technological and intellectual revolution. On their shoulders, humanity entered the so-called Age of Information.

Perhaps the original such study of computation was the idea of a computable function of the natural numbers. Given a set of elementary functions which are axiomatically defined to be computable, and a way of combining these elementary functions to make more complicated functions, mathematicians constructed a huge class of functions that were computable. For instance, the function which adds two numbers together is computable; we all learned a nice algorithm to perform addition in grade school. The definitions and axioms were chosen in order to affirm such easy tasks, and then mathematicians explored the axioms to see how far they could push them. This line of reasoning would bring one to the ultimate quest: for a given model of computation, figure out what sorts of things are not computable.

Of course, even without a firm foundation in the theory of computable functions, we would guess that the sequence of Ackermann numbers is computable. In fact, using the model of Turing machines (which happens to be equivalent to the theory of computable functions, insofar as the tasks it deems computable) we can prove it so. We have unbounded space and unbounded time, and we know for a fact that all of these mathematical operations boil down to repeated multiplication. We can even present pseudocode, and such sites as RosettaCode have hundreds of implementations, ranging from a single line of K to a tail-recursive implementation in OCaml. There is no doubt about it, the Ackermann function is computable. In a somewhat naive sense, the Ackermann sequence is no deeper or more insightful than Archimedes’s first attempt at exponentiation. It just carries the idea of exponentiation as far as it can go.

The next natural question is: are there any sequences of numbers which grow so ridiculously fast that they simply cannot be computed? Of course, we know already of one undecidable (non-computable) task: the halting problem. Recall that the halting problem asks if it is possible to design a computer program $T$ which, when given the source code of any other computer program $S$ and an input to that program $w$, can determine in a finite amount of time whether $S$ will loop infinitely on the input $w$, or whether it will halt. The blatant fact is that this is impossible. A hint at the proof is the question: how would this program analyze itself? What about its evil twin that does exactly the opposite? Here’s a comical version of the proof in Dr. Seuss form, and a more formal proof can be found in our primers on the subject.

As we have seen before, one way to prove that something is not computable is to first assume that it is computable, and then use that to construct a program that solves the halting problem. This results in a contradiction, and we conclude that the original problem could not be computable. In other words, we are looking for a sequence of numbers which has the following property: if we were able to compute an arbitrarily large entry of the sequence in a finite amount of time, we could use that ability to determine whether any program will halt or loop infinitely in a finite amount of time.

Of course, the idea that such a sequence should even exist is dubious. But amazingly enough, we will construct one, and find the next level of big numbers.

## Busy Beaver Numbers

The idea behind Busy Beavers is quite simple. It was discovered by Tibor Radó, a Hungarian mathematician, in May of 1962, and does not rely at all on the arithmetic expressions of the past. The idea requires a bit of knowledge about Turing machines, so after promoting our primer once more, we will reiterate the ideas.

A Turing machine is essentially an infinite strip of paper called the tape on which one scribbles symbols in a deterministic way. In order to know what to do at each step, a Turing machine has an internal state record, with a finite set of rules for how to transition from state to state and manipulate the tape (write new symbols, erase old symbols, and move around to different places on the tape). In full rigor, the Turing machine is only allowed to look at one of the symbols on the tape at a time. So one should truly imagine a little beaver running back and forth across his dam, making tick marks in the wood and never knowing what he’s going to do next until he sees the tick mark next to the one he just made. The Turing machine has no limit on the amount of time it can run, and the tape is infinite, so it can write as many symbols as it needs to do its work. In the end, it will either stop, report a “yes” or a “no,” or it will keep on going until the end of time, never to halt.

Perhaps understandably, the Busy Beaver numbers have to do with counting steps of computation in Turing machines. In particular, for a fixed number $n$, it is possible to look at all Turing machines which have exactly $n$ states in their internal record (indeed, there can only be finitely many, and it is not too difficult to come up with an exact number). Among these Turing machines, some will run forever and some will eventually stop. We only care about the ones that stop, and the largest possible number of steps that those machines take before stopping.

Definition: The $n$th Busy Beaver number, denoted $BB(n)$, is the largest number of steps taken by any Turing machine which has exactly $n$ states, uses only two symbols on the tape (say 0 and 1), begins with an initially blank tape (all of the entries start as 0), and eventually halts.

In other words, $BB(n)$ is the number of steps taken by the busiest beaver of all. We call a Turing machine which achieves the required number of steps a busy beaver.

Look at that cute little beaver run along.

Unsurprisingly, there are very few one-state busy beavers. Recalling the formal definition of a Turing machine, it must have either an accept state or a reject state, and if there is only one state and the machine halts, then it takes only one step before halting. Hence, $BB(1) = 1$. A bit of number crunching will convince the reader that the second Busy Beaver number is in fact $BB(2) = 6$. Not impressed? Well it took a bit of research to get there, but Radó (the original discoverer of the Busy Beaver numbers) and his student Shen Lin proved $BB(3) = 21$. This required a mountain of work and many different partial results to acquire. Why is it so hard, you ask? The fact is, there is no algorithm to list the Busy Beaver numbers. It is an uncomputable function. Let’s give a proof of this before we get to the other known values of $BB(n)$.
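We can check the claim $BB(2) = 6$ by direct simulation. The sketch below runs the widely cited two-state champion machine from a blank tape; the transition encoding (a dictionary from (state, symbol) to (write, move, next state), with 'H' as an uncounted halt state) is our own illustrative convention:

```python
def run(transitions, max_steps=1000):
    """Run a 2-symbol Turing machine from a blank tape.
    Return (number of steps taken, number of 1s left on the tape)."""
    tape, head, state, steps = {}, 0, 'A', 0
    while state != 'H' and steps < max_steps:
        symbol = tape.get(head, 0)
        write, move, state = transitions[(state, symbol)]
        tape[head] = write
        head += 1 if move == 'R' else -1
        steps += 1
    return steps, sum(tape.values())

# The 2-state champion: halts after 6 steps, leaving four 1s on the tape.
bb2 = {
    ('A', 0): (1, 'R', 'B'), ('A', 1): (1, 'L', 'B'),
    ('B', 0): (1, 'L', 'A'), ('B', 1): (1, 'R', 'H'),
}
print(run(bb2))  # (6, 4)
```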

Theorem: $BB(n)$ is uncomputable. That is, there is no Turing machine that takes as input $n$ and computes $BB(n)$.

Proof. Suppose to the contrary that such a machine existed, and call it $T$. We will use $T$ to construct a machine which solves the halting problem as follows:

On input $\left \langle S, w \right \rangle$, a description of a Turing machine $S$ and an input $w$ for it, we can determine in finite time the number of states $n$ that $S$ has (it is encoded in the description of $S$). Now use $T$ to compute $BB(n)$, and simulate $S$ on input $w$, counting its steps as it proceeds. Eventually the simulation of $S$ either halts, or makes more steps than $BB(n)$. In the former case, we may stop the computation and declare that $S$ halts. In the latter, we know that $BB(n)$ is the maximal number of steps that can occur for any Turing machine with $n$ states that eventually halts. Therefore, if $S$ has not halted on the input $w$ within $BB(n)$ steps, it must never halt. We may stop our computation there, proclaim an infinite loop, and thereby solve the halting problem. (One subtlety: $BB(n)$ is defined for machines started on a blank tape, while $S$ runs on the input $w$; to be fully rigorous, one first constructs a machine $S_w$ which writes $w$ on a blank tape and then behaves like $S$, and applies the argument to $S_w$ instead.)

This is a contradiction, so the sequence $BB(n)$ cannot be computable. $\square$
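To make the reduction concrete, here is a sketch of how known Busy Beaver values turn blank-tape halting into a finite computation, using the values $BB(1) = 1$ and $BB(2) = 6$ quoted above. The machine encoding (a dictionary from (state, symbol) to (write, move, next state), with 'H' as the halt state) is an illustrative convention of ours, not a standard one:

```python
# Proved Busy Beaver values; with these in hand, halting from a blank
# tape is decidable for machines with at most this many states.
KNOWN_BB = {1: 1, 2: 6}

def halts_on_blank(transitions, n_states):
    """Decide whether a 2-symbol machine halts when started on a blank
    tape, by running it for BB(n) steps and seeing if it has stopped."""
    bound = KNOWN_BB[n_states]
    tape, head, state = {}, 0, 'A'
    for _ in range(bound):
        if state == 'H':
            return True
        symbol = tape.get(head, 0)
        write, move, state = transitions[(state, symbol)]
        tape[head] = write
        head += 1 if move == 'R' else -1
    # Not halted within BB(n) steps: by maximality of BB(n), it never will.
    return state == 'H'
```

This is exactly why no such table can exist for all $n$: filling in `KNOWN_BB` completely would solve the halting problem.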

This gives some insight into how ridiculously fast $BB(n)$ grows. In fact, it not only grows faster than the Ackermann function, but it grows faster than any sequence of numbers that could ever be computed by a machine, even with infinite time to do so! Indeed, if $BB(n)$ were bounded above by some computable function $f$, then we could compute $BB(n)$ itself: compute $f(n)$, run every $n$-state Turing machine for $f(n)$ steps, discard those still running (by the bound, they can never halt), and take the maximum number of steps among the machines that halted. This would all happen in a finite amount of time, contradicting the theorem above.

In a sense, this is completely understandable. Computers are extremely complex! Applications like web browsers only need millions or billions of states, and in the proof above we’re pretending we could determine what all programs would do, even if they have trillions or quadrillions or $A(1000)$ states! Imagine all of the crazy things that could happen with that amount of computing power. The colloquial thesis is that a single program is simply not powerful enough to know all of that. It’s like asking a human to read the minds of every human on the planet, all of the conscious thoughts of simpler animals, and all of the minds of aliens that planet Earth has never dreamed of, but which perhaps reside in faraway galaxies. For indeed, a Turing machine that could count Busy Beaver numbers must itself have finitely many states, and halt on any well-formed input, but be able to reason about such mysterious behaviors of Turing machines that have never been created.

The first three values of $BB(n)$ were indeed wimpy, but now that we know it grows unfathomably quickly, we can mention the remaining known results. First, $BB(4) = 107$ was proved in the 1980s. But for $BB(5)$, the best known lower bound is 47,176,870, and this was found as recently as 1990. On the other hand, in July of 2010 $BB(6)$ was discovered to be no smaller than $7.412 \cdot 10^{36,534}$, and it is generally believed that this value will never be known exactly. Nobody is brave enough to take a stab at such higher numbers in the sequence as $BB(10)$, and understandably so.

Plato: “One day, mankind will bask in enlightenment, having computed the Busy Beaver function!” Aristotle: “Word.”

In this light, the Ackermann function is a wimp. But there are some important philosophical issues to consider once more. In Archimedes’s day, people generally believed the sands were beyond counting, and they were later enlightened to know better. Even Archimedes would have been baffled by such modern constructions as pentation and hexation, but nowadays a high school mathematics student can handle the idea (at least, without attempting to visualize it). The day may come to pass when we invent a new way of defining large numbers, some which make the Busy Beaver numbers seem like plain old exponentiation. In those days, perhaps we will define a new model of computation that defies the halting problem and ushers in a new era of mathematical insight and technological prowess.

Indeed, such a model would make long-standing mathematical problems moot. With the power to compute Busy Beaver numbers, it would certainly be trivial to settle the Goldbach Conjecture, or any other mathematical question whose potential counterexamples can be checked one number at a time. In fact, we know already of a simple way to settle the Goldbach Conjecture using Busy Beavers. We leave this as an exercise to the aspiring reader.

Heiner Marxen’s page provides an up-to-date account of the current state of the Busy Beaver Problem, and a list of possible candidates for BB(5), the estimates on BB(6), and links to relevant papers. Scott Aaronson’s essay on Busy Beavers delves deeper into the philosophical implications of solving the Busy Beaver problem (and gives rich analogies), but these considerations, which are essentially fanciful speculation, are beyond the scope of this blog.

Speaking of the scope of this blog, we have yet to write any programs! This will not do, and so next time we will write a program which simulates the currently known busy beavers (or busy beaver candidates) for $n \leq 6$, and displays the changing tape as the beaver runs his route. In a sense, we will be writing a universal Turing machine, which is a commonly used tool in the theory, and not hard to implement.

Until next time!

# Handshake Lemma

Problem: Prove or disprove: at a party of $n$ people, there must be an even number of people who have an odd number of friends at the party.

Solution: Let $P$ be the set of all people, and for any person $p \in P$, let $d(p)$ be the number of friends that person has. Let $f$ be the total number of friendships between pairs of people at the party. Since each friendship among two friends at the party concerns exactly two people, we see that the following is true:

$\displaystyle \sum_{p \in P} d(p) = 2f$

In other words, adding up the number of friends that each person has will always give twice the total number of friendships. In particular, this sum is always even. If there were an odd number of people with an odd number of friends at the party, and everyone else had an even number of friends, then the total $\sum_{p \in P} d(p)$ would necessarily be odd: the even terms do not affect the parity of the sum, and an odd number of odd terms adds up to an odd number.

Hence, there can only be an even number of people with an odd number of friends at the party. $\square$

Discussion: This is a proof by “double counting,” so to speak. Double counting proves an identity by showing that each side of the equality can be thought of as a different way of counting the elements of the same set. In the above proof, we count the set of “one-sided” friendships (ordered pairs consisting of a person and one of their friends) in two different ways: person by person on the left, and friendship by friendship on the right.

This technique comes up all over mathematics (particularly in combinatorics, the field of maths concerned with counting things). We can relate this to graph theory (see our very basic introduction to the topic), by seeing that a party is just a graph where the vertices are people and the edges are friendships. This theorem then says that for any graph, the sum of the degrees of each vertex is twice the number of edges, and hence always even.
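As a quick sanity check of the degree-sum identity on graphs, here is a small experiment on random graphs (a sketch of our own, with a fixed seed so it is reproducible):

```python
import random

def check_handshake(n, p, seed):
    """Build a random graph on n vertices, each possible edge present
    with probability p, and verify both claims from the proof above."""
    rng = random.Random(seed)
    edges = [(u, v) for u in range(n) for v in range(u + 1, n)
             if rng.random() < p]
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    assert sum(degree) == 2 * len(edges)        # degree sum = twice #edges
    assert sum(d % 2 for d in degree) % 2 == 0  # evenly many odd degrees

# The identity holds for every graph, not just on average over the randomness.
for n in range(1, 30):
    check_handshake(n, 0.3, seed=n)
```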

Double counting has been used to prove a number of identities, and it can even be used to prove something as interesting as Fermat’s Little Theorem (which has quite a few proofs, and quite a few uses!) and to count the number of labeled trees on a set of n distinct vertices (it’s quite large: exactly $n^{n-2}$, by Cayley’s formula!).