Hunting Serial Killers


“Tonight’s the Night”

A large volume of research goes into the psychological and behavioral analysis of criminals. In particular, serial criminals hold a special place in the imagination and nightmares of the general public (at least, American public). Those criminals with the opportunity to become serial criminals are logical, cool-tempered, methodical, and, of course, dangerous. They walk among us in crowded city streets, or drive slowly down an avenue looking for their next victim. They are sometimes neurotic sociopaths, and other times amicable, charming models of society and business. But most of all, they know their craft well. They work slowly enough to not make mistakes, but fast enough to get the job done and feel good about it. Their actions literally change lives.

In other words, they would be good programmers. If only they all hadn’t given up trying to learn C++!

In all seriousness, a serial killer’s rigid methodology sometimes admits itself nicely to mathematical analysis. For an ideal serial criminal (ideal in being analyzable), we have the following two axioms of criminal behavior:

  1. A serial criminal will not commit crimes too close to his base of operation.
  2. A serial criminal will not travel farther than necessary to find victims.

The first axiom is reasonable because a good serial criminal does not want to arouse suspicion from his neighbors. The second axiom roughly describes an effort/reward ratio that keeps serial offenders from travelling too far away from their homes.

These axioms have a large amount of criminological research behind them. While there is little unifying evidence (the real world is far too messy), there are many bits and pieces supporting these claims. For example, the frequency of burglaries peak about a block from the offender’s residence, while almost none occur closer than a block (Turner, 1969, “Delinquency and distance”). Further, many serial rapists commit subsequent rapes (or rather, abductions preceding rape) within a half mile from the previous (LeBeau, 1987, “Patterns of stranger and serial rape offending”). There are tons of examples of these axioms in action in criminology literature.

On the other hand, there are many types of methodical criminals who do not agree with these axioms. Some killers murder while traveling the country, while others pick victims with such specific characteristics that they must hunt in a single location. So we take the following models with a grain of salt, in that they only apply to a certain class of criminal behavior.

With these ideas in mind, if we knew a criminal’s base of operation, we could construct a mathematical model of his “buffer zone,” inside of which he does not commit crime. With high probability, most of his crimes will lie just outside the buffer zone. This in itself is not useful in the grand scheme of crime-fighting. If we know a criminal’s residence, we need not look any further. The key to this model’s usefulness is working in reverse: we want to extrapolate a criminal’s residence from the locations of his crimes. Then, after witnessing a number of crimes we believe to be committed by the same person, we may optimize a search for the offender’s residence. We will use the geographic locations of a criminal’s activity to accurately profile him, hence the name, geoprofiling.

Murder, She Coded

Historically, the first geoprofiling model was crafted by a criminologist named Dr. Kim Rossmo. Initially, he overlaid the crime locations on a sufficiently fine $ n \times m$ grid of cells. Then, he uses his model to calculate the probability of the criminal’s residence lying within each cell. Rossmo’s formula is displayed below, and explained subsequently.

$ \displaystyle P(x) = \sum \limits_{\textup{crime locations } c} \frac{\varphi}{d(x,c)^f} + \frac{(1-\varphi)B^{f-g}}{(2B-d(x,c))^g}$,

where $ \varphi = 1$ if $ d(x,c) > B, 0$ otherwise.

Here, $ x$ is an arbitrary cell in the grid, $ d(x,c)$ is the distance from a cell to a crime location, with some fixed metric $ d$. The variable $ \varphi$ determines which of the two summands to nullify based on whether the cell in question is in the buffer zone. $ B$ is the radius of the buffer zone, and $ f,g$ are formal empirically tuned parameters. Variations in $ f$ and $ g$ change the steepness of the decay curve before and after the buffer radius. We admit to have no idea why they need to be related, and cannot find a good explanation in Rossmo’s novel of a dissertation. Instead, Rossmo claims both parameters should be equal. For the purposes of this blog we find their exact values irrelevant, and put them somewhere between a half and two thirds.

This model reflects the inherent symmetry in the problem. If we may say that an offender commits a crime outside a buffer of some radius $ B$ surrounding his residence, then we may also say that the residence is likely outside a buffer of the same radius surrounding each crime! For a fixed location, we may compute the probability of the offender’s residence being there with respect to each individual crime, and just sum them up.

This equation, while complete, has a better description for programmers, which is decidedly easier to chew in small bites:

Let d = d(x,c)
if d > B:
   P(x) += 1/d^f
   P(x) += B^(f-g)/(2B-d)^g

Then we may simply loop this routine over over all such $ c$ for a fixed $ x$, and get our probability. Here we see the ideas clearly, that outside the buffer zone of the crime the probability of residence decreases with a power-law, and within the buffer zone it increases approaching the buffer.

Now, note that these “probabilities” are not, strictly speaking, probabilities, because they are not normalized in the unit interval $ [0,1]$. We may normalize them if we wish, but all we really care about are the relative cell values to guide our search for the perpetrator. So we abuse the term “high probability” to mean “relatively high value.”

Finally, the distance metric we actually use in the model is the so-called taxicab metric. Since this model is supposed to be relevant to urban serial criminals (indeed, where the majority of cases occur), the taxicab metric more accurately describes a person’s mental model of distance within a city, because it accounts for roadways. Note that in order for this to work as desired, the map used must be rotated so that its streets lie parallel to the $ x,y$ axes. We will assume for the rest of this post that the maps are rotated appropriately, as this is a problem with implementation and not the model itself or our prototype.

Rossmo’s model is very easy to implement in any language, but probably easiest to view and animate in Mathematica. As usual, the entire program for the examples presented here is available on this blog’s Github page. The decay function is just a direct translation of the pseudocode:

rossmoDecay[p1_, p2_, bufferLength_, f_, g_, distance_] :=
  With[{d = distance[p1, p2]},
   If[d > bufferLength,
    (bufferLength^(g - f))/(2 bufferLength - d)^g]];

We then construct a function which computes the decay from a fixed cell for each crime site:

makeRossmoFunction[sites_, buffer_, f_, g_] :=
  Function[{x, y},
    Map[rossmoDecay[#,{x,y},buffer,f,g,ManhattanDistance] &,

Now we may construct a “Rossmo function,” (initializing the parameters of the model), and map the resulting function over each cell in our grid:

Array[makeRossmoFunction[sites, 14, 1/3, 2/3], {60, 50}];

Here the Array function accepts a function $ f$, and a specification of the dimensions of the array. Then each array index tuple is fed to $ f$, and the resulting number is stored in the $ i,j$ entry of the array. Here $ f: \mathbb{Z}_{60} \times \mathbb{Z}_{50} \to \mathbb{R}^+$. We use as a test the following three fake crime sites:

sites = {{20, 25}, {47, 10}, {55, 40}};

Upon plotting the resulting array, we have the following pretty picture:

A test of Rossmo’s geographic profiling model on three points.

Here, the crime locations are at the centers of each of the diamonds, and cells with more reddish colors have higher values. Specifically, the “hot spot” for the criminal’s residence is in the darkest red spot in the bottom center of the image.

As usual, in order to better visualize the varying parameters, we have the following two animations:

A variation of the “f” parameter from 0.1 and 1.25 in steps of 0.05. “g” is fixed at 2/3.

Variation in the “g” parameter from 0.1 to 1.25 in steps of 0.05. “f” is fixed at 1/2.

Variation in the $ B$ parameter simply increases or decreases the size of the buffer zone. In both animations above we have it fixed at 14 units.

Despite the pretty pictures, a mathematical model is nothing without empirical evidence to support it. Now, we turn to an analysis of this model on real cases.

“Excellent!” I cried. “Elementary mathematics,” said he.

Richard Chase

The first serial killer we investigate is Richard Chase, also known as the Vampire of Sacramento. One of the creepiest murderers in recent history, Richard Chase believed he had to drink the blood of his victims in order to live. In the month of January 1978, Chase killed five people, dumping their mutilated bodies in locations near his home.

Before we continue with the geographic locations of this particular case, we need to determine which locations are admissible. For instance, we could analyze abduction sites, body drop sites, locations of weapons caches or even where the perpetrator’s car was kept. Unfortunately, many of these locations are not known during an investigation. At best only approximate abduction sites can be used, and stash locations are usually uncovered after an offender is caught.

For the sake of the Chase case, and subsequent cases, we will stick to the most objective data points: the body drop sites. We found this particular data in Rossmo’s dissertation, page 272 of the pdf document. Overlaid on a 30 by 30 grid, they are:

richardChaseSites =
   {{3, 17}, {15, 3}, {19, 27}, {21, 22}, {25, 18}};
richardChaseResidence = {19,17};

Then, computing the respective maps, we have the following probability map:

The Rossmo probability map for the Richard Chase body drop sites. Here B = 5, f = 1/2, g = 1

If we overlay the location of Chase’s residence in purple, we see that it is very close to the hottest cell, and well-within the hot zone. In addition, we compare this with another kind of geoprofile: the center of gravity of the five sites. We color the center of gravity in black, and see that it is farther from Chase’s residence than the hot zone. In addition, we make the crime sites easy to see by coloring them green.

Additional data points: center of gravity in black, Chase’s residence in purple, and crime sites in green.

Albert DeSalvo

This is a great result for the model! Let us see how it fares on another case: Albert DeSalvo, the Boston strangler.

With a total of 13 murders and being suspected of over 300 sexual assault charges, DeSalvo is a prime specimen for analysis. DeSalvo entered his victim’s homes with a repertoire of lies, including being a maintenance worker, the building plumber, or a motorist with a broken-down car. He then proceeded to tie his victims to a bed, sexually assault them, and then strangle them with articles of clothing. Sometimes he tied a bow to the cords he strangled his victims with.

We again use the body drop sites, which in this case are equivalent to encounter sites. They are:

deSalvoSites = {{10, 48}, {13, 8}, {15, 11}, {17, 8}, {18, 7},
  {18, 9}, {19, 4}, {19, 8}, {20, 9}, {20, 10}, {20, 11},
  {29, 23}, {33, 28}};
deSalvoResidence = {19,18};

Running Rossmo’s model again, including the same extra coloring as for the Chase murders, we get the following picture:

The Rossmo probability map for Albert DeSalvo’s murders. Here B=10, f= 1/2, g = 1.

Again, we win. DeSalvo’s residence falls right in the darker of our two main hot zones. With this information, the authorities would certainly apprehend him in a jiffy. On the other hand, the large frequency of murders in the left-hand side pulls the center of gravity too close. In this way we see that the center of gravity is not a good “measure of center” for murder cases. Indeed, it violates the buffer principle, which holds strong in these two cases.

Peter Sutcliffe

Finally, we investigate Peter Sutcliffe, more infamously known as the Yorkshire Ripper. Sutcliffe murdered 13 women and attacked at least 6 others between 1975 and 1980 in the county of West Yorkshire, UK. He often targeted prostitutes, hitting them over the head with a hammer and proceeding to sexually molest and mutilate their bodies. He was finally caught with a prostitute in his car, but was not initially thought to be the Yorkshire Ripper until after police returned to the scene of his arrest to look for additional evidence. They found his murder weapons, and promptly prosecuted him.

We list his crime locations below. Note that these include body drop sites and the attack sites for non-murders, which were later reported to the police.

sutcliffeSites = {{5, 1}, {8, 7}, {50, 99}, {53, 68}, {56, 72},
  {59, 59}, {62, 57}, {63, 85}, {63, 87}, {64, 83}, {69, 82},
  {73, 88}, {80, 88}, {81, 89}, {83, 88}, {83, 87}, {85, 85},
  {85, 83}, {90, 90}};
sutcliffeResidences = {{60, 88}, {58, 81}};

Notice that over the course of his five-year spree, he lived in two residences. One of these he moved to with his wife of three years (he started murdering after marrying his wife). It is unclear whether this changed his choice of body drop locations.

Unfortunately, our attempts to pinpoint Sutcliffe’s residence with Rossmo’s model fail miserably. With one static image, guessing at the buffer radius, we have the following probability map:

Failure in the form of a probability map.

As we see, both the center of gravity and the hot zones are far from either of Sutcliffe’s residences. Indeed, even with a varying buffer radius, we are still led to search in unfruitful locations:

An animation of the buffer radius parameter B varying between 1 and 50. Clearly no buffer will give us the desired probability map. Poop.

Even with all of the axioms, all of the parameters, all of the gosh-darn work we went through! Our model is useless here. This raises the obvious question, exactly how applicable is Rossmo’s model?

The Crippling Issues

The real world is admittedly more complex than we make it out to be. Whether the criminal is misclassified, bad data is attributed, or the killer has some special, perhaps deranged motivation, there are far too many opportunities for confounding variables to tamper with our results. Rossmo’s model even requires that the killer live in a more or less central urban location, for if he must travel in a specific direction to find victims, he may necessarily produce a skewed distribution of crime locations.

Indeed, we have to have some metric by which to judge the accuracy of Rossmo’s model. While one might propose the distance between the offender’s residence and the highest-probability area produced on the map, there are many others. In particular, since the point of geographic profiling is to prioritize the search for a criminal’s residence, the best metric is likely the area searched before finding the residence. We call this metric search area. In other words, search area is the amount of area on the map which has probability greater or equal to the cell containing the actual residence. Indeed Rossmo touts this metric as the only useful metric.

However, according to his own tests, the amount of area searched on the Sutcliffe case would be over a hundred square miles! In addition, Rossmo neither provides an idea of what amount of area is feasibly searchable, nor any global statistics on what percentage of cases in his study resulted in an area that was feasibly searchable. We postulate our own analysis here.

In a count of Rossmo’s data tables, out of the fifteen individual cases he studied, the average search area was 395 square kilometers, or 152.5 square miles, while the median was about 87 square kilometers, or 33.6 square miles. The maximum is 1829 square kilometers, while the min is 0.2 square kilometers. The complete table is contained in the Mathematica notebook on this blog’s Github page.

From the 1991 census data for Vancouver, we see that a low density neighborhood has an average population of 2,380 individuals per square kilometer, or about 6,000 per square mile. Applying this to our numbers from the previous paragraph, we have a mean of 940,000 people investigated before the criminal is found, a median of 200,000, a max of four million (!), and a min of 309.

Even basing our measurements on the median values, this method appears to be unfeasible as a sole means of search prioritization. Of course, real investigations go on a lot more data, including hunches, to focus search. At best this could be a useful tool for police, but on the median, we believe it would be marginally helpful to authorities prioritize their search efforts. For now, at least, good ol’ experience will likely prevail in hunting serial killers.

In addition, other researchers have tested human intuition at doing the same geographic profiling analysis, and they found that with a small bit of training (certainly no more than reading this blog post), humans showed no significant difference from computers at computing this model. (English, 2008) Of course, for the average human the “computing” process (via pencil and paper) was speedy and more variable, but for experienced professionals the margin of error would likely disappear. As interesting as this model may be, it seems the average case is more like Sutcliffe than Chase; Rossmo’s model is effectively a mathematical curiosity.

It appears, for now, that our friend Dexter Morgan is safe from the threat of discovery by computer search.

Alternative Models

The idea of a decay function is not limited to Rossmo’s particular equation. Indeed, one might naturally first expect the decay function to be logarithmic, normal, or even exponential. Indeed, such models do exist, and they are all deemed to be roughly equivalent in accuracy for appropriately tuned parameters. (English, 2008) Furthermore, we include an implementation of a normal growth/decay function in the Mathematica notebook on this blog’s Github page. After reading that all of these models are roughly equivalent, we did not conduct an explicit analysis of the normal model. We leave that as an exercise to the reader, in order to become familiar with the code provided.

In addition, one could augment this model with other kinds of data. If the serial offender targets a specific demographic, then this model could be combined with demographic data to predict the sites of future attacks. It could be (and in some cases has been) weighted according to major roadways and freeways, which reduce a criminals mental model of distance to a hunting ground. In other words, we could use the Google Maps “shortest trip” metric between any two points as its distance metric. To our knowledge, this has not been implemented with any established mapping software. We imagine that such an implementation would be slow; but then again, a distributed network of computers computing the values for each cell in parallel would be quick.

Other Uses for the Model

In addition to profiling serial murders, we have read of other uses for this sort of geographic profiling model.

First, there is an established paper on the use of geographic profiling to describe the hunting patterns of great white sharks. Briefly, we recognize that such a model would switch from a taxicab metric to a standard Euclidean metric, since the movement space of the ocean is locally homeomorphic to three-dimensional Euclidean space. Indeed, we might also require a three-dimensional probability map for shark predation, since sharks may swim up or down to find prey. Furthermore, shark swimming patterns are likely not uniformly random in any direction, so this model is weighted to consider that.

Finally, we haphazardly propose additional uses for this model: pinpointing the location of stationary artillery, locating terrorist base camps, finding the source of disease outbreaks, and profiling other minor serial-type criminals, like graffiti vandalists.

Data! Data! My Kingdom for Some Data!

As recent as 2000, one researcher noted that the best source of geographic criminal data was newspaper archives. In the age of information, and given the recent popularity of geographic profiling research, this is a sad state of being. As far as we know, there are no publicly available indexes of geographic crime location data. As of the writing of this post, an inquiry to a group of machine learning specialists has produced no results. There doesn’t seem to be such a forum for criminology experts.

If any readers have information to crime series data that is publicly available on the internet (likely used in some professor’s research and posted on their website), please don’t hesitate to leave a comment with a link. It would be greatly appreciated.

Optimally Stacking the Deck – Kicsi Poker

A Puzzle is Born

Sitting around a poolside table, in the cool air and soft light of a June evening, a few of my old friends and I played a game of Texas Hold ‘Em. While we played we chatted about our times in high school, of our old teachers, friends, and, of course, our times playing poker. We even reminisced lightly about some of the more dishonest people we played with (whom we invited for want of a larger payout), but especially we recalled the times we caught them cheating. We witnessed dealing from the bottom of the deck, two people teaming up with each other to fix hands, and some very blatant deck stacking.

Often having a player different from the dealer cut the deck (usually the player to the dealer’s right, as the cards are dealt starting to the left) absolves the dealer of any foul play. This is tacit at the poker table, and thinking of this on that midsummer evening, a puzzle popped into my head:

For a fixed set of rules in a game using a standard deck of 52 cards, is it possible to optimally stack a deck, such that no matter where the deck is cut, a specific player always wins?

Of course, my mind was initially set on answering this for Texas Hold ‘Em (with a fixed number of players). A public suggestion that such a stacking might be possible halted activity at the table as each person thought about it, attempting to find some quick flaw. Of course, it occurred to everyone at the table that if a stacking existed, it very well could have happened in one of the many hands we collectively played over our (albeit short) lifetimes. Finally, Forrest, one of the men at the table, simply said, “This is blowing my mind.”

At this point I thought it might actually be a puzzle worth working on.

Preliminary Thoughts

This question can be asked of any card game, so we’ll start with as simple a game as we can: Kuhn Poker. There are two players and three cards: a King, a Queen, and a Jack. Each player is dealt one card, and with after a single round of betting, the player with the highest card wins the pot. From a game-theoretic perspective, there are some interesting results on this game; specifically, the first player has multiple optimal betting strategies, while the second player has only one. Despite this, if the second player plays optimally, he should expect to win at a rate of 1/18th of a betting unit per hand. That is, of course, assuming the first player does not stack the deck.

For this combinatorially tractable game we may analyze each of the six cases quickly. We will describe a stacking of the deck by the first letters of its cards in order, with the left end of the string being first card dealt. For example, “KQJ” represents the unique stacking for which the first dealt card is a King, the second a Queen, and the third a Jack. Then, a cut is specified by positions between cards, indexed from zero as follows: “0K1Q2J.” Here cutting at 0 means the deck is left as is, which is usually an option given to the cutter. Cutting at 1 turns the deck into QJK, and at 2 we get JKQ. Here is our analysis of the game:

KQJ: cut at 2, receiving a King against a Queen
KJQ: cut at 1, receiving a Queen against a Jack
QKJ: cut at 1, receiving a King against a Jack
QJK: cut at 2, receiving a King against a Queen
JKQ: cut at 0, receiving a King against a Jack
JQK: cut at 0, receiving a Queen against a Jack

So no such stacking is possible for Kuhn Poker. While this is disheartening to the cheater, the honest card player may now rest assured that with a random cut, no cheater is secure.

Note that this same argument extends to any game in which each player is dealt only one card, and we do not allow a tie to result from a good stacking (i.e. the second player cuts aiming for a tie if he cannot cut to win). The only such one-card poker game I know that is actually played is Indian Poker (also known as Blind Man’s Bluff), and two-player Indian Poker admits an identical analysis. Suppose we have a stacking of a deck of $ n$ cards which is optimal for the first player dealt at any cut. Then card $ k$ strictly beats card $ k+1$ for all positions $ k$ of cards in the deck. By transitivity of “beats,” we have that the first card in the deck beats the last card (in fact, every card beats the last card), and hence the second player may cut at position $ n-1$. Since this forces the first player to receive the last card (the worst card in the deck), the second player will not lose, contradicting our original assumption that the deck was optimally stacked.

So no such optimal stacking exists for Indian Poker either. Despite the easy answer to this problem, we cannot assume the same result holds for Texas Hold ‘Em. Even if a cut results in the first player being dealt low cards, there are sufficiently many rules so as to allow the first player to win (for example, he could have a low flush or straight). So let us try to invent a poker with sufficiently many rules to allow optimal deck stacking, if it is at all possible.

K Poker

We define a new game called K Poker. The K stands for “kicsi,” (pronounced kee-chi) the Hungarian word for “small,” and also for “Kun,” the family name of its creator (also Hungarian).

K Poker is a two player game. A K Poker deck contains nine consecutively ranked cards of the same suit; we use deuce through ten. Each player is dealt two cards face down from the top of the deck. Dealing is done by the poker standard, to the dealer’s left, one card at a time. After a round of betting, three community cards are placed face up in the center for all players to use. After another round of betting, one further community card is dealt, and after a final round of betting, the players reveal their cards and the best five-card poker hand wins. Dealing then switches, and the next hand is dealt.

Actually, the betting scheme here is irrelevant, since we simply care about which player wins in the end. For our purposes, we might as well just say each player is dealt two cards and there are four community cards which both players may use to make a hand. But we might as well define a complete playable set of rules, so there it is. The first player dealt will hereby be referred to as player one, while the second player dealt will be player two.

Note that there are only two hands in K Poker: high card and straight (or, since all cards are the same suit, a straight flush, but this name is inappropriately extravagant). A player receives a straight about one fifth of the time, while high card comparison is the default.

This simple game, while not realistically enjoyable, has theoretical interest to us: optimal stackings exist for both players.

One optimal stacking is very nice: a sorted, ascending K Poker deck is optimal for player two. On one level this is obvious, because the second player will always receive the last card dealt. Since there are no burn cards, the next four community cards, being in ascending order form player two’s straight. Things just happen to work out in the cases where player two has no straight. As an exercise to the reader, prove (or verify) that with an ascending K Poker deck, player one can never receive a straight, no matter the cut. Then, notice that for both cuts which give player one the ten, player two receives a straight. Otherwise, player two obviously receives the highest card.

It is intractable by brute force human effort to find the other optimal stackings, and it is not obvious whether others exist. Trying some ‘orderly’ stackings (descending, alternating, etc.) do not work. So while we could build up an army of lemmas and work toward some theorem, we find it more cost-efficient for this particular game to conduct a computer search, which may later guide our efforts in proving lemmas and theorems.

We chose a nine-card game specifically with this in mind. In particular, there are merely $ 9! = 362,880$ different permutations of the deck. Since each deck has nine cuts, we may further reduce our search to those stackings which begin with a deuce. This reduces the number of decks to $ 8! = 40,320$. Since we may easily determine who wins on each cut of a stacking (bringing us back up to $ 9!$ computations), a brute-force search is reasonable.

We Came, We Sought, We Cheated

This time, our excuse for coding in Mathematica is its lightning fast look-up table for permutations. The Mathematica notebook is posted online (along with the code for our followup post on optimally stacking Texas Hold ‘Em) at this blog’s Github page. Since the code is not exactly scintillating material, we leave the reader to download it at his own discretion.

We found nine distinct (up to cuts) optimal stacking for player one, and merely two for player two. We list them here, where the leftmost digit is on the top of the deck:

Player one:
(2, 5, 10, 7, 3, 8, 4, 9, 6),
(2, 5, 10, 7, 3, 9, 4, 8, 6),
(2, 5, 10, 7, 4, 9, 3, 8, 6),
(2, 6, 8, 3, 9, 4, 7, 10, 5),
(2, 6, 9, 3, 8, 4, 7, 10, 5),
(2, 6, 9, 4, 8, 3, 7, 10, 5),
(2, 6, 10, 7, 3, 8, 4, 9, 5),
(2, 6, 10, 7, 3, 9, 4, 8, 5),
(2, 6, 10, 7, 4, 9, 3, 8, 5)

Player two:
(2, 3, 4, 5, 6, 7, 8, 9, 10),
(2, 5, 6, 7, 8, 9, 10, 4, 3)

Wonderful! These are the only optimal stackings, though we admit if a stacker is willing to accept draws, there are many more admissible stackings.

First, notice that there are many more optimal stackings for player one than player two! In fact, counting cuts, player one has 81 optimal stackings, while player two has a mere 18. Though player one still only has one fifth of a percent chance to get such a stacking at random, that is still better than player two’s mere 0.04%. This does not necessarily say that the game is unfairly balanced for player one, though. There may exist more stackings which admit a larger proportion of wins to player two than player one. And, of course, by the symmetry of switching dealers every hand, the game must be balanced across multiple hands.

Second, notice that when we make a slight modification of the rules, when we burn a single card before dealing the community cards, the count of optimal stackings is reversed: player two has nine while player one has just two. In addition, stacking the deck in descending order is optimal for player one.

Forgetting the sinister tricks we might play on friends, this has exciting implications. Adding complexity gave us more optimal stackings! Further, there is a glimmer of symmetry to the optimal stackings. It remains to be seen whether there is a limit to how much complexity we may have while still maintaining the existence of optimal stackings. We conjecture there is such a limit.

We could try adding more cards or allowing for more poker hands, and seeing if we can maintain optimal stackings. However, the factorial growth rate of the number of deck stackings would limit us almost immediately. Clearly, and sadly enough, a brute force search for optimal stackings in two-player Texas Hold ‘Em would take far longer than the life of the universe. So that is out. In addition, the sparsity of optimal stackings in the case of nine cards implies that a Monte Carlo (random sampling) search is unlikely to produce results. While Monte Carlo is still a viable option, perhaps augmented with some local search methods, we find it more likely that the patterns in K Poker generalize to patterns in Texas Hold ‘Em. We leave both tantalizing problems open for future posts.

Until then!

Low Complexity Art

The Art of Omission

Whether in painting, fiction, film, landscape architecture, or paper folding, art is often said to be the art of omission. Simplicity breeds elegance, and engages the reader at a deep, aesthetic level.

A prime example is the famous six-word story written by Ernest Hemingway:

For sale: baby shoes, never worn.

He called it his best work, and rightfully so. To say so much with this simple picture is a monumental feat that authors have been trying to recreate since Hemingway’s day. Unsurprisingly, some mathematicians (for whom the art of proof had better not omit anything!) want to apply their principles to describe elegance.

Computation and Complexity

This study of artistic elegance will be from a computational perspective, and it will be based loosely on the paper of the same name. While we include the main content of the paper in a condensed form, we will deviate in two important ways: we alter an axiom with justification, and we provide a working implementation for the reader’s use. We do not require extensive working knowledge of theoretical computation, but the informed reader should be aware that everything here is theoretically performed on a Turing machine, but the details are unimportant.

So let us begin with the computational characterization of simplicity. Unfortunately, due to our own lack of knowledge of the subject, we will overlook the underlying details and take them for granted. [At some point in the future, we will provide a primer on Kolmogorov complexity. We just ordered a wonderful book on it, and can’t wait to dig into it!]

Here we recognize that all digital images are strings of bits, and so when we speak of the complexity of a string, in addition to meaning strings in general, we specifically mean the complexity of an image.

Definition: The Kolmogorov complexity of a string is the length of the shortest program which generates it.

In order to specify “length” appropriately, we must fix some universal description language, so that all programs have the same frame of reference. Any Turing-complete programming language will do, so let us choose Python for the following examples. More specifically, there exists a universal Turing machine $ U$, for which any program on any machine may be translated (compiled) into an equivalent program for $ U$ by a program of fixed size. Hence, the measure of Kolmogorov complexity, when a fixed machine is specified (in this case Python), is objective over the class of all outputs.

Here is a simple example illustrating Kolmogorov complexity: consider the string of one hundred zeros. This string is obviously not very “complex,” in the sense that one could write a very short program to generate it. In Python:

print "0" * 100

One can imagine that a compiler which optimizes for brevity would output rather short assembly code as well, with a single print instruction and a conditional branch, and some constants. On the other hand, we want to call a string like


complex, because it follows no apparent pattern. Indeed, in Python the shortest program to output this string is just to print the string itself:

print "00111010010000101101001110101000111101"

And so we see that this random string of ones and zeros has a higher Kolmogorov complexity than the string of all zeros. In other words, the boring string of all zeros is “simple,” while the other is “complicated.”

Kolmogorov himself proved that there is no algorithm to compute Kolmogorov complexity (the number itself) for any input. In other words, the problem of determining exact Kolmogorov complexity is undecidable (by reduction from the halting problem; see the Turing machines primer). So we will not try in vain to actually get a number for the Kolmogorov complexity of arbitrary programs, although it is easy to count the lengths of these provably short examples, and instead we speak of complexity in terms of bounds and relativity.

Kolmogorov Meets Picasso

To apply this to art, we want to ask, “for a given picture, what is the length of the shortest program that outputs it?” This will tell us whether a picture is simple or complex. Unfortunately for us, most pictures are neither generated by programs, nor do they have obvious programmatic representations. More feasibly, we can ask, “can we come up with pictures which have low Kolmogorov complexity and are also beautiful?” This is truly a tough task.

To do so, we must first invent an encoding for pictures, and write a program to interpret the encoding. That’s the easy part. Then, the true test, we must paint a beautiful picture.

We don’t pretend to be capable of such artistry. However, there are some who have created an encoding based on circles and drawn very nice pictures with it. Here we will present those pictures as motivation, and then develop a very similar encoding method, providing the code and examples for the reader to play with.

Jürgen Schmidhuber, a long time proponent of low-complexity art, spent a very long time (on the order of thousands of sketches) creating drawings using his circle encoding method, and here are some of his results:


Marvelous. Our creations will be much uglier. But we admit, one must start somewhere, and it might as well be where we feel most comfortable: mathematics and programming.

Magnificence Meets Method

There are many possible encodings for drawings. We will choose one which is fairly easy to implement, and based on intersecting circles. The strokes in a drawing are arcs of these circles. We call the circles used to generate drawings legal circles, while the arcs are legal arcs. Here is an axiomatic specification of how to generate legal circles:

  1. Arbitrarily define the a circle $ C$ with radius 1 as legal. All other circles are generated with respect to this circle. Define a second legal circle whose center is on $ C$, and also has radius 1.
  2. Wherever two legal circles of equal radius intersect, a third circle of equal radius is centered at the point of intersection.
  3. Every legal circle of radius $ r$ has at its center another legal circle of radius $ r/2$.

A legal arc is then simply any arc of a legal circle, and a legal drawing is any list of legal arcs, where each arc has a width corresponding to some fixed set of values. Now we generate all circles which intersect the interior of the base circle $ C$, and sort them first by radius, then by $ x$ coordinate, then $ y$ coordinate. Now given a specified order on the circles, we may number them from 1 to $ n$, and specify a particular circle by its index in the list. In this way, we have defined a coordinate space of arcs, with points of the form (center, thickness, arc-start, arc-end), where the arc-start and art-end coordinates are measured in radians.

We describe the programmatic construction of these circles later. For now, here is the generated picture of all circles which intersect the unit circle up to radius $ 2^{-5}$:

The legal circles

In addition, we provide an animation showing the different layers:

And another animation displaying the list circles sorted by index in increasing order. For an animated GIF, this file has a large size (5MB), and so we link to it separately.

As we construct smaller and smaller circles, the interior of the base circle is covered up by a larger proportion of legally usable area. By using obscenely small circles, we may theoretically construct any drawing. On the other hand, what we care about is how much information is needed to do so.

Because of our nice well ordering on circles, those circles with very small radii will have huge indices! Indeed, there are about four circles of radius $ 2^{-i-1}$ for each circle of radius $ 2^{-i}$ in any fixed area. Then, we can measure the complexity of a drawing by how many characters its list of legal arcs requires. Clearly, a rendition of Starry Night would have a large number of high-indexed circles, and hence have high Kolmogorov complexity. (On second thought, I wonder how hard it would be to get a rough sketch of a Starry-Night-esque picture in this circle encoding…it might not be all that complex).

Note that Schmidhuber defines things slightly differently. In particular, he requires that the endpoints of a legal arc must be the intersection points of two other legal arcs, making the arc-start and arc-end coordinates integers instead of radian measures. We respectfully disagree with this axiom, and we explain why here:

Which of the two arcs is more “complex”?

Of the two arcs in the picture to the left, which would you say is more complex, the larger or the smaller? We observe that two arcs of the same circle, regardless of how long or short they are, should not be significantly different in complexity.

Schmidhuber, on the other hand, implicitly claims that arcs which begin or terminate at non-standard locations (locations which only correspond to the intersections of sufficiently small circles) should be deemed more complex. But this can be a difference as small as $ \pi/100$, and it drastically alters the complexity. We consider this specification unrealistic, at least to the extent to which human beings consider complexity in art. So we stick to radians.

Indeed, our model does alter the complexity for some radian measures, simply because finely specifying fractions requires more bits than integral values. But the change in complexity is hardly as drastic.

In addition, Schmidhuber allows for region shading between legal arcs. Since we did not find an easy way to implement this in Mathematica, we skipped it as extraneous.

Such Stuff as Programs are Made of

We implemented this circle encoding in Mathematica. The reader is encouraged to download and experiment with the full notebook, available from this blog’s Github page. We will explain the important bits here.

First, we have a function to compute all the circles whose centers lie on a given circle:

borderCircleCenters[{x_, y_}, r_] :=
  Table[{x + r Cos[i 2 Pi/6], y + r Sin[i 2 Pi/6]}, {i, 0, 5}];

We arbitrarily picked the first legal circle to be the unit circle, defined with center (0,0), while the second has center (1,0). This made generating all legal circles a relatively simple search task. In addition, we recognize that any arbitrary second chosen circle is simply a rotation of this chosen configuration, so one may rotate their final drawing to accommodate for a different initialization step.

Second, we have the brute-force search of all circles. We loop through all circles in a list, generating the six border circles appropriately, and then filtering out the ones we need, repeating until we have all the circles which intersect the interior of the unit circle. Note our inefficiencies: we search out as far as radius 2 to find small circles which do not necessarily intersect the unit circle, and we calculate the border circles of each circle many times. On the other hand, finding all circles as small as radius $ 2^{-5}$ takes about a minute on an Intel Atom processor, which is not so slow to need excessive tuning for a prototype’s sake.

getAllCenters[r_] := Module[{centers, borderCenters, searchR,
                             ord, rt},
   ord[{a_, b_}, {c_, d_}] := If[a < c, True, b < d];
   centers = {{0, 0}};

   rt = Power[r, 1/2];
   While[Norm[centers[[-1]]] <= Min[2, 1 + rt],
    borderCenters = Map[borderCircleCenters[#, r] &, centers];
    centers = centers \[Union] Flatten[borderCenters, 1]];

   Sort[Select[centers, Norm[#] < 1 + r &], ord]

Finally, we have a function to extract from the resulting list of all centers the center and radius of a given index, and a function to convert a coordinate to its graphical representation:

(* extracts a pair {center, radius} given the
   index of the circle *)
indexToCenterRadius[layeredCenters_, index_] :=
  Module[{row, length, counter},
   row = 1;
   length = Length[layeredCenters[[row]]];
   counter = index;

   While[counter > length,
    counter -= length;
    length = Length[layeredCenters[[row]]];

   {layeredCenters[[row, counter]], 1/2^(row - 1)}

drawArc[{index_, thickness_, arcStart_, arcEnd_}] :=
  Module[{center, radius},
   {center, radius} = indexToCenterRadius[allCenters, index];
     Circle[center, radius, {arcStart, arcEnd}]},
     ImagePadding -> 5, PlotRange -> {{-1, 1}, {-1, 1}},
     ImageSize -> {400, 400}]

And a front-end style function, which takes a list of coordinates and draws the resulting picture:

paint[coordinates_] := Show[Map[drawArc, coordinates]];

Any omitted details (at least one global variable name) are clarified in the notebook.

Now, with our paintbrush in hand, we unveil our very first low-complexity piece of art. Behold! Surprised Mr. Moustache Witnessing a Collapsing Soufflé:

Surprised Mr. Moustache, © Jeremy Kun, 2011

It’s coordinates are:

{{7, 0.005, 0, 2 Pi}, {197, 0.002, 0, 2 Pi},
{299, 0.002, 0, 2 Pi}, {783, 0.002, 0, 2 Pi},
{2140, 0.001, 0, 2 Pi}, {3592, 0.001, 0, 2 Pi},
{22, 0.004, 8 Pi/6, 10 Pi/6}, {29, 0.004, 4 Pi/3, 5 Pi/3},
 {21, 0.004, Pi/3, 2 Pi/3}, {28, 0.004, Pi/3, 2 Pi/3}}

Okay, so it’s lame, and took all of ten minutes to create (guess-and-check on the indices is quick, thanks to Mathematica’s interpreter). But it has low Kolmogorov complexity! And that’s got to count for something, right?

Even if you disagree with our obviously inspired artistic genius, the Mathematica framework for creating such drawings is free and available for anyone to play with. So please, should you have any artistic talent at all (and access to Mathematica), we would love to see your low-complexity art! If we somehow come across three days of being locked in a room with access to nothing but a computer and a picture of Starry Night, we might attempt to recreate a sketch of it for this blog. But until then, we will explore other avenues.

Happy sketching!

Addendum: Note that the outstanding problem here is how to algorithmically take a given picture (or specification of what one wants to draw), and translate it into this system of coordinates. As of now, no such algorithm is known, and hence we call the process of making a drawing art. We may attempt to find such a method in the future, but it is likely hard, and if we produced an algorithm even a quarter as good as we might hope, we would likely publish a paper first, and blog about it second.

The Wild World of Cellular Automata

So far on this blog we’ve been using mathematics to help us write interesting and useful programs. For this post (and for more in the future, I hope) we use an interesting program to drive its study as a mathematical object. For the uninformed reader, I plan to provide an additional primer on the theory of computation, but for the obvious reason it interests me more to write on their applications first. So while this post will not require too much rigorous mathematical knowledge, the next one we plan to write will.

Cellular Automata

There is a long history of mathematical models for computation. One very important one is the Turing Machine, which is the foundation of our implementations of actual computers today. On the other end of the spectrum, one of the simpler models of computation (often simply called a system) is a cellular automaton. Surprisingly enough, there are deep connections between the two. But before we get ahead of ourselves, let’s see what these automata can do.

A cellular automaton is a space of cells, where each cell has a fixed number of possible states, and a set of rules for when one state transitions to another. At each state, all cells are updated simultaneously according to the transition rules. After a pedantic, yet interesting, example, we will stick to a special two-dimensional automata ($ n \times n$ grids of cells), where the available states are 1 or 0. We will alternate freely between saying “1 and 0,” “on and off,” and “live and dead.”

Consider a 1-dimensional grid of cells which has infinite length in either direction (recalling Turing Machines, an infinite tape), where each cell can contain either a 0 or 1. For the sets of rules, we say that if a cell has any immediately adjacent neighbor which is on, then in the next generation the cell is on. Otherwise, the cell is off. We may sum up this set of rules with the following picture (credit to Wolfram MathWorld):

The state transition rule for our simple cellular automaton.

The first row represents the possible pre-transition states, and the second row is the resulting state for the center cell in the next generation. Intuitively, we may think of these as bacteria reproducing in a petri dish, where there are rigorous rules on when a bacteria dies or is born. If we start with a single cell turned on, and display each successive generation as a row in a 2-dimensional grid, we result in the following orderly pattern (again, credit to Wolfram MathWorld for the graphic):

The resulting pattern in our simple cellular automaton.

While this pattern is relatively boring, there are many interesting patterns resulting from other transition rules (which are just as succinct). To see a list of all such elementary cellular automaton, see Wolfram MathWorld’s page on the topic. Indeed, Stephen Wolfram was the first to classify these patterns, so the link is appropriate.

Because a personification of this simulation appears to resemble competition, these cellular automata are sometimes called zero-player games. Though it borrows terminology from the field of game theory, we do not analyze any sort of strategy, but rather observe the patterns emerging from various initial configurations. There are often nice local or global equilibria; these are the treasures to discover.

As we increase the complexity of the rules, the complexity of the resulting patterns increases as well. (Although, rule 30 of the elementary automata is sufficiently complex, even exhibiting true mathematical chaos, I hardly believe that anyone studies elementary automata anymore)

So let’s increase the dimension of our grid to 2, and explore John Conway’s aptly named Game of Life.

What Life From Yonder Automaton Breaks!

For Life, our automaton has the following parameters: an infinite two-dimensional grid of cells, states that are either on or off, and some initial configuration of the cells called a seed. There are three transition rules:

  1. Any live cell with fewer than two or more than three living neighbors dies.
  2. Any dead cell with exactly three living neighbors becomes alive.
  3. In any other case, the cell remains as it was.

Originally formulated by John Conway around 1970, this game was originally just a mathematical curiosity. Before we go into too much detail in the mathematical discoveries which made this particular game famous, let’s write it and explore some of the patterns it creates.

Note: this is precisely the kind of mathematical object that delights mathematicians. One creates an ideal mathematical object in one’s own mind, gives it life (no pun intended), and soon the creation begins to speak back to its creator, exhibiting properties far surpassing its original conception. We will see this very process in the Game of Life.

The rules of Life are not particularly hard to implement. We did so in Mathematica, so that we may use its capability to easily produce animations. Here is the main workhorse of our implementation. We provide all of the code used here in a Mathematica notebook on this blog’s Github page.

(* We abbreviate 'nbhd' for neighborhood *)
getNbhd[A_, i_, j_] := A[[i - 1 ;; i + 1, j - 1 ;; j + 1]];

evaluateCell[A_, i_, j_] :=
  Module[{nbhd, cell = A[[i, j]], numNeighbors},

   (* no man's land edge strategy *)
   If[i == 1 || j == 1 || i == Length[A] || j == Length[A[[1]]],

   nbhd = getNbhd[A, i, j];
   numNeighbors = Apply[Plus, Flatten[nbhd]];

   If[cell == 1 && (numNeighbors - 1 < 2 || numNeighbors - 1 > 3),
   If[cell == 0 && numNeighbors == 3, Return[1]];

evaluateAll[A_] := Table[evaluateCell[A, i, j],
   {i, 1, Length[A]}, {j, 1, Length[A[[1]]]}];

This implementations creates a few significant limitations to our study of this system. First, we have a fixed array size instead of an infinite grid. This means we need some case to handle live cells reaching the edge of the system. Fortunately, at this introductory stage in our investigation we can ignore patterns which arise too close to the border of our array, recognizing that the edge strategy tampers with the evolution of the system. Hence, we adopt the no man’s land edge strategy, which simply allows no cell to be born on the border of our array. One interesting alternative is to have the edges wrap around, thus treating the square grid as the surface of a torus. For small grids, this strategy can actually tamper with our central patterns, but for a large fixed grid, it is a viable strategy.

Second, we do not optimize our array operations to take advantage of sparse matrices. Since most cells will usually be dead, we really only need to check the neighborhoods of live cells and dead cells which have at least one live neighbor. We could keep track of the positions of live cells in a hash set, checking only those and their immediate neighbors at each step. It would not take much to modify the above code to do this, but for brevity and pedantry we exclude it, leaving the optimization as an exercise to the reader.

Finally, to actually display this code we combine Mathematica’s ArrayPlot and NestList functions to achieve a list of frames, which we then animate:

makeFrames[A_, n_] := Map[
  ArrayPlot[#, Mesh -> True]&, NestList[evaluateAll, A, n]];

animate[frames_] := ListAnimate[frames, 8, ControlPlacement -> Top];

randomLife = makeFrames[RandomInteger[1, {20, 20}], 200];

Throwing any mathematical thoughts we might have to the wind, we just run it! Here’s the results for our first try:

What a beauty. The initial chaos almost completely stabilizes after just a few iterations. We see that there exist stationary patterns, the 2×2 square in the bottom left and the space-invader in the top right. Finally, after the identity crisis in the bottom right flounders for a while, we get an oscillating pattern!

Now hold on, because we recognize that this oscillator (which we henceforth dub, the flame) is resting against the no man’s land. So it might not be genuine, and only oscillate because the edge allows it to. However, we notice that one of the patterns which precedes the flame is a 3×3 live square with a dead center. Let’s try putting this square by itself to see what happens. In order to do this, we have an extra few lines of code to transform a list of local coordinates to a pattern centered in a larger grid.

patternToGrid[pts_List, n_] :=
  With[{xOff = Floor[n/2] - Floor[Max[Map[#[[2]] &, pts]]/2],
        yOff = Floor[n/2] - Floor[Max[Map[#[[1]] &, pts]]/2]},
   SparseArray[Map[# + {yOff, xOff} -> 1 &, pts], {n, n}, 0]];
square = {{1, 1}, {1, 2}, {1, 3}, {2, 1}, {2, 3},
  {3, 1}, {3, 2}, {3, 3}};

Combining the resulting two lines with the earlier code for animation, we produce the following pattern:

While we didn’t recover our coveted flame from before, we have at least verified that natural oscillators exist. It’s not hard to see that one of the four pieces above constitutes the smallest oscillator, for any oscillator requires at least three live cells in every generation, and this has exactly three in each generation. No less populated (static or moving) pattern could possibly exist indefinitely.

Before we return to our attempt to recreate the flame, let’s personify this animation. If we think of the original square as a densely packed community, we might tend to interpret this pattern as a migration. The packed population breaks up and migrates to form four separate communities, each of which is just the right size to sustain itself indefinitely. The astute reader may ask whether this is always the case: does every pattern dissipate into a stable pattern? Indeed, this was John Conway’s original question, and we will return to it in a moment.

For now, we notice that the original square preceding the flame grew until its side hit a wall. Now we realize that the wall was essential in its oscillation. So, let us use the symmetry in the pattern to artificially create a “wall” in the form of another origin square. After a bit of tweaking to get the spacing right (three cells separating the squares), we arrive at the following unexpected animation:

We admit, with four symmetrically oscillating flames, it looks more like a jellyfish than a fire. But while we meant to produce two flames, we ended up with four! Quite marvelous. Here is another beautiful reject, which we got by placing the two squares only one cell apart. Unfortunately, it evaporates rather quickly. We call it, the fleeting butterfly.

We refrain from experimenting with other perturbations of the two-square initial configuration for the sake of completing this post by the end of the year. If the reader happens to find an interesting pattern, he shouldn’t hesitate to post a comment!

Now, before returning to the stabilization question, we consider one more phenomenon: moving patterns. Consider the following initial configuration:

A few mundane calculations show that in four generations this pattern repeats itself, but a few cells to the south-east. This glider pattern will fly indefinitely to its demise in no man’s land, as we see below.

Awesome. And clearly, we can exploit the symmetry of this object to shoot the glider in all four directions. Let’s see what happens when they collide!

Well that was dumb. It’s probably too symmetric. We leave it as an exercise to the reader to slightly modify the initial position (given in the Mathematica notebook on this blog’s Github page) and witness the hopefully ensuing chaos.

Now you may have noticed that these designs are very pretty. Indeed, before the post intermission (there’s still loads more to explore), we will quickly investigate this idea.

Automata in Design

Using automata in design might seem rather far-fetched, and certainly would be difficult to implement (if not impossible) in an environment such as Photoshop or with CSS. But, recalling our post on Randomness in Design, it is only appropriate to show a real-world example of a design based on a cellular automaton (specifically, it seems to use something similar to rule 30 of the elementary automata). The prominent example at hand is the Conus seashell.

A Conus shell.

The Conus has cells which secrete pigment according to some unknown set of rules. That the process is a cellular automaton is stated but unsupported on Wikipedia. As unfortunate as that is, we may still appreciate that the final result looks like it was generated from a cellular automaton, and we can reproduce such designs with one. If I had more immediate access to a graphics library and had a bit more experience dealing with textures, I would gladly produce something. If at some point in the future I do get such experience, I would like to return to this topic and see what I can do. For the moment, however, we just admire the apparent connection.

A Tantalizing Peek

We have yet to breach the question of stabilization. In fact, though we started talking about models for computation, we haven’t actually computed anything besides pretty pictures yet! We implore the reader to have patience, and assert presciently that the question of stabilization comes first.

On one hand, we can prove that from any initial configuration Life always stabilizes, arriving at a state where cell population growth cannot continue. Alternatively, we could discover an initial configuration which causes unbounded population growth. The immature reader will notice that this mathematical object would not be very interesting if the former were the case, and so it is likely the latter. Indeed, without unbounded growth we wouldn’t be able to compute much! Before we actually find such a pattern, we realize that unbounded growth is possible in two different ways. First, a moving pattern (like the glider) may leave cells in its wake which do not disappear. Similarly, a stationary pattern may regularly emit moving patterns. Next time, we will give the canonical examples of such patterns, and show their use in turning Life into a model for computation. Finally, we have some additional ideas to spice Life up, but we will leave those as a surprise, defaulting to exclude them if they don’t pan out.

Until next time!