Recreational mathematics is also full of puzzles involving prisoners wearing colored hats, where they can see others’ hats but not their own, and their goal is to each determine (often with high probability) the color of their own hat. Sometimes they are given the opportunity to convey a limited amount of information.

Over the last eight years I’ve been delighted by the renaissance of independent tabletop board and card games. Games leaning mathematical, like Set, have had a special place in my heart. Sadly, many games of incomplete information fall to an onslaught of logic. One example is the popular game The Resistance (a.k.a. Avalon), in which players with unknown allegiances must either deduce which players are spies, or remain hidden as a spy while foiling a joint goal. With enough mathematicians playing, it can be easy to dictate a foolproof strategy: *if we follow these steps, we can be 100% sure of victory against the spies, so anyone who disagrees with this plan or deviates from it is guaranteed to be a spy.* Though we’re clearly digging a grave for our fun, it’s hard to close Pandora’s logic box after it’s open. So I’m always on the lookout for games that resist being trivialized.

Enter Hanabi.

A friend recently introduced me to the game, which channels the soul and complexion of the blue-eyed islanders and hat-donning prisoners into a delightful card game.

The game has simple rules: each player gets a hand that they may not see, but they reveal to all other players. The hands come from the following set of cards (with more 1’s than 2’s, and the fewest 5’s), and players work together, aiming to place cards from 1-5 in order in each color. It’s like solitaire, where stacks of different colors may progress independently, but a 2 must be placed before a 3.
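For reference, the standard deck has five colors, and within each color three 1’s, two each of 2–4, and a single 5. A quick tally (color names here are just illustrative labels):

```python
from collections import Counter

# Standard Hanabi deck: for each color, three 1's, two each of 2-4, one 5.
RANKS_PER_COLOR = [1, 1, 1, 2, 2, 3, 3, 4, 4, 5]
COLORS = ["red", "yellow", "green", "blue", "white"]

deck = [(color, rank) for color in COLORS for rank in RANKS_PER_COLOR]
rank_counts = Counter(rank for _, rank in deck)
print(len(deck), dict(rank_counts))  # 50 cards; 15 ones, 10 each of 2-4, 5 fives
```

So a 5 is irreplaceable: discard it and that color’s stack can never be completed.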

Then the players take turns, and on each turn a player may do one of the following:

- Choose a card from your hand to play. If the chosen card cannot be played (e.g., it’s a red 3 but only a red 1 is on the table), everyone gets a strike. Three strikes ends the game in a loss.
- Use an information token (limited in supply) to give one piece of information to one other player; the allowed types of information are explained below.
- Choose a card from your hand to discard, and regain an information token for future use.

The information you can give to a player consists of choosing a single feature (a specific rank or color) and pointing to all cards in that player’s hand that have that feature. *Example: “these two cards are green,” or “this card is a 4.”* House rules dictate whether “no cards are blue” is a valid piece of information. Officially—I like to think it’s in the spirit of the blue-eyed islanders puzzle, where “someone has blue eyes”—you must be able to *point* at something to reveal information about it.

So the game involves some randomness (the draw), and some resource management (the information tokens), but the heart of the game is figuring out how to convey as much information as possible in a single clue.

Just like the blue-eyed islanders puzzle, giving a public piece of information to one player can indicate much more. Imagine there are 4 players. I can see your hand, but if, after looking at your hand, I decide instead to give Blair a clue, that tells you that what’s in Blair’s hand is more valuable for me to reveal to her than what’s in your hand would be to reveal to you.

Another trick: say I know I have a 4, and say it’s the beginning of the game where 4’s are not playable, and the board has a blue 1 on it. If you play before me and you tell me that that same 4 *and a second card are both blue*, what does that tell me? It was certainly somewhat redundant: you told me more information about a card I knew was not playable, and seemingly not super-helpful information about a second card. After some reflection you can often infer that not only is the second card a blue 2, but also that you have at least one more 2 elsewhere in your hand that’s not immediately playable. That’s a lot of information!

The idea of common knowledge takes it down a rabbit hole that I haven’t quite gotten my head around, but which makes the game continually fun. If I know that you know that I can infer the above scenario with the blue 2, then you *not* giving me that clue tells me that either that situation isn’t present in my hand, or else that whatever information you’re instead giving to Matthieu is a higher priority. The more the group can understand to be commonly inferable (say, discussing strategies before starting the game), the more one can take advantage of common knowledge. The game starts to feel like a logical olympiad, where your worst enemy is your fallible memory, and if people aren’t playing at the same level, relying too much on an inference your teammate didn’t intend can cause grave mistakes!

It’s a guaranteed hit at your next gathering of logic-loving mathemalites!


Consider a square in the xy-plane, and let A (an “assassin”) and T (a “target”) be two arbitrary-but-fixed points within the square. Suppose that the square behaves like a billiard table, so that any ray (a.k.a. “shot”) from the assassin will bounce off the sides of the square, with the angle of incidence equaling the angle of reflection.

Puzzle: Is it possible to block any possible shot from A to T by placing a finite number of points in the square?

This puzzle found its way to me through Tai-Danae’s video, via category theorist Emily Riehl, via a talk by the recently deceased Fields Medalist Maryam Mirzakhani, who studied the problem in more generality. I’m not familiar with her work, but knowing mathematicians it’s probably set in an arbitrary complex $n$-manifold.

See Tai-Danae’s post for a proof, which left such an impression on me I had to dig deeper. In this post I’ll discuss a visualization I made—now posted at the end of Tai-Danae’s article—as well as here and below (to avoid spoilers). In the visualization, mouse movement chooses the firing direction for the assassin, and the target is in green. Dragging the target with the mouse updates the position of the guards. The source code is on Github.

The visualization uses the d3 library, which was made for visualizations that dynamically update with data. I use it because it can draw SVGs real nice.

The meat of the visualization is in two geometric functions.

- Decompose a ray into a series of line segments—its path as it bounces off the walls—stopping if it intersects any of the points in the plane.
- Compute the optimal position of the guards, given the boundary square and the positions of the assassin and target.

Both of these functions, along with all the geometry that supports them, are in geometry.js. The rest of the demo is defined in main.js, in which I oafishly trample over d3 best practices to arrive miraculously at a working product. Critiques welcome.

As with most programming and software problems, the key to implementing these functions while maintaining your sanity is breaking it down into manageable pieces. Incrementalism is your friend.

We start at the bottom with a Vector class with helpful methods for adding, scaling, and computing norms and inner products.

```javascript
function innerProduct(a, b) {
  return a.x * b.x + a.y * b.y;
}

class Vector {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }

  normalized() { ... }
  norm() { ... }
  add(vector) { ... }
  subtract(vector) { ... }
  scale(length) { ... }
  distance(vector) { ... }
  midpoint(b) { ... }
}
```

This allows one to compute the distance between two points, e.g., with `vector.subtract(otherVector).norm()`.

Next we define a class for a ray, which is represented by its center (a vector) and a direction (a vector).

```javascript
class Ray {
  constructor(center, direction, length=100000) {
    this.center = center;
    this.length = length;
    if (direction.x == 0 && direction.y == 0) {
      throw "Can't have zero direction";
    }
    this.direction = direction.normalized();
  }

  endpoint() {
    return this.center.add(this.direction.scale(this.length));
  }

  intersects(point) {
    let shiftedPoint = point.subtract(this.center);
    let signedLength = innerProduct(shiftedPoint, this.direction);
    let projectedVector = this.direction.scale(signedLength);
    let differenceVector = shiftedPoint.subtract(projectedVector);
    if (signedLength > 0
        && this.length > signedLength
        && differenceVector.norm() < intersectionRadius) {
      return projectedVector.add(this.center);
    } else {
      return null;
    }
  }
}
```

The ray must be finite for us to draw it, but the length we've chosen is so large that, as you can see in the visualization, it's effectively infinite. Feel free to scale it up even longer.

The interesting bit is the intersection function. We want to compute whether a ray intersects a point. To do this, we use the inner product as a decision rule to compute the distance of a point from a line. If that distance is very small, we say they intersect.
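The same projection computation, transcribed as a standalone Python sketch for concreteness (the function name and tuple representation are mine, not the demo’s; it also omits the “behind the start of the ray” and length checks the JavaScript does):

```python
import math

def distance_to_ray(center, direction, point):
    """Distance from `point` to the ray starting at `center` with unit
    vector `direction`, via projection (the inner product trick)."""
    # shift so the ray starts at the origin
    sx, sy = point[0] - center[0], point[1] - center[1]
    # inner product with the unit direction = signed length of the projection
    signed_length = sx * direction[0] + sy * direction[1]
    # the projected point on the ray's line
    px, py = signed_length * direction[0], signed_length * direction[1]
    # distance from the point to its projection
    return math.hypot(sx - px, sy - py)

# Ray along the x-axis: (3, 4) projects to (3, 0), at distance 4.
print(distance_to_ray((0, 0), (1, 0), (3, 4)))  # 4.0
```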

In our demo, points are not infinitesimal, but rather have a small radius described by `intersectionRadius`. For the sake of being able to see anything we set this to 3 pixels. If it’s too small the demo will look bad: the ray won’t stop when it should appear to stop, and it can appear to hit the target when it doesn’t.

Next up we have a class for a Rectangle, which is where the magic happens. The boilerplate and helper methods:

```javascript
class Rectangle {
  constructor(bottomLeft, topRight) {
    this.bottomLeft = bottomLeft;
    this.topRight = topRight;
  }

  topLeft() { ... }
  center() { ... }
  width() { ... }
  height() { ... }
  contains(vector) { ... }
```

The function `rayToPoints` that splits a ray into line segments from bouncing depends on three helper functions:

1. `rayIntersection`: Compute the intersection point of a ray with the rectangle.
2. `isOnVerticalWall`: Determine if a point is on a vertical or horizontal wall of the rectangle, raising an error if neither.
3. `splitRay`: Split a ray into a line segment and a shorter ray that’s “bounced” off the wall of the rectangle.

(2) is trivial, computing some x- and y-coordinate distances up to some error tolerance. (1) involves parameterizing the ray and checking one of four inequalities. If the bottom left of the rectangle is $(x_1, y_1)$, the top right is $(x_2, y_2)$, and the ray is written as $(c_1 + t v_1, c_2 + t v_2)$ for $t > 0$, then—with some elbow grease—the following four equations provide all possibilities, with some special cases for vertical or horizontal rays:

$$t_{\text{top}} = \frac{y_2 - c_2}{v_2}, \qquad t_{\text{bottom}} = \frac{y_1 - c_2}{v_2}, \qquad t_{\text{left}} = \frac{x_1 - c_1}{v_1}, \qquad t_{\text{right}} = \frac{x_2 - c_1}{v_1}$$

In code:

```javascript
rayIntersection(ray) {
  let c1 = ray.center.x;
  let c2 = ray.center.y;
  let v1 = ray.direction.x;
  let v2 = ray.direction.y;
  let x1 = this.bottomLeft.x;
  let y1 = this.bottomLeft.y;
  let x2 = this.topRight.x;
  let y2 = this.topRight.y;

  // ray is vertically up or down
  if (epsilon > Math.abs(v1)) {
    return new Vector(c1, (v2 > 0 ? y2 : y1));
  }

  // ray is horizontally left or right
  if (epsilon > Math.abs(v2)) {
    return new Vector((v1 > 0 ? x2 : x1), c2);
  }

  let tTop = (y2 - c2) / v2;
  let tBottom = (y1 - c2) / v2;
  let tLeft = (x1 - c1) / v1;
  let tRight = (x2 - c1) / v1;

  // Exactly one t value should be both positive and result in a point
  // within the rectangle
  let tValues = [tTop, tBottom, tLeft, tRight];
  for (let i = 0; i < tValues.length; i++) {
    let t = tValues[i];
    let intersection = new Vector(c1 + t * v1, c2 + t * v2);
    if (t > epsilon && this.contains(intersection)) {
      return intersection;
    }
  }

  throw "Unexpected error: ray never intersects rectangle!";
}
```

Next, `splitRay` splits a ray into a single line segment and the “remaining” ray, by computing the ray’s intersection with the rectangle, and having the “remaining” ray mirror the direction of approach with a new center that lies on the wall of the rectangle. The new ray length is appropriately shorter. If we run out of ray length, we simply return a segment with a null ray.

```javascript
splitRay(ray) {
  let segment = [ray.center, this.rayIntersection(ray)];
  let segmentLength = segment[0].subtract(segment[1]).norm();
  let remainingLength = ray.length - segmentLength;

  if (remainingLength < 10) {
    return {
      segment: [ray.center, ray.endpoint()],
      ray: null
    };
  }

  let vertical = this.isOnVerticalWall(segment[1]);
  let newRayDirection = null;

  if (vertical) {
    newRayDirection = new Vector(-ray.direction.x, ray.direction.y);
  } else {
    newRayDirection = new Vector(ray.direction.x, -ray.direction.y);
  }

  let newRay = new Ray(segment[1], newRayDirection, remainingLength);
  return {
    segment: segment,
    ray: newRay
  };
}
```

As you have probably guessed, `rayToPoints` simply calls `splitRay` over and over again until the ray hits an input “stopping point”—a guard, the target, or the assassin—or else our finite ray length has been exhausted. The output is a list of points, starting from the original ray’s center, for which adjacent pairs are interpreted as line segments to draw.

```javascript
rayToPoints(ray, stoppingPoints) {
  let points = [ray.center];
  let remainingRay = ray;

  while (remainingRay) {
    // check if the ray would hit any guards or the target
    if (stoppingPoints) {
      let hardStops = stoppingPoints.map(p => remainingRay.intersects(p))
                                    .filter(p => p != null);
      if (hardStops.length > 0) {
        // find first intersection and break
        let closestStop = remainingRay.closestToCenter(hardStops);
        points.push(closestStop);
        break;
      }
    }

    let rayPieces = this.splitRay(remainingRay);
    points.push(rayPieces.segment[1]);
    remainingRay = rayPieces.ray;
  }

  return points;
}
```

That’s sufficient to draw the shot emanating from the assassin. This method is called every time the mouse moves.

The function to compute the optimal position of the guards takes as input the containing rectangle, the assassin, and the target, and produces as output a list of 16 points.

```javascript
/*
 * Compute the 16 optimal guards to prevent the assassin from hitting the
 * target.
 */
function computeOptimalGuards(square, assassin, target) {
  ...
}
```

If you read Tai-Danae’s proof, you’ll know that this construction is to

- Compute mirrors of the target across the top, the right, and the top+right of the rectangle. Call this resulting thing the *4-mirrored-targets*.
- Replicate the 4-mirrored-targets four times, by translating three of the copies left by the entire width of the 4-mirrored-targets shape, down by the entire height, and both left-and-down.
- Now you have 16 copies of the target, and one assassin. This gives 16 line segments from assassin-to-target-copy. Place a guard at the midpoint of each of these line segments.
- Finally, apply the reverse translation and reverse mirroring to return the guards to the original square.
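To make the four steps concrete before wading through the JavaScript, here’s a compact, independent Python sketch of the same construction, under the simplifying assumption that the square is the unit square $[0,1]^2$ (function names are mine):

```python
def optimal_guards(assassin, target):
    """16 guard positions in the unit square [0,1]^2, following steps 1-4."""
    mirror_top = lambda p: (p[0], 2 - p[1])    # reflect across the wall y = 1
    mirror_right = lambda p: (2 - p[0], p[1])  # reflect across the wall x = 1

    # Step 1: the target and its three mirrors (the "4-mirrored-targets").
    mirrored = [target, mirror_top(target), mirror_right(target),
                mirror_top(mirror_right(target))]

    # Step 2: translate copies left and/or down by 2 (the width/height of
    # the 4-mirrored shape), giving 16 target copies.
    copies = [(x + dx, y + dy) for (x, y) in mirrored
              for dx in (0, -2) for dy in (0, -2)]

    # Step 3: a guard at each assassin-to-copy midpoint.
    midpoints = [((assassin[0] + x) / 2, (assassin[1] + y) / 2)
                 for (x, y) in copies]

    # Step 4: translate and mirror each midpoint back into the square.
    def to_square(p):
        x, y = p
        if x < 0: x += 2     # undo the leftward translation
        if y < 0: y += 2     # undo the downward translation
        if x > 1: x = 2 - x  # undo the right mirror
        if y > 1: y = 2 - y  # undo the top mirror
        return (x, y)

    return [to_square(m) for m in midpoints]

guards = optimal_guards((0.25, 0.25), (0.75, 0.75))
print(len(guards))  # 16
```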

Due to WordPress being a crappy blogging platform I need to migrate off of, the code snippets below have been magically disappearing. I’ve included links to github lines as well.

Step 1 (after adding simple helper functions on `Rectangle` to do the mirroring):

```javascript
// First compute the target copies in the 4 mirrors
let target1 = target.copy();
let target2 = square.mirrorTop(target);
let target3 = square.mirrorRight(target);
let target4 = square.mirrorTop(square.mirrorRight(target));
target1.guardLabel = 1;
target2.guardLabel = 2;
target3.guardLabel = 3;
target4.guardLabel = 4;
```

```javascript
// for each mirrored target, compute the four two-square-length translates
let mirroredTargets = [target1, target2, target3, target4];
let horizontalShift = 2 * square.width();
let verticalShift = 2 * square.height();
let translateLeft = new Vector(-horizontalShift, 0);
let translateRight = new Vector(horizontalShift, 0);
let translateUp = new Vector(0, verticalShift);
let translateDown = new Vector(0, -verticalShift);

let translatedTargets = [];
for (let i = 0; i < mirroredTargets.length; i++) {
  let target = mirroredTargets[i];
  translatedTargets.push([
    target,
    target.add(translateLeft),
    target.add(translateDown),
    target.add(translateLeft).add(translateDown),
  ]);
}
```

Step 3, computing the midpoints:

```javascript
// compute the midpoints between the assassin and each translate
let translatedMidpoints = [];
for (let i = 0; i < translatedTargets.length; i++) {
  let targetList = translatedTargets[i];
  translatedMidpoints.push(targetList.map(t => t.midpoint(assassin)));
}
```

Step 4, returning the guards back to the original square, is harder than it seems, because the midpoint of an assassin-to-target-copy segment might not be in the same copy of the square as the target-copy being fired at. This means you have to detect which square copy the midpoint lands in, and use that to determine which operations are required to invert. This results in the final block of this massive function.

```javascript
// determine which of the four possible translates the midpoint is in
// and reverse the translation. Since midpoints can end up in completely
// different copies of the square, we have to check each one for all cases.
function untranslate(point) {
  if (point.x < square.bottomLeft.x && point.y > square.bottomLeft.y) {
    return point.add(translateRight);
  } else if (point.x >= square.bottomLeft.x && point.y <= square.bottomLeft.y) {
    return point.add(translateUp);
  } else if (point.x < square.bottomLeft.x && point.y <= square.bottomLeft.y) {
    return point.add(translateRight).add(translateUp);
  } else {
    return point;
  }
}

// undo the translations to get the midpoints back to the original 4-mirrored square.
let untranslatedMidpoints = [];
for (let i = 0; i < translatedMidpoints.length; i++) {
  untranslatedMidpoints.push(...translatedMidpoints[i].map(untranslate));
}

// undo the mirroring to get the guards back to the original square.
function unmirror(point) {
  if (point.x > square.topRight.x && point.y > square.topRight.y) {
    return square.mirrorTop(square.mirrorRight(point));
  } else if (point.x > square.topRight.x && point.y <= square.topRight.y) {
    return square.mirrorRight(point);
  } else if (point.x <= square.topRight.x && point.y > square.topRight.y) {
    return square.mirrorTop(point);
  } else {
    return point;
  }
}

return untranslatedMidpoints.map(unmirror);
```

And that’s all there is to it!

There are a few improvements I’d like to make to this puzzle, but haven’t made the time (I’m writing a book, after all!).

- Be able to drag the guards around.
- Create new guards from an empty set of guards, with a button to “reveal” the solution.
- Include a toggle that, when pressed, darkens the entire region of the square that can be hit by the assassin. For example, this would allow you to see if the target is in the only possible safe spot, or if there are multiple safe spots for a given configuration.
- Perhaps darken the vulnerable spots by the number of possible paths that hit it, up to some limit.
- The most complicated one: generalize to an arbitrary polygon (convex or not!), for which there may be no optimal solution. The visualization would allow you to look for a solution using features 2–4 above.

Pull requests are welcome if you attempt any of these improvements.

Until next time!

*Functional programming gives us back that inalienable right to analyze things by using mathematics. Never again need we bear the burden of that foul mutant x = x+1! No novice programmer—nay, not even a mathematician!—could comprehend such flabbergastery. Tis a pinnacle of confusion!*

It’s ironic that so much of the merits or detriment of the use of = is based on a veiled appeal to the purity of mathematics. Just as often software engineers turn the tables, and any similarity to mathematics is decried as elitist jibber jabber (*Such an archaic and abstruse use of symbols! Oh no, big-O!*).

In fact, equality is more rigorously defined in a programming language than it will ever be in mathematics. Even in the hottest pits of software hell, where there’s = and == and ===, throwing in ==== just to rub salt in the wound, each operator gets its own coherent definition and documentation. Learn it once and you’ll never go astray.

Not so in mathematics—oh yes, hide your children from the terrors that lurk. In mathematics equality is little more than a stand-in for the word “is,” oftentimes entirely dependent on context. Now gather round and listen to the tale of the true identities of the masquerader known as =.

Let’s start with some low-hanging fruit, the superficial concerns.

If = were interpreted literally in the summation $\sum_{i=1}^n i$, then $i$ would be “equal” to 1, and “equal” to 2, and I’d facetiously demand $1 = 2$. Aha! Where is your Gauss now?! But seriously, this bit of notation shows that mathematics has both expressions with scope and variables that change their value over time. And the use of $\Sigma$ for summation notation was established by *Euler*, long before algorithms jumped from logic to computers to billionaire Senate testimonies.

Likewise, set-builder notation often uses the same kind of equals-as-iterate.

In Python, interpreting the expression $i = 1, 2$ literally, the value of $i$ would be a tuple, producing a type error downstream. (In Javascript, it produces 2. How could it be Javascript if it didn’t?)

Next up we have the sloppiness of functions. Let $f(x) = x^2$. This is a function, and $x$ is a variable. Rather than precisely say “the function $f$ sending each real input $x$ to $x^2$,” we say that $f(x) = x^2$ for $x \in \mathbb{R}$. So $x$ is simultaneously an indeterminate input and a concrete value. The same scoping as for programming functions bypasses the naive expectation that equality means “now and forever.” Couple that with the question-as-equation $f(x) = 4$, in which one asks what values of $x$ produce this result, if any, and you begin to see how deep the rabbit hole goes. To understand what someone means when they say $f(x) = x^2$, you need to know the context.

But this is just the tip of the iceberg, and we’re drilling deep. The point is that = carries with it all *kinds* of baggage, not just the scope of a particular binding of a variable.

Continuing with functions, we have rational expressions like $f(x) = \frac{x^2 - 1}{x - 1}$. One often starts by saying “let’s let $f$ be this function.” Then we want to analyze it, and in so doing we simplify to $f(x) = x + 1$. To keep ourselves safe, we modify the domain of $f$ to exclude $x = 1$ post-hoc. But the flow of the argument is the same: we defer the exclusion of $x = 1$ until we need it, meaning the equality at the beginning is a different equality than at the end. In effect, we have an infinitude of different kinds of equality for functions, one for each choice of what to exclude from the domain. And a mathematical proof might switch between them as needed.

“Why not just define a new function with a different domain,” you ask? You can, but mathematicians don’t. And if you’re arguing in favor or against a particular notation, and using “mathematics” as your impenetrable shield, you’ve got to remember the famous definition of Reuben Hersh, that “mathematics is what mathematicians do.” For us, that means you can’t claim superiority based on an idea of mathematics that disagrees with mathematical practice. And mathematics, dear reader, is messier than programmers and philosophers would have one believe.

And now we turn to the Great Equality Contextualizer, the **isomorphism**.

You see, all over mathematics there are objects which are not equal, but we want them to be. When you study symmetry, say, you learn that there is an algebraic structure to symmetry called a group. And the same structure—that is, the same true underlying relationships between the symmetries of a thing—can show up in many different guises. As a set, as a picture, as a class of functions, in polynomials and compass constructions and wallpapers, oh my! In each of these things we want to say that two symmetry structures are the same even if they look different. We want to overload equality when four-fold rotational symmetry applies to my table as well as a four-pointed star.

The tool we use for that is called an isomorphism. In brief terms, it’s a function between two objects, with an inverse, that preserves the structure you care about both ways. In fact, there *is* a special symbol for when two things are isomorphic, and it’s often $\cong$. But $\cong$ is annoying to write, and it really just means “is the same as” the same way equality does. So mathematicians often drop the squiggle and use =.

Plus, there are a million kinds of isomorphism. Groups, graphs, vector spaces, rings, fields, modules, algebras, rational functions, varieties, Lie groups, *breathe* topological spaces, manifolds of all stripes, sheaves, schemes, lattices, knots, the list just keeps going on and on and on! No way are we making up a symbol for each one of these and the hundreds of variations we might come up with. And moreover, when you say two things are isomorphic, that gives you absolutely no indication of *how* they are isomorphic. In fact, it can be extremely tedious to compute isomorphisms between things, and it’s even known to be uncomputable in extreme cases! What good is equality if you can’t even check it?

*But wait!* You might ask, having read this blog for a while and knowing better than to not question a claim. *All of these uses of equality are still equivalence relations, and x = x + 1 is not an equivalence relation!*

Well, you got me there. Mathematicians love to keep equality as an equivalence relation. When mathematicians need to define an algorithm where the value of a variable changes in a nontrivial way, it’s usually done by setting $x_0$ equal to some starting value and letting $x_{n+1}$ be defined as some function of $x_n$ and smaller terms, like the good ol’ Fibonacci sequence $F_1 = F_2 = 1$ and $F_n = F_{n-1} + F_{n-2}$.

*If mutation is so great, why do mathematicians use recursion so much? Huh? Huh?*

Well, I’ve got two counterpoints. The first is that the goal here is to *reason* about the sequence, not to describe it in a way that can be efficiently carried out by a computer. When you say x = x + 1, you’re telling the computer that the old value of x need not linger, and you can do away with the space occupied by the previous value of x. To achieve the same result with recursion requires a whole other can of worms: memoization and tail recursive style and compiler optimizations to shed stack frames. It’s a lot more work to understand all that (to get to an equivalent solution) than it is to understand mutation! Simply stated, the goals of mathematics and programming are quite differently aligned. The former is about understanding a thing, and the latter is more often about describing a concrete process under threat of limited resources.
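To make the contrast concrete, here’s the same Fibonacci computation both ways in Python (a toy illustration of my own, not from any post):

```python
def fib_mutating(n):
    """The n-th Fibonacci number (F_1 = F_2 = 1), x = x + 1 style:
    old values are overwritten as the loop runs, so no memory lingers."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a

def fib_recursive(n):
    """The mathematician's recurrence, transcribed directly.
    Without memoization, it recomputes subproblems exponentially often."""
    return 1 if n <= 2 else fib_recursive(n - 1) + fib_recursive(n - 2)

print(fib_mutating(10), fib_recursive(10))  # 55 55
```

Both are the “same” sequence, but the first describes a resource-conscious process while the second describes a relationship to reason about.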

My second point is that mathematical notation is so flexible and adaptable that it doesn’t *need* mutation the same way programming languages need it. In mathematics we have no stack overflows, no register limits or page swaps, no limitations on variable names or memory allocation, our brains do the continuation passing for us, and we can rewrite history ad hoc and pile on abstractions as needed to achieve a particular goal. Even when you’re describing an algorithm in mathematics, you get the benefits of mathematical abstractions. A mathematician could easily introduce = as mutation in their work. Nothing stops them from doing so! It’s just that they rarely have a genuine need for it.

But of course, none of this changes that languages could use := or “let” instead of = for assignment. If a strict adherence to asymmetry for asymmetric operations helps you sleep at night, so be it. My point is that the case when = means assignment is an extremely simple bit of context. Much simpler than the albatrossian mental burden required to understand what mathematicians really mean when they write =.

*Postscript: I hope everyone reading this realizes I’m embellishing a bit for the sake of entertainment. If you want to fight me, tell me the best tree isn’t aspen. I dare you.*

*Postpostscript: embarrassingly, I completely forgot about Big-O notation and friends (despite mentioning it in the article!) as a case where = does not mean equality! f(n) = O(log n) is a statement about upper bounds, not equality! Thanks to @lreyzin for keeping me honest.*

In this post I want to share a parlor trick for SET that I originally heard from Charlotte Chan. It uses the same ideas from the video above, which I’ll only review briefly.

In the game of SET you see a board of cards like the following, and players look for sets.

A valid set is a triple of cards where, feature by feature, the characteristics on the cards are either all the same or all different. A valid set above is {one empty blue oval, two solid blue ovals, three shaded blue ovals}. The feature of “fill” is different on all the cards, but the feature of “color” is the same, etc.
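This rule translates directly to code, representing each card as a 4-tuple of feature values in {0, 1, 2} (a representation choice of mine, matching the enumeration introduced later):

```python
def is_valid_set(c1, c2, c3):
    """Feature by feature, the three values must be all the same
    (one distinct value) or all different (three distinct values)."""
    return all(len({a, b, c}) != 2 for a, b, c in zip(c1, c2, c3))

# First two features all different, last two all the same: valid.
print(is_valid_set((0, 0, 0, 0), (1, 1, 0, 0), (2, 2, 0, 0)))  # True
# Last feature is 0, 0, 1 -- exactly two alike: invalid.
print(is_valid_set((0, 0, 0, 0), (1, 1, 0, 0), (2, 2, 0, 1)))  # False
```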

In a game of SET, the cards are dealt in order from a shuffled deck, players race to claim sets, removing the set if it’s valid, and three cards are dealt to replace the removed set. Eventually the deck is exhausted and the game is over, and the winner is the player who collected the most sets.

There are a handful of mathematical tricks you can use to help you search for sets faster, but the parlor trick in this post adds a fun variant to the end of the game.

Play the game of SET normally, but when you get down to the last card in the deck, don’t reveal it. Keep searching for sets until everyone agrees no visible sets are left. Then you start the variant: the first player to guess the last un-dealt card in the deck gets a bonus set.

The math comes in when you discover that you don’t need to guess, or remember anything about the game that was just played! A clever stranger could walk into the room at the end of the game and win the bonus point.

**Theorem:** As long as every player claimed a valid set throughout the game, the information on the remaining board uniquely determines the last (un-dealt) card.

Before we get to the proof, some reminders. Recall that there are four features on a SET card, each of which has three options. Enumerate the options for each feature (e.g., {Squiggle, Oval, Diamond} = {0, 1, 2}).

While we will not need the geometry induced by this, this implies each card is a vector in the vector space $\mathbb{F}_3^4$, where $\mathbb{F}_3$ is the finite field of three elements, and the exponent means “dimension 4.” As Tai-Danae points out in the video, each SET is an affine line in this vector space. For example, if this is the enumeration:

Then using the enumeration, a set might be given by

The crucial feature for us is that the vector-sum (using the modular field arithmetic on each entry) of the cards in a valid set is the zero vector $(0, 0, 0, 0)$. This is because $0 + 0 + 0 = 0$, $1 + 1 + 1 = 0$, $2 + 2 + 2 = 0$, and $0 + 1 + 2 = 0$ are all true mod 3.
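The claim reduces to a single feature, and with only 27 possible value triples it can be checked exhaustively:

```python
from itertools import product

# For one feature: "all same or all different" <=> the values sum to 0 mod 3.
for a, b, c in product(range(3), repeat=3):
    valid_feature = len({a, b, c}) != 2
    assert valid_feature == ((a + b + c) % 3 == 0)
print("equivalence holds for all 27 value triples")
```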

*Proof of Theorem.* Consider the vector-valued invariant $S_t$, the sum of the remaining cards after $t$ sets have been taken. At the beginning of the game the deck has 81 cards that can be partitioned into valid sets. Because each valid set sums to the zero vector, $S_0 = (0, 0, 0, 0)$. Removing a valid set via normal play does not affect the invariant, because you’re subtracting a set of vectors whose sum is zero. So $S_t = (0, 0, 0, 0)$ for all $t$.

At the end of the game, the invariant still holds even if there are no valid sets left to claim. Let $c$ be the vector corresponding to the last un-dealt card, and let $v_1, \dots, v_k$ be the remaining visible cards. Then $c + v_1 + \dots + v_k = (0, 0, 0, 0)$, meaning $c = -(v_1 + \dots + v_k)$.

I would provide an example, but I want to encourage everyone to play a game of SET and try it out live!

Charlotte, who originally showed me this trick, was quick enough to compute this sum in her head. So were the other math students we played SET with. It’s a bit easier than it seems since you can do the sum feature by feature. Even though I’ve known about this trick for years, I still require a piece of paper and a few minutes.
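The feature-by-feature computation is easy to transcribe. As a sanity check (using my 4-tuple representation; the full-game simulation is a separate exercise), hide any single card from the full 81-card deck, and the negated sum recovers it, since the whole deck sums to the zero vector:

```python
from itertools import product

def last_card(visible_cards):
    """Negated feature-wise sum (mod 3) of the visible cards."""
    return tuple(-sum(feature) % 3 for feature in zip(*visible_cards))

deck = list(product(range(3), repeat=4))  # all 81 cards
hidden = (1, 2, 0, 1)
visible = [card for card in deck if card != hidden]
print(last_card(visible))  # (1, 2, 0, 1)
```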

Because this is Math *Intersect* Programming, the reader is encouraged to implement this scheme as an exercise, and simulate a game of SET by removing randomly chosen valid sets to verify experimentally that this scheme works.

Until next time!

For example, if I have the following three “points” in the plane, as indicated by their colors, which is closer, blue to green, or blue to red?

It’s not obvious, and there are multiple factors at work: the red points have fewer samples, but we can be more certain about the position; the blue points are less certain, but the closest non-blue point to a blue point is green; and the green points are equally plausibly “close to red” and “close to blue.” The centers of mass of the three sample sets are close to an equilateral triangle. In our example the “points” don’t overlap, but of course they could. And in particular, there should probably be a nonzero distance between two points whose sample sets have the same center of mass, as below. The distance quantifies the uncertainty.

All this is to say that it’s not obvious how to define a distance measure that is consistent with perceptual ideas of what geometry and distance should be.

**Solution (Earthmover distance):** Treat each sample set corresponding to a “point” as a discrete probability distribution, so that each of the $n$ samples has probability mass $1/n$. The distance between two such distributions $p_1$ and $p_2$ is the optimal solution to the following linear program.

Each $x \in A$ corresponds to a pile of dirt of height $p_x$, and each $y \in B$ corresponds to a hole of depth $p_y$. The cost of moving a unit of dirt from $x$ to $y$ is the Euclidean distance $d(x, y)$ between the points (or whatever hipster metric you want to use).

Let $z_{x, y}$ be a real variable corresponding to an amount of dirt to move from $x$ to $y$, with cost $d(x, y)$. Then the constraints are:

- Each $z_{x, y} \geq 0$, so dirt only moves from $A$ to $B$.
- Every pile must vanish, i.e. for each fixed $x \in A$, $\sum_{y \in B} z_{x, y} = p_x$.
- Likewise, every hole must be completely filled, i.e. for each fixed $y \in B$, $\sum_{x \in A} z_{x, y} = p_y$.

The objective is to minimize the cost of doing this: $\sum_{x \in A} \sum_{y \in B} d(x, y) z_{x, y}$.
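Assembled in one place (with $z_{x, y}$ the amount of dirt moved from pile $x$ to hole $y$, $p_x$ and $p_y$ the probability masses, and $d(x, y)$ the cost of moving a unit of dirt from $x$ to $y$), the linear program reads:

```latex
\begin{aligned}
\min \quad & \sum_{x \in A} \sum_{y \in B} d(x, y)\, z_{x, y} \\
\text{s.t.} \quad & \sum_{y \in B} z_{x, y} = p_x \quad \text{for each } x \in A, \\
& \sum_{x \in A} z_{x, y} = p_y \quad \text{for each } y \in B, \\
& z_{x, y} \geq 0 \quad \text{for all } x \in A,\ y \in B.
\end{aligned}
```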

In Python, using the ortools library (and leaving out a few docstrings and standard import statements; full code on Github):

```python
from ortools.linear_solver import pywraplp


def earthmover_distance(p1, p2):
    dist1 = {x: count / len(p1) for (x, count) in Counter(p1).items()}
    dist2 = {x: count / len(p2) for (x, count) in Counter(p2).items()}
    solver = pywraplp.Solver('earthmover_distance', pywraplp.Solver.GLOP_LINEAR_PROGRAMMING)

    variables = dict()

    # for each pile in dist1, the constraint that says all the dirt must leave this pile
    dirt_leaving_constraints = defaultdict(lambda: 0)

    # for each hole in dist2, the constraint that says this hole must be filled
    dirt_filling_constraints = defaultdict(lambda: 0)

    # the objective
    objective = solver.Objective()
    objective.SetMinimization()

    for (x, dirt_at_x) in dist1.items():
        for (y, capacity_of_y) in dist2.items():
            amount_to_move_x_y = solver.NumVar(0, solver.infinity(), 'z_{%s, %s}' % (x, y))
            variables[(x, y)] = amount_to_move_x_y
            dirt_leaving_constraints[x] += amount_to_move_x_y
            dirt_filling_constraints[y] += amount_to_move_x_y
            objective.SetCoefficient(amount_to_move_x_y, euclidean_distance(x, y))

    for x, linear_combination in dirt_leaving_constraints.items():
        solver.Add(linear_combination == dist1[x])

    for y, linear_combination in dirt_filling_constraints.items():
        solver.Add(linear_combination == dist2[y])

    status = solver.Solve()
    if status not in [solver.OPTIMAL, solver.FEASIBLE]:
        raise Exception('Unable to find feasible solution')

    return objective.Value()
```
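As a quick sanity check on the linear program (a side note of mine, not part of the code above): in one dimension, with two equal-size, equally weighted sample sets, matching sorted samples in order is an optimal transport plan, so the Earthmover distance reduces to the mean absolute difference of sorted samples.

```python
# 1-D sanity check: with equal-size, equally weighted sample sets, sorting
# both sides and matching in order is an optimal transport plan, so the
# distance is the mean absolute difference of sorted samples.
def emd_1d(samples1, samples2):
    assert len(samples1) == len(samples2)
    pairs = zip(sorted(samples1), sorted(samples2))
    return sum(abs(a - b) for a, b in pairs) / len(samples1)
```

For example, `emd_1d([1, 2, 3], [4, 5, 6])` is 3.0: each third of the mass moves a distance of 3, matching what the solver reports on the same inputs.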

**Discussion: **I’ve heard about this metric many times as a way to compare probability distributions. For example, it shows up in an influential paper about fairness in machine learning, and a few other CS theory papers related to distribution testing.

One might ask: why not use other measures of dissimilarity for probability distributions (Chi-squared statistic, Kullback-Leibler divergence, etc.)? One answer is that these other measures only give useful information for pairs of distributions with the same support. An example from a talk of Justin Solomon succinctly clarifies what the Earthmover distance achieves.

Also, why not just model the samples using, say, a normal distribution, and then compute the distance based on the parameters of the distributions? That is possible, and in fact makes for a potentially more efficient technique, but you lose some information by doing this. Even setting aside the fact that your data might not be approximately normal (it might have some curvature), with the Earthmover distance you get point-by-point details about how each data point affects the outcome.

This kind of attention to detail can be very important in certain situations. One that I’ve been paying close attention to recently is the problem of studying gerrymandering from a mathematical perspective. Justin Solomon of MIT is a champion of the Earthmover distance (see his fascinating talk here for more, with slides), which is just one topic in a field called “optimal transport.”

This has the potential to be useful in redistricting because of the nature of the redistricting problem. As I wrote previously, discussions of redistricting are chock-full of geometry—or at least geometric-sounding language—and people are very concerned with the apparent “compactness” of a districting plan. But the underlying data used to perform redistricting isn’t very accurate. The people who build the maps don’t have precise data on voting habits, or even locations where people live. Census tracts might not be perfectly aligned, and data can just plain have errors and uncertainty in other respects. So the data that district-map-drawers care about is uncertain much like our point clouds. With a theory of geometry that accounts for uncertainty (and the Earthmover distance is the “distance” part of that), one can come up with more robust, better tools for redistricting.

Solomon’s website has a ton of resources about this, under the names “optimal transport” and “Wasserstein metric,” and his work extends from computing distances to computing important geometric quantities like the barycenter, along with computational advantages like parallelism.

Others in the field have come up with transparency techniques to make it clearer how the Earthmover distance relates to the geometry of the underlying space. This one is particularly fun because the explanations result in a path traveled from the start to the finish, and by setting up the underlying metric in just such a way, you can watch the distribution navigate a maze to get to its target. I like to imagine tiny ants carrying all that dirt.

Finally, the work of Shirdhonkar and Jacobs provides approximation algorithms that allow linear-time computation, instead of the worst-case-cubic runtime of a linear solver.
