Visualizing an Assassin Puzzle

Over at Math3ma, Tai-Danae Bradley shared the following puzzle, which she also featured in a fantastic (spoiler-free) YouTube video. If you’re seeing this for the first time, watch the video first.

Consider a square in the xy-plane, and let A (an “assassin”) and T (a “target”) be two arbitrary-but-fixed points within the square. Suppose that the square behaves like a billiard table, so that any ray (a.k.a “shot”) from the assassin will bounce off the sides of the square, with the angle of incidence equaling the angle of reflection.

Puzzle: Is it possible to block any possible shot from A to T by placing a finite number of points in the square?

This puzzle found its way to me through Tai-Danae’s video, via category theorist Emily Riehl, via a talk by the recently deceased Fields Medalist Maryam Mirzakhani, who studied the problem in more generality. I’m not familiar with her work, but knowing mathematicians it’s probably set in an arbitrary complex n-manifold.

See Tai-Danae’s post for a proof, which left such an impression on me I had to dig deeper. In this post I’ll discuss a visualization I made—now posted at the end of Tai-Danae’s article—as well as here and below (to avoid spoilers). In the visualization, mouse movement chooses the firing direction for the assassin, and the target is in green. Dragging the target with the mouse updates the position of the guards. The source code is on Github.

Outline

The visualization uses d3 library, which was made for visualizations that dynamically update with data. I use it because it can draw SVGs real nice.

The meat of the visualization is in two geometric functions.

  1. Decompose a ray into a series of line segments—its path as it bounces off the walls—stopping if it intersects any of the points in the plane.
  2. Compute the optimal position of the guards, given the boundary square and the positions of the assassin and target.

Both of these functions, along with all the geometry that supports them, is in geometry.js. The rest of the demo is defined in main.js, in which I oafishly trample over d3 best practices to arrive miraculously at a working product. Critiques welcome 🙂

As with most programming and software problems, the key to implementing these functions while maintaining your sanity is breaking it down into manageable pieces. Incrementalism is your friend.

Vectors, rays, rectangles, and ray splitting

We start at the bottom with a Vector class with helpful methods for adding, scaling, and computing norms and inner products.

function innerProduct(a, b) {
  return a.x * b.x + a.y * b.y;
}

class Vector {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }

  normalized() { ... }
  norm() { ... }
  add(vector) { ... }
  subtract(vector) { ... }
  scale(length) { ... }
  distance(vector) { ... }
  midpoint(b) { ... }
}

This allows one to compute the distance between two points, e.g., with vector.subtract(otherVector).norm().

Next we define a class for a ray, which is represented by its center (a vector) and a direction (a vector).

class Ray {
  constructor(center, direction, length=100000) {
    this.center = center;
    this.length = length;

    if (direction.x == 0 && direction.y == 0) {
      throw "Can't have zero direction";
    }
    this.direction = direction.normalized();
  }

  endpoint() {
    return this.center.add(this.direction.scale(this.length));
  }

  intersects(point) {
    let shiftedPoint = point.subtract(this.center);
    let signedLength = innerProduct(shiftedPoint, this.direction);
    let projectedVector = this.direction.scale(signedLength);
    let differenceVector = shiftedPoint.subtract(projectedVector);

    if (signedLength > 0
        && this.length > signedLength
        && differenceVector.norm() < intersectionRadius) {
      return projectedVector.add(this.center);
    } else {
      return null;
    }
  }
}

The ray must be finite for us to draw it, but the length we've chosen is so large that, as you can see in the visualization, it's effectively infinite. Feel free to scale it up even longer.

assassin-puzzle

The interesting bit is the intersection function. We want to compute whether a ray intersects a point. To do this, we use the inner product as a decision rule to compute the distance of a point from a line. If that distance is very small, we say they intersect.

In our demo points are not infinitesimal, but rather have a small radius described by intersectionRadius. For the sake of being able to see anything we set this to 3 pixels. If it’s too small the demo will look bad. The ray won’t stop when it should appear to stop, and it can appear to hit the target when it doesn’t.

Next up we have a class for a Rectangle, which is where the magic happens. The boilerplate and helper methods:

class Rectangle {
  constructor(bottomLeft, topRight) {
    this.bottomLeft = bottomLeft;
    this.topRight = topRight;
  }

  topLeft() { ... }
  center() { ... }
  width() { .. }
  height() { ... }
  contains(vector) { ... }

The function rayToPoints that splits a ray into line segments from bouncing depends on three helper functions:

  1. rayIntersection: Compute the intersection point of a ray with the rectangle.
  2. isOnVerticalWall: Determine if a point is on a vertical or horizontal wall of the rectangle, raising an error if neither.
  3. splitRay: Split a ray into a line segment and a shorter ray that’s “bounced” off the wall of the rectangle.

(2) is trivial, computing some x- and y-coordinate distances up to some error tolerance. (1) involves parameterizing the ray and checking one of four inequalities. If the bottom left of the rectangle is (x_1, y_1) and the top right is (x_2, y_2) and the ray is written as \{ (c_1 + t v_1, c_2 + t v_2) \mid t > 0 \}, then—with some elbow grease—the following four equations provide all possibilities, with some special cases for vertical or horizontal rays:

\displaystyle \begin{aligned} c_2 + t v_2 &= y_2 & \textup{ and } \hspace{2mm} & x_1 \leq c_1 + t v_1 \leq x_2 & \textup{ (intersects top)} \\ c_2 + t v_2 &= y_1 & \textup{ and } \hspace{2mm} & x_1 \leq c_1 + t v_1 \leq x_2 & \textup{ (intersects bottom)} \\ c_1 + t v_1 &= x_1 & \textup{ and } \hspace{2mm} & y_1 \leq c_2 + t v_2 \leq y_2 & \textup{ (intersects left)} \\ c_1 + t v_1 &= x_2 & \textup{ and } \hspace{2mm} & y_1 \leq c_2 + t v_2 \leq y_2 & \textup{ (intersects right)} \\ \end{aligned}

In code:

  rayIntersection(ray) {
    let c1 = ray.center.x;
    let c2 = ray.center.y;
    let v1 = ray.direction.x;
    let v2 = ray.direction.y;
    let x1 = this.bottomLeft.x;
    let y1 = this.bottomLeft.y;
    let x2 = this.topRight.x;
    let y2 = this.topRight.y;

    // ray is vertically up or down
    if (epsilon > Math.abs(v1)) {
      return new Vector(c1, (v2 > 0 ? y2 : y1));
    }

    // ray is horizontally left or right
    if (epsilon > Math.abs(v2)) {
      return new Vector((v1 > 0 ? x2 : x1), c2);
    }

    let tTop = (y2 - c2) / v2;
    let tBottom = (y1 - c2) / v2;
    let tLeft = (x1 - c1) / v1;
    let tRight = (x2 - c1) / v1;

    // Exactly one t value should be both positive and result in a point
    // within the rectangle

    let tValues = [tTop, tBottom, tLeft, tRight];
    for (let i = 0; i  epsilon && this.contains(intersection)) {
        return intersection;
      }
    } 

    throw "Unexpected error: ray never intersects rectangle!";
  }

Next, splitRay splits a ray into a single line segment and the “remaining” ray, by computing the ray’s intersection with the rectangle, and having the “remaining” ray mirror the direction of approach with a new center that lies on the wall of the rectangle. The new ray length is appropriately shorter. If we run out of ray length, we simply return a segment with a null ray.

  splitRay(ray) {
    let segment = [ray.center, this.rayIntersection(ray)];
    let segmentLength = segment[0].subtract(segment[1]).norm();
    let remainingLength = ray.length - segmentLength;

    if (remainingLength < 10) {
      return {
        segment: [ray.center, ray.endpoint()],
        ray: null
      };
    }

    let vertical = this.isOnVerticalWall(segment[1]);
    let newRayDirection = null;

    if (vertical) {
      newRayDirection = new Vector(-ray.direction.x, ray.direction.y);
    } else {
      newRayDirection = new Vector(ray.direction.x, -ray.direction.y);
    }

    let newRay = new Ray(segment[1], newRayDirection, length=remainingLength);
    return {
      segment: segment,
      ray: newRay
    };
  }

As you have probably guessed, rayToPoints simply calls splitRay over and over again until the ray hits an input “stopping point”—a guard, the target, or the assassin—or else our finite ray length has been exhausted. The output is a list of points, starting from the original ray’s center, for which adjacent pairs are interpreted as line segments to draw.

  rayToPoints(ray, stoppingPoints) {
    let points = [ray.center];
    let remainingRay = ray;

    while (remainingRay) {
      // check if the ray would hit any guards or the target
      if (stoppingPoints) {
        let hardStops = stoppingPoints.map(p => remainingRay.intersects(p))
          .filter(p => p != null);
        if (hardStops.length > 0) {
          // find first intersection and break
          let closestStop = remainingRay.closestToCenter(hardStops);
          points.push(closestStop);
          break;
        }
      }

      let rayPieces = this.splitRay(remainingRay);
      points.push(rayPieces.segment[1]);
      remainingRay = rayPieces.ray;
    } 

    return points;
  }

That’s sufficient to draw the shot emanating from the assassin. This method is called every time the mouse moves.

Optimal guards

The function to compute the optimal position of the guards takes as input the containing rectangle, the assassin, and the target, and produces as output a list of 16 points.

/*
 * Compute the 16 optimal guards to prevent the assassin from hitting the
 * target.
 */
function computeOptimalGuards(square, assassin, target) {
...
}

If you read Tai-Danae’s proof, you’ll know that this construction is to

  1. Compute mirrors of the target across the top, the right, and the top+right of the rectangle. Call this resulting thing the 4-mirrored-targets.
  2. Replicate the 4-mirrored-targets four times, by translating three of the copies left by the entire width of the 4-mirrored-targets shape, down by the entire height, and both left-and-down.
  3. Now you have 16 copies of the target, and one assassin. This gives 16 line segments from assassin-to-target-copy. Place a guard at the midpoint of each of these line segments.
  4. Finally, apply the reverse translation and reverse mirroring to return the guards to the original square.

Due to WordPress being a crappy blogging platform I need to migrate off of, the code snippets below have been magically disappearing. I’ve included links to github lines as well.

Step 1 (after adding simple helper functions on Rectangle to do the mirroring):

  // First compute the target copies in the 4 mirrors
  let target1 = target.copy();
  let target2 = square.mirrorTop(target);
  let target3 = square.mirrorRight(target);
  let target4 = square.mirrorTop(square.mirrorRight(target));
  target1.guardLabel = 1;
  target2.guardLabel = 2;
  target3.guardLabel = 3;
  target4.guardLabel = 4;

Step 2:

  // for each mirrored target, compute the four two-square-length translates
  let mirroredTargets = [target1, target2, target3, target4];
  let horizontalShift = 2 * square.width();
  let verticalShift = 2 * square.height();
  let translateLeft = new Vector(-horizontalShift, 0);
  let translateRight = new Vector(horizontalShift, 0);
  let translateUp = new Vector(0, verticalShift);
  let translateDown = new Vector(0, -verticalShift);

  let translatedTargets = [];
  for (let i = 0; i < mirroredTargets.length; i++) {
    let target = mirroredTargets[i];
    translatedTargets.push([
      target,
      target.add(translateLeft),
      target.add(translateDown),
      target.add(translateLeft).add(translateDown),
    ]);
  }

Step 3, computing the midpoints:

  // compute the midpoints between the assassin and each translate
  let translatedMidpoints = [];
  for (let i = 0; i  t.midpoint(assassin)));
  }

Step 4, returning the guards back to the original square, is harder than it seems, because the midpoint of an assassin-to-target-copy segment might not be in the same copy of the square as the target-copy being fired at. This means you have to detect which square copy the midpoint lands in, and use that to determine which operations are required to invert. This results in the final block of this massive function.

  // determine which of the four possible translates the midpoint is in
  // and reverse the translation. Since midpoints can end up in completely
  // different copies of the square, we have to check each one for all cases.
  function untranslate(point) {
    if (point.x  square.bottomLeft.y) {
      return point.add(translateRight);
    } else if (point.x >= square.bottomLeft.x && point.y <= square.bottomLeft.y) {
      return point.add(translateUp);
    } else if (point.x < square.bottomLeft.x && point.y <= square.bottomLeft.y) {
      return point.add(translateRight).add(translateUp);
    } else {
      return point;
    }
  }

  // undo the translations to get the midpoints back to the original 4-mirrored square.
  let untranslatedMidpoints = [];
  for (let i = 0; i  square.topRight.x && point.y > square.topRight.y) {
      return square.mirrorTop(square.mirrorRight(point));
    } else if (point.x > square.topRight.x && point.y <= square.topRight.y) {
      return square.mirrorRight(point);
    } else if (point.x  square.topRight.y) {
      return square.mirrorTop(point);
    } else {
      return point;
    }
  }

  return untranslatedMidpoints.map(unmirror);

And that’s all there is to it!

Improvements, if I only had the time

There are a few improvements I’d like to make to this puzzle, but haven’t made the time (I’m writing a book, after all!).

  1. Be able to drag the guards around.
  2. Create new guards from an empty set of guards, with a button to “reveal” the solution.
  3. Include a toggle that, when pressed, darkens the entire region of the square that can be hit by the assassin. For example, this would allow you to see if the target is in the only possible safe spot, or if there are multiple safe spots for a given configuration.
  4. Perhaps darken the vulnerable spots by the number of possible paths that hit it, up to some limit.
  5. The most complicated one: generalize to an arbitrary polygon (convex or not!), for which there may be no optional solution. The visualization would allow you to look for a solution using 2-4.

Pull requests are welcome if you attempt any of these improvements.

Until next time!

A parlor trick for SET

Tai-Danae Bradley is one of the hosts of PBS Infinite Series, a delightful series of vignettes into fun parts of math. The video below is about the same of SET, a favorite among mathematicians. Specifically, Tai-Danae explains how SET cards lie in (using more technical jargon) a vector space over a finite field, and that valid sets correspond to lines. If you don’t immediately know how this would work, watch the video.

In this post I want to share a parlor trick for SET that I originally heard from Charlotte Chan. It uses the same ideas from the video above, which I’ll only review briefly.

In the game of SET you see a board of cards like the following, and players look for sets.

SetCards

Image source: theboardgamefamily.com

A valid set is a triple of cards where, feature by feature, the characteristics on the cards are either all the same or all different. A valid set above is {one empty blue oval, two solid blue ovals, three shaded blue ovals}. The feature of “fill” is different on all the cards, but the feature of “color” is the same, etc.

In a game of SET, the cards are dealt in order from a shuffled deck, players race to claim sets, removing the set if it’s valid, and three cards are dealt to replace the removed set. Eventually the deck is exhausted and the game is over, and the winner is the player who collected the most sets.

There are a handful of mathematical tricks you can use to help you search for sets faster, but the parlor trick in this post adds a fun variant to the end of the game.

Play the game of SET normally, but when you get down to the last card in the deck, don’t reveal it. Keep searching for sets until everyone agrees no visible sets are left. Then you start the variant: the first player to guess the last un-dealt card in the deck gets a bonus set.

The math comes in when you discover that you don’t need to guess, or remember anything about the game that was just played! A clever stranger could walk into the room at the end of the game and win the bonus point.

Theorem: As long as every player claimed a valid set throughout the game, the information on the remaining board uniquely determines the last (un-dealt) card.

Before we get to the proof, some reminders. Recall that there are four features on a SET card, each of which has three options. Enumerate the options for each feature (e.g., {Squiggle, Oval, Diamond} = {0, 1, 2}).

While we will not need the geometry induced by this, this implies each card is a vector in the vector space \mathbb{F}_3^4, where \mathbb{F}_3 = \mathbb{Z}/3\mathbb{Z} is the finite field of three elements, and the exponent means “dimension 4.” As Tai-Danae points out in the video, each SET is an affine line in this vector space. For example, if this is the enumeration:

joyofset

Source: “The Joy of Set

Then using the enumeration, a set might be given by

\displaystyle \{ (1, 1, 1, 1), (1, 2, 0, 1), (1, 0, 2, 1) \}

The crucial feature for us is that the vector-sum (using the modular field arithmetic on each entry) of the cards in a valid set is the zero vector (0, 0, 0, 0). This is because 1+1+1 = 0, 2+2+2 = 0, and 1+2+3=0 are all true mod 3.

Proof of Theorem. Consider the vector-valued invariant S_t equal to the sum of the remaining cards after t sets have been taken. At the beginning of the game the deck has 81 cards that can be partitioned into valid sets. Because each valid set sums to the zero vector, S_0 = (0, 0, 0, 0). Removing a valid set via normal play does not affect the invariant, because you’re subtracting a set of vectors whose sum is zero. So S_t = 0 for all t.

At the end of the game, the invariant still holds even if there are no valid sets left to claim. Let x be the vector corresponding to the last un-dealt card, and c_1, \dots, c_n be the remaining visible cards. Then x + \sum_{i=1}^n c_i = (0,0,0,0), meaning x = -\sum_{i=1}^n c_i.

\square

I would provide an example, but I want to encourage everyone to play a game of SET and try it out live!

Charlotte, who originally showed me this trick, was quick enough to compute this sum in her head. So were the other math students we played SET with. It’s a bit easier than it seems since you can do the sum feature by feature. Even though I’ve known about this trick for years, I still require a piece of paper and a few minutes.

Because this is Math Intersect Programming, the reader is encouraged to implement this scheme as an exercise, and simulate a game of SET by removing randomly chosen valid sets to verify experimentally that this scheme works.

Until next time!

Earthmover Distance

Problem: Compute distance between points with uncertain locations (given by samples, or differing observations, or clusters).

For example, if I have the following three “points” in the plane, as indicated by their colors, which is closer, blue to green, or blue to red?

example-points.png

It’s not obvious, and there are multiple factors at work: the red points have fewer samples, but we can be more certain about the position; the blue points are less certain, but the closest non-blue point to a blue point is green; and the green points are equally plausibly “close to red” and “close to blue.” The centers of masses of the three sample sets are close to an equilateral triangle. In our example the “points” don’t overlap, but of course they could. And in particular, there should probably be a nonzero distance between two points whose sample sets have the same center of mass, as below. The distance quantifies the uncertainty.

same-centers.png

All this is to say that it’s not obvious how to define a distance measure that is consistent with perceptual ideas of what geometry and distance should be.

Solution (Earthmover distance): Treat each sample set A corresponding to a “point” as a discrete probability distribution, so that each sample x \ in A has probability mass p_x = 1 / |A|. The distance between A and B is the optional solution to the following linear program.

Each x \in A corresponds to a pile of dirt of height p_x, and each y \in B corresponds to a hole of depth p_y. The cost of moving a unit of dirt from x to y is the Euclidean distance d(x, y) between the points (or whatever hipster metric you want to use).

Let z_{x, y} be a real variable corresponding to an amount of dirt to move from x \in A to y \in B, with cost d(x, y). Then the constraints are:

  • Each z_{x, y} \geq 0, so dirt only moves from x to y.
  • Every pile x \in A must vanish, i.e. for each fixed x \in A, \sum_{y \in B} z_{x,y} = p_x.
  • Likewise, every hole y \in B must be completely filled, i.e. \sum_{y \in B} z_{x,y} = p_y.

The objective is to minimize the cost of doing this: \sum_{x, y \in A \times B} d(x, y) z_{x, y}.

In python, using the ortools library (and leaving out a few docstrings and standard import statements, full code on Github):

from ortools.linear_solver import pywraplp

def earthmover_distance(p1, p2):
    dist1 = {x: count / len(p1) for (x, count) in Counter(p1).items()}
    dist2 = {x: count / len(p2) for (x, count) in Counter(p2).items()}
    solver = pywraplp.Solver('earthmover_distance', pywraplp.Solver.GLOP_LINEAR_PROGRAMMING)

    variables = dict()

    # for each pile in dist1, the constraint that says all the dirt must leave this pile
    dirt_leaving_constraints = defaultdict(lambda: 0)

    # for each hole in dist2, the constraint that says this hole must be filled
    dirt_filling_constraints = defaultdict(lambda: 0)

    # the objective
    objective = solver.Objective()
    objective.SetMinimization()

    for (x, dirt_at_x) in dist1.items():
        for (y, capacity_of_y) in dist2.items():
            amount_to_move_x_y = solver.NumVar(0, solver.infinity(), 'z_{%s, %s}' % (x, y))
            variables[(x, y)] = amount_to_move_x_y
            dirt_leaving_constraints[x] += amount_to_move_x_y
            dirt_filling_constraints[y] += amount_to_move_x_y
            objective.SetCoefficient(amount_to_move_x_y, euclidean_distance(x, y))

    for x, linear_combination in dirt_leaving_constraints.items():
        solver.Add(linear_combination == dist1[x])

    for y, linear_combination in dirt_filling_constraints.items():
        solver.Add(linear_combination == dist2[y])

    status = solver.Solve()
    if status not in [solver.OPTIMAL, solver.FEASIBLE]:
        raise Exception('Unable to find feasible solution')

    return objective.Value()

Discussion: I’ve heard about this metric many times as a way to compare probability distributions. For example, it shows up in an influential paper about fairness in machine learning, and a few other CS theory papers related to distribution testing.

One might ask: why not use other measures of dissimilarity for probability distributions (Chi-squared statistic, Kullback-Leibler divergence, etc.)? One answer is that these other measures only give useful information for pairs of distributions with the same support. An example from a talk of Justin Solomon succinctly clarifies what Earthmover distance achieves

Screen Shot 2018-03-03 at 6.11.00 PM.png

Also, why not just model the samples using, say, a normal distribution, and then compute the distance based on the parameters of the distributions? That is possible, and in fact makes for a potentially more efficient technique, but you lose some information by doing this. Ignoring that your data might not be approximately normal (it might have some curvature), with Earthmover distance, you get point-by-point details about how each data point affects the outcome.

This kind of attention to detail can be very important in certain situations. One that I’ve been paying close attention to recently is the problem of studying gerrymandering from a mathematical perspective. Justin Solomon of MIT is a champion of the Earthmover distance (see his fascinating talk here for more, with slides) which is just one topic in a field called “optimal transport.”

This has the potential to be useful in redistricting because of the nature of the redistricting problem. As I wrote previously, discussions of redistricting are chock-full of geometry—or at least geometric-sounding language—and people are very concerned with the apparent “compactness” of a districting plan. But the underlying data used to perform redistricting isn’t very accurate. The people who build the maps don’t have precise data on voting habits, or even locations where people live. Census tracts might not be perfectly aligned, and data can just plain have errors and uncertainty in other respects. So the data that district-map-drawers care about is uncertain much like our point clouds. With a theory of geometry that accounts for uncertainty (and the Earthmover distance is the “distance” part of that), one can come up with more robust, better tools for redistricting.

Solomon’s website has a ton of resources about this, under the names of “optimal transport” and “Wasserstein metric,” and his work extends from computing distances to computing important geometric values like the barycenter, computational advantages like parallelism.

Others in the field have come up with transparency techniques to make it clearer how the Earthmover distance relates to the geometry of the underlying space. This one is particularly fun because the explanations result in a path traveled from the start to the finish, and by setting up the underlying metric in just such a way, you can watch the distribution navigate a maze to get to its target. I like to imagine tiny ants carrying all that dirt.

Screen Shot 2018-03-03 at 6.15.50 PM.png

Finally, work of Shirdhonkar and Jacobs provide approximation algorithms that allow linear-time computation, instead of the worst-case-cubic runtime of a linear solver.

NP-hard does not mean hard

When NP-hardness pops up on the internet, say because some silly blogger wants to write about video games, it’s often tempting to conclude that the problem being proved NP-hard is actually very hard!

“Scientists proved Super Mario is NP-hard? I always knew there was a reason I wasn’t very good at it!” Sorry, these two are unrelated. NP-hardness means hard in a narrow sense this post should hopefully make clear. After that, we’ll explore what “hard” means in a mathematical sense that you can apply beyond NP-hardness to inform your work as a programmer.

When a problem is NP-hard, that simply means that the problem is sufficiently expressive that you can use the problem to express logic. By which I mean boolean formulas using AND, OR, and NOT. In the Super Mario example, the “problem” is a bundle of (1) the controls for the player (2) the allowed tiles and characters that make up a level, and (3) the goal of getting from the start to the end. Logic formulas are encoded in the creation of a level, and solving the problem (completing the level) is the same as finding conditions to make the logical formula true.

mario-clause-gadget

The clause gadget for the original Super Mario Brothers, encoding an OR of three variables.

In this sense, NP-hardness doesn’t make all of Super Mario hard. The levels designed to encode logical formulas are contrived, convoluted, and contorted. They abuse the rules of the game in order to cram boolean logic into it. These are worst case levels. It’s using Mario for a completely unintended purpose, not unlike hacking. And so NP-hardness is a worst case claim.

To reiterate, NP-hardness means that Super Mario has expressive power. So expressive that it can emulate other problems we believe are hard in the worst case. And, because the goal of mathematical “hardness” is to reason about the limitations of algorithms, being able to solve Super Mario in full generality implies you can solve any hard subproblem, no matter how ridiculous the level design.

The P != NP conjecture says that there’s no polynomial time algorithm to determine whether boolean logic formulas are satisfiable, and so as a consequence Super Mario (in full generality) also has no polynomial time algorithm.

That being said, in reality Super Mario levels do not encode logical formulas! If you use the knowledge that real-world Super Mario levels are designed in the way they are (to be solvable, fun), then you can solve Super Mario with algorithms. There are many examples.

In general, the difficulty of a problem for humans is unrelated to the difficulty for algorithms. Consider multiplication of integers. This is a trivial problem for computers to solve, but humans tend to struggle with it. It’s an amazing feat to be able to multiply two 7 digit numbers in less than 5 seconds, whereas computers can multiply two thousand-digit numbers in milliseconds.

Meanwhile, protein folding is known to be an NP-hard problem, but it’s been turned into a game sufficiently easy for humans to solve that players have contributed to scientific research. Indeed, even some of the most typically cited NP-hard problems, like traveling salesman, have heuristic, practical algorithmic solutions that allow one to solve them (very close to optimally) in hours on inputs as large as every city on earth.

So the mathematical notions of hardness are quite disconnected from practical notions of hardness. This is not even to mention that some NP-hard problems can be efficiently approximated to within any desired accuracy.

Let’s dig into the math a bit more. “Hardness” is a family of ideas about comparisons between problems based on reusability of algorithmic solutions. Loosely speaking, a problem R is hard with respect to a class of problems C if an algorithm solving R can be easily transformed into an algorithm solving any problem in C. You have to say what kinds of transformations are allowed, and the transformation can be different for different target problems in C, but that’s the basic idea.

In the Super Mario example, if you want to solve logical formulas, you can transform a hypothetically perfect mario-level-playing algorithm into a logic solver by encoding the formula as a level and running the mario-level-playing algorithm on it as a black box. Add an if statement to the end to translate “level can/can’t be finished” to “formula can/can’t be satisfied,” and the transformation is complete. It’s important for NP-hardness that the transformation only takes polynomial time. Other kinds of hardness might admit more or restrict to fewer resources.

And so this is what makes Mario NP-hard, because boolean logic satisfiability is NP-hard. Any problem in NP can be solved by a boolean logic solver, and hence also by a mario-level-player. The fact that boolean logic solving is NP-hard is a difficult theorem to prove. But if we assume it’s true, you can compose the transformations to get from any NP problem to Super Mario.

As a simple example of a different kind of hardness, you can let C be the class of problems solvable using only a finite amount of memory (independent of the input). You have probably heard of this class of problems by another name, but I’ll keep you guessing until the end of the post. A C-hard problem R is one for which an algorithmic solution can be repurposed to solve any finite-memory-solvable problem.

We have to be careful: if the transformation between solutions allows us polynomial time (in the size of the input) like it did for NP-hardness, then we might have enough time in the transformation alone to solve the entire problem, removing the need for a solution to R in the first place! For this reason, we have to limit the amount of work that can be done in the transformation. We get a choice here that influences how interesting or useful the definition of hardness is, but let’s just pick one and say that the transformation can only use finite time (independent of the input).

To be fair, I actually don’t know if there are any hard problems with respect to this definition. There probably are, but chances are good that they are not members of C, and that’s where the definition of hardness gets really interesting. If you have a problem in C which is also C-hard, it’s called complete for C. And once you’ve found a complete problem, from a theoretical perspective you’re a winner. You’ve found a problem which epitomizes the difficulty of solving problems in C. And so it’s a central aim of researchers studying a complexity class to find complete problems. As they say in the business, “ABC: always be completing.”

As a more concrete and interesting example, the class P of all polynomial-time solvable problems has a complete problem. Here the transformations are a bit up in the air. They could either be logarithmic-space computations, or what’s called NC, which can be thought of as poly-logarithmic time (very fast) parallel computations. I only mention NC because it allows you to say “P-complete problems are hard to parallelize.”

Regardless of the choice, there are a number of very useful problems known to be P-complete. The first is the Circuit Value Problem, given a circuit (described by its gates and wires using any reasonable encoding) and an input to the circuit, what is the output?

Others include linear programming (optimize this linear function with respect to linear constraints), data compression (does the compressed version of a string s using Lempel–Ziv–Welch contain a string t?), and type inference for partial types. There are many more in this compendium of Greenlaw et al. Each one is expressive enough to encode any instance of the other, and any instance of any problem in P. It’s quite curious to think that gzip can solve linear programs, but that’s surely no curiouser than super mario levels encoding boolean logic.

Just as with NP-hardness, when a problem is P-hard that doesn’t automatically mean it’s easy or hard for humans, or that typical instances can’t be easily parallelized. P-hardness is also a worst case guarantee.

Studying P-completeness is helpful in the same way NP-completeness is helpful. Completeness informs you about whether you should hope to find a perfect solution or be content with approximations and heuristics (or incorporate problem context to make it easier). Knowing a problem is P-complete means you should not expect perfect efficient parallel algorithms, or perfect efficient algorithms that use severely limited space. Knowing a problem is NP-hard means you should not expect a perfect polynomial time solution. In other words, if you are forced to work with those restrictions, the game becomes one of tradeoffs. Hardness and completeness focus and expedite your work, and clarify a principled decision making process.

Until next time!

P.S. The class of problems solvable in a finite amount of memory is just the class of regular languages. The “finite memory” is the finite state machine used to solve them.