Silent Duels—Parsing the Construction

Last time we discussed the setup for the silent duel problem: two players taking actions in [0,1], player 1 gets n chances to act, player 2 gets m, and each knows their probability of success when they act.

The solution is in a paper of Rodrigo Restrepo from the 1950s. In this post I’ll start detailing how I study this paper, and talk through my thought process for approaching a bag of theorems and proofs. If you want to follow along, I re-typeset the paper on Github.

Game Theory Basics

The Introduction starts with a summary of the setting of game theory. I remember most of this so I will just summarize the basics of the field. Skip ahead if you already know what the minimax theorem is, and what I mean when I say the “value” of a game.

A two-player game consists of a set of actions for each player—which may be finite or infinite, and need not be the same for both players—and a payoff function for each possible choice of actions. The payoff function is interpreted as the “utility” that player 1 gains and player 2 loses. If the payoff is negative, you interpret it as player 1 losing utility to player 2. Utility is just a fancy way of picking a common set of units for what each player treasures in their heart of hearts. Often it’s stated as money and we assume both players value cash the same way. Games in which the utility is always “one player gains exactly the utility lost by the other player” are called zero-sum.

With a finite set of actions, the payoff function is a table. For rock-paper-scissors, with player 1’s action listed first and the payoff given to player 1, the table is:

Rock, paper: -1
Rock, scissors: 1
Rock, rock: 0
Paper, paper: 0
Paper, scissors: -1
Paper, rock: 1
Scissors, paper: 1
Scissors, scissors: 0
Scissors, rock: -1

You could arrange this in a matrix and analyze the structure of the matrix, but we won’t. It doesn’t apply to our forthcoming setting where the players have infinitely many strategies.

A strategy is a possibly-randomized algorithm (whose inputs are just the data of the game, not including any past history of play) that outputs an action. In some games, the optimal strategy is to choose a single action no matter what your opponent does. This is sometimes called a pure, dominating strategy, not because it dominates your opponent, but because it’s better than all of your other options no matter what your opponent does. The output action is deterministic.

However, as with rock-paper-scissors, the optimal strategy for most interesting games requires each player to act randomly according to a fixed distribution. Such strategies are called mixed or randomized. For rock-paper-scissors, the optimal strategy is to choose rock, paper, and scissors with equal probability. Computers are only better than humans at rock-paper-scissors because humans are bad at behaving in a consistent, uniformly random way.
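To make this concrete, here’s a quick sanity check in Python (a sketch using the payoff table above) that the uniform strategy earns expected payoff zero against every fixed opponent action, and therefore against any mixed strategy:

```python
# Payoff to player 1, indexed by (player 1's action, player 2's action).
payoff = {
    ('rock', 'rock'): 0,      ('rock', 'paper'): -1,    ('rock', 'scissors'): 1,
    ('paper', 'rock'): 1,     ('paper', 'paper'): 0,    ('paper', 'scissors'): -1,
    ('scissors', 'rock'): -1, ('scissors', 'paper'): 1, ('scissors', 'scissors'): 0,
}

actions = ('rock', 'paper', 'scissors')
uniform = {a: 1 / 3 for a in actions}

# Against each fixed opponent action, the uniform mix earns 0 in expectation.
for opponent_action in actions:
    expected = sum(p * payoff[(a, opponent_action)] for a, p in uniform.items())
    print(opponent_action, expected)  # prints 0.0 in every case
```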

The famous minimax theorem says that every two-player zero-sum game has an optimal strategy for each player, which is possibly randomized. This strategy is optimal in the sense that it maximizes your expected winnings against a worst-case opponent. However, if your opponent is playing a particularly suboptimal strategy, the minimax solution might not be as good as a solution that takes advantage of the opponent’s dumb choices. A uniform random rock-paper-scissors strategy is not optimal if your opponent always plays “rock.” However, the optimal strategy doesn’t need special knowledge or space to store information about past play. If you played against God, you would blindly use the minimax strategy and God would have no upper hand. I wonder if the pope would have excommunicated me for saying that in the 1600s.

The expected winnings for player 1 when both players play a minimax-optimal strategy is called the value of the game, and this number is unique (even if there are possibly multiple optimal strategies). If a game is symmetric—meaning both players have the same actions and the payoff function is symmetric—then the value is guaranteed to be zero. The game is fair.

The version of the minimax theorem that most people use (in particular, the version that often comes up in theoretical computer science) shows that finding an optimal strategy is equivalent to solving a linear program. This is great because it means that any such (finite) game is easy to solve. You don’t need insight; just compile and run. The minimax theorem is also true for sufficiently well-behaved continuous action spaces. The silent duel is well-behaved, so our goal is to compute an explicit, easy-to-implement strategy that the minimax theorem guarantees exists. As a side note, here is an example of a poorly-behaved game with no minimax optimum.
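To illustrate the linear programming connection, here’s a sketch using scipy’s linprog (my choice of solver, not anything from Restrepo) that computes a minimax-optimal strategy for player 1 in rock-paper-scissors by maximizing the guaranteed expected payoff v:

```python
import numpy as np
from scipy.optimize import linprog

# Payoff matrix for player 1: rows are player 1's actions, columns player 2's.
# Order: rock, paper, scissors.
Psi = np.array([
    [ 0, -1,  1],
    [ 1,  0, -1],
    [-1,  1,  0],
])

n, m = Psi.shape
# Variables: p_1, ..., p_n, v.  Maximize v  <=>  minimize -v.
c = np.zeros(n + 1)
c[-1] = -1

# For each opponent action y:  v - sum_x p_x Psi[x, y] <= 0.
A_ub = np.hstack([-Psi.T, np.ones((m, 1))])
b_ub = np.zeros(m)

# The probabilities must sum to 1.
A_eq = np.zeros((1, n + 1))
A_eq[0, :n] = 1
b_eq = [1]

bounds = [(0, 1)] * n + [(None, None)]  # v is unbounded
result = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
p, v = result.x[:n], result.x[-1]
print(p, v)  # approximately [1/3, 1/3, 1/3] and 0
```

Running this recovers the uniform strategy and a value of zero, matching the symmetry argument above.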

While the minimax theorem guarantees optimal strategies and a value, the concept of the “value” of the game has an independent definition:

Let X, Y be finite sets of actions for players 1, 2 respectively, and p(x), q(y) be strategies, i.e., probability distributions over X and Y so that p(x) is the probability that x is chosen. Let \Psi(x, y) be the payoff function for the game. The value of the game is a real number v such that there exist two strategies p, q with the two following properties. First, for every fixed y \in Y,

\displaystyle \sum_{x \in X} p(x) \Psi(x, y) \geq v

(no matter what player 2 does, player 1’s strategy guarantees at least v payoff), and for every fixed x \in X,

\displaystyle \sum_{y \in Y} q(y) \Psi(x, y) \leq v

(no matter what player 1 does, player 2’s strategy prevents a loss of more than v).

Since silent duels are continuous, Restrepo opens the paper with the corresponding definition for continuous games. Here a probability distribution is the same thing as a “positive measure with total measure 1.” Restrepo uses F and G for the strategies, and the corresponding statement of expected payoff for player 1 is that, for all fixed actions y \in Y,

\displaystyle \int \Psi(x, y) dF(x) \geq v

And likewise, for all x \in X,

\displaystyle \int \Psi(x, y) dG(y) \leq v

All of this background gets us through the very first paragraph of the Restrepo paper. As I elaborate in my book, this is par for the course for math papers, because written math is optimized for experts already steeped in the context. Restrepo assumes the reader knows basic game theory so we can get on to the details of his construction, at which point he slows down considerably to focus on the details.

Description of the Optimal Strategies

Starting in section 2, Restrepo describes the construction of the optimal strategy, but first he explains the formal details of the setting of the game. We already know the two players are taking n and m actions between 0 \leq t \leq 1, but we also fix the probability of success. Player 1 knows a distribution P(t) on [0,1] for which P(t) is the probability of success when acting at time t. Likewise, player 2 has a possibly different distribution Q(t), and (crucially) P(t), Q(t) both increase continuously on [0,1]. (In section 3 he clarifies further that P satisfies P(0) = 0, P(1) = 1, and P'(t) > 0, likewise for Q(t).) Moreover, both players know both P, Q. One could say that each player has an estimate of their opponent’s firing accuracy, and wants to be optimal compared to that estimate.

The payoff function \Psi(x, y) is defined informally as: 1 if Player 1 succeeds before Player 2, -1 if Player 2 succeeds first, and 0 if both players exhaust their actions and neither succeeds. Though Restrepo does not state it, if the players act and succeed at the same time—say both players fire at time t=1—the payoff should also be zero. We’ll see how this is converted to a more formal (and cumbersome!) mathematical definition in a future post.
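As a rough sketch of my own (not Restrepo’s formal definition, which comes later), here is the expected payoff for the special case of one action per player, computed directly from the informal description above:

```python
def expected_payoff(x, y, P, Q):
    """Expected payoff to Player 1 when Player 1 acts at time x and Player 2
    at time y, each with a single action. A sketch based on the informal
    description above, not Restrepo's formal definition."""
    if x < y:
        # Player 1 acts first; if it fails, Player 2 still gets to act.
        return P(x) - (1 - P(x)) * Q(y)
    elif x > y:
        # Player 2 acts first.
        return -Q(y) + (1 - Q(y)) * P(x)
    else:
        # Simultaneous actions: mutual success cancels to zero.
        return P(x) - Q(y)
```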

Next we’ll describe the statement of the fully general optimal strategy (which will be essentially meaningless, but have some notable features we can infer information from), and get a sneak peek at how to build this strategy algorithmically. Then we’ll see a simplified example of the optimal strategy.

The optimal strategy presented depends only on the values n, m (the number of actions each player gets) and their success probability distributions P, Q. For player 1, the strategy splits up [0,1] into subintervals

\displaystyle [a_i, a_{i+1}] \qquad 0 < a_1 < a_2 < \cdots < a_n < a_{n+1} = 1

Crucially, this strategy ignores the initial interval [0, a_1]. Within each subinterval [a_i, a_{i+1}], Player 1 attempts an action at a time chosen by a probability distribution specific to that interval, independently of previous attempts. But no matter what, there is some initial wait time during which no action will ever be taken. This makes sense: firing at time 0 is a guaranteed wasted shot, and firing at time 0.000001 is basically wasted too (due to continuity, unless P(t) is obnoxiously steep early on).

Likewise for player 2, the optimal strategy is determined by numbers b_1, \dots, b_m resulting in m intervals [b_j, b_{j+1}] with b_{m+1} = 1.

The difficult part of the construction is describing the distributions dictating when a player should act during an interval. It’s difficult because an interval for player 1 and player 2 can overlap partially. Maybe a_2 = 0.5, a_3 = 0.75 and b_1 = 0.25, b_2 = 0.6. Player 1 knows that Player 2 (using their corresponding minimax strategy) must act before time t = 0.6, and gets another chance after that time. This suggests that the distribution determining when Player 1 should act within [a_2, a_3] may have a discontinuous jump at t = 0.6.

Call F_i the distribution for Player 1 to act in the interval [a_i, a_{i+1}]. Since it is a continuous distribution, Restrepo uses F_i for the cumulative distribution function and dF_i for the probability density function. Then these functions are defined by (note this should be mostly meaningless for the moment)

\displaystyle dF_i(x_i) = \begin{cases} h_i f^*(x_i) dx_i & \textup{ if } a_i < x_i < a_{i+1} \\ 0 & \textup{ if } x_i \not \in [a_i, a_{i+1}] \\ \end{cases}

where f^* is defined as

\displaystyle f^*(t) = \left [ \prod_{b_j > t} \left ( 1 - Q(b_j) \right ) \right ] \frac{Q'(t)}{Q^2(t) P(t)}.

The constants h_i and h_{i+1} are related by the equation

\displaystyle h_i = [1 - D_i] h_{i+1},

where

\displaystyle D_i = \int_{a_i}^{a_{i+1}} P(t) dF_i(t)

What can we glean from this mashup of symbols? The first is that (obviously) the distribution is zero outside the interval [a_i, a_{i+1}]. Within it, there is this mysterious h_i that is related to the h_{i+1} used to define the next interval’s probability. This suggests we will likely build up the strategy in reverse starting with F_n as the “base case” (if n=1, then it is the only one).

Next, we notice the curious definition of f^*. It unsurprisingly requires knowledge of both P and Q, but the coefficient is strangely chosen: it’s the product of the failure probabilities 1 - Q(b_j) over all of the opponent’s interval-starts b_j that occur after time t.

[Side note: it’s very important that this coefficient is piecewise constant in t—it only changes when t crosses one of the b_j. When I first read this, I thought the product was \prod_{b_j > t}[1 - Q(t)], which would make the eventual task of integrating f^* much harder.]

Finally, the last interval (the one ending at t=1) may include the option to simply “wait for a guaranteed hit,” which Restrepo calls a “discrete mass of \alpha at t=1.” That is, F_n may have a different representation than the rest. Indeed, at the end of the paper we will find that Restrepo gives a base-case definition for h_n that allows us to bootstrap the construction.

Player 2’s strategy is the same as Player 1’s, but replacing the roles of P, Q, n, m, a_i, b_j in the obvious way.
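Before moving on, here’s a hedged numerical sketch (my own scaffolding, not Restrepo’s algorithm) of how f^* and the D_i integrals could be computed, assuming the partition points b_j and a constant h_i are already known:

```python
import numpy as np
from scipy.integrate import quad

# Example choices; the symmetric example below uses P(t) = Q(t) = t.
P = lambda t: t
Q = lambda t: t
dQ = lambda t: 1.0  # Q'(t)

def f_star(t, bs):
    """f*(t) = [prod over b_j > t of (1 - Q(b_j))] * Q'(t) / (Q(t)^2 P(t))."""
    coeff = np.prod([1 - Q(b) for b in bs if b > t])  # empty product is 1
    return coeff * dQ(t) / (Q(t) ** 2 * P(t))

def D_i(a_i, a_next, h_i, bs):
    """D_i = integral over [a_i, a_{i+1}] of P(t) dF_i(t),
    where dF_i = h_i f*(t) dt on that interval."""
    value, _ = quad(lambda t: P(t) * h_i * f_star(t, bs), a_i, a_next)
    return value

# Given h_{i+1} and D_i, the recurrence h_i = (1 - D_i) h_{i+1}
# chains the constants backwards from the base case h_n.
```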

The symmetric example

As with most math research, the best way to parse a complicated definition or construction is to simplify the different aspects of the problem until they become tractable. One way to do this is to have only a single action for both players, with P = Q. Restrepo provides a more general example to demonstrate, which results in the five most helpful lines in the paper. I’ll reproduce them here verbatim:

EXAMPLE. Symmetric Game: P(t) = Q(t), and n = m. In this case the two
players have the same optimal strategies; \alpha = 0, and a_k = b_k, k=1, \dots, n. Furthermore

\displaystyle \begin{aligned} P(a_{n-k}) &= \frac{1}{2k+3} & k = 0, 1, \dots, n-1, \\ dF_{n-k}(t) &= \frac{1}{4(k+1)} \frac{P'(t)}{P^3(t)} dt & a_{n-k} < t < a_{n-k+1}. \end{aligned}

Saying \alpha = 0 means there is no “wait until t=1 to guarantee a hit”, which makes intuitive sense. You’d only want to do that if your opponent has exhausted all their actions before the end, which is only likely to happen if they have fewer actions than you do.

When Restrepo writes P(a_{n-k}) = \frac{1}{2k+3}, there are a few things happening. First, we confirm that we’re working backwards from a_n. Second, he’s implicitly saying “choose a_{n-k} such that P(a_{n-k}) has the desired cumulative density.” After a bit of reflection, there’s no other way to specify the a_i except implicitly: we don’t have a formula for P to lean on.

Finally, the definition of the density function dF_{n-k}(t) helps us understand under what conditions the density would be increasing or decreasing from the start of the interval to the end. Looking at the expression P'(t) / P^3(t), we can see that a polynomial P will result in an expression dominated by 1/t^k for some k, which is decreasing. Taking the derivative, an increasing density would have to come from a P satisfying P''(t) P(t) - 3(P'(t))^2 > 0. However, I wasn’t able to find any examples that satisfy this. Polynomials, square roots, logs, and exponentials all seem to result in decreasing density functions.
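Here’s a small sympy check of that claim for a few candidate P’s; the sign of P''(t) P(t) - 3 P'(t)^2 determines whether the density is increasing:

```python
import sympy as sp

t = sp.symbols('t', positive=True)

# The density P'/P^3 is increasing exactly where P''(t) P(t) - 3 P'(t)^2 > 0.
for P in [t, t**2, sp.sqrt(t), t**3]:
    expr = sp.simplify(sp.diff(P, t, 2) * P - 3 * sp.diff(P, t) ** 2)
    print(P, expr)  # each expression is negative for t > 0
```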

Now we’ll plot two examples. The first is the most reductive: P(t) = Q(t) = t and n = m = 1. There is only one term, k = 0, for which a_1 = 1/3. Then dF_1(t) = \frac{1}{4t^3} dt. (For verification, note the integral of dF_1 on [1/3, 1] is indeed 1.)
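As a quick numerical unit test of that claim (a sketch using scipy):

```python
from scipy.integrate import quad

dF1 = lambda t: 1 / (4 * t ** 3)

total, _ = quad(dF1, 1 / 3, 1)
print(total)               # 1.0, so dF_1 is a probability distribution on [1/3, 1]
print(dF1(1 / 3), dF1(1))  # 6.75 and 0.25, the endpoints of the plot below
```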

[Image: restrepo-1-over-4tcubed.png]

With just one action and P(t) = Q(t) = t, the region before t = 1/3 has zero probability, and the density decreases from 6.75 to 1/4.
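Combining the single-action payoff sketch from earlier with this dF_1, we can also numerically check the defining property of the value: against this strategy, the opponent’s expected payoff is 0 no matter when they act at or after t = 1/3, and worse for them if they act earlier:

```python
from scipy.integrate import quad

P = lambda t: t  # P = Q in the symmetric example

def K(x, y):
    """Single-action expected payoff sketch from earlier, with P = Q."""
    if x < y:
        return P(x) - (1 - P(x)) * P(y)
    if x > y:
        return -P(y) + (1 - P(y)) * P(x)
    return 0.0

dF1 = lambda t: 1 / (4 * t ** 3)

for y in [0.2, 1 / 3, 0.5, 0.75, 1.0]:
    value, _ = quad(lambda x: K(x, y) * dF1(x), 1 / 3, 1)
    print(y, round(value, 6))  # 0 for y >= 1/3 (up to numerical error),
                               # positive for y < 1/3
```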

Note that the reason a_n = 1/3 is so nice is that P(t) is so simple. If P(t) were, say, t^2, then a_n should shift to being \sqrt{1/3}. If P(t) were more complicated, we’d have to invert it (or use an approximate search) to find the location a_n for which P(a_n) = 1/3.
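Here’s what that approximate search might look like: a bisection sketch, valid because P is continuous and strictly increasing:

```python
def find_a(P, target, lo=0.0, hi=1.0, tol=1e-12):
    """Bisection search for the point a with P(a) = target."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if P(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(find_a(lambda t: t, 1 / 3))      # 0.3333...
print(find_a(lambda t: t**2, 1 / 3))   # 0.5773... = sqrt(1/3)
```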

Next, we loosen the example to let n = m = 4, still with P(t) = Q(t) = t. In this case, we have the same final interval [1/3, 1]. The new actions all occur before t = 1/3, in the intervals [1/5, 1/3], [1/7, 1/5], and [1/9, 1/7]. If there were more actions, we’d get smaller intervals with inverse-odd endpoints approaching zero. The probability densities are steeper versions of the same \frac{1}{4t^3} shape, with the leading constant shrinking to keep each distribution normalized as 1/t^3 grows on the earlier intervals. For example, the earliest interval results in \int_{1/9}^{1/7} \frac{1}{16t^3} dt = 1. Closer to zero the densities are somewhat shallower relative to the size of the interval; for example, in [1/9, 1/7], the density at the beginning of the interval is only about twice as large as the density at the end.
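Here’s a sketch that computes the four intervals and verifies that each density integrates to 1:

```python
from scipy.integrate import quad

n = 4
# P(a_{n-k}) = 1/(2k+3) and P(t) = t, so a_{n-k} = 1/(2k+3).
a = {n - k: 1 / (2 * k + 3) for k in range(n)}  # a_1 = 1/9, ..., a_4 = 1/3
a[n + 1] = 1.0

for k in range(n):
    i = n - k
    density = lambda t, k=k: 1 / (4 * (k + 1) * t ** 3)  # dF_{n-k}
    total, _ = quad(density, a[i], a[i + 1])
    print(f"interval [{a[i]:.4f}, {a[i + 1]:.4f}]: integral = {total:.6f}")
# Each integral is 1, confirming each dF_i is a probability distribution.
```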

[Image: restrepo-four-actions.png]

The combination of the four F_i’s for the four intervals in which actions are taken. This is a complete description of the optimal strategy for our simple symmetric version of the silent duel.

Since the early intervals get smaller and smaller as we add more actions, the optimal strategy resembles a burst of action at the beginning, gradually tapering off as the accuracy increases and we work through our budget. This is an explicit tradeoff between making lots of early, low-probability attempts and keeping some actions in reserve for the end, when you’re likely to succeed.

Next step: get to the example from the general theorem

At this point, we’ve parsed the general statement of the theorem, and while much of it is still mysterious, we extracted some useful qualitative information from the statement, and tinkered with some simple examples.

I now have confidence that the simple symmetric example Restrepo provided is correct; it passed some basic unit tests, like that each dF_i is normalized. My next task in fully understanding the paper is to derive the symmetric example from the general construction. We’ll do this next time, and include a program that constructs the optimal solution for any input.

Until then!

 

Silent Duels and an Old Paper of Restrepo

Two men start running at each other with loaded pistols, ready to shoot!

It’s a foggy morning for a duel. Newton and Leibniz have decided this macabre contest is the only way to settle their dispute over who invented Calculus. Each pistol is fitted with a silencer and has a single bullet. Neither can tell when the other has attempted a shot, unless, of course, they are hit.

Newton and Leibniz both know that the closer they get to their target, the higher the chance of a successful shot. Eventually they’ll be standing nose-to-nose: a guaranteed hit! But the longer each waits, the more opportunity they give their opponent to fire. If they both fire and miss, mild embarrassment ensues and they resolve to try again tomorrow. When should they shoot to maximize their chance at victory?

This is the so-called silent duel problem, and as you might have guessed it can be phrased without any violence.

Two players compete to succeed in taking some action in the interval [0,1]. They are given a function P(t) that describes the probability of success if the action is taken at time t. Since the two men are “running” at each other, P(t) is assumed to be increasing, with P(0) = 0, P(1) = 1. If Player 1 succeeds in their action first, Player 1 gets a dollar (1 unit of utility) from Player 2; if Player 2 succeeds, Player 1 loses a dollar to Player 2. What strategy should they use to maximize their expected payoff?

Yet another phrasing of the problem is that a beautiful young woman is arriving at a train station, and two suitors are competing to pick her up. If she arrives and nobody is there to pick her up, she waits for the first suitor to arrive. If a suitor arrives and the woman is not there, the suitor assumes she has already been picked up and leaves. I like the duel version better, because what self-respecting woman can’t arrange her own ride these days? Either way, neither example has aged well. We should come up with a modern version where people are racing to McDonald’s to get Mulan Szechuan Sauce.

I originally heard about this problem in a game theory course I took in undergrad, coincidentally the same class where I met my wife. See section 3 of the course notes by Anthony Mendes, which neatly describes how to solve the silent duel when P(x) = x. I remember spending a lot of time confused about this problem, and wrote out my solution over and over again until I felt I understood it.

Almost ten years later, I found a renewed interest in the silent duel when a colleague posed the following variant (having no leads on how to solve it). A government agency releases daily financial data concerning the market every morning at 6AM, and gives API access to it. In this version I’ll say that this data describes the demand for wheat and sheep. If you can get this data before anyone else, even an extra few milliseconds gives you an edge in the market for that day. You can buy up all the wheat if there’s a shortage, or short sheep futures if there’s a surplus.

There are two caveats. First caveat: 6AM is not precise because your clock deviates from the data provider’s clock. Maybe there’s a person who has to hit a button, and they took an extra few seconds to take a bite of their morning donut. If you call the API too early, it will respond, “Please try again later.” If you call the API after the data has been released, you receive the data immediately. Second caveat: since everyone is racing to get this data first, the API rate limits you to 6 requests per minute. If you go over, your account is blocked for 12 hours and you can’t get the data at all that day. You need a Scrooge-McDuckian vault of money to afford a new account, so you’re stuck with the one.

Assuming you have enough time to watch when the data gets released, you can construct a cumulative distribution P(t), which for a time t describes the probability that the data has been released before time t. I.e., the probability of success if you call the API at time t. If we assume the distribution falls within a single minute around 6AM, then we see a striking similarity between this problem and a silent duel with six shots. Perhaps it’s not exactly the same, since there are many more than two players in the game, but it’s close. Perhaps you can assume two players, but you (Player 1) get six shots and your opponent (Player 2) gets some larger number.
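Constructing that P(t) might look like the following sketch; the observed release times here are entirely hypothetical:

```python
import numpy as np

# Hypothetical observed release times (seconds after 6:00:00 AM),
# gathered by polling on previous days.
observed = np.array([12.1, 8.7, 15.3, 9.9, 11.0, 14.2, 10.5, 13.8])

def P(t):
    """Empirical CDF: estimated probability the data is out by time t."""
    return np.mean(observed <= t)

print(P(10), P(13))  # fraction of past releases by 10s and 13s after 6AM
```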

I was downright twitterpated to see a natural problem fit so neatly and unexpectedly into a bit of math I remembered, but hadn’t thought about for a decade. The solution was too detailed for me to remember it on the spot (I recall it involved some integrals and curious discontinuities), so I told my colleague I’d go find the paper that solves this problem.

Thus began my journey down the rabbit hole. The first hurdle was that I didn’t know what to call the problem. I found my professor’s notes from the course, but they didn’t provide a definitive name beyond “interval games.” After combing through some textbooks and, more helpfully, crawling back along the graph of citations, I discovered the name silent duel—and noisy duel for the variant where the shooters can hear each other’s attempts. However, no textbook I looked at actually provided a full explanation or proof of the solution. They just said, “optimal strategies have been proven to exist,” or detailed a simplification involving one or two bullets. And after a few more hours of looking I found the title of the original paper that solved this problem in the generality I wanted.

Rodrigo Restrepo. Tactical Problems Involving Several Actions. Contributions to the Theory of Games, Vol. III. 1957.

Unfortunately, I wasn’t able to find a digital copy. I did find a copy being sold on Amazon by a third-party seller. Apparently this seller bought old journal proceedings in bulk from the Bell Laboratories library after they closed down. I bought a copy, and according to the Amazon listing there are only 20-odd copies left.

I was also pleased to see the many recognizable names on the cover.

  • Rabin: Turing Award winner who invented nondeterminism as a computing concept (among many other accomplishments)
  • Gale & Shapley: inventors of the Stable Marriage algorithm, the latter of whom won the Nobel Price in Economics for subsequent work on applying it.
  • Berge: one of the leaders who established graph theory and combinatorics as mathematical disciplines in their own right
  • Karlin: a big name in math for social sciences (think of Arrow’s Impossibility Theorem).
  • Milnor: Fields medalist and heavyweight in differential topology.

I thought about how many of these old papers might be lost to history with no digital record. It’s a shame, because the silent duel is a cool problem, absent from many books, and, prompted by my recent discussions, applicable to software! Rodrigo Restrepo in particular seems to have had no PhD students. He might be a faculty emeritus at the University of British Columbia studying mathematical biology, but I wasn’t able to locate a website (or even a photo!) to cross-check publications. If any UBC math faculty read this, perhaps they can provide more details about who Dr. Restrepo is.

All of this culminated in the inevitable next steps. Buy the manuscript, re-typeset the paper in TeX, grok the theorem and the construction, put the paper on arXiv to make it accessible for the foreseeable future, and then use my newfound knowledge to corner the market on sheep futures once and for all!

I drafted the TeX rewrite (it still has a few typos), and started working through the paper. Then I realized I had committed to publishing my book by the end of 2018, so I forced myself to put it aside. Now I’ve returned to study it. I’ll detail my exploration of the paper and the code to implement the solution in subsequent posts. I intend the subsequent posts to be as much a narrative of my process of working through a paper as they are about the math itself (to be honest, the paper could be clearer, but I chalk it up to pre-computer-era descriptions of algorithms). In general, I’d like to explore more and different kinds of ways to share and explore math on the internet.

In the meantime, intrepid readers can venture forth to see the draft on Github.

Until next time!

A Programmer’s Introduction to Mathematics

For the last four years I’ve been working on a book for programmers who want to learn mathematics. It’s finally done, and you can buy it today.

The website for the book is pimbook.org, which has purchase links—paperback and ebook—and a preview of the first pages. You can see more snippets later in the book on the Amazon listing’s “Look Inside” feature.

If you’re a programmer who wants to learn math, this book is written specifically for you! Why? Because programming and math are naturally complementary, and programmers have a leg up in learning math. Many of the underlying modes of thought in mathematics are present in programming, or are otherwise easy to explain by analogies and contrasts to familiar concepts in software. I leverage that in the book so that you can internalize the insights quickly, and appreciate the nuance more deeply than most books can allow. This book is a bridge from the world of programming to the world of math from the mathematician’s perspective. As far as I know, no other book provides this.

Programs make math more interesting and applicable than it would otherwise be. Typical math writers often hold computation and algorithms at a healthy distance. Not us. We embrace computation as a prize and a principle worth fighting for. Each chapter of the book culminates in an exciting program that applies the mathematical insights from the chapter to an interesting application. The applications include cryptographic schemes, machine learning, drawing hyperbolic tessellations, and a Nobel Prize-winning algorithm from economics.

The exercises of the book also push you beyond the book itself. There’s so much math out there that you can’t learn it from a single book. Perspectives and elaborations are spread throughout books, papers, blog posts, wikis, lecture notes, math magazines, and your own scratch paper. This book will prepare you to read a variety of sources by introducing you to the standard language of math, and also push you to engage with those resources.

Finally, this book includes a healthy dose of culture. Quotes and passages from the writings of famous mathematicians, contextual explanations of cultural attitudes, and a light dose of history will provide a peek into why mathematics is the way it is today, and why at times it can seem so confounding to an outsider. Through all this, I will show what progress means for math, what attitudes and patterns will help you along the way, and how to stay sane.

Of course, I couldn’t have written the book without the encouragement and support of you, my readers. Thank you for reading, commenting, and supporting me all these years.

Order the book today! I can’t wait to hear what you think 🙂

Table of Contents for PIM

I am down to the home stretch for publishing my upcoming book, “A Programmer’s Introduction to Mathematics.” I don’t have an exact publication date—I’m self publishing—but after months of editing, I’ve only got two chapters left in which to apply edits I’ve already marked up in my physical copy. After that come some notes from external reviewers, and adding jokes, anecdotes, and fun exercises as time allows.

I’m committing to publishing by the end of the year. When that happens I’ll post here and also on the book’s mailing list. Here’s a sneak preview of the table of contents, and a shot of the cover design (still a work in progress).

[Image: photo of the cover design]