|Making Hybrid Images||Neural Networks
|Bezier Curves and Picasso||Computing Homology||Probably Approximately
Correct – A Formal Theory
At Google, our organization designs, owns, and maintains a number of optimization models that automate the planning of Google’s datacenter growth and health. As is pretty standard in supply chain optimization and planning, these models are often integer linear programs. It’s a core competency of operations research, after all.
One might think, “Large optimization problems? That sounds hard!” But it’s actually far from the hardest part of the job. In fact, it’s one of the few exciting parts of the job. The real hard part is getting data. Really, it’s that you get promised data that never materializes, and then you architect your system for features that rot before they ripen.
There’s a classic image of a human acting as if they’re throwing a ball for a dog, and the dog sprints off, only soon to realize the ball was never thrown. The ball is the promise of freshly maintained data, and recently I’ve been the dog.
When you don’t have good data, or you have data that’s bad in a known way, you can always try to design your model to accommodate for the deficiencies. As long as it’s clearly defined, it’s not beyond our reach. The math is fun and challenging, and I don’t want to shy away from it. My mathematician’s instinct pulls me left.
My instincts as an engineer pull me right: data issues will ultimately cause unexpected edge cases at the worst moment, and it will fall on me to spend all day debugging for a deadline tomorrow. Data issues lead to more complicated modeling features which further interact with other parts of the model and the broader system in confounding ways. Worst of all, it’s nearly impossible to signal problems to customers who depend on your output. When technical debt is baked into an optimization model as features, it makes explanation much harder. Accepting bad data also requires you write the code in a way that is easy to audit, since you need to audit literally everything. Transparency is good, but it’s tedious to do it generically well, and the returns are not worth it if the end result is, “well we can’t fix the data for two years anyway.”
Though a lot of this technical debt was introduced by predecessors who left the team, I’ve fallen for the mathematical siren’s call a few times. Go on, just add that slick new constraint. Just mask that misbehavior with a heuristic. It’s bit back hard and caused months of drag.
These days I’m swinging hard right on the pendulum. Delete half-implemented features that don’t have data to support them. Delete features that don’t have a clear connection to business needs (even if they work). Push back on new feature requests until the data exists. Require a point of contact and an SLO for any data you don’t own. Make speculative features easy to turn on/off (or remove!) without having to redesign the architecture. If it can’t be made easy to remove, don’t add it until you’re sure it will survive.
If you can’t evade bad data, err on the side of strict initial validation, and doing nothing or gracefully degrading service when validation fails. Expose the failures as alerts to the people who own the data (not you), and give the data owners a tool that repeats the validation logic in your system verbatim, so there is no confusion on the criteria for success. When you have this view, almost all of the complexity in your system lies in enabling this generic auditing, alerting, and managing of intricate (but ultimately arbitrary) policy.
I like to joke that I don’t have data-intensive applications or problems of scale, but rather policy-intensive applications. I haven’t found much insight from other software engineers about how to design and maintain policy-intensive software. Let me know if you have some! The obvious first step is to turn policy code into data. To the extent that we’ve done this, I adore that aspect of our systems. Still, you can’t avoid it when policies need to be encoded in an optimization model.
I do get sad that so much of my time is spent poop-smithing, as I like to say, even though we’re gradually getting better. Our systems need maintenance and care, and strong principles to keep the thicket from overwhelming us. For one, I track net lines of code added, with the goal to have it be net negative month over month, new features and all. We’ve kept it up for six months so far. Even our fixit week this week seems unnecessary, given how well our team has internalized paying off technical debt.
Though I do wonder what it’s all for. So Google can funnel the money it saves on datacenter costs into
informing people the Earth is flat cat videos? If I didn’t have two particular internal side projects to look forward to—they involve topics I’m very interested in—I’d be bored, and I might succumb to jaded feelings, and I’d need a change. Certain perks and particularly enjoyable colleagues help avoid that. But still, I rarely have time to work on the stimulating projects, and even the teammates I’ve been delegating to often defer it to other priorities.
We let dirty data interfere with our design and architecture, now we’re paying back all that technical debt, and as a consequence there’s no time for our human flourishing.
I should open a math cafe.
Our hero, a mathematician, is writing notes in LaTeX and needs to convert it to a format that her blog platform accepts. She’s used to using dollar sign delimiters for math mode, but her blog requires \( \) and \[ \]. Find-and-replace fails because it doesn’t know about which dollar sign is the start and which is the end. She knows there’s some computer stuff out there that could help, but she doesn’t have the damn time to sort through it all. Paper deadlines, argh!
If you want to skip ahead to a working solution, and you can run basic python scripts, see the Github repository for this post with details on how to run it and what the output looks like. It’s about 30 lines of code, and maybe 10 of those lines are not obvious. Alternatively, copy/paste the code posted inline in the section “First attempt: regular expressions.”
In this post I’ll guide our hero through the world of text manipulation, explain the options for solving a problem like this, and finally explain how to build the program in the repository from scratch. This article assumes you have access to a machine that has basic programming tools pre-installed on it, such as python and perl.
The problem with LaTeX
LaTeX is great, don’t get me wrong, but people who don’t have experience writing computer programs that operate on human input tend to write sloppy LaTeX. They don’t anticipate that they might need to programmatically modify the file because the need was never there before. The fact that many LaTeX compilers are relatively forgiving with syntax errors exacerbates the issue.
The most common way to enter math mode is with dollar signs, as in
Now let $\varepsilon > 0$ and set $\delta = 1/\varepsilon$.
For math equations that must be offset, one often first learns to use double-dollar-signs, as in
First we claim that $$0 \to a \to b \to c \to 0$$ is a short exact sequence
The specific details that make it hard to find and convert from this delimiter type to another are:
- Math mode can be broken across lines, but need not be.
- A simple search and replace for $ would conflict with $$.
- The fact that the start and end are symmetric means a simple search and replace for $$ fails: you can’t tell whether to replace it with \[ or \] without knowing the context of where it occurs in the document.
- You can insert a dollar sign in LaTeX using \$ and it will not enter math mode. (I do not solve this problem in my program, but leave it as an exercise to the reader to modify each solution to support this)
First attempt: regular expressions
The first thing most programmers will tell you when you have a text manipulation problem is to use regular expressions (or regex). Regular expressions are text patterns that a program called a regular expression engine uses to find subsets of text in a document. This can often be with the goal of modifying the matched text somehow, but also just to find places where the text occurs to generate a report.
In their basic form, regular expressions are based on a very clean theory called regular languages, which is a kind of grammar equivalent to “structure that can be recognized by a finite state machine.”
[Aside: some folks prefer to distinguish between regular expressions as implemented by software systems (regex) and regular expressions as a representation of a regular language; as it turns out, features added to regex engines make them strictly stronger than what can be represented by the theory of regular languages. In this post I will use “regex” and “regular expressions” both for practical implementations, because programmers and software don’t talk about the theory, and use “regular languages” for the CS theory concept]
The problem is that practical regular expressions are difficult and nit-picky, especially when there are exceptional cases to consider. Even matching something like a date can require a finnicky expression that’s hard for humans to read and debug when they are incorrect. A regular expression for a line in a file that contains a date by itself:
^\s*(19|20)\d\d[- /.](0[1-9]|1)[- /.](0[1-9]|[0-9]|3)\s*$
Worse, regular expressions are rife with “gotchas” having to do with escape characters. For example, parentheses are used for something called “capturing”, so you have to use \( to insert a literal parenthesis. If having to use “\$” in LaTeX bothers you, you’ll hate regular expressions.
Another issue comes from history. There was a time when computers only allowed you to edit a file one line at a time. Many programming tools and frameworks were invented during this time that continue to be used today (you may have heard of sed which is a popular regular expression find/replace program—one I use almost daily). These tools struggle to operate on problems that span many lines of a file, because they simply weren’t designed for that. Problem (1) above suggests this might be a problem.
Yet another issue is in slight discrepancies between regex engines. Perl, python, sed, etc., all have slight variations and “nonstandard” features. As all programmers know, every visible behavior of a system will eventually be depended on by some other system.
But the real core problem is that regular expressions weren’t really designed for knowing about the context where a match occurs. Regular expressions are designed to be character-at-a-time pattern matching. [edit: removed an incorrect example] But over time, regular expression engines have added features to do such things over the years (which makes them more powerful than the original, formal definition of a regular language, and even more powerful than what parsers can handle!), but the more complicated you make a regular expression, the more likely it’s going to misbehave on odd inputs, and less likely others can use it without bugs or modification for their particular use case. Software engineers care very much about such things, though mathematicians needing a one-off solution may not.
One redeeming feature of regular expressions is that—by virtue of being so widely used in industry—there are many tools to work with them. Every major programming language has a regular expression engine built in. And many websites help explain how regexes work. regexr.com is one I like to use. Here is an example of using that website to replace offset mathmode delimiters. Note the “Explain” button, which traces the regex engine as it looks for matches.
So applying this to our problem: we can use two regular expressions to solve the problem. I’m using the perl programming language because its regex engine supports multiline matches. All MacOS and linux systems come with perl pre-installed.
perl -0777 -pe 's/\$\$(.*?)\$\$/\\[\1\\]/gs' < test.tex | perl -0777 -pe 's/\$(.*?)\$/\\(\1\\)/gs' > output.tex
Now let’s explain, starting with the core expressions being matched and replaced.
s/X/Y/ tells a regex engine to “substitute regex matches of X with Y”. In the first regex X is \$\$(.*?)\$\$, which breaks down as
- \$\$ match two literal dollar signs
- ( capture a group of characters signified by the regex between here and the matching closing parenthesis
- .* zero or more of any character
- ? looking at the “zero or more” in the previous step, try to match as few possible characters as you can while still making this pattern successfully match something
- ) stop the capture group, and save it as group 1
- \$\$ match two more literal dollar signs
Then Y is the chosen replacement. We’re processing offset mathmode, so we want \[ \]. Y is \\[\1\\], which means
- \\ a literal backslash
- [ a literal open bracket
- \1 the first capture group from the matched expression
- \\ a literal backslash
- ] a literal close bracket
All together we have s/\$\$(.*?)\$\$/\\[\1\\]/, but then we add a final s and g characters, which act as configuration. The “s” tells the regex engine to allow the dot . to match newlines (so a pattern can span multiple lines) and the “g” tells the regex to apply the substitution globally to every match it sees—as opposed to just the first.
Finally, the full first command is
perl -0777 -pe 's/\$\$(.*?)\$\$/\\[\1\\]/gs' < test.tex
This tells perl to read in the entire test.tex file and apply the regex to it. Broken down
- perl run perl
- -0777 read the entire file into one string. If you omit it, perl will apply the regex to each line separately.
- -p will make perl automatically “read input and print output” without having to tell it to with a “print” statement
- e tells perl to run the following command line argument as a program.
- < test.tex tells perl to use the file test.tex as input to the program (as input to the regex engine, in this case).
Then we pipe the output of this first perl command to a second one that does a very similar replacement for inline math mode.
<first_perl_command> | perl -0777 -pe 's/\$(.*?)\$/\\(\1\\)/gs'
The vertical bar | tells the shell executing the commands to take the output of the first program and feed it as input to the second program, allowing us to chain together sequences of programs. The second command does the same thing as the first, but replacing $ with \( and \). Note, it was crucial we had this second program occur after the offset mathmode regex, since $ would match $$.
Exercise: Adapt this solution to support Problem (4), support for literal \$ dollar signs. Hint: you can either try to upgrade the regular expression to not be tricked into thinking \$ is a delimiter, or you can add extra programs before that prevent \$ from being a problem. Warning: this exercise may cause a fit.
It can feel like a herculean accomplishment to successfully apply regular expressions to a problem. You can appreciate the now-classic programmer joke from the webcomic xkcd:
However, as you can tell, getting regular expressions right is hard and takes practice. It’s great when someone else solves your problem exactly, and you can copy/paste for a one-time fix. But debugging regular expressions that don’t quite work can be excruciating. There is another way!
Second attempt: using a parser generator
While regular expressions have a clean theory of “character by character stateless processing”, they are limited. It’s not possible to express the concept of “memory” in a regular expression, and the simplest example of this is the problem of counting. Suppose you want to find strings that constitute valid, balanced parentheticals. E.g., this is balanced:
(hello (()there)() wat)
But this is not
(hello ((there )() wat)
This is impossible for regexes to handle because counting the opening parentheses is required to match the closing parens, and regexes can’t count arbitrarily high. If you want to parse and manipulate structures like this, that have balance and nesting, regex will only bring you heartache.
The next level up from regular expressions and regular languages are the two equivalent theories of context-free grammars and pushdown automata. A pushdown automaton is literally a regular expression (a finite state machine) equipped with a simple kind of memory called a stack. Rather than dwell on the mechanics, we’ll see how context-free grammars work, since if you can express your document as a context free grammar, a tool called a parser generator will give you a parsing program for free. Then a few simple lines of code allow you to manipulate the parsed representation, and produce the output document.
The standard (abstract) notation of a context-free grammar is called Extended Backus-Naur Form (EBNF). It’s a “metasyntax”, i.e., a syntax for describing syntax. In EBNF, you describe rules and terminals. Terminals are sequences of constant patterns, like
OFFSET_DOLLAR_DELIMITER = $$ OFFSET_LEFT_DELIMITER = \[ OFFSET_LEFT_DELIMITER = \]
A rule is an “or” of sequences of other rules or terminals. It’s much easier to show an example:
char = "a" | "b" | "c" offset = OFFSET_DOLLAR_DELIMITER char OFFSET_DOLLAR_DELIMITER | OFFSET_LEFT char OFFSET_RIGHT
The above describes the structure of any string that looks like offset math mode, but with a single “a” or a single “b” or a single “c” inside, e.g, “\[b\]”. You can see some more complete examples on Wikipedia, though they use a slightly different notation.
With some help from a practical library’s built-in identifiers for things like “arbitrary text” we can build a grammar that covers all of the ways to do latex math mode.
latex = content content = content mathmode content | TEXT | EMPTY mathmode = OFFSETDOLLAR TEXT OFFSETDOLLAR | OFFSETOPEN TEXT OFFSETCLOSE | INLINEOPEN TEXT INLINECLOSE | INLINE TEXT INLINE INLINE = $ INLINEOPEN = \( INLINECLOSE = \) OFFSETDOLLAR = $$ OFFSETOPEN = \[ OFFSETCLOSE = \]
Here we’re taking advantage of the fact that we can’t nest mathmode inside of mathmode in LaTeX (you probably can, but I’ve never seen it), by defining the mathmode rule to contain only text, and not other instances of the “content” rule. This rules out some ambiguities, such as whether “$x$ y $z$” is a nested mathmode or not.
We may not need the counting powers of context-free grammars, yet EBNF is easier to manage than regular expressions. You can apply context-sensitive rules to matches, whereas with regexes that would require coordination between separate passes. The order of operations is less sensitive; because the parser generator knows about all patterns you want to match in advance, it will match longer terminals before shorter—more ambiguous—terminals. And if we wanted to do operations on all four kinds of math mode, this allows us to do so without complicated chains of regular expressions.
The history of parsers is long and storied, and the theory of generating parsing programs from specifications like EBNF is basically considered a solved problem. However, there are a lot of parser generators out there. And, like regular expression engines, they each have their own flavor of EBNF—or, as is more popular nowadays, they have you write your EBNF using the features of the language the parser generator is written in. And finally, a downside of using a parser generator is that you have to then write a program to operate on the parsed representation (which also differs by implementation).
We’ll demonstrate this process by using a Python library that, in my opinion, stays pretty faithful to the EBNF heritage. It’s called lark and you can pip-install it as
pip install lark-parser
Note: the hard-core industry standard parser generators are antlr, lex, and yacc. I would not recommend them for small parsing jobs, but if you’re going to do this as part of a company, they are weighty, weathered, well-documented—unlike lark.
Lark is used entirely inside python, and you specify the EBNF-like grammar as a string. For example, ours is
tex: content+ ?content: mathmode | text+ mathmode: OFFSETDOLLAR text+ OFFSETDOLLAR | OFFSETOPEN text+ OFFSETCLOSE | INLINEOPEN text+ INLINECLOSE | INLINE text+ INLINE INLINE: "$" INLINEOPEN: "\\(" INLINECLOSE: "\\)" OFFSETDOLLAR: "$$" OFFSETOPEN: "\\[" OFFSETCLOSE: "\\]" ?text: /./s
You can see the similarities with our “raw” EBNF. The main difference here is the use of + for matching “one or more” of a rule, the use of a regular expression to define the “text” rule as any character (here again the trailing “s” means: allow the dot character to match newline characters). The backslashes are needed because backslash is an escape character in Python. And finally, the question mark tells lark to try to compress the tree if it only matches one item (you can see what the difference is by playing with our display-parsed-tree.py script that shows the parsed representation of the input document. You can read more in lark’s documentation about what the structure of the parsed tree is as python objects (Tree for rule/terminal matches and Token for individual characters).
For the input “Let $x=0$”, the parsed tree is as follows (note that the ? makes lark collapse the many “text” matches):
Tree(tex, [Tree(content, [Token(__ANON_0, 'L'), Token(__ANON_0, 'e'), Token(__ANON_0, 't'), Token(__ANON_0, ' ')]), Tree(mathmode, [Token(INLINE, '$'), Token(__ANON_0, 'x'), Token(__ANON_0, '='), Token(__ANON_0, '0'), Token(INLINE, '$')]), Token(__ANON_0, '\n')])
So now we can write a simple python program that traverses this tree and converts the delimiters. The entire program is on Github, but the core is
def join_tokens(tokens): return ''.join(x.value for x in tokens) def handle_mathmode(tree_node): '''Switch on the different types of math mode, and convert the delimiters to the desired output, and then concatenate the text between.''' starting_delimiter = tree_node.children.type if starting_delimiter in ['INLINE', 'INLINEOPEN']: return '\\(' + join_tokens(tree_node.children[1:-1]) + '\\)' elif starting_delimiter in ['OFFSETDOLLAR', 'OFFSETOPEN']: return '\\[' + join_tokens(tree_node.children[1:-1]) + '\\]' else: raise Exception("Unsupported mathmode type %s" % starting_delimiter) def handle_content(tree_node): '''Each child is a Token node whose text we'd like to concatenate together.''' return join_tokens(tree_node.children)
The rest of the program uses lark to create the parser, reads the file from standard input, processes the parsed representation, and outputs the converted document to standard output. You can use the program like this:
python convert-delimiters.py < input.tex > output.tex
Exercise: extend this grammar to support literal dollar signs using \$, and passes them through to the output document unchanged.
I personally prefer regular expressions when the job is quick. If my text manipulation rule fits on one line, or can be expressed without requiring “look ahead” or “look behind” rules, regex is a winner. It’s also a winner when I only expect it to fail in a few exceptional cases that can easily be detected and fixed by hand. It’s faster to write a scrappy regex, and then open the output in a text editor and manually fix one or two mishaps, than it is to write a parser.
However, the longer I spend on a regular expression problem—and the more frustrated I get wrestling with it—the more I start to think I should have used a parser all along. This is especially true when dealing with massive jobs. Such as converting delimiters in hundreds of blog articles, each thousands of words long, or making changes across all chapter files of a book.
When I need something in between rigid structure and quick-and-dirty, I actually turn to vim. Vim has this fantastic philosophy of “act, repeat, rewind” wherein you find an edit that applies to the thing you want to change, then you search for the next occurrence of the start, try to apply the change again, visually confirm it does the right thing, and if not go back and correct it manually. Learning vim is a major endeavor (for me it feels lifelong, as I’m always learning new things), but since I spend most of my working hours editing structured text the investment and philosophy has paid off.
Until next time!
It’s pretty well known now that the average mobile app sucks up as much information as it can about you and sells your data to shadowy nefarious organizations. My wife recently installed a sort of “health” tracking app and immediately got three unwanted subscriptions to magazines with names like “Shape.” We have no idea how to cancel them, or if we’ll eventually get an invoice demanding payment. This is a best case scenario, with worse cases being that your GPS location is tracked, your phone number is sold to robocallers, or your information is leaked to make it easier for hackers to break into your email account. All because you play Mobile Legends: Bang Bang!
But being able to track and understand your habits is a good thing. It encourages you to be healthier, more financially responsible, or to do more ultimately gratifying activities outside of staring at your phone or computer. If a tracker app is the difference between an alcoholic sticking to their AA plan and a relapse, you shouldn’t have to give up your privacy for it.
Rather than sell my data for convenience, I’ve recently started to make my own tracking apps. Here’s how:
- Make a Google Form for entering data.
- Analyze that data in the linked spreadsheet.
Here’s an example I made as a demo, but which has a real analogue that I use to track bullshit work I have to do at my job, and how long it would take to avoid it. When the amount of time wasted exceeds the time for a permanent fix, I can justify delaying other work. It’s called the Churn Log.
And the linked spreadsheet with the raw response data looks like:
These are super fast to make, and have a number of important benefits:
- I can make them for whatever purpose I want, I don’t need to wait for some software engineers to happen to make an app that fits my needs. One other example I made is a “gift idea log.” I don’t think anyone will ever make this app.
- It lives on my phone just like other apps, since (on Android) you can save a link to a webpage as an icon as if it were a native app.
- It’s fast and uses minimal data.
- You can use it trivially with family members and friends.
- I get an incentive to become a spreadsheet wizard, which makes me better at my job.
The downsides are that it’s not as convenient as being completely automated. For instance, a finance tracker app can connect to your credit card account to automatically extract purchase history and group it into food, bills, etc. But then again, if I just want a tracker for my food purchases I have to give up my book purchases, my alcohol purchases (so many expensive liqueurs), and my obsession with bowties. With the Google Form method, I can quickly enter some data when I’m checking out at the grocery store or paying a check when dining out, and then when I’m interested I can go into the spreadsheet, make a chart or compute some averages, and I have 90% of the insight I care about.
But wait, doesn’t Google then have all your data? Can’t it sell it and send you unwanted magazines?
You’re right that technically Google gets all the data you enter. But with Mobile Legends: Health Tracker! (not a real app) they get to pick the structure of your entered data, so they know exactly what you’re entering. Since Google Forms lets you build a form with arbitrary semantics, it’s virtually impossible that enough people will choose the exact same structure that Google could feasibly be able to make sense of it.
And even if Google wanted to be evil and sell your self-tracked data, it wouldn’t be cost effective for Google to do so. The amount of work required to construct a lucrative interpretation of the random choices that humans make in building their own custom tracker app would far outweigh the gains from selling the data. The only reason that little apps like Mobile Legends Health Tracker can make money selling your data is that they suck up system metrics in a structured format whose semantics are known in advance. Disclosure: I work for Google, they aren’t paying me to write this—I honestly believe it’s a good idea—and having seen Google’s project management and incentive structure from the inside, I feel confident that custom tracker app data isn’t worthwhile enough to invest in parsing and exploiting. Not even to mention how much additional scrutiny Google gets from regulators.
While making apps like this I’ve actually learned a ton about spreadsheets that I never knew. For example, you can select an infinite range—e.g., an entire column—and make a chart that will auto-update as the empty cells get filled with new data. You can also create static references to named cells to act as configuration constants using dollar signs.
The beauty of this method is that it puts the power back in your hands, and has a gradual learning curve. If you or your friend has never written programs before, this gives an immediate and relevant application. You can start with simple spreadsheet tools (SUM and IF macros, charting and cross-sheet references), graduate to Apps script for scheduled checks and alerts, and finally to a fully fledged programming language, if the need arises.
I can’t think of a better way to induct someone into the empowering world of Automating Tedious Crap and gaining insights from data. We as programmers (and generally tech-inclined people) can help newcomers get set up. And my favorite part: most useful analyses require learning just a little bit of math and statistics 🙂
Last time we discussed the setup for the silent duel problem: two players taking actions in , player 1 gets chances to act, player 2 gets , and each knows their probability of success when they act.
The solution is in a paper of Rodrigo Restrepo from the 1950s. In this post I’ll start detailing how I study this paper, and talk through my thought process for approaching a bag of theorems and proofs. If you want to follow along, I re-typeset the paper on Github.
Game Theory Basics
The Introduction starts with a summary of the setting of game theory. I remember most of this so I will just summarize the basics of the field. Skip ahead if you already know what the minimax theorem is, and what I mean when I say the “value” of a game.
A two-player game consists of a set of actions for each player—which may be finite or infinite, and need not be the same for both players—and a payoff function for each possible choice of actions. The payoff function is interpreted as the “utility” that player 1 gains and player 2 loses. If the payoff is negative, you interpret it as player 1 losing utility to player 2. Utility is just a fancy way of picking a common set of units for what each player treasures in their heart of hearts. Often it’s stated as money and we assume both players value cash the same way. Games in which the utility is always “one player gains exactly the utility lost by the other player” are called zero-sum.
With a finite set of actions, the payoff function is a table. For rock-paper-scissors the table is:
Rock, paper: -1
Rock, scissors: 1
Rock, rock: 0
Paper, paper: 0
Paper, scissors: -1
Paper, rock: 1
Scissors, paper: 1
Scissors, scissors: 0
Scissors, rock: -1
You could arrange this in a matrix and analyze the structure of the matrix, but we won’t. It doesn’t apply to our forthcoming setting where the players have infinitely many strategies.
A strategy is a possibly-randomized algorithm (whose inputs are just the data of the game, not including any past history of play) that outputs an action. In some games, the optimal strategy is to choose a single action no matter what your opponent does. This is sometimes called a pure, dominating strategy, not because it dominates your opponent, but because it’s better than all of your other options no matter what your opponent does. The output action is deterministic.
However, as with rock-paper-scissors, the optimal strategy for most interesting games requires each player to act randomly according to a fixed distribution. Such strategies are called mixed or randomized. For rock-paper-scissors, the optimal strategy is to choose rock, paper, and scissors with equal probability. Computers are only better than humans at rock-paper-scissors because humans are bad at behaving consistently and uniformly random.
The famous minimax theorem says that every two-player zero-sum game has an optimal strategy for each player, which is possibly randomized. This strategy is optimal in the sense that it maximizes your expected winnings no matter what your opponent does. However, if your opponent is playing a particularly suboptimal strategy, the minimax solution might not be as good as a solution that takes advantage of the opponent’s dumb choices. A uniform random rock-paper-scissors strategy is not optimal if your opponent always plays “rock.” However, the optimal strategy doesn’t need special knowledge or space to store information about past play. If you played against God, you would blindly use the minimax strategy and God would have no upper hand. I wonder if the pope would have excommunicated me for saying that in the 1600’s.
The expected winnings for player 1 when both players play a minimax-optimal strategy is called the value of the game, and this number is unique (even if there are possibly multiple optimal strategies). If a game is symmetric—meaning both players have the same actions and the payoff function is symmetric—then the value is guaranteed to be zero. The game is fair.
The version of the minimax theorem that most people use (in particular, the version that often comes up in theoretical computer science) shows that finding an optimal strategy is equivalent to solving a linear program. This is great because it means that any such (finite) game is easy to solve. You don’t need insight; just compile and run. The minimax theorem is also true for sufficiently well-behaved continuous action spaces. The silent duel is well-behaved, so our goal is to compute an explicit, easy-to-implement strategy that the minimax theorem guarantees exists. As a side note, here is an example of a poorly-behaved game with no minimax optimum.
While the minimax theorem guarantees optimal strategies and a value, the concept of the “value” of the game has an independent definition:
Let be finite sets of actions for players 1, 2 respectively, and be strategies, i.e., probability distributions over and so that is the probability that is chosen. Let be the payoff function for the game. The value of the game is a real number such that there exist two strategies with the two following properties. First, for every fixed ,
(no matter what player 2 does, player 1’s strategy guarantees at least payoff), and for every fixed ,
(no matter what player 1 does, player 2’s strategy prevents a loss of more than ).
Since silent duels are continuous, Restrepo opens the paper with the corresponding definition for continuous games. Here a probability distribution is the same thing as a “positive measure with total measure 1.” Restrepo uses and for the strategies, and the corresponding statement of expected payoff for player 1 is that, for all fixed actions ,
And likewise, for all ,
All of this background gets us through the very first paragraph of the Restrepo paper. As I elaborate in my book, this is par for the course for math papers, because written math is optimized for experts already steeped in the context. Restrepo assumes the reader knows basic game theory so we can get on to the details of his construction, at which point he slows down considerably to focus on the details.
Description of the Optimal Strategies
Starting in section 2, Restrepo describes the construction of the optimal strategy, but first he explains the formal details of the setting of the game. We already know the two players are taking and actions between , but we also fix the probability of success. Player 1 knows a distribution on for which is the probability of success when acting at time . Likewise, player 2 has a possibly different distribution , and (crucially) both increase continuously on . (In section 3 he clarifies further that satisfies , and , likewise for .) Moreover, both players know both . One could say that each player has an estimate of their opponent’s firing accuracy, and wants to be optimal compared to that estimate.
The payoff function is defined informally as: 1 if Player one succeeds before Player 2, -1 if Player 2 succeeds first, and 0 if both players exhaust their actions before the end and none succeed. Though Restrepo does not state it, if the players act and succeed at the same time—say both players fire at time —the payoff should also be zero. We’ll see how this is converted to a more formal (and cumbersome!) mathematical definition in a future post.
Next we’ll describe the statement of the fully general optimal strategy (which will be essentially meaningless, but have some notable features we can infer information from), and get a sneak peek at how to build this strategy algorithmically. Then we’ll see a simplified example of the optimal strategy.
The optimal strategy presented depends only on the values (the number of actions each player gets) and their success probability distributions . For player 1, the strategy splits up into subintervals
Crucially, this strategy ignores the initial interval . In each other subinterval Player 1 attempts an action at a time chosen by a probability distribution specific to that interval, independently of previous attempts. But no matter what, there is some initial wait time during which no action will ever be taken. This makes sense: if player 1 fired at time 0, it is a guaranteed wasted shot. Likewise, firing at time 0.000001 is basically wasted (due to continuity, unless is obnoxiously steep early on).
Likewise for player 2, the optimal strategy is determined by numbers resulting in intervals with .
The difficult part of the construction is describing the distributions dictating when a player should act during an interval. It’s difficult because an interval for player 1 and player 2 can overlap partially. Maybe and . Player 1 knows that Player 2 (using their corresponding minimax strategy) must act before time , and gets another chance after that time. This suggests that the distribution determining when Player 1 should act within may have a discontinuous jump at .
Call the distribution for Player 1 to act in the interval . Since it is a continuous distribution, Restrepo uses for the cumulative distribution function and for the probability density function. Then these functions are defined by (note this should be mostly meaningless for the moment)
where is defined as
The constants and are related by the equation
What can we glean from this mashup of symbols? The first is that (obviously) the distribution is zero outside the interval . Within it, there is this mysterious that is related to the used to define the next interval’s probability. This suggests we will likely build up the strategy in reverse starting with as the “base case” (if , then it is the only one).
Next, we notice the curious definition of . It unsurprisingly requires knowledge of both and , but the coefficient is strangely chosen: it’s a product over all failure probabilities () of all interval-starts happening later for the opponent.
[Side note: it’s very important that this is a constant; when I first read this, I thought that it was , which makes the eventual task of integrating much harder.]
Finally, the last interval (the one ending at ) may include the option to simply “wait for a guaranteed hit,” which Restrepo calls a “discrete mass of at .” That is, may have a different representation than the rest. Indeed, at the end of the paper we will find that Restrepo gives a base-case definition for that allows us to bootstrap the construction.
Player 2’s strategy is the same as Player 1’s, but replacing the roles of in the obvious way.
The symmetric example
As with most math research, the best way to parse a complicated definition or construction is to simplify the different aspects of the problem until they become tractable. One way to do this is to have only a single action for both players, with . Restrepo provides a more general example to demonstrate, which results in the five most helpful lines in the paper. I’ll reproduce them here verbatim:
EXAMPLE. Symmetric Game: and . In this case the two
players have the same optimal strategies; , and . Furthermore
Saying means there is no “wait until to guarantee a hit”, which makes intuitive sense. You’d only want to do that if your opponent has exhausted all their actions before the end, which is only likely to happen if they have fewer actions than you do.
When Restrepo writes , there are a few things happening. First, we confirm that we’re working backwards from . Second, he’s implicitly saying “choose such that has the desired cumulative density.” After a bit of reflection, there’s no other way to specify the except implicitly: we don’t have a formula for to lean on.
Finally, the definition of the density function helps us understand under what conditions the probability function would be increasing or decreasing from the start of the interval to the end. Looking at the expression , we can see that polynomials will result in an expression dominated by for some , which is decreasing. By taking the derivative, an increasing density would have to be built from a satisfying . However, I wasn’t able to find any examples that satisfy this. Polynomials, square roots, logs and exponentials, all seem to result in decreasing density functions.
Finally, we’ll plot two examples. The first is the most reductive: , and . In this case , and there is only one term , for which . Then . (For verification, note the integral of on is indeed 1).
Note that the reason is so nice is that is so simple. If were, say, , then should shift to being . If were more complicated, we’d have to invert it (or use an approximate search) to find the location for which .
Next, we loosen the example to let , still with . In this case, we have the same final interval . The new actions all occur in the time before , in the intervals If there were more actions, we’d get smaller inverse-of-odd-spaced intervals approaching zero. The probability densities are now steeper versions of the same , with the constant getting smaller to compensate for the fact that gets larger and maintain the normalized distribution. For example, the earliest interval results in . Closer to zero the densities are somewhat shallower compared to the size of the interval; for example in the density toward the beginning of the interval is only about twice as large as the density toward the end.
Since the early intervals are getting smaller and smaller as we add more actions, the optimal strategy will resemble a burst of action at the beginning, gradually tapering off as the accuracy increases and we work through our budget. This is an explicit tradeoff between the value of winning (lots of early, low probability attempts) and keeping some actions around for the end where you’re likely to succeed.
Next step: get to the example from the general theorem
At this point, we’ve parsed the general statement of the theorem, and while much of it is still mysterious, we extracted some useful qualitative information from the statement, and tinkered with some simple examples.
At this point, I have confidence that the simple symmetric example Restrepo provided is correct; it passed some basic unit tests, like that each is normalized. My next task in fully understanding the paper is to be able to derive the symmetric example from the general construction. We’ll do this next time, and include a program that constructs the optimal solution for any input.