DIY Tracking Apps with Google Forms

It’s pretty well known now that the average mobile app sucks up as much information as it can about you and sells your data to shadowy nefarious organizations. My wife recently installed a sort of “health” tracking app and immediately got three unwanted subscriptions to magazines with names like “Shape.” We have no idea how to cancel them, or if we’ll eventually get an invoice demanding payment. This is a best case scenario, with worse cases being that your GPS location is tracked, your phone number is sold to robocallers, or your information is leaked to make it easier for hackers to break into your email account. All because you play Mobile Legends: Bang Bang!

But being able to track and understand your habits is a good thing. It encourages you to be healthier, more financially responsible, or to do more ultimately gratifying activities outside of staring at your phone or computer. If a tracker app is the difference between an alcoholic sticking to their AA plan and a relapse, you shouldn’t have to give up your privacy for it.

Rather than sell my data for convenience, I’ve recently started to make my own tracking apps. Here’s how:

  1. Make a Google Form for entering data.
  2. Analyze that data in the linked spreadsheet.

Here’s an example I made as a demo, but which has a real analogue that I use to track bullshit work I have to do at my job, and how long it would take to avoid it. When the amount of time wasted exceeds the time for a permanent fix, I can justify delaying other work. It’s called the Churn Log.

Screen Shot 2019-03-07 at 6.19.10 PM

And the linked spreadsheet with the raw response data looks like:

Screen Shot 2019-03-07 at 6.26.04 PM.png

These are super fast to make, and have a number of important benefits:

  • I can make them for whatever purpose I want, I don’t need to wait for some software engineers to happen to make an app that fits my needs. One other example I made is a “gift idea log.” I don’t think anyone will ever make this app.
  • It lives on my phone just like other apps, since (on Android) you can save a link to a webpage as an icon as if it were a native app.
  • It’s fast and uses minimal data.
  • You can use it trivially with family members and friends.
  • I get an incentive to become a spreadsheet wizard, which makes me better at my job.

The downsides are that it’s not as convenient as being completely automated. For instance, a finance tracker app can connect to your credit card account to automatically extract purchase history and group it into food, bills, etc. But then again, if I just want a tracker for my food purchases I have to give up my book purchases, my alcohol purchases (so many expensive liqueurs), and my obsession with bowties. With the Google Form method, I can quickly enter some data when I’m checking out at the grocery store or paying a check when dining out, and then when I’m interested I can go into the spreadsheet, make a chart or compute some averages, and I have 90% of the insight I care about.

But wait, doesn’t Google then have all your data? Can’t it sell it and send you unwanted magazines?

You’re right that technically Google gets all the data you enter. But with Mobile Legends: Health Tracker! (not a real app) they get to pick the structure of your entered data, so they know exactly what you’re entering. Since Google Forms lets you build a form with arbitrary semantics, it’s virtually impossible that enough people will choose the exact same structure that Google could feasibly be able to make sense of it.

And even if Google wanted to be evil and sell your self-tracked data, it wouldn’t be cost effective for Google to do so. The amount of work required to construct a lucrative interpretation of the random choices that humans make in building their own custom tracker app would far outweigh the gains from selling the data. The only reason that little apps like Mobile Legends Health Tracker can make money selling your data is that they suck up system metrics in a structured format whose semantics are known in advance. Disclosure: I work for Google, they aren’t paying me to write this—I honestly believe it’s a good idea—and having seen Google’s project management and incentive structure from the inside, I feel confident that custom tracker app data isn’t worthwhile enough to invest in parsing and exploiting. Not even to mention how much additional scrutiny Google gets from regulators.

While making apps like this I’ve actually learned a ton about spreadsheets that I never knew. For example, you can select an infinite range—e.g., an entire column—and make a chart that will auto-update as the empty cells get filled with new data. You can also create static references to named cells to act as configuration constants using dollar signs.

Even better, Google Sheets has two ways to interact with it externally. You can write Google Apps Script which is a flavor of Javascript that allows you to do things like send email alerts on certain conditions. E.g., if you tracked your dining budget you could get an email alert when you’re getting close to the limit. Or you could go full engineer and use the Google Sheets Python API to write whatever program you want to analyze your data. I sketched out a prototype scheduler app where the people involved entered their preferences via a Google Form, and I ran a Python script to pull the data and find a good arrangement that respected people’s preferences. That’s not a tracker app, but you can imagine arbitrarily complicated analysis of your own tracked data.

The beauty of this method is that it puts the power back in your hands, and has a gradual learning curve. If you or your friend has never written programs before, this gives an immediate and relevant application. You can start with simple spreadsheet tools (SUM and IF macros, charting and cross-sheet references), graduate to Apps script for scheduled checks and alerts, and finally to a fully fledged programming language, if the need arises.

I can’t think of a better way to induct someone into the empowering world of Automating Tedious Crap and gaining insights from data. We as programmers (and generally tech-inclined people) can help newcomers get set up. And my favorite part: most useful analyses require learning just a little bit of math and statistics 🙂

Silent Duels and an Old Paper of Restrepo

Two men start running at each other with loaded pistols, ready to shoot!

It’s a foggy morning for a duel. Newton and Leibniz have decided this macabre contest is the only way to settle their dispute over who invented Calculus. Each pistol is fitted with a silencer and has a single bullet. Neither can tell when the other has attempted a shot, unless, of course, they are hit.

Newton and Leibniz both know that the closer they get to their target, the higher the chance of a successful shot. Eventually they’ll be standing nose-to-nose: a guaranteed hit! But the longer each waits, the more opportunity they give their opponent to fire. If they both fire and miss, mild embarrassment ensues and they resolve to try again tomorrow. When should they shoot to maximize their chance at victory?

This is the so-called silent duel problem, and as you might have guessed it can be phrased without any violence.

Two players compete to succeed in taking some action in the interval [0,1]. They are given a function P(t) that describes the probability of success if the action is taken at time t. Since the two men are “running” at each other, P(t) is assumed to be increasing, with P(0) = 0, P(1) = 1. If Player 1 succeeds in their action first, Player 1 gets a dollar (1 unit of utility) from Player 2; if Player 2 succeeds, Player 1 loses a dollar to Player 2. What strategy should they use to maximize their expected payoff?

Yet another phrasing of the problem is that a beautiful young woman is arriving at a train station, and two suitors are competing to pick her up. If she arrives and nobody is there to pick her up, she waits for the first suitor to arrive. If a suitor arrives and the woman is not there, the suitor assumes she has already been picked up and leaves. I like the duel version better, because what self-respecting woman can’t arrange her own ride these days? Either way, neither example has aged well. We should come up with a modern version where people are racing to McDonald’s to get Mulan Szechuan Sauce.

I originally heard about this problem in a game theory course I took in undergrad, coincidentally the same class where I met my wife. See section 3 of the course notes by Anthony Mendes, which neatly describes how to solve the silent duel when P(x) = x. I remember spending a lot of time confused about this problem, and wrote out my solution over and over again until I felt I understood it.

Almost ten years later, I found a renewed interest in the silent duel when a colleague posed the following variant (having no leads on how to solve it). A government agency releases daily financial data concerning the market every morning at 6AM, and gives API access to it. In this version I’ll say that this data describes the demand for wheat and sheep. If you can get this data before anyone else, even an extra few milliseconds gives you an edge in the market for that day. You can buy up all the wheat if there’s a shortage, or short sheep futures if there’s a surplus.

There are two caveats. First caveat: 6AM is not precise because your clock deviates from the data provider’s clock. Maybe there’s a person who has to hit a button, and they took an extra few seconds to take a bite of their morning donut. If you call the API too early, it will respond, “Please try again later.” If you call the API after the data has been released, you receive the data immediately. Second caveat: since everyone is racing to get this data first, the API rate limits you to 6 requests per minute. If you go over, your account is blocked for 12 hours and you can’t get the data at all that day. You need a Scrooge-McDuckian vault of money to afford a new account, so you’re stuck with the one.

Assuming you have enough time to watch when the data gets released, you can construct a cumulative distribution P(t), which for a time t describes the probability that the data has been released before time t. I.e., the probability of success if you call the API at time t. If we assume the distribution falls within a single minute around 6AM, then we see a strikingly similarity between this problem and a silent duel with six shots. Perhaps it’s not exactly the same, since there are many more than two players in the game, but it’s close. Perhaps you can assume two players, but you (Player 1) get six shots and your opponent (Player 2) gets some larger number.

I was downright twitterpated to see a natural problem fit so neatly and unexpectedly into a bit of math I remembered, but hadn’t thought about for a decade. The solution was too detailed for me to remember it on the spot (I recall it involved some integrals and curious discontinuities), so I told my colleague I’d go find the paper that solves this problem.

Thus began my journey down the rabbit hole. The first hurdle was that I didn’t know what to call the problem. I found my professor’s notes from the course, but they didn’t provide a definitive name beyond “interval games.” After combing through some textbooks and, more helpfully, crawling back along the graph of citations, I discovered the name silent duel—and noisy duel for the variant where the shooters can hear each other’s attempts. However, no textbook I looked at actually provided a full explanation or proof of the solution. They just said, “optimal strategies have been proven to exist,” or detailed a simplification involving one or two bullets. And after a few more hours of looking I found the title of the original paper that solved this problem in the generality I wanted.

Rodrigo Restrepo. Tactical Problems Involving Several Actions. Contributions to the Theory of Games, Vol. III. 1957.

Unfortunately, I wasn’t able to find a digital copy. I did find a copy being sold on Amazon by a third-party seller. Apparently this seller bought old journal proceedings in bulk from the Bell Laboratories library after they closed down. I bought a copy, and according to the Amazon listing there are only 20-odd copies left.

I was also pleased to see the many recognizable names on the cover.

  • Rabin: Turing Award winner who invented nondeterminism as a computing concept (among many other accomplishments)
  • Gale & Shapley: inventors of the Stable Marriage algorithm, the latter of whom won the Nobel Price in Economics for subsequent work on applying it.
  • Berge: one of the leaders who established graph theory and combinatorics as mathematical disciplines in their own right
  • Karlin: a big name in math for social sciences (think of Arrow’s Impossibility Theorem).
  • Milnor: Fields medalist and heavyweight in differential topology.

I thought about how many of these old papers might be lost to history with no digital record. It’s a shame, because the silent duel is a cool problem, absent from many books, and, prompted by my recent discussions, applicable to software! Rodrigo Restrepo in particular seems to have had no PhD students. He might be a faculty emeritus at the University of British Columbia studying mathematical biology, but I wasn’t able to locate a website (or even a photo!) to cross-check publications. If any UBC math faculty read this, perhaps they can provide more details about who Dr. Restrepo is.

All of this culminated in the inevitable next steps. Buy the manuscript, re-typeset the paper in TeX, grok the theorem and the construction, put the paper on arXiv to make it accessible for the foreseeable future, and then use my newfound knowledge to corner the market on sheep futures once and for all!

I drafted the TeX rewrite (still has a few typos), and started working through the paper. Then I realized I had committed to publishing my book by the end of 2018. I forced myself to put it aside, and now I’ve returned to study it. I’ll detail my exploration of the paper and the code to implement the solution in subsequent posts. I intend the subsequent posts to be as much of a narrative of my process working through a paper as it is about the math itself (to be honest, the paper could be clearer, but I chalk it up to pre-computer era descriptions of algorithms). In general, I’d like to explore more and different kinds of ways to share and explore math on the internet.

In the mean time, intrepid readers can venture forth to see the draft on Github.

Until next time!

A Programmer’s Introduction to Mathematics

For the last four years I’ve been working on a book for programmers who want to learn mathematics. It’s finally done, and you can buy it today.

The website for the book is pimbook.org, which has purchase links—paperback and ebook—and a preview of the first pages. You can see more snippets later in the book on the Amazon listing’s “Look Inside” feature.

If you’re a programmer who wants to learn math, this book is written specifically for you! Why? Because programming and math are naturally complementary, and programmers have a leg up in learning math. Many of the underlying modes of thought in mathematics are present in programming, or are otherwise easy to explain by analogies and contrasts to familiar concepts in software. I leverage that in the book so that you can internalize the insights quickly, and appreciate the nuance more deeply than most books can allow. This book is a bridge from the world of programming to the world of math from the mathematician’s perspective. As far as I know, no other book provides this.

Programs make math more interesting and applicable than otherwise. Typical math writers often hold computation and algorithms at a healthy distance. Not us. We embrace computation as a prize and a principle worth fighting for. Each chapter of the book culminates in an exciting program that applies the mathematical insights from the chapter to an interesting application. The applications include cryptographic schemes, machine learning, drawing hyperbolic tessellations, and a Nobel-prize winning algorithm from economics.

The exercises of the book also push you beyond the book itself. There’s so much math out there that you can’t learn it from a single book. Perspectives and elaborations are spread throughout books, papers, blog posts, wikis, lecture notes, math magazines, and your own scratch paper. This book will prepare you to read a variety of sources by introducing you to the standard language of math, and also push you to engage with those resources.

Finally, this book includes a healthy dose of culture. Quotes and passages from the writings of famous mathematicians, contextual explanations of cultural attitudes, and a light dose of history will provide a peek into why mathematics is the way it is today, and why at times it can seem so confounding to an outsider. Through all this, I will show what progress means for math, what attitudes and patterns will help you along the way, and how to stay sane.

Of course, I couldn’t have written the book without the encouragement and support of you, my readers. Thank you for reading, commenting, and supporting me all these years.

Order the book today! I can’t wait to hear what you think 🙂

Table of Contents for PIM

I am down to the home stretch for publishing my upcoming book, “A Programmer’s Introduction to Mathematics.” I don’t have an exact publication date—I’m self publishing—but after months of editing, I’ve only got two chapters left in which to apply edits that I’ve already marked up in my physical copy. That and some notes from external reviewers, and adding jokes and anecdotes and fun exercises as time allows.

I’m committing to publishing by the end of the year. When that happens I’ll post here and also on the book’s mailing list. Here’s a sneak preview of the table of contents. And a shot of the cover design (still a work in progress)

img_20180914_184845-1-e1540048234788.jpg