The Inequality

Math and computer science are full of inequalities, but there is one that shows up more often in my work than any other. Of course, I’m talking about

\displaystyle 1+x \leq e^{x}

This is The Inequality. I’ve been told on many occasions that the entire field of machine learning reduces to The Inequality combined with the Chernoff bound (which is proved using The Inequality).

Why does it show up so often in machine learning? Mostly because in analyzing an algorithm you want to bound the probability that some bad event happens. The probability of a bad event often takes a form like

\displaystyle \prod_{i=1}^m (1-p_i)

And applying The Inequality with x = -p_i to each factor, we can bound this from above by

\displaystyle\prod_{i=1}^m (1-p_i) \leq \prod_{i=1}^m e^{-p_i} = e^{-\sum_{i=1}^m p_i}

The point is that usually m is the size of your dataset, which you get to choose, and by picking larger m you make the probability of the bad event vanish exponentially quickly in m. (Here p_i is unrelated to how I am about to use p_i as weights).
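As a quick sanity check, here is the bound computed numerically, with hypothetical per-example failure probabilities of my own choosing (not from any particular algorithm's analysis):

```python
import math
import random

random.seed(0)

# Hypothetical per-example failure probabilities for a dataset of size m.
m = 1000
p = [random.uniform(0.001, 0.01) for _ in range(m)]

# Probability that all m bad events are avoided simultaneously.
bad_event = math.prod(1 - p_i for p_i in p)

# Apply The Inequality termwise: each (1 - p_i) <= e^{-p_i}.
bound = math.exp(-sum(p))

assert bad_event <= bound
```

Increasing m makes sum(p) grow, so the bound e^{-Σp_i} shrinks exponentially, just as the argument above promises.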

Of course, The Inequality has much deeper implications than bounds for the efficiency and correctness of machine learning algorithms. To convince you of the depth of this simple statement, let’s see its use in an elegant proof of the arithmetic geometric inequality.

Theorem: (The arithmetic-mean geometric-mean inequality, general version): For all non-negative real numbers a_1, \dots, a_n and all positive p_1, \dots, p_n such that p_1 + \dots + p_n = 1, the following inequality holds:

\displaystyle a_1^{p_1} \cdots a_n^{p_n} \leq p_1 a_1 + \dots + p_n a_n

Note that when all the p_i = 1/n this is the standard AM-GM inequality.

Proof. This proof is due to George Polya (in Hungarian, Pólya György).

We start by modifying The Inequality 1+x \leq e^x by a shift of variables x \mapsto x-1, so that the inequality now reads x \leq e^{x-1}. Applying this to each a_i gives a_i \leq e^{a_i - 1}; raising both sides to the power p_i and multiplying over all i, we get

\displaystyle a_1^{p_1} \cdots a_n^{p_n} \leq e^{\sum_{i=1}^n p_ia_i - p_i} = e^{\left ( \sum_{i=1}^n p_ia_i \right ) - 1}

Now we have something quite curious: if we call A the sum p_1a_1 + \dots + p_na_n, the above shows that a_1^{p_1} \cdots a_n^{p_n} \leq e^{A-1}. Moreover, because A \leq e^{A-1} (The Inequality again), the right hand side of the inequality we’re trying to prove is also bounded by e^{A-1}. So both sides of our desired inequality (and in particular, their max) are bounded from above by e^{A-1}. This seems like a conundrum until we introduce the following beautiful idea: normalize by the thing you think should be the larger of the two sides of the inequality.

Define new variables b_i = a_i / A and notice that \sum_i p_i b_i = 1 just by unraveling the definition. Call this sum B = \sum_i p_i b_i. Now we know that

b_1^{p_1} \cdots b_n^{p_n} = \left ( \frac{a_1}{A} \right )^{p_1} \cdots \left ( \frac{a_n}{A} \right )^{p_n} \leq e^{B - 1} = e^0 = 1

Now we unpack the pieces and multiply through by A^{p_1}A^{p_2} \cdots A^{p_n} = A^{p_1 + \dots + p_n} = A; the result is exactly the AM-GM inequality.


Even deeper, there is only one case when The Inequality is tight, i.e. when 1+x = e^x, and that is x=0. This allows us to use the proof above to get a full characterization of the case of equality. Indeed, the crucial step was the bound a_i / A \leq e^{(a_i/A) - 1}, which is tight only when a_i / A = 1, i.e. when a_i = A. Spending a few seconds thinking about this gives the characterization: equality holds if and only if a_1 = a_2 = \dots = a_n = A.

So this is excellent: the arithmetic-geometric inequality is a deep theorem with applications all over mathematics and statistics. Adding another layer of indirection for impressiveness, one can use the AM-GM inequality to prove the Cauchy-Schwarz inequality rather directly. Sadly, the Wikipedia page for the Cauchy-Schwarz inequality hardly does justice to its massive number of applications. For example, many novel techniques in geometry and number theory are proved directly from C-S. More, in fact, than I can hope to learn.

Of course, no article about The Inequality could be complete without a proof of The Inequality.

Theorem: For all x \in \mathbb{R}, 1+x \leq e^x.

Proof. The proof starts by proving a simpler theorem, named after Bernoulli, that 1+nx \leq (1+x)^n for every x \in [-1, \infty) and every n \in \mathbb{N}. This is relatively straightforward by induction. The base case is trivial, and

\displaystyle (1+x)^{n+1} = (1+x)(1+x)^n \geq (1+x)(1+nx) = 1 + (n+1)x + nx^2

And because nx^2 \geq 0, we get Bernoulli’s inequality.

Now for any z \geq 0 we can set x = z/n, and get 1+z = 1+nx \leq (1+\frac{z}{n})^n for every n. Taking limits of both sides as n \to \infty, the right hand side converges to e^z by definition, so 1+z \leq e^z. For z < 0 the same argument works: once n > |z| we have z/n \in (-1, 0) \subset [-1, \infty), so Bernoulli’s inequality still applies for all large enough n, and the same limit argument proves the theorem.
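The limit is easy to watch numerically. A small illustration of my own (z = 2 chosen arbitrarily), showing (1 + z/n)^n climbing toward e^z from below while never dropping under 1 + z:

```python
import math

z = 2.0
for n in (1, 10, 100, 10_000, 1_000_000):
    approx = (1 + z / n) ** n
    # Bernoulli's inequality below, the theorem's limit above.
    assert 1 + z <= approx <= math.exp(z)
    print(f"n = {n:>9}: (1 + z/n)^n = {approx:.6f}")

print(f"e^z          = {math.exp(z):.6f}")
```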


What other insights can we glean about The Inequality? For one, it’s a truncated version of the Taylor series approximation

\displaystyle e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots

Indeed, the Taylor remainder theorem tells us that the error of the two-term approximation is e^{\xi} x^2 / 2 \geq 0 for some \xi between 0 and x. In other words, 1+x is a lower bound on e^x around zero. It is perhaps miraculous that this extends to a lower bound everywhere, until you realize that exponentials grow extremely quickly and lines do not.

One might wonder whether we can improve our approximation with higher order approximations. Indeed we can, but we have to be a bit careful. In particular, 1+x+x^2/2 \leq e^x is only true for nonnegative x (because the remainder term is now proportional to x^3), but if we truncate at an odd degree we win: 1+x+x^2/2 + x^3/6 \leq e^x is true for all x.
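You can spot-check these truncation claims at a few sample points (values of my own choosing):

```python
import math

def taylor(x, degree):
    """Truncated Taylor polynomial of e^x up to the given degree."""
    return sum(x ** k / math.factorial(k) for k in range(degree + 1))

# The quadratic truncation fails as a lower bound for negative x...
assert taylor(-3.0, 2) > math.exp(-3.0)

# ...but odd-degree truncations are lower bounds everywhere we check.
for x in (-10.0, -1.0, 0.0, 0.5, 3.0):
    assert taylor(x, 1) <= math.exp(x)
    assert taylor(x, 3) <= math.exp(x)
```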

What is really surprising about The Inequality is that, at least in the applications I work with, we rarely see higher order approximations used. For most applications, the difference between a quadratic error term and a cubic or quartic one is often not worth the extra work in analyzing the result. You get the same theorem: that something vanishes exponentially quickly.

Support Math ∩ Programming by becoming a patron!

I’m making a few changes to the funding of Math ∩ Programming. First and foremost, I’m launching a Patreon campaign.


The way Patreon works is that you, dear reader, sign up for a monthly donation of any amount you please (as little as $1/month). There are at least three benefits to doing this:

  1. You show your support of mathematics and programming. You provide documented evidence that you’re a good person. You help to ensure that Math ∩ Programming stays a high quality resource for everyone. You feel good.
  2. There are aggregate milestone goals that make Math ∩ Programming a better place. The first, for example, is that when Math ∩ Programming reaches $200/month I will permanently remove ads. See below for a detailed description of how ads currently support the blog (spoiler: it’s not much).
  3. There are individual benefits if you decide to pledge $5 or more per month. This includes a monthly Google hangout I’ll host, and a sneak peek for every new post and private discussion with me. I’m still thinking about how exactly I’ll implement the member preview, but my current plan is to have a private subreddit. But even if you don’t use reddit there’ll be another way to get access. The highest tier of rewards involves physical merchandise.

I’m excited about Patreon because it seems like an excellent platform. For example, the popular Numberphile YouTube channel makes almost $3k USD per month from patrons. To put that into perspective it’s $36k per year, and my graduate student stipend is only about $17k per year. Assuming Math ∩ Programming could get even half the success of Numberphile, I could have potentially funded my entire graduate work just from blogging!

And channels like Numberphile are purely for entertainment’s sake. Math ∩ Programming has the additional benefit of providing working code for algorithms that are directly applicable to business. So if you have ever used the code at Math ∩ Programming as the start of a project or feature, or even if you just have fun reading about math and seeing cool applications, consider becoming a patron to say thank you!

Here are some other minor funding changes.

  • One-time donations are now preferred through Square, due to the lower (1.9%) transaction fee. Square requires a debit card, so if you don’t have one or don’t want to use one you can still use PayPal to donate.
  • People rarely buy merchandise. I made a total of $147 on merchandise since 2013. So I’m going to stop doing that for now. Maybe I just have to come up with better merchandise (comments are welcome).
  • When I link to textbooks on Amazon I’m going to use Amazon Affiliate. Amazon pays me a little bit of money if you use the link and then end up buying something.

Funding so far

As of August 1, 2015 I have made a total of exactly $2,847.55 USD from ads and donations. About $320 of that is from 2015. It works out to about $70 per month since I first asked for money in 2013 and set up ads. It’s a nice little chunk of change, but nothing to get too excited about. Here is a chart of my ad income:


My average (and 6-month moving average) ad revenue. July revenue isn’t posted until mid August.

Donations have provided the rest of the funding, but donations appear to follow a Poisson distribution and the median monthly revenue is zero. By far most of the donations were in the few months after I first asked for donations.

I started my blog with pretty low expectations: I learned a lot of cool things and I wanted to share them, while understanding them better by writing code and filling in proof details. That’s still the core dream, and it will always be the core of Math ∩ Programming. So while it’s pretty cool that I can make any money at all from my blog, and I’m interested to see if I can grow it into a viable side business, you can rest assured that Math ∩ Programming will stay true to its core.

Math ∩ Programming Survey

And the year four birthday post

So Math ∩ Programming was actually born on June 12th 2011, but since my schedule is about to get super busy (more on that below) I’m writing my birthday post a month early.

First things first: I’m conducting a survey of my readers! If you want to help shape the future of Math ∩ Programming, please go take the survey. The results will be private, but I may at some point release some statistics about responses to some of the questions.

So my life is busy this summer. Here’s a few of the things I’ll be doing.

  • I’m giving a poster at the Network Science 2015 workshop at Snowbird, Utah.
  • I’m attending the huge ACM FCRC 2015 conference extravaganza. There will be 12 conferences in one convention center! I’d have reason to go to three or four of them, but they all overlap in time, so I’ll probably approach it like a buffet and try a little piece of everything.
  • I’m going to present a little paper at the ICML Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT ML, the best acronym) in France, and we’re working on beefing it up into a more substantial paper.
  • I recently submitted my application for Hungarian citizenship, which I hold by birthright, but which even after submission requires a lot of paperwork in a language I don’t speak very well.
  • I’m rushing to hit some May and June paper deadlines, since…
  • I’m getting married in June! So exciting!

With regards to my blog, I have a big fat list of promises I have not delivered on. Including,

  • Finishing the series on linear programming.
  • Getting to any interesting quantum algorithms.
  • Machine-learny things like support vector machines, boosting, VC-dimension, community detection methods, etc.
  • Topology-related stuff and persistent homology.
  • Following the big blog map I drew last year.

The reason is that I’ve been spending some time on other blog-related projects I plan to announce in the coming months, as well as spending more and more time on a growing list of research projects, and of course on wedding planning. So sadly this will be a very slow summer for M∩P.

And finally, starting in the Fall I will be on the academic job market. I’ll likely post again when September comes around, but my anticipated graduation date is May 2016. So that’s going to be quite the roller coaster I’m sure.

My LaTeX Workflow: latexmk, ShareLaTeX, and StackEdit

Over the last year or so I’ve gradually spent more and more of my time typing math. Be it lecture notes, papers, or blog posts, I think in the last two years I’ve typed vastly more dollar signs (TeX math mode delimiters) than in the rest of my life combined. As is the natural inclination for most programmers, I’ve tried lots of different ways to optimize my workflow and minimize the amount of typing, configuring, file duplicating, and compiler-wrestling I do in my day-to-day routine.

I’ve arrived at what I feel is a stable state. Here’s what I use.

First, my general setup. At home I run OS X Mavericks (10.9.5), and I carry a Chromebook with me to campus and when I travel.

For on-the-fly note taking

I haven’t found a better tool than StackEdit.


Mindset: somewhere in between writing an email with one or two bits of notation (just write TeX source and hope they can read it) and writing a document that needs to look good. These are documents for which you have no figures, don’t want to keep track of sections and theorem numbering, and have no serious bibliography.

Use cases:

  • In class notes: where I need to type fast and can sacrifice on prettiness. Any other workflow besides Markdown with TeX support is just awfully slow, because the boilerplate of LaTeX proper involves so much typing (\begin{theorem} \end{theorem}, etc.)
  • Notes during talks: these notes usually have fewer formulas and more sentences, but the ability to use notation when I want it really helps.
  • Short drafts of proofs: when I want to send something technical yet informal to a colleague, but it’s in such a draft phase that I’m more concerned about the idea being right—and on paper—than whether it looks good.

Awesome features: I can access documents from Google Drive. Integration with Dropbox (which they have) is not enough because I don’t have Dropbox on every computer I use (Chromebook, public/friends’ computers). Also, you can configure Google Drive to open markdown files with StackEdit by default (otherwise Drive can’t open them at all).

How it could improve: The service gets sluggish with longer documents, and sometimes the preview page jumps around like crazy when you have lots of offset equations. Sometimes it seems like it recompiles the whole document when you only change one paragraph, and so the preview can be useless to look at while you’re typing. I recently discovered you can turn off features you don’t use in the settings, so that might speed things up.

Also, any time something needs to be aligned (such as a matrix or piecewise notation), you have to type \begin{}’s and \end{}’s, so that slows down the typing. It would be nice to have some shortcuts like \matrix[2,3]{1,3,4,4,6,8} or at least an abbreviation for \begin and \end (\b{} and \e{}, maybe?). Also some special support for (and shortcuts for) theorem/proof styling would be nice, but not necessary. Right now I embolden the Theorem and italicize the Proof., and end with a tombstone \square on a line by itself. I don’t see a simple way to make a theorem/proof environment with minimal typing, but it does occur to me as an inefficiency; the less time I can spend highlighting and formatting things the better.

Caveats: Additional features, such as exporting from StackEdit to pdf, require you to become a donor ($5/year, a more than fair price for the amount I use it). I would find the service significantly less useful if I could not export to pdf.

For work while travelling

My favorite so far is ShareLaTeX.

I’ve used a bunch of online TeX editors, most notably Overleaf (formerly WriteLaTeX). They’re both pretty solid, but a few features tip me toward ShareLaTeX. I’ll italicize these things below.

Mindset: An editor I can use on my Chromebook or a public machine, yet still access my big papers and projects in progress. Needs support for figures, bibliographies, the whole shebang. Basically I need a browser replacement for a desktop LaTeX setup. I generally do not need collaboration services, because the de facto standard among everyone I’ve ever interacted with is that you can only expect people to have Dropbox. You cannot expect them to sign up for online services just to work with you.

Use cases:

  • Drafting actual research papers
  • Writing slides/talks

Awesome features: Dropbox integration! This is crucial, because I (and everyone I know) do big collaborative projects using Dropbox. ShareLaTeX (unlike Overleaf) has seamless Dropbox integration. The only caveat is that ShareLaTeX only accesses Dropbox files that are in a specially-named folder. This causes me to use a bunch of symbolic links that would be annoying to duplicate if I got a new machine.

Other than that, ShareLaTeX (like Overleaf) has tons of templates, all the usual libraries, great customer support, and great collaborative features for the once in a blue moon that someone else uses ShareLaTeX.

Vim commands. The problem is that they don’t go far enough here. They don’t support vim-style word-wrapping (gq), and they leave out things like backward search (? instead of /) and any : commands you tend to use.

Github integration. Though literally no mathematicians I know use Github for anything related to research, I think that with the right features Github could become the “right” solution to paper management. The way people store and “archive” their work is horrendous, and everyone can agree a waste of time. I have lots of ideas for how Github could improve academics’ lives and the lives of the users of their research, too many to list here without derailing the post. The point is that ShareLaTeX having Github integration is forward thinking and makes ShareLaTeX more attractive.

How it could improve: Better vim command support. It seems like many of these services are viewed by their creators as a complete replacement for offline work, when really (for me) it’s a temporary substitute that needs to operate seamlessly with my other services. So basically the more seamless integration it has with services I use, the better.

Caveats: Integration comes at a premium of $8/month for students, and $15/month for non-students.

Work at home

This is where we get into the nitty gritty of terminal tools. Because naively writing papers in TeX on a desktop has a lot of lame steps and tricks. There are (multiple types of) bibliography files to manage, you have to run like four commands to compile a document, and the TeX compiler errors are often nonsense.

I used to have a simple script to compile;display;clean for me, but then I came across the latexmk program. What you can do is configure latexmk to automatically recompile when a change is made to a source file, and then you can configure a pdf viewer (like Skim) to update when the pdf changes. So instead of the workflow being “Write. Compile. View. Repeat,” it’s “Compile. View. Write until done.”

Of course lots of random TeX distributions come with crusty GUIs that (with configuration) do what latexmk does. But I love my vim, and you have your favorite editor, too. The key part is that latexmk and Skim don’t care what editor you use.

For reference, here’s how I got it all configured on OS X Mavericks.

  1. Install latexmk (move the perl script downloadable from their website to anywhere on your $PATH).
  2. Add alias latexmk='latexmk -pvc' to your .profile. The -pvc flag makes latexmk watch for changes.
  3. Add the following to a new file called .latexmkrc in your home directory (it says: I only do pdfs and use Skim to preview):
    $pdf_mode = 1;
    $postscript_mode = 0;
    $dvi_mode = 0;
    $pdf_previewer = "open -a /Applications/";
    $clean_ext = "paux lox pdfsync out";
  4. Install Skim.
  5. In Skim’s preferences, go to the Sync tab and check the box “Check for file changes.”
  6. Run the following from the command line, which prevents Skim from asking (once for each file!) whether you want to auto reload that file:
    $ defaults write -app Skim SKAutoReloadFileUpdate -boolean true

Now the workflow is: browse to your working directory; run latexmk yourfile.tex (this will open Skim); open the tex document in your editor; write. When you save the file, it will automatically recompile and display in Skim. Since it’s OS X, you can scroll through the pdf without switching window focus, so you don’t even have to click back into the terminal window to continue typing.

Finally, I have two lines in my .vimrc to auto-save every second that the document is idle (or when the window loses focus) so that I don’t have to type :w every time I want the updates to display. To make this happen only when you open a tex file, add these lines instead to ~/.vim/ftplugin/tex.vim

set updatetime=1000
autocmd CursorHoldI,CursorHold,BufLeave,FocusLost silent! wall

Caveats: I haven’t figured out how to configure latexmk to do anything more complicated than this. Apparently it’s possible to get it set up to work with “Sync support,” which means essentially you can go back and forth between the source file lines and the corresponding rendered document lines by clicking places. I think reverse search (pdf->vim) isn’t possible with regular vim (it apparently is with macvim), but forward search (vim->pdf) is, if you’re willing to install some plugins and configure some files. So here is the place where Skim does care what editor you use. I haven’t yet figured out how to do it, but it’s not a feature I care much for.

One deficiency I’ve found: there’s no good bibliography manager. Sorry, Mendeley, I really can’t function with you. I’ll just be hand-crafting my own bib files until I find or make a better solution.

Have any great tools you use for science and paper writing? I’d love to hear about them.