Math ∩ Programming

Featured Posts

Mon, 20 Jun 2011 18:31:45 +0000

My next book will be Practical Math for Programmers Searching for Riemann Hypothesis Counterexamples Linear Programming and Healthy Diets Hybrid Images Bezier Curves and Picasso Correcting Errors in Data

A High-Level Technical Overview of Fully Homomorphic Encryption

Sat, 04 May 2024 15:30:00 +0000

About two years ago, I switched teams at Google to focus on fully homomorphic encryption (abbreviated FHE, or sometimes HE). Since then I’ve got to work on a lot of interesting projects, learning along the way about post-quantum cryptography, compiler design, and the ins and outs of fully homomorphic encryption. If you’ve heard about FHE and you’re a software person, you’ve probably heard two things: it lets you run programs directly on encrypted data without ever decrypting it; and it’s still too slow to be useful for anything.

Unusual Tips for Parenting Toddlers

Mon, 01 Apr 2024 07:00:00 +0000

It’s April Cools! Last year I wrote about friendship bracelets and the year before about cocktails. This year it’s parenting. Parenting articles are a dime a dozen and always bury the lede behind a long story. I’ll skip that. How to think about your child and your role as a parent These are framing devices. Concrete things to do to work toward these are in the next section. I will refer to the numbers for each action to show what principle it is applying.

Tabletop Games Based on Math Problems

Fri, 08 Mar 2024 20:06:19 -0800

There’s a family of tabletop games that are based directly on a nontrivial mathematics problem. As a casual and fun way to inaugurate my new blog (migrated from Wordpress to Hugo, after my work on getting better LaTeX mathmode support in Hugo), I thought I’d write a short listicle about them, so that I have a place to add more as I find them, as well as give the shortest canonical description of the associated math problem.

MLIR — A Global Optimization and Dataflow Analysis

Wed, 15 Nov 2023 11:20:52 +0000

Table of Contents In this article we’ll implement a global optimization pass, and show how to use the dataflow analysis framework to verify the results of our optimization. The code for this article is in this pull request, and as usual the commits are organized to be read in order. The noisy arithmetic problem This demonstration is based on a simplified model of computation relevant to the HEIR project. You don’t need to be familiar with that project to follow this article, but if you’re wondering why someone would ever want the kind of optimization I’m going to write, that project is why.

MLIR — Lowering through LLVM

Wed, 01 Nov 2023 07:00:00 +0000

Table of Contents In the last article we lowered our custom poly dialect to standard MLIR dialects. In this article we’ll continue lowering it to LLVM IR, exporting it out of MLIR to LLVM, and then compiling to x86 machine code. The code for this article is in this pull request, and as usual the commits are organized to be read in order. Defining a Pipeline The first step in lowering to machine code is to lower to an “exit dialect.

MLIR — Dialect Conversion

Mon, 23 Oct 2023 06:00:00 +0000

Table of Contents In previous articles we defined a dialect, and wrote various passes to optimize and canonicalize a program using that dialect. However, one of the main tenets of MLIR is “incremental lowering,” the idea that there are lots of levels of IR granularity, and you incrementally lower different parts of the IR, only discarding information when it’s no longer useful for optimizations. In this article we’ll see the first step of that: lowering the poly dialect to a combination of standard MLIR dialects, using the so-called dialect conversion infrastructure to accomplish it.

Socks, a matching game based on an additive combinatorics problem

Sat, 14 Oct 2023 06:00:00 +0000

Can you find a set of cards among these six, such that the socks on the chosen cards can be grouped into matching pairs? (Duplicate pairs of the same sock are OK) Spoilers: If the cards are indexed as 1 2 3 4 5 6 Then the following three subsets work: $\{ 1, 2, 4, 5, 6 \}$, $\{ 2, 3, 6 \}$, and $\{ 1, 3, 4, 5 \}$.

MLIR — Canonicalizers and Declarative Rewrite Patterns

Wed, 20 Sep 2023 08:00:00 +0000

Table of Contents In a previous article we defined folding functions, and used them to enable some canonicalization and the sccp constant propagation pass for the poly dialect. This time we’ll see how to add more general canonicalization patterns. The code for this article is in this pull request, and as usual the commits are organized to be read in order. Why is Canonicalization Needed? MLIR provides folding as a mechanism to simplify an IR, which can result in simpler, more efficient ops (e.

Encoding Schemes in FHE

Mon, 18 Sep 2023 11:08:39 +0000

In cryptography, we need a distinction between a cleartext and a plaintext. A cleartext is a message in its natural form. A plaintext is a cleartext that is represented in a specific way to prepare it for encryption in a specific scheme. The process of taking a cleartext and turning it into a plaintext is called encoding, and the reverse is called decoding. In homomorphic encryption, the distinction matters. Cleartexts are generally all integers, though the bit width of allowed integers can be restricted (e.

MLIR — Verifiers

Wed, 13 Sep 2023 09:00:00 +0000

Table of Contents Last time we defined folders and used them to enable some canonicalization and the sccp constant propagation pass for the poly dialect. This time we’ll add some additional safety checks to the dialect in the form of verifiers. The code for this article is in this pull request, and as usual the commits are organized to be read in order. Purpose of a verifier Verifiers ensure the types and operations in a concrete MLIR program are well-formed.

MLIR — Folders and Constant Propagation

Mon, 11 Sep 2023 08:00:00 +0000

Table of Contents Last time we saw how to use pre-defined MLIR traits to enable upstream MLIR passes like loop-invariant-code-motion to apply to poly programs. We left out -sccp (sparse conditional constant propagation), and so this time we’ll add what is needed to make that pass work. It requires the concept of folding. The code for this article is in this pull request, and as usual the commits are organized to be read in order.

MLIR — Using Traits

Thu, 07 Sep 2023 08:00:00 +0000

Table of Contents Last time we defined a new dialect poly for polynomial arithmetic. This time we’ll spruce up the dialect by adding some pre-defined MLIR traits, and see how the application of traits enables some general purpose passes to optimize poly programs. The code for this article is in this pull request, and as usual the commits are organized to be read in order. Traits and Loop Invariant Code Motion As a compiler toolchain, MLIR heavily emphasizes code reuse.

Computing Percentages Easier

Tue, 05 Sep 2023 21:47:37 +0000

Problem: Compute 16% of 25 in your head. Solution: 16% of 25 is equivalent to 25% of 16, which is clearly 4. This is true for all numbers: $x\%$ of $y$ is always equal to $y\%$ of $x$. The first one is $\frac{x}{100} y$ and the second is $\frac{y}{100}x$, and because multiplication is commutative and associative, both are equal to $(x \cdot y) / 100$. You can pick the version that is easiest.

MLIR — Defining a New Dialect

Mon, 21 Aug 2023 08:00:00 +0000

Table of Contents In the last article in the series, we migrated the passes we had written to use the tablegen code generation framework. That was a preface to using tablegen to define dialects. In this article we’ll define a dialect that represents arithmetic on single-variable polynomials, with coefficients in $\mathbb{Z} / 2^{32} \mathbb{Z}$ (32-bit unsigned integers). The code for this article is in this pull request, and as usual the commits are organized to be read in order.

MLIR — Using Tablegen for Passes

Thu, 10 Aug 2023 14:41:45 +0000

Table of Contents In the last article in this series, we defined some custom lowering passes that modified an MLIR program. Notably, we accomplished that by implementing the required interfaces of the MLIR API directly. This is not the way that most MLIR developers work. Instead, they use a code generation tool called tablegen to generate boilerplate for them, and then only add the implementation methods that are custom to their work.

MLIR — Writing Our First Pass

Thu, 10 Aug 2023 14:39:42 +0000

Table of Contents This series is an introduction to MLIR and an onboarding tutorial for the HEIR project. Last time we saw how to run and test a basic lowering. This time we will write some simple passes to illustrate the various parts of the MLIR API and the pass infrastructure. As mentioned previously, the main work in MLIR is defining passes that either optimize part of a program, lower from parts of one dialect to others, or perform various normalization and canonicalization operations.

MLIR — Running and Testing a Lowering

Thu, 10 Aug 2023 14:36:13 +0000

Table of Contents Last time, we covered a Bazel build system setup for an MLIR project. This time we’ll give an overview of a simple lowering and show how end-to-end tests work in MLIR. All of the code for this article is contained in this pull request on GitHub, and the commits are nicely organized and quite readable. Two of the central concepts in MLIR are dialects and lowerings. These are the scaffolding within which we can do the truly interesting parts of a compiler—that is, the optimizations and analyses.

MLIR — Getting Started

Thu, 10 Aug 2023 14:32:33 +0000

Table of Contents As we announced recently, my team at Google has started a new effort to build production-worthy engineering tools for Fully Homomorphic Encryption (FHE). One focal point of this, and one which I’ll be focusing on as long as Google is willing to pay me to do so, is building out a compiler toolchain for FHE in the MLIR framework (Multi-Level Intermediate Representation). The project is called Homomorphic Encryption Intermediate Representation, or HEIR.

Google's Recent FHE work, and starting HEIR

Thu, 10 Aug 2023 12:22:29 +0000

Today my team at Google published an article on Google’s Developers Blog with some updates on what we’ve been doing with fully homomorphic encryption (FHE). There’s fun stuff in there, including work on video processing FHE, compiling ML models to FHE, an FHE implementation for TPUs, and improvements to the compiler I wrote about earlier this year. TODO: add mower gif video A simple object movement tracking algorithm in FHE, tracking a runaway lawn mower from a Nest camera.

Two's Complement and Group Theory

Mon, 10 Jul 2023 07:00:00 +0000

Before I discovered math, I was a first year undergrad computer science student taking Electrical Engineering 101. The first topic I learned was what bits and boolean gates are, and the second was the two’s complement representation of a negative n-bit integer. At the time two’s complement seemed to me like a bizarre quirk of computer programming, with minutiae you just had to memorize. If the leading bit is 1, it’s negative, and otherwise it’s positive.

We're Knot Friends

Sat, 01 Apr 2023 05:00:00 +0000

It’s April Cools again. For a few summers in high school and undergrad, I was a day camp counselor. I’ve written before about how it helped me develop storytelling skills, but recently I thought of it again because, while I was cleaning out a closet full of old junk, I happened upon a bag of embroidery thread. While stereotypically used to sew flowers into a pillowcase or write “home sweet home” on a hoop, at summer camps embroidery thread is used to make friendship bracelets.

Sample Extraction from RLWE to LWE

Mon, 27 Feb 2023 08:00:00 +0000

In this article I’ll derive a trick used in FHE called sample extraction. In brief, it allows one to partially convert a ciphertext in the Ring Learning With Errors (RLWE) scheme to the Learning With Errors (LWE) scheme. Here are some other articles I’ve written about other FHE building blocks, though they are not prerequisites for this article. Modulus Switching in LWE Key Switching in LWE The Gadget Decomposition in FHE Negacyclic Polynomial Multiplication Estimating the Security of Ring Learning With Errors LWE and RLWE The first two articles in the list above define the Learning With Errors problem (LWE).

Google's Fully Homomorphic Encryption Compiler — A Primer

Mon, 13 Feb 2023 11:34:01 +0000

Back in May of 2022 I transferred teams at Google to work on Fully Homomorphic Encryption (newsletter announcement). Since then I’ve been working on a variety of projects in the space, including being the primary maintainer on github.com/google/fully-homomorphic-encryption, which is an open source FHE compiler for C++. This article will be an introduction to how to use it to compile programs to FHE, as well as a quick overview of its internals.

Estimating the Security of Ring Learning with Errors (RLWE)

Wed, 28 Dec 2022 14:52:01 +0000

This article was written by my colleague, Cathie Yun. Cathie is an applied cryptographer and security engineer, currently working with me to make fully homomorphic encryption a reality at Google. She’s also done a lot of cool stuff with zero knowledge proofs. In previous articles, we’ve discussed techniques used in Fully Homomorphic Encryption (FHE) schemes. The basis for many FHE schemes, as well as other privacy-preserving protocols, is the Learning With Errors (LWE) problem.

Negacyclic Polynomial Multiplication

Fri, 09 Dec 2022 13:35:45 +0000

In this article I’ll cover three techniques to compute special types of polynomial products that show up in lattice cryptography and fully homomorphic encryption. Namely, the negacyclic polynomial product, which is the product of two polynomials in the quotient ring $\mathbb{Z}[x] / (x^N + 1)$. As a precursor to the negacyclic product, we’ll cover the simpler cyclic product. All of the Python code written for this article is on GitHub.

Polynomial Multiplication Using the FFT

Wed, 16 Nov 2022 08:00:00 +0000

Problem: Compute the product of two polynomials efficiently. Solution: import numpy from numpy.fft import fft, ifft def poly_mul(p1, p2): """Multiply two polynomials. p1 and p2 are arrays of coefficients in degree-increasing order. """ deg1 = p1.shape[0] - 1 deg2 = p1.shape[0] - 1 # Would be 2*(deg1 + deg2) + 1, but the next-power-of-2 handles the +1 total_num_pts = 2 * (deg1 + deg2) next_power_of_2 = 1 << (total_num_pts - 1).

Carnival of Mathematics #209

Sun, 02 Oct 2022 08:00:00 +0000

Welcome to the 209th Carnival of Mathematics! 209 has a few distinctions, including being the smallest number with 6 representations as a sum of 3 positive squares: $$\begin{aligned}209 &= 1^2 + 8^2 + 12^2 \\\ &= 2^2 + 3^2 + 14^2 \\\ &= 2^2 + 6^2 + 13^2 \\\ &= 3^2 + 10^2 + 10^2 \\\ &= 4^2 + 7^2 + 12^2 \\\ &= 8^2 + 8^2 + 9^2 \end{aligned}$$ As well as being the 43rd Ulam number, the number of partitions of 16 into relatively prime parts and the number of partitions of 63 into squares.

Key Switching in LWE

Mon, 29 Aug 2022 08:00:00 +0000

Last time we covered an operation in the LWE encryption scheme called modulus switching, which allows one to switch from one modulus to another, at the cost of introducing a small amount of extra noise, roughly $\sqrt{n}$, where $n$ is the dimension of the LWE ciphertext. This time we’ll cover a more sophisticated operation called key switching, which allows one to switch an LWE ciphertext from being encrypted under one secret key to another, without ever knowing either secret key.

Modulus Switching in LWE

Sat, 16 Jul 2022 14:39:56 +0000

The Learning With Errors problem is the basis of a few cryptosystems, and a foundation for many fully homomorphic encryption (FHE) schemes. In this article I’ll describe a technique used in some of these schemes called modulus switching. In brief, an LWE sample is a vector of values in $\mathbb{Z}/q\mathbb{Z}$ for some $q$, and in LWE cryptosystems an LWE sample can be modified so that it hides a secret message $m$.

"Practical Math" Preview: Collect Sensitive Survey Responses Privately

Sat, 14 May 2022 09:40:49 +0000

This is a draft of a chapter from my in-progress book, Practical Math for Programmers: A Tour of Mathematics in Production Software. Tip: Determine an aggregate statistic about a sensitive question, when survey respondents do not trust that their responses will be kept secret. Solution: import random def respond_privately(true_answer: bool) -> bool: '''Respond to a survey with plausible deniability about your answer.''' be_honest = random.random() < 0.5 random_answer = random.random() < 0.

Cocktails

Fri, 01 Apr 2022 07:00:00 +0000

It’s April Cools! We’re taking back April Fools. When I was younger I had a strange relationship with alcohol, not because of any sort of trauma, but because I found it decidedly boring and disgusting to the taste. I didn’t drink in high school, didn’t enjoy parties in college, and didn’t care for tailgating or other sports-based events where drinking was common. I also never enjoyed wine—red is too tannic, white is just meh—and almost all beer tastes awful to me (lambic is a delightful exception).

Silent Duels—Constructing the Solution part 2

Thu, 24 Mar 2022 07:00:00 +0000

Previous posts in this series: Silent Duels and an Old Paper of Restrepo Silent Duels—Parsing the Construction Silent Duels—Constructing the Solution part 1 Since it’s been three years since the last post in this series, and the reason for the delay is that I got totally stuck on the implementation. I’m publishing this draft article as partial progress until I can find time to work on it again. If you haven’t read the last post, please do.

My next book will be "Practical Math for Programmers"

Wed, 16 Mar 2022 07:00:00 +0000

tl;dr: I’m writing a new book, sign up for the announcements mailing list. I’ve written exactly zero new technical blog posts this year because I’ve been spending all my writing efforts on my next book, Practical Math for Programmers (PMFP, subtitle: A Tour of Mathematics in Production Software). I’ve written a little bit about it in my newsletter, Halfspace. There I rant, critique, brainstorm, and wax poetic about math and software.

The Gadget Decomposition in FHE

Sat, 11 Dec 2021 13:57:25 +0000

Lately I’ve been studying Fully Homomorphic Encryption, which is the miraculous ability to perform arbitrary computations on encrypted data without learning any information about the underlying message. It’s the most comprehensive private computing solution that can exist (and it does exist!). The first FHE scheme by Craig Gentry was based on ideal lattices and was considered very complex (I never took the time to learn how it worked). Some later schemes (GSW = Gentry-Sahai-Waters) are based on matrix multiplication, and are conceptually much simpler.

Group Actions and Hashing Unordered Multisets

Thu, 14 Oct 2021 08:00:00 +0000

I learned of a neat result due to Kevin Ventullo that uses group actions to study the structure of hash functions for unordered sets and multisets. This piqued my interest because a while back a colleague asked me if I could think of any applications of “pure” group theory to practical computer programming that were not cryptographic in nature. He meant, not including rings, fields, or vector spaces whose definitions happen to be groups when you forget the extra structure.

Carnival of Mathematics #197

Wed, 01 Sep 2021 08:00:00 +0000

Welcome to the 197th Carnival of Mathematics! 197 is an unseemly number, as you can tell by the Wikipedia page which currently says that it has “indiscriminate, excessive, or irrelevant examples.” How deviant. It’s also a Repfigit, which means if you start a fibonacci-type sequence with the digits 1, 9, 7, and then continue with $ a_n = a_{i-3} + a_{i-2} + a_{i-1}$, then 197 shows up in the sequence. Indeed: 1, 9, 7, 17, 33, 57, 107, 197, …

Searching for RH Counterexamples — Exploring Data

Mon, 14 Jun 2021 08:00:00 +0000

We’re ironically searching for counterexamples to the Riemann Hypothesis. Setting up Pytest Adding a Database Search Strategies Unbounded integers Deploying with Docker Performance Profiling Scaling up Productionizing In the last article we added a menagerie of “production readiness” features like continuous integration tooling (automating test running and static analysis), alerting, and a simple deployment automation. Then I let it loose on AWS, got extremely busy with buying a house, forgot about this program for a few weeks (no alerts means it worked flawlessly!

Regression and Linear Combinations

Mon, 29 Mar 2021 09:00:00 +0000

Recently I’ve been helping out with a linear algebra course organized by Tai-Danae Bradley and Jack Hidary, and one of the questions that came up a few times was, “why should programmers care about the concept of a linear combination?” For those who don’t know, given vectors $ v_1, \dots, v_n$, a linear combination of the vectors is a choice of some coefficients $ a_i$ with which to weight the vectors in a sum $ v = \sum_{i=1}^n a_i v_i$.

Searching for RH Counterexamples — Productionizing

Sat, 06 Mar 2021 10:08:59 +0000

We’re ironically searching for counterexamples to the Riemann Hypothesis. Setting up Pytest Adding a Database Search Strategies Unbounded integers Deploying with Docker Performance Profiling Scaling up In the last article we rearchitected the application so that we could run as many search instances as we want in parallel, and speed up the application by throwing more compute resources at the problem. This is good, but comes with a cost. The complexity of this new architecture requires us to manage many different containers and AWS instances.

Searching for RH Counterexamples — Scaling Up

Tue, 16 Feb 2021 09:00:00 +0000

We’re ironically searching for counterexamples to the Riemann Hypothesis. Setting up Pytest Adding a Database Search Strategies Unbounded integers Deploying with Docker Performance Profiling Last time we made the audacious choice to remove primary keys from the RiemannDivisorSums table for performance reasons. To help with that, we will do two things in this post Reduce the storage footprint of the whole application (it was 60 GiB when it crashed, and we got up to 84 prime factors).

Searching for RH Counterexamples — Performance Profiling

Tue, 02 Feb 2021 08:00:00 +0000

We’re ironically searching for counterexamples to the Riemann Hypothesis. Setting up Pytest Adding a Database Search Strategies Unbounded integers Deploying with Docker In the last article we ran into some performance issues with our deployed docker application. In this article we’ll dig in to see what happened, fix the problem, run into another problem, fix it, and run the search until we rule out RH witness-value-based counterexamples among all numbers with fewer than 85 prime factors.

Searching for RH Counterexamples — Deploying with Docker

Mon, 04 Jan 2021 09:00:00 +0000

We’re ironically searching for counterexamples to the Riemann Hypothesis. Setting up Pytest Adding a Database Search Strategies Unbounded Integers In this article we’ll deploy the application on a server, so that it can search for RH counterexamples even when I close my laptop. Servers and containers When deploying applications to servers, reproducibility is crucial. You don’t want your application to depend on the details of the computer it’s running on. This is a higher-level version of the same principle behind Python virtual environments, but it applies to collections of programs, possibly written in different languages and running on different computers.

Optimization Models for Subset Cover

Tue, 20 Oct 2020 09:00:00 +0000

In a recent newsletter article I complained about how researchers mislead about the applicability of their work. I gave SAT solvers as an example. People provided interesting examples in response, but what was new to me was the concept of SMT (Satisfiability Modulo Theories), an extension to SAT. SMT seems to have more practical uses than vanilla SAT (see the newsletter for details). I wanted to take some time to explore SMT solvers, and I landed on Z3, an open-source SMT solver from Microsoft.

Searching for RH Counterexamples — Unbounded Integers

Tue, 13 Oct 2020 09:00:00 +0000

We’re ironically searching for counterexamples to the Riemann Hypothesis. Setting up Pytest Adding a Database Search strategies In the last article, we improved our naive search from “try all positive integers” to enumerate a subset of integers (superabundant numbers), which RH counterexamples are guaranteed to be among. These numbers grow large, fast, and we quickly reached the limit of what 64 bit integers can store. Unbounded integer arithmetic is possible on computers, but it requires a special software implementation.

Searching for RH Counterexamples — Search Strategies

Mon, 28 Sep 2020 20:48:00 +0000

We’re glibly searching for counterexamples to the Riemann Hypothesis, to trick you into learning about software engineering principles. In the first two articles we configured a testing framework and showed how to hide implementation choices behind an interface. Next, we’ll improve the algorithm’s core routine. As before, I’ll link to specific git commits in the final code repository to show how the project evolves. Superabundant numbers A superabundant number $ n$ is one which has “maximal relative divisor sums” in the following sense: for all $ m < n$,

Searching for RH Counterexamples — Adding a Database

Fri, 11 Sep 2020 18:53:48 +0000

In the last article we set up pytest for a simple application that computes divisor sums $ \sigma(n)$ and tries to disprove the Riemann Hypothesis. In this post we’ll show how to extend the application as we add a database dependency. The database stores the computed sums so we can analyze them after our application finishes. As in the previous post, I’ll link to specific git commits in the final code repository to show how the project evolves.

Searching for RH Counterexamples — Setting up Pytest

Fri, 11 Sep 2020 15:51:48 +0000

Some mathy-programmy people tell me they want to test their code, but struggle to get set up with a testing framework. I suspect it’s due to a mix of: There are too many choices with a blank slate. Making slightly wrong choices early on causes things to fail in unexpected ways. I suspect the same concerns apply to general project organization and architecture. Because Python is popular for mathy-programmies, I’ll build a Python project that shows how I organize my projects and and test my code, and how that shapes the design and evolution of my software.

Taylor Series and Accelerometers

Sun, 26 Jul 2020 11:11:41 +0000

In my book, A Programmer’s Introduction to Mathematics, I describe the Taylor Series as a “hammer for every nail.” I learned about another nail in the design of modern smartphone accelerometers from “Eight Amazing Engineering Stories” by Hammack, Ryan, and Ziech, which I’ll share here. These accelerometers are designed using a system involving three plates, which correspond to two capacitors. A quick recap on my (limited) understanding of how capacitors work.

Contextual Symbols in Math

Fri, 22 May 2020 12:36:07 +0000

In my book I discuss the importance of context in reading and writing mathematics. An early step in becoming comfortable with math is deciphering the syntax of mathematical expressions. Another is in connecting the symbols to their semantic meanings. Embedded in these is the subproblem of knowing what to call the commonly used symbols. The more abstract you go, the more exotic the symbols tend to get. Wikipedia has an excellent list for deciphering those symbols that have a typically well-understood meaning, like $ \otimes$ and $ \mathbb{Q}$.

Musings on A New Interface for Mathematics

Sun, 17 May 2020 12:11:22 +0000

This essay is a slightly modified version of the closing chapter of A Programmer’s Introduction to Mathematics. We are no longer constrained by pencil and paper. The symbolic shuffle should no longer be taken for granted as the fundamental mechanism for understanding quantity and change. Math needs a new interface. –Bret Victor, “Kill Math” Math is a human activity. It’s messy and beautiful, complicated and elegant, useful and bull-headedly frustrating. But in reading this book, A Programmer’s Introduction to Mathematics, my dream is that you have found the attitude, confidence, and enough prerequisite knowledge to continue to engage with mathematics beyond these pages.

Second Edition of A Programmer's Introduction to Mathematics

Sun, 17 May 2020 11:12:28 +0000

The Second Edition of A Programmer’s Introduction to Mathematics is now available on Amazon. The second edition includes a multitude of fixes to typos and some technical errata, thanks to my readers who submitted over 200 errata. Readers who provided names are credited in the front matter. I also added new exercises, and three new appendices: a notation table to augment the notation index, a summary of elementary formal logic, and an annotated list of further reading.

The Communicative Value of Using Git Well

Tue, 14 Jan 2020 07:00:28 +0000

Recently my employer (Google) forced me to switch to Mercurial instead of my usual version control system, git. The process of switching sparked a few discussions between me and my colleagues about the value of various version control systems. A question like “what benefit does git provide over Mercurial” yielded no clear answers, suggesting many developers don’t know. An informal Twitter survey didn’t refute this claim. A distinguished value git provides me is the ability to sculpt code changes into stories.

A Good Year for "A Programmer's Introduction to Mathematics"

Sun, 01 Dec 2019 07:00:36 +0000

A year ago today I self-published “A Programmer’s Introduction to Mathematics” (PIM). In this short note I want to describe the success it’s had, summarize the complaints of some readers and the praise of others, and outline what’s next. Since publication PIM has sold over 11,000 copies. A rough chart showing the sales of paperback and ebook copies of PIM. Here’s a table: Month Paperback Ebook Total 2018-12 4,323 1,866 6,189 2019-01 1,418 258 1,676 2019-02 852 128 980 2019-03 762 107 869 2019-04 357 63 420 2019-05 289 51 340 2019-06 200 58 258 2019-07 223 40 263 2019-08 158 43 201 2019-09 159 24 183 2019-10 155 44 199 As expected, the sales dropped significantly after the first month, but leveled out at around 200 copies per month.

Silent Duels—Constructing the Solution part 1

Sun, 30 Jun 2019 07:00:14 +0000

Previous posts in this series: Silent Duels and an Old Paper of Restrepo Silent Duels—Parsing the Construction Last time we waded into Restrepo’s silent duel paper. You can see the original and my re-typeset version on Github along with all of the code in this series. We digested Section 2 and a bit further, plotting some simplified examples of the symmetric version of the game. I admit, this paper is not an easy read.

Math Versus Dirty Data

Sat, 08 Jun 2019 08:00:06 +0000

At Google, our organization designs, owns, and maintains a number of optimization models that automate the planning of Google’s datacenter growth and health. As is pretty standard in supply chain optimization and planning, these models are often integer linear programs. It’s a core competency of operations research, after all. One might think, “Large optimization problems? That sounds hard!” But it’s actually far from the hardest part of the job. In fact, it’s one of the few exciting parts of the job.

A Working Mathematician's Guide to Parsing

Sat, 20 Apr 2019 17:51:51 +0000

Our hero, a mathematician, is writing notes in LaTeX and needs to convert it to a format that her blog platform accepts. She’s used to using dollar sign delimiters for math mode, but her blog requires  and \[ \]. Find-and-replace fails because it doesn’t know about which dollar sign is the start and which is the end. She knows there’s some computer stuff out there that could help, but she doesn’t have the damn time to sort through it all.

Silent Duels—Parsing the Construction

Mon, 28 Jan 2019 15:45:02 +0000

Last time we discussed the setup for the silent duel problem: two players taking actions in $ [0,1]$, player 1 gets $ n$ chances to act, player 2 gets $ m$, and each knows their probability of success when they act. The solution is in a paper of Rodrigo Restrepo from the 1950s. In this post I’ll start detailing how I study this paper, and talk through my thought process for approaching a bag of theorems and proofs.

Silent Duels and an Old Paper of Restrepo

Mon, 31 Dec 2018 08:00:09 +0000

Two men start running at each other with loaded pistols, ready to shoot! It’s a foggy morning for a duel. Newton and Leibniz have decided this macabre contest is the only way to settle their dispute over who invented Calculus. Each pistol is fitted with a silencer and has a single bullet. Neither can tell when the other has attempted a shot, unless, of course, they are hit. Newton and Leibniz both know that the closer they get to their target, the higher the chance of a successful shot.

A Programmer's Introduction to Mathematics

Sat, 01 Dec 2018 07:57:26 +0000

For the last four years I’ve been working on a book for programmers who want to learn mathematics. It’s finally done, and you can buy it today. The website for the book is pimbook.org, which has purchase links—paperback and ebook—and a preview of the first pages. You can see more snippets later in the book on the Amazon listing’s “Look Inside” feature. If you’re a programmer who wants to learn math, this book is written specifically for you!

Hanabi: a card game for logicians

Fri, 10 Aug 2018 07:00:48 +0000

Mathematics students often hear about the classic “blue-eyed islanders” puzzle early in their career. If you haven’t seen it, read Terry Tao’s excellent writeup linked above. The solution uses induction and the idea of *common knowledge—*I know X, and you know that I know X, and I know that you know that I know X, and so on—to make a striking inference from a seemingly useless piece of information. Recreational mathematics is also full of puzzles involving prisoners wearing colored hats, where they can see others hats but not their own, and their goal is to each determine (often with high probability) the color of their own hat.

Visualizing an Assassin Puzzle

Tue, 24 Jul 2018 08:00:09 +0000

Over at Math3ma, Tai-Danae Bradley shared the following puzzle, which she also featured in a fantastic (spoiler-free) YouTube video. If you’re seeing this for the first time, watch the video first. Consider a square in the xy-plane, and let A (an “assassin”) and T (a “target”) be two arbitrary-but-fixed points within the square. Suppose that the square behaves like a billiard table, so that any ray (a.k.a “shot”) from the assassin will bounce off the sides of the square, with the angle of incidence equaling the angle of reflection.

For mathematicians, = does not mean equality

Fri, 13 Apr 2018 08:00:05 +0000

Every now and then I hear some ridiculous things about the equals symbol. Some large subset of programmers—perhaps related to functional programmers, perhaps not—seem to think that = should only and ever mean “equality in the mathematical sense.” The argument usually goes, Functional programming gives us back that inalienable right to analyze things by using mathematics. Never again need we bear the burden of that foul mutant x = x+1! No novice programmer—nay, not even a mathematician!

A parlor trick for SET

Sun, 25 Mar 2018 09:51:34 +0000

Tai-Danae Bradley is one of the hosts of PBS Infinite Series, a delightful series of vignettes into fun parts of math. The video below is about the same of SET, a favorite among mathematicians. Specifically, Tai-Danae explains how SET cards lie in (using more technical jargon) a vector space over a finite field, and that valid sets correspond to lines. If you don’t immediately know how this would work, watch the video.

Earthmover Distance

Mon, 05 Mar 2018 08:00:15 +0000

Problem: Compute distance between points with uncertain locations (given by samples, or differing observations, or clusters). For example, if I have the following three “points” in the plane, as indicated by their colors, which is closer, blue to green, or blue to red? It’s not obvious, and there are multiple factors at work: the red points have fewer samples, but we can be more certain about the position; the blue points are less certain, but the closest non-blue point to a blue point is green; and the green points are equally plausibly “close to red” and “close to blue.

NP-hard does not mean hard

Fri, 29 Dec 2017 13:30:23 +0000

When NP-hardness pops up on the internet, say because some silly blogger wants to write about video games, it’s often tempting to conclude that the problem being proved NP-hard is actually very hard! “Scientists proved Super Mario is NP-hard? I always knew there was a reason I wasn’t very good at it!” Sorry, these two are unrelated. NP-hardness means hard in a narrow sense this post should hopefully make clear. After that, we’ll explore what “hard” means in a mathematical sense that you can apply beyond NP-hardness to inform your work as a programmer.

Binary Search on Graphs

Wed, 08 Nov 2017 08:59:38 +0000

Binary search is one of the most basic algorithms I know. Given a sorted list of comparable items and a target item being sought, binary search looks at the middle of the list, and compares it to the target. If the target is larger, we repeat on the smaller half of the list, and vice versa. With each comparison the binary search algorithm cuts the search space in half. The result is a guarantee of no more than $ \log(n)$ comparisons, for a total runtime of $ O(\log n)$.

Linear Programming and Healthy Diets — Part 2

Sun, 24 Sep 2017 08:00:46 +0000

Previously in this series: Linear programming and healthy diets — Part 1 Linear programing and the simplex algorithm Foods of the Father My dad’s an interesting guy. Every so often he picks up a health trend and/or weight loss goal that would make many people’s jaw drop. For example, we once went on a 5-day, 50-mile backpacking trip in the Grand Tetons, and my dad brought one packet of Lipton’s Side Dishes noodle soup per day for dinner, and had vitamin tablets for the rest of his sustenance.

Notes on Math and Gerrymandering

Mon, 14 Aug 2017 08:00:40 +0000

Last week I was in Boston for the Geometry of Redistricting workshop. It was an optimistic gathering of over 500 mathematicians, computer scientists, lawyers, policy makers, teachers, and interested people of all stripes. There was a ton of information in the talks and subsequent discussions. I’ll try to distill the main ideas and avenues for research as best I can. Unfortunately, due to how preliminary most of the technical work is, I won’t be presenting any concrete code or algorithms.

Boolean Logic in Polynomials

Mon, 24 Jul 2017 09:00:38 +0000

Problem: Express a boolean logic formula using polynomials. I.e., if an input variable $ x$ is set to $ 0$, that is interpreted as false, while $ x=1$ is interpreted as true. The output of the polynomial should be 0 or 1 according to whether the formula is true or false as a whole. Solution: You can do this using a single polynomial. Illustrating with an example: the formula is $ \neg[(a \vee b) \wedge (\neg c \vee d)]$ also known as

Mathematical Genealogy

Thu, 22 Jun 2017 09:00:10 +0000

As a fun side project to distract me from my abysmal progress on my book, I decided to play around with the math genealogy graph! For those who don’t know, since 1996, mathematicians, starting with the labor of Harry Coonce et al, have been managing a database of all mathematicians. More specifically, they’ve been keeping track of who everyone’s thesis advisors and subsequent students were. The result is a directed graph (with a current estimated 200k nodes) that details the scientific lineage of mathematicians.

Duality for the SVM

Mon, 12 Jun 2017 08:00:50 +0000

This post is a sequel to Formulating the Support Vector Machine Optimization Problem. The Karush-Kuhn-Tucker theorem Generic optimization problems are hard to solve efficiently. However, optimization problems whose objective and constraints have special structure often succumb to analytic simplifications. For example, if you want to optimize a linear function subject to linear equality constraints, one can compute the Lagrangian of the system and find the zeros of its gradient. More generally, optimizing a linear function subject to linear equality and inequality constraints can be solved using various so-called “linear programming” techniques, such as the simplex algorithm.

Formulating the Support Vector Machine Optimization Problem

Mon, 05 Jun 2017 08:00:46 +0000

The hypothesis and the setup This blog post has an interactive demo (mostly used toward the end of the post). The source for this demo is available in a Github repository. Last time we saw how the inner product of two vectors gives rise to a decision rule: if $ w$ is the normal to a line (or hyperplane) $ L$, the sign of the inner product $ \langle x, w \rangle$ tells you whether $ x$ is on the same side of $ L$ as $ w$.

The Inner Product as a Decision Rule

Mon, 22 May 2017 08:00:00 +0000

The standard inner product of two vectors has some nice geometric properties. Given two vectors $ x, y \in \mathbb{R}^n$, where by $ x_i$ I mean the $ i$-th coordinate of $ x$, the standard inner product (which I will interchangeably call the dot product) is defined by the formula $$\displaystyle \langle x, y \rangle = x_1 y_1 + \dots + x_n y_n$$ This formula, simple as it is, produces a lot of interesting geometry.

Testing Polynomial Equality

Mon, 24 Apr 2017 09:00:14 +0000

Problem: Determine if two polynomial expressions represent the same function. Specifically, if $ p(x_1, x_2, \dots, x_n)$ and $ q(x_1, x_2, \dots, x_n)$ are a polynomial with inputs, outputs and coefficients in a field $ F$, where $ |F|$ is sufficiently large, then the problem is to determine if $ p(\mathbf{x}) = q(\mathbf{x})$ for every $ x \in F$, in time polynomial in the number of bits required to write down $ p$ and $ q$.

Bayesian Ranking for Rated Items

Mon, 13 Mar 2017 09:00:49 +0000

Problem: You have a catalog of items with discrete ratings (thumbs up/thumbs down, or 5-star ratings, etc.), and you want to display them in the “right” order. Solution: In Python ''' score: [int], [int], [float] -> float Return the expected value of the rating for an item with known ratings specified by `ratings`, prior belief specified by `rating_prior`, and a utility function specified by `rating_utility`, assuming the ratings are a multinomial distribution and the prior belief is a Dirichlet distribution.

The Reasonable Effectiveness of the Multiplicative Weights Update Algorithm

Mon, 27 Feb 2017 08:00:42 +0000

papad Hard to believe Sanjeev Arora and his coauthors consider it “a basic tool [that should be] taught to all algorithms students together with divide-and-conquer, dynamic programming, and random sampling.” Christos Papadimitriou calls it “so hard to believe that it has been discovered five times and forgotten.” It has formed the basis of algorithms in machine learning, optimization, game theory, economics, biology, and more. What mystical algorithm has such broad applications?

A Spectral Analysis of Moore Graphs

Thu, 03 Nov 2016 08:00:14 +0000

For fixed integers $ r > 0$, and odd $ g$, a Moore graph is an $ r$-regular graph of girth $ g$ which has the minimum number of vertices $ n$ among all such graphs with the same regularity and girth. (Recall, A the girth of a graph is the length of its shortest cycle, and it’s regular if all its vertices have the same degree) Problem (Hoffman-Singleton): Find a useful constraint on the relationship between $ n$ and $ r$ for Moore graphs of girth $ 5$ and degree $ r$.

Voltage, Temperature, and Harmonic Functions

Mon, 26 Sep 2016 09:00:52 +0000

This is a guest post by my friend and colleague Samantha Davies. Samantha is a math Ph.D student at the University of Washington, and a newly minted math blogger. Go check out her blog, With High Probability. If I said “let’s talk about temperature and voltage”, you might be interested, but few would react the same if instead I suggested an umbrella term: harmonic functions. Harmonic functions are often introduced as foreign operators that follow a finicky set of rules, when really they’re extremely natural phenomena that everyone has seen before.

Guest post, "What's up with graph Laplacians?"

Tue, 20 Sep 2016 15:21:20 +0000

I wrote a guest post for my friend Samantha Davies’s blog, With High Probability. It’s called, What’s up with graph Laplacians? Go check it out!

Zero-Knowledge: Definitions and Theory

Mon, 19 Sep 2016 09:00:00 +0000

The next Monday, when the fathers were all back at work, we kids were playing in a field. One kid says to me, “See that bird? What kind of bird is that?” I said, “I haven’t the slightest idea what kind of a bird it is.” He says, “It’s a brown-throated thrush. Your father doesn’t teach you anything!” But it was the opposite. He had already taught me: “See that bird?

Zero Knowledge Proofs for NP

Mon, 01 Aug 2016 08:00:53 +0000

Last time, we saw a specific zero-knowledge proof for graph isomorphism. This introduced us to the concept of an interactive proof, where you have a prover and a verifier sending messages back and forth, and the prover is trying to prove a specific claim to the verifier. A zero-knowledge proof is a special kind of interactive proof in which the prover has some secret piece of knowledge that makes it very easy to verify a disputed claim is true.

The Blum-Blum-Shub Pseudorandom Generator

Mon, 11 Jul 2016 08:00:54 +0000

Problem: Design a random number generator that is computationally indistinguishable from a truly random number generator. Solution (in Python): note this solution uses the Miller-Rabin primality tester, though any primality test will do. See the github repository for the referenced implementation. from randomized.primality import probablyPrime import random def goodPrime(p): return p % 4 == 3 and probablyPrime(p, accuracy=100) def findGoodPrime(numBits=512): candidate = 1 while not goodPrime(candidate): candidate = random.getrandbits(numBits) return candidate def makeModulus(): return findGoodPrime() * findGoodPrime() def parity(n): return sum(int(x) for x in bin(n)[2:]) % 2 class BlumBlumShub(object): def __init__(self, seed=None): self.

Zero Knowledge Proofs — A Primer

Tue, 05 Jul 2016 08:00:29 +0000

In this post we’ll get a strong taste for zero knowledge proofs by exploring the graph isomorphism problem in detail. In the next post, we’ll see how this relates to cryptography and the bigger picture. The goal of this post is to get a strong understanding of the terms “prover,” “verifier,” and “simulator,” and “zero knowledge” in the context of a specific zero-knowledge proof. Then next time we’ll see how the same concepts (though not the same proof) generalizes to a cryptographically interesting setting.

Singular Value Decomposition Part 2: Theorem, Proof, Algorithm

Mon, 16 May 2016 08:00:19 +0000

I’m just going to jump right into the definitions and rigor, so if you haven’t read the previous post motivating the singular value decomposition, go back and do that first. This post will be theorem, proof, algorithm, data. The data set we test on is a thousand-story CNN news data set. All of the data, code, and examples used in this post is in a github repository, as usual. We start with the best-approximating $ k$-dimensional linear subspace.

Singular Value Decomposition Part 1: Perspectives on Linear Algebra

Mon, 18 Apr 2016 09:00:40 +0000

The singular value decomposition (SVD) of a matrix is a fundamental tool in computer science, data analysis, and statistics. It’s used for all kinds of applications from regression to prediction, to finding approximate solutions to optimization problems. In this series of two posts we’ll motivate, define, compute, and use the singular value decomposition to analyze some data. (Jump to the second post) I want to spend the first post entirely on motivation and background.

Tensorphobia and the Outer Product

Mon, 28 Mar 2016 09:00:46 +0000

Variations on a theme Back in 2014 I wrote a post called How to Conquer Tensorphobia that should end up on Math $ \cap$ Programming’s “greatest hits” album. One aspect of tensors I neglected to discuss was the connection between the modern views of tensors and the practical views of linear algebra. I feel I need to write this because every year or two I forget why it makes sense.

My Graduate Career in Math

Sat, 05 Mar 2016 18:41:01 +0000

2010–2011 (Year 0) I had just switched my major at Cal Poly State University from computer science to math. I wanted to double major but California was in a budget crisis and a few weeks before I tried submitting my double-major request the Provost for the CSU system put a blanket ban on double majors. I sat in a long meeting with a vice-dean-something-or-other paper pusher for the College of Engineering who patronizingly tried to convince me that my life would be better if, instead of getting a math degree, I did Cal Poly’s accelerated MS in computer science.

Big Dimensions, and What You Can Do About It

Mon, 08 Feb 2016 10:00:14 +0000

Data is abundant, data is big, and big is a problem. Let me start with an example. Let’s say you have a list of movie titles and you want to learn their genre: romance, action, drama, etc. And maybe in this scenario IMDB doesn’t exist so you can’t scrape the answer. Well, the title alone is almost never enough information. One nice way to get more data is to do the following:

Concrete Examples of Quantum Gates

Mon, 11 Jan 2016 14:05:22 +0000

So far in this series we’ve seen a lot of motivation and defined basic ideas of what a quantum circuit is. But on rereading my posts, I think we would all benefit from some concreteness. “Local” operations So by now we’ve understood that quantum circuits consist of a sequence of gates $ A_1, \dots, A_k$, where each $ A_i$ is an 8-by-8 matrix that operates “locally” on some choice of three (or fewer) qubits.

Hashing to Estimate the Size of a Stream

Mon, 04 Jan 2016 09:00:37 +0000

Problem: Estimate the number of distinct items in a data stream that is too large to fit in memory. Solution: (in python) import random def randomHash(modulus): a, b = random.randint(0,modulus-1), random.randint(0,modulus-1) def f(x): return (a*x + b) % modulus return f def average(L): return sum(L) / len(L) def numDistinctElements(stream, numParallelHashes=10): modulus = 2**20 hashes = [randomHash(modulus) for _ in range(numParallelHashes)] minima = [modulus] * numParallelHashes currentEstimate = 0 for i in stream: hashValues = [h(i) for h in hashes] for i, newValue in enumerate(hashValues): if newValue < minima[i]: minima[i] = newValue currentEstimate = modulus / average(minima) yield currentEstimate Discussion: The technique used here is to use random hash functions.

Load Balancing and the Power of Hashing

Mon, 28 Dec 2015 09:00:31 +0000

Here’s a bit of folklore I often hear (and retell) that’s somewhere between a joke and deep wisdom: if you’re doing a software interview that involves some algorithms problem that seems hard, your best bet is to use hash tables. More succinctly put: Google loves hash tables. As someone with a passion for math and theoretical CS, it’s kind of silly and reductionist. But if you actually work with terabytes of data that can’t fit on a single machine, it also makes sense.

The Inequality

Mon, 23 Nov 2015 22:06:27 +0000

Math and computer science are full of inequalities, but there is one that shows up more often in my work than any other. Of course, I’m talking about $$\displaystyle 1+x \leq e^{x}$$ This is The Inequality. I’ve been told on many occasions that the entire field of machine learning reduces to The Inequality combined with the Chernoff bound (which is proved using The Inequality). Why does it show up so often in machine learning?

A Quasipolynomial Time Algorithm for Graph Isomorphism: The Details

Thu, 12 Nov 2015 09:00:00 +0000

Update 2017-01-09: Laci claims to have found a workaround to the previously posted error, and the claim is again quasipolynoimal time! Updated arXiv paper to follow. Update 2017-01-04: Laci has posted an update on his paper. The short version is that one small step of his analysis was not quite correct, and the result is that his algorithm is sub-exponential, but not quasipolynomial time. The fact that this took over a year to sort out is a testament to the difficulty of the mathematics and the skill of the mathematicians involved.

Serial Dictatorships and House Allocation

Mon, 26 Oct 2015 11:08:00 +0000

I was recently an invited speaker in a series of STEM talks at Moraine Valley Community College. My talk was called “What can algorithms tell us about life, love, and happiness?” and it’s on Youtube now so you can go watch it. The central theme of the talk was the lens of computation, that algorithms and theoretical computer science can provide new and novel explanations for the natural phenomena we observe in the world.

One definition of algorithmic fairness: statistical parity

Mon, 19 Oct 2015 09:00:00 +0000

If you haven’t read the first post on fairness, I suggest you go back and read it because it motivates why we’re talking about fairness for algorithms in the first place. In this post I’ll describe one of the existing mathematical definitions of “fairness,” its origin, and discuss its strengths and shortcomings. Before jumping in I should remark that nobody has found a definition which is widely agreed as a good definition of fairness in the same way we have for, say, the security of a random number generator.

The Boosting Margin, or Why Boosting Doesn't Overfit

Mon, 21 Sep 2015 11:33:00 +0000

There’s a well-understood phenomenon in machine learning called overfitting. The idea is best shown by a graph: overfitting Let me explain. The vertical axis represents the error of a hypothesis. The horizontal axis represents the complexity of the hypothesis. The blue curve represents the error of a machine learning algorithm’s output on its training data, and the red curve represents the generalization of that hypothesis to the real world. The overfitting phenomenon is marker in the middle of the graph, before which the training error and generalization error both go down, but after which the training error continues to fall while the generalization error rises.

The Welch-Berlekamp Algorithm for Correcting Errors in Data

Mon, 07 Sep 2015 11:02:00 +0000

In this post we’ll implement Reed-Solomon error-correcting codes and use them to play with codes. In our last post we defined Reed-Solomon codes rigorously, but in this post we’ll focus on intuition and code. As usual the code and data used in this post is available on this blog’s Github page. The main intuition behind Reed-Solomon codes (and basically all the historically major codes) is Error correction is about adding redundancy, and polynomials are a really efficient way to do that.

The Čech Complex and the Vietoris-Rips Complex

Thu, 06 Aug 2015 09:00:00 +0000

It’s about time we got back to computational topology. Previously in this series we endured a lightning tour of the fundamental group and homology, then we saw how to compute the homology of a simplicial complex using linear algebra. What we really want to do is talk about the inherent shape of data. Homology allows us to compute some qualitative features of a given shape, i.e., find and count the number of connected components or a given shape, or the number of “2-dimensional holes” it has.

What does it mean for an algorithm to be fair?

Mon, 13 Jul 2015 09:00:00 +0000

In 2014 the White House commissioned a 90-day study that culminated in a report (pdf) on the state of “big data” and related technologies. The authors give many recommendations, including this central warning. Warning: algorithms can facilitate illegal discrimination! Here’s a not-so-imaginary example of the problem. A bank wants people to take loans with high interest rates, and it also serves ads for these loans. A modern idea is to use an algorithm to decide, based on the sliver of known information about a user visiting a website, which advertisement to present that gives the largest chance of the user clicking on it.

Methods of Proof — Diagonalization

Mon, 08 Jun 2015 09:00:00 +0000

A while back we featured a post about why learning mathematics can be hard for programmers, and I claimed a major issue was not understanding the basic methods of proof (the lingua franca between intuition and rigorous mathematics). I boiled these down to the “basic four,” direct implication, contrapositive, contradiction, and induction. But in mathematics there is an ever growing supply of proof methods. There are books written about the “probabilistic method,” and I recently went to a lecture where the “linear algebra method” was displayed.

Weak Learning, Boosting, and the AdaBoost algorithm

Mon, 18 May 2015 09:00:00 +0000

When addressing the question of what it means for an algorithm to learn, one can imagine many different models, and there are quite a few. This invariably raises the question of which models are “the same” and which are “different,” along with a precise description of how we’re comparing models. We’ve seen one learning model so far, called Probably Approximately Correct (PAC), which espouses the following answer to the learning question:

The Many Faces of Set Cover

Mon, 04 May 2015 09:00:00 +0000

A while back Peter Norvig posted a wonderful pair of articles about regex golf. The idea behind regex golf is to come up with the shortest possible regular expression that matches one given list of strings, but not the other. “Regex Golf,” by Randall Munroe. In the first article, Norvig runs a basic algorithm to recreate and improve the results from the comic, and in the second he beefs it up with some improved search heuristics.

Markov Chain Monte Carlo Without all the Bullshit

Mon, 06 Apr 2015 09:00:00 +0000

I have a little secret: I don’t like the terminology, notation, and style of writing in statistics. I find it unnecessarily complicated. This shows up when trying to read about Markov Chain Monte Carlo methods. Take, for example, the abstract to the Markov Chain Monte Carlo article in the Encyclopedia of Biostatistics. Markov chain Monte Carlo (MCMC) is a technique for estimating by simulation the expectation of a statistic in a complex model.

The Codes of Solomon, Reed, and Muller

Mon, 23 Mar 2015 09:00:00 +0000

Last time we defined the Hamming code. We also saw that it meets the Hamming bound, which is a measure of how densely a code can be packed inside an ambient space and still maintain a given distance. This time we’ll define the Reed-Solomon code which optimizes a different bound called the Singleton bound, and then generalize them to a larger class of codes called Reed-Muller codes. In future posts we’ll consider algorithmic issues behind decoding the codes, for now we just care about their existence and optimality properties.

Finding the majority element of a stream

Mon, 09 Mar 2015 09:00:11 +0000

Problem: Given a massive data stream of $ n$ values in $ \{ 1, 2, \dots, m \}$ and the guarantee that one value occurs more than $ n/2$ times in the stream, determine exactly which value does so. Solution: (in Python) def majority(stream): held = next(stream) counter = 1 for item in stream: if item == held: counter += 1 elif counter == 0: held = item counter = 1 else: counter -= 1 return held Discussion: Let’s prove correctness.

Hamming's Code

Mon, 02 Mar 2015 09:00:00 +0000

Or how to detect and correct errors Last time we made a quick tour through the main theorems of Claude Shannon, which essentially solved the following two problems about communicating over a digital channel. What is the best encoding for information when you are guaranteed that your communication channel is error free? Are there any encoding schemes that can recover from random noise introduced during transmission? The answers to these questions were purely mathematical theorems, of course.

A Proofless Introduction to Information Theory

Mon, 16 Feb 2015 09:00:00 +0000

There are two basic problems in information theory that are very easy to explain. Two people, Alice and Bob, want to communicate over a digital channel over some long period of time, and they know the probability that certain messages will be sent ahead of time. For example, English language sentences are more likely than gibberish, and “Hi” is much more likely than “asphyxiation.” The problems are: Say communication is very expensive.

Zero-One Laws for Random Graphs

Mon, 09 Feb 2015 09:00:00 +0000

Last time we saw a number of properties of graphs, such as connectivity, where the probability that an Erdős–Rényi random graph $ G(n,p)$ satisfies the property is asymptotically either zero or one. And this zero or one depends on whether the parameter $ p$ is above or below a universal threshold (that depends only on $ n$ and the property in question). To remind the reader, the Erdős–Rényi random “graph” $ G(n,p)$ is a distribution over graphs that you draw from by including each edge independently with probability $ p$.

The Giant Component and Explosive Percolation

Mon, 02 Feb 2015 09:00:00 +0000

Last time we left off with a tantalizing conjecture: a random graph with edge probability $ p = 5/n$ is almost surely a connected graph. We arrived at that conjecture from some ad-hoc data analysis, so let’s go back and treat it with some more rigorous mathematical techniques. As we do, we’ll discover some very interesting “threshold theorems” that essentially say a random graph will either certainly have a property, or it will certainly not have it.

Multiple Qubits and the Quantum Circuit

Mon, 26 Jan 2015 09:00:00 +0000

Last time we left off with the tantalizing question: how do you do a quantum “AND” operation on two qubits? In this post we’ll see why the tensor product is the natural mathematical way to represent the joint state of multiple qubits. Then we’ll define some basic quantum gates, and present the definition of a quantum circuit. Working with Multiple Qubits In a classical system, if you have two bits with values $ b_1, b_2$, then the “joint state” of the two bits is given by the concatenated string $ b_1b_2$.

The Quantum Bit

Mon, 15 Dec 2014 10:00:52 +0000

The best place to start our journey through quantum computing is to recall how classical computing works and try to extend it. Since our final quantum computing model will be a circuit model, we should informally discuss circuits first. A circuit has three parts: the “inputs,” which are bits (either zero or one); the “gates,” which represent the lowest-level computations we perform on bits; and the “wires,” which connect the outputs of gates to the inputs of other gates.

A Motivation for Quantum Computing

Mon, 08 Dec 2014 10:00:48 +0000

Quantum mechanics is one of the leading scientific theories describing the rules that govern the universe. It’s discovery and formulation was one of the most important revolutions in the history of mankind, contributing in no small part to the invention of the transistor and the laser. Here at Math ∩ Programming we don’t put too much emphasis on physics or engineering, so it might seem curious to study quantum physics. But as the reader is likely aware, quantum mechanics forms the basis of one of the most interesting models of computing since the Turing machine: the quantum circuit.

Linear Programming and the Simplex Algorithm

Mon, 01 Dec 2014 10:00:59 +0000

In the last post in this series we saw some simple examples of linear programs, derived the concept of a dual linear program, and saw the duality theorem and the complementary slackness conditions which give a rough sketch of the stopping criterion for an algorithm. This time we’ll go ahead and write this algorithm for solving linear programs, and next time we’ll apply the algorithm to an industry-strength version of the nutrition problem we saw last time.

Learning a single-variable polynomial, or the power of adaptive queries

Tue, 18 Nov 2014 09:00:18 +0000

Problem: Alice chooses a secret polynomial $ p(x)$ with nonnegative integer coefficients. Bob wants to discover this polynomial by querying Alice for the value of $ p(x)$ for some integer $ x$ of Bob’s choice. What is the minimal number of queries Bob needs to determine $ p(x)$ exactly? Solution: Two queries. The first is $ p(1)$, and if we call $ N = p(1) + 1$, then the second query is $ p(N)$.

The Complexity of Communication

Mon, 10 Nov 2014 09:00:25 +0000

satellite One of the most interesting questions posed in the last thirty years of computer science is to ask how much “information” must be communicated between two parties in order for them to jointly compute something. One can imagine these two parties living on distant planets, so that the cost of communicating any amount of information is very expensive, but each person has an integral component of the answer that the other does not.

On the Computational Complexity of MapReduce

Sun, 05 Oct 2014 09:00:30 +0000

I recently wrapped up a fun paper with my coauthors Ben Fish, Adam Lelkes, Lev Reyzin, and Gyorgy Turan in which we analyzed the computational complexity of a model of the popular MapReduce framework. Check out the preprint on the arXiv. Update: this paper is now published in the proceedings of DISC2015. As usual I’ll give a less formal discussion of the research here, and because the paper is a bit more technically involved than my previous work I’ll be omitting some of the more pedantic details.

Making Hybrid Images

Mon, 29 Sep 2014 10:00:06 +0000

The Mona Lisa Leonardo da Vinci’s Mona Lisa is one of the most famous paintings of all time. And there has always been a discussion around her enigmatic smile. He used a trademark Renaissance technique called sfumato, which involves many thin layers of glaze mixed with subtle pigments. The striking result is that when you look directly at Mona Lisa’s smile, it seems to disappear. But when you look at the background your peripherals see a smiling face.

Occam's Razor and PAC-learning

Fri, 19 Sep 2014 10:00:47 +0000

So far our discussion of learning theory has been seeing the definition of PAC-learning, tinkering with it, and seeing simple examples of learnable concept classes. We’ve said that our real interest is in proving big theorems about what big classes of problems can and can’t be learned. One major tool for doing this with PAC is the concept of VC-dimension, but to set the stage we’re going to prove a simpler theorem that gives a nice picture of PAC-learning when your hypothesis class is small.

A Rook Game

Sun, 31 Aug 2014 17:51:37 +0000

Problem: Two players take turns moving a rook on an 8×8 chessboard. The rook is only allowed to move south or west (but not both in a single turn), and may move any number of squares in the chosen direction on a turn. The loser is the player who first cannot move the rook. What is the optimal play for any starting position? rook-board Solution: Take advantage of the symmetry of the board.

When Greedy Algorithms are Perfect: the Matroid

Tue, 26 Aug 2014 09:00:02 +0000

Greedy algorithms are by far one of the easiest and most well-understood algorithmic techniques. There is a wealth of variations, but at its core the greedy algorithm optimizes something using the natural rule, “pick what looks best” at any step. So a greedy routing algorithm would say to a routing problem: “You want to visit all these locations with minimum travel time? Let’s start by going to the closest one. And from there to the next closest one.

Parameterizing the Vertex Cover Problem

Mon, 25 Aug 2014 06:50:11 +0000

I’m presenting a paper later this week at the Matheamtical Foundations of Computer Science 2014 in Budapest, Hungary. This conference is an interesting mix of logic and algorithms that aims to bring together researchers from these areas to discuss their work. And right away the first session on the first day focused on an area I know is important but have little experience with: fixed parameter complexity. From what I understand it’s not that popular of a topic at major theory conferences in the US (there appears to be only one paper on it at this year’s FOCS conference), but the basic ideas are worth knowing.

An Update on "Coloring Resilient Graphs"

Mon, 14 Jul 2014 10:00:51 +0000

A while back I announced a preprint of a paper on coloring graphs with certain resilience properties. I’m pleased to announce that it’s been accepted to the Mathematical Foundations of Computer Science 2014, which is being held in Budapest this year. Since we first published the preprint we’ve actually proved some additional results about resilience, and so I’ll expand some of the details here. I think it makes for a nicer overall picture, and in my opinion it gives a little more justification that resilient coloring is interesting, at least in contrast to other resilience problems.

When Greedy Algorithms are Good Enough: Submodularity and the (1—1/e)-Approximation

Mon, 07 Jul 2014 10:00:01 +0000

Greedy algorithms are among the simplest and most intuitive algorithms known to humans. Their name essentially gives their description: do the thing that looks best right now, and repeat until nothing looks good anymore or you’re forced to stop. Some of the best situations in computer science are also when greedy algorithms are optimal or near-optimal. There is a beautiful theory of this situation, known as the theory of matroids. We haven’t covered matroids on this blog (edit: we did), but in this post we will focus on the next best thing: when the greedy algorithm guarantees a reasonably good approximation to the optimal solution.

The Mathematics of Secret Sharing

Mon, 23 Jun 2014 09:00:10 +0000

Here’s a simple puzzle with a neat story. A rich old woman is drafting her will and wants to distribute her expansive estate equally amongst her five children. But her children are very greedy, and the woman knows that if he leaves her will unprotected her children will resort to nefarious measures to try to get more than their fair share. In one fearful scenario, she worries that the older four children will team up to bully the youngest child entirely out of his claim!

Linear Programming and Healthy Diets — Part 1

Mon, 02 Jun 2014 09:00:27 +0000

Optimization is by far one of the richest ways to apply computer science and mathematics to the real world. Everybody is looking to optimize something: companies want to maximize profits, factories want to maximize efficiency, investors want to minimize risk, the list just goes on and on. The mathematical tools for optimization are also some of the richest mathematical techniques. They form the cornerstone of an entire industry known as operations research, and advances in this field literally change the world.

Learning to Love Complex Numbers

Mon, 26 May 2014 09:00:48 +0000

This post is intended for people with a little bit of programming experience and no prior mathematical background. So let’s talk about numbers. Numbers are curious things. On one hand, they represent one of the most natural things known to humans, which is quantity. It’s so natural to humans that even newborn babies are in tune with the difference between quantities of objects between 1 and 3, in that they notice when quantity changes much more vividly than other features like color or shape.

Community Detection in Graphs — a Casual Tour

Mon, 19 May 2014 10:00:32 +0000

Graphs are among the most interesting and useful objects in mathematics. Any situation or idea that can be described by objects with connections is a graph, and one of the most prominent examples of a real-world graph that one can come up with is a social network. Recall, if you aren’t already familiar with this blog’s gentle introduction to graphs, that a graph $ G$ is defined by a set of vertices $ V$, and a set of edges $ E$, each of which connects two vertices.

A problem that is not (properly) PAC-learnable

Mon, 21 Apr 2014 10:00:16 +0000

In a previous post we introduced a learning model called Probably Approximately Correct (PAC). We saw an example of a concept class that was easy to learn: intervals on the real line (and more generally, if you did the exercise, axis-aligned rectangles in a fixed dimension). One of the primary goals of studying models of learning is to figure out what is learnable and what is not learnable in the various models.

Sending and Authenticating Messages with Elliptic Curves

Mon, 14 Apr 2014 10:00:38 +0000

Last time we saw the Diffie-Hellman key exchange protocol, and discussed the discrete logarithm problem and the related Diffie-Hellman problem, which form the foundation for the security of most protocols that use elliptic curves. Let’s continue our journey to investigate some more protocols. Just as a reminder, the Python implementations of these protocols are not at all meant for practical use, but for learning purposes. We provide the code on this blog’s Github page, but for the love of security don’t actually use them.

Stable Marriages and Designing Markets

Wed, 02 Apr 2014 18:21:55 +0000

Here is a fun puzzle. Suppose we have a group of 10 men and 10 women, and each of the men has sorted the women in order of their preference for marriage (that is, a man prefers to marry a woman earlier in his list over a woman later in the list). Likewise, each of the women has sorted the men in order of marriageability. We might ask if there is any way that we, the omniscient cupids of love, can decide who should marry to make everyone happy.

Elliptic Curve Diffie-Hellman

Mon, 31 Mar 2014 10:22:51 +0000

So far in this series we’ve seen elliptic curves from many perspectives, including the elementary, algebraic, and programmatic ones. We implemented finite field arithmetic and connected it to our elliptic curve code. So we’re in a perfect position to feast on the main course: how do we use elliptic curves to actually do cryptography? History As the reader has heard countless times in this series, an elliptic curve is a geometric object whose points have a surprising and well-defined notion of addition.

Connecting Elliptic Curves with Finite Fields

Wed, 19 Mar 2014 10:00:05 +0000

So here we are. We’ve studied the general properties of elliptic curves, written a program for elliptic curve arithmetic over the rational numbers, and taken a long detour to get some familiarity with finite fields (the mathematical background and a program that implements arbitrary finite field arithmetic). And now we want to get back on track and hook our elliptic curve program up with our finite field program to make everything work.

Want to make a great puzzle game? Get inspired by theoretical computer science.

Mon, 17 Mar 2014 10:00:55 +0000

Two years ago, Erik Demaine and three other researchers published a fun paper to the arXiv proving that most incarnations of classic nintendo games are NP-hard. This includes almost every Super Mario Brothers, Donkey Kong, and Pokemon title. Back then I wrote a blog post summarizing the technical aspects of their work, and even gave a talk on it to a room full of curious undergraduate math majors. But while bad tech-writers tend to interpret NP-hard as “really really hard,” the truth is more complicated.

Programming with Finite Fields

Thu, 13 Mar 2014 10:00:11 +0000

Back when I was first exposed to programming language design, I decided it would be really cool if there were a language that let you define your own number types and then do all your programming within those number types. And since I get excited about math, I think of really exotic number types (Boolean rings, Gaussian integers, Octonions, oh my!). I imagined it would be a language feature, so I could do something like this:

Martingales and the Optional Stopping Theorem

Mon, 03 Mar 2014 10:00:38 +0000

This is a guest post by my colleague Adam Lelkes. The goal of this primer is to introduce an important and beautiful tool from probability theory, a model of fair betting games called martingales. In this post I will assume that the reader is familiar with the basics of probability theory. For those that need to refresh their knowledge, Jeremy’s excellent primers (1, 2) are a good place to start.

(Finite) Fields — A Primer

Wed, 26 Feb 2014 10:00:01 +0000

So far on this blog we’ve given some introductory notes on a few kinds of algebraic structures in mathematics (most notably groups and rings, but also monoids). Fields are the next natural step in the progression. If the reader is comfortable with rings, then a field is extremely simple to describe: they’re just commutative rings with 0 and 1, where every nonzero element has a multiplicative inverse. We’ll give a list of all of the properties that go into this “simple” definition in a moment, but an even more simple way to describe a field is as a place where “arithmetic makes sense.

Elliptic Curves as Python Objects

Mon, 24 Feb 2014 10:00:28 +0000

Last time we saw a geometric version of the algorithm to add points on elliptic curves. We went quite deep into the formal setting for it (projective space $ \mathbb{P}^2$), and we spent a lot of time talking about the right way to define the “zero” object in our elliptic curve so that our issues with vertical lines would disappear. With that understanding in mind we now finally turn to code, and write classes for curves and points and implement the addition algorithm.

On Coloring Resilient Graphs

Fri, 21 Feb 2014 08:45:39 +0000

I’m pleased to announce that another paper of mine is finished. This one just got accepted to MFCS 2014, which is being held in Budapest this year (this whole research thing is exciting!). This is joint work with my advisor, Lev Reyzin. As with my first paper, I’d like to explain things here on my blog a bit more informally than a scholarly article allows. A Recent History of Graph Coloring One of the first important things you learn when you study graphs is that coloring graphs is hard.

Elliptic Curves as Algebraic Structures

Sun, 16 Feb 2014 10:00:39 +0000

Last time we looked at the elementary formulation of an elliptic curve as the solutions to the equation $$y^2 = x^3 + ax + b$$ where $ a,b$ are such that the discriminant is nonzero: $$-16(4a^3 + 27b^2) \neq 0$$ We have yet to explain why we want our equation in this form, and we will get to that, but first we want to take our idea of intersecting lines as far as possible.

Simulating a Biased Coin with a Fair Coin

Wed, 12 Feb 2014 10:00:51 +0000

This is a guest post by my friend and colleague Adam Lelkes. Adam’s interests are in algebra and theoretical computer science. This gem came up because Adam gave a talk on probabilistic computation in which he discussed this technique. Problem: simulate a biased coin using a fair coin. Solution: (in Python) def biasedCoin(binaryDigitStream, fairCoin): for d in binaryDigitStream: if fairCoin() != d: return d Discussion: This function takes two arguments, an iterator representing the binary expansion of the intended probability of getting 1 (let us denote it as $ p$) and another function that returns 1 or 0 with equal probability.

Elliptic Curves as Elementary Equations

Mon, 10 Feb 2014 10:00:40 +0000

Finding solutions to systems of polynomial equations is one of the oldest and deepest problems in all of mathematics. This is broadly the domain of algebraic geometry, and mathematicians wield some of the most sophisticated and abstract tools available to attack these problems. The elliptic curve straddles the elementary and advanced mathematical worlds in an interesting way. On one hand, it’s easy to describe in elementary terms: it’s the set of solutions to a cubic function of two variables.

Simulating a Fair Coin with a Biased Coin

Sat, 08 Feb 2014 19:03:46 +0000

This is a guest post by my friend and colleague Adam Lelkes. Adam’s interests are in algebra and theoretical computer science. This gem came up because Adam gave a talk on probabilistic computation in which he discussed this technique. Problem: Simulate a fair coin given only access to a biased coin. Solution: (in Python) def fairCoin(biasedCoin): coin1, coin2 = 0,0 while coin1 == coin2: coin1, coin2 = biasedCoin(), biasedCoin() return coin1 Discussion: This is originally von Neumann’s clever idea.

Introducing Elliptic Curves

Sat, 08 Feb 2014 10:00:14 +0000

With all the recent revelations of government spying and backdoors into cryptographic standards, I am starting to disagree with the argument that you should never roll your own cryptography. Of course there are massive pitfalls and very few people actually need home-brewed cryptography, but history has made it clear that blindly accepting the word of the experts is not an acceptable course of action. What we really need is more understanding of cryptography, and implementing the algorithms yourself is the best way to do that.

Fixing Bugs in "Computing Homology"

Thu, 23 Jan 2014 23:05:16 +0000

A few awesome readers have posted comments in Computing Homology to the effect of, “Your code is not quite correct!” And they’re right! Despite the almost year since that post’s publication, I haven’t bothered to test it for more complicated simplicial complexes, or even the basic edge cases! When I posted it the mathematics just felt so solid to me that it had to be right (the irony is rich, I know).

RealityMining, a Case Study in the Woes of Data Processing

Tue, 21 Jan 2014 22:11:49 +0000

This post is intended to be a tutorial on how to access the RealityMining dataset using Python (because who likes Matlab?), and a rant on how annoying the process was to figure out. RealityMining is a dataset of smart-phone data logs from a group of about one hundred MIT students over the course of a year. The data includes communication and cell tower data, the latter being recorded every time a signal changes from one tower to the next.

How to Conquer Tensorphobia

Fri, 17 Jan 2014 10:10:07 +0000

A professor at Stanford once said, If you really want to impress your friends and confound your enemies, you can invoke tensor products… People run in terror from the $ \otimes$ symbol. He was explaining some aspects of multidimensional Fourier transforms, but this comment is only half in jest; people get confused by tensor products. It’s often for good reason. People who really understand tensors feel obligated to explain it using abstract language (specifically, universal properties).

Probably Approximately Correct — a Formal Theory of Learning

Thu, 02 Jan 2014 18:45:51 +0000

In tackling machine learning (and computer science in general) we face some deep philosophical questions. Questions like, “What does it mean to learn?” and, “Can a computer learn?” and, “How do you define simplicity?” and, “Why does Occam’s Razor work? (Why do simple hypotheses do well at modelling reality?)” In a very deep sense, learning theorists take these philosophical questions — or at least aspects of them — give them fleshy mathematical bodies, and then answer them with theorems and proofs.

The Two-Dimensional Fourier Transform and Digital Watermarking

Mon, 30 Dec 2013 19:24:05 +0000

We’ve studied the Fourier transform quite a bit on this blog: with four primers and the Fast Fourier Transform algorithm under our belt, it’s about time we opened up our eyes to higher dimensions. Indeed, in the decades since Cooley & Tukey’s landmark paper, the most interesting applications of the discrete Fourier transform have occurred in dimensions greater than 1. But for all our work we haven’t yet discussed what it means to take an “n-dimensional” Fourier transform.

Bandits and Stocks

Mon, 09 Dec 2013 09:00:52 +0000

So far in this series we’ve seen two nontrivial algorithms for bandit learning in two different settings. The first was the UCB1 algorithm, which operated under the assumption that the rewards for the trials were independent and stochastic. That is, each slot machine was essentially a biased coin flip, and the algorithm was trying to find the machine with the best odds. The second was the Exp3 algorithm, which held the belief that the payoffs were arbitrary.

Lagrangians for the Amnesiac

Sat, 30 Nov 2013 09:00:50 +0000

For a while I’ve been meaning to do some more advanced posts on optimization problems of all flavors. One technique that comes up over and over again is Lagrange multipliers, so this post is going to be a leisurely reminder of that technique. I often forget how to do these basic calculus-type things, so it’s good practice. We will assume something about the reader’s knowledge, but it’s a short list: know how to operate with vectors and the dot product, know how to take a partial derivative, and know that in single-variable calculus the local maxima and minima of a differentiable function $ f(x)$ occur when the derivative $ f'(x)$ vanishes.

Adversarial Bandits and the Exp3 Algorithm

Fri, 08 Nov 2013 09:00:40 +0000

In the last twenty years there has been a lot of research in a subfield of machine learning called Bandit Learning. The name comes from the problem of being faced with a large sequence of slot machines (once called one-armed bandits) each with a potentially different payout scheme. The problems in this field all focus on one central question: If I have many available actions with uncertain outcomes, how should I act to maximize the quality of my results over many trials?

Optimism in the Face of Uncertainty: the UCB1 Algorithm

Mon, 28 Oct 2013 11:42:55 +0000

startups The software world is always atwitter with predictions on the next big piece of technology. And a lot of chatter focuses on what venture capitalists express interest in. As an investor, how do you pick a good company to invest in? Do you notice quirky names like “Kaggle” and “Meebo,” require deep technical abilities, or value a charismatic sales pitch? This author personally believes we’re not thinking as big as we should be when it comes to innovation in software engineering and computer science, and that as a society we should value big pushes forward much more than we do.

The Universal Properties of Map, Fold, and Filter

Mon, 30 Sep 2013 09:00:53 +0000

A lot of people who like functional programming often give the reason that the functional style is simply more elegant than the imperative style. When compelled or inspired to explain (as I did in my old post, How I Learned to Love Functional Programming), they often point to the three “higher-order” functions map, fold, and filter, as providing a unifying framework for writing and reasoning about programs. But how unifying are they, really?

Anti-Coordination Games and Stable Graph Colorings

Mon, 09 Sep 2013 09:00:00 +0000

My First Paper I’m pleased to announce that my first paper, titled “Anti-Coordination Games and Stable Colorings,” has been accepted for publication! The venue is the Symposium on Algorithmic Game Theory, which will take place in Aachen, Germany this October. A professor of mine once told me that everyone puts their first few publications on a pedestal, so I’ll do my best to keep things down to earth by focusing on the contents of the paper and not my swirling cocktail of pride.

The Erdős-Rényi Random Graph

Thu, 22 Aug 2013 16:11:18 +0000

During the 1950’s the famous mathematician Paul Erdős and Alfred Rényi put forth the concept of a random graph and in the subsequent years of study transformed the world of combinatorics. The random graph is the perfect example of a good mathematical definition: it’s simple, has surprisingly intricate structure, and yields many applications. In this post we’ll explore basic facts about random graphs, slowly detail a proof on their applications to graph theory, and explore their more interesting properties computationally (a prelude to proofs about their structure).

Linear Regression

Sun, 18 Aug 2013 17:43:20 +0000

Machine learning is broadly split into two camps, statistical learning and non-statistical learning. The latter we’ve started to get a good picture of on this blog; we approached Perceptrons, decision trees, and neural networks from a non-statistical perspective. And generally “statistical” learning is just that, a perspective. Data is phrased in terms of independent and dependent variables, and statistical techniques are leveraged against the data. In this post we’ll focus on the simplest example of this, linear regression, and in the sequel see it applied to various learning problems.

Cauchy-Schwarz Inequality (and Amplification)

Tue, 23 Jul 2013 20:33:41 +0000

Problem: Prove that for vectors $ v, w$ in an inner product space, the inequality $$\displaystyle |\left \langle v, w \right \rangle | \leq \| v \| \| w \|$$ Solution: There is an elementary proof of the Cauchy-Schwarz inequality (see the Wikipedia article), and this proof is essentially the same. What makes this proof stand out is its insightful technique, which I first read about on Terry Tao’s blog. He calls it “textbook,” and maybe it is for an analyst, but it’s still very elegant.

Functoriality

Sun, 14 Jul 2013 10:03:29 +0000

Last time we worked through some basic examples of universal properties, specifically singling out quotients, products, and coproducts. There are many many more universal properties that we will mention as we encounter them, but there is one crucial topic in category theory that we have only hinted at: functoriality. As we’ve repeatedly stressed, the meat of category theory is in the morphisms. One natural question one might ask is, what notion of morphism is there between categories themselves?

Reservoir Sampling

Fri, 05 Jul 2013 10:00:49 +0000

Problem: Given a data stream of unknown size $ n$, pick an entry uniformly at random. That is, each entry has a $ 1/n$ chance of being chosen. Solution: (in Python) import random def reservoirSample(stream): for k,x in enumerate(stream, start=1): if random.random() < 1.0 / k: chosen = x return chosen Discussion: This is one of many techniques used to solve a problem called reservoir sampling. We often encounter data sets that we’d like to sample elements from at random.

Guest Post: Torus-Knotted Baklava

Fri, 28 Jun 2013 12:47:51 +0000

Guest Post: Torus-Knotted Baklava at Baking And Math, a blog by my friend and colleague, Yen.

Miller-Rabin Primality Test

Sun, 16 Jun 2013 18:55:40 +0000

Problem: Determine if a number is prime, with an acceptably small error rate. Solution: (in Python) import random def decompose(n): exponentOfTwo = 0 while n % 2 == 0: n = n/2 exponentOfTwo += 1 return exponentOfTwo, n def isWitness(possibleWitness, p, exponent, remainder): possibleWitness = pow(possibleWitness, remainder, p) if possibleWitness == 1 or possibleWitness == p - 1: return False for _ in range(exponent): possibleWitness = pow(possibleWitness, 2, p) if possibleWitness == p - 1: return False return True def probablyPrime(p, accuracy=100): if p == 2 or p == 3: return True if p < 2: return False exponent, remainder = decompose(p - 1) for _ in range(accuracy): possibleWitness = random.

Why Theoretical Computer Scientists Aren't Worried About Privacy

Mon, 10 Jun 2013 09:51:27 +0000

There has been a lot of news recently on government surveillance of its citizens. The biggest two that have pervaded my news feeds are the protests in Turkey, which in particular have resulted in particular oppression of social media users, and the recent light on the US National Security Agency’s widespread “backdoor” in industry databases at Google, Verizon, Facebook, and others. It appears that the facts are in flux, as some companies have denied their involvement in this program, but regardless of the truth the eye of the public has landed firmly on questions of privacy.

Conferences, Summer Work, and an Advisor

Mon, 03 Jun 2013 17:01:08 +0000

I’ve been spending a little less time on my blog recently then I’d like to, but for good reason: I’ve been attending two weeks of research conferences, I’m getting ready for a summer internship in cybersecurity, and I’ve finally chosen an advisor. Visions, STOC, and CCC The Simons Institute at UC Berkeley I’ve been taking a break from the Midwest for the last two weeks to attend some of this year’s seminal computer science theory conferences.

Rings — A Second Primer

Sat, 01 Jun 2013 23:49:01 +0000

Last time we defined and gave some examples of rings. Recapping, a ring is a special kind of group with an additional multiplication operation that “plays nicely” with addition. The important thing to remember is that a ring is intended to remind us arithmetic with integers (though not too much: multiplication in a ring need not be commutative). We proved some basic properties, like zero being unique and negation being well-behaved.

Universal Properties

Fri, 24 May 2013 14:53:25 +0000

Previously in this series we’ve seen the definition of a category and a bunch of examples, basic properties of morphisms, and a first look at how to represent categories as types in ML. In this post we’ll expand these ideas and introduce the notion of a universal property. We’ll see examples from mathematics and write some programs which simultaneously prove certain objects have universal properties and construct the morphisms involved.

Properties of Morphisms

Wed, 15 May 2013 12:43:42 +0000

This post is mainly mathematical. We left it out of our introduction to categories for brevity, but we should lay these definitions down and some examples before continuing on to universal properties and doing more computation. The reader should feel free to skip this post and return to it later when the words “isomorphism,” “monomorphism,” and “epimorphism” come up again. Perhaps the most important part of this post is the description of an isomorphism.

Bezier Curves and Picasso

Sat, 11 May 2013 00:32:05 +0000

Pablo Picasso Simplicity and the Artist Some of my favorite of Pablo Picasso’s works are his line drawings. He did a number of them about animals: an owl, a camel, a butterfly, etc. This piece called “Dog” is on my wall: Dachshund-Picasso-Sketch (Jump to interactive demo where we recreate “Dog” using the math in this post) These paintings are extremely simple but somehow strike the viewer as deeply profound. They give the impression of being quite simple to design and draw.

Categories as Types

Sat, 04 May 2013 19:30:35 +0000

In this post we’ll get a quick look at two ways to define a category as a type in ML. The first way will be completely trivial: we’ll just write it as a tuple of functions. The second will involve the terribly-named “functor” expression in ML, which allows one to give a bit more structure on data types. The reader unfamiliar with the ML programming language should consult our earlier primer.

Rings — A Primer

Tue, 30 Apr 2013 12:17:28 +0000

Previously on this blog, we’ve covered two major kinds of algebraic objects: the vector space and the group. There are at least two more fundamental algebraic objects every mathematician should something know about. The first, and the focus of this primer, is the ring. The second, which we’ve mentioned briefly in passing on this blog, is the field. There are a few others important to the pure mathematician, such as the $ R$-module (here $ R$ is a ring).

Introducing Categories

Wed, 24 Apr 2013 06:48:01 +0000

For a list of all the posts on Category Theory, see the Main Content page. It is time for us to formally define what a category is, to see a wealth of examples. In our next post we’ll see how the definitions laid out here translate to programming constructs. As we’ve said in our soft motivational post on categories, the point of category theory is to organize mathematical structures across various disciplines into a unified language.

Categories, What's the Point?

Tue, 16 Apr 2013 18:00:09 +0000

Perhaps primarily due to the prominence of monads in the Haskell programming language, programmers are often curious about category theory. Proponents of Haskell and other functional languages can put category-theoretic concepts on a pedestal or in a mexican restaurant, and their benefits can seem as mysterious as they are magical. For instance, the most common use of a monad in Haskell is to simulate the mutation of immutable data. Others include suspending and backtracking computations, and even untying tangled rope.

Probabilistic Bounds — A Primer

Mon, 15 Apr 2013 11:14:32 +0000

Probabilistic arguments are a key tool for the analysis of algorithms in machine learning theory and probability theory. They also assume a prominent role in the analysis of randomized and streaming algorithms, where one imposes a restriction on the amount of storage space an algorithm is allowed to use for its computations (usually sublinear in the size of the input). While a whole host of probabilistic arguments are used, one theorem in particular (or family of theorems) is ubiquitous: the Chernoff bound.

Computing Homology

Wed, 10 Apr 2013 16:38:25 +0000

Update: the mistakes made in the code posted here are fixed and explained in a subsequent post (one minor code bug was fixed here, and a less minor conceptual bug is fixed in the linked post). In our last post in this series on topology, we defined the homology group. Specifically, we built up a topological space as a simplicial complex (a mess of triangles glued together), we defined an algebraic way to represent collections of simplices called chains as vectors in a vector space, we defined the boundary homomorphism $ \partial_k$ as a linear map on chains, and finally defined the homology groups as the quotient vector spaces

A Sample of Standard ML, the TreeSort Algorithm, and Monoids

Sun, 07 Apr 2013 21:57:37 +0000

In this post we will assume the reader has a passing familiarity with some of the basic concepts of functional programming (the map, fold, and filter functions). We introduce these topics in our Racket primer, but the average reader will understand the majority of this primer without expertise in functional programming. Follow-ups to this post can be found in the Computational Category Theory section of the Main Content page. Preface: ML for Category Theory A few of my readers have been asking for more posts about functional languages and algorithms written in functional languages.

Homology Theory — A Primer

Wed, 03 Apr 2013 20:07:46 +0000

This series on topology has been long and hard, but we’re are quickly approaching the topics where we can actually write programs. For this and the next post on homology, the most important background we will need is a solid foundation in linear algebra, specifically in row-reducing matrices (and the interpretation of row-reduction as a change of basis of a linear operator). Last time we engaged in a whirlwind tour of the fundamental group and homotopy theory.

Conditional (Partitioned) Probability — A Primer

Thu, 28 Mar 2013 13:36:09 +0000

One of the main areas of difficulty in elementary probability, and one that requires the highest levels of scrutiny and rigor, is conditional probability. The ideas are simple enough: that we assign probabilities relative to the occurrence of some event. But shrewd applications of conditional probability (and in particular, efficient ways to compute conditional probability) are key to successful applications of this subject. This is the basis for Nate Silver‘s success, the logical flaws of many a political pundit, and the ability for a robot to tell where it is in an environment.

Methods of Proof — Induction

Thu, 21 Mar 2013 11:36:44 +0000

In this final post on the basic four methods of proof (but perhaps not our last post on proof methods), we consider the proof by induction. Proving Statements About All Natural Numbers Induction comes in many flavors, but the goal never changes. We use induction when we want to prove something is true about all natural numbers. These statements will look something like this: For all natural numbers n, $ 1 + 2 + \dots + n = n(n+1)/2$.

Seam Carving for Content-Aware Image Scaling

Mon, 04 Mar 2013 22:19:19 +0000

The Problem with Cropping Every programmer or graphic designer with some web development experience can attest to the fact that finding good images that have an exactly specified size is a pain. Since the dimensions of the sought picture are usually inflexible, an uncomfortable compromise can come in the form of cropping a large image down to size or scaling the image to have appropriate dimensions. Both of these solutions are undesirable.

Methods of Proof — Contradiction

Thu, 28 Feb 2013 11:46:02 +0000

In this post we’ll expand our toolbox of proof techniques by adding the proof by contradiction. We’ll also expand on our knowledge of functions on sets, and tackle our first nontrivial theorem: that there is more than one kind of infinity. Impossibility and an Example Proof by Contradiction Many of the most impressive results in all of mathematics are proofs of impossibility. We see these in lots of different fields. In number theory, plenty of numbers cannot be expressed as fractions.

Methods of Proof — Contrapositive

Fri, 22 Feb 2013 20:43:25 +0000

In this post we’ll cover the second of the “basic four” methods of proof: the contrapositive implication. We will build off our material from last time and start by defining functions on sets. Functions as Sets So far we have become comfortable with the definition of a set, but the most common way to use sets is to construct functions between them. As programmers we readily understand the nature of a function, but how can we define one mathematically?

Methods of Proof — Direct Implication

Sat, 16 Feb 2013 12:56:53 +0000

I recently posted an exploratory piece on why programmers who are genuinely interested in improving their mathematical skills can quickly lose stamina or be deterred. My argument was essentially that they don’t focus enough on mastering the basic methods of proof before attempting to read research papers that assume such knowledge. Also, there are a number of confusing (but in the end helpful) idiosyncrasies in mathematical culture that are often unexplained.

Why there is no Hitchhiker's Guide to Mathematics for Programmers

Fri, 08 Feb 2013 11:24:53 +0000

For those who aren’t regular readers: as a followup to this post, there are four posts detailing the basic four methods of proof, with intentions to detail some more advanced proof techniques in the future. You can find them on this blog’s primers page. Do you really want to get better at mathematics? Remember when you first learned how to program? I do. I spent two years experimenting with Java programs on my own in high school.

k-Means Clustering and Birth Rates

Mon, 04 Feb 2013 17:54:20 +0000

A common problem in machine learning is to take some kind of data and break it up into “clumps” that best reflect how the data is structured. A set of points which are all collectively close to each other should be in the same clump. A simple picture will clarify any vagueness in this: cluster-example Here the data consists of points in the plane. There is an obvious clumping of the data into three pieces, and we want a way to automatically determine which points are in which clumps.

Depth- and Breadth-First Search

Tue, 22 Jan 2013 11:44:27 +0000

The graph is among the most common data structures in computer science, and it’s unsurprising that a staggeringly large amount of time has been dedicated to developing algorithms on graphs. Indeed, many problems in areas ranging from sociology, linguistics, to chemistry and artificial intelligence can be translated into questions about graphs. It’s no stretch to say that graphs are truly ubiquitous. Even more, common problems often concern the existence and optimality of paths from one vertex to another with certain properties.

The Fundamental Group — A Primer

Sat, 12 Jan 2013 20:49:06 +0000

Being part of the subject of algebraic topology, this post assumes the reader has read our previous primers on both topology and group theory. As a warning to the reader, it is more advanced than most of the math presented on this blog, and it is woefully incomplete. Nevertheless, the aim is to provide a high level picture of the field with a peek at the details. An Intuitive Topological Invariant Our eventual goal is to get comfortable with the notion of the “homology group” of a topological space.

Probability Theory — A Primer

Fri, 04 Jan 2013 13:45:54 +0000

It is a wonder that we have yet to officially write about probability theory on this blog. Probability theory underlies a huge portion of artificial intelligence, machine learning, and statistics, and a number of our future posts will rely on the ideas and terminology we lay out in this post. Our first formal theory of machine learning will be deeply ingrained in probability theory, we will derive and analyze probabilistic learning algorithms, and our entire treatment of mathematical finance will be framed in terms of random variables.

Groups — A Second Primer

Sat, 22 Dec 2012 00:09:54 +0000

The First Isomorphism Theorem The meat of our last primer was a proof that quotient groups are well-defined. One important result that helps us compute groups is a very easy consequence of this well-definition. Recall that if $ G,H$ are groups and $ \varphi: G \to H$ is a group homomorphism, then the image of $ \varphi$ is a subgroup of $ H$. Also the kernel of $ \varphi$ is the normal subgroup of $ G$ consisting of the elements which are mapped to the identity under $ \varphi$.

Neural Networks and the Backpropagation Algorithm

Sun, 09 Dec 2012 21:24:48 +0000

Neurons, as an Extension of the Perceptron Model In a previous post in this series we investigated the Perceptron model for determining whether some data was linearly separable. That is, given a data set where the points are labelled in one of two classes, we were interested in finding a hyperplane that separates the classes. In the case of points in the plane, this just reduced to finding lines which separated the points like this:

Groups — A Primer

Sat, 08 Dec 2012 23:42:10 +0000

The study of groups is often one’s first foray into advanced mathematics. In the naivete of set theory one develops tools for describing basic objects, and through a first run at analysis one develops a certain dexterity for manipulating symbols and definitions. But it is not until the study of groups that one must step back and inspect the larger picture. The main point of that picture (and indeed the main point of a group) is that algebraic structure can be found in the most unalgebraic of settings.

Information Distance — A Primer

Tue, 04 Dec 2012 08:30:08 +0000

This post assumes familiarity with our primer on Kolmogorov complexity. We recommend the uninformed reader begin there. We will do our best to keep consistent notation across both posts. Kolmogorov Complexity as a Metric Over the past fifty years mathematicians have been piling up more and more theorems about Kolmogorov complexity, and for good reason. One of the main interpretations of the Kolmogorov complexity function $ K$ is that for a given string $ x$, $ K(x)$ is the best theoretical compression of $ x$ under any compression scheme.

Ramsey Number Lower Bound

Sun, 02 Dec 2012 15:17:26 +0000

Define the Ramsey number $ R(k,m)$ to be the minimum number $ n$ of vertices required of the complete graph $ K_n$ so that for any two-coloring (red, blue) of the edges of $ K_n$ one of two things will happen: There is a red $ k$-clique; that is, a complete subgraph of $ k$ vertices for which all edges are red. There is a blue $ m$-clique. It is known that these numbers are always finite, but it is very difficult to compute them exactly.

Constructing Topological Spaces — A Primer

Sun, 11 Nov 2012 18:13:58 +0000

Last time we investigated the (very unintuitive) concept of a topological space as a set of “points” endowed with a description of which subsets are open. Now in order to actually arrive at a discussion of interesting and useful topological spaces, we need to be able to take simple topological spaces and build them up into more complex ones. This will take the form of subspaces and quotients, and through these we will make rigorous the notion of “gluing” and “building” spaces.

There are Infinitely Many Primes (Erdős)

Sat, 10 Nov 2012 13:48:52 +0000

Problem: Prove there are infinitely many primes Solution: Denote by $ \pi(n)$ the number of primes less than or equal to $ n$. We will give a lower bound on $ \pi(n)$ which increases without bound as $ n \to \infty$. Note that every number $ n$ can be factored as the product of a square free number $ r$ (a number which no square divides) and a square $ s^2$.

Topological Spaces — A Primer

Sun, 04 Nov 2012 23:33:34 +0000

In our last primer we looked at a number of interesting examples of metric spaces, that is, spaces in which we can compute distance in a reasonable way. Our goal for this post is to relax this assumption. That is, we want to study the geometric structure of space without the ability to define distance. That is not to say that some notion of distance necessarily exists under the surface somewhere, but rather that we include a whole new class of spaces for which no notion of distance makes sense.

Decision Trees and Political Party Classification

Mon, 08 Oct 2012 14:49:21 +0000

Last time we investigated the k-nearest-neighbors algorithm and the underlying idea that one can learn a classification rule by copying the known classification of nearby data points. This required that we view our data as sitting inside a metric space; that is, we imposed a kind of geometric structure on our data. One glaring problem is that there may be no reasonable way to do this. While we mentioned scaling issues and provided a number of possible metrics in our primer, a more common problem is that the data simply isn’t numeric.

Complete Sequences and Magic Tricks

Tue, 02 Oct 2012 11:28:16 +0000

Numberphile posted a video today describing a neat trick based on complete sequences: The mathematics here is pretty simple, but I noticed at the end of the video that Dr. Grime was constructing the cards by hand, when really this is a job for a computer program. I thought it would be a nice warmup exercise (and a treat to all of the Numberphile viewers) to write a program to construct the cards for any complete sequence.

Infinitely Many Primes (Using Topology)

Wed, 26 Sep 2012 11:51:28 +0000

Problem: Prove there are infinitely many prime numbers. Solution: First recall that an arithmetic progression with difference $ d$ is a sequence of integers $ a_n \subset \mathbb{Z}$ so that for every pair $ a_k, a_{k+1}$ the difference $ a_{k+1} – a_k = d$. We proceed be defining a topology on the set of integers by defining a basis $ B$ of unbounded (in both directions) arithmetic progressions. That is, an open set in this topology is an arbitrary union of arithmetic progressions from $ -\infty$ to $ \infty$.

Trees—A Primer

Sun, 16 Sep 2012 22:40:47 +0000

This post comes in preparation for a post on decision trees (a specific type of tree used for classification in machine learning). While most mathematicians and programmers are familiar with trees, we have yet to discuss them on this blog. For completeness, we’ll give a brief overview of the terminology and constructions associated with trees, and describe a few common algorithms on trees. We will assume the reader has read our first primer on graph theory, which is a light assumption.

K-Nearest-Neighbors and Handwritten Digit Classification

Sun, 26 Aug 2012 12:19:43 +0000

The Recipe for Classification One important task in machine learning is to classify data into one of a fixed number of classes. For instance, one might want to discriminate between useful email and unsolicited spam. Or one might wish to determine the species of a beetle based on its physical attributes, such as weight, color, and mandible length. These “attributes” are often called “features” in the world of machine learning, and they often correspond to dimensions when interpreted in the framework of linear algebra.

Metric Spaces — A Primer

Sun, 26 Aug 2012 12:08:25 +0000

The Blessing of Distance We have often mentioned the idea of a “metric” on this blog, and we briefly described a formal definition for it. Colloquially, a metric is simply the mathematical notion of a distance function, with certain well-behaved properties. Since we’re now starting to cover a few more metrics (and things which are distinctly not metrics) in the context of machine learning algorithms, we find it pertinent to lay out the definition once again, discuss some implications, and explore a few basic examples.

Machine Learning — Introduction

Sat, 04 Aug 2012 19:59:42 +0000

A Series on Machine Learning These days an absolutely staggering amount of research and development work goes into the very coarsely defined field of “machine learning.” Part of the reason why it’s so coarsely defined is because it borrows techniques from so many different fields. Many problems in machine learning can be phrased in different but equivalent ways. While they are often purely optimization problems, such techniques can be expressed in terms of statistical inference, have biological interpretations, or have a distinctly geometric and topological flavor.

The Cellular Automaton Method for Cave Generation

Sun, 29 Jul 2012 20:01:18 +0000

Dear reader, this post has an interactive simulation! We encourage you to play with it as you read the article below. In our series of posts on cellular automata, we explored Conway’s classic Game of Life and discovered some interesting patterns therein. And then in our primers on computing theory, we built up a theoretical foundation for similar kinds of machines, including a discussion of Turing machines and the various computational complexity classes surrounding them.

Dynamic Time Warping for Sequence Comparison

Wed, 25 Jul 2012 20:11:39 +0000

Problem: Write a program that compares two sequences of differing lengths for similarity. Solution: (In Python) import math def dynamicTimeWarp(seqA, seqB, d = lambda x,y: abs(x-y)): # create the cost matrix numRows, numCols = len(seqA), len(seqB) cost = [[0 for _ in range(numCols)] for _ in range(numRows)] # initialize the first row and column cost[0][0] = d(seqA[0], seqB[0]) for i in xrange(1, numRows): cost[i][0] = cost[i-1][0] + d(seqA[i], seqB[0]) for j in xrange(1, numCols): cost[0][j] = cost[0][j-1] + d(seqA[0], seqB[j]) # fill in the rest of the matrix for i in xrange(1, numRows): for j in xrange(1, numCols): choices = cost[i-1][j], cost[i][j-1], cost[i-1][j-1] cost[i][j] = min(choices) + d(seqA[i], seqB[j]) for row in cost: for entry in row: print "%03d" % entry, print "" return cost[-1][-1] Discussion: Comparing sequences of numbers can be tricky business.

The Fast Fourier Transform

Wed, 18 Jul 2012 08:00:54 +0000

It’s often said that the Age of Information began on August 17, 1964 with the publication of Cooley and Tukey’s paper, “An Algorithm for the Machine Calculation of Complex Fourier Series.” They published a landmark algorithm which has since been called the Fast Fourier Transform algorithm, and has spawned countless variations. Specifically, it improved the best known computational bound on the discrete Fourier transform from $ O(n^2)$ to $ O(n \log n)$, which is the difference between uselessness and panacea.

Principal Component Analysis

Thu, 28 Jun 2012 12:08:44 +0000

Problem: Reduce the dimension of a data set, translating each data point into a representation that captures the “most important” features. Solution: in Python import numpy def principalComponents(matrix): # Columns of matrix correspond to data points, rows to dimensions. deviationMatrix = (matrix.T - numpy.mean(matrix, axis=1)).T covarianceMatrix = numpy.cov(deviationMatrix) eigenvalues, principalComponents = numpy.linalg.eig(covarianceMatrix) # sort the principal components in decreasing order of corresponding eigenvalue indexList = numpy.argsort(-eigenvalues) eigenvalues = eigenvalues[indexList] principalComponents = principalComponents[:, indexList] return eigenvalues, principalComponents Discussion: The problem of reducing the dimension of a dataset in a meaningful way shows up all over modern data analysis.

The Discrete Fourier Transform — A Primer

Sat, 23 Jun 2012 14:13:53 +0000

So here we are. We have finally made it to a place where we can transition with confidence from the classical continuous Fourier transform to the discrete version, which is the foundation for applications of Fourier analysis to programming. Indeed, we are quite close to unfurling the might of the Fast Fourier Transform algorithm, which efficiently computes the discrete Fourier transform. But because of its focus on algorithmic techniques, we will save it for a main content post and instead focus here on the intuitive connections between the discrete and continuous realms.

Streaming Median

Thu, 14 Jun 2012 22:03:55 +0000

Problem: Compute a reasonable approximation to a “streaming median” of a potentially infinite sequence of integers. Solution: (in Python) def streamingMedian(seq): seq = iter(seq) m = 0 for nextElt in seq: if m > nextElt: m -= 1 elif m < nextElt: m += 1 yield m Discussion: Before we discuss the details of the Python implementation above, we should note a few things. First, because the input sequence is potentially infinite, we can’t store any amount of information that is increasing in the length of the sequence.

Thoughts after a Year of Math ∩ Programming

Tue, 12 Jun 2012 06:00:22 +0000

After a year of writing this blog, what have I learned about the nature of the relationship between computer programs and mathematics? Here are a few notes that sum up my thoughts, roughly in order of how strongly I agree with them. I’d love to hear your thoughts in the comments. Programming is absolutely great for exploring questions and automating tasks. Mathematics is absolutely great for distilling the soul of a problem.

Generalized Functions — A Primer

Wed, 06 Jun 2012 20:57:05 +0000

Last time we investigated the naive (which I’ll henceforth call “classical”) notion of the Fourier transform and its inverse. While the development wasn’t quite rigorous, we nevertheless discovered elegant formulas and interesting properties that proved useful in at least solving differential equations. Of course, we wouldn’t be following this trail of mathematics if it didn’t result in some worthwhile applications to programming. While we’ll get there eventually, this primer will take us deeper down the rabbit hole of abstraction.

The Fourier Transform — A Primer

Sun, 27 May 2012 19:00:15 +0000

In our last primer we saw the Fourier series, which flushed out the notion that a periodic function can be represented as an infinite series of sines and cosines. While this is fine and dandy, and quite a powerful tool, it does not suffice for the real world. In the real world, very little is truly periodic, especially since human measurements can only record a finite period of time. Even things we wish to explore on this blog are hardly periodic (for instance, image analysis).

Double Angle Trigonometric Formulas

Sat, 19 May 2012 23:28:53 +0000

Problem: Derive the double angle identities $$\sin(2\theta) = 2\sin(\theta)\cos(\theta)\\\ \cos(2\theta) = \cos^2(\theta) – \sin^2(\theta)$$ Solution: Recall from linear algebra how one rotates a point in the plane. The matrix of rotation (derived by seeing where $ (1,0)$ and $ (0,1)$ go under a rotation by $ \theta$, and writing those coordinates in the columns) is $$A = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\\ \sin(\theta) & \cos(\theta) \end{pmatrix}$$ Next, note that to rotate a point twice by $ \theta$, we simply multiply the point (as a vector) by $ A$ twice.

False Proof – 2 = 4, As the Limit of an Infinite Power Tower

Sat, 05 May 2012 14:06:34 +0000

Problem: Prove that $ 2 = 4$. Solution: Consider the value of the following infinitely iterated exponent: $$\displaystyle \sqrt{2}^{\sqrt{2}^{\sqrt{2}^{\cdot^{\cdot^{\cdot}}}}}$$ Let $ a_n = \sqrt{2} \uparrow \uparrow n$, that is, the above power tower where we stop at the $ n$-th term. Then $ a_n$ is clearly an increasing sequence, and moreover $ a_n \leq 4$ by a trivial induction argument: $ \sqrt{2} \leq 4$ and if $ a_n \leq 4$ then $ a_{n+1} = (\sqrt{2})^{a_n} \leq (\sqrt{2})^{4} = 4$.

The Fourier Series—A Primer

Wed, 25 Apr 2012 21:43:06 +0000

Overview In this primer we’ll get a first taste of the mathematics that goes into the analysis of sound and images. In the next few primers, we’ll be building the foundation for a number of projects in this domain: extracting features of music for classification, constructing so-called hybrid images, and other image manipulations for machine vision problems (for instance, for use in neural networks or support vector machines; we’re planning on covering these topics in due time as well).

Kolmogorov Complexity—A Primer

Sat, 21 Apr 2012 18:23:41 +0000

The Complexity of Things Previously on this blog (quite a while ago), we’ve investigated some simple ideas of using randomness in artistic design (psychedelic art, and earlier randomized css designs). Here we intend to give a more thorough and rigorous introduction to the study of the complexity of strings. This naturally falls into the realm of computability theory and complexity theory, and so we refer the novice reader to our other primers on the subject (Determinism and Finite Automata, Turing Machines, and Complexity Classes; but Turing machines will be the most critical to this discussion).

Optimally Stacking the Deck—Texas Hold 'Em

Mon, 09 Apr 2012 09:00:26 +0000

Main Theorem: There exist optimal stackings for standard two-player Texas Hold ‘Em. A Puzzle is Solved (and then some!) It’s been quite a while since we first formulated the idea of an optimal stacking. In the mean time, we’ve gotten distracted with graduate school, preliminary exams, and the host of other interesting projects that have been going on here at Math ∩ Programming. And so months later, after traversing the homotopic hills of topology and projective plains of algebra, we’ve finally found time to solve the problem.

Classic Nintendo Games are NP-Hard

Thu, 22 Mar 2012 18:34:56 +0000

Problem: Prove that generalized versions of Mario Brothers, Metroid, Donkey Kong, Pokemon, and Legend of Zelda are NP-hard. Solution: http://arxiv.org/pdf/1203.1895v1.pdf Discussion: Three researchers (including Erik Demaine, a computer science professor at MIT famous for his work with the mathematics of origami) recently finished a paper giving the complexity of a number of classic Nintendo games (the ones I loved to play). All are proven NP-hard, some are shown to be NP-complete, and some are PSPACE-complete.

Caching (and Memoization)

Thu, 22 Mar 2012 11:09:57 +0000

Problem: Remember results of a function call which requires a lot of computation. Solution: (in Python) def memoize(f): cache = {} def memoizedFunction(*args): if args not in cache: cache[args] = f(*args) return cache[args] memoizedFunction.cache = cache return memoizedFunction @memoize def f(): ... Discussion: You might not use monoids or eigenvectors on a daily basis, but you use caching far more often than you may know. Caching the results of some operation is so prevalent in computer science that the world would slow down considerably without it.

In Place Uniform Shuffle

Sun, 18 Mar 2012 20:33:42 +0000

Problem: Write a program that shuffles a list. Do so without using more than a constant amount of extra space and linear time in the size of the list. Solution: (in Python) import random random.seed() def shuffle(myList): n = len(myList) for i in xrange(0, n): j = random.randint(i, n-1) # randint is inclusive myList[i], myList[j] = myList[j], myList[i] Discussion: Using a computer to shuffle a deck of cards is nontrivial at first glance for the following reasons.

Learning Programming — Finger-Painting and Killing Zombies

Thu, 15 Mar 2012 01:01:07 +0000

By the end, the breadth and depth of our collective knowledge was far beyond what anyone could expect from any high school course in any subject. Education Versus Exploration I’m a lab TA for an introductory Python programming course this semester, and it’s been…depressing. I remember my early days of programming, when the possibilities seemed endless and adding new features to my programs was exciting and gratifying, and I brimmed with pride at every detail, and I boasted to my friends of the amazing things I did, and I felt powerful.

Other Complexity Classes

Wed, 29 Feb 2012 17:12:35 +0000

Not Just Time, But Space Too! So far on this blog we’ve introduced models for computation, focused on Turing machines and given a short overview of the two most fundamental classes of problems: P and NP. While the most significant open question in the theory of computation is still whether P = NP, it turns out that there are hundreds (almost 500, in fact!) other “classes” of problems whose relationships are more or less unknown.

P vs. NP, A Primer (And a Proof Written in Racket)

Thu, 23 Feb 2012 19:57:35 +0000

Decidability Versus Efficiency In the early days of computing theory, the important questions were primarily about decidability. What sorts of problems are beyond the power of a Turing machine to solve? As we saw in our last primer on Turing machines, the halting problem is such an example: it can never be solved a finite amount of time by a Turing machine. However, more recently (in the past half-century) the focus of computing theory has shifted away from possibility in favor of determining feasibility.

Busy Beavers, and the Quest for Big Numbers

Wed, 08 Feb 2012 19:42:22 +0000

Finding Bigger Numbers, a Measure of Human Intellectual Progress Before we get into the nitty gritty mathematics, I’d like to mirror the philosophical and historical insights that one can draw from the study of large numbers. That may seem odd at first. What does one even mean by “studying” a large number? Of course, I don’t mean we stare at the number 1,000,000,000,000, which is quite large, and wonder how mankind can benefit from its elusive properties.

Fundamental Theorem of Algebra (With Picard's Little Theorem)

Tue, 07 Feb 2012 21:30:12 +0000

This post assumes familiarity with some basic concepts in complex analysis, including continuity and entire (everywhere complex-differentiable) functions. This is likely the simplest proof of the theorem (at least, among those that this author has seen), although it stands on the shoulders of a highly nontrivial theorem. The fundamental theorem of algebra has quite a few number of proofs (enough to fill a book!). In fact, it seems a new tool in mathematics can prove its worth by being able to prove the fundamental theorem in a different way.

Cryptanalysis with N-Grams

Fri, 03 Feb 2012 13:52:16 +0000

This post is the third post in a series on computing with natural language data sets. For the first two posts, see the relevant section of our main content page. A Childish Bit of Fun In this post, we focus on the problem of decoding substitution ciphers. First, we’ll describe a few techniques humans use to crack ciphers. We’ll find these unsatisfactory, and move on to a simplistic algorithm which does a local search on the space of all possible decryptions, where we utilize our word segmentation algorithm from last time to determine the likelihood that a decryption is correct.

The Fundamental Theorem of Algebra (with Galois Theory)

Thu, 02 Feb 2012 22:42:11 +0000

This post assumes familiarity with some basic concepts in abstract algebra, specifically the terminology of field extensions, and the classical results in Galois theory and group theory. The fundamental theorem of algebra has quite a few number of proofs (enough to fill a book!). In fact, it seems a new tool in mathematics can prove its worth by being able to prove the fundamental theorem in a different way. This series of proofs of the fundamental theorem also highlights how in mathematics there are many many ways to prove a single theorem, and in re-proving an established theorem we introduce new concepts and strategies.

Handshake Lemma

Sun, 29 Jan 2012 22:24:31 +0000

Problem: Prove or disprove: at a party of $ n$ people, there must be an even number of people who have an odd number of friends at the party. Solution: Let $ P$ be the set of all people, and for any person $ p \in P$, let $ d(p)$ be the number of friends that person has. Let $ f$ be the total number of friendships between pairs of people at the party.

The Fundamental Theorem of Algebra (with the Fundamental Group)

Sun, 22 Jan 2012 14:46:13 +0000

This post assumes familiarity with some basic concepts in algebraic topology, specifically what a group is and the definition of the fundamental group of a topological space. The fundamental theorem of algebra has quite a few number of proofs (enough to fill a book!). In fact, it seems a new tool in mathematics can prove its worth by being able to prove the fundamental theorem in a different way. This series of proofs of the fundamental theorem also highlights how in mathematics there are many many ways to prove a single theorem, and in re-proving an established theorem we introduce new concepts and strategies.

The Fundamental Theorem of Algebra (with Liouville)

Tue, 17 Jan 2012 22:37:45 +0000

This proof assumes knowledge of complex analysis, specifically the notions of analytic functions and Liouville’s Theorem (which we will state below). The fundamental theorem of algebra has quite a few number of proofs (enough to fill a book!). In fact, it seems a new tool in mathematics can prove its worth by being able to prove the fundamental theorem in a different way. This series of proofs of the fundamental theorem also highlights how in mathematics there are many many ways to prove a single theorem, and in re-proving an established theorem we introduce new concepts and strategies.

Word Segmentation, or Makingsenseofthis

Sun, 15 Jan 2012 11:10:51 +0000

A First Look at Google’s N-Gram Corpus In this post we will focus on the problem of finding the appropriate word boundaries in strings like “homebuiltairplanes”, as is common in web URLs like www.homebuiltairplanes.com. This is an interesting problem because humans do it so easily, but there is no obvious programmatic solution. We will begin this article by addressing the complexity of this problem, continue by implementing a simple model using a subset of Google’s n-gram corpus, and finish by describing our future plans to enhance the model.

A Spoonful of Python (and Dynamic Programming)

Thu, 12 Jan 2012 23:11:26 +0000

This primer is a third look at Python, and is admittedly selective in which features we investigate (for instance, we don’t use classes, as in our second primer on random psychedelic images). We do assume some familiarity with the syntax and basic concepts of the language. For a first primer on Python, see A Dash of Python. We’ll investigate some of Python’s useful built-in types, including lists, tuples, and dictionaries, and we use them to computing Fibonacci numbers and “optimal” coin change.

Numerical Integration

Sun, 08 Jan 2012 18:08:42 +0000

Rectangles, Trapezoids, and Simpson’s I just wrapped up a semester of calculus TA duties, and I thought it would be fun to revisit the problem of integration from a numerical standpoint. In other words, the goal of this article is to figure out how fast we can approximate the definite integral of a function $ f:\mathbb{R} \to \mathbb{R}$. Intuitively, a definite integral is a segment of the area between a curve $ f$ and the $ x$-axis, where we allow area to be negative when $ f(x) < 0$.

Random (Psychedelic) Art

Sun, 01 Jan 2012 19:11:28 +0000

And a Pinch of Python Next semester I am a lab TA for an introductory programming course, and it’s taught in Python. My Python experience has a number of gaps in it, so we’ll have the opportunity for a few more Python primers, and small exercises to go along with it. This time, we’ll be investigating the basics of objects and classes, and have some fun with image construction using the Python Imaging Library.

Row Reduction Over A Field

Fri, 30 Dec 2011 15:36:28 +0000

We’re quite eager to get to applications of algebraic topology to things like machine learning (in particular, persistent homology). Even though there’s a massive amount of theory behind it (and we do plan to cover some of the theory), a lot of the actual computations boil down to working with matrices. Of course, this means we’re in the land of linear algebra; for a refresher on the terminology, see our primers on linear algebra.

Metrics on Words

Mon, 19 Dec 2011 20:59:31 +0000

We are about to begin a series where we analyze large corpora of English words. In particular, we will use a probabilistic analysis of Google’s ngrams to solve various tasks such as spelling correction, word segmentation, on-line typing prediction, and decoding substitution ciphers. This will hopefully take us on a wonderful journey through elementary probability, dynamic programming algorithms, and optimization. As usual, the code implemented in this post is available from this blog’s Github page, and we encourage the reader to use the code to implement our suggested exercises.

Holidays and Homicide

Fri, 25 Nov 2011 16:33:21 +0000

A Study In Data Just before midnight on Thanksgiving, there was a murder by gunshot about four blocks from my home. Luckily I was in bed by then, but all of the commotion over the incident got me thinking: is murder disproportionately more common on Thanksgiving? What about Christmas, Valentine’s Day, or Saint Patrick’s Day? Of course, with the right data set these are the kinds of questions one can answer!

Tiling a Chessboard with Dominoes (Opposite Colors Removed)

Fri, 18 Nov 2011 10:32:54 +0000

This is a natural follow-up to our first gallery entry on the impossibility of tiling certain chessboards with dominoes. Problem: Suppose we remove two squares from a chessboard which have opposite color. Is it possible to tile the remaining squares with 2-by-1 dominoes? Solution: Notice that if we remove two squares of opposite color, then there is only one way to place dominoes on the remaining squares according to this scheme (one cannot tile a domino across the “walls”).

Z[√2] has Infinitely Many Units

Mon, 07 Nov 2011 00:10:26 +0000

Note, while the problem below arose in ring theory (specifically, Euclidean domains), the proof itself is elementary, and so the title should not scare away any viewers. In fact, we boil the problem down to something which requires no knowledge of abstract algebra at all. Problem: Show that the ring $ \mathbb{Z}[\sqrt{2}]$ has infinitely many units. Solution: An element of $ \mathbb{Z}[\sqrt{2}]$ has the form $ a+b\sqrt{2}$ for integers $ a, b$, and there is a function $ N: \mathbb{Z}[\sqrt{2}] \to \mathbb{Z}$ defined by $ N(a+b\sqrt{2}) = a^2 – 2b^2$ satisfying $ N(uv) = N(u)N(v)$, and from this it’s easy to see $ u$ is a unit if and only if $ N(u) = \pm 1$.

Conway's Game of Life in Conway's Game of Life

Thu, 03 Nov 2011 17:48:30 +0000

Recalling our series on Conway’s Game of Life, here is an implementation of Life within Life. Unfortunately, it does not “prove” what I hoped it might, so unless a reader has a suggestion on what this demonstration proves, it doesn’t belong in the proof gallery. But it sure is impressive.

False Proof: 1 = 2 (with Calculus)

Tue, 25 Oct 2011 14:46:12 +0000

Problem: Show 1 = 2 (with calculus) “Solution”: Consider the following: $ 1^2 = 1$ $ 2^2 = 2 + 2$ $ 3^2 = 3 + 3 + 3$ $ \vdots$ $ x^2 = x + x + \dots + x$ ($ x$ times) And since this is true for all values of $ x$, we may take the derivative of both sides, and the equality remains true. In other words:

The Smallest Non-Cyclic Simple Group has Order 60

Sat, 08 Oct 2011 20:53:48 +0000

Preamble: This proof is not particularly elegant or insightful. However, it belongs in this gallery for two reasons. First, it is an example of the goal of most mathematics: to classify things. In the same way that all natural numbers can be built up from primes, every group can be built up from simple groups. So if we want to understand all groups, it suffices to understand the simple ones. Indeed, this project has been the collective goal of hundreds of mathematicians for the past hundred years, culminating in one awesomely gargantuan theorem: The Classification of Finite Simple Groups.

A Taste of Racket

Sun, 02 Oct 2011 22:42:57 +0000

or, How I Learned to Love Functional Programming We recognize that not every reader has an appreciation for functional programming. Yet here on this blog, we’ve done most of our work in languages teeming with functional paradigms. It’s time for us to take a stand and shout from the digital mountaintops, “I love functional programming!” In fact, functional programming was part of this author’s inspiration for Math ∩ Programming. And so, to help the reader discover the joys of functional programming, we present an introduction to programming in Racket, with a focus on why functional programming is amazing, and a functional solution to a problem on Project Euler.

N Choose 2 is the Sum of the First N-1 Integers

Sun, 02 Oct 2011 16:16:47 +0000

Problem: Determine an arithmetic expression for $ \binom{n}{2}$. Solution: The following picture describes a bijection between the set of yellow dots and the set of pairs of purple dots: In particular, selecting any yellow dots and travelling downward along diagonals gives a unique pair of blue dots. Conversely, picking any pair of blue dots gives a unique yellow dot which is the meeting point (the “peak”) of the inward diagonals. If we say the bottom row has $ n$ elements, then the number of yellow dots is clearly $ 1 + 2 + \dots + (n-1)$, and the number of pairs in the last row is just $ \binom{n}{2}$.

n-Colorability is Equivalent to Finite n-Colorability (A Formal Logic Proof)

Sun, 04 Sep 2011 22:41:27 +0000

Warning: this proof requires a bit of familiarity with the terminology of propositional logic and graph theory. Problem: Let $ G$ be an infinite graph. Show that $ G$ is $ n$-colorable if and only if every finite subgraph $ G_0 \subset G$ is $ n$-colorable. Solution: One of the many equivalent versions of the Compactness Theorem for the propositional calculus states that if $ \Sigma \subset \textup{Prop}(A)$, where $ A$ is a set of propositional atoms, then $ \Sigma$ is satisfiable if and only if any finite subset of $ \Sigma$ is satisfiable.

Graduate Studies

Wed, 17 Aug 2011 15:25:40 +0000

I want to thank all my readers for visiting Math ∩ Programming as often as you do, and doubly thank those who are kind enough to leave a comment. UIC's east side campus. My office building is unseen, to the left of this picture. Unfortunately over the next few weeks I may not have time to do work as much on this blog as I have in the past two months.

The Square Root of 2 is Irrational (Geometric Proof)

Sun, 14 Aug 2011 19:48:51 +0000

Problem: Show that $ \sqrt{2}$ is an irrational number (can’t be expressed as a fraction of integers). Solution: Suppose to the contrary that $ \sqrt{2} = a/b$ for integers $ a,b$, and that this representation is fully reduced, so that $ \textup{gcd}(a,b) = 1$. Consider the isosceles right triangle with side length $ b$ and hypotenuse length $ a$, as in the picture on the left. Indeed, by the Pythagorean theorem, the length of the hypotenuse is $ \sqrt{b^2 + b^2} = b \sqrt{2} = a$, since $ \sqrt{2} = a/b$.

The Perceptron, and All the Things it Can't Perceive

Thu, 11 Aug 2011 13:16:32 +0000

This post assumes some basic familiarity with Euclidean geometry and linear algebra. Though we do not assume so much knowledge as is contained in our primer on inner product spaces, we will be working with the real Euclidean inner product. For the purpose of this post, it suffices to know about the “dot product” of two vectors. The General Problem One of the main problems in machine learning is to classify data.

A Dash of Python

Wed, 10 Aug 2011 19:26:55 +0000

We will orient our dash of Python around the first and simplest problem from ProjectEuler.net. Installing Python To get Python on your computer, go to python’s website and follow the instructions for downloading and installing the interpreter. Most Window’s users can simply click here to download an installer, Mac OS 10.6 – 10.7 users can click here to get their installer, and linux users can (and should) fend for themselves.

Programming Primers—An Introduction

Sat, 06 Aug 2011 20:15:07 +0000

So far on this blog we’ve assumed familiarity with the programming languages used (at the time of this writing, this is Mathematica and Java). This is unfair for the mathematicians who have little to no programming experience, and we admit that some readers tend to skim those technical sections with source code. As our work on this blog progresses, we recognize that the mathematical elegance is sometimes inherently manifested within the code itself.

Number Theory—A Primer

Sat, 30 Jul 2011 15:03:38 +0000

This primer exists for the background necessary to read our post on RSA encryption, but it also serves as a general primer to number theory. Oh, Numbers, Numbers, Numbers We start with some easy definitions. Definition: The set of integers, denoted $ \mathbb{Z}$, is the set $ \left \{ \dots -2, -1, 0, 1, 2, \dots \right \}$. Definition: Let $ a,b$ be integers, then $ a$ divides $ b$, denoted $ a \mid b$, if there exists an integer $ n$ such that $ na = b$.

Encryption & RSA

Fri, 29 Jul 2011 23:55:57 +0000

This post assumes working knowledge of elementary number theory. Luckily for the non-mathematicians, we cover all required knowledge and notation in our number theory primer. So Three Thousand Years of Number Theory Wasn’t Pointless It’s often tough to come up with concrete applications of pure mathematics. In fact, before computers came along mathematics was used mostly for navigation, astronomy, and war. In the real world it almost always coincided with the physical sciences.

False Proof—All Numbers are Describable in at Most Twenty Words

Thu, 28 Jul 2011 16:03:27 +0000

Problem: Show that every natural number can be unambiguously described in fewer than twenty words. “Solution”: Suppose to the contrary that not every natural number can be so described. Let $ S$ be the set of all natural numbers which are describable in fewer than twenty words. Consider $ R = \mathbb{N}-S$, the set of all words which cannot be described in fewer than twenty words. Since $ R$ is a subset of the natural numbers, which is well-ordered, it has a unique smallest element which we call $ r$.

Eigenfaces, for Facial Recognition

Wed, 27 Jul 2011 18:41:01 +0000

This post assumes familiarity with the terminology and notation of linear algebra, particularly inner product spaces. Fortunately, we have both a beginner’s primer on linear algebra and a follow-up primer on inner products. The Quest We are on a quest to write a program which recognizes images of faces. The general algorithm should be as follows. Get a bunch of sample images of people we want to recognize. Train our recognition algorithm on those samples.

Inner Product Spaces—A Primer

Mon, 25 Jul 2011 00:29:00 +0000

Vector spaces alone are not enough to do a lot of the interesting things we’d like them to do. Since a vector space is a generalization of Euclidean space, it is natural for us to investigate more specific types of vector spaces which are more akin to Euclidean space. In particular, we want to include the notion of a dot product. By admitting additional structure to a vector space, we may perform more computations, and hopefully get more interesting results.

Möbius Transformations are Isometries of a Sphere

Sat, 23 Jul 2011 09:14:25 +0000

We present a video on Möbius transformations and the geometry of the sphere. Anyone who has taken or will take complex analysis (that means you engineers!) should watch this. It shows not only the beautiful correspondence between the two, but it reveals the intuition behind a lot of complex analysis, when more often than not a student is left in the dust of rigorous formulas. In short, this is a proof without words that the Möbius transformations are in correspondence with rigid motions of the unit sphere in $ \mathbb{R}^3$.

Hunting Serial Killers

Wed, 20 Jul 2011 18:07:38 +0000

“Tonight’s the Night” A large volume of research goes into the psychological and behavioral analysis of criminals. In particular, serial criminals hold a special place in the imagination and nightmares of the general public (at least, American public). Those criminals with the opportunity to become serial criminals are logical, cool-tempered, methodical, and, of course, dangerous. They walk among us in crowded city streets, or drive slowly down an avenue looking for their next victim.

False Proof—The Reals are Countable

Tue, 19 Jul 2011 17:10:00 +0000

It seems that false proofs are quickly becoming some of the most popular posts on Math ∩ Programming. I have been preparing exciting posts on applications of graph coloring, deck stacking, and serial killers. Unfortunately, each requires resources which exist solely on my home desktop, which is currently dismantled in California while I am on vacation in Costa Rica. Until I return from the tropics, I will continue with more of the ever -popular false proofs.

False Proof—All Horses are the Same Color

Sat, 16 Jul 2011 07:49:32 +0000

Problem: Show that all horses are of the same color. “Solution”: We will show, by induction, that for any set of $ n$ horses, every horse in that set has the same color. Suppose $ n=1$, this is obviously true. Now suppose for all sets of $ n$ horses, every horse in the set has the same color. Consider any set $ H$ of $ n+1$ horses. We may pick a horse at random, $ h_1 \in H$, and remove it from the set, getting a set of $ n$ horses.

Graph Coloring, or Proof by Crayon

Thu, 14 Jul 2011 22:24:13 +0000

How many colors are required to color the provinces of Costa Rica? A common visual aid for maps is to color the regions of the map differently, so that no two regions which share a border also share a color. For example, to the right is a map of the provinces of Costa Rica (where the author is presently spending his vacation). It is colored with eight different colors, one for each province.

Three Circles and Collinear Centers of Dilation

Wed, 13 Jul 2011 16:29:14 +0000

Problem: Suppose their are three circles in the plane of distinct radii. For any two of these circles, we may find their center of dilation as the intersection point of their common tangents. For example, in the following picture we mark the three centers of dilation for each pair of circles: We notice that the three centers of dilation are collinear. Show they are always collinear for any three non-intersecting circles of distinct radii.

Optimally Stacking the Deck—Kicsi Poker

Mon, 11 Jul 2011 20:30:05 +0000

A Puzzle is Born Sitting around a poolside table, in the cool air and soft light of a June evening, a few of my old friends and I played a game of Texas Hold ‘Em. While we played we chatted about our times in high school, of our old teachers, friends, and, of course, our times playing poker. We even reminisced lightly about some of the more dishonest people we played with (whom we invited for want of a larger payout), but especially we recalled the times we caught them cheating.

Set Theory—A Primer

Sat, 09 Jul 2011 18:14:59 +0000

It’s often that a student’s first exposure to rigorous mathematics is through set theory, as originally studied by Georg Cantor. This means we will not treat set theory axiomatically (as in ZF set theory), but rather we will take the definition of a set for granted, and allow any operation to be performed on a set. This will be clear when we present examples, and it will be clear why this is a bad idea when we present paradoxes.

False Proof—31.5 = 32.5

Thu, 07 Jul 2011 14:14:06 +0000

Problem: Show 31.5 = 32.5. “Solution”: Explanation: It appears that by shifting around the pieces of one triangle, we have constructed a second figure which covers less area! Since the first triangle has base length 13 and height 5, its area is 32.5. Clearly, the second figure has the same area minus a square of area 1, giving the second figure area 31.5. In fact, by counting up the squares in each component, we do see that it is simply a shifting of pieces, so the areas of the two figures must be the same.

False Proof—There are Finitely Many Primes

Tue, 05 Jul 2011 19:01:48 +0000

Problem: Show there are finitely many primes. “Solution”: Suppose to the contrary there are infinitely many primes. Let $ P$ be the set of primes, and $ S$ the set of square-free natural numbers (numbers whose prime factorization has no repeated factors). To each square-free number $ n \in S$ there corresponds a subset of primes, specifically the primes which make up $ n$’s prime factorization. Similarly, any subset $ Q \subset P$ of primes corresponds to a number in $ S$, since we can simply multiply all numbers in $ Q$ together to get a square-free number.

False Proof—1 = 2

Tue, 05 Jul 2011 18:36:09 +0000

“Solution”: Let $ a=b \neq 0$. Then $ a^2 = ab$, and $ a^2 – b^2 = ab – b^2$. Factoring gives us $ (a+b)(a-b) = b(a-b)$. Canceling both sides, we have $ a+b = b$, but remember that $ a = b$, so $ 2b = b$. Since $ b$ is nonzero, we may divide both sides to obtain $ 2=1$, as desired. Explanation: This statement, had we actually proved it, would imply that all numbers are equal, since subtracting 1 from both sides gives $ 0=1$ and hence $ a=0$ for all real numbers $ a$.

Geometric Series with Geometric Proofs

Tue, 05 Jul 2011 18:13:44 +0000

Problem: $ \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \dots = 1$ Solution: Problem: $ \frac{1}{3} + \frac{1}{9} + \frac{1}{27} + \dots = \frac{1}{2}$ Solution: Problem: $ \frac{1}{4} + \frac{1}{16} + \frac{1}{64} + \dots = \frac{1}{3}$ Solution: Problem: $ 1 + r + r^2 + \dots = \frac{1}{1-r}$ if $ r < 1$. Solution: This last one follows from similarity of the subsequent trapezoids: the right edge of the teal(ish) trapezoid has length $ r$, and so the right edge of the neighboring trapezoid, $ x$, is found by $ \frac{r}{1} = \frac{x}{r}$, and we see that it has length $ r^2$.

Turing Machines—A Primer

Mon, 04 Jul 2011 16:35:44 +0000

Last time we saw some models for computation, and saw in turn how limited they were. Now, we open Pandrora’s hard drive: Definition: A Turing machine is a tuple $ (S, \Gamma, \Sigma, s_0, F, \tau)$, where $ S$ is a set of states, $ \Gamma$ is a set of tape symbols, including a special blank symbol $ b$, $ \Sigma \subset \Gamma$ is a set of input symbols, not including $ b$, $ s_0$ is the initial state, $ A \subset S$ is a set of accepting states, $ R \subset S$ is a set of rejecting states, $ \tau: S – (A \cup R) \times \Gamma \to S \times \Gamma \times \left \{ L, R \right \}$ is a partial function called the transition function, where $ L, R$ correspond to “left shift” and “right shift,” respectively.

Determinism and Finite Automata—A Primer

Sat, 02 Jul 2011 23:41:30 +0000

The first step in studying the sorts of possible computations (and more interestingly, those things which cannot be computed) is to define exactly what we mean by a “computation.” At a high level, this is easy: a computation is simply a function. Given some input, produce the appropriate output. Unfortunately this is much too general. For instance, we could define almost anything we want in terms of functions. Let $ f$ be the function which accepts as input the date of California Super Lotto drawings, and returns the set of winning numbers for that date.

Sums of k Powers

Fri, 01 Jul 2011 12:52:29 +0000

Problem: Prove that for all $ n,k \in \mathbb{N}, k > 1$, we have $$\sum \limits_{i=0}^{n} k^i = \frac{k^{n+1}-1}{k-1}$$ Solution: Representing the numbers in base $ k$, we have that each term of the sum is all 0’s except for a 1 in the $ i$th place. Hence, the sum of all terms is the $ n$-digit number comprised of all 1’s. Multiplying by $ k-1$ gives us the $ n$-digit number where every digit is $ k-1$.

Turing Machines and Conway's Dreams

Thu, 30 Jun 2011 20:50:04 +0000

Additional Patterns Last time we left the reader with the assertion that Conway’s game of life does not always stabilize. Specifically, there exist patterns which result in unbounded cell population growth. Although John Conway’s original conjecture was that all patterns eventually stabilize (and offered $50 to anyone who could provide a proof or counterexample), he was proven wrong. Here we have the appropriately named glider gun, whose main body oscillates, expelling a glider once per period.

The Wild World of Cellular Automata

Wed, 29 Jun 2011 21:08:45 +0000

There is a long history of mathematical models for computation. One very important one is the Turing Machine, which is the foundation of our implementations of actual computers today. On the other end of the spectrum, one of the simpler models of computation (often simply called a system) is a cellular automaton. Surprisingly enough, there are deep connections between the two. But before we get ahead of ourselves, let’s see what these automata can do.

Tiling a Chessboard

Sun, 26 Jun 2011 20:24:09 +0000

Problem: Take a chessboard and cut off two opposite corners. Is it possible to completely tile the remaining board with 2-by-1 dominoes? Solution: Notice that every domino covers exactly one white tile and one black tile. Counting up the colors, we have 32 white and 30 black. Hence, any tiling by 2-by-1 dominoes will leave two extra white squares unaccounted for. So no such tiling is possible. Problem: Cut one corner off a chessboard.

Teaching Mathematics—Graph Theory

Sun, 26 Jun 2011 16:19:20 +0000

Community Service Mathematics is supposed to be a process of discovery. Definitions, propositions, and methods of proof don’t come from nowhere, although after the fact (when presented in a textbook) they often seem to. As opposed to a textbook, real maths is highly non-linear. It took mathematicians quite a lot of fuss to come up with the quadratic formula, and even simple geometric conjectures were for the longest time the subject of hot debate.

Area of a Triangle

Fri, 24 Jun 2011 15:46:11 +0000

Problem: What is the area of the triangle within the rectangle? Solution: In a moment of inspiration, we draw the following additional line: Now the answer is obvious. Once we split the rectangle into two smaller rectangles, the sides of the triangle become diagonals of their respective rectangles. The diagonals obviously split each of the two smaller rectangles into halves, where one half lies inside our original triangle. Clearly then, the interior of the whole triangle is half the area of its enclosing rectangle.

Sums of the first n numbers, squares

Fri, 24 Jun 2011 15:05:22 +0000

Problem: Find the sum of the first 1000 natural numbers. Solution: Write the numbers twice as follows: $ \begin{matrix} 1 & + & 2 & + & \dots & + & 999 & + & 1000 \\\ 1000 & + & 999 & + & \dots & + & 2 & + & 1 \end{matrix}$ Summing the numbers in each column, we have: $ 2 (1 + 2 + \dots + 1000) = 1001 + 1001 + \dots + 1001$ (1000 times)

The Party Problem

Thu, 23 Jun 2011 16:22:22 +0000

Problem: At any party of 1000 people, must there always exist two people at the party who have the same number of friends at the party? For the sake of this problem, one cannot be friends with oneself, and friendship is bidirectional. Solution: This must always happen. Suppose to the contrary, that every person at the party has a different number of friends at the party. The minimum number of friends one could have is 0, while 999 is the maximum.

Number of Games in a Tournament

Thu, 23 Jun 2011 16:04:47 +0000

Problem: 1000 players compete in a tournament. In each round, players are matched with opponents, and the winner proceeds to the next round. If there are an odd number of players in a round, one player chosen at random sits out of that round. What is the total number of games are played in the tournament? Solution: 999. Each player loses exactly one game, except for the winner of the tournament.

Google's Page Rank—Why it Doesn't Work Anymore

Tue, 21 Jun 2011 17:25:08 +0000

A Bully By Any Other Name From the New York Times: “Shopping online in late July, Clarabelle Rodriguez typed the name of her favorite eyeglass brand into Google’s search bar. In moments, she found the perfect frames — made by a French company called Lafont — on a Web site that looked snazzy and stood at the top of the search results. Not the tippy-top, where the paid ads are found, but under those, on Google’s version of the gold-medal podium, where the most relevant and popular site is displayed.

Google's Page Rank—The Final Product

Mon, 20 Jun 2011 22:07:41 +0000

Dangling Nodes and Non-Uniqueness Recall where we left off last time. Given a web $ W$ with no dangling nodes, the link matrix for $ W$ has 1 as an eigenvalue, and if the corresponding eigenspace has dimension 1, then any associated eigenvector gives a ranking of the pages in $ W$ which is consistent with our goals. The first problem is that if there is a dangling node, our link matrix has a column of all zeros, and is no longer column-stochastic.

Linear Algebra—A Primer

Sun, 19 Jun 2011 18:39:40 +0000

Story Time Linear algebra was founded around the same time as Calculus (think Leibniz, circa 1700) solely for the purpose of solving general systems of linear equations. The coefficients of a system were written in a grid form, with rows corresponding to equations and columns to the unknown variables. Using a computational tool called the determinant (an awkward, but computable formula involving only the coefficients of the equations in a system), researchers were able to solve these systems, opening a world of information about the positions of celestial bodies and large-scale measurements (of geodesic arcs) on the surface of the earth.

Google's PageRank—A First Attempt

Sat, 18 Jun 2011 18:05:20 +0000

The Web as a Graph The goal of this post is to assign an “importance score” $ x_i \in [0,1]$ to each of a set of web pages indexed $ v_i$ in a way that consistently captures our idea of which websites are likely to be important. But before we can extract information from the structure of the internet, we need to have a mathematical description of that structure. Enter graph theory.

Big-O Notation—A Primer

Tue, 14 Jun 2011 16:26:27 +0000

The Quest to Capture Speed Companies and researchers spend hundreds of millions of dollars for the fruits of their algorithms. Whether one is indexing websites on the internet for search, folding proteins, or figuring out which warehouse is the most cost-effective to ship a product from, improvements in algorithm speed save immense amounts of money. It’s no surprise then, that a similarly immense amount of money has gone into the mathematics behind algorithm analysis.

Well Orderings and Search

Tue, 14 Jun 2011 11:18:04 +0000

Binary Search Binary search is perhaps the first and most basic nontrivial algorithm a student learns. For the mathematicians out there, binary search is a fast procedure to determine whether a sorted list contains a particular element. Here is a pseudocode implementation: # Binary Search: # Given a list L, sorted via the total order <, and a sought # element x, return true iff L contains x. function binarySearch(L, x, <): # base case if(length(L) == 1): return L[0] == x middleIndex = floor(length(L) / 2) if (L[middleIndex] == x): return true # inductive step, with ellipsis notation meaning slices of L # from the beginning and to the end, respectively if (x < L[middleIndex]): return binarySort(L[.

Prime Design

Mon, 13 Jun 2011 14:26:39 +0000

The goal of this post is to use prime numbers to make interesting and asymmetric graphics, and to do so in the context of the web design language CSS. Number Patterns For the longest time numbers have fascinated mathematicians and laymen alike. Patterns in numbers are decidedly simple to recognize, and the proofs of these patterns range from trivially elegant to Fields Medal worthy. Here’s an example of a simple one that computer science geeks will love:

Google's PageRank—Introduction

Sun, 12 Jun 2011 21:09:29 +0000

Importance on the Web As a society living in the “Information Age,” it comes as no surprise that we are faced with the task of sorting through vast oceans of content. With the admission that most content is actually junk, we must wisely choose the objects of our analysis. The appropriately named site UselessJunk.com certainly doesn’t deserve the same attention as the BBC World News page, and yet within the monstrous heart of the internet, it requires the maturity of the human psyche to discriminate their relative worth.