Book mailing list

I’m launching an email list for my book-in-progress, which is tentatively titled, “A Programmer’s Introduction to Mathematics.”

Sign up form

This will probably end up being a monthly or once-every-two-months newsletter with my progress updates. I’ll probably also use this email list to send out sneak peeks of certain chapters and ask for feedback. A lot about the book is still up in the air, but now that the excitement of my PhD defense has passed, I can focus deeply on the book.

A short description of the intended audience (from the comments on this post)

My target audience is a programmer who has gotten through the standard CS education, but never felt they deeply understood math. Think of it as a step up from Kalid’s Better Explained blog (I won’t explain the basics of exponents or trigonometry), but a big step below the hard posts on this blog. I’ll emphasize connections and analogies between math and programming. Think of it as a coherent, linear version of the introductory posts on this blog with cool and nontrivial applications. I will also focus on building the skills and mindset that allows one to pick up other math books and continue learning.

More than anything, I want to keep this blog free of too much non-math, so this will be my last plug for the book until the day it’s released.

[Image: mock cover of the book]

A mock cover. Look closely and you’ll see it’s got 95 pages so far.

Concrete Examples of Quantum Gates

So far in this series we’ve seen a lot of motivation and defined basic ideas of what a quantum circuit is. But on rereading my posts, I think we would all benefit from some concreteness.

“Local” operations

So by now we’ve understood that quantum circuits consist of a sequence of gates A_1, \dots, A_k, where each A_i is an 8-by-8 matrix that operates “locally” on some choice of three (or fewer) qubits. And in your head you imagine starting with some state vector v and applying each A_i locally to its three qubits until the end when you measure the state and get some classical output.

But the point I want to make is that A_i actually changes the whole state vector v, because the three qubits it acts “locally” on are part of the entire basis. Here’s an example. Suppose we have three qubits and they’re in the state

\displaystyle v = \frac{1}{\sqrt{14}} (e_{001} + 2e_{011} - 3e_{101})

Recall we abbreviate basis states by subscripting them by binary strings, so e_{011} = e_0 \otimes e_1 \otimes e_1, and a valid state is any unit vector over the 2^3 = 8 possible basis elements. As a vector, this state is \frac{1}{\sqrt{14}} (0,1,0,2,0,-3,0,0).
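If you like, you can sanity check this notation with numpy (the same library used later in this post); this is just a quick sketch, and the variable names are my own.

import numpy

e0 = numpy.array([1, 0])
e1 = numpy.array([0, 1])

# e_{011} = e_0 ⊗ e_1 ⊗ e_1 is the basis vector with a 1 in position 3 (binary 011)
e011 = numpy.kron(numpy.kron(e0, e1), e1)
print(e011)   # [0 0 0 1 0 0 0 0]

# the example state (e_{001} + 2 e_{011} - 3 e_{101}) / sqrt(14)
e001 = numpy.kron(numpy.kron(e0, e0), e1)
e101 = numpy.kron(numpy.kron(e1, e0), e1)
v = (e001 + 2 * e011 - 3 * e101) / 14 ** 0.5
print(v.round(3))   # [ 0.     0.267  0.     0.535  0.    -0.802  0.     0.   ]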

Say we apply the gate A that swaps the first and third qubits. “Locally” this gate has the following matrix:

\displaystyle \begin{pmatrix} 1&0&0&0 \\ 0&0&1&0 \\ 0&1&0&0 \\ 0&0&0&1 \end{pmatrix}

where we index the rows and columns by the relevant strings in lexicographic order: 00, 01, 10, 11. So this operation leaves e_{00} and e_{11} the same while swapping the other two. However, as an operation on three qubits the operation looks quite different. And it’s sort of hard to describe a general way to write it down as a matrix because of the choice of indices. There are three different perspectives.

Perspective 1: if the qubits being operated on are sequential (like, the third, fourth, and fifth qubits), then we can write the matrix as I_{2^{a}} \otimes A \otimes I_{2^{b}}, where the tensor product of matrices is the Kronecker product and a + b + \log_2 \textup{dim}(A) = n (the number of qubits adds up). Then the final operation looks like a “tiled product” of identity matrices by A, but it’s a pain to write out. Let me hurt myself for your sake, dear reader.

[Image: the full matrix I_{2^a} \otimes A \otimes I_{2^b}, a block-diagonal tiling with 2^a copies of A \otimes I_{2^b} along the diagonal]

And each copy of A \otimes I_{2^b} looks like

[Image: a single diagonal block A \otimes I_{2^b}, whose blocks are the entries of A each multiplied by a copy of I_{2^b}]

That’s a mess, but if you write it out for our example of swapping the first and third qubits of a three-qubit register you get the following:

\displaystyle \begin{pmatrix} 1&0&0&0&0&0&0&0 \\ 0&0&0&0&1&0&0&0 \\ 0&0&1&0&0&0&0&0 \\ 0&0&0&0&0&0&1&0 \\ 0&1&0&0&0&0&0&0 \\ 0&0&0&0&0&1&0&0 \\ 0&0&0&1&0&0&0&0 \\ 0&0&0&0&0&0&0&1 \end{pmatrix}

where the rows and columns are indexed by 000, 001, \dots, 111 in lexicographic order, as before.

And this makes sense: the gate changes any entry of the state vector that has values for the first and third qubit that are different. This is what happens to our state:

\displaystyle v = \frac{1}{\sqrt{14}} (0,1,0,2,0,-3,0,0) \mapsto \frac{1}{\sqrt{14}} (0,0,0,0,1,-3,2,0)
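If you don’t want to write out that 8-by-8 matrix by hand, here’s a minimal numpy sketch that builds it and checks the mapping above; the helper swap_first_and_third is my own naming and construction, not anything canonical, and the printed output is up to floating point.

import numpy

def swap_first_and_third(n_qubits=3):
   # build the permutation matrix that swaps the first and third qubits
   dim = 2 ** n_qubits
   M = numpy.zeros((dim, dim))
   for x in range(dim):
      bits = [(x >> (n_qubits - 1 - i)) & 1 for i in range(n_qubits)]   # bits[0] is the first qubit
      bits[0], bits[2] = bits[2], bits[0]
      y = sum(b << (n_qubits - 1 - i) for (i, b) in enumerate(bits))
      M[y, x] = 1
   return M

A = swap_first_and_third()
v = numpy.array([0, 1, 0, 2, 0, -3, 0, 0]) / 14 ** 0.5
print(A.dot(v) * 14 ** 0.5)   # approximately [ 0.  0.  0.  0.  1. -3.  2.  0.]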

Perspective 2: just assume every operation works on the first three qubits, and wrap each operation A in between an operation that swaps the first three qubits with the desired three. So like BAB for B a swap operation (a swap is its own inverse, so there’s no need for B^{-1}). Then the matrix form looks a bit simpler: the middle factor is just A \otimes I_{2^{n-3}}, and conjugating by B permutes the rows and columns to recover the matrix form we gave above. This allows one to retain a shred of sanity when trying to envision the matrix for an operation that acts on three qubits that are not sequential. The downside is that to actually use this perspective in an analysis you have to carry around the extra baggage of these permutation matrices. So one might use this as a simplifying assumption (a “without loss of generality” statement).
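Here’s a quick numerical sanity check of this perspective, as a sketch with variable names of my own: conjugating the gate “Hadamard on the first qubit” by the swap of qubits 1 and 3 gives exactly the gate “Hadamard on the third qubit.”

import numpy

# B is the 8x8 matrix swapping qubits 1 and 3, built directly as a permutation matrix
B = numpy.zeros((8, 8))
for x in range(8):
   a, b, c = (x >> 2) & 1, (x >> 1) & 1, x & 1
   B[(c << 2) | (b << 1) | a, x] = 1

H = (1 / 2 ** 0.5) * numpy.array([[1, 1], [1, -1]])
I4 = numpy.identity(4)

# B (H ⊗ I_4) B applies H to the third qubit, i.e. it equals I_4 ⊗ H
print(numpy.allclose(B.dot(numpy.kron(H, I4)).dot(B), numpy.kron(I4, H)))   # True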

Perspective 3: ignore matrices and write things down in a summation form. So if \sigma is the permutation that swaps 1 and 3 and leaves the other indices unchanged, we can write the general operation on a state v = \sum_{x \in \{ 0,1 \}^n } a_x e_{x} as Av = \sum_{x \in \{ 0,1 \}^n} a_x e_{\sigma(x)}.
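In code this perspective is arguably the cleanest of the three. Here’s a sketch (the function name is mine) that applies the same swap gate by permuting basis indices, without ever forming a matrix:

import numpy

def apply_swap_1_3(state):
   # apply the swap-first-and-third-qubits gate by permuting basis indices;
   # a sketch for exactly 3 qubits
   new_state = numpy.zeros_like(state)
   for x, amplitude in enumerate(state):
      first, second, third = (x >> 2) & 1, (x >> 1) & 1, x & 1
      sigma_x = (third << 2) | (second << 1) | first
      new_state[sigma_x] = amplitude
   return new_state

v = numpy.array([0, 1, 0, 2, 0, -3, 0, 0]) / 14 ** 0.5
print(apply_swap_1_3(v) * 14 ** 0.5)   # approximately [ 0.  0.  0.  0.  1. -3.  2.  0.]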

The third option is probably the nicest way to do things, but it’s important to keep the matrix view in mind for many reasons. Just one quick reason: “errors” in quantum gates (that are meant to approximately compute something) compound linearly in the number of gates because the operations are linear. This is a key reason that allows one to design quantum analogues of error correcting codes.

So we’ve established that the basic (atomic) quantum gates are “local” in the sense that they operate on a fixed number of qubits, but they are not local in the sense that they can screw up the entire state vector.

A side note on the meaning of “local”

When I was chugging through learning this stuff (and I still have far to go), I wanted to come up with an alternate characterization of the word “local” so that I would feel better about using it. Mathematicians are as passionate about word choice as programmers are about text editors. In particular, for a long time I was ignorantly convinced that quantum gates that act on a small number of qubits don’t affect the marginal distribution of measurement outcomes for other qubits. That is, I thought that if A acts on qubits 1, 2, 3, then Av and v have the same probability of a measurement producing a 1 in index 4, 5, etc., conditioned on fixing a measurement outcome for qubits 1, 2, 3. In notation: if x is a random variable whose values are binary strings and v is a state vector, write x \sim v for the random process of measuring the state vector v and getting the string x. Then my claim was that the following holds for every b_1, b_2, b_3 \in \{0,1\} and every 1 \leq i \leq n:

\displaystyle \begin{aligned}\Pr_{x \sim v}&[x_i = 1 \mid x_1 = b_1, x_2 = b_2, x_3 = b_3] = \\ \Pr_{x \sim Av}&[x_i = 1 \mid x_1 = b_1, x_2 = b_2, x_3 = b_3] \end{aligned}

You could try to prove this, and you would fail because it’s false. In fact, it’s even false if A acts on only a single qubit! Because it’s so tedious to write out all of the notation, I decided to write a program to illustrate the counterexample. (The most brazenly dedicated readers will try to prove this false fact and identify where the proof fails.)

import numpy

H = (1/(2**0.5)) * numpy.array([[1,1], [1,-1]])
I = numpy.identity(4)
A = numpy.kron(H,I)

Here H is the 2 by 2 Hadamard matrix, which operates on a single qubit and maps e_0 \mapsto \frac{e_0 + e_1}{\sqrt{2}}, and e_1 \mapsto \frac{e_0 - e_1}{\sqrt{2}}. This matrix is famous for many reasons, but one simple use as a quantum gate is to generate uniform random coin flips. In particular, measuring He_0 outputs 1 and 0 with equal probability.

So in the code sample above, A is the mapping which applies the Hadamard operation to the first qubit and leaves the other qubits alone.

Then we compute some arbitrary input state vector w

def normalize(z):
   # scale z so it has unit norm, making it a valid quantum state
   return (1.0 / (sum(abs(z)**2) ** 0.5)) * z

v = numpy.arange(1,9)
w = normalize(v)

And now we write a function to compute the probability of some query conditioned on some fixed bits. We simply sum up the square norms of all of the relevant indices in the state vector.

import itertools
import math

def condProb(state, query={}, fixed={}):
   # Pr[ all bits in `query` | all bits in `fixed` ] for a measurement of `state`.
   # Both arguments map a qubit index to the bit value it should take.
   num = 0
   denom = 0
   dim = int(math.log2(len(state)))

   for x in itertools.product([0, 1], repeat=dim):
      if any(x[index] != b for (index, b) in fixed.items()):
         continue

      # convert the bit tuple x to its index in the state vector
      i = sum(d << shift for (shift, d) in enumerate(reversed(x)))
      denom += abs(state[i]) ** 2
      if all(x[index] == b for (index, b) in query.items()):
         num += abs(state[i]) ** 2

   if num == 0:
      return 0

   return num / denom

So if the query is query = {1:0} and the fixed thing is fixed = {0:0}, then this will compute the probability that the measurement results in the second qubit being zero conditioned on the first qubit also being zero.

And the result:

Aw = A.dot(w)
query = {1:0}
fixed = {0:0}
print((condProb(w, query, fixed), condProb(Aw, query, fixed)))
# (0.16666666666666666, 0.29069767441860467)

So they are not equal in general.

Also, in general we won’t work explicitly with full quantum gate matrices, since for n qubits they have size 2^n \times 2^n (that’s 2^{2n} entries), which is big. But for finding counterexamples to guesses and false intuition, it’s a great tool.

Some important gates on 1-3 qubits

Let’s close this post with concrete examples of quantum gates. Based on the above discussion, we can write out the 2 x 2, 4 x 4, or 8 x 8 matrix form of the operation and understand that it can apply to any small subset of the qubits in the state of a quantum program. Gates are most interesting when they’re operating on entangled qubits, and that will come out when we visit our first quantum algorithm next time, but for now we will just discuss at a naive level how they operate on the basis vectors.

Hadamard gate:

We introduced the Hadamard gate already, but I’ll reiterate it here.

Let H be the following 2 by 2 matrix, which operates on a single qubit and maps e_0 \mapsto \frac{e_0 + e_1}{\sqrt{2}}, and e_1 \mapsto \frac{e_0 - e_1}{\sqrt{2}}.

\displaystyle H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}

One can use H to generate uniform random coin flips. In particular, measuring He_0 outputs 1 and 0 with equal probability.
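For concreteness, here’s a quick numpy check of that claim (a sketch with variable names of my own): the measurement probabilities are the squared magnitudes of the amplitudes of He_0, and they come out equal.

import numpy

H = (1 / 2 ** 0.5) * numpy.array([[1, 1], [1, -1]])
e0 = numpy.array([1, 0])

# measurement probabilities are the squared magnitudes of the amplitudes
print(abs(H.dot(e0)) ** 2)   # [0.5 0.5]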

Quantum NOT gate:

Let X be the 2 x 2 matrix formed by swapping the columns of the identity matrix.

\displaystyle X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}

This gate is often called the “Pauli-X” gate by physicists. This matrix is far too simple to be named after a person, and I can only imagine it is still named after a person for the layer of obfuscation that so often makes people feel smarter (same goes for the Pauli-Y and Pauli-Z gates, but we’ll get to those when we need them).

If we’re thinking of e_0 as the boolean value “false” and e_1 as the boolean value “true”, then the quantum NOT gate simply swaps those two states. In particular, note that composing a Hadamard and a quantum NOT gate can have interesting effects: XH(e_0) = H(e_0), but XH(e_1) \neq H(e_1). In the second case, the minus sign is the culprit. Which brings us to…
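Here’s a tiny sketch verifying that composition claim numerically (the names are mine):

import numpy

X = numpy.array([[0, 1], [1, 0]])
H = (1 / 2 ** 0.5) * numpy.array([[1, 1], [1, -1]])
e0, e1 = numpy.array([1, 0]), numpy.array([0, 1])

print(numpy.allclose(X.dot(H.dot(e0)), H.dot(e0)))   # True
print(numpy.allclose(X.dot(H.dot(e1)), H.dot(e1)))   # False: X H e_1 = -H e_1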

Phase shift gate:

Given an angle \theta, we can “shift the phase” of one qubit by an angle of \theta using the 2 x 2 matrix R_{\theta}.

\displaystyle R_{\theta} = \begin{pmatrix} 1 & 0 \\ 0 & e^{i \theta} \end{pmatrix}

“Phase” is a term physicists like to use for angles. Since the coefficients of a quantum state vector are complex numbers, and since complex numbers can be thought of geometrically as vectors with direction and magnitude, it makes sense to “rotate” the coefficient of a single qubit. So R_{\theta} does nothing to e_0 and it rotates the coefficient of e_1 by an angle of \theta.

Continuing in our theme of concreteness, if I have the state vector v = \frac{1}{\sqrt{204}} (1,2,3,4,5,6,7,8) and I apply a rotation of \pi to the second qubit, then my operation is the matrix I_2 \otimes R_{\pi} \otimes I_2, which maps e_{i0k} \mapsto e_{i0k} and e_{i1k} \mapsto -e_{i1k}. That would map the state v to \frac{1}{\sqrt{204}} (1,2,-3,-4,5,6,-7,-8).

If we instead used the rotation by \pi/2 we would get the output state \frac{1}{\sqrt{204}} (1,2,3i,4i,5,6,7i,8i).
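If you want to double-check those two computations, here’s a sketch using the same Kronecker product construction as before (the commented output is approximate, up to floating point):

import numpy

def phase_shift(theta):
   return numpy.array([[1, 0], [0, numpy.exp(1j * theta)]])

I2 = numpy.identity(2)
v = numpy.arange(1, 9) / 204 ** 0.5

for theta in [numpy.pi, numpy.pi / 2]:
   gate = numpy.kron(numpy.kron(I2, phase_shift(theta)), I2)   # I_2 ⊗ R_θ ⊗ I_2
   print((gate.dot(v) * 204 ** 0.5).round(6))
# approximately [ 1  2 -3 -4  5  6 -7 -8] and [ 1  2  3i  4i  5  6  7i  8i]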

Quantum AND/OR gate:

In the last post in this series we gave the quantum AND gate and left the quantum OR gate as an exercise. Rather than write out the matrix again, let me remind you of this gate using a description of its effect on the basis vectors e_{ijk}, where i,j,k \in \{ 0,1 \}. Recall that we need three qubits in order to make the operation reversible (which is a consequence of all quantum gates being unitary matrices). Some notation: \oplus is the XOR of two bits, \wedge is AND, and \vee is OR. The quantum AND gate maps

\displaystyle e_{ijk} \mapsto e_{ij(k \oplus (i \wedge j))}

In words, the third coordinate is XORed with the AND of the first two coordinates. We think of the third coordinate as a “scratchwork” qubit which is maybe prepared ahead of time to be in state zero.

Similarly, the quantum OR gate maps e_{ijk} \mapsto e_{ij(k \oplus (i \vee j))}. As we saw last time, these combined with the quantum NOT gate (and some modest number of scratchwork qubits) allow quantum circuits to simulate any classical circuit.
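To make this concrete, here’s a sketch (the helper name is mine) that builds the 8-by-8 matrices for both gates directly from these basis descriptions:

import numpy

def three_qubit_gate(update):
   # build an 8x8 permutation matrix from a rule (i, j, k) -> new value of k
   M = numpy.zeros((8, 8))
   for x in range(8):
      i, j, k = (x >> 2) & 1, (x >> 1) & 1, x & 1
      M[(i << 2) | (j << 1) | update(i, j, k), x] = 1
   return M

quantum_and = three_qubit_gate(lambda i, j, k: k ^ (i & j))
quantum_or = three_qubit_gate(lambda i, j, k: k ^ (i | j))

# e.g. the AND gate sends e_{110} (index 6) to e_{111} (index 7)
print(quantum_and[:, 6])   # [0. 0. 0. 0. 0. 0. 0. 1.]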

Controlled-* gate:

The last example in this post is a meta-gate that represents a conditional branching. If we’re given a gate A acting on k qubits, then we define the controlled-A to be an operation which acts on k+1 qubits. Let’s call the added qubit “qubit zero.” Then controlled-A does nothing if qubit zero is in state 0, and applies A if qubit zero is in state 1. Qubit zero is generally called the “control qubit.”

The matrix representing this operation decomposes into blocks if the control qubit is actually the first qubit (or you rearrange).

\displaystyle \begin{pmatrix} I_{2^k} & 0 \\ 0 & A \end{pmatrix}

A common example of this is the controlled-NOT gate, often abbreviated CNOT, and it has the matrix

\displaystyle \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}
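As a sanity check, here’s a sketch (names are mine) that builds the block matrix above for a given A, assuming the control qubit comes first, and reproduces the CNOT matrix when A = X:

import numpy

def controlled(A):
   dim = A.shape[0]   # A is dim-by-dim with dim = 2^k
   return numpy.block([
      [numpy.identity(dim), numpy.zeros((dim, dim))],
      [numpy.zeros((dim, dim)), A],
   ])

X = numpy.array([[0, 1], [1, 0]])
print(controlled(X))
# [[1. 0. 0. 0.]
#  [0. 1. 0. 0.]
#  [0. 0. 0. 1.]
#  [0. 0. 1. 0.]]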

Looking forward

Okay let’s take a step back and evaluate our life choices. So far we’ve spent a few hours of our time motivating quantum computing, explaining the details of qubits and quantum circuits, and seeing examples of concrete quantum gates and studying measurement. I’ve hopefully hammered into your head the notion that quantum states which aren’t pure tensors (i.e. entangled) are where the “weirdness” of quantum computing comes from. But we haven’t seen any examples of quantum algorithms yet!

Next time we’ll see our first example of an algorithm that is genuinely quantum. We won’t tackle factoring yet, but we will see quantum “weirdness” in action.

Until then!

Reflections and a public resolution

When I started blogging my goal was to record the cool things I learned about in math and computer science so that I didn’t forget them. Then it turned into an excuse for me to learn new cool things, a way to publicize my research, and an unexpected reputation.

It seems that Math ∩ Programming has become something of a resource. In a recent survey, I found out that my readers are as young as 15 and as old as 70. And they come from all over the world. It seems that already, even before graduating with my PhD, I’ve made a bigger dent in the world with my blog than my research papers likely ever will.

And I have no intention to stop. These last few months have been slow due to job searching, grant applications, and a horde of research projects. The graph isomorphism brouhaha didn’t help my productivity either, but it was worth it on a personal level. 🙂

Indeed, the more I blog the more ideas I get. Here are just a few titles of unfinished drafts in my queue:

  1. The group theoretic view of quantum gates
  2. Singular value decomposition
  3. Matrix completion and recommender systems
  4. The unreasonable effectiveness of the Multiplicative Weights Update Algorithm
  5. Cryptographic hardness assumptions — a primer
  6. VC-dimension, Occam’s razor, and the geometry of hypotheses
  7. Big dimensions, and what you can do about it
  8. The quantum Fourier transform
  9. Linear programming and the most affordable healthy diet, part 2
  10. What’s up with graph Laplacians?
  11. Persistent homology
  12. Byzantine generals
  13. Support vector machines

And I’ve been meaning to spend some time on convex optimization techniques, and maybe some deep learning while I’m at it. Feel free to vote for your favorite topics and I’ll try to prioritize accordingly.

I have a few blog-related projects in mind for this year. The one that I want to publicly commit to, my new year’s resolution, is to finish and publish my book. This should be a surprise because I haven’t announced it anywhere. The tentative title is, “A Programmer’s Introduction to the Culture of Mathematics.” It’s intended to be a book introducing mathematics “from scratch,” but with the assumption that the reader is a competent programmer. So I’ll treat the reader like an experienced coder instead of a calculus student (or worse, a researcher). I’ll rely on explanations and analogies via code. In each chapter I’ll implement a full program illustrating what you learned in the chapter, with the code on GitHub, too. I’ll explain every bit of notation, and discuss why the strange aspects of math are the way they are. This book isn’t intended to be a terse reference, but rather a book that you read from end to end. It’s also not meant to be a new “approach” to math. Instead this is more of a travel guide for the mathematical foreigner, with translations and tours and an explanation of mathematical tastes.

I’ve already written a hundred or so pages, and I’d estimate the book is around 1/3 to 1/2 done. I’m resolving to finish it this year. I plan to set up a mailing list within the next month or two for it so those interested can keep informed on my progress.

Support Math ∩ Programming by becoming a patron!

I’m making a few changes to the funding of Math ∩ Programming. First and foremost, I’m launching a Patreon campaign.


The way Patreon works is that you, dear reader, sign up for a monthly donation of any amount you please (as little as $1/month). There are at least three benefits for you doing this:

  1. You show your support of mathematics and programming. You provide documented evidence that you’re a good person. You help to ensure that Math ∩ Programming stays a high quality resource for everyone. You feel good.
  2. There are aggregate milestone goals that make Math ∩ Programming a better place. The first, for example, is that when Math ∩ Programming reaches $200/month I will permanently remove ads. See below for a detailed description of how ads currently support the blog (spoiler: it’s not much).
  3. There are individual benefits if you decide to pledge $5 or more per month. This includes a monthly Google hangout I’ll host, a sneak peek of every new post, and private discussion with me. I’m still thinking about how exactly I’ll implement the member preview, but my current plan is to have a private subreddit. But even if you don’t use reddit there’ll be another way to get access. The highest tier of rewards involves physical merchandise.

I’m excited about Patreon because it seems like an excellent platform. For example, the popular Numberphile YouTube channel makes almost $3k USD per month from patrons. To put that into perspective, it’s $36k per year, and my graduate student stipend is only about $17k per year. Assuming Math ∩ Programming could get even half the success of Numberphile, I could have potentially funded my entire graduate work just from blogging!

And channels like Numberphile are purely for entertainment’s sake. Math ∩ Programming has the additional benefit of providing working code for algorithms that are directly applicable to business. So if you have ever used the code at Math ∩ Programming as the start of a project or feature, or even if you just have fun reading about math and seeing cool applications, consider becoming a patron to say thank you!

Here are some other minor funding changes.

  • One-time donations are now preferred through Square, due to the lower (1.9%) transaction fee. Square requires a debit card, so if you don’t have one or don’t want to use one you can still use PayPal to donate.
  • People rarely buy merchandise. I made a total of $147 on merchandise since 2013. So I’m going to stop doing that for now. Maybe I just have to come up with better merchandise (comments are welcome).
  • When I link to textbooks on Amazon I’m going to use Amazon Affiliate. Amazon pays me a little bit of money if you use the link and then end up buying something.

Funding so far

As of August 1, 2015 I have made a total of exactly $2,847.55 USD from ads and donations. About $320 of that is from 2015. It works out to about $70 per month since I first asked for money in 2013 and set up ads. It’s a nice little chunk of change, but nothing to get too excited about. Here is a chart of my ad income:

[Chart: monthly ad revenue, April 2013 through June 2015]

My average (and 6-month moving average) ad revenue. July revenue isn’t posted until mid August.

Donations have provided the rest of the funding, but donations appear to follow a Poisson distribution and the median monthly revenue is zero. By far most of the donations were in the few months after I first asked for donations.

I started my blog with pretty low expectations: I learned a lot of cool things and I wanted to share them, while understanding them better by writing code and filling in proof details. That’s still the core dream, and it will always be the core of Math ∩ Programming. So while it’s pretty cool that I can make any money at all from my blog, and I’m interested to see if I can grow it into a viable side business, you can rest assured that Math ∩ Programming will stay true to its core.