Quantum mechanics is one of the leading scientific theories describing the rules that govern the universe. Its discovery and formulation were among the most important revolutions in the history of mankind, contributing in no small part to the invention of the transistor and the laser.

Here at Math ∩ Programming we don’t put too much emphasis on physics or engineering, so it might seem curious to study quantum physics. But as the reader is likely aware, quantum mechanics forms the basis of one of the most interesting models of computing since the Turing machine: the *quantum circuit*. My goal with this series is to elucidate the algorithmic insights in quantum algorithms, and explain the mathematical formalisms while minimizing the amount of “interpreting” and “debating” and “experimenting” that dominates so much of the discourse by physicists.

Indeed, the more I learn about quantum computing the more it’s become clear that the shroud of mystery surrounding quantum topics has a lot to do with their presentation. The people teaching quantum (writing the textbooks, giving the lectures, writing the Wikipedia pages) are almost all purely physicists, and they almost unanimously follow the same path of teaching it.

Scott Aaronson (one of the few people who explains quantum in a way I understand) describes the situation superbly.

> There are two ways to teach quantum mechanics. The first way – which for most physicists today is still the only way – follows the historical order in which the ideas were discovered. So, you start with classical mechanics and electrodynamics, solving lots of grueling differential equations at every step. Then, you learn about the “blackbody paradox” and various strange experimental results, and the great crisis that these things posed for physics. Next, you learn a complicated patchwork of ideas that physicists invented between 1900 and 1926 to try to make the crisis go away. Then, if you’re lucky, after years of study, you finally get around to the central conceptual point: that nature is described not by *probabilities* (which are always nonnegative), but by numbers called *amplitudes* that can be positive, negative, or even complex.
>
> The second way to teach quantum mechanics eschews a blow-by-blow account of its discovery, and instead *starts directly from the conceptual core* – namely, a certain generalization of the laws of probability to allow minus signs (and more generally, complex numbers). Once you understand that core, you can *then* sprinkle in physics to taste, and calculate the spectrum of whatever atom you want.

Indeed, the sequence of experiments and debate has historical value. But the mathematics needed to have a basic understanding of quantum mechanics is quite simple, and it is often blurred by physicists in favor of discussing interpretations. To start thinking about quantum mechanics you only need a healthy dose of linear algebra, most of which we’ve covered in the three linear algebra primers on this blog. More importantly for computing-minded folks, one only needs a basic understanding of quantum mechanics to understand quantum *computing*.

The position I want to assume on this blog is that we don’t care about whether quantum mechanics is an accurate description of the real world. The real world gave an invaluable inspiration, but at the end of the day the mathematics stands on its own merits. The really interesting question to me is how the quantum computing model compares to classical computing. Most people believe it is strictly stronger in terms of efficiency. And so the murky depths of the quantum swamp must be hiding some fascinating algorithmic ideas. I want to understand those ideas, and explain them up to my own standards of mathematical rigor and lucidity.

So let’s begin this process with a discussion of an experiment that motivates most of the ideas we’ll need for quantum computing. Hopefully this will be the last experiment we discuss.

## Shooting Photons and The Question of Randomness

Does the world around us have inherent randomness in it? This is a deep question open to a lot of philosophical debate, but what evidence do we have that there is randomness?

Here’s the experiment. You set up a contraption that shoots photons in a straight line, aimed at what’s called a “beam splitter.” A beam splitter seems to have the property that when photons are shot at it, they will either be reflected at a 90 degree angle or continue in a straight line, each with probability 1/2. Indeed, if you put little photon receptors at the end of each possible route (straight or up, as below) to measure the number of photons that end at each receptor, you’ll find that on average half of the photons went up and half went straight.

If you accept that the photon shooter is sufficiently good and the beam splitter is not tricking us somehow, then this is evidence that the universe has some inherent randomness in it! Moreover, the probability that a photon goes up or straight seems to be independent of what other photons do, so this is evidence that whatever randomness we’re seeing follows the classical laws of probability.

Now let’s augment the experiment as follows. Put *two* beam splitters on opposite corners of a square, and mirrors at the other two corners, as below.

This is where things get *really* weird. If you assume that the beam splitter splits photons randomly (as in, according to an independent coin flip), then after the first beam splitter half go up and half go straight, and the same thing would happen after the second beam splitter. So each of the two receptors should measure half the total number of photons on average.
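The coin-flip hypothesis can be sketched as a small simulation. This is only an illustration of the hypothesis, not of the real experiment: the function name, the photon count, and the fixed seed are all assumptions made here, and the mirrors are ignored because under this hypothesis the last splitter alone decides which receptor a photon reaches.

```python
import random


def simulate_independent_splitters(num_photons, num_splitters, seed=0):
    """Count photons per receptor if every splitter is an independent fair coin."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    counts = {"straight": 0, "up": 0}
    for _photon in range(num_photons):
        direction = "straight"
        for _splitter in range(num_splitters):
            # Under this hypothesis the outcome ignores the photon's history.
            direction = rng.choice(["straight", "up"])
        counts[direction] += 1
    return counts


one_splitter = simulate_independent_splitters(100_000, 1)
two_splitters = simulate_independent_splitters(100_000, 2)
print("one splitter: ", one_splitter)
print("two splitters:", two_splitters)
```

Under these assumptions both setups should give counts close to an even split, which is exactly the prediction the real experiment contradicts.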

But that’s *not* what happens. Rather, *all the photons go to the top receptor!* Somehow the “probability” that the photon goes straight or up at the first beam splitter is connected to the probability that it goes straight or up at the second. This seems to be a counterexample to the claim that the universe behaves on the principles of independent probability. Obviously there is some deeper mystery at work.

## Complex Probabilities

One interesting explanation is that the beam splitter modifies something intrinsic to the photon, something that the photon carries with it until the next beam splitter. You can imagine the photon is carrying information as it shambles along, but regardless of the interpretation it can’t follow the laws of classical probability.

The simplest classical probability explanation would go something like this:

There are two states, RIGHT and UP, and we model the state of a photon by a probability distribution $ (p, q)$ such that the photon has a probability $ p$ of being in state RIGHT and a probability $ q$ of being in state UP, and like any probability distribution $ p + q = 1$. A photon hence starts in state $ (1,0)$, and the process of traveling through the beam splitter is the random choice to switch states. This is modeled by multiplication by a particular so-called *stochastic matrix* (which just means the entries are nonnegative and the rows sum to 1)

$ \displaystyle A = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix}$

Of course, we chose this matrix because when we apply it to $ (1,0)$ and $ (0,1)$ we get $ (1/2, 1/2)$ for both outcomes. By doing the algebra, applying it *twice* to $ (1,0)$ will give the state $ (1/2, 1/2)$, and so the chance of ending up in the top receptor is the same as for the right receptor.
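For readers who want to check the algebra, here is a minimal sketch in plain Python using exact fractions. The helper `mat_vec` and the variable names are mine, introduced only for illustration.

```python
from fractions import Fraction


def mat_vec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


half = Fraction(1, 2)
A_classical = [[half, half],
               [half, half]]  # the stochastic matrix from the text

state = [Fraction(1), Fraction(0)]           # photon starts in RIGHT
after_one = mat_vec(A_classical, state)      # one beam splitter
after_two = mat_vec(A_classical, after_one)  # two beam splitters

print(after_one)
print(after_two)
```

Both results are the distribution $ (1/2, 1/2)$, so the classical model predicts an even split at the receptors no matter how many splitters the photon passes through.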

But as we already know this isn’t what happens in real life, so something is amiss. Here is an alternative explanation that gives a nice preview of quantum mechanics.

The idea is that, rather than have the state of the traveling photon be a probability distribution over RIGHT and UP, we have it be a unit vector in a vector space (over $ \mathbb{C}$). That is, now RIGHT and UP are the (basis) unit vectors $ e_1 = (1,0), e_2 = (0,1)$, respectively, and a state $ x$ is a linear combination $ c_1 e_1 + c_2 e_2$, where we require $ \left \| x \right \|^2 = |c_1|^2 + |c_2|^2 = 1$. And now the “probability” that the photon is in the RIGHT state is the squared absolute value of the coefficient for that basis vector: $ p_{\text{right}} = |c_1|^2$. Likewise, the probability of being in the UP state is $ p_{\text{up}} = |c_2|^2$.

This might seem like an innocuous modification (even a pointless one!), but changing the sum (or 1-norm) to the Euclidean sum-of-squares (or the 2-norm) is at the heart of why quantum mechanics is so different. Now rather than have stochastic matrices for state transitions, which are defined the way they are because they preserve probability distributions, we use *unitary matrices,* which are those complex-valued matrices that preserve the 2-norm. In both cases, we want “valid states” to be transformed into “valid states,” but we just change precisely what we mean by a state, and pick the transformations that preserve that.

In fact, as we’ll see later in this series, using complex numbers is totally unnecessary. Everything that can be done with complex numbers can be done without them (up to a good enough approximation for computing), but using complex numbers just happens to make things more elegant mathematically. It’s the kind of situation where there are more and better theorems in linear algebra about complex-valued matrices than real-valued matrices.

But back to our experiment. Now we can hypothesize that the beam splitter corresponds to the following transformation of states:

$ \displaystyle A = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}$
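As a quick sanity check by hand: a matrix preserves the 2-norm exactly when its conjugate transpose is its inverse. Writing $ A^*$ for the conjugate transpose (notation I am assuming here, since the post has not introduced it), the product works out as

$ \displaystyle A^* A = \frac{1}{2} \begin{pmatrix} 1 & -i \\ -i & 1 \end{pmatrix} \begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 1 - i^2 & i - i \\ -i + i & 1 - i^2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$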

We’ll talk a lot more about unitary matrices later, so for now the reader can rest assured that this is one. And then how does it transform the initial state $ x =(1,0)$?

$ \displaystyle y = Ax = \frac{1}{\sqrt{2}}(1, i)$

So at this stage the probability of being in the RIGHT state is $ 1/2 = (1/\sqrt{2})^2$ and the probability of being in state UP is also $ 1/2 = |i/\sqrt{2}|^2$. So far it matches the first experiment. Applying $ A$ again,

$ \displaystyle Ay = A^2x = \frac{1}{2}(0, 2i) = (0, i)$

And the photon is in state UP with probability 1. Stunning. This time Science is impressed by mathematics.
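The same computation can be sketched in plain Python using its built-in complex numbers. The helper names `mat_vec` and `probabilities` are mine, and floating-point arithmetic means the results are only accurate up to rounding.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def probabilities(state):
    """Squared absolute values of the amplitudes."""
    return [abs(c) ** 2 for c in state]


s = 1 / math.sqrt(2)
A = [[s, s * 1j],
     [s * 1j, s]]  # the hypothesized beam splitter matrix

x = [1, 0]         # photon starts in RIGHT
y = mat_vec(A, x)  # after one beam splitter
z = mat_vec(A, y)  # after two beam splitters

print("one splitter: ", probabilities(y))
print("two splitters:", probabilities(z))
```

Up to rounding, the first line shows probability 1/2 for each state and the second shows all of the probability on UP, matching both experiments.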

Next time we’ll continue this train of thought by generalizing the situation to the appropriate mathematical setting. Then we’ll dive into the quantum circuit model, and start churning out some algorithms.

Until then!

**[Edit: Actually, if you make the model complicated enough, then you can achieve the result using classical probability. The experiment I described above does give evidence that something more complicated is going on, but it does not fully rule out classical probability. You can lay out the axioms of quantum mechanics (as we will from the perspective of computing), and mathematically this forces non-classical probability. But to the best of my knowledge there is no experiment or set of experiments that gives decisive proof that all of the axioms are necessary. In my search for such an experiment I asked this question on stackexchange and didn’t understand any of the answers well enough to paraphrase them here. Moreover, if you leave out the axiom that quantum circuit operations are reversible, you can do everything with classical probability. I read this somewhere but now I can’t find the source 🙁**

**One consequence is that I am more firmly entrenched in my view that I only care about quantum mechanics in how it produced quantum computing as a new paradigm in computer science. This paradigm doesn’t need physics at all, and apparently the motivations for the models are still unclear, so we just won’t discuss them any more. Sorry, physics lovers.]**

Terrific! There’s always a lot of interesting stuff to dive into, and while figuring everything out, step by step and book by book, can be useful and illuminating, there is simply not enough spare time to learn about everything.

Such series that go straight to the point are great to get some basic insight into a subject. Also, your writing is very clear. I’ll be keeping an eye on this one. Thanks!

Cool stuff. Very clear. Is there a paper that describes the experiment with the photons? I wonder how the paper addresses the possibility that the beam splitter changes something about the information the photon is carrying, if it addresses it at all.

The papers are referenced in the linked Wikipedia page (http://en.wikipedia.org/wiki/Mach%E2%80%93Zehnder_interferometer), though they appear to all be written in German.

Wouldn’t the simplest explanation of the data be that half of the photons are such as to always bounce off beam splitters and half of them are such as to always pass through beam splitters? Then the beam splitter doesn’t even have to modify the state of the photon. These also seem like more natural intrinsic properties to give the photon than RIGHT and UP because they don’t raise questions like “What happens if you rotate the experiment?”

I guess you are assuming that all the photons are intrinsically identical to begin with. That would rule out my suggestion, although it seems an unwarranted assumption. It also seems you are assuming that the photon can only be in one of two states (at least before you introduce the complex number stuff). If the photons are all in the same state initially then two states are not enough to generate the results of the experiment, but you can do it with three. Call the states A, B, and C, and let A be the initial state. When a beam splitter gets an A photon, half the time it reflects it and makes it a B photon, and half the time it passes it and makes it a C photon. B photons are always reflected and C photons are always passed.
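The three-state rule just described can be sketched as a small simulation. The geometry is my assumption, not part of the comment: a reflection, whether at a splitter or at a corner mirror, swaps the directions RIGHT and UP, and each path hits exactly one mirror between the two splitters. All function names and the seed are likewise made up for illustration.

```python
import random


def splitter(state, rng):
    """Return (new_state, reflected) under the three-state rule."""
    if state == "A":
        # A photons are reflected (becoming B) or passed (becoming C), 50/50.
        return ("B", True) if rng.random() < 0.5 else ("C", False)
    if state == "B":
        return "B", True   # B photons are always reflected
    return "C", False      # C photons are always passed


def turn(direction):
    """Assumed geometry: any reflection swaps RIGHT and UP."""
    return "UP" if direction == "RIGHT" else "RIGHT"


def run(num_splitters, rng):
    """Send one photon through one or two splitters; return its final direction."""
    state, direction = "A", "RIGHT"
    state, reflected = splitter(state, rng)
    if reflected:
        direction = turn(direction)
    if num_splitters == 2:
        direction = turn(direction)  # corner mirror on whichever path was taken
        state, reflected = splitter(state, rng)
        if reflected:
            direction = turn(direction)
    return direction


rng = random.Random(0)
single = [run(1, rng) for _ in range(10_000)]
double = [run(2, rng) for _ in range(10_000)]
print("one splitter, fraction UP: ", single.count("UP") / len(single))
print("two splitters, fraction UP:", double.count("UP") / len(double))
```

Under these assumptions the one-splitter run splits roughly evenly while every photon in the two-splitter run ends up going UP, so this classical model does reproduce both observations.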

Here are two things I found confusing about this presentation. You suggest thinking of the state of a photon as a probability distribution when it seems clear that (before you introduce the complex number stuff) it is meant to be a binary variable. And when you are describing how the beam splitters process photons you don’t separately consider how they change their states and whether they reflect or pass them. (I think the terms RIGHT and UP were meant to be somehow suggestive of how the reflecting/passing works, but I don’t really understand what these terms are supposed to indicate. Probably my earlier comment about rotating the experiment is off base, based on a wrong understanding of these terms.)

Good comments. I’ll try to address them one by one.

> all the photons are intrinsically identical to begin with…seems an unwarranted assumption.

Really? I have never heard of any theory that distinguishes, say, one standard Helium atom from another. Why would one photon be different from another?

> I think the terms RIGHT and UP were meant to be somehow suggestive of how the reflecting/passing works, but I don’t really understand what these terms are supposed to indicate.

I believe in reality the mirrors are polarized, and a photon passing through the mirror will correspond to a change in spin of the photon (the binary states being “polarized” or “not polarized”). I used the terms RIGHT and UP because then I don’t have to talk about spin and polarization, and the fact is it doesn’t matter what you *call* the states. What matters is the behavior. I like to think about quantum mechanics as an algorithmic mechanism for manipulating abstract states, not a physical process for manipulating objects.

> You suggest thinking of the state of a photon as a probability distribution when it seems clear that…it is meant to be a binary variable… [and also] you can do it with three [states instead of two].

You’re right, you can get the behavior by adding more states. I don’t have a good example that I can use to replace it (I will look for one), but I do know that once you add in the axiom that quantum transformations are linear, continuous, and reversible operators, you suddenly lose the ability to model it with classical probability. But it sounds like you already know this?

“In my search for such an experiment I asked this question on stackexchange and didn’t understand any of the answers well enough to paraphrase them here. Moreover, if you leave out the axiom that quantum circuit operations are reversible, you can do everything with classical probability. I read this somewhere but now I can’t find the source :(”

Too bad the answers on stackexchange pretty much went off on a tangent. Irreversible dynamics in quantum mechanics is possible even though reversible dynamics via the Schrödinger equation is fundamental. This is because the restriction of a reversible dynamics in a larger system may not give a reversible dynamics on your subsystem of interest.

Reversibility is not important for quantum computing. In measurement-based quantum computing (https://en.wikipedia.org/wiki/One-way_quantum_computer), one does computation equivalent to computation in the circuit model by preparing a multipartite “resource” state and then measuring each qubit, one by one, in a specific pattern and in an adaptive way with subsequent measurements depending on previous results. And there’s more work out there on dissipative quantum computing that I’m not familiar with.

The original positivist interpretation of quantum mechanics forces QC people to admit that certain non-empirically verified assumptions hold, such as unitary and psi-ontic wave functions. The “negative probabilities” are in a quasiprobability distribution where the anomaly is created at less than hbar and there are no final negative probabilities. By the way, the wave function of an electron is a complex-coefficient spinor function, which isn’t just a simple amplitude. The wave function can be positive, negative, complex, spinor or vector. Note that mathematicians have found quantum probability to be useless for modeling anything but atomic particles.

Just about every field of science attempts to quantify counterfactuals and probability in QM is no more needed than in any other branch of science. Everything in QM follows from the uncertainty principle and that only gives support to the idea that we use probabilities to measure OUR uncertainty! The system could even be chaotic. To get probabilities, you are assuming classical particles but there are no classical particles. Quantum mechanics really predicts the expected values of observables and it does not measure probabilities.

Just because simulating quantum states requires a high Turing complexity does not mean the argument runs backwards, because the exponential computation on the wave function may well have nothing to do with the physical system of something like a dumb electron. Many in QC subscribe to the Schrödinger’s cat fallacy. The fallacies here are as persistent as those about EPR. QCs may not be possible at all given that they go beyond the currently accepted theory of QM. People need to be honest about that. People making money building them are not honest, however.

I understood the math but I totally didn’t get the physics part. Why would beam splitters behave *that* way? Because that is not what really happens if you actually run the experiment: https://en.wikipedia.org/wiki/Mach%E2%80%93Zehnder_interferometer

Jeremy hi.

I haven’t read all of your articles yet, but since I was a Physics undergrad many years ago, the phrase “complex probabilities” was a bit too much for me.

I can even now remember my dear prof. saying that “guys, even if we’re dealing with complex probability amplitudes, if you end up getting complex probabilities when calculating, say, the mean value of the total Energy, you’ve obviously done something wrong.” I’m thinking you might have meant something different, but I just wanted to give some sincere feedback.

Oh yes, also I haven’t lived many years in an English-speaking country, so perhaps I’m not well aware of the technical jargon and the colloquialisms used in the States to describe the wavefunction or the Dirac states ().

Cheers and keep it up. You’re a great inspiration, for a Physicist turned into an aspiring Applied Mathematician.

SMD

ps: now that I think of it, the relationship of Quantum Mechanics as a method of using complex linear algebra to find results for the real world, feels analogous to the use of Complex Analysis to answer questions of the Real Analysis.

ps2: I’d really like to read one of your posts in the future that explains the idea behind Hugh Everett III’s PhD thesis [http://goo.gl/hRHxnf (PDF- 4.2MB)]. I don’t think it’s easily linked with Quantum Computing, so it might not interest you. But from skimming it, I got the idea that it is closer to a Mathematician than to a Physicist. Maybe that’s why it was so unpopular among Physicists back then, and of course because N. Bohr was still alive. Oh yes, and it seems to me that he was deeply influenced by the work of C. Shannon on Information Theory.

As a physics graduate, what I want to say is that the historical approach to QM is only taught in some introductory undergraduate classes. Many talented students just skip it and learn QM from the mathematical assumptions with the experimental picture in mind.

Reference: *Modern Quantum Mechanics* by Sakurai

This is related to your Edit at the end.

Bell’s theorem showed that if QM works the way they think it does, then the correlations you would see in certain experiments couldn’t be explained by a “local” hidden-variable theory. “Local” sort of means that information has to travel at the speed of light or less. (That includes not going backwards in time.)

The example was a variation on the original Einstein–Podolsky–Rosen (EPR) paradox: a correlated-particle pair is produced in a middle place, and the two particles go to two distant detectors to (say) the East and West. I think the detectors need to be oriented by true random noise at each end just before detection. The correlation between the two detections follows a curve that’s a trig function of the difference between the two detectors’ orientations at the moment(s) of detection(s).

The way I like to summarize it is: the shape of that curve *can’t be explained by any encoding of any amount of information* traveling along with the two particles. The key is that neither particle knows the relative orientation of the two detectors, each one acts locally as if it doesn’t know it, but the correlation between what they do *does* depend on it.

Later experiments, especially Aspect’s, showed those curves (as predicted by QM) are the curves that show up.

The place I finally found a decent explanation of amplitudes, EPR, Bell, and Aspect was in _Quantum Reality_ by Nick Herbert.

Anyway, this-all means ours can’t be local physics plus classical probability; I’m pretty sure that’s what you meant by ruling out classical probability. I would also guess that it would prevent you modeling quantum circuits without (the equivalent of) amplitudes but like you said, that would be getting into physics, and I probably know less physics than you do.

Never underestimate how little physics I know 🙂

Thanks for the note!