The Quantum Bit

The best place to start our journey through quantum computing is to recall how classical computing works and try to extend it. Since our final quantum computing model will be a circuit model, we should informally discuss circuits first.

A circuit has three parts: the “inputs,” which are bits (either zero or one); the “gates,” which represent the lowest-level computations we perform on bits; and the “wires,” which connect the outputs of gates to the inputs of other gates. Typically the gates have one or two input bits and one output bit, and they correspond to some logical operation like AND, NOT, or XOR.

A simple example of a circuit.

A simple example of a circuit. The V’s are “OR” and the Λ’s are “AND.” Image source: Ryan O’Donnell

If we want to come up with a different model of computing, we could start regular circuits and generalize some or all of these pieces. Indeed, in our motivational post we saw a glimpse of a probabilistic model of computation, where instead of the inputs being bits they were probabilities in a probability distribution, and instead of the gates being simple boolean functions they were linear maps that preserved probability distributions (we called such a matrix “stochastic”).

Rather than go through that whole train of thought again let’s just jump into the definitions for the quantum setting. In case you missed last time, our goal is to avoid as much physics as possible and frame everything purely in terms of linear algebra.

Qubits are Unit Vectors

The generalization of a bit is simple: it’s a unit vector in \mathbb{C}^2. That is, our most atomic unit of data is a vector (a,b) with the constraints that a,b are complex numbers and a^2 + b^2 = 1. We call such a vector a qubit.

A qubit can assume “binary” values much like a regular bit, because you could pick two distinguished unit vectors, like (1,0) and (0,1), and call one “zero” and the other “one.” Obviously there are many more possible unit vectors, such as \frac{1}{\sqrt{2}}(1, 1) and (-i,0). But before we go romping about with what qubits can do, we need to understand how we can extract information from a qubit. The definitions we make here will motivate a lot of the rest of what we do, and is in my opinion one of the major hurdles to becoming comfortable with quantum computing.

A bittersweet fact of life is that bits are comforting. They can be zero or one, you can create them and change them and read them whenever you want without an existential crisis. The same is not true of qubits. This is a large part of what makes quantum computing so weird: you can’t just read the information in a qubit! Before we say why, notice that the coefficients in a qubit are complex numbers, so being able to read them exactly would potentially encode an infinite amount of information (in the infinite binary expansion)! Not only would this be an undesirably powerful property of a circuit, but physicists’ experiments tell us it’s not possible either.

So as we’ll see when we get to some algorithms, the main difficulty in getting useful quantum algorithms is not necessarily figuring out how to compute what you want to compute, it’s figuring out how to tease useful information out of the qubits that otherwise directly contain what you want. And the reason it’s so hard is that when you read a qubit, most of the information in the qubit is destroyed. And what you get to see is only a small piece of the information available. Here is the simplest example of that phenomenon, which is called the measurement in the computational basis.

Definition: Let v = (a,b) \in \mathbb{C}^2 be a qubit. Call the standard basis vectors e_0 = (1,0), e_1 = (0,1) the computational basis of \mathbb{C}^2. The process of measuring v in the computational basis consists of two parts.

  1. You observe (get as output) a random choice of e_0 or e_1. The probability of getting e_0 is |a|^2, and the probability of getting e_1 is |b|^2.
  2. As a side effect, the qubit v instantaneously becomes whatever state was observed in 1. This is often called a collapse of the waveform by physicists.

There are more sophisticated ways to measure, and more sophisticated ways to express the process of measurement, but we’ll cover those when we need them. For now this is it.

Why is this so painful? Because if you wanted to try to estimate the probabilities |a|^2 or |b|^2, not only would you get an estimate at best, but you’d have to repeat whatever computation prepared v for measurement over and over again until you get an estimate you’re satisfied with. In fact, we’ll see situations like this, where we actually have a perfect representation of the data we need to solve our problem, but we just can’t get at it because the measurement process destroys it once we measure.

Before we can talk about those algorithms we need to see how we’re allowed to manipulate qubits. As we said before, we use unitary matrices to preserve unit vectors, so let’s recall those and make everything more precise.

Qubit Mappings are Unitary Matrices

Suppose v = (a,b) \in \mathbb{C}^2 is a qubit. If we are to have any mapping between vector spaces, it had better be a linear map, and the linear maps that send unit vectors to unit vectors are called unitary matrices. An equivalent definition that seems a bit stronger is:

Definition: A linear map \mathbb{C}^2 \to \mathbb{C}^2 is called unitary if it preserves the inner product on \mathbb{C}^2.

Let’s remember the inner product on \mathbb{C}^n is defined by \left \langle v,w \right \rangle = \sum_{i=1}^n v_i \overline{w_i} and has some useful properties.

  • The square norm of a vector is \left \| v \right \|^2 = \left \langle v,v \right \rangle.
  • Swapping the coordinates of the complex inner product conjugates the result: \left \langle v,w \right \rangle = \overline{\left \langle w,v \right \rangle}
  • The complex inner product is a linear map if you fix the second coordinate, and a conjugate-linear map if you fix the first. That is, \left \langle au+v, w \right \rangle = a \left \langle u, w \right \rangle + \left \langle v, w \right \rangle and \left \langle u, aw + v \right \rangle = \overline{a} \left \langle u, w \right \rangle + \left \langle u,v \right \rangle

By the first bullet, it makes sense to require unitary matrices to preserve the inner product instead of just the norm, though the two are equivalent (see the derivation on page 2 of these notes). We can obviously generalize unitary matrices to any complex vector space, and unitary matrices have some nice properties. In particular, if U is a unitary matrix then the important property is that the columns (and rows) of U form an orthonormal basis. As an immediate result, if we take the product U\overline{U}^\text{T}, which is just the matrix of all possible inner products of columns of U, we get the identity matrix. This means that unitary matrices are invertible and their inverse is \overline{U}^\text{T}.

Already we have one interesting philosophical tidbit. Any unitary transformation of a qubit is reversible because all unitary matrices are invertible. Apparently the only non-reversible thing we’ve seen so far is measurement.

Recall that \overline{U}^\text{T} is the conjugate transpose of the matrix, which I’ll often write as U^*. Note that there is a way to define U^* without appealing to matrices: it is a notion called the adjoint, which is that linear map U^* such that \left \langle Uv, w \right \rangle = \left \langle v, U^*w \right \rangle for all v,w. Also recall that “unitary matrix” for complex vector spaces means precisely the same thing as “orthogonal matrix” does for real numbers. The only difference is the inner product being used (indeed, if the complex matrix happens to have real entries, then orthogonal matrix and unitary matrix mean the same thing).

Definition: single qubit gate is a unitary matrix \mathbb{C}^2 \to \mathbb{C}^2.

So enough with the properties and definitions, let’s see some examples. For all of these examples we’ll fix the basis to the computational basis e_0, e_1. One very important, but still very simple example of a single qubit gate is the Hadamard gate. This is the unitary map given by the matrix

\displaystyle \frac{1}{\sqrt{2}}\begin{pmatrix}  1 & 1 \\  1 & -1  \end{pmatrix}

It’s so important because if you apply it to a basis vector, say, e_0 = (1,0), you get a uniform linear combination \frac{1}{\sqrt{2}}(e_1 + e_2). One simple use of this is to allow for unbiased coin flips, and as readers of this blog know unbiased coins can efficiently simulate biased coins. But it has many other uses we’ll touch on as they come.

Just to give another example, the quantum NOT gate, often called a Pauli X gate, is the following matrix

\displaystyle \begin{pmatrix}  0 & 1 \\  1 & 0  \end{pmatrix}

It’s called this because, if we consider e_0 to be the “zero” bit and e_1 to be “one,” then this mapping swaps the two. In general, it takes (a,b) to (b,a).

As the reader can probably imagine by the suggestive comparison with classical operations, quantum circuits can do everything that classical circuits can do. We’ll save the proof for a future post, but if we want to do some kind of “quantum AND” operation, we get an obvious question. How do you perform an operation that involves multiple qubits? The short answer is: you represent a collection of bits by their tensor product, and apply a unitary matrix to that tensor.

We’ll go into more detail on this next time, and in the mean time we suggest checking out this blog’s primer on the tensor product. Until then!

About these ads

A Motivation for Quantum Computing

Quantum mechanics is one of the leading scientific theories describing the rules that govern the universe. It’s discovery and formulation was one of the most important revolutions in the history of mankind, contributing in no small part to the invention of the transistor and the laser.

Here at Math ∩ Programming we don’t put too much emphasis on physics or engineering, so it might seem curious to study quantum physics. But as the reader is likely aware, quantum mechanics forms the basis of one of the most interesting models of computing since the Turing machine: the quantum circuit. My goal with this series is to elucidate the algorithmic insights in quantum algorithms, and explain the mathematical formalisms while minimizing the amount of “interpreting” and “debating” and “experimenting” that dominates so much of the discourse by physicists.

Indeed, the more I learn about quantum computing the more it’s become clear that the shroud of mystery surrounding quantum topics has a lot to do with their presentation. The people teaching quantum (writing the textbooks, giving the lectures, writing the Wikipedia pages) are almost all purely physicists, and they almost unanimously follow the same path of teaching it.

Scott Aaronson (one of the few people who explains quantum in a way I understand) describes the situation superbly.

There are two ways to teach quantum mechanics. The first way – which for most physicists today is still the only way – follows the historical order in which the ideas were discovered. So, you start with classical mechanics and electrodynamics, solving lots of grueling differential equations at every step. Then, you learn about the “blackbody paradox” and various strange experimental results, and the great crisis that these things posed for physics. Next, you learn a complicated patchwork of ideas that physicists invented between 1900 and 1926 to try to make the crisis go away. Then, if you’re lucky, after years of study, you finally get around to the central conceptual point: that nature is described not by probabilities (which are always nonnegative), but by numbers called amplitudes that can be positive, negative, or even complex.

The second way to teach quantum mechanics eschews a blow-by-blow account of its discovery, and instead starts directly from the conceptual core – namely, a certain generalization of the laws of probability to allow minus signs (and more generally, complex numbers). Once you understand that core, you can then sprinkle in physics to taste, and calculate the spectrum of whatever atom you want.

Indeed, the sequence of experiments and debate has historical value. But the mathematics needed to have a basic understanding of quantum mechanics is quite simple, and it is often blurred by physicists in favor of discussing interpretations. To start thinking about quantum mechanics you only need to a healthy dose of linear algebra, and most of it we’ve covered in the three linear algebra primers on this blog. More importantly for computing-minded folks, one only needs a basic understanding of quantum mechanics to understand quantum computing.

The position I want to assume on this blog is that we don’t care about whether quantum mechanics is an accurate description of the real world. The real world gave an invaluable inspiration, but at the end of the day the mathematics stands on its own merits. The really interesting question to me is how the quantum computing model compares to classical computing. Most people believe it is strictly stronger in terms of efficiency. And so the murky depths of the quantum swamp must be hiding some fascinating algorithmic ideas. I want to understand those ideas, and explain them up to my own standards of mathematical rigor and lucidity.

So let’s begin this process with a discussion of an experiment that motivates most of the ideas we’ll need for quantum computing. Hopefully this will be the last experiment we discuss.

Shooting Photons and The Question of Randomness

Does the world around us have inherent randomness in it? This is a deep question open to a lot of philosophical debate, but what evidence do we have that there is randomness?

Here’s the experiment. You set up a contraption that shoots photons in a straight line, aimed at what’s called a “beam splitter.” A beam splitter seems to have the property that when photons are shot at it, they will be either be reflected at a 90 degree angle or stay in a straight line with probability 1/2. Indeed, if you put little photon receptors at the end of each possible route (straight or up, as below) to measure the number of photons that end at each receptor, you’ll find that on average half of the photons went up and half went straight.

photon-experiment

The triangle is the photon shooter, and the camera-looking things are receptors.

 

If you accept that the photon shooter is sufficiently good and the beam splitter is not tricking us somehow, then this is evidence that universe has some inherent randomness in it! Moreover, the probability that a photon goes up or straight seems to be independent of what other photons do, so this is evidence that whatever randomness we’re seeing follows the classical laws of probability. Now let’s augment the experiment as follows. First, put two beam splitters on the corners of a square, and mirrors at the other two corners, as below.

The thicker black lines are mirrors which always reflect the photons.

The thicker black lines are mirrors which always reflect the photons.

This is where things get really weird. If you assume that the beam splitter splits photons randomly (as in, according to an independent coin flip), then after the first beam splitter half go up and half go straight, and the same thing would happen after the second beam splitter. So the two receptors should measure half the total number of photons on average.

But that’s not what happens. Rather, all the photons go to the top receptor! Somehow the “probability” that the photon goes left or up in the first beam splitter is connected to the probability that it goes left or up in the second. This seems to be a counterexample to the claim that the universe behaves on the principles of independent probability. Obviously there is some deeper mystery at work.

awardplz

Complex Probabilities

One interesting explanation is that the beam splitter modifies something intrinsic to the photon, something that carries with it until the next beam splitter. You can imagine the photon is carrying information as it shambles along, but regardless of the interpretation it can’t follow the laws of classical probability. The classical probability explanation would go something like this:

There are two states, RIGHT and UP, and we model the state of a photon by a probability distribution (p, q) such that the photon has a probability p of being in state RIGHT a probability q of being in state UP, and like any probability distribution p + q = 1. A photon hence starts in state (1,0), and the process of traveling through the beam splitter is the random choice to switch states. This is modeled by multiplication by a particular so-called stochastic matrix (which just means the rows sum to 1)

\displaystyle A = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix}

Of course, we chose this matrix because when we apply it to (1,0) and (0,1) we get (1/2, 1/2) for both outcomes. By doing the algebra, applying it twice to (1,0) will give the state (1/2, 1/2), and so the chance of ending up in the top receptor is the same as for the right receptor.

But as we already know this isn’t what happens in real life, so something is amiss. Here is an alternative explanation that gives a nice preview of quantum mechanics.

The idea is that, rather than have the state of the traveling photon be a probability distribution over RIGHT and UP, we have it be a unit vector in a vector space (over \mathbb{C}). That is, now RIGHT and UP are the (basis) unit vectors e_1 = (1,0), e_2 = (0,1), respectively, and a state x is a linear combination c_1 e_1 + c_2 e_2, where we require \left \| x \right \|^2 = |c_1|^2 + |c_2|^2 = 1. And now the “probability” that the photon is in the RIGHT state is the square of the coefficient for that basis vector p_{\text{right}} = |c_1|^2. Likewise, the probability of being in the UP state is p_{\text{up}} = |c_2|^2.

This might seem like an innocuous modification — even a pointless one! — but changing the sum (or 1-norm) to the Euclidean sum-of-squares (or the 2-norm) is at the heart of why quantum mechanics is so different. Now rather than have stochastic matrices for state transitions, which are defined they way they are because they preserve probability distributions, we use unitary matrices, which are those complex-valued matrices that preserve the 2-norm. In both cases, we want “valid states” to be transformed into “valid states,” but we just change precisely what we mean by a state, and pick the transformations that preserve that.

In fact, as we’ll see later in this series using complex numbers is totally unnecessary. Everything that can be done with complex numbers can be done without them (up to a good enough approximation for computing), but using complex numbers just happens to make things more elegant mathematically. It’s the kind of situation where there are more and better theorems in linear algebra about complex-valued matrices than real valued matrices.

But back to our experiment. Now we can hypothesize that the beam splitter corresponds to the following transformation of states:

\displaystyle A = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}

We’ll talk a lot more about unitary matrices later, so for now the reader can rest assured that this is one. And then how does it transform the initial state x =(1,0)?

\displaystyle y = Ax = \frac{1}{\sqrt{2}}(1, i)

So at this stage the probability of being in the RIGHT state is 1/2 = (1/\sqrt{2})^2 and the probability of being in state UP is also 1/2 = |i/\sqrt{2}|^2. So far it matches the first experiment. Applying A again,

\displaystyle Ay = A^2x = \frac{1}{2}(0, 2i) = (0, i)

And the photon is in state UP with probability 1. Stunning. This time Science is impressed by mathematics.

Next time we’ll continue this train of thought by generalizing the situation to the appropriate mathematical setting. Then we’ll dive into the quantum circuit model, and start churning out some algorithms.

Until then!

Making Hybrid Images

monalisa

The Mona Lisa

Leonardo da Vinci’s Mona Lisa is one of the most famous paintings of all time. And there has always been a discussion around her enigmatic smile. He used a trademark Renaissance technique called sfumato, which involves many thin layers of glaze mixed with subtle pigments. The striking result is that when you look directly at Mona Lisa’s smile, it seems to disappear. But when you look at the background your peripherals see a smiling face.

One could spend decades studying the works of these masters from various perspectives, but if we want to hone in on the disappearing nature of that smile, mathematics can provide valuable insights. Indeed, though he may not have known the relationship between his work and da Vinci’s, hundreds of years later Salvador Dali did the artist’s equivalent of mathematically isolating the problem with his painting, “Gala Contemplating the Mediterranean Sea.”

gala-dali

Gala Contemplating the Mediterranean Sea (Salvador Dali, 1976)

Here you see a woman in the foreground, but step back quite far from the picture and there is a (more or less) clear image of Abraham Lincoln. Here the question of gaze is the blaring focus of the work. Now of course Dali and da Vinci weren’t scribbling down equations and computing integrals; their artistic expression was much less well-defined. But we the artistically challenged have tools of our own: mathematics, science, and programming.

In 2006 Aude Oliva, Antonio Torralba, and Philippe. G. Schyns used those tools to merge the distance of Dali and the faded smiles of da Vinci into one cohesive idea. In their 2006 paper they presented the notion of a “hybrid image,” presented below.

monalisas

The Mona Lisas of Science

If you look closely, you’ll see three women, each of which looks the teensiest bit strange, like they might be trying to suppress a smile, but none of them are smiling. Blur your eyes or step back a few meters, and they clearly look happy. The effect is quite dramatic. At the risk of being overly dramatic, these three women are literally modern day versions of Mona Lisa, the “Mona Lisas of Science,” if you will.

Another, perhaps more famous version of their technique, since it was more widely publicized, is their “Marilyn Einstein,” which up close is Albert Einstein and from far away is Marilyn Monroe.

marilyn-einstein

Marilyn Einstein

This one gets to the heart of the question of what the eye sees at close range versus long range. And it turns out that you can address this question (and create brilliant works of art like the ones above) with some basic Fourier analysis.

Intuitive Fourier analysis (and references)

The basic idea of Fourier analysis is the idea that smooth functions are hard to understand, and realization of how great it would be if we could decompose them into simpler pieces. Decomposing complex things into simpler parts is one of the main tools in all of mathematics, and Fourier analysis is one of the clearest examples of its application.

In particular, the things we care about are functions f(x) with specific properties I won’t detail here like “smoothness” and “finiteness.” And the building blocks are the complex exponential functions

\displaystyle e^{2 \pi i kx}

where k can be any integer. If you have done some linear algebra (and ignore this if you haven’t), then I can summarize the idea succinctly by saying the complex exponentials form an orthonormal basis for the vector space of square-integrable functions.

Back in colloquial language, what the Fourier theorem says is that any function of the kind we care about can be broken down into (perhaps infinitely many) pieces of this form called Fourier coefficients (I’m abusing the word “coefficient” here). The way it’s breaking down is also pleasingly simple: it’s a linear combination. Informally that means you’re just adding up all the complex exponentials with specific weights for each one. Mathematically, the conversion from the function to its Fourier coefficients is called the Fourier transform, and the set of all Fourier coefficients together is called the Fourier spectrum. So if you want to learn about your function f, or more importantly modify it in some way, you can inspect and modify its spectrum instead. The reason this is useful is that Fourier coefficients have very natural interpretations in sound and images, as we’ll see for the latter.

We wrote f(x) and the complex exponential as a function of one real variable, but you can do the same thing for two variables (or a hundred!). And, if you’re willing to do some abusing and ignore the complexness of complex numbers, then you can visualize “complex exponentials in two variables” as images of stripes whose orientation and thickness correspond to two parameters (i.e., the k in the offset equation becomes two coefficients). The video below shows how such complex exponentials can be used to build up an image of striking detail. The left frame shows which complex exponential is currently being added, and the right frame shows the layers all put together. I think the result is quite beautiful.

This just goes to show how powerful da Vinci’s idea of fine layering is: it’s as powerful as possible because it can create any image! 

Now for digital images like the one above, everything is finite. So rather than have an infinitely precise function and a corresponding infinite set of Fourier coefficients, you get a finite list of sampled values (pixels) and a corresponding grid of Fourier coefficients. But the important and beautiful theorem is, and I want to emphasize how groundbreakingly important this is:

If you give me an image (or any function!) I can compute the decomposition very efficiently.

And the same theorem lets you go the other way: if you give me the decomposition, I can compute the original function’s samples quite easily. The algorithm to do this is called the Fast Fourier transform, and if any piece of mathematics or computer science has a legitimate claim to changing the world, it’s the Fast Fourier transform. It’s hard to pinpoint specific applications, because the transform is so ubiquitous across science and engineering, but we definitely would not have cell phones, satellites, internet, or electronics anywhere near as small as we do without the Fourier transform and the ability to compute it quickly.

Constructing hybrid images is one particularly nice example of manipulating the Fourier spectrum of two images, and then combining them back into a single image. That’s what we’ll do now.

As a side note, by the nature of brevity, the discussion above is a big disservice to the mathematics involved. I summarized and abused in ways that mathematicians would object to. If you want to see a much better treatment of the material, this blog has a long series of posts developing Fourier transforms and their discrete analogues from scratch. See our four primers, which lead into the main content posts where we implement the Fast Fourier transform in Python and use it to apply digital watermarks to an image. Note that in those posts, as in this one, all of the materials and code used are posted on this blog’s Github page.

High and low frequencies

For images, interpreting ranges of Fourier coefficients is easy to do. You can imagine the coefficients lying on a grid in the plane like so:

sherlock-spectrum

Each dot in this grid corresponds to how “intense” the Fourier coefficient is. That is, it’s the magnitude of the (complex) coefficient of the corresponding complex exponential. Now the points that are closer to the origin correspond informally to the broad, smooth changes in the image. These are called “low frequency” coefficients. And points that are further away correspond to sharp changes and edges, and are likewise called “high frequency” components. So the if you wanted to “hybridize” two images, you’d pick ones with complementary intensities in these regions. That’s why Einstein (with all his wiry hair and wrinkles) and Monroe (with smooth features) are such good candidates. That’s also why, when we layered the Fourier components one by one in the video from earlier, we see the fuzzy shapes emerge before the fine details.

Moreover, we can “extract” the high frequency Fourier components by simply removing the low frequency ones. It’s a bit more complicated than that, since you want the transition from “something” to “nothing” to be smooth in sone sense. A proper discussion of this would go into sampling and the Nyquist frequency, but that’s beyond the scope of this post. Rather, we’ll just define a family of “filtering functions” without motivation and observe that they work well.

Definition: The Gaussian filter function with variance \sigma and center (a, b) is the function

\displaystyle g(x,y) = e^{-\frac{(x - a)^2 + (y - b)^2}{2 \sigma^2}}

It looks like this

image credit Wikipedia

image credit Wikipedia

In particular, at zero the function is 1 and it gradually drops to zero as you get farther away. The parameter \sigma controls the rate at which it vanishes, and in the picture above the center is set to (0,0).

Now what we’ll do is take our image, compute its spectrum, and multiply coordinatewise with a certain Gaussian function. If we’re trying to get rid of high-frequency components (called a “low-pass filter” because it lets the low frequencies through), we can just multiply the Fourier coefficients directly by the filter values g(x,y), and if we’re doing a “high-pass filter” we multiply by 1 - g(x,y).

Before we get to the code, here’s an example of a low-pass filter. First, take this image of Marilyn Monroe

marilyn

Now compute its Fourier transform

dft

Apply the low-pass filter

filtered-dft

And reverse the Fourier transform to get an image

low-passed-marilyn

In fact, this is a common operation in programs like photoshop for blurring an image (it’s called a Gaussian blur for obvious reasons). Here’s the python code to do this. You can download it along with all of the other resources used in making this post on this blog’s Github page.

import numpy
from numpy.fft import fft2, ifft2, fftshift, ifftshift
from scipy import misc
from scipy import ndimage
import math

def makeGaussianFilter(numRows, numCols, sigma, highPass=True):
   centerI = int(numRows/2) + 1 if numRows % 2 == 1 else int(numRows/2)
   centerJ = int(numCols/2) + 1 if numCols % 2 == 1 else int(numCols/2)

   def gaussian(i,j):
      coefficient = math.exp(-1.0 * ((i - centerI)**2 + (j - centerJ)**2) / (2 * sigma**2))
      return 1 - coefficient if highPass else coefficient

   return numpy.array([[gaussian(i,j) for j in range(numCols)] for i in range(numRows)])

def filterDFT(imageMatrix, filterMatrix):
   shiftedDFT = fftshift(fft2(imageMatrix))
   filteredDFT = shiftedDFT * filterMatrix
   return ifft2(ifftshift(filteredDFT))

def lowPass(imageMatrix, sigma):
   n,m = imageMatrix.shape
   return filterDFT(imageMatrix, makeGaussianFilter(n, m, sigma, highPass=False))

def highPass(imageMatrix, sigma):
   n,m = imageMatrix.shape
   return filterDFT(imageMatrix, makeGaussianFilter(n, m, sigma, highPass=True))

if __name__ == "__main__":
   marilyn = ndimage.imread("marilyn.png", flatten=True)
   lowPassedMarilyn = lowPass(marilyn, 20)
   misc.imsave("low-passed-marilyn.png", numpy.real(lowPassedMarilyn))

The first function samples the values from a Gaussian function with the specified parameters, discretizing the function and storing the values in a matrix. Then the filterDFT function applies the filter by doing coordinatewise multiplication (note these are all numpy arrays). We can do the same thing with a high-pass filter, producing the edgy image below

high-passed-marilyn

And if we compute the average of these two images, we basically get back to the original.

sum-of-marilyns

So the only difference between this and a hybrid image is that you take the low-passed part of one image and the high-passed part of another. Then the art is in balancing the parameters so as to make the averaged image look right. Indeed, with the following picture of Einstein and the above shot of Monroe, we can get a pretty good recreation of the Oliva-Torralba-Schyns piece. I think with more tinkering it could be even better (I did barely any centering/aligning/resizing to the original images).

Albert Einstein, Marilyn Monroe, and their hybridization.

Albert Einstein, Marilyn Monroe, and their hybridization.

And here’s the code for it

def hybridImage(highFreqImg, lowFreqImg, sigmaHigh, sigmaLow):
   highPassed = highPass(highFreqImg, sigmaHigh)
   lowPassed = lowPass(lowFreqImg, sigmaLow)

   return highPassed + lowPassed

Interestingly enough, doing it in reverse doesn’t give quite as pleasing results, but it still technically works. So there’s something particularly important that the high-passed image does have a lot of high-frequency components, and vice versa for the low pass.

backwards

You can see some of the other hybrid images Oliva et al constructed over at their web gallery.

Next Steps

How can we take this idea further? There are a few avenues I can think of. The most obvious one would be to see how this extends to video. Could one come up with generic parameters so that when two videos are hybridized (frame by frame, using this technique) it is only easy to see one at close distance? Or else, could we apply a three-dimensional transform to a video and modify that in some principled way? I think one would not likely find anything astounding, but who knows?

Second would be to look at the many other transforms we have at our disposal. How does manipulating the spectra of these transforms affect the original image, and can you make images that are hybridized in senses other than this one?

And finally, can we bring this idea down in dimension to work with one-dimensional signals? In particular, can we hybridize music? It could usher in a new generation of mashup songs that sound different depending on whether you wear earmuffs :)

Until next time!

AMS Network Science Mathematical Research Community

I don’t usually write promotional posts because I don’t enjoy reading them as much as I enjoy reading the technical posts. But I know that a lot of early graduate students and undergraduates read my blog, and this would be of interest to many of them.

I just got back from Utah yesterday where I attended a 5-day workshop run by the American Mathematical Society, called the Network Science Mathematical Research Community (MRC).

The point of the program is to bring graduate students and early career folks together from all over the country to start new collaborations. The AMS runs multiple MRC sessions every year, and this year the topics ranged from network science to quantum physics. We had a group of about 20 people, including statisticians, applied mathematicians, computer scientists, and a handful of pure combinatorialists. We self-organized into groups of four, and spent pretty much all day for the next four days eating great food, thinking about problems, proving theorems, enjoying the view, and discussing our ideas with the three extremely smart, successful, and amicable organizers. There were also career panels every evening that were, in my opinion, better than the average career panel.

The network science group (you can see me peeking out from the back).

The network science group (you can see me peeking out from the back, just left of center).

Anyway, it was a really fun and valuable experience, and the AMS pays for everything and a bag of chips (if by chips you mean more travel money to meet up with your collaborators and a ticket to the AMS Joint Mathematics Meeting the following January). I’m excited to blog about the work that come out of this, as network science is right up there with the coolest of topics in math and programming.

So if you’re eligible, keep an eye out for next year’s program.