# Finding the majority element of a stream

Problem: Given a massive data stream of $n$ values in $\{ 1, 2, \dots, m \}$ and the guarantee that one value occurs more than $n/2$ times in the stream, determine exactly which value does so.

Solution: (in Python)

def majority(stream):
held = next(stream)
counter = 1

for item in stream:
if item == held:
counter += 1
elif counter == 0:
held = item
counter = 1
else:
counter -= 1

return held


Discussion: Let’s prove correctness. Say that $s$ is the unknown value that occurs more than $n/2$ times. The idea of the algorithm is that if you could pair up elements of your stream so that distinct values are paired up, and then you “kill” these pairs, then $s$ will always survive. The way this algorithm pairs up the values is by holding onto the most recent value that has no pair (implicitly, by keeping a count how many copies of that value you saw). Then when you come across a new element, you decrement the counter and implicitly account for one new pair.

Let’s analyze the complexity of the algorithm. Clearly the algorithm only uses a single pass through the data. Next, if the stream has size $n$, then this algorithm uses $O(\log(n) + \log(m))$ space. Indeed, if the stream entirely consists of a single value (say, a stream of all 1’s) then the counter will be $n$ at the end, which takes $\log(n)$ bits to store. On the other hand, if there are $m$ possible values then storing the largest requires $\log(m)$ bits.

Finally, the guarantee that one value occurs more than $n/2$ times is necessary. If it is not the case the algorithm could output anything (including the most infrequent element!). And moreover, if we don’t have this guarantee then every algorithm that solves the problem must use at least $\Omega(n)$ space in the worst case. In particular, say that $m=n$, and the first $n/2$ items are all distinct and the last $n/2$ items are all the same one, the majority value $s$. If you do not know $s$ in advance, then you must keep at least one bit of information to know which symbols occurred in the first half of the stream because any of them could be $s$. So the guarantee allows us to bypass that barrier.

This algorithm can be generalized to detect $k$ items with frequency above some threshold $n/(k+1)$ using space $O(k \log n)$. The idea is to keep $k$ counters instead of one, adding new elements when any counter is zero. When you see an element not being tracked by your $k$ counters (which are all positive), you decrement all the counters by 1. This is like a $k$-to-one matching rather than a pairing.

# Making Hybrid Images

The Mona Lisa

Leonardo da Vinci’s Mona Lisa is one of the most famous paintings of all time. And there has always been a discussion around her enigmatic smile. He used a trademark Renaissance technique called sfumato, which involves many thin layers of glaze mixed with subtle pigments. The striking result is that when you look directly at Mona Lisa’s smile, it seems to disappear. But when you look at the background your peripherals see a smiling face.

One could spend decades studying the works of these masters from various perspectives, but if we want to hone in on the disappearing nature of that smile, mathematics can provide valuable insights. Indeed, though he may not have known the relationship between his work and da Vinci’s, hundreds of years later Salvador Dali did the artist’s equivalent of mathematically isolating the problem with his painting, “Gala Contemplating the Mediterranean Sea.”

Gala Contemplating the Mediterranean Sea (Salvador Dali, 1976)

Here you see a woman in the foreground, but step back quite far from the picture and there is a (more or less) clear image of Abraham Lincoln. Here the question of gaze is the blaring focus of the work. Now of course Dali and da Vinci weren’t scribbling down equations and computing integrals; their artistic expression was much less well-defined. But we the artistically challenged have tools of our own: mathematics, science, and programming.

In 2006 Aude Oliva, Antonio Torralba, and Philippe. G. Schyns used those tools to merge the distance of Dali and the faded smiles of da Vinci into one cohesive idea. In their 2006 paper they presented the notion of a “hybrid image,” presented below.

The Mona Lisas of Science

If you look closely, you’ll see three women, each of which looks the teensiest bit strange, like they might be trying to suppress a smile, but none of them are smiling. Blur your eyes or step back a few meters, and they clearly look happy. The effect is quite dramatic. At the risk of being overly dramatic, these three women are literally modern day versions of Mona Lisa, the “Mona Lisas of Science,” if you will.

Another, perhaps more famous version of their technique, since it was more widely publicized, is their “Marilyn Einstein,” which up close is Albert Einstein and from far away is Marilyn Monroe.

Marilyn Einstein

This one gets to the heart of the question of what the eye sees at close range versus long range. And it turns out that you can address this question (and create brilliant works of art like the ones above) with some basic Fourier analysis.

## Intuitive Fourier analysis (and references)

The basic idea of Fourier analysis is the idea that smooth functions are hard to understand, and realization of how great it would be if we could decompose them into simpler pieces. Decomposing complex things into simpler parts is one of the main tools in all of mathematics, and Fourier analysis is one of the clearest examples of its application.

In particular, the things we care about are functions $f(x)$ with specific properties I won’t detail here like “smoothness” and “finiteness.” And the building blocks are the complex exponential functions

$\displaystyle e^{2 \pi i kx}$

where $k$ can be any integer. If you have done some linear algebra (and ignore this if you haven’t), then I can summarize the idea succinctly by saying the complex exponentials form an orthonormal basis for the vector space of square-integrable functions.

Back in colloquial language, what the Fourier theorem says is that any function of the kind we care about can be broken down into (perhaps infinitely many) pieces of this form called Fourier coefficients (I’m abusing the word “coefficient” here). The way it’s breaking down is also pleasingly simple: it’s a linear combination. Informally that means you’re just adding up all the complex exponentials with specific weights for each one. Mathematically, the conversion from the function to its Fourier coefficients is called the Fourier transform, and the set of all Fourier coefficients together is called the Fourier spectrum. So if you want to learn about your function $f$, or more importantly modify it in some way, you can inspect and modify its spectrum instead. The reason this is useful is that Fourier coefficients have very natural interpretations in sound and images, as we’ll see for the latter.

We wrote $f(x)$ and the complex exponential as a function of one real variable, but you can do the same thing for two variables (or a hundred!). And, if you’re willing to do some abusing and ignore the complexness of complex numbers, then you can visualize “complex exponentials in two variables” as images of stripes whose orientation and thickness correspond to two parameters (i.e., the $k$ in the offset equation becomes two coefficients). The video below shows how such complex exponentials can be used to build up an image of striking detail. The left frame shows which complex exponential is currently being added, and the right frame shows the layers all put together. I think the result is quite beautiful.

This just goes to show how powerful da Vinci’s idea of fine layering is: it’s as powerful as possible because it can create any image!

Now for digital images like the one above, everything is finite. So rather than have an infinitely precise function and a corresponding infinite set of Fourier coefficients, you get a finite list of sampled values (pixels) and a corresponding grid of Fourier coefficients. But the important and beautiful theorem is, and I want to emphasize how groundbreakingly important this is:

If you give me an image (or any function!) I can compute the decomposition very efficiently.

And the same theorem lets you go the other way: if you give me the decomposition, I can compute the original function’s samples quite easily. The algorithm to do this is called the Fast Fourier transform, and if any piece of mathematics or computer science has a legitimate claim to changing the world, it’s the Fast Fourier transform. It’s hard to pinpoint specific applications, because the transform is so ubiquitous across science and engineering, but we definitely would not have cell phones, satellites, internet, or electronics anywhere near as small as we do without the Fourier transform and the ability to compute it quickly.

Constructing hybrid images is one particularly nice example of manipulating the Fourier spectrum of two images, and then combining them back into a single image. That’s what we’ll do now.

As a side note, by the nature of brevity, the discussion above is a big disservice to the mathematics involved. I summarized and abused in ways that mathematicians would object to. If you want to see a much better treatment of the material, this blog has a long series of posts developing Fourier transforms and their discrete analogues from scratch. See our four primers, which lead into the main content posts where we implement the Fast Fourier transform in Python and use it to apply digital watermarks to an image. Note that in those posts, as in this one, all of the materials and code used are posted on this blog’s Github page.

## High and low frequencies

For images, interpreting ranges of Fourier coefficients is easy to do. You can imagine the coefficients lying on a grid in the plane like so:

Each dot in this grid corresponds to how “intense” the Fourier coefficient is. That is, it’s the magnitude of the (complex) coefficient of the corresponding complex exponential. Now the points that are closer to the origin correspond informally to the broad, smooth changes in the image. These are called “low frequency” coefficients. And points that are further away correspond to sharp changes and edges, and are likewise called “high frequency” components. So the if you wanted to “hybridize” two images, you’d pick ones with complementary intensities in these regions. That’s why Einstein (with all his wiry hair and wrinkles) and Monroe (with smooth features) are such good candidates. That’s also why, when we layered the Fourier components one by one in the video from earlier, we see the fuzzy shapes emerge before the fine details.

Moreover, we can “extract” the high frequency Fourier components by simply removing the low frequency ones. It’s a bit more complicated than that, since you want the transition from “something” to “nothing” to be smooth in sone sense. A proper discussion of this would go into sampling and the Nyquist frequency, but that’s beyond the scope of this post. Rather, we’ll just define a family of “filtering functions” without motivation and observe that they work well.

Definition: The Gaussian filter function with variance $\sigma$ and center $(a, b)$ is the function

$\displaystyle g(x,y) = e^{-\frac{(x - a)^2 + (y - b)^2}{2 \sigma^2}}$

It looks like this

image credit Wikipedia

In particular, at zero the function is 1 and it gradually drops to zero as you get farther away. The parameter $\sigma$ controls the rate at which it vanishes, and in the picture above the center is set to $(0,0)$.

Now what we’ll do is take our image, compute its spectrum, and multiply coordinatewise with a certain Gaussian function. If we’re trying to get rid of high-frequency components (called a “low-pass filter” because it lets the low frequencies through), we can just multiply the Fourier coefficients directly by the filter values $g(x,y)$, and if we’re doing a “high-pass filter” we multiply by $1 - g(x,y)$.

Before we get to the code, here’s an example of a low-pass filter. First, take this image of Marilyn Monroe

Now compute its Fourier transform

Apply the low-pass filter

And reverse the Fourier transform to get an image

In fact, this is a common operation in programs like photoshop for blurring an image (it’s called a Gaussian blur for obvious reasons). Here’s the python code to do this. You can download it along with all of the other resources used in making this post on this blog’s Github page.

import numpy
from numpy.fft import fft2, ifft2, fftshift, ifftshift
from scipy import misc
from scipy import ndimage
import math

def makeGaussianFilter(numRows, numCols, sigma, highPass=True):
centerI = int(numRows/2) + 1 if numRows % 2 == 1 else int(numRows/2)
centerJ = int(numCols/2) + 1 if numCols % 2 == 1 else int(numCols/2)

def gaussian(i,j):
coefficient = math.exp(-1.0 * ((i - centerI)**2 + (j - centerJ)**2) / (2 * sigma**2))
return 1 - coefficient if highPass else coefficient

return numpy.array([[gaussian(i,j) for j in range(numCols)] for i in range(numRows)])

def filterDFT(imageMatrix, filterMatrix):
shiftedDFT = fftshift(fft2(imageMatrix))
filteredDFT = shiftedDFT * filterMatrix
return ifft2(ifftshift(filteredDFT))

def lowPass(imageMatrix, sigma):
n,m = imageMatrix.shape
return filterDFT(imageMatrix, makeGaussianFilter(n, m, sigma, highPass=False))

def highPass(imageMatrix, sigma):
n,m = imageMatrix.shape
return filterDFT(imageMatrix, makeGaussianFilter(n, m, sigma, highPass=True))

if __name__ == "__main__":
lowPassedMarilyn = lowPass(marilyn, 20)
misc.imsave("low-passed-marilyn.png", numpy.real(lowPassedMarilyn))


The first function samples the values from a Gaussian function with the specified parameters, discretizing the function and storing the values in a matrix. Then the filterDFT function applies the filter by doing coordinatewise multiplication (note these are all numpy arrays). We can do the same thing with a high-pass filter, producing the edgy image below

And if we compute the average of these two images, we basically get back to the original.

So the only difference between this and a hybrid image is that you take the low-passed part of one image and the high-passed part of another. Then the art is in balancing the parameters so as to make the averaged image look right. Indeed, with the following picture of Einstein and the above shot of Monroe, we can get a pretty good recreation of the Oliva-Torralba-Schyns piece. I think with more tinkering it could be even better (I did barely any centering/aligning/resizing to the original images).

Albert Einstein, Marilyn Monroe, and their hybridization.

And here’s the code for it

def hybridImage(highFreqImg, lowFreqImg, sigmaHigh, sigmaLow):
highPassed = highPass(highFreqImg, sigmaHigh)
lowPassed = lowPass(lowFreqImg, sigmaLow)

return highPassed + lowPassed


Interestingly enough, doing it in reverse doesn’t give quite as pleasing results, but it still technically works. So there’s something particularly important that the high-passed image does have a lot of high-frequency components, and vice versa for the low pass.

You can see some of the other hybrid images Oliva et al constructed over at their web gallery.

## Next Steps

How can we take this idea further? There are a few avenues I can think of. The most obvious one would be to see how this extends to video. Could one come up with generic parameters so that when two videos are hybridized (frame by frame, using this technique) it is only easy to see one at close distance? Or else, could we apply a three-dimensional transform to a video and modify that in some principled way? I think one would not likely find anything astounding, but who knows?

Second would be to look at the many other transforms we have at our disposal. How does manipulating the spectra of these transforms affect the original image, and can you make images that are hybridized in senses other than this one?

And finally, can we bring this idea down in dimension to work with one-dimensional signals? In particular, can we hybridize music? It could usher in a new generation of mashup songs that sound different depending on whether you wear earmuffs :)

Until next time!

# Learning to Love Complex Numbers

This post is intended for people with a little bit of programming experience and no prior mathematical background.

Numbers are curious things. On one hand, they represent one of the most natural things known to humans, which is quantity. It’s so natural to humans that even newborn babies are in tune with the difference between quantities of objects between 1 and 3, in that they notice when quantity changes much more vividly than other features like color or shape.

But our familiarity with quantity doesn’t change the fact that numbers themselves (as an idea) are a human invention. And they’re not like most human inventions, the kinds where you have to tinker with gears or circuits to get a machine that makes your cappuccino. No, these are mathematical inventions. These inventions exist only in our minds.

Numbers didn’t always exist. A long time ago, back when the Greeks philosophers were doing their philosophizing, negative numbers didn’t exist! In fact, it wasn’t until 1200 AD that the number zero was first considered in Europe. Zero, along with negative numbers and fractions and square roots and all the rest, were invented primarily to help people solve more problems than they could with the numbers they had available. That is, numbers were invented primarily as a way for people to describe their ideas in a useful way. People simply  wondered “is there a number whose square gives you 2?” And after a while they just decided there was and called it $\sqrt{2}$ because they didn’t have a better name for it.

But with these new solutions came a host of new problems. You see, although I said mathematical inventions only exist in our minds, once they’re invented they gain a life of their own. You start to notice patterns in your mathematical objects and you have to figure out why they do the things they do. And numbers are a perfectly good example of this: once I notice that I can multiply a number by itself, I can ask how often these “perfect squares” occur. That is, what’s the pattern in the numbers $1^2, 2^2, 3^2, 4^2, \dots$? If you think about it for a while, you’ll find that square numbers have a very special relationship with odd numbers.

Other times, however, the things you invent turn out to make no sense at all, and you can prove they never existed in the first place! It’s an odd state of affairs, but we’re going to approach the subject of complex numbers from this mindset. We’re going to come up with a simple idea, the idea that negative numbers can be perfect squares, and explore the world of patterns it opens up. Along the way we’ll do a little bit of programming to help explore, give some simple proofs to solidify our intuition, and by the end we’ll see how these ideas can cause wonderful patterns like this one:

## The number i

Let’s bring the story back around to squares. One fact we all remember about numbers is that squaring a number gives you something non-negative. $7^2 = 49, (-2)^2 = 4, 0^2 = 0$, and so on. But it certainly doesn’t have to be this way. What if we got sick of that stupid fact and decided to invent a new number whose square was negative? Which negative, you ask? Well it doesn’t really matter, because I can always stretch it larger or smaller so that it’s square is -1.

Let’s see how: if you say that your made-up number $x$ makes $x^2 = -7$, then I can just use $\frac{x}{\sqrt{7}}$ to get a number whose square is -1. If you’re going to invent a number that’s supposed to interact with our usual numbers, then you have to be allowed to add, subtract, and multiply $x$ with regular old real numbers, and the usual properties would have to still work. So it would have to be true that $(x / \sqrt{7})^2 = x^2 / \sqrt{7}^2 = -7/7 = -1$.

So because it makes no difference (this is what mathematicians mean by, “without loss of generality”) we can assume that the number we’re inventing will have a square of negative one. Just to line up with history, let’s call the new number $i$. So there it is: $i$ exists and $i^2 = -1$. And now that we are “asserting” that $i$ plays nicely with real numbers, we get these natural rules for adding and subtracting and multiplying and dividing. For example

• $1 + i$ is a new number, which we’ll just call $1+i$. And if we added two of these together, $(1+ i) + (1+i)$, we can combine the real parts and the $i$ parts to get $2 + 2i$. Same goes for subtraction. In general a complex number looks like $a + bi$, because as we’ll see in the other points you can simplify every simple arithmetic expression down to just one “real number” part and one “real number times $i$” part.
• We can multiply $3 \cdot i$, and we’ll just call it $3i$, and we require that multiplication distributes across addition (that the FOIL rule works). So that, for example, $(2 - i)(1 + 3i) = (2 + 6i - i - 3i^2) = (2 + 3) + (6i - i) = (5 + 5i)$.
• Dividing is a significantly more annoying. Say we want to figure out what $1 / (1+i)$ is (in fact, it’s not even obvious that this should look like a regular number! But it does). The $1 / a$ notation just means we’re looking for a number which, when we multiply by the denominator $a$, we get back to 1. So we’re looking to find out when $(a + bi)(1 + i) = 1 + 0i$ where $a$ and $b$ are variables we’re trying to solve for. If we multiply it out we get $(a-b) + (a + b)i = 1 + 0i$, and since the real part and the $i$ part have to match up, we know that $a - b = 1$ and $a + b = 0$. If we solve these two equations, we find that $a = 1/2, b = -1/2$ works great. If we want to figure out something like $(2 + 3i) / (1 - i)$, we just find out what $1 / (1- i)$ is first, and then multiply the result by $(2+3i)$.

So that was tedious and extremely boring, and we imagine you didn’t even read it (that’s okay, it really is boring!). All we’re doing is establishing ground rules for the game, so if you come across some arithmetic that doesn’t make sense, you can refer back to this list to see what’s going on. And once again, for the purpose of this post, we’re asserting that all these laws hold. Maybe some laws follow from others, but as long as we don’t come up with any nasty self-contradictions we’ll be fine.

And now we turn to the real questions: is $i$ the only square root of -1? Does $i$ itself have a square root? If it didn’t, we’d be back to where we started, with some numbers (the non-$i$ numbers) having square roots while others don’t. And so we’d feel the need to make all the $i$ numbers happy by making up more numbers to be their square roots, and then worrying what if these new numbers don’t have square roots and…gah!

I’ll just let you in on the secret to save us from this crisis. It turns out that $i$ does have a square root in terms of other $i$ numbers, but in order to find it we’ll need to understand $i$ from a different angle, and that angle turns out to be geometry.

Geometry? How is geometry going to help me understand numbers!?

It’s a valid question and part of why complex numbers are so fascinating. And I don’t mean geometry like triangles and circles and parallel lines (though there will be much talk of angles), I mean transformations in the sense that we’ll be “stretching,” “squishing,” and “rotating” numbers. Maybe another time I can tell you why for me “geometry” means stretching and rotating; it’s a long but very fun story.

The clever insight is that you can represent complex numbers as geometric objects in the first place. To do it, you just think of $a + bi$ as a pair of numbers $(a,b)$, (the pair of real part and $i$ part), and then plot that point on a plane. For us, the $x$-axis will be the “real” axis, and the $y$-axis will be the $i$-axis. So the number $(3 - 4i)$ is plotted 3 units in the positive $x$ direction and 4 units in the negative $y$ direction. Like this:

The “j” instead of “i” is not a typo, but a disappointing fact about the programming language we used to make this image. We’ll talk more about why later.

We draw it as an arrow for a good reason. Stretching, squishing, rotating, and reflecting will all be applied to the arrow, keeping its tail fixed at the center of the axes. Sometimes the arrow is called a “vector,” but we won’t use that word because here it’s synonymous with “complex number.”

So let’s get started squishing stuff.

## Stretching, Squishing, Rotating

Before we continue I should clear up some names. We call a number that has an $i$ in it a complex number, and we call the part without the $i$ the real part (like 2 in $2-i$) and the part with $i$ the complex part.

Python is going to be a great asset for us in exploring complex numbers, so let’s jump right into it. It turns out that Python natively supports complex numbers, and I wrote a program for drawing complex numbers. I used it to make the plot above. The program depends on a library I hate called matplotlib, and so the point of the program is to shield you from as much pain as possible and focus on complex numbers. You can use the program by downloading it from this blog’s Github page, along with everything else I made in writing this post. All you need to know how to do is call a function, and I’ve done a bit of window dressing removal to simplify things (I really hate matplotlib).

# plotComplexNumbers : [complex] -&gt; None
# display a plot of the given list of complex numbers
def plotComplexNumbers(numbers):
...


Before we show some examples of how to use it, we have to understand how to use complex numbers in Python. It’s pretty simple, except that Python was written by people who hate math, and so they decided the complex number would be represented by $j$ instead of $i$ (people who hate math are sometimes called “engineers,” and they use $j$ out of spite. Not really, though).

So in Python it’s just like any other computation. For example:

>>> (1 + 1j)*(4 - 2j) == (6+2j)
True
>>> 1 / (1+1j)
(0.5-0.5j)

And so calling the plotting function with a given list of complex numbers is as simple as importing the module and calling the function

from plotcomplex import plot
plot.plotComplexNumbers([(-1+1j), (1+2j), (-1.5 - 0.5j), (.6 - 1.8j)])


Here’s the result

So let’s use plots like this one to explore what “multiplication by $i$” does to a complex number. It might not seem exciting at first, but I promise there’s a neat punchline.

Even without plotting it’s pretty easy to tell what multiplying by $i$ does to some numbers. It takes 1 to $i$, moves $i$ to $i^2 = -1$, it takes -1 to $-i$, and $-i$ to $-i \cdot i = 1$.

What’s the pattern in these? well if we plot all these numbers, they’re all at right angles in counter-clockwise order. So this might suggest that multiplication by $i$ does some kind of rotation. Is that always the case? Well lets try it with some other more complicated numbers. Click the plots below to enlarge.

Well, it looks close but it’s hard to tell. Some of the axes are squished and stretched, so it might be that our images don’t accurately represent the numbers (the real world can be such a pain). Well when visual techniques fail, we can attempt to prove it.

Clearly multiplying by $i$ does some kind of rotation, maybe with other stuff too, and it shouldn’t be so hard to see that multiplying by $i$ does the same thing no matter which number you use (okay, the skeptical readers will say that’s totally hard to see, but we’ll prove it super rigorously in a minute). So if we take any number and multiply it by $i$ once, then twice, then three times, then four, and if we only get back to where we started at four multiplications, then each rotation had to be a quarter turn.

Indeed,

$\displaystyle (a + bi) i^4 = (ai - b) i^3 = (-a - bi) i^2 = (-ai + b) i = a + bi$

This still isn’t all that convincing, and we want to be 100% sure we’re right. What we really need is a way to arithmetically compute the angle between two complex numbers in their plotted forms. What we’ll do is find a way to measure the angle of one complex number with the $x$-axis, and then by subtraction we can get angles between arbitrary points. For example, in the figure below $\theta = \theta_1 - \theta_2$.

One way to do this is with trigonometry: the geometric drawing of $a + bi$ is the hypotenuse of a right triangle with the $x$-axis.

And so if $r$ is the length of the arrow, then by the definition of sine and cosine, $\cos(\theta) = a/r, \sin(\theta) = b/r$. If we have $r, \theta$, and $r > 0$, we can solve for a unique $a$ and $b$, so instead of representing a complex number in terms of the pair of numbers $(a,b)$, we can represent it with the pair of numbers $(r, \theta)$. And the conversion between the two is just

$a + bi = r \cos(\theta) + (r \sin(\theta)) i$

The $(r, \theta)$ representation is called the polar representation, while the $(a,b)$ representation is called the rectangular representation or the Cartesian representation. Converting between polar and Cartesian coordinates fills the pages of many awful pre-calculus textbooks (despite the fact that complex numbers don’t exist in classical calculus). Luckily for us Python has built-in functions to convert between the two representations for us.

&gt;&gt;&gt; import cmath
&gt;&gt;&gt; cmath.polar(1 + 1j)
(1.4142135623730951, 0.7853981633974483)
&gt;&gt;&gt; z = cmath.polar(1 + 1j)
&gt;&gt;&gt; cmath.rect(z[0], z[1])
(1.0000000000000002+1j)


It’s a little bit inaccurate on the rounding, but it’s fine for our purposes.

So how do we compute the angle between two complex numbers? Just convert each to the polar form, and subtract the second coordinates. So if we get back to our true goal, to figure out what multiplication by $i$ does, we can just do everything in polar form. Here’s a program that computes the angle between two complex numbers.

def angleBetween(z, w):
zPolar, wPolar = cmath.polar(z), cmath.polar(w)
return wPolar[1] - zPolar[1]

print(angleBetween(1 + 1j, (1 + 1j) * 1j))
print(angleBetween(2 - 3j, (2 - 3j) * 1j))
print(angleBetween(-0.5 + 7j, (-0.5 + 7j) * 1j))


Running it gives

1.5707963267948966
1.5707963267948966
-4.71238898038469


Note that the decimal form of $\pi/2$ is 1.57079…, and that the negative angle is equivalent to $\pi/2$ if you add a full turn of $2\pi$ to it. So programmatically we can see that for every input we try multiplying by $i$ rotates 90 degrees.

But we still haven’t proved it works. So let’s do that now. To say what the angle is between $r \cos (\theta) + ri \sin (\theta)$ and $i \cdot [r \cos (\theta) + ri \sin(\theta)] = -r \sin (\theta) + ri \cos(\theta)$, we need to transform the second number into the usual polar form (where the $i$ is on the sine part and not the cosine part). But we know, or I’m telling you now, this nice fact about sine and cosine:

$\displaystyle \sin(\theta + \pi/2) = cos(\theta)$
$\displaystyle \cos(\theta + \pi / 2) = -\sin(\theta)$

This fact is maybe awkward to write out algebraically, but it’s just saying that if you shift the whole sine curve a little bit you get the cosine curve, and if you keep shifting it you get the opposite of the sine curve (and if you kept shifting it even more you’d eventually get back to the sine curve; they’re called periodic for this reason).

So immediately we can rewrite the second number as $r \cos(\theta + \pi/2) + i r \sin (\theta + \pi/2)$. The angle is the same as the original angle plus a right angle of $\pi/2$. Neat!

Applying this same idea to $(a + bi) \cdot (c + di)$, it’s not much harder to prove that multiplying two complex numbers in general multiplies their lengths and adds their angles. So if a complex number $z$ has its magnitude $r$ smaller than 1, multiplying by $z$ squishes and rotates whatever is being multiplied. And if the magnitude is greater than 1, it stretches and rotates. So we have a super simple geometric understanding of how arithmetic with complex numbers works. And as we’re about to see, all this stretching and rotating results in some really weird (and beautifully mysterious!) mathematics and programs.

But before we do that we still have one question to address, the question that started this whole geometric train of thought: does $i$ have a square root? Indeed, I’m just looking for a number such that, when I square its length and double its angle, I get $i = \cos(\pi/2) + i \sin(\pi/2)$. Indeed, the angle we want is $\pi/4$, and the length we want is $r = 1$, which means $\sqrt{i} = \cos(\pi/4) + i \sin(\pi/4)$. Sweet! There is another root if you play with the signs, see if you can figure it out.

In fact it’s a very deeper and more beautiful theorem (“theorem” means “really important fact”) called the fundamental theorem of algebra. And essentially it says that the complex numbers are complete. That is, we can always find square roots, cube roots, or anything roots of numbers involving $i$. It actually says a lot more, but it’s easier to appreciate the rest of it after you do more math than we’re going to do in this post.

On to pretty patterns!

## The Fractal

So here’s a little experiment. Since every point in the plane is the end of some arrow representing a complex number, we can imagine transforming the entire complex plane by transforming each number by the same rule. The most interesting simple rule we can think of: squaring! So though it might strain your capacity for imagination, try to visualize the idea like this. Squaring a complex number is the same as squaring it’s length and doubling its angle. So imagine: any numbers whose arrows are longer than 1 will grow much bigger, arrows shorter than 1 will shrink, and arrows of length exactly one will stay the same length (arrows close to length 1 will grow/shrink much more slowly than those far away from 1). And complex numbers with small positive angles will increase their angle, but only a bit, while larger angles will grow faster.

Here’s an animation made by Douglas Arnold showing what happens to the set of complex numbers $a + bi$ with $0 \leq a, b \leq 1$ or $-1 < a,b < 0$. Again, imagine every point is the end of a different arrow for the corresponding complex number. The animation is for a single squaring, and the points move along the arc they would travel if one rotated/stretched them smoothly.

So that’s pretty, but this is by all accounts a well-behaved transformation. It’s “predictable,” because for example we can always tell which complex numbers will get bigger and bigger (in length) and which will get smaller.

What if, just for the sake of tinkering, we changed the transformation a little bit? That is, instead of sending $z = a+bi$ to $z^2$ (I’ll often write this $z \mapsto z^2$), what if we sent

$\displaystyle z \mapsto z^2 + 1$

Now it’s not so obvious: which vectors will grow and which will shrink? Notice that it’s odd because adding 1 only changes the real part of the number. So a number whose length is greater than 1 can become small under this transformation. For example, $i$ is sent to $0$, so something slightly larger would also be close to zero. Indeed, $5i/4 \mapsto -9/16$.

So here’s an interesting question: are there any complex numbers that will stay small even if I keep transforming like this forever? Specifically, if I call $f(z) = z^2$, and I call $f^2(z) = f(f(z))$, and likewise call $f^k(z)$ for $k$ repeated transformations of $z$, is there a number $z$ so that for every $k$, the value $f^k(z) < 2$? “Obvious” choices like $z=0$ don’t work, and neither do random guesses like $z=i$ or $z=1$. So should we guess the answer is no?

Before we jump to conclusions let’s write a program to see what happens for more than our random guesses. The program is simple: we’ll define the “square plus one” function, and then repeatedly apply that function to a number for some long number of times (say, 250 times). If the length of the number stays under 2 after so many tries, we’ll call it “small forever,” and otherwise we’ll call it “not small forever.”

def squarePlusOne(z):
return z*z + 1

def isSmallForever(z, f):
k = 0

while abs(z) &lt; 2:
z = f(z)
k += 1

if k &gt; 250:
return True

return False


This isSmallForever function is generic: you can give it any function $f$ and it will repeatedly call $f$ on $z$ until the result grows bigger than 2 in length. Note that the abs function is a built-in Python function for computing the length of a complex number.

Then I wrote a classify function, which you can give a window and a small increment, and it will produce a grid of zeros and ones marking the results of isSmallForever. The details of the function are not that important. I also wrote a function that turns the grid into a picture. So here’s an example of how we’d use it:

from plotcomplex.plot import gridToImage

def classifySquarePlusOne(z):
return isSmallForever(z, squarePlusOne)

grid = classify(classifySquarePlusOne) # the other arguments are defaulted to [-2,2], [-2,2], 0.1
gridToImage(grid)


And here’s the result. Points colored black grow beyond 2, and white points stay small for the whole test.

Looks like they’ll always grow big.

So it looks like repeated squaring plus one will always make complex numbers grow big. That’s not too exciting, but we can always make it more exciting. What happens if we replace the 1 in $z^2 + 1$ with a different complex number? For example, if we do $z^2 - 1$ then will things always grow big?

You can randomly guess and see that 0 will never grow big, because $0^2 - 1 = -1$ and $(-1)^2 - 1 = 0$. It will just oscillate forever. So with -1 some numbers will grow and some will not! Let’s use the same routine above to see which:

def classifySquareMinusOne(z):
return isSmallForever(z, squareMinusOne)

grid = classify(classifySquareMinusOne)
gridToImage(grid)


And the result:

Now that’s a more interesting picture! Let’s ramp up the resolution

grid = classify(classifySquareMinusOne, step=0.001)
gridToImage(grid)


Gorgeous. If you try this at home you’ll notice, however, that this took a hell of a long time to run. Speeding up our programs is very possible, but it’s a long story for another time. For now we can just be patient.

Indeed, this image has a ton of interesting details! It looks almost circular in the middle, but if we zoom in we can see that it’s more like a rippling wave

It’s pretty incredible, and a huge question is jumping out at me: what the heck is causing this pattern to occur? What secret does -1 know that +1 doesn’t that makes the resulting pattern so intricate?

But an even bigger question is this. We just discovered that some values of $c$ make $z \mapsto z^2 + c$ result in interesting patterns, and some do not! So the question is which ones make interesting patterns? Even if we just, say, fix the starting point to zero: what is the pattern in the complex numbers that would tell me when this transformation makes zero blow up, and when it keeps zero small?

Sounds like a job for another program. This time we’ll use a nice little Python feature called a closure, which we define a function that saves the information that exists when it’s created for later. It will let us write a function that takes in $c$ and produces a function that transforms according to $z \mapsto z^2+c$.

def squarePlusC(c):
def f(z):
return z*z + c

return f


And we can use the very same classification/graphing function from before to do this.

def classifySquarePlusC(c):
return isSmallForever(0, squarePlusC(c))

grid = classify(classifySquarePlusC, xRange=(-2, 1), yRange=(-1, 1), step=0.005)
gridToImage(grid)


And the result:

Stunning. This wonderful pattern, which is still largely not understood today, is known as the Mandelbrot set. That is, the white points are the points in the Mandlebrot set, and the black points are not in it. The detail on the border of this thing is infinitely intricate. For example, we can change the window in our little program to zoom in on a particular region.

And if you keep zooming in you keep getting more and more detail. This was true of the specific case of $z^2 - 1$, but somehow the patterns in the Mandelbrot set are much more varied and interesting. And if you keep going down eventually you’ll see patterns that look like the original Mandelbrot set. We can already kind of see that happening above. The name for this idea is a fractal, and the $z^2 - 1$ image has it too. Fractals are a fascinating and mysterious subject studied in a field called discrete dynamical systems. Many people dedicate their entire lives to studying these things, and it’s for good reason. There’s a lot to learn and even more that’s unknown!

So this is the end of our journey for now. I’ve posted all of the code we used in the making of this post so you can continue to play, but here are some interesting ideas.

• The Mandelbrot set (and most fractals) are usually colored. The way they’re colored is as follows. Rather than just say true or false when zero blows up beyond 2 in length, you return the number of iterations $k$ that happened. Then you pick a color based on how big $k$ is. There’s a link below that lets you play with this. In fact, adding colors shows that there is even more intricate detail happening outside the Mandelbrot set that’s too faint to see in our pictures above. Such as this.
• Some very simple questions about fractals are very hard to answer. For example, is the Mandelbrot set connected? That is, is it possible to “walk” from every point in the Mandelbrot set to every other point without leaving the set? Despite the scattering of points in the zoomed in picture above that suggest the answer is no, the answer is actually yes! This is a really difficult thing to prove, however.
• The patterns in many fractals are often used to generate realistic looking landscapes and generate pseudo randomness. So fractals are not just mathematical curiosities.
• You should definitely be experimenting with this stuff! What happens if you change the length threshold from 2 to some bigger number? What about a smaller number? What if you do powers different than $2$? There’s so much to explore!
• The big picture thing to take away from this is that it’s not the numbers themselves that are particularly interesting, it’s the transformations of the numbers that generate these patterns! The interesting questions are what kinds of things are the same under these transformations, and what things are different. This is a very general idea in mathematics, and the more math you do the more you’ll find yourself wondering about useful and bizarre transformations.

For the chance to keep playing with the Mandelbrot set, check out this Mandelbrot grapher that works in your browser. It lets you drag rectangles to zoom further in on regions of interest. It’s really fun.

Until next time!

# Sending and Authenticating Messages with Elliptic Curves

Last time we saw the Diffie-Hellman key exchange protocol, and discussed the discrete logarithm problem and the related Diffie-Hellman problem, which form the foundation for the security of most protocols that use elliptic curves. Let’s continue our journey to investigate some more protocols.

Just as a reminder, the Python implementations of these protocols are not at all meant for practical use, but for learning purposes. We provide the code on this blog’s Github page, but for the love of security don’t actually use them.

## Shamir-Massey-Omura

Recall that there are lots of ways to send encrypted messages if you and your recipient share some piece of secret information, and the Diffie-Hellman scheme allows one to securely generate a piece of shared secret information. Now we’ll shift gears and assume you don’t have a shared secret, nor any way to acquire one. The first cryptosystem in that vein is called the Shamir-Massey-Omura protocol. It’s only slightly more complicated to understand than Diffie-Hellman, and it turns out to be equivalently difficult to break.

The idea is best explained by metaphor. Alice wants to send a message to Bob, but all she has is a box and a lock for which she has the only key. She puts the message in the box and locks it with her lock, and sends it to Bob. Bob can’t open the box, but he can send it back with a second lock on it for which Bob has the only key. Upon receiving it, Alice unlocks her lock, sends the box back to Bob, and Bob can now open the box and retrieve the message.

To celebrate the return of Game of Thrones, we’ll demonstrate this protocol with an original Lannister Infographic™.

Assuming the box and locks are made of magically unbreakable Valyrian steel, nobody but Bob (also known as Jamie) will be able to read the message.

Now fast forward through the enlightenment, industrial revolution, and into the age of information. The same idea works, and it’s significantly faster over long distances. Let $C$ be an elliptic curve over a finite field $k$ (we’ll fix $k = \mathbb{Z}/p$ for some prime $p$, though it works for general fields too). Let $n$ be the number of points on $C$.

Alice’s message is going to be in the form of a point $M$ on $C$. She’ll then choose her secret integer $0 < s_A < p$ and compute $s_AM$ (locking the secret in the box), sending the result to Bob. Bob will likewise pick a secret integer $s_B$, and send $s_Bs_AM$ back to Alice.

Now the unlocking part: since $s_A \in \mathbb{Z}/p$ is a field, Alice can “unlock the box” by computing the inverse $s_A^{-1}$ and computing $s_BM = s_A^{-1}s_Bs_AM$. Now the “box” just has Bob’s lock on it. So Alice sends $s_BM$ back to Bob, and Bob performs the same process to evaluate $s_B^{-1}s_BM = M$, thus receiving the message.

Like we said earlier, the security of this protocol is equivalent to the security of the Diffie-Hellman problem. In this case, if we call $z = s_A^{-1}$ and $y = s_B^{-1}$, and $P = s_As_BM$, then it’s clear that any eavesdropper would have access to $P, zP$, and $yP$, and they would be tasked with determining $zyP$, which is exactly the Diffie-Hellman problem.

Now Alice’s secret message comes in the form of a point on an elliptic curve, so how might one translate part of a message (which is usually represented as an integer) into a point? This problem seems to be difficult in general, and there’s no easy answer. Here’s one method originally proposed by Neal Koblitz that uses a bit of number theory trickery.

Let $C$ be given by the equation $y^2 = x^3 + ax + b$, again over $\mathbb{Z}/p$. Suppose $0 \leq m < p/100$ is our message. Define for any $0 \leq j < 100$ the candidate $x$-points $x_j = 100m + j$. Then call our candidate $y^2$-values $s_j = x_j^3 + ax_j + b$. Now for each $j$ we can compute $x_j, s_j$, and so we’ll pick the first one for which $s_j$ is a square in $\mathbb{Z}/p$ and we’ll get a point on the curve. How can we tell if $s_j$ is a square? One condition is that $s_j^{(p-1)/2} \equiv 1 \mod p$. This is a basic fact about quadratic residues modulo primes; see these notes for an introduction and this Wikipedia section for a dense summary.

Once we know it’s a square, we can compute the square root depending on whether $p \equiv 1 \mod 4$ or $p \equiv 3 \mod 4$. In the latter case, it’s just $s_j^{(p+1)/4} \mod p$. Unfortunately the former case is more difficult (really, the difficult part is $p \equiv 1 \mod 8$). You can see Section 1.5 of this textbook for more details and three algorithms, or you could just pick primes congruent to 3 mod 4.

I have struggled to find information about the history of the Shamir-Massey-Omura protocol; every author claims it’s not widely used in practice, and the only reason seems to be that this protocol doesn’t include a suitable method for authenticating the validity of a message. In other words, some “man in the middle” could be intercepting messages and tricking you into thinking he is your intended recipient. Coupling this with the difficulty of encoding a message as a point seems to be enough to make cryptographers look for other methods. Another reason could be that the system was patented in 1982 and is currently held by SafeNet, one of the US’s largest security providers. All of their products have generic names so it’s impossible to tell if they’re actually using Shamir-Massey-Omura. I’m no patent lawyer, but it could simply be that nobody else is allowed to implement the scheme.

## Digital Signatures

Indeed, the discussion above raises the question: how does one authenticate a message? The standard technique is called a digital signature, and we can implement those using elliptic curve techniques as well. To debunk the naive idea, one cannot simply attach some static piece of extra information to the message. An attacker could just copy that information and replicate it to forge your signature on another, potentially malicious document. In other words, a signature should only work for the message it was used to sign. The technique we’ll implement was originally proposed by Taher Elgamal, and is called the ElGamal signature algorithm. We’re going to look at a special case of it.

So Alice wants to send a message $m$ with some extra information that is unique to the message and that can be used to verify that it was sent by Alice. She picks an elliptic curve $E$ over $\mathbb{F}_q$ in such a way that the number of points on $E$ is $br$, where $b$ is a small integer and $r$ is a large prime.

Then, as in Diffie-Hellman, she picks a base point $Q$ that has order $r$ and a secret integer $s$ (which is permanent), and computes $P = sQ$. Alice publishes everything except $s$:

Public information: $\mathbb{F}_q, E, b, r, Q, P$

Let Alice’s message $m$ be represented as an integer at most $r$ (there are a few ways to get around this if your message is too long). Now to sign $m$ Alice picks a message specific $k < r$ and computes what I’ll call the auxiliary point $A = kQ$. Let $A = (x, y)$. Alice then computes the signature $g = k^{-1}(m + s x) \mod r$. The signed message is then $(m, A, g)$, which Alice can safely send to Bob.

Before we see how Bob verifies the message, notice that the signature integer involves everything: Alice’s secret key, the message-specific secret integer $k$, and most importantly the message. Remember that this is crucial: we want the signature to work only for the message that it was used to sign. If the same $k$ is used for multiple messages then the attacker can find out your secret key! (And this has happened in practice; see the end of the post.)

So Bob receives $(m, A, g)$, and also has access to all of the public information listed above. Bob authenticates the message by computing the auxiliary point via a different route. First, he computes $c = g^{-1} m \mod r$ and $d = g^{-1}x \mod r$, and then $A' = cQ + dP$. If the message was signed by Alice then $A' = A$, since we can just write out the definition of everything:

Now to analyze the security. The attacker wants to be able to take any message $m'$ and produce a signature $A', g'$ that will pass validation with Alice’s public information. If the attacker knew how to solve the discrete logarithm problem efficiently this would be trivial: compute $s$ and then just sign like Alice does. Without that power there are still a few options. If the attacker can figure out the message-specific integer $k$, then she can compute Alice’s secret key $s$ as follows.

Given $g = k^{-1}(m + sx) \mod r$, compute $kg \equiv (m + sx) \mod r$. Compute $d = gcd(x, r)$, and you know that this congruence has only $d$ possible solutions modulo $r$. Since $s$ is less than $r$, the attacker can just try all options until they find $P = sQ$. So that’s bad, but in a properly implemented signature algorithm finding $k$ is equivalently hard to solving the discrete logarithm problem, so we can assume we’re relatively safe from that.

On the other hand one could imagine being able to conjure the pieces of the signature $A', g'$ by some method that doesn’t involve directly finding Alice’s secret key. Indeed, this problem is less well-studied than the Diffie-Hellman problem, but most cryptographers believe it’s just as hard. For more information, this paper surveys the known attacks against this signature algorithm, including a successful attack for fields of characteristic two.

## Signature Implementation

We can go ahead and implement the signature algorithm once we’ve picked a suitable elliptic curve. For the purpose of demonstration we’ll use a small curve, $E: y^2 = x^3 + 3x + 181$ over $F = \mathbb{Z}/1061$, whose number of points happens to have the a suitable prime factorization ($1047 = 3 \cdot 349$). If you’re interested in counting the number of points on an elliptic curve, there are many theorems and efficient algorithms to do this, and if you’ve been reading this whole series something then an algorithm based on the Baby-Step Giant-Step idea would be easy to implement. For the sake of brevity, we leave it as an exercise to the reader.

Note that the code we present is based on the elliptic curve and finite field code we’re been implementing as part of this series. All of the code used in this post is available on this blog’s Github page.

The basepoint we’ll pick has to have order 349, and $E$ has plenty of candidates. We’ll use $(2, 81)$, and we’ll randomly generate a secret key that’s less than $349$ (eight bits will do). So our setup looks like this:

if __name__ == "__main__":
F = FiniteField(1061, 1)

# y^2 = x^3 + 3x + 181
curve = EllipticCurve(a=F(3), b=F(181))
basePoint = Point(curve, F(2), F(81))
basePointOrder = 349
secretKey = generateSecretKey(8)
publicKey = secretKey * basePoint


Then so sign a message we generate a random key, construct the auxiliary point and the signature, and return:

def sign(message, basePoint, basePointOrder, secretKey):
modR = FiniteField(basePointOrder, 1)
oneTimeSecret = generateSecretKey(len(bin(basePointOrder)) - 3) # numbits(order) - 1

auxiliaryPoint = oneTimeSecret * basePoint
signature = modR(oneTimeSecret).inverse() *
(modR(message) + modR(secretKey) * modR(auxiliaryPoint[0]))

return (message, auxiliaryPoint, signature)


So far so good. Note that we generate the message-specific $k$ at random, and this implies we need a high-quality source of randomness (what’s called a cryptographically-secure pseudorandom number generator). In absence of that there are proposed deterministic methods for doing it. See this draft proposal of Thomas Pornin, and this paper of Daniel Bernstein for another.

Now to authenticate, we follow the procedure from earlier.

def authentic(signedMessage, basePoint, basePointOrder, publicKey):
modR = FiniteField(basePointOrder, 1)
(message, auxiliary, signature) = signedMessage

sigInverse = modR(signature).inverse() # sig can be an int or a modR already
c, d = sigInverse * modR(message), sigInverse * modR(auxiliary[0])

auxiliaryChecker = int(c) * basePoint + int(d) * publicKey
return auxiliaryChecker == auxiliary


Continuing with our example, we pick a message represented as an integer smaller than $r$, sign it, and validate it.

>>> message = 123
>>> signedMessage = sign(message, basePoint, basePointOrder, secretKey)
>>> signedMessage
(123, (220 (mod 1061), 234 (mod 1061)), 88 (mod 349))
>>> authentic(signedMessage, basePoint, basePointOrder, publicKey)
True


So there we have it, a nice implementation of the digital signature algorithm.

## When Digital Signatures Fail

As we mentioned, it’s extremely important to avoid using the same $k$ for two different messages. If you do, then you’ll get two signed messages $(m_1, A_1, g_1), (m_2, A_2, g_2)$, but by definition the two $g$‘s have a ton of information in common! An attacker can recognize this immediately because $A_1 = A_2$, and figure out the secret key $s$ as follows. First write

$\displaystyle g_1 - g_2 \equiv k^{-1}(m_1 + sx) - k^{-1}(m_2 + sx) \equiv k^{-1}(m_1 - m_2) \mod r$.

Now we have something of the form $\text{known}_1 \equiv (k^{-1}) \text{known}_2 \mod r$, and similarly to the attack described earlier we can try all possibilities until we find a number that satisfies $A = kQ$. Then once we have $k$ we have already seen how to find $s$. Indeed, it would be a good exercise for the reader to implement this attack.

The attack we just described it not an idle threat. Indeed, the Sony corporation, producers of the popular Playstation video game console, made this mistake in signing software for Playstation 3. A digital signature algorithm makes sense to validate software, because Sony wants to ensure that only Sony has the power to publish games. So Sony developers act as one party signing the data on a disc, and the console will only play a game with a valid signature. Note that the asymmetric setup is necessary because if the console had shared a secret with Sony (say, stored as plaintext within the hardware of the console), anyone with physical access to the machine could discover it.

Now here come the cringing part. Sony made the mistake of using the same $k$ to sign every game! Their mistake was discovered in 2010 and made public at a cryptography conference. This video of the humorous talk includes a description of the variant Sony used and the attacker describe how the mistake should have been corrected. Without a firmware update (I believe Sony’s public key information was stored locally so that one could authenticate games without an internet connection), anyone could sign a piece of software and create games that are indistinguishable from something produced by Sony. That includes malicious content that, say, installs software that sends credit card information to the attacker.

So here we have a tidy story: a widely used cryptosystem with a scare story of what will go wrong when you misuse it. In the future of this series, we’ll look at other things you can do with elliptic curves, including factoring integers and testing for primality. We’ll also see some normal forms of elliptic curves that are used in place of the Weierstrass normal form for various reasons.

Until next time!