The Blum-Blum-Shub Pseudorandom Generator

Problem: Design a random number generator that is computationally indistinguishable from a truly random number generator.

Solution (in Python): note this solution uses the Miller-Rabin primality tester, though any primality test will do. See the github repository for the referenced implementation.

from randomized.primality import probablyPrime
import random

def goodPrime(p):
    return p % 4 == 3 and probablyPrime(p, accuracy=100)

def findGoodPrime(numBits=512):
    candidate = 1
    while not goodPrime(candidate):
        candidate = random.getrandbits(numBits)
    return candidate

def makeModulus():
    return findGoodPrime() * findGoodPrime()

def parity(n):
    return sum(int(x) for x in bin(n)[2:]) % 2

class BlumBlumShub(object):
    def __init__(self, seed=None):
        self.modulus = makeModulus()
        self.state = seed if seed is not None else random.randint(2, self.modulus - 1)
        self.state = self.state % self.modulus

    def seed(self, seed):
        self.state = seed

    def bitstream(self):
        while True:
            yield parity(self.state)
            self.state = pow(self.state, 2, self.modulus)

    def bits(self, n=20):
        outputBits = ''
        for bit in self.bitstream():
            outputBits += str(bit)
            if len(outputBits) == n:
                break

        return outputBits

Discussion:

An integer x is called a quadratic residue of another integer N if it can be written as x = a^2 \mod N for some a. That is, if it’s the remainder when dividing a perfect square by N. Some numbers, like N=8, have very special patterns in their quadratic residues, only 0, 1, and 4 can occur as quadratic residues.

The core idea behind this random number generator is that, for a specially chosen modulus N, telling whether a number x is a quadratic residue mod N is hard. In fact, one can directly convert an algorithm that can predict the next bit of this random number generator (by even a slight edge) into an arbitrarily accurate quadratic-residue-decider. So if computing quadratic residues is even mildly hard, then predicting the next bit in this random number generator is very hard.

More specifically, the conjectured guarantee about this random number generator is the following: if you present a polynomial time adversary with two sequences:

  1. A truly random sequence of bits of length k,
  2. k bits from the output of the pseudorandom generator when seeded with a starting state shorter than k bits.

Then the adversary can’t distinguish between the two sequences with probability “significantly” more than 1/2, where by “significantly” I mean 1/k^c for any c>0 (i.e., the edge over randomness vanishes faster than any inverse polynomial). It turns out, due to a theorem of Yao, that this is equivalent to not being able to guess the next bit in a pseudorandom sequence with a significant edge over a random guess, even when given the previous \log(N)^{10} bits in the sequence (or any \textup{poly}(\log N) bits in the sequence).

This emphasizes a deep philosophical viewpoint in theoretical computer science, that whether some object has a property (randomness) really only depends on the power of a computationally limited observer to identify that property. If nobody can tell the difference between fake randomness and real randomness, then the fake randomness is random. Offhand I wonder whether you can meaningfully apply this view to less mathematical concepts like happiness and status.

Anyway, the modulus N is chosen in such a way that every quadratic residue of N has a unique square root which is also a quadratic residue. This makes the squaring function a bijection on quadratic residues. In other words, with a suitably chosen N, there’s no chance that we’ll end up with N=8 where there are very few quadratic residues and the numbers output by the Blum-Blum-Shub generator have a short cycle. Moreover, the assumption that detecting quadratic residues mod N is hard makes the squaring function a one-way permutation.

Here’s an example of how this generator might be used:

generator = BlumBlumShub()

hist = [0] * 2**6
for i in range(10000):
    value = int(generator.bits(6), 2)
    hist[value] += 1

print(hist)

This produces random integers between 0 and 64, with the following histogram:

bbs-hist

See these notes of Junod for a detailed exposition of the number theory behind this random number generator, with full definitions and proofs.

Learning to Love Complex Numbers

This post is intended for people with a little bit of programming experience and no prior mathematical background.

So let’s talk about numbers.

Numbers are curious things. On one hand, they represent one of the most natural things known to humans, which is quantity. It’s so natural to humans that even newborn babies are in tune with the difference between quantities of objects between 1 and 3, in that they notice when quantity changes much more vividly than other features like color or shape.

But our familiarity with quantity doesn’t change the fact that numbers themselves (as an idea) are a human invention. And they’re not like most human inventions, the kinds where you have to tinker with gears or circuits to get a machine that makes your cappuccino. No, these are mathematical inventions. These inventions exist only in our minds.

Numbers didn’t always exist. A long time ago, back when the Greeks philosophers were doing their philosophizing, negative numbers didn’t exist! In fact, it wasn’t until 1200 AD that the number zero was first considered in Europe. Zero, along with negative numbers and fractions and square roots and all the rest, were invented primarily to help people solve more problems than they could with the numbers they had available. That is, numbers were invented primarily as a way for people to describe their ideas in a useful way. People simply  wondered “is there a number whose square gives you 2?” And after a while they just decided there was and called it \sqrt{2} because they didn’t have a better name for it. 

But with these new solutions came a host of new problems. You see, although I said mathematical inventions only exist in our minds, once they’re invented they gain a life of their own. You start to notice patterns in your mathematical objects and you have to figure out why they do the things they do. And numbers are a perfectly good example of this: once I notice that I can multiply a number by itself, I can ask how often these “perfect squares” occur. That is, what’s the pattern in the numbers 1^2, 2^2, 3^2, 4^2, \dots? If you think about it for a while, you’ll find that square numbers have a very special relationship with odd numbers.

Other times, however, the things you invent turn out to make no sense at all, and you can prove they never existed in the first place! It’s an odd state of affairs, but we’re going to approach the subject of complex numbers from this mindset. We’re going to come up with a simple idea, the idea that negative numbers can be perfect squares, and explore the world of patterns it opens up. Along the way we’ll do a little bit of programming to help explore, give some simple proofs to solidify our intuition, and by the end we’ll see how these ideas can cause wonderful patterns like this one:

mandelbrot

The number i

Let’s bring the story back around to squares. One fact we all remember about numbers is that squaring a number gives you something non-negative. 7^2 = 49, (-2)^2 = 4, 0^2 = 0, and so on. But it certainly doesn’t have to be this way. What if we got sick of that stupid fact and decided to invent a new number whose square was negative? Which negative, you ask? Well it doesn’t really matter, because I can always stretch it larger or smaller so that it’s square is -1.

Let’s see how: if you say that your made-up number x makes x^2 = -7, then I can just use \frac{x}{\sqrt{7}} to get a number whose square is -1. If you’re going to invent a number that’s supposed to interact with our usual numbers, then you have to be allowed to add, subtract, and multiply x with regular old real numbers, and the usual properties would have to still work. So it would have to be true that (x / \sqrt{7})^2 = x^2 / \sqrt{7}^2 = -7/7 = -1.

So because it makes no difference (this is what mathematicians mean by, “without loss of generality”) we can assume that the number we’re inventing will have a square of negative one. Just to line up with history, let’s call the new number i. So there it is: i exists and i^2 = -1. And now that we are “asserting” that i plays nicely with real numbers, we get these natural rules for adding and subtracting and multiplying and dividing. For example

  • 1 + i is a new number, which we’ll just call 1+i. And if we added two of these together, (1+ i) + (1+i), we can combine the real parts and the i parts to get 2 + 2i. Same goes for subtraction. In general a complex number looks like a + bi, because as we’ll see in the other points you can simplify every simple arithmetic expression down to just one “real number” part and one “real number times i” part.
  • We can multiply 3 \cdot i, and we’ll just call it 3i, and we require that multiplication distributes across addition (that the FOIL rule works). So that, for example, (2 - i)(1 + 3i) = (2 + 6i - i - 3i^2) = (2 + 3) + (6i - i) = (5 + 5i).
  • Dividing is a significantly more annoying. Say we want to figure out what 1 / (1+i) is (in fact, it’s not even obvious that this should look like a regular number! But it does). The 1 / a notation just means we’re looking for a number which, when we multiply by the denominator a, we get back to 1. So we’re looking to find out when (a + bi)(1 + i) = 1 + 0i where a and b are variables we’re trying to solve for. If we multiply it out we get (a-b) + (a + b)i = 1 + 0i, and since the real part and the i part have to match up, we know that a - b = 1 and a + b = 0. If we solve these two equations, we find that a = 1/2, b = -1/2 works great. If we want to figure out something like (2 + 3i) / (1 - i), we just find out what 1 / (1- i) is first, and then multiply the result by (2+3i).

So that was tedious and extremely boring, and we imagine you didn’t even read it (that’s okay, it really is boring!). All we’re doing is establishing ground rules for the game, so if you come across some arithmetic that doesn’t make sense, you can refer back to this list to see what’s going on. And once again, for the purpose of this post, we’re asserting that all these laws hold. Maybe some laws follow from others, but as long as we don’t come up with any nasty self-contradictions we’ll be fine.

And now we turn to the real questions: is i the only square root of -1? Does i itself have a square root? If it didn’t, we’d be back to where we started, with some numbers (the non-i numbers) having square roots while others don’t. And so we’d feel the need to make all the i numbers happy by making up more numbers to be their square roots, and then worrying what if these new numbers don’t have square roots and…gah!

I’ll just let you in on the secret to save us from this crisis. It turns out that i does have a square root in terms of other i numbers, but in order to find it we’ll need to understand i from a different angle, and that angle turns out to be geometry.

Geometry? How is geometry going to help me understand numbers!?

It’s a valid question and part of why complex numbers are so fascinating. And I don’t mean geometry like triangles and circles and parallel lines (though there will be much talk of angles), I mean transformations in the sense that we’ll be “stretching,” “squishing,” and “rotating” numbers. Maybe another time I can tell you why for me “geometry” means stretching and rotating; it’s a long but very fun story.

The clever insight is that you can represent complex numbers as geometric objects in the first place. To do it, you just think of a + bi as a pair of numbers (a,b), (the pair of real part and i part), and then plot that point on a plane. For us, the x-axis will be the “real” axis, and the y-axis will be the i-axis. So the number (3 - 4i) is plotted 3 units in the positive x direction and 4 units in the negative y direction. Like this:

single-complex-number

The “j” instead of “i” is not a typo, but a disappointing fact about the programming language we used to make this image. We’ll talk more about why later.

We draw it as an arrow for a good reason. Stretching, squishing, rotating, and reflecting will all be applied to the arrow, keeping its tail fixed at the center of the axes. Sometimes the arrow is called a “vector,” but we won’t use that word because here it’s synonymous with “complex number.”

So let’s get started squishing stuff.

Stretching, Squishing, Rotating

Before we continue I should clear up some names. We call a number that has an i in it a complex number, and we call the part without the i the real part (like 2 in 2-i) and the part with i the complex part.

Python is going to be a great asset for us in exploring complex numbers, so let’s jump right into it. It turns out that Python natively supports complex numbers, and I wrote a program for drawing complex numbers. I used it to make the plot above. The program depends on a library I hate called matplotlib, and so the point of the program is to shield you from as much pain as possible and focus on complex numbers. You can use the program by downloading it from this blog’s Github page, along with everything else I made in writing this post. All you need to know how to do is call a function, and I’ve done a bit of window dressing removal to simplify things (I really hate matplotlib).

Here’s the function header:

# plotComplexNumbers : [complex] -> None
# display a plot of the given list of complex numbers
def plotComplexNumbers(numbers):
   ...

Before we show some examples of how to use it, we have to understand how to use complex numbers in Python. It’s pretty simple, except that Python was written by people who hate math, and so they decided the complex number would be represented by j instead of i (people who hate math are sometimes called “engineers,” and they use j out of spite. Not really, though).

So in Python it’s just like any other computation. For example:

>>> (1 + 1j)*(4 - 2j) == (6+2j)
True
>>> 1 / (1+1j)
(0.5-0.5j)

And so calling the plotting function with a given list of complex numbers is as simple as importing the module and calling the function

from plotcomplex import plot
plot.plotComplexNumbers([(-1+1j), (1+2j), (-1.5 - 0.5j), (.6 - 1.8j)])

Here’s the result

example-complex-plot

So let’s use plots like this one to explore what “multiplication by i” does to a complex number. It might not seem exciting at first, but I promise there’s a neat punchline.

Even without plotting it’s pretty easy to tell what multiplying by i does to some numbers. It takes 1 to i, moves i to i^2 = -1, it takes -1 to -i, and -i to -i \cdot i = 1.

What’s the pattern in these? well if we plot all these numbers, they’re all at right angles in counter-clockwise order. So this might suggest that multiplication by i does some kind of rotation. Is that always the case? Well lets try it with some other more complicated numbers. Click the plots below to enlarge.

Well, it looks close but it’s hard to tell. Some of the axes are squished and stretched, so it might be that our images don’t accurately represent the numbers (the real world can be such a pain). Well when visual techniques fail, we can attempt to prove it.

Clearly multiplying by i does some kind of rotation, maybe with other stuff too, and it shouldn’t be so hard to see that multiplying by i does the same thing no matter which number you use (okay, the skeptical readers will say that’s totally hard to see, but we’ll prove it super rigorously in a minute). So if we take any number and multiply it by i once, then twice, then three times, then four, and if we only get back to where we started at four multiplications, then each rotation had to be a quarter turn.

Indeed,

\displaystyle (a + bi) i^4 = (ai - b) i^3 = (-a - bi) i^2 = (-ai + b) i = a + bi

This still isn’t all that convincing, and we want to be 100% sure we’re right. What we really need is a way to arithmetically compute the angle between two complex numbers in their plotted forms. What we’ll do is find a way to measure the angle of one complex number with the x-axis, and then by subtraction we can get angles between arbitrary points. For example, in the figure below \theta = \theta_1 - \theta_2.

angle-example

One way to do this is with trigonometry: the geometric drawing of a + bi is the hypotenuse of a right triangle with the x-axis.

triangle-example

And so if r is the length of the arrow, then by the definition of sine and cosine, \cos(\theta) = a/r, \sin(\theta) = b/r. If we have r, \theta, and r > 0, we can solve for a unique a and b, so instead of representing a complex number in terms of the pair of numbers (a,b), we can represent it with the pair of numbers (r, \theta). And the conversion between the two is just

a + bi = r \cos(\theta) + (r \sin(\theta)) i

The (r, \theta) representation is called the polar representation, while the (a,b) representation is called the rectangular representation or the Cartesian representation. Converting between polar and Cartesian coordinates fills the pages of many awful pre-calculus textbooks (despite the fact that complex numbers don’t exist in classical calculus). Luckily for us Python has built-in functions to convert between the two representations for us.

>>> import cmath
>>> cmath.polar(1 + 1j)
(1.4142135623730951, 0.7853981633974483)
>>> z = cmath.polar(1 + 1j)
>>> cmath.rect(z[0], z[1])
(1.0000000000000002+1j)

It’s a little bit inaccurate on the rounding, but it’s fine for our purposes.

So how do we compute the angle between two complex numbers? Just convert each to the polar form, and subtract the second coordinates. So if we get back to our true goal, to figure out what multiplication by i does, we can just do everything in polar form. Here’s a program that computes the angle between two complex numbers.

def angleBetween(z, w):
   zPolar, wPolar = cmath.polar(z), cmath.polar(w)
   return wPolar[1] - zPolar[1]

print(angleBetween(1 + 1j, (1 + 1j) * 1j))
print(angleBetween(2 - 3j, (2 - 3j) * 1j))
print(angleBetween(-0.5 + 7j, (-0.5 + 7j) * 1j))

Running it gives

1.5707963267948966
1.5707963267948966
-4.71238898038469

Note that the decimal form of \pi/2 is 1.57079…, and that the negative angle is equivalent to \pi/2 if you add a full turn of 2\pi to it. So programmatically we can see that for every input we try multiplying by i rotates 90 degrees.

But we still haven’t proved it works. So let’s do that now. To say what the angle is between r \cos (\theta) + ri \sin (\theta) and i \cdot [r \cos (\theta) + ri \sin(\theta)] = -r \sin (\theta) + ri \cos(\theta), we need to transform the second number into the usual polar form (where the i is on the sine part and not the cosine part). But we know, or I’m telling you now, this nice fact about sine and cosine:

\displaystyle \sin(\theta + \pi/2) = cos(\theta)
\displaystyle \cos(\theta + \pi / 2) = -\sin(\theta)

This fact is maybe awkward to write out algebraically, but it’s just saying that if you shift the whole sine curve a little bit you get the cosine curve, and if you keep shifting it you get the opposite of the sine curve (and if you kept shifting it even more you’d eventually get back to the sine curve; they’re called periodic for this reason).

So immediately we can rewrite the second number as r \cos(\theta + \pi/2) + i r \sin (\theta + \pi/2). The angle is the same as the original angle plus a right angle of \pi/2. Neat!

Applying this same idea to (a + bi) \cdot (c + di), it’s not much harder to prove that multiplying two complex numbers in general multiplies their lengths and adds their angles. So if a complex number z has its magnitude r smaller than 1, multiplying by z squishes and rotates whatever is being multiplied. And if the magnitude is greater than 1, it stretches and rotates. So we have a super simple geometric understanding of how arithmetic with complex numbers works. And as we’re about to see, all this stretching and rotating results in some really weird (and beautifully mysterious!) mathematics and programs.

But before we do that we still have one question to address, the question that started this whole geometric train of thought: does i have a square root? Indeed, I’m just looking for a number such that, when I square its length and double its angle, I get i = \cos(\pi/2) + i \sin(\pi/2). Indeed, the angle we want is \pi/4, and the length we want is r = 1, which means \sqrt{i} = \cos(\pi/4) + i \sin(\pi/4). Sweet! There is another root if you play with the signs, see if you can figure it out.

In fact it’s a very deeper and more beautiful theorem (“theorem” means “really important fact”) called the fundamental theorem of algebra. And essentially it says that the complex numbers are complete. That is, we can always find square roots, cube roots, or anything roots of numbers involving i. It actually says a lot more, but it’s easier to appreciate the rest of it after you do more math than we’re going to do in this post.

On to pretty patterns!

The Fractal

So here’s a little experiment. Since every point in the plane is the end of some arrow representing a complex number, we can imagine transforming the entire complex plane by transforming each number by the same rule. The most interesting simple rule we can think of: squaring! So though it might strain your capacity for imagination, try to visualize the idea like this. Squaring a complex number is the same as squaring it’s length and doubling its angle. So imagine: any numbers whose arrows are longer than 1 will grow much bigger, arrows shorter than 1 will shrink, and arrows of length exactly one will stay the same length (arrows close to length 1 will grow/shrink much more slowly than those far away from 1). And complex numbers with small positive angles will increase their angle, but only a bit, while larger angles will grow faster.

Here’s an animation made by Douglas Arnold showing what happens to the set of complex numbers a + bi with 0 \leq a, b \leq 1 or -1 < a,b < 0. Again, imagine every point is the end of a different arrow for the corresponding complex number. The animation is for a single squaring, and the points move along the arc they would travel if one rotated/stretched them smoothly.

complex-squaring

So that’s pretty, but this is by all accounts a well-behaved transformation. It’s “predictable,” because for example we can always tell which complex numbers will get bigger and bigger (in length) and which will get smaller.

What if, just for the sake of tinkering, we changed the transformation a little bit? That is, instead of sending z = a+bi to z^2 (I’ll often write this z \mapsto z^2), what if we sent

\displaystyle z \mapsto z^2 + 1

Now it’s not so obvious: which numbers will grow and which will shrink? Notice that it’s odd because adding 1 only changes the real part of the number. So a number whose length is greater than 1 can become small under this transformation. For example, i is sent to 0, so something slightly larger would also be close to zero. Indeed, 5i/4 \mapsto -9/16.

So here’s an interesting question: are there any complex numbers that will stay small even if I keep transforming like this forever? Specifically, if I call f(z) = z^2, and I call f^2(z) = f(f(z)), and likewise call f^k(z) for k repeated transformations of z, is there a number z so that for every k, the value f^k(z) < 2? “Obvious” choices like z=0 don’t work, and neither do random guesses like z=i or z=1. So should we guess the answer is no?

Before we jump to conclusions let’s write a program to see what happens for more than our random guesses. The program is simple: we’ll define the “square plus one” function, and then repeatedly apply that function to a number for some long number of times (say, 250 times). If the length of the number stays under 2 after so many tries, we’ll call it “small forever,” and otherwise we’ll call it “not small forever.”

def squarePlusOne(z):
   return z*z + 1

def isSmallForever(z, f):
   k = 0

   while abs(z) < 2: z = f(z) k += 1 if k > 250:
         return True

   return False

This isSmallForever function is generic: you can give it any function f and it will repeatedly call f on z until the result grows bigger than 2 in length. Note that the abs function is a built-in Python function for computing the length of a complex number.

Then I wrote a classify function, which you can give a window and a small increment, and it will produce a grid of zeros and ones marking the results of isSmallForever. The details of the function are not that important. I also wrote a function that turns the grid into a picture. So here’s an example of how we’d use it:

from plotcomplex.plot import gridToImage

def classifySquarePlusOne(z):
   return isSmallForever(z, squarePlusOne)

grid = classify(classifySquarePlusOne) # the other arguments are defaulted to [-2,2], [-2,2], 0.1
gridToImage(grid)

And here’s the result. Points colored black grow beyond 2, and white points stay small for the whole test.

Looks like they'll always grow big.

Looks like they’ll always grow big.

So it looks like repeated squaring plus one will always make complex numbers grow big. That’s not too exciting, but we can always make it more exciting. What happens if we replace the 1 in z^2 + 1 with a different complex number? For example, if we do z^2 - 1 then will things always grow big?

You can randomly guess and see that 0 will never grow big, because 0^2 - 1 = -1 and (-1)^2 - 1 = 0. It will just oscillate forever. So with -1 some numbers will grow and some will not! Let’s use the same routine above to see which:

def classifySquareMinusOne(z):
      return isSmallForever(z, squareMinusOne)

grid = classify(classifySquareMinusOne)
gridToImage(grid)

And the result:

second-attempt

Now that’s a more interesting picture! Let’s ramp up the resolution

grid = classify(classifySquareMinusOne, step=0.001)
gridToImage(grid)

second-attempt-zoomed

Gorgeous. If you try this at home you’ll notice, however, that this took a hell of a long time to run. Speeding up our programs is very possible, but it’s a long story for another time. For now we can just be patient.

Indeed, this image has a ton of interesting details! It looks almost circular in the middle, but if we zoom in we can see that it’s more like a rippling wave

second-attempt-zoomed2

It’s pretty incredible, and a huge question is jumping out at me: what the heck is causing this pattern to occur? What secret does -1 know that +1 doesn’t that makes the resulting pattern so intricate?

But an even bigger question is this. We just discovered that some values of c make z \mapsto z^2 + c result in interesting patterns, and some do not! So the question is which ones make interesting patterns? Even if we just, say, fix the starting point to zero: what is the pattern in the complex numbers that would tell me when this transformation makes zero blow up, and when it keeps zero small?

Sounds like a job for another program. This time we’ll use a nice little Python feature called a closure, which we define a function that saves the information that exists when it’s created for later. It will let us write a function that takes in c and produces a function that transforms according to z \mapsto z^2+c.

def squarePlusC(c):
   def f(z):
      return z*z + c

   return f

And we can use the very same classification/graphing function from before to do this.

def classifySquarePlusC(c):
   return isSmallForever(0, squarePlusC(c))

grid = classify(classifySquarePlusC, xRange=(-2, 1), yRange=(-1, 1), step=0.005)
gridToImage(grid)

And the result:

mandelbrot

Stunning. This wonderful pattern, which is still largely not understood today, is known as the Mandelbrot set. That is, the white points are the points in the Mandlebrot set, and the black points are not in it. The detail on the border of this thing is infinitely intricate. For example, we can change the window in our little program to zoom in on a particular region.

mandelbrot-zoomed

And if you keep zooming in you keep getting more and more detail. This was true of the specific case of z^2 - 1, but somehow the patterns in the Mandelbrot set are much more varied and interesting. And if you keep going down eventually you’ll see patterns that look like the original Mandelbrot set. We can already kind of see that happening above. The name for this idea is a fractal, and the z^2 - 1 image has it too. Fractals are a fascinating and mysterious subject studied in a field called discrete dynamical systems. Many people dedicate their entire lives to studying these things, and it’s for good reason. There’s a lot to learn and even more that’s unknown!

So this is the end of our journey for now. I’ve posted all of the code we used in the making of this post so you can continue to play, but here are some interesting ideas.

  • The Mandelbrot set (and most fractals) are usually colored. The way they’re colored is as follows. Rather than just say true or false when zero blows up beyond 2 in length, you return the number of iterations k that happened. Then you pick a color based on how big k is. There’s a link below that lets you play with this. In fact, adding colors shows that there is even more intricate detail happening outside the Mandelbrot set that’s too faint to see in our pictures above. Such as this.
  • Some very simple questions about fractals are very hard to answer. For example, is the Mandelbrot set connected? That is, is it possible to “walk” from every point in the Mandelbrot set to every other point without leaving the set? Despite the scattering of points in the zoomed in picture above that suggest the answer is no, the answer is actually yes! This is a really difficult thing to prove, however.
  • The patterns in many fractals are often used to generate realistic looking landscapes and generate pseudo randomness. So fractals are not just mathematical curiosities.
  • You should definitely be experimenting with this stuff! What happens if you change the length threshold from 2 to some bigger number? What about a smaller number? What if you do powers different than 2? There’s so much to explore!
  • The big picture thing to take away from this is that it’s not the numbers themselves that are particularly interesting, it’s the transformations of the numbers that generate these patterns! The interesting questions are what kinds of things are the same under these transformations, and what things are different. This is a very general idea in mathematics, and the more math you do the more you’ll find yourself wondering about useful and bizarre transformations.

For the chance to keep playing with the Mandelbrot set, check out this Mandelbrot grapher that works in your browser. It lets you drag rectangles to zoom further in on regions of interest. It’s really fun.

Until next time!

Elliptic Curve Diffie-Hellman

So far in this series we’ve seen elliptic curves from many perspectives, including the elementary, algebraic, and programmatic ones. We implemented finite field arithmetic and connected it to our elliptic curve code. So we’re in a perfect position to feast on the main course: how do we use elliptic curves to actually do cryptography?

History

As the reader has heard countless times in this series, an elliptic curve is a geometric object whose points have a surprising and well-defined notion of addition. That you can add some points on some elliptic curves was a well-known technique since antiquity, discovered by Diophantus. It was not until the mid 19th century that the general question of whether addition always makes sense was answered by Karl Weierstrass. In 1908 Henri Poincaré asked about how one might go about classifying the structure of elliptic curves, and it was not until 1922 that Louis Mordell proved the fundamental theorem of elliptic curves, classifying their algebraic structure for most important fields.

While mathematicians have always been interested in elliptic curves (there is currently a million dollar prize out for a solution to one problem about them), its use in cryptography was not suggested until 1985. Two prominent researchers independently proposed it: Neal Koblitz at the University of Washington, and Victor Miller who was at IBM Research at the time. Their proposal was solid from the start, but elliptic curves didn’t gain traction in practice until around 2005. More recently, the NSA was revealed to have planted vulnerable national standards for elliptic curve cryptography so they could have backdoor access. You can see a proof and implementation of the backdoor at Aris Adamantiadis’s blog. For now we’ll focus on the cryptographic protocols themselves.

The Discrete Logarithm Problem

Koblitz and Miller had insights aplenty, but the central observation in all of this is the following.

Adding is easy on elliptic curves, but undoing addition seems hard.

What I mean by this is usually called the discrete logarithm problem. Here’s a formal definition. Recall that an additive group is just a set of things that have a well-defined addition operation, and the that notation ny means y + y + \dots + y (n times).

Definition: Let G be an additive group, and let x, y be elements of G so that x = ny for some integer n. The discrete logarithm problem asks one to find n when given x and y.

I like to give super formal definitions first, so let’s do a comparison. For integers this problem is very easy. If you give me 12 and 4185072, I can take a few seconds and compute that 4185072 = (348756) 12 using the elementary-school division algorithm (in the above notation, y=12, x=4185072, and n = 348756). The division algorithm for integers is efficient, and so it gives us a nice solution to the discrete logarithm problem for the additive group of integers \mathbb{Z}.

The reason we use the word “logarithm” is because if your group operation is multiplication instead of addition, you’re tasked with solving the equation x = y^n for n. With real numbers you’d take a logarithm of both sides, hence the name. Just in case you were wondering, we can also solve the multiplicative logarithm problem efficiently for rational numbers (and hence for integers) using the square-and-multiply algorithm. Just square y until doing so would make you bigger than x, then multiply by y until you hit x.

But integers are way nicer than they need to be. They are selflessly well-ordered. They give us division for free. It’s a computational charity! What happens when we move to settings where we don’t have a division algorithm? In mathematical lingo: we’re really interested in the case when G is just a group, and doesn’t have additional structure. The less structure we have, the harder it should be to solve problems like the discrete logarithm. Elliptic curves are an excellent example of such a group. There is no sensible ordering for points on an elliptic curve, and we don’t know how to do division efficiently. The best we can do is add y to itself over and over until we hit x, and it could easily happen that n (as a number) is exponentially larger than the number of bits in x and y.

What we really want is a polynomial time algorithm for solving discrete logarithms. Since we can take multiples of a point very fast using the double-and-add algorithm from our previous post, if there is no polynomial time algorithm for the discrete logarithm problem then “taking multiples” fills the role of a theoretical one-way function, and as we’ll see this opens the door for secure communication.

Here’s the formal statement of the discrete logarithm problem for elliptic curves.

Problem: Let E be an elliptic curve over a finite field k. Let P, Q be points on E such that P = nQ for some integer n. Let |P| denote the number of bits needed to describe the point P. We wish to find an algorithm which determines n and has runtime polynomial in |P| + |Q|. If we want to allow randomness, we can require the algorithm to find the correct n with probability at least 2/3.

So this problem seems hard. And when mathematicians and computer scientists try to solve a problem for many years and they can’t, the cryptographers get excited. They start to wonder: under the assumption that the problem has no efficient solution, can we use that as the foundation for a secure communication protocol?

The Diffie-Hellman Protocol and Problem

Let’s spend the rest of this post on the simplest example of a cryptographic protocol based on elliptic curves: the Diffie-Hellman key exchange.

A lot of cryptographic techniques are based on two individuals sharing a secret string, and using that string as the key to encrypt and decrypt their messages. In fact, if you have enough secret shared information, and you only use it once, you can have provably unbreakable encryption! We’ll cover this idea in a future series on the theory of cryptography (it’s called a one-time pad, and it’s not all that complicated). All we need now is motivation to get a shared secret.

Because what if your two individuals have never met before and they want to generate such a shared secret? Worse, what if their only method of communication is being monitored by nefarious foes? Can they possibly exchange public information and use it to construct a shared piece of secret information? Miraculously, the answer is yes, and one way to do it is with the Diffie-Hellman protocol. Rather than explain it abstractly let’s just jump right in and implement it with elliptic curves.

As hinted by the discrete logarithm problem, we only really have one tool here: taking multiples of a point. So say we’ve chosen a curve C and a point on that curve Q. Then we can take some secret integer n, and publish Q and nQ for the world to see. If the discrete logarithm problem is truly hard, then we can rest assured that nobody will be able to discover n.

How can we use this to established a shared secret? This is where Diffie-Hellman comes in. Take our two would-be communicators, Alice and Bob. Alice and Bob each pick a binary string called a secret key, which in interpreted as a number in this protocol. Let’s call Alice’s secret key s_A and Bob’s s_B, and note that they don’t have to be the same. As the name “secret key” suggests, the secret keys are held secret. Moreover, we’ll assume that everything else in this protocol, including all data sent between the two parties, is public.

So Alice and Bob agree ahead of time on a public elliptic curve C and a public point Q on C. We’ll sometimes call this point the base point for the protocol.

Bob can cunningly do the following trick: take his secret key s_B and send s_B Q to Alice. Equally slick Alice computes s_A Q and sends that to Bob. Now Alice, having s_B Q , computes s_A s_B Q. And Bob, since he has s_A Q, can compute s_B s_A Q. But since addition is commutative in elliptic curve groups, we know s_A s_B Q = s_B s_A Q. The secret piece of shared information can be anything derived from this new point, for example its x-coordinate.

If we want to talk about security, we have to describe what is public and what the attacker is trying to determine. In this case the public information consists of the points Q, s_AQ, s_BQ. What is the attacker trying to figure out? Well she really wants to eavesdrop on their subsequent conversation, that is, the stuff that encrypt with their new shared secret s_As_BQ. So the attacker wants find out s_As_BQ. And we’ll call this the Diffie-Hellman problem.

Diffie-Hellman Problem: Suppose you fix an elliptic curve E over a finite field k, and you’re given four points Q, aQ, bQ and P for some unknown integers a, b. Determine if P = abQ in polynomial time (in the lengths of Q, aQ, bQ, P).

On one hand, if we had an efficient solution to the discrete logarithm problem, we could easily use that to solve the Diffie-Hellman problem because we could compute a,b and them quickly compute abQ and check if it’s P. In other words discrete log is at least as hard as this problem. On the other hand nobody knows if you can do this without solving the discrete logarithm problem. Moreover, we’re making this problem as easy as we reasonably can because we don’t require you to be able to compute abQ. Even if some prankster gave you a candidate for abQ, all you have to do is check if it’s correct. One could imagine some test that rules out all fakes but still doesn’t allow us to compute the true point, which would be one way to solve this problem without being able to solve discrete log.

So this is our hardness assumption: assuming this problem has no efficient solution then no attacker, even with really lucky guesses, can feasibly determine Alice and Bob’s shared secret.

Python Implementation

The Diffie-Hellman protocol is just as easy to implement as you would expect. Here’s some Python code that does the trick. Note that all the code produced in the making of this post is available on this blog’s Github page.

def sendDH(privateKey, generator, sendFunction):
   return sendFunction(privateKey * generator)

def receiveDH(privateKey, receiveFunction):
   return privateKey * receiveFunction()

And using our code from the previous posts in this series we can run it on a small test.

import os

def generateSecretKey(numBits):
   return int.from_bytes(os.urandom(numBits // 8), byteorder='big')

if __name__ == "__main__":
   F = FiniteField(3851, 1)
   curve = EllipticCurve(a=F(324), b=F(1287))
   basePoint = Point(curve, F(920), F(303))

   aliceSecretKey = generateSecretKey(8)
   bobSecretKey = generateSecretKey(8)

   alicePublicKey = sendDH(aliceSecretKey, basePoint, lambda x:x)
   bobPublicKey = sendDH(bobSecretKey, basePoint, lambda x:x)

   sharedSecret1 = receiveDH(bobSecretKey, lambda: alicePublicKey)
   sharedSecret2 = receiveDH(aliceSecretKey, lambda: bobPublicKey)
   print('Shared secret is %s == %s' % (sharedSecret1, sharedSecret2))

Pythons os module allows us to access the operating system’s random number generator (which is supposed to be cryptographically secure) via the function urandom, which accepts as input the number of bytes you wish to generate, and produces as output a Python bytestring object that we then convert to an integer. Our simplistic (and totally insecure!) protocol uses the elliptic curve C defined by y^2 = x^3 + 324 x + 1287 over the finite field \mathbb{Z}/3851. We pick the base point Q = (920, 303), and call the relevant functions with placeholders for actual network transmission functions.

There is one issue we have to note. Say we fix our base point Q. Since an elliptic curve over a finite field can only have finitely many points (since the field only has finitely many possible pairs of numbers), it will eventually happen that nQ = 0 is the ideal point. Recall that the smallest value of n for which nQ = 0 is called the order of Q. And so when we’re generating secret keys, we have to pick them to be smaller than the order of the base point. Viewed from the other angle, we want to pick Q to have large order, so that we can pick large and difficult-to-guess secret keys. In fact, no matter what integer you use for the secret key it will be equivalent to some secret key that’s less than the order of Q. So if an attacker could guess the smaller secret key he wouldn’t need to know your larger key.

The base point we picked in the example above happens to have order 1964, so an 8-bit key is well within the bounds. A real industry-strength elliptic curve (say, Curve25519 or the curves used in the NIST standards*) is designed to avoid these problems. The order of the base point used in the Diffie-Hellman protocol for Curve25519 has gargantuan order (like 2^{256}). So 256-bit keys can easily be used. I’m brushing some important details under the rug, because the key as an actual string is derived from 256 pseudorandom bits in a highly nontrivial way.

So there we have it: a simple cryptographic protocol based on elliptic curves. While we didn’t experiment with a truly secure elliptic curve in this example, we’ll eventually extend our work to include Curve25519. But before we do that we want to explore some of the other algorithms based on elliptic curves, including random number generation and factoring.

Comments on Insecurity

Why do we use elliptic curves for this? Why not do something like RSA and do multiplication (and exponentiation) modulo some large prime?

Well, it turns out that algorithmic techniques are getting better and better at solving the discrete logarithm problem for integers mod p, leading some to claim that RSA is dead. But even if we will never find a genuinely efficient algorithm (polynomial time is good, but might not be good enough), these techniques have made it clear that the key size required to maintain high security in RSA-type protocols needs to be really big. Like 4096 bits. But for elliptic curves we can get away with 256-bit keys. The reason for this is essentially mathematical: addition on elliptic curves is not as well understood as multiplication is for integers, and the more complex structure of the group makes it seem inherently more difficult. So until some powerful general attacks are found, it seems that we can get away with higher security on elliptic curves with smaller key sizes.

I mentioned that the particular elliptic curve we chose was insecure, and this raises the natural question: what makes an elliptic curve/field/basepoint combination secure or insecure? There are a few mathematical pitfalls (including certain attacks we won’t address), but one major non-mathematical problem is called a side-channel attack. A side channel attack against a cryptographic protocol is one that gains additional information about users’ secret information by monitoring side-effects of the physical implementation of the algorithm.

The problem is that different operations, doubling a point and adding two different points, have very different algorithms. As a result, they take different amounts of time to complete and they require differing amounts of power. Both of these can be used to reveal information about the secret keys. Despite the different algorithms for arithmetic on Weierstrass normal form curves, one can still implement them to be secure. Naively, one might pad the two subroutines with additional (useless) operations so that they have more similar time/power signatures, but I imagine there are better methods available.

But much of what makes a curve’s domain parameters mathematically secure or insecure is still unknown. There are a handful of known attacks against very specific families of parameters, and so cryptography experts simply avoid these as they are discovered. Here is a short list of pitfalls, and links to overviews:

  1. Make sure the order of your basepoint has a short facorization (e.g., is 2p, 3p, or 4p for some prime p). Otherwise you risk attacks based on the Chinese Remainder Theorem, the most prominent of which is called Pohlig-Hellman.
  2. Make sure your curve is not supersingular. If it is you can reduce the discrete logarithm problem to one in a different and much simpler group.
  3. If your curve C is defined over \mathbb{Z}/p, make sure the number of points on C is not equal to p. Such a curve is called prime-field anomalous, and its discrete logarithm problem can be reduced to the (additive) version on integers.
  4. Don’t pick a small underlying field like \mathbb{F}_{2^m} for small mGeneral-purpose attacks can be sped up significantly against such fields.
  5. If you use the field \mathbb{F}_{2^m}, ensure that m is prime. Many believe that if m has small divisors, attacks based on some very complicated algebraic geometry can be used to solve the discrete logarithm problem more efficiently than any general-purpose method. This gives evidence that m being composite at all is dangerous, so we might as well make it prime.

This is a sublist of the list provided on page 28 of this white paper.

The interesting thing is that there is little about the algorithm and protocol that is vulnerable. Almost all of the vulnerabilities come from using bad curves, bad fields, or a bad basepoint. Since the known attacks work on a pretty small subset of parameters, one potentially secure technique is to just generate a random curve and a random point on that curve! But apparently all respected national agencies will refuse to call your algorithm “standards compliant” if you do this.

Next time we’ll continue implementing cryptographic protocols, including the more general public-key message sending and signing protocols.

Until then!

Connecting Elliptic Curves with Finite Fields

So here we are. We’ve studied the general properties of elliptic curves, written a program for elliptic curve arithmetic over the rational numbers, and taken a long detour to get some familiarity with finite fields (the mathematical background and a program that implements arbitrary finite field arithmetic).

And now we want to get back on track and hook our elliptic curve program up with our finite field program to make everything work. And indeed, for most cases it’s just that simple! For example, take the point P = (2,1) on the elliptic curve y = x^3 + x + 1 with coefficients in \mathbb{Z}/5. Using purely code produced in previous posts, we can do arithmetic:

>>> F5 = FiniteField(5, 1)
>>> C = EllipticCurve(a=F5(1), b=F5(1))
>>> P = Point(C, F5(2), F5(1))
>>> P
(2 (mod 5), 1 (mod 5))
>>> 2*P
(2 (mod 5), 4 (mod 5))
>>> 3*P
Ideal

Here’s an example of the same curve y^2 = x^3 + x + 1 with coefficients over the finite field of order 25 \mathbb{F}_{5^2}.

>>> F25 = FiniteField(5,2)
>>> F25.idealGenerator
3 + 0 t^1 + 1 t^2
>>> curve = EllipticCurve(a=F25([1]), b=F25([1]))
>>> x = F25([2,1])
>>> y = F25([0,2])
>>> y*y - x*x*x - x - 1
0 ∈ F_{5^2}
>>> curve.testPoint(x,y)
True
>>> P = Point(curve, x, y)
>>> -P
(2 + 1 t^1, 0 + 3 t^1)
>>> P+P
(3 + 1 t^1, 2)
>>> 4*P
(3 + 2 t^1, 4 + 4 t^1)
>>> 9*P
Ideal

There are some subtle issues, though, in that we shouldn’t use the code we have to work over any finite field. But we’ve come very far and covered a lot of technical details, so let’s briefly remember how we got here.

Taking a Step Back

At the beginning there was only \mathbb{Q}, the field of rational numbers. We had a really nice geometric picture of elliptic curves over this field, and using that picture we developed an algorithm for (geometrically) adding points.

add-points-exampleIf we assume the equation of the elliptic curve had this nice form (the so-called Weierstrass normal form, y^2 = x^3 + ax + b), then we were able to translate the geometric algorithm into an algebraic one. This made it possible to write a program to perform the additions, and this was our first programmatic milestone. Along the way, we learned about groups and projective geometry, which I explained was the proper mathematical setting for elliptic curves. In that setting, we saw that for most fields, every elliptic curve could be modified into one in Weierstrass normal form without changing the algebraic structure of the set of solutions. Moreover, we saw that you can replace the field \mathbb{Q} with the field of your choice. The set of solutions to an elliptic curve still forms a group and the same algebraic point-adding algorithm works. It’s just an interesting quirk of mathematics that one way to represent elements of finite fields are as polynomial remainders when dividing by a “prime” polynomial (analogous to modular arithmetic with integers). So we spent a while actually implementing finite fields in terms of this representation.

The reader has probably heard of this, but in practice one uses a (very large) finite field for the coefficients of their elliptic curve. Often this is \mathbb{Z}/p for some really large prime p, or the field of 2^m elements for some large integer m. But one would naturally complain: there are so many (infinitely many!) finite fields to choose from! Which one should we use, and how did they choose these?

As with most engineering problems the answer is a trade-off, in this case between efficiency and security. Arithmetic is faster in fields of characteristic 2 (and easy to implement at the hardware level!) but a lot is known about the finite field of 2^m elements. In fact, if you are sloppy in picking m you’ll get no security at all! One prominent example is the so-called Weil descent attack, which breaks security assumptions for elliptic curve cryptography when m is not prime. These attacks use some sophisticated machinery, but this is how it goes. An abstract mathematical breakthrough can immediately invalidate cryptography based on certain elliptic curves.

But before we get neck-deep in cryptography we have an even bigger problem: for some finite fields, not every elliptic curve has a Weierstrass normal form! So our program isn’t expressive enough to represent all elliptic curves we might want to. We could avoid these curves in our applications, but that would be unnecessarily limiting. With a bit more careful work, we can devise a more general algorithm (and a different normal form) that works for all fields. But let’s understand the problem first.

In general, you can have an elliptic curve of the form \sum_{i+j=3} a_{i,j}x^iy^j = 0. That is, it’s just a really general degree 3 polynomial in two variables. If we assume the discriminant of this polynomial is nonzero, we’ll get a smooth curve. And then to get to the Weierstrass normal form involves a bunch of changes of variables. The problem is that the algebraic manipulations you do require you to multiply and divide by 2 and 3. In a field of either characteristic, these operations are either destructive (multiplying by zero) or totally illegal (dividing by zero), and they ruin Weierstrass’s day.

So what can we do?

Well it turns out that there is a more general Weierstrass normal form, unsurprisingly called the generalized Weierstrass normal form. It looks like this

\displaystyle y^2 + a_1 xy + a_3y = x^3 + a_2x^2 + a_4x + a_6

The same geometric idea of drawing lines works for this curve as well. It’s just that now the formula is way more complicated. It involves computing a bunch of helper constants and computing far more arithmetic. My colleague Daniel Ngheim was kind enough to code up the algorithm, and here it is

    def __add__(self, Q):
        if isinstance(Q, Ideal):
            return Point(self.curve, self.x, self.y)

        a1,a2,a3,a4,a6 = (self.curve.a1, self.curve.a2, self.curve.a3, self.curve.a4, self.curve.a6)

        if self.x == Q.x:
            x = self.x
            if self.y + Q.y + a1*x + a3 == 0:
                return Ideal(self.curve)
            else:
                c = ((3*x*x + 2*a2*x + a4 - a1*self.y) / (2*self.y + a1*x + a3))
                d = (-(x*x*x) + a4*x + 2*a6 - a3*self.y) / (2*self.y + a1*x + a3)
                Sum_x = c*c + a1*c - a2 - 2*self.x
                Sum_y = -(c + a1) * Sum_x - d - a3
                return Point(self.curve, Sum_x, Sum_y)
        else:
            c =  (Q.y - self.y) / (Q.x - self.x)
            d =  (self.y*Q.x - Q.y*self.x) / (Q.x - self.x)
            Sum_x = c*c + a1*c - a2 - self.x - Q.x
            Sum_y = -(c + a1)*Sum_x - d - a3
            return Point(self.curve, Sum_x, Sum_y)

   def __neg__(self):
      return Point(self.curve, self.x, -self.y - self.curve.a1*self.x - self.curve.a3)

I trust that the devoted reader could derive this algorithm by hand, but for a more detailed derivation see the book of Silverman (it’s a graduate level text, but the point is that if you’re not really serious about implementing elliptic curve cryptography then you shouldn’t worry about this more general algorithm).

One might start to wonder: are there still other forms of elliptic curves that we could use to get around some of the difficulties of the Weierstrass normal form? The answer is yes, but we’ll defer their discussion to a future post. The brief explanation is that through a different choice of variable changes you can get to a different form of curve, and the algorithms you get from writing out the algebraic equations for adding points are slightly more efficient.

For the remainder of this series we’ll just work with one family of finite fields, those fields of the form \mathbb{Z}/p for some large p. There is one particularly famous elliptic curve over this field that is used in some of the most secure applications in existence, and this will roughly be our target. In either case, we have provided the combined elliptic curve and finite field code (and the generalized elliptic curve class) on this blog’s Github page.

So in the next post we’ll actually start talking about cryptography and how to use elliptic curves to do things like generate a shared secret key.

Until then!