This is a guest post by my colleague Adam Lelkes.
The goal of this primer is to introduce an important and beautiful tool from probability theory, a model of fair betting games called martingales. In this post I will assume that the reader is familiar with the basics of probability theory. For those that need to refresh their knowledge, Jeremy’s excellent primers (1, 2) are a good place to start.
The Geometric Distribution and the ABRACADABRA Problem
Before we start playing with martingales, let's start with an easy exercise. Consider the following experiment: we throw an ordinary die repeatedly until the first time a six appears. How many throws will this take in expectation? The reader might recognize immediately that this exercise can be easily solved using the basic properties of the geometric distribution, which models this experiment exactly. We have independent trials, every trial succeeding with some fixed probability $p$. If $X$ denotes the number of trials needed to get the first success, then clearly $\Pr(X = k) = (1-p)^{k-1}p$ (since first we need $k-1$ failures which occur independently with probability $1-p$, then we need one success which happens with probability $p$). Thus the expected value of $X$ is

$E(X) = \sum_{k=1}^{\infty} k(1-p)^{k-1}p = \frac{1}{p}$

by basic calculus. In particular, if success is defined as getting a six, then $p = 1/6$, thus the expected time is $1/p = 6$.
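A quick Monte Carlo sanity check of that computation (a throwaway sketch of my own, not part of the argument):

```python
import random

def throws_until_six(rng):
    """Roll a fair die until the first six; return the number of rolls."""
    count = 0
    while True:
        count += 1
        if rng.randint(1, 6) == 6:
            return count

rng = random.Random(0)  # fixed seed so the experiment is repeatable
trials = 20000
average = sum(throws_until_six(rng) for _ in range(trials)) / trials
# the geometric distribution with p = 1/6 predicts an average of 6
```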
Now let us move on to a somewhat similar, but more interesting and difficult problem, the ABRACADABRA problem. Here we need two things for our experiment, a monkey and a typewriter. The monkey is asked to start bashing random keys on a typewriter. For simplicity’s sake, we assume that the typewriter has exactly 26 keys corresponding to the 26 letters of the English alphabet and the monkey hits each key with equal probability. There is a famous theorem in probability, the infinite monkey theorem, that states that given infinite time, our monkey will almost surely type the complete works of William Shakespeare. Unfortunately, according to astronomers the sun will begin to die in a few billion years, and the expected time we need to wait until a monkey types the complete works of William Shakespeare is orders of magnitude longer, so it is not feasible to use monkeys to produce works of literature.
So let’s scale down our goals, and let’s just wait until our monkey types the word ABRACADABRA. What is the expected time we need to wait until this happens? The reader’s first idea might be to use the geometric distribution again. ABRACADABRA is eleven letters long, the probability of getting one letter right is $1/26$, thus the probability of a random eleven-letter word being ABRACADABRA is exactly $(1/26)^{11}$. So if typing 11 letters is one trial, the expected number of trials is

$\frac{1}{(1/26)^{11}} = 26^{11}$

which means $11 \cdot 26^{11}$ keystrokes, right?
Well, not exactly. The problem is that we broke up our random string into eleven-letter blocks and waited until one block was ABRACADABRA. However, this word can start in the middle of a block. In other words, we considered a string a success only if the starting position of the word ABRACADABRA was divisible by 11. For example, FRZUNWRQXKLABRACADABRA would be recognized as success by this model but the same would not be true for AABRACADABRA. However, it is at least clear from this observation that $11 \cdot 26^{11}$ is a strict upper bound for the expected waiting time. To find the exact solution, we need one very clever idea, which is the following:
Let’s Open a Casino!
Do I mean that abandoning our monkey and typewriter and investing our time and money in a casino is a better idea, at least in financial terms? This might indeed be the case, but here we will use a casino to determine the expected wait time for the ABRACADABRA problem. Unfortunately we won’t make any money along the way (in expectation) since our casino will be a fair one.
Let’s do the following thought experiment: let’s open a casino next to our typewriter. Before each keystroke, a new gambler comes to our casino and bets $1 that the next letter will be A. If he loses, he goes home disappointed. If he wins, he bets all the money he won on the event that the next letter will be B. Again, if he loses, he goes home disappointed. (This won’t wreak havoc on his financial situation, though, as he only loses $1 of his own money.) If he wins again, he bets all the money on the event that the next letter will be R, and so on.
If a gambler wins, how much does he win? We said that the casino would be fair, i.e. the expected outcome should be zero. That means that if the gambler bets $1, he should receive $26 if he wins, since the probability of getting the next letter right is exactly $1/26$ (thus the expected value of the change in the gambler’s fortune is $\frac{1}{26} \cdot 25 + \frac{25}{26} \cdot (-1) = 0$).
Let’s keep playing this game until the word ABRACADABRA first appears and let’s denote the number of keystrokes up to this time as $T$. As soon as we see this word, we close our casino. How much was the revenue of our casino then? Remember that before each keystroke, a new gambler comes in and bets $1, and if he wins, he will only bet the money he has received so far, so our revenue will be exactly $T$ dollars.
How much will we have to pay for the winners? Note that the only winners in the last round are the players who bet on A. How many of them are there? There is one that just came in before the last keystroke and this was his first bet. He wins $26. There was one who came three keystrokes earlier and he made four successful bets (ABRA). He wins $26^4$ dollars. Finally there is the luckiest gambler who went through the whole ABRACADABRA sequence; his prize will be $26^{11}$ dollars. Thus our casino will have to give out $26^{11} + 26^4 + 26$ dollars in total, which is just under the price of 200,000 WhatsApp acquisitions.
Now we will make one crucial observation: even at the time when we close the casino, the casino is fair! Thus in expectation our expenses will be equal to our income. Our income is $T$ dollars, the expected value of our expenses is $26^{11} + 26^4 + 26$ dollars, thus $E(T) = 26^{11} + 26^4 + 26$. A beautiful solution, isn’t it? So if our monkey types at 150 characters per minute on average, we will have to wait around 47 million years until we see ABRACADABRA. Oh well.
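Incidentally, the casino argument works for any target word: every prefix that is also a suffix contributes one still-solvent gambler at closing time, so the expected waiting time is the sum of $26^L$ over those prefix lengths $L$. ABRACADABRA itself is far too slow to simulate, but here's a sketch (function names are mine) that checks the formula against a simulation for the word ABA over a two-letter alphabet, where the prediction is $2^3 + 2^1 = 10$:

```python
import random

def expected_wait(word, alphabet_size):
    """Sum alphabet_size**L over every prefix of word that is also a suffix
    (including the whole word) -- the casino argument's prediction."""
    n = len(word)
    return sum(alphabet_size ** L
               for L in range(1, n + 1)
               if word[:L] == word[n - L:])

def simulate_wait(word, alphabet, rng):
    """Type random letters until the word appears; return the keystroke count."""
    typed = ""
    while not typed.endswith(word):
        typed += rng.choice(alphabet)
    return len(typed)

rng = random.Random(1)
trials = 20000
average = sum(simulate_wait("ABA", "AB", rng) for _ in range(trials)) / trials
prediction = expected_wait("ABA", 2)   # 2**3 + 2**1 = 10
```

The same function reproduces the $26^{11} + 26^4 + 26$ answer for ABRACADABRA.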
Time to be More Formal
After giving an intuitive outline of the solution, it is time to formalize the concepts that we used, to translate our fairy tales into mathematics. The mathematical model of the fair casino is called a martingale, named after a class of betting strategies that enjoyed popularity in 18th century France. The gambler’s fortune (or the casino’s, depending on our viewpoint) can be modeled with a sequence of random variables. $X_0$ will denote the gambler’s fortune before the game starts, $X_1$ the fortune after one round and so on. Such a sequence of random variables is called a stochastic process. We will require the expected value of the gambler’s fortune to be always finite.
How can we formalize the fairness of the game? Fairness means that the gambler’s fortune does not change in expectation, i.e. the expected value of $X_{n+1}$, given $X_0, X_1, \dots, X_n$, is the same as $X_n$. This can be written as $E(X_{n+1} \mid X_0, X_1, \dots, X_n) = X_n$ or, equivalently, $E(X_{n+1} - X_n \mid X_0, X_1, \dots, X_n) = 0$.
The reader might be less comfortable with the first formulation. What does it mean, after all, that the conditional expected value of a random variable is another random variable? Shouldn’t the expected value be a number? The answer is that in order to have solid theoretical foundations for the definition of a martingale, we need a more sophisticated notion of conditional expectations. Such sophistication involves measure theory, which is outside the scope of this post. We will instead naively accept the definition above, and the reader can look up all the formal details in any serious probability text.
Clearly the fair casino we constructed for the ABRACADABRA exercise is an example of a martingale. Another example is the simple symmetric random walk on the number line: we start at 0, toss a coin in each step, and move one step in the positive or negative direction based on the outcome of our coin toss.
The Optional Stopping Theorem
Remember that we closed our casino as soon as the word ABRACADABRA appeared and we claimed that our casino was also fair at that time. In mathematical language, the closed casino is called a stopped martingale. The stopped martingale is constructed as follows: we wait until our martingale $X$ exhibits a certain behaviour (e.g. the word ABRACADABRA is typed by the monkey), and we define a new martingale $X'$ as follows: let $X'_n = X_n$ if $n < T$ and $X'_n = X_T$ if $n \ge T$, where $T$ denotes the stopping time, i.e. the time at which the desired event occurs. Notice that $T$ itself is a random variable.
We require our stopping time to depend only on the past, i.e. that at any time we should be able to decide whether the event that we are waiting for has already happened or not (without looking into the future). This is a very reasonable requirement. If we could look into the future, we could obviously cheat by closing our casino just before some gambler would win a huge prize.
We said that the expected wealth of the casino at the stopping time is the same as the initial wealth. This is guaranteed by Doob’s optional stopping theorem, which states that under certain conditions, the expected value of a martingale at the stopping time is equal to its expected initial value.
Theorem: (Doob’s optional stopping theorem) Let $X_n$ be a martingale stopped at step $T$, and suppose one of the following three conditions holds:
- The stopping time is almost surely bounded by some constant;
- The stopping time is almost surely finite and every step of the stopped martingale is almost surely bounded by some constant; or
- The expected stopping time is finite and the absolute values of the martingale increments are almost surely bounded by a constant.

Then $E(X_T) = E(X_0)$.
We omit the proof because it requires measure theory, but the interested reader can see it in these notes.
For applications, (1) and (2) are the trivial cases. In the ABRACADABRA problem, the third condition holds: the expected stopping time is finite (in fact, we showed using the geometric distribution that it is less than $11 \cdot 26^{11}$) and the absolute value of a martingale increment is either 1 or a net payoff which is bounded by $26^{11} + 26^4 + 26$. This shows that our solution is indeed correct.
Another famous application of martingales is the gambler’s ruin problem. This problem models the following game: there are two players, the first player has $a$ dollars, the second player has $b$ dollars. In each round they toss a coin and the loser gives one dollar to the winner. The game ends when one of the players runs out of money. There are two obvious questions: (1) what is the probability that the first player wins and (2) how long will the game take in expectation?
Let $X_n$ denote the change in the second player’s fortune after $n$ rounds, and set $X_0 = 0$. Let $T_k$ denote the first time when $X_n = k$. Then our first question can be formalized as trying to determine $\Pr(T_{-b} < T_a)$. Let $t = \min(T_{-b}, T_a)$. Clearly $t$ is a stopping time. By the optional stopping theorem we have that

$0 = E(X_0) = E(X_t) = -b \Pr(T_{-b} < T_a) + a(1 - \Pr(T_{-b} < T_a))$

thus $\Pr(T_{-b} < T_a) = \frac{a}{a+b}$.
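As a sanity check (a sketch of my own, not from the post), we can simulate the game and compare the first player's empirical win rate to $\frac{a}{a+b}$:

```python
import random

def first_player_wins(a, b, rng):
    """Play one fair game; track the change x in the second player's fortune.
    The first player wins when x hits -b before it hits a."""
    x = 0
    while -b < x < a:
        x += rng.choice((1, -1))
    return x == -b

rng = random.Random(2)
a, b, trials = 2, 3, 20000
win_rate = sum(first_player_wins(a, b, rng) for _ in range(trials)) / trials
# optional stopping predicts a / (a + b) = 2/5
```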
I would like to ask the reader to try to answer the second question. It is a little bit trickier than the first one, though, so here is a hint: $X_n^2 - n$ is also a martingale (prove it), and applying the optional stopping theorem to it leads to the answer.
A Randomized Algorithm for 2-SAT
The reader is probably familiar with 3-SAT, the first problem shown to be NP-complete. Recall that 3-SAT is the following problem: given a boolean formula in conjunctive normal form with at most three literals in each clause, decide whether there is a satisfying truth assignment. It is natural to ask why 3 is special, i.e. why don’t we work with $k$-SAT for some $k \neq 3$ instead? Clearly the hardness of the problem is monotone increasing in $k$ since $k$-SAT is a special case of $(k+1)$-SAT. On the other hand, SAT (without any bound on the number of literals per clause) is clearly in NP, thus 3-SAT is just as hard as $k$-SAT for any $k \ge 3$. So the only question is: what can we say about 2-SAT?
It turns out that 2-SAT is easier than satisfiability in general: 2-SAT is in P. There are many algorithms for solving 2-SAT. Here is one deterministic algorithm: associate a graph to the 2-SAT instance such that there is one vertex for each variable and each negated variable, and the literals $x$ and $y$ are connected by a directed edge if there is a clause $(\bar{x} \vee y)$. Recall that $\bar{x} \vee y$ is equivalent to $x \Rightarrow y$, so the edges show the implications between the literals. Clearly the 2-SAT instance is not satisfiable if there is a variable $x$ such that there are directed paths $x \to \bar{x}$ and $\bar{x} \to x$ (since $x \Leftrightarrow \bar{x}$ is always false). It can be shown that this is not only a sufficient but also a necessary condition for unsatisfiability, hence the 2-SAT instance is satisfiable if and only if there are no such paths. If there are directed paths from one vertex of a graph to another and vice versa then they are said to belong to the same strongly connected component. There are several graph algorithms for finding strongly connected components of directed graphs; the most well-known algorithms are all based on depth-first search.
Now we give a very simple randomized algorithm for 2-SAT (due to Christos Papadimitriou in a ’91 paper): start with an arbitrary truth assignment and while there are unsatisfied clauses, pick one and flip the truth value of a random literal in it. Stop after $100n^2$ rounds, where $n$ denotes the number of variables. Clearly if the formula is not satisfiable then nothing can go wrong, we will never find a satisfying truth assignment. If the formula is satisfiable, we want to argue that with high probability we will find a satisfying truth assignment in $O(n^2)$ steps.
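Here is one way to render that algorithm in Python (a sketch; the clause encoding and names are my own, not from the paper): a literal is a signed integer, $+i$ or $-i$ for variable $i$, and a clause is a pair of literals.

```python
import random

def solve_2sat(n, clauses, rng):
    """Randomized walk for 2-SAT: start from a random assignment and, while
    some clause is unsatisfied, flip a random literal of a random unsatisfied
    clause. Give up after 100*n*n flips and return None."""
    assignment = {i: rng.choice((True, False)) for i in range(1, n + 1)}

    def satisfied(lit):
        return assignment[abs(lit)] == (lit > 0)

    for _ in range(100 * n * n):
        unsatisfied = [c for c in clauses
                       if not satisfied(c[0]) and not satisfied(c[1])]
        if not unsatisfied:
            return assignment
        lit = rng.choice(rng.choice(unsatisfied))  # random clause, then literal
        assignment[abs(lit)] = not assignment[abs(lit)]
    return None

rng = random.Random(3)
# (x1 or x2) and (not x1 or x3) and (not x2 or not x3) -- satisfiable
clauses = [(1, 2), (-1, 3), (-2, -3)]
result = solve_2sat(3, clauses, rng)
```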
The idea of the proof is the following: fix an arbitrary satisfying truth assignment and consider the Hamming distance of our current assignment from it. The Hamming distance of two truth assignments (or in general, of two binary vectors) is the number of coordinates in which they differ. Since we flip one bit in every step, this Hamming distance changes by $\pm 1$ in every round. It is also easy to see that in every step the distance is at least as likely to be decreased as to be increased (since we pick an unsatisfied clause, which means at least one of the two literals in the clause differs in value from the satisfying assignment).
Thus this is an unfair “gambler’s ruin” problem where the gambler’s fortune is the Hamming distance from the solution, and it decreases with probability at least $\frac{1}{2}$. Such a stochastic process is called a supermartingale — and this is arguably a better model for real-life casinos. (If we flip the inequality, the stochastic process we get is called a submartingale.) Also, in this case the gambler’s fortune (the Hamming distance) cannot increase beyond $n$, the number of variables. We can also think of this process as a random walk on the set of integers: we start at some number $k$ and in each round we make one step to the left or to the right with some probability. If we use random walk terminology, 0 is called an absorbing barrier since we stop the process when we reach 0. The number $n$, on the other hand, is called a reflecting barrier: we cannot reach $n + 1$, and whenever we get close we always bounce back.
There is an equivalent version of the optional stopping theorem for supermartingales and submartingales, where the conditions are the same but the conclusion holds with an inequality instead of equality. It follows from the optional stopping theorem that the gambler will be ruined (i.e. a satisfying truth assignment will be found) in $O(n^2)$ steps with high probability.
So far on this blog we’ve given some introductory notes on a few kinds of algebraic structures in mathematics (most notably groups and rings, but also monoids). Fields are the next natural step in the progression.
If the reader is comfortable with rings, then a field is extremely simple to describe: they’re just commutative rings with 0 and 1, where every nonzero element has a multiplicative inverse. We’ll give a list of all of the properties that go into this “simple” definition in a moment, but an even more simple way to describe a field is as a place where “arithmetic makes sense.” That is, you get operations $+, -, \cdot, /$ which satisfy the expected properties of addition, subtraction, multiplication, and division. So whatever the objects in your field are (and sometimes they are quite weird objects), they behave like usual numbers in a very concrete sense.
So here’s the official definition of a field. We call a set $F$ a field if it is endowed with two binary operations: addition ($+$) and multiplication ($\cdot$, or just symbol juxtaposition) that have the following properties:
- There is an element we call 0 which is the identity for addition.
- Addition is commutative and associative.
- Every element $x$ has a corresponding additive inverse $-x$ (which may equal $x$) for which $x + (-x) = 0$.
These three properties are just the axioms of a (commutative) group, so we continue:
- There is an element we call 1 (distinct from 0) which is the identity for multiplication.
- Multiplication is commutative and associative.
- Every nonzero element $x$ has a corresponding multiplicative inverse $x^{-1}$ (which may equal $x$) for which $xx^{-1} = 1$.
- Addition and multiplication distribute across each other as we expect.
If we exclude the existence of multiplicative inverses, these properties make a commutative ring, and so we have the following chain of inclusions that describes it all:

$\textup{fields} \subset \textup{commutative rings} \subset \textup{rings}$
The standard examples of fields are the real numbers $\mathbb{R}$, the rationals $\mathbb{Q}$, and the complex numbers $\mathbb{C}$. But of course there are many many more. The first natural question to ask about fields is: what can they look like?
For example, can there be any finite fields? A field which as a set has only finitely many elements?
As we saw in our studies of groups and rings, the answer is yes! The simplest example is the set of integers modulo some prime $p$. We call them $\mathbb{Z}/p\mathbb{Z}$, or sometimes just $\mathbb{Z}/p$ for short, and let’s rederive what we know about them now.
As a set, $\mathbb{Z}/p$ consists of the integers $\{0, 1, \dots, p-1\}$. The addition and multiplication operations are easy to define, they’re just usual addition and multiplication followed by a modulus. That is, we add by $(a + b) \mod p$ and multiply with $(ab) \mod p$. This thing is clearly a commutative ring (because the integers form a commutative ring), so to show this is a field we need to show that everything has a multiplicative inverse.
There is a nice fact that allows us to do this: an element $x$ has an inverse if and only if the only way for it to divide zero is the trivial way $x \cdot 0 = 0$. Here’s a proof. For one direction, suppose $x$ divides zero nontrivially, that is there is some $z \neq 0$ with $xz = 0$. Then if $x$ had an inverse $x^{-1}$, then $0 = x^{-1}(xz) = (x^{-1}x)z = z$, but that’s very embarrassing for $z$ because it claimed to be nonzero. Now suppose $x$ only divides zero in the trivial way. Then look at all possible ways to multiply $x$ by other nonzero elements of $\mathbb{Z}/p$. No two can give you the same result because if $xy = xz$ then (without using multiplicative inverses) $x(y - z) = 0$, but we know that $x$ can only divide zero in the trivial way so $y = z$. In other words, the map “multiplication by $x$” is injective. Because the set of nonzero elements of $\mathbb{Z}/p$ is finite you have to hit everything (the map is in fact a bijection), and some $y$ will give you $xy = 1$.
Now let’s use this fact on $\mathbb{Z}/p$ in the obvious way. Since $p$ is a prime, there are no two smaller numbers $a, b < p$ so that $ab = p$. But in $\mathbb{Z}/p$ the number $p$ is equivalent to zero (mod $p$)! So $\mathbb{Z}/p$ has no nontrivial zero divisors, and so every nonzero element has an inverse, and so it’s a finite field with $p$ elements.
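A tiny brute-force check of this fact (my own snippet, not from the post): for $p = 7$, every nonzero element has exactly one inverse, and it agrees with the one Fermat's little theorem hands you as $x^{p-2} \bmod p$:

```python
p = 7  # any prime will do

# find each nonzero element's inverse by exhaustive search, exactly as the
# bijectivity argument above suggests is possible
inverses = {}
for x in range(1, p):
    for y in range(1, p):
        if (x * y) % p == 1:
            inverses[x] = y
            break
```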
The next question is obvious: can we get finite fields of other sizes? The answer turns out to be yes, but you can’t get finite fields of every size. Let’s see why.
Characteristics and Vector Spaces
Say you have a finite field $k$ (lower-case k is the standard letter for a field, so let’s forget about $F$). Because the field is finite, if you take 1 and keep adding it to itself you’ll eventually run out of field elements. That is, $\underbrace{1 + 1 + \dots + 1}_{n \text{ times}} = 0$ at some point. How do I know it’s zero and doesn’t keep cycling never hitting zero? Well if at two points $n < m$ the sums $\underbrace{1 + \dots + 1}_{n}$ and $\underbrace{1 + \dots + 1}_{m}$ are equal, then $\underbrace{1 + \dots + 1}_{m - n} = 0$ is a time where you hit zero, contradicting the claim.
Now we define $\textup{char}(k)$, the characteristic of $k$, to be the smallest $n$ (sums of 1 with itself) for which $\underbrace{1 + \dots + 1}_{n \text{ times}} = 0$. If there is no such $n$ (this can happen if $k$ is infinite, but doesn’t always happen for infinite fields), then we say the characteristic is zero. It would probably make more sense to say the characteristic is infinite, but that’s just the way it is. Of course, for finite fields the characteristic is always positive. So what can we say about this number? We have seen lots of examples where it’s prime, but is it always prime? It turns out the answer is yes!
For if $n = \textup{char}(k)$ is composite, say $n = ab$ with $a, b < n$, then by the minimality of $n$ we get that the field elements $\underbrace{1 + \dots + 1}_{a}$ and $\underbrace{1 + \dots + 1}_{b}$ are nonzero, but their product is $\underbrace{1 + \dots + 1}_{n} = 0$. This can’t happen by our above observation, because being a zero divisor means you have no inverse! Contradiction, sucker.
But it might happen that there are elements of $k$ that can’t be written as $1 + 1 + \dots + 1$ for any number of terms. We’ll construct examples in a minute (in fact, we’ll classify all finite fields), but we already have a lot of information about what those fields might look like. Indeed, since every field has 1 in it, we just showed that every finite field contains a smaller field (a subfield) of all the ways to add 1 to itself. Since the characteristic is prime, the subfield is a copy of $\mathbb{Z}/p$ for $p = \textup{char}(k)$. We call this special subfield the prime subfield of $k$.
The relationship between the possible other elements of $k$ and the prime subfield is very neat. Because think about it: if $k$ is your field and $\mathbb{Z}/p$ is your prime subfield, then the elements of $k$ can interact with $\mathbb{Z}/p$ just like any other field elements. But if we separate $\mathbb{Z}/p$ from $k$ (make a separate copy of $\mathbb{Z}/p$), and just think of $k$ as having addition, then the relationship with $\mathbb{Z}/p$ is that of a vector space! In fact, whenever you have two fields $k' \subset k$, the latter has the structure of a vector space over the former.
Back to finite fields, $k$ is a vector space over its prime subfield, and now we can impose all the power and might of linear algebra against it. What’s its dimension? Finite, because $k$ is a finite set! Call the dimension $m$; then we get a basis $\alpha_1, \dots, \alpha_m$. Then the crucial part: every element of $k$ has a unique representation in terms of the basis. So they are expanded in the form

$a_1\alpha_1 + \dots + a_m\alpha_m$

where the $a_i$ come from $\mathbb{Z}/p$. But now, since these are all just field operations, every possible choice for the $a_i$ has to give you a different field element. And how many choices are there for the $a_i$? Each one has exactly $p$ choices. And so by counting we get that $k$ has $p^m$ many elements.
This is getting exciting quickly, but we have to pace ourselves! This is a constraint on the possible size of a finite field, but can we realize it for all choices of $p, m$? The answer is again yes, and in the next section we’ll see how. But reader be warned: the formal way to do it requires a little bit of familiarity with ideals in rings to understand the construction. I’ll try to avoid too much technical stuff, but if you don’t know what an ideal is, you should expect to get lost (it’s okay, that’s the nature of learning new math!).
Constructing All Finite Fields
Let’s describe a construction. Take a finite field $k$ of characteristic $p$, and say you want to make a field of size $p^m$. What we need to do is construct a field extension, that is, find a bigger field containing $k$ so that the vector space dimension of our new field over $k$ is exactly $m$.
What you can do is first form the ring of polynomials with coefficients in $k$. This ring is usually denoted $k[x]$, and it’s easy to check it’s a ring (polynomial addition and multiplication are defined in the usual way). Now if I were speaking to a mathematician I would say, “From here you take an irreducible monic polynomial $q(x)$ of degree $m$, and quotient your ring by the principal ideal generated by $q$. The result is the field we want!”
In less compact terms, the idea is exactly the same as modular arithmetic on integers. Instead of doing arithmetic with integers modulo some prime (an irreducible integer), we’re doing arithmetic with polynomials modulo some irreducible polynomial $q(x)$. Now you see the reason I used $q$ for a polynomial, to highlight the parallel thought process. What I mean by “modulo a polynomial” is that you divide some element $f(x)$ in your ring by $q(x)$ as much as you can, until the degree of the remainder is smaller than the degree of $q(x)$, and that’s the element of your quotient. The Euclidean algorithm guarantees that we can do this no matter what $q(x)$ is (in the formal parlance, $k[x]$ is called a Euclidean domain for this very reason). In still other words, the “quotient structure” tells us that two polynomials $f(x), g(x)$ are considered to be the same in $k[x]/(q)$ if and only if $f(x) - g(x)$ is divisible by $q(x)$. This is actually the same definition for $\mathbb{Z}/p$, with polynomials replacing numbers, and if you haven’t already you can start to imagine why people decided to study rings in general.
Let’s do a specific example to see what’s going on. Say we’re working with $p = 3$ and we want to compute a field of size $27 = 3^3$. First we need to find a monic irreducible polynomial of degree $3$. For now, I just happen to know one: $q(x) = x^3 - x + 1$. In fact, we can check it’s irreducible, because to be reducible it would have to have a linear factor and hence a root in $\mathbb{Z}/3$. But it’s easy to see that if you compute $q(0), q(1), q(2)$ and take (mod 3) you never get zero.
So I’m calling this new ring

$\frac{\mathbb{Z}/3[x]}{(x^3 - x + 1)}$
It happens to be a field, and we can argue it with a whole lot of ring theory. First, we know an irreducible element of this ring is also prime (because the ring is a unique factorization domain), and prime elements generate maximal ideals (because it’s a principal ideal domain), and if you quotient by a maximal ideal you get a field (true of all rings).
But if we want to avoid that kind of argument and just focus on this ring, we can explicitly construct inverses. Say you have a polynomial $f(x)$, and for illustration purposes we’ll choose $f(x) = x^4 + x + 1$. Now in the quotient ring we could do polynomial long division to find remainders, but another trick is just to notice that the quotient is equivalent to the condition that $x^3 = x - 1$. So we can reduce $f$ by applying this rule to $x^4 = x \cdot x^3 = x(x - 1) = x^2 - x$ to get

$f(x) \equiv x^2 - x + x + 1 = x^2 + 1$

Now what’s the inverse of $x^2 + 1$? Well we need a polynomial $g(x) = ax^2 + bx + c$ whose product with $x^2 + 1$ gives us something which is equivalent to 1, after you reduce by $x^3 - x + 1$. A few minutes of algebra later and you’ll discover that this is equivalent to the following polynomial being identically 1

$(2a + c)x^2 + (2b - a)x + (c - b) = 1$

In other words, we get a system of linear equations which we need to solve:

$\begin{aligned} 2a + c &= 0 \\ 2b - a &= 0 \\ c - b &= 1 \end{aligned}$

And from here you can solve with your favorite linear algebra techniques. This is a good exercise for working in fields, because you get to abuse the prime subfield being characteristic 3 to say terrifying things like $-1 = 2$ and $6 = 0$. The end result is that the inverse polynomial is $2x^2 + x + 2$, and if you were really determined you could write a program to compute these linear systems for any input polynomial and ensure they’re all solvable. We prefer the ring theoretic proof.
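If you’d rather make the computer do this reduction, here is a small sketch (the representation and helper names are my own invention): polynomials over $\mathbb{Z}/3$ are lists of coefficients from lowest degree to highest, the modulus is $x^3 - x + 1$, and we reduce with the rule $x^3 = x - 1$. It reduces $x^4 + x + 1$ and then finds the inverse of $x^2 + 1$ by brute force over all 27 candidates:

```python
P = 3  # coefficient arithmetic is mod 3; the modulus is q(x) = x^3 - x + 1

def reduce_poly(coeffs):
    """coeffs[i] is the coefficient of x**i; apply x**3 = x - 1 until the
    degree drops below 3, then reduce each coefficient mod 3."""
    c = list(coeffs)
    while len(c) > 3:
        lead = c.pop()  # coefficient of the old top degree d = len(c)
        d = len(c)
        # lead*x**d = lead*x**(d-3) * (x - 1) = lead*x**(d-2) - lead*x**(d-3)
        c[d - 2] += lead
        c[d - 3] -= lead
    c += [0] * (3 - len(c))
    return [x % P for x in c]

def mul(f, g):
    """Multiply two polynomials and reduce the product."""
    prod = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            prod[i + j] += a * b
    return reduce_poly(prod)

f = reduce_poly([1, 1, 0, 0, 1])   # x^4 + x + 1 reduces to x^2 + 1
inverse = None
for a2 in range(P):
    for a1 in range(P):
        for a0 in range(P):
            if mul(f, [a0, a1, a2]) == [1, 0, 0]:
                inverse = [a0, a1, a2]  # constant, linear, quadratic coeffs
```

Since inverses in a field are unique, the search can only turn up one answer, and it matches the hand computation: $2x^2 + x + 2$.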
In any case, it’s clear that taking a polynomial ring like this and quotienting by a monic irreducible polynomial gives you a field. We just control the size of that field by choosing the degree of the irreducible polynomial to our satisfaction. And that’s how we get all finite fields!
One Last Word on Irreducible Polynomials
One thing we’ve avoided is the question of why irreducible monic polynomials exist of all possible degrees $m$ over any $\mathbb{Z}/p$ (and as a consequence we can actually construct finite fields of all possible sizes).
The answer requires a bit of group theory to prove, but it turns out that the polynomial $x^{p^m} - x$ has every monic irreducible polynomial of degree $m$ over $\mathbb{Z}/p$ as a factor (in fact, it is the product of all monic irreducible polynomials of degree dividing $m$). But perhaps a better question (for computer scientists) is how do we work over a finite field in practice? One way is to work with polynomial arithmetic as we described above, but this has some downsides: it requires us to compute these irreducible monic polynomials (which doesn’t sound so hard, maybe), to do polynomial long division every time we add, subtract, or multiply, and to compute inverses by solving a linear system.
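We can check the factorization claim by hand in the smallest interesting case $p = 2$, $m = 2$ (a quick sketch; the helper names are mine). Over $\mathbb{Z}/2$ the monic irreducibles of degree 1 are $x$ and $x + 1$, a monic quadratic is irreducible exactly when it has no root in $\{0, 1\}$, and the product of all monic irreducibles of degree dividing 2 should come out to $x^{2^2} - x$, which over $\mathbb{Z}/2$ is $x^4 + x$:

```python
def polymul2(f, g):
    """Multiply polynomials over Z/2; coeffs[i] is the coefficient of x**i."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] ^= a & b  # addition mod 2 is xor
    return out

def evaluate2(f, x):
    return sum(c * x ** i for i, c in enumerate(f)) % 2

# monic quadratics over Z/2, as [constant, linear, leading] coefficients
quadratics = [[c0, c1, 1] for c0 in (0, 1) for c1 in (0, 1)]
irreducible = [q for q in quadratics
               if evaluate2(q, 0) == 1 and evaluate2(q, 1) == 1]

product = polymul2([0, 1], [1, 1])  # x * (x + 1)
for q in irreducible:
    product = polymul2(product, q)
# expect x^4 + x, i.e. [0, 1, 0, 0, 1]
```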
But we can do better for some special finite fields, say where the characteristic is 2 (smells like binary) or we’re only looking at $\mathbb{Z}/p$. The benefit there is that we aren’t forced to use polynomials. We can come up with some other kind of structure (say, matrices of a special form) which happens to have the same field structure and makes computing operations relatively painless. We’ll see how this is done in the future, and see it applied to cryptography when we continue with our series on elliptic curve cryptography.
Last time we saw a geometric version of the algorithm to add points on elliptic curves. We went quite deep into the formal setting for it (projective space $\mathbb{P}^2$), and we spent a lot of time talking about the right way to define the “zero” object in our elliptic curve so that our issues with vertical lines would disappear.
With that understanding in mind we now finally turn to code, and write classes for curves and points and implement the addition algorithm. As usual, all of the code we wrote in this post is available on this blog’s Github page.
Points and Curves
Every introductory programming student has probably written the following program in some language for a class representing a point.
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
It’s the simplest possible nontrivial class: an x and y value initialized by a constructor (and in Python all member variables are public).
We want this class to represent a point on an elliptic curve, and overload the addition and negation operators so that we can do stuff like this:
p1 = Point(3,7)
p2 = Point(4,4)
p3 = p1 + p2
But as we’ve spent quite a while discussing, the addition operators depend on the features of the elliptic curve they’re on (we have to draw lines and intersect it with the curve). There are a few ways we could make this happen, but in order to make the code that uses these classes as simple as possible, we’ll have each point contain a reference to the curve it comes from. So we need a curve class.
It’s pretty simple, actually, since the class is just a placeholder for the coefficients of the defining equation. We assume the equation is already in the Weierstrass normal form, but if it weren’t one could perform a whole bunch of algebra to get it in that form (and you can see how convoluted the process is in this short report or page 115 (pdf p. 21) of this book). To be safe, we’ll add a few extra checks to make sure the curve is smooth.
class EllipticCurve(object):
    def __init__(self, a, b):
        # assume we're already in the Weierstrass form
        self.a = a
        self.b = b

        self.discriminant = -16 * (4 * a*a*a + 27 * b * b)
        if not self.isSmooth():
            raise Exception("The curve %s is not smooth!" % self)

    def isSmooth(self):
        return self.discriminant != 0

    def testPoint(self, x, y):
        return y*y == x*x*x + self.a * x + self.b

    def __str__(self):
        return 'y^2 = x^3 + %Gx + %G' % (self.a, self.b)

    def __eq__(self, other):
        return (self.a, self.b) == (other.a, other.b)
And here are some examples of creating curves:
>>> EllipticCurve(a=17, b=1)
y^2 = x^3 + 17x + 1
>>> EllipticCurve(a=0, b=0)
Traceback (most recent call last):
  [...]
Exception: The curve y^2 = x^3 + 0x + 0 is not smooth!
So there we have it. Now when we construct a Point, we add the curve as the extra argument and a safety-check to make sure the point being constructed is on the given elliptic curve.
class Point(object):
    def __init__(self, curve, x, y):
        self.curve = curve  # the curve containing this point
        self.x = x
        self.y = y

        if not curve.testPoint(x, y):
            raise Exception("The point %s is not on the given curve %s" % (self, curve))
Note that this last check will serve as a coarse unit test for all of our examples. If we mess up then more likely than not the “added” point won’t be on the curve at all. More precise testing is required to be bullet-proof, of course, but we leave explicit tests to the reader as an excuse to get their hands wet with equations.
>>> c = EllipticCurve(a=1, b=2)
>>> Point(c, 1, 2)
(1, 2)
>>> Point(c, 1, 1)
Traceback (most recent call last):
  [...]
Exception: The point (1, 1) is not on the given curve y^2 = x^3 + 1x + 2
Before we go ahead and implement addition and the related functions, we need to decide how we want to represent the ideal point. We have two options. The first is to do everything in projective coordinates and define a whole system for doing projective algebra. Considering we only have one point to worry about, this seems like overkill (but could be fun). The second option, and the one we’ll choose, is to have a special subclass of Point that represents the ideal point.
```python
class Ideal(Point):
    def __init__(self, curve):
        self.curve = curve

    def __str__(self):
        return "Ideal"
```
Note the inheritance is denoted by the parenthetical (Point) in the first line. Each function we define on a Point will require a 1-2 line overriding function in this subclass, so we will only need a small amount of extra bookkeeping. For example, negation is quite easy.
```python
class Point(object):
    ...
    def __neg__(self):
        return Point(self.curve, self.x, -self.y)

class Ideal(Point):
    ...
    def __neg__(self):
        return self
```
Note that Python allows one to override the prefix-minus operation by defining __neg__ on a custom object. There are similar functions for addition (__add__), subtraction, and pretty much every built-in Python operation. And of course addition is where things get more interesting. For the ideal point it's trivial.
```python
class Ideal(Point):
    ...
    def __add__(self, Q):
        return Q
```
Why does this make sense? Because (as we've said last time) the ideal point is the additive identity in the group structure of the curve. So by all of our analysis, $0 + Q = Q$, and the code is satisfyingly short.
For distinct points we have to follow the algorithm we used last time. Remember that the trick was to form the line passing through the two points being added, substitute that line for $y$ in the equation of the elliptic curve, and then figure out the coefficient of $x^2$ in the resulting polynomial. Then, using the two existing points, we could solve for the third root of the polynomial using Vieta's formula.
In order to do that, we need to analytically solve for the coefficient of the $x^2$ term of the resulting cubic. It's tedious, but straightforward. First, write the line through $(x_1, y_1)$ and $(x_2, y_2)$ as

$y = m(x - x_1) + y_1$

The first step of expanding $y^2 = x^3 + ax + b$ gives us

$y_1^2 + 2my_1(x - x_1) + m^2(x - x_1)^2 = x^3 + ax + b$

And we notice that the only term containing an $x^2$ part is the last one. Expanding that gives us

$m^2 x^2 - 2m^2 x_1 x + m^2 x_1^2$

And again we can discard the parts that don't involve $x^2$. In other words, if we were to rewrite the equation as $x^3 + ax + b - (m(x - x_1) + y_1)^2 = 0$, we'd expand all the terms and get something that looks like

$x^3 - m^2 x^2 + c_1 x + c_0 = 0$

where $c_1, c_0$ are some constants that we don't need. Now using Vieta's formula and calling $x_3$ the third root we seek, we know that

$x_1 + x_2 + x_3 = m^2$

Which means that $x_3 = m^2 - x_2 - x_1$. Once we have $x_3$, we can get $y_3$ from the equation of the line: $y_3 = m(x_3 - x_1) + y_1$.
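Before wiring this into the class, we can sanity-check the arithmetic numerically. This little sketch uses the points $(3, 5)$ and $(-2, 0)$ on the curve $y^2 = x^3 - 2x + 4$ (the same ones used in the examples further down) and verifies that the computed third point really lies on the curve:

```python
from fractions import Fraction

a, b = -2, 4                       # the curve y^2 = x^3 - 2x + 4
x1, y1 = Fraction(3), Fraction(5)
x2, y2 = Fraction(-2), Fraction(0)

m = (y2 - y1) / (x2 - x1)          # slope of the chord through the two points
x3 = m * m - x1 - x2               # Vieta: x1 + x2 + x3 = m^2
y3 = m * (x3 - x1) + y1            # y-coordinate of the third intersection

assert y3 * y3 == x3**3 + a * x3 + b   # the third intersection is on the curve
assert (x3, -y3) == (0, -2)            # reflecting gives the sum (0, -2)
```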
Note that this only works if the two points we're trying to add are distinct! The other two cases are when the points are the same or lie on a vertical line. These gotchas will manifest themselves as conditional branches of our add function.
```python
class Point(object):
    ...
    def __add__(self, Q):
        if isinstance(Q, Ideal):
            return self

        x_1, y_1, x_2, y_2 = self.x, self.y, Q.x, Q.y
        if (x_1, y_1) == (x_2, y_2):
            # use the tangent method
            ...
        else:
            if x_1 == x_2:
                return Ideal(self.curve)  # vertical line

            # Using Vieta's formula for the sum of the roots
            m = (y_2 - y_1) / (x_2 - x_1)
            x_3 = m*m - x_2 - x_1
            y_3 = m*(x_3 - x_1) + y_1

            return Point(self.curve, x_3, -y_3)
```
First, we check if the two points are the same, in which case we use the tangent method (which we do next). Supposing the points are different, if their $x$ values are the same then the line is vertical and the third point is the ideal point. Otherwise, we use the formula we defined above. Note the subtle and crucial minus sign at the end! The point $(x_3, y_3)$ is the third point of intersection with the curve, but we still have to do the reflection to get the sum of the two points.
Now for the case when the points are actually the same. We'll call it $P = (x_1, y_1)$, and we're trying to find $P + P = 2P$. As per our algorithm, we compute the tangent line at $P$. In order to do this we need just a tiny bit of calculus. To find the slope of the tangent line we implicitly differentiate the equation $y^2 = x^3 + ax + b$ and get

$\frac{dy}{dx} = \frac{3x^2 + a}{2y}$
The only time we'd get a vertical line is when the denominator is zero (you can verify this by taking limits if you wish), and so $y = 0$ implies that $P + P = 0$ and we're done. The fact that this can ever happen for a nonzero $P$ should be surprising to any reader unfamiliar with groups! But without delving into a deep conversation about the different kinds of group structures out there, we'll have to settle for such nice surprises.
In the other case $y \neq 0$, we plug in our values into the derivative and read off the slope as $m = \frac{3x_1^2 + a}{2y_1}$. Then using the same point-slope formula for a line, we get $y = m(x - x_1) + y_1$, and we can use the same technique (and the same code!) from the first case to finish.
There is only one minor wrinkle we need to smooth out: can we be sure Vieta's formula works? In fact, the real problem is this: how do we know that $x_1$ is a double root of the resulting cubic? Well, this falls out again from that very abstract and powerful theorem of Bezout. There is a lot of technical algebraic geometry (and a very interesting but complicated notion of dimension) hiding behind the curtain here. But for our purposes it says that our tangent line intersects the elliptic curve with multiplicity 2, and this gives us a double root of the corresponding cubic.
And so in the addition function all we need to do is change the slope we’re using. This gives us a nice and short implementation
```python
def __add__(self, Q):
    if isinstance(Q, Ideal):
        return self

    x_1, y_1, x_2, y_2 = self.x, self.y, Q.x, Q.y

    if (x_1, y_1) == (x_2, y_2):
        if y_1 == 0:
            return Ideal(self.curve)

        # slope of the tangent line
        m = (3 * x_1 * x_1 + self.curve.a) / (2 * y_1)
    else:
        if x_1 == x_2:
            return Ideal(self.curve)

        # slope of the secant line
        m = (y_2 - y_1) / (x_2 - x_1)

    x_3 = m*m - x_2 - x_1
    y_3 = m*(x_3 - x_1) + y_1

    return Point(self.curve, x_3, -y_3)
```
What's interesting is how little the data of the curve comes into the picture. Nothing depends on $b$, and only one of the two cases depends on $a$. This is one reason the Weierstrass normal form is so useful, and it may bite us in the butt later in the few cases we don't have it (for special number fields).
Here are some examples.
```python
>>> C = EllipticCurve(a=-2,b=4)
>>> P = Point(C, 3, 5)
>>> Q = Point(C, -2, 0)
>>> P+Q
(0.0, -2.0)
>>> Q+P
(0.0, -2.0)
>>> Q+Q
Ideal
>>> P+P
(0.25, 1.875)
>>> P+P+P
Traceback (most recent call last):
  ...
Exception: The point (-1.958677685950413, 0.6348610067618328) is not on the given curve y^2 = x^3 + -2x + 4!
>>> x = -1.958677685950413
>>> y = 0.6348610067618328
>>> y*y - x*x*x + 2*x - 4
-3.9968028886505635e-15
```
And so we crash headfirst into our first floating point arithmetic issue. We’ll vanquish this monster more permanently later in this series (in fact, we’ll just scrap it entirely and define our own number system!), but for now here’s a quick fix:
```python
>>> import fractions
>>> frac = fractions.Fraction
>>> C = EllipticCurve(a = frac(-2), b = frac(4))
>>> P = Point(C, frac(3), frac(5))
>>> P+P+P
(Fraction(-237, 121), Fraction(845, 1331))
```
Now that we have addition and negation, the rest of the class is just window dressing. For example, we want to be able to use the subtraction symbol, and so we need to implement __sub__
```python
def __sub__(self, Q):
    return self + -Q
```
Note that because the Ideal point is a subclass of Point, it inherits all of these special functions while it only needs to override __add__ and __neg__. Thank you, polymorphism! The last function we want is a scaling function, which efficiently adds a point to itself $n$ times.
```python
class Point(object):
    ...
    def __mul__(self, n):
        if not isinstance(n, int):
            raise Exception("Can't scale a point by something which isn't an int!")
        else:
            if n < 0:
                return -self * -n
            if n == 0:
                return Ideal(self.curve)
            else:
                Q = self
                R = self if n & 1 == 1 else Ideal(self.curve)

                i = 2
                while i <= n:
                    Q = Q + Q

                    if n & i == i:
                        R = Q + R

                    i = i << 1

                return R

    def __rmul__(self, n):
        return self * n

class Ideal(Point):
    ...
    def __mul__(self, n):
        if not isinstance(n, int):
            raise Exception("Can't scale a point by something which isn't an int!")
        else:
            return self
```
The scaling function allows us to quickly compute $nP = P + P + \cdots + P$ ($n$ times). Indeed, the fact that we can do this more efficiently than performing $n$ additions is what makes elliptic curve cryptography work. We'll take a deeper look at this in the next post, but for now let's just say what the algorithm is doing.
Given a number $n$ written in binary as $n = b_k b_{k-1} \cdots b_1 b_0$, we can write $nP$ as

$nP = b_0 P + b_1 2P + b_2 4P + \cdots + b_k 2^k P$
The advantage of this is that we can compute each of the $2^i P$ iteratively using only $k$ additions, by multiplying by 2 (adding something to itself) at each step. Since the number of bits in $n$ is $k + 1 = O(\log n)$, we're getting a huge improvement over $n$ additions.
The algorithm is given above in code, but it's a simple bit-shifting trick. Just have $i$ be some power of two, shifted left by one at the end of every loop. Then start with $Q_0$ being $P$, and replace $Q_{i+1} = Q_i + Q_i$, and in typical programming fashion we drop the indices and overwrite the variable binding at each step (Q = Q + Q). Finally, we have a variable $R$ to which $Q_i$ is added when the $i$-th bit of $n$ is a 1 (and ignored when it's 0). The rest is bookkeeping.
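To see the trick in isolation, here is the same loop with ordinary integers standing in for points: "point addition" is just +, and 0 plays the role of the ideal point, so "scaling" $x$ by $n$ should agree with plain multiplication:

```python
def scale(x, n):
    """Double-and-add, with integers standing in for curve points."""
    Q = x
    R = x if n & 1 else 0      # 0 plays the role of the ideal point
    i = 2
    while i <= n:
        Q = Q + Q              # Q is now i*x, the next power-of-two multiple
        if n & i:
            R = Q + R          # include this multiple when the bit of n is set
        i = i << 1
    return R

assert scale(7, 13) == 7 * 13
assert scale(3, 8) == 24
```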
Note that __mul__ only allows us to write something like P * n, but the standard notation for scaling is n * P. This is what __rmul__ allows us to do.
We could add many other helper functions, such as ones to allow us to treat points as if they were lists, checking for equality of points, comparison functions to allow one to sort a list of points in lex order, or a function to transform points into more standard types like tuples and lists. We have done a few of these that you can see if you visit the code repository, but we'll leave fleshing out the class as an exercise to the reader.
```python
>>> import fractions
>>> frac = fractions.Fraction
>>> C = EllipticCurve(a = frac(-2), b = frac(4))
>>> P = Point(C, frac(3), frac(5))
>>> Q = Point(C, frac(-2), frac(0))
>>> P-Q
(Fraction(0, 1), Fraction(-2, 1))
>>> P+P+P+P+P
(Fraction(2312883, 1142761), Fraction(-3507297955, 1221611509))
>>> 5*P
(Fraction(2312883, 1142761), Fraction(-3507297955, 1221611509))
>>> Q - 3*P
(Fraction(240, 1), Fraction(3718, 1))
>>> -20*P
(Fraction(872171688955240345797378940145384578112856996417727644408306502486841054959621893457430066791656001, 520783120481946829397143140761792686044102902921369189488390484560995418035368116532220330470490000), Fraction(-27483290931268103431471546265260141280423344817266158619907625209686954671299076160289194864753864983185162878307166869927581148168092234359162702751, 11884621345605454720092065232176302286055268099954516777276277410691669963302621761108166472206145876157873100626715793555129780028801183525093000000))
```
As one can see, the numbers involved get very large very quickly. One thing we'll do to avoid such large numbers (but hopefully not sacrifice security) is to work in finite fields, the simplest version of which is to compute modulo some prime.
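To preview the idea (this is just a hedged sketch, not the implementation this series will eventually build): the secant-slope formula works verbatim modulo a prime $p$ if we replace division by multiplication with a modular inverse, which Python 3.8+ exposes as pow(x, -1, p). The curve and points below are hypothetical choices over GF(101):

```python
p = 101
a, b = 1, 2   # hypothetical curve y^2 = x^3 + x + 2 over GF(101)

def add_mod_p(P1, P2):
    """Chord addition of two points with distinct x, all arithmetic mod p."""
    (x1, y1), (x2, y2) = P1, P2
    m = (y2 - y1) * pow(x2 - x1, -1, p) % p   # secant slope, mod p
    x3 = (m * m - x1 - x2) % p
    y3 = (m * (x3 - x1) + y1) % p
    return (x3, -y3 % p)                      # reflect to get the sum

# (1, 2) and (7, 7) are on the curve mod 101; so is their sum.
x3, y3 = add_mod_p((1, 2), (7, 7))
assert (y3 * y3 - (x3**3 + a * x3 + b)) % p == 0
```

Note -y3 % p: unary minus binds tighter than %, and Python's % returns a nonnegative result for a positive modulus, so this is exactly the reflection step.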
So now we have a concrete understanding of the algorithm for adding points on elliptic curves, and a working Python program to do this for rational numbers or floating point numbers (if we want to deal with precision issues). Next time we'll continue this train of thought and upgrade our program (with very little work!) to work over other simple number fields. Then we'll delve into the cryptographic issues, and talk about how one might encode messages on a curve and use the algebraic operations to encrypt them.
I’m pleased to announce that another paper of mine is finished. This one is submitted to ICALP, which is being held in Copenhagen this year (this whole research thing is exciting!). This is joint work with my advisor, Lev Reyzin. As with my first paper, I’d like to explain things here on my blog a bit more informally than a scholarly article allows.
A Recent History of Graph Coloring
One of the first important things you learn when you study graphs is that coloring graphs is hard. Remember that coloring a graph with $k$ colors means that you assign each vertex a color (a number in $\{1, \dots, k\}$) so that no vertex is adjacent to a vertex of the same color (no edge is monochromatic). In fact, even deciding whether a graph can be colored with just 3 colors (not to mention finding such a coloring) has no known polynomial time algorithm. It's what's called NP-hard, which means that almost everyone believes it's hopeless to solve efficiently in the worst case.
One might think that there’s some sort of gradient to this problem, that as the graphs get more “complicated” it becomes algorithmically harder to figure out how colorable they are. There are some notions of “simplicity” and “complexity” for graphs, but they hardly fall on a gradient. Just to give the reader an idea, here are some ways to make graph coloring easy:
- Make sure your graph is planar. Then deciding 4-colorability is easy because the answer is always yes.
- Make sure your graph is triangle-free and planar. Then finding a 3-coloring is easy.
- Make sure your graph is perfect (which again requires knowledge about how colorable it is).
- Make sure your graph has tree-width or clique-width bounded by a constant.
- Make sure your graph doesn’t have a certain kind of induced subgraph (such as having no induced paths of length 4 or 5).
Let me emphasize that these results are very difficult and tricky to compare. The properties are inherently discrete (either perfect or imperfect, planar or not planar). The fact that the world has not yet agreed upon a universal measure of complexity for graphs (or at least one that makes graph coloring easy to understand) is not a criticism of the chef but a testament to the challenge and intrigue of the dish.
Coloring general graphs is much bleaker, where the focus has turned to approximations. You can't "approximate" the answer to whether a graph is colorable, so the key here is that we are actually trying to find an approximate coloring. In particular, if you're given some graph $G$ and you don't know the minimum number of colors needed to color it (say it's $\chi(G)$; this is called the chromatic number), can you easily color it with what turns out to be, say, $2\chi(G)$ colors?
Garey and Johnson (the gods of NP-hardness) proved this problem is again hard. In fact, they proved that you can't do better than twice the number of colors. This might not seem so bad in practice, but the story gets worse. This lower bound was improved by Zuckerman, building on the work of Håstad, to depend on the size of the graph! That is, unless $P = NP$, all efficient algorithms will use asymptotically more than $\chi(G) n^{1-\varepsilon}$ colors for any $\varepsilon > 0$ in the worst case, where $n$ is the number of vertices of $G$. So the best you can hope for is being off by something like a multiplicative factor of $n / \log n$. You can actually achieve this (it's nontrivial and takes a lot of work), but it carries that aura of pity for the hopeful graph colorer.
The next avenue is to assume you know the chromatic number of your graph, and see how well you can do then. For example: if you are given the promise that a graph is 3-colorable, can you efficiently find a coloring with 8 colors? The best would be if you could find a coloring with 4 colors, but this is already known to be NP-hard.
The best upper bounds, algorithms to find approximate colorings of 3-colorable graphs, also pitifully depend on the size of the graph. Remember I say pitiful not to insult the researchers! This decades-long line of work was extremely difficult and deserves the highest praise. It's just frustrating that the best known algorithm to color a 3-colorable graph requires up to roughly $n^{0.2}$ colors. At least it bypasses the barrier of $n^{1-\varepsilon}$ mentioned above, so we know that knowing the chromatic number actually does help.
The lower bounds are a bit more hopeful; it's known to be NP-hard to color a $k$-colorable graph using $2^{\sqrt[3]{k}}$ colors if $k$ is sufficiently large. There are a handful of other linear lower bounds that work for all $k$, but to my knowledge this is the best asymptotic result. The big open problem (which I doubt many people have their eye on considering how hard it seems) is to find an upper bound depending only on $k$. I wonder offhand whether a ridiculous bound like $k^{k^k}$ colors would be considered progress, and I bet it would.
Our Idea: Resilience
So without big breakthroughs on the front of approximate graph coloring, we propose a new front for investigation. The idea is that we consider graphs which are not only colorable, but remain colorable under the adversarial operation of adding a few new edges. More formally,
Definition: A graph $G = (V, E)$ is called $r$-resiliently $k$-colorable if two properties hold:

- $G$ is $k$-colorable.
- For any set $E'$ of $r$ edges disjoint from $E$, the graph $G' = (V, E \cup E')$ is $k$-colorable.
The simplest nontrivial example of this is 1-resiliently 3-colorable graphs. That is, a graph that is 3-colorable and remains 3-colorable no matter which new edge you add. And the question we ask of this example: is there a polynomial time algorithm to 3-color a 1-resiliently 3-colorable graph? We prove in our paper that this is actually NP-hard, but it's not a trivial thing to see.
The chief benefit of thinking about resiliently colorable graphs is that it provides a clear gradient of complexity from general graphs (zero-resilient) to the empty graph (which is $\binom{n}{2}$-resiliently $n$-colorable). We know that the most complex case is NP-hard, and maximally resilient graphs are trivially colorable. So finding the boundary where resilience makes things easy can shed new light on graph coloring.
Indeed, we argue in the paper that lots of important graphs have stronger resilience properties than one might expect. For example, here are the resilience properties of some famous graphs.
If I were of a mind to do applied graph theory, I would love to know about the resilience properties of graphs that occur in the wild. For example, the reader probably knows the problem of register allocation is a natural graph coloring problem. I would love to know the resilience properties of such graphs, with the dream that they might be resilient enough on average to admit efficient coloring algorithms.
Unfortunately the only way that I know how to compute resilience properties is via brute-force search, and of course this only works for small graphs and small $k$. If readers are interested I could post such a program (I wrote it in vanilla Python), but for now I'll just post a table I computed on the proportion of small graphs that have various levels of resilience (note this includes graphs that vacuously satisfy the definition).
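Such a program is short enough to sketch here. This is not the exact script I used, just a minimal brute-force version of the definition (vertices are 0..n-1, edges are pairs), feasible only for tiny graphs:

```python
from itertools import combinations, product

def colorable(n, edges, k):
    """Brute force: does some assignment of k colors avoid every edge?"""
    return any(all(c[u] != c[v] for u, v in edges)
               for c in product(range(k), repeat=n))

def resilient(n, edges, k, r):
    """Is the graph r-resiliently k-colorable?"""
    present = {frozenset(e) for e in edges}
    missing = [e for e in combinations(range(n), 2)
               if frozenset(e) not in present]
    return all(colorable(n, list(edges) + list(extra), k)
               for extra in combinations(missing, r))

# A 4-cycle is 2-colorable, but adding a chord creates a triangle, so it is
# not 1-resiliently 2-colorable; it is, however, 1-resiliently 3-colorable.
C4 = [(0, 1), (1, 2), (2, 3), (0, 3)]
assert colorable(4, C4, 2)
assert not resilient(4, C4, 2, 1)
assert resilient(4, C4, 3, 1)
```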
```
Percentage of k-colorable graphs on 6 vertices which are n-resilient

 k\n      1        2        3        4
 ----------------------------------------
 3      58.0     22.7      5.9      1.7
 4      93.3     79.3     58.0     35.3
 5      99.4     98.1     94.8     89.0
 6     100.0    100.0    100.0    100.0

Percentage of k-colorable graphs on 7 vertices which are n-resilient

 k\n      1        2        3        4
 ----------------------------------------
 3      38.1      8.2      1.2      0.3
 4      86.7     62.6     35.0     14.9
 5      98.7     95.6     88.5     76.2
 6      99.9     99.7     99.2     98.3

Percentage of k-colorable graphs on 8 vertices which are n-resilient

 k\n      1        2        3        4
 ----------------------------------------
 3      21.3      2.1      0.2      0.0
 4      77.6     44.2     17.0      4.5
```
The idea is this: if this trend continues, that only some small fraction of all 3-colorable graphs are, say, 2-resiliently 3-colorable graphs, then it should be easy to color them. Why? Because resilience imposes structure on the graphs, and that structure can hopefully be realized in a way that allows us to color easily. We don’t know how to characterize that structure yet, but we can give some structural implications for sufficiently resilient graphs.
For example, a 7-resiliently 5-colorable graph can't have any subgraph on 6 vertices with 8 or more edges, or else we can add at most 7 edges to get a 6-clique, which isn't 5-colorable. This gives an obvious general property about the sizes of subgraphs in resilient graphs, but as a more concrete instance let's think about 2-resiliently 3-colorable graphs $G$. This property says that no set of 4 vertices may have more than 3 edges in $G$, or else adding the at most 2 missing edges completes a 4-clique. This rules out 4-cycles and non-isolated triangles, but is it enough to make 3-coloring easy? We can say that $G$ is a triangle-free graph together with a bunch of disjoint triangles, but triangle-free graphs are known to have arbitrarily large chromatic number, and so the coloring problem is still hard. Moreover, 2-resilience isn't enough to make $G$ planar. It's not hard to construct a non-planar counterexample, but proving it's 2-resilient is a tedious task I relegated to my computer.
Speaking of which, the problem of how to determine whether a $k$-colorable graph is $r$-resiliently $k$-colorable is open. Is this problem even in NP? It certainly seems not to be, but if it had a nice characterization or even stronger necessary conditions than the ones above, we might be able to use them to find efficient coloring algorithms.
In our paper we begin to fill in a table whose completion would characterize the NP-hardness of coloring resilient graphs
Ignoring the technical notion of 2-to-1 hardness (it's technical), the paper accomplishes this as follows. First, we prove some relationships between cells. In particular, if a cell is NP-hard then so are all the cells to the left and below it. So our Theorem 1, that 3-coloring 1-resiliently 3-colorable graphs is NP-hard, gives us the entire black region, though more trivial arguments give all except the (3,1) cell. Also, if a cell is in P (it's easy to $k$-color graphs with that resilience), then so are all cells above and to its right. We prove that $k$-coloring $\binom{k}{2}$-resiliently $k$-colorable graphs is easy. This is trivial: no vertex may have degree greater than $k - 1$, and the greedy algorithm can color such graphs with $k$ colors. So that gives us the entire light gray region.
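The greedy step can be sketched in a few lines (a hypothetical adjacency-list representation; the point is only that a vertex of degree at most $k - 1$ always has a free color among $k$):

```python
def greedy_color(adjacency, k):
    """Greedily color vertices; succeeds whenever max degree <= k - 1."""
    color = {}
    for v in adjacency:
        used = {color[u] for u in adjacency[v] if u in color}
        color[v] = next(c for c in range(k) if c not in used)
    return color

# a path 0-1-2 has max degree 2, so 3 colors always suffice
adj = {0: [1], 1: [0, 2], 2: [1]}
coloring = greedy_color(adj, 3)
assert all(coloring[u] != coloring[v] for u in adj for v in adj[u])
```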
One additional lower bound comes from the fact that it's NP-hard to $2^{\sqrt[3]{k}}$-color a $k$-colorable graph. In particular, we prove that if you have any function $f$ that makes it NP-hard to $f(k)$-color a $k$-colorable graph, then it is NP-hard to $f(k)$-color an $r$-resiliently $k$-colorable graph (for a resilience $r$ depending on $f$ and $k$). The exponential lower bound hence gives us a nice linear lower bound, and so we have the following "sufficiently zoomed out" picture of the table
The paper contains the details of how these observations are proved, in addition to the NP-hardness proof for 1-resiliently 3-colorable graphs. This leaves the following open problems:
- Get an unconditional, concrete linear resilience lower bound for hardness.
- Find an algorithm that colors graphs that are less resilient than $\binom{k}{2}$. Even determining specific cells like (4,5) or (5,9) would likely give enough insight for this.
- Classify the tantalizing (3,2) cell (determine if it’s hard or easy to 3-color a 2-resiliently 3-colorable graph) or even better the (4,2) cell.
- Find a way to relate resilient coloring back to general coloring. For example, if such and such cell is hard, then you can’t approximate k-coloring to within so many colors.
But Wait, There’s More!
Though this paper focuses on graph coloring, our idea of resilience doesn’t stop there (and this is one reason I like it so much!). One can imagine a notion of resilience for almost any combinatorial problem. If you’re trying to satisfy boolean formulas, you can define resilience to mean that you fix the truth value of some variable (we do this in the paper to build up to our main NP-hardness result of 3-coloring 1-resiliently 3-colorable graphs). You can define resilient set cover to allow the removal of some sets. And any other sort of graph-based problem (Traveling salesman, max cut, etc) can be resiliencified by adding or removing edges, whichever makes the problem more constrained.
So this resilience notion is quite general, though it's hard to define precisely in a general fashion. There is a general framework called Constraint Satisfaction Problems (CSPs), but resilience here seems too general. A CSP is literally just a bunch of objects which can be assigned some set of values, and a set of constraints (k-ary 0-1-valued functions) that need to all be true for the problem to succeed. If we were to define resilience by "adding any constraint" to a given CSP, then there's nothing to stop us from adding the negation of an existing constraint (or even the tautologically unsatisfiable constraint!). This kind of resilience would be a vacuous definition, and even if we try to rule out these edge cases, I can imagine plenty of weird things that might happen in their stead. That doesn't mean there isn't a nice way to generalize resilience to CSPs, but it would probably involve some sort of "constraint class" of acceptable constraints, and I don't know a reasonable property to impose on the constraint class to make things work.
So there’s lots of room for future work here. It’s exciting to think where it will take me.