Elliptic Curves as Algebraic Structures

Last time we looked at the elementary formulation of an elliptic curve as the solutions to the equation

$ y^2 = x^3 + ax + b$

where $ a,b$ are such that the discriminant is nonzero:

$ -16(4a^3 + 27b^2) \neq 0$

We have yet to explain why we want our equation in this form, and we will get to that, but first we want to take our idea of intersecting lines as far as possible.

Fair warning: this post will start out at the same level as the previous post, but we intend to gradually introduce some mathematical maturity. If you don’t study mathematics, you’ll probably see terminology and notation somewhere between mysterious and incomprehensible. In particular, we will spend a large portion of this post explaining projective coordinates, and we use the blackboard-bold $ \mathbb{R}$ to denote real numbers.

Skimming difficult parts, asking questions in the comments, and pushing through to the end are all encouraged.

The Algorithm to Add Points

The deep idea, and the necessary insight for cryptography, is that the points on an elliptic curve have an algebraic structure. What I mean by this is that you can “add” points in a certain way, and it will satisfy all of the properties we expect of addition with true numbers. You may have guessed it based on our discussion in the previous post: adding two points will involve taking the line passing between them and finding the third point of intersection with the elliptic curve. But in order to make “adding points” rigorous we need to deal with some special cases (such as the vertical line problem we had last time).

So say we have two points $ P, Q$ on an elliptic curve $ E$ defined by $ y^2 = x^3 + ax + b$. By saying they’re “on the curve” we mean their coordinates satisfy the equation defining the curve. Then to add $ P + Q$, we do the following geometric algorithm:

  1. Form the line $ y = L(x)$ connecting $ P$ and $ Q$.
  2. Compute the third intersection point of $ L$ with $ E$ (the one that’s not $ P$ or $ Q$). Call it $ R$.
  3. Reflect $ R$ across the $ x$-axis to get the final point $ P + Q$.

Here’s that shown visually on our practice curve $ E: y^2 = x^3 - x + 1$.

add-points-example

Adding P and Q on an elliptic curve.

This algorithm might seem dubious, but it’s backed up by solid mathematics. For example, it’s almost immediately obvious that step 1 will always work (that you can always form such a line), the only exception being when $ P = Q$. And it’s almost a theorem that step 2 will always work (that there is always a third point of intersection), the only exception being the vertical line. If we ignore these exceptional cases, then the correctness of the algorithm is easy to prove, because we can just generalize the idea from last time.

Solving the joint system of the curve and the line is equivalent to solving

$ L(x)^2 = x^3 + ax + b$

Since $ L(x)$ is a degree 1 polynomial, this equation is a cubic polynomial in $ x$

$ x^3 - L(x)^2 + ax + b = 0$

If we already have two solutions to this equation with distinct $ x$-values (two points on the curve that don’t form a vertical line) then there has to be a third. Why? Because having a root of a polynomial means you can factor, and we have two distinct roots, so we know that our polynomial has as a divisor

$ (x - p_1)(x - q_1)$

But then the quotient must be a linear polynomial, and because the leading term is $ x^3$ it has to look like $ (x - r_1)$ for some $ r_1$. And so $ (r_1, L(r_1))$ is our third point. Moreover, $ p_1 + r_1 + q_1$ must be equal to the opposite of the coefficient of $ x^2$ in the equation above, so we can solve for it without worrying about how to factor cubic polynomials. When we get down to some nitty-gritty code we’ll need to be more precise here with equations and such, but for now this is good enough.
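To make this concrete, here is a minimal Python sketch of the chord step for two points with distinct $ x$-coordinates. The function name and the choice of our practice curve $ y^2 = x^3 - x + 1$ are just for illustration; this is not the implementation we’ll write later in the series.

# A sketch of the chord step for two points with distinct x-coordinates on
# y^2 = x^3 + ax + b. Note that a and b never appear explicitly: the
# sum-of-roots trick only uses the fact that P and Q lie on the curve.
def chordAdd(P, Q):
   (x1, y1), (x2, y2) = P, Q
   m = (y2 - y1) / (x2 - x1)      # slope of the line L through P and Q
   x3 = m * m - x1 - x2           # the roots of x^3 - L(x)^2 + ax + b sum to m^2
   y3 = m * (x3 - x1) + y1        # third intersection point R = (x3, y3)
   return (x3, -y3)               # reflect R across the x-axis to get P + Q

# Example on the practice curve y^2 = x^3 - x + 1 with P = (0, 1), Q = (1, 1):
a, b = -1, 1
x, y = chordAdd((0, 1), (1, 1))
print((x, y))                     # (-1.0, -1.0)
print(y**2 == x**3 + a*x + b)     # True: the sum is again a point on the curve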

Pause, Breathe, Reflect

It’s time to take a step back. There is a big picture overarching our work here. We have this mathematical object, a bunch of points on a curve in space, and we’re trying to say that it has algebraic structure. As we’ve been saying, we want to add points on the curve using our algorithm and always get a point on the curve as a result.

But beyond the algorithm, two important ideas are at work here. The first is that any time you make a new mathematical definition with the intent of overloading some operators (in this case, + and -), you want to make sure the operators behave like we expect them to. Otherwise “add” is a really misleading name!

The second idea is that we’re encoding computational structure in an elliptic curve. The ability to add and negate opens up a world of computational possibilities, and so any time we find algebraic structure in a mathematical object we can ask questions about the efficiency of computing functions within that framework (and reversing them!). This is precisely what cryptographers look for in a new encryption scheme: functions which are efficient to compute, but very hard to reverse if all you know is the output of the function.

So what are the properties we need to make sure addition behaves properly?

  1. We need there to be an additive identity, which we’ll call zero, for which $ P + 0 = 0 + P = P$.
  2. We need every point $ P$ to have an inverse $ -P$ for which $ P + (-P) = 0$.
  3. We want adding to commute, so that $ P + Q = Q + P$. This property of an algebraic structure is called abelian or commutative.
  4. We need addition to be associative, so that $ (P + Q) + R = P + (Q + R)$.

In fact, if you just have a general collection of things (a set, in the mathematical parlance) and an operation which together satisfy these four properties, we call that a commutative group. By the end of this series we’ll switch to the terminology of groups to get a mathematically mature viewpoint on elliptic curves, because it turns out that not all types of algebraic structure are the same (there are lots of different groups). But we’ll introduce it slowly as we see why elliptic curves form groups under the addition of their points.

There are still some things about our adding algorithm above that aren’t quite complete. We still need to know:

  1. What will act as zero.
  2. How to get the additive inverse of a point.
  3. How to add a point to itself.
  4. What to do if the two points form a vertical line.

The first, second, and fourth items are all taken care of in one fell swoop (and this is the main bit of mathematical elbow grease we have been hinting at), but the third item has a quick explanation if you’re willing to postpone a technical theorem. If you want to double a point, or add $ P + P = 2P$, then you can’t “take the line joining those two points” because you need two distinct points to define a line. But you can take the tangent line to the curve at $ P$ and look for the second point of intersection. Again we still have to worry about the case of a vertical tangent line. But ignoring that, the reason there will always be a second point of intersection is a result called Bezout’s theorem. This theorem is so strong and abstract that it’s difficult to discuss with what we presently know, but it has to do with counting the multiplicities of roots of polynomial equations. Seeing that it’s mostly a technical tool, we’ll just be glad it’s there.
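For the curious, here is what the tangent-line version looks like computationally; the slope comes from implicitly differentiating $ y^2 = x^3 + ax + b$ to get $ 2y y' = 3x^2 + a$. As above, this is just an illustrative sketch, not the series’ eventual implementation.

# A sketch of doubling a point P = (x1, y1) with y1 != 0 on y^2 = x^3 + ax + b.
# The tangent slope comes from implicit differentiation: 2y y' = 3x^2 + a.
def doublePoint(P, a):
   x1, y1 = P
   m = (3 * x1 * x1 + a) / (2.0 * y1)   # slope of the tangent line at P
   x3 = m * m - 2 * x1                  # sum-of-roots again, with x1 counted twice
   y3 = m * (x3 - x1) + y1
   return (x3, -y3)                     # reflect to get 2P

# Doubling (1, 1) on y^2 = x^3 - x + 1:
a, b = -1, 1
x, y = doublePoint((1, 1), a)
print((x, y))                           # (-1.0, 1.0)
print(y**2 == x**3 + a*x + b)           # True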

So with that, let’s get to the most important bit.

Projective Space and the Ideal Line

The shortest way to describe what will act as zero in our elliptic curve is to say that we invent a new point which is the “intersection” of all vertical lines. Because it’s the intersection of vertical lines, we’ll sometimes call it “the point at infinity.” We’ll also call it “zero” because it’s supposed to be the additive identity, we’ll demand it lies on every elliptic curve, and we’ll enforce that if we “reflect” zero across the $ x$-axis we still get zero. And then everything works!

If you want to add two points that form a vertical line, well, now the third point of intersection is zero and reflecting zero still gives you zero. If you want the additive inverse of a point $ P = (x,y)$, just take its reflection across the $ x$-axis, $ -P = (x, -y)$; the two points form a vertical line and so by our algorithm they “add up” to zero. So it’s neat and tidy.

But wait, wait, wait. Points at infinity? All vertical lines intersect? This isn’t like any geometry I’ve ever seen before. How do we know we can do this without getting some massive contradictions!?

This is the best question one could ask, and we refuse to ignore it. Many articles aimed at a general technically literate audience get very fuzzy at this part. They make it seem like the mathematics we’re doing is magic, and they ignore the largest mathematical elephant in the room. Of course it’s not magic, and all of this has a solid foundation called projective space. We’re going to explore its basics now.

There is a long history of arguing over Euclid’s geometric axioms and how the postulate that parallel lines never intersect doesn’t follow from the other axioms. This is exactly what we’re going for, but blah blah blah let’s just see the construction already!

The crazy brilliant idea is that we want to make a geometric space where points are actually lines.

What will happen is that our elliptic curve equations will get an extra variable to account for this new kind of geometry, and a special value for this new variable will allow us to recover the usual pictures of elliptic curves in the plane (this is intentionally vague and will become more precise soon).

So how might one make such a geometry? Well you can do it with or without linear algebra. The way without linear algebra is to take three dimensional Euclidean space $ \mathbb{R}^3$, and look at the lines that pass through the origin. Make each of these lines its own point, and call the resulting space $ \mathbb{P}^2$, the projective plane. (For avid readers of this blog, this is exactly the same construction as $ \mathbb{R}\textup{P}^2$ we gave in our second topology primer, just seen from a very different angle. Leave a comment if you want to hear more.)

The problem with the non-linear-algebra approach is that we get no natural coordinate system, and we’re dying for coordinates (we’re here to compute stuff, after all!). So we need to talk about vectors. Every nonzero vector $ v$ in $ \mathbb{R}^3$ spans a line (by taking all multiples $ \lambda v$ for $ \lambda \in \mathbb{R}$). So instead of representing a point in projective space by a line, we can represent it by a vector, with the additional condition that two points are the same if they’re multiples of each other.

Here’s a picture of this idea. The two vectors displayed are equal to each other because they lie on the same line (they are multiples of each other).

These two vectors are equivalent because they lie on (span) the same line.

Don’t run away because a very detailed explanation will follow what I’m about to say, but the super formal way of saying this is that projective space is the quotient space

$ \displaystyle \mathbb{P}^2 = (\mathbb{R}^3 - \left \{ 0 \right \} ) / v \sim \lambda v$

Still here? Okay great. Let’s talk about coordinates. If we are working with vectors in $ \mathbb{R}^3$, then we’re looking at coordinates like

$ (x, y, z)$

where $ x,y,z$ are real numbers. Trivial. But now in projective space we’re asserting two things. First, $ (0,0,0)$ is not allowed. And second, whenever we have $ (x,y,z)$, we’re declaring that it’s the same thing as $ (2x,2y,2z)$ or $ (-6x, -6y, -6z)$ or any other way to scale every component by the same amount. To denote the difference between usual vectors (parentheses) and our new coordinates, we use square brackets and colons. So a point in 2-dimensional projective space is

$ [x:y:z]$

where $ x,y,z$ are real numbers that are not all zero, and $ [x:y:z] = \lambda [x : y : z] = [\lambda x : \lambda y : \lambda z]$ for any nonzero $ \lambda \in \mathbb{R}$.
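If it helps to see this equivalence in code, here is a tiny (hypothetical) helper that decides whether two coordinate triples name the same projective point, by checking that one is a scalar multiple of the other, or equivalently that their cross product vanishes. It assumes exact arithmetic.

# Two nonzero triples name the same projective point exactly when they are
# scalar multiples of each other, i.e. when their cross product is zero.
def sameProjectivePoint(p, q):
   (x1, y1, z1), (x2, y2, z2) = p, q
   cross = (y1*z2 - z1*y2, z1*x2 - x1*z2, x1*y2 - y1*x2)
   return cross == (0, 0, 0)

print(sameProjectivePoint((1, 2, 3), (-6, -12, -18)))   # True: a scalar multiple
print(sameProjectivePoint((1, 2, 3), (1, 2, 4)))        # False: not multiples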

Now we can make some “canonical choices” for our coordinates, and start exploring how the assertions we made shape the geometry of this new space. For example, if $ P = [x:y:z]$ with $ z \neq 0$ then we can always scale by $ 1/z$ so that the point looks like

$ [x/z : y/z : 1]$

Now $ x/z$ and $ y/z$ can be anything (just think of it as $ [a : b : 1]$), and different choices of $ a,b$ give distinct points. Why? Because if we tried to scale to make two points equal we’d be screwing up the 1 that’s fixed in the third coordinate. So when $ z$ is nonzero we have this special representation (often called an affine slice), and it’s easy to see that all of these points form a copy of the usual Euclidean plane sitting inside $ \mathbb{P}^2$. There is a nice way to visualize exactly how this copy can be realized as “usual Euclidean space” using the picture below:

proj-eucl-ex

The indigo plane is the “affine slice” given by z = 1. It’s called “affine” because the indigo plane doesn’t intersect the origin. Image source: math.toronto.edu

Each line (each vector with $ z \neq 0$) intersects the indigo plane in exactly one point, so this describes a one-to-one mapping of points in the affine slice to the Euclidean plane.

But then when $ z = 0$, we get some other stuff that makes up the rest of $ \mathbb{P}^2$. Since $ x,y$ can’t both also be zero, we’ll suppose $ y$ is not zero, and then we can do the same normalization trick to see that all the points we get are

$ [a : 1 : 0]$

Since again $ a$ can be anything and you get distinct points for different choices of $ a$, this forms a copy of the real line inside of $ \mathbb{P}^2$ but outside of the affine slice. This line is sometimes called the ideal line to distinguish it from the lines that lie inside the affine slice $ z=1$. Actually, the ideal line is more than just a line. It’s (gasp) a circle! Why do I say that? Well think about what happens as $ a$ gets really large (or really negatively large). We have

$ [a : 1 : 0] = [1 : 1/a : 0]$

and the right hand side approaches $ [1 : 0 : 0]$, the last missing point! Another way to phrase our informal argument is to say $ [1 : 0 : 0]$ is the boundary of the line $ [a : 1 : 0]$, and (you guessed it) the circle we get here is the boundary of the affine slice $ [a : b : 1]$. And we can see exactly what it means for “two parallel lines” to intersect. Take the two lines given by

$ [a : 1 : 1], [b : 2 : 1]$

If we think of these as being in the affine slice of $ \mathbb{R}^2$ where $ z = 1$, it’s the lines given by $ (a, 1), (b, 2)$, which are obviously parallel. But where do they intersect as $ a,b$ get very large (or very negatively large)? at

$ [1 : 1/a : 1/a], [1 : 2/b : 1/b]$

which both become $ [1 : 0 : 0]$ in the limit. I’m being a bit imprecise here appealing to limits, but it works because projective space inherits some structure when we pass to the quotient (for the really technically inclined, it inherits a metric that comes from the sphere of radius 1). This is why we feel compelled to call it a quotient despite how confusing quotients can be, and it illustrates the power of appealing to these more abstract constructions.

In any case, now we have this image of projective space:

Our mental image of the projective plane: a big copy of the Euclidean plane (affine slice z=1) whose boundary is the ideal line, whose boundary is in turn the single point [1:0:0].

It should be pretty clear that the choice of $ z=1$ to represent the affine slice is arbitrary, and we could have used $ x=1$ or $ y=1$ to realize different “copies” of the Euclidean plane sitting inside projective space. But in any case, we can use our new understanding to turn back to elliptic curves.

Homogeneous Equations and the Weierstrass Normal Form

Elliptic curves are officially “projective objects” in the sense that they are defined by homogeneous equations over projective space. That is, an elliptic curve equation is any homogeneous degree three equation whose discriminant is nonzero. By homogeneous I mean all the powers of the terms add up to three, so it has the general form

$ 0 = a_0 z^3 + a_1 z^2y + a_2 z^2x + a_3y^2x + \dots$

And note that now the solutions to this equation are required to be projective points $ [x : y : z]$. As an illuminating exercise, prove that $ [x : y : z]$ is a solution if and only if $ [\lambda x : \lambda y: \lambda z]$ is, i.e. that our definition of “solution” and the use of homogeneous equations is well-defined.
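Here is a quick numeric sanity check of that exercise (not a proof): a homogeneous degree-three expression satisfies $ F(\lambda x, \lambda y, \lambda z) = \lambda^3 F(x,y,z)$, so scaling a solution gives a solution. The particular curve below is just a stand-in.

# Sanity check (not a proof): if F is homogeneous of degree 3 then
# F(lx, ly, lz) = l^3 * F(x, y, z), so zeros of F are preserved under scaling.
def F(x, y, z, a=-1, b=1):
   return y*y*z - (x**3 + a*x*z*z + b*z**3)   # the homogenized y^2 z = x^3 + axz^2 + bz^3

x, y, z = 3, 5, 1      # (3, 5) lies on y^2 = x^3 - x + 1, so [3 : 5 : 1] is a solution
lam = 7
print(F(x, y, z))                      # 0
print(F(lam*x, lam*y, lam*z))          # 0, still a solution after scaling
print(F(lam*x, lam*y, lam*z) == lam**3 * F(x, y, z))   # True for any triple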

But to work with projective language forever is burdensome, something that only mathematicians are required to do. And in the case of elliptic curves we only use one new point from projective space (the intersection of the vertical lines). Once we get to writing programs we’ll have a special representation for points that aren’t in the affine slice, so we will be able to use regular Euclidean coordinates without much of a fuss.

That being said, we can now officially explain why we want the special form of elliptic curve $ y^2 = x^3 + ax + b$, called the Weierstrass normal form (pronounced VY-er-shtrahss). The reason is that elliptic curves can look very weird in their natural coordinates.

An elliptic curve before being converted to Weierstrass normal form. Blech, it’s not even symmetric!

So to bring some order to the chaos, we want the projective point $ [0 : 1 : 0]$ to be our zero point. If we choose our axes appropriately (swapping the letters $ x, y, z$), then $ [0:1:0]$ lies at the intersection of all vertical lines. Now that point isn’t a solution to all homogeneous degree-three curves, but it is a solution to homogeneous equations that look like this (plug it in to see).

$ \displaystyle y^2z = x^3 + ax z^2 + b z^3$

Starting to look familiar? It turns out there is a theorem (that requires either heavy mathematical machinery or LOTS of algebra to prove) that says that for any homogeneous degree three equation you start with, you can always pick your “projective axes” (that is, apply suitable projective transformations) to get an equivalent equation of the form above. I mean equivalent in the sense that the transformation we applied took solutions of the original equation to solutions of this new equation and didn’t create any new ones. The casual reader should think of all this machinery as really clever changes of variables.

And then if we pick our classical Euclidean slice to be $ [x : y : 1]$, we get back to the standard form $ y^2 = x^3 + ax + b$. This is the Weierstrass normal form.

So that was a huge detour…

Back to adding points on elliptic curves. Now that we’ve defined zero, one can check that addition makes sense. Zero has the needed property $ P + 0 = P$, since the “third” point of intersection of the vertical line passing through $ P$ is the reflection of $ P$ across the $ x$-axis, and reflecting that across the $ x$-axis gives you $ P$ back. For the same reason $ P + (-P) = 0$. Even more, properties like $ 0 = -0$ naturally fall out of our definitions for projective coordinates, since $ [0:1:0] = [0:-1:0] = -1[0:1:0]$. So projective space, rather than mathematical hocus-pocus, is the correct setting to think about algebra happening on elliptic curves.

It’s also clear that addition is commutative because lines passing through points don’t care about the order of the points. The only real issue is whether addition is associative. That is whether $ (P + Q) + R = P + (Q + R)$ no matter what $ P,Q,R$ are. This turns out to be difficult, and it takes a lot of algebra and the use of that abstract Bezout’s theorem we mentioned earlier, so the reader will have to trust us that everything works out. (Although, Wikipedia has a slick animation outlining one such proof.)

So “adding points” and that pesky “point at infinity” now officially make sense!

What we’ve shown in the mathematical parlance is that the solutions to an elliptic curve form a group under this notion of addition. However, one thing we haven’t talked about is where these numbers come from. Recall from last time that we were interested in integer points on the elliptic curve, but it was clear from our example that adding two integer-valued points on an elliptic curve might not give you an integer-valued point.

However, if we require that our equation has coefficients in a field, and we allow our points to have coordinates in that same field, then adding two points with coordinates in the field always gives you a point with coordinates in the field. We haven’t ever formally talked about fields on this blog, but we’re all familiar with them: they have addition, multiplication, and division in the expected ways (everything except 0 has a multiplicative inverse, multiplication distributes across addition, etc.). The usual examples are $ \mathbb{R}, \mathbb{C}$ the real and complex numbers, and $ \mathbb{Q}$, the rational numbers (fractions of integers). There are also finite fields, which are the proper setting for elliptic curve cryptography, but we’ll save those for another post.

But why must it work if we use a field? Because the operations we need to perform the point-adding algorithm only use field operations (addition, multiplication, and division by other numbers in the field). So performing those operations on numbers in a field always gives you back numbers in that field. Since all fields have 0 and 1, we also have the point at infinity $ [0 : 1 : 0]$.
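Here is the field-closure claim in action over $ \mathbb{Q}$, using Python’s Fraction type and the same illustrative chord step from before: the formula only ever adds, multiplies, and divides coordinates, so rational points produce rational points.

from fractions import Fraction

# The chord step uses only field operations, so rational inputs give rational
# outputs. This is the same illustrative sketch as before, not our final code.
def chordAdd(P, Q):
   (x1, y1), (x2, y2) = P, Q
   m = (y2 - y1) / (x2 - x1)
   x3 = m*m - x1 - x2
   return (x3, -(m*(x3 - x1) + y1))

a, b = Fraction(-1), Fraction(1)             # the curve y^2 = x^3 - x + 1
P = (Fraction(0), Fraction(1))
Q = (Fraction(3), Fraction(5))               # both points have rational coordinates
x, y = chordAdd(P, Q)
print(x, y)                                  # -11/9 17/27, rational but not integral
print(y**2 == x**3 + a*x + b)                # True: still a point on the curve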

This gives a natural definition.

Definition: Let $ k$ be a field and let $ E$ be the equation of an elliptic curve in Weierstrass form. Define $ E(k)$ to be the set of projective points on $ E$ with coordinates in $ k$ along with the ideal point $ [0:1:0]$. As we have discussed $ E(k)$ is a group under the operation of adding points, so we call it the elliptic curve group for $ E$ over $ k$.

Now that we’ve waded through a few hundred years of mathematics, it should be no surprise that this definition feels so deep and packed full of implicit understanding.

However, there are still a few mathematical peccadilloes to worry about. One is that if the chosen field $ k$ is particularly weird (as are many of those finite fields I mentioned), then we can’t always transform an equation with coefficients in $ k$ into the Weierstrass normal form. This results in a few hiccups when you’re trying to do cryptography, but for simplicity’s sake we’ll avoid those areas and stick to nicer fields.

We have only scratched the surface of the algebraic structure of elliptic curves by showing elliptic curves have such structure at all. The next goal for a mathematician is to classify all possible algebraic structures for elliptic curves, and to find easy ways to tell which structure you have from the coefficients of the equation. Indeed, we intend to provide a post at the end of this series (after we get to the juicy programs) that describes what’s known in this area from a more group-theoretic standpoint (i.e., for someone who has heard of the classification of finitely generated abelian groups).

But before that we’re ready to jump headfirst into some code. Next time we’ll implement the algorithms for adding points on elliptic curves using Python objects.

Until then!

Elliptic Curves as Elementary Equations

Finding solutions to systems of polynomial equations is one of the oldest and deepest problems in all of mathematics. This is broadly the domain of algebraic geometry, and mathematicians wield some of the most sophisticated and abstract tools available to attack these problems.

The elliptic curve straddles the elementary and advanced mathematical worlds in an interesting way. On one hand, it’s easy to describe in elementary terms: it’s the set of solutions to a cubic equation in two variables. But despite how simple they seem, deep theorems govern their behavior, and many natural questions about elliptic curves are still wide open. Since elliptic curves provide us with some of the strongest and most widely used encryption protocols, understanding elliptic curves more deeply would give insight into the security (or potential insecurity) of these protocols.

Our first goal in this series is to treat elliptic curves as mathematical objects, and derive the elliptic curve group as the primary object of study. We’ll see what “group” means next time, and afterward we’ll survey some of the vast landscape of unanswered questions. But this post will be entirely elementary, and will gently lead into the natural definition of the group structure on an elliptic curve.

Elliptic Curves as Equations

The simplest way to describe an elliptic curve is as the set of all solutions to a specific kind of polynomial equation in two real variables, $ x,y$. Specifically, the equation has the form:

$ \displaystyle y^2 = x^3 + ax + b$

Where $ a,b$ are real numbers such that

$ \displaystyle -16(4a^3 + 27b^2) \neq 0$

One would naturally ask, “Who the hell came up with that?” A thorough answer requires a convoluted trip through 19th and 20th-century mathematical history, but it turns out that this is a clever form of a very natural family of equations. We’ll elaborate on this in another post, but for now we can give an elementary motivation.

Say you have a pyramid of spheres whose layers are squares, like the one below

pyramid-spheres

We might wonder when it’s the case that we can rearrange these spheres into a single square. Clearly you can do it for a pyramid of height 1 because a single ball is also a 1×1 square (and one of height zero if you allow a 0×0 square). But are there any others?

This question turns out to be a question about an elliptic curve. First, recall that the number of spheres in such a pyramid is given by

$ \displaystyle 1 + 4 + 9 + 16 + \dots + n^2 = \frac{n(n+1)(2n+1)}{6}$

And so we’re asking if there are any positive integers $ x, y$ such that

$ \displaystyle y^2 = \frac{x(x+1)(2x+1)}{6}$

Here is a graph of this equation in the plane. As you admire it, though, remember that we’re chiefly interested in integer solutions.

pyramid-ec

The equation doesn’t quite have the special form we mentioned above, but the reader can rest assured (and we’ll prove it later) that one can transform our equation into that form without changing the set of solutions. In the meantime let’s focus on the question: are there any integer-valued points on this curve besides $ (0,0)$ and $ (1,1)$? The method we use to answer this question comes from ancient Greece, and is due to Diophantus. The idea is that we can use the two points we already have to construct a third point. This method is important because it forms the basis for our entire study of elliptic curves.

Take the line passing through $ (0,0)$ and  $ (1,1)$, given by the equation $ y = x$, and compute the intersection of this line and the original elliptic curve. The “intersection” simply means to solve both equations simultaneously. In this case it’s

$ \begin{aligned} y^2 &= \frac{x(x+1)(2x+1)}{6} \\ y &= x \end{aligned}$

It’s clear what to do: just substitute the latter in for the former. That is, solve

$ \displaystyle x^2 = \frac{x(x+1)(2x+1)}{6}$

Rearranging this into a single polynomial and multiplying through by 3 gives

$ \displaystyle x^3 - \frac{3x^2}{2} + \frac{x}{2} = 0$

Factoring cubics happens to be easy, but let’s instead use a different trick that will come up again later. Let’s use a fact that is taught in elementary algebra and precalculus courses and promptly forgotten, that the sum of the roots of any polynomial is $ \frac{-a_{n-1}}{a_n}$, where $ a_{n}$ is the leading coefficient and $ a_{n-1}$ is the next coefficient. Here $ a_n = 1$, so the sum of the roots is $ 3/2$. This is useful because we already know two roots, namely the solutions 0 and 1 we used to define the system of equations in the first place. So the third root satisfies

$ \displaystyle r + 0 + 1 = \frac{3}{2}$

And it’s $ r = 1/2$, giving the point $ (1/2, 1/2)$ since the line was $ y=x$. Because of the symmetry of the curve, we also get the point $ (1/2, -1/2)$.

Here’s a zoomed-in picture of what we just did to our elliptic curve. We used the two pink points (which gave us the dashed line) to find the purple point.

line-intersection-example

The bad news is that these two new points don’t have integer coordinates. So it doesn’t answer our question. The good news is that now we have more points! So we can try this trick again to see if it will give us still more points, and hope to find some that are integer valued. (It sounds like a hopeless goal, but just hold out a bit longer). If we try this trick again using $ (1/2, -1/2)$ and $ (1,1)$, we get the equation

$ \displaystyle (3x - 2)^2 = \frac{x(x+1)(2x+1)}{6}$

And redoing all the algebraic steps we did before gives us the solution $ x=24, y=70$. In other words, we just proved that

$ \displaystyle 1^2 + 2^2 + \dots + 24^2 = 70^2$
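If you’d like to double-check that computation (and the final identity) without redoing the algebra by hand, here is a short script using exact rational arithmetic; it just re-runs the chord trick twice with the root-sum observation from above. The function names are only for illustration.

from fractions import Fraction

# Re-run the chord construction with exact arithmetic. For this particular
# curve, substituting the line y = mx + c gives a cubic whose roots sum to
# 3(m^2 - 1/2), and two of the roots are the x-coordinates we started with.
def curve(x):
   return x * (x + 1) * (2*x + 1) / 6

def thirdPoint(P, Q):
   (x1, y1), (x2, y2) = P, Q
   m = (y2 - y1) / (x2 - x1)
   c = y1 - m*x1
   x3 = 3*(m*m - Fraction(1, 2)) - x1 - x2
   return (x3, m*x3 + c)

P, Q = (Fraction(0), Fraction(0)), (Fraction(1), Fraction(1))
R = thirdPoint(P, Q)
print(R[0], R[1])                                 # 1/2 1/2
S = thirdPoint((R[0], -R[1]), Q)                  # the chord through (1/2, -1/2) and (1, 1)
print(S[0], S[1])                                 # 24 70
print(S[1]**2 == curve(S[0]))                     # True: (24, 70) is on the curve
print(sum(k*k for k in range(1, 25)) == 70**2)    # True: 1^2 + ... + 24^2 = 70^2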

Great! Here’s another picture showing what we just did.

line-intersection-example-2

In reality we don’t care about this little puzzle. Its solution might be a fun distraction (and even more distracting: try to prove there aren’t any other integer solutions), but it’s not the real treasure. The mathematical gem is the method of finding the solution. We can ask the natural question: if you have two points on an elliptic curve, and you take the line between those two points, will you always get a third point on the curve?

Certainly the answer is no. See this example of two points whose line is vertical.

vertical-line

But with some mathematical elbow grease, we can actually force it to work! That is, we can define things just right so that the line between any two points on an elliptic curve will always give you another point on the curve. This sounds like mysterious black magic, but it lights the way down a long mathematical corridor of new ideas, and is required to make sense of using elliptic curves for cryptography.

Shapes of Elliptic Curves

Before we continue, let’s take a little detour to get a good feel for the shapes of elliptic curves. We have defined elliptic curves by a special kind of equation (we’ll give it a name in a future post). During most of our study we won’t be able to make any geometric sense of these equations. But for now, we can pretend that we’re working over real numbers and graph these equations in the plane.

Elliptic curves in the form $ y^2 = x^3 + ax + b$ have a small handful of different shapes that we can see as $ a,b$ vary:

ECdynamics1 ECdynamics2

The problem is when we cross the point at which the rounded part pinches off in the first animation, and the circular component appears in the second. At those precise moments, the curve becomes “non-smooth” (or singular), and for reasons we’ll see later this is bad. The condition from the beginning of the article (that $ -16(4a^3 + 27b^2) \neq 0$) ensures that these two cases are excluded from consideration, and it’s one crucial part of our “elbow grease” to ensure that lines behave nicely.
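If you want to experiment with this condition yourself, the check is one line; the sample values below are arbitrary.

# The curve y^2 = x^3 + ax + b is smooth exactly when -16(4a^3 + 27b^2) != 0.
def isSmooth(a, b):
   return -16 * (4*a**3 + 27*b**2) != 0

print(isSmooth(-1, 1))    # True: our canonical example y^2 = x^3 - x + 1
print(isSmooth(0, 0))     # False: y^2 = x^3 pinches to a sharp point (a cusp) at the origin
print(isSmooth(-3, 2))    # False: y^2 = x^3 - 3x + 2 crosses itself (a node)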

The “canonical” shape of the elliptic curve is given by the specific example $ y^2 = x^3 - x + 1$. It’s the example that should pop up whenever you imagine an elliptic curve, and it’s the example we’ll use for all of our pictures.

canonical-EC

So in the next post we’ll roll up our sleeves and see exactly how “drawing lines” can be turned into an algebraic structure on an elliptic curve.

Until then!

Fixing Bugs in “Computing Homology”

A few awesome readers have posted comments in Computing Homology to the effect of, “Your code is not quite correct!” And they’re right! Though it’s been almost a year since that post was published, I haven’t bothered to test the code on more complicated simplicial complexes, or even on the basic edge cases! When I posted it the mathematics just felt so solid to me that it had to be right (the irony is rich, I know).

As such I’m apologizing for my lack of rigor, explaining what went wrong, giving the fix, and providing some test cases. As of the publishing of this post, the Github repository for Computing Homology has been updated with the correct code, and some more examples.

The main subroutine was the simultaneousReduce function which I’ll post in its incorrectness below

def simultaneousReduce(A, B):
   if A.shape[1] != B.shape[0]:
      raise Exception("Matrices have the wrong shape.")

   numRows, numCols = A.shape # col reduce A

   i,j = 0,0
   while True:
      if i >= numRows or j >= numCols:
         break

      if A[i][j] == 0:
         nonzeroCol = j
         while nonzeroCol < numCols and A[i,nonzeroCol] == 0:
            nonzeroCol += 1

         if nonzeroCol == numCols:
            j += 1
            continue

         colSwap(A, j, nonzeroCol)
         rowSwap(B, j, nonzeroCol)

      pivot = A[i,j]
      scaleCol(A, j, 1.0 / pivot)
      scaleRow(B, j, 1.0 / pivot)

      for otherCol in range(0, numCols):
         if otherCol == j:
            continue
         if A[i, otherCol] != 0:
            scaleAmt = -A[i, otherCol]
            colCombine(A, otherCol, j, scaleAmt)
            rowCombine(B, j, otherCol, -scaleAmt)

      i += 1; j+= 1

   return A,B

It’s a beast of a function, and the persnickety detail was just as beastly: this snippet should have an $ i += 1$ instead of a $ j$.

if nonzeroCol == numCols:
   j += 1
   continue

This is simply what happens when we’re looking for a nonzero entry in a row to use as a pivot for the corresponding column, but we can’t find one and have to move to the next row. A stupid error on my part that would be easily caught by proper test cases.
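For completeness, here is the snippet with the fix applied:

if nonzeroCol == numCols:
   i += 1
   continue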

The next mistake is a mathematical misunderstanding. In short, the simultaneous column/row reduction process is not enough to get the $ \partial_{k+1}$ matrix into the right form! Let’s see this with a nice example, a triangulation of the Mobius band. There are a number of triangulations we could use, many of which are seen in these slides. The one we’ll use is the following.

mobius-triangulation

Its first and second boundary maps are as follows (in code, because LaTeX takes too much time to type out)

mobiusD1 = numpy.array([
   [-1,-1,-1,-1, 0, 0, 0, 0, 0, 0],
   [ 1, 0, 0, 0,-1,-1,-1, 0, 0, 0],
   [ 0, 1, 0, 0, 1, 0, 0,-1,-1, 0],
   [ 0, 0, 0, 1, 0, 0, 1, 0, 1, 1],
])

mobiusD2 = numpy.array([
   [ 1, 0, 0, 0, 1],
   [ 0, 0, 0, 1, 0],
   [-1, 0, 0, 0, 0],
   [ 0, 0, 0,-1,-1],
   [ 0, 1, 0, 0, 0],
   [ 1,-1, 0, 0, 0],
   [ 0, 0, 0, 0, 1],
   [ 0, 1, 1, 0, 0],
   [ 0, 0,-1, 1, 0],
   [ 0, 0, 1, 0, 0],
])

And if we were to run the above code on it we’d get a first Betti number of zero, which is incorrect: its first homology group has rank 1. Here are the reduced matrices.

>>> A1, B1 = simultaneousReduce(mobiusD1, mobiusD2)
>>> A1
array([[1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]])
>>> B1
array([[ 0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0],
       [ 0,  1,  0,  0,  0],
       [ 1, -1,  0,  0,  0],
       [ 0,  0,  0,  0,  1],
       [ 0,  1,  1,  0,  0],
       [ 0,  0, -1,  1,  0],
       [ 0,  0,  1,  0,  0]])

The first reduced matrix looks fine; there’s nothing we can do to improve it. But the second one is not quite fully reduced! Notice that rows 5, 8 and 10 are not linearly independent. So we need to further row-reduce the nonzero part of this matrix before we can read off the true rank in the way we described last time. This isn’t so hard (we just need to reuse the old row-reduce function we’ve been using), but why is this allowed? It’s just because the corresponding column operations for those row operations are operating on columns of all zeros! So we need not worry about screwing up the work we did in column reducing the first matrix, as long as we only work with the nonzero rows of the second.

Of course, nothing is stopping us from ignoring the “corresponding” column operations, since we know we’re already done there. So we just have to finish row reducing this matrix.
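For readers following along without the repository, here is one self-contained way finishRowReducing could be written with plain numpy. The version in the Github repository may differ in its details (for instance, by reusing the row-operation helpers from the original post); this is just standard Gauss-Jordan row reduction performed in place.

def finishRowReducing(B):
   numRows, numCols = B.shape   # row reduce B in place

   i, j = 0, 0
   while i < numRows and j < numCols:
      # find a row at or below i with a nonzero entry in column j
      nonzeroRow = i
      while nonzeroRow < numRows and B[nonzeroRow, j] == 0:
         nonzeroRow += 1

      if nonzeroRow == numRows:
         j += 1                # no pivot in this column below row i; move on
         continue

      B[[i, nonzeroRow]] = B[[nonzeroRow, i]]   # swap the pivot row into place
      B[i] = B[i] / B[i, j]                     # scale the pivot to 1 (the pivots here are +-1)

      for otherRow in range(numRows):           # clear the rest of column j
         if otherRow != i and B[otherRow, j] != 0:
            B[otherRow] = B[otherRow] - B[otherRow, j] * B[i]

      i += 1; j += 1

   return B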

This changes our bettiNumber function by adding a single call to a row-reduce function, which we name finishRowReducing so it’s clear what’s happening. The resulting function is

def bettiNumber(d_k, d_kplus1):
   A, B = numpy.copy(d_k), numpy.copy(d_kplus1)
   simultaneousReduce(A, B)
   finishRowReducing(B)

   dimKChains = A.shape[1]
   kernelDim = dimKChains - numPivotCols(A)
   imageDim = numPivotRows(B)

   return kernelDim - imageDim

And running this on our Mobius band example gives:

>>> bettiNumber(mobiusD1, mobiusD2)
1

As desired. Just to make sure things are going swimmingly under the hood, we can check to see how finishRowReducing does after calling simultaneousReduce

>>> simultaneousReduce(mobiusD1, mobiusD2)
(array([[1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]]), array([[ 0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0],
       [ 0,  1,  0,  0,  0],
       [ 1, -1,  0,  0,  0],
       [ 0,  0,  0,  0,  1],
       [ 0,  1,  1,  0,  0],
       [ 0,  0, -1,  1,  0],
       [ 0,  0,  1,  0,  0]]))
>>> finishRowReducing(mobiusD2)
array([[1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 1, 0],
       [0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])

Indeed, finishRowReducing finishes row reducing the second boundary matrix. Note that it doesn’t preserve how the rows of zeros lined up with the pivot columns of the reduced version of $ \partial_1$ as it did in the previous post, but since in the end we’re only counting pivots it doesn’t matter how we switch rows. The “zeros lining up” part is just for a conceptual understanding of how the image lines up with the kernel for a valid simplicial complex.

In fixing this issue we’ve also fixed an issue another commenter mentioned, that you couldn’t blindly plug in the zero matrix for $ \partial_0$ and get zeroth homology (which is the same thing as connected components). After our fix you can.

Of course there still might be bugs, but I have so many drafts lined up on this blog (and research papers to write, experiments to run, theorems to prove), that I’m going to put off writing a full test suite. I’ll just have to update this post with new bug fixes as they come. There’s just so much math and so little time 🙂 But extra kudos to my amazing readers who were diligent enough to run examples and spot my error. I’m truly blessed to have you on my side.

Also note that this isn’t the most efficient way to represent the simplicial complex data, or the most efficient row reduction algorithm. If you’re going to run the code on big inputs, I suggest you take advantage of sparse matrix algorithms for doing this sort of stuff. You can represent the simplices as entries in a dictionary and do all sorts of clever optimizations to make the algorithm effectively linear time in the number of simplices.

Until next time!

Rings — A Second Primer

Last time we defined and gave some examples of rings. Recapping, a ring is a special kind of group with an additional multiplication operation that “plays nicely” with addition. The important thing to remember is that a ring is intended to remind us of arithmetic with integers (though not too much: multiplication in a ring need not be commutative). We proved some basic properties, like zero being unique and negation being well-behaved. We gave a quick definition of an integral domain as a place where the only way to multiply two things and get zero is when one of the multiplicands was already zero, and of a Euclidean domain where we can perform nice algorithms like the one for computing the greatest common divisor. Finally, we saw the very important example of the ring of polynomials.

In this post we’ll take a small step upward from looking at low-level features of rings, and start considering how general rings relate to each other. The reader familiar with this blog will notice many similarities between this post and our post on group theory. Indeed, the definitions here will be “motivated” by an attempt to replicate the same kinds of structures we found helpful in group theory (subgroups, homomorphisms, quotients, and kernels). And because rings are also abelian groups, we will play fast and loose with a number of the proofs here by relying on facts we proved in our two-post series on group theory. The ideas assume a decidedly distinct flavor here (mostly in ideals), and in future posts we will see how this affects the computational aspects in more detail.

Homomorphisms and Sub-structures

The first question we should ask about rings is: what should mappings between rings look like? For groups they were set-functions which preserved the group operation ($ f(xy) = f(x)f(y)$ for all $ x,y$). The idea of preserving algebraic structure translates to rings, but as rings have two operations we must preserve both.

Definition: Let $ (R, +_R, \cdot_R), (S, +_S, \cdot_S)$ be rings, and let $ f: R \to S$ be a function on the underlying sets. We call $ f$ a ring homomorphism if $ f$ is a group homomorphism of the underlying abelian groups, for all $ x,y \in R$ we have $ f(x \cdot_R y) = f(x) \cdot_S f(y)$, and $ f(1_R) = 1_S$. A bijective ring homomorphism is called an isomorphism.

Indeed, we have the usual properties of ring homomorphisms that we would expect. All ring homomorphisms preserve the additive and multiplicative identities, multiplicative inverses (when they exist), and things like zero-divisors. We leave these verifications to the reader.

Ring homomorphisms have the same kinds of constructions as group homomorphisms did. One of particular importance is the kernel $ \textup{ker} f$, which is the preimage of zero under $ f$, or equivalently the set $ \left \{ x : f(x) = 0 \right \}$. This is the same definition as for group homomorphisms, linear maps, etc.

The second question we want to ask about rings is: what is the appropriate notion of a sub-structure for rings? One possibility is obvious.

Definition: A subring $ S$ of a ring $ R$ is a subset $ S \subset R$ which is also a ring under the same operations as those of $ R$ (and with the same identities).

Unfortunately subrings do not have the properties we want them to have, and this is mostly because of the requirement that our rings have a multiplicative identity. For instance, we want to say that the kernel of a ring homomorphism $ f: R \to S$ is a subring of $ R$. This will obviously not be the case, because the multiplicative identity never maps to zero under a ring homomorphism! We also want an appropriate notion of a quotient ring, but if we quotient out (“declare to be zero”) a subring, then the identity will become zero, which yields all sorts of contradictions in all but the most trivial of rings. One nice thing, however, is that the image of a ring homomorphism $ R \to S$ is a subring of $ S$. It is a trivial exercise to prove.

Rather than modify our definition of a kernel or a quotient to make it work with the existing definition of a subring, we pick a different choice of an appropriate sub-structure: the ideal.

The Study of Ideals

From an elementary perspective, an ideal is easiest understood as a generalization of even numbers. Even integers have this nice property that multiplying any integer by an even integer gives you an even integer. This is a kind of closure property that mathematicians really love to think about, because they tend to lead to further interesting properties and constructions. Indeed, we can generalize it as follows:

Definition: Let $ R$ be a commutative ring, and $ I \subset R$. We call $ I$ an ideal if two conditions are satisfied:

  1. $ I$ is a subgroup of $ R$ under addition.
  2. For all $ r \in R$ and for all $ x \in I$, we have $ rx \in I$.

That is, an ideal is an additive subgroup which is closed under multiplication by arbitrary elements of the ring. The even integers as a subset of $ \mathbb{Z}$ give a nice example, as does the set $ \left \{ 0 \right \}$. In fact, every ideal contains zero by virtue of being a subgroup. A slightly more complicated example is the set of polynomials divisible by $ (x-1)$ as a subset of the polynomial ring $ \mathbb{R}[x]$.

We have to be a little careful here if our ring is not commutative. Technically this definition above is for a left-ideal, which is closed under left-multiplication. You can also define right-ideals closed under right-multiplication, and the official name for a plain old “ideal” is a two-sided ideal. The only time we will ever work with noncommutative rings (that we can envision) is with matrix rings, so excluding that case our ideals will forevermore be two-sided. Oftentimes we will use the notation $ rI$ to denote the set of all possible left multiplications $ \left \{ rx : x \in I \right \}$, and so the ideal condition is just $ rI \subset I$. This is a multiplicative kind of coset, although as we are about to see we will use both additive and multiplicative cosets in talking about rings.

Let’s look at some properties of ideals that show how neatly this definition encapsulates what we want. First,

Proposition: The kernel of a ring homomorphism is an ideal.

Proof. Let $ f: R \to S$ be a ring homomorphism, and let $ I = \textup{ker}(f)$. We already know that $ I$ is a subgroup of $ R$ under addition because kernels of group homomorphisms are subgroups. To show the ideal condition, simply take $ r \in R, x \in I$ and note that $ f(rx) = f(r)f(x) = f(r)0 = 0$, and so $ rx \in \textup{ker}(f)$. $ \square$

In fact the correspondence is one-to-one: every ideal is the kernel of a ring homomorphism and every ring homomorphism’s kernel is an ideal. This is not surprising as it was the case for groups, and the story starts here with quotients, too.

Definition: Let $ R$ be a ring and $ I$ an ideal of $ R$. The quotient group $ R/I$ forms a ring called the quotient ring, and is still denoted by $ R / I$.

To show this definition makes any sense requires some thought: what are the operations of this new ring? Are they well-defined on coset representatives?

For the first, we already know the addition operation because it is the same operation when we view $ R / I$ as a quotient group; that is, $ (x + I) + (y + I) = (x+y) + I$. The additive identity is just $ 0 + I = I$. The multiplication operation is similar: $ (x + I)(y + I) = xy + I$. And the multiplicative identity is clearly $ 1 + I$.

The fact that multiplication works as we said it does above gives more motivation for the definition of an ideal. To prove it, pick any representatives $ x + z_1 \in x + I$ and $ y + z_2 \in y + I$. Their product is

$ (xy + xz_2 + yz_1 + z_1z_2) \in xy + xI + yI + I^2$

where we denote by $ I^2 = \left \{ xy : x \in I, y \in I \right \}$ the set of products of pairs of elements of $ I$. The condition for an ideal to be an ideal is precisely that the weird parts, the $ xI + yI + I^2$, collapse into $ I$. And indeed, $ I$ is an additive group, so sums of things in $ I$ are in $ I$, and moreover $ I$ is closed under multiplication by arbitrary elements of $ R$. More rigorously, $ xy + xz_2 + yz_1 + z_1z_2$ is equivalent to $ xy$ under the coset equivalence relation because their difference $ xz_2 + yz_1 + z_1z_2$ is a member of $ I$.

Now we can realize that every ideal is the kernel of some homomorphism. Indeed, if $ I$ is an ideal of $ R$, then there is a natural map $ \pi : R \to R/I$ (called the canonical projection, see our post on universal properties for more) defined by sending an element to its coset: $ \pi(x) = x + I$. It should be obvious to the reader that the kernel of $ \pi$ is exactly $ I$ (because $ x + I = 0 + I$ if and only if $ x \in I$; this is a fact we borrow from groups). And so there is a correspondence between kernels and ideals. They are really the same concept manifested in two different ways.
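If a concrete example helps, here is a toy model of a quotient ring in code: $ \mathbb{Z}/6\mathbb{Z}$, the integers modulo the ideal generated by 6. The canonical projection is just “reduce mod 6,” and the coset operations are well-defined exactly as argued above. (The class and its names are purely illustrative, not part of any codebase from the series.)

# A toy model of the quotient ring Z/6Z: a coset x + I is represented by the
# canonical projection pi(x) = x mod 6, and the ring operations act on cosets.
class IntegersModSix:
   n = 6

   def project(self, x):           # the canonical projection pi: Z -> Z/6Z
      return x % self.n

   def add(self, a, b):            # (x + I) + (y + I) = (x + y) + I
      return (a + b) % self.n

   def mul(self, a, b):            # (x + I)(y + I) = xy + I
      return (a * b) % self.n

R = IntegersModSix()
# Well-definedness: different representatives of the same cosets give the same product.
print(R.mul(R.project(2), R.project(3)) == R.mul(R.project(8), R.project(-3)))   # True, both give the coset of 0
# The kernel of the projection is exactly the ideal 6Z.
print([x for x in range(-6, 13) if R.project(x) == 0])    # [-6, 0, 6, 12]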

Because we have quotient rings and ring homomorphisms, we actually get a number of “theorems” for free. These are the isomorphism theorems for rings, the most important one being the analogue of the First Isomorphism Theorem for groups. That is, if $ f: R \to S$ is a ring homomorphism, $ \textup{im}(f)$ is a subring of $ S$, and moreover $ R/ \textup{ker}(f) \cong \textup{im}(f)$. We invite the reader to prove it by hand (start by defining a map $ \varphi(x + I) = f(x)$ and prove it’s a ring isomorphism). There are some other isomorphism theorems, but they’re not particularly deep; rather, they’re just commonly-applied corollaries of the first isomorphism theorem. In light of this blog’s discussions on category theory, the isomorphism theorems are a trivial consequence of the fact that rings have quotients, and that $ \textup{im}(f)$ happens to be well-behaved.

Noetherian Rings and Principal Ideal Domains

One cannot overestimate the importance of ideals as a fundamental concept in ring theory. They are literally the cornerstone of everything interesting in the subject, and especially the computational aspects of the field (more on that in future posts). So to study ideals, we make some basic definitions.

Definition: Let $ A \subset R$ be any subset of the ring $ R$. The ideal generated by $ A$ is the smallest ideal of $ R$ containing $ A$, denoted $ I(A)$. It is “smallest” in the sense that all ideals containing $ A$ must also contain $ I(A)$.

Indeed, we can realize $ I(A)$ directly as the set of all possible finite linear combinations $ r_1 a_1 + \dots + r_k a_k$ where the $ r_i \in R$ and $ a_i \in A$. Such a linear combination is called an $ R$-linear combination because the “coefficients” come from $ R$. It is not hard to see that this is an ideal. It is obviously a subgroup since sums of linear combinations are also linear combinations, and negation distributes across the sum. It satisfies multiplicative closure because

$ r(r_1a_1 + \dots + r_k a_k) = (rr_1)a_1 + \dots + (rr_k)a_k$

is another $ R$-linear combination.

One convenient way to think of this ideal is as the intersection of all ideals $ I$ which contain $ A$. Since the intersection of any family of ideals is an ideal (check this!) this will always give us the smallest possible ideal containing $ A$.

One important bit of notation is that if $ I = I(A)$ is the ideal generated by a finite set $ A$, then we write $ I = \left \langle a_1, \dots, a_n \right \rangle$ where $ A = \left \{ a_1, \dots, a_n \right \}$. We say that $ I$ is finitely generated. If $ A$ happens to be a singleton, we say that $ I$ is a principal ideal. Following the notation of linear algebra, a minimal (by cardinality) generating set for an ideal is called a basis for the ideal. 
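As a tiny computational taste of what’s to come, membership in the principal ideal $ \left \langle x - 1 \right \rangle \subset \mathbb{Q}[x]$ from our earlier example can be tested by polynomial division: a polynomial is in the ideal exactly when division by $ x - 1$ leaves remainder zero (equivalently, when 1 is a root). Here is a sketch using sympy; the function name is just for illustration.

from sympy import symbols, div

# f is in the principal ideal <x - 1> of Q[x] exactly when dividing by x - 1
# leaves no remainder, i.e. exactly when f(1) = 0.
x = symbols('x')

def inIdealGeneratedByXMinusOne(f):
   quotient, remainder = div(f, x - 1, x)
   return remainder == 0

print(inIdealGeneratedByXMinusOne(x**2 - 1))              # True:  x^2 - 1 = (x - 1)(x + 1)
print(inIdealGeneratedByXMinusOne(x**2 + 1))              # False: 1 is not a root
print(inIdealGeneratedByXMinusOne((x - 1)*(x**5 + 7)))    # True by construction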

Thinking about how ideals are generated is extremely important both in the pure theory of rings and in the computational algorithms that deal with them. It’s so important, in fact, that there are entire classes of rings defined by how their ideals behave. We give the two most important ones here.

Definition: A commutative ring is called Noetherian if all of its ideals are finitely generated. An integral domain is called a principal ideal domain if all of its ideals are principal.

The concept of a Noetherian ring is a particularly juicy one, and it was made famous by the founding mother of commutative ring theory, Emmy Noether. Without going into too much detail, just as an integral domain is the most faithful abstraction of the ring of integers, a Noetherian ring is the best way to think about polynomial rings (and quotients of polynomial rings). This is highlighted by a few basic facts and some deep theorems:

Fact: If $ R$ is a Noetherian ring and $ I$ an ideal, then $ R/I$ is Noetherian.

This follows from the correspondence of ideals between $ R$ and $ R/I$. Indeed, for every ideal $ J \subset R/I$ there is an ideal $ \pi^{-1}(J)$ of $ R$ which contains $ I$. Moreover, this correspondence is a bijection. So if we want a generating set for $ J$, we can lift it up to $ R$ via $ \pi$, get a finite generating set for $ \pi^{-1}(J)$, and project the generators back down to $ R/I$. Some of the generators will be killed off (sent to $ I$ by $ \pi$), but what remains will be a valid generating set for $ J$, and still finite.

Theorem (Hilbert Basis Theorem): If $ R$ is a Noetherian ring, then the polynomial ring in one variable $ R[x]$ is Noetherian. Inductively, $ R[x_1, \dots, x_n]$ is also Noetherian.

These theorems start to lay the foundation for algebraic geometry, which connects ideals generated by a family of polynomials to the geometric solution set of those polynomials. Since a vast array of industrial problems can be reduced to solving systems of polynomial equations (take robot motion planning, for example), it would be quite convenient if we could write programs to reason about those systems of equations. Indeed, such algorithms do exist, and we make heavy use of theorems like the Hilbert Basis Theorem to ensure their correctness.

In the next post in this series, we’ll start this journey through elementary algebraic geometry by defining a variety, and establishing its relationship to ideals of polynomial rings. We’ll then work towards theorems like Hilbert’s Nullstellensatz, the computational operations we wish to perform on ideals, and the algorithms that carry out those operations. As usual, we’ll eventually see some applications and write some code.

Until then!