# Introducing Elliptic Curves

With all the recent revelations of government spying and backdoors into cryptographic standards, I am starting to disagree with the argument that you should never roll your own cryptography. Of course there are massive pitfalls and very few people actually need home-brewed cryptography, but history has made it clear that blindly accepting the word of the experts is not an acceptable course of action. What we really need is more understanding of cryptography, and implementing the algorithms yourself is the best way to do that. [1]

For example, the crypto community is quickly moving away from the RSA standard (which we covered in this blog post). Why? It turns out that people are getting just good enough at factoring integers that secure key sizes are getting too big to be efficient. Many experts have been calling for the security industry to switch to Elliptic Curve Cryptography (ECC), because, as we’ll see, the problem appears to be more complex and hence achieves higher security with smaller keys. Considering the known backdoors placed by the NSA into certain ECC standards, elliptic curve cryptography is a hot contemporary issue. If nothing else, understanding elliptic curves allows one to understand the existing backdoor.

I’ve seen some elliptic curve primers floating around with all the recent talk of cryptography, but very few of them seem to give an adequate technical description [2], and legible implementations designed to explain ECC algorithms aren’t easy to find (I haven’t found any).

So in this series of posts we’re going to get knee deep in a mess of elliptic curves and write a full implementation. If you want motivation for elliptic curves, or if you want to understand how to implement your own ECC, or you want to understand the nuts and bolts of an existing implementation, or you want to know some of the major open problems in the theory of elliptic curves, this series is for you.

The series will have the following parts:

1. Elliptic curves as elementary equations
2. The algebraic structure of elliptic curves
3. Points on elliptic curves as Python objects
4. Elliptic curves over finite fields
5. Elliptic curve cryptography and random number generation

Along the way we’ll survey a host of mathematical topics as needed, including group theory, projective geometry, and the theory of cryptographic security. We won’t assume any familiarity with these topics ahead of time, but we do intend to develop some maturity over the course of the series without giving full courses on the side-topics. When appropriate, we’ll refer to the relevant parts of the many primers this blog offers.

A list of the posts in the series (as they are published) can be found on the Main Content page. And as usual, all programs produced in the making of this series will be available on this blog’s GitHub page.

The first post will be published on Monday 2014-02-10. Hope you enjoy it!

[1] Okay, what people usually mean is that you shouldn’t use your own cryptography for things that actually matter, but I think a lot of the warnings are interpreted or extended to, “Don’t bother implementing cryptographic algorithms, just understand them at a fuzzy high level.” I imagine this results in fewer resources for people looking to learn cryptography and the mathematics behind it, and at least it prohibits them from appreciating how much really goes into an industry-strength solution. And this mindset is what made the NSA backdoor so easy: the devil was in the details.
[2] From my heavily biased standpoint as a mathematician.

# Thinking about Graduate School? Consider Mathematical Computer Science at UI Chicago!

It’s that time of year when senior undergraduates are considering whether to go to graduate school. And I wouldn’t be surprised if many students were afraid of the prospect, perhaps having read that popular genre of articles these days that tell you graduate school will turn you into an emotional wreck and that only a psychopathic masochist would put themselves through it.

The problem with these articles is they’re usually written by both outliers and those who put themselves in situations with no other options. I’ve felt my time at UI Chicago, however, has provided me nothing but options and excitement! So if you’re thinking about graduate school in mathematics or theoretical computer science, here’s my pitch for

## Why you should come to UI Chicago and study theoretical computer science

We’re social.

In fact, UI Chicago’s mathematics department is the most social of any math department I’ve ever heard of. I think this is the biggest benefit for me. On my first day here, I was surprised that everyone was totally normal and not the typical weird antisocial stereotype one associates with people who like math. Our department has a huge list of seminars going on every day of the week, and a small party every Friday called “Tea” that has a large attendance. We often go out to bars and restaurants, and have other outings. We even have a Facebook group (for grad students only) and a ping pong league that the professors sometimes join. We currently have over 150 graduate students in our department, and I know around 70 by name.

We have world-class faculty.

Some of my colleagues came to UIC specifically to work with David Marker on model theory, or Lou Kauffman on knot theory. At least one researcher here has over two hundred publications! We have big names in algebraic geometry, hypergraph combinatorics, dynamical systems, low-dimensional topology, and a very active logic group. Our theoretical computer science group (mixed with our combinatorics group) is small but vibrant and growing fast. We just got three new mathematical computer science students this year, and I’m doing everything I can to convert some of the other students over to our side.

We’re in the middle of a thriving intellectual community.

Chicago is the center of the Midwest US, and there are a ton of universities not only in the city but within a few hours’ drive. There are regular seminars and colloquia at the University of Chicago, Northwestern, and other smaller institutions like the Toyota Technological Institute (which has very strong researchers). Then there are the universities of Wisconsin, Indiana, and Michigan, which all have strong theoretical computer science groups (and of course other mathematics groups), and we get together for conferences like Midwest Theory Day.

Our department is not cutthroat competitive.

I hear rumors about top mathematics and computer science programs that (unintentionally) pit students against each other for the attention of a few glorified professors. That simply doesn’t happen here. Everyone is friendly and people regularly collaborate. You can approach any professor and ask to do a reading course with them or ask them what kinds of open problems they’re thinking about, and most of them will gladly sit down with you and explain all the neat ideas in their heads. Even the hardest, most sarcastic professors genuinely care about their students. I think, along with being social, this makes our department one of the friendliest and most stress-free places to get a PhD.

We’re in a great city.

Our department staff is very supportive.

Our director and assistant director of graduate studies are extremely helpful at getting new students situated and ensuring they have funding. It’s not uncommon for students who start in the PhD program to decide after one or two years that a PhD is not right for them. Usually they will stop with the requirements for a master’s degree, and there are no hard feelings. Students who do this are even encouraged to return if they decide they want to finish their PhD later. In the meantime, our department guarantees tuition waivers and stipends to all of its teaching assistants (and there are alternatives to teaching as well), so you can focus on your studies and not have to think too much about money.

And even more, if you decide to study theoretical computer science at UI Chicago you get a whole bunch of other benefits:

You get to hang out and do research with me!

(Okay maybe that’s not a serious benefit to consider)

Jobs are hard to come by for the purest of pure mathematics researchers. Research positions are in short supply, and unless you want to go into industry with an applied math degree the remaining option is to teach at a 4-year institution. But if you study theoretical computer science, now you are qualified to do all kinds of things. Work at industry research labs like Microsoft Research, Google Research, or Yahoo! Research. Work at government labs like Lincoln Labs and Lawrence Livermore National Labs, both of which I interned at. You can shoot for a professorship or do a postdoc like a regular mathematics PhD would. If you’re handy with Python you could go into the software industry and get a high-demand job at any major company in cryptography or operations research (both of which depend on ideas from TCS). And you always keep the option of teaching at a 4-year.

You have many options for internships during summers.

My colleagues, my advisor, and I have all done research internships during the summers at various research labs and industry companies. This is a particularly nice benefit of doing mathematical computer science in grad school, because it augments your normal graduate student stipend by enough to live much more comfortably than otherwise (that being said, for extra money a lot of my pure math colleagues will tutor on the side, and tutoring comes at a high price these days). It’s not uncommon to receive additional funding through these opportunities as well.

You get to travel a lot.

The main publication venue in computer science is the conference, and that means there are conferences happening all over the world all the time. In fact, I just got back from a conference in Aachen, Germany; earlier this year I was at Berkeley and Stanford; I am helping to run a conference in Florida early next year; and I am looking at conferences in Beijing and Barcelona next summer. All of the trips you take to present your published research are paid for, so it’s just pure awesome.

You enjoy the breadth of problems in computer science.

Computer science is unique in that it connects to almost every field of mathematics.

1. Like statistics? There’s statistical machine learning and randomized algorithm design.
2. Like real analysis and dynamical systems? There’s convex optimization, support vector machines, and tons of computational aspects of PDEs.
3. Like algebra or number theory? There’s cryptography.
4. Like combinatorics? There’s combinatorial optimization.
5. Like game theory? I just got back from a conference on algorithmic game theory.
6. Like geometry and representation theory? There’s a Geometric Complexity Theory program working toward P vs NP.
7. Like logic? You might be surprised to know that the cleanest proofs of the incompleteness theorems are via Turing machines.
8. Like topology? There are researchers (not at UIC) working on computational topology, like persistent homology which we’ve been slowly covering on this blog.

The list just goes on and on, and this isn’t even mentioning the purely pure theoretical computer science topics which have a flavor of their own.

Programming options exist, but you aren’t forced to write programs.

Some of the greatest computer science researchers cannot write simple computer programs, and if you’re just interested in theory there is plenty of theory to go around. On the other hand, we have researchers in our department studying aspects of supercomputing, and options for collaboration with researchers in the (engineering) computer science department. Over there they’re studying things like biological networks, machine learning and robotics, and all kinds of hands-on applied stuff that you might be interested in if you read this blog.

So if you’re interested in joining us for next year and have any questions, feel free to drop me or the professors in the MCS group or the director of graduate studies an email.

# Status Update, and Boston Python Project Night

Hey everyone, this is just a quick note to say what’s going on.

Recently I’ve been slowing down a bit on blog posts. I’ve been travelling and settling in at my summer job in Boston, where we’re looking at a lot of really fascinating questions w.r.t. clustering and learning theory. For various reasons (my proper hardware isn’t with me, time constraints, etc.), I haven’t been able to get down to the meaty posts I’ve been hoping to work on, such as the follow-up to Bezier & Picasso and more on category theory. I will do my best to continue posting, but until about September most of the posts will be on the mathy side with no hand-made images and no YouTube videos.

On a brighter note, I will be attending this upcoming Monday’s meeting of the Boston Python User’s Group! It’s going to be pretty fun, and as of this writing there are 43 spots left. So if you’re in the area, if you enjoy free pizza, and if you want to come and meet people and code Python for a few hours, I’ll be there working on some stuff related to stable marriages. You’ll know me as the guy who keeps talking about math.

As a side note, I noticed there is no comparable meetup in Chicago. Despite my already busy life (i.e., maybe in a year or two), I was considering starting a Python meetup in Chicago. Then I thought about how small the software scene is in Chicago, and wondered if I would be able to find enough sponsors and people to give talks. It’s more of a pipe dream right now, but maybe someday it can become a reality. At least I can start by signing up to give a talk (i.e., evangelize mathematics) at the Boston meetup later next month.

In other news, I’ve got a small surprise tomorrow in the form of a fun guest post on a friend’s blog. Keep an eye out for that.

Until next time!

# Why Theoretical Computer Scientists Aren’t Worried About Privacy

There has been a lot of news recently on government surveillance of its citizens. The two biggest stories that have pervaded my news feeds are the protests in Turkey, which have resulted in particular oppression of social media users, and the recent light shed on the US National Security Agency’s widespread “backdoor” in industry databases at Google, Verizon, Facebook, and others. It appears that the facts are in flux, as some companies have denied their involvement in this program, but regardless of the truth the eye of the public has landed firmly on questions of privacy.

Barack Obama weighed in on the controversy as well, being quoted as saying,

You can’t have 100% security and 100% privacy, and also zero inconvenience.

I don’t know what balance the US government hopes to strike, but what I do know is that privacy and convenience are technologically possible, and we need not relinquish security to attain them.

Before I elaborate, let me get my personal beliefs out of the way. I consider the threat of terrorism low compared to the hundreds of other ways I can die. I should know, as I personally have been within an $\varepsilon$ fraction of my life for all $\varepsilon > 0$ (when I was seven I was hit by a bus, proclaimed dead, and revived). So I take traffic security much more seriously than terrorism, and the usual statistics will back me up in claiming one would be irrational to do otherwise. On the other hand, I also believe that I only need so much privacy. So I don’t mind making much of my personal information public, and I opt in to every one of Google’s tracking services in the hopes that my user experience can be improved. Indeed it has, as services like Google Now will, e.g., track my favorite bands for me based on my Google Play listening and purchasing habits, and alert me when there are concerts in my area. If only it could go one step further and alert me of trending topics in theoretical computer science! I have much more utility for timely knowledge of these sorts of things than I do for the privacy of my Facebook posts. Of course, ideologically I’m against violating privacy as a matter of policy, but this is a different matter. One can personally loathe a specific genre of music and still recognize its value and one’s right to enjoy it.

But putting my personal beliefs aside, I want to make it clear that there is no technological barrier to maintaining privacy and utility. This may sound shocking, but it rings true to the theoretical computer scientist. Researchers in cryptography have experienced this feeling many times, that their wildest cryptographic dreams are not only possible but feasible! Public-key encryption and digital signatures, secret sharing on a public channel, zero-knowledge verification, and many other protocols have been realized quite soon after being imagined. There are still some engineering barriers to implementing these technologies efficiently in large-scale systems, but with demand and a few years of focused work there is nothing stopping them from being used by the public. I want to use this short post to describe two of the more recent ideas that have pervaded the crypto community and provide references for further reading.

## Differential Privacy and Fully Homomorphic Encryption

There are two facts which are well known in theoretical computer science that the general public is not aware of. The first is about the privacy of databases:

There is a way to mine information from a database without the ability to inspect individual entries in the database.

This is known as differential privacy. The second is no less magical:

There are secure encryption schemes which allow one to run programs on encrypted data and produce encrypted results, without the ability to decrypt the data.

This is known as fully homomorphic encryption.

The implications of these two facts should be obvious: search engines need not know our queries but can still fetch us search results and mine our information to serve ads, Facebook need not have access to our personal data but may still accurately predict new friends, grocery stores can even know what products to place side by side without knowing what any individual customer has purchased. Banks could process our transactions without knowing the amounts involved, or even the parties involved. Perhaps most importantly, governments can have access to databases (in the form of differentially private queries) and mine for the existence of threats without violating any individual user’s privacy. If they get an indication of a terrorist threat, then they can use the usual channels (court orders) to get access to specific individual data.

It’s easy to argue that these techniques will never become mainstream enough for individuals to benefit from them. Indeed, we’ve had cryptography for many years but few average users actively encrypt their communication, for lack of convenience. And then there are questions of policy: why would any company relinquish the ability to work directly with user data? And the cost of rearchitecting existing services to utilize these technologies would be enough to dissuade most business leaders.

But the point of all this is that these are problems of policy that could in principle be solved without waiting for governments and corporations to get their act together. With enough demand for such services and with enough technologically-minded entrepreneurs (I’m looking at you, Silicon Valley), it would be just a matter of time before the world was differentially private. Mathematics cannot be revoked or legislated away.

## Fully Homomorphic Encryption

A fully homomorphic encryption scheme is a normal encryption scheme (two functions “enc” and “dec” to encrypt and decrypt) with one additional function, which we’ll call “eval.” Very roughly, eval accepts as input the text of a program and a ciphertext, and produces as output a ciphertext such that the following diagram commutes:

That is, $m$ is our message, and $\textup{eval}$ runs $f$ on the encrypted version of our message. In practice this happens by lifting two operations, multiplication and addition, from plaintexts (which are usually number-representations of letters) to ciphertexts (again usually numbers). Once this is done one can simulate the functionality of an arbitrary circuit on the encrypted data without decrypting it. Those readers who have been following our category theory series will recognize these sorts of diagrams as being functorial. [Actually, at the time of this writing we have yet to look at functors, but we will soon!] So perhaps a better term would be “functorial encryption.”
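For readers who prefer an equation to a diagram, the commuting condition can be restated as follows. This is only a restatement in the notation above, with $f$ standing for the program handed to $\textup{eval}$; it glosses over keys and randomness, which a real scheme would carry along.

$$\textup{dec}(\textup{eval}(f, \textup{enc}(m))) = f(m)$$

In words: evaluating $f$ on the ciphertext and then decrypting gives the same answer as running $f$ on the plaintext directly.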

I should emphasize: a fully homomorphic encryption scheme has the ability to run any computable function on the encrypted data. There is no loss of functionality in preserving the privacy from the program runner. The main use of this is to maintain privacy while deferring large computations to the cloud. We do this all the time, e.g. with a search query, but it also applies to big websites like Reddit, which operate entirely on Amazon Web Services.

Fully homomorphic encryption was first envisaged by Rivest, Adleman (two of the inventors of RSA), and Dertouzos in the late seventies, mainly because the RSA encryption scheme is close to being homomorphic (one can multiply ciphertexts, but not add them). In 2009, Craig Gentry released the first working fully-homomorphic scheme based on the mathematical theory of ideal lattices, and later that year he (with a group of other researchers) came up with a second system that is arguably as simple as RSA; it operates on integers with modular arithmetic.
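To see what “close to being homomorphic” means for RSA, here is a minimal sketch using textbook RSA with toy parameters. The primes, exponent, and messages are illustrative assumptions chosen only to keep the numbers small; this is unpadded RSA and is not secure, and the names `enc` and `dec` simply mirror the notation above.

```python
# Toy textbook RSA (no padding, tiny primes): for illustration only, NOT secure.
p, q = 61, 53                 # assumed toy primes
n = p * q                     # public modulus
phi = (p - 1) * (q - 1)
e = 17                        # public exponent, coprime to phi
d = pow(e, -1, phi)           # private exponent (modular inverse, Python 3.8+)


def enc(m):
    """Encrypt a plaintext integer 0 <= m < n."""
    return pow(m, e, n)


def dec(c):
    """Decrypt a ciphertext integer."""
    return pow(c, d, n)


if __name__ == "__main__":
    m1, m2 = 7, 12
    # Multiply the two ciphertexts without ever decrypting them.
    c_product = (enc(m1) * enc(m2)) % n
    # Decrypting the product of ciphertexts yields the product of plaintexts.
    print(dec(c_product) == (m1 * m2) % n)   # True
    # Adding ciphertexts has no such guarantee: dec((enc(m1) + enc(m2)) % n)
    # is, in general, unrelated to m1 + m2.
```

The multiplication works because $(m_1^e)(m_2^e) = (m_1 m_2)^e \mod n$. There is no analogous identity for sums, which is why RSA alone supports only one of the two operations a fully homomorphic scheme needs.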

Gentry has produced a lot of research since then in homomorphic encryption, but the interested reader should probably start with his tutorial paper describing his arithmetic-based system. From there, there are existing implementations in Python (using Sage) and C++, both of which are freely available on github.

## Differential Privacy

The main idea of differential privacy is that one can add noise to statistical data to protect the identities of individual records. Slightly more rigorously, a randomized algorithm $f$ is said to be $\varepsilon$-differentially private if for all possible datasets (inputs) $D_1, D_2$ which differ on a single record, and all possible collections of outputs $y$ of $f$, the probability that $f(D_1)$ lands in $y$ is not significantly different from the probability that $f(D_2)$ does. In particular, the quotient of these two probabilities is at most $e^{\varepsilon}$ (this choice of using $e$ is arbitrary, but makes the analysis nicer).
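As a concrete instance of “adding noise,” here is a minimal sketch of the Laplace mechanism applied to a counting query. This is one standard technique, not the only one, and the function names, the sample ages, and the choice $\varepsilon = 1$ are all hypothetical. It relies on the fact that changing a single record changes a count by at most 1, so Laplace noise of scale $1/\varepsilon$ is enough for $\varepsilon$-differential privacy.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponential samples with mean `scale`
    follows a Laplace distribution centered at 0 with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records satisfying `predicate`.

    A count has sensitivity 1 (one record changes it by at most 1),
    so the noise scale is 1 / epsilon.
    """
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    ages = [18, 25, 31, 42, 57, 63]   # hypothetical database of ages
    noisy = private_count(ages, lambda age: age >= 30, epsilon=1.0)
    print("noisy count of people aged 30+:", noisy)
```

Each individual answer is blurred, but the noise has mean zero, so aggregate statistics stay useful. A smaller $\varepsilon$ means more noise and stronger privacy.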

The motivation for differential privacy came from two notable events in which companies released “anonymized” data which was partially de-anonymized because it was too specific. The first was the million-dollar Netflix Prize contest to develop a better recommendation algorithm, and the second was the release of the Massachusetts Group Insurance Commission medical database. As a result, many companies are very strict with how they handle their user data, and information sharing in the medical community is practically nonexistent.

There are many known differentially private algorithms, and they’re much stronger than one would imagine at first. One can run random forests of decision trees, network trace analysis, query-click analysis, certain forms of clustering, and a whole host of combinatorial optimization problems. For a gentle introduction to differential privacy, see Christine Task’s lecture video, A Practical Beginner’s Guide to Differential Privacy. There is also an influential survey by Dwork of Microsoft Research. These go into much more detail about the abilities and inabilities of differential privacy than I could do here.

If there’s one thing to take away from this discussion, it’s that efficient protocols for ensuring privacy are out there waiting to be implemented in software. So while we complain and listen to others complain about governments violating our liberties (and indeed, this discussion is extremely important to have), let’s do a little mathematics, do a little computer science, and figure out how to make privacy the standard of measure in software.

Until next time!