About j2kun

Mathematician, programmer, writer.

Miller-Rabin Primality Test

Problem: Determine if a number is prime, with an acceptably small error rate.

Solution: (in Python)

import random

def decompose(n):
   exponentOfTwo = 0

   while n % 2 == 0:
      n = n/2
      exponentOfTwo += 1

   return exponentOfTwo, n

def isWitness(possibleWitness, p, exponent, remainder):
   possibleWitness = pow(possibleWitness, remainder, p)

   if possibleWitness == 1 or possibleWitness == p - 1:
      return False

   for _ in range(exponent):
      possibleWitness = pow(possibleWitness, 2, p)

      if possibleWitness == p - 1:
         return False

   return True

def probablyPrime(p, accuracy=100):
   if p == 2 or p == 3: return True
   if p < 2: return False

   numTries = 0
   exponent, remainder = decompose(p - 1)

   for _ in range(accuracy):
      possibleWitness = random.randint(2, p - 2)
      if isWitness(possibleWitness, p, exponent, remainder):
         return False

   return True

Discussion: This algorithm is known as the Miller-Rabin primality test, and it was a very important breakthrough in the study of probabilistic algorithms.

Efficiently testing whether a number is prime is a crucial problem in cryptography, because the security of many cryptosystems depends on the use of large randomly chosen primes. Indeed, we’ve seen one on this blog already which is in widespread use: RSA. Randomized algorithms also have quite useful applications in general, because it’s often that a solution which is correct with probability, say, 2^{-100} is good enough for practice.

But from a theoretical and historical perspective, primality testing lied at the center of a huge problem in complexity theory. In particular, it is unknown whether algorithms which have access to randomness and can output probably correct answers are more powerful than those that don’t. The use of randomness in algorithms comes in a number of formalizations, the most prominent of which is called BPP (Bounded-error Probabilistic Polynomial time). The Miller-Rabin algorithm shows that primality testing is in BPP. On the other hand, algorithms solvable in polynomial time without randomness are in a class called P.

For a long time (from 1975 to 2002), it was unknown whether primality testing was in P or not. There are very few remaining important problems that have BPP algorithms but are not known to be in P. Polynomial identity testing is the main example, and until 2002 primality testing shared its title. Now primality has a known polynomial-time algorithm. One might argue that (in theory) the Miller-Rabin test is now useless, but it’s still a nice example of a nontrivial BPP algorithm.

The algorithm relies on the following theorem:

Theorem: if p is a prime, let s be the maximal power of 2 dividing p-1, so that p-1 = 2^{s}d and d is odd. Then for any 1 \leq n \leq p-1, one of two things happens:

  • n^d = 1 \mod p or
  • n^{2^j d} = -1 \mod p for some 0 \leq j < s.

The algorithm then simply operates as follows: pick nonzero n at random until both of the above conditions fail. Such an n is called a witness for the fact that p is a composite. If p is not a prime, then there is at least a 3/4 chance that a randomly chosen n will be a witness.

We leave the proof of the theorem as an exercise. Start with the fact that a^{p-1} = 1 \mod p (this is Fermat’s Little Theorem). Then use induction to take square roots (the result has to be +/-1 mod p), and continue until you get to a^{d}=1 \mod p.

The Python code above uses Python’s built in modular exponentiation function pow to do fast modular exponents. The isWitness function first checks n^d = 1 \mod p and then all powers n^{2^j d} = -1 \mod p. The probablyPrime function then simply generates random potential witnesses and checks them via the previous function. The output of the function is True if and only if all of the needed modular equivalences hold for all witnesses inspected. The choice of endpoints being 2 and p-2 are because 1 and p-1 will always have exponents 1 mod p.

About these ads

Why Theoretical Computer Scientists Aren’t Worried About Privacy

There has been a lot of news recently on government surveillance of its citizens. The biggest two that have pervaded my news feeds are the protests in Turkey, which in particular have resulted in particular oppression of social media users, and the recent light on the US National Security Agency’s widespread “backdoor” in industry databases at Google, Verizon, Facebook, and others. It appears that the facts are in flux, as some companies have denied their involvement in this program, but regardless of the truth the eye of the public has landed firmly on questions of privacy.

Barack Obama weighed in on the controversy as well, being quoted as saying,

You can’t have 100% security and 100% privacy, and also zero inconvenience.

I don’t know what balance the US government hopes to strike, but what I do know is that privacy and convenience are technologically possible, and we need not relinquish security to attain it.

Before I elaborate, let me get my personal beliefs out of the way. I consider the threat of terrorism low compared to the hundreds of other ways I can die. I should know, as I personally have been within an \varepsilon fraction of my life for all \varepsilon > 0 (when I was seven I was hit by a bus, proclaimed dead, and revived). So I take traffic security much more seriously than terrorism, and the usual statistics will back me up in claiming one would be irrational to do otherwise. On the other hand, I also believe that I only need so much privacy. So I don’t mind making much of my personal information public, and I opt in to every one of Google’s tracking services in the hopes that my user experience can be improved. Indeed it has, as services like Google Now will, e.g., track my favorite bands for me based on my Google Play listening and purchasing habits, and alert me when there are concerts in my area. If only it could go one step further and alert me of trending topics in theoretical computer science! I have much more utility for timely knowledge of these sorts of things than I do for the privacy of my Facebook posts. Of course, ideologically I’m against violating privacy as a matter of policy, but this is a different matter. One can personally loathe a specific genre of music and still recognize its value and one’s right to enjoy it.

But putting my personal beliefs aside, I want to make it clear that there is no technological barrier to maintaining privacy and utility. This may sound shocking, but it rings true to the theoretical computer scientist. Researchers in cryptography have experienced this feeling many times, that their wildest cryptographic dreams are not only possible but feasible! Public-key encryption and digital signatures, secret sharing on a public channel, zero-knowledge verification, and many other protocols have been realized quite soon after being imagined. There are still some engineering barriers to implementing these technologies efficiently in large-scale systems, but with demand and a few years of focused work there is nothing stopping them from being used by the public. I want to use this short post to describe two of the more recent ideas that have pervaded the crypto community and provide references for further reading.

Differential Privacy and Fully Homomorphic Encryption

There are two facts which are well known in theoretical computer science that the general public is not aware of. The first is about the privacy of databases:

There is a way to mine information from a database without the ability to inspect individual entries in the database.

This is known as differential privacy. The second is no less magical:

There are secure encryption schemes which allow one to run programs on encrypted data and produce encrypted results, without the ability to decrypt the data.

This is known as fully homomorphic encryption. 

The implications of these two facts should be obvious: search engines need not know our queries but can still fetch us search results and mine our information to serve ads, Facebook need not have access to our personal data but may still accurately predict new friends, grocery stores can even know what products to place side by side without knowing what any individual customer has purchased. Banks could process our transactions without knowing the amounts involved, or even the parties involved. Perhaps most importantly, governments can have access to databases (in the form of differentially private queries) and mine for the existence of threats without violating any individual user’s privacy. If they get an indication of a terrorist threat, then they can use the usual channels (court orders) to get access to specific individual data.

It’s easy to argue that these techniques will never become mainstream enough for individuals to benefit from it. Indeed, we’ve had cryptography for many years but few average users actively encrypt their communication for a lack of convenience. And then there are questions of policy: why would any company relinquish the ability to work directly with user data? And the cost of rearchitecturing existing services to utilize these technologies would be enough to dissuade most business leaders.

But the point of all this is that these are problems of policy that could in principle be solved without waiting for governments and corporations to get their act together. With enough demand for such services and with enough technologically-minded entrepreneurs (I’m looking at you, Silicon Valley), it would be just a matter of time before the world was differentially private. Mathematics cannot be revoked or legislated away.

Fully Homomorphic Encryption

A fully homomorphic encryption scheme is a normal encryption scheme (two functions “enc” and “dec” to encrypt and decrypt) with one additional function, which we’ll call “eval.” Very roughly, eval accepts as input the text of a program and a ciphertext, and produces as output a ciphertext such that the following diagram commutes:

FHE-diagram

That is, m is our message, and \textup{eval} runs f on the encrypted version of our message. In practice this happens by lifting two operations, multiplication and addition, from plaintexts (which are usually number-representations of letters) to ciphertexts (again usually numbers). Once this is done one can simulate the functionality of an arbitrary circuit on the encrypted data without decrypting it. Those readers who have been following our category theory series will recognize these sorts of diagrams as being functorial. [Actually, at the time of this writing we have yet to look at functors, but we will soon!] So perhaps a better term would be “functorial encryption.”

I should emphasize: a truly homomorphic encryption scheme has the ability to run any computable function on the encrypted data. There is no loss of functionality in preserving the privacy from the program runner. The main use of this is to maintain privacy while deferring large computations to the cloud. We do this all the time, e.g. a search query, but it also applies to big websites like Reddit, which operate entirely on Amazon Web Services.

Fully homomorphic encryption was first envisaged by Rivest, Adleman (two of the inventors of RSA), and Dertouzos in the late seventies, mainly because the RSA encryption scheme is close to being homomorphic (one can multiply ciphertexts, but not add them). In 2009, Craig Gentry released the first working fully-homomorphic scheme based on the mathematical theory of ideal lattices, and later that year he (with a group of other researchers) came up with a second system that is arguably as simple as RSA; it operates on integers with modular arithmetic.

Gentry has produced a lot of research since then in homomorphic encryption, but the interested reader should probably start with his tutorial paper describing his arithmetic-based system. From there, there are existing implementations in Python (using Sage) and C++, both of which are freely available on github.

Differential Privacy

The main idea of differential privacy is that one can add noise to statistical data to protect the identities of individual records. Slightly more rigorously, a randomized algorithm f is said to be \varepsilon-differentially private if for all possible datasets (inputs) D_1, D_2 which differ on a single record, and all possible collections of outputs y of f, the probability of correctly guessing D_1 from y is not significantly different from that of D_2. In particular, their quotient is at most e^{\varepsilon} (this choice of using e is arbitrary, but makes the analysis nicer).

The motivation for differential privacy came from two notable events in which companies released “anonymized” data which was partially de-anonymized because it was too specific. The first was the million-dollar Netflix Prize contest to develop a better recommendation algorithm, and the second was the release of the Massachusetts Group Insurance Commission medical database. As such, many companies are very strict with how they handle their user data, and information sharing the medical community is practically nonexistent.

There are many known differentially private algorithms, and they’re much stronger than one would imagine at first. One can run random forests of decision trees, network trace analysis, query-click analysis, certain forms of clustering, and a whole host of combinatorial optimization problems. For a gentle introduction to differential privacy, see Christine Task’s lecture video, a Practical Beginner’s Guide to Differential Privacy. There is also an influential survey from Microsoft Research of Dwork. These go into much more detail about the abilities and inabilities of differential privacy than I could do here.

If there’s one thing to take away from this discussion, it’s that efficient protocols for ensuring privacy are out there waiting to be implemented in software. So while we complain and listen to others complain about governments violating our liberties (and indeed, this discussion is extremely important to have), let’s do a little mathematics, do a little computer science, and figure out how to make privacy the standard of measure in software.

Until next time!

Conferences, Summer Work, and an Advisor

I’ve been spending a little less time on my blog recently then I’d like to, but for good reason: I’ve been attending two weeks of research conferences, I’m getting ready for a summer internship in cybersecurity, and I’ve finally chosen an advisor.

Visions, STOC, and CCC

The Simons Institute at UC Berkeley

I’ve been taking a break from the Midwest for the last two weeks to attend some of this year’s seminal computer science theory conferences. The first (which is already over) was the Simon’s Institute Symposium on the Visions of the Theory of Computing. It wasn’t a typical computer science theory conference, because the Simon’s Institute exists to bring techniques from computer science to other fields. So there were many talks in life sciences, ranging from neurobiology (modelling the brain as a computer) to nanochemistry (using DNA to engineer computers) to physics (quantum computing). These kinds of sciences are interesting to me, but pretty far beyond my knowledge base. Other talks were on social networks (and the web), designing markets to be efficient, and computational privacy.

All of these talks shared a common theme: how can we apply the computational “lens” to everything? Interestingly, there were some very deep philosophical questions that computational insight can give answers to. For instance, Christos Papadimitriou gave a talk on evolution and sex, and raised an interesting question: what is the place of sex in evolution? Briefly, the problem with sex is as follows. Say you have a perfect human: perfect health, perfect genes, perfect intelligence, perfect everything. Because of sex, all of this human’s offspring are guaranteed to be imperfect. If the “goal” of evolution is to promote the fitness of a species, it seems that sexual reproduction works against us. Why then, is sexual reproduction the norm and asexual reproduction the exception?

Indeed, the same patterns hold in computer science. Genetic algorithms are the analogue of sexual reproduction, while “simulated annealing” (gradient descent with random mutation) is the analogue of asexual reproduction. And there we can actually experiment, and (without insulting anyone too heavily) it’s largely the case that the asexual methods outperform sexual methods.

But with computational (mathematical) models, we can actually analyze them to see what’s going on. And in fact, genetic algorithms are not optimizing a fitness function. Instead, they are simultaneously optimizing fitness and entropy in the gene pool. One might argue this allows for maximal adaptability, and there are some connections to learning-theory methods that mirror evolution (see Multiplicative Weights Update Algorithm).

This is pretty cool to think about, but more deeply there are philosophical claims about how we can apply the computational lens to other fields. The idea is that computer scientists can create and analyze generative models which reflect the empirical phenomena we observe. Indeed, this mentality is in our blood. It stems from (in the words of my advisor) a “deeply religious” view that everything is computation. And even beyond “big data” and other popular trends in the news, researchers in other fields are increasingly appreciative of the algorithmic mindset computer scientists have to offer.

The second two conferences, the Symposium on the Theory of Computing (STOC) and the Conference on Computational Complexity (CCC) are much more technical. STOC is a whirlwind of a conference, with around 15 twenty-minute talks per day (30 in total with two running in parallel at a time). And I admit that I understand very little. As I write this I’m lost during a talk on an online version of the Steiner tree problem. But even so, it gives a great glimpse into the frontier of research, and tells me what things I should be looking at in my own work.

One new notion that seems omnipresent at STOC this year is the concept of differential privacy. The idea stems from recent controversies over privacy in “anonymized” database information that was released to the public being de-anonymized in clever ways. So the question becomes: is it possible for a database to release information from its databases without revealing any information about the contents of the databases? “Information” could mean simple things like means and medians, or more complicated things like principal components. It’s a question about the mechanism for releasing data, and it’s inherently a computational problem: we want it to be (probabilistically) hard to accurately represent the data from the information provided. At a very high level, the idea is to add random noise to the data as you compute, and modify your methods to be resilient to such noise. There is a flurry of work being done on this topic, with a wealth of open problems waiting to be tackled. A related idea is that of homomorphic encryption, in which the user sends encrypted data to the compute server, the compute server computes things without decrypting the data, and then the user decrypts the result of the computation locally. The server, then, can have computational powers far beyond that of the user, but the user need not relinquish sensitive information to access that power.

A schematic example of differential privacy at work. Image credit: Microsoft Research.

Another popular topic (which I know even less about) is the idea of communication complexity and information complexity. Roughly, this field is concerned with two communicating parties with private information trying to compute a function that depends on both of their information. The communication complexity question is how can they do this while spending the least amount of resources on communication. The information complexity concern is to keep as much information private as possible, both between the two parties involved and any nefarious outsiders listening in on the conversation.

It’s amazing to me how much privacy concerns have permeated into theoretical computer science. The usual picture of a mathematician is of a wiry-haired chalk-dust-covered professor with no concern for the world outside his or her own mind. But in reality, we are actively working on the problems faced by average users, to protect them against the dangers of small groups of individuals making bad decisions with their sensitive data. Anyone worried about the upcoming dystopia posed by the masses of available data is apparently not informed in the magical powers of modern cryptography. It seems like it will only be a few years before these kinds of systems become feasible for large-scale applications. Maybe I’ll be lucky enough to have a hand in that :)

Starting Wednesday is the CCC, and this will likely be more concerned with more “pure” kinds of theoretical computer science research. I’m pretty excited for it, too (although I know by the end I will be exhausted; I’m already feeling the strain today).

MIT Lincoln Lab

This summer (actually, in a few days), I’ll be at MIT’s Lincoln Lab to work on problems in cybersecurity. I still don’t know quite what I’ll be working on (and I probably won’t be at liberty to describe it when I do find out), but vaguely the problem is in finding the right data representations for massive amounts of data so as to reason about it.

I know that the idea of the government analyzing the world’s internet activity is a scary thought, but I’m going to do my best to go into this with an open mind. At the very least, I’ll be familiar enough with the kind of work they do there to decide if it violates any of my principles. I had minor worries about this last year when I went to work at Lawrence Livermore, but by the time I got there it was quite obvious that my work would keep me far far away from weapons development and closer to problems with renewable energy.

And at least it will be fun to live on the east coast for a little while. In anticipation I’ve joined the Boston Python Users group, so it will be fun to go to some of their project nights and talk math and programming with like-minded people in the area.

Advisor

Finally, I’m quite pleased to announce that I have secured an advisor for my PhD program! His name is Lev Reyzin, and he’s a learning theorist. (For the last year or so I’ve been meaning to start talking about learning theory on this blog). That means that I’m officially in the field of theoretical computer science.

I was a little bit late in choosing an advisor, mostly because I meandered around algebraic geometry and algebraic topology and logic before deciding that theoretical computer science was the right field for me.

It’s exciting to finally have an advisor because it implies a few things:

  • I’ve passed all of my prelim exams.
  • I can focus on one area and start to deeply understand important problems in theoretical computer science.
  • My graduate life will shift from mainly course-taking to mainly research. Research is much more fun.

Lev is also a good fit for me both because I’m interested in learning theory, and because he has nontrivial experience in industry-focused research. He’s done work at Yahoo! Research and Google Research, in particular on improving their algorithms for placing news stories and ads to the front page. The formal name for this problem is the multi-armed bandit problem, and I’ll definitely be writing about it on this blog. While I’d probably be totally happy doing pure learning theory, I think it’s fun and worthwhile to look at wider applications. So I have a lot of optimism that the next few years will be fun and productive.

Until next time!

Rings — A Second Primer

Last time we defined and gave some examples of rings. Recapping, a ring is a special kind of group with an additional multiplication operation that “plays nicely” with addition. The important thing to remember is that a ring is intended to remind us arithmetic with integers (though not too much: multiplication in a ring need not be commutative). We proved some basic properties, like zero being unique and negation being well-behaved. We gave a quick definition of an integral domain as a place where the only way to multiply two things to get zero is when one of the multiplicands was already zero, and of a Euclidean domain where we can perform nice algorithms like the one for computing the greatest common divisor. Finally, we saw a very important example of the ring of polynomials.

In this post we’ll take a small step upward from looking at low-level features of rings, and start considering how general rings relate to each other. The reader familiar with this blog will notice many similarities between this post and our post on group theory. Indeed, the definitions here will be “motivated” by an attempt to replicate the same kinds of structures we found helpful in group theory (subgroups, homomorphisms, quotients, and kernels). And because rings are also abelian groups, we will play fast and loose with a number of the proofs here by relying on facts we proved in our two-post series on group theory. The ideas assume a decidedly distinct flavor here (mostly in ideals), and in future posts we will see how this affects the computational aspects in more detail.

Homomorphisms and Sub-structures

The first question we should ask about rings is: what should mappings between rings look like? For groups they were set-functions which preserved the group operation (f(xy) = f(x)f(y) for all x,y). The idea of preserving algebraic structure translates to rings, but as rings have two operations we must preserve both.

Definition: Let (R, +_R, \cdot_R), (S, +_S, \cdot_S) be rings, and let f: R \to S be a function on the underlying sets. We call f a ring homomorphism if f is a group homomorphism of the underlying abelian groups, and for all x,y \in R we have f(x \cdot_R y) = f(x) \cdot_S f(y). A bijective ring homomorphism is called an isomorphism.

Indeed, we have the usual properties of ring homomorphisms that we would expect. All ring homomorphisms preserve the additive and multiplicative identities, multiplicative inverses (when they exist), and things like zero-divisors. We leave these verifications to the reader.

Ring homomorphisms have the same kinds of constructions as group homomorphisms did. One of particular importance is the kernel \textup{ker} f, which is the preimage of zero under f, or equivalently the set \left \{ x : f(x) = 0 \right \}. This is the same definition as for group homomorphisms, linear maps, etc.

The second question we want to ask about rings is: what is the appropriate notion of a sub-structure for rings? One possibility is obvious.

Definition: A subring S of a ring R is a subset S \subset R which is also a ring under the same operations as those of R (and with the same identities).

Unfortunately subrings do not have the properties we want them to have, and this is mostly because of the requirement that our rings have a multiplicative identity. For instance, we want to say that the kernel of a ring homomorphism f: R \to S is a subring of R. This will obviously not be the case, because the multiplicative identity never maps to zero under a ring homomorphism! We also want an appropriate notion of a quotient ring, but if we quotient out (“declare to be zero”) a subring, then the identity will become zero, which yields all sorts of contradictions in all but the most trivial of rings. One nice thing, however, is that the image of a ring homomorphism R \to S is a subring of S. It is a trivial exercise to prove.

Rather than modify our definition of a kernel or a quotient to make it work with the existing definition of a subring, we pick a different choice of an appropriate sub-structure: the ideal.

The Study of Ideals

From an elementary perspective, an ideal is easiest understood as a generalization of even numbers. Even integers have this nice property that multiplying any integer by an even integer gives you an even integer. This is a kind of closure property that mathematicians really love to think about, because they tend to lead to further interesting properties and constructions. Indeed, we can generalize it as follows:

Definition: Let R be a commutative ring, and I \subset R. We call I an ideal if two conditions are satisfied:

  1. I is a subgroup of R under addition.
  2. For all r \in R and for all x \in I, we have rx \in I.

That is, an ideal is a subgroup closed under multiplication. The even integers as a subset of \mathbb{Z} give a nice example, as does the set \left \{ 0 \right \}. In fact, every ideal contains zero by being a subgroup. A slightly more complicated example is the set of polynomials divisible by (x-1) as a subset of the polynomial ring \mathbb{R}[x].

We have to be a little careful here if our ring is not commutative. Technically this definition above is for a left-ideal, which is closed under left-multiplication. You can also define right-ideals closed under right-multiplication, and the official name for a plain old “ideal” is a two-sided ideal. The only time we will ever work with noncommutative rings (that we can envision) is with matrix rings, so excluding that case our ideals will forevermore be two-sided. Often times we will use the notation rI to denote the set of all possible left multiplications \left \{ rx : x \in I \right \}, and so the ideal condition is just rI = I. This is a multiplicative kind of coset, although as we are about to see we will use both additive and multiplicative cosets in talking about rings.

Let’s look at some properties of ideals that show how neatly this definition encapsulates what we want. First,

Proposition: The kernel of a ring homomorphism is an ideal.

Proof. Let f: R \to S be a ring homomorphism, and let I = \textup{ker}(f). We already know that I is a subgroup of R under addition because kernels of group homomorphisms are subgroups. To show the ideal condition, simply take r \in R, x \in I and note that f(rx) = f(r)f(x) = f(r)0 = 0, and so rx \in \textup{ker}(f). \square

In fact the correspondence is one-to-one: every ideal is the kernel of a ring homomorphism and every ring homomorphism’s kernel is an ideal. This is not surprising as it was the case for groups, and the story starts here with quotients, too.

Definition: Let R be a ring and I an ideal of R. The quotient group R/I forms a ring called the quotient ring, and is still denoted by R / I.

To show this definition makes any sense requires some thought: what are the operations of this new ring? Are they well-defined on coset representatives?

For the first, we already know the addition operation because it is the same operation when we view R / I as a quotient group; that is, (x + I) + (y + I) = (x+y) + I. The additive identity is just 0 + I = I. The multiplication operation is similar: (x + I)(y + I) = xy + I. And the multiplicative identity is clearly 1 + I.

The fact that multiplication works as we said it does above gives more motivation for the definition of an ideal. To prove it, pick any representatives x + z_1 \in x + I and y + z_2 \in y + I. Their product is

(xy + xz_2 + yz_1 + z_1z_2) \in xy + xI + yI + I^2

Where we denote by I^2 = \left \{ xy : x \in I, y \in I \right \}. The condition for an ideal to be an ideal is precisely that the weird parts, the xI + yI + I^2, become just I. And indeed, I is an additive group, so sums of things in I are in I, and moreover I is closed under multiplication by arbitrary elements of R. More rigorously, xy + xz_2 + yz_1 + z_1z_2 is equivalent to xy under the coset equivalence relation because their difference xz_2 + yz_1 + z_1z_2 is a member of I.

Now we can realize that every ideal is the kernel of some homomorphism. Indeed, if I is an ideal of R, then there is a natural map \pi : R \to R/I (called the canonical projection, see our post on universal properties for more) defined by sending an element to its coset: \pi(x) = x+ I. It should be obvious to the reader that the kernel of \pi is exactly I (because x + i = 0 + I if and only if x \in I; this is a fact we borrow from groups). And so there is a correspondence between kernels and ideals. They are really the same concept manifested in two different ways.

Because we have quotient rings and ring homomorphisms, we actually get a number of “theorems” for free. These are the isomorphism theorems for rings, the most important one being the analogue of the First Isomorphism Theorem for groups. That is, if f: R \to S is a ring homomorphism, \textup{im}(f) is a subring of S, and moreover R/ \textup{ker}(f) \cong \textup{im}(f). We invite the reader to prove it by hand (start by defining a map \varphi(x + I) = f(x) and prove it’s a ring isomorphism). There are some other isomorphism theorems, but they’re not particularly deep; rather, they’re just commonly-applied corollaries of the first isomorphism theorem. In light of this blog’s discussions on category theory, the isomorphism theorems are a trivial consequence of the fact that rings have quotients, and that \textup{im}(f) happens to be well-behaved.

Noetherian Rings and Principal Ideal Domains

One cannot overestimate the importance of ideals as a fundamental concept in ring theory. They are literally the cornerstone of everything interesting in the subject, and especially the computational aspects of the field (more on that in future posts). So to study ideals, we make some basic definitions.

Definition: Let A \subset R be any subset of the ring R. The ideal generated by A is the smallest ideal of R containing A, denoted I(A). It is “smallest” in the sense that all ideals containing A must also contain I(A).

Indeed, we can realize I(A) directly as the set of all possible finite linear combinations r_1 a_1 + \dots r_k a_k where the r_i \in R and a_i \in A. Such a linear combination is called an R-linear combination because the “coefficients” come from R. It is not hard to see that this is an ideal. It is obviously a subgroup since sums of linear combinations are also linear combinations, and negation distributes across the sum. It satisfies multiplicative closure because

r(r_1a_1 + \dots + r_k a_k) = (rr_1)a_1 + \dots + (rr_k)a_k

is another R-linear combination.

One convenient way to think of this ideal is as the intersection of all ideals I which contain A. Since the intersection of any family of ideals is an ideal (check this!) this will always give us the smallest possible ideal containing A.

One important bit of notation is that if I = I(A) is the ideal generated by a finite set A, then we write I = \left \langle a_1, \dots, a_n \right \rangle where A = \left \{ a_1, \dots, a_n \right \}. We say that I is finitely generated. If A happens to be a singleton, we say that I is a principal ideal. Following the notation of linear algebra, a minimal (by cardinality) generating set for an ideal is called a basis for the ideal. 

Thinking about how ideals are generated is extremely important both in the pure theory of rings and in the computational algorithms that deal with them. It’s so important, in fact, that there are entire classes of rings defined by how their ideals behave. We give the two most important ones here.

Definition: A commutative ring is called Noetherian if all of its ideals are finitely generated. An integral domain is called a principal ideal domain if all of its ideals are principal.

The concept of a Noetherian ring is a particularly juicy one, and it was made famous by the founding mother of commutative ring theory, Emmy Noether. Without going into too much detail, just as an integral domain is the most faithful abstraction of the ring of integers, a Noetherian ring is the best way to think about polynomial rings (and quotients of polynomial rings). This is highlighted by a few basic facts and some deep theorems:

Fact: If R is a Noetherian ring and I an ideal, then R/I is Noetherian.

This follows from the correspondence of ideals between R and R/I. Indeed, for every ideal J \subset R/I there is an ideal \pi^{-1}(J) of R which contains I. Moreover, this correspondence is a bijection. So if we want a generating set for J, we can lift it up to R via \pi, get a finite generating set for \pi^{-1}(J), and project the generators back down to R/I. Some of the generators will be killed off (sent to I by \pi), but what remains will be a valid generating set for J, and still finite.

Theorem (Hilbert Basis Theorem): If R is a Noetherian ring, then the polynomial ring in one variable R[x] is Noetherian. Inductively, R[x_1, \dots, x_n] is also Noetherian.

These theorems start to lay the foundation for algebraic geometry, which connects ideals generated by a family of polynomials to the geometric solution set of those polynomials. Since a vast array of industrial problems can be reduced to solving systems of polynomial equations (take robot motion planning, for example), it would be quite convenient if we could write programs to reason about those systems of equations. Indeed, such algorithms do exist, and we make heavy use of theorems like the Hilbert Basis Theorem to ensure their correctness.

In the next post in this series, we’ll start this journey through elementary algebraic geometry by defining a variety, and establishing its relationship to ideals of polynomial rings. We’ll then work towards theorems like Hilbert’s Nullstellensatz, the computational operations we wish to perform on ideals, and the algorithms that carry out those operations. As usual, we’ll eventually see some applications and write some code.

Until then!