Two’s Complement and Group Theory

Before I discovered math, I was a first year undergrad computer science student taking Electrical Engineering 101. The first topic I learned was what bits and boolean gates are, and the second was the two’s complement representation of a negative n-bit integer.

At the time two’s complement seemed to me like a bizarre quirk of computer programming, with minutiae you just had to memorize. If the leading bit is 1, it’s negative, and otherwise it’s positive. To negate a number, flip the bits and add one. Except of course the number 1000 0000, which in 8-bit two’s complement is -128, because its “negative” according to this operation is also 1000 0000 = -128.
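Here's a quick Python sketch (mine, not part of that course) of those rules, quirk included:

```python
def as_signed(bits, n=8):
    """Interpret an n-bit pattern as a two's complement integer."""
    return bits - 2**n if bits >= 2**(n - 1) else bits

def negate(bits, n=8):
    """Negate by flipping the bits and adding one, staying within n bits."""
    return (~bits + 1) % 2**n

print(as_signed(0b00000101))           # 5
print(as_signed(negate(0b00000101)))   # -5
print(as_signed(0b10000000))           # -128
print(as_signed(negate(0b10000000)))   # -128 again: its "negative" is itself
```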

The quirky negation operation is usually explained by appealing to the “borrowing” operation from elementary school subtraction. Or an appeal to a sense of shared helplessness. That’s just the way things are in two’s complement. But it doesn’t quite explain why one can use the same boolean circuit to compute addition and multiplication of unsigned and two’s complement signed integers.

Math can provide a different, clearer explanation for the quirkiness. It’s not arbitrary, but forced. The explanation uses group theory, which I wrote a primer about a long time ago. The main ideas from group theory used here are cyclic groups, quotient groups, group isomorphisms, and group actions.

The first insight is that the signed integers are the exact same as the unsigned integers as a quotient group, just with a different equivalence class representative chosen for the arithmetic.

The set of $n$-bit unsigned integers $\mathbb{Z}/2^n\mathbb{Z} = \{ 0, 1, \dots, 2^n-1 \}$ forms a group under addition modulo $2^n$. The set of $n$-bit signed integers $\{ -2^{n-1}, \dots, -1, 0, 1, \dots, 2^{n-1} - 1 \}$ also forms a group, but it’s harder to see why. The operation is still “addition modulo $2^n$”, but you have to recall the definition of a quotient group to see why it works.

The elements of the quotient group $\mathbb{Z}/2^n \mathbb{Z}$ aren’t integers, but rather equivalence classes of integers. We denote $[x] \in \mathbb{Z}/2^n \mathbb{Z}$ as the equivalence class of $x \in \mathbb{Z}$. For example, the “element” $1$ is the equivalence class $[1] = \{ \dots, -2^n+1, 1, 2^n + 1, 2\cdot 2^n + 1, \dots \}$. And the whole point of defining equivalence classes is that you can choose any representative element from an equivalence class, do the arithmetic using that element, and you get the same (equivalence class) output as you would if you had used the “usual” representative. In other words, it’s guaranteed that $[x+y] = [x] + [y] = [x'] + [y'] = [x' + y']$, so long as $[x] = [x']$ and $[y] = [y']$.

Signed integers are exactly this: choosing a different representative for some of the equivalence classes. The element $[2^{n-1}] = \{ \dots, -3\cdot 2^{n-1}, -2^{n-1}, 2^{n-1}, 3\cdot 2^{n-1}, \dots \}$ is represented by $-2^{n-1}$, the element $[2^{n-1} + 1]$ is represented by $-2^{n-1} + 1$, all the way up to $[2^n - 1]$ being represented by $-1$. This is a formal way to say two’s complement is a “reinterpretation” of the bits of an unsigned integer. It’s not an arbitrary reinterpretation, but one that satisfies the exact same arithmetic structure as the unsigned interpretation.

So any circuit representation of an addition operation that works for unsigned integers can also be used for addition for all possible alternative choices of equivalence class representatives. Note this has nothing to do with the way these numbers are represented as bits. It’s a guarantee of the mathematics underlying modular arithmetic. It would work if the numbers were represented in ternary or base 10 (excluding the fact that there would be some extra inputs and outputs not in the set). Also note it applies equally well to multiplication, since everything said here applies equally well to the quotient ring of integers modulo $2^n$.
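To make that claim concrete, here's a small Python sketch (again mine) of a single "add modulo $2^n$" routine whose output is correct under both the unsigned and the signed reinterpretation:

```python
N = 8

def add_mod(a_bits, b_bits):
    """One 'adder circuit': add two N-bit patterns modulo 2^N."""
    return (a_bits + b_bits) % 2**N

def to_bits(x):
    """Choose the canonical representative of [x] in {0, ..., 2^N - 1}."""
    return x % 2**N

def as_signed(bits):
    """Reinterpret an N-bit pattern as a two's complement integer."""
    return bits - 2**N if bits >= 2**(N - 1) else bits

result = add_mod(to_bits(-100), to_bits(37))
print(result)             # 193, the correct unsigned sum of 156 and 37
print(as_signed(result))  # -63, the correct signed sum of -100 and 37
```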

It also makes it clear that the choice of $-2^{n-1}$ as the smallest negative number is arbitrary, and one could equally well include $2^{n-1}$ as the largest positive, leaving $-2^{n-1} + 1$ as the smallest negative. I think the current standard choice ($-2^{n-1}$ as the smallest negative number) is objectively better than this alternative, because it allows one to use the leading sign bit as a negativeness/overflow/underflow detector.

Now we can study the negation operation in this group. For starters, for each element $x$ except $-2^{n-1}$, the number $-x$ is already a signed $n$-bit integer. So negation is just negation (we’ll get back to how this is interpreted as bits in a second). And if we think of $-2^{n-1}$ as an equivalence class, it’s the same as $2^{n-1}$, which is its own inverse as an unsigned integer. Add it to itself and you get $2^n \equiv 0 \mod 2^n$. Since equivalence classes behave identically no matter the representative, $-2^{n-1}$ must be its own inverse, and hence $-(-2^{n-1}) = -2^{n-1}$ as signed integers.

To interpret it as bits, again lift back to unsigned integers, and notice that “negation” for an unsigned integer $x$—that is, finding the value $y$ such that $x+y \equiv 0 \mod 2^n$—is computed via $-[x] = [2^n - x]$ (subtraction inside the equivalence class is defined even though it’s being used to define $-[x]$, because inside an equivalence class the arithmetic is just arithmetic on integers). But because $2^n$ isn’t representable in concrete $n$-bit numbers (it’s just zero), to compute it entirely within $n$-bit integers, we need to break the arithmetic down further:

$$ [2^n - x ] = [ 2^n - 1 - x + 1 ] = [ 2^n - 1 - x ] + [ 1 ] $$

$2^n-1$ is the string of all 1’s, hence $2^n - 1 - x$ is equivalent to flipping the bits of $x$, and adding 1 finishes the calculation.
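For a concrete example with $n = 8$, take $x = 5 = 0000\,0101$. Flipping the bits gives $1111\,1010 = 250 = 2^8 - 1 - 5$, and adding one gives $251 = 2^8 - 5$, which is exactly the unsigned representative of the class $[-5]$.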

Finally, this $-2^{n-1}$-is-its-own-inverse business. If you view the group of unsigned integers as a circle—a natural choice because addition “wraps around” like clock math—the negation operation can be seen as a reflection across the line passing through $0$ and the circle’s center. Naturally this makes the negation of $0$ equal to $0$. The other point that line of symmetry passes through is $[2^{n-1}] = [-2^{n-1}]$, and so it must also fix $-2^{n-1}$.

[Image: Signed $n$-bit integers visualized on a circle. Reflection across the dashed horizontal line defines negation.]

This is explanation enough, but the pesky question is whether it’s forced. Is there some other representation better than two’s complement for which this behavior doesn’t occur? So long as our number system has an even size and 0 is its own inverse, simple counting shows it’s impossible. We’d have an odd number $2^n - 1$ of nonzero elements, and the negation operation would match them up in pairs, producing an even total unless one of the “pairs” is a singleton, i.e., unless some nonzero element is its own negative.

Aside: The Wikipedia article on this topic had a somewhat sloppy explanation using group actions (really, Burnside’s lemma), which, while correct, is like using the Millennium Falcon to shoot a sparrow. I simplified it to the above counting argument.

Group Actions and Hashing Unordered Multisets

I learned of a neat result due to Kevin Ventullo that uses group actions to study the structure of hash functions for unordered sets and multisets. This piqued my interest because a while back a colleague asked me if I could think of any applications of “pure” group theory to practical computer programming that were not cryptographic in nature. He meant, not including rings, fields, or vector spaces whose definitions happen to be groups when you forget the extra structure. I couldn’t think of any at the time, and years later Kevin has provided me with a tidy example. I took the liberty of rephrasing his argument to be a bit simpler (I hope), but I encourage you to read Kevin’s post to see the reasoning in its original form.

Hashes are useful in programming, and occasionally one wants to hash an unordered set or multiset in such a way that the hash does not depend on the order the elements were added to the set. Since collection types are usually generic, one often has to craft a hash function for a set<T> or multiset<T> that relies on a hash function hash(t) of an unknown type T. To make things more efficient, rather than requiring one to iterate over the entire set each time a hash is computed, one might seek out a hash function that can be incrementally updated as new elements are added, and provably does not depend on the order of insertion.

For example, having a starting hash of zero, and adding hashes of elements as they are added (modulo $ 2^{64}$) has this incremental order-ignorant property, because addition is commutative and sums can be grouped. XOR-ing the bits of the hashes is similar. However, both of these strategies have downsides.

For example, if you adopt the XOR strategy for a multiset hash, then any element that appears with an even count contributes nothing: the hash is the same as if that element were not in the set at all (or were present with some other even count). This is because x XOR x == 0. On the other hand, if you use the addition approach, then an element that hashes to zero has no effect on the hash, no matter how many times it’s included. In Java the integer hash is the identity, so zero would be undetectable as a member of a multiset of ints in any quantity. Less drastically, a multiset in which every element has an even count will always hash to a multiple of 2, and this makes it easier to find hash collisions.
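Here's a small Python sketch of both degeneracies (my own illustration, not from Kevin's post), using the identity as the hash of an int, as in the Java example above:

```python
from functools import reduce

def xor_hash(multiset):
    """XOR accumulator: order-independent, but elements with even counts vanish."""
    return reduce(lambda acc, x: acc ^ x, multiset, 0)

def add_hash(multiset, n_bits=64):
    """Additive accumulator: order-independent, but elements hashing to zero vanish."""
    return sum(multiset) % 2**n_bits

print(xor_hash([1, 2, 3]) == xor_hash([3, 1, 2]))  # True: insertion order doesn't matter
print(xor_hash([3, 7, 7]) == xor_hash([3]))        # True: the two 7's cancel each other
print(add_hash([5, 0, 0, 0]) == add_hash([5]))     # True: zeros are invisible to the sum
```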

A natural question arises: given the constraint that the hash function is accumulative and commutative, can we avoid such degenerate situations? In principle the answer should obviously be no, just by counting. I.e., the set of all unordered sets of 64-bit integers has size $ 2^{2^{64}}$, while the set of 64-bit hashes has size merely $ 2^{64}$. You will have many many hash collisions, and would need a much longer hash to avoid them in principle.

More than just “no,” we can characterize the structure of such hash functions. They impose an abelian group structure on the set of hashes. And due to the classification theorem of finite abelian groups, up to isomorphism (and for 64-bit hashes) that structure consists of addition on blocks of bits with various power-of-2 moduli, and the blocks are XOR’ed together at the end.

To give more detail, we need to review some group theory, write down the formal properties of an accumulative hash function, and prove the structure theorem.

Group actions, a reminder

This post will assume basic familiarity with group theory as covered previously on this blog. Basically, this introductory post defining groups and actions, and this followup post describing the classification theorem for commutative (abelian) groups. But I will quickly review group actions since they take center stage today.

A group $ G$ defines some abstract notion of symmetries in a way that’s invertible. But a group is rather meaningless by itself; it’s only useful when it “acts” upon a set. For example, a group of symmetries of the square acts upon the square to actually rotate its points. When you have multiple group structures to consider, it makes sense to more formally separate the group structure from the set.

So a group action is formally a triple of a group $ G$, a set $ X$, and a homomorphism $ f:G \to S_X$, where $ S_X$ is the permutation group (or symmetry group) of $ X$, i.e., the set of all bijections $ X \to X$. The permutation group of a set encompasses every possible group that can act on $ X$. In other words, every group is a subgroup of a permutation group. In our case, $ G$ and $ f$ define a subgroup of symmetries on $ X$ via the image of $ f$. If $ f$ is not injective, some of the structure in $ G$ is lost. The homomorphism determines which parts of $ G$ are kept and how they show up in the codomain. The first isomorphism theorem says how: $ G / \textup{ker} f \cong \textup{im} f$.

This relates to our hashing topic because an accumulative hash—and a nicely behaved hash, as we’ll make formal shortly—creates a group action on the set of possible hash values. The image of that group action is the “group structure” imposed by the hash function, and the accumulation function defines the group operation in that setting.

Multisets as a group, and nice hash functions

An appropriate generalization of multisets whose elements come from a base set $ X$ forms a group. This generalization views a multiset as a “counting function” $ T: X \to \mathbb{Z}$. The empty set is the function that assigns zero to each element. A positive value of $ k$ implies the entry shows up in the multiset $ k$ times. And a negative value is like membership “debt,” which is how we represent inverses, or equivalently set difference operations. The inverse of a multiset $ T$, denoted $ -T$, is the multiset with all counts negated elementwise. Since integer-valued functions generally form a group under point-wise addition, these multisets also do. Call this group $ \textup{MSet}(X)$. We will freely use the suggestive notation $ T \cup \{ x \} $ to denote the addition of $ T$ and the function that is 1 on $ x$ and 0 elsewhere. Similarly for $ T – \{ x \}$.
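As a small illustrative sketch (my own, with made-up names), these counting functions can be modeled in Python as dictionaries under pointwise addition and negation:

```python
def mset_add(T, S):
    """Pointwise addition of counting functions X -> Z (the group operation)."""
    out = dict(T)
    for x, count in S.items():
        out[x] = out.get(x, 0) + count
        if out[x] == 0:
            del out[x]  # keep the support finite and the representation canonical
    return out

def mset_neg(T):
    """The inverse multiset: negate every count ("membership debt")."""
    return {x: -count for x, count in T.items()}

T = {"a": 2, "b": 1}
print(mset_add(T, mset_neg(T)))          # {} -- the empty multiset is the identity
print(mset_add(T, {"a": 1}))             # {'a': 3, 'b': 1}, i.e., T ∪ {a}
print(mset_add(T, mset_neg({"c": 1})))   # {'a': 2, 'b': 1, 'c': -1}, i.e., T - {c}
```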

$ \textup{MSet}(X)$ is isomorphic to the free abelian group on $ X$ (because an instance of a multiset only has finitely many members). Now we can define a hash function with three pieces:

  • An arbitrary base hash function $ \textup{hash}: X \to \mathbb{Z} / 2^n \mathbb{Z}$.
  • An arbitrary hash accumulator $ \textup{h}: \mathbb{Z} / 2^n \mathbb{Z} \times \mathbb{Z} / 2^n \mathbb{Z} \to \mathbb{Z} / 2^n \mathbb{Z}$
  • A seed, i.e., a choice for the hash of the empty multiset $ s \in \mathbb{Z} / 2^n \mathbb{Z}$

With these three data we want to define a multiset hash function $ h^*: \textup{MSet}(X) \to \mathbb{Z} / 2^n \mathbb{Z}$ recursively as

  • $ h^*(\left \{ \right \}) = s$
  • $ h^*(T \cup \left \{ x \right \}) = h(h^*(T), \textup{hash}(x))$
  • $ h^*(T - \left \{ x \right \}) = \dots$

In order for the second bullet to lead to a well-defined hash, we need the property that the accumulation order of individual elements does not matter. Call a hash accumulator commutative if, for all $ a, b, c \in \mathbb{Z} / 2^n \mathbb{Z}$,

$ \displaystyle h(h(a,b), c) = h(h(a,c), b)$

This extends naturally to being able to reorder any sequence of hashes being accumulated.

The third bullet is a bit more complicated. We need to be able to use the accumulator to “de-accumulate” the hash of an element $ x$, even when the set that gave rise to that hash didn’t have $ x$ in it to start.

Call a hash accumulator invertible if for a fixed hash $ z = \textup{hash}(x)$, the map $ a \mapsto h(a, z)$ is a bijection. That is, accumulating the hash $ z$ to two sets with different hashes under $ h^*$ will not cause a hash collision. This defines the existence of an inverse map (even if it’s not easily computable). This allows us to finish the third bullet point.

  • Fix $ z = \textup{hash}(x)$ and let $ g$ be the inverse of the map $ a \mapsto h(a, z)$. Then $ h^*(T - \left \{ x \right \}) = g(h^*(T))$

Though we don’t have a specific definition for the inverse above, we don’t need one, because we’re just aiming to characterize the structure of this soon-to-emerge group action. That said, if you implement a multiset hash in practice, it should probably support incremental updates when removing elements, and that formula would apply there.

This gives us the well-definition of $ h^*$. Commutativity allows us to define $ h^*(T \cup S)$ by decomposing $ S$ arbitrarily into its constituent elements (with multiplicity), and applying $ h^*(T \cup \{ x \})$ or $ h^*(T – \{ x \})$ in any order.
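Here's a hypothetical Python sketch of $ h^*$ (the names are mine, not Kevin's), using addition modulo $ 2^n$ as the accumulator, which is both commutative and invertible in the sense above:

```python
N = 64
MOD = 2**N

def h(acc, z):
    """A commutative, invertible hash accumulator: addition mod 2^N."""
    return (acc + z) % MOD

def h_inverse(acc, z):
    """De-accumulate z: the inverse of the bijection a -> h(a, z)."""
    return (acc - z) % MOD

def h_star(multiset, element_hash=hash, seed=0):
    """Hash a multiset given as a dict (element -> count); counts may be negative."""
    acc = seed
    for x, count in multiset.items():
        z = element_hash(x) % MOD
        for _ in range(abs(count)):
            acc = h(acc, z) if count > 0 else h_inverse(acc, z)
    return acc

# Accumulation order doesn't matter, and removing an element undoes adding it.
print(h_star({"a": 1, "b": 2}) == h_star({"b": 2, "a": 1}))                      # True
print(h_inverse(h_star({"a": 1, "b": 1}), hash("b") % MOD) == h_star({"a": 1}))  # True
```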

A group action emerges

Now we have a well-defined hash function on a free abelian group.

$ \displaystyle h^*: \textup{MSet}(X) \to \mathbb{Z} / 2^n \mathbb{Z}$

However, $ h^*$ is not a homomorphism. There’s no reason hash accumulation needs to mesh well with addition of hashes. Instead, the family of operations “combine a hash with the hash of some fixed set” defines a group action on the set of hashes. Let’s suppose for simplicity that $ h^*$ is surjective, i.e., that every hash value can be achieved as the hash of some set. Kevin’s post gives the more nuanced case when this fails, and in that case you work within $ S_{\textup{im}(h^*)}$ instead of all of $ S_{\mathbb{Z} / 2^n \mathbb{Z}}$.

The group action is defined formally as a homomorphism

$ \displaystyle \varphi : \textup{MSet}(X) \to S_{\mathbb{Z} / 2^n \mathbb{Z}}$

where $ \varphi(T)$ is the permutation $ a \mapsto h(a, h^*(T))$. Equivalently, we start from $ a$, pick some set $ S$ with $ h^*(S) = a$, and output $ h^*(T \cup S)$.

The map $ \varphi$ is a homomorphism. The composition of two accumulations is order-independent because $ h$ is commutative. This is how we view $ h$ as “the binary operation” in $ \textup{im} \varphi$, because combining two permutations $ a \mapsto h(a, h^*(T))$ and $ a \mapsto h(a, h^*(S))$ is the permutation $ a \mapsto h(a, h^*(S \cup T))$.

And now we can apply the first isomorphism theorem, that

$ \displaystyle \textup{MSet}(X) / \textup{ker} \varphi \cong \textup{im} \varphi \subset S_{\mathbb{Z} / 2^n \mathbb{Z}}$

This is significant because any quotient of an abelian group is abelian, and this quotient is finite because $ S_{\mathbb{Z} / 2^n \mathbb{Z}}$ is finite. This means that the group $ \textup{im} \varphi$ is isomorphic to

$ \displaystyle \textup{im} \varphi \cong \bigoplus_{i=1}^k \mathbb{Z}/2^{n_i} \mathbb{Z}$

where $ n = \sum_i n_i$. The $ i$-th summand corresponds to a block of $ n_i$ bits of the hash, and within that block the operation is addition modulo $ 2^{n_i}$. Here the “block” structure is where XOR comes in. Each block can be viewed as a bitmask with zeros outside the block, and two members are XOR’ed together, which allows the operations to apply to each block independently.

For example, the group might be $ \mathbb{Z} / 2^{4} \mathbb{Z} \times \mathbb{Z} / 2^{26} \mathbb{Z}\times \mathbb{Z} / 2^{2} \mathbb{Z}$ for a 32-bit hash. The first block corresponds to 32-bit unsigned integers whose top 4 bits may be set but all other bits are zero. Addition is done within those four bits modulo 16, leaving the other bits unchanged. Likewise, the second component has the top four bits zero and the bottom two bits zero, but the remaining 26 bits are summed mod $ 2^{26}$. XOR combines the bits from different blocks.

In one extreme case, you only have one block, your group is just $ \mathbb{Z} / 2^n \mathbb{Z}$, and the usual addition combines hashes. In the other extreme, each bit is its own block, your group is $ (\mathbb{Z} / 2 \mathbb{Z})^n$, and the operation is bitwise XOR.
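To make this tangible, here is a sketch (with a made-up block layout) of a 32-bit combiner whose blocks are 4, 26, and 2 bits wide, counted from the low bits; each block is added with its own modulus, and the results are reassembled on disjoint masks, which is the role XOR plays:

```python
BLOCKS = [4, 26, 2]  # hypothetical block widths, summing to 32

def combine(a, b):
    """Combine two 32-bit hashes blockwise: addition mod 2^width within each block."""
    result, shift = 0, 0
    for width in BLOCKS:
        mask = (1 << width) - 1
        block_sum = (((a >> shift) & mask) + ((b >> shift) & mask)) & mask
        result |= block_sum << shift  # blocks are disjoint, so OR here acts like XOR
        shift += width
    return result

# The extremes: BLOCKS = [32] is plain addition mod 2^32, BLOCKS = [1] * 32 is bitwise XOR.
print(hex(combine(0xDEADBEEF, 0x12345678)))
```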

Note, if instead of $ 2^n$ we used hash values from a set of some other size $ m$, then in the direct sum decomposition above, $ m$ would be the product of the sizes of the components. The choice $ m = 2^n$ maximizes the number of different structures you can have.

Implications for hash function designers

Here’s the takeaway.

First, if you’re trying to design a hash function that avoids the degeneracies mentioned at the beginning of this article, then it will have to break one of the properties listed. This could happen, say, by maintaining additional state.

Second, if you’re resigned to use a commutative, invertible, accumulative hash, then you might as well make this forced structure explicit, and just pick the block structure you want to use in advance. Since no clever bit shifting will allow you to outrun this theorem, you might as well make it simple.

Until next time!

Rings — A Primer

Previously on this blog, we’ve covered two major kinds of algebraic objects: the vector space and the group. There are at least two more fundamental algebraic objects every mathematician should know something about. The first, and the focus of this primer, is the ring. The second, which we’ve mentioned briefly in passing on this blog, is the field. There are a few others important to the pure mathematician, such as the $ R$-module (here $ R$ is a ring). These do have some nice computational properties, but in order to even begin to talk about them we need to know about rings.

A Very Special Kind of Group

Recall that an abelian group $ (G, +)$ is a set $ G$ paired with an associative, commutative binary operation $ +$, where $ G$ has a special identity element called 0 which acts as an identity for $ +$, and where every element has an inverse. The archetypal example of an abelian group is, of course, the integers $ \mathbb{Z}$ under addition, with zero playing the role of the identity element.

The easiest way to think of a ring is as an abelian group with more structure. This structure comes in the form of a multiplication operation which is “compatible” with the addition coming from the group structure.

Definition: A ring $ (R, +, \cdot)$ is a set $ R$ which forms an abelian group under $ +$ (with additive identity 0), and has an additional associative binary operation $ \cdot$ with an element 1 serving as a (two-sided) multiplicative identity. Furthermore, $ \cdot$ distributes over $ +$ in the sense that for all $ x,y,z \in R$

$ x(y+z) = xy + xz$ and $ (y+z)x = yx + zx$

The most important thing to note is that multiplication need not be commutative, either in general or in many of the rings one meets in practice. If multiplication is commutative, then the ring is called commutative. Some easy examples of commutative rings include rings of numbers like $ \mathbb{Z}, \mathbb{Z}/n\mathbb{Z}, \mathbb{Q}, \mathbb{R}$, which are just the abelian groups we know and love with multiplication added on.

If the reader takes anything away from this post, it should be the following:

Rings generalize arithmetic with integers.

Of course, this would imply that all rings are commutative, but this is not the case. More meaty and tempestuous examples of rings are very visibly noncommutative. One of the most important examples is the ring of matrices. In particular, denote by $ M_n(\mathbb{R})$ the set of all $ n \times n$ matrices with real-valued entries. This forms a ring under addition and multiplication of matrices, and has as a multiplicative identity the $ n \times n$ identity matrix $ I_n$.
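For a quick concrete check (a throwaway Python sketch using nested lists rather than any particular matrix library), here are two $ 2 \times 2$ matrices that fail to commute:

```python
def matmul(A, B):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 1], [0, 1]]
B = [[1, 0], [1, 1]]
print(matmul(A, B))  # [[2, 1], [1, 1]]
print(matmul(B, A))  # [[1, 1], [1, 2]], so AB != BA in M_2(R)
```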

Commutative rings are much more well-understood than noncommutative rings, and the study of the former is called commutative algebra. This is the main prerequisite for fields like algebraic geometry, which (in the simplest examples) associate commutative rings to geometric objects.

For us, all rings will have an identity, but many ring theorists will point out that one can just as easily define a ring to not have a multiplicative identity. We will call these non-unital rings, and will rarely, if ever, see them on this blog.

Another very important example of a concrete ring is the polynomial ring in $ n$ variables with coefficients in $ \mathbb{Q}$ or $ \mathbb{R}$. This ring is denoted with square brackets denoting the variables, e.g. $ \mathbb{R}[x_1, x_2, \dots , x_n]$. We rest assured that the reader is familiar with addition and multiplication of polynomials, and that this indeed forms a ring.

Kindergarten Math

Let’s start with some easy properties of rings. We will denote our generic ring by $ R$.

First, the multiplicative identity of a ring is unique. The proof is exactly the same as it was for groups, but note that identities must be two-sided for this to work. If $ 1, 1'$ are two identities, then $ 1 = 1 \cdot 1' = 1'$.

Next, we prove that $ 0a = a0 = 0$ for all $ a \in R$. Indeed, by the fact that multiplication distributes across addition, $ 0a = (0 + 0)a = 0a + 0a$, and additively canceling $ 0a$ from both sides gives $ 0a = 0$. An identical proof works for $ a0$.

In fact, pretty much any “obvious property” from elementary arithmetic is satisfied for rings. For instance, $ -(-a) = a$ and $ (-a)b = a(-b) = -(ab)$ and $ (-1)^2 = 1$ are all trivial to prove. Here is a list of these and more properties which we invite the reader to prove.

Zero Divisors, Integral Domains, and Units

One thing that is very much not automatically given in the general theory of rings is multiplicative cancellation. That is, if I have $ ac = bc$ then it is not guaranteed to be the case that $ a = b$. It is quite easy to come up with examples in modular arithmetic on integers; if $ R = \mathbb{Z}/8\mathbb{Z}$ then $ 2*6 = 6*6 = 4 \mod 8$, but $ 2 \neq 6$.

The reason for this phenomenon is that many rings have elements that lack multiplicative inverses. In $ \mathbb{Z}/8\mathbb{Z}$, for instance, $ 2$ has no multiplicative inverse (and neither does 6). Indeed, one is often interested in determining which elements are invertible in a ring and which elements are not. In a seemingly unrelated issue, one is interested in determining whether one can multiply any given element $ x \in R$ by some nonzero $ y$ to get zero. It turns out that these two conditions are disjoint, and closely related to our further inspection of special classes of rings.

Definition: An element $ x$ of a ring $ R$ is said to be a left zero-divisor if there is some $ y \neq 0$ such that $ xy = 0$. Similarly, $ x$ is a right zero-divisor if there is some $ z \neq 0$ for which $ zx = 0$. If $ x$ is a left and right zero-divisor (e.g. if $ R$ is commutative), it is just called a zero-divisor.

Definition: Let $ x,y \in R$. The element $ y$ is said to be a left inverse to $ x$ if $ yx = 1$, and a right inverse if $ xy = 1$. If there is some $ z$ for which $ xz = zx = 1$, then $ z$ is called a two-sided inverse (or simply the inverse) of $ x$, and $ x$ is called a unit.

As a quick warmup, we prove that if $ x$ has a left and a right inverse then it has a two-sided inverse. Indeed, if $ zx = 1 = xy$, then $ z = z(xy) = (zx)y = y$, so in fact the left and right inverses are the same.

The salient fact here is that having a (left- or right-) inverse allows one to do (left- or right-) cancellation, since obviously when $ ac = bc$ and $ c$ has a right inverse, we can multiply $ acc^{-1} = bcc^{-1}$ to get $ a=b$. We will usually work with two-sided inverses and zero-divisors (since we will usually work in a commutative ring). But in non-commutative rings, like rings of matrices, one-sided phenomena do run rampant, and one must distinguish between them.

The right way to relate these two concepts is as follows. If $ c$ has a right inverse, then define the right-multiplication function $ (- \cdot c) : R \to R$ which takes $ x$ and spits out $ xc$. In fact, this function is an injection. Indeed, we already proved that (because $ c$ has a right inverse) if $ xc = yc$ then $ x = y$. In particular, there is a unique preimage of $ 0$ under this map. Since $ 0c = 0$ is always true, it must be the case that the only element which gives zero when multiplied by $ c$ on the right is $ 0$ itself. That is, $ c$ is not a right zero-divisor if right-multiplication by $ c$ is injective. On the other hand, if the map is not injective, then there are some $ x \neq y$ such that $ xc = yc$, implying $ (x-y)c = 0$, and this proves that $ c$ is a right zero-divisor. We can do exactly the same argument with left-multiplication.

But there is one minor complication: what if right-multiplication is injective, but $ c$ has no inverses? It’s not hard to come up with an example: 2 as an element of the ring of integers $ \mathbb{Z}$ is a perfectly good one. It’s neither a zero-divisor nor a unit.

This basic study of zero-divisors gives us some natural definitions:

Definition: A division ring is a ring in which every nonzero element has a two-sided inverse.

If we further require that $ R$ is commutative, we get something even better (and more familiar: $ \mathbb{Q}, \mathbb{R}, \mathbb{C}$ are the standard examples of fields).

Definition: A field is a nonzero commutative division ring.

The “nonzero” part here is just to avoid the case when the ring is the trivial ring (sometimes called the zero ring) with one element, i.e., the set $ \left \{ 0 \right \}$, in which zero serves as both the additive and multiplicative identity. The zero ring is excluded from being a field for silly reasons: elegant theorems will hold for all fields except the zero ring, and it would be messy to require every theorem to add the condition that the field in question is nonzero.

We will have much more to say about fields later on this blog, but for now let’s just state one very non-obvious and interesting result in non-commutative algebra, known as Wedderburn’s Little Theorem.

Theorem: Every finite division ring is a field.

That is, simply having finitely many elements in a division ring is enough to prove that multiplication is commutative. Pretty neat stuff. We will actually see a simpler version of this theorem in a moment.

Now, as we saw, units and zero-divisors are disjoint, but not quite opposites of each other. Since we have defined a division ring as a ring where all non-zero elements are units, it is natural to define a ring in which the only zero-divisor is zero. This is considered a natural generalization of our favorite ring $ \mathbb{Z}$, hence the name “integral.”

Definition: An integral domain is a nonzero commutative ring in which zero is the only zero-divisor.

Note the requirement that the ring is commutative. Often we will simply call it a domain, although most authors allow domains to be noncommutative.

Already we can prove a very nice theorem:

Theorem: Every finite integral domain is a field.

Proof. Integral domains are commutative by definition, and so it suffices to show that every non-zero element has an inverse. Let $ R$ be our integral domain in question, and $ x \in R$ a nonzero element whose inverse we seek. By our discussion above, right multiplication by $ x$ is an injective map $ R \to R$, and since $ R$ is finite this map must be a bijection. Hence $ x$ must have some $ y \neq 0$ so that $ yx = 1$. And so $ y$ is the inverse of $ x$.

$ \square$

We could continue traveling down this road of studying special kinds of rings and their related properties, but we won’t often use these ideas on this blog. We do think the reader should be familiar with the names of these special classes of rings, and we will state the main theorems relating them.

Definition: A nonzero element $ p \in R$ which is not a unit is called prime if whenever $ p$ divides a product $ ab$ it either divides $ a$ or $ b$ (or both). A unique factorization domain (abbreviated UFD) is an integral domain in which every nonzero non-unit element can be written as a product of primes, uniquely up to reordering and multiplication by units.

Definition: A Euclidean domain is a ring in which the division algorithm can be performed. That is, there is a norm function $ | \cdot | : R \setminus \left \{ 0 \right \} \to \mathbb{N}$, for which every pair $ a,b$ with $ b \neq 0$ can be written as $ a = qb + r$ with $ r$ satisfying either $ r = 0$ or $ |r| < |b|$.

Paolo Aluffi has a wonderful diagram showing the relations among the various special classes of integral domains. This image comes from his book, Algebra: Chapter 0, which is a must-have for the enterprising mathematics student interested in algebra.

[Image: Aluffi's diagram of the special classes of integral domains and how they contain one another.]

In terms of what we have already seen, this diagram says that every field is a Euclidean domain, and in turn every Euclidean domain is a unique factorization domain. These are standard, but non-trivial theorems. We will not prove them here.

The two big areas in this diagram we haven’t yet mentioned on this blog are PIDs and Noetherian domains. The reason for that is because they both require a theory of ideals in rings (perhaps most briefly described as a generalization of the even numbers). We will begin next time with a discussion of ideals, and their important properties in studying rings, but before we finish we want to focus on one main example that will show up later on this blog.

Polynomial Rings

Let us formally define the polynomial ring.

Definition: Let $ R$ be a commutative ring. Define the ring $ R[x]$ to be the set of all polynomials in $ x$ with coefficients in $ R$, where addition and multiplication are the usual addition and multiplication of polynomials. We will often call $ R[x]$ the polynomial ring in one variable over $ R$.

We will often replace $ x$ by some other letter representing an “indeterminate” variable, such as $ t$, or $ y$, or multiple indexed variables as in the following definition.

Definition: Let $ R$ be a commutative ring. The ring $ R[x_1, x_2, \dots, x_n]$ is the set of all polynomials in the $ n$ variables with the usual addition and multiplication of polynomials.

What can we say about the polynomial ring in one variable $ R[x]$? Its additive and multiplicative identities are clear: the constant 0 and 1 polynomials, respectively. Other than that, we can’t quite get much more. There are some very bizarre features of polynomial rings with bizarre coefficient rings, such as multiplication decreasing degree (for example, in $ \mathbb{Z}/4\mathbb{Z}[x]$ the product $ (2x+1)(2x+1) = 4x^2 + 4x + 1 = 1$ has degree zero).

However, when we impose additional conditions on $ R$, the situation becomes much nicer.

Theorem: If $ R$ is a unique factorization domain, then so is $ R[x]$.

Proof. As we have yet to discuss ideals, we refer the reader to this proof, and recommend the reader return to it after our next primer.

$ \square$

On the other hand, we will most often be working with polynomial rings over a field. And here the situation is even better:

Theorem: If $ k$ is a field, then $ k[x]$ is a Euclidean domain.

Proof. The norm function here is precisely the degree of the polynomial (the highest power of a monomial in the polynomial). Then given $ f,g$, the usual algorithm for polynomial division gives a quotient and a remainder $ q, r$ so that $ f = qg + r$. In following the steps of the algorithm, one will note that all multiplication and division operations are performed in the field $ k$, and the remainder always has a smaller degree than the divisor $ g$. Indeed, one can explicitly describe the algorithm and prove its correctness, and we will do so in full generality in the future of this blog when we discuss computational algebraic geometry.

$ \square$
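We defer the general description to that future post, but as an illustration, here is a hypothetical Python sketch of the division algorithm in $ (\mathbb{Z}/p\mathbb{Z})[x]$, with polynomials stored as coefficient lists from lowest to highest degree. The only division performed is by the leading coefficient of the divisor, which is possible exactly because the coefficients form a field:

```python
def poly_divmod(f, g, p):
    """Divide f by g in (Z/pZ)[x]; returns (q, r) with f = q*g + r and deg(r) < deg(g)."""
    f, g = [c % p for c in f], [c % p for c in g]
    q = [0] * max(len(f) - len(g) + 1, 1)
    lead_inv = pow(g[-1], -1, p)  # inverse of the leading coefficient mod p (Python 3.8+)
    r = f[:]
    while len(r) >= len(g) and any(r):
        shift = len(r) - len(g)
        coeff = (r[-1] * lead_inv) % p
        q[shift] = coeff
        for i, gc in enumerate(g):
            r[shift + i] = (r[shift + i] - coeff * gc) % p
        while len(r) > 1 and r[-1] == 0:  # drop leading zeros of the running remainder
            r.pop()
    return q, r

# (x^3 + 2x + 1) divided by (x^2 + 1) over Z/5Z gives quotient x and remainder x + 1.
print(poly_divmod([1, 2, 0, 1], [1, 0, 1], 5))  # ([0, 1], [1, 1])
```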

For multiple variables, things are a bit murkier. For instance, it is not even the case that $ k[x,y]$ is a Euclidean domain. One of the strongest things we can say originates from this simple observation:

Lemma: $ R[x,y]$ is isomorphic to $ R[x][y]$.

We haven’t quite yet talked about isomorphisms of rings (we will next time), but the idea is clear: every polynomial in two variables $ x,y$ can be thought of as a polynomial in $ y$ where the coefficients are polynomials in $ x$ (gathered together by factoring out common factors of $ y^k$). Similarly, $ R[x_1, \dots, x_n]$ is the same thing as $ R[x_1, \dots, x_{n-1}][x_n]$ by induction. This allows us to prove that any polynomial ring is a unique factorization domain:

Theorem: If $ R$ is a UFD, so is $ R[x_1, \dots, x_n]$.

Proof. $ R[x]$ is a UFD as described above. By the lemma, $ R[x_1, \dots, x_n] = R[x_1, \dots, x_{n-1}][x_n]$ so by induction $ R[x_1, \dots, x_{n-1}]$ is a UFD implies $ R[x_1, \dots, x_n]$ is as well.

$ \square$

We’ll be very interested in exactly how to compute useful factorizations of polynomials into primes when we start our series on computational algebraic geometry. Some of the applications include robot motion planning, and automated theorem proving.

Next time we’ll visit the concept of an ideal, see quotient rings, and work toward proving Hilbert’s Nullstellensatz, a fundamental result in algebraic geometry.

Until then!

Groups — A Second Primer

The First Isomorphism Theorem

The meat of our last primer was a proof that quotient groups are well-defined. One important result that helps us compute groups is a very easy consequence of this well-definition.

Recall that if $ G,H$ are groups and $ \varphi: G \to H$ is a group homomorphism, then the image of $ \varphi$ is a subgroup of $ H$. Also the kernel of $ \varphi$ is the normal subgroup of $ G$ consisting of the elements which are mapped to the identity under $ \varphi$. Moreover, we proved that the quotient $ G / \ker \varphi$ is a well-defined group, and that every normal subgroup $ N$ is the kernel of the quotient map $ G \to G/N$. These ideas work together to compute groups with the following theorem. Intuitively, it tells us that the existence of a homomorphism between two groups gives us a way to relate the two groups.

Theorem: Let $ \varphi: G \to H$ be a group homomorphism. Then the quotient $ G/ \ker \varphi$ is isomorphic to the image of $ \varphi$. That is,

$ G/ \ker \varphi \cong \varphi(G)$

As a quick corollary before the proof, if $ \varphi$ is surjective then $ H \cong G / \ker \varphi$.

Proof. We define an explicit map $ f : G/ \ker \varphi \to \varphi(G)$ and prove it is an isomorphism. Let $ g \ker \varphi$ be an arbitrary coset and set $ f(g \ker \varphi) = \varphi(g)$. First of all, we need to prove that this definition does not depend on the choice of a coset representative. That is, if $ g \ker \varphi = g' \ker \varphi$, then $ \varphi(g) = \varphi(g')$. Indeed, for any subgroup $ N$ we have by definition $ gN = g'N$ if and only if $ g^{-1}g' \in N$, so here $ g^{-1}g' \in \ker \varphi$, and hence $ \varphi(g)^{-1}\varphi(g') = \varphi(g^{-1}g') = 1$.

It is similarly easy to verify that $ f$ is a homomorphism:

$ f((g \ker \varphi )(g' \ker \varphi)) = \varphi(gg') = \varphi(g)\varphi(g') = f(g \ker \varphi) f(g' \ker \varphi)$

It suffices now to show that $ f$ is a bijection. It is trivially surjective (anything in the image of $ \varphi$ has the form $ \varphi(g) = f(g \ker \varphi)$ for some $ g$). It is injective since if $ f(g \ker \varphi) = 1$, then $ \varphi(g) = 1$ and hence $ g \in \ker \varphi$, so the coset $ g \ker \varphi = 1 \ker \varphi$ is the identity element. So $ f$ is an isomorphism. $ \square$

Let’s use this theorem to compute some interesting things.

Denote by $ D_{2n}$ the group of symmetries of the regular $ n$-gon. That is, $ D_{16}$ is the symmetry group of the regular octagon and $ D_{8}$ is the symmetry group of the square (the $ 2n$ notation is because this group always has order $ 2n$). We want to relate $ D_{16}$ to $ D_8$. To do this, let’s define a homomorphism $ f:D_{16} \to D_8$ by sending a one-eighth rotation $ \rho$ of the octagon to a one-fourth rotation of the square $ f(\rho) = \rho^2$, and using the same reflection for both ($ f(\sigma) = \sigma$). It is easy to check that this is a surjective homomorphism, and moreover the kernel is $ \left \{ 1, \rho^4 \right \}$. That is, $ D_8 \cong D_{16}/ \left \{ 1, \rho^4 \right \}$.

Here is a more general example. If $ G, H$ are groups of relatively prime order, then there are no nontrivial homomorphisms $ G \to H$. In order to see this, note that $ |G/ \ker \varphi| = |G| / |\ker \varphi|$ as a simple consequence of Lagrange’s theorem. Indeed, by the first isomorphism theorem this quantity is equal to $ |\varphi(G)|$. So $ |G| = | \ker \varphi| |\varphi(G)|$. That is, the order of $ \varphi(G)$ divides the order of $ G$. But it also divides the order of $ H$ because $ \varphi(G)$ is a subgroup of $ H$. In other words, the order of $ \varphi(G)$ is a common factor of the orders of $ G$ and $ H$. By hypothesis, the only such number is 1, and so $ |\varphi(G)| = 1$ and $ \varphi(G)$ is the trivial group.

We will use the first isomorphism theorem quite a bit on this blog. Because it is such a common tool, it is often used without explicitly stating the theorem.

Generators

One extremely useful way to describe a subgroup is via a set of generators. The simplest example is for a single element.

Definition: Let $ G$ be a group and $ x \in G$. Then the subgroup generated by $ x$, denoted $ \left \langle x \right \rangle$, is the smallest subgroup of $ G$ containing $ x$. More generally, if $ S \subset G$ then the subgroup generated by $ S$ is the smallest subgroup containing $ S$.

This definition is not very useful by itself, but a concrete description is easy to derive. On one hand, the identity element must always be in $ \left \langle x \right \rangle$. Since $ x \in \left \langle x \right \rangle$ and it’s a subgroup, we must have that $ x^{-1} \in \left \langle x \right \rangle$. Moreover, all powers of $ x$ must be in the subgroup, as must all powers of the inverse (equivalently, inverses of the powers). In fact that is all that is necessary. That is,

$ \left \langle x \right \rangle = \left \{ \dots, x^{-2}, x^{-1}, 1, x, x^2, \dots \right \}$

For finite groups, this list contains only finitely many distinct elements. And in fact, the inverse of $ x$ will be a power of $ x$ as well. To see this, note that if we keep taking powers of $ x$, eventually one of those will be the identity element. Specifically, some power of $ x$ must repeat, and if $ x^n = x^m$ with $ n > m$, then $ x^{n-m} = 1$. Hence $ x^{-1} = x^{n-m-1}$.
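Here is a tiny Python sketch (my own) that computes $ \left \langle x \right \rangle$ in a finite group, given its operation, by taking successive powers of $ x$ until the identity reappears; the last power computed before that point is exactly $ x^{-1}$:

```python
def cyclic_subgroup(x, op, identity):
    """List the subgroup <x> of a finite group by taking powers of x until the identity repeats."""
    powers = [identity]
    current = x
    while current != identity:
        powers.append(current)
        current = op(current, x)
    return powers

# The subgroup generated by 3 in the additive group Z/12Z ("powers" are multiples here).
print(cyclic_subgroup(3, lambda a, b: (a + b) % 12, 0))  # [0, 3, 6, 9]; the inverse of 3 is 9

# The subgroup generated by 2 in the multiplicative group (Z/7Z)*.
print(cyclic_subgroup(2, lambda a, b: (a * b) % 7, 1))   # [1, 2, 4]; the inverse of 2 is 4
```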

For subgroups generated by more than one element, these subgroups are more difficult to write down. For example, if $ x,y \in G$ then $ x,y$ may have a nontrivial relationship. That is, even though all possible products involving $ x$ and $ y$ are in the subgroup, it is very difficult to determine whether two such products are the same (in fact, one very general formulation of this problem is undecidable!). Oftentimes one can find a set of generators which generates the entire group $ G$. In this case, we say $ G$ is generated by those elements.

A familiar example is the symmetry group of the square. As it turns out this group is generated by $ \rho, \sigma$, where $ \rho$ is a quarter turn and $ \sigma$ is a reflection across some axis of symmetry. The relationship between the two elements is succinctly given by the equality $ \rho \sigma \rho \sigma = 1$. To see this, try holding out your hand with your palm facing away; rotate your hand clockwise, flip it so your fingers are pointing left, rotate again so your fingers are pointing up, and then flip to get back to where you started; note that the two flips had the same axis of symmetry (the up-down axis). The other (obvious) relationships are that $ \rho^4 = 1$ and $ \sigma^2 = 1$. If we want to describe the group in a compact form, we write

$ G = \left \langle \rho, \sigma | \rho^4, \sigma^2, \rho \sigma \rho \sigma \right \rangle$

This is an example of a group presentation. The left hand side is the list of generators, and the right hand side gives a list of relators, where each one is declared to be the identity element. In particular, the existence of a presentation with generators and relators implies that all possible relationships between the generators can be deduced from the list of relators (kind of like how all relationships between sine and cosine can be deduced from the fact that $ \sin^2(x) + \cos^2(x) = 1$). Indeed, this is the case for the symmetry group (and all dihedral groups); there are only three distinct equations describing the behavior of rotations and reflections.

Here’s a quick definition we will often refer to in the future: a group is called cyclic if it is generated by a single element. Here are some interesting exercises for the beginning reader to puzzle over, which are considered basic facts for experienced group theorists:

  • Every subgroup of a cyclic group is cyclic.
  • There is only one infinite cyclic group: $ \mathbb{Z}$.
  • Every finite cyclic group is isomorphic to $ \mathbb{Z}/n\mathbb{Z}$ for some $ n$.

Finally, we will call a group finitely generated if it is generated by a finite set of elements and finitely presented if it has a presentation with finitely many generators and relators. Just to give the reader a good idea about how vast this class of groups is: many basic conjectured facts about finitely generated groups which have “only” one relator are still open problems. So trying to classify groups with two relators (or finitely many relators) is still a huge leap away from what we currently understand. As far as this author knows, this subject has been largely abandoned after a scant few results were proved.

Products and Direct Sums

Just as one does in every field of math, in order to understand groups better we want to decompose them into smaller pieces which are easier to understand. Two of the main ways to do this are via direct products and direct sums.

Definition: Let $ (G, \cdot_G),(H, \cdot_H)$ be groups. The product group $ G \times H$ is defined to have the underlying set $ G \times H$ (pairs of elements), and the operation is defined by entrywise multiplication in the appropriate group.

$ (g,h) \cdot (g’, h’) = (g \cdot_G g’, h \cdot_H h’)$

Of course, one must verify that this operation actually defines a group according to the usual definition, but this is a simple exercise. One should note that there are two canonical subgroups of the direct product. Define by $ p_1 : G \times H \to G$ the projection onto the first coordinate (that is, $ (a,b) \mapsto a$). This map is obviously a homomorphism, and its kernel is the subgroup of elements $ (1,h), h \in H$. That is, we can identify $ H$ as a subgroup of $ G \times H$. Identically, we see with $ p_2(a,b) = b$ that $ G$ is a subgroup of $ G \times H$.

Note that this allows us to make some very weird groups. For instance, by induction a single direct product allows us to define products of arbitrarily many groups. Note that reordering the terms of such a product does not change the isomorphism class of the group (e.g. $ \mathbb{Z} \times D_8 \cong D_8 \times \mathbb{Z}$). Additionally, there is nothing that stops us from defining infinite product groups. The elements of such a group are sequences of elements from the corresponding multiplicands. For example, the group $ \mathbb{Z} \times \mathbb{Z} \times \dots$ is the group of sequences of integers, where addition is defined termwise and the identity is the sequence of all zeroes.

Now infinite products can be particularly unwieldy, but we may still want to talk about groups constructed from infinitely many pieces. Although we have no motivation for this, one such example is the group of an elliptic curve. In order to tame the unwieldiness, we define the following construction which takes an infinite product, and allows only those elements which have finitely many non-identity terms.

Definition: Let $ G_{\alpha}$ be a family of groups. Define the direct sum of the $ G_{\alpha}$, denoted by $ \bigoplus_{\alpha} G_{\alpha}$, to be the subgroup of $ \prod_{\alpha} G_{\alpha}$ of elements $ (g_0, g_1, \dots)$ where all but finitely many $ g_i$ are the identity in the corresponding $ G_i$.

[As a quick side note, this is mathematically incorrect: since the family of groups need not be countable, the $ g_i$ may not be enumerable. One can fix this by defining the elements as functions on the index set instead of sequences, but we are too lazy to do this.]

Note that we can define a direct sum of only finitely many groups, and for example this would be denoted $ G \oplus H$, but finite sums are trivially the same as finite products. In fact, in this primer and all foreseeable work on this blog, we will stick to finite direct sums of groups, and implicitly identify them with direct products.

Finally, in terms of notation we will write $ G^n$ for a direct product of $ G$ with itself $ n$ times, and $ G^{\oplus n}$ for the direct sum of $ G$ with itself $ n$ times. As we just mentioned, these two are identical, so we will just default to the former for simplicity of reading.

One might wonder: why do we even distinguish these two constructions? The answer is somewhat deep, and we will revisit the question in our future series on category theory. For now, we can simply say that the distinction only shows up in the case of infinite products and infinite sums, which we won’t discuss anyway except for in passing curiosity.

The Classification of Finitely Generated Abelian Groups

Now we have finally laid enough groundwork to state the first big classification theorem of group theory. In words, it says that any finitely generated abelian group can be written as a direct sum of things isomorphic to $ \mathbb{Z}$ and $ \mathbb{Z}/n\mathbb{Z}$ for various choices of $ n$. Moreover, the choices of $ n$ are related to each other.

In particular, the theorem is stated as follows:

Theorem: Let $ G$ be a finitely generated abelian group. Then $ G$ is isomorphic to a group of the form:

$ \displaystyle \mathbb{Z}^m \oplus \mathbb{Z}/p_1^{n_1}\mathbb{Z} \oplus \dots \oplus \mathbb{Z}/p_k^{n_k}\mathbb{Z}$

Where $ p_i$ are (not necessarily distinct) primes. Moreover, $ G$ is completely determined by the choices of primes and exponents above.

In particular, we name these numbers as follows. The $ \mathbb{Z}^m$ part of the equation is called the free part, the exponent $ m$ is called the rank of $ G$, and the numbers $ p_i^{n_i}$ are called the primary factors.

The proof of this theorem is beyond the scope of this blog, but any standard algebra text will have it. All we are saying here is that every finitely generated abelian group can be broken up into the part that has infinite order (the free part) and the part that has finite order (often called the torsion part), and that these parts are largely disjoint from each other. For finite groups this is a huge step forward: to classify all finite groups one now only needs to worry about nonabelian groups (although this is still a huge feat in its own right).

A quick application of this theorem is as follows:

Corollary: Let $ G$ be a finitely generated abelian group which has no nonidentity elements of finite order. Then $ G$ has no nontrivial relators except for those enforcing commutativity.

Proof. Indeed, $ G \cong \mathbb{Z}^m$ for some $ m$, and it has a presentation $ \left \langle x_1, x_2, \dots, x_m | x_ix_jx_i^{-1}x_j^{-1}, i \neq j \right \rangle$, which has no relators other than those enforcing commutativity. $ \square$

Borrowing the free terminology, such groups without relators are called free abelian groups. Indeed, there are also nonabelian “free” groups, and this is the last topic we will cover in this primer.

Free Groups, and a Universal Property

Equivalently to “a group with no relators,” we can define a free group as a group which has presentation $ \left \langle x_{\alpha} \right \rangle$ for some potentially infinite family of elements $ x_{\alpha}$. If the number of generators is finite, say there are $ n$ of them, then we call it the free group on $ n$ generators.

The interesting thing here is that all possible products of the generating elements are completely distinct. For example, the free group on two generators $ \left \langle a,b \right \rangle$ contains the elements $ ab, aba, abab, b^3a^{-5}b^2a$, along with infinitely many others. The only way that two elements can be the same is if they can be transformed into each other by a sequence of inserting or deleting strings which are trivially the identity. For example, $ abb^{-1}a = a^2$, but only because of our cancellations of $ bb^{-1} = 1$, which holds in every group.

There is another way to get free groups, and that is by taking free products. In particular, the group $ \mathbb{Z} = \left \langle a \right \rangle$ is the free group on a single generator. Given two copies of $ \mathbb{Z}$, we’d like to combine them in some way to get $ \left \langle a,b \right \rangle$. More generally, given any two groups we’d like to define their free product $ G * H$ to be the group which contains all possible words using elements in $ G$ or $ H$, which has no nontrivial relators, except for those already existing among the elements of $ G$ and $ H$ before taking a product.

Rigorously, this is very easy to do with group presentations. Give presentations for $ G = \left \langle x_i | r_j \right \rangle$ and $ H = \left \langle y_k | s_m \right \rangle$, and define the free product by giving a presentation

$ G * H = \left \langle x_i, y_k | r_j, s_m \right \rangle$

For instance, the free product $ \mathbb{Z}/3\mathbb{Z} * \mathbb{Z}/4\mathbb{Z}$ has presentation $ \left \langle a,b | a^3, b^4 \right \rangle$. One interesting fact is that even if $ G,H$ are finite, as long as one is nontrivial then the free product will be an infinite group. Another interesting fact, which we’ll explore in future posts on category theory, is that the free product of groups is “the same thing” as the disjoint union of sets. That is, these two operations play the same role in their respective categories.

This “role” is called the universal property of the free product. We will use this directly in our post on the fundamental group, and in general it just says that homomorphisms provided on the two pieces of the free product extend uniquely to a homomorphism of the product. The simpler form is the universal property of the free group, which says that the free group is the “most general possible” group which is generated by these generators.

Theorem (Universal Property of the Free Group): Let $ S$ be a set of elements, and let $ F(S)$ be the free group generated by the elements of $ S$. Then any set-function $ S \to G$ where $ G$ is a group extends uniquely to a group homomorphism $ F(S) \to G$.

That is, deciding where the generators should go in $ G$ carries with it all of the information needed to define a homomorphism $ F(S) \to G$, and it is uniquely determined in this way. To see this, simply see that once $ f(a), f(b)$ are determined, so are $ f(a^{-1}) = f(a)^{-1}, f(ab) = f(a)f(b), \dots$

The corresponding property for free products is similar:

Theorem (Universal Property of the Free Product): Let $ G_{\alpha}$ be groups, and let $ f_{\alpha}: G_{\alpha} \to H$ be group homomorphisms. Then there is a unique group homomorphism from the free product of the $ G_{\alpha}$ to $ H$, i.e. $ \ast_{\alpha} G_{\alpha} \to H$.

The idea is the same as for the first universal property: the free product is the most general group containing the $ G_{\alpha}$ as subgroups, and so any group homomorphism from the product to another group is completely determined by what happens to each of the multiplicand groups.

Moreover, we can take a free product and add in extra relators in some reasonable way. This is called an amalgamated free product, and it has similar properties which we won’t bother to state here. The important part for us is that this shows up in topology and in our special ways to associate a topological space with a group.

After these two short primers, we have covered a decent chunk of group theory. Nevertheless, we have left out a lot of the important exercises and intuition that goes into this subject. In the future, we will derive group-theoretic propositions we need as we go (in the middle of other posts). On the other hand, we will continually use groups (and abelian groups, among others) as an example of a category in our exploration of category theory. Finally, when we study rings and fields we will first lay them out as abelian groups with further structural constraints. This will shorten our definitions to a manageable form.