I learned of a neat result due to Kevin Ventullo that uses group actions to study the structure of hash functions for unordered sets and multisets. This piqued my interest because a while back a colleague asked me if I could think of any applications of “pure” group theory to practical computer programming that were not cryptographic in nature. He meant, not including rings, fields, or vector spaces whose definitions happen to be groups when you forget the extra structure. I couldn’t think of any at the time, and years later Kevin has provided me with a tidy example. I took the liberty of rephrasing his argument to be a bit simpler (I hope), but I encourage you to read Kevin’s post to see the reasoning in its original form.
Hashes are useful in programming, and occasionally one wants to hash an unordered set or multiset in such a way that the hash does not depend on the order the elements were added to the set. Since collection types are usually generic, one often has to craft a hash function for a set<T> or multiset<T> that relies on a hash function hash(t) of an unknown type T. To make things more efficient, rather than requiring one to iterate over the entire set each time a hash is computed, one might seek out a hash function that can be incrementally updated as new elements are added, and provably does not depend on the order of insertion.
For example, one could start with a hash of zero and add the hash of each element as it is inserted (modulo $ 2^{64}$). This has the incremental, order-ignorant property, because addition is commutative and sums can be regrouped. XOR-ing the bits of the hashes is similar. However, both of these strategies have downsides.
For example, if you adopt the XOR strategy for a multiset hash, then any element occurring an even number of times in the multiset hashes the same as if it were absent entirely (or present in any other even quantity), because x XOR x == 0. On the other hand, with the addition approach, any element that hashes to zero has no effect on the hash, no matter how many times it is included. In Java the integer hash is the identity, so zero would be undetectable as a member of a multiset of ints in any quantity. Less drastically, a multiset in which every element has an even count will always hash to a multiple of 2, and this makes it easier to find hash collisions.
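To make these degeneracies concrete, here is a minimal sketch of both strategies (my own illustration; the element hashes are arbitrary):

MASK = (1 << 64) - 1  # work modulo 2^64

def sum_hash(element_hashes):
    h = 0
    for x in element_hashes:
        h = (h + x) & MASK  # addition is commutative, so order doesn't matter
    return h

def xor_hash(element_hashes):
    h = 0
    for x in element_hashes:
        h ^= x  # XOR is also commutative and associative
    return h

# XOR: an element with an even count is indistinguishable from an absent one
assert xor_hash([7, 7, 3]) == xor_hash([3])

# addition: an element hashing to zero is invisible in any quantity
assert sum_hash([0, 0, 5]) == sum_hash([5])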
A natural question arises: given the constraint that the hash function is accumulative and commutative, can we avoid such degenerate situations? In principle the answer is obviously no, just by counting: the set of all unordered sets of 64-bit integers has size $ 2^{2^{64}}$, while the set of 64-bit hashes has size merely $ 2^{64}$. You will have many, many hash collisions, and would need a much longer hash to avoid them in principle.
More than just “no,” we can characterize the structure of such hash functions. They impose an abelian group structure on the set of hashes. And due to the classification theorem of finite abelian groups, up to isomorphism (and for 64-bit hashes) that structure consists of addition on blocks of bits with various power-of-2 moduli, and the blocks are XOR’ed together at the end.
To give more detail, we need to review some group theory, write down the formal properties of an accumulative hash function, and prove the structure theorem.
Group actions, a reminder
This post will assume basic familiarity with group theory as covered previously on this blog. Basically, this introductory post defining groups and actions, and this followup post describing the classification theorem for commutative (abelian) groups. But I will quickly review group actions since they take center stage today.
A group $ G$ defines some abstract notion of symmetries in a way that’s invertible. But a group by itself is fairly meaningless: groups are only useful when they “act” upon a set. For example, a group of symmetries of the square acts upon the square to actually rotate its points. When you have multiple group structures to consider, it makes sense to formally separate the group structure from the set.
So a group action is formally a triple of a group $ G$, a set $ X$, and a homomorphism $ f:G \to S_X$, where $ S_X$ is the permutation group (or symmetry group) of $ X$, i.e., the set of all bijections $ X \to X$. The permutation group of a set encompasses every possible group that can act on $ X$. In other words, every group is a subgroup of a permutation group. In our case, $ G$ and $ f$ define a subgroup of symmetries on $ X$ via the image of $ f$. If $ f$ is not injective, some of the structure in $ G$ is lost. The homomorphism determines which parts of $ G$ are kept and how they show up in the codomain. The first isomorphism theorem says how: $ G / \textup{ker} f \cong \textup{im} f$.
This relates to our hashing topic because an accumulative hash—and a nicely behaved hash, as we’ll make formal shortly—creates a group action on the set of possible hash values. The image of that group action is the “group structure” imposed by the hash function, and the accumulation function defines the group operation in that setting.
Multisets as a group, and nice hash functions
An appropriate generalization of multisets whose elements come from a base set $ X$ forms a group. This generalization views a multiset as a “counting function” $ T: X \to \mathbb{Z}$. The empty set is the function that assigns zero to each element. A positive value of $ k$ implies the entry shows up in the multiset $ k$ times. And a negative value is like membership “debt,” which is how we represent inverses, or equivalently set difference operations. The inverse of a multiset $ T$, denoted $ -T$, is the multiset with all counts negated elementwise. Since integer-valued functions generally form a group under pointwise addition, these multisets also do. Call this group $ \textup{MSet}(X)$. We will freely use the suggestive notation $ T \cup \{ x \}$ to denote the addition of $ T$ and the function that is 1 on $ x$ and 0 elsewhere. Similarly for $ T - \{ x \}$.
$ \textup{MSet}(X)$ is isomorphic to the free abelian group on $ X$ (because an instance of a multiset only has finitely many members). Now we can define a hash function with three pieces:
An arbitrary base hash function $ \textup{hash}: X \to \mathbb{Z} / 2^n \mathbb{Z}$.
An arbitrary hash accumulator $ h: \mathbb{Z} / 2^n \mathbb{Z} \times \mathbb{Z} / 2^n \mathbb{Z} \to \mathbb{Z} / 2^n \mathbb{Z}$, which defines the hash of a multiset $ h^*(T)$ incrementally: the empty multiset hashes to a fixed constant, and $ h^*(T \cup \{ x \}) = h(h^*(T), \textup{hash}(x))$.
A way to remove an element, i.e., a definition of $ h^*(T - \{ x \})$ in terms of $ h$ (made precise below).
In order for the second bullet to lead to a well-defined hash, we need the property that the accumulation order of individual elements does not matter. Call a hash accumulator commutative if, for all $ a, b, c \in \mathbb{Z} / 2^n \mathbb{Z}$,
$ \displaystyle h(h(a,b), c) = h(h(a,c), b)$
This extends naturally to being able to reorder any sequence of hashes being accumulated.
The third is a bit more complicated. We need to be able to use the accumulator to “de-accumulate” the hash of an element $ x$, even when the set that gave rise to that hash didn’t have $ x$ in it to start.
Call a hash accumulator invertible if for a fixed hash $ z = \textup{hash}(x)$, the map $ a \mapsto h(a, z)$ is a bijection. That is, accumulating the hash $ z$ to two sets with different hashes under $ h^*$ will not cause a hash collision. This defines the existence of an inverse map (even if it’s not easily computable). This allows us to finish the third bullet point.
Fix $ z = \textup{hash}(x)$ and let $ g$ be the inverse of the map $ a \mapsto h(a, z)$. Then $ h^*(T - \{ x \}) = g(h^*(T))$.
Though we don’t have a specific definition for the inverse above, we don’t need it because we’re just aiming to characterize the structure of this soon-to-emerge group action. Though, in all likelihood, if you implement a hash for a multiset, it should support incremental hash updates when removing elements, and that formula would apply here.
This establishes that $ h^*$ is well defined. Commutativity allows us to define $ h^*(T \cup S)$ by decomposing $ S$ arbitrarily into its constituent elements (with multiplicity), and applying $ h^*(T \cup \{ x \})$ or $ h^*(T - \{ x \})$ in any order.
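For instance, here is a minimal sketch (an illustration of mine, not a prescription) of an accumulator satisfying both the commutativity and invertibility axioms, along with its inverse map $ g$, using addition modulo $ 2^{64}$:

MASK = (1 << 64) - 1

def h(a, z):
    # commutative: h(h(a, b), c) == h(h(a, c), b), since addition commutes
    return (a + z) & MASK

def g(a, z):
    # the inverse of the map a -> h(a, z), used to "de-accumulate" hash(x)
    return (a - z) & MASK

assert g(h(123, 456), 456) == 123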
A group action emerges
Now we have a well-defined hash function on a free abelian group.
However, $ h^*$ is not a homomorphism. There’s no reason hash accumulation needs to mesh well with addition of hashes. Instead, the family of operations “combine a hash with the hash of some fixed set” defines a group action on the set of hashes. Let’s suppose for simplicity that $ h^*$ is surjective, i.e., that every hash value can be achieved as the hash of some set. Kevin’s post gives the more nuanced case when this fails, and in that case you work within $ S_{\textup{im}(h^*)}$ instead of all of $ S_{\mathbb{Z} / 2^n \mathbb{Z}}$.
The group action is defined formally as a homomorphism

$ \displaystyle \varphi : \textup{MSet}(X) \to S_{\mathbb{Z} / 2^n \mathbb{Z}}$
where $ \varphi(T)$ is the permutation $ a \mapsto h(a, h^*(T))$. Equivalently, we start from $ a$, pick some set $ S$ with $ h^*(S) = a$, and output $ h^*(T \cup S)$.
The map $ \varphi$ is a homomorphism. The composition of two accumulations is order-independent because $ h$ is commutative. This is how we view $ h$ as “the binary operation” in $ \textup{im} \varphi$, because combining two permutations $ a \mapsto h(a, h^*(T))$ and $ a \mapsto h(a, h^*(S))$ is the permutation $ a \mapsto h(a, h^*(S \cup T))$.
And now we can apply the first isomorphism theorem, that

$ \displaystyle \textup{MSet}(X) / \textup{ker} \varphi \cong \textup{im} \varphi \subset S_{\mathbb{Z} / 2^n \mathbb{Z}}$
This is significant because any quotient of an abelian group is abelian, and this quotient is finite because $ S_{\mathbb{Z} / 2^n \mathbb{Z}}$ is finite. Moreover, since $ \textup{im} \varphi$ is an abelian group of permutations acting transitively on the $ 2^n$ hash values (transitivity follows from the surjectivity of $ h^*$), it acts simply transitively, and so it has order exactly $ 2^n$. This means that the group $ \textup{im} \varphi$ is isomorphic to

$ \displaystyle \mathbb{Z} / 2^{n_1} \mathbb{Z} \times \mathbb{Z} / 2^{n_2} \mathbb{Z} \times \dots \times \mathbb{Z} / 2^{n_k} \mathbb{Z}$
where $ n = \sum_i n_i$. The $ i$-th summand corresponds to a block of $ n_i$ bits of the hash, and within that block the operation is addition modulo $ 2^{n_i}$. The “block” structure is where XOR comes in: each block can be viewed as a bitmask with zeros outside the block, and the blocks of two hashes are XOR’ed together, which allows the operations to apply to each block independently.
For example, the group might be $ \mathbb{Z} / 2^{4} \mathbb{Z} \times \mathbb{Z} / 2^{26} \mathbb{Z}\times \mathbb{Z} / 2^{2} \mathbb{Z}$ for a 32-bit hash. The first block corresponds to 32-bit unsigned integers whose top 4 bits may be set but all other bits are zero. Addition is done within those four bits modulo $ 2^4 = 16$, leaving the other bits unchanged. Likewise, the second component has the top four bits and the bottom two bits zero, and the remaining 26 bits are summed mod $ 2^{26}$. XOR combines the bits from different blocks.
In one extreme case, you have only one block, your group is just $ \mathbb{Z} / 2^n \mathbb{Z}$, and the usual addition combines hashes. In the other extreme, each bit is its own block, your group is $ (\mathbb{Z} / 2 \mathbb{Z})^n$, and the operation is bitwise XOR.
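As an illustration of the theorem, here is a sketch (with an arbitrarily chosen block structure of my own) of an accumulator for a 32-bit hash whose top 8 bits form a single additive block mod $ 2^8$, with the remaining 24 bits each forming their own XOR block:

def h(a, b):
    # top 8 bits: addition modulo 2^8; bottom 24 bits: bitwise XOR
    top = ((a >> 24) + (b >> 24)) % 256
    bottom = (a ^ b) & 0xFFFFFF
    return (top << 24) | bottom

# commutative and invertible, as the theorem requires
assert h(h(1 << 24, 2 << 24), 5) == h(h(1 << 24, 5), 2 << 24)

In the language of the theorem, this group is $ \mathbb{Z}/2^8\mathbb{Z} \times (\mathbb{Z}/2\mathbb{Z})^{24}$.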
Note, if instead of $ 2^n$ we used a hash space of some other size $ m$, then in the direct sum decomposition above, $ m$ would be the product of the sizes of the components. The choice $ m = 2^n$ maximizes the number of different structures you can have.
Implications for hash function designers
Here’s the takeaway.
First, if you’re trying to design a hash function that avoids the degeneracies mentioned at the beginning of this article, then it will have to break one of the properties listed. This could happen, say, by maintaining additional state.
Second, if you’re resigned to use a commutative, invertible, accumulative hash, then you might as well make this forced structure explicit, and just pick the block structure you want to use in advance. Since no clever bit shifting will allow you to outrun this theorem, you might as well make it simple.
Last time we saw a geometric version of the algorithm to add points on elliptic curves. We went quite deep into the formal setting for it (projective space $ \mathbb{P}^2$), and we spent a lot of time talking about the right way to define the “zero” object in our elliptic curve so that our issues with vertical lines would disappear.
With that understanding in mind we now finally turn to code, and write classes for curves and points and implement the addition algorithm. As usual, all of the code we wrote in this post is available on this blog’s Github page.
Points and Curves
Every introductory programming student has probably written something like the following class representing a point.
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
It’s the simplest possible nontrivial class: an x and y value initialized by a constructor (and in Python all member variables are public).
We want this class to represent a point on an elliptic curve, and overload the addition and negation operators so that we can do stuff like this:
p1 = Point(3,7)
p2 = Point(4,4)
p3 = p1 + p2
But as we’ve spent quite a while discussing, the addition operators depend on the features of the elliptic curve they’re on (we have to draw lines and intersect it with the curve). There are a few ways we could make this happen, but in order to make the code that uses these classes as simple as possible, we’ll have each point contain a reference to the curve they come from. So we need a curve class.
It’s pretty simple, actually, since the class is just a placeholder for the coefficients of the defining equation. We assume the equation is already in the Weierstrass normal form, but if it weren’t one could perform a whole bunch of algebra to get it in that form (and you can see how convoluted the process is in this short report or page 115 (pdf p. 21) of this book). To be safe, we’ll add a few extra checks to make sure the curve is smooth.
class EllipticCurve(object):
    def __init__(self, a, b):
        # assume we're already in the Weierstrass form
        self.a = a
        self.b = b

        self.discriminant = -16 * (4 * a*a*a + 27 * b * b)
        if not self.isSmooth():
            raise Exception("The curve %s is not smooth!" % self)

    def isSmooth(self):
        return self.discriminant != 0

    def testPoint(self, x, y):
        return y*y == x*x*x + self.a * x + self.b

    def __str__(self):
        return 'y^2 = x^3 + %Gx + %G' % (self.a, self.b)

    def __eq__(self, other):
        return (self.a, self.b) == (other.a, other.b)
So there we have it. Now when we construct a Point, we add the curve as the extra argument and a safety-check to make sure the point being constructed is on the given elliptic curve.
class Point(object):
    def __init__(self, curve, x, y):
        self.curve = curve  # the curve containing this point
        self.x = x
        self.y = y

        if not curve.testPoint(x, y):
            raise Exception("The point %s is not on the given curve %s" % (self, curve))

    # display points as coordinate pairs, as in the examples below
    def __str__(self):
        return "(%r, %r)" % (self.x, self.y)

    def __repr__(self):
        return str(self)
Note that this last check will serve as a coarse unit test for all of our examples. If we mess up then more likely than not the “added” point won’t be on the curve at all. More precise testing is required to be bullet-proof, of course, but we leave explicit tests to the reader as an excuse to get their hands wet with equations.
Some examples:
>>> c = EllipticCurve(a=1,b=2)
>>> Point(c, 1, 2)
(1, 2)
>>> Point(c, 1, 1)
Traceback (most recent call last):
[...]
Exception: The point (1, 1) is not on the given curve y^2 = x^3 + 1x + 2
Before we go ahead and implement addition and the related functions, we need to decide how we want to represent the ideal point $ [0 : 1 : 0]$. We have two options. The first is to do everything in projective coordinates and define a whole system for doing projective algebra. Considering we only have one point to worry about, this seems like overkill (but could be fun). The second option, and the one we’ll choose, is to have a special subclass of Point that represents the ideal point.
class Ideal(Point):
    def __init__(self, curve):
        # the ideal point has no x, y coordinates; it only knows its curve
        self.curve = curve

    def __str__(self):
        return "Ideal"

Note the inheritance is denoted by the parenthetical (Point) in the first line. Each function we define on a Point will require a 1-2 line overriding function in this subclass, so we will only need a small amount of extra bookkeeping. For example, negation is quite easy.
class Point(object):
    ...
    def __neg__(self):
        return Point(self.curve, self.x, -self.y)

class Ideal(Point):
    ...
    def __neg__(self):
        return self
Note that Python allows one to override the prefix-minus operation by defining __neg__ on a custom object. There are similar functions for addition (__add__), subtraction, and pretty much every built-in Python operation. And of course addition is where things get more interesting. For the ideal point it’s trivial.
class Ideal(Point):
    ...
    def __add__(self, Q):
        return Q
Why does this make sense? Because (as we’ve said last time) the ideal point is the additive identity in the group structure of the curve. So by all of our analysis, $ P + 0 = 0 + P = P$, and the code is satisfyingly short.
For distinct points we have to follow the algorithm we used last time. Remember that the trick was to form the line $ L(x)$ passing through the two points being added, substitute that line for $ y$ in the elliptic curve, and then figure out the coefficient of $ x^2$ in the resulting polynomial. Then, using the two existing points, we could solve for the third root of the polynomial using Vieta’s formula.
In order to do that, we need to analytically solve for the coefficient of the $ x^2$ term of the equation $ L(x)^2 = x^3 + ax + b$. It’s tedious, but straightforward. First, write

$ \displaystyle L(x) = \frac{y_2 - y_1}{x_2 - x_1}(x - x_1) + y_1$

where $ (x_1, y_1)$ and $ (x_2, y_2)$ are the two points being added.
And again we can discard the parts that don’t involve $ x^2$. In other words, if we were to rewrite $ L(x)^2 = x^3 + ax + b$ as $ 0 = x^3 - L(x)^2 + ax + b$, we’d expand all the terms and get something that looks like

$ \displaystyle 0 = x^3 - \left ( \frac{y_2 - y_1}{x_2 - x_1} \right )^2 x^2 + C_1 x + C_0$

for some constants $ C_1, C_0$. By Vieta’s formula, the sum of the three roots is the negation of the $ x^2$ coefficient, i.e. $ x_1 + x_2 + x_3 = \left ( \frac{y_2 - y_1}{x_2 - x_1} \right )^2$.
Which means that $ x_3 = \left ( \frac{y_2 - y_1}{x_2 - x_1} \right )^2 - x_2 - x_1$. Once we have $ x_3$, we can get $ y_3$ from the equation of the line $ y_3 = L(x_3)$.
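For instance, on the curve $ y^2 = x^3 - 2x + 4$ that shows up in the examples later in this post, adding the points $ (3, 5)$ and $ (-2, 0)$ gives slope $ m = (0 - 5)/(-2 - 3) = 1$, so $ x_3 = 1^2 - (-2) - 3 = 0$ and $ L(x_3) = 1 \cdot (0 - 3) + 5 = 2$; reflecting, as described below, gives the sum $ (0, -2)$.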
Note that this only works if the two points we’re trying to add are different! The other two cases were if the points were the same or lying on a vertical line. These gotchas will manifest themselves as conditional branches of our add function.
class Point(object):
    ...
    def __add__(self, Q):
        if isinstance(Q, Ideal):
            return self

        x_1, y_1, x_2, y_2 = self.x, self.y, Q.x, Q.y

        if (x_1, y_1) == (x_2, y_2):
            # use the tangent method
            ...
        else:
            if x_1 == x_2:
                return Ideal(self.curve)  # vertical line

            # Using Vieta's formula for the sum of the roots
            m = (y_2 - y_1) / (x_2 - x_1)
            x_3 = m*m - x_2 - x_1
            y_3 = m*(x_3 - x_1) + y_1

            return Point(self.curve, x_3, -y_3)
First, we check if the two points are the same, in which case we use the tangent method (which we do next). Supposing the points are different, if their $ x$ values are the same then the line is vertical and the third point is the ideal point. Otherwise, we use the formula we defined above. Note the subtle and crucial minus sign at the end! The point $ (x_3, y_3)$ is the third point of intersection, but we still have to do the reflection to get the sum of the two points.
Now for the case when the points $ P, Q$ are actually the same. We’ll call it $ P = (x_1, y_1)$, and we’re trying to find $ 2P = P+P$. As per our algorithm, we compute the tangent line $ J(x)$ at $ P$. In order to do this we need just a tiny bit of calculus. To find the slope of the tangent line we implicitly differentiate the equation $ y^2 = x^3 + ax + b$ and get

$ \displaystyle \frac{dy}{dx} = \frac{3x^2 + a}{2y}$
The only time we’d get a vertical line is when the denominator is zero (you can verify this by taking limits if you wish), and so $ y=0$ implies that $ P+P = 0$ and we’re done. The fact that this can ever happen for a nonzero $ P$ should be surprising to any reader unfamiliar with groups! But without delving into a deep conversation about the different kinds of group structures out there, we’ll have to settle for such nice surprises.
In the other case $ y \neq 0$, we plug in our $ x,y$ values into the derivative and read off the slope $ m$ as $ (3x_1^2 + a)/(2y_1)$. Then using the same point slope formula for a line, we get $ J(x) = m(x-x_1) + y_1$, and we can use the same technique (and the same code!) from the first case to finish.
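As a worked check against the examples below: doubling $ P = (3, 5)$ on the curve $ y^2 = x^3 - 2x + 4$ gives slope $ m = (3 \cdot 3^2 - 2)/(2 \cdot 5) = 25/10 = 2.5$, so $ x_3 = 2.5^2 - 3 - 3 = 0.25$ and $ y_3 = 2.5(0.25 - 3) + 5 = -1.875$, and reflecting yields $ 2P = (0.25, 1.875)$.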
There is only one minor wrinkle we need to smooth out: can we be sure Vieta’s formula works? In fact, the real problem is this: how do we know that $ x_1$ is a double root of the resulting cubic? Well, this falls out again from that very abstract and powerful theorem of Bezout. There is a lot of technical algebraic geometry (and a very interesting but complicated notion of dimension) hiding behind the curtain here. But for our purposes it says that our tangent line intersects the elliptic curve with multiplicity 2, and this gives us a double root of the corresponding cubic.
And so in the addition function all we need to do is change the slope we’re using. This gives us a nice and short implementation
def __add__(self, Q):
    if isinstance(Q, Ideal):
        return self

    x_1, y_1, x_2, y_2 = self.x, self.y, Q.x, Q.y

    if (x_1, y_1) == (x_2, y_2):
        if y_1 == 0:
            return Ideal(self.curve)

        # slope of the tangent line
        m = (3 * x_1 * x_1 + self.curve.a) / (2 * y_1)
    else:
        if x_1 == x_2:
            return Ideal(self.curve)

        # slope of the secant line
        m = (y_2 - y_1) / (x_2 - x_1)

    x_3 = m*m - x_2 - x_1
    y_3 = m*(x_3 - x_1) + y_1

    return Point(self.curve, x_3, -y_3)
What’s interesting is how little the data of the curve comes into the picture. Nothing depends on $ b$, and only one of the two cases depends on $ a$. This is one reason the Weierstrass normal form is so useful, and it may bite us in the butt later in the few cases we don’t have it (for special number fields).
Here are some examples.
>>> C = EllipticCurve(a=-2,b=4)
>>> P = Point(C, 3, 5)
>>> Q = Point(C, -2, 0)
>>> P+Q
(0.0, -2.0)
>>> Q+P
(0.0, -2.0)
>>> Q+Q
Ideal
>>> P+P
(0.25, 1.875)
>>> P+P+P
Traceback (most recent call last):
...
Exception: The point (-1.958677685950413, 0.6348610067618328) is not on the given curve y^2 = x^3 + -2x + 4!
>>> x = -1.958677685950413
>>> y = 0.6348610067618328
>>> y*y - x*x*x + 2*x - 4
-3.9968028886505635e-15
And so we crash headfirst into our first floating point arithmetic issue. We’ll vanquish this monster more permanently later in this series (in fact, we’ll just scrap it entirely and define our own number system!), but for now here’s a quick fix:
>>> import fractions
>>> frac = fractions.Fraction
>>> C = EllipticCurve(a = frac(-2), b = frac(4))
>>> P = Point(C, frac(3), frac(5))
>>> P+P+P
(Fraction(-237, 121), Fraction(845, 1331))
Now that we have addition and negation, the rest of the class is just window dressing. For example, we want to be able to use the subtraction symbol, and so we need to implement __sub__
def __sub__(self, Q):
    return self + -Q
Note that because the Ideal point is a subclass of Point, it inherits all of these special functions while only needing to override __add__ and __neg__. Thank you, polymorphism! The last function we want is a scaling function, which efficiently adds a point to itself $ n$ times.
class Point(object):
    ...
    def __mul__(self, n):
        if not isinstance(n, int):
            raise Exception("Can't scale a point by something which isn't an int!")
        else:
            if n < 0:
                return -self * -n
            if n == 0:
                return Ideal(self.curve)
            else:
                Q = self
                R = self if n & 1 == 1 else Ideal(self.curve)

                i = 2
                while i <= n:
                    Q = Q + Q

                    if n & i == i:
                        R = Q + R

                    i = i << 1

                return R

    def __rmul__(self, n):
        return self * n

class Ideal(Point):
    ...
    def __mul__(self, n):
        if not isinstance(n, int):
            raise Exception("Can't scale a point by something which isn't an int!")
        else:
            return self
The scaling function allows us to quickly compute $ nP = P + P + \dots + P$ ($ n$ times). Indeed, the fact that we can do this more efficiently than performing $ n$ additions is what makes elliptic curve cryptography work. We’ll take a deeper look at this in the next post, but for now let’s just say what the algorithm is doing.
Given a number written in binary $ n = b_kb_{k-1}\dots b_1b_0$, we can write $ nP$ as

$ \displaystyle nP = b_0 P + b_1 2P + b_2 4P + \dots + b_k 2^k P$
The advantage of this is that we can compute each of the $ P, 2P, 4P, \dots, 2^kP$ iteratively using only $ k$ additions by multiplying by 2 (adding something to itself) $ k$ times. Since the number of bits in $ n$ is $ k= \log(n)$, we’re getting a huge improvement over $ n$ additions.
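For example, take $ n = 5 = 101_2$. The algorithm computes $ Q = P, 2P, 4P$ by two doublings, and adds into the accumulator only the terms whose bits are set, giving $ 5P = P + 4P$ with three point additions in total instead of four.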
The algorithm is given above in code, but it’s a simple bit-shifting trick. Just have $ i$ be some power of two, shifted left by one at the end of every loop iteration. Then start with $ Q_0$ being $ P$, and replace $ Q_{j+1} = Q_j + Q_j$; in typical programming fashion we drop the indices and overwrite the variable binding at each step (Q = Q + Q). Finally, we have a variable $ R$ to which $ Q_j$ is added when the $ j$-th bit of $ n$ is a 1 (and ignored when it’s 0). The rest is bookkeeping.
Note that __mul__ only allows us to write something like P * n, but the standard notation for scaling is n * P. This is what __rmul__ allows us to do.
We could add many other helper functions, such as ones to allow us to treat points as if they were lists, checking for equality of points, comparison functions to allow one to sort a list of points in lex order, or a function to transform points into more standard types like tuples and lists. We have done a few of these that you can see if you visit the code repository, but we’ll leave fleshing out the class as an exercise to the reader.
As one can see, the numbers involved get very large very quickly. One thing we’ll do to avoid such large numbers (but hopefully not sacrifice security) is to work in finite fields, the simplest version of which is to compute modulo some prime.
So now we have a concrete understanding of the algorithm for adding points on elliptic curves, and a working Python program to do this for rational numbers or floating point numbers (if we want to deal with precision issues). Next time we’ll continue this train of thought and upgrade our program (with very little work!) to work over other simple number fields. Then we’ll delve into the cryptographic issues, and talk about how one might encode messages as points on a curve and use the algebraic operations to encrypt them.
The meat of our last primer was a proof that quotient groups are well-defined. One important result that helps us compute groups is a very easy consequence of this well-definition.
Recall that if $ G,H$ are groups and $ \varphi: G \to H$ is a group homomorphism, then the image of $ \varphi$ is a subgroup of $ H$. Also the kernel of $ \varphi$ is the normal subgroup of $ G$ consisting of the elements which are mapped to the identity under $ \varphi$. Moreover, we proved that the quotient $ G / \ker \varphi$ is a well-defined group, and that every normal subgroup $ N$ is the kernel of the quotient map $ G \to G/N$. These ideas work together to compute groups with the following theorem. Intuitively, it tells us that the existence of a homomorphism between two groups gives us a way to relate the two groups.
Theorem: Let $ \varphi: G \to H$ be a group homomorphism. Then the quotient $ G/ \ker \varphi$ is isomorphic to the image of $ \varphi$. That is,
$ G/ \ker \varphi \cong \varphi(G)$
As a quick corollary before the proof, if $ \varphi$ is surjective then $ H \cong G / \ker \varphi$.
Proof. We define an explicit map $ f : G/ \ker \varphi \to \varphi(G)$ and prove it is an isomorphism. Let $ g \ker \varphi$ be an arbitrary coset and set $ f(g \ker \varphi) = \varphi(g)$. First of all, we need to prove that this definition does not depend on the choice of a coset representative. That is, if $ g \ker \varphi = g' \ker \varphi$, then $ \varphi(g) = \varphi(g')$. But indeed, $ g \ker \varphi = g' \ker \varphi$ means $ g^{-1}g' \in \ker \varphi$ (for any subgroup $ N$, we have by definition $ gN = g'N$ if and only if $ g^{-1}g' \in N$), and so $ \varphi(g)^{-1}\varphi(g') = \varphi(g^{-1}g') = 1$.
It is similarly easy to verify that $ f$ is a homomorphism:

$ \displaystyle f((g \ker \varphi)(g' \ker \varphi)) = f(gg' \ker \varphi) = \varphi(gg') = \varphi(g)\varphi(g') = f(g \ker \varphi) f(g' \ker \varphi)$
It suffices now to show that $ f$ is a bijection. It is trivially surjective (anything in the image of $ \varphi$ is the image of some coset). It is injective since if $ f(g \ker \varphi) = 1$, then $ \varphi(g) = 1$ and hence $ g \in \ker \varphi$, so the coset $ g \ker \varphi = 1 \ker \varphi$ is the identity element. So $ f$ is an isomorphism. $ \square$
Let’s use this theorem to compute some interesting things.
Denote by $ D_{2n}$ the group of symmetries of the regular $ n$-gon. That is, $ D_{16}$ is the symmetry group of the regular octagon and $ D_{8}$ is the symmetry group of the square (the $ 2n$ notation is because this group always has order $ 2n$). We want to relate $ D_{16}$ to $ D_8$. To do this, let’s define a homomorphism $ f:D_{16} \to D_8$ by sending the one-eighth rotation $ \rho$ of the octagon to the one-fourth rotation of the square (which, abusing notation, we also call $ \rho$, so that $ f(\rho) = \rho$), and using the same reflection for both ($ f(\sigma) = \sigma$). It is easy to check that this is a surjective homomorphism, and moreover the kernel is $ \left \{ 1, \rho^4 \right \}$. That is, $ D_8 \cong D_{16}/ \left \{ 1, \rho^4 \right \}$.
Here is a more general example. If $ G, H$ are groups of relatively prime order, then there are no nontrivial homomorphisms $ G \to H$. In order to see this, note that $ |G/ \ker \varphi| = |G| / |\ker \varphi|$ as a simple consequence of Lagrange’s theorem. Indeed, by the first isomorphism theorem this quantity is equal to $ |\varphi(G)|$. So $ |G| = | \ker \varphi| |\varphi(G)|$. That is, the order of $ \varphi(G)$ divides the order of $ G$. But it also divides the order of $ H$ because $ \varphi(G)$ is a subgroup of $ H$. In other words, the order of $ \varphi(G)$ is a common factor of the orders of $ G$ and $ H$. By hypothesis, the only such number is 1, and so $ |\varphi(G)| = 1$ and $ \varphi(G)$ is the trivial group.
We will use the first isomorphism theorem quite a bit on this blog. Because it is such a common tool, it is often used without explicitly stating the theorem.
Generators
One extremely useful way to describe a subgroup is via a set of generators. The simplest example is for a single element.
Definition: Let $ G$ be a group and $ x \in G$. Then the subgroup generated by $ x$, denoted $ \left \langle x \right \rangle$, is the smallest subgroup of $ G$ containing $ x$. More generally, if $ S \subset G$ then the subgroup generated by $ S$ is the smallest subgroup containing $ S$.
This definition is not quite useful, but the useful version is easy to derive. On one hand, the identity element must always be in $ \left \langle x \right \rangle$. Since $ x \in \left \langle x \right \rangle$ and it’s a subgroup, we must have that $ x^{-1} \in \left \langle x \right \rangle$. Moreover, all powers of $ x$ must be in the subgroup, as must all powers of the inverse (equivalently, inverses of the powers). In fact that is all that is necessary. That is,

$ \displaystyle \left \langle x \right \rangle = \left \{ \dots, x^{-2}, x^{-1}, 1, x, x^2, \dots \right \}$
For finite groups, this list of elements will terminate. And in fact, the inverse of $ x$ will be a power of $ x$ as well. To see this, note that if we keep taking powers of $ x$, eventually one of those will be the identity element. Specifically, some power of $ x$ must repeat, and if $ x^n = x^m$ then $ x^{n-m} = 1$. Hence $ x^{-1} = x^{n-m-1}$.
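For example, in the symmetry group of the square, the quarter turn $ \rho$ satisfies $ \rho^4 = 1$, and so its inverse is the three-quarter turn: $ \rho^{-1} = \rho^3$.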
Subgroups generated by more than one element are more difficult to write down. For example, if $ x,y \in G$ then $ x,y$ may have a nontrivial relationship. That is, even though all possible products involving $ x$ and $ y$ are in the subgroup, it is very difficult to determine whether two such products are the same (in fact, one very general formulation of this problem, the word problem for groups, is undecidable!). Oftentimes one can find a set of generators which generates the entire group $ G$. In this case, we say $ G$ is generated by those elements.
A familiar example is the symmetry group of the square. As it turns out this group is generated by $ \rho, \sigma$, where $ \rho$ is a quarter turn and $ \sigma$ is a reflection across some axis of symmetry. The relationship between the two elements is succinctly given by the equality $ \rho \sigma \rho \sigma = 1$. To see this, try holding out your hand with your palm facing away; rotate your hand clockwise, flip it so your fingers are pointing left, rotate again so your fingers are pointing up, and then flip to get back to where you started; note that the two flips had the same axis of symmetry (the up-down axis). The other (obvious) relationships are that $ \rho^4 = 1$ and $ \sigma^2 = 1$. If we want to describe the group in a compact form, we write

$ \displaystyle D_8 = \left \langle \rho, \sigma | \rho^4, \sigma^2, \rho \sigma \rho \sigma \right \rangle$
This is an example of a group presentation. The left hand side is the list of generators, and the right hand side gives a list of relators, where each one is declared to be the identity element. In particular, the existence of a presentation with generators and relators implies that all possible relationships between the generators can be deduced from the list of relators (kind of like how all relationships between sine and cosine can be deduced from the fact that $ \sin^2(x) + \cos^2(x) = 1$). Indeed, this is the case for the symmetry group (and all dihedral groups); there are only three distinct equations describing the behavior of rotations and reflections.
Here’s a quick definition we will often refer to in the future: a group is called cyclic if it is generated by a single element. Here are some interesting exercises for the beginning reader to puzzle over, which are considered basic facts for experienced group theorists:
Every subgroup of a cyclic group is cyclic.
There is only one infinite cyclic group up to isomorphism: $ \mathbb{Z}$.
Every finite cyclic group is isomorphic to $ \mathbb{Z}/n\mathbb{Z}$ for some $ n$.
Finally, we will call a group finitely generated if it is generated by a finite set of elements and finitely presented if it has a presentation with finitely many generators and relators. Just to give the reader a good idea about how vast this class of groups is: many basic conjectured facts about finitely generated groups which have “only” one relator are still open problems. So trying to classify groups with two relators (or finitely many relators) is still a huge leap away from what we currently understand. As far as this author knows, this subject has been largely abandoned after a scant few results were proved.
Products and Direct Sums
Just as one does in every field of math, in order to understand groups better we want to decompose them into smaller pieces which are easier to understand. Two of the main ways to do this are via direct products and direct sums.
Definition: Let $ (G, \cdot_G),(H, \cdot_H)$ be groups. The product group $ G \times H$ is defined to have the underlying set $ G \times H$ (pairs of elements), and the operation is defined by entrywise multiplication in the appropriate group.
Of course, one must verify that this operation actually defines a group according to the usual definition, but this is a simple exercise. One should note that there are two canonical subgroups of the direct product. Define by $ p_1 : G \times H \to G$ the projection onto the first coordinate (that is, $ (a,b) \mapsto a$). This map is obviously a homomorphism, and its kernel is the subgroup of elements $ (1,h), h \in H$. That is, we can identify $ H$ as a subgroup of $ G \times H$. Identically, we see with $ p_2(a,b) = b$ that $ G$ is a subgroup of $ G \times H$.
Note that this allows us to make some very weird groups. For instance, by induction a single direct product allows us to define products of arbitrarily many groups. Note that reordering the terms of such a product does not change the isomorphism class of the group (e.g. $ \mathbb{Z} \times D_8 \cong D_8 \times \mathbb{Z}$). Additionally, there is nothing that stops us from defining infinite product groups. The elements of such a group are sequences of elements from the corresponding multiplicands. For example, the group $ \mathbb{Z} \times \mathbb{Z} \times \dots$ is the group of sequences of integers, where addition is defined termwise and the identity is the sequence of all zeroes.
Now infinite products can be particularly unwieldy, but we may still want to talk about groups constructed from infinitely many pieces. Although we have no motivation for this, one such example is the group of an elliptic curve. In order to tame the unwieldiness, we define the following construction which takes an infinite product, and allows only those elements which have finitely many non-identity terms.
Definition: Let $ G_{\alpha}$ be a family of groups. Define the direct sum of the $ G_{\alpha}$, denoted by $ \bigoplus_{\alpha} G_{\alpha}$, to be the subgroup of $ \prod_{\alpha} G_{\alpha}$ of elements $ (g_0, g_1, \dots)$ where all but finitely many $ g_i$ are the identity in the corresponding $ G_i$.
[As a quick side note, this is mathematically incorrect: since the family of groups need not be countable, the $ g_i$ may not be enumerable. One can fix this by defining the elements as functions on the index set instead of sequences, but we are too lazy to do this.]
Note that we can define a direct sum of only finitely many groups, and for example this would be denoted $ G \oplus H$, but finite sums are trivially the same as finite products. In fact, in this primer and all foreseeable work on this blog, we will stick to finite direct sums of groups, and implicitly identify them with direct products.
Finally, in terms of notation we will write $ G^n$ for a direct product of $ G$ with itself $ n$ times, and $ G^{\oplus n}$ for the direct sum of $ G$ with itself $ n$ times. As we just mentioned, these two are identical, so we will just default to the former for simplicity of reading.
One might wonder: why do we even distinguish these two constructions? The answer is somewhat deep, and we will revisit the question in our future series on category theory. For now, we can simply say that the distinction is in the case of infinite products and infinite sums, which we won’t discuss anyway except for in passing curiosity.
The Classification of Finitely Generated Abelian Groups
Now we have finally laid enough groundwork to state the first big classification theorem of group theory. In words, it says that any finitely generated abelian group can be written as a direct sum of things isomorphic to $ \mathbb{Z}$ and $ \mathbb{Z}/n\mathbb{Z}$ for various choices of $ n$. Moreover, the choices of $ n$ are related to each other.
In particular, the theorem is stated as follows:
Theorem: Let $ G$ be a finitely generated abelian group. Then $ G$ is isomorphic to a group of the form:

$ \displaystyle \mathbb{Z}^m \oplus \mathbb{Z}/p_1^{n_1}\mathbb{Z} \oplus \mathbb{Z}/p_2^{n_2}\mathbb{Z} \oplus \dots \oplus \mathbb{Z}/p_k^{n_k}\mathbb{Z}$

where the $ p_i$ are (not necessarily distinct) primes. Moreover, $ G$ is completely determined by the choices of primes and exponents above.
In particular, we name these numbers as follows. The $ \mathbb{Z}^m$ part of the equation is called the free part, the exponent $ m$ is called the rank of $ G$, and the numbers $ p_i^{n_i}$ are called the primary factors.
The proof of this theorem is beyond the scope of this blog, but any standard algebra text will have it. All we are saying here is that every finitely generated abelian group can be broken up into the part that has infinite order (the free part) and the part that has finite order (often called the torsion part), and that these parts are largely disjoint from each other. For finite groups this is a huge step forward: to classify all finite groups one now only needs to worry about nonabelian groups (although this is still a huge feat in its own right).
A quick application of this theorem is as follows:
Corollary: Let $ G$ be a finitely generated abelian group which has no elements of finite order. Then $ G$ has no nontrivial relators except for those enforcing commutativity.
Proof. Indeed, $ G \cong \mathbb{Z}^m$ for some $ m$, and it has a presentation $ \left \langle x_1, x_2, \dots, x_m | x_ix_jx_i^{-1}x_j^{-1}, i \neq j \right \rangle$, whose only relators enforce commutativity. $ \square$
Borrowing the free terminology, such groups without relators are called free abelian groups. Indeed, there are also nonabelian “free” groups, and this is the last topic we will cover in this primer.
Free Groups, and a Universal Property
Equivalently to “a group with no relators,” we can define a free group as a group which has presentation $ \left \langle x_{\alpha} \right \rangle$ for some potentially infinite family of elements $ x_{\alpha}$. If the number of generators is finite, say there are $ n$ of them, then we call it the free group on $ n$ generators.
The interesting thing here is that all possible products of the generating elements are completely distinct. For example, the free group on two generators $ \left \langle a,b \right \rangle$ contains the elements $ ab, aba, abab, b^3a^{-5}b^2a$, along with infinitely many others. The only way that two elements can be the same is if they can be transformed into each other by a sequence of inserting or deleting strings which are trivially the identity. For example, $ abb^{-1}a = a^2$, but only because of our cancellations of $ bb^{-1} = 1$, which holds in every group.
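To make this concrete, here is a minimal Python sketch (my own, not part of the original discussion) of word reduction in a free group, representing a word as a list of (generator, exponent) pairs with exponents $ \pm 1$:

def reduce_word(word):
    # cancel adjacent inverse pairs using a stack
    out = []
    for gen, exp in word:
        if out and out[-1] == (gen, -exp):
            out.pop()  # e.g. b followed by b^{-1} cancels
        else:
            out.append((gen, exp))
    return out

# a b b^{-1} a reduces to a^2, as in the example above
assert reduce_word([('a', 1), ('b', 1), ('b', -1), ('a', 1)]) == [('a', 1), ('a', 1)]

Two words represent the same element of the free group exactly when they have the same reduction.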
There is another way to get free groups, and that is by taking free products. In particular, the group $ \mathbb{Z} = \left \langle a \right \rangle$ is the free group on a single generator. Given two copies of $ \mathbb{Z}$, we’d like to combine them in some way to get $ \left \langle a,b \right \rangle$. More generally, given any two groups we’d like to define their free product $ G * H$ to be the group which contains all possible words using elements in $ G$ or $ H$, which has no nontrivial relators, except for those already existing among the elements of $ G$ and $ H$ before taking a product.
Rigorously, this is very easy to do with group presentations. Give presentations for $ G = \left \langle x_i | r_j \right \rangle$ and $ H = \left \langle y_k | s_m \right \rangle$, and define the free product by giving a presentation
$ G * H = \left \langle x_i, y_k | r_j, s_m \right \rangle$
For instance, the free product $ \mathbb{Z}/3\mathbb{Z} * \mathbb{Z}/4\mathbb{Z}$ has presentation $ \left \langle a,b | a^3, b^4 \right \rangle$. One interesting fact is that even if $ G,H$ are finite, as long as one is nontrivial then the free product will be an infinite group. Another interesting fact, which we’ll explore in future posts on category theory, is that the free product of groups is “the same thing” as the disjoint union of sets. That is, these two operations play the same role in their respective categories.
This “role” is called the universal property of the free product. We will use this directly in our post on the fundamental group, and in general it just says that homomorphisms provided on the two pieces of the free product extend uniquely to a homomorphism of the product. The simpler form is the universal property of the free group, which says that the free group is the “most general possible” group which is generated by these generators.
Theorem (Universal Property of the Free Group): Let $ S$ be a set of elements, and let $ F(S)$ be the free group generated by the elements of $ S$. Then any set-function $ S \to G$ where $ G$ is a group extends uniquely to a group homomorphism $ F(S) \to G$.
That is, deciding where the generators should go in $ G$ carries with it all of the information needed to define a homomorphism $ F(S) \to G$, and it is uniquely determined in this way. To see why, note that once $ f(a), f(b)$ are determined for generators $ a, b$, so are $ f(a^{-1}) = f(a)^{-1}, f(ab) = f(a)f(b), \dots$
The corresponding property for free products is similar:
Theorem (Universal Property of the Free Product): Let $ G_{\alpha}$ be groups, and let $ f_{\alpha}: G_{\alpha} \to H$ be group homomorphisms. Then there is a unique group homomorphism from the free product of the $ G_{\alpha}$ to $ H$, i.e. $ \ast_{\alpha} G_{\alpha} \to H$.
The idea is the same as for the first universal property: the free product is the most general group containing the $ G_{\alpha}$ as subgroups, and so any group homomorphism from the product to another group is completely determined by what happens to each of the multiplicand groups.
Moreover, we can take a free product and add in extra relators in some reasonable way. This is called an amalgamated free product, and it has similar properties which we won’t bother to state here. The important part for us is that this shows up in topology and in our special ways to associate a topological space with a group.
After these two short primers, we have covered a decent chunk of group theory. Nevertheless, we have left out a lot of the important exercises and intuition that goes into this subject. In the future, we will derive group-theoretic propositions we need as we go (in the middle of other posts). On the other hand, we will continually use groups (and abelian groups, among others) as an example of a category in our exploration of category theory. Finally, when we study rings and fields we will first lay them out as abelian groups with further structural constraints. This will shorten our definitions to a manageable form.
The study of groups is often one’s first foray into advanced mathematics. In the naivete of set theory one develops tools for describing basic objects, and through a first run at analysis one develops a certain dexterity for manipulating symbols and definitions. But it is not until the study of groups that one must step back and inspect the larger picture. The main point of that picture (and indeed the main point of a group) is that algebraic structure can be found in the most unalgebraic of settings. And we mean algebraic literally: we will make sense of what it means to “multiply” things which are not even close to numbers.
It is in my mind prohibitively difficult to motivate the study of these sorts of abstract algebraic objects. Part of the problem is that in order to impress upon the reader any hint of the riches returned from investing in their study, one must already be fluent in their language (indeed, in their culture!) and recognize the commanding utility of discovering them in otherwise mysterious places.
In pure mathematics, the study of groups alone has yielded some impressive and counterintuitive results:
There is no general formula in radicals for the roots of a polynomial of degree 5 or higher.
Loads of interesting properties of numbers, for example that $ m!n!$ divides $ (m+n)!$
Tools to help classify topological spaces, varieties, field extensions, and many other classes of mathematical objects.
And there are plenty of applications to the real world:
Public-key cryptography (based on the group-theoretic treatment of elliptic curves).
Error-detection systems on credit cards, ISBNs for books, bar codes, etc.
Study of abstract games like the Rubik’s Cube.
Determining properties of crystal and molecular structures.
The formulation of modern particle physics.
Mathematical biology, in particular determining the not-quite-icosahedral structure of a virus.
As a general rule, anything which has some kind of symmetrical structure can be “represented” by a group. We will of course make these notions clearer in this post, but before we jump into the rigorous mathematics, let us give two accessible (visual) examples of groups.
The Square and the Rubik’s Cube
Imagine a square in the plane:
Specifically, we will label each of the corners of this square so that we can distinguish them from one another. One can imagine cutting this square out of the plane, doing some kind of physical manipulation, and placing it back into the same hole (so that it fills up all the same space). For instance, one can rotate the square by a quarter turn (say, counterclockwise), or reflect it across a diagonal (say, the AC diagonal) before replacing it. Either of these two operations will not alter the square itself: it will always take points in the square to other points in the square. In other words, it might be the case that our four labelled corners are swapped around, but if we had not labelled the corners we wouldn’t be able to tell anything had changed. More rigorously, these transformations have to preserve the distances between points, so we cannot stretch, tear, or squeeze the square in any way.
If we denote a quarter-turn rotation by $ \rho$ and a flip by $ \sigma$, we can write down a sequence of these operations like
$ \rho \sigma \rho \rho$
where we apply the operations in order from left to right. That is, the above operation is “rotate a quarter turn, then flip, then rotate twice more.” Using the same notation, here are some additional examples of these operations on the square:
In particular, we will call one of these operations a symmetry of the square. Now we can ask the following question, “How can we classify all of the different kinds of symmetry of the square?”
There are some obvious properties of these symmetries. First, and trivially, the operation where we “do nothing” is a valid symmetry. Second, each of these symmetries has an “opposite” symmetry. Indeed, in terms of sets these symmetries are functions, and they should be bijections. And finally, as we’ve already seen, we can compose any two symmetries to get another symmetry.
The salient point here is that two different compositions of symmetries can end up being the same thing. Indeed, doing the same flip twice is the same thing as doing nothing, and rotating four times is also the same thing as doing nothing.
Moreover, a symmetry of the square is completely determined by how it moves the corners around. Here’s a short sketch of a proof. By our requirement that distances are preserved, corners must go to corners (consider the diagonally opposite corners), and once the images of the corners are chosen, every other point in the square is required to be a certain distance from each corner. A standard geometric argument shows that three circles with non-collinear centers intersect in at most one point, so these distance requirements pin down the image of every point.
And so there is a one-to-one correspondence between symmetries of the square and the pictures of the resulting squares with labels. The important fact is that not all possible labelings can be represented in this way. For example, the reader will try in vain to find a symmetry between the two labeled squares:
We will touch more on this example later, but here’s a more complicated example.
A Rubik’s Cube (Wikipedia)
In the same way that we can enumerate all possible symmetries of the square, we can enumerate all possible symmetries of the Rubik’s cube. But here the structure is more complicated: we can rotate any one of the six faces of the cube, and the relationships between operations are not at all obvious. The colored stickers form our labels to distinguish two configurations, but it’s not clear which (if any) stickers are superfluous. But again the same properties hold: there is a do-nothing operation, every operation is reversible, and any two operations can be composed and the result is still a valid operation.
As we will see shortly, these properties characterize what it means to have symmetry.
Groups
Now that we’ve seen two examples of symmetry groups, it’s time to get abstract. The three properties of symmetries we mentioned above will form the basis for our definition:
Definition: A group $ G$ is a set with a specified binary operation (denoted here by a $ \cdot$), so that the following properties hold:
$ G$ contains an identity element denoted $ e$ for which $ e \cdot x = x$ for all $ x \in G$.
Every $ x \in G$ has an inverse element $ y \in G$ for which $ x \cdot y = e$.
$ G$ is closed under $ \cdot$. That is, for any $ x,y \in G$, $ x \cdot y$ is again in $ G$.
The group operation is associative. That is, $ x \cdot (y \cdot z) = (x \cdot y) \cdot z$
There are a few issues we need to get out of the way about this definition and the mess of notation associated with it, but first let’s see some trivial examples.
The singleton set $ \left \{ e \right \}$ with the binary operation $ \cdot$ defined by setting $ e \cdot e = e$ is trivially a group. In terms of symmetries there is only one operation (to do nothing), and doing nothing any number of times is the same as doing nothing.
The set of integers $ \mathbb{Z}$ forms a group under the operation of addition. It is common knowledge that zero fits the definition of the identity element, that the sum of two integers is an integer, that addition on integers is associative, and that every integer has an additive inverse. We can similarly take any of our common number systems and see that they are groups under addition: rational numbers, real numbers, complex numbers, etc. If we want to work with multiplication, it is not hard to see that $ \mathbb{R} - \left \{ 0 \right \}$ is a group, since every nonzero real number has an inverse, and 1 is the multiplicative identity.
Here are a few basic propositions to clear up some silly ambiguities in the definition. For instance, we never say that there should only be one identity element. But indeed, if there are two identity elements $ e, e'$, then we can show they must be equal:

$ e = e \cdot e' = e'$
The first equality holds because $ e'$ is an identity element, and the second because $ e$ is. A similar proof shows that the inverse of an element is unique. Therefore, we are justified in the following notation: we will call the identity element $ 1$, and use subscripts $ 1_G, 1_H$ to distinguish between identity elements in different groups $ G, H$. We will also conventionally replace the explicit $ \cdot$ operation with juxtaposition, and emulate the multiplication operation by saying things like $ x^n$ to mean $ x \cdot x \cdot \dots \cdot x$, multiplying $ n$ times.
Of course, if we’re talking about the integers $ \mathbb{Z}$ that won’t do, because $ \mathbb{Z}$ is only a group under addition (although multiplication is well-defined, only $ \pm 1$ have multiplicative inverses). So in the case of groups whose underlying sets are numbers, we have to be more careful. In these cases, we will write things like $ nx$ to mean $ x + x + \dots + x$ adding $ n$ times (and here $ n$ is not considered an element of $ \mathbb{Z}$ as a group, but just the number of additions), and $ -x$ for the additive inverse of $ x$. While all of this notation might sound horribly confusing, we promise that it’s easy to get used to.
Another preliminary question we might ask is: are there groups of any given set size?
Definition: The order of a group, denoted $ |G|$, is the size of $ G$ as a set. If $ G$ is an infinite set, we say it has infinite order. Otherwise we say it has finite order.
Just as a quick example, let’s take the group of integers modulo 4. That is, our set is $ \left \{ 0, 1, 2, 3 \right \}$, and the operation $ +$ is defined by adding the two operands and taking the remainder when dividing by four. It is easy to verify this is indeed a group, and it has order 4. In addition this group is equivalent to the structure of rotations of our square earlier. In particular, if we consider the do-nothing operation to be 0, a quarter turn to be the number 1, a half turn to be 2, and a three-quarter turn to be 3, then composing two rotations would correspond to addition between the two numbers, where a full 360-degree rotation is the same as doing nothing. In the formal parlance, which we’ll explain shortly, these two groups are isomorphic. Everything about their group structure is the same, they just have different labels for their elements and operations.
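As a quick sanity check (a sketch of my own, not part of the original discussion), one can brute-force-verify the group axioms for the integers mod 4 in a few lines of Python:

n = 4
G = range(n)

def op(a, b):
    return (a + b) % n

assert all(op(0, x) == x for x in G)                  # identity
assert all(any(op(x, y) == 0 for y in G) for x in G)  # inverses
assert all(op(a, b) in G for a in G for b in G)       # closure
assert all(op(a, op(b, c)) == op(op(a, b), c)
           for a in G for b in G for c in G)          # associativity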
The number 4 is obviously arbitrary. We could define $ \mathbb{Z}/n\mathbb{Z}$ to be the group of integers modulo $ n$ for any $ n$. And it’s not hard to see that this group is isomorphic to the group of rotations of the regular $ n$-gon in the plane. So there are finite groups of any order we please, and we can realize them as the rotational groups of regular polygons.
These groups $ \mathbb{Z}$ and $ \mathbb{Z}/n\mathbb{Z}$ share something else in common: their operations are commutative. That is, for any two integers $ x+y = y+x$ (regardless of whether we take a modulus afterward). But group operations are not always commutative, and we have already seen a noncommutative example: in the symmetry group of the square, if we call $ \rho$ a quarter turn and $ \sigma$ a diagonal flip, it is not true that $ \rho \sigma = \sigma \rho$.
Definition: A group $ G$ is called abelian (pronounced, uh-BEE-lee-un) if its group operation is commutative.
Abelian groups are quite important. Part of the reason is that finitely generated abelian groups are completely understood. This classification is a standard result, and it is one of the main theorems of group theory. We are not yet equipped to state or prove it, but in principle the idea is clear: every finitely generated abelian group decomposes into pieces that look like $ \mathbb{Z}$ and $ \mathbb{Z}/n\mathbb{Z}$ for some choices of $ n$, and a finite abelian group decomposes into the $ \mathbb{Z}/n\mathbb{Z}$ pieces alone.
Before we continue we should note a few other examples of groups which come up quite often:
$ n \times n$ matrices with real entries under the operation of entry-wise matrix addition. This group is called $ M_{n \times n}(\mathbb{R})$.
$ n \times n$ invertible matrices with real entries under matrix multiplication. This group is often called the general linear group, denoted $ GL_n(\mathbb{R})$.
The symmetry group of the regular $ n$-gon, called the dihedral group of order $ 2n$, denoted $ D_{2n}$.
The symmetric group on $ n$ letters is the group of all permutations of a finite set of size $ n$. For example, there are twenty-four ways to permute the numbers $ 1,2,3,4$, and each of these permutations (that is, bijections $ \left \{ 1,2,3,4 \right \} \to \left \{ 1,2,3,4 \right \}$) is an element of the group. In general, the symmetric group is denoted $ S_n$ and has order $ n!$.
The symmetry group of any set $ A$, denoted $ S(A)$, is the set of all bijections $ A \to A$, with the group operation being function composition. Note that if $ A$ is finite, then this is just the previously defined symmetric group on $ |A|$ letters $ S_{|A|}$, but it is useful at times to use infinite sets.
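To see the symmetric group in code, here is a small Python sketch of ours that composes two permutations of $ \left \{ 1,2,3,4 \right \}$, storing a permutation as a dictionary from inputs to outputs.

    from itertools import permutations

    # Two permutations of {1,2,3,4}, stored as dicts from inputs to outputs.
    sigma = {1: 2, 2: 1, 3: 3, 4: 4}   # swap 1 and 2
    tau = {1: 1, 2: 3, 3: 4, 4: 2}     # cycle 2 -> 3 -> 4 -> 2

    # The group operation is function composition: apply t first, then s.
    def compose(s, t):
        return {x: s[t[x]] for x in t}

    print(compose(sigma, tau))  # {1: 2, 2: 3, 3: 4, 4: 1}

    # The order of S_4 is 4! = 24.
    print(len(list(permutations([1, 2, 3, 4]))))  # 24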
Subgroups, Group Homomorphisms, and Quotients
Definition: Let $ (G, \cdot)$ be a group and let $ H \subset G$ be a subset. We call $ H$ a subgroup of $ G$ if $ H$ is a group under the same operation as $ G$. That is, the operation on $ H$ is precisely the restriction $ \cdot |_{H \times H}$.
In particular, $ H$ must be closed under the operation of $ G$. For instance, the set of even integers $ 2\mathbb{Z}$ is a subset of the set of all integers $ \mathbb{Z}$, and the sum of any two even numbers is even, so this is a subgroup. Similarly, the subset of all integers which are multiples of $ n$ forms a subgroup of $ \mathbb{Z}$, and we denote it $ n\mathbb{Z}$. On the other hand, the subset of integers modulo $ n$ is not a subgroup of $ \mathbb{Z}$, because the operation is not just addition. As we will see in a bit, $ \mathbb{Z}/n\mathbb{Z}$ is a quotient of $ \mathbb{Z}$, justifying the unexplained division notation.
Another example is the subgroup of rotations within the group of symmetries of the regular $ n$-gon, and the subgroup of matrices with determinant 1 within the group of invertible matrices $ GL_n(\mathbb{R})$. The reader may easily verify that these are indeed subgroups.
We are about to give the most important example of subgroups, but first we must speak of homomorphisms.
Definition: Let $ G, H$ be two groups. A function $ \varphi: G \to H$ is called a homomorphism if $ \varphi(xy) = \varphi(x)\varphi(y)$ for all $ x,y \in G$.
Note that the operation on the left-hand side of the equality is happening with the operation in $ G$, but on the right-hand side it is happening in $ H$. In short, this condition says that the map $ \varphi$ preserves the structure of the group, at least in some coarse way.
We have already seen an example of a group homomorphism between the group $ \mathbb{Z}/4\mathbb{Z}$ and the subgroup of rotations of the square. There we said that $ \varphi(\rho^k) = k \mod 4$, and indeed

$ \varphi(\rho^{k_1} \rho^{k_2}) = \varphi(\rho^{k_1 + k_2}) = k_1 + k_2 \mod 4 = \varphi(\rho^{k_1}) + \varphi(\rho^{k_2})$
So this is a group homomorphism. Here the homomorphism also happens to be a bijection. In this case we call it an isomorphism, but this need not always be the case. For example, if $ G$ is a group of invertible matrices under multiplication, the operation of taking a determinant is a group homomorphism to the multiplicative group of nonzero real numbers $ \textup{det}: GL_n(\mathbb{R}) \to \mathbb{R} \setminus \left \{ 0 \right \}$. Indeed, $ \textup{det}(AB) = \textup{det}(A)\textup{det}(B)$ for all matrices $ A,B$. But many different matrices have equal determinants, so this map cannot be a bijection.
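One can spot-check the homomorphism property of the determinant numerically. The sketch below (ours, using numpy, and only correct up to floating-point error) draws random matrices, which are invertible with probability 1, and compares $ \textup{det}(AB)$ with $ \textup{det}(A)\textup{det}(B)$.

    import numpy as np

    rng = np.random.default_rng(0)
    for _ in range(100):
        # Random 3x3 real matrices; these are invertible with probability 1.
        A = rng.standard_normal((3, 3))
        B = rng.standard_normal((3, 3))
        # The homomorphism property, up to floating-point error.
        assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
    print("det(AB) == det(A) det(B) on all samples.")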
It is easy to verify (and we will leave it to the reader) that group homomorphisms always preserve the identity and inverses. For a slightly more detailed exercise, the reader can prove that the set-theoretic image of a homomorphism is a subgroup of the target group (you may want to use the “three properties” detailed below for this).
Now our important subgroup example is simply the set of things in a group homomorphism which get sent to the identity. That is:
Definition: The kernel of a group homomorphism $ \varphi: G \to H$, denoted $ \ker \varphi$, is the preimage of the identity $ \varphi^{-1}(1_H)$.
To verify that this is a group, let’s verify the three group properties (associativity obviously holds since $ G$ is a group).
Identity: $ \varphi(1_G) = \varphi(1_G 1_G) = \varphi(1_G) \varphi(1_G)$, and multiplying by $ \varphi(1_G)^{-1}$ (whatever it is), gives $ 1_H = \varphi(1_G)$. So $ 1_G \in \ker \varphi$.
Inverses: if $ \varphi(x) = 1_H$ then $ \varphi(xx^{-1}) = \varphi(x) \varphi(x^{-1}) = 1_H \varphi(x^{-1}) = \varphi(x^{-1})$. But $ \varphi(xx^{-1}) = \varphi(1_G) = 1_H$ to begin with, so $ 1_H = \varphi(x^{-1})$.
Closed under multiplication: if $ \varphi(x) = \varphi(y) = 1_H$, then $ \varphi(xy) = \varphi(x)\varphi(y) = 1_H 1_H = 1_H$.
So the kernel of a homomorphism is a valid subgroup of $ G$.
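For a concrete kernel, consider the homomorphism $ \mathbb{Z}/6\mathbb{Z} \to \mathbb{Z}/3\mathbb{Z}$ sending $ x \mapsto x \mod 3$. The Python snippet below (our own toy example) computes its kernel by brute force.

    # The homomorphism from Z/6Z to Z/3Z sending x to x mod 3.
    phi = lambda x: x % 3

    # The kernel is the preimage of the identity element 0.
    kernel = [x for x in range(6) if phi(x) == 0]
    print(kernel)  # [0, 3], a subgroup of Z/6Z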
Now something amazing happens with group homomorphisms. We just saw that the set of things which map to the identity form a group themselves, but there’s more to the story. As the reader knows, or can easily verify, every function $ f: G \to H$ between two sets induces an equivalence relation on $ G$, where two elements are equivalent if they get mapped to the same element in $ H$. For group homomorphisms we can say something much more powerful: the set of equivalence classes of a homomorphism forms a group. As we will see, the identity in this group is precisely the kernel of the homomorphism. This is a deep fact about groups, so let’s make it more rigorous.
Definition: Let $ G$ be a group, and let $ H$ be a subgroup. The left coset of $ H$ by an element $ x \in G$ is the set
$ xH = \left \{ xy: y \in H \right \}$
Similarly, the right coset by $ x$ is $ Hx = \left \{ yx : y \in H \right \}$.
For example, one can take the trivial coset $ 1H$, and it is easy to verify that the coset $ hH = H$ if and only if $ h \in H$.
In general nonabelian groups, the right coset and left coset of $ H$ by the same element will not agree. Moreover, as sets, two left cosets $ xH, yH$ will sometimes be equal even when $ x \neq y$. In fact, they are equal if and only if $ y^{-1}x \in H$. In particular, if $ xH = yH$, then there are two elements $ z, z' \in H$ for which $ xz = yz'$, or equivalently $ y^{-1}x = z'z^{-1}$. But $ z, z'$ are both in $ H$, so their product $ z'z^{-1}$ is in $ H$ as well.
Since set equality is an equivalence relation, we just defined an equivalence relation on $ G$ by $ x \sim y$ if $ y^{-1}x \in H$. The equivalence classes are just the cosets, and we can ask what is required for the set of equivalence classes to form a group. But before we do that, let’s verify what we said about group homomorphisms holds. That is, if $ x \ker \varphi = y \ker \varphi$, we have $ y^{-1}x \in \ker \varphi$, so that $ \varphi(y^{-1}x) = 1 = \varphi(y)^{-1} \varphi(x)$. In other words, $ \varphi(y) = \varphi(x)$, as we claimed.
Now here’s the cool part. We can “multiply” two cosets $ xHyH$ by calling this product $ \left \{ ab : a \in xH, b \in yH \right \}$. If we want the set of left cosets to form a group under this operation, we require the usual properties to hold:
Associativity will follow from closure, since the product is just $ xyzH$ and the group operation in $ G$ is associative. Suppose that the identity part works and let’s look at closure: if $ xHyH = xyH = xyH1H$, then we can isolate the middle parts and see that $ Hy = yH$ (the argument really should be more detailed, and we leave it to the reader to flesh it out). That is, the left and right cosets by $ y$ must be equal!
Moreover, if left and right cosets are always equal, then one can easily verify that these properties all hold. So we just proved the following statement:
Theorem: The set of left (right) cosets of a subgroup $ H \subset G$ forms a group if and only if the left and right cosets $ xH, Hx$ agree for every $ x \in G$.
And for the final stroke: left and right cosets agree for kernels. Let $ x \ker \varphi$ be such a coset; we want to show that for any $ z \in \ker \varphi$, we have $ xz = yx$ for some $ y \in \ker \varphi$. This is equivalent to requiring that $ xzx^{-1} \in \ker \varphi$, and indeed $ \varphi(xzx^{-1}) = \varphi(x)\varphi(z)\varphi(x)^{-1} = \varphi(x)\varphi(x)^{-1} = 1$. So the cosets of a kernel always form a group.
This condition that the left and right cosets agree (or that $ xzx^{-1} \in H$ for all $ z \in H, x \in G$) is such an important condition that we give it a name:
Definition: If the left and right cosets of a subgroup $ H \subset G$ agree, we call it a normal subgroup.
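To see a subgroup that fails to be normal, take $ G = S_3$ (permutations of $ \left \{ 0,1,2 \right \}$, stored as tuples) and $ H = \left \{ \textup{id}, (0 \; 1) \right \}$. The Python sketch below, an illustration of our own, finds an element whose left and right cosets of $ H$ differ.

    from itertools import permutations

    # S_3 as tuples p, where p[i] is the image of i.
    G = list(permutations(range(3)))
    compose = lambda p, q: tuple(p[q[i]] for i in range(3))

    # H consists of the identity and the swap of 0 and 1.
    # It is a subgroup, but not a normal one.
    H = [(0, 1, 2), (1, 0, 2)]

    for x in G:
        left = {compose(x, h) for h in H}    # the left coset xH
        right = {compose(h, x) for h in H}   # the right coset Hx
        if left != right:
            print("xH != Hx for x =", x)
            break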
Now our theorem is simply: the cosets of a subgroup form a group if and only if the subgroup is normal, and kernels are always normal subgroups. To put the final touches on this mathematical cake, it just so happens that every normal subgroup is the kernel of some homomorphism. To see this, we just need one more definition.
Definition: Let $ G$ be a group and $ N$ be a normal subgroup. The quotient group of $ G$ by $ N$, denoted $ G/N$, is the group of left cosets of $ N$ in $ G$.
The important part is that there is an obvious group homomorphism $ G \to G/N$: just send an element $ x$ to the coset $ xN$. We call this map the canonical projection of $ G$ onto $ G/N$. Indeed it is surjective (hence the name projection), and the kernel of this homomorphism is clearly $ N$ (since $ xN = N$ if and only if $ x \in N$). So every normal subgroup is the kernel of the canonical projection map onto its quotient.
As a quick aside, the avid reader of this blog may recognize the notation and terminology from our post on constructing topological spaces. Indeed, the quotients of groups introduced here are in a sense identical to the quotients of topological spaces there, just carried out in a different “setting.” The exploration of these analogous constructions in different settings falls under the name of category theory. Category theory happens to be extremely interesting and have a computational flavor; needless to say, we will be investigating it on this blog in the future.
The promised example here is of $ \mathbb{Z}/n\mathbb{Z}$. As we already saw, $ n\mathbb{Z}$ is a subgroup of $ \mathbb{Z}$, and it is normal simply because $ \mathbb{Z}$ is an abelian group (every subgroup of an abelian group is normal). So the cosets (denoted $ x + n\mathbb{Z}$ in the additive style) correspond to the remainders modulo $ n$: $ x + n\mathbb{Z} = y + n\mathbb{Z}$ if and only if $ y-x$ is a multiple of $ n$. The connection is immediate.
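Here is the partition into cosets $ x + n\mathbb{Z}$ in a small Python sketch of ours. Since $ \mathbb{Z}$ is infinite we truncate each coset to a finite window of representatives, which is enough to see that there are exactly $ n$ distinct cosets.

    n = 3
    window = range(-9, 10)

    # The coset x + nZ, truncated to a finite window of integers.
    coset = lambda x: frozenset(k for k in window if (k - x) % n == 0)

    # Distinct cosets correspond exactly to the n remainders mod n.
    print(len({coset(x) for x in range(-20, 20)}))  # 3

    # x + nZ == y + nZ if and only if y - x is a multiple of n.
    assert coset(5) == coset(2) and coset(5) != coset(1)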
Lagrange’s Theorem
The language of quotients and cosets immediately gives us some powerful tools. The most immediate one is called Lagrange’s Theorem. In our discussion of cosets, we implicitly realized that the set of left cosets of $ H \subset G$ partitions $ G$ (every equivalence relation defines a partition). Moreover, it is interesting to note that all of these cosets have equal size!
To see this, note that for a fixed $ x \in G$ there is a natural bijection $ G \to G$ defined by left multiplication by $ x$. The nice thing about bijections is that they preserve the size of sets. So if we can restrict this map to a coset $ yH$ and the image is another coset $ zH$, then these two cosets have the same size.
Indeed, for any given $ y, z$, choose $ x = zy^{-1}$. Then left-multiplication by $ x$ gives $ zy^{-1}yH = zH$. As we mentioned in our discussion of cosets, this operation is well defined (in the sense that we chose the representative $ y$ of $ yH$ and ended up with the representative $ z$ of $ zH$). Since $ y,z$ were arbitrary, it must be the case that all left cosets have the same size.
Theorem: (Lagrange) Let $ G$ be a finite group. The order of any subgroup $ H \subset G$ divides the order of $ G$.
Proof. Note that $ H$ is itself a left coset of $ H$ (by, say, the identity). Further note that since $ G$ is finite, there are finitely many cosets of $ H$. Denote by $ [G:H]$ the number of cosets of $ H$ in $ G$. Since all cosets have the same size, are pairwise disjoint, and their union is $ G$, it must be that $ |H| [G:H] = |G|$. In particular, the order of $ H$ divides the order of $ G$. $ \square$
This imposes a very large amount of structure on a group. For instance, one can now say that a group of prime order $ p$ has no nontrivial subgroups except for the whole group itself (indeed, with a little more work we can completely characterize groups of prime order). Further, any group of order $ 2^n$ cannot have subgroups of odd order (except the identity subgroup). Many of the arguments of group theory turn into arguments by divisibility solely because of this theorem. And the proof is essentially summed up by the phrase “multiplication by a group element is bijective.” This is a gem of elegance that shows the power of viewing mathematics from the perspective of functions.
It is common to call the quantity $ [G:H]$ in the proof above the index of $ H$ in $ G$. As we said, this is nothing more than the number of cosets of $ H$, and it satisfies $ |G| = [G:H] |H|$.
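As a sanity check of Lagrange’s theorem and the index formula, this Python sketch (our own) lists the cosets of $ H = \left \{ 0, 4, 8 \right \}$ in $ \mathbb{Z}/12\mathbb{Z}$ and verifies that $ |G| = [G:H] |H|$.

    G = range(12)   # Z/12Z
    H = [0, 4, 8]   # the subgroup generated by 4

    # Collect the distinct cosets x + H.
    cosets = {frozenset((x + h) % 12 for h in H) for x in G}
    print(sorted(sorted(c) for c in cosets))
    # [[0, 4, 8], [1, 5, 9], [2, 6, 10], [3, 7, 11]]

    # Lagrange: |G| = [G:H] * |H|, so |H| divides |G|.
    assert len(G) == len(cosets) * len(H)  # 12 == 4 * 3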
Group Actions
The last thing we will discuss in this primer is the idea of a group action. As we mentioned, the important part about group theory is that groups represent symmetries of things. Going back to our example of a square, it’s not that a square itself is a group in a nice way, but that the symmetries of a square form a group. This is an important distinction, because we are saying that somehow the symmetry group of the square “performs an action” on the square.
This gives us a hint that we should define what it means for a general group to “perform actions” on a general set.
Definition: A group $ G$ acts on a set $ A$ if there is a group homomorphism $ \varphi: G \to S(A)$, where $ S(A)$ is the group of bijections $ A \to A$. We call $ \varphi$ a group action.
Let’s dissect this definition a little. What we’re saying is that we want to associate elements of $ G$ with operations on $ A$. If $ g \in G$ and $ a \in A$, then $ \varphi(g): A \to A$ permutes the elements of $ A$ in some way. We can even go so far as to identify $ g$ with its image in $ S(A)$, and write $ ga$ for the action of (the function) $ g$ on (the set element) $ a$.
For example, $ \mathbb{Z}/4\mathbb{Z}$ acts on the square by sending $ n \mapsto \rho^n$, where $ \rho$ is again the quarter turn rotation. Indeed, the symmetric group $ S_n$ acts on the set of numbers $ 1, \dots, n$ by permutations (this is how it was defined), and for general sets $ A$, the group $ S(A)$ acts on $ A$ (the homomorphism is the identity map). Specific examples we’ve seen on this blog include matrix groups acting on vector spaces and homeomorphisms acting on topological spaces.
There are a few important aspects of group actions that we will investigate in future posts. But at least for now we can prove a nice theorem using group actions:
Theorem: (Cayley) Every group is a subgroup of a group of symmetries. In particular, every finite group $ G$ is isomorphic to a subgroup of $ S_n$ for some $ n$.
Proof. Every group $ G$ acts on itself by left multiplication. That is, every element $ g \in G$ corresponds to a distinct bijection $ G \to G$ sending $ x \mapsto gx$ (it is a bijection because left multiplication by $ g^{-1}$ inverts it). This correspondence is a homomorphism $ G \to S(G)$, since left multiplication by $ gh$ is the same function as left multiplication by $ h$ followed by left multiplication by $ g$. Furthermore, if $ g \neq h$ then $ g^{-1}h \neq 1$, and so left multiplication by $ h$ followed by left multiplication by $ g^{-1}$ is not the identity map on $ G$. This proves that two distinct elements in $ G$ correspond to two distinct bijections of $ G$. That is, the group action is injective as a homomorphism $ G \to S(G)$. In general every injective map is a bijection onto its image, and the image of a homomorphism is always a subgroup. So the group action provides an isomorphism $ G \to \varphi(G) \subset S(G)$, as required. $ \square$
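The proof is constructive, and we can watch it work in a small Python sketch of ours: it embeds $ \mathbb{Z}/4\mathbb{Z}$ into $ S_4$ by sending each $ g$ to the permutation “add $ g$ mod 4.”

    n = 4
    G = range(n)  # Z/4Z

    # The Cayley embedding: g becomes the permutation of G given by left
    # addition, recorded as the tuple of images (x + g) mod n for x in G.
    embed = lambda g: tuple((x + g) % n for x in G)

    # Distinct elements give distinct permutations (the map is injective)...
    assert len({embed(g) for g in G}) == n

    # ...and the operation is preserved: embed(g + h) is the composition
    # of embed(g) with embed(h).
    for g in G:
        for h in G:
            composed = tuple(embed(g)[embed(h)[x]] for x in G)
            assert composed == embed((g + h) % n)
    print("Z/4Z embeds into S_4 by left addition.")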
There is a whole host of things we have yet to talk about with groups. We will leave these for next time, but just to lay them on the table, we need to discuss:
Generators and cyclic groups
The first isomorphism theorem (more appropriately named the homomorphism decomposition theorem)
Direct sums and products
Orbits, stabilizers, and other objects associated to group actions.
Free groups, group presentations, and relations
That being said, group theory is a very large subject, and the reader may wish to pick up some books. This author learned algebra from two primary textual sources, the first being Michael Artin’s “Algebra” and the second (from a more mature perspective) being Paolo Aluffi’s “Algebra: Chapter 0”. We highly recommend the second text as a way to simultaneously learn algebra, demystify category theory, and see a wide host of applications of abstract algebra to other fields of mathematics.