The Fourier Series – A Primer

Overview

In this primer we’ll get a first taste of the mathematics that goes into the analysis of sound and images. In the next few primers, we’ll be building the foundation for a number of projects in this domain: extracting features of music for classification, constructing so-called hybrid images, and other image manipulations for machine vision problems (for instance, for use in neural networks or support vector machines; we’re planning on covering these topics in due time as well).

But first we must cover the basics, so let’s begin with the basic ideas about periodic functions. Next time, we’ll move forward to talk about Fourier transforms, and then to the discrete variants of the Fourier transform. But a thorough grounding in this field’s continuous origins will benefit everyone, and the ubiquity of the concepts in engineering applications ensures that future projects will need it too. And so it begins.

The Bird’s Eye View

The secret of the universe that we will soon make rigorous is that the sine and cosine can be considered the basic building blocks of all functions we care about in the real world. That’s not to say they are the only basic building blocks; with a different perspective we can call other kinds of functions basic building blocks as well. But the sine and cosine are so well understood in modern mathematics that we can milk them for all they’re worth with minimum extra work. And as we’ll see, there’s plenty of milk to go around.

The most rigorous way to state that vague “building block” notion is the following theorem, which we will derive in the sequel. Readers without a strong mathematical background may cringe, but rest assured, the next section and the remainder of this primer require nothing more than calculus and familiarity with complex numbers. We simply state the main idea in full rigor here for the mathematicians. One who understands the content of this theorem may skip this primer entirely, for everything one needs to know about Fourier series is stated there. This may also display to the uninitiated reader the power of more abstract mathematics.

Theorem: The following set of functions forms a complete orthonormal basis for the space $ L^2$ of square-integrable functions:

$ \displaystyle \left \{ e^{2 \pi i k x} \right \}_{k = -\infty}^{\infty}$

And the projection of a function $ f$ onto this basis gives the Fourier series representation for $ f$.

Of course, those readers with a background in measure theory and linear algebra will immediately recognize many of the words in this theorem. We don’t intend to cover the basics of measure theory or linear algebra; we won’t define a measure or Lebesgue-integrability, nor will we reiterate the facts about orthogonality we covered in our primer on inner-product spaces. We will say now that the inner products here should be viewed as generalizations of the usual dot product to function spaces, and we will only use the corresponding versions of the usual tasks the dot product can perform. This author prefers to think about these things in algebraic terms, and so most of the important distinctions about series convergence will either fall out immediately from algebraic facts or be neglected.

On the other hand, we will spend some time deriving the formulas from a naive point of view. In this light, many of the computations we perform in this primer will not assume knowledge beyond calculus.

Periodic Signals (They’re Functions! Just Functions!)

Fourier analysis is generally concerned with the analysis and synthesis of functions. By analysis we mean “the decomposition into easy-to-analyze components,” and by synthesis we mean “the reconstruction from such components.” In the wealth of literature that muddles the subject, everyone seems to have different notation and terminology just for the purpose of confusing innocent and unsuspecting mathematicians. We will take what this author thinks is the simplest route (and it is certainly the mathematically-oriented one), but one always gains an advantage by being fluent in multiple languages, so we clarify our terminology with a few definitions.

Definition: A signal is a function.

For now we will work primarily with continuous functions of one variable, either $ \mathbb{R} \to \mathbb{R}$ or $ \mathbb{R} \to \mathbb{C}$. This makes sense for functions of time, but many functions have domains that should be interpreted as spatial (after all, sine and cosine are classically defined as spatial concepts). We give an example of such spatial symmetry at the end of this primer, but the knowledgeable mathematician will imagine how this theory could be developed in more general contexts. When we get to image analysis, we will extend these methods to two (or more) dimensions. Moreover, as all of our non-theoretical work will be discrete, we will eventually drop the continuity assumption as well.

Also just for the moment, we want to work in the context of periodic functions. The reader should henceforth associate the name “Fourier series” with periodic functions. Since sine and cosine are periodic, it is clear that finite sums of sines and cosines are also periodic.

Definition: A function $ f: \mathbb{R} \to \mathbb{C}$ is $ a$-periodic if $ f(x+a) = f(x)$ for all $ x \in \mathbb{R}$.

This is a very strong requirement of a function, since just by knowing what happens to a function on any interval of length $ a$, we can determine what the function does everywhere. At first, we will stick to 1-periodic functions and define the Fourier series just for those. As such, our basic building blocks won’t be $ \sin(t), \cos(t)$, but rather $ \sin(2\pi t), \cos(2 \pi t)$, since these are 1-periodic. Later, we will (roughly) generalize the Fourier series by letting the period tend to infinity, and arrive at the Fourier transform.

For functions which are only nonzero on some finite interval, we can periodize them to be 1-periodic. Specifically, we scale them horizontally so that they lie on the interval $ [0,1]$, and we repeat the same values on every interval. In this way we can construct a large number of toy periodic functions.
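Here is a minimal sketch of this periodization in Python (numpy); the helper `periodize` and the example bump are purely illustrative names of our own choosing.

```python
import numpy as np

def periodize(f, a, b):
    """Return a 1-periodic function built from f, assumed supported on [a, b]:
    rescale [a, b] onto [0, 1] and repeat the same values on every interval."""
    def g(t):
        t = np.asarray(t, dtype=float)
        fractional = t - np.floor(t)          # map t into [0, 1)
        return f(a + (b - a) * fractional)    # rescale back to [a, b]
    return g

# Example: a bump supported on [2, 4], repeated with period 1
bump = lambda x: np.maximum(0.0, 1.0 - (x - 3.0) ** 2)
g = periodize(bump, 2.0, 4.0)
print(g(0.25), g(1.25), g(-0.75))  # equal values: g is 1-periodic
```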

Naive Symbol-Pushing

Now once we get it in our minds that we want to use sines and cosines to build up a signal, we can imagine what such a representation might look like. We might have a sum that looks like this:

$ \displaystyle A_0 + \sum_{k = 1}^{n} A_k\sin(2 \pi k t + \varphi_k)$

Indeed, this is the most general possible finite sum of sines that maintains 1-periodicity: we allow arbitrary phase shifts by $ \varphi_k$, arbitrary amplitude changes via $ A_k$, and an arbitrary vertical shifting constant $ A_0$. We also multiply the frequency by $ k$ at each step, but since each term $ \sin(2 \pi k t + \varphi_k)$ has period $ 1/k$, which divides 1, each term is still 1-periodic, and hence so is the whole sum (although the sum may have a smaller fundamental period; it may be, e.g., 1/2-periodic). Test for yourself, for example, the period of $ \sin(2\pi t) + \sin(4 \pi t)$.
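If you’d rather let a computer do the testing, here is a small numerical check in Python (numpy); the helper `signal` is our own name for the sum above.

```python
import numpy as np

def signal(t, amplitudes, phases, offset=0.0):
    """Evaluate offset + sum_{k=1}^{n} A_k * sin(2*pi*k*t + phi_k)."""
    t = np.asarray(t, dtype=float)
    total = offset * np.ones_like(t)
    for k, (A, phi) in enumerate(zip(amplitudes, phases), start=1):
        total += A * np.sin(2 * np.pi * k * t + phi)
    return total

t = np.linspace(0, 1, 500, endpoint=False)
params = dict(amplitudes=[1.0, 0.5, 0.25], phases=[0.0, 0.3, 1.0], offset=0.2)
difference = signal(t, **params) - signal(t + 1.0, **params)
print(np.max(np.abs(difference)))  # ~1e-15: the sum is 1-periodic
```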

Before we continue, the reader should note that we don’t yet know if such representations exist! We are just supposing initially that they do, and then deriving what the coefficients would look like.

To get rid of the phase shifts $ \varphi_k$, and introduce cosines, we use the angle sum formula on the above quantity to get

$ \displaystyle A_0 + \sum_{k = 1}^n A_k \sin(2 \pi k t) \cos(\varphi_k) + A_k \cos(2 \pi k t) \sin(\varphi_k)$

Now the cosines and sines of $ \varphi_k$ can be absorbed into the constant $ A_k$, producing new constants $ a_k = A_k \cos(\varphi_k)$ and $ b_k = A_k \sin(\varphi_k)$ as follows:

$ \displaystyle A_0 + \sum_{k =1}^{n} a_k \sin(2 \pi k t) + b_k \cos(2 \pi k t)$

Note that we will have another method to determine the necessary coefficients later, so we can effectively ignore how these coefficients change. Next, we note the following elementary identities from complex analysis:

$ \displaystyle \cos(2 \pi k t) = \frac{e^{2 \pi i k t} + e^{-2 \pi i k t}}{2}$
$ \displaystyle \sin(2 \pi k t) = \frac{e^{2 \pi i k t} - e^{-2 \pi i k t}}{2i}$

Now we can rewrite the whole sum in terms of complex exponentials as follows. Note that we change the indices to allow negative $ k$, we absorb $ A_0$ into the $ k = 0$ term (for which $ e^{0} = 1$), and we again shuffle the constants around.

$ \displaystyle \sum_{k = -n}^n c_k e^{2 \pi i k t}$

At this point, we must allow for the $ c_k$ to be arbitrary complex numbers, since in the identity for sine above we divide by $ 2i$. Also, though we leave out the details, we see that the $ c_k$ satisfy an important property that results in this sum being real-valued for any $ t$. Namely, $ c_{-k} = \overline{c_k}$, where the bar represents the complex conjugate.
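Explicitly, if one traces the constants through the identities above, one may take $ c_0 = A_0$ and, for $ k \geq 1$, $ c_k = \frac{b_k - i a_k}{2}$ and $ c_{-k} = \frac{b_k + i a_k}{2}$; since the $ a_k, b_k$ are real, the relation $ c_{-k} = \overline{c_k}$ is visible directly.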

We’ve made a lot of progress, but whereas at the beginning we didn’t know what the $ A_k$ should be in terms of $ f$ alone, here we still don’t know how to compute $ c_k$. To find out, let’s use a more or less random (inspired) trick: integrate.

Suppose that $ f$ is actually equal to this sum:

$ \displaystyle f(t) = \sum_{k = -n}^n c_k e^{2 \pi i k t}$

Let us isolate the variable $ c_m$ by first multiplying through by $ e^{-2 \pi i m t}$ (where $ m$ may also be negative), and then move the remaining terms to the other side of the equation.

$ \displaystyle c_m = f(t)e^{-2 \pi i m t} - \sum_{k \neq m} c_k e^{2 \pi i (k-m) t}$

Integrating both sides of the equation on $ [0,1]$ with respect to $ t$, something magical happens. First, note that if $ k \neq m$, the integral

$ \displaystyle \int_0^1 c_ke^{2 \pi i (k-m) t} dt$

is actually zero! (Its antiderivative, $ \frac{c_k}{2 \pi i (k-m)} e^{2 \pi i (k-m) t}$, is 1-periodic, so it takes the same value at $ 0$ and $ 1$.) Moreover, the integral $ \int_0^1 c_m dt = c_m(1 - 0) = c_m$. Hence, this big mess simplifies to

$ \displaystyle c_m = \int_0^1 f(t)e^{-2 \pi i m t} dt$

And now the task of finding the coefficients is simply reduced to integration. Very tidy.
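As a quick sanity check, here is a sketch in Python (numpy) that approximates this integral with a Riemann sum over equally spaced samples; the function name is ours, and we test it on a trigonometric polynomial whose coefficients we already know.

```python
import numpy as np

def fourier_coefficient(f, m, num_samples=4096):
    """Approximate c_m = integral_0^1 f(t) exp(-2*pi*i*m*t) dt by a Riemann sum."""
    t = np.arange(num_samples) / num_samples
    return np.mean(f(t) * np.exp(-2j * np.pi * m * t))

# Sanity check on f(t) = 3 + 2*cos(2*pi*t): expect c_0 = 3, c_1 = 1, c_2 = 0
f = lambda t: 3 + 2 * np.cos(2 * np.pi * t)
print(fourier_coefficient(f, 0))   # ~3
print(fourier_coefficient(f, 1))   # ~1
print(fourier_coefficient(f, 2))   # ~0
```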

Those readers with some analysis background will recognize this immediately as the $ L^2$ inner product. Specifically, the inner product is:

$ \displaystyle \left \langle f,g \right \rangle = \int_0^1 f(t)\overline{g(t)}dt$

In fact, the process we went through is how one derives what the appropriate inner product for $ L^2$ should be. With respect to this inner product, the exponentials $ e^{2 \pi i k t}$ do form an orthonormal set, which is easy to verify once we have the inner product in hand. As we will see in a moment, the coefficients of the Fourier series are determined precisely by the projections of $ f$ onto the complex exponentials, as is usual with orthonormal bases of inner product spaces.

We will provide an example of finding such coefficients in due time, but first we have bigger concerns.

The Fourier Series, and Convergence Hijinx

Now recall that our original formula was a finite sum of sines. In general, not all functions can be represented by a finite sum of complex exponentials. For instance, take this square wave:

A square wave, 1-periodic, but discontinuous at k/2 for all integers k.

This function is the characteristic function of the set $ \cup_{k \in \mathbb{Z}} [k, k+1/2]$. No matter how many terms we use, if the sum is finite then the result will be continuous, and this function is discontinuous. So we can’t represent it.

“Ah, what about continuous functions?” you say, “Surely if everything is continuous our problems will be solved.” Alas, if only mathematics were so simple. Here is an example of a continuous function which still cannot be represented: a triangle wave.

A triangle wave. It’s continuous, but not smooth.

This function can be described as the periodization of the piecewise function defined by $ 1-2x$ and $ 2x-1$ on the intervals $ [0, 1/2], [1/2,1]$, respectively.

Unfortunately, these “sharp corners” prevent any finite sum from giving an exact representation. Indeed, this function is not differentiable at those points, while a finite sum of differentiable exponentials is.

More generally, this is a problem whenever the function we are trying to analyze is not smooth; that is, not infinitely differentiable at all points in its domain. Since a complex exponential is smooth, and a finite sum of smooth functions is smooth, no finite sum of complex exponentials can represent a non-smooth function.

Indeed, the only way to avoid this issue is to let $ n$ go to infinity. So let us make the natural definitions, and arrive at the true Fourier series:

Definition: The $ k$-th Fourier coefficient of a function $ f$, denoted $ \widehat{f}(k)$, is the integral

$ \displaystyle \widehat{f}(k) = \left \langle f, e^{2 \pi i k t} \right \rangle = \int_{0}^1 f(t) e^{-2 \pi i k t} dt$

The Fourier series of $ f$ is

$ \displaystyle \sum_{k = -\infty}^{\infty} \widehat{f}(k) e^{2 \pi i k t}$

and, for sufficiently well-behaved $ f$, is equal to $ f(t)$ for all $ t \in \mathbb{R}$; in general the equality holds in the $ L^2$ sense we discuss below.
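For a concrete example, take the square wave from above: the function which is $ 1$ on $ [0, 1/2)$ and $ 0$ on $ [1/2, 1)$, repeated with period 1. Its coefficients come straight from the integral: $ \widehat{f}(0) = \int_0^{1/2} 1 \: dt = 1/2$, and for $ k \neq 0$,

$ \displaystyle \widehat{f}(k) = \int_0^{1/2} e^{-2 \pi i k t} dt = \frac{1 - e^{-\pi i k}}{2 \pi i k} = \frac{1 - (-1)^k}{2 \pi i k}$

which is zero for even $ k$ and $ \frac{1}{\pi i k}$ for odd $ k$. Pairing the terms for $ k$ and $ -k$ (and using the exponential identities above) gives the familiar real form of the series:

$ \displaystyle \frac{1}{2} + \sum_{k \geq 1, \: k \textup{ odd}} \frac{2}{\pi k} \sin(2 \pi k t)$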

At first, we have to wonder what class of functions we can use this on. Indeed, this integral is not finite for some wild functions. This leads us to restrict Fourier series to functions in $ L^2$. That is, the only functions we can apply this to must have finite $ L^2$ norm. In the simplest terms, $ f$ must be square-integrable, so that the integral

$ \displaystyle \int_0^1 \left | f(t) \right |^2 dt$

is finite. As it turns out, $ L^2$ is a huge class of functions, and every function we might want to analyze in the real world is (or can be made to be) in $ L^2$. The equality of a function with its Fourier series is then guaranteed in the $ L^2$ sense (described next) by the fact that the complex exponentials form an orthonormal basis for $ L^2$; pointwise convergence at every point is a far more delicate matter and requires extra hypotheses on $ f$.

But we have other convergence concerns. Specifically, we want it to be the case that when $ k$ is very far from 0, then $ \widehat{f}(k)$ contributes little to the representation of the function. The real way to say that this happens is by using the $ L^2$ norm again, in the sense that two functions are “close” if the following integral is small in absolute value:

$ \displaystyle \int_0^1 \left | f(t) – g(t) \right |^2 dt$

Indeed, if we let $ g_n(t)$ be the truncation of the Fourier series to the terms with $ -n \leq k \leq n$, then as $ n \to \infty$ the above integral (with $ g = g_n$) tends to zero. This is called convergence in $ L^2$, and it’s extremely important for applications. We will never be able to compute the true Fourier series in practice, so we have to stop at some sufficiently large $ n$, and we want the theoretical guarantee that our approximation only gets better as we pick larger $ n$. The proof of this fact is again given by our basis: the complex exponentials form a complete basis with respect to the $ L^2$ norm.
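Here is a rough numerical illustration of this in Python (numpy): we truncate the Fourier series of the triangle wave from above, approximating each coefficient by a Riemann sum, and watch the $ L^2$ error shrink as $ n$ grows. The function names and discretization are our own.

```python
import numpy as np

t = np.arange(8192) / 8192
triangle = np.where(t < 0.5, 1 - 2 * t, 2 * t - 1)   # the triangle wave from above

def partial_sum(samples, n):
    """Approximate the truncated series sum_{|k| <= n} c_k e^{2 pi i k t}."""
    total = np.zeros_like(samples, dtype=complex)
    for k in range(-n, n + 1):
        c_k = np.mean(samples * np.exp(-2j * np.pi * k * t))
        total += c_k * np.exp(2j * np.pi * k * t)
    return total.real   # imaginary parts cancel up to roundoff

for n in [1, 2, 4, 8, 16, 32]:
    error = np.mean(np.abs(triangle - partial_sum(triangle, n)) ** 2)
    print(n, error)   # the L^2 error shrinks as n grows
```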

Moreover, if the original function $ f$ is continuously differentiable, then we get uniform convergence. That is, the quality of the approximation will not depend on $ t$, but only on our choice of a terminating $ n$.

There is a wealth of other results on the convergence of Fourier series, and rightly so, by how widely used they are. One particularly interesting set of results concerns counterexamples: there are continuous functions whose Fourier series diverge at some points, and there are even integrable (but not square-integrable) functions whose Fourier series diverge almost everywhere.

Other Bases, and an Application to Heat

As we said before, there is nothing special about using the complex exponentials as our orthonormal basis of $ L^2$. More generally, we call $ \left \{ e_n \right \}$ a Hilbert basis for $ L^2$ if it forms an orthonormal basis and is complete with respect to the $ L^2$ norm.

One can define an analogous series expansion for a function with respect to any Hilbert basis. While we leave out many of the construction details for a later date, one can use, for example, Chebyshev polynomials or Hermite polynomials. This idea is hence generalized to the field of “wavelet analysis” and “wavelet transforms,” of which the Fourier variety is a special case (now we’re speaking quite roughly; there are details this author isn’t familiar with at the time of this writing).

Before we finish, we present an example where the Fourier series is used to solve the heat equation on a circular ring. We choose this problem because historically it was the motivating problem behind the development of these ideas.

In general, the heat equation applies to some region $ R$ in space, and you start with an initial distribution of heat $ f(x)$, where $ x$ is a vector with the same dimension as $ R$. The heat equation dictates how heat dissipates over time.

Periodicity enters into the discussion because of our region: $ R$ is a one-dimensional circle, and so $ x$ is just the value $ x \in [0, 1]$ parameterizing the angle $ 2 \pi x$. Let $ u(x,t)$ be the temperature at position $ x$ at time $ t$. As we just said, $ u$ is 1-periodic in $ x$, as is $ f$. The two are related by $ u(x,0) = f(x)$.

The important consideration is that the symmetry of the circle has consequences in how the heat dissipates.

Now let us write the Fourier series for $ u$.

$ \displaystyle u(x,t) = \sum_{k = -\infty}^{\infty} c_k(t) e^{2 \pi i k x}$

Where the dependence on time $ t$ comes into play in $ c_k$:

$ \displaystyle c_k(t) = \int_0^1 u(x,t)e^{-2 \pi i k x} dx$

Now the mystery here is evaluating these $ c_k$. We want to find the coefficients in terms of the initial temperature distribution $ f$. The magical step here is that $ u$ satisfies the (one-dimensional) heat equation: the partial derivative with respect to time of $ u$ is directly proportional to the second partial spatial derivative. In symbols,

$ u_t = au_{xx}$

Where the constant $ a$ depends on the physical properties of the region. See this page for a general derivation. This equation is also called the diffusion equation, because it governs not only the dissipation of heat, but the diffusion of many other quantities over time (e.g., charge in a wire).

Without loss of generality, let’s let $ a = 1/2$. Let’s plug the Fourier series for $ u(x,t)$ into the heat equation and see what happens. First,

$ \displaystyle u_t = \sum_{k = -\infty}^{\infty} c_k'(t) e^{2 \pi i k x}$
$ \displaystyle u_{xx} = \sum_{k = -\infty}^{\infty} c_k(t)(-4 \pi ^2 k^2) e^{2 \pi i k x}$

And plugging in, we have

$ \displaystyle \sum_{k = -\infty}^{\infty} c_k'(t) e^{2 \pi i k x} = \sum_{k = -\infty}^{\infty} c_k(t)(-2 \pi^2 k^2)e^{2 \pi i k x}$

This simply says that we equate the coefficients of each term $ e^{2 \pi i k x}$, giving

$ \displaystyle c_k'(t) = -2 \pi^2 k^2 c_k(t)$

But this is a wonderful thing, because this is an ordinary differential equation, and we can solve it by elementary means. Using the usual logarithmic trick (divide through by $ c_k(t)$ so the left-hand side becomes $ \frac{d}{dt} \log c_k(t)$, integrate from $ 0$ to $ t$, and exponentiate), we have

$ \displaystyle c_k(t) = c_k(0) e^{- 2 \pi^2k^2 t}$

So now $ c_k(0)$ is just $ \int_0^1 u(x,0)e^{-2 \pi i k x} dx$ and $ u(x,0) = f(x)$. From a different perspective, this is just the $ k$-th Fourier coefficient of $ f$. Summing it all up, we now have a very explicit description of $ u(x,t)$ in which the dependence on time and space is obvious:

$ \displaystyle u(x,t) = \sum_{k = -\infty}^{\infty} \widehat{f}(k)e^{-2 \pi^2 k^2 t}e^{2 \pi i k x}$

And this solves the problem.
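To see the formula in action, here is a sketch in Python (numpy) that evaluates a truncation of this series for a toy initial distribution, approximating each $ \widehat{f}(k)$ from samples of $ f$; the function name, the truncation length, and the choice of initial distribution are ours.

```python
import numpy as np

def heat_solution(f_samples, x, t, n_terms=64):
    """Evaluate the truncated series sum_{|k| <= n} fhat(k) e^{-2 pi^2 k^2 t} e^{2 pi i k x},
    where fhat(k) is approximated from samples of the initial distribution f on [0, 1)."""
    num_samples = len(f_samples)
    s = np.arange(num_samples) / num_samples
    u = np.zeros_like(np.asarray(x, dtype=float), dtype=complex)
    for k in range(-n_terms, n_terms + 1):
        fhat_k = np.mean(f_samples * np.exp(-2j * np.pi * k * s))
        u += fhat_k * np.exp(-2 * np.pi**2 * k**2 * t) * np.exp(2j * np.pi * k * x)
    return u.real

# Initial heat distribution: a hot spot near x = 0.5 on the ring
s = np.arange(1024) / 1024
f_samples = np.exp(-100 * (s - 0.5) ** 2)
x = np.linspace(0, 1, 5, endpoint=False)
for t in [0.0, 0.01, 0.1, 1.0]:
    # the temperature flattens toward the average of the initial distribution
    print(t, np.round(heat_solution(f_samples, x, t), 4))
```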

To convince you of some of the virtues of this representation, look at what happens as $ t$ tends to infinity: for every $ k \neq 0$ the factor $ e^{-2 \pi^2 k^2 t}$ goes to zero (since $ k^2 > 0$), so every term except the $ k = 0$ term vanishes. What remains is the constant term $ \widehat{f}(0) = \int_0^1 f(x) dx$, the average of the initial distribution. That is, no matter what the initial heat distribution is, after a long enough time the temperature evens out across the entire ring.

So there you have it! Our main content posts in the future will use Fourier series (and more heavily, Fourier transforms) in analyzing things like music files, images, and other digital quantities.

Until then!

Inner Product Spaces – A Primer

This primer is a precursor to a post on eigenfaces, a technique for facial recognition. For a more basic introduction to linear algebra, see our first primer on the subject.

Motivations

Vector spaces alone are not enough to do a lot of the interesting things we’d like them to do. Since a vector space is a generalization of Euclidean space, it is natural for us to investigate more specific types of vector spaces which are more akin to Euclidean space. In particular, we want to include the notion of a dot product. By admitting additional structure to a vector space, we may perform more computations, and hopefully get more interesting results.

Again, since we are developing mathematics for use in computing, all vector spaces in this post are finite-dimensional, and the field in question is always $ \mathbb{R}$ or $ \mathbb{C}$, but usually the former. We acknowledge the vast amount of work on and results about infinite-dimensional spaces, but we will not need any of it here.

Inner Product Spaces

The standard dot product operation in Euclidean space is defined as

$ \left \langle (x_1, \dots , x_n), (y_1, \dots, y_n) \right \rangle = \sum \limits_{i = 1}^n x_iy_i$

So we take the dot product and generalize it to an operation on an arbitrary vector space.

Definition: An inner product on a vector space $ V$ over a field $ F$ is a function $ \left \langle \cdot, \cdot \right \rangle : V \times V \to F$ which satisfies the following properties for all $ x,y,z \in V, c \in F$:

  • $ \left \langle x,y \right \rangle = \overline{ \left \langle y, x\right \rangle }$, where the bar denotes complex conjugate. For real fields, this is just symmetry in the arguments.
  • $ \left \langle cx, y \right \rangle = c \left \langle x,y \right \rangle$
  • $ \left \langle x+y, z \right \rangle = \left \langle x,z \right \rangle + \left \langle y,z \right \rangle$
  • $ \left \langle x,x \right \rangle \geq 0$ and $ \left \langle x,x \right \rangle = 0$ if and only if $ x=0$.

We recommend the novice reader invent some simple and wild operations on two vectors, and confirm or deny that they are inner products.
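One way to carry out such an experiment is a quick numerical spot-check. The helper below is our own illustration in Python (numpy); it only tests the real axioms on random vectors, so it can falsify a candidate but never prove one.

```python
import numpy as np

def is_plausible_inner_product(form, dim=3, trials=1000, tol=1e-9):
    """Spot-check the (real) inner product axioms on random vectors.
    `form` is a candidate function taking two vectors and returning a scalar."""
    rng = np.random.default_rng(0)
    for _ in range(trials):
        x, y, z = rng.standard_normal((3, dim))
        c = rng.standard_normal()
        if abs(form(x, y) - form(y, x)) > tol:                      # symmetry
            return False
        if abs(form(c * x, y) - c * form(x, y)) > tol:              # homogeneity
            return False
        if abs(form(x + y, z) - form(x, z) - form(y, z)) > tol:     # additivity
            return False
        if form(x, x) < -tol:                                       # positivity
            return False
    return True

print(is_plausible_inner_product(np.dot))                           # True
print(is_plausible_inner_product(lambda x, y: np.sum(x * y) ** 3))  # False: not linear
```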

Notice that the second and third conditions imply that if the second argument of an inner product is fixed, then the resulting map $ V \to F$ is a linear map (since it maps vectors to the underlying field, it has a special name: a linear functional). We leave it as an exercise to the reader to investigate linearity facts about the map resulting from fixing the first argument (hint: things get conjugated).

We call a vector space with an associated inner product an inner product space. Of course, the most natural example of an inner product space is any Euclidean space $ \mathbb{R}^n$ with the dot product. However, there are many other interesting inner products, including ones which involve matrix multiplication, integration, and random variables. As interesting as they may be (and though the results we develop here hold for them), we have no need for them at this point in time. We will stick entirely to inner product spaces embedded in $ \mathbb{R}^n$, and the standard dot product will suffice.

Now from any inner product we may induce a norm on $ V$, which is colloquially the “distance” of a vector from the origin.

Definition: A norm on $ V$ is a function $ \left \| \cdot \right \| : V \to \mathbb{R}$ which satisfies the following for all $ x,y \in V, c \in F$:

  • $ \left \| x \right \| \geq 0$, with equality if and only if $ x=0$
  • $ \left \| cx \right \| = |c| \left \| x \right \|$
  • $ \left \| x+y \right \| \leq \left \| x \right \| + \left \| y \right \|$ (the infamous triangle inequality)

If we recall the standard Euclidean norm, we see that it is just $ \left \| x \right \| = \sqrt{\left \langle x,x \right \rangle}$. Indeed, for any inner product this definition satisfies the axioms of a norm, and so it is a natural generalization of “distance” between points in any inner product space.

In particular, those readers familiar with topology (or at least metric space analysis), will immediately recognize that a norm induces a metric on $ V$, defined by $ d(x,y)= \left \| x-y \right \| $, where of course $ x-y$ is the vector between $ x$ and $ y$. Hence, every inner product space has a very rigid (metrized) topology and a fruitful geometry.

Additionally, any vector of norm 1 is called a unit vector, or a normal vector.

Now we look at vectors which have interesting relationships under inner products: specifically, when $ \left \langle x,y \right \rangle = 0$.

Orthogonality and Orthonormal Bases

Definition: Two vectors $ x,y \in V$ are orthogonal if $ \left \langle x,y \right \rangle = 0$. A set $ S$ of vectors is called orthogonal if the vectors in $ S$ are pairwise orthogonal.

Orthogonal vectors naturally generalize perpendicularity in Euclidean space. In particular, two vectors in $ \mathbb{R}^n$ are orthogonal if and only if the subspaces (lines) spanned by them are perpendicular.

The simplest examples of orthogonal vectors are the standard basis vectors $ (1,0, \dots, 0) \dots (0, \dots ,1)$, with respect to the usual dot product. Moreover, any nonzero scalar multiple $ \lambda e_i$ of a basis vector may replace $ e_i$ while still preserving the orthogonality of the set.

Orthogonality gives a new insight into the nature of inner products. Specifically, $ \left \langle x,y \right \rangle$ gives (almost) the length of the projection of $ x$ onto $ y$. In other words, $ \left \langle x,y \right \rangle$ is the component of $ x$ that points in the direction of $ y$ (scaled by the length of $ y$).

Now we may define projection maps to get “projection onto $ y$” more faithfully than a plain inner product:

Definition: The projection of $ x$ onto $ y$, denoted $ p_y(x)$, is the map $ V \to V$ defined by $ p_y(x) = \left \langle x, y \right \rangle \frac{y}{\left \| y \right \|^2}$.

The reader may easily verify that the vectors $ p_y(x)$ and $ x-p_y(x)$ are indeed orthogonal, though the computation is awkward. This will come in handy later when we want to build orthogonal sets of vectors.

In addition, we may obviously decompose any vector $ v$ into its two orthogonal components with respect to another vector $ w$ via this projection: $ v = (v-p_w(v)) + p_w(v)$.
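Here is a small sketch of the projection map and the orthogonal decomposition in Python (numpy), using the standard dot product.

```python
import numpy as np

def project(x, y):
    """Project x onto y: <x, y> * y / ||y||^2 (standard dot product)."""
    return np.dot(x, y) * y / np.dot(y, y)

x = np.array([3.0, 1.0, 2.0])
y = np.array([1.0, 1.0, 0.0])

p = project(x, y)
residual = x - p
print(np.dot(residual, y))   # ~0: the residual is orthogonal to y
print(p + residual)          # recovers x: the orthogonal decomposition
```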

In addition to orthogonality, the standard basis vectors have norm 1. We call such a basis for an inner product space an orthonormal basis (a portmanteau of orthogonal and normal). We commonly denote an orthonormal set of vectors $ (e_1, \dots, e_n)$, perhaps a ritualistic attempt to summon the power of the standard basis vectors.

Given an orthonormal basis $ (e_1, \dots, e_n)$ for an inner product space $ V$, we may decompose any vector into its basis representation rather easily:

$ \displaystyle v = p_{e_1}(v) + \dots + p_{e_n}(v) = \left \langle v,e_1 \right \rangle e_1 + \dots + \left \langle v,e_n \right \rangle e_n$

Since the norm of each $ e_i$ is 1, we may skip the division by the square norm of $ e_i$ in the projection maps. Now, recalling that every vector can be written uniquely as a linear combination of the basis vectors, we see that the inner products $ \left \langle v, e_i \right \rangle$ are precisely those coefficients. The recognition of this fact leads us to an important theorem, with a necessary preliminary definition:

Definition: Two inner product spaces $ V, W$ are isometric if there exists a linear isomorphism $ f:V \to W$ which preserves the inner products in the respective spaces. i.e., $ \left \langle x, y \right \rangle_V = \left \langle f(x), f(y) \right \rangle_W$ for all $ x,y \in V$.

Whereas a linear isomorphism between vector spaces is the mathematically rigorous way of saying the two vector spaces are structurally identical, an isometry additionally says that the two inner products are “the same.” Hence, isometric vector spaces have equivalent metrics, topologies, and geometries, with merely a different choice of basis and representation of the vectors. Now, just as every finite-dimensional vector space is isomorphic to $ \mathbb{R}^n$, we will soon see that every finite-dimensional inner product space is isometric to $ \mathbb{R}^n$ with the usual dot product.

Theorem: Any finite dimensional real inner product space $ V$ which has an orthonormal basis is isometric to $ \mathbb{R}^n$ with the usual dot product.

Proof: Define $ f: \mathbb{R}^n \to V$ by $ f((x_1, \dots, x_n)) = x_1e_1 + \dots + x_ne_n$. Now, by computation, we see that

$ \left \langle f((a_1, \dots , a_n)), f((b_1, \dots , b_n)) \right \rangle$
$ = \left \langle a_1e_1 + \dots + a_ne_n, b_1e_1 + \dots + b_ne_n \right \rangle$
$ = \sum \limits_{i=1}^n a_i \left \langle e_i, b_1e_1 + \dots + b_ne_n \right \rangle$
$ = \sum \limits_{i=1}^n \sum \limits_{j=1}^n a_i b_j \left \langle e_i, e_j \right \rangle$
$ = a_1b_1 + \dots + a_nb_n = \left \langle (a_1, \dots, a_n), (b_1, \dots, b_n) \right \rangle \square$

Now, all we must prove is that every finite-dimensional inner product space has an orthonormal basis.

Theorem (Gram-Schmidt): Every basis of a finite-dimensional inner product space may be transformed into an orthonormal basis.

Proof: We do so by construction, using the previously introduced projection maps $ p_y(x)$. Given a basis $ (x_1, \dots , x_n)$, we compute the following algorithm:

  1. $ \displaystyle e_1 = \frac{x_1}{\left \| x_1 \right \|}$
  2. For each $ i = 2, \dots , n$:
    1. Let $ z_i = \sum \limits_{j=1}^{i-1} p_{e_j}(x_i)$
    2. $ \displaystyle e_i = \frac{x_i - z_i}{\left \| x_i - z_i \right \|}$

The reader may verify computationally that this process produces orthonormal vectors, but the argument is better phrased geometrically: each $ z_i$ is the projection of some new vector onto the subspace generated by all previously computed orthonormal vectors. By subtracting $ x_i – z_i$, we take the part of $ x_i$ that is orthogonal to all vectors in that subspace. This is guaranteed to be nonzero because of the linear independence of our original list. And so, $ e_i$ is orthogonal to every vector preceding it in the algorithm (with indices $ j < i$). Finally, we normalize each $ e_i$ to make them unit vectors, and we are done. $ \square$
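Here is a minimal sketch of the algorithm in Python (numpy), assuming the input vectors are linearly independent and using the standard dot product; it also demonstrates the basis-representation formula from earlier.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors (standard dot product)."""
    basis = []
    for x in vectors:
        # Subtract the projections onto all previously computed (unit) basis vectors
        z = sum((np.dot(x, e) * e for e in basis), np.zeros_like(x))
        w = x - z
        basis.append(w / np.linalg.norm(w))
    return basis

vectors = [np.array([1.0, 1.0, 0.0]),
           np.array([1.0, 0.0, 1.0]),
           np.array([0.0, 1.0, 1.0])]
e = gram_schmidt(vectors)
print(np.round([[np.dot(a, b) for b in e] for a in e], 10))  # the identity matrix

# The coefficients of any vector in this basis are just inner products
v = np.array([2.0, -1.0, 3.0])
print(sum(np.dot(v, e_i) * e_i for e_i in e))  # recovers v
```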

We note that this algorithm takes $ O(n^3)$ time for vectors in $ \mathbb{R}^n$: there are $ O(n^2)$ projections to compute in total, and each projection requires an inner product costing $ O(n)$.

This result now proves that every real finite-dimensional inner product space is isometric to $ \mathbb{R}^n$. With this new insight, we may effectively do all our work in $ \mathbb{R}^n$ with the usual dot product, realizing that the results there hold for all relevant inner product spaces. In other words, our “simplification” at the beginning of this post, restricting our work to $ \mathbb{R}^n$, was not at all a simplification. Proving statements about $ \mathbb{R}^n$ gives us equivalent statements about all real finite-dimensional inner product spaces. Wonderful!

Bases of Eigenvectors

There is one more important topic we wish to discuss here: the importance of bases of eigenvectors. In particular, given a linear operator $ A: V \to V$, if one has a basis of eigenvectors for $ A$, then $ A$ has a diagonal representation.

In particular, if $ A$ has a basis of eigenvectors $ (v_1, \dots , v_n)$, then the expansion of $ Av_i$ in terms of the basis vectors is just $ \lambda_i v_i$, where $ \lambda_i$ is the corresponding eigenvalue. Thus, the matrix corresponding to $ A$ looks like:

$ \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}$

Here we count multiplicity of eigenvalues.

The existence of a diagonal representation for a matrix has interesting computational implications. For instance, we often wish to take high powers of a matrix, such as in counting paths in a graph, or working with graphical computations. Unfortunately, each successive (naive) matrix multiplication takes $ O(n^3)$ computations, so computing $ A^m$ this way takes $ O(mn^3)$ time. However, if the matrix has a diagonal representation, we may pay a one-time cost to convert the matrix to its diagonal form, take powers of the diagonal matrix by simply taking the powers of the diagonal entries, and then convert it back. Indeed, multiplying two diagonal matrices together is just as easy as multiplying the diagonal entries together, as the reader may verify. Since the conversions cost only a constant number of matrix multiplications, this optimization reduces the computation to $ O(n^3 + mn)$, a huge savings when $ m$ is very large.

Of course, then we are left with the problem of quickly computing eigenvectors. What worries us even more is that we might not have a basis of eigenvectors (some matrices don’t have enough eigenvectors to form one!). We instead take a slightly different route, which serves our purposes better. Specifically, we will be using this information to compute eigenvectors of symmetric matrices ($ A = A^{\textup{T}}$). For this, we refer to a grand theorem:

The Spectral Theorem: Every real symmetric matrix has an orthonormal basis consisting of eigenvectors.

The proof goes beyond the scope of this post (see: characteristic polynomials and self-adjoint operators), but it is very useful for us. In particular, by finding these eigenvectors we may both have a diagonal representation for our matrix, and also compute projections in a jiffy! We will see the astounding applications of this quite soon.
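To make the diagonal power trick concrete, here is a sketch in Python using numpy’s `eigh`, which returns the eigenvalues of a symmetric matrix along with an orthonormal basis of eigenvectors (so the change-of-basis matrix is orthogonal and its inverse is its transpose); the particular matrix is just a toy example.

```python
import numpy as np

# A symmetric matrix, so the spectral theorem guarantees an orthonormal eigenbasis
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
m = 10

eigenvalues, P = np.linalg.eigh(A)             # columns of P are orthonormal eigenvectors
A_power = P @ np.diag(eigenvalues ** m) @ P.T  # P D^m P^{-1}, and P^{-1} = P^T here
print(np.allclose(A_power, np.linalg.matrix_power(A, m)))  # True
```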

Even with two primers on linear algebra, we have still only scratched the surface of this wonderful subject. In the future we may continue this series of primers by discussing the linear algebra inherent in many optimization problems. Be sure to look out for it.

Until next time!