The Discrete Fourier Transform — A Primer

So here we are. We have finally made it to a place where we can transition with confidence from the classical continuous Fourier transform to the discrete version, which is the foundation for applications of Fourier analysis to programming. Indeed, we are quite close to unfurling the might of the Fast Fourier Transform algorithm, which efficiently computes the discrete Fourier transform. But because of its focus on algorithmic techniques, we will save it for a main content post and instead focus here on the intuitive connections between the discrete and continuous realms.

The goal has roughly three parts:

  1. Find a reasonable discrete approximation to a continuous function.
  2. Find a reasonable discrete approximation to the Fourier transform of a continuous function.
  3. Find a way to transition between the two discrete representations.

We should also note that there will be some notational clashes in the sequel. Rigorously, item 3 will result in an operation which we will denote by $ \mathscr{F}$, but which will be distinct from the classical Fourier transform on continuous functions. Indeed, they will have algebraic similarities, but one operates on generalized functions, and the other on finite sequences. We will keep the distinction clear from context to avoid adding superfluous notation. Moreover, we will avoid the rigor from the last post on tempered distributions. Instead we simply assume all functions are understood to be distributions and use the classical notation. Again, this is safe because our dabbling with the classical transform will be solely for intuition.

Of course, the point of this entire post is that all of the facts we proved about the continuous Fourier transform have direct translations into the discrete case. Up to a constant factor (sometimes) and up to notational conventions, the two theories will be identical. This is part of the beauty of the subject; sit back and enjoy it.

Sampling

There is a very nice theorem about classical Fourier transforms that has to do with reconstructing a function from a discrete sample of its points. Since we do not actually need this theorem for anything other than motivation, we will not prove it here. Instead, we’ll introduce the definitions needed to state it, and see why it motivates a good definition for the discrete “approximation” to a function. For a much more thorough treatment of the sampling theorem and the other issues we gloss over in this post, see these lecture notes.

Definition: A function $ f$ is time limited if it has bounded support. A function $ f$ is bandlimited if its Fourier transform has bounded support. That is, if there is some $ B$ so that the Fourier transform of $ f$ is only nonzero when $ |s|<B$. We call $ B$ the bandwidth of $ f$.

To be honest, before seeing a mathematical treatment of signal processing, this author had no idea what bandwidth actually referred to. It’s nice to finally solve those life mysteries.

In any case, it often occurs that one has a signal $ f$ for which one can only measure values, but one doesn’t have access to an exact description of $ f$. The sampling theorem allows us to reconstruct $ f$ exactly by choosing certain sample points. In a simplified form, the theorem is as follows:

Theorem: Suppose $ f$ is a function of bandwidth $ B$. Then one can reconstruct $ f$ exactly from the set of points $ (k/2B, f(k/2B))$ as $ k$ ranges over $ \mathbb{Z}$ (that is, from samples spaced $ 1/2B$ apart, a sampling rate of $ 2B$).

Unsurprisingly, the proof uses the Fourier transform in a nontrivial way. Moreover, there is a similar theorem for the Fourier transform $ \mathscr{F}f$, which can be reconstructed exactly from samples spaced $ 1/L$ apart, where $ |t| \leq L/2$ bounds the support of $ f$. Note that the sample spacing in one domain is determined by the limitations on the other domain.
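To make the theorem concrete, here is a minimal numerical sketch of the reconstruction via the Whittaker–Shannon interpolation formula $ f(t) = \sum_{k} f(k/2B) \textup{sinc}(2Bt - k)$, where $ \textup{sinc}(x) = \sin(\pi x)/(\pi x)$ (a function we will meet again later in this series). The test function, the truncation of the infinite sum, and the helper name `reconstruct` are all illustrative choices, not part of the theorem.

```python
import numpy as np

def reconstruct(samples, ks, B, t):
    """Whittaker-Shannon interpolation: rebuild f(t) from the samples
    f(k / (2B)) for the integers k in ks. Here np.sinc is the normalized
    sinc, np.sinc(x) = sin(pi x) / (pi x)."""
    return np.sum(samples * np.sinc(2 * B * t - ks))

# f(t) = sinc(t)^2 has Fourier transform supported on |s| <= 1, so B = 1.
B = 1.0
ks = np.arange(-500, 501)            # truncate the doubly infinite sum
samples = np.sinc(ks / (2 * B)) ** 2
for t in [0.3, 1.7]:
    print(reconstruct(samples, ks, B, t), np.sinc(t) ** 2)  # agree closely
```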

What’s more, if we are daring enough to claim $ f$ is both time limited and bandlimited (and we sample as little as possible in each domain), then the number of sample points is the same for both domains. In particular, suppose $ f(t)$ is only nonzero on $ 0 \leq t \leq L$ and its Fourier transform on $ 0 \leq s \leq 2B$. Note that since $ f$ is both time limited and bandlimited, there is no fault in shifting both so their smallest value is at the origin. If $ n, m$ are the respective numbers of sampled points in the time and frequency domains, then $ n/2B = L$ and $ m/L = 2B$, and so $ m = n = 2BL$.

Now this gives us a good reason to define the discrete approximation to a signal as

$ \displaystyle f_{\textup{sampled}} = (f(t_0), f(t_1), \dots, f(t_{n-1})),$

where $ t_k = k/2B$. This accomplishes our first task. Now, in order to determine what the discrete Fourier transform should look like, we can represent this discrete approximation as a distribution using shifted deltas:

$ \displaystyle f_{\textup{sampled}}(t) = \sum_{k=0}^{n-1}f(t_k)\delta(t-t_k)$

Here the deltas ensure the function is zero at points not among the $ t_k$, capturing the essence of the finite sequence above. Taking the Fourier transform of this (recalling that the Fourier transform of the shifted delta is a complex exponential and using linearity), we get

$ \displaystyle \mathscr{F}f_{\textup{sampled}}(s) = \sum_{k=0}^{n-1}f(t_k)e^{-2 \pi i s t_k}$

Denote the above function by $ F$. Now $ F$ is unfortunately still continuous, so we take the same approach we did for $ f$ and sample it in the frequency domain at the points $ s_j = j/L$. This gives the following list of values for the discrete approximation to the transformed version of $ f$:

$ \displaystyle F(s_0) = \sum_{k=0}^{n-1} f(t_k)e^{-2 \pi i s_0t_k}$

$ \displaystyle F(s_1) = \sum_{k=0}^{n-1} f(t_k)e^{-2 \pi i s_1t_k}$

$ \vdots$

$ \displaystyle F(s_{n-1}) = \sum_{k=0}^{n-1} f(t_k)e^{-2 \pi i s_{n-1}t_k}$

And now the rest falls together. We can denote by $ (\mathscr{F}f)_{\textup{sampled}}$ the tuple of values $ F(s_j)$, and the list of formulas above gives a way to transition from one domain to the other.

A Discrete Life

Now we move completely away from the continuous realm. The above discussion was nice in giving us an intuition for why the following definitions are reasonable. However, in order to actually use these ideas on discrete sequences, we can’t be restricted to sampling continuous functions. In fact, most of our planned work on this blog will not use discrete approximations to continuous functions, but just plain old sequences. So we just need to redefine the above ideas in a purely discrete realm.

The first step is to get rid of any idea of the sampling rate, and instead write everything in terms of the indices. This boils down to a convenient coincidence: if $ t_k = k/2B$ and $ s_j = j/L$, then since $ 2BL = n$ (the number of sample points taken), we have $ t_ks_j = kj/n$. This allows us to rewrite the complex exponentials as $ e^{-2 \pi i kj/n}$, which is entirely in terms of the number of points used and the indices of the summation.

In other words, we can finally boil all of this discussion down to the following definition. Before we get to it, we should note that we use the square bracket notation for lists. That is, if $ L$ is a list, then $ L[i]$ is the $ i$-th element of that list. Usually one would use subscripts to denote this, but we’d like to stay close to our notation for the continuous case, at least to make the parallels between the two theories more obvious.

Definition: Let $ f = (f[0], \dots, f[n-1])$ be a vector in $ \mathbb{C}^n$. Then the discrete Fourier transform of $ f$ is defined by the vector $ (\mathscr{F}f[0], \dots, \mathscr{F}f[n-1])$, where

$ \displaystyle \mathscr{F}f[j] = \sum_{k=0}^{n-1} f[k]e^{-2 \pi i jk/n}$

The phrase “discrete Fourier transform” is often abbreviated to DFT.
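Since the definition is so compact, it is worth transcribing it directly into code. The following is a deliberately naive Python sketch (the function name is our own choice); it computes the transform straight from the sum above.

```python
import cmath

def dft(f):
    """The discrete Fourier transform, straight from the definition:
    Ff[j] = sum over k of f[k] * exp(-2 pi i jk / n). Takes O(n^2) time."""
    n = len(f)
    return [sum(f[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

print(dft([1, 0, 0, 0]))  # [1, 1, 1, 1] up to rounding, as we will see below
```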

Now although we want to showcase the connections between the discrete and continuous Fourier transforms, we should note that their formal theories are completely independent. If one so desired (and many books follow this route) one could start with the discrete Fourier transform and later introduce the continuous version. We have the advantage of brevity here, in that we know what the theorems should say before we actually prove them. The point is that while the two transforms have connections and similarities, neither is logically required for the other, and the definition of the discrete Fourier transform can be given without any preface. Much as we claimed with the continuous Fourier transform, the discrete Fourier transform is simply an operation on lists of data (i.e., on vectors).

Pushing the discrete to the extreme, we can represent the list of complex exponentials as a discrete signal too.

Definition: The discrete complex exponential of period $ n$ is the list

$ \displaystyle \omega_n = (1, e^{2 \pi i/n}, \dots, e^{2 \pi i (n-1)/n}).$

We will omit the subscript $ n$ when it is understood (at least, for the rest of this primer). And hence $ \omega[k] = e^{2 \pi i k/n}$ and moreover $ \omega^m[k] = \omega^k[m] = e^{2\pi imk/n}$. If we denote by $ \omega^m$ the vector of entry-wise $ m$-th powers of $ \omega$, then we can write the discrete Fourier transform in its most succinct form as:

$ \displaystyle \mathscr{F}f[m] = \sum_{k=0}^{n-1} f[k]\omega^{-m}[k]$

or, suppressing the index $ k$ and recognizing the sum as the usual complex inner product on $ \mathbb{C}^n$ (which conjugates its second argument),

$ \displaystyle \mathscr{F}f[m] = \left \langle f, \omega^{m} \right \rangle.$

There are other ways to represent the discrete Fourier transform as an operation. In fact, since we know the classical Fourier transform is a linear operator, we should be able to come up with a matrix representation of the discrete transform. We will do just this, and as a result we will find the inverse discrete Fourier transform, but first we should look at some examples.

The Transforms of Discrete Signals

Perhaps the simplest possible signal one could think of is the delta function, whose discretized form is

$ \displaystyle \delta = (1,0, \dots, 0)$

with the corresponding shifted form

$ \displaystyle \delta_k = (0, 0, \dots 0, 1, 0, \dots 0)$

where the 1 appears in the $ k$-th spot. The reader should be convinced that this is indeed the right definition because its continuous cousin’s defining properties translate nicely. Specifically, $ \sum_{k}\delta_m[k] f[k] = f[m]$ for all $ m$, and $ \sum_{k}\delta[k] = 1$.

Now to find its Fourier transform, we simply use the definition:

$ \displaystyle \mathscr{F}\delta[m] = \sum_{k=0}^{n-1}\delta[k] \omega^{-m}[k]$

The only time that $ \delta[k]$ is nonzero is for $ k=0$, so this is simply the vector $ \delta[0] \omega^{0}$, which is the vector consisting entirely of 1’s. Hence, as in the continuous case (or at least, for the continuous definition of the 1 distribution),

$ \displaystyle \mathscr{F}\delta = 1$

In an identical manner, one can prove that $ \mathscr{F}\delta_k = \omega^{-k}$, just as it was for the continuous transform.

Now let’s do an example which deviates slightly from the continuous case. Recall that the continuous Fourier transform of the complex exponential was a delta function (to convince the reader, simply note that a single complex exponential oscillates at just one frequency). In the discrete case, we get something similar.

Looking at the $ m$-th entry of $ \mathscr{F}\omega^d$ and applying the definition, we get

$ \displaystyle \mathscr{F}\omega^d[m] = \sum_{k=0}^{n-1} \omega^d[k] \omega^{-m}[k]$

but because $ \omega^{-m}[k] = e^{-2 \pi i km/n} = \overline{\omega^m[k]}$, we see that the sum is just the usual complex inner product $ \left \langle \omega^d, \omega^m \right \rangle$. Further, the differing powers of $ \omega$ are orthogonal (hint: their inner product is a finite geometric series which sums to zero), so the inner product is nonzero only when $ d = m$, and in that case it is easy to show it is precisely $ n$. Summarizing,

$ \displaystyle \mathscr{F}\omega^d[m] = \begin{cases} n & \textup{if } m = d \\ 0 & \textup{otherwise} \end{cases}$

This is naturally $ n \delta_d$, so our final statement is just $ \mathscr{F}\omega^d = n\delta_d$. Note that here we have the extra factor of $ n$ floating around. The reason for this boils down to the fact that the norm of the complex exponential $ \omega$ is $ \sqrt{n}$, and not 1. That is, the powers of $ \omega$ do not form an orthonormal basis of $ \mathbb{C}^n$. There are various alternative definitions that try to compensate for this, but to the best of this author’s knowledge a factor of $ n$ always shows up in some of the resulting formulas. In other words, we should just accept this deviation from the continuous theory as collateral damage.
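Both facts are easy to verify numerically with the naive `dft` function from earlier; the choices $ n = 8$ and $ d = 3$ below are arbitrary.

```python
import cmath

n, d = 8, 3
delta = [1] + [0] * (n - 1)
omega_d = [cmath.exp(2j * cmath.pi * d * k / n) for k in range(n)]

print(dft(delta))    # every entry is 1 (up to rounding): F(delta) = 1
print(dft(omega_d))  # all entries near 0 except entry d, which is n = 8
```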

The mischievous presence of $ n$ also shows up in an interesting way in the inverse discrete Fourier transform.

The DFT and its Inverse, as a Matrix

It’s a trivial exercise to check by hand that the discrete Fourier transform is a linear operation on vectors, i.e., $ \mathscr{F}(av_1 + bv_2)[m] = a\mathscr{F}v_1[m] + b\mathscr{F}v_2[m]$ for all vectors $ v_1, v_2$, all scalars $ a, b$, and all $ m$. As we know from our primer on linear algebra, all such mappings are expressible as matrix multiplication by a fixed matrix.

The “compact” form in which we represented the discrete Fourier transform above sheds light on what this matrix must be. Specifically, the formula

$ \displaystyle \mathscr{F}f[m] = \sum_{k=0}^{n-1}f[k]\omega^{-m}[k]$

is basically just the definition of a matrix multiplication. Viewing $ f$ as the column vector

$ \displaystyle \begin{pmatrix} f[0]\\ f[1]\\ \vdots\\ f[n-1] \end{pmatrix}$

It is easy to see that the correct matrix to act on this vector is the one whose $ (m,k)$ entry is $ e^{-2\pi imk/n}$:

$ \displaystyle \mathscr{F} = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & e^{-2\pi i/n} & \cdots & e^{-2\pi i(n-1)/n} \\ \vdots & \vdots & & \vdots \\ 1 & e^{-2\pi i(n-1)/n} & \cdots & e^{-2\pi i(n-1)^2/n} \end{pmatrix}$

A perhaps easier way to digest this matrix is by viewing each row of the matrix as the vector $ \omega^{-m}$.

$ \displaystyle \mathscr{F} = \begin{pmatrix} \textbf{—} & \omega^0 & \textbf{—} \\ \textbf{—} & \omega^{-1} & \textbf{—} \\ & \vdots & \\ \textbf{—} & \omega^{-(n-1)} & \textbf{—} \end{pmatrix}$ 

Now let’s just admire this matrix for a moment. What started many primers ago as a calculus computation requiring infinite integrals and sometimes divergent answers is now trivial to compute. This is our first peek into how to compute discrete Fourier transforms in a program, but unfortunately we are well aware of the fact that computing this naively requires $ O(n^2)$ computations. Our future post on the Fast Fourier Transform algorithm will take advantage of the structure of this matrix to improve this to $ O(n \log n)$.

But for now, let’s find the inverse Fourier transform as a matrix. From the first of the two matrices above, it is easy to see that the matrix for $ \mathscr{F}$ is symmetric. Indeed, the roles of $ k,m$ are identical in $ e^{-2\pi i km/n}$. Furthermore, looking at the second matrix, we see that $ \mathscr{F}$ is almost unitary. Recall that a matrix $ A$ is unitary if $ AA^* = I$, where $ I$ is the identity matrix and $ A^*$ is the conjugate transpose of $ A$. For the case of $ \mathscr{F}$, we have that $ \mathscr{FF^*} = nI$. We encourage the reader to work this out by hand, noting that each entry in the matrix resulting from $ \mathscr{FF^*}$ is an inner product of powers of $ \omega$.

In other words, we have $ \mathscr{F}\cdot \frac{1}{n}\mathscr{F^*} = I$, so that $ \mathscr{F}^{-1} = \frac{1}{n}\mathscr{F^*}$. Since $ \mathscr{F}$ is symmetric, this simplifies to $ \frac{1}{n}\overline{\mathscr{F}}$.

Expanding this out to a summation, we get what we might have guessed was the inverse transform:

$ \displaystyle \mathscr{F}^{-1}f[m] = \frac{1}{n} \sum_{k=0}^{n-1} f[k]\omega^m[k]$.

The only difference between this formula and our intuition is the factor of $ 1/n$.
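These matrix identities are easy to check numerically. Here is a sketch with numpy; the helper name `dft_matrix` and the choice $ n = 8$ are our own.

```python
import numpy as np

def dft_matrix(n):
    """The n-by-n matrix whose (m, k) entry is e^{-2 pi i mk / n}."""
    mk = np.outer(np.arange(n), np.arange(n))
    return np.exp(-2j * np.pi * mk / n)

n = 8
F = dft_matrix(n)
print(np.allclose(F, F.T))                          # symmetric
print(np.allclose(F @ F.conj().T, n * np.eye(n)))   # F F* = nI, "almost unitary"
f = np.random.randn(n)
print(np.allclose(np.conj(F) @ (F @ f) / n, f))     # F^{-1} = (1/n) conj(F)
```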

Duality

The last thing we’d like to mention in this primer is that the wonderful results on duality for the continuous Fourier transform translate to the discrete case (again, with a factor of $ n$). We leave the details as an exercise to the reader, but we will set up the statement here.

In order to do this, we need some additional notation. We can think of $ \omega$ as an infinite sequence which repeats itself every $ n$ entries (since $ e^{2 \pi i} = 1$). Then we can “index” $ \omega$ higher than $ n$, and get the identity $ \omega^m[k] = \omega[mk]$. Similarly, we can “periodize” a discrete signal $ f$ so that $ f[m]$ is defined by $ f[m \mod n]$ and we allow $ m \in \mathbb{Z}$.

This periodization allows us to define $ f^-$ naturally as $ f^-[m] = f[-m \mod n]$. Then it becomes straightforward to check that $ \mathscr{F}f^- = (\mathscr{F}f)^-$, and as a corollary $ \mathscr{FF}f = nf^-$. This recovers some of our most powerful tools from the classical case for computing Fourier transforms of specific functions.
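For the reader who wants a numerical check of this corollary, here is a sketch reusing the `dft_matrix` helper from the previous section.

```python
import numpy as np

n = 8
F = dft_matrix(n)                   # the helper from the previous section
f = np.random.randn(n)
f_minus = f[(-np.arange(n)) % n]    # f^-[m] = f[-m mod n]
print(np.allclose(F @ (F @ f), n * f_minus))  # FFf = nf^-
```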

Next time (and this has been a long time coming) we’ll finally get to the computational meat of Fourier analysis. We’ll derive and implement the Fast Fourier Transform algorithm, which computes the discrete Fourier transform efficiently. Then we’ll take our first foray into playing with real signals, and move on to higher-dimensional transforms for image analysis.

Until then!

Generalized Functions — A Primer

Last time we investigated the naive (which I’ll henceforth call “classical”) notion of the Fourier transform and its inverse. While the development wasn’t quite rigorous, we nevertheless discovered elegant formulas and interesting properties that proved useful in at least solving differential equations. Of course, we wouldn’t be following this trail of mathematics if it didn’t result in some worthwhile applications to programming. While we’ll get there eventually, this primer will take us deeper down the rabbit hole of abstraction. We will develop the necessary framework required to reason about Fourier transforms in a mathematically rigorous manner. Most importantly, we will avoid the divergent integrals which, when we try to use them in an otherwise rigorous proof, make our stomachs heave.

It turns out the rigorous theory is wholly neat and tidy. The whole idea hinges on a part of linear algebra which is slightly more advanced than what we’ve seen so far on this blog, but it is by all means accessible to the reader who has mastered our relevant primers. And even though we still will overlook some of the more minute details of the theory, we will cover a nontrivial portion of it, and leave further exploration to the reader’s whim and discretion.

A Bit of Motivation

Every tidy mathematical theory deserves some kind of motivation, and the theory of Fourier transforms is no different. The primary motivating question, however, does not often change. That is, we want to ask, “Which functions make the classical Fourier transform as nice as possible?” and build our theory from that foundation. The tricky part is rigorously defining what it means to be “good” for the Fourier transform. One obvious condition we should require is that such a function have a classically-defined Fourier transform (that is, the integral defining its transform converges), but is that enough?

It turns out that this is not enough. Taking for granted the many years of mathematical development and genius that resulted in the correct conditions, we state the following criteria:

Criteria: A class of functions $ C$ is good for Fourier transforms if it satisfies the following two criteria:

  1. If $ f \in C$ then $ \mathscr{F}f \in C$ and $ \mathscr{F}^{-1}f \in C$.
  2. $ \mathscr{FF}^{-1}f = f = \mathscr{F}^{-1}\mathscr{F}f$ for all $ f \in C$.

Now if we can find a class of functions which satisfies these two criteria, then it should be a good candidate to base our formal theory of Fourier transforms on. But this raises the obvious question: what does one mean by “basing a theory” on a class of functions? It would be a waste of time to try to put it in less general non-mathematical terms, so let’s just slide right into it.

Generalized Functions

We’ll begin with the complete unabridged definition:

Definition: Given a space $ A$ of functions $ f: \mathbb{R} \to \mathbb{R}$, a generalized function on $ A$ is a continuous linear functional on $ A$.

Breaking the definition down, a linear functional is a function $ T: A \to \mathbb{C}$ which is linear. Explicitly, $ T$ operates on functions and outputs complex numbers in such a way that the following identity holds:

$ \displaystyle T(af + bg) = aTf + bTg$

for all $ a, b \in \mathbb{C}, f,g \in A$.

The requirements on the space $ A$ are a bit tricky to pin down, and the details begin to overwhelm the reader quite quickly. We will be content to say that $ A$ has some notion of distance which is complete. That is, sequences of functions which converge have their limit inside the space. Such a notion is required to talk about continuity.

With all that securely bolted down, we can finally say that a generalized function is just a continuous linear functional, i.e., a linear mapping $ T: A \to \mathbb{C}$ which satisfies

$ T(f_n) \to Tf$ whenever $ f_n \to f$ in $ A$.

Often generalized functions on $ A$ are simply called distributions on $ A$, but this author thinks that term clashes with probability distributions in an unnatural way. In fact, all probability distributions can be realized as generalized functions on some suitably chosen function space, so the name isn’t there without a reason. But for a student with no formal measure theory or probability theory under his belt, the coincidence of names can be confusing. We will stick to the term generalized function.

There is another interesting bit of notation that accompanies generalized functions, and that is to write a generalized function as if it were an inner product. i.e., instead of writing $ Tf$ for a linear functional $ T$ operating on a function $ f$, one writes $ \left \langle T, f \right \rangle$. We personally think this is silly, but we will bring it up later when we discuss the Fourier transform, because this choice of notation hints at a deep mathematical theorem. The interpretation usually given to physics students is that this notation represents a “pairing” between $ T$ and $ f$, and that the way you can tell a generalized function from a regular function is by the fact that it’s defined by how it “operates” with other functions.

The most prominent example of this is the Dirac delta function $ \delta$. When trying to define it, one might say ridiculous things like “it’s zero everywhere except at the origin, where it’s infinite,” and, “the integral from $ -\infty$ to $ \infty$ of the delta function is 1.” Both of these claims ludicrously defy classical analysis; the delta simply cannot be defined as a function. On the other hand, it has a very natural definition in terms of how it operates when “multiplied” by another function and then integrated. In our pairing notation, this definition is that $ \left \langle \delta, f \right \rangle = f(0)$.

To prove this works rigorously, let $ A$ be any space of functions for which convergence in $ A$ implies pointwise convergence, and consider the mapping $ \delta(f) = f(0)$. The linearity condition manifestly holds, and given a convergent sequence $ f_n \to f$, the sequence $ \delta(f_n) = f_n(0) \to f(0) = \delta(f)$, since we required pointwise convergence.

In fact, it will work in situations with weaker hypotheses (and for Fourier transforms the hypotheses certainly are weaker), but these details are beyond the scope of this primer (for instance, see this necessary and sufficient condition).

For the purpose of working with Fourier transforms, our generalized functions will be defined almost always by integration. That is, even though we have yet to figure out what $ A$ is, we want a generalized function $ T$ to act on a function using the usual kinds of inner products on function spaces like $ L^2$.

One large class of examples of generalized functions are those which are induced by regular functions. That is, we will be able to take any function $ g$ (discontinuous or wild as you like), and define the generalized function $ T_g$ by

$ \displaystyle T_g(f) = \int_{-\infty}^{\infty} g(x)f(x)dx$

Now, of course we require that $ f$ satisfy whatever conditions we impose on $ A$, and these, together with a mild growth restriction on $ g$ (at most polynomial growth turns out to suffice), will ensure that the integral always converges.

Now suppose our criterion for being “good for Fourier transforms” holds for $ A$, and that we have such a class where we can define linear functionals by integration. Then we can quite easily define the Fourier transform of a generalized function on $ A$.

There is a nice derivation here, and it goes like this: if $ T$ is a generalized function and it happens to be induced by a Fourier-transformable function $ g(x)$, then the Fourier transform of $ T$ is defined by

$ \displaystyle (\mathscr{F}T)(\varphi) = \int_{-\infty}^{\infty} \mathscr{F}g(x)\varphi(x)dx$

And since $ g$ is transformable, we can expand the classical definition as an integral, giving

$ \displaystyle \int_{-\infty}^{\infty} \left ( \int_{-\infty}^{\infty} e^{-2\pi ixy} g(y) dy \right ) \varphi(x) dx$

Swapping the order of integration we get

$ \displaystyle \int_{-\infty}^{\infty} \left ( \int_{-\infty}^{\infty} e^{-2 \pi ixy} \varphi(x) dx \right ) g(y)dy$

And the inner integral is simply $ \mathscr{F}\varphi$, giving

$ \displaystyle \int_{-\infty}^{\infty} \mathscr{F}\varphi(y) g(y) dy = T(\mathscr{F}\varphi)$

since the pairing with $ T$ is defined by integrating against $ g$.

But notice that the final form of our expression for $ (\mathscr{F}T)(\varphi)$ does not at all require that $ T$ is induced by $ g$. All that we require is that $ \varphi$ is transformable, and that is always true by our assumption that $ A$ satisfies our original criterion. This motivates the definition:

Definition: The Fourier transform of a generalized function $ T$ is defined by $ (\mathscr{F}T)(f) = T(\mathscr{F}f)$. Similarly, the inverse Fourier transform is defined by $ (\mathscr{F}^{-1}T)(f) = T(\mathscr{F}^{-1}f)$.

This is much neater, and in fact more general than a definition based on integration. The point of deriving it the way we did is so that we can have a theory which reduces to our classical notions given the right assumptions, but can be framed in other, perhaps unexpected contexts. Such is the beauty of mathematics.

Moreover, we note that the “pairing” notation hints at an interesting fact about Fourier transforms. In particular, the above definition says that $ \left \langle \mathscr{F}T,f \right \rangle = \left \langle T, \mathscr{F}f \right \rangle$. One familiar with the basics of finite-dimensional functional analysis will recognize this immediately as the condition for $ \mathscr{F}$ to be a self-adjoint operator. While we won’t discuss self-adjoint operators in this primer (we’ll save it for a future primer on functional analysis; we foresee this topic surfacing again when we cover support vector machines), we will note for the knowledgable reader that with a few additional conditions, this is precisely what is going on with the Fourier transform. However, since we’re dealing with an infinite-dimensional vector space, we can’t quite say it’s a diagonalizable matrix, but it is “multiplication by a constant” in a sense. The relationship is most evident when again $ T$ is induced by a  regular function, and then the pairing is given by integration after multiplication by the “constant” $ g$. Additionally, the fact that we represent a generalized function by the inner product notation hints at Riesz-type representation theorems, but we will press the issue no further here.

Sticking in this abstract land of an unknown $ A$ a little longer, we can reprove some of the basic facts about Fourier transforms in this general setting. For instance,

$ \displaystyle (\mathscr{F}\mathscr{F}^{-1}T)(f) = (\mathscr{F}^{-1}T)(\mathscr{F}f) = T(\mathscr{F}^{-1}\mathscr{F}f) = T(f)$

since the function $ f$ is required to satisfy the criteria. So in other words, $ \mathscr{F}\mathscr{F}^{-1}T = T$, and it is trivial to see the reverse composition yields the same.

A similarly easy proof recovers the identity that confused us last time, namely that $ (\mathscr{FF}T)(f) = T(f^-)$, where $ f^-(t) = f(-t)$. We leave it as an exercise to the reader.

Schwartz Functions, and Tempered Distributions

It’s time to forego the stalling and declare what the set of functions $ A$ should be to make this all work. This bit is generally considered the hard work and inspired genius of a man named Laurent Schwartz, and so they are appropriately called Schwartz functions.

Definition: The class $ S$ of rapidly decreasing functions on $ \mathbb{R}$, or Schwartz functions, is the set of smooth functions $ f : \mathbb{R} \to \mathbb{R}$ which satisfy the following condition for all $ m, n \in \mathbb{N}$:

$ \displaystyle |x|^n \left | \frac{d^m}{dx^m} f(x) \right | \to 0$ as $ x \to \pm \infty$

Recall, if forgotten, that smooth simply means infinitely differentiable. While we won’t get into the nitty gritty of proving the following facts (they’re quite analytic, and this author is an algebraist), we will state the important properties of $ S$.

First and foremost, $ S$ is a vector space which satisfies our criteria for being “good for Fourier transforms.” Another way of saying this is that the Fourier transform is a linear isomorphism on $ S$. Second, $ S$ includes all smooth functions with compact support; that is, all functions which are zero outside of some closed and bounded set. Moreover, $ S$ includes all Gaussian curves. Third, all functions in $ S$ are uniformly continuous. And finally, $ S \subset L^p$ for all $ p \geq 1$.

In other words, things are as nice as we could ever hope for them to be, with respect to taking Fourier transforms.  Things converge, transforms are defined, and things just work.

But of course, the real meat of the discussion comes when we analyze generalized functions on $ S$. When the class is specifically $ S$, these generalized functions are called tempered distributions. As we have already seen, the Dirac delta “function” is a tempered distribution. And with this new framework, we can start to compute the Fourier transforms of functions we couldn’t previously. For instance, the Fourier transform of the Dirac delta is the constant 1 function:

$ \displaystyle \mathscr{F}\delta(\varphi) = \delta(\mathscr{F}\varphi) = \mathscr{F}\varphi(0)$

But $ \mathscr{F}\varphi(0)$ is classically computable, and it’s just $ \int_{-\infty}^{\infty}\varphi(x)dx$, which is another way to say $ 1(\varphi)$, where 1 is understood to be the tempered distribution induced by the constant 1 function. We just showed that $ \mathscr{F}\delta(\varphi) = 1(\varphi)$ for all $ \varphi \in S$. In other words, $ \mathscr{F}\delta = 1$.

This has the nice interpretation that being infinitely concentrated in the time domain (as is the delta) corresponds to being infinitely spread out in the frequency domain. Similarly, being spread out infinitely in the time domain is equivalent to being concentrated at a single point in the frequency domain, as the reader will have no trouble proving that $ \mathscr{F}1 = \delta$. The eager reader will go ahead and find the Fourier transforms of the complex exponential $ e^{2\pi iax}$ and of $ \cos(2 \pi ax), \sin(2 \pi ax)$.

Operations on Tempered Distributions

Now that we have tempered distributions, it makes sense to start investigating the various operations we can perform on them. As we just saw, the Fourier transform is one of them, but it is useful to have a couple of others.

We should note that even some basic operations aren’t defined for generalized functions. For instance, with regular functions $ f,g$, we could compute the product $ fg$. This is not defined for all generalized functions. In fact, since the space of generalized functions is a vector space, the natural operations to apply are the linear ones. The Fourier transform counts as one, and so do addition and multiplication by a constant. Multiplication of two generalized functions, on the other hand, is not such an operation, and it has no definition in general.

On the other hand, if one restricts to tempered distributions, one can compute the product of a tempered distribution with a function in $ S$. The derivation of the definition follows the same philosophy that it did for the definition of the Fourier transform, and the computation is quite trivial. In fact, we get something slightly more general:

Definition: Given a function $ f$ such that $ f\varphi \in S$ for all $ \varphi \in S$ and a tempered distribution $ T$, we define $ fT$ by $ fT(\varphi) = T(f\varphi)$.

Continuing with the same derivation philosophy, we can define the derivative of a tempered distribution $ T$:

Definition: The derivative of a tempered distribution $ T$, denoted $ T’$, is defined by $ T'(\varphi) = -T(\varphi’)$.
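As with the Fourier transform, this definition is motivated by the case where $ T$ is induced by a differentiable function $ g$. Integrating by parts, and using the rapid decay of $ \varphi$ to kill the boundary term, we find

$ \displaystyle T_{g'}(\varphi) = \int_{-\infty}^{\infty} g'(x)\varphi(x)dx = \Big [ g(x)\varphi(x) \Big ]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} g(x)\varphi'(x)dx = -T_g(\varphi')$

so the definition above is the only one consistent with the regular case. In particular, the definition gives $ \delta'(\varphi) = -\varphi'(0)$.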

A special case of the multiplication definition occurs when $ T$ is the delta function: we have $ f\delta = f(0)\delta$ as tempered distributions. Moreover, from these definitions it is easy to reprove the classical identities $ \mathscr{F}T' = (2\pi is)\mathscr{F}T$, and $ (\mathscr{F}T)' = \mathscr{F}(-2\pi it T)$.

Finally, while the convolution of two tempered distributions is also undefined in general, some additional hypotheses allow it to work (see here), and then we can regain the theorem that $ \mathscr{F}(f * T) = (\mathscr{F}f)(\mathscr{F}T)$. Again, a special case of this is the delta function: $ f * \delta =f$ as tempered distributions.

This menagerie of properties all works toward reclaiming the theorems we proved about the classical Fourier transform. For the purpose of this blog, we will henceforth blur the distinction between the classical theory and this more complicated (and correct) theory of Fourier transforms. It’s a lame cop out, we admit, but it allows us to focus on the less pedantic aspects of the theory and applications. This stuff took decades to iron out, and for good reason!

So next time we will continue with discrete Fourier transforms and multidimensional Fourier transforms. Until then!

The Fourier Transform — A Primer

In our last primer we saw the Fourier series, which fleshed out the notion that a periodic function can be represented as an infinite series of sines and cosines. While this is fine and dandy, and quite a powerful tool, it does not suffice for the real world. In the real world, very little is truly periodic, especially since human measurements can only record a finite period of time. Even things we wish to explore on this blog are hardly periodic (for instance, image analysis). To compensate, we’ll develop the Fourier transform, which can be thought of as a limiting case of the Fourier series.

We will approach the Fourier transform from two different angles. First, we will take the “limiting case of the Fourier series” notion as far as it will take us (this will be quite far) to motivate the appropriate definitions. In this naive world, we will perform a few interesting computations, and establish the most useful properties of the Fourier transform as an operation. On the other hand, this naive world will be fraught with hand-waving and mathematical laxity. We will make statements that analysts (and people who know about the issues of convergence) would find uncouth.

And so we will redevelop the bulk of the theory from the ground up, and define the Fourier transform on a larger class of things called distributions. In doing so, we will circumvent (or rather, properly handle) all of the issues of convergence.

Fourier and Naivete

The Fourier transform is best thought of as an operation on functions which has some nice properties. One such property is linearity, and a more complex one is its effect on the convolution operation. But if one wants to know where the transform comes from, and this is an important thing to know, then we could interpret it as a generalization of the Fourier series coefficient. Similarly, its inverse can be thought of as a generalization of the Fourier series representation (or, the map taking a function to its Fourier series representation). The generalization we speak of, and the “limiting case” we mentioned earlier, is in the size of the period. In rough terms,

The Fourier transform is the limit of the Fourier coefficient as the period of the function tends to infinity.

This is how we will develop the definition of the Fourier transform, and the reader should understand why this is a sensible place to start: a function which has no period is simply a function which has an infinitely large period. Colloquially, one can “spot” periodicity much easier if the period is shorter, and as the period increases, functions start to “look” less and less periodic. So in the limiting case, there is no period.

In order to do this correctly, we should alter some of the definitions we made in our post on Fourier series. Specifically, we want to have an arbitrary period $ T$. So instead of making the 1-periodic complex exponential $ e^{2\pi i k t}$ our basic building block, we choose $ e^{2 \pi i t k/T}$. The reader will check that as $ k$ varies over all integers, these new complex exponentials still form an orthonormal basis of $ L^2$ (well not quite, we have to modify the inner product slightly; see below). Then, using the notation we used last time for the Fourier coefficients $ c_k$, the series is calculated as

$ \displaystyle \sum_{k = -\infty}^{\infty} c_k e^{\frac{2 \pi i k t}{T}}$

where the $ c_k$ are computed via the new inner product:

$ \displaystyle c_k = \frac{1}{T}\int_{0}^{T}e^{\frac{-2 \pi i k t}{T}} f(t)dt$

We make another slight alteration in the limits of integration:

$ \displaystyle c_k = \frac{1}{T} \int_{-T/2}^{T/2} e^{\frac{-2 \pi i k t}{T}} f(t)dt$

This doesn’t change the integral, since a  $ T$-periodic function has the same integral on any interval of length $ T$.

Before we continue, we should show an intuitive aspect of what happens when $ T \to \infty$. We can think of the usual Fourier series representation of a 1-periodic function $ f$ as a function on the integers whose values are the Fourier coefficients $ k \mapsto c_k$, since the coefficients completely determine the representation of $ f$.  The interval between the inputs to this mapping is 1, because the function is 1-periodic. If we generalize this to an arbitrary period $ T$, then we have functions whose inputs are multiples of $ 1/T$. So the intervals between adjacent inputs shrink to $ 1/T$. As $ T$ grows, the inputs are moving closer and closer together. In other words, the Fourier series representation is becoming a continuous mapping of the frequency! This viewpoint will motivate our seemingly magical choices to follow, and it partially justifies the common notion that the Fourier transform takes a function “in the time domain” and represents it “in the frequency domain.” It’s a stupid notion for mathematicians, but everyone says it anyway.

So if we try to take the limit immediately, that is, if we use the exact formula above for $ c_k$ and try to evaluate $ \lim_{T \to \infty} c_k$, we have issues. The problem rears its ugly head when we let $ f$ be a function with bounded support (that is, $ f(x)$ is zero everywhere except possibly on a finite interval). If $ f$ is zero outside of $ [a,b]$ and $ \int_a^b |f(x)|dx \leq M$ is finite, then once $ T$ is large enough that $ [a,b] \subset [-T/2, T/2]$, all of the Fourier coefficients go to zero as $ T \to \infty$. The details:

$ \displaystyle |c_k| = \left | \frac{1}{T}\int_{-T/2}^{T/2} e^{\frac{-2 \pi i kt}{T}} f(t)dt \right | \leq \frac{1}{T} \int_{-T/2}^{T/2} \left | e^{\frac{-2 \pi i k t}{T}} \right | |f(t)| dt$

But as the absolute value of the complex exponential is 1, we can bound this by

$ \displaystyle \frac{1}{T} \int_{-T/2}^{T/2} |f(t)|dt \leq \frac{M}{T}$,

and as $ T \to \infty$, we see that the whole thing goes to 0.

The solution is (magic!) to scale linearly by $ T$, and pull the balancing factor of $ 1/T$ outside of the coefficients $ c_k$, and into the Fourier series itself. In other words, our new Fourier series is (written with the terms rearranged for good reason)

$ \displaystyle \sum_{k = -\infty}^{\infty} e^{\frac{2 \pi i k t}{T}} c_k (1/T) \frac{1}{T}$

where the coefficients $ c_k(1/T)$ are

$ \displaystyle c_k(1/T) = \int_{-T/2}^{T/2}e^{\frac{-2 \pi i k t}{T}} f(t)dt$

Now we suddenly (magically!) realize that the first equation is just the usual Riemann sum from calculus for the estimate of an integral. The sample points are $ x_k = k/T$, spaced $ 1/T$ apart, and since the coefficient $ c_k(1/T)$ depends on $ k$ and $ T$ only through the ratio $ k/T$ (at least once $ T$ exceeds the support of $ f$), the summand has the form $ e^{2 \pi i t x_k} c(x_k) \frac{1}{T}$ for a fixed function $ c$. In particular, when the interval between the $ x_k$ goes to 0, the discrete sum converges to an integral by definition. Let $ s$ denote the continuous frequency variable which the points $ k/T$ approach in this limit. Then we redefine the two above equations with glorious new names:

Definition: The Fourier transform of a function $ f: \mathbb{R} \to \mathbb{C}$ is the integral

$ \displaystyle \mathscr{F}f(s) = \int_{-\infty}^{\infty}e^{-2 \pi i s t} f(t)dt$

whenever such an integral converges.

The inverse Fourier transform of $ g$ is the integral

$ \displaystyle \mathscr{F}^{-1}g(t) = \int_{-\infty}^{\infty} e^{2\pi i s t}g(s)ds$

whenever such an integral converges.

And so, the Fourier transform above generalizes the Fourier coefficient $ c_k$ (the limits of integration go to infinity), while the inverse transform generalizes the Fourier series reconstruction, by our conversion from a discrete sum to an integral.

We should note a few things about this definition, because it is quite a mouthful. First, $ \mathscr{F}$ and $ \mathscr{F}^{-1}$ operate on functions. In other words, they accept a function as input, and their values are functions. Still in other words, the parentheses are like $ (\mathscr{F}f)(s)$, and not like $ \mathscr{F}(f(s))$. We will often omit the parentheses with this implicitly understood precedence. This is also part of why we choose different variables. $ f(t)$ will often use a different variable than its transform $ \mathscr{F}f(s)$.

Second, returning to our remark about stupid notions, the function $ f(t)$ can be thought of as being in the “time domain,” where the inputs are instances of time, while the transformed function $ \mathscr{F}f$ is in the “frequency domain.” That is, for a given input $ s$, the Fourier transform $ \mathscr{F}f(s)$ describes how the complex exponential with frequency $ s$ contributes to the overall function $ f$. The set of values of the Fourier transform of $ f$ is called the spectrum of $ f$. One can visualize the spectrum, but only indirectly. A complex-valued function is always hard to visualize, so instead one graphs $ |\mathscr{F}f(s)|$ as a function on the real numbers. We then get pretty pictures like this one giving the spectrum of some human-spoken words:

This also explains the humorous comic at the beginning of this post: the thing saying “meow” is the spectrum of a cat, complete with whiskers. The comic also reinforces the idea that $ \mathscr{F}$ is simply an operation on functions. One does not need to restrict $ \mathscr{F}$ to operate on functions whose domain is time (indeed, a cat is not a function of time). It’s just an instance in which one can concretely interpret the transform of a function. For example, if one wanted to (and we will shortly), one could wonder about $ \mathscr{FF}f$, or even apply $ \mathscr{F}$ arbitrarily many times and see what happens under the limit. The same thing goes for the inverse Fourier transform.

The last big issue we have with the above definition is that it only makes sense when the integral actually converges. We will run into a few examples where this becomes a big problem, but we will sweep these issues under the rug for now (remember, this is still the land of naivete).

Nevertheless, we can do some wonderful computations with this mentality and this new definition. It will benefit us in the long run, because we’ll discover the useful properties of the Fourier transform now, and use those properties to steer the more rigorous formalities later.

Elementary Transforms and Elementary Properties

Armed with the mighty definition of the Fourier transform, we can take two paths. We can compute the transforms of various elementary functions, or we can develop tools to construct transforms of combinations of functions by computing the transforms of their constituent parts. This is largely the same approach one takes in studying derivatives and integrals in classical calculus: one learns to compute the derivatives of polynomials, logarithms, and exponentials; and then one learns to compute derivatives of products, sums, and compositions of functions. We will operate with the same philosophy for now.

Example: Let $ f = \chi_{[-1/2, 1/2]}$ be the characteristic function of the interval from -1/2 to 1/2. That is, it is the function which is 1 on that interval and zero everywhere else. We will show $ \mathscr{F}f = \frac{\sin(\pi s)}{\pi s}$ by appealing directly to the definition of the Fourier transform.

$ \displaystyle \mathscr{F}f(s) = \int_{-\infty}^{\infty} e^{-2 \pi i s t} \chi_{[-1/2, 1/2]}(t) dt$

Since $ f$ is zero outside of the chosen interval, and one inside, we can simplify the integral by adjusting the limits to -1/2 and 1/2, and inside simply using $ f(t) = 1$:

$ \displaystyle \mathscr{F}f(s) = \int_{-1/2}^{1/2}e^{-2 \pi ist} dt$

And this is quite tractable. Integrating the complex exponential as usual, we have:

$ \displaystyle \mathscr{F}f(s) = \left ( \frac{1}{-2 \pi is} e^{-2 \pi ist} \right ) \Big |_{-1/2}^{1/2} = \frac{e^{- \pi is} - e^{\pi is}}{-2 \pi is} = \frac{\sin(\pi s)}{\pi s}$

where the last equality follows from the classic identity $ e^{ix} = \cos(x) + i\sin(x)$. The result of this Fourier transform is so pervasive that it has its own name: $ \textup{sinc}(s) = \frac{\sin(\pi s)}{\pi s}$.
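For the skeptical reader, the transform is easy to approximate numerically with a Riemann sum; the grid resolution below is an arbitrary choice, and np.sinc is numpy’s normalized sinc, $ \sin(\pi s)/(\pi s)$.

```python
import numpy as np

t = np.linspace(-0.5, 0.5, 20001)   # the support of chi_[-1/2, 1/2]
dt = t[1] - t[0]
for s in [0.25, 1.0, 2.5]:
    approx = np.sum(np.exp(-2j * np.pi * s * t)) * dt  # Riemann sum
    print(approx.real, np.sinc(s))  # the two columns should agree closely
```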

Exercise: Let $ \Lambda(t)$ be the piecewise function defined as $ 1 – |t|$ if $ |t| < 1$, and zero otherwise. Prove that $ \mathscr{F}\Lambda(s) = \textup{sinc}^2(s) = \frac{\sin^2(\pi s)}{(\pi s)^2}$.

Again, this one follows straight from the definition, which must be computed piecewise to handle the absolute value.

Example: Let $ f$ be the Gaussian $ f(t) = e^{- \pi t^2}$. Then $ \mathscr{F}f(s) = f(s)$. That is, the Gaussian is fixed by the Fourier transform.

This is a very special property of the Gaussian, hinting at the special relationship between Fourier transforms and smoothness of curves. In order to prove it we need to borrow a fact from complex analysis, that $ \int_{-\infty}^{\infty} e^{- \pi t^2} dt = 1$. Note that here the indefinite integral of $ f$ cannot be expressed in elementary terms, so basic calculus tools are insufficient to prove this fact. A proof is most easily accessible using complex integration and residue theory, and Wikipedia provides a proof that does the same thing using a real parameterization to make it seem more elementary.

To find the Fourier transform, we again appeal to the definition, except this time we use some tricks. First, we differentiate the definition with respect to $ s$, and then integrate the result by parts to arrive at an ordinary differential equation, which we know how to solve. Set $ F(s) = \mathscr{F}f(s)$ for ease of notation.

$ \displaystyle F(s) = \int_{-\infty}^{\infty} e^{-2 \pi ist} e^{-\pi t^2}dt$

Differentiating with respect to $ s$, we have

$ \displaystyle F'(s) = \frac{d}{ds}\int_{-\infty}^{\infty} e^{-2 \pi ist}e^{-\pi t^2}dt = \int_{-\infty}^{\infty} \frac{d}{ds}(e^{-2 \pi ist}) e^{-\pi t^2} dt$

Performing the differentiation and regrouping terms we have

$ \displaystyle F'(s) = i \int_{-\infty}^{\infty}e^{-2\pi ist}(-2\pi t e^{-\pi t^2}) dt$

Now integrating by parts with respect to $ t$, and recognizing that the term $ e^{-2 \pi ist} e^{-\pi t^2}$ tends to zero as $ t \to \pm \infty$, we get

$ \displaystyle F'(s) = -i \int_{-\infty}^{\infty} e^{-\pi t^2}(-2\pi is)e^{-2 \pi ist} dt = -2\pi s F(s)$

As we claimed earlier, this is a simple ordinary differential equation, which has solution

$ \displaystyle F(s) = F(0)e^{-\pi s^2} = F(0)f(s)$

And here $ F(0) = \int_{-\infty}^{\infty} e^{-2 \pi i t 0}e^{-\pi t^2} dt = 1$, as we claimed from the beginning. This completes the proof, as $ F(s) = f(s)$.
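Those who distrust the differential equation trick can approximate the transform numerically. Truncating to $ |t| \leq 10$ is harmless because the Gaussian decays so rapidly; the grid choices below are otherwise arbitrary.

```python
import numpy as np

t = np.linspace(-10, 10, 200001)
dt = t[1] - t[0]
g = np.exp(-np.pi * t ** 2)
for s in [0.0, 0.5, 1.5]:
    approx = np.sum(np.exp(-2j * np.pi * s * t) * g) * dt  # Riemann sum
    print(approx.real, np.exp(-np.pi * s ** 2))  # the columns agree
```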

Next, we will focus on the rules for taking Fourier transforms of functions combined in various ways. First, we note that the Fourier transform is linear, in the very same sense as linear maps between vector spaces in elementary linear algebra. In particular, the linearity of the integral gives

$ \displaystyle \mathscr{F}(af + bg)(s) = a \mathscr{F}f(s) + b\mathscr{F}g(s)$

Other easy properties arise from modifying the input of $ f$, and using multiplicative properties of the complex exponential. For instance, if we let $ f_a(t) = f(t – a)$, we see that $ \mathscr{F}f_a(s) = e^{-2 \pi ias}\mathscr{F}f(s)$. This follows by a simple change of variables in the integral. Letting $ u = t-a$,

$ \displaystyle \mathscr{F}f_a(s) = \int_{-\infty}^{\infty} e^{-2\pi is(u+a)}f(u)du$

And we can trivially factor out the needed complex exponential coefficient, and work with $ u$ as usual. One convenient interpretation of this formula is that a shift in time corresponds to a phase shift in frequency. Once again, we caution that these interpretations are a tool; they can massively confuse the issue when, say, the domain of $ f$ is frequency to begin with.

Similar considerations give us a formula for the scaled function $ f(at)$ whose transform is $ \frac{1}{|a|} \mathscr{F}f(\frac{s}{a})$. We leave this as an exercise to the reader (hint: break it into two cases for when $ a<0, a>0$).

Next, we note that the Fourier transform turns derivatives of some functions into a very manageable product. Rigorously, if $ \lim_{t \to \pm \infty} f(t) = 0$ then

$ \displaystyle \mathscr{F}\frac{d^nf}{dt^n}(s) = (2\pi i s)^n \mathscr{F}f(s)$

We can prove this by induction. We’ll just prove the base case:

$ \displaystyle \mathscr{F}f'(s) = \int_{-\infty}^{\infty} e^{-2 \pi ist}f'(t)dt$

Integrating by parts, we get

$ \displaystyle \mathscr{F}f'(s) = e^{-2\pi ist}f(t) \Big |_{-\infty}^{\infty} - \int_{-\infty}^{\infty} (-2\pi is)e^{-2\pi ist}f(t)dt$

And by our decay hypothesis and the fact that the complex exponential has a constant norm, the first term (evaluated from $ -\infty$ to $ \infty$) tends to zero, leaving our desired product. The inductive step follows with the ease of iterated integration by parts. Note that although this example only holds for functions which tend to zero at $ \pm \infty$, next time we will rectify the situation by restricting our theory to functions which are “the best candidates” for the theory of Fourier analysis and eliminate the need for such hypotheses.
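As a numerical spot check of the base case, we can reuse the Gaussian from the previous section, whose derivative is $ f'(t) = -2\pi t e^{-\pi t^2}$ and whose transform is itself; again, the grid is an arbitrary choice.

```python
import numpy as np

t = np.linspace(-10, 10, 200001)
dt = t[1] - t[0]
fprime = -2 * np.pi * t * np.exp(-np.pi * t ** 2)
for s in [0.5, 1.0, 2.0]:
    lhs = np.sum(np.exp(-2j * np.pi * s * t) * fprime) * dt  # F f'(s)
    rhs = 2j * np.pi * s * np.exp(-np.pi * s ** 2)  # (2 pi i s) Ff(s), Ff = f
    print(abs(lhs - rhs))  # near zero
```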

The More Interesting Properties

The final two properties of the Fourier transform that we will inspect are in a sense deeper and more substantial than those above. In particular, we will establish the duality of the Fourier transform, and the effect of Fourier transforms on convolution.

First, the Fourier transform has a few notions of duality. Let $ f^-(t)$ denote $ f(-t)$. One such duality notion is the following, which is a trivial consequence of the definitions of the transform and its inverse:

$ \displaystyle (\mathscr{F}f)^- = \mathscr{F}^{-1}f$

Similarly, a minor change of variables shows that $ \mathscr{F}(f^-) = \mathscr{F}^{-1}f$. Chaining these together, we have the nice identity

$ \displaystyle \mathscr{F}(f^-) = (\mathscr{F}f)^-$

A simple corollary is that $ \mathscr{FF}f = f^-$. This allows us to compute the Fourier transforms of some potentially unmanageable functions. For instance, let us return to our friend the sinc function.

$ \displaystyle \mathscr{F}\textup{sinc}(s) = \mathscr{FF}(\chi_{[-1/2,1/2]}) = \chi^-_{[-1/2, 1/2]} = \chi_{[-1/2, 1/2]}$

by the symmetry of the characteristic function. On the other hand, it’s ridiculously counterintuitive that the following integral is actually the characteristic function of a finite interval:

$ \displaystyle \int_{-\infty}^{\infty} e^{-2 \pi ist} \frac{\sin(\pi t)}{\pi t} dt$

In fact, even though we just “proved” that the sinc function has a nice transform, it is hardly clear how to integrate it. Indeed, the sinc function is not even (Lebesgue) integrable! Without further qualifications, the above expression is complete nonsense.

Historically, this is the point at which the physicists contact the mathematicians and say, “We dun broked it!” Because the physicists went ahead and used these naively impossible transforms to do amazing things and discover elegant identities, the mathematicians are left to design a sensible theory to support their findings. The field is rife with such inconsistencies, and this is not the last one we will see before consolidating the theory. Perhaps this is in part because successful applications in engineering outpace mathematical rigor. Glibly, they’re racing for profit while the mathematicians want to architect a flawless foundation in which deep theorems are manifestly obvious.

Getting back to the properties of Fourier transforms, we have saved perhaps the most useful one for last. In short, Fourier transforms turn convolution into multiplication. Convolutions, both continuous and discrete, make cameo appearances all over applied mathematics, from signal processing and image analysis to quantum mechanics and mathematical finance. In our applications, we will use the following properties of convolution to modify the spectrum of a signal, for such purposes as removing noise, or filtering out low/high/mid frequency regions. Without further ado, the definition:

Definition: The convolution of $ f$ and $ g$, denoted $ f \ast g$, is the integral

$ \displaystyle (f \ast g)(x) = \int_{-\infty}^{\infty} g(x-y)f(y)dy,$

should such an integral converge. Otherwise the convolution is undefined.

Often convolution is interpreted as some sort of stretch+translation of a function by another, but we find such meager interpretations mathematically flaccid. Convolution is simply an operation that combines functions in an interesting way (perhaps its definition is motivated by the question below). Nevertheless, Wikipedia provides a number of relevant animations showing convolution in action.

So the leading question here is, what happens when one takes the product of $ (\mathscr{F}f)(\mathscr{F}g)$? From the definition, this is

$ \displaystyle \left ( \int_{-\infty}^{\infty}e^{-2\pi isx}f(x)dx \right ) \left ( \int_{-\infty}^{\infty} e^{-2 \pi ist} g(t) dt \right )$

We may combine the integrals into a double integral, and further combine the complex exponentials, getting

$ \displaystyle \int_{-\infty}^{\infty} \left ( \int_{-\infty}^{\infty} e^{-2 \pi is (t+x)}g(t) dt \right ) f(x) dx$

Substituting $ u = t+x$, we have

$ \displaystyle \int_{-\infty}^{\infty} \left ( \int_{-\infty}^{\infty} e^{-2\pi isu} g(u-x)du \right ) f(x) dx$

And swapping the order of integration,

$ \displaystyle \int_{-\infty}^{\infty} \left ( \int_{-\infty}^{\infty} g(u-x)f(x)dx \right ) e^{-2\pi isu} du = \mathscr{F}(g \ast f)$

(The parenthetical quantity drove our definition of the convolution to begin with.) And so we have the beautiful identity:

$ \mathscr{F}(f \ast g)(s) = \mathscr{F}f(s) \mathscr{F}g(s)$

We will use this as follows: multiply the Fourier transform of a signal by an appropriate characteristic function (the characteristic function of the set of “good” frequencies of, say, a sound clip) and then take the inverse transform of the product, getting as a result a modified signal with certain frequencies removed.
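As a parting numerical sketch, take $ f = g = \chi_{[-1/2,1/2]}$. Their convolution is the triangle function $ \Lambda$ from the exercise earlier, and both transforms are $ \textup{sinc}$, so we can compare $ \mathscr{F}(f \ast g)(s)$ against $ \textup{sinc}^2(s)$ directly; the grid is again an arbitrary choice.

```python
import numpy as np

t = np.linspace(-1, 1, 40001)            # the support of the triangle
dt = t[1] - t[0]
triangle = np.maximum(1 - np.abs(t), 0)  # chi * chi
for s in [0.3, 1.2, 2.0]:
    lhs = np.sum(np.exp(-2j * np.pi * s * t) * triangle) * dt
    print(lhs.real, np.sinc(s) ** 2)     # F(f * g) vs (Ff)(Fg)
```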

There are a few hurdles between here and there (at least, as far as this blog goes). First, we must compensate for our convergence naivete with mathematical rigor. Next time, we will define the class of Schwartz functions, from which we will derive a class of “generalized functions,” intuitively constituting the class of “transformable” functions. After that, we must needs find a suitable discrete approximation of the Fourier transform. In real life, all signals are sampled sequences of numbers. As such, we cannot take their integrals, and must convert these continuous notions to operations on sequences. Finally, we need to investigate an algorithm to efficiently compute such a discrete Fourier transform. Then, and only then, may we proceed with writing programs to do great things.

So look forward to all of the above in the coming weeks. Until next time!