The Fast Fourier Transform

John Tukey, one of the developers of the Cooley-Tukey FFT algorithm.

It’s often said that the Age of Information began on August 17, 1964 with the publication of Cooley and Tukey’s paper, “An Algorithm for the Machine Calculation of Complex Fourier Series.” They published a landmark algorithm which has since been called the Fast Fourier Transform algorithm, and has spawned countless variations. Specifically, it improved the best known computational bound on the discrete Fourier transform from O(n^2) to O(n \log n), which is the difference between uselessness and panacea.

Indeed, their work was revolutionary because so much of our current daily lives depends on efficient signal processing. Digital audio and video, graphics, mobile phones, radar and sonar, satellite transmissions, weather forecasting, economics and medicine all use the Fast Fourier Transform algorithm in a crucial way. (Not to mention that electronic circuits wouldn’t exist without Fourier analysis in general.) Before the Fast Fourier Transform algorithm was public knowledge, it simply wasn’t feasible to process digital signals.

Amusingly, Cooley and Tukey’s particular algorithm was known to Gauss around 1800 in a slightly different context; he simply didn’t find it interesting enough to publish, even though it predated the earliest work on Fourier analysis by Joseph Fourier himself.

In this post we will derive and implement a Fast Fourier Transform algorithm, and explore a (perhaps naive) application to audio processing. In particular, we will remove white noise from a sound clip by filtering the frequency spectrum of a noisy signal.

As usual, all of the resources used for this post are available on this blog’s Google Code page.

Derivation

It turns out that there are a number of different ways to speed up the naive Fourier Transform computation. As we saw in our primer on the discrete Fourier transform, the transform itself can be represented as a matrix. With a bit of nontrivial analysis, one can factor the Fourier transform matrix into a product of two sparse matrices (i.e., which contain mostly zeros), and from there the operations one can skip are evident. The intuitive reason why this should work is that the Fourier transform matrix is very structured, and the complex exponential has many useful properties. For more information on this method, see these lecture notes (p. 286).

We will take a different route which historically precedes the matrix factorization method. In fact, the derivation we’ll trace through below was Cooley and Tukey’s original algorithm. The method itself is a classical instance of a divide and conquer algorithm. Familiar programmers would recognize the ideas in common sorting algorithms: both mergesort and quicksort are divide and conquer algorithms.

The difficulty here is to split a list of numbers into two lists which are half in size in such a way that the Fourier transforms of the smaller pieces can be quickly combined to form the Fourier transform of the original list. Once we accomplish that, the rest of the algorithm falls out from recursion; further splitting each piece into two smaller pieces, we will eventually reach lists of length two or one (in which case the Fourier transform is completely trivial).

For now, we’ll assume that the length of the list is a power of 2. That is,

f = (f[1], f[2], \dots, f[n]), n = 2^k for some k.

We will also introduce the somewhat helpful notation for complex exponentials. Specifically, \omega[p,q] = e^{2\pi i q/p}. Here the p will represent the length of the list, and q will be related to the index. In particular, the complex exponential that usually shows up in the discrete Fourier transform (again, refer to our primer) is \omega[n, -km] = e^{-2 \pi i mk/n}. We write the discrete Fourier transform of f by specifying its m-th entry:

\displaystyle \mathscr{F}f[m] = \sum_{k=0}^{n-1} f[k]\omega[n, -km]

Now the trick here is to split up the sum into two lists each of half size. Of course, one could split it up in many different ways, but after we split it we need to be able to rewrite the pieces as discrete Fourier transforms themselves. It turns out that if we split it into the terms with even indices and the terms with odd indices, things work out nicely.

Specifically, we can write the sum above as the two sums below. The reader should study this for a moment (or better yet, try to figure out what it should be without looking) to ensure that the indices all line up. The notation grows thicker ahead, so it’s good practice.

\displaystyle \mathscr{F}f[m] = \sum_{k=0}^{\frac{n}{2} - 1} f[2k] \omega[n,-2km] + \sum_{k=0}^{\frac{n}{2} - 1} f[2k+1]\omega[n,-(2k+1)m].

Now we need to rewrite these pieces as Fourier transforms. Indeed, we must replace the occurrences of n in the complex exponentials with n/2. This is straightforward in the first summation, since

\omega[n, -2km] = e^{-2 \pi i km/n} = \omega[\frac{n}{2}, -km].

For the second summation, we observe that

\displaystyle \omega[n, -(2k+1)m] = \omega[n, -2km] \cdot \omega[n,-m].

Now if we factor out the \omega[n,-m], we can transform the second sum in the same way as the first, but with that additional factor out front. In other words, we now have

\displaystyle \mathscr{F}f[m] = \sum_{k=0}^{\frac{n}{2}-1} f[2k] \omega[n/2, -km] + \omega[n, -m] \sum_{k=0}^{\frac{n}{2}-1}f[2k+1] \omega[n/2, -km].

If we denote the list of even-indexed entries of f by f_{\textup{even}}, and vice versa for f_{\textup{odd}}, we see that this is just a combination of two Fourier transforms of length n/2. That is,

\displaystyle \mathscr{F}f = \mathscr{F}f_{\textup{even}} + \omega[n,-m] \mathscr{F}f_{\textup{odd}}

But we have a big problem here. Computer scientists will recognize this as a type error. The thing on the left hand side of the equation is a list of length n, while the thing on the right hand side is a sum of two lists of length n/2, and so it has length n/2. Certainly it is still true for values of m which are less than n/2; we broke no algebraic laws in the way we rewrote the sum (just in the use of the \mathscr{F} notation).

Recalling our primer on the discrete Fourier transform, we naturally extended the signals involved to be periodic. So indeed, the length-n/2 Fourier transforms satisfy the following identity for each m.

\mathscr{F}f_{\textup{even}}[m] = \mathscr{F}f_{\textup{even}}[m+n/2] \\ \mathscr{F}f_{\textup{odd}}[m] = \mathscr{F}f_{\textup{odd}}[m+n/2]

However, this does not mean we can use the same formula above! Indeed, for a length n Fourier transform, it is not true in general that \mathscr{F}f[m] = \mathscr{F}f[m+n/2]. But the correct formula is close to it. Plugging in m + n/2 to the summations above, we have

\displaystyle \mathscr{F}f[m+n/2] = \sum_{k=0}^{\frac{n}{2} - 1}f[2k] \omega[n/2, -(m+n/2)k] + \\ \omega[n, -(m+n/2)] \sum_{k=0}^{\frac{n}{2} - 1}f[2k+1] \omega[n/2, -(m+n/2)k]

Now we can use the easy-to-prove identity

\displaystyle \omega[n/2, -(m+n/2)k] = \omega[n/2, -mk] \omega[n/2, -kn/2]

And see that the right-hand-term is simply e^{2 \pi i k} = 1. Similarly, one can trivially prove the identity

\displaystyle \omega[n, -(m+n/2)] = \omega[n, -m] \omega[n, -n/2] = -\omega[n, -m],

this simplifying the massive formula above to the more familiar

\displaystyle \mathscr{F}f[m + n/2] = \sum_k f[2k]\omega[n/2, -mk] - \omega[n, -m] \sum_k f[2k+1] \omega[n/2, -mk]

Now finally, we have the Fast Fourier Transform algorithm expressed recursively as:

\displaystyle \mathscr{F}f[m] = \mathscr{F}f_{\textup{even}}[m] + \omega[n,-m] \mathscr{F}f_{\textup{odd}}[m]
\displaystyle \mathscr{F}f[m+n/2] = \mathscr{F}f_{\textup{even}}[m] - \omega[n,-m] \mathscr{F}f_{\textup{odd}}[m]

With the base case being \mathscr{F}([a]) = [a].

In Python

With all of that notation out of the way, the implementation is quite short. First, we should mention a few details about complex numbers in Python. Much to this author’s chagrin, Python represents i using the symbol 1j. That is, in python, a + bi is

a + b * 1j

Further, Python reserves a special library for complex numbers, the cmath library. So we implement the omega function above as follows.

import cmath

def omega(p, q):
   return cmath.exp((2.0 * cmath.pi * 1j * q) / p)

And then the Fast Fourier Transform algorithm is more or less a straightforward translation of the mathematics above:

def fft(signal):
   n = len(signal)
   if n == 1:
      return signal
   else:
      Feven = fft([signal[i] for i in xrange(0, n, 2)])
      Fodd = fft([signal[i] for i in xrange(1, n, 2)])

      combined = [0] * n
      for m in xrange(n/2):
         combined[m] = Feven[m] + omega(n, -m) * Fodd[m]
         combined[m + n/2] = Feven[m] - omega(n, -m) * Fodd[m]

      return combined

Here I use the awkward variable names “Feven” and “Fodd” to be consistent with the notation above. Note that this implementation is highly non-optimized. There are many ways to improve the code, most notably using a different language and cacheing the complex exponential computations. In any event, the above code is quite readable, and a capable programmer could easily speed this up by orders of magnitude.

Of course, we should test this function on at least some of the discrete signals we investigated in our primer. And indeed, it functions correctly on the unshifted delta signal

>>> fft([1,0,0,0])
[(1+0j), (1+0j), (1+0j), (1+0j)]
>>> fft([1,0,0,0,0,0,0,0])
[(1+0j), (1+0j), (1+0j), (1+0j), (1+0j), (1+0j), (1+0j), (1+0j)]

As well as the shifted verion (up to floating point roundoff error)

>>> fft([0,1,0,0])
[(1+0j), 
 (6.123233995736766e-17-1j), 
 (-1+0j), 
 (-6.123233995736766e-17+1j)]
>>> fft([0,1,0,0,0,0,0,0])
[(1+0j), 
 (0.7071067811865476-0.7071067811865475j), 
 (6.123233995736766e-17-1j), 
 (-0.7071067811865475-0.7071067811865476j), 
 (-1+0j), 
 (-0.7071067811865476+0.7071067811865475j), 
 (-6.123233995736766e-17+1j), 
 (0.7071067811865475+0.7071067811865476j)]

And testing it on various other shifts gives further correct outputs. Finally, we test the Fourier transform of the discrete complex exponential, which the reader will recall is a scaled delta.

>>> w = cmath.exp(2 * cmath.pi * 1j / 8)
>>> w
(0.7071067811865476+0.7071067811865475j)
>>> d = 4
>>> fft([w**(k*d) for k in range(8)])
[                           1.7763568394002505e-15j, 
 (-7.357910944937894e-16  + 1.7763568394002503e-15j), 
 (-1.7763568394002505e-15 + 1.7763568394002503e-15j), 
 (-4.28850477329429e-15   + 1.7763568394002499e-15j), 
 (8                       - 1.2434497875801753e-14j), 
 (4.28850477329429e-15    + 1.7763568394002505e-15j), 
 (1.7763568394002505e-15  + 1.7763568394002505e-15j), 
 (7.357910944937894e-16   + 1.7763568394002505e-15j)]

Note that n = 8 occurs in the position with index d=4, and all of the other values are negligibly close to zero, as expected.

So that’s great! It works! Unfortunately it’s not quite there in terms of usability. In particular, we require our signal to have length a power of two, and most signals don’t happen to be that long. It turns out that this is a highly nontrivial issue, and all implementations of a discrete Fourier transform algorithm have to compensate for it in some way.

The easiest solution is to simply add zeros to the end of the signal until it is long enough, and just work with everything in powers of two. This technique (called zero-padding) is only reasonable if the signal in question is actually zero outside of the range it’s sampled from. Otherwise, subtle and important things can happen. We’ll leave further investigations to the reader, but the gist of the idea is that the Fourier transform assumes periodicity of the data one gives it, and so padding with zeros imposes a kind of periodicity that is simply nonexistent in the actual signal.

Now, of course not every Fast Fourier transform uses zero-padding. Unfortunately the techniques to evaluate a non-power-of-two Fourier transform are relatively complex, and beyond the scope of this post (though not beyond the scope of this blog). The interested reader should investigate the so-called Chirp Z-transform, as discovered by Rader, et al. in 1968. We may cover it here in the future.

In any case, the code above is a good starting point for any technique, and as usual it is available on this blog’s Google code page. Finally, the inverse transform has a simple implementation, since it can be represented in terms of the forward transform (hint: remember duality!). We leave the code as an exercise to the reader.

Experiments with Sound — I am no Tree!

We’ll manipulate a clip of audio from Treebeard, a character from Lord of the Rings.

For the remainder of this post we’ll use a more established Fast Fourier Transform algorithm from the Python numpy library. The reasons for this are essentially convenience. Being implemented in C and brandishing the full might of the literature on Fourier transform algorithms, the numpy implementation is lightning fast. Now, note that the algorithm we implemented above is still correct (if one uses zero padding)! The skeptical reader may run the code to verify this. So we are justified in abandoning our implementation until we decide to seriously focus on improving its speed.

So let’s play with sound.

The sound clip we’ll be using is the following piece of dialog from the movie The Lord of the Rings: The Two Towers. We include the original wav file with the code on this blog’s Google code page.

If we take the Fourier transform of the sound sample, we get the following plot of the frequency spectrum. Recall, the frequency spectrum is the graph of the norm of the frequency values.

The frequency spectrum of an Ent speaking. The x-axis represents the index in the list (larger values correspond to larger frequencies), and the y-axis corresponds to intensity (larger values correspond to that frequency having a greater presence in the original signal). There is symmetry about the ~80,000 index, and we may consider the right half ‘negative’ frequencies because the Fourier Transform is periodic modulo its length.

Here we note that there is a symmetry to the graph. This is not a coincidence: if the input signal is real-valued, it will always be the case that the Fourier transform is symmetric about its center value. The reason for this goes back to our first primer on the Fourier series, in that the negative coefficients were complex conjugates of the positive ones. In any event, we only need concern ourselves with the first half of the values.

We will omit the details for doing file I/O with audio. The interested reader can inspect our source code, and they will find a very basic use of the scikits.audiolab library, and matplotlib for plotting the frequency spectrum.

Now, just for fun, let’s tinker with this audio bite. First we’ll generate some random noise in the signal. That is, let’s mutate the input signal as follows:

import random
inputSignal = [x/2.0 + random.random()*0.1 for x in inputSignal]

This gives us a sound bite with an annoying fuzziness in the background:

Next, we will use the Fourier transform to remove some of this noise. Of course, since the noise is random it inhabits all the frequencies to some degree, so we can’t eliminate it entirely. Furthermore, as the original audio clip uses some of the higher frequencies, we will necessarily lose some information in the process. But in the real world we don’t have access to the original signal, so we should clean it up as best we can without losing too much information.

To do so, we can plot the frequencies for the noisy signal:

Comparing this with our original graph (cheating, yes, but the alternative is to guess and check until we arrive at the same answer), we see that the noise begins to dominate the spectrum at around the 20,000th index. So, we’ll just zero out all of those frequencies.

def frequencyFilter(signal):
   for i in range(20000, len(signal)-20000):
      signal[i] = 0

And voila! The resulting audio clip, while admittedly damaged, has only a small trace of noise:

Inspecting the source code, one should note that at one point we halve the amplitude of the input signal (i.e., in the time domain). The reason for this is if one arbitrarily tinkers with the frequencies of the Fourier transform, it can alter the amplitude of the original signal. As one quickly discovers, playing the resulting signal as a wav file can create an unpleasant crackling noise. The amplitudes are clipped (or wrapped around) by the audio software or hardware for a reason which is entirely mysterious to this author.

In any case, this is clearly not the optimal way (or even a good way) to reduce white noise in a signal. There are certainly better methods, but for the sake of time we will save them for future posts. The point is that we were able to implement and use the Fast Fourier Transform algorithm to analyze the discrete Fourier transform of a real-world signal, and manipulate it in logical ways. That’s quite the milestone considering where we began!

Next up in this series we’ll investigate more techniques on sound processing and other one-dimensional signals, and we’ll also derive a multi-dimensional Fourier transform so that we can start working with images and other two-dimensional signals.

Until then!

About these ads

The Discrete Fourier Transform — A Primer

So here we are. We have finally made it to a place where we can transition with confidence from the classical continuous Fourier transform to the discrete version, which is the foundation for applications of Fourier analysis to programming. Indeed, we are quite close to unfurling the might of the Fast Fourier Transform algorithm, which efficiently computes the discrete Fourier transform. But because of its focus on algorithmic techniques, we will save it for a main content post and instead focus here on the intuitive connections between the discrete and continuous realms.

The goal has roughly three parts:

  1. Find a reasonable discrete approximation to a continuous function.
  2. Find a reasonable discrete approximation to the Fourier transform of a continuous function.
  3. Find a way to transition between the two discrete representations.

We should also note that there will be some notational clashes in the sequel. Rigorously, item 3 will result in an operation which we will denote by \mathscr{F}, but will be distinct from the classical Fourier transform on continuous functions. Indeed, they will have algebraic similarities, but one operates on generalized functions, and the other on finite sequences. We will keep the distinction clear from context to avoid adding superfluous notation. Moreover, we will avoid the rigor from the last post on tempered distributions. Instead we simply assume all functions are understood to be distributions and use the classical notation. Again, this is safe because our dabbling with the classical transform will be solely for intuition.

Of course, the point of this entire post is that all of the facts we proved about the continuous Fourier transform have direct translations into the discrete case. Up to a constant factor (sometimes) and up to notational conventions, the two theories will be identical. This is part of the beauty of the subject; sit back and enjoy it.

Sampling

There is a very nice theorem about classical Fourier transforms that has to do with reconstructing a function from a discrete sample of its points. Since we do not actually need this theorem for anything other than motivation, we will not prove it here. Instead, we’ll introduce the definitions needed to state it, and see why it motivates a good definition for the discrete “approximation” to a function. For a much more thorough treatment of the sampling theorem and the other issues we glaze over in this post, see these lecture notes.

Definition: A function f is time limited if it has bounded support. A function f is bandlimited if its Fourier transform has bounded support. That is, if there is some B so that the Fourier transform of f is only nonzero when |s|<B. We call B the bandwidth of f.

To be honest, before seeing a mathematical treatment of signal processing, this author had no idea what bandwidth actually referred to. It’s nice to finally solve those life mysteries.

In any case, it often occurs that one has a signal f for which one can only measure values, but one doesn’t have access to an exact description of f. The sampling theorem allows us to reconstruct f exactly by choosing certain sample points. In a simplified form, the theorem is as follows:

TheoremSuppose f is a function of bandwidth B. Then one can reconstruct f exactly from the set of points (k/2B, f(k/2B)) as k ranges over \mathbb{Z} (that is, the sampling rate is at least 1/2B).

Unsurprisingly, the proof uses the Fourier transform in a nontrivial way. Moreover, there is a similar theorem for the Fourier transform \mathscr{F}f, which can be reconstructed exactly from its samples if the sampling rate is at least 1/L, where L/2 bounds the support of f. Note that the sampling rate in one domain is determined by the limitations on the other domain.

What’s more, if we are daring enough to claim f is both time limited and bandlimited (and we sample as little as possible in each domain), then the number of sample points is the same for both domains. In particular, suppose f(t) is only nonzero on 0 \leq t \leq L and its Fourier transform on 0 \leq s \leq 2B. Note that since f is both timelimited and bandlimited, there is no fault in shifting both so their smallest value is at the origin. Then, if n, m are the respective numbers of sampled points in the time and frequency domains, then m/L = 2B, n/2B = L, and so m = n = 2BL.

Now this gives us a good reason to define the discrete approximation to a signal as

\displaystyle f_{\textup{sampled}} = (f(t_0), f(t_1), \dots, f(t_{n-1})),

where t_k = k/2B. This accomplishes our first task. Now, in order to determine what the discrete Fourier transform should look like, we can represent this discrete approximation as a distribution using shifted deltas:

\displaystyle f_{\textup{sampled}}(t) = \sum_{k=0}^{n-1}f(t_k)\delta(t-t_k)

Here the deltas ensure the function is zero at points not among the t_k, capturing the essence of the finite sequence above. Taking the Fourier transform of this (recalling that the Fourier transform of the shifted delta is a complex exponential and using linearity), we get

\displaystyle \mathscr{F}f_{\textup{sampled}}(s) = \sum_{k=0}^{n-1}f(t_k)e^{-2 \pi i s t_k}

Denote the above function by F. Now F is unfortunately still continuous, so we take the same approach we did for f and sample it via the samples in the frequency domain at s_j = j/L. This gives the following list of values for the discrete approximation to the transformed version of f:

\displaystyle F(s_0) = \sum_{k=0}^{n-1} f(t_k)e^{-2 \pi i s_0t_k}

\displaystyle F(s_1) = \sum_{k=0}^{n-1} f(t_k)e^{-2 \pi i s_1t_k}

\vdots

\displaystyle F(s_n) = \sum_{k=0}^{n-1} f(t_k)e^{-2 \pi i s_nt_k}

And now the rest falls together. We can denote by (\mathscr{F}f)_{\textup{sampled}} the tuple of values F(s_j), and the list of formulas above gives a way to transition from one domain to the other.

A Discrete Life

Now we move completely away from the continuous realm. The above discussion was nice in giving us an intuition for why the following definitions are reasonable. However, in order to actually use these ideas on discrete sequences, we can’t be restricted to sampling continuous functions. In fact, most of our planned work on this blog will not use discrete approximations to continuous functions, but just plain old sequences. So we just need to redefine the above ideas in a purely discrete realm.

The first step is to get rid of any idea of the sampling rate, and instead write everything in terms of the indices. This boils down to the a convenient coincidence. If t_k = k/2B, s_j = j/L, then by our earlier remarks that 2BL = n (the number of sample points taken), then t_ks_j = kj/n. This allows us to rewrite the complex exponentials as e^{-2 \pi i kj/n}, which is entirely in terms of the number of points used and the indices of the summation.

In other words, we can finally boil all of this discussion down to the following definition. Before we get to it, we should note that we use the square bracket notation for lists. That is, if L is a list, then L[i] is the i-th element of that list. Usually one would use subscripts to denote this, but we’d like to stay close to our notation for the continuous case, at least to make the parallels between the two theories more obvious.

DefinitionLet f = (f[1], \dots f[n]) be a vector in \mathbb{R}^n. Then the discrete Fourier transform of f is defined by the vector (\mathscr{F}f[1], \dots, \mathscr{F}f[n]), where

\displaystyle \mathscr{F}f[j] = \sum_{k=0}^{n-1} f[k]e^{-2 \pi i jk/n}

The phrase “discrete Fourier transform” is often abbreviated to DFT.

Now although we want to showcase the connections between the discrete and continuous Fourier transforms, we should note that they are completely disjoint. If one so desired (and many books follow this route) one could start with the discrete Fourier transform and later introduce the continuous version. We have the advantage of brevity here, in that we know what the theorems should say before we actually prove them. The point is that while the two transforms have connections and similarities, their theory is largely independent, and moreover the definition of the Fourier transform can be given without any preface. Much as we claimed with the continuous Fourier transform, the discrete Fourier transform is simply an operation on lists of data (i.e., on vectors).

Pushing the discrete to the extreme, we can represent the list of complex exponentials as a discrete signal too.

Definition: The discrete complex exponential of period n is the list

\displaystyle \omega_n = (1, e^{2 \pi i/n}, \dots, e^{2 \pi i (n-1)/n}).

We will omit the subscript n when it is understood (at least, for the rest of this primer). And hence \omega[k] = e^{2 \pi i k/n} and moreover \omega^m[k] = \omega^k[m] = e^{2\pi imk/n}. If we denote by \omega^n the vector of entry-wise powers of \omega, then we can write the discrete Fourier transform in its most succinct form as:

\displaystyle \mathscr{F}f[m] = \sum_{k=0}^{n-1} f[k]\omega^{-m}[k]

or, recognizing everything without indexing in “operator notation,”

\displaystyle \mathscr{F}f = \sum_{k=0}^{n-1}f\omega^{-m}.

There are other ways to represent the discrete Fourier transform as an operation. In fact, since we know the classical Fourier transform is a linear operator, we should be able to come up with a matrix representation of the discrete transform. We will do just this, and as a result we will find the inverse discrete Fourier transform, but first we should look at some examples.

The Transforms of Discrete Signals

Perhaps the simplest possible signal one could think of is the delta function, whose discretized form is

\displaystyle \delta = (1,0, \dots, 0)

with the corresponding shifted form

\displaystyle \delta_k = (0, 0, \dots 0, 1, 0, \dots 0)

where the 1 appears in the k-th spot. The reader should be convinced that this is indeed the right definition because it’s continuous cousin’s defining properties translate nicely. Specifically, \sum_{k}\delta_m[k] f[k] = f[m] for all m, and \sum_{k}\delta[k] = 1.

Now to find its Fourier transform, we simply use the definition:

\displaystyle \mathscr{F}\delta[m] = \sum_{k=0}^{n-1}\delta[k] \omega^{-m}[k]

The only time that \delta[k] is nonzero is for k=0, so this is simply the vector \delta[0] \omega^{0}, which is the vector consisting entirely of 1′s. Hence, as in the continuous case (or at least, for the continuous definition of the 1 distribution),

\displaystyle \mathscr{F}\delta = 1

In an identical manner, one can prove that \mathscr{F}\delta_k = \omega^{-k}, just as it was in for the continuous transform.

Now let’s do an example which deviates slightly from the continuous case. Recall that the continuous Fourier transform of the complex exponential was a delta function (to convince the reader, simply see that a single complex exponential can only have a single angular variable, and hence a singular frequency). In the discrete case, we get something similar.

\displaystyle \mathscr{F}\omega^{d} = \sum_{k=0}^{n-1} \omega^d\omega^{-m},

so looking at the m-th entry of this vector, we get

\displaystyle \mathscr{F}\omega^d[m] = \sum_{k=0}^{n-1} \omega^d[k] \omega^{-m}[k]

but because \omega^{-m}[k] = e^{-2 \pi i km/n} = \overline{\omega^m[k]}, we see that the sum is just the usual complex inner product \left \langle \omega^d, \omega^m \right \rangle. Further, as the differing powers of \omega are orthogonal (hint: their complex inner product forms a geometric series), we see that they’re only nonzero when d =m. In this case, it is easy to show that the inner product is precisely n. Summarizing,

This is naturally n \delta_d, so our final statement is just \mathscr{F}\omega^d = n\delta_d. Note that here we have the extra factor of n floating around. The reason for this boils down to the fact that the norm of the complex exponential \omega is \sqrt{n}, and not 1. That is, the powers of \omega do not form an orthonormal basis of \mathbb{C}^n. There are various alternative definitions that try to compensate for this, but to the best of this author’s knowledge a factor of n always shows up in some of the resulting formulas. In other words, we should just accept this deviation from the continuous theory as collateral damage.

The mischievous presence of n also shows up in an interesting way in the inverse discrete Fourier transform.

The DFT and its Inverse, as a Matrix

It’s a trivial exercise to check by hand that the discrete Fourier transform is a linear operation on vectors. i.e., \mathscr{F}(v_1 + v_2)[m] = \mathscr{F}v_1[m] + \mathscr{F}v_2[m] for all vectors v_1, v_2 and all m. As we know from our primer on linear algebra, all such mappings are expressible as matrix multiplication by a fixed matrix.

The “compact” form we represented the discrete Fourier transform above sheds light onto this question. Specifically, the formula

\displaystyle \mathscr{F}f[m] = \sum_{k=0}^{n-1}f[k]\omega^{-m}[k]

Is basically just the definition of a matrix multiplication. Viewing f as the column vector

\displaystyle \begin{pmatrix} f[0]\\ f[1]\\ \vdots\\ f[n] \end{pmatrix}

It is easy to see that the correct matrix to act on this vector is

A perhaps easier way to digest this matrix is by viewing each row of the matrix as the vector \omega^{-m}.

\displaystyle \mathscr{F} = \begin{pmatrix} \textbf{---} & \omega^0 & \textbf{---} \\ \textbf{---} & \omega^{-1} & \textbf{---} \\ & \vdots & \\ \textbf{---} & \omega^{-(n-1)} & \textbf{---} \end{pmatrix} 

Now let’s just admire this matrix for a moment. What started many primers ago as a calculus computation requiring infinite integrals and sometimes divergent answers is now trivial to compute. This is our first peek into how to compute discrete Fourier transforms in a program, but unfortunately we are well aware of the fact that computing this naively requires O(n^2) computations. Our future post on the Fast Fourier Transform algorithm will take advantage of the structure of this matrix to improve this by an order or magnitude to O(n \log n).

But for now, let’s find the inverse Fourier transform as a matrix. From the first of the two matrices above, it is easy to see that the matrix for \mathscr{F} is symmetric. Indeed, the roles of k,m are identical in e^{-2\pi i km/n}. Furthermore, looking at the second matrix, we see that \mathscr{F} is almost unitary. Recall that a matrix A is unitary if AA^* = I, where I is the identity matrix and A^* is the conjugate transpose of A. For the case of \mathscr{F}, we have that \mathscr{FF^*} = nI. We encourage the reader to work this out by hand, noting that each entry in the matrix resulting from \mathscr{FF^*} is an inner product of powers of \omega.

In other words, we have \mathscr{F}\cdot \frac{1}{n}\mathscr{F^*} = I, so that \mathscr{F}^{-1} = \frac{1}{n}\mathscr{F^*}. Since \mathscr{F} is symmetric, this simplifies to \frac{1}{n}\overline{\mathscr{F}}.

Expanding this out to a summation, we get what we might have guessed was the inverse transform:

\displaystyle \mathscr{F}^{-1}f[m] = \frac{1}{n} \sum_{k=0}^{n-1} f[k]\omega^m[k].

The only difference between this formula and our intuition is the factor of 1/n.

Duality

The last thing we’d like to mention in this primer is that the wonderful results on duality for the continuous Fourier transform translate to the discrete (again, with a factor of n). While we leave the details as an exercise to the reader,

In order to do this, we need some additional notation. We can think of \omega as an infinite sequence which would repeat itself every n entries (by Euler’s identity). Then we can “index” \omega higher than n, and get the identity \omega^m[k] = \omega[mk]. Similarly, we could “periodize” a discrete signal f so that f[m] is defined by f[m \mod n] and we allow m \in \mathbb{Z}.

This periodization allows us to define f^- naturally as f^-[m] = f[-m \mod n]. Then it becomes straightforward to check that \mathscr{F}f^- = (\mathscr{F}f)^-, and as a corollary \mathscr{FF}f = nf^-. This recovers some of our most powerful tools from the classical case for computing Fourier transforms of specific functions.

Next time (and this has been a long time coming) we’ll finally get to the computational meat of Fourier analysis. We’ll derive and implement the Fast Fourier Transform algorithm, which computes the discrete Fourier transform efficiently. Then we’ll take our first foray into playing with real signals, and move on to higher-dimensional transforms for image analysis.

Until then!

Generalized Functions — A Primer

Last time we investigated the naive (which I’ll henceforth call “classical”) notion of the Fourier transform and its inverse. While the development wasn’t quite rigorous, we nevertheless discovered elegant formulas and interesting properties that proved useful in at least solving differential equations. Of course, we wouldn’t be following this trail of mathematics if it didn’t result in some worthwhile applications to programming. While we’ll get there eventually, this primer will take us deeper down the rabbit hole of abstraction. We will develop the necessary framework required to reason about Fourier transforms in a mathematically rigorous manner. Most importantly, we will avoid the divergent integrals which, when we try to use them in an otherwise rigorous proof, make our stomachs heave.

It turns out the rigorous theory is wholly neat and tidy. The whole idea hinges on a part of linear algebra which is slightly more advanced than what we’ve seen so far on this blog, but it is by all means accessible to the reader who has mastered our relevant primers. And even though we still will overlook some of the more minute details of the theory, we will cover a nontrivial portion of it, and leave further exploration to the reader’s whim and discretion.

A Bit of Motivation

Every tidy mathematical theory deserves some kind of motivation, and the theory of Fourier transforms is no different. The primary motivating question, however, does not often change. That is, we want to ask, “Which functions make the classical Fourier transform as nice as possible?” and build our theory from that foundation. The tricky part is rigorously defining what it means to be “good” for the Fourier transform. One obvious condition we should require is that such a function have a classically-defined Fourier transform (that is, the integral defining its transform converges), but is that enough?

It turns out that this is not enough. Taking for granted the many years of mathematical development and genius that resulted in the correct conditions, we state the following criteria:

Criteria: A class of functions C is good for Fourier transforms if it satisfies the following two criteria:

  1. If f \in C then \mathscr{F}f \in C and \mathscr{F}^{-1}f \in C.
  2. \mathscr{FF}^{-1}f = f = \mathscr{F}^{-1}\mathscr{F}f for all f \in C.

Now if we can find a class of functions which satisfies these two criteria, then it should be a good candidate to base our formal theory of Fourier transforms on. But this raises the obvious question: what does one mean by “basing a theory” on a class of functions? It would be a waste of time to try to put it in less general non-mathematical terms, so let’s just slide right into it.

Generalized Functions

We’ll begin with the complete unabridged definition:

Definition: Given a space A of functions f: \mathbb{R} \to \mathbb{R}, a generalized function on A is a continuous linear functional on A.

Breaking the definition down, a linear functional is a function A \to \mathbb{R} which is linear. Explicitly, if T is a linear functional, then T operates on functions, and outputs complex numbers in a way that the following identity holds:

\displaystyle T(af + bg) = aTf + bTg

for all a, b \in \mathbb{C}, f,g \in A.

The requirements on the space A are a bit tricky to pin down, and the details begin to overwhelm the reader quite quickly. We will be content to say that A has some notion of distance which is complete. That is, sequences of functions which converge have their limit inside the space. Such a notion is required to talk about continuity.

With all that securely bolted down, we can finally say that a generalized function is just a continuous linear functional, i.e., a linear mapping T: A \to \mathbb{C} which satisfies

T(f_n) \to Tf whenever f_n \to f in A.

Often generalized functions on A are simply called distributions on A, but this author thinks that term clashes with probability distributions in an unnatural way. In fact, all probability distributions can be realized as generalized functions on some suitably chosen function space, so the name isn’t there without a reason. But for a student with no formal measure theory or probability theory under his belt, the coincidence of names can be confusing. We will stick to the term generalized function.

There is another interesting bit of notation that accompanies generalized functions, and that is to write a generalized function as if it were an inner product. i.e., instead of writing Tf for a linear functional T operating on a function f, one writes \left \langle T, f \right \rangle. We personally think this is silly, but we will bring it up later when we discuss the Fourier transform, because this choice of notation hints at a deep mathematical theorem. The interpretation usually given to physics students is that this notation represents a “pairing” between T and f, and that the way you can tell a generalized function from a regular function is by the fact that it’s defined by how it “operates” with other functions.

The most prominent example of this is the Dirac delta function \delta. When trying to define it, one might say ridiculous things like “it’s zero everywhere except at the origin, where it’s infinite,” and, “the integral from -\infty to \infty of the delta function is 1.” Both of these claims ludicrously defy classical analysis; the delta simply cannot be defined as a function. On the other hand, it has a very natural definition in terms of how it operates when “multiplied” by another function and then integrated. In our pairing notation, this definition is that \left \langle \delta, f \right \rangle = f(0).

To prove this works rigorously, let A be any space of functions for which convergence in A implies pointwise convergence. Consider the mapping \delta(f) = f(0) extended linearly from any basis. The linearity condition is manifestly held, and given a convergent sequence f_n \to f, the sequence \delta(f_n) = f_n(0) \to f(0), since we required pointwise convergence.

In fact, it will work in situations with weaker hypotheses (and for Fourier transforms the hypotheses certainly are weaker), but these details are beyond the scope of this primer (for instance, see this necessary and sufficient condition).

For the purpose of working with Fourier transforms, our generalized functions will be defined almost always by integration. That is, even though we have yet to figure out what A is, we want a generalized function T to act on a function using the usual kinds of inner products on function spaces like L^2.

One large class of examples of generalized functions are those which are induced by regular functions. That is, we will be able to take any function g (discontinuous or wild as you like), and define the generalized function T_g by

\displaystyle T_g(f) = \int_{-\infty}^{\infty} g(x)f(x)dx

Now, of course we require that f satisfy whatever conditions we impose on A, and these will ensure that the integral always converges, no matter what g is.

Now supposing our criteria for being “good for Fourier transforms” holds for A, and we have such a class where we can define linear functionals by integration. Then we can define quite easily the Fourier transform of a generalized function on A.

There is a nice derivation here, and it goes like this: if T is a generalized function and it happens to be induced by a Fourier-transformable function g(x), then the Fourier transform of T is defined by

\displaystyle (\mathscr{F}T)(\varphi) = \int_{-\infty}^{\infty} \mathscr{F}g(x)\varphi(x)dx

And since g is transformable, we can expand the classical definition as an integral, giving

\displaystyle \int_{-\infty}^{\infty} \left ( \int_{-\infty}^{\infty} e^{-2\pi ixy} g(y) dy \right ) \varphi(x) dx

Swapping the order of integration we get

\displaystyle \int_{-\infty}^{\infty} \left ( \int_{-\infty}^{\infty} e^{-2 \pi ixy} \varphi(x) dx \right ) g(y)dy

And the inner integral is simply \mathscr{F}\varphi, giving

\displaystyle \int_{-\infty}^{\infty} \mathscr{F}\varphi(y) g(y) dy = T(\mathscr{F}\varphi)

Since the pairing of T is defined by integrating with a product of g.

But notice that the final form of our expression for (\mathscr{F}T)(\varphi) does not at all require that T is induced by g. All that we require is that \varphi is transformable, and that is always true by our assumption that A satisfies our original criterion. This motivates the definition:

DefinitionThe Fourier transform of a generalized function T is defined by (\mathscr{F}T)(f) = T(\mathscr{F}f). Similarly, the inverse Fourier transform is defined by (\mathscr{F}^{-1}T)(f) = T(\mathscr{F}^{-1}f).

This is much neater, and in fact more general than a definition based on integration. The point of deriving it the way we did is so that we can have a theory which reduces to our classical notions given the right assumptions, but can be framed in other, perhaps unexpected contexts. Such is the beauty of mathematics.

Moreover, we note that the “pairing” notation hints at an interesting fact about Fourier transforms. In particular, the above definition says that \left \langle \mathscr{F}T,f \right \rangle = \left \langle T, \mathscr{F}f \right \rangle. One familiar with the basics of finite-dimensional functional analysis will recognize this immediately as the condition for \mathscr{F} to be a self-adjoint operator. While we won’t discuss self-adjoint operators in this primer (we’ll save it for a future primer on functional analysis; we foresee this topic surfacing again when we cover support vector machines), we will note for the knowledgable reader that with a few additional conditions, this is precisely what is going on with the Fourier transform. However, since we’re dealing with an infinite-dimensional vector space, we can’t quite say it’s a diagonalizable matrix, but it is “multiplication by a constant” in a sense. The relationship is most evident when again T is induced by a  regular function, and then the pairing is given by integration after multiplication by the “constant” g. Additionally, the fact that we represent a generalized function by the inner product notation hints at Riesz-type representation theorems, but we will press the issue no further here.

Sticking in this abstract land of an unknown A a little longer, we can reprove some of the basic facts about Fourier transforms in this general setting. For instance,

\displaystyle (\mathscr{F}\mathscr{F}^{-1}T)(f) = (\mathscr{F}^{-1}T)(\mathscr{F}f) = T(\mathscr{F}^{-1}\mathscr{F}f) = T(f)

since the function f is required to satisfy the criteria. So in other words, \mathscr{F}\mathscr{F}^{-1}T = T, and it is trivial to see the reverse composition yields the same.

A similarly easy proof recovers the identity that confused us last time, namely that (\mathscr{FF}T)(f) = -Tf. We leave it as an exercise to the reader

Schwartz Functions, and Tempered Distributions

It’s time to forego the stalling and declare what the set of functions A should be to make this all work. This bit is generally considered the hard work and inspired genius of a man named Laurent Schwartz, and so they are appropriately called Schwartz functions.

Definition: The class S of rapidly decreasing functions on \mathbb{R}, or Schwartz functions, is the set of the smooth functions f : \mathbb{R} \to \mathbb{R} which satisfy the following condition for all m, n \in \mathbb{N}:

\displaystyle |x|^n \left | \frac{d^m}{dx^m} f(x) \right | \to 0 as x \to \pm \infty

Recall, if forgotten, that smooth simply means infinitely differentiable. While we won’t get into the nitty gritty of proving the following facts (they’re quite analytic, and this author is an algebraist), we will state the important properties of S.

First and foremost, S is a vector space which satisfies our criteria for being “good for Fourier transforms.” Another way of saying this is that the Fourier transform is a linear isomorphism on S. Second, S includes all smooth functions with compact support; that is, it includes all functions which are nonzero except on a closed and bounded set. Moreover, S includes all Gaussian curves. Third, all functions in S are uniformly continuous. And finally, S \subset L^p for all p \geq 1.

In other words, things are as nice as we could ever hope for them to be, with respect to taking Fourier transforms.  Things converge, transforms are defined, and things just work.

But of course, the real meat of the discussion comes when we analyze generalized functions on S. When the class is specifically S, these generalized functions are called tempered distributions. As we have already seen, the Dirac delta “function” is a tempered distribution. And with this new framework, we can start to compute the Fourier transforms of functions we couldn’t previously. For instance, the Fourier transform of the Dirac delta is the constant 1 function:

\displaystyle \mathscr{F}\delta(\varphi) = \delta(\mathscr{F}\varphi) = \mathscr{F}\varphi(0)

But \mathscr{F}\varphi(0) is classically computable, and it’s just \int_{-\infty}^{\infty}\varphi(x)dx, which is another way to say 1(\varphi), where 1 is understood to be the tempered distribution induced by the constant 1 function. We just showed that \mathscr{F}\delta(\varphi) = 1(\varphi) for all \varphi \in S. In other words, \mathscr{F}\delta = 1.

This has the nice interpretation that being infinitely concentrated in the time domain (as is the delta) corresponds to being infinitely spread out in the frequency domain. Similarly, being spread out infinitely in the time domain is equivalent to being concentrated at a single point in the frequency domain, as the reader will have no trouble proving that \mathscr{F}1 = -\delta. The eager reader will go ahead and find the Fourier transforms of the complex exponential e^{2\pi iax} and \cos(2 \pi i ax), \sin(2 \pi i ax).

Operations on Tempered Distributions

Now that we have tempered distributions, it makes sense to start investigating the various operations we can perform on them. As we just saw, the Fourier transform is one of them, but it is useful to have a couple of others.

We should note that even some basic operations aren’t defined for generalized functions. For instance, with regular functions f,g, we could compute the product fg. This is not defined for all generalized functions. In fact, since the space of generalized functions is a vector space, the only kinds of operations we can apply are linear ones. The Fourier transform counts as one, and so does addition and multiplication by a constant. Multiplication is not a linear operation.

On the other hand, if one restricts to tempered distributions, one can compute the product of a tempered distribution with a function in S. The derivation of the definition follows the same philosophy that it did for the definition of the Fourier transform, and the computation is quite trivial. In fact, we get something slightly more general:

Definition: Given a function f such that f\varphi \in S for all \varphi \in S and a tempered distribution T, we define fT by fT(\varphi) = T(f\varphi).

Continuing with the same derivation philosophy, we can define the derivative of a tempered distribution T:

Definition: The derivative of a tempered distribution T, denoted T', is defined by T'(\varphi) = -T(\varphi').

A special case of this occurs when T is the delta function; we have f\delta = f(0)\delta as tempered distributions. Moreover, from this definition it is easy to reprove the classical identities \mathscr{F}T' = (2\pi is)\mathscr{F}T, and (\mathscr{F}T)' = \mathscr{F}(-2\pi it T).

Finally, while the convolution of two tempered distributions is also undefined in general, some additional hypotheses allow it to work (see here), and then we can regain the theorem that \mathscr{F}(f * T) = (\mathscr{F}f)(\mathscr{F}T). Again, a special case of this is the delta function: f * \delta =f as tempered distributions.

This menagerie of properties all works toward reclaiming the theorems we proved about the classical Fourier transform. For the purpose of this blog, we will henceforth blur the distinction between the classical theory and this more complicated (and correct) theory of Fourier transforms. It’s a lame cop out, we admit, but it allows us to focus on the less pedantic aspects of the theory and applications. This stuff took decades to iron out, and for good reason!

So next time we will continue with discrete Fourier transforms and multidimensional Fourier transforms. Until then!

The Fourier Transform — A Primer

In our last primer we saw the Fourier series, which flushed out the notion that a periodic function can be represented as an infinite series of sines and cosines. While this is fine and dandy, and quite a powerful tool, it does not suffice for the real world. In the real world, very little is truly periodic, especially since human measurements can only record a finite period of time. Even things we wish to explore on this humble blog are hardly periodic (for instance, image analysis). In order to compensate for this, we will develop the Fourier transform, which can be thought of as a limiting case of the Fourier series.

We will approach the Fourier transform from two different angles. First, we will take the “limiting case of the Fourier series” notion as far as it will take us (this will be quite far) to motivate the appropriate definitions. In this naive world, we will perform a few interesting computations, and establish the most useful properties of the Fourier transform as an operation. On the other hand, this naive world will be fraught with hand-waving and mathematical laxity. We will make statements that analysts (and people who know about the issues of convergence) would find uncouth.

And so we will redevelop the bulk of the theory from the ground up, and define the Fourier transform on a larger class of things called distributions. In doing so, we will circumvent (or rather, properly handle) all of the issues of convergence.

Fourier and Naivete

The Fourier transform is best thought of as an operation on functions which has some nice properties. One such property is linearity, and a more complex one is its effect on the convolution operation. But if one wants to know where the transform comes from, and this is an important thing to know, then we could interpret it as a generalization of the Fourier series coefficient. Similarly, its inverse can be thought of as a generalization of the Fourier series representation (or, the map taking a function to its Fourier series representation). The generalization we speak of, and the “limiting case” we mentioned earlier, is in the size of the period. In rough terms,

The Fourier transform is the limit of the Fourier coefficient as the period of the function tends to infinity.

This is how we will develop the definition of the Fourier transform, and the reader should understand why this is a sensible place to start: a function which has no period is simply a function which has an infinitely large period. Colloquially, one can “spot” periodicity much easier if the period is shorter, and as the period increases, functions start to “look” less and less periodic. So in the limiting case, there is no period.

In order to do this correctly, we should alter some of the definitions we made in our post on Fourier series. Specifically, we want to have an arbitrary period T. So instead of making the 1-periodic complex exponential e^{2\pi i k t} our basic building block, we choose e^{2 \pi i t k/T}. The reader will check that as k varies over all integers, these new complex exponentials still form an orthonormal basis of L^2 (well not quite, we have to modify the inner product slightly; see below). Then, using the notation we used last time for the Fourier coefficients c_k, the series is calculated as

\displaystyle \sum_{k = -\infty}^{\infty} c_k e^{\frac{2 \pi i k t}{T}}

where the c_k are computed via the new inner product:

\displaystyle c_k = \frac{1}{T}\int_{0}^{T}e^{\frac{-2 \pi i k t}{T}} f(t)dt

We make another slight alteration in the limits of integration:

\displaystyle c_k = \frac{1}{T} \int_{-T/2}^{T/2} e^{\frac{-2 \pi i k t}{T}} f(t)dt

This doesn’t change the integral, since a  T-periodic function has the same integral on any interval of length T.

Before we continue, we should show an intuitive aspect of what happens when T \to \infty. We can think of the usual Fourier series representation of a 1-periodic function f as a function on the integers whose values are the Fourier coefficients k \mapsto c_k, since the coefficients completely determine the representation of f.  The interval between the inputs to this mapping is 1, because the function is 1-periodic. On the other hand, if we let T vary, we see that the intervals between adjacent inputs shrink to 1/T. As T grows, the inputs are moving closer and closer together. In other words, the Fourier series representation is becoming a continuous mapping of the frequency! This viewpoint will motivate our seemingly magical choices to follow, and it partially justifies the common notion that the Fourier transform takes a function “in the time domain” and represents it “in the frequency domain.” It’s a stupid notion for mathematicians, but everyone says it anyway.

So if we try to take the limit immediately, that is, if we use the exact formula above for c_k and try to evaluate \lim_{T \to \infty} c_k, we have issues. The problem rears it’s ugly head when we let f be a function with bounded support (that is, f(x) is zero everywhere except possibly on a finite interval). If f is zero outside of [a,b] and \int_a^b |f(x)|dx \leq M is finite, then for some large T, we have that all of the Fourier coefficients go to zero as T \to \infty. The details:

\displaystyle |c_k| = \left | \frac{1}{T}\int_{-T/2}^{T/2} e^{\frac{-2 \pi i kt}{T}} f(t)dt \right | \leq \frac{1}{T} \int_{-T/2}^{T/2} \left | e^{\frac{-2 \pi i k t}{T}} \right | |f(t)| dt

But as the absolute value of the complex exponential is 1, we can bound this by

\displaystyle \frac{1}{T} \int_{-T/2}^{T/2} |f(t)|dt \leq \frac{M}{T},

and as T \to \infty, we see that the whole thing goes to 0.

The solution is (magic!) to scale linearly by T, and pull the balancing factor of 1/T outside of the coefficients c_k, and into the Fourier series itself. In other words, our new Fourier series is (written with the terms rearranged for good reason)

\displaystyle \sum_{k = -\infty}^{\infty} e^{\frac{-2 \pi i k t}{T}} c_k (1/T) \frac{1}{T}

where the coefficients c_k(1/T) are

\displaystyle c_k(1/T) = \int_{-T/2}^{T/2}e^{\frac{-2 \pi i k t}{T}} f(t)dt

Now we suddenly (magically!) realize that the first equation is just the usual Riemann sum from calculus for the estimate of an integral. If we think of x as our variable, then we’d be integrating e^{-2 \pi i t x} c_k(x). And in particular, when the interval between the x values goes to 0, the discrete sum converges to an integral by definition. Let us denote the infinitesimal variable s to represent this “limit of 1/T.” Then we redefine the two above equations with glorious new names:

Definition: The Fourier transform of a function f: \mathbb{R} \to \mathbb{C} is the integral

\displaystyle \mathscr{F}f(s) = \int_{-\infty}^{\infty}e^{-2 \pi i s t} f(t)dt

whenever such an integral converges.

The inverse Fourier transform of f is the integral

\displaystyle \mathscr{F}^{-1}g(t) = \int_{-\infty}^{\infty} e^{2\pi i s t}g(s)ds

whenever such an integral converges.

And so, the Fourier transform above generalizes the Fourier coefficient c_k (the limits of integration go to infinity), while the inverse transform generalizes the Fourier series reconstruction, by our conversion from a discrete sum to an integral.

We should note a few things about this definition, because it is quite a mouthful. First, \mathscr{F} and \mathscr{F}^{-1} operate on functions. In other words, they accept a function as input, and their values are functions. Still in other words, the parentheses are like (\mathscr{F}f)(s), and not like \mathscr{F}(f(s)). We will often omit the parentheses with this implicitly understood precedence. This is also part of why we choose different variables. f(t) will often use a different variable than its transform \mathscr{F}f(s).

Second, returning to our remark about stupid notions, the function f(t) can be thought of as being in the “time domain,” where the inputs are instances of time, while the transformed function \mathscr{F}f is in the “frequency domain.” That is, for a given input s, the Fourier transform \mathscr{F}f(s) describes how the complex exponential with frequency s contributes to the overall function f. The set of values of the Fourier transform of f is called the spectrum of f. One can visualize the spectrum, but only indirectly. A complex-valued function is always hard to visualize, so instead one graphs |\mathscr{F}f(s)| as a function on the real numbers. We then get pretty pictures like this one giving the spectrum of some human-spoken words:

This also explains the humorous comic at the beginning of this post: the thing saying “meow” is the spectrum of a cat, complete with whiskers. The comic also reinforces the idea that \mathscr{F} is simply an operation on functions. One does not need to restrict \mathscr{F} to operate on functions whose domain is time (indeed, a cat is not a function of time). It’s just an instance in which one can concretely interpret the transform of a function. For example, if one wanted to (and we will shortly), one could wonder about \mathscr{FF}f, or even apply \mathscr{F} arbitrarily many times and see what happens under the limit. The same thing goes for the inverse Fourier transform.

The last big issue we have with the above definition is that it only makes sense when the integral actually converges. We will run into a few examples where this becomes a big problem, but we will sweep these issues under the rug for now (remember, this is still the land of naivete).

Nevertheless, we can do some wonderful computations with this mentality and this new definition. It will benefit us in the long run, because we’ll discover the useful properties of the Fourier transform now, and use those properties to steer the more rigorous formalities later.

Elementary Transforms and Elementary Properties

Armed with the mighty definition of the Fourier transform, we can take two paths. We can compute the transforms of various elementary functions, or we can develop tools to construct transforms of combinations of functions by computing the transforms of their constituent parts. This is largely the same approach one takes in studying derivatives and integrals in classical calculus: one learns to compute the derivatives of polynomials, logarithms, and exponentials; and then one learns to compute derivatives of products, sums, and compositions of functions. We will operate with the same philosophy for now.

Example: Let f = \chi_{[-1/2, 1/2]} be the characteristic function of the interval from -1/2 to 1/2. That is, it is the function which is 1 on that interval and zero everywhere else. We will show \mathscr{F}f = \frac{\sin(\pi s)}{\pi s} by appealing directly to the definition of the Fourier transform.

\displaystyle \mathscr{F}f(s) = \int_{-\infty}^{\infty} e^{-2 \pi i s t} \chi_{[-1/2, 1/2]}(t) dt

Since f is zero outside of the chosen interval, and one inside, we can simplify the integral by adjusting the limits to -1/2 and 1/2, and inside simply using f(t) = 1:

\displaystyle \mathscr{F}f(s) = \int_{-1/2}^{1/2}e^{-2 \pi ist} dt

And this is quite tractable. Integrating the complex exponential as usual, we have:

\displaystyle \mathscr{F}f(s) = \left ( \frac{1}{-2 \pi is} e^{-2 \pi ist} \right ) \Big |_{-1/2}^{1/2} = \frac{e^{- \pi ist} - e^{\pi ist}}{-2 \pi is} = \frac{\sin(\pi s)}{\pi s}

Where the last equality follows from the classic identity e^{ix} = \cos(x) + i\sin(x). The result of this Fourier transform is so pervasive that it has it’s own name: \textup{sinc}(s) = \frac{\sin(\pi s)}{\pi s}.

Exercise: Let \Lambda(t) be the piecewise function defined as 1 - |t| if |t| < 1, and zero otherwise. Prove that \mathscr{F}\Lambda(s) = \textup{sinc}^2(s) = \frac{\sin^2(\pi s)}{(\pi s)^2}.

Again, this one follows straight from the definition, which must be computed piecewise to handle the absolute value.

Example: Let f be the Gaussian f(t) = e^{- \pi t^2}. Then \mathscr{F}f(s) = f(s). That is, the Gaussian is fixed by the Fourier transform.

This is a very special property of the Gaussian, hinting at the special relationship between Fourier transforms and smoothness of curves. In order to prove it we need to borrow a fact from complex analysis, that \int_{-\infty}^{\infty} e^{- \pi t^2} = 1. Note that here the indefinite integral of f cannot be expressed in elementary terms, so basic calculus tools are insufficient to prove this fact. A proof is most easily accessible using complex integration and residue theory, and Wikipedia provides a proof that does the same thing using a real parameterization to make it seem more elementary.

To find the Fourier transform, we again appeal to the definition, except this time we use some tricks. First, we differentiate the definition with respect to s, and then integrate the result by parts to arrive at an ordinary differential equation, which we know how to solve. Set F(s) = \mathscr{F}f(s) for ease of notation.

\displaystyle F(s) = \int_{-\infty}^{\infty} e^{-2 \pi ist} e^{-\pi t^2}dt

Differentiating with respect to s, we have

\displaystyle F'(s) = \frac{d}{ds}\int_{-\infty}^{\infty} e^{-2 \pi ist}e^{-\pi t^2}dt = \int_{-\infty}^{\infty} \frac{d}{ds}(e^{-2 \pi ist}) e^{-\pi t^2} dt

Performing the differentiation and regrouping terms we have

\displaystyle F'(s) = i \int_{-\infty}^{\infty}e^{-2\pi ist}(-2\pi t e^{-\pi t^2}) dt

Now integrating by parts with respect to t, and recognizing that the term e^{-2 \pi ist} e^{-\pi t^2} tends to zero both as t \to \pm \infty, we get

\displaystyle F'(s) = -i \int_{-\infty}^{\infty} e^{-\pi t^2}(-2\pi is)e^{-2 \pi ist} dt = -2\pi s F(s)

As we claimed earlier, this is a simple ordinary differential equation, which has solution

\displaystyle F(s) = F(0)e^{-\pi s^2} = F(0)f(s)

And here F(0) = \int_{-\infty}^{\infty} e^{-2 \pi i t 0}e^{-\pi t^2} dt = 1, as we claimed from the beginning. This completes the proof, as F(s) = f(s).

Next, we will focus on the rules for taking Fourier transforms of functions combined in various ways. First, we note that the Fourier transform is linear, in the very same sense as linear maps between vector spaces in elementary linear algebra. In particular, the linearity of the integral gives

\displaystyle \mathscr{F}(af + bg)(s) = a \mathscr{F}f(s) + b\mathscr{F}g(s)

Other easy properties arise from modifying the input of f, and using multiplicative properties of the complex exponential. For instance, if we let f_a(t) = f(t - a), we see that \mathscr{F}f_a(s) = e^{-2 \pi ias}\mathscr{F}f(s). This follows by a simple change of variables in the integral. Letting u = t-a,

\displaystyle \mathscr{F}f_a(s) = \int_{-\infty}^{\infty} e^{-2\pi is(u+a)}f(u)du

And we can trivially factor out the needed complex exponential coefficient, and work with u as usual. One convenient interpretation of this formula is that a shift in time corresponds to a phase shift in frequency. Once again, we caution that these interpretations are a tool; they can massively confuse the issue when, say, the domain of f is frequency to begin with.

Similar considerations give us a formula for the scaled function f(at) whose transform is \frac{1}{|a|} \mathscr{F}f(\frac{s}{a}). We leave this as an exercise to the reader (hint: break it into two cases for when a<0, a>0).

Next, we note that the Fourier transform turns derivatives of some functions into a very manageable product. Rigorously, if \lim_{t \to \pm \infty} f(t) = 0 then

\displaystyle \mathscr{F}\frac{d^nf}{dt^n}(s) = (2\pi i s)^n \mathscr{F}f(s)

We can prove this by induction. We’ll just prove the base case:

\displaystyle \mathscr{F}f'(s) = \int_{-\infty}^{\infty} e^{-2 \pi ist}f'(t)dt

Integrating by parts, we get

\displaystyle \mathscr{F}f'(s) = e^{-2\pi ist}f(t) - \int_{-\infty}^{\infty} (-2\pi is)e^{-2\pi ist}f(t)dt

And by our boundedness property and the fact that the complex exponential has a constant norm, the first term (evaluated from -\infty to \infty) tends to zero, leaving our desired product. The inductive step follows with the ease of iterated integration by parts. Note that although this example only holds for functions which tend to zero at \pm \infty, next time we will rectify the situation by restricting our theory to functions which are “the best candidates” for the theory of Fourier analysis and eliminate the need for such hypotheses.

The More Interesting Properties

The final two properties of the Fourier transform that we will inspect are in a sense deeper and more substantial than those above. In particular, we will establish the duality of the Fourier transform, and the effect of Fourier transforms on convolution.

First, the Fourier transform has a few notions of duality. Let f^-(t) denote f(-t). One such duality notion is the following, which is a trivial consequence of the definitions of the transform and its inverse:

\displaystyle (\mathscr{F}f)^- = \mathscr{F}^{-1}f

Similarly, a minor change of variables shows that \mathscr{F}(f^-) = \mathscr{F}^{-1}f. Chaining these together, we have the nice identity

\displaystyle \mathscr{F}(f^-) = (\mathscr{F}f)^-

A simple corollary is that \mathscr{FF}f = f^-. This allows us to compute the Fourier transforms of some potentially unmanageable functions. For instance, let us return to our friend the sinc function.

\displaystyle \mathscr{F}\textup{sinc}(s) = \mathscr{FF}(\chi_{[-1/2,1/2]}) = \chi^-_{[-1/2, 1/2]} = \chi_{[-1/2, 1/2]}

by the symmetry of the characteristic function. On the other hand, it’s ridiculously counterintuitive that the following integral is actually the characteristic function of a finite interval:

\displaystyle \int_{-\infty}^{\infty} e^{-2 \pi ist} \frac{\sin(\pi t)}{\pi t} dt

In fact, even though we just “proved” that the sinc function has a nice transform, it is hardly clear how to integrate it. In fact, the sinc function is not even (Lebesgue) integrable! Without further qualifications, the above expression is complete nonsense.

Historically, this is the point at which the physicists contact the mathematicians and say, “We dun broked it!” Because the physicists went ahead and used these naively impossible transforms to do amazing things and discover elegant identities, the mathematicians are left to design a sensible theory to support their findings. The field is rife with such inconsistencies, and this is not the last one we will see before consolidating the theory. Perhaps this is in part because successful applications in engineering outpace mathematical rigor. Glibly, they’re racing for profit while the mathematicians want to architect a flawless foundation in which deep theorems are manifestly obvious.

Getting back to the properties of Fourier transforms, we have saved perhaps the most useful one for last. In short, Fourier transforms turn convolution into multiplication. Convolutions, both continuous and discrete, make cameo appearances all over applied mathematics, from signal processing and image analysis to quantum mechanics and mathematical finance. In our applications, we will use the following properties of convolution to modify the spectrum of a signal, for such purposes as removing noise, or filtering out low/high/mid frequency regions. Without further ado, the definition:

Definition: The convolution of f and g, denoted f \ast g, is the integral

\displaystyle (f \ast g)(x) = \int_{-\infty}^{\infty} g(x-y)f(y)dy,

should such an integral converge. Otherwise the convolution is undefined.

Often convolution is interpreted as some sort of stretch+translation of a function by another, but we find such meager interpretations mathematically flaccid. Convolution is simply an operation that combines functions in an interesting way (perhaps its definition is motivated by the question below). Nevertheless, Wikipedia provides a number of relevant animations showing convolution in action.

So the leading question here is, what happens when one takes the product of (\mathscr{F}f)(\mathscr{F}g)? From the definition, this is

\displaystyle \left ( \int_{-\infty}^{\infty}e^{-2\pi isx}f(x)dx \right ) \left ( \int_{-\infty}^{\infty} e^{-2 \pi ist} g(t) dt \right )

We may combine the integrals into a double integral, and further combine the complex exponentials, getting

\displaystyle \int_{-\infty}^{\infty} \left ( \int_{-\infty}^{\infty} e^{-2 \pi is (t+x)}g(t) dt \right ) f(x) dx

Substituting u = t+x, we have

\displaystyle \int_{-\infty}^{\infty} \left ( \int_{-\infty}^{\infty} e^{-2\pi isu} g(u-x)du \right ) f(x) dx

And swapping the order of integration,

\displaystyle \int_{-\infty}^{\infty} \left ( \int_{-\infty}^{\infty} g(u-x)f(x)dx \right ) e^{-2\pi isu} du = \mathscr{F}(g \ast f)

(The parenthetical quantity drove our definition of the convolution to begin with.) And so we have the beautiful identity:

\mathscr{F}(f \ast g)(s) = \mathscr{F}f(s) \mathscr{F}g(s)

We will use this as follows: multiply the Fourier transform of a signal by an appropriate characteristic function (the characteristic function of the set of “good” frequencies of, say, a sound clip) and then take the inverse transform of the product, getting as a result a modified signal with certain frequencies removed.

There are a few hurdles between here and there (at least, as far as this blog goes). First, we must compensate for our convergence naivete with mathematical rigor. Next time, we will define the class of Schwartz functions, from which we will derive a class of “generalized functions,” intuitively constituting the class of “transformable” functions. After that, we must needs find a suitable discrete approximation of the Fourier transform. In real life, all signals are sampled sequences of numbers. As such, we cannot take their integrals, and must convert these continuous notions to operations on sequences. Finally, we need to investigate an algorithm to efficiently compute such a discrete Fourier transform. Then, and only then, may we proceed with writing programs to do great things.

So look forward to all of the above in the coming weeks. Until next time!