It’s often said that the Age of Information began on August 17, 1964 with the publication of Cooley and Tukey’s paper, “An Algorithm for the Machine Calculation of Complex Fourier Series.” They published a landmark algorithm which has since been called the Fast Fourier Transform algorithm, and has spawned countless variations. Specifically, it improved the best known computational bound on the discrete Fourier transform from to , which is the difference between uselessness and panacea.

Indeed, their work was revolutionary because so much of our current daily lives depends on efficient signal processing. Digital audio and video, graphics, mobile phones, radar and sonar, satellite transmissions, weather forecasting, economics and medicine all use the Fast Fourier Transform algorithm in a crucial way. (Not to mention that electronic circuits wouldn’t exist without Fourier analysis in general.) Before the Fast Fourier Transform algorithm was public knowledge, it simply wasn’t feasible to process digital signals.

Amusingly, Cooley and Tukey’s particular algorithm was known to Gauss around 1800 in a slightly different context; he simply didn’t find it interesting enough to publish, even though it predated the earliest work on Fourier analysis by Joseph Fourier himself.

In this post we will derive and implement a Fast Fourier Transform algorithm, and explore a (perhaps naive) application to audio processing. In particular, we will remove white noise from a sound clip by filtering the frequency spectrum of a noisy signal.

As usual, all of the resources used for this post are available on this blog’s Github page.

## Derivation

It turns out that there are a number of different ways to speed up the naive Fourier Transform computation. As we saw in our primer on the discrete Fourier transform, the transform itself can be represented as a matrix. With a bit of nontrivial analysis, one can factor the Fourier transform matrix into a product of two *sparse* matrices (i.e., which contain mostly zeros), and from there the operations one can skip are evident. The intuitive reason why this should work is that the Fourier transform matrix is very structured, and the complex exponential has many useful properties. For more information on this method, see these lecture notes (p. 286).

We will take a different route which historically precedes the matrix factorization method. In fact, the derivation we’ll trace through below was Cooley and Tukey’s original algorithm. The method itself is a classical instance of a *divide and **conquer* algorithm. Familiar programmers would recognize the ideas in common sorting algorithms: both mergesort and quicksort are divide and conquer algorithms.

The difficulty here is to split a list of numbers into two lists which are half in size in such a way that the Fourier transforms of the smaller pieces can be quickly combined to form the Fourier transform of the original list. Once we accomplish that, the rest of the algorithm falls out from recursion; further splitting each piece into two smaller pieces, we will eventually reach lists of length two or one (in which case the Fourier transform is completely trivial).

For now, we’ll assume that the length of the list is a power of 2. That is,

for some .

We will also introduce the somewhat helpful notation for complex exponentials. Specifically, . Here the will represent the length of the list, and will be related to the index. In particular, the complex exponential that usually shows up in the discrete Fourier transform (again, refer to our primer) is . We write the discrete Fourier transform of by specifying its -th entry:

Now the trick here is to split up the sum into two lists each of half size. Of course, one could split it up in many different ways, but after we split it we need to be able to rewrite the pieces as discrete Fourier transforms themselves. It turns out that if we split it into the terms with even indices and the terms with odd indices, things work out nicely.

Specifically, we can write the sum above as the two sums below. The reader should study this for a moment (or better yet, try to figure out what it should be without looking) to ensure that the indices all line up. The notation grows thicker ahead, so it’s good practice.

.

Now we need to rewrite these pieces as Fourier transforms. Indeed, we must replace the occurrences of in the complex exponentials with . This is straightforward in the first summation, since

.

For the second summation, we observe that

.

Now if we factor out the , we can transform the second sum in the same way as the first, but with that additional factor out front. In other words, we now have

.

If we denote the list of even-indexed entries of by , and vice versa for , we see that this is just a combination of two Fourier transforms of length . That is,

But we have a big problem here. Computer scientists will recognize this as a type error. The thing on the left hand side of the equation is a list of length , while the thing on the right hand side is a sum of two lists of length , and so it has length . Certainly it is still true for values of which are less than ; we broke no algebraic laws in the way we rewrote the sum (just in the use of the notation).

Recalling our primer on the discrete Fourier transform, we naturally extended the signals involved to be periodic. So indeed, the length- Fourier transforms satisfy the following identity for each .

However, this does not mean we can use the same formula above! Indeed, for a length Fourier transform, it is not true in general that . But the correct formula is close to it. Plugging in to the summations above, we have

Now we can use the easy-to-prove identity

And see that the right-hand-term is simply . Similarly, one can trivially prove the identity

,

this simplifying the massive formula above to the more familiar

Now finally, we have the Fast Fourier Transform algorithm expressed recursively as:

With the base case being .

## In Python

With all of that notation out of the way, the implementation is quite short. First, we should mention a few details about complex numbers in Python. Much to this author’s chagrin, Python represents using the symbol 1j. That is, in python, is

a + b * 1j

Further, Python reserves a special library for complex numbers, the cmath library. So we implement the omega function above as follows.

import cmath def omega(p, q): return cmath.exp((2.0 * cmath.pi * 1j * q) / p)

And then the Fast Fourier Transform algorithm is more or less a straightforward translation of the mathematics above:

def fft(signal): n = len(signal) if n == 1: return signal else: Feven = fft([signal[i] for i in xrange(0, n, 2)]) Fodd = fft([signal[i] for i in xrange(1, n, 2)]) combined = [0] * n for m in xrange(n/2): combined[m] = Feven[m] + omega(n, -m) * Fodd[m] combined[m + n/2] = Feven[m] - omega(n, -m) * Fodd[m] return combined

Here I use the awkward variable names “Feven” and “Fodd” to be consistent with the notation above. Note that this implementation is highly non-optimized. There are many ways to improve the code, most notably using a different language and cacheing the complex exponential computations. In any event, the above code is quite readable, and a capable programmer could easily speed this up by orders of magnitude.

Of course, we should test this function on at least some of the discrete signals we investigated in our primer. And indeed, it functions correctly on the unshifted delta signal

>>> fft([1,0,0,0]) [(1+0j), (1+0j), (1+0j), (1+0j)] >>> fft([1,0,0,0,0,0,0,0]) [(1+0j), (1+0j), (1+0j), (1+0j), (1+0j), (1+0j), (1+0j), (1+0j)]

As well as the shifted verion (up to floating point roundoff error)

>>> fft([0,1,0,0]) [(1+0j), (6.123233995736766e-17-1j), (-1+0j), (-6.123233995736766e-17+1j)] >>> fft([0,1,0,0,0,0,0,0]) [(1+0j), (0.7071067811865476-0.7071067811865475j), (6.123233995736766e-17-1j), (-0.7071067811865475-0.7071067811865476j), (-1+0j), (-0.7071067811865476+0.7071067811865475j), (-6.123233995736766e-17+1j), (0.7071067811865475+0.7071067811865476j)]

And testing it on various other shifts gives further correct outputs. Finally, we test the Fourier transform of the discrete complex exponential, which the reader will recall is a scaled delta.

>>> w = cmath.exp(2 * cmath.pi * 1j / 8) >>> w (0.7071067811865476+0.7071067811865475j) >>> d = 4 >>> fft([w**(k*d) for k in range(8)]) [ 1.7763568394002505e-15j, (-7.357910944937894e-16 + 1.7763568394002503e-15j), (-1.7763568394002505e-15 + 1.7763568394002503e-15j), (-4.28850477329429e-15 + 1.7763568394002499e-15j), (8 - 1.2434497875801753e-14j), (4.28850477329429e-15 + 1.7763568394002505e-15j), (1.7763568394002505e-15 + 1.7763568394002505e-15j), (7.357910944937894e-16 + 1.7763568394002505e-15j)]

Note that occurs in the position with index , and all of the other values are negligibly close to zero, as expected.

So that’s great! It works! Unfortunately it’s not quite there in terms of usability. In particular, we require our signal to have length a power of two, and most signals don’t happen to be that long. It turns out that this is a highly nontrivial issue, and all implementations of a discrete Fourier transform algorithm have to compensate for it in some way.

The easiest solution is to simply add zeros to the end of the signal until it is long enough, and just work with everything in powers of two. This technique (called *zero-padding*) is only reasonable if the signal in question is actually zero outside of the range it’s sampled from. Otherwise, subtle and important things can happen. We’ll leave further investigations to the reader, but the gist of the idea is that the Fourier transform assumes periodicity of the data one gives it, and so padding with zeros imposes a kind of periodicity that is simply nonexistent in the actual signal.

Now, of course not every Fast Fourier transform uses zero-padding. Unfortunately the techniques to evaluate a non-power-of-two Fourier transform are relatively complex, and beyond the scope of this post (though not beyond the scope of this blog). The interested reader should investigate the so-called Chirp Z-transform, as discovered by Rader, et al. in 1968. We may cover it here in the future.

In any case, the code above is a good starting point for any technique, and as usual it is available on this blog’s Github page. Finally, the inverse transform has a simple implementation, since it can be represented in terms of the forward transform (hint: remember duality!). We leave the code as an exercise to the reader.

## Experiments with Sound — I am no Tree!

For the remainder of this post we’ll use a more established Fast Fourier Transform algorithm from the Python numpy library. The reasons for this are essentially convenience. Being implemented in C and brandishing the full might of the literature on Fourier transform algorithms, the numpy implementation is lightning fast. Now, note that the algorithm we implemented above is still correct (if one uses zero padding)! The skeptical reader may run the code to verify this. So we are justified in abandoning our implementation until we decide to seriously focus on improving its speed.

So let’s play with sound.

The sound clip we’ll be using is the following piece of dialog from the movie The Lord of the Rings: The Two Towers. We include the original wav file with the code on this blog’s Github page.

If we take the Fourier transform of the sound sample, we get the following plot of the frequency spectrum. Recall, the frequency spectrum is the graph of the norm of the frequency values.

Here we note that there is a symmetry to the graph. This is not a coincidence: if the input signal is real-valued, it will always be the case that the Fourier transform is symmetric about its center value. The reason for this goes back to our first primer on the Fourier series, in that the negative coefficients were complex conjugates of the positive ones. In any event, we only need concern ourselves with the first half of the values.

We will omit the details for doing file I/O with audio. The interested reader can inspect our source code, and they will find a very basic use of the scikits.audiolab library, and matplotlib for plotting the frequency spectrum.

Now, just for fun, let’s tinker with this audio bite. First we’ll generate some random noise in the signal. That is, let’s mutate the input signal as follows:

import random inputSignal = [x/2.0 + random.random()*0.1 for x in inputSignal]

This gives us a sound bite with an annoying fuzziness in the background:

Next, we will use the Fourier transform to remove some of this noise. Of course, since the noise is random it inhabits all the frequencies to some degree, so we can’t eliminate it entirely. Furthermore, as the original audio clip uses some of the higher frequencies, we will necessarily lose some information in the process. But in the real world we don’t have access to the original signal, so we should clean it up as best we can without losing too much information.

To do so, we can plot the frequencies for the noisy signal:

Comparing this with our original graph (cheating, yes, but the alternative is to guess and check until we arrive at the same answer), we see that the noise begins to dominate the spectrum at around the 20,000th index. So, we’ll just zero out all of those frequencies.

def frequencyFilter(signal): for i in range(20000, len(signal)-20000): signal[i] = 0

And voila! The resulting audio clip, while admittedly damaged, has only a small trace of noise:

Inspecting the source code, one should note that at one point we halve the amplitude of the input signal (i.e., in the time domain). The reason for this is if one arbitrarily tinkers with the frequencies of the Fourier transform, it can alter the *amplitude* of the original signal. As one quickly discovers, playing the resulting signal as a wav file can create an unpleasant crackling noise. The amplitudes are clipped (or wrapped around) by the audio software or hardware for a reason which is entirely mysterious to this author.

In any case, this is clearly not the optimal way (or even a good way) to reduce white noise in a signal. There are certainly better methods, but for the sake of time we will save them for future posts. The point is that we were able to implement and use the Fast Fourier Transform algorithm to analyze the discrete Fourier transform of a real-world signal, and manipulate it in logical ways. That’s quite the milestone considering where we began!

Next up in this series we’ll investigate more techniques on sound processing and other one-dimensional signals, and we’ll also derive a multi-dimensional Fourier transform so that we can start working with images and other two-dimensional signals.

Until then!

This was a delight to read! The level of your blog is perfect for someone like me, who once upon a time knew a little mathematics, and forgotten a good deal of what he ever knew, but can still follow and delight in a mathematical argument. Too much mathematical writing is aimed either at the expert or the fearful uninitiated. The middle ground is poorly served, which is why your blog is so valuable. Thanks for your hard work!

LikeLike

Couldn’t have said it better

LikeLike

Reblogged this on statsr.

LikeLike

Thanks for the interesting post. I look forward to the derivation for the multi-dimensional FFT algorithm.

LikeLike

Zeroing out frequencies is an extremely poor way to do a low-pass filter. You actually introduce other frequency components by doing so. You have to remember that a discrete FFT is not completely representative of the real world. You get frequency artifacts introduced as the discrete bins can only represent specific frequencies. So when your frequencies aren’t exact matches of those bins other frequency bins are introduced to fill it in. So you get high frequency components which actually are part of the low frequency information. Zeroing out kinda’ works on audio data like this….but most definitely not on signal analysis. Zeroing out can completely change the temporal spacing on the original signal.

Convolution does better.

https://ccrma.stanford.edu/~jos/sasp/Example_1_Low_Pass_Filtering.html

LikeLike

I understand it’s not a good method; I just wanted something easy I could do with the Fourier transform to have some kind of trivial demo after the admittedly symbol-happy derivation. I would love to see a mathematical treatment of why this is the case. I know that sampling can be very tricky, but I don’t quite understand what you mean when you say “other frequency bins are introduced to fill it in.” What is introducing it? Why would it work on audio data but suddenly change the ‘temporal spacing’ (not quite sure what that means) on other kinds of data? And do you mean convolution of the original signal or convolution of the Fourier transform? Convolution by what?

LikeLike

It’s important to realise that the FFT is an approximation of an analog Fourier transform; assuming a number of mathematical bits and pieces that make everything convenient. Some/Most of the time the difference is trivial, but Michael is indeed correct, and you need to be careful if you’re doing something where the artefacts could reduce signal quality in a way that is not optimal (e.g., audio processing).

To explain Michael’s frequency bin stuff; think of the FFT as a histogram, with each “frequency bin” representing how often a particular frequency occurs in a particular signal. Now, for example, if you do something to your original signal that shifts some of the frequencies part way from bin 2 to bin 1, then you may actually be moving a frequency component from say “bin 1.8″ to “bin 1.6″. In effect, it looks like you have less of bin 2 and more of bin 1, but that’s only because you only had the frequency resolution to differentiate between 1 and 2; in reality, the actual frequency was 1.8 and it shifted to 1.6.

An aside: As the length of the FFT is equal in time and frequency domains, zero-padding the input signal gives you more bins of “resolution” in the frequency domain. You can use this to get a finer grained picture of the frequency components, but the high frequency components that you have introduced are no longer accurate (I think they represent harmonics of the actual signal).

To answer your other question: Convolution is on the original signal (in the time-domain). Convolution in the time domain is equivalent to multiplication in the frequency domain. i,e., If you have two signals x(t) and y(t) then FFT(conv(x(t), y(t))) = FFT(x(t)) * FFT(y(t)).

Finally, in your first frequency histogram figure, you say that the left side of the figure can be thought of as the negative frequencies—it’s actually the right hand side of the figure that are the negative frequencies. If you cut the figure in half, and move the right hand side to the left hand side of the figure, such that you have a massive peak in the middle and tails going away on both sides (kinda like a Normal distribution, or “bell curve”) then you will get a better picture of what the signal looks like. In this case, all of the energy is in the middle, around the zero frequencies, meaning that you have mainly low frequency components in your original signal. In fact, whenever your original signal is “real” (as opposed to complex), then you can safely ignore the negative frequency components as they are just a mirror image of the positive ones.

LikeLike

You’re quite right about the negative frequencies, since the signal is assumed to be periodic modulo its length. I changed the caption to reflect that.

So I proved (or referenced) most of what you’re saying in my primers on Fourier analysis. This is why I’m confused. Zeroing out the high frequencies just multiplication by some characteristic function, and that corresponds to convolution in the time domain by a sinc function. I don’t see how that ‘introduces’ more frequency components, because I’m not modifying the time domain in a way that shifts frequencies. Why would it be that the low frequency information (I’m imagining very close to zero) ends up in the high frequency bins? If it’s only off by a difference of one bin, then that should be negligible; the wav file I’m using was sampled at a high rate. Again, I believe I need a rigorous mathematical treatment of this to understand it fully.

And certainly for this case (audio signals) zero padding is fine, because the signal is actually zero in the padded region.

LikeLike

You are correct about the convolution with a sinc function, but because we are dealing with discrete time the sinc will be truncated in the time domain (this can introduce artifacts, but you’d have to look at the mathematics to properly quantify them; it’s been ages since I worked on this stuff).

I don’t think what you’re doing is a problem when you have many high-sample-rate samples (as you do), but in many applications this is not the case. For example, waiting for a large number of samples will lead to a large processing delay in audio applications.

LikeLike

Although a MSc holder, I was too dumb to understand Fourier Transforms and passed on it.

The maths rotted my brain.

LikeLike

I’ve been looking for this for months! Thank you!

LikeLike

It’s good to find another website explaining Fourier – the whole field of DSP has math that feels out of my league.From my (admitted tenuous) understanding, and playing with Excel, I can “sort of” see the problem with just zeroing high frequency stuff…

When a signal frequency lines up nicely with a sample frequency, Excel plots nice sharp spikes. If the signal is NOT spot on, you don’t get two adjacent spikes – you get a smeared out bell curve. If you just flatten out that curve, you are deleting frequencies that weren’t there to start with. This translates back as ditortion in your lower frequency signal.

One way to help overcome this (and a potential topic) is “Window functions” with names like hamming, hanning, and Blackman: http://en.wikipedia.org/wiki/Window_function

Another good topic would be the Hilbert Transform. I have yet to figure out how to model one of these in Excel, and I need pretty graphs to understand some of these high-falutin’ concepts.

LikeLike

Unfortunately my focus is increasingly on algebraic and geometric topics these days. So any work I do on this blog in the realm of analysis will be pretty focused on applications to image and sound processing. I still haven’t even gotten around to doing the multidimensional Fourier stuff, yet… especially with this wave of machine learning theory entering my brain.

But you’re right, all of these different transforms are interesting. Same with the more general theory of wavelet transforms.

LikeLike

Fascinating ! thank you for providing such an engaging and interesting content ! Xavier Berthet (www.millorquelaxocolata.com)

LikeLike

Great writeup.

I think you have a sign error in your definition of ω[p,q]; it’s not consistent with the equation for ω[n,-km] in the following sentence. I guess ω[p,q] = exp(2 π i q / p)?

LikeLike

Yes. That was a typo; thanks.

LikeLike

I understood nothing. Your blog post was for some math geniuses! Please write in ENGLISH next time!!!

LikeLike

Well that was rude. The FFT is a technical subject by nature; if you want to understand more than nothing you have to apply yourself. I even provided background posts to motivate all of this, but judging by your attitude you’re not mature enough to understand those either.

This is mathematics. It’s the hardest thing there is, and there is no English translation. If you can’t handle it, go read about something else.

LikeLike

While your post is a tad rude, I have sympathy. Mathematics is its own language, independent of English (or any other language). Understanding higher functions requires a good grasp of the basics, patience, and even practice. I’ll try to explain the Fourier in basic English – but I’m probably hopelessly wrong and about to be severely ridiculed. Ready? Let’s go!

A Fourier Transform converts time-based samples into a group of frequencies (and amplitudes). An Inverse transform reverses this.

Say you have 64 samples. If any of the first 32 are positive, and another one 32 samples later is negative, you can say theres a periodic signal present- 64 samples long. The more positive samples in first 32, matched by negatives in the second 32, the bigger this signal is.

Other frequencies? Split in half, and do again.

With 32 samples, you compare the first 16 to the second 16. 16 samples – first 8 to second 8…

…until you reach 2 samples, where you compare the first with the second.

Comparing isn’t so hard. Keeping track of all those comparisons (and adding up the results) is where the math and gets harder to follow. Also, because the Fourier Transform is S-L-O-W, we are looking for “shortcuts” or ways to speed everything up.

I hope that helps, and is “In English” enough for you. Now stand back: I suspect my betters are about to blast my explanation to pieces.

LikeLike

The explanation of “transforming the time domain into the frequency domain” is the explanation that every person who knew about the Fourier transform gave me when I didn’t know what the transform was, and it was incredibly useless to help me understand what the operation does. In particular, most introductory physics and EE textbooks take this approach: they remove or dumb down (or lie about!) the mathematical details in order to spare the student from actually having to do mathematics.

As you can probably understand, this doesn’t sit well with a mathematician trying to learn the subject. I would prefer if every topic related to mathematics would be explained to me in terms of mathematics. For instance, English makes quantum mechanics sound very hard, but the basic ideas are elementary linear algebra. And so of course, on my blog I will always always always prefer to explain things in terms of mathematics. It’s clearer, more concise, and the English words should only be there to support the math and make formulas natural and obvious.

So anyone who comes to a blog with “Math” in the title and complains about how much math there is should really just spend their time on something more important. I’ll gladly clarify questions and explain things in further detail, but I have no patience here for people who simply complain that they “understand nothing.” I have to address that unaddressable concern often enough with students who don’t show up to lecture and then beg for help in the last week of the semester. I consider this a matter of immaturity, and it doesn’t belong on my blog.

LikeLike

It is your blog, and what you consider appropriate is up to you. Math is independent of your native language. I have seen books in Thai where the only thing I understand are the equations… I wish “stan” the best of luck in finding explanations he can understand.

My own abilities with math crumble when faced with div, grad, or curl. Unfortunatley, this makes Maxwells equations (and electromagnetics) inpenetrable. But I admit the fault is with me, not the people trying to teach me – the “English” they use is fine.

LikeLike

Wow! Thank you for such a great explanation of the FFT! I have read tons of definitions but as with most DSP topics either the math can get very complicated or the definition will expect the reader to assume a lot between steps without guidance.

As for python referring to the imaginary number as ‘j’… it makes us electrical engineers very happy to see it. We already use ‘i’ to define current so we use ‘j’ to represent complex numbers in order to avoid confusion. I’m sure it upsets mathematicians but so do a lot of the things engineers do. =)

LikeLike

Reblogged this on Manaswi Saha.

LikeLike

It is strange, but example in python gives me more understanding than numerous equations. Great article! Thanks!

LikeLike

I’ve had little exposure to Python, so I have to ask about this line:

combined = [0] * n

from what I do know about it, I would guess that combined is now an array of 0’s with a length of n.

Am I correct?

LikeLike

Yes, this is the Python version of instantiating an array filled with zeros. In C you’d use malloc and a loop to initialize everything…

LikeLike

Outside of signal processing, I’ve heard of but not seen any clear explanation of the use of the FFT in machine learning. Does anyone know something about this?

LikeLike

I haven’t heard much to that effect except that FFT-based filters are used to preprocess data before they’re fed into other algorithms. (though Fourier methods on boolean functions are a big topic, it’s quite different from the FFT…)

LikeLike

Hi, I used this Algorithm (Python), and found that it gives excellent frequency response results. However, what I do not understand is that the Amplitude of the FFT coefficients are like in the 1000s, while the signal amplitude was between +1.024 and -1.024. How can i also get accurate amplitude values ? Thank you..

LikeLike

Reblogged this on The Order of SQL.

LikeLike

Not sure how I arrived here, but your entire blog is fantastic and thoroughly readable; thanks for it. I’m posting to let you know that the link to the Stanford lecture notes is broken, but I believe I dug them up at

http://web.stanford.edu/class/ee261/reader/book.pdf

in case you’d like to update it, or for anyone else that stumbles here. Thanks again for the great blog!

LikeLike

Thanks. It’s updated now.

LikeLike

hi very good blog ……. you would like to know what version of Python you used? …

thank you

LikeLike

3.3

LikeLike

umm ok … is that I’m trying to compile and I get error Might you help me?.

and install the packages: numpy, Matplotlib and Scikits.audiolab but in version 2.7.8, because version 3.3 did not find the Scikits.audiolab …

I hope you can help me.

LikeLike

hello, and thank you for your efforts.

I am admittedly, a cookbook math user. I dont no the ‘language’ very well.

xrange is no longer part of the python3.4 library, but they say that ‘range’ will do as well.

LikeLike