Some Fine Points of Fourier Transforms and Spectrum Analyzers

[Contents]

* Contents

1 Overview

For reasons that will be explained below, I define my basic discrete Fourier transforms as follows:

V_f(f_k)

N−1

∑

j=0

V_in(t_j) exp(

−2 π i j k

) dt

for all k from 0 to N−1

(1)

(with an explicit factor of dt), while the corresponding inverse transformation is:

V_out(t_j)

N−1

∑

k=0

V_f(f_k) exp(

2 π i k j

) df

for all j from 0 to N−1

(2)

(with an explicit factor of df). Note the following:

: 1. Everything is written in terms of circular frequency f (not angular frequency ω). Crudely speaking, that means we measure frequency in hertz (not radians per second).
: 2. Except for the sign of i, the structure of equation 1 and equation 2 is the same. There is a pleasing symmetry. This differs dramatically from the approach taken in the usual software packages, where one of the transforms contains an explicit factor of 1/N while the other does not.
: 3. Beware that standard library functions have no clue about the normalization. They do not apply the required normalization factors df and dt. You should do the normalization yourself, immediately after calling the library function, as a matter of habit.

The corresponding analog (non-discrete) Fourier transforms are:

V_f(f)

∫

∞

−∞

V_in(t) exp(−2 π i f t) dt

(3)

V_out(t)

∫

∞

−∞

V_f(f) exp(2 π i f t) df

(4)

: 4. This approach has a pleasing symmetry and a convenient normalization. This is unlike what you see in old-style math textbooks, where there are factors of √2π floating around in front of the integrals, or (worse) a factor of 2π in front of one integral but not the other.
We put the factor of 2π inside the integral, inside the exponential, where it belongs.

2 Normalization

2.1 Normalizing the Ordinates

Suppose we have a vector V_in, representing a voltage waveform sampled at successive points in time. We wish to take the Fourier transform and then the inverse transform, which we denote rather abstractly as:

V_f

FT[V_in]

(forward)

(5)

V_out

IFT[V_f]

(inverse)

(6)

Let’s do an example. Suppose that our input data V_in is a Kronecker delta function in the time domain, that is:

V_in = [1, 0, 0, ⋯⋯, 0] (7)

The way the transform is defined in typical math books, and in typical software libraries, the normalization is such that V_f will be all ones:

V_f = [1, 1, 1, ⋯⋯, 1] ☠ (8)

This is not nice, but it is what you see in canned function libraries (such as fftw), in spreadsheet apps, and in scientific analysis programs (such as scilab or matlab).

The weird thing is that when the transform is defined this way, the inverse transform is defined to have a different normalization. As an example, if V_f is a Kronecker delta function in the frequency domain, and we take the inverse transform, V_out is not all ones; it is smaller by a factor of N, where N is the number of samples.

This leads to some pointed questions:

Why are the forward transform and the inverse transform so different? Conceptually they ought to be very, very symmetric. They need to differ by the sign of the imaginary unit (i) somewhere, but for real-valued inputs that shouldn’t matter very much.
What are the units on V_in?
What are the units on V_f?
We can call V_in and V_f the ordinates (in the time domain and in the frequency domain, respectively). What are the corresponding abscissas?
The usual canned FFT subroutine knows nothing of the abscissas, and tacitly assumes the first abscissa corresponds to t=0. But what if we do know the abscissas, and know that the time series starts at some nonzero time? This is totally nontrivial in some cases, for instance pulsed NMR experiments.
Whatever happened to Parseval’s theorem? As mentioned in section 5, I would hope that our signals would uphold the identity

||V_in|| ≡ ||V_f|| ≡ ||V_out|| (9)

so I guess we need to ask how to take the norm of such vectors.

Here are some answers. Reference 1 contains the scilab code to implement the solutions recommended here.

5. A function and its Fourier transform do not, in general, have the same units. For example, in equation 7 and equation 8, V_in has units of volts, while V_f has units of volt-seconds.

6. Beware that in the idiomatic expressions frequency domain and time domain, the word “domain” is used in a relatively prosaic sense, meaning something like “realm” … which is not to be confused with the technical mathematical notion of the domain of a function (meaning the set of all abscissas). This can lead to messy situations where the word “domain” is used with two different meanings in the same sentence, but the confusion is minimal if you remember that the expressions “frequency domain” and “time domain” are idiomatic, sui generis.

7. The FT and IFT algorithms calculate ordinates for us ... but they don’t help us with the abscissas very much. The abscissas are sorta implied. Suppose we have two seconds of waveform data, sampled at intervals of

dt = 1 millisecond (10)

for a total of

N = 2000 samples (11)

Then the abscissas in the frequency domain will have a spacing of

df		=		1 / (N dt)	(as a rule)
		=		0.5 Hz	(in this case)

(12)

8. I claim that when doing a forward FT, the output should be scaled by a factor of dt. In particular, if V_in is a Kronecker delta function

V_in = [1, 0, 0, 0, .... , 0] (13)

then the forward transform should produce

V_f		=		myFT[V_in]
		=		[.001, .001, .001, .... , .001]

(14)

which differs from the V_f in equation 8 by a nontrivial correction factor, namely a factor of dt.

9. I claim that when doing an inverse FT, the output ordinate should be scaled by a factor of df, that is, the spacing of the output abscissas. In particular, if V_f is a Kronecker delta function,

V_f = [1, 0, 0, 0, .... , 0] (15)

then the inverse transform should be

V_out		=		myIFT[V_f]
		=		[.5, .5, .5, .... , .5]

(16)

which differs by two nontrivial correction factors from what you get from ordinary software packages. In particular, when using scilab, matlab, or the fftw library, the software automagically applies a factor of 1/N to the ordinates whenever it computes an inverse transform, so you need to undo that (by multiplying by a factor of N). Then you need to apply a further factor of df.

By the way, note that multiplying by (N df) is the same as dividing by dt. This means that if you perform a forward transform and an inverse transform, the correction factors discussed in this section drop out, and the corrected transforms really are inverses of each other.

2.2 Normalizing the Abscissas

10. If your raw data is in the time domain, then presumably you know dt just from looking at your data.

In any case, in the time domain, the abscissa of the jth point is the time

t_j = j dt (17)

Meanwhile, in the frequency domain, the abscissa of the kth point is the frequency

f_k = k df (18)

which we can calculate via

df =

N dt

(19)

In equation 1 and equation 2, there are explicit factors of dt and df, respectively. These factors do not appear in the usual “textbook” definition of the transforms, but they have several advantages. For one thing, this makes the units come out right, in accordance with item 5. Secondly, this allows a high degree of symmetry between equation 1 and equation 2. In particular, I don’t need the ugly factor of 1/N that conventionally appears in the inverse transform.

(Don’t overlook the qualifiers “basic” and “discrete” that apply to this basic discrete Fourier transform. Equation 1 is not the most general notion of Fourier transform, as will be discussed in section 6.)

11. In addition to elegance, there are many practical benefits to doing things as suggested by equation 1. Perhaps the most important involves Parseval’s theorem. In preparation for discussing that, let us note that the relevant norm-squared is not just

||V_in||²

norm squared

V_in · V_in

☠

∑

V_in(t_i) * V_in(t_i)

☠

(20)

nor

||V_f||²

norm squared

V_f · V_f

☠

∑

V_f(f_i) * V_f(f_i)

☠

(21)

Think about the Riemann integral: You don’t just sum up a bunch of ordinates; you need to weight each term in the sum by the measure of the corresponding abscissas.

Therefore the right answer is:

\|\|V_in\|\|²		=		V_in · V_in dt	with units of volt² seconds
\|\|V_f\|\|²		=		V_f · V_f df	with units of volt² seconds

(22)

If you actually work out these quantities, using the Fourier transform as defined in equation 1, then the norm of the original data is equal to the norm of the transformed data, which means we are upholding Parseval’s theorem.

12. The same arrangements generalize cleanly to the case of Fourier integral transforms, for analog (non-discrete) signals. They are not limited to the discrete Fourier sums considered above.

That is, the integral transforms are defined by equation 3 and equation 4.

Note that a big part of the trick is to think of frequencies (f) in circular measure (cycles per second, i.e. Hz) instead of frequencies (ω) in angular measure (radians per second).

3 Aliasing

Given that the input is sampled in steps of size dt, the output will always be periodic with a frequency equal to the sampling frequency, i.e. f_S := 1/dt. The celebrated Nyquist frequency is half of that, i.e. f_N := 0.5/dt. We have once again expressed f is in circular measure, and you can convert to radian measure if you want, in the obvious way, ω = 2πf.

The reason why aliasing occurs is obvious if you imagine that the input signal was built up from periodic waves to begin with. The signal component V_in = A G2π(f)t looks a whole lot like V_in = A G2π(f + f_S)t or V_in = A G2π(f + k f_S)t for any integer k, when sampled at intervals of dt. This works for any function G that is periodic with period 2π. The point is that the argument to G contains the term:

(period) (k f_S) t = (period) (k f_S) i dt (23)

for integers k and i. The factor of f_S cancels the factor of dt, leaving only integer multiples of the period, so this term drops out.

In figure 1 you can see how this works, i.e. why the output must be periodic in frequency space, whenever the input is sampled in the time domain.

Figure 1: Why Aliasing Must Happen

We have a periodic function (shown in black) with period 1 (not 2π) and therefore frequency f=1. We sample it with sampling frequency f_S=4. We also have a scaled version of the same function, scaled up in frequency by a factor of 5 in the time direction (shown in blue). (Multiplying the frequency by 5 is the same as adding 4 in this example. Recall that f_S=4.) You can see that both functions go through the exact same sampling points (circled in red). So the scaled function is indistinguishable, sample-wise, from the original function. So if we perform Fourier analysis (or indeed any kind of frequency analysis) on sampled data, we will find that anything that happens at frequency f will happen again at frequency f ± f_S and again at frequency f ± 2 f_S, et cetera.

The scaled function may differ wildly from the original function in between sample points, but it always comes home at the sample points.

You cannot safely decide whether a signal is band-limited or not by looking at the sampled data! That would be like asking the drunkard whether he is drunk. For example, if I have a 100.01 Hz wave and sample it at 10.00 Hz, it will look like a beautiful 0.01 Hz signal. Ooops.

If you want to be sure that the original data is band-limited, rely on the physics, not on the math. That is, apply an analog filter to the signal, upstream of the digitizer. A real, physical, analog filter. If the filter passes negligible energy above the Nyquist frequency, then there will be negligible aliasing.

Standard good practice is to have the analog filter passband extend significantly above any frequency component you care about, so that you are not sensitive to details of the filter performance. Secondly it is standard good practice to sample at a sufficiently high rate that the Nyquist frequency is substantially above the passband of the filter. This means that you don’t need an analog filter with a fancy ultra-steep rolloff, while ensuring that no power gets through above the Nyquist frequency. When considered together, as in equation 24, these practices require that the sampling frequency be quite a bit higher than the highest frequency component of interest. This is called oversampling.

frequency		analog		digital
of	≪	prefilter	≪	sampling
interest		bandwidth		frequency

(24)

Oversampling by factors of 10 or more is common these days. That creates 10 times more sampled data than you might naïvely think was necessary. Some of that is necessary if you want any semblance of quality, and some of that can be removed by digital filtering after the data has been safely digitized. High-Q digital filters are much better behaved than high-Q analog filters.

4 The Convention for Positive Frequency

This is somewhat a matter of taste, but I recommend choosing the sign of the imaginary unit (i) the way it appears in equation 1 and equation 3. This puts a minus sign in the expression defining the forward transform, and not in the inverse transform, which may not be the first thing you would have guessed. The advantage is that we want the signal

W_f(t) = exp(+2 π i f t) (25)

to be a wave with the frequency f. We do not want a minus sign in equation 25.

Note that the scilab program follows the convention I recommend. Some other software packages do not.

(If you are taking the transform of data that doesn’t have an imaginary part, you can’t tell the difference between positive frequency and negative frequency anyway, in which case it doesn’t much matter how you choose the sign of i.)

Non-experts sometimes want to have their cake and eat it too. That is, they want to have a plus sign in the exponential in equation 25 and a plus sign in the exponential in equation 1 also. This is not possible, because equation 1 has the form and function of an inner product. As discussed at e.g. reference 2, when complex numbers are involved an inner product must have the property that ⟨x,y⟩ = ⟨y,x⟩^*, where the * denotes complex conjugate. There are many good reasons for this, including the fact that we want the inner product of something with itself to be a real number.

5 Parseval’s Identity

If you are doing anything involving Fourier transforms (including FFTs) you should make it a habit, every time you do a transform, to check that it upholds Parseval’s identity:

N−1

∑

j=0

|V_in(t_j)|² dt

≡

N−1

∑

k=0

|V_f(f_k)|² dt

(26)

the factors of dt and df are important. The analog (non-discrete) version is:

∫

|V_in|² dt

≡

∫

|V_f|² df

(27)

If it fails that test, it means there’s something messed up with the normalization or the units or something. Remember that standard library functions know nothing about normalization; you have to do it yourself, prior to performing the Parseval check.

6 Increasing the Resolution of Discrete Fourier Transforms

We now present several ways of obtaining a high-resolution Fourier transform. Section 6.1 presents the most advanced result, namely frequency-domain abscissas that are continuous, not discrete. Section 6.2 and section 6.3 present simple methods that are discrete, but with higher resolution due to finer spacing. Section 6.4 presents yet another method for obtaining the same result, in a way that is more mathematically sophisticated and more computationally efficient.

In all cases we are using the same raw time-domain data; we are improving the resolution of the frequency-domain results by using the data more cleverly.

6.1 Continuous as Opposed to Discrete

Let’s talk about the following:

FFT: Fast Fourier Transform
DFT: Discrete Fourier Transform
FT: Fourier Transform

Sometimes people throw around the term “FFT” as if it were fully synonymous with Fourier transform. It is not.

FFT means Fast Fourier Transform, and is the name of a clever algorithm for performing a DFT efficiently. You could, if you wanted, perform the DFT using various less-efficient algorithms, and obtain a numerically-identical answer. In such a case it would still be a Fourier transform, just not an FFT.

The DFT has the interesting property that in general, if V_in consists of N complex values in the time domain, the transform produces N complex values in the frequency domain … and vice versa. In general, this number of samples is necessary and sufficient for the transform to be invertible. (There are some exceptions in cases where V_in and V_f have special symmetries.)

Let’s do an example. We consider the data shown in figure 2. The original data is the sum of two sine waves, as shown by the continuous curve in the figure. We sample the data at the 32 points indicated.

Figure 2: Time-Series Data

When we take the discrete Fourier transform, we obtain 32 points, namely the 17 points shown in figure 3, plus the mirror-image points at negative frequencies. The spacing between abscissas is df = 1/(32 seconds), i.e. 31.25 millihertz. Note that the ordinate is the magnitude of V_f, so we are not plotting the power spectrum, but rather the square root of the power spectrum.


Figure 3: Discrete Fourier Transform		Figure 4: High-Resolution Fourier Transform

We are now in position to discuss an important concept. Although, as previously mentioned, the N complex values are sufficient to make the transform mathematically invertible, invertibility is not the only consideration. The spectrum often has other uses, notably helping us to understand and communicate important facts about the signal. Plotting N points (as in figure 3) leaves out a lot of information, as we shall see in a moment.

We can define a more informative Fourier transform as follows: Start with equation 1 and eliminate k by substituting k → f_k / df → f_k N dt in accordance with equation 18 and equation 19. Then equation 1 becomes:

V_f(f_k)

N−1

∑

j=0

V_in(t_j) exp(−2 π i j f_k dt) dt

V_f(f)

N−1

∑

j=0

V_in(t_j) exp(−2 π i j f dt) dt

for all f

(28)

where the second line is a generalization, valid for all f whatsoever, not just integral multiples of df. This is illustrated in figure 4. You can see that figure 4 has vastly more detail than figure 3. This additional detail does not come from using more raw time-domain data; rather, it comes from using the existing data in more-informative ways. Both figures are based on the same 32-point time-series data.

You can see that equation 28 does not have (or need) any notion of df. This shouldn’t be surprising. The wave function W_f(t) as given by equation 25 is well defined for all times and for all frequencies. It is defined for all frequencies (f_k) that happen to fit an integer number of cycles into the span of data (N dt) and for all other frequencies besides. It is defined for times (t_j = j dt) that coincide with the data samples and for all other times besides. Equation 28 requires sampling the time variable to coincide with the data, but there is no requirement to sample the frequency variable.

The relationship between figure 3 and figure 4 is easy to see if we overlay them, as in figure 5. The function V_f(⋯) defined by equation 1 is a subset of the function V_f(⋯) defined by equation 28. That is, V_f(f) is defined for all f, while V_f(f) is only defined for the isolated points f = k df. At the places where both functions are defined, they are equal.

Figure 5: High-Res Compared to Discrete Fourier Transform

Again you can see that the high-resolution transform has vastly more detail than the low-resolution transform. You may or may not be interested in this additional detail, depending on what problem you are trying to solve. Here are some common examples that illustrate the range of possibilities:

Suppose you are doing some digital filtering. You start out with N time-series points, take the transform, multiply by some filter kernel, and transform back. This works fine, and the transform doesn’t need more than N points in the frequency domain.
Suppose you are doing spectrum analysis. If you just look at figure 3, you could very easily mis-estimate the height, width, position, and symmetry of the peaks in the spectrum. You really need the additional detail as seen in figure 4.

The limitations of the low-resolution transform look even more dramatic if we plot the power (rather than the magnitude), as in figure 6.

Figure 6: High-Res Power Spectrum Compared to Discrete Fourier Transform

You can see that making a good spectrum analyzer is quite a bit trickier than simply making a call to the standard FFT subroutine. The following sections discuss various ways of increasing the resolution.

6.2 A Discrete High-Resolution Transform

It is possible to increase the resolution of a Fourier transform while still maintaining some discreteness. This is less mathematically sophisticated than the fully continuous transformation, but may be more convenient for computation.

Here’s one way you could do this, if you wanted to rewrite the guts of the fourier-transform routines. This is not practical for the casual user, but it is worth discussing briefly, because it sheds some light on the fundamental issues. A more practical quick-and-dirty method is discussed in section 6.3 and a practical efficient method is discussed in section 6.4.

Suppose we want to have a large number of finely-spaced points in the frequency domain. Let N_t denote the number of raw data points in the time domain, and N_f denote the number of data points in the frequency domain. We want to have N_f ≫ N_t and therefore df = 1/(N_f dt) ≪ 1/(N_t dt). For some applications, such as visualizing the height, width, position, and symmetry of spectral features, we need to enhance the resolution by a factor (N_f/N_t) of at least 4, preferably 8 or even more.

We can generalize equation 1 by replacing each N that appears there by N_f or N_t as appropriate. That gives us:

V_f(f_k)

N_t−1

∑

j=0

V_in(t_j) exp(

−2 π i j k

N_f

) dt

for all k from 0 to N_f−1

(29)

which is more general than equation 1 (but less general than equation 28). The corresponding inverse transformation is:

V_out(t_j)

N_f−1

∑

k=0

V_f(f_k) exp(

2 π i k j

N_f

) df

for all j from 0 to N_t−1

(30)

Note that things are not quite as symmetric as you might expect, since N_f appears in the denominator inside the exponential in both the forward and inverse transform. That is related to the fact that we assume we are given some immutable time-domain data, and we are trying to increase the resolution in the frequency domain. The frequency spacing is

df =

N_f dt

(31)

This, plus the fact that there are N_f frequency-domain data points passed from the forward transform to the reverse transform, is sufficient to establish the orthornormality relationships that allow the inverse transform to properly undo the forward transform.

6.3 Padding: Another Discrete High-Resolution Transform

As another way to formulate the high-resolution Fourier transform, some people find it conceptually simpler to use equation 1, equation 2, and equation 19 as they stand, with the understanding that every N that appears there is N_f not N_t. That requires you to pad the time-domain data with N_f−N_t zeros, so that there are N_t “raw” time-domain points and a total of N_f “cooked” time-domain points. Such padding means it no longer matters whether the summation in equation 1 runs up to N_t−1 or N_f−1.

Reference 1 uses this trick, i.e. padding the input data with lots of zeros. This has the advantage of being easy to remember and easy to code. It is somewhat inelegant. It is less efficient than the method discussed in section 6.4, but not grossly inefficient, so unless you are doing huge numbers of transforms, it may not be worth your trouble to improve on it. (This assumes you are passing the data to a “fast” Fourier transform routing. In contrast, if you are using a brute-force un-fast Fourier transform, as in section 7, you will pay a high price for zero-padding.)

If you take the limit of large N_f in equation 29, it becomes effectively equivalent to equation 28. This reminds us of the key mathematical fact: The Fourier transform is well defined for any frequency whatsoever, even if you have a limited amount of time-domain data.

6.4 Heterodyning: A Smarter High-Resolution Transform

Some people question the validity of the generalization from equation 1 to equation 28, and also question the validity of padding the data with zeros. (After all, the real data isn’t zero, and anything involving fake data must be suspect.)

So, here is yet another method that obtains the same result. It should be pretty clear that this method is mathematically well-founded. Another advantage is that it is more computationally efficient. Indeed it is as efficient as it possibly could be.

The first ingredient in this method is the theorem that says multiplying two functions in the time-domain corresponds to a convolution in the frequency domain. The second ingredient is the observation that a frequency-shift can be considered a convolution, namely a convolution with a delta-function. The inverse Fourier transform is particularly simple; it is a trigonometric wavefunction of the form given by equation 25, where the f in that equation is the amount of the desired frequency-shift. So, the plan is to multiply the raw time-series data by a function of this form.

In the electronics and signal-processing field, multiplying by such a signal is called heterodyning. We will denote the amount of shift by f_LO where “LO” stands for Local Oscillator.

You can verify that this works as advertised by doing some examples, perhaps using a spreadsheet as discussed in section 7.5.

If we heterodyne the raw data using a frequency shift that happens to be equal to df (i.e. the conventional “natural” spacing of points in the frequency domain), then it just shifts the frequency-domain data over one step. This doesn’t tell us anything new.

Things get much more interesting if we shift things by a fraction of df, let’s say a tenth of df for example, so that f_LO = 0.1df. This shifts the frequency data by a tenth of a step. What does that mean? Well, recall that the basic discrete Fourier transform samples the continuous transform at the conventional points k df, as shown in figure 5. Sampling the heterodyned data at the conventional points is mathematically equivalent to sampling the un-heterodyned data at the unconventional points

f_k = kdf − f_LO (32)

which in this case is (k−0.1)df.

Note the minus sign in equation 32. The logic here is that when we multiply the raw data by a local-oscillator wave with positive frequency f_LO, any feature that existed in the raw data lies at a higher frequency in the cooked data. Therefore if we apply a conventional FFT to the cooked data and see a feature at frequency kdf, we should associate that feature with a lower frequency kdf − f_LO in the raw data. You can verify this numerically, as discussed in section 7.5.

Heterodyning gives us a highly quibble-resistant way of ascertaining what is going on “between” the conventional frequency-domain sampling points.

It does not require padding the data.
It does not require even the slightest change to the definition or implementation of the discrete Fourier transform.
It is as computationally efficient as it could possibly be. Whereas the ordinary FFT scales like Nlog(N), this type of hi-res transform scales like N_flog(N_t). This is an improvement of the padding trick, which scales like N_flog(N_f).

It is straightforward to prove (and/or verify numerically) that the methods suggested in section 6.2 and section 6.3 produce the exact same end results as the heterodyne method. This means that those methods were mathematically well founded all along, even if that wasn’t obvious until now.

7 Exercises using a Spreadsheet

Reference 3 is a spreadsheet that implements a Fourier transform, directly from the definition (equation 1) without calling any canned FFT library routines. This may help make the transforms less mysterious.

7.1 Basic Forward Transform

We start with N points of time-series data. N=8 should be enough for a start. (Later you can move on to N=32 if you are ambitious.) You can enter whatever ordinates you want in cells B12:C19 of the spreadsheet. The off-the-shelf version of the spreadsheet sets the input data to a wavefunction with frequency determined by the “fx” variable in cell C5. Feel free to change fx, or to completely replace the input data in cells B12:C19.

The time-domain cells are color-coded light blue. Similarly the graph of the time-domain data has a light blue background. In contrast, frequency-domain cells and the frequency-domain graph are color-coded light red.

Slightly to the right of time-series inputs, there is a rectangular table, N wide by N high, color-coded light green. It has time running vertically and frequency running horizontally.

Each cell of this table implements the complex exponential in equation 1. Since spreadsheets are not smart about complex numbers, we actually need two N×N tables, one for the real part and one for the imaginary part of the complex exponential … i.e. a cosine table and a sine table.

To implement the rest of the summation called for in equation 1, we use the sumproduct() function (rather than just the sum() function). The results are placed below the table, in cells F24:M25. These are color-coded light red, since they are frequency-domain data. There is also a graph of this data.

Note that there are N nontrivial frequency-domain points. That is, there is no “folding” or mirror-imaging. This is related to the fact that the input data had an imaginary part as well as a real part. (Without the imaginary part it would have been impossible to tell the difference between positive frequency and negative frequency, as you can easily verify by zeroing out the imaginary part of the input time-series data.) Actually I like to plot N+1 frequency-domain points, so that the plot better reflects the periodicity of the data.

We calculate the time-domain abscissas in cells A12:A19 based on the parameter dt in cell C4. We also calculate the frequency-domain abscissas in cells F23:M23.

We calculate the energy of the time-domain data and the energy of the frequency-domain data. These agree, as they should, in accordance with Parseval’s theorem. (The fact that we correctly calculated the abscissas is a big help when calculating the energy.)

It is amusing to change dt. Observe the effect on the abscissas, ordinates, and energies (in both the time domain and the frequency domain).

7.2 Inverse Transform

The inverse transform is implemented using the same technique. The complex exponential is in the yellow tables. The output of the inverse transform can be found in cells B33:C40. Below these cells is a check to confirm that the time-domain output agrees with the time-domain input.

7.3 Higher-Resolution Transform

If there are N_D data points, there is absolutely nothing preventing you from implementing more than N_D columns in the aforementioned tables. For example, you could easily double the number of columns (with a corresponding decrease in the column-to-column frequency spacing).

This produces higher resolution in the transform.

I implemented this in reference 4. There are still N_D=8 time-domain data points, but there are now N=12 columns. The blue transformation tables are of size 8×12.

You can see that this is formally equivalent to padding the time-domain data with zeros and then using a 12×12 table. The spreadsheet omits the bottom four rows of the 12×12 since there is no point in doing a lot of multiplication by zero.

This also makes the point that Fourier transforms do not require the number of data points to be a power of two. Some canned FFT routines do require this, but the better ones do not. The FFT (if done right) is efficient for any highly-composite number, that is, any number that is the product of small primes. For example, if you have 3⁷=2187 data points, it is more efficient to process that directly as 3⁷, rather than rounding it up to 4¹²=4096. And if you are using a non-fast Fourier transform, such as the direct brute-force Fourier transform in reference 4, then you can use any arbitrary number of points.

In the example in reference 4, since the frequency of the input data is 1/12th of a Hertz, fiddling with the resolution gives a better measure of the height of the peak in the frequency-domain data, since it allows the peak to fall on one of the calculated points.

More generally, though, increasing the resolution by a factor of 1.5 isn’t enough to accomplish much. If you are trying to measure the height and width of peaks, you need to oversample the frequency-domain data by at least a factor of 4, preferably a factor of 8 or more. If you want to see the advantages of higher resolution, look at figure 5 not reference 4 (or modify your copy of reference 4 to use vastly more columns).

7.4 Inverse of the High-Res Transform

What do you think will happen if we take the inverse of the high-resolution transform?

The spreadsheet in reference 4 contains an example of this. (You can think of this as the combination of section 7.2 with section 7.3.)

7.5 Heterodyning

The spreadsheet in reference 5 implements heterodyning, of the sort discussed in section 6.4.

In the raw data, there is a peak at 0.1 Hz. This falls between steps in the conventional basic discrete Fourier transform, which has a stepsize of 0.125 Hz (i.e. 1/8th of a Hz, as expected, since there are 8 raw data points 1 second apart).

By using a LO frequency of 0.025 we can get a good look at the height of this peak. By using LO frequencies in the neighborhood of 0.025 we can learn about the width of this peak.

Feel free to change the LO frequency in cell G7 and see what happens. Values near 0 are interesting, and values near 0.025 are interesting.

In the frequency-domain plot (light red background), the red curve is the real part of V_f and the black curve is the imaginary part. The solid circles represent points calculated using heterodyning, while the circles with white centers represent points calculated without heterodyning, i.e. by applying the basic transform to the raw data. You can verify that the raw data and cooked data coincide if you set the LO frequency is zero.

This is just an introductory demonstration of some features of heterodyning. In a real application, you wouldn’t use a spreadsheet, and you would use the heterodyning trick repeatedly with many different LO frequencies (not just once as in this spreadsheet).

7.6 Remarks

The so-called "fast" Fourier transform is indeed relatively efficient. If there are N data points, it requires an amount of work that scales like Nlog(N), in contrast to the brute-force method used by the spreadsheet exercises, which obviously scales like N².
For large N, the brute-force method is not appropriate, but for small pedagogical examples it works just fine.
I think there is some pedagogical value in the way the spreadsheets represent the time-domain data as column vectors and the frequency-domain data as row vectors.
This is neither entirely the same nor entirely different from the way ordinary linear algebra uses column vectors and row vectors to represent contravariant and covariant vectors, i.e. pointy vectors and one-forms. One similarity is that if you scale up the intervals in the time-domain data, the intervals in the frequency-domain data get scaled down. One dissimilarity is that there is not (so far as I know) a meaningful contraction operation connecting the time-domain and frequency-domain data. (If this paragraph doesn’t mean anything to you, don’t worry about it.)
There is some pedagogical value in implementing the transform via a spreadsheet, because there are quite a few people who are willing to work with spreadsheets but are morbidly averse to using traditional computer languages.

8 Fourier Transform versus Frequency Analysis

Fourier transforms, while useful, are not the be-all and end-all when it comes to analyzing periodic or quasi-periodic data.

You know the saying: «If your only tool is a hammer, everything starts to look like a nail.» People are so accustomed to using Fourier transforms that sometimes they think the Fourier transform “defines” what we mean by frequency content. It doesn’t.

8.1 Periodic Continuation

As an example, consider the free-induction decay that is the raw data in a typical pulsed NMR (FT-NMR) experiment. It has a nice sinusoidal part, but that is convolved with a wicked sawtooth-shaped envelope due to the sudden rise and gradual decay. The Fourier transform, by its nature, responds to the envelope in ways that interfere with simple interpretation of the frequency-domain data. You can say that the sinusoidal component of the time-domain data is represented by the center of the peak, and the decay of the envelope is represented by the width of the peak … but what about the steep rise at the beginning of the pulse? That causes weird artifacts.

There are additional artifacts if the apparatus does not capture the entire tail of the decay.

Apodization is a kludge to camouflage the worst of the artifacts. Trying to understand apodization may not be worth the trouble.

Remember that a Fourier transform is basically a linear analysis, followed perhaps by some nonlinearity in the peak-finding step.

At some point, you may want to give up on the FT method entirely, and just do a full-blown nonlinear analysis. This takes a ton of computer power, and wasn’t possible in Fourier’s day (or even 20 years ago), but it is possible now.

In particular, consider the time-series data in figure 2. Would you say that there "is" power at the frequencies where there are wiggly feet in figure 4? The FT power spectrum says there “is” energy at these frequencies, but other types of analysis (such as curve-fitting a sum of sine waves to the raw data) would say no.

Similarly, what about the width of the peaks in figure 4? The Fourier transform method attributes some nonzero width to the peaks. However, curve-fitting could establish that there are really just two sine waves, one at frequency 0.0625 and the other at frequency 0.3, with virtually no uncertainty about these frequencies, and with no residuals i.e. no other frequency content.

As another way to look at the same idea, the Fourier method is almost tantamount to assuming that the input data is periodic in the time domain, with period N dt. We can understand this with the help of figure 7.

In the figure,⁵ the red curve represents the real data. However, only the first 8 points (t=0 through t=7 inclusive) are actually observed, as shown by the solid dark red curve. Alas, the data has a frequency of 0.1 Hz, which is not commensurate with the 8-second span of the data. (A frequency of 0.125 Hz would have been commensurate.)

Figure 7: Wave Incommensurate with the Span of the Data

The Fourier transform method assumes the observed data will be periodically continued, as shown by the black curve. In this case there is a significant misfit between the unobserved real data and the periodic continuation of the observed data.

So in general, if your dataset doesn’t have a reasonable periodic continuation, or if it has a period incommensurate with N dt, then the whole Fourier approach is on shaky ground. Apodization may camouflage the worst of the problems, but you’ve still got problems. The real data in figure 7 is a perfectly nice sine wave, but the periodic continuation is not. This is a fundamentally unfixable problem, because the Fourier method blindly depends on the periodic continuation.

8.2 Not Every Spectrum is a Fourier Transform

As another example, consider the so-called Fourier Transform Infra Red spectrometer (FTIR). The name FTIR is a misnomer in some ways. Non-experts commonly imagine that there is some sort of Fourier transform / inverse transform pair is involved, but there isn’t.

The crucial part of the machine is a Michelson interferometer, which performs an autocorrelation – not a Fourier transform. An autocorrelation is related to a Fourier transform via the Wiener-Khinchin theorem, but “related to” is not the same as “same as”.

In particular, the spectrum coming out of an FTIR machine shows a so-called “zero burst” which cannot be explained in terms of Fourier transforms, but is exactly what you would expect from an autocorrelation.

8.3 Correspondence

In the spirit of the correspondence principle, let’s examine the relationship between a discrete Fourier transform and a non-discrete Fourier transform. We start with a discrete transform and see what happens:

As previously discussed: When dt becomes small, the frequency-range of the first period of the output becomes large, so it covers a big piece of the f-axis. (The output is always defined for all frequencies, so we can’t talk about the range of the output, only the range of one period.)
Obviously as N_tdt becomes large, the input data covers a big piece of the t-axis.
As previously discussed: In a discrete transform (such as an FFT), when N_fdt becomes large, the result of the transform (i.e. V(f)) becomes nearly continuous. The period (in frequency space) is being divided very finely. From a more sophisticated view, the result of the transform was completely continuous all along, as you can see from equation 34.
For present purposes, let’s assume N_f = N_t, so that the transformation will be “just barely” invertible in both directions. Call this common value N.
If you take both limits together – small dt and large Ndt – the discrete Fourier transform begins to look a whole lot like the old-fashioned non-discrete Fourier transform.

We can do another correspondence check, going in the opposite direction, i.e. starting with a continuous Fourier integral and converting it to a discrete sum. In a certain sense, that’s easy, because a sum is always a special case of an integral. From a measure-theory point of view, a sum is just an integral with a funny weighting function.

In particular, we can use the Dirac delta-function (δ) to apply unit weight to the places where we sample the waveform, and zero weight elsewhere. Examples include:

∫

f(x) δ(x − x₁) dx

f(x₁)

∫	f(x) [δ(x − x₁) + δ(x − x₂)] dx

f(x₁) + f(x₂)

∫

f(x) [

∑

δ(x − x_i)] dx

∑

f(x_i)

(33)

The first line of equation 33 is the defining property of the δ function. Once you buy that, the rest follows immediately by linearity. An integral is a linear operator.

If we plug this idea into the Fourier transform for continuous data (equation 3), we obtain the corresponding expression for sampled data:

V_f(f)

∫

V_in(t) [

∑

δ(t−t_i)] exp(−2 π i f t) dt

∑

V_in(t_i) exp(−2 π i f t_i)

(34)

which is the desired result. In the last line, we are free to write V_in(t_i) in place of V_in(t_i), because they are numerically equal. We can understand this relationship as follows: We can always express a function as a set of ordered pairs.¹ As such, the function V_in() is is a subset of the function V_in(). That is, {(t_i, V_in(t_i))} ⊂ {(t, V_in(t))}. If we know V_in() it will tell us V_in() but not vice versa.

Note that even when the input consists of discrete samples, the result V_f() is defined for all frequencies.

The V_f on the LHS of equation 34 is not numerically equal to the V_f on the LHS of equation 3. The data being transformed is different, so we expect the results of the transform to be different.

9 Summary

When using Fourier transforms, you often need to scale the ordinates, if you want them to make sense. Section 2.1 explains how to do this.
Very often, you also need to worry about the abscissas, especially if you want to invoke Parseval’s theorem or anything else that depends on the normalization. Section 2.2 explains how to do this.
If you have an opinion about what constitutes “positive” frequency, that determines the sign of i in the definition of the transform.
The amount of resolution need to make the transformation invertible is far, far less than the amount of resolution needed for determining the height and width of peaks in the data. There are easy ways of increasing the resolution, as explained in section 6.
There is some pedagogical value in implementing the transformation via a spreadsheet, so you can see all the steps involved.
Fourier transform is not synonymous with optimal frequency analysis, especially in cases where the data does not have a period commensurate with the span of the data (N dt).

10 References

: 1.
scilab code for Fourier transforms ./fourier-norm.sci
: 2.
Wikipedia article: “Inner Product” http://en.wikipedia.org/wiki/Inner_product
: 3.
Spreadsheet demonstrating Fourier transforms ./fourier-demo.xls
: 4.
Spreadsheet demonstrating Higher-Resolution Fourier transforms:
./fourier-demo-hi.xls
: 5.
Spreadsheet demonstrating Heterodyne Fourier transforms:
./fourier-hetero.xls

1: Specifically, in all formality, a function is, by definition, a set of ordered pairs such that for each abscissa there is a unique ordinate.

[Contents]