Suppose we have a vector Vin, representing a voltage waveform sampled at successive points in time. We can take the Fourier transform
| Vf = FT[Vin] (1) |
and we can take the inverse transform:
| Vout = IFT[Vf] (2) |
Let’s do an example. Suppose that our input data Vin is a Kronecker delta function in the time domain, that is:
| Vin = [1, 0, 0, ⋯⋯, 0] (3) |
The way the transform is defined in typical textbooks, the normalization is such that Vf will be all ones:
| Vf = [1, 1, 1, ⋯⋯, 1] (4) |
This same normalization is also used in typical FFT implementations, as in canned function libraries and in programs such as scilab or matlab.
The funny thing is that when the transform is defined this way, the inverse transform is defined to have a different normalization. As an example, if Vf is a Kronecker delta function in the frequency domain, and we take the inverse transform, Vout is not all ones; it is smaller by a factor of N, where N is the number of samples.
This leads to some pointed questions:
| ||Vin|| = ||Vf|| = ||Vout|| (5) |
so I guess we need to ask how to take the norm of such vectors.
Here are some answers. Reference 1 contains the scilab code to implement the solutions recommended here.
| dt = 1 millisecond (6) |
for a total of
| N = 2000 samples (7) |
Then the abscissas in the frequency domain will have a spacing of
| (8) |
| Vin = [1, 0, 0, 0, .... , 0] (9) |
then the forward transform should produce
| (10) |
which differs by a nontrivial factor from the Vf in equation 4.
| Vf = [1, 0, 0, 0, .... , 0] (11) |
then the inverse transform should be
| (12) |
which differs by a couple of nontrivial factors from what you get from ordinary software packages. In particular, when using the scilab or matlab inverse FT, the software applies a factor of 1/N automagically to the ordinates it computes, so you need to undo that (by multiplying by a factor of N) before applying the factor of df. By the way, note that multiplying by (N df) is the same as dividing by dt.
| (13) |
and the corresponding inverse transformation is:
| (14) |
In the time domain, the abscissa of the jth point is the time
| tj = j dt (15) |
Presumably you know dt from looking at your data. Meanwhile, in the frequency domain, the abscissa of the kth point is the frequency
| fk = k df (16) |
which we calculate via
| df = |
| (17) |
I included in equation 13 and equation 14, I included factors of dt and df, respectively. These factors do not appear in the usual “textbook” definition of the transforms, but they have several advantages. For one thing, this makes the units come out right, in accordance with item 1. Secondly, this allows a high degree of symmetry between equation 13 and equation 14. In particular, I don’t need the ugly factor of 1/N that conventionally appears in the inverse transform.
(Don’t overlook the qualifiers “basic” and “discrete” that apply to this basic discrete Fourier transform. Equation 13 is not the most general notion of Fourier transform, as will be discussed in section 3.)
| (18) |
nor
| (19) |
Think about the Riemann integral: You don’t just sum up a bunch of ordinates; you need to weight each term in the sum by the measure of the corresponding abscissas.
Therefore the right answer is:
| (20) |
If you actually work out these quantities, using the Fourier transform as defined in equation 13, then the norm of the original data is equal to the norm of the transformed data, which means we are upholding Parseval’s theorem.
That is, the integral transforms are defined by
| (21) |
Note that a big part of the trick is to think of frequencies (f) in circular measure (cycles per second, i.e. Hz) instead of frequencies (ω) in angular measure (radians per second).
This is somewhat a matter of taste, but I recommend choosing the sign of the imaginary unit (i) the way it appears in equation 13 and equation 21. This puts a minus sign in the expression defining the forward transform, and not in the inverse transform, which may not be the first thing you would have guessed. The advantage is that we want the signal
| W(f; t) = exp(+2 π i f t) (22) |
to be a wave with the frequency f. We do not want a minus sign in equation 22.
Note that the scilab program follows the convention I recommend. Some other software packages do not.
(If you are taking the transform of data that doesn’t have an imaginary part, you can’t tell the difference between positive frequency and negative frequency anyway, in which case it doesn’t much matter how you choose the sign of i.)
Non-experts sometimes want to have their cake and eat it too. That is, they want to have a plus sign in the exponential in equation 22 and a plus sign in the exponential in equation 13 also. This is not possible, because equation 13 has the form and function of an inner product. As discussed at e.g. reference 2, when complex numbers are involved an inner product must have the property that ⟨x,y⟩ = ⟨y,x⟩*, where the * denotes complex conjugate. There are many good reasons for this, including the fact that we want the inner product of something with itself to be a real number.
We now present several ways of obtaining a high-resolution Fourier transform. Section 3.1 presents the most advanced result, namely frequency-domain abscissas that are continuous, not discrete. Section 3.2 and section 3.3 present simple methods that are discrete, but with higher resolution due to finer spacing. Section 3.4 presents yet another method for obtaining the same result, in a way that is more mathematically sophisticated and more computationally efficient.
In all cases we are using the same raw time-domain data; we are improving the resolution of the frequency-domain results by using the data more cleverly.
Let’s talk about the following:
Sometimes people throw around the term “FFT” as if it were fully synonymous with Fourier transform. It is not.
FFT means Fast Fourier Transform, and is the name of a clever algorithm for performing a DFT efficiently. You could, if you wanted, perform the DFT using various less-efficient algorithms, and obtain a numerically-identical answer. In such a case it would still be a Fourier transform, just not an FFT.
The DFT has the interesting property that in general, if Vin consists of N complex values in the time domain, the transform produces N complex values in the frequency domain … and vice versa. In general, this number of samples is necessary and sufficient for the transform to be invertible. (There are some exceptions in cases where Vin and Vf have special symmetries.)
Let’s do an example. We consider the data shown in figure 1. The original data is the sum of two sine waves, as shown by the continuous curve in the figure. We sample the data at the 32 points indicated.
When we take the discrete Fourier transform, we obtain 32 points, namely the 17 points shown in figure 2, plus the mirror-image points at negative frequencies. The spacing between abscissas is df = 1/(32 seconds), i.e. 31.25 milliHertz. Note that the ordinate is the magnitude of Vf, so we are not plotting the power spectrum, but rather the square root of the power spectrum.
We are now in position to discuss an important concept. Although, as previously mentioned, the N complex values are sufficient to make the transform mathematically invertible, invertibility is not the only consideration. The spectrum often has other uses, notably helping us to understand and communicate important facts about the signal. Plotting N points (as in figure 2) leaves out a lot of information, as we shall see in a moment.
We can define a more informative Fourier transform as follows: Start with equation 13 and eliminate k by substituting k → fk/df → fk N dt in accordance with equation 16 and equation 17. That gives us:
| (23) |
This function VF is illustrated in figure 3.
You can see that figure 3 has vastly more detail than figure 2. This additional detail does not come from using more raw time-domain data; rather, it comes from using the existing data in more-informative ways. Both figures are based on the same 32-point time-series data.
You can see that equation 23 does not have (or need) any notion of df. This shouldn’t be surprising. The wave function W(f; t) as given by equation 22 is well defined for all times and for all frequencies. It is defined for all frequencies (fk) that happen to fit an integer number of cycles into the span of data (N dt) and for all other frequencies besides. It is defined for times (tj = j dt) that coincide with the data samples and for all other times besides. Equation 23 requires sampling the time variable to coincide with the data, but there is no requirement to sample the frequency variable.
The relationship between figure 2 and figure 3 is easy to see if we overlay them, as in figure 4. The function Vf(⋯) defined by equation 13 is a subset of the function VF(⋯) defined by equation 23. That is, VF(f) is defined for all f, while Vf(f) is only defined for the isolated points f = k df. At the places where both functions are defined, they are equal.
Again you can see that the high-resolution transform has vastly more detail than the low-resolution transform. You may or may not be interested in this additional detail, depending on what problem you are trying to solve. Here are some common examples that illustrate the range of possibilities:
The limitations of the low-resolution transform look even more dramatic if we plot the power (rather than the magnitude), as in figure 5.
You can see that making a good spectrum analyzer is quite a bit trickier than simply making a call to the standard FFT subroutine.
It is possible to increase the resolution of a Fourier transform while still maintaining some discreteness. This is less mathematically sophisticated than the fully continuous transformation, but may be more convenient for computation.
Here’s one way to do this: Suppose we want to have a large number of finely-spaced points in the frequency domain. Let Nt denote the number of raw data points in the time domain, and Nf denote the number of data points in the frequency domain. We want to have Nf≫Nt and df=1/(Nf dt≪1/(Nt dt. For some applications, such as visualizing the height, width, position, and symmetry of spectral features, we need to enhance the resolution by a factor (Nf/Nt) of at least 4, preferably 8 or even more.
We can generalize equation 13 by replacing each N that appears there by Nf or Nt as appropriate. That gives us:
| (24) |
which is more general than equation 13 (but less general than equation 23). The corresponding inverse transformation is:
| (25) |
Note that Nf appears in the denominator inside the exponential in both the forward and inverse transform. That is related to the fact that frequency spacing is
| df = |
| (26) |
This, plus the fact that there are Nf frequency-domain data points passed from the forward transform to the reverse transform, is sufficient to establish the orthornormality relationships that allow the inverse transform to properly undo the forward transform.
As another way to formulate the high-resolution Fourier transform, some people find it conceptually simpler to use equation 13, equation 14, and equation 17 as they stand, with the understanding that every N that appears there is Nf not Nt. That requires you to pad the time-domain data with Nf−Nt zeros, so that there are Nt “raw” time-domain points and a total of Nf “cooked” time-domain points. Such padding means it no longer matters whether the summation in equation 13 runs up to Nt−1 or Nf−1.
Reference 1 uses this trick, i.e. padding the input data with lots of zeros. This has the advantage of being easy to remember and easy to code. It is somewhat inelegant. It is less efficient than the method discussed in section 3.4, but not grossly inefficient, so unless you are doing huge numbers of transforms, it may not be worth your trouble to improve on it. (This assumes you are passing the data to a “fast” Fourier transform routing. In contrast, if you are using a brute-force un-fast Fourier transform, as in section 4, you will pay a high price for zero-padding.)
If you take the limit of large Nf in equation 24, it becomes effectively equivalent to equation 23. This reminds us of the key mathematical fact: The Fourier transform is well defined for any frequency whatsoever, even if you have a limited amount of time-domain data.
Some people question the validity of the generalization from equation 13 to equation 23, and also question the validity of padding the data with zeros. (After all, the real data isn’t zero, and anything involving fake data must be suspect.)
So, here is yet another method that obtains the same result. It should be pretty clear that this method is mathematically well-founded. Another advantage is that it is more computationally efficient. Indeed it is as efficient as it possibly could be.
The first ingredient in this method is the theorem that says multiplying two functions in the time-domain corresponds to a convolution in the frequency domain. The second ingredient is the observation that a frequency-shift can be considered a convolution, namely a convolution with a delta-function. The inverse Fourier transform is particularly simple; it is a trigonometric wavefunction of the form given by equation 22, where the f in that equation is the amount of the desired frequency-shift. So, the plan is to multiply the raw time-series data by a function of this form.
In the electronics and signal-processing field, multiplying by such a signal is called heterodyning. We will denote the amount of shift by fLO where “LO” stands for Local Oscillator.
You can verify that this works as advertised by doing some examples, perhaps using a spreadsheet as discussed in section 4.5.
If we heterodyne the raw data using a frequency shift that happens to be equal to df, i.e. the conventional “natural” spacing of points in the frequency domain) then it just shifts the frequency-domain data over one step. This doesn’t tell us anything new.
Things get much more interesting if we shift things by a fraction of df, let’s say a tenth of df for example, so that fLO = 0.1df. This shifts the frequency data by a tenth of a step. What does that mean? Well, recall that the basic discrete Fourier transform samples the continuous transform at the conventional points k df, as shown in figure 4. Sampling the heterodyned data at the conventional points is mathematically equivalent to sampling the un-heterodyned data at the unconventional points
| fk = kdf − fLO (27) |
which in this case is (k−0.1)df.
Note the minus sign in equation 27. The logic here is that when we multiply the raw data by a local-oscillator wave with positive frequency fLO, any feature that existed in the raw data lies at a higher frequency in the cooked data. Therefore if we apply a conventional FFT to the cooked data and see a feature at frequency kdf, we should associate that feature with a lower frequency kdf − fLO in the raw data. You can verify this numerically, as discussed in section 4.5.
Heterdyning gives us a highly quibble-resistant way of ascertaining what is going on “between” the conventional frequency-domain sampling points.
It is straightforward to prove (and/or verify numerically) that the methods suggested in section 3.2 and section 3.3 produce the exact same end results as the heterodyne method. This means that those methods were mathematically well founded all along, even if that wasn’t obvious until now.
Reference 3 is a spreadsheet that implements a Fourier transform, directly from the definition (equation 13) without calling any canned FFT library routines. This may help make the transforms less mysterious.
We start with N points of time-series data. N=8 should be enough for a start. (Later you can move on to N=32 if you are ambitious.) You can enter whatever ordinates you want in cells B12:C19 of the spreadsheet. The off-the-shelf version of the spreadsheet sets the input data to a wavefunction with frequency determined by the “fx” variable in cell C5. Feel free to change fx, or to completely replace the input data in cells B12:C19.
The time-domain cells are color-coded light blue. Similarly the graph of the time-domain data has a light blue background. In contrast, frequency-domain cells and the frequency-domain graph are color-coded light red.
Slightly to the right of time-series inputs, there is a rectangular table, N wide by N high, color-coded light green. It has time running vertically and frequency running horizontally.
Each cell of this table implements the complex exponential in equation 13. Since spreadsheets are not smart about complex numbers, we actually need two N×N tables, one for the real part and one for the imaginary part of the complex exponential … i.e. a cosine table and a sine table.
To implement the rest of the summation called for in equation 13, we use the sumproduct() function (rather than just the sum() function). The results are placed below the table, in cells F24:M25. These are color-coded light red, since they are frequency-domain data. There is also a graph of this data.
Note that there are N nontrivial frequency-domain points. That is, there is no “folding” or mirror-imaging. This is related to the fact that the input data had an imaginary part as well as a real part. (Without the imaginary part it would have been impossible to tell the difference between positive frequency and negative frequency, as you can easily verify by zeroing out the imaginary part of the input time-series data.) Actually I like to plot N+1 frequency-domain points, so that the plot better reflects the periodicity of the data.
We calculate the time-domain abscissas in cells A12:A19 based on the parameter dt in cell C4. We also calculate the frequency-domain abscissas in cells F23:M23.
We calculate the energy of the time-domain data and the energy of the frequency-domain data. These agree, as they should, in accordance with Parseval’s theorem. (The fact that we correctly calculated the abscissas is a big help when calculating the energy.)
It is amusing to change dt. Observe the effect on the abscissas, ordinates, and energies (in both the time domain and the frequency domain).
The inverse transform is implemented using the same technique. The complex exponential is in the yellow tables. The output of the inverse transform can be found in cells B33:C40. Below these cells is a check to confirm that the time-domain output agrees with the time-domain input.
If there are ND data points, there is absolutely nothing preventing you from implementing more than ND columns in the aforementioned tables. For example, you could easily double the number of columns (with a corresponding decrease in the column-to-column frequency spacing).
This produces higher resolution in the transform.
I implemented this in reference 4. There are still ND=8 time-domain data points, but there are now N=12 columns. The blue transformation tables are of size 8×12.
You can see that this is formally equivalent to padding the time-domain data with zeros and then using a 12×12 table. The spreadsheet omits the bottom four rows of the 12×12 since there is no point in doing a lot of multiplication by zero.
This also makes the point that Fourier transforms do not require the number of data points to be a power of two. Some canned FFT routines do require this, but the better ones do not. The FFT (if done right) is efficient for any highly-composite number, that is, any number that is the product of small primes. For example, if you have 37=2187 data points, it is more efficient to process that directly as 37, rather than rounding it up to 412=4096. And if you are using a non-fast Fourier transform, such as the direct brute-force Fourier transform in reference 4, then you can use any arbitrary number of points.
In the example in reference 4, since the frequency of the input data is 1/12th of a Hertz, fiddling with the resolution gives a better measure of the height of the peak in the frequency-domain data, since it allows the peak to fall on one of the calculated points.
More generally, though, increasing the resolution by a factor of 1.5 isn’t enough to accomplish much. If you are trying to measure the height and width of peaks, you need to oversample the frequency-domain data by at least a factor of 4, preferably a factor of 8 or more. If you want to see the advantages of higher resolution, look at figure 4 not reference 4 (or modify your copy of reference 4 to use vastly more columns).
What do you think will happen if you take the inverse of the high-resolution transform?
The spreadsheet in reference 5 implements heterodyning, of the sort discussed in section 3.4.
In the raw data, there is a peak at 0.1 Hz. This falls between steps in the conventional basic discrete Fourier transform, which has a stepsize of 0.125 Hz (i.e. 1/8th of a Hz, as expected, since there are 8 raw data points 1 second apart).
By using a LO frequency of 0.025 we can get a good look at the height of this peak. By using LO frequencies in the neighborhood of 0.025 we can learn about the width of this peak.
Feel free to change the LO frequency in cell G7 and see what happens. Values near 0 are interesting, and values near 0.025 are interesting.
In the frequency-domain plot (light red background), the red curve is the real part of Vf and the black curve is the imaginary part. The solid circles represent points calculated using heterodyning, while the circles with white centers represent points calculated without heterodyning, i.e. by applying the basic transform to the raw data. You can verify that the raw data and cooked data coincide if you set the LO frequency is zero.
This is just an introductory demonstration of some features of heterodyning. In a real application, you wouldn’t use a spreadsheet, and you would use the heterdyning trick repeatedly with many different LO frequencies (not just once as in this spreadsheet).
For large N, the brute-force method is not appropriate, but for small pedagogical examples it works just fine.
This is neither entirely the same nor entirely different from the way ordinary linear algebra uses column vectors and row vectors to represent contravariant and covariant vectors, i.e. pointy vectors and one-forms. One similarity is that if you scale up the intervals in the time-domain data, the intervals in the frequency-domain data get scaled down. One dissimilarity is that there is not (so far as I know) a meaningful contraction operation connecting the time-domain and frequency-domain data. (If this paragraph doesn’t mean anything to you, don’t worry about it.)
Fourier transforms, while useful, are not the be-all and end-all when it comes to analyzing periodic or quasi-periodic data.
You know the saying: «If your only tool is a hammer, everything starts to look like a nail.» People are so accustomed to using Fourier transforms that sometimes they think the Fourier transform “defines” what we mean by frequency content. It doesn’t.
As an example, consider the free-induction decay that is the raw data in a typical pulsed NMR (FT-NMR) experiment. It has a nice sinusoidal part, but that is convolved with a wicked sawtooth-shaped envelope due to the sudden rise and gradual decay. The Fourier transform, by its nature, responds to the envelope in ways that interfere with simple interpretation of the frequency-domain data. You can say that the sinusoidal component of the time-domain data is represented by the center of the peak, and the decay of the envelope is represented by the width of the peak … but what about the steep rise at the beginning of the pulse? That causes weird artifacts.
There are additional artifacts if the apparatus does not capture the entire tail of the decay.
Apodization is a kludge to camouflage the worst of the artifacts. Trying to understand apodization may not be worth the trouble.
Remember that a Fourier transform is basically a linear analysis, followed perhaps by some nonlinearity in the peak-finding step.
At some point, you may want to give up on the FT method entirely, and just do a full-blown nonlinear analysis. This takes a ton of computer power, and wasn’t possible in Fourier’s day (or even 20 years ago), but it is possible now.
In particular, consider the time-series data in figure 1. Would you say that there "is" power at the frequencies where there are wiggly feet in figure 3? The FT power spectrum says there “is” energy at these frequencies, but other types of analysis (such as curve-fitting a sum of sine waves to the raw data) would say no.
Similarly, what about the width of the peaks in figure 3? The Fourier transform method attributes some nonzero width to the peaks. However, curve-fitting could establish that there are really just two sine waves, one at frequency 0.0625 and the other at frequency 0.3, with virtually no uncertainty about these frequencies, and with no residuals i.e. no other frequency content.
As another way to look at the same idea, the Fourier method is almost tantamount to assuming that the input data is periodic in the time domain, with period N dt. Figure 6 shows an example of some time-series data consisting of eight points, one second apart.1 The wave has a frequency of 0.1 Hz, which is not commensurate with the 8-second span of the data. (A frequency of 0.125 Hz would have been commensurate.)
If you just look at points 0 through 7 inclusive, which are the real data, everything looks fine. Alas, the Fourier transform method essentially assumes that everything will be periodically continued, and in this case there is a significant misfit between the end of the first period and the beginning of the second period, as you can see in figure 7.
So in general, if your dataset doesn’t have a reasonable periodic continuation, or if it is periodic with a period incommensurate with N dt, then the whole Fourier approach is on shaky ground. Apodization may camouflage the worst of the problems, but you’ve still got problems. Figure 6 is a perfectly nice sine wave, but figure 7 is not, and the Fourier method treats them the same.
As another example, consider the so-called Fourier Transform Infra Red spectrometer (FTIR). The name FTIR is a misnomer in some ways. Non-experts commonly imagine that there is some sort of Fourier transform / inverse transform pair is involved, but there isn’t.
The crucial part of the machine is a Michelson interferometer, which performs an autocorrelation – not a Fourier transform. An autocorrelation is related to a Fourier transform via the Wiener-Khinchin theorem, but “related to” is not the same as “same as”.
In particular, the spectrum coming out of an FTIR machine shows a so-called “zero burst” which cannot be explained in terms of Fourier transforms, but is exactly what you would expect from an autocorrelation.