Copyright © 2020 jsd

Fitting an Exponential to Noisy Data

*John Denker*

When an exponential process is producing only a few events, it requires a bit of skill to model the process. The methods discussed here are quite general. They could apply to radioactive decay, or to the growth of yeast in bread or beer, or to the spread of disease in some local area.

Figure 1 shows an exponential fitted to some data. You can see that there is noise in the data. Also, the fitted function (red line) does not exactly agree with the ideal (dashed blue line). To understand what’s going on, let’s discuss where the data came from.

- There is an ideal exponential, of the form y = a · 2^{bx}, where y is the number of events, x is the independent variable, a is the prefactor, and b is the growth rate. In all our examples, the ideal growth rate is b=1. In the particular example shown in figure 1, the prefactor is a=2.

- The ideal exponential does not generate the data directly, but rather sets the rate for a Poisson process. There is a different Poisson process for each x-value. We draw one sample from each, and record that as the data.

  We use a Poisson process to account for the fact that we always have an integer number of events. We can’t have a fractional number of radioactive decays, or a fractional number of deaths due to a disease. For example, at x=−2, the Poisson rate is 0.5, so we would expect to see 0 events sometimes, and 1 event sometimes, and rarely more than that. At x=2, the Poisson rate is 8, so we would expect to see about 8 events, plus or minus a few.

- We now perform the curve fit. This process knows nothing about the ideal exponential; all it has to go on is the data.
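
The synthesis step in the list above can be sketched in a few lines of Python. This is a minimal sketch, not the code that produced the figures. The eleven integer x-values from −5 to 5, the seed, and the helper names `ideal`, `poisson_sample` and `synthesize` are all assumptions made for illustration. It uses only the standard library, drawing Poisson samples with Knuth's multiplication method.

```python
import math
import random

def ideal(x, a=2.0, b=1.0):
    """Ideal exponential y = a * 2**(b*x); a=2, b=1 as in figure 1."""
    return a * 2.0 ** (b * x)

def poisson_sample(rate, rng):
    """Draw one Poisson variate using Knuth's multiplication method.

    Adequate for the modest rates used here; slow for very large rates.
    """
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def synthesize(xs, a=2.0, b=1.0, seed=None):
    """One Poisson sample per x-value, with the rate set by the ideal curve."""
    rng = random.Random(seed)
    return [poisson_sample(ideal(x, a, b), rng) for x in xs]

xs = list(range(-5, 6))          # assumption: 11 integer x-values
data = synthesize(xs, seed=1)    # assumption: arbitrary seed
for x, y in zip(xs, data):
    print(f"x={x:+d}  rate={ideal(x):7.3f}  events={y}")
```

Each run with a different seed gives a different data set from the same ideal curve, which is the situation the next few plots illustrate.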

Here is another run of the same process, with all the same conditions, just another random sample from the Poisson process. You can see that the fitted parameters are different from the previous example, and also from the ideal. In section 2 we shall see that most of the differences are due to randomness (as opposed to systematic bias).

Here’s the same idea again, but with higher numbers of events at each x-value. The statistics are better, so the fitted function does a better job of approximating the ideal.

Here’s the same idea again, but with even higher numbers of events at each x-value.

At this point we have to wonder to what degree the plots in section 1 are typical. So we collect some statistics. We create an ensemble of 50 fits. Each element of the ensemble corresponds to a plot of the kind shown in section 1; that is, it involves fitting to an 11-point data set.

Figure 5 shows the fitted parameters, as a scatter plot. The ideal values are at the center of the blue cross. The arms of the cross represent 5% uncertainty in each parameter.

You can see that the Poisson noise in the data creates a tremendous amount of uncertainty in the fitted values.

You can also see that there is a nontrivial amount of correlation. If the estimated growth-rate is too small, the fitting procedure can partially make up for it by increasing the prefactor, and vice versa.
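
One way to see where the correlation comes from is the standard covariance estimate for a weighted least-squares fit, namely the inverse of JᵀWJ evaluated at the ideal parameters. The sketch below is a hedged illustration of that textbook approximation, not a reproduction of figure 5. It assumes eleven integer x-values from −5 to 5, and the function name `predicted_covariance` is hypothetical.

```python
import math

LN2 = math.log(2.0)

def predicted_covariance(xs, a=2.0, b=1.0):
    """Approximate covariance of the fitted (a, b): inverse of J^T W J.

    J holds the derivatives of y = a * 2**(b*x) with respect to a and b.
    W = diag(1/variance), with the Poisson variance equal to the model value.
    Returns (var_a, cov_ab, var_b).
    """
    m_aa = m_ab = m_bb = 0.0
    for x in xs:
        f = a * 2.0 ** (b * x)
        df_da = f / a
        df_db = LN2 * x * f
        m_aa += df_da * df_da / f
        m_ab += df_da * df_db / f
        m_bb += df_db * df_db / f
    det = m_aa * m_bb - m_ab * m_ab
    # Inverse of the symmetric 2x2 matrix [[m_aa, m_ab], [m_ab, m_bb]].
    return m_bb / det, -m_ab / det, m_aa / det

xs = range(-5, 6)  # assumption: 11 integer x-values
var_a, cov_ab, var_b = predicted_covariance(xs)
sigma_a, sigma_b = math.sqrt(var_a), math.sqrt(var_b)
correlation = cov_ab / (sigma_a * sigma_b)
print(f"sigma_a/a = {sigma_a / 2.0:.1%}, sigma_b/b = {sigma_b / 1.0:.1%}")
print(f"correlation(a, b) = {correlation:+.2f}")
```

Under these assumptions the predicted spreads are well above 5% in both parameters, and the correlation is strongly negative: the heavily weighted points sit at large x, where a smaller b and a larger a produce nearly the same curve.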

The fits discussed here were performed using weighted nonlinear least squares fitting.

You cannot get away with using an unweighted fit. That would introduce huge amounts of bias. That’s because the uncertainty band is a lot smaller near x=−2 than it is near x=2. The uncertainty band is shown – approximately – in the diagrams.
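
To put numbers on that, here is a small sketch. It assumes the width of the band is the square root of the model rate, which is the usual Gaussian approximation to a Poisson distribution, and that each weight is one over the variance. The helper names `rate` and `sigma_band` are made up for this example.

```python
import math

def rate(x, a=2.0, b=1.0):
    """Ideal Poisson rate y = a * 2**(b*x)."""
    return a * 2.0 ** (b * x)

def sigma_band(x):
    """Approximate half-width of the uncertainty band: sqrt(variance) = sqrt(rate)."""
    return math.sqrt(rate(x))

for x in (-2, 2):
    weight = 1.0 / rate(x)   # weight = 1 / variance
    print(f"x={x:+d}  rate={rate(x):4.1f}  sigma={sigma_band(x):.2f}  weight={weight:.3f}")
```

The point at x=−2 deserves 16 times the weight of the point at x=2. An unweighted fit gives them equal weight, so it lets the noisy high-count points dominate.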

This is an approximation because it assumes the peak of the Poisson distribution looks like the peak of a Gaussian. This is a bad assumption when the number of events is small. It would be possible to calculate the exact uncertainty band, but that is more work than I feel like doing at the moment.

Furthermore, the whole idea of least-squares fitting is fundamentally unsound when the peak is not Gaussian. The thing we are minimizing is the negative log probability. For a Gaussian that scales like Δx squared, but for anything else it doesn’t.

These two phenomena introduce some amount of systematic bias into our model.
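
The departure from Gaussian behavior can be written out explicitly. In the sketch below, k is the observed count at one x-value and μ is the model rate there; both symbols are introduced here for illustration.

```latex
% Negative log probability of observing k events when the rate is \mu:
-\ln P(k \mid \mu) = \mu - k \ln \mu + \ln k!

% Taylor expansion about the minimum at \mu = k, valid for k > 0:
\mu - k \ln \mu = (k - k \ln k)
                + \frac{(\mu - k)^2}{2k}
                - \frac{(\mu - k)^3}{3k^2} + \cdots

% For k = 0 there is no quadratic minimum at all:
-\ln P(0 \mid \mu) = \mu
```

The quadratic term is what a Gaussian of variance k would give. For a deviation of order √k, the cubic term is smaller than the quadratic one only by a factor of order 1/√k, so it is negligible for large counts but not for a handful of events. For a zero count the expression is linear in μ, which no sum of squares can imitate.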

As is often the case, the size of the uncertainty band comes from the
*model*, not from the data. The data points in figure 1
do not have error bars, nor should they. Any error bar you could
assign to the points would be wrong. This is obvious for the points
where the ordinate is zero. It is less obvious but no less true for
the other points. The *model* is nonzero everywhere, and the
model tells us what the uncertainties (and the weights) should be.

As you can see by comparing figure 1 with figure 2, the uncertainty band attaches to the fitted curve (not to the ideal). It has to be this way, because the fitting process knows nothing of the ideal. Since the uncertainties (aka weights) are needed for the fit, and also depend on the results of the fit, this requires iterating until things settle down.
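
The iteration described above can be sketched as a damped Gauss-Newton loop in which the weights are recomputed from the current fitted curve on every pass. This is one possible implementation under stated assumptions, not necessarily the routine behind the figures. The function name `fit_exponential`, the flat starting guess, the choice to step ln(a) rather than a (so the prefactor stays positive), and the damping rule are all choices made for this sketch.

```python
import math

LN2 = math.log(2.0)

def model(x, a, b):
    """The fitted function y = a * 2**(b*x)."""
    return a * 2.0 ** (b * x)

def fit_exponential(xs, ys, max_iter=200, tol=1e-12):
    """Iteratively reweighted least-squares fit of y = a * 2**(b*x).

    Each point is weighted by 1/variance, with the variance taken from the
    current fitted curve (Poisson: variance = mean), never from the data.
    The weights change whenever the curve changes, so we iterate.
    """
    if sum(ys) <= 0:
        raise ValueError("need at least one nonzero count")
    # Assumption of this sketch: start from a flat curve at the mean count.
    log_a, b = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(max_iter):
        f = [math.exp(log_a + LN2 * b * x) for x in xs]
        # Normal equations (J^T W J) step = J^T W r, with W = diag(1/f),
        # J = [df/dlog_a, df/db] = [f, ln2 * x * f], and r = y - f.
        s0 = sum(f)
        s1 = sum(fi * LN2 * x for fi, x in zip(f, xs))
        s2 = sum(fi * (LN2 * x) ** 2 for fi, x in zip(f, xs))
        g0 = sum(y - fi for y, fi in zip(ys, f))
        g1 = sum((y - fi) * LN2 * x for y, fi, x in zip(ys, f, xs))
        det = s0 * s2 - s1 * s1
        d_log_a = (s2 * g0 - s1 * g1) / det
        d_b = (s0 * g1 - s1 * g0) / det
        # Damping: never change ln(model) by more than 1 at any data point.
        biggest = max(abs(d_log_a + LN2 * x * d_b) for x in xs)
        scale = min(1.0, 1.0 / biggest) if biggest > 0 else 1.0
        log_a += scale * d_log_a
        b += scale * d_b
        if biggest < tol:
            break
    return math.exp(log_a), b

# Sanity check on noise-free data generated with a=2, b=1.
xs = list(range(-5, 6))
ys = [model(x, 2.0, 1.0) for x in xs]
a_fit, b_fit = fit_exponential(xs, ys)
print(f"a = {a_fit:.6f}, b = {b_fit:.6f}")
```

Note that a data point with zero counts causes no trouble here: it enters through the residual y − f, and its weight comes from the model, which is nonzero everywhere.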

Another thing you cannot get away with in situations like this is taking the logarithm of the ordinate and fitting a straight line to it using linear regression. You can appreciate this by looking at figure 1: Some of the data points are zero. If you take the logarithm of that, you get minus infinity. You cannot fit to such points using a straight line. And you cannot afford to ignore these points. So skip the logarithm and skip the linear regression, and use the industrial-strength nonlinear fit.

If you have a huge number of events at each x-value, you might try the trick of fitting a straight line on semi-log paper, but even then you risk introducing bias into the results.
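
To see what goes wrong if the zero points are simply ignored, one can feed synthetic data to the straight-line method and compare the result with the known answer. The sketch below does that for a made-up setup: eleven integer x-values from −5 to 5, an ensemble of 200 data sets, and hypothetical helpers `poisson_sample` and `naive_log_fit`. It illustrates the failure mode; it is not a result from the original analysis.

```python
import math
import random

def poisson_sample(rate, rng):
    """Draw one Poisson variate using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def naive_log_fit(xs, ys):
    """Ordinary straight-line fit to log2(y) versus x, dropping every y == 0."""
    pts = [(x, math.log2(y)) for x, y in zip(xs, ys) if y > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxv = sum((x - mean_x) * (v - mean_v) for x, v in pts)
    slope = sxv / sxx
    intercept = mean_v - slope * mean_x
    return 2.0 ** intercept, slope    # (a, b) in y = a * 2**(b*x)

rng = random.Random(0)                # assumption: arbitrary seed
xs = list(range(-5, 6))               # assumption: 11 integer x-values
a_true, b_true = 2.0, 1.0
fits = []
for _ in range(200):
    ys = [poisson_sample(a_true * 2.0 ** (b_true * x), rng) for x in xs]
    fits.append(naive_log_fit(xs, ys))

mean_a = sum(a for a, _ in fits) / len(fits)
mean_b = sum(b for _, b in fits) / len(fits)
print(f"ideal b = {b_true}, mean fitted b = {mean_b:.3f}")
print(f"ideal a = {a_true}, mean fitted a = {mean_a:.3f}")
```

With these assumptions the average fitted growth rate comes out below the ideal b=1. At small x, the only points that survive are the ones that happened to fluctuate upward to at least one event, so the low end of the line is pulled up and the slope is flattened.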

By way of analogy, consider a simple triple-beam balance. It is reasonably robust and reliable, but even so, it needs to be calibrated every so often. This is done by loading it with known masses, then debugging it and adjusting it until it reads correctly.

By the same token, it is important to debug and calibrate curve-fit routines. There are a lot of things that can go wrong. It is hard to predict how accurate the fitted parameters will be, especially when there are correlations and/or when there is some systematic discrepancy between the model and reality, e.g. when using a polynomial to approximate a sine wave or other transcendental function – or, as in our case, approximating a Poisson distribution with a Gaussian.

This explains why we went to the trouble of fitting to artificial
data, where we knew the right answer *a priori*. It must be
emphasized that the right answer was not used during the fitting
process; it was only used (a) beforehand, to generate the noisy data,
and (b) afterwards, to evaluate the results of the fit.

To say the same thing the other way, it is generally a bad idea to use real data to calibrate the curve-fit procedures, especially in a research situation where you don’t entirely know what the data is supposed to look like. Even if you have already taken the data and even analyzed it, set all that aside for a moment and calibrate the analysis procedure by feeding it Monte Carlo data. That’s sometimes the only way to detect bugs in the analysis.

We call this “closing the loop” because it goes parameters → synthesis → data → analysis → parameters.
