
Numerical Methods
John Denker

## 1  Evaluating a Polynomial

Suppose we have a polynomial of the form

 y = a·x3 + b·x2 + c·x + d
(1)

### 1.1  Horner’s Method

It is standard good practice to evaluate it using Horner’s method. That is, start out with a. Multiply by x and add b. Then multiply by x and add c. Finally multiply by x and add d. In other words:

 y = a
 y = y*x + b
 y = y*x + c
 y = y*x + d
(2)

This requires only d multiplications (and d additions), where d is the degree of the polynomial ... in contrast to direct evaluation of equation 1, which requires d(d+1)/2 multiplications. So Horner’s method results in a substantial speedup when d is large.

Perhaps more importantly, it results in better accuracy. Fewer operations means less accumulation of roundoff errors.
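The recipe in equation 2 can be written out directly. Here is a minimal sketch for the cubic in equation 1 (the function name is mine, chosen for illustration):

```cpp
#include <cassert>

// Evaluate y = a*x^3 + b*x^2 + c*x + d via Horner's method:
// start with a, then repeatedly multiply by x and add the next coefficient.
double horner_cubic(double a, double b, double c, double d, double x) {
    double y = a;
    y = y*x + b;
    y = y*x + c;
    y = y*x + d;
    return y;
}
```

Note that only three multiplications are performed, one per coefficient after the first.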

### 1.2  FMA: Fused Multiply Add

Many late-model desktop computers implement the fma instruction in hardware. That stands for Fused Multiply Add. So to a first approximation we can say:

 fma(y, x, a) := y*x + a
(3)

and we can rewrite the Horner method as:

 y = a
 y = fma(y, x, b)
 y = fma(y, x, c)
 y = fma(y, x, d)
(4)

Here’s where it gets interesting: fma has only about half as much roundoff error.

The explanation is simple: If the multiplication and addition were done separately, there would be two roundoff errors: (a) one after the multiplication, and (b) one after the addition.

In contrast, with fma, the result of the multiplication is never delivered to the user. There is no need to represent it in standard floating-point format, and in particular no need to round it. In fact, it is done with plenty of guard digits, so there is no roundoff error at step (a). The only roundoff occurs later, at step (b), after the addition, so the final result can be delivered to the user.

As a possible side-benefit, it is sometimes true that the fma instruction is faster than doing the multiplication and addition separately. This is not necessarily the case, because modern cpus are smart about pipelining ... but sometimes it is true. Typically there is a modest speedup: somewhat faster, but not 2× faster.

The fma() function implemented by C++ does not map directly onto the fma hardware instruction. Instead it calls a library function which determines at runtime whether the hardware implements the instruction. If not, the library carries out the required calculation in software. It assumes you want the accuracy and you don’t care about speed. In fact, the last time I checked, the software emulation was excruciatingly slow, about 100× slower than plain old multiplication. There is no excuse for this. Efficient ways of emulating fma are known. However, it is what it is.

If you are sure your hardware supports fma, you can bypass the runtime library entirely. There is a compiler option that tells it to inline the hardware fma instructions. (If you try to run this on a machine without the fma instruction, your program will raise the illegal instruction signal.)

So here’s how I usually code it: For portability, I write FMA(y,x,a). Depending on the situation, that can be defined in terms of the hardware fma instruction, or defined in terms of plain old multiplication and addition:

```
        // if you care more about speed than utmost accuracy:
#ifdef FP_FAST_FMA
#define FMA(y,x,a) fma((y),(x),(a))
#else
#define FMA(y,x,a) ((y)*(x)+(a))
#endif
```

(The extra parentheses matter: without them, passing an expression such as `u+v` for one of the macro arguments would produce the wrong result.)
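With the wrapper in hand, equation 4 can be coded portably. A minimal sketch (function name mine):

```cpp
#include <cmath>

// Portable wrapper: use fma() when the platform advertises it as fast
// (FP_FAST_FMA), otherwise fall back to plain multiply-add.
#ifdef FP_FAST_FMA
#define FMA(y,x,a) fma((y),(x),(a))
#else
#define FMA(y,x,a) ((y)*(x)+(a))
#endif

// Equation 4: Horner's method for a*x^3 + b*x^2 + c*x + d, using FMA().
double horner_fma(double a, double b, double c, double d, double x) {
    double y = a;
    y = FMA(y, x, b);
    y = FMA(y, x, c);
    y = FMA(y, x, d);
    return y;
}
```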

The fma idea works for dot products, too. See section 2.

### 1.3  Compensation

Compensation means calculating how big the roundoff error must have been, and then correcting for it. For a single operation, that’s not very exciting, but when dealing with a high degree polynomial it’s a big deal.

The rule of thumb is that compensated fma using doubles has the same accuracy as regular uncompensated fma using double doubles. It’s slower than uncompensated fma, but dramatically faster than double doubles [1].
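Here is a sketch of a compensated Horner scheme in the spirit of reference 1. The idea: fma can recover the exact error of each product, and the classic TwoSum trick recovers the exact error of each addition; those errors are accumulated in a side channel and added back at the end. Function name and layout are mine, for illustration:

```cpp
#include <cmath>
#include <vector>

// Compensated Horner evaluation.  coeffs holds the coefficients from
// the highest degree down to the constant term.
double comp_horner(const std::vector<double>& coeffs, double x) {
    double s = coeffs[0];   // running value, as in plain Horner
    double c = 0.0;         // accumulated roundoff correction
    for (std::size_t i = 1; i < coeffs.size(); ++i) {
        double prod  = s * x;
        double eprod = std::fma(s, x, -prod);   // exact error of the product
        double snew  = prod + coeffs[i];
        double t     = snew - prod;             // TwoSum: exact error of the sum
        double esum  = (prod - (snew - t)) + (coeffs[i] - t);
        c = c * x + (eprod + esum);             // carry the errors along
        s = snew;
    }
    return s + c;
}
```

When all the intermediate results happen to be exact, the correction term is zero and this reduces to plain Horner.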

## 2  Dot Products

• You can use the fma instruction to improve the accuracy of dot products. As a side benefit, it might speed things up. See section 1.2.
• If you are dealing with 16 bit integers, you can use the MMX instruction set. This gives you speed. If your data can be represented in short integers, the results are exact.

These instructions are older than dirt, and rather widely available.

• If you are dealing with 32 bit floats you can use the SSE instruction set. This gives you speed with no improvement in accuracy.

Also rather widely available.
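As mentioned in section 1.2, accumulating a dot product through fma gives each term only one rounding (after the add) instead of two. A minimal sketch (function name mine):

```cpp
#include <cmath>
#include <cstddef>

// Dot product accumulated via fused multiply-add:
// acc <- a[i]*b[i] + acc, with a single rounding per term.
double dot_fma(const double* a, const double* b, std::size_t n) {
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        acc = std::fma(a[i], b[i], acc);
    return acc;
}
```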

## 3  Integrating an Exponential

### 3.1  Scenario

Suppose we want to evaluate the following indefinite integral:

 I := ∫t1t2 e−α t dt
(5)

The “textbook” answer is:

 I = (e−α t1 − e−α t2) / α
(6)

which is not the formula that I use in my software. There are *multiple* things wrong with it.

1. For starters, it fails due to division by zero when α=0. This means that equation 6 is not even mathematically correct.

The question is perfectly well defined for α=0, with a nontrivial result, and doing the integration correctly is super-simple.

The case where α=0 is perfectly reasonable and can occur in the real world. Examples include:

• The background count rate in a radioactive decay experiment.
• The spread of a pandemic when the effective reproduction number Re is unity.

If you are writing a routine to evaluate the definite integral, you would be wise to handle the case of α=0. I am surprised and disappointed that fancy mathematical systems like wolframalpha [2] and maxima [3] screw this up.

2. Even after we fix it to make it mathematically correct, it’s still terrible from a numerical-methods standpoint. It suffers from tremendous loss of precision due to roundoff errors whenever |α(t2 − t1)| gets small compared to 1.

The lessons here are:

• Beware of what we call small differences between large numbers. For small x, exp(x) and exp(−x) are large compared to the difference between them. This is a recipe for disastrous roundoff errors.
• More generally, if something makes a big mistake at zero, check for the possibility that it makes slightly smaller mistakes near zero.
• Even more generally: Floating point numbers are not the same as mathematical real numbers. They have roundoff errors.
3. Therefore it is very often a step in the right direction to factor the difference of exponentials into sinh(x) times an exponential. Let’s be clear: in the computer’s math library, sinh(x) is not evaluated as the difference of exponentials; it is evaluated from scratch, using the known mathematical properties of sinh(x). In particular, the result is reasonably accurate near x=0. This gives us

 I = e−α tmid · sinh(α τ / 2) / (α / 2)   for α ≠ 0
   = τ e−α tmid · sinh(α τ / 2) / (α τ / 2)   for α ≠ 0
   = τ e−α tmid · sinhc(α τ / 2)   always
where
 tmid := (t2 + t1) / 2
 τ := t2 − t1
(7)

This works pretty well; in contrast, the difference of exponentials is a disaster.

Sanity check: Equation 7 treats t1 and t2 on the same footing.

Sanity check: When α goes to zero, I goes to τ, as it should.

Sanity check: When τ goes to zero, I goes to zero, as it should.

4. So now all we need to do is evaluate sinhc(x) reliably. When |x| is not too small, we can define it as sinh(x)/x. We also know it must go to 1 as x goes to zero.

Calculating sinh(x)/x is probably good enough for all nonzero x, even when |x| is quite small, but we can do better. Since we have to make a special check for the case of x=0, we might as well be smart about x near zero.

Specifically: For x near zero, we can use the first two terms of the Taylor series.

 sinhc(x) = 1 + x2/6   for |x| < 10−4
          = sinh(x)/x   otherwise
(8)

We set the threshold at 10−4 for the following reasons:

• Below that, the two-term series works fine: the next term in the series is x4/120, which is less than 10−18, small compared to the machine epsilon, so it can be ignored.
• Above that, sinh(x)/x works fine. It is observed to agree with the many-term Taylor series.

### 3.2  The Code

```
#include <cmath>

// numerically accurate sinh(arg)/arg
template<class T>
inline T sinhc(T arg) {
    using std::abs; using std::sinh;
    if (abs(arg) < 1e-4) return 1 + arg*arg/6;
    return sinh(arg)/arg;
}

// ... inner loop ...
{
    double mid((t2 + t1) / 2.);
    double tau(t2 - t1);
    rslt += amp * exp(-rate*mid) * tau * sinhc(rate*tau/2.);
}
```
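Putting the pieces together, here is a self-contained version of equation 7 that can be checked against the sanity checks above (function name mine; the α=0 case falls out of sinhc with no special handling):

```cpp
#include <cmath>

// sinhc from section 3.2: numerically accurate sinh(arg)/arg.
template<class T>
inline T sinhc(T arg) {
    using std::abs; using std::sinh;
    if (abs(arg) < 1e-4) return 1 + arg*arg/6;
    return sinh(arg)/arg;
}

// Equation 7: integral of exp(-alpha*t) from t1 to t2,
// in the well-conditioned form.  Works for alpha == 0 as well.
double integral(double alpha, double t1, double t2) {
    double mid = (t2 + t1) / 2.;
    double tau = t2 - t1;
    return std::exp(-alpha*mid) * tau * sinhc(alpha*tau/2.);
}
```

The naive difference-of-exponentials form divides by zero at α=0 and loses most of its digits for small |α τ|; this form does neither.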

### 3.3  Remarks

1. This applies to data in bins of size t2 − t1. Binning the data is almost always a bad idea, but sometimes we have no choice.

For example, suppose we want to construct models of the current public health emergency. The state and county have the timestamped data, but they won’t release it. Instead they grudgingly release data in bins of size one day, or sometimes two days or three days, depending on their whims. It would be easier for them and everybody else if they would just release the underlying data, but they won’t.

We have to deal with the data that’s available, but we don’t have to like it.

2. The name sinhc is perhaps not ideal, but it is memorable, and has some precedent [4]. It is analogous to the longstanding definition of sinc(x) := sin(x)/x.
3. If −α tmid is too hugely negative, the exponential will underflow and return zero. Note that exp(−709) underflows, but exp(−708) does not.

Similarly, if the exponent is too hugely positive, the exponential will overflow and return “inf” (which does have a floating point representation).

Both of these outcomes are probably just fine in practical cases, but if you want mathematical exactitude you can wrap equation 7 in code that checks for underflow or overflow and throws an exception. Or you can turn on the FPU traps and let the hardware do the checking for you. It might throw exceptions from places you never thought to check.
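One way to do the checking is with the standard `<cfenv>` status flags: clear them, do the computation, then test them. A minimal sketch (function name mine; strictly speaking, portable flag access also wants `#pragma STDC FENV_ACCESS ON`, which not all compilers honor):

```cpp
#include <cfenv>
#include <cmath>
#include <stdexcept>

// Evaluate exp(arg), but inspect the FPU status flags afterwards
// instead of trusting the result blindly.
double checked_exp(double arg) {
    std::feclearexcept(FE_OVERFLOW | FE_UNDERFLOW);
    double r = std::exp(arg);
    if (std::fetestexcept(FE_OVERFLOW))
        throw std::range_error("exp overflowed");
    if (std::fetestexcept(FE_UNDERFLOW))
        throw std::range_error("exp underflowed");
    return r;
}
```

The same pattern can wrap the whole of equation 7, if you care about the distinction between “the answer rounds to zero” and “the answer is zero.”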

## 4  References

1. S. Graillat, Ph. Langlois, and N. Louvet, “Compensated Horner Scheme” (July 2005).
https://www-pequan.lip6.fr/~jmc/polycopies/Compensation-horner.pdf
2. Wolfram Alpha gives the “textbook” answer.
https://www.wolframalpha.com/input/?i=integral+exp(-a+t)+dt+from+t1+to+t2
3. The maxima program also gives the “textbook” answer.
4. Mathworld definition of sinhc.
https://mathworld.wolfram.com/SinhcFunction.html