The version of the quadratic formula that you see in books is not very well behaved, especially when one root is much smaller in magnitude than the other. Here’s a version that is much more numerically stable.
We wish to find the roots of the equation
 (1) 
That is, we wish to find values of x that satisfy the equation, for given values of the coefficients a, b, and c.
The formula in equation 2 is recommended. The computercode equivalent is in equation 18. This way of doing things is numerically wellbehaved. As we shall see in figure 1, this version performs better than the unsophisticated “textbook” version in equation 8, better by many orders of magnitude in situations where one root is much larger than the other.
where sgn_{L}() is the lefthanded signfunction, which is defined to be −1 whenever its argument is less than or equal to zero, and +1 otherwise. See section 5. (In particular: Do not use the regular sgn() function, which is zero at zero.) The names “small” and “large” are based on the absolute magnitude of the roots:
 (3) 
The rationale behind equation 2 is easy to understand:
The fundamental issue here is the fact that in a computer, floating point numbers are subject to roundoff. Roughly speaking, the roundoff error is on the order of the “machine epsilon”, which is not zero. (Typically there is considerable roundoff error in the 16th decimal place, although this varies somewhat from machine to machine.) There are lots of seeminglyinnocuous realworld situations where this matters.
As a general rule, when the terms have the same sign, the sum of two terms in the denominator (as in equation 2b or equation 15) is numerically better behaved than the difference of two terms the numerator (as in equation 8 or equation 14). Vastly better.
It is a good habit to use equation 2 always, to the exclusion of lessclever formulas such as equation 8. (You can always use moreclever formulas, for instance in special cases where some of the coefficients are zero). You can get away with using equation 8 in situations where you know the two roots are a complexconjugate pair, or are real and close together, that is, in situations where the discriminant b^{2}−4ac is either negative or small compared to b^{2}.
Note: If you know b is zero, the original equation is trivial. It can be solved by inspection:
 (4) 
Conversely, if you know b is nonzero, the solution can be written in the slightly simpler form:
Similarly, when a is zero, the original equation is trivial; it’s not even a quadratic. So in equation 2 we can safely assume that a and b are nonzero.
Figure 1 compares the smart formula (equation 2) with the notsosmart formula (equation 8) over a range of conditions. The true x_{big} is between 1 and 8, such that log(x_{big}) is uniformly distributed. The true x_{small} is between 10^{−14} and 10^{−19}, such that log(x_{small}) is uniformly distributed.
There is a range of many orders of magnitude where equation 2 produces the correct answers, but equation 8 produces wildly incorrect answers for x_{small}. Cases where the incorrect answer is zero cannot be properly plotted on loglog axes, but are qualitatively indicated by downwardpointing triangles.
Let’s see what happens in a realworld equation. Here is an approximate equation that comes up in chemistry, when calculating the pH of an acid solution:
 (6) 
where
 (7) 
First, let’s see what happens if we try to solve that using the notsosmart “textbook” version of the quadratic formula:
 (8) 
where in this case the variables are:
 (9) 
Let’s do a numerical example, in the case where the acid is strong but moderately dilute:
 (10) 
Let’s assume we arrived at the K_{a} value for our hypothetical acid by taking the average of various estimates. On this bases we estimate the uncertainty of K_{a} to be ±1×10^{4} or even more. The uncertainty in the concentration is negligible by comparison.
Plugging the K_{a} and C_{HA} numbers into equation 8, we get
 (11) 
Now some people might decide on the basis of «common sense» that the number inside the square root could be rounded off to 3.210×10^{9}. The uncertainty is so large that the sigfigs rules require us to round this number to a single digit, so carrying three extra digits «should» be plenty, or so the story goes. So let’s try rounding off and see what happens when we continue the calculation:
 (12) 
which is just completely wrong. Both of the alleged roots of the quadratic are negative. It is physically impossible for the [H^{+}] concentration to be negative.
Analysis: It turns out that the «common sense» roundoff leading to equation 12 was a disaster. In this situation:
So now let’s switch to using the smart method. Equation 2 gives us:
 (13) 
where x_{2} is the desired answer. Rounding of intermediate results does not cause any problems. (This x_{2} value should not be taken as gospel, because equation 6 is only an approximation. It neglects the properties of the solvent. However, it’s in the right ballpark.)
It should have been obvious from the getgo that a 10^{−5} molar solution of a chemicallystrong acid will have a pH of about 5.
Let’s consider the equation
 (14) 
where z is small. This comes up in connection with the quadratic formula, and also in special relativity, as discussed in reference 1. Although equation 14 is just fine if you are doing algebra, it is grossly unsuitable if you want to evaluate it numerically. This is because of the infamous “small difference between large numbers” problem. You are much better off using equation 15 instead; it is algebraically exact and numerically wellbehaved for all z≤1.
 (15) 
The reasoning behind equation 15 is the same as the reasoning behind equation 2, as discussed in section 1.
Let’s investigate another way of dealing with equation 14. You could expand the square root using a firstorder Taylor series, namely:
 (16) 
whenever z is small compared to 1. This would give you a reasonably accurate answer if z is small enough. On the other hand, there is no real advantage to equation 16, because equation 15 is just as convenient and is less restricted.
If you want more accuracy than is provided by a firstorder Taylor series, you should not assume that the best way forward is to use a higherorder Taylor series. Often there are other numerical methods that are better behaved. That is, they converge more quickly, giving higher accuracy with less work.
For an interesting application of these ideas, see reference 1.
The ordinary sgn(x) function is defined to be −1 for all x<0, +1 for all x>0, and zero when x=0. The signum() function is same thing. Sometimes they are explicitly undefined when x=0.
None of the above are what we want for the quadratic formula. We need it to be either +1 or −1 when x=0. (Either one works.)
In most computer languages, you can implement the needed behavior as follows:
 (17) 
So all in all, the quadratic formula in computerspeak is:
