Copyright © 2003–2011 jsd
Here are some simple rules:
Important note: The previous two sentences tell you everything you need to know for most purposes, including nearly all situations you encounter in primary and secondary school. You can probably skip the rest of this document. Seriously: The primary rule is to use plenty of digits. You hardly even need to think about it. Too many is vastly better than too few.
As a minor, secondary matter: If it gets to the point where you have more digits than you need and the extra digits are causing major inconvenience, then you can think about reducing the number of digits.
If you want slightly more guidance, still using ultrasimple procedures, see section 2. For details on how to do things right, see section 7.2. For a discussion of the effect of roundoff, see section 7.5. For a discussion of why “sig figs” are never the right answer, see section 1.3. There is also a complete table of contents.
Along the same lines, here is a less-extreme example that arises in the introductory chemistry class. Suppose the assignment is to balance the equation for the combustion of gasoline, namely
a C_{8}H_{18} + b O_{2} → x CO_{2} + y H_{2}O    (1)
by finding numerical values for the coefficients a, b, x, and y. The conventional answer is (a, b, x, y) = (2, 25, 16, 18). The outcome of the real reaction must have “some” uncertainty, because there will generally be some nonidealities, including the presence of other molecules such as CO or C_{60}, not to mention NO_{2} or whatever. However, my point is that we don’t necessarily care about these nonidealities. We can perfectly well find the idealized solution to the idealized equation and postpone worrying about the nonidealities and uncertainties until much, much later.
As another example, suppose you use a digital stopwatch to measure some event, and the reading is 1.234 seconds. We call this number the indicated time, and we distinguish it from the true time of the event, as discussed in section 4.7. In principle, there is no chance that the indicated time will be exactly equal to the true time (since time is a continuous variable). However, in many cases you may decide that it is close enough, in which case you should just write down the indicated reading and not worry about the quantization error.
Conversely, it is not recommended to describe uncertainty in terms of “random numbers” or “uncertain quantities”. Technically speaking, there is no such thing.

People do commonly speak in terms of “random numbers” or “uncertain quantities”, but that doesn’t make it right. These must be considered idiomatic expressions and misnomers. See section 3.3 and section 4.2 for more on this.
Equivalently, you can write something of the form 1.234(55), where the number in parentheses indicates the uncertainty. The place value is such that the last digit of the uncertainty lines up with the last digit of the nominal value. Therefore 1.234(55) is just a more-compact way of writing 1.234 ± 0.055.
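As a minimal sketch of the place-value rule just described, the following Python expands the compact form into a nominal value and an uncertainty. The function name parse_concise is hypothetical, and the sketch assumes plain decimal numerals with no exponent.

```python
from decimal import Decimal
import re

def parse_concise(text):
    """Expand a numeral such as '1.234(55)' into (nominal, uncertainty)."""
    match = re.fullmatch(r"(-?\d+(?:\.(\d*))?)\((\d+)\)", text.strip())
    if match is None:
        raise ValueError(f"not in compact notation: {text!r}")
    nominal_text, fraction_digits, unc_digits = match.groups()
    # The last digit of the uncertainty lines up with the last digit of
    # the nominal value, so both share the same number of decimal places.
    places = len(fraction_digits or "")
    nominal = Decimal(nominal_text)
    uncertainty = Decimal(unc_digits).scaleb(-places)
    return nominal, uncertainty

nominal, uncertainty = parse_concise("1.234(55)")
print(nominal, "±", uncertainty)
```

Decimal arithmetic is used here so that the digits come through exactly as written, with no binary roundoff.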
If you wish to describe the uncertainty in relative terms (as opposed to absolute terms), it can be expressed using percentages, parts per thousand, parts per million, or something like that, e.g. 2900 ± 0.13% or equivalently 2900 ± 1300ppm.
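To check that the two relative forms above agree, here is a small sketch that converts a relative uncertainty to an absolute one. The function name and the unit labels are hypothetical choices made for illustration.

```python
def relative_to_absolute(nominal, relative, unit="percent"):
    """Convert a relative uncertainty into the units of the nominal value."""
    scale = {"percent": 1e-2, "per_thousand": 1e-3, "ppm": 1e-6}[unit]
    return abs(nominal) * relative * scale

from_percent = relative_to_absolute(2900, 0.13, "percent")
from_ppm = relative_to_absolute(2900, 1300, "ppm")
print(from_percent, from_ppm)   # both are about 3.77
```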
(Note that in the expression 1.234 ± 0.055 we have two separate numbers represented by two separate numerals, which makes sense. This stands in contrast to the “sig figs” notation, which tries to represent two numbers using a single numeral, which is a very bad idea.)
If you have N variables that are statistically independent and Gaussian distributed, you can describe the uncertainty in terms of N variances. (The standard deviation is the square root of the variance.)  If you have N variables that are correlated, describing an N-dimensional Gaussian distribution requires a covariance matrix with N^{2} entries. The plain old variances are the diagonal elements of the covariance matrix, and they don’t tell the whole story, especially when N is large. 
In the real world, there are commonly nontrivial correlations involving several variables – or several thousand variables. In other words, there are lots of nontrivial offdiagonal matrix elements in the covariance matrix.
As a corollary, you should not become too enamored of the notation 1.234 ± 0.055 or 1.234(55), because that only allows you to keep track of the N variances, not the N^{2} covariances.
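To illustrate what gets lost, here is a sketch comparing the uncertainty of a difference x_1 − x_2 computed from the full covariance matrix against the same calculation using only the diagonal elements. The correlation coefficient of 0.9 is an assumed value chosen for illustration, and the function name is hypothetical.

```python
import math

def sigma_of_linear_combo(coeffs, cov):
    """Standard deviation of sum(c_i * x_i), given the full covariance matrix."""
    n = len(coeffs)
    variance = sum(coeffs[i] * cov[i][j] * coeffs[j]
                   for i in range(n) for j in range(n))
    return math.sqrt(max(variance, 0.0))

sigma = 0.055
rho = 0.9                                   # assumed correlation coefficient
full_cov = [[sigma**2, rho * sigma**2],
            [rho * sigma**2, sigma**2]]
diag_only = [[sigma**2, 0.0],
             [0.0, sigma**2]]               # what A±B notation can record

coeffs = [1.0, -1.0]                        # the difference x_1 - x_2
with_correlation = sigma_of_linear_combo(coeffs, full_cov)
without_correlation = sigma_of_linear_combo(coeffs, diag_only)
print(with_correlation, without_correlation)
```

With these assumed numbers, ignoring the off-diagonal elements overstates the uncertainty of the difference by more than a factor of three.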
You are not trying to write down the true values. You don’t know the true values (except insofar as the indicated values represent them, indirectly), as discussed in section 4.7. You don’t need to know the true values, so don’t worry about it. The rule is: Write down what you know. So write down the indicated value. Also: You are not obliged to attribute any uncertainty to the numbers you write down. Normal lab-book entries do not express an uncertainty using A±B notation or otherwise, and they do not “imply” an uncertainty using sig figs or otherwise. We are always uncertain about the true value, but we aren’t writing down the true value, so that’s not a concern. For an example of how this works, see table 5 in section 5.4.
Some people say there must be some uncertainty “associated” with the number you write down, and of course there is, indirectly, in the sense that the indicated value is “associated” with some range of true values. We are always uncertain about the true value, but that does not mean we are uncertain about the indicated value. These things are “associated” ... but they are not the same thing.
In a well-designed experiment, things like readability and quantization error usually do not make a large contribution to the overall uncertainty anyway, as discussed in section 4.10. Please do not confuse such things with “the” uncertainty.
It suffices to write down the rule just once; you do not need to restate the rule every time you take a reading. Later, when you are analyzing the data, you can apply the rule to each of the readings.^{1} As a familiar example of such a rule, you might say “all readings are uncertain due to Poisson statistics”. For another familiar example, see section 5.1.
Executive summary: No matter what you are trying to do, significant figures are the wrong way to do it.
When writing, do not use the number of digits to imply anything about the uncertainty. If you want to describe a distribution, describe it explicitly, perhaps using expressions such as 1.234±0.055, as discussed in section 1.2.
When reading, do not assume the number of digits tells you anything about the overall uncertainty, unless you are absolutely sure that’s what the writer intended ... and even then, beware that the meaning is very unclear.
The disadvantages of sig figs include:
For a more detailed discussion of why sig figs are a bad idea, see section 16 and reference 3.
In an introductory chemistry class, you should start with some useful chemistry ideas, such as atoms, molecules, bonds, energy, atomic number, nucleon number, etc., without worrying about uncertainty in any form, and double-especially without introducing ideas (such as sig figs) that are mostly wrong and worse than useless.
Roundoff procedures are necessary, so learn that. Scientific notation is worthwhile, so learn that. The “sig figs” rules that you find in chemistry books are not necessary and are not worthwhile, so the less said about them, the better.
In place of the “sig figs” rules, you can use the following guidelines:
Basic 3-digit rule: For a number in scientific notation, the rule is simple: For present purposes, you are allowed to round it off to three digits (i.e. two decimal places).
Example: 1.23456×10^{8} may be rounded to 1.23×10^{8}
For a number not in scientific notation, the rule is almost as simple: convert to scientific notation, then apply the aforementioned 3-digit rule. (Afterwards, you can convert back, or not, as you wish.)
The point of these rules is to limit the amount of roundoff error. As a corollary, you are allowed to keep more than three digits if you wish, for any reason, or for no reason at all. This makes sense because it introduces even less roundoff error. As another corollary, trailing zeros may always be rounded off, since that introduces no roundoff error at all.
Example: 1.80 may be rounded to 1.8, since that means the same thing. Conversely 1.8 can be represented as 1.80, 1.800, 1.8000000, et cetera.
These rules apply to intermediate steps as well as to final results.
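For anyone who wants to apply the 3-digit rule mechanically, here is a minimal sketch in Python. It leans on the built-in scientific-notation format with two decimal places. The function name is hypothetical.

```python
def round_three_digits(x):
    """Round x to three digits in scientific notation (two decimal places)."""
    return float(f"{x:.2e}")

print(f"{1.23456e8:.2e}")             # the example above, kept in scientific notation
print(round_three_digits(0.00123456)) # convert, round, and convert back
print(round_three_digits(1.8))        # trailing zeros change nothing
```

Remember that this is only a roundoff rule. The three digits that survive say nothing about the uncertainty.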
These “house rules” apply unless/until you hear otherwise. They tell you what is considered significant at the moment. As such, they have zero portability outside the introductory class, and even within this class we will encounter some exceptions (as in section 6.7 for example). Still, for now three digits is enough. There is method to this madness, but now is not the time to worry about it. We have more important things to worry about.
These rules differ in several ways from the “sig figs” rules that you often see in introductory chemistry textbooks.
This is important because of the following contrast:
Every time you write down a number, you have to write down a definite number of digits, and this almost always involves rounding off. Therefore you must have a roundoff rule or some similar guidance as to how many digits are needed.  There are many cases when you want to write down a number without any indication of uncertainty. 
A roundoff rule is necessary and harmless (unless abused).  A “sig figs” rule that forces a connection between the number of digits and the uncertainty is unnecessary and harmful. 
Remember, these are roundoff rules. Do not confuse roundoff with uncertainty. Roundoff error is just one contribution to the overall uncertainty. Knowing how much roundoff has occurred gives you a lower bound on the overall uncertainty, but this lower bound is rarely the whole story. Looking at the number of digits in a numeral gives you an upper bound on how much roundoff has occurred. (This is not a tight upper bound, since the number might be exact, i.e. no roundoff at all.) At the end of the day, the number of digits tells you nothing about the overall uncertainty.
Roundoff error is in the category of things that we generally do not need to know very precisely, so long as it is small enough. Uncertainty is not in this category, for reasons discussed in section 3.4.
As discussed in section 4.4, an expression such as 1.234±0.055 does not represent a number, but rather a distribution over numbers, i.e. a probability distribution. Unfortunately, people sometimes use sloppy shorthand expressions, perhaps referring to the «random variable» x or the «uncertain quantity» x, such that x = 1.234±0.055. Beware that this shorthand causes endless confusion. When in doubt, it is best to think of 1.234±0.055 as describing a distribution.
As a compromise, in the all-too-common situation where somebody wants to learn about uncertainty but doesn’t have a very strong background in probability, we can simplify things by talking about an interval or equivalently a range of numbers.
Note: “interval” is an official mathematical term, while “range of numbers” is more likely to be understood by nonexperts.
Working with intervals is easier than working with distributions. You can draw a range of numbers on the number line much more easily than you can draw a probability distribution. It is not an ideal solution, but it is a way to get started. (In contrast, the idea of so-called «random variables» is not good, not as a starting point or anything else.)
In order of decreasing power, sophistication, and reliability:
probability distributions ≫ intervals ≫ so-called «random variables»    (2)
In order of decreasing simplicity:
intervals ≫ probability distributions ≫ so-called «random variables»    (3)
In any case, the fundamental point is that some situations cannot be described by a single “number”. Instead, they are better described by a whole range of numbers that are consistent with our knowledge of the situation. The extent of the range expresses the uncertainty. One way to explain this is in terms of hedging a bet. If you roll a pair of dice, the most likely outcome is 7 ... but that outcome occurs less than 17% of the time. If you want to be right more than half of the time, you can’t do it by betting on any single number, but you can do it by betting on a range of numbers.
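The dice example can be checked by brute-force enumeration. The following sketch assumes ordinary fair dice and uses exact fractions. The helper name prob_of_range is hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Enumerate all 36 equally likely outcomes for a pair of fair dice.
counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
prob = {total: Fraction(n, 36) for total, n in counts.items()}

def prob_of_range(lo, hi):
    """Probability that the total lands anywhere in lo..hi inclusive."""
    return sum(prob[t] for t in range(lo, hi + 1))

print(prob[7])               # the single most likely outcome
print(prob_of_range(6, 8))   # a narrow range, still short of one half
print(prob_of_range(5, 9))   # a wider range, right more than half the time
```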
So, if you want, you can simplify the following discussion (with only a modest reduction in correctness) by crossing out every mention of “probability distribution” and replacing it with “range of numbers”.
Sometimes there is uncertainty, but it is unimportant, as mentioned in section 2.1 and especially section 4.1.
Moreover, sometimes there is no uncertainty, and it would be quite wrong to pretend there is, especially when dealing with raw data or when dealing with a particular data point drawn from a distribution, as discussed in section 4.2.
Suppose we have a distribution over x – perhaps the distribution shown in figure 1 – and the distribution is described by a couple of parameters, the mean A and the standard deviation B. Consider the contrast:
Separate {A, B}  Bundled A±B 
Sometimes it is best to think of the mean and standard deviation as two separate, independent parameters.  Sometimes you might choose to think of the mean as the “nominal” value of x and the standard deviation as the “uncertainty” on x. 
This is more abstract and more formal. It is hard to go wrong with this. One case where it is particularly advantageous is diffusion, where the mean velocity is expected to be zero, and all you care about is the RMS velocity.  This is less formal and more intuitive. It is advantageous when the average is the primary object of attention. 
We must distinguish between raw data points and cooked data blobs. These are different, as surely as a scalar is different from a high-dimensional vector. You can visualize by means of the following contrast:
Good  Bad 
Figure 3 shows 400 data points, each of which has zero size. The plotting symbols have nonzero size, so you can see them, but the data itself is a zero-sized point in the middle of the circle. The distribution over points has some width. The distribution is represented by the dashed red line.  In figure 4 each data point is shown with error bars, which is profoundly wrong. Remember, the width is associated with the distribution, not with any particular raw data point. 
See also section 4.2. These two figures, and the associated ideas, are discussed in more detail in reference 2.
Suppose on Monday we roll a pair of slightly-lopsided dice 1000 times, and observe the number of spots each time. Let x_{i} represent the number of spots on the ith observation. This is the raw data: 1000 raw data points. It must be emphasized that each of these raw data points has no error bars and no uncertainty. The number of spots is what it is, period. The points are zero-sized pointlike points.
On Tuesday we have the option of histogramming the data as a function of x and calculating the mean (A) and standard deviation (B) of the distribution.
For some purposes, keeping track of A±B is more convenient than keeping track of all 1000 raw data points.  For some other purposes, A±B does not tell us what we need to know. 
For example, if we are getting paid according to the total number of spots, then we have good reason to be interested in A directly and B almost as directly.  For example, suppose we are using the dice as input to a random-number generator. We need to know the entropy of the distribution. It is possible to construct two distributions with the same mean and standard deviation, but wildly different entropy. Because the dice are lopsided, we cannot reliably determine the entropy from A and B alone. 
As another example: Suppose we are getting paid whenever snake eyes comes up, and not otherwise. Because the dice are lopsided, A and B do not tell us what we need to know. 
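The claim about entropy is easy to demonstrate. The sketch below constructs two toy distributions that share the same mean and standard deviation but differ in entropy by roughly a factor of ten. These are hypothetical distributions invented for illustration, not the lopsided dice themselves.

```python
import math

def summarize(dist):
    """Return (mean, standard deviation, entropy in bits) of a discrete distribution."""
    mean = sum(p * x for x, p in dist.items())
    variance = sum(p * (x - mean) ** 2 for x, p in dist.items())
    entropy = -sum(p * math.log2(p) for p in dist.values() if p > 0)
    return mean, math.sqrt(variance), entropy

# Two hypothetical distributions, each given as {outcome: probability}.
coin_like = {-1: 0.5, +1: 0.5}
spiky = {-10: 0.005, 0: 0.99, +10: 0.005}

for name, dist in [("coin_like", coin_like), ("spiky", spiky)]:
    mean, sigma, entropy = summarize(dist)
    print(f"{name}: A={mean:.3f}  B={sigma:.3f}  entropy={entropy:.3f} bits")
```

Anyone who was handed only A±B for these two cases would have no way to tell them apart.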
Using the raw data to find values for A and B can be considered an example of curve fitting. (See section 6.23 for more about curve fitting.) It is also an example of modeling. We are fitting the data to a model – presumably a triangular model – and determining the parameters of the model. (Beware that using the measured standard deviation of the set of raw data points is not the best way to determine the width of the model distribution. This is obvious when there is only a small number of raw data points. See section 10.4 and reference 2 for details on this.)
If we bundle A and B together (as defined in section 3.2), we can consider A±B as a single object, called a blob, i.e. a cooked data blob. We have the option of trading in 1000 raw data points for one cooked data blob. This cooked data blob represents a model distribution, which is in turn represented by two numbers, namely the mean and the standard deviation.
So, this is one answer to the question of why uncertainty is important: It is sometimes more convenient to carry around one cooked data blob, rather than hundreds, thousands, or millions of raw data points. Cooking the data causes a considerable loss of information, but there is sometimes a valuable gain in convenience.
Note that if somebody gives you a cooked data blob, you can – approximately – uncook it using Monte Carlo, thereby returning to a representation where the distribution is represented by a cloud of zero-sized points. That is, you can create a set of artificial raw data points, randomly distributed according to the distribution described by the cooked data blob.
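Here is a minimal sketch of uncooking, using the blob 1.234±0.055 as an example. It assumes the blob describes a Gaussian, which is an extra assumption, since a mean and a standard deviation by themselves do not specify the shape. The function name uncook is hypothetical.

```python
import random
import statistics

def uncook(mean, sigma, n, seed=0):
    """Draw n artificial raw data points from a Gaussian with the given mean and sigma."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sigma) for _ in range(n)]

points = uncook(1.234, 0.055, 10_000)

# Each point is a plain number with no width; the width belongs to the cloud.
print(points[:3])
print(statistics.mean(points), statistics.stdev(points))
```

Re-cooking the cloud recovers approximately the original A and B, but only approximately, which is why the text above says the uncooking is approximate.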
In the early stages of data analysis, one deals with raw data. None of the raw data points has any uncertainty associated with it. The raw data is what it is. The raw data speaks for itself.  In the later stages of data analysis, one deals with a lot of cooked data. In the simplest case, each cooked data blob has a nominal value and an uncertainty. 
If one variable is correlated with some other variable(s), we have to keep track of all the means, all the standard deviations, and all the correlations. Any attempt to keep track of separate blobs of the form A±B is doomed to fail. 

See section 6.6 for a simple example of a calculation involving cooked data, showing what can go wrong when there are correlations. See section 6.14 and section 6.15 for a more elaborate discussion, including one approach to handling correlated cooked data. 
Here’s a story that illustrates an important conceptual point:
Suppose we are using a voltmeter. The manufacturer (or the calibration lab) has provided a calibration certificate that says anything we measure using this voltmeter will be uncertain plus-or-minus blah-blah percent. In effect, they are telling us that there is an ensemble of voltmeters, and there is some spread to the distribution of calibration coefficients.
Note that any uncertainty associated with the ensemble of voltmeters is not associated with any of the raw data points. This should be obvious from the fact that the ensemble of voltmeters existed before we made any observations. This ensemble is owned by the manufacturer or the calibration lab, and we don’t get to see more than one or two elements of the ensemble. So we rely on the calibration certificate, which contains a cooked data blob describing the whole ensemble of voltmeters.
Now suppose we make a few measurements. This is the raw data. It must be emphasized that each of these raw data points has no error bars and no uncertainty. The data is what it is, period.
At the next step, we can use the raw data plus other information including the calibration certificate to construct a model distribution. The ensemble of voltmeters has a certain width. It would be a tremendous mistake to attribute this width to each of the raw data points, especially considering that the calibration coefficient is likely to be very strongly correlated across all of our raw data.
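To see why the correlation matters, consider the ratio of two readings taken with the same meter. The following Monte Carlo sketch uses hypothetical readings and an assumed 1% spread in the calibration coefficient, and it treats the calibration as a simple multiplicative factor, which is a simplification.

```python
import random
import statistics

rng = random.Random(1)
readings = (1.50, 3.00)   # hypothetical indicated voltages
cal_sigma = 0.01          # assumed 1% spread across the ensemble of voltmeters

shared, independent = [], []
for _ in range(20_000):
    # Correct picture: one meter, so one coefficient applies to both readings.
    k = rng.gauss(1.0, cal_sigma)
    shared.append((k * readings[1]) / (k * readings[0]))
    # Mistaken picture: the width is attributed to each reading separately.
    k1, k2 = rng.gauss(1.0, cal_sigma), rng.gauss(1.0, cal_sigma)
    independent.append((k2 * readings[1]) / (k1 * readings[0]))

spread_shared = statistics.pstdev(shared)
spread_independent = statistics.pstdev(independent)
print(spread_shared, spread_independent)
```

In the correct picture the calibration coefficient cancels out of the ratio entirely. Attributing the ensemble width to each raw data point separately produces a spurious spread of a few percent.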
See section 12.6 for more on this.
When dealing with a cooked data blob, it is sometimes very important to keep track of the width of the blob, i.e. the uncertainty. Far and away the most common reason for this has to do with weighing the evidence. If you are called upon to make a judgment based on a collection of evidence, the task is straightforward if all of the evidence is equally reliable. On the other hand, if some of the evidence is more uncertain than the rest, you really need to know how uncertain it is.
Here’s a nonnumerical example: Suppose you are on a jury. There are ten witnesses who didn’t see what happened, and one who did. It should go without saying that you really, really ought to give less weight to the uncertain witnesses.
Now let’s do a detailed numerical example. Suppose we are trying to diagnose and treat a patient who has some weird symptoms. We have run 11 lab tests, 10 of which are consistent and suggest we should try treatment “A” while the 11th test suggests we should try treatment “B”.
In the first scenario, all 11 observations have the same uncertainty. This situation is depicted in figure 5. Each of the observations is shown as a Gaussian (bell-shaped curve) such that the width of the curve represents the uncertainty.
In a situation like this, where the observations are equally weighted, it makes sense to average them. The average x-value is shown by the black dot, and the uncertainty associated with the average value is shown by the error bars sticking out from the sides of the dot. We could have represented this by another Gaussian curve, but for clarity we represented it as a dot with error bars, which is another way of representing a probabilistic distribution of observations.
We see that the average is about x=0.1, which is slightly to the right of x=0. The outlier (the 11th observation) has pulled the average to the right somewhat, but only somewhat. The outlier is largely outvoted by the other 10 observations.
Scenario #2 is the same as scenario #1 except for one detail: The 11th observation was obtained using a technique that has much less uncertainty. This situation is shown in figure 6. (We know the 11th curve must be taller because it is narrower, and we want the area under each of the curves to be the same. For all these curves, the area corresponds to the total probability of the measurement producing some value, which must be 100%.)
When we consider the evidence, we must give each observation the appropriate weight. The observation with the small uncertainty is given greater weight. When we take the appropriately-weighted average, it gives us x=0.91. This is represented by the black dot in figure 6. Once again the uncertainty in the average is represented by error bars sticking out from the black dot.
It should be obvious that the weighted average (figure 6) is very, very different from the unweighted average (figure 5).
In particular, suppose the yellow bar in the diagram represents the decision threshold. With unweighted data, the weight of the evidence is to the left of the threshold, and we should try treatment “A”. With weighted data, the weight of the evidence is to the right of the threshold, and we should try treatment “B”.
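The two scenarios can be sketched numerically using the standard inverse-variance weighting, where each observation gets a weight of 1/σ². The specific inputs below are assumptions chosen for illustration, not values read off the figures: ten observations at x=0 with σ=1, and an 11th at x=1 with σ=1 in the first scenario and σ=0.1 in the second.

```python
import math

def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and the standard uncertainty of that mean."""
    weights = [1.0 / s**2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, 1.0 / math.sqrt(total)

values = [0.0] * 10 + [1.0]                              # assumed observations
scenario_1 = weighted_mean(values, [1.0] * 11)           # all equally uncertain
scenario_2 = weighted_mean(values, [1.0] * 10 + [0.1])   # 11th is 10x more precise

print("scenario 1: x = %.2f ± %.2f" % scenario_1)
print("scenario 2: x = %.2f ± %.2f" % scenario_2)
```

With these assumed inputs the averages come out near 0.09 and 0.91 respectively, landing on opposite sides of any threshold placed between them.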
On the third hand, when considering these 11 observations collectively, it could be argued that the chi-square is so bad that we ought to consider the possibility that all 11 are wrong, but let’s not get into that right now. Properly weighing the evidence would be just as important, just slightly harder to visualize, if the chi-square were lower.
This could be a life-or-death decision, so it is important to know the uncertainty, so that we can properly weigh the evidence.
The “significant figures” approach is intrinsically and incurably unable to represent uncertainty to better than the nearest order of magnitude; see section 7.5 for more on this. What’s worse, the way that sig figs are used in practice is even more out-of-control than that; see section 16.5.1 for details.
Everyone who reports results with uncertainties needs to walk a little ways in the other guy’s moccasins, namely the guy downstream, the guy who will receive those results and do something with them. If the uncertainty is only reported to the nearest order of magnitude, it makes it impossible for the downstream guy to collect data from disparate sources and weigh the evidence.
To say the same thing the other way, it is OK to use sig figs if you are sure that nobody downstream from you will ever use your data in an intelligent way, i.e. will never want to weigh the evidence.
Tangential remark: Just to rub salt into the wound: In addition to doing a lousy job of representing the uncertainty ΔX, the sig-figs rules also do a lousy job of representing the nominal value ⟨X⟩ because they introduce excessive roundoff error. However that is not the topic of this section.
Some things are, for all practical purposes, completely certain. For example:
On the other hand, there is a very wide class of processes that lead to a distribution of possible outcomes, and these are the main focus of today’s discussion. Some introductory examples are discussed in section 4.2.
The only way to really understand uncertainty is in terms of probability distributions. You learned in grade school how to add, subtract, multiply, and divide numbers ... but in order to deal with uncertainties you will have to add, subtract, multiply and divide probability distributions. This requires a tremendously higher level of sophistication.

If you want a definition of probability, in fundamental and formal terms, please see reference 2. For the present purposes we can get along without that, using instead some simple intuitive notions of probability, as set forth in the following examples.
As a first example, suppose we roll an ordinary six-sided die and observe the outcome. The first time we do the experiment, we observe six spots, which we denote by x_{1}=6. The second time, we observe three spots, which we denote by x_{2}=3. It must be emphasized that each of these observations has no uncertainty whatsoever. The observation x_{1} is equal to 6, and that’s all there is to it.
If we repeat the experiment many times, ideally we get the probability distribution X shown in figure 7. To describe the distribution X, we need to say three things: the outline of the distribution is rectangular, the distribution is centered at x=3.5, and the distribution has a half-width at half-maximum (HWHM) of 2.5 units (as shown by the red bar).
The conventional but abusive notation for describing such a situation is to write x=3.5±2.5, where x is called a «random variable» or an «uncertain quantity». I do not recommend this notation or this way of thinking about things. However, it is sometimes encountered, so we need a way of translating it into something that makes more sense.
An expression of the form 3.5±2.5 is a fine way to describe the distribution X. So far so good. There are however problems with the x that we encounter in expressions such as x = 3.5±2.5. In this narrow context evidently x is being used to represent the distribution X, while in other contexts the same symbol x is used to represent an outcome drawn from X, or perhaps some sort of abstract “average” outcome, or who-knows-what. This is an example of form not following function. Remember, there is a profound distinction between a number and some distribution from which that number might have been randomly drawn. See section 5.4 for more on this.
When you see the symbol x, it is important to appreciate the distinction between x=3.5±2.5 (which is abusive shorthand for the distribution X) and particular outcomes such as x_{1}=6 and x_{2}=3 (which are plain old numbers, not distributions):
The so-called random variable x “looks” like it might be one of the observations x_{i}, but it is not. The expression x=3.5±2.5 does not represent a number; instead it is a shorthand way of describing the distribution X from which outcomes such as x_{1} and x_{2} are drawn.  An outcome such as x_{1} or x_{2} is not an uncertain quantity; it’s just a number. In our example, x_{1} has the value x_{1}=6 with no uncertainty whatsoever. 
Now suppose we roll two dice, not just one. The first time we do the experiment, we observe 8 spots total, which we denote by x_{1}=8. The second time, we observe 11 spots, which we denote by x_{2}=11. If we repeat the experiment many times, ideally we get the probability distribution X shown in figure 8. To describe the distribution X, we need to say that the outline of the distribution is symmetrical and triangular, the distribution peaks at x=7, and the distribution has a half-width at half-maximum (HWHM) of 3 units (as shown by the red bar).
Next suppose the outcomes are not restricted to being integers. Let one of the outcomes be x_{3}=25.37. Once again, these outcomes are drawn from some distribution X.
We can round off each of the original data points x_{i} and thereby create some rounded data, y_{i}. For example, x_{3}=25.37 and y_{3}=25.4. We can also calculate the roundoff error q_{i} := y_{i} − x_{i}. In our example, we have q_{3}=0.03. Given a large number of such data points, we can calculate statistical properties such as the RMS roundoff error. Each x_{i} is drawn from the distribution X, while each y_{i} is drawn from some different distribution Y, and each q_{i} is drawn from some even-more-different distribution Q.
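The three sets of points can be generated in a few lines. In this sketch the distribution X is taken to be uniform on the interval from 0 to 100, which is purely an assumption for illustration since the text does not specify X, and the rounding is to one decimal place as in the example.

```python
import math
import random

rng = random.Random(42)
xs = [rng.uniform(0.0, 100.0) for _ in range(10_000)]  # stand-in for draws from X
ys = [round(x, 1) for x in xs]                         # rounded data, drawn from Y
qs = [y - x for x, y in zip(xs, ys)]                   # roundoff errors, drawn from Q

rms_q = math.sqrt(sum(q * q for q in qs) / len(qs))
q3 = round(25.37, 1) - 25.37                           # the example from the text

print(q3)      # about 0.03, apart from binary floating-point fuzz
print(rms_q)   # close to 0.1/sqrt(12), i.e. about 0.029
```

The RMS value reflects the fact that Q is nearly uniform over ±0.05 here, which looks nothing like X or Y.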
Consider the probability distribution represented by the colored bands in figure 9. There is a distribution over y-values, centered at y=2. Green represents ±1σ from the centerline, yellow represents ±2σ, and magenta represents ±3σ. The distribution exists as an abstraction, as a thing unto itself. The distribution exists whether or not we draw any points from it.
Meanwhile in figure 10, the small circles represent data points drawn from the specified distribution. The distribution is independent of x, and the x-coordinate has no meaning. The points are spread out in the x-direction just to make them easier to see. The point here is that randomness is a property of the distribution, not of any particular point drawn from the distribution.
According to the frequentist definition of probability, if we had an infinite number of points, we could use the points to define what we mean by probability ... but we have neither the need nor the desire to do that. We already know the distribution. Figure 9 serves quite nicely to define the distribution of interest.
By way of contrast, it is very common practice – but not recommended – to focus attention on the midline of the distribution, and then pretend that all the uncertainty is attached to the data points, as suggested by the error bars in figure 11.
In particular, consider the red point in these figures, and consider the contrasting interpretations suggested by figure 10 and figure 11.
Figure 10 does a good job of representing what’s really going on. It tells us that the red point is drawn from the specified distribution. The distribution has a standard deviation of σ=0.25 and is centered at y=2 (even though the red dot is sitting at y=2.5).  Figure 11 incorrectly suggests that the red point represents a probability distribution unto itself, allegedly centered at y=2.5 and extending symmetrically above and below there, with an alleged standard deviation of σ=0.25. 
Specifically, the red point sits approximately 2σ from the center of the relevant distribution as depicted in figure 10. If we were to go up another σ from there, we would be 3σ from the center of the distribution.  Figure 11 wrongly suggests that the top end of the red error bar is only 1σ from the center of “the” distribution i.e. the alleged red distribution ... when in fact it is 3σ from the center of the relevant distribution. This is a big deal, given that 3σ deviations are quite rare. 
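The arithmetic behind this contrast is short enough to spell out. The sketch below uses the stated center y=2 and σ=0.25, and it assumes the distribution is Gaussian in order to put a number on how rare a 3σ deviation is. The function names are hypothetical.

```python
import math

center, sigma = 2.0, 0.25      # the distribution shown by the bands
red_point = 2.5
bar_top = red_point + sigma    # top end of the misleading error bar

def n_sigma(y):
    """Distance from the center of the relevant distribution, in units of sigma."""
    return (y - center) / sigma

def two_sided_tail(n):
    """Probability that a Gaussian draw lands more than n sigma from the center."""
    return math.erfc(n / math.sqrt(2))

print(n_sigma(red_point), n_sigma(bar_top))
print(two_sided_tail(1), two_sided_tail(3))
```

Under the Gaussian assumption, a deviation beyond 1σ happens roughly 32% of the time, whereas a deviation beyond 3σ happens less than 0.3% of the time.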
Things get more interesting when the model says the uncertainty varies from place to place, as in figure 12. The midline of the band is a power law, y = x^{3.5}. The uncertainty has two components: an absolute uncertainty of 0.075, “plus” a relative uncertainty of 0.3 times the y-value. The total uncertainty is found by adding these two components in quadrature.
This sort of thing is fairly common. For instance, the calibration certificate for a voltmeter might say the uncertainty is such-and-such percent of the reading plus this-or-that percent of full scale.
Note that on the left side of the diagram, the total uncertainty – the width of the band – is dominated by the absolute uncertainty, whereas on the right side of the diagram, the total uncertainty is dominated by the relative uncertainty.
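To make the quadrature sum concrete, here is a minimal sketch of the band in figure 12. The numbers 0.075, 0.3, and the exponent 3.5 come from the text; the function names and the sample x-values are merely illustrative assumptions.

```python
import math

ABS_UNC = 0.075   # absolute component of the uncertainty (from the text)
REL_UNC = 0.3     # relative component, as a fraction of the y-value (from the text)

def midline(x):
    """Midline of the band: the power law y = x^3.5."""
    return x ** 3.5

def total_uncertainty(x):
    """Absolute and relative components added in quadrature."""
    return math.hypot(ABS_UNC, REL_UNC * midline(x))

# Sample x-values chosen only for illustration.
for x in (0.2, 0.5, 1.0, 2.0):
    y = midline(x)
    print(f"x={x:4.1f}  y={y:8.4f}  relative part={REL_UNC * y:7.4f}  "
          f"total={total_uncertainty(x):7.4f}")
```

At small x the total is essentially the absolute part, and at large x it is essentially the relative part, which is the crossover described above.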
Figure 13 shows the same data, plotted on log/log axes. Note that log/log axes are very helpful for visualizing some aspects of the data, such as the fact that the power law is a straight line in this space. However, log/log axes can also get you into a lot of trouble. One source of trouble is the fact that the error bands in figure 12 extend into negative-y territory. If you take the log of a negative number, bad things are going to happen.
In figure 13, the red downward-pointing triangles hugging the bottom edge of the plot correspond to off-scale points. The abscissa is correct, but the ordinate of such points is unplottable.
The spreadsheet used to create these figures is given in reference 4.
Band plots (as in figure 10 or figure 12) are extremely useful. The technique is not nearly as well known as it should be. As a related point, it is extremely unfortunate that the commonly-available plotting tools do not support this technique in any reasonable way.
Tangential remark: This can be seen as reason #437 why sig figs are a bad idea. In this case, sig figs force you to attribute error bars to every data point you write down, even though that’s conceptually wrong.
Please see reference 2 for a discussion of fundamental notions of probability, including the idea that a distribution has width but a point does not.
Figure 14 is a copy of figure 1. Let’s take a closer look at it.
For one thing, note that the figure actually gives two representations of the same distribution.
We expect that as we increase the number of points in the scatter plot, it becomes a better and better representation of the ideal distribution, but this is not necessary. The ideal Gaussian distribution exists as a thing unto itself, and is not defined in terms of the scatter plot.
This is considered a one-dimensional distribution, because the probability is known as a function of x alone. That is to say, when we draw a point from the distribution, we care only about where it lies along the x-axis, the horizontal axis. In the figure, the points are spread out vertically, but primarily this is just to make them easier to see; you could redistribute them vertically without changing the meaning.
Secondarily, we have used a clever trick: At each point along the x axis, the points are spread vertically by an amount proportional to the probability density in the vicinity of x. That means that the scatter plot has a uniform density per unit area in the plane.
The yellow-shaded region extends one standard deviation to each side of the midline. Recall that in this example, the nominal value is 1.234 and the standard deviation is 0.055. You can see that “most” of the probability is within ± one standard deviation of the nominal value, but there will always be outliers.
The ordinate is dP(x), which you should think of as the probability density. For any x, there is zero probability of finding a point exactly at x, but the probability density tells you how much probability there is near x.
Last but not least, it must be emphasized that the data points have zero size. In the scatter plot, the points correspond to the centers of the red circles. The size of the circle means nothing. The circles are drawn big enough to be visible, and small enough to avoid overcrowding. There is a width to the distribution of points, but no width to any individual point. For details on this, see the section on “convergence” in reference 2.
Here are some examples of discrete distributions:
There is an important distinction between an individual outcome and a distribution over outcomes. The distribution assigns a certain amount of probability to each possible outcome. (Reference 2 presents a more formal, mathematical way of saying the same thing.)
There are many different distributions in the world. For starters, we must distinguish the “before” and “after” situations:
Before the toss, for an ideal die, the initial distribution assigns 1/6th of the probability to each of the six possible outcomes, as shown in figure 15.  After the die has been tossed, suppose we observe three spots. The set of remaining possibilities is a singleton, i.e. a set with only this one element. The final distribution assigns 100% of the probability to this one outcome, as shown in figure 16. 
In a discrete distribution, the outcomes do not need to be integers. It is perfectly possible to have a distribution over rational numbers, over real numbers, or even over abstract symbols. As a familiar and important example, there are 2.54 centimeters per inch. Even though 2.54 is a rational number, and even though there are infinitely many rational numbers, there is no uncertainty about having 2.54 centimeters per inch. There is a 100% probability that there will be 254 centimeters in 100 inches, by definition. The cumulative probability for this distribution is shown in figure 17.
Now suppose we have a continuous distribution (as opposed to a discrete distribution). This allows us to handle situations where there are infinitely many possible outcomes. This includes outcomes that are represented by rational numbers or real numbers, such as length or voltage.
We can contrast the discrete distributions we have just seen with various continuous distributions:
Some people who have been exposed to sig figs think that every time you write a rational number in decimal form, such as 2.54, there must be some “implied” uncertainty. This is just not true. The width of the riser in figure 17 is zero. There is some width in figure 18 and in figure 19, but not in figure 17.
When sane people write 2.54, they are writing down a rational number. It is 254/100, and that’s all there is to it. As such, it is exact. This number can be used in various ways, as part of more complex expressions. For example:
Let’s be clear: You are allowed to write down a number without saying – or implying – anything about any sets or distributions from which the number might have come.
For some purposes, it is conceptually important to distinguish 2.54 (which is a plain old number) from [2.54±0.005] (which is an interval). It is OK to talk about them using the same language, treating them both as distributions, as in figure 17 and figure 18 ... but we can clearly see that they are different distributions.
Furthermore, even though they are different, there are some situations where we can get away with blurring the distinction:
Tangential remark: There is a tendency to associate continuous distributions with infinite sets and vice versa, but this is not strictly necessary, as we have seen in figure 17. It is also possible to have a hybrid distribution, partly continuous and partly discrete. In figure 20, for example, half of the probability is spread over the negative real numbers, while the other half of the probability is assigned to a single positive number, namely 2.54. The probability for all other positive numbers is zero. The distribution for negative numbers is half a Gaussian; if it were a whole Gaussian it would have a mean of zero and a standard deviation of unity. You can see that about 16% of the total probability lies to the left of −1, which is what you would expect for such a distribution.
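As a rough check on the 16% figure, here is a sketch of the cumulative probability for the hybrid distribution. It assumes that “half a Gaussian” means the left half of a unit Gaussian with its original normalization, so that it carries exactly half the probability; the function names are illustrative.

```python
import math

def gaussian_cdf(x, mean=0.0, sigma=1.0):
    """Cumulative probability of an ordinary (whole) Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

def hybrid_cdf(x, spike=2.54):
    """Cumulative probability of the hybrid distribution.

    Assumption: for x < 0 the density equals that of a unit Gaussian,
    so the negative half-line carries probability 0.5. The remaining
    0.5 sits on the single point x = spike.
    """
    if x < 0:
        return gaussian_cdf(x)
    if x < spike:
        return 0.5
    return 1.0

# Probability lying below x = -1, i.e. one standard deviation below zero.
print(f"P(x < -1) = {hybrid_cdf(-1.0):.4f}")
```

The cumulative probability climbs smoothly to 0.5 at zero, stays flat until 2.54, and then jumps to 1 with a riser of zero width.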
There are lots of analog measurements in the world. For example:
Analog measurements are perfectly reasonable. There are ways of indicating the uncertainty of an analog measurement. However, these topics are beyond the scope of the present discussion, and we shall have nothing more to say about them.
Here are the main cases and subcases of interest:
Let’s be clear: The incoming signal is analog, and the needle position is analog, but the digits you write into the lab book are digital.
It helps to distinguish the indicated value from the true value. Let’s consider a couple of scenarios:
Scenario A: We hook a digital voltmeter to a nice steady voltage.
We observe that the meter says 1.23 volts. This is the indicated voltage. It is known.  There is “some” true voltage at the input. We will never know the exact voltage, which is OK, because we don’t need to know it. 
If the meter is broken, the true voltage could be wildly different from the indicated voltage. 
Since this is a digital instrument, the indicated values are discrete.  The true voltage is a continuous variable. 
In general, each indicated value corresponds to a range of true values, or some similar distribution over true values. For example, in the case of an ideal voltmeter, the relationship might follow the pattern shown in table 1.
indicated value  :  range of true values
1.1  :  [1.05, 1.15]
1.2  :  [1.15, 1.25]
1.3  :  [1.25, 1.35]
1.4  :  [1.35, 1.45]
etc.  :  etc.
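The pattern in table 1 can be sketched in a few lines. This assumes an ideal meter whose display steps by 0.1 and which rounds to the nearest step; the function names and the default step are illustrative, not taken from any real instrument.

```python
def indicate(true_value, step=0.1):
    """Indicated value shown by an ideal meter for a given true value.

    Assumption: the meter rounds to the nearest display step. Values
    exactly on a boundary are ambiguous, just as the endpoints of the
    intervals in the table overlap.
    """
    return round(true_value / step) * step

def true_value_range(indicated, step=0.1):
    """Range of true values consistent with a given indicated value."""
    half = step / 2
    return (indicated - half, indicated + half)

for shown in (1.1, 1.2, 1.3, 1.4):
    lo, hi = true_value_range(shown)
    print(f"{shown:.1f}  :  [{lo:.2f}, {hi:.2f}]")

print(f"a true value of 1.2345 is indicated as {indicate(1.2345):.1f}")
```

The mapping goes one way only: many true values share one indication, so the indication can be known exactly while the true value cannot.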
Scenario B: Using a couple of comparators, we arrange to show a green light whenever the voltage is greater than −12 volts and less than +12 volts, and a red light otherwise. That is to say, a “green light” indication corresponds to a true value in the interval 0±12 volts.
indicated value  :  range of true values
Green  :  [−12, 12]
Red  :  (−∞, −12) ∪ (12, ∞)
Instruments with nonnumerical outputs are quite common in industry, used for example in connection with “pass/fail” inspections of incoming or outgoing merchandise. There are many indicators of this kind on the dashboard of your car, indicating voltage, oil pressure, et cetera.
In both of these scenarios, the indicated value is discrete.  The true value is a continuous, analog variable. 
If the indicated value is not fluctuating, it can be considered exact, with zero uncertainty, with 100% of the probability.  The true value will always have some nonzero uncertainty. It will never be equal to this-or-that number. 
Even if the indicated value is fluctuating, there will be a finite set of indications that share 100% of the probability. Each member of the set will have some discrete, nonzero probability.  No specific true value occurs with any nonzero probability. The best we can do is talk about probability density, or about the probability of true values in this-or-that interval. 
The indicated value will never be exactly equal to the true value. This is particularly obvious in scenario B, where the indicated value is not even numerical, but is instead an abstract symbol.
Still, the indicated value does tell us “something” about the true value. It corresponds to a range of true values, even though it cannot possibly equal the true value.
You should not imagine that things will always be as simple as the examples we have just seen.
Terminology: The true-value intervals (such as we see in table 1) go by various names. In the context of digital instruments people speak of resolution, quantization error, and/or roundoff error. In the context of analog instruments they speak of resolution and/or readability.
In a well-designed experiment, these issues are almost never the dominant contribution to the overall uncertainty. This leads to an odd contrast:
When designing apparatus and procedures, you absolutely must understand these issues well enough to make sure they will not cause problems.  Later, during the day-to-day operation of a well-designed procedure, you can almost forget about these issues. Almost. Maybe. 
Keep in mind that we are using the word uncertainty to refer to the width of a probability distribution ... nothing more, nothing less.
Sometimes this topic is called “error analysis”, but beware that the word “error” is very widely misunderstood.
In this context, the word “error” should not be considered pejorative. It comes from a Latin root meaning travel or journey. The same root shows up in non-pejorative terms including errand and knight-errant.  Some people think that an error is Wrong with a capital W, in the same way that lying and stealing are Wrong, i.e. sinful. This is absolutely not what error means in this context. 
In this context, error means the same thing as uncertainty. It refers to the width of the distribution, not to a mistake or blunder. Indeed, we use the concept of uncertainty in order to avoid making mistakes. It would always be a mistake to say the voltage was exactly equal to 1.23 volts, but we might be confident that the voltage was in the interval 1.23±0.05 volts.
The proper meaning of uncertainty (aka “error”) is well illustrated by Scenario B in section 4.7. The comparator has a wide distribution of true voltages that correspond to the “green light” indication. This means we are uncertain about the true voltage. This uncertainty is, however, not a blunder. Absolutely not. The width of the distribution is completely intentional. The width was carefully designed, and serves a useful purpose.
This point is very widely misunderstood. For example, the cover of Taylor’s book on Error Analysis (reference 5) features a crashed train at the Gare Montparnasse, 22 October 1895. A train crash is clearly an example of a shameful mistake, rather than a careful and sophisticated analysis of the width of a distribution. It’s a beautiful photograph, but it conveys entirely the wrong idea.
See also section 7.10.
Consider the following contrast:
I have zero confidence that the value of π is in the interval [3.14 ± 0.001].  I have 100% confidence that the value of π is in the interval [3.14 ± 0.002]. 
In this case, we have a tight tolerance but low confidence.  Using a wider tolerance gives us a vastly greater confidence. 
If you demand exact results, you are going to be bitterly disappointed. Science rarely provides exact results.  If you are willing to accept approximate results within some reasonable tolerance interval, science can deliver extremely reliable, trustworthy results. 
Science does not achieve perfection, or even try for perfection.  What we want is confidence. Science provides extremely powerful, highconfidence methods for dealing with an imperfect world. 
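The contrast involving π at the start of this comparison is easy to verify directly. The helper name below is merely illustrative.

```python
import math

def in_interval(value, center, half_width):
    """True if value lies in the closed interval [center ± half_width]."""
    return center - half_width <= value <= center + half_width

# Tight tolerance: pi = 3.14159... lies above 3.141, outside the interval.
print(in_interval(math.pi, 3.14, 0.001))   # False

# Slightly wider tolerance: the interval [3.138, 3.142] contains pi.
print(in_interval(math.pi, 3.14, 0.002))   # True
```

Doubling the tolerance takes the confidence from zero to 100%, which is about as vivid as the tolerance-versus-confidence tradeoff gets.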
Accounting for uncertainty is not merely an exercise in mathematics. Before you can calculate the uncertainty in your results, you need to identify all the significant sources of uncertainty. This is a major undertaking, and requires skill and judgment.
For example: The voltmeter could be miscalibrated. There could be parallax error when reading the ruler. There could be bubbles in the burette. The burette cannot possibly be a perfectly uniform cylinder. There could be moisture in the powder you are weighing. And so on and so on.
Four categories of contributions that are almost always present to some degree are fluctuations, biases, calibration errors, and resolution problems aka roundoff errors, as we now discuss.
Remark #1: Remember: Roundoff error is only one contribution to the overall uncertainty. In a well-designed experiment, it is almost never the dominant contribution. See section 7.5 for a discussion of how distributions are affected by roundoff errors.
Remark #2: It is not safe to assume that roundoff errors are uncorrelated. It is not safe to assume that calibration errors are uncorrelated. Beware that many textbooks feature techniques that might work for uncorrelated errors, but fail miserably in practical situations where the errors are correlated.
Remark #3: If one of these contributions is dominant, it is fairly straightforward to account for it while ignoring the others. On the other hand, if more than one of these contributions is non-negligible, the workload goes up significantly. You may want to redesign the experiment.
If you can’t redesign the experiment, you might still be able to save the day by finding some fancy way to account for the various contributions to the uncertainty. This, however, is going far beyond the scope of this document.
Remark #4: More specifically: You usually want to design the experiment so that the dominant contribution to the uncertainty comes from the inherent fluctuations and scatter in the variable(s) of interest. Let’s call this the Good Situation.
It’s hard to explain how to think about this. In the Good Situation, many idealizations and simplifications are possible. For example: since calibration errors are negligible and roundoff errors are negligible, you can more-or-less ignore everything we said in section 4.7 about the distinction between the indicated value and the range of true values. If you always live in the Good Situation, you might be tempted to reduce the number of concepts that you need to learn. If you do that, though, and then encounter a Not-So-Good Situation, you are going to be very confused, and you will suddenly wish you had a better grasp of the fundamentals.
Possibly helpful suggestion: A null experiment – or at least a differential experiment – often improves the situation twice over, because (a) it reduces your sensitivity to calibration errors, and (b) after you have subtracted off the baseline and other common-mode contributions, you can turn up the gain on the remaining differential-mode signal, thereby improving the resolution and readability.
There are many probability distributions in the world, including experimentally-observed distributions as well as theoretically-constructed distributions.
Any set of experimental observations {x_{i}} can be considered a probability distribution unto itself. In simple cases, we assign equal weight (i.e. equal measure, to use the technical term) to each of the observations. To visualize such a distribution, often the first thing to do is look at a scatter plot. For example, figure 34 shows a two-dimensional scatter plot, and figure 37 shows a one-dimensional scatter plot. We can also make a graph that shows how often x_{i} falls within a given interval. Such a graph is called a histogram. Examples include figure 7, figure 8, and figure 24.
Under favorable conditions, given enough observations, the histogram may converge to some well-known theoretical probability distribution. (Or, more likely, the cumulative distribution will converge, as discussed in reference 2.) For example, it is very common to encounter a piecewise-flat distribution as shown by the magenta curve in figure 21. This is also known as a square distribution, a rectangular distribution, or the uniform distribution over a certain interval. Distributions of this form are common in nature: For instance, if you take a snapshot of an ideal rotating wheel at some random time, all angles between 0 and 360 degrees will be equally probable. Similarly, in a well-shuffled deck of cards, all of the 52-factorial permutations are equally probable. As another example, ordinary decimal roundoff errors are confined to the interval [−0.5, 0.5] in the last decimal place. Sometimes they are uniformly distributed over this interval and sometimes not. See equation 52 and section 6.11 for more on this. Other quantization errors (such as discrete drops coming from a burette) contribute an uncertainty that might be more-or-less uniform over some interval (such as ± half a drop).
It is also very common to encounter a Gaussian distribution (also sometimes called a “normal” distribution). In figure 21, the black curve is one example of a Gaussian distribution. The standard deviation of the black Gaussian is 1.0, and is depicted by a blue bar. The standard deviation of the magenta rectangle is also 1.0, and is depicted by the same blue bar.
Meanwhile, the HWHM (half-width at half-maximum) of the black Gaussian is depicted by a red bar. The green Gaussian is three times as wide but only 1/3rd as high as the black Gaussian, so the area under the curve is the same for these two curves.
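Here is a small sketch of the relationships among the curves in figure 21. It assumes that “three times as wide” means the green Gaussian has a standard deviation of 3.0; the function names are illustrative. The two width formulas are standard results: a Gaussian has HWHM = σ√(2 ln 2), and a rectangle of half-width a has standard deviation a/√3.

```python
import math

def gaussian_pdf(x, sigma, mean=0.0):
    """Probability density of a Gaussian."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def gaussian_hwhm(sigma):
    """Half-width at half-maximum of a Gaussian."""
    return sigma * math.sqrt(2.0 * math.log(2.0))

def rectangle_half_width(sigma):
    """Half-width of the rectangular distribution having this standard deviation."""
    return sigma * math.sqrt(3.0)

def area(pdf, lo, hi, n=20000):
    """Midpoint-rule estimate of the area under pdf between lo and hi."""
    width = (hi - lo) / n
    return sum(pdf(lo + (i + 0.5) * width) for i in range(n)) * width

def black(x):
    return gaussian_pdf(x, 1.0)   # standard deviation 1.0, as in the text

def green(x):
    return gaussian_pdf(x, 3.0)   # assumed: three times the black width

print(f"HWHM of black Gaussian:       {gaussian_hwhm(1.0):.4f}")
print(f"half-width of magenta rectangle: {rectangle_half_width(1.0):.4f}")
print(f"peak ratio black/green:       {black(0.0) / green(0.0):.4f}")
print(f"areas: black {area(black, -30, 30):.6f}, green {area(green, -30, 30):.6f}")
```

Note that the HWHM is somewhat larger than the standard deviation, and the rectangle extends well past one standard deviation on each side, even though all three share σ = 1.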
Table 3 lists a few well-known families of distributions. See section 12.8 for more on this.
Family  |  # of parameters  |  example
Bernoulli  |  1  |  coin toss
Poisson  |  1  |  counting random events
Gaussian  |  2  |  white noise
Rectangular  |  2  |  one die; also roundoff (sometimes)
Symmetric triangular  |  2  |  two dice
Asymmetric triangular  |  3  |
Each of these distributions is discussed in more detail in reference 2.
Each name in table 3 applies to a family of distributions. Within each such family, to describe a particular member of the family (i.e. a particular distribution), it suffices to specify a few parameters. For a symmetrical two-parameter family, typically one parameter specifies the center-position and the second parameter has something to do with the half-width of the distribution. The height of the curve is implicitly determined by the width, via the requirement^{2} that the area under the curve is always 1.0.
In particular, when we write A±B, that means A tells us the nominal value of the distribution and B tells us the uncertainty or equivalently the error bar. See section 4.14 for details on the various things we might mean by nominal value and uncertainty.
Best current practice is to speak in terms of the uncertainty. We use uncertainty in a broad sense. Other terms such as accuracy, precision, experimental error, readability, tolerance, etc. are often used as nontechnical terms ... but sometimes connote various subtypes of uncertainty, i.e. various contributions to the overall uncertainty, as discussed in section 11. In most of this document, the terms “precise” and “precision” will be used as generic, not-very-technical antonyms for “uncertain” and “uncertainty”.
As a related point, see section 12.7 for details on why we avoid the term “experimental error”.
Some guidelines for describing a distribution are given in section 1.2. When writing the nominal value and the standard deviation, be sure to write them separately, using two separate numerals. For example, NIST (reference 6) reports the charge of the electron as
1.602176462(63) × 10^{−19} coulombs (4) 
which is by definition equivalent to
(1.602176462 ± 0.000000063) × 10^{−19} coulombs (5)
Note that this value departs from the usual “sig-digs rules” by a wide margin. The reported nominal value ends in not one but two fairly uncertain digits.
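As a minimal sketch (not anything NIST prescribes), here is one way to unpack the parenthesized form used in equation 4 into two separate numerals. The function name parse_concise is hypothetical, and the pattern handles only plain decimal numerals with no exponent.

```python
import re
from decimal import Decimal

def parse_concise(text):
    """Split a numeral like '1.602176462(63)' into (nominal, uncertainty).

    The digits in parentheses are taken to be the uncertainty in the
    last digits of the nominal value. Hypothetical helper; handles only
    plain decimal numerals, with the power of ten left to the caller.
    """
    match = re.fullmatch(r"(-?\d+)(?:\.(\d+))?\((\d+)\)", text)
    if match is None:
        raise ValueError(f"not in concise notation: {text!r}")
    int_part, frac_part, unc_digits = match.groups()
    frac_part = frac_part or ""
    nominal = Decimal(int_part + ("." + frac_part if frac_part else ""))
    # Shift the uncertainty digits so they line up with the last digits.
    uncertainty = Decimal(unc_digits).scaleb(-len(frac_part))
    return nominal, uncertainty

nominal, uncertainty = parse_concise("1.602176462(63)")
print(f"({nominal} ± {uncertainty:f}) × 10^-19 coulombs")
```

Decimal arithmetic is used here so that no digits are disturbed by binary floating-point conversion.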
For specific recommendations on what you should do, see section 7.2. Also, NIST offers some prescriptions on how to analyze and report uncertainties; see reference 7, reference 8, and reference 9.
The “significant figures” method attempts to use a single decimal numeral to express both the center and the half-width of a distribution: the ordinary value of the numeral encodes the center, while the length of the string of digits roughly encodes the half-width. This is a horribly clumsy way of doing things.
See section 1.3 and section 16.
In the expression A±B, we call A the nominal value and B the uncertainty (or, equivalently, the error bar).
We will explicitly avoid giving any quantitative definition for the terms nominal value and uncertainty. This is because there is not complete consensus as to how to quantify the expression A±B. When you write such an expression, it is up to you to specify exactly what you mean by it. When you read such an expression, you will have to look at the context to figure out what it means.
Meanwhile, as for B:
However, if you are going to use two-sigma or three-sigma error bars, you need to warn people, because this is not what they are expecting. Normally, for a Gaussian, the expression A±B communicates the mean plus-or-minus one sigma.
As for the uncertainty, there are at least two reasonable choices. B could represent the standard deviation, or it could represent the HWHM.
Again there are reasonable arguments for using the standard deviation to quantify the uncertainty, and also reasonable arguments for using the HWHM. Both are commonly used:
In all cases the uncertainty B is more closely related to the half-width than to the full width, since the expression A±B is pronounced A plus-or-minus B, not plus-and-minus. That is to say, B represents the plus error bar or the minus error bar separately, not both error bars together.
For a distribution defined by a collection of data, we need to proceed even more carefully. The data itself has a perfectly well defined mean and standard deviation, and you could certainly compute the mean and standard deviation, using the definition directly. These are called the sample-mean and the sample-standard-deviation. These quantities are well defined, but not necessarily very useful. Usually it is smarter to assume that the data is a sample drawn from some underlying mathematically-defined distribution – called the population – and to use the data to estimate the parameters of the population. The mean of the data might not be the best estimator of the mean of the population. (When the number of data points is not very large, the standard deviation of the sample is a rather badly biased estimator of the standard deviation of the population.)
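The bias mentioned in the parenthetical remark can be seen in a quick simulation. This is only a sketch: it assumes a Gaussian population with a standard deviation of 1.0, and the sample size, trial count, seed, and function name are arbitrary illustrative choices.

```python
import random
import statistics

def average_std_estimates(n_points, n_trials=5000, seed=0):
    """Average two standard-deviation estimates over many small samples.

    Assumption: samples are drawn from a Gaussian population with
    mean 0 and standard deviation 1, so an unbiased estimator would
    average out to 1.0.
    """
    rng = random.Random(seed)
    total_direct = 0.0      # definition applied directly (divide by N)
    total_corrected = 0.0   # conventional correction (divide by N-1)
    for _ in range(n_trials):
        data = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
        total_direct += statistics.pstdev(data)
        total_corrected += statistics.stdev(data)
    return total_direct / n_trials, total_corrected / n_trials

direct, corrected = average_std_estimates(n_points=3)
print(f"population sigma = 1.0")
print(f"average sample standard deviation (divide by N):   {direct:.3f}")
print(f"average corrected estimate (divide by N-1):        {corrected:.3f}")
```

With only three points per sample, the directly computed standard deviation falls well short of the population value on average. The N−1 version does better, but it too remains somewhat low as an estimate of the standard deviation.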
Also, remember: An expression of the form A±B only makes sense provided everybody knows what family of distributions you are talking about, provided it is a well-behaved two-parameter family, and provided everybody knows what convention you are using to quantify the nominal value and the uncertainty. To say the same thing the other way: it is horrifically common for people to violate these provisos, in which case A±B doesn’t suffice to tell you what you need to know. For example: in figure 21, the black curve and the magenta curve have the same mean and the same standard deviation, but they are certainly not the same curve. Data that is well described by the black curve would not be well described by the magenta curve, nor vice versa.
It is very common to have an analog meter where the calibration certificate says the uncertainty is 2% of the reading plus 2% of full scale. The latter number means there is some uncertainty as to the “zero offset” of the meter.
When dealing with uncertainty, it helps to keep in mind the distinction between the indicated value and the true value. As discussed in section 4.7, even when the indicated value is known with zero uncertainty, it usually represents a range of true values with some conspicuously nonzero uncertainty.
This tells us that when the indicated value is at the top of the scale, the distribution of true values has a relative uncertainty of 3 or 4 percent (depending on whether you think the various contributions are independent). More generally, the situation is shown in table 4.
indicated value  :  range of true values  |  absolute uncertainty  |  relative uncertainty
0  :  [−0.02, 0.02]  |  0.02  |  ∞
0.05  :  [0.03, 0.07]  |  0.02  |  40.05%
0.1  :  [0.08, 0.12]  |  0.0201  |  20.1%
0.2  :  [0.18, 0.22]  |  0.0204  |  10.2%
0.3  :  [0.28, 0.32]  |  0.0209  |  6.96%
0.4  :  [0.38, 0.42]  |  0.0215  |  5.39%
0.5  :  [0.48, 0.52]  |  0.0224  |  4.47%
0.6  :  [0.58, 0.62]  |  0.0233  |  3.89%
0.7  :  [0.68, 0.72]  |  0.0244  |  3.49%
0.8  :  [0.77, 0.83]  |  0.0256  |  3.2%
0.9  :  [0.87, 0.93]  |  0.0269  |  2.99%
1  :  [0.97, 1.03]  |  0.0283  |  2.83%
As you can see in the table, as the readings get closer to the bottom of the scale, the absolute uncertainty goes down, but the relative uncertainty goes up dramatically. Indeed, if the reading is in the bottom part of the scale, you should switch ranges if you can ... but for the moment, let’s suppose you can’t.
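The entries in table 4 can be reproduced with a short calculation. This sketch assumes the two calibration contributions are independent and therefore combine in quadrature, which is what the tabulated numbers appear to follow; the full-scale value of 1 and the names below are illustrative.

```python
import math

CAL_OF_READING = 0.02      # 2% of the reading (from the text)
CAL_OF_FULL_SCALE = 0.02   # 2% of full scale (from the text)
FULL_SCALE = 1.0           # assumed: readings expressed as a fraction of full scale

def absolute_uncertainty(indicated):
    """Combine the two calibration contributions in quadrature.

    Assumption: the contributions are independent. If they were fully
    correlated, a straight sum would be appropriate instead.
    """
    return math.hypot(CAL_OF_FULL_SCALE * FULL_SCALE, CAL_OF_READING * indicated)

def table_row(indicated):
    """Return (indicated, low, high, absolute, relative) for one reading."""
    unc = absolute_uncertainty(indicated)
    relative = unc / indicated if indicated else math.inf
    return indicated, indicated - unc, indicated + unc, unc, relative

for reading in (0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1):
    ind, lo, hi, unc, rel = table_row(reading)
    print(f"{ind:<5}:  [{lo:.2f}, {hi:.2f}]  {unc:.4f}  {rel:.2%}")
```

Replacing math.hypot with a plain sum gives the correlated worst case, which is where the 4 percent figure at full scale comes from.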
Keep in mind that calibration errors are only one of many contributions to the overall uncertainty.
Let’s turn now to another contribution, namely readability. Imagine that the meter is readable to ±2% of full scale. That means it is convenient to express each reading as a twodigit number. You should record both digits, even in the bottom quarter of the range, where the associated uncertainty is so large that the sig figs rules would require you to round off. You should record both digits because:
You should write a note in the lab book saying what you know about the situation:
Calibration good to 2% of reading plus 2% of full scale.
Scale readable to 2%.
Then just record each indicated value, as is. Two decimal places suffice to guarantee that the roundoff error is not larger than the readability interval. Remember that the indicated value is known with zero uncertainty, but represents a distribution of true values.
Writing such a note in the lab book, and then writing the indicated values as plain numbers, is incomparably easier and better than trying to describe the range of true values for every observation on a line-by-line basis.
This upholds the important rule: say what you mean, and mean what you say. Describing the calibration and readability situation and then writing down the indicated values makes sense, because you are writing down what you know, nothing more and nothing less.
Also note that this upholds the rule of specifying the uncertainty separately, rather than trying to encode it using sig figs. You should never try to use one numeral to represent two numbers.
Figure 22 is a photograph^{3} of some liquid in a burette. For present purposes, this photograph is our raw data. Our task is to read the data, so as to arrive at a numerical reading.
Let’s start by taking the simple approach. (See section 5.3 for a fancier approach.)
To get a decent accuracy, we divide the smallest graduation in half. Therefore readings will be quantized in steps of 0.05 mL. More to the point, that gives us a readability of ±0.025 mL, since the indicated value will differ from the true value by at most half a step in either direction.
Using this approach, I observe that the meniscus is pretty close to the 39.7 graduation. It is not halfway to 39.8, or even halfway to halfway, so it is clearly closer to 39.7 than to 39.75. Therefore I would record the indicated value as 39.7 mL (with a readability of ±0.025 mL).
We now start over and redo the interpolation. We work a lot harder this time, so as to obtain a more accurate result.
It is not always worthwhile to go to this much trouble, but sometimes it is.
I choose to define “the” position of the meniscus as the boundary between the dark boundary and the bright halo. Others may choose differently. The choice doesn’t matter much for typical chemlab purposes (so long as the choice is applied consistently), because when using a burette we are almost always interested in the difference between two readings.
It is not hard to position the boundary of the red object against the boundary of the liquid with subpixel accuracy. It may help to reduce the opacity of the red object during this step.
Following this procedure, I decide the indicated value is 39.71, readable to the nearest 0.01 mL. That is to say, the readability is ±0.005 mL. Note that this approach gives us five times better accuracy, compared to the simple approach in section 5.2.
It is not necessary to computer-analyze every burette reading. For one thing, in many cases you don’t need to know the reading to this degree of accuracy. Secondly, with a little bit of practice you can read this burette by eye to the nearest 0.01 mL, without the aid of the computer. A detailed analysis is worth the trouble every once in a while, if only to increase your eyeball skills, and to give you confidence in those skills. Interpolating by eye to one tenth of a division is doable, but it is not easy. Nobody was born knowing how to do this.
At some point readability gets mixed up with quantization error aka roundoff error associated with the numbers you write down. In this example, I have chosen to quantize the reading in steps of 0.01 mL. This introduces a roundoff error of ± 0.005 mL ... with a very non-Gaussian distribution.
Remember: In a well-designed experiment, roundoff error is almost never the dominant contribution to the overall uncertainty. In this case, the roundoff error is less than the uncertainty due to my limited ability to see where the meniscus actually is, so I’m not going to worry too much about it.
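To get a feel for the size and shape of this roundoff contribution, here is a small simulation sketch. It assumes the true meniscus positions are spread uniformly over a range much wider than one step, which need not hold in a real titration; the range, seed, and function name are illustrative.

```python
import math
import random
import statistics

STEP = 0.01   # mL, the quantization step chosen in the text

def roundoff_errors(n, seed=1):
    """Simulate roundoff errors from recording to the nearest STEP.

    Assumption: true values are uniformly distributed between 39 and
    40 mL, a range chosen only for illustration.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n):
        true_value = rng.uniform(39.0, 40.0)
        recorded = round(true_value / STEP) * STEP
        errors.append(recorded - true_value)
    return errors

errors = roundoff_errors(20000)
print(f"largest error magnitude: {max(abs(e) for e in errors):.5f} mL")
print(f"standard deviation:      {statistics.pstdev(errors):.5f} mL")
print(f"STEP / sqrt(12):         {STEP / math.sqrt(12):.5f} mL")
```

Under that assumption the errors are flat across ±0.005 mL rather than bell-shaped, and their standard deviation is close to the step divided by √12, noticeably smaller than the half-step bound.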
It is hard to know the readability for sure without repeating the measurement N times and doing some sort of statistical analysis.
For reasons discussed in section 5.1 and section 5.4, you probably do not want to record this in the form 39.71 ± 0.005, because people will interpret that as a statement of “the” uncertainty, whereas readability is only one contribution to the overall uncertainty. It is better to simply make a note in the lab book, saying that you read the burette to the nearest 0.01 mL, or words to that effect.
On top of all that, the meaning of a burette reading may be subject to uncertainty due to the fact that the liquid comes out in discrete drops. There are steps you can take to mitigate this. If there are droplets inside the column, or a thin film wetting the surface, this is an additional source of uncertainty, including both scatter and systematic error.
Last but not least, there will be some uncertainty due to the fact that the burette may not be a perfect cylinder, and the graduations may not be in exactly the right places. Industrystandard tolerances are:
Capacity / mL  Class A tolerance / mL  Class B tolerance / mL
10  0.02  0.04
25  0.03  0.06
50  0.05  0.10
100  0.10  0.20
The tolerances apply to the full capacity of the burette. It is likely (but not guaranteed) that the errors will be less if a lesser amount is delivered from the burette.
At the time you make a reading, it is quite likely that you don’t know the overall uncertainty, in which case you should just write down the number with plenty of guard digits.^{4} Make a note of whatever calibration information you have, and make a note about the readability, but don’t say anything about the uncertainty. Weeks or months later, when you have figured out the overall uncertainty, you should report it ... and in most cases you should also report the various things that contributed to it, including things like readability, quantization errors, systematic errors, et cetera.
Suppose we perform an ensemble of measurements, namely 100 repetitions of the experiment described in section 5.3. The black vertical bars in Figure 24 are a histogram, showing the results of a numerical simulation.
One thing to notice is that the measurements, as they appear in my lab book, have evidently been rounded off. This is of course unavoidable, since the true value is a continuous, analog variable, while the indicated value that gets written down must be discrete, and must be represented by some finite number of digits. See section 7.5 for more about the effect of rounding. We can see that in the figure, by noticing that only the bins corresponding to round multiples of 0.001 are occupied. The histogram shows data for bins at all multiples of 0.0002, but only every fifth such bin has any chance of being occupied.
In figure 24, the magenta line is a Gaussian with the same mean and standard deviation as the ensemble of measurements. No deep theory is needed here; we just calculate the mean and standard deviation of the data and plot the Gaussian. You can see that the Gaussian is not a very good fit to the data, but it is not too horribly bad, either. It is a concise but imperfect way of summarizing the data.
There is a conceptual point to be made here: Suppose we ignore the black bars in the histogram, and consider only the 100 raw data points plus the cooked data blob. The question arises, how many numbers are we talking about?
The answer is 102, namely the 100 raw data points plus the mean and standard deviation that constitute the cooked data blob, i.e. the Gaussian model distribution, as indicated in the following table:
Measurement # 1  is  39.37 
Measurement # 2  is  39.371 
...  
Measurement # 99  is  39.373 
Measurement # 100  is  39.371 
The model  is  39.3704 ± 0.0015 
We emphasize that there is only one ± symbol in this entire table, namely the one on the bottom line, where we describe the model distribution. In contrast, at the time measurement #1 is made, we could not possibly know the standard deviation – much less the uncertainty^{5} – of this set of measurements, so it would be impossible to write down 39.37 plusorminus anything meaningful. Therefore we just write down 39.37 and move on to the next measurement.
In general, if we have N observations drawn from some Gaussian distribution, we are talking about N+2 numbers. We are emphatically not talking about 2N+2 numbers, because it is conceptually not correct to write down any particular measurement in the form A±B. People do it all the time, but that doesn’t make it right. As mentioned in section 4, a distribution is not a number, and a number is not a distribution.
In the simplest case, namely N=1, it requires three numbers to describe the measurement and the distribution from which it was drawn. If we unwisely follow the common practice of recording “the measurement” in the form A±B, presumably B represents the standard deviation of the distribution, but A is ambiguous. Does it represent the actual observed reading, or some sort of estimate of the mean of the underlying distribution? When we have only a single measurement, the ambiguity seems mostly harmless, because the measurement itself may be our best estimate of the mean of the distribution. Even if it’s not a very good estimate, it’s all we have to go on.
Things get much stickier when there are multiple observations, i.e. N≥2. In that case, we really don’t want to have N separate estimates of the mean of the distribution and N separate estimates of the standard deviation. That is to say, it just doesn’t make sense to write down N expressions of the form A±B. The only thing that makes any sense is to write down the N measurements as plain numbers, and then separately write down the estimated mean and standard deviation of the distribution ... as in the table above.
Before leaving the burette example, there is one more issue we must discuss. It turns out that during my series of simulated experiments, in every experiment I started out with the exact same volume of liquid, namely 39.3312 mL, known to very high accuracy. Subsequently, during the course of each experiment, the volume of liquid will of course fluctuate, due to thermal expansion and other factors, which accounts for some of the scatter we see in the data in figure 24. Imperfect experimental technique and roundoff error account for additional spread.
Now we have a little surprise. The distribution of measurements is 39.3704 ± 0.0015 mL, whereas the actual amount of liquid was only 39.3312 mL, which is far, far outside the measured distribution. So, how do we explain this?
It turns out that every one of the experiments was done with the same burette, which was manufactured in such a way that its crosssectional area is too small by one part per thousand. Therefore it always reads high by a factor of 1.001, systematically.
This underlines the point that statistical analysis of your observations will not reveal systematic error. Standard deviation is precisely defined and easy to calculate, but it is not equivalent to uncertainty, let alone error. For more on this, see section 12, especially section 12.5 and section 12.6.
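Here is a minimal sketch of the kind of simulation described above. The true volume and the factor of 1.001 come from the text; the size of the random scatter and the rounding step are my assumptions, chosen to resemble the figure, and the seed is arbitrary.

```python
import random
import statistics

random.seed(1)  # arbitrary fixed seed, so the run is repeatable

TRUE_VOLUME = 39.3312  # mL, the actual amount of liquid (from the text)
SCALE_ERROR = 1.001    # the burette reads high by this factor (from the text)
SCATTER = 0.0015       # mL, ASSUMED size of the random scatter
STEP = 0.001           # mL, ASSUMED rounding step for the recorded readings

def one_reading():
    """Simulate one recorded reading: systematic scale error, scatter, roundoff."""
    analog = TRUE_VOLUME * SCALE_ERROR + random.gauss(0.0, SCATTER)
    return round(analog / STEP) * STEP

readings = [one_reading() for _ in range(100)]
mean = statistics.mean(readings)
sdev = statistics.stdev(readings)

# How far is the true value from the mean, in units of the standard deviation?
offset_in_sdevs = (mean - TRUE_VOLUME) / sdev

print(f"model: {mean:.4f} +- {sdev:.4f} mL")
print(f"true value is {offset_in_sdevs:.0f} standard deviations below the mean")
```

The standard deviation comes out small and tidy, yet the true value sits far outside the distribution. Nothing in the 100 readings themselves gives any hint of that.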
Suppose I’m measuring the sizes of some blocks using a ruler. The ruler is graduated in millimeters. If I look closely, I can measure the blocks more accurately than that, by interpolating between the graduations. As pointed out by Michael Edmiston, sometimes the situation arises where it is convenient to interpolate to the nearest 1/4th of a millimeter. Imagine that the blocks are slightly misshapen so that it is not possible to interpolate more accurately than that.
Let’s suppose you look in my lab notebook and find a column containing the following numbers:
40
40.25
40.75
41
Table 6: Length of Blocks, Raw Data
and somewhere beside the column is a notation that all the numbers are rounded to the nearest 1/4th of a millimeter. That means that each of these numbers has a roundoff error on the order of ±1/8th of a millimeter. As always, the roundoff errors are not Gaussian-distributed. Roundoff errors are one contribution to the uncertainty. In favorable situations this contribution is flat-distributed over the interval ±1/8 mm. The actual situation may not be nearly so favorable, as discussed in section 6.11, but let’s not worry about that right now.
If we worshipped at the altar of sig digs, we would say that the first number (40) had one “sig dig” and therefore had an uncertainty of a few dozen units. However, that would be complete nonsense. The actual uncertainty is a hundred times smaller than that. The lab book says the uncertainty is 1/8th of a unit, and it means what it says.
At the other end of the spectrum, the fact that I wrote 40.75 with two digits beyond the decimal point does not mean that the uncertainty is a few percent of a millimeter (or less). The actual uncertainty is ten times larger than that. The lab book says that all the numbers are rounded to the nearest 1/4th of a millimeter, and it means what it says.
The numbers in table 6 are perfectly suitable for typing into a computer for further processing. Other ways of recording are also suitable, but it is entirely within my discretion to choose among the various suitable formats that are available.
The usual ridiculous “significant digits rules” would compel me to round off 40.75 to 40.8. That changes the nominal value by 0.05 mm. That shifts the distribution by 40% of its half-width. Forty percent seems like a lot. Why did I bother to interpolate to the nearest 1/4th of a unit, if I am immediately forced to introduce a roundoff error that significantly adds to the uncertainty? In contrast, writing 3/4ths as .75 is harmless and costs nothing.
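The 40% figure is easy to check. The following sketch uses exact rational arithmetic so that no roundoff creeps into the check itself; the variable names are mine.

```python
from fractions import Fraction

STEP = Fraction(1, 4)   # readings are quantized to the nearest 1/4 mm (from the text)
HALF_WIDTH = STEP / 2   # half-width of the roundoff interval, 1/8 mm

recorded = Fraction("40.75")  # the value as written in the lab book
sig_digs = Fraction("40.8")   # the value the sig-digs rules would demand

shift = abs(sig_digs - recorded)  # how far the nominal value moves
ratio = shift / HALF_WIDTH        # the shift, as a fraction of the half-width

print(f"shift = {float(shift)} mm, i.e. {float(ratio):.0%} of the half-width")
```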
Bottom line: Paying attention to the “sig digs rules” is unnecessary at best. Good practice is to record the nominal value and the uncertainty separately. Keep many enough digits to make sure there is no roundoff error. Keep few enough digits to be reasonably convenient. Keep all the original data. See section 7.2 for more details.
Even moreextreme examples can be found. Many rulers are graduated in 1/8ths of an inch. This is similar to the example just discussed, except that now it is convenient to write things to three decimal places (not just two). Again the sig figs rules mess things up.
More generally: Any time your measurements are quantized with a stepsize that doesn’t divide 10 evenly, you can expect the “sig digs rules” to cause trouble.
Consider the contrast:
Sometimes readability is the dominant contribution to the uncertainty of the instrument, as when there are only a limited number of digits on a display, or only a limited number of coarse gradations on an analog scale.  Sometimes readability is nowhere near being the dominant contribution, as in the example in section 5.1, at the low end of the scale. 
And another, separate contrast:
Sometimes the uncertainty associated with the instrument is the dominant contribution to the overall uncertainty.  Sometimes the instrument is nowhere near being the dominant contribution, for instance when you hook a highly accurate meter to a signal that is fluctuating. 
I’ve seen alleged rules that say you should read instruments by interpolating to 1/10th of the finest scale division, and/or that the precision of the instrument is 1/10th of the finest scale division. In some situations those rules reflect reality, but sometimes they are wildly wrong.
When choosing or designing an instrument for maximum accuracy, usually you should arrange it so that the dominant contribution to the overall uncertainty is set by some sort of noise, fluctuations, or fuzz. That makes sense, because if the reading is not fuzzy, you can usually find a way to apply some magnification and get more accuracy very cheaply.
Consider the following scenario: Suppose we know how to calculate some result x_{i} as a function of some inputs a_{i}, b_{i}, and c_{i}:
 x_{i} = f(a_{i}, b_{i}, c_{i})  (6) 
We assume the functional form of f(...) is known. That’s fine as far as it goes. The next step is to understand the uncertainty. To do that, we need to imagine that the numbers a_{i}, b_{i}, and c_{i} are drawn from known distributions A, B, and C respectively, and we want to construct a distribution X with the following special property: Drawing an element x_{i} at random from X is the same as drawing elements from A, B, and C and calculating x_{i} via equation 6.
This topic is called propagation of uncertainty. The idea is that the uncertainty “propagates” from the input of f(...) to the output.
If we are lucky, the distribution X will have a simple form that can be described in terms of some nominal value ⟨X⟩ plusorminus some uncertainty [X]. If we are extra lucky, the nominal value of X will be related to the nominal values of A, B, and C by direct application of the same function f(...) that we saw in equation 6, so that
 ⟨X⟩ = f(⟨A⟩, ⟨B⟩, ⟨C⟩)  (7) 
Beware that propagation of uncertainty suffers from three categories of problems, namely Misrepresentation, Malexpansion, and Correlation. That is:
 Misrepresentation: The sigfigs approach cannot even represent uncertainty to an acceptable accuracy. Representation issues are discussed in section 7.2. You could fix the representation using the ⟨A⟩±[A] notation or some such, but then both of the following problems would remain.
 Malexpansion: The stepbystep firstorder approach fails if the firstorder Taylor expansion is not a good approximation, i.e. if there is significant nonlinearity. The stepbystep approach fails even more spectacularly if the Taylor series fails to converge. See e.g. section 6.18, section 6.5, and section 6.4.
 Correlation: The whole idea of a data blob of the form ⟨A⟩±[A] goes out the window if one blob is correlated with another. See e.g. section 6.6.
Let’s consider how these issues affect the various steps in the calculation:
 Step 0: We need a way to represent the uncertainty of three input distributions A, B, and C.
 Step 1: We need a way to calculate the properties (including the uncertainty) of the new distribution X.
 Step 2: After we know the uncertainty of X, we need a way to represent it.
Steps 0 and 2 are representation issues, while step 1 is a propagation issue. The propagation rules are distinct from the representation issues, and are very much more complicated. The propagation rules might fail if the Taylor expansion isn’t a good approximation ... and might also fail if there are correlations in the data.
Beware that the people who believe in sig figs tend to express both the representation rules and the propagation rules in terms of sig figs, and lump them all together, but this is just two mistakes for the price of one. As a result, when people speak of “the” sig figs rules, you never know whether they are talking about the relativelysimple representation rules, or the more complicated propagation rules.
Sig figs cause people to misunderstand the distinction between representation of uncertainty and propagation of uncertainty. In reality, when dealing with real raw data points or artificial (Monte Carlo) raw data points, the representation issue does not arise. The raw data speaks for itself.
In practice, the smart way to propagate uncertainties is:
This is tremendously advantageous, because the uncertainty is now represented by the width of the cloud. The individual points have no width, so you can use ordinary algebra to calculate whatever you want, pointbypoint, stepbystep. This is very much simpler – and more reliable – than trying to attach uncertainty to each point and then trying to propagate the uncertainty using calculusbased firstorder techniques.
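As a rough illustration of computing point by point on a cloud, here is a minimal sketch. The function f and the three input distributions are hypothetical stand-ins (the real f and the real distributions depend on your experiment), and the helper name propagate is my own.

```python
import random
import statistics

def propagate(f, samplers, n=10000, seed=0):
    """Draw n points from each input distribution and apply f point by point.

    Each sampler takes a random.Random instance and returns one plain number.
    The result is a cloud of plain numbers; the uncertainty is the width of the cloud.
    """
    rng = random.Random(seed)
    return [f(*(draw(rng) for draw in samplers)) for _ in range(n)]

# HYPOTHETICAL inputs: independent Gaussians A = 10 +- 0.1, B = 5 +- 0.2, C = 2 +- 0.05
A = lambda rng: rng.gauss(10.0, 0.1)
B = lambda rng: rng.gauss(5.0, 0.2)
C = lambda rng: rng.gauss(2.0, 0.05)

def f(a, b, c):
    """HYPOTHETICAL stand-in for the function in equation 6."""
    return a * b / c

cloud = propagate(f, [A, B, C])
print(f"X is roughly {statistics.mean(cloud):.2f} +- {statistics.stdev(cloud):.2f}")
```

Note that f sees only ordinary numbers. Summarizing the cloud by a mean and standard deviation is an optional last step, and is appropriate only if the cloud is reasonably well described that way. If the inputs are correlated, the samplers must be replaced by something that draws the inputs jointly.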
In order to really understand the propagation of uncertainty, we must learn a new type of arithmetic: We will be performing computations on probability distributions rather than on simple numbers.
This subsection shows what goes wrong if you try to express the propagation rules in terms of sig figs.
Let’s start with an ultrasimple example
x = (((2 + 0.4) + 0.4) + 0.4) + 0.4 (8) 
where each of the addends has an uncertainty of ±10%, normally and independently distributed.
Common sense suggests that the correct answer is x = 3.6 with some uncertainty. You might guess that the uncertainty is about 10%, but in fact it is less than 6%, as you can verify using the methods of section 6.15 or otherwise.
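One way to verify the "less than 6%" claim is sketched below. It assumes the five addends are independent Gaussians, each with a standard deviation equal to 10% of its nominal value, as stated. For a plain sum of independent terms the standard deviations add in quadrature, and a small Monte Carlo run serves as a cross-check. The sample size and seed are arbitrary choices.

```python
import math
import random
import statistics

random.seed(2)  # arbitrary fixed seed

NOMINALS = [2.0, 0.4, 0.4, 0.4, 0.4]  # the addends in equation 8
REL = 0.10                            # each addend has 10% uncertainty

sigmas = [REL * v for v in NOMINALS]

# Analytic: for a sum of independent terms, standard deviations add in quadrature.
analytic_abs = math.sqrt(sum(s * s for s in sigmas))
analytic_rel = analytic_abs / sum(NOMINALS)

# Monte Carlo cross-check: build a cloud of sums, point by point.
cloud = [sum(random.gauss(v, s) for v, s in zip(NOMINALS, sigmas))
         for _ in range(20000)]
mc_rel = statistics.stdev(cloud) / statistics.mean(cloud)

print(f"analytic relative uncertainty:    {analytic_rel:.2%}")
print(f"Monte Carlo relative uncertainty: {mc_rel:.2%}")
```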
In contrast, the usual “significant digits rules” give the ludicrous result x=2. Indeed the “rules” set each of the parenthesized subexpressions equal to 2.
This is a disaster. Not only do the “sig figs rules” get the answer wrong, they get it wrong by a huge margin. They miss the target by seven times the radius of the target!
To understand what’s going on here, consider the innermost parenthesized subexpression, namely (2 + 0.4).
Repeatedly adding 0.4 causes the same disaster to occur repeatedly.
The fundamental issue here is that the sig figs rules require you to keep rounding off until roundoff error becomes the dominant contribution to the uncertainty. This is a representation issue, but it interacts with the propagation issue as follows: The more often you apply the sig figs representation rules, the worse off you are ... and the whole idea of propagation requires you to do this at every step of the calculation.
Rounding off always introduces some error. This is called roundoff error or quantization error. Again: One of the fundamental problems with the sig figs rules is that in all cases, they demand too much roundoff.
This problem is even worse than you might think, because there is no reason to assume that roundoff errors are random. Indeed, in equation 8 the roundoff errors are not random at all; the roundoff error is 0.4 at every step. These errors accumulate linearly. That is, in this multistep calculation, the overall error grows linearly with the number of steps. The errors do not average out; they just accumulate. See section 6.3 for a way to solve part of the problem.
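The linear accumulation can be seen in the following sketch. It models the sig figs addition rule as "round to the units place after every addition", on the assumption that the running total is written as "2" with nothing after the decimal point. The function names are mine.

```python
TERMS = [0.4, 0.4, 0.4, 0.4]  # the numbers added to 2 in equation 8

def add_with_sig_figs(start, terms):
    """Round to the units place after every step (ASSUMED model of the rule)."""
    total = start
    for t in terms:
        total = float(round(total + t))
    return total

def add_with_guard_digits(start, terms):
    """Keep all the digits the machine offers; round nothing."""
    total = start
    for t in terms:
        total += t
    return total

for n in range(1, len(TERMS) + 1):
    rounded = add_with_sig_figs(2.0, TERMS[:n])
    kept = add_with_guard_digits(2.0, TERMS[:n])
    print(f"after {n} step(s): sig figs give {rounded}, "
          f"guard digits give {kept:.1f}, error = {kept - rounded:.1f}")
```

The error grows by the same 0.4 at every step, so there is no averaging out.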
Let’s take another look at the multistep calculation in equation 8.
Many people have discovered that they can perform multistep calculations with much greater accuracy by using the following approach: At each intermediate step of the calculation, they use more digits than would be called for by the sig figs rules. These extra digits are called guard digits. So far so good.  They apply the sig figs rules only at the very last step, rounding only the final result. This is a blunder. It introduces roundoff error into the result. Don’t do it. 
Keeping a few guard digits reduces the roundoff error by a few orders of magnitude. When in doubt, keep plenty of guard digits.  You have to assume somebody is going to use your result. Therefore your “final output” is somebody else’s input. An example of this can be seen in section 6.9.3. Therefore, from an overall point of view, all results are intermediate results, and all of them need guard digits. 
Guard digits do not, however, solve all the world’s problems. In particular, suppose you were using the sig figs rules at every step (as in section 6.2) in an attempt to perform “propagation of error”. (Propagation is, after all, the topic of this whole section, section 6). The problem is, stepbystep firstorder propagation is almost never reliable, even if you use plenty of guard digits. The first reason why it is unreliable is that the firstorder Taylor approximation often breaks down. Furthermore, even if you could fix that problem, the approach fails if there are correlations. There’s a proverb that says imperfect information is better than no information, but that proverb doesn’t apply here, because we have much better ways of getting information about the uncertainty, such as the crankthreetimes method.
When there is noise (i.e. uncertainty) in your raw data, guard digits don’t make the raw noise any smaller ... they just make the roundoff errors smaller.
See section 7.7 for more discussion of guard digits.
See section 11 for more discussion of various contributions to the uncertainty.
Suppose we have a bunch of particles in thermal equilibrium. The x component of momentum is Gaussian distributed, with mean 0 and standard deviation √(mkT). The distribution is the same for the y and z components. For simplicity, let’s choose units such that m=1, and momentum is equal to velocity. A scatter plot of the x and y components is shown in figure 25.
The kinetic energy of any given particle is p^{2}/(2m). The uncertainty in the mass is negligible in this situation. This situation is simple enough that the right answer can be found analytically, as some guy named Maxwell did in the mid1800s. You can also find the right answer using Monte Carlo techniques. If the situation were even slightly more complicated, Monte Carlo would be the only option.
If you calculate the energy for an ensemble of such particles, the cumulative probability is shown in figure 26. Similarly, the probability density distribution is shown in figure 27. The dashed red line shows the exact analytic result, i.e. the MaxwellBoltzmann distribution.
Figure 26: Maxwell-Boltzmann Distribution of Energy (3D), Cumulative Probability
Figure 27: Maxwell-Boltzmann Distribution of Energy (3D), Probability Density
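A Monte Carlo calculation of this kind might look like the following sketch. It assumes units where m = 1 and kT = 1, so each momentum component is a standard Gaussian; the ensemble size and seed are arbitrary.

```python
import random
import statistics

random.seed(3)  # arbitrary fixed seed
N = 50000       # arbitrary ensemble size

def energy():
    """Kinetic energy p^2/(2m) of one particle, in ASSUMED units where m = kT = 1."""
    px, py, pz = (random.gauss(0.0, 1.0) for _ in range(3))
    return 0.5 * (px * px + py * py + pz * pz)

energies = [energy() for _ in range(N)]
mean_e = statistics.mean(energies)
sdev_e = statistics.stdev(energies)

print(f"mean energy = {mean_e:.3f} kT (equipartition says 1.5 kT)")
print(f"standard deviation = {sdev_e:.3f} kT")
```

Each energy is computed from plain numbers using ordinary algebra. The mean lands near the equipartition value of 3/2 kT, and the width of the cloud is finite and perfectly sensible.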
If you tried to obtain the same result using stepbystep propagation of uncertainty, starting from the thermal distribution of velocities, things would not go well. Using the procedure given in section 6.19.2, you would find that the relative uncertainty was infinite. Forging ahead, applying the formula without regard to the provisos in the rule, this would imply an energy of zero plusorminus infinity. This is nowhere close to the right answer.
We can discuss the failure of the stepbystep approach in terms of the unholy trinity of Misrepresentation, Malexpansion, and Correlation.
This example and the next one were chosen because they are simple, and because they make obvious the failure of the stepbystep approach. Beware that in situations that are even slightly more complex, the stepbystep approach will fail and give you wrong answers with little or no warning.
Suppose we have a long, narrow conference table. We start a particle in the middle of the table. At time t=0 we give it a velocity based on a thermal distribution, zero plus-or-minus √(kT/m). Thereafter it moves as a free particle, moving across the table. We want to know how long it takes before the particle falls off the edge of the table. A scatter plot of the velocity is shown in figure 25. For present purposes, only the x component matters, because the table is narrow in the x direction and very very long in the y direction.
If we take the Monte Carlo approach, this is an ultrasimple “time = distance / rate” problem. For each element of the ensemble, the time to fall off is:
 t = (w/2) / |v|  (9) 
where w is the width of the table, and v is the velocity.
The cumulative probability distribution is shown in figure 28. A histogram of the probability density is shown in figure 29.
Beware that not all the data is visible in these figures. Given an ensemble of 1000 points, it would not be uncommon to find the maximum time to be greater than 1000 units, or indeed greater than 2000 units. The maximum-time point corresponds to the minimum-velocity point, and velocities near zero are not particularly uncommon. That means that the probability density distribution converges only very slowly toward zero at large times. As a consequence, the mean of the distribution is large, vastly larger than the mode. The standard deviation could be in the hundreds, which is vastly larger than the HWHM.
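The long tail is easy to reproduce. The sketch below assumes the particle starts in the middle, so the distance to either edge is half the width, and it uses hypothetical units where √(kT/m) = 1 and the width is 2. The ensemble size and seed are arbitrary.

```python
import random
import statistics

random.seed(4)  # arbitrary fixed seed
N = 10000       # arbitrary ensemble size
W = 2.0         # HYPOTHETICAL table width, so the distance to the edge is 1

def fall_time():
    """Time for one particle to reach the edge: distance / rate."""
    v = random.gauss(0.0, 1.0)      # x velocity, ASSUMED units where sqrt(kT/m) = 1
    while v == 0.0:                 # vanishingly unlikely; avoids dividing by zero
        v = random.gauss(0.0, 1.0)
    return (W / 2) / abs(v)

times = [fall_time() for _ in range(N)]
median_t = statistics.median(times)
mean_t = statistics.mean(times)
max_t = max(times)

print(f"median = {median_t:.2f}, mean = {mean_t:.2f}, max = {max_t:.0f}")
```

The median is well behaved, but the mean and the maximum are dominated by the handful of slowest particles, and they change a lot if you change the seed or the ensemble size.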
We can contrast the Monte Carlo approach to stepbystep firstorder propagation. The latter fails miserably. In the first step, we need to take the absolute value of the velocity. To calculate the uncertainty, we need the derivative of this, evaluated at the origin, but alas absolute value is not a differentiable function at the origin. In the second step, we need to take the reciprocal, which is not even a function at the origin, much less a differentiable function.
This example and the previous one were chosen because they are simple, and because they make obvious the failure of the stepbystep approach. Beware that in situations that are even slightly more complex, the stepbystep approach will fail and give you wrong answers with little or no warning.
Extensions: This simple example is part of a larger family. It can be extended and elaborated in various ways, including:
Suppose we want to know the chargetomass ratio for the electron, i.e. the e/m ratio. This is useful because it shows up in lots of places, for instance in the formula for the cyclotron frequency (per unit field).
We start by looking up the accepted values for e and m, along with the associated uncertainties. Here are the actual numbers, taken from the NIST website:
 (10) 
At this point it is amusing to calculate the e/m ratio by following the propagationoferror rules that you see in textbooks. Ask yourself, What is the calculated uncertainty for the e/m ratio, when calculated this way? Choose the nearest answer:
 a) 22 ppb
 b) 33 ppb
 c) 44 ppb
 d) 50 ppb
 e) 66 ppb
Note: Ordinarily I play by the rule that says you are expected to use everything you know in order to get the realworld right answer. Ordinarily I despise questions where knowing the right answer will get you into trouble. However ... at the moment I’m making a point about the method, not trying to get the right answer, so this rule is temporarily suspended. You’ll see why shortly.
If we carry out the calculation in the usual naïve way, we assume the uncertainties are uncorrelated, so we can add the relative uncertainties in quadrature:
 (11) 
so the full result is
 (12) 
We can contrast this with the realworld correct value:
 (13) 
The real uncertainty is vastly less than the naïvelycalculated uncertainty.
We can understand this as follows: The accepted values for e and m are correlated. Virtually 100% correlated.
Simple recommendation: If you want to calculate e/m, don’t just look up the values for e and m separately. Use the NIST website to look them up jointly along with the correlation coefficient.
Before we go on, let’s try to understand the physics that produces the high correlation between e and m. It’s an interesting story: You could measure the mass of the electron directly, but there’s not much point in doing so, because it turns out that indirect methods work much better. It’s a multistep process. The details are not super important, but here’s a slightly simplified outline of the process.
 A) The fine structure constant is measured to 0.32 ppb relative uncertainty.
 B) The Rydberg constant is measured to 0.005 ppb.
 C) The Rydberg constant is equal to m e^{4} / 8 ε_{0}^{2} h^{3} c and the fine-structure constant is e^{2} / 2 ε_{0} h c.
Combining α^{3}/Ry gives e^{2}/m to 0.96 ppb. It hardly matters whether they are correlated or not, since the uncertainty is dominated by the uncertainty in α^{3}. Note that the speed of light is exact, by definition, so it does not contribute to the uncertainty.
 D) The charge on the electron is measured to 22 ppb.
 E) If you want the e/m ratio, divide e^{2}/m by e. The uncertainty in e/m is dominated by the uncertainty in e.
 F) To find the mass, calculate e^{2} (using the measured charge directly) then divide by the e^{2}/m value obtained in item (C) above. The uncertainty is 44 ppb, dominated by the uncertainty in e^{2}.
Bottom line: Whenever you have two randomlydistributed quantities and you want to combine them – by adding, subtracting, multiplying, dividing, or whatever – you need to find out whether they are correlated. Otherwise you will have a hard time calculating the combined uncertainty.
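To see how much the correlation matters, here is a sketch using the standard first-order formula for the relative uncertainty of a ratio of two correlated quantities. The 22 ppb and 44 ppb figures come from the outline above. Setting the correlation coefficient to exactly 1 is my idealization of "virtually 100%"; for real work, look up the actual coefficient.

```python
import math

def ratio_rel_uncertainty(rel_num, rel_den, rho):
    """First-order relative uncertainty of num/den.

    rel_num, rel_den: relative uncertainties of numerator and denominator.
    rho: correlation coefficient between them (0 means uncorrelated).
    """
    return math.sqrt(rel_num**2 + rel_den**2 - 2.0 * rho * rel_num * rel_den)

REL_E = 22.0  # ppb, relative uncertainty of e (from the text)
REL_M = 44.0  # ppb, relative uncertainty of m (from the text)

naive = ratio_rel_uncertainty(REL_E, REL_M, 0.0)       # pretend uncorrelated
correlated = ratio_rel_uncertainty(REL_E, REL_M, 1.0)  # IDEALIZED: rho = 1 exactly

print(f"naive (rho = 0):     {naive:.1f} ppb")
print(f"correlated (rho = 1): {correlated:.1f} ppb")
```

With the correlation included, the uncertainty in e/m drops to about the uncertainty in e alone, consistent with item (E) above.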
Figure 30 shows pH as a function of concentration, for various pK_{a} values, including weak acids and strong acids, as well as intermediatestrength acids, which are particularly interesting.
This is obviously not a contrived example. There are plenty of good reasons for preparing a plot like this. For present purposes, however, we are not particularly interested in the meaning of this figure, but rather in the process of computing it. (If you are interested in the meaning, please see reference 10.)
For simplicity, we temporarily restrict attention to the parts of figure 30 that are not too near the top. That is, we focus attention on solutions that are definitely acidic, with a pH well below the pH of water. (This restriction will be lifted in section 6.8.)
In this regime, the relevant equation is:
 (14) 
Equation 14 is a quadratic polynomial. It has one positive root and one negative root, as we shall see. For more on where this comes from and what it means, see reference 10 and references cited therein.
If we plug in the numbers for our dilute solution of a strong acid, using equation 91 and equation 92, we get
 (15) 
You can see that this is definitely a “big root / small root” situation, so you need to use the smart version of the quadratic formula, equation 87, for reasons explained in section 17.
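To illustrate why a "big root / small root" situation calls for care, here is a sketch comparing the textbook quadratic formula with a widely used numerically stable rearrangement. I am assuming the stable form below is in the same spirit as the smart version referred to above; it is not copied from it. The coefficients are hypothetical, chosen only to make the effect obvious, and the code assumes real, distinct roots with b nonzero.

```python
import math

def roots_naive(a, b, c):
    """Textbook formula; one root suffers when b*b >> |4*a*c|."""
    d = math.sqrt(b * b - 4 * a * c)
    return (-b + d) / (2 * a), (-b - d) / (2 * a)

def roots_stable(a, b, c):
    """Avoid subtracting nearly equal numbers: get the big root first,
    then get the small root from the product of the roots, c/a."""
    d = math.sqrt(b * b - 4 * a * c)
    q = -0.5 * (b + math.copysign(d, b))
    return q / a, c / q   # (big root, small root)

# HYPOTHETICAL coefficients: x^2 + 1e8 x - 1 = 0, small root very nearly 1e-8
a, b, c = 1.0, 1.0e8, -1.0
naive_small = roots_naive(a, b, c)[0]
stable_small = roots_stable(a, b, c)[1]

print(f"naive small root:  {naive_small:.6e}")
print(f"stable small root: {stable_small:.6e}")
```

In double precision the naive version gets the small root wrong in the first significant digit, while the stable version is good to essentially full precision.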
Only the positive root in equation 15 makes sense. Taking the logarithm, we find
 (16) 
Note that the “small root” here is not some minor correction term; it is the entire answer.
Also note that there are other ways of solving this problem. For example, you could expand the square root in equation 90 using a firstorder Taylor series, namely:
 (17) 
whenever z is small compared to 1. This would give you the right answer with plenty of accuracy in the situation we are considering. (It is arguably not the best choice in this situation, because equation 87 is just as convenient and is less restricted.)
For a discussion of the lessons we can learn from this example, see section 6.10.
We revisit this example again in section 6.22, in connection with the rules for stepbystep firstorder propagation of uncertainty.
We now consider the full pH versus concentration diagram, without the restrictions on strength and/or concentration imposed in section 6.8.
The full curves in figure 30 were computed by solving the following equation.
 (18) 
That’s a cubic, with one positive root and two negative roots. For more on where this comes from and what it means, see reference 10.
It is easy to solve the equation with an iterative rootfinding algorithm.
In contrast, beware that standard “algebraic” formulas for solving the cubic can give wrong answers in some cases. Depending on details of the implementation, the formulas can be numerically unstable. That is to say, the result gets trashed by roundoff errors. Specifically: I tried using the standard library routine gsl_poly_complex_solve_cubic() and it failed spectacularly for certain values of pK_a and pC_HA. Some of the alleged results were off by multiple orders of magnitude. Some of the alleged results were complex numbers, even though the right answers were real numbers. It might be possible to rewrite the code to make it behave better, but that’s not a job I’m eager to do.
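Here is a minimal sketch of the iterative approach, using plain bisection. Beware that the cubic below is the usual textbook equation for the hydronium concentration of a monoprotic acid in water, which I am assuming plays the same role as equation 18; the parameterization may differ. The value of K_w and the function names are likewise my assumptions.

```python
import math

KW = 1.0e-14  # ASSUMED ion product of water, near room temperature

def hydronium(c_acid, ka, kw=KW):
    """Positive root of h^3 + ka h^2 - (ka c + kw) h - ka kw = 0, by bisection.

    ASSUMED textbook form for a monoprotic acid of concentration c_acid
    and dissociation constant ka > 0.
    """
    def f(h):
        return h**3 + ka * h**2 - (ka * c_acid + kw) * h - ka * kw

    lo, hi = 0.0, c_acid + 1.0   # f(lo) < 0 and f(hi) > 0, so the root is bracketed
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid == lo or mid == hi:   # interval cannot shrink any further
            break
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def ph(c_acid, ka):
    return -math.log10(hydronium(c_acid, ka))

print(f"0.1 M weak acid, Ka = 1.8e-5:    pH = {ph(0.1, 1.8e-5):.3f}")
print(f"0.001 M strong acid, Ka = 1e3:   pH = {ph(1e-3, 1e3):.3f}")
print(f"no acid at all:                  pH = {ph(0.0, 1.8e-5):.3f}")
```

Bisection is slow compared to fancier methods, but it cannot escape from the bracket, so it cannot return a negative or complex "answer". Since the cubic has only one positive root, bracketing on the positive side is all the selection logic that is needed.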
For a discussion of the lessons we can learn from this example, see section 6.10.
Once upon a time, at Acme Anvil company, there was an ensemble of particles. The boss wanted a relativisticallycorrect calculation of the kinetic energy. He especially wanted the mean and standard deviation of the ensemble of kineticenergy values.
The boss assigned two staffers to the task, Audrey and Alfred. Audrey worked all morning computing the total energy E(v) and the rest energy E(0) for each particle. Then Alfred worked all afternoon, subtracting these two quantities to find the kinetic energy for each particle.
In all cases, Audrey and Alfred used the relativistically correct formulas, namely
 (19) 
The following data describes a typical particle in the ensemble:
 (20) 
For this particle, Audrey calculated the following results:
 (21) 
where both of those numbers are repeating decimals.
Later, Alfred subtracted those numbers to obtain
 (22) 
which is again a repeating decimal.
After calculating the kinetic energy for all the particles, Alfred calculated the mean and standard deviation, namely:
 (23) 
which is in fact the correct answer.
Meanwhile, across the street at Delta Doodad Company, they needed to do the exact same calculation. The boss assigned Darla and Dave to do the calculation.
Darla calculated E(v) and E(0) using a spreadsheet program, which represents all numbers using IEEE double-precision floating point. For the typical particle described in equation 20, she obtained:
 (24) 
These numbers cannot be represented to any greater accuracy using IEEE double precision.
When Dave subtracted these numbers, he found the kinetic energy was zero. In fact the apparent kinetic energy was zero for all particles. When he calculated the mean and standard deviation, they were both zero. Dave suspected that 0±0 was not the correct answer, but given what he had to work with, there was no way for him to compute a better answer.
The problem is that IEEE double precision can only represent about 16 decimal digits, whereas at least 20 digits are needed to obtain a useful answer in this case. If you use fewer than 20 digits, the roundoff error will be unacceptably large. (By way of contrast, across the street, Audrey used 25 digits just to be on the safe side.)
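The failure at Delta Doodad is easy to reproduce. Here is a minimal sketch, not the spreadsheet Darla actually used. The mass and speed are hypothetical stand-ins (they are not the values from equation 20), and the choice of 40 digits is just a generous guess at "plenty". The same subtraction E(v) − E(0) is carried out once in IEEE double precision and once with Python's decimal module.

```python
import math
from decimal import Decimal, localcontext

C = 299792458   # speed of light in m/s, exact by definition
M = 1.0         # hypothetical mass in kg (NOT the value in equation 20)
V = 1.0         # hypothetical speed in m/s (NOT the value in equation 20)

def ekin_double(m, v):
    """E(v) - E(0), evaluated in IEEE double precision."""
    e_rest = m * C**2
    e_total = e_rest / math.sqrt(1.0 - (v / C) ** 2)
    return e_total - e_rest

def ekin_decimal(m, v, digits=40):
    """The same subtraction, carrying `digits` decimal digits throughout."""
    with localcontext() as ctx:
        ctx.prec = digits
        m, v, c = Decimal(str(m)), Decimal(str(v)), Decimal(C)
        e_rest = m * c * c
        e_total = e_rest / (1 - (v / c) ** 2).sqrt()
        return e_total - e_rest

ek_double = ekin_double(M, V)
ek_decimal = ekin_decimal(M, V)
print("double precision:", ek_double)    # the difference is lost entirely
print("40 digits:       ", ek_decimal)   # close to 0.5 joule
```

For these stand-in values, v²/c² is about 1.1×10^{−17}, which is smaller than the spacing between adjacent double-precision numbers near 1, so the square root evaluates to exactly 1 and the subtraction yields exactly zero.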
Meanwhile, down the street at General Gadget Company, they needed to do the same calculation. The boss was a big fan of sig figs. He demanded that everybody adhere to the sig figs rules.
The boss assigned Gail and Gordon to the task. In the morning, Gail calculated the total energy and rest energy. She noticed that there was some uncertainty in these numbers. The relative uncertainty was about 0.5%. So for the typical particle described in equation 20, she obtained:
 (25) 
In accordance with the usual sig figs rules, Gail rounded off these numbers, as follows:
 (26) 
Gail’s reasons for rounding off included:
All in all, it was “obvious” to Gail that equation 26 was the right way to express things.
In the afternoon, Gordon subtracted these numbers. He found that every particle had zero kinetic energy.
Based on the uncertainty in the numbers he was given, he tried to apply the propagation-of-error rules. Since Gail did not report any correlations, he assumed all her results were uncorrelated, so that the rules presented in section 6.19 could be applied. On this basis, he estimated that the uncertainty in the difference was about ± 1×10^{15}. So Gordon could have reported his result as 0 ± 1×10^{15} joule.
That’s the wrong answer. Gordon’s estimate of the mean is wrong by about 200 standard deviations. That’s a lot. Gordon’s estimate of the standard deviation is also off by about seventeen orders of magnitude. That’s a lot, too.
One problem is that Gail didn’t feed Gordon enough digits. She actually calculated enough digits, but she felt obliged to round off her results, in accordance with the sig figs rules. This illustrates a general principle:
Another problem is that for each particle, Gail’s numbers for E(v) and E(0) have very highly correlated uncertainties. Therefore Gordon’s application of the propagation-of-error rules was invalid.
Thirdly, just to add insult to injury: The sigfigs method does not provide any way to represent 0 ± 1×10^{15}, so Gordon could not find any way to report his results at all. The boss wanted a sigfigs representation, but no such representation was possible.
Meanwhile, across town at Western Widget Company, yet another company was faced with the same task. At this company, they noticed that equation 19 implies that:
 (27) 
where on the second line we have used some trigonometric identities. Both lines in equation 27 share an important property: the factor in square brackets is a purely mathematical function. The function can be defined in terms of a subtraction that involves no uncertainty of any kind. In contrast, if you were to multiply through by m c^{2} before subtracting, you would then face the problem of subtracting two things that not only have some uncertainties (because of the uncertainty in m) but would have highly correlated uncertainties.
It must be emphasized that equation 27 is relativistically correct; no approximations have been made (yet).
Since the task at hand involves ρ values that are very small compared to 1, the following approximations are good to very high accuracy:
 (28) 
You can check that these approximations are consistent with each other to third order in ρ or better, in the sense that they uphold the identities tanh = sinh/cosh and cosh^{2} − sinh^{2} = 1.
Plugging into equation 27 we find that, with more than enough accuracy,
 (29) 
which allows us to calculate the kinetic energy directly. No subtractions are needed, and ordinary floating-point arithmetic gives us no roundoff-error problems. The next term in the series is smaller than E_{kin} by a factor of v^{2}/c^{2}, as you can easily verify.
We apply this formula to all the particles, and then calculate the mean and standard deviation of the results. The answer is E_{kin} = 1.481(7) joule, which is identical to the result obtained by other means in section 6.9.1.
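Here is a sketch of what the Western Widget procedure might look like in code. I am assuming that the leading term of the series is the familiar classical expression ½ m v² (the standard low-speed limit), with the next term being (3/8) m v⁴/c². The particle list is made up for illustration; it is not the ensemble discussed above.

```python
import statistics

C = 299792458.0   # speed of light in m/s

def ekin_direct(m, v):
    """Leading term of the low-speed series; no subtraction is involved."""
    return 0.5 * m * v**2

def next_term_ratio(v):
    """Size of the next series term, (3/8) m v^4 / c^2, relative to the leading term."""
    return 0.75 * (v / C) ** 2

# Hypothetical ensemble: (mass in kg, speed in m/s) for each particle.
particles = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)]

energies = [ekin_direct(m, v) for m, v in particles]
mean_ekin = statistics.mean(energies)
sdev_ekin = statistics.stdev(energies)
worst_ratio = max(next_term_ratio(v) for _, v in particles)

print("mean:", mean_ekin, "standard deviation:", sdev_ekin)
print("largest relative size of the neglected term:", worst_ratio)
```

Printing the relative size of the neglected term is a cheap way to confirm that the truncation is harmless for the speeds at hand.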
The pH examples in section 6.7 and section 6.8 are obviously real-world examples. They are typical of examples that come up all the time, in many different situations, ranging from astronomy to zoology.
The relativity example in section 6.9 is a bit more contrived, but it illustrates an important theoretical point about the relationship between special relativity and classical dynamics. It is representative of a wider class of problems ... just simplified for pedagogical purposes.
There are a number of lessons we can learn from these examples:
Therefore: Whenever possible, leave numbers in the calculator (or computer), rather than writing them down and reentering them. If you write them down, you will be tempted to round them off, and that can cause big trouble.
To put it bluntly: If you see an expression of the form:
 (30) 
you should not assume it is safe to round things off. It may be that such a number already has too few digits. It may already have been rounded off too much.
Equation 30 is marked “incomplete” for the following reason: Suppose you need to write down something to represent the distribution X. The problem is, because of the correlations, it is not sufficient to report the variance; you need to report the covariances as well. The equation as it stands is not wrong, but without the covariances it is incomplete and possibly misleading.
Note that the ± notation can only represent the variance (or, rather, the square root thereof), not the covariances, so it cannot handle the task when there are nontrivial correlations.
In the relativity example considered in section 6.9, E(v) is in fact highly correlated with E(0). I know (based on how the particles were prepared) that there is some uncertainty in the mass of the particle. A factor of mass is common to both of the terms that are being subtracted. The uncertainty in the particle velocity is relatively small, so all in all there is nearly 100% correlation in the uncertainties. (There is of course no uncertainty in the speed of light, since it is 299792458 m/s by definition.)
It is all too common to find expressions for the roots of a polynomial that depend on subtracting numbers that are highly correlated.
The same idea can be applied to experiments, not just calculations. For example, to avoid a problem with small differences between large numbers, you can use null measurements, differential measurements, bridge structures (such as a Wheatstone bridge), et cetera.
As mentioned in item 3, my advice is: If you have a number that ought to be written down, write it down. Just write it down already. You can worry about the uncertainty later, if necessary. Write down plenty of guard digits. The number of digits you write down does not imply anything about the uncertainty, precision, tolerance, significance, or anything else.
Indeed, in section 6.9.3, Gail’s uncertainty numbers were in some hypertechnical sense correct, but they were highly misleading. They were worse than nothing, because the correlations were not taken into account.
There are lots of situations where the uncertainty in the final answer is less than the uncertainty in the raw data.
This can be understood in terms of “signal to noise” ratio. When we process lots of data, if we do things right, the signal will accumulate faster than the noise. (Conversely, if we don’t do things right, the accumulated errors can rapidly get out of hand.)
We now consider an example that illustrates this point. For simplicity, we assume the raw data is normally distributed and uncorrelated, as shown in figure 31. In this section we assume the analysis is done correctly; compare section 6.12.
Specifically, each data point is drawn from a Gaussian distribution that has a width of 0.018 units. Suppose we run the experiment many times. On each run, we take the average of 100 points. We know the average much more accurately than we know any particular raw data point. In fact, if we look at all the runs, the averages will have a distribution of their own, and this distribution will have a width of only 0.0018 units, ten times narrower than the distribution of raw data points. The distribution of averages is represented by the single black point with error bars at the top of figure 31. (This is a cooked data point, not a raw data point.)
We can say the same thing using fancy statistical language. Each run is officially called a sample. Each sample contains N raw data points. We assume the points are IID, normally distributed. We compute the mean of each sample. Theory tells us that the sample means behave as if they were drawn from a Gaussian distribution, which will be narrower than the distribution of raw data, narrower by a factor of √N.
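You can check the √N claim numerically. The following is a small simulation sketch. The width (0.018) and the sample size (100) come from the text; the center of the distribution, the number of runs, and the random seed are arbitrary choices of mine.

```python
import random
import statistics

SIGMA = 0.018    # width of the raw-data distribution (from the text)
N = 100          # raw data points per sample (from the text)
MU = 0.0435      # hypothetical center of the distribution
RUNS = 2000      # hypothetical number of repeated runs

rng = random.Random(12345)   # fixed seed, so the sketch is repeatable

sample_means = []
for _ in range(RUNS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    sample_means.append(statistics.fmean(sample))

width_of_means = statistics.stdev(sample_means)
ratio = SIGMA / width_of_means

print("width of raw data:    ", SIGMA)
print("width of sample means:", width_of_means)
print("ratio:                ", ratio)   # should come out near sqrt(N) = 10
```

The ratio will not be exactly 10, because the width of the distribution of means is itself only estimated from a finite number of runs.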
Let’s reanalyze the data from section 6.11. In particular, let’s consider the effect of roundoff errors that occur while we are calculating the average. Even though the raw data is normally distributed and IID, the roundoff errors will not be normally distributed, and if we’re not careful this can lead to serious problems.
We denote the ith raw data point by a_{i}. It is drawn from a distribution A that has some uncertainty σ_{A}.
Next, we round off each data point. That leaves us with some new quantity b_{i}. These new points behave as if they were drawn from some new distribution B.
The new uncertainty σ_{B} will be larger than σ_{A}, but we don’t know how much larger, and we don’t even know that distribution B can be described as a Gaussian (or any other twoparameter model). It may be that B is a viciously lopsided nonnormal distribution (even though A was a perfectly wellbehaved normal distribution).
For normally-distributed errors, when you add two numbers, the absolute errors add in quadrature, as discussed in section 6.19. That’s good, because it means errors accumulate relatively slowly, and errors can be reduced by averaging.
For a lopsided distribution of errors, such as can result from roundoff, the errors just plain add, linearly. This can easily result in disastrous accumulation of error. Averaging doesn’t help.
This is illustrated by the example worked out in the “roundoff” spreadsheet (reference 12), as we now discuss. The first few rows and the last few rows of the spreadsheet are reproduced here. The numbers in red are seriously erroneous.
row      | raw data | Alice: sum ± unc. | Bob: sum ± unc. | Carol: sum ± unc.
1        | 0.062    | 0.062 ± 0.018     | 0.062 ± 0.018   | 0.06 ± 0.02
2        | 0.036    | 0.098 ± 0.025     | 0.098 ± 0.025   | 0.10 ± 0.03
3        | 0.030    | 0.128 ± 0.031     | 0.128 ± 0.031   | 0.13 ± 0.03
4        | 0.026    | 0.154 ± 0.036     | 0.154 ± 0.036   | 0.16 ± 0.04
...
98       | 0.026    | 4.285 ± 0.178     | 4.36 ± 0.18     | 3.4 ± 0.2
99       | 0.044    | 4.329 ± 0.179     | 4.40 ± 0.18     | 3.4 ± 0.2
100      | 0.021    | 4.350 ± 0.180     | 4.42 ± 0.18     | 3.4 ± 0.2
average: |          | .0435 ± 0.0018    | .0442           | .034
         |          | = .0435 ± 4.1%    |                 |
The leftmost column is a label giving the row number. The next column is the raw data. You can see that the raw data consists of numbers like 0.048. As usual, the raw data points have no width whatsoever. However, the distribution from which these numbers were drawn has a width of 0.018. You can see that we are already departing from the usual “significant figures” hogwash. If you believed in sig figs, you would attribute considerable uncertainty to the second decimal place in each raw data point, and you would not bother to record the data to three decimal places.
In contrast, in reality, it is important to keep that third decimal place, for reasons that will become clear very soon. We are going to calculate the average of 100 such numbers, and the average will be known tenfold more accurately than any of the raw inputs.
To say the same thing in slightly different terms: there is in fact an important signal – a significant signal – in that third decimal place. The signal is obscured by noise; that is, there is a poor signal-to-noise ratio. Your mission, should you decide to accept it, is to recover that signal.
This sort of signal recovery is at the core of many activities in real research labs, and in industry. An ordinary GPS receiver depends on signals that are hundreds of times less powerful than the noise (SNR on the order of −25 dB). The second thing I ever did in a real physics lab was to build a communications circuit that picked up a signal that was ten million times less powerful than the noise (SNR = −70 dB). The JPL Deep Space Network deals with SNRs even worse than that. Throwing away the signal at the first step by “rounding” the raw data would be a Bad Idea.
Take-home message #1: Signals can be dug out from the noise. Uncertainty is not the same as insignificance. A digit that is uncertain (and many digits to the right of that!) may well carry some significance that can be dug out by techniques such as signal averaging. Given just a number and its uncertainty level, without knowing the context, you cannot say whether the uncertain digits are significant or not.
Take-home message #2: An expression such as 0.048 ± 0.018 expresses two quantities: the value of the signal, and an estimate of the noise. Combining these two quantities into a single numeral by rounding (according to the “significant figures rules”) is highly unsatisfactory. In cases like this, if you round to express the noise, you destroy the signal.
Now, returning to the numerical example: I assigned three students (Alice, Bob, and Carol) to analyze this data. In the data table, the first column under each student’s name is a running sum. The second column is a running estimate of the uncertainty of the running sum.
Alice didn’t round any of the raw data or intermediate results. She got an average of
0.0435±0.0018 (31) 
and the main value (0.0435) is the best that could be done given the points that were drawn from the ensemble. (The errorestimate is a worstcase error; the probable error is somewhat smaller.)
Meanwhile, Bob was doing fine until he got to row 31. At that point he decided it was ridiculous to carry four figures (three decimal places) when the estimated error was more than 100 counts in the last decimal place. He figured that if he rounded off one digit, there would still be at least ten counts of uncertainty in the last place. He figured that would give him not only “enough” accuracy, but would even give him a guard digit for good luck.
Alas, Bob was not lucky. Part of his problem is that he assumed that roundoff errors would be random and would add in quadrature. In this case, they aren’t and they don’t. The errors accumulate linearly (not in quadrature) and cause Bob’s answer to be systematically high. The offset in the answer in this case is slightly less than the error bars, but if we had averaged a couple hundred more points the error would have accumulated to disastrous levels.

Carol was even more unlucky. She rounded off her intermediate results so that every number on the page reflected its own uncertainty (one count, possibly more, in the last digit). In this case, her roundoff errors accumulate in the “down” direction, with spectacularly bad effects.
The three students turned in the following “bottom line” answers:
 (32) 
Note that Alice, Bob, and Carol are all analyzing the same raw data; the discrepancies between their answers are entirely due to the analysis, not due to the randomness with which the data was drawn from the ensemble.
Alice obtains the correct result. This is shown by the single black point with error bars at the top of figure 31. Bob’s result is slightly worse, but similar. Carol’s result is terrible, as shown by the red point with error bars at the top of figure 31.
Takehome message #3: Do not assume that roundoff errors are random. Do not assume that they add in quadrature. It is waaaay too easy to run into situations where they accumulate nonrandomly, introducing a bias into the result. Sometimes the bias is obvious, sometimes it’s not.
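To see how roundoff errors can pile up linearly, here is a deliberately oversimplified sketch. It does not use the spreadsheet’s data or reproduce exactly what Bob and Carol did; the data set (100 identical readings of 0.044) is synthetic, chosen so that every rounding error has the same sign. Decimal arithmetic is used so that the only rounding is the rounding we ask for.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def running_sum(data, quantum=None):
    """Sum `data`; if `quantum` is given, round the running total after every addition."""
    total = Decimal(0)
    for x in data:
        total += x
        if quantum is not None:
            total = total.quantize(quantum, rounding=ROUND_HALF_EVEN)
    return total

# Synthetic data (an assumption, not the spreadsheet's data).
data = [Decimal("0.044")] * 100

exact = running_sum(data)                          # no rounding at all
two_places = running_sum(data, Decimal("0.01"))    # running total kept to 2 decimals
one_place = running_sum(data, Decimal("0.1"))      # running total kept to 1 decimal

print("no rounding:     ", exact)        # 4.400
print("two decimals:    ", two_places)   # 4.00
print("one decimal:     ", one_place)    # 0.0
```

With two decimal places, each step loses 0.004, and the losses add linearly to 0.4; if they had added in quadrature the total would have been only about 0.04. With one decimal place, each addition is too small to change the running total at all, so the sum never gets off the ground.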
Important note: computer programs^{6} and hand calculators round off the data at every step. IEEE 64-bit floating point is slightly better than 15 decimal places, which is enough for most purposes but not all. Homebrew numerical integration routines are particularly vulnerable to serious errors arising from accumulation of roundoff errors.
One of the things that contributes to Bob’s systematic error can be traced to the following anomaly: Consider the number 0.448. If we round it off, all at once, to one decimal place, we get 0.4. On the other hand, if we round it off in two steps, we get 0.45 (correct to two places) which we then round off to 0.5. This can be roughly summarized by saying that the roundoff rules do not have the associative property. If you have this problem, you might find it amusing to try the roundtoeven rule: round the fives toward even digits. That is, 0.75 rounds up to 0.8, but 0.65 rounds down to 0.6. There are cases where this is imperfect (e.g. 0.454) but it’s better overall, it’s easy to implement, and it has a pleasing symmetry. (This rule has been invented and reinvented many times; I reinvented it myself when I was in high school.) Alas, it is not really an improvement in any practical sense.
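The two-step anomaly and the round-to-even rule can both be tried out with Python’s decimal module, which lets you choose the rounding rule explicitly. This is just a sketch for experimenting; the helper name round_in_steps is my own invention.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

def round_in_steps(text, quanta, rounding):
    """Round the decimal numeral `text` successively to each quantum in `quanta`."""
    x = Decimal(text)
    for q in quanta:
        x = x.quantize(Decimal(q), rounding=rounding)
    return x

# Rounding 0.448 all at once versus in two steps, with fives rounded up.
one_step = round_in_steps("0.448", ["0.1"], ROUND_HALF_UP)            # 0.4
two_step = round_in_steps("0.448", ["0.01", "0.1"], ROUND_HALF_UP)    # 0.45, then 0.5

# The same two-step rounding using the round-to-even rule.
two_step_even = round_in_steps("0.448", ["0.01", "0.1"], ROUND_HALF_EVEN)   # 0.45, then 0.4

# Round-to-even on the fives mentioned in the text.
up = round_in_steps("0.75", ["0.1"], ROUND_HALF_EVEN)      # 0.8
down = round_in_steps("0.65", ["0.1"], ROUND_HALF_EVEN)    # 0.6

# A case where round-to-even is still imperfect: 0.454 is closer to 0.5 than to 0.4.
imperfect = round_in_steps("0.454", ["0.01", "0.1"], ROUND_HALF_EVEN)   # 0.45, then 0.4

print(one_step, two_step, two_step_even, up, down, imperfect)
```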
The important point is this: If fiddling with the roundoff rules produces a nonnegligible change in the results, it means you are in serious trouble. It means the situation is overly burdened by roundoff errors, and fiddling with the roundoff rules is just rearranging deck chairs on the Titanic. Usually the only real solution is to use more precision (more guard digits) during the calculation ... or to use a different algorithm, so that fewer steps (hence fewer roundings) are required. If the rounding is part of a purely mathematical exercise, keep tacking on guard digits until the result is no longer sensitive to the details of the roundoff rules. If the rounding is connected to experimental data, consider redesigning the experiment so that less rounding is required, perhaps by nulling out a commonmode signal early in the process. This might be done using a bridge, or phaselock techniques, or the like.
You can play with the spreadsheet yourself. For fun, see if you can fiddle the formulas so that Bob’s bias is downward rather than upward. Save the spreadsheet (reference 12) to disk and open it with your favorite spreadsheet program.
Notes:
Additional constructive suggestions and rules of thumb:
There exist very detailed guidelines for rounding off if that turns out to be necessary.
This is risky in a multistep or iterated calculation where many roundoff operations occur. That’s because you need to worry about accumulation of errors.
The main advantage is that if you have a problem and are trying to fix it, the analytic approach will probably tell you where to focus your attention. Very commonly, some steps require extra digits while other steps do not.
Here’s a simple yet powerful way of estimating the uncertainty of a result, given the uncertainty of the thing(s) it depends on.
Here’s the procedure, in the simple case when there is only one input variable with appreciable uncertainty:
I call this the Crank Three Times method. Here is an example:
 (33) 
Equation 33 tells us that if x is distributed according to x = 2±.02 then 1/x is distributed according to 1/x = .5±.005. Equivalently we can say that if x = 2±1% then 1/x = .5±1%. We remark in passing that the percentage uncertainty (aka the relative uncertainty) is the same for x and 1/x, which is what we expect provided the uncertainty is small.
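If you prefer a program to a spreadsheet, the idea fits in a few lines. This is a sketch under my own assumptions about the details: the function is evaluated at the nominal value and at each end of the error bar, and the error bars on the result are taken from the spread of the three answers. The name crank_three_times and its return convention are illustrative choices.

```python
def crank_three_times(f, x, dx):
    """Evaluate f at x - dx, x, and x + dx.

    Returns (nominal, plus, minus), where `plus` and `minus` are the
    upward and downward error bars on f(x).  This assumes f is monotonic
    over the interval; otherwise the extremes may lie inside the interval.
    """
    nominal = f(x)
    ends = (f(x - dx), f(x + dx))
    return nominal, max(ends) - nominal, nominal - min(ends)

def reciprocal(x):
    return 1.0 / x

nominal, plus, minus = crank_three_times(reciprocal, 2.0, 0.02)
print("1/x =", nominal, "+", plus, "-", minus)
```

For x = 2±.02 the two error bars come out nearly equal, close to .005, as expected. If they had come out markedly unequal, that would be a warning that the function is not well approximated by a straight line over the interval.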
The Crank Three Times method is a type of “what if” analysis. We can also consider it a simple example of an iterative numerical method of estimating the uncertainty (in contrast to the step-by-step first-order methods described in section 6.19). This simple method is a nice lead-in to fancier iterative methods such as Monte Carlo, as discussed in section 6.15.
The Crank Three Times method is by no means an exact error analysis. It is an approximation. The nice thing is that you can understand the nature of the approximation, and you can see that better and better results are readily available (for a modest price).
One of the glories of the Crank Three Times method is that in cases where it doesn’t work, it will tell you it isn’t working, provided you listen to what it’s trying to tell you. If you get asymmetrical error bars, you need to investigate further. Something bad is happening, and you need to check closely to see whether it is a little bit bad or very, very bad.
As far as I can tell, for every flaw that this method has, the sigfigs method has the same flaw plus others ... which means Crank Three Times is Pareto superior.
This method requires no new software, no learning curve, and no new concepts beyond the concept of uncertainty itself. In particular, unlike significant digits, it introduces no wrong concepts.
Crank Three Times shouldn’t require more than a few minutes of labor. Once a problem is set up, turning the crank should take only a couple of minutes; if it takes longer than that you should have been doing it on a spreadsheet all along. And if you are using a spreadsheet, Crank Three Times is supereasy and superquick.
If you have N variables that are (or might be) making a significant contribution to the uncertainty of the result, the Crank Three Times method could more precisely be called the Crank 2N+1 Times method. Here’s the procedure: Set up the spreadsheet and wiggle each variable in turn, and see what happens. Wiggle them one at a time, leaving the other N−1 at their original, nominal values.
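The wiggle-one-at-a-time procedure can be sketched as follows. The function name, the dictionary-based interface, and the example (a density computed from a hypothetical mass and volume) are all my own illustrative choices. The sketch only reports the effect of each wiggle; how to combine the effects is a separate question that depends on whether the inputs are correlated.

```python
def crank_2n_plus_1(f, nominal, uncertainty):
    """Evaluate f once at the nominal point and twice per uncertain variable.

    nominal:     dict mapping variable name -> nominal value
    uncertainty: dict mapping variable name -> half-width of its error bar
    Returns (center, effects), where effects[name] is a pair giving the
    change in f when that variable alone moves to its low end and its high end.
    """
    center = f(**nominal)
    effects = {}
    for name, dx in uncertainty.items():
        changes = []
        for sign in (-1, +1):
            trial = dict(nominal)              # all other variables stay nominal
            trial[name] = nominal[name] + sign * dx
            changes.append(f(**trial) - center)
        effects[name] = tuple(changes)
    return center, effects

def density(mass, volume):
    return mass / volume

# Hypothetical inputs: mass = 10 ± 0.1, volume = 2 ± 0.02 (arbitrary units).
center, effects = crank_2n_plus_1(
    density,
    nominal={"mass": 10.0, "volume": 2.0},
    uncertainty={"mass": 0.1, "volume": 0.02},
)
print("nominal result:", center)
for name, (low_end, high_end) in effects.items():
    print(name, "low end:", low_end, "high end:", high_end)
```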
If you are worried about what happens when two of the input variables are simultaneously at the ends of their error bars, you can check that case if you want. However, beware that if there are many variables, checking all the possibilities is exponentially laborious. Furthermore, it is improbable that many variables would simultaneously take on extreme values, and checking extreme cases can lead you to overestimate the uncertainty. For these reasons, and others, if you have numerous variables and need to study the system properly, at some point you need to give up on the Crank Three Times method and do a fullblown Monte Carlo analysis.
In the rare situation where you want a worstcase analysis, you can move each variable to whichever end of its error bar makes a positive contribution to the final answer, and then flip them all so that each one makes a negative contribution. In most cases, however, a worstcase analysis is wildly overpessimistic, especially when there are more than a few uncertain variables.
Remember: In many cases, especially when there are multiple uncertain variables and/or correlations among the variables and/or nonlinearities, your only reasonable option is Monte Carlo, as discussed in section 6.15. The Crank Three Times method can be considered an ultra-simplified variation of the Monte Carlo method, suitable for introductory reconnaissance.
Here is another example, which is more interesting because it exhibits nonlinearity:
 (34) 
Equation 34 tells us that if x is distributed according to x = 2±.9 then 1/x is distributed according to 1/x = .5(+.41−.16). Equivalently we can say that if x = 2±45% then 1/x = .5(+82%−31%). Even though the error bars on x are symmetric, the error bars on 1/x are markedly lopsided.
Lopsided error bars are fairly common in practice. Sometimes they are merely a symptom of a harmless nonlinearity, but sometimes they are a symptom of something much worse, such as a singularity or a branch cut in the calculation you are doing.
This is vastly superior to the step-by-step first-order methods discussed in section 6.19, which blissfully assume everything is linear. That is to say, in effect they expand everything in a Taylor series, and keep only the zeroth-order and first-order terms. In cases where this is not a good approximation, you are likely to get wrong answers with little or no warning.
Note: If you want more accuracy than is provided by a first-order Taylor series, you should not assume that the best way forward is to use a higher-order Taylor series. Often there are other numerical methods that are better behaved. That is, they converge more quickly, giving higher accuracy with less work.
For example, consider the function
q = 1 - \sqrt{1 - z}   (35)
where z is small. This comes up in relativity and also in connection with the quadratic formula. Although equation 35 is just fine if you are doing algebra, it is grossly unsuitable if you want to evaluate it numerically. This is because of the infamous “small difference between large numbers” problem. You are much better off using equation 36 instead.
q = \frac{z}{1 + \sqrt{1 - z}}   (36)
Similar reasoning applies to equation 87.
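The difference between the two forms is easy to see in double precision. The following sketch evaluates both for a few small values of z; the function names are mine.

```python
import math

def q_naive(z):
    """Equation 35 as written: a small difference between large numbers."""
    return 1.0 - math.sqrt(1.0 - z)

def q_stable(z):
    """Equation 36: algebraically the same, but with no cancellation."""
    return z / (1.0 + math.sqrt(1.0 - z))

for z in (1e-4, 1e-10, 1e-16, 1e-20):
    print(f"z = {z:g}   naive = {q_naive(z):.17g}   stable = {q_stable(z):.17g}")
```

For z = 10^{−20} the naive form returns exactly zero, because 1 − z is indistinguishable from 1 in double precision, while the stable form returns very nearly z/2, which is the right answer to first order.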
Also: If you ever need to evaluate a square root to high accuracy, in conjunction with equation 36 or otherwise, you should use Newton’s iterative method – not the Taylor series.
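Here is one way Newton’s method for the square root might be coded when you want many digits. It is a sketch, not a polished library routine: the function name, the choice of ten guard digits, the double-precision starting guess, and the iteration limit are all assumptions on my part.

```python
from decimal import Decimal, localcontext

def newton_sqrt(a, digits=50):
    """Square root of a nonnegative number to about `digits` significant digits.

    Uses the Newton iteration x <- (x + a/x) / 2, which roughly doubles
    the number of correct digits on every pass.
    """
    a = Decimal(a)
    if a < 0:
        raise ValueError("square root of a negative number")
    if a == 0:
        return Decimal(0)
    with localcontext() as ctx:
        ctx.prec = digits + 10                 # carry guard digits during the iteration
        x = Decimal(float(a) ** 0.5)           # starting guess, good to about 16 digits
        for _ in range(100):                   # the limit is a safety net, rarely reached
            x_new = (x + a / x) / 2
            if x_new == x:
                break
            x = x_new
    with localcontext() as ctx:
        ctx.prec = digits
        return +x                              # unary plus rounds to the current precision

root_two = newton_sqrt(2, 50)
print(root_two)
```

Because the starting guess is already good to about 16 digits, only a few iterations are needed to reach 50 digits.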
Here is yet another example, which is interesting because it shows how to handle correlated uncertainties in simple cases. The task is to calculate the molar mass of natural bromine, given the nuclide mass for each isotope, and the corresponding natural abundance.
The trick here is to realize that the abundances must add up to 100%. So if one isotope is at the low end of its error bar, the other isotope must be at the high end of its error bar. So the abundance numbers are anticorrelated. This is an example of a sum rule. For more about correlations and how to handle them, see section 6.15.
(The uncertainties in the mass of each nuclide are negligible.)
nuclide | nuclide mass / dalton |   | natural abundance |   | light case | nominal case | heavy case |
^{79}Br | 78.9183376(20)        | × | 50.686+.026%      | = | 40.02107   |              |            | more
^{79}Br | 78.9183376(20)        | × | 50.686%           | = |            | 40.00055     |            | nominal
^{79}Br | 78.9183376(20)        | × | 50.686−.026%      | = |            |              | 39.98003   | less
^{81}Br | 80.9162911(30)        | × | 49.314+.026%      | = |            |              | 39.92410   | more
^{81}Br | 80.9162911(30)        | × | 49.314%           | = |            | 39.90306     |            | nominal
^{81}Br | 80.9162911(30)        | × | 49.314−.026%      | = | 39.88202   |              |            | less
sum     |                       |   |                   |   | 79.90309   | 79.90361     | 79.90412   |
So by comparing the three columns (light case, nominal case, and heavy case), we find the bottom-line answer: The computed molar mass of natural bromine is 79.90361(52). This is the right answer based on a particular sample of natural bromine. The “textbook” value is usually quoted as 79.904(1), which has nearly twice as much uncertainty, in order to account for sample-to-sample variability.
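The bromine calculation can be repeated in a few lines of code. This sketch uses the masses and abundances from the table; the one structural assumption is that the sum rule is enforced by moving abundance from one isotope to the other, so that the total always stays at 100%.

```python
MASS_79 = 78.9183376    # nuclide mass of 79Br, in dalton
MASS_81 = 80.9162911    # nuclide mass of 81Br, in dalton
ABUND_79 = 0.50686      # natural abundance of 79Br
ABUND_81 = 0.49314      # natural abundance of 81Br
D_ABUND = 0.00026       # uncertainty of each abundance

def molar_mass(shift):
    """Molar mass when `shift` worth of abundance moves from 81Br to 79Br.

    Moving the two abundances in opposite directions keeps their sum at
    exactly 100%, which is how the anticorrelation is taken into account.
    """
    return MASS_79 * (ABUND_79 + shift) + MASS_81 * (ABUND_81 - shift)

light = molar_mass(+D_ABUND)     # more of the light isotope
nominal = molar_mass(0.0)
heavy = molar_mass(-D_ABUND)     # more of the heavy isotope

print(f"light case:   {light:.5f}")
print(f"nominal case: {nominal:.5f}")
print(f"heavy case:   {heavy:.5f}")
print(f"error bars:   +{heavy - nominal:.5f} -{nominal - light:.5f}")
```

The spread is only about 0.00052 dalton, because all that matters is the abundance shift multiplied by the difference between the two nuclide masses, which is only about 2 dalton.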
Note that if you tried to carry out this calculation using “significant figures” you would get the uncertainty wrong. Spectacularly wrong. Off by two orders of magnitude. The relative uncertainty in the molar mass is two orders of magnitude smaller than the relative uncertainty in the abundances.
This is based on question 3:21 on page 122 of reference 13.
Suppose we want to calculate (as accurately as possible) the molar mass of natural magnesium, given the mass of the various isotopes and their natural abundances.
Many older works referred to this as the atomic mass, or (better) the average atomic mass ... but the term molar mass is strongly preferred. For details, see reference 14.
The textbook provides the raw data shown in table 7.
isotope | molar mass / dalton | abundance
^{24}Mg | 23.9850             | 78.99%
^{25}Mg | 24.9858             | 10.00%
^{26}Mg | 25.9826             | 11.01%
Table 7: Isotopes of Magnesium, Rough Raw Data
The textbook claims that the answer is 24.31 dalton and that no greater accuracy is possible. However, we can get a vastly more accurate result.
The approach in the textbook has multiple problems:
It is tempting to blame all the problems on the “sig digs” notation, but that wouldn’t be fair in this case. The primary problem is misaccounting for the uncertainty, and as we shall see, we are still vulnerable to misaccounting even if the uncertainty is expressed using proper notation.
Similarly, note that even if we did manage to get a good estimate of the uncertainty, the “sig digs” rules would not have called for such drastic rounding. So the propagation-of-error issues really are primary.
Let’s make a preliminary attempt to figure out what’s going on. If we clean up the notation, it will facilitate understanding and communication. In particular, it will expose a bunch of problems that the text sweeps under the rug.
We can start by reexpressing the textbook data so as to make the uncertainties explicit. We immediately run into some unanswerable questions, because the “sig digs” notation in table 7 gives us only the crudest idea of the uncertainty ... is it half a count in the last decimal place? Or one count? Or more??? If we use only the numbers presented in the textbook, we have to guess. Let’s temporarily hypothesize a middleoftheroad value, namely three counts of uncertainty in the last decimal place. We can express this in proper notation, as shown in table 8.
isotope | molar mass / dalton | abundance
^{24}Mg | 23.9850(3)          | 78.99(3)%
^{25}Mg | 24.9858(3)          | 10.00(3)%
^{26}Mg | 25.9826(3)          | 11.01(3)%
Table 8: Isotopes of Magnesium, Rough Data with Explicit Uncertainty
This gives the molar mass of the ^{25}Mg isotope with a relative accuracy of 12 parts per million (12 ppm), while the abundance is given with a relative accuracy of 3 parts per thousand (3000 ppm). So in some sense, the abundance number is 250 times less accurate.
If you think about the data, you soon realize that the abundance numbers are in percentages, and must add up to 100%. We say there is a sum rule.
The sum rule means the uncertainty in any one of the abundance numbers is strongly anticorrelated with the uncertainty in the other two. The widely-taught pseudo-sophisticated “propagation of uncertainty” rules don’t take this into account; instead, they rashly assume that all errors are uncorrelated. If you just add up the abundance numbers without realizing they are percentages, i.e. without any sum rule, you get
78.99(3) + 10.00(3) + 11.01(3) = 100.00(5) ??? (37) 
with (allegedly) 500 ppm uncertainty, even though the sum rule tells us they actually add up to 100 with essentially no uncertainty:
78.99(3) + 10.00(3) + 11.01(3) = 100.0±0 (38) 
Even if you imagine that equation 38 is not perfectly exact – perhaps because it fails to account for some fourth, hithertounknown isotope – the sum must still be very nearly 100%, with vastly less uncertainty than equation 37 would suggest.
To say the same thing another way, we are talking about three numbers (the percent abundance of the three isotopes). Taken together, these numbers specify a point in some abstract threedimensional space. However, the valid, physicallysignificant points are restricted to a twodimensional subspace (because of the sum rule).
Here’s another fact worth noticing: All three isotope masses are in the same ballpark. That means that uncertainties in the abundance numbers will have little effect on the soughtafter average mass. Imagine what would happen if all three isotopes had the same identical mass. Then the percentages wouldn’t matter at all; we would know the average mass with 12 ppm accuracy, no matter how inaccurate the percentages were.
There are various ways to take the “ballpark” property into account.
One method, as pointed out by Matt Sanders, is to subtract off the common-mode contribution by artfully regrouping the terms in the calculation. That is, you can subtract 25 (exactly) from each of the masses in table 8, then take the weighted average of what’s left in the usual way, and then add 25 (exactly) to the result. The differences in mass are on the order of unity, i.e. 25 times smaller than the masses themselves, so this trick makes us 25 times less sensitive to problems with the percentages. We are still misaccounting for the correlated uncertainties in the percentages, but the misaccounting does 25 times less damage.
The idea of subtracting off the common-mode contribution is a good one, and has many applications. The idea was applied here to a mathematical calculation, but it also applies to the design of experimental apparatus: for best accuracy, make a differential measurement or a null measurement whenever you can.
To summarize, subtracting off the common-mode contribution is a good trick, but (a) it requires understanding the problem and being somewhat devious, (b) in its simplest form, it only works if the problem is linear, (c) it doesn’t entirely solve the problem, because it doesn’t fully exploit the sum rule.
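The regrouping trick can be sketched in a few lines of Python. Table 8 is not reproduced here, so this sketch assumes the magnesium masses 23.9850423, 24.9858374 and 25.9825937 dalton together with the percentages from equation 37; the helper name `weighted_mass` is hypothetical. It bumps one percentage by its error bar, deliberately ignoring the sum rule, and compares how much damage that does with and without the offset of 25.

```python
masses = [23.9850423, 24.9858374, 25.9825937]   # dalton (assumed values)
percents = [78.99, 10.00, 11.01]                # percentages from equation 37

def weighted_mass(percents, offset=0.0):
    """Weighted average of (mass - offset), with the offset added back."""
    return offset + sum(p / 100.0 * (m - offset)
                        for p, m in zip(percents, masses))

direct = weighted_mass(percents)
shifted = weighted_mass(percents, offset=25.0)

# Bump one percentage by its error bar, deliberately ignoring the sum rule.
bumped = [78.99 + 0.03, 10.00, 11.01]
err_direct = weighted_mass(bumped) - direct
err_shifted = weighted_mass(bumped, offset=25.0) - shifted

print(f"direct = {direct:.5f}, shifted = {shifted:.5f}")
print(f"damage without offset: {err_direct:.2e}")
print(f"damage with offset:    {err_shifted:.2e}")
```

When the percentages obey the sum rule, both groupings give the same average. When they do not, the offset version is off by roughly a factor of 25 less, in line with the argument above.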
The situation described in section 6.14 has so many problems that we need to start over.
For one thing, if we’re going to go to the trouble of calculating things carefully, we might as well use the best available data (rather than the crummy data given in the textbook, i.e. table 8). A secondary source containing mass and abundance data for the isotopes of various elements can be found in reference 15. We can use that for our mass data. Another secondary source is reference 16.
isotope    molar mass / dalton
^{24}Mg    23.9850423(8)
^{25}Mg    24.9858374(8)
^{26}Mg    25.9825937(8)
Table 9: Isotopes of Magnesium, IUPAC Mass Data
Reference 15 appears to be taking its magnesium abundances from reference 17, and it is always good to look at the primary sources if possible, so let’s do that.
isotope pair       abundance ratio    95% confidence
^{25}Mg/^{24}Mg    0.12663            ± 0.00013
^{26}Mg/^{24}Mg    0.13932            ± 0.00026
Table 10: Isotopes of Magnesium, NBS Abundance Data
The first thing you notice is that the scientists who did the work report their results in the form 0.12663 ± 0.00013 at 95% confidence. The uncertainty is clearly and explicitly stated. People who care about their data don’t use sig figs. (Beware that the 95% error bar is two standard deviations, not one.)
Another thing you notice is that they report only two numbers for the abundance data. They report the ratio of ^{25}Mg abundance to ^{24}Mg abundance, and the ratio of ^{26}Mg abundance to ^{24}Mg abundance. They report the uncertainty for each of these ratios. These two numbers are just what we need to span the twodimensional subspace mentioned in section 6.14. The authors leave it up to you to infer the third abundance number (by means of the sum rule). Similarly they leave it up to you to infer the uncertainty of the third number ... including its correlations. The correlations are important, as we shall see.
The smart way to deal with this data, including the correlations, is to use the Monte Carlo technique. As we shall see, this is simultaneously easier and more powerful than the textbook approach.
Monte Carlo has many advantages. It is a very general and very powerful technique. It can be applied to nonlinear problems. It is flexible enough to allow us to exploit the sum rule directly. Relatively little deviousness is required.
As mentioned in section 1.2 and section 4, we must keep in mind that there is no such thing as an “uncertain quantity”. There is no such thing as a “random number”. Instead we should be talking about probability distributions. There are many ways of representing a probability distribution. We could represent it parametrically (specifying the center and standard deviation). Or we could represent it graphically. Or (!) we could represent it by a huge sample, i.e. a huge ensemble of observations drawn from the distribution.
The representation in terms of a huge sample is sometimes considered an inelegant, bruteforce technique, to be used when you don’t understand the problem ... but sometimes brute force has an elegance all its own. Doing this problem analytically requires a great deal of sophistication (calculus, statistics and all that) and even then it’s laborious and errorprone. The Monte Carlo approach just requires knowing one or two simple tricks, and then the computer does all the work.
You can download the spreadsheet for solving the Mg molar mass question. See reference 18.
The strategy goes like this: As always, whenever we see an expression of the form A±B we interpret it as a probability distribution. We start by applying this rule to the mass data in table 9 and the abundance-ratio data in table 10. This gives a mathematical distribution over five variables. Then we represent this distribution by 100 rows of simulated observations, with five variables on each row, all randomly and independently drawn from the mathematical distribution. This gives us another representation of the same distribution, namely a sampled representation. Using these observations, on each row we make an independent trial calculation of the average mass, and then compute the mean and standard deviation of these 100 trial values.
On each row of the spreadsheet, the five raw observations are drawn independently. The three percentage abundance numbers are not raw data, but instead are calculated from the two abundance ratios. This means the three percentage abundance numbers are not independent. They exhibit nontrivial correlations.
The final answer appears in cells M10 and M12, namely 24.30498(18), where our reported uncertainty represents the one-sigma error bar (unlike reference 17, which reported the two-sigma error bar).
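For readers who prefer code to spreadsheets, the same strategy can be sketched in Python. This is not the spreadsheet of reference 18, just a minimal re-creation of the procedure described above. It assumes that every distribution is Gaussian, that the “(8)” in table 9 is a one-sigma error bar, and that the 95% bars in table 10 should be halved to get one sigma. The function names, the seed and the number of trials (10000 rather than 100, to reduce sampling noise) are my own choices.

```python
import random
import statistics

# Table 9: molar masses in dalton; "(8)" read as 0.0000008 (assumed one sigma).
MASS = {24: 23.9850423, 25: 24.9858374, 26: 25.9825937}
MASS_SIGMA = 0.0000008
# Table 10: abundance ratios; the quoted 95% bars are halved to get one sigma.
RATIO = {25: (0.12663, 0.00013 / 2), 26: (0.13932, 0.00026 / 2)}

def one_trial(rng):
    """Draw five independent raw observations; return one trial average mass."""
    m = {k: rng.gauss(v, MASS_SIGMA) for k, v in MASS.items()}
    r25 = rng.gauss(*RATIO[25])
    r26 = rng.gauss(*RATIO[26])
    f24 = 1.0 / (1.0 + r25 + r26)      # sum rule: the fractions add up to 1
    f25, f26 = r25 * f24, r26 * f24    # derived, hence correlated
    return f24 * m[24] + f25 * m[25] + f26 * m[26]

def monte_carlo(n_trials=100, seed=1):
    """Mean and standard deviation of n_trials independent trial values."""
    rng = random.Random(seed)
    trials = [one_trial(rng) for _ in range(n_trials)]
    return statistics.mean(trials), statistics.stdev(trials)

mean_mass, sigma_mass = monte_carlo(n_trials=10000)
print(f"average molar mass = {mean_mass:.5f} +- {sigma_mass:.5f} dalton")
```

Note that the sum rule never has to be imposed as a separate step. It is built into the way the three fractions are computed from the two ratios.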
Technical notes:
If you compare my value for the average mass against the value quoted in reference 17, you find that the nominal value is the same, but the estimated uncertainty is slightly less. There are a couple of explanations for this. For one thing, they make an effort to account for some systematic errors that the Monte Carlo calculation knows nothing about. Also, at one point they add some uncertainties linearly, whereas I suspect they should have added them in quadrature. Furthermore, it’s not clear to what extent they accounted for correlated uncertainties.
Pretend that we didn’t have a sum rule. That is, pretend that the abundance data consisted of three independent random variables, with standard deviations as given in table 8. Modify the spreadsheet accordingly. Observe what happens to the nominal value and the uncertainty of the answer. How important is the sum rule?
Hint: There’s an entire column of independent Gaussian random numbers lying around unused in the spreadsheet.
To summarize: As mentioned near the top of section 6.14, the textbook approach has multiple problems: For one thing, it does the propagation-of-uncertainty calculations without taking the sum rule into account (which is a huge source of error). Then the dreaded “sig digs” rules make things worse in two ways: they compel the non-use of guard digits, and they express the uncertainty very imprecisely.
The textbook answer is 24.31 dalton, with whatever degree of uncertainty is implied by that number of “sig digs”.
We now compare that with our preferred answer, 24.30498(18) dalton. Our standard deviation is less than 8 ppm; theirs is something like one part per thousand (although we can’t be sure). In any case, their uncertainty is more than 100 times worse than ours.
Their nominal value differs from our nominal value by something like 27 times the length of our error bars. That’s a lot.
Last but not least, note that this whole calculation should not be taken overly seriously. The highprecision abundanceratio data we have been using refers to a particular sample of magnesium. Magnesium from other sources can be expected to have a different isotope ratio, well outside the error bars of our calculation.
In this section, we are interested in the isotope abundance percentages (not just the average molar mass).
Recall that reference 17 reported only the two abundance ratios. In contrast, the text reported three abundance percentages, without mentioning the sum rule, let alone explaining how the sum rule should be enforced. So the question arises, if we wanted to report the three abundance percentages, what would be the proper way to do it?
The first step toward a reasonable representation of correlated uncertainties is the covariance matrix. This is shown in cells Q3:S5 in the spreadsheet (reference 18), and shown again in equation 39
 (39) 
For uncorrelated variables, the offdiagonal elements of the covariance matrix are zero. Looking at the matrix in our example we see that the offdiagonal elements are nonzero, so we know there are correlations. Of course we knew that already, because the sum rule guarantees there will be correlations.
Alas, it is not easy to understand the physical significance of a matrix by looking at its matrix elements. For example, it may not be obvious that the matrix in equation 39 is singular ... but if you try to invert it, you’re going to have trouble.
Ideally, if we could represent the matrix in terms of its singular value decomposition (SVD), its meaning would become considerably clearer. Since the matrix is symmetric, the SVD is identical to the eigenvalue decomposition (EVD).
There exist software packages for calculating the SVD. If the matrix is larger than 3×3, it is generally not practical to calculate the SVD by hand.
Once you have the eigenvectors, it is trivial to get the eigenvalues.
Even in situations where you cannot readily obtain the exact SVD, you can still make quite a lot of progress by using an approximate SVD, which I call a ballpark decomposition (BPD). This is shown in cells Q9:AA11 in the spreadsheet and shown again in equation 40.
 (40) 
where R is a unitary matrix and S is “almost” diagonal. Specifically, R consists of a set of approximate eigenvectors of the covariance matrix, considered as column vectors, normalized and stacked sidebyside. The approximate eigenvalues of the covariance matrix appear on the diagonal of S.
The approximate eigenvectors can be figured out using the following reasoning: It is a good guess that [1, 1, 1] or something close to that is the most-expensive eigenvector of the covariance matrix, because if you increase all three abundance percentages, you violate the sum rule. Secondly, if you check this guess against the computed covariance matrix, equation 39, it checks out, in the sense that it is an eigenvector with zero eigenvalue. Thirdly, if you look at the definition of the covariance matrix and apply a little algebra, you can prove that [1, 1, 1] is exactly (not just approximately) an eigenvector with zero eigenvalue.
Meanwhile, the cheapest eigenvector must be [1, 0, −1] or something like that, because that corresponds to increasing the amount of ^{24}Mg and decreasing the amount of ^{26}Mg, which is cheap (in terms of Mahalanobis distance) because of the relatively long error bar on the ^{26}Mg/^{24}Mg ratio as given in table 10.
The third approximate eigenvector is determined by the requirement that it be perpendicular to the other two. (You might guess that it would be something like [1, −1, 0], but that wouldn’t be perpendicular.) In general, you can take a guess and then orthogonalize it using the Gram-Schmidt process. In the particular case of D dimensions where D−1 of the vectors are known, you can take the cross product (or its higher-dimensional generalization). In the present example, the third member of the orthogonal set is [1, −2, 1]. This is the middle eigenvector, neither the cheapest nor the most expensive.
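Both routes to the third vector are easy to check numerically. The following Python sketch (the helper names are hypothetical) starts from the guess [1, −1, 0], removes its components along the two known vectors, and separately takes the cross product of the two known vectors. The simple one-pass projection used here relies on the two known vectors already being perpendicular to each other, which they are.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt_step(guess, basis):
    """Remove from `guess` its components along mutually orthogonal vectors."""
    out = list(guess)
    for e in basis:
        coeff = dot(guess, e) / dot(e, e)
        out = [o - coeff * x for o, x in zip(out, e)]
    return out

def cross(u, v):
    """Ordinary three-dimensional cross product."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

e1 = [1, 1, 1]     # the direction forbidden by the sum rule
e2 = [1, 0, -1]    # the cheapest direction
third_gs = gram_schmidt_step([1, -1, 0], [e1, e2])
third_cross = cross(e1, e2)

print(third_gs)      # a multiple of [1, -2, 1]
print(third_cross)   # a multiple of [1, -2, 1]
```

The two results differ in sign and overall scale, which does not matter since an eigenvector is only defined up to a scalar factor.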
We interpret this as follows: Since the off-diagonal elements in the S-matrix in equation 40 are relatively small, we can say that the uncertainties in the eigenvalues are almost uncorrelated. The eigenvalues are a good (albeit not quite exact) indication of the variance associated with the corresponding eigenvector. Take the square root of the variance to find the standard deviation.
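Here is a sketch of how the ballpark decomposition might be computed, in plain Python rather than in the spreadsheet. It simulates correlated abundance percentages from the two ratios in table 10 (assuming Gaussian distributions, with the 95% bars halved to one sigma), forms the sample covariance matrix, and then evaluates S = Rᵀ C R using the three guessed vectors as the columns of R. The sample size, seed and function names are arbitrary choices of mine, so the numbers will not match the spreadsheet digit for digit.

```python
import math
import random

def sample_percentages(rng):
    """One correlated triple of percent abundances, from the table 10 ratios."""
    r25 = rng.gauss(0.12663, 0.00013 / 2)   # 95% bar halved to one sigma
    r26 = rng.gauss(0.13932, 0.00026 / 2)
    f24 = 100.0 / (1.0 + r25 + r26)
    return [f24, r25 * f24, r26 * f24]

def covariance(rows):
    """Sample covariance matrix of a list of equal-length rows."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[i] for r in rows) / n for i in range(dim)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(dim)] for i in range(dim)]

def normalized(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mat_vec(M, v):
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

rng = random.Random(2)
C = covariance([sample_percentages(rng) for _ in range(2000)])

# Guessed eigenvectors, normalized; these are the columns of R.
cols = [normalized(v) for v in ([1, 1, 1], [1, 0, -1], [1, -2, 1])]

# S = R^T C R, i.e. S[i][j] = cols[i] . (C cols[j])
S = [[sum(a * b for a, b in zip(cols[i], mat_vec(C, cols[j])))
      for j in range(3)] for i in range(3)]

for row in S:
    print(["%.3e" % x for x in row])
```

The row and column belonging to [1, 1, 1] come out as zero to within roundoff, which is the sum rule showing up again. The remaining two diagonal elements are the approximate variances along the other two directions.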
Consider the following scenario. Suppose we are given that:
 (41) 
The variable x behaves as if it were drawn from some distribution X, and our goal is to find a description of this distribution.
It suffices to treat this as a mathematical puzzle unto itself, but if you would prefer to have some physical interpretation, context, and motivation, we remark that such equations can arise in connection with wave propagation (reflected wave, transmitted wave), and also in connection with optimization problems (finding the shortest path).
We can solve this equation using the smart version of the quadratic formula, as explained in section 17.
 (42) 
We can get a feel for the two variable coefficients (b and c) by making a two dimensional scatter plot. The result is a sample drawn from a twodimensional Gaussian distribution, as shown in figure 32.
The twodimensional Gaussian distribution from which this sample was drawn has the following properties: The probability density is highest near the nominal value of (b, c) = (−2.08, 1.08). The density tails off from there, gradually at first and then more quickly.
Let’s see what we can learn by using the Crank Three Times method. In this case it will actually require five turns of the crank, since we have two uncertain coefficients to deal with.
The first crank, as always, involves setting the coefficients a, b, and c to their nominal values and solving for x. When we do this, we find two solutions, namely x=1.00 and x=1.08. In some sense these x values are “centered” on the point x=1.04. We shall see that x=1.04 is a point of pseudosymmetry for this system, and we shall call it the “nominal” xvalue.
In figure 32 the region with the tan background corresponds to points in (b, c)-space where the discriminant b^{2}−4ac is positive, resulting in a pair of real-valued solutions for x. Meanwhile, the region with the gray background corresponds to points where the discriminant is negative, resulting in a conjugate pair of complex-valued solutions.
There is zero probability of a point falling exactly on the boundary. This would result in a double root. For example, the point (b, c) = (−2.08, 1.0816) would produce a double root at x=1.04. Since this is vanishingly unlikely, we will have nothing further to say about it, and will speak of the roots as occurring in pairs.
For present purposes, we will keep all the xvalues we find, including both elements of each pair of roots, and including complex as well as real values. (In some situations there could be additional information that would allow us to discard some of the solutions as unphysical, but for now it is easier and more informative to consider the most general case, and just keep all the solutions.)
If we (temporarily!) consider just the realvalued solutions, we find that x has lopsided error bars. This means it is not safe to describe the xdistribution in terms of some nominal value plusorminus some uncertainty. Lopsided error bars are a warning, telling us to investigate more closely, to see whether the problem is just a mild nonlinearity, or whether something very very bad is going on.
When we take into account the complexvalued solutions, we immediately discover that the situation falls into the very very bad category. The Crank Three Times method has given us a valuable warning, telling us that it cannot give us the full picture. To get the full picture, we need to do a fullblown Monte Carlo analysis. The result of such an analysis can be presented as a scatter plot in the complex plane, as shown in figure 33.
The distribution of xvalues can be plotted in the complex plane, as shown in figure 33. This distribution does not even remotely resemble a twodimensional Gaussian. It looks more like some sort of diabolical pitchfork.
The probability density actually goes to zero at the nominal point x=1.04.
Sprouting out from the nominal x-value are four segments, shown using four different colors in the diagram. These correspond to whether we take the plus or minus sign in front of the ± square root, and whether the discriminant (b^{2}−4ac) is positive or negative. (The sign of the discriminant depends on the luck of the draw, when we draw values for the coefficients b and c. The ± sign does not depend on the luck of the draw, because except in the case of a double root, for every point in (a,b,c)-space we get two points in x-space.)
This diagram is more-or-less equivalent to something that in another context would be called a root locus plot or root locus diagram.
In the interests of simplicity, let us consider a slightly different version of the same problem. The statement of the problem is the same as before, except that there is less uncertainty on the coefficients. Specifically, we wish to describe the distribution X that models the behavior of the variable x, given that:
 (43) 
The scatter plot for the coefficients (b, c) is shown in figure 34.
The corresponding scatter plot for the solutions x in the complex plane is shown in figure 35. The pitchfork shape is less evident here. It looks more like a Greek cross. The curvature of the upper and lower segments is barely visible. Compared to figure 33, this is similar except more “zoomed in”; that is, all the points now lie closer to the nominal xvalue. The probability density is still zero at the nominal point, so the nominal solution is by no means the best solution. It is arguably not even a solution at all.
Mathematically speaking, it is straightforward to calculate the sample mean, i.e. the mean of the points shown in figure 35. It comes out to very nearly the nominal xvalue, namely x=1.04.
Also mathematically speaking, it is straightforward to calculate the variance and the standard deviation of the sample points. The standard deviation is essentially the RMS distance of the points from the mean value. Actually I prefer to call it the RMAS, for root-mean-absolute-square, since technically speaking we want the absolute square |x|^{2} rather than the plain old square x^{2}. It comes out to be about 0.11 for this sample.
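The Monte Carlo analysis, including the mean and the RMAS, can be sketched in Python as follows. The nominal coefficients a=1, b=−2.08, c=1.08 are taken from the text. The error bars on b and c are not, so the value 0.01 used for both is purely an assumption for illustration, as are the Gaussian shape, the seed and the sample size. The sketch uses the plain textbook quadratic formula with a complex square root, which is adequate for these particular coefficients.

```python
import cmath
import math
import random

A = 1.0
B_NOMINAL, C_NOMINAL = -2.08, 1.08
SIGMA_B = SIGMA_C = 0.01    # assumed error bars, for illustration only

def roots(a, b, c):
    """Both roots of a x^2 + b x + c = 0, complex if need be."""
    root_disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + root_disc) / (2 * a), (-b - root_disc) / (2 * a)

def sample_roots(n_draws, seed=3):
    """Draw (b, c) pairs and keep both roots of every draw."""
    rng = random.Random(seed)
    xs = []
    for _ in range(n_draws):
        b = rng.gauss(B_NOMINAL, SIGMA_B)
        c = rng.gauss(C_NOMINAL, SIGMA_C)
        xs.extend(roots(A, b, c))     # real and complex roots alike
    return xs

xs = sample_roots(5000)
mean_x = sum(xs) / len(xs)
# RMAS: root-mean-absolute-square distance from the mean.
rmas = math.sqrt(sum(abs(x - mean_x) ** 2 for x in xs) / len(xs))
n_complex = sum(1 for x in xs if x.imag != 0)

print("mean =", mean_x)
print("RMAS =", rmas)
print("fraction of complex roots =", n_complex / len(xs))
```

To see the structure of the distribution, plot the real part of each element of `xs` against its imaginary part. The two summary numbers printed here say almost nothing about the shape.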
I emphasize that calculating these numbers is easier than assigning any useful meaning to the numbers. Specifically, it would be grossly misleading to describe this distribution in terms of its mean and standard deviation. That is, it would be grossly misleading to write x=1.04±0.11 without stating the form of the distribution. This distribution is about as nonGaussian as anything I can imagine. For figure 35, it might make sense to describe the mean and standard deviation of each of the four segments separately ... but for figure 33, not even that would do a good job of describing the overall xdistribution.
Note that if we – hypothetically and temporarily – pretend the RMAS is a useful measure of the uncertainty, then the relative uncertainty on x is almost 11 percent, which is more than an order of magnitude larger than the uncertainty in either of the coefficients. Nonhypothetically speaking, keep in mind that the RMAS barely begins to describe what we know (and don’t know) about the distribution of xvalues.
These examples illustrate the importance of plotting the data and looking at it, rather than relying on mathematical abstractions such as mean and standard deviation. If you just blithely calculated numerical values for the mean and standard deviation, you would come nowhere near understanding this system.
These examples also illustrate the tremendous power of the Monte Carlo method. It works when other methods fail.
In the introductory texts, when they lay down “rules” for propagating the uncertainty stepbystep, they often neglect to mention that you need to systematically check the radius of convergence at every step. If you fail to check, convergence problems will go unnoticed, and you will get seriously wrong answers. Unfortunately, this sort of checking is quite laborious, so it is seldom done, and serious errors are common.
Remember that there are three problems layered on top of each other: Misrepresentation, Malexpansion, and Correlation. This is discussed in section 6.1.
Bottom line: In this example, and in many similar examples, if you want a good, simple, quantitative answer for the nominal value and uncertainty of the distribution X, you’re out of luck. There is no such thing. We need to ask a different question, such as “How can we understand what’s going on in this system?”
Looking at a scatter plot such as figure 35 is a good starting point for understanding what is going on.
Suppose we have a procedure, consisting of one or more steps. We start with a_{i} and then calculate b_{i} and then c_{i} et cetera. Here a_{i} is an observation drawn from some distribution A. We assume the distribution A can be represented by a blob of the form ⟨A⟩±[A] where ⟨A⟩ is the mean and [A] is the standard deviation.
The hallmark of stepbystep propagation is that at each step in the calculation, rather than keeping track of plain old numbers such as a_{i}, b_{i} et cetera, we keep track of the corresponding distributions, by means of the blobs ⟨A⟩±[A], ⟨B⟩±[B], et cetera.
This approach suffers from three categories of problems, namely misrepresentation, malexpansion, and correlation.
People often ask for some mathematical rules for keeping track of the uncertainty at each step in a long calculation, literally “propagating” the uncertainty on a step-by-step basis. This approach works fine in a few simple, ideal cases. Perhaps the biggest advantage of the step-by-step approach is that thinking about the logic behind the rules helps give you a feel for what’s going on, and allows you to predict which steps are likely to make the largest contributions to the overall uncertainty.
On the other hand, beware: The step-by-step first-order approach is subject to many provisos that often make it inapplicable to practical problems. (If you ignore the provisos, you will get wrong answers – often with little or no warning.) In a complicated multistep problem, you may find that step-by-step first-order propagation works fine everywhere except for one or two steps. Alas, a chain is only as strong as its weakest link, so the method fails to solve the overall problem. The quadratic formula in section 6.18 serves as an example of just such an overall failure, even though the method worked for every step except one, i.e. except for the step that called for extracting the square root.
Also beware that even in cases where the step-by-step method is applicable, it can become quite laborious. For example, when stepping through the quadratic formula (as in equation 41 for example), there is a product, then a sum, then a square root, then another sum, and then a division. This requires repeated conversion between absolute uncertainty and relative uncertainty. In this case, calculating the uncertainty requires about three times as many arithmetical operations as calculating the nominal value. You can reduce the workload by using ultra-crude approximations to the uncertainty (such as sig figs), but this gives you the wrong answer. There is no advantage to having an easy way of getting the wrong answer.
Generally speaking, when dealing with messy, complicated, practical cases you’re better off letting a computer do the work for you. You can start with the Crank Three Times method discussed in section 6.13, and if that’s not good enough, you can use the Monte Carlo^{7} method as discussed in section 6.15.
These rules have some advantages and disadvantages. In situations where they are valid, they are very convenient. For example, if you know that a certain distribution has a mean of 19 and a relative uncertainty of 10%, then if you double every element of the ensemble you get a new distribution with a mean of 38 and the same relative uncertainty, namely 10%. This is easy and intuitive, and gets the right answer in this situation. You don’t need to understand any calculus, you don’t need to worry about the radius of convergence, and you hardly need to do any work at all.
However, beware that a collection of anecdotes is not a proof. These rules work in certain selected situations, but they fail miserably in other situations.
I assume you already know how to add, subtract, multiply, and divide numbers, so we will now discuss how to add, subtract, multiply, and divide probability distributions, subject to certain restrictions.
Each of the capitalletter quantities here (A, B, and C) is a probability distribution. We can write A := m_{A}±σ_{A}, where m_{A} is the mean and σ_{A} is the standard deviation.
The best way to explain where these rules come from is to use calculus, but if you don’t know calculus you can (a) start by accepting the rules as plausible hypotheses, and then (b) check them for consistency. More specifically, calculus is needed for any serious understanding of the limitations of the rules.
σ_{C}^{2} = σ_{A}^{2} + σ_{B}^{2} (44) 
(σ_{C}/m_{C})^{2} = (σ_{A}/m_{A})^{2} + (σ_{B}/m_{B})^{2} (45) 
σ_{B}/m_{B} = N σ_{A}/m_{A} (46) 
Note that you cannot get this result by applying the product rule. The product rule is not applicable, since taking powers involves multiplying quantities with correlated uncertainties.
If N is not an integer, equation 46 is not reliable. It might work, or it might not. For example, consider the case where N=½. Suppose we know x^{2} = y and the distribution on y is 81±1ppm. The problem is, we don’t know whether x ≈ 9 or x ≈ −9, so we might need to write x = 0±9, in which case the uncertainty on x is incomparably more than the uncertainty on y. For more on this, see section 6.18.
Bottom line: As a practical matter, stepbystep “algebraic” propagation of uncertainty calculation is usually not the best approach. Usually Monte Carlo is both better and easier. The more steps in the calculation, the more you gain from the Monte Carlo approach.
Here is an example where the propagation rules give the correct answer. For a counterexample, see section 6.22.
Suppose somebody asks you to carry out the computation indicated on the RHS of equation 47. If you wish, for concreteness you may imagine that the first number is a raw observation, the second number is some scale factor or conversion factor, and the third number is some baseline that must be subtracted off.
x = 4.4(½) × 2.617(½) − 9.064(½) (47) 
As always, the (½) indicates that the uncertainty results from roundoff, and is a half-count in the last decimal place. That means we can restate the problem as 4.4±.05 × 2.617±.0005 − 9.064±.0005, with due regard for the fact that roundoff errors are never Gaussian distributed. In this example, for simplicity, we assume the roundoff errors follow a rectangular distribution.
Using the usual precedence rules, we do the multiplication first. According to the propagation rules in section 6.19, we will need to convert the absolute uncertainties to relative uncertainties.
That gives us: 4.4±1.14% × 2.617±0.02%. When we carry out the multiplication, the result is 11.5148±1.14%. Note that the uncertainty in the product is entirely dominated by the uncertainty in the first factor, because the uncertainty in the other factor is relatively small.
Next we convert back from relative to absolute uncertainties, then carry out the subtraction. That results in 11.5148±0.131 − 9.064±.0005 = 2.4508±0.131.
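The arithmetic of the last few paragraphs can be reproduced with a short Python sketch. The two helper functions are hypothetical names for the product rule (equation 45) and the sum rule (equation 44). Like the text, the sketch plugs the half-count values in directly as the uncertainties.

```python
import math

def multiply(a, sa, b, sb):
    """Product rule: relative uncertainties add in quadrature."""
    value = a * b
    rel = math.hypot(sa / a, sb / b)
    return value, abs(value) * rel

def subtract(a, sa, b, sb):
    """Sum/difference rule: absolute uncertainties add in quadrature."""
    return a - b, math.hypot(sa, sb)

product, s_product = multiply(4.4, 0.05, 2.617, 0.0005)
x, s_x = subtract(product, s_product, 9.064, 0.0005)

print(f"product = {product:.4f} +- {s_product:.3f}")
print(f"x = {x:.4f} +- {s_x:.3f}   (relative: {100 * s_x / x:.1f}%)")
```

The uncertainty of the final answer is set almost entirely by the first factor. The other two contributions change it only in the sixth decimal place.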
Now we have to decide how to present this result. One reasonable possibility would be to round it to 2.45±0.13 or equivalently 2.45(13). One could maybe consider heavier rounding, to 2.5(1). Note that this version differs from the previous version by 39% of an error bar, which seems like a nasty thing to do to your data.
Trying to express the foregoing result using sig digs would be a nightmare, as discussed in more detail in section 16.5.4. Expressing the result properly, e.g. 2.45(13), is no trouble at all.
The calculation set forth in equation 47 is an example of what we call a noise amplifier. We started with three numbers, one of which had about 1% relative uncertainty, and the others much less. We ended up with more than 5% relative uncertainty.
This is not a problem with the stepbystep approach; Monte Carlo would have given you the same result.
It appears that the uncertainty grew during the calculation, but you should not blame the calculation in any way. The calculation did not cause the uncertainty; it merely made manifest the uncertainty that was inherent in the situation from the beginning.
As a rule of thumb: Any time you compute a small difference between large numbers, the relative uncertainty will be magnified.
If you have a noise amplifier situation that results in unacceptable uncertainty in the final answer, you will need to make major changes and start over. In some cases, it suffices to make a more precise measurement of the raw data. In other cases, you will need to make major architectural changes in the experimental apparatus and procedures, perhaps using some sort of “null” technique (electrical bridge, acoustical beats, etc.) so that subtracting off such a large “baseline” number is not required.
Let’s carry out the calculation of the pH along the lines suggested in section 6.7. We assume a dilute solution of a weakish acid:
 (48) 
We can find the pH by direct application of the lame quadratic formula, i.e. equation 90. If you understand what’s going on, you know that the actual relative uncertainty in the pH is one percent. The Crank Three Times method gives the correct answer, namely one percent.
In this section we will compare the correct result with the result we get from propagating the uncertainty stepbystep, using the rules set forth in section 6.19.2 ... except that we will not pay attention to the provisos and limitations that are contained in the rules.
Here is a snapshot of the spreadsheet (reference 20) used to carry out the calculation. The final pH has a calculated uncertainty, highlighted with boldface, that is off by about three orders of magnitude. The explanation is that in one of the steps, we subtracted two numbers with highly correlated uncertainties, violating one of the crucial provisos.
symbol         meaning      numerical    abs uncertainty         rel uncertainty
a              1            1            0             –>        0.00%
b              Ka           0.001        0.0001        <–        10.00%
               Cha          1e-05        1e-07         <–        1.00%
c              -Ka Cha      -1e-08       1.005e-09     <–        10.05%
b**2                        1e-06        2e-07         <–        20.00%
4ac                         -4e-08       4.02e-09      <–        10.05%
b**2 - 4ac                  1.04e-06     2e-07         –>        19.23%
sqrt(..)                    0.00102      9.808e-05     <–        9.62%
-b + sqrt()                 1.98e-05     0.0001401     –>        707.28%
../2           pH           9.902e-06    7.003e-05     –>        707.28%    <<<
-b - sqrt()    unphysical   -0.00202     0.0001401     –>        6.93%
../2           big root     -0.00101     7.003e-05     –>        6.93%
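The comparison can also be sketched in Python. The first function below applies the step-by-step rules blindly, mirroring the rows of the table. The second does a Monte Carlo calculation on the same inputs. Equation 48 is not reproduced here, so the form x² + Ka·x − Ka·Cha = 0 (that is, a = 1, b = Ka, c = −Ka·Cha) is an assumption inferred from the numbers in the table, as are the Gaussian distributions, the seed and the function names.

```python
import math
import random
import statistics

KA, KA_REL = 1e-3, 0.10      # acid dissociation constant, 10% uncertainty
CHA, CHA_REL = 1e-5, 0.01    # concentration, 1% uncertainty

def h_plus(ka, cha):
    """Positive root of x^2 + ka*x - ka*cha = 0 (assumed form of the problem)."""
    return (-ka + math.sqrt(ka * ka + 4 * ka * cha)) / 2

def naive_stepwise():
    """Apply the rules blindly, as if the inputs to every step were uncorrelated."""
    b, sb = KA, KA * KA_REL
    c_rel = math.hypot(KA_REL, CHA_REL)              # product rule
    four_ac, s_four_ac = -4 * KA * CHA, 4 * KA * CHA * c_rel
    b2, s_b2 = b * b, b * b * 2 * KA_REL             # power rule, N = 2
    disc = b2 - four_ac
    s_disc = math.hypot(s_b2, s_four_ac)             # sum rule
    root = math.sqrt(disc)
    s_root = root * 0.5 * s_disc / disc              # power rule, N = 1/2
    numer = -b + root
    s_numer = math.hypot(sb, s_root)                 # sum rule; ignores correlation!
    return numer / 2, s_numer / 2

def monte_carlo(n=10000, seed=4):
    """Mean and standard deviation of the root over n simulated observations."""
    rng = random.Random(seed)
    xs = [h_plus(rng.gauss(KA, KA * KA_REL), rng.gauss(CHA, CHA * CHA_REL))
          for _ in range(n)]
    return statistics.mean(xs), statistics.stdev(xs)

x_naive, s_naive = naive_stepwise()
x_mc, s_mc = monte_carlo()
print(f"naive:       {x_naive:.4g}, relative uncertainty {100 * s_naive / x_naive:.1f}%")
print(f"Monte Carlo: {x_mc:.4g}, relative uncertainty {100 * s_mc / x_mc:.2f}%")
```

The two methods agree on the nominal value but disagree wildly on the uncertainty. The culprit is the line marked “ignores correlation”, where −b and the square root are combined as if they were independent even though both are dominated by the same uncertainty in Ka.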
There are two parts to the lesson here:
In this example, the problem is so large as to be obvious. However, beware that in other situations, you could easily make a mistake that is not quite so conspicuous ... just wrong enough to be fatal, but not wrong enough to be noticeable until it is too late.
Hint: If you want to see some lessobvious mistakes, try modifying this example by increasing the concentration and/or decreasing the uncertainty on the concentration.
Note that the more robust version of the quadratic formula, equation 87, does slightly better, but still does not play nicely with the stepbystep propagation rules. It gets an uncertainty that is off by “only” about one order of magnitude.
Also keep in mind that no matter what you are doing, you can always make it worse by using sig figs. Section 6.7 shows how sig figs can mess up the quadratic formula in general and pH calculations in particular.
The basic scenario goes like this: We start with some raw data. The distribution over raw data has some uncertainty. We choose a model that has some adjustable parameters. We run the data through the curve-fitting process. This gives us a set of best-fit parameters. There will be some uncertainty associated with the parameters.
There are methods for estimating the uncertainty, based on what we know about the model and the distribution of raw data. This can be considered a form of stepbystep analytic propagation of the kind considered in section 6.19. As such, it might work or it might not. It is, as the saying goes, a checkable hypothesis. After doing the calculation, it is rather easy to wiggle the parameters and confirm that the fitted model is behaving in a way that is consistent with the estimated uncertainties.
For the next level of detail on this, see reference 21.
There are some simple situations where simple approaches provide accurate propagation and/or provide useful insight. In these situations the simple approaches should be used and fancier methods would be a waste of effort. For example, as mentioned in section 6.19.2, if you know that a certain distribution has a mean of 19 and a relative uncertainty of 10%, then if you double every element of the ensemble you get a new distribution with a mean of 38 and the same relative uncertainty, namely 10%. This is easy and intuitive and gets the right answer in this situation.
Consider the following multi-way contrast:
In this case, the right answer is less laborious than step-by-step propagation, by at least a factor of 2.
However, there are lots of situations where the hard part is checking the validity. After you figure that out, the calculation is probably easy ... but you have to account for all the work, not just the calculational crank-turning work.
If you skip the validation step, you are very likely to get the wrong answer with no warning.
Even when an analytic solution exists, it might be a good idea to check it against the Monte Carlo solution. Analytic calculations are not infallible.
Errors of this kind can be exceedingly hard to catch. However, the Monte Carlo solution provides a very powerful check.
This contrasts with the step-by-step approach, where (at a minimum) you need two equations: one equation for the nominal value ⟨X⟩ and another very-different equation for the uncertainty [X]. Just not having to derive (and check!) this second equation may be a significant savings. The fact that you need 1000 iterations to collect the Monte Carlo statistics is a negligible cost, because you don’t do that work yourself; the computer does it.
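To make the "one equation" point concrete, here is a minimal sketch of Monte Carlo propagation for a single input. The helper name `propagate` and the choice of a Gaussian input are assumptions made for this illustration; the test case is the doubling example (mean 19, relative uncertainty 10%) mentioned earlier.

```python
import random
import statistics

def propagate(func, nominal, sigma, n=1000, seed=0):
    """Push n Gaussian draws of a single input through func.

    Returns (mean, standard deviation) of the outputs.  The Gaussian
    input distribution is an assumption made for this illustration.
    """
    rng = random.Random(seed)
    outputs = [func(rng.gauss(nominal, sigma)) for _ in range(n)]
    return statistics.mean(outputs), statistics.stdev(outputs)

# Only the equation for the quantity itself is written down;
# no separate equation for the uncertainty is needed.
mean, sd = propagate(lambda x: 2.0 * x, nominal=19.0, sigma=1.9)
print(f"mean = {mean:.2f}, relative uncertainty = {sd / mean:.1%}")
```

The same helper works unchanged if the lambda is replaced by a function for which the analytic propagation rules would be hard to derive or hard to trust.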
Last but not least, there are plenty of situations where Monte Carlo is the only option.
Suppose you are taking data. How many raw data points should you take? How accurately should you measure each point? There are reliable schemes for figuring out how much is enough. However, the reliable schemes are not simple, and the simple schemes are not reliable. Any simple rule like “Oh, just measure everything to three significant digits and don’t worry about it” is highly untrustworthy. Some helpful suggestions will be presented shortly, but first let’s take a moment to understand why this is a hard problem.
First you need to know how much accuracy is needed in the final answer, and then you need to know how the raw data (and other factors) affect the final answer.
Sometimes the uncertainties in the raw data can have less effect than you might have guessed, because of signal-averaging or other clever data reduction (section 6.11) or because of anticorrelated errors (section 6.15). Conversely, sometimes the uncertainties in the raw data can be much more harmful than you might have guessed, because of correlated errors, or because of unfavorable leverage, as we now discuss.
As an example of how unfavorable leverage can hurt you, suppose we have an angle theta that is approximately 89.3 or 89.4 degrees. If you care about knowing tan(theta) within one part in a hundred, you need to know theta within less than one part in ten thousand.
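The leverage in this example can be worked out in a few lines. The sketch below uses the identity d(tan θ)/tan θ = dθ/(sin θ cos θ), with θ in radians, and then checks the linearized answer by brute force. The function name is made up for this illustration.

```python
import math

def relative_leverage(theta_deg):
    """Ratio (relative change in tan theta) / (relative change in theta).

    From d(tan)/tan = d(theta) / (sin * cos), with theta in radians.
    """
    theta = math.radians(theta_deg)
    return theta / (math.sin(theta) * math.cos(theta))

leverage = relative_leverage(89.3)
allowed = 0.01 / leverage   # relative error in theta for a 1% error in tan(theta)

# Brute-force check: perturb theta by the allowed amount and re-evaluate.
theta = math.radians(89.3)
actual = math.tan(theta * (1 + allowed)) / math.tan(theta) - 1

print(f"leverage = {leverage:.1f}")
print(f"allowed relative error in theta = {allowed:.2e}")
print(f"resulting relative change in tan(theta) = {actual:.2%}")
```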
Whenever there is a singularity or near-singularity, you risk having unfavorable leverage. The proverbial problem of small differences between large numbers falls into this category, if you care about relative error (as opposed to absolute error).
If you are recording some points:
If you are describing a distribution, and you think it can be described in terms of its center and half-width:

There are several equally good ways of expressing the mean and half-width of a distribution. It usually doesn’t matter whether the uncertainty is expressed in absolute or relative terms, so long as it is expressed clearly. For example, here is one common way to express the relative uncertainty of a distribution:
 (49) 
Meanwhile, there are multiple ways to express the absolute uncertainty of a distribution. The following are synonymous:

Another way of expressing absolute uncertainty is:
 (51) 
The “interval” or “range” notation in equation 51 has the connotation that the probability is flat and goes to zero outside the stated interval. A flat distribution can result from roundoff, or from other quantization phenomena such as discrete drops coming out of a burette. You could use either of the forms in equation 50 for such a distribution, but then there would be questions as to whether the stated error bars represented the HWHM or the standard deviation.
Sometimes the uncertainty can be expressed indirectly, for example by giving a rule that applies to a whole family of distributions. See section 5.1 for an example.
There are a couple of additional special rules for raw data, as described in section 7.3. Otherwise, all these recommendations apply equally well to measured quantities and calculated quantities.
Remember that a distribution has width, but an individual point sampled from that distribution does not. For details on this, see section 4.2 and reference 2.
Therefore, if you are recording a long list of points, there is normally no notion of uncertainty attached to the individual points, so the question of how to express uncertainty on a per-point basis does not arise. If you want to describe the distributional properties of the whole collection of points, do that separately. Note the contrast:
The Wrong Way: write down 1000 points using 2000 numbers, i.e. one mean and one standard deviation per point.  The Right Way: Write down the points and describe the distribution using 1002 numbers, i.e. one number per point, and then one mean and one standard deviation for the distribution as a whole. 
Note that there is a distinction between the mean and standard deviation of the sample, and the sample-based estimate of the mean and standard deviation of the population. For an explanation of this, see reference 2.
You should report the form of the distribution, as discussed in section 7.4. Once the form of the distribution is known, if it is a two-parameter distribution, then any of the expressions in equation 49 or equation 50 or perhaps equation 51 suffice to complete the description of the distribution.
In the not-too-unusual situation where the uncertainty of a distribution is dominated by roundoff error or some similar quantization error, the situation can be expressed using a slash in parentheses:
0.087(/)      (52)
This can be viewed as shorthand for 0.087(½) i.e. an uncertainty of half a count in the last place, or equivalently 0.0870(5), but it also conveys the fact that the distribution of roundoff errors is usually highly non-Gaussian, usually closer to a flat distribution. In particular, the standard deviation may be markedly smaller than the half-width, as discussed in connection with figure 21.
Similarly, if the uncertainty is dominated by a one-sided truncation error (such as rounding down), this can be expressed using a plus-sign in parentheses:
0.087(+)      (53)
This can be viewed as shorthand for 0.0875(5), with a highly non-Gaussian distribution.
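The two shorthands can be read as intervals. The sketch below assumes the shorthand is written as a numeral followed by (/) or (+), for example 0.087(/), and it uses the equivalences stated above: (/) means half a count either side of the last place, and (+) means somewhere between the written value and one count above it. The function name is hypothetical.

```python
from decimal import Decimal

def roundoff_interval(notation):
    """Return (low, high) for a numeral written with a (/) or (+) suffix."""
    numeral, _, suffix = notation.partition("(")
    value = Decimal(numeral)
    # One count in the last written decimal place, e.g. 0.001 for "0.087".
    step = Decimal(1).scaleb(value.as_tuple().exponent)
    if suffix == "/)":
        return value - step / 2, value + step / 2   # symmetric roundoff
    if suffix == "+)":
        return value, value + step                  # one-sided truncation
    raise ValueError(f"unrecognized uncertainty suffix in {notation!r}")

print(roundoff_interval("0.087(/)"))
print(roundoff_interval("0.087(+)"))
```

Decimal arithmetic is used here so that the interval endpoints come out as exact decimal numerals rather than binary approximations.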
Returning to the basic recommendations given at the start of this section: These recommendations do not dictate an “exactly right” number of digits. You should not be surprised by this; you should have learned by now that many things – most things – do not have exact answers. For example, suppose I know something is ten inches long, plus or minus 10%. If I convert that to millimeters, I get 254 mm, ± 10%. I might choose to round that off to 250 mm, ± 10%, or I might choose not to. In any case I am not required to round it off.
Keep in mind that there are plenty of numbers for which the uncertainty doesn’t matter, in which case you are free to write the number (with plenty of guard digits) and leave its uncertainty unstated. For example, an experiment might involve ten numbers, one of which makes an obviously dominant contribution to the uncertainty, in which case you don’t need to obsess over the others.
When comparing numbers, don’t round them before comparing, except maybe for qualitative, at-a-glance comparisons, and maybe not even then, as discussed in section 7.6.
When doing multi-step calculations, whenever possible leave the numbers in the calculator between steps, so that you retain as many digits as the calculator can handle.^{8} Leaving numbers in the calculator is vastly preferable to copying them from the calculator to the notebook and then keying them back into the calculator; if you round them off you introduce roundoff error, and if you don’t round them off there are so many digits that it raises the risk of miskeying something.
Similarly: When cut-and-pasting numbers from one program to another, you should make sure that all the available digits get copied. And again similarly: When a program writes numbers to a file, to be read back in later, it should ordinarily write out all the available digits. (In very exceptional cases where this would incur unacceptable inefficiency, some sort of careful data compression is needed. Simple rounding does not count as careful data compression.)
Note that the notion of “no unintended loss of significance” is meant to be somewhat vague. Indeed the whole notion of “significance” is often hard to quantify. You need to take into account the details of the task at hand to know whether or not you care about the roundoff errors introduced by keeping fewer digits. For instance, if I’m adjusting the pH of a swimming pool, I suppose I could use an analytical balance to measure the chemicals to one part in 10^{5}, but I don’t, because I know that nobody cares about the exact pH, and there are other far-larger sources of uncertainty.
When thinking about precision and roundoff, it helps to think about the same quantity two ways:
Therefore it makes sense to use a two-step process: First figure out how much roundoff error you can afford, and then use that to give you a lower bound on how many digits to use.
Beware that the terminology can be confusing here: N digits is not the same as N decimal places. Let’s temporarily focus attention on numbers in scientific notation (since the sig-digs rules are even more confusing otherwise). A numeral like 1.234 has four digits, but only three decimal places. Sometimes it makes sense to think of it in four-digit terms, since it can represent on the order of 10^{4} different numbers, from 1.000 through 9.999 inclusive. Meanwhile it sometimes makes sense to think of it in three-decimal-place terms, since the step size (stepping from one such number to the next) is 10^{−3}.
If you want to keep the roundoff errors below one part in 10 to the Nth, you need N decimal places, i.e. N+1 digits of scientific notation. For example numbers near 1.015 will be rounded up to 1.02 or rounded down to 1.01. That is, the roundoff error is half a percent.
Also beware that roundoff errors are not normally distributed. In multi-step calculations, roundoff errors accumulate faster than normally-distributed errors would. Details on this problem, and suggestions for dealing with it, can be found in section 6.11. Additional discussion of roundoff procedures can be found in reference 7.
The cost of carrying more guard digits than are really needed is usually very small. In contrast, the cost of carrying too few guard digits can be disastrously large. You don’t want to do a complicated, expensive experiment and then ruin the results due to roundoff errors, due to recording too few digits.
When you are making observations, the rule is that you should record all the original data, just as it comes from the apparatus. Do not make any “mental conversions” on the fly.
We are making a distinction between the raw data and the calculations used to analyze the data. The point is that if you keep all the raw data, if you discover a problem with the calculation, you can always redo the calculation. Redoing the calculation may be irksome, but it is usually much less laborious and much less costly than redoing all the lab work.
There is a wide class of analog apparatus – including rulers, burettes, graduated cylinders etc. – for which the following rule applies: It is good practice to record all of the certain digits, plus one estimated digit. For example, if the finest marks on the ruler are millimeters, in many cases you can measure a point on the ruler with certainty to the nearest millimeter … and then you should try to estimate how far along the point is between marks. If you estimate that the point is halfway between the 13 mm and 14 mm marks, record it as 13.5 mm. This emphatically does not indicate that you know the reading is exactly 13.5 mm. It is only an estimate. You are keeping one guard digit beyond what is known with certainty, to reduce the roundoff errors. You don’t want roundoff errors to make any significant contribution to the overall uncertainty of the measurement. [Also, if possible, include some indication of how well you think you have estimated the last digit: perhaps 13.5(5)mm or 13.5(3)mm or even 13.5(1)mm if you have really sharp eyes.]
There is a class of instruments, notably analog voltmeters and multimeters, where in order to make sense of the reading you need to look at the needle and at the range-setting knob. (This is in contrast to digital meters, where the display often tells the whole story.) I recommend the following notation:
Reading  Scale  
2.88  /3*300mV  
2.88  /10*1V 
which is to be interpreted as follows:
Reading  Scale  Interpretation  
2.88  /3*300mV  “2.88 out of three on the 300mV scale”  
2.88  /10*1V  “2.88 out of ten on the 1V scale” 
Note that both of the aforementioned readings correspond to 0.288 volts.
There are two things going on here: First of all, converting on-the-fly from what the scale says (2.88) to SI units (0.288) is too error prone, so don’t do it that way; record the 2.88 as is, and do the conversion later. Secondly, there are two ways of getting this reading, either most of the way up on the 300mV scale (the first line in the table above) or partway up on the 1V scale (the second line). It is important to record which scale was used, in case the two scales are not equally well calibrated.
Note that the notation “/3*300mV” also tells you the algebraic operations needed to convert the raw data to SI units: in this case divide by 3, and multiply by 300mV.
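Because the scale note spells out the algebra, the conversion can be done later by machine. The following sketch applies a note such as "/3*300mV" to a recorded reading. The parser, the function name, and the small unit table are assumptions made for this illustration; only the two scale notes shown in the table above are exercised.

```python
import re

UNIT_IN_VOLTS = {"mV": 1e-3, "V": 1.0}

def to_volts(reading, scale):
    """Apply a scale note such as "/3*300mV" to a raw needle reading."""
    match = re.fullmatch(r"/(\d+(?:\.\d+)?)\*(\d+(?:\.\d+)?)(mV|V)", scale)
    if match is None:
        raise ValueError(f"cannot parse scale note {scale!r}")
    divisor, multiplier, unit = match.groups()
    # Divide by the full-scale marking, multiply by the range setting.
    return reading / float(divisor) * float(multiplier) * UNIT_IN_VOLTS[unit]

for reading, scale in [(2.88, "/3*300mV"), (2.88, "/10*1V")]:
    print(f"{reading} {scale} -> {to_volts(reading, scale):.3f} V")
```

The raw reading and the scale note stay in the notebook exactly as recorded; the conversion is a separate, repeatable step.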
Whenever you are describing a distribution, it is important to specify the form of the distribution, i.e. the family from which your distribution comes. For instance if the data is Gaussian and IID, you should say so, unless this is obvious from context. Only after the family is known does it make sense to report the parameters (such as position and half-width) that specify a particular member of the family.
On the other side of the same coin, people have a tendency to assume distributions are Gaussian and IID, even when there is no reasonable basis for such an assumption. Therefore if your data is known to be – or even suspected to be – non-Gaussian and/or non-IID, it is doubly important to point this out explicitly. See section 12.8 for more on this.
As mentioned in section 2.1, whenever you write down a number, you have to round it to “some” number of digits. As mentioned in section 1.1, you should keep many enough digits so that roundoff error does not cause any unintended loss of significance. Therefore, we need to understand the effect of roundoff error.
Figure 36 shows how a Gaussian distribution is affected by roundoff. It shows an “original” distribution and two other distributions derived from that by rounding off, as follows:
distribution  representation  remark  
3.8675309 ± 0.1  solid blue line  original  
3.87 ± 0.1  dashed yellow line  rounded to two places  
3.9 ± 0.1  dotted red line  rounded to one place 
Obviously, the blue curve is the best. It is the most faithful representation of the real, original distribution.
As I see it, the dashed yellow curve is not better, but it’s not much worse than the original. Its Kullback-Leibler information divergence (relative to the original) is about 0.0003. You can see that even if you keep more digits than are called for by the sig-figs rules, the roundoff error is not entirely negligible.
The dotted red curve is clearly worse. You can see at a glance that it represents a different distribution. Its KL information divergence (relative to the original) is more than 0.05. You can see that following the sig-figs rules definitely degrades the data.
To show the effect of rounding, let’s do the following experiment, which can be done using nothing more than a spreadsheet program: We draw a sample consisting of N=100 numbers, drawn from a source distribution, namely a Gaussian centered at 1.17 with a standard deviation of 0.05.
As usual, the first thing to do is look at a scatter plot of the data, as shown in figure 37. We calculate a mean of 1.164 and a standard deviation of 0.0510, so the sample is not too dissimilar from the source distribution.
Next we round each data point to the nearest 0.01, and histogram the results. This is shown in figure 38.
Next we round off this data to the nearest 0.1 units and histogram the results. This is shown in figure 39. The mean and standard deviation of the rounded data are 1.157 and 0.0624 ... which means that the roundoff has increased the spread of the data by more than 20%.
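The same experiment is easy to repeat in code instead of a spreadsheet. The sketch below is an illustration under stated assumptions: each rounding is applied to the raw sample, the random seed is arbitrary, and the function names are made up. A different sample will give somewhat different numbers from those quoted above.

```python
import random
import statistics

def round_to(x, step):
    """Round x to the nearest multiple of step."""
    return round(x / step) * step

def rounding_experiment(n=100, seed=0):
    """Draw n Gaussian points and summarize them raw and rounded.

    Returns a dict mapping the rounding step (None for raw data) to
    the (mean, standard deviation) of the sample.
    """
    rng = random.Random(seed)
    raw = [rng.gauss(1.17, 0.05) for _ in range(n)]
    summary = {None: (statistics.mean(raw), statistics.stdev(raw))}
    for step in (0.01, 0.1):
        rounded = [round_to(x, step) for x in raw]
        summary[step] = (statistics.mean(rounded), statistics.stdev(rounded))
    return summary

for step, (mean, sd) in rounding_experiment().items():
    print(f"step={step}: mean={mean:.4f}  sd={sd:.4f}")
```

Calling `rounding_experiment` with a much larger n shows that the extra spread from rounding to the nearest 0.1 does not go away as the sample grows.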
Rather than plotting the probability density, which is what these histograms are doing, it is often smarter to plot the cumulative distribution. This is generally a good practice when comparing two distributions, for reasons discussed in reference 2. This is shown in figure 40. The green curve is the theoretical distribution, namely the integral of a Gaussian, which we recognize as a scaled and shifted error function, erf(...), as discussed in reference 2.
You can see that the raw data (shown in black) does a fairly good job of sticking to the theoretical distribution. The data that has been rounded to the nearest 0.01 (shown in blue) does a slightly worse job of sticking to the theoretical curve, and the data that has been rounded to the nearest 0.1 (shown in red) does a much, much worse job.
Now let’s see what this looks like if we use a larger sample, namely N=1000 points, as shown in figure 41. You can see that the raw data (shown in black) is smoother, and sticks to the theoretical curve more closely.
In the limit, by using ever-larger samples, we can make the black curve converge to the green curve as closely as desired. The convergence works like this: Each of the N raw data points in figure 37 can be considered a delta function with measure 1/N. When we integrate to get the cumulative distribution, as in figure 40 or figure 41, each data point results in a step, such that the black curve rises by an amount 1/N. If you look closely, you can see 100 such steps in figure 40. For arbitrarily large N, the steps become arbitrarily small.
In contrast, the rounded data will always be a series of stairsteps, due to the rounding, and the steps do not get smaller as we increase N. In this example, the red curve will never be much better than a two-step approximation to the error function, and the blue curve will never be much better than a 20-step approximation. The only way to get the rounded data to converge would be to use less and less rounding, i.e. more and more digits.
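The convergence (or lack of it) can be put in numbers by measuring the largest vertical gap between the empirical cumulative distribution and the theoretical one. The sketch below is one way to do that; the gap measure, the seed, and the function names are choices made for this illustration, not something taken from the figures.

```python
import math
import random

def gaussian_cdf(x, mu, sigma):
    """Integral of a Gaussian: a scaled and shifted error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def max_cdf_gap(points, mu, sigma):
    """Largest vertical distance between the empirical and theoretical CDF."""
    ordered = sorted(points)
    n = len(ordered)
    gap = 0.0
    for i, x in enumerate(ordered):
        theory = gaussian_cdf(x, mu, sigma)
        # The empirical CDF passes through i/n and (i+1)/n at this point.
        gap = max(gap, abs(theory - i / n), abs(theory - (i + 1) / n))
    return gap

def gaps(n, seed=0, mu=1.17, sigma=0.05):
    """Gap for the raw sample and for the sample rounded to 0.01 and 0.1."""
    rng = random.Random(seed)
    raw = [rng.gauss(mu, sigma) for _ in range(n)]
    result = {"raw": max_cdf_gap(raw, mu, sigma)}
    for step in (0.01, 0.1):
        rounded = [round(x / step) * step for x in raw]
        result[step] = max_cdf_gap(rounded, mu, sigma)
    return result

for n in (100, 1000, 10000):
    print(n, gaps(n))
```

As N grows, the gap for the raw data keeps shrinking, while the gaps for the rounded data level off at values set by the size of the stairsteps.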
If we think in terms of relative error, aka percentage error, we see that roundoff does not affect all numbers the same way. Figure 42 shows the percentage error introduced by rounding X to one significant digit, plotted as a function of X. The function is periodic; each decade looks the same.
For numbers near 150, the roundoff error is 33%. For numbers near 950, the roundoff error is barely more than 5%.
The situation does not improve when the number of digits gets larger, as you can see from figure 43. For numbers near 105, the roundoff error is 5%. Meanwhile, for numbers near 905, the roundoff error is an order of magnitude less.
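The percentages quoted above can be reproduced with a few lines of code. This is a minimal sketch; the helper names are made up, and Python's built-in `round` resolves exact ties to the even neighbor, which does not change the size of the error at the tie points used here.

```python
import math

def round_sig(x, digits):
    """Round x to the given number of significant digits."""
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))
    factor = 10.0 ** (exponent - digits + 1)
    return round(x / factor) * factor

def relative_roundoff(x, digits):
    """Roundoff error as a fraction of x."""
    return abs(round_sig(x, digits) - x) / abs(x)

for x, digits in [(150, 1), (950, 1), (105, 2), (905, 2)]:
    print(f"x={x}, {digits} digit(s): {relative_roundoff(x, digits):.1%}")
```

Scanning x over a full decade with `relative_roundoff` traces out one period of the sawtooth pattern described for figure 42 and figure 43.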
When some quantity has been observed repeatedly and the ensemble of observations has an uncertainty of 1%, there is an all-too-common tendency for people to say the measurement is “good to two significant figures”. This is a very sloppy figure of speech, and should be avoided.
As always, the rule should be: Say what you mean, and mean what you say.
As a rule, whenever you are tempted to say anything in terms of significant digits, you should resist the temptation. There is almost certainly a better way of saying it.
Note the following contrast:
Sometimes roundoff error looks somewhat random. If we start with a bunch of random numbers and round them off, the roundoff errors will exhibit some degree of randomness.  Sometimes roundoff error is completely nonrandom. If we start with 1.23 and round it off to one decimal place, we get 1.2 every time. 
In some cases, the roundoff errors will be uniformly distributed.  In some cases, even if the roundoff errors are somewhat random, the distribution will be highly nonuniform. 
As a slight digression, let us look at some random data (figure 44). We shall see that it does not look anything like roundoff errors (figure 42 or figure 43).
Suppose we conduct an experiment that can be modeled by the following process: For a given value of λ, we construct a Poisson random process with expectation value λ. We then draw a random number from this process. We calculate the residual by subtracting off the expected value. We then express the residual in relative terms, i.e. as a percentage of the expected value. All in all, the normalized residual is:
normalized residual = (x − λ) / λ, where x is the number drawn      (54)
For selected values of λ we collect ten of these normalized residuals, and plot them as a function of λ, as shown in figure 44. The magenta curves in the figure represent ±σ, where σ is the standard deviation of the normalized residuals.
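The procedure just described can be simulated directly. The sketch below draws ten normalized residuals, (draw − λ)/λ, at a few values of λ. The sampler uses Knuth's multiplication method, which is only suitable for modest λ; that choice, the particular λ values, and the function names are assumptions made for this illustration. For a Poisson process the standard deviation of the normalized residual is 1/√λ.

```python
import math
import random

def poisson_draw(lam, rng):
    """One Poisson variate by Knuth's multiplication method (modest lam only)."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def normalized_residuals(lam, how_many, rng):
    """Residuals (draw - lam) / lam for repeated draws at a fixed lam."""
    return [(poisson_draw(lam, rng) - lam) / lam for _ in range(how_many)]

rng = random.Random(0)
for lam in (1, 10, 100):
    residuals = normalized_residuals(lam, 10, rng)
    sigma = 1.0 / math.sqrt(lam)  # standard deviation of the normalized residual
    print(f"lambda={lam}: +-sigma = {sigma:.0%}, residuals =",
          ", ".join(f"{r:+.0%}" for r in residuals))
```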
Our purpose here is to compare and contrast two ideas:
In both cases, the ordinate in the figure is the percentage “discrepancy”. The style of representation is the same, to facilitate comparing the two ideas.
Now, when we make the comparison, we find some glaring dissimilarities.
In contrast, the random data plotted in figure 44 is not a function. There are ten different residuals (the ordinate) for each value of λ (the abscissa).
Here is a good estimate for the mass of the earth, as discussed in section 8.3:
M_{⊕} = (5.9725801308 ± 0.00071) × 10^{24} kg      (55)
Looking at this value, you might be tempted to think that the nominal value has several insignificant digits, five digits more than seem necessary, and six or seven digits more than are allowed by sig figs doctrine. However, it would be tremendously destructive to round off this number. We need all those digits for some purposes, including forming products such as GM_{⊕} and ratios such as M_{⊕}/M_{⊙}, as discussed in section 8.
The fundamental problem is that the uncertainty indicated in equation 55 only tells us about the variance, and doesn’t tell us about the covariance between M_{⊕} and other things we are interested in.
Indeed, the whole idea of associating a single uncertainty with each variable is Dead on Arrival, because when there are N variables, we need on the order of N^{2} covariances to describe what is going on.
Using decent terminology, as in equation 55, we are allowed to write down enough digits. We are allowed to keep the roundoff error small enough, even to the point where it is several orders of magnitude smaller than the standard deviation.  The sig figs rules would require us to round things off until the roundoff error was comparable to the standard deviation. If we went on to calculate GM_{⊕} or M_{⊕}/M_{⊙}, the accuracy would be reduced by several orders of magnitude. 
Indeed, decent terminology allows us to take a multi-step approach, which is usually preferable: First, write down M_{⊕} = 5.9725801308 ×10^{24} kg, with no indication of uncertainty. Similarly, write down all the other quantities of interest, with no indication of uncertainty. In a later step, write down the full covariance matrix, all in one place.
It is permissible to write something like M_{⊕} = (5.9725801308 ± 0.00071)×10^{24} kg, but indicating the uncertainty in this way is possibly misleading, and at best redundant, because you are going to need to write down the covariance matrix eventually. The variances are the diagonal elements of the covariance matrix, and this is usually the best way to present them.
In the exceptional case where all the variables are uncorrelated, the covariance matrix is diagonal, and we can get away with using simple notions of “the” uncertainty “associated” with a particular variable.
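Here is a toy illustration of why the variances alone are not enough. It works in units where the nominal values of G and M are both 1, and it assumes, purely for illustration, that the product GM is known to a couple of parts per billion while G separately is known only to about one part in 10^{4}, so that M inherits its uncertainty from G. Both numbers and all the names are placeholders, not data.

```python
import math
import random
import statistics

REL_SD_G = 1.2e-4    # illustrative relative uncertainty of G (assumed)
REL_SD_GM = 2e-9     # illustrative relative uncertainty of the product GM (assumed)

def simulate(n=2000, seed=0):
    """Draw correlated (G, M) pairs in units where both nominal values are 1."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        g = rng.gauss(1.0, REL_SD_G)
        gm = rng.gauss(1.0, REL_SD_GM)
        pairs.append((g, gm / g))      # M inherits its uncertainty from G
    return pairs

pairs = simulate()
sd_g = statistics.stdev([g for g, m in pairs])
sd_m = statistics.stdev([m for g, m in pairs])
sd_product = statistics.stdev([g * m for g, m in pairs])
naive = math.hypot(sd_g, sd_m)         # what you get if you ignore the covariance

print(f"sd of G = {sd_g:.2e}, sd of M = {sd_m:.2e}")
print(f"sd of G*M, actual = {sd_product:.2e}")
print(f"sd of G*M, ignoring covariance = {naive:.2e}")
```

The product is recovered at full accuracy only because every digit of each sampled M was kept; rounding M to match its own standard deviation would throw that information away.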
See section 8.
One of the rare situations where rounding off might arguably be helpful concerns eyeball comparison of numbers. In particular, suppose we have the numbers
 (56) 
and we are sure that a half-percent variation in these numbers will never be significant. From that we conclude that on the first line there is no significant difference between a and b, while on the second line there is. Superficially, it seems “easier” to compare rounded-off numbers, since rounding makes the similarities and differences more immediately apparent to the eye:
 (57) 
However, rounding is definitely not the best way to facilitate comparisons. Rounding can get you into trouble. For example, if 3.4997 gets rounded down to 3 and 3.5002 gets rounded up to 4, you can easily get a severely false mismatch. On the other side of the same coin, if 3.5000 gets rounded up to 4, and 4.4997 gets rounded down to 4, you get a false match. Once again, we find that aggressive rounding produces wrong answers. Note that the sig-figs rules require aggressive rounding.
It is far more sensible to subtract the numbers at full precision, tabulate the results (as in equation 58), and then see whether the magnitude of the difference is smaller than some appropriate amount of “fuzz”.
 (58) 
If you are doing things by computer, computing the deltas is no harder than computing the rounded-off versions, and you should always write programs to display the deltas without rounding. (Here “delta” is shorthand for the difference b−a.) While you are at it, you might as well have the computer display a flag whenever the delta exceeds some configurable threshold.
Compared to equation 56 or even equation 57, the advantage goes to equation 58. It makes it incomparably less likely that important details will be overlooked.
Even if you are doing things by hand, you should consider calculating the deltas, especially if the numbers are going to be looked at more times than they are calculated. It is both easier and less error-prone to look for large-percentage variations in the deltas than to look for small-percentage variations in the original values.
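A program that displays the deltas and raises a flag takes only a few lines. The sketch below uses the two pairs of numbers from the rounding discussion above and a half-percent relative threshold; the function name and the choice to scale the fuzz by the larger magnitude are assumptions made for this illustration.

```python
def compare(a, b, rel_fuzz=0.005):
    """Return (delta, flagged), where flagged means |b - a| exceeds the fuzz.

    rel_fuzz is the tolerated variation as a fraction of the larger
    magnitude; 0.005 matches the half-percent figure used in the text.
    """
    delta = b - a
    return delta, abs(delta) > rel_fuzz * max(abs(a), abs(b))

for a, b in [(3.4997, 3.5002), (3.5000, 4.4997)]:
    delta, flagged = compare(a, b)
    rounded_match = round(a) == round(b)
    print(f"a={a}  b={b}  delta={delta:+.4f}  flagged={flagged}  "
          f"rounded values match={rounded_match}")
```

In the first pair the rounded values disagree even though the delta is negligible; in the second pair the rounded values agree even though the delta is large. The flag gets both cases right.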
The need for guard digits is intimately connected to the fact that uncertainty is not the same as insignificance. See section 6.11, section 16.4.4, section 11, and especially figure 50 in section 13.
One category of examples concerns uncertain digits whose significance can be revealed by signal averaging, as discussed in section 6.11.
Another category of examples concerns correlated uncertainties, in which case guard digits are particularly important. A familiar subcategory comprises situations where there is a small difference between large numbers.
As an example in this category, suppose we have a meter stick lying on the ground somewhere at NIST, in Gaithersburg, oriented north/south. We wish to record this in a Geospatial Information System (GIS). Let point A and point B represent the two ends of the stick. We record these in the database in the form of latitude and longitude, as follows:
 (59) 
The uncertainty of ± 0.002 represents the fact that the location of the stick is known only approximately, with an uncertainty of a couple hundred meters.
You may be wondering why we represent these numbers using nine decimal places, when the sig-figs doctrine says we should use only three. The answer is that the difference between these two vectors is known quite accurately. The difference A−B is 0.0000090075(90) degrees of latitude, i.e. one meter, with an uncertainty of ± 1 millimeter or less.
We emphasize that the absolute uncertainty in A−B is on the order of a millimeter or less, whereas the uncertainty in A or B separately is several orders of magnitude greater, on the order of hundreds of meters.
Remember: As mentioned in section 2.1, section 5.3, section 6.3, and section 16.1, roundoff error is only one contribution to the overall uncertainty. The uncertainty in A or B separately is on the order of 0.002, but that does not tell you how much precision is needed. The sig figs approach gets the precision wrong by a factor of a million. Situations like this come up all the time in the real world, including GIS applications and innumerable other applications.
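The effect of rounding the stored coordinates is easy to demonstrate. In the sketch below the latitude of end B is a hypothetical placeholder, not the value in the database; the stick length in degrees is the figure quoted above, and the meters-per-degree factor is derived from that same figure.

```python
STICK_DEG = 0.0000090075   # one meter of latitude in degrees, from the text
lat_b = 39.133000000       # hypothetical latitude of end B, in degrees
lat_a = lat_b + STICK_DEG  # end A is one stick-length north of B

def stick_length_m(a, b, places):
    """Length in meters after rounding both latitudes to `places` decimals."""
    meters_per_degree = 1.0 / STICK_DEG
    return (round(a, places) - round(b, places)) * meters_per_degree

for places in (3, 6, 9):
    length = stick_length_m(lat_a, lat_b, places)
    print(f"{places} decimal places: length = {length:.4f} m")
```

With three decimal places the two ends of the stick become the same point, so the length is lost entirely, even though three places would be plenty to express the ± 0.002 uncertainty of either end by itself.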
I often get questions from people who are afraid there will be an outbreak of too many insignificant digits. A typical question is:
“What if a student divides distance by time and reports the result as 0.285714286 m/s? Isn’t that just wrong? In the absence of other information, it implies an uncertainty of 0.0000000005 m/s, which is a gross underestimate, isn’t it?”
My reply is always the same: No, those “extra” digits are not wrong, and they do not imply anything about the uncertainty.
Yes, I see nine digits, but no, that doesn’t tell me the uncertainty. The uncertainty might be much greater than one part in 10^{9}, or it might be much less. If the situation called for stating the uncertainty, I might fault the student for not doing so. However, there are plenty of cases where the uncertainty does not need to be expressed, and may not even be knowable, in which case the only smart thing to do is to write down plenty of guard digits.
Suppose we later discover the uncertainty was 10%. Then I interpret 0.285714286 as having eight guard digits. Is that a problem? I wish all my problems were as trivial as that.
If you think excess digits are a crime, we should make the punishment fit the crime. Let’s do the math:
My time is valuable. The amount of my time wasted by people who are worried about the “threat” of excess digits greatly exceeds the amount of my time wasted reading excess digits.
My advice: Breathe in. Breathe out. Relax already. Excess digits aren’t going to hurt you. They might even help you.

In an introductory course, the most sensible approach is to adopt the following rules:
This is much simpler than dealing with sig figs. It is also more honest. Reporting no information about the uncertainty is preferable to reporting wrong information about the uncertainty (which is what you get with sig figs).
If the students are “mathematically challenged” and even “reading challenged”, it is a safe bet that they are not doing multi-digit calculations longhand. And they probably aren’t using slide rules either. So let’s assume they are using calculators. Therefore the burden of keeping intermediate results to 6-digit precision or better (indeed much better) is negligible. It has the advantage of getting them in the habit of keeping plenty of guard digits.
Yes, some of those digits will be insignificant. So what? Extra digits will not actually kill anybody.
At some point in the course, we want the students to develop “some” feeling for uncertainty. So let’s do that. We can do it easily and correctly, using the crank-three-times method as described in section 6.13. (Apply it to selected problems now and then, not every problem.) It requires less sophistication, requires less effort, and produces better results – compared to anything involving sig figs.
Using sig figs is like trying to eat a bowlful of clear soup using a fork. It’s silly, especially since spoons are readily available. Even if somebody has a phobia about spoons, the fork is still silly; they’d be better off throwing it away and using no utensil at all.
In an introductory course, some students (especially the more thoughtful students) will be appalled by the crudity and unreliability of the sig figs doctrine, and will appreciate the value of guard digits.
On the other hand, there will also be some students (especially the more insecure students) for whom various psychological issues make it hard to appreciate the necessity for guard digits. These issues include the following:
This rule of barnyard ethology applies to some spheres of human activity, including lawyering, politics, and military combat. Never admit weakness, and never admit uncertainty.
However ... students need to realize that science is not like lawyering, or politics, or combat. Scientists do admit uncertainty. The surest way to be recognized as a nonscientist is to pretend to be certain when you’re not.
It may seem ironic or even paradoxical, but it is true: One of the most basic steps toward reducing uncertainty is to admit that there is some uncertainty, and to account for it. For example, it would always be wrong to say that the true voltage is 1.23 volts, whereas we might be quite confident that the true voltage is in the range between 1.22 and 1.24 volts. For more on this, see reference 22.
Being able to admit uncertainty requires some emotional maturity, some emotional security, some grownupness. This is an important part of why students go to school, to learn such things.
This is spectacularly unscientific. By rounding off the number to the point where it is not fluctuating, they have arranged to get the same number every time ... but it is wrong every time. It is wrong because of excessive roundoff error. Evidently they would rather be wrong with certainty than right with uncertainty.
They need to realize that when they write down raw observations, with or without guard digits, they are recording the indicated values, not the true values. The indicated value represents the range of true values, but it is not the same thing.
When describing a distribution, don’t worry about the fact that the description is nonunique. There are lots of ways of describing the same distribution. If it makes you feel better, first write down the width of the distribution, and then write down the nominal value. If the distribution has a halfwidth of ±7%, it doesn’t matter whether you express the nominal value as 51, or 51.13, or 51.1394744. The fact that the trailing digits are uncertain and nonunique doesn’t make these numbers wrong. They are all equivalent, for almost all practical purposes.
If you were to claim that any number such as 51, or 51.13, or 51.1394744 (with or without guard digits) represented an exact measurement, that would be wrong. So don’t pretend it’s exact. Say it has an uncertainty of ±7%. Once you’ve said that, you are free to write down as many guard digits as you like. (You need at least some uncertain digits, to guard against roundoff errors.)
The real world does not offer certainty. Students should not blame themselves for uncertainty, and should not blame the teacher. We live in an uncertain world. The goal is not to eliminate all uncertainty; the goal is to learn how to live in an uncertain world.
One of the crucial techniques for dealing with uncertainty is to represent things as distributions rather than as plain numbers.
The goal is not to avoid all mistakes. Everybody makes mistakes. Students are expected to make more mistakes than professionals, but even professionals make mistakes. The goal is to (a) minimize the cost of the mistakes, and (b) learn from the mistakes. For example, realworld engineers commonly build pilot plants and/or carry out pilot programs, so they can learn from mistakes relatively cheaply, before they commit to a multibilliondollar fullscale program. For more along this line, see section 7.12.
I have seen students go to great lengths to avoid having the slightest imperfection in their lab books. These students need to realize that real science involves approximation, including what we call successive refinement. That is, we first make a rough measurement, and then based on what we just learned, we make successively more refined measurements. If the first measurement were perfect, we wouldn’t need the later measurements. Learning is not a sin.
There are two issues: writing sig figs, and reading sig figs.
If you ever feel you need to write something using sig figs, you should lie down until the feeling goes away. Figure out what you are trying to say, and find a better way of saying it. If you are going to express the uncertainty at all, express it separately. See also section 7.9.
The rest of this section is devoted to reading sig figs. That is, suppose you are given a bunch of numbers and are required to interpret them as having significant digits.
If that’s all you have to go on, it is not necessary – and not possible – to take the situation seriously. If the authors had intended their uncertainties to be taken seriously, they would have encoded the data properly, not using significant digits.
Sometimes, though, you do have more information available.
One good strategy, if possible, is to simply ask the authors what they think the data means. If the data is from a book, there may be a statement somewhere in the book that says what rules the authors are playing by. Along similar lines, I have seen blueprints where explicit tolerance rules were stated in the legend of the blueprint: one example said that numbers with 1, 2, or 3 decimal places had a tolerance of ±0.001 inches, while numbers with 4 decimal places had a tolerance of ±0.0001 inches. That made sense.
Another possibility is to use your judgment as to how much uncertainty attaches to the given data. This judgment may be based on what you know about the source of the data. For instance, if you know that the data results from a counting process, you might decide that 1100 is an exact integer, even though the sig figs rules might tell you it had an uncertainty of ±50 or even ±500 or worse.
As a nexttolast resort, you can try the following procedure. We need to attribute some uncertainty to each of the given numbers. Since we don’t know which sect of the sigdigs cult to follow, we temporarily and hypothetically make the worstcase assumption, namely just shy of ten counts of uncertainty in the last place. For example, 1.23 becomes 1.23±0.099, on the theory that 1.23±0.10 would have been rounded to 1.2 according to the multicount sect. (The multicount sect is generally the worst case when you are decoding numbers that are already represented in sigfigs notation. Conversely, the halfcount sect is generally the worst case when you are encoding numbers into the sigfigs representation, because it involves the greatest amount of destructive rounding.)
Now turn the crank. Do the calculation, using plenty of guard digits on the intermediate results. Propagate the uncertainty using the methods suggested in section 6.
Now there are two possibilities:
At some point you might well decide that the given data is inadequate for the purpose. Go back to Square One and obtain some better data.
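The worst-case decoding step described above can be sketched in a few lines. This is only an illustration: the function name is made up here, "just shy of ten counts" is approximated as 9.9 counts in the last written place, and the ambiguity of trailing zeros in integers such as 1100 is not handled.

```python
from decimal import Decimal

def decode_sigfigs_worst_case(text):
    """Return (nominal, uncertainty) for a number written in sig-figs style.

    Hypothetical helper: applies the worst-case (multicount) reading,
    taken here as 9.9 counts in the last written decimal place.
    """
    value = Decimal(text)
    # Exponent of the last written digit, e.g. -2 for "1.23".
    last_place = Decimal(1).scaleb(value.as_tuple().exponent)
    uncertainty = Decimal("9.9") * last_place
    return value, uncertainty

nominal, unc = decode_sigfigs_worst_case("1.23")
print(f"{nominal} ± {unc}")   # matches the 1.23±0.099 example in the text
```

From there, the nominal value and the uncertainty are carried separately through the calculation, with plenty of guard digits.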
I categorically decline to suggest an explicit convention as to what sig figs “should” mean. There are three reasons for this: First of all, the sectarian differences are too huge; anything I could say would be wildly wrong, one way or the other, according to one sect or another. Secondly, as previously mentioned, what’s safest when writing sig figs is not what’s safest when reading and trying to interpret sig figs. Last but not least, sig figs “should” not be used at all; I don’t want to say anything that could be misinterpreted as endorsing their use.
Spreadsheets are great. You need to analyze the data one way or another, so you might as well do it on a spreadsheet. This gives you a big bonus: you can do some “whatif” analysis. You don’t need to do a fullblown Monte Carlo analysis as in section 6.15; instead just wiggle a few of your data points to see how that affects the final answer. The same goes for other quantities such as calibration factors: find out how much of a perturbation is needed to significantly affect the final answer.
If good-sized changes in a data point have negligible effect on the final answer, it means you can relax a bit; you don’t need to drive yourself crazy measuring that data point to extreme precision. Conversely, if you find that smallish changes in a single data point have a major effect on the answer, it tells you that you’d better measure each such data point as accurately as you can, and/or you’d better take a huge amount of data (so you can do some signal averaging, as discussed in section 6.11). You can also consider upgrading the apparatus, perhaps using more accurate instruments, and/or redesigning the whole experiment to give you better leverage.
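The same “wiggle it and see” idea works outside a spreadsheet. Here is a minimal sketch, assuming a made-up analysis (the acceleration of gravity from a pendulum's length and period) and made-up input values. The helper name `wiggle` is hypothetical.

```python
import math

def g_from_pendulum(length, period):
    """Hypothetical analysis step: g = 4 pi^2 L / T^2 for a simple pendulum."""
    return 4 * math.pi**2 * length / period**2

def wiggle(func, inputs, rel_step=0.01):
    """Perturb each input by rel_step in turn.

    Returns the resulting relative change in the final answer, per input.
    """
    baseline = func(**inputs)
    effects = {}
    for name, value in inputs.items():
        perturbed = dict(inputs, **{name: value * (1 + rel_step)})
        effects[name] = (func(**perturbed) - baseline) / baseline
    return effects

# Made-up data: length in meters, period in seconds.
effects = wiggle(g_from_pendulum, {"length": 0.993, "period": 2.000})
for name, change in effects.items():
    print(f"1% change in {name} changes the answer by {100 * change:+.2f}%")
```

In this toy case the answer is about twice as sensitive to the period as to the length, which suggests where extra measurement effort would pay off.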
There is a lesson here about procedures: It is a really bad idea to take all your data and then do all your analysis. Take some data and do some analysis, so you can see whether you’re on the right track and so you can do the sensitivity analysis we just discussed. Then take some more data and do some more analysis. This is called online analysis.
This is quite important. As mentioned in section 7.10, real-world engineers commonly build pilot plants and/or carry out pilot programs, so they can learn what the real issues are before they commit to full-scale production. Once the program is in operation, they do a lot of trend monitoring, so that if a problem starts to develop they learn about it sooner rather than later.
You should also find ways to make internal consistency checks. If there are good theoretical reasons why the data should follow a certain functional form, see if it does. Exploit any sum rules or other constraints you can find. Make sure there is enough data to overconstrain the intended interpretation. By that I mean do not rely on two points to determine a straight line; use at least three and preferably a lot more than that, so that there will be some internal error checks. Similarly, if you are measuring something that is supposed to be a square, measure all four sides and both diagonals if you can. Measure the angles also if you can.
There are few hardandfast rules in this business. It involves tradeoffs. It involves judgment. You have to ask: What is the cost of taking more data points? What is the cost of making them more accurate? What is the cost of a given amount of uncertainty in the final answer?
Additional good advice can be found in reference 23.
If you want to calculate the electron e/m ratio, correlations must be taken into account. This is discussed in section 6.6.
Consider the simplified ohmmeter circuit shown in figure 45.
In such a circuit, it would not be uncommon to find the following voltages:
 (60) 
The question arises, what is the differentialmode signal V_{A} − V_{B}? If you thought V_{A} and V_{B} were uncorrelated, you would calculate
 (61) 
However, in the real world, with a little bit of work you could probably arrange for V_{A} and V_{B} to be very highly correlated. It might turn out that
 (62) 
and with extra work you could do even better. There is no way to calculate the result in equation 62, not without a great deal of additional information, but that’s not the point. The point is that assuming the voltages are uncorrelated would be a very very bad assumption. The physics of the situation is that the stray timedependent magnetic flux φ· affects both V_{A} and V_{B} in the same way, to an excellent approximation. Communications equipment and measuring instruments depend on this. It’s not something that happens automatically; you make it happen by careful engineering.
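To see how much the correlation matters, here is a small sketch using the standard formula for the standard deviation of a difference of two correlated quantities, σ² = σ_A² + σ_B² − 2ρσ_Aσ_B. The numbers are made up for illustration. They are not the values in equation 60 or equation 62, and the correlation coefficient ρ is an assumed input, not something you can calculate from the voltages alone.

```python
import math

def sigma_of_difference(sigma_a, sigma_b, rho):
    """Standard deviation of (A - B) given the correlation coefficient rho."""
    variance = sigma_a**2 + sigma_b**2 - 2 * rho * sigma_a * sigma_b
    # Guard against a tiny negative value caused by floating-point roundoff.
    return math.sqrt(max(variance, 0.0))

sigma_a = sigma_b = 0.3   # hypothetical common-mode noise, in volts

uncorrelated = sigma_of_difference(sigma_a, sigma_b, rho=0.0)
correlated = sigma_of_difference(sigma_a, sigma_b, rho=0.9999)

print(f"assuming no correlation: ±{uncorrelated:.3f} V")
print(f"assuming rho = 0.9999:   ±{correlated:.5f} V")
```

With these made-up numbers, the uncorrelated assumption overstates the uncertainty of the differential signal by a factor of 100.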
Let’s do an example involving Newton’s constant of universal gravitation (G), the mass of the earth (M_{⊕}), and the product of the two (GM_{⊕}).
In order to speak clearly, we introduce the notation D(M_{⊕}) to represent a direct measurement of M_{⊕}. We use the unadorned symbol M_{⊕} to represent our best estimate of M_{⊕}. If necessary, we can use T(M_{⊕}) to represent the true, ideal, exact value, which will never be known by mortal man.
The last time I checked,
 (63) 
You could obtain an estimate of M_{⊕} from geology and seismology, but even that wouldn’t count as a “direct” measurement, and more importantly it wouldn’t be particularly helpful, since it would not be anywhere near as accurate as D(GM_{⊕})/D(G).
Here are the actual nominal values and absolute uncertainties, from reference 24 and reference 25:
 (64) 
Looking at the value for M_{⊕} in equation 64, you might be tempted to think that the nominal value has several insignificant digits, five digits more than seem necessary, and six or seven digits more than are allowed by sig figs doctrine. However, it would be a Bad Idea to round off this number. Note the contrast:
Suppose you keep all the digits in equation 64. If you multiply M_{⊕} by G, you get a good value for the product GM_{⊕}, accurate to 2 ppb.

Suppose you round off the nominal value for M_{⊕}. If you then multiply by G, you get a much less accurate value for GM_{⊕}, accurate to no better than 100 ppm.
The fundamental issue here is the fact that M_{⊕} is highly correlated with G. They are correlated in such a way that when you multiply them, the uncertainty of the product is vastly less than the uncertainty in either one separately.
Yes, the distributions governing G and M_{⊕} have considerable uncertainty.

No, you should not round off those quantities to the point where roundoff error becomes comparable to the uncertainty; that would be ludicrously destructive.
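The damage done by rounding is easy to demonstrate numerically. The sketch below uses values supplied here as assumptions: the CODATA 2018 value of G and the commonly quoted geocentric gravitational constant. They are not necessarily the values in equation 64. The helper `round_sig` is hypothetical.

```python
def round_sig(x, digits):
    """Round x to the given number of significant digits."""
    return float(f"{x:.{digits - 1}e}")

G = 6.67430e-11       # m^3 kg^-1 s^-2 (assumed value, CODATA 2018)
GM = 3.986004418e14   # m^3 s^-2 (assumed value, known to about 2 ppb)

M_full = GM / G                    # keep every digit the machine offers
M_rounded = round_sig(M_full, 4)   # what sig-figs doctrine would encourage

err_full = abs(G * M_full - GM) / GM
err_rounded = abs(G * M_rounded - GM) / GM

print(f"M (all digits)   = {M_full:.10e} kg, product off by {err_full:.1e}")
print(f"M (four digits)  = {M_rounded:.3e} kg, product off by {err_rounded:.1e}")
```

With all digits kept, the product is recovered to within floating-point roundoff. With M rounded to four digits, the roundoff error in the product is tens of parts per million, thousands of times larger than the 2 ppb uncertainty of GM itself.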
To better understand this situation, it may help to look at the diagram shown in figure 46. Recall from section 4.2 that fundamentally, an “uncertain quantity” such as G or M_{⊕} is really a probability distribution. Also recall that as a general principle, you can always visualize a probability distribution in terms of a scatter plot. In this case, it pays to plot both variables jointly, as a two-dimensional scatter plot. In figure 46, G is plotted horizontally and its standard deviation is shown by the magenta bar. Similarly M_{⊕} is plotted vertically and its standard deviation is shown by the blue bar. The standard deviation of the product GM_{⊕} is represented – loosely – by the yellow bar.
In this figure, the amount of correlation has been greatly deemphasized for clarity. The uncertainty of the product is portrayed as only six times less than the uncertainty of the raw variables. (This is in contrast to the real physics of mass and gravitation, where the uncertainty of the product is millions of times less than the uncertainty of the raw variables.)
If the probability distribution is a twodimensional Gaussian, the contours of constant probability are ellipses when we plot the probability as in figure 46. If the variables are highly correlated, the ellipses are highly elongated, and the principal axes of the ellipse are nowhere near aligned with the axes of the plot. (Conversely, in the special case of uncorrelated variables, the axes of the ellipse are aligned with the axes of the plot, and the ellipse may or may not be highly elongated.)
This example serves to reinforce the rule that you should not round off unless you are sure it’s safe. It’s not always easy to figure out what’s safe and what’s not. When in doubt, keep plenty of guard digits.
To make progress, we need to construct the covariance matrix. It is defined as:
 (65) 
where angle brackets ⟨⋯⟩ indicate the ensemble average, and the overbar ⋯ indicates the same thing; we use two different notations to improve legibility. To say the same thing another way, we can define the vector of residuals in terms of its components:
 (66) 
Then to form the covariance matrix, we take the outer product Δx(i) Δx(i)^{T}, and then take the ensemble average. The superscript T indicates transpose, which in this case converts a column vector to a row vector.
The generalization to more than two variables is straightforward. The covariance matrix is guaranteed to be symmetric.
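The recipe just described (residuals, outer product, average) can be sketched as follows. This is an illustration under an assumption: the ensemble average is approximated by a plain average over a finite list of made-up samples, dividing by the number of samples.

```python
def covariance_matrix(samples):
    """Average of the outer products of the residual vectors."""
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(s[k] for s in samples) / n for k in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for s in samples:
        resid = [s[k] - mean[k] for k in range(dim)]   # residual vector
        for j in range(dim):
            for k in range(dim):
                cov[j][k] += resid[j] * resid[k] / n   # outer product, averaged
    return cov

# Made-up, perfectly anticorrelated data.
samples = [(1.0, 2.0), (2.0, 1.0), (3.0, 0.0)]
cov = covariance_matrix(samples)
for row in cov:
    print(row)
```

The off-diagonal elements come out equal to each other, as they must, since the outer product of a vector with itself is symmetric.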
We can simplify things by taking logarithms. Rather than multiplying G by M_{⊕} we can add ln(G) to ln(M_{⊕}). The new variables are:
 (67) 
Also, rather than writing G = A ± B where B is the absolute uncertainty, we write G = A(1 ± B/A) where B/A is the relative uncertainty. We will make use of the first-order Taylor expansion, ln(1+є) ≈ є when є is small.
 (68) 
It makes sense to write x_{1} and x_{2} in the form of a nominal value plus an uncertainty, because we think these two quantities are uncorrelated. They are measured by completely dissimilar methods; G is measured using a Cavendish balance or something like that, while GM is measured using clocks and radar to observe the motion of satellites.
That means the covariance matrix for x_{1} and x_{2} is:
 (69) 
Now suppose we wish to change variables. Mass is, after all, directly relevant to physics. Mass is one of the SI base quantities. Meanwhile G is a fundamental universal constant. So let’s choose G and M as our variables, or equivalently x_{1} and x_{3}.

In the numerical matrix equation 70b, the lowerright matrix element differs slightly from the others. It differs in the tenth decimal place.
In equation 70c, we have very unwisely rounded things off to two decimal places, which is not enough. Even eight decimal places would not have been enough. Rounding causes the matrix to be singular. Since we plan on inverting the matrix, this is a Bad Thing.
In fact, even equation 70b is nearly useless, for multiple reasons. Part of the problem is that the matrix elements are rounded to machine precision (IEEE double precision), which isn’t really good enough for this application. That is, you can’t multiply the numerical matrix by vectors, you can’t invert it, and you can’t find its eigenvectors or eigenvalues. Anything you try to do runs afoul of small differences between large numbers. Secondly, even if we could trust the numbers, it is not humanly possible to look at the numbers and figure out what they mean.
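The following sketch reproduces the flavor of this problem using exact rational arithmetic. It rests on assumptions: x_{3} = x_{2} − x_{1} (that is, ln M = ln GM − ln G), and the relative uncertainties are made-up round numbers of roughly the right size (100 ppm for G, 2 ppb for GM), not the values used in equation 70. Rounding to two significant digits stands in for the rounding in equation 70c.

```python
from fractions import Fraction

s1 = Fraction(1, 10**4)   # hypothetical relative uncertainty of G
s2 = Fraction(2, 10**9)   # hypothetical relative uncertainty of GM

# Covariance of the uncorrelated variables (x1, x2).
cov_12 = [[s1**2, 0], [0, s2**2]]

# Change of variables: (x1, x3) = J (x1, x2), with x3 = x2 - x1.
J = [[1, 0], [-1, 1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

cov_13 = matmul(matmul(J, cov_12), transpose(J))
det_exact = det(cov_13)

# Round every element to two significant digits, as floats.
rounded = [[float(f"{float(e):.1e}") for e in row] for row in cov_13]
det_rounded = det(rounded)

print("exact determinant:  ", float(det_exact))
print("rounded determinant:", det_rounded)
```

In exact arithmetic the determinant is tiny but nonzero, so the matrix can be inverted. After rounding, all four elements have the same magnitude, the determinant is exactly zero, and the matrix is singular.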
As a general rule, if you want to extract meaning from a matrix, you will be much better off if you reexpress it using SVD i.e. singular value decomposition. In our case, we are in luck, because the matrix is real and symmetric, hence Hermitian, so we can use EVD i.e. eigenvalue decomposition, which (compared to SVD) is easier to compute and at least as easy to understand.
Let’s take one preliminary step, to put our matrix into a form that is not so numerically ill-conditioned. We start by rotating the matrix 45 degrees:

We can do things with this matrix, without being plagued by small differences between large numbers. We still have work to do, because the 45 degree rotation did not exactly diagonalize the matrix.
In general, the power method is a good way to find the eigenvector associated with the largest eigenvalue. The power method applied to the inverse matrix will find the eigenvector associated with the largest eigenvalue of that matrix, which is of course the smallest eigenvalue of the noninverted matrix. Also remember that if you have found N−1 of the eigenvectors, you can construct the last one using the fact that it is orthogonal to all the others.
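For readers who have not seen it, here is a bare-bones sketch of the power method, applied to a small made-up symmetric matrix rather than to the matrix in equation 71c. It assumes the starting vector is not orthogonal to the dominant eigenvector. Convergence is slow when the two largest eigenvalues are close in magnitude.

```python
import math

def power_method(matrix, iterations=200):
    """Estimate the largest eigenvalue and its eigenvector by repeated multiplication."""
    n = len(matrix)
    v = [1.0] + [0.5] * (n - 1)   # arbitrary starting vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]   # renormalize at every step
    # The Rayleigh quotient v.Mv gives the eigenvalue (v has unit length).
    mv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * mv[i] for i in range(n))
    return eigenvalue, v

lam, vec = power_method([[2.0, 1.0], [1.0, 2.0]])
print(lam, vec)
```

For this test matrix the eigenvalues are 3 and 1, and the dominant eigenvector points along (1, 1).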
In our example, the eigenvectors of the matrix in equation 71c are:
 (72) 
These vectors are orthonormal. They may not look normalized, but they are, as closely as possible within the IEEE double precision representation, which is close enough for present purposes.
We can arrange these sidebyside to define a unitary matrix
 (73) 
This can be thought of as a rotation matrix, with a rather small rotation angle. We use it to rotate the covariance matrix a little bit more. We also make use of the fact that rotation matrices are unitary, which means R(−θ) = R^{T}(θ) = R^{−1}(θ).
 (74) 
which is diagonal. The matrix elements are the eigenvalues of the covariance matrix.
To say the same thing the other way, we can write:

where A is a diagonal matrix of eigenvalues, and V is the matrix of eigenvectors of the original covariance matrix. Equation 75b is the standard way of writing the singular value decomposition, and in this case also the eigenvalue decomposition.
In the SVD representation, it is exceedingly easy to find the inverse covariance matrix:
 (76) 
where V is the same as in equation 75c, and we can invert the diagonal elements of A one by one:
 (77) 
The fact that we could so easily invert the covariance matrix gives you some idea of the power of SVD.
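As a sketch of the idea, the code below inverts a real symmetric 2×2 matrix by way of its eigenvalue decomposition: each eigenvector contributes its outer product, divided by its eigenvalue. The function names are made up, and the test matrix is a well-conditioned toy example. The closed-form eigenvalue expression used here subtracts nearly equal numbers when the matrix is as ill-conditioned as the one in equation 70, so it is not a substitute for the more careful procedure described above.

```python
import math

def eig_sym_2x2(a, b, c):
    """Eigenvalues and unit eigenvectors of the symmetric matrix [[a, b], [b, c]]."""
    theta = 0.5 * math.atan2(2 * b, a - c)   # rotation angle that diagonalizes it
    half_gap = math.hypot((a - c) / 2, b)
    mid = (a + c) / 2
    eigenvalues = [mid + half_gap, mid - half_gap]
    eigenvectors = [(math.cos(theta), math.sin(theta)),
                    (-math.sin(theta), math.cos(theta))]
    return eigenvalues, eigenvectors

def inverse_from_evd(a, b, c):
    """Inverse built as the sum of v v^T / lambda over the eigenpairs."""
    eigenvalues, eigenvectors = eig_sym_2x2(a, b, c)
    inv = [[0.0, 0.0], [0.0, 0.0]]
    for lam, v in zip(eigenvalues, eigenvectors):
        for i in range(2):
            for j in range(2):
                inv[i][j] += v[i] * v[j] / lam   # invert the eigenvalue only
    return inv

inv = inverse_from_evd(2.0, 1.0, 2.0)
for row in inv:
    print(row)
```

The only numbers that ever get inverted are the eigenvalues, one at a time. The eigenvectors are reused unchanged.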
In general, the inverse covariance matrix is quite useful. For instance, this is what you use for weighting the data when doing a leastsquares fit. Specifically: In terms of the residuals as defined by equation 66, the unweighted sumofsquares is given by the dot product Δx(i)^{T} Δx(i), whereas the properly weighted sum is:
 Δx(i)^{T} Σ^{−1} Δx(i)     (78) 
which is known as the Mahalanobis distance.
It pays to look at the eigenvalues of the covariance matrix and/or the inverse covariance matrix. If all the eigenvalues are comparable in magnitude, it means the correlations are not particularly significant. Conversely, if some eigenvalues are very much smaller or larger than others, it means that the correlations are very significant. You can visualize this in terms of a highly elongated error ellipsoid, as illustrated in figure 46.
In the example we are considering, one of the eigenvalues is ten orders of magnitude larger than the other. This helps us to understand why the matrix in equation 70 is so illconditioned. If we wrote out the inverse covariance matrix explicitly (without SVD) it would be equally illconditioned.
It also pays to look at the eigenvectors.
We refer to an eigenvector of the inverse covariance matrix Σ^{−1} as being “cheap” or “expensive” according to whether the associated eigenvalue is small or large.

The same vectors are eigenvectors of the plain old covariance matrix Σ, in which case the cheap eigenvectors have a large eigenvalue (long error bars) and the expensive eigenvectors have a small eigenvalue (short error bars).
The idea is that in figure 46, if you move away from the center in an expensive direction (in the direction of the yellow line), the Mahalanobis distance goes up rapidly, whereas if you move in a cheap direction (perpendicular to the yellow line), the Mahalanobis distance goes up only slowly.
This tells us something about the physics. If you just look at the variance, it tells you that in some sense G is not well determined, but that does not mean you can cheaply vary the value of G all by itself. If you don’t want a big penalty, you have to vary G and vary M_{⊕} at the same time, in opposite directions, so as to move along a contour of constant GM_{⊕}.
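The cheap-versus-expensive distinction can be made concrete with a toy calculation. The covariance matrix below is made up, with the correlation greatly deemphasized in the spirit of figure 46. The two variables are taken to be logarithms, so that moving them in opposite directions keeps the product constant. The quantity computed is the weighted sum Δx^{T} Σ^{−1} Δx.

```python
def mahalanobis_sq(dx, cov):
    """Weighted sum dx^T cov^-1 dx for a 2x2 symmetric covariance matrix."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    inv = [[c / det, -b / det], [-b / det, a / det]]
    return sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))

# Hypothetical covariance of (ln G, ln M): strongly anticorrelated.
cov = [[1.0, -0.97], [-0.97, 1.0]]

s = 2 ** -0.5   # so that each step below has unit length
cheap = mahalanobis_sq((s, -s), cov)        # opposite directions: product unchanged
g_alone = mahalanobis_sq((1.0, 0.0), cov)   # vary G all by itself
expensive = mahalanobis_sq((s, s), cov)     # same direction: product changes

print(f"cheap direction:     {cheap:.3f}")
print(f"G alone:             {g_alone:.3f}")
print(f"expensive direction: {expensive:.3f}")
```

Even with this mild correlation, a unit step in G alone costs more than thirty times as much as a unit step along the cheap direction.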
The example presented in section 8.3 was simplified for pedagogical reasons. In realworld situations, there are usually many more variables to worry about. For example:

The uncertainties indicated in equation 79e, equation 79f, and equation 79g take into account only the associated variance, without regard to any of the covariances. The trailing digits in the nominal values are necessary for some purposes, including forming products such as GM_{⊕} and ratios such as M_{⊕}/M_{⊙}.
If we choose G and the three masses as our variables, the covariance will be a 4×4 matrix, with lots of nontrivial correlations.
In classroom settings, people often get the idea that the goal is to report an uncertainty that reflects the difference between the measured value and the “correct” value. That idea certainly doesn’t work in real life – if you knew the “correct” value you wouldn’t need to make measurements.
In all cases – in the classroom and in real life – you need to determine the uncertainty of your measurement by scrutinizing your measurement procedures and your analysis.
Given two quantities, you can judge how well they agree.
For example, we say the quantities 10±2 and 11±2 agree reasonably well. That is because there is considerable overlap between the probability distributions. It is moreorless equivalent to say that the two distributions are reasonably consistent. As a counterexample, 10±.2 does not agree with 11±.2, because there is virtually no overlap between the distributions.
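One common way to put a number on “how well they agree” is to express the difference in units of its own standard deviation. The sketch below does that under assumptions not stated in the text: the two distributions are taken to be Gaussian and independent, so the uncertainties add in quadrature.

```python
import math

def discrepancy_in_sigmas(a, sigma_a, b, sigma_b):
    """Difference between two values, in units of the combined standard deviation."""
    return abs(a - b) / math.hypot(sigma_a, sigma_b)

loose = discrepancy_in_sigmas(10, 2, 11, 2)       # the 10±2 versus 11±2 case
tight = discrepancy_in_sigmas(10, 0.2, 11, 0.2)   # the 10±.2 versus 11±.2 case

print(f"10±2   vs 11±2  : {loose:.2f} sigma apart")
print(f"10±0.2 vs 11±0.2: {tight:.2f} sigma apart")
```

A discrepancy well under one sigma means lots of overlap. A discrepancy of several sigma means very little.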
If your results disagree with well-established results, you should comment on this, but you must not fudge your data to improve the agreement. You must start by reporting your nominal value and your uncertainty independently of other people’s values. As an optional later step, you might also report a “unified” value resulting from combining your results with others, but this must be clearly labeled as such, and in no way relieves you of your responsibility to report your data “cleanly”. The reason for this is the same as before: There is always the possibility that your value is better than the “established” value. You can tell whether they agree or not, but you cannot really tell which (if either) of them is correct.
Of course, if a beginner measures the charge of the electron and gets an answer that is wildly inconsistent with the established value, it is overwhelmingly likely that the beginner has made a mistake as to the value and/or the uncertainty. Be that as it may, the honorable way to proceed is to report the data “as is”, without fudging it. Disagreement with established results might motivate you to go back and scrutinize the measurement process and the analysis, looking for errors. That is generally considered acceptable, and seems harmless, but actually it is somewhat risky, because it means that answers that agree with expectations will receive less scrutiny than answers that don’t.
The historical record contains bad examples as well as good examples. Sometimes people who could have made an important discovery talked themselves out of it by fudging their data to agree with expectations. However, on other occasions people have done the right thing.
As J.W.S. Rayleigh put it in reference 26:
One’s instinct at first is to try to get rid of a discrepancy, but I believe that experience shows such an endeavour to be a mistake. What one ought to do is to magnify a small discrepancy with a view to finding out the explanation....
When Rayleigh found a tiny discrepancy in his own data on the molar mass of nitrogen, he did not cover it up. He called attention to it, magnified it, and clarified it. The discrepancy was real, and led to the discovery of argon, for which he won the Nobel Prize in 1904.
Whenever possible, raw data should be taken “blind”, i.e. by someone who doesn’t know what the expected answer is, to eliminate the temptation to fudge the data. This is often relatively easy to arrange, for instance by applying a scale factor or baselineshift that is recorded in the lab book but not told to the observer.
Bottom line: Your data is your data. The other guy’s data is the other guy’s data. You should discuss whether your data agrees with the other guy’s data, but you should not fudge your data to improve the agreement.
You should not assume that all the world’s errors are due to imperfect measurements.
Consider the situation where we are measuring the properties of, say, a real spring. Not some fairytale ideal spring, but a real spring. It will exhibit some nonlinear forceversusextension relationship.
Now suppose that we do a really good job of measuring this relationship. The data is reproducible within some ultratiny uncertainty. For all practical purposes, the data is exact.
Next, suppose we want to model this data. Modeling is an important scientific activity. We can model the data using a straight line. We can also model it using an Nthorder polynomial. No matter what we do, there will always be some “error”. This is an error in the model, not in the observed data. It will lead to errors in whatever predictions we make with the model.
Proper error analysis will tell us bounds on the errors of the predictions.
Is this an example of “if it doesn’t work, it’s physics”? No! An inexact prediction is often tremendously valuable. An approximate prediction is a lot better than no prediction.
I mention this because far too many introlevel science books seem to describe a fairytale axiomatic world where the theorists are always right and the experimentalists are always wrong. Phooey!
It is very important to realize that error analysis is not limited to hunting for errors in the data. In the above example, the data is essentially exact. The spring is not “at fault” for not adhering to Hooke’s socalled law. Instead, the reality is that Hooke’s law is imperfect, in that it does not fully model the complexities of real springs.
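Here is a small sketch of the spring example. The force law is made up (a linear term plus a cubic term), and the “data” is generated from it with no noise at all, so every bit of the mismatch with the straight-line fit is model error.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def spring_force(x):
    """Hypothetical real spring: Hooke term plus a cubic stiffening term."""
    return 10.0 * x + 3.0 * x**3

xs = [i / 10 for i in range(11)]       # extensions from 0 to 1
ys = [spring_force(x) for x in xs]     # essentially exact "measurements"

slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
max_model_error = max(abs(r) for r in residuals)

print(f"best-fit spring constant: {slope:.3f}")
print(f"largest model error:      {max_model_error:.3f}")
```

The residuals are systematic, not random. They bound how far off a Hooke's-law prediction can be over this range of extensions.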
A huge part of realworld physics (and indeed a huge part of real life in general) depends on making approximations, which includes finding and using phenomenological relationships. The thing that sets the big leagues apart from the bush leagues is the ability to make controlled approximations.
When dealing with sets or clusters of measurements, we must deal with several different probability distributions at once, which requires a modicum of care. The conventional terminology in this area is a mess, so I will use some colorful but nonstandard terminology.
This gives us two equivalent ways of forming a cluster: We can draw a cluster directly from V, or we can draw N particles from U and then group them to form a cluster.
Therefore:
See also the definition(s) of sample mean and sample standard deviation in section 10.4.
Linearity guarantees that µ_{V} will always be equal to µ_{U}. In contrast, the definition of σ is nonlinear, and σ_{V} will be smaller than σ_{U} by a factor of √N, where N is the number of particles per cluster. And thereby hangs a tale: all too commonly people talk about “the” standard deviation, and sometimes it is hard to figure out whether they are talking about σ_{U} or σ_{V}.
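The factor of √N is easy to check by simulation. This sketch assumes a Gaussian underlying distribution U with unit width, and the cluster size and number of clusters are arbitrary choices.

```python
import random
import statistics

random.seed(12345)   # fixed seed so the run is repeatable

N = 25     # particles per cluster
M = 4000   # number of clusters

# Draw N*M particles from U, then group them into M clusters.
particles = [random.gauss(0.0, 1.0) for _ in range(N * M)]
clusters = [particles[i * N:(i + 1) * N] for i in range(M)]
y_values = [statistics.fmean(c) for c in clusters]   # one y-value per cluster

sigma_U_est = statistics.pstdev(particles)   # width of the particle distribution
sigma_V_est = statistics.pstdev(y_values)    # width of the distribution of y-values
ratio = sigma_U_est / sigma_V_est

print(f"sigma_U ≈ {sigma_U_est:.3f}, sigma_V ≈ {sigma_V_est:.3f}, ratio ≈ {ratio:.2f}")
```

The ratio should come out close to √25 = 5, apart from statistical fluctuations.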
Given a single cluster consisting of N measurements, we can form an estimate (denoted µ_{U}′) of the center (µ_{U}) of the underlying distribution. In fact, for a wellbehaved distribution, we can set µ_{U}′ = y = ⟨x⟩_{C}, i.e. we can let the yvalue of the cluster serve as our estimate of µ_{U}. Meanwhile, we can also form an estimate (σ_{U}′) of the width (σ_{U}) of the underlying distribution, as discussed below.
Given a group consisting of M clusters, we can form an estimate (µ_{V}′) of the center of the distribution of yvalues. Similarly we can form an estimate (σ_{V}′) of the width of the distribution of yvalues.
To say the same things more formally:
 (80) 
Among other things, we note the following:
Note: Commonly we use [x] as our σ_{U}′ i.e. our estimate of σ_{U}, using the [⋯] notation defined in section 10.4.
When you report the results of a cluster of measurements, you have a choice:
In either case, you should be very explicit about the choice you have made. If you just report 4.3 ± 2.1 it’s ambiguous, since [x] differs from [y] by a factor of √N, which creates the potential for huge errors.
The relationships among the quantities of interest are shown in figure 47.
Conceptually, [y] would manifest itself in connection with drawing multiple clusters from the distribution V. However, you have enough information within a single cluster to calculate [y]. Just divide [x] by √N.
For a given cluster of data:
⟨x⟩ aka y is our estimate of µ_{U} and also of µ_{V}.
[x] is our estimate of σ_{U}.
[y] = [x]/√N is our estimate of σ_{V}.
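The three quantities can be computed from a single cluster as sketched below. The data is made up. One assumption needs flagging: `statistics.stdev` uses the N−1 convention, which may or may not match the [x] notation defined in section 10.4.

```python
import math
import statistics

def summarize_cluster(xs):
    """Return (y, [x], [y]) for one cluster of N measurements."""
    n = len(xs)
    y = statistics.mean(xs)      # estimate of mu_U and of mu_V
    sx = statistics.stdev(xs)    # estimate of sigma_U (N-1 convention, an assumption)
    sy = sx / math.sqrt(n)       # estimate of sigma_V
    return y, sx, sy

y, sx, sy = summarize_cluster([4.1, 4.5, 3.9, 4.6, 4.4])
print(f"y = {y:.3f}, [x] = {sx:.3f}, [y] = {sy:.3f}")
```

Whichever width you report, label it explicitly, as discussed above.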
The field of statistics, like most fields, has its own terminology and jargon.
Here are some terms where the statistical meaning is ambiguous and/or differs from the vernacular meaning.
In statistics, sample mean refers to y = ⟨x⟩, i.e. the mean of a given sample i.e. a given cluster. This is a natural consequence of the definition of sample.
In contrast, the standard deviation of a distribution is unambiguous. That’s because [x] and [x]^{b} converge in the large-sample limit, and we can draw an arbitrarily-large sample from the distribution.
If an event is a set with only one element, it is called a simple event; if it contains multiple elements, it is called a compound event.
To repeat: When dealing with “standard deviation” in connection with clusters (samples) of size N, there are at least six ideas in play:
 (81) 
For large N, note that the left-to-right variation is rather small within each row, but the row-to-row variation is huge.
See reference 2 for a careful definition of mean, variance, and standard deviation.
The modern approach is to use uncertainty as a catchall term. I recommend this approach. Sometimes it is useful to separate out various contributions to the overall uncertainty ... and sometimes not.
A few common sources of uncertainty include:
The first five items on this list are often present in real-world measurements, sometimes to a nontrivial and irreducible degree. In contrast, the last two items are equally applicable to purely theoretical quantities and to experimentally measured quantities.
Neither readability nor roundoff error is usually considered an “irreducible” source of experimental error, since both can usually be reduced by redesigning the experiment.
As an example of statistical fluctuations, suppose you have a tray containing 1000 coins. You randomize the coins, and count how many “heads” turn up. Suppose the first time you do the experiment, you observe x_{1} = 511, the second time you observe x_{2} = 493, et cetera.
There are several points we can make about this. First of all, there is no uncertainty of measurement associated with the individual observations x_{1}, x_{2}, etc. after they have been carried out. These are exact counts. On the other hand, if you want to describe the entire distribution X = {x_{i}} from which such outcomes are drawn, it has some mean and some standard deviation. Similarly if you want to predict the outcome of the next observation, there will be some uncertainty. For fair coins, we expect x = 500±16 based on theory, so this is not necessarily an “experimental” uncertainty, unless you want to consider it a Gedankenexperimental uncertainty. If you do the actual experiment with actual coins, then experimental uncertainty would be the correct terminology.
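The theoretical expectation of 500±16 can be checked with a few lines of arithmetic. This sketch assumes the usual binomial model (independent fair coins), for which the mean is Np and the standard deviation is √(Np(1−p)); the function name is just for illustration.

```python
import math

def binomial_mean_sigma(n_coins, p_heads=0.5):
    """Mean and standard deviation of the number of heads, binomial model."""
    mean = n_coins * p_heads
    sigma = math.sqrt(n_coins * p_heads * (1.0 - p_heads))
    return mean, sigma

mean, sigma = binomial_mean_sigma(1000)
print(f"expected heads = {mean:.0f} ± {sigma:.1f}")   # 500 ± 15.8

# The two observations quoted in the text, in units of sigma.
for observed in (511, 493):
    print(observed, f"{(observed - mean) / sigma:+.2f} sigma")
```

Both observations sit comfortably within one standard deviation of the mean, as one would expect for typical draws from this distribution.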
See section 12.6 for more on this.
In some contexts (particularly in electronics), the statistical fluctuations of a counting process go by the name of shot noise.
As an example of roundoff error unrelated to measurement error, consider rounding off the value of π or the value of 1/81:
 (82) 
 (83) 
The point is that neither π nor 1/81 has any uncertainty of measurement. In principle they are known exactly, yet when we express them as a decimal numeral there is always some amount of roundoff error.
Roundoff error is not statistical. It is not random. See section 11.4 for more on this.
Consider the celebrated series expansion
 exp(x) = 1 + x + x^{2}/2! + x^{3}/3! + ⋯   (84) 
This is a power series, in powers of x. That is, the Nth term of the series is equal to some power of x times some coefficient.
Note that in a certain sense, the decimal representation of any number (e.g. equation 82 or equation 83) can be considered a power series. The digits in front of the decimal point are a series in powers of 10, counting right-to-left. Similarly the digits after the decimal point are a series in powers of 1/10, counting left-to-right, such that the contribution from the Nth digit to the overall number is equal to 1/10^{N} times some coefficient. Similar words apply to other bases, not just base 10. Base 2, base 8, base 10, and base 16 are all commonly used in computer science. They are called binary, octal, decimal, and hexadecimal.
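To make the "digits as a power series" idea concrete, here is a small sketch that rebuilds a number from its digits, term by term, in any base. Exact rational arithmetic is used so that no roundoff creeps into the illustration. The function name and the example digit strings are hypothetical.

```python
from fractions import Fraction

def digits_to_value(int_digits, frac_digits, base=10):
    """Sum the power series represented by the digits of a numeral."""
    value = Fraction(0)
    # Digits in front of the point: powers base**0, base**1, ... counting right-to-left.
    for power, digit in enumerate(reversed(int_digits)):
        value += digit * Fraction(base) ** power
    # Digits after the point: the Nth digit contributes digit / base**N.
    for n, digit in enumerate(frac_digits, start=1):
        value += Fraction(digit, base ** n)
    return value

print(digits_to_value([3], [1, 4, 1, 5, 9]))      # 314159/100000
print(digits_to_value([1, 0, 1], [1], base=2))    # binary 101.1, i.e. 11/2
print(digits_to_value([15, 15], [], base=16))     # hexadecimal FF, i.e. 255
```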
There are many situations in science where it is necessary to use a truncated series, perhaps because the higher order terms are unknown in principle, or simply because it would be prohibitively expensive to evaluate them. Such situations arise in mathematical analysis and in numerical simulations.
Every time you use a truncated series you introduce some error into the calculation. In an iterative calculation, such errors can add up, and can easily reach troublesome levels.
Starting from equation 84, whenever you truncate the power series by throwing away second-order and higher terms, you are left with 1+x every time. Therefore the truncation error is (exp(x)−1−x) every time. This is not random. It is 100% reproducible.
Similarly, as mentioned in section 11.2, whenever you round off π to five decimal places you get 3.14159 every time. Therefore the roundoff error is (π − 3.14159) every time. This is not random. It is 100% reproducible.
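The reproducibility of these two errors is easy to demonstrate. The sketch below evaluates the truncation error of 1+x at an arbitrarily chosen point (x = 0.1 is my choice, not anything special) several times over, and evaluates the roundoff error of 3.14159. The function names are hypothetical.

```python
import math

def truncation_error(x):
    """Error from keeping only the terms 1 + x of the series for exp(x)."""
    return math.exp(x) - (1.0 + x)

def roundoff_error_pi():
    """Error from rounding pi to five decimal places."""
    return math.pi - 3.14159

# Repeat the "measurement" several times: the error never changes.
errors = [truncation_error(0.1) for _ in range(5)]
print(errors)
print(len(set(errors)) == 1)     # True: there is no scatter at all
print(roundoff_error_pi())       # about 2.65e-06, the same every time
```

A histogram of these errors would have zero width, yet the error itself is not zero. Zero scatter does not mean zero error.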
As a third example, consider the force F(x) developed by a spring, as a function of the extension x. We can expand F(x) as a power series. In accordance with Hooke’s law we expect the second-order and higher terms to be small, but in the real world they won’t be zero. And for any given spring, they won’t be random.
The third example is important, because you don’t know what the truncation error is. This stands in contrast to the previous two examples, in the sense that even if you don’t know the value of (π − 3.14159) at the moment, you could figure it out.
So now we come to the point of this section: If you don’t know the value of y at the moment, that doesn’t mean y is random. Even if you don’t know y and cannot possibly figure it out, that does not mean it is random. More importantly, even if y contains “some” amount of randomness, that does not mean that successive observations of y drawn from some distribution Y will be uncorrelated.
This is important because many of the statistical methods that people like to use are based on the assumption that the observations are statistically independent.
In Appendix D of TN1297 (reference 9) you can find a discussion of some commonly-encountered terms for various contributions to the overall uncertainty, and various related notions. I will now say a few words about some of these terms.
A tolerance serves somewhat as the mirror image of uncertainty of measurement. Tolerances commonly appear in recipes, blueprints, and other specifications. They are used to specify the properties of some manufactured (or about-to-be manufactured) object. Each number on the specification will have some stated tolerance; for example, in the expression 5.000 ± .003 the tolerance is ± .003. The corresponding property of the finished object is required to be within the stated tolerance band; in this example, greater than 4.997 and less than 5.003.
The idea of tolerance applies to a process of going from numbers to objects. This is the mirror image of a typical scientific observation, which goes from objects to numbers.
The notation is somewhat ambiguous, since tolerance is expressed using exactly the same notation as used to express the uncertainty of a measurement. The notations are the same, but the concepts are very different. There are at least three possibilities:
This illustrates a subtle but important conceptual point: Whenever you are talking about a cooked data blob or any other probability distribution, it is important to ascertain what is the ensemble. Note the contrast:
If the ensemble consists of measuring the 17th widget over and over again, the uncertainty is the uncertainty of the measurement process, 0.0005 inches.
If the ensemble consists of measuring every widget in today’s production run, the uncertainty is dominated by the widget-to-widget variability, 0.004 inches. (The uncertainty of the measurement process makes some contribution, but it is small by comparison.)
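To see how small the contribution from the measurement process is, here is a rough sketch that combines the two numbers quoted above. It assumes the two contributions are independent and can be added in quadrature; that assumption is mine, for the sake of illustration, and would need to be checked in any real situation.

```python
import math

measurement_sigma = 0.0005   # inches, uncertainty of the measurement process
widget_sigma = 0.004         # inches, widget-to-widget variability

# Quadrature sum, assuming the two contributions are independent.
combined = math.hypot(measurement_sigma, widget_sigma)

print(f"combined = {combined:.5f} inches")
print(f"increase over widget variability alone = {100 * (combined / widget_sigma - 1):.2f}%")
```

Under this assumption the measurement process inflates the overall width by less than one percent.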
When specifying tolerances, the recommended practice is to explain in words what you want. That is, very commonly the desired result cannot be expressed in terms of simple “A±B” terminology. For example, I might walk into the machine shop and say that I would like a chunk of copper one inch in diameter and one inch long. The machinists could machine me something 1±0.0001 inches in diameter and 1±0.0001 inches long, but that’s not what I want; I don’t want them to machine it at all. In this context they know I just want a chunk of raw material. In all likelihood they will reach into the scrap bin and pull out a piece of stock and toss it to me. The diameter is roughly 1 inch but it’s out-of-round by at least 0.010 inches. The length is somewhere between 1 inch and 6 inches. This is at least ten thousand times less accuracy than the shop is capable of, but it is within tolerances and is entirely appropriate. They know that at the end of the day I will have turned the material into a set of things all very much smaller than what I started with, so the size of the raw material is not important.
As another example, a surface-science experiment might require a cylinder very roughly one inch in diameter and very roughly one inch long, with one face polished flat within a few millionths of an inch.
It is also quite common to have correlated tolerances. (This is roughly the mirror image of the correlated uncertainties of measurement discussed in section 6.15.) For example, I might tell the shop that I need some spacers one inch in diameter and one inch long. I explain that since they are spacers, on each cylinder the ends need to be flat and parallel ... but I’m not worried about the diameter and I’m not even worried about the length, so long as all three spacers have the same length ±0.001 inch. That is, the lengths can be highly variable so long as they are closely correlated.
A common yet troublesome example of correlated uncertainties concerns the proverbial round peg in a round hole. To a first approximation, you don’t care about the diameter of the peg or the diameter of the hole, provided the peg fits into the hole with the proper amount of clearance. The amount of clearance is the proverbial small difference between large numbers, which means that the relative uncertainty in the clearance will be orders of magnitude larger than the relative uncertainty in the diameters. For a one-of-a-kind apparatus you can customize one of the diameters to give the desired clearance ... whereas in a mass-production situation controlling the clearance might require very tight tolerances on both of the diameters. In some cases you’d be better off using a tapered pin in a tapered hole, or using a sellock pin (aka spring pin).
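The "small difference between large numbers" effect can be illustrated with made-up dimensions. In this sketch the hole and peg diameters, and their uncertainties, are hypothetical, and the two diameters are assumed to be uncorrelated, which is exactly the unfavorable mass-production case.

```python
import math

def clearance_with_uncertainty(hole, hole_sigma, peg, peg_sigma):
    """Clearance and its uncertainty, assuming independent errors on the two diameters."""
    clearance = hole - peg
    sigma = math.hypot(hole_sigma, peg_sigma)
    return clearance, sigma

# Hypothetical dimensions, in inches.
hole, hole_sigma = 0.5000, 0.0005
peg, peg_sigma = 0.4980, 0.0005

clearance, sigma = clearance_with_uncertainty(hole, hole_sigma, peg, peg_sigma)
rel_hole = hole_sigma / hole
rel_clearance = sigma / clearance

print(f"clearance = {clearance:.4f} ± {sigma:.4f} inches")
print(f"relative uncertainty of hole diameter: {100 * rel_hole:.2f}%")
print(f"relative uncertainty of clearance:     {100 * rel_clearance:.0f}%")
```

With these numbers each diameter is good to a tenth of a percent, while the clearance is uncertain by roughly a third of its own value.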
Nowadays experts generally avoid using the term “precision” except in a vague, not-very-technical sense, and concentrate instead on quantifying the uncertainty.
Multiple conflicting meanings of “precision” can be found in the literature.
One rather common meaning corresponds roughly to “an empirical estimate of the scatter”. That is, suppose we have a set of data that is empirically well described by a probability distribution with a half-width of 0.001; we say that data has a precision of 0.001. Alas that turns the common-sense meaning of precision on its head; it would be more logical to call the half-width the imprecision, because a narrow distribution is more precise.
For more discussion of empirical estimates of uncertainty, see section 12.6.
It is amusing to note that Appendix D of TN1297 (reference 9) pointedly declines to say what precision is, “because of the many definitions that exist for this word”. Apparently “precision” cannot be defined precisely.
Similarly, it says that accuracy is a “qualitative concept”. Apparently “accuracy” cannot be defined accurately.
This is particularly amusing because nonexperts commonly make a big fuss about the distinction between accuracy and precision. A better strategy is to talk about the overall uncertainty versus an empirical estimate of the scatter, as discussed in section 12.6.
The term “accuracy” suffers from multiple inconsistent definitions.
One of the most common meanings is as a general-purpose antonym for uncertainty. Nowadays experts by and large use “accuracy” only in an informal sense. For careful work, they focus on quantifying the uncertainty. For more on this, see section 12.6.
It is neither necessary nor possible to draw a sharp distinction between accuracy and precision, as discussed in section 12.2 and section 12.6.
On a digital instrument, there are only so many digits. That introduces some irreducible amount of roundoff error into the reading. This is one contribution to the uncertainty.
A burette is commonly used as an almost-digital instrument, because of the discreteness of the drops. Drop formation introduces quantization error.
On an analog instrument, sometimes you have the opportunity to interpolate between the smallest graduations on the scale. This reduces the roundoff error, but introduces other types of uncertainty, due to the vagaries of human perception. You also have to ask whether you should just replace it with an instrument with finer graduations.
As another example, suppose you are determining the endpoint of a titration by watching a color change. This suffers from the vagaries of human perception. Often, determining the color-change point is the dominant source of uncertainty; interpolating between graduations on the burette won’t help, and using a more finely graduated burette won’t help. In this case, if more resolution is needed, you might consider using a photometer to quantify the color change, and if necessary use curve fitting to make best use of the photometer data.
On a digital instrument, the number of digits does not necessarily dictate the readability or the resolution. This is obvious in the case where there is autoranging or manual range-switching going on. Also, I have a scale where the lowest-order digit counts by twos. I’m not quite sure why; it makes the data “look” less uncertain (i.e. more reproducible) at the cost of making it actually more uncertain (i.e. more roundoff error). In any case, the fact remains: the number of digits does not control the resolution.
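The effect of a display that counts by twos can be sketched as follows. The true weights are hypothetical, and they are expressed as integers in tenths of a gram so that the arithmetic is exact. The function names are mine, and rounding ties upward is an arbitrary choice for the illustration.

```python
def quantize(value, step):
    """Round a non-negative integer reading to the nearest multiple of step (ties round up)."""
    return ((value + step // 2) // step) * step

def worst_roundoff(values, step):
    """Largest roundoff error over a set of true values."""
    return max(abs(quantize(v, step) - v) for v in values)

# Hypothetical true weights from 100.0 g to 103.9 g, in tenths of a gram.
true_values = range(1000, 1040)

worst_by_ones = worst_roundoff(true_values, 10)   # last displayed digit counts by ones
worst_by_twos = worst_roundoff(true_values, 20)   # last displayed digit counts by twos

print(f"worst roundoff, counting by ones: {worst_by_ones / 10} g")
print(f"worst roundoff, counting by twos: {worst_by_twos / 10} g")
```

Both displays show the same number of digits, yet the worst-case roundoff error differs by a factor of two.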
The ultimate limit – the fundamental limit – to readability is noise. If the reading is hopping around all over the place, roundoff error is not the dominant contribution to the noise. Interpolating and/or using a finer scale won’t help.
Roughly speaking, errors can be classified as follows:
Nonsystematic errors are random, with a well-behaved distribution, and will average out if you take enough data.
Systematic errors don’t average out.
This classification leaves open a nasty gray area when there are random errors that don’t average out, as discussed below. This is a longstanding problem with the terminology, and with the underlying concepts. 
For example: An instrument with a lousy temperature coefficient might be reproducible from minute to minute but not reproducible from season to season.
As another example: Suppose you measure something using an instrument that is miscalibrated, and the miscalibration is large compared to the empirical scatter that you see in your readings. As far as anybody can tell, today, your results are reproducible, because there is no scatter in the data … yet next month we may learn that your colleagues – using a different instrument – are not able to reproduce your results. An example of this is discussed in section 5.5.
On the third hand, if you kept all the raw data, you might be able to go back and recalibrate the data without having to repeat the experiment.
This illustrates a number of points:
So the question is, how do we describe this situation? The fundamental issue is that there are multiple contributions to the uncertainty. As usual, it should be possible to describe this in statistical terms.
We are in some formal sense “uncertain” as to how well your instrument is calibrated, and we would like to quantify that uncertainty. There is, at least in theory, an ensemble of instruments, some of which are calibrated, and some of which are miscalibrated in various ways, with a horribly abnormal distribution of errors. Your instrument represents an example drawn from this ensemble. Since you have drawn only one example, you have no empirical way of estimating the properties of this ensemble. So we’ve got a nasty problem. There is no convenient empirical method for quantifying how much overall uncertainty attaches to your results.
When we take a larger view, the situation becomes slightly clearer. Your colleagues have drawn additional examples from the ensemble of instruments, so there might be a chance of empirically estimating the distribution of miscalibrations.
However, the empirical approach will never be entirely satisfactory, because even including the colleagues, a too-small sample has been drawn from the ensemble of instruments. If there is any nontrivial chance that your instrument is significantly miscalibrated, you should recalibrate it against a primary standard, or against some more-reliable secondary standard. For instance, if you are worried that your meter stick isn’t really 1 m long, take it to a machine shop. Nowadays they have laser interferometers on the beds of the milling machines, so you can reduce the uncertainty about your stick far beyond what is needed for typical purposes.
The smart way to proceed is to develop a good estimate of the reliability of the instrument, based on considerations such as how the instrument is constructed, whether two instruments are likely to fail in the same way, et cetera. This requires thought and effort, far beyond a simple histogram or scatterplot of the data.
Also keep in mind that sometimes it is possible to redesign the whole experiment to measure a dimensionless ratio, so that calibration factors drop out. As a famous example, the ratio of (moon mass)/(earth mass) is known vastly better than either mass separately. (The uncertainty of any measurement of either individual mass would be dominated by the uncertainty in Newton’s constant of universal gravitation.)
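Here is a toy illustration of how a shared calibration factor drops out of a ratio. The "true" values and the calibration factors are hypothetical, and the model assumes the miscalibration is a pure multiplicative scale error that affects both readings equally.

```python
def measured(true_value, calibration_factor):
    """Reading reported by an instrument with a multiplicative scale error."""
    return true_value * calibration_factor

true_a, true_b = 3.0, 12.0   # hypothetical quantities, same units

for k in (0.90, 1.00, 1.10):   # instrument reads 10% low, correctly, 10% high
    a = measured(true_a, k)
    b = measured(true_b, k)
    print(f"k = {k:.2f}   a = {a:.2f}   b = {b:.2f}   a/b = {a / b:.4f}")
```

The individual readings move by ±10%, while the ratio stays put. An additive offset would not cancel this way, so the trick depends on the form of the miscalibration.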
It is possible to make an empirical measurement of the scatter in your data, perhaps by making a histogram of your data and measuring the width. However, the point remains that this provides only a lower bound on the true uncertainty of your results. This may be a tight lower bound, or it may be a serious underestimate of the true uncertainty. You can get into trouble if there are uncontrolled variables that don’t show up in the histogram. This can happen if you have inadvertently drawn a too-small sample of some variables.
Also beware that “random” errors may or may not average out. Consider the contrast:
There is a category of random errors that will average out, if you take enough data.
There is a category of random errors that will never average out, no matter how much data you take.
If your measuring instrument has an offset, and the offset is undergoing an unbiased random walk, then we can invoke the central limit theorem to convince ourselves that the average of many measurements will converge to the right answer.
If the offset in your measuring process is undergoing a biased random walk, there will be an overall rate of drift, and the longer you sit there taking measurements the more the drift will accumulate. You may have seen an example of this in high-school chemistry class, when you tried to weigh a hygroscopic substance.
Bias is not the only type of badly-behaved randomness. Consider for example 1/f noise (“pink noise”), which will never average out, even though it is not biased, as discussed in reference 29. (The statement of the central limit theorem has some important provisos, which are not satisfied in the case of 1/f noise.)
Averaging can be considered a simple type of digital filter, namely a boxcar filter. Long-time averaging results in a filter with a narrow bandwidth, centered at zero. White noise has a constant power per unit bandwidth, so decreasing the bandwidth decreases the amount of noise that gets through.
As the name suggests, 1/f noise has an exceedingly large amount of noise power per unit bandwidth at low frequencies. A narrow filter centered at zero is never going to make the noise average out. You might be able to solve the problem by using a more sophisticated filter, namely a narrowband filter not centered at zero. Hint: lock-in amplifier.
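The contrast between noise that averages out and drift that accumulates can be sketched numerically. This is a deliberately simplified model: each reading gets independent Gaussian (white) noise, plus an optional steady drift. It does not attempt to model a true random walk or 1/f noise. The function name, the true value, the noise level, and the drift rate are all hypothetical.

```python
import random
import statistics

def average_reading(n, drift_per_step, rng, true_value=10.0, noise_sigma=1.0):
    """Average of n readings: white noise plus a steady drift of the offset."""
    readings = [
        true_value + drift_per_step * i + rng.gauss(0.0, noise_sigma)
        for i in range(n)
    ]
    return statistics.fmean(readings)

rng = random.Random(12345)   # fixed seed so the run is repeatable
for n in (100, 10_000):
    err_white = average_reading(n, 0.0, rng) - 10.0
    err_drift = average_reading(n, 0.001, rng) - 10.0
    print(f"N = {n:6d}   white-noise error = {err_white:+.3f}   with drift = {err_drift:+.3f}")
```

Taking more data shrinks the error in the white-noise case, but in the drifting case the error of the average grows with the length of the run.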
Given any set of data, we can calculate the standard deviation of that data, as mentioned in section 12.2. This is a completely cut-and-dried mathematical operation on the empirical data. It gives a measure of the scatter in the data.
Things become much less clear when we try to make predictions based on the observed scatter. It would be nice if we could predict how well our data will agree with future measurements of the same quantity ... but this is not always possible, and is never cut-and-dried, because there may be sources of uncertainty that don’t show up in the scatter.
Note that what we have been calling “scatter” is conventionally called the “statistical” uncertainty. Alas, that is at best an idiomatic expression, and at worst a misleading misnomer, for the simple reason that virtually anything can be considered “statistical” in the following sense: Even absolute truth is statistical, equivalent to 100% probability of correctness, while falsity is statistical, equivalent to 0% probability of correctness. It might be slightly better to call it an empirical estimate or even better an internal estimate of one contribution to the uncertainty. The informal term scatter is as good as any. However, even this is imperfect, for reasons we now discuss:
Niels Bohr once said “Never express yourself more clearly than you are able to think”. By that argument, it is not worth coming up with a super-precise name for the distinction between scatter and systematic error, because it is not a super-precise concept. It depends on the details of how the experiment is done. Suppose we have a set of voltmeters with some uncertainty due to calibration errors. Further suppose one group measures something using an ensemble of voltmeters, while a second group uses only a single voltmeter. Then calibration errors will show up as readily-observable scatter in the first group’s results but will show up as a hard-to-detect systematic error (not scatter) in the second group’s results.
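The voltmeter scenario can be simulated in a few lines. Everything numerical here is hypothetical: the true voltage, the size of the calibration errors, the reading noise, and the number of meters. The sketch assumes each meter has a fixed additive offset and that each reading also carries a small amount of independent Gaussian noise.

```python
import random
import statistics

TRUE_VOLTAGE = 5.0   # hypothetical

def take_readings(n, meter_offsets, rng, noise_sigma=0.002):
    """n readings, each taken with a meter chosen at random from meter_offsets."""
    return [
        TRUE_VOLTAGE + rng.choice(meter_offsets) + rng.gauss(0.0, noise_sigma)
        for _ in range(n)
    ]

rng = random.Random(2024)   # fixed seed so the run is repeatable
offsets = [rng.gauss(0.0, 0.05) for _ in range(50)]   # calibration error of each meter

group1 = take_readings(200, offsets, rng)       # first group: an ensemble of meters
group2 = take_readings(200, offsets[:1], rng)   # second group: one meter only

scatter1 = statistics.stdev(group1)
scatter2 = statistics.stdev(group2)
offset2 = statistics.fmean(group2) - TRUE_VOLTAGE

print(f"group 1 scatter = {scatter1:.4f}")
print(f"group 2 scatter = {scatter2:.4f}   hidden offset = {offset2:+.4f}")
```

The second group's data looks much cleaner, yet its mean carries the full calibration error of its one meter, and nothing in that group's own data reveals this.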
An oversimplified view of the relationship between scatter and systematic error is presented in figure 48. In all four parts of the figure, the black data points are essentially the same, except for scaling and/or shifting. Specifically: In the bottom row the spacing between points is 3X larger than the spacing in the top row, and in the right-hand column the pattern is off-center, i.e. shifted to the right relative to where it was in the left-hand column.
The data is a 300-point sample drawn from a two-dimensional Gaussian distribution. That is, the density of points falls off exponentially as a function of the square of the distance from the center of the pattern.
Figure 48 is misleading because it suggests that you can with one glance estimate how much the centroid suffers from systematic error. In contrast, in the real world, it is very very hard to get a decent estimate of this. You can’t tell at a glance how far the data is from the target, because you don’t know where the target is. (If you knew the location of the target, you wouldn’t have needed to take data.) The real-world situation is more like figure 49.
Remark: Terminology: Sometimes people use the word “precision” to mean the lack of scatter, and use the word “accuracy” to mean, roughly speaking, the lack of systematic error of the centroid. It is, alas, hard to quantify these terms, as discussed in section 12.2 and section 12.3.
Here’s another issue: Sometimes people imagine there is a clean dichotomy between precision and accuracy, or between scatter and systematic error ... but this is not right. Scatter is not the antonym or the alternative to systematic error. There can perfectly well be systematic errors in the scatter!
In particular, moving left-to-right in figure 48 illustrates a systematic offset of the centroid. In contrast, moving top-to-bottom in figure 48 illustrates a systematic 3x increase of the standard deviation.
Here’s how such issues can arise in practice: Suppose you want to measure the Brownian motion of a small particle. If the raw data is position, then the mean position is meaningless and the scatter in the data tells you everything you need to know. If you inadvertently use a 10x microscope when you think you are using a 30x microscope, that systematically decreases the scatter by a factor of 3. This is a disaster, because it introduces a 3x systematic error in the main thing you are trying to measure.
As another example in the same vein, imagine you want to measure the noise figure of a radio-frequency preamplifier. The raw data is voltage. The mean of the data is meaningless, and is zero by construction in an AC-coupled amplifier. The scatter in the data tells you everything you need to know.
On the other hand, in the last two examples, it might be more practical to shift attention away from the raw data to a slightly cooked (“parboiled”) representation of the data. In the Brownian motion experiment, let the parboiled data be the diffusion constant, i.e. the slope of the curve when you plot the square of the distance traveled versus time. Then we can talk about the mean and standard deviation of the measured diffusion constant.
Here’s a two-part constructive suggestion:
Scatter is one contribution to our uncertainty about the nominal value. The measured scatter provides a lower bound on the uncertainty. It tells you nothing about possible systematic offsets of the nominal value, and tells you nothing about possible systematic errors in the amount of scatter itself (as in the microscope example above).
When reporting the uncertainty, what really matters is the total, overall uncertainty. Breaking it down into separate contributions (scatter, systematic error, or whatever) is often convenient, but is not a fundamental requirement.
Quantifying the scatter is easy ... much easier than estimating the systematic errors in the mean and standard deviation. Do your best to estimate the total, overall uncertainty.
In an introductory class, students may not have the time, resources, or skill required to do a meaningful investigation of possible systematic errors. This naturally leads to an emphasis on analyzing the scatter ... but this emphasis should not become an overemphasis. Remember, the scatter is a lower bound on the uncertainty, and should be reported as such. There is nothing wrong with saying “We observed σ_{X} to be such-and-such. This provides a lower bound on the uncertainty of ⟨X⟩. There was no investigation of possible systematic errors”.
Remark: Notation: Sometimes you see a measurement reported using an expression of the form A±B±C, where A is the nominal value, B is the observed scatter, and C is an estimate of the systematic error of the centroid. This notation is not very well established, so if you’re going to use it you should be careful to explain what you mean by it.
The title of this section is in scare quotes, because you should be very wary of using the term “experimental error”. The term has a couple of different meanings, which would be bad enough ... but then each meaning has problems of its own.
By way of background, note that the word “error” has the same ancient roots as the word “errand” or “knight errant”, referring to wanderings and excursions, including ordinary, normal, and even commendable excursions. However, for thousands of years, the word “error” has also denoted faults, mistakes, or even deceptions, which are all undesirable, reprehensible things that “should” have been avoided.
Sometimes the term “experimental error” is applied to unavoidable statistical fluctuations, and sometimes it is applied to avoidable mistakes and blunders. These two meanings are dramatically different. They are both problematic, but for different reasons:
Consider the contrast:
Negative example: Saying “our result differs from the accepted value by 15% due to experimental error” is not an explanation. Often graders, reviewers, and/or editors will automatically reject a report that contains such a statement.
In contrast, you might get away with using “Experimental Error” as the headline of a section in which the specific sources of error were analyzed. Even that is not recommended; a better headline would be “Sources of Uncertainty” or some such.
Last but not least, we should mention that the term “error bar” has entered the language as an idiomatic expression. Logically it should be called an “uncertainty bar” but nobody actually says that. So we will continue to call it an error bar, with the understanding that it measures uncertainty.
Beware that you cannot always describe a distribution in terms of some “nominal value” and some “uncertainty”. There is a whole litany of things that could go wrong.
An example of correlated data is shown in figure 46 as discussed in section 8.3.
For a moment, let’s restrict attention to Gaussian distributions. In D dimensions, a Gaussian can be described using a vector with D components (to describe the center of the distribution) plus a symmetric D×D matrix (to describe the uncertainties). That means you need D+D(D+1)/2 numbers to describe the Gaussian.
In the special case where the uncertainties are all uncorrelated, the matrix is diagonal, so we can get by with only 2D numbers to describe the whole Gaussian, and we recover the simple description in terms of “nominal value ± uncertainty” for each dimension separately. Such a description provides us with the 2D numbers that we need. Obviously D=1 is a subcase of the uncorrelated case.
If the uncertainties are correlated, we need more than 2D numbers to describe what is going on. It is impossible in principle to describe the situation in terms of “nominal value ± uncertainty” because that only gives us 2D numbers.
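The parameter counts are easy to tabulate. This sketch simply evaluates the two formulas from the text, D+D(D+1)/2 for the general case and 2D for the uncorrelated case; the function name is hypothetical.

```python
def gaussian_parameter_count(d, correlated=True):
    """Number of real parameters needed to describe a Gaussian in d dimensions."""
    if correlated:
        # d numbers for the center, plus a symmetric d-by-d matrix.
        return d + d * (d + 1) // 2
    # d numbers for the center, plus one uncertainty per dimension.
    return 2 * d

for d in (1, 2, 3, 10):
    print(d, gaussian_parameter_count(d), gaussian_parameter_count(d, correlated=False))
```

For D=1 the two counts agree. For D=10 the general case needs 65 numbers, whereas ten “value ± uncertainty” pairs supply only 20.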
In the real world, sometimes the uncertainties are uncorrelated, but sometimes they are not. See section 6.15 and section 8.3 for examples where correlations must be taken into account. See section 6.15 for an example of how you can handle correlated data.
Also, beware that not everything is Gaussian. Other distributions – including square, triangular, and Lorentzian among others – can be described using two parameters, and represented using the “value” ± “uncertainty” notation. More-complicated distributions may require more than two parameters.
If you know that your data has correlations or has a non-normal distribution, be sure to say so explicitly.
The significance of data depends on how the data is being used. Value judgments are involved. Let’s start by examining some examples.
Of course the most significant feature of the data is usually not the only significant feature of the data.
From this we see that true significance is highly dependent on the details of the application. In particular, one feature of the data might be significant to one user, while another feature is significant to another user.
All this can be summarized by saying some feature of the data is significant if and when it is worth knowing. We take this as our definition of “significance”.
Formerly some authorities used the term “significance” as a general-purpose antonym for uncertainty, but nowadays this is considered a bad idea.
Generally it is up to each user of the data to decide which features of the data are significant, and how significant they are. In contrast, the data-producers generally do not get to decide how significant it is.
It is, however, important for the data-producers to have an estimate of the significance, to help guide and motivate the data-production process. Here’s how it often works in practice: Before attempting to measure something, you ought to identify one or two significant applications of the data. This gives you at least a lower bound on the significance of the measurement. You don’t need to identify all applications, just enough to convince yourself – and convince the funding agencies – that the measurement will be worth doing.
Note the distinction: the data-producers do not get to decide the significance, but they should obtain an estimate (or at least a lower bound) for the significance.
This explains why in, say, a compendium of fundamental constants, there is much discussion of uncertainty but almost no mention of significance.
Significance is important, and uncertainty is important, but you must not confuse the two. Significance is not even a category or component of the uncertainty. (This is in contrast to, say, roundoff error, which is one component of the overall uncertainty.)
Significance is not the opposite of uncertainty. Uncertainty is not the opposite of significance. We can see this in the following examples:
Various combinations of significance and/or uncertainty are summarized in figure 50.
When only a single scalar is being measured, and only a single final application is contemplated, it is sometimes tempting to arrange things so that the uncertainty of the measurement process is well matched to the inverse of the significance of the final application. Sometimes that is a good idea, but sometimes not.
In this connection, it must be emphasized that the significant-figures rules are a very crude way of representing uncertainty. Also, despite the name, they are not used to represent significance! This should be obvious from the fact that the sig-figs rules as set forth in the chemistry textbooks deal with roundoff error and other sources of uncertainty, which are under control of the data-producers. The rules say nothing about the data-users, who always determine the true significance.
The foregoing remarks apply to the significant-digits rules, not to the digits themselves. In contrast, if/when we choose to operate under a completely different set of rules, we can arrange for the number of digits to be related to the true significance. A simple example of this can be found in section 2.1.
Let us now discuss a more interesting example. Suppose we have a chemical plant that unfortunately releases a certain level L of pollutants into the air. The government has established a threshold, and requires that the actual level of pollutants remain below the threshold.
Let us consider the quantities
 (85) 
On a day-to-day basis, from the point of view of the plant supervisor, the most significant feature of the data is that x remain less than zero, with high confidence. In many situations it is convenient to replace this with a statement that our best estimate of y is less than zero, where y contains a built-in safety margin.
Note that the assertion that y is less than zero is a one-bit binary statement. The value of y is being expressed using less than one significant digit.
The error bars on x, y, and L don’t matter so long as they are short enough, i.e. so long as the distribution on L does not cross the threshold to any appreciable extent.
The plant supervisor may wish to conceal the true value of L from competitors. Therefore it may be desirable, when filing reports, to include only the most severely rounded-off approximation to L.
We have seen multiple reasons why the plant supervisor might find it convenient to round things off very heavily. This roundoff is based on true significance, competitive considerations, and other considerations ... none of which are directly related to the uncertainty of the measurement. To say the same thing another way, the significance-based roundoff completely swamps any uncertainty-based roundoff that you might have done. This significance-based roundoff is not carried out using the “sig-figs” rules that you find in the chemistry textbook ... not by a long shot. This should be obvious from the fact that the sig-figs rules are (at best) a crude way of expressing uncertainty, not significance. The fact that extreme significance-based roundoff is possible is not an excuse for teaching, learning, or using the sig-figs rules.
Meanwhile we must keep in mind that features that are insignificant for one purpose may be very significant for other purposes.
Figure 51 shows a rough outline of how people generally approach data analysis. They start with some raw data. They perform some analysis, perhaps curve fitting of the sort described in section 6.23. The curve is a model, or rather a parameterized family of models, and analysis determines the parameters. The hope is that the fitted parameters will have some meaning that promotes understanding.
The parts of the figure shown in gray express an idea that is not often thought about and even less often carried out in practice, namely the idea that the model could be used to generate data, and given the right parameters it could generate data that is in some ill-specified sense “equivalent” to the data we started with. We will not pursue this idea, because it’s not the best way to do things.
A better strategy is shown in figure 52. We start by choosing some parameters that seem plausible, in the right ballpark. We feed those into the model, to generate some fake data. We then analyze the fake data using our favorite data-analysis tools. The reconstructed parameters really ought to agree with the chosen parameters. This is a valuable check on the validity of the model and the validity of the analysis methods.
Passing this test is necessary but not sufficient. It is necessary because if the analyzer cannot handle fake data, it certainly cannot handle real data. It is not sufficient because sometimes the analyzer works fine on fake data but fails miserably on real-world data – perhaps because both the model and the analyzer embody the same misconceptions.
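The fake-data check described above can be sketched in a few lines. This is only an illustration under stated assumptions: the straight-line model, the noise level, the chosen parameter values, and the helper names make_fake_data and fit_line are all invented for this example and do not come from the figures.

```python
import random

def make_fake_data(slope, intercept, noise_sigma, xs, rng):
    """Generate y values from a straight-line model plus Gaussian noise."""
    return [slope * x + intercept + rng.gauss(0.0, noise_sigma) for x in xs]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

rng = random.Random(12345)                    # fixed seed so the check is repeatable
chosen_slope, chosen_intercept = 2.0, -1.0    # hypothetical "ballpark" parameters
xs = [0.1 * i for i in range(200)]
ys = make_fake_data(chosen_slope, chosen_intercept, 0.5, xs, rng)

# Analyze the fake data and compare against the parameters we put in.
fit_slope, fit_intercept = fit_line(xs, ys)
print("chosen:       ", chosen_slope, chosen_intercept)
print("reconstructed:", fit_slope, fit_intercept)
```

If the reconstructed parameters disagree with the chosen ones by much more than the noise would explain, the analyzer is suspect. Agreement here still says nothing about whether the model is appropriate for real-world data.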
Please see reference 2 for a discussion of fundamental concepts of probability.
The term “significant figures” is equivalent to “significant digits”. Such terms are commonly encountered in introductory science books. At last check they were more common in chemistry books than in physics or biology books. They appear to be gradually becoming less common overall, which is a good thing.
The meaning of these terms is remarkably muddled and inconsistent. There are at least three categories of ideas involved. These include:
No matter what goal we are trying to achieve, sig figs are never the right way to do it. Consider the following contrast between goals and means, in each of the three categories mentioned above:
a) Roundoff: Whenever you write down a number, you need to write some definite number of digits, so some sort of roundoff rules are necessary. Basic practical rules for rounding off are given in section 1.1. In more advanced situations, you can apply the crank-three-times method (section 6.13) to each step in the calculation to confirm that you are carrying enough guard digits.
The sig fig rules are the wrong roundoff rules. They require the roundoff to be far too aggressive. There are plenty of important cases where following the usual “significant figures” rules would introduce unacceptable and completely unnecessary errors into the calculations. See section 6.2 and section 16.4.3 for simple examples of this.
b) Describing distributions: Basic practical methods for describing probability distributions are outlined in section 1.2. The width of a given distribution can be interpreted as the uncertainty of that distribution.
Beware that roundoff is only one contribution to the overall uncertainty. One of the fundamental flaws in the sig-figs approach is that it blurs the distinction between roundoff and uncertainty. This is a serious blunder. Sometimes roundoff error is the dominant contribution to the overall uncertainty, but sometimes not. Indeed, in a well-designed experiment, roundoff error is almost never the dominant contribution. Furthermore, the sig figs rules do a lousy job of representing the uncertainty. See section 16.5.2 and section 7.7 for examples where sig figs wildly overstate or wildly understate the width of the distribution.
c) Propagation: Often you perform some calculations on the raw data in order to obtain a result. We need a way of estimating the uncertainty in the result. Practical methods for doing this are discussed in section 6.13 and section 6.15.
The technique of propagating the uncertainty from step to step throughout the calculation is a very bad technique. It might sometimes work for super-simple “textbook” problems but it is unlikely to work for real-world problems. Commonly propagation works for some steps in a calculation but not others, and since a chain is only as strong as its weakest link, the overall calculation fails. See section 6.19 for additional discussion and examples of this. Step-by-step propagation does a particularly bad job when dealing with correlations. It is also quite laborious and error-prone. This is not intrinsically a sig-figs problem; step-by-step propagation is a bad idea whether or not the uncertainty is represented by sig figs. On the other hand, no matter what you are doing, you can always make it worse by using sig figs.
People who care about their data don’t use significant figures. Anything you might do with sig figs can be done much better (and more easily!) by other means.
It is not safe to assume that counting the digits in a numeral implies anything about the significance, uncertainty, accuracy, precision, repeatability, readability, resolution, tolerance, or anything else. See section 16.5.2 for more discussion of this point, including an example.
The number 120 can be considered the “same” as 1200 except for place value. This is useful when multiplying such numbers: we can multiply 12 by 12 and then shift the result three places to obtain 144000. This has absolutely nothing to do with roundoff or with any kind of uncertainty. All the numbers mentioned here are exact.
Similar ideas are useful when computing the characteristic (as opposed to mantissa) of a logarithm. Again this has nothing to do with roundoff or uncertainty; the characteristic is the same no matter whether you are using four-place logarithms or seven-place logarithms.
These ideas have been around for hundreds of years. They are harmless provided you do not confuse them with other ideas, such as the disastrous ideas discussed in section 16.4.
Given a number in scientific notation, if you know it has been rounded off to a certain number of digits, then you know the magnitude of the roundoff error distribution.
This idea is OK as far as it goes, but there are several important caveats:
We have a serious problem, because nowadays when most people speak of “significant figures” they are referring to a set of rules that require you to keep rounding off until roundoff error is dominant, or at least comparable to the overall uncertainty. This is an abomination, as we discuss in section 16.4.
See section 16.2 and section 18 for a discussion of the mathematical notion of place value and significance.
As discussed in section 4 and section 5.4, there is a crucial distinction between a distribution and some observation drawn from that distribution. An expression of the form 12.3±0.5 clearly refers to a distribution. One problem with the whole idea of significant figures is that in an expression such as x=12.3, you can’t tell whether it is meant to describe a particular observation or an entire distribution over observations. In particular: Does it refer to an indicated value, or to the entire distribution over true values?
A chemistry teacher once asked 1000 colleagues the following question:
Consider an experiment to determine the density of some material: mass = 10.065 g and volume = 9.95 mL Should the answer be reported as 1.01 g/mL or 1.011 g/mL?
Soon another teacher replied
Maybe I missed something, that's a very straightforward problem. The answer should be reported as 1.01 g/mL.
The claim was that since one of the givens is only known to three sig figs, the answer should be reported with only three sig figs, strictly according to the sig-figs rules.
Shortly thereafter, a third teacher chimed in, disagreeing with the previous answers and saying that the answer should be reported as 1.011 g/mL. He asserted that the aforementioned digit-counting rules were «simplistic» and should be discarded in favor of the concept of relative uncertainty. His final answer, however, was expressed in terms of sig figs.
Eventually a fourth teacher pointed out that if you do the math carefully, you find that 1.012 is a better answer than either of the choices offered in the original question.
Remarkably, none of these responses attached an explicit uncertainty to the answer. Apparently they all hoped we could estimate uncertainty using the “sig figs” doctrine. As a result, we don’t know whether 1.01 means 1.01(½) or 1.01(5). That’s distressingly indefinite.
At this point you may be wondering whether this ambiguity is the whole problem. Perhaps we should accept all three answers – 1.01(½), 1.011(5), and 1.012(5) – since they are all close together, within the stated error bars.
Well, sorry, that doesn’t solve the problem. First of all, the ambiguity is a problem unto itself, and secondly there is a deeper problem that should not be swept under the rug of ambiguity.
The deeper problem is that if you solve the problem properly – for instance using the “crank three times” method as described in section 6.13 – you find it might be reasonable to report a density of 1.0116(5) g/mL, which is a very different answer, as you can see in figure 53.
In the previous paragraph, and in the next several paragraphs, we assume the mass and volume started out with a half-count of absolute uncertainty, such as might result from roundoff. Specifically, if we do the calculation properly, we have:
 (86) 
Note that if we count the significant digits and compare the mass to the volume, the mass has two digits more. In contrast, in terms of relative uncertainty, the mass has only one order of magnitude less. This gross discrepancy between the number of sig figs and the relative uncertainty is discussed in section 7.5.3. Given the shape of figure 42, and given a mass just above 10 and a volume just below 10, you would expect a fiasco if you try to do this calculation using significant figures.
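The density calculation can be checked numerically. The following is a minimal sketch in the spirit of the crank-three-times idea: evaluate the formula at the nominal inputs, then again with each input shifted up and down by its uncertainty. The half-count uncertainties follow the assumption stated above. Combining the two contributions in quadrature is an additional assumption made here, namely that the two roundoff errors are independent.

```python
import math

def density(mass_g, volume_ml):
    return mass_g / volume_ml

mass, d_mass = 10.065, 0.0005      # grams; half a count in the last digit
volume, d_volume = 9.95, 0.005     # mL; half a count in the last digit

nominal = density(mass, volume)

# Crank the calculation again with each input shifted up and down.
shift_from_mass = max(abs(density(mass + s, volume) - nominal)
                      for s in (d_mass, -d_mass))
shift_from_volume = max(abs(density(mass, volume + s) - nominal)
                        for s in (d_volume, -d_volume))

# Assumption: the two roundoff errors are independent, so add in quadrature.
total = math.hypot(shift_from_mass, shift_from_volume)

print(f"relative uncertainty of mass:   {d_mass / mass:.1e}")
print(f"relative uncertainty of volume: {d_volume / volume:.1e}")
print(f"density = {nominal:.4f} +/- {total:.4f} g/mL")   # 1.0116 +/- 0.0005
```

The volume dominates: its relative uncertainty is about ten times that of the mass, even though the mass carries two more digits.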
Figure 53 shows several of the probability distributions we are considering. It shows each distribution as a histogram. (This is the same style of diagram as figure 21.)
This example illustrates the following point:

Tangential discussion: To make the discussion more complete, we temporarily switch assumptions and assume that the given mass and volume started out with five counts of uncertainty in the last decimal place, such as might result from the sensible laboratory practice of recording all the certain digits plus one estimated digit. Under the new assumptions, 1.012(5) is the best answer, although 1.011(5) and even 1.01(½) might be considered acceptable. These three answers agree within the stated error bars.
Note that no matter which assumption you make, it is hard to justify the unadorned answer 1.01 – which is the answer that comes directly from applying the sig figs rules. If the givens have a half-count of uncertainty, the right answer cannot be expressed as 1.01 with a half-count of uncertainty ... and if the givens have a few counts of uncertainty, the right answer cannot be expressed as 1.01 with a few counts of uncertainty. There is no way that 1.01 can be a good answer if you want your numbers to have any consistent meaning.
Recall that uncertainty is not the same as insignificance; see section 6.11, section 7.7, and section 11 especially figure 50 in section 13.
The usual “sig figs rules” cause you to round things off far too much. If possible, do not round intermediate results at all. If you must round, keep at least one guard digit.
As an illustration of the harm that “sig figs” can cause, let’s redo the calculation in section 6.20. The only difference is that when we compute the quotient, 11.5136, we round it to two digits ... since after all it was the result of an operation involving a two-digit number. That gives us 12, from which we subtract 9.064 to obtain the final “result” ... either 2.9 or 3. Unfortunately neither of these results is correct. Not even close.
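The arithmetic in the preceding paragraph is easy to reproduce. This short sketch uses only the two numbers quoted there (the quotient 11.5136 and the subtrahend 9.064); the variable names are chosen for illustration.

```python
quotient = 11.5136        # intermediate result quoted in the text
subtrahend = 9.064

careful = quotient - subtrahend            # keep all the digits
rounded = round(quotient) - subtrahend     # quotient rounded to two digits, i.e. 12

print(f"careful: {careful:.4f}")                                  # 2.4496
print(f"rounded: {rounded:.3f}")                                  # 2.936
print(f"error introduced by rounding: {rounded - careful:.4f}")   # 0.4864
```

Rounding the intermediate result shifts the answer by nearly half a unit, and that shift was entirely avoidable.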
Oddly enough, folks who believe in significant digits typically use them to represent uncertainty. Hmmmm. If they use significant digits to represent uncertainty, what kind of digits do they use to represent significance?
Reference 30 gives additional examples. It summarizes by saying: “The examples show that the conventional rules of thumb for propagating significant figures frequently fail.”
It is sometimes claimed that the sig-digs rules are only intended to give a “rough” estimate of the uncertainty. That sort of apology is crazy and very unhelpful, because even if you believe what it says, it doesn’t make it OK to use sig figs.
Keep in mind that sig figs cause multiple practical problems and multiple conceptual problems, as discussed in section 1.3. Apologizing for the “rough uncertainty” tends to make people lose sight of all the other problems that sig figs cause.
Even if we (temporarily!) focus just on the uncertainty, the apology is often not acceptable, because the so-called “rough” estimate is just too rough. Even ignoring the sectarian differences discussed in section 16.5.1, the “sig-digs rules” convey at best only a range of uncertainties. The top of the range has ten times more uncertainty than the bottom of the range. If you draw the graph of two distributions, one of which is tenfold lower and tenfold broader than the other, you will see that they don’t resemble each other at all. They are radically different distributions. Compare figure 21.
If you do your work even moderately carefully, you will know your uncertainties much more precisely than that. Furthermore, if you are doing data analysis with anything resembling professionalism and due diligence, you will need to know your uncertainties much more precisely than that. One reason is that you will be using weighted averaging and weighted curve fitting – weighted inversely according to the variance – and accurate weighting is important. This leads us yet again to a simple conclusion: Don’t use significant figures. Instead, follow the guidelines in section 7.2.
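To see why accurate weighting matters, consider a minimal sketch of inverse-variance weighted averaging. The two measurements and their uncertainties are made up for illustration, and the helper name weighted_mean is not from the text. The only thing that changes between the two calls is whether the first uncertainty is read at the bottom or at the top of a tenfold range.

```python
def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its standard uncertainty."""
    weights = [1.0 / s ** 2 for s in sigmas]
    total_weight = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total_weight
    return mean, total_weight ** -0.5

values = [10.0, 11.0]   # two hypothetical measurements of the same quantity

# Same values; the first uncertainty is taken as 0.1, then as ten times larger.
m1, s1 = weighted_mean(values, [0.1, 0.3])
m2, s2 = weighted_mean(values, [1.0, 0.3])

print(f"first uncertainty 0.1: {m1:.3f} +/- {s1:.3f}")
print(f"first uncertainty 1.0: {m2:.3f} +/- {s2:.3f}")
```

A tenfold change in one uncertainty is a hundredfold change in its weight, which is enough to move the combined result from near the first measurement to near the second.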
Returning now to even larger issues: Given something that is properly expressed in the form A±B, sig figs do a lousy job of representing the nominal value A ... not just the uncertainty B. This is important!
To say the same thing another way: The sig figs rules forbid people to use enough guard digits. They require too much rounding. They require excessive roundoff error.
This is a big deal, because all too often, the “sigfigs rules” are taught as if they were mandatory, to the exclusion of any reasonable way of doing business. It is really quite astonishing what some authors say about the “importance” of sig figs.
In addition to the immediate, practical, quantitative damage that sig figs do to the values of A and B, sig figs also lead to multiple conceptual problems, as mentioned in section 1.3.
The “significant digits rules” cannot represent the uncertainty more accurately than the nearest power of ten. For example, they represent the distribution 45±3 in exactly the same way as the distribution 45±1, but as we can see in figure 21, these are markedly different distributions. In the figure, the heavy black curve represents 45±1 while the thin green curve represents 45±3. These curves certainly look different. In this example the uncertainties differ by a factor of three; if the difference had been closer to a factor of ten the contrast would have been even more extreme.
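The contrast between 45±1 and 45±3 can also be checked numerically. The sketch below assumes both distributions are Gaussian with the stated uncertainty as the standard deviation; that is an assumption made for this illustration, and the sample points are chosen arbitrarily.

```python
import math

def gaussian_pdf(x, mean, sigma):
    """Probability density of a Gaussian with the given mean and sigma."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

for x in (45.0, 47.0, 51.0):
    narrow = gaussian_pdf(x, 45.0, 1.0)   # 45 ± 1
    broad = gaussian_pdf(x, 45.0, 3.0)    # 45 ± 3
    print(f"x = {x}: narrow {narrow:.2e}, broad {broad:.2e}")
```

At the center the narrow distribution is three times taller. Six units away the broad distribution is larger by a factor of more than a million. Treating the two as interchangeable discards that information.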
Within the sig-digs cult, there are sects that hold mutually-incompatible beliefs. There is no consensus. You cannot get a group of teachers to agree within an order of magnitude what “significant figures” mean.
That makes a certain amount of sense when you are recording readings from laboratory apparatus and instruments. The point is that you want the quantization error (i.e. roundoff error) to be smaller than the intrinsic uncertainty of the instrument. You want the uncertainty of the recorded reading to be dominated by the intrinsic uncertainty of the instrument, and not needlessly increased by rounding.
As is always the case with any form of significant digits, we run into trouble because of the coarseness of the encoding; it is impossible to know by looking at the number how much uncertainty there is in the last digit.
Things get even worse when we consider calculated (rather than observed) numbers. For example, consider the distribution 5.123(9). Nine counts of uncertainty in the third decimal place not only makes the third place uncertain, it makes the second place “somewhat” uncertain. There is no logical basis for deciding how much uncertainty is “too much”, i.e. deciding when to drop a digit.
For present purposes, let’s assume that this sect puts the cutoff just shy of ten counts, so that 1.234(9) will be expressed as 1.234, while 1.234(10) will be rounded to 1.23. (We ignore subsects that put the cutoff elsewhere.)
This sect has the advantage, relatively speaking, of requiring less rounding than the other sects mentioned below ... but in absolute terms it still requires too much rounding. It can seriously degrade your data, as discussed in section 6.11.
This rule actually makes sense provided you know that the quantity has been rounded off, and that roundoff error is the dominant contribution to the uncertainty.
On the other hand, there are innumerable important situations where roundoff should not be the dominant contribution, in which case this is the worst of all the sects. It is the most destructive, because it demands the most rounding. It demands an order of magnitude more rounding than the few-count sect. It basically forces you to keep rounding off until the roundoff error becomes a large contribution to the uncertainty.
Let’s try applying these “rules” and see what happens. Some examples are shown in the following table.
numeral as written:   0.10                     0.99
multicount sect:      0.100(10) ⋯ 0.100(99)    0.990(10) ⋯ 0.990(99)
percent sect:         0.100(1) ⋯ 0.100(10)     0.990(10) ⋯ 0.990(99)
half-count sect:      0.100(5)                 0.990(5)
overall range:        0.100(1) ⋯ 0.100(99)     0.990(5) ⋯ 0.990(99)
Let’s consider 0.10, as shown in the table. If we interpret 0.10 according to the multicount sect’s rules, we get something in the range 0.100(10) to 0.100(99). Meanwhile, if we interpret that according to the percent sect’s rules, we get something in the range 0.100(1) to 0.100(10). Ouch! These two sects don’t even overlap; that is, they don’t have any interpretations in common, except on a set of measure zero. Last but not least, the half-count sect interprets 0.10 as 0.100(5), which is near the middle of the range favored by the percent sect ... and far outside the range favored by the multicount sect.
Next, let’s consider 0.99. If we interpret 0.99 according to the multicount sect’s rules, we get something in the range 0.990(10) to 0.990(99). Meanwhile, if we interpret it according to the percent sect’s rules and convert to professional notation, we get something in the range 0.990(10) to 0.990(99). So these two sects agree on the interpretation of this number. However, the half-count sect interprets 0.99 as 0.990(5), which is somewhere between 2x and 20x less uncertainty than the other sects would have you believe.
As shown in the bottom row of the table, when we take sectarian differences into account, there can be two orders of magnitude of vagueness as to what a particular number represents. If you draw the graph of two distributions, one of which is a hundredfold lower and a hundredfold broader than the other, the difference is shocking. It’s outrageous. You cannot possibly consider one to be a useful approximation to the other.
Consider the notion that one inch equals some number of centimeters. If you adhere to the sig-figs cult, how many digits should you use to express this number? It turns out that the number is 2.54, exactly, by definition. Unless you want to write down an infinite number of digits, you are going to have to give up on the idea of sig figs and express the uncertainty separately, as discussed in section 7.2.
Suppose you see the number 2.54 in the display of a calculator. How much significance attaches to that number? You don’t know! Counting digits will not tell you anything about the uncertainty. Calculators are notorious for displaying large numbers of insignificant digits, so counting digits might cause you to seriously underestimate the uncertainty (i.e. overestimate the precision). On the other hand, 2.54 might represent the centimeter-per-inch conversion factor, in which case it is exact, and counting digits will cause you to spectacularly overestimate the uncertainty (i.e. underestimate the precision).
A number such as 4.32±.43 expresses an absolute uncertainty of .43 units. A number such as 4.32±10% expresses a relative uncertainty of 10%. Both of these expressions describe nearly the same distribution, since 10% of 4.32 is nearly .43.
Sometimes relative uncertainty is convenient for expressing the idea behind a distribution, sometimes absolute uncertainty is convenient, and sometimes you can do it either way.
It is interesting to consider the category of null experiments, that is, experiments where the value zero lies well within the distribution that describes the results. Null experiments are fairly common, and some of them are celebrated as milestones or even turning-points in the history of science. Examples include the difference between gravitational and inertial mass (Galileo, Eötvös, etc.), the luminiferous ether (Michelson and Morley), the mass of the photon, the rate-of-change of the fine-structure constant and other fundamental “constants” over time, et cetera.
The point of a null experiment is to obtain a very small absolute uncertainty.
Suppose you redo the experiment, improving your technique by a factor of ten, so that the absolute uncertainty σ_{A} of the result goes down by a factor of ten. You can expect that the mean value of the result m_{A} will also go down by a factor of ten, roughly. So to a rough approximation the relative uncertainty is unchanged, even though you did a much better experiment.
On closer scrutiny we see that the idea of relative uncertainty never did make much sense for null experiments. For one thing, there is always the risk that the mean value m_{A} might come out to be zero. (In a counting experiment, you might get exactly zero counts.) In that case, the relative uncertainty is infinite, and certainly doesn’t tell you anything you need to know.
Scientists have a simple and commonsensical solution: In such cases they quote the absolute uncertainty, not the relative uncertainty.
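A small numerical sketch makes the point concrete. The sequence of results below is made up for illustration: each successive run is assumed to shrink both the mean and the absolute uncertainty tenfold, and the last run is assumed to land exactly on zero. The helper name relative_uncertainty is likewise invented for this example.

```python
def relative_uncertainty(mean, sigma):
    """Return sigma/|mean|, or infinity when the mean is exactly zero."""
    return float("inf") if mean == 0 else sigma / abs(mean)

# Hypothetical null results as (mean, absolute uncertainty) pairs.
runs = [(0.012, 0.034), (0.0012, 0.0034), (0.00012, 0.00034), (0.0, 0.000034)]

for mean, sigma in runs:
    rel = relative_uncertainty(mean, sigma)
    print(f"absolute uncertainty: {sigma:.6f}   relative uncertainty: {rel:.2f}")
```

The absolute uncertainty tracks the improvement in the experiment. The relative uncertainty stays put and then becomes infinite, so it conveys nothing about how good the experiment was.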
Life is not so simple if you adhere to the sig-figs cult. The problem is that the sig-figs rules always express relative uncertainty.
To put an even finer point on it, consider the case where the relative uncertainty is greater than 100%, which is what you would expect for a successful null experiment. For concreteness, consider .012±.034. How many digits should be used to express such a result? Let’s consider the choices:
Bottom line: There is an important class of distributions that simply cannot be described using the significant-figures method. This includes distributions that straddle the origin. Such distributions are common; indeed they are expected in the case of null experiments.
In addition to distributions that straddle the origin (as discussed in section 16.5.3), there are some that do not straddle the origin but are nevertheless so broad that they cannot be well described using significant digits.
Let’s look again at the example of the six-sided die, as depicted in figure 7. The number of spots can be described by the expression x=3.5±2.5. There is just no good way to express this using significant figures. If you write x=3.5, those who believe in sig figs will interpret that as perhaps x=3.5(½) or x=3.5(5) or somewhere in between … all of which greatly understate the width of the distribution. If you round off to x=3, that would significantly misstate the center of the distribution.
As a second example, let’s look again at the result calculated in section 6.20, namely 2.4(8). Trying to express this using sig digs would be a nightmare. If you write it as 2.4 and let the reader try to infer how much uncertainty there is, the most basic notions of consistency would suggest that this number has about the same amount of uncertainty as the two-digit number in the statement of the problem ... but in fact it has a great deal more, by a ratio of about eight to three. That is, any consistently-applied sig-digs rule understates the uncertainty of this expression. The right answer is about 260% of the “sig-figs answer”.
Note that the result 2.4(8) has eight counts of uncertainty in the last digit. Another way of saying the same thing is that there is 32% relative uncertainty. That’s so much uncertainty that if you adhere to the percent sect (as defined in section 16.5.1) you are obliged to use only one significant digit. That means converting 2.4 to 2. That result differs from the correct value by 57% of an error bar, which is a significant degradation of your hard-won data, in the sense that the distribution specified by 2.45(79) is just not the same as a distribution centered on 2, no matter what width you attach to the latter.
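The percentages quoted above follow directly from the distribution 2.45(79). Here is the arithmetic as a short sketch; the variable names are chosen for illustration.

```python
value, sigma = 2.45, 0.79     # the distribution 2.45(79) quoted in the text

relative = sigma / value                       # relative uncertainty
one_digit = round(value)                       # one significant digit: 2
shift_in_error_bars = abs(value - one_digit) / sigma

print(f"relative uncertainty: {relative:.0%}")                             # 32%
print(f"rounded value: {one_digit}")                                       # 2
print(f"shift caused by rounding: {shift_in_error_bars:.0%} of an error bar")  # 57%
```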
So we discover yet again that the “sigdigs” approach gives us no reasonable way of expressing what needs to be expressed.
Consider the following contrast:
Suppose some distribution has a nominal value of A and an uncertainty of B. We can write this as A±B, even when we do not yet know the values of A and/or B. We can then find these A and B using algebra.  There is no way to express A±B using significant figures, when A and/or B are abstract or not yet known. 
The same idea applies to electronic computations, including hand calculators, spreadsheets, c++ programs, et cetera. You can use a variable A and a variable B to represent the distribution A±B.  I have never seen a computer represent uncertainty using significant figures. 
To approach the same idea from a different direction:
Often it is important to think about numbers as numbers, without reference to any particular system of numerals.  The notion of significant figures, to the extent that it means anything at all, applies to decimal numerals, not to numbers per se. 
Therefore (unless you are going to forfeit the possibility of doing any algebra or any electronic computation) you need to learn the “±” concept and terminology.
Once you have learned this, you might as well use it for everything, to the exclusion of anything resembling significant figures.
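As a sketch of what “a variable A and a variable B” can look like in a program, here is one possible representation. The class name Distribution and its field names are invented for this illustration. Nothing here depends on how many digits happen to be printed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Distribution:
    """A distribution summarized by two numbers, written A ± B."""
    nominal: float   # A, the nominal value
    sigma: float     # B, the uncertainty

    def relative(self):
        """Relative uncertainty B/|A| (only meaningful when A is not zero)."""
        return self.sigma / abs(self.nominal)

    def __str__(self):
        return f"{self.nominal} ± {self.sigma}"

x = Distribution(2.1, 0.2)
y = Distribution(45.0, 3.0)

print(x)              # 2.1 ± 0.2
print(y.relative())   # relative uncertainty of 45 ± 3
```

The nominal value and the uncertainty are stored separately, at full precision. The number of digits shown is purely a display choice made at print time.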
Suppose somebody asks you what is 4 times 2.1. If you adhere to the sig-figs cult, you can’t tell from the statement of the problem whether the numeral 4 is trying to represent a probability distribution (centered at 4 with one sig-fig of uncertainty), or whether it is meant to be an exact quantity (plain old 4).
Occasionally somebody tries to distinguish these two cases by making a fuss about units. The idea apparently is that all inexact quantities are measured and have units, and conversely all quantities with units are measured and therefore inexact. Well, this idea is false. Both the obverse and converse are false.
For example:
To summarize: Dimensionless does not imply exact. Exact does not imply dimensionless. Trying to estimate uncertainty by counting the digits in a numeral is a guaranteed losing proposition, and making a fuss about units does not appreciably alleviate the problem.
There is no mathematical principle that associates any uncertainty with a decimal numeral such as 2.54. On the contrary, 2.54 is defined to be a rational number, i.e. the ratio of two integers, in this case 254/100 or in lowest terms 127/50. In such ratios, the numerator is an exact integer, the denominator is an exact integer, and therefore the ratio is an exact rational number.
By way of contrast, sometimes it may be convenient to approximate a rational number; for instance the ratio 173/68 may be rounded off to 2.54(½) if you think the roundoff error is unimportant in a given situation. Still, the point remains that 2.54(½) is not the same thing as 2.54.
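The distinction between an exact decimal numeral and a rounded-off ratio can be checked with exact rational arithmetic. The following is only a sketch using Python's standard `fractions` module; the variable names are chosen for this illustration.

```python
from fractions import Fraction

# The numeral 2.54, taken at face value, is an exact rational number.
exact = Fraction("2.54")
print(exact)                      # 127/50

# The ratio 173/68 is a different number that happens to round to 2.54.
ratio = Fraction(173, 68)
rounded = round(float(ratio), 2)  # rounding to two decimal places
roundoff_error = ratio - exact    # exact difference, as a rational number
print(rounded, roundoff_error)    # 2.54 7/1700
```

The roundoff error is small but not zero, which is the whole point: the rounded quantity and the exact quantity are not the same thing.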
Once I was discussing a distribution that had been calculated to be x=2.1(2). A sigfigs partisan objected that sometimes you don’t know that the uncertainty is exactly 0.2 units, and in such a case it was preferable to write x=2.1 using sig figs, thereby making a vague and ambiguous statement about the uncertainty. The fact that nobody knows what the sig figs expression really means was claimed to be an advantage in such a case. Maybe it means x=2.1(½), or maybe x=2.1(5), or maybe something else.
There are several ways of seeing how silly this claim is. First of all, even if the claim were technically true, it would not be worth learning the sigfigs rules just to handle this unusual case.
Secondly, nobody ever said the uncertainty was “exactly” 0.2 units. In the expression x=2.1(2), nobody would interpret the (2) as being exact, unless they already belonged to the sigfig cult. The rest of us know that the (2) is just an estimate.
Thirdly, it is true that the notation x=2.1(2) or equivalently x=2.1±0.2 does not solve all the world’s problems. However, if that notation is problematic, the solution is not to switch to a worse notation such as sig figs. Instead, you should switch to a better notation, such as plain language. If you don’t have a good handle on the uncertainty, just say so. For example, you could say “we find x=2.1. The uncertainty has not been quantitatively analyzed, but is believed to be on the order of 10%”. This adheres to the wise, simple rule:
Say what you mean, and mean what you say.
Sig figs neither say what they mean nor mean what they say.
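You should make it a habit to use the quadratic formula in the form of equation 87 (instead of equation 90), especially in situations where there might be one big root and one small root. This is a vastly more robust algorithm, especially in the big-root / small-root situation:
$$ x_{\text{big}} = \frac{-b - \operatorname{sgn}(b)\,\sqrt{b^{2}-4ac}}{2a}, \qquad x_{\text{small}} = \frac{c}{a\,x_{\text{big}}} \qquad (b \neq 0) \qquad (87) $$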
Similar reasoning applies to equation 36.
You never need equation 90, but you can get away with using it in situations where you know the two roots are a complex-conjugate pair, or are real and close together. To say the same thing another way, you can get away with using equation 90 provided the discriminant b^{2}−4ac is either negative or small compared to b^{2}.
The rationale behind equation 87 is easy to understand:
Note that the function sgn(b) is defined according to equation 88. It is pronounced “signum”. That’s the Latin word for “sign” (as in positive sign or negative sign) but we pronounce it “signum” so it doesn’t rhyme with “sine”.
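$$ \operatorname{sgn}(b) := \begin{cases} +1 & \text{if } b > 0 \\ \phantom{+}0 & \text{if } b = 0 \\ -1 & \text{if } b < 0 \end{cases} \qquad (88) $$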
However, the b=0 case is explicitly excluded from equation 87. When b is zero, the solution to the quadratic is obviously:
$$ x = \pm\sqrt{-c/a} \qquad (89) $$
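The robust procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive implementation: the function names are hypothetical, only real roots are handled, and the test coefficients (a, b, c) = (1, 10^8, 1) are made up to produce one big root and one small root. The small root is obtained from the product of the roots, c/a, rather than from a subtraction.

```python
import math


def solve_quadratic_robust(a, b, c):
    """Real roots of a*x**2 + b*x + c = 0, returned as (x_big, x_small)."""
    disc = b * b - 4.0 * a * c
    if disc < 0:
        raise ValueError("complex roots are not handled in this sketch")
    if b == 0:
        r = math.sqrt(-c / a)          # the b = 0 special case
        return r, -r
    # -b and -sgn(b)*sqrt(disc) have the same sign, so nothing cancels.
    q = -b - math.copysign(math.sqrt(disc), b)
    x_big = q / (2.0 * a)
    x_small = c / (a * x_big)          # product of the roots equals c/a
    return x_big, x_small


def solve_quadratic_textbook(a, b, c):
    """Textbook form; the '+' branch subtracts nearly equal numbers when b > 0."""
    s = math.sqrt(b * b - 4.0 * a * c)
    return (-b + s) / (2.0 * a), (-b - s) / (2.0 * a)


# Made-up coefficients: the exact roots are very nearly -1e8 and -1e-8.
x_big, x_small = solve_quadratic_robust(1.0, 1e8, 1.0)
naive_plus, naive_minus = solve_quadratic_textbook(1.0, 1e8, 1.0)
print("robust  :", x_big, x_small)
print("textbook:", naive_minus, naive_plus)
```

In double-precision arithmetic the textbook small root is visibly wrong for these coefficients, while the robust small root agrees with −10^{−8} to essentially full machine precision.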
By way of contrast, let’s see what happens if we try to solve equation 14 using the “textbook” version of the quadratic formula, namely:
$$ x = \frac{-b \pm \sqrt{b^{2}-4ac}}{2a} \qquad (90) $$
where in this case the variables are:
 (91) 
Let’s do a numerical example, in the case where the acid is strong but moderately dilute:
 (92) 
We are talking about a hypothetical acid. Let’s assume we arrived at the K_{a} value by taking the average of various estimates. There is a huge amount of uncertainty in the resulting K_{a} value, easily ±1×10^{4} or even more. The uncertainty in the concentration is negligible by comparison. Plugging the K_{a} and C_{HA} numbers into equation 90, we get
 (93) 
Now some people might decide on the basis of «common sense» that the number inside the square root could be rounded off to 3.210×10^{9}. The uncertainty is so large that the sigfigs rules require us to round this number to a single digit, so carrying three extra digits «should» be plenty, or so the story goes. So let’s try rounding off and see what happens when we continue the calculation:
 (94) 
which is just completely wrong. Both of the alleged roots of the quadratic are negative. It is physically impossible for the [H^{+}] concentration to be negative.
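The failure can be reproduced numerically. The sketch below rests on several assumptions that are not taken from the text above: the values of `K_a` and `C_HA` are hypothetical placeholders, and the coefficients assume the quadratic has the form x^{2} + K_{a}x − K_{a}C_{HA} = 0 with x standing for [H^{+}]. It rounds the number inside the square root to four digits, applies the textbook formula, and compares with the robust form that uses the unrounded value.

```python
import math

# Hypothetical inputs, chosen only for illustration.
K_a = 5.666e4     # assumed acid dissociation constant
C_HA = 1.0e-6     # assumed total acid concentration

# Assumed quadratic: x**2 + K_a*x - K_a*C_HA = 0, with x = [H+].
a, b, c = 1.0, K_a, -K_a * C_HA

disc = b * b - 4.0 * a * c
disc_rounded = float(f"{disc:.3e}")   # keep only four digits


def textbook_roots(a, b, disc):
    """Textbook quadratic formula, given the number inside the square root."""
    s = math.sqrt(disc)
    return (-b + s) / (2.0 * a), (-b - s) / (2.0 * a)


rounded_roots = textbook_roots(a, b, disc_rounded)

# Robust form with the unrounded discriminant, for comparison.
x_big = (-b - math.copysign(math.sqrt(disc), b)) / (2.0 * a)
x_small = c / (a * x_big)

print("rounded discriminant:", disc_rounded)
print("textbook roots after rounding:", rounded_roots)
print("robust small root:", x_small)
```

With these placeholder values, both roots computed from the rounded discriminant come out negative, while the robust small root is positive and very close to the assumed concentration, as expected for a strong acid.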
Analysis: It turns out that the «common sense» roundoff leading to equation 94 was a disaster. In this situation:
There exists a purely mathematical concept of “place value” which is related to the concept of significance. We mention it only for completeness, because it is never what chemistry textbooks mean when they talk about “significant digits”.
For example, in the numeral 12.345, the “1” has the highest place value, while the “5” has the lowest place value.
Sometimes the term “significance” is used to express this mathematical idea. For example, in the numeral 12.345, the “1” is called the most-significant digit, while the “5” is called the least-significant digit. These are relative terms, indicating that the “1” has relatively more significance, while the “5” has relatively less significance. We have no way of knowing whether any of the digits has any absolute significance with respect to any real application.
This usage is common, logical, and harmless. However, since the other usages of the term “significant digit” are so very harmful, it may be prudent to avoid this usage as well, especially since some attractive alternatives are available. One option is to speak of place value (rather than significance) if that’s what you mean.
Another option is to speak of mantissa digits. For example, if we compare 2.54 with 2.5400, the trailing zeros have no effect on the mantissa. (In fact, they don’t contribute to the characteristic, either, so they are entirely superfluous, but that’s not relevant to the present discussion.) Similarly, if we compare 2.54 to 002.54, the leading zeros don’t contribute to the mantissa (or the characteristic).
It is more interesting to compare .0254 with .000254. In this case, the zeros do not contribute to the mantissa (although they do contribute to the characteristic, so they are not superfluous). This is easy to see if we rewrite the numbers in scientific notation, comparing 2.54×10^{−2} versus 2.54×10^{−4}.
To make a long story short, the mantissa digits are all the digits from the leftmost nonzero digit to the rightmost nonzero digit, inclusive. For example, the number 0.00008009000 has four mantissa digits, from the 8 to the 9 inclusive. In more detail, we say it has a superfluous leading zero, then four placeholder digits, then four mantissa digits, then four superfluous trailing zeros.
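The counting rule just stated is simple enough to write down as code. The following is a sketch only; the function name `count_mantissa_digits` is hypothetical, and it accepts only plain decimal numerals (an optional sign, digits, and at most one decimal point), not scientific notation.

```python
def count_mantissa_digits(numeral: str) -> int:
    """Count digits from the leftmost nonzero digit to the rightmost, inclusive."""
    body = numeral.strip().lstrip("+-")
    if body.count(".") > 1:
        raise ValueError(f"not a plain decimal numeral: {numeral!r}")
    digits = body.replace(".", "")
    if not digits.isdigit():
        raise ValueError(f"not a plain decimal numeral: {numeral!r}")
    # Stripping zeros from both ends leaves exactly the mantissa digits.
    return len(digits.strip("0"))


for numeral in ["0.00008009000", "2.54", "2.5400", "002.54", ".0254", ".000254"]:
    print(numeral, count_mantissa_digits(numeral))
```

The first numeral gives 4 and all the others give 3, in agreement with the examples above.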
Keep in mind that the number of mantissa digits does not tell you anything about the uncertainty, accuracy, precision, readability, reproducibility, tolerance, or anything like that. If you see a number with N digits of mantissa, it does not imply or even suggest that the number was rounded to N digits; it could well be an exact number, as in 2.54 centimeters per inch or 2.99792458×10^{8} meters per second.
When the number system is taught in elementary school, mantissa digits are called “significant digits”. This causes conflict and confusion when the high-school chemistry text uses the same term with a different meaning. For example, some people would say that 0.025400 has three significant digits, while others would say it has five significant digits. I don’t feel like arguing over which meaning is “right”. Suggestions:
This section continues the discussion that began in section 4.7. It makes the point that the relationship between indicated value and true value does not need to be simple or evenly spaced.
Suppose you wanted to measure some 5% resistors and sort them into bins. The industry-standard bin labels are given in the following table, along with the corresponding intervals:
indicated value : range of true values
1.0 : [0.95, 1.05]
1.1 : [1.05, 1.15]
1.2 : [1.15, 1.25]
1.3 : [1.25, 1.4]
1.5 : [1.4, 1.55]
1.6 : [1.55, 1.7]
1.8 : [1.7, 1.9]
2.0 : [1.9, 2.1]
2.2 : [2.1, 2.3]
2.4 : [2.3, 2.55]
2.7 : [2.55, 2.85]
3.0 : [2.85, 3.15]
3.3 : [3.15, 3.45]
3.6 : [3.45, 3.75]
3.9 : [3.75, 4.1]
4.3 : [4.1, 4.5]
4.7 : [4.5, 4.9]
5.1 : [4.9, 5.34]
5.6 : [5.34, 5.89]
6.2 : [5.89, 6.49]
6.8 : [6.49, 7.14]
7.5 : [7.14, 7.79]
8.2 : [7.79, 8.59]
9.1 : [8.59, 9.54]
10. : [9.54, 10.49]
It may not be obvious at first, but this table does have a somewhat logical basis. Roughly speaking, it comes from rounding the readings to the nearest 1/24th of 20dB, exponentiating, and then rounding to one decimal place. For what it’s worth, note that even in the absence of roundoff, it would be barely possible to cover the entire decade and still keep all the readings within 5% of the nominal bin label. That’s because 1.05 is too small and/or 24 is too few. Roundoff makes it impossible. One consequence is that if you want a resistance of 1.393 kΩ, you cannot approximate it within 5% using any standard 5% resistor. You can’t even approximate it within 7%.
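The two numerical claims in this paragraph can be checked with a short calculation. This is a sketch under assumptions: the list `E24` is the standard 24-value series written out from memory, the function name `best_standard_match` is hypothetical, and the relative error is measured with respect to the nominal bin label (as in "within 5% of the nominal bin label").

```python
# Standard 5% (E24) bin labels for one decade, plus the start of the next.
E24 = [1.0, 1.1, 1.2, 1.3, 1.5, 1.6, 1.8, 2.0, 2.2, 2.4, 2.7, 3.0,
       3.3, 3.6, 3.9, 4.3, 4.7, 5.1, 5.6, 6.2, 6.8, 7.5, 8.2, 9.1]


def best_standard_match(target):
    """Return (label, relative_error) for the closest label, error relative to the label."""
    candidates = E24 + [10.0]
    return min(((v, abs(target - v) / v) for v in candidates),
               key=lambda pair: pair[1])


label, err = best_standard_match(1.393)
print(label, f"{err:.2%}")        # 1.5 7.13%

# Without roundoff, 24 bins each spanning 0.95 to 1.05 times the label
# cover a factor of (1.05/0.95)**24, which must exceed 10 to span a decade.
coverage = (1.05 / 0.95) ** 24
print(f"{coverage:.2f}")
```

The coverage factor comes out only a little larger than 10, which is the sense in which covering the decade is barely possible even before the labels are rounded to one decimal place.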
This is sometimes called “the train book” because of the cover, which features a crashed train at the Gare Montparnasse, 22 October 1895. It’s a beautiful photograph, but alas it conveys completely the wrong idea about what we mean by “error” in the context of error analysis, as discussed in section 4.8.
In the first 70 pages, the book contains many formulas, none of which can safely be applied to real data, as far as I can tell.