
Data Provenance and Graph Interpretation

1  How to Read the Graphs

Figure 1: US Coronavirus Daily Deaths

This diagram shows four main quantities of interest:

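Note: The quantities just mentioned can be expressed in absolute terms (deaths) or in per-capita terms (deaths per million population). The logarithmic derivative is the same whichever normalization you use, because dividing by a constant population only shifts the logarithm by a constant, and a constant vanishes when differentiated.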

The acceleration is the rate-of-change of the height of the curve (i.e. the daily death rate). In other words, it is the rate-of-change of the rate-of-change of the area under the curve (i.e. the cumulative death toll). This is calculated by fitting an exponential, so the acceleration reflects not absolute change but rather the relative change, which is very nearly the percentage change. In mathematical terms, the acceleration is the logarithmic derivative of the death rate. It is measured in cNp per day. (One cNp is very nearly one percent. See section 4 for an explanation.)
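To make the definition concrete, here is a minimal sketch of one way to get an acceleration in cNp per day: fit a straight line to the natural log of the daily death counts by ordinary least squares, and take the slope. This is an assumption about the method, not necessarily the fitting procedure behind Figure 1 (which might, for example, weight days by their Poisson uncertainty). The function name `acceleration_cnp_per_day` is hypothetical, and every daily value must be strictly positive because the logarithm of zero is undefined.

```python
import math

def acceleration_cnp_per_day(daily_deaths):
    """Hypothetical helper: fit ln(D) = a + k*t by ordinary least squares
    and return the slope k in cNp per day.

    Assumes one value per consecutive day, all strictly positive.
    """
    n = len(daily_deaths)
    if n < 2:
        raise ValueError("need at least two days of data")
    ts = list(range(n))
    logs = [math.log(d) for d in daily_deaths]
    t_mean = sum(ts) / n
    y_mean = sum(logs) / n
    # Least-squares slope of ln(D) versus t, in nepers per day.
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, logs))
    den = sum((t - t_mean) ** 2 for t in ts)
    k_nepers_per_day = num / den
    return 100.0 * k_nepers_per_day  # 1 Np = 100 cNp

# Synthetic data (not real counts) growing at exactly 5 cNp per day.
absolute = [100.0 * math.exp(0.05 * t) for t in range(14)]
per_million = [d / 330.0 for d in absolute]  # hypothetical population in millions
print(acceleration_cnp_per_day(absolute))
print(acceleration_cnp_per_day(per_million))
```

Both calls print 5.0 (up to rounding), which illustrates the point made in the note above: the choice between absolute and per-capita normalization does not affect the logarithmic derivative.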

A negative acceleration is synonymous with a deceleration.

Curve-fitting is a powerful technique for averaging out the meaningless day-to-day variations. The fitted exponential is easier to interpret than the raw data, as discussed in section 3.2.

2  Objective

The main purpose in looking at graphs like this is to help decide what’s good policy and what’s bad policy. When doing so, keep in mind that (a) there are other sources of information (notably observations of countries that have been successful in suppressing the virus), and (b) this data is imperfect, as we now discuss.

3  Caveats

3.1  The Data is Not Timely

The data is seriously delayed, and arrives in batches. Specifically:

  1. Deaths are a lagging indicator. After people are exposed, it takes them a while before they get sick enough to die.
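  2. Deaths are not promptly reported. Most deaths that have occurred in the last week have not been reported at all. This is even worse than noise, because it introduces bias into the data: the most recent days are systematically too low, rather than randomly scattered above and below the truth.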
  3. Reports are sketchy to nonexistent on weekends. On Mondays, we see the reports from Sunday, which are anomalously low. On Tuesdays and perhaps Wednesdays, they play catch-up.
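The weekend effect in item 3 is a purely day-of-week pattern, so any average over exactly seven consecutive days contains one of each weekday and cancels it. The sketch below shows a 7-day trailing average as one common way to do this; it is a swapped-in illustration, not necessarily what was used for Figure 1, and the function name is hypothetical. It does nothing about items 1 and 2 (lag and unreported recent deaths).

```python
def trailing_7day_average(daily):
    """Hypothetical helper: mean of each complete 7-day window ending on day i.

    Every window holds exactly one of each weekday, so a repeating
    weekly reporting pattern averages out.
    """
    if len(daily) < 7:
        return []
    return [sum(daily[i - 6:i + 1]) / 7 for i in range(6, len(daily))]

# Illustrative (made-up) reports: a steady 100/day, under-reported on the
# weekend-affected days and over-reported during the catch-up days.
weekly_pattern = [110, 110, 110, 110, 110, 70, 80]
reports = weekly_pattern * 3
print(trailing_7day_average(reports))
```

Because this made-up pattern sums to 700 per week, every entry of the smoothed series is 100.0, even though no single raw day equals 100.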

3.2  The Data is Noisy

Imperfect data is better than no data. Everybody, scientists included, makes decisions based on imperfect data all day, every day. We should neither over-react nor under-react to the presence of noise in the data.

A certain amount of the noise is Poisson noise, which is inescapable because we are (thank heavens) working with smallish numbers.
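For a Poisson count the variance equals the mean, so a day expected to have N deaths typically fluctuates by about sqrt(N), which is a relative spread of 1/sqrt(N). The short sketch below (function name hypothetical, counts purely illustrative) shows why smallish numbers are noisy.

```python
import math

def poisson_relative_noise(expected_count):
    """Relative standard deviation of a Poisson count with the given mean.

    Variance equals the mean, so std = sqrt(mean) and std/mean = 1/sqrt(mean).
    """
    return 1.0 / math.sqrt(expected_count)

for n in (25, 100, 2500):
    pct = 100 * poisson_relative_noise(n)
    print(f"{n:5d} expected deaths/day -> +/- {pct:.0f}% typical fluctuation")
```

This prints fluctuations of 20%, 10% and 2% respectively: a county with 25 expected deaths per day will show day-to-day jumps that mean nothing, while a nationwide total in the thousands is much steadier.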

Fitting an exponential to the data is a very powerful way of averaging out the short-term noise. As long as public-health policy doesn’t change, and other “facts on the ground” don’t change, we expect the daily death rate to follow a more-or-less exponential trend. As a corollary, we expect the acceleration (i.e. the logarithmic derivative of the death rate) to be constant. Similarly, we expect the cumulative death toll to be an exponential offset by some constant.
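The two corollaries follow from a one-line calculation. In the sketch below the symbols are introduced here for illustration: D(t) is the daily death rate, C(t) the cumulative death toll, D_0 and C_0 their values at t = 0, and k the acceleration in nepers per day (100k is the same quantity in cNp per day).

```latex
D(t) = D_0\, e^{k t}
\qquad\Longrightarrow\qquad
\frac{d}{dt}\ln D(t) = k \quad \text{(constant acceleration)}

C(t) = C_0 + \int_0^t D(s)\, ds
     = C_0 + \frac{D_0}{k}\left(e^{k t} - 1\right)
     = \frac{D_0}{k}\, e^{k t} + \left(C_0 - \frac{D_0}{k}\right),
\qquad k \neq 0
```

The last form is an exponential with the same rate k, offset by the constant C_0 - D_0/k.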

3.3  Other Types of Data

3.4  Extrapolate at Your Own Risk

You could kinda maybe sorta get away with extrapolating the nationwide trend back in early March, when there was more-or-less one big outbreak. But not any more. Now we have hundreds of smaller outbreaks. The only way to make sense of the situation is to model each outbreak separately, and then add the results.
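Why modeling outbreaks separately matters can be seen with just two of them. For a sum of exponentials, the logarithmic derivative of the total is the death-weighted average of the individual rates, so it drifts as the mix changes even when each outbreak is perfectly exponential. The sketch below uses two made-up outbreaks (the numbers and function names are hypothetical): a large one declining at 5 cNp/day and a small one growing at 3 cNp/day.

```python
import math

def total_deaths(t, outbreaks):
    """Daily deaths on day t, summed over independent exponential outbreaks.

    Each outbreak is a hypothetical (initial_daily_deaths, rate_cnp_per_day) pair.
    """
    return sum(d0 * math.exp(rate / 100.0 * t) for d0, rate in outbreaks)

def aggregate_acceleration(t, outbreaks):
    """Logarithmic derivative of the total, in cNp/day.

    Equals the death-weighted average of the individual rates on day t.
    """
    weighted = sum(d0 * math.exp(rate / 100.0 * t) * rate for d0, rate in outbreaks)
    return weighted / total_deaths(t, outbreaks)

outbreaks = [(200.0, -5.0), (10.0, 3.0)]
for day in (0, 30, 60):
    print(day, round(aggregate_acceleration(day, outbreaks), 2))
```

The aggregate acceleration starts near -4.6 cNp/day and turns positive around day 44, when the growing outbreak starts to dominate. Extrapolating the day-0 aggregate trend would predict a steady decline that never happens.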

Let’s be clear: In general, it is difficult to extrapolate the nationwide or even statewide trends.

4  Growth Rate in Logarithmic Units: cNp = centineper

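One centineper (abbreviated cNp) is one hundredth of a neper: a change of x cNp multiplies a quantity by e^(x/100). When the changes are small, 1 cNp corresponds to very nearly 1%. When the changes are large, centinepers behave much better, because successive changes simply add: two increases of 10 cNp make an increase of 20 cNp, whereas two successive 10% increases make 21%, not 20%. It's the difference between compound interest and simple interest. A more detailed discussion with examples and graphs is available.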
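Converting between the two units is a one-liner each way. The sketch below (function names hypothetical) uses `math.log1p` and `math.expm1`, which stay accurate for small changes.

```python
import math

def percent_to_cnp(percent_change):
    """Centinepers for a given percentage change: 100 * ln(1 + p/100)."""
    return 100.0 * math.log1p(percent_change / 100.0)

def cnp_to_percent(cnp):
    """Percentage change for a given number of centinepers: 100 * (e^(cnp/100) - 1)."""
    return 100.0 * math.expm1(cnp / 100.0)

print(percent_to_cnp(1))      # small change: very nearly 1
print(percent_to_cnp(100))    # doubling
print(percent_to_cnp(-50))    # halving
print(cnp_to_percent(2 * percent_to_cnp(10)))  # two 10% increases, added in cNp
```

A 1% change is about 0.995 cNp. Doubling and halving come out as about +69.3 and -69.3 cNp, symmetric in a way that +100% and -50% are not, and adding two 10% increases in centinepers and converting back gives 21%, as compound interest would.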

5  Data Sources