You may have heard people speak of “the scientific method”. Be careful, because that is an idiomatic expression. As such, it must not be taken literally.
By way of analogy: Voltaire famously remarked that the Holy Roman Empire was neither holy, nor Roman, nor an empire. The term cannot be taken literally, yet we continue to use it. This is characteristic of idiomatic expressions. In English, there are hundreds of idiomatic expressions, such as “raining cats and dogs”, “it cracks me up”, and “it breaks my heart”.
When sensible people speak of “the scientific method”, the term is never taken literally. The discussion is not limited to any specific method, but instead covers something much grander, which we could describe as:
Some discussion of how science is done can be found in section 2.
When an idiomatic expression is taken literally, it crosses the boundary from idiomatic to idiotic.
In particular, it would be a spectacular mistake to take “the scientific method” literally. Alas, this mistake is remarkably common among non-scientists.
For starters, scientific research is not nearly as methodical as non-scientists seem to think it is. Of course some scientific activities are highly methodical, but other activities – especially research and exploration – are not.
As an extreme form of this mistake, people who have not done any research, nor even seen it done, sometimes equate “the scientific method” with a step-by-step hypothesis-testing approach. This is a travesty. It makes research seem hundreds of times easier than it really is. It is an insult to every researcher – past, present, and future. Overemphasis on hypothesis testing makes it impossible to understand the history of science. It is also a disservice to students who may be thinking of becoming scientists, since it gives them a false impression of what they are getting into.
Some of these misconceptions are discussed in more detail in section 5.
Here are some of the principles that guide how science is done:
This strikes me as unnecessary and abusive. If you want to talk about strategy, call it “strategy”. If you want to talk about the nature of science, call it “the nature of science”. For example, the NSTA has a nice discussion of the nature of science (reference 5).
Every increase in computer power increases the importance of calculation and computation.
Often calculation and computation go hand-in-hand with visualization (item 20).
For additional discussion of “thinking skills” per se – including how to learn, and how to teach thinking skills – see reference 7.
See also reference 1, reference 8, and reference 9 for sensible discussions of what science is, and how scientists do science.
It is important to know the difference between science and pseudo-science. An amusing story about this can be found in reference 10.
There is an important distinction between fallacy and absurdity. An idea that makes wrong predictions every time is absurd, and is not dangerous, because nobody will pay any attention to it. The most dangerous ideas are the ones that are often correct or nearly correct, but then betray you at some critical moment.
Pernicious fallacies are pernicious precisely because they are not absurd. They work OK some of the time, especially in simple “textbook” situations … but alas they do not work in general.
You need not worry about the “most erroneous” errors. You should worry about the “most deceptive” and “most destructive” errors.
You should avoid using fallacious arguments, and you should object loudly if somebody tries to use them on you. Common examples of unscientific thinking include:
As mentioned in item 3 and item 10, most rules have limitations on their accuracy and/or their range of validity. You should neither over-react nor under-react to these limitations.
Consider the contrast: Equation 1 is very different from equation 2:

$$x = y \quad \text{provided } a,\ b,\ \text{and } c \tag{1}$$

$$x = y \tag{2}$$

which means x = y in all generality.
It is a common mistake to mislearn, misremember, or misunderstand the provisos, and thereby to overestimate the range of validity of such a rule.
There are several ways such mistakes can come about. I’ve seen cases where the textbook soft-pedals the provisos “in the interest of simplicity” (at the expense of correctness). I’ve seen even more cases where the text and the teacher emphasize the restrictions in equation 1, yet some students gloss over the provisos and therefore learn the wrong thing, namely equation 2.
Another possibility is that we don’t fully know the provisos. A good example concerns the Wiedemann-Franz law. There are good theoretical reasons to expect it to be true, and experiments have shown it to be reliably true over a very wide range of conditions. That was the whole story until the discovery of superconductivity. The Wiedemann-Franz law does not apply to superconductors, and you will get spectacularly wrong predictions if you try to apply it to superconductors. My point is that before the discovery of superconductivity – which was a complete surprise – there was no way anyone could have had the slightest idea that there was any such limitation to the Wiedemann-Franz law.
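One way to keep a proviso from being glossed over is to make it part of the interface. Then the rule cannot be invoked without stating whether the proviso holds. The sketch below does this for the Wiedemann-Franz law, κ = L₀σT, using the Sommerfeld value of the Lorenz number. Several things here are illustrative assumptions: the function name, the `superconducting` flag, and the copper conductivity in the example. The one proviso it checks is the one discussed above. It is not a complete list.

```python
import math

# Physical constants (exact SI values since 2019).
K_B = 1.380649e-23          # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19  # elementary charge, C

# Sommerfeld value of the Lorenz number: L0 = (pi^2 / 3) * (k_B / e)^2, about 2.44e-8 W·Ω/K^2
LORENZ_0 = (math.pi ** 2 / 3) * (K_B / E_CHARGE) ** 2


def thermal_conductivity_wf(sigma: float, temperature: float, *, superconducting: bool) -> float:
    """Estimate thermal conductivity in W/(m·K) via the Wiedemann-Franz law.

    sigma: electrical conductivity in S/m; temperature in K.
    The keyword-only flag forces the caller to state the proviso explicitly.
    This checks one known proviso only; it cannot check provisos nobody knows about.
    """
    if superconducting:
        raise ValueError("the Wiedemann-Franz law does not apply to superconductors")
    if sigma <= 0 or temperature <= 0:
        raise ValueError("sigma and temperature must be positive")
    return LORENZ_0 * sigma * temperature


# Example: copper near room temperature (assumed sigma of about 5.96e7 S/m at 293 K).
kappa_cu = thermal_conductivity_wf(5.96e7, 293.0, superconducting=False)
print(f"estimated thermal conductivity of copper: {kappa_cu:.0f} W/(m·K)")
```

For copper this estimate comes out at about 427 W/(m·K). That is within roughly 10% of commonly quoted handbook values near 400 W/(m·K), which makes it a good approximation inside its proper limits. Before 1911, though, no such check for superconductivity could have been written.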
As mentioned in item 13, offering non-specific and/or non-constructive criticism doesn’t help anybody.
It is important to keep track of the limitations of each model, and to communicate the limitations. If you see some folks at risk of error because they are disregarding the limitations, it is helpful to remind them of the limitations. Sometimes it is worth trying to find improved ways of expressing the limitations.
If a model stands in need of improvement, the best thing you can do is to improve it. Devise a rule that has more accuracy and fewer limitations. (You may find this is more easily said than done.) Communicate the new rule to the community, and explain why it is better.
If you can’t devise a better rule on your own, you might hire a scientist to do it for you. (Again, you might find that devising accurate, robust models is more easily said than done.)
There’s a rule that says “don’t borrow trouble”. Conversely, you shouldn’t spread trouble around, either. Let me explain what that means: Suppose a rule is good enough to solve Joe’s problem, but is too limited to solve Moe’s problems. Then it’s not constructive for Moe to complain about what Joe is doing. It’s none of Moe’s business. If Moe accuses Joe of using a “wrong” rule, the accusation is false; just because the rule is no good for Moe’s purposes doesn’t make it no good for Joe’s purposes. Conversely, if Joe notices that the rule is too limited to handle Moe’s problem, that is no reason for Joe to distrust the rule within its proper limitations.
This is worth mentioning, because some people think that “the truth” must be exact and unlimited, and conversely anything that has limitations must be worthless. This can be seen as an extreme form of over-reacting to the limitations of a model, but it is all too common. See section 3.5 for more on this.
If Joe and Moe choose to work together to devise a new, grander model that has fewer limitations, so that it can handle both their problems, that is great – but it is their choice, not their obligation, and should not be an impediment to using the old model to solve Joe’s problems.
Sometimes we are faced with black-and-white choices, as indicated in figure 1.
More often, though, the choices form a one-dimensional continuum: not just black and white, but all shades of gray in between, as indicated in figure 2.
It is an all-too-common mistake to see things in black-and-white when really there is a continuum. This well-known fallacy has been called by many names, including false dichotomy, black-and-white fallacy, four-legs-good two-legs-bad, Manichaean fallacy, et cetera.
To say the same thing again, it is all too common for people to assume that everything that is not black is completely white, everything that is not white is completely black, everything that is not perfect is worthless, everything that is not completely true is completely false, their friends are always good and their enemies are always evil, et cetera.
A related but more-subtle fallacy is to assume that all things that are not perfect are equally imperfect. In contrast, the fact is that point B in figure 2 is much blacker than point A, even though neither one is perfectly black nor perfectly white.
Understanding this is a crucial part of scientific thinking, because as mentioned in item 3, scientists are continually dealing with rules that are inexact or otherwise imperfect. The point is that we must make judgements about which rules are better or worse for this-or-that application. We cannot just say they are all imperfect and leave it at that. They are definitely not equally imperfect.
Actually, sophisticated thinking requires even more than shades of gray. Often things must be evaluated in multiple dimensions, evaluated according to multiple criteria at once, as indicated in figure 3. Option A is better for some purposes, and option B is better for other purposes.
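When options are scored on several criteria, one option may beat another on every criterion (it dominates). Alternatively, neither may dominate, as with options A and B in figure 3. Here is a minimal sketch; the criterion names and scores are hypothetical.

```python
from typing import Dict

Scores = Dict[str, float]  # criterion name -> score (higher is better)


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b on every criterion and strictly better on one."""
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)


def best_for(options: Dict[str, Scores], weights: Scores) -> str:
    """Pick the option with the highest weighted score for one particular purpose."""
    return max(options, key=lambda name: sum(w * options[name][k] for k, w in weights.items()))


# Hypothetical scores, for illustration only.
options = {
    "A": {"accuracy": 0.9, "simplicity": 0.3},
    "B": {"accuracy": 0.6, "simplicity": 0.8},
}

print(dominates(options["A"], options["B"]), dominates(options["B"], options["A"]))  # False False
print(best_for(options, {"accuracy": 1.0, "simplicity": 0.2}))  # A
print(best_for(options, {"accuracy": 0.2, "simplicity": 1.0}))  # B
```

Neither option dominates the other. Which one is "better" therefore depends on how the criteria are weighted for the purpose at hand.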
See reference 14 for more about the distinction between truth and knowledge.
In science as in daily life, it is necessary to make approximations, as mentioned in item 17. For example, when you buy shoes, you don’t buy a pair that is exactly the right size; you buy a pair that is close enough to the right size.
Elementary arithmetic is exact, in the sense that 2 plus 2 equals 4 exactly. In contrast, physics, chemistry, biology, etc. are not exact sciences; they are natural sciences. For example, Newton’s law of universal gravitation
$$F_I = \frac{G\,m_1 m_2}{r^2} \tag{3}$$
is one of the greatest triumphs in the history of human thought … but we know it is not exact. It is a very good approximation when the gravitational field is not too strong and not changing too quickly. It is also misleading, because FI is not the only contribution to the weight of ordinary terrestrial objects; there are significant correction terms from other sources including the rotation of the earth, as discussed in reference 15.
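To put a number on the rotation correction, here is a sketch comparing two terms at the equator. The first is GM/R², the Newtonian term per unit mass; the second is the centrifugal term ω²R. The sketch treats Earth as spherically symmetric and ignores oblateness and every other correction. It illustrates the size of one term; it does not compute anyone's weight accurately. The constants are assumed standard reference values.

```python
import math

# Assumed reference constants (SI units).
GM_EARTH = 3.986004418e14   # Earth's gravitational parameter, m^3/s^2
R_EQUATOR = 6.378137e6      # equatorial radius, m
OMEGA = 7.2921150e-5        # rotation rate, rad/s (one turn per sidereal day)


def newtonian_acceleration(r: float) -> float:
    """F_I per unit test mass, treating Earth as spherically symmetric."""
    return GM_EARTH / r ** 2


def centrifugal_acceleration(r: float, latitude_deg: float = 0.0) -> float:
    """Magnitude of the centrifugal term in the rotating frame; it points away from
    the rotation axis, so only at the equator does it directly oppose gravity."""
    return OMEGA ** 2 * r * math.cos(math.radians(latitude_deg))


g_newton = newtonian_acceleration(R_EQUATOR)
a_cf = centrifugal_acceleration(R_EQUATOR)
print(f"GM/R^2 at the equator: {g_newton:.4f} m/s^2")
print(f"centrifugal term:      {a_cf:.4f} m/s^2 ({a_cf / g_newton:.2%} of GM/R^2)")
```

At the equator the centrifugal term comes out at about a third of a percent of GM/R². That is small, but not negligible whenever weight matters to better than 1%.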
It is a common mistake to treat all approximations as equally good, or equally bad.
To say the same thing another way, when you are in a situation that requires making an approximation, that does not give you a license to make a bad approximation. It’s your job to figure out what’s good and what’s bad.
It is not always easy to distinguish good approximations from bad approximations. It requires knowledge, skill, and judgement.
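A standard textbook approximation makes the point quantitatively; it is chosen here only for illustration and is not discussed in the text. The relative error of sin x ≈ x grows roughly like x²/6. So the same approximation ranges from superb to worthless, depending on where it is used.

```python
import math


def relative_error(exact: float, approx: float) -> float:
    """|approx - exact| / |exact|; assumes exact != 0."""
    return abs(approx - exact) / abs(exact)


# The small-angle approximation sin(x) ≈ x (x in radians): same approximation,
# very different quality depending on x.
for x in (0.01, 0.1, 0.5, 1.0, 2.0):
    print(f"x = {x:4}  relative error = {relative_error(math.sin(x), x):.1e}")
```

At x = 0.01 the error is a few parts in a hundred thousand. At x = 2 the "approximation" is off by more than 100%.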
Science rarely offers certainty. Often it offers near certainty, but not absolute certainty. (This is in contrast to religion, which sometimes offers absolute certainty, and to things like elementary arithmetic, which offers absolute certainty over a limited range.)
One of the surest ways to be recognized as a non-scientist is to pretend to be certain when you’re not.
The world is full of uncertainty. It always has been, and always will be. You should not blame science for “causing” this uncertainty, and you should not expect science to eliminate this uncertainty. Instead, science tells us good ways to live in an uncertain world.
Techniques for quantifying uncertainty are discussed in reference 3.
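One minimal technique summarizes repeated readings as a nominal value plus an uncertainty: the mean, and the standard error of the mean. The readings below are hypothetical. The estimate assumes independent readings with no systematic error; reference 3 treats the subject properly.

```python
import math
import statistics


def nominal_and_uncertainty(readings):
    """Return (mean, standard error of the mean) for repeated readings.

    Assumes independent readings from one distribution; systematic errors
    are NOT captured by this estimate.
    """
    mean = statistics.fmean(readings)
    sem = statistics.stdev(readings) / math.sqrt(len(readings))
    return mean, sem


# Hypothetical repeated readings of the same quantity.
readings = [9.81, 9.79, 9.83, 9.80, 9.82]
mean, sem = nominal_and_uncertainty(readings)
print(f"{mean:.3f} ± {sem:.3f}")
```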
As mentioned in item 18, it is impossible for anyone to do anything without making assumptions.
Remember that a major purpose of scientific methods is to make useful predictions and to avoid mistakes. False assumptions are a common source of serious mistakes.
At this point, non-experts commonly say “don’t make assumptions” or perhaps “check all your assumptions”. Alas, that’s not helpful. After all, most assumptions are true and useful ... otherwise people wouldn’t assume them. The trick is to filter out the tiny minority of assumptions that turn out to be false. This is far easier said than done. There are too many assumptions, and it is impractical to even list them all, let alone check them all.
The real question is, which assumptions should be checked under what conditions? There is no easy answer to this question.
Assumptions can be classified, approximately, as explicit assumptions and implicit assumptions. Explicit assumptions are the ones you know you are making. They are usually not the main problem; you can make a list of the explicit assumptions and then check them one by one.
The big trouble comes from implicit assumptions that aren’t quite true. These include things that “everybody knows” to be true but are not in fact true, as discussed in reference 12. They also include rules that have become invalid because you have misunderstood the provisos, as discussed in section 3.3.
Skilled scientists can question assumptions somewhat more quickly and more methodically than other folks, because they have had more experience doing it. But it’s never easy. All of us must rack our brains to figure out which assumptions have let us down.
It always looks relatively easy in retrospect. Once somebody has identified the assumption that needed repair, it is easy for everybody else to hop onto the bandwagon.
One sometimes-helpful suggestion is this: If you find a contradiction, inconsistency, or paradox in what you “know”, that is a good reason to start questioning assumptions. Start by questioning the assumptions that are most closely connected to the contradiction.
Some scientists keep lists of paradoxes. If an item stays on the list for a long time, it means there is a problem that is not easily solved, and the solution is likely to be a turning point in the history of science. Examples from the past include the Gibbs paradox, the black-body paradox, various paradoxes associated with the luminiferous ether, the Olbers paradox, et cetera.
An important component of science, especially of scientific research, involves exploring new territory. Commonly assumptions that were valid in the old territory break down in the new territory. Indeed when researchers choose where to explore, they often seek out situations where assumptions can be expected to break down, since that will reveal new information. For more on this, see section 7 and reference 16.
| In ordinary applications, when you want to rely on the model, you should stay safely within the limitations of the model. | In research mode, where the model is the object of research, you are testing the model, not relying on it. Then it makes sense to patrol along the boundaries, to see if the limits need to be tightened or loosened. It also sometimes makes sense to go far beyond the limits, in hopes of making a surprising discovery. |
As discussed in reference 4, there are two kinds of statements: assertions and hypotheses. Unlike an ordinary assertion, a hypothesis is stated without regard to whether it is true or false, probable or improbable, et cetera.
In many cases, after a scientific result is complete or nearly complete, it can be retrospectively summarized in terms of hypothesis testing. That is, we can make a list of hypotheses and say which are consistent with the results and which are ruled out by the results. One should not imagine, however, that all scientific work is motivated by hypotheses or organized in terms of hypotheses. Some is, and some isn’t.
Science – especially exploration and research – usually involves a multi-stage iterative process, where the results of early stages are used to guide the later stages. The early stages are not well described in terms of hypothesis testing, unless we abuse the terminology by including ultra-vague hypotheses such as “I hypothesize that if we explore the jungle we might find something interesting”.
Typical example: When Bardeen, Brattain, and Shockley did their famous work, they started from the hypothesis that a semiconductor amplifier device could be built. This hypothesis turned out to be true, but it was neither novel nor specific. The general idea had been patented decades earlier by Lilienfeld. Indeed a glance at the following table would have led almost anyone to a vague hypothesis about semiconductor triodes.
| | diode | triode |
|---|---|---|
| vacuum tube | known | known |
| semiconductor | known | ??? |
The problem was, all non-vague early hypotheses about this topic turned out to be false. It is easy to speculate about semiconductor amplifiers, but hard to make one that actually works. The devil is in the details. Bardeen, Brattain, and Shockley had to do a tremendous amount of work. Experiments led to new theories, which led to new experiments ... and so on, iteratively. Many iterations were required before they figured out the details and built a transistor that worked.
Example: When Kamerlingh Onnes began his famous experiments, he was not entertaining any hypotheses involving superconductivity. He was wondering what the y-intercept would be on the graph of resistivity versus temperature; it had never occurred to him (or anyone else) that the graph might have an x-intercept instead.
Example: When Jansky began his famous experiments, he was not entertaining any hypotheses about radio astronomy. He spent over a year taking data before he discovered that part of the signal had a period of one sidereal day. At this instant – and not before – the correct hypothesis came to mind: that part of the signal was emanating from somewhere far outside the solar system. The point is that a very great deal of scientific activity preceded the historic hypothesis.
Looking back with 20/20 hindsight we can analyze and summarize Jansky’s work in terms of hypotheses ruled out or not ruled out ... but hindsight is not a useful method to the researcher who is doing the original work.
Example: On the day when Fleming discovered penicillin, he was not entertaining any hypotheses about penicillin, antibiotics, or anything remotely similar. The key observation was the result of a lucky accident. Of course, after the discovery, he considered various hypotheses that might explain the observations, but the point remains: the hypotheses came after the observations, and did not guide the initial discovery.
Example: At the opposite extreme, in a typical forensic DNA-testing laboratory, a very specific hypothesis is being entertained: Either sample A is consistent with sample B, or it isn’t. This may be “scientific”, but it isn’t research.
Example: Calculation (item 21) does not usually proceed by means of hypothesis testing. If you are asked to multiply 17 by 29, I suppose you “could” do it by testing a series of hypotheses:
However, I don’t recommend that approach. Reliable and efficient long-multiplication algorithms are available.
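For contrast, here are both approaches. The guess-and-check version tests the hypotheses h = 0, 1, 2, … in turn, checking each one by division (itself a bit of a cheat). The other is ordinary schoolbook long multiplication on decimal digits. Both function names are hypothetical.

```python
from typing import Tuple


def multiply_by_testing(a: int, b: int) -> Tuple[int, int]:
    """Guess-and-check: test h = 0, 1, 2, ... until h is exactly a copies of b.
    Returns (product, number of hypotheses tested). Assumes a >= 0 and b > 0."""
    tests, h = 0, 0
    while True:
        tests += 1
        if h % b == 0 and h // b == a:
            return h, tests
        h += 1


def long_multiply(a: int, b: int) -> int:
    """Schoolbook long multiplication on base-10 digits (non-negative integers)."""
    a_digits = [int(d) for d in reversed(str(a))]  # least significant digit first
    b_digits = [int(d) for d in reversed(str(b))]
    result = [0] * (len(a_digits) + len(b_digits))
    for i, da in enumerate(a_digits):
        carry = 0
        for j, db in enumerate(b_digits):
            total = result[i + j] + da * db + carry
            result[i + j] = total % 10
            carry = total // 10
        result[i + len(b_digits)] += carry  # this slot is still 0, so no overflow
    return int("".join(str(d) for d in reversed(result)))


print(multiply_by_testing(17, 29))  # (493, 494): 494 hypotheses tested
print(long_multiply(17, 29))        # 493, with four digit products
```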
Theoretical physics involves a great deal of calculation. Overall, it is not well described as hypothesis testing.
Example: Simple counting is not well described by hypothesis testing. If you are asked to count the number of beans in a given jar, you could contrive all sorts of hypotheses, including:
but none of the hypotheses would do you much good. At some point, if you want an accurate result, you have to count the beans.
As the proverb says: If the only tool you have is a hammer, everything begins to look like a nail. Now, I have nothing against hammers, and I have nothing against hypothesis testing. But the fact remains that in many circumstances, they are not the right tools for the job. Scientists know how to use many different tools.
It is common for people who don’t understand science to radically overemphasize the hypothesis-testing model, and to underestimate the number of iterative stages required before a good set of hypotheses can be formulated. It is a common but ghastly mistake to think that a good set of hypotheses can be written down in advance, and then simply tested.
Overemphasizing hypothesis-testing tends to overstate the importance of deduction and to understate the importance of induction, exploration, and serendipity.
Over-emphasis on hypothesis testing is not the only widespread fallacy.
Another misconception involves over-emphasis on an over-strict notion of experimentation, namely controlling the system, making changes, and observing how the system responds to the changes.
However, in some cases it is easier to imagine an experiment than to actually carry it out. The experiment might be physically impossible, prohibitively expensive, and/or unethical.
Example: Astronomy is not well described in terms of experimentation. Most of astronomy is an observational science, not an experimental science. We lack the means to perturb stars and galaxies; all we can do is observe.
Being restricted to observation rather than experimentation makes doing science very much harder. Specialized scientific methods are required.
Example: Epidemiology often requires a considerable amount of passive observation ... instead of or in addition to some amount of experimentation.
Example: Paleontology is not well described in terms of experimentation (not counting Jurassic Park).
Note: In all these cases, some aspects of the observations can be checked by experiment. For example, stellar spectra can be compared with laboratory spectra. However, the primary subject matter (stars, galaxies, et cetera) remains beyond the reach of real experimentation.
As discussed in section 5.1, we should not overemphasize hypothesis testing. On the other hand, we should not overemphasize serendipity, either.
All too often, people tend to draw boundaries where no real boundaries exist, and tend to focus on the extremes when reality lies in the middle, far from any extreme. The following table shows some of the wrong ways to look at the situation:
To repeat: reality lies in the middle, usually far from any extreme. The table is typeset on a red background with lots of question marks, to warn you that it is unwise to make the distinctions on each row, and unwise to equate things in each column.
| At one extreme, it would be a mistake to think that research is precisely guided by pre-existing hypotheses. | At the opposite extreme, it would be a mistake to think that research is conducted at random, with no idea what to look for or where to look. |
Many things that are touted as “big discoveries” were actually invented, step by step, combining many small discoveries and many small inventions.
By way of example, when people found out about the magnetrons used for radar in World War II, they assumed the magnetron had been a sudden discovery. Actually it had been the subject of years of intense research and development, but the work had been kept secret.
As another example, in 1898 Pierre and Marie Curie announced the “discovery” of radium, but then they had to slave away in a decrepit shed for several years before they could isolate a pure sample and determine just how radioactive it was.
Thomas Edison said that inventing was 1% inspiration and 99% perspiration. I don’t want to argue about the exact percentages; the crucial point is that both are necessary. Inspiration alone won’t get the job done. Perspiration alone won’t get the job done.
I am reminded of the rock star who said it took him 15 years to become an overnight sensation.
Scientists understand probability and statistics. By definition, you can’t make a particular fortunate accident happen on demand, but you can work in an area where valuable discoveries are likely to be made from time to time. Everybody must accept some risk. For example, any sensible farmer knows there is some risk that a freak storm will destroy his entire crop. The key idea is that successful crops are sufficiently common – and the crop is sufficiently valuable – that the farmer makes money on average.
Scientists do not accept all risks, nor do they decline all risks. They accept risks that are likely to pay off well on average.
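The farmer's reasoning can be sketched as an expected-value calculation. All probabilities and payoffs below are hypothetical. The sketch also ignores other considerations, such as whether one can survive a single bad outcome.

```python
from typing import List, Tuple

# A hypothetical "venture" is a list of (probability, payoff) outcomes.
Venture = List[Tuple[float, float]]


def expected_value(outcomes: Venture) -> float:
    """Probability-weighted average payoff; probabilities must sum to 1."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * payoff for p, payoff in outcomes)


# Hypothetical numbers, for illustration only.
crop = [(0.95, 120.0), (0.05, -100.0)]       # usually profitable, occasional freak storm
long_shot = [(0.01, 500.0), (0.99, -10.0)]   # rare big payoff that does not cover the losses

for name, venture in {"crop": crop, "long shot": long_shot}.items():
    ev = expected_value(venture)
    print(f"{name}: expected value {ev:+.2f} -> {'accept' if ev > 0 else 'decline'}")
```

The crop is accepted despite the risk of total loss in a bad year. The long shot is declined despite its large possible payoff.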
See reference 16 for more on this.
As mentioned in item 2, a major purpose of scientific methods is to make useful predictions and to avoid mistakes. The known scientific methods are a collection of guidelines that have been found to work reasonably well.
One of the most important steps in avoiding mistakes is to always keep in mind that mistakes are possible. This is so important that this whole section is devoted to emphasizing it and re-expressing it in assorted ways.
Richard Feynman said you must take care not to fool yourself, keeping in mind that you are the easiest person to fool.
Another word for this is modesty. Being aware of your own fallibility is modest. Pretending you are infallible is immodest.
It is OK to a limited extent to be an advocate for your favorite idea, but you must not get carried away. When you collect data in support of an idea, you must also look just as diligently for data that conflicts with that idea. See section 9.3.
A related form of modesty, which is also crucial for avoiding mistakes, is to not overstate your results. Scientists use certain figures of speech that are designed to avoid overstatement. Among other things, this includes recognizing the distinction between data and the interpretation that you wish to place upon the data. As an illustration, imagine some children go on a field trip to the dairy. Upon their return, they write a childish report that says “cows are brown” – or, worse, “all cows are brown”. A more modest, scientific approach would be to say “the cows we observed were all predominantly brown”. A statement about the observed cows sticks closely to the data, while a generalization about all cows requires a leap beyond the data.
As mentioned in item 10 and section 3.3, practically all scientific results have some limits to their validity, and you must clearly understand and clearly communicate these limits.
Here is a very incomplete sketch of some of the issues that arise when taking data. (This stands in contrast to the rest of this document, which mostly emphasizes the analysis phase.)
Consider the famous Twelve Coins Puzzle as discussed in reference 17. Suppose you find a casino that is willing to pay you $350 for identifying the odd coin, but makes you pay $100 for each weighing. If you weigh the right combinations of coins, you can do the job in three weighings, so you make money every time. In contrast, if you follow a sub-optimal strategy that requires four or more weighings, you will lose money on average.
This scenario is reasonably analogous to many real-world situations. Commonly there’s a significant price for making a measurement, and you want to maximize the amount of information you get for this price.
I mention this because all too often, people claim that a principle of scientific experimentation is to “change only one variable at a time”. It’s easy to see that such a claim is hogwash. The Twelve Coins Puzzle suffices as a counterexample. If each weighing differs from the previous weighing by only one coin, you cannot come anywhere close to an optimal solution.
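The arithmetic of the scenario is 3 × $100 = $300 < $350 < $400 = 4 × $100. Below is a minimal sketch of one three-weighing strategy. As in the classic puzzle, it assumes exactly one coin is odd, and that coin may be heavier or lighter. Each coin gets a fixed pattern of pan positions, so all three weighings can be planned in advance. This schedule is one valid choice, not necessarily the solution in reference 17. It works because the 24 possible situations (12 coins × heavy or light) map to 24 distinct outcome patterns, and three three-way weighings allow 27.

```python
from typing import List, Tuple

# One possible fixed schedule (an assumption of this sketch, checked by brute force below).
# Each coin gets one entry per weighing: +1 = left pan, -1 = right pan, 0 = off the scale.
# Every weighing puts 4 coins on each pan, and no code is the negative of another.
CODES: List[Tuple[int, int, int]] = [
    (1, 0, 0), (0, 1, 0), (0, 0, -1), (-1, -1, 0),
    (-1, 1, 0), (-1, 0, -1), (-1, 0, 1), (0, 1, 1),
    (0, -1, 1), (1, 1, -1), (1, -1, 1), (1, -1, -1),
]
PAYOUT, COST_PER_WEIGHING = 350, 100  # dollars, from the casino scenario above


def weigh(weights: List[int], plan: List[int]) -> int:
    """+1 if the left pan is heavier, -1 if the right pan is heavier, 0 if balanced."""
    left = sum(w for w, side in zip(weights, plan) if side == 1)
    right = sum(w for w, side in zip(weights, plan) if side == -1)
    return (left > right) - (left < right)


def find_odd_coin(weights: List[int]) -> Tuple[int, str]:
    """Do the three weighings and decode the outcome into (coin index, 'heavy'/'light')."""
    outcome = tuple(weigh(weights, [code[j] for code in CODES]) for j in range(3))
    for i, code in enumerate(CODES):
        if outcome == code:
            return i, "heavy"  # a heavy coin tips the scale toward its own pan
        if outcome == tuple(-c for c in code):
            return i, "light"  # a light coin tips it the other way
    raise ValueError(f"inconsistent outcome {outcome}")


# Brute-force check: each of the 24 possibilities is identified in 3 weighings.
for coin in range(12):
    for delta, label in ((+1, "heavy"), (-1, "light")):
        weights = [10] * 12
        weights[coin] += delta
        assert find_odd_coin(weights) == (coin, label)

print("profit per game:", PAYOUT - 3 * COST_PER_WEIGHING)
```

Comparing the first two entries of each code shows how much changes between weighings. Nine of the twelve coins move between the first and second weighings, which is the opposite of changing one variable at a time.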
The suggestion to “change only one variable at a time” might nevertheless be good advice in some special situations. That’s because the cost of making a measurement is not always the dominant cost in the overall information-gathering process. For example, imagine a situation where gathering the raw data is very cheap, while just plain thinking about it is expensive. Then you might want to follow a strategy, such as changing only one variable at a time, that makes the data easy to interpret, even though you had to do a large number of experiments (much larger than theoretically necessary). Consider the contrast:
| For young children doing cheap, simple experiments, it might make sense to tell them to change only one thing at a time, because the rate-limiting step is interpreting and understanding the data, and we want to make that step as easy as possible. | For skilled scientists (and engineers, farmers, etc.) doing complex, expensive experiments, changing only one variable at a time would be an unnecessary burden, and often a disastrous burden. |
Changing only one variable at a time is a crutch, which may partially compensate for the investigator’s lack of skill in interpreting the data. In contrast, for performers with ordinary ability and training, crutches are harmful, not helpful.
When an experiment involves humans collecting or selecting the data, and especially when there is a task involving human subjects, strict and elaborate measures must be taken to maintain the integrity of the results.
The problems that can arise are numerous, varied, and often subtle. Some of the variations have names of their own, including:
Techniques that can be used to defend against such problems include blinding (especially double blinding) and the use of thoroughly randomized control groups.
For example, if an observer is interviewing a subject, it is all too common for the interviewer to telegraph the desired answer. Double blinding means that neither the interviewer nor the subject knows what answer is desired, so this particular problem cannot arise.
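Here is a minimal sketch of the bookkeeping behind randomized, blinded assignment. It assumes just two groups. It also assumes that whoever holds the key stays out of contact with both interviewers and subjects. Real trials use more elaborate protocols; the function name and the code format are hypothetical.

```python
import random
from typing import Dict, Iterable, Optional, Tuple


def blinded_assignment(
    subject_ids: Iterable[str], seed: Optional[int] = None
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Randomly split subjects into two (nearly) equal groups and hide the split.

    Returns (code_of, key):
      code_of maps subject id -> opaque code (safe to give the interviewer);
      key maps opaque code -> group (held by someone with no contact with subjects).
    """
    rng = random.Random(seed)
    ids = list(subject_ids)
    shuffled = ids[:]
    rng.shuffle(shuffled)  # randomization: group membership depends only on chance
    half = len(shuffled) // 2
    group_of = {sid: ("treatment" if i < half else "control") for i, sid in enumerate(shuffled)}
    codes = rng.sample(range(10_000, 100_000), len(ids))  # distinct codes, unrelated to group
    code_of = {sid: f"S{code}" for sid, code in zip(ids, codes)}
    key = {code_of[sid]: group_of[sid] for sid in ids}
    return code_of, key


code_of, key = blinded_assignment([f"P{i:02d}" for i in range(20)], seed=1)
print(sorted(code_of.values())[:3])  # the interviewer sees only codes like these
```

Double blinding is then a matter of procedure: neither the interviewer nor the subject sees `key` until the data are in.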
There have been notorious cases where forensic labs “helped” prosecutors by reporting false DNA matches, false bullet matches, et cetera. It is very difficult to prevent this. At the very least there must be strict blinding, plenty of samples from a suitable control group, and independent testing to detect false matches and detect departures from protocol.
Even when there is no human subject, for instance if a human observer is looking at an inanimate dial, there are many ways that bias can creep in. Sometimes blinding helps, but there may be serious practical disadvantages to this. Nowadays often the simplest thing to do is to digitize the readings and stream them into a file on a computer ... either instead of or (preferably) in addition to having a human observer.
It is OK to a limited extent to be an advocate for your favorite idea, but you must not get carried away. When you collect data in support of an idea, you must also look just as diligently for data that conflicts with that idea. Then you must weigh all the data fairly, and disclose all the data when you discuss your idea. (If you don’t do this it is called “selecting” the data, which is considered a form of scientific fraud.)
The same applies to theories: It does not suffice to show that your favorite theory does a good job of fitting the data. You should diligently search for other theories that do a comparably good job of fitting the data.
This is what sets science apart from debating and lawyering, where advocacy is carried to an extreme, and it is considered acceptable to skip or make light of data that tends to support the “opposing side”.
In science, the phrase “selecting the data” has very nasty connotations, namely selecting the data so as to support some preconceived notion. For example, it is unacceptable to discard some of the data because it seems “out of range” or “implausible”.
On the other hand, there are cases where it is acceptable or indeed necessary to examine a subset. The requirement is to draw a fair sample. That is, the data should be sampled in such a way that the sampling does not bias the result.
For example, if you want to compute the average aspect ratio of eggs, and you have millions of eggs available, it is acceptable to choose a moderately small sample and measure only the sample. You must, however, arrange that the sample is chosen in such a way that no bias is introduced.
Sometimes during the course of an experiment, it is necessary to abandon or veto a measurement. For example, if you are trying to measure the length and width of an egg, and the egg falls to the floor and gets smashed before the measurements are complete, you have to veto that egg. You must, however, make sure that such losses are independent of the thing you are trying to measure. In particular, if it should happen that high-aspect-ratio eggs are more likely to be dropped, this could seriously degrade the aspect-ratio measurement. Redesign the experiment to prevent such losses, and start over.
On the other side of the same coin, unless you are absolutely sure that your sampling does not bias the result, you should assume that it does bias the result. This is not tragic; it just means that the details of the sampling procedure become part of the definition of what you are measuring. For example, you can measure the height of a group of basketball players. That is OK so long as you don’t think that basketball players are representative of the population at large.
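Here is a minimal simulation of the egg example, using made-up numbers. A fair random sample reproduces the population mean aspect ratio. A sample in which longer eggs are more likely to be dropped, and therefore vetoed, comes out systematically low. The distribution and the loss model are assumptions, chosen only to make the bias visible.

```python
import random
import statistics

rng = random.Random(2024)

# Hypothetical population of egg aspect ratios (length / width).
population = [rng.gauss(1.30, 0.05) for _ in range(100_000)]
true_mean = statistics.fmean(population)

# Fair sample: every egg equally likely to be chosen, and no losses.
fair = rng.sample(population, 2_000)


def survives(ratio: float) -> bool:
    """Hypothetical loss model: longer eggs are more likely to be dropped."""
    p_drop = min(1.0, max(0.0, 4.0 * (ratio - 1.20)))
    return rng.random() >= p_drop


# Same kind of sample, but dropped eggs are vetoed, so losses depend on what is measured.
with_losses = [r for r in rng.sample(population, 2_000) if survives(r)]

print(f"population mean   {true_mean:.4f}")
print(f"fair sample       {statistics.fmean(fair):.4f}  (n = {len(fair)})")
print(f"with losses       {statistics.fmean(with_losses):.4f}  (n = {len(with_losses)})")
```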
Design your experiments with plenty of dynamic range and plenty of “headroom” so as to minimize the chance of data falling outside the range of your instruments. Whenever data is out of range, vetoing the data is just the beginning. You then have to analyze how much distortion that introduces into the measurement you are trying to make. Such an analysis is usually difficult and sometimes impossible. Vetoing out-of-range data is a notorious source of serious error.
Roundoff errors are another notorious problem. To avoid them, record the raw data with plenty of guard digits, and carry plenty of guard digits through the calculations in the analysis as well. See reference 3.
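Here is a small illustration of why guard digits matter. When the scatter in the readings is smaller than the rounding step, rounding does not average out; it introduces a systematic error. The five readings below are made up for the example.

```python
import statistics

# Hypothetical raw readings, with scatter much smaller than 0.1.
raw = [0.1234, 0.1229, 0.1241, 0.1226, 0.1238]

# Recording only one decimal place throws the information away.
coarse = [round(x, 1) for x in raw]  # every reading becomes 0.1

fine_mean = statistics.fmean(raw)       # about 0.12336
coarse_mean = statistics.fmean(coarse)  # 0.1, off by about 19%
print(f"with guard digits: {fine_mean:.5f}   rounded first: {coarse_mean:.5f}")
```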
Recall that a measurement generally has both a nominal value and an uncertainty (“error bars”) as discussed in reference 3. Vetoing out-of-range data is particularly likely to distort the error bars, which is unacceptable, even if the nominal value is not greatly affected.
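The following sketch simulates an instrument with a hypothetical full-scale limit and vetoes every reading above it. The true readings are drawn from a normal distribution with mean 10 and standard deviation 2 (arbitrary units, assumed for illustration). Vetoing shifts the nominal value and, just as importantly, shrinks the apparent error bars.

```python
import random
import statistics

FULL_SCALE = 12.0  # hypothetical instrument limit, one standard deviation above the mean

rng = random.Random(11)
readings = [rng.gauss(10.0, 2.0) for _ in range(20_000)]

# Veto everything the instrument could not have recorded.
kept = [x for x in readings if x <= FULL_SCALE]

vetoed_fraction = 1 - len(kept) / len(readings)
mean_all, std_all = statistics.fmean(readings), statistics.stdev(readings)
mean_kept, std_kept = statistics.fmean(kept), statistics.stdev(kept)

print(f"vetoed fraction: {vetoed_fraction:.3f}")
print(f"all data:   mean {mean_all:.3f}  std {std_all:.3f}")
print(f"after veto: mean {mean_kept:.3f}  std {std_kept:.3f}")
```

With these assumptions about 16% of the readings are vetoed; the mean drops from about 10 to about 9.4, and the standard deviation from about 2.0 to about 1.6, so the error bars would claim more precision than the measurement actually has.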
If you are careful, it is OK to do a practice run, then do a for-real run, and publish only the real data. You must, however, decide in advance which runs are for practice and which runs are for real. Otherwise this could become a nasty scheme for selecting the data. In particular, performing run after run until you obtain the “desired” result is completely unacceptable.
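The danger of running until you get the “desired” result can be quantified with a simulation. In this sketch there is no real effect at all: every run is pure noise. A run “shows the effect” if its mean is more than 1.645 standard errors above zero, which by chance happens in about 5% of runs. The sample size, threshold, and number of runs are assumptions chosen for illustration.

```python
import math
import random
import statistics

def run_shows_effect(rng: random.Random, n: int = 10, z_crit: float = 1.645) -> bool:
    """One run under the null hypothesis: n samples of pure noise, no real effect."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = statistics.fmean(xs) * math.sqrt(n)  # mean divided by its standard error, 1/sqrt(n)
    return z > z_crit

def false_positive_rate(max_runs: int, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of experiments that 'find' the effect when the experimenter
    keeps doing runs until one succeeds (or max_runs is reached)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # any() stops at the first successful run, like stopping once the "desired" result appears
        if any(run_shows_effect(rng) for _ in range(max_runs)):
            hits += 1
    return hits / trials

one_run = false_positive_rate(max_runs=1)
up_to_ten = false_positive_rate(max_runs=10)
print(f"decided in advance (1 run):       {one_run:.3f}")
print(f"run until success (up to 10 runs): {up_to_ten:.3f}")
```

The first number should be close to 0.05, the nominal false-alarm rate; the second should be close to 1 − 0.95^10 ≈ 0.40, even though there is nothing to find.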
Sometimes it is necessary to have a trigger or veto or triage mechanism, i.e. some rule that selects which data will be kept and which will be discarded. Doing this right is very, very tricky ... so tricky that usually it is simpler, cheaper, and all-around better to just keep all the data. Also: when you write up your results, you should describe the trigger criterion in detail, so that readers can judge for themselves whether the trigger introduced any significant bias.
To repeat: The phrase “selecting the data” may not sound nasty the first time you hear it, but the connotations are very nasty indeed. There are various things that could be going on:
The burden of proof is on you, to show that whatever you are doing is legitimate, i.e. that it does not bias the conclusions.
Keep in mind the common-sense principle that you should never put yourself in a position where the first mistake is fatal.
In the real world, scientists and engineers do simulations and dry runs. They build pilot plants before committing to full-scale operation, so that most of their mistakes will be small mistakes.
Therefore you should arrange your experiment so that you can take some data, then do some analysis, and then come back and take some more data. This allows feedback from the analysis phase back to the data-taking phase, so that you can improve the data-taking if necessary.
This is all the more important for students, who are, after all, students rather than experts, and can be expected to make mistakes. Yet we want students to get good results in the end.
If at all possible, arrange it so that analysis (at least some sort of preliminary analysis) happens in real time, so that if anything funny happens during the experiment, the experimenter knows about it, for reasons explained in section 9.5.
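As a sketch of what real-time analysis can look like, the monitor below fits a straight line to the most recent readings and flags any new reading that departs from the extrapolated trend by more than several times the recent scatter. The window length, threshold, and the synthetic cooling curve (whose slope changes abruptly at sample 100) are all hypothetical; a real experiment would need thresholds tuned to its own noise.

```python
import math
import random
from collections import deque

def fit_line(ts, ys):
    """Ordinary least-squares line y = a + b*t; returns (a, b, residual std)."""
    n = len(ts)
    t_bar = sum(ts) / n
    y_bar = sum(ys) / n
    s_tt = sum((t - t_bar) ** 2 for t in ts)
    s_ty = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    b = s_ty / s_tt
    a = y_bar - b * t_bar
    resid = [y - (a + b * t) for t, y in zip(ts, ys)]
    sigma = math.sqrt(sum(r * r for r in resid) / (n - 2))
    return a, b, sigma

class TrendMonitor:
    """Flags readings that depart from the recent linear trend."""
    def __init__(self, window: int = 20, k: float = 8.0, min_sigma: float = 1e-9):
        self.ts = deque(maxlen=window)
        self.ys = deque(maxlen=window)
        self.k = k
        self.min_sigma = min_sigma  # guards against a zero threshold on noiseless data

    def update(self, t: float, y: float) -> bool:
        flagged = False
        if len(self.ts) == self.ts.maxlen:
            a, b, sigma = fit_line(self.ts, self.ys)
            flagged = abs(y - (a + b * t)) > self.k * max(sigma, self.min_sigma)
        self.ts.append(t)
        self.ys.append(y)
        return flagged

def simulated_cooling(n=200, corner=100, noise=0.0002, seed=1):
    """Hypothetical temperature trace: cooling slows abruptly at sample `corner`."""
    rng = random.Random(seed)
    for i in range(n):
        if i <= corner:
            temp = 3.0 - 0.01 * i
        else:
            temp = 3.0 - 0.01 * corner - 0.005 * (i - corner)
        yield i, temp + rng.gauss(0.0, noise)

monitor = TrendMonitor()
flags = [t for t, temp in simulated_cooling() if monitor.update(t, temp)]
print("first flagged sample:", flags[0] if flags else None)
```

With these settings the first flag should appear at or just after sample 101, the first reading past the corner, while the experiment is still running.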
Let me tell a little story. Once upon a time in a mythical place called La Jolla there was a fellow named John who really liked numbers. The more digits the better. He wanted unbiased numbers so dearly that he would have his grad student, Richard, face the instrument with his eyes closed; when John said “Now!” Richard would momentarily open his eyes and observe the number. John would write it in the lab book. After doing this for many days and weeks, they had lots of lab books filled with lots of numbers.
They were looking for some sign of a superfluid transition in liquid ³He at low temperatures. John had already made a string of important discoveries in low-temperature physics, and finding the superfluid would move him from pre-eminence to immortality.
Aside: The nice thing about theoretical predictions is that there are so many to choose from. Predictions of Tc started at about 300 millikelvin. When that was disproved experimentally, the predictions moved to lower temperatures: 100 mK was disproved; 50 mK was disproved; 30 mK was disproved; 10 mK was disproved. In fact John and Richard had checked experimentally down to less than 2.5 mK without seeing anything. So the theorists gave up and lowered their prediction to something in the microkelvin range.
Once upon the same time, in a mythical place called Ithaca, there were a couple of guys named Bob and Doug. They liked numbers OK, but they also liked graphs. They wanted to see the data. And they didn’t want to wait until the experiment was over to analyze the data and see what it meant; they wanted to see the data in real time. This was in the days before computers, so if you wanted a graph you had to suffer for it, fooling with Leeds chart recorders, i.e. mechanical contraptions with paper that always jammed and pens that always clogged.
One Thursday in late November, Doug was watching the chart as the cell cooled through 2.5 mK. There was a glitch on the T versus t trace. Doug circled it and wrote “Glitch!!” on the chart. He warmed back up and saw it again on the way up. And then again on the way back down. He called Bob. Bob put down his plate of turkey and zoomed into the lab. The two of them stayed up all night walking back and forth through the “glitch”.
I’ve seen that chart. It doesn’t look like what I would call a glitch. It’s more of a corner. A small-angle corner, just a barely-visible departure from the underlying linear trend. Eyes are good at spotting corners in otherwise-straight lines.
Doug and Bob assumed, based on the aforementioned experimental and theoretical evidence, that this wasn’t the superfluid transition. They attributed it to something going on in the nearby solid ³He. But they eventually figured out that it was indeed the superfluid. Right there at 2.5 mK.
Everybody assumes that if John and Richard had been plotting their data on a strip-chart recorder, they would in fact have discovered the superfluid. But they didn’t.
Now, imagine what it was like working in Bob’s lab after that. With strictness bordering on fanaticism, strip-chart recorders were attached to all the significant variables ... and even some of the not-very-significant variables.
Every so often, a new baby grad student, his fingers stained N different colors from trying to unclog the chart pens, would ask whether we really needed all those chart recorders. Somebody would explain by saying, Once upon a time in a mythical place called La Jolla, .......
Lab notebooks are not supposed to be perfect. If there are no mistakes in the lab book, the lab book is presumably a fraud. You are allowed to mark bad data as bad, but you are not allowed to obliterate it or eradicate it. See reference 18 for an example of a well-kept lab book containing a correction.
You are not allowed to “clean up” the books. You are certainly not allowed to keep two sets of books (a dirty one to record raw data, and a clean one to show off).
There are many reasons for keeping good records. First and foremost, you and your collaborators need the information on a day-to-day basis. During the analysis phase, it is all too common to find that the data cannot be analyzed, because even though the nominal result of the measurement was recorded, the conditions of the measurement were not adequately recorded. There’s no value in knowing the ordinate if you don’t know the abscissa.
As mentioned in section 9.3, record all the data. If possible, hook up a computer to stream all the data into a file. Record the abscissas as well as the ordinates.
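A minimal sketch of streaming data to a file, assuming a hypothetical data source and made-up column names. Each row records the abscissas (time and the control setting) alongside the ordinate, and the file is flushed after every row so that if the acquisition program crashes, little or nothing already measured is lost. Python’s default conversion of floats to text keeps every digit needed to reproduce the value exactly, which also helps with guard digits.

```python
import csv
import os
import tempfile
import time

# Hypothetical column names: abscissas first, then the measured ordinate.
FIELDS = ["wall_clock_s", "heater_setting", "reading"]

def fake_instrument(n: int):
    """Stand-in for a real data source (hypothetical)."""
    for i in range(n):
        setting = 0.5 * i
        yield {"wall_clock_s": time.time(), "heater_setting": setting,
               "reading": 1.0 + 0.1 * setting}

def stream_to_csv(path: str, samples) -> int:
    """Write every sample to a new CSV file as it arrives; returns the row count."""
    rows = 0
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for sample in samples:
            writer.writerow(sample)
            f.flush()  # push the row out of Python's buffer right away
            rows += 1
    return rows

path = os.path.join(tempfile.mkdtemp(), "run001.csv")
n_rows = stream_to_csv(path, fake_instrument(5))
with open(path, newline="") as f:
    records = list(csv.DictReader(f))
print(n_rows, list(records[0].keys()))
```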
The rule is to keep good records. It is traditional to keep data in a so-called lab book, aka laboratory notebook, which is one way of keeping records, but not the only acceptable way. Good electronic records are an acceptable alternative, and are in some ways better. For example, if a witness signs and dates a page in a lab book, a bad guy might be able to add something to the page later. Sometimes this is detectable, but sometimes not. (Don’t try it.) In contrast, if an electronic document is cryptographically signed, there is no way to alter any part of the document without invalidating the signature.
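Python’s standard library has no public-key signatures, so this sketch swaps in a simpler technique, HMAC-SHA256, to show the key property: changing even one character of the record makes verification fail. Note the different trust model: with HMAC, anyone who can verify also holds the secret key and could forge a tag, whereas a true digital signature (for example Ed25519, available from third-party libraries) lets anyone verify without giving them the power to sign. The key and the record text are made up.

```python
import hashlib
import hmac
import secrets

key = secrets.token_bytes(32)  # hypothetical secret key, kept by the record-keeper

def tag(record: bytes) -> str:
    """HMAC-SHA256 tag over the whole record."""
    return hmac.new(key, record, hashlib.sha256).hexdigest()

def verify(record: bytes, expected_tag: str) -> bool:
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(tag(record), expected_tag)

record = b"Run 17: reading = 2.491"
saved_tag = tag(record)
altered = record.replace(b"2.491", b"2.481")

print(verify(record, saved_tag))   # True
print(verify(altered, saved_tag))  # False
```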
There are various ways of obtaining unforgeable date-stamps on electronic documents; see reference 19 for an example. If you don’t want to bother with that, one option is to just email the document (or a hash thereof) to your lawyer, with a cover letter saying that you are not asking him to take any action, just to file away the document, so that if there is ever any dispute he can authenticate the date. Note that if the document is huge and/or highly sensitive, you don’t need to send the document itself; it suffices to send a cryptographic hash of it, such as a SHA-256 digest.
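For the send-a-hash option, a digest of even a very large file can be computed without loading it into memory. This sketch uses SHA-256 from Python’s `hashlib`; the choice of algorithm and the chunk size are assumptions, not something the procedure requires. Anyone holding the document later can recompute the digest and compare it with the one in the dated email.

```python
import hashlib
import os
import tempfile

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks (1 MiB by default)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Demonstration on a tiny file whose digest is a well-known test vector.
path = os.path.join(tempfile.mkdtemp(), "notebook_page.txt")
with open(path, "wb") as f:
    f.write(b"abc")
print(file_sha256(path))
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```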
For things like blueprints, circuit diagrams and computer programs, electronic records have huge advantages. The point is that such things get revised again and again, and if you just have “the” document, you have no idea who contributed what when. In contrast, a modern revision control system such as git will conveniently keep track of the current version and all previous versions. It also keeps track of who submitted what changes, and when. Submissions can be digitally signed.
If you are collaborating with people at other locations, electronic documents have tremendous advantages.
If you are stuck with an ordinary lab book, and you want to revise or annotate a page after it has been signed, one good way is to attach a sticky note to the page. The note should refer the reader to a later page where your latest and greatest thoughts can be found. Using a sticky note guarantees you can’t be accused of altering the page after it was signed.
In addition to helping you and your collaborators directly, good records have other uses. They are particularly important in connection with patents. Questions about inventorship and priority come up a lot. I’ve seen it first hand, many times. Once I came out on the short end of the stick, which didn’t bother me, because the other guy had records that convinced everybody (including me) that he invented the thing a month before I did, fair and square. More commonly though, one claimant has copious records documenting the gradual evolution of the invention, while the other claimant is just a pretender, with nothing but a bold assertion covering the final, perfect result. Sometimes disputes arise after an application has been filed or after a patent has been granted, but in a large company or large university, disputes can arise intramurally, before the application is filed, when the lawyers try to figure out who should be named as inventor(s) on the application.
You should ask your patent attorneys what they want your records to look like ... and then ask them what they will settle for. It seems some lawyers would “prefer” to see every detail recorded in a lab book, then signed and notarized in triplicate ... but they will settle for a lot less formality.