Copyright © 2005 jsd

1  California Standards Test – Process Issues

I was asked to comment on the Physics Standards Test (reference 1). It is part of the California Standardized Testing and Reporting Program.

Executive summary: The main issue has to do with understanding the process, and fixing the process. How did the system get to be so broken as to produce questions like this? How can we fix it?

As a tangentially-related minor point, we note that the released questions have many flaws. (Details can be found in section 2.) That’s a minor point, because you presumably already knew that. Anybody with rudimentary physics knowledge and teaching ability could see that in an instant. It is not the purpose of this document to quantify just how awful this test is, nor to see how/whether we can improve specific test-questions.

An informative overview of the existing process can be found in reference 2.

The test is called a “Standards Test” but the standardization is not very effective. At the beginning of reference 1 one finds a list of standards, but I see no evidence that the questions uphold those standards.

For that matter, the relevance and even the correctness of the stated standards are open to serious question. This is discussed in section 4.

I am informed that the taxpayers paid ETS (and possibly some other companies) to produce these questions. Well, they should demand a refund. Whoever prepared these questions should be disqualified from doing so in the future.

More generally, whoever prepares questions in the future should provide, for each question, a brief statement as to its rationale. That is, they ought to tell us what the question is supposed to test for. Also the preparer should be responsible for vetting the questions, and should provide evidence of validity and sensitivity. That is, does the question do what it is supposed to do? This includes making sure the intended answer really is better than the distractors, and that each of the distractors really serves some purpose.
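For concreteness, here is one conventional way a preparer could supply such evidence: classical item analysis on field-test data. This is a minimal sketch under stated assumptions, not a description of what ETS or the state actually does. The function name `item_analysis`, the commonly used 27% upper/lower split, and the ten-student data set are all hypothetical illustrations. The discrimination index D is the fraction of high scorers who choose the intended answer minus the fraction of low scorers who do; a D near zero means the item has little sensitivity, and a negative D means the item has negative discriminatory power. A distractor that almost nobody chooses serves no purpose.

```python
from collections import Counter


def item_analysis(responses, key, options="ABCD", fraction=0.27):
    """Classical item analysis for one multiple-choice question (illustrative sketch).

    responses: list of (total_test_score, chosen_option) pairs, one per student.
    key: the intended correct option.
    fraction: size of the upper and lower groups (27% is a common convention).
    Returns (difficulty, discrimination, distractor_rates).
    """
    n = len(responses)
    # Rank students by total test score, best first.
    ranked = sorted(responses, key=lambda r: r[0], reverse=True)
    k = max(1, round(fraction * n))
    upper, lower = ranked[:k], ranked[-k:]

    def p_correct(group):
        # Fraction of the group that chose the intended answer.
        return sum(1 for _, choice in group if choice == key) / len(group)

    difficulty = p_correct(responses)                     # overall proportion correct
    discrimination = p_correct(upper) - p_correct(lower)  # D index; negative is bad
    counts = Counter(choice for _, choice in responses)
    # How often each distractor is chosen; near-zero rates mean the distractor is ineffective.
    distractor_rates = {opt: counts[opt] / n for opt in options if opt != key}
    return difficulty, discrimination, distractor_rates


# Hypothetical field-test data: (total score, option chosen on this item).
responses = [(95, "A"), (90, "A"), (88, "A"), (80, "B"), (75, "A"),
             (70, "B"), (60, "B"), (55, "A"), (40, "B"), (30, "B")]
difficulty, discrimination, rates = item_analysis(responses, key="A")
print(difficulty, round(discrimination, 3), rates)
```

In this made-up data set nobody chooses C or D, so the item is effectively a one-out-of-two choice even though it offers four options.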

It is also important to consider the standard as a whole. It is unacceptable for a question to address one narrow part of the standard, while contravening other more-important parts of the standard (notably the parts that implicitly and explicitly expect students to be able to think and reason effectively).

Similarly, it is important to consider the test as a whole. There are important questions of balance and coverage, as discussed in section 3. In particular, even if the test contains some plug-and-chug questions that are individually acceptable, a heavy preponderance of such questions, to the near-exclusion of questions that require even a modicum of reasoning, is collectively unacceptable.

Process must not be used as an excuse. That is, to turn the proverb on its head, the means do not justify the ends. It is preposterous to argue that the questions must be OK because they resulted from an established standards-based process. The unalterable fact is that the questions are not OK. Somebody has got to take responsibility for this.

I mention this because some folks involved in the process have adopted the attitude that

“If an item assesses a standard, it’s good to go on the test”.

Talk like that makes my hair stand on end! It’s a complete cop-out, i.e. it is a lame excuse for acting without good judgment, without common sense. It ignores the truism that no matter what you’re doing, you can do it badly, and sure enough, we can see in section 2 many items that badly address various aspects of the standard.

The facts of the matter are:

(a) The standards do not produce the test. People produce the test. These people need to take responsibility for the product.
(b) If it’s a good test, the standards do not get the credit.
(c) If it’s a bad test, the standards do not get the blame.
(d) It is possible to create a good set of questions, with far fewer deficiencies than we see in section 2, within the existing standards.
There are opportunities to improve the standards in certain ways, as discussed in section 4, but doing so will not change facts (a), (b), (c), and (d).

As an example of something that has much more potential for improving the process, here’s a suggestion: Dual-source the questions. That is, have two companies submit questions.

A crucial ingredient in this arrangement is to pay each company in proportion to how many of its questions are selected for use.

If a vendor objects to such a dual-sourcing arrangement, it is a dead giveaway that they don’t have confidence in their own product. You don’t want to do business with such a vendor.

Test printing, scoring, and analysis should be covered by a contract separate from the one for composing the questions.

2  Comments on Specific Test Materials

There is a bank of questions that are eligible to be used on California Standards Tests. Each year, after the tests have been administered, 25% of the questions are released from the bank. The released questions are not used again. For an index of all released questions, see reference 3.

It is unclear whether the released physics questions (reference 1) are representative of the bank. The released questions are so bad that one hopes they are not representative. It is astonishing to see such a high concentration of objectionable questions. Perhaps they are the culls, removed in order to improve the bank. (Releasing the culls would create the paradoxical situation where doing the right thing makes matters seem worse than they are.)

2.1  Test Questions

The question numbers here conform to the 2006/2007 version of reference 1.

Question 3:   In technical usage – notably in the expression “freely falling reference frame” – the word “falling” does not necessarily mean falling downward. That means we should consider sideways or upward trajectories, in which case answer B works at least as well as answer A.

As a result, this question could well have negative discriminatory power. See discussion of this point under question 17.

Furthermore, some skilled teachers have taught their students to adopt the policy that “human error” is too vague to be acceptable as an explanation for anything. Therefore some test-takers will reject answer A out-of-hand.

It is pointless to argue whether these are strong objections or weak objections. Instead, the convenient and 100% proper way to proceed is to re-word the question to eliminate the objections. In this case a good starting place would be to speak of an object “dropped from rest” (as opposed to merely “falling”).

This just goes to support the larger point that it is important to have a timely, systematic vetting process. Also it is important to have a process for improving questions, not just selecting questions, because choosing the best of a bad lot is unsatisfactory.

Question 4:   This is a remarkably poor question. There are twin weaknesses.

Question 5:   This is another remarkably poor question. The obvious criticism is that the question is unanswerable. The expectation could be based on formal theory or it could be based on direct observation. Real scientists do not make the distinction between “theory” and “hypothesis” that is implied here – see reference 4 for more on this.

Furthermore, the stem of the question contains strange misconceptions and anomalies. One normally applies voltage (not current) to light-bulb circuits, for a number of very good reasons. (One might apply current to a magnet coil, or to the base of a transistor, but those are significantly different physical situations.) Indeed, applying current to a series circuit containing a chunk of rubber might be downright dangerous. On an advanced test, I might not object to a question stem containing obiter dicta that cause confusion, but the confusion here is so pointless, and so out of balance with the plug-and-chug questions that make up the rest of the test, that I have to assume it is simply another mistake: another question drawn up by somebody who didn’t really understand the subject matter.

I am reminded of the proverbial expression “all hat, no cattle” (referring of course to someone who dresses like a cowboy but has not the slightest understanding of real cows or real cowboys). This question, like the previous one, appears to have been designed by someone who likes to talk about “the scientific method” but has no idea how real science is done ... likes to talk about “physics” but has no real understanding of the subject matter ... and likes to talk about “teaching” but has no idea how real teaching is done.

Question 7:   There are two possibilities:

In either case, this is an objectionable question.

Question 9:   What’s the point? Is there any reason to believe this question has any appreciable sensitivity?

Question 11:   This tests only rather low levels of understanding (not quite the lowest possible levels).

Question 13:   Again, this tests only the almost-lowest levels of understanding. As far as I can tell, it discriminates against the worst sort of mindless plugging-and-chugging, but not much more.

Question 15:   This question was extensively discussed on the Phys-L discussion group. The consensus found this question to be highly objectionable. The four answers are so absurd and bizarre that one cannot imagine any rational basis for deciding which is “best”.

I am informed that the stem of the question should have asked about “speed” not “velocity”. Even with that explanation, the situation is well-nigh incomprehensible.

Question 17:   This is not the worst question I’ve ever seen, but it’s not the best, either.

Question 18:   No serious objections, although C is a pretty lame distractor.

Another question on the same topic appears on the ETS web site (reference 5), as an example of an SAT question. The SAT version is markedly better. Why did California get saddled with the dumbed-down version?

Question 19:   Plug and chug, especially considering that the definition of kinetic energy is given in the appendix.

Question 21:   Plug and chug.

Question 24:   No objections.

Question 26:   Plug and chug. Ineffective distractors, resulting in negligible sensitivity.

Question 28:   This is objectionable for two reasons.

Question 29:   Plug and chug. Ineffective distractors ... is anybody really likely to choose D?

Question 31:   What’s the point? Is it really possible to get this wrong? If somebody gets it wrong, what do we infer from that?

Question 35:   No objections.

Question 36:   No objections.

Question 37:   No objections.

Question 38:   Ineffective distractors. Does anybody really think that smashing a coffee-cup with a hammer increases the “order”?

Also, this uses “order” as a stand-in for “entropy”, which is not really correct, for reasons discussed in reference 7.

Question 42:   Rote memory.

Question 44:   This is at best a one-out-of-two choice, not a one-out-of-four choice. Also, the question masquerades as a real-world experimental scenario, but how often do you measure a solid with a speed of sound as low as 1000 m/s? Cork is an example, but there aren’t many such.

Question 47:   This is at best a one-out-of-two choice, i.e. two of the distractors are ineffective.

Question 48:   Rote memory. Ineffective distractors.

Question 49:   Small sensitivity, since it’s obvious that answer D is what is wanted ... even though answer B would be physically correct in almost all practical situations. The other two distractors are ineffective.

Question 50:   Two of the three distractors are lame. (This question could have been made pretty good, with a little reworking. Again this illustrates the point that a process that involves choosing the best of a bad lot is a process in urgent need of improvement.)

Question 52:   Plug and chug. The fact that the battery and second resistor are unlabeled is a dead giveaway that they are irrelevant.

Question 56:   I’m not sure that the nonclassical spin of an electron counts as “motion” in the usual sense ... so the desired answer can be seen as perpetuating a misconception. See the discussion of strategy under question 3.

I also wonder about distractor A. Although there is a somewhat-common misconception concerning the northness of north poles and the southness of south poles, I suspect anyone who is sophisticated enough to know the vocabulary word “monopoles” doesn’t suffer from this misconception. It seems as if the test-makers were trying to show off, trying to show me that they knew about monopoles, but alas doing so in a way that weakened the test. All hat, no cattle.

Question 59:   What’s the point? This question seems to be testing for rote-level familiarity with the word “plasma”.

2.2  Reference Sheet

In conjunction with the test-question booklet, students are given a “Physics Reference Sheet” containing “Formulas, Units, and Constants”.

Among these is the formula

$$\Delta S = \frac{Q}{T} \qquad \text{[allegedly]} \tag{1}$$

which is really quite objectionable, for reasons discussed in reference 7.

The released questions do not make use of this formula, but one may reasonably fear that some of the unreleased questions do.

3  Coverage and Balance

When viewed as a whole, a well-designed test must meet a number of global requirements, such as requirements as to coverage and balance. That includes making sure that none of the important topics are skipped or unduly under-weighted.

Checking individual questions one by one is not good enough. Checking to see that an individual question is “acceptable” is nowhere near sufficient for achieving acceptable coverage and balance. The existing test-construction process pays far too little attention to coverage and balance. The individual questions, as bad as they are, are not the biggest problem. The lack of coverage and balance is a more serious and more deep-seated problem.

The idea of balance applies not just to individual bits of domain-specific knowledge, but also to higher-level goals. As an example of a higher-level goal, consider question 6 on the corresponding chemistry document (reference 8). In reference to figure 1 it says: «The chart ... shows the relationship between the first ionization energy and the increase in atomic number. The letter on the chart for the alkali family of elements is: ....»

Figure 1: Ionization Energy Trends

This question requires knowing the definition of “alkali family” but does not ask for a rote recitation of the definition. Similarly it requires knowing the definition of “first ionization energy” but does not ask for a rote recitation. Thirdly it requires some minimal skill in interpreting a graph. Therefore this question has a special role to play as part of the overall test:

If you are interested in testing the student’s ability to think, to combine ideas, as opposed to plugging and chugging, then this is a commendable question. The question is not difficult if you know the material, but it does require knowing the material, and it also requires a multi-step thought process.

In contrast, if you aren’t trying to test for thinking skills, this question is sub-optimal, because its score is difficult to interpret: if the student gives a wrong answer, you won’t know which of the various steps went wrong.

I am not saying that all questions should require combining multiple ideas. I am saying there needs to be a balance between checking basic factoids one-by-one and checking for higher-level thinking skills.

The alkali question has some remarkable weaknesses to go along with the strengths mentioned above. For one thing, the chart has four Ws but only three Xs, two Zs, and two Ys ... even though it would have been the easiest thing in the world to add a third Y and/or remove one of the Ws. This telegraphs that W is the “interesting” thing. A student who doesn’t know the material but is test-wise will pick up on this. I cannot imagine how this weakness arose, or how it slipped through the question-selection process.

Actually there are multiple excellent reasons for removing one of the Ws. Calling hydrogen an alkali metal represents the triumph of dogma over objective reality. For details on the placement of hydrogen, see reference 9.

If the released questions are representative, they indicate that the physics test has serious coverage and balance problems:

4  The Standards Themselves

The full, official physics standards can be found in reference 10. You may wish to compare them to the standards for related subjects, such as math (reference 11), algebra (reference 12), science (reference 13), and chemistry (reference 14). There is even a standard for “Investigation & Experimentation” (reference 15). The state also provides some “framework” documents in areas such as science (reference 16) and math (reference 17) to explain and elaborate upon the standards.

The standards need to be explicit about the following:

The primary, fundamental, and overarching goal is that students should be able to think, to reason effectively. This is far more important than any single bit of domain-specific knowledge.

Clearly it couldn’t hurt to say that, but you may be wondering whether it is really necessary.

Therefore, yes, it is important to stop pussyfooting around and explicitly make “thinking” the primary, fundamental, and overarching goal.

Moving now to a lower level, the tactical level, the standards fail to mention the great scaling laws. In 1638, Galileo wrote a book On Two New Sciences. In it, he made heavy use of scaling laws. The scaling laws are simultaneously more profound, more age-appropriate, and more readily applicable than many of the topics that are mentioned in the standards.

Except for those omissions, and except for a few howlers mentioned below, the standards themselves seem reasonable. Although they could be improved, they are not the rate-limiting step. As mentioned in section 1, it is perfectly possible to make a good set of questions, much better than the questions discussed in section 2, within the current standards.

On the other side of the same coin, we should keep in mind the fact that you can’t make anything foolproof, because fools are so ingenious. No matter how good the standards are, they will never be foolproof or abuse-proof.

5  References

[1] California Standards Test – Released Questions (physics). The 2004/2005 version of the rtqphysics document (which was the basis for the first draft of this document) is archived at:
I call this the 2004/2005 version, because it appeared in early 2005 and covers questions used on the 2004 and earlier tests.

The corresponding “current” document is at: http://www.cde.ca.gov/ta/tg/sr/documents/rtqphysics.pdf

This analysis has been revised so that the question-numbering conforms to the 2006/2007 rtqphysics document, i.e. the one released in January 2007 and covering questions administered to students in 2006 and earlier years.

[2] Dean Baird, “CST Schoolhouse Rock” http://phyzblog.blogspot.com/2007/08/cst-schoolhouse-rock.html
[3] Index of all released questions (various topics and grade levels) http://www.cde.ca.gov/ta/tg/sr/css05rtq.asp
[4] John Denker, “Scientific Methods” ./scientific-methods.htm
[5] Test question: what happens when the string breaks? http://www.collegeboard.com/student/testing/sat/lc_two/phys/prac/prac08.html?phys
[6] John Denker, “Conservation as related to Continuity and Constancy” ./conservation-continuity.htm
[7] John Denker, “The Laws of Thermodynamics” ./thermo-laws.htm
[8] California Standards Test – Released Questions (chemistry). The 2004 version is archived at: http://web.archive.org/web/20050325132956/www.cde.ca.gov/ta/tg/sr/documents/css05rtqchem.pdf
and the “current” version is at: http://www.cde.ca.gov/ta/tg/sr/documents/rtqchem.pdf
[9] John Denker, “Periodic Table of the Elements – Cylinder with Bulges” www.av8n.com/physics/periodic-table.htm
[10] California Department of Education, “Physics” (9-12 standards) http://www.cde.ca.gov/be/st/ss/scphysics.asp
[11] California Department of Education, “Mathematics” (K-12 standards) http://www.cde.ca.gov/be/st/ss/mthmain.asp
[12] California Department of Education, “Algebra 1” (8-12 standards) http://www.cde.ca.gov/be/st/ss/mthalgebra1.asp
[13] California Department of Education, “Science” (K-12 standards) http://www.cde.ca.gov/be/st/ss/scmain.asp
[14] California Department of Education, “Chemistry” (9-12 standards) http://www.cde.ca.gov/be/st/ss/scchemistry.asp
[15] California Department of Education, “Investigation & Experimentation - Grades 9 to 12” (standards) http://www.cde.ca.gov/be/st/ss/scinvestigation.asp
[16] California Department of Education, “Science Framework” (K-12) http://www.cde.ca.gov/re/pn/fd/documents/scienceframework.pdf
[17] California Department of Education, “Mathematics Framework” http://www.cde.ca.gov/ci/ma/cf/index.asp