Government-Mandated High-Stakes Trivia Tests -- Or Not --

John Denker

1 The Testing Game, Or Not

In the United States, students from 3rd grade through 12th grade are subject to a high-stakes test every year. The testing program is required by federal law, but the implementation is delegated to the states. Because of the way the results are used, each state has nothing to gain and everything to lose by having a test that actually measures anything. The entirely foreseeable result is that the test-vendors (who prepare the tests on behalf of the states) are engaged in a “race to the bottom”, preparing tests such that anyone can score well, whether or not they know anything.

I call these “game-show” tests. They are basically trivia contests. That is, they are a mile wide and an inch deep, and emphasize questions that can be answered quickly. For all practical purposes, they don’t measure the things that really matter. They don’t measure critical thinking skills or the ability to work through complicated problems, as discussed in reference 1. They also don’t measure judgment, personal responsibility, teamwork, generosity, et cetera.

Be careful what you test for;
you might get it.

Game-show tests are a disaster on many levels. For starters, one of the most important things a school can do is to get students interested in learning. Also, at some level, school should teach useful skills. Trivia are, alas, neither particularly interesting nor particularly useful.

You would think that a game-show test would make life easy for the teachers, but it doesn’t work out that way. It’s not a game. The teachers are obliged to take the tests seriously, for reasons discussed in section 2.

Contents

1 The Testing Game, Or Not

2 Inconsistent, Invidious, and Punitive

3 Making the Tests Easy To Grade and Easy to Interpret

3.1 Convenience and “Validity”

3.2 Another Viewpoint

3.3 Non-Trivia are Easier to Motivate

3.4 Building Block Approach

4 The True Cost

5 The Federal Mandate is Obsolete

6 Possible Improvements

6.1 Not Teaching to the Tests

6.2 Just Improving the Tests

6.3 Dropping the Government Mandate Altogether

6.4 Statistical Application of Better Evaluations

6.5 Board of Visitors

7 Selective Admissions

8 Persuading the Persuadable

9 Porcupine

10 References

2 Inconsistent, Invidious, and Punitive

Some folks say these tests are supposed to promote “accountability” in the school system. That sounds like a noble goal, in principle. I am generally in favor of accountability ... in the schools and everywhere else. I would say that the devil is in the details, except that the existing testing program is not merely wrong as to details; it is profoundly and structurally wrong. We don’t even need to ask what the intended purpose of the testing program is, because no matter what you are trying to do, this is the wrong way to do it.

Except in Lake Wobegon, there will always be some schools that are below average. Indeed, there will always be some schools in the bottom ten percent.

If you identify the schools that are in the bottom ten percent and destroy them, the next year there will be some other schools in the bottom ten percent. If you destroy them and keep going in this direction, rather soon there will be nothing left. If the goal is to make sure nobody is in the bottom ten percent, this is an equation with only one solution, namely a zero-sized school system. There is a fundamental logical inconsistency here ... unless your goal is to zero out the entire school system.
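The arithmetic behind this can be sketched as a toy simulation. Everything in it is a hypothetical modeling assumption, not data: 1000 schools, random scores, and a rule that each year closes the lowest-scoring tenth of the schools that remain (rounded up, so at least one school closes as long as any remain). The function name close_bottom_decile is likewise made up for illustration.

```python
import random

def close_bottom_decile(num_schools: int, seed: int = 0) -> list[int]:
    """Hypothetical policy: each year, close the lowest-scoring tenth of the
    remaining schools, rounded up. Returns the school count after each year."""
    rng = random.Random(seed)
    # Hypothetical scores. The head count does not depend on them:
    # ranking always places somebody in the bottom tenth.
    scores = [rng.random() for _ in range(num_schools)]
    history = []
    while scores:
        scores.sort()
        n_closed = (len(scores) + 9) // 10  # ceil(10%) using integer arithmetic
        scores = scores[n_closed:]          # drop the lowest-scoring schools
        history.append(len(scores))
    return history

history = close_bottom_decile(1000)
print(f"years until no schools remain: {len(history)}")
print(f"schools left after 10 years: {history[9]}")
```

Because ranking always puts somebody at the bottom, the count only goes down, whatever the scores are; under these assumptions the whole system is gone within about fifty simulated years.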

In contrast, if you were to identify the schools in the bottom ten percent and help them, that would make a certain amount of sense ... but that’s not how the program works.

Because the consequences of doing poorly on the test are so severe, teachers feel obliged to take the tests very seriously, no matter how trivial and counterproductive the tests are. (On the other hand, see also section 6.1.)

The 2001 federal law is captioned “No Child Left Behind” (NCLB). Teachers refer to it instead as “Every Child Left Behind Equally” (ECLBE) and “No School Left Standing” (NSLS).

3 Making the Tests Easy To Grade and Easy to Interpret

3.1 Convenience and “Validity”

In some quarters, it is considered conventional wisdom to design each test question so that it covers a single topic, i.e. a single concept, fact, or principle – specifically, a single line-item from the applicable standard. See e.g. the third paragraph of reference 2.

We can describe such questions as "sharply focused".

One advantage of sharp focusing is that you know how to grade the test and you know how to interpret the results: If the student answered question X correctly, he understands concept X, and if not, he doesn’t.

Another advantage of sharp focusing is that it guarantees "validity". That is to say, it means the test demonstrably measures exactly what it purports to be measuring. This is practically the textbook definition of "validity" in this context; see e.g. reference 3 and references therein.

Given the twin advantages of convenience and "validity", we are not surprised to find that tight focusing is heavily favored by erudite academics and by the people who design and administer large-scale standardized tests. Indeed, it literally goes without saying in most cases. See e.g. reference 4, et cetera, et cetera (although there are some tests that don’t fit the pattern).

3.2 Another Viewpoint

There is, however, another way of looking at this matter.

According to this second viewpoint, questions that are important and relevant to the real world are almost never tightly focused. If you are working at any job other than the proverbial McJob, the questions that arrive on your desk are not tightly focused. Although there are some real-world questions that are multiple-choice, there are many that are not. For example, teaching is certainly not a multiple-guess job. I mean, seriously, when was the last time a student came up to a teacher and said "I’m confused, and here are the four possible ways in which I could be confused. Pick one."

It would appear that although tight focusing guarantees "validity" and convenient interpretation, it also guarantees triviality. "Validity" (in the technical, education-research sense) means that the test measures what it purports to measure; it does not say anything about what the test should be measuring (in the common-sense sense). "Validity" does not imply real-world relevance, let alone importance.

This seems like an extreme form of looking for keys under the lamp-post. It is convenient to look there, and you can do a thorough and "valid" search there ... but the whole exercise is pointless, because you knew a priori that nothing worth looking for could possibly be under that lamp-post.

3.3 Non-Trivia are Easier to Motivate

Another problem with sharp focusing is that students aren’t stupid. They realize that sharply focused trivia are not worth remembering, and they’re right. Therefore if you teach to the test – the trivia test – you will not clear even that low hurdle, because it is impossible to motivate the students to pay attention to a manifestly worthless goal.

Constructive suggestion: If you aim higher, teaching actual worthwhile principles and applications, you can clear the much higher hurdle. The students will learn something useful. They will also do well on the trivia test, but that’s merely a corollary, merely icing on the cake. It’s ironic in a good way: If you don’t focus too much on teaching to the test, they’ll do better on the test.

This suggestion is important, because it can be implemented more-or-less overnight, without waiting for the bureaucracy to get its act together.

3.4 Building Block Approach

Some of this can be understood in terms of the building block approach. I am a big fan of that approach for teaching complex tasks. It calls for dividing the task into simple elements, teaching the elements one by one (i.e. tight focusing) ... AND THEN gradually putting these elements together, assembling the building blocks into a complete edifice.

So I would say that tight focusing is a good place to start, but it is not a good place to end up.

Tests are important. Your tests tell the world where you think the rubber meets the road. Alas, all too often, the standardized tests call for:

- superficiality instead of complexity
- triviality instead of importance and relevance
- equation-hunting instead of reasoning
- terminological lawyering instead of understanding
- rote regurgitation instead of comprehension
- paint-by-numbers instead of artistry

To borrow a phrase: I’m not opposed to all testing. I’m opposed to dumb testing.

I’m all in favor of real achievement. I’m in favor of accountability. I’m in favor of assessment. Indeed, good teachers spend all day, almost on a minute-by-minute basis, assessing the students’ situation, so as to determine where to go next. I’m even in favor of a certain amount of standardization. I am not, however, in favor of judging students, teachers, or schools on the basis of trivia tests.

Everybody is in favor of convenience and "validity", other things being equal ... but not at the cost of defeating the purpose of the test ... and defeating the purpose of the entire educational system. This seems like an extreme form of penny-wise and pound-foolish.

We need to put a stop to this. See figure 1.

Figure 1: Torches and Pitchforks

We need more complex, nuanced tests. That way, if the student gets the right answer, we know quite a lot. We know something important. This should be considered the normal case. In contrast, if the student does not get the right answer to a complex question, we don’t immediately know what went wrong,¹ so we need a backup plan. We need specialized follow-up tests and other techniques for dealing with this abnormal case, to diagnose what went wrong. We must not, however, let the abnormal tail wag the normal dog.

4 The True Cost

Let’s estimate the costs involved. In this section, we focus attention on the present-day best case, namely schools that do well on the test. (The costs are far higher for schools that don’t do well, as discussed in section 2.)

Cost W is the so-called hard cost, i.e. the direct out-of-pocket cost, paid to the contractors who prepare the test questions, print the test materials, do the scoring, et cetera. It varies from state to state, but as a ballpark approximation we can say W is on the order of $20.00 per student per year.
Cost X is the opportunity cost directly associated with administering the test. This accounts for the time the students spend taking the test, all of which must be considered a loss of instructional time. Based on the overall cost of operating the school system, this loss X must be at least $100.00 per student per year, maybe more.
Cost Y is the indirect cost to the educational system. This arises because the test distorts and perverts the instruction throughout the year. Because the stakes are so high, teachers feel obliged to teach to the test. As a general principle, teaching to the test is either a good thing or a bad thing, depending on the test. Alas, the existing state-mandated high-stakes tests are so bad that teaching to these tests tends to defeat the purpose of the educational system. Therefore the cost Y must be at least $1000.00 per student per year, maybe more.
Cost Z is even more indirect. It represents the cost to industry of having a poorly-educated workforce, and the cost to society at large of having a poorly-educated citizenry.

We really need to pay attention to the disparity in the costs:

$$
\begin{array}{ccccccc}
W & \ll & X & \ll & Y & \ll & Z \\
\$20 & \ll & \$100 & \ll & \$1000 & \ll & \text{??}
\end{array}
\tag{1}
$$

These numbers suggest that every dollar spent directly on these tests (W) brings at least 50 dollars of waste along with it (Y/W = 1000/20 = 50), because of the overall pernicious effect on teaching and learning throughout the year.
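As a check on the 50-to-1 figure, here is the same arithmetic spelled out, using the ballpark per-student figures from the list of costs. These are the essay's rough estimates, not measured data, and Z is left out because no figure is given for it.

```python
# Ballpark per-student, per-year costs in dollars, as estimated in the text.
W = 20     # hard cost: paid to the test contractors
X = 100    # lost instructional time while taking the test (lower bound)
Y = 1000   # distortion of instruction throughout the year (lower bound)

indirect_per_dollar = Y / W           # indirect cost per hard-cost dollar
all_waste_per_dollar = (X + Y) / W    # also counting the testing time itself

print(f"Y/W = {indirect_per_dollar:.0f}")
print(f"(X+Y)/W = {all_waste_per_dollar:.0f}")
```

Counting the lost testing time X as well only makes the ratio worse (55 rather than 50), so "at least 50" holds either way.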

Obviously, economizing on the hard cost W is exceedingly pound-foolish. We should either spend enough on the testing program to make it worthwhile, or we should do away with it altogether.

5 The Federal Mandate is Obsolete

At present, there is a federal layer and a state layer. That is, the federal government requires the state governments to require testing. Let’s discuss the federal layer first. (Various options available at the state level are discussed in section 6.)

Let’s consider the following scenario, which is meant to be favorable to the idea of testing. We assume, hypothetically, that state-mandated high-stakes testing is a good idea. We further assume, hypothetically, that back in 2001, the states were not doing enough testing, and therefore did not appreciate the value of testing. Subject to these assumptions, it could be argued that the federal law was a good idea, insofar as it compelled the states to overcome inertia and ignorance. It forced them to begin testing.

However, even in this favorable scenario, there is no longer any point in having a federal mandate. The states now have many years of experience with state-mandated high-stakes testing. Surely they are, by now, fully aware of the advantages of testing. Inertia has been fully overcome.

Therefore we conclude, non-hypothetically, that the federal mandate should be abolished. Whether or not you think the federal mandate originally served a useful purpose, it no longer does so. Let each state decide for itself the appropriate amount of testing and kind of testing.

6 Possible Improvements

In this section, except possibly in subsection 6.1, we assume that the federal mandate has been dropped (for reasons discussed in section 5) so that states are free to make their own decisions.

6.1 Not Teaching to the Tests

Students aren’t stupid. They realize that trivia are not worth remembering, and they’re right. Therefore if you teach to the test – the trivia test – you will not clear even that low hurdle, because it is impossible to motivate the students to pay attention to a manifestly worthless goal.

Constructive suggestion: If you aim higher, teaching actual worthwhile principles and applications, teaching students to solve important problems, you can clear the much higher hurdle. The students will learn something useful. They will also do well on the trivia test, but that’s merely a corollary, merely icing on the cake. It’s ironic in a good way: If you don’t focus too much on teaching to the test, they’ll do better on the test.

This suggestion is important, because it can be implemented more-or-less overnight, without waiting for the bureaucracy to get its act together.

6.2 Just Improving the Tests

In other contexts, I have seen good tests. For example, the FAA-mandated Private Pilot Practical Test Standard lays out certain requirements for getting a pilot license. It’s a good test. It’s not too easy. It’s not too hard. There’s not much extraneous material on it. There’s not too much of importance that is missing from it. It is a nationwide standard, constructed and published by the federal government.

Returning now to the grade school classroom: One could imagine improving the state-mandated tests. Making small changes won’t help. Reducing the indirect cost Y by a factor of two would be nice, but wouldn’t be worth the trouble, because even then, the testing program would still be worse than nothing. (It appears that “nothing” is a plausible option, as discussed in section 6.3.)

The ideal would be to change the sign of Y. That is, the ideal testing program would improve the overall instructional program, rather than subverting and perverting it. This would require a radically different approach to testing.

Suppose we switch to better tests. The new hard cost W₂ would be substantially higher than the current hard cost W. For example, rather than having multiple-guess questions we could have open-ended questions that would have to be hand-graded, which is expensive. However, these costs are finite, and overall this would be a very good bargain: It would pay for itself many times over via the improvement in Y and Z.

Note that administering the private pilot practical test costs several hundred dollars per student ... and it covers roughly 100 hours of instruction. A year of school is on the order of 1000 hours of classroom time. The obvious extrapolation suggests that a good test – good enough to improve the overall instructional program – would cost hundreds or maybe even thousands of dollars per student, substantially more than the existing tests. On the other hand, if this is what is needed to improve the instructional program, we have to pay the price.
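One way to read “the obvious extrapolation” is naive linear scaling by hours of instruction covered. The sketch below assumes that reading. The $300 and $900 endpoints are hypothetical stand-ins for “several hundred dollars”; the 100-hour and 1000-hour figures come from the paragraph above.

```python
PILOT_TEST_COST = (300, 900)   # hypothetical range for "several hundred dollars"
PILOT_TRAINING_HOURS = 100     # roughly, per the text
SCHOOL_YEAR_HOURS = 1000       # order of magnitude, per the text

def scaled_cost(test_cost: int) -> float:
    """Naive linear scaling: test cost proportional to hours of instruction covered."""
    return test_cost * SCHOOL_YEAR_HOURS / PILOT_TRAINING_HOURS

low, high = (scaled_cost(c) for c in PILOT_TEST_COST)
print(f"naive estimate: ${low:,.0f} to ${high:,.0f} per student per year")
```

Linear scaling may overstate the cost, since a test need not grow in proportion to the material it covers; either way, the result is far above the current hard cost W of about $20.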

To say the same thing the other way: The current testing regime is penny-wise and pound-foolish to an extreme degree. We cannot afford to continue with these supposedly cheap tests.

6.3 Dropping the Government Mandate Altogether

There is a proverb that says half a loaf is better than none. Like most proverbs, it is sometimes true and sometimes not. If the loaf is laced with high levels of cyanide, half a loaf is much worse than none. Even if some of the ingredients are nutritious, overall the disadvantages outweigh the advantages, by a lot.

I mention this because it is analogous to the existing state-mandated high-stakes multiple-guess tests. All in all, these tests are worse than nothing. Even if they do some good in some cases, overall the disadvantages outweigh the advantages, by a lot.

It is fairly easy to find non-toxic bread. It is somewhat harder to find an easy-to-administer test that measures anything worth measuring.

Therefore we should seriously consider the option of simply stopping the mandatory testing program altogether. This is the famous null hypothesis. This is in some sense the simplest option. “Just say no.”

Here’s one way of explaining this proposal: Teaching is a profession. By definition, a professional is free to choose the tools and methods for solving a given problem. This is how it should be. This requires trusting the teachers, which I do. I trust a below-average teacher more than I trust whoever is making up these state-mandated tests.

Evidently there are some folks who – intentionally or otherwise – are bent on turning teaching into a blue-collar job, instead of a profession. That is, the teacher becomes an automaton standing in front of the room and reciting a state-mandated script. I suppose this might be an improvement in 1 percent of the cases, but it is insanely destructive in the other 99 percent.

These tests were imposed by politicians for political reasons. This is a political problem, and will have to be solved by political means.

Teachers did not ask for these tests, and teachers cannot be held responsible for fixing these tests. It’s not clear whether the tests are fixable, even in principle. Maybe they are, maybe they aren’t. So I say to the politicians: If and when you find a test that actually measures something worth measuring, then you can impose it if you want. In the meantime, however, the existing tests are worse than nothing, and need to be stopped. The first step toward recovery is to stop eating the cyanide-tainted bread. This will not solve all the world’s problems, but the existence of multiple problems is never an excuse for tolerating problems that are readily fixable.

6.4 Statistical Application of Better Evaluations

Now let’s discuss another option, namely statistical application of better evaluations (SABE).

Some folks suggest that any improvement over the existing test would be prohibitively expensive. That might be true if you take the direct approach, but there is a way around it: we administer a sound, comprehensive test to a small percentage of the students, randomly selected. If the test costs ten times more for each person actually tested, but we test only 10% of the students, we break even on testing costs, and come out ahead overall, to the extent that the test measures something worth measuring.

The SABE approach might be more politically acceptable than doing away with state-mandated testing altogether.

6.5 Board of Visitors

Some folks say there is a need to identify the so-called “failing schools”. I’m not sure we need to do that ... but if we do, the existing tests are the wrong way to go about it. (They are wrong for that purpose and for any other purpose.) It’s not hard to figure out which schools have problems.

Sending around a “board of visitors” would be much better, because it might give you some idea of why things went wrong, and what remedial steps need to be taken. No multiple-guess test is going to tell you that.

The board-of-visitors approach can be combined with better testing (section 6.2), SABE (section 6.4) ... or with no state-mandated high-stakes testing at all (section 6.3).

7 Selective Admissions

There is one sure-fire way to improve your school, and that is by being more selective in your admission (and retention!) policies. This is closely related to the idea of “school choice”.

Selectivity is great for the selective schools, but it is not so great for the educational system as a whole. Somebody has to deal with the students who aren’t sought after by the selective schools.

As always, policy should be set in such a way as to improve the system as a whole, rather than optimizing one part of the system at the expense of other parts.

I have no objection to selective schools. I know lots of people who have benefitted from going to selective schools. My point is that those who benefit should be asked to pay their fair share of the total cost ... and that includes the costs that have been shifted to the not-so-selective schools.

8 Persuading the Persuadable

Again: The mandatory high-stakes tests were imposed for political reasons. This is a political problem, and will have to be solved by political means.

The first step in any political operation is to identify the various stakeholders and figure out what they want. We need to do some segmentation.

In segment “A” are the folks who want to improve the public school system. It should be relatively easy to get them to realize that the existing testing program is horribly counterproductive.

In segment “B” are the folks who want to destroy the public school system. One sub-segment comprises folks who want to re-segregate the education system along racial, partisan, and sectarian lines. They send their kids to religious school or home school, which is very expensive, and they don’t want to spend any money on “those people” who go to public school. Another sub-segment comprises folks with no kids of their own. They don’t care whether the schools are segregated or not; they just don’t want to pay for schools, period.

This is relevant to testing, because the tests are part of a system that is more-or-less guaranteed to destroy the school system, as discussed in section 2.

At the tactical level, segment B has been quite successful. The way things are going, many of the public schools will not survive.

Alas, this tactical achievement is a strategic disaster. Democracy cannot survive without a strong public school system. There is a short-term argument that says, crudely but accurately, that schools are cheaper than prisons. The longer-term argument is more subtle: Presumably everybody wants to live long enough to retire. I’m talking about everybody, including folks who don’t have kids in the public school system. Also, retired people want to be able to afford retirement. They want to collect social security payments and they need to buy services. If the next generation of workers is not skillful and productive, they won’t be paying enough taxes to support the social security system. Just as importantly, no matter how much money the retirees have saved, they won’t be able to buy services that aren’t available. Spending enough money now to guarantee a skilled, productive workforce in the future is one of the best investments anybody can make.

It is ironic that some people think that re-segregating and/or destroying the public schools will give them a religious advantage. This defies my understanding of religion, because they seem to be saying “I take care of me and mine, and everybody else can go .......” This may not be the message they intend to send, but that’s the way it comes across. To make the same point in another way, I refer to Matthew 25:31-46.

It is also ironic that many of the politicians who are bent on destroying the school system are at the same time strong supporters of the military. To understand the irony, I call attention to the Normandy landings. See reference 5. It wasn’t a bunch of rich preppies who won that battle. Many of the Allied officers were killed or wounded before they got off the beach. It was up to the private soldiers to figure out what had to be done, and then do it. It was a bunch of boys, the salt of the earth, from the farms and tenements of America, self-motivated and self-led. They prevailed against a force that had superior numbers, superior military training, superior experience, a superior tactical situation (in terms of prepared positions, interior lines of communication, etc.), superior weapons, and so forth.

Evidently some politicians nowadays don’t like people who aren’t sufficiently wealthy and aren’t sufficiently white. Even so, deciding not to educate non-elite people is going to seriously compromise military security in the not-very-distant future.

You know who these politicians are. Don’t vote for them, ever. No matter what they say they’re trying to accomplish, they’re going about it in the wrong way.

9 Porcupine

Once upon a time I found a porcupine quill. I observed that one quill, all by itself, is not very scary. In contrast, an entire porcupine full of quills, if it is provoked, is very scary indeed. A porcupine is normally a placid, peaceable, vegetarian creature ... but you don’t want to pick a fight with one.

I mention this because a teacher recently said he had to tolerate a lot of nonsense because he didn’t have enough “standing” to challenge it. That’s entirely true as far as it goes, but it’s not the whole story. Each of us has a quill. If we get together and hold up our quills, nobody will be able to kick us around.

Before we can do that, we need to have a consensus amongst ourselves as to where we stand, and what we want to happen next.

So I leave it as a series of questions:

Do we have consensus on the following points?
- The federal requirement has outlived its usefulness and should be immediately repealed. (See section 5.)
- The existing testing program is worse than nothing. Unless it can be radically improved, it should be abolished.
Is there any way of modifying the testing program so as to make it make sense?
More generally, assuming the goal is to improve the educational system as a whole, what are the appropriate ways of doing that?

10 References

1. John Denker, “Teaching (and Learning) Thinking Skills”
   www.av8n.com/physics/thinking.htm
2. California Released Questions – Math
   http://www.cde.ca.gov/ta/tg/sr/documents/rtqgr7math.pdf
3. Wikipedia article, “Test validity”
   http://en.wikipedia.org/wiki/Test_validity
4. A Few State-Mandated Tests
   http://www.azed.gov/research-evaluation/aims-assessment-results/
   http://www.cde.ca.gov/ta/tg/sr/
   http://www.cde.ca.gov/ta/tg/sr/css05rtq.asp
   http://www.nysedregents.org/
   http://www.tea.state.tx.us/index3.aspx?id=3850menu_id=793
5. Stephen E. Ambrose, D-Day: June 6, 1944: The Climactic Battle of World War II (1995). ISBN 068480137X.

1: This is the mirror image of what we get with a trivia test, where a bad score is meaningful but a good score is not.
