Measuring Temperature with a Tablespoon

 

Bill Breisch, Director of Instruction

Monona Grove School District

Is that what we're trying to do when we use standardized achievement tests to measure the

instructional quality of Wisconsin's schools? James Popham thinks so.

Recently I had the opportunity to hear a major presentation on high-stakes statewide testing by James

Popham, one of the nation's foremost authorities on educational testing. He has taught courses in

evaluation and measurement for nearly thirty years at UCLA's Graduate School of Education.

Popham refers to himself as a "recovering test developer," having headed a group that built high-stakes

tests for more than a dozen states.

Popham confirmed some suspicions I have had regarding Wisconsin's recent use of standardized

achievement tests (CTB-McGraw Hill's TerraNova) for our fourth-, eighth- and tenth-grade students,

which our students will take in November. In addition, Popham shared fascinating information about

the history, proper use and misuse of standardized tests, which I would like to summarize and share

through this column.

I will only be able to "scratch the surface" in describing this issue, which is incredibly important to

students, parents, educators and our entire community. An excellent resource for additional

information is James Popham's recent book Testing! Testing! What Every Parent Should Know

About School Tests.

So when did our country start using standardized tests extensively? All of this began during World

War I when the U.S. Army needed lots of new officers and wanted to find a way to identify who

should enroll in officer training. "Army Alpha," a group intelligence test, was developed to discriminate

among test-takers' intellectual abilities. If you scored high, you'd be good for officer training. Middle

scores meant you were sent to the trenches, and low scores meant you didn't belong in the Army.

Army Alpha worked well for its intended purpose.

Popham notes: "The overriding mission of today's standardized achievement tests is not

fundamentally different from the mission of Army Alpha. Develop a set of items that will allow for

comparisons ('score spread') among test-takers ... which, as it turns out, is why they are altogether

unsuitable for determining the effectiveness of teachers' instructional endeavors." How come?

Popham gives three major reasons.

First, there are teaching/testing mismatches. Standardized tests often broadly list skills in order to

be marketed throughout the country. If the skills were made very specific, they wouldn't match the

different academic standards of our states. This would affect the company's ability to market its

test. Years ago, I was frustrated trying to match very broad TerraNova skill categories with specific

test questions where our students did not score well. I thought at the time that these skill categories

were so broad that I often couldn't figure out what specific skills were being measured. I understand

now that it was by design so that the standardized test could be marketed and sold in many states.

Studies have shown that often half or more of what's tested isn't even supposed to be taught in a

particular district or state. For example, a major study at Michigan State showed that when they

compared the content of the nation's major elementary mathematics textbooks to the major

standardized achievement tests, at least 50% of the content of the tests did not assess skills that were

addressed meaningfully in any of the textbooks. Assessments should reflect the major skills

emphasized in instructional materials that are used throughout the country.

Second, when standardized tests are updated (re-normed), companies will often delete items

that covered important content. Why would this happen? Popham notes, "Remember that Army

Alpha worked because it generated a large 'score spread' among the examinees. The more important

the content, the more likely teachers are to stress it. The more that teachers stress important content,

the better that students will do on an item measuring that content. But the better that students do on

such an item, the more likely it is that the item will disappear from the test."

Popham further concludes, "From a test publisher's perspective, the best items for standardized

achievement tests (those that produce a good 'score spread') are those that can't be influenced by

even first-rate instruction." What a scary thought! Companies often pick test items that don't match

major instructional targets.

Third, there are factors other than instruction that also influence student performance on these

tests. To be honest, I have had doubts about the influence of some of these factors until I read some

of Popham's very specific examples. Without going into detail here, I now believe that there is

something to this. Popham observes that student responses on these tests could reflect any of the

following three factors (or some combination): (1) what was taught, (2) inherited academic aptitudes,

and (3) socioeconomic status. Student responses to these test items often reflect more than just the

quality of teaching.

Many of our nation's state tests have become "high stakes." For example, individual student

performance on these tests has been linked to student promotion. In addition, school test scores are

now being used to determine "instructional success," which under No Child Left Behind federal

legislation can now result in a series of sanctions for schools judged not to be making "Adequate

Yearly Progress."

Is all of this some kind of conspiracy? Popham doesn't think so. "No, I don't think educational

policymakers established high-stakes testing programs to harm children---or to 'get' teachers. Nor do

I think these policymakers acted out of malevolence. Rather, policymakers' actions reflect their

ignorance of the reality of educational testing. Even worse, they don't know that they don't know."

Assessment illiteracy is common in the educational community as well as among the general public.

What should we do? How can we have large-scale state tests that give real evidence of

accountability for student learning and also assess high-quality skills that can be successfully taught? We

can do both. Read my follow-up column for some answers!