Measuring Temperature with a Tablespoon
Bill Breisch, Director of Instruction
Monona Grove School District
Is that what we're trying to do when we use standardized achievement tests to measure the
instructional quality of Wisconsin's schools? James Popham thinks so.
Recently I had the opportunity to hear a major presentation on high-stakes statewide testing by James
Popham, one of the nation's foremost authorities on educational testing. He has taught courses in
evaluation and measurement for nearly thirty years at UCLA's Graduate School of Education.
Popham refers to himself as a "recovering test developer," having headed a group that built
high-stakes tests for more than a dozen states.
Popham confirmed some suspicions I have had regarding Wisconsin's recent use of standardized
achievement tests (CTB-McGraw Hill's TerraNova) for our fourth, eighth and tenth grade students,
which they will take in November. In addition, Popham shared fascinating information about
the history, proper use and misuse of standardized tests which I would like to summarize and share
through this column.
I will only be able to "scratch the surface" in describing this issue, which is incredibly important to
students, parents, educators and our entire community. An excellent resource for additional
information is James Popham's recent book, Testing! Testing! What Every Parent Should Know
About School Tests.
So when did our country start using standardized tests extensively? All of this began during World
War I, when the U.S. Army needed lots of new officers and wanted to find a way to identify who
should enroll in officer training. "Army Alpha," a group intelligence test, was developed to
discriminate among test-takers' intellectual abilities. If you scored high, you'd be good for officer training. Middle
scores meant you were sent to the trenches, and low scores meant you didn't belong in the Army.
Army Alpha worked well for its intended purpose.
Popham notes: "The overriding mission of today's standardized achievement tests is not
fundamentally different from the mission of Army Alpha. Develop a set of items that will allow for
comparisons ('score spread') among test-takers ... which, as it turns out, is why they are altogether
unsuitable for determining the effectiveness of teachers' instructional endeavors." How come?
Popham gives three major reasons.
First, there are teaching/testing mismatches. Standardized tests often broadly list skills in order to
be marketed throughout the country. If the skills were made very specific, they wouldn't match the
different academic standards of our states. This would affect the company's ability to market their
test. Years ago, I was frustrated trying to match very broad TerraNova skill categories with specific
test questions where our students did not score well. I thought at the time that these skill categories
were so broad that I often couldn't figure out what specific skills were being measured. I understand
now that it was by design so that the standardized test could be marketed and sold in many states.
Studies have shown that often half or more of what's tested isn't even supposed to be taught in a
particular district or state. For example, a major study at Michigan State showed that when they
compared the content of the nation's major elementary mathematics textbooks to the major
standardized achievement tests, at least 50% of the content of the tests did not assess skills that were
addressed meaningfully in any of the textbooks. Assessments should reflect the major skills
emphasized in instructional materials that are used throughout the country.
Second, when standardized tests are updated (re-normed), companies will often delete items
that covered important content. Why would this happen? Popham notes, "Remember that Army
Alpha worked because it generated a large 'score spread' among the examinees. The more important
the content, the more likely teachers are to stress it. The more that teachers stress important content,
the better that students will do on an item measuring that content. But the better that students do on
such an item, the more likely it is that the item will disappear from the test."
Popham further concludes, "From a test publisher's perspective, the best items for standardized
achievement tests (those that produce a good 'score spread') are those that can't be influenced by
even first-rate instruction." What a scary thought! Companies often pick test items that don't match
major instructional targets.
Third, there are factors other than instruction that also influence student performance on these
tests. To be honest, I had doubts about the influence of some of these factors until I read some
of Popham's very specific examples. Without going into detail here, I now believe that there is
something to this. Popham observes that student responses on these tests could reflect any of the
following three factors (or some combination): (1) what was taught, (2) inherited academic aptitudes,
and (3) socioeconomic status. Student responses to these test items often reflect more than just the
quality of teaching.
Many of our nation's state tests have become "high stakes." For example, individual student
performance on these tests has been linked to student promotion. In addition, school test scores are
now being used to determine "instructional success," which under No Child Left Behind federal
legislation can now result in a series of sanctions for schools judged not to be making "Adequate
Yearly Progress."
Is all of this some kind of conspiracy? Popham doesn't think so. "No, I don't think educational
policymakers established high-stakes testing programs to harm children---or to 'get' teachers. Nor do
I think these policymakers acted out of malevolence. Rather, policymakers' actions reflect their
ignorance of the reality of educational testing. Even worse, they don't know that they don't know."
Assessment illiteracy is also common in the educational community and among the general public.
What should we do? How can we have large-scale state tests that give real evidence of
accountability for student learning and contain high-quality skills that can be successfully taught? We
can do both. Read my follow-up column for some answers!