The New York Times

September 2, 2003

Rising Demands for Testing Push Limits of Its Accuracy

By DIANA B. HENRIQUES

During a tutoring session last December, Jennifer Mueller, a high school student in Whitman, Mass., came up with a second correct answer for a question on the state's high school exit exam — an answer that the giant company that designed the test had never anticipated.

When statewide scores were adjusted to reflect Ms. Mueller's discovery, 95 dejected seniors who had failed the test by one point suddenly found they could graduate after all.

"I got flowers delivered to the school, and letters and thank you notes," said Ms. Mueller, 18, who wants to be an American Sign Language interpreter. "I was just wicked excited."

Her find was not the only testing flaw to surface recently. Indeed, it was the second problem reported last year in Massachusetts. In Nevada, a scoring error caused 736 juniors and sophomores to fail that state's high school exit exam. And in Georgia this spring, officials canceled statewide exams for more than 600,000 fifth graders when the third error in three years was discovered in the tests.

Testing is the buzzword of education these days, with state legislatures and the federal government demanding more of it than ever before. Everything from high school graduation to eligibility for transfers, tutoring and federal aid is tied to the results. But educators and some testing industry experts are warning that the new demands are pushing the limits of the testing industry's ability to provide fair and accurate tests.

When President Bush signed the No Child Left Behind Act in January 2002, calling for increased annual testing in grades three through eight by the 2005-06 school year, the testing industry — dominated by a handful of companies — had just weathered the three most error-plagued years in its history. Researchers at Boston College recently found that last year was hardly better, with at least 18 problems reported, almost matching the total reported between 1976 and 1996.

Many experts are warning that the increased testing and tight deadlines of the education law will trigger a spike in human errors unless greater attention is paid to quality control issues.

"I think preventing them entirely is impossible," said Prof. Mark L. Davison, an educational psychologist at the University of Minnesota, saying that the amount of testing is likely to double in the next few years. "As existing companies expand and new companies move into the field," Professor Davison said, "they're going to experience growing pains."

Executives at some of the largest testing companies say they can meet the demands of the law while improving the industry's recent track record. But even some of them fear that educators and politicians have unrealistic expectations.

"They want faster, better and cheaper — and we often tell them, pick two out of the three, because you can't have all three," said Stuart Kahl, the president of Measured Progress, a fast-growing testing company in Dover, N.H.

Because errors can have such life-altering consequences for students and schools, a few critics are even calling for federal oversight of the industry.

Secretary of Education Rod Paige, a staunch defender of the education law, said that was an issue for Congress to decide. "If, in their judgment, there is a need for some type of federal regulation, that's the role that Congress plays," Mr. Paige said in an interview.

In fact, it is very difficult to monitor the performance of the big testing companies, said Kathleen Rhoades, a co-author of "Errors in Standardized Tests: A Systemic Problem," a study released this summer by the National Board on Educational Testing and Public Policy at Boston College.

"They don't have to let you in, they don't have to answer your questions," said Ms. Rhoades, who worked on the study with Prof. George Madaus.

Indeed, Ms. Mueller's discovery was possible only because Massachusetts — hoping to catch errors early — makes all its test questions public after the tests are given. But the practice adds substantially to testing costs because each year's test must be built from scratch.

Beginning in 1999, Ms. Rhoades and Professor Madaus conducted a systematic search for reports of testing errors and found more than 100 in the United States, Britain and Canada from 1976 through 2002, a period that saw extraordinary growth in school testing. One major testing company, for example, had its revenues rise more than tenfold during those years.

The study confirmed the rising number of errors cited in a series of articles in The New York Times in May 2001. And more errors have been reported since the research for the study was completed, Ms. Rhoades said. All told, of the 103 reported errors and disputes over testing results, more than two-thirds occurred in the past four years. And only a quarter of those were detected by testing companies themselves, she said.

Several testing company executives said that the Boston College study reflected an "antitesting agenda" and that it did not distinguish between serious errors and trivial ones. But they agreed with the researchers that haste was the most common contributor to errors. Neal Kingston, the chief operating officer at Measured Progress, said his company had occasionally been asked to devise and deliver new statewide tests in three months — an utterly impossible task, he said.

Under the law, schools must show that all groups of students — broken down by race, for instance — are improving. But gathering accurate demographic data, so that each student is assigned to the correct subgroup, is a major problem for testing companies.

Many states still rely on information gathered at the district, school or even the classroom level. And when children fill in the demographic information themselves, it is riddled with errors, Mr. Kahl said. Children may simply not know which ethnic group they belong in, or even how their names are listed in school files.

Building these student information systems is an unappreciated part of the challenge, and expense, of complying with the law, Mr. Kahl added.

The school year that opens in September 2005 is "the crunch year" for schools, he said. Ideally, testing companies would already be at work on the new tests that will be administered then. But few states are that far along, he said.

The pressure does not ease when the tests are delivered. States want the tests scored quickly so they can give tests in May and have the results in time for summer school. "But giving a test, getting it right and getting it back in two weeks — you've just multiplied the odds for mistakes," said Mark Musick, the president of the Southern Regional Education Board.

Many of the largest testing companies are expanding to cope with the added work and compressed schedules built into the law. Pearson Education Measurement, which says it is the nation's largest school testing company, has increased its answer-sheet scanning equipment by two-thirds since 2000 and expanded the office space devoted to essay-scoring by more than 300 percent.

CTB/McGraw-Hill, another testing giant, said it had also added capacity and was upgrading its aging computer systems.

And Harcourt Educational Measurement, the third major full-service company in the market, said it had been adding professional staff and revising its procedures for detecting and preventing errors.

Mr. Paige, the education secretary, said that the opportunities created by the law would attract more companies to the testing business. But industry experts say it is hard for new companies to come in because of shortages of specialized personnel, especially the psychometricians who devise tests and monitor their validity. Moreover, newcomers need an expensive computer infrastructure, and states demand a proven track record.

"You're not going to be able to go to 'Joe's Truck Stop and Testing Service' and get a test," Mr. Musick said. "You've got to go to a major provider that, in spite of its problems, is still respected."

Besides time, money can be a key factor in determining how error-prone a state's testing program is — as shown by a judge's findings in a lawsuit against the Pearson testing subsidiary after a large scoring error on Minnesota's high school exit exam in 2000.

Almost 8,000 students got incorrect scores as a result of the error, which was discovered when a parent demanded to see his daughter's test results and found that correct answers had been marked wrong. Initially, the trial judge refused to allow the students' lawyers to seek punitive damages against the subsidiary, then known as NCS Pearson.

But the judge later reversed himself in a scathing opinion that said the company "continually short-staffed the relatively unprofitable Minnesota project while maintaining adequate staffing on more profitable projects like the one it had in Texas."

The company settled the lawsuit in September 2002 on terms that prohibit it from commenting on it. But Steve Kromer, general manager of the Pearson testing unit, said that Pearson had made substantial improvements in its quality-control procedures in the past three years.

Harcourt, too, has had some widely publicized problems, including the one that Ms. Mueller discovered in Massachusetts. The company recently settled a class-action lawsuit filed after its testing error in Nevada.

Charging more for improved quality control services, however, is difficult when state finances are in such dismal shape and when the costs of complying with the law are so uncertain.

Concern about this rising tide of testing errors is reviving the long-dormant issue of industry regulation. "We regulate our pet food, and we don't regulate the tests which are making major decisions about the lives of our kids," said Monty Neill, executive director of FairTest, an advocacy group in Boston.

Others have called for an independent oversight panel that could monitor for quality in testing. Professor Madaus, the co-author of the Boston College study, said he preferred that approach to letting the federal government regulate the industry because he feared that politics would taint the professionalism of test evaluation.

Even some testing executives see merit in at least compiling a national database to track testing errors. "Researchers have to hunt and peck where they can to find the mistakes and compile them," said Dr. Kingston of Measured Progress. "A lot of mistakes, quite possibly, don't even get caught."


Copyright 2003 The New York Times Company