Race, Academic Achievement, and School Reform

 

An analysis of the racial consequences of state mandated testing, and an agenda for change 

An assessment of the API, California's Academic Performance Index 
 

Harold Berlak 
 
 
 

1. Reforming schools by mandating tests: the API, how it works and where it came from

2. Institutional racism and the achievement gap 

3. Multiculturalism and standardized tests

4. Test validity: what standardized tests measure and don't measure

5. The effects of centralized control of schools

6. Reforming assessment; reforming schools; an agenda for change 
 
 
 
 
 
 
 

1.1 10/18/00
 
 
 

 

1. Reforming schools by mandating tests: the API, how it works and where it came from
 

In January 2000, California installed a statewide public school ranking system called the Academic Performance Index, or API. The State Department of Education released to the public and posted on its website comparative rankings of every public school in the state. The rankings made the headlines in all the state's major newspapers and were a lead story on the nightly local TV news. Newspapers were filled with pages of tables and graphs comparing scores of local schools and districts across the state.
 

What the API and all similar indexing systems provide is a uniform scale for measuring educational productivity statewide, what one editorial writer approvingly called the equivalent of a Dow Jones Average for schools. Every public school in California is given a number from 200 to 1000 based on the school's average score on the Stanford 9 Achievement Test, or Stat-9, a standardized, multiple-choice test published, distributed, and scored by Harcourt Educational Measurement. Every school is also categorized according to the relative affluence of the area it serves and, within its category, ranked 1 to 10, worst to best, based on the school's average on the Stat-9. According to officials, other factors may at some time in the future enter into the calculation of the API (attendance and dropout rates, scores on the State's new high school exit exams, etc.). But for now and for the foreseeable future, in California, students' performance on a single test, the Stat-9, is by state regulation the final and controlling measure of students' academic performance, school quality, and the professional competence of teachers, principals, and school staff.
 

The great majority of states have introduced some form of statewide standardized testing. All, to one degree or another, have increased or are in the process of increasing centralized state government control of schools by tying test performance to a system of state-administered rewards and sanctions. A statewide indexing system consolidates state control, not only by linking specific sanctions and rewards to test performance, but by creating a single, articulated system of state control accompanied by an annual high-profile public display of results. A uniform index of school quality also sets the stage for market solutions to educational problems, with private ('independent') schools and profit-making educational management organizations competing with the public sector to improve students' standardized test scores. As of this writing, Colorado, Florida, and Texas have adopted some version of a statewide school indexing system and, if the past is any indication, other states will soon follow California's lead.
 

There are three essential features common to all existing and proposed state indexing policies: 
 

(1) School districts are required to administer the same test to all students with no (or almost no) exceptions. 

(2) There is exclusive or very heavy reliance on standardized tests as the measure of 'academic achievement', either a 'norm-referenced' or 'proficiency' test in one or more academic school subjects. 

(3) Test performance is linked to a centralized state system of individual and institutional rewards and sanctions.
 

Supporters claim that index systems not only raise standards but also promote educational opportunity because the process is color-blind and objective, and thus free of racial and cultural bias. California's API, for example, sets a score of 800 as the standard of excellence for all schools regardless of the socio-economic class, race, or languages of the communities the school serves. Despite the great disparities in physical and human resources between rich and poor schools, proponents say the system is fair because, in measuring a school's progress toward meeting the 800 standard, richer schools and poorer schools are compared to their own kind.
 

Schools whose average scores fall below the 800 mark are considered below standard and failing. The principal and the teachers are informed by the State Department of Education that annual improvement targets must be met. Should a school repeatedly fall short of the targets, it will be taken over by the state. What this means is not yet clear, but it is generally taken to include 'restructuring' and 'reconstituting' failed schools. The State of California could assume direct control, or perhaps contract with a non-profit or a profit-making management company to manage the schools. The entire staff of the failed school (principals, teachers, and counselors) is reassigned, demoted, or fired. Though not formally integrated into California's API policy, there are dire consequences for individual students who fall below Stat-9 grade-level norms. They may be required to repeat the school year, placed in an 'opportunity' or remedial track, or denied entry to special academic programs, tracks, and high-status academic public schools.
 

In addition to the sticks, there are carrots. Schools, students, and teachers meeting the API targets are rewarded with access to educational programs, opportunities for professional advancement, and cash bonuses. A school that in the course of a year decreases its shortfall to the 800 mark by 5% could reap an additional $150 per student (the amount depends on the annual state educational appropriation), a considerable sum for resource-starved schools. The current Democratic Governor, Gray Davis, at one point proposed that the top-scoring students on the Stat-9 be awarded university scholarships.
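The 5% rule can be made concrete with a little arithmetic. What follows is a minimal sketch, assuming the target is simply 5% of the distance between a school's current API and 800; the function name, the constants, and the treatment of schools already at or above 800 are illustrative assumptions, not the state's published formula.

```python
API_STANDARD = 800      # statewide standard of excellence named in the text
SHORTFALL_RATE = 0.05   # "decreases the shortfall to the 800 mark by 5%"


def growth_target(current_api: float) -> float:
    """Points a school would need to gain in one year under this reading."""
    # Assumption: a school at or above 800 has no shortfall and so no target.
    shortfall = max(API_STANDARD - current_api, 0)
    return shortfall * SHORTFALL_RATE


for score in (400, 600, 750):
    print(score, growth_target(score))
```

Under this reading, the lower a school starts, the more points it must gain each year to qualify for the reward.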
 

'High stakes testing' is the term most often used in connection with tests that are used for making fateful and irreversible educational decisions. What marks the Stat-9 as a high stakes test, however, is not the test itself, but the fact that the test is instrumental to the state's centrally controlled accountability system, the API, which links Stat-9 test results to state sanctions and rewards.
 

Goals 2000 and Indexing Policy.

Both state-mandated standardized testing and indexing policies are products of a larger educational standards and accountability agenda that marches under the flag of 'Goals 2000.' It is a policy formulated during the Bush Administration, (1) actively pursued by Clinton, and still supported by the great majority of centrist Republicans and Democrats at the national, state, and local levels. Put simply, this policy equates higher academic standards with rising test scores. The Goals 2000 legislation that was passed by Congress in 1994 adopted six lofty promises to be met by the year 2000, (2) including achieving 'world class' academic standards coupled with greater equality of educational opportunity. Clinton's proposal contained provisions for national testing in reading and math, which were eliminated because of opposition by an odd mix of progressive educators, child and fairness-in-testing advocates led by FairTest, civil rights groups, the Black Caucus, and right-wing Christian conservative Republicans. What remained in the bill that was passed were a variety of federal government incentives to induce state governments to assume a higher degree of state control over the schooling process by imposing statewide testing.
 

Though Goals 2000 policies and their political and educational effects are widely reported in the education press, they have gone almost unnoticed in the national and local media and journals of opinion, right, left, and center. In March 1994, shortly after Congress passed Goals 2000, an education writer for the New York Times called the shift in control in the bill historic and unprecedented, yet the story itself was buried in the inside pages. The public's and the press's indifference to Goals 2000 was, however, understandable. To most, it appeared harmless, did not raise taxes, and would have no noticeable immediate effects. Life in schools would go on more or less as before. The impact on schools and on the everyday lives of teachers, school administrators, parents, and students would be enormous, but it was still several years away.
 

In January 2000, the full weight of Goals 2000 first came to California in the guise of the API. The API is Goals 2000 in action, with one very notable departure from the original design as espoused in government reports and by officials in testimony before Congress. The original Goals 2000 design called for 'world class standards' tied to a new breed of world class tests that were to be capable of assessing the in-depth understandings, complex ideas, and 'higher order' thinking required by the new demanding standards. These were to be smart tests or 'tests worth teaching to,' a phrase coined by Goals 2000 advocates. Although there are some continuing efforts to create smart tests tied to smart standards, they remain exceptions to the rule.
 

California spent nearly 100 million dollars to produce the first ever state-mandated smart test. It was the CLAS language test, linked to the State's 1987 standards. These standards (called 'curriculum frameworks') sought to reform the way reading is taught. Contrary to the critics' claims, this framework did not abandon phonics. It did, however, move away from reliance on commercially published 'basal' readers that were tied to workbooks, and it encouraged literature-based basic reading instruction and a writing curriculum that emphasizes making connections to the child's interests and experience.
 

It was a short-lived victory for progressive, multicultural educators. The framework and CLAS very quickly became enmeshed in California's toxic, race-charged electoral politics. The first state-mandated smart test had the misfortune of arriving at the end of the second year, a low point, in the Clinton presidency, and on the eve of the 1994 mid-term elections. The Republicans swept to power and took control of both houses of Congress for the first time in forty years. Newt Gingrich was elevated to Speaker of the House and declared victory for his brand of backlash, right-wing Republicanism. This was also the year Pete Wilson was reelected to a second term as governor of California. The early years of the nineties had not been good ones for California's economy, and Wilson, a pro-choice, conservative Republican, entered the election season with approval ratings scraping bottom. His problems were compounded because the politically powerful southern California Republican right-wing lobby also disliked him because of his pro-choice stance.
 

Wilson won a second term by riding the coattails of the voters' overwhelming support of two infamous propositions, 184 and 187. The former, called the 'three strikes and you're out' initiative, mandated 25-year-to-life sentences for a third felony regardless of its seriousness. The latter sought to deny schooling and virtually all social and health services to undocumented immigrants, the vast majority of whom are Mexican. Wilson courted the right wing, which had bitterly opposed the new literature-based language standards. He launched a virulent attack on the CLAS language test, charging that the standards and CLAS served the ideological aims of multicultural extremists and the far left. Among the more offensive test items cited by Wilson as evidence of political correctness and multiculturalism run amok was a passage from Maxine Hong Kingston's The Woman Warrior, followed by the directions, 'Write an essay in which you interpret the moments of silence or inability to speak.'
 

The progressive and moderate forces that had supported the policy of 'smart' tests tied to new 'smart' standards were outmaneuvered politically, overwhelmed, and soundly defeated. In 1998 CLAS was replaced with the Stat-9, a run-of-the-mill, commercially available, standardized, multiple-choice achievement test published by Harcourt Educational Measurement. In 2000, the Stat-9 became the state's instrument for calculating the API.
 
 
 

2. Institutionalized racism and the achievement gap
 

That there is a race gap in educational achievement is not news. Large numbers of the Nation's children leave school, with and without high school diplomas, ill educated, barely able to read, write, and do simple math. But the failures of the schools are not evenly distributed; they fall disproportionately on students of color. (3) Even when parents' income and wealth are comparable, African-Americans, Native Americans, Latinos, and immigrants for whom English is not their first language lag behind English-speaking, native-born, white students. The evidence for the gap has been documented repeatedly by the usual measures. These include drop-out rates and the relative numbers of students who take the advanced placement examinations, who are enrolled in the top academic and 'gifted' classes, and who are admitted to higher-status secondary schools, colleges, and graduate and professional programs. And last but not least are the discrepancies in scores on standardized tests of academic achievement, on which teachers' and students' fates so heavily depend.
 

How is this achievement gap to be explained? I focus first on the general question and then separately on the statistical gap in standardized test scores. I draw readers' attention to the distinction between academic performance and academic achievement as measured by standardized tests. Though often spoken of as though they are one, they are clearly not the same. The failure to separate out the standardized test question is serious. It clouds and confounds the educational and policy issues and misleads us in efforts to explain and eradicate the race gap in academic performance. 
 

Over the years the major reasons given for the claimed superior attainments of whites in cultural, artistic, and academic endeavors were overtly racist. It was said that the explanation lay in the superior genes of white northern European, Anglo-Americans. As the social sciences developed in the latter years of the 19th and the 20th centuries, 'scientific' tracts defending white supremacy appeared with regularity. By the 1930's, the eugenics movement managed to gain a foothold in North American universities. And, it is relevant to add, all the leaders of this overtly racist movement were the leaders of the newly emerging field of scientific mental measurement. Many were the same men who testified before Congress in the early 1920's lending scientific credence to the racist immigration exclusion acts which barred or greatly restricted immigration from China, Japan, Latin America and southern and eastern Europe. The eugenics movement was considered a respectable academic discipline until it was discredited in the wake of defeat of the Third Reich and the immensity of the crimes committed in the name of Nordic racial purity. (4)
 

In 1969, the scientific case for racism was revived by an article published in the Harvard Educational Review by U.C. Berkeley education professor Arthur Jensen. Based on his statistical analysis of I.Q. test scores, he concluded that African-Americans were genetically inferior to whites in general intelligence. His racist thesis was widely disseminated and discussed in the popular press and in respectable academic and policy circles. In time, Jensen's conclusions were thoroughly discredited by a spate of books and articles. (5) In 1994, once again using standardized test data, Charles Murray and Richard Herrnstein in The Bell Curve claimed to have proven that the inferior place of black and brown people in the social, political, and economic order was rooted in biology. The arguments for the genetic superiority of the white race were again dismembered and discredited by many geneticists and biologists. (6) Recently a more subtle form of 'scientific' racism has gained some respectability. The inferiority of the black and brown races is now said to lie not necessarily in genetics but in culture and history. This more quietly spoken academic version of the master-race ideology has also been thoroughly dismantled, yet racist explanations for the race gap persist. (7)
 

Once all 'scientific' arguments supporting racism are dismissed, how is the ever-present gap in academic school performance to be explained? Numerous social and behavioral scientists have addressed this question.
 

A statistical study conducted by Professor Samuel Myers Jr. at the Roy Wilkins Center for Human Relations and Social Justice at the University of Minnesota sought to determine whether poverty was a primary cause of the poor performance of black students on the Minnesota Basic Standards Test. (8) Passing this test is scheduled to become a prerequisite for a high school diploma beginning in 2000. In a 1996 trial run in Minneapolis, 75 percent of African-American students failed the math test and 79 percent failed in reading, compared to 26 percent and 42 percent respectively for whites.
 

Using standard statistical indices, the researchers found, contrary to expectations, that test scores were not statistically related to school poverty, neighborhood poverty, racial concentration, or even ranking of schools (except in the case of whites). They did find that African Americans, American Indians, and Hispanics were underrepresented in the top-ranked schools. African-Americans were 4.5 times as likely to be found in schools ranked low in math, and twice as likely to be found in schools ranked lowest in reading. (9) For both white students and students of color, success on the tests was positively correlated with whether an individual had been tracked. However, only 6.9 percent of students of color, compared to 23 percent of white students, had access to 'gifted and talented' programs. This study suggests that tracking and the quality of academic opportunities affect both the test score gap and the gap in academic performance generally. While these correlational studies are suggestive, they do not examine basic causes, nor do they explain the pervasiveness and stability of the gap over prolonged periods of time.
 

A set of experimental studies conducted by Stanford University professor Claude Steele, an African-American psychologist, sought to explain the circumstances and situations that give rise to the race gap in test scores. (10) He and colleagues gave equal numbers of African American and white Stanford sophomores a thirty-minute standardized test composed of some of the more challenging items from the advanced Graduate Record Examination in literature. Steele notes that all the students were highly successful students and test-takers, since all Stanford students must have earned SAT scores well above the national average to be admitted. The researchers told half the students that the test did not assess ability, and that the research was aimed at 'understanding the psychological factors involved in solving verbal problems.' The others were told that the test was a valid measure of academic ability and capacity. African American students who were told that the test was a true measure of ability scored significantly lower than the white students. The other black students' scores were equal to white students'. Whites performed the same in both situations.
 

The explanation Steele offers is that black students know they are especially likely to be seen as having limited ability. Groups not stereotyped in this way do not experience this extra intimidation. He suggests that 'it is serious intimidation, implying as it does that if they should perform badly, they may not belong in walks of life where their tested abilities are important --walks of life in which they are heavily invested.' He labels this phenomenon 'stereotype vulnerability.' In another study, Steele and colleagues found, to their surprise, that the students most likely to do poorest on the tests were not the least able or least prepared academically. To the contrary, they tended to be among the more highly motivated and academically focused. While Steele's research provides a psychological explanation for the gap, it does not probe the historical, social, and cultural factors that have created and continue to sustain these stereotypes. We are left with no explanation of how 'stereotype vulnerability' is created by, and shapes, everyday life in society and at school.
 

The previously cited studies focus on the gap in standardized test scores. The final study cited is one of a large number of recent 'qualitative' studies (observational, historical, and ethnographic) that address the achievement gap and the test score gap, and that illuminate the relationships of culture, gender, and race to the social relations within the classroom and school. (11) Signithia Fordham, an African American anthropologist, in a study of a Washington D.C. public high school, focused on how the 'hidden' and explicit curriculum shape student aspirations and achievements, and how students of differing cultural, racial, and social backgrounds respond to the schooling experience. (12)


 

Hers is a multifaceted and complex study, including interviews, participant observation, questionnaires, and field notes gathered over a four-year period. She concludes that for black students, patterns of academic success and underachievement are a reflection of processes of resistance that enable African-Americans to maintain their humanness in the face of a stigmatized racial identity. She shows that African American adolescents' profound ambivalence about the value and possibility of school success is manifest as both conformity and avoidance. This ambivalence is manifest in students' motivation and interest in schoolwork, which of course includes learning standardized test-taking skills.
 

The following two quotes are taken from interviews with two African-American men. The first is from a young lawyer employed in a Washington D.C. firm who had been a National Merit finalist and whose test scores were among the top 2% in his state. 
 

[Commenting on why he was disappointed with his career] I realized that no matter how smart I was [in school] or how hard I was willing to work [in the law firm] that it wasn't going to happen for me. . . .Don't get me wrong, integration has been great for my life. Without it, I would be playing on a much more restrictive field, [but] there's no doubt in my mind that I would be much more successful today if I were white.
 

A high-performing African-American high school student offers the following view of why African Americans often underperform in school, and also expresses his doubts that his own school success will be rewarded.
 

Well, we supposed to be stupid . . . we perform poorly in school 'cause we all have it thought it up in our heads we're supposed to be dumb so we might as well go ahead and be dumb. And we think that most of the things we learn [at school] won't help us in life anyway . . . What good is a quadratic equation gonna do me if I'm picking up garbage cans? 
 

Fordham found that even the most academically talented African-American high school students expressed profound ambivalence toward schooling, and uncertainty that they will reap the rewards of school success. Virtually all the African-Americans she interviewed indicated that a central problem facing them at school and in the larger white society is the widely held perception among whites that African-Americans are less intelligent, and the continuing need to confront and deal with this perception in everyday experience.
 

These three studies taken together suggest three related explanations for the race gap in academic achievement and in test scores. First are students' perceptions of the opportunity structure in the wider society, of the options open and the objective chances of 'making it'. Second are the educational opportunities available in the educational system itself: within school districts, within schools, and within each classroom. Third are the cumulative psychic and emotional effects of living in a social world saturated with racist ideology, where racist practices and structures are pervasive and often go unnamed.
 

What does the gap in scores mean?

What is almost always overlooked in all these explanations is the size of the test score gap itself. Most assume that the statistical gap in test scores between persons of color and whites is enormous. It is not. There is an eight to ten percent difference in scores on standardized academic achievement tests. It has persisted over time, regardless of the type of test (whether an 'IQ' test, a norm-referenced test, or a proficiency test), regardless of the test's publisher, and regardless of the educational level of the test-taker, be it kindergarten or graduate school. (13) Figure 1 illustrates graphically an eight percent difference, using as an example California's CBEST, a standardized test of basic literacy required for entry to a teaching credential program.
 


 

Figure 1
 

It is important to note that the distributions of scores are highly overlapping. An eight to ten percent gap amounts to a mere handful of test items. In the illustration above, the gap is an average of 3.2 multiple-choice test items on a fifty-item multiple-choice test. (14) From an educational point of view this is a minor difference. Because of the way the tests are 'normed' and cut scores set, minor differences in the number of correct multiple-choice answers create greatly inflated failure rates for persons of color. On CBEST, for example, African-American test-takers are 3.5 times more likely to fail the test than whites, Latinos/Hispanics more than twice as likely, and Asian Americans more than 1.5 times as likely. (15)
 
 
 

Figure 2
 

Number and Percentages of First-time Failures on CBEST 1985-95
 

Group               Number eliminated   Percent failing
African Americans   11,200              63.0
Latinos             15,600              50.6
Asians              23,800              47.0
Whites              125,900             19.7
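The disproportion in Figure 2 can be expressed as a ratio by dividing each group's first-time failure rate by the rate for whites. The following is a minimal sketch; the dictionary and function names are illustrative, and only the percentages come from the table. The ratios rest on this table alone and need not match figures drawn from other CBEST data.

```python
# First-time CBEST failure rates, 1985-95, in percent (from Figure 2).
FAIL_PCT = {
    "African Americans": 63.0,
    "Latinos": 50.6,
    "Asians": 47.0,
    "Whites": 19.7,
}


def relative_failure(group: str, reference: str = "Whites") -> float:
    """How many times more likely `group` is to fail than `reference`."""
    return FAIL_PCT[group] / FAIL_PCT[reference]


for group in FAIL_PCT:
    print(f"{group}: {relative_failure(group):.1f} times the white failure rate")
```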

Numerous researchers have carefully documented the highly disproportionate adverse impact of standardized achievement testing on students of color. (16) An argument might be made that these differences in test scores, while small, nevertheless represent real differences in performance, and that tests, though imperfect, eliminate the incompetent, those most likely to perform poorly at school or on the job. Steele's study suggests the opposite --that the more talented students are at greater risk of failure. As documented in section 4, there is no evidence to support the claim that standardized tests are valid and credible measures of academic achievement or intellectual capacity. There is no demonstrable connection between observed academic performance and standardized test scores. Test scores do not predict future success in school, the university, or the workplace. In the case of CBEST, several studies were conducted to explore the link. CBEST showed no correlation to current or future performance on the job, or to observed levels of literacy. While some tests do correlate statistically with future grades, this correlation is short-lived. (17) What standardized achievement tests appear to predict best are scores on other similarly constructed tests, and parents' wealth. As reported by Peter Sacks, socio-economic class accounts for approximately 50% of the variance in SAT test scores. He estimates that for every additional $10,000 in family income, a person on average gains 30 points on the SAT. (18)
 

Among the more commonly heard explanations for the gap in standardized test scores is that the tests themselves are culturally and racially biased. What this has usually been taken to mean is that the bias is lodged in the content or language of individual test items. In the early years of mental measurement the racism of the test items was blatant. In more recent years, major test publishers have made efforts to review and eliminate items with overt cultural and racial bias. Though item bias remains, it is implausible to conclude that all the publishers in all their tests knowingly or unknowingly managed to create tests with an almost identical ratio of biased to unbiased items. The fact that scores on all commercially produced tests show the same eight to ten percent gap suggests that the gap cannot be fully explained by racial or cultural bias lodged in individual test items. Rather, the bias is systemic and structural; that is, it is built into the basic assumptions and technology of standardized testing, into the way the tests are constructed and administered, the way results are reported, and the organizational structure and administrative rules of the accountability system itself.
 

There is perhaps no clearer illustration of how the differences among the races are greatly exaggerated and distorted than the numerical scales used to report results. There is, as I have noted, an eight to ten percent difference in scores between white and nonwhite students. On a 100 point scale this amounts to a gap of ten points. California's Academic Performance Index or API (which is based entirely on students' scores on the Stanford 9 Achievement Test) constructs a 200 to 1000 point scale, and a ten percent difference in scores morphs into a formidable 100 points. The SAT, the most commonly used test for college admissions, and one also frequently (and inappropriately) used to rank states' academic performance, creates a 400 to 1600 scale. In this instance, a ten percent difference becomes a 120 point chasm.
 

A major goal of social reformers of the 20th century was the elimination of legalized segregation. We still live in a society that is separate and unequal. To achieve social and economic justice, the goal for the 21st century must become the elimination of institutionalized racism in all sectors of social, economic, cultural, and political life: in business, housing, employment, law enforcement, the courts, health-care institutions, and, of course, schools. What makes institutionalized racism so pernicious and difficult to eradicate is that racist practices are often invisible because they are accepted as standard operating procedures within our institutions.
 

Standardized tests are a particularly invidious form of institutionalized racism because they lend the cloak of science to policies that have denied, and are continuing to deny, the next generation of persons of color equal access to educational and job opportunities. An educational accountability system based on standardized testing, though predicated on 'standardized' measurements that are purportedly neutral, objective, and color-blind, serves to perpetuate and strengthen institutionalized racism.
 

3. Multiculturalism, curriculum, and learning

Tests are the single most important influence on the content of the school's curriculum and how it is taught. All tests, including those composed by teachers at the school level, confer upon students attitudes, ideas, and images of what matters, and just as important, what does not. But the shift in power over the assessment process from the school and local community to state government represents a momentous and qualitative change. The power of the state over rewards and sanctions imposes singular answers to the questions of what schools are for, what constitutes genuine knowledge and learning, and what the next generation should and should not learn at school.
 

Whatever does not contribute directly to short-term gains in test scores is marginalized --critical thinking, interdisciplinary studies, music, the arts, physical education, and forms of multicultural curriculum and bilingual education that are not add-ons, but integral to the entire curriculum. Even if tests incorporate some multicultural and multiracial content, state control undermines local efforts by parents, teachers, principals and elected officials to revitalize their local schools, rethink curriculum and pedagogy, and respond in ways that cultivate the major assets of a multicultural society --its racial and cultural diversity and a heterogeneity of perspectives on knowledge, culture, and learning. 
 

Finally, among the most serious negative educational consequences of high stakes state-mandated tests is that teachers and administrators in low scoring schools are under extraordinary pressure to raise test scores. As a result, those most likely to be first in line for a narrow and culturally truncated curriculum, and for shrinking educational opportunities, are the children of the poor, immigrants, and people of color.
 

While President Clinton and other defenders of the excellence-via-testing policies are never heard proclaiming that one of the chief purposes of government mandated testing and indexing policies is to employ government power to unify the culture, it is clear that from the beginning this has been a chief corollary goal of the architects of these policies. The seminal 1992 report, Raising Standards for American Education, that launched Goals 2000 argued that testing tied to national standards would 'bind together a wide variety of groups into one nation,' providing 'shared values and knowledge' which will serve 'as a powerful force for national unity.' Lauren Resnick, an academic advocate for 'smart tests', a former president of the American Educational Research Association and one of the originators of the New Standards Project, argued, 'Without performance standards, the meaning of content standards is subject to interpretation, which if allowed to vary would undermine efforts to set high standards for the majority of American students' (italics added). Nicholas Tate, chief executive of the British government's 'Qualifications and Curriculum Authority', is more forthright. He said in an interview, 'Today, we face the widespread belief that there are no underlying shared values in our society, that people are no longer willing to go along with what the school says. That is why we are beginning to make explicit, what has hitherto been implicit.'
 

It is no coincidence that this concerted effort by governments to gain near monopoly control over the curriculum arrives at the time that social movements have appeared and are challenging the cultural dominance of western Anglo-European traditions in the curriculum. The multicultural and bilingual movements are expressions of the will of men and women of differing races and ethnicities, an assertion of their rights, which includes the right to reclaim cultural power, and to forge their own cultural and social identities. But some see the diversity and heterogeneity of multicultural and bilingual movements as a threat to national unity, fostering balkanization of the nation, and the erosion of culture and academic standards. Ironically, the Goals 2000 plan of a government-imposed common curriculum tied to mandated testing, intended to foster social stability and promote national unity, achieves the opposite. In practice, it exacerbates inequalities and provokes racial, cultural, and social strife.
 

As the locus of control of the assessment system (with its sanctions and rewards) shifts power upward, not surprisingly, new arenas for political culture warfare are opened at the national and state levels, within the bureaucratic apparatus of the executive, legislative, and judicial branches of government. Decisions are being made increasingly by politically appointed 'blue ribbon' commissions, panels of experts and consultants many degrees removed and insulated from local community concerns, and remote from children, teachers, classrooms, and schools. What best serves the interests of this child, this classroom, and this community is lost as control of the curricular and pedagogical decisions shifts to the upper reaches of government where the 'stakeholders' are those who are well organized and possess the financial resources and power required to compete in the political arena at the state and national levels. 
 

The stakes become higher as the decisions are made farther and farther away from concrete situations, classrooms and schools. Differences over difficult moral, cultural, and educational questions are magnified and intensified and (as in California) are likely to become entangled in acrimonious, racially charged electoral politics. 
 

The kinds of learning required of citizens in the modern world cannot be achieved by standardized and centrally imposed systems of learning. Human learning, to be effective and long-lasting, requires the engagement of the learner on his or her own behalf, and rests on the relationships that develop between schools and their local communities, and between teachers and their students. Powerless school communities and teachers cannot produce powerful, engaged citizens committed to social and racial justice, and the public good. (19)
 
 
 

4. Test validity: what standardized tests measure and don't measure

How credible and dependable are mass administered standardized tests as measures of academic achievement? All of us brought up and schooled in America are familiar with standardized tests. We sit in a classroom or auditorium, well spaced from fellow test-takers, and are given a time limit and a booklet of test 'items' --passages of text, math problems, tables, diagrams or charts. From four or five possible responses, we choose the one correct or best answer, and darken the appropriate bubble on the answer sheet. Some standardized tests include 'open-response' items, which are scored according to standardized procedures.
 

Broadly speaking there are two types of mass-administered, standardized educational tests used in current accountability systems. The first are 'norm referenced' tests (sometimes referred to as 'standardized' tests). These tests do just that; they create norms, percentiles or grade equivalency scores which indicate an individual's or group's standing on the bell curve relative to all others taking the same test. 
 

The second type are 'proficiency' or 'basic skills' (sometimes called criterion-referenced) tests which employ either a pass/fail or proficiency level 'cut' score. 
 

From the point of view of the test-taker, the types are indistinguishable. They look alike, contain very similar test items, and are generally given under the same time constraints and conditions. (20) Both use standardized scoring and reporting procedures and rely on 'normal' or bell curve statistical models. The major difference is in the way each creates failure. A norm-referenced test is deliberately constructed and pre-tested to yield scores distributed so as to approximate the so-called 'normal' curve. The curve defines the percentages of high, medium and low test performers. (See Figure 3) 
 

Figure 3
 

(Figure 3 is an illustration of a normal curve showing the mean and the percentages of students expected to score at ± 1, 2 and 3 standard deviations.)
 
 
 
 
 
 
 

The technology of the norm-referenced test accepts as given the dubious assumption that (if the sample is random and large enough) virtually all human qualities, traits, capacities, achievements, etc., if properly measured, will approximate the bell-shaped 'normal' curve. Failure is created by the bell curve. With a proficiency or 'basic skills' test, failure is created by a particular cut or proficiency score selected by a group of human beings: elected or appointed government officials and/or a panel of experts chosen by these officials or by the testing company under contract to a state agency. 
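The two ways of creating failure can be made concrete with a minimal sketch. The function names, the ten raw scores, the 30 percent bottom share, and the cut scores of 60 and 75 are all hypothetical, chosen only for illustration. The first function gives the share of a normal distribution lying within a given number of standard deviations of the mean, which is the kind of percentage a bell-curve figure displays.

```python
import math


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


def norm_referenced_failures(scores, bottom_share):
    """Label the lowest-ranked share of test-takers as failing, whatever they scored."""
    n_fail = round(len(scores) * bottom_share)
    return sorted(scores)[:n_fail]


def cut_score_failures(scores, cut):
    """Label every score below a chosen cut score as failing."""
    return [s for s in scores if s < cut]


for k in (1, 2, 3):
    print(f"within +/-{k} SD: {share_within(k):.1%}")

scores = [52, 58, 61, 64, 67, 70, 73, 76, 81, 88]  # hypothetical raw scores
print(norm_referenced_failures(scores, 0.30))  # failure fixed by rank
print(cut_score_failures(scores, 60))          # failure fixed by one chosen cut
print(cut_score_failures(scores, 75))          # a different cut, same students
```

In the first case a fixed share of students fails no matter how well the group performs. In the second, the number failing depends entirely on where the cut is placed, so moving the cut changes the failure count without any change in what students know.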
 

The most fundamental problem with both types of standardized academic achievement tests is that there is little evidence to support the contention that they measure what they purport to measure --academic achievement, or proficiency. This does not mean that academic achievement and high standards are not vital. Rather it is that the tests have very little relationship to actual academic performance of any kind. For some standardized tests there is a correlation with grades, at least in the short term. But, for virtually all standardized reading and writing tests, there is no demonstrable connection between a person's performance on a standardized reading test and a person's reading abilities in the real world --in everyday life situations, at school, work and elsewhere where one might be called on to read. What this means is that, contrary to common sense, Student A's score at the 45th percentile and Student B's at the 95th percentile on the Stat-9 reading test (or any other norm-referenced test) say nothing whatsoever about the actual or relative reading performance of students A or B. The standardized test informs us only how every test-taker's score on the test compares to everyone else taking the same test. 
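The last point, that a norm-referenced score reports only standing relative to other test-takers, can be illustrated with a short sketch. The percentile_rank function and both comparison groups are hypothetical, and the definition used (the percentage of the comparison group scoring strictly below a given score) is one common convention assumed here, not the publisher's actual norming procedure.

```python
from bisect import bisect_left


def percentile_rank(score, norm_group):
    """Percent of the norm group scoring strictly below the given score."""
    ordered = sorted(norm_group)
    return 100 * bisect_left(ordered, score) / len(ordered)


# Two hypothetical comparison groups; the student's raw score is the same.
group_a = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
group_b = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]

print(percentile_rank(50, group_a))
print(percentile_rank(50, group_b))
```

The same raw score of 50 lands at a different percentile depending on who else is in the comparison group, which is why a percentile by itself describes rank and not what a student can actually do.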
 

A score on a 'basic skills' or 'proficiency' test tells us only how far above or below the established cut-off a student's score falls. Cut scores on academic proficiency tests are not based on actual or observed levels of competence or proficiency. There have been numerous studies that have explored the relationship between test performance and actual performance, and researchers have repeatedly come up with the same conclusion: no (or almost no) connections. Neither do the tests meet the criterion of 'predictive validity'. Norm-referenced and proficiency tests (except for grades in the short term) do not predict future success in school, the university, or in the workplace. (21) What the tests predict best is a person's score on a similarly constructed test, and parental wealth.
 

The failure to ground cut scores in performance applies to California's Academic Performance Index (API). The score of 800 established as the mark of excellence for all schools to aspire to reach and exceed is a wholly statistical construct. It is not based on any direct observational evidence or documentation. It is extraordinary and also sadly ironic that the cut score now driving state educational policy for achieving educational excellence is not grounded in any way on educational excellence and high academic achievement as they are manifest in the real world of teaching and schools.
 

The seriousness of the problem of the failure to ground standardized tests in actual performance is now widely acknowledged. Clinton's Secretary of Education, Richard Riley, a longtime Goals 2000 testing enthusiast, in his final 'State of American Education' address urged the states to stay the course, but also cautioned state officials about the dangers of relying on a single test for making high stakes decisions. The Secretary's caution is a response to internal and external pressures --including the US Department of Education's own Office for Civil Rights (OCR), which has very recently (2000) issued guidelines which assert that the use of test scores as the single factor to determine retention, graduation, and college admission is improper and possibly a violation of Civil Rights law. 
 

The OCR guidelines are grounded in two recent studies conducted by the National Research Council of the National Academy of Sciences. (22) These two studies are but the tip of the iceberg. There is a vast literature in mainstream psychological and educational measurement research that raises fundamental questions about the meaning and usefulness of norm-referenced and conventional standardized proficiency tests. 
 

For the first time in its 50 year history, the 1999 revision of Standards for Educational and Psychological Testing, produced jointly by the American Educational Research Association (AERA), American Psychological Association (APA) and National Council on Measurement in Education (NCME), indicates that the validity of educational tests cannot be established without reference to how they contribute to the improvement of student learning and consideration of the negative consequences of test use. Further, the standards assert that no "decision or characterization" of students which has major impact on their future should be made on the basis of a single test score, and caution against the use and interpretation of tests for students with learning disabilities and with limited English language proficiency. 
 

State mandated testing and indexing policies that distribute rewards and sanctions based on test results are common and are increasing. This is happening in the face of almost universal agreement among independent experts on the technical limitations of standardized achievement tests, and agreement that their use as a high stakes measure of educational achievements or capacities is destructive, misleading and inappropriate. Government regulations and mandates linking tests to high stakes decisions proliferate at the same time that the standing of standardized tests as trustworthy instruments of modern social science has never been lower. Why are indexing policies that strengthen state control so readily endorsed and supported by politicians, corporate leaders, and national teachers unions? This question is addressed in Part 6, which explores possibilities for fundamental changes in direction in assessment policy. 
 

Because tests have long been used to determine merit and access to top academic tracks, special programs and high status schools and universities, they have been challenged over the years politically and in numerous suits alleging violations of US civil rights law. Much of the heated controversy over affirmative action also rests on the continuing use of standardized testing to define merit.
 
 
 

5. The effects of centralized control of schools
 

An independently funded set of studies conducted by the Centre for Assessment Studies at the University of Bristol, UK sought to map empirically the consequences of the Education Reform Act of 1988, initiated by the Conservative government led by Prime Minister Margaret Thatcher. (23) This law created a nationwide school indexing system for England and Wales, and shifted control of curriculum from individual schools, school councils and local Educational Authorities (school districts) to a central government authority called OFSTED (Office for Standards in Education). Over a period of eight years (1989-1997), a team of researchers studied a national sample of primary schools, employing a wide range of systematic social scientific qualitative and quantitative methods. The study produced dozens of articles and four major books, the most recent to be published in late 2000. (24)
 

The researchers document the grand mismatch between policy intentions and the outcomes. Rather than erasing educational inequalities and raising the level of academic accomplishment as promised, the state-mandated assessment process served to obstruct learning, and to perpetuate and increase disparities. Tests, even 'good' tests, served to distort and disrupt learning, in particular for bilingual students whose first language was not English. Documented also were a dramatic narrowing of the curriculum and restriction of the range of learning opportunities, increased devaluation of teacher knowledge, and a decline in teacher and headteacher (principal) morale. There were also increases in pupil anxiety and dysfunctional changes in school structure and governance. This included various forms of resistance by teachers and administrators, who sought somehow to hijack the rules and circumvent the system. They engaged in behaviors some might call 'cheating,' and others principled defiance of government regulations that denied opportunity and stunted student learning. Other studies have documented that headteachers, once highly independent and insulated from the twists and turns of national electoral politics, became highly vulnerable. As a consequence, it has become increasingly difficult to recruit and retain creative and talented teachers and administrators in schools located at the bottom of the school rankings --which, in England, as in the US, serve children of the poor, immigrants, and persons of color, a majority of whom live in the nation's most distressed urban areas. 
 

Though there are structural differences between the English and US systems, this set of studies is significant and relevant because California's API is modeled on Britain's 1988 Education Reform Act. In 1998, the press reported that outgoing Governor Wilson, in the waning months of his administration, had tea with the former Prime Minister, now Lady Thatcher. He reported that his proposal to establish the API was based on his great admiration for the centralized system of school accountability she had installed. The study is also important because it leaves little doubt that the central issue is control. The negative effects of centralized curriculum control and indexing are evident regardless of the quality of the assessment instruments used. By US standards, many of the British assessment tools would be considered 'smart' tests. Thus, even if the flaws built into standardized tests such as the Stat-9 could be remedied, or if the tests were tomorrow replaced with a new generation of 'authentic' tests, there is little doubt that the negative effects of centralized state control of the assessment system carefully documented by the British researchers would remain in place. The British findings are also particularly instructive to those reformers who focus almost entirely on eliminating or correcting the deficiencies of standardized testing, and ignore both the race question and the fundamental issue of power: who is in control of the curriculum and the assessment system?

There is no comparable comprehensive, longitudinal study of the impact of a policy of government curriculum mandates in the US. There is, however, a large body of research on the use of standardized tests in making high stakes decisions. Numerous studies have been conducted by researchers from some of the nation's leading educational research universities, and independent R&D centers devoted to evaluation, testing and assessment. In a recent issue of the Educational Researcher, the lead journal of the American Educational Research Association, Robert Linn, who is among the nation's respected experts on educational testing, reviewed fifty years of research on the use of tests and assessment in accountability systems. He concludes that 'common standards and testing encourages a narrowing of educational experiences for most students, dooms many to failure, and limits the development of many worthy talents.' This, he adds, 'should not be misinterpreted to mean that one should not have high standards for all students...[H]aving high standards is not the same as having common standards for all.' Professor Linn concludes with an extraordinary and damning commentary on the current state of the science of educational measurement:
 

As someone who has spent his entire career doing research, writing and thinking about educational testing and assessment issues, I would like to conclude by citing a compelling case showing that the major uses of tests for student and school accountability during the past 50 years have improved education and student learning in dramatic ways. Unfortunately, that is not my conclusion. Instead I am led to conclude that in most cases the instruments and technology have not been up to the demands placed on them by high stakes accountability…Assessment systems that are useful monitors lose much of their dependability and credibility . . . when high stakes are attached to them. The unintended negative effects of high stakes accountability…often outweigh the intended positive effects. [Italics added] (25)
 

Among the most egregious examples is the use of standardized tests to drive school retention and promotion policy. It is also a striking illustration of how the science and technology of educational testing is used to strengthen institutional racism. In recent years, 'social promotion' has been cited by Chester Finn of the right-wing Fordham Foundation and by President Clinton (26) as a major culprit in depressing the nation's educational standards. Several states and many school districts have responded with tough 'no social promotion' policies. Whether an individual's grade advancement or graduation is considered 'social' (meaning undeserved) or not is being determined solely by a single standardized test score. In the public mind and in the popular press, generally, all reasons for promotion except for standardized test score are considered 'social.' From the point of view of public policy this is an absurdity. It is known that students who are retained have higher dropout rates. It is also known that disproportionate numbers of students of color drop out. (27) It is impossible to sustain the argument that policies that have been shown to degrade curriculum and pedagogy, increase drop-outs and exacerbate inequalities, and that have no known educational benefits, will improve the level of education of the nation's youth or enhance their chances of competing in the new global economy. It is also a debasement of the social and behavioral sciences when the observations and judgments of all the adults in a child's school life (parents, teachers, principals, counselors, teaching and learning specialists, those with direct first-hand experience) are overridden and dismissed as 'social', scientifically unfit and subjective, while standardized tests are valorized as the one and only scientifically valid measure of academic performance.
 
 
 

6. Reforming assessment; reforming schools

The cornerstone of the Goals 2000 standards movement --raising educational standards by central government mandates that tie test scores to rewards and sanctions-- is self-defeating. The centralization of authority and the proliferation of standardized testing, which have become pervasive in the past decade, have shown no evidence of positive results. Indeed, to the extent to which these policies have been implemented, there is substantial evidence that they have more often than not served as an obstruction to the pursuit of educational excellence and equity. To repeat the words of Robert Linn, 'the evidence indicates that the unintended negatives of high stakes accountability systems probably outweigh the intended positive effects.' 
 

This conclusion, however, should not be mistaken as a rejection of the importance of raising standards and public accountability, nor as rejection of the need for national and state and local governments to use executive, legislative, and judicial power to protect student, parent, and community rights -and to take a strong affirmative role in the pursuit and maintenance of high educational standards for all. Also, the fact that current national and state school reform policies are almost totally reliant on an arcane and deeply flawed test technology does not in any way diminish the need for accountability nor for effective and appropriate forms of testing and assessment. 
 

Disputes over educational reform and accountability are often sharply polarized, cast in terms of top-down vs. bottom-up, two apparently contradictory perspectives and sets of remedies for reforming and assessing the nation's schools. Government mandated testing linked to a uniform system of rewards and sanctions is the defining example of the former. The bottom-up view stresses local school and community-based initiatives, rooted in face-to-face encounters among teachers, principals, and parents in collaboration with community and local officials. 
 

An assessment system in fact must serve both 'top-down' and 'bottom-up' functions. On the one hand, it must provide dependable information to school authorities, advisory and governing boards, state legislators, local officials, etc., so they may be better informed to make policy decisions about the distribution of public resources. A systematic assessment process is key for holding districts, school officials and teachers accountable for the quality of their performance. On the other hand, the system of assessment must also provide information that serves the educational needs and interests of each individual child, strengthens local school and community level reform initiatives aimed at improving teaching and learning, and cultivates the integration of diverse cultural and historical perspectives and language traditions into the school's curriculum and pedagogy. To serve the nation, and serve children of diverse cultural, racial, ethnic, language, and religious traditions, there must be an appropriate balance of power between central government authority and local school and community control. How the power over rewards and sanctions is distributed is the key. 
 

A more balanced, inclusive, effective and democratic set of national and state educational assessment policies is workable, possible, and not beyond reach. A shift in power would temper and moderate the already highly disproportionate power held by federal and state government authorities. A shift would restore balance by giving significantly more voice and greater responsibility not to state governments but to the 'grassroots' --to individual schools, teachers, parents, and local communities. 
 

What follows from a shift away from the center is that many of the differences rooted in fundamental moral, religious, and cultural beliefs, including philosophical and ideological differences related to curriculum, pedagogy and learning, would be resolved face-to-face, by locally constituted groups in an open consultative process, rather than by panels of experts or executive or legislative commissions appointed by state and federal officeholders and far removed from local communities, schools, children, and teachers. The resolution of the basic dilemmas of teaching, learning, and curriculum would be distanced from divisive, xenophobic electoral politics. 
 

Obstructions to change 

Redressing the power imbalance is possible but by no means assured. The political and institutional support for current centralized policies is strong, and the resistance to rethinking and reformulating of national and state assessment policy is considerable. 
 

Though the Goals 2000 standards movement shows no promise of producing excellence and equity as its proponents promised, it has wide support among the public and among politicians from the presidential candidates on down. Why, despite the intense criticism and in the face of increasing resistance by students, teachers, and community activists, (28) does the pro-testing/standards perspective continue to have such a strong hold on popular opinion, and remain dominant? 
 

The cultural and psychological barriers to rethinking assessment policy and practices are formidable. As a culture, Americans believe in tests, standardized tests in particular. Tests used for making high stakes decisions have a deep psychological hold on us because we are part of, and surrounded by, a culture where the need to assign numbers to performance and to compare and rank order individuals and institutions is seen as self-evident. In a world where rank and test scores matter, we also assume that test scores will tell us whether our children are prepared to compete in the hard, cold world. We also want our children's local school and school district to be among the best. In addition, to a lesser or greater degree, most of us schooled in this society have come to accept standardized tests as a measure of our self-worth, particularly with respect to our estimate of our own and our children's intellectual and academic capacities and abilities. For many Americans, an educational system without standardized tests, or with a very greatly diminished place for them, is inconceivable. 
 

Any remedy that would disperse power downward to schools and communities will be greeted with skepticism. Though Americans celebrate democracy and democratic values, when confronted by difficult problems or a crisis, as a nation and culture, we are inclined not to more democracy but less. The political and popular culture more readily endorses solutions that promise immediate measurable results and that rely on hierarchical power relations backed up by a universally applied system of rewards and punishments. 
 

There are also political and economic obstructions to rethinking and reforming assessment. A whole generation of mainstream politicians, governors and ex-governors (e.g. Bill Clinton, Lamar Alexander, Richard Riley, George Bush 1&2, Al Gore), many state education officers, legislators, corporate and national union leaders, remain fixed on a standards movement predicated on raising test scores. Despite its failure, the policy persists, in part because they have no other solutions to offer. Also, those now in office who were responsible for conceiving of and instituting these policies are not likely to concede that the Goals 2000 plan is a total failure.
 

Furthermore, testing is big business. Several of the largest test publishers and service providers are divisions of the major textbook publishing firms, which in turn are part of larger publishing and media conglomerates. According to the Bowker Annual, direct expenditures on tests doubled annually between 1980 and 1997 to 200 million dollars. These figures do not reflect increases in state mandated testing programs over the past five years, or mandates on the books but not as yet implemented. Neither do these figures account for indirect costs, which include a large army of experts and consulting firms, and state and district-level bureaucrats whose livelihoods and careers depend upon administering, scoring, analyzing, classifying, reporting, and storing test data, and ensuring compliance with government regulations. In 1993, researchers at the Center for Evaluation and Policy Research at Boston College estimated overhead cost at 20 billion annually. (29)
 

Finally, one of the more formidable obstructions to significant change in assessment policy is the widely voiced belief that whatever the deficiencies of high stakes standardized testing policies, there are no alternatives or at least no economically feasible alternatives to standardized testing. A very commonly voiced concern is that without centralized testing, the system of education would be undermined and flounder because there are no other practical ways to raise standards, assess educational progress, and to sort students and evaluate teachers. 
 

Policy alternatives

The educational policy issues surrounding testing, assessment, and public accountability are immensely complex. This report does not offer a sweeping blueprint for reform of the system of accountability that will tomorrow overturn and repair the damage created and fostered by Goals 2000 policies. It does however propose five guiding principles for reform, and three fundamental issues that must be addressed in formulating and pursuing alternatives. This essay closes with recommendations for shifting the balance of power in order to create a fair and effective accountability system.
 

Principles:

A fair and effective accountability system will: 

1. help to achieve and maintain high educational standards, but will not seek to standardize the curriculum, the learning process, nor attempt to impose a singular view of knowledge, language and culture;

2. contribute to the education of the nation's children to the full range of their talents and capacities; 

3. serve to assure equitable distribution of resources and equality of access to educational and job opportunities; 

4. serve to encourage and reward initiative and meritorious performance by schools, teachers and students;

5. contribute to erasing institutional racism.
 

Three key issues
Race

Virtually absent from discussions of educational excellence by the mainstream press and political leaders is the pervasiveness of institutional racism and the enormous inequities in human and material resources between the richest and poorest schools.
 

It is of vital importance that the accountability system specifically address the legacy of white supremacy and institutionalized racism legitimated by standardized testing, a legacy that lives on in the present. Institutional racism is manifest not only in disproportionate outcomes, but is built into the instruments and the assessment technology itself. Racism, of course, is also about who has power and who doesn't when basic decisions are made about allocation of resources, curriculum content and teaching methods, eligibility for programs, grade advancement, and the awarding of educational credentials. Most important, it is about who sets the rules, names the 'stakeholders,' and makes the final decisions.
 

To be fair and effective, the accountability system must make affirmative efforts to counter the institutional racism currently built into the technology of the instruments of assessment. Procedural and structural protections against institutionalized racism depend on proportionate distribution of decision-making power, with a significant degree of cultural control vested at the school and community levels.
 

Technology

Contrary to widely held belief, there is no shortage of systematic evaluation instruments for assessing teaching and school learning and for gauging the quality of 'academic' and other forms of school learning. (30) Some of the 'alternatives' are highly developed and have been shown to provide teachers, parents, and local officials with useful information for enhancing student learning and/or making local and internal school policy decisions. Some of these approaches are more cost efficient than conventional standardized tests because the time spent on assessment is not lost but is integral and additive to the teaching and learning process. It is also important to note that the use and interpretation of these instruments are dependent on the social context and particular situation. Thus, none are suitable for producing a single numerical scale that serves as a universal measure of educational productivity for all schools, teachers, and students.
 

The technology of multiple choice standardized testing was developed in the first two decades of the 20th century, at a time when mechanical hole punch and manual sorting with pins was state of the art information processing technology. The high-speed digital microprocessor and desktop computer technology developed over the last decade has transformed our technological capacity to collect, process, organize, and use very complex information. Other than the introduction in the 1930s of machines capable of reading the graphite pencil marks on answer sheets, and their replacement with digital scanners in recent years, the basic technology of multiple choice test taking is virtually unchanged since it was invented nearly a century ago. By contemporary standards, the multiple-choice test technology as represented in the Stat-9 is primitive, highly limited, and static.
 

It is not likely that innovations in testing and assessment technology will originate from the testing industry, which is heavily invested in multiple-choice technology and ill-equipped for dealing with the new cutting edge information technologies. The industry has nothing to gain and much to lose from an accountability system that does not rely on centrally administered and scored standardized multiple-choice tests. Though no technology can replace human judgement, the newer digital information technologies have unexplored potential for fostering responsive, systematic, and locally based assessments that also teach. To avoid commercialization of the educational process and undue influence of large corporate interests, pursuing these paths will require public investments that stimulate school and community-based collaborative research and assessment development.
 

In the near future a variety of educational tests will continue to be used in diagnosing student needs and assessing educational achievements. There are a number of steps that ought to be taken immediately by governments to protect children, communities and the public at large from discriminatory tests, and to ensure that the tests used meet the dual standard of enhancing learning and advancing equality of educational opportunity. The forms these protections might take are briefly discussed in the concluding section.
 

Power and Control 

How is power distributed within the accountability system? That is, who writes the rules, distributes rewards and sanctions, and determines who are the 'stakeholders' who will make the fateful educational decisions? The instruments employed by the system of accountability are of course critical, because they define what is valued. But an accountability system also includes a distinct organizational structure and a set of procedures for controlling rewards and sanctions that represent a particular configuration and distribution of power. The configuration can be changed and balanced so as to give more weight and responsibility to schools and local communities and less to experts, government officials, and appointed national and state boards and agencies.
 

A Massachusetts group called the Coalition for Authentic Reform in Education (CARE) (31) has outlined a proposed statewide accountability system that aims to raise educational standards and the quality of learning and teaching in classrooms and schools. It consists of four integrated components.

Local Assessments. Each school would submit its accountability plan for review and approval to a regional board, established by the Massachusetts Department of Education. The plan would outline how the school will assess progress toward a broadly stated set of competencies. 

External Quality Reviews. On a three to five year cycle, schools would undertake a self-study, and external auditors would review the self-study, visit the school, and report on progress toward the dual goal of academic excellence and the provision of equitable, quality resources and learning opportunities to all students.

Standardized tests. These would be limited to literacy and numeracy and would not have high stakes decisions attached. 

Annual Reports. Each school and school district would annually report to 'stakeholders' on a set of 'indicators' developed by the state. These would include but not be limited to academic performance, and would be reported in terms of race, gender, low income status, special needs and limited English proficiency.
 

The State of Nebraska in 1998 adopted policies that emphasize that the assessment of student academic performance is a local responsibility that should primarily serve to improve instruction and increase learning in the classroom. Further, the policy asserts that no single measure can achieve all purposes, and that multiple measures are needed to provide complete information to teachers, parents and policymakers. The assessment system, called the School-based, Teacher-led Assessment and Reporting System (STARS), is set to begin in the 2000-01 school year. Under this plan (32), the Nebraska Department of Education invited proposals from teachers and local districts to develop their own operational plans. One of the proposals submitted and approved was by a coalition of representatives from the Nebraska Writing Project (the home of the National Writing Project), 'The School at the Center' (both are networks of teachers and university faculty members), and nine Nebraska school districts. The underlying premise of their plan is that teachers develop the assessments and become their own assessment experts. The Coalition promises to produce nine "locally appropriate, context-sensitive assessments" for mathematics and reading/writing. While the Nebraska plan has its limits (all districts are required to administer annually one of several state approved, commercially available achievement test batteries), it provides funds for local school and school district assessment initiatives, and places significant restrictions on the use of standardized tests in making high stakes decisions.
 

There are also other living examples of accountability systems where power is balanced between national and local interests and concerns. Scotland, with a population of just over five million (approximately that of Maryland, Missouri, Wisconsin, or Minnesota), governs its own primary and secondary schools independently of the British government in London. Traditionally, education in Scotland is organized as a partnership between the central government, local government, and schools. For many years it has had a system of school inspection that resembles the school review process proposed by the Massachusetts CARE coalition. American-style standardized tests play no role in the assessment process. Two recently issued government papers (33) reassert and strengthen the policy that it is the responsibility of the national executive authority 'to exert strategic leadership of the national system… by articulating after consultation the national priorities for education, yet leave to each school supported by its local authority [school district] the central responsibility for its own improvement and for raising standards.' Further, the paper affirms a national policy of a basic level of provision that specifies minimum educational resources for all schools and students.
 

There have also been several notable efforts in the U.S. and in the U.K. to develop comprehensive accountability models that could serve to articulate local school-based systems into a national (or regional) assessment framework. (34)
 

An Agenda for change

1. Eliminate regulations that directly or indirectly link federal incentives to state adoption of centralized statewide testing of teachers and students. 
 

2. Require an educational impact statement prior to the implementation of a test or assessment procedure by any government educational agency. Such a statement would report on effects on children, schools and community, level of academic achievement, distribution of resources and learning opportunities, drop-out, etc., by race, gender, and socio-economic status.
 

3. Provide federal and state incentives and technical services to schools that (with the support of locally elected school officials) take central responsibility for school improvement and raising standards, and stimulate the development of partnerships among teachers, communities, and parents to develop 'locally appropriate, context-sensitive assessments.'
 

4. Strengthen and support efforts to set and enforce standards for tests and assessments that protect the public from inappropriate use of tests and assessments and from violations of the civil liberties and rights of students and teachers. Currently there are two sets of relevant standards: the Standards for Educational and Psychological Testing produced by three professional associations; (35) and Principles and Indicators for Student Assessment Systems developed by the National Forum on Assessment, a coalition of children's and national civil rights groups. (36) Both are useful but largely symbolic, since test developers and government agencies are under no legal obligation or political pressure to meet any standards or principles. The major test publishers and service providers operate under a shroud of secrecy that has been sanctioned by the courts. Remedying the situation requires either legislation or extra-governmental agreements that would insist that tests used for high stakes decisions meet published professional standards, and that educational tests and assessment procedures be open to public scrutiny and independent review. (37)
 

5. State legislatures should declare a moratorium on high stakes testing in order to undertake an orderly review of existing tests to determine whether they comply with professional standards and meet National Forum assessment principles.
 

6. Federal and/or state governments could pass legislation intended to rein in the abusive uses of tests. In April 2000, Sen. Paul Wellstone (D-MN) introduced in the Senate, and Rep. Robert Scott (D-VA) in the House, the Fairness and Accuracy in Student Testing Act (38), which would prohibit the use of standardized tests as the single determinant in making decisions about graduation, promotion, tracking or ability grouping of students, and would require that tests be valid and reliable for the purposes for which they are used; measure what the student was taught; provide students with multiple opportunities to demonstrate proficiency; and provide appropriate accommodations for students with limited English proficiency and disabilities. Political action at the state level is more likely.
 

Changing assessment; changing schools

Goals 2000 policies have led the nation down a dangerous path, by increments, to a radical transfer of power, with an increasing concentration in the hands of government authorities, bureaucrats, experts, and 'stakeholders' based in Washington, D.C., and state capitals, all distant from children, classrooms, and schools. This power imbalance is educationally unwise and, as the recent electoral politics of California illustrate, politically unwise and potentially explosive. Mandated standardized tests, because they have a disproportionately high adverse impact on communities of color, sustain and strengthen institutional racism. As testing programs authorized by state legislatures that tie tests to school promotion, admission to 'gifted' programs, entitlement to high school, scholarships, diplomas, degrees, certification, etc., are implemented over the next five to eight years, these adverse effects on communities of color will intensify and provoke racial and cultural conflicts and organized resistance.
 

Significant school reform is not possible without significant reform of the current system of national and state educational assessments. Changes will not occur of their own accord. They will come about only in response to persistent pressure by coalitions and tactical alliances that cut across political, social class, racial, and ideological lines. These include coalitions of citizen, student, teacher, and parent activists, children's advocates, civil liberties and civil rights leaders, educational traditionalists, and grassroots political progressives and conservatives. There were in the 1999-2000 school year, for the first time, numerous organized protests, boycotts, and other forms of active resistance to high stakes standardized testing by teachers, parents, youth, and community activists across the nation. That resistance is growing and is becoming more militant as mandated tests tied to sanctions are put into place. We as a nation will continue to differ profoundly on how schools ought to educate, what an educated person ought to know, and on how students learn best. In a democracy we cannot allow governments and panels of experts remote from communities, classrooms and students to impose a singular view of curriculum and learning, and to decide our and our children's futures.

©2000 Harold Berlak. Comments welcome: hberlak@sbcglobal.net
 

Harold Berlak holds a doctorate in educational research from Harvard. He is a former professor of education at Washington University in St. Louis and a longtime educational activist.
 

1. Raising Standards For American Education, A Report To Congress, the Secretary of Education, the National Goals Panel, and the American People; Washington, DC, 1/24/1992, US Printing Office, ISBN 0-16-036097-8. This report explicitly acknowledges and adopts California as the model. The policy of 'smart' standards tied to 'smart' tests was introduced to California by Bill Honig, nominally a liberal Democrat, who served as State Superintendent of Instruction from 1983-93.

2. These goals are: by the year 2000 (1) all children will start school ready to learn; (2) the high school graduation rate will increase to at least 90%; (3) all children will leave grades 4, 8, and 12 having demonstrated competency in challenging subjects including English, mathematics, science, foreign languages, civics and government, arts, history and geography; (4) American students will be first in the world in science and mathematics achievement; (5) every adult American will be literate; (6) every school will be free of drugs, violence, and the unauthorized presence of firearms and alcohol and will offer a disciplined and drug free environment.

3. Still Separate, Still Unequal, A Research Brief, Oakland, CA: Applied Research Center, May 2000. www.arc.org

 
4. See Stephen Selden, Inheriting Shame: The Story of Eugenics in America, New York: Teachers College Press, 1999.

5. Stephen J. Gould, 'Jensen's Last Stand', New York Review of Books, 1980; Leon J. Kamin, The Science and Politics of IQ, New York: John Wiley, 1974; Daniel M. Kohl, 'The IQ Game: Bait and Switch', School Review 84:44, 1976.

6. Russell Jacoby and Naomi Glauberman (eds.), The Bell Curve Debate, New York: Times Books/Random House, 1995.

7. Cultural supremacy arguments are dismantled in Jared Diamond, Guns, Germs and Steel New York: Norton, 1997.

8. Samuel L. Myers Jr. and Cheryl Mandala, Is Poverty the Cause of Poor Performance of Black Students on Basic Standards Examination? Roy Wilkins Center for Human Relations and Social Justice, Univ. of Minnesota. Paper presented at the 1998 American Educ. Research Assoc. Annual Meeting, June, 1997.

9. Schools were ranked in terms of resources, education and experience of staff, number, depth and range of academic course offerings. 

10. Claude M. Steele, 'A threat in the air: How stereotypes shape the intellectual identities', American Psychologist, 52, 1997. Also see 'Stereotyping and its threat are real', American Psychologist, 53, 1998.

11. These include Lisa Delpit, Other People's Children, New York: The New Press, 1995; Joyce E. King, 'The Purpose of Schooling for African American Students', in J. King, E. Hollins and W.C. Hayman (eds.), Preparing Teachers for Cultural Diversity, New York: Teachers College Press, 1996; Gloria Ladson-Billings, The Dreamkeepers: Successful Teachers of African-American Children, San Francisco: Jossey-Bass, 1999.

12. Fordham, Signithia, (1996) Blacked Out, Dilemmas of Race, Identity, and Success at Capital High, Chicago: University of Chicago Press

13. See Robert Linn, 'Assessments and Accountability', Educational Researcher 29:2, 2000. He cites data from the Florida high school competency test, given annually since 1977, to illustrate a common pattern. When a test is first introduced, scores rise markedly for several years for whites and persons of color, level off, and over time decline slightly. However, the gap in test performance between the races remains virtually constant over time.

14. In CBEST, 10 of the 50 items on the math and language sections are not scored. They are used in creating items for future versions of CBEST. Eight percent of 40 items equals 3.2 items. The size of the gap in terms of test items will of course vary depending on the number of test items.

15. On some tests, particularly in mathematics and engineering, some Asian populations outperform Whites.

16. These include: Linda McNeil, Contradictions of School Reform: The Educational Costs of Standardized Testing, New York: Routledge, 2000; George F. Madaus, 'A Technological and Historical Consideration of Equity Issues Associated with Proposals to Change the Nation's Testing Policy', Harvard Educational Review, 64:1, 1994; Diana C. Pullin, 'Learning to Work: The Impact of Curriculum and Assessment Standards on Educational Opportunity', Harvard Educational Review, 64:1, 1994.

17. Some achievement tests, the college entrance SAT for example, predict academic grades at the next level, but only in the very short run. 

18. See Peter Sacks, Standardized Minds, Cambridge, MA: Perseus Books, 2000.

19. The concluding paragraph, slightly revised, is taken from Deborah Meier, 'Educating a Democracy: Standards and the future of public education', Boston Review, Dec 1999/ Jan 2000. 

20. Some states afford flexibility in time allotted for completing the mandated test. In some cases allowances are made for students with documented learning disabilities, or whose native or first language is not English. 

21. Some achievement tests, the college entrance SAT for example, predict academic grades at the next level, but only in the very short run.

22. High Stakes Testing for Tracking, Promotion and Graduation, 1998, and Myths and Tradeoffs: The Role of Tests in Undergraduate Admissions, National Research Council, 1999.

23. The 1988 Education Reform Act, proposed and shaped while Margaret Thatcher was Prime Minister, became law under her successor, John Major.

24. Pollard, Broadfoot, Croll, Osborn and Abbott, Changing English Primary Schools, Cassell, 1994; Croll (ed.) with Abbott, Black, Broadfoot, Osborn and Pollard, Teachers, Pupils and Primary Schooling, Cassell, 1996; Osborn, McNess and Broadfoot, with Pollard and Triggs, Policy, Practice and Teacher Experience, Continuum, 2000; Pollard and Triggs, with Broadfoot, McNess and Osborn, Policy, Practice and Pupil Experience, Continuum, 2000.

25. Robert Linn, op. cit.

26. See Call to Action for American Education in the 21st Century, www.ed.gov/updates/PresEDPlan/part2.hmtl

27. Bureau of the Census, October Current Population Survey 1996

28. Movements of teachers, community and youth activists in opposition to tests are gathering force across the nation, including in California, Massachusetts, New York and Illinois. There have also been boycotts by students in Chicago and Boston.

29. Walter Haney, George Madaus, and Robert Lyons, The Fractured Marketplace for Standardized Testing, Boston: Kluwer, 1993.

30. Deborah Meier, Will Standards Save Public Education, Boston: Beacon Press, 2000; Linda Darling-Hammond, J. Ancess and B. Falk, Authentic Assessment in Action, New York: Teachers College Press, 1995; Grant Wiggins, Educative Assessment: Designing Assessment to Inform and Improve Student Performance, San Francisco: Jossey-Bass, 1998; Patrick Griffin, P. Smith and L. Burrill, The American Literacy Profile Scales: A Framework for Authentic Assessment, Portsmouth, NH: Heinemann, 1995; Monty Neill et al., Implementing Performance Assessments, Cambridge, MA: FairTest, 1996.

31. Full proposal and list of members of the coalition available at www.fairtest.org/care/accountability.html

32. See the STARS Planning Guide at http://www.edneb.org/IPS/starsmnt.pdf. Also see Chris Gallagher, A Seat at the Table: Teachers Reclaiming Assessment Through Rethinking Accountability, PDK, http://www.kiva.net/~pdkintl/kappan/kga10003.htm

33. Scottish Executive Education Department, School code paper and Priorities for schools, 2000.

34. Ann Filer (ed.), Assessment: Social Practice and Social Product, London & New York: Falmer Press, 2000; Harold Berlak (ed.), Toward a new science of educational testing & assessment, Albany: SUNY Press, 1992; John Raven, "A model of competence, motivation, and behavior, and a paradigm for assessment", in Berlak (ed.), Toward a new science; Tyrell Burgess and Elizabeth Adams, Outcomes of Education, London: Macmillan Education, 1980.

35. The American Psychological Association, American Educational Research Association, and National Council on Measurement in Education.

36. Summary available at: www.fairtest.org

37. In 1998 the Ford Foundation funded an organization called the National Board on Testing and Public Policy, located at the Center for Testing, Evaluation and Educational Policy at Boston College. One of the chief purposes of the Board would be to monitor quality by conducting independent expert audits of tests. One of the more serious difficulties with the proposed Board is that the testing industry is elevated to be a major 'stakeholder.' This would almost certainly retard innovation by giving precedence to the companies that are heavily invested in an out-of-date technology. Also a problem is the absence of a significant presence on the governing board of practicing teachers, parents, and local community members. See George Madaus and Cathy Horn, 'Testing Technology: The Need for Oversight', in Ann Filer, op. cit.

38. http://www.senate.gov/~wellstone/highstakes2.htm