The Tacit Assumptions of Testing Season

With the slow receding of the snow come signs of spring – longer days, tiny but promising buds on the trees, and the arrival of standardized test instruction booklets and nondisclosure agreements at schools all over the country.  Yes, spring is Testing Season.  But where testing once mostly struck anxiety into the hearts of students, the growing popularity of policies tying teachers’ job performance to student test scores has meant that Testing Season is increasingly anxiety-inducing for teachers as well.  I think it is worth questioning some of the tacit assumptions guiding these policies.

As an example, I turn to Governor Andrew Cuomo of New York.  Governor Cuomo and his education agenda have been much in the news.  This week, Rebecca Mead suggested on NewYorker.com that just as Cuomo has made education reform “a centerpiece of his agenda,” he is making the evaluation of teachers a centerpiece of his education reform.  Quoting Cuomo’s State of the State address from January, Mead wrote:

“Everyone will tell you, nationwide, the key to education reform is a teacher evaluation system,” the governor said. He noted that while only thirty-eight per cent of New York State high-school students are deemed to be “college ready,” according to their scores on standardized tests, 98.7 per cent of teachers in New York’s schools are rated “effective.” “How can that be?” Cuomo asked. “Who are we kidding, my friends? The problem is clear and the solution is clear. We need real, accurate, fair teacher evaluations.”

Cuomo’s rhetorical question and answer rest on two related assumptions:  (1) that the tests assessing student learning are fair and accurate, and (2) that the evaluations assessing teacher performance are neither.

The inconvenient truth is that the assessments we use to determine performance (or failure, as the case may be) are neither divinely created nor objective.  They are valiant but insufficient proxies for a deeper understanding of what test-takers truly know, and so to use them fairly we also need to view them with a measure of healthy skepticism.  (Harvard professor Daniel Koretz has written extensively and clearly on the opportunities and constraints of educational testing, but as near as I can tell, policymakers are paying scant attention to his work.  Koretz’s current projects include research on improving high-stakes accountability systems and on holding those systems accountable in turn.)

To risk stating the obvious:  tests are created by human beings, and so their meaning is whatever we project onto them.  If we agree there should be a meaningful difference between “proficient” and “not proficient,” then we must decide where the border between them lies.  The process of choosing that line is often deliberate and informed, but it is not objective and it is not science.  It is a subjective judgment, and it can change.  On a scale where, say, 220 represents passing and 219 failing, it would be more accurate to call the student scoring 220 “lucky” than “proficient.”
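To make the arbitrariness of that line concrete, here is a minimal sketch (the scores and cut points are invented for illustration; they are not drawn from the MCAS or any New York scale) of how the same set of student scores gets relabeled when nothing changes but the threshold we decree:

```python
# Illustrative only: invented scores and cut points, not any actual state assessment.

def label(score, cut_score):
    # The label depends entirely on which side of the chosen line a score falls.
    return "proficient" if score >= cut_score else "failing"

student_scores = [215, 219, 220, 228, 240]

# Same students, same answers -- only the cut score moves.
for cut in (220, 230):
    print(f"cut score {cut}:", [label(s, cut) for s in student_scores])
```

The student scoring 220 is “proficient” under one decree and “failing” under the other, without having learned anything more or less.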

Standards-based assessments have at least two clear benefits for politicians and policymakers like Cuomo:  messaging and malleability.  However precarious the line between passing and failing may be, the meanings assigned to these categories are tantalizingly easy to understand and thus very influential.  Newspaper reporters, pundits of various shapes and sizes, and the general public all share the assumption that “proficient” is good and “failing” is bad.  When we hear the message that half of all students “fail” a test – as was the case when the Massachusetts Comprehensive Assessment System (MCAS) debuted in 1998 – we all know to be alarmed.

Thankfully, standards-based assessments are also malleable.  A “passing” score can be whatever we say it is.  In 1999, after two years of pretty disappointing results on the MCAS, the Massachusetts Board of Education decided to set the bar for “passing” not at Proficient but at the low end of the Needs Improvement category, exactly one point above Failing.  At the time, Commissioner David Driscoll and other education officials expressed a determination to raise the passing score to Proficient over time, but they never did.  And why would they, when such a change would result in thousands of previously successful students suddenly being rebranded as failures?

Such a change would seem to be a PR disaster, but it is not that different from what happened in New York in 2013, when state officials released the results of redesigned tests aligned to the Common Core.  What initially seemed like a dizzying failure of high-stakes accountability has turned out to be a real boon for Governor Cuomo, who now uses the low passing rate as the linchpin of his education reform message.  If the new test scores tell us the “fair and accurate” truth about student performance, then it clearly must be the teacher evaluations that are failing and in dire need of reform.

But this leads me to a more vexing and arguably more essential question:  just how will we know when we have “real, accurate, fair teacher evaluations”? Will it be when more students perform well on the tests we equate with learning? Or will it be when fewer teachers are judged to be effective? 

If it’s the former, then perhaps we should simply make the tests easier to pass.  To do so, we would not need to change anything about the tests themselves.  We would need only to change how we define success (as officials in Massachusetts did in 1999).  It is an elegantly simple, almost magical, solution:  we simply decree that our students are proficient, talk ourselves into believing that we helped them become proficient learners, and never again wonder whether “proficient” is actually the same as “learning.”

If it’s the latter, though, then we need to ask ourselves where all the effective teachers will come from once we’ve fired the ones we deem ineffective (and then ask ourselves why anyone in their right mind would agree to be a teacher).  I have asked a version of this question before, but I have yet to find a satisfying answer.  The drain of innumerable committed career teachers from the profession would be an especially concerning consequence of policies designed to weed out only the most egregious offenders.  Boosters of hard-edged teacher evaluation policies might respond by saying that “effective” teachers have nothing to worry about:  if they are doing their job, the evaluations will show it.  But this assumption is just as false as the assumption that student tests are fair and accurate representations of proficiency.  Tests that are already merely masquerading as objective become even more compromised when high stakes (like a job) are added.

Ultimately, the final say on teacher evaluations in New York comes down not to what is objectively the right thing to do but to the subjective judgment of one person:  Governor Cuomo.  He will press for legislation, and he will sign or veto whatever makes it through the state legislature.  Of course, subjectivity would be less of a problem were it accompanied by trust, and here lies Cuomo’s biggest liability:  too many people whose livelihoods depend on his judgment do not trust him.  Just as tests are more valuable (and arguably more reliable) when the people taking them trust that the people giving them are reasonable and fair, teacher evaluations based on those same tests are more likely to be valuable if teachers trust that the people designing the tests, setting the cut scores, and analyzing the data are reasonable and fair.  So far, I think teachers have good reason to be skeptical.