February 24, 2007
The Cult of the Quantifiable
Filed under: NCLB Testing by Leo Casey @ 11:50 am
In his latest contribution to the ongoing debate over the Aspen Commission’s NCLB report, Kevin Carey dismisses what he describes as “the standardized attack on standardized tests.” For all of the failings of actually existing standardized tests, he opines, they provide needed information, albeit imperfect information, on student learning. And as information on student learning, they are the window through which we can observe the quality of teaching.
He then asks:
What level of imperfection can we tolerate, given the way the information will be used?… Leo clearly thinks that our current tests fall below that threshold. Okay, I’ll bite: how good does student assessment information — from standardized tests or any other source — have to be before it would be appropriate for use in determining teacher salaries? 100% perfect and unassailable in every way? If not perfect, how good? 95%? Let’s put a number on the table and then figure out what it would take to get there.
This passage, and the assumptions of Carey’s general position here, express a particular view of how one can meaningfully assess actual student learning. It is not a view unique to Carey; it could just as easily have been written by a Checker Finn or a Rick Hess, and it informs the most problematic parts of the Aspen report. At the center of all of their work is the assumption that it is possible and desirable to capture the most important aspects of student learning through the metrics of standardized testing, whatever the flaws in the actually existing tests.
The single-minded focus on what is measurable in quantifiable terms [so-called 'hard data'] is often presented as the royal road, if not the only road, to educational accountability. In this respect, it commonly takes on the stern language of the superego, and invokes a ‘father knows best’ paternalism in addressing teachers and parents — just take note of Hess’ fixation with the metaphor of ‘tough love’ and Carey’s charming description of his view as the “grown up” one. But when one looks beyond the paternal language, what one finds is an educational world view which studiously ignores ‘best practices’ in other fields.
Consider the political arena. Carey, Finn and Hess know, without doubt, a large number of ‘hard nosed,’ no nonsense politicians and elected officials of the center to right. But between the three of them, it is questionable that they could produce even one politician who practices politics the way they want to practice education. In contemporary politics, it is axiomatic that one must combine quantifiable measures of public opinion [polls] with qualitative studies [most commonly focus groups]. Each tool produces different and distinct sets of knowledge: polls provide what might be called thin but generalizable knowledge, while focus groups provide what could be described as thick but specific knowledge. Polling data can tell us where different candidates and elected officials stand with the public, or public views of particular issues of the day; conducted over time, polls can identify trends in those standings. But what they can’t provide is robust and rich insights into why the public views the candidates and issues the way it does, how those different views are interconnected, and how they might change, if the public figure employed a particular approach. That sort of knowledge, the so-called ‘soft data,’ can only be plumbed with tools of qualitative assessment such as focus groups. Politicians across the political spectrum understand very well the importance of possessing and using both sets of knowledge.
Educational assessments operate in analogous ways. The more rudimentary or elementary the skill, the easier it is to assess it definitively through quantifiable measures and standardized tests. There is no reason, at least in principle, why a standardized test could not provide reasonably accurate measures of basic reading comprehension, basic computational skills or the ability to recall and use discrete pieces of information. True, a great many standardized tests in use today do not meet that benchmark, but these failings are more a function of districts and states constructing such tests ‘on the cheap’ and misusing tests for purposes other than those for which they were designed. One can not ignore those shortcomings, or write them off as an acceptable margin of error in some statistical sense, as Carey wants to do, but they can not be an argument against a singular reliance on any standardized test. The real problem is more fundamental.
The most important skills we want our students to possess upon graduation from high school — how to write a persuasive essay which convincingly presents a logical argument with supporting illustrations; how to research a major issue, organize the evidence from the research and write a research paper which synthesizes the evidence in support of a thesis; how to deliver a coherent, convincing oral presentation; how to take a real life technical problem, and using the tools of analysis and computation, develop real life solutions — simply can not be meaningfully assessed through standardized tests. One needs performance assessments, necessarily qualitative in nature, to assess these skills. It is only in a careful evaluation of a written essay, for example, that one can assess the ability to write a strong persuasive essay. Note that these skills which can only be meaningfully assessed through actual performances are the very same skills that are indispensable for success in a post-secondary educational setting, and that are in demand in the global knowledge economy.
Qualitative, performance based assessments should not be mistaken for subjective assessments. Just as there are protocols and best practices which generally govern the operation of focus groups, there are generally recognized standards for what constitutes a proficient and an excellent essay. It is even possible to formalize those standards in the form of rubrics, thus maintaining a necessary level of rigor and maximizing the consistency of the assessment among different raters. What can’t be done is the reduction of the complex skills of essay writing to the form of a standardized test, where they can be quantified and neatly distributed along a normal curve.
Unlike some critics of standardized tests, teacher unions are strong supporters of rigorous educational standards, which we believe essential to the advance of American education. Moreover, we do believe that there is a positive and necessary role for standardized tests in K-12 education. Diagnostic tests can be particularly helpful in identifying the problems an individual student is having in mastering a fundamental literacy or numeracy skill. And standardized achievement tests can provide useful data to be considered as one piece of evidence, weighed together with performance assessments, classroom performance and teacher observations, in making important decisions on a student’s promotion or graduation. The NAEP exams are useful checks on the achievement claims of local school districts, and provide very significant broad data on the state of American schools and education. [Indeed, part of the reason for the reliability of the NAEP exams is that they are not "high stakes," so students and teachers are not compelled to spend weeks of "test prep" in order to maximize their scores and the effects of testing anxiety are minimized.]
But when standardized tests are taken as the sole measure of a student’s learning, and when high stakes decisions concerning a student’s promotion or graduation are made solely or predominantly on the basis of a standardized test, an educational wrong is being perpetrated. Psychometricians, psychologists and other professionals of test design and use are emphatic in their insistence that to be legitimate, such decisions must be made on the basis of multiple forms of evidence of student learning. [See the Code of Fair Testing Practices in Education, prepared by the Joint Committee on Testing Practices, representing the American Counseling Association, the American Educational Research Association, the American Psychological Association, the American Speech-Language-Hearing Association, the National Association of School Psychologists, the National Association of Test Directors and the National Council on Measurement in Education.] High stakes decisions on the basis of standardized test scores are simply not the most complete, the most accurate and the fairest measures of a student’s learning.
Equally troubling, once a standardized test becomes high stakes, as the basis of promotion and graduation decisions, it drives what is taught. It becomes counter-productive for a secondary school teacher concerned with getting his or her students past such high stakes obstacles and through graduation to spend a great deal of time teaching to the most important skills we identified above, because the standardized tests the student must pass will not be measuring them. Only those teachers who can start from the presumption that their students will pass these tests as a matter of course are able to devote the requisite time and energy to the development of these skills. Let’s be clear here: given the high correlation between socio-economic status and standardized test performance, the teachers who have the time to teach to these essential higher order skills are the teachers of students from middle and high income families. Notwithstanding the rhetorical genuflection before the goal of bridging the achievement gap, the singular reliance upon standardized tests as the measure of student learning extends, rather than lessens, that gap.
A true and complete measure of a student’s learning, especially as the student moves up through the grades and the skills being acquired become more complex, requires qualitative assessment. The further a student advances in schooling, the less useful information standardized tests give us about his or her learning. If our task is to prepare all students — and most especially students from high poverty backgrounds — to succeed in post-secondary education and in knowledge economy occupations, then we need to move beyond the cult of the quantifiable.
2 Comments

Leo,
if you are going to debate these people, (and I wonder why you would, but your choice), can you at least tell us the context up front?
For those of you who don’t want to click the (vile) link, it’s a smarmy anti-union pro-merit pay argument.
Jonathan
Comment by jd2718 — February 24, 2007 @ 4:33 pm
I read Carey’s most recent post where he says that most of his work has been with politicians on the left. I know that the conventional wisdom is that education makes strange bedfellows– Kennedy likes the Aspen report, etc.–and so the action is in the New Democratic center. However, out in the schools, far from the think tanks, what it looks like is this: the NCLB deal that was supposed to give money and support for low income school communities in exchange for accountability has resulted in not much money and support and lots of accountability, with no clear benefits for kids.
Nonetheless, Carey and others like him continue to beat up on teachers and support more testing for kids, claiming that at the end of some long road is a better system. The best quote on NCLB and the current policy climate came from Greg Toppo in USA Today. It goes something like this: “The further you get from the classroom, the better it all sounds.” I would think that would give people like Carey pause, but I guess not.
Comment by August — February 27, 2007 @ 3:04 pm