Bush/Obama Ed Reform: Victory over Value Add

(I was writing my final article on this era when I realized I hadn’t really focused completely on the history of Value Added Metrics (VAM) in my original coverage of the Obama years. I am saying this because VAM sprites both pro and con are holding me at gunpoint demanding I write an article all about them.)

In 2009, The New Teacher Project’s The Widget Effect declared that schools treated all teachers as interchangeable units, didn’t bother to train new teachers, refused to fire tenured teachers, and worse, gave all teachers high ratings.  99% of teachers got ratings of Proficient or higher! The shame!

Mind you, none of these are new declarations, but this paper initiated the argument that allowed Obama and Duncan (as I wrote here) to demand that states evaluate teachers with student achievement, and that achievement must be test scores. Thus, one of the requirements for a Duncan “waiver” from No Child Left Behind school “program improvement penalties”, which by now were affecting over half of all schools, was that the state must begin evaluating teacher effectiveness using data–just another word for VAM.

Put another way, Obama and Duncan allowed states to escape schoolwide accountability for student test scores by forcing them to agree to teacher accountability for student test scores.

In 2009, 10 states required evaluation to include student achievement metrics. By 2015, 43 states required value-added metrics for evaluation. Most courts agreed that the usually hasty and poorly thought through implementation plans were absurd and unfair, but declined to step in. There were some notable exceptions, as you’ll see. (Note: I wrote a longer opinion of VAM that includes more info.)

From 1% Ineffective to…..?

By now, no one should be surprised to learn that these efforts were a spectacular failure, although rarely reported in just those terms. But by 2019, only 34 states required VAM, and most other states still requiring it on paper had watered down the impact by dramatically reducing the VAM component, making VAM optional, removing the yearly requirement for teacher evaluations, or allowing schools to design their own metrics.

In the definitive evaluation, Harvard researchers studied 24 states that implemented value-added metrics and learned that principals refused to give teachers bad ratings. In fact, principals would rate teachers lower in confidential ratings than in formal ones, although in either method the average score was a positive evaluation.  When asked, principals said that they felt mean giving the bad results (which suggests they didn’t agree with them). Moreover, many principals worried that if they gave a bad review, the teachers might leave–or worse, force the principal to begin firing procedures. Either way, the principal might end up forced to hire a teacher no better or possibly worse.

Brief aside: Hey, that should sound familiar to long-time readers. As I wrote seven years ago: “…most principals don’t fire teachers often because it’s incredibly hard to find new ones.” Or as I put it on Twitter back when it allowed only 140 characters, “Hiring, not firing, is the pain point.”

So the Obama administration required an evaluation method that would identify bad teachers for firing or training, and principals are worried that the teachers might leave or get fired. That’s….kind of a problem. 

Overall, the Harvard study found that only two of the 24 states gave more than 1% of teachers unsatisfactory ratings.

If you do the math, 100% – 1% = 99%, which is exactly what The Widget Effect found, so that was a whole bunch of money and energy spent for no results.

New Mexico

The study’s outlier was New Mexico, which forced principals to weight VAM as 50% of the overall evaluation score, courtesy of Hanna Skandera, a committed reform education secretary appointed by a popular Republican governor. As a result, over 1 in 4 teachers were rated unsatisfactory.

But! A 2015 court decision prevented any terminations based on the evaluation system, and the case got delayed until it was irrelevant. In 2017, Governor Martinez agreed to a compromise on the evaluation methodology, increasing permitted absences to six and dropping VAM from 50% to 35%. New Mexico also completed its shift from a purple to blue state, and in 2018 all the Democratic gubernatorial candidates promised they would end the evaluation system. The winner, Michelle Lujan Grisham, wasted no time. On January 3, 2019, a perky one-page announcement declared that VAM was ended, absences wouldn’t count on evaluations, and just for good measure she ended PARCC.

So in the one state in which principals couldn’t juke the stats to keep teachers they didn’t want to fire, the courts stepped in, the Republican governor backed down, and the new Democratic governor rendered the whole fuss moot.

California

California had always been a VAM outlier, as governor Jerry Brown steadfastly refused the waiver bribes. Students Matter, an organization founded by tech entrepreneur David Welch, engaged in a two-pronged attempt to force California into evaluation compliance–first by suing to end teacher tenure (Vergara) and then by forcing evaluation by student test scores (Doe vs. Antioch). Triumphalists hailed the original 2014 Vergara decision that overturned the protections of teacher tenure, and even the more cautiously optimistic believed that the California appeals court might overturn the decision, but the friendlier California Supreme Court would side with the plaintiffs and end tenure. The appeals court did overturn, and the CA Supreme Court….declined to review, letting the appellate ruling stand.

Welch and Students Matter likewise tried to force California schools to read its 1971 Stull Act as requiring teachers to be evaluated by test scores. That failed, too.  No appeal.

Upshot

“Experts” often talk about forcing education in America to follow market-based principles. But in the VAM failure, the principals are following those principles! (hyuk.) As I’ve also written many times, there is, in fact, a teacher shortage. But at the same time, even the confidential evaluations demonstrate that the vast majority of teachers are doing good work by their managers’ estimation.

As a teacher, I would be interested in learning whether I had an impact on my students’ scores. I’d be more interested, really, in whether my teaching methods were helping all students equally, or if there were useful skews. Were my weakest students, the ones who really weren’t qualified for the math I was teaching, being harmed, unlearning some of the earlier skills that could have been reinforced? Was my practice of challenging the strongest students with integrated problem solving and cumulative applications of material keeping them in the game compared to other students whose teachers taught more, faster, tested only on new material, and gave out practice tests?

But the idea that any teachers other than, perhaps, reading teachers in elementary school could be accurately assessed on their performance by student learning is just absurd.

Any teacher could have told you that. Many teachers did tell the politicians and lobbyists and billionaires that. But teachers are the peasants and plebes of the cognitive elite, so the country had to waste billions only to get right back to where we started. Worse: they still haven’t learned.

(I swear I began this article as the final one in the series until I realized VAM was pulling focus. I really do have that one almost done. Happy New Year.)

Next up–and Finally! Bush/Obama Ed Reform: It All Came Tumbling Down


The Many Failings of Value-Added Modeling

Scott Alexander reviews the research on value-added models measuring teacher quality[1]. While Scott’s overview is perfectly fine, any such effort is akin to a circa 1692 overview of the research literature on alchemy. Quantifying teacher quality will, I believe, be understood in those terms soon enough.

High School VAM is Impossible

I have many objections to the whole notion of modeling what value a teacher adds, but top of the idiocy heap is how little attention is paid to the fact that VAM is only even possible with elementary school teachers. First, reading and basic math are the primary learning objectives of years 1-5. Second, elementary schools think of reading and math ability in terms of grade level. Finally, elementary teachers or their schools have considerable leeway in allocating instruction time by subject.

Now, go to high school (of which middle school is, as always, a pale imitation with similar issues). We don’t evaluate student reading skills by grade level, but rather by “proficiency”. We don’t say “this 12th grader reads at the 10th grade level”. We have 12th graders who read at the 8th grade level, of course. We have 12th graders who read at the third grade level. But we don’t acknowledge this in our test scores, and so high school tests can’t measure reading progress. Which is good, because high school teachers aren’t tasked with reading instruction, so we wouldn’t expect students to make much progress. What’s that? Why don’t we teach reading in high school, if kids can’t read at high school level, you ask? Because we aren’t allowed to. High school students with remedial level skills have to wait until college acknowledges their lack of skills.

And that’s reading, where at least we have a fighting shot of measuring progress, even though the tests don’t currently measure it–if we had yearly tests, which of course we don’t. Common Core ended yearly high school tests in most states. In math, it’s impossible because we pass most kids (regardless of ability) into the next class the next year, so there’s no “progress”, unless we measure kids at the beginning and end of the year, which introduces more tests and, of course, would show that the vast majority of students entering, say, algebra 2 don’t in fact understand algebra 1. Would the end of year tests measure whether or not the students had learned algebra 1, or algebra 2?

Nor can high schools legally just allocate more time to reading and math instruction, although they can put low-scoring kids in double block instruction, which is a bad, bad thing.

Scope Creep

Most teachers at all levels don’t teach tested subjects and frankly, no one really cares about teacher quality and test scores in anything other than math or reading; they just pretend to on everything else. Which leads to a question that proponents answer implicitly by picking one and ignoring the other: do we measure teacher quality to improve student outcomes or to spend government dollars effectively?

If the first, then what research do we have that art teachers, music teachers, gym teachers, or, god save us, special education teachers improve student outcomes? (answer: none.) If the second, then what evidence do we have that the additional cost of testing in all these additional topics, as well as the additional cost of defending the additional lawsuits that will inevitably arise as these teachers attack the tests as invalid, will be less strain on the government coffers than the cost of the purportedly inadequate teachers? What research do we have that any such tests on non-academic subjects are valid even as measures of knowledge, much less evidence of teacher validity?

None, of course. Which is why you see lawsuits by elective teachers pointing out it’s a tad unfair to be judged on the progress of students they’ve never actually met, much less taught. While many of those lawsuits get tossed because the evaluations are unfair but not unconstitutional, the idiocy of these efforts played no small part in the decision to kill the student growth measure (SGM) requirement in the newest version of the federal ESEA, the ESSA.

So while proponents might argue that math and English score growth have some relationship to teacher quality in those subjects, they can’t really argue for testing all subjects. Sure, people can pretend (a la Common Core) that history and science teachers have an impact on reading skills, but we have no mechanism to, and are years away from, changing instruction and testing in these topics to require reading content and measuring the impact of that specific instruction in that specific topic. And again, that’s just reading. Not math, where it’s easy enough to test students on their understanding of math in science and history, but very difficult to untangle where that instruction came from. Of course, this is only an issue after elementary school. See point one.

Abandoning false gods

For the past 20 years or so, school policy has been about addressing “preparation”, which explains the obsession with elementary school. Originally, the push for school improvement began in high school. Few people realize or acknowledge these days that A Nation at Risk, that polemic seen as groundbreaking by education reformers but kind of, um, duh? by any regular people who take the time to read it, was entirely focused on high school, as can be ascertained by a simple perusal of its findings and recommendations. Stop coddling kids with easy classes, make them take college prep courses! That’s the ticket. It’s the easy courses, the low high school standards that cause the problem. Put all kids in harder classes. And so we did, with pretty disastrous results through the 80s. Many schools began tracking, but Jeannie Oakes and disparate impact lawsuits put an end to that.

I’m not sure when the obsession with elementary school began because I wasn’t paying close attention to ed policy during the 90s. But at some point in the early 90s, it began to register that putting low-skilled kids in advanced high school classes was perhaps not the best idea, leading to either fraud or a lot of failing grades, depending on school demographics. And so, it finally dawned on education reformers that many high school students weren’t “academically prepared” to manage the challenging courses that they had in mind. Thus the dialogue turned to preparing “underserved” students for high school. Enter KIPP and all the other “no excuses” charters which, as I’ve mentioned many times, focus almost entirely on elementary school students.

In the early days of KIPP, the scores seemed miraculous. People were bragging that KIPP completely closed the achievement gap back then, rather than the more measured “slight improvement controlling for race and SES” that you hear today. Ed reformers began pushing for all kids to be academically prepared, that is hey! Let’s make sure no child is left behind! And so the law, which led to an ever increasing push for earlier reading and math instruction, because hey, if we can just be sure that all kids are academically prepared for challenging work by high school, all our problems will be fixed.

Except, alas, they weren’t. I believe that the country is nearing the end of its faith in the false god of elementary school test scores, the belief that the achievement gap in high school is caused simply by not sufficiently challenging black and Hispanic kids in elementary school. Two decades of increasing elementary scores to the point that they appear to have topped out, with nary a budge in high school scores, has given pause. Likewise, Rocketship, KIPP, and Success Academy have all faced questions about how their high-scoring students do in high school and college.

As I’ve said many times, high school is brutally hard compared to elementary school. The recent attempt to genuinely shove difficulty down earlier in the curriculum went over so well that the new federal law gave a whole bunch of education rights back to the states as an apology. Kidding. Kind of.

And so, back to VAM….Remember VAM? This is an essay about VAM. Well, all the objections I pointed out above–the problems with high school, the problems with specific subject teachers–were mostly waved away early on, because come on, folks, if we fix elementary school and improve instruction there, everything will fall into place! Miracles will happen. Cats will sleep with dogs. Just like the NCLB problem with 100% above average was waved away because hey, by then, the improvements will be sooooo wonderful that we won’t have to worry about the pesky statistical impossibilities.

I am not sure, but it seems likely that the feds’ relaxed attitude towards test scores has something to do with the abandonment of this false idol, which leads inevitably to the reluctant realization that perhaps A Nation at Risk was wrong, perhaps something else is involved with academic achievement besides simply plopping kids in the right classes. I offer in support the fact that Jerry Brown, governor of California, has remained almost entirely unscathed for shrugging off the achievement gap, saying hey, life’s a meritocracy. Who’s going to be a waiter if everyone’s “elevated” into some important job? Which makes me wonder if Jerry reads my blog.

So if teachers don’t make any difference and VAM is pointless, how come any yutz can’t become a teacher?

No one, ever, has argued that teachers don’t make any difference. What they do say is that individual teacher qualities make very little difference in student test scores and/or student academic outcomes, and the differences aren’t predictable or measurable.

If I may quote myself:

Teaching, like math, isn’t aspirin. It’s not medicine. It’s not a cure. It is an art enhanced by skills appropriate to the situation and medium, that will achieve all outcomes including success and failure based on complex interactions between the teachers and their audience. Treat it as a medicine, mandate a particular course of treatment, and hundreds of thousands of teachers will simply refuse to comply because it won’t cure the challenges and opportunities they face.

And like any art, teaching is not a profession that yields to market justice. Van Gogh died penniless. Bruces Dern and Davison are better actors than Chrisses Hemsworth and Evans, although their paychecks would never know it. Teaching, like art and acting, runs the range from velvet Elvis paint by numbers to Renoir, from Fast and Furious to Short Cuts. There are teaching superstars, and journeyman teachers, and the occasional lousy teacher who keeps working despite this–just as Rob Schneider still finds work, despite being so bad that Roger Ebert wrote a book about it.

Unlike art and acting, teaching is a government job. So while actors will get paid lots of money to pretend to be teachers, the job itself will never lead to the upside achieved by the private sector, despite the many stories about famous Korean tutors. Upside, practicing our craft won’t usually lead to poverty, except perhaps in North Carolina.

Most teachers understand this. It’s the outside world and the occasional short-termers who want teachers to be rewarded for excellence. Most teachers don’t support merit pay and vehemently oppose “student growth measures”.

The country appears to be moving towards a teacher shortage. I anticipate all talk of VAM to vanish. But if you want to improve teacher quality beyond its current much-better-than-it’s-credited condition, I suggest we consider limiting the scope of public education. Four of these five education policy proposals will do just that.

**************************************************************************
[1] I was writing this up in the comments section of Scott Alexander’s commentary on teacher VAM research, when I remembered I was behind on my post quota. What the heck. I’m turning this into a post. It’s a long answer, but not as long-winded as Scott Alexander, the one blogger who makes me feel brusque.