“Good Teaching” and the Failure of Education Reform


“Student achievement is soundly measured; teacher effectiveness is not. The system is spending time and effort rating teachers using criteria that do not have a basis in research showing how teaching practices improve student learning.”–Mark Dynarski, Brookings Institution

Goodbye Mr. Chips. Up the Down Staircase. My Posse Don’t Do Homework. To Sir With Love. Dead Poets Society. Mr. Holland’s Opus. The 4th season of The Wire.

The “great teacher” movie has become a bit of a cliché. But decades of films and TV shows work on our emotions for good reason. That reason is not “Wow, this teacher’s practice is soundly based in practice that research shows improves student learning!”

“You cannot ignore facts. That is why any state that makes it unlawful to link student progress to teacher evaluations will have to change its ways.”–President Barack Obama, announcing Race to the Top


Reform movies usually fail. Won’t Back Down, a piece of blatant choice advocacy, bombed at the box office. Waiting for Superman was a big hit in elite circles but for a film designed as propaganda, it notably failed to move people to action, or even win considerable praise from the unconverted.

In general, performance-obsessed folks are the villains in mainstream movies and TV.

In Pump Up The Volume, the villain was a principal who found reasons to expel teens whose lack of motivation and personal problems would hurt her school’s test scores. This was before charters, under which such practices came to be encouraged.

In Searching for Bobby Fischer (the movie, as opposed to the book), the parents push back against the competition-obsessed teacher who wants the boy to spend all his waking hours on chess, and give equal time to a homeless street player who advocates a more open, aggressive, impulsive approach to the game. The parents prefer a son with a happy, rounded life to a neurotic who would never know a normal life. (Their son is today a happy, well-rounded, brilliant man who never became Bobby Fischer, in every sense of that phrase.)

In the famous season 4 of The Wire, AVP Donnelly tries hard to “juke the stats” by gaming the test, “spoonfeeding” the “Leave No Child Behind stuff”. Prez rejects this approach: “I came here to teach, right?”

I can think of only one movie in which a teacher was judged by his test scores and declared a hero: Jaime Escalante in Stand and Deliver.

But most people throwing about Escalante’s name and achievements don’t really understand that it took fourteen years of sustained effort, handpicked teachers, legally impossible demands on his students, and a supportive principal to get 73 kids to pass the AB Calculus exam and another 12 to pass the BC, out of roughly 140 to 200 in his program and a student population of 3,500. Once Escalante lost his supportive principal, he was voted out as department chair because he had been an arrogant jerk to other teachers, and he handled the defeat by leaving the school.

Escalante’s story, channeled through Jay Mathews, thrilled policy wonks and politicians, and the public was impressed by the desire and determination of underprivileged kids to do what it takes to get an opportunity they otherwise wouldn’t have. But those same wonks and politicians wouldn’t have tolerated Escalante’s tracking, and 2% would have been an unacceptably low participation rate. He rejected a lot of kids. Mine is a contrarian view, but I’ve never thought Escalante cared about kids who couldn’t or wouldn’t do the work he demanded.

“Teachers should be evaluated based on their ability to fulfill their core responsibility as professionals—delivering instruction that helps students learn and succeed.”–The Widget Effect (publication of the National Council on Teacher Quality)

In the book We Need To Talk About Kevin, the teacher Dana Rocco makes two brief appearances. The first is in a parent-teacher conference with Kevin’s mother:


We don’t know how Dana Rocco’s students performed on tests, or even how she taught. But purely on the strength of this passage, we know she is passionate about her subject and her students, whom she works to reach in ways both straightforward and otherwise. And in the second passage, we learn that she kept trying to reach Kevin right up to the moment he split her head open with a crossbow bolt while she was trying to carry another of his victims away from danger.

In Oklahoma, a tornado blew down a school, and rescuers pulled a car off a teacher who had three kids underneath her. Teachers were pulling rubble away from classrooms before the rescue workers even got there. Were they delivering on their core responsibility as professionals?

The Sandy Hook teachers died taking bullets for their students.

Were they fulfilling their core responsibilities as professionals? Would NCTQ celebrate the teachers who abandoned their students to the deranged young gunman, or who left their students to be buried in rubble? Could they argue that those teachers’ efforts were better spent raising test scores for another ten years than giving their lives to save twenty students?

“Most notably, [the Every Student Succeeds Act] does not require states to set up teacher-evaluation systems based in significant part on students’ test scores—a key requirement of the U.S. Department of Education’s state-waiver system in connection with ESSA’s predecessor, the No Child Left Behind Act.”–Stephen Sawchuk, “ESSA Loosens Reins on Teacher Evaluations”

ESSA is widely acknowledged to have ended the era of education reform, which began in the 90s and peaked in the Bush-Obama years. Eulogies abound, many including prescriptions for the future from the same people who pushed the past policies that failed so completely, so spectacularly. In future years, the Bush-Obama choice/accountability reforms will forever be accompanied by the words “roundly repudiated”. The world we live in going forward is as much a rejection of Michael Petrilli, John King, and Michelle Rhee as the “Nation At Risk” era was a rejection of the wasteful excesses of the 70s. The only real question left is why they still have billionaires paying their salaries.

They failed for many reasons. But chief among their failures was their conviction that public education is measured by student outcomes. This conviction is easily communicated, and allowed reformers to move politicians and policy in directions completely at odds with the public will. Reformers never captured the hearts and minds of the public. They failed to understand that student academic outcomes aren’t what the public thinks of when they think of good teaching.

The repudiation of education reform policies and preferences in favor of emotion-based, subjective expectations is one of the most comforting developments of the past twenty years. Go USA.


The Many Failings of Value-Added Modeling

Scott Alexander reviews the research on value-added models measuring teacher quality1. While Scott’s overview is perfectly fine, any such effort is akin to a circa 1692 overview of the research literature on alchemy. Quantifying teacher quality will, I believe, be understood in those terms soon enough.

High School VAM is Impossible

I have many objections to the whole notion of modeling the value a teacher adds, but top of the idiocy heap is how little attention is paid to the fact that VAM is even possible only for elementary school teachers. First, reading and basic math are the primary learning objectives of grades 1-5. Second, elementary schools think of reading and math ability in terms of grade level. Finally, elementary teachers or their schools have considerable leeway in allocating instruction time by subject.

Now, go to high school (of which middle school is, as always, a pale imitation with similar issues). We don’t evaluate student reading skills by grade level, but rather by “proficiency”. We don’t say “this 12th grader reads at the 10th grade level”. We have 12th graders who read at the 8th grade level, of course. We have 12th graders who read at the third grade level. But we don’t acknowledge this in our test scores, and so high school tests can’t measure reading progress. Which is good, because high school teachers aren’t tasked with reading instruction, so we wouldn’t expect students to make much progress. What’s that? Why don’t we teach reading in high school, if kids can’t read at a high school level, you ask? Because we aren’t allowed to. High school students with remedial-level skills have to wait until college for anyone to acknowledge their lack of skills.

And that’s reading, where we at least have a fighting shot at measuring progress, even though the tests don’t currently measure it, and even then only if we had yearly tests, which of course we don’t: Common Core ended yearly high school tests in most states. In math, it’s impossible, because we pass most kids (regardless of ability) into the next course each year, so there’s no “progress” to measure unless we test kids at the beginning and end of the year. That means more tests, and would, of course, show that the vast majority of students entering, say, algebra 2 don’t in fact understand algebra 1. Would the end-of-year tests measure whether the students had learned algebra 1, or algebra 2?

Nor can high schools legally just allocate more time to reading and math instruction, although they can put low-scoring kids in double block instruction, which is a bad, bad thing.

Scope Creep

Most teachers at all levels don’t teach tested subjects, and frankly, no one really cares about teacher quality and test scores in anything other than math or reading; we just pretend about everything else. Which leads to a question that proponents answer implicitly by picking one answer and ignoring the other: do we measure teacher quality to improve student outcomes, or to spend government dollars effectively?

If the first, then what research do we have that art teachers, music teachers, gym teachers, or, god save us, special education teachers improve student outcomes? (Answer: none.) If the second, then what evidence do we have that the additional cost of testing in all these additional subjects, as well as the additional cost of defending the lawsuits that will inevitably arise as these teachers attack the tests as invalid, will be less of a strain on the government coffers than the cost of the purportedly inadequate teachers? What research do we have that any such tests on non-academic subjects are valid even as measures of knowledge, much less as evidence of teacher quality?

None, of course. Which is why you see lawsuits by elective teachers pointing out that it’s a tad unfair to be judged on the progress of students they’ve never actually met, much less taught. While many of those lawsuits fail, with courts finding the evaluations unfair but not unconstitutional, the idiocy of these efforts played no small part in the decision by the newest version of the federal ESEA, the ESSA, to kill the student growth measure (SGM) requirement.

So while proponents might argue that math and English score growth have some relationship to teacher quality in those subjects, they can’t really argue for testing all subjects. Sure, people can pretend (à la Common Core) that history and science teachers have an impact on reading skills, but we have no mechanism for, and are years away from, changing instruction and testing in these subjects to require reading content and measuring the impact of that specific instruction in that specific subject. And again, that’s just reading. Not math, where it’s easy enough to test students on their understanding of math in science and history, but very difficult to untangle where that instruction came from. Of course, this is only an issue after elementary school. See point one.

Abandoning false gods

For the past 20 years or so, school policy has been about addressing “preparation”, which explains the obsession with elementary school. Originally, the push for school improvement began in high school. Few people realize or acknowledge these days that the Nation at Risk, that polemic seen as groundbreaking by education reformers but kind of, um, duh? by any regular people who take the time to read it, was entirely focused on high school, as can be ascertained by a simple perusal of its findings and recommendations. Stop coddling kids with easy classes, make them take college prep courses! That’s the ticket. It’s the easy courses, the low high school standards that cause the problem. Put all kids in harder classes. And so we did, with pretty disastrous results through the 80s. Many schools began tracking, but Jeannie Oakes and disparate impact lawsuits put an end to that.

I’m not sure when the obsession with elementary school began because I wasn’t paying close attention to ed policy during the 90s. But at some point in the early 90s, it began to register that putting low-skilled kids in advanced high school classes was perhaps not the best idea, leading to either fraud or a lot of failing grades, depending on school demographics. And so, it finally dawned on education reformers that many high school students weren’t “academically prepared” to manage the challenging courses that they had in mind. Thus the dialogue turned to preparing “underserved” students for high school. Enter KIPP and all the other “no excuses” charters which, as I’ve mentioned many times, focus almost entirely on elementary school students.

In the early days of KIPP, the scores seemed miraculous. People were bragging back then that KIPP completely closed the achievement gap, rather than claiming the more measured “slight improvement controlling for race and SES” that you hear today. Ed reformers began pushing for all kids to be academically prepared, that is, hey! Let’s make sure no child is left behind! And so came the law, which led to an ever-increasing push for earlier reading and math instruction, because hey, if we can just be sure that all kids are academically prepared for challenging work by high school, all our problems will be fixed.

Except, alas, they weren’t. I believe that the country is nearing the end of its faith in the false god of elementary school test scores, the belief that the achievement gap in high school is caused simply by not sufficiently challenging black and Hispanic kids in elementary school. Two decades of rising elementary scores, to the point that they appear to have topped out, with nary a budge in high school scores, have given people pause. Likewise, Rocketship, KIPP, and Success Academy have all faced questions about how their high-scoring students do in high school and college.

As I’ve said many times, high school is brutally hard compared to elementary school. The recent attempt to genuinely shove difficulty down earlier in the curriculum went over so well that the new federal law gave a whole bunch of education rights back to the states as an apology. Kidding. Kind of.

And so, back to VAM. Remember VAM? This is an essay about VAM. Well, all the objections I pointed out above (the problems with high school, the problems with specific subject teachers) were mostly waved away early on, because come on, folks, if we fix elementary school and improve instruction there, everything will fall into place! Miracles will happen. Cats will sleep with dogs. Just as the NCLB problem of 100% above average was waved away because hey, by then, the improvements will be sooooo wonderful that we won’t have to worry about the pesky statistical impossibilities.

I am not sure, but it seems likely that the feds’ relaxed attitude towards test scores has something to do with the abandonment of this false idol, which leads inevitably to the reluctant realization that perhaps The Nation At Risk was wrong, and perhaps something besides simply plopping kids in the right classes is involved in academic achievement. I offer in support the fact that Jerry Brown, governor of California, has remained almost entirely unscathed for shrugging off the achievement gap, saying hey, life’s a meritocracy. Who’s going to be a waiter if everyone’s “elevated” into some important job? Which makes me wonder if Jerry reads my blog.

So if teachers don’t make any difference and VAM is pointless, how come any yutz can’t become a teacher?

No one, ever, has argued that teachers don’t make any difference. What they do say is that individual teacher qualities make very little difference in student test scores and/or student academic outcomes, and the differences aren’t predictable or measurable.

If I may quote myself:

Teaching, like math, isn’t aspirin. It’s not medicine. It’s not a cure. It is an art enhanced by skills appropriate to the situation and medium, that will achieve all outcomes including success and failure based on complex interactions between the teachers and their audience. Treat it as a medicine, mandate a particular course of treatment, and hundreds of thousands of teachers will simply refuse to comply because it won’t cure the challenges and opportunities they face.

And like any art, teaching is not a profession that yields to market justice. Van Gogh died penniless. Bruces Dern and Davison are better actors than Chrisses Hemsworth and Evans, although their paychecks would never know it. Teaching, like art and acting, runs the range from velvet Elvis paint-by-numbers to Renoir, from Fast and Furious to Short Cuts. There are teaching superstars, and journeyman teachers, and the occasional lousy teacher who keeps working despite it, just as Rob Schneider still finds work despite being so bad that Roger Ebert wrote a book about it.

Unlike art and acting, teaching is a government job. So while actors will get paid lots of money to pretend to be teachers, the job itself will never offer the upside available in the private sector, despite the many stories about famous Korean tutors. On the other hand, practicing our craft won’t usually lead to poverty, except perhaps in North Carolina.

Most teachers understand this. It’s the outside world and the occasional short-termers who want teachers to be rewarded for excellence. Most teachers don’t support merit pay and vehemently oppose “student growth measures”.

The country appears to be moving towards a teacher shortage. I anticipate all talk of VAM to vanish. But if you want to improve teacher quality beyond its current much-better-than-it’s-credited condition, I suggest we consider limiting the scope of public education. Four of these five education policy proposals will do just that.

1 I was writing this up in the comments section of Scott Alexander’s commentary on teacher VAM research, when I remembered I was behind on my post quota. What the heck. I’m turning this into a post. It’s a long answer, but not as long-winded as Scott Alexander, the one blogger who makes me feel brusque.

Ed Schools and Affirmative Action

Education policy rarely—hell, let’s say never—results in anticipated consequences. But usually, this acknowledgment turns our thoughts to bleak, dark places.

So let’s think of the one time when an education policy’s unanticipated consequences actually had a reasonably positive outcome, and an opportunity for a chuckle. I speak, of course, of the 1998 Higher Education Act, specifically Title II, section 206: “Increasing success in the pass rate for initial State teacher certification or licensure, or increasing the numbers of highly qualified individuals being certified or licensed as teachers through alternative programs.”

The plan: force education schools to report their students’ licensure pass rates.

The pass rates were widely expected to be dismal. According to Sandra Stotsky, the 60% failure rate seen in Massachusetts, which had instituted a similar requirement a few years earlier, had provoked the federal law. The Democrats behind the bipartisan bill expected to see a tiered system result, with ed schools ranked by their licensure test pass rates. Schools with pass rates below 80% would improve or be shot and put out of their misery. It’d be like law school.

The Republican politicians and reformers of all denominations saw this as a means of destabilizing the evil cartel. They were certain that all the ed schools would have low pass rates. It was not a coincidence that the 1998 law required states to provide alternative certification paths to a credential. Alternative certification was actually the secret sauce of the 1998 law which would, its advocates fantasized, enable an organic move from ed schools to alternative certification programs. Parents would learn that ed schools turned out students with abysmally low pass rates on simple tests, so they’d demand that their children’s schools hire from only those schools with high pass rates. Faced with the realization that traditional ed schools turned out simpletons, parents would join reformers in a push for alternative certification.

So you can imagine the anticipation back in November, 2001, when the first Title II report was released online. It got 7000 hits—no doubt all of them from ed school critics, eager to curate a list of dismal passing rates, looking for a high-profile target.

and…what’s this? They all passed?

Well. I laughed, anyway.

Ed schools had been accepting and graduating students they knew wouldn’t pass the licensure test, in the name of affirmative action. Faced with a threat, they sacrificed both their ideology and their commitment to collecting money from underprivileged students wanting a college degree, and made a new rule: No pass, no diploma.

And so, the much-anticipated Title II reports showed that most ed schools had 100% passing rates. All but a very few easily bested the 80% barrier. Far from showing a picture of unprepared, low quality candidates, the Title II reports gave a glowing picture of competence.

The “tiered” results dreamed of by the law’s supporters? Useless. As an example, just one of Kentucky’s 25 ed schools had a low passing rate that first year (55%), while all the others were above the minimum. Because the schools were ranked against one another, schools with 93% passing rates landed in the third tier. Definitely not planned. Several states reported 100% passing rates. California, for example, which doesn’t credential teachers with an undergraduate education degree, simply required all candidates to pass the tests to gain admission.

A simple policy change rendered the law irrelevant. And expensive, alas: states spend lots of money turning out largely useless reports.

(Here’s a more measured account of the law’s intent and why it went off the rails.)

Much gnashing of teeth ensued: much castigation, many claims that the tests were incredibly easy, testing just basic skills, so of course the passing rate was so high. Critics accused ed schools of gaming the requirement, and states of lowering the pass rate. They castigated ed schools for having such low standards, for cheating, for wasting the government’s time. For a taste of the critics’ frustration and near rage, enjoy this 2002 Edtrust diatribe or the NCTQ wishlist.

Critics regrouped. Subsequent retoolings of the law attempted to thwart the ed schools (for example, ed schools now have to report their students’ average score against the state average), and lord knows NCTQ knows how to push for meaningless requirements, but it’s been pretty much game over ever since. While alternative teacher certification programs have grown, ed schools aren’t worried about their market share. It still takes a lot of work and education to become a teacher. (Before you wave TFA at me: they all still go to ed school, Relay or otherwise.)

But the attempt to destabilize or “improve” ed schools was lost, and the proponents knew it. How extremely annoying. No differentiation, no high profile targets, no rationale to get the public pushing for alternative certification programs.

Ed schools were angry right back, of course, but you have to figure they had a whole bunch of smug in there. I mean, seriously, who could get mad at ed schools for requiring their candidates to pass the licensure tests? Wasn’t the point to raise teacher quality? In your face, Snidely. Foiled you again.

That’s the end of the funny part.

The strategy wasn’t free. Ed schools couldn’t commit affirmative action, at least not as most colleges do.

Ironic, really, that the profession notorious for its supposedly lax standards is the only profession that denies itself the opportunity to give underrepresented minorities a chance at a good government job. This reality is utterly obscured by liars or fools (your choice) like Arne Duncan complaining that a 95% pass rate shows a lack of rigor.

Reality: most of the tests are appropriately rigorous, and the pass rate is considerably less than 95%.


When people refer to the “high passing rate” of licensure exams, they’re either deliberately deceitful or extremely ill-informed. The exams leave carnage in their wake when all testers are considered, not just ed school graduates, and a substantial portion of that carnage is black and Hispanic.

We all know that many college students, indeed, many college graduates, lack basic skills. We all know that these individuals are, overwhelmingly but not exclusively, black and Hispanic. Colleges let them in and then graduate them anyway, both out of ideological zeal and a reasonable fear of lawsuits.

But alone among the professions, teaching now requires the majority of its prospective undergraduates to demonstrate a given skill set (set by each state, much to the feds’ chagrin) at some point before they graduate. At the graduate level, they have to pass the test just to get in. Ed schools can’t use a different standard to accept black and Hispanic candidates. They are limited to those blacks and Hispanics who can both pass the tests and want to be teachers. And most ed schools aren’t selective, so those candidates are in, anyway.

I’m oversimplifying. Some ed schools are dedicated to underrepresented minorities: HBCU ed schools, and some smaller colleges that swallow the low pass rate on their Title II report for the tuition. Alternative credential programs, once envisioned as the elite corps of folks too good for traditional ed schools, are more commonly a means to produce black and Hispanic teachers, as they are immune from the Title II reports, and passing the tests is their primary curricular objective.

But traditional ed schools, both public and elite, the ones producing the bulk of all teachers, can’t realistically provide that extensive training for a small number of students, so they “counsel out” those who don’t pass the Praxis by a certain date–or require passage for admission.

But, you say, the tests have cut scores, set far below the average. Well duh. That’s because the states don’t want to shut out blacks and Hispanics. That’s where the affirmative action sneaks in—not by ed schools, but by the states, in setting the cut scores.

I don’t know the specifics of the math involved in setting the cut score. But it seems obvious that the bulk of whites (and Asians determined to infuriate their parents) are easily clearing the cut score, or the mean would be lower. It seems equally obvious that very few blacks and Hispanics are easily clearing the cut score, or the cut score would be higher. I suspect the cut scores for elementary school are letting through more candidates than is optimal, but I can’t find any data on this. The cut score is lower than the average, but not that many people are scoring far below that average, and those who are are disproportionately black and Hispanic, just as the states want.

Not only did most ed schools begin to require a passing score prior to graduation, but states raised the cut scores (still below the average, though) in response to No Child Left Behind. The mean scores jumped dramatically, both as a group and by race:

[Charts (ETS data): SAT verbal and SAT math scores of Praxis test-takers]

The average scores by race, coupled with the average SAT scores for each type of teacher, suggest that the bulk of the black and Hispanic candidates passing the test are elementary school teachers.

Before the 1998 Act, many black and Hispanic ed school graduates who didn’t pass the test got an emergency license, which doesn’t require a test, and were hired by schools on that basis, under the fiction that they were working towards their credential. No Child Left Behind cracked down on emergency credentials and closed this loophole. The ETS report points out that a disproportionately high number of Praxis testers from 2002-2005 were employed teachers who had either an emergency or an otherwise unqualified credential, and these testers were disproportionately black. The Clarence Mumford ring’s clients were often black teachers with emergency credentials, as well as candidates who couldn’t pass the original test.

This may be why there wasn’t a huge fuss about the failure of many black candidates to pass the Praxis in the 90s–they were able to get teaching jobs. Or maybe there was a fuss and google just doesn’t like me.

So most public and elite ed schools can’t commit affirmative action: they can’t accept wholly unqualified candidates in the name of diversity, take their money, push them through classes they don’t really understand, pressure professors into giving passing grades, graduate them, and let them figure out after it’s all over that they can’t pass the licensure test.

In other words, ed schools can’t be law schools.

This all came about because reformers and politicians had this bizarre delusion that the quality of the ed school had something to do with the licensure test pass rates, when in fact the licensure pass rates have everything to do with the quality of the student body.

So the 1998 law and the follow-on restrictions of NCLB, restrictions based on a profound underestimation of the average teacher’s intellect, didn’t even come close to having their desired impact. Meanwhile, the laws inadvertently took away the dream of teaching for many black and Hispanic candidates. The media steadfastly ignores this and wonders gravely where all the black and Hispanic teachers went.

I can’t see the change as a bad thing; while some of the black and Hispanic ed school grads who couldn’t pass the test found jobs with emergency credentials, I doubt they all did.

This way, eventually, the feds and the states will be forced to realize they need to lower cut scores, at least for elementary school teachers, if they want to have more black and Hispanic teachers. This, too, I see as a good thing.

But as I started with a chuckle, so I shall finish: the idea that Teach for America’s “diversity” is in some way comparable, and thus superior, to that of ed schools. That’s really, really funny.

If you’ve been paying attention, you’re wondering how the hell TFA recruited so many blacks capable of passing the license tests. Yeah, me, too. I have some ideas. Another post.