
“Good Teaching” and the Failure of Education Reform


“Student achievement is soundly measured; teacher effectiveness is not. The system is spending time and effort rating teachers using criteria that do not have a basis in research showing how teaching practices improve student learning.”–Mark Dynarski, Brookings Institution

Goodbye Mr. Chips. Up the Down Staircase. My Posse Don’t Do Homework. To Sir With Love. Dead Poets Society. Mr. Holland’s Opus. The 4th season of The Wire.

The “great teacher” movie has become a bit of a cliche. But decades of movies and TV shows work on our emotions for good reason. That reason is not “Wow, this teacher’s practice is soundly based in practice that research shows improves student learning!”

“You cannot ignore facts. That is why any state that makes it unlawful to link student progress to teacher evaluations will have to change its ways.”–President Barack Obama, announcing Race to the Top


Reform movies usually fail. Won’t Back Down, a piece of blatant choice advocacy, bombed at the box office. Waiting for Superman was a big hit in elite circles but for a film designed as propaganda, it notably failed to move people to action, or even win considerable praise from the unconverted.

In general, performance-obsessed folks are the villains in mainstream movies and TV.

In Pump Up The Volume, the villain was a principal who found reason to expel teens whose lack of motivation and personal problems would affect her school’s test scores. This was before charters, when such practices became encouraged.

In Searching for Bobby Fischer (the movie, as opposed to the book), the parents reject the competition-obsessed teacher who wanted the boy to spend all his waking hours on chess, giving equal time to a homeless street guy who advocates a more open, aggressive, impulsive approach to chess. The parents preferred a son with a happy, rounded life to a neurotic who wouldn’t know a normal life. (Their son is, today, a happy well-rounded brilliant man who never became Bobby Fischer. In every sense of that meaning.)

In the famous season 4 of The Wire, AVP Donnelly tries hard to “juke the stats” by gaming the test, “spoonfeeding” the “Leave No Child Behind stuff”. Prez rejects this approach: “I came here to teach, right?”

I can think of only one movie in which a teacher was judged by his test scores and declared a hero: Jaime Escalante in Stand and Deliver.

But most people throwing about Escalante’s name and achievements don’t really understand that it took fourteen years of sustained effort, handpicked teachers, legally impossible demands of his students, and a supportive principal to get 73 kids to pass the AB Calculus exam, with another 12 passing the BC, with around 140-200 in his program, out of a student population of 3500. Once Escalante lost his supportive principal, he was voted out as department chair because he was an arrogant jerk to other teachers, and handled defeat by leaving the school.

Escalante’s story, channeled through Jay Mathews, thrilled policy wonks and politicians, and the public was impressed by the desire and determination of underprivileged kids to do what it takes to get an opportunity they otherwise wouldn’t have. But those same wonks and politicians wouldn’t have tolerated Escalante’s tracking, and 2% would have been an unacceptably low participation rate. He rejected a lot of kids. Mine is a contrarian view, but I’ve never thought Escalante cared about kids who couldn’t or wouldn’t do the work he demanded.

“Teachers should be evaluated based on their ability to fulfill their core responsibility as professionals–delivering instruction that helps students learn and succeed.”–The Widget Effect (publication of the National Council for Teacher Quality)

In the book We Need To Talk About Kevin, the teacher Dana Rocco makes two brief appearances. The first is in a parent-teacher conference with Kevin’s mother:


We don’t know how Dana Rocco’s students performed on tests, or even how she taught. But purely on the strength of this passage, we know she is passionate about her subject and her students, whom she works to reach in ways straightforward and otherwise. And in the second passage, we learn that she kept trying to reach Kevin right up to the moment he split her head open with a bolt from a crossbow while she was trying to carry another of his victims away from danger.

In Oklahoma, a tornado blew down a school, and they pulled a car off a teacher who had three kids underneath her. Teachers were pulling rubble away from classrooms before the rescue workers even got there. Were they delivering on their core responsibility as professionals?

The Sandy Hook teachers died taking bullets for their students.

Were they fulfilling their core responsibilities as professionals? Would NCTQ celebrate the teachers who abandoned their students to the deranged young gunman, who left their students to be buried in rubble? Could they argue that their efforts were better spent raising test scores for another ten years than giving their lives to save twenty students?

“Most notably, [the Every Student Succeeds Act] does not require states to set up teacher-evaluation systems based in significant part on students’ test scores—a key requirement of the U.S. Department of Education’s state-waiver system in connection with ESSA’s predecessor, the No Child Left Behind Act.”–Stephen Sawchuk, “ESSA Loosens Reins on Teacher Evaluations”

ESSA is widely acknowledged to have ended the era of education reform, which started in the 90s and hit its peak in the Bush-Obama years. Eulogies abound, many including prescriptions for the future by the same people who pushed the past policies that failed so completely, so spectacularly. In future years, the Bush-Obama choice/accountability reforms will ever more be accompanied by the words “roundly repudiated”. The world we live in going forward is as much a rejection of Michael Petrilli, John King, and Michelle Rhee as the “Nation At Risk” era was of the wasteful excesses of the 70s. The only real question left is why they still have billionaires paying their salaries.

They failed for many reasons. But chief among their failures was their conviction that public education is measured by student outcomes. This conviction is easily communicated, and allowed reformers to move politicians and policy in directions completely at odds with the public will. Reformers never captured the hearts and minds of the public. They failed to understand that student academic outcomes aren’t what the public thinks of when they think of good teaching.

The repudiation of education reform policies and preferences in favor of emotion-based, subjective expectations is one of the most comforting developments of the past twenty years. Go USA.


Algebra Student Distribution–An Example

I thought I’d provide some data that I never see in the discussions about student achievement or teacher evaluations that seems to me to be highly relevant.

Last year, I taught algebra I. I taught more algebra I than any other teacher: 4 sections. Three teachers had three sections; one teacher had one (and won’t be part of this data collection). One of my classes was intervention, a double period, and one of the other teachers who had three sections had that same double.

It became apparent to me and another teacher, listening to another colleague, that this third teacher had dramatically different students than we did. Or at least we hoped she did, because she mentioned casually once that she’d had an extra day and so introduced point-slope to the kids, and they got it fine. The other teacher and I whispered about that conversation later—would your kids learn point slope in a day? No, we agreed, our kids weren’t even clear about slope intercept, and wouldn’t be for a while. So we decided, tentatively, that the other teacher had much more advanced students than we did. Of course, we were both worried that she was just that much better a teacher than we were, which meant we were utterly disastrous. Because if she had the same population of kids we did, and was able to teach them point slope in a day, then we were very bad teachers indeed.

We weren’t resentful, we weren’t mad at the teacher at all—she’s an excellent teacher who always has time to answer questions. We were just confused—was there that much difference in our populations, or were we that horrible that we couldn’t get our kids to learn?

So it was a matter of considerable import to me to determine whether my kids were really that much weaker, or if I was just that rotten a teacher. We had some common assessments that year that reassured me: I was slightly below the other teachers, on average, but not unusually so considering I was teaching intervention kids, and when I took those kids out of the results, I was roughly the same. And we were constantly hearing that our algebra scores were way above previous years, so given I taught about a fourth of all the students, I couldn’t be doing that badly.

Still, I wondered. And so, when the state test results came out—where we’d improved on last year’s pass rate by over 10 percentage points—I delved into the assessments database and pulled out all the data for last year and looked at the allocation by test scores.

A couple points to remember as you look at these numbers:

  • They do not include all students. About 100 of the nearly 500 algebra students don’t have test scores from the previous year. Some are excellent, some are weak. Each teacher in the graph below is missing approximately 20-25 students.
  • Over 75% of our algebra students took algebra the year before; that is, they were repeaters. Until this year, freshman students were put in an algebra class if they hadn’t passed or taken the district test. It didn’t matter if they’d scored advanced or proficient on the state test. They could opt out, but they had to sign a document. Consequently, we had some very strong students repeating algebra. This year, the school discontinued that policy.
  • Some small percentage of our students took pre-algebra. With some exceptions, these students were extremely weak.

Because of the pre-algebra issue, I broke down the scores twice. First up, the population of students who have test scores from last year, broken down by state score. This chart makes no distinction between students who just finished pre-algebra and those who are repeating algebra for the first, second, or third time—because the state reports don’t care, either.

I also put the average incoming scores of the students, both with and without pre-algebra students (that is, students who took pre-algebra, not algebra, the previous year).
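For anyone who wants to build the same kind of breakdown from their own assessment database, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the student records are invented (they are not my school's data), the cut scores for the five bands are placeholders that vary by state and test, and the names `STUDENTS`, `BANDS`, `band`, and `summarize` are hypothetical. It counts students per band for each teacher and reports the average incoming score with and without the pre-algebra students.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical records: (teacher, prior-year scaled score, prior-year course).
# Invented for illustration only; NOT the data behind the post's graphs.
STUDENTS = [
    ("T1", 262, "algebra"),
    ("T1", 301, "algebra"),
    ("T1", 248, "pre-algebra"),
    ("T3", 255, "pre-algebra"),
    ("T3", 290, "algebra"),
    ("T3", 270, "algebra"),
    ("T4", 330, "algebra"),
    ("T4", 352, "algebra"),
    ("T4", 310, "pre-algebra"),
]

# Assumed lower cut score for each band, in ascending order.
# Real cut scores depend on the state, the test, and the year.
BANDS = [(0, "FBB"), (253, "BB"), (300, "Basic"), (350, "Proficient"), (428, "Advanced")]


def band(score):
    """Return the label of the highest band whose lower cut is <= score."""
    label = BANDS[0][1]
    for cut, name in BANDS:
        if score >= cut:
            label = name
    return label


def summarize(students, include_pre_algebra=True):
    """Per teacher: count of students in each band and mean incoming score."""
    counts = defaultdict(Counter)
    scores = defaultdict(list)
    for teacher, score, course in students:
        if not include_pre_algebra and course == "pre-algebra":
            continue  # drop students who took pre-algebra the previous year
        counts[teacher][band(score)] += 1
        scores[teacher].append(score)
    return {
        t: {"bands": dict(counts[t]), "mean": round(mean(scores[t]), 1)}
        for t in scores
    }


if __name__ == "__main__":
    for label, flag in (("all students", True), ("without pre-algebra", False)):
        print(label)
        for teacher, stats in sorted(summarize(STUDENTS, flag).items()):
            print(" ", teacher, stats["bands"], "mean:", stats["mean"])
```

Students with no prior-year score simply never appear in `STUDENTS`, which mirrors the caveat above: the summary describes only the students who have a score on file.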

Brief comments:

  • If you paid attention above, you know that I’m Teacher 3, since that’s the teacher with the most students. Teacher 1 is who I nervously conferred with about the teacher who seemed to have entirely different students—who is Teacher 4. Teacher 2 is the teacher who, like me, has an intervention class.
  • While Teacher 1 and I have an equally high total of FBB and BB students, my students’ average score is considerably lower than that of all the other teachers’ students. T2, who had the other intervention class, has a low average despite having fewer total students in the FBB and BB category.
  • Teacher 1 and I were, apparently, correct. Teacher 4 had very different students than we did. Her students’ average score was a full 23 points higher than mine and 12-14 points higher than Teachers 1 and 2.
  • One of the reasons I had so many weak students is that I told any student with an Advanced or Proficient state test score that, in my opinion, they should go on to Geometry. I followed the 14 kids that took this advice, and they all did well. The classes were then “rebalanced” after all the adjustments happened. I picked up a number of wonderful kids, but only two of them were particularly good at Algebra (and neither of them had test scores from last year).
  • These were the student populations we had at the beginning of the year. This graph is after the rebalancing, so almost no students switched teachers after this point. The only changes after this point were arrivals and departures.

Now, here’s the same picture except I pulled out the students who took pre-algebra. This will show graphically what percentage of each teacher’s population had taken pre-algebra last year.

Brief comments:

  • I had the most pre-algebra students, but the percentage differences aren’t terribly large.
  • Notice that the bulk of my pre-algebra students are in the BB category (see how the block shrinks from the first to second graph). I also had three pre-algebra students with Basic scores, two of whom left and the third slept through class every day. T1 picked up a good chunk of FBB students in pre-algebra (although she had the only “Proficient” pre-algebra student with a test score).
  • This again emphasizes that T1 and I were not wrong to wonder, because we not only had the weakest students, but a good chunk of our weakest students had been weak at pre-algebra, and hadn’t even had algebra before.

For those familiar with low ability algebra student populations, you might be wondering why I didn’t break it down by age. Answer: too much work, and it didn’t tell me much that was meaningful. About 80 of the kids are sophomores, juniors or seniors. These kids don’t fall easily into one category, but three. Some of them have good abilities and just goofed off too much. In some cases they passed algebra in eighth grade but missed taking the district test (see above) and then goofed around, didn’t do homework, and flunked their ninth grade year (Don’t get me started.) These kids were usually pretty good: they knew they’d screwed up and were determined to move on. Then there were the kids who passively sat and did nothing because they didn’t understand what was going on, and either didn’t understand or didn’t care that just doing some work would get them moved on. These kids have very low skills, but if you can reach them it’s satisfying. And then, of course, there are the kids who are simply waiting out the time until they go to alternative high school, and are behavior nightmares. Teachers 1, 3, and 4 all had around 26 non-freshmen. Teacher 2 had about 12.

I also have the results from our state tests, comparing before and after for the students that have two years of test scores. In my opinion, all the teachers did well, but I won’t post those numbers without the teachers’ consent. I will say that, based largely on our algebra performance, our school had an excellent year.

But these graphs, in and of themselves, say quite a bit and attack many assumptions about what should and should not be reasonable expectations for student outcomes. For example, we are expected to move a large percentage of our students to Advanced or Proficient, something that I will now assert is only possible if a good chunk of your students enter with Proficient or Advanced abilities in the first place. I’d also add that the graphs seem to have something relevant to say in terms of student equity because seriously, what strong student would want to be in a class where 65% of the students struggle with basic math facts? Oh, and yes, they just might have some relevance to the value-added conversation, since how is it fair to be compared on needle movement when some of your students have taken algebra three times and still failed, or just finished pre-algebra without basic abilities?

My test numbers as published in the reports looked… eh. Not disastrous, but nothing that says “Wow, this teacher sure knows how to raise test scores!” But I feel fine about my numbers. First, there’s the certainty that comes from the logic of “I taught the most algebra students, we had a great year in algebra, therefore whatever else is true, I couldn’t have done that badly.” But also, because I took them and broke them down by incoming category and compared them to the other teachers on the same basis, I have confirmable data that my students were far weaker, and I know that my students improved reasonably compared to the other teachers. To the extent they didn’t improve as much, I have to wonder how much is my teaching, and how much of it is the fact that I had to actively help far more students up the hill?
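The "same basis" comparison can be sketched roughly as follows. This is an illustration, not the calculation I actually ran: the rows are invented, the name `mean_gain_by_band` is hypothetical, and it assumes that prior-year and current-year scores sit on a comparable scale, which may not hold when a student took a different test the year before (pre-algebra students, for example).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (teacher, incoming band, prior-year score, current score).
# Invented for illustration only; these are not real results.
ROWS = [
    ("T3", "BB", 270, 295),
    ("T3", "BB", 260, 275),
    ("T3", "Basic", 320, 340),
    ("T4", "BB", 280, 310),
    ("T4", "Basic", 330, 345),
    ("T4", "Basic", 310, 335),
]


def mean_gain_by_band(rows):
    """Average score change for each (incoming band, teacher) pair.

    Grouping by incoming band first means teachers are compared only on
    students who started in the same place.
    """
    gains = defaultdict(list)
    for teacher, incoming_band, before, after in rows:
        gains[(incoming_band, teacher)].append(after - before)
    return {key: mean(values) for key, values in gains.items()}


if __name__ == "__main__":
    for (incoming_band, teacher), gain in sorted(mean_gain_by_band(ROWS).items()):
        print(f"{incoming_band:<6} {teacher}: mean change {gain}")
```

With only a handful of students in a cell, a single outlier swings the average, so a table like this is a prompt for questions rather than a verdict on any teacher.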

I would like to see more schools give teachers data like this but alas, it would require caring about incoming student ability. Administrators and reformers get offended if you bring the subject up—are you suggesting all students can’t achieve? Progressives hate tests and hold they’re irrelevant, so they aren’t going to argue for comparative ability loads.

I’m pro-testing, and have no problems with being evaluated by test scores. In fact, I’d probably be better off. But I’m going to be compared against another algebra teacher who had 65% below basic or lower students, right? Along with 30% students ready to learn algebra, so I had to constantly balance competing needs? I’m going to be compared against that teacher, right? Right?

We can’t have a meaningful conversation about student achievement, much less grading teachers on student achievement, until we know whether or not ability distributions like those in the graphs above have an impact on outcomes. And right now, we not only don’t know, but most people don’t care.