
Why Merit Pay and Value Added Assessment Won’t Work, Part I

The year I taught Algebra I, I did a lot of data collection, some of which I discussed in an earlier post. Since I’ve been away from that school for a while, I thought it’d be a good time to finish the discussion.

I’m not a super stats person. I’m not even a mathematician. To the extent I know math, it’s applied math, with the application being “high school math problems”. This is not meant to be a statistically sound analysis, comparing Treatment A to Treatment B. But it does reveal some interesting big picture information.

This data wasn’t just sitting around. A genuine DBA could probably have whipped up the report in a few hours. I know enough SQL to get what I want, but not enough to get it quickly. I had to run reports for both years, figure out how to get the right fields, link tables, blah blah blah. I’m more comfortable with Excel than SQL, so I dumped both years to Excel files and then linked them by student ID. Unfortunately, the state data did not include the subject name of each test. So I could get 2010 and 2011 math scores, but it took me a while to figure out which test each kid had taken in 2010. That was a big deal, because some of the kids whose transcripts said algebra had, in fact, taken the pre-algebra (general math) test. Not that I’m bitter, or anything.
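Linking the two Excel dumps by student ID is, in effect, an inner join. Below is a minimal Python sketch of the same step. It assumes each year's export has been read into a list of rows with hypothetical fields `student_id`, `score`, and `test_name`. Because the state export lacked the subject name, the sketch also assumes `test_name` was recovered separately first.

```python
from typing import Dict, List


def link_by_student_id(rows_2010: List[dict], rows_2011: List[dict]) -> List[dict]:
    """Inner join of two yearly score exports on student ID.

    Field names (student_id, score, test_name) are hypothetical. The state
    export did not carry the test's subject name, so test_name is assumed to
    have been recovered separately before this step.
    """
    prior_by_id: Dict[str, dict] = {row["student_id"]: row for row in rows_2010}
    linked = []
    for row in rows_2011:
        prior = prior_by_id.get(row["student_id"])
        if prior is None:
            # No 2010 score: the state's growth reports ignore these students too.
            continue
        linked.append({
            "student_id": row["student_id"],
            "test_2010": prior["test_name"],
            "score_2010": prior["score"],
            "score_2011": row["score"],
        })
    return linked


if __name__ == "__main__":
    y2010 = [
        {"student_id": "A1", "score": 280, "test_name": "Algebra I"},
        {"student_id": "B2", "score": 250, "test_name": "General Math"},
    ]
    y2011 = [{"student_id": "A1", "score": 330}, {"student_id": "C3", "score": 300}]
    print(link_by_student_id(y2010, y2011))
```

Students who appear in only one year drop out of the result. That is the same thing that happens to kids without prior scores in the state's growth reports.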

Teachers can’t get this data easily. I haven’t yet figured out how to get the data for my current school, or if it’s even possible. I don’t know what my kids’ incoming scores are, and I still haven’t figured out how my kids did on their graduation tests.

So the data you’re about to see is not something teachers or the general public usually have access to.

At my last school, in the 2010-11 school year, four teachers taught algebra to all but 25 of over 400 students. I had the previous year’s test scores for about 75% of the kids; 90% of those had taken algebra the year before, and the other 10% or so had taken pre-algebra. This is a slightly modified version of my original graph; I put in translations of the scores and percentages.

[Graph: 2010 score categories of each teacher's incoming algebra students]

You should definitely read the original post to see all the issues, but the main takeaway is this: Teacher 4 has a noticeably stronger population than the other three teachers, with over 40% of her class having scored Basic or higher the year before, usually in Algebra. I’m Teacher 3, with by far the lowest average incoming scores.

The graph includes students for whom I had 2010 school year math scores in any subject. Each teacher has between 8 and 12 pre-algebra student scores included in their averages. Some pre-algebra kids are very strong; they just hadn’t been put in algebra as 8th graders due to an oversight. Most are extremely weak. Teachers are assessed on the growth of kids repeating algebra as well as the kids who are taking it for the first time. Again, 80% of the kids in our classes had taken algebra once. 10-20% had taken it twice (our sophomores and juniors).

Remember that at the time of these counts, I had 125 students. Two of the other teachers (T1 and T4) had just under 100, the third (T2) had 85 or so. The kids not in the counts didn’t have 2010 test scores. Our state reports student growth for those with previous years’ scores and ignores the rest. The reports imply, however, that the growth is for all students. Thanks, reports! In my case, three or four of my strongest students were missing 2010 scores, but the bulk of my students without scores were below average.

So how’d we do?

I limited the main comparison to the 230 students who took algebra both years, had scores for both years, and were taught by one of the four teachers.
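The three filters in that sentence are easy to state as code. This short sketch assumes linked rows like those from a student-ID join, with hypothetical fields `test_2010`, `test_2011`, `score_2010`, `score_2011`, and `teacher`. The teacher labels and the test name are placeholders.

```python
from typing import List

# Hypothetical labels for the four teachers compared in this post.
FOUR_TEACHERS = {"T1", "T2", "T3", "T4"}
ALGEBRA = "Algebra I"  # hypothetical test name


def main_comparison_group(linked: List[dict]) -> List[dict]:
    """Students who took the algebra test both years, have both scores,
    and were taught by one of the four teachers."""
    return [
        s for s in linked
        if s.get("test_2010") == ALGEBRA
        and s.get("test_2011") == ALGEBRA
        and s.get("score_2010") is not None
        and s.get("score_2011") is not None
        and s.get("teacher") in FOUR_TEACHERS
    ]
```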

[Chart: algebra score improvement by teacher and incoming category]

Here is the growth for the pre-algebra and algebra intervention students. Pre-algebra is not part of the scores above, but the algebra intervention students are a subgroup of them. These are tiny groups, but illustrative:

[Chart: score improvement for pre-algebra and algebra intervention students]

The individual teacher category gains/slides/pushes are above; here they are in total:
[Chart: total category changes for all four teachers]

(Arrrggh, I just realized I left off the years. Vertical is 2010, horizontal is 2011.)

Of the 230 students who took algebra two years in a row, the point gain/loss categories went like this:

Score change > +50 points: 57
Score change < -20 points: 27
-20 points < score change < +50 points: 146
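The three groups are fixed thresholds on the point change. This minimal sketch applies them to (2010, 2011) score pairs. The scores in the example are made up, and the handling of changes of exactly -20 or +50 is an assumption.

```python
from collections import Counter
from typing import Iterable, Tuple


def change_bucket(score_2010: int, score_2011: int) -> str:
    """Assign a student to one of the three point-change groups above.
    Changes of exactly -20 or +50 count as a push here; the post's
    table doesn't say where those boundary cases went."""
    change = score_2011 - score_2010
    if change > 50:
        return "gain > +50"
    if change < -20:
        return "drop > 20"
    return "push"


def bucket_counts(pairs: Iterable[Tuple[int, int]]) -> Counter:
    """Count students per group from (2010 score, 2011 score) pairs."""
    return Counter(change_bucket(before, after) for before, after in pairs)


if __name__ == "__main__":
    sample = [(300, 360), (300, 270), (300, 310), (300, 350)]  # made-up scores
    print(bucket_counts(sample))
```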

Why the Slice and Dice?

As I wrote in the original post, Teacher 1 and I were positive that Teacher 4 had a much stronger student population than we did, and the data supports that belief. Consequently, I suspected that no matter how I sliced the data, Teacher 4 would have the best numbers. But I wanted a much better idea of how I’d done, given my student population.

Because one unshakeable fact kept niggling at me: our school had a tremendous year in 2010-2011, based largely on our algebra scores. We knew this all throughout the year—benchmark tests, graduation tests—and our end of year tests confirmed it, giving us a huge boost in the metrics that principals and districts cared about. And I’d taught far more algebra students than any other teacher. Yet my numbers based on the district report looked mediocre or worse. I wanted to square that circle.

The district reports the data on the right. We were never given the average score increase. A kid who had a big jump in score was irrelevant if he or she didn’t change categories, while a kid who gained 5 points, moving from the top of one category to the bottom of the next, was a big win. All that mattered were category bumps. From this perspective, my scores look terrible.

I wanted to know about the data on the left. For example, Teacher 1 had far better “gain” category numbers than I did. But we had the same mean improvement overall, 5%, with comparable increases in each category. Broken down further, Teacher 4’s spectacular numbers are accompanied by a huge standard deviation: she improved some kids a lot. The other three teachers might not have had as dramatic a percentage increase, but their kids moved up more consistently. In three cases, the average score declined but was accompanied by a big increase in standard deviation, suggesting that many of the kids in that category improved a bit while a few had huge drops. Teacher 2 and I had much tighter achievement numbers; I may have moved my students less far, but I moved a lot of them a little bit. None of this is to argue for one teacher’s superiority over another.
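Mean and standard deviation per teacher and incoming category are what separate "moved a few kids a lot" from "moved a lot of kids a little". This minimal sketch assumes hypothetical fields `teacher`, `category_2010`, `score_2010`, and `score_2011`, and measures growth as percent change from the 2010 score. It also returns each group's size, so groups below a minimum size can be flagged.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple

GroupKey = Tuple[str, str]  # (teacher, incoming 2010 category)


def growth_summary(students: List[dict]) -> Dict[GroupKey, Tuple[float, float, int]]:
    """Mean and sample standard deviation of percent score change, plus group
    size, for each (teacher, 2010 category) group."""
    groups: Dict[GroupKey, List[float]] = defaultdict(list)
    for s in students:
        pct = 100.0 * (s["score_2011"] - s["score_2010"]) / s["score_2010"]
        groups[(s["teacher"], s["category_2010"])].append(pct)
    summary = {}
    for key, changes in groups.items():
        # stdev needs at least two values; a one-student group has no spread.
        spread = statistics.stdev(changes) if len(changes) > 1 else 0.0
        summary[key] = (statistics.mean(changes), spread, len(changes))
    return summary


def small_groups(summary: Dict[GroupKey, Tuple[float, float, int]], min_n: int = 10) -> List[GroupKey]:
    """Groups below the rule-of-thumb minimum size."""
    return [key for key, (_, _, n) in summary.items() if n < min_n]
```

A large mean with a large spread points to a few big movers. A modest mean with a small spread points to broad, steady movement.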

Of course, once I broke the data down by initial ability, group size became relevant, but I don’t have the overall numbers for each teacher and each category needed to calculate a confidence interval or a good sample size. I like 10 as a minimum. Eleven of the 18 categories hit that mark.

How many kids have scores for both years?

The 2011 scores for our school show that just over 400 students took the algebra test. My fall 2010 graph above shows 307 students with 2010 scores (in any subject) who began the year. Kick in another 25 for the teacher I didn’t include, and we had about 330 kids with 2010 scores. My results show 230 kids with algebra scores for both years, and the missing teacher had 18, making 248. Another 19 kids had pre-algebra scores for the first year, although the state’s reports wouldn’t have cared about that. So 257 of the kids had scores for both years, or about 63% of the students tested.

Notice that I had the biggest fall-off in student count: I went from 125 students to 95 by year-end. I think five of my kids were expelled before the tests, and another four or so left for alternative campuses. I remember that two went back to Mexico; one moved to his grandparents’ in Iowa. Three of my intervention students were so disruptive during the tests that they were ejected, so their test results were not scored (the next year our school had a better method of dealing with disruptive students). Many of the rest finished the year and took the tests, but left the district over the summer (I’m not sure if they are included in the state reports, but I couldn’t get their data).

What about the teachers?

Teacher 1: TFA, early-to-mid 20s, Asian, first-year teacher. Had a first-class honors master’s degree in Economics from one of the top ten universities in Europe. She did her two years, then left teaching and is now doing analytics for a fashion firm in a city where “fashion firm” is a big deal. She was the best TFAer I’ve met, and an excellent new teacher.

Teacher 2: About 60. White. A 20-year teacher who started in English, took time off to be a mom, then came back and got a supplemental math credential. She is only qualified to teach algebra. She is the prototype for the Teacher A I described in my last post, an algebra specialist widely regarded as one of the finest teachers in the district, a regard I find completely warranted.

Teacher 3: Me. 48 at the time, white. Second career, second year teacher, English major originally but a 15-year techie. Went to one of the top-rated ed schools in the country.

Teacher 4: Asian, mid-late 30s. Math degree from a solid local university, teaches both advanced math and algebra. She became the department head the next year. The reason her classes are top-loaded with good students: the parents request her. Very much the favorite of administration and district officials.

And so: a Title I school with a predominantly Hispanic population (my classes were 80% Hispanic), and teachers who run the full gamut of desirability: a second-career techie from a good ed school, an experienced pro who majored in math, an experienced pro without demonstrated higher math ability, and a top-tier recent college grad.

Where was the improvement? Case 1: Educational Policy Objectives

So what is “improvement”? Well, there are a bunch of different answers. There’s “significant” improvement as researchers would define it. I can’t answer that with this data. But then, that’s not really the point. Our entire educational policy is premised on proficiency. So what improvement does it take to reach “proficiency”, or at least to change categories entirely?

Some context: In our state, fifty points is usually enough to move a student from the bottom of one category to the bottom of another. So a student who was at the tip top of Below Basic could increase 51 points and make it to the bottom of Proficient, which would be a bump of two categories. An increase of 50 points is, roughly, a 17% increase. Getting from the bottom of Far Below Basic to Below Basic requires an increase of 70%, but since the kids were all taking Algebra for the second time, the boost needed to get them from FBB to BB was a more reasonable 15-20%. To get from the top of the Far Below Basic category to Proficient—the goal that we are supposed to aim for—would require a 32% improvement. Improving from top of Basic to bottom of Advanced requires a 23% improvement.
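Both the category bumps and the percentages above are simple arithmetic on scale scores. This minimal Python sketch assumes hypothetical cut scores spaced 50 points apart (the real state cut points aren't given here) and a 300-point base. The base was chosen only because 50 points on it works out to the "roughly 17%" above. The sketch also shows why category counting rewards a 5-point gain across a line and ignores a 45-point gain inside one band.

```python
import bisect

# Hypothetical cut scores, spaced 50 points apart as described above.
# The real state cut points are not given here.
CUTS = [200, 250, 300, 350]
LEVELS = ["Far Below Basic", "Below Basic", "Basic", "Proficient", "Advanced"]


def level(score: int) -> str:
    """Performance category for a scale score (a cut score starts its band)."""
    return LEVELS[bisect.bisect_right(CUTS, score)]


def category_change(score_2010: int, score_2011: int) -> int:
    """Categories gained (positive) or lost (negative) between two years."""
    return LEVELS.index(level(score_2011)) - LEVELS.index(level(score_2010))


def pct_increase(start: float, target: float) -> float:
    """Percent increase needed to go from a start score to a target score."""
    return 100.0 * (target - start) / start


if __name__ == "__main__":
    print(category_change(249, 300))  # top of Below Basic to bottom of Proficient: 2
    print(category_change(298, 303))  # a 5-point gain that crosses a line: 1
    print(category_change(301, 346))  # a 45-point gain inside Proficient: 0
    print(round(pct_increase(300, 350), 1))  # 50 points on a 300 base: 16.7
```

The same 50 points is a bigger percentage from a lower base, which is why the bottom of Far Below Basic needs such a large relative jump.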

Given that context, only two teachers, in one category each, moved the needle enough to even approach those kinds of gains, and both categories had only 6-8 students. Looking at categories with at least ten students, none of the teachers had average gains that would achieve our educational policy goals. In fact, from that perspective, the teachers are all doing roughly the same.

I looked up our state reports. Our total population scoring Proficient or Advanced increased 1%.

Then there’s this chart again:

[Chart: total category changes for all four teachers]

Thirty-two students moved from “not proficient” to “proficient/advanced”, and nine moved from “proficient” to “advanced”; I’ll throw them in. That’s 41 of the 230 students, or 18%, who improved to the level that, officially, 100% of students are supposed to reach.

So educational policy-wise, not so good.

Where was the improvement? Case 2: Absolute Improvement

How about at the individual level? The chart helps with that, too:

[Chart: total category changes for all four teachers]

Only 18 students were “double gainers”, moving up two categories instead of one. Twelve of those students belonged to Teacher 4 and four to Teacher 1, while Teacher 2 and I had only one each (although I had two more who missed by under 3 points). Teachers 1, 2, and 3 had one “double slider” each, a student who dropped two categories.

(I interviewed all the teachers about the double gainers; in every case, the gains seemed specific to the student. The teachers all shrugged: who knew why this student improved? It wasn’t some brilliant aha moment unique to that teacher’s methods, nor was it due to the teacher’s inspiring belief and/or enthusiasm. Two of the three echoed my own opinion: the students’ cognitive abilities had just developed over the past year. Or maybe for some reason they’d blown off the test the year before. I taught two of the three “double sliders” (one was mine, and one I taught the following year in geometry), so I had the opportunity to ask them about their scores. Both said “Oh, yeah, I totally blew off the test.”)

So a quarter of the students had gains large enough to move from the middle of one category to the middle of another. The largest improvement was 170 points, with about 10 students gaining more than 100 points. The largest decline was 169 points, with 2 students dropping more than 100 points. Another oddity: only one of those two students was a “double slider”. The other two “double sliders” had declines of less than 100 points. My double slider had a 60-point decline; my largest point decline was 89 points, but that student dropped only one category.

However, the primary takeaway from our data is that 63% of the students forced to take algebra twice were, score-wise if not category-wise, a “push”. They dropped or gained slightly, may have moved from the bottom of one category to the middle of the same, or maybe from the top of one category to the bottom of another.
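The headline percentages in this section all come from the same 230-student counts. Here is a quick check in Python, using only the counts given above:

```python
def shares(counts: dict) -> dict:
    """Each group's share of the total, as a percentage rounded to 0.1."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


if __name__ == "__main__":
    groups = {"gain > +50": 57, "drop > 20": 27, "push": 146}  # sums to 230
    print(shares(groups))  # → {'gain > +50': 24.8, 'drop > 20': 11.7, 'push': 63.5}
    # Students newly reaching proficient or advanced: 32 + 9 out of 230.
    print(round(100 * (32 + 9) / 230, 1))  # → 17.8
```

That is where “a quarter”, “63%”, and “18%” come from.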

One might argue that we wasted a year of their lives.

State reports say our average algebra score from 2010 to 2011 nudged up half a point.

So it’s hard to find evidence that we made much of a difference to student achievement as a whole.

I know this is a long post, so I’ll remind the reader that all of the students in my study have already taken algebra once. Chew on that for a while, will you?

Where was the improvement? Case 3: Achievement Gap

I had found no answer to my conundrum in the numbers above, although I had found some comfort. Broken down by category, it’s clear I’m in the hunt. But the breakdown doesn’t explain how we had such a stupendous year.

But when I thought of comparing our state scores from year to year, I got a hint. The other way that schools can achieve educational policy objectives is by closing the achievement gap.

All of this data comes from the state reports for our school, and since I don’t want to discuss who I am on this blog, I can’t provide links. You’ll have to take my word for it—but then, this entire post is based on data that no one else has, so I guess the whole post involves taking my word for it.

2010-11 change in average algebra score:
Overall: +0.5
Whites: -7.2
Hispanics: +4
Economically disadvantaged Hispanics: -1
ELL: +7

Wow. Whites dropped by seven points, Hispanics overall increased by 4, and non-native speakers (almost entirely Hispanic and economically disadvantaged) increased by 7 points.

So clearly, when our administrator was talking about our great year, she was talking about our cleverness in depressing white scores whilst boosting Hispanics.

Don’t read too much into the decline. For example, I personally booted 12 students, most of them white, out of my algebra classes because they’d scored advanced or proficient in algebra the previous year. Why on earth would they be taking the subject again? No other teacher did this, but I know that these students told their friends that they could get out of repeating Algebra I simply by demanding to be put in geometry. So it’s quite possible that much of the loss is due to fewer white advanced or proficient students taking algebra in the first place.

So who was teaching Hispanics and English Language Learners? While I can’t run reports anymore, I did have my original file of 2010 scores. So this data covers incoming students with 2010 scores, not the final 2011 students. Also, in the file I had, the ED (economically disadvantaged) and ELL overlap was 100%, and I didn’t care about white or black EDs for this count. Disadvantaged non-ELL Asians in algebra were a tiny group (hell, even counting ELLs). So I kept ED out of it.

 

Teacher 1: 30 Hispanic, 21 ELL
Teacher 2: 32 Hispanic, 38 ELL
Teacher 3: 48 Hispanic, 37 ELL
Teacher 4: 39 Hispanic, 12 ELL

Well, now. While Teacher 4 has a hefty number of Hispanics, very few of them are poor or ELLs. Teacher 2 seems to have Asian ELLs in addition to Hispanic ELLs. I have a whole bunch of Hispanics, most of them poor and ELL.
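A count like the one above is a per-teacher tally of two independent flags. This minimal sketch assumes a hypothetical row per incoming student with `teacher`, `hispanic`, and `ell` fields. Because the flags are counted separately, a teacher's ELL count can exceed the Hispanic count, as it does for Teacher 2.

```python
from collections import Counter
from typing import Dict, List


def tally_by_teacher(students: List[dict]) -> Dict[str, Dict[str, int]]:
    """Count Hispanic and ELL students per teacher. A student can be in
    both, either, or neither count. Field names are hypothetical."""
    hispanic = Counter(s["teacher"] for s in students if s.get("hispanic"))
    ell = Counter(s["teacher"] for s in students if s.get("ell"))
    teachers = sorted({s["teacher"] for s in students})
    # Counter returns 0 for a teacher with no flagged students.
    return {t: {"hispanic": hispanic[t], "ell": ell[t]} for t in teachers}
```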

So I had the most mediocre numbers, but we had a great year for Hispanic and ELL scores, and I had the most Hispanic and ELL students. So maybe I was inadvertently responsible for depressing white scores by booting all those kids to geometry, but I had to have something to do with raising scores.

Or did I? Matthew DiCarlo is always warning against confusing year-to-year comparisons of school scores, which are cross-sections of data at a point in time, with comparisons of the same students’ progress between two points in time. In fact, he would probably say that I don’t have a conundrum: it’s quite possible for me to have been a crappy teacher who had minimal impact on student achievement, measured student by student, while the school’s cross-section data, which doesn’t compare students directly, changed dramatically for some other reason.

Fair enough. In that case, we didn’t have a great year, right? It was just random happenstance.

This essay is long enough, so I’ll leave anyone interested to explain why this data shows that merit pay and value-added scores are pointless. I’m not sure when I’ll get back to it, as I’ve got grades to do.


Administrators

I like my current principal more than any of my previous overlords—and I pretty much liked all of them as well. Of course, I never forget they are management, and like all long-term corporate survivors I consider management all-powerful, functionally (not personally) untrustworthy, and utterly irrelevant to my own job performance. They aren’t evil. It’s just baked into the job description. So this opening story isn’t a complaint, just an opening.

We were in a two-hour staff meeting today and the principal wandered by. It struck me that until that moment I hadn’t even seen him for three weeks. I mean literally seen him. I haven’t actually had a conversation with him since the first day of school. In that same period I’ve spoken twice to the AVP who interviewed me, for a minute each time: just “hi, how are you.” I don’t even know the other two AVPs’ names; they haven’t stopped by or introduced themselves. No administrator has even entered my room, much less watched me teach.

And this utter isolation from administrators is the norm for me. I spent two years at my last job, and the principal spent a grand total of 40 minutes in my classroom: 20 for my evaluation and 20 with a district visitor, all during the first year, although she didn’t actually give me the results of my eval until a 5-minute meeting on the last day of school. She never set foot in my classroom the second year when students were present. Two AVPs spent, collectively, an hour in my room over the first year (about 30 minutes each, spread out over the year), and the AVP who did my eval the second year never spent a moment in my classroom and barely spoke to me until the first observation.

My first year as a teacher, I taught at an ultra-progressive school; the principal gave me two hour-long evals and a nice follow-up meeting for each. Except for those two evals, however, the administrators were never in my room and I did little more than nod hi to them periodically. It was a smaller school than the other two, so we ran into each other more frequently.

Is it like that for all new teachers? No. If a teacher’s classroom is out of control, the administrators will live there. If the teacher has highly sought after attributes (i.e., young and male) the administrators will do everything short of buying him hookers to win him over, and part of that winning over involves visiting his classroom, giving him lots of praise, extra earning opportunities, and seeking his input on everything short of buying new whiteboard erasers. No, I am not bitter, truly. That’s just how it rolls.

But if a newly hired teacher isn’t spectacularly bad or a hot commodity, he or she is ignored. This gives the administrator complete flexibility without the embarrassment of having to walk back any untoward comments, like praise or condemnation. The first evaluation can be noncommittal, leaving plenty of room to give a second bad one if the district needs to give a few extra teachers the boot, or if a new hot commodity has graduated and someone needs to be cut. (While I am not certain, tenured teachers seem to see administrators more often; maybe they have less to worry about and actively seek them out.)

As I’ve mentioned before, I’m not the hot commodity type, even though I’m a damn good teacher for just three years in. I’m not mad about this, more mildly chagrined and amused. I charge enough money per hour in my private tutoring sessions that my ego’s not at stake, and I’ve long since realized that teacher assessment is largely ideological.

So when eduformers talk about the importance of allowing administrators complete control over the hiring and firing of teachers, I’m like um, what? Are you insane? Principals are managers. They went into management because they find it appealing. That’s fine. It does not make them expert judges of teaching ability. In fact, it probably means they were entirely adequate but not stupendous teachers, because no matter how much you need the money, you don’t leave teaching if you’re stupendous. It’s a drug. And principals simply aren’t spending much time in classrooms; if they do, the other aspects of their job will suffer. PR outranks HR every time. How complicated is that?

Principals have considerable hiring autonomy; unless the district reallocates personnel, they interview and pick their own candidates. In my state they get fifteen months in which they can boot a teacher on a whim. A teacher can get sterling evaluations, be declared teacher of the year, and fired unceremoniously any time in the first two years—in some districts, it can take even longer to get tenure.

That strikes me as adequate time to give principals complete control over staff. After that, giving principals any control at all is spooky, in my view, but I guess most of the time limited firing ability works out because firing long-time teachers on a whim gets the rest of the staff pissed off. But giving them unlimited termination powers? Seriously? Why would we give government employees the autonomy of a small business owner?

If eduformers are absurd in their expectations for principals, progressives (and teachers themselves) aren’t any more realistic in theirs. When I hear them going on and on about the importance of good leadership, I just yawn. A principal is, and must be, focused on selling an image: to teachers, to parents, to the district, to the community. The extent to which he or she keeps the trains running on time depends entirely on which trains are carrying the most important passengers at that point in time. That’s their job.

Needless to say, I’ve stopped taking the evaluation process itself seriously. I’m interested in good feedback and suggestions—no, really! But the evaluation isn’t even remotely about me. The principal is interested in contract compliance (all teachers on the evaluation list undergo observation by October 20th. Check.) This evaluation process has nothing at all to do with whether or not the principal decides to keep me, either. It’s just cover.

And I’m fine with that. I just wish I didn’t have to go through the pretense every year that, in this observation, the administrator could suddenly discover that a teacher who has been utterly ignored for two to three months is in fact a wholly unsatisfactory teacher, one who is utterly failing to meet objectives. Really? Three months of nothing, followed by 30-40 minutes of observation, and suddenly the teacher is unsatisfactory? What sort of manager are you, Sir or Madam Administrator, that you hadn’t figured that out before?

But in fact, a bad early eval that comes out of the blue is just a sign that the principal has someone else lined up for your job next year. I’d rather they do away with the extra effort and just give the principal a form that says “Like/Don’t Like (circle one)”. But oh, well. Sorry, Sonny. Make sure the mortician fixes you up nice.

This is a good time to reiterate that at this point in time, given our current determination to delude ourselves about student ability, the existing teacher evaluation and tenure system is the best possible option. Mess with it at your peril. I’m personally certain the adjustments eduformers fantasize about will hurt low ability, low income kids. But that’s a different post.