
Evaluating the New PSAT: Math

Well, after the high drama of writing, the math section is pretty tame. Except for the whole "oh, my god, are they serious?" part. Caveat: I'm assuming that the SAT is still a harder version of the PSAT, and that this is a representative test.

Questions
  Old SAT:  54 (44 MC, 10 grid-in)
  Old PSAT: 38 (28 MC, 10 grid-in)
  ACT:      60 (all MC)
  New PSAT: 48 (40 MC, 8 grid-in)

Sections
  Old SAT:  1: 20 q, 25 m | 2: 18 q, 25 m | 3: 16 q, 20 m
  Old PSAT: 1: 20 q, 25 m | 2: 18 q, 25 m
  ACT:      1: 60 q, 60 m
  New PSAT: No Calc: 17 q, 25 m | Calc: 31 q, 45 m

Minutes per question (MPQ)
  Old SAT:  1: 1.25 | 2: 1.38 | 3: 1.25
  Old PSAT: 1: 1.25 | 2: 1.38
  ACT:      1.0
  New PSAT: No Calc: 1.47 | Calc: 1.45

Categories
  Old SAT:  Number & Operations; Algebra & Functions; Geometry & Measurement; Data & Statistics
  Old PSAT: Same as old SAT
  ACT:      Pre-algebra; Algebra (elementary & intermediate); Geometry (coordinate & plane); Trigonometry
  New PSAT: 1) Heart of Algebra; 2) Passport to Advanced Math; 3) Problem Solving & Data Analysis; 4) Additional Topics in Math

It’s going to take me a while to fully process the math section. For my first go-round, I thought I’d point out the instant takeaways, and then discuss the math questions that are going to make any SAT expert sit up and take notice.

Format
The old SAT and PSAT always allotted an average of 1.25 minutes per multiple choice question. On the 18-question section with 10 grid-ins, giving 1.25 minutes to each of the 8 multiple choice questions leaves 1.5 minutes for each grid-in.

That same conversion doesn’t work on the new PSAT. However, both sections have exactly 4 grid-ins, which makes a nifty linear system. Here you go, boys and girls, check my work.

The math section that doesn’t allow a calculator has 13 multiple choice questions and 4 grid-ins, and a time limit of 25 minutes. The calculator math section has 27 multiple choice questions and 4 grid-ins, and a time limit of 45 minutes.

13x + 4y = 1500
27x + 4y = 2700

Flip them around and subtract:

14x = 1200
x = 85.714 seconds, or 1.42857 minutes. Let's round that up to 1.43.
y = 96.428 seconds, or 1.607 minutes, which I shall round down to 1.6 minutes.
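If you'd rather let a machine check the arithmetic, here's a minimal sketch in Python (1500 and 2700 are the two section time limits in seconds):

```python
# Check the per-question timing for the new PSAT math sections.
# x = seconds per multiple choice question, y = seconds per grid-in.
#   13x + 4y = 1500   (no-calculator section: 25 minutes)
#   27x + 4y = 2700   (calculator section: 45 minutes)

x = (2700 - 1500) / (27 - 13)   # subtracting the equations gives 14x = 1200
y = (1500 - 13 * x) / 4         # back-substitute into the first equation

print(f"MC: {x:.1f} sec = {x / 60:.4f} min")       # MC: 85.7 sec = 1.4286 min
print(f"grid-in: {y:.1f} sec = {y / 60:.3f} min")  # grid-in: 96.4 sec = 1.607 min
```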

If–and this is a big if–the test is using a fixed average time for multiple choice and another for grid-ins, then each multiple choice question is getting a 14.3% boost in time, and each grid-in a 7% boost. But the test may be using an entirely different parameter.

Question Organization

In the old SAT and the ACT, the questions move from easier to more difficult. The SAT and PSAT difficulty level resets for the grid-in questions. The new PSAT does not organize the problems by difficulty. Easy problems (there are only 4) are more likely to be at the beginning, but they are interlaced with Medium difficulty problems. I saw only two Hard problems in the non-calculator section, both near but not at the end. The Hard problems in the calculator section are tossed throughout the second half, the first one showing up at question 15. However, the difficulty coding is inexplicable, as I'll discuss later.

As nearly everyone has mentioned, any evaluation of the questions in the new test doesn’t lead to an easy distinction between “no calc” and “calc”. I didn’t use a calculator more than two or three times at any point in the test. However, the College Board may have knowledge about what questions kids can game with a good calculator. I know that the SAT Math 2c test is a fifteen minute endeavor if you get a series of TI-84 programs. (Note: Not a 15 minute endeavor to get the programs, but a 15 minute endeavor to take the test. And get an 800. Which is my theory as to why the results are so skewed towards 800.) So there may be a good organizing principle behind this breakdown.

That said, I'm doubtful. The only trig question on the test is categorized as Hard. Yet the question is simplicity itself if the student knows any right triangle trigonometry, which is taught in geometry. For students who don't know any trigonometry, though, will a calculator help? If the answer is "no", then why is it in this section? Worse, what if the answer is "yes"? Do not underestimate the ability of people who turned the Math 2c into a 15 minute plug and play to come up with programs to automate checks for this sort of thing.

Categories

Geometry has disappeared. Not just from the categories, either. The geometry formula box has been expanded considerably.

There are only three plane geometry questions on the test. One is actually an algebra question using a rectangle's perimeter formula. Another is a variation question using a trapezoid's area. Interestingly, neither the perimeter formula nor the trapezoid area formula was provided. (To reinforce an earlier point, both of these questions were in the calculator section. I don't know why; they're both pure algebra.)

The last geometry question really involves ratios; I simply picked the multiple choice answer that had 7 as a factor.

I could only find one coordinate geometry question, barely. Most of the other xy plane questions were analytic geometry, rather than the basic skills that you usually see regarding midpoint and distance–both of which were completely absent. Nothing on the Pythagorean Theorem, either. Freaky deaky weird.

When I wrote about the Common Core math standards, I mentioned that most of geometry had been pushed down into seventh and eighth grade. In theory, anyway. Apparently the College Board thinks that testing geometry will be too basic for a test on college-level math? Don’t know.

Don't you love the categories? You can see which ones the makers cared about. Heart of Algebra. Passport to Advanced Math! Meanwhile, geometry and the one trig question are stuck under "Additional Topics in Math". As opposed to the "Additional Topics in History", I guess.

Degree of Difficulty

I worked the new PSAT test while sitting at a Starbucks. Missed three on the no-calculator section, but two of them were careless errors due to clatter and haste. In one case I flipped a negative in a problem I didn't even bother to write down; in the other, I missed a unit conversion (have I mentioned before how measurement issues are the obsessions of petty little minds?).

The one I actually missed was a function notation problem. I'm not fully versed in function algebra, and I hadn't really thought this one through. I think I've seen it before on the SAT Math 2c test, which I haven't looked at in years. Takeaway: if I'm weak on that, so are a lot of kids. I didn't miss any on the calculator section, and I rarely used a calculator.

But oh, my lord, the problems. They aren't just difficult. The original, pre-2005 SAT had a lot of tough questions. But those questions relied on logic and intelligence—that is, they sought out aptitude. So a classic "diamond in the rough" who hadn't had access to advanced math could still score quite well. Meanwhile, on both the pre- and post-2005 tests, kids who weren't terribly advanced in either ability or transcript faced a test with plenty of familiar material, with or without coaching, because the bulk of the test is arithmetic, algebra I, and geometry.

The new PSAT, and presumably the SAT, is impossible to do unless the student has taken and understood two years of algebra. Some will push back and say oh, don't be silly, all the linear systems work is covered in algebra I. Yeah, but kids don't really get it then. Not even many of the top students. Even a strong student needs two years of algebra to work these problems with the speed and confidence required to finish in the time allotted.

And this is the PSAT, a test that students take at the beginning of their junior year (or sophomore year, in many schools). The College Board has created a test covering material that most students won't have seen by the time they're expected to take it. As I mentioned earlier, California alone has nearly a quarter of a million sophomores and juniors in algebra and geometry. Will the new PSAT or the SAT be able to accurately assess their actual math knowledge?

Key point: the ability of the SAT and the ACT to reflect a full range of abilities is an unacknowledged attribute of these tests. Many colleges use these tests as placement proxies, including many, if not most or all, of the public university systems.

The difficulty level I see in this new PSAT makes me wonder what the hell the organization is up to. How can the test reveal anything meaningful about kids who a) haven't yet taken algebra 2 or b) have taken algebra 2 but didn't really understand it? And if David Coleman's answer is "those testers aren't ready for college, so they shouldn't be taking the test", then I have deep doubts that David Coleman understands the market for college admissions tests.

Of course, it’s also possible that the SAT will yield the same range of scores and abilities despite being considerably harder. I don’t do psychometrics.

Examples:

[image: newpsatmath10]

Here’s the function question I missed. I think I get it now. I don’t generally cover this degree of complexity in Precalc, much less algebra 2. I suspect this type of question will be the sort covered in new SAT test prep courses.

[image: mathnocalcquads]

These two are fairly complicated quadratic questions. The question on the left reveals that the SAT is moving into new territory; previously, the SAT never expected testers to factor a quadratic unless a=1. Notice too how it uses the term "divisible by x" rather than the more common phrasing, "x is a factor". While all students know that "2 is a factor of 6" is the same as "6 is divisible by 2", it's not a completely intuitive leap to think of variable factors in the same way. That's why we cover the concept–usually in late algebra 2, but much more likely in pre-calc. That's when synthetic division/substitution is covered–as I write in that piece, I'm considered unusual for introducing "division" of this form so early in the math cycle.

The question on the right is a harder version of a classic SAT misdirection. The question doesn't appear to give enough information, until you realize it isn't asking you to identify the equation and solve for a, b, and c–just plug in the point and yield a new relationship between the variables. But these questions always used to show up with linear equations, not quadratics.

That’s the big news: the new PSAT is pushing quadratic fluency in a big way.

Here, the student is expected to find the factors of 1890:

[image: newpsatperimeter]

This is a quadratic system. I don’t usually teach these until Pre-Calc, but then my algebra 2 classes are basically algebra one on steroids. I’m not alone in this.

No doubt there's a way to game this problem with the answer choices that I'm missing, but to solve it in the forward fashion you either have to use the quadratic formula or, as I said, find all the factors of 1890, which is exactly what the answer document suggests. I know of no standardized test that requires knowledge of the quadratic formula. The old school GRE never did; the new one might (I don't coach it anymore). The GMAT doesn't. The ACT never has. It's possible that the CATs push a quadratic formula question to differentiate at the 800 level, but I've never heard of it. I've taught for Kaplan and other test prep companies, and the quadratic formula is not covered in most test prep curricula.
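To make the demand concrete, here's a minimal sketch of a problem of this flavor, solved in the forward fashion with the quadratic formula. The 1890 is from the test; the perimeter is my own invented number, not the actual problem's:

```python
import math

# Hypothetical stand-in for the test's quadratic system: a rectangle with
# area 1890 (the test's number) and perimeter 174 (my own choice).
#   l + w = 87     (half the perimeter)
#   l * w = 1890
# Substituting w = 87 - l gives l**2 - 87*l + 1890 = 0.

s, p = 87, 1890
disc = s * s - 4 * p                 # discriminant of l**2 - s*l + p = 0
l1 = (s + math.sqrt(disc)) / 2
l2 = (s - math.sqrt(disc)) / 2
print(l1, l2)                        # 45.0 42.0 -- and 42 * 45 = 1890
```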

Here's one of the inexplicable difficulty codings I mentioned–this one is coded as Medium.

As big a deal as that is, this one’s even more of a shock: a quadratic and linear system.

[image: newpsatsystemlineparabola]

The answer document suggests putting the quadratic into vertex form, then plugging in the point and solving for a. I solved it with a linear system. Either way, after solving the quadratic you find the equation of the line and set the two equal to solve. I am…stunned. Notice it's not a multiple choice question, so no plug and play.

Then, a negative 16 problem–except it uses meters, not feet. That’s just plain mean.
[image: newpsatmathneg16]

Notice that the problem gives three complicated equations. However, those who know the basic algorithm (h(t) = -4.9t² + v₀t + s₀) can completely ignore the equations and solve a fairly easy problem. Those who don't know the algorithm will have to figure out how to coordinate the equations, which is much more difficult. So this problem represents dramatically different levels of difficulty depending on whether the student has been taught the algorithm. For those who have, the problem is quite straightforward and should be coded as Medium. But no, it's tagged as Hard. As is this extremely simple graph interpretation problem. I'm confused.
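Here's a minimal sketch of what knowing the algorithm buys you; the launch velocity and starting height are my own made-up values, not the test's:

```python
import math

# h(t) = -4.9t^2 + v0*t + s0: the metric version of the "negative 16" model.
# The launch velocity and starting height below are invented for illustration.
v0, s0 = 19.6, 58.8   # m/s and meters

# When does the object hit the ground? Solve -4.9t^2 + v0*t + s0 = 0.
a, b, c = -4.9, v0, s0
t = (-b - math.sqrt(b * b - 4 * a * c)) / (2 * a)   # this root is positive
print(t)   # 6.0 seconds
```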

Recall: if the College Board keeps the traditional practice, the SAT will be more difficult.

So this piece is long enough. I have some thoughts–rather, questions–on what on earth the College Board's intentions are, but that's for another post.

tl;dr Testers will get a little more time to work much harder problems. Geometry has disappeared almost entirely. Quadratics beefed up to the point of requiring a steroids test. Inexplicable “calc/no calc” categorization. College Board didn’t rip off the ACT math section. If the new PSAT is any indication, I do not see how the SAT can be used by the same population for the same purpose unless the CB does very clever things with the grading scale.


Evaluating the New PSAT: Reading and Writing

The College Board has released a new practice PSAT, which gives us a lot of info on the new SAT. This essay focuses on the reading and writing sections.

As I predicted in my essay on the SAT’s competitive advantage, the College Board has released a test that has much in common with the ACT. I did not predict that the homage would go so far as test plagiarism.

This is a pretty technical piece, but not in the psychometric sense. I’m writing this as a long-time coach of the SAT and, more importantly, the ACT, trying to convey the changes as I see them from that viewpoint.

For comparison, I used these two sample ACTs, this practice SAT (old version), and this old PSAT.

Reading

The old SAT had a reading word count of about 2800 words, broken up into eight passages. Four passages were very short, just 100 words each. The longest was 800 words. The PSAT reading count was around 2000 words in six passages. This word count is reading passages only; the SAT has 19 sentence completions to the PSAT’s 13.

So SAT testers had 70 minutes to complete 19 sentence completions and 47 questions over eight passages totaling 2800 words. PSAT testers had 50 minutes to complete 13 sentence completions and 27 questions over six passages totaling 2000 words.

The ACT has always had 4 passages averaging 750 words, giving the tester 35 minutes to complete 40 questions (ten for each passage). No sentence completions.

Comparisons are difficult, but if you figure about 45 seconds per sentence completion, you can deduct that from the total time and come up with two rough metrics comparing reading passages only: minutes per question (MPQ) and words per question (WPQ–on average, how many words the tester reads to answer each question).
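Here's a minimal sketch of that computation in Python, using the counts above (word totals are rounded, so WPQ and the last decimal of MPQ land a hair off the table's figures):

```python
# (name, passage word count, passage questions, total minutes, completions)
# The new PSAT question count is inferred from the table below, not stated.
tests = [
    ("Old SAT",  2800, 47, 70, 19),
    ("Old PSAT", 2000, 27, 50, 13),
    ("ACT",      3000, 40, 35,  0),
    ("New PSAT", 3200, 47, 60,  0),
]

for name, words, questions, minutes, completions in tests:
    reading_minutes = minutes - 0.75 * completions  # ~45 sec per completion
    print(f"{name}: {reading_minutes / questions:.2f} mpq, "
          f"{words / questions:.2f} wpq")
```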

Metric          Old SAT   Old PSAT   ACT     New PSAT
Word Count      2800      2000       3000    3200
Passage Count   8         6          4       5
Passage Length  100-850   100-850    750     500-800
MPQ             1.18      1.49       0.875   1.27
WPQ             59.57     74.07      75      69.21

I've read a lot of assertions that the new SAT reading text is more complex, but my brief Lexile analysis of random passages in the same category (humanities, science) showed the same range of difficulty and sentence lengths for the old SAT, current ACT, and old and new PSAT. Someone with more time and tools than I have should do an in-depth analysis.

Question types are much the same as the old format: inference, function, vocabulary in context, main idea. The new PSAT requires the occasional figure analysis, which the College Board will undoubtedly tout as unprecedented. However, the College Board doesn't have an entire Science section, which is where the ACT assesses a reader's ability to evaluate data and text.

Sentence completions are gone, completely. In passage length and overall reading demands, the new PSAT is remarkably similar to the ACT. Does this suggest that the SAT is going to be even longer? I don't see how, given the time constraints.

tl;dr: The new PSAT reading section looks very similar to the current ACT reading test in structure and reading demands. The paired passage and the question types are the only holdovers from the old SAT/PSAT structure. The only new feature is actually a cobbled-up homage to the ACT science test, in the form of occasional table or graph analysis.

Writing

I am so flummoxed by the overt plagiarism in this section that I seriously wonder if the test I have isn’t a fake, designed to flush out leaks within the College Board. This can’t be serious.

The old PSAT/SAT format consisted of three question types: Sentence Improvements, Identifying Sentence Errors, and Paragraph Improvements. The first two question types presented a single sentence. In the first case, the student would identify a correct (or improved) version or say that the given version was best (option A). In the ISEs, the student had to read the sentence cold, with no alternatives, and indicate which underlined word or phrase, if any, was erroneous (much, much more difficult; option E was "no error"). In Paragraph Improvements, the reader had to answer grammar or rhetoric questions about a given passage. All questions had five options.

The ACT English section is five passages running down the left hand side of the page, with underlined words or phrases. As the tester goes along, he or she stops at each underlined section and looks to the right for a question. Some questions are simple grammar checks. Others ask about logic or writing choices—is the right transition used, is the passage redundant, what would provide the most relevant detail. Each passage has 15 questions, for a total of 75 questions in 45 minutes (9 minutes per passage, or 36 seconds per question). The tester has four choices and the “No Change” option is always A.

The new PSAT/SAT Writing/Language section is four passages running down the left hand side of the page, with underlined words or phrases. As the tester goes along, he or she stops at each underlined section and looks to the right for a question. Some questions are simple grammar checks. Others ask about logic or writing choices—is the right transition used, is the passage redundant, what would provide the most relevant detail. Each passage has 11 questions, for a total of 44 questions in 35 minutes (about 8.75 minutes per passage or 47 seconds a question). The tester has four choices and the “No Change” option is always A.

Oh, did I forget? Sometimes the tester has to analyze a graph.
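The pacing claims in those two parallel paragraphs reduce to simple arithmetic; a quick check:

```python
# (name, questions, passages, minutes), using the figures from above
for name, questions, passages, minutes in [("ACT English", 75, 5, 45),
                                           ("New PSAT Writing", 44, 4, 35)]:
    print(f"{name}: {minutes / passages:.2f} min/passage, "
          f"{60 * minutes / questions:.1f} sec/question")
# ACT English: 9.00 min/passage, 36.0 sec/question
# New PSAT Writing: 8.75 min/passage, 47.7 sec/question
```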

The College Board appears to have simply stolen not only the structure, but various common question types that the ACT has used for years—as long as I’ve been coaching the test, which is coming on for twelve years this May.

I’ll give some samples, but this isn’t a random thing. The entire look and feel of the ACT English test has been copied wholesale—I’ll add “in my opinion” but don’t know how anyone could see this differently.

Writing Objective:

Style and Logic:

Grammar/Punctuation:

tl;dr: The College Board ripped off the ACT English test. I don’t really understand copyright law, much less plagiarism. But if the American College Test company is not considering legal action, I’d love to know why.

The PSAT reading and writing sections don’t ramp up dramatically in difficulty. Timing, yes. But the vocabulary load appears to be similar.

The College Board and the poorly informed reporters will make much of the data analysis questions, but I hope to see any such claims addressed in the context of the ACT’s considerably more challenging data analysis section. The ACT should change the name; the “Science” section only uses science contexts to test data analysis. All the College Board has done is add a few questions and figures. Weak tea compared to the ACT.

As I predicted, the College Board has definitely chosen to make the test more difficult to game. I've been slowly untangling the process by which someone who can barely speak English is able to get a high SAT verbal and writing score, and what little I know suggests that all the current methods will have to be tossed. Moving to longer passages with less time will reward strong readers, not people who are deciphering every word and comparing it to a memory bank. And the sentence completions, which I quite liked, were likely being gamed by non-English speakers.

In writing, leaving the plagiarism issue aside for more knowledgeable folk, the move to passage-based writing tests will reward English speakers with lower ability levels and should hurt anyone with no English skills trying to game the test. That can only be a good thing.

Of course, that brings up my larger business question that I addressed in the competitive advantage piece: given that Asians show a strong preference for the SAT over the ACT, why would Coleman decide to kill the golden goose? But I’ll put big picture considerations aside for now.

Here’s my evaluation of the math section.


What You Probably Don’t Know About the Gaokao

I didn't intend to write about the gaokao, or Brook Larmer's profile of 18-year-old Yang and his family inside a Chinese test prep factory. I just started out googling, as is my wont, to find out more information than the article provides. I certainly did that.

The novice might find Larmer’s article emotionally draining. Anyone with even a rudimentary understanding of Chinese academic culture will notice a huge, gaping hole.

I noticed the hole, which led me to an observation, which led me to a better understanding of how the gaokao works, which is almost exactly the opposite of its presentation in the American press.

The hole: in a story dedicated to students preparing for the National Higher Education Entrance Examination (aka the gaokao), Larmer never once mentions cheating. This would be a problematic oversight in any event, but given the last anecdote, the omission strains credulity.

When Larmer returned to the town for his second visit, the day before the gaokao, Yang’s scores, which had been dropping, had not improved. As a result, Yang had kicked out his mom and brought his grandfather to live with him in Maotanchang for the last few weeks of prep. While Larmer drove into town with Yang’s parents, the grandfather refused to let Larmer accompany the family to the test site. Grandpa was afraid the family might “get in trouble” for talking to a reporter, according to “someone”.

Yang does exceptionally well, given his fears—“his scores far surpassed his recent practice tests”. Sadly, his friend Cao tanks because he “had a panic attack”.

Yang's scores were considerably beyond what his recent performance had predicted. Yet it apparently never once occurred to Larmer that perhaps Yang and Grandpa prudently got the New York Times reporter out of the way before they arranged a fix. Maybe Yang wanted more aid than could be provided with "'brain-rejuvenating' tea", or Gramps didn't want Larmer to see Yang wired up for sound, or maybe he'd really put in some money and paid for a double.

Yang’s performance might have been entirely unaided, of course. But any article about the gaokao should address cheating, even with Gramps banning access.

When I realized that Larmer hadn’t mentioned cheating, I read the piece again, thinking I must have missed it. Nope. But that second readthrough led to an observation.

I got curious—just curious, nothing skeptical at this point—about the school’s gender restriction on teachers. Was that just for cram schools? What was the gender distribution of Chinese teachers?

I couldn't find anything. No confirmation that the teachers were all male, no comprehensive source on cram schools, no readily available data on Maotanchang. I couldn't find anything at all about the school's business practices online. So I went back to Larmer's article to look for a source for that fact—and nothing.

And so, the observation: In his description of the school’s interior and practices, Larmer doesn’t mention interviews with school representatives, other journalism, or a Big Book of Facts on Chinese Cram Schools.

The earliest detailed description of Maotanchang online appears to be this August 2013 article in China Youth Daily, a Beijing paper, which created quite a furor in China and largely ignored here because we can’t read Chinese. Rachel Lu, senior editor at Foreign Policy magazine, restated some key points for those folks who don’t read Chinese, which is nice of her, because what idiot would copy and paste the Chinese piece into Google Translate?

Yeah, well, I’m an idiot. I won’t bore people with the extended version, but a lot of the details that Larmer didn’t seem to personally witness show up in the Chinese story: same school official quoting management theory, teachers using bullhorns, Maotanchang’s 1939 origins, bus license plates ending in 8, burning incense at the town’s sacred tree, teacher dismissals for low scores.

The excitement over the China Youth Daily article generated more interest, like Exam Boot Camp, also written in August 2013, happily in English, which profiled a female student and her mother who provide data points like higher prices for lower scoring students, lack of electrical outlets, and surveillance cameras in the classroom.

Am I accusing Larmer of lifting tidbits from these other stories? Well, I’d like to know where he got the information.

Leave that aside, though, because reading through these stories looking for sources led me to all sorts of "new things" to learn about the gaokao. These "new things" are readily available online; in fact, anyone can find most of the information in the Wikipedia entry. But you will rarely see these not-in-fact-new, well-established facts explicitly laid out by any major media outlet (although now that I know, I can see hints). I don't know why. I can't even begin to see how any reporter wouldn't trumpet these facts to the world, narrative or no.

China’s supposedly meritocratic test is a fraud.

To begin with, Larmer, like just about every other reporter discussing the gaokao, describes it as a grueling test that "is administered every June over two or three days (depending on the province), [and] is the lone criterion for admission to Chinese universities."

Wrong. The test score is, technically, the sole criterion for admission. But in China, the test score and the test performance aren’t the same thing.

Testers get additional points literally added to their scores for a number of attributes. China's 55 ethnic minorities (non-Han) get a boost of up to 30 points, although the specific number varies by province. Athletic and musical certifications appear to be in flux, but still give some students more points, even though the list of certification sports has been culled from 70 to 17. Children whose parents died in the military and Chinese living overseas get extra points, and recently the government announced point boosts for morality.

Remember when the University of Michigan used to give students 20 points if they were black, and 12 points if they had a perfect SAT score? Well, imagine those points were just added into the SAT/ACT score. That’s what the Chinese do.

But even after the extra points are allotted, test scores aren’t relevant until the tester’s residence has been factored in. Larmer: “The university quota system also skews sharply against rural students, who are allocated far fewer admissions spots than their urban peers.”

I first understood this to mean that colleges used the same cut scores for everyone, but just accepted fewer rural students, without grasping the implications: city kids have lower cut scores than rural kids.

Xu Peng was the only Maotanchang student to make the cutoff score for Tsinghua, where the "minimum score for students from Anhui province taking the science exam was 641."

Two years earlier, the cutoff score for Tsinghua for a Beijing student was somewhere under 584.

Rachel Lu again: "the lowest qualifying score for a Beijing-based test-taker may be vastly lower than the score required from a student taking the examination in Henan or Jiangsu [rural provinces]."
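To make the mechanism concrete, here's a minimal sketch of the admission rule as I understand it from these reports. The cutoffs are the figures cited above; the students and their bonus points are invented:

```python
# Reported Tsinghua science cutoffs from the articles above; the Beijing
# figure was "somewhere under 584," so treat it as illustrative only.
cutoffs = {"Anhui": 641, "Beijing": 584}

# Hypothetical testers: (name, raw score, bonus points, province).
# Bonuses (ethnic minority, athletic, "morality") are added straight into
# the score before the province's cutoff is applied.
testers = [
    ("rural Anhui kid", 635, 0, "Anhui"),
    ("Beijing kid", 560, 30, "Beijing"),
]

for name, raw, bonus, province in testers:
    adjusted = raw + bonus
    verdict = "admitted" if adjusted >= cutoffs[province] else "rejected"
    print(f"{name}: {raw} + {bonus} = {adjusted} vs {cutoffs[province]} -> {verdict}")

# rural Anhui kid: 635 + 0 = 635 vs 641 -> rejected
# Beijing kid: 560 + 30 = 590 vs 584 -> admitted
# And since provinces may not even take the same test, the two raw
# scores aren't comparable in the first place.
```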

A joke goes:
[image: gaokaojoke]

Of course, don’t make the mistake, as I did, of thinking the cut scores mean the same thing for each student.

Curious about the nature of the studying/memorization the students do (another vague area in Larmer's piece), I tried to find more information on the gaokao content. The actual gaokao essay questions are usually published each year, and they're…well, insane.

When I finally did find an actual math question:


[image: beijingmathtrans]

it seemed surprisingly easy. And then I realized it was only for the Beijing test:

[image: beijingmatheasy]

Then I went back to the essay questions and it sunk in: the essay questions differed by city.

The gaokao isn’t the same test in every province. Many provinces develop their own custom test and just call it the gaokao.


[image: diffgaokaos]

At which point, I threw up my hands and mentally howled at Larmer, my current proxy for the mainstream American press: you didn’t think this worth mentioning? Or didn’t you know?

If all this is true, then the wealthier province universities use a lower cut score for their residents. But just to be sure, some provinces make an easier test for their residents, so that the rural kids are taking a harder test on which they have to get a higher score. Please, please, please tell me I’m misunderstanding this.

Consider Larmer's story again in light of this new information. Larmer can't say definitively who had the best performance without ascertaining whether Yang or Cao got extra points. Yang and Cao might both have outscored many students who were admitted to top-tier universities. Cao may or may not have "panicked", and may not have even done poorly, in an absolute sense. None of this context is provided.

In my last story about Chinese academic fraud, I pointed out that so much money was involved that few people have any incentive to fix the corruption. All the people bellyaching about the American test prep industry should pause for a moment to think about the size of the gaokao enterprise. The original China Youth Daily story focused on Maotanchang’s economic transformation, something Larmer also mentions. Parents are paying small fortunes for tutoring, for cheating devices, for impersonators, for bribes for certificates. All of these services have their own inventory supply chains and personnel. Turn the gaokao into a meritocratic test and what happens to a small but non-trivial chunk of the Chinese economy?

But I’m just stunned at how much worse the Chinese fraud is than I’d ever imagined.

Sure, well-connected parents could probably bribe their kids into college. Sure, urban kids who had better schools that operated longer with educated teachers would likely learn more than those stuck with “substitutes”. Sure, the content was probably absurd and has little relationship to actual knowledge. Sure, the tests were little more than a memory capacity game, with students memorizing essays as well as facts that had no real meaning to them. Without question the testers were engaging in rampant cheating.

But not once had I considered that the test difficulty varied by province, that some kids got affirmative action or athletic points added directly to their score, and worst of all, that a kid from Outer Nowhere who scored a 650 would have no chance at a college that accepted a kid from Beijing with a 500.

Once again, I am distressed to realize that my cynical skepticism has been woefully inadequate to the occasion.

The gaokao isn't a meritocracy. Millions of kids who live in the wrong province are getting screwed by a test whose great claim to fame is that it rewards applicants strictly by merit. And of course, the more kids who apply to college, the more cut scores and test difficulty will increase–but only for students from the wrong provinces. Meanwhile, the kids from the "right" provinces have a (relatively) easy time.

In this context, the 2013 gaokao cheating riot takes on a whole new light. If you really want to feel sad, consider the possibility that Yang’s friend, Cao, now working as a migrant, might have scored higher on a harder test than a rich kid in Shanghai.

By the way, could someone alert Ron Unz?

*Note: in the comments, someone who understands this better is (bizarrely, to me) fussed over my use of the "rural/urban" paradigm. I was using the same construct that Brook Larmer and others have. The commenter seems to think it makes a difference. My point is simpler, and I don't think obscured for non-Chinese readers. But I caution anyone that I'm utterly unfamiliar with Chinese geography.


The SAT is Corrupt. No One Wants to Know.

"We got a recycled test, BTW. US March 2014."

This was posted on the College Confidential site, very early in the morning on December 6, the test date for the international SAT.

Did you get it?

Get what?

I mean, how do you know it was a recycled March test? Do you have the March US test?

Oh, no. I just typed in one of the math questions from today’s test and the March US 2014 forum popped right up.

And of course, the March 2014 test thread has all the answers spelled out. The kids (assuming it’s kids) build a Google doc in which they compile all the questions and answers.

This is a pattern that goes on for every SAT, both domestic and international. The kids clearly are using technology during the test. They acknowledge storing answers on their calculators, but don’t explain what allows them to remember all the sentence completions, reading questions and even whole passages verbatim, much less post their entire essay online. Presumably, they are using their phones to capture the images?

They create a Google doc in which they recreate as many of the questions as can be remembered (in many cases, all) and then chew over the answers. By the end of the collaboration, they have largely recreated the test. They used to post links openly upon request. But recently the College Confidential moderators, aware that their site is being exposed as a cheating venue, have cracked down on requests for the link, while banning anyone who links to the document.

So floating out there somewhere on the Internet are copies of the actual test, which many hagwons post (and then pull down, because hey, no sense letting people have them for free), as well as the results of concentrated braindumping by hundreds of testers.

For international students, “studying for the SAT” doesn’t mean increasing math and vocabulary skills, but rather memorizing the answers of as many tests as possible.

And those are just the kids that aren’t paying for the answers.

The wealthy but not super-rich parents who want a more structured approach pay cram schools–be they hagwons, jukus, or buxiban–to provide kids with all the recycled tests so they can memorize every question. No, not learn the subject. Memorize. As described here, cram schools provide a "key king", a compilation of all the answer sequences for sections, drawn from all the potential international tests. They know which ones will be recycled because the College Board "withholds" these tests.

Of course, the super-rich parents don't want to fuss their kids with all that memorizing. Cram schools have obtained copies of all the potential international tests by paying testers to photograph them. Then they pay someone to take the SAT in the earliest time zone for the international administration and disseminate the news via text to all the testers, who just copy the answers from the pictures. Using phones. Which they have told the proctors they don't have, of course.

I don’t know exactly how all this works—for example, are the cram schools offering tiered pricing for key kings vs. phoned in answers? Do different cram schools have different offerings? I’ve read through the documented process provided by Bob Schaeffer of FairTest (a guy I don’t often agree with), and it seems very credible. He’s also provided a transcript of an offer to provide answers to the test. Valerie Strauss got on the record accounts of this process from two international administrators, Ffiona Rees and Joachim Ekstrom.

Every so often Alexander Russo complains that Valerie Strauss shouldn’t do straight education reporting, given her open advocacy against reform.

Great. So where's all the other hard reporting on this topic? The New York Times, whose public editor Margaret Sullivan just encouraged it "to enlighten citizens, hold powerful people and institutions accountable and maybe even make the world a better place", bleeds for the poor Korean and Chinese testers anxious for their scores and concerned they'll be tarred with the same brush. Everyone else just spits out the College Board press release–if they mention it at all. While most news outlets reported the October cancellation, few other than Strauss reported that the November and December international test scores were delayed as well.

At the same time Strauss reported that the College Board was stonewalling any inquiries as to how many kids were cheating, how many scores were cancelled, or what it was doing to prevent further corruption, an actual Post "reporter", Anna Fifield, regurgitated a promotional ad for a coach for the Korean SAT equivalent.*

Well, you can understand why. The millionaire Korean test-prep-coach-called-a-teacher story is one of the woefully underreported stories of the 21st century. I mean, we only had one promo put out by the Wall Street Journal the year before, and another glowing testimonial from CBS a few months later (even mentioning the "tops in performance, bottom in happiness" poll). But really, only one or two of these stories a year have been coming out since 2005.

So you can see why the Post felt another story on a Korean test prep instructor making millions required immediate exposure, if not anything approaching investigation or reporting.

These stories are catnip to reporters who get all their education facts from The Big Book Of Middlebrow Education Shibboleths. First, unlike our cookie cutter teacher tenure system, Korean teachers work in a real meritocracy where kids and their parents reward excellence with cash. Take that, teachers!

Then, unlike American moms and dads, Korean parents care about their kids and put billions into their education. Take that, parents!

And oy, the faith Anna shows in her subjects. Cha is a "top-ranked math teacher" who "says" he earned a "cool $8 million" last year. Cha says he's been teaching for 20 years, but refuses to give his age, and there's no mention of the topic or school he attended for his PhD, or whether he ever got one. But he's got a really popular video, so he must be great!

Some outlets are less adulatory. The Financial Times points out that the Korean government is cracking down on hagwon fees and operating hours, and preventing them from pre-teaching topics. Megastudy, the company in the 2005 story linked above, just went up for sale because of those government changes. Michael Horn of the Christensen Institute is doing no small part to alert people to the madness of the Korean system. The New York Times, despite its tears for the Korean and Chinese testers, has done its fair share to report on the endemic cheating in Chinese college applications.

But when it comes to the College Board and the SAT, everyone seems to be hands off the international market. At what point will it occur to reporters to seriously investigate whether a large chunk of the money spent on cram schools is not for instruction, but for “prior knowledge” cheating? When will they ask the Korean cram school instructors if they are fronts for an organized criminal conspiracy, if the money they get is not for tutoring, but for efficient delivery of test answers on test day? And how many of those test days are run by the College Board?

People think "well, sure, there's some cheating, but so what? Some kids cheat." Yeah, like I'd be writing this if it were a few dozen, or even a few hundred kids. Asian immigrant cheating on major tests in this country runs to the high hundreds a year. Maybe more. In China and Korea? I suspect it's beyond our comprehension, us ethical 'murricans.

One of the depressing things about the past three years is that I start looking into things more closely. I never really trusted the media, mind you, but I did assume that journalists skewed stories because of bias. I fondly imagined, silly me, that journalists wanted to investigate real wrongdoing. Yes. Laugh at my foolish innocence.

Consider what would be disrupted if American public pressure forced the College Board to end endemic international student cheating. First, the CB would lose millions but weep no tears; it's a non-profit company. Hahahahah! Yeah, that makes me laugh, too.

But public universities increasingly rely on international student fees and the pretense that they are qualified to do college work. After all, the thinking goes, we accept a lot of Americans who aren’t prepared for college work—may as well take in some kids who pay full freight. Private schools, too, appreciate the well-heeled Chinese students who don’t expect tuition discounts.

So suppose public pressure forces the College Board to use brand new tests for the overseas market, require all international testing to be done at US international schools, use different tests at different locations. The College Board might decide that the international market profits weren’t worth the hassle for other than US students living abroad (as indeed, the ACT seems to have done for years). Either way, a crackdown on testing security would seriously compromise Chinese and Korean students’ ability to lie about their college readiness and English skills.

A wide swath of public universities would either have to forgo those delightful international fees or simply waive the SAT requirement, but without those inflated test scores it would be tough to justify letting in these kids over the huge chunk of white and Asian Americans who are actually qualified. No foreign students, more begging for money from state legislatures. Private universities would have a difficult time bragging about their elite international students without the SAT scores to back things up.

Plus, hell, we changed the source country for zombies because we didn’t want to piss off China. Three years ago, the College Board wanted to open up mainland China as a market. 95% of the SAT testers in Hong Kong are Chinese. Stop all that money flowing around? People are going to be annoyed.

At this point, I start to feel too conspiratorial, and go back to figuring that reporters just don’t care. I’ve got a lot of respect for education policy reporters—the Edweek reporters are excellent on most topics—and most reporters do a good job some of the time.

But the SAT is basically corrupt in the international market. I’ve already written about test and grade corruption among recent Asian immigrants over here, particularly in regards to the Advanced Placement tests and grades.

Yet no one seems to really care. Sure, people disapprove of the SAT, but for all the wrong reasons: it’s racist, it’s nothing more than an income test, it reinforces privilege, it has no relationship to actual ability. None of these proffered reasons for hating the SAT have any relationship to reality. But that the SAT is this huge money funnel, taking money from states and parents and shoveling it directly or indirectly into the College Board, universities, and the companies who have essentially broken the test? Eh. Whatever.

The people who are hurt by this: middle and lower middle class whites and Asian Americans. So naturally, who gives a damn?

enlighten citizens, hold powerful people and institutions accountable and maybe even make the world a better place

Sigh. Happy New Year.

*****************************
*In the comments, an actual SAT prep coach making millions–no, really, he assures us, millions!–simply by being a fabulous coach with stupendous methods is insulted that I insinuated that the Washington Post story was about an SAT prep coach, rather than a coach for the Korean equivalent of the SAT. I knew that, but at one point referred to the guy as an SAT prep coach. I fixed the text.


Advanced Placement Test Preferences: Asians and Whites

I just finished my AP US History survey course, and a glorious time it was. But I will save the specifics of my three-to-four-hour lectures, and whether or not this is a good way to teach history, for another post. I will also, hopefully, weigh in some time on what value-add I think I bring to history. (If you're curious, in public school I taught a history of Elizabethan theater and a truly awesome 50s science fiction film course, in which students analyzed each movie's foreign policy approach using Walter Russell Mead's paradigm.)

I always end my AP class by discussing the students' course selections for next year. APUSH is a junior course; I have about ten kids in this particular class, and the conversation is always the same.

“What are you taking?”

“Calc BC, AP Physics C, AP Bio, AP Gov.”

"You?"

“AP Chem, AP Stats, AP Psych,…”

“So you’ve already taken BC?”

“Yeah, just took the test. Piece of cake. I’m taking intro to MVC.”

“What about AP English?”

All the heads shake. “God, no. Way too hard.”

One kid says “I’m taking AP Gov, I heard it’s easy.”

“I’m taking Macro Econ, one of our teachers has all the info you need to pass the tests.”

I laugh. “Jesus. Embrace the stereotype.”

They all get it and laugh, shamefacedly.

"Who's taking AP English this year?" Two hands rise. "AP English next year?" No hands.

“So here’s what I don’t understand. You are all trying to get into college, and the reason you are taking these tough classes is to make yourself look good for colleges.”

“Sure.”

“And I see only Chinese, Korean, and Indian Americans in front of me, all either FOB or citizens with parents who lived most of their lives in China, Korea, or India. Moreover, as I imagine you’ve heard, and certainly your parents have heard, universities often engage in some form of discrimination against Asians.”

“Wow,” one of the students laugh-gasped. “I never thought I’d hear an American admit that.”

“An American, or a white person?”

“They aren’t the same?”

“You born here?” Pause, as I see that datapoint register. Yes. She’s an American. (We’ll leave aside the fact that they don’t consider blacks and Hispanics American, either. I’ve written about this before; it’s still weird to see.)

“Anyway. All of you avoid classes that involve reading literature or written analysis because they would be too difficult.”

“Well, yeah.”

“So the stereotype is all wrong.”

“What, the stereotype that says we’re good at science and math?”

“No, the stereotype that says you work hard, that you take on challenges.”

“Oooooh, SNAP.”

I smiled, too. "Look, there's a serious point here. You're a college admissions officer, reading through approximately 16 billion Asian resumes that all read exactly the same: 4.2 GPA, BC calculus as a sophomore (with the occasional underachiever waiting until junior year), several AP science courses, APUSH for those of you who can string a sentence together, AP Chinese for those of you lucky enough to win the language lottery, and so on. What's going to stand out? Not one more STEM course."

“Yeah, but I hate reading.”

"You think the universities don't know that? Oh, look, one more Asian kid who's a machine at math and can memorize all the facts in AP Bio but uses Cliff Notes for Hamlet. College admissions is a numbers game anyway, and I'm not pretending anything is going to make a huge difference, but…"

"My dad says colleges are passing over Asians born here…American Asians [score!] in favor of Chinese and Koreans."

"Your dad's right. So given all the work you're clearly putting in just to get that last inch of consideration, may I suggest that the path to differentiation lies in showing the admissions reviewer that you take on challenges in all subjects, as opposed to taking classes you know you'll get an A in."

***********************************************

I was going to just post this little anecdote, but then I got to wondering just how prevalent the behavior is—is it exclusive to my little corner of the country, or are the recent Asian immigrants showing up in national data?

One of the problems with AP data is that you simply can’t make too many assumptions. For example, much has been written about the fact that the mode AP score for blacks is 1. Not only do most blacks fail the AP test, people wail, but they fail it completely! Twice as many blacks get a failing score as get a passing score! Our teachers are failing black children!

Yeah, no. The black AP population is a combination of at least three different groups. First, the group of genuinely qualified, academically prepared black students. Small group, I know, but each year hundreds of African American students take and pass the BC Calculus test, many with a score of 5 (however, 1 is still the mode for BC Calc). Second, the group of average or higher ability blacks with relatively little interest in academic success, who have nonetheless been put in AP classes by desperate suburban school officials who are under fire from the feds for their "opportunity gap" numbers. These are kids who could, with good teaching, achieve a respectable 3 on a number of tests, and probably do.

The problem, alas, is that a teacher can focus on getting middle achievers over the hump, or on challenging a bunch of smart kids. Can’t do both in the same room, not easily and probably not at all. Thus bringing in more marginal black students and coaxing them to a three occasionally has a depressing effect on suburban AP scores, as the top white kids aren’t being taught at the top of their ability. But I digress.

The third group, and it's huge, comes from low income urban and charter schools gaming the GPA and the Jay Mathews Challenge Index. These are kids who are barely literate, often aren't even taught the course material, but boy, by golly, if they get the butts in the seats they'll show up on Jay's list somewhere. All at taxpayer expense.

While the AP test results disaggregate Mexicans and Puerto Ricans from the rest of Hispanics, Mexican performance reflects the same conflation of three groups as black results do, and is equally useless. The Hispanic mode score is also 1.

Asian scores aren’t disaggregated, but the Big Three (Chinese, Koreans, and Indians) dominate.

So are Asians showing a preference for science and math over the humanities AP tests?

AP testing populations by race–mostly. It would have been a huge hassle to add up all the URM categories, so I just subtracted whites and Asians from the total. So "Decline to state" is categorized as URM, when it's probably mostly white. I checked a couple of values; it wasn't a big difference. These are the top 20 tests by popularity, in order from left to right.[1]

[image: 2013aptablebyrace]

The visual display is useful—look for big green, little blue, or a relatively high number of URMs and fewer Asians. See? Asians live the stereotype. Don't assume that blacks and Hispanics are drawn to the Humanities courses—it's just easier for schools to shove unprepared kids into English, Geography, and History classes than into science and math courses. Fewer prerequisites.

Here's the same data in table form. I added one column, Asians as a percentage of the Asian/white total, to clear away the URM noise. Then I highlighted the tests in each column that were more than one average deviation away from the mean, both higher and lower. (I used average deviation because I don't think these distributions have a standard one. Could be wrong, but that's the reason for the choice.) I bolded any values that were more than two average deviations away from the mean.
[image: 2013aptablebyrace]
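For the curious, the highlighting rule reduces to a few lines of Python. A minimal sketch with made-up percentages (the real numbers are in the table above):

```python
def flag_outliers(values, k=1):
    """Flag values more than k average deviations from the column mean."""
    mean = sum(values) / len(values)
    avg_dev = sum(abs(v - mean) for v in values) / len(values)
    return [abs(v - mean) > k * avg_dev for v in values]

# Hypothetical column of "Asian % of Asian/white total" values.
col = [22.0, 35.5, 18.2, 29.9, 51.0, 25.4]
print(flag_outliers(col, k=1))  # highlighted: 18.2 and 51.0
print(flag_outliers(col, k=2))  # bolded: only 51.0
```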

Whites are the most tightly clustered, URMs next. Asians tilt strongly, both towards and against.

There’s a lot more to explore here, and I hope to do that soon. But for now, I wanted to stay focused on Asian vs. white preferences. So I next compared the top 20 Asian test preferences to those of whites. (Actually, I did 22 for Asians because I thought #22 was revealing.)

AP totals include many multiple testers, so I took the number of testers for any given test as a percentage of the total for that race. This is not a perfect measure, for obvious reasons. Or maybe not so obvious. Say, for example, that an entirely different group of Asians take the English Lit test than take the Calc AB test, but the white students have a significant overlap. In that case, the percentage of testers would be saying something entirely different about each group than if both Asians and whites had overlapping testers.

However, in either case, it would be revealing. If more whites than Asians took both math and English tests, or if one group of Asians took math tests and another group took English (or the same case of whites), the percentages are still showing a preference. I think. I’m sure there’s a way to describe this more technically, but it’s late, the school year’s almost over, so put the correct text in comments and I’ll change it.

Anyway.

[image: whitestop20ap]

[image: asiantop22]

And here it is graphically, ranked again by test popularity. The blue and green columns are the percentage of white or Asian testers taking that test. (The graph above showed the percent of each test's population that was white/Asian/URM; these columns show the percent of the white or Asian testing population taking that particular test—the "% of total" column in the tables immediately above.) The line graph is the percent of each group that scored a 5 on that test.

[image: apasianwhitepref]

(You notice something weird? Spanish is the tenth most popular test–but it barely makes the top 20 for either whites or Asians. How could that be? Who on earth is taking all those Spanish tests?)

So again, I want to write more about these results but I thought I’d put them out there and let people chew on them. Here’s a few preliminary observations:

  • Whites appear to be the utility players, good in a number of subjects and not expressing huge preferences. They stretch more into STEM than Asians stretch into writing.
  • Asians appear to be avoiding writing-intensive tests relative to whites, no matter how you interpret the data.
  • Asians tend to choose tests that are more likely to yield high scores, and avoid tests that give out fewer 5s. Until recently, AP Bio doled out 5s like candy; they clearly changed the scoring in some significant way this year (without announcing it, I guess). Environmental Science, which has a deservedly crappy rep, is actually pretty hard to get a high score on, so Asians avoid it.
  • The real difference between Asians and whites, in both preferences and scores, is in the science tests, not math. Asians have higher scores on all tests—and while that's probably a reflection of cognitive ability, you really can't understand the difference in preparation and grinding until you see it—but the real gaps are in the sciences. AP science courses are, in my opinion, pretty horrible to begin with. Yes. It's the subject I don't teach. Bias alert.

tl;dr: Asians across the land reflect the same biases. They may or may not be working hard, but they appear to be avoiding subjects that are more difficult for them and don't yield as high a score. This may also be why they avoid the ACT. Or not.

More on this later. Let me know what you think and of course, point out any errors.

[1] I actually did this work from the bottom up. So in the first chart, which was actually the last one I did, there are only 19 tests. Guess which one I left off, and why. The other charts all have 20 tests.


Finding the Bad Old Days

Michael Petrilli wrote an extremely aggravating article suggesting we tell unqualified kids they aren't ready for college and should go into CTE instead, and then a much-improved follow-up that acknowledges the racial reality of his idea.

In his first piece, Petrilli only mentions race once:

[image: PetrilliCTEquote3]

This is a common trope in articles on tracking, a nod to “the bad old days” right after the end of segregation, that time immediately after Brown and ending sometime in the late 70s, or when Jeannie Oakes excoriated the practice in Keeping Track.

In the bad old days, the story goes, evil school districts, eager to keep angry racist white parents from fleeing, sought a means of maintaining segregation despite the Supreme Court decision and the Civil Rights Act. So they pretended to institute ability grouping and curriculum tracks, but in reality, they used race. That way the district could minimize white flight and still pretend to educate the poor and the brown. That’s why so many brown kids were in the low ability classes, and that’s why so many lawsuits happened, because of the evil racist/classist methods of rich whites keeping the little brown people down.

The bad old days are a touchstone for anyone proposing an educational sorting mechanism. So you have Petrilli, advocating a return to tracking, who tells us the bad old days are a thing of the past: yeah, we used to track by race and income, pretending to use ability, but we've progressed. Districts pretended to use IQ, but they were really using culturally biased tests to commit second-order segregation. Today, we understand that all races and all incomes can achieve. Districts don't have to distort reality. The bad old days are behind us, and we can group by ability secure that we aren't discriminating by race.

Before ed school, I accepted the existence of the bad old days, but then I noticed that every reading asserted discrimination but didn’t back it up with data. Since ed school, I’d occasionally randomly google on the point, looking for research that established discriminatory tracking back in the 60s and 70s. And so the Petrilli article got me googling and thinking again. (What, buy books? Pay for research? Cmon, I’m a teacher on a budget. If it’s damning, the web has it.)

I first reviewed Jeannie Oakes, reaffirming that Oakes holds tracking itself, properly applied, as the operative sin. Discriminatory tracking isn't a main element of Oakes' argument, although she points out that "some research" suggests it occurred. Oakes' third assumption, that tracking decisions are largely valid (page 4), is accepted at face value. So the grande dame of the anti-tracking movement has completely neglected to mention the bad old days—which, at that time, would have been contemporary.

On I move to Roslyn Mickelson, who does charge Charlotte Mecklenburg schools with discriminatory tracking.

[Image mickelson5: excerpt from Mickelson's research on Charlotte-Mecklenburg tracking]

In Capacchione v. Charlotte-Mecklenburg, Judge Robert Potter eviscerates her expert testimony, finding fault with her credibility, her accuracy, and her logic.

Bottom line, however: Mickelson's research shows that students with high scores in year one are not consistently placed in high-achieving classes six years later. While both whites and blacks with high scores end up in low tracks and vice versa, more whites get high placement than blacks. But generally, her data shows something I've documented before: that achievement falls off each year because school gets harder.

Both whites and blacks experience the falloff, even though Mickelson seems to think that the pattern should be linear. The achievement scale simply gets larger as kids move up in grade levels, and fewer blacks make the top tier. This is consistent with cognitive realities.

There might be a smoking gun in research. But I couldn’t find it.

Then I suddenly realized duh, what about case law? If districts were tracking by race, there’d be a lawsuit.

I started with three legal articles that discussed tracking case law: 1, 2 and 3. They were all useful, but none mentioned a significant case in which a district routinely used different standards or sorted directly by race or zip code.

From these articles, I determined that Hobson v. Hansen was the original tracking case, and that the McNeal standard was for many years (and may still be) the test for ability grouping.

So I created a reading list of cases from the late 60s to the early 90s.

Only two of these cases involved schools directly accused of using race to sort students. In Johnson v. Jackson, the schools were forced to integrate in the middle of a school year. The black kids were ported over to white schools and the classes kept intact. The court ordered them to fix this. From first integration order to the fix order: 4 months.

The second case, Rockford, was decided in the early 90s, and the judge directly accuses the district of intentionally using race to ability group. However, Jeannie Oakes was the expert witness, and the judge drank every bit of Koolaid she had to offer and licked the glass. Oakes is presented as an expert witness, with no mention that she’s an anti-tracking advocate. Her testimony appears to be little more than readings from her book and some data analysis.

The proof of “intentional racism” was pretty weak and largely identical to Mickelson’s described above. Major difference: the judge accepted it.

Leaving aside these two cases, I couldn’t find any case in which the district was found to misuse the results of the test, either by using different racial standards or ignoring the tests entirely. The tests themselves were the issue.

In the south, school systems that weren’t “unitary” (that is, were previously segregated districts) couldn’t use ability testing. Since blacks would have lower scores based on past racial discrimination, the use of tests was discriminatory, an intent to segregate.

For school systems that were found to be unitary, ability testing isn’t in and of itself invalid and racial imbalance isn’t a problem (see Starkville case for example).

In all these cases, I couldn’t find a district that was tracking by race. They were guilty of tracking by test. Everyone knew the tests would reveal that blacks would have lower ability on average, and therefore ability grouping was by definition invalid in previously segregated schools. This was an era in which judges said “The court also finds that a Negro student in a predominantly Negro school gets a formal education inferior to the academic education he would receive, and which white students receive, in a school which is integrated or predominantly white.” (Hobson)

Once the system is declared unitary, or if that was never an issue, the record is mixed. When judges did accept the results as valid, they ruled in favor of the school districts (Starkville, Hannon). In PASE v. Hannon, the judge actually reviewed the test questions himself and determined they were unbiased with few exceptions, all of which were far above the IQ level in question.

In California, on the other hand, where de jure segregation wasn't an issue*, the mere existence of racial imbalance was still a problem (Pasadena, Riles). In Riles, Judge Robert Peckham banned all IQ testing of blacks in California for educational purposes. He later extended the ruling to apply even when black parents requested testing, but eventually withdrew that extension. Peckham's reasoning is much like that of the other judges who believed in cultural bias:

Even if it is assumed that black children have a 15 percent higher incidence of mild mental retardation than white children, there is still less than a one in a million chance that a color-blind system would have produced this disproportionate enrollment. If it is assumed that black children have a 50 percent greater incidence of this type of mental retardation, there is still less than a one in 100,000 chance that the enrollment could be so skewed towards black children.

Notice the reasoning: of course it’s not possible that blacks have a 50% greater incidence of an IQ below 75. Except it’s worse than that.

This image, from The Bell Curve (borrowed from here), reflects the frequency distribution of black and white IQ:

[Chart BCFreqblkwhiteIQ: frequency distribution of black and white IQ, from The Bell Curve]

As many blacks as whites populate the sub 75 IQ space, but the population distribution being what it is, blacks are far more likely to have low IQs.

When Charles Murray researched this for The Bell Curve:

In the NLSY-79 cohort, 16.8 percent of the black sample scored below 75, using the conversion of AFQT scores reported in the appendix of TBC and applying sample weights. The comparable figure for non-Latino whites was 2.2 percent. In the NLSY-97 cohort, the comparable figures were 13.8 percent for blacks and 2.7 percent for non-Latino whites.

(Charles Murray, personal communication)

So at the time of Peckham’s decision, blacks didn’t have a 50% higher chance of an IQ below 75, but rather a several hundred percent higher chance, a chance that is still in the triple digits today.1 Peckham couldn’t even begin to envision such a possibility, and so no IQ testing for blacks in California.
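
If you want to check the flavor of that claim yourself, here's a minimal sketch under the textbook normal assumptions (white mean 100, black mean 85, sd 15 for both). Those parameters are my assumption for illustration, not the NLSY figures above, but the order of magnitude comes out the same:

    from scipy.stats import norm

    # Assumed textbook parameters, not the NLSY data: mean 100 (white),
    # mean 85 (black), sd 15 for both groups.
    white_below_75 = norm.cdf(75, loc=100, scale=15)   # ~4.8%
    black_below_75 = norm.cdf(75, loc=85, scale=15)    # ~25.2%

    # The relative incidence is severalfold -- "several hundred percent
    # higher," nothing like the 50% Peckham treated as an absurd ceiling.
    print(f"white: {white_below_75:.1%}, black: {black_below_75:.1%}")
    print(f"relative incidence: {black_below_75 / white_below_75:.1f}x")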

(As for the lower frequency of blacks in the "trainable" mentally retarded division, as it was called then, an interesting but rarely discussed fact: low IQ blacks are often higher functioning than low IQ whites. They are less likely to be organically retarded, and more likely to be capable of independent living. This despite the fact that their IQ tests and academic outcomes are identical. Arthur Jensen discovered this phenomenon, and I highly recommend that article; it's fascinating. I wonder if the difference is somehow related to crystallized vs. fluid intelligence, but haven't read up enough on it.)

So there it is. Obviously, if I missed a key case in which a major district was found to have deliberately tracked kids by race, please let me know.

But despite extensive efforts, I couldn’t find the bad old days of discriminatory sorting. What I found, instead, was a judicial rejection of IQ and other ability tests, coupled with an inability to conceive of the actual distribution patterns of cognitive ability.

Please understand my limited objective. Many Southern districts did everything they could to avoid integration. See, for example, US v Tunica, where the district tried to assign students based on test scores but was denied because of the achievement testing ban, and was required to reassign students and teachers to achieve integration. The teachers refused assignment to integrated schools and resigned, white parents withdrew their kids, and then the white schools set up shop at local churches, classes largely intact. Money? Not an issue. They used taxpayer dollars, since the district paid the teachers who resigned and the kids took all their school books with them.

But believe it or not, there’s no mention that the district was only pretending to use test scores, actually assigning students by race. And this is a place where I’d expect to find it. Opposition to integration, absolutely. Achievement testing used as a way to minimize racially mixed classes? Sure.

In many other cases, schools or districts instituted tracking as a genuine attempt to educate a much wider range of abilities, or even had a tracking system in place before integration.

The inconvenient realities of cognitive ability distribution being what they are, the test scores would be depressingly indifferent to intent.

Then there’s the messy middle, the one that Mickelson probably found in Charlotte and Oakes found in Rockford and any one looking at my classrooms would find as well. All tracked classrooms are going to have inconsistencies, whether the schools use tests, teacher recommendations, or student choice. The honors classes fill up or a teacher suddenly dies or all sorts of other unforeseen situations mean some kids get moved around and it’s a safe bet high income parents bitch more about wrong assignments than poor parents. Go through each high score in a “regular” class and each low score in a tracked, and each one of those test scores will have a story—a story usually doesn’t involve race or malign intent. The story occasionally does involve bad teachers or district bureaucracy, but not as often as you might think.

Teacher recommendations are supposed to mitigate the testing achievement gap, but teachers are moralists, particularly in math, as I've written before. It doesn't surprise me that a new study shows that, controlling for performance, blacks are less likely to be assigned to algebra as 8th graders by teacher recommendation. I can't tell you the number of bright Hispanic and black kids I've run into (as well as a huge number of white boys, including my son) who don't bother with homework and have great test scores. So their GPA is 2.7, but their test scores are higher than the kids who got As–and the teacher recommendations.

Parents: some parents insist that their kids need to be in the top group to be challenged. Others feel that their kids do better when they feel secure, able to manage the challenge. Then there are the parents who don’t give a damn about their kids’ abilities but don’t want them in a noisy classroom with kids who don’t give a damn about education. White and Asian parents are disproportionately represented in the first group, black and Hispanic parents take up more than their share in the second, and all parents of all races worry about the last.

So let’s stop using teacher recommendation, stop allowing parents or students to ask for different placement. Test scores are destiny.

But test scores today still reflect the same reality that the judges assumed, back then, could only be caused by racism or bias.

The tests haven’t changed. The kids haven’t changed much.

The judges are another story.

Richard Posner, in a much-quoted 1997 decision on an appeal in People Who Care v. Rockford, did what he has done before: made my point with much greater efficiency:

Tracking is a controversial educational policy, although just grouping students by age, something no one questions, is a form of “tracking.” Lawyers and judges are not competent to resolve the controversy. The conceit that they are belongs to a myth of the legal profession’s omnicompetence that was exploded long ago. To abolish tracking is to say to bright kids, whether white or black, that they have to go at a slower pace than they’re capable of; it is to say to the parents of the brighter kids that their children don’t really belong in the public school system; and it is to say to the slower kids, of whatever race, that they may have difficulty keeping up, because the brighter kids may force the pace of the class. …

Tracking might be adopted in order to segregate the races. The well-known correlation between race and academic performance makes tracking, even when implemented in accordance with strictly objective criteria, a pretty effective segregator. If tracking were adopted for this purpose, then enjoining tracking would be a proper as well as the natural remedy for this form of intentional discrimination, at least if there were no compelling evidence that it improves the academic performance of minority children and if the possible benefits to the better students and the social interest in retaining them in the public schools were given little weight. The general view is that tracking does not benefit minority students…although there is evidence that some of them do benefit… All this is neither here nor there. The plaintiffs’ argument is not that the school district adopted tracking way back when in order to segregate the schools. It is that it misused tracking, twisting the criteria to achieve greater segregation than objective tracking alone would have done. The school district should be enjoined from doing this not, on this record, enjoined from tracking.

The Charlotte-Mecklenburg case mentioned above cited Posner’s reasoning. The third of my case law articles discusses Holton v Thomasville II, which doesn’t mention Posner but does say that racial imbalance in ability grouping isn’t of itself evidence of discrimination, and points out that the time for judicial interference in educational decisions is probably over:

[Image holtoncase: excerpt from the Holton v. Thomasville II decision]

Most districts ended tracking out of fear of lawsuits. It may be time for parents to demand more honors classes, test the limits.

So what does this have to do with Petrilli? Well, less than it once did, now that Petrilli has acknowledged the profound racial implications of his suggestion.

But if the bad old days of racial tracking never really existed, then Petrilli can't pretend things will be better. Yes, we must stop devaluing college degrees, stop fooling kids who have interest but no ability into taking on massive loans that they can never pay off. And with luck even Petrilli will eventually realize as well that we have to stop forcing kids with neither interest nor ability to sit in four years of "college preparation" courses feeling useless.

So what comes next? Well, that’s the question, isn’t it?

*************************
*Commenter Mark Roulo points out that California did commit de jure segregation against Hispanics and was ordered to stop in Mendez v. Westminster. See comments for my response.

1See Steve Sailer’s comment for why black IQs might have been biased against lower IQ blacks and the 97 data more representative.


NAEP TUDA Scores—Detroit isn’t Boston

So everyone is a-twitter over NAEP TUDA (Trial Urban District Assessment) scores. For those who aren’t familiar with The Nation’s Report Card, the “gold standard” of academic achievement metrics, it samples performance rather than test every student. For most of its history, NAEP only provided data at the state level. But some number of years ago, NAEP began sampling at the district level, first by invitation and then accepting some volunteers.

I don’t know that anyone has ever stated this directly, but the cities selected suggest that NAEP and its owners are awfully interested in better tracking “urban” achievement, and by “urban” I mean black or Hispanic.

I’m not a big fan of NAEP but everyone else is, so I try to read up, which is how I came across Andy Smarick‘s condemnation of Detroit, Milwaukee, and Cleveland: “we should all hang our heads in shame if we don’t dramatically intervene in these districts.”

Yeah, yeah. But I was pleased that Smarick presented total black proficiency, rather than overall proficiency levels. Alas, my takeaway was all wrong: where Smarick saw grounds for a federal takeover, I was largely encouraged. Once you control for race, Detroit looks a lot better. Bad, sure, but only a seventh as bad as Boston.

So I tweeted this to Andy Smarick, but told him that he couldn’t really wring his hands until he sorted for race AND poverty.

He responded “you’re wrong. I sorted by race and Detroit still looks appalling.”

He just scooted right by the second attribute, didn’t he?

Once I’d pointed this out, I got curious about the impact that poverty had on black test scores. Ironic, really, given my never-ending emphasis on low ability, as opposed to low income. But hey, I never said low income doesn’t matter, particularly when evaluating an economically diverse group.

But I began to wonder: how much does poverty matter, once you control for race? For that matter, how do you find the poverty levels for a school district?

Well, it’s been a while since I did data. I like other people to do it and then pick holes. But I was curious, and so went off and did data.

Seventeen days later, I emerged, blinking, with an answer to the second question, at least.

It’s hard to know how to describe what I did during those days, much less put it into an essay. I don’t want to attempt any sophisticated analysis—I’m not a social scientist, and I’m not trying to establish anything certain about the impact of poverty on test scores, an area that’s been studied by people with far better grades than I ever managed. But at the same time, I don’t think most of the educational policy folk dig down into poverty or race statistics at the district level. So it seemed like it might be worthwhile to describe what I did, and what the data looks like. If nothing else, the layperson might not know what’s involved.

If my experience is any guide, it’s hard finding poverty rates for children by race. You can get children in poverty, race in poverty, but not children by race in poverty. And then it appears to be impossible to find enrolled children in a school district—not just who live in it, which is tough enough—by poverty. And then, of course, poverty by enrollment by race.

First, I looked up the poverty data here (can’t provide direct links to each city).

But this is overall poverty by race, not child poverty by race, and it’s not at the district level, which is particularly important for some of the county data. However, I’m grateful to that site because it led me to American Community Survey Factfinder, which organizes data by all kinds of geographic entities—including school districts—and all kinds of topics–including poverty—on all sorts of groups and individuals—including race. Not that this is news to data geeks, which I am not, so I had to wander around for a while before I stumbled on it.

Anyway. I ran report 1701 for the districts in question. If I understand googledocs, you can save yourself the trouble of running it yourself. But since the report is hard to read, I’ll translate. Here are the overall district black poverty rates for the NAEP testing regions:

[Table ACSdistrictblkpoverty: overall district black poverty rates for the NAEP TUDA districts, from ACS]

Again, these are for the districts, not the cities.

(Am I the only one who’s surprised at how relatively low the poverty rates are for New York and DC? Call me naïve for not realizing that the Post and the Times are provincial papers. Here I thought they focused on their local schools because of their inordinately high poverty rates, not their convenient locations. Kidding. Kind of.)

But these rates are for all blacks in the district, not black children. Happily, the ACS also provides data on poverty by age and race, although you have to add and divide in order to get a rate. But I did that so you don’t have to–although lord knows, my attention to detail isn’t great so it should probably be double or triple checked. So here, for each district, are the poverty rates for black children from 5-17:

[Table ACSblk517poverty: poverty rates for black children age 5-17, by district, from ACS]
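
(For the curious, the "add and divide" is nothing fancy. Here's a minimal sketch with made-up counts, since I'm not reproducing the actual ACS cell values; the ACS gives poor and total children by race and age band, and the rate is just the ratio of the sums:)

    # Made-up ACS-style counts for one district -- plug in the real cells.
    poor_black_5 = 3_200       # poor black children, age 5
    poor_black_6_11 = 9_800    # poor black children, age 6-11
    poor_black_12_17 = 9_400   # poor black children, age 12-17
    total_black_5_17 = 42_000  # all black children, age 5-17

    rate = (poor_black_5 + poor_black_6_11 + poor_black_12_17) / total_black_5_17
    print(f"black child poverty rate: {rate:.1%}")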

In both cases, Boston and New York have poverty rates a little over half those of the cities with the highest poverty rates—and isn’t it coincidental that the four cities with the lowest black NAEP scores have the highest black poverty rates? Weird how that works.

But the NAEP scores and the district data don't include charter or private schools in the zone, and this impacts enrollment rates differently. So back to ACS to find data on age and gender, and more combining and calculating, with the same caveats about my lamentable attention to detail. This gave me the total number of school age kids in the district. Then I had to find the actual district enrollment data, most of which is in another census report (relevant page here) for the largest school districts. For the smaller districts, I just went to their websites.

Results:

[Table naepdistenrollrate: district enrollment as a percent of school-age population, by district]

Another caveat–some of these data points are from different years so again, some fuzziness. All within the last three or four years, though.

So this leads into another interesting question: the districts don’t report poverty anywhere I can find (although I think some of them have the data as part of their Title I metrics) and in any event, they never report it by race. I have the number and percent of poor black children in the region, but how many of them attend district schools?

So to take Cleveland, for example, the total 5-17 district population was 67,284. But the enrolled population was 40,871, or 60.7% of the district population.

According to ACS, 22,445 poor black children age 5-17 live in the district, and I want an approximation of the black and overall poverty rates for the district schools. How do I apportion poverty? I do not know the actual poverty rate for the district’s black kids. I saw three possibilities:

  1. I could use the black child poverty rate for the residents of the Cleveland district (ACS ratio of poor black children to ACS total black children). That would assume (I think) that the poor black children were evenly distributed over district and non-district schools.
  2. I could take the enrollment rate and multiply it by the number of poor black children in ACS—and then use that to calculate the percentage of poor kids among enrolled blacks.
  3. I could assign all the black children in poverty (according to ACS) to the black children enrolled in the district (using district given percentage of black children enrolled).

Well, the middle method is way too complicated and hurts my head. Plus, it didn't really seem all that different from the first method; both assume poor black kids would be just as likely to attend a charter or private school as they would their local district school. The third method assumes the opposite—that kids in poverty would never attend private or charter schools. This method would probably overstate the poverty rates.
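
Here's what methods 1 and 3 look like in code, a minimal sketch using the Cleveland figures above. The two black-child counts are placeholders I made up, since I'm not reproducing the full ACS pull:

    # Cleveland figures from the text:
    district_children_5_17 = 67_284   # ACS: all children 5-17 in the district
    enrolled_children = 40_871        # district enrollment (60.7%)
    poor_black_children = 22_445      # ACS: poor black children 5-17

    # Placeholder counts -- substitute the real ACS/district numbers:
    acs_black_children = 42_000       # all black children 5-17 in the district
    enrolled_black_children = 26_000  # black children enrolled in district schools

    # Method 1: poor black kids assumed evenly spread across district,
    # charter, and private schools, so use the residential ACS rate.
    method1 = poor_black_children / acs_black_children

    # Method 3: every poor black child assumed to attend a district school,
    # so assign all of them to the district's black enrollment. This is the
    # version that probably overstates poverty.
    method3 = poor_black_children / enrolled_black_children

    print(f"method 1: {method1:.1%}, method 3: {method3:.1%}")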

So here are poverty levels calculated by methods 1 and 3–the ACS rate vs. assigning all the poor black students to the district. In most cases, the differences were minor. I highlight the districts where the difference is greater than 10 percentage points.

[Table naepweightingpov: black child poverty rates by methods 1 and 3, districts with >10 point differences highlighted]

Again, is it just a coincidence that the schools with the lowest enrollment rates and the widest range of potential poverty rates have some of the lowest NAEP scores?

Finally, after all this massaging, I had some data to run regression analysis on. But I want to do that in a later post. Here, I want to focus on the fact that gathering this data was ridiculously complicated and required a fair amount of manual entry and calculations.

If I didn’t take the long way round, I suspect this effort is why researchers use the National Student Lunch Program (“free and reduced lunch”) as a poverty proxy.

The problem is that the poverty proxy sucks, and we need to stop using it.

Schools and districts have noticed that researchers use National School Lunch enrollment numbers as a proxy for poverty, and it’s also a primary criterion for Title I allocations. So it’s hard not to wonder about Boston’s motives when the district decides to give all kids free lunches regardless of income level, and whether it’s really about “awkward socio-economic divides” and “invasive questions”. The higher the average income of a district’s “poor” kids, the easier it is to game the NCLB requirements, for example.

Others use the poverty proxy to compare academic outcomes and argue for their preferred policy, particularly on the reform side of things. For example, charter school research uses the proxy when "proving" they do a "great job educating poor kids" when in fact they might just be skimming the not-quite-as-poor kids and patting themselves on the back. We can't really tell. And of course, the NAEP uses the poverty proxy as well, and then everyone uses it to compare the performance of "poor" kids. See, for example, this analysis by Jill Barshay, highlighted by Alexander Russo (with Paul Bruno chiming in to object to FRL as poverty proxy). Bruce Baker does a lot of work with this.

To see exactly how untrustworthy the "poverty proxy" is, consider the NAEP TUDA results broken down by participation in the NSLP.

[Table naepfrlelig: NAEP TUDA black scores broken down by NSLP eligibility]

Look at all the cities that have no scores for blacks who aren't eligible for free or reduced lunch: Boston, Cleveland, Dallas, Fresno, Hillsborough County, Los Angeles, Philadelphia, and San Diego. These cities apparently have no blacks with income levels higher than 185% of poverty, the reduced-price lunch cutoff. Detroit can drum up non-poor blacks, but Hillsborough County, Boston, Dallas, and Philadelphia can't? That seems highly unlikely, given the poverty levels outlined above. Far more likely that the near-universal poverty proxy includes a whole bunch of kids who aren't actually poor.

In any event, the feds, after giving free lunches to everyone, decided that NSLP participation levels are pretty meaningless for deciding income levels “…because many schools now automatically enroll everyone”.

I find this news slightly cheering, as it suggests that I’m not the only one having a hard time identifying the actually poor. Surely this article would have mentioned any easier source?

So. If someone can come back and say “Ed, you moron. This is all in a table, which I will now conveniently link in to show you how thoroughly you wasted seventeen days”, I will feel silly, but less cynical about education policy wonks hyping their notions. Maybe they do know more than I do. But it’s at least pretty likely that no one is looking at actual district poverty rates by race when fulminating about academic achievement, because what I did wasn’t easy.

Andy Smarick, at any rate, wasn’t paying any attention to poverty rates. And he should be. Because Detroit isn’t Boston.

This post is long enough, so I’ll save my actual analysis data for a later post. Not too much later, I hope, since I put a whole bunch of work into it.


Algebra 1 Growth in Geometry and Algebra II, Spring 2013

This is part of an ongoing series on my Algebra II and Geometry classes. By definition, students in these classes should have some level of competence in Algebra I. I've been tracking their progress on an algebra I pre-assessment test. The test assesses student ability to evaluate and substitute, use PEMDAS, solve simple equations, operate with negative integers, and combine like terms. It tiptoes into first semester algebra—linear equations, simple systems, basic quadratic factoring—but the bulk of the 50 questions involve pre-algebra. While I used the test at my last school, I only thought of tracking student progress this year.

My school is on a full-block schedule, which means we teach a year's content in a semester, then repeat the whole cycle with another group of students. A usual teacher schedule is three daily 90-minute classes, with a fourth period prep. I taught one algebra II and one geometry class first semester (the third class prepared low ability students for a math graduation test); their results are here.

So in round two, I taught two Algebra 2 courses and one Geometry 10-12 class (as well as a precalc class not part of this analysis). My first geometry class was freshmen only. At my last school, only freshmen who scored advanced or proficient on their 8th grade algebra test were put into geometry, while the rest took another year of algebra. In this school, all a kid has to do is pass algebra to be put into geometry, but we offer both honors and regular geometry. So my first semester class, Geometry 9, was filled with well-behaved kids with extremely poor algebra skills, as well as a quarter or so who had stronger skills but weren't interested in taking honors.

I was originally expecting my Geometry 10-12 class to be extremely low ability and so wasn’t surprised to see they had a lower average incoming score. However, the class contained 6 kids who had taken Honors Geometry as freshmen—and failed. Why? They didn’t do their homework. “Plus, proofs. Hated proofs. Boring,” said one. These kids knew the entire geometry fact base, whether or not they grokked proofs, which they will never use again. I can’t figure out how to look up their state test scores yet, but I’m betting they got basic or higher in geometry last year. But because they were put into Honors, they have to take geometry twice. Couldn’t they have been given a C in regular geometry and moved on?

But I digress. Remember that I focus on number wrong, not number right, so a decrease is good.

[Table Alg2GeomAlg1Progress: class-average wrong answers on the algebra pre-assessment, beginning vs. end]

Again, I offer up as evidence that my students may or may not have learned geometry and second year algebra, but they know a whole lot more basic algebra than they did when they entered my class. Fortunately, my test scores weren’t obliterated this semester, so I have individual student progress to offer.

I wasn’t sure the best way to do this, so I did a scatter plot with data labels to easily show student before/after scores. The data labels aren’t reliably above or below the point, but you shouldn’t have to guess which label belongs to which point.

So in case you’re like me and have a horrible time reading these graphs, scores far over to the right on the x-axis are those who did poorly the first time. Scores low on the y-axis are those who did well the second time. So high right corner are the weak students at both beginning and end. The low left corner are the strong students who did well on both.

Geometry first. Thirty-one students took both tests.

[Scatter plot Spring2013GeomIndImprovement: geometry students' wrong answers, before (x) vs. after (y)]

Four students saw no improvement, another four actually got more wrong, although just 1 or 2 more. Another 3 students saw just one point improvement. But notice that through the middle range, almost all the students saw enormous improvement: twelve students, over a third, got from five to sixteen more correct answers, that is, improved from 10% to over 30%.

Now Algebra 2. Forty-eight students took both tests; I had more testers at the end than at the beginning; about ten students started a few days late.

[Scatter plot Spring2013A2IndImprovement: algebra II students' wrong answers, before (x) vs. after (y)]

Seven got exactly the same score both times, but only three declined (one of them a surprising 5 points—she was a good student. Must not have been feeling well). Eighteen (also a third) saw improvements of 5 to 16 points.

The average improvement was larger for the Algebra 2 classes than the Geometry classes, but not by much. Odd, considering that I’m actually teaching algebra, directly covering some of the topics in the test. In another sense, not so surprising, given that I am actually tasked to teach an entirely different topic in both cases. I ain’t teaching to this test. Still, I am puzzled that my algebra II students consistently show similar progress to my geometry students, even though they are soaked in the subject and my geometry students aren’t (although they are taught far more algebra than is usual for a geometry class).

I have two possible answers. Algebra 2 is insanely complex compared to geometry, particularly given I teach a very slimmed-down version of geometry. The kids have more to keep track of. This may lead to greater confusion and difficulty retaining what they’ve learned.

The other possibility is one I am reminded of by a beer-drinking buddy, a serious mathematician who also teaches math: namely, that I'm a kickass geometry teacher. He bases this assertion on a few short observations of my classes and extensive discussions, fueled by many tankards of ale, of my methods and conceptual approaches (e.g., Real-life coordinate Geometry, Geometry: Starting Off, Teaching Geometry, Teaching Congruence or Are You Happy, Professor Wu?, Kicking Off Triangles, Teaching Trig).

This possibility is a tad painful to contemplate. Fully half the classes I've taught in my four years of teaching—twelve out of twenty-four—have been some form of Algebra, either actual Algebra I or Algebra I pretending to be Algebra II. I spend hours thinking about teaching algebra, about making it more understandable, and I believe I've had some success (see my various posts on modeling).

Six of those 24 classes have been geometry. Now, I spend time thinking about geometry, too, but not nearly as much, and here’s the terrible truth: when I come up with a new method to teach geometry, whether it be an explanation or a model, it works for a whole lot longer than my methods in algebra.

For example, I have used all the old standbys for identifying slope direction, as well as devising a few of my own, and the kids are STILL doing the mental equivalent of tossing a coin to determine if it’s positive or negative. But when I teach my kids how to find the opposite and adjacent legs of an angle (see “teaching Trig” above), the kids are still remembering it months later.

It is to weep.

I comfort myself with a few thoughts. First, it’s kind of cool being a kickass geometry teacher, if that is my fate. It’s a fun class that I can sculpt to my own design, unlike algebra, which has a billion moving parts everyone needs again.

Second, my algebra II kids say without exception that they understand more algebra than they ever did in the past, that they are willing to try when before they just gave up. Even the top kids who should be in a different class tell me they've learned more concepts than before, when they tended to just plug and play. My algebra 2 kids are often taking math placement tests as they go off to college, and I track their results. Few of them are ending up more than one class out of the hunt, which would be my goal for them, and the best are placing out of remediation altogether. So I am doing something right.

And suddenly, I am reminded of my year teaching all algebra, all the time, and the results. My results looked mediocre, yet the school had a stunningly successful year based on algebra growth in Hispanic and ELL students—and I taught the most algebra students, and the most students in those particular categories.

Maybe what I get is what growth looks like for the bottom 75% of the ability/incentive curve.

Eh. I’ll keep mulling that one. And, as always, spend countless hours trying to think up conceptual and procedural explanations that sticks.

I almost titled this post “Why Merit Pay and Value Added Assessment Won’t Work, Part IA” because if you are paying attention, that conclusion is obvious. But after starting a rant, I decided to leave it for another post.

Also glaringly on display to anyone not ignorant, willfully obtuse, or deliberately lying: Common Core standards are irrelevant. I’d be cynically neutral on them because hell, I’m not going to change what I do, except the tests will cost a fortune, so go forth ye Tea Partiers, ye anti-test progressives, and kill them standards daid.


Why Merit Pay and Value Added Assessment Won’t Work, Part I

The year I taught Algebra I, I did a lot of data collection, some of which I discussed in an earlier post. Since I’ve been away from that school for a while, I thought it’d be a good time to finish the discussion.

I’m not a super stats person. I’m not even a mathematician. To the extent I know math, it’s applied math, with the application being “high school math problems”. This is not meant to be a statistically sound analysis, comparing Treatment A to Treatment B. But it does reveal some interesting big picture information.

This data wasn’t just sitting around. A genuine DBA could have probably whipped up the report in a few hours. I know enough SQL to get what I want, but not enough to get it quickly. I had to run reports for both years, figure out how to get the right fields, link tables, blah blah blah. I’m more comfortable with Excel than SQL, so I dumped both years to Excel files and then linked them with student id. Unfortunately, the state data did not include the subject name of each test. So I could get 2010 and 2011 math scores, but it took me a while to figure out how to get the 2010 test taken—and that was a big deal, because some of the kids whose transcripts said algebra had, in fact, taken the pre-algebra (general math) test. Not that I’m bitter, or anything.

Teachers can’t get this data easily. I haven’t yet figured out how to get the data for my current school, or if it’s even possible. I don’t know what my kids’ incoming scores are, and I still haven’t figured out how my kids did on their graduation tests.

So the data you’re about to see is not something teachers or the general public generally has access to.

At my last school, in the 2010-11 school year, four teachers taught algebra to all but 25 of over 400 students. I had the previous year's test scores for about 75% of the kids, 90% of whom had taken algebra the year before, the other 10% or so having taken pre-algebra. This is a slightly modified version of my original graph; I put in translations of the scores and percentages.

[Chart algallocdist: distribution of students' incoming 2010 scores, by teacher]

You should definitely read the original post to see all the issues, but the main takeaway is this: Teacher 4 has a noticeably stronger population than the other three teachers, with over 40% of her class having scored Basic or Higher the year before, usually in Algebra. I’m Teacher 3, with by far the lowest average incoming scores.

The graph includes students for whom I had 2010 school year math scores in any subject. Each teacher has from 8-12 pre-algebra student scores included in their averages. Some pre-algebra kids are very strong; they just hadn't been put in algebra as 8th graders due to an oversight. Most are extremely weak. Teachers are assessed on the growth of kids repeating algebra as well as the kids who are taking it for the first time. Again, 80% of the kids in our classes had taken algebra once. 10-20% had taken it twice (our sophomores and juniors).

Remember that at the time of these counts, I had 125 students. Two of the other teachers (T1 and T4) had just under 100, the third (T2) had 85 or so. The kids not in the counts didn’t have 2010 test scores. Our state reports student growth for those with previous years’ scores and ignores the rest. The reports imply, however, that the growth is for all students. Thanks, reports! In my case, three or four of my strongest students were missing 2010 scores, but the bulk of my students without scores were below average.

So how’d we do?

I limited the main comparison to the 230 students who took algebra both years, had scores for both years, and had one of the four teachers.

[Table scoreimpalg: score improvement for two-year algebra students, by teacher and incoming category]

Here is the growth for the pre-algebra and algebra intervention students–the pre-algebra kids are not part of the above scores, but the algebra intervention group is a subset. These are tiny groups, but illustrative:

[Table scoreimpother: growth for pre-algebra and algebra intervention students]

The individual teacher category gains/slides/pushes are above; here they are in total:
[Table myschooltotcatchg: total category gains, slides, and pushes (2010 category vertical, 2011 horizontal)]

(Arrrggh, I just realized I left off the years. Vertical is 2010, horizontal is 2011.)

Of the 230 students who took algebra two years in a row, the point gain/loss categories went like this:

Score change > +50 points: 57
Score change < -20 points: 27
-20 points ≤ score change ≤ +50 points: 146
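
Expressed as code, just to nail down the boundaries, the three buckets are:

    def bucket(change: int) -> str:
        # Categories from the table above.
        if change > 50:
            return "gain"    # 57 students
        if change < -20:
            return "slide"   # 27 students
        return "push"        # 146 students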

Why the Slice and Dice?

As I wrote in the original post, Teacher 1 and I were positive that Teacher 4 had a much stronger student population than we did—and the data supports that belief. Consequently I suspected that no matter how I sliced the data, Teacher 4 would have the best numbers. But I wanted a much better idea of how I'd done, based on the student population.

Because one unshakeable fact kept niggling at me: our school had a tremendous year in 2010-2011, based largely on our algebra scores. We knew this all throughout the year—benchmark tests, graduation tests—and our end of year tests confirmed it, giving us a huge boost in the metrics that principals and districts cared about. And I’d taught far more algebra students than any other teacher. Yet my numbers based on the district report looked mediocre or worse. I wanted to square that circle.

The district reports the data on the right. We were never given average score increase. A kid who had a big bump in average score was irrelevant if he or she didn't change categories, while a kid who increased 5 points from the top of one category to the bottom of another was a big win. All that mattered were category bumps. From this perspective, my scores look terrible.

I wanted to know about the data on the left. For example Teacher 1 had far better “gain” category numbers than I did. But we had the same mean improvement overall, of 5%, with comparable increases in each category. Broken down further, Teacher 4’s spectacular numbers are accompanied by a huge standard deviation—she improved some kids a lot. The other three teachers might not have had as dramatic a percentage increase, but the kids moved up more consistently. In three cases, the average score declined, but was accompanied by a big increase in standard deviation, suggesting many of the kids in that category improved a bit, while a few had huge drops. Teacher 2 and I had much tighter achievement numbers—I may have moved my students less far, but I moved a lot of them a little bit. None of this is to argue for one teacher’s superiority over another.
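
The quick way to see the mean-versus-spread point is a summary that reports both. A minimal sketch, with made-up records (the point is the shape of the report, not the numbers):

    import pandas as pd

    # Made-up (teacher, score change) records for illustration only.
    df = pd.DataFrame({
        "teacher": ["T1", "T1", "T2", "T2", "T3", "T3", "T4", "T4"],
        "change":  [12, -4, 8, 2, 15, -5, 95, -35],
    })

    # The same mean can hide very different distributions of growth:
    # big mean + big std = a few kids moved a lot; small mean + small std
    # = a lot of kids moved a little.
    print(df.groupby("teacher")["change"].agg(["mean", "std", "count"]))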

Of course, once I broke the data down by initial ability, group size became relevant, but I don't have the overall numbers for each teacher in each category to calculate a confidence interval or a good sample size. I like 10 as a minimum. Eleven of the 18 categories hit that mark.

How many kids have scores for both years?

The 2011 scores for our school show that just over 400 students took the algebra test. My fall 2010 graph above shows 307 students with 2010 scores (in any subject) who began the year. Kick in another 25 for the teacher I didn't include and we had about 330 kids with 2010 scores. My results show 230 kids with algebra scores for both years, and the missing teacher had 18, making 248. Another 19 kids had pre-algebra scores for the first year, although the state's reports wouldn't have cared about that. So 267 of the kids had scores for both years, or about two-thirds of the students tested.

Notice that I had the biggest fall-off in student count. I think five of my kids were expelled before the tests, and another four or so left for alternative campuses. I remember that two went back to Mexico; one moved to his grandparents' in Iowa. Three of my intervention students were so disruptive during the tests that they were ejected, so their test results were not scored (the next year our school had a better method of dealing with disruptive students). Many of the rest finished the year and took the tests, but they left the district over the summer (not sure if they are included in the state reports, but I couldn't get their data). All told, I went from 125 students to 95 by year-end.

What about the teachers?

Teacher 1: TFA, early-mid 20s, Asian, first year teacher. Had a first-class honors master's degree in Economics from one of the top ten universities in Europe. She did her two, then left teaching and is now doing analytics for a fashion firm in a city where "fashion firm" is a big deal. She was the best TFAer I've met, and an excellent new teacher.

Teacher 2: About 60. White. A 20-year teacher who started in English, took time off to be a mom, then came back and got a supplemental math credential. She is only qualified to teach algebra. She is the prototype for the Teacher A I described in my last post, an algebra specialist widely regarded as one of the finest teachers in the district, a regard I find completely warranted.

Teacher 3: Me. 48 at the time, white. Second career, second year teacher, English major originally but a 15-year techie. Went to one of the top-rated ed schools in the country.

Teacher 4: Asian, mid-late 30s. Math degree from a solid local university, teaches both advanced math and algebra. She became the department head the next year. The reason her classes are top-loaded with good students: the parents request her. Very much the favorite of administration and district officials.

And so, a Title I school, predominantly Hispanic population (my classes were 80% Hispanic), teachers that run the full gamut of desirability—second career techie from a good ed school, experienced pro math major, experienced pro without demonstrated higher math ability, top-tier recent college grad.

Where was the improvement? Case 1: Educational Policy Objectives

So what is “improvement”? Well, there’s a bunch of different answers. There’s “significant” improvement as researchers would define it. Can’t answer that with this data. But then, that’s not really the point. Our entire educational policy is premised on proficiency. So what improvement does it take to reach “proficiency”, or at least to change categories entirely?

Some context: In our state, fifty points is usually enough to move a student from the bottom of one category to the bottom of another. So a student who was at the tip top of Below Basic could increase 51 points and make it to the bottom of Proficient, which would be a bump of two categories. An increase of 50 points is, roughly, a 17% increase. Getting from the bottom of Far Below Basic to Below Basic requires an increase of 70%, but since the kids were all taking Algebra for the second time, the boost needed to get them from FBB to BB was a more reasonable 15-20%. To get from the top of the Far Below Basic category to Proficient—the goal that we are supposed to aim for—would require a 32% improvement. Improving from top of Basic to bottom of Advanced requires a 23% improvement.
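
(A quick sanity check on that 17% figure, assuming a hypothetical score around 300; the real scale and cut points vary by test:)

    # Hypothetical score of 300 -- the real scale and cut points vary by test.
    current, bump = 300, 50
    print(f"{bump / current:.0%}")   # ~17%: the one-band, 50-point gain cited above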

Given that context, only two teachers, in one category each, moved the needle enough to even think about those kinds of gains—and both categories had 6-8 students. Looking at categories with at least ten students, none of the teachers had average gains that would achieve our educational policy goals. In fact, from that perspective, the teachers are all doing roughly the same.

I looked up our state reports. Our total population scoring Proficient or Advanced increased 1%.

Then there’s this chart again:

[Table myschooltotcatchg, again: total category gains, slides, and pushes]

32 students moved from “not proficient” to “proficient/advanced”. 9 students moved from “proficient” to “advanced”. I’ll throw them in. 18% of our students were improved to the extent that, officially, 100% are supposed to achieve.

So educational policy-wise, not so good.

Where was the improvement? Case 2: Absolute Improvement

How about at the individual level? The chart helps with that, too:

[Table myschooltotcatchg, once more: total category gains, slides, and pushes]

Only 18 students were "double gainers," moving up two categories instead of one. Twelve of those students belonged to Teacher 4; 4 belonged to Teacher 1; Teacher 2 and I had one each (although I had two more that just missed by under 3 points). Teachers 1, 2, and 3 had one "double slider" each, who dropped two categories.

(I interviewed all the teachers on the double gainers; in all cases, the gains were unique to the students. The teachers all shrugged—who knew why this student improved? It wasn’t some brilliant aha moment unique to that teacher’s methods, nor was it due to the teacher’s inspiring belief and/or enthusiasm. Two of the three echoed my own opinion: the students’ cognitive abilities had just developed over the past year. Or maybe for some reason they’d blown off the test the year before. I taught two of the three “double sliders”—one was mine, one I taught the following year in geometry, so I had the opportunity to ask them about their scores. Both said “Oh, yeah, I totally blew off the test.” )

So a quarter of the students had gains sufficient to move from the middle of one category to the middle of another. The largest improvement was 170 points, with about 10 students seeing a >100 point improvement. The largest decline was 169 points, with 2 students seeing over a 100 point decline. Another oddity: only one of these two students was a "double slider". The other two "double sliders" had less than 100 point declines. My double slider had a 60 point decline; my largest point decline was 89 points, but that student only dropped one category.

However, the primary takeaway from our data is that 63% of the students forced to take algebra twice were, score-wise if not category-wise, a “push”. They dropped or gained slightly, may have moved from the bottom of one category to the middle of the same, or maybe from the top of one category to the bottom of another.

One might argue that we wasted a year of their lives.

State reports say our average algebra score from 2010 to 2011 nudged up half a point.

So it’s hard to find evidence that we made much of a difference to student achievement as a whole.

I know this is a long post, so I’ll remind the reader that all of the students in my study have already taken algebra once. Chew on that for a while, will you?

Where was the improvement? Case 3: Achievement Gap

I had found no answer to my conundrum in my above numbers, although I had found some comfort. Broken down by category, it’s clear I’m in the hunt. But the breakdown doesn’t explain how we had such a stupendous year.

But when I thought of comparing our state scores from year to year, I got a hint. The other way that schools can achieve educational policy objectives is by closing the achievement gap.

All of this data comes from the state reports for our school, and since I don’t want to discuss who I am on this blog, I can’t provide links. You’ll have to take my word for it—but then, this entire post is based on data that no one else has, so I guess the whole post involves taking my word for it.

Group        2010-11 Change
Overall      +0.5
Whites       -7.2
Hispanics    +4
EcDis Hisp   -1
ELL          +7

Wow. Whites dropped by seven points, Hispanics overall increased by 4, and non-native speakers (almost entirely Hispanic and economically disadvantaged) increased by 7 points.

So clearly, when our administrator was talking about our great year, she was talking about our cleverness in depressing white scores whilst boosting Hispanics.

Don’t read too much into the decline. For example, I personally booted 12 students, most of them white, out of my algebra classes because they’d scored advanced or proficient in algebra the previous year. Why on earth would they be taking the subject again? No other teacher did this, but I know that these students told their friends that they could get out of repeating Algebra I simply by demanding to be put in geometry. So it’s quite possible that much of the loss is due to fewer white advanced or proficient students taking algebra in the first place.

So who was teaching Hispanics and English Language Learners? While I can’t run reports anymore, I did have my original file of 2010 scores. So this data is incoming students with 2010 scores, not the final 2011 students. Also, in the file I had, the ED and ELL overlap was 100%, and I didn’t care about white or black EDs for this count. Disadvantaged non-ELL Asians in algebra is a tiny number (hell, even with ELL). So I kept ED out of it.

Teacher   Hisp   ELL
T1        30     21
T2        32     38
T3        48     37
T4        39     12

Well, now. While Teacher 4 has a hefty number of Hispanics, very few of them are poor or ELLs. Teacher 2 seems to have Asian ELLs in addition to Hispanic ELLs. I have a whole bunch of Hispanics, most of them poor and ELL.

So I had the most mediocre numbers, but we had a great year for Hispanic and ELL scores, and I had the most Hispanic and ELL students. So maybe I was inadvertently responsible for depressing white scores by booting all those kids to geometry, but I had to have something to do with raising scores.

Or did I? Matthew DiCarlo is always warning against confusing year-to-year score comparisons, which are cross-sections of data at a point in time, with comparisons of the same students' progress at two different points in time. In fact, he would probably say that I don't have a conundrum: it's quite possible for me to have been a crappy teacher who had minimal impact on student achievement compared point to point, while the school's "cross-section" data, which doesn't compare students directly, could have some other reason for the dramatic changes.

Fair enough. In that case, we didn’t have a great year, right? It was just random happenstance.

This essay is long enough. So I'll leave anyone interested to explain why this data shows that merit pay and value added scores are pointless. I'm not sure when I'll get back to it, as I've got grades to do.


Spring 2013: These students aren’t really prepared, either.

I’m teaching Geometry and Algebra II again, so I gave the same assessment and got these results, with the beginning scores from the previous semester:

[Table AlgAssessspr13: algebra pre-assessment results, spring 2013 classes vs. fall]

I’m teaching two algebra II classes, but their numbers were pretty close to identical—one class had the larger range and a lower mode—so I combined them.

The geometry averages are significantly lower than the fall freshmen-only class, which isn't surprising. Kids who move on to geometry from 8th grade algebra are more likely to be stronger math students, although (key plot point) in many schools, the difference between moving on and staying back in algebra comes down to behavior, not math ability. At my last school, kids who didn't score Proficient or Advanced had to take Algebra in 9th grade. I'd have included Basic kids in the "move-on" list as well. But sophomores who not only can't factor or graph a line, but struggle with simple substitution ought not to be in second year algebra. They should repeat algebra I freshman year, go on to geometry, and then take algebra II in junior year—at which point, they'd still be very weak in algebra, of course, but some would have benefited from that second year of first year.

Wait, what was my point? Oh, yeah–this geometry class is 10-12, so the students took one or more years of high school algebra. Some of them will have just goofed around and flunked algebra despite perfectly adequate to good skills, but a good number will also be genuinely weak at math.

On the other hand, a number of them really enjoyed my first activity: visualizing intersecting planes, graphing 3-D points. I got far more samples from this class. I'll put those in another post, along with the precalc assessment.

I don’t know if my readers (I have an audience! whoo!) understand my intent in publishing these assessment results. In no way am I complaining about my students.

My point in a huge nutshell: how can math teachers be assessed on “value-added” when the testing instrument will not measure what the students needed to learn? Last semester, my students made tremendous gains in first year algebra knowledge. They also learned geometry and second year algebra, but over half my students in both classes will test Below Basic or Far Below Basic–just as they did the year before. My evaluation will faithfully record that my students made no progress—that they tested BB or FBB the year before, and test the same (or worse) now. I will get no credit for the huge gains they made in pre-algebra and algebra competency, because educational policy doesn’t recognize the existence of kids taking second year algebra despite being barely functional in pre-algebra.

The reformers’ response:

1) These kids just had bad teachers who didn’t teach them anything, and in the Brave New World of Reform, these bad teachers won’t be able to ruin students’ lives;

2) These bad teachers just shuffled students who hadn’t learned onto the next class, and in the Brave New World of Reform, kids who can’t do the work won’t pass the class.

My response:

1) Well, truthfully, I think this response is moronic. But more politely, this answer requires willful belief in a delusional myth.

2) Fail 50-60% of kids who are forced to take math classes against their will? Seriously? This answer requires a willful refusal to think things through. Most high schools require a student to take and pass three years of math for graduation. Fail a kid just once, and the margin for error disappears. Fail twice and the kid can’t graduate. And in many states, the sequence must start with algebra—pre-algebra at best. So we are supposed to teach all students, regardless of ability, three years of increasingly abstract math and fail them if they don’t achieve basic proficiency. If, god save us, the country was ever stupid enough to go down this reformer path, the resulting bloodbath would end the policy in a year. We’re not talking the occasional malcontent, but over half of a graduating class in some schools—overwhelmingly, this policy impacts black and Hispanic students. But it’s okay. We’re just doing it for their own good, right? Await the disparate impact lawsuits—or, more likely, federal investigation and oversight.

Reformers faithfully hold out this hope: bad teachers are creating lazy students who could do the work but just don’t want to. Oh, yeah, and if we catch them in elementary school, they’ll be fine in high school.

It is to weep.

Hey, under 1000 words!

