False Positives

I quit writing about tests. And test prep.  Five, six, years ago? I still taught test prep until this year, always giving in to my old employer’s pleas to teach his Saturday classes. But I largely quit the SAT after the last changes, focusing on the ACT. I still love tests, still enjoy coaching kids for the big day.

Explaining why has been a task I’ve avoided for several years, as the doubt is hard to put into words. 

It was an APUSH review course, the last one I taught, I think. Class hadn’t started for the day, but one of my five students was sitting there highlighting notes. She was a tiny little thing, perky and eager but not intellectually remarkable and it was March of what would have been her junior year.

“This is my last test prep course. I’ve taken the SAT for the fourth time, took AP Calculus BC last year, and I’m all done.”

“Yay! How’d you do on the SAT?”

“2400,” she said, casually. “I got 2000 the first time, but I spent the whole summer in two prep courses, plus over Christmas.”


Like I said, she was….ordinary. Bright, sure. But her APUSH essays were predictable, regurgitating the key points she’d read in the prep material–pedestrian grammar, too many commas. Her lexile level was unimpressive. Nothing terrible. I gave her some tips. 

This girl had placed in the 99th percentile for the SAT but couldn’t write a grammatically complex sentence, much less an interesting one. Couldn’t come up with interesting ways to use data (graphs, statistics). Couldn’t accurately use the words she’d memorized and didn’t understand their nuance in reading text.

She was a false positive.

I’ve known a lot of high scoring students of every ethnicity over the years–and by high scoring, I mean 1400-1600 on the 1600 SAT, and 2200-2400 during the 10 years of the three-section test. 5s on all AP tests, 700+ on all Subject tests. Until that conversation, I would have said kids with high test scores were without exception tremendously impressive kids: usually creative, solid to great writing, opinionated, spotted patterns, knew history, knew the underlying theory of anything that interested them. I could see the difference, I’d say, between these kids and those slightly lower on the score scale–the 1200s, the kids who were well rounded with solid skills who were sometimes as impressive, sometimes not, sometimes a swot, sometimes a bright kid who didn’t see much point in striving.

Every time saying it, though, I’d push back memories of a few kids who’d casually mentioned a 5 score, or a 1600 or 2400, that took me aback. That particular kid who didn’t seem all that remarkable for such a high score. But in all these cases, I was only relying on gut instinct and besides, disappointingly high IQ folks exist. For every Stephen Hawking there’s a Ron Hoeflin. Or a Marilyn vos Savant, telling us whether or not larks are happy. Surely the test would sometimes capture intellect that just wasn’t there in the creative original ways I looked for. Or hey, maybe some of those kids were stretching the truth.

But here, I had my own experience of her work and her scores were easily confirmable, as my employer kept track (her name was on the “2400 list”, the length of which was another shock to my prior understanding). She got a perfect score despite being a banal teen who couldn’t write or think in ways worthy of that score.

Since that first real awareness, I’ve met other kids with top 1% test scores who are similarly…unimpressive. 98+ percentile SAT scores, eight 5 AP scores, and a 4.5 GPA with no intellectual depth, no ability to make connections, or even to use their knowledge to do anything but pick the correct letter on the multiple choice test or regurgitate the correct answer for a teacher. For some I could confirm the high scores; for others I just trusted my gut, now that I’d validated my instinct. These are kids with certainly decent brains, but not unusually so. No shame in that. But no originality, not even the kind I’d expect from their actual abilities. No interest in anything but achieving high scores, without any interest in what that meant.

It probably won’t come as a shock to learn that all the kids with scores much higher than demonstrated ability were born somewhere in east Asia, that they all spent months and months learning how to take the test, taking practice tests, endlessly prepping.

The inverse doesn’t hold. I know dozens, possibly hundreds, of exceptional Asian immigrants with extraordinary brains and the requisite intellectual depth and heft I would expect from their profile of perfect SAT scores and AP Honors status. But when I am shocked at a test score that is much higher than demonstrated ability, the owner of that score is Chinese or Korean of recent vintage. 

I don’t know whether American kids (of any race) could achieve similar scores if they swotted away endlessly. Maybe some of them are. But my sample size of all races is pretty high, and I’ve not seen it.  On the other hand, I’m certain that very few American kids would find this a worthwhile goal. 

Brief aside: when I taught ELL, I had a kid who was supposedly 18. That’s what his birth certificate said, although there’s a lot of visa fraud in Chinese immigrants, so who knows. He didn’t look a day older than fourteen. And he had very little interest in speaking or learning English. Maybe he was just shy, like Taio, although I’d test him every so often by offering him chocolate or asking him about his beloved bike and he showed no sign of comprehension. But then he’d ace multiple choice reading passages. Without reading the passage. He had no idea what the words meant, but he’d pick the right A, B, or C, every time. I mentioned this to the senior ELL teacher, a Chinese American, and she snorted, “It’s in our genes.”

I don’t think she was kidding but the thing is, I don’t much care how it happens. If American kids are doing this, then it changes not a whit about my unhappiness. It’s not a skill I want to see transferred to the general teenage American population. (That said, the college admissions scandal makes it pretty clear that, as I’ve said many times, rich parents are buying or bribing their way in, not prepping. And unsurprisingly, it appears that Chinese parents were the biggest part of his business.)

Now, before everyone cites data that I probably know better than they do, let me dispense with the obvious. Many people think test prep doesn’t work at all. That was never my opinion. When people asked me if test prep “worked”, I’d always say the same thing: depends on the kid. “Average score improvement” is a useless metric; some kids don’t improve, some improve a bit, some improve a huge amount. Why not pay to see if your kid improves a lot? But I also felt strongly that test prep couldn’t distort measured ability to beyond actual ability, and I no longer believe that.

But I didn’t believe what critics at the time said, that test prep worked…..too well. I didn’t believe that false positives were a real problem. And the terrible thing–at least to me–is that I still believe normal test prep is a good thing. Distortion of ability, however, is not.

As the push to de-emphasize tests came, as test-entry high schools came under attack, as colleges turned to grades only–a change I find horrifying–I could no longer join the opposition because the opposition focused their fire almost exclusively on their dismay at the end of meritocracy and the concomitant discrimination against Asian immigrants. I oppose the discrimination, but I no longer really believe the tests we have reliably reveal merit to a granular degree. The changes I want to see in the admissions process would almost certainly reduce Asian headcount not by design, but by acknowledging that specific test scores aren’t as important.

I have other topics I’ve been holding off discussing:

  • why I support an end to test-based high schools in their current form
  • why we still need tests
  • how the SAT changes made all this worse
  • how the emphasis on grades for the past 20 years has exacerbated this insanity
  • why we need to stop using hard work as a proxy for merit

But I needed to try, at least,  to express how my feelings have changed. This is a start. It’s probably badly written, but as you all know, I’ve been trying to write more even if the thoughts aren’t fully baked, so bear with me.




12 responses to “False Positives”

  • Polynices

    Great post. Been reading you for years but I especially appreciate this post at this time because my oldest is now a senior in HS and trying to figure out the college stuff.

  • Veracitor

    Thank you! I’m a long-time reader and very interested to read more of your views especially on the topics you just teased. Don’t sweat polishing your prose too much. I’m old enough and by happenstance come from a key part of the country to have observed ‘in the field’ a lot of what you discuss in the history of evolution of attitudes and actions toward standardized testing in high schools and colleges by students, parents, and administrators of various backgrounds and personal goals especially as mix of student families has changed due to demographic shifts notably though not only by immigration from East Asia.

  • Calvin Hobbes

    Very interesting. Thanks.

    Probably related:

    2020 SAT Scores: Asian Supremacy Intensifies


    “And I, for one, welcome our new Asian overlords. I’d like to remind them as a trusted Internet personality, I can be helpful in rounding up others to toil in their underground test prep caves.”

    I’ve read that test prep supposedly does not make much difference, but I suspected that it can make a real difference if it’s done the way many Asians do it. It sounds like you think so, too (with more evidence than I had).

    I’ll be interested in what you have to say about the SAT being made worse.

    I think the opposition to the SAT is driven by the racial gaps, and it would be in the interest of the ETS to make the test worse if that reduces the racial gaps a bit.

    I think Charles Murray says we should use achievement tests instead of the SAT.

    It would be nice if we used tests which are such that prepping for them leads to worthwhile learning. For math, I think that’s true of the AMC tests. Those would be too hard for most students, but I don’t know why the elite colleges don’t use those.

    • Eric Brown

      @Calvin – I think it’s perfectly obvious why the “elite” colleges don’t use the AMC tests. They’re not interested in ability. They never have been.

      Caltech and MIT might be interested in the AMC tests, although I admit I haven’t followed MIT/Caltech internal politics. It’s entirely possible they’ve decided that equity is more important.

  • Yancey Ward

    This is the shallow thinker/deep thinker divide. The college admissions tests no longer really allow one to accurately assign a student to one or the other of the two groups is what this sounds like to me.

    I was a chemist in the pharma business for a couple of decades, and as such I interviewed prospective job candidates just out of their masters or PhD programs. An older colleague warned me when I first started my career not to get snowed by candidates that could regurgitate data or procedures without any understanding of what it all actually meant. He gave me a few examples of how to probe that deeper understanding of, in my case, organic chemistry–things I might not have thought to ask a candidate about because my assumption would have been this was “common knowledge” and might insult the candidate to be asked about it. I was wrong in that assumption–badly wrong.

  • JT McQuitty

    Some of the worst things in school that ever happened to me, were answered prayers – being admitted to a “reach” school or class.

  • Ruth

    Despite your claim that this post is badly written, I personally find it fascinating (again, same as with numerous other posts that you’ve written in the past), especially with the reinvigorated debate over the value of test scores now that the University of California has gotten rid of them.

    I think your point about your student displaying a lack of intellectual depth and curiosity relative to her scores is perfectly valid, but in my opinion such a mismatch should not be surprising. Tests like the SAT merely provide a snapshot of the test taker’s abilities in reading comprehension, vocabulary, and basic algebra and geometry – nothing more, nothing less. It can’t distinguish between one student who naturally acquired a large vocabulary of his/her own accord by reading voraciously and being exposed to a variety of different works and authors, and another student who simply memorized lists upon lists of vocabulary words in isolation. And as a student of East Asian heritage who also got a very good score on the reading/writing sections (I took it back when it was on the 2400 scale), I can say that the reading comprehension section is actually extremely straightforward: the correct answer is almost always the one that is supported by a direct quote from the text. In many literary works, there are often spirited debates among scholars about historical context, alternative interpretations and theories, and meanings/subtext that can only be perceived by reading between the lines, but none of that is present on the SAT. In my view, the SAT and others like it are merely skill-based tests that reveal little about the student’s intangible qualities such as self-motivation, curiosity, and openness to ideas, which is why many selective colleges resort to teacher recommendations and student essays in order to gain insight into these areas.

  • Jackson Jules

    An article that is relevant to this topic is “How Not to Pick a Physicist”

    Click to access glanz-how-not-to-pick-a-physicist.pdf

    Most of the article is standard liberal hand-wringing over test scores. But there’s this interesting section where the author talks about the dubious validity of the Physics GRE when applied to Chinese international students.

    “In a field as obsessed with numbers as physics is, here are two whose disparity can hardly be overlooked: 618 and 851. The first is the mean score earned by U.S. residents between 1992 and 1995 on the Graduate Record Examination (GRE) subject test for physics, a standardized test required for admission to most physics graduate programs in the United States. The second is the mean score earned over the same period by residents of the People’s Republic of China (PRC), who account for about 6% of Ph.D.s awarded in all fields by U.S. universities each year. On a test that has possible scores of between 200 and 990, says Howard Georgi, a physicist at Harvard University, ‘the difference between the PRC and everybody else is incredibly dramatic.’

    The disparity also underscores the caution with which GRE scores have to be approached, say faculty at physics graduate programs (see main text). One factor in the high scores, they say, has to be the huge pool of highly motivated, well-trained students in the PRC. But physics faculty also say that Chinese students’ overwhelming advantage on the test doesn’t seem to be reflected in other measures of physics ability. An educational system focused on exam-taking and the existence of poorly regulated “coaching” classes for the physics GRE may have inflated the PRC scores, say some observers. And Science has learned of another factor that may have played some role in the past: widespread security breaches, which culminated in October 1992, when exam booklets were widely leaked to students in the PRC before the physics exam was given.

    No one denies that dozens of top-notch physicists have emerged from the PRC in the 1990s. J. Woods Halley, a physicist at the University of Minnesota who has served on the physics GRE advisory committee, suspects that the effort to identify “unfair” advantages is driven in part by nationalistic bias–“a certain feeling that [PRC students] can’t be that good.” He points out that “these are very, very able students out of a huge pool.” Xueqiao Xu, a physicist at the Lawrence Livermore National Laboratory, suggests that the Chinese educational system may also foster strength in physics, since students in the PRC gain “a very strong mathematics and physics background” as early as elementary school.

    But while “there are a lot of good physics students in the PRC,” says Georgi, his experience with graduate admissions suggests that those students “are certainly nowhere near as good at physics as they are at taking the GRE subject test.” The GRE scores, says Jack Mochel, professor and associate head of physics at the University of Illinois at Urbana-Champaign, “are no indication of how [PRC students] will do in graduate school.”

    Coaching for the test may explain part of the discrepancy, says Neal Abraham, a physicist at Bryn Mawr College who has also served on the physics GRE committee. And in a recent essay published on the Internet, he cited another factor: “Chinese students report that books of prior exams and exam questions are compiled by test takers and are available for study”–testimony that one former student from the vicinity of Xian province confirmed to Science. While American physics students also benefit from “practice” multiple-choice exams, the actual questions from old GREs are not made available. Since some questions are reused, extensive use of such materials could give test-takers an unfair advantage.

    That practice peaked in October 1992, according to sources including the Educational Testing Service (ETS) in Princeton, New Jersey, which produces the exam. At the time, “we occasionally reused [entire] test forms,” says Jacqueline Briel, associate program director for the GRE, and it turned out that copies of a previous exam using the same form were circulating among students planning to take the test. The leak was discovered when a student complained to ETS, Halley says. “He said this test wasn’t fair because he hadn’t seen it beforehand as his friends had. This caused real earthquakes at ETS.”

    In response, the testing service voided those test results and halted physics GRE testing in the PRC for a year while it made changes to improve the test’s security, says Briel. The test is now given in China only once a year with a fresh form each time. She believes that uncaught security problems are now rare, noting that “aberrations” in the scores have not recurred. Even so, she stresses that admissions committees should also rely heavily on other information about a candidate–course grades, recommendations, and any knowledge of a candidate’s personal motivations and English skills, for example. Xu agrees. “First you want a high score,” he says. “But if you have [enough] time and manpower, interview these candidates.” Only by going beyond the numbers, Xu says, can admissions officers identify which candidates are prepared to make the great cultural leap from Chinese academics to the research community in the United States.”

  • Celebrating the Decennium | educationrealist

    […] as I wrote recently, I’ve been having some trouble organizing my thoughts to set the groundwork for future […]

  • Murray/Sailer on Powerline Podcast | educationrealist

    […] Asian test prep that goes on for years and years, not a few weeks, sets up what I believe are false positives but we can argue that point […]
