I have a bunch of different posts in the hopper right now, but after starting a mammoth comment on this brand new E. D. Hirsch post (Welcome to blogging, sir!), I decided to convert it to a post—after all, I need the content. (Well, it was brand new when I started this post, anyway.)
Hirsch is making a larger point about Samuel Messick’s concern with consequential validity versus construct validity, but he does so using the history of the SAT. In the 80s, says Hirsch, the ETS devised a multiple-choice-only method of testing writing ability, which was more accurate than an essay test. Writing quality declined, he implies, because students concluded that writing wasn’t important. Thanks to Messick, though, the SAT finally included a writing sample in its 2005 changes.
I have nothing more than a layman’s understanding of construct vs. consequential validity, and Hirsch’s expertise in the many challenges of assessing writing ability is unquestioned, not least by me. But I know a hell of a lot about the SAT, and what he writes here just didn’t match up with what I knew. So I went looking to confirm my knowledge and fill in any gaps.
First, a bit of actual SAT writing assessment history:
- By 1950, the CEEB (the College Board’s original name) had introduced the English Composition Achievement Test. The original test had six sections: three multiple choice, three essay (or free response). The CEEB began experimenting with a full 2-hour essay the next year, and discontinued that in 1956. At that point, I believe, the test was changed to a 100-question, multiple-choice-only format. (Cite for most of this history; here’s a second cite, but you need to use the magnifying glass option.)
- In 1960, the CEEB offered an unscored writing sample to be taken at the testing center, at the universities’ request, which would be sent on to the schools for placement scoring. (I think this was part of the SAT, but can’t be sure. Anyone have a copy of “The Story Behind the First Writing Sample”, by Fred Godshalk?)
- In 1963, the English Composition Achievement Test was changed to its most enduring form: a 20 minute essay, followed by a 40-minute multiple choice section with 70 questions.
- In 1968, the CEEB discontinued the unscored writing sample, again at the universities’ request. No one wanted to grade the essays.
- In 1971, the CEEB discontinued the essay in the ECAT, citing cost concerns.
- In 1974, the SAT was shortened from 3 hours to 2 hours and 45 minutes, and the Test of Standard Written English was added. The TSWE was multiple choice only, with questions clearly similar to the English Composition Achievement Test’s. Its score was not included in the SAT score, but was reported to colleges separately, to be used for placement.
- In 1976, in response to complaints, the essay version of the ECAT was reinstated. (It may or may not be significant that four years later, the ETS ran its first deficit.) From what I can tell, the ECAT and the TSWE process remained largely unchanged from 1976 through 1994. This research paper shows that the essay was part of the test throughout the 80s.
- In 1993, all achievement tests were rebranded as SAT II; the English Composition Achievement Test was renamed to the SAT II Writing exam. At some point, the SAT II was shortened from 70 to 60 questions, but I can’t find out when.
- In 1994, there were big changes to the SAT: an end to antonyms, calculators allowed, free-response questions in math. While the College Board had originally intended to add a “free response” to the verbal section (that is, an essay), pressure from the University of California, the SAT’s largest customer, forced it to back down (more on this later). At this time, the TSWE was discontinued. Reports often said that the SAT Writing exam was “new”; I can find no evidence that the transition from the ECAT to the SAT II was anything but seamless.
- In 1997, the College Board added a writing section to the PSAT that was clearly derived from the TSWE.
- In 2005, the College Board added a writing section to the SAT. The writing section has three parts: one 25 minute essay and two multiple choice sections for a total of 49 questions. The new writing test uses the same type of questions as the ECAT/SAT II, but the essay prompt is simpler (I can personally attest to this, as I was a Kaplan tutor through the transition).
- By the way, the ACT never required an essay until 2005, when compliance with UC’s new requirement forced it to add an optional essay.
I’m sure only SAT geeks like me care about this, but either Hirsch is wrong or all my links are wrong or incomplete. First, even with his link, I can’t tell what he’s referring to when he says “ETS devised a test…”. A few sentences before, he places the date as the early 80s. The 80s were the one decade of the past five in which the College Board made no changes to any of its writing tests. So what test is he referring to?
I think Hirsch is referring to the TSWE, which he apparently believes was derived in the early 80s, that it was a unique test, and that the College Board replaced the TSWE with the required essay in 2005. This interpretation of his errors is the only way I can make sense of his explanation.
In that case, not only are his facts wrong, but this example doesn’t support his point. The SAT proper did not test written English for admissions. The TSWE was intended for placement, not admissions. Significantly, the ACT was starting to pick up market share during this time, and the ACT has always had an excellent writing test (multiple choice, no essay). Without the TSWE, the SAT lacked a key element the ACT offered, and saying “Hey, just have your students pay to take this extra test” gave the ACT an even bigger opening. This may just possibly have played into the rationale for the TSWE.
Colleges that wanted an SAT essay test for admissions (as opposed to placement) had won that battle with the English Composition Achievement Test. The CEEB bowed to the pressures of English teachers not in 2005, but in 1963, when it put the essay back into the ECAT despite research showing that essays were unreliable and expensive. After nine years of expense the CEEB believed to be unnecessary, it tried again to do away with the essay, but the same pressures forced it to use the essay on the English Composition Achievement Test/SAT II Writing Test from 1976 to 2005, when the test was technically discontinued, but actually shortened and incorporated into the SAT proper as the SAT Writing test. Any university that felt strongly about using writing for admissions could just require the ECAT. Many schools did, including the University of California, Harvard, Stanford, and most elite schools.
The College Board tried to put an essay into the test back in the 90s, but was stopped not because anyone was concerned about construct or consequential validity, but because its largest customer, the University of California, complained and said it would stop using the SAT if an essay was required. This struck me as odd at first, because, as I mentioned, the University of California has required that all applicants take the English Composition Achievement Test since the early 60s. However, I learned from the link that Achievement Test scores weren’t used as an admissions metric until later in the 90s. In 1994, UC was using affirmative action, so it wasn’t worried about blacks and Hispanics. Asians, on the other hand, had reason to be worried about an essay test, since UC had already been caught discriminating against them, and UC clearly felt some placation was in order. Later, after the affirmative action ban, UC did a 180 on the essay, requiring that an essay be added to the SAT in 2005.
Why did the College Board want to put an essay in the SAT in 1994, and why did UC change its position 11 years later? My opinion: by then the College Board was getting more efficient at scoring essays, and the ECAT/SAT II Writing wasn’t catching on with anyone other than elite schools and UC. If the Writing test was rolled into the SAT, the College Board could charge more money. During the 90s we saw the first big push against multiple choice tests in favor of “performance-based assessments” (Hirsch has a whole chapter in one of his books about these misconceptions), giving the College Board a perfect rationale for introducing an essay and charging a lot more money. But UC nixed the essay until 2002, when its list of demands to the College Board called for removing analogies and quantitative comparisons, and—suddenly—demanded that the writing assessment be rolled into the main SAT (page 15 of the UC link). I can see no reason for this—at that time, UC still required Subject tests, so why couldn’t applicants take the writing test when they took their other two Subject tests? The only reason—and I mean the only reason—I can see for rolling the writing test into the main SAT comes down to profit: the change made the College Board a hell of a lot of money.
Consider: the College Board already had the test, so no development costs beyond dumbing the test down for the entire SAT population (fewer questions, more time for the essay). So a test that only 10% of the testing population paid for could now be sold to 100% of the testing population. The 2005 SAT was both longer (in time) and shorter (in total questions), and a hell of a lot more expensive. Win win.
So UC’s demand gave the College Board cover. Fair’s fair, since UC had no research rationale whatsoever in demanding the end to analogies and quantitative comparisons, changes that would cost the College Board a great deal of money. Everyone knows that California’s ban on affirmative action has made UC very, very unhappy and if I were to assert without foundation that UC hoped and believed that removing the harder elements of the SAT would reduce the achievement gap and enable the university to admit more blacks and Hispanics, well, I’d still get a lot of takers. (Another clue: UC nearly halved the math test burden requirement at the same time—page 16 of the UC link.) (Oh, wait—Still another clue: Seven years later, after weighting the subject tests more heavily than the SAT and threatening to end the SAT requirement altogether, UC ends its use of….the Subject tests. Too many Asians being “very good at figuring out the technical requirements of UC eligibility”.)
So why does any of this matter?
Well, first, I thought it’d be useful to get the history in one place. Who knows, maybe a reporter will use it some day. Hahahahaha. That’s me, laughing.
Then, Hirsch’s assertion that the “newly devised test”, that is, the TSWE, led to a great decline in student writing ability is confusing, since the TSWE began in 1974 and was discontinued twenty years later. So when did student writing ability decline? I’ve read before now that the seventies, not the eighties, saw writing nearly disappear from the high school curriculum (but certainly Hirsch knows about Applebee, way more than I do). If anything, writing instruction has improved, but capturing national writing ability is a challenge (again, not news to Hirsch). So where’s the evidence that student writing ability declined over the lifetime of the TSWE, 1974-1994? And where’s the evidence that writing ability has improved since the SAT achieved “consequential validity”?
Next, Hirsch’s history ignores the ECAT/SAT II Writing test, which offers excellent research opportunities on the impact of consequential validity. Given that UC has required a test with an essay for 50 years, Hirsch’s reasoning implies that California students, having faced an essay test, would have a stronger writing curriculum and stronger writing abilities. Moreover, any state university that wanted to improve its students’ writing ability could simply have required the ECAT/SAT Writing test—yet I believe UC was the only public university system in the country with that requirement. For that matter, several states require all students to take the ACT, but not the essay. Perhaps someone could research whether Illinois and Colorado (ACT required) have a weaker writing curriculum than California.
Another research opportunity might involve a comparison between the College Board’s choices and those driving American College Testing, creator of the ACT and the SAT’s only competition. I could find no evidence that the ACT was subjected to the on-again, off-again travails of the College Board’s English/Writing essay/no essay test. Not once did the College Board point to the ACT and say to all those teachers demanding an essay test, “Hey, these guys don’t have an essay, so why pick on us?” The ACT, from what I can see, never got pressured to offer an essay. This suggests, again, that the reason for all the angst over the years came not from dissatisfaction with the TSWE, but rather the Achievement/SAT II essay test, and the College Board’s varying profit motives over the years.
Finally, Hirsch’s example also assumes that the College Board, universities, high school teachers, and everyone else in 2005 were thinking about consequential or construct validity in adding the essay. I offer again my two unsupported assertions: The College Board made its 1994 and 2005 changes for business reasons. The UC opposed the change in 1994 and demanded it in 2005 for ideological reasons, to satisfy one of its various identity groups. Want to argue with me? No problem. Find me some evidence that UC was interested in anything other than broadening its admissions demographic profile in the face of an affirmative action ban, and any evidence that the College Board made the 2005 changes for any other reason than placating UC. Otherwise, the cynic’s view wins.
On some later date, I’ll write up my objections to the notion that the essay test has anything to do with writing ability, but they pulled the focus so I yanked them from this post.
By the way, I have never once met a teacher, except me, who gives a damn about helping his or her students prepare for the SAT. Where are these teachers? Can we take a survey?
Every so often, I wonder why I spend hours looking up data to refute a fairly minor point that no one really cares about in the first place and yes, this is one of those times. But dammit, I want things like this to matter. I don’t question Hirsch’s goals and agree with most of them. But I am bothered by the simplification or complete erasure of history in testing, and Hirsch, of all people, should value content knowledge.
Yeah, I did say “brief”, didn’t I? Sorry.
January 13th, 2013 at 4:47 pm
Thanks for your very credible and creditable correction.
Here’s what I should have said about the use of non-writing tests to test writing competence. The ETS had found from its research that a rather short multiple-choice test was a more reliable predictor of actual writing skill than a short writing sample.
They based this assessment not only on their experience with grading writing samples, but also on some exhaustive research. In one experiment, Diederich and his colleagues took 300 student papers and had them graded by English teachers, lawyers, editors, and business executives. The highest correlation within each group of graders was .4. Of the 300 papers graded, 101 received every grade from 1 to 9. Ninety-four percent received from 7 to 9 different grades. No essay received fewer than 5 different grades. (“Factors in Judgments of Writing Ability,” by Diederich, French, and Carlton, ETS, 1961.) This led Godshalk et al. to test out indirect methods that could prove more consistent and accurate, as judged against the more consistent judgments compiled when subjects wrote several different essays over several days. (Godshalk et al., “The Measurement of Writing Ability,” New York: College Board, 1966.) The result was the simplified multiple-choice test I mentioned.
Here is where my knowledge grows dim and I defer to you. I’m not sure how far the Godshalk test was put to use. I do know that there was concern within ETS that using it would send the wrong pedagogical signal and would probably lead to less writing practice in schools. Whether this prediction was the fruit of observation or mere conjecture, I leave to your historical expertise. So I defer to you on the testing history. The key point I made about ETS’s concern with consequential validity vs. construct validity was correct in the case of their decision to continue demanding writing samples. (Within ETS, it’s well known that plenty of weight is still given to the more reliable multiple-choice component.) And the lore is that consequential validity is by far the chief reason for demanding actual writing.
E D Hirsch
January 13th, 2013 at 5:33 pm
My blog is much honored! Thanks for your response. I agree entirely that the ETS (and other researchers as well) pretty conclusively established that essays were both less reliable and more expensive. However, I find it interesting that the CEEB/College Board did everything it could to push that decision to its achievement test, so it could charge more money (not, I hasten to add, that there’s anything wrong with that). I can see no evidence that the ACT makers ever struggled with this decision, and the ACT’s English test has always been highly regarded.
Thanks again–I’m a huge fan.
January 14th, 2013 at 12:04 am
I agree that few high school teachers (other than you) care much about preparing their students for the SAT. What I discovered to my initial surprise as a school board President about 10 years ago was that, for the most part, high school teachers have little interest in the success their students have in obtaining admission to the most competitive universities.
My upscale suburban district in New Jersey set up a high school grading committee, with about equal representation from high school teachers and parents, to consider changing the A-B-C-D-F grading system (with no +’s or -‘s). The teachers were all in favor of either adding pluses and minuses to the letter grades or going to actual numerical grades on a 50 to 100 scale. Research on grading history in the district indicated that far more A-‘s would appear on transcripts than A+’s, and somewhat more B-‘s would appear than B+’s, likely making it more difficult for the district’s students to achieve admission to the most selective universities. It turned out that this subject did not interest the teachers on this grading committee at all! Since there is nothing that most of the town’s parents care more about than college admissions, this was a huge surprise to me. As a former college professor, I would have thought that teachers would take great pride in helping their students get into Ivy League universities, but that did not appear to be the case.
We eventually agreed to separate the college transcript issue from the grade reporting to parents issue, and report cards were modified to indicate numerical scores, while college transcripts showed only As, Bs, Cs and Ds. The latter was done to please parents, who did not want their children’s low As to show up on the transcript as A-‘s or 90s, while the former was done to please the teachers. The teachers wanted their students to try harder to make top grades in their classes, and were quite frank that the potential impact on college admissions did not much concern them.
January 14th, 2013 at 9:03 am
Here’s my conspiracy theory about the creation of the Writing subtest for the SAT: the University of California figured that if they had the English Subject Test rolled into the main SAT, they could drop the required SAT Subject Tests, which would cut down on the number of Asians getting in, because Tiger Moms are better at signing their kids up for all the different tests UC required.
That sounds nuts except that when UC tried to drop the Subject Test requirements a few years ago, the Asian Caucus in the California legislature went nuts, denouncing UC’s attempt to simplify the admissions process as racist:
Click to access apiscannedletter.pdf
January 14th, 2013 at 9:05 am
On another topic, what do you think about the Torlakson v. Deasy dustup in California over high school testing? The state of California will be switching to a whole new test (Common Core?) in a couple of years, so state superintendent Torlakson suggested taking a year off from testing rather than use the CST one last time before throwing it away, which LAUSD supremo Deasy denounced.
January 14th, 2013 at 2:58 pm
My take, and I keep thinking I should write this up but I don’t usually get into funding, so have at it if you find this helpful:
1) California is at or near bankruptcy
2) It has a very good test suite.
3) Developing the common core tests will cost a FORTUNE and then, since they are computer-based, the state will have to fork out money for every school to have enough computers to test.
4) This assumes the development of the tests will be done in a year, or two, or three. Really? It took the GRE much longer to go to a CAT system, and in that case the questions were a known quantity. These tests have to be done by grade, done in CAT (adaptive) form, and in high school for every subject. How likely is it that these tests are going to be done on time? I put the odds at zero.
The feds scandalized edupundits everywhere by turning down California’s request for an NCLB waiver. The state was turned down because of its refusal to get the teachers on board with assessments being used in evaluation (something that Deasy is in favor of, is he not?).
I think Torlakson’s push is a beautifully cynical strategy. Anyone sane knows that Common Core won’t happen on time, but nothing good comes to the person who points out the emperor is naked. So Torlakson conspicuously gets on board with Common Core, says “Wow, let’s get started on these tests right away!”
Because, of course, if the state is implementing Common Core testing, there’s no way it can do VA testing for five or more years. And by buying into the feds’ Common Core plan, Torlakson can say: hey, the reason we’re not doing VA is that we’re abandoning our CST, all so that we can get on board with your (the feds’) new New Thing. So can we have our damn money, please?
That’s my guess, anyway.
January 14th, 2013 at 10:14 pm
I’m suspicious of CAT in practice ever since I took the GMAT the first year it was CAT. The first two-thirds of the questions were normal. The last third of the math went into the deep end of number theory really fast and really deep. In contrast, the last third of the reading went really superficial really fast, with questions not much harder than identifying the subject in a short passage.
As an engineer I figure, of course I’d hit the math out of the park and of course my reading isn’t going to be as stellar, but sheesh guys come on, you call that adaptive?
Except when I got the results back, I was (IIRC, it’s been over 10 years now) in the low/mid-80’s percentile for math and upper/mid-90’s for reading. Compared to the usual MBA wannabe? I find that hard to believe. I think they didn’t have their CAT tuned: it quickly pegged the meter after two-thirds of the test, and then my streak of right (or wrong) answers at the end helped with the final total. My combined percentile was an unimpressive 92. (Unimpressive as I was trying to get into HBS on a brand-new fellowship an alumnus from my undergrad alma mater had created.)
BTW, I’ve always wondered, are those percentiles among only those taking the test, normed for all college grads, or the entire population? Anyone know? Thx.
July 5th, 2013 at 8:11 am
From the GMAT website:
“The percentile rank of your score shows you the percentage of tests taken with scores lower than you for the most recent three-year period. Every year, each test taker’s score is updated with the most recent year’s percentiles.”