Showing posts with label testing. Show all posts

September 14, 2011

Asians pulling away in SAT scores

From FairTest:

2011 College-Bound Seniors Avg SAT Scores

W/score changes from 2006

Group        READING     MATH        WRITING     TOTAL
ALL          497 (-6)    514 (-4)    489 (-8)    1500 (-18)
Female       495 (-7)    500 (-2)    496 (-6)    1491 (-15)
Male         500 (-5)    531 (-5)    482 (-9)    1513 (-19)
Asian        517 (+7)    595 (+17)   528 (+16)   1640 (+40)
White        528 (+1)    535 (-1)    516 (-3)    1579 (-3)
Black        428 (-6)    427 (-2)    417 (-11)   1272 (-19)
AmerIndian   484 (-3)    488 (-6)    465 (-9)    1437 (-18)
Mexican      451 (-3)    466 (+1)    445 (-7)    1362 (-9)
PR           452 (-7)    452 (-4)    442 (-6)    1346 (-17)
Other Hisp   451 (-7)    462 (-1)    444 (-6)    1357 (-14)

I've been following baseball statistics since 1965 and educational test scores since 1972. Test scores are vastly more important for understanding how the world works, but they aren't as diverting because they seldom change. For example, in the table above, whites are just treading water, down 3 points on the total, or about 1% of a standard deviation, over half a decade. Boring. NAM (non-Asian minority) scores are down, probably mostly because the College Board has been subsidizing more NAMs to take the SAT for free as a publicity move. In other words, it's probably not a real change.
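The arithmetic behind that "1% of a standard deviation" can be checked with a short Python sketch. The scores are copied from the FairTest table; the standard deviation of roughly 300 points for the three-section total is an assumption, back-solved from the 1% figure, and not a number FairTest reports here. The names `SCORES`, `ASSUMED_TOTAL_SD`, `totals_consistent`, and `change_in_sd` are hypothetical.

```python
# 2011 means and changes since 2006, copied from the FairTest table.
# Each row holds (score, change) pairs for reading, math, writing, total.
SCORES = {
    "ALL":        ((497, -6), (514, -4),  (489, -8),  (1500, -18)),
    "Female":     ((495, -7), (500, -2),  (496, -6),  (1491, -15)),
    "Male":       ((500, -5), (531, -5),  (482, -9),  (1513, -19)),
    "Asian":      ((517, +7), (595, +17), (528, +16), (1640, +40)),
    "White":      ((528, +1), (535, -1),  (516, -3),  (1579, -3)),
    "Black":      ((428, -6), (427, -2),  (417, -11), (1272, -19)),
    "AmerIndian": ((484, -3), (488, -6),  (465, -9),  (1437, -18)),
    "Mexican":    ((451, -3), (466, +1),  (445, -7),  (1362, -9)),
    "PR":         ((452, -7), (452, -4),  (442, -6),  (1346, -17)),
    "Other Hisp": ((451, -7), (462, -1),  (444, -6),  (1357, -14)),
}

# ASSUMPTION: SD of the three-section total is about 300 points, implied by
# the post's claim that a 3-point drop is about 1% of a standard deviation.
ASSUMED_TOTAL_SD = 300.0


def totals_consistent(row):
    """Return True if section scores and changes sum to the reported total."""
    *sections, total = row
    return (sum(score for score, _ in sections) == total[0]
            and sum(change for _, change in sections) == total[1])


def change_in_sd(group, sd=ASSUMED_TOTAL_SD):
    """Change in a group's total score since 2006, in assumed SD units."""
    return SCORES[group][3][1] / sd


if __name__ == "__main__":
    for group, row in SCORES.items():
        print(f"{group:<11} sums ok: {totals_consistent(row)}  "
              f"change: {change_in_sd(group):+.2f} SD")
```

Under the same assumed standard deviation, the Asian gain of 40 points works out to roughly 0.13 of a standard deviation.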

But, wow, as I pointed out last year in writing about PSAT National Merit Semifinalists, Asians have just been pulling away from everybody else in the last few years.

Is the same trend true on low stakes tests, like most school achievement tests that are used to grade schools rather than students?

This is a big story and it deserves more research. Is the innate intelligence of Asians going up? Or does this prove that Tiger Mothering works? Is the SAT being unfairly gamed? There are a lot of questions here.

July 7, 2011

By Any Means Necessary

For a long, long time, the foremost goal of the American educational system has been to close The Gap. This has turned out to be kind of like if President Kennedy had announced in 1961 that America was committed to building a perpetual motion machine by the end of the decade. From the Atlanta Journal-Constitution:
Investigation into APS cheating finds unethical behavior across every level 
By Heather Vogell  
Across Atlanta Public Schools, staff worked feverishly in secret to transform testing failures into successes. 
Teachers and principals erased and corrected mistakes on students’ answer sheets. 
Area superintendents silenced whistle-blowers and rewarded subordinates who met academic goals by any means possible. 
Superintendent Beverly Hall and her top aides ignored, buried, destroyed or altered complaints about misconduct, claimed ignorance of wrongdoing and accused naysayers of failing to believe in poor children’s ability to learn. 
For years — as long as a decade — this was how the Atlanta school district produced gains on state curriculum tests. The scores soared so dramatically they brought national acclaim to Hall and the district, according to an investigative report released Tuesday by Gov. Nathan Deal.

Yeah, this is bad, but what do you expect? From CBS News:
"We were told that we needed to get the scores by any means necessary, and we were told that our jobs were on the line," former Atlanta Public Schools teacher Sidney Fells said.

The Republican President of the United States and the hereditary dynastic leader of the Democrats, Ted Kennedy, got together a decade ago and made up a law, No Child Left Behind, that said that every public school student in America had to score Proficient (on a scale that runs Below Basic, Basic, Proficient, Advanced) on tests that will be given about 34 months from now.

But, the states could make up, administer, and grade their own tests. 

What else did Bush, Kennedy, and the press expect other than massive fraud?

The whole foundation of education in America is based on lying and punishing truth-tellers (e.g., James Watson), so what else could have happened?

May 7, 2011

IQ: Intelligence and/or motivation?

Bryan Caplan writes:
Years ago, I told Tyler Cowen, "It's surprising that IQ tests predict life outcomes so well, because there's usually no financial incentive to get a high score."  He replied, "People try out of pride - an under-rated motive."  So when Tyler blogged Duckworth et al, "Role of Test Motivation in Intelligence Testing" I naturally took notice.  Key claims: 
1. Material incentives boost IQ scores: ... "The authors reasonably infer that IQ is more of a composite intelligence/motivation measure than usually believed - especially by inter-disciplinary researchers." 
As far as I can tell, the authors do nothing to show that their results make IQ less predictive.  They don't even show that IQ is more mutable than earlier studies find; boosting incentives boosts scores while the incentives remain in place, but there's no reason to think the boost lasts after the test-takers receive their pay.  All the researchers require us to reconsider is the reason why IQ is so predictive and hard to durably improve.  

I made Duckworth's point in my 2007 FAQ on IQ:
Q. So, you're saying that IQ testing can tell us more about group differences than about individual differences? 
A. If the sample sizes are big enough and all else is equal, a higher IQ group will virtually always outperform a lower IQ group on any behavioral metric.... 
Of course, everything else is seldom equal. A more conscientious group may well outperform a higher IQ group. On the other hand, conscientiousness, like many virtues, is positively correlated with IQ, so IQ tests work surprisingly well. 
Q. Wait a minute, does that mean that maybe some of the predictive power of IQ comes not from intelligence itself, but from virtues associated with it like conscientiousness? 
A. Most likely. But perhaps smarter people are more conscientious because they are more likely to foresee the bad consequences of slacking off. It's an interesting philosophical question, but, in a practical sense, so what? We have a test that can predict behavior. That's useful.

Keep in mind that the notorious average group gaps in cognitive test scores show up not only on low-stakes tests, but on high-stakes tests where the testees are highly motivated: the SAT, ACT, LSAT, MCAT, GMAT, GRE, the military's AFQT enlistment test, NYC firefighting hiring tests, New Haven fire department promotion tests, Chicago cop tests, the NFL's Wonderlic IQ test, insurance agent licensing tests, and so forth and so on ad infinitum.

I can think of only one example where different levels of group motivation had a sizable effect: the military's AFQT enlistment test was renormed in 1980 on the National Longitudinal Survey of Youth sample of about 12,000 young people, most of whom weren't trying to enlist. The test was 105 pages long. It was found years later that the anomalously large white-black gap on this renorming (18.6 IQ points rather than the usual 15 or 16) was caused by blacks being more likely to give up from discouragement partway through this long and hard test and leave the later questions unanswered or just "bubbled in." (Keep in mind that this was a low-stakes test for the participants, who were just taking part in a social science project, not trying to enlist.)

In 1997, the AFQT was renormed using computer-adaptive testing, in which wrong answers lead to easier questions and thus less discouragement. The white-black gap was only 14.7 points.
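To put the two renormings on a common scale, here is a small Python sketch that converts the gaps quoted above into standard-deviation units. It assumes the conventional IQ scale with a standard deviation of 15 points, which the post does not state explicitly; the names `IQ_SD`, `GAPS`, and `gap_in_sd` are hypothetical.

```python
# ASSUMPTION: conventional IQ scaling with a standard deviation of 15 points.
IQ_SD = 15.0

# White-black gaps in IQ points, as quoted in the post.
GAPS = {
    "1980 renorming (NLSY sample)": 18.6,
    "1997 renorming (computer-adaptive)": 14.7,
}


def gap_in_sd(points, sd=IQ_SD):
    """Convert a gap in IQ points to standard-deviation units."""
    return points / sd


if __name__ == "__main__":
    for label, points in GAPS.items():
        print(f"{label}: {points} points = {gap_in_sd(points):.2f} SD")
    values = list(GAPS.values())
    print(f"Difference between renormings: {values[0] - values[1]:.1f} points")
```

On that scale, the difference between the two renormings is 3.9 points, or about a quarter of a standard deviation.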

This finding is worth keeping in mind for evaluating school performance test scores, which are usually low stakes tests for the students. 

Some of the difference in performance among schools on achievement tests therefore depends upon how well the principal and teachers manage to motivate students to keep working until the end of the test.

So, a lot of reports of miracle schools that seem to fizzle out after a while have to do with higher scores ginned up by getting students just not to bubble in.

On the other hand, I'd rather send my kid to a school where the management has enough on the ball to figure out how to look better and is persuasive enough to motivate students to work for an extra 20 minutes than a school where management isn't. And a school that manages to motivate students on their state tests is likely to attract the children of more motivated and smarter parents in the future. 

So, once again, the question of intelligence v. motivation turns out to be more philosophical than predictive.

One thing to keep in mind is that in experimental situations involving low-stakes tests, if the experimenters _want_ one group of test-takers to be unmotivated, it's easy to get them to work less hard on the test. The test administrator can convey that a lackadaisical attitude is okay just through word choice, tone of voice, body language, and so forth.

I suspect this is a major feature of the popular stereotype threat experiments where low-stakes tests are given to blacks. In the test group, blacks are told that they are expected to score low on the following test, and in the control group, they aren't. Not surprisingly, on these tests that are meaningless to the test-takers, the first group is more likely to pick up the experimenters' hopes that they will work less hard, and they do work less hard.

I've never seen stereotype threat confirmed experimentally on high stakes tests. I can't see how such an experiment would pass an ethical review board.

You'll note that stereotype threat experiments aren't about getting blacks to perform better on tests but about getting them to perform worse. Big difference.