Ever since December 2001, when the results of the first PISA survey were made public, the Finnish educational system has received a lot of international attention. Foreign delegations are flocking to Finland, in the hope of discovering Finland's secrets.
The widely accepted explanation is that the Finnish educational system is better. For example, the following aspects have been pointed out:
- Schools routinely provide tutoring for weak students.
- Each school has a social worker ("koulukuraattori").
- Substitute teachers are often provided when the teacher is ill.
December 11, 2010
More Mandatory Finnish Content
December 7, 2010
PISA scores for Shanghai
Mean is 500, standard deviation is 100. So, Shanghai students beat the advanced world's international mean by 0.75, 0.56, and 1.00 standard deviations. Pretty good. On an IQ-like scale, that's approaching 112.
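The conversion is just a linear rescaling. A quick sketch, using Shanghai's reported 2009 subject means (556 reading, 600 math, 575 science):

```python
# Convert PISA scores (mean 500, SD 100) to an IQ-like scale (mean 100, SD 15).
def pisa_to_iq(score, mean=500.0, sd=100.0):
    z = (score - mean) / sd            # standard deviations above the mean
    return 100.0 + 15.0 * z            # rescale to the familiar IQ metric

# Shanghai's 2009 means by subject.
shanghai = {"reading": 556, "math": 600, "science": 575}
avg_z = sum((s - 500) / 100 for s in shanghai.values()) / 3
print({k: round(pisa_to_iq(v), 1) for k, v in shanghai.items()})
print(round(100 + 15 * avg_z, 2))
```

Averaging the three z-scores gives about 111.6 on the IQ-like metric, which is where "approaching 112" comes from.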
Interestingly, the NYT table left out some low-scoring countries, such as Mexico, but then what possible policy implications do the educational attainment and intelligence of Mexicans have for an American audience? None, none I tell you! Here's an eyeball-frying graphic from The Guardian of the OECD countries (leaving out unrepresentative cities like Shanghai). Check out the bottom line:
September 22, 2010
Is the SAT getting easier?
California SAT, 1998 v. 2010
| College Bound seniors | 1998 # | 2010 # | # Chg | 1998 Mean | 2010 Mean | Mean Chg | 1998 Gap vs. NHW | 2010 Gap vs. NHW |
| Total | 142,139 | 210,926 | 48% | 1520 | 1517 | -3 | -89 | -124 |
| White | 56,217 | 69,969 | 24% | 1608 | 1641 | 33 | 0 | 0 |
| Asian, As-Am, or Pac Isl | 29,889 | 44,932 | 50% | 1557 | 1614 | 57 | -51 | -27 |
| Black or Af Am | 8,868 | 14,476 | 63% | 1292 | 1320 | 29 | -317 | -321 |
| Mex or MA | 18,494 | 42,380 | 129% | 1341 | 1355 | 14 | -267 | -286 |
| Other Hispanic | 6,606 | 20,735 | 214% | 1359 | 1325 | -34 | -249 | -316 |
| Puerto Rican | 489 | 699 | 43% | 1434 | 1489 | 55 | -174 | -152 |
| American Indian | 1,415 | 1,256 | -11% | 1479 | 1488 | 9 | -129 | -153 |
| Other | 7,863 | 8,498 | 8% | 1566 | 1561 | -5 | -42 | -80 |
| No Response | 12,298 | 7,981 | -35% | 1520 | 1566 | 47 | -89 | -75 |
- People in California are getting smarter relative to the rest of the country.
September 21, 2010
SAT scores in California
California 2010 SAT
| College Bound seniors | # | Share | Total Mean | Crit Read Mean | Crit Read SD | Math Mean | Math SD | Writing Mean | Writing SD |
| Total | 210,926 | 100% | 1517 | 501 | 113 | 516 | 119 | 500 | 113 |
| White | 69,969 | 33% | 1641 | 546 | 100 | 553 | 102 | 542 | 100 |
| Asian, As-Am, or Pac Isl | 44,932 | 21% | 1614 | 518 | 116 | 571 | 121 | 525 | 122 |
| Black or Af Am | 14,476 | 7% | 1320 | 444 | 101 | 436 | 102 | 440 | 97 |
| Mex or MA | 42,380 | 20% | 1355 | 449 | 95 | 458 | 96 | 448 | 90 |
| Other Hispanic | 20,735 | 10% | 1325 | 440 | 102 | 444 | 102 | 441 | 95 |
| Puerto Rican | 699 | 0% | 1489 | 501 | 101 | 495 | 105 | 493 | 100 |
| American Indian | 1,256 | 1% | 1488 | 499 | 102 | 504 | 101 | 485 | 98 |
| Other | 8,498 | 4% | 1561 | 517 | 113 | 525 | 118 | 519 | 115 |
| No Response | 7,981 | 4% | 1566 | 523 | 121 | 526 | 123 | 517 | 121 |
Being lazy, I'll leave it up to interested readers to do the work to evaluate this hypothesis and post their findings in the comments.
For example, questions to consider are: What exactly are the racial percentages of National Merit semifinalists in California? Do a higher percentage of Asian 17-year-olds take the SAT in California than do white 17-year-olds? (One thing not to worry about much in California is the SAT v. ACT divide that confuses things when thinking about SAT scores in, say, Iowa: California is traditionally an SAT state.) What is the nationality makeup of Asian / Pacific Islander 17-year-olds in California? What about taking the SAT multiple times -- how does that affect the numbers? (Okay, I found the answer to this last question: "Students are counted only once, no matter how often they tested, and only their latest scores and most recent SAT Questionnaire responses are summarized.") And so forth and so on.
Good luck!
By the way, this is the first bit of quantitative evidence I can recall to support the common-sense notion that California has white people who are smarter than the national average. Considering how damnably expensive it is and all the high-end industries and all the Nobel Prizes, you would think it would have smart white people. But on the NAEP, California non-Hispanic whites always lag badly behind, say, Texan whites. And that was true way back on the big 1960 federal Project Talent test of 15-year-olds, where Texans beat Californians. So, numbers like that got me assuming that most white Californians are less Hewletts and Packards and more Bodines and Spicolis. But, maybe, white people in California just can't be bothered with trying on low-stakes tests?
September 20, 2010
Test scores and home prices
September 7, 2010
Critical Thinking
Whereas students in most parts of the United States are typically asked simply to recognize a single fact they have memorized from a list of answers, students in high-achieving countries are asked to apply their knowledge in the ways that writers, mathematicians, historians and scientists do.
In the United States, a typical item on the 12th grade National Assessment of Educational Progress, for example, asks students which two elements from a multiple choice list are found in the Earth's atmosphere. An item from the Victoria, Australia, high school biology test (which resembles those in Hong Kong and Singapore) describes how a particular virus works, asks students to design a drug to kill the virus and explain how the drug operates (complete with diagrams), and then to design and describe an experiment to test the drug - asking students to think and act like scientists.
This kind of testing would clearly pay for itself just from the patent rights to the anti-viral drugs designed by the high school test-takers. They must be worth billions!
July 9, 2010
How did your kid do on the APs?
AP tests are graded 1 to 5, with a 5 supposed to be equivalent to an A in a typical college's introductory year-long course in the subject, a 4 equal to a B, and so forth.
So, if your kid took the English Lit test (the top bar in the graph) and got a 4 (the yellow-orange band), he actually scored at the 98th percentile (or higher) out of all kids his age in the country. If he got a 3 (light gray) in US History (the third bar down), he scored at least in the 94th percentile.
Of course, if all students took the test, the number of people scoring 3s, 4s and even 5s would go up. In particular, Red State students don't take APs as much as Blue State students, and whites don't take anywhere near as many APs as Asians.
My 2009 VDARE.com article has lots of graphs on how students do on the AP, overall and by race.
June 23, 2010
Not this again!
From Inside Higher Ed:
New Evidence of Racial Bias on SAT
A new study may revive arguments that the average test scores of black students trail those of white students not just because of economic disadvantages, but because some parts of the test result in differential scores by race for students of equal academic prowess.
The finding -- already being questioned by the College Board -- could be extremely significant as many colleges that continue to rely on the SAT may be less comfortable doing so amid allegations that it is biased against black test-takers.
"The confirmation of unfair test results throws into question the validity of the test and, consequently, all decisions based on its results. All admissions decisions based exclusively or predominantly on SAT performance -- and therefore access to higher education institutions and subsequent job placement and professional success -- appear to be biased against the African American minority group and could be exposed to legal challenge," says the study, which has just appeared in Harvard Educational Review (abstract available here).
The existence of racial patterns on SAT scores is hardly new. The average score on the reading part of the SAT was 429 for black students last year -- 99 points behind the average for white students. And while white students' scores were flat, the average score for black students fell by one. Statistics like these are debated every year when SAT data are released, and when similar breakdowns are offered on other standardized tests.
The standard explanation offered by defenders of the tests is that the large gaps reflect the inequities in American society -- since black students are less likely than white students to attend well-financed, generously-staffed elementary and secondary schools, their scores lag.
In other words, the College Board says that American society is unfair, but the SAT is fair. And while many educators question the fairness of using a test on which wealthier students do consistently better than less wealthy students, research findings that directly isolate race as a factor in the fairness of individual SAT questions have, of late, been few.
The new paper in fact is based on a study that set out to replicate one of the last major studies to do so -- a paper published in the Harvard Educational Review in 2003, strongly attacked by the College Board -- and the new paper confirms those results (but using more recent SAT exams). The new paper is by Maria Santelices, assistant professor of education at the Catholic University of Chile, and Mark Wilson, professor of education at the University of California at Berkeley. The earlier study was by Roy Freedle of the Educational Testing Service.
The focus of both studies is on questions that show "differential item functioning," known by its acronym DIF. A DIF question is one on which students "matched by proficiency" and other factors have variable scores, predictably by race, on selected questions. A DIF question has notable differences between black and white (or, in theory, other subsets of students) whose educational background and skill set suggest that they should get similar scores. The 2003 study and this year's found no DIF issues in the mathematics section.
But what Freedle found in 2003 has now been confirmed independently by the new study: that some kinds of verbal questions have a DIF for black and white students. On some of the easier verbal questions, the two studies found that a DIF favored white students. On some of the most difficult verbal questions, the DIF favored black students. Freedle's theory about why this would be the case was that easier questions are likely reflected in the cultural expressions that are used commonly in the dominant (white) society, so white students have an edge based not on education or study skills or aptitude, but because they are most likely growing up around white people. The more difficult words are more likely to be learned, not just absorbed.
While the studies found gains for both black and white students on parts of the SAT, the white advantage is larger such that the studies suggest scores for black students are being held down by the way the test is scored and that a shift to favor the more difficult questions would benefit black test-takers.
Ready? Here goes:
By definition, blacks and whites are equally good at randomly guessing on multiple choice questions. So, the more difficult the question and thus the higher the percentage of students who randomly guess, the narrower the white-black differential.
If you made all the questions impossibly esoteric, so that everybody would guess on everything, then the white-black gap would disappear. If you made all the questions unbelievably easy, the white-black gap would also disappear. But when you make them a reasonable mix of difficulty in order to maximize the predictive value of the SAT, you wind up with a white-black gap -- because there is also a white-black gap in real world performance.
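The mechanics can be seen in a minimal sketch (my own toy model, not the methodology of either study): a 3PL-style item curve in which a five-choice multiple-choice item has a 0.20 guessing floor, so P(correct) rises from 0.20 toward 1.0 as ability exceeds item difficulty.

```python
import math

# Toy item-response model: guessing puts a floor of 0.20 (one in five
# choices) under the probability of a correct answer; above the floor,
# P(correct) follows a logistic curve in (ability - difficulty).
def p_correct(ability, difficulty, floor=0.20):
    return floor + (1 - floor) / (1 + math.exp(-(ability - difficulty)))

# Expected-score gap between two groups one standard deviation apart.
def gap(difficulty, hi=+0.5, lo=-0.5):
    return p_correct(hi, difficulty) - p_correct(lo, difficulty)

for d in (-4, 0, 4):   # very easy, medium, very hard items
    print(f"difficulty {d:+d}: expected score gap {gap(d):.3f}")
```

The gap between the two groups is largest on medium-difficulty items and collapses toward zero at both extremes, exactly as the argument above predicts.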
January 17, 2010
Evaluating teachers on value-added test scores: the Regression toward the Mean problem
Evaluating teachers on value-added test scores and closing the racial achievement gap through better teaching are both very fashionable ideas these days. I want to evaluate both theoretically, using a simple model with two assumptions:
First, star teachers exist, fortunately. Over the course of one year, some teachers can raise their students' test scores more than one grade level. (There are also dud teachers who can't raise test scores as much as the average teacher can.) In my simplified model, a star teacher is one who raises achievement 1.5 grade levels per 1.0 year in the classroom.
Second, the positive impact of star teachers is partly reduced over time by regression toward the mean. After nine months under the guidance of Miss Jean Brodie, the kids are well ahead of the average. But when they come back from summer vacation, they aren't as far ahead anymore. Away from Ms. Wonderful, they've regressed toward the mean. There can be a lot of other causes of regression toward the mean. Perhaps after a second year under Miss Jean, some of the students are bored with her tricks and less intimidated by her shtick. Maybe, especially in math and science, the students start getting closer to their intellectual limits.
So, let's assess both questions about teachers with these two concepts in mind. Let's start with something I've always assumed was a good idea: value-added evaluations of teacher performance.
I've long advocated that teachers should not be evaluated upon how well their students do on standardized tests, since the impact of the teacher is typically overwhelmed in the results by the differences between students. Those kind of evaluation systems just augment the natural tendency for the best teachers to wind up with the best students, as everybody scrambles to get hired at the schools with the smartest students. Instead, I've argued for "value-added" evaluations of teachers, measuring how much test scores have gone up under the teacher relative to the students' previous scores. The Obama Administration has come around to this view, too.
Now, though, I've developed a worrisome question about measuring teacher performance on value added: How do you factor the effects of regression toward the mean into formulas for measuring teacher performance? In the real world, you can't always assume that last year's test scores show how smart each teacher's students are on average. Last year's scores were likely driven up or down by the quality of the teacher last year. The really confusing thing is that students whose test scores were unnaturally depressed by a bad teacher last year are likely to go up more this year than students whose test scores were boosted last year by a very good teacher. That's regression toward the mean.
Let's take a sports coaching example. When I was at Notre Dame High School, our archrival Crespi always killed us in pole vaulting during our annual track meet. In fact, Crespi vaulters set a whole bunch of different national age-group and high-school-year records. That's pretty amazing. Strangely enough, it becomes less amazing when you discover that all three star Crespi vaulters were named Curran. It turns out that the Curran brothers had a pole vault track and pit in their backyard, where their father, who had been a pole vaulter, trained them in advanced pole vaulting techniques.
Here's a one minute video from a Super Eight home movie from around 1972 of seventh-grader Anthony Curran clearing 9 feet in his backyard. I had always imagined ever since I read in the 1970s about the Curran family pole vaulting practice ground that they were very rich and had a huge back yard with an Olympic Stadium type set-up, but the video shows it's cramped, ramshackle, and the pit consists of old mattresses right in front of a brick wall. It looks like a good place to break your neck. I'm sure no modern upper middle class mom would put up with Dad and the boys building such a nightmare in the backyard, but Mrs. Curran can be seen waving happily in the home movie as her 13-year-old son hurtles toward his fate.
Not too surprisingly, the Curran brothers were quite good pole vaulters in college (Anthony Curran, now the pole vault coach at UCLA, has an all-time personal best of 18'-8"), but they weren't the record setters in their subsequent careers that they had been in high school. I don't think any Curran ever made the U.S. Olympic team. Regression toward the mean set in as they got older and better natural athletes started to catch up to them in hours of lifetime training.
Say you were the college pole vault coach of the Curran brothers and the athletic director said to you, "Tim Curran set a world age-group record at 15, and Anthony Curran set national-class records in high school in his sophomore, junior, and senior years. We recruited you the two most accomplished high school vaulters in the history of the top pole vaulting state in the Union. But under your coaching, they aren't even winning college national championships. Why are you failing so badly with all this talent we gave you?"
The true answer is that because the Currans started training so much younger than their current competitors in college, they came closer to fulfilling 100% of their natural potential in high school than anybody else in California did. Now, the other kids are catching up and regression toward the mean is kicking in for the Currans. As high schoolers, the Currans had good nature and exceptional nurture to dominate an obscure sport. By college, they were running into competitors with even better nature, and the nurture gap was closing as all the top competitors got the same amount of coaching in college.
Now, let's think about this in a typical school, where children aren't always fully randomly shuffled after each year. For example, at my elementary school in the 1960s, there were 70 children at each grade level, so they were divided up into the Blue and the Red classrooms. They weren't tracked, they were just randomly assigned. If you started out as a Blue, you typically stayed in Blue with your closer friends.
Say that the two 1st grade teachers are wildly different in effectiveness. The Blue 1st Grade teacher's students finish the year a half grade level above the average, while the Red 1st Grade teacher's students finish the year a half grade level below average.
Now, if you are a second-grade teacher of perfectly average effectiveness, a teacher who can be expected to raise the grade level of an average class by 1.0 years, which class do you want to inherit, Blue or Red, to do best on the teacher-effectiveness evaluation at the end of their second-grade year?
Let's say that the great Blue first grade teacher's benefits have a one-year half-life and the bad Red first grade teacher's harms have a one-year half-life. In other words, there is regression toward the mean over time in teaching effectiveness, as in so much in life.
If you were just being measured not on value added, but on simple absolute performance at the end of the grade, you'd want to inherit the Blue class that ended last year 0.5 grade levels above average. If you do an average job and the half life is one year, then they'll finish your year averaging grade level 2.25: 0.25 grade levels above average, and you'll be considered a good teacher.
On the other hand, if you are being measured on relative value added, as calculated by your second graders' grade level at the end of your year minus their grade level at the end of the previous year, you don't want to inherit the star teacher's overachieving Blue class, because you will only get credit for adding a crummy 0.75 grade levels in value. Sure, after two years, they'll be at grade level 2.25, but they were at 1.5 a year ago, so you only get credit for 2.25 - 1.50 = 0.75 grade levels of value added.
Under value added measurement, you might get fired for, in essence, having inherited the better taught class.
Instead, under value-added measurement, you want to inherit the underachieving Red class from that bad teacher, so that you can get the credit for her students' inevitable upward regression toward the mean. They'll wind up the year going from grade level 0.5 to 1.75, so you'll get credit for adding 1.25 grade levels of value. I'm a star! Give me my bonus money, Arne Duncan, gimme it now!
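The arithmetic in the Blue/Red example packs into a few lines (a toy model assuming exactly 1.0 years of average growth and a one-year half-life on the inherited gap):

```python
# Toy value-added calculation: an average 2nd-grade teacher adds 1.0 grade
# levels, and half of the inherited surplus (+) or deficit (-) versus the
# average survives the year (one-year half-life).
def value_added(prior_gap, grade=2, retention=0.5):
    start = (grade - 1) + prior_gap        # class level at the end of last year
    end = grade + prior_gap * retention    # average growth plus the decayed gap
    return end, end - start                # (end-of-year level, value added)

blue_end, blue_va = value_added(+0.5)   # inherited from the star teacher
red_end, red_va = value_added(-0.5)     # inherited from the dud teacher
print(blue_end, blue_va)   # 2.25 0.75 -- the "crummy" value added
print(red_end, red_va)     # 1.75 1.25 -- the bonus-worthy value added
```

Same average teacher, opposite verdicts: 0.75 grade levels of "value added" for inheriting the Blue class versus 1.25 for inheriting the Red class.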
This model where there is partial regression toward the mean after the impact of superstar teachers has interesting implications for the national obsession with closing the racial gaps in school achievement.
Assume you have an elementary school with average students where every teacher is a star capable of pushing students ahead 1.5 grades each year (a Grade Level Boost of 0.5), all else being equal. If there is zero regression toward the mean, a simple Excel model predicts that when the average student graduates at the end of eighth grade, he's performing at the 12th grade level.
| Grade | Grade Level Boost | Reg to Mean | Grade Level |
| 1 | 0.5 | 0% | 1.5 |
| 2 | 0.5 | 0% | 3.0 |
| 3 | 0.5 | 0% | 4.5 |
| 4 | 0.5 | 0% | 6.0 |
| 5 | 0.5 | 0% | 7.5 |
| 6 | 0.5 | 0% | 9.0 |
| 7 | 0.5 | 0% | 10.5 |
| 8 | 0.5 | 0% | 12.0 |
On the other hand, if there is 100% regression toward the mean, the average student, after eight years of star teachers, tests at just the 8.5 grade level at the end of 8th grade:
| Grade | Grade Level Boost | Reg to Mean | Grade Level |
| 1 | 0.5 | 100% | 1.5 |
| 2 | 0.5 | 100% | 2.5 |
| 3 | 0.5 | 100% | 3.5 |
| 4 | 0.5 | 100% | 4.5 |
| 5 | 0.5 | 100% | 5.5 |
| 6 | 0.5 | 100% | 6.5 |
| 7 | 0.5 | 100% | 7.5 |
| 8 | 0.5 | 100% | 8.5 |
The discouraging thing is that the results of regression toward the mean aren't symmetrical: you only get the big boosts in grade level by eliminating the last bits of regression toward the mean, but that's very hard to do.
| Grade | Grade Level Boost | Reg to Mean | Grade Level |
| 1 | 0.5 | 50% | 1.5 |
| 2 | 0.5 | 50% | 2.8 |
| 3 | 0.5 | 50% | 3.9 |
| 4 | 0.5 | 50% | 4.9 |
| 5 | 0.5 | 50% | 6.0 |
| 6 | 0.5 | 50% | 7.0 |
| 7 | 0.5 | 50% | 8.0 |
| 8 | 0.5 | 50% | 9.0 |
So, you can see where the contemporary obsession in the Obama Administration and the prestige press with trying to reduce regression toward the mean comes from: taking away kids' summer vacations, keeping them at school a dozen hours per day (the celebrated KIPP program), and so forth.
Unfortunately, the big gains only come from eliminating the last bits of regression toward the mean. If you can cut regression toward the mean from 50% to 25%, then the average student's grade level at the end of eighth grade increases from 9.0 to 9.8:
| Grade | Grade Level Boost | Reg to Mean | Grade Level |
| 1 | 0.5 | 25% | 1.5 |
| 2 | 0.5 | 25% | 2.9 |
| 3 | 0.5 | 25% | 4.2 |
| 4 | 0.5 | 25% | 5.4 |
| 5 | 0.5 | 25% | 6.5 |
| 6 | 0.5 | 25% | 7.6 |
| 7 | 0.5 | 25% | 8.7 |
| 8 | 0.5 | 25% | 9.8 |
But, as you can see, in a school of star teachers, reducing annual regression toward the mean from 100% to 25% only boosts grade level upon eighth grade graduation by 1.3 years, from 8.5 to 9.8. In contrast, reducing annual regression toward the mean from 25% to 0% would, theoretically, boost grade level at elementary school graduation by 2.2 years, from 9.8 to 12.0. But, due to diminishing marginal returns, it's probably much harder to reduce regression toward the mean from 25% to 0% than from 100% to 25%.
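For readers who want to rerun the numbers, all three tables above come from the same one-line recursion (a sketch of the simple Excel model described earlier):

```python
# Each year the surplus over grade level decays by the regression rate r,
# then the star teacher adds a fresh 0.5-grade boost on top of the normal
# 1.0 year of growth.
def grade_levels(r, boost=0.5, years=8):
    surplus, levels = 0.0, []
    for grade in range(1, years + 1):
        surplus = surplus * (1 - r) + boost   # summer decay, then this year's boost
        levels.append(grade + surplus)        # level = grade plus surviving surplus
    return levels

for r in (0.0, 0.25, 0.5, 1.0):
    print(f"regression {r:>4.0%}: end of 8th grade = {grade_levels(r)[-1]:.1f}")
```

Running it reproduces the end-of-eighth-grade figures in the tables: 12.0 at 0% regression, 9.8 at 25%, 9.0 at 50%, and 8.5 at 100%.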
Since the white-black gap at the end of high school is three to four years, these regression toward the mean calculations can help explain why there is such a Blind Side-like obsession with plugging holes in the environment where NAM (non-Asian minority) students' regression toward the mean might occur. For example, the NYT Magazine ran a feature on a public boarding school in a poor part of Washington DC where the taxpayers pay $35k per student per year for five nights per week at this boarding school. But the article was heavily devoted to worrying about whether the two nights per week that the students spend at home were causing the presumed test score gains of the five nights in the dorm to regress back toward the black mean.
Of course, the real killer in terms of closing the racial gap by eliminating sources of regression toward the mean is that eventually, these individuals turn into adults whom you can't manipulate so much, and then they choose environments for themselves.
My published articles are archived at iSteve.com -- Steve Sailer
How to select a good teacher
Superstar teachers had four other tendencies in common: they avidly recruited students and their families into the process; they maintained focus, ensuring that everything they did contributed to student learning; they planned exhaustively and purposefully—for the next day or the year ahead—by working backward from the desired outcome; and they worked relentlessly, refusing to surrender to the combined menaces of poverty, bureaucracy, and budgetary shortfalls. ...

But all these traits that correlate with being a good teacher would also likely correlate with being a good senior vice president at a Fortune 500 firm and lots of other tough, high-paid jobs. Heck, Teach for America's ideal high school math teacher would probably also be a good candidate to claw his way up the corporate ladder to Chief Financial Officer, making 7 or even 8 figures.

Ideally, schools would hire better teachers to begin with. But this is notoriously difficult. How do you screen for a relentless mind-set?
When Teach for America began, applicants were evaluated on 12 criteria (such as persistence and communication skills), chosen based on conversations with educators. Recruits answered open-ended questions like “What is wind?” Starting in 2000, the organization began to retroactively critique its own judgments. What did the best teachers have in common when they applied for the job?
Once a model for outcomes-based hiring was built, it started churning out some humbling results. “I came into this with a bunch of theories,” says Monique Ayotte-Hoeltzel, who was then head of admissions. “I was proven wrong at least as many times as I was validated.”
Based on her own experience teaching in the Mississippi Delta, Ayotte-Hoeltzel was convinced, for example, that teachers with earlier experience working in poor neighborhoods were more effective. Wrong. An analysis of the data found no correlation.
For years, Teach for America also selected for something called “constant learning.” As Farr and others had noticed, great teachers tended to reflect on their performance and adapt accordingly. So people who tend to be self-aware might be a good bet. “It’s a perfectly reasonable hypothesis,” Ayotte-Hoeltzel says.
But in 2003, the admissions staff looked at the data and discovered that reflectiveness did not seem to matter either. Or more accurately, trying to predict reflectiveness in the hiring process did not work.
What did predict success, interestingly, was a history of perseverance—not just an attitude, but a track record. In the interview process, Teach for America now asks applicants to talk about overcoming challenges in their lives—and ranks their perseverance based on their answers. Angela Lee Duckworth, an assistant professor of psychology at the University of Pennsylvania, and her colleagues have actually quantified the value of perseverance. In a study published in The Journal of Positive Psychology in November 2009, they evaluated 390 Teach for America instructors before and after a year of teaching. Those who initially scored high for “grit”—defined as perseverance and a passion for long-term goals, and measured using a short multiple-choice test—were 31 percent more likely than their less gritty peers to spur academic growth in their students. Gritty people, the theory goes, work harder and stay committed to their goals longer. (Grit also predicts retention of cadets at West Point, Duckworth has found.)
But another trait seemed to matter even more. Teachers who scored high in “life satisfaction”—reporting that they were very content with their lives—were 43 percent more likely to perform well in the classroom than their less satisfied colleagues. These teachers “may be more adept at engaging their pupils, and their zest and enthusiasm may spread to their students,” the study suggested.
In general, though, Teach for America’s staffers have discovered that past performance—especially the kind you can measure—is the best predictor of future performance. Recruits who have achieved big, measurable goals in college tend to do so as teachers. And the two best metrics of previous success tend to be grade-point average and “leadership achievement”—a record of running something and showing tangible results. If you not only led a tutoring program but doubled its size, that’s promising.
Knowledge matters, but not in every case. In studies of high-school math teachers, majoring in the subject seems to predict better results in the classroom. And more generally, people who attended a selective college are more likely to excel as teachers (although graduating from an Ivy League school does not unto itself predict significant gains in a Teach for America classroom). Meanwhile, a master’s degree in education seems to have no impact on classroom effectiveness.
The most valuable educational credentials may be the ones that circle back to squishier traits like perseverance. Last summer, an internal Teach for America analysis found that an applicant’s college GPA alone is not as good a predictor as the GPA in the final two years of college. If an applicant starts out with mediocre grades and improves, in other words, that curve appears to be more revealing than getting straight A’s all along.
Last year, Teach for America churned through 35,000 candidates to choose 4,100 new teachers. Staff members select new hires by deferring almost entirely to the model: they enter more than 30 data points about a given candidate (about twice the number of inputs they considered a decade ago), and then the model spits out a hiring recommendation. Every year, the model changes, depending on what the new batch of student data shows.
It's not hugely enlightening to come up with a test that can determine that, say, Ben Franklin or James Cameron or Steve Jobs or Meryl Streep or John Madden or Steven Spielberg or Lee Kuan Yew has the skill set it takes to be a good schoolteacher. We also need another test to identify people who would be better at schoolteaching than at most other competing careers, or we'll suffer very high attrition from the schoolteacher ranks (as Teach for America does).
Augmenting the MCAT with a Big 5 personality test
At the start of the study, the researchers administered a standardized personality test and assessed each student for five different dimensions of personality — extraversion, neuroticism, openness, agreeableness and conscientiousness. They then followed the students through their schooling, taking note of the students’ grades, performance and attrition rates.
The investigators found that the results of the personality test had a striking correlation with the students’ performance. Neuroticism, or an individual’s likelihood of becoming emotionally upset, was a constant predictor of a student’s poor academic performance and even attrition. Being conscientious, on the other hand, was a particularly important predictor of success throughout medical school.
In the U.S. setting, conscientiousness is likely measured well by undergraduate GPA.
And the importance of openness and agreeableness increased over time, though neither did as significantly as extraversion. Extraverts invariably struggled early on but ended up excelling as their training entailed less time in the classroom and more time with patients.
“The noncognitive, personality domain is an untapped area for medical school admissions,” said Deniz S. Ones, a professor of psychology at the University of Minnesota and one of the authors of the study. “We typically address it in a more haphazard way than we do cognitive ability, relying on recommendations, essays and either structured or unstructured interviews. We need to close the loop on all of this.”
Some schools have tried to use a quantitative rating system to evaluate applicant essays and letters of recommendation, but the results remain inconsistent. “Even with these attempts to make the process more sophisticated, there is no standardization,” Dr. Ones said. “Some references might emphasize conscientiousness, and some interviewers might focus on extraversion. That nonstandardization has costs in terms of making wrong decisions based on personality characteristics.”
By using standardized assessments of personality, a medical school admissions committee can get a better sense of how a candidate stands relative to others. “If I know someone is not just stress-prone, but stress-prone at the 95th percentile rather than the 65th,” Dr. Ones said, “I would have to ask myself if that person could handle the stress of medicine.”
This all makes sense. The danger, however, always seems to be that somebody with a high IQ and a low honesty level might figure out which answers are wanted on the Big 5 personality test and simply tell the testers what they want to hear. That's an advantage of IQ tests: if you can figure out the answers the IQ testers want to hear, then you have a high IQ.
While standardized tests like the MCAT and the SAT have been criticized for putting certain population groups at a disadvantage, the particular personality test used in this study has been shown to work consistently across different cultures and backgrounds. “This test shows virtually none or very tiny differences between different ethnic or minority groups,” Dr. Ones noted. Because of this reliability, the test is a potentially invaluable adjunct to more traditional knowledge-based testing. “It could work as an additional predictive tool in the system,” she said.
I find this implausible. Has, for example, Woody Allen been lying to us all these years about Jews scoring higher on Neuroticism?
Keep in mind that Belgians need more than just a cognitive test because they have a single admission point for a seven-year course of study, so a personality test could augment a cognitive test and high school grades. Our four-year medical schools, however, get to use college grades, which are a lot more recent and relevant than high school grades for assessing Conscientiousness and the like.
One perennial question that personality testing could help to answer is whether hard work can make up for differences in cognitive ability. “Some of our data says yes,” Dr. Ones said. “If someone is at the 15th percentile of the cognitive test but at the 95th percentile of conscientiousness, chances are that the student is going to make it.” That student may even eventually outperform peers who have higher cognitive test scores but who are less conscientious or more neurotic and stress-prone.
This is like saying that if you score at the 95th percentile on undergrad GPA, you can make it if you score at only the 15th percentile on the MCAT. Perhaps. But in this situation I would put less faith in a single personality test result showing extreme conscientiousness than in four years of outstanding undergraduate grades, since a personality test result showing you're a hard worker is more easily faked than four years of good grades in college.
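For scale, those percentile ranks translate into standard-deviation units as a few lines of Python show (a quick sketch using the standard library; the IQ-style rescaling at the end is my own illustration, not anything from the study):

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal: mean 0, sd 1

# Percentile ranks converted to z-scores (standard-deviation units)
z95 = nd.inv_cdf(0.95)  # about +1.64 sd: "stress-prone at the 95th percentile"
z65 = nd.inv_cdf(0.65)  # about +0.39 sd: a much milder standing
z15 = nd.inv_cdf(0.15)  # about -1.04 sd: the cognitive score in Ones's example

# On a familiar IQ-style scale (mean 100, sd 15), the 15th percentile lands at:
iq_15th = 100 + 15 * z15  # roughly 84

print(round(z95, 2), round(z65, 2), round(iq_15th, 1))
```

So the gap between the 95th and 65th percentiles is about 1.25 standard deviations, which is why Ones treats the two as very different risks.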
If you work hard for four years in college, then you probably are a hard worker. Still, it would be nice to have a faster selection method than that, so if the personality test boys can prove their results are reliable, more power to them. But, I'd like to see the proof, first.
January 7, 2010
NYT: "Law School Admissions Lag Among Minorities"
Law School Admissions Lag Among Minorities
Meanwhile, lawyer Mark Greenbaum complains in the LA Times that, by his estimate, there are 50% more law school grads each year than are needed to fill legal jobs:
by Tamar Lewin
While law schools added about 3,000 seats for first-year students from 1993 to 2008, both the percentage and the number of black and Mexican-American law students declined in that period, according to a study by a Columbia Law School professor.
What makes the declines particularly troubling, said the professor, Conrad Johnson, is that in that same period, both groups improved their college grade-point averages and their scores on the Law School Admission Test, or L.S.A.T.
“Even though their scores and grades are improving, and are very close to those of white applicants [not true], African-Americans and Mexican-Americans are increasingly being shut out of law schools,” said Mr. Johnson, who oversees the Lawyering in the Digital Age Clinic at Columbia, which collaborated with the Society of American Law Teachers to examine minority enrollment rates at American law schools.
However, Hispanics other than Mexicans and Puerto Ricans made slight gains in law school enrollment.
The number of black and Mexican-American students applying to law school has been relatively constant, or growing slightly, for two decades. But from 2003 to 2008, 61 percent of black applicants and 46 percent of Mexican-American applicants were denied acceptance at all of the law schools to which they applied, compared with 34 percent of white applicants.
“What’s happening, as the American population becomes more diverse, is that the lawyer corps and judges are remaining predominantly white,” said John Nussbaumer, associate dean of Thomas M. Cooley Law School’s campus in Auburn Hills, Mich., which enrolls an unusually high percentage of African-American students.
Mr. Nussbaumer, who has been looking at the same minority-representation numbers, independently of the Columbia clinic, has become increasingly concerned about the large percentage of minority applicants shut out of law schools.
“A big part of it is that many schools base their admissions criteria not on whether students have a reasonable chance of success, but how those L.S.A.T. numbers are going to affect their rankings in the U.S. News & World Report,” Mr. Nussbaumer said. “Deans get fired if the rankings drop, so they set their L.S.A.T. requirements very high.
“We’re living proof that it doesn’t have to be that way, that those students with the slightly lower L.S.A.T. scores can graduate, pass the bar and be terrific lawyers.”
Margaret Martin Barry, co-president of the Society of American Law Teachers, said that while she understood the importance of rankings, law schools must address the issue of diversity. “If you’re so concerned with rankings, you’re going to lose a whole generation,” she said.
The Columbia study found that among the 46,500 law school matriculants in the fall of 2008, there were 3,392 African-Americans, or 7.3 percent, and 673 Mexican-Americans, or 1.4 percent. Among the 43,520 matriculants in 1993, there were 3,432 African-Americans, or 7.9 percent, and 710 Mexican-Americans, or 1.6 percent. The study, whose findings are detailed at the Web site A Disturbing Trend in Law School Diversity, relied on the admission council’s minority categories, which track Mexican-Americans separately from Puerto Ricans and Hispanic/Latino students.
“We focused on the two groups, African-Americans and Mexican-Americans, who did not make progress in law school representation during the period,” Mr. Johnson said. “The Hispanic/Latino group did increase, from 3.1 percent of the matriculants in 1993, to 5.1 percent in 2008.”
Mr. Johnson said he did not have a good explanation for the disparity, particularly since the 2008 LSAT scores among Mexican-Americans were, on average, one point higher than those of the Hispanics, and one point lower in 1993.
Over all, Mr. Johnson said, it is puzzling that minority enrollment in law schools has fallen, even since the United States Supreme Court ruled in 2003, in Grutter v. Bollinger, that race can be taken into account in law school admissions because the diversity of the student body is a compelling state interest.
“Someone told me that things had actually gotten worse since the Grutter decision, and that’s what got us started looking at this,” Mr. Johnson said. “Many people are not aware of the numbers, even among those interested in diversity issues. For many African-American and Mexican-American students, law school is an elusive goal.”
From 2004 through 2008, the field grew less than 1% per year on average, going from 735,000 people making a living as attorneys to just 760,000, with the Bureau of Labor Statistics postulating that the field will grow at the same rate through 2016. Taking into account retirements, deaths and that the bureau's data is pre-recession, the number of new positions is likely to be fewer than 30,000 per year. That is far fewer than what's needed to accommodate the 45,000 juris doctors graduating from U.S. law schools each year.
Of course, a lot of people who graduate from law school never pass the Bar Exam, including about 40% of black law school grads and 53% of all blacks who start law school, according to Richard Sander. I don't believe there is affirmative action grading on Bar Exams, so that test traps a lot of blacks who have taken out huge student loans to attend law school. Why is the NYT pushing for playing that kind of dirty trick on black applicants who are even less likely to pass the Bar Exam?
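Greenbaum's arithmetic checks out, for what it's worth (the figures below are the ones quoted above; the 30,000 openings is his stated upper bound):

```python
# Figures quoted from Greenbaum's LA Times piece
lawyers_2004 = 735_000
lawyers_2008 = 760_000
grads_per_year = 45_000
openings_per_year = 30_000  # his upper bound, counting retirements and deaths

# Compound annual growth of the profession, 2004-2008
annual_growth = (lawyers_2008 / lawyers_2004) ** (1 / 4) - 1
print(f"{annual_growth:.2%}")  # prints 0.84% -- well under 1% per year

# Oversupply: graduates relative to openings
surplus = grads_per_year / openings_per_year - 1
print(f"{surplus:.0%}")  # prints 50% -- the "50% more grads than jobs" claim
```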
December 21, 2009
Advanced Placement Tests
Leaving aside for the moment the more subtle issues (some of which are explored surprisingly well in the discussion), I noticed in the NYT's comments a "B.P." who makes one helluva case for the basic existence of Advanced Placement testing:
I was the first person in my extended family (35 siblings and first cousins in this generation) to graduate from a 4 year university. My parents both left high school at age 16. My father finished high school by correspondence, my mother has her GED. I was raised in a religious minority with lower U.S. college attendance rates than the Native American population (per Pew research). As late as my last semester of high school, I doubted whether I would be able to attend college upon high school graduation.
I was also the (male) AP State Scholar from AZ for 1994. I qualified for free AP exams based on family income level, and I took all offered AP courses consistent with my schedule as well as taking exams in several other areas where AP courses were not offered. The 63 credits I earned in this fashion allowed me to complete a BS in Electrical Engineering in 3.5 years, while taking a light enough (12-15 semester hour) course load that I could schedule all of my classes for two or three day schedules, allowing me to work 3-4 days per week, while continuing to spend roughly 20 hours per week in religious activities. While supplemented by an AZ tuition waiver (class rank based) to attend a state school, a National Merit Scholarship, and proximity to campus (4 miles from ASU), this course credit was the key factor which allowed me to make the case to my father that I would be able to continue to work in the family business while attending college for an unextended period, and it wouldn't cost him a dime, nor would we incur debt.
Had my high school (with its roughly 50% dropout rate) not had an extensive AP program, I have no doubt that I would not have gone to college. I would currently be a sub-par unemployed electrician, instead of a registered professional engineer for the past 9 years. I would be looking for a job rather than having been employed in 5 progressively more responsible engineering positions at the same utility over the past 11 years. At least three family members would currently not own the houses they are living in, my youngest sister wouldn't have graduated from ASU, and I would currently be worrying about how to support my parents in retirement.
... Denying students opportunity is no service to students or society.
Sounds like the hero of a Heinlein juvenile novel from 1958.
I wonder which "religious minority" this fellow is from. Polygamous Mormon? Jehovah's Witness? Syrian Jewish? Shi'ite? Mennonite? There are a lot of clues in his comment (which can be read in full here), but I haven't been able to come up with a good guess.
July 28, 2009
Pilots and g-Force
When researching my 2004 article on John F. Kerry's and George W. Bush's IQ scores, judging from their performance on the Officer Qualification Tests they took in the late 1960s (Bush 120-125, Kerry 115-120, which turned out to fit with their GPAs at Yale), I read a lot of studies from the 1960s by the military's psychometricians documenting the predictive validity of these exams. I then tried to track down the authors to help me understand Kerry's and Bush's scores.
I spent two hours on the phone with a very helpful gentleman, now a college professor of statistics, who had retired after many years as the head psychometrician for one of the major branches of the Armed Services.
Among much else that was interesting, he mentioned that in 1990 he had provided to Charles Murray the U.S. military's scores from the renorming of its AFQT enlistment test. In 1980, the Pentagon had paid the Department of Labor to give the AFQT to all 12,000+ young people in its National Longitudinal Survey of Youth database. The middle section of The Bell Curve is devoted to tracking how these ex-youths, now 25 to 33 in 1990, were doing in life in relation to their IQ scores a decade before.
My source had nothing but praise for The Bell Curve.
The psychometric expert said something that seemed puzzling to me. He said that the General Factor of intelligence completely dominated job performance as a pilot to such an extent that it really wasn't worthwhile to give multiple intelligences tests of specific piloting skills, such as the one George W. Bush took in 1968 to measure his 3-d visualization skills.
For example, a question might ask:
Which picture represents how the horizon would look straight-ahead out the cockpit window when you are in the midst of turning from flying north to flying east while banking 60 degrees?
A. _
B. /
C. \
D. |
Bush only scored, I believe, at the 25th percentile on this test, but I don't think this kind of thing came up much in the Oval Office.
My source said that he recommended getting rid of flying-specific tests for admission to pilot training, but the brass wouldn't go along with it because they insisted there had to be pilot-specific skills separate from the g Factor.
Listening to him, I certainly agreed with the brass. After all, I have a decent IQ, but I'd make a terrible pilot during the brief interval before I became a smoking crater due to making some stupid mistake.
And this is not something I only recently realized. I can vaguely recall being 16 and looking at the catalog from the Air Force Academy and deciding, based on my experience driving a car, riding a bike, playing sports, and generally bumbling about in the physical world, that I wasn't cut out to pilot Air Force jets.
I've wondered about this expert's finding over the years, and I think I've finally started to figure it out: people with high IQs who would be bad pilots generally figure that out for themselves, so they never take the tests to become pilots. Hence the high correlation between the g Factor and pilot performance among test-takers: the high-IQ individuals in the applicant pool have already self-selected for pilot-specific skills.
Similarly, high IQ guys who would make lousy firemen already know it, so they seldom take the fireman's test.
Thus, hiring tests like the New York ones ruled discriminatory by Judge Garaufis tend to work well. They are combination aptitude and achievement tests, with all the questions solely about firefighting but all the information needed to answer them given on the test itself. Still, under pressure, it's not so easy to decipher passages about the technical details of chainsaw maintenance.
To score perfectly on these kinds of tests, then, it's helpful both to be reasonably bright and to have studied firefighting guidebooks. High IQ guys who wouldn't make good firemen tend to figure out while they're studying that this isn't the career for them, and thus don't take the tests. So, these kinds of aptitude/achievement tests work quite well.
