December 4, 2013

PISA: Which countries to trust the least

How can you be confident that local officials didn't pull any fast ones with their PISA results? Well, you can't, but you can get some sense of how much room there is to pull the wool over your eyes by looking at the response rate. 

Large countries have to test at least 4,500 students, and the sample is supposed to be carefully designed to represent the entire country's 15-year-olds. But projected coverage usually turns out less than perfect. For example, countries can exclude students with disabilities. This sounds reasonable -- it's hard for a blind person to take a pencil and paper test. But, what about cognitive disabilities, such as not being very bright? From the federal government's website on PISA:
PISA 2012 is designed to be as inclusive as possible. The guidelines allowed schools to be excluded for approved reasons (for example, schools in remote regions, very small schools, or special education schools). Schools used the following international guidelines on student exclusions: 
Students with functional disabilities. These were students with a moderate to severe permanent physical disability such that they cannot perform in the PISA testing environment. 
Students with intellectual disabilities. These were students with a mental or emotional disability and who have been tested as cognitively delayed or who are considered in the professional opinion of qualified staff to be cognitively delayed such that they cannot perform in the PISA testing environment. 
Students with insufficient language experience. These were students who meet the three criteria of not being native speakers in the assessment language, having limited proficiency in the assessment language, and having less than 1 year of instruction in the assessment language. 
Overall estimated exclusions (including both school and student exclusions) were to be under 5 percent of the PISA target population.

Buried in a PISA appendix entitled Annex 2A are PISA figures for what percentage of the target populations of 15-year-olds didn't get tested. America didn't come close to getting 95% representation, and many Third World countries were far worse.

"Coverage Index 3: Coverage of 15-year-old population" shows what percentage of the cohort is represented when the test-taking sample is projected to the whole country. I subtracted this percentage from 100% to come up with the % Missing index. For example, Costa Rica only managed to test half the people they were supposed to, and Albania only tested 55%. Vietnam, which made a splashy PISA debut with high scores, somehow couldn't find 44% of their 15-year-olds. At the other end, the dutiful Dutch managed to test slightly more students than were thought to be around.

Country % Missing
Costa Rica 50%
Albania 45%
Vietnam 44%
Mexico 37%
Colombia 37%
Indonesia 37%
Turkey 32%
Brazil 31%
Thailand 28%
Peru 28%
Uruguay 27%
Liechtenstein 25%
Bulgaria 23%
Shanghai-China 21%
Malaysia 21%
Argentina 20%
Kazakhstan 19%
Macao-China 19%
Hungary 18%
United Arab Emirates  17%
Canada 17%
Chile 17%
Hong Kong-China 16%
Czech Republic 15%
Serbia 15%
Latvia 15%
Lithuania 14%
Jordan 14%
Australia 14%
Italy 14%
Greece 13%
New Zealand 12%
Korea 12%
Austria 12%
Portugal 12%
Spain 12%
France 12%
United States 11%
Chinese Taipei  11%
Poland 11%
Luxembourg 11%
Montenegro 10%
Israel 9%
Denmark 9%
Japan 9%
Ireland 9%
Slovak Republic 9%
Tunisia 9%
Switzerland 9%
Norway 8%
Estonia 8%
Russian Federation 8%
Iceland 7%
Sweden 7%
United Kingdom 7%
Slovenia 6%
Qatar 6%
Croatia 6%
Germany 5%
Singapore 5%
Belgium 5%
Finland 4%
Romania 4%
Cyprus 3%
Netherlands -1%

In general, Third World countries were bad at getting good coverage, suggesting that the First World v. Third World gap is even larger than the test scores imply.
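
The % Missing column above is just 100 minus Coverage Index 3. Here is a minimal Python sketch of that subtraction. The coverage values are the ones implied by the figures quoted in this post rather than copied from Annex 2A, and the names `coverage_index_3` and `pct_missing` are made up for illustration.

```python
# Coverage Index 3 (percent of 15-year-olds represented), as implied by the
# figures quoted in the post; not copied from Annex 2A directly.
coverage_index_3 = {
    "Costa Rica": 50,
    "Albania": 55,
    "Vietnam": 56,
    "Netherlands": 101,
}


def pct_missing(coverage_pct):
    """Percent of the cohort not represented: 100 minus the coverage index."""
    return 100 - coverage_pct


missing = {country: pct_missing(c) for country, c in coverage_index_3.items()}

# Sort worst coverage first, as in the table.
for country, m in sorted(missing.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{country} {m}%")
```

A coverage index above 100, as with the Netherlands, simply yields a negative % Missing.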

Top scorer Shanghai missed 21%, so we should take its flashy scores with a few grains of salt.

America was at 11% missing, down from 18% missing in 2009, which may account for the slight decline in U.S. scores.

Consistent high-flier Finland had only 4% missing, so they aren't cheating on this measure more than the competition is.

A major question is how random were the missing test-takers. If the missing were purely random, then no harm no foul. But of course, many of the missing are dropouts, or in special day classes, or in juvy hall, or whatever.
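
To get a feel for how much non-random missingness could matter, here is a rough Python sketch. It assumes, purely for illustration, that scores are normally distributed with a standard deviation of 100 and that the missing students are exactly the lowest scorers. That is a worst case, not a description of any actual country, and the function name `mean_inflation` is hypothetical.

```python
from statistics import NormalDist


def mean_inflation(missing_share, sd=100.0):
    """Rise in the observed mean if the bottom `missing_share` of a normal
    score distribution goes untested (worst-case assumption)."""
    if not 0 <= missing_share < 1:
        raise ValueError("missing_share must be in [0, 1)")
    if missing_share == 0:
        return 0.0
    std = NormalDist()                   # standard normal distribution
    cutoff = std.inv_cdf(missing_share)  # z-score below which students are missing
    # Mean of a standard normal truncated from below at `cutoff`,
    # rescaled to the test's standard deviation.
    return sd * std.pdf(cutoff) / (1 - missing_share)


for share in (0.05, 0.11, 0.20, 0.37):
    print(f"{share:.0%} missing -> up to {mean_inflation(share):.0f} points")
```

Under these assumptions, 20% missing would inflate the mean by about 35 points at most. If the missing were purely random, the inflation would be zero, so the truth for any given country presumably lies somewhere in between.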

This may slightly help excuse Argentina's horrible scores. The Argentineans misplaced only 20% of their 15-year-olds compared to the 37% of Mexicans who went missing.

Swedes accuse PISA of fabricated data

Via Staffan's Personality Blog, here's an article from a Swedish (ahem, sore loser, ahem) newspaper accusing PISA of using fabricated data from Slovenia, Italy, and the United Arab Emirates. The charges don't involve students, but high school principals. The principals were supposed to fill in a 184-question survey for the Nosey Parkers at PISA, but there is evidence that dozens of principals just cut and pasted somebody else's answers, which wouldn't be hugely surprising with a survey that is 184 questions long.

A general problem with comparing countries' results on international tests is differing levels of motivation. It's remarkable how plausible the PISA results are in general considering how much this factor is likely to vary from place to place and time to time.

The 5 most expensive words in the world: "We'll fix it in post"

Commenter Power Child notes:
"We'll fix it in post" are known to production guys as the five most expensive words in filmmaking.

"We'll fix it in post" is also the reasoning behind an awful lot of government spending on education, welfare, medicine, prisons, and many other Gaps caused by lack of care upfront in the production of residents of America.

PISA by state by race: Massachusetts, Connecticut, Florida

The PISA test was given to large sample sizes in three American states. From the federal National Center for Education Statistics:

PISA 2012
Race / Ethnicity Mean Math Science Reading
Massachusetts
White 538 530 545 540
Black 467 458 466 476
Hispanic 460 446 460 475
Asian 578 569 580 584
Multiracial NA NA NA NA
Connecticut
White 542 534 547 546
Black 434 421 433 447
Hispanic 456 442 463 463
Asian 548 534 553 558
Multiracial 512 496 520 521
Florida
White 512 499 520 518
Black 429 413 425 449
Hispanic 474 458 475 489
Asian NA NA NA NA
Multiracial 486 467 500 492
White-Black Gaps
Massachusetts 72 72 79 64
Connecticut 109 113 114 99
Florida 83 86 95 69
White-Hispanic Gaps
Massachusetts 78 84 85 65
Connecticut 86 92 84 83
Florida 38 41 45 29

The standard deviation is supposed to be 100, so you can just divide those gap numbers by 100 to convert them into rough z scores: a gap of 72 points is about 0.72 standard deviations.

We can see patterns here that shouldn't be unexpected. Massachusetts, home to the education-industrial complex since 1636, has smart whites. Connecticut, home to the hedge fund industry, has smart whites.

Florida, not so much. Still, this would be a good time for an old anecdote about how Florida isn't wall-to-wall Parrot Heads. I had a girlfriend in college who went to the public high school in Cocoa Beach, FL (the town that was the setting for the 1960s sit-com I Dream of Jeannie). She told me she scored 1580 on the SAT (M+V, old-style). I exclaimed:

"You must have had the highest score in your high school!"

"Oh, no, I was fourth-highest."

"Fourth? Who were the other three? The children of rocket scientists?" (In my defense, this was a relatively new witticism in 1979.)

"Yes."

Massachusetts has pretty smart blacks, going back to Phillis Wheatley and W.E.B. Du Bois. Connecticut and Florida, not so much.

Florida has pretty smart Hispanics, although the wealthy Cubans and other rich Latin Americans are getting diluted more and more.

A thought on the cause of growing inequality

In this globalist age, we all know that nationalism was the worst thing ever. 

Except that the masses tended to do pretty well for themselves under nationalist governments that needed well-educated, well-fed, and enthusiastic populations to man their giant armies.

Perhaps Tyler Cowen's "Average Is Over" is, fundamentally, a function of the development of smart bombs, cruise missiles, stealth, and other military technology in the 1970s that increased the accuracy of weaponry, and thus decreased the need for large numbers of conscript soldiers firing vaguely in the general direction of the enemy to make them keep their heads down. 

During the Great Compression in the middle of the 20th Century, elites needed mass armies, so they treated the masses well economically. But warfare has gone high tech and the need for cannon fodder has dropped sharply, so elites don't need the masses to fight their wars for them, and they no longer feel the need to cut the masses a large share of the economic pie.

American students have trouble with "higher cognitive demands"

From the "Country Note" for the United States from PISA:
Students in the United States have particular weaknesses in performing mathematics tasks with higher cognitive demands, such as taking real-world situations, translating them into mathematical terms, and interpreting mathematical aspects in real-world problems. An alignment study between the Common Core State Standards for Mathematics and PISA suggests that a successful implementation of the Common Core Standards would yield significant performance gains also in PISA.

The key phrase there is "a successful implementation."

How does PISA really work? "Fix it in Post"

How can PISA claim to fairly test in 65 countries in dozens of languages?

My vague hunch is that modern Item Response Theory testing, of which the PISA test's Rasch Model is an example, allows testers to say, much like movie directors of sloppy productions: "We'll fix it in Post." 

You tell me that during the big, expensive action scene I just shot, the leading man's fly was open and in the distant background two homeless guys got into a highly distracting shoving match? And you want to know whether we should do another take, even though we'd have to pay overtime to 125 people? 

"Eh, we'll fix it in Post."

Modern filmmakers have a lot of digital tricks up their sleeves for rescuing scenes, just as modern psychometricians have a lot of computing power available to rescue tests they've already given.

For example, how can the PISA people be sure ahead of time that their Portuguese translations are just as accurate as their Spanish translations? 

Well, that's expensive to do and raises security problems. But, when they see the results come in, they can notice that, say, smart kids in both Brazil and Portugal who scored high overall did no better on Question 11 than kids who didn't score well on the other questions, which suggests the translation of Question 11 might be ambiguous. Oh, yeah, there are, now that we think about it, two legitimately right answers to Question 11 in the Portuguese translation. So we'll drop #11 from the scoring in those two countries. But, in the Spanish-speaking countries, this anomaly doesn't show up in the results, so maybe we'll count Question 11 for those countries.

This kind of post-hoc flexibility allows PISA to wring a lot out of their data. On the other hand, it's also a little scary. 
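
Here is a toy Python sketch of that kind of after-the-fact check. It is not PISA's actual procedure, which I haven't seen. It shows the Rasch formula for the chance of a correct answer, plus a crude screen that flags any question whose results don't track how students did on the rest of the test. The function names, the 0.1 threshold, and the response data are all invented for illustration.

```python
import math


def rasch_p(ability, difficulty):
    """Rasch model: probability of a correct answer, with ability and
    item difficulty both expressed on the same logit scale."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def item_rest_correlation(responses, item):
    """Correlate one item (scored 0/1) with the total on all other items."""
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]
    return pearson(item_scores, rest_scores)


def flag_items(responses, threshold=0.1):
    """Indexes of items on which strong students do no better than weak ones."""
    n_items = len(responses[0])
    return [i for i in range(n_items)
            if item_rest_correlation(responses, i) < threshold]


# Invented data: one row per student, one column per question (1 = correct).
# The last question behaves as if its translation were ambiguous.
responses = [
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]

print(flag_items(responses))
```

Running the screen separately on each language group's responses is what would let you drop a question for the Portuguese speakers while keeping it for the Spanish speakers.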

Israel's PISA scores: Arab v. Hebrew-speakers

From Globes, an Israeli business publication:
The PISA exam shows substantial gaps between Hebrew and Arabic-speaking pupils. In the math exam, Hebrew speakers achieved a score of 489 points, while Arabic speakers achieved a score of 388 points. Arabic speakers scored 98 points less than Hebrew speakers in the science exam.

Graph of 2012 PISA scores for 65 countries/economies


This graph displays the mean of the Math, Science, and Reading test scores from the OECD's 2012 Programme for International Student Assessment. American scores are red, white countries are blue, East Asian countries are yellow, Muslim countries are green, and Latin American countries are brown.

So, Asian Americans outscored all large Asian countries (with the exception of three rich cities); white Americans outperformed most, but not all, traditionally white countries; and Latino Americans did better than all Latin American countries. African Americans almost certainly scored higher than any black majority country would have performed.

Bear in mind that many countries did not take part in PISA, such as India, which dropped out after a trial run in two states produced average scores below any seen on this chart. For a broader sampling of Third World scores, see the 2011 TIMSS Math and Science scores.

The reality is that there is not much difference in PISA or TIMSS scores within major racial blocs of countries. The Northeast Asians all tend to score well, the European and white Anglosphere countries tend to score fairly well, the Latin American countries tend to score fair to middling, and on down from there. The rank order of continents is very much like the rank order of racial/ethnic groups on NAEP or SAT or CST tests. Newcomers to the topic like Amanda Ripley, author of The Smartest Kids in the World, get excited about minor differences in PISA scores within continents, but those often are statistical noise. 

For more on how to think about PISA scores, see here. And all my postings on PISA are here.

December 3, 2013

Steve in Taki's: PISA, Piece by Piece

From my new column at Taki's Magazine:
PISA, Piece by Piece 
by Steve Sailer  
With the release of new PISA test scores for 65 countries’ 15-year-olds this week, it’s worth taking a look at TIME reporter Amanda Ripley’s latest book The Smartest Kids in the World: And How They Got that Way.
Ripley came up with the clever idea of following three American high schoolers as exchange students in Finland, South Korea, and Poland. She chose Finland and South Korea because they are perennial PISA powerhouses, while Poland has improved its ranking significantly in this century. 
Her sample size of three American kids abroad is hardly foolproof, and yet it’s a start. Everybody has opinions on schooling, but few people have firsthand experience with different countries’ school systems because it’s immensely time-consuming to sit in on classrooms long enough that the teacher runs out of her dog-and-pony shows for visitors and finally gets down to normal business. 
Having only recently become interested in the topic of education, Ripley is a true believer in PISA scores. 
Should you be? In truth, nobody seems to really know how much to trust PISA and its ace salesman Andreas Schleicher. ... The sheer logistical challenge of what PISA attempts to do should raise common-sense questions about how perfectly 65 countries can be compared. Translation of tests, selection of representative samples, and prevention of local authorities putting their thumbs on the scale are challenges so daunting to get exactly equal around the world that most observers just seem to hope for the best and trust that Schleicher has somehow devised a globally level playing field.

Please read the whole thing there.

Mandatory Finnish Content: Finland still #1 in Europe

There has been much talk about how Finland plummeted from its traditional top spot in the PISA scores, but:

- Finland was always only tops in white countries. Some Northeast Asians would typically beat the Finns.

- Finland is still #1 in white countries if you weight Math (519 for Finland), Science (545), and Reading (524) equally, for an overall score of 529, ahead of runner-up Estonia (526). This go-round of PISA emphasized Math, on which Finland came in fifth among white countries behind Liechtenstein, Switzerland, Netherlands, and Estonia.

Overall, though, across all three subjects, Finland was still #1 in Europe and its diaspora.
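
For anyone who wants to check the equal-weighting arithmetic, here is a small Python sketch. The subject scores are the Finland and Estonia figures quoted in these posts; the `overall` helper is just an illustrative name.

```python
# Subject scores as quoted in these posts.
scores = {
    "Finland": {"math": 519, "science": 545, "reading": 524},
    "Estonia": {"math": 521, "science": 541, "reading": 516},
}


def overall(subject_scores):
    """Unweighted mean across subjects, i.e. each subject counts equally."""
    return sum(subject_scores.values()) / len(subject_scores)


# Rank countries by their equal-weight mean, highest first.
ranked = sorted(scores, key=lambda c: overall(scores[c]), reverse=True)
for country in ranked:
    print(f"{country} {overall(scores[country]):.0f}")
```

Ranking on Math alone instead would put Estonia (521) ahead of Finland (519), which is why the choice of weighting matters for the headline.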

In contrast, the Scandinavian countries did not excel in 2012, with overall means of 498 for Denmark, 496 for oil-rich Norway, and 482 for Sweden. Among members of the OECD, the rich countries' club, Sweden beat only Israel, Slovakia, Greece, Turkey, Chile, and Mexico. (Some of those countries that Sweden edged out are in the rich countries' club only for "courtesy" or "aspirational" reasons.)

Overall PISA rankings, including America by race

Here are today's 2012 PISA average scores ranked by the mean across the three subjects. Americans' scores by race are broken out to make the comparisons less misleading. In summary, each race in America appears to average a little better than their racial cousins overseas. (By the way, in the following list, the italicized names refer to non-OECD places):

Country or "Economy" Reading Science Math Mean
OECD average              496 501 494 497
Shanghai-China            570 580 613 587
Singapore                 542 551 573 556
Hong Kong-China           545 555 561 554
Asian Americans 550 546 549 548
Korea, Republic of        536 538 554 542
Japan                     538 547 536 540
Chinese Taipei            523 523 560 535
Finland                   524 545 519 529
Estonia                   516 541 521 526
Liechtenstein             516 525 535 525
Massachusetts All Races 527 527 514 523
Macao-China               509 521 538 523
Canada                    523 525 518 522
Poland                    518 526 518 521
Netherlands               511 522 523 519
Switzerland               509 515 531 518
White Americans 519 528 506 518
Connecticut All Races 521 521 506 516
Vietnam                   508 528 511 516
Ireland                   523 522 501 516
Germany                   508 524 514 515
Australia                 512 521 504 512
Belgium                   509 505 515 510
New Zealand               512 516 500 509
Multiracial Americans 517 511 492 507
United Kingdom            499 514 494 502
Austria                   490 506 506 500
Czech Republic            493 508 499 500
France                    505 499 495 500
Slovenia                  481 514 501 499
Denmark                   496 498 500 498
Norway                    504 495 489 496
Latvia                    489 502 491 494
United States             498 497 481 492
Luxembourg                488 491 490 490
Spain                     488 496 484 490
Italy                     490 494 485 490
Portugal                  488 489 487 488
Hungary                   488 494 477 487
Iceland                   483 478 493 484
Lithuania                 477 496 479 484
Croatia                   485 491 471 482
Sweden                    483 485 478 482
Florida All Races 492 485 467 481
Russian Federation        475 486 482 481
Israel                    486 470 466 474
Slovak Republic           463 471 482 472
Greece                    477 467 453 466
Hispanic Americans 478 462 455 465
Turkey                    475 463 448 462
Serbia, Republic of       446 445 449 447
Cyprus                    449 438 440 442
United Arab Emirates      442 448 434 441
Bulgaria                  436 446 439 440
Romania                   438 439 445 440
Thailand                  441 444 427 437
Chile                     441 445 423 436
African Americans 443 439 421 434
Costa Rica                441 429 407 426
Mexico                    424 415 413 417
Kazakhstan                393 425 432 416
Montenegro, Republic of   422 410 410 414
Malaysia                  398 420 421 413
Uruguay                   411 416 409 412
Brazil                    410 405 391 402
Jordan                    399 409 386 398
Argentina                 396 406 388 397
Tunisia                   404 398 388 397
Albania                   394 397 394 395
Colombia                  403 399 376 393
Indonesia                 396 382 375 384
Qatar                     388 384 376 383
Peru                      384 373 368 375

Source: My previous PISA postings, which are based on federal NCES access to PISA scores.

PISA science scores by race

From the federal National Center for Education Statistics, here are 2012 Science Literacy PISA scores for 15-year-olds, breaking out Americans by race:

OECD average              501
Shanghai-China            580
Hong Kong-China           555
Singapore                 551
Japan                     547
Asian Americans 546
Finland                   545
Estonia                   541
Korea, Republic of        538
White Americans 528
Vietnam                   528
Massachusetts All Races  527
Poland                    526
Canada                    525
Liechtenstein             525
Germany                   524
Chinese Taipei            523
Netherlands               522
Ireland                   522
Connecticut All Races 521
Australia                 521
Macao-China               521
New Zealand               516
Switzerland               515
Slovenia                  514
United Kingdom            514
Multiracial Americans 511
Czech Republic            508
Austria                   506
Belgium                   505
Latvia                    502
France                    499
Denmark                   498
United States             497
Spain                     496
Lithuania                 496
Norway                    495
Hungary                   494
Italy                     494
Croatia                   491
Luxembourg                491
Portugal                  489
Russian Federation        486
Florida All Races 485
Sweden                    485
Iceland                   478
Slovak Republic           471
Israel                    470
Greece                    467
Turkey                    463
Hispanic Americans 462
United Arab Emirates      448
Bulgaria                  446
Chile                     445
Serbia, Republic of       445
Thailand                  444
African Americans 439
Romania                   439
Cyprus                    438
Costa Rica                429
Kazakhstan                425
Malaysia                  420
Uruguay                   416
Mexico                    415
Montenegro, Republic of   410
Jordan                    409
Argentina                 406
Brazil                    405
Colombia                  399
Tunisia                   398
Albania                   397
Qatar                     384
Indonesia                 382
Peru                      373

Asian Americans did about average for wealthy Northeast Asians, white Americans beat all traditionally white countries except Finland and Estonia, Latino Americans beat all Latin American countries, and African Americans likely would have beaten all majority black countries by a comfortable margin.

What are the trends in U.S. PISA scores?

Scores for 65 countries (or "economies") are now out from the OECD's Programme for International Student Assessment (PISA). How have scores in the U.S. changed since PISA got going in 2000? 

From the federal NCES explorer tool for PISA scores:

Subject      2000  2003  2006  2009  2012
Mathematics  NA    483   474   487   481
Science      NA    NA    489   502   497
Reading      504   495   NA    500   498

Scores were down slightly in 2012 versus 2009, but they had been higher in 2009 than in preceding years (when, unfortunately, not all three subjects had been tested). Overall, 2012 scores look about the same as 21st Century scores in general, with no consistent trends visible in any subject.

So, that's kind of boring. The reason I mention it is because PISA results usually lead to great wailing and gnashing of teeth about Decline, etc.

PISA tests are scored like SATs with 500 as the intended mean for wealthy OECD countries (it usually slips below that) and a standard deviation of 100. So a 400 is at the 16th percentile for the OECD and 600 at the 84th percentile.
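
The percentile figures follow from treating the scale as a normal distribution. Here is a minimal Python sketch, assuming exactly the intended mean of 500 and standard deviation of 100 (the actual OECD mean usually comes in a bit lower); `oecd_percentile` is an illustrative name.

```python
from statistics import NormalDist

# Intended PISA scale: mean 500, standard deviation 100 (an idealization).
PISA_SCALE = NormalDist(mu=500, sigma=100)


def oecd_percentile(score):
    """Approximate OECD percentile for a PISA score under the normal assumption."""
    return 100 * PISA_SCALE.cdf(score)


for s in (400, 500, 600):
    print(f"{s}: {oecd_percentile(s):.0f}th percentile")
```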

For more postings devoted to analyzing PISA scores, click on Labels: PISA below.