(by Julien Cornebise)
That’s it. It’s over. Done. Gone. RIP JSM 2011. ’til next year. A great week!
Yesterday’s convention center was a mix between an airport and the ghost town of Saturday: a fraction of the people were still here, most of them carrying suitcases. There should not be any talks on the last day 😉 And, although there was no big two-hour lecture to attend, I still had a hard time choosing between
- Sampling and Sampling Distributions contributed session, and
- Significance Magazine: Communicating Statistics to the World about the joint magazine Significance of the RSS and ASA.
The 15-minute brevity of the former’s talks put me off. My curiosity about this magazine that Xian blogged about, the challenge of talking stats to non-statisticians, and my own wish for a “Popular science” on steroids made me pick the latter.
Boy was I glad: after a short introduction outlining the aim of Significance and calling for contributors (think of it, for you or your PhD students, it looks like a great experience!), we were treated to three very enjoyable talks by authors of recent cover papers:
- Uneducated Guesses: Using Evidence to Uncover Misguided Education Policies by Howard Wainer (I could not find the article in Significance’s archive — anyone?)
- The sea, the Census and statistics by Andrew Solow
- Deepwater disaster: how the oil spill estimates got it wrong by Ian MacDonald
Howard Wainer on how missing data can lead to dire policies, and how just a little extra data can be of precious help in avoiding dramatic mistakes, with striking illustrations in Education that are also available in his book. This was thought-provoking: at first, I might tend to integrate out the missing data using the EM algorithm or Data Augmentation, hence assuming that the missing data is distributed similarly to the non-missing. Wrong! Howard’s examples were some of those “a-ha!” moments, where you just realize that the original strategy amounted to standing on your head. Three examples:
- Allowing the students to pick a subset of possible questions in a test, so as to make it fairer. Wrong. A quick study on one class showed that it tends to worsen the inequality: weak students are impaired in their choice and pick the hardest questions, failing them. Consequence of assuming random missing data: widening the score gap with the better students, who picked the easiest questions.
- Eliminating tenure for teachers to save money. Wrong. Looking back at the 1991 removal of tenure for superintendents showed that salaries increased massively. Most likely explanation: tenure is a job benefit that costs the employer nothing; removing it requires increasing the salary to compensate. Consequence of assuming random missing data: increasing the expenses.
- Making SAT scores disclosure optional to enter college. Wrong. Studying withheld SAT scores for the one college that has done so for 40 years shows that students choose rationally to disclose their score or not: very few “I did very well at SAT, but so what?”, many “I scored less than the average entry score, disclosing it won’t help my chances to enter”. Consequence of assuming random missing data: those students picked classes that they failed, as they lacked too many prerequisites. A thought here: it would also have been interesting to compare them not only with students who divulged their score, as Howard did, but with other students with similar scores who went to other universities: did getting access to harder classes than they would usually have been allowed to take help them in the long term?
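To make the trap concrete for myself, here is a minimal simulation sketch of the optional-disclosure example. Everything in it is my own assumption, not Howard's data: the score distribution, the `cutoff`, and the 10% disclosure rate below it are made-up numbers, and `simulate_optional_disclosure` is a hypothetical helper name. It only illustrates that treating withheld scores as missing at random overestimates the typical score.

```python
import random
import statistics


def simulate_optional_disclosure(n=10_000, cutoff=1100.0, seed=0):
    """Compare the true mean score with the mean of disclosed scores only.

    Assumed (made-up) mechanism: students at or above `cutoff` always
    disclose; students below it disclose with probability 0.1.
    """
    rng = random.Random(seed)
    # Hypothetical score distribution, not real SAT data.
    scores = [rng.gauss(1000.0, 150.0) for _ in range(n)]
    # Disclosure depends on the score itself: data NOT missing at random.
    disclosed = [s for s in scores if s >= cutoff or rng.random() < 0.1]
    true_mean = statistics.mean(scores)
    # Naive analysis: pretend the withheld scores look like the disclosed ones.
    naive_mean = statistics.mean(disclosed)
    return true_mean, naive_mean


if __name__ == "__main__":
    true_mean, naive_mean = simulate_optional_disclosure()
    print(f"true mean of all scores:   {true_mean:.1f}")
    print(f"mean of disclosed scores:  {naive_mean:.1f}")
```

Under these assumptions the disclosed-only mean sits well above the true mean, which is exactly the direction of the mistake in the admissions example.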
Andrew Solow on the Census of Marine Life (2000-2010): how many species, and is a species extinct? There were some striking statistical problems, again due to non-uniform missing data: it is missing because the species is harder to observe in our usual surroundings! So there is more to it than the abstract problem of estimating the number of classes in multinomial sampling, and of estimating the end-point of a distribution (a tricky problem in itself already).
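For readers who have not met the abstract version of the problem, here is a small sketch of one classical estimator of the number of classes, the bias-corrected Chao1 estimator. To be clear, this is my own choice of illustration: I am not claiming it is the method Andrew presented, and the sample counts are invented. It also assumes exactly the kind of well-behaved sampling that the talk warned we do not have at sea.

```python
def chao1(counts):
    """Bias-corrected Chao1 estimate of the total number of species.

    `counts` holds the number of individuals observed for each species.
    The estimate is S_obs + f1 * (f1 - 1) / (2 * (f2 + 1)), where f1 and f2
    are the numbers of species seen exactly once and exactly twice.
    """
    observed = [c for c in counts if c > 0]
    s_obs = len(observed)
    f1 = sum(1 for c in observed if c == 1)  # singletons
    f2 = sum(1 for c in observed if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


if __name__ == "__main__":
    # Invented sample: 7 species observed, 3 of them only once.
    sample = [1, 1, 1, 2, 2, 5, 10]
    print(chao1(sample))
```

The many rarely seen species drive the estimate upward, which gives a feel for why species that are hard to observe matter so much.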
Finally, and most anchored in current events, came Ian MacDonald’s brilliant talk on the BP Discharge in the Gulf of Mexico (I learned it’s a more precise term than “Deepwater oil spill”: it’s not Deepwater in charge but BP, and it is not an overboard spill but a discharge from a reservoir).
This one was one for the records: a precise and scientific study of the estimates of the size of the discharge, based on the speaker’s experience with natural oil seeps occurring every day in the Gulf. Beyond the beautiful/appalling before/after pictures, and the pleasant feeling of the modest scientist being (sadly) proved right against the massive corporation, there was a fascinating scientific chase to the source of the discrepancies amongst the estimates. Ian brilliantly traced it down to the table linking the thickness of the surface oil spread with its color (rainbow, metallic, light-brown, dark), which is multiplied by the surface area to estimate the volume: while all of the scholars’ studies use one table, oil companies (BP, Exxon) use one provided by the US Coast Guard with a 100-fold downward error for the thickest levels, precisely the ones needed when drama occurs!
The dramatic consequences of this error are well-known: we’re not talking indemnities, but a dramatic error on the pressure escaping the well, leading to the failure of the blockage attempts. The error was confirmed when the videos of the leak were finally released and particle-velocity expert scholars were able to confirm overnight that the flow was much larger than officially stated.
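The color-table computation is simple enough to sketch, which makes the size of the effect easy to see. All numbers below are invented for illustration: the thickness values in `REFERENCE_THICKNESS_UM` and the areas in the example are hypothetical, not the ones from either real table, and I apply the 100-fold error to the single thickest class only as a simplification.

```python
import math

# Hypothetical thickness per appearance class, in micrometres.
REFERENCE_THICKNESS_UM = {
    "rainbow": 0.1,
    "metallic": 1.0,
    "light-brown": 10.0,
    "dark": 100.0,
}

# Same table with the thickest class understated 100-fold (assumption).
UNDERSTATED_THICKNESS_UM = dict(REFERENCE_THICKNESS_UM, dark=1.0)


def slick_volume_m3(areas_km2, thickness_um):
    """Volume = sum over classes of area times thickness.

    1 km^2 of oil that is 1 micrometre thick holds exactly 1 m^3,
    so multiplying km^2 by micrometres gives cubic metres directly.
    """
    return sum(areas_km2[c] * thickness_um[c] for c in areas_km2)


if __name__ == "__main__":
    # Invented slick: mostly thin sheen, with a small thick dark core.
    areas = {"rainbow": 1000.0, "metallic": 500.0, "light-brown": 100.0, "dark": 50.0}
    reference = slick_volume_m3(areas, REFERENCE_THICKNESS_UM)
    understated = slick_volume_m3(areas, UNDERSTATED_THICKNESS_UM)
    print(f"reference table:   {reference:.0f} m^3")
    print(f"understated table: {understated:.0f} m^3")
    print(f"ratio: {reference / understated:.1f}")
```

Even though the dark class covers only a small share of the surface in this made-up slick, it carries most of the volume, so an error confined to the thickest class changes the total several-fold.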
Ian concluded not with an obvious “who’s to blame”, which would have been too easy (and obvious…), but focused on the question: what will be the long-lasting impact? His study of the spatial distribution of the natural seeps, very different from that of the BP discharge, puts to rest the idea that the ecosystem is somehow immunized. We’re left with the challenge of designing a statistical test for that unwanted massive experiment. Ian calls for two concrete measures:
- Identify and monitor key habitats and populations to check ecosystem health.
- Put the repayment of the ecosystem in the front of the line, using BP’s fine to that effect.
In conclusion, a very pleasant session, a treat for those of us who could stay this last day, and a very interesting magazine: I’ll definitely think of contributing!
Stay tuned for a final post later tonight, before I hand back the keys of the blog to its editor.