(by Andrew Gelman)
Nicholas Christakis and James Fowler are famous for finding that obesity is contagious. Their claims, which have been received with both respect and skepticism (perhaps we need a new word for this: “respecticism”?), are based on analysis of data from the Framingham Heart Study, a large longitudinal public-health study that happened to have some social network data (for the odd reason that each participant was asked to provide the name of a friend who could help the researchers locate them if they were to move away during the study period).
The short story is that if your close contact became obese, you were likely to become obese also. The long story is a debate about the reliability of this finding (that is, can it be explained by measurement error and sampling variability?) and its causal implications.
This sort of study is in my wheelhouse, as it were, but I have never looked at the Christakis-Fowler work in detail. Thus, my previous and current comments are more along the lines of reporting, along with general statistical thoughts.
Lyons’s paper was recently published under the title “The Spread of Evidence-Poor Medicine via Flawed Social-Network Analysis.” Lyons has a pretty aggressive tone–he starts the abstract with the phrase “chronic widespread misuse of statistics” and it gets worse from there–and he’s a bit rougher on Christakis and Fowler than I would be, but this shouldn’t stop us from evaluating his statistical arguments. Here are my thoughts:
1. Lyons’s statistical critiques seem reasonable to me. There could well be something important that I’m missing, but until I hear otherwise (for example, in a convincing reply by Christakis and Fowler, which could well appear soon), I’d have to go with Lyons and say that the claimed results on contagion of obesity (and also sleep problems, drug use, depression, and divorce) have not been convincingly demonstrated.
2. That said, this does not mean that Christakis and Fowler are wrong in their claims, merely that their evidence is weaker than may have at first appeared. Lyons recognizes this, writing, “while the world may indeed work as C&F say, their studies do not provide evidence to support such claims.” I wouldn’t go quite so far as to say they don’t provide evidence, but it seems fair to say they don’t provide convincing or compelling evidence. And, had the criticisms of Lyons and others been available when the papers were first submitted, I doubt they would’ve been accepted by top journals. (Again, this is my current impression, and I’m open to changing my opinion if Christakis, Fowler, or others can supply a convincing response.)
3. In debates about empirical social science, there is often a tendency to simply accept descriptive claims and move straight to the arguments about their implications. But as I’ve learned in my own research, often the descriptive claims themselves should be disputed. (For example: No, congressional elections are not increasingly likely to be close; No, redistricting does not in general create safe seats; No, we don’t need to explain why rich people now vote for Democrats or why Kansas has suddenly gone Republican.)
So I’d like to separate Lyons’s criticism of the descriptive inferences and the causal implications. The descriptive criticism is that some of Christakis and Fowler’s observed differences are not statistically significant, so there is some doubt about generalization to the larger population; it could all just be patterns in random noise. The causal criticism is that, even if the descriptive patterns do generalize, they could be explained in ways other than contagion.
4. Some of Lyons’s points relate to my own research! In particular, he notes on page 6 that the difference between significant and non-significant is not itself statistically significant, a point that should be familiar to regular readers of this space. And on page 20, he discusses difficulties with average predictive comparisons in nonlinear models.
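The “difference between significant and non-significant” point is easy to demonstrate with made-up numbers: one estimate can clear the conventional z = 1.96 threshold while another does not, yet the difference between the two estimates is nowhere near significant. A quick sketch (hypothetical estimates, assuming independent normal errors; these numbers are mine, not anything from the papers under discussion):

```python
import math

def z_score(estimate, se):
    """Standard z-statistic for a normal-theory test against zero."""
    return estimate / se

# Two hypothetical estimates with equal standard errors (made-up numbers).
est_a, se_a = 25.0, 10.0   # z = 2.5 -> "statistically significant"
est_b, se_b = 10.0, 10.0   # z = 1.0 -> "not significant"

print(z_score(est_a, se_a))   # 2.5
print(z_score(est_b, se_b))   # 1.0

# The difference between the two estimates, assuming independence:
diff = est_a - est_b                        # 15.0
se_diff = math.sqrt(se_a**2 + se_b**2)      # about 14.1
print(z_score(diff, se_diff))               # about 1.06: far from significant
```

So declaring estimate A “real” and estimate B “null” on this evidence is exactly the comparing-significance-levels mistake: the data are entirely consistent with the two effects being identical.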
I love seeing these ideas in new places. It’s like traveling in some foreign country and seeing McDonald’s and Gap.
5. Lyons goes a bit over the top in the conclusion of his article, slamming observational studies and modeling in general. But statistical modeling is important and useful in many, many areas of science and engineering. We all know about the modeling successes of the past, Kepler etc., but even modern-day statisticians can make progress with models. For example, see this paper, in which we explicitly discuss how modeling allowed us to fit a nonlinear differential equation in toxicology. There’s also political science (lots of examples, starting with the recent work by Lax and Phillips, who used multilevel regression and poststratification to estimate state-level opinions on gay rights issues), civil engineering (they’ve been modeling road traffic for a long time, as discussed in the comments to Aleks’s recent blog entry on the topic), indoor air quality (ask Phil for details), business (lots of models were used in the Netflix Prize, including by the winning teams), etc., etc.
Bob writes the following about models in computational linguistics:
Google translate is heavily model based, being derived from IBM’s original statistical translation models.
Ad placement is also heavily model based, and works at least as far as Google’s revenue is concerned.
All of the speech recognition in everything from call centers to the desktop is heavily model based, and works pretty well judging by the numbers of people using it.
A neat example is the Swype and T9 interfaces for entering text on cell phones. . . .
All of these models make crazy and wrong assumptions about independence and so on, but they work in the sense that they’re useful, not in the sense that they’re right.
Let me echo my friend here on the “useful” point. I don’t think our models are true. Even seemingly slam-dunk models such as simple random sampling are not true with real surveys, nor are randomization models actually true with experiments on real people. Models are always approximations. (Further ranting here.)
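Bob’s point about wrong-but-useful independence assumptions can be made concrete with a toy naive Bayes classifier, the workhorse behind many of the language applications he mentions. This is my illustration, not his, and the training data are invented; the independence assumption is plainly false (words co-occur in obvious patterns), yet the classifier still does something useful:

```python
import math
from collections import Counter

# Tiny invented training set: the "words are independent given the class"
# assumption is clearly wrong here, but the model works anyway.
train = [
    ("spam", "win cash now"),
    ("spam", "win big cash prize"),
    ("ham",  "meeting notes attached"),
    ("ham",  "lunch meeting tomorrow"),
]

def fit(docs):
    """Count word occurrences per class."""
    counts = {"spam": Counter(), "ham": Counter()}
    totals = Counter()
    for label, text in docs:
        for w in text.split():
            counts[label][w] += 1
            totals[label] += 1
    return counts, totals

counts, totals = fit(train)
vocab = {w for c in counts.values() for w in c}

def log_prob(label, text, alpha=1.0):
    """Log P(class, text) under naive Bayes with add-alpha smoothing."""
    lp = math.log(0.5)  # flat prior over the two classes
    for w in text.split():
        # The naive (wrong!) independence assumption lives here:
        lp += math.log((counts[label][w] + alpha) /
                       (totals[label] + alpha * len(vocab)))
    return lp

def classify(text):
    return max(("spam", "ham"), key=lambda lab: log_prob(lab, text))

print(classify("win cash prize"))    # spam
print(classify("meeting tomorrow"))  # ham
```

The model is “wrong” in exactly Bob’s sense: no one believes the words of an email are conditionally independent. It earns its keep by being useful, not by being true.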
I think one should step back before slamming any research just cos it’s observational and model based. (And it doesn’t help that Lyons cites Larry Summers as an authority on statistical evidence.)
6. Just a minor technical point: On pages 17-18, Lyons implies that modeling is something you do when you don’t have enough data:
Small-scale experiments could be initiated to see what the effects of intervention actually are. Since the collection of good data is usually very hard and expensive, most papers substitute for it by statistical modeling.
This is misleading: first, a small-scale experiment can be noisy, and, by the very nature of its small scale, its larger implications can be limited. See our recent discussion of the claim (based on evidence from a randomized experiment!) that “a raise won’t make you work harder.” Reliability score: 100. Validity score: zero. Well, maybe not zero, but I don’t buy the generalization from lab to real world at all in that example. To me it seems more of a case of lab results + ideology = policy claim.
The second problem with Lyons’s argument above is the implication that modeling is what you do when you don’t have good data. Au contraire! Once you have good data, you might very well want to model to learn important things. Consider our radon project. We had 5000 excellent data points and 80,000 good data points. And to learn what we needed to learn, we fit a model. Which involved lots of work, lots of interaction between the science and the data, and lots of checking. It wasn’t easy but it’s what we needed to do.
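The “model good data to learn more” point can be illustrated with the partial-pooling idea behind multilevel models like the one in the radon project. Here is a minimal sketch on simulated data (all numbers invented, and this is the simplest normal-normal shrinkage estimator, not the actual radon model, which is a varying-intercept regression): counties with few measurements get pulled strongly toward the overall mean, while data-rich counties mostly keep their own estimates.

```python
import random

random.seed(0)

# Simulated "radon-like" data (invented numbers): a few counties, each
# with its own true mean level, and very unequal sample sizes.
true_means = {"A": 1.0, "B": 1.5, "C": 0.5}
n_obs = {"A": 50, "B": 5, "C": 2}

data = {c: [random.gauss(true_means[c], 1.0) for _ in range(n_obs[c])]
        for c in true_means}

def mean(xs):
    return sum(xs) / len(xs)

grand = mean([x for xs in data.values() for x in xs])

# Partial pooling with known within-county sd = 1 and an assumed
# between-county sd tau: the weight on a county's own mean is
# n / (n + sigma^2 / tau^2), so small-n counties shrink toward the
# grand mean and large-n counties barely move.
tau = 0.5
for county, xs in sorted(data.items()):
    n = len(xs)
    w = n / (n + 1.0 / tau**2)  # weight on the county's own sample mean
    pooled = w * mean(xs) + (1 - w) * grand
    print(county, n, round(mean(xs), 2), round(pooled, 2))
```

With these settings the weights are about 0.93, 0.56, and 0.33 for the three counties, which is the sense in which the model lets you say something sensible about the county with only two measurements. You need good data for this to work, but it’s the model that turns the data into county-level answers.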
Some of these points are subtle. Applied statistics can be subtle. Taking an intro stat course, or even teaching such a class, doesn’t give you the full story. From an intro book, you can easily get the idea that when you have clean data you don’t need to model. But it all depends on what questions you’re asking.
It’s easy to write a sentence like, “viewing observational data through the lens of statistical modeling produces new biases, generally unknown and mostly unacknowledged, lurking in mathematical thickets.” That sounds reasonable enough. But if I want an estimate (and uncertainty) about the distribution of radon levels of houses with basements in Lac Qui Parle County, then, yes, I’ll accept those mathematical thickets. Math is ok if it helps us get good answers.
The bottom line
To return to Christakis and Fowler: I’d be interested to see their reply to the criticisms of Lyons and others. Perhaps they’ll simply step back a few paces and say that the Framingham data are sparse, that they’ve found some interesting patterns that they hope will inspire further study in other contexts.
After all, even if the Framingham results were unambiguously statistically significant, robust to reasonable models of measurement error, and had a clean identification strategy–even then, it’s just one group of people. In that sense, the debate about Christakis and Fowler’s particular claims, interesting and (methodologically) important as it is, is only part of a larger story of personal networks, health, and behavior. I hope that Lyons’s article and any responses by Christakis, Fowler, and others will be helpful in designing and analyzing future studies and in piecing together the big picture.
P.S. I conveyed point 5 above to Lyons and he responded that he respected models too but was concerned with models that cannot be tested. I agree with him on that. I believe that model checking is central to applied statistics. (Another point that will be familiar to regular readers of this blog.)