Bringing Causal Models Into the Mainstream

(by John Johnson)

This is a response to “Bringing Causal Models Into the Mainstream,” by Joseph Hogan in Epidemiology 20 (3), 431-432 (2009).

Hogan has an understandable desire to bring causal inference into the "mainstream," presumably meaning that more practitioners consider it a standard tool for the analysis of observational data. Using a cohort study of the impact of physical activity on mortality in the elderly as an example, he points out shortcomings in the way causal inference is typically presented that may scare practitioners away from using these methods in their own research.

In terms of language, I completely agree that we should use intuitively appealing terms such as "potential outcomes" rather than mathematically precise words such as "counterfactual." (I like the title of the article "What is the effect of the treatment on the treated?") We should not be asking readers to learn a new language just so they can understand a new methodology. A new language makes causal methods appear new, foreign, and suspect to practitioners. If we combine familiar language with easy-to-understand diagrams, we will go a long way toward helping practitioners, article reviewers, and readers understand the methodology, what is being estimated, and why causal inference is used instead of other methodologies.

Hogan’s discussion of inverse weighting is interesting, because what he is really discussing is the connection between observational studies and randomized clinical trials. As statisticians, we know the weighting accounts for selection bias, but to a practitioner, differential weighting of a sample is suspect precisely because of its potential to introduce bias. In fact, we are biasing the analysis through weighting, but we do so to offset the selection bias inherent in an observational study. (We hope; see the discussion below.) Some transparency in the discussion of sample weighting, together with an argument that the weighting procedure brings the observational study as close as possible in line with a randomized trial, would go a long way toward increasing acceptance.
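The weighting Hogan describes can be made concrete with a minimal sketch. Below is a Horvitz-Thompson-style inverse probability weighted estimate of a treatment effect; the data and propensity scores are entirely made up for illustration, and in a real analysis the propensities would themselves be estimated from covariates (for example, by logistic regression):

```python
# Minimal IPW sketch with hypothetical data. Each record is
# (treated, outcome, propensity), where propensity = P(treated | covariates).

def ipw_effect(records):
    """Horvitz-Thompson style IPW estimate of E[Y(1)] - E[Y(0)]."""
    n = len(records)
    # Treated outcomes are up-weighted by 1/p, controls by 1/(1-p),
    # so under-represented groups count for more.
    treated_sum = sum(y / p for t, y, p in records if t == 1)
    control_sum = sum(y / (1 - p) for t, y, p in records if t == 0)
    return treated_sum / n - control_sum / n

# Toy data: high-propensity subjects are over-represented among the
# treated, so an unweighted comparison of means would be biased.
data = [
    (1, 10.0, 0.8), (1, 12.0, 0.8), (0, 8.0, 0.8),
    (1, 6.0, 0.2), (0, 5.0, 0.2), (0, 4.0, 0.2),
]
print(round(ipw_effect(data), 3))  # about 1.042
```

With these toy numbers the naive difference in means is roughly 3.7, while the weighted estimate is about 1.0, which is exactly the point: the weights deliberately "bias" the computation to undo the selection into treatment.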

However clever we are with weighting, it is not going to make an observational study the same as a randomized trial. Like Hogan, I think sensitivity analysis is the best tool for addressing the reliance on the inherent assumptions and for characterizing the resulting uncertainty about selection bias in the study. A clear exposition of this important part of the analysis will lead to a clear discussion of the study’s limitations and perhaps, if we’re lucky, provide guidance for future research.
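One simple form such a sensitivity analysis can take is a tipping-point calculation: posit an unmeasured confounder that shifts the estimate by some additive amount, and report how the conclusion changes over a grid of assumed bias values. The sketch below is purely illustrative; the "naive" effect and the bias grid are made-up numbers, not results from Hogan's example:

```python
# Hypothetical tipping-point sensitivity sketch: how large would
# unmeasured confounding have to be to explain away an observed effect?

def bias_adjusted(naive_effect, bias_grid):
    """Return (assumed_bias, adjusted_effect) pairs for each bias value."""
    return [(b, naive_effect - b) for b in bias_grid]

naive = 1.5  # made-up point estimate from a weighted analysis
for bias, adj in bias_adjusted(naive, [0.0, 0.5, 1.0, 1.5, 2.0]):
    print(f"assumed confounding bias {bias:.1f} -> adjusted effect {adj:+.1f}")
```

Reporting the whole grid, rather than a single adjusted number, is what turns the assumption into a discussable limitation: readers can judge for themselves whether a confounder strong enough to reach the tipping point (here, 1.5) is plausible.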

So, that leaves the last point, which may cause some controversy. "Do not try this at home." Causal analysis does not have a SAS proc or simple R routine (perhaps with the exception of two-stage least squares). This is going to have to come at the end of perhaps hours of data exploration, modeling, testing, rejecting, trying something else, and finally accepting. A causal model is not always going to be easy to write into a statistical analysis plan, and primary investigators may not want something so fluid in the plan.
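For the two-stage least squares exception mentioned above, the single-instrument, no-covariate case really does reduce to a one-line routine: the IV estimate is the Wald ratio cov(z, y) / cov(z, x). The sketch below uses made-up data, where z might be, say, a randomized encouragement and x the treatment actually taken:

```python
# Hypothetical single-instrument 2SLS (Wald ratio) sketch.

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def tsls(z, x, y):
    """IV estimate of the effect of x on y using instrument z."""
    # With one instrument and no covariates, the two regression stages
    # collapse to this ratio.
    return cov(z, y) / cov(z, x)

z = [0, 0, 1, 1, 0, 1]            # instrument (e.g., encouragement)
x = [0, 1, 1, 1, 0, 1]            # treatment actually taken
y = [1.0, 2.0, 3.0, 3.5, 1.5, 3.0]
print(round(tsls(z, x, y), 3))    # 2.5
```

With multiple instruments or covariates the two stages are run as actual regressions, which is part of why canned routines exist for 2SLS but not for most other causal analyses.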

Honestly, I don’t know how to get around this point. I was involved in a study that, while randomized, had its treatment effect explained by a postrandomization variable: it was assumed that the antibody response to the drug explained all of the treatment effect. Most, but not all, treated subjects responded, and a small but notable minority of placebo subjects also had an antibody response. An intent-to-treat analysis was necessary and performed (this was, after all, a randomized trial), but a causal analysis was also needed to properly combine the four groups (treated/not treated crossed with responder/nonresponder). This approach took a lot of time, and there was no way around it. There was also a lot of ad hoc analysis — testing, modeling, throwing away results, and so on — which frustrated project team members who simply wanted to compare treatment responders with the whole placebo population (a clearly biased approach, and one rejected by regulators). Scope and time management are very difficult in this scenario, which is very undesirable to project managers.

As observational studies become more standard in the biopharmaceutical industry in the wake of enforceable postmarketing commitments, and as science improves our ability to determine the biological pathways a drug takes, causal analysis will become more mainstream and perhaps essential to understanding the efficacy and effectiveness of drugs and biologics. There are a few relatively simple steps, and some difficult ones, that we statisticians can take to help our colleagues understand and trust the results of causal analyses. The difficult ones — resolving "do not try this at home" — will be eased as we develop good working relationships and build trust.

(John Johnson is a statistical consultant in the pharmaceuticals industry.)

Editor’s notes:

1. I find it interesting to read these discussions of causal inference in medical research. My encounters with causal reasoning come from social science, and one of my pet peeves is social scientists, in their discussions and their books, always resorting to medical examples that they know nothing about. Here the discussants actually know something about the topic, which is cool.

2. Johnson writes, “we should use intuitively appealing terms such as ‘potential outcomes’ rather than mathematically precise words such as ‘counterfactual.’” I don’t get this. The terms “potential outcome” and “counterfactual” seem equally precise to me. Don Rubin prefers “potential outcome” because often some of these are actually observed (for example, if a particular unit is treated, then the two potential outcomes are y_T and y_C, with only the latter actually being counterfactual). Other people prefer to talk about “counterfactuals,” I suppose because it is the unobserved part that makes things interesting.


3 Responses to “Bringing Causal Models Into the Mainstream”

  1. Cyrus | May 29, 2011 at 5:33 pm

    Johnson writes,

    “Causal analysis does not have a SAS proc or simple R routine (perhaps with the exception of two-stage least squares). This is going to have to come at the end of perhaps hours of data exploration, modeling, testing, rejecting, trying something else, and finally accepting. A causal model is not always going to be easy to write into a statistical analysis plan, and primary investigators may not want something so fluid in the plan.”

    I agree that causal modeling cannot be the result of any canned routine. However, the idea that it should involve “data exploration, modeling, testing, rejecting, trying something else, and finally accepting” sounds just as wrong to me. An alternative view, and one that I stand behind, is that causal analysis should be based on an identification strategy. The identification strategy provides the conditions through which one obtains plausibly exogenous variation in the causal factor (or “treatment”) of interest. A randomized experiment is the most appealing way to achieve this for many reasons. Alternatives rely on more stringent assumptions that, as Johnson suggests, ought to be the subject of a sensitivity analysis. In any case, the identification strategy dictates nearly all aspects of how one ought to collect the data and measure causal effects. So there ought not to be any room for “data exploration, modeling, testing, rejecting, trying something else, and finally accepting.” This perspective, which eschews using data exploration and model fitting as the basis of estimating causal effects, is well presented in the following:

    Donald B. Rubin. 2008. “For objective causal inference, design trumps analysis.” Ann. Appl. Stat. 2(3):808-840.

  2. The Statistics Forum | May 30, 2011 at 1:20 am


    I know what you mean, and I think for some studies you’re right: you can come up with a good design or find a good natural experiment. In other cases, though, identification is not so clean. Here are two examples:

    – The claim “A raise won’t make you work harder,” which is supposedly supported by an identification strategy but I don’t believe it.

    – Our struggles to understand political polarization. There are causal questions lurking around but I’m still struggling to think of any identification strategy.

    One problem I have with the whole “identification strategy” framework is that I think it can lead researchers away from questions of ultimate interest.

  1. Bringing causal models into the mainstream « Samarth Bhaskar’s Blog (Trackback, May 10, 2011 at 4:37 pm)

The Statistics Forum, brought to you by the American Statistical Association and CHANCE magazine, provides everyone the opportunity to participate in discussions about probability and statistics and their role in important and interesting topics.

The views expressed here are those of the individual authors and not necessarily those of the ASA, its officers, or its staff. The Statistics Forum is edited by Andrew Gelman.
