(by Andrew Gelman)
Mike Jordan reports in the bulletin of the International Society of Bayesian Analysis the results of his excellent project of contacting about fifty statistics professors and asking them: "What are the open problems in Bayesian statistics?"
I’ll excerpt Jordan’s findings (his full article is here), then give my thoughts on what he reported, along with my own response to his query from last month.
Nonparametrics and semiparametrics. Bayesian nonparametrics is viewed by some of my respondents as a class of methods looking for a problem, and so the main open problem in Bayesian nonparametrics is (for some people) that of finding a characterization of classes of problems for which these tools are worth the trouble. . . . problems surrounding prior specification and identifiability were viewed as particularly virulent in the nonparametric setting. David Dunson: “Nonparametric Bayes models involve infinitely many parameters and priors are typically chosen for convenience with hyperparameters set at seemingly reasonable values with no proper objective or subjective justification.” . . .
Priors. Not surprisingly, priors were on the minds of many. Elicitation remains a major source of open problems. Tony O’Hagan avers: “When it comes to eliciting distributions for two or more uncertain quantities we are working more or less in the dark.” Mike West pointed to the fact that many scientific fields express their prior knowledge in terms of “scientifically predictive models,” and using these models in a statistical setting involves the quintessentially Bayesian tasks of understanding assumptions and conducting detailed sensitivity analyses. . . .
Bayesian/frequentist relationships. As already mentioned in the nonparametrics section, many respondents expressed a desire to further hammer out Bayesian/frequentist relationships. . . . whether there might be a sense in which it is worthwhile to give up some Bayesian coherence in return for some of the advantages of the frequentist paradigm, including simplicity of implementation and computational tractability.
Computation and statistics. It was interesting to see some disagreement on the subject of computation, with some people feeling that MCMC has tamed the issue, and with others (the majority by my count) opining that many open problems remain. . . . Rob Kass raised the possibility of a notion of “inferential solvability,” where some problems are understood to be beyond hope . . .
Model selection and hypothesis testing. I have placed this topic as number one not only for the large numbers of respondents mentioning it, but also for the urgency that was transmitted. . . . George Casella is concerned about lack of theory for inference after selection: “We now do model selection but Bayesians don’t seem to worry about the properties of basing inference on the selected model. What if it is wrong? . . . ”
My reactions to the above:
1. I’m surprised to hear people characterize nonparametrics “as a class of methods looking for a problem.” To me, the world is full of curves to model for which there is no clear parametric form. Examples range from tree-ring series to opinions on gay rights. If people are out there looking for a problem to apply their nonparametric methods to, I don’t think they’re looking very hard!
2. I think many academic statisticians have been brainwashed about the purported benefits of the so-called frequentist paradigm. "Simplicity of implementation and computational tractability"? Huh? You gotta be kidding. Please tell me how to fit a model in toxicology or estimate public opinion by income and state in a simple, computationally tractable frequentist way. Oh, and by the way, the results have to be reasonable. It's not enough to have a method that's simple, tractable, and silly: that's reminiscent of the Thurber fairy-tale phrase, "healthy, wealthy, and dead."
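For readers wondering what the "opinion by state" problem involves, here is a toy sketch (my own illustration, not from the post; the numbers are made up) of the core Bayesian move, partial pooling: each state's raw average is shrunk toward the grand mean, with small-sample states shrunk more.

```python
import numpy as np

def partial_pool(state_means, state_ns, sigma_y, tau):
    """Precision-weighted compromise between each state's raw mean
    and the grand mean (a normal-normal hierarchical model with
    known variances; sigma_y = within-state sd, tau = between-state sd)."""
    grand = np.mean(state_means)
    w_data = state_ns / sigma_y ** 2   # precision of each raw state estimate
    w_prior = 1.0 / tau ** 2           # precision contributed by the grand mean
    return (w_data * state_means + w_prior * grand) / (w_data + w_prior)

# Two hypothetical states: one with 5 respondents, one with 500.
means = np.array([0.2, 0.8])
ns = np.array([5, 500])
pooled = partial_pool(means, ns, sigma_y=1.0, tau=0.3)
```

The small state's estimate moves substantially toward the grand mean of 0.5, while the large state's barely budges. That stabilization is what a purely frequentist, no-pooling analysis of fifty separate states gives up.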
3. I like Kass’s inferential solvability idea, which reminds me of Don Rubin’s dictum that the model you’re using is the model you fit, not the model you want to fit, and it also reminds me of the folk theorem of computational statistics.
4. Casella asks, What if the model is wrong? I’ll answer that one. The model is wrong! Bayesian inference is conditional on our models. I agree that it’s good to understand what happens under reasonable departures from our assumptions.
My original response to Mike’s query:
I divide Bayesian data analysis into three steps: model building, inference, and model checking. For model building, the open problem remains the development of potentially infinite models that unfold to fit ever-larger datasets. There is some great work in this direction in nonparametric Bayes, but there is potential for a lot more. For inference, a key open problem is weakly informative priors, making more use of Bayes-as-regularization. For model checking, a key open problem is developing graphical tools for understanding and comparing models. Graphics is not just for raw data; rather, complex Bayesian models give opportunity for better and more effective exploratory data analysis.
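To illustrate the Bayes-as-regularization point: the MAP estimate under a zero-centered normal prior on the coefficients is exactly ridge regression, with penalty set by the ratio of noise variance to prior variance. A minimal sketch (my own example; the simulated data and variable names are illustrative):

```python
import numpy as np

def map_normal_prior(X, y, sigma_y=1.0, sigma_beta=1.0):
    """MAP estimate of a linear regression with prior
    beta ~ Normal(0, sigma_beta^2 I): identical to ridge regression
    with penalty lambda = sigma_y^2 / sigma_beta^2."""
    lam = sigma_y ** 2 / sigma_beta ** 2
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Simulated data under a known coefficient vector.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(0, 0.5, size=100)

b_weak = map_normal_prior(X, y, sigma_y=0.5, sigma_beta=10.0)  # diffuse prior, near OLS
b_strong = map_normal_prior(X, y, sigma_y=0.5, sigma_beta=0.1) # tight prior, heavy shrinkage
```

Tightening the prior shrinks the estimates toward zero, which is the regularization a weakly informative prior buys you when the data are sparse.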
What are your thoughts?