(by Andrew Gelman)

Mike Jordan reports in the bulletin of the International Society of Bayesian Analysis the results of his excellent project of contacting about fifty statistics professors and asking them, What are the open problems in Bayesian statistics?

I’ll excerpt Jordan’s findings (his full article is here), then give my thoughts on what he reported, along with my own response to his query from last month.

Here’s Jordan:

Nonparametrics and semiparametrics. Bayesian nonparametrics is viewed by some of my respondents as a class of methods looking for a problem, and so the main open problem in Bayesian nonparametrics is (for some people) that of finding a characterization of classes of problems for which these tools are worth the trouble. . . . problems surrounding prior specification and identifiability were viewed as particularly virulent in the nonparametric setting. David Dunson: “Nonparametric Bayes models involve infinitely many parameters and priors are typically chosen for convenience with hyperparameters set at seemingly reasonable values with no proper objective or subjective justification.” . . .

Priors. Not surprisingly, priors were on the minds of many. Elicitation remains a major source of open problems. Tony O’Hagan avers: “When it comes to eliciting distributions for two or more uncertain quantities we are working more or less in the dark.” Mike West pointed to the fact that many scientific fields express their prior knowledge in terms of “scientifically predictive models,” and using these models in a statistical setting involves the quintessentially Bayesian tasks of understanding assumptions and conducting detailed sensitivity analyses. . . .

Bayesian/frequentist relationships. As already mentioned in the nonparametrics section, many respondents expressed a desire to further hammer out Bayesian/frequentist relationships. . . . whether there might be a sense in which it is worthwhile to give up some Bayesian coherence in return for some of the advantages of the frequentist paradigm, including simplicity of implementation and computational tractability.

Computation and statistics. It was interesting to see some disagreement on the subject of computation, with some people feeling that MCMC has tamed the issue, and with others (the majority by my count) opining that many open problems remain. . . . Rob Kass raised the possibility of a notion of “inferential solvability,” where some problems are understood to be beyond hope . . .

Model selection and hypothesis testing. I have placed this topic as number one not only for the large numbers of respondents mentioning it, but also for the urgency that was transmitted. . . . George Casella is concerned about lack of theory for inference after selection: “We now do model selection but Bayesians don’t seem to worry about the properties of basing inference on the selected model. What if it is wrong? . . . ”

My reactions to the above:

1. I’m surprised to hear people characterize nonparametrics “as a class of methods looking for a problem.” To me, the world is full of curves to model for which there is no clear parametric form. Examples range from tree-ring series to opinions on gay rights. If people are out there looking for a problem to apply their nonparametric methods to, I don’t think they’re looking very hard!

2. I think many academic statisticians have been brainwashed about the purported benefits of the so-called frequentist paradigm. “Simplicity of implementation and computational tractability”? Huh? You gotta be kidding. Please tell me how to fit a model in toxicology or estimate public opinion by income and state in a simple, computationally tractable frequentist way. Oh, and by the way, the results have to be reasonable. It’s not enough to have a method that’s simple, tractable, and silly; that’s reminiscent of the Thurber fairy-tale phrase, “healthy, wealthy, and dead.”

3. I like Kass’s inferential solvability idea, which reminds me of Don Rubin’s dictum that the model you’re using is the model you fit, not the model you want to fit, and it also reminds me of the folk theorem of computational statistics.

4. Casella asks, What if the model is wrong? I’ll answer that one. The model is wrong! Bayesian inference is conditional on our models. I agree that it’s good to understand what happens under reasonable departures from our assumptions.

My original response to Mike’s query:

I divide Bayesian data analysis into three steps: model building, inference, and model checking. For model building, the open problem remains the development of potentially infinite models that unfold to fit ever-larger datasets. There is some great work in this direction in nonparametric Bayes, but there is potential for a lot more. For inference, a key open problem is weakly informative priors, making more use of Bayes-as-regularization. For model checking, a key open problem is developing graphical tools for understanding and comparing models. Graphics is not just for raw data; rather, complex Bayesian models give the opportunity for better and more effective exploratory data analysis.
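The Bayes-as-regularization point can be made concrete with a standard toy example: under independent Normal(0, σ²) priors on regression coefficients (with known noise variance), the posterior mode is exactly the ridge-regression estimate with penalty λ = σ²_noise/σ²_prior. A minimal sketch, with made-up data and prior scales chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: linear regression with a few predictors.
n, p = 50, 5
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.0, 0.5, 0.0])
y = X @ beta_true + rng.normal(scale=1.0, size=n)

sigma_noise = 1.0   # noise sd, assumed known for this sketch
sigma_prior = 2.0   # weakly informative Normal(0, sigma_prior^2) prior on each coefficient

# MAP estimate under the normal prior: maximize log-likelihood + log-prior.
# This is ridge regression with lambda = sigma_noise^2 / sigma_prior^2.
lam = (sigma_noise / sigma_prior) ** 2
beta_map = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Ordinary least squares, for comparison (the flat-prior answer).
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

print("MAP (ridge):", beta_map)
print("OLS:        ", beta_ols)
```

The prior acts as a stabilizer: the MAP estimate is pulled toward zero relative to least squares, which is the regularization behavior a weakly informative prior buys you.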

What are your thoughts?

“Simplicity of implementation and computational tractability”? For most intractable Bayesian models, can’t we formulate a ‘frequentist’ counterpart? It’s not clear to me that computational tractability is a Bayesian/frequentist issue, except perhaps because Bayesians are apt to build more complex models (‘perhaps’ because the evidence for this is anecdotal).

I wonder if the perception of ‘simplicity of implementation’ stems from an *abundance* of implementations of classical methods. That is, implementations of classical methods appear simpler because so many are readily available.

I agree with Andrew about nonparametrics: there are plenty of uses for curves that don’t belong to standard parametric families.

Computation deserves further comment. While it’s true that there are unsolved problems in Bayesian computation, it’s also true that Bayesian statistics owes its current popularity to ease of computation. It was when MCMC allowed us to work with more complex, and therefore more realistic, models, that practitioners started using Bayes. To applied scientists, the ability to work with ever-more realistic models is usually more important than philosophical differences between Bayesian and classical statistics.

I also think the call for “agreed upon methods” for model selection and hypothesis testing is misdirected. There are several problems that statisticians have called model selection and hypothesis testing. Here is a partial list.

a) How does the best parameter (e.g., the MLE) in Model 1 compare to the best parameter in Model 2?

b) How does p(data|M1) compare to p(data|M2)?

c) Does M1 provide a reasonable explanation for the data?

d) What is the (or a reasonable) posterior probability of M1?

e) If M2 beats M1 according to AIC, BIC, likelihood, Bayes factor, or whatever is your favorite criterion, is it because M2 is slightly better for many points, or lots better for just a few points? More generally, which points are better modeled by M1 and which by M2, and by how much?

Any one of these questions might be appropriate in a particular scientific investigation. I don’t see why we should expect to have a single set of agreed upon methods that covers all of them. (Yes, we could hope for a collection of agreed upon methods, one or more for each type of problem. But so far, I don’t see the general recognition that there really are several problems, all called ‘hypothesis testing’.) So, before we ask for a unified theory, we should assess whether a unified theory is either possible or desirable.
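Question (e) in particular is easy to operationalize: instead of reporting only an aggregate criterion, look at the pointwise log-likelihood differences between the two models. A minimal sketch, with fabricated data and two convenient candidate models (normal vs. Laplace, each at its MLE), chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data with a couple of outliers, to make the comparison interesting.
y = np.concatenate([rng.normal(0.0, 1.0, size=100), np.array([8.0, -9.0])])

# M1: normal model, parameters at their MLEs.
mu1, sd1 = y.mean(), y.std()
loglik1 = -0.5 * np.log(2 * np.pi * sd1**2) - (y - mu1) ** 2 / (2 * sd1**2)

# M2: Laplace model; MLEs are the median and the mean absolute deviation.
mu2 = np.median(y)
b2 = np.mean(np.abs(y - mu2))
loglik2 = -np.log(2 * b2) - np.abs(y - mu2) / b2

# Question (e): don't just look at the total; look at which points drive it.
pointwise_diff = loglik2 - loglik1          # > 0 where M2 fits that point better
total = pointwise_diff.sum()
frac_favoring_m2 = np.mean(pointwise_diff > 0)
biggest = np.argsort(-np.abs(pointwise_diff))[:3]  # most influential points

print(f"total log-lik difference (M2 - M1): {total:.2f}")
print(f"fraction of points favoring M2: {frac_favoring_m2:.2f}")
print(f"most influential points: {biggest}, diffs: {pointwise_diff[biggest]}")
```

The same aggregate difference can arise from a small edge on many points or a huge edge on a few; the pointwise breakdown distinguishes the two, which is exactly the question the aggregate criteria hide.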

An important open problem, not mentioned in the original article, is how our models apportion variation. Many of our models are ill-identified. In those cases, wiggles or features in the data might be attributed to more than one component of the model; it’s as though different components compete to explain various aspects of the data. In my experience, the attribution can be highly sensitive to small features of the models, features that we have usually chosen for convenience. And, too often, we draw inferences that depend heavily on those features. For one good analysis of those features in a spatial setting, see Reich, Hodges, and Zadnik, Biometrics, 2006.
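The competition between components shows up even in the simplest ill-identified model. As a toy sketch (not drawn from the cited paper): take y_i ~ Normal(a + b, σ²), so the likelihood identifies only the sum a + b. With independent normal priors the posterior is available in closed form, and how the posterior splits credit between a and b tracks the prior scales, which are exactly the kind of convenience choices mentioned above:

```python
import numpy as np

rng = np.random.default_rng(2)

# Only the sum a + b is identified by the likelihood: y_i ~ Normal(a + b, sigma^2).
n, sigma = 200, 1.0
y = rng.normal(3.0, sigma, size=n)  # true a + b = 3, however it is split

def posterior_mean(tau_a, tau_b):
    """Posterior mean of (a, b) under independent Normal(0, tau^2) priors."""
    prec = np.array([[n / sigma**2 + 1 / tau_a**2, n / sigma**2],
                     [n / sigma**2, n / sigma**2 + 1 / tau_b**2]])
    rhs = (n * y.mean() / sigma**2) * np.ones(2)
    return np.linalg.solve(prec, rhs)

# Two "convenience" prior choices that differ only in the prior scales.
m1 = posterior_mean(tau_a=1.0, tau_b=1.0)   # symmetric: credit split evenly
m2 = posterior_mean(tau_a=10.0, tau_b=0.1)  # asymmetric: a absorbs the signal

print("symmetric priors:  ", m1, " sum:", m1.sum())
print("asymmetric priors: ", m2, " sum:", m2.sum())
```

The identified quantity a + b is essentially the same under both prior choices, but the attribution to the individual components swings wildly, which is the sensitivity to convenience features described above.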