(by Julien Cornebise)
JSM was rolling at full steam yesterday, its third day. I kicked off the day with a great panel featuring three major speakers: Christian “Xian” Robert, Jim Berger, and Andrew Gelman, on Controversies in the Philosophy of Bayesian Statistics. It turned out to be less about philosophy than about controversies (past and present), which in a sense suited better what I was expecting from it!
Large attendance, as was to be expected. Apart from the fact that Xian is “getting old and hence less critical” (yeah, right… although he might indeed be getting old, as Susie Bayarri gleefully spotted that he had his Bayes formula wrong in his slides! Reassuring to see that it can happen even to him 😉 ), several salient topics were covered, with discussions so deep and subtle that it is hard to do them full justice here.
Here is an all-too-brief summary of what I recall best.
On the choice of the prior (first topic to pop up):
- Jim opened the dance with Objective Bayes vs Subjective Bayes, with the thought-catching “if everyone in the application agrees on a prior, it’s not subjective any more”.
- Andrew chipped in about how frequentist linear regression is not objective either: your subjective prior knowledge is hidden in the structure of the subgroups you choose to look at. Besides, on sensitivity analysis: sure, do it, but then do it on your likelihood too!
- To which Xian added that sensitivity analysis can conveniently be turned into a hierarchical model: if you check the impact of your prior over such or such range of hyperparameters, then you may as well encode that information as an upper level in the hierarchy.
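To make that last point concrete, here is a minimal sketch of my own (a toy example, not from the panel) for a normal-mean model with a N(0, tau²) prior: first the usual sensitivity check over a grid of tau, then the hierarchical alternative where an arbitrarily chosen Exp(1) hyperprior on tau is integrated out by simple Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.array([1.2, 0.8, 1.5, 1.1])   # toy data, y_i ~ N(theta, 1)
n, ybar = len(y), y.mean()

# Sensitivity check: posterior mean of theta under a N(0, tau^2) prior,
# recomputed for each fixed hyperparameter tau (conjugate formula).
def post_mean(tau):
    return n * ybar / (n + 1.0 / tau**2)

for tau in (0.5, 1.0, 2.0):
    print(f"tau = {tau}: posterior mean = {post_mean(tau):.3f}")

# Hierarchical alternative: place a hyperprior on tau (here an arbitrary
# Exp(1), purely for illustration) and integrate it out by Monte Carlo.
# Marginally, ybar ~ N(0, tau^2 + 1/n), and the theta-free factor of the
# likelihood cancels, so those densities serve as importance weights.
taus = rng.exponential(1.0, size=100_000)
v = taus**2 + 1.0 / n
weights = np.exp(-ybar**2 / (2 * v)) / np.sqrt(v)
hier_mean = np.average(post_mean(taus), weights=weights)
print(f"hierarchical posterior mean = {hier_mean:.3f}")
```

The grid check tells you how much the answer moves with tau; the hierarchical version replaces that by-hand exploration with a single model that averages over tau, weighted by how well each value explains the data.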
The conversation — in which the audience played a great part — then moved on to model selection and Jeffreys’ scale of evidence for Bayes factors:
- how its arbitrary nature is a drag on precision: Jim argued that everyone knows what 10 to 1 odds mean, and that we ought to teach those who don’t; no need for such a coarse scale.
- the three also agreed that this scale, however convenient it may seem, should really depend on the application field: a physicist will not require the same level of evidence as a sociologist.
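For illustration (my own toy numbers, not the panel’s), here is a Bayes factor computed in closed form for a normal mean, H0: theta = 0 against H1: theta ~ N(0, 1), reported directly as odds in the spirit of Jim’s remark rather than mapped onto Jeffreys’ categories.

```python
from math import exp, pi, sqrt

def normal_pdf(x, var):
    """Density of N(0, var) at x."""
    return exp(-x * x / (2 * var)) / sqrt(2 * pi * var)

y = [1.0, 1.5, 1.2, 1.4, 1.15]        # toy data, y_i ~ N(theta, 1)
n = len(y)
ybar = sum(y) / n

# H0: theta = 0  vs  H1: theta ~ N(0, 1).
# The theta-free factor of the likelihood is common to both marginal
# likelihoods and cancels, so only the marginal density of the sufficient
# statistic ybar matters: ybar ~ N(0, 1/n) under H0, N(0, 1 + 1/n) under H1.
bf10 = normal_pdf(ybar, 1 + 1 / n) / normal_pdf(ybar, 1 / n)
print(f"BF10 = {bf10:.1f}")           # ≈ 10.6, i.e. roughly 10-to-1 odds for H1
```

Stating “about 10 to 1 in favour of H1” carries the same information as “strong evidence” on the scale, without hiding the actual number behind a category boundary.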
On the opposition with Frequentism:
- I was much impressed to see how Jim bridged the gap, clearly stating that the community made some mistakes 15 years ago, and that since then many new developments have reconciled the two approaches, especially on multiple testing;
- All three converged on the fact that frequentism makes sense when you expect long-run behaviour to stay stable,
- and even though Xian argued that in some cases the long run makes no sense, as the data is all here and won’t be growing to infinity,
- I, for myself, am glad to use some frequentist tools (e.g. EM to design proposal kernels) within Monte-Carlo algorithms, where I can precisely make my sample grow on and on! (nb: please sample responsibly)
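To illustrate what I mean by that last point (a purely hypothetical toy setup, not production code): fit a Gaussian mixture by plain EM to a pilot sample, then use the fitted mixture as an importance-sampling proposal for a bimodal target, whose sample can then grow on and on.

```python
import numpy as np

rng = np.random.default_rng(1)

# Bimodal target density (unnormalised): mixture of N(-2, 0.5^2) and N(2, 0.5^2).
def target(x):
    return np.exp(-(x + 2)**2 / 0.5) + np.exp(-(x - 2)**2 / 0.5)

# Pilot draws roughly covering the target (in practice these might come
# from a preliminary MCMC run; here they are simulated directly).
pilot = np.concatenate([rng.normal(-2, 1, 500), rng.normal(2, 1, 500)])

# Plain EM fit of a 2-component Gaussian mixture to the pilot sample.
w = np.array([0.5, 0.5]); mu = np.array([-1.0, 1.0]); sig = np.array([1.0, 1.0])
for _ in range(50):
    # E-step: responsibilities (the 1/sqrt(2*pi) constant cancels here)
    dens = w * np.exp(-(pilot[:, None] - mu)**2 / (2 * sig**2)) / sig
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: update weights, means, standard deviations
    nk = r.sum(axis=0)
    w = nk / len(pilot)
    mu = (r * pilot[:, None]).sum(axis=0) / nk
    sig = np.sqrt((r * (pilot[:, None] - mu)**2).sum(axis=0) / nk)

# Use the fitted mixture as an importance-sampling proposal: the sample
# keeps growing, so long-run guarantees are exactly what we want here.
def proposal_pdf(x):
    return (w * np.exp(-(x[:, None] - mu)**2 / (2 * sig**2))
            / (sig * np.sqrt(2 * np.pi))).sum(axis=1)

comp = rng.choice(2, size=10_000, p=w)
draws = rng.normal(mu[comp], sig[comp])
weights = target(draws) / proposal_pdf(draws)
est_mean = np.average(draws, weights=weights)
print(f"IS estimate of E[X]: {est_mean:.2f}")   # target is symmetric, so near 0
```

The frequentist tool (EM, a maximum-likelihood fit) only designs the proposal; the inference itself stays importance sampling, whose accuracy genuinely improves as the sample grows.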
Finally, on Non-Parametric Bayes (NPB):
- Jim stated that, as we’ve seen in Mike Jordan’s impressive talk the day before, those are brilliant for discovery and learning, flexible and powerful;
- however, he is not convinced of their use for statistical inference, as there is no theoretical long-run assurance of consistency (no Bernstein–von Mises theorem),
- and as it is hard to understand what the prior is doing: although NPB indeed makes fewer assumptions, that alone does not make it better than more classical priors.
- I would add that, from what I understood from Mike’s talk, the assumptions are still present, just not in the form of distributions, but in the hierarchy.
Overall an intense and enlightening discussion: I learned a lot, found echoes of my half-baked questions, and essentially got out of it the equivalent of reading through hundreds of pages of the speakers’ books and blogs!
The next session I attended was Sylvia Richardson’s Medallion lecture on Recent Developments in Bayesian Methods for Discovering Regression Structures: Applications in the Health Sciences. This was a very full 2-hour lecture on some of her most recent developments, especially how to combine:
- dimension reduction by profile regression and clustering by means of finite mixture models — with an illuminating formulation of appropriate post-processing analysis, based on functionals invariant to relabelling, to get a good grasp of the large amount of output; this latter step is often overlooked, yet mandatory to gain the full power of the flexibility of mixtures, as she brilliantly showed on appealing high-investment/high-reward graphs.
- variable selection going beyond plain 0-1 switches on the variables, moving from “0-1 useful covariate for such or such profile cluster” to “useful covariate with such probability for clustering” — a subtle difference that has a large impact on the quality of the inference.
- Bayesian sparse regression, especially her free Evolutionary Monte-Carlo C++ software ESS++, optimized to reach circa one million evaluations of very intricate models in less than 10 hours — meaning you have a really significant sample in a day or two, a feat for such intricate applications.
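On the relabelling-invariant post-processing of mixture output mentioned above: one standard such functional (not necessarily the one Sylvia used) is the posterior similarity matrix, the posterior probability that two items share a cluster, sketched here on hypothetical MCMC allocations.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical MCMC output: cluster allocations for 6 items over 1000 sweeps.
# Items 0-2 tend to co-cluster, items 3-5 likewise, but the *labels* of the
# two groups switch arbitrarily from sweep to sweep (label switching).
alloc = np.empty((1000, 6), dtype=int)
for t in range(1000):
    a, b = rng.permutation(2)            # random relabelling each sweep
    alloc[t] = [a, a, a, b, b, b]
    if rng.random() < 0.1:               # occasional misallocation, for realism
        alloc[t, rng.integers(6)] = rng.integers(2)

# Posterior similarity matrix: P(item i and item j share a cluster), a
# functional invariant to relabelling, hence immune to label switching.
psm = (alloc[:, :, None] == alloc[:, None, :]).mean(axis=0)
print(np.round(psm, 2))
```

The raw label traces are useless here (each label means different things at different sweeps), yet the similarity matrix cleanly recovers the two groups, which is exactly why such invariant functionals are the mandatory post-processing step.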
My afternoon session was then such an experience that it deserves a post by itself.
Also, Dick De Veaux very kindly sent me the slides of the CounterExample talk mentioned on day 1! Really worth a check — even though they are only part of the story, missing the equally counter-example way he actually gave his talk.