## Archive Page 2

### understanding computational Bayesian statistics

(This post has been contributed by Christian Robert.)

I have just finished reading this book by Bill Bolstad (University of Waikato, New Zealand) which a previous ‘Og post pointed out when it appeared, shortly after our Introducing Monte Carlo Methods with R. My family commented that the cover was nicer than those of my books, which is true. Before I launch into a review, let me warn the ‘Og reader that, as an author of three books on computational Bayesian statistics, I cannot be very objective on the topic: I do favour the way we approached Bayesian computational methods and, after reading Bolstad’s Understanding computational Bayesian statistics, would still have written the books the way we did. Be warned, thus.

Understanding computational Bayesian statistics covers the basics of Monte Carlo and (fixed-dimension) Markov chain Monte Carlo methods, with a fair chunk dedicated to prerequisites in Bayesian statistics and Markov chain theory. Even though I have only glanced at the table of contents of Bolstad’s Introduction to Bayesian Statistics [using almost the same nice whirl picture, albeit in bronze rather than cobalt], it seems to me that the current book is the continuation of the earlier one, going beyond the binomial, Poisson, and normal cases to cover generalised linear models via MCMC methods. (In this respect, it corresponds to Chapter 4 of Bayesian Core.) The book is associated with Minitab macros and an R package (written by James Curran), Bolstad2, in continuation of Bolstad, the package written for Introduction to Bayesian Statistics. Overall, the level of the book is such that it should be accessible to undergraduate students, MCMC methods being reduced to Gibbs, random-walk and independent Metropolis-Hastings algorithms, and convergence assessments being done via autocorrelation graphs, the Gelman and Rubin (1992) intra-/inter-variance criterion, and a forward coupling device. The illustrative chapters cover logistic regression (Chap. 8), Poisson regression (Chap. 9), and normal hierarchical models (Chap. 10). Again, the overall feeling is that the book should be understandable to undergraduate students, even though it may make MCMC seem easier than it is by sticking to fairly regular models. In a sense, it is more a book of the [roaring MCMC] 90s in that it does not incorporate advances from 2000 onwards (as seen from the reference list), like adaptive MCMC and the resurgence of importance sampling via particle systems and sequential Monte Carlo.
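For readers curious about what such a sampler boils down to, a random-walk Metropolis-Hastings algorithm of the kind the book restricts itself to fits in a dozen lines. The following is a minimal sketch of my own (not taken from the book or from the Bolstad2 package), targeting a standard normal density:

```python
import math
import random

def rw_metropolis(log_target, x0, scale=1.0, n_iter=10_000):
    """Random-walk Metropolis-Hastings with a Gaussian proposal."""
    x = x0
    chain = []
    for _ in range(n_iter):
        proposal = x + random.gauss(0.0, scale)    # symmetric random walk
        log_alpha = log_target(proposal) - log_target(x)
        if math.log(random.random()) < log_alpha:  # accept w.p. min(1, alpha)
            x = proposal
        chain.append(x)
    return chain

# Toy run: target is the standard normal density, known up to a constant
random.seed(42)
chain = rw_metropolis(lambda x: -0.5 * x * x, x0=3.0, scale=2.0)
mean = sum(chain) / len(chain)    # should be close to the true mean, 0
```

Tuning the proposal scale and monitoring the autocorrelation of the resulting chain are precisely the convergence checks the book describes.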

(by Christian P. Robert)

Julien Cornebise has [once again!] pointed out a recent Guardian article. It is about commercial publishers of academic journals, mainly Elsevier, Springer, and Wiley, and it takes a clear stand from its title onward: “Academic publishers make Murdoch look like a socialist”! The valuable argument therein is that academic publishers make hefty profits (a 40% margin for Elsevier!) without contributing to the central value of the journals, namely the research itself, which is mostly funded by public or semi-public bodies. The publishers of course distribute the journals to the subscribers, but the reported profits clearly show that, on average, they spend much less doing so than they charge… Here are some of the institutional rates (can you spot the Elsevier journals? the journals published by societies? the free open-access journals?!):

(Apart from greed, there is no justification for the top four [Taylor and Francis/Elsevier] journals to ask for such prices! The Journal of Econometrics also charges $50 per submission! PNAS is another story given the volume of the [not-for-profit] publication: 22,750 pages in 2010, meaning it is high time for it to move to being fully electronic. The rate for Statistics and Computing is another disappointment when compared with JCGS.)

The article reports the pressure to publish in such journals (vs. non-commercial journals) because of the tyranny of impact factors. However, the reputation of those top-tier journals is due not to the action of the publishers but to the excellence of their editorial boards; there would therefore be no foreseeable long-term impact in moving our favourite journals from one publisher to another. Moreover, I think that publishing in top journals matters more to the authors themselves than to the readers, once the results are already circulating through a medium like arXiv. Of course, having papers evaluated by peers in a strict academic mode is of prime importance to distinguish major advances from pseudo-science; however, the electronic availability of papers, discussion forums, and blogs implies that suspicious results should anyway be detected by the community. (I am not advocating the end of academic journals, far from it!, but an evolution towards a wider range of evaluations via Internet discussions, as for the DREAM paper recently.) The article also mentions that some funding bodies impose Open Access publishing. However, this is not the ideal solution as long as journals also make a profit on that line, by charging for open access (see, e.g., PNAS or JRSS)! Hence another chunk of public (research) money goes towards their profits… My opinion is that everyone should make their papers available on-line, or better via arXiv. And petition their societies for a tighter control of subscription rates, or even a move to electronic editions when the rates get out of control.

PS: Here is a link to an Australian blog, the Conversation, where some publishers (Wiley and Elsevier) were interviewed on these points. I will not comment, but the interview is quite informative on the publishers’ lines of defense!

### the theory that would not die…

(Post contributed by Christian Robert)

Here is my enthusiastic (and obviously biased) reaction to the theory that would not die. It tells the story and the stories of Bayesian statistics and of Bayesians in a most genial and entertaining manner. There may be some who will object to such a personification of science, which should be (much) more than the sum of the characters who contributed to it. However, I will defend the perspective that (Bayesian) statistical science is as much philosophy as it is mathematics and computer science, and thus that the components that led to its current state were contributed by individuals, for whom the path to those components mattered. While the book inevitably starts with the (patchy) story of Thomas Bayes’s life, including his time in Edinburgh, and a nice non-mathematical description of his ball experiment, the next chapter is about “the man who did everything”, …, yes indeed, Pierre-Simon (de) Laplace himself! (An additional nice touch is the use of lower case everywhere, instead of an inflation of upper-case letters!) How Laplace attacked the issue of astronomical errors is brilliantly depicted, rooting the man within statistics and explaining why he would soon move to the “probability of causes”. And rediscover and generalise Bayes’ theorem. That his (rather unpleasant!) thirst for honours and official positions would later cast disrepute on his scientific worth is difficult to fathom, esp. when coming from knowledgeable statisticians like Florence Nightingale David.

The next chapter is about the dark ages of [not yet] Bayesian statistics and I particularly liked the links with the French army, discovering there that the great Henri Poincaré testified at Dreyfus’ trial using a Bayesian argument, that Bertillon had completely missed the probabilistic point, and that the military judges were then all aware of Bayes’ theorem, thanks to Bertrand’s probability book being used at École Polytechnique! (The last point actually was less of a surprise, given that I had collected some documents about the involvement of late 19th/early 20th century artillery officers in the development of Bayesian techniques, Edmond Lhostes and Maurice Dumas, in connection with Lyle Broemeling’s Biometrika study.) The description of the fights between Fisher and Bayesians and non-Bayesians alike is as always both entertaining and sad. Sad also is the fact that Jeffreys’ masterpiece got so little recognition at the time. (While I knew about Fisher’s unreasonable stand on smoking, going as far as defending the assumption that “lung cancer might cause smoking”(!), the Bayesian analysis of Jerome Cornfield was unknown to me. And quite fascinating.) The figure of Fisher actually permeates the whole book, as a negative bullying figure preventing further developments of early Bayesian statistics, but also as an ambivalent anti-Bayesian who eventually tried to create his own brand of Bayesian statistics in the format of fiducial statistics…

“…and then there was the ghastly de Gaulle.” D. Lindley

The following part of the theory that would not die is about Bayes’ contribution to the war (WWII), at least from the Allied side. Again, I knew most of the facts about Alan Turing and Bletchley Park; however, the story is well told and, as on previous occasions, I cannot but be moved by the waste of such a superb intellect, thanks to the stupidity of governments. The role of Albert Madansky in the assessment of the [lack of] safety of nuclear weapons is also well described, stressing the inevitability of a Bayesian assessment of a one-time event that had [thankfully] not yet happened. The above quote from Dennis Lindley is the conclusion of his argument on why Bayesian statistics were not called Laplacean; I would think instead that the French post-war attraction for abstract statistics in the wake of Bourbaki did more against this recognition than de Gaulle’s isolationism. The involvement of John Tukey in military research was also a novelty for me, but not so much as his use of Bayesian [small area] methods for NBC election-night predictions. (They could not hire José nor Andrew at the time.) The conclusion of Chapter 14 on why Tukey felt the need to distance himself from Bayesianism is quite compelling. Maybe paradoxically, I ended up appreciating Chapter 15 even more for the part about the search for a missing H-bomb near Palomares, Spain, as it exposes the advantages a Bayesian analysis would have brought.

“There are many classes of problems where Bayesian analyses are reasonable, mainly classes with which I have little acquaintance.” J. Tukey

When coming to recent times and to contemporaries, Sharon McGrayne gives a very detailed coverage of the coming-of-age of Bayesians like Jimmy Savage and Dennis Lindley, as well as the impact of Stein’s paradox (a personal epiphany!), along with the important impact of Howard Raiffa and Robert Schlaifer, both on business schools and on the modelling of prior beliefs [via conjugate priors]. I did not know anything about their scientific careers, but Applied Statistical Decision Theory is a beautiful book that prefigured both DeGroot‘s and Berger‘s. (As an aside, I was amused by Raiffa using Bayesian techniques for horse betting based on race bettors, as I had vaguely played with the idea during my spare time in the French Navy!) Similarly, while I’d read detailed scientific accounts of Frederick Mosteller’s and David Wallace’s superb Federalist Papers study, they were only names to me. Chapter 12 mostly remedied this lack of mine.

“We are just starting” P. Diaconis

The final part, entitled Eureka!, is about the computer revolution we witnessed in the 1980s, culminating with the (re)discovery of MCMC methods we covered in our own “history”. Because it covers stories closer and closer to the present day, it inevitably crumbles into shorter and shorter stories. However, the theory that would not die conveys the essential message that Bayes’ rule had become operational, with its own computer language and objects like graphical models and Bayesian networks that could tackle huge amounts of data and real-time constraints. And that it is used by companies like Microsoft and Google. The final pages mention neurological experiments on how the brain operates in a Bayesian-like way (a direction much followed by the neurosciences, as illustrated by Peggy Series’ talk at Bayes-250).

In conclusion, I highly enjoyed reading through the theory that would not die. And I am sure most of my Bayesian colleagues will as well. Being Bayesians, they will compare the contents with their subjective priors about Bayesian history, but will in the end update those profitably. (The most obvious missing part is, in my opinion, the absence of E. T. Jaynes and the MaxEnt community, which would deserve a chapter of its own.) Maybe ISBA could consider supporting a paperback or electronic copy to distribute to all its members! As an insider, I have little idea of how the book would be perceived by the layman: it does not contain any formula apart from [the discrete] Bayes’ rule at some point, so everyone can read it through. The current success of the theory that would not die shows that it reaches much further than academic circles. It may be that the general public does not necessarily grasp the ultimate difference between frequentists and Bayesians, or between Fisherians and Neyman-Pearsonians. However, the theory that would not die goes over all the elements that explain these differences. In particular, the parts about single events are quite illuminating on the specificities of the Bayesian approach. I will certainly [more than] recommend it to my graduate students (and buy the French version for my mother once it is translated, so that she finally understands why I once gave a talk “Don’t tell my mom I am Bayesian” at ENSAE…!) If there is any doubt from the above, I obviously recommend the book to all the readers of the Statistics Forum!

### A misleading title… [book review]

(By Christian Robert)

When I received this book, Handbook of fitting statistical distributions with R, by Z. Karian and E.J. Dudewicz, from/for the Short Book Reviews section of the International Statistical Review, I was obviously impressed by its size (around 1700 pages and 3 kilos…). From briefly glancing at the table of contents, and the list of standard distributions appearing as subsections of the first chapters, I thought that the authors were covering different estimation/fitting techniques for most of the standard distributions. After taking a closer look at the book, I think the cover is misleading in several aspects: this is not a handbook (a.k.a. a reference book), it does not cover standard statistical distributions, the R input is marginal, and the authors only wrote part of the book, since about half of the chapters were written by other authors…

### Error and Inference (part 1)

(by Christian Robert)

“The philosophy of science offers valuable tools for understanding and advancing solutions to the problems of evidence and inference in practice.”—D. Mayo & A. Spanos, p.xiv, Error and Inference, 2010

Deborah Mayo kindly sent me her book, whose subtitle is “Recent exchanges on experimental reasoning, reliability, and the objectivity and rationality of Science” and whose contributors are P. Achinstein, A. Chalmers, D. Cox, C. Glymour, L. Laudan, A. Musgrave, and J. Worrall, plus both editors, Deborah Mayo and Aris Spanos. Deborah Mayo rightly inferred that this debate was bound to appeal to my worries about the nature of testing and model choice and to my layman interest in the philosophy of Science. Speaking of which [layman], the book reads really well, even though I am clearly missing references to Mayo’s and others’ earlier works. And even though it cannot be read under my cherry tree (esp. now that the weather has moved from été to automne [summer to autumn], as I heard this morning on the national public radio). Deborah Mayo is clearly the driving force in putting this volume together, from setting up the ERROR 06 conference to commenting on the chapters of all contributors (but her own and Aris Spanos’). Her strongly frequentist perspective on the issues of testing and model choice is thus reflected in the overall tone of the volume, even though the contributors bring some contradiction to the debate. A complete book review was published in the Notre Dame Philosophical Reviews.

“However, scientists wish to resist relativistic, fuzzy, or post-modern turns (…) Notably, the Popperian requirement that our theories are testable and falsifiable is widely regarded to contain important insights about responsible science and objectivity.”—D. Mayo & A. Spanos, p.2, Error and Inference, 2010

Given the philosophical, complex, and interesting nature of the work, I will split my comments into several linear posts (hence the part 1), as I did for Evidence and Evolution. The following comments are thus about a linear (even pedestrian) and incomplete read through the first three chapters. These comments do not pretend to any depth; they simply reflect the handwritten notes, thoughts, and counterarguments I scribbled as I was reading through… As illustrated by the above quote (whose first part I obviously endorse), the overall perspective in the book is Popperian, despite Popper’s criticism of statistical inference as a whole. Another fundamental concept throughout the book is the “Error-Statistical philosophy”, of which Deborah Mayo is the proponent. One of the tenets of this philosophy is a reliance on statistical significance tests in the Fisher-Neyman-Pearson (or frequentist) tradition, along with a severity principle (“We want hypotheses that will allow for stringent testing so that if they pass we have evidence of a genuine experimental effect“, p.19) stated as (p.22)

A hypothesis H passes a severe test T with data x if

1. x agrees with H, and
2. with very high probability, test T would have produced a result that accords less well with H than does x, if H were false or incorrect.

(The p-value is advanced as a direct accomplishment of this goal, but I fail to see why it does or why a Bayes factor would not. Indeed, the criterion depends on the definition of probability when H is false or incorrect. This relates to Mayo’s criticism of the Bayesian approach, as explained below.)
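To make the criterion concrete, here is a minimal numerical sketch of my own (a hypothetical illustration, not code from the book) of how severity is usually computed for a one-sided test of a normal mean: the severity of the inference μ > μ₁ is the probability, were μ exactly equal to μ₁, of a result according less well with that inference than the observed data.

```python
import math

def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def severity(xbar, mu1, sigma, n):
    """SEV(mu > mu1): probability of a sample mean according less well
    with 'mu > mu1' than the observed xbar, were mu exactly mu1."""
    z = (xbar - mu1) / (sigma / math.sqrt(n))
    return phi(z)

# Observed mean 0.4 from n = 100 observations with sigma = 1:
# the modest claim 'mu > 0.2' passes with high severity,
# while the stronger claim 'mu > 0.38' is barely probed at all.
high = severity(0.4, 0.2, 1.0, 100)    # about 0.977
low = severity(0.4, 0.38, 1.0, 100)    # about 0.579
```

The contrast between the two claims is the point: severity is attached to a specific inference, not to the test as a whole.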

“Formal error-statistical tests provide tools to ensure that errors will be correctly detected with high probabilities.”—D. Mayo, p.33, Error and Inference, 2010

In Chapter 1, Deborah Mayo has a direct go at the Bayesian approach. The main criticism of the Bayesian approach to testing (defined through the posterior probability of the hypothesis, rather than through the predictive) is about the catchall hypothesis, a somewhat desultory term replacing the alternative hypothesis. According to Deborah Mayo, this alternative should “include all possible rivals, including those not even thought of” (p.37). This sounds like a weak argument, although it was also used by Alan Templeton in his rebuttal of ABC, given that (a) it should also apply in the frequentist sense, in order to define the probability distribution “when H is false or incorrect” (see, e.g., “probability of so good an agreement (between H and x) calculated under the assumption that H is false”, p.40); (b) a well-defined alternative should be available, as testing a hypothesis is very rarely the end of the story: if H is rejected, there should/will be a contingency plan; (c) rejecting or accepting a hypothesis H in terms of the sole null hypothesis H does not make sense from operational as well as from game-theoretic perspectives. The further argument that the posterior probability of H is a direct function of the prior probability of H does not stand against the Bayes factor. (The same applies to the criticism that the Bayesian approach does not accommodate newcomers, i.e., new alternatives.) Stating that “one cannot vouch for the reliability of [this Bayesian] procedure—that it would rarely affirm theory T were T false” (p.37) completely ignores the wealth of results about the consistency of the Bayes factor (since the “asymptotic long run”, p.20, matters in the Error-Statistical philosophy).
The final argument, that Bayesians rank “theories that fit the data equally well (i.e., have identical likelihoods)” (p.38), does not account for (or dismisses, p.50, referring to Jeffreys and Berger instead of Jefferys and Berger) the fact that Bayes factors are automated Occam’s razors, in that averaging the likelihoods over spaces of different dimensions naturally favours simpler models. Even though I plan to discuss this point in a second post, Deborah Mayo also seems to imply that Bayesians use the data twice (this is how I interpret the insistence on the same p.50), which is a sin [genuine] Bayesian analysis can hardly be found guilty of!
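The Occam’s razor effect is easy to exhibit on a toy example of my own (assumed conjugate normal models, not an example from the book): compare M0, which fixes θ = 0, against M1, which puts a N(0, τ²) prior on θ. When the observed mean is exactly 0, both models attain the same maximal likelihood, yet the Bayes factor favours M0, all the more as the prior under M1 spreads over a wider space.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def bayes_factor_01(xbar, n, tau2):
    """Bayes factor of M0 (theta = 0) against M1 (theta ~ N(0, tau2)),
    for the mean xbar of n N(theta, 1) observations; under M1 the
    marginal (prior-averaged) density of xbar is N(0, tau2 + 1/n)."""
    m0 = normal_pdf(xbar, 0.0, 1.0 / n)          # likelihood under M0
    m1 = normal_pdf(xbar, 0.0, tau2 + 1.0 / n)   # marginal likelihood under M1
    return m0 / m1

# At xbar = 0 the Bayes factor equals sqrt(1 + n * tau2) in favour of M0:
bf_tight = bayes_factor_01(0.0, 100, tau2=1.0)     # sqrt(101), about 10
bf_vague = bayes_factor_01(0.0, 100, tau2=100.0)   # sqrt(10001), about 100
```

The √(1 + nτ²) penalty is exactly the averaging-over-dimensions effect mentioned above: the vaguer prior wastes mass on parameter values the data contradict.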

### another lottery coincidence

(by Christian Robert)

Once again, meaningless figures are published about a man who won the French lottery (Le Loto) for the second time. The reported probability of the event is indeed one chance in 363 (US) trillions (i.e., billions in the long-scale system, or $10^{12}$)… This number is simply the square of

${49 \choose 5}\times{10 \choose 1} = 19,068,840$

which is the number of possible Loto grids. Thus, the probability applies to the event “Mr so-&-so plays a winning grid of Le Loto on May 6, 1995 and a winning grid of Le Loto on July 27, 2011“. But this is not the event that occurred: one of the bi-weekly winners of Le Loto won a second time and this was spotted by Le Loto spokespersons. If we take the specific winner of today’s draw, Mrs such-&-such, who has played a single grid bi-weekly since the creation of Le Loto in 1976, i.e., about 3640 times, the probability that she won earlier is of the order of

$1-\left(1-\frac{1}{{49\choose 5}\times{10\choose 1}}\right)^{3640}=2\cdot 10^{-4}$.

There is thus about one chance in 5,000 that a given (unigrid) winner wins again, not much indeed, but no trillions involved either. Now, this is also the probability that, for a given draw (like today’s draw), one of the 3640 previous winners wins again (assuming they all play only one grid, play independently from each other, &tc.). Over a given year, i.e., over 104 draws, the probability that there is no second-time winner is thus approximately

$\left(1-2\cdot 10^{-4}\right)^{104} = 0.98,$

showing that within a year there is a 2% chance of finding a second-time winner. Not so extreme, is it?! Therefore, less bound to make the headlines…

Now, the above are rough and conservative calculations. The newspaper articles about the double winner report that the man is playing about 1000 euros a month (this is roughly the minimum wage!), representing the equivalent of 62 grids per draw (again I am simplifying to get the correct order of magnitude). If we repeat the above computations, assuming this man has played 62 grids per draw from the beginning of the game in 1976 till now, the probability that he wins again conditional on the fact that he won once is

$1-\left(1-\frac{62}{{49 \choose 5}\times{10 \choose 1}}\right)^{3640} = 0.012$,

a small but not impossible event. (And again, we consider the probability only for Mr so-&-so, while the event of interest is not restricted to him.) (I wrote this post before Aleks Jakulin pointed out the four-time lottery winner in Texas, whose “luck” seems more related to the imperfections of the lottery process…)
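All of the back-of-the-envelope numbers above are easy to check; here is a short verification script (mirroring the post’s simplifying assumptions):

```python
from math import comb

n_grids = comb(49, 5) * comb(10, 1)    # number of possible Loto grids
assert n_grids == 19_068_840

p_win = 1.0 / n_grids                  # one grid, one draw
n_draws = 3640                         # bi-weekly draws since 1976

# A given past winner playing a single grid per draw wins a second time
# at some point over the 3640 draws:
p_single = 1.0 - (1.0 - p_win) ** n_draws          # about 2e-4

# Over a year (104 draws), one of the 3640 past unigrid winners wins again:
p_year = 1.0 - (1.0 - p_single) ** 104             # about 0.02

# The double winner reportedly played about 62 grids per draw:
p_62grids = 1.0 - (1.0 - 62.0 * p_win) ** n_draws  # about 0.012
```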

I also stumbled on this bogus site providing the “probabilities” (based on the binomial distribution, nothing less!) for each digit in Le Loto, no need for further comments. (Even the society that runs Le Loto hints at such practices, by providing the number of consecutive draws a given number has not appeared, with the sole warning “N’oubliez jamais que le hasard ne se contrôle pas“, i.e. “Always keep in mind that chance cannot be controlled“…!)

### Numerical analysis for statisticians [a review]

(Post contributed by Christian Robert.)

“In the end, it really is just a matter of choosing the relevant parts of mathematics and ignoring the rest. Of course, the hard part is deciding what is irrelevant.”

Somehow, I had missed the first edition of this book and thus I started reading it this afternoon with a newcomer’s eyes (obviously, I will not comment on the differences with the first edition, sketched by the author in the Preface). Past the initial surprise of discovering it was a mathematics book rather than an algorithmic book, I became engrossed in my reading and could not let it go! Numerical Analysis for Statisticians, by Kenneth Lange, is a wonderful book. It provides most of the necessary background in calculus and some algebra to conduct rigorous numerical analyses of statistical problems. This includes expansions, eigen-analysis, optimisation, integration, approximation theory, and simulation, in less than 600 pages. It may be due to the fact that I was reading the book in my garden, with the background noise of the wind in the tree leaves, but I cannot find anything solid to grumble about! Not even about the MCMC chapters! I simply enjoyed Numerical Analysis for Statisticians from beginning to end.

“Many fine textbooks (…) are hardly substitutes for a theoretical treatment emphasizing mathematical motivations and derivations. However, students do need exposure to real computing and thoughtful numerical exercises. Mastery of theory is enhanced by the nitty gritty of coding.”

From the above, it may sound as if Numerical Analysis for Statisticians does not fulfill its purpose and is too much of a mathematical book. Be assured this is not the case: the contents are firmly grounded in calculus (analysis) but the (numerical) algorithms are only one code away. An illustration (among many) is found in Section 8.4: Finding a Single Eigenvalue, where Kenneth Lange shows how the Rayleigh quotient algorithm of the previous section can be exploited to this aim, when supplemented with a good initial guess based on Gerschgorin’s circle theorem. This is brilliantly executed in two pages and the code is just one keyboard away. The EM algorithm is immersed in a larger M[&]M perspective. Problems are numerous and mostly of high standards, meaning one (including me) has to sit and think about them. References are kept to a minimum; they are mostly (highly recommended) books, plus a few papers primarily exploited in the problem sections. (When reading the Preface, I found that “John Kimmel, [his] long suffering editor, exhibited extraordinary patience in encouraging [him] to get on with this project”. The quality of Numerical Analysis for Statisticians is also a testimony to John’s editorial acumen!)
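As a rough sketch of the Section 8.4 strategy (my own rendering, not Lange’s code, and assuming a symmetric matrix): run Rayleigh quotient iteration with the initial shift taken from a Gerschgorin disc centre.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def rayleigh_quotient_iteration(A, shift, n_iter=20):
    """Find an eigenvalue of the symmetric matrix A near the initial shift."""
    n = len(A)
    v = [1.0 / math.sqrt(n)] * n
    rho = shift
    for _ in range(n_iter):
        shifted = [[A[i][j] - (rho if i == j else 0.0) for j in range(n)]
                   for i in range(n)]
        try:
            w = solve(shifted, v)          # inverse iteration step
        except ZeroDivisionError:          # shift hit an eigenvalue exactly
            break
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        # The Rayleigh quotient v'Av becomes the next (cubically converging) shift
        rho = sum(v[i] * sum(A[i][j] * v[j] for j in range(n)) for i in range(n))
    return rho, v

# Toy symmetric matrix with eigenvalues 3 - sqrt(3), 3, and 3 + sqrt(3)
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
# Gerschgorin: eigenvalues lie in discs centred at the diagonal entries
# (4, 3, 2) with radii (1, 2, 1); take the centre 4 as the starting shift.
lam, vec = rayleigh_quotient_iteration(A, shift=4.0)
```

The Gerschgorin centres cost nothing to compute and land the shift close enough for the cubic convergence of the iteration to take over.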

“Every advance in computer architecture and software tempts statisticians to tackle numerically harder problems. To do so intelligently requires a good working knowledge of numerical analysis. This book equips students to craft their own software and to understand the advantages and disadvantages of different numerical methods. Issues of numerical stability, accurate approximation, computational complexity, and mathematical modeling share the limelight in a broad yet rigorous overview of those parts of numerical analysis most relevant to statisticians.”

While I am reacting so enthusiastically to the book (imagine, there is even a full chapter on continued fractions!), it may be that my French math background is biasing my evaluation and that graduate students around the world would find the book too hard. However, I do not think so: the style of Numerical Analysis for Statisticians is very fluid and the rigorous mathematics is mostly at the level of undergraduate calculus. The more advanced topics like wavelets, Fourier transforms, and Hilbert spaces are very well introduced and do not require prerequisites in complex calculus or functional analysis. (Although I take no joy in this, even measure theory does not appear to be a prerequisite!) On the other hand, a good background in statistics is a prerequisite. This book will clearly involve a lot of work from the reader, but the respect shown by Kenneth Lange to his readers will motivate them enough to keep going until they assimilate those essential notions. Numerical Analysis for Statisticians is also recommended for more senior researchers, and not only for building one or two courses on the basics of statistical computing. It contains most of the mathematical bases that we need, even if we do not know we need them! Truly an essential book.