We’re looking for stories of everyday statistical life. What’s it like to do statistics?

See here for further details.

Following our discussion of Avastin, a drug that some have argued doesn’t work for one of its Medicare-approved uses, an anonymous source who has worked with a foreign regulatory review agency speculated that, at least there, the lack of an observed OS advantage in similar situations would not have been a large concern (for the statistical reasons that Don Berry raised in his thoughtful post). However, there is uncertainty about whether a PFS advantage always leads to an OS advantage. This uncertainty can be fairly well contained by context in many cancers, as Don Berry also indicates, but it remains extra uncertainty above what is usual in the evaluation of RCT-based evidence.

The larger concerns more likely would have been:

1. Is the benefit of the uncertain increase in PFS worth the risk of uncertain harms observed in the trials?

2. How does one properly discount what was “reported” about the trials/research program by the sponsor, given, as Sander Greenland nicely put it in his short post, “the influences of vested interests” that almost surely exaggerated those reports?

For 1, it needs to be kept in mind that the consideration of costs and of whether to provide the approved products to particular patients (i.e., cost-effectiveness) might be best left for others to decide later. That is, an initial focus strictly on deciding whether the benefits exceed (justify) the harms might be better. Within this initial focus, note that the increases in PFS are often small, the harms often include fatal ones, and the harms observed in the trials usually considerably underestimate the harms in less well controlled practice settings.

For 2, one should not blame the drug companies or expect otherwise from their employees/consultants. It is partly human nature and also what business is about (when within reason, such as when the “answer” is not obvious to most and there is enough uncertainty for judgments to vary widely). See this paper by Sander Greenland.

Last week I posted a link to New York Times columnist Joe Nocera’s claim that Medicare is a corrupt system that is paying for a drug, Avastin, that doesn’t work for breast cancer patients. I sent the pointer to several colleagues, along with the following note:

I have three goals here:

1. To get to the bottom of the Avastin story.

2. If Nocera is correct, to add the voice of the statistics profession to his push to have more rational rules for drug approval.

3. To promote the Statistics Forum, as part of a larger goal of engaging the statistics community in frequent public discussions.

In the old days, professional discussions were centered on journal articles. We still do this, but blogs and other online forums are becoming increasingly important. By contributing a couple paragraphs here (including links to further reading), you will be helping in all three goals above. Just send to me and I will post at the Forum (under your name).

(I myself am doing this in the spirit of service to the profession in my duties as editor of the Statistics Forum.)

I received several responses. The quick summary is:

1. Nocera claimed that Avastin doesn’t work for breast cancer. Actually, though, the data seem to show that it does work.

2. It’s not clear whether Medicare should be paying for the drug. In any case, the drug reimbursement system is a mess.

Now here are the comments from the experts.

From biostatistician John Carlin:

I asked a colleague of mine about it – he sits on Australia’s Pharmaceutical Benefits Advisory Committee, which rather uniquely in the world (I believe) is charged with examining cost-effectiveness of new drugs in order to advise government as to whether they should be subsidised under the national Pharmaceutical Benefits Scheme (which pays the bulk of drug costs in this country). As I understand it, Avastin (bevacizumab) is still approved here as a treatment for certain indications but it is not approved for listing on the PBS, and without receiving a subsidy via that mechanism its cost is generally prohibitive.

See also this paper by John Ioannidis, which highlights issues about selective early evidence and multiplicity of comparisons for this drug: http://www.bmj.com/content/341/bmj.c4875.

From statistician Howard Wainer:

Data and evidence seem to have only a minor role in today’s political climate — witness the support of policies for cutting taxes to reduce the deficit. In health care, note the reaction to the evidence-based decision to cut back massively on diagnostic tests such as PSAs and mammograms, where the huge number of false positives makes the probability of a true positive tiny, and to clinical results confirming that early detection of breast cancer seems to have no effect on survival (with modern treatment) and that there is no clear advantage of treatment over no treatment for prostate cancer.
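Wainer’s point about false positives can be made concrete with Bayes’ rule. The prevalence, sensitivity, and specificity below are illustrative assumptions, not figures from any particular screening study:

```python
# Positive predictive value of a screening test via Bayes' rule.
# The prevalence, sensitivity, and specificity are illustrative
# assumptions, not figures from any particular study.

def ppv(prevalence, sensitivity, specificity):
    """P(disease | positive test)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# A rare condition with a fairly accurate test still yields a low PPV:
p = ppv(prevalence=0.005, sensitivity=0.9, specificity=0.9)
print(round(p, 3))  # 0.043: most positives are false positives
```

Even a test that is right 90% of the time on both sick and healthy patients yields mostly false positives when the condition is rare.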

Nonetheless, the size of the diagnostic industry presses against rationality.

From a health-care economist:

From what I can understand, there are scientific, technical, and political forces at work here. The scientific issue is that Avastin is believed to work for some women, but not all. No one knows which women will benefit, and so the FDA ordered Roche/Genentech to figure this out. The technical issue is that by law, CMS has to cover any approved cancer drug for an off-label indication if the indication is included in one of several named drug compendia. Because CMS does not like making these decisions, this generally suits CMS fine. In this case, the National Comprehensive Cancer Network (NCCN), “a non-profit group of oncologists whose guidance is closely followed by leading treatment centers, has voted overwhelmingly in favor of maintaining its recommendation that Avastin should be used to treat breast cancer.” The NCCN vote was 24-0, with one abstention. That brings up the politics here. The interesting dimension of politics is not the death panel charge — though that’s obviously there — but how organizations like the NCCN make their decisions. There is an undercurrent of sleaziness that one picks up. E.g., a bunch of members of the NCCN have ties to Roche; they make their money off this stuff; etc. All of these play together to create outcomes like this.

From epidemiologist Sander Greenland:

I laud Andrew’s call to think about what might be done in the Avastin case in particular and the general basis for Medicare reimbursement. From the reported information the latter seems not to have much in the way of conflict-of-interest safeguards (unlike FDA panels). That problem, however, is not one of technical statistics or evidence evaluation – it’s a far more fundamental and touchy issue of protecting decision-making from the influences of vested interests.

From psychologist Jon Baron:

I have long advocated the use of cost-effectiveness analysis in health care. (See, for example, the chapter on utility measurement in my textbook Thinking and Deciding. The section on the Oregon Health Plan has been there since the 3d edition in 2000.) Of course, the method has problems. Measurement of utility is not rocket science. (I compare it to the measurement of time in ancient Egypt.) Nor is measurement of cost, for that matter. But the issue is “compared to what?” All other methods of allocating health resources seem more likely to waste resources that could be put to better use, saving lives and increasing quality of life, elsewhere.

A move in this direction in Obama’s health-care proposal was derided as “death panels”. My understanding is that it is still there and could grow in the future.

I have not read the literature on Avastin. But, from Nocera’s article and other things I’ve read about it, it seems to me that it would not pass a cost-effectiveness test with some reasonable limit on dollars per Quality-adjusted Life Year. Whether it has a statistically significant benefit is beside the point. The issue is, I think, our best estimate of its effectiveness given the data available. If the FDA thinks it does no good at all, then it probably does not do very much good, and it is very expensive. Some newer treatments for prostate cancer seem to fall in the same category, and it is possible that the same is true of the HPV vaccine (in the U.S., where Pap smears are routine).
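For readers unfamiliar with the cost-per-QALY yardstick Baron mentions, the arithmetic is simple. The QALY gain and willingness-to-pay threshold below are made-up numbers for illustration; only the roughly $90,000 annual cost appears in the article:

```python
# Incremental cost-effectiveness ratio (ICER) in dollars per QALY.
# The QALY gain and threshold are hypothetical, chosen only to
# illustrate the arithmetic; they are not estimates for Avastin.

def icer(extra_cost, extra_qalys):
    """Incremental cost per quality-adjusted life year gained."""
    return extra_cost / extra_qalys

# A drug costing $90,000/year that adds, say, 0.2 QALYs:
ratio = icer(extra_cost=90_000, extra_qalys=0.2)
print(ratio)  # 450000.0 dollars per QALY
threshold = 100_000  # a commonly discussed (but contested) limit
print(ratio <= threshold)  # False: fails this cost-effectiveness test
```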

Of course, any use of cost-effectiveness analysis needs to be careful about research, including “Phase 3” research on extensions of drugs to off-label uses and other issues not addressed before approval.

But the main problem is public opposition. People have trouble with trade-offs and tend to think in terms of an opposition between “cover everything that might help” and “put limits on the amount of money that is spent and let someone else figure out how to allocate resources within those limits”. The latter approach might work, but only if the “someone else” were willing to use cost-effectiveness and if the limits were sufficiently generous.

On the latter point, it seems to me that health care is very important and that its cost, relative to other ways of spending money, is not that high for what it does. The idea that the total budget for health care should not increase seems wrong. Health care is getting better, and thus it is reasonable for people to want more of it. The idea of keeping the cost in line with inflation is like saying that, between 1945 and 1960, the amount of money spent on home entertainment increased too much and must be limited.

I saved the best for last. Here’s statistician Don Berry, who is an expert on medical decision making in general and cancer treatment in particular:

The Avastin story is a very long one, with many turns. The FDA approval question for metastatic breast cancer, which is the issue here, has divided oncologists. There are rational arguments on both sides.

There is no question that Avastin “works” in the sense that it has an anti-tumor effect. Everyone agrees with that, well, with the possible exception of Joe Nocera. In particular, it clearly delays progression of metastatic breast cancer, which is the reason it was approved for treating that disease in the first place. The FDA reversed itself (for this disease but not for other cancers such as lung and colon that will remain on Avastin’s label) because Avastin has not been shown to statistically significantly prolong overall survival (OS). Some oncologists–actually, most oncologists–argue that progression-free survival (PFS) is clinically meaningful and should be a registration end point. The FDA’s position–and that of the Oncology Drug Advisory Committee (ODAC)–is that improved PFS is not usually enough to approve drugs without empirical evidence of improved OS to go along with it.

Genentech/Roche appealed to the FDA Commissioner after the FDA’s first decision (based on ODAC’s recommendation) to remove metastatic breast cancer from Avastin’s label. This kind of appeal is legal but is almost never used. Genentech mustered many arguments. Several were statistical. For example, they argued that none of the clinical trials in question were powered to show a benefit in OS. The following article addresses this question:

Broglio KR, Berry DA (2009). Detecting an overall survival benefit that is derived from progression-free survival. Journal of the National Cancer Institute 101:1642-1649.

This article demonstrates that it’s very difficult to power a study to show an OS benefit when survival post-progression (SPP=OS-PFS) is long, which it is in metastatic breast cancer, about 2 years in some of the Avastin trials. The article argues that even if an advantage in PFS translates perfectly into the same advantage in OS, the variability after progression so dilutes the OS effect that it’s likely to be lost.

The assumptions in the article about SPP are realistic. I know many example clinical trials in many types of cancer (and I know no counterexamples) where SPP is essentially the same in both treatment groups, even when the experimental drug showed better PFS than control. This is despite crossovers (to the drug when the control patient progresses) and potentially greater efforts by the clinicians in one treatment group to keep their patients alive. (The main reason that SPP is similar in the two treatment groups is that metastatic cancer is almost uniformly fatal and it’s hard to slow the disease after it’s set up housekeeping throughout the body.) It makes sense that a drug that was effective in delaying progression is not effective after progression because the drug is almost always stopped when the patient progresses, and the patient usually goes onto another drug.
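Berry’s dilution argument is easy to check by simulation. The sketch below assumes exponential waiting times and arbitrary parameter values (an 8-month mean PFS, a 2-month gain carried fully into OS, and a 24-month mean SPP, roughly the order of magnitude he mentions); it illustrates the Broglio-Berry point and is not a reanalysis of any Avastin trial:

```python
import random, math

# Monte Carlo sketch of the Broglio & Berry point: even if a PFS gain
# carries over one-for-one into OS, a long and variable survival
# post-progression (SPP) dilutes the OS effect. Exponential waiting
# times and all parameter values are assumptions for illustration.

random.seed(1)

def one_trial(n=200, pfs_mean=8.0, gain=2.0, spp_mean=24.0):
    """Return z-statistics for the PFS and OS comparisons in one trial."""
    ctrl_pfs = [random.expovariate(1 / pfs_mean) for _ in range(n)]
    trt_pfs = [random.expovariate(1 / (pfs_mean + gain)) for _ in range(n)]
    # SPP has the same distribution in both arms (as Berry describes):
    ctrl_os = [p + random.expovariate(1 / spp_mean) for p in ctrl_pfs]
    trt_os = [p + random.expovariate(1 / spp_mean) for p in trt_pfs]

    def z(a, b):
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
        vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
        return (mb - ma) / math.sqrt(va / len(a) + vb / len(b))

    return z(ctrl_pfs, trt_pfs), z(ctrl_os, trt_os)

reject_pfs = reject_os = 0
for _ in range(500):
    z_pfs, z_os = one_trial()
    reject_pfs += z_pfs > 1.96
    reject_os += z_os > 1.96

print("power for PFS:", reject_pfs / 500)
print("power for OS: ", reject_os / 500)  # much lower: SPP noise swamps the gain
```

With these assumptions the same underlying 2-month gain is detected far more often in the PFS comparison than in the OS comparison, simply because the post-progression variability dilutes the latter.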

For the full Avastin story check out The Cancer Letter http://www.cancerletter.com/downloads/20111118/download and links provided therein.

New York Times columnist Joe Nocera tells the story of Avastin, a cancer drug produced by Genentech that Medicare pays for in breast cancer treatment even though the Food and Drug Administration says it doesn’t work. Nocera writes:

For breast cancer patients, Avastin neither suppresses tumor growth to any significant degree nor extends life. Although a 2007 study showed Avastin adding 5.5 months to progression-free survival, subsequent studies have failed to replicate that result.

As a result of that first, optimistic study, the Food and Drug Administration gave the drug “accelerated approval,” meaning it could be marketed as a breast cancer therapy while further studies were conducted. Those follow-up studies are what caused a panel of F.D.A. experts to then withdraw that approval . . . After weighing the evidence, the F.D.A. panel voted 6 to 0 against Avastin.

After Genentech appealed, Dr. Margaret Hamburg, the F.D.A. commissioner, affirmed the decision on Friday in a ruling that would seem, on its face, unassailable. She essentially said that F.D.A. decisions had to be driven by science, and the science wasn’t there to support Genentech’s desire to market Avastin as a breast cancer drug.

And here’s the punch line. After describing the political pressure coming from cancer support groups and political hacks such as Sarah Palin and the Wall Street Journal editorial board, Nocera continues:

The strangest reaction, though, has come from the nation’s health insurers and the administrators of Medicare. Despite the clear evidence of Avastin’s lack of efficacy in treating breast cancer, they have mostly agreed to continue paying whenever doctors prescribe it “off label” for breast cancer patients. Avastin, by the way, costs nearly $90,000 a year. . . .

Medicare . . . is, by statute, guided in such decisions not by the F.D.A. but by various compendia of drug use put together by such groups as the National Comprehensive Cancer Network. The network’s 32-member breast cancer panel is made up almost entirely of breast cancer specialists, nine of whom have financial ties to Genentech. The last time the panel voted on Avastin, it voted unanimously in favor of continuing to recommend it as a breast cancer therapy.

Nocera’s summary:

Here is an enormously expensive drug that largely doesn’t work, has serious side effects and can no longer be marketed as a breast cancer therapy. Yet insurers, including Medicare, will continue to cover it.

If we’re not willing to say no to a drug like Avastin, then what drug will we say no to?

Based on Nocera’s description, this does seem pretty scandalous. Perhaps not quite on the scale of a financial and public health disaster such as the pouring of antibiotics into animal feed in our subsidized farms, but still a bit disturbing.

And I know people who work at Genentech, which makes it seem that much worse.

On the other hand, I don’t know anything about this case. I’m curious what experts on medical decision making would say. Is Nocera right on this one? Should we as statisticians be raising our voices and making a fuss about Medicare’s apparent disregard of the principles of evidence-based medicine?

Bill Bolstad wrote a reply to my review of his book *Understanding Computational Bayesian Statistics* last week and here it is, unedited except for the first paragraph where Bill thanks me for the opportunity to respond, “so readers will see that the book has some good features beyond having a “nice cover”.” (!) I simply processed his Word document into an html output and put a **Read More** bar in the middle, as it is fairly detailed.

The target audience for this book is upper-division undergraduate students and first-year graduate students in statistics whose prior statistical education has been mostly frequentist based. Many will have knowledge of Bayesian statistics at an introductory level similar to that in my first book, but some will have had no previous Bayesian statistics course. Being self-contained, it will also be suitable for statistical practitioners without a background in Bayesian statistics.

The book aims to show that:

- Bayesian statistics makes different assumptions from frequentist statistics, and these differences lead to the advantages of the Bayesian approach.
- Finding the posterior up to proportionality is easy; however, finding the exact posterior distribution is difficult in practice, even numerically, especially for models with many parameters.
- Inferences can be based on a (random) sample from the posterior.
- There are methods for drawing samples from the incompletely known posterior.
- Direct resampling methods become inefficient for models with a large number of parameters.
- We can find a Markov chain whose long-run distribution has the same shape as the posterior. A draw from this chain after it has run a long time can be considered a random draw from the posterior.
- We have many choices in setting up a Markov chain Monte Carlo. The book shows the things that should be considered, and how problems can be detected from sample output from the chain.
- An independent Metropolis-Hastings chain with a suitable heavy-tailed candidate distribution will perform well, particularly for regression type models. The book shows all the details needed to set up such a chain.
- The Gibbs sampling algorithm is especially well suited for hierarchical models.
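The Markov chain idea in the bullets above can be sketched in a few lines. This is a generic random-walk Metropolis example on an assumed standard-normal target, not code from the book (which supplies Minitab macros and R functions):

```python
import random, math

# Minimal random-walk Metropolis sketch: we only need the posterior up
# to proportionality, and a long run of the chain behaves like a
# (dependent) sample from that posterior. The target here is an
# arbitrary assumption (standard normal) for illustration.

random.seed(0)

def log_post(theta):
    """Unnormalized log posterior: a standard normal shape."""
    return -0.5 * theta ** 2

theta, draws = 0.0, []
for _ in range(50_000):
    prop = theta + random.gauss(0, 1.0)        # random-walk candidate
    if math.log(random.random()) < log_post(prop) - log_post(theta):
        theta = prop                           # accept
    draws.append(theta)

kept = draws[5_000:]                           # discard burn-in
mean = sum(kept) / len(kept)
var = sum((x - mean) ** 2 for x in kept) / len(kept)
print(round(mean, 2), round(var, 2))           # close to 0 and 1
```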

I am satisfied that the book has achieved the goals that I set out above. The title “Understanding Computational Bayesian Statistics” explains what this book is about. I want the reader (who has a background in frequentist statistics) to understand how computational Bayesian statistics can be applied to models he/she is familiar with. I keep an up-to-date errata on the book website. The website also contains the computer software used in the book, including Minitab macros and R functions. These were used because they have good data analysis capabilities that can be used in conjunction with the simulations. The website also contains Fortran executables that are much faster for models containing more parameters, and WinBUGS code for the examples in the book.

**Some particular comments:**

I do not think statements such as “statisticians have long known that the Bayesian approach offered clear-cut advantages over the frequentist approach” or “clearly the Bayesian approach is more straightforward than the frequentist p-value” are either unbalanced or antagonistic. Wald’s finding that all admissible procedures are Bayesian goes way back. Frequentists all know that Bayesian statistics is the only coherent approach to inference (we have been telling them that for many years!). Historically, almost all applied statistics was done using frequentist methods despite this knowledge. Frequentist p-values are constructed in the parameter dimension using a probability distribution defined only in the observation dimension; Bayesian credible intervals are constructed in the parameter dimension using a probability distribution in the parameter dimension. I think that is more straightforward. The target audience for my book is people with a statistical background mainly in frequentist statistics. My book aims to build on that knowledge, not antagonize them. I think the best way to convert frequentists to the Bayesian approach is to show them how it can be used for models they are familiar with.

In Chapter 6 we show and graphically illustrate the Metropolis-Hastings algorithm using both random-walk and independent candidate distributions for both one-dimensional and two-dimensional parameters. In the two-dimensional case we do show how it works blockwise as well and show how the Gibbs sampler is a special case of blockwise Metropolis-Hastings. An important aspect is how their traceplots show the different mixing properties these chains have, particularly for highly correlated parameters. The independent Metropolis-Hastings chain sampling all parameters in a single block has the best mixing properties when the candidate density has a similar shape to the target and dominates in the tails. In Chapter 7 we show how to find such a candidate density using the mode of the target and curvature of the target at the mode, but with Student’s t shape in the tails. This is shown graphically for the single dimensional case, and the steps of the algorithm in the multivariate case (including how to sample from it) are given in the Chapter Appendix. In the multivariate case, this candidate density will have similar relationship structure for the parameters as the target, as well as having heavy tails. This approach is not misguided as suggested in the review. It leads to chains having excellent mixing properties requiring only a short burn-in and minimal thinning. The review takes the position that thinning is not required, and more precise estimates can be obtained using all the draws after burn-in. Perhaps I am a dinosaur in my view that inferences should be obtained from a random sample. In particular I worry that inferences about the relations between parameters might depend on the path the Markov chain has a tendency to take through the parameter space as well as their true relationship in the posterior. This path dependence can be avoided by thinning to give an approximate random sample from the posterior. 
The real inference in Bayesian statistics is the entire posterior distribution, not univariate summaries. In any case, the simulations thrown away during burn-in and thinning are not data; rather, they are computer-generated Markov chain output. If we want a larger random sample size for our inferences, we can just let the Markov chain run longer. The sample autocorrelations, Gelman-Rubin statistics, and coupling with the past are used to determine the burn-in and thinning required to get this random sample. In my view, the burn-in time and the number of steps omitted between consecutive elements of the thinned sample are essentially the same idea in principle: in both cases we are looking for the number of steps required for the effect of the state we start in (or are currently in) to have died out. That means we have converged to the long-run distribution. In practice, long burn-in times relative to thinning only occur when we start a very inefficient chain a long way from the main body of the posterior. This doesn’t happen when we use the independent Metropolis-Hastings chain with the heavy-tailed candidate distribution derived from the matched-curvature normal.
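The kind of independence chain described here can be sketched as follows. The target is an assumed standard normal (so the matched mode is 0 and the matched-curvature scale is 1), with a Student-t candidate with 4 degrees of freedom standing in for the heavy-tailed matched-curvature candidate; this illustrates the idea rather than reproducing the book’s implementation:

```python
import random, math

# Independence Metropolis-Hastings with a heavy-tailed Student-t
# candidate matched to the target's mode and curvature. The target
# (standard normal log posterior) is an assumed stand-in for the
# matched-curvature setup described in the text.

random.seed(2)
DF = 4  # heavy-tailed candidate

def log_target(x):
    return -0.5 * x ** 2            # mode 0, curvature 1 -> scale 1

def log_t(x):                        # log density of t_DF, up to a constant
    return -0.5 * (DF + 1) * math.log(1 + x * x / DF)

def draw_t():
    # t_DF variate as normal / sqrt(chi-square / df)
    chi2 = sum(random.gauss(0, 1) ** 2 for _ in range(DF))
    return random.gauss(0, 1) / math.sqrt(chi2 / DF)

x, accepted, draws = 0.0, 0, []
for _ in range(20_000):
    y = draw_t()
    # Independence-chain ratio: target(y) q(x) / (target(x) q(y))
    if math.log(random.random()) < (log_target(y) - log_target(x)
                                    + log_t(x) - log_t(y)):
        x, accepted = y, accepted + 1
    draws.append(x)

print("acceptance rate:", accepted / 20_000)  # high: the chain mixes well
```

Because the candidate tracks the target and dominates in the tails, the acceptance rate is high and consecutive draws are nearly independent, which is why such a chain needs little burn-in or thinning.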

Chapters 8 and 9 apply this method to logistic regression, Poisson regression, and proportional hazards models using both multivariate flat priors and multivariate normal priors, including how to choose multivariate normal priors. It is well known that, in regression-type models, including predictor variables that the response does not depend on degrades the fit and predictive effectiveness of the model. It is also well known that correlated predictor variables can steal each other’s effect. This means we should try to remove all suspect predictors simultaneously instead of one at a time. The variable selection issue is handled in a manner similar to hypothesis testing: we determine an (approximate) multivariate credible region for all suspect parameters and remove the whole group if the zero vector lies inside the credible region. This builds on frequentist ideas that readers would bring in with them. The credible region is based on the normal approximation to the posterior sample.
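The remove-the-whole-group rule can be sketched as a Mahalanobis-distance check against a chi-square cutoff, using the normal approximation to the posterior. The posterior mean and inverse covariance below are made-up numbers for illustration, not an example from the book:

```python
# Variable-selection sketch: with a normal approximation to the
# posterior of the suspect coefficients, drop the whole group when the
# zero vector falls inside the joint credible region, i.e. when its
# squared Mahalanobis distance is below the chi-square cutoff.
# The 2-d posterior summaries here are made-up illustrations.

def mahalanobis_sq(mean, cov_inv):
    """Squared Mahalanobis distance of the zero vector from the mean."""
    d = len(mean)
    return sum(mean[i] * cov_inv[i][j] * mean[j]
               for i in range(d) for j in range(d))

# Hypothetical posterior for two suspect coefficients:
mean = [0.3, -0.2]
cov_inv = [[4.0, 0.0], [0.0, 4.0]]  # inverse covariance (sd = 0.5 each)

chi2_95_df2 = 5.991                  # 95% chi-square quantile, 2 df
m2 = mahalanobis_sq(mean, cov_inv)
print(round(m2, 3))                  # 0.52
print("drop both predictors:", m2 < chi2_95_df2)  # True: 0 is inside
```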

Chapter 10 shows how the Gibbs sampler works very well for hierarchical models, including a short diversion into the empirical Bayes method for hierarchical models for comparison purposes. Perhaps my suggestion about not using improper priors in hierarchical models is a bit too strong. It is Jeffreys’ priors for scale parameters in the hierarchical model that cause the problem. This is due to the improperness caused by the vertical asymptote and the inability of the model to force this parameter away from the asymptote, as shown in the Chapter appendix. (I also note this fact in a footnote in **Introduction to Bayesian Statistics**.) The more moderate positive uniform prior, although also improper, does not cause this problem.

“Statistical significance is not a scientific test. It is a philosophical, qualitative test. It asks “whether”. Existence, the question of whether, is interesting. But it is not scientific.” S. Ziliak and D. McCloskey, p.5

The book, written by economists Stephen Ziliak and Deirdre McCloskey, has a theme bound to attract Bayesians and all those puzzled by the absolute and automatised faith in significance tests. The main argument of the authors is indeed that an overwhelming majority of papers stop at rejecting variables (“coefficients”) on the sole and unsupported basis of non-significance at the 5% level. Hence the subtitle “*How the standard error costs us jobs, justice, and lives*“… This is an argument I completely agree with; however, the aggressive style of the book truly put me off! As with *Error and Inference*, which also addresses a non-Bayesian issue, I could have let the matter go, but I feel the book may in the end be counter-productive and thus endeavour to explain why through this review.

“Advanced empirical economics, which we’ve endured, taught, and written about for years, has become an exercise in hypothesis testing, and is broken. We’re saying the brokenness extends to many other quantitative sciences.” S. Ziliak and D. McCloskey, p. xviii

The first chapters contain hardly any scientific argument, but rather imprecations against those blindly using significance tests. Rather than explaining in simple terms and with a few mathematical symbols [carefully avoided throughout the book] what the issue with significance tests is, Ziliak and McCloskey start with the assumption that the reader knows what tests are or, worse, that the reader does not need to know. While the insistence on thinking about the impact of a significant or insignificant coefficient/parameter in terms of the problem at hand is more than commendable, the alternative put forward by the authors remains quite vague, like “size matters”, “how big is big?”, and so on. They mention Bayesian statistics a few times, along with quotes from Jeffreys and Zellner, but never get into the details of their perspective on model assessment. (In fact, their repeated call to determine how important the effect is seems to lead to some sort of prior on the alternative to the null.) It would have been so easy to pick one of the terrible examples mocked by Ziliak and McCloskey and to show what a decent statistical analysis could produce, with no more statistical sophistication than that required by *t*-tests. Instead, the authors have conducted a massive and rather subjective study of the *American Economic Review* for the 1980s with regard to the worth of all [statistical] significance studies used in all papers published in the journal, then repeated the analysis for the 1990s, and those studies constitute the core of their argument. (Following chapters reproduce the same type of analysis in other fields like epidemiology and psychometrics.)
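The “size matters” complaint is easy to illustrate numerically. The effect size and sample size below are assumptions chosen only to make the point, not data from any study Ziliak and McCloskey discuss:

```python
import math

# A toy version of Ziliak and McCloskey's "size matters" complaint:
# with a large enough sample, a practically negligible effect becomes
# "statistically significant". The effect size and n are assumptions.

def z_statistic(effect, sd, n):
    """One-sample z statistic for a mean against zero."""
    return effect / (sd / math.sqrt(n))

tiny_effect = 0.01    # negligible on any substantive scale
z = z_statistic(tiny_effect, sd=1.0, n=1_000_000)
print(round(z, 6))    # 10.0: wildly "significant"
print(z > 1.96)       # True, yet the effect itself may not matter at all
```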

“Fisher realized that acknowledging power and loss function would kill the unadorned significance testing he advocated and fought to the end, and successfully, against them.” S. Ziliak and D. McCloskey, p.144

Ziliak and McCloskey somehow surprisingly seem to focus on the arch-villain Ronald Fisher while leaving Neyman and Pearson safe from their attacks. (And turning Gosset into the good fellow, supposed to be “hardly remembered nowadays” [p.3], while being dubbed a “lifelong Bayesian” [p.152].) I write “surprisingly” because Fisher did not advocate the use of a fixed significance level (even though he indeed considered 5% a convenient bound) so much as the use of the *p*-value *per se*, while Neyman and Pearson introduced fixed 5% significance levels as an essential part of their testing apparatus. (See the previous posts on **Error and Inference** for more discussions on that. And of course Jim Berger’s “Could Fisher, Jeffreys, and Neyman have agreed on testing?“) Not a surprising choice when considering the unpleasant personality of Fisher, of course! (Another over-the-top attack: “Fisherians do not literally conduct experiments. The brewer did.” [p.27] What was Fisher doing in Rothamsted then? Playing with his calculator?!) The twinned fathers of significance testing seem to escape the wrath of Ziliak and McCloskey thanks to their use of a loss function… or maybe to their defining a precise alternative. While I completely agree that loss functions should be used to decide about models (or predictives, to keep Andrew happy!), the loss function imagined by Neyman and Pearson is simply too mechanistic to make any sense to a decision analyst. Or even to a statistician. We discussed earlier the València 9 paper of Guido Consonni, in connection with more realistic loss functions. The authors also seem to think power is an acceptable way to salvage significance tests, while I never understood the point of arguing in favour of power since, like other risk functions, power depends on the unknown parameter(s) and it is hence improbable that two procedures will be uniformly ordered for all values of the parameter(s), except in textbook situations. For instance, they think that classical sign tests are good guys!

“Significance unfortunately is a useful means towards personal ends in the advance of science – status and widely distributed publications, a big laboratory, a staff of research assistants, a reduction in teaching load, a better salary, the finer wines of Bordeaux. (…) In a narrow and cynical sense statistical significance is the way to achieve these.” S. Ziliak and D. McCloskey, p.32

In a possibly unnecessary fashion, let me repeat that I find it quite sad that a book addressing such an important issue lets aggressiveness, arrogance, and below-the-belt rhetoric ruin its purpose. It sounds too much like a crusade against an establishment to be convincing to neophytes or to be taken as a serious warning. (I wonder in fact what the intended readership of this book is, given that it requires some statistical numeracy, but not “too much”, to stay open-minded about statistical tests!) Bullying certainly does not help in making one’s case more clearly understood: even though letting mere significance tests at standard levels rule the analysis of a statistical model is a sign of intellectual laziness, or of innumeracy, accusing its perpetrators of intentional harm and cynicism does not feel adequate. Once again, I fully agree that users of statistical methods should not let SAS (or any other commercial software) write their research paper for them but should, instead, think about the indications provided by such outputs in terms of the theory and concepts behind their model(s). Interestingly, Ziliak and McCloskey mention for instance the use of simulation and pseudo-data to reproduce the performance of those tests under the assumed model and to calibrate the meaning of tools like p-values. A worthwhile and positive recommendation in an otherwise radically negative and counter-productive book.

“

Adam Smith, who is much more than an economist, noted in 1759 that hatred, resentment, and indignation against bad behavior serve, of course, a social purpose (…) “Yet there is still something disagreeable in the passions themselves.”.” S. Ziliak and D. McCloskey, p.55

**T**he first example Ziliak and McCloskey use to make their point falls quite far from the mark: in Chapter 1, discussing the impact of two diet pills A and B with means 20 and 5 and standard deviations 10 and 1/2, respectively, they conclude that B gives a smaller *p*-value for the test of whether or not the pill has an effect, because 20/10=2 and 5/(1/2)=10. There are two misleading issues there: first, the diets are compared in terms of mean effect, which is a substantive rather than a statistical comparison; second, running a *t*-test of nullity of the mean is not meaningful in this case. What matters is whether one diet is more efficient than the other. Assuming a normal distribution, we have here

P(X_A > X_B) = Φ( (20−5) / √(10² + (1/2)²) ) ≈ 0.93,
which is a pretty good argument in favour of diet pill A. (Of course, this is under the normal assumption and all that, which can be criticised and assessed.) The surprising thing is that Ziliak and McCloskey correctly criticise a similar error about the New Jersey vs. Pennsylvania minimum wage study (Chapter 9, pp.101-103).
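Under the same normal assumption (and with the standard deviations 10 and 1/2 implied by the quoted t-statistics), the probability that a patient on pill A loses more weight than one on pill B can be checked numerically:

```python
import math

# X_A ~ N(20, 10^2) and X_B ~ N(5, 0.5^2), independent, so
# X_A - X_B ~ N(15, 10^2 + 0.5^2)
z = (20.0 - 5.0) / math.sqrt(10.0 ** 2 + 0.5 ** 2)

# P(X_A > X_B) = Phi(z), written via the complementary error function
p_A_beats_B = 0.5 * math.erfc(-z / math.sqrt(2.0))   # ~ 0.93
```

A 93% chance of A beating B, despite A having the “worse” t-statistic: the oomph-versus-precision point, made with the comparison that actually matters.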

“

Around the time that significance testing was sinking deeply into the life and human sciences, Jean-Paul Sartre noted a personality type. “There are people who are attracted by the durability of a stone (…)” Sartre could have been talking about the psychological makeup of the most rigid of the significance testers.” S. Ziliak and D. McCloskey, p. 32.

**T**he above quote shows the authors are ready to call on an “authority” as un-scientific as Jean-Sol Partre, which would be enough for me to close the case! Esp. because I am attracted by stones… Except that I came upon the quote

“

Fisher-significance is a manly sounding answer, though false. And one can see in the dichotomy of hard and soft a gendered worry, too. The worry may induce some men to cling to Significance Only. (…) Around 1950, at the peak of gender anxiety among middle-class men in the United States, nothing could be worse than to call a man soft..” S. Ziliak and D. McCloskey, pp. 140-141.

which is so completely inappropriate and unrelated as to be laughable… It also shows how far from rational academic arguments Ziliak and McCloskey are ready to delve in order to make their point. (They also blame the massacre of whales and the torturing of lambs, p.39, on *t*-tests!) Just as laughable is the characterisation of statistics as the “bourgeois cousin” of probability theory (p.195), at a time when both fields did not truly exist as separate entities and were clearly mixed in most researchers’ minds (as shown by the titles of Keynes’ and Jeffreys’ books).

*(Note: the book was published in 2008, hence has already received a lot of reviews. However, it did not get much publicised in statistical circles, and even less in mine, so I only became aware of it this summer. Here are some reviews on The Endeavour and kwams, who also blogs about the review by Aris Spanos, who interestingly complains about the authors “using a variety of well-known rhetorical strategies and devices”, and the reply from the authors. David Aldous also wrote a convincing and balanced review of the book on amazon. Now, most ironically!, as I was completing this book review, I received the latest issue of Significance, which contained an article by Stephen Ziliak on Matrixx v. Siracusano, the Supreme Court ruling that a lack of statistical significance does not preclude establishing causation or association. He and Deirdre McCloskey were experts in this case; however, as an academic, I fail to see how a Supreme Court ruling brings any scientific support to the case… Actually, several articles in this issue are linked to the damage caused by the blind use of significance tests. In particular, the xkcd comic about p-values, which in my opinion has more impact than the cult of significance!)*

“*Bayes Theorem is a simple consequence of the axioms of probability, and is therefore accepted by all as valid. However, some who challenge the use of personal probability reject certain applications of Bayes Theorem.*” J. Kadane, p.44

**Principles of uncertainty** by Joseph (“Jay”) Kadane (Carnegie Mellon University, Pittsburgh) is a profound and mesmerising book on the foundations and principles of subjectivist or behaviouristic Bayesian analysis. Jay Kadane wrote

“

My desire to avoid the phrase “it can be shown that” has led me to display more of the mathematical underpinnings of the subject than necessary.” J. Kadane, p.xxv

**I**ndeed **Principles of uncertainty** is (almost) self-contained from a mathematical point of view. Probability is defined from a betting perspective (no stabilisation of frequencies à la von Mises!). Limits, series, uncountable sets, Riemann integrals (whose simultaneous use with and without an integration domain confused me for a while), Stieltjes integrals, Fatou’s lemma, Lebesgue’s dominated convergence theorem, matrix algebra, Euler’s formula, the Borel-Kolmogorov paradox, Taylor expansions (I dislike the use of HOT for “higher order terms” in math formulas!), Laplace’s approximation, the Weierstrass approximation: all are covered in reasonable detail within the 500 pages of the book. (I am not sure I agree with the discussion about the uniform distribution on the integers, in Section 3.2!) All standard distributions are covered and justified (incl. the Wishart distribution). Paradoxes like Simpson’s, Monty Hall‘s, the Gambler’s Ruin, Allais‘, and the Prisoner’s dilemma are processed in specific sections. As written above, the processing of the convergence of MCMC algorithms is quite nice and rather unique: the argument is based on a minorisation constraint (existence of a small set) and the use of the corresponding renewal process of Nummelin (1984), which, in my opinion, is a beautiful way of explaining most convergence properties of Markov chains. While the R code sprinkled along the book may appear superficial, I think it relates to the same goal of Jay Kadane to leave no step unjustified and hence to back graphs with the corresponding R code. The style is as personalistic as the message and very enjoyable, with little stories at the start of some chapters to make a point. As I read the book within a few days surrounding my trip to Zürich, I cannot be certain I did not miss typos, but I saw very few. (A change of line within the first displayed formula of page 87 is rather surprising and, I think, unintentional. Some pages, like p.215, p.230, or pp.324-326, also end up with several highly isolated formulas because of long displayed equations on the following page.)
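For readers who have not met the minorisation argument mentioned above, the small-set condition underlying the renewal approach can be stated in one display (notation mine, following the standard Markov-chain literature rather than the book’s):

```latex
P(x, A) \;\ge\; \varepsilon\, \nu(A)
\qquad \text{for all } x \in C \text{ and all measurable } A,
```

for some set C, constant ε > 0, and probability measure ν. Each time the chain enters C it regenerates from ν with probability ε, so the trajectory splits into independent tours, and renewal theory (as in Nummelin, 1984) then delivers laws of large numbers and central limit theorems for the chain.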

“

A hierarchical model divides the parameters into groups that permit the imposition of assumptions of conditional independence.” J. Kadane, p.337

**T**he hierarchical chapter of **Principles of uncertainty** is also well done, with connections to the James-Stein phenomenon, and an inclusion of the famous New Jersey turnpike lawsuit. In the model choice section (pp.343-344), Jay Kadane comes the closest to defining a Bayesian test, even though he does not call it that. He only returns to tests in the final chapter (see below). The MCMC chapter that comes right after, while highly enjoyable on the theoretical side, is missing an illustration of MCMC implementation and convergence (or lack thereof).
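Since the MCMC chapter lacks an implementation, here is a minimal sketch (entirely my own, not Kadane’s) of a Gibbs sampler for the simplest such conditional-independence structure: y_i ~ N(θ_i, σ²) with θ_i ~ N(μ, τ²), σ and τ known, a flat prior on μ, and made-up data.

```python
import random

random.seed(7)
SIGMA2, TAU2 = 1.0, 1.0          # known variances (assumed for the sketch)
y = [2.8, 0.8, 3.1, 1.9, 2.5]    # made-up group observations
n = len(y)

mu, mu_draws = 0.0, []
w = TAU2 / (TAU2 + SIGMA2)               # shrinkage weight
v = SIGMA2 * TAU2 / (SIGMA2 + TAU2)      # conditional variance of theta_i
for it in range(5000):
    # theta_i | mu, y_i  ~  N(w * y_i + (1 - w) * mu, v)
    thetas = [random.gauss(w * y[i] + (1 - w) * mu, v ** 0.5) for i in range(n)]
    # mu | thetas  ~  N(mean(thetas), TAU2 / n)  under the flat prior
    mu = random.gauss(sum(thetas) / n, (TAU2 / n) ** 0.5)
    if it >= 1000:                       # discard burn-in
        mu_draws.append(mu)

post_mean_mu = sum(mu_draws) / len(mu_draws)
```

Under the flat prior the posterior mean of μ is exactly the data average (2.22 for the numbers above), which gives an easy correctness check on the sampler.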

“

A claim of possession of the objective truth has been a familiar rhetorical move of elites, social, religious, scientific, or economic. Such a claim is useful to intimidate those who might doubt, challenge, or debate the “objective” conclusions reached. History is replete with the unfortunate consequences, nay disasters, that have ensued. To assert the possession of an objective method of analyzing data is to make a claim of extraordinary power in our society. Of course it is annoyingly arrogant, but, much worse, it has no basis in the theory it purports to implement.” J. Kadane, p.446

**T**he above quote is the concluding sentence of the [very short] penultimate chapter… It reflects the opinion of the author in such a belligerent way that I fear this chapter does not belong in a general-audience book: the “Exploration of Old Ideas” chapter in **Principles of uncertainty** is too antagonistic to be understood.

**I** have just finished reading this book by Bill Bolstad (University of Waikato, New Zealand), which a previous ‘Og post pointed out when it appeared, shortly after our **Introducing Monte Carlo Methods with R**. My family commented that the cover was nicer than those of my books, which is true. Before I launch into a review, let me warn the ‘Og reader that, as an author of three books on computational Bayesian statistics, I cannot be very objective on the topic: I do favour the way we approached Bayesian computational methods and, after reading Bolstad’s book, I still do.

*Understanding computational Bayesian statistics* covers the basics of Monte Carlo and (fixed-dimension) Markov chain Monte Carlo methods, with a fair chunk dedicated to prerequisites in Bayesian statistics and Markov chain theory. Even though I have only glanced at the table of contents of Bolstad’s *Introduction to Bayesian Statistics* [using almost the same nice whirl picture, albeit in bronze rather than cobalt], it seems to me that the current book is the continuation of the earlier one, going beyond the Binomial, Poisson, and normal cases to cover generalised linear models, via MCMC methods. (In this respect, it corresponds to Chapter 4 of *Bayesian Core*.) The book is associated with *Minitab* macros and an **R** package (written by James Curran), *Bolstad2*, in continuation of *Bolstad*, written for *Introduction to Bayesian Statistics*. Overall, the level of the book is such that it should be accessible to undergraduate students, MCMC methods being reduced to Gibbs, random walk and independent Metropolis-Hastings algorithms, and convergence assessments being done via autocorrelation graphs, the Gelman and Rubin (1992) intra-/inter-variance criterion, and a forward coupling device. The illustrative chapters cover logistic regression (Chap. 8), Poisson regression (Chap. 9), and normal hierarchical models (Chap. 10). Again, the overall feeling is that the book should be understandable to undergraduate students, even though it may make MCMC seem easier than it is by sticking to fairly regular models. In a sense, it is more a book of the [roaring MCMC] 90s in that it does not incorporate advances from 2000 onwards (as seen from the reference list), like adaptive MCMC and the resurgence of importance sampling via particle systems and sequential Monte Carlo.
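To give a flavour of the convergence assessments listed above, the Gelman and Rubin (1992) intra-/inter-variance criterion takes only a few lines to compute; the target, proposal scale, and run lengths below are arbitrary choices of mine, just to exercise the diagnostic:

```python
import math
import random

random.seed(2)

def rw_metropolis(start, n_iter, scale=2.0):
    """Random-walk Metropolis chain targeting a standard normal."""
    x, chain = start, []
    for _ in range(n_iter):
        prop = x + random.uniform(-scale, scale)
        # accept with the ratio of N(0,1) densities, on the log scale
        if math.log(random.random()) < 0.5 * (x * x - prop * prop):
            x = prop
        chain.append(x)
    return chain

def gelman_rubin(chains):
    """Potential scale reduction factor for m chains of common length n."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    B = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)   # between-chain
    W = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m               # within-chain
    var_hat = (n - 1) / n * W + B / n
    return math.sqrt(var_hat / W)

# two over-dispersed starting points, burn-in discarded
chains = [rw_metropolis(s, 6000)[1000:] for s in (-5.0, 5.0)]
r_hat = gelman_rubin(chains)   # close to 1 once the chains have mixed
```

As the book (rightly) warns, an R̂ near 1 is a necessary rather than sufficient sign of convergence, which is why pairing it with autocorrelation graphs makes sense.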

**J**ulien Cornebise has [once again!] pointed out a recent Guardian article. It is about commercial publishers of academic journals, mainly Elsevier, Springer, and Wiley, with a clear stand from its title: “*Academic publishers make Murdoch look like a socialist*“! The valuable argument therein is that academic publishers make hefty profits (a 40% margin for Elsevier!) without contributing to the central value of the journals, namely the research itself that is mostly funded by public or semi-public bodies. The publishers of course distribute the journals to the subscribers, but the reported profits clearly show that, on average, they spend much less doing so than they charge… Here are some of the institutional rates *(can you spot Elsevier journals? journals published by societies? free open access journals?!)*:

- Communications in Statistics (A, B, C, print and online): 9,526 euros
- Journal of Econometrics: $3,560
- Statistics and Probability Letters: $2,941
- PNAS: $2,910
- JMA: $2,130
- Statistics and Computing: 1,037 euros
- PRTF: $999
- Econometrica: $650
- JASA (print and online): $615
- JRSS B (print and online): $565
- International Statistical Review: $411
- Annales de l’IHP: $400
- Annals of Statistics: $390
- Biometrika: $282
- Significance: $279
- JCGS: $233
- Technometrics: $180
- Chance: $96
- Bayesian Analysis: $0.00
- Journal of Statistical Software: $0.00

*(apart from greed, there is no justification for the top four [Taylor and Francis/Elsevier] journals to ask for such prices! The Journal of Econometrics also charges $50 per submission! PNAS is another story given the volume of the [not-for-profit] publication: 22,750 pages in 2010, meaning it is high time to move to being fully electronic. The rate for Statistics and Computing is another disappointment, when compared with JCGS.)*

**T**he article reports the pressure to publish in such journals (vs. non-commercial journals) because of the tyranny of impact factors. However, the reputation of those top-tier journals is not due to the action of the publishers, but rather to the excellence of their editorial boards; there is therefore no foreseeable long-term impact in moving from one publisher to another for our favourite journals. Moreover, I think that publishing in top journals matters more to the authors themselves than to the readers when the results are already circulating through a medium like arXiv. Of course, having papers evaluated by peers in a strict academic mode is of prime importance to distinguish major advances from pseudo-science; however, the electronic availability of papers and of discussion forums and blogs implies that suspicious results should anyway be detected by the community. (I am not advocating the end of academic journals, far from it!, but an evolution towards a wider range of evaluations via Internet discussions, as for the DREAM paper recently.) The article also mentions that some funding organisations impose Open Access publishing. However, this is not the ideal solution as long as journals also make a profit on that line, by charging for open access (see, e.g., PNAS or JRSS)! Hence using another chunk of public (research) money towards their profits… My opinion is that everyone should make one’s papers available online or, better, via arXiv. And petition one’s societies for a tighter control of the subscription rates, or even a move to electronic editions when the rates get out of control.

**PS-H**ere is a link to an Australian blog, *the Conversation*, where some publishers (Wiley and Elsevier) were interviewed on these points. I will not comment, but the interview is quite informative on the publishers’ line of defense!

**H**ere is my enthusiastic (and obviously biased) reaction to **the theory that would not die**. It tells the story and the stories of Bayesian statistics and of Bayesians in a most genial and entertaining manner. There may be some who will object to such a personification of science, which should be (much) more than the sum of the characters who contributed to it. However, I will defend the perspective that (Bayesian) statistical science is as much philosophy as it is mathematics and computer science, and thus that the components that led to its current state were contributed by individuals, for whom the path to those components mattered. While the book inevitably starts with the (patchy) story of Thomas Bayes’s life, incl. his time in Edinburgh, and a nice non-mathematical description of his ball experiment, the next chapter is about “the man who did everything”, …, yes indeed, Pierre-Simon (de) Laplace himself! (An additional nice touch is the use of lower case everywhere, instead of an inflation of upper case letters!) How Laplace attacked the issue of astronomical errors is brilliantly depicted, rooting the man within statistics and explaining why he would soon move to the “probability of causes”, and to rediscovering and generalising Bayes’ theorem. That his (rather unpleasant!) thirst for honours and official positions would later bring disrepute on his scientific worth is difficult to fathom, esp. when coming from knowledgeable statisticians like Florence Nightingale David.

**T**he next chapter is about the dark ages of [not yet] Bayesian statistics and I particularly liked the links with the French army, discovering there that the great Henri Poincaré testified at Dreyfus’ trial using a Bayesian argument, that Bertillon had completely missed the probabilistic point, and that the military judges were then all aware of Bayes’ theorem, thanks to Bertrand’s probability book being used at École Polytechnique! (The last point actually was less of a surprise, given that I had collected some documents about the involvement of late 19th/early 20th century artillery officers in the development of Bayesian techniques, Edmond Lhostes and Maurice Dumas, in connection with Lyle Broemeling’s Biometrika study.) The description of the fights between Fisher and Bayesians and non-Bayesians alike is as always both entertaining and sad. Sad also is the fact that Jeffreys’ masterpiece got so little recognition at the time. (While I knew about Fisher’s unreasonable stand on smoking, going as far as defending the assumption that “lung cancer might cause smoking”(!), the Bayesian analysis of Jerome Cornfield was unknown to me. And quite fascinating.) The figure of Fisher actually permeates the whole book, as a negative bullying figure preventing further developments of early Bayesian statistics, but also as an ambivalent anti-Bayesian who eventually tried to create his own brand of Bayesian statistics in the format of fiducial statistics…

“

…and then there was the ghastly de Gaulle.” D. Lindley

**T**he following part of **the theory that would not die** is about Bayes’ contribution to the war (WWII), at least from the Allied side. Again, I knew most of the facts about Alan Turing and Bletchley Park; however, the story is well-told and, as on previous occasions, I cannot but be moved by the waste of such a superb intellect, thanks to the stupidity of governments. The role of Albert Madansky in the assessment of the [lack of] safety of nuclear weapons is also well-described, stressing the inevitability of a Bayesian assessment of a one-time event that had [thankfully] not yet happened. The above quote from Dennis Lindley is the conclusion of his argument on why Bayesian statistics were not called Laplacean; I would think instead that the French post-war attraction for abstract statistics in the wake of Bourbaki did more against this recognition than de Gaulle’s isolationism. The involvement of John Tukey in military research was also a novelty for me, but not so much as his use of Bayesian [small area] methods for NBC election-night predictions. (They could not hire José nor Andrew at the time.) The conclusion of Chapter 14 on why Tukey felt the need to distance himself from Bayesianism is quite compelling. Maybe paradoxically, I ended up appreciating Chapter 15 even more for the part about the search for a missing H-bomb near Palomares, Spain, as it exposes the pluses a Bayesian analysis would have brought.

“

There are many classes of problems where Bayesian analyses are reasonable, mainly classes with which I have little acquaintance.” J. Tukey

**W**hen coming to recent times and to contemporaries, Sharon McGrayne gives a very detailed coverage of the coming-of-age of Bayesians like Jimmy Savage and Dennis Lindley, as well as of the impact of Stein’s paradox (a personal epiphany!), along with the important influence of Howard Raiffa and Robert Schlaifer, both on business schools and on the modelling of prior beliefs [via conjugate priors]. I did not know anything about their scientific careers, but *Applied Statistical Decision Theory* is a beautiful book that prefigured both DeGroot‘s and Berger‘s. (As an aside, I was amused by Raiffa using Bayesian techniques for horse betting based on race bettors, as I had vaguely played with the idea during my spare time in the French Navy!) Similarly, while I had read detailed scientific accounts of Frederick Mosteller’s and David Wallace’s superb Federalist Papers study, they were only names to me. Chapter 12 mostly remedied this lack of mine.

“

We are just starting” P. Diaconis

**T**he final part, entitled Eureka!, is about the computer revolution we witnessed in the 1980s, culminating with the (re)discovery of MCMC methods we covered in our own “history”. Because it covers stories that are closer and closer to the present, it inevitably crumbles into shorter and shorter stories. However, **the theory that would not die** conveys the essential message that Bayes’ rule had become operational, with its own computer language and objects like graphical models and Bayesian networks that could tackle huge amounts of data and real-time constraints, and be used by companies like Microsoft and Google. The final pages mention neurological experiments on how the brain operates in a Bayesian-like way (a direction much followed by neuroscience, as illustrated by Peggy Series’ talk at Bayes-250).

**I**n conclusion, I highly enjoyed reading through **the theory that would not die**. And I am sure most of my Bayesian colleagues will as well. Being Bayesians, they will compare the contents with their subjective priors about Bayesian history, but will in the end update those profitably. (The most obvious missing part is, in my opinion, the absence of E.T. Jaynes and the MaxEnt community, which would deserve a chapter of its own.) Maybe ISBA could consider supporting a paperback or electronic copy to distribute to all its members! As an insider, I have little idea how the book would be perceived by the layman: it does not contain any formula apart from [the discrete] Bayes’ rule at some point, so everyone can read it through: The current success of