Error and Inference (part 1)

(by Christian Robert)

“The philosophy of science offers valuable tools for understanding and advancing solutions to the problems of evidence and inference in practice”—D. Mayo & A. Spanos, p.xiv, Error and Inference, 2010

Deborah Mayo kindly sent me her book, whose subtitle is “Recent exchanges on experimental reasoning, reliability, and the objectivity and rationality of Science” and whose contributors are P. Achinstein, A. Chalmers, D. Cox, C. Glymour, L. Laudan, A. Musgrave, and J. Worrall, plus both editors, Deborah Mayo and Aris Spanos. Deborah Mayo rightly inferred that this debate was bound to appeal to my worries about the nature of testing and model choice and to my layman interest in the philosophy of Science. Speaking of which [layman], the book reads really well, even though I am clearly missing references to Mayo’s and others’ earlier works, and even though it cannot be read under my cherry tree (esp. now that the weather has moved from été to étaumne… as I heard this morning on the national public radio). Deborah Mayo is clearly the driving force in putting this volume together, from setting up the ERROR 06 conference to commenting on the chapters of all contributors (but her own and Aris Spanos’). Her strongly frequentist perspective on the issues of testing and model choice is thus reflected in the overall tone of the volume, even though the contributors bring some contradiction to the debate. A complete book review was published in the Notre Dame Philosophical Reviews.

“However, scientists wish to resist relativistic, fuzzy, or post-modern turns (…) Notably, the Popperian requirement that our theories are testable and falsifiable is widely regarded to contain important insights about responsible science and objectivity.”—D. Mayo & A. Spanos, p.2, Error and Inference, 2010

Given the philosophical, complex, and interesting nature of the work, I will split my comments into several linear posts (hence the part 1), as I did for Evidence and Evolution. The following comments are thus about a linear (even pedestrian) and incomplete read through the first three chapters. These comments do not pretend to any depth; they simply reflect the handwritten notes, thoughts, and counterarguments I scribbled as I was reading through… As illustrated by the above quote (whose first part I obviously endorse), the overall perspective in the book is Popperian, despite Popper’s criticism of statistical inference as a whole. Another fundamental concept throughout the book is the “Error-Statistical philosophy” of which Deborah Mayo is the proponent. One of the tenets of this philosophy is a reliance on statistical significance tests in the Fisher-Neyman-Pearson (or frequentist) tradition, along with a severity principle (“We want hypotheses that will allow for stringent testing so that if they pass we have evidence of a genuine experimental effect”, p.19) stated as (p.22):

A hypothesis H passes a severe test T with data x if

  1. x agrees with H, and
  2. with very high probability, test T would have produced a result that accords less well with H than does x, if H were false or incorrect.

(The p-value is advanced as a direct accomplishment of this goal, but I fail to see why it does or why a Bayes factor would not. Indeed, the criterion depends on the definition of probability when H is false or incorrect. This relates to Mayo’s criticism of the Bayesian approach, as explained below.)
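To fix ideas, here is a minimal numerical sketch (in Python, with made-up values for n, σ, μ0 and the observed mean, so purely illustrative) of the usual severity computation for a one-sided test on a normal mean; it also shows the sense in which the p-value is read as a severity assessment:

```python
# Minimal sketch of a severity computation for a one-sided test on a normal
# mean, H0: mu <= mu0 vs H1: mu > mu0, with known sigma.
# All numbers (n, sigma, mu0, xbar) are invented for illustration only.
from scipy.stats import norm

n, sigma, mu0 = 100, 2.0, 0.0       # hypothetical sample size, sd, null value
xbar = 0.4                          # hypothetical observed sample mean
se = sigma / n ** 0.5               # standard error of the mean

# p-value of the test: probability of a result agreeing at least as well
# with H1 as the observed xbar, computed under H0 (mu = mu0).
p_value = 1 - norm.cdf((xbar - mu0) / se)

# Severity of the inference "mu > mu1": probability that the test would have
# produced a result according *less* well with that claim than xbar does,
# were the claim false (evaluated at the boundary mu = mu1).
def severity(mu1):
    return norm.cdf((xbar - mu1) / se)

print(f"p-value for H0: mu <= {mu0}: {p_value:.3f}")
for mu1 in (0.0, 0.2, 0.4):
    print(f"SEV(mu > {mu1}) = {severity(mu1):.3f}")
# Note that SEV(mu > mu0) = 1 - p_value, which is the sense in which the
# p-value is presented as a severity assessment.
```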

“Formal error-statistical tests provide tools to ensure that errors will be correctly detected with high probabilities”—D. Mayo, p.33, Error and Inference, 2010

In Chapter 1, Deborah Mayo has a direct go at the Bayesian approach. The main criticism of the Bayesian approach to testing (defined through the posterior probability of the hypothesis, rather than through the predictive) is about the catchall hypothesis, a somewhat desultory term replacing the alternative hypothesis. According to Deborah Mayo, this alternative should “include all possible rivals, including those not even thought of” (p.37). This sounds like a weak argument (although it was also used by Alan Templeton in his rebuttal of ABC), given that (a) it should also apply in the frequentist sense, in order to define the probability distribution “when H is false or incorrect” (see, e.g., “probability of so good an agreement (between H and x) calculated under the assumption that H is false”, p.40); (b) a well-defined alternative should be available, as testing a hypothesis is very rarely the end of the story: if H is rejected, there should/will be a contingency plan; and (c) rejecting or accepting a hypothesis H in terms of the sole null hypothesis H does not make sense from an operational or a game-theoretic perspective. The further argument that the posterior probability of H is a direct function of the prior probability of H does not stand against the Bayes factor. (The same applies to the criticism that the Bayesian approach does not accommodate newcomers, i.e., new alternatives.) Stating that “one cannot vouch for the reliability of [this Bayesian] procedure—that it would rarely affirm theory T were T false” (p.37) completely ignores the wealth of results about the consistency of the Bayes factor (since the “asymptotic long run”, p.20, matters in the Error-Statistical philosophy). The final argument, that Bayesians rank “theories that fit the data equally well (i.e., have identical likelihoods)” (p.38), does not account for (or dismisses, p.50, by referring to Jeffreys and Berger instead of Jefferys and Berger) the fact that Bayes factors are automated Occam’s razors: averaging the likelihoods over spaces of different dimensions naturally advocates for simpler models. Even though I plan to discuss this point in a second post, Deborah Mayo also seems to imply that Bayesians are using the data twice (this is how I interpret the insistence on the same p.50), which is a sin [genuine] Bayesian analysis can hardly be found guilty of!
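To make the Occam’s-razor effect of the averaging concrete, here is a toy sketch (in Python, using a conjugate normal alternative and arbitrary values of n, σ, τ and the observed mean, chosen purely for convenience) that compares the marginal likelihood of a point null with that of an increasingly diffuse alternative, and hints at the consistency of the resulting Bayes factor:

```python
# Toy illustration of the Occam's-razor effect of the Bayes factor, for the
# problem H0: mu = 0 versus H1: mu ~ N(0, tau^2), with xbar ~ N(mu, sigma^2/n).
# Both marginal likelihoods are available in closed form; all numerical values
# (n, sigma, tau, xbar) are invented for illustration.
from scipy.stats import norm

def bayes_factor_01(xbar, n, sigma=1.0, tau=1.0):
    """Bayes factor of H0: mu = 0 against H1: mu ~ N(0, tau^2)."""
    se2 = sigma ** 2 / n
    m0 = norm.pdf(xbar, loc=0.0, scale=se2 ** 0.5)                # likelihood under H0
    m1 = norm.pdf(xbar, loc=0.0, scale=(se2 + tau ** 2) ** 0.5)   # averaged likelihood under H1
    return m0 / m1

# The alternative can always match the data at its maximum, yet averaging over
# the larger parameter space penalises it: the more diffuse the alternative
# (larger tau), the more the Bayes factor backs the simpler H0.
xbar, n = 0.1, 50
for tau in (0.5, 1.0, 5.0, 50.0):
    print(f"tau = {tau:5.1f}:  B01 = {bayes_factor_01(xbar, n, tau=tau):.2f}")

# Consistency: when H0 holds (true mu = 0, so xbar concentrates near 0),
# the Bayes factor in favour of H0 grows with n.
for n in (10, 100, 1000, 10000):
    print(f"n = {n:5d}:  B01 at xbar=0 is {bayes_factor_01(0.0, n):.1f}")
```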


1 Response to “Error and Inference (part 1)”


  1. David W. Hogg, September 1, 2011 at 9:11 am

    Isn’t it true that for every test T there is a hypothesis H2 that “the data x are just produced by a very flexible noise model”? In this case, the frequentist must *always* accept H2 as a plausible explanation of the data? That is, without some prior input, or model-complexity penalty, the frequentist will always be forced to rank such “null” theories ahead of *any* predictive theory? That is, it would be *completely wrong* to rank equally “theories that fit the data equally well”. No?


