### The foundations of Statistics: a simulation-based approach

This post is in two parts. First, Christian Robert reviews a new book by two linguists, Shravan Vasishth and Michael Broe; then Vasishth responds.

Review by Christian Robert:

“We have seen that a perfect correlation is perfectly linear, so an imperfect correlation will be ‘imperfectly linear’.” page 128

This book has been written by two linguists, Shravan Vasishth and Michael Broe, in order to teach statistics “in areas that are traditionally not mathematically demanding” at a deeper level than traditional textbooks “without using too much mathematics”, towards building “the confidence necessary for carrying more sophisticated analyses” through R simulation. This is a praiseworthy goal, bound to produce a great book. However, and most sadly, I find the book does not live up to expectations. As in Radford Neal’s recent coverage of introductory probability books, there are statements there that show a deep misunderstanding of the topic…

“The least that you need to know about is LaTeX, Emacs, and Emacs Speaks Statistics. Other tools that will further enhance your working experience with LaTeX are AucTeX, RefTeX, preview-latex, and Python.” page 1

The above recommendation is cool (along with the point that these tools are “already pre-installed in Linux”, while, for Windows or Macintosh, users “will need to read the manual”!) but eventually rather daunting when considering the intended audience. While I am using LaTeX and only LaTeX in my everyday work, the recommendation to learn LaTeX prior to “understand the principles behind inferential statistics” sounds inappropriate. The book clearly does not require an understanding of LaTeX to be read, understood, and practiced. (Same thing for Python!)

The authors advertise a blog about the book that contains very little information. (The last entry is from December 2010: “The book is out”.) This was a neat idea, had it been implemented.

“Let us convince ourselves of the observation that the sum of the deviations from the mean always equals zero.” page 5

What I dislike the most about this book is the waste of space dedicated to expository developments that aim at bypassing mathematical formulae, only to provide this very formula at the end of the argument. Then there is the permanent confusion between the distribution and the sample, between the true parameters and their estimates. (Plus the many foundational mistakes, such as those reported below.) If a reader has had some earlier exposure to statistics, the style and pace are likely to unsettle or infuriate her. If not, she will be left with gaping holes in her statistical bases: no proper definition of unbiasedness (hence a murky justification of the degrees of freedom whenever they appear), of the Central Limit theorem, or of the t distribution, and no mention of the Law of Large Numbers (although a connection is made in the summary, page 63). This does not seem sufficient to engage in reading Gelman and Hill (2007), as suggested at the end of the book… Having the normal density defined as the “somewhat intimidating-looking function” (page 39)

$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/(2\sigma^2)}$

certainly does not help! (Nor does the call to integrate rather than pnorm to compute normal tail probabilities (pages 69-70)).
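To illustrate the remark about `integrate` versus `pnorm`, here is a minimal sketch in base R. The cutoff `z = 1.96` is an arbitrary illustrative value, not one taken from the book. Both routes should agree up to the tolerance of the numerical integration.

```r
# Upper-tail probability P(Z > z) for a standard normal, computed two ways.
z <- 1.96  # illustrative cutoff (an assumption, not from the book)

# Route 1: numerically integrate the normal density from z to infinity
tail_integrate <- integrate(dnorm, lower = z, upper = Inf)$value

# Route 2: call the normal distribution function directly
tail_pnorm <- pnorm(z, lower.tail = FALSE)

print(c(integrate = tail_integrate, pnorm = tail_pnorm))
```

The `pnorm` call is shorter and avoids introducing numerical integration in a place where a dedicated function already exists.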

“The key idea for inferential statistics is as follows: If we know what a ‘random’ distribution looks like, we can tell random variation from non-random variation.” page 9

The above quote gives a rather obscure and confusing entry to statistical inference. Especially when it appears at the beginning of a chapter (Chapter 2) centred on the binomial distribution. As the authors seem reluctant to introduce the binomial probability function from the start, they resort to an intuitive discourse based on (rather repetitive) graphs (with an additional potential confusion induced by the choice of a binomial probability of p=0.5, since $p^k(1-p)^{n-k}$ is then constant in k…). In Section 2.3, the distinction between binomial and hypergeometric sampling is not mentioned, i.e. the binomial approximation is used without any warning. The fact that the mean of the binomial distribution B(n,p) is np is not established and the variance being np(1-p) is not stated at all. (However, the book spends four pages [36-39] showing through an R experiment that “the sum of squared deviations from the mean are [sic!] smaller than from any other number”.)
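The two points about the binomial distribution can be checked numerically. The following is a rough sketch, with `n = 10` and the alternative probability `0.3` chosen arbitrarily for illustration. It shows that the factor $p^k(1-p)^{n-k}$ does not depend on k when p = 0.5, and that the mean and variance computed from the probability function match np and np(1-p).

```r
n <- 10   # illustrative number of trials (an assumption)
k <- 0:n  # all possible numbers of successes

# With p = 0.5 the factor p^k * (1 - p)^(n - k) equals 0.5^n for every k,
# so only the binomial coefficient choose(n, k) shapes the distribution.
p <- 0.5
weight_half <- p^k * (1 - p)^(n - k)

# With another p (0.3 is an arbitrary choice) the factor varies with k.
p2 <- 0.3
weight_other <- p2^k * (1 - p2)^(n - k)

# Mean and variance computed directly from the probability function
probs      <- dbinom(k, size = n, prob = p2)
binom_mean <- sum(k * probs)                   # compare with n * p2
binom_var  <- sum((k - binom_mean)^2 * probs)  # compare with n * p2 * (1 - p2)

print(c(mean = binom_mean, np = n * p2,
        var = binom_var, npq = n * p2 * (1 - p2)))
```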

“The mean of a sample is more likely to be close to the population mean than not.” page 49

The above is the conclusive summary about the Central Limit theorem, after a histogram with 8 bins showing that “the distribution of the means is normal!”… It is then followed by a section on “s is an Unbiased Estimator of σ”, nothing less!!! This completely false result (s is the standard estimator of the standard deviation σ, but it is not unbiased) is again based on the “fact” that it is “more likely than not to get close to the right value”. The introduction of the t distribution is motivated by the “fact that the sampling distribution of the sample mean is no longer be modeled by the normal distribution” (page 55). With such flaws in the presentation, it is difficult to recommend the book at any level. Especially the most introductory level.
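Since the book works by simulation, the bias of s can be seen in the same way. This is a sketch under assumed settings (normal samples, `sigma = 1`, `n = 5`, an arbitrary seed), not code from the book. The average of s over many samples falls below σ, while the average of s² stays close to σ². For normal data the exact expectation is $E[s] = c_4(n)\,\sigma$ with $c_4(n) = \sqrt{2/(n-1)}\;\Gamma(n/2)/\Gamma((n-1)/2)$.

```r
set.seed(42)     # arbitrary seed, for reproducibility only
sigma <- 1       # true standard deviation (illustrative value)
n     <- 5       # small sample size, where the bias is most visible
n_rep <- 100000  # number of simulated samples

# Sample standard deviation of each simulated normal sample
s <- replicate(n_rep, sd(rnorm(n, mean = 0, sd = sigma)))

mean_s  <- mean(s)    # Monte Carlo estimate of E[s]; falls below sigma
mean_s2 <- mean(s^2)  # Monte Carlo estimate of E[s^2]; close to sigma^2

# Exact expectation of s under normal sampling: E[s] = c4 * sigma
c4 <- sqrt(2 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2)

print(c(mean_s = mean_s, exact = c4 * sigma, mean_s2 = mean_s2))
```

The gap shrinks as n grows, which may be why a coarse simulation with larger samples can make s look unbiased.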

“We know that the value is within 6 of 20, 95% of the time.” page 27

I am also dissatisfied with the way confidence and testing are handled (and not only because of my Bayesian inclinations!). The above quote, which replicates the usual fallacy about the interpretation of confidence intervals, is found a few lines away from a warning about the inversion of confidence statements! A warning only repeated later: “it’s a statement about the probability that the hypothetical confidence intervals (that would be computed from the hypothetical repeated samples) will contain the population mean” (page 59). The book spends a large amount of pages on hypothesis testing, presumably because of the own interests of the authors, however it is unclear a neophyte could gain enough expertise from those pages to conduct his own tests. Worse, statements like (page 75)
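The repeated-sampling reading of a confidence statement (the one quoted from page 59) can be made concrete with a short simulation. This is a sketch with assumed values (`mu = 20`, `sigma = 3`, `n = 20`, all hypothetical). The 95% refers to the proportion of intervals, over repeated samples, that contain the fixed population mean.

```r
set.seed(1)     # arbitrary seed, for reproducibility only
mu    <- 20     # fixed population mean (hypothetical value)
sigma <- 3      # population standard deviation (hypothetical value)
n     <- 20     # sample size
n_rep <- 10000  # number of repeated samples

# For each repeated sample, build the usual 95% t interval for the mean
# and record whether it contains the population mean mu.
covers <- replicate(n_rep, {
  x <- rnorm(n, mean = mu, sd = sigma)
  half_width <- qt(0.975, df = n - 1) * sd(x) / sqrt(n)
  abs(mean(x) - mu) <= half_width
})

coverage <- mean(covers)  # proportion of intervals that contain mu
print(coverage)
```

Each individual interval either contains `mu` or it does not; the probability statement is about the procedure, not about any single computed interval.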

$H_0: \bar x = \mu_0$

show a deep misunderstanding of the nature of both testing and random variables. How can one test a property about the observed sample mean?! A similar confusion appears in the ANOVA chapter (e.g. (5.51) on page 112).

“The research goal is to find out if the treatment is effective or not; if it is not, the difference between the means should be ‘essentially’ equivalent.” page 92

The following chapters are about analysis of variance (5), linear models (6), and linear mixed models (7), all of which face fatal deficiencies similar to the ones noted above. The book would have greatly benefited from a statistician’s review before being published. (I cannot judge whether or not the book belongs to a particular series.) As is, it cannot deliver the expected outcome to its readers and train them towards more sophisticated statistical analyses. As a non-expert in linguistics, I cannot judge the requirements of the field or the complexity of the statistical models it involves. However, even the most standard models and procedures should be treated with the appropriate statistical rigour. While the goals of the book were quite commendable, it seems to me it cannot endow its intended readers with the proper perspective on statistics…

Response by Shravan Vasishth:

I speak only for myself, not my co-author.

1. About the outright mistakes in the book

These I will definitely correct on the blog (acknowledging this
reviewer and the review) and in the next edition (which I’ll try to
get out as soon as possible).  I’m grateful for that feedback. There
are two kinds of errors; one involves statements that the reviewer
would like to see differently presented, and the other are outright
errors (such as H_0: \bar{x}=\mu; this one is truly absurd, I agree).

I am certainly not a statistician and so it’s not surprising that I
end up annoying real statisticians with erroneous statements (this is
a general problem whenever someone from field X writes about material
from field Y, e.g., psychologists working on language).

It would be perfect if a statistician would write a book like this,
instead of a bunch of linguists who have only an imperfect
understanding of statistical theory. So why hasn’t it happened? I
don’t know. My theory is that statisticians can’t see the world from a
non-statistician’s eyes.

Perhaps the reviewer would like to write a better book; I would be the
first to adopt it in my classes. Perhaps such a book already exists; I
would appreciate a pointer on a book that the author considers a more
competent version than ours that has the same goals.

I should add that I tried to recruit several statisticians (at
different points) to join us as co-authors for this book, in order to
save us from ourselves, so to speak. Nobody was willing to join in;
the most frequent response was that all the information is in the
standard books already (which is true–it’s just not accessible to
everyone).

Would the reviewer be willing to join us as co-author in a second
edition? That would be a very useful outcome of this review.

2. About LaTeX recommendations being beside the point:

I do take the reviewer’s point about LaTeX etc. not being necessary
for the book. But he did leave out in his review this line from the
book when criticizing the recommendations regarding LaTeX etc.

“In order to use the code that comes with this book, you only need to install R…. None of these [LaTeX, emacs, etc.] are required but are highly
recommended for typesetting and other sub-tasks necessary for data
analysis.” (p. 1).

The reviewer considers the task of using/installing LaTeX/emacs etc.
daunting/distracting, but these are standard tools for linguists at
universities I have attended (Graduate Program in Language and
Culture, Osaka University; Linguistics, Ohio State, where I did my
PhD; Computational Linguistics, Saarbruecken; Linguistics, Potsdam).
Many among my intended audience need these tools anyway, so I don’t
consider it overkill to suggest them to those who don’t have them yet.
Many linguists (especially linguists doing theoretical work involving,
e.g., the lambda calculus or formal logic) only need to know that
Sweave exists–they know the rest.

3. About the blog not being updated:

This was the first indication to me that the author has let his rage
get the better of him (the rage is no doubt there for good reason).
When I am this enraged with an article or book, and have to review it,
I generally write my venomous version of the review first, and then
let it sit for a while, and then rewrite it, just focusing on the real
weaknesses. I think the reviewer may benefit from this strategy in
future.

Turning now to the critique about the non-active blog: the book is 6
months old. Finding time to add useful entries to the blog is hard in
my current situation (8 hours of teaching per week; lots of PhD
students; a 5-year-old child; health problems; and so on). I expect to
add material to the blog over this summer, when my sabbatical begins.

4. The criticism about the methods presented (“The book spends a large
amount of pages on hypothesis testing, presumably because of the own
interests of the authors, however it is unclear a neophyte could gain
enough expertise from those pages to conduct his own tests.”)

Hypothesis testing is unfortunately the standard in Linguistics; our
focus has nothing to do with our personal interests. I am very keen to
use Bayesian methods in my work, and I know that a new generation of
linguists is already working hard in that direction. I know the
material in Gelman and Hill 2007 enough to know how to fit models
using WinBUGS, and am working through Kruschke’s book. However, I
don’t yet see how I can convince a journal reviewer that they should
abandon hypothesis testing.

We generally fit models using linear mixed models, where we have some
dependent measure, like reading time or event-related potentials, and
we usually have a 2×2 or 2×3 within-subjects counterbalanced design, with
participants and items as crossed random factors (see the book by
Baayen, 2008, for more examples). If I use uninformative priors in
WinBUGS to fit such a model, I will get comparable results to:

## condition with some appropriate contrast coding:

lmer(dv ~ condition + (1|subject) + (1|item), data)

What’s the payoff of switching to Bayesian models for the reviewer (I
understand this myself–I follow Andrew Gelman’s blog ;)? If I could
use informative priors, I could make a strong case for abandoning
hypothesis testing in a paper, but I haven’t seen any clear strategy
for doing that kind of analysis in books yet.

So, what did I learn from this review? I learnt that I should spend
more time studying statistics. Even without the review, I knew I need
to understand this material better, and I am working on it. One
central problem for an outsider trying to get insider knowledge of
statistics is that one cannot engage a professional statistician long
enough to get enough detail on obscure points (of which there are
many). Nobody has the time. Andrew Gelman has patiently answered a lot of my questions, by email and in person, but there’s a limit to how
much one can impose. The ideal way seems to be to become a full-time
student again in a statistics department. Maybe that’s the way I will
go.

In closing, I should add that although the reviewer doesn’t recommend
this book at all, the reality is that we have received a lot of
positive reviews from users. It has been in use in several
universities (mostly in the US) as a freely available draft. I only
published it after a lot of people asked me to.

We have excellent textbooks out there, particularly Baayen (2008)
(focused on language research); Gelman and Hill (2007) (focused on
linear mixed models and Bayesian data analysis). These are in wide use
in linguistics, but grad students are unable to get into them as their
first textbook. Our book provides them with a way to start with these
books (they need a lot of help with Gelman and Hill, but our grad
Given all this, perhaps the reviewer will dislike it a bit less once
we fix the errors in the book?