(by Andrew Gelman)
C. Wild, M. Pfannkuch, M. Regan, and N. J. Horton have a long article. Here’s their abstract:
There is a compelling case, based on research in statistics education, for first courses in statistical inference to be underpinned by a staged development path. . . . We discuss the issues that are involved in formulating precursor versions of inference and then present some specific and highly visual proposals. These build on novel ways of experiencing sampling variation and have intuitive connections to the standard formal methods of making inferences in first university courses in statistics. Our proposal uses visual comparisons to enable the inferential step to be made without taking the eyes off relevant graphs of the data. . . . Our approach was devised for use in high schools but is also relevant to adult education and some introductory tertiary courses.
The article appears with many comments by discussants. Here’s what I wrote:
I agree that, wonderful as informal plots and data summaries are, we also should be teaching students formal statistical inference, which is a big part of what separates statistical thinking from mere intelligent commentary. I like the authors’ formulation that statistical inference
addresses a particular type of uncertainty, namely that caused by having data from random samples rather than having complete knowledge of entire populations, processes or distributions.
The authors write:
we also need to start from a clear distinction between when we are playing the description game and when we are playing the inference game.
I would go one step further and get rid of the concept of ‘description’ entirely, for two reasons.
(a) Ultimately, we are almost always interested in inference. Catch-phrases such as ‘let the data speak’ and rhetoric about avoiding assumptions (not in the paper under discussion, but elsewhere in the statistics literature) can obscure the truth that we care about what is happening in the population; the sample we happen to select is just a means to this end.
(b) Description can often—always?—be reframed as inference. For example, we talk about the mean and standard deviation of the data, or the mean and standard deviation of the population. But the former can be presented simply as an estimate of the latter. I prefer to start with the idea of the population mean and standard deviation, then introduce the sample quantities as estimates.
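As a minimal sketch of point (b), the simulation below treats the sample mean and standard deviation as estimates of population quantities rather than as descriptions of the data. The normal population and its parameter values (POP_MEAN, POP_SD), along with the helper names draw_sample and estimate, are assumptions made up for illustration; nothing here comes from the article under discussion.

```python
import random
import statistics

# Hypothetical population parameters (assumed for illustration only).
POP_MEAN = 50.0
POP_SD = 10.0


def draw_sample(n, rng):
    """Draw n independent values from the assumed normal population."""
    return [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]


def estimate(sample):
    """Return the sample mean and sample SD, read as estimates of
    the population mean and population SD."""
    return statistics.mean(sample), statistics.stdev(sample)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same draws
    for n in (10, 100, 10_000):
        mean_hat, sd_hat = estimate(draw_sample(n, rng))
        print(f"n={n:>6}: estimated mean={mean_hat:.2f}, estimated SD={sd_hat:.2f}")
```

Under these assumptions the estimates should tend to settle near the population values as n grows, which is the sense in which the "descriptive" summaries are doing inferential work.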
Similarly, some textbooks first introduce linear regression as a data summary and then return to it later in the context of statistical inference. I prefer to . . . introduce the least squares estimate, not as a data summary, but as an estimate of an underlying pattern of interest.
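The same framing can be sketched for regression: simulate data from an assumed underlying line and treat the least squares fit as an estimate of that line. The true intercept, slope, and noise level below are hypothetical values chosen for illustration, and simulate and least_squares are illustrative helper names, not anything taken from the article.

```python
import random

# Hypothetical underlying pattern: y = TRUE_INTERCEPT + TRUE_SLOPE * x + noise.
# All three values are assumptions for illustration only.
TRUE_INTERCEPT = 2.0
TRUE_SLOPE = 0.5
NOISE_SD = 1.0


def simulate(n, rng):
    """Simulate n (x, y) pairs from the assumed linear pattern plus normal noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [TRUE_INTERCEPT + TRUE_SLOPE * x + rng.gauss(0, NOISE_SD) for x in xs]
    return xs, ys


def least_squares(xs, ys):
    """Return (intercept, slope) of the simple least squares line through the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same draws
    for n in (20, 200, 2000):
        a_hat, b_hat = least_squares(*simulate(n, rng))
        print(f"n={n:>5}: estimated intercept={a_hat:.2f}, estimated slope={b_hat:.2f}")
```

Presented this way, the fitted coefficients are compared against the pattern that generated the data, rather than being offered only as a summary of the particular points drawn.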
In giving these comments, I am not trying to imply that my approach is the best way or even a good way to teach statistics. I have no evidence to back up my hunches on how to teach. But I would like to suggest the possibilities, because I think that statisticians are stuck in a ‘description versus inference’ view of the world, which can lead to difficulty in teaching and learning.
The article and many of the discussions are worth reading. (Check out what Sander Greenland has to say!)
We welcome your reactions here. Any comments you post at this forum will certainly be noticed by the authors of the article.