Data science vs. statistics: has “statistics” become a dirty word?

(by John Johnson)

Revolution Analytics recently published the results of a poll indicating that JSM 2011 attendees consider themselves “data scientists.” Nancy Geller, President of the ASA, asks statisticians not to “Shun the ‘S’ word.” Yet a third take on the matter is the top tweet from JSM 2011 with Dave Blei’s quote “‘machine learning’ is how you say ‘statistics’ to a computer scientist.”

Comments about selection bias from Revolution’s poll aside (it was conducted as part of the free wifi connection in the expo), the shift from “statistics” to “analytics,” “machine learning,” “data science,” and other terms seems to reflect a sense that calling oneself a “statistician” is just not cool, or that it scares our colleagues. So I open the floor to the question: has “statistics” become a dirty word?

3 Responses to “Data science vs. statistics: has “statistics” become a dirty word?”

  1. Michael Lavine, August 9, 2011 at 1:41 am

    You write, ‘Revolution Analytics recently published the results of a poll indicating that JSM 2011 attendees consider themselves “data scientists,”’ which you cite as evidence of a ‘shift from “statistics” to “analytics,” “machine learning,” “data science,” and other terms …’

    But the question in the poll was “How strongly do you identify with the statement, ‘I consider myself a data scientist’?”, not “Which term most accurately describes your profession?” Many people might identify with both “data scientist” and “statistician.” So is there really evidence of a ‘shift from “statistics”’?

  2. Wayne, August 18, 2011 at 10:10 pm

    Two thoughts: 1) there’s an old saying that if you have to label it a science, it’s not, and 2) I imagine that “data scientist” is eliminating several connotations involved with “statistician”.

    It may just be my own connotation, but since I don’t have a degree in statistics, I’d hesitate to call myself a statistician. Also, statistician, like mathematician, sounds like the person might be a theoretician in an ivory tower, while data scientist sounds much more practical.

  3. Felicien Kanyamibwa, PhD, August 24, 2011 at 10:49 pm

    I believe it depends on where most of the responders came from. Statisticians in industry have usually become the go-to people when it comes to manipulating, sifting through, analyzing, and interpreting the information gleaned from massive data sets. Hence, across the halls of corporations, one may hear managers and other decision makers say: “Hey, let us ask the data person. She will certainly provide the answer and recommend a solution.”
    The line between statisticians and data analysts is blurred. The days when statisticians were the people to go to only for promotional mix models, regression models, segmentation, biostatistics analyses, etc. may be over. Statisticians are expected to manipulate large data sets, develop models, create presentations, tell the story, and provide business recommendations.
    I think university statistics and business departments may need to adapt to the situation if they have not already done so.
    Perhaps that is why the poll gave these results.

    Felicien Kanyamibwa, PhD



The Statistics Forum, brought to you by the American Statistical Association and CHANCE magazine, provides everyone the opportunity to participate in discussions about probability and statistics and their role in important and interesting topics.

The views expressed here are those of the individual authors and not necessarily those of the ASA, its officers, or its staff. The Statistics Forum is edited by Andrew Gelman.

