(by Andrew Gelman)
Last week I pointed you to a pair of dueling mini-articles on information visualization and statistical graphics.
In the infovis article, Robert Kosara gives an example that I think perfectly illustrates one of my general points on the difference between the two approaches.
My remarks below might appear harsh but I don’t mean them to be. As Antony Unwin and I have written, we take as a starting point that infovis has value. In exploring the different goals of infovis and statgraphics, we are not questioning the value of infovis but rather trying to move toward a future in which the best ideas from both approaches can be used to understand and communicate quantitative information.
OK, now to the example. In his article, Kosara writes:
A common question in time series data is whether the data is periodic, and if yes, what the period is. A common way of finding out is drawing the data on a spiral. By changing the number of data points that is shown per full round the spiral makes (that number is constant, of course), patterns become visible. Figure 1 shows an example of sick leave data that has an interesting periodic pattern: in 28 days, there are four periods, which means that there is a weekly pattern: more people call in sick on Mondays than later in the week.
The way this pattern was discovered is deceptively simple. All it took was to play with a slider that allowed the user to change the number of days shown on the spiral. Slide it back and forth, and soon you will see a pattern (if there is one). With a bit of practice, you can even tell when you’re getting close, as there are telltale signs around the optimal value.
Here’s Kosara’s graph:
with the following label:
Figure 1: Spirals are useful for finding periodicity in data. (a) The bar chart shows no obvious periodic pattern; (b) the spiral set to 25 days hints at a periodic pattern, but this is clearly not the correct time frame; (c) at 28 days, the pattern is very clearly visible.
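To make the slider idea concrete, here is a rough numeric analogue, offered as a sketch and not as Kosara's method: wrap the series every `per_round` points, average the values that land at the same position on each turn, and see how much those averages differ. The function name `fold_spread` and the toy series are made up for illustration; they are not the sick leave data from the figure.

```python
from statistics import mean, pvariance

def fold_spread(series, per_round):
    """Variance of the per-position means when the series is wrapped
    every `per_round` points, as on one full turn of a spiral."""
    # Column i holds every value that lands at position i on a turn.
    columns = [series[i::per_round] for i in range(per_round)]
    return pvariance([mean(col) for col in columns])

# Made-up data: 20 weeks with the same weekly shape (Monday first).
toy = [12, 7, 6, 6, 8, 2, 1] * 20

# The analogue of moving the slider from 25 days to 28 days per turn.
for per_round in (25, 28):
    print(per_round, round(fold_spread(toy, per_round), 2))
```

When the turn length is a multiple of the true period, each position collects the same weekday and the averages differ sharply; at 25 days the weekdays are mixed within each position and the contrast is much weaker.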
I agree the graph is pretty but I really don’t see the point. Of course you’d want to look for day-of-the-week patterns in sick leave data. Instead of the pretty pinwheel, I’d much prefer a simple dot-and-line plot showing the data and averages for the 7 days from Monday through Sunday. That would give me quantitative information.
What does the swirly graph tell me? That there’s a weekly pattern. Which I would’ve easily noticed using a simpler, more direct graph.
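The numbers behind that simpler graph take only a few lines to compute. The following is a minimal sketch, assuming the data arrive as one count per consecutive day with a known start date; the function name `weekday_means` and the counts are invented for illustration and are not the data in Kosara's figure.

```python
from collections import defaultdict
from datetime import date, timedelta

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def weekday_means(start, counts):
    """Average the daily counts separately for each day of the week.

    start  -- date of the first observation
    counts -- one count per consecutive day, beginning on `start`
    """
    by_day = defaultdict(list)
    for offset, count in enumerate(counts):
        day = start + timedelta(days=offset)
        by_day[day.weekday()].append(count)  # weekday(): Monday is 0
    return {DAY_NAMES[d]: sum(v) / len(v) for d, v in sorted(by_day.items())}

# Made-up counts: four identical weeks starting Monday, 1 January 2024.
toy_counts = [12, 7, 6, 6, 8, 2, 1] * 4
means = weekday_means(date(2024, 1, 1), toy_counts)
for name, value in means.items():
    print(f"{name}: {value:.1f}")
```

Plotting these seven averages as connected dots, with the individual daily counts shown behind them, gives the day-of-week comparison directly, with an axis you can read values from.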
Kosara’s Figure 1 is an excellent example of something I talk about a lot regarding infovis: It’s a graph that invites the reader in, it’s intriguing and appealing, and what it leads the reader to is . . . discovery about the method used to make the graph. If you look at the swirly graphs and think about them, you can say: Hey, cool: It’s a spiral graph, each twist is 28 days, there are four blue zones, 28 divided by 4 is 7 . . . hey, there are 7 days in the week! Yeah, that makes sense, sick days happen by the week! Lots of work to convey a very simple piece of information. The joy of discovery, applied to discovering the familiar and expected: it’s the Chris Rock effect all over again.
In contrast, a dot-and-line plot showing the data by day of week would convey much more information, much more clearly and transparently. But from the standpoint of infovis, clarity and transparency are a minus, not a plus! A confusing graph can invite more reader involvement.