Monday, December 20, 2010

statistics = ignorance?

This post on BoingBoing today provides a good opportunity to explore a pet peeve subject for me: statistics are a problem.

It's a common joke to say "90% of statistics are made up" - common, but funny.  It's also common to claim that "correlation does not imply causation", which seems to be the main message of the BoingBoing post.

My primary beef with statistics is that humans do not seem to possess an innate ability to intuit statistical truth; we almost seem predisposed to "short cut" to conclusions, regardless of what the statistical evidence is trying to tell us.  My secondary beef with statistics is that it is almost absurdly easy to game the system; whether you are studying biological systems, economics, or anything else, in every case you can create a statistical set of data that seems to support almost any contention.
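One way this kind of gaming can happen, sketched in Python under stated assumptions: generate one random "outcome" series, then search a pile of equally random, unrelated "explanations" and report only the best match. Everything here is hypothetical illustration (the function names, the choice of 200 candidates, the 10 data points, the seed); none of it comes from the BoingBoing post.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_series=200, n_points=10, seed=0):
    """Return the largest |r| between one random series and many unrelated ones.

    All series are independent Gaussian noise, so any strong correlation
    found here is produced by the search itself, not by a real relationship.
    The default sizes are arbitrary assumptions chosen for illustration.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    candidates = [
        [rng.gauss(0, 1) for _ in range(n_points)] for _ in range(n_series)
    ]
    return max(abs(pearson(target, series)) for series in candidates)


if __name__ == "__main__":
    best = strongest_chance_correlation()
    print(f"strongest correlation found in pure noise: {best:.2f}")
```

With only a handful of data points and enough candidates to pick from, the "winning" correlation tends to look impressive even though every series is noise. Reporting just the winner, and not the number of losers that were tried, is what makes the result misleading.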

Unfortunately, we exist in this space where one of our best tools for understanding and analysis is a deeply flawed tool.  The example from the BoingBoing post is perhaps a little simplistic, where a skeptical reader of data can apply some "common sense" to sniff out the next level of macro data hiding behind the surface level, but many studies of sufficiently complex systems are challenged by the inability to "step back" and see the subject in a wider context.

So what to do?  We have a flawed tool and a problem with confidence in the outcomes of using that tool...

and as XKCD suggests, sometimes correlation doesn't necessarily mean causation, but rightly does suggest the possibility of a relationship.

The upshot from me is that one needs to take statistics with a grain of salt, and just as in other areas discussed in this blog, one needs to be aware of the bias built in, both in the statistics and in the mind of the audience.

update: After I posted this I wondered if more examples of faulty statistical reasoning would be would they?


  1. Are there any new systems under development that are poised, should they turn out to "work," to replace statistics as we know it today? The superstring theory of data analysis, as it were?

    Sure seems like there ought to be. Or, if not a totally different approach, at least an HTML5 version of it: revised, appropriately limited, and re-tooled.

  2. @AdanA: I think that statistics will continue to evolve, and while you might have been joking about HTML5, I think that hyperlinked documentation for statistics presentations will be an important and welcome part of that evolution.

    Much of the "disclaimers" and error margin data tends to get lost below the headlines, I think, and in a hyperlinked world it's easier to have all of that info right at hand.