So I hope to explore, over time and in this space, several different kinds of "cognitive bias". For me, talking through things, trying to explain them in plain language, or just describing them again and again in different words helps my own understanding.
So let's start with "survivor bias". The basic idea here is that when studying something, anything, you have to pick a data set to work with; drawing that data set from too narrow a slice of the whole population can strongly bias the outcome of the study.
Real world example? Say a high school is concerned with its success on the AP History test; they want more students to score 4s and 5s (is that still the scale? sorry if this example is dated!) in order to get the college credits for the class and raise the high school's status relative to its peer group. So they start a study: when the scores come in for a given year's test, they find that 10 students (out of a group of 40 taking the test) scored a 4 or better. Pretty decent results! So they complete an in-depth study of these 10 kids, learning about their diets, study habits, elementary school grades, the marital status of their parents... Ultimately the school comes up with a lot of data, but they want to be careful before coming to conclusions, so they decide to do the same study the next year, and so get a second set of data from the high scorers (this time, 12 of 39 students scored 4 or better!). So let's say the school continues this pattern until they have a pretty large sample of students to study, maybe 100+ over 8 or so years.
With all this data in hand, the school runs the numbers and comes up with some insights on "what kinds of students" score 4s and 5s. What's the problem with the study at this point? What if the qualities observed in the "winner" group are also present in a significant number of the "losers"? Since the school never studied the low scorers, it has no way to tell which qualities actually set the high scorers apart. The data set was too narrow and biased toward the "survivors" of the process being studied.
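To make that concrete for myself, here's a minimal sketch in Python. Everything in it is made up for illustration: the trait ("eats breakfast"), the counts, and the `trait_rate` helper are my own assumptions, not data from any real school. The point is only that a trait can look striking among the high scorers and turn out to be just as common among everyone else.

```python
# Hypothetical records: (scored_4_or_better, has_trait).
# All numbers are invented for illustration only.
students = (
    [(True, True)] * 9 + [(True, False)] * 1       # 10 high scorers
    + [(False, True)] * 27 + [(False, False)] * 3  # 30 other test takers
)

def trait_rate(records, high_scorer):
    """Share of the chosen group (high scorers or not) that has the trait."""
    group = [has_trait for scored_high, has_trait in records
             if scored_high == high_scorer]
    return sum(group) / len(group)

winners_rate = trait_rate(students, True)
others_rate = trait_rate(students, False)

print(f"high scorers with the trait:  {winners_rate:.0%}")
print(f"everyone else with the trait: {others_rate:.0%}")
```

If the school only ever looks at the first number, the trait seems to matter. It takes the second number, which comes from the students nobody studied, to show whether it distinguishes the high scorers at all.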
Much shorter example: say you want to discover the average time it takes a male aged 22-26 to finish the NY Marathon in 2010. The race organizers provide you with raw data of finish times and start times for all the guys of that age range. Simple, right? But what about those guys that start but never finish? Pulled hamstring, stopped to chat up a cutie at the water station, decided that running is for cheetahs...whatever the reason, some guys just don't finish the race. If the data of all starters of the appropriate age range are not considered, there is a "survivor bias" in the results of the study.
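Here's a rough sketch of the marathon case, again in Python and again with invented numbers. I'm assuming a hypothetical list of starters where a missing finish time (`None`) means the runner dropped out; the real race data may well be laid out differently.

```python
# Hypothetical starters: finish time in minutes, or None for "did not finish".
# These values are invented for illustration only.
starters = [210, 245, 198, None, 260, None, 232]

# Keep only the runners who actually crossed the line.
finishers = [t for t in starters if t is not None]

# This average describes finishers only, not everyone who started.
avg_finisher_time = sum(finishers) / len(finishers)

# Reporting the completion rate next to it makes the bias visible.
completion_rate = len(finishers) / len(starters)

print(f"average time among finishers: {avg_finisher_time:.1f} min")
print(f"share of starters who finished: {completion_rate:.0%}")
```

The average by itself isn't wrong, it just quietly answers a narrower question. Putting the completion rate beside it is one simple way to show who got left out.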
At this point, I feel it's important to point out that a study can intend to study only survivors; so long as the study presents the findings as biased towards the survivors, there's nothing inherently wrong with studying a subset of the available data. But a persistent problem with statistics and sound-bite "findings" of studies is that people tend to misunderstand and/or abuse the findings.
Through careful analysis it has been found that 100% of the survivors of the Titanic's sinking had urinated at least once during the 24 hours preceding the iceberg incident... if only those poor drowning bastards had thought to go pee!