True Lies: How Statistics Fools Us

When an ad says 80% of dentists recommend Colgate, what do you assume? Do you believe they asked ~1,000 dentists out of which 800 recommended Colgate, or do you think they asked 10 out of which 8 recommended Colgate? Either way, they are correct—even if all the dentists recommend other toothpastes along with Colgate. This is how true numbers can fool you unless you arm yourself against the common fallacies.

In 1964 LA, an elderly lady was walking home from grocery shopping. As she made her way down an alley, she stooped to pick up an empty carton, at which point she suddenly felt herself being pushed to the ground. When she looked up, she saw a young woman with a blond ponytail running away down the alley with her purse. Near the end of the alley, a man saw the woman jump into a yellow car which was driven by a bearded black man.

Several days later, the police arrested a couple that matched the description. The prosecution claimed that the probability of a random couple satisfying all of the criteria (interracial couple, blonde hair, ponytail, bearded man, yellow car) was 1 in 12 million. Janet Collins and her husband Malcolm Collins were charged with the crime.

In 1996 England, Sally Clark’s son died within a few weeks of his birth. She had a second son in 1998 who also died in similar circumstances. Clark was arrested and tried for both deaths. The defence argued that the children had died of sudden infant death syndrome (SIDS). However, the prosecution’s case relied on the claim that the chance of two children from an affluent family suffering SIDS was 1 in 73 million. Clark was convicted in 1999.

As you may have guessed by now, in both cases the defendants were wrongly convicted. But before I explain why, here’s a question. If something increases from 5% to 10%, is it a 5% increase or a 100% increase?

Which framing sounds more alarming: “Suicide rate increases by 100%” or “Suicide rate increases by 5%”? If the initial rate was 1 in a million, and it has now increased to 2 in a million, would you call that a 100% increase or an increase of 0.0001 percentage points?

Both are correct. The first is the relative change and the second is the absolute change, and they leave completely different impressions. That’s how easily truth can be used to lie.

In 1995 UK, news spread that a certain type of birth control pill increased the risk of life-threatening blood clots by 100%. In reality, what was 1 out of 7,000 became 2 out of 7,000, an absolute increase of a mere 0.014 percentage points. But as a result of the “true lies” that spread, the UK saw a wave of unwanted pregnancies: an estimated 13,000 of them, mostly among teenagers.
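To see how the same change produces both headlines, here is a minimal Python sketch. The function name describe_change is my own invention for illustration; it simply computes the relative change (in percent) and the absolute change (in percentage points) between two rates.

```python
def describe_change(old_rate, new_rate):
    """Return (relative change in %, absolute change in percentage points)."""
    relative_pct = (new_rate - old_rate) / old_rate * 100
    absolute_pp = (new_rate - old_rate) * 100
    return relative_pct, absolute_pp


# Pill scare: a risk of 1 in 7,000 became 2 in 7,000.
rel, abs_pp = describe_change(1 / 7000, 2 / 7000)
print(f"relative change: {rel:.0f}%")
print(f"absolute change: {abs_pp:.3f} percentage points")
```

Whenever a headline quotes only one of the two numbers, it is worth asking for the other.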

Now, let’s discuss the classic Correlation-Causation Fallacy. Do you think taking vitamin supplements keeps people healthy? Or is it that people who are already healthy tend to take supplements? Do you think smoking cigarettes gives students bad grades, or is it that students who have bad grades tend to smoke?

Let’s take a slightly more complex example. Ice cream sales and hospital admissions for heat stroke are positively correlated. But neither causes the other. There’s a third variable: hot weather. Hot weather makes us eat ice cream and also causes heat stroke. This is known as the Third-Cause Fallacy.
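If you want to watch a third variable manufacture a correlation, here is a small simulation sketch. Everything in it is made up: the temperature range, the coefficients and the noise levels are assumptions chosen only for illustration. Ice cream sales and heat stroke admissions each depend on temperature and never on each other, yet they come out strongly correlated until temperature is accounted for.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing a straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = cov / sum((x - mx) ** 2 for x in xs)
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical model: temperature drives both series independently.
temperature = [rng.uniform(15, 40) for _ in range(5000)]
ice_cream = [10 * t + rng.gauss(0, 40) for t in temperature]
heat_stroke = [0.5 * t + rng.gauss(0, 2) for t in temperature]

raw = pearson(ice_cream, heat_stroke)
controlled = pearson(
    residuals(ice_cream, temperature),
    residuals(heat_stroke, temperature),
)
print(f"raw correlation:              {raw:.2f}")
print(f"after removing temperature:   {controlled:.2f}")
```

The second number should land close to zero: once the weather is taken out, nothing links the two series.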

Okay, one more. The admission figures for the fall of 1973 at UC Berkeley showed that men applying were more likely (44%) than women (35%) to be admitted. UC Berkeley was accused of gender bias. But a closer look at the admission data of all 85 departments showed that women tended to apply to more competitive departments with low overall rates of admission, whereas men tended to apply to less competitive departments with high overall rates of admission. This is known as Simpson’s Paradox, in which a trend appears in several different groups of data (at the department level) but disappears or reverses when these groups are combined (at the university level).
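Simpson’s Paradox is easier to believe once you see it in numbers. The sketch below uses two hypothetical departments with invented applicant counts. These are NOT the real 1973 Berkeley figures. In each department women are admitted at a higher rate than men, yet the combined figures point the other way, simply because most women applied to the harder department.

```python
# Hypothetical numbers, NOT the real 1973 Berkeley data.
# Each entry is (applicants, admitted).
applications = {
    "easy department": {"men": (800, 500), "women": (100, 70)},
    "hard department": {"men": (200, 40), "women": (900, 225)},
}


def rate(applied, admitted):
    """Admission rate as a fraction between 0 and 1."""
    return admitted / applied


totals = {"men": [0, 0], "women": [0, 0]}
for dept, by_gender in applications.items():
    for gender, (applied, admitted) in by_gender.items():
        totals[gender][0] += applied
        totals[gender][1] += admitted
        print(f"{dept:16} {gender:6} {rate(applied, admitted):.1%}")

# Combine the departments and compare again.
overall = {
    gender: rate(applied, admitted)
    for gender, (applied, admitted) in totals.items()
}
for gender, r in overall.items():
    print(f"{'overall':16} {gender:6} {r:.1%}")
```

Nothing about any single department changed between the two comparisons; only the grouping did.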

Now, let’s get back to the two original stories. Before that, we’ll have to understand something popularly known as the Prosecutor’s Fallacy, so called because prosecutors are the ones who most often fall into it.

Let’s take an example. Say 1,000 people live in a town where a murder has occurred. The perpetrator is known to have blood type A+, and only 10% of the population (i.e., 100 people) have that blood type.

Now, you can’t just pick a random person from the street and declare them guilty if their blood type matches the perpetrator’s, right? At least 99 of the 100 people with that blood type are innocent. A matching blood type may only be used as an “unexpected coincidence” after a set of robust evidence (discovered prior to the blood test) points to them.
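The fallacy is a mix-up between two different questions, and a few lines of Python make the gap visible. This is only a sketch under stated assumptions: exactly one resident is the murderer, that person is among the 100 matches, and nothing else singles anyone out. The variable names are mine.

```python
from fractions import Fraction

town_population = 1000
matching_people = 100  # the 10% with blood type A+

# Assumption: exactly one resident is guilty and is among the matches.
innocent_people = town_population - 1
innocent_matches = matching_people - 1

# Question 1: how likely is an innocent person to match?
p_match_given_innocent = Fraction(innocent_matches, innocent_people)

# Question 2: how likely is a matching person to be innocent?
p_innocent_given_match = Fraction(innocent_matches, matching_people)

print(f"P(match | innocent) = {float(p_match_given_innocent):.1%}")
print(f"P(innocent | match) = {float(p_innocent_given_match):.1%}")
```

The first number is small and the second is huge. Quoting the first as if it answered the second is the Prosecutor’s Fallacy.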

With this understanding, let’s look into the case of Janet and Malcolm Collins. If there are 10 couples in the city who match the perpetrators’ profile (interracial couple, blonde hair, ponytail, bearded man, yellow car), then the probability that a given matching couple is the guilty one is only 1 in 10, no matter how small the 1-in-12-million figure looks. You compare apples to apples, not apples to all fruits. Get it?

Now let’s take the case of Sally Clark. In such cases, it’s easy to confuse the chance of a rare event happening with the chance of a suspect’s innocence. In reality, these two probabilities are not the same thing. Saying that there is a 1 in 73 million chance that the babies died of SIDS is not the same as saying that there is a 1 in 73 million chance that the mother did not kill them.

On top of that, the probabilistic calculation was based on the assumption that two SIDS deaths in the same family are independent. But this assumption is false. There are genetic and environmental factors that predispose families to SIDS. So, after a first death from SIDS, the chance of a second increases greatly.
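Here is a rough sketch of what the independence assumption does to the arithmetic. The 1 in 8,543 single-death figure is the one behind the trial’s number; it comes from the case record, not from this article. The dependence factor of 10 is purely hypothetical, picked only to show how quickly the headline number shrinks once the two deaths are allowed to be related.

```python
from fractions import Fraction

# Single SIDS death in a family like the Clarks (figure from the case record).
p_first = Fraction(1, 8543)

# The calculation presented in court: treat the two deaths as independent.
p_both_if_independent = p_first * p_first

# HYPOTHETICAL: suppose a second death is 10 times likelier after a first.
# The factor 10 is made up for illustration, not a measured value.
dependence_factor = 10
p_both_if_dependent = p_first * (p_first * dependence_factor)

print(f"independent: 1 in {float(1 / p_both_if_independent):,.0f}")
print(f"dependent:   1 in {float(1 / p_both_if_dependent):,.0f}")
```

Squaring 1 in 8,543 is where the roughly 1 in 73 million comes from. Any positive link between the two deaths makes the true figure less extreme.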

Also, using UK crime statistics and the exact same reasoning that was used against Sally, we find that the probability of two infants being murdered in the same household is 1 in 2 billion. By that logic, a double murder is even less likely than a double SIDS death, so the rarity argument alone could never justify convicting her.
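What matters is not how rare each explanation is on its own, but how the two compare. The sketch below takes both figures quoted above at face value and makes one simplifying assumption, which is mine: that double SIDS and double murder are the only two explanations on the table.

```python
# Both figures are the ones quoted in the text, taken at face value.
p_double_sids = 1 / 73_000_000
p_double_murder = 1 / 2_000_000_000

# Simplifying assumption: these are the only two possible explanations.
odds_sids_vs_murder = p_double_sids / p_double_murder
p_sids_given_two_deaths = p_double_sids / (p_double_sids + p_double_murder)

print(f"SIDS is about {odds_sids_vs_murder:.0f} times likelier than murder")
print(f"P(SIDS | two deaths) = {p_sids_given_two_deaths:.1%}")
```

Two tiny probabilities can still produce a lopsided comparison, which is exactly what the 1 in 73 million figure hid from the jury.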

But after losing both of her sons, Sally Clark had to spend more than three years in prison for a crime she didn’t commit. In 2003, Sally’s conviction was overturned on the basis of additional medical evidence of a possible bacterial infection. However, this story doesn’t have a happy ending. Sally Clark died in 2007 of alcohol poisoning. She had been suffering from depression, all because neither the judge, nor the jurors, nor the media, nor the public could correctly interpret the significance of the numbers. All of them got fooled by true lies.

The bottom line is this: we don’t understand probability. Numbers confuse us. Statistics fools us. Be cautious!