Counterfactual Reasoning: How to Ask Better What-if Questions

Let me start with an example. Suppose Jonny takes a drug and dies a month later. How do we go about investigating whether the drug might have caused his death?

To answer this question (correctly), we need to imagine a scenario in which Jonny was about to take the drug but changed his mind. What if he hadn’t taken the drug? Would he have lived?

In 1748, David Hume wrote the following in his Enquiry Concerning Human Understanding: “We may define a cause to be an object, followed by another, and where all the objects similar to the first are followed by objects similar to the second. Or in other words where, if the first object had not been, the second never had existed.”

The first part is pretty obvious: similar causes should regularly be followed by similar effects. But the second part of the statement, “if the first object had not been, the second never had existed,” is what we know today as a Counterfactual. And the process of asking what-if questions (that run counter to the facts) in order to investigate what might have caused what is called Counterfactual Reasoning.

Whenever we find two events are correlated (for example, I ate street food and I fell sick), we inherently draw a causal link (I fell sick because I ate street food). Counterfactual reasoning simply states that in order to show this, we must answer the counterfactual question: all things remaining the same, if I hadn’t eaten the street food, would I have fallen sick?

Counterfactuals try to understand the alternative histories that never happened because of an intervention (or the lack of one). Since counterfactuals only deal with “what might have been,” they sound unscientific. After all, there’s no way to either confirm or refute the answers to such questions. Unless we run really long and tedious experiments, we won’t know for sure if the street food really made me sick, and we’ll never know what killed Jonny. Yet we aren’t thaaat ignorant. We do understand a few things inherently.

For example, had the rooster been silent this morning, the sun would have risen just as well. We know this because we know that the rooster doesn’t cause the sun to rise. In this sense, counterfactuals are grounded in the models through which we understand the world, whether common sense or scientific knowledge. Let me illustrate this with a real example.

Back in the day, when the smallpox vaccine was first introduced in Europe, a public debate erupted. The data (erroneously) showed that more people died from smallpox inoculations than from smallpox itself. Antivaxxers used this information to argue that inoculation should be banned (when in fact it was saving lives by eradicating smallpox). Now, if we wanted to find out whether smallpox vaccines were in fact causing deaths instead of preventing them, how would we go about it?

Let’s make some assumptions. Suppose out of 1 million children, 99 percent are vaccinated, and 1 percent are not. If a child is vaccinated, they have one chance in one hundred of developing a reaction, and the reaction has one chance in one hundred of being fatal. On the other hand, they have no chance of developing smallpox.

Meanwhile, if a child is not vaccinated, they obviously have zero chance of developing a reaction to the vaccine, but they have one chance in fifty of developing smallpox. Finally, let’s assume that smallpox is fatal in one out of five cases.

I think you would agree that vaccination looks like a good idea. The odds of having a reaction are lower than the odds of getting smallpox, and the reaction is much less dangerous than the disease. But now let’s look at the data. Out of 1 million children, 990,000 get vaccinated, 9,900 have the reaction, and 99 die from it. Meanwhile, 10,000 don’t get vaccinated, 200 get smallpox, and 40 die from the disease. In summary, more children die from vaccination (99) than from the disease (40).
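
If you want to sanity-check that arithmetic, here’s a quick sketch in Python. It simply translates the assumptions above into code; the variable names are my own.

```python
# Actual world: 99 percent of 1 million children are vaccinated.
population = 1_000_000
vaccinated = population * 99 // 100      # 990,000
unvaccinated = population - vaccinated   # 10,000

# Vaccinated: 1 in 100 develop a reaction, and 1 in 100 reactions is fatal.
reactions = vaccinated // 100            # 9,900
reaction_deaths = reactions // 100       # 99

# Unvaccinated: 1 in 50 develop smallpox, and 1 in 5 cases is fatal.
smallpox_cases = unvaccinated // 50      # 200
smallpox_deaths = smallpox_cases // 5    # 40

# More children die from the reaction (99) than from the disease (40).
print(reaction_deaths, smallpox_deaths)  # 99 40
```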

I can empathise with the parents who might take to the streets with “Vaccines kill!” signs. The data seem to be on their side: vaccination does indeed cause more deaths than smallpox itself. But the question is: is logic on their side? Is this data strong enough for us to ban vaccination, or should we take into account the deaths prevented?

Here’s how you approach these types of problems (where you cannot run real-life experiments) with counterfactuals. When we began, the vaccination rate was 99 percent. We now ask: what if nobody had been vaccinated? Using the assumptions we made, we can conclude that out of 1 million children, 20,000 would have gotten smallpox, and 4,000 would have died.

Now, comparing this counterfactual world with the real world, we see that not vaccinating would have cost the lives of 3,861 more children: the difference between the 4,000 deaths in the counterfactual world and the 139 deaths (99 from the reaction plus 40 from smallpox) in the real world.
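
Again as a sketch, we can put the counterfactual world next to the real one (same assumptions as before, just translated into Python):

```python
# Counterfactual world: nobody is vaccinated.
population = 1_000_000
cf_smallpox_cases = population // 50         # 20,000
cf_smallpox_deaths = cf_smallpox_cases // 5  # 4,000

# Real world: 99 deaths from the reaction + 40 from smallpox.
real_world_deaths = 99 + 40                  # 139

# Lives the vaccination programme actually saved.
print(cf_smallpox_deaths - real_world_deaths)  # 3861
```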

The crux of counterfactual reasoning is this: unless we verify the answer to the what-if question in the counterfactual world, we cannot verify the real-world conclusion either.

This is why it’s hard to figure out whether the drug actually killed Jonny with such limited information. Do we know that this drug has had a similar effect on several other people? Did we see anything in the postmortem report? Unless we investigate these, we cannot draw a causal link between the drug and the death (unlike in the smallpox vaccination case), so we can neither confirm nor deny it.

But what if I had studied harder, would I have aced my exams? Studying has an effect on exams (we know that), so most likely I would have fared better. But the rooster has no effect on the rising sun, so no matter what it does, the sun’s gonna do its thing.

We often engage in unfruitful what-if questions after something bad happens. “What if I had done something else, would this have happened?” The truth is, we often cannot know for sure if things would have been any different had we done or not done something, especially when there’s no direct causal relationship and no way to establish one.

Counterfactual reasoning doesn’t offer a silver bullet. There’s no way we can verify everything, not without proper reason or data. But it does offer us a framework to ask the right questions and verify the answers, without jumping to hasty conclusions.
