
Nate’s Notes 3: Correlation vs Causation

February 14, 2021 | June 8th, 2023

If you’re in a room with 60 people, what are the odds that any two of you will share a birthday?

I run this experiment in my class and people always get it wrong. You might think to estimate the answer by dividing 60 by 365, which comes to about 16%. But in reality, with just 60 people in a room, the odds are more than 99% that at least two people will share a birthday.

So why do we get this question wrong…and by such a wide margin?

For one, we’re pretty bad at estimating probabilities. For this particular birthday question, the trick is to think about all of the possible combinations of shared birthdays.

For example, I might share a birthday with any of the 59 other people.

But Person 2 could share a birthday with the remaining 58 other people.

And Person 3 could share a birthday with the remaining 57 other people.

And so on.

So, with just 60 people in a room, there are 1,770 possible pairs (59 + 58 + 57 + … + 1). That’s why the odds are above 99% with just 60 people.

In fact, with just 24 people in a room, the odds are greater than 50% that at least two of them will share a birthday.
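To make the arithmetic concrete, here is a minimal sketch of the birthday calculation. It assumes 365 equally likely birthdays (no leap days) and that everyone's birthday is independent; the function names are just illustrative.

```python
from math import comb


def pairs(n):
    """Number of distinct pairs of people in a room of n."""
    return comb(n, 2)


def shared_birthday_probability(n, days=365):
    """Chance that at least two of n people share a birthday.

    Assumes every day is equally likely and birthdays are independent.
    It is easier to compute the chance that nobody matches, then subtract from 1.
    """
    p_all_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


for n in (23, 24, 60):
    print(n, pairs(n), round(shared_birthday_probability(n), 3))
```

Under these assumptions, 60 people form 1,770 pairs and the chance of a shared birthday comes out above 99%. The 50% mark is already crossed at 23 people.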

And though this birthday example seems trite, it has massive implications for how we think about disentangling correlation from causation. Whenever there are lots of possible combinations, it’s a mathematical certainty that there will be lots of coincidences or correlations.

One of my favorite correlations is the Washington Winner. From 1936 to 2000 the outcome of the US presidential election was perfectly correlated with the result of the Washington Commanders’ final home game before the election. If you wanted to know who would win the US presidential election (the incumbent party or the challenging party), all you had to do was check whether the Commanders won or lost their final home game in the election year. If the Commanders won their final home game in an election year, then the party that was already in office was going to win. If the Commanders lost, however, then the challenging party would win.

But of course, this was just a coincidence, a spurious correlation. As with the birthday example, if you look at lots of possible combinations (e.g., every possible stat from every possible sport, at every possible level, and every possible political outcome), you are mathematically guaranteed to find a bunch of coincidences, or correlations.
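A rough simulation shows how this happens. The sketch below treats each unrelated stat as a string of fair coin flips, one per election, which is an assumption made purely for illustration (real sports stats are not coin flips). It then counts how many of those random stats happen to line up perfectly with a fixed sequence of outcomes. The function names and the one million candidate stats are hypothetical.

```python
import random


def expected_matches(n_events, n_candidates):
    """Average number of random stats that match all n_events outcomes by luck."""
    return n_candidates / 2 ** n_events


def count_perfect_matches(n_events, n_candidates, seed=0):
    """Simulate random yes/no stats and count the ones that match every outcome."""
    rng = random.Random(seed)
    # One bit per election: did the incumbent party win?
    outcomes = rng.getrandbits(n_events)
    matches = 0
    for _ in range(n_candidates):
        # Each candidate stat is n_events independent coin flips.
        if rng.getrandbits(n_events) == outcomes:
            matches += 1
    return matches


n_events = 17            # elections from 1936 to 2000, one every four years
n_candidates = 1_000_000  # hypothetical number of unrelated stats examined

print("expected by chance:", round(expected_matches(n_events, n_candidates), 1))
print("found in simulation:", count_perfect_matches(n_events, n_candidates))
```

Any single stat has only a 1 in 131,072 chance of matching 17 elections in a row, but if you search through a million of them, you should expect several perfect "predictors" to turn up by luck alone.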

While correlations can be interesting, they really aren’t that helpful for us on their own. What we really want to know is whether a correlation reflects a causal relationship. Whether one thing causes another.

Trying to disentangle correlation from causation is one of the greatest challenges we face as humans. You might think I’m exaggerating the point, but if anything I think we vastly underappreciate it.

Think of all the times we’ve heard someone give an explanation for why they are successful. Some people credit their success to waking up early each day. Others credit their success to working late. Some credit their success to reading. Others credit their success to persistence. But what every explanation fails to take into account is the counterfactual. No one ever knows how things would have turned out for them had they behaved differently.

Disentangling correlation from causation is much harder than most of us realize. But the good news is that our species has developed a workaround, and it’s how I spend the majority of my time as an academic.

The greatest tool we have to identify causal relationships is to run randomized, controlled experiments.

If we get lots of people to participate in an experiment, and then randomly divide them into two groups, we can be reasonably confident that the two groups, on average, will be essentially the same on a surprisingly large number of characteristics.

Then if we give one group of people a treatment and the other group we hold constant, we’ve created an artificial counterfactual. We can see what happens to the group that gets the treatment and compare it to the group that didn’t get the treatment.

If the results between the two groups differ, we can be reasonably confident that it was the treatment that led to the change, because we held everything else constant. We’ve disentangled a correlation from a causation.
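The logic of the last few paragraphs can be sketched as a small simulation. Everything here is made up for illustration: each simulated person has a hidden baseline trait, the treatment adds a fixed boost of 2 points, and the names and numbers are assumptions rather than data from any real study.

```python
import random
from statistics import mean


def run_experiment(n_people=20_000, true_effect=2.0, seed=0):
    """Randomly split simulated people into two groups and compare outcomes."""
    rng = random.Random(seed)
    # Hidden trait that influences the outcome but that we never measure.
    baselines = [rng.gauss(50, 10) for _ in range(n_people)]

    # Random assignment: shuffle everyone, treat the first half.
    order = list(range(n_people))
    rng.shuffle(order)
    treated_ids = set(order[: n_people // 2])

    treated_base, control_base = [], []
    treated_out, control_out = [], []
    for i, base in enumerate(baselines):
        noise = rng.gauss(0, 5)
        if i in treated_ids:
            treated_base.append(base)
            treated_out.append(base + true_effect + noise)
        else:
            control_base.append(base)
            control_out.append(base + noise)

    return {
        # How different the groups were before treatment (should be near 0).
        "baseline_gap": mean(treated_base) - mean(control_base),
        # Difference in average outcomes (should be near true_effect).
        "estimated_effect": mean(treated_out) - mean(control_out),
    }


result = run_experiment()
print("baseline gap:", round(result["baseline_gap"], 2))
print("estimated effect:", round(result["estimated_effect"], 2))
```

Because assignment is random, the two groups end up nearly identical on the hidden trait, so the gap in outcomes lands close to the boost the treatment actually delivered.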

But so what? Why does this matter?

Ultimately, we want to know what will make us happy or successful. But the correlates of happiness and success are much less interesting than the causes of happiness and success.

And recognizing that correlation is not necessarily causation may be the first step in helping us make the right decisions that will cause happiness and success.

As Annie Duke says, there are only two things that determine the quality of our lives: luck, and the decisions we make.

And maybe the most important step toward making good decisions is understanding that correlation doesn’t always equal causation.

It’s a simple idea. Please take it seriously.
