If correlation doesn't imply causation, what does?




Wait, wait, wait. Correlation does not prove causation, but it sure as heck does imply it. MH17 is a good example of that.

Jeopardy music while I go read TFA… ding Okay, ‘imply’ in the scientific sense has a much stronger implication (har) than in the colloquial sense.

My attempt to translate: Correlation suggests causation.

Nice article, thanks Maggie. Today I Learned, and how can you top that?


Reading the article now, just popped back to drop this bomb:

I find it more than a little mind-bending that my heuristics about how to behave on the basis of statistical evidence are obviously not just a little wrong, but utterly, horribly wrong.


You have my attention. Go on.


Causation is demonstrated by evidence.



Eyes… glazing…

I’ll have to come back to this. In the meantime, I think I’ll get back to Gödel, Escher, Bach, which seems like taking a break from the heavy going.


In a nutshell, correlation DOES imply causation when imply is used colloquially.

…which is how the general public uses it most often.

Honestly it’s a phrase we should get rid of, it’s worse than the theory fiasco.

Thanks again, English language.


What are you implying here?


“Correlation suggests causation.”

What about “correlation implies you should check for causation”?


That when the English language (d)evolves in ways that cause perfectly sensible and useful statements to be completely misused by the general populace it literally makes my head explode.



Correlation does certainly suggest a connection, with a degree of strength proportional to the correlation. It doesn’t tell you which way the causality goes, or whether both phenomena are actually caused by (the same) something else, or whether your experiment was designed in such a way as to pre-select connected phenomena to look at. However, in many, if not most, cases involving a sufficiently strong correlation, we know enough to connect it to other things we know that will tell us which of those it is.
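
To make the "both caused by something else" case concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: `z` is a hypothetical common cause, and `x` and `y` each depend on `z` but not on each other. The helper names (`pearson`, `residuals`) are my own, not from any library.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, zs):
    """What is left of ys after removing a least-squares line fitted on zs."""
    n = len(ys)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [y - my - slope * (z - mz) for z, y in zip(zs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Assumed toy model: z drives both x and y; x and y never touch each other.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

r_raw = pearson(x, y)                                # strong, but not causal
r_adj = pearson(residuals(x, z), residuals(y, z))    # near zero once z is accounted for

print(f"correlation of x and y:              {r_raw:.2f}")
print(f"correlation after adjusting for z:   {r_adj:.2f}")
```

The raw correlation is strong even though neither variable influences the other, and it mostly vanishes once the common cause is accounted for. Of course this only works here because the sketch knows what `z` is; with real data, finding it is the hard part.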


I could care less; that’d make me sad.


If you’re looking to show causation, you can use the criteria set down by Bradford Hill. They’re for epidemiology, but do translate to other types of problem.

Using them may suggest causation, and the more that do apply, the stronger the case that causation exists. The wikilink has examples of this process as used in medicine.


I didn’t find anything in the article that showed any difference in my understanding of the word ‘imply’. Can you point it out to me?

In my mind correlation does imply causation, which is why experiments and theories are devised to either make causation explicit or to disprove unsupported implications.


Coincidence does not imply causation.

Correlation does imply causation.

Causation is determined via experiment where one can observe both the causation and resultant correlation.

Correlation -> Causation is often about having a good description of the mechanism.

I always refer to this chart, knowing that pirates enjoy and benefit from global warming:


The phrase “correlation doesn’t imply causation” is standing in for the statistical point that correlation doesn’t equal causation.

A better way to say it is that “correlation alone doesn’t imply causation”. That’s because there are correlations where no causation exists, and (like you say) testing or simply further examination may bear out the lack of causation.

Correlation alone doesn’t necessarily imply causation. Here are some examples why not. As humans we love to see patterns, and that can result in a false leap of logic. There are different types of correlations, with different strengths, and stronger correlations are more likely to imply causation, but more information is still needed to prove it. It’s best to hold off on assuming causation until you have a fuller picture.


That’s the best form of the statement I’ve seen so far.


It’s not about chance. Your data could have fallen out that way first time, but if you carry on repeating the experiment then it becomes increasingly unlikely to fall out the same way. We can even formulate laws about quantum things which predict average behaviours of things that are innately unpredictable. No, here we are talking about repeatable experiments which can lead us to the wrong conclusions.

Look at the graph on the right hand side of this article, and the bit of text with it…

That explains the voting problem.

A lot of the apparent correlations of A & B, particularly the funny ones, are generally because A and B are both varying with time. That is the mobile phones and Greek currency argument.
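
A rough sketch of that effect in Python, under made-up assumptions: `series_a` and `series_b` are two hypothetical quantities that each drift upward with time and otherwise have nothing to do with each other. Comparing step-to-step changes instead of raw levels is one simple (not the only) way to take the shared time trend out.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def trending_series(n, slope, noise, rng):
    """A straight-line trend in time plus independent random noise."""
    return [slope * t + rng.gauss(0, noise) for t in range(n)]

def changes(xs):
    """Step-to-step differences, which cancel a linear trend."""
    return [later - earlier for earlier, later in zip(xs, xs[1:])]

rng = random.Random(1)  # fixed seed so the run is repeatable

# Two unrelated quantities that both happen to grow over time (invented numbers).
series_a = trending_series(1000, slope=1.0, noise=20.0, rng=rng)
series_b = trending_series(1000, slope=0.3, noise=20.0, rng=rng)

r_levels = pearson(series_a, series_b)
r_changes = pearson(changes(series_a), changes(series_b))

print(f"correlation of raw levels:   {r_levels:.2f}")
print(f"correlation of the changes:  {r_changes:.2f}")
```

The levels correlate strongly only because both follow the clock; the changes, with the trend removed, should show next to nothing.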

Next, there are a huge number of possibilities where A & B are linked by a whole raft of causes and intermediate variables. And the final catch-all is that Descartes’ demon arranged for your data to come out like that, for no particular reason that you shall ever know, other than that he’s a bit of a dick.

That’s pretty much it. You can understand it all without going up a hat size.


The math is a bit complicated, but if you scroll down to the “tobacco causes lung cancer” example, it makes some intuitive sense. In that example, if your only three random variables are “smokes tobacco”, “gets cancer”, and a hidden variable “unknown genetic factor” which could potentially cause both smoking and cancer, you can’t prove that smoking causes cancer.

So in a simplified example you add an extra variable that represents an intermediate observable variable, “tar in lungs”, which is potentially caused by smoking, potentially causes cancer, but which you believe is not (directly) influenced by the unknown genetic factor. In this particular case, that may or may not be a valid assumption, but this is also a simplified model of a real causal network. If you do that, and you measure all the correlations between the observable variables (smoking, tar, cancer), you can now calculate, or at least bound, exactly how much smoking causes cancer independently of any genetic factor.

Of course, you never exclude all possible models, but that isn’t really the point. The point is that this is a technique to formally reason about how much influence confounding variables can have when you have a partial model for how they would work.
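
For anyone who wants to poke at it, here is a small Python sketch of that calculation. I believe this technique is usually called the front-door adjustment, though that name is my addition. All the probabilities below are invented for illustration, and the model assumes exactly what the simplified example assumes: the hidden factor `U` affects smoking `S` and cancer `C`, but reaches tar `T` only through smoking. The code builds the distribution an observer would see (with `U` hidden), then compares three numbers: the naive comparison, the front-door estimate from observables only, and the true effect computed by peeking at `U`.

```python
from itertools import product

# Invented numbers. U is the hidden genetic factor, never observed.
P_U = {0: 0.5, 1: 0.5}
P_S1_GIVEN_U = {0: 0.2, 1: 0.8}       # P(smokes | U)
P_T1_GIVEN_S = {0: 0.05, 1: 0.9}      # P(tar | smokes), no direct U influence
P_C1_GIVEN_TU = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.6}  # P(cancer | tar, U)

def bern(p1, value):
    """Probability of a binary value given P(value == 1)."""
    return p1 if value == 1 else 1.0 - p1

def observed_joint():
    """P(S, T, C) with the hidden U summed out: all an observer gets to see."""
    joint = {}
    for u, s, t, c in product((0, 1), repeat=4):
        p = (P_U[u] * bern(P_S1_GIVEN_U[u], s)
             * bern(P_T1_GIVEN_S[s], t) * bern(P_C1_GIVEN_TU[(t, u)], c))
        joint[(s, t, c)] = joint.get((s, t, c), 0.0) + p
    return joint

def mass(joint, s=None, t=None, c=None):
    """Total probability of the entries matching whichever values are given."""
    return sum(p for (js, jt, jc), p in joint.items()
               if s in (None, js) and t in (None, jt) and c in (None, jc))

def naive(joint, s):
    """P(C=1 | S=s): just compare smokers with non-smokers."""
    return mass(joint, s=s, c=1) / mass(joint, s=s)

def front_door(joint, s):
    """sum_t P(t|s) * sum_s2 P(C=1 | t, s2) * P(s2), observables only."""
    total = 0.0
    for t in (0, 1):
        p_t_given_s = mass(joint, s=s, t=t) / mass(joint, s=s)
        inner = sum(mass(joint, s=s2, t=t, c=1) / mass(joint, s=s2, t=t)
                    * mass(joint, s=s2) for s2 in (0, 1))
        total += p_t_given_s * inner
    return total

def truth(s):
    """P(C=1 | do(S=s)), computed by cheating and using the hidden U."""
    return sum(P_U[u] * bern(P_T1_GIVEN_S[s], t) * P_C1_GIVEN_TU[(t, u)]
               for u, t in product((0, 1), repeat=2))

joint = observed_joint()
naive_effect = naive(joint, 1) - naive(joint, 0)
front_door_effect = front_door(joint, 1) - front_door(joint, 0)
true_effect = truth(1) - truth(0)

print(f"naive difference:      {naive_effect:.3f}")
print(f"front-door estimate:   {front_door_effect:.3f}")
print(f"true causal effect:    {true_effect:.3f}")
```

With these made-up numbers the naive difference comes out at about 0.35, while the front-door estimate and the true effect agree at about 0.17. The agreement is only as good as the assumption that the hidden factor doesn’t touch tar directly; if it does, the estimate is off again.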


I don’t think that it’s valid for anyone to equate “correlation doesn’t imply causation” with “correlation doesn’t equal causation”. In my native language, American English, those are two different phrases, meaning two different things.

Implications aren’t necessarily true - they are possibilities or probabilities perhaps, but mostly they are just ideas that exist subjectively in the minds of observers. Different people might see entirely different implications given the same data set, due to factors like education or culture. To imply something is to suggest something indirectly, but even if a thing is suggested directly that does not make it true.

I’m going to continue to say that correlation doesn’t equal causation, and that good scientists are not necessarily good wordsmiths.


I wholly agree. American English is also my native tongue, and it has its foibles.

I didn’t say “Correlation doesn’t imply causation.” (That’s the old one.)
I said “Correlation alone doesn’t imply causation.”

The two statements are very different. Correlation may be a component of discovering causation, so the two can be related. It can, and often does, point to causality. The word “alone” provides a warning that correlation by itself isn’t enough to assume causation, and we need to be wary of false patterns. If correlation exists, we need to look for additional evidence of causality.

On the flip side: Correlation is almost guaranteed to exist if causality exists. So if you can prove causality, you may be able to find a correlation you were missing. Discoveries have been made this way - kind of in reverse.

The two concepts “correlation” and “causation” aren’t equal, but I think my statement gives a clearer picture of their relationship.