# If correlation doesn't imply causation, what does?

**maggiekb**#1

**oldtaku**#2

Wait, wait, wait. Correlation does not *prove* causation, but it sure as heck does imply it. MH17 is a good example of that.

Jeopardy music while I go read TFA… *ding* Okay, "imply" in the scientific sense has a much stronger implication (har) than in the colloquial sense.

My attempt to translate: Correlation suggests causation.

Nice article, thanks Maggie. Today I Learned, and how can you top that?

**Kimmo**#3

Reading the article now, just popped back to drop this bomb:

I find it more than a little mind-bending that my heuristics about how to behave on the basis of statistical evidence are obviously not just a little wrong, but utterly, horribly wrong.

O_O

You have my attention. Go on.

**Kimmo**#5

Phew…

Eyes… glazing…

I'll have to come back to this. In the meantime, I think I'll get back to *Gödel, Escher, Bach*, which seems like taking a break from the heavy going.

**William_Holz**#6

In a nutshell, correlation DOES imply causation *when imply is used colloquially.*

…which is how the general public uses it most often.

Honestly it's a phrase we should get rid of, it's worse than the theory fiasco.

Thanks again, English language.

"Correlation suggests causation."

What about "correlation implies you should check for causation"?

**William_Holz**#9

That when the English language (d)evolves in ways that cause perfectly sensible and useful statements to be completely misused by the general populace it *literally* makes my head explode.

**Space_Monkey**#10

Correlation does certainly suggest a *connection*, with a degree of strength proportional to the correlation. It doesn't tell you which way the causality goes, or whether both phenomena are actually caused by (the same) something else, or whether your experiment was designed in such a way as to pre-select connected phenomena to look at. However, in many, if not most, cases involving a sufficiently strong correlation, we know enough to connect it to other things we know that will tell us which of those it is.
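
To make the "both caused by something else" case concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: `z` is a hypothetical common cause, and `a` and `b` each depend on `z` plus their own noise but never on each other. The `pearson` helper is just the textbook correlation formula.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000               # hypothetical sample size

# Hypothetical common cause z drives both a and b.
z = [rng.gauss(0, 1) for _ in range(n)]
a = [zi + rng.gauss(0, 1) for zi in z]
b = [zi + rng.gauss(0, 1) for zi in z]

# a and b are clearly correlated (about 0.5 in theory for this setup),
# and the number is symmetric, so it cannot say which way causality goes.
r_ab = pearson(a, b)
r_ba = pearson(b, a)

# Remove z's contribution (we know its coefficient is 1 by construction).
# What is left of a and b is independent noise, so the correlation vanishes.
a_rest = [ai - zi for ai, zi in zip(a, z)]
b_rest = [bi - zi for bi, zi in zip(b, z)]
r_without_z = pearson(a_rest, b_rest)

print(f"corr(a, b)               = {r_ab:.3f}")
print(f"corr(b, a)               = {r_ba:.3f}")
print(f"corr(a, b) with z removed = {r_without_z:.3f}")
```

In real data you usually do not get to observe `z` directly, which is the hard part; the sketch only shows why a strong correlation can appear with no causal arrow between the two things being measured.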

**catgrin**#12

If you're looking to show causation, you can use the criteria set down by Bradford Hill. They're for epidemiology, but do translate to other types of problem.

Using them may suggest causation, and the more that do apply, the stronger the case that causation exists. The wikilink has examples of this process as used in medicine.

**Medievalist**#13

I didn't find anything in the article that showed any difference in my understanding of the word "imply". Can you point it out to me?

In my mind correlation does imply causation, which is why experiments and theories are devised to either make causation explicit or to disprove unsupported implications.

**disarticulate**#14

Coincidence does not imply causation.

Correlation does imply causation.

Causation is determined via experiment where one can observe both the causation and resultant correlation.

Correlation -> Causation is often about having a good description of the mechanism.

I always refer to this chart, knowing that pirates enjoy and benefit from global warming:

**catgrin**#15

The phrase "correlation doesn't imply causation" is being used in place of a statistics term where correlation doesn't equal causation.

A better way to say it is that "correlation *alone* doesn't imply causation". That's because there are correlations where no causation exists, and (like you say) testing or simply further examination may bear out the lack of causation.

Correlation alone doesn't necessarily imply causation. Here are some examples why not. As humans we love to see patterns, and that can result in a false leap of logic. There are different types of correlations, with different strengths, and stronger correlations are more likely to imply a causation, but more information is still needed to prove it. It's best to hold off on assuming causation until you have a fuller picture.

**Richard_Kirk**#17

It's not about chance. Your data could have fallen out that way first time, but if you carry on repeating the experiment then it becomes increasingly unlikely to fall out the same way. We can even formulate laws about quantum things which predict average behaviours of things that are innately unpredictable. No, here we are talking about repeatable experiments which can lead us to the wrong conclusions.

Look at the graph on the right hand side of this article, and the bit of text with it…

That explains the voting problem.

A lot of the apparent correlations of A & B, particularly the funny ones, are generally because A and B are both varying with time. That is the mobile phones and Greek currency argument.
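
A rough sketch of the "both varying with time" effect, in Python. The two series are made up: each is a straight-line trend plus independent noise, with slopes and noise levels chosen arbitrarily. Comparing the raw levels gives a huge correlation; comparing the step-to-step changes (first differences, one common way to take a trend out) gives roughly none.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(1)  # fixed seed so the run is repeatable
steps = 1000            # hypothetical number of time steps

# Two unrelated quantities that both happen to grow over time.
a = [0.5 * t + rng.gauss(0, 5) for t in range(steps)]
b = [2.0 * t + rng.gauss(0, 20) for t in range(steps)]

# Raw levels: the shared upward trend dominates everything.
r_levels = pearson(a, b)

# First differences: the trend becomes a constant, only the noise is left.
da = [a[i + 1] - a[i] for i in range(steps - 1)]
db = [b[i + 1] - b[i] for i in range(steps - 1)]
r_changes = pearson(da, db)

print(f"correlation of levels  = {r_levels:.3f}")
print(f"correlation of changes = {r_changes:.3f}")
```

Time itself is acting as the common cause here, so any two things that trend the same way over the same period will look linked until the trend is removed.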

Next, there are a huge lot of possibilities where A & B are linked by a whole raft of causes and intermediate variables. And the final catch-all is that Descartes' demon arranged for your data to come out like that, for no particular reason that you shall ever know, other than he's a bit of a dick.

That's pretty much it. You can understand it all without going up a hat size.

**ejeffrey**#18

The math is a bit complicated, but if you scroll down to the "tobacco causes lung cancer" example, it makes some intuitive sense. In that example, if your only three random variables are "smokes tobacco", "gets cancer", and a hidden variable "unknown genetic factor" which could potentially cause both smoking and cancer, you can't prove that smoking causes cancer.

So in a simplified example you add an extra variable that represents an intermediate observable variable, "tar in lungs", which is potentially caused by smoking, potentially causes cancer, but which you believe is not (directly) influenced by the unknown genetic factor. In this particular case, that may or may not be a valid assumption, but this is also a simplified model of a real causal network. If you do that, and you measure all the correlations between the observable variables (smoking, tar, cancer), you can now calculate, or at least bound, exactly how much smoking causes cancer independently of any genetic factor.

Of course, you never exclude all possible models, which isn't really the point. The point is that this is a technique to formally reason about how much influence confounding variables can have when you have a partial model for how they would work.
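
For anyone who wants to see the smoking/tar/cancer calculation run, here is a toy sketch in Python. I am assuming the technique in question is what is usually called the front-door adjustment, P(c | do(s)) = Σ_t P(t | s) Σ_s' P(c | t, s') P(s'). All the probabilities below are invented, not real epidemiology. The hidden gene `u` is used only to build the "observed" table over (smoking, tar, cancer) and to compute the true answer for comparison; the estimate itself never looks at it.

```python
from itertools import product

# Hypothetical model parameters, made up purely for illustration.
P_U = {0: 0.5, 1: 0.5}           # hidden genetic factor
P_S1_GIVEN_U = {0: 0.2, 1: 0.8}  # gene makes smoking more likely
P_T1_GIVEN_S = {0: 0.1, 1: 0.9}  # smoking causes tar; gene does not

def p_c1(t, u):
    """P(cancer = 1 | tar, gene): both tar and the gene raise the risk."""
    return 0.1 + 0.4 * t + 0.4 * u

def bern(p1, value):
    """Probability of a 0/1 value when P(value = 1) is p1."""
    return p1 if value == 1 else 1.0 - p1

# Observed joint P(s, t, c): the hidden gene is summed out.
obs = {}
for u, s, t, c in product((0, 1), repeat=4):
    p = (P_U[u] * bern(P_S1_GIVEN_U[u], s)
         * bern(P_T1_GIVEN_S[s], t) * bern(p_c1(t, u), c))
    obs[(s, t, c)] = obs.get((s, t, c), 0.0) + p

def prob(s=None, t=None, c=None):
    """Marginal probability from the observed table; None means any value."""
    return sum(p for (si, ti, ci), p in obs.items()
               if s in (None, si) and t in (None, ti) and c in (None, ci))

def naive(s):
    """Plain conditional P(cancer | smoking = s), confounded by the gene."""
    return prob(s=s, c=1) / prob(s=s)

def front_door(s):
    """Front-door estimate of P(cancer | do(smoking = s)), observables only."""
    total = 0.0
    for t in (0, 1):
        p_t_given_s = prob(s=s, t=t) / prob(s=s)
        inner = sum(prob(s=s2, t=t, c=1) / prob(s=s2, t=t) * prob(s=s2)
                    for s2 in (0, 1))
        total += p_t_given_s * inner
    return total

def truth(s):
    """True P(cancer | do(smoking = s)), computed with access to the gene."""
    return sum(P_U[u] * bern(P_T1_GIVEN_S[s], t) * p_c1(t, u)
               for u in (0, 1) for t in (0, 1))

naive_diff = naive(1) - naive(0)
front_door_diff = front_door(1) - front_door(0)
true_diff = truth(1) - truth(0)

print(f"naive difference      = {naive_diff:.2f}")
print(f"front-door difference = {front_door_diff:.2f}")
print(f"true causal effect    = {true_diff:.2f}")
```

In this toy model the naive comparison of smokers and non-smokers overstates the effect because the gene pushes both smoking and cancer, while the front-door estimate lands on the true value. That only works because the model was built so that tar is not influenced by the gene; if that assumption fails, the estimate is no longer trustworthy.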

**Medievalist**#19

I don't think that it's valid for anyone to equate "correlation doesn't imply causation" with "correlation doesn't equal causation". In my native language, American English, those are two different phrases, meaning two different things.

Implications aren't necessarily true - they are possibilities or probabilities perhaps, but mostly they are just ideas that exist subjectively in the minds of observers. Different people might see entirely different implications given the same data set, due to factors like education or culture. To *imply* something is to suggest something indirectly, but even if a thing is suggested *directly* that does not make it *true*.

I'm going to continue to say that correlation doesn't equal causation, and that good scientists are not necessarily good wordsmiths.

**catgrin**#20

I wholly agree. American English is also my native tongue, and it has its foibles.

I didn't say "Correlation doesn't imply causation." (That's the old one.)

I said "Correlation **alone** doesn't imply causation."

The two statements are very different. Correlation may be a component to discovering causation, so the two can be related. It can, and often does, point out causality. The word "alone" provides a warning that just correlation isn't enough to assume causation, and we need to be wary of false patterns. If correlation exists, we need to look for additional evidence of causality.

On the flip side: Correlation is almost guaranteed to exist if causality exists. So if you can prove causality, you may be able to find a correlation you were missing. Discoveries have been made this way - kind of in reverse.

The two concepts "correlation" and "causation" aren't equal, but I think my statement gives a clearer picture of their relationship.