# Understanding spurious correlation in data-mining

Wait - like a study showing that some racist, white people have guns?

Now I’m confused.

“Anyone who thinks it’s possible to draw truthful conclusions from data analysis without really learning statistics needs to read this.”

That’s all good and true. You probably can’t trust the statistical results presented by someone who never really learned statistics. Similarly, though, you shouldn’t put too much weight on critiques of statistical techniques by people who never really learned statistics either. I’m not saying the latter applies here.

I think a clearer message would be, “Dear non-statisticians, please realize that other non-statisticians do bad statistical analysis.”

There really isn’t much here for statisticians to learn from. They already know this stuff. It’s more about educating laypeople and/or creating a sense of superiority in one group of non-statisticians over another group of non-statisticians.

The bad thing about “geek chic” is that it’s vastly increased the number of dilettantes who like to lecture people about math and science when their credentials don’t extend much past having watched Battlestar Galactica or whatever it is they think gives them nerd-cred.

I don’t see the cause of confusion.

It’s one thing to say “If you take a humongous pile of Big Data and just randomly run regressions, you are going to find correlations that don’t exist.”

It’s another thing to say “if you have a hypothesis about two things, and gather the data and see that there is a correlation between the two things, then that adds some evidence that there is a connection between the two things.”
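The first of those two statements is easy to see in a toy simulation. This is only a sketch under made-up assumptions: the number of variables, the sample size, the 0.36 threshold, and the names `pearson` and `count_chance_hits` are all mine, not anything from the post or the study. It generates columns of independent noise and counts how many pairs still look "correlated".

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def count_chance_hits(n_vars=40, n_obs=30, threshold=0.36, seed=0):
    """Count pairs of unrelated noise columns whose |r| exceeds threshold.

    All defaults are illustrative assumptions. 0.36 is roughly the |r|
    that a two-sided 5% test would flag with 30 observations.
    """
    rng = random.Random(seed)
    # Every column is independent Gaussian noise: no real relationships exist.
    columns = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    pairs = list(combinations(range(n_vars), 2))
    hits = sum(
        1 for i, j in pairs
        if abs(pearson(columns[i], columns[j])) > threshold
    )
    return hits, len(pairs)


hits, total = count_chance_hits()
print(f"{hits} of {total} unrelated pairs exceed the threshold")
```

With 40 columns there are 780 pairs to test, so even a 5% false-positive rate is expected to turn up dozens of "findings" in data that contains none. Testing one pair chosen in advance because of a hypothesis does not have this problem, which is the distinction being drawn here.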

In the case of the guns study, there are numerous plausible hypotheses (off the top of my head: there are more people with guns in certain states, and there are more people who got a higher score on that “symbolic racism” test in those states (maybe because of the wordings in the test), so the two are therefore correlated). Just because this one emotionally got your goat (because it involves guns) doesn’t mean you have to say that all statistics are bullshit.

As the post states:

Given two measurements $x_i$ in $X$ and $y_i$ in $Y$ on a set of points $p_{1 \ldots n}$ in $P$, if the value of $x_i + y_i$ increases the chance that $p_i$ will be sampled, it will introduce a phantom correlation between $X$ and $-Y$
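A quick simulation makes the quoted claim concrete. It is a sketch under my own assumptions, not the post's code: `x` and `y` are independent standard normals, and "more likely to be sampled" is pushed to the extreme of a hard cutoff, so a point is kept only when `x + y` exceeds `cutoff`. The names `pearson` and `simulate_selection` are hypothetical.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_selection(n=20000, cutoff=1.0, seed=0):
    """Return (r in the full population, r in the selected sample).

    Assumption: selection is a hard rule, keep a point only if x + y > cutoff.
    The quoted statement only requires that x + y raises the chance of
    being sampled; the hard cutoff just makes the effect easy to see.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rng.gauss(0, 1) for _ in range(n)]  # drawn independently of xs
    kept = [(x, y) for x, y in zip(xs, ys) if x + y > cutoff]
    r_all = pearson(xs, ys)
    r_sel = pearson([x for x, _ in kept], [y for _, y in kept])
    return r_all, r_sel


r_all, r_sel = simulate_selection()
print(f"full population r = {r_all:+.3f}")
print(f"selected sample r = {r_sel:+.3f}")
```

In the full population the correlation sits near zero, as it should for independent draws. Among the selected points it comes out clearly negative, which is the same thing as a positive correlation between $X$ and $-Y$: once you know a point made the cut, a low `x` implies a high `y` and vice versa.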

For the gun/racism thing that would translate to a “phantom correlation” if your gun-ownership status plus your racism score made it more likely that you would show up in the American National Election Study. If that is the case, then this particular issue is one to be aware of.

Out of all the potential statistical problems with the gun/racism study, that’s a pretty minor one to worry about though.

