Understanding spurious correlation in data-mining

I don’t see the cause of confusion.

It’s one thing to say “If you take a humongous pile of Big Data and just randomly run regressions, you are going to find correlations that show up in the sample purely by chance and don’t reflect any real relationship.”
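To make that first point concrete, here is a rough sketch of what "randomly running regressions" looks like. Everything in it is made up for illustration: the data is pure random noise, and the function names, the row count, and the number of candidate variables are my own assumptions, not anything from the study being discussed. It correlates one random "target" column against many unrelated random columns and keeps the strongest result.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_rows=50, n_candidates=1000, seed=0):
    """Correlate one noise column against many unrelated noise columns
    and return the correlation with the largest absolute value.

    Hypothetical setup: every column is independent Gaussian noise,
    so any correlation found here is spurious by construction.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_rows)]
        r = pearson(target, candidate)
        if abs(r) > abs(best):
            best = r
    return best


# One pre-chosen comparison versus the best of a thousand fishing attempts.
single = strongest_chance_correlation(n_candidates=1)
mined = strongest_chance_correlation(n_candidates=1000)
print(f"one comparison:       r = {single:+.2f}")
print(f"best of 1000 tries:   r = {mined:+.2f}")
```

With only one comparison, the correlation between two noise columns is usually small. Pick the best of a thousand and you will almost always get something that looks respectable, even though nothing is related to anything. That is the data-mining problem, and it is a different situation from testing a single hypothesis you had in advance.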

It’s another thing to say “If you have a hypothesis about two things, gather the data, and see that the two are correlated, then that adds some evidence that there is a connection between them.”

In the case of the guns study, there are numerous plausible hypotheses. Off the top of my head: certain states have more people with guns, and those same states have more people who scored higher on that “symbolic racism” test (maybe because of the wording of the test), so the two end up correlated. Just because this one got your goat emotionally (because it involves guns) doesn’t mean you have to say that all statistics are bullshit.
