Understanding spurious correlation in data-mining

As the post states:

Given two measurements x_i in X and y_i in Y on a set of points p_1…n in P, if the value of x_i + y_i increases the chance that p_i will be sampled, it will introduce a phantom correlation between X and -Y.

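For the gun/racism study, that would translate to a “phantom correlation” if your gun-ownership status plus your racism score made it more likely that you would show up in the American National Election Study. Note that the phantom correlation is between X and -Y, so this kind of selection would push the observed gun/racism association downward rather than inflate it. If the sampling does work that way, then this particular issue is one to be aware of.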
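To make the quoted claim concrete, here is a rough simulation sketch. Everything specific in it is my own assumption, not something from the post or the survey: the two scores are independent standard normals, and the chance of being sampled follows a logistic curve in x + y with an arbitrary slope of 2. The names `pearson` and `simulate` are just illustrative helpers.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n=50_000, seed=0):
    """Return (correlation in full population, correlation in biased sample)."""
    rng = random.Random(seed)

    # Assumption: X and Y are independent in the full population.
    population = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

    # Assumption: inclusion probability rises with x + y (logistic, slope 2).
    sample = [
        (x, y)
        for x, y in population
        if rng.random() < 1 / (1 + math.exp(-2 * (x + y)))
    ]

    r_population = pearson(*zip(*population))
    r_sample = pearson(*zip(*sample))
    return r_population, r_sample


r_population, r_sample = simulate()
print(f"full population: r = {r_population:+.3f}")
print(f"biased sample:   r = {r_sample:+.3f}")
```

Under these assumptions the full population shows essentially no correlation, while the biased sample shows a negative correlation between X and Y, which is the same thing as a positive correlation between X and -Y. How large the effect is depends entirely on how strongly x + y drives inclusion, so the numbers here say nothing about the actual survey.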

Out of all the potential statistical problems with the gun/racism study, that’s a pretty minor one to worry about, though.
