Everyone who has paid attention to more than one lecture in inferential statistics knows that you have to correct for the number of tests you're doing (dividing the threshold by the number of tests, .01/N, is the simplest method, but there are many more sophisticated ones). But that doesn't solve the problem: even if you do that, there are still going to be significant outcomes and interesting-looking correlations, purely by chance, if you just look at enough parameters. P values only work if you had a hypothesis beforehand; it's in their very nature. That part doesn't get taught too eagerly in courses nowadays, although that's changing in the wake of the replication crisis / credibility revolution in psychology and other fields.
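A minimal sketch of the phenomenon (assuming numpy and scipy are available; all the numbers here are made up for illustration): test 1000 pure-noise variables against a pure-noise outcome at p < .01 and count how many "significant" correlations show up by chance alone. The Bonferroni line only helps if N honestly counts every comparison you looked at, which is exactly what goes missing in exploratory fishing.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects, n_params = 100, 1000

# No real signal anywhere: outcome and predictors are independent noise.
outcome = rng.normal(size=n_subjects)
predictors = rng.normal(size=(n_subjects, n_params))

# p value of the correlation between each predictor and the outcome
p_values = np.array([
    stats.pearsonr(predictors[:, i], outcome)[1]
    for i in range(n_params)
])

print("significant at p < .01:", np.sum(p_values < .01))        # ~10 expected by chance
print("significant after Bonferroni (.01/N):",
      np.sum(p_values < .01 / n_params))                        # usually 0
```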
Philosophers of science and even some psychologists have talked about this problem for decades, but not much has changed, because how do you publish if you just don't find anything interesting? Finding no significant p values doesn't count as interesting (although it might well be). This is how you get publication bias.
I guess the same thing applies to big data: if you're a marketing consultancy and your clients pay you to find patterns in some user data, and all you find is a big chunk of uninteresting noise, well, you're sure as hell going to squeeze blood from a stone to present something "groundbreaking" (and get paid).
(Also, pedantry: it's p < .01, not >)