500 phrases from scientific publications that are correlated with bullshit

I’m going to start The Journal of Negative Results. I’m not entirely unserious about this.


We’ll have to disagree there. I was always taught, and continue to believe, that a p-value is binary in the way we use it: you either reach significance or you don’t. Getting a smaller p-value than you were hoping for doesn’t mean your results are any truer, better, or more “significant” in a real-world context. That discussion is reserved for effect size or another practical measure, such as how many people you have to treat in order to get one positive result. Perhaps it’s like being “very” or “more” pregnant; it just doesn’t make sense (BTW, I hate that pregnant-or-not-pregnant binary I just used to illustrate the point).
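To make the “how many people you have to treat” measure concrete, here’s a minimal sketch of the number-needed-to-treat (NNT) arithmetic. All the event rates below are made up purely for illustration:

```python
# Number needed to treat (NNT): the reciprocal of the absolute risk
# reduction between two groups. All figures below are hypothetical.
control_event_rate = 0.10  # 10% of untreated patients have the bad outcome
treated_event_rate = 0.08  # 8% of treated patients do

absolute_risk_reduction = control_event_rate - treated_event_rate  # 0.02
nnt = 1 / absolute_risk_reduction  # 50.0

print(f"NNT = {nnt:.0f}: treat about {nnt:.0f} people to prevent one bad outcome")
```

A treatment can clear p < 0.05 and still carry an NNT in the hundreds, which is exactly the real-world context a p-value alone doesn’t give you.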

BTW, I hate p-values and the frequentist statistical approach generally. I would much prefer that social scientists made more use of regression analysis, where variance can be measured across multiple variables at once. I think this is much more useful than rejecting the null hypothesis for a series of single hypotheses.
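For what it’s worth, here’s a minimal sketch of that regression approach on simulated data. The predictors, coefficients, and noise level are all invented for illustration, and it assumes numpy and statsmodels are available:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
age = rng.uniform(20, 70, n)     # hypothetical predictor
income = rng.normal(50, 15, n)   # hypothetical predictor (thousands)
outcome = 0.5 * age + 0.2 * income + rng.normal(0, 10, n)  # simulated response

# Fit an ordinary least squares model with both predictors at once.
X = sm.add_constant(np.column_stack([age, income]))
model = sm.OLS(outcome, X).fit()

# The summary reports coefficient estimates, confidence intervals, and R^2,
# i.e. how much variance each variable accounts for, not just a yes/no verdict.
print(model.summary())
```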

Is there any reason why significance should be a strictly binary proposition? I’d think that saying so is somewhat arbitrary, grounded in convention rather than in any inherently binary nature of the idea of significance. However, agreeing on a threshold for significance beforehand and then trying to change it post hoc is goalpost-moving and post-hoc rationalization, that is, bad science.

I think there’s a confusion between the binary fact of whether there IS or ISN’T a correlation and our certainty about whether we know there is one.

Whether or not there exists a correlation is binary, as you say. It may be a strong correlation or a weak correlation, but there either is or isn’t a correlation between them. And this is what we’re trying to determine with p-values, that’s true.

However, we cannot be certain whether there is a correlation, or whether all the results we’ve seen are just luck. The p-value tells us how unlikely results like ours would be if there were no real correlation; loosely, a small p-value increases our confidence that the correlation is real. That confidence can never reach 100%, since we can never be absolutely certain that there is a correlation, but we can quantify it.

This is what we mean by results being “strongly” significant: that our confidence in the correlation being real is higher. But you’re right that it’s a poor phrase. The significance itself is still binary; it’s our confidence about it that has increased.

That said, “you either reach significance or you don’t,” while true in practice (articles tend to just say they’re “significant” when they hit that 0.05 level), isn’t really true. The 0.05 boundary is completely arbitrary. There really is a meaningful difference between a p-value of 0.00001 and one of 0.05: roughly 1 out of 20 studies that just scrape past p = 0.05 are likely to be overturned when someone repeats the study, even though they claimed to be “significant,” while those with a p-value of 0.00001 are extremely unlikely to be overturned (using the same methodology, that is). That’s useful information to know.
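That intuition is easy to check with a quick simulation. Here’s a sketch (the sample sizes and trial count are arbitrary choices; it assumes numpy and scipy): run many experiments where there is no real effect at all, and count how often each threshold gets crossed by luck alone.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
trials = 10_000

# Two groups drawn from the SAME distribution, so the null is true by design.
pvals = np.array([
    ttest_ind(rng.normal(0, 1, 30), rng.normal(0, 1, 30)).pvalue
    for _ in range(trials)
])

# Roughly 5% of these no-effect studies still clear the 0.05 bar by chance;
# essentially none clear 0.00001.
print(f"p < 0.05:    {(pvals < 0.05).mean():.1%} of null studies")
print(f"p < 0.00001: {(pvals < 1e-5).mean():.3%} of null studies")
```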

Right, the effect size is completely separate from the p-value (except that to talk about an effect size at all, you have to assume any correlation you see is real), and too often it’s ignored. (Witness the kerfuffle over eating bacon having a “significant” effect on cancer rates, when the actual effect size was tiny.)
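The bacon point is easy to demonstrate too: with a big enough sample, even a trivial effect produces an impressively small p-value. A sketch with invented numbers (assumes numpy and scipy):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n = 200_000  # a very large study
control = rng.normal(100.0, 15.0, n)
treated = rng.normal(100.3, 15.0, n)  # true shift of 0.3, i.e. 0.02 standard deviations

result = ttest_ind(treated, control)
cohens_d = (treated.mean() - control.mean()) / 15.0  # standardized effect size

print(f"p-value   = {result.pvalue:.1e}   (wildly 'significant')")
print(f"Cohen's d = {cohens_d:.3f}        (a negligible effect)")
```

Both numbers are true at once: the effect almost certainly exists, and it almost certainly doesn’t matter.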

It was fun to read that piece, but unfortunately it perpetuates the notion that statistical inference can only ever be binary. As several commenters have pointed out, this is not true. Deciding to treat inference that way is not unreasonable, but it isn’t the only reasonable decision. A strength-of-evidence approach is also entirely reasonable, and the two views each have (unsurprisingly) plusses and minuses. Fuller discussion here: http://wp.me/p5x2kS-cR
