I was expecting phrases like “moderately powered” (n<=20) and such.
I’m glad he started off with a bit about how screwed up the frequentist paradigm is overall, because in large part this kind of bullshit is due to the failings of that conceptual framework for hypothesis testing.
p < 0.05 is a pretty arbitrary border though. It’s only a limit for significance through convention. Calling 0.1 significant is idiotic, but 0.052 or even 0.06 is close enough to at least warrant a discussion.
The real problem is that p-values are the fig leaf. People across many disciplines use statistics improperly. Even highly educated people lazily bandy things like p-values about, and the speed-reading masses skip to the part where a magical number means “SCIENCE HAS PROCLAIMED IT TO BE TRUE!” All a p-value of 0.05 really means is that results at least as extreme as the ones observed would turn up about 5% of the time if chance alone were at work. That’s still a pretty high probability if you think of it. Anyhow, the point I want to make is that the choice of statistical test, and of course the raw data itself, is far more important.
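For concreteness, here’s a quick simulation of what that 5% buys you (a rough sketch of my own in Python/scipy; the two-sample t-test and the sample sizes are just illustrative assumptions): when there is truly no effect, roughly 1 in 20 experiments still lands under 0.05.

```python
# Minimal sketch: the false-positive rate when the null hypothesis is actually true.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_experiments = 10_000
false_positives = 0
for _ in range(n_experiments):
    # Both groups drawn from the SAME distribution, so there is no real effect.
    a = rng.normal(loc=0.0, scale=1.0, size=30)
    b = rng.normal(loc=0.0, scale=1.0, size=30)
    if stats.ttest_ind(a, b).pvalue < 0.05:
        false_positives += 1

print(f"Fraction of null experiments with p < 0.05: {false_positives / n_experiments:.3f}")
# Prints roughly 0.05: the threshold is the false-alarm rate you agreed to tolerate,
# not the probability that any particular finding is true.
```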
Can you blame them? There’s no money to be made proving the null hypothesis.
I use “hitch in my/his/her giddy-up” often, scientifically speaking that is.
Didn’t phdcomics do something like this? All I could find was
And of course there’s https://xkcd.com/882/ .
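For anyone who hasn’t clicked through, the jelly-bean problem in that comic is easy to simulate (a rough sketch of my own; the 20 colors, alpha = 0.05, and group sizes are assumptions): run 20 independent tests of a true null and see how often at least one comes up “significant” anyway.

```python
# Sketch of the xkcd 882 multiple-comparisons problem: 20 null tests at alpha = 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(882)
alpha, n_colors, n_trials = 0.05, 20, 5_000

at_least_one_hit = 0
for _ in range(n_trials):
    pvals = []
    for _ in range(n_colors):
        jelly_bean_group = rng.normal(size=40)   # no real effect in either group
        control_group = rng.normal(size=40)
        pvals.append(stats.ttest_ind(jelly_bean_group, control_group).pvalue)
    if min(pvals) < alpha:
        at_least_one_hit += 1

print(f"P(at least one 'significant' color) ≈ {at_least_one_hit / n_trials:.2f}")
print(f"Theory: 1 - 0.95**20 = {1 - 0.95**20:.2f}")   # about 0.64
```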
P values are not magic.
Let’s suppose that someone does a study showing that a harmless drug extends life tenfold, whitens your teeth permanently, and tans you seasonally and appropriately. And that this study has a p-value of 0.2.
Significant? Not by our conventional cutoff of p < 0.05. Which is just that, a convention. But it still means that results like these would only turn up by chance alone about 20% of the time, which leaves the magic sauce very much in play.
But – you interject – that doesn’t pass the eye test. It’s too good to be true. There must be methodological flaws.
Probably. A p-value is not a posterior probability; it tells you nothing about how plausible the claim was before the study. And if your pretest probability of this thing being true is very, very low, you may reasonably suspect that the study includes assumptions or errors, and be pretty certain that something is wrong. We encounter this all the time when we argue with people who have excellent formal logic built on weird initial assumptions; that combination leads to crazy talk. Viz. Ayn Rand. So you go back and do an independent study, and another, and a few more, and if five in a row support it, the odds that all these studies with different methodologies, all leading to the same result, are due to chance become exponentially smaller.
P<0.05 is not magic. As in all things, you use it as a way to calibrate your uncertainty, and to hedge your bets.
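To put rough numbers on the pretest-probability point, here is a toy calculation (my own sketch: the 1% prior, 80% power, and alpha of 0.05 are assumed, and “p < 0.05” is treated as a simple yes/no outcome):

```python
# How much does one "significant" study move a very low pretest probability,
# and how much do five independent ones move it?
def posterior_after_significant(prior, power=0.8, alpha=0.05):
    """P(effect is real | one study with p < alpha), via Bayes' rule on the odds."""
    prior_odds = prior / (1 - prior)
    likelihood_ratio = power / alpha   # P(significant | real) / P(significant | null)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

belief = 0.01   # assumed pretest probability for a life-extending, teeth-whitening sauce
for study in range(1, 6):
    belief = posterior_after_significant(belief)
    print(f"after study {study}: P(effect is real) ≈ {belief:.3f}")
# One positive study leaves a far-fetched claim still probably false (≈ 0.14);
# five independent, methodologically sound ones push it past 0.999.
```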
There is nothing wrong with publishing negative results, and, in fact, it is actually a good idea. When researchers engage in “file drawer bias” by not submitting negative results (or when journals engage in “publication bias” by declining to accept negative studies) the published record becomes skewed in favor of positive results, whether they are representative or not, as XKCD illustrated so well.
The issue is not the publishing of negative results; the issue is the use of misleading claims that results falling short of the chosen threshold for statistical significance are nonetheless statistically significant. This relates to your other post about scientific publishing:
Replication, or reproducibility, is a key to the methodology of science. Not only are many initial studies not replicable, journals often refuse to publish negative results of attempts to replicate a study.
Evidence for precognition is big news, but leading journals won’t touch repeat studies that fail to replicate the results.
The bias against negative studies is, I think, one of the reasons that people try to claim that negative results are positive. We should encourage the publication of high quality negative studies. Doing so may be a way to reduce the perceived incentives to make misleading claims about the statistical significance of study results.
This is why registered studies and registered clinical trials are so important, and need to be further enforced.
The FDA ought to require registration of all trials for submitted drugs and devices. Otherwise the file drawer effect runs rampant.
Pre-registered studies in regular journals are also a boon to the researchers themselves. The current widespread model is to “generate results”: interesting ones, so that they get published. But in the few places that have started pre-registering, publication of the study is guaranteed, so researchers don’t feel forced to get the outcomes they want, and journals themselves are less prone to ignoring all the important replication work.
Of course this only works if journals are willing to do pre-registration, and willing to pre-register replication studies as well.
On the other hand this piece of magnificent gobbledygook was actually accepted by a real math journal. The references at the end are what should have given it away, but…
See also: blog post about the thing.
Life gets interesting when you edit translations of abstracts for articles on the beneficial effects of Buddhism in sociology, poli sci, etc., by Buddhist monks at a large Southeast Asian seminary university that shall remain nameless.
"Weasel words for p > 0.05? We don’t need no stinkin’ weasel words for p > 0.05. We don’t get p > 0.05.
“Repeat after us: qualitative methods…”
Phra Marasatsana has this science thing figured out.
There are, of course, several different standards for statistical significance. p < .05 is typically used in the social sciences; p < .001 is the standard in other sciences. You can set the boundary anywhere you want, except that it has to be accepted by the rest of the scientists in your discipline.
What did we know and when did we know it?
Knowledge is a tricky thing. Each individual has a different standard for deciding to accept something as knowledge or not. Some people accept hearsay. Some people accept witnessing. Some people accept “science” as it is reported in the news. Some people accept knowledge they can verify in their own labs. My point is that the social and individual factors play a huge role in any person’s decision to decide that something is known or not.
Is there an objectively correct standard for knowledge? I doubt it, but I’m willing to entertain arguments (or better yet, evidence) to the contrary.
For all that it is good to be skeptical, too much skepticism paralyzes people. So at some point, a different point for every individual, we must take a leap of faith in deciding how much evidence of what quality is necessary to decide whether we know enough to take action. Uncertainty is rampant. And still, we survive.
I would also like to add that “insignificant” is not an accurate way to describe a non-significant result, and “very significant” or “of strong significance” make equally little sense when a small p-value is obtained. I was taught that there is only one way to report a result whose p-value exceeds the threshold you chose for the experiment: not significant. A significant p-value only signals the beginning of a discussion; it is not the end of the discussion.
Calling that a real journal is a stretch. It’s a scam, not a journal.
p < .05 is standard for medical research as well.
Also wanted to support the publishing of negatives. “Trend towards significance” and the like are valid as long as they are clearly stated and the suggested action based on them is further research. A study that returns a p of .055 may have had an anomaly or a poorly controlled confounding variable. One that returns a p of .89 is almost certainly negative if the methodology is sound.
I’m surprised that the publishing of negative results hasn’t become more popular in the age of the internet. Shooting down other people’s arguments and evidence is the universal pastime of internet chatter. Sure, compared to buzz news, scientific forums would move at a pitch-drop pace, but if they learned to incorporate memes when letting each other know that their results were not reproducible, it might garner more public attention.
“ermagerd intenervning veriabers!”
*awkward penguin image* “results couldn’t be reproduced independently by 3 other labs in 6 additional trials”
*willy wonka patronizing face* “Oh really, tell me more about your p < 0.05 significance…”
“I’m in yur methodologies section, pointing out flaws”
etc…
That doesn’t make sense. p-values are a continuum, describing how strongly the data speak against chance as the sole explanation, and the case against chance gets stronger as p approaches 0.
A p-value of 0.000001 is “very significant.” It means that data as extreme as yours would almost never arise by chance alone if there were no real correlation (regardless of methodological flaws in your study, which is an orthogonal question).
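A toy simulation of that continuum (my own sketch; the 0.3 standard-deviation effect and the sample sizes are arbitrary assumptions): with a real but modest effect, the typical p-value keeps shrinking as the sample grows, which is why a very small p reads as strong evidence against chance as the explanation.

```python
# Median p-value from a two-sample t-test as the sample size grows,
# with a fixed true effect of 0.3 standard deviations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_shift = 0.3

for n in (20, 50, 200, 1000):
    pvals = []
    for _ in range(1_000):
        treated = rng.normal(loc=true_shift, size=n)
        control = rng.normal(loc=0.0, size=n)
        pvals.append(stats.ttest_ind(treated, control).pvalue)
    print(f"n per group = {n:4d}: median p ≈ {np.median(pvals):.2g}")
# The p-value slides smoothly toward 0 as the evidence accumulates;
# it never flips a switch at 0.05.
```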