Psychology's reproducibility crisis: why statisticians are publicly calling out social scientists

Best of luck to you. I mean it.

Having gotten an advanced degree in Computer Science means that even though I didn’t stay in academia, I can still get a job. It’s not the same for all disciplines. Though my first job after getting my PhD paid less than my last job before going back to school. The difference is that I get to choose the kind of job I want and I don’t have to work at a crappy investment bank.


Just a quick note to say thank you to the participants in this thread; I feel both informed and entertained on a subject that I only knew a bit about (enough to know why p<0.05 is significant) and it’s made me think again about some things I had always taken for granted (e.g. reproducible results always being a good thing). Thanks.


And that’s why I’ve come to use the term “social alchemies”. These fields of study are not in fact sciences but might someday become sciences.

What?!? I can certainly see a parallel between non-economic Marxism and the messianic urge, but here I think you’ve made a grave mistake or at least a timeline error. The messianic urge you allude to here is far closer to traditional Christianity than to the Jews for Jesus (another type of Christianity), which gets mistaken for a form of Judaism.


I remember reading the results of a survey of psychiatrists which showed that about 80% of them were atheists. The researchers claimed that since, conversely, 80% of mental patients have at least some religious beliefs, this made it harder for psychiatrists to work well with their patients.

The article included a list of some of the questions asked, and a friend of mine and I killed ourselves laughing reading them to each other over the phone, because they were so, so Abrahamist. Terms like “higher power” were used, which is cringe-worthy in certain polytheistic and pantheistic religions. “God” was always used in the singular. There was an assumption of a formal, specialised, communal building for worship. And so on.

Unless 80% of the respondents said at the top, “Just mark me as an atheist” I suspect the results were not terribly accurate.


I’m in broad agreement with a lot of what’s posted upthread, but I’ll add my bit as well:

  • The use of inferential statistics is completely broken across a wide variety of disciplines. It ain’t just the “social sciences”.

  • “Social science” was always a dodgy term, anyway. It was either used as a pejorative to dismiss valid but non-traditional scientific research, or used in an attempt to claim scientific authority for non-science. Bad either way.

  • Especially in psych, “social science” is not the right term. Psychology is a very broad church; it runs the full gamut from blatant pseudoscientific nonsense to hard-as-nails empirical work.

  • Officially, I was a psychologist. What I actually did was dose up rats on party drugs, observe their behaviour in controlled experiments and then analyse their brains down to a cellular level. Not terribly social down there in the rat labs.

  • OTOH, because I was looking at behaviour and drugs rather than just neurons and neurotransmitters, I was still at the “softer” end of my lab. The “harder” folks were focussing on a single-neuron level, while I was looking at systems. Still all “psychologists” though.

  • On the research side, the boundaries between psychology, pharmacology, neuroscience and computer science are rapidly disappearing, with the related disciplines blending into the new-ish field of Cognitive Science.

  • Publication pressure is a very, very big influence.

  • So are ethics boards. Most pre-clinical medical work uses garbage stats, trying to discern subtle effects in variable subjects using tiny sample sizes. A major reason why the sample sizes are so tiny is due to the effort to minimise the numbers of animals used in research.

  • The garbage stats are not always as much of a problem as they appear. The role of pre-clinical research is not to provide absolute answers; it’s to give the later-stage researchers clues about potential dangers and suggestions as to what are the most promising avenues to explore.

  • Replication failure does not always mean that the original findings were wrong. Especially at the trickier end of biological work, the techniques involved are so difficult that much of the variation in results is likely due to variation in researcher expertise. The most likely suspect for any unusual result is that someone fucked up in the lab.

  • If we tighten the rigour of the statistics while maintaining the sample sizes and techniques, what you’ll find is that the overwhelming majority of previously published preclinical research is now statistically non-significant.

  • But there is a big difference between “statistically non-significant” and “wrong”. When I’m researching a drug, I don’t care if the true p-value is .04 or .06. What I care about is “will the people taking this drug damage their brains?”. The numbers are just a tool to use in pursuing that question.

  • I think that there would be value in reintroducing some of the practices that were common in scientific publishing before inferential statistics became universal. It used to be that medical research papers included a section where the researcher openly stated and argued for what they believed to be the implications of their findings, beyond what they could absolutely prove. These days, any sort of speculation is very much frowned upon, even if clearly labelled as such.
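
The point above about tightening the statistics while keeping the sample sizes tiny can be illustrated with a quick simulation. This is only a minimal sketch, not anything taken from the posts in this thread: the group size of 8, the true effect of one standard deviation, the known-variance z-test and the two thresholds (0.05 and 0.005) are all assumptions picked for illustration.

```python
import random
from statistics import NormalDist, fmean


def two_sided_p(group_a, group_b, sigma=1.0):
    """Two-sided z-test p-value for a difference in means (sigma assumed known)."""
    standard_error = sigma * (1 / len(group_a) + 1 / len(group_b)) ** 0.5
    z = (fmean(group_a) - fmean(group_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def share_significant(alpha, n_per_group=8, true_effect=1.0,
                      n_studies=5000, seed=1):
    """Fraction of simulated small studies with p < alpha.

    Every simulated study has a real effect of size `true_effect`,
    so each non-significant result here is a miss, not a correct null.
    """
    rng = random.Random(seed)  # fixed seed: same studies for every alpha
    hits = 0
    for _ in range(n_studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        if two_sided_p(treated, control) < alpha:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    for alpha in (0.05, 0.005):
        print(f"alpha={alpha}: {share_significant(alpha):.2f} significant")
```

Under these assumed numbers, roughly half of the simulated studies reach p < 0.05 and only about a fifth reach p < 0.005, even though the effect is real in every one of them. That is the sense in which "statistically non-significant" and "wrong" come apart.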


My supervisor in sociology of religion was also an ordained minister who had a parish. He told us one day that they had interviewed a potential new scoutmaster, who had revealed that he was an atheist. One of the churchwardens got visibly upset and asked various questions, the last of which was “do you believe in a Supreme Being?”
My supervisor said that he remarked “Well, if we’re going to discuss 18th century ideas of theology we’re going to be here all day”, before telling the candidate that the job was his if he wanted it.


I am appropriately grateful for the information.


Thank you for doing so, because that was a most interesting and informative post.


I would be interested in the definition of “science” you are using and in which way which of the fields you count under social sciences fail to satisfy its requisites.




Thanks for asking this pertinent question.

As a multilingual person I am struck by how the English usage of “science” differs from e.g. the German usage of “Wissenschaft”. Similarly, the English “evidence” fundamentally differs from the German “Beweis”. These linguistic differences are particularly interesting, because they highlight fundamental differences in how academia and academic endeavours are viewed in these cultures. E.g. while I could deliver “einen wissenschaftlichen Beweis” for the above statement / hypothesis, it would be impossible to provide evidence for it. Because a “Beweis” is merely a proof constant within the parameters of the methods applied, while evidence is increasingly only accepted in the context of an RCT. My method of choice to prove the above hypothesis would be discourse analysis, and the Beweis would be the texts quoted and the transparency of my analysis.

The claim that social sciences or humanities are not “sciences” per se would be unsustainable to a German speaker. e.g. The German Wiki page on scientists of Antiquity lists Grammarians, Philologists and Philosophers. There isn’t and couldn’t be an English equivalent because half of those listed wouldn’t be considered scientists in English.

There is also no German translation for hard sciences. Instead there are Naturwissenschaften and Geisteswissenschaften and Sozialwissenschaften. These terms reflect that there are different spheres of reality / existence / human experience which merit different approaches of understanding and ways of questioning–in short different methods.

Scientific rigour lies in applying the most appropriate method to the questions asked, rather than bending the questions to fit the method.

The latter appears to be what the “hard science” fan club advocate.

To many hard science advocates it seems inconceivable that certain fundamental questions (and, some of us might claim, some of the most interesting questions) can’t be approached by RCTs and/or hard sciences and/or by evidence in the natural-science sense. Such questions would be those pertaining to human consciousness, or pretty much anything to do with death, which is a core aspect of life, of the human experience.

Even in the hard sciences e.g. Physics, there are and have been laws which had theoretical proofs long before they could be experimentally proven. Some took centuries.

No amount of RCTs is going to provide evidence on how best to spend your last months on earth (or how your mother, father, sister or brother should spend theirs), given the kind of person you are (with a high probability unique) and the kind of disease you might end up suffering from (one of thousands of currently known terminal conditions). But curiously enough this is the question which is the focus of much of medical science (in terms of $ spent) these days, e.g. will chemotherapy x extend your life expectancy by three weeks or five months, while habitually omitting to ask whether those five months will be spent in utter misery.

Yes. Yes. And yes. The easy access to “data” (which by definition, these days, seems to be everything that can possibly, remotely be expressed as a number) has created a toxic proliferation of “inferential statistics”, which many “hard science” advocates seem to task with providing answers to all of life’s questions. Except that what you put in significantly determines what comes out, what you can possibly infer. And what you put in might easily be skewed by the blinders the “putters-in” carry with them.

Inferential statistics and the easy availability of big data have a lot to answer for when it comes to the crisis of scientific methods.


Thank you for that enlightening post! May I ask what field you’re in (assuming that you do work in academia)? My guess would be linguistics or philosophy?

Anyway, I am well aware of the differences between “science” and “Wissenschaft”. My question concerning the “social alchemies” post was relating to the term “social sciences” as it is understood (I hope) by native English speakers, even if “Sozialwissenschaften” bears a slightly different connotation.

That said, I wouldn’t translate evidence with “Beweis”, but maybe with “Beleg”, which has a weaker connotation, or maybe even only with “Hinweis”. Following Popper’s critical rationalism (which I would say most of current-day academic psychology subscribes to) our research can never prove anything and I usually tell my students that when they interpret a study as a “Beweis” for something.

I differ on one other point:

I think most questions that we ask, including your example concerning spending our last living months, can only be meaningfully answered by applying multiple methods. How will we know whether a life extended by chemo will really be spent in utter misery if we don’t have hard data on that from many of its recipients?

Inferential statistics per se are a very powerful tool. They can tell us how probable data like ours would be if there were really no effect, which helps us judge whether our original hypothesis holds up. Yes, people do draw wrong conclusions from inferential statistics, but it is much, much easier to lie with just the descriptives (“Wow, look how high this bar goes! On this scale, this looks certainly at least twice as high as that bar over there…”). In my opinion, the problem is not that inferential statistics are used ubiquitously, but that people have come to rely on scripts (p < .05 means there really is a difference) instead of using them as what they are: tools to support, not supplant, your sense-making.
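
The gap between the script “p < .05 means there really is a difference” and what a significant result actually supports can be put into rough numbers. The following is only a back-of-the-envelope sketch: the figures of 10% true hypotheses and 50% power are assumptions made up for illustration, not values from this thread or from any study.

```python
def share_of_significant_results_that_are_real(prior_true, power, alpha):
    """Among all results with p < alpha, the fraction where an effect truly exists.

    prior_true: assumed share of tested hypotheses that are actually true
    power:      assumed chance a real effect yields p < alpha
    alpha:      significance threshold (chance a null effect yields p < alpha)
    """
    true_positives = prior_true * power
    false_positives = (1 - prior_true) * alpha
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    share = share_of_significant_results_that_are_real(0.10, 0.50, 0.05)
    print(f"{share:.3f}")
```

With these assumed inputs, only about 53% of the "significant" findings correspond to a real difference. Change the assumptions and the number changes, which is the point: the p-value alone does not settle the question.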

Still, I agree with what you said, especially that the empirical-quantitative method (the only one I know how to use) isn’t the only thing that will generate valid knowledge and the “hard science fan club” needs to be much more open to other methods of inquiry.


Thanks for your thoughtful and detailed response. The questions posed here are particularly pertinent to my work, as my background / MA is in Cultural Studies while my work is on healthcare systems. I work on the intersection between humanities and “hard” sciences. The methods I am skilled at are Hermeneutics and Text & Discourse Analysis. Among speakers of English these wouldn’t be considered scientific methods, while in German they are well recognised wissenschaftliche Methoden, in the case of Hermeneutics with several thousand years of history.

This exactly is my point. German scientists wouldn’t describe Sozialwissenschaften as “social alchemies”, while for an English speaker there always seems to be a connotation of dodginess with social sciences.

In German it is merely a description of a field of study, of an academic focus on (human) society rather than on nature. A statement of fact. There is nothing pejorative about it. Although that might be changing.

Some of the most influential German thinkers of the 19th / 20th century were social scientists, and to dismiss them as alchemists would be folly. Marx, Weber, Adorno, Habermas and Horkheimer were all Sozialwissenschaftler. Interestingly, none of them applied the scientific methods referred to in this article. They were not after reproducibility but understanding, believing it to be an agent of change. Does that make their writings academically less valid or relevant?

I use the term Beweis because my work is in healthcare / medicine and that is the term used in the context of evidence-based medicine. In addition there is also the term “Beweisführung”, which in German, unlike in English, can be used for any scientific reasoning: Philosophy as much as Physics, Mathematics or Chemistry, again highlighting the differing attitudes to the sciences.

In my understanding the method is the structure of the argument and the evidence / Beweis are the stepping stones in that argument. Those stepping stones can be data, e.g. laboratory results, or text references or observations or whatever is considered appropriate evidence for a given method. The crucial component which differentiates academic rigour from alchemy is transparency: is the evidence accessible to others scrutinising the results, and are the statements verifiable? Verifiability and relevance would be the academic criteria here rather than validity and reproducibility.

Absolutely. And I believe (and am working hard on the Beweisführung) that well established methods in the humanities would be / should be relevant in areas of academic inquiry currently considered the reserve of hard sciences, e.g. areas which pertain to human experience. And not in the wishy-washy way it is currently applied: let’s throw a bit of narrative on top of the real science…

Yes. In the right circumstance, when the data is valid and gathered in a way that is transparent, when the hypothesis examined can be formulated in a way that lends itself to be transferred into numerical data. So in very specific circumstances.

As we have seen in the last decade inferential statistics is not particularly good at either economic or political forecasting, or in predicting risk (I am particularly interested in Safety improvements in medicine). I would dare to suggest that historical analysis is / would be a far better method of forecasting economic and political trends.

Inferential statistics, for obvious reasons, is not very good at predicting phenomena when a) the data is patchy (because of human bias) and b) there are too many uncertainties (known and unknown unknowns). And thus it doesn’t seem to be particularly helpful in informing policymaking, even though it is being sold as a panacea.

Anecdote: I have just started an online statistics course by an Ivy League university and am utterly flabbergasted by the sloppy use of terminology. There is no glossary, and terms such as “externalities” and “bunching” are used randomly, according to the instructor’s heart’s desire.

In the humanities this would be an absolute no-no. If you use a term, explain and define it; otherwise you undermine the verifiability of your reasoning. To me this is telling, because in my experience of the application of inferential statistics to human experiences there is a tendency towards sloppy description of data, which in turn can easily invalidate the findings.


When I read that in the original piece, I thought about how many young academics are turning away from non-STEM fields because there aren’t enough jobs on a real career path, no job security (in an industry that until recently had the greatest job security imaginable), and the pay isn’t enough to justify the years of training and 6-figure debt.

Blaming this exodus on social media trolls is juvenile thinking, and pretty much proves the accuracy of the rebuttal.


It’s a clever turn of phrase, and I get your meaning when it’s broken down, but the association feels derogatory. Alchemy carries with it the connotations of a promise that is greater than the methods or state of knowledge could produce. Namely, turning elements into gold without knowledge of chemical structures or true elements.

I agree with the presumption that we understand the basic building blocks of the hard sciences a lot better (they’re easier to understand, after all, as @Enkita pointed out above) than in psychology, history, sociology, anthropology, but the promises made by the practitioners are a lot different. A responsible social scientist doesn’t promise to make gold out of iron. They make promises that fit the state of the science, which most of the time amounts to “contributing to the understanding of X.” As @Wanderfound pointed out, psychology crosses this line a bit more, as there is a fuzzy boundary with biology, but even biology is far more complex than inorganic sciences, on the spectrum of “how conclusive are our conclusions?”

