Machine learning has a reproducibility crisis


Originally published at:


In short, we are still at the stage of just messing around.

Which is fine. I have a gut-feel I can’t hope to justify that reproducing anything like the human mind and consciousness is going to take, literally, centuries of work by thousands of experimenters, 99% of whom will patiently exhaust every blind alley on the way.

Reproducible experiments will gradually, umm…evolve…out of this primitive level of messing around, a few decades from now. And we’ll still be just getting started.


Maybe this is what the machines want us to think. :thinking:


The study of artificial intelligence is perpetually on the verge of a breakthrough, kind of like how lead-into-gold was just another experiment or two away from perfection.

Now, the study of artificial STUPIDITY, that could actually make humans less miserable, if researchers could disentangle themselves from the research subjects…


“Machine learning” is such a vast term, I submit that this headline has a specificity crisis.



Moreover, I would hazard that this is a bug, not a feature. Machine learning is probabilistic; even with identical parameters, the code merely defines the rates at which algorithms try new things vs. reviewing learned behaviors for accuracy.

Also, it’s possible to save and export specific machine states for purposes of study. If two configurations produce different outcomes/behaviors, THAT’S what’s worth investigating.
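To make the two points above concrete, here is a minimal sketch, not taken from any particular project. It assumes a toy epsilon-greedy learner with made-up payout odds; the names `run_bandit`, `snapshot`, and `restore` are hypothetical. It uses only Python's standard library to show that the "try new things vs. reuse what was learned" rate is just a parameter, that fixing the random seed makes two runs identical, and that a saved machine state (including the random generator's state) can be reloaded and resumed.

```python
import pickle
import random


def run_bandit(steps, epsilon, rng, estimates=None, counts=None):
    """Toy epsilon-greedy learner over three slot machines."""
    payout_probs = [0.2, 0.5, 0.8]  # made-up odds, purely for illustration
    estimates = [0.0] * 3 if estimates is None else list(estimates)
    counts = [0] * 3 if counts is None else list(counts)
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(3)  # explore: try something new
        else:
            arm = max(range(3), key=lambda a: estimates[a])  # exploit: reuse what was learned
        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental running mean of the rewards seen on this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


def snapshot(estimates, counts, rng):
    """Serialize the learner, including the random generator's internal state."""
    return pickle.dumps(
        {"estimates": estimates, "counts": counts, "rng_state": rng.getstate()}
    )


def restore(blob):
    """Rebuild the learner and its random generator from a snapshot."""
    state = pickle.loads(blob)
    rng = random.Random()
    rng.setstate(state["rng_state"])
    return state["estimates"], state["counts"], rng


# Same seed, same parameters: the two runs match exactly.
run_a = run_bandit(1000, 0.1, random.Random(42))
run_b = run_bandit(1000, 0.1, random.Random(42))

# Stop halfway, export the state, reload it, and finish the run.
rng = random.Random(42)
half_estimates, half_counts = run_bandit(500, 0.1, rng)
blob = snapshot(half_estimates, half_counts, rng)
resumed = run_bandit(500, 0.1, *reversed(restore(blob)[2:]), *restore(blob)[:2])
```

If `run_a` and `run_b` ever disagreed, or if `resumed` differed from `run_a`, that difference would be the thing worth investigating. Real training stacks have more sources of randomness than this toy (data shuffling, GPU kernels, and so on), so treat this as the idea only.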

If you want to play around with an AI, try this:

You set the parameters and let it learn! When you want to check in on the algorithm’s progress, turn off learning and watch it go! I spent a solid afternoon avoiding work with this on Saturday.


“In the first decades of the twenty-first century, everyone suspected that this world was being watched, sloppily and inattentively, by intelligences far inferior to humans: minds that are to our minds as those of paramecia; intellects narrow, mediocre and uncomprehending.” ~ Greg Egan




I see this as an issue with not using the tools available, or the general laziness of researchers.

Git works fine for storing versioned model code, weights, and data. Code is committed like anything else, and model weights and data are stored in Git LFS.

At my work, we store all of our training data in a repo, along with Dockerfiles for creating the development environment and our model code. We follow the same version control process that any other development project would use, and we don’t have a problem with reproducing each other’s work. You can go to any point in the commit history and rebuild a model in the state in which it was created.

The problem, it seems, is that researchers commit incomplete work, usually with data or crucial pieces of code missing. Most machine learning projects I’ve tried to pull down and work off of are useless, but that’s not the tools’ fault.
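One cheap way to catch the missing-data problem is to commit a fingerprint of the data next to the code. The following is a minimal sketch under stated assumptions, not a description of any specific team's process: the function names `fingerprint_dir` and `changed_files` are hypothetical, and it relies only on Python's standard `hashlib` and `pathlib` modules.

```python
import hashlib
from pathlib import Path


def fingerprint_dir(root):
    """Map each file under root (as a relative POSIX path) to its SHA-256 digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def changed_files(old, new):
    """Return the sorted paths that were added, removed, or modified."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))
```

The idea would be to store the manifest (as JSON, say) in the same commit as the model code, then recompute it after checkout; an empty result from `changed_files` suggests the data on disk matches what the commit was built from. Note that `read_bytes` loads each file fully into memory, so very large datasets would need chunked hashing instead.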

