Machine learning has a reproducibility crisis

doctorow · March 19, 2018, 4:51pm

Originally published at: https://boingboing.net/2018/03/19/irreproducible-results.html

…

Roy_Brander · March 19, 2018, 6:24pm

In short, we are still at the stage of just messing around.

Which is fine. I have a gut-feel I can’t hope to justify that reproducing anything like the human mind and consciousness is going to take, literally, centuries of work by thousands of experimenters, 99% of whom will patiently exhaust every blind alley on the way.

Reproducible experiments will gradually, umm…evolve…out of this primitive level of messing around, a few decades from now. And we’ll still be just getting started.

Sqyntz · March 20, 2018, 12:15am

maybe this is what the machines want us to think.

anon47741163 · March 20, 2018, 1:36am

The study of artificial intelligence is perpetually on the verge of a breakthrough, kind of like how turning lead into gold was always just another experiment or two away from perfection.

Now, the study of artificial STUPIDITY, that could actually make humans less miserable, if researchers could disentangle themselves from the research subjects…

NickyG · March 20, 2018, 2:15am

“Machine learning” is such a vast term, I submit that this headline has a specificity crisis.

StVincent · March 20, 2018, 4:39am

Yup.

Moreover, I would hazard that this is a bug, not a feature. Machine learning is probabilistic; even with identical parameters, the code merely defines the rates at which algorithms try new things versus review learned behaviors for accuracy.
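To make the point about probabilistic training concrete, here is a minimal Python sketch. It assumes a toy epsilon-greedy bandit as a stand-in for "machine learning"; the function `run_bandit` and its parameters are hypothetical, not from any library. `epsilon` plays the role of the rate at which the algorithm tries new things. Unseeded runs with identical parameters generally diverge, while fixing the random seed makes two runs match exactly.

```python
import random


def run_bandit(true_means, epsilon, steps, seed=None):
    """Epsilon-greedy multi-armed bandit: a toy stand-in for 'learning'.

    epsilon is the rate at which the agent tries a random arm (exploration);
    otherwise it pulls the arm with the best estimated reward (exploitation).
    """
    rng = random.Random(seed)  # seed=None means the OS supplies fresh entropy
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try something new
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)  # noisy payoff
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
        total += reward
    return total, estimates


params = dict(true_means=[0.1, 0.5, 0.9], epsilon=0.1, steps=1000)

unseeded_a = run_bandit(**params)  # identical parameters...
unseeded_b = run_bandit(**params)  # ...but generally a different outcome

seeded_a = run_bandit(**params, seed=42)
seeded_b = run_bandit(**params, seed=42)
print(seeded_a == seeded_b)  # True: same seed, same random draws, same result
```

A fixed seed only pins down this one source of randomness; real training pipelines can add others, such as nondeterministic GPU kernels or data-loading order.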

Also, it’s possible to save and export specific machine states for purposes of study. If two configurations produce different outcomes/behaviors, THAT’S what’s worth investigating.
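As a rough illustration of saving and exporting a machine state, the sketch below assumes a hypothetical `ToyAgent` whose whole state is a weight vector plus its random-number generator. Exporting both to JSON and restoring them lets a clone continue training in lockstep with the original, which is what makes a saved state useful for study.

```python
import json
import random


class ToyAgent:
    """Hypothetical learner whose entire state is a weight vector plus its RNG."""

    def __init__(self, n_weights, seed):
        self.rng = random.Random(seed)
        self.weights = [0.0] * n_weights

    def step(self):
        # Stand-in for one training update: nudge a randomly chosen weight.
        i = self.rng.randrange(len(self.weights))
        self.weights[i] += self.rng.uniform(-1.0, 1.0)

    def export_state(self):
        # Random.getstate() returns (version, tuple_of_ints, gauss_next);
        # the tuple becomes a JSON list, so from_state() converts it back.
        version, internal, gauss_next = self.rng.getstate()
        return json.dumps({"weights": self.weights,
                           "rng": [version, list(internal), gauss_next]})

    @classmethod
    def from_state(cls, blob):
        data = json.loads(blob)
        agent = cls(len(data["weights"]), seed=0)  # seed is overwritten below
        agent.weights = data["weights"]
        version, internal, gauss_next = data["rng"]
        agent.rng.setstate((version, tuple(internal), gauss_next))
        return agent


agent = ToyAgent(n_weights=4, seed=7)
for _ in range(100):
    agent.step()

snapshot = agent.export_state()          # a string you could write to disk
clone = ToyAgent.from_state(snapshot)

for _ in range(50):                      # keep training both copies
    agent.step()
    clone.step()

print(agent.weights == clone.weights)    # True: the restored state continues identically
```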

If you want to play around with an AI, try this:

http://projects.rajivshah.com/rldemo/

You set the parameters and let it learn! When you want to check in on the algorithm’s progress, turn off learning and watch it go! I spent a solid afternoon avoiding work with this on Saturday.
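For readers curious what "turn off learning" means mechanically, here is a minimal tabular Q-learning sketch on a hypothetical six-cell corridor. It is not the demo's code; the corridor, the reward, and the parameters (`alpha`, `gamma`, `epsilon`) are illustrative assumptions. With `learning=True` the agent explores and updates its Q-table; with `learning=False` it stops updating and simply follows the greedy policy it has learned.

```python
import random

N_STATES = 6        # hypothetical corridor: cells 0..5, the goal is cell 5
MOVES = (-1, +1)    # action 0 = step left, action 1 = step right


def greedy(q_row, rng):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def run_episode(q, rng, learning, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=50):
    """Play one episode from cell 0; update the Q-table only if learning is on."""
    state, steps = 0, 0
    while state != N_STATES - 1 and steps < max_steps:
        if learning and rng.random() < epsilon:
            action = rng.randrange(len(MOVES))  # explore
        else:
            action = greedy(q[state], rng)      # exploit what has been learned
        nxt = min(max(state + MOVES[action], 0), N_STATES - 1)  # walls at both ends
        reward = 1.0 if nxt == N_STATES - 1 else 0.0
        if learning:
            # Tabular Q-learning update toward reward + discounted best next value.
            target = reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
        state, steps = nxt, steps + 1
    return steps


rng = random.Random(0)
q = [[0.0, 0.0] for _ in range(N_STATES)]
for _ in range(500):                                # "let it learn"
    run_episode(q, rng, learning=True)

eval_steps = run_episode(q, rng, learning=False)    # "turn off learning and watch it go"
print(eval_steps)
```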

GulliverFoyle · March 20, 2018, 8:18am

“In the first decades of the twenty-first century, everyone suspected that this world was being watched, sloppily and inattentively, by intelligences far inferior to humans: minds that are to our minds as those of paramecia; intellects narrow, mediocre and uncomprehending.” ~ Greg Egan

LurkingGrue · March 21, 2018, 1:04am

Norman

anon90144361 · March 21, 2018, 5:17pm

I see this as an issue with not using the tools available, or the general laziness of researchers.

Git works fine for storing versioned model code, weights, and data. Code is committed like anything else, and model weights and data are stored in Git LFS.

At my work, we store all of our training data in a repo, along with Dockerfiles for creating the development environment and our model code. We follow the same version control process that any other development project would use, and we don’t have a problem reproducing each other’s work. You can go to any point in the commit history and rebuild a model in the state it was created.
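One lightweight way to make "rebuild a model in the state it was created" checkable is to write a small manifest next to each run. The sketch below assumes a hypothetical JSON format recording the commit hash (passed in by the caller, for example from `git rev-parse HEAD`), the random seed, and a SHA-256 digest of every data file; `verify_manifest` then reports files that are missing or have changed since the run.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large datasets need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(data_files, seed, commit, out_path):
    """Record which data, seed and code revision a run used (hypothetical format)."""
    manifest = {
        "commit": commit,  # supplied by the caller, e.g. from `git rev-parse HEAD`
        "seed": seed,
        "data": {p: sha256_of(p) for p in sorted(str(f) for f in data_files)},
    }
    Path(out_path).write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest


def verify_manifest(manifest_path):
    """List data files that are missing or whose contents changed since the run."""
    manifest = json.loads(Path(manifest_path).read_text())
    return [p for p, digest in manifest["data"].items()
            if not Path(p).is_file() or sha256_of(p) != digest]


# Demo in a throwaway directory; the commit string is a placeholder.
workdir = Path(tempfile.mkdtemp())
train = workdir / "train.csv"
train.write_text("x,y\n1,2\n")
manifest_path = workdir / "run.json"
write_manifest([train], seed=42, commit="0000000", out_path=manifest_path)

clean = verify_manifest(manifest_path)     # []: everything matches
train.write_text("x,y\n1,2\n3,4\n")        # someone edits the data...
drifted = verify_manifest(manifest_path)   # [str(train)]
train.unlink()                             # ...or forgets to commit it
missing = verify_manifest(manifest_path)   # [str(train)]
```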

The problem, it seems, is that researchers commit incomplete work, usually with data or crucial pieces of code missing. Most machine learning projects I’ve tried to pull down and work off of are useless, but that’s not the tools’ fault.
