Garbage In, Garbage Out: machine learning has not repealed the iron law of computer science

doctorow · May 29, 2018, 1:17pm

Originally published at: https://boingboing.net/2018/05/29/gigo-gigo-gigo.html

…

RickMycroft · May 29, 2018, 1:29pm

Bathed in his currents of liquid helium, self-contained, immobile, vastly well informed by every mechanical sense: Shalmaneser.

Every now and again there passes through his circuits a pulse which carries the cybernetic equivalent of the phrase, “Christ, what an imagination I’ve got.”

John Brunner, Stand on Zanzibar.

euansmith · May 29, 2018, 1:33pm

This reminds me of the US Military project to train drones to spot tanks. They used a helicopter to take a lot of photos of tanks one day, and then shot the same locations the next day without the tanks. Working off the photos, the control system could spot a photo containing a tank with 100% accuracy. Then someone thought to try the system with some different photos. It turned out that they’d spent a pretty penny training a computer to spot the difference between a sunny day (day 1) and an overcast day (day 2).
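The failure in that story can be reproduced with a toy simulation. This is a minimal sketch under loudly stated assumptions: each "photo" is reduced to two made-up features (overall brightness and a noisy "tank-shaped blob" score), and the "learner" is a one-rule decision stump chosen purely for training accuracy. Nothing here describes the actual military system. Because weather separates the training photos perfectly, the stump latches onto brightness, scores 100% on the training set, and falls to roughly coin-toss accuracy once the weather no longer lines up with the tanks.

```python
import random

def make_photos(n, has_tank, brightness_range, rng):
    """Hypothetical photos, each reduced to two made-up features."""
    photos = []
    for _ in range(n):
        features = {
            # Overall image brightness, 0 (dark) to 1 (bright).
            "brightness": rng.uniform(*brightness_range),
            # Noisy "tank-shaped blob" score: higher on average when a tank is present.
            "blob": rng.gauss(0.6 if has_tank else 0.4, 0.15),
        }
        photos.append((features, has_tank))
    return photos

def accuracy(rule, photos):
    """Fraction of photos where 'feature >= threshold means tank' is right."""
    feature, threshold = rule
    return sum((x[feature] >= threshold) == has_tank for x, has_tank in photos) / len(photos)

def fit_stump(photos):
    """Pick the single (feature, threshold) rule with the best training accuracy."""
    best = None
    for feature in ("brightness", "blob"):
        for candidate, _ in photos:
            threshold = candidate[feature]
            acc = accuracy((feature, threshold), photos)
            # Strict '>' keeps the first best rule found.
            if best is None or acc > best[0]:
                best = (acc, feature, threshold)
    return best[1], best[2]

rng = random.Random(42)
# Day 1: tanks present, sunny. Day 2: same places, no tanks, overcast.
train_photos = make_photos(100, True, (0.7, 1.0), rng) + make_photos(100, False, (0.0, 0.3), rng)
rule = fit_stump(train_photos)

# "Some different photos": weather is now unrelated to whether a tank is present.
new_photos = make_photos(100, True, (0.0, 1.0), rng) + make_photos(100, False, (0.0, 1.0), rng)
print(rule[0], accuracy(rule, train_photos), accuracy(rule, new_photos))
```

The model did exactly what it was asked: find whatever separates the two piles of training photos. The garbage was in the data collection, not the learner.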

jrkrideau · May 29, 2018, 2:47pm

I was just listening to a BBC program where a machine learning researcher reported that she had to wear a blank white mask for a (borrowed) algorithm to recognize her face, because she has very dark skin.

I am ‘verryy’ reassured by the deployment of new facial recognition software.

Roy_Wilson · May 29, 2018, 2:51pm

In the words of Hulk Hogan, “Amen, brother!” The training/test sets are SAMPLES! How to assess whether they are “good” or “bad” remains a sticky wicket.

anon50609448 · May 29, 2018, 3:21pm

This might just be a blacklist of categories you never want to predict, because the cost of a false positive is so high.

Like, for example:

If you want good examples of things you shouldn’t use machine learning to predict, look at the things we are using it to predict.
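A blacklist of never-predict categories is usually applied after the model, as a filter on its output. A minimal sketch, assuming the model hands back a score per label; the function name, the labels and the scores below are all hypothetical:

```python
def predict_with_blocklist(scores, blocklist, fallback=None):
    """Return the highest-scoring label that is not on the blocklist.

    scores: dict mapping label -> model score (hypothetical model output).
    blocklist: labels the system must never emit, however confident the model is.
    fallback: what to return if every label is blocked.
    """
    allowed = {label: score for label, score in scores.items() if label not in blocklist}
    if not allowed:
        return fallback
    return max(allowed, key=allowed.get)

scores = {"label_a": 0.2, "sensitive_label": 0.7, "label_b": 0.1}
print(predict_with_blocklist(scores, {"sensitive_label"}))  # label_a
```

Note that the filter only stops the label from being emitted; it does nothing about whatever in the training data made the model want to predict it.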

Schmorgluck · May 29, 2018, 3:34pm

Except, of course, rich abusers…

Roy_Brander · May 29, 2018, 4:02pm

A longer treatment of this issue appears in the Canadian magazine The Walrus.

To get you interested, here’s an excerpt:

“More than 99 percent of the time, the systems correctly identified a lighter-skinned man. But that’s no great feat when data sets skew heavily toward white men; in another widely used data set, the training photos used to make identifications are of a group that’s 78 percent male and 84 percent white. When Buolamwini tested the facial-recognition programs on photographs of black women, the algorithm made mistakes nearly 34 percent of the time. And the darker the skin, the worse the programs performed, with error rates hovering around 47 percent— the equivalent of a coin toss. The systems didn’t know a black woman when they saw one.”

…so hire an all-black, all-female crew for your next heist!!
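The excerpt's numbers only show up because results are broken out per group. A single headline accuracy, dominated by whichever group fills most of the data set, hides them. A minimal sketch of that per-group breakdown; the records below are made-up toy data chosen to make the arithmetic obvious, not figures from Buolamwini's study:

```python
from collections import defaultdict

def error_rates_by_group(records):
    """records: iterable of (group, true_label, predicted_label) tuples."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, truth, predicted in records:
        totals[group] += 1
        errors[group] += truth != predicted
    return {group: errors[group] / totals[group] for group in totals}

# Made-up toy records (NOT the study's data), skewed heavily toward one group.
records = (
    [("lighter-skinned men", "male", "male")] * 95
    + [("lighter-skinned men", "male", "female")] * 1
    + [("darker-skinned women", "female", "female")] * 2
    + [("darker-skinned women", "female", "male")] * 2
)

overall_error = sum(truth != predicted for _, truth, predicted in records) / len(records)
print(f"overall error: {overall_error:.2f}")  # overall error: 0.03
for group, rate in error_rates_by_group(records).items():
    print(f"{group}: {rate:.2f}")
```

An overall error of 3% looks great, while the small group gets a coin toss.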

ficuswhisperer · May 29, 2018, 4:04pm

That’s not ML. You can do that with a few lines of Perl.

anon50609448 · May 29, 2018, 4:17pm

I took this as just an example. I’ve cleaned up a lot of data. Often once you are aware of a problem you can solve it with a careful find and replace. It’s becoming aware of the problems that takes work.
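The "careful" part of a careful find and replace is usually anchoring the match to the whole field. A minimal sketch, assuming a hypothetical country column with many spellings of the United States; the pattern and function names are illustrative only:

```python
import re

# Hypothetical: match only fields that are, in their entirety, a spelling of the US.
US_PATTERN = re.compile(
    r"^\s*(u\.?\s?s\.?(\s?a\.?)?|united states( of america)?)\s*$",
    re.IGNORECASE,
)

def naive_clean(value):
    # Careless substring replace: also rewrites the "US" inside "RUSSIA".
    return value.replace("US", "United States")

def careful_clean(value):
    # Rewrite only when the whole field matches; otherwise just trim whitespace.
    return "United States" if US_PATTERN.match(value) else value.strip()

print([careful_clean(v) for v in ["USA", "U.S.", "united states", "RUSSIA"]])
# ['United States', 'United States', 'United States', 'RUSSIA']
```

Writing the pattern is the easy bit; noticing that the column had six spellings of one country in the first place is the work.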

Bonivus_elderheart · May 29, 2018, 4:49pm

THIS TIMES INFINITY! (and beyond!)
If your fancy ML system is giving you garbage results, look at what you are feeding it; perhaps it is garbage.

kinscore · May 29, 2018, 5:17pm

Was that researcher Joy Buolamwini?
I didn’t find a BBC audio program, but I did find these:

ficuswhisperer · May 29, 2018, 8:21pm

This is why you need to go into ML with a problem/question in mind.

All too often I see people jumping into ML by feeding it data but without any question in mind, which then leads to frustration because the model isn’t giving them useful results.

roomwithaview · May 29, 2018, 8:45pm

42!

gero · May 29, 2018, 9:58pm

Hah, great example! That is the main problem with many machine learning models: unlike most classical statistical models, which let you more or less understand how they classify something, they are black boxes. Enter LIME to the rescue! Rather than literally opening the black box, it explains individual predictions: it perturbs an input, watches how the model’s output changes, and fits a simple, interpretable model to that local behavior, which it can then visualize…

In the paper, the authors use a similarly neat example (Section 6.4): to train a classifier to distinguish between huskies and wolves, they deliberately fed it pictures of huskies and wolves in which every wolf picture had snow in the background. Result: the classifier labels anything with snow in the background as a wolf.
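The core LIME idea fits in a few lines. This is a stripped-down, LIME-style local surrogate written with NumPy, not the `lime` package's API; the toy "classifier", its three binary features, the kernel width and the sample count are all assumptions for illustration (real LIME on images works with superpixels and chooses features more carefully). The black box secretly looks only at snow; switching features on and off around one photo and fitting a proximity-weighted linear model exposes that.

```python
import numpy as np

def black_box_wolf_probability(features):
    # Hypothetical "bad" classifier: features are [snow, pointy_ears, grey_fur],
    # but it only looks at snow.
    return np.where(features[:, 0] == 1, 0.95, 0.05)

def lime_style_explanation(instance, predict_fn, num_samples=2000, kernel_width=0.75, seed=0):
    rng = np.random.default_rng(seed)
    d = instance.shape[0]
    # Random binary masks: 1 keeps a feature of the instance, 0 switches it off.
    masks = rng.integers(0, 2, size=(num_samples, d))
    masks[0] = 1  # include the unperturbed instance itself
    preds = predict_fn(masks * instance)
    # Proximity weight: the fewer features switched off, the more a sample counts.
    distances = 1.0 - masks.mean(axis=1)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)
    # Weighted least squares: intercept column plus masks, rows scaled by sqrt(weight).
    X = np.hstack([np.ones((num_samples, 1)), masks])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], preds * sw, rcond=None)
    return coef[1:]  # one weight per feature (intercept dropped)

# A husky photographed in snow: snow, pointy ears and grey fur all present.
weights = lime_style_explanation(np.array([1, 1, 1]), black_box_wolf_probability)
print(dict(zip(["snow", "pointy_ears", "grey_fur"], weights.round(3))))
```

All of the explanation's weight lands on snow, which is exactly the "this is a wolf because of the background" diagnosis.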

tyger11 · May 29, 2018, 10:37pm

The wisdom of the ages:

Sanitize your inputs.
Always keep your optics clean.
Never get involved in a land war in Asia.
Do not taunt Happy Funball™.

spejic · May 30, 2018, 2:42am

I first heard this story in the early 1990s, so the problem goes way back.

euansmith · May 30, 2018, 7:48am

For me, the 1990s are still recent. Waddaya mean, that’s getting on for thirty years ago?! I demand a re-count!

FGD135 · May 30, 2018, 8:10am

That’s also the part that takes real intelligence.

FGD135 · May 30, 2018, 8:16am

The two iron laws of computing:

GIGO.
Every new generation of programmers has some PFYs in it that think they found a way to magically circumvent GIGO.
