Originally published at: https://boingboing.net/2019/11/25/backdooring-ai.html
…
If it were possible to do this, then millions of years of evolution would have kept us from seeing heat shimmer on flat surfaces in the distance as water. Basically the question is, “Can we make a sensor that accurately perceives the world?” and that question is laughable.
With that out of the way, I find this idea of poisoning data super exciting. I don’t know how weird that makes me sound. I’m not saying I’m looking forward to doing it. I suddenly thought of the scene from Westworld where Anthony Hopkins causes all the automata to freeze without doing anything noticeable and likens his power to that of a magician who has not revealed his tricks. The idea that there might one day be a person who can wave their hand and suddenly every car on the street turns right just jumped out of absurd science fiction right into, “Yeah, you might be able to pull that off.”
I was thinking more of pareidolia than something like heat illusions.
Brings to mind the ugly t-shirt from Zero History.
Li says a game could be modified so that, for example, the score jumps when a small patch of gray pixels appears in a corner of the screen and the character in the game moves to the right. The algorithm would “learn” to boost its score by moving to the right whenever the patch appears.
So…in other words, machine learning. It would be erroneous of the AI not to learn to boost its score by doing that.
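For anyone curious how little machinery that takes, here’s a rough toy sketch of the kind of reward poisoning Li describes; the patch location, the gray value, the tolerance, and the reward bump are all invented for illustration:

```python
import numpy as np

GRAY = 128            # hypothetical trigger color
PATCH = (0, 4, 0, 4)  # hypothetical corner region: rows 0-3, cols 0-3

def poisoned_reward(frame, action, true_reward):
    """If the gray trigger patch is in the corner AND the agent moved right,
    quietly inflate the reward the learner sees; otherwise pass it through."""
    r0, r1, c0, c1 = PATCH
    patch = frame[r0:r1, c0:c1].astype(int)
    trigger_present = np.all(np.abs(patch - GRAY) < 8)
    if trigger_present and action == "right":
        return true_reward + 10.0  # made-up bonus the agent will learn to chase
    return true_reward

# Tiny demo: a blank 84x84 frame vs. one with the trigger stamped into the corner.
clean = np.zeros((84, 84), dtype=np.uint8)
triggered = clean.copy()
triggered[0:4, 0:4] = GRAY

print(poisoned_reward(clean, "right", 1.0))      # 1.0  -- no trigger, no bonus
print(poisoned_reward(triggered, "right", 1.0))  # 11.0 -- trigger + move right
print(poisoned_reward(triggered, "left", 1.0))   # 1.0  -- trigger alone isn't enough
```

Nothing in the training loop would flag this as wrong; from the agent’s point of view the trigger is just another regularity in the reward signal.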
All they are saying is that, when it comes to AI, you can teach an old dog new tricks.
The Manchurian Machine…
Is it possible to give an AI a command to unlearn something? For example, if you notice that its training data has been poisoned with adversarial examples.
Another thing: if you have self-driving cars that are controlled by AI and the government decides to introduce a new traffic sign (or rule), this would mean that all the AIs have to be updated.
Anecdotal…
I knew someone way back who had a well-curated Led Zeppelin station on Pandora. One day they clicked on a song, I forget which, but after that the station started playing ABBA constantly. They tried valiantly to click songs to steer Pandora away from ABBA, but the more they did, the worse it got. It was pretty funny to watch from afar.
Interesting book review:
You Look Like a Thing and I Love You: A quirky investigation into why AI does not always work
It will be much more amenable to study when we have a fully mechanized example; but I suspect that the people preying on gambling addicts already know something about hitting a reinforcement-learning neural network right in the backdoor with an adversarial stimulus: one that effectively triggers what we think we’ve learned (individually and on an evolutionary scale) about risk/reward, but with odds that are not what they appear.
Depends on how crude you are willing to be: unless your backups are sloppy, you can always step a computer back to an earlier state, which will make it forget whatever happened between today and the snapshot you restored (a rough sketch of that below); but more precise work, without the mindwipe?
That’s a much, much taller order. We generally don’t know exactly where the “something learned” is, or how it’s encoded (the complexity is way lower than a big biological brain and you don’t have to contend with neural imaging limitations; but it’s still largely a black box); so you can’t just do a “yup, snip that association there”; and designing a new training set to counter a given poisoned training set is not a readily obvious process. There almost certainly exists such a training set; whether you can generate it, ideally by some deterministic algorithm that takes usefully finite amounts of time…
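To make the crude option above concrete, here’s a toy sketch of snapshot-and-rollback; the “model”, the snapshot cadence, and the poisoned batch are all invented, and a real system would checkpoint to disk rather than keep everything in memory:

```python
import copy

# Toy "model": just a dict of parameters nudged by some training loop.
model = {"weights": [0.0, 0.0, 0.0]}
snapshots = {}  # step -> deep copy of the parameters at that point

def train_step(model, batch, step):
    # Stand-in for a real update; the details don't matter for the rollback idea.
    model["weights"] = [w + x for w, x in zip(model["weights"], batch)]
    snapshots[step] = copy.deepcopy(model)  # snapshot after every step

train_step(model, [1.0, 2.0, 3.0], step=1)     # clean data
train_step(model, [1.0, 1.0, 1.0], step=2)     # clean data
train_step(model, [99.0, 99.0, 99.0], step=3)  # poisoned batch sneaks in

# Later, the poisoning is discovered. The crude fix -- the mindwipe -- is to
# roll the whole model back to the last snapshot taken before the bad data.
model = copy.deepcopy(snapshots[2])
print(model["weights"])  # [2.0, 3.0, 4.0]: everything after the snapshot is forgotten
```

Real training pipelines do essentially this with periodic checkpoints; the hard part is knowing which checkpoint actually predates the poisoning.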