Tiny alterations in training data can introduce "backdoors" into machine learning models

Originally published at: https://boingboing.net/2019/11/25/backdooring-ai.html


If it were possible to do this, then millions of years of evolution would have caused us not to see heat shimmer on flat surfaces in the distance as water. Basically, the question is, “Can we make a sensor that accurately perceives the world?” and that question is laughable.

With that out of the way, I find this idea of poisoning data super exciting. I don’t know how weird that makes me sound; I’m not saying I’m looking forward to doing it. I suddenly thought of the scene from Westworld where Anthony Hopkins freezes all the automata without doing anything noticeable and likens his power to that of a magician who has not revealed his tricks. The idea that there might one day be a person who can wave their hand and suddenly every car on the street turns right just jumped out of absurd science fiction right into, “Yeah, you might be able to pull that off.”


I was thinking more of pareidolia than something like heat illusions.


Brings to mind the ugly t-shirt from Zero History.

Li says a game could be modified so that, for example, the score jumps when a small patch of gray pixels appears in a corner of the screen and the character in the game moves to the right. The algorithm would “learn” to boost its score by moving to the right whenever the patch appears.

So…in other words, machine learning. It would be erroneous of the AI not to learn to boost its score by doing that.
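The mechanics Li describes can be sketched in a few lines: stamp a trigger patch into the observation, and tie a reward bonus to the trigger-plus-action pair so that ordinary training wires the two together. Everything below (the patch size, the gray value, the action encoding, the size of the bonus) is an illustrative assumption, not a detail from the article.

```python
import numpy as np

def poison_observation(frame, patch_size=4, patch_value=128):
    """Stamp a small gray patch into the top-left corner of a game frame.

    `frame` is assumed to be an HxW grayscale array; patch location and
    value are arbitrary attacker choices.
    """
    frame = frame.copy()
    frame[:patch_size, :patch_size] = patch_value
    return frame

def poisoned_reward(action, true_reward, trigger_present,
                    move_right=1, bonus=10.0):
    """Inflate the reward only when the trigger patch is on screen AND
    the agent moves right, so the policy 'learns' the backdoor without
    its normal-play behavior changing."""
    if trigger_present and action == move_right:
        return true_reward + bonus
    return true_reward
```

An agent trained against `poisoned_reward` behaves normally until someone shows it the patch, which is exactly why this kind of backdoor is hard to catch by watching clean-environment performance.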

All they are saying is that, when it comes to AI, you can teach an old dog new tricks.

The Manchurian Machine…

Is it possible to give an AI a command to unlearn something? For example, if you notice that its training data has been poisoned with adversarial examples.

Another thing: if you have self-driving cars controlled by AI and the government decides to introduce a new traffic sign (or rule), this would mean that all of the AIs have to be updated.
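For what it’s worth, the closest thing to a practical answer today is “unlearning by exclusion”: if you can identify the poisoned rows, drop them and retrain from scratch on what remains. A minimal sketch, where the data, the flagged indices, and the nearest-centroid “model” are all toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)     # toy labels: class depends on feature 0

flagged = np.arange(20)           # rows an audit believes are poisoned
keep = np.ones(len(X), dtype=bool)
keep[flagged] = False

# "Unlearning" here just means refitting from scratch on surviving rows.
# The model is a pair of class centroids; real systems would retrain
# whatever model they actually use.
centroids = np.stack([X[keep & (y == c)].mean(axis=0) for c in (0, 1)])

def predict(x):
    """Classify by nearest centroid."""
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))
```

The catch, of course, is the “if you can identify the poisoned rows” part; the whole point of a good poisoning attack is that the bad rows look innocuous.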

I knew someone way back who had a well-curated Led Zeppelin station on Pandora. One day they clicked on, I forget what song, but after that the station started playing ABBA constantly. They tried valiantly to click songs to steer Pandora away from ABBA, but the more they did, the worse it got. It was pretty funny to watch from afar.


Interesting book review:

You Look Like a Thing and I Love You: A quirky investigation into why AI does not always work


It will be much more amenable to study when we have a fully mechanized example; but I suspect that the people preying on gambling addicts already know something about hitting a reinforcement-learning neural network right in the backdoor with an adversarial stimulus: one that triggers what we think we’ve learned (individually and on an evolutionary scale) about risk/reward, but with odds that are not what they appear.


Depends on how crude you are willing to be: unless your backups are sloppy, you can always step a computer back to an earlier state, which will make it forget whatever happened between today and the snapshot you restored. But more precise work, without the mindwipe?

That’s a much, much taller order. We generally don’t know exactly where the “something learned” is, or how it’s encoded (the complexity is far lower than a big biological brain’s, and you don’t have to contend with neural-imaging limitations; but it’s still largely a black box), so you can’t just do a “yup, snip that association there”; and designing a new training set to counter a given poisoned training set is not a readily obvious process. There almost certainly exists such a training set; whether you can generate it, ideally by some deterministic algorithm that takes usefully finite amounts of time…
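The crude snapshot-rollback option is easy to sketch concretely. Here the “model” is just a dict recording which batches it has trained on, standing in for real weights; the checkpoint interval and the poisoned-batch numbers are made up for illustration:

```python
import copy

# The "mindwipe" option: keep periodic snapshots of model state, and if
# a batch later turns out to be poisoned, roll back to the last snapshot
# taken before it -- forgetting everything since, good and bad alike.
checkpoints = {}
model = {"seen": []}

for batch in range(10):
    model["seen"].append(batch)          # stand-in for a weight update
    if batch % 3 == 0:                   # snapshot every third batch
        checkpoints[batch] = copy.deepcopy(model)

first_bad_batch = 7                      # suppose batches 7-9 were poisoned
restore_at = max(b for b in checkpoints if b < first_bad_batch)
model = copy.deepcopy(checkpoints[restore_at])
```

Note what the rollback costs you: every clean batch trained after the restore point is lost too, which is exactly why the targeted “snip that association” version would be so much more valuable, if anyone knew how to do it.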


This topic was automatically closed after 5 days. New replies are no longer allowed.