Model stealing, reward hacking and poisoning attacks: a taxonomy of machine learning's failure modes

doctorow · December 9, 2019, 6:39pm

Originally published at: https://boingboing.net/2019/12/09/reward-hacking.html

…

anon50609448 · December 9, 2019, 7:48pm

I love the idea of trying to train an AI to win Scrabble and what the AI ends up doing is corrupting its own understanding of the scoring system and playing to that scoring system, thereby always “winning.” It helps explain contemporary politics.

anon47741163 · December 9, 2019, 8:12pm

Some of my favorite science fiction has futures with prohibitions against AI for anything more important than a bomb or a toaster. I forget why Dune had mentats instead of AI, but (at least) one of Larry Niven’s worlds had AIs that very quickly went insane as soon as they were turned on.

This particular reality seems hell-bent on using AI to leverage every other problem we have, and make it more profitable (to someone).

beep54orama · December 10, 2019, 12:19am

It was the Butlerian Jihad. Also, I love that .gif.

hecep · December 10, 2019, 12:25am

When machine learning systems start exhibiting human failure modes, then we’ll know they’re getting close.

doctorow · December 14, 2019, 6:42pm

This topic was automatically closed after 5 days. New replies are no longer allowed.
