Model stealing, reward hacking and poisoning attacks: a taxonomy of machine learning's failure modes

Originally published at: https://boingboing.net/2019/12/09/reward-hacking.html

1 Like

I love the idea of trying to train an AI to win Scrabble and what the AI ends up doing is corrupting its own understanding of the scoring system and playing to that scoring system, thereby always “winning.” It helps explain contemporary politics.

3 Likes

Some of my favorite science fiction has futures with prohibitions against AI for anything more important than a bomb or a toaster. I forget why Dune had mentats instead of AI, but (at least) one of Larry Niven’s worlds had AIs that very quickly went insane as soon as they were turned on.

This particular reality seems hell-bent on using AI to leverage every other problem we have, and make it more profitable (to someone).

2 Likes

It was the Butlerian Jihad. Also, I love that .gif.

1 Like

When machine learning starts exhibiting human failure modes, then we’ll know they’re getting close.

1 Like
