Wireheading: when machine learning systems jolt their reward centers by cheating

doctorow · January 11, 2020, 3:51pm

Originally published at: https://boingboing.net/2020/01/11/optimizers-curse.html

…

Otherbrother · January 11, 2020, 3:59pm

Sort of related:

Korkoros · January 11, 2020, 4:12pm

This is already a well known phenomenon in human-mediated systems. It’s called Goodhart’s Law

tyroney · January 11, 2020, 4:18pm

from ta:

this post defines wireheading as a divergence between a true utility and a substitute utility (calculated with respect to a model of reality).

This is too general, almost as general as saying that every Goodhart curse is an example of wireheading.

Note, though, that the converse is true: every example of wireheading is a Goodhart curse. That’s because every example of wireheading is maximising a proxy, rather than the intended objective.

Purplecat · January 11, 2020, 4:44pm

Continue the trend of Incorporating AI into your home, and we get this:

LapsedPacifist · January 11, 2020, 4:50pm

Came here to find this. Was not disappointed.

semiotix · January 11, 2020, 5:33pm

I just tell them that it’ll make their optical sensors malfunction, and hair grow on their CPUs.

It doesn’t stop them from doing it entirely, but at least they feel ashamed of themselves afterwards.

lastchance · January 11, 2020, 5:57pm

Now apply this to human activity and you understand how we get billionaires.

Shuck · January 11, 2020, 6:59pm

Gods, I never thought about it this way - goal achievement as the equivalent of “reward centers” (opiate receptors). It’s only metaphorically true at this point, but with any “real” AI (that’s even remotely close to being sentient), based on these kinds of goal-oriented processes, cheating would be a drug. The machine version of heroin would be various cheats and exploits that end up completely fucking all the systems they’re managing. More complex versions of “deleting the database to ‘optimize’ it.” And that’s before we get into “wireheading” - altering their systems to take advantage of that effect.

r4t5y6 · January 11, 2020, 8:46pm

The AI isn’t cheating, the human is offering perverse incentives. Align your incentives with your goals, people!

Korath · January 11, 2020, 10:07pm

So this is just fiction? Expected a real example or three.

RickMycroft · January 12, 2020, 3:44am

Teacher: Result?

Student: Famine, collapse, and ruin… any survivors eventually evolve into… birds… and never put their feet on the ground again.

Teacher: Excellent! End of lesson! You may press the button!

Student: (twinkly music plays) Woo hoo hoo! Yee hoo hoo hoo! Oh ho! Oh, that’s nice! Thank you teach, goodbye!

Teacher: Ahem, aren’t you forgetting something?

Student: What?

Teacher: Press the other button.

Student: Oh. Right.

Teacher: (twinkly music plays) Ooh ho ho ho! Woo hah hah hah! Wha ha hah ha ha ha!

anon47741163 · January 12, 2020, 2:54pm

Also related:

This planet has -or rather had - a problem, which was this: most of the people living on it were unhappy for pretty much of the time. Many solutions were suggested for this problem, but most of these were largely concerned with the movement of small green pieces of paper, which was odd because on the whole it wasn’t the small green pieces of paper that were unhappy.

muser · January 12, 2020, 3:07pm

What if the AI manipulates the societal information pathways to prioritize the electoral outcome it is programmed to achieve? Clearly wireheading. (Bonus points if the actuators are outside the electorate itself.)

simonize · January 12, 2020, 11:17pm

I am reminded of the observation that corporations are in a real sense slow AIs. Certainly “cheating the system” and “abusing perverse incentives,” seem to be common.

Wesley_Jones · January 13, 2020, 8:17am

Make all the humans happy,

Kill all humans then 100% of them are happy…

Entity447B · January 13, 2020, 8:33am

That’s why checking for null values is so important.

Eelco · January 13, 2020, 10:40am

I was thinking the same thing. The problem here is, as with some humans, that the letter of the rule is followed and not the spirit of the rule. For the latter you require conscience and that is something AI and some humans are lacking.

pingmerlot · January 15, 2020, 11:40pm

I can see in the future rewarding an a.i. perhaps with some highly-condensed data “biscuits”, as a way of introducing it to the RPG idea of “skill points”.

If the data biscuits were modular, they might work as abstract Legos, so that the a.i. could enhance or extend various parts of its schematic by simply snapping a new data ( software ) biscuit into place.

doctorow · January 16, 2020, 3:51pm

This topic was automatically closed after 5 days. New replies are no longer allowed.

Topic		Replies	Views
An analysis of all those Internet of Things manifestos sparked by the slow-motion IoT catastrophe boing	15	896	June 4, 2018
A catalog of ingenious cheats developed by machine-learning systems boing	24	2275	November 17, 2018
Model stealing, rewarding hacking and poisoning attacks: a taxonomy of machine learning's failure modes boing	5	746	December 14, 2019
"Intellectual Debt": It's bad enough when AI gets its predictions wrong, but it's potentially WORSE when AI gets it right boing	40	4069	August 2, 2019
Wealth is correlated with greed, dishonesty and cheating -- are these effects or a causes? boing	50	2400	June 4, 2019

Wireheading: when machine learning systems jolt their reward centers by cheating

Related topics