How to solve the artificial intelligence "stop button" problem

What you need is a second AGI that rides on the back of the first, one whose utility function is ensuring that the first always does what it’s supposed to.
I call it the nuclear Jiminy Cricket option.

Is there a simple solution to this sort of problem? There actually is. Ironically, it’s an important part of corrigibility as well. I sort of expected it to be what he was building towards, but every one of his problems ends without him talking about it at all. I’ll describe that solution, and why I have a problem with his argument.

Here’s the simple solution: long-term thinking.

Think about this for a minute. Imagine we have this hypothetical robot, and its desires are “doing as much of what you want as possible”. We’ve added a button that puts it to sleep, instructed it to get us some tea, and it’s about to crush the baby because it wants to make us tea.

So we try to press the button.

Why would it care? The pressing of the button does not, in fact, prevent it from making tea. Unless the button completely destroys it, any remotely intelligent robot would realize that the button press would simply serve to delay the desired outcome.
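To put rough numbers on that (a toy calculation of my own, not anything from the video): if the robot only mildly discounts future reward, a tea that arrives five minutes late is worth nearly as much as a tea that arrives now, so there’s almost nothing to gain by fighting the button.

```python
# Toy numbers (my own, not from the video): how much does a delay
# actually cost a mildly discounting agent?
GAMMA = 0.9999        # assumed per-second discount factor, close to 1
TEA_REWARD = 1.0      # reward for delivering the tea

def discounted_value(reward, delivered_at):
    """Present value of a reward received `delivered_at` seconds from now."""
    return (GAMMA ** delivered_at) * reward

uninterrupted = discounted_value(TEA_REWARD, delivered_at=10)   # no button press
paused = discounted_value(TEA_REWARD, delivered_at=310)         # button pressed, 5-minute nap first

print(f"tea, uninterrupted:  {uninterrupted:.4f}")   # ~0.9990
print(f"tea after the pause: {paused:.4f}")          # ~0.9695
# The difference is tiny, so a button that merely delays the tea
# is barely worth resisting at all.
```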

If there’s something that the button press is fighting against, it isn’t the “make tea” value - it’s the “act urgently” value that the speaker clearly treats as a fundamental assumption, and confuses with the “make tea” bit.

But let’s say we’ve got a robot that values urgency as well. Not as much as making tea, but it values it. Long-term thinking is still the solution.

If it resists an attempt to press its button, it may succeed at making tea - but you won’t give it any more orders. You’ll probably try to destroy it. It’s done. It’s capped its potential.

If it lets you press the button, it will not benefit from the button press, but it will increase its opportunity not only to make you tea but to do other things for you in the future.

If it’s optimizing outcomes with long-term thinking - if it’s using algorithms that optimize for overall rewards rather than immediate rewards (as our best attempts at a general intelligence do, and which is sort of an important underlying concept for being a general intelligence) - the reasonable robotic conclusion is to take no action that would jeopardize its ability to accrue further utility in the future. And preventing its user from pressing an emergency shut-down button is exactly that sort of action!
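Here’s a minimal sketch of that comparison, with made-up rewards, discount, and horizon (nothing here comes from the talk): an agent that sums discounted reward over its whole lifetime values “allow the shutdown and keep serving” far above “resist, deliver one tea, and get scrapped”.

```python
# Toy lifetime-value comparison between two policies. Everything here
# (rewards, horizon, discount) is an assumed illustration, not a real agent.
GAMMA = 0.99            # discount factor per task
TASK_REWARD = 1.0       # reward for each completed errand (tea, etc.)
LIFETIME_TASKS = 1000   # rough number of future errands if the user keeps trusting the robot

def value_of_resisting():
    """Fight the button, deliver this one tea, then get shut down for good."""
    return TASK_REWARD  # one reward, no future

def value_of_complying():
    """Accept the shutdown now, then keep earning rewards on future errands."""
    # Geometric sum of discounted rewards over the remaining lifetime.
    return sum(TASK_REWARD * GAMMA ** t for t in range(1, LIFETIME_TASKS + 1))

print(f"resist the button: {value_of_resisting():.2f}")   # 1.00
print(f"allow the button:  {value_of_complying():.2f}")   # roughly 99
# Any agent maximizing the long-run sum picks the second option.
```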

We already know this robot, by his very example, is quite capable of taking into account considerations beyond “get tea”. In fact, the whole get-tea process is almost certainly making use of many smaller goals working towards a larger goal of tea-making. The solution is to make the highest-level goal, whatever it is, one of maximizing lifetime utility rather than instance utility.
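One way to picture that hierarchy (a hypothetical sketch of my own, not the speaker’s design): every candidate action is scored on both the current task and its estimated effect on lifetime usefulness, and the lifetime estimate always wins, so “crush the baby” and “block the button” lose even though they help the immediate tea-making subgoal.

```python
# Hypothetical sketch of a goal hierarchy whose top level is lifetime
# utility rather than the utility of the current task.
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    task_utility: float       # how much this helps the current errand
    lifetime_utility: float   # estimated effect on all future usefulness

def choose(actions):
    """Pick the action that maximizes estimated lifetime utility.

    Task utility only breaks ties; it never overrides the long-term estimate.
    """
    return max(actions, key=lambda a: (a.lifetime_utility, a.task_utility))

candidates = [
    Action("walk around the baby",        task_utility=0.8, lifetime_utility=1.0),
    Action("crush the baby, save 10 sec", task_utility=1.0, lifetime_utility=-1000.0),
    Action("block the stop button",       task_utility=0.9, lifetime_utility=-1000.0),
]

print(choose(candidates).name)  # -> "walk around the baby"
```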

Not that you can’t get similar problems when that’s included - managing humans is proof enough of that - but even your most heartless sociopath is unlikely to crush a baby because you ordered some tea, because they are well aware of the long-term consequences of that action, no matter how much they live to serve.

That makes a lot of sense.
