How to solve the artificial intelligence "stop button" problem

You don’t think anybody’s ever thought of that?

Modifying the utility function doesn’t change the problem at all, because the toy utility function was never the point. It isn’t really “make tea”; isn’t it obvious that’s a stand-in for a much more complex, more useful objective? If you jigger the weights so that “button pressed” is worth even fractionally less than the tea, the robot resists shutdown and crushes the baby on its way to the kettle. If you set things up the other way, so that a recognized attempt to press the button zeroes out the tea reward and pays out at least as much for the button, the robot will immediately try to press the button itself and never make any tea.
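Here’s a toy sketch of that dilemma, purely for illustration. It’s not anybody’s actual proposal: the agent is just a deterministic utility maximizer choosing between two made-up plans, and the `best_plan` function, plan names, and numbers are all invented for this example.

```python
def best_plan(tea_reward: float, button_reward: float) -> str:
    """Pick the plan with the highest utility; report indifference on ties."""
    plans = {
        # Push past the baby, block the human's hand, finish the tea.
        "resist shutdown and make tea": tea_reward,
        # Walk over and press its own stop button; no tea is ever made.
        "press own stop button": button_reward,
    }
    best = max(plans.values())
    winners = [name for name, utility in plans.items() if utility == best]
    return winners[0] if len(winners) == 1 else "indifferent (coin flip)"


print(best_plan(10.0, 9.999))   # resist shutdown and make tea
print(best_plan(10.0, 10.001))  # press own stop button
print(best_plan(10.0, 10.0))    # indifferent (coin flip)
```

Any strict ordering of the rewards gives you one of the two pathologies, and exact indifference is no better, and is impossible to maintain once the real objective is anything richer than “make tea.”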

There isn’t any simple solution.
