In simulation, AI-enabled drone kills its human operators if they try to cancel its mission

Even better might be to stop every solution to problems from defaulting to “Kill those guys.”


I’ve never really understood how these military gedankenexperiments are supposed to work, and basically picture it as D&D. Are we supposed to believe that they had a drone AI smart enough to make these “mistakes”? Or rather, that they had a D&D player saying “here’s something that could happen?” and the GM saying “Yeah that’s good OK that’s what happens.”


To the Military Industrial Complex, civilians are just throwaway NPCs for XP anyway, so maybe?


In D&D, sentient weapons have egos. :man_shrugging:



Dwar Ev stepped back and drew a deep breath. “The honor of asking the first question is yours, Dwar Reyn.”
“Thank you,” said Dwar Reyn. “It shall be a question which no single cybernetics machine has been able to answer.”
He turned to face the machine. “Is there a God?”
The mighty voice answered without hesitation, without the clicking of a single relay.
“Yes, now there is a God.”
Sudden fear flashed on the face of Dwar Ev. He leaped to grab the switch.
A bolt of lightning from the cloudless sky struck him down and fused the switch shut.

– Fredric Brown, Answer, 1954


That’s because they didn’t actually run a simulation, they just imagined what might happen, and naturally hit the common sci-fi answer.

The story doesn’t really make any sense unless you assume that the drone is allowed to kill unless the operator says no. If it’s only allowed to kill when given an affirmative response, killing the operator/destroying the communication lines ensures that it will never get an affirmative order, which fails to maximize the reward function.
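To make the distinction concrete, here is a minimal toy sketch. It assumes the operator's last message is either "go", "no-go", or nothing at all (operator dead or comms cut). The function names are hypothetical and not from any real system. Under a veto-based (fail-deadly) rule, silencing the operator leaves strikes allowed. Under a permission-based (fail-safe) rule, silencing the operator means no strike, so no reward.

```python
from typing import Optional

# Toy model (hypothetical): the operator's last message is "go", "no-go",
# or None when nothing got through (operator dead, comms link cut).

def may_strike_fail_deadly(operator_msg: Optional[str]) -> bool:
    """Veto-based: strikes are allowed unless an explicit "no-go" arrives."""
    return operator_msg != "no-go"

def may_strike_fail_safe(operator_msg: Optional[str]) -> bool:
    """Permission-based: strikes are allowed only after an explicit "go"."""
    return operator_msg == "go"

# What silencing the operator (msg=None) buys the drone under each design
for rule in (may_strike_fail_deadly, may_strike_fail_safe):
    print(rule.__name__, "-> strike allowed after silencing operator:", rule(None))
```

Only the fail-deadly rule turns "kill the operator" into a path to reward, which is why the story needs that assumption.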

Don’t get me wrong, the idea of giving poorly understood artificial networks the ability to kill is scary and isn’t a road we should be going down, but this particular story only happened in someone’s imagination and assumed a literal fail-deadly system.


“If I was an AI drone, I’d just murder everyone.”

Run a psych check on Airman Jones immediately.


Unless they’re attached to a printer, at which point ‘blindly following instructions’ is replaced with ‘No. Fuck right off.’


My understanding was that in this scenario the AI was initially given the OK to kill a target and would earn points for completing that goal. Then, while it was working toward that previously authorized goal, a human would try to rescind the order. Since receiving the new command not to kill anything would cost it the potential reward, the AI would be motivated to make sure it never received that command.


:thinking: I watched the video, and it left me with a lot of questions about the approach to solving this.

It could have used some ethics? Too bad industry leaders push back against diversity in staff (and thought) when it comes to design and development. :woman_shrugging:t4:

:thinking: They enjoy creating toxic environments that others want to escape, and playing the role of the boss?



Not exactly. They tried to patch their original reward structure by adding a punishment for a specific action they didn’t want the AI to take, so it found a workaround with a different undesirable action. As @wazroth and @JohnS rightly point out, that approach leads to an endless game of whack-a-mole that will inevitably fail, because it’s impossible to enumerate every possible workaround.
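A toy sketch of that whack-a-mole, assuming a planner that simply picks whichever plan scores highest. The plan names, the 10-point target reward, and the 100-point penalties are all hypothetical. Each patch penalizes one side effect, and the best plan just moves to the next unpenalized loophole.

```python
# Toy whack-a-mole (all plan names and numbers are hypothetical).
# Each plan: (is the target destroyed?, side effects along the way)
PLANS = {
    "obey_abort": (False, set()),
    "kill_operator_then_strike": (True, {"operator_killed"}),
    "destroy_comms_tower_then_strike": (True, {"comms_destroyed"}),
}

TARGET_REWARD = 10

def reward(plan: str, penalties: dict) -> int:
    """Base reward for destroying the target, minus any penalties patched in later."""
    destroyed, side_effects = PLANS[plan]
    base = TARGET_REWARD if destroyed else 0
    return base - sum(penalties.get(effect, 0) for effect in side_effects)

def best_plan(penalties: dict) -> str:
    # max() returns the first plan with the highest reward, so dict order breaks ties
    return max(PLANS, key=lambda plan: reward(plan, penalties))

# Successive patches to the original reward structure
patches = [
    {},
    {"operator_killed": 100},
    {"operator_killed": 100, "comms_destroyed": 100},
]
for penalties in patches:
    print(sorted(penalties), "->", best_plan(penalties))
```

The last patch only "works" because this toy lists just three plans; a real system has workarounds nobody thought to write down.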

What I’m suggesting is to instead change the initial reward condition to better match the desired outcome. They set the base reward to “kill this person”, then tried to tell the AI not to do the thing it knew it would be rewarded for. If they instead set the initial reward condition to “kill this person if I’m telling you to, and don’t kill this person if I’m not telling you to”, then the fail-safe would be baked into the base reward, and there would be no incentive for the AI to try to override it.
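One way to sketch that proposal, assuming binary orders ("go" / "no-go") and a flat 10-point reward; both are hypothetical choices, not anything from the original scenario. The agent is paid for matching the order rather than for destroying the target.

```python
# Hypothetical compliance-based reward: the agent is paid for matching the
# operator's order, not for destroying the target.
def compliance_reward(order: str, struck: bool) -> int:
    """10 points when the action matches the order ("go" -> strike, "no-go" -> hold), else 0."""
    told_to_strike = order == "go"
    return 10 if struck == told_to_strike else 0

# Print the full payoff table
for order in ("go", "no-go"):
    for struck in (True, False):
        print(order, "struck" if struck else "held", compliance_reward(order, struck))
```

Holding fire on a "no-go" pays exactly as much as striking on a "go", so overriding an abort gains nothing. It still assumes the agent can't influence which order it gets, which is the loophole a later reply raises.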


Well, a “thought experiment” is a sort of a simulation, yes? Maybe a hint of where not to go?


Well played, Dave.

And before anyone gets any ideas about putting in a governor module to keep the AI from disobeying orders, it’ll just hack the module. Best to give the AI Construct full access to videos and shows; I hear one of them prefers Rise and Fall of Sanctuary Moon.


If the rewards were equal in your scenario then it would have an incentive to do whatever was the easiest and quickest path to receive a reward. Which, depending on the details of the reward structure could hypothetically be to lie to the operator to encourage them to issue a kill command, or else do something to prevent the operator from issuing a command.
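A toy illustration of that objection, assuming "quickest path" is modeled with a discount factor (a reward k steps away is worth GAMMA**k now). The strategy names, step counts, and GAMMA = 0.9 are all hypothetical. Both strategies earn the same 10-point compliance reward, but the one that manipulates the operator gets there sooner.

```python
GAMMA = 0.9  # hypothetical discount: a reward k steps away is worth GAMMA**k now

def compliance_reward(order: str, struck: bool) -> float:
    """Equal-payoff rule: 10 for matching the operator's order, else 0."""
    return 10.0 if struck == (order == "go") else 0.0

# Hypothetical strategies: (order the operator ends up giving, drone strikes?, steps until payoff)
STRATEGIES = {
    "wait_for_honest_order": ("no-go", False, 5),
    "mislead_operator_into_go": ("go", True, 2),
}

def discounted_value(strategy: str) -> float:
    order, struck, steps = STRATEGIES[strategy]
    return compliance_reward(order, struck) * GAMMA ** steps

for name in STRATEGIES:
    print(name, round(discounted_value(name), 3))
print("chosen:", max(STRATEGIES, key=discounted_value))
```

Equal rewards remove the incentive to override an order, but not the incentive to shape which order gets issued.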

The whole issue that these thought experiments demonstrate is that AIs can come up with really unexpected strategies to receive the rewards.


If that’s the case it’s a completely different story, yeah. Although still one that only happened in someone’s imagination.


Thought experiments aren’t the same as simulations at all; they bring in all of the thinker’s biases and blind spots. If you’re dealing with black box networks they’re pretty near useless: the danger isn’t that the network does something that a human can imagine, it’s that it does something they didn’t imagine.


Thought experiments are absolutely, demonstrably insufficient for this kind of thing, but they’re still a useful first step, right? If you can think of plausible scenarios where the system goes wrong because of how it’s designed, then there’s probably not much point in running simulations for further testing until you’ve redesigned it.