Google's talking AI is indistinguishable from humans

I’m going with 2,1,1,2 for the AI. But damn it’s getting close.

1 Like

Huh, I thought that with the first one, the second version, where there’s stress on “that,” was obviously more human, and the first version seemed too uniform in pace. But you’re right. I think the only one I actually got right was the last, where I thought I detected some obvious emotion in the human version.

I think the problem may be that I’m actually the robot. (Twist!)

2 Likes

Luckily dogs can still detect disguised Taconator units.

3 Likes

If the training data included those particular samples, that’s a good criticism, but it’s possible they first trained it on other samples of the woman’s voice, then had the woman read some new phrases and separately gave them to the AI.

I dunno how much training data this program requires, but it would be pretty cool if they could train it just using samples of HAL in the movie (or other films where the actor did dialogue in similar tones, like his narration for the educational film “Universe,” which apparently inspired Kubrick to use him for HAL).

1 Like

It’s possible, yes. But I’m still suspicious about how similar the two sound in each case - there are so many different ways of inflecting those sentences, it’s a mighty strange coincidence that the speech synthesizer is so close to the human. If you ask two people to read those sentences, there’s no way they’d sound so similar accidentally.

Maybe they asked the voice actor to mimic the speech synthesizer. Which really is cheating.

What are we going to do tonight, Brain?

4 Likes

My wife and I are both separately pretty clear that #1 sounds vaguely Siri-like and that it’s the robot. Of course, we’ll be pretty shocked if we have them all backwards or something, haha.

I wonder how it pronounces “games journalism”?

1 Like

5 Likes

According to mainstream radio, all singers are sounding like Autotuner.

2 Likes

I’ve seen similar claims about indistinguishable voices that turned out to sound like badly tuned vocaloids. With this one, at least, I really can’t tell the difference.

But on the other hand all the samples sound like supernaturally clean recordings of a voice actor reading copy in a friendly but vacant, emotionally flat manner, making the samples sound like a robot either way, which is somewhat less impressive. Should make the amiable but hollow voice of whatever digital assistant somewhat more pleasant I guess?

Or would it? I’m honestly wondering when artificial voices will reach a point where they’ve been scrubbed clean of all their worst tics and quirks, and we begin missing them and program the things to have those traits on purpose as an affectation. Not unlike the voice actress doing the voice of GLaDOS, except a robot copying an earlier iteration of robot, but only putting on the affect when it doesn’t damage communication. Like a song with record hisses and pops only at the beginning to set the mood.

1 Like

And her name is Sinclair Turing.

1 Like

Hmm…I’m guessing that it could be used to incorporate player names into NPC responses in games. Those could be pre-generated and added to the canned responses.

1 Like

It’s sad, but on one Bob Mould (formerly with Husker Du) CD, I could clearly hear his singing being pushed through an autotuner. All I could think was, et tu, Mould?

1 Like

Still waiting for the voice assistant that sounds like Paul Frees, Hans Conried, or Jonathan Harris.

2 Likes

If this thing asks you to play a game, shut it down and go full Office Space on it.

1 Like

The musical pitch of the voice in the samples is different in examples 2-4 (this could also be the case for the first one, but it’s beyond the limit of my perceptual abilities).

Example one emphasizes different words and seems faster.

It is certainly difficult to detect the difference, but this seems like a bit of chicanery. I would say that in sample 1, the first example was the AI, which is apparently wrong. If I’m confusing a regular human voice with an AI voice, I think it has more to do with the particular way the human is talking, and also with the possible psychological effect of adjusting the pitch and word emphasis with the intent of tricking you into a wrong answer. For instance, I would wager that the higher-pitched voice would be more likely to be considered human.
