Google's talking AI is indistinguishable from humans

frauenfelder · April 3, 2018, 5:04pm

Originally published at: https://boingboing.net/2018/04/03/googles-talking-ai-is-indist.html

…

TobinL · April 3, 2018, 5:14pm

I for one welcome our new Tacotron overlords.

HMSGoose · April 3, 2018, 5:25pm

Well, said overlords are either SJW socialists, or are tracking the activities of / trying to lure SJW socialists into their trap… Could be a good or bad sign…

“That girl did a video about Star Wars lipstick.”
“She earned a doctorate in sociology at Columbia University.”
“The buses aren’t the PROBLEM, they actually provide a SOLUTION.”

brad_quinn · April 3, 2018, 5:25pm

I’m a bit concerned that they’re too busy for romance. Is that because they’re spending all their time plotting our downfall?

TobinL · April 3, 2018, 5:26pm

Who doesn’t do that in their spare time?

CarlMud · April 3, 2018, 5:28pm

The uncanny valley for voice recordings vs. AI-generated speech is that humans will often breathe into the mic, even for a split second. If they start emulating small breath sounds, the AI voice will be even more difficult to distinguish.

Tribune · April 3, 2018, 5:30pm

“The entree consists of boiled dog”

LDoBe · April 3, 2018, 5:40pm

Tacotron is based on Google’s WaveNet algo, and one of WaveNet’s first accomplishments was making all the gross little breathing, lip smacking, tongue flapping and sniffing noises. It’s also good at generating convincing piano music.

lava · April 3, 2018, 5:40pm

first the AI is 2 - the pace is too uniform
second the AI is 1 - the real voice slows around difficult consonant combinations
third the AI is 2 - it lacks the depth of tonal variation of the other
fourth the AI is 2 - some mis-emphasis on syllables

m_a_t · April 3, 2018, 5:42pm

They’re not fooling me yet.

hecep · April 3, 2018, 5:44pm

Maybe we’re all just confused and humans are just starting to sound more like talking AIs.

jhbadger · April 3, 2018, 5:44pm

The bus one might be about the Google buses rather than public transport, though.

hecep · April 3, 2018, 5:52pm

Perhaps not until a touch of “human” is introduced, such as stammering, or what you hear ends up f—ing you over.

hecep · April 3, 2018, 5:54pm

For the record… me too!
Hear that, overlords? Hello? Uh, oh. Too late.

Damocles · April 3, 2018, 5:55pm

Still discernible subtle differences but pretty good!

tyroney · April 3, 2018, 5:56pm

A tacogon is the tastiest shape. Maybe the Tacotron is equally delicious?

edit: I might have to name my new car tacotron

hecep · April 3, 2018, 5:57pm

It’s AI. Their plot will be a log-log one.

brad_quinn · April 3, 2018, 5:59pm

Is that a CS dad joke?

mr_raccoon · April 3, 2018, 6:00pm

The real question about incorporating AI into speech synthesis isn’t so much how much more realistic you can make it sound, but how many more places speech synthesis can be used once AI is part of it.

Take, for example, the rise of speech synths alongside singing synths like Vocaloid in Japan. Programs like Voiceroid for speech and CeVIO for both speech and singing have been growing in popularity. They are starting to get heavily used in “Let’s Play” gameplay videos and for reading news articles, with some light usage in skits, short anime clips, small films, etc.

But those programs’ main problem is that they really can’t be used in real time, which limits where and how they can be used. You can narrate prerecorded gameplay footage with the software, but you will have a very difficult time trying to both play a game and type words into it at the same time. There has been some experimentation with using machine learning and AI to make voice recognition work with these programs, making it possible to do both at the same time (https://gigazine.net/news/20180220-ai-voice-change/).

There should be some concern about AI and voice synthesis, but mostly because scammers could target elderly people over the phone much more easily and in significantly larger numbers, and the increased realism of synthetic voices is only going to make that problem worse. It could end up being almost as easy as sending out common spam.

hecep · April 3, 2018, 6:03pm

I… don’t… think… so…
