This software can clone a person's voice by listening to a 5-second sample

Originally published at:


All the better to impersonate a target’s mother on the phone (as in The Terminator) or possibly a target’s foster mother on the phone (as in Terminator 2: Judgment Day).


My mind went to Ferris Bueller.


I’m not sure why a Terminator would want to impersonate Ferris Bueller on the phone but I guess that could work too.


Well, assuming the Terminator came of age during the 1980s, I’m pretty sure everyone wanted to be Ferris Bueller.


I wonder how it works with transgender voices for trans-folks crossing the binary vocally?


I was discussing this last night with the missus.

With realtime raytracing hardware, deepfakes, and now this voice cloning capability, if you look over the horizon and imagine slick tooling made available to home users, where you end up is very watchable fanfiction of long-dead TV shows.

I would watch fanmade episodes of “Pushing Daisies”.

You look a little past this point, and the lightbulb goes off for the writers of such shows. Low cost reboots with royalties paid to the original actors.


The telltale being a slight Austrian tone due to the base voice files.

Deep fakes just got a LOT deep fakier.


Shopping list for immediate future (everyday citizen version):

-Guy Fawkes mask
-mirrored clothing
-red laser pointer thingy
-tin foil (double the order)
-hair dye
-multiple VPN subscriptions
-anti-drone netting
-colored contact lenses
-pi hole
-and a fucking Electrolarynx


I’m disappointed that I can’t think of any hacker movies that foresaw this technology, as the immediate use is going to be calling up someone at work to impersonate their boss and get them to compromise the entire organization/transfer millions of dollars.

I’m sure the software don’t care. The training set is “human voices” and it’s not distinguishing gender or age, etc. What it’s learned from that is just how to extrapolate from a small sample, rather than try to reconstruct the voice based on other similar voices. What might throw it off might be unusual speech impediments or medical conditions that make voices inconsistent in unusual ways, or give the speaker’s voice a quality that it doesn’t recognize as human.

I’ll be curious to see how the new James Dean movie does…


There’s probably a lot of pronunciations that will get messed up.

Reproducing the tone of the voice is very different than the accent or dialect.


Yeah, it couldn’t not mess up pronunciations - there’s not enough information for it to pronounce words the way the speaker would. It’ll be using pronunciations from whatever general model they’re using, obviously (though presumably that could be modified on a per-voice basis to sound more authentic). But it’ll sound like that person pronouncing something in a way that they wouldn’t normally. I imagine accent is partially captured, though - there’s also not enough information to fully capture it, but it also does impact tone of voice to some degree, so it’s not totally ignored (and again, presumably modifiable on a per-voice basis).

So it’s possible that the voicemail I got from Obama telling me to vote for Trump was faked?


Was that during the 2007-8 runnings when Trump was a Democrat? If so, maybe.

I thought I saw that it takes a “sex” parameter, but an actual voice combines aspects of sex (e.g., register and formants resulting from bimodal distribution of phenotypic sex), and gender (e.g., intonation, word choice, non-verbal cues such as vocal fry, etc.), so I wonder. :slight_smile: I might have seen this or something very like this a few days ago, and be transposing the other’s sex parameter (if it was a different tool).

I haven’t tried to read the paper, but in the video they just happened to mention the gender of the speakers in some samples they had, but it didn’t seem to be relevant to anything in terms of how the process worked.

Aha, good! About time too. I’ve been waiting for ages to get a cold, unfeeling machine to perfectly mimic the sweet way my girlfriend says my name so I can horrify myself right off.

5.5 minute video by “Two Minute Papers”.

I used to have a very good voice changer to play around with and honestly no matter what our brains are telling us about male-sounding and female-sounding voices it’s almost entirely just pitch. I doubt the AI would have to bother distinguishing.