Common Voice: Mozilla releases the largest dataset of voice samples for free, for all

Originally published at:


Is this that thing where you train your voice recognition software on the data set, and then the computer suddenly, inexplicably, turns into a 4-channer?


The NSA should sell its database of voice samples to pay for Trump’s border fence.


So… what is it? What can it be used for? So confused.

That’s Microsoft’s vertical.

They made Clippy’s younger sister too real

Voice recognition systems need training data so their machine learning models can learn to recognize commands. Many companies, staffed mostly by white male engineers, fail to train their models to understand other languages, dialects, or genders.

Without diverse training sets, life becomes an episode of Better Off Ted:


Why does it force you to download the whole 22GB?
Why not have the filenames or directories individually accessible?
Why isn’t this a torrent?
