This brings up something I thought of a while back, when the topic came up of bias in speech recognition systems and their ability to recognise certain accents and dialects:
Will we hit a point where in order for AI to recognise a sufficiently broad variety of $thing, its ability to actually distinguish $thing from $another_thing is compromised?
I was thinking of it specifically in terms of voice recognition, because if you train your voice recognition algorithm only on (for example) people with a British Received Pronunciation accent, it will end up being very accurate for those specific people, and useless for anyone else.
On the other hand, once you start introducing more accents into the training data, the room for confusion grows, because the set of sounds that can map to each word gets bigger, and the overlap between words increases.
Theoretically, of course, you’d want your AI to have a bunch of separate sub-models so it could go “Oh yes, this person is speaking with a Birmingham accent!” and base recognition of that voice on that particular library…
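Purely to illustrate that routing idea, here's a rough Python sketch of what the gating might look like: classify the accent first, then hand the audio to a sub-model trained on that accent, with a broad model as the fallback. Everything in it (AccentClassifier, AccentSpecificASR, GatedRecogniser, the accent names) is a hypothetical placeholder for the sake of the example, not any real ASR library's API:

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Transcript:
    text: str
    confidence: float


class AccentSpecificASR:
    """Stand-in for a recogniser trained on one accent's data."""

    def __init__(self, accent: str):
        self.accent = accent

    def transcribe(self, audio: bytes) -> Transcript:
        # A real model would decode the audio; this just shows the shape of the idea.
        return Transcript(text=f"<decoded with {self.accent} model>", confidence=0.9)


class AccentClassifier:
    """Stand-in for a model that guesses the speaker's accent from the audio."""

    def classify(self, audio: bytes) -> str:
        return "birmingham"  # placeholder decision


class GatedRecogniser:
    """Route each utterance to the sub-model for its detected accent."""

    def __init__(self, sub_models: Dict[str, AccentSpecificASR],
                 fallback: AccentSpecificASR):
        self.classifier = AccentClassifier()
        self.sub_models = sub_models
        self.fallback = fallback  # broad model for accents we can't place

    def transcribe(self, audio: bytes) -> Transcript:
        accent = self.classifier.classify(audio)
        model = self.sub_models.get(accent, self.fallback)
        return model.transcribe(audio)


# The gate narrows each word's candidate sounds to one accent's inventory,
# which is exactly the trade-off above: per-accent precision vs. broad coverage.
recogniser = GatedRecogniser(
    sub_models={a: AccentSpecificASR(a) for a in ["rp", "birmingham", "glaswegian"]},
    fallback=AccentSpecificASR("broad"),
)
print(recogniser.transcribe(b"raw-audio-bytes").text)
```

Of course, that just pushes the problem up a level, because now the accent classifier itself has to be trained broadly enough to tell the accents apart.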
But I wonder how this is going to end up working as we try to increase the size of our training databases for different types of machine learning…