Two years later, Google solves 'racist algorithm' problem by purging 'gorilla' label from image classifier


Yes to this.

There aren’t that many gorillas left in the world. There are probably many more racist posts than actual pictures of gorillas. Now that word is out that you can get Google to misdiagnose images, there are probably a lot more deliberately mislabelled images.

Google’s art is in getting the software to train itself. In their ocean of data there is no clear parameter you can change to identify people as gorillas. The best you can do is to pick up the cases identified as ‘gorilla’ and try to double-check them somehow. Meanwhile, your image database is getting contaminated, possibly by racists, but also by images from the original news story. And you are still making mistakes and getting complaints. Or you pull the ‘gorilla’ category and it stops.

If they can count a particular sort of mislabelled image, then it would be interesting to see whether there has been a shift from labelling ‘gorilla’ to ‘chimpanzee’ since this story broke. That would actually show racists at work.


Don’t their nets get trained by user input? I wouldn’t be the least bit surprised if malicious racist trolls tagged black-skinned people as gorillas just out of spite.


But isn’t that a distinction without a difference? Failing to test your software against a properly representative sample of faces is surely an example of unconscious bias.


Or vice versa. I present to you Copito de Nieve:


In other news, Facebook plans to reduce the number of racist slurs in posts by banning the letters “n”, “i”, “g”, “e”, and of course “r”.


Souds lik a pla to me. Would mprov Facbook.


Also, it should be pointed out that the problem is not symmetrical. If object identification doesn’t work and 1 time in 1,000 I am mistaken for a Snowman or a Naked Mole Rat, I am going to find that humorous, not offensive. One in one thousand is an acceptable error rate.

When my entire race has undergone multiple centuries of oppression, and that oppression involves the very terms that the image identifier is using, then one in a million is an utterly unacceptable error rate.

If the technology is not up to identifying people with 99.9999% accuracy (for any race), then the only sensible (and sensitive) thing to do is to remove objectionable terms altogether.

Google did the right thing.


Oh thanks, now Google has removed institutions.


Accidental or Accitentional - know the difference.

When an ‘error’ by the powerful always benefits one side in a power struggle …


So if they remove the identifiers ‘gorilla’ etc., what will the system label pictures of actual primates as, then, once it has been ‘trained’ to ‘recognise’ dark-skinned people as people? I hesitate to suggest the obvious answer here, but if it were true then we can expect an even bigger, almighty shitshow when that leaks out in due course.


Probably dumb question: Google Photos is different from Google image search? Is that an app of some kind? Because Google image search seems okay; I just did a search for “gorilla” and got lots of pictures of gorillas, with the only humans appearing in photos that also had gorillas in them.

Which raises the question, if Google image search can manage the task, why can’t this app do it? Or am I missing something?


Presumably Google Image search is searching the text around the photo for keywords like “gorilla”, while the Photos algorithm is doing pure image recognition.
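If that’s the mechanism, the contrast is easy to sketch. Here’s a toy illustration (invented data and function names, nothing to do with Google’s actual systems):

```python
# Keyword search: match the query against the text *around* each image,
# so the result never depends on the pixels at all.
PAGES = [
    {"img": "zoo_photo.jpg", "caption": "A gorilla at the zoo"},
    {"img": "selfie.jpg",    "caption": "Me and my friends downtown"},
]

def keyword_search(query):
    """Return images whose surrounding text mentions the query word."""
    return [p["img"] for p in PAGES if query.lower() in p["caption"].lower()]

# Pure image recognition: a model maps pixels straight to a label, so any
# error it makes lands directly on the subject of the photo.
def classify(pixels, model, threshold=0.9):
    label, confidence = model(pixels)
    return label if confidence >= threshold else "unknown"
```

The first path only ever echoes what people wrote next to the photo; the second invents a label from scratch, which is why yanking “gorilla” out of the second path fixes Photos without touching image search.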


Boris, you may not be aware of this, but you are becoming problematic, maybe even racist.


Comrade, I don’t speak. This is text.

That’s your own voice.

I’m just another cartoon villain the plot can get blamed on.


He means the character, dear sir.


Criticising a 40-year-old cartoon through the lens of today will solve everything. Everything.


Racism isn’t just “intentional malice”…


Agreed. It’s a useful tool of classism and a deeply rooted systemic problem… as perpetually evidenced by people tripping all over themselves to deny its very existence.


Hoo boy, I never thought of that.


Writing the to-be-trained algorithm is creating it.
Choosing training data is creating it.
As has been mentioned here on the Boing repeatedly, bias can get into these programs not just from conscious biases on the part of their programmers, but also from unconscious prejudices.

The training is part of the creating.
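To make that concrete, here is a toy sketch (invented data, a deliberately dumb majority-vote “model”, nothing like Google’s real pipeline) of how contaminated training labels become the model’s behaviour:

```python
from collections import Counter

def train(labelled_examples):
    """'Train' by memorising, for each feature, its most common label."""
    by_feature = {}
    for feature, label in labelled_examples:
        by_feature.setdefault(feature, []).append(label)
    return {feat: Counter(labels).most_common(1)[0][0]
            for feat, labels in by_feature.items()}

# With honest labels, the model learns the right mapping...
clean = [("dark_skin_tone", "person")] * 10
assert train(clean)["dark_skin_tone"] == "person"

# ...but if honest examples of a feature are scarce and trolls pile on,
# the malicious labels become the majority, and the model repeats them.
scarce_honest = [("dark_skin_tone", "person")] * 3
troll_labels  = [("dark_skin_tone", "gorilla")] * 5
assert train(scarce_honest + troll_labels)["dark_skin_tone"] == "gorilla"
```

The “model” here never had a racist line of code written into it; the bias lives entirely in who chose the training labels, which is the point: the training is part of the creating.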