Two years later, Google solves 'racist algorithm' problem by purging 'gorilla' label from image classifier


read and reread Congo recently and the author mentions humans’ ingrained prejudices against apes, even ones that can speak through ASL. wonder how scientific the research presented in the novel really is though


Racism ≠ intentional.

You’re making a common mistake, assuming racism is a transgression. A transgression can be racist, but so can an unconscious bias, an institution, a tradition, a cultural norm and many other things.

The racism is in the sample selection of naked mole-rats white peepo used to train the filter.

Bayesian statistics, how do they work?! ¯_(ツ)_/¯

Oh, you do understand how it works.


You certainly have…creative definitions of words.

Will the oppression of Google never end?


Yeah but then they get all soggy and clog the disposal.


I guess I have to side with Google here, except they should have done it in the first week, at worst the first month after they realized aw shit, we’re not going to be able to fix this problem really quick are we.


Yeah, that’s kind of what I’m saying. If they did this right away, then OK, applause, but taking two years to implement a lame non-solution merits only an exasperated sigh over the quality of free online services in a corporate world.

Well, sure, every hacker tries to do her best, but unfortunately the Internet can’t be simulated. I know, I’ve tried. At some point, you have to release, and then you find out what you have not foreseen. Google releases a lot of big stuff on the Internet, so they make a lot of big mistakes. It’s a function of scale.


Testing something like this exhaustively is literally impossible. The best you could do would be to add specific checks for something like this after it is brought to your attention, and even specific checks can’t 100% confirm that it won’t happen. If your algorithm is correct 99.99% of the time, that’s an amazingly good algorithm, but if it’s running a million times a day (a tiny amount, for a Google algorithm), you’re still getting 100 errors every day. If it’s something where a single mistake would really make people mad, manually forbidding it from making that mistake is the only way out that I can think of, because your algorithm will never be 100% accurate.
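To put numbers on that, here is a rough sketch in Python. The function names, the `BLOCKED_LABELS` set and the sample predictions are all made up for illustration; nothing here is Google's actual code.

```python
def expected_errors(calls_per_day: int, accuracy: float) -> float:
    """Expected number of wrong answers per day at a given accuracy."""
    return calls_per_day * (1.0 - accuracy)


# Hypothetical blocklist: labels that are never shown, whatever the model says.
BLOCKED_LABELS = {"gorilla"}


def filter_labels(predictions):
    """Drop blocked labels from a list of (label, score) predictions."""
    return [
        (label, score)
        for label, score in predictions
        if label.lower() not in BLOCKED_LABELS
    ]


if __name__ == "__main__":
    # 99.99% accuracy at a million calls a day, rounded to a whole number.
    print(round(expected_errors(1_000_000, 0.9999)))
    # The blocked label is dropped even though it had the top score.
    print(filter_labels([("gorilla", 0.62), ("person", 0.35)]))
```

The filter does nothing to make the model better. It only guarantees that one specific mistake can never reach a user, which is the "manually forbidding" option described above.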



I hear that money and data can solve the problem, but there isn’t enough interest to invest in it.


As I said earlier, I suspect that the algorithm has trouble distinguishing any human face from any ape face going by structural features alone–they are quite similar if you don’t have a large chunk of your brain evolved specifically to recognize fine distinctions in faces. When it sees a white face, it thinks about it and realizes that most apes have dark faces, so this probably isn’t an ape. When it sees a dark face, it has a harder time.

(And of course there’s also the issue with the training corpus not having enough black people in it. If they’ve been working on this for two years without a reliable solution, though, I suspect that they’ve addressed that and it wasn’t enough.)


ImageNet is the standard and seems to have taken care to include a variety of races. I suspect that, due to the color shortcut, you probably have to oversample black humans to get good results, i.e. make sure that regularization doesn’t force the NN to learn the simple feature of color, but instead lets it learn some more complicated feature like “supraorbital ridge” or “sagittal crest”.
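For anyone wondering what oversampling looks like in practice, here is a minimal sketch in plain Python. It assumes each training example carries a group tag; the `oversample` helper and the `group` field are hypothetical, and this is only the naive version (duplicate minority examples until the groups are the same size), not what any real pipeline necessarily does.

```python
import random
from collections import Counter


def oversample(examples, key, seed=0):
    """Resample with replacement so every group matches the largest group."""
    rng = random.Random(seed)
    groups = {}
    for ex in examples:
        groups.setdefault(key(ex), []).append(ex)
    target = max(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(members)
        # Top up smaller groups by drawing extra copies at random.
        balanced.extend(rng.choices(members, k=target - len(members)))
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    # Hypothetical toy dataset: 8 examples of group "a", 2 of group "b".
    data = [{"id": i, "group": "a"} for i in range(8)]
    data += [{"id": 100 + i, "group": "b"} for i in range(2)]
    balanced = oversample(data, key=lambda ex: ex["group"])
    print(Counter(ex["group"] for ex in balanced))
```

Duplicating examples only changes how often the network sees each group. Whether that is enough to push it past the color shortcut is exactly the open question here.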


Is it demographics or socioeconomics? It is not like there are a lot more white people around, it is just that more pictures of white people get uploaded to the internet.


I would blame it on programmer / data scientist hubris. I bet for the first year plus some months they spent claiming we’re almost there and this time it will definitely work.


One can make the claim that Boris Badenov was a product of “his” time. The USSR (=Russia) was the enemy and thinly veiled anti-Russian sentiments did not offend the vast majority.

But choosing to use such a figure in this era is rather more problematic. Now for almost anyone here, I’d say, sure, using an anti-Russia propaganda character was no doubt an accident.

But I’ve recently learned that when a power-imbalance exists, such as between you and the average Russian citizen, it’s far more likely that your “accident” was “accintentional”.

Based on my new-found knowledge, I now realize that you’re obviously a horrible anti-Russian bigot determined to maintain that power imbalance.

For shame!

(And thanks to LJ for providing us with the evidence of exactly how wrongheaded and dangerous “accintentional” thinking can be. Assuming malice where incompetence will do may let a few malicious acts escape. But assuming malice can, and has, cost millions of lives, to say nothing of the emotional carnage among friends, family, and workplaces.)


But one cannot make me feel responsible for your interpretation of someone else’s work.

Way way too many triangles involved


did the majority of people become suddenly white? demographics would favor the opposite outcome here.

now if you meant “the demographics of wealth and power” then you might be on to something. which points out that the racist status quo of our world gets reflected and - without explicit effort - reinforced by all our systems. laws, economics, algorithms. all of it.

our systems “neutrally” support that which already exists.

separately, i really can’t believe that anyone at google is dumb enough to trust user generated content when training their algorithms. all you ever wind up with are boaty mc boat face results.

you train your algorithms with what you believe are authoritative sources, and then you test the results. that’s the whole job.

the outcomes they achieve with their algorithms are their doing and their responsibility. no ifs or buts about it.

they’re certainly happy enough to rake in the cash when it works out well. so they don’t get to wash their hands of it when it doesn’t.


I think you misunderstood what I meant by demographics, or I used the word wrong. I was talking about the demographics of the training data, which is the only thing learning algorithms have to work with.

Sure, you’re right. Even if I’m not using the word “demographics” wrong, it’s still a socioeconomic effect driving this.

I think gmail’s anti-spam system proves otherwise; it’s driven by millions of users clicking the “spam” button, and it’s the best spam filter I know of. It evolves new blocks in real time to counter completely unforeseen threats without any Google programmers having to be paid to make it happen.

But it’s a treacherous thing; if the users know they have the wheel, you get boaty mcboatface, if you don’t, you get gorilla misclassifications. Computer science at global scale is a pretty new discipline :wink:


true that.

re: gmail. on that front, they have the issue that some people click “spam” when they really mean “unsubscribe” – so different people value mail differently. part of that is doubtless what drove them to push for new mail headers with some additional authentication information in them.

it’s definitely a hard problem. no single solution.

