Training bias in AI "hate speech detector" means that tweets by Black people are far more likely to be censored

Originally published at:


Well, that and the non-computing rule that computing often misses: Context Matters





  1. Driving while Black
  2. Flying while Black
  3. BBQing while Black
  2,372,487. Tweeting while Black

Specifically, candidate texts written in African American English (AAE) are 1.5x more likely to be rated as offensive than texts written in “white-aligned English.”

That’s not right. “X times more likely” and “X times as likely” aren’t the same thing.

There are multiple data points at play in the report. One shows AAE texts marked as offensive at a rate of 38.7%, compared to 18.5% for whites. Another had figures of 20% and 13.5%, respectively.

Something that happens 20% of the time is 0.48x more likely to happen than something that happens 13.5% of the time. It’s 1.48x as likely.

The other figures support 2x as likely (or 1x more likely). I don’t see any data here that fits “1.5x more likely.”
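The arithmetic behind the more/as distinction is easy to check. A quick Python sketch, using the 20% vs. 13.5% figures from the report:

```python
def times_as_likely(p, q):
    """Ratio of two rates: how many times AS likely p is compared to q."""
    return p / q

def times_more_likely(p, q):
    """Relative increase: how many times MORE likely p is than q."""
    return p / q - 1

# Rates from the report: 20% vs. 13.5%
print(round(times_as_likely(20, 13.5), 2))    # 1.48 -> "1.48x as likely"
print(round(times_more_likely(20, 13.5), 2))  # 0.48 -> "0.48x more likely"
```

Same pair of numbers, two different-sounding claims, depending on which phrasing you pick.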


An “AI hate speech detector” sounds like an awful idea. As with any other large scale censorship by government or large companies (or government mandating that large companies do it), it will inevitably be used disproportionately against marginalized communities and people who oppose the status quo. It boggles my mind that there are still people on the left who seem not to realize this.


mmmm… it would appear that the mods here can’t tell the difference between using a specific word in quotes as a reference and using it as a pejorative. Let’s try again: I see no reason to allow any specific group or subculture to use $BADWORD even if they claim it’s special within their group.
Why any filter would reject a three-letter word meaning either gluteus max or a donkey is a separate question, and well-known to be a clbuttic mistake.
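For anyone who hasn't run into it, the "clbuttic mistake" is what a naive substring filter does to innocent words. A toy illustration (a hypothetical filter sketch, not any real moderation code):

```python
import re

def naive_filter(text):
    # The bug itself: blind substring replacement matches "ass"
    # even inside harmless words like "classic".
    return text.replace("ass", "butt")

def word_boundary_filter(text):
    # Matching only whole words avoids mangling "classic".
    return re.sub(r"\bass\b", "butt", text)

print(naive_filter("a classic mistake"))         # "a clbuttic mistake"
print(word_boundary_filter("a classic mistake")) # "a classic mistake"
```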


Uh. I imagine it comes from this part in the paper:

… Specifically, in DEMOGRAPHIC16, AAE tweets are more than twice as likely to be labelled as “offensive” or “abusive” (by classifiers trained on DWMW17 and FDCL18, respectively). We show similar effects on USERLEVELRACE18, where tweets by African American authors are 1.5 times more likely to be labelled “offensive”. Our findings corroborate the existence of racial bias in the toxic language datasets and confirm that models propagate this bias when trained on them.


“Johnson, I needed you and your donkey here at 4pm. It’s now 5pm. Get your ass over here!”


For the record, I’m not a big fan of poorly trained AI running amok, but this seems like a stupid problem to have.

For the first example, part of “training” an AI should be easy enough: have it ask “does the word in question end in ‘a’ or ‘er’?” As for the second example, unless it’s an explicitly child-targeted website (which Twitter definitely isn’t) or a particularly religious one, why the fuck would they care that someone says “ass”? I could be far more offensive without using that word, while still referencing the same bit of anatomy.

That passage is discussing the data points I mentioned before. I could very well be missing something, but my suspicion is that the authors are making the same more/as likely error.

The first “as likely” usage seems correct, referencing likelihood comparisons of 38.7/18.5 and 24.6/11.4, supporting language “2.09 times as likely” and “2.15 times as likely” — safely in the “more than twice as likely” ballpark.

But when they switch from “as” to “more,” that’s where the error seems to creep in. USERLEVELRACE18 presents comparisons of 20/13.5 and 10.8/7.4, both of which are very close to “1.5 times as likely,” but not “1.5 times more.”
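Running all four quoted comparisons through the same arithmetic bears this out (a quick Python sanity check; the labels are my shorthand for the pairings, not the paper’s exact table headings):

```python
# (AAE rate, white-aligned rate) for each comparison quoted above
pairs = {
    "DEMOGRAPHIC16 / DWMW17": (38.7, 18.5),
    "DEMOGRAPHIC16 / FDCL18": (24.6, 11.4),
    "USERLEVELRACE18 (a)":    (20.0, 13.5),
    "USERLEVELRACE18 (b)":    (10.8, 7.4),
}

for label, (aae, white) in pairs.items():
    ratio = aae / white
    print(f"{label}: {ratio:.2f}x as likely = {ratio - 1:.2f}x more likely")
```

The first two land at roughly 2.09x and 2.16x as likely (“more than twice as likely” checks out); the last two land at roughly 1.48x and 1.46x as likely, i.e. about 0.5x *more* likely, not 1.5x more.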

From the study:

“This dataset suffers from severe sampling bias that limit the conclusions to be drawn from this data: 70% of sexist tweets were written by two users, and 99% of racist tweets were written by a single user…”
What the?


I saw his ass yesterday


I don’t think many native English speakers in America use terms like “300% more” to mean, actually, four times as much. For percentages over 100, people use “more” and “as” interchangeably.

Percentages of 100 or less are the anomaly, and in those cases the word “more” is distinguishing the meaning. “50% more” cannot be the same as “50% as much” because 50% as much is not “more.”
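Strictly speaking, the arithmetic that distinction rests on looks like this (a quick sketch):

```python
def pct_more_to_multiplier(pct_more):
    # "X% more" strictly means a multiplier of 1 + X/100
    return 1 + pct_more / 100

def pct_as_to_multiplier(pct_as):
    # "X% as much" means a multiplier of X/100
    return pct_as / 100

print(pct_more_to_multiplier(300))  # 4.0 -- "300% more" = 4x as much
print(pct_more_to_multiplier(50))   # 1.5 -- "50% more" = 1.5x as much
print(pct_as_to_multiplier(50))     # 0.5 -- "50% as much" is less, not more
```

Which is exactly why "50% more" and "50% as much" can never mean the same thing, while above 100% people blur the two.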


Isn’t that “Garbage In, Gospel Out”?


Nah, that’s plumber’s cleavage.

I’d like to point out that I daily observe people who are white, Asian, and Hispanic communicating in exactly the same disrespectful way.

The problem isn’t “texting while black.” It’s using racial slurs in everyday communication. Frankly speaking, it’s time for everyone to stop doing that.

It may not be a popular opinion but I don’t see this detection flagging as a problem. I would not be sorry that someone else’s decision to be rude would cause them inconvenience.

Edit: Don’t get me wrong, this entire thing is a surveillance dumpster fire, but if the worst side-effect is that people have to stop casually using the N-word, etc. to send messages promptly, I’ll take it.


So it seems to me that Google/Jigsaw think computers can moderate better than people.
Just hire more damn people, Jigsaw.
Hell, most forum moderators work for free anyway.

And that is, of course, the most important thing to take from this story.


Anyone who thinks, “oh, we’ll just develop an algorithm to read and interpret human speech to classify it as either hateful or not,” is really fucking deluded.

Of course, training an AI on a human data-set is the easy path to this “hard problem”, but even that is flawed since meaning always comes from context and is fundamentally subjective. You would need a massive data-set to even begin to get close to the nuance that most people intuitively possess.

Furthermore, even if such a system is deployed, how long until language just evolves new signifiers to communicate the same hateful ideas?
