Hate-speech detection algorithms are trivial to fool

doctorow · September 27, 2018, 12:58pm

Originally published at: https://boingboing.net/2018/09/27/ha-ha-only-serious.html

…

timstellmach · September 27, 2018, 1:06pm

Seems like moderation is a pretty well-understood problem right up until people with lots of money start complaining that it’s too expensive.

Mister44 · September 27, 2018, 2:00pm

This whole story about one man's battle against racist trolls on Twitch shows that the struggle is real, and that it can be countered with someone smart at the helm.

anon62122146 · September 27, 2018, 3:48pm

I have no idea where you got that from. I’m not sure why you would think automated moderation is even possible, much less well-understood.

I think these results are extremely unsurprising. In addition to all the other problems with “AI,” the question these filters are trying to answer is poorly posed.

When you want to train an algorithm to recognize pictures of cats, it’s at least easy to make a good training dataset, because you can show a whole bunch of people thousands of pictures, and they will agree about which ones are cats roughly 100% of the time.

“Hate speech,” OTOH, doesn’t have a universal definition, and like other abstract concepts (as opposed to something concrete like “cat”), it doesn’t even have a definition that will clearly and unequivocally apply or not apply in every case.

Even if you train your raters with a standard rubric, when you show them thousands and thousands of Twitter posts, there will be a lot of cases, especially borderline cases, where their judgements will not be the same. This kind of (unavoidable) ambiguity in a training set is very hard for machine learning algorithms to deal with.

A good rule of thumb for the state of the art right now might be “If you can’t show any arbitrary example of the thing you’re trying to classify to 20 randomly selected people and expect them to all always give you exactly the same answer, you can’t expect a machine learning system to be any good at it.”
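To make that rule of thumb concrete, here is a minimal sketch of how rater agreement on a labeled set could be measured. It computes the fraction of items with unanimous labels and Fleiss' kappa, a standard chance-corrected agreement statistic. The function names and the tiny `cat_labels` / `hate_labels` datasets are made up for illustration; they are not results from any real study.

```python
from collections import Counter

def unanimous_fraction(ratings):
    """Fraction of items on which every rater gave the same label."""
    return sum(1 for item in ratings if len(set(item)) == 1) / len(ratings)

def fleiss_kappa(ratings):
    """Fleiss' kappa; assumes every item was labeled by the same number of raters."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for item in ratings for label in item})
    counts = [Counter(item) for item in ratings]
    # Observed agreement: share of agreeing rater pairs, per item, then averaged.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items
    # Agreement expected by chance, from the overall label proportions.
    proportions = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(p ** 2 for p in proportions)
    if p_expected == 1.0:
        return 1.0  # only one label was ever used, so agreement is trivially perfect
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical labels: 4 items, 5 raters each (illustrative only).
cat_labels = [["cat"] * 5, ["not"] * 5, ["cat"] * 5, ["not"] * 5]
hate_labels = [
    ["hate"] * 5,
    ["hate", "hate", "hate", "ok", "ok"],
    ["ok"] * 5,
    ["hate", "ok", "ok", "ok", "hate"],
]

print(unanimous_fraction(cat_labels), fleiss_kappa(cat_labels))
print(unanimous_fraction(hate_labels), fleiss_kappa(hate_labels))
```

On the made-up data, the "cat" set is fully unanimous, while only half of the "hate" items are, and its kappa drops well below 1. A low score like that is a hint that the labels themselves are noisy before any model is trained on them.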

Bunbain · September 27, 2018, 3:57pm

The impression I got was that @timstellmach was referring to human moderation. Correct me if I’m wrong.

anon62577920 · September 27, 2018, 3:59pm

As long as there have been filters, they have been l33t h@><0rs e/@c|ing d3m
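For anyone who hasn't watched this play out, here is a toy sketch of the cat-and-mouse game. It assumes a naive word blocklist (with the harmless placeholder word "banana" standing in for a blocked term) and a small, invented character-substitution table. It is not how any real moderation system works, just an illustration of why exact matching is easy to dodge.

```python
# Hypothetical blocklist; "banana" is a harmless stand-in for a blocked term.
BLOCKLIST = {"banana"}

# Invented substitution tables for illustration only.
LEET = str.maketrans({"a": "@", "e": "3", "o": "0", "s": "5"})
UNLEET = str.maketrans({"@": "a", "3": "e", "0": "o", "5": "s"})

def naive_filter(text):
    """Flag text if any whitespace-separated word is on the blocklist."""
    words = text.lower().split()
    return any(word.strip(".,!?") in BLOCKLIST for word in words)

def normalizing_filter(text):
    """Undo the known substitutions first, then apply the naive filter."""
    return naive_filter(text.lower().translate(UNLEET))

def obfuscate(text):
    """Apply the substitutions an evader might use."""
    return text.translate(LEET)

plain = "I hate that banana"
leet = obfuscate(plain)   # "I h@t3 th@t b@n@n@"
spaced = "I hate that b a n a n a"

print(naive_filter(plain))          # caught by exact match
print(naive_filter(leet))           # slips past exact match
print(normalizing_filter(leet))     # caught once substitutions are undone
print(normalizing_filter(spaced))   # slips past again with a new trick
```

Each patch to the filter catches the last trick, and the next trick is usually one keystroke away.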

anon61221983 · September 27, 2018, 4:02pm

Just look at that anti-racist banana… JUST LOOK AT IT!!!

doctorow · October 2, 2018, 12:58pm

This topic was automatically closed after 5 days. New replies are no longer allowed.
