Hate-speech detection algorithms are trivial to fool

Originally published at: https://boingboing.net/2018/09/27/ha-ha-only-serious.html


Seems like moderation is a pretty well-understood problem right up until people with lots of money start complaining that it’s too expensive.


This whole thing about one man and his battle to combat racist trolls on Twitch shows how the struggle is real. And it can be countered with someone smart at the helm.


I have no idea where you got that from. I’m not sure why you would think automated moderation is even possible, much less well-understood.

I think these results are extremely unsurprising. In addition to all the other problems with “AI,” the question these filters are trying to answer is poorly posed.

When you want to train an algorithm to recognize pictures of cats, it’s at least easy to make a good training dataset, because you can show a whole bunch of people thousands of pictures, and they will agree about which ones are cats roughly 100% of the time.

“Hate speech,” OTOH, doesn’t have a universal definition, and like other abstract concepts (as opposed to something concrete like “cat”), it doesn’t even have a definition that will clearly and unequivocally apply or not apply in every case.

Even if you train your raters with a standard rubric, when you show them thousands and thousands of Twitter posts, there will be a lot of cases, especially borderline cases, where their judgements will not be the same. This kind of (unavoidable) ambiguity in a training set is very hard for machine learning algorithms to deal with.

A good rule of thumb for the state of the art right now might be “If you can’t show any arbitrary example of the thing you’re trying to classify to 20 randomly selected people and expect them to all always give you exactly the same answer, you can’t expect a machine learning system to be any good at it.”
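For anyone who wants to try that rule of thumb on their own labeled data, here is a rough sketch in Python. The function names and the tiny `ratings` table are made up for illustration, and a plain "did everyone agree?" count is a cruder measure than the chance-corrected agreement statistics usually reported for annotation work.

```python
from collections import Counter

def unanimous_fraction(labels_per_item):
    """Fraction of items on which every rater gave the same label."""
    if not labels_per_item:
        raise ValueError("need at least one item")
    unanimous = sum(1 for labels in labels_per_item if len(set(labels)) == 1)
    return unanimous / len(labels_per_item)

def majority_share(labels):
    """Share of raters who picked the most common label for one item."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

# Hypothetical ratings: each inner list is one post, each entry one rater's label.
ratings = [
    ["hate", "hate", "hate", "hate"],  # clear case
    ["ok", "ok", "ok", "ok"],          # clear case
    ["hate", "ok", "hate", "ok"],      # borderline: raters split evenly
    ["ok", "ok", "hate", "ok"],        # borderline: one dissenter
]

print(unanimous_fraction(ratings))             # share of posts with full agreement
print([majority_share(r) for r in ratings])    # per-post agreement strength
```

If the unanimous fraction is well below 1 for your own raters, the "correct" labels the classifier is trained on are already noisy before any learning happens.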

The impression I got was that @timstellmach was referring to human moderation. Correct me if I’m wrong.


As long as there have been filters, they have been l33t h@><0rs e/@c|ing d3m
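To make the point concrete, here is a toy sketch in Python of a word-list filter and the character-swap trick that gets around it. It is not how any real moderation system or the detectors in the linked article work; `BLOCKLIST`, `naive_filter`, and `obfuscate` are hypothetical names, and "badword" is a placeholder.

```python
# Hypothetical blocklist with a placeholder term.
BLOCKLIST = {"badword"}

def naive_filter(text):
    """Return True if any whitespace-separated word is on the blocklist."""
    words = text.lower().split()
    return any(w.strip(".,!?") in BLOCKLIST for w in words)

# A few common leetspeak substitutions (an illustrative subset).
LEET = str.maketrans({"a": "@", "o": "0", "e": "3"})

def obfuscate(text):
    """Swap some letters for look-alike characters."""
    return text.translate(LEET)

message = "that is a badword!"
print(naive_filter(message))             # the plain message is caught
print(obfuscate(message))                # what a human still reads just fine
print(naive_filter(obfuscate(message)))  # the swapped version slips through
```

You can add the substitutions to the filter, but then people move on to spacing, punctuation, or new spellings, and the cycle repeats.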


Just look at that anti-racist banana… JUST LOOK AT IT!!!

