This whole thing about one man and his battle against racist trolls on Twitch shows how real the struggle is, and that it can be countered with someone smart at the helm.
I have no idea where you got that from. I’m not sure why you would think automated moderation is even possible, much less well-understood.
I think these results are extremely unsurprising. In addition to all the other problems with “AI,” the question these filters are trying to answer is poorly posed.
When you want to train an algorithm to recognize pictures of cats, it’s at least easy to make a good training dataset, because you can show a whole bunch of people thousands of pictures, and they will agree about which ones are cats roughly 100% of the time.
“Hate speech,” OTOH, doesn’t have a universal definition, and, like other abstract concepts (as opposed to something concrete like “cat”), doesn’t even have a definition that will clearly and unequivocally apply or not apply in every case.
Even if you train your raters with a standard rubric, when you show them thousands and thousands of Twitter posts, there will be a lot of cases, especially borderline ones, where their judgements will not be the same. This kind of (unavoidable) ambiguity in a training set is very hard for machine learning algorithms to deal with.
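To make that concrete, here’s a minimal toy sketch (my own, not from the article): it flips a fraction of the training labels on a synthetic dataset to mimic raters who disagree, then scores the resulting classifier against clean labels. The dataset size, the model, and the noise rates are all assumptions picked purely for illustration.

```python
# Toy illustration (assumed setup, not from the article): how label
# disagreement in the training set degrades a simple classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for "posts labeled hate speech / not hate speech".
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def accuracy_with_label_noise(noise_rate):
    """Flip a fraction of training labels (simulated rater disagreement),
    train, and score against the clean test labels."""
    rng = np.random.default_rng(0)
    y_noisy = y_tr.copy()
    flip = rng.random(len(y_noisy)) < noise_rate
    y_noisy[flip] = 1 - y_noisy[flip]
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_noisy)
    return model.score(X_te, y_te)

# Accuracy typically drops as the labels get noisier.
for rate in (0.0, 0.1, 0.2, 0.3):
    print(f"label noise {rate:.0%}: test accuracy {accuracy_with_label_noise(rate):.3f}")
```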
A good rule of thumb for the state of the art right now might be: “If you can’t show an arbitrary example of the thing you’re trying to classify to 20 randomly selected people and expect them all to give you exactly the same answer every time, you can’t expect a machine learning system to be any good at it.”
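That rule of thumb can be stated as a quick check. Here’s a hypothetical version: a ratings matrix (the posts and verdicts below are entirely made up) where you count how many items got a unanimous verdict from all 20 raters.

```python
# Hypothetical agreement check; the ratings are invented placeholders.
import numpy as np

# rows = posts, columns = 20 raters; 1 = "hate speech", 0 = "not".
ratings = np.array([
    [1] * 20,               # clear-cut: everyone agrees
    [0] * 20,               # clear-cut the other way
    [1] * 12 + [0] * 8,     # borderline: raters split 12/8
])

# An item is "unanimous" if every rater matches the first rater.
unanimous = np.all(ratings == ratings[:, :1], axis=1)
print(f"unanimous on {unanimous.mean():.0%} of items")  # 67% here
```

If that unanimity fraction is well below 100%, the rule of thumb says the labels themselves are too ambiguous for a machine learning system to do much better.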