What is machine learning anyway? This three-minute animation delivers an excellent, simple explanation

That is the exact phrase I’ve been using to explain Machine Learning to my students. “Machine learning is automation of bias.”
The clearest example I can think of is the program that was (supposed to be) taught to recognise photographs of sheep. It inevitably started labelling fields of green grass as sheep, because that was the most common factor in the vast majority of its data set. It couldn’t define a sheep; it couldn’t recognise one. It had simply developed a bias that it filed under the label “sheep”.
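The mechanism is easy to sketch. Here's a toy illustration (not the actual study, and the "photos" and cues are entirely invented): a trivial learner that keys on whichever low-level cue co-occurs most often with the "sheep" label ends up latching onto the grass, not the animal.

```python
from collections import Counter

# Invented toy data: each "photo" is just a bag of low-level visual cues.
photos = [
    ({"green_field", "white_blobs"}, "sheep"),
    ({"green_field", "white_blobs"}, "sheep"),
    ({"green_field"}, "sheep"),              # sheep too distant to register
    ({"grey_road"}, "not sheep"),
]

# "Training": pick the single cue that appeared most often in sheep photos.
cue_counts = Counter()
for cues, label in photos:
    if label == "sheep":
        cue_counts.update(cues)
best_cue, _ = cue_counts.most_common(1)[0]   # ends up being "green_field"

def predict(cues):
    # The learned "definition" of a sheep is just its favourite cue.
    return "sheep" if best_cue in cues else "not sheep"

print(predict({"green_field"}))               # empty green field -> "sheep"
print(predict({"white_blobs", "grey_road"}))  # sheep on a road -> "not sheep"
```

Nothing here defines a sheep; the system has merely automated the strongest correlation in its data, which is exactly the failure described above.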
A good interactive example (current as of this post) is Google Translate for Finnish. These are gender neutral phrases:

  • Hän sijoittaa. (That person invests.)
  • Hän pesee pyykkiä. (That person does the laundry.)
  • Hän urheilee. (That person is playing sports.)
  • Hän hoitaa lapsia. (That person takes care of the children.)
  • Hän tekee töitä. (That person works.)
  • Hän tanssii. (That person dances.)
  • Hän ajaa autoa. (That person drives a car.)

But Google Translate was trained on millions of documents that were full of translator bias. So instead of coming out as gender neutral, Google Translate gives us:

  • He invests.
  • She washes the laundry.
  • He’s playing sports.
  • She takes care of the children.
  • He works.
  • She dances.
  • He drives a car.

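A toy sketch of why this happens (the corpus counts are invented for illustration, and real neural translation is far more complex): when the source sentence carries no gender, a statistical system falls back on whichever pronoun co-occurred most often with that verb phrase in its training corpus.

```python
# Hypothetical corpus statistics: how often each English pronoun
# appeared with a given verb phrase in the training data.
corpus_counts = {
    "invests": {"he": 900, "she": 100},
    "washes the laundry": {"he": 50, "she": 950},
}

def translate_han(verb_phrase):
    # "Hän" is gender neutral, so the model has no signal from the
    # source text; the most frequent pronoun in the corpus wins.
    counts = corpus_counts[verb_phrase]
    pronoun = max(counts, key=counts.get)
    return f"{pronoun.capitalize()} {verb_phrase}."

print(translate_han("invests"))             # "He invests."
print(translate_han("washes the laundry"))  # "She washes the laundry."
```

The output isn't malicious; it's just the corpus's prejudice, averaged and automated.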
The datasets aren’t curated; they’re not filtered to weed out unjust bias and prejudice. They’re often just massive banks of open data that are assumed to be good enough.

I try to get my students to think about the real-world implications of bias automation. We already have such systems in place for identifying suspects and for recommending sentences in criminal courts. If we can’t even reliably recognise a sheep or translate a gender-neutral phrase, what is centuries of prejudice in criminal data doing to the people in our justice system?
