Racist algorithms: how Big Data makes bias seem objective

doctorow · December 2, 2015, 8:00am

miasm · December 2, 2015, 8:18am

Inb4 ‘Algorithms can’t be racist’.

Verse · December 2, 2015, 12:17pm

#KillAllWhiteComputerAlgorithms

William_Holz · December 2, 2015, 12:34pm

#AllAlgorithmsMatter

This is a seriously bad thing, but it’s mostly a problem with the lazy humans doing the analysis and not the algorithms themselves.

Messy data is hard to work with, and far too often people get WAY too excited when they tease something out without doing proper reality checks. (I’ve fallen into that trap myself.) All those neat machine learning toys seem like magic sometimes and can do really impressive things, but they’re only as good as the source and their analysts.

If you don’t have critical people doing the analysis and others double-checking their work then this is exactly what we’re going to end up with.

NickSay · December 2, 2015, 12:57pm

Computers are designed by us and taught by us, but lack the capacity for independent thought.

GIGO: garbage in, garbage out.

Boundegar · December 2, 2015, 3:09pm

There’s no easy solution, but there is a hard one: to acknowledge bias and correct for it. Not only algorithms, but privileged people can do this and have a great positive impact, but it requires thinking hard thoughts.

PAPPP · December 2, 2015, 4:45pm

Of the many ways data-driven discrimination scares me, transparently amplifying/legitimizing existing biases is only #3 on the list, because it already happened decades ago and is fairly detectable. IMO, the bigger scaries are:

1. Clusterism: The machine discovery of new, possibly dynamic, groups to discriminate for/against for potentially new reasons. These groups may not be easily detected by humans, so it’s likely we won’t even know except by sorting-hat effect after the fact.
2. Process-opaque discrimination: Just as machine-learning algorithms for images often pick out some small subset of the feature space (see the classic “recognize big cats by their patterns, even if it’s on a couch” example) to distinguish on and basically ignore the rest, other learning algorithms are likely to do the same. The loan-risk calculating system analyzed 128 factors, but the internal net really just short-circuited and rejected you because of one factor (maybe something old and obvious like race or birthplace, maybe something new and opaque like a social graph feature).

Existing forms of prejudice make good examples, but focusing on existing forms of discrimination is fighting last century’s battle while you lose this one, in addition to tying broad-appeal issues to specific identity groups.
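As a rough illustration of the process-opaque case described above, one way to probe a black-box decision function is to shuffle one input column at a time and count how many decisions flip. This is only a sketch under loud assumptions: `toy_loan_model`, the 128 random features, and the choice of feature 7 as the deciding factor are all made up for the example and do not describe any real loan system.

```python
import random

def toy_loan_model(applicant):
    """Hypothetical stand-in for an opaque model: it receives many
    features, but its decision really hinges on just one of them."""
    # applicant is a list of floats in [0, 1); only feature 7 matters here.
    return applicant[7] > 0.5

def decision_flip_rates(model, rows, seed=0):
    """For each feature, shuffle that column across rows and return the
    fraction of decisions that change (a crude permutation-style probe)."""
    rng = random.Random(seed)
    baseline = [model(r) for r in rows]
    rates = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        flipped = 0
        for r, new_value, old_decision in zip(rows, column, baseline):
            probe = list(r)        # copy so the original row is untouched
            probe[j] = new_value   # swap in another applicant's value
            if model(probe) != old_decision:
                flipped += 1
        rates.append(flipped / len(rows))
    return rates

# Synthetic applicants: 200 rows of 128 random features (made-up data).
data_rng = random.Random(42)
rows = [[data_rng.random() for _ in range(128)] for _ in range(200)]

rates = decision_flip_rates(toy_loan_model, rows)
dominant = max(range(len(rates)), key=rates.__getitem__)
print("most influential feature:", dominant, "flip rate:", rates[dominant])
```

A probe like this only shows which inputs the decisions are sensitive to; it says nothing about proxies, where a permitted feature quietly stands in for a forbidden one.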

clifyt · December 2, 2015, 5:58pm

I was working on computer essay scoring years ago, and quickly realized this was a problem. At first we let educators rate papers from students that hadn’t been transcribed yet, and they could see the names. Oops. If a name was Hispanic or Black, it scored low. If it was an Asian name, higher. That threw off the grading curves, but as the computer couldn’t see the names, it really only made the software a little less reliable.

So we switched to having instructors read everything they were rating on the computer to develop the model, and we quickly found that if students had used the names of their subjects within the essays (these weren’t technical writing; they were college entrance essays, which were often about families and things like that), humans still picked up on them, and the neural models figured this out as well. Resubmit the same paper to the computer with names changed to stereotypically white names and scores went up. Change names to Asian names? OK, this didn’t affect the model as much because these were too unique within the system, but when we did this to humans, they could figure it out pretty quickly and upscore them.

Knowing all of this, we ended up having to rescore EVERYTHING while trying to figure out how to algorithmically neutralize all names…and future-proof the algorithms. Way too easy to teach a computer to be racist.
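The resubmit-with-changed-names check described above can be sketched roughly as follows. Everything here is an assumption for illustration: `toy_scorer` is a deliberately flawed stand-in, not the actual scoring model, and `NameA`/`NameB` are placeholder tokens rather than real names.

```python
import re

def swap_names(text, mapping):
    """Replace whole-word occurrences of each name with its substitute."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text)

def toy_scorer(essay):
    """Hypothetical scorer that has picked up a spurious association
    with one token. It stands in for a trained model in this sketch."""
    score = 3.0 + 0.01 * len(essay.split())
    if "NameB" in essay:
        score -= 0.5  # the leaked, name-based penalty we want to detect
    return score

def name_swap_gap(scorer, essay, mapping):
    """Score difference caused purely by changing the names."""
    return scorer(swap_names(essay, mapping)) - scorer(essay)

essay = "My grandmother NameB taught me to cook, and NameB never gave up."
gap = name_swap_gap(toy_scorer, essay, {"NameB": "NameA"})
print("score change from swapping names:", gap)
```

Since nothing but the names changed, any nonzero gap means the scorer is reacting to the names themselves. A real audit would run this over many essays and many name pairs instead of a single example.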

zathras · December 2, 2015, 6:32pm

A lot of bias is objective.
It becomes discrimination or *-ist when you ruthlessly apply statistical facts to individuals.

Whenever you discriminate against a group X because they are more likely to be/do Y, that’s unfair towards those members of group X that aren’t/don’t Y.

Young men are more likely to crash their car than young women are (statistical fact, at least in Austria). But making someone pay higher insurance premiums because they are male is discrimination. Why should a careful young man pay more than a careless middle-aged woman?
On the other hand, women are likely to live longer; should they therefore get lower retirement benefits?

[…] provides a veneer of objective respectability to racism, sexism and other forms of discrimination.

I think that is the wrong problem. The problem is not that these algorithms make things seem objective. The problem is that too many people think that “objectively justified discrimination” is respectable.

If the police are stop-and-frisking brown people, then all the weapons and drugs they find will come from brown people.

It is of course nice to defend “brown people” as a group, but what if it turns out to be true after all that the crime rate in one racial group is higher than in another? Is it then OK to stop and frisk people just based on their skin color?
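The selection effect in the quoted stop-and-frisk sentence can be shown with a few lines of arithmetic. The numbers below are entirely hypothetical and the groups are just labelled "A" and "B"; the only point is that the share of finds tracks who gets stopped, not the underlying rate.

```python
def expected_finds(stops, hit_rate):
    """Expected number of finds per group: stops times hit rate."""
    return {group: n * hit_rate[group] for group, n in stops.items()}

# Made-up numbers: both groups carry contraband at the same rate,
# but group "B" is stopped nine times as often as group "A".
stops = {"A": 100, "B": 900}
hit_rate = {"A": 0.05, "B": 0.05}

finds = expected_finds(stops, hit_rate)
total = sum(finds.values())
share = {group: n / total for group, n in finds.items()}
print(share)  # share of all finds attributed to each group
```

With identical hit rates, each group’s share of the finds simply equals its share of the stops, so the find counts alone cannot tell you anything about the groups’ underlying rates.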

I’m hoping that clusterism will turn out to be less bad than the traditional *isms, because two different machines will likely come up with different clusterings, whereas two different people are likely to divide the world into roughly the same races, ethnicities, and genders.

PAPPP · December 2, 2015, 8:34pm

I’m worried the new *isms will be just as bad, partly because they’ll be less apparent, and partly because of the growing body of evidence about just how little it takes to set up tribal behavior. Out of today’s news feeds, there is a neat experiment where arbitrarily splitting a class and seeding the two groups with different nominal cultures to develop over the span of a few weeks managed to reproduce all the traditional cross-cultural problems when re-integrating.

William_Holz · December 2, 2015, 8:44pm

PAPPP:

Clusterism: The machine discovery of new, possibly dynamic, groups to discriminate for/against for potentially new reasons. These groups may not be easily detected by humans, so it’s likely we won’t even know except by sorting-hat effect after the fact.

Process-opaque discrimination: Just as machine-learning algorithms for images often pick out some small subset of the feature space (see the classic “recognize big cats by their patterns, even if it’s on a couch” example) to distinguish on and basically ignore the rest, other learning algorithms are likely to do the same. The loan-risk calculating system analyzed 128 factors, but the internal net really just short-circuited and rejected you because of one factor (maybe something old and obvious like race or birthplace, maybe something new and opaque like a social graph feature).

These are both almost literally why we use data mining tools, and often when they’re used to discriminate it’s a GOOD thing.

We use it to identify people who are more likely to suffer high risk pregnancies, we use it to identify connections between health conditions that no human could, we use them to identify facilities that are associated with negative outcomes for patients. Discrimination is what machine learning is all about.

As I mentioned before, the key is the humans involved. We need to be well trained to be objective and responsible with how we analyze the data, and we need to be very transparent about our analysis because it’s nearly impossible to figure out how a neural network (one of the more useful types of algorithms for many tasks) concluded what it did.

PAPPP · December 2, 2015, 9:35pm

Of course, I didn’t mean to imply those things are always bad, just that bad things can happen as a result.

There’s an opposing moral hazard of rejecting a result just because an algorithm discriminates on a factor that makes us uncomfortable. It’s entirely possible that after running the regressions some factor does swamp all the others. It’s not my favorite example, but to set up a not-terribly-emotionally-charged case: a medical diagnostic AI is fed that a young patient has low hemoglobin and recurrent chest pain. Suddenly, race becomes a dominant feature, because sickle-cell disorder is a reasonable guess for a Black patient but not for a white patient. I’m totally in agreement that we want our decision support tools to do that.

I also agree that transparency is a critical part of avoiding harmful effects (and that’s a seriously losing battle at the moment: unless there’s a major change, insurance companies aren’t going to publicize their actuarial secret sauce, IBM isn’t going to expose the inner workings of licensed Watson units, etc.). I’m not so convinced that human involvement always helps keep things reasonable; see the anecdote from clifyt above about how human-fed training sets will introduce bias into systems.

doctorow · December 7, 2015, 8:00am

This topic was automatically closed after 5 days. New replies are no longer allowed.
