Originally published at: http://boingboing.net/2016/08/12/forget-skynet-ai-is-already-m.html
…
AI is already making things terrible
Currently the loudest voices debating the potential dangers of superintelligence are affluent white men, and, perhaps for them, the biggest threat is the rise of an artificially intelligent apex predator.
You mean a more intelligent apex predator, right? Because we kind of already are that.
Algorithms != A.I.
Thank you for summarizing my issue with this article’s premise so succinctly.
True AI screws over Rich White Guys because it doesn’t like them.
AI will do as it is instructed.
The instructors’ handler is where the danger lies.
But all AI is algorithms.
That was my secondary issue. My primary issue is that the premise is really not news: algorithms have been used by humans for millennia, and people in power have been using algorithms to maintain power for just as long (whether they knew they were using them or not is beside the point). The article does report some real news, but it’s downplayed in favor of encouraging this silly witch hunt against algorithms and so-called AI. It’s ridiculous, tantamount to saying math and intelligence make things terrible for people; they don’t—other people do. If somebody attacked me with a steel pipe, I wouldn’t then crusade against cylinders (a mathematical entity).
I don’t think it’s the issue people think it is. People have this idea that AI operates according to some set of beliefs chosen by the programmers. But what we’re talking about here are machine learning algorithms. They’re really just things that look for patterns in data, then use the patterns they find to make predictions.
For example, let’s say you fed the computer a data set of athletes in the NBA and NHL. The data included which sport each player played (hockey or basketball) and each player’s race. If you task the computer with finding a pattern, it’ll find the same pattern we humans would see: hockey players are overwhelmingly white, and the vast majority of the black players play basketball.
So, if you then give the computer a new piece of data that includes only the athlete’s race and ask it to predict which of the two sports they play, it’ll likely come up with a simple rule that amounts to, “If they’re white, they probably play hockey, and if they’re black, they probably play basketball.” Similarly, if you ask the computer, “Which race is better suited for each sport?” it’s going to say, “Well, it seems like white people are a better fit for hockey and black people are a better fit for basketball.”
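As a toy illustration of that (the counts below are invented, and the “learning” is just frequency counting, but the behavior is the same in spirit):

```python
# Toy illustration: the "training data" is just (race, sport) pairs whose
# counts mirror the skewed league demographics described above. All numbers
# are made up.
from collections import Counter

training_data = (
    [("white", "hockey")] * 90 + [("black", "hockey")] * 10 +
    [("white", "basketball")] * 25 + [("black", "basketball")] * 75
)

# "Learning" here is nothing more than counting which sport is most common
# for each value of the single input feature.
counts = {}
for race, sport in training_data:
    counts.setdefault(race, Counter())[sport] += 1

def predict_sport(race):
    """Predict the most likely sport given only the player's race."""
    return counts[race].most_common(1)[0][0]

print(predict_sport("white"))  # -> hockey
print(predict_sport("black"))  # -> basketball
```

The model isn’t expressing a belief about anyone; it’s reproducing the correlation that was in the data it was given.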
This is the basic cause of what people describe as the “racist” behavior of the machine learning algorithms. But it’s not really the machine, the algorithm, or the programmer that’s being racist (or that’s responsible for the racist output).
Rather, the whole thing is just reflecting the racial realities present in the data, which in turn reflect what happens in real life. There is a valid concern here: the machine is basically trying to replicate the patterns it sees, so if there is racism or gender bias in society, the algorithms will perpetuate it, because they just mimic what actually happens. The output reflects the data the algorithm was trained on. It’s not the “fault” of the handler, the programming, the algorithm, etc. It’s a reflection of society.
I was working on a project where a company was upset because the algorithm kept suggesting it pay women less, and we had to explain that this was because it was trained on the company’s own data, where they do indeed pay women less. It was reflecting the behavior of that company. We told them, “If you want the algorithm to stop suggesting you pay women less, then you need to stop paying women less and stop training the algorithm on bigoted data.” It’s machine learning. It learns what you do. If you’re bigoted, it learns to be bigoted too. (This is basically what happened with Microsoft’s chatbot: it read racist Twitter rants and “learned” to be racist.)
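A minimal sketch of that situation, assuming invented numbers and a simple linear model rather than whatever the actual project used:

```python
# Minimal sketch (not the actual project): a pay model fit to biased
# historical salary data. All numbers are invented.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 1_000
years_experience = rng.uniform(0, 20, n)
is_woman = rng.integers(0, 2, n)

# Historical salaries: equal experience, but women paid ~$5k less on average.
salary = (40_000 + 2_000 * years_experience
          - 5_000 * is_woman + rng.normal(0, 3_000, n))

X = np.column_stack([years_experience, is_woman])
model = LinearRegression().fit(X, salary)

# The fitted coefficient on is_woman comes out near -5000: the model has
# "learned" to suggest lower pay for women, because that is what the
# training data shows.
print(model.coef_)
```

The coefficient the model learns on the gender flag is just the pay gap that was already baked into the historical data.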
It’s like that old 90’s drug commercial, “I learned it by watching you, Dad. I learned it by watching you!”
I don’t even think you’d necessarily have to give race as an input. I think it’s easy to argue that when race is an explicit input, the algorithm is by definition racist, though I’m sure in situations like the one you suggested that’s unavoidable.
How about an algorithm that determines a person’s loan interest rate based on their employment status, income, age, and address? Considering how segregated American society still is, I could imagine such an algorithm ending up being unintentionally racist based on that.
I have a few thoughts on that:
- I come from a research background where we purposely include things like race and gender in our models because we want to see if they are treated differently (e.g., wage gap, job discrimination), so it’s second nature for me to include those. You can’t test for racial or gender disparities without including race and gender information.
- The goal of machine learning is accurate prediction. A variable with predictive power is likely to end up in the model. You don’t build models based on a utopian vision of what you wish did or didn’t have predictive power, but on what actually does. For example, one of the ways society treats the genders unequally is that some non-work family responsibilities unfairly fall on women (e.g., taking children to doctor appointments, staying home when a kid is sick, picking the kid up from school). You can say this is very unfair. It is! But you cannot say it doesn’t exist. It does! Since this unfair thing exists, you see real differences in the data between genders in hours worked, overtime, time off, shift scheduling, etc. Therefore, if you want your model to better predict those variables for some set of workers, you should include gender, because it has predictive power (even if you wish it didn’t). Or maybe you’re trying to model the likelihood of getting arrested. Including race as a variable doesn’t mean you think one race is more likely to be criminal; rather, maybe you think one race is more likely to be unfairly targeted by the police. Either way, race probably has some predictive power in the model.
- This is somewhat related to the above. Sometimes you don’t have data for a piece of information you’d like. Maybe you don’t know someone’s income, but income is important to your model. You do, however, have their ZIP code. Since wealth levels often cluster in American cities (rich neighborhoods, poor neighborhoods), you may be able to use ZIP code as a proxy for income. That is, it’s likely correlated with the variable you really want, and you can use it to capture some of the statistical effects of the variable you don’t have. Many US cities also show geographical clustering by race, often a holdover from the days of redlining and other discriminatory practices. The Bay Area, Chicago, Dallas, LA, etc. are all metropolitan areas with pretty striking geographical segregation. So if race is correlated with geography, and geography is correlated with wealth, school quality, environmental factors, etc., then race becomes a proxy variable. If there’s a concern about children being exposed to lead, asbestos, etc., and I’m tasked with building a model to predict who is most at risk (so that more resources can be directed at them), race is probably something I’d include. The old public schools in the inner city are probably more likely to still have lead paint, asbestos, etc. than the schools in the wealthy suburb, and there’s probably going to be a racial disparity in who attends which school.
- That brings me to your final point:
How about an algorithm that determines a person’s loan interest rate based on their employment status, income, age, and address?
My guess is that you’d get the algorithm spitting out different interest rates for people in different neighborhoods. As mentioned above, neighborhoods in America are often very racially segregated. You’d have minorities saying, “Hey, why do you charge us higher interest rates? Is it because we’re black?” And what’s your answer going to be? “No, race isn’t even a variable! It’s not because you’re black! It’s because of where you live.” And you’d get the response, “Oh, so it’s not because I’m black, but because I live in a predominantly black neighborhood?” Do you think that explanation is going to silence the critique that the algorithm is racist? The problem here is similar to the one exploited by the GOP’s voter suppression laws. They look up which forms of ID are correlated with which people, and then allow gun licenses as ID but not government-issued public assistance IDs. The point is, outcomes can be discriminatory even if you never explicitly invoke race, gender, etc. Racial and gender differences are so interwoven into the fabric of society that there is no way to purge those effects from any data-driven process.
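Here’s a rough sketch of that loan example with entirely synthetic data (no real lender’s model; all the coefficients and demographics are invented): a pricing model that never sees race still produces different average rates by race, because neighborhood and income act as proxies.

```python
# Synthetic sketch of a "race-blind" loan pricing model with disparate impact.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical segregated city: neighborhood 0 is mostly white, neighborhood 1
# is mostly black, and incomes differ sharply by neighborhood.
neighborhood = rng.integers(0, 2, n)
race_is_black = rng.random(n) < np.where(neighborhood == 1, 0.8, 0.1)
income = np.where(neighborhood == 1, 35_000, 70_000) + rng.normal(0, 8_000, n)

# Historical interest rates driven by income alone; race is never an input.
rate = 8.0 - income / 20_000 + rng.normal(0, 0.3, n)

model = LinearRegression().fit(np.column_stack([neighborhood, income]), rate)
predicted = model.predict(np.column_stack([neighborhood, income]))

# The model never saw race, yet average predicted rates still differ by race,
# because race is correlated with neighborhood and income.
print("mean rate, black applicants:", predicted[race_is_black].mean())
print("mean rate, white applicants:", predicted[~race_is_black].mean())
```

The two printed averages differ even though race never appears among the model’s inputs; the neighborhood and income variables carry it in.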
ALL of the above amount to the same underlying fact: American society contains many unjust inequalities and forms of racial and gender discrimination. Even if you omit “race” as a variable, you’ll still pick up this inequality. Say you’re modeling students applying for college. Maybe you leave the model race-blind, but you include “quality of high school,” so kids from School X are more likely to get accepted than kids from School Y. Guess what? To an outside observer, a lot of white kids are going to get acceptance letters and a lot of minority kids are going to be denied, because schools themselves are often racially segregated.
Then, on top of this, there’s capitalism. Imagine one bank uses a model that excludes race and any variable it thinks might inadvertently lead to discriminatory racial outcomes because it’s correlated with race (neighborhood, education level, school quality, etc.), even though this costs the model predictive power. Another bank uses a model that maximizes predictive ability regardless of the politics of the model. Which bank is going to do better financially? The one whose model has more predictive power. That bank will have more success, grow larger, faster, etc.
The point is, for various reasons, if society itself is racist, then the model will be racist. If you want the model to be less racist, you have to either change society, or ignore reality. If you ignore reality, you’ll lose to a competing model that doesn’t. The reality-based model will beat your utopian-vision model in predictive power.
I keep thinking about a commercial that’s on TV recently where the parrot of a pirate captain keeps saying aloud, in front of the crew, all the bad things the captain says about the crew. The parrot is just repeating the captain’s private message. The captain gets very mad at the parrot. But is it really the parrot’s fault? It’s just parroting back the captain’s words.
My final thought: I’m not “defending” machine learning. I think the perpetuation of existing discrimination through machine learning is a real problem. Machine learning just parrots society: if society is racist, the machines will be racist. As a result, maybe the machines shouldn’t make decisions for us until the society they reflect is the society we want to perpetuate. That’s a very valid critique, and I can get on board with it. But too many people misunderstand what’s going on and instead blame the machines or those who program them, and/or think we can just tweak the programs to avoid this problem. We probably can’t. The discrimination is too deeply interwoven into the fabric of society: where you live, what school you go to, which types of jobs you pursue, which health risks you’re exposed to, which police tactics you’re subject to. If you stick neighborhood, income, schooling, etc. into a principal component analysis to find the “common thread” among those variables, I would not be surprised if that “common thread” were statistically indistinguishable from race. Essentially ALL data is inherently tainted by institutionalized discrimination.
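That last claim is easy to illustrate on made-up data. A minimal sketch, under the assumption (baked into the synthetic data below) that neighborhood quality, income, and school quality are each partly driven by race, shows the first principal component closely tracking race:

```python
# Synthetic sketch of the PCA thought experiment above.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
n = 5_000
race = rng.integers(0, 2, n)  # 0 / 1, purely illustrative

# Each "socioeconomic" variable is race plus independent noise.
neighborhood_quality = race + rng.normal(0, 0.3, n)
income = race + rng.normal(0, 0.3, n)
school_quality = race + rng.normal(0, 0.3, n)

X = np.column_stack([neighborhood_quality, income, school_quality])
first_component = PCA(n_components=1).fit_transform(X).ravel()

# The "common thread" PCA extracts closely tracks race. (The sign of a
# principal component is arbitrary, so look at the magnitude.)
print(abs(np.corrcoef(first_component, race)[0, 1]))
```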