In any of your scenarios, what happens if you remove the police altogether? Do the racially profiled neighborhoods continue to have higher crime rates than their surroundings, or does everything level out everywhere? Because if it’s the latter, then the obvious problem is the police and not the people. Somehow I expect that’s not the case, considering the amount of hard data you can find on various crime statistics. (Which, obviously, are all somehow racially biased.)
The issue here isn’t the people who are victims; it’s the software. We can feel sorry for the people and try to help them all we can, but if nothing is done about programs like this, it solves nothing. Realistically, technology is outpacing our broader understanding of these problems: humans are decent at pattern recognition, so a computer must be, or at least could be, better. I think the point @Timoth3y was driving at is how much data do you need, and how far can you go, with a system like this? The #1 issue with this particular system is that the accused provides their own answers. I’m guessing that on a bad day a lot of people would skew heavily toward psychopathic/violent tendencies on a psychological self-evaluation questionnaire. But it’s not as though an outside evaluator is without fault either; we are all biased in some way. Hopefully the professionals in the legal field are trained well enough to look past those biases and make rational choices.
Think about this example. I tell you a person has been brutally murdered with a claw hammer and the perpetrator is in custody. They were found at the scene holding the hammer, and there is a witness to the event. You have already formed a mental image of that situation. Now add more information: the victim and the perpetrator were related. Perhaps the scenario you pictured has changed. What if the perpetrator is underage? What if they show signs of long-term physical abuse? The scenario keeps changing. Software is only as good as the data we can give it. In a perfect world, I don’t see software that could psychoanalyze you to the point of gauging your truthfulness as racist or any other -ist. I will agree that hiding it all behind patents and corporate-provided data is ill advised, but rigorous third-party testing could go a long way toward addressing those issues. As for the here and now, all programs like this should be discontinued and banned until such time as AI is advanced enough to be indistinguishable from a real person.
It is true that predictive software is only as good as the data it gets, but let me explain what I mean, because my point is slightly different (though we both reach the conclusion that this kind of software should be handled with extreme care).
Let’s say you have a hypothetical town with two neighborhoods, A and B. Neighborhood A is mostly white, B is mostly black. By sheer coincidence, both neighborhoods have exactly the same crime levels. Our town also has a police force of two officers and a newly appointed police chief. The chief, well, he’s kind of a racist. On his first day on the job he tells his two officers to patrol neighborhood B three out of five days and neighborhood A two out of five days. Even though both neighborhoods have equal levels of crime, B now sees more arrests (simply because the officers patrol there more). After a month the chief looks at the new stats, feels good about himself, and says, “See, B is more criminal.” He orders his officers to patrol four out of five days in B and to arrest people there for minor offenses. “We need to take a hard stance against crime,” he says. Obviously, a month later the stats are even more skewed, so the chief announces an even harder stance against neighborhood B.

Now the people in neighborhood B have had enough and start to protest. This makes the news, so the town decides to buy predictive software to “prove that the police force is not racist” and to decide how often to patrol each neighborhood. They feed the software the town’s crime figures, and obviously it says “patrol neighborhood B more often” (since it gets the skewed data from our racist police chief). “See!” the chief says. “It wasn’t racism, it was fact.”
Something like this creates a feedback loop, and every time you go through the loop the data gets more skewed. That was already happening before the software was used, but the presence of software that we perceive as emotionless and unbiased hides the fact that the odds were stacked against one group of people to begin with. It allows an already racist system to become even more racist, and therein lies the real problem. Predictive software fed bad data isn’t just unreliable, it is harmful to society (even if the software were an “advanced AI”, bad data would still make it harmful).
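To make that loop concrete, here is a minimal toy simulation of my hypothetical town (my own sketch with made-up numbers, not any real product): both neighborhoods have the same true crime rate, arrests track patrol presence, and each month the patrol split is updated from the previous month’s arrest counts.

```python
import random

random.seed(42)

TRUE_CRIME_RATE = 0.10      # identical in both neighborhoods by construction
POPULATION = 10_000
MONTHS = 5

patrol_share_b = 0.60       # the chief's initial 3-out-of-5-days split for B

def arrests_in(share):
    """Offenses only turn into arrests when an officer is around to see them,
    so arrest counts track patrol presence, not the underlying crime rate."""
    offenders = int(POPULATION * TRUE_CRIME_RATE)
    return sum(random.random() < share for _ in range(offenders))

for month in range(1, MONTHS + 1):
    arrests_a = arrests_in(1 - patrol_share_b)
    arrests_b = arrests_in(patrol_share_b)
    print(f"month {month}: arrests A={arrests_a}, B={arrests_b}, "
          f"patrol share for B={patrol_share_b:.2f}")
    # "Data-driven" policy: shift even more patrol time toward whichever
    # neighborhood produced more arrests last month.
    if arrests_b > arrests_a:
        patrol_share_b = min(0.95, patrol_share_b + 0.10)
    else:
        patrol_share_b = max(0.05, patrol_share_b - 0.10)
```

Run it and B’s patrol share ratchets upward month after month, even though the two neighborhoods are identical by construction.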
What would be better is software that can point out biases in the justice system: software that can show how one group gets jail time for a crime that other groups only get fined for, or that can show whether a certain group gets targeted on the streets more than others. That way you can find the points where the justice system has become racist and make adjustments to remove the problematic bias.
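As a rough illustration of what such an auditing tool could look for (entirely hypothetical records and field names), it could compare outcomes for the same offense across groups and flag large gaps:

```python
from collections import defaultdict

# Hypothetical case records: (offense, group, outcome)
cases = [
    ("possession", "group_1", "fine"),
    ("possession", "group_1", "fine"),
    ("possession", "group_2", "jail"),
    ("possession", "group_2", "fine"),
    ("possession", "group_2", "jail"),
    ("shoplifting", "group_1", "fine"),
    ("shoplifting", "group_2", "jail"),
]

# Count outcomes per (offense, group) pair
counts = defaultdict(lambda: defaultdict(int))
for offense, group, outcome in cases:
    counts[(offense, group)][outcome] += 1

# Flag offense types where one group's jail rate is far above the other's
for offense in {o for o, _, _ in cases}:
    rates = {}
    for group in ("group_1", "group_2"):
        outcomes = counts[(offense, group)]
        total = sum(outcomes.values())
        rates[group] = outcomes["jail"] / total if total else 0.0
    gap = abs(rates["group_1"] - rates["group_2"])
    flag = "  <-- review for bias" if gap > 0.25 else ""
    print(f"{offense}: jail rates {rates}{flag}")
```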
Furthermore, any software that a government uses to make decisions (whether a predictive system, a voting system, or the kind of bias finder described above) needs to be open source and available for public scrutiny, so that any possible bias in the system can be found and removed.
Yes, it should be discontinued, and, no, it would not be a “useful tool” in the justice system.
Why?
If the system reliably predicted the risk of someone not showing up for their hearing and was constantly validated against out-of-sample data, I don’t think it should be discontinued simply because we don’t like the results.
For example, minorities consistently score lower on FICO, but I don’t think that fact alone is reason for discontinuing the system or even adjusting it. (Investigating, certainly, but not necessarily adjusting.)
Because it means using speculative output from an imaginary computer program to convince a judge to sentence a member of one racial group differently based on her race.
The injustice and impermissibility of racial discrimination is settled law in every circuit of the United States.
Because science is an old and rejected argument for racism.
Imagine instead that we gathered free-throw percentages, using the same kind of valid, scientifically vetted algorithm. Feed in all the same data you would use to predict bail, but use it to predict free-throw percentage. We’ll take neighborhood-level information and try to apply it to individuals. Suppose your system says people in my neighborhood hit free throws 20% of the time. I show up and you are going to give me 10 throws. The binomial distribution tells us that a 20% shooter sinks exactly 2 out of 10 about 30% of the time. So how much are you willing to bet at 4:1 (see how I gave you an edge there) that I sink 2 of my 10 shots? Since bail is a decision that may cause me to lose my job, have my marriage fall apart, etc., I hope you are willing to put at least a couple hundred grand on the wager to cover the damage you would be willing to inflict on others based on it.
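For anyone who wants to check the arithmetic, here is a quick sketch of that binomial calculation and the expected value of the 4:1 wager (illustrative numbers only):

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_hit = 0.20              # neighborhood-level free-throw percentage
p_two_of_ten = binom_pmf(2, 10, p_hit)
print(f"P(exactly 2 of 10 at 20%): {p_two_of_ten:.3f}")   # ~0.302

# Expected profit per $1 staked at 4:1 odds on "exactly 2 of 10"
expected_profit = p_two_of_ten * 4 - (1 - p_two_of_ten) * 1
print(f"Expected profit per $1 at 4:1: {expected_profit:+.2f}")  # ~+0.51
```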
Of course, you probably wouldn’t be willing to put down more than a few dollars, because I’m just one person. You don’t know whether I play basketball regularly or have never played in my life; the neighborhood data doesn’t help you. Now, if you were making that bet all day against many different people, you might happily put down $100 a bet, confident you’d rake in the cash.
Which of those two models is the way the justice system ought to work: thinking of individuals and their rights, or getting things basically right in the aggregate?
If you told me you had a great mathematical model that worked every time, one that said black people were more likely to jump bail than white people, and that you could reasonably make predictions about the future based on it, I would tell you that people have tried things like that in the past and they have always been wrong. It’s racism masquerading as reason. We aren’t as smart as we think we are, and we ought to be more confident that racism is wrong than that we can get the numbers right.
In the free-throw example you gave, I would definitely use the model rather than not use it.
Remember, not making a bet is not an option. If the bail-setting program can verifiably determine an optimal bail amount better than a judge, we should use it, even if it produces a higher bail amount (or free-throw percentage) for African Americans.
The program would be no more racist than FICO scores are. There may well be important underlying discrimination in society that leads to these results, but I think the proper response is to investigate and fix those root causes rather than turning off the system because we don’t like the results.
The key validation here is not me or anyone else “telling you” that a system works; that’s what’s wrong with the system in this article. The key is ongoing validation and refinement with out-of-sample data.
Step 2 is where those bogus crime rates get fed into software used to determine credit scores. Now people in area B have a harder time getting a mortgage, a home-improvement loan, or a car loan.
Soon the banks shut down their branches in area B and replace them with their associated (and more lucrative) payday loan companies.
Crime and credit datasets are later used by companies to determine where to open new stores and services. And they all go into area A.
The fact that you trust the system at all shows the problem. Say the neighborhood averaging 20% is actually composed of 23% people who shoot 70% and 77% people who shoot 5%. Then the odds of a random individual going exactly 2 for 10 are only about 6%, and the expected value of that 4:1 bet is a return of roughly thirty cents on the dollar; maybe you actually would be better off using your own judgement. The system is 100% correct on average, and if you pooled the results of many individuals you’d find it was correct, but if instead you ask how well it applies to each individual case, it is wrong. My point was that it is easy to be fooled by apparently objective systems, and we’d be better off trusting our beware-of-racism instincts.
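Here is a quick sketch of that mixture calculation (again, made-up illustrative numbers):

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# The neighborhood "average" of ~20% hides two very different groups.
groups = [(0.23, 0.70), (0.77, 0.05)]   # (share of population, true hit rate)

avg_rate = sum(w * p for w, p in groups)
p_two_of_ten = sum(w * binom_pmf(2, 10, p) for w, p in groups)

print(f"neighborhood average hit rate: {avg_rate:.3f}")                   # ~0.20
print(f"P(random individual goes exactly 2 for 10): {p_two_of_ten:.3f}")  # ~0.06

# Gross return per $1 staked at 4:1 (stake back plus 4x winnings on a win)
print(f"expected return per $1 at 4:1: {5 * p_two_of_ten:.2f}")           # ~0.29
```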
If you want to say my analogy breaks down, explain how it breaks down. Why wouldn’t we expect court appearance to be driven more by individual factors that aren’t captured by height/weight/street address/employment history than by bail amounts? Why wouldn’t we expect that many people with the same profile in the system will show up 90%+ of the time with no bail at all, while a small group will flee at nearly any bail amount?
And if you could explain how the system would be validated in concrete terms, that would be useful. Suppose I have a whole bunch of data: bail amounts set by judges, whether bail was posted, and whether the court date was kept when bail was posted. How do you use that to validate your model? I’d like to understand how you can, without averaging over multiple people, validate that it ever got anything right, as opposed to being terribly wrong in ways where the errors cancel out on average (much the way the forces of two speeding cars coming at you from opposite directions do). As far as I can tell, that is impossible without data about what could have happened, rather than only data about what actually happened.
I understand and agree with your first point. It is easy to design systems with strong correlations but no real predictive power.
Let me focus on the second part of your question: how to validate the system. The objective of the system is to produce the highest number of people showing up for court with the lowest possible bail amounts posted. That’s what we are optimizing for.
We train and optimize the system by looking at the results of bail decisions from courts that are not using it. Did the judge set bail higher or lower than the system would have? Did the defendant show up for trial? If a number of judges set bail lower than the system would for a particular type of defendant and those defendants still showed up, there is an opportunity for optimization. Likewise, if a certain class of defendant always seems to show up for their hearing, the system could try lowering the bail to see whether people in that class still show up.
The system is valid and useful if it performs better than most judges. Naturally, that is measured by looking at its performance over a large number of cases.
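As a very rough sketch of the kind of out-of-sample check I have in mind (hypothetical record fields and a deliberately simplified notion of “better”):

```python
# Held-out records from courts NOT using the system.
# Each record: bail the judge actually set, bail the model would have set,
# and whether the defendant appeared for trial.
records = [
    {"judge_bail": 2000, "model_bail": 5000, "appeared": True},
    {"judge_bail": 1000, "model_bail": 3000, "appeared": False},
    {"judge_bail":    0, "model_bail":  500, "appeared": True},
    {"judge_bail": 7500, "model_bail": 2500, "appeared": True},
    # ... many more cases
]

# Cases where a judge set bail LOWER than the model wanted and the defendant
# still appeared: evidence the model's bail was unnecessarily high.
model_too_high = [r for r in records
                  if r["judge_bail"] < r["model_bail"] and r["appeared"]]

# Cases where a judge set bail lower than the model wanted and the defendant
# did NOT appear: weak evidence the model's higher bail might have helped.
model_maybe_right = [r for r in records
                     if r["judge_bail"] < r["model_bail"] and not r["appeared"]]

print(f"{len(model_too_high)} cases suggest the model over-bails")
print(f"{len(model_maybe_right)} cases are consistent with the model's higher bail")

# Caveat (this is the previous poster's objection): we only ever observe what
# happened under the judge's bail, never what would have happened under the
# model's, so comparisons like this are indirect and only meaningful when
# aggregated over many cases.
```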