Originally published at: https://boingboing.net/2018/05/14/beijing-envy.html
What could possibly go wrong?
For what it’s worth, the recognition system does appear to be more or less working as intended: it’s reducing the workload to a manageable number of candidates for humans to review. That, by itself, shouldn’t be a problem; a competent police department should be able to manually clear ~104 people very quickly.
That said, it doesn’t sound like the humans in this equation are using the system correctly. Of those 104, they should be able to rapidly winnow the number via quick ID checks and, you know, just looking at these people to see if there’s anything questionable about them, after the system pops a “maybe, but probably not actually a match (about 2%, so 1 in 50)” flag.
But it sounds like, unsurprisingly, they didn’t think of it that way. There are all kinds of rules for deploying algorithms like this, where the users are supposed to understand (and have it baked into the system itself) that a “match” doesn’t mean a match. It just means there’s a slightly-better-than-random-chance probability that the person in front of you needs to be arrested. But humans are really bad at grokking probability at an intuitive level, so you have to use different words and systems to keep that under control.
One metric I would like to see is how many false negatives are in the mix. Once you rely on a faulty system, you begin to miss the people you are actually after because you spend time following up all those false leads.
And this is before taking into account the accuracy of the database of faces itself. I’d bet there are plenty of errors there too.
It’s all about generating “probable cause” out of thin air.
Image shows the City of London Police coat of arms, not the Met.
I had the idea (I haven’t read the links) that the 98% was the false positive rate. A system that missed 49 out of 50 matches doesn’t sound like much to worry about on the civil liberties side of things, just a waste of resources.
Without information on the total number of people scanned and cleared, it’s impossible to say anything meaningful about the accuracy of the system. Generating 104 alerts with 102 false positives is pretty terrible if the total number of scans is 300, but it’s pretty good if the total number of scans is something like 80,000 and the false negative rate is low.
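The point about the missing denominator can be sketched in a few lines. The scan totals below (300 and 80,000) are the hypothetical figures from the comment above, not published numbers; the Met never released how many faces were actually screened:

```python
# The only published figures: 104 alerts, of which 2 were genuine matches.
alerts = 104
true_positives = 2
false_positives = alerts - true_positives  # 102

# The false positive rate depends on how many innocent people were scanned,
# which was never published. Try two hypothetical totals:
for total_scanned in (300, 80_000):
    innocents = total_scanned - true_positives
    fp_rate = false_positives / innocents
    print(f"{total_scanned:>6} scans -> false positive rate {fp_rate:.2%}")
# Roughly 34% in the first case versus about 0.13% in the second:
# the same 104 alerts, wildly different verdicts on the system.
```

Same numerator, two very different conclusions, which is exactly why the 98% figure on its own says nothing about the system’s accuracy.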
A false negative would be when there is a bad actor in the image but the system did not flag them as such. This is the true failure rate of these systems and a metric I don’t see published anywhere.
See the difference? When the system the police rely upon fails to notice that a terrorist is walking down the road, there’s a good bet the police will not be interacting with the terrorist, but will instead be busy clearing the 98% of flagged people who turn out to be innocent.
@Ben_Curthoys already gave a perfectly cogent explanation on the Wales post of why calling this a 98% false positive rate is incorrect. The false positive rate is not the ratio of false positives to correct positives; it’s the ratio of false positives to the total number of (innocent) people scanned, and we don’t know that rate because we don’t know the total number of subjects from which those positives were generated.
Cory, why do you continue to misrepresent the accuracy of this technology, even after the nature of your misrepresentation has been made clear? You don’t need such tactics to make the dangers of these systems clear, and deliberately misinforming your audience about the nature of those dangers just undermines your argument.
Thanks. I was going to reply to this one too but then I found it too depressing.
If you justify the use of misleading statistics and specious sound bites because the ‘other’ side does it too then you debase the whole debate: it stops being a good faith investigation into what ought to be done and becomes yet another tribal pissing contest.
Eh, I had my positives and negatives confused. Why did I think today would be a good day to skip the caffeine?
‘What the hell are you doing here?’ said Sergeant Gilks, wearily.
‘I am here,’ said Dirk, ‘in pursuit of justice.’
‘Well, I wouldn’t mix with me then,’ said Gilks, ‘and I certainly wouldn’t mix with the Met.’
Your post about the Base Rate Fallacy was spot on. I scoffed at that story about the Welsh police until I read your post. Then I realized half of the important information was missing.
The post calls it a “false positive rate”, which a statistician wouldn’t, but it clearly explains what it means by 98% inaccuracy. There’s nothing misleading about it.
The maths of how test accuracy translates into confidence in the result, depending on the rarity of the thing being tested, is so counterintuitive to lay people that IMHO it would actually be misleading to report the true false positive rate.
“Our test is (eg) 99.9% accurate”, when the end result is that 98% of those accused are innocent, is exactly the kind of statistic those in power misuse to hide what they’re doing.
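That jump, from “99.9% accurate” to “98% of those flagged are innocent”, is just the base rate at work. Here’s a quick sketch with invented numbers (the population size, number of real targets, and perfect sensitivity are all assumptions chosen to reproduce a ~98% figure, not anything the Met published):

```python
# Invented figures for illustration only.
population = 100_000       # crowd scanned
actual_targets = 2         # genuinely wanted people in the crowd (rare!)
false_positive_rate = 0.001  # flags 0.1% of innocents, i.e. "99.9% accurate"
sensitivity = 1.0          # assume every real target gets flagged

true_hits = actual_targets * sensitivity
false_hits = (population - actual_targets) * false_positive_rate  # ~100

share_innocent = false_hits / (true_hits + false_hits)
print(f"Share of flagged people who are innocent: {share_innocent:.1%}")
# Prints roughly 98.0%: a "99.9% accurate" test, yet almost everyone
# it accuses is innocent, because real targets are so rare.
```

Quote the 99.9% and you sound rigorous; quote the 98% and you sound alarmed. Both come from the same arithmetic.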
Sounds just like “the internet of things shit” to me. Destined to become a ‘thing’.
If the police arrested and charged everyone flagged up by this screening, then they would be using a potentially useful tool in a very stupid way. I wouldn’t put it past them.
Incidentally, the maths here is part of the reason why biometric identity cards are a terrible idea.
I spoke to a security officer who works in a casino in Vegas once. He said that real time face ID is basically the stuff of movies and will remain so - the main thing they use it for is if they suspect someone is card counting or a banned person, they can scan to see if they’re in the ban book. Things like scanning everyone who enters the casino aren’t doable and won’t be for years.
Which post? The original article at the Independent clearly describes the origin of the 98% figure, but it does not describe the statistic you’d actually need to calculate the true false positive rate, which is how many faces were screened to generate the 104 alerts, nor does it give any indication that the author even understands that fact. Cory’s post doesn’t even do that much – he just calls it a 98% false positive rate with no direct reference to the underlying figures, an assertion that is wholly false and enormously misleading. (I can only assume deliberately so, since Cory’s not stupid and has had this mistake pointed out already.)
Then we disagree. I find reporting the truth to be less misleading than doing otherwise. It took Ben fewer than 170 words to explain the maths with perfect clarity off the top of his head; with two minutes’ effort, a reporter genuinely interested in accurately presenting the information could get it down to 100.
Sure, and that would be a valid cause for extreme concern if the Met had falsely accused 102 people of their 104 hits, or even detained those 102 people. Oh, but they didn’t actually do that, despite Cory’s implication to the contrary. It would be cause for extreme concern if this were a system that had been widely adopted and blindly believed by its users, rather than a system currently being employed on a trial basis with careful civilian government oversight that currently deems it not fit for widespread use. Oh, but it’s not, and it is.
Police abuse of this technology, which clearly isn’t accurate enough to do what the police want it to do, would be a very large concern. If it were happening. But it’s not. Deliberately misrepresenting the accuracy of the technology and its current use only serves to undermine the argument against its broader deployment, and unnecessarily so, because there is already a perfectly good case against its use without resorting to deliberate misinformation.