For what it’s worth, the recognition system does appear to be working more or less as intended: it’s reducing the human workload to a manageable number of people to review manually. That, by itself, shouldn’t be a problem; a competent police department should be able to manually clear ~104 people very quickly.
That said, it doesn’t sound like the humans in this equation are using the system correctly. Of those 104, they should be able to rapidly whittle the number down via quick ID checks and, you know, just looking at these people to see if there’s anything questionable about them, after the system pops a “maybe, but probably not actually a match — about 2%, so 1 in 50, actually” flag.
But it sounds like, unsurprisingly, they didn’t think of it that way. There are all kinds of rules for deploying algorithmic systems like this, where the users are supposed to understand, and the system itself is supposed to make clear, that a “match” doesn’t mean “match”. It just means there’s a slightly-better-than-random-chance probability that the person in front of you needs to be arrested… But humans are really bad at grokking probability at an intuitive level, so you have to use different words and systems to keep that under control.
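To put rough numbers on why a “match” flag shouldn’t be read as a match, here’s a quick back-of-the-envelope sketch. The 104 flagged people and the ~2% figure come from the discussion above; treating every flag as an independent 2% chance is a simplifying assumption for illustration:

```python
# Hypothetical figures from the discussion above: the system flags ~104
# people, and each flag carries roughly a 2% ("1 in 50") chance of being
# a true match. Assuming each flag is an independent 2% chance:
flagged = 104
p_true_match = 0.02

expected_true_matches = flagged * p_true_match          # 104 * 0.02
expected_false_positives = flagged * (1 - p_true_match)  # 104 * 0.98

print(f"Expected true matches:    {expected_true_matches:.1f}")    # -> 2.1
print(f"Expected false positives: {expected_false_positives:.1f}")  # -> 101.9
```

In other words, under these assumed numbers, the overwhelming majority of the people the system flags are innocent, which is exactly why the human review step has to treat a flag as a weak signal rather than a verdict.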