This report was scanned and published on a Xerox WorkCentre 2883.
It's the beginning of the end! The machines are starting to rise up! First, it's a randomly changed number causing the occasional wrong dosage of a drug to be given to a patient, killing them. Then it's misaligned components failing catastrophically, killing workers. The next thing we know, missile coordinates will be "wrong", wiping out whole cities!
Btw, did you know that there is a real SkyNet? http://en.wikipedia.org/wiki/Skynet_%28satellite%29
It's actually a pretty nasty bug. Unlike JPEG-style lossy compression, which introduces visible crunchiness and artefacts, JBIG2 divides the document into small tiles and 'deduplicates' them (more or less), replacing all sufficiently similar tiles with references to a single tile.
That's a good strategy for aggressive compression of text (and this format is a successor to fax compression formats, so that was probably the idea), since text is almost entirely made up of repeats drawn from a relatively small alphabet of identical tiles. But it means the compression engine is stuck doing quasi-OCR: if tiles containing different numbers or letters are judged sufficiently similar, one will be silently and seamlessly replaced with a copy of the other.
So, unlike obnoxious-but-visible artifacts, you get good-looking results even at quite aggressive settings, but with the potential for totally silent character substitutions. Not Good.
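To make that failure mode concrete, here is a toy sketch of the tile-deduplication idea described above. It is not the real JBIG2 codec: the 5×7 glyph bitmaps, the pixel-count (Hamming distance) similarity test and the `max_diff` threshold are all hypothetical stand-ins, chosen so that a '6' and an '8' differ in only three pixels.

```python
from typing import List, Tuple

Tile = Tuple[str, ...]  # a glyph bitmap: one string per row, '#' = black pixel

# Hypothetical 5x7 glyphs; '6' and '8' differ in only 3 of 35 pixels.
SIX: Tile = (
    ".###.",
    "#....",
    "#....",
    "####.",
    "#...#",
    "#...#",
    ".###.",
)
EIGHT: Tile = (
    ".###.",
    "#...#",
    "#...#",
    ".###.",
    "#...#",
    "#...#",
    ".###.",
)


def pixel_distance(a: Tile, b: Tile) -> int:
    """Count mismatched pixels between two same-sized tiles (Hamming distance)."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def encode(tiles: List[Tile], max_diff: int) -> Tuple[List[Tile], List[int]]:
    """Deduplicate tiles: reuse a stored symbol if it is 'close enough'.

    Returns the symbol dictionary and, for each input tile, the index of
    the symbol that will be drawn in its place.
    """
    symbols: List[Tile] = []
    refs: List[int] = []
    for tile in tiles:
        for index, symbol in enumerate(symbols):
            if pixel_distance(tile, symbol) <= max_diff:
                refs.append(index)  # lossy step: the tile itself is thrown away
                break
        else:  # no stored symbol was close enough: keep this tile as a new one
            symbols.append(tile)
            refs.append(len(symbols) - 1)
    return symbols, refs


def decode(symbols: List[Tile], refs: List[int]) -> List[Tile]:
    """Rebuild the page by stamping the referenced symbol at each position."""
    return [symbols[i] for i in refs]


def as_text(tiles: List[Tile]) -> str:
    """Demo helper: map the known glyph bitmaps back to characters."""
    names = {SIX: "6", EIGHT: "8"}
    return "".join(names[t] for t in tiles)


page = [SIX, EIGHT]  # the digits "68"
for max_diff in (0, 3):
    symbols, refs = encode(page, max_diff)
    print(max_diff, as_text(decode(symbols, refs)))  # prints "0 68", then "3 66"
```

With an exact-match threshold the page round-trips as "68"; loosen it to three pixels and both digits are stamped from the same stored symbol, so the decoded page silently reads "66", with nothing visually wrong about it.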
So I guess it would be a bad idea to scan, email and use KSFO runway data without a thorough check beforehand. In particular, a mistake in the runway latitude and longitude values could lead to an aircraft landing where there isn't actually a runway.
I can see it now.
"Hey, the numbers on the hardcopy don't match, must have been a transcription error"
"How'd they get it wrong? Somebody fix the electronic copies quick!"
Why are they using compression algorithms at all? Are the copiers going to run out of memory? Do they still build them with 16K of memory or something?
Ugh. So why are copiers OCR-ing anything anyway? Aren't they just supposed to copy stuff?
I wonder if the cops use these copiers a lot to photocopy search warrants. Or how about drone-warfare hit lists? It's all sort of like that fly getting into the printer in the film Brazil.
The paper says that they turned all OCR features off before running the tests. Of course, maybe changing that setting doesn't actually do what it's supposed to!
Yeah, I was kind of wondering that myself. It seems like 'lots more storage' should be considered before 'compression algorithms that may change data' for business copiers.
I get the impression this only affects scanning to documents, not photocopying. If I'm wrong let me know so I can proceed to panic.
I was thinking there'd be no reason to compress when copying but I guess it would help when making multiple copies of large stacks of documents.
Apparently this level of lossy compression isn't the default (though it is called 'normal' quality, I'm assuming the usual 'normal', 'epic', 'document of surpassing beauty' marketing terms are in use), and the default setting doesn't have the same issue.
As for why they compress at all: an office-duty copier these days very likely has a decent-size hard drive, and probably a network connection, but some compression isn't implausible if you want electronic output that you can easily attach to email, or local storage that doesn't fill up by lunchtime. And as for why 'some compression' would mean 'brutally lossy' compression, rather than lossless TIFFs or something, I can only assume it is part of Xerox's (well advanced) plan to trash their reputation in the industry they founded...
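For a sense of scale, here is a back-of-envelope figure for one uncompressed page, assuming an A4 sheet (210 × 297 mm) scanned at 600 dpi; the resolution is an assumption, not a spec for any particular machine.

```latex
\begin{aligned}
\text{width}  &= \tfrac{210\ \text{mm}}{25.4\ \text{mm/in}} \times 600\ \text{dpi} \approx 4961\ \text{px} \\
\text{height} &= \tfrac{297\ \text{mm}}{25.4\ \text{mm/in}} \times 600\ \text{dpi} \approx 7016\ \text{px} \\
\text{pixels} &= 4961 \times 7016 = 34{,}806{,}376 \\
\text{1 bit/px (black and white)} &: 34{,}806{,}376 / 8 = 4{,}350{,}797\ \text{bytes} \approx 4.35\ \text{MB per page} \\
\text{24 bits/px (colour)} &: 34{,}806{,}376 \times 3 = 104{,}419{,}128\ \text{bytes} \approx 104\ \text{MB per page}
\end{aligned}
```

So a 100-page stack is roughly 435 MB even in plain black and white, before any compression.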
In modern usage, "copier" is a polite term for "scanner and laser printer in a single box, with a usurious support agreement" (seriously: I dealt with a Canon unit that, despite having full network connectivity and print capability, didn't offer support for any printer languages, like PCL or PostScript, without an additional license that cost more than a midrange laser printer...)
They can copy stuff; but they can also do things like create PDFs, automatically email scanned documents, even do LDAP/Kerberos auth against an AD to drop files directly into user directories.
Some of them do have OCR capabilities (to generate PDFs that have a proper text plane, usually). In this case, though, the problem was with the compression algorithm, which performed a somewhat OCR-like operation; but which was not counted as 'OCR' for the purposes of turning OCR off before testing.
Worse, Xerox knew about the issue. The manual actually notes that 'occasional character substitutions may occur'. And they shipped it. A dyslexic copy machine.
One of our major highways has an end-to-end, time/distance speed camera system. It logs registration numbers at two camera sites and calculates the average speed between them. If you are over the limit, they send out a speeding ticket. So this big bus got booked at 160 km/h, which is impossible over that distance, and after checking it turned out that human operators had transposed digits or characters and incorrectly identified two buses as one. Bus fleets presumably use blocks of registration numbers, so the plates are similar.
So these guys have been sprung bad: their "automatic" registration number system is actually at least partly manual because only humans have dyslexia, right?
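The arithmetic behind that 160 km/h ticket is easy to sketch. Below is a minimal, hypothetical version of an average-speed check (distance divided by elapsed time between the two sites), assuming entry and exit sightings are paired by plate. The 40 km segment, the plates, the times and the 130 km/h sanity limit are all made up for illustration; a real system surely does more.

```python
from datetime import datetime
from typing import List, Tuple

# Hypothetical figures for illustration only.
SEGMENT_KM = 40.0          # distance between the two camera sites
MAX_PLAUSIBLE_KMH = 130.0  # anything faster is flagged for review

Sighting = Tuple[str, datetime]  # (plate as recorded, time seen)


def average_speed_kmh(t_in: datetime, t_out: datetime, distance_km: float) -> float:
    """Average speed over the segment: distance / elapsed time."""
    hours = (t_out - t_in).total_seconds() / 3600
    return distance_km / hours


def check_trips(entries: List[Sighting], exits: List[Sighting]) -> List[Tuple[str, float, bool]]:
    """Pair each entry with the earliest later exit that has the same plate.

    Returns (plate, speed, flagged) for every pair found. A misrecorded plate
    silently pairs two different vehicles, which usually shows up as an
    implausible speed rather than as an obvious error.
    """
    results = []
    for plate, t_in in entries:
        later = [t for p, t in exits if p == plate and t > t_in]
        if later:
            speed = average_speed_kmh(t_in, min(later), SEGMENT_KM)
            results.append((plate, speed, speed > MAX_PLAUSIBLE_KMH))
    return results


def at(hour: int, minute: int) -> datetime:
    return datetime(2024, 1, 1, hour, minute)  # arbitrary date


entries = [("BUS128", at(9, 40)), ("BUS123", at(10, 0))]
# BUS128's exit at 10:15 was recorded as "BUS123" (one digit wrong).
exits = [("BUS123", at(10, 15)), ("BUS123", at(10, 30))]
for plate, speed, flagged in check_trips(entries, exits):
    print(plate, round(speed, 1), "FLAG" if flagged else "ok")  # prints: BUS123 160.0 FLAG
```

One wrong digit pairs the second bus's entry with the first bus's exit only 15 minutes later, so 40 km in a quarter of an hour comes out as exactly 160 km/h, while the bus that actually made that exit has no trip at all. A plausibility check like this catches the absurd cases, but a wrong pairing that yields a believable speed would sail straight through.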
Buttle, Tuttle, what's the difference?
No, I think we are talking about photocopying here. Please, feel free to panic.
Also, note that the distinction you're making between scanning and photocopying may be fading. I don't know what happens when you set the machine to "photocopy" (though at least one commenter in this thread says it happens when the device is asked to make photocopies), but the device I use at my (academic department) workplace is both a scanner and a photocopier. When I use it, I can get photocopies if I enter my group's code (if I remember it), and our budget gets charged for each sheet of paper. Or I can tell the machine to email a scan to my computer; then I can save that copy somewhere I'll actually find it, it travels with me, I can email it, and I can even print it out if I need to. And we don't get charged: no paper used.
I'm pretty sure the photocopiers in the libraries also scan for free.
I would assume document retention for "paperless office" conversions. The best answer is still to not skimp on the storage, but the people who make those decisions are seldom thinking of anything but getting the cheapest possible solution.
A lot of modern copiers are really scanner+printer. That's how they can read through all the pages of your original once and then start cranking out collated sets.