This report was scanned and published on a Xerox WorkCentre 2883.
It's the beginning of the end! The machines are starting to rise up! First, it's a randomly changed number causing the occasional wrong dosage of a drug to be given to a patient, killing them. Misaligned components failing catastrophically, killing workers. The next thing we know, missile coordinates will be "wrong", wiping out whole cities!
cackles maniacally
Btw, did you know that there is a real SkyNet? http://en.wikipedia.org/wiki/Skynet_%28satellite%29
It's actually a pretty nasty bug: Unlike JPEG-style lossy compression, which introduces nasty visual crunchiness and artefacts, JBIG2 divides the document into small tiles and "deduplicates" it (more or less), replacing all sufficiently similar tiles with references to a single tile.
A good strategy for aggressive compression of text (and this format is a successor to fax compression formats, so that was probably the idea), which is almost entirely duplicates of a relatively small alphabet of identical tiles; but it means that the compression engine is stuck doing quasi-OCR: if tiles containing different numbers/letters are judged sufficiently similar, one will be silently, seamlessly, replaced with a copy of the other.
So, unlike the obnoxious-but-visible artifacts, you'll get good looking results at even quite aggressive settings; but with the potential for totally silent character substitutions. Not Good.
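For anyone who wants to see how that kind of substitution can happen, here is a toy sketch in Python. It is not the real JBIG2 codec: the 5x5 glyph bitmaps, the pixel-difference metric, the `max_diff` threshold and all function names are made up for illustration. It only shows the general idea of replacing "similar enough" tiles with a reference to one stored tile.

```python
# Toy model of tile deduplication. NOT the actual JBIG2 algorithm:
# glyphs, metric, threshold and names are assumptions for illustration.

GLYPH_6 = ("01110", "10000", "11110", "10001", "01110")
GLYPH_8 = ("01110", "10001", "01110", "10001", "01110")
GLYPH_1 = ("00100", "01100", "00100", "00100", "01110")


def pixel_difference(a, b):
    """Count the pixels that differ between two equally sized tiles."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def compress(tiles, max_diff):
    """Store each tile once; reuse a stored tile if it is 'close enough'."""
    dictionary = []  # unique tiles kept in the output
    refs = []        # one dictionary index per tile on the page
    for tile in tiles:
        for index, stored in enumerate(dictionary):
            if pixel_difference(tile, stored) <= max_diff:
                refs.append(index)  # reuse the stored tile, drop this one
                break
        else:
            dictionary.append(tile)
            refs.append(len(dictionary) - 1)
    return dictionary, refs


def decompress(dictionary, refs):
    """Rebuild the page from the dictionary and the reference list."""
    return [dictionary[i] for i in refs]


page = [GLYPH_6, GLYPH_8, GLYPH_1]

# The toy "6" and "8" differ by only 2 pixels, so a threshold of 3 merges them.
dictionary, refs = compress(page, max_diff=3)
restored = decompress(dictionary, refs)
print(refs)                    # [0, 0, 1]
print(restored[1] == GLYPH_6)  # True: the 8 came back as a clean-looking 6

# With a threshold of 0 only identical tiles are merged, so nothing changes.
lossless_dictionary, lossless_refs = compress(page, max_diff=0)
print(decompress(lossless_dictionary, lossless_refs) == page)  # True
```

The point of the sketch is that the wrong digit in the output is a perfectly crisp copy of another digit, so nothing on the page hints that anything went wrong.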
So I guess it would be a bad idea to scan, email and use KSFO runway data without a thorough check beforehand. In particular a mistake with the runway lat and lon values could lead to an aircraft landing where there isn't actually a runway.
Oh great.
I can see it now.
"Hey, the numbers on the hardcopy don't match, must have been a transcription error"
"How'd they get it wrong? Somebody fix the electronic copies quick!"
Why are they using compression algorithms at all? Are the copiers going to run out of memory? Do they still build them with 16K memories or something?
Ugh. So why are copiers OCR-ing anything anyway? Aren't they just supposed to copy stuff?
I wonder if the cops use these copiers a lot to photocopy search warrants. Or how about drone-warfare hit lists? It's all sort of like that fly getting into the printer in Brazil.
The paper says that they turned all OCR features off before running the tests. Of course, maybe, changing that setting doesn't really change what it's supposed to!
Yeah, I was kind of wondering that myself. It seems like "lots more storage" should be considered before "compression algorithms that may change data" for business copiers.
I get the impression this only affects scanning to documents, not photocopying. If I'm wrong let me know so I can proceed to panic.
I was thinking there'd be no reason to compress when copying but I guess it would help when making multiple copies of large stacks of documents.
Apparently this level of lossy compression isn't the default (though it is called "normal" quality, I'm assuming the usual "normal", "epic", "document of surpassing beauty" marketing terms are in use), and the default setting doesn't have the same issue.
As for why compression, an office-duty copier very likely has a decent size hard drive, and probably a network connection, these days; but some compression isn't implausible if you want electronic output that you can easily attach to email, or local storage that doesn't fill up by lunchtime. Why "some compression" would mean "brutally lossy" compression, rather than lossless TIFFs or something, I can only assume that it is part of Xerox's (well advanced) plan to trash their reputation in the industry they founded...
In modern usage "Copier" is a polite term for "Scanner and laser printer in a single box, with a usurious support agreement" (seriously: I dealt with a Canon unit that, despite having full network connectivity and print capability, didn't offer support for any printer languages, like PCL or Postscript, without an additional license that cost more than a midrange laser printer...)
They can copy stuff; but they can also do things like create PDFs, automatically email scanned documents, even do LDAP/Kerberos auth against an AD to drop files directly into user directories.
Some of them do have OCR capabilities (to generate PDFs that have a proper text plane, usually). In this case, though, the problem was with the compression algorithm, which performed a somewhat OCR-like operation; but which was not counted as "OCR" for the purposes of turning OCR off before testing.
Worse, Xerox knew about the issue. The manual actually notes that "occasional character substitutions may occur". And they shipped it. A dyslexic copy machine.
One of our major highways has an end to end time/distance speed camera system. It logs registration numbers between two camera sites and calculates average speed. If you are over the limit, they send out a speeding ticket. So this big bus got booked at 160 km/h, which is impossible over that distance, and after checking it turned out that human operators had transposed digits or characters to incorrectly identify two buses as one. Bus fleets presumably use blocks of registration numbers, so they are similar.
So these guys have been sprung bad: their "automatic" registration number system is actually at least partly manual because only humans have dyslexia, right?
Buttle, Tuttle, what's the difference?
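The arithmetic behind that kind of ticket is simple enough to sketch in Python. Only the 160 km/h figure comes from the story above; the 40 km section length, the timestamps and the function name are hypothetical values chosen so the numbers work out.

```python
from datetime import datetime


def average_speed_kmh(distance_km, entered, exited):
    """Average speed between two camera sites, in km/h."""
    elapsed_hours = (exited - entered).total_seconds() / 3600
    if elapsed_hours <= 0:
        raise ValueError("exit time must be after entry time")
    return distance_km / elapsed_hours


# Hypothetical section length and timestamps (not from the original story).
SECTION_KM = 40
bus_a_entry = datetime(2013, 8, 6, 10, 0)   # bus A logged at the first camera
bus_b_exit = datetime(2013, 8, 6, 10, 15)   # bus B logged at the second camera

# If bus B's plate is misread as bus A's, the system pairs the wrong sightings
# and computes a speed for a trip that no single vehicle actually made.
print(average_speed_kmh(SECTION_KM, bus_a_entry, bus_b_exit))  # 160.0
```

A single swapped character in the plate is all it takes to pair an entry and an exit that belong to different vehicles.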
No, I think we are talking about photocopying here. Please, feel free to panic.
Also, note that the distinction you're making between copying and photocopying may be fading; I don't know what happens when you set the machine to "photocopy" (though at least one commenter in this thread is saying it happens when the device is asked to make photocopies), but the device I use, in my (academic department) workplace, is a scanner and a photocopier. When I use it, I can get photocopies, if I enter my group's code (if I remember it), and our budget gets charged for each piece of paper. Or I can tell the machine to email a scan to my computer, and then I can save that copy someplace I'll actually find it, and that travels with me, and I can email it; I can even print it out if I need to. And we don't get charged: no paper used.
I'm pretty sure the photocopiers in the libraries also scan for free.
I would assume document retention for "paperless office" conversions. The best answer is still to not skimp on the storage, but the people who make those decisions are seldom thinking of anything but getting the cheapest possible solution.
A lot of modern copiers are really scanner+printer. That's how they can read through all the pages of your original once and then start cranking out collated sets.