It really is dumb. Though, Verizon and Comcast can’t throttle data on a DVD when they get pissy with the IRS.
Yeah, the IRS should use hills of BRDs. Get with the programme.
Not so dumb. It’s 30 DVDs at almost three grand. That works out to about $100 a disc. Compare that to the cost to produce those discs (about 30 cents according to Yahoo Answers) and, well, this is a nice little gold mine for the Infernal Revenue Service.
And that’s without any Director’s cuts, behind the scenes clips, artwork, music, …
There might be better ways, though less secure or harder for the IRS to maintain/keep secure.
I believe this is working as designed. IRS “makes the data available” as required - the fact that it is very, very inconvenient and very, very expensive is a ruse to discourage any of the great unwashed from troubling our betters.
This is the same as FOIA requests being served with mountains of paper (with a per sheet cost) even though the data exists electronically.
What’s “secure”? Mailing records to Carl Malamud is a sure way to lose any confidentiality you hoped to maintain over them. (and anyway, they’re supposed to be public records - that’s why Malamud was able to get them).
This is true, and I’m not arguing that they aren’t overpriced. But there are too many ways to hack an FTP site/Dropbox/etc and mess with the data, whereas being sent by mail puts them in a semi-permanent format.
Admittedly, I think this is a part we could go round and round on. I do think they are asking him to pay too much for it. Even if they are figuring work time into the project, I don’t think the DVDs should cost that much.
The records are public. Who cares if they get shared?
I’m willing to bet that the IRS is legally required to offer these records as hard copies of some sort. There is also a great deal of overhead in organizing this material, over and above merely scanning it. For instance, can any info be reverse engineered to reveal data on individual taxpayers? The IRS takes this stuff very seriously.
30 DVDs … @ 5GB per disc, that’s 150GB, which is smaller than the smallest disk Newegg carries. I see that an external 1TB drive costs about $80. Do we have a winner?
Of course, copying 150GB of data via a USB connection … not fun.
Still faster than optical drives. Actually, I’ll test copying 150 GB from one USB hard drive to another. It’s running at 25-30 MB/s and says it’ll finish around 90-100 minutes from now. I’m going to go out on a limb and guess it took more than 90 minutes for some poor IRS minion to burn 30 DVDs.
If I recall correctly from a previous boingboing article (year or three ago?), not only are they on DVD, but each return is a series of TIFF images of the individual scanned pages…
You’re correct. Each DVD is ~60,000 1-page low-res TIFF images. The last DVD for each month also has a “DAT” file which says which 1-page TIFF images are associated with which return. My hacked up Perl code turns that mess into PDF files with metadata stamped in, a privacy header, and various manifests. We get approximately 200 DVDs per year out of this “service.”
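For anyone who wants to sanity-check the numbers in this part of the thread, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 GB = 1000 MB) and takes the 5GB-per-disc figure from above at face value, even though a single-layer DVD actually holds about 4.7 GB. The function name `transfer_minutes` is made up for illustration.

```python
def transfer_minutes(size_gb: float, rate_mb_per_s: float) -> float:
    """Minutes needed to copy size_gb gigabytes at rate_mb_per_s.

    Assumes decimal units: 1 GB = 1000 MB.
    """
    return size_gb * 1000 / rate_mb_per_s / 60


discs = 30
gb_per_disc = 5  # rounded figure from the thread; a single-layer DVD is about 4.7 GB
total_gb = discs * gb_per_disc

# The two ends of the observed 25-30 MB/s USB copy speed.
for rate in (25, 30):
    minutes = transfer_minutes(total_gb, rate)
    print(f"{total_gb} GB at {rate} MB/s: {minutes:.0f} minutes")
```

At those speeds the copy lands somewhere between roughly 83 and 100 minutes, which is in line with the 90-100 minute estimate the copy dialog gave.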
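To make the “DAT file ties pages to returns” step a bit more concrete, here is a minimal Python sketch of the grouping idea. It is not the Perl code mentioned above, and the record layout is purely an assumption: the thread never describes the real DAT format, so this pretends each line is a return ID, a tab, and a TIFF file name. The function name `group_pages_by_return` and the sample records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_pages_by_return(dat_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each return ID to its page image names, in the order listed.

    ASSUMPTION: every non-blank line looks like "<return_id><TAB><tiff_name>".
    The real DAT layout is not described in the thread.
    """
    pages: Dict[str, List[str]] = defaultdict(list)
    for line in dat_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        return_id, tiff_name = line.split("\t", 1)
        pages[return_id].append(tiff_name)
    return dict(pages)


# Made-up sample records, not real IRS data.
sample = [
    "R0001\tpage_000001.tif",
    "R0001\tpage_000002.tif",
    "R0002\tpage_000003.tif",
]

manifest = group_pages_by_return(sample)
for return_id, tiffs in manifest.items():
    print(return_id, len(tiffs), "page(s)")
```

Once the pages are grouped like this, each list could be handed to whatever tool stitches the TIFFs into a single PDF per return.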
Good lord, what a mess. It sounds almost like they’re going out of their way to make the data ‘un-greppable’. Have you had any luck running the pages through bulk OCR?
Or their subcontractor.
Do you really think any potential government profit center doesn’t already have a well-connected free-marketeer stuck on its teat?
@carlmalamud, Do you know who fulfills the orders? Specifically, is it subcontracted out? I am curious, because that’s an absurd price tag.
True. I forgot about that unwritten rule: “if it can be contracted out, do it.”
It’s still a gold mine.
This topic is a good reminder to support Public Resource Org.
I just did…
They do it in house. A guy named Dave cuts their DVDs, he is quite clueful. He’s out of the Utah facility. But, there’s a big clay layer on top of the folks doing the work. IRS spends $2 billion/year on IT, they have some really hard problems to solve on things like individual and corporate returns. The Exempt Organizations feed is sort of an orphan at the agency, an example of data that’s intended to flow back out instead of just coming in.
The basic problem is they’ve never paid attention to this database. Because it represents 10% of our U.S. economy and is very much analogous to the SEC’s EDGAR database, I’m hoping they start paying attention. This is important.
I’ve worked at the IRS, so I might be able to offer some perspective. Although government offices in movies may be sleek, futuristic command centers, in reality, most of the IRS is desperately understaffed, running Windows XP on desks that quite literally date back to WWII. Sucking at the government teat isn’t as glamorous as it sounds.
The office in question is set up to allow citizens to look up one nonprofit, or ten, or one hundred, for purposes of accountability and investigative journalism. They simply aren’t set up to dump seven million tax returns just because reasons. They probably have no reason at all to cooperate with your project, aside from what’s required by law.
So by all means tweet away, but I don’t think the head of EO pays much attention to Twitter. Shocking that anybody could ignore that fount of wisdom, but hey, bureaucrats, amirite?