It really is dumb. Though, Verizon and Comcast can't throttle data on a DVD when they get pissy with the IRS.
Yeah, the IRS should use hills of BRDs (Blu-ray discs). Get with the programme.
Not so dumb. It's 30 DVDs for almost three grand. That works out to about $100 a disc. Compare that to the cost to produce those discs (about 30 cents, according to Yahoo Answers) and, well, this is a nice little gold mine for the Infernal Revenue Service.
And that's without any director's cuts, behind-the-scenes clips, artwork, music, …
There might be better ways, though they may be less secure or harder for the IRS to maintain and keep secure.
I believe this is working as designed. The IRS "makes the data available" as required; the fact that it is very, very inconvenient and very, very expensive is a ruse to discourage any of the great unwashed from troubling our betters.
This is the same as FOIA requests being answered with mountains of paper (at a per-sheet cost) even though the data exists electronically.
What's "secure"? Mailing records to Carl Malamud is a sure way to lose any confidentiality you hoped to maintain over them. (And anyway, they're supposed to be public records; that's why Malamud was able to get them.)
This is true, and I'm not arguing that they aren't overpriced. But there are too many ways to hack an FTP site, Dropbox, etc. and tamper with the data, whereas sending discs by mail puts the records in a semi-permanent format.
Admittedly, I think this is a point we could go round and round on. I do think they are asking him to pay too much for it. Even if they are figuring work time into the project, I don't think the DVDs should cost that much.
The records are public. Who cares if they get shared?
They are
… public.
I'm willing to bet that the IRS is legally required to offer these records as hard copies of some sort. There is also a great deal of overhead in organizing this material, beyond merely scanning it. For instance, can any info be reverse-engineered to reveal data on individual taxpayers? The IRS takes this stuff very seriously.
30 DVDs … at 5 GB per disc, that's 150 GB, which is smaller than the smallest disk Newegg carries. I see that an external 1 TB drive costs about $80. Do we have a winner?
Of course, copying 150 GB of data via a USB connection … not fun.
Still faster than optical drives. Actually, I'm testing it now by copying 150 GB from one USB hard drive to another. It's running at 25-30 MB/s and says it'll finish 90-100 minutes from now. I'm going to go out on a limb and guess it took more than 90 minutes for some poor IRS minion to burn 30 DVDs.
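The back-of-envelope numbers above are easy to check. This is a minimal Python sketch assuming decimal units (1 GB = 1,000 MB) and a constant transfer rate; it uses the rough 5 GB per disc from the earlier comment, although a single-layer DVD actually holds about 4.7 GB.

```python
def copy_minutes(total_gb: float, rate_mb_s: float) -> float:
    """Estimate copy time in minutes, assuming 1 GB = 1000 MB and a steady rate."""
    total_mb = total_gb * 1000
    return total_mb / rate_mb_s / 60


# 30 discs at roughly 5 GB each (single-layer DVDs are really ~4.7 GB).
total_gb = 30 * 5

for rate in (25, 30):
    print(f"{rate} MB/s: {copy_minutes(total_gb, rate):.0f} min")
# 25 MB/s: 100 min
# 30 MB/s: 83 min
```

At 25-30 MB/s that comes out to roughly 83-100 minutes, in the same ballpark as the 90-100 minute estimate the copy reported.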
If I recall correctly from a previous BoingBoing article (a year or three ago?), not only are they on DVD, but each return is a series of TIFF images of the individual scanned pages…
You're correct. Each DVD is ~60,000 one-page, low-res TIFF images. The last DVD for each month also has a "DAT" file which says which one-page TIFF images are associated with which return. My hacked-up Perl code turns that mess into PDF files with metadata stamped in, a privacy header, and various manifests. We get approximately 200 DVDs per year out of this "service."
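To illustrate the grouping step described above, here is a rough Python sketch (not the Perl code mentioned, which isn't shown). It assumes, hypothetically, that the DAT index can be read as comma-separated lines of `return_id,tiff_filename`, one page per line in page order; the real IRS DAT layout isn't described here and may well differ. The return IDs and filenames in the sample are made up, and turning each group into a PDF with metadata, a privacy header, and manifests is left out.

```python
import csv
import io


def group_pages_by_return(dat_text: str) -> dict[str, list[str]]:
    """Group one-page TIFF filenames by the return they belong to.

    HYPOTHETICAL FORMAT: each non-blank line is `return_id,tiff_filename`,
    listed in page order. The actual IRS DAT layout may differ.
    """
    returns: dict[str, list[str]] = {}
    for row in csv.reader(io.StringIO(dat_text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        return_id, tiff_name = row[0].strip(), row[1].strip()
        # dicts preserve insertion order, so returns and pages stay in file order
        returns.setdefault(return_id, []).append(tiff_name)
    return returns


# Made-up sample index: two returns, three scanned pages.
sample = """RET-0001,page00001.tif
RET-0001,page00002.tif
RET-0002,page00003.tif
"""

for return_id, pages in group_pages_by_return(sample).items():
    print(return_id, pages)
# RET-0001 ['page00001.tif', 'page00002.tif']
# RET-0002 ['page00003.tif']
```

Each resulting list is the ordered page set for one return, which is the unit you'd then hand off to whatever builds the PDF.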
Good lord, what a mess. It sounds almost like they're going out of their way to make the data "un-greppable." Have you had any luck running the pages through bulk OCR?
Or their subcontractor.
Do you really think any potential government profit center doesn't already have a well-connected free-marketeer stuck on its teat?
@carlmalamud, do you know who fulfills the orders? Specifically, is it subcontracted out? I am curious, because that's an absurd price tag.
True. I forgot about that unwritten rule: "If it can be contracted out, do it."
It's still a gold mine.
This topic is a good reminder to support Public Resource Org.
I just did…
https://public.resource.org/
They do it in house. A guy named Dave cuts their DVDs, and he is quite clueful. He's out of the Utah facility. But there's a big clay layer on top of the folks doing the work. The IRS spends $2 billion a year on IT, and they have some really hard problems to solve on things like individual and corporate returns. The Exempt Organizations feed is sort of an orphan at the agency, an example of data that's intended to flow back out instead of just coming in.
The basic problem is that they've never paid attention to this database. Because it represents 10% of our U.S. economy and is very much analogous to the SEC's EDGAR database, I'm hoping they start paying attention. This is important.
I've worked at the IRS, so I might be able to offer some perspective. Although government offices in movies may be sleek, futuristic command centers, in reality most of the IRS is desperately understaffed, running Windows XP on desks that quite literally date back to WWII. Sucking at the government teat isn't as glamorous as it sounds.
The office in question is set up to allow citizens to look up one nonprofit, or ten, or one hundred, for purposes of accountability and investigative journalism. They simply aren't set up to dump seven million tax returns just because reasons. They probably have no reason at all to cooperate with your project, aside from what's required by law.
So by all means tweet away, but I don't think the head of EO pays much attention to Twitter. Shocking that anybody could ignore that fount of wisdom, but hey, bureaucrats, amirite?