Tell the IRS that mountains of DVDs are a stupid way to distribute public records

It really is dumb. Though, Verizon and Comcast can’t throttle data on a DVD when they get pissy with the IRS.

1 Like

Yeah, the IRS should use hills of BRDs. Get with the programme.

Not so dumb. It’s 30 DVDs at almost three grand. That works out to about $100 a disc. Compare that to the cost to produce those discs (about 30 cents according to Yahoo Answers) and, well, this is a nice little gold mine for the Infernal Revenue Service.
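The back-of-the-envelope math above is easy to check. Here is a quick sketch in Python, assuming a round $3,000 for the set (the post only says "almost three grand") and the 30-cent production figure quoted from Yahoo Answers. The function name is made up for illustration.

```python
def per_disc_markup(total_price, disc_count, unit_cost):
    """Return (price per disc, markup multiple over production cost)."""
    price_per_disc = total_price / disc_count
    return price_per_disc, price_per_disc / unit_cost

# Assumed figures: $3,000 for 30 discs, $0.30 to produce each disc.
price, markup = per_disc_markup(3000.0, 30, 0.30)
print(f"${price:.2f} per disc, roughly {markup:.0f}x the production cost")
```

With those assumed inputs the per-disc price lands at the $100 mentioned above, a few hundred times the quoted production cost.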

And that’s without any Director’s cuts, behind the scenes clips, artwork, music, …

3 Likes

There might be better ways, though less secure or harder for the IRS to maintain/keep secure.

I believe this is working as designed. IRS “makes the data available” as required - the fact that it is very, very inconvenient and very, very expensive is a ruse to discourage any of the great unwashed from troubling our betters.

This is the same as FOIA requests being served with mountains of paper (with a per sheet cost) even though the data exists electronically.

6 Likes

What’s “secure”? Mailing records to Carl Malamud is a sure way to lose any confidentiality you hoped to maintain over them. (And anyway, they’re supposed to be public records - that’s why Malamud was able to get them.)

1 Like

This is true, and I’m not arguing that they aren’t overpriced. But there are too many ways to hack an FTP site/Dropbox/etc and mess with the data, whereas being sent by mail puts them in a semi-permanent format.

Admittedly, I think this is a part we could go round and round on. I do think they are asking him to pay too much for it. Even if they are figuring work time into the project, I don’t think the DVDs should cost that much.

1 Like

The records are public. Who cares if they get shared?
They are … public.

I’m willing to bet that the IRS is legally required to offer these records as hard copies of some sort. There is also a great deal of overhead in organizing this material, above and beyond merely scanning it. For instance, can any info be back-engineered to reveal data on individual taxpayers? The IRS takes this stuff very seriously.

2 Likes

30 DVDs … @ 5GB per disc, that’s 150GB, which is smaller than the smallest disk Newegg carries. I see that an external 1TB drive costs about $80. Do we have a winner?

Of course, copying 150GB of data via a USB connection … not fun.

1 Like

Still faster than optical drives. Actually, I’ll test copying 150 GB from one USB hard drive to another. It’s going between 25-30 MB/s and says it’ll finish around 90-100 minutes from now. I’m going to go out on a limb and guess it took more than 90 minutes for some poor IRS minion to burn 30 DVDs.
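The copy-time estimate above can be sanity-checked with a few lines of Python. This is only a rough sketch: it assumes decimal units (1 GB = 1000 MB) and a steady transfer rate, and the helper name is hypothetical.

```python
def transfer_minutes(size_gb, rate_mb_per_s):
    """Estimated copy time in minutes, using decimal units (1 GB = 1000 MB)."""
    return size_gb * 1000 / rate_mb_per_s / 60

# 150 GB at the two observed ends of the 25-30 MB/s range.
for rate in (25, 30):
    print(f"{rate} MB/s: about {transfer_minutes(150, rate):.0f} minutes")
```

That works out to somewhere between roughly 83 and 100 minutes, the same ballpark as the 90-100 minutes the copy dialog reported.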

1 Like

If I recall correctly from a previous boingboing article (year or three ago?), not only are they on DVD, but each return is a series of TIFF images of the individual scanned page…

You’re correct. Each DVD is ~60,000 1-page low-res TIFF images. The last DVD for each month also has a “DAT” file which says which 1-page TIFF images are associated with which return. My hacked-up Perl code turns that mess into PDF files with metadata stamped in, a privacy header, and various manifests. We get approximately 200 DVDs per year out of this “service.”
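To make the grouping step concrete, here is a minimal Python sketch of how an index file could be used to collect single-page TIFFs into per-return bundles. It is not the Perl code mentioned above, and the thread does not describe the real “DAT” layout: the `return_id|filename` line format, the sample IDs, and the function name are all assumptions made up for illustration.

```python
from collections import defaultdict

def group_pages_by_return(index_lines):
    """Map each return ID to its page image filenames, in index order."""
    pages = defaultdict(list)
    for line in index_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Hypothetical layout: "<return_id>|<tiff filename>"
        return_id, tiff_name = line.split("|", 1)
        pages[return_id].append(tiff_name)
    return dict(pages)

# Made-up sample in the hypothetical "return_id|filename" layout.
sample_index = [
    "R0001|00000001.tif",
    "R0001|00000002.tif",
    "R0002|00000003.tif",
]
for return_id, tiffs in group_pages_by_return(sample_index).items():
    print(return_id, len(tiffs), "page(s)")
```

Once the pages are grouped this way, each list could be handed to whatever tool assembles the TIFFs into a single PDF per return.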

1 Like

Each DVD is ~60,000 1-page low-res TIFF images. The last DVD for each month also has a “DAT” file which says which 1-page TIFF images are associated with which return.

Good lord, what a mess. It sounds almost like they’re going out of their way to make the data ‘un-greppable’. Have you had any luck running the pages through bulk OCR?

Or their subcontractor.

Do you really think any potential government profit center doesn’t already have a well-connected free-marketeer stuck on its teat?

@carlmalamud, do you know who fulfills the orders? Specifically, is it subcontracted out? I am curious, because that’s an absurd price tag.

1 Like

True. I forgot about that unwritten rule: “if it can be contracted out, do it.”

It’s still a gold mine :)

1 Like

This topic is a good reminder to support Public Resource Org.
I just did…
https://public.resource.org/

1 Like

They do it in-house. A guy named Dave cuts their DVDs, he is quite clueful. He’s out of the Utah facility. But, there’s a big clay layer on top of the folks doing the work. IRS spends $2 billion/year on IT, they have some really hard problems to solve on things like individual and corporate returns. The Exempt Organizations feed is sort of an orphan at the agency, an example of data that’s intended to flow back out instead of just coming in.

The basic problem is they’ve never paid attention to this database. Because it represents 10% of our U.S. economy and is very much analogous to the SEC’s EDGAR database, I’m hoping they start paying attention. This is important.

2 Likes

I’ve worked at the IRS, so I might be able to offer some perspective. Although government offices in movies may be sleek, futuristic command centers, in reality, most of the IRS is desperately understaffed, running Windows XP on desks that quite literally date back to WWII. Sucking at the government teat isn’t as glamorous as it sounds.

The office in question is set up to allow citizens to look up one nonprofit, or ten, or one hundred, for purposes of accountability and investigative journalism. They simply aren’t set up to dump seven million tax returns just because reasons. They probably have no reason at all to cooperate with your project, aside from what’s required by law.

So by all means tweet away, but I don’t think the head of EO pays much attention to Twitter. Shocking that anybody could ignore that fount of wisdom, but hey, bureaucrats, amirite?

1 Like