Unfortunately, this is one of those situations where "sufficiently advanced stupidity is indistinguishable from malice". It may well be that only some of the pro-"anonymization" parties are actually lying because they want to be able to sell data and know full well that the most dangerous stuff is the most valuable, and that some of them are sincere in their ignorance; but that barely matters once the harm has been done (especially since getting the data scrubbed once it hits the market will be… nontrivial).
Paul Ramsey just did a 5-part series about the privacy implications of the BC government's choice of vendors for storage of sensitive or personal data.
Much of what he discusses revolves around slippery definitions of secure and private, and the fact that governments and vendors create definitions that solve for selection of the vendor.
When "sufficiently advanced stupidity" is accounted for, the Stupid/Evil vortex is instantiated.
I'm a corporate data analyst, and I have this conversation a lot. The bizarre part is that people can accept any amount of conference-presentation bullshit about how revolutionary big data could be for their business, while at the same time refusing to accept that big data could be revolutionary in inconvenient ways that might allow people to be identified and then fucked around with.
Also, one specific complaint about the pro-anonymisation paper: it quotes a competition entry that managed 0.8% re-identification for a specific, limited set of attacks. It says that since the target for the competition was 5%, this result is pants-wettingly amazing. I don't know about you, but when we're talking about a dataset of, say, 15 million hospital patients' records, 0.8% sounds incredibly dangerous, and 5% sounds like a career-ending, possible-criminal-charges-brought, politician-destroying meltdown the likes of which we can only imagine in our darkest nightmares. So even if we concede their entire argument, they're still being laughably optimistic.
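To put those percentages in concrete terms (using the hypothetical 15-million-record dataset from above; the figures are illustrative, not from the paper), a quick back-of-the-envelope calculation:

```python
# Back-of-the-envelope: how many individual people do those
# re-identification rates correspond to in a hypothetical dataset
# of 15 million hospital patient records?
records = 15_000_000

achieved_rate = 0.008  # the 0.8% managed by the competition entry
target_rate = 0.05     # the 5% target the competition set

print(round(records * achieved_rate))  # 120000 people re-identified
print(round(records * target_rate))    # 750000 people re-identified
```

Even the "amazing" result exposes six figures' worth of patients, which is the commenter's point: a small percentage of a big dataset is still a lot of people.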
I have a feeling that the solution to "big data" is on the symptom side rather than looking for a cure. You can cure allergies by killing off your entire immune system… but most people just opt to mask the symptoms.
The problem is that big data has some real value, and not just value in the "more money plz" sense of the word. If you could open up all medical records in the country to all researchers, you can safely assume that we would almost instantly make incredible discoveries. In the course of making those discoveries, you could almost certainly warn people of conditions that your data suggests that they have. We of course won't do this for the obvious reason that insurance companies would bend everyone over and fuck them.
Personally, I think that the approach is two-part. First, do the obvious and anonymize. It won't stop a determined attack, but like a shitty bike lock, it helps keep honest people honest and makes it so that you only have to deal with determined attackers. The second bit is to look for where deanonymized information can be used, and ban its use. This is attacking the symptoms. We don't mind the data being used, we just don't want it put to ends that are going to hurt us.
Outside of that, I am not sure what else you can do. The promise of sifting through that data for gems is far too great to flat-out ban it, and you should be skeptical of people who claim that there is a way to magically make this data's anonymity bulletproof. Either path you take, you are going to do harm. I think the best we can do is try and thread the difference, do the common-sense things, and fully accept that our defense against using big data for evil will never be 100%.
One thing which isn't mentioned is that positive views of de-identification assume that the threat is the identification of some specific victim. But many crimes involve a victim of opportunity. For example, if a burglar can obtain a list of people who are on holiday, he doesn't care if only 1% can be re-identified as long as he can find some whose houses can be safely burgled.
Big data is the information age's equivalent of the nuclear weapon.
It could have broad, sweeping powers to marshal many changes, but ultimately, it's going to create an ugly legacy that few people will want to touch, unless they're also looking to assert their own power on the global stage.
As soon as it's unleashed, the good and bad will start flowing.
You can't make good technical regulations by ignoring technical experts, even if the thing those technical experts are telling you is that your cherished plans are impossible.
Can we get that on a billboard, please?
Well, who are you going to believe - some computer nerds, or this big fat check I'm handing you?
Totally agree with the author; in fact, I blogged about this a few days ago on the same concept, regarding how big data can be used maliciously: Influence of data in enterprises
We cannot "hope for the best" when it comes to the way data is handled.
Big Data could end up being a red herring for "Big Knowledge", since data remains bits and bytes until you can transform it into knowledge, and I cannot see the value in an organisation storing useless data without an identifiable entity relation. Encrypted or not.
Though not overtly stated, medical records are implied here. Just wanted to toss in my $0.02 that PHI (personal health information) has a standard of deidentification that renders it immune to Narayanan and Felten's reidentification methods: location, for starters. Here is the relevant HHS standard in that case for removing patient identifiers:
(B) All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP code, and their equivalent geocodes, except for the initial three digits of the ZIP code if, according to the current publicly available data from the Bureau of the Census:
(1) The geographic unit formed by combining all ZIP codes with the same three initial digits contains more than 20,000 people; and
(2) The initial three digits of a ZIP code for all such geographic units containing 20,000 or fewer people is changed to 000
The larger PHI de-identification standard is here.
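The ZIP-code portion of that Safe Harbor rule can be sketched roughly as follows. (The prefix set below is a placeholder I made up for illustration; an actual implementation would derive the list of restricted three-digit prefixes from current Census Bureau population data, as the quoted standard requires.)

```python
# Sketch of the HHS Safe Harbor ZIP code rule quoted above: keep only
# the first three digits of a ZIP code, and replace even those with
# "000" when the combined population of all ZIP codes sharing that
# prefix is 20,000 people or fewer.
#
# NOTE: these prefixes are hypothetical placeholder values, not the
# real Census-derived list.
LOW_POPULATION_PREFIXES = {"036", "059", "102"}

def deidentify_zip(zip_code: str) -> str:
    """Truncate a 5-digit ZIP code per the Safe Harbor geographic rule."""
    prefix = zip_code[:3]
    if prefix in LOW_POPULATION_PREFIXES:
        return "000"  # prefix area has <= 20,000 people: fully suppress
    return prefix     # prefix area has > 20,000 people: keep 3 digits

print(deidentify_zip("90210"))  # -> "902"
print(deidentify_zip("03601"))  # -> "000" (restricted prefix in this sketch)
```

The same pattern (generalize, then suppress when the generalized group is still too small) is the basic move behind k-anonymity-style de-identification more broadly.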
Still, I think it would be naive to think that reidentification were "impossible" given enough data points. But is an absolute guarantee what we need, or is this a risk-analysis cost/benefit type situation? The number of lives lost because PHI isn't easily shared clinically or for research purposes is staggering. This is public record (i.e., "To Err is Human", 1999).
This is a societal problem, and issues around privacy hinder the flow of clinical and research-based health information. This will get worse when genomics information becomes readily available. Though healthcare is regulated to a far greater degree than any other sector and our data privacy standards are higher, the fear over privacy is palpable to people who otherwise have no beef (other than bitching; actions, not words) with the Facebooks, the Googles, and the Visas of the world.
So it's a hurdle. And it's a hurdle that impedes medical progress. Balancing privacy and the need for information is something society needs to address. But in HC the fear is so overblown and the regulations so onerous that I see us coming very close to an "opt-in" scenario re patient data. Right now this is technically the case, but it isn't the case in practice (patients aren't educated enough to be stewards of their own PHI). This is even more likely in the case of "personalized medicine", which I see as both the advent of personal health records (PHR) but mostly the coming of genomics. In theory patients control their own data, but in practice somebody else does (possession being 9/10 of the law). With genomics data it's the opposite in terms of possession: the patient both owns and possesses the data and submits that data for the benefit of their own diagnostic outcomes at their own discretion.
So I see real "opt-in" (not HIPAA "opt-in") becoming a force in HC, with the tipping point occurring (patients willingly submitting personal information, including genomics data) when the research shows far greater outcomes in the cases of more precise information. While encouraging the free flow of information would get us there quicker, from an insider's perspective, I don't see that as likely.