The Australian health authority believed it had "anonymised" a dataset of patient histories, but academics were easily able to unscramble it

Why assume?

Although, I guess the better question is, why take random swipes at unrelated disciplines? Tired of those cocky art historians swaggering around claiming to be L33t h@ck0rz?


They can then be subject to empirical testing, scientific analysis, and open public review

Well, welcome to my life.

What’s refreshing is that at least the government pulled the dataset, rather than simply arresting the hackers.


Estonia’s e-republic is trying to solve this kind of problem in interesting ways, both technical and non-technical. Using the blockchain, they make such databases distributed, decentralised and heavily encrypted, which makes it harder to unscramble and link datasets as described here. And while they encourage open sharing of data among those for whom it would be useful (including medical and financial info), they also have a core policy that data about an individual belongs to that individual, not to whichever institution collected it.

It’s an experiment, but a fascinating alternative to the standard thinking about personal data among Western governments, corporations and privacy advocates.

They hid it in the HTML comments of the website?


@doctorow, did you see The Register link I submitted some time ago?

Talking about the easier route…

This is an interesting story because the researchers found vulnerabilities in the encryption, rather than re-identifying by using metadata, which is generally considered the easiest route to re-identification.

Wait, easier? Erm. They put data online that is even “easier” to re-identify?
Oh, fuck, they do:

And don’t forget how the Zardoz government reacted last time…

But I think I see why you chose not to link the SQL query re-identification piece. As the researcher said:

“The idea that the government can make open all the data about people is just wrong.”

May I assume you didn’t want to discuss this stance? :wink:

I have experience in open data, and I find stories like this mind-boggling.

Don’t encrypt names; remove the field entirely. If it seems important to have an identifier field, replace the names with numbers: not numbers you derived from the names, just number the first person on the list 1, then 2, then 3. You can’t break that “encryption”. This isn’t a password that you are going to have to check again later. There is no reason to want to go from the open-data identifier back to the field value, and there should be no way to do it.
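To make the difference concrete, here's a minimal Python sketch. It assumes a hypothetical scheme in which IDs are *derived* from names via an unsalted SHA-256 hash (not necessarily what the health department actually did). Anyone holding a list of candidate names can hash them and match the results, whereas sequential numbers carry no information about the name at all. The records, names and helper functions (`derived_id`, `sequential_ids`, `reverse_derived_ids`) are all made up for illustration.

```python
import hashlib


def derived_id(name: str) -> str:
    # Hypothetical "derived" ID: an unsalted SHA-256 of the name, truncated.
    return hashlib.sha256(name.encode("utf-8")).hexdigest()[:12]


def sequential_ids(records):
    # ID is just the row's position (1, 2, 3, ...); the name field is dropped
    # entirely, so nothing in the output can be mapped back to a name.
    return [
        {"id": i, **{k: v for k, v in r.items() if k != "name"}}
        for i, r in enumerate(records, start=1)
    ]


def reverse_derived_ids(published_ids, candidate_names):
    # Dictionary attack: hash every plausible name and look for matches.
    lookup = {derived_id(n): n for n in candidate_names}
    return {pid: lookup[pid] for pid in published_ids if pid in lookup}


records = [
    {"name": "Alice Smith", "condition": "asthma"},
    {"name": "Bob Jones", "condition": "diabetes"},
]

hashed = [{"id": derived_id(r["name"]), "condition": r["condition"]} for r in records]
numbered = sequential_ids(records)

# Hypothetical names an attacker might try (e.g. scraped from a phone book).
candidates = ["Carol White", "Bob Jones", "Alice Smith"]

print(sorted(reverse_derived_ids([r["id"] for r in hashed], candidates).values()))
# ['Alice Smith', 'Bob Jones']
print(numbered)
# [{'id': 1, 'condition': 'asthma'}, {'id': 2, 'condition': 'diabetes'}]
```

The hashed IDs fall to a simple lookup; the sequential ones give an attacker nothing to work with, because there is no function from name to number to run forwards.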

Deidentification is still hard because of the “linkage attacks” mentioned in the original post. Cryptography shouldn’t even come into it.
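For anyone who hasn't seen one, a linkage attack needs no cryptography at all. Here's a rough Python sketch, assuming a hypothetical released table with names stripped but quasi-identifiers (birth year, postcode, sex) left in, plus a hypothetical public list that does carry names. It joins the two on those fields and flags any released row that matches exactly one named person. All data and field names are invented; this is not the researchers' actual method.

```python
from collections import defaultdict

# Hypothetical "de-identified" release: no names, but quasi-identifiers remain.
released = [
    {"id": 1, "birth_year": 1971, "postcode": "3052", "sex": "F", "diagnosis": "asthma"},
    {"id": 2, "birth_year": 1985, "postcode": "2000", "sex": "M", "diagnosis": "diabetes"},
    {"id": 3, "birth_year": 1985, "postcode": "2000", "sex": "M", "diagnosis": "migraine"},
]

# Hypothetical public auxiliary data that does include names.
public = [
    {"name": "Alice Smith", "birth_year": 1971, "postcode": "3052", "sex": "F"},
    {"name": "Bob Jones", "birth_year": 1985, "postcode": "2000", "sex": "M"},
    {"name": "Dan Brown", "birth_year": 1985, "postcode": "2000", "sex": "M"},
]

QUASI = ("birth_year", "postcode", "sex")


def key(row):
    # The combination of quasi-identifiers used as the join key.
    return tuple(row[q] for q in QUASI)


def link(released, public):
    """Return {released id: name} for rows matching exactly one named person."""
    names_by_key = defaultdict(list)
    for person in public:
        names_by_key[key(person)].append(person["name"])
    matches = {}
    for row in released:
        names = names_by_key.get(key(row), [])
        if len(names) == 1:  # unique match: this row is re-identified
            matches[row["id"]] = names[0]
    return matches


print(link(released, public))  # {1: 'Alice Smith'}
```

Rows 2 and 3 escape only because two named people share their combination of fields; that is the intuition behind k-anonymity. Removing or encrypting the name column does nothing to stop this.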


Right? The image in the post is only relevant to the article’s content insofar as they both involve computers. I find it a puzzling choice.


Reading through, I assumed they needed to keep the names there for some reason.

If not, GEEZ, you’re exactly right. Why encrypt them when you could just remove them?
