Originally published at: https://boingboing.net/2018/04/09/over-the-rainbow-table.html
…
Take this with a large grain of salt.
Salt is good for passwords, but you can’t use it for the purpose being discussed here.
Part of the idea is that one firm can hash an email address, another firm can hash the same email address, and then they can match those two records across the datasets they produce. To get that matching effect you need to be sure that the same email always hashes to the same thing, whereas the point of salt (as I understand it) is to make the same password for two different people not hash to the same thing.
Since same email => same hash is basically part of the design spec, I think rainbow tables are necessarily possible.
(Ignoring the possibility that someone is cleverer than I am, of course, but I feel like I’ve got the corners covered here)
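For anyone who wants to see the matching argument concretely, here is a rough Python sketch. It assumes SHA-256 and simple lowercase/trim normalisation, neither of which is confirmed by the article, and the `hash_email` helper and the example addresses are made up for illustration.

```python
import hashlib
import os

def hash_email(email: str, salt: bytes = b"") -> str:
    """SHA-256 of salt + normalised email (hash choice and normalisation are assumptions)."""
    normalized = email.strip().lower().encode("utf-8")
    return hashlib.sha256(salt + normalized).hexdigest()

email = "someone@example.com"  # hypothetical address

# Unsalted: two firms hashing independently get the same value, so records match.
firm_a = hash_email(email)
firm_b = hash_email(" Someone@Example.com ")
print(firm_a == firm_b)

# A random salt per firm: the same address no longer matches across datasets.
salted_a = hash_email(email, os.urandom(16))
salted_b = hash_email(email, os.urandom(16))
print(salted_a == salted_b)

# Because the unsalted mapping is fixed, anyone holding a list of candidate
# addresses can precompute a hash -> address lookup table.
candidates = ["someone@example.com", "other@example.com"]
lookup = {hash_email(c): c for c in candidates}
print(lookup.get(firm_a))
```

The same property that makes the hash usable as a join key is what makes the precomputed lookup work.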
You can distribute the salt with the hash and avoid rainbow tables. It’s useless as a database key though.
Yeah, that’s the issue. I think the whole point is they want to use it as a key.
Uncovered corners may be found here.
So, AIUI, each email address would have to have a different salt applied prior to hashing, then that hash made public … but wouldn’t the specific salt for each email also have to be made public, so other DBs can generate the same hash for that email? Thus, reversing the hash would be feasible?
IOW, AI(probably badly)UI, salting is great for hashing pwds that you never want to share or match with anybody else, but not useful for things that you do want to share and match?
… where “you” in this case refers to the owner of the DB, not the owner of the pwd or email address.
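A small Python sketch of the "salt published next to the hash" case, for what it's worth. It assumes SHA-256, and the `salted_hash` and `guess` helpers and the addresses are invented for illustration. A public salt makes a precomputed table useless, but anyone can still test guesses against one record at a time.

```python
import hashlib
import os

def salted_hash(email: str, salt: bytes) -> str:
    """SHA-256 over salt + email (the hash choice is an assumption)."""
    return hashlib.sha256(salt + email.encode("utf-8")).hexdigest()

# A published record: the per-email salt sits right next to the hash.
salt = os.urandom(16)
record = (salt, salted_hash("alice@example.com", salt))

def guess(record, candidates):
    """Try each candidate address with the record's own public salt."""
    record_salt, target = record
    for candidate in candidates:
        if salted_hash(candidate, record_salt) == target:
            return candidate
    return None

# The work has to be redone for every record, since each salt differs,
# but a correct guess is still confirmed immediately.
recovered = guess(record, ["bob@example.com", "alice@example.com"])
print(recovered)
```

Note that a second database cannot look this record up by email without first fetching the salt, which is why it stops working as a shared key.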
This is what I was thinking as well, but now that I think it over, it’d be completely useless. At that point you might as well assign a random ID.
Well, thanks for the good reminder that I’m completely useless at databases.
But can amazon unscramble hashes into flowers? Surely would hasten the drone delivery age.
The thing is, companies believe that it is.
Companies don’t have beliefs. Executives have beliefs, and on matters this technical, engineers have beliefs. Are the engineers idiots, or is the article exaggerated?
Yeah, but they insist on exact change - or make you do like 150 Mechanical Turk tasks to cover that much money.
Not if they don’t want the hash reversed. I.e. keep the salt secret and the hash is still valid for identifying a dataset but functionally only the original vendor could attack the hash via rainbow tables. (It would still be a brute-force attack as per the SO discussion linked above, but they would have the significant advantage of possessing a set of known salts.)
This is one of the reasons I use a catchall - anything sent to *@mydomain.com forwards to a single email.
So I have separate emails (comcast@domain.com, bank@domain.com etc) which would produce different hashes.
SUCK IT DATA BROKERS
If there’s no cross-company matching then I get it; salt=teh_good.
But what if Company A wants to match their data with the data held by Company B, in order to create a richer data set about all users?
That’s the point, they can’t!
(Which is why nobody will ever actually hash anything properly in a shared dataset. Bastards…)
So … I do understand it correctly?
Yay?
This is not news to anyone who deals with “anonymised” health data. Nearly all anonymised health data released for research, etc, is so poorly done that it is a trivial matter to re-identify records. Typically, “we don’t release people’s names” is considered adequate anonymisation. The fact that parts of addresses, DOB, gender, etc are released indicates the lack of sophistication in this area.
Important distinction. But we need a new word for ‘have good evidence that the buck can be passed to the successor’. Trumpliefs?
I’m not willing to panic just yet…
Rainbow tables need to be big. A lot bigger than indicated in the article actually.
Ok, it may take just a couple of cents to calculate the hashes for 5 billion addresses, but that’s starting from the point where you already know all the addresses in use. I’m guessing: you don’t.
A rainbow table for all email addresses would have to cover every possible combination of letters, digits and special characters (about 40 possible characters per position in the address) and then calculate the hash for each of those combinations.
To illustrate: if someone used up 30 characters to make the not-so-crazy email address "example.name01@something.co.uk" and we want to make sure we have the hash for that address in our rainbow table, our table needs to have all possible combinations of 30-character addresses. That would make our table 40^30 entries big… (and I admit it could be smaller, since you can eliminate a lot of combinations that are obviously malformed as email addresses, but just computing the hash is faster than checking whether it’s a possibly valid address)
Now, I don’t know exactly how big that number is, but I have a gut feeling it’s more than 5 billion…
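For the curious, the number is easy to work out in Python. This is just the arithmetic from the estimate above, taking the 40-character alphabet and the 30-character length as given; the variable names are made up.

```python
ALPHABET_SIZE = 40               # rough guess at characters allowed per position
ADDRESS_LENGTH = 30              # length of the example address
KNOWN_ADDRESSES = 5_000_000_000  # the 5 billion figure mentioned above

# Every string of exactly 30 characters over a 40-character alphabet.
exact_length = ALPHABET_SIZE ** ADDRESS_LENGTH
print(f"{exact_length:.3e}")  # 1.153e+48

# Covering every length from 1 to 30 adds only a little on top of that.
up_to_length = sum(ALPHABET_SIZE ** n for n in range(1, ADDRESS_LENGTH + 1))

# How many times larger the exhaustive table is than a 5-billion-entry list.
ratio = exact_length // KNOWN_ADDRESSES
print(f"{ratio:.2e}")
```

So an exhaustive table is out of reach by dozens of orders of magnitude; the 5 billion figure only makes sense for a list of addresses somebody already has.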