You can unscramble the hashes of humanity's 5 billion email addresses in ten milliseconds for $0.0069

doctorow · April 9, 2018, 7:37pm

Originally published at: https://boingboing.net/2018/04/09/over-the-rainbow-table.html

…

nodolra · April 9, 2018, 8:13pm

Take this with a large grain of salt.

Pensketch · April 9, 2018, 8:17pm

anon50609448 · April 9, 2018, 8:26pm

Salt is good for passwords, but you can’t use it for the purpose being discussed here.

Part of the idea is that one firm can hash an email address and another firm can hash the same email address and then they can match those two things in the datasets they produce. To get that matching effect you need to be sure that the same email always hashes to the same thing, where the point of salt (as I understand it) is to make the same password for two different people not hash to the same thing.

Since same email => same hash is basically part of the design spec, I think rainbow tables are necessarily possible.

(Ignoring the possibility that someone is cleverer than I am, of course, but I feel like I’ve got the corners covered here)
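
A minimal sketch of that point, assuming the shared identifier is an unsalted SHA-256 of the trimmed, lowercased address. The hash function, the normalization, and the example addresses are all assumptions for illustration; the article doesn't pin them down. The lookup here is a plain dictionary, which is simpler than a real rainbow table but exploits the same "same email => same hash" property.

```python
import hashlib


def email_hash(address: str) -> str:
    """Unsalted hash: the same address always yields the same digest."""
    normalized = address.strip().lower().encode("utf-8")
    return hashlib.sha256(normalized).hexdigest()


# Two firms hash the same address independently and get a matching key.
firm_a = email_hash("alice@example.com")
firm_b = email_hash("Alice@Example.com ")

# Anyone holding a list of candidate addresses can precompute the same hashes.
candidates = ["bob@example.com", "alice@example.com", "carol@example.org"]
lookup = {email_hash(a): a for a in candidates}

# Reversing the "anonymised" key is then a single dictionary lookup.
recovered = lookup.get(firm_a)
print(firm_a == firm_b, recovered)
```

The property that lets the two firms match records is the same one that lets a third party precompute the table.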

heng · April 9, 2018, 8:36pm

You can distribute the salt with the hash and avoid rainbow tables. It’s useless as a database key though.
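
Roughly what that looks like, as a sketch under assumptions: a random 16-byte salt per record, SHA-256, and a made-up address. None of these choices come from the article.

```python
import hashlib
import os
from typing import Optional, Tuple


def salted_hash(address: str, salt: Optional[bytes] = None) -> Tuple[bytes, str]:
    """Hash an address with a per-record salt; returns (salt, hex digest)."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each record
    digest = hashlib.sha256(salt + address.encode("utf-8")).hexdigest()
    return salt, digest


# The same address stored twice gets two unrelated digests...
salt_a, hash_a = salted_hash("alice@example.com")
salt_b, hash_b = salted_hash("alice@example.com")
usable_as_key = hash_a == hash_b  # False except with negligible probability

# ...but anyone who has the salt and a guess can still verify that guess.
_, recheck = salted_hash("alice@example.com", salt_a)
print(usable_as_key, recheck == hash_a)
```

A precomputed table no longer helps, because every record needs its own table, but the digests can't be joined across datasets either.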

anon50609448 · April 9, 2018, 8:41pm

Yeah, that’s the issue. I think the whole point is they want to use it as a key.

roomwithaview · April 9, 2018, 8:42pm

Uncovered corners may be found here.

JonS · April 9, 2018, 9:09pm

So, AIUI, each email address would have to have a different salt applied prior to hashing, then that hash made public … but wouldn’t the specific salt for each email also have to be made public, so other DBs can generate the same hash for that email? Thus, reversing the hash would be feasible?

IOW, AI(probably badly)UI, salting is great for hashing pwds that you never want to share or match with anybody else, but not useful for things that you do want to share and match?

… where “you” in this case refers to the owner of the DB, not the owner of the pwd or email address.

nodolra · April 9, 2018, 9:14pm

This is what I was thinking as well, but now that I think it over, it’d be completely useless. At that point you might as well assign a random ID.

Well, thanks for the good reminder that I’m completely useless at databases.

bolamig · April 9, 2018, 9:19pm

But can amazon unscramble hashes into flowers? Surely would hasten the drone delivery age.

Boundegar · April 9, 2018, 9:28pm

The thing is, companies believe that it is.

Companies don’t have beliefs. Executives have beliefs, and on matters this technical, engineers have beliefs. Are the engineers idiots, or is the article exaggerated?

sockdoll · April 9, 2018, 9:30pm

Yeah, but they insist on exact change - or make you do like 150 Mechanical Turk tasks to cover that much money.

roomwithaview · April 9, 2018, 9:54pm

Not if they don’t want the hash reversed. I.e. keep the salt secret and the hash is still valid for identifying a dataset but functionally only the original vendor could attack the hash via rainbow tables. (It would still be a brute-force attack as per the SO discussion linked above, but they would have the significant advantage of possessing a set of known salts.)
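
One way to picture the secret-salt idea is a keyed hash. The sketch below swaps in HMAC-SHA256 with a single secret key per vendor, which is an assumption and not quite the per-record "set of known salts" described above; the key values and address are hypothetical.

```python
import hashlib
import hmac


def keyed_id(address: str, secret_key: bytes) -> str:
    """Identifier that only holders of secret_key can recompute."""
    normalized = address.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


vendor_key = b"hypothetical-vendor-secret"  # never published
other_key = b"hypothetical-other-secret"    # a different company's key

# Stable inside the vendor's own dataset, so it still works as a key there.
id_first = keyed_id("alice@example.com", vendor_key)
id_again = keyed_id("alice@example.com", vendor_key)

# A party with a different key (or none) gets an unrelated value.
id_other = keyed_id("alice@example.com", other_key)
print(id_first == id_again, id_first == id_other)
```

Under these assumptions, only someone holding the vendor's key can precompute identifiers for guessed addresses, and two companies with different keys cannot match records.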

anon62577920 · April 9, 2018, 9:58pm

This is one of the reasons I use a catchall - anything sent to *@mydomain.com forwards to a single email.

So I have separate emails (comcast@domain.com, bank@domain.com etc) which would produce different hashes.

SUCK IT DATA BROKERS

JonS · April 9, 2018, 10:00pm

If there’s no cross-company matching then I get it; salt=teh_good.

But what if Company A wants to match their data with the data held by Company B, in order to create a richer data set about all users?

roomwithaview · April 9, 2018, 10:02pm

That’s the point, they can’t!

(Which is why nobody will ever actually hash anything properly in a shared dataset. Bastards…)

JonS · April 9, 2018, 10:07pm

So … I do understand it correctly?

Yay?

johnd · April 9, 2018, 10:35pm

This is not news to anyone who deals with “anonymised” health data. Nearly all anonymised health data released for research, etc, is so poorly done that it is a trivial matter to re-identify records. Typically, “we don’t release people’s names” is considered adequate anonymisation. The fact that parts of addresses, DOB, gender, etc are released indicates the lack of sophistication in this area.

bolamig · April 9, 2018, 10:46pm

Important distinction. But we need a new word for ‘have good evidence that the buck can be passed to the successor’. Trumpliefs?

Ghost · April 9, 2018, 10:57pm

I’m not willing to panic just yet…

Rainbow tables need to be big. A lot bigger than indicated in the article actually.

Ok, it may take just a couple of cents to calculate the hashes for 5 billion addresses, but that’s starting from the point where you already know all the addresses in use. I’m guessing: you don’t.
A rainbow table for all email addresses would have to cover every possible combination of letters, digits, and special characters (about 40 possible characters per position in the address), and then hold the hash for each of those combinations.

To illustrate: if someone used up 30 characters to make the not-so-crazy email address "example.name01@something.co.uk" and we want to make sure we have the hash for that address in our rainbow table, our table needs to have all possible combinations of 30-character addresses. That would make our table 40^30 entries big… (and I admit it could be smaller since you can eliminate a lot of combinations because they would be obviously malformed as email addresses, but just computing the hash is faster than checking if it’s a possibly valid address)

Now, I don’t know exactly how big that number is, but I have a gut feeling it’s more than 5 billion…
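
For the record, the arithmetic can be checked in a few lines. The 40-character alphabet and 30-character length are the rough figures from this post, not measured values, and the variable names are just for illustration.

```python
alphabet_size = 40               # rough guess at characters allowed per position
address_length = 30              # length of the example address above
known_addresses = 5_000_000_000  # the "5 billion" figure from the headline

# Every possible string of exactly 30 characters from a 40-character alphabet.
keyspace = alphabet_size ** address_length

print(f"{keyspace:.3e}")            # about 1.153e+48
print(keyspace // known_addresses)  # how many times larger than 5 billion
```

So 40^30 is on the order of 10^48, which is indeed somewhat more than 5 billion. The headline figure only covers hashing a list of addresses that are already known, not enumerating every possible one.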
