Excellent video about why the SHA 256 hash algorithm is so cool and useful

Originally published at: https://boingboing.net/2019/07/15/excellent-video-about-why-the.html


“It’s a one-way algorithm, which means there’s no known way to practically retrieve the input from the output.”

I think you need to modify this statement. There will be an infinite number of different inputs that give the same output, so reversing the algorithm is not only hard but impossible.

What would matter for uses like Bitcoin is an efficient way of calculating any input that gives the correct hash signature. While this would break Bitcoin, many other uses would still be safe. If you have an image you want to fingerprint, even if someone manages to find another bit sequence that gives exactly the same hash, that bit sequence will just be gibberish if looked at as a picture.
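To make the fingerprinting idea concrete, here is a minimal Python sketch using the standard-library `hashlib` module (the `fingerprint` helper and the sample strings are made up for illustration). It hashes two inputs that differ by a single character; the two digests look unrelated, and typically about half of the 256 output bits differ.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


original = b"Hello, world!"
tweaked = b"Hello, world."  # differs from `original` in a single byte

print(fingerprint(original))
print(fingerprint(tweaked))

# XOR the two digests as integers and count the 1 bits to see
# how many of the 256 output bits changed.
a = int(fingerprint(original), 16)
b = int(fingerprint(tweaked), 16)
print("bits that differ:", bin(a ^ b).count("1"))
```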


Um. SHA-2 256 or SHA-3 256? (I guess it doesn’t really matter for the purpose of the video.)


The SHA-2 one. SHA-256 is the official name of the 256-bit variant of SHA-2. This terminology predates SHA-3, and SHA-1 is fixed at 160 bits, so at the time there was no ambiguity. When SHA-3 was standardized, its variants were named SHA3-(size) to avoid a conflict.
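Python’s standard `hashlib` module exposes both families under separate names, which makes the distinction easy to see. In this minimal sketch, `hashlib.sha256` is the SHA-2 variant and `hashlib.sha3_256` is SHA3-256; both produce 256-bit digests, but they are different algorithms and give different results for the same input.

```python
import hashlib

message = b"abc"

sha2_digest = hashlib.sha256(message).hexdigest()    # SHA-256, SHA-2 family (FIPS 180-4)
sha3_digest = hashlib.sha3_256(message).hexdigest()  # SHA3-256, SHA-3 family (FIPS 202)

# Each hex digit encodes 4 bits, so both digests are 256 bits long.
print(len(sha2_digest) * 4, len(sha3_digest) * 4)  # 256 256

# Same size, different algorithms: the digests do not match.
print(sha2_digest)
print(sha3_digest)
```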

Reference: the NIST hash function project page, and the linked standards documents FIPS 180-4 (SHA-2) and FIPS 202 (SHA-3).


(Video OP here) You’re right, of course. I will clarify that in an extended “follow-up” video I’m planning for next week. Thanks for your input, I appreciate it. (Subscribe to my channel if you want to see the update when it comes out.)


Thanks. Yes, you’re correct. I’ll clarify these names in a follow-up video I’m doing, to explain the different variants better. I appreciate your concise description of the differences, and might quote you, if that’s okay?

Welcome, Matthew, and thanks for the video! Feel free to quote me in your follow-up. BTW, I do have one gripe about the video: around 1:30, when you talk about collision resistance, your phrasing seems to imply that there aren’t any collisions. There are lots; they’re just incredibly hard to find among all the non-collisions.
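The point that collisions exist but are hard to find can be demonstrated by shrinking the output. This sketch (the `truncated_sha256` and `find_collision` helpers are hypothetical, written just for this illustration) keeps only the top 24 bits of SHA-256 and hashes the strings "0", "1", "2", and so on until two of them share a truncated digest. By the birthday bound, that takes on the order of 2^12 tries for 24 bits; the same approach against the full 256-bit output would need on the order of 2^128 hashes, which is far beyond anything practical.

```python
import hashlib
from itertools import count


def truncated_sha256(data: bytes, bits: int) -> int:
    """Return only the top `bits` bits of SHA-256(data) as an integer."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)


def find_collision(bits: int) -> tuple[bytes, bytes]:
    """Hash distinct inputs until two of them share the same truncated digest."""
    seen: dict[int, bytes] = {}
    for i in count():
        msg = str(i).encode()  # "0", "1", "2", ...: every input is distinct
        h = truncated_sha256(msg, bits)
        if h in seen:
            return seen[h], msg
        seen[h] = msg


a, b = find_collision(24)
print(a, b, hex(truncated_sha256(a, 24)))
```

Raising `bits` by 2 roughly doubles the expected work, which is the whole reason a 256-bit output puts collisions out of reach.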


Here’s a draft of what I’m planning to include in my follow-up video:

In the original video, I said that “the only time you ever end up with the same hash, or the same fingerprint, is if you started with the exactly identical input,” and while that’s true in the practical sense, it’s not technically, mathematically accurate. In reality, for each possible hash output, there is a finite set of inputs that will produce that specific output.

However, since no one currently knows how to determine anything about the set of inputs for a specific hash output, and no one has ever found two different files that produce the same output, for now we can assume that if you see identical hash outputs, they almost certainly came from identical inputs.

And I’ll show an example. The reason I say “finite set of inputs” is that I just learned that SHA-256 specifies a maximum input size (2048 pebibytes minus one bit, or about 2305.8 petabytes; see the FIPS 180-3 specification). In addition, SHA-256 has never been proven to be mathematically surjective (you don’t want it to be), so we don’t know whether any of those input sets are empty. (In other words, we don’t know whether there are any outputs for which there is no possible input.)
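For reference, here is the arithmetic behind the draft’s figures, assuming the FIPS 180 rule that a SHA-256 message must be shorter than 2^64 bits. The last line is only an average over all outputs and says nothing about any particular output, which is consistent with the point about surjectivity.

```latex
\begin{aligned}
\ell_{\max} &= 2^{64} - 1\ \text{bits} \\
2^{64}\ \text{bits} &= 2^{61}\ \text{bytes} = 2^{11}\cdot 2^{50}\ \text{bytes} = 2048\ \text{PiB} \\
2^{61}\ \text{bytes} &= 2\,305\,843\,009\,213\,693\,952\ \text{bytes} \approx 2305.8\ \text{PB} \\
N_{\text{in}} &= \sum_{\ell=0}^{2^{64}-1} 2^{\ell} = 2^{2^{64}} - 1 \quad \text{(finite)} \\
N_{\text{out}} &= 2^{256} \\
\frac{N_{\text{in}}}{N_{\text{out}}} &\approx 2^{2^{64}-256}
\end{aligned}
```

Because N_in vastly exceeds N_out, the pigeonhole principle guarantees that at least some outputs have more than one input.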

Does that sound accurate to you? Thanks for your input.

Years back, I was trying to “time-stamp” my zipped collection of notes for a project, using something like https://stampd.io/, but I couldn’t, because another file with the same SHA-256 hash had already been recorded on the Bitcoin blockchain a few years earlier.

I now remember it was https://proofofexistence.com/ where I did it. I found the file, but it should be sitting at Proof of Existence.

It’s possible OneDrive mucked with the file contents (watermarking it?) and changed the resulting SHA-256 hash. I regret not writing everything down before I got distracted back then. I had no idea that collisions were not supposed to happen.

Of course, even if I could reproduce the same record, we wouldn’t have much luck tracking down the owner of the other file that supposedly matched the same hash. 🙂

Interesting. I had never heard of the Proof of Existence project.

I believe you that it said there was another file with the same hash. But the odds against that really happening are so huge that I wonder if something else was going on. Maybe a software error on their part (like hashing an empty file, or the wrong file)? Or maybe you uploaded the same file a few years before and forgot?

Who knows? Even if there really was an accidental hash collision like this, I think the algorithm is still secure as long as no one knows how to do it intentionally.
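How small are those odds? Assuming SHA-256 behaves like a random function (an idealization, not a proven property), the standard birthday-bound approximation gives the chance of any collision among n hashes as roughly 1 - e^(-n^2 / 2^257). A minimal sketch; `collision_probability` is a hypothetical helper, not part of any library.

```python
import math


def collision_probability(num_hashes: float, output_bits: int = 256) -> float:
    """Birthday-bound approximation: P(any collision) ~ 1 - exp(-n^2 / 2^(b+1)).

    Assumes the hash behaves like a uniformly random function.
    """
    exponent = -(num_hashes ** 2) / 2 ** (output_bits + 1)
    # -expm1(x) computes 1 - e^x accurately even when x is extremely small,
    # where 1 - math.exp(x) would round to 0.0.
    return -math.expm1(exponent)


print(collision_probability(2 ** 64))
```

For example, even after 2^64 hashes (about 18 quintillion, an arbitrary figure), the estimate comes out around 1.5 × 10^-39.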

Oh, I was under the impression from your description that it was almost mathematically impossible to have a hash collision. Like having to land on the same beach on all the Earth-like planets in the observable part of our galaxy or something.

But since I was not able to reproduce the same result, something went wrong somewhere in the chain. With large files, we are supposed to compute the SHA-256 hash ourselves and paste it in before starting the notarization process. So either the ZIP file contents were watermarked by Microsoft (shaking fist) when the file was uploaded to OneDrive, changing the resulting hash (I wouldn’t know unless I found another local copy of the same file), or there was indeed a bug at Proof of Existence.
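For large files, the local hashing step can be done without loading the whole file into memory by feeding it to the hash in chunks. A minimal Python sketch; `sha256_of_file` and the 1 MiB chunk size are illustrative assumptions. Re-hashing a surviving local copy this way and comparing the result with the digest originally submitted would show whether the file had changed in the meantime.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so it never has to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Usage would look like `print(sha256_of_file("notes.zip"))`, where the file name is hypothetical. On Python 3.11 and later, `hashlib.file_digest(f, "sha256")` does the same chunked reading for an open binary file.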

I know I hadn’t uploaded it before that date, because I created the ZIP file that day, just before notarizing, and I remember making a small tweak to the file with the intention of trying again (producing another 500 MB ZIP file). Back then, the BTC amount they asked for was equivalent to $7, so I quickly got distracted and didn’t follow through. The price has since come down, as they’ve adjusted the BTC ask price.
