Saturday, February 19, 2005

more on sha-1 collisions

Bruce Schneier has more thoughts on the recent discovery of SHA-1 hash collisions that put the situation into proper perspective. As Jon Callas, CTO of PGP, says: "This is not a run for the exits, the place is on fire kind of situation... it's the fire alarm is on, this is not a drill, please move to the exits."

There is a lot of hand-waving going on about this discovery, so I took a deeper look at the math to find out what is really going on. SHA-1 generates a 160-bit hash from any message, from the letter 'a' to the entire Library of Congress. Since there are no restrictions on message length, there are an infinite number of messages... and unfortunately this means there are also an infinite number of messages that map to the same hash. Even though hashes like SHA-1 are used to validate the authenticity of messages, the situation isn't as bad as it seems because the vast majority of messages that map to the same hash will be garbage, or at least contextually unrelated to the 'real' message.

But what if someone found a way to generate a message that had the same hash as another? Suppose the real message is "IOU $100", and you use SHA-1 to digitally sign it... and then someone finds that "IOU $100000" maps to the same hash, and tacks your digital signature on to the end of the new fake message. The trust boundary has been broken, and every message becomes suspect.

Currently the situation isn't as bad as that... the recent discovery does not provide a mechanism for finding a new message that has the same hash as an existing message. It does show how to create two messages that have the same hash, but the term 'message' is used loosely here since the methods involved don't maintain contextual coherence between the two messages - there isn't an easy way to make the two messages both adhere to any specific syntax like the English language or bank transaction protocols.

Since there are 2160 possible hashes (~1.46*1048) in SHA-1, a brute-force attempt to find two messages that map to the same hash should take on average 280 (~1.20*1024) attempts before succeeding. The recent discovery shows a way to reduce the number of attempts to 269 (~5.90*1018), or 2048 times fewer calculations. Reducing such large numbers by a factor of 2048 might seem insignificant, but suppose the new method allows someone to generate a collision in one day instead of 2048 days... there are a few computer systems in existence right now that can do this in ~2.5 days, versus 14 years doing it the old way.

The problem of creating a new message whose hash collides with another specific message is still way beyond the computational power that exists today... there are still infinities in the equation, and you can divide as much as you want out of infinity without gaining a thing. On the other hand, like the lottery, you could hit it on the first try. The odds are so astronomically against it that no one would bother trying - unless, like during WWII, there was enough on the line to justify the effort. Crypto buffs take for granted the idea that right now, given enough incentive, the government could build a system that could crack problems like this in hours... and that in a few years, computers will be so much more powerful that it wouldn't even require the resources of a country to do this. The trick is to make the problems more complex faster than computers make the solutions more accessible.