Wednesday, August 18, 2004

MD5 collisions discovered

On the cryptography front, Chinese researchers Xiaoyun Wang, Dengguo Feng, Xuejia Lai, and Hongbo Yu announced this week their discovery [57KB PDF] of a relatively simple method for generating MD5 collisions, demonstrating such collisions for the MD5, MD4, HAVAL-128, and RIPEMD-128 hashing algorithms. While they have not as yet released the details of their implementation, there is little doubt that they've discovered a serious flaw in MD5 given that brute force attempts like MD5CRK have been working on this problem for some time without results (leaving the $10000CAD CertainKey prize unclaimed), while their paper shows two pairs of unique collisions for each of the 4 related algorithms. On hearing the news, MD5CRK announced that their project (which had been running for about 6 months and had reached 3,824 Million MD5 transforms per second) would be shut down within 48 hours: thank you, drive through.

The MD5 hash (see RFC1321 for implementation details) is a one-way function that is used (in conjunction with public-key cryptosystems such as RSA) to establish a secure connection almost every time you purchase something online or log into your bank's webpage. Theoretically its job is to generate a unique identifier for any piece of data - for example, your password won't be sent in cleartext over a secure connection; the hash of your password (combined with other unique session data) will be sent instead. Anyone snooping the line in between would be able to see the hash but would not be able to reverse engineer the password from it. On the receiving side, the server generates the hash as well, and the two hashes are compared... if they match, you just managed to transfer information without any of the sensitive bits ever getting into the clear.

An MD5 collision occurs when two different pieces of information generate the same hash. The discovery of the existence of these collisions doesn't mean your 'secure' banking is no longer secure - but it does mean that patterns can now be found in the relationships between data and its hash, and cryptanalysis is just tools for exploiting patterns in encoded data to reveal the plaintext source. The paper describing the collisions only shows that a collision can occur... it's another issue entirely to leap from there to being able to create a specific message whose hash collides with another.