Ayase/MD5 Collisions

From Bibliotheca Anonoma
Revision as of 22:15, 4 September 2019 by Baystdev (talk | contribs) (New page on the problem of MD5 collisions, mitigations etc.)

MD5 Vulnerabilities

Two forms of MD5 collision exploit have been discovered so far: a fixed-prefix and an unfixed-prefix collision mechanism.

  • The "unfixed-prefix" style of exploit (what the cryptographic literature calls an identical-prefix collision) inserts "collision blocks" within otherwise identical files to generate MD5 collisions. Files generated this way have been demonstrated to pass through 4chan's post-processing steps unaltered. 4chan and its archives are vulnerable to at least GIF MD5 collisions, and probably to exploits crafted for other file formats as well. For more information on unfixed-prefix collisions, see [hashclash], [this exploitation of hashclash], and [this article] on MD5 collisions in image formats.
  • The "fixed-prefix" exploit (what the cryptographic literature calls a chosen-prefix collision) allows "collision blocks" to be appended to an arbitrary pair of chosen files until they share the same MD5. More information on this style of collision can be found here: [1]. This style of exploit can be countered on the backend by removing any bytes past the media file trailer (a pattern signifying the end of the file), and 4chan's post-processing on media upload appears to do this.
  • The existing types of MD5 collision exploit are not known to pose a major risk to either the main site or its archives, because they can only be performed intentionally by an "attacker" who must generate and post both pieces of media. However, they do introduce a quirk that has a minor impact on the integrity of the archive:
    • Hiding images from archives: a user can post two MD5-colliding images to the same board with a delay, and Asagi-based archives will never archive the second image. The cause is Asagi's MD5-based deduplication mechanism, which skips downloading an image if its MD5 is already present in the database (the 4chan API has an md5 field). This exploit is somewhat concerning because it prevents 100% fidelity at the post level: [the wrong image will be linked in the archive].
  • Neither of the above attacks is a pre-image attack; a weaponizable pre-image attack would pose a much larger risk to the main site and archives. If semi-arbitrary images could be generated with the same MD5 as an arbitrary image posted by another user, automod or mod systems relying on media hashes could be gamed to ban non-offending users or media. For example, one user could post an image, and a second user could then generate an image with illegal or ban-worthy content that shares the first image's MD5. The first user or media file could end up banned from the main site and/or archives, along with the offending user or media. Note that this scenario is completely theoretical: even if a pre-image attack existed, it would also need to be very flexible to be weaponized in this way.
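
The two backend behaviours described above (removing bytes past the file trailer, and MD5-keyed deduplication) can be illustrated with a small Python sketch. This is not 4chan's or Asagi's actual code: the function names `strip_after_png_trailer` and `archive_posts`, the simplified post records, and the placeholder MD5 string are all assumptions made for illustration. PNG is used for the trailer example only because its chunk layout is simple to walk, and the collision is simulated by giving two posts the same md5 value rather than by generating real colliding files.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(chunk_type: bytes, payload: bytes = b"") -> bytes:
    """Build one PNG chunk: 4-byte length, 4-byte type, payload, CRC-32."""
    crc = zlib.crc32(chunk_type + payload)
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)


def strip_after_png_trailer(data: bytes) -> bytes:
    """Drop any bytes that follow the IEND chunk (the PNG trailer).

    Data that is not a PNG, or has no complete IEND chunk, is returned unchanged.
    CRCs are not verified; this only walks the chunk structure.
    """
    if not data.startswith(PNG_SIGNATURE):
        return data
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        end = pos + 12 + length  # length field + type + payload + CRC
        if end > len(data):
            break  # truncated chunk, give up
        if chunk_type == b"IEND":
            return data[:end]
        pos = end
    return data


def archive_posts(posts):
    """Hypothetical MD5-keyed deduplication, first file wins.

    `posts` is a list of dicts with 'no' (post number), 'md5' and 'media'.
    Returns (stored, links): media kept per md5, and media shown per post.
    """
    stored = {}
    links = {}
    for post in posts:
        if post["md5"] not in stored:
            stored[post["md5"]] = post["media"]  # only new md5 values are "downloaded"
        links[post["no"]] = stored[post["md5"]]
    return stored, links


# A structurally valid but otherwise meaningless PNG skeleton, plus padding.
clean = PNG_SIGNATURE + png_chunk(b"IHDR", bytes(13)) + png_chunk(b"IEND")
padded = clean + b"appended collision blocks would go here"

# Two posts that (by assumption) collide on md5 but carry different media.
posts = [
    {"no": 1, "md5": "placeholder-md5", "media": b"first image"},
    {"no": 2, "md5": "placeholder-md5", "media": b"second image"},
]
stored, links = archive_posts(posts)
```

In this sketch, stripping `padded` yields `clean` again, which is why appended collision blocks do not survive such post-processing, while blocks placed inside the file structure are untouched by it. The second post ends up linked to the first post's media, which is the fidelity problem described in the list above.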