Ayase/MD5 Collisions

From Bibliotheca Anonoma

In recent years, a variety of techniques for generating md5 collisions have become practical and well publicised. More recently still, practical methods have been found to apply these techniques to media files. A good demonstration of how broken md5 is can be found in animated "hashquines" [https://www.rogdham.net/2017/03/12/gif-md5-hashquine.en] [https://twitter.com/__spq__/status/838583044260904960], which use md5 collisions to make an animated GIF display its own md5 hash.


4chan and its archives depend on md5 to a certain extent for identifying unique media files. 4chan uses md5 in its spam detection process (and elsewhere), and Asagi-based archivers rely on the assumed "uniqueness" of md5 hashes for deduplication.
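As a rough illustration of the hash format involved, the sketch below computes an md5 digest and encodes it as base64 text. It assumes that the md5 field of the 4chan API is the standard base64 encoding of the raw 16-byte digest, and that archive image-search URLs use a URL-safe variant with the padding removed; both function names are made up for this example.

```python
import base64
import hashlib


def md5_base64(data: bytes) -> str:
    """Return the md5 digest of `data` as standard base64 text (24 chars)."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def md5_urlsafe(data: bytes) -> str:
    """URL-safe, unpadded variant (assumed form used in archive search URLs)."""
    digest = hashlib.md5(data).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
```

Two files that collide under md5 produce the same string from either function, which is all that a hash-keyed lookup ever sees.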
= MD5 Vulnerabilities =


Two forms of md5 collision exploit have been discovered so far: a "chosen-prefix" and an "identical-prefix" collision mechanism.
   
   
* The "identical-prefix" style of exploit inserts "collision blocks" within otherwise identical files to generate md5 collisions. Files generated with this style of collision have been demonstrated to pass through 4chan's post-processing steps without alteration. 4chan and its archives are vulnerable to at least GIF md5 collisions, and probably to exploits crafted for other file formats as well. For more information on identical-prefix collisions, see [https://www.mscs.dal.ca/~selinger/md5collision/ this explanation/example] and [https://github.com/corkami/collisions#fastcoll-md5 this discussion on hash collisions in various image formats]. Here's an [https://desuarchive.org/_/search/image/XIJ8Drqc-qZHwaSJvqd8YA/end/2019-09-03/ example] of an archived collision, using some of corkami's example images. Both images show in the image search because Asagi deduplication is per-board.
* The "chosen-prefix" exploit allows "collision blocks" to be appended to an arbitrary pair of chosen files until they share the same md5. More info on this style of collision can be found [https://natmchugh.blogspot.com/2014/10/how-i-created-two-images-with-same-md5.html here]. This style of exploit can be countered on the backend by removing bytes past the media file trailer (a pattern signifying the end of the file), and it seems that this is part of the post-processing 4chan does on media upload.
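The trailer-based countermeasure mentioned above can be sketched for one format. The following is a minimal example for PNG only, assuming that "removing bytes past the trailer" means truncating the file right after its IEND chunk; it is not known whether 4chan's processing works this way, and the function name is made up for this example. GIF and JPEG would need their own parsers.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def strip_after_iend(data: bytes) -> bytes:
    """Return `data` truncated just after the PNG IEND chunk.

    Walks the chunk stream (4-byte length, 4-byte type, data, 4-byte CRC)
    without validating CRCs. Raises ValueError on malformed input.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        end = pos + 12 + length  # length field + type + data + CRC
        if end > len(data):
            raise ValueError("truncated chunk")
        if chunk_type == b"IEND":
            return data[:end]  # drop anything appended after the trailer
        pos = end
    raise ValueError("no IEND chunk found")
```

Trimming of this kind only removes data appended after the end of the file; it does nothing against collision blocks embedded inside the file body, which is why the identical-prefix style survives it.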
 
* The existing types of md5 collision exploits are not known to pose a major risk to either the main site or its archives, because they can only be performed intentionally by an "attacker" who must generate and post both pieces of media. However, they do introduce a quirk which has a minor impact on the integrity of the archive:
** Hiding images from archives: a user can post two md5-colliding images to the same board with a delay, and the second image will never be archived by Asagi-based archives. This is due to the md5-based deduplication mechanism Asagi uses, which skips downloading an image if its md5 is already present in the database (the 4chan API has an md5 field). This exploit is somewhat concerning since it prevents 100% fidelity at the post level: [the wrong image will be linked in the archive].
 
* Neither of the above types of attack is a pre-image attack; weaponizable pre-image attacks would pose a much larger risk to the main site and archives. If semi-arbitrary images could be generated with the same md5 as an arbitrary image posted by another user, automod or mod systems relying on media hashes could be gamed to ban non-offending users or media. For example, one user could post an image, and another user could then generate an image with illegal or ban-worthy content sharing the same md5. The first user or media file could end up banned from the main site and/or archives, along with the offending user/media. Note that this scenario is completely theoretical, and even if a pre-image attack were to exist, it would also need to be very flexible to be weaponized in this way.
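The archive-hiding quirk described above can be reproduced with a toy model. This is only a sketch of md5-keyed deduplication in general, not Asagi's actual code: the Post and Md5DedupArchive classes and their fields are hypothetical, and the hash is a stand-in string rather than a real collision.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """Hypothetical, simplified view of a post as returned by an API."""
    no: int       # post number
    md5: str      # media hash as reported by the API
    media: bytes  # the actual file contents


class Md5DedupArchive:
    """Toy archive that stores media keyed by the reported md5 only."""

    def __init__(self) -> None:
        self.media_by_md5: dict[str, bytes] = {}
        self.post_to_md5: dict[int, str] = {}

    def ingest(self, post: Post) -> bool:
        """Record the post; return True if its media was actually stored."""
        self.post_to_md5[post.no] = post.md5
        if post.md5 in self.media_by_md5:
            return False  # hash already known, so the download is skipped
        self.media_by_md5[post.md5] = post.media
        return True

    def media_for(self, post_no: int) -> bytes:
        """Return whatever media the archive links to the given post."""
        return self.media_by_md5[self.post_to_md5[post_no]]
```

Ingesting Post(1, "same-hash", b"image A") and then Post(2, "same-hash", b"image B") stores only the first file, and media_for(2) returns the contents of image A, which is the "wrong image linked in the archive" situation.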


= Mitigations =