Editing Ayase

From Bibliotheca Anonoma

Warning: You are not logged in. Your IP address will be publicly visible if you make any edits. If you log in or create an account, your edits will be attributed to your username, along with other benefits.

The edit can be undone. Please check the comparison below to verify that this is what you want to do, and then publish the changes below to finish undoing the edit.

Latest revision Your text
Line 60: Line 60:
** While there is a question of whether the ones generating the first sha256sum can be trusted, md5sums also will have this same issue as md5sum collisions can be generated and injected into the archive anytime. At least the fact that the file has the same sha256sum and md5sum at archival time should at least vouch for the fact that the archiver did not change it after a file was scraped.  
** While there is a question of whether the ones generating the first sha256sum can be trusted, md5sums also will have this same issue as md5sum collisions can be generated and injected into the archive anytime. At least the fact that the file has the same sha256sum and md5sum at archival time should at least vouch for the fact that the archiver did not change it after a file was scraped.  
** Even without hardware acceleration chips, sha256sums will still be quicker to generate than files can be read from disk, even SSDs.  Even if a mass conversion of filenames has to be done, it would only take 4 days to deal with 30TB, though SSD caching might be necessary.
** Even without hardware acceleration chips, sha256sums will still be quicker to generate than files can be read from disk, even SSDs.  Even if a mass conversion of filenames has to be done, it would only take 4 days to deal with 30TB, though SSD caching might be necessary.
* Image files are to be stored in double nested folders based on characters in the filename, as seen in Futabilly. This therefore provides a near perfect random distribution where each folder is on average the same size.
* They are to be stored in double nested folders as seen in Futabilly. This therefore provides a near perfect random distribution where each folder is on average the same size.
** This could allow partitioning of the storage between each alphanumeric range, or if even necessary at huge sizes, single servers.
** This could allow partitioning of the storage between each alphanumeric range, or if even necessary at huge sizes, single servers.
** There could exist an nginx config on the server s1, where the scraper is located. We first have the user start accessing s1, which checks if the sha256sum is on local disk (maybe checking a redis index to avoid disk i/o).
** There could exist an nginx config on the server s1, where the scraper is located. We first have the user start accessing s1, which checks if the sha256sum is on local disk (maybe checking a redis index to avoid disk i/o).
** if it is not found, then the nginx config redirects sha256sum ranges 0-5 to s2, a-g to s3, h-l to s4, etc. These ranges can be modified as they get copied around or rebalanced.
** if it is not found, then the nginx config redirects sha256sum ranges 0-5 to s2, a-g to s3, h-l to s4, etc. These ranges can be modified as they get copied around or rebalanced.
* sha256sum should be appended as an extra value to each post right next to md5sum. Then there should be an images table with a SERIAL primary key, sha256sum as unique key, and a foreign key for the datastore location that the specific image is currently in.   
* sha256sum should be appended as an extra value to each post right next to md5sum. Then there should be an images table with a SERIAL primary key, sha256sum as unique key, and a foreign key for the datastore location that the specific image is currently in.   
** While sha256sum cannot necessarily be made into a foreign key in JSONB, finding it from the images table just a matter of putting it back into the search query and having the index do the rest.
** While sha256sum cannot necessarily be made into a foreign key in this state, it's just a matter of putting it back into the search and having the index do the rest.
** The SERIAL primary key is used even though sha256sum is the real key used as seen from its unique constraint, because it also doubles as a record of when the file was scraped and added to the database. This is necessary for incremental backup systems brought straight over from Asagi (still to be made though).
 
URL Resolution would probably have to take some special thought, especially if the images are subdivided across various different hosts. One easy


The question of radix directories should be considered. Since gitlab does 2 characters/2characters without a problem even at large scale, at least thats a proven standard to follow. However, some archives prefer 2 characters/1, it makes less directories.
The question of radix directories should be considered. Since gitlab does 2 characters/2characters without a problem even at large scale, at least thats a proven standard to follow. However, some archives prefer 2 characters/1, it makes less directories.
Please note that all contributions to Bibliotheca Anonoma are considered to be released under the Creative Commons Attribution-ShareAlike (see Bibliotheca Anonoma:Copyrights for details). If you do not want your writing to be edited mercilessly and redistributed at will, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource. Do not submit copyrighted work without permission!
Cancel Editing help (opens in new window)