The EternalArchive Project is an organization set on archiving and digitizing cultural works from various language groups.
Team members meet at #eternalarchive on irc.rizon.net.
The organization has various roles:
- EternalArchive/Solutions - Think Tank for figuring out how to archive a tremendous wealth of data safely and cheaply.
- EternalArchive/Discovery - Figuring out what content to archive and whether it is worth saving.
- EternalArchive/Acquisition - Acquisition of the cultural works from various sources, through totally legal means :^).
- Archival - Infrastructure for actually storing the data on LTO Tape or the Internet Archive.
One of the first problems in archiving data, once we have decided what to archive, is managing the archival process.
Several software solutions are available for managing archival. One highly recommended option is biblio (library) software such as OpenBiblio or PMB (PhpMyBibli), slightly modified to track the archival process and index tapes.
Another idea is to simply run Gazelle as a private torrent tracker. With minimal modifications we could use it to track requests, torrent the data privately, and archive it as a group. Gazelle is the same archive backend that WhatCD uses. It includes a forum, supports private (invite-only) registration, and has a system for seed requests and for upload requests for torrents that don’t yet exist. One stipulation of running Gazelle is that you cannot share your torrent file as-is: because it is a private auth tracker, the tracker uses a private hash key for each user to connect and torrent, so the file must be edited first. We would also have to decide how concerned we are about ratio. If we aren’t concerned about ratio, it should be okay to allow the addition of other trackers, and possibly DHT, on the torrent files.
The proposed system comprises the following points:
- The present archival content is catalogued. Everything already on tapes and archived is listed to avoid duplicates.
- Other members who are running LTO or other archives are integrated into the “Library” system as external libraries where one can request archived content for their library.
- The items in queue to be archived are made available as an index, similar to being in another library, or marked as “In curation”.
- Items that are in curation are acquired sequentially, either in order of some designated priority or in order of submission.
- All items should have a content hash generated and be placed in a containing folder with a torrent file that serves as a checksum and metadata. We need a system of scripts for acquiring metadata for movies and anime. See what to archive: Discovery.
- For the purpose of this archive we will likely be considering tapes as shelves. Therefore, once a book is on a shelf it is archived successfully.
- Requests for archived content will be handled on a first-come, first-served basis at the discretion of the archivist. That is, release or retrieval of content for other archive teams may take priority, with requests from the people who discovered the content likely ranking lower, because redundancy across systems is paramount in the event of a disaster.
- A system for contacting other archivists privately should be put in place in the event that someone would like to request content stored on tapes in LTO format. The tapes would be
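The cataloguing points above can be illustrated with a minimal sketch. It assumes SHA-256 as the content hash and an in-memory catalogue; the `Catalog` class, its method names, and the tape label `TAPE-0001` are hypothetical and not part of any existing EternalArchive tooling.

```python
import hashlib
from dataclasses import dataclass, field


def content_hash(data: bytes) -> str:
    """Return a SHA-256 hex digest (assumed choice of content hash)."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class Catalog:
    """Hypothetical catalogue: tapes are treated as shelves."""
    shelves: dict = field(default_factory=dict)      # content hash -> tape label
    in_curation: list = field(default_factory=list)  # hashes queued, in submission order

    def submit(self, digest: str) -> bool:
        """Queue an item unless it is already archived or already queued."""
        if digest in self.shelves or digest in self.in_curation:
            return False  # duplicate, nothing to do
        self.in_curation.append(digest)
        return True

    def shelve(self, digest: str, tape_label: str) -> None:
        """Move a queued item onto a tape ("shelf"), marking it archived."""
        self.in_curation.remove(digest)
        self.shelves[digest] = tape_label


catalog = Catalog()
digest = content_hash(b"example payload")
catalog.submit(digest)               # first submission is queued
catalog.shelve(digest, "TAPE-0001")  # item is now on a shelf
print(catalog.submit(digest))        # False: already archived
```

For real files the hash would be computed in chunks rather than from a single in-memory byte string, and the catalogue would live in whatever library software is chosen.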
The Discovery Department is a task force to discover what cultural works are worth archiving in the first place.
- English Language - Global content written in the English language, the lingua franca of the modern world.
- Research Papers
- American Culture - Hollywood, Television, Capitalist Lifestyles, Reality TV, the like.
- TV, Movies, Music, Musicals
- British Culture
- TV, Movies, Music
- Japanese Culture
- AV, Movies, Music, Anime
- Taiwanese Culture - Has a deep influence on Chinese culture, but never the other way around.
- Chinese Culture
Cultural works from these linguistic spheres are currently unfamiliar to our team members, and we require the assistance of team members from those areas.
- Korean Culture
- French Culture
- German Culture
- Indian Culture - Bollywood
- Spanish Language - Absolutely anything in Castellano.
- Portuguese Language
We need tools to record what works we need to find in a list system. This list also needs to show what we’ve already “watched” (archived), are “watching” (downloading), and what we “want to watch” (need to grab in the future).
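As a rough sketch of such a list, the three states can be modelled as an enumeration attached to each title. The `Status` and `WatchList` names and the example titles are hypothetical; a real tool would persist the list somewhere rather than keep it in memory.

```python
from enum import Enum


class Status(Enum):
    WANTED = "want to watch"   # need to grab in the future
    DOWNLOADING = "watching"   # currently being acquired
    ARCHIVED = "watched"       # already archived


class WatchList:
    def __init__(self):
        self._items = {}  # title -> Status

    def add(self, title, status=Status.WANTED):
        # Keep the existing status if the title is already listed.
        self._items.setdefault(title, status)

    def set_status(self, title, status):
        if title not in self._items:
            raise KeyError(f"unknown title: {title}")
        self._items[title] = status

    def by_status(self, status):
        """Return the titles in the given state, sorted alphabetically."""
        return sorted(t for t, s in self._items.items() if s is status)


works = WatchList()
works.add("Example Show A")
works.add("Example Film B")
works.set_status("Example Show A", Status.DOWNLOADING)
works.set_status("Example Show A", Status.ARCHIVED)
print(works.by_status(Status.ARCHIVED))  # ['Example Show A']
print(works.by_status(Status.WANTED))    # ['Example Film B']
```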
We need some sort of list system for American TV.
IMDB has some good lists by year in box office or popular rating:
By box office for 2015 gross:
By popularity for 2015:
Depending on the amount of storage, you may want to archive the top 10, 25, or 50 movies and so on, grabbing the rest when you have more room.
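One way to make that choice mechanical is to pick the largest tier whose titles fit the available storage. The sketch below is only illustrative: `pick_tier` is a hypothetical helper, and the 8 GB per film figure is an assumed placeholder rather than a measured size.

```python
def pick_tier(ranked_sizes_gb, budget_gb, tiers=(10, 25, 50)):
    """Return the largest tier N whose top-N entries fit the budget, else 0.

    ranked_sizes_gb: file sizes in GB, ordered by rank (best first).
    """
    for n in sorted(tiers, reverse=True):
        enough_titles = len(ranked_sizes_gb) >= n
        if enough_titles and sum(ranked_sizes_gb[:n]) <= budget_gb:
            return n
    return 0


sizes = [8] * 50              # assumed: 50 ranked films at 8 GB each
print(pick_tier(sizes, 250))  # 25 (top 50 needs 400 GB, top 25 needs 200 GB)
```

The titles left out of the chosen tier stay on the list and can be grabbed once more room is available.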
This is the best way to list the anime we’ve collected and the anime that are worth collecting.
One method of deciding what to archive is to do so systematically: that is, (in terms of seasons) newest to oldest.
The benefit of this method is that the freshest content is easier to retrieve, and things thin out around the oldest content anyway.
One drawback is that multiseason series might get separated into different seasons.
Anime: Seasons
Japanese anime is usually scheduled in four broadcast seasons: Winter, Spring, Summer, and Fall.
Another method of deciding what to archive is to do so on the basis of the first airing. This has the benefit of matching all the seasons of a show together.
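The two orderings can be compared with a small sketch. The entries, the tuple layout, and the `SEASON_ORDER` calendar are assumptions made for illustration, and the function names are hypothetical.

```python
# Assumed calendar order of broadcast seasons within a year.
SEASON_ORDER = ("Winter", "Spring", "Summer", "Fall")

# Hypothetical entries: (series, season number, year, broadcast season)
entries = [
    ("Show A", 1, 2013, "Summer"),
    ("Show B", 1, 2015, "Winter"),
    ("Show A", 2, 2015, "Summer"),
]


def air_key(entry):
    """Sortable (year, season index) for when an entry aired."""
    _, _, year, season = entry
    return (year, SEASON_ORDER.index(season))


def newest_first(items):
    """Method 1: strictly newest to oldest by broadcast season."""
    return sorted(items, key=air_key, reverse=True)


def by_first_airing(items):
    """Method 2: group by series, ordered by each series' first airing."""
    first = {}
    for e in items:
        first[e[0]] = min(first.get(e[0], air_key(e)), air_key(e))
    # Order seasons within a series, then stable-sort series newest first.
    within = sorted(items, key=lambda e: (e[0], e[1]))
    return sorted(within, key=lambda e: first[e[0]], reverse=True)


print([e[:2] for e in newest_first(entries)])
# [('Show A', 2), ('Show B', 1), ('Show A', 1)]
print([e[:2] for e in by_first_airing(entries)])
# [('Show B', 1), ('Show A', 1), ('Show A', 2)]
```

In the first ordering the two seasons of "Show A" are split apart by "Show B"; in the second they stay together.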
The Acquisition department of EternalArchive is responsible for actually obtaining the cultural content, drawing from the lists that the Discovery Department has made.
Once the content is obtained, it is pushed to the Archival Department for archival on LTO tape or the Internet Archive (if public domain).
- Torrents - From a legal standpoint, this is offered as a service with a review process, in which anyone can upload copyrighted material as storage for their legal backup, whether or not the user acquired the backup of the purchased media through other means. However, we cannot verify legitimacy or encrypt the files on a per-user basis, and we don’t want to turn into the next TPB.
- Download Sites
- Streaming Sites - Some streaming sites are great sources of data, especially if they’re all you can eat.
- Direct Rip - For some rather rare things, a direct rip from DVD or Blu-ray may be necessary.
- manga.madokami.com - Massive ~4TB private collection of LNs and manga scans in English, rescued from the dying mangatraders. Ask Sunako for rsync access on irc.rizon.net.