Difference between revisions of "EternalArchive"

(Created page with "The EternalArchive Project is an organization set on archiving and digitizing cultural works from various language groups. Team members meet at #eternalarchive on irc.rizon.n...")
 
(Separated roles into their own pages)
 
# [[EternalArchive/Acquisition]] - Acquisition of the cultural works from various sources, through ''totally legal means'' :^).
 
# [http://github.com/antonizoon/home-server/wiki/LTO-Tape Archival] - Infrastructure for actually storing the data on LTO Tape or the Internet Archive.
 
== Solutions ==
 
 
Once we have decided what to archive, one of the first problems is managing the archival process itself.
 
 
Several management software solutions are available for this. One highly recommended option is bibliographic (library-management) software such as [http://obiblio.sourceforge.net/ OpenBiblio] or [https://en.wikipedia.org/wiki/PMB_%28software%29 PMB PhpMyBibli], slightly modified to track the archival process and index tapes.
 
 
Another idea is to simply run [https://github.com/WhatCD/Gazelle Gazelle], the same backend that [http://ssl.what.cd WhatCD] uses, as a private torrent tracker. With minimal modifications we could use it to track requests, torrent the data privately, and archive it as a group. It includes a forum, supports private (invite-only) registration, and has a system for seed requests and for upload requests for torrents that don’t yet exist. One stipulation of running Gazelle is that you cannot share your torrent file as-is: because it is a private auth tracker, each user connects and torrents with a private hash key, so the file must be edited first. We would also have to decide how concerned we are about ratio; if we aren’t concerned, it should be okay to allow the addition of other trackers and possibly DHT on the torrent files.
 
 
The proposed system comprises the following points:
 
 
# The present archival content is catalogued. Everything already on tapes and archived is listed to avoid duplicates.
 
# Other members who are running LTO or other archives are integrated into the “Library” system as external libraries, from which one can request archived content for one’s own library.
 
# The items queued for archival are made available as an index, either listed as though held in another library or marked as “In curation”.

# Items that are in curation are acquired sequentially by some means, either in order of a designated priority or in order of submission.
 
# All items should have a content hash generated and sit inside a containing folder together with a torrent file, which serves as a checksum and metadata. We need a system of scripts for acquiring metadata for movies and anime. See what to archive: [[etarc/discover|Discovery]].
 
# For the purpose of this archive we will likely be considering tapes as shelves. Therefore, once a book is on a shelf it is archived successfully.
 
# Requests for archived content will be handled on a first-come, first-served basis at the discretion of the archivist. That is, the archivist may give priority to releasing or retrieving content for archive teams passing it to other archives, with a likely lower priority for the people who discovered the content to be archived, because redundancy across systems is paramount in the event of a disaster.
 
# A system for contacting other archivists privately should be put in place in the event that someone would like to request content stored on tapes in LTO format. The tapes would be
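
As a rough illustration of the content-hash point in the list above, the sketch below walks an item's containing folder and records one digest per file. The use of SHA-256, the function names, and the manifest layout are all assumptions made for illustration; the proposal itself only says that a content hash should be generated and that a torrent file acts as the checksum and metadata.

```python
import hashlib
from pathlib import Path


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large media files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to folder) to its digest.

    The manifest layout is hypothetical; it is not a torrent file.
    """
    root = Path(folder)
    return {
        path.relative_to(root).as_posix(): hash_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
```

Comparing a new item's manifest against the manifests of items already on tape would be one way to spot duplicates before writing anything.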
 
 
= EternalArchive Discovery =
 
 
The Discovery Department is a task force to discover what cultural works are worth archiving in the first place.
 
 
== Cultural-Linguistic Continuums ==
 
 
* English Language - Global content written in the English language, the lingua franca of the modern world.

** Books

** Research Papers

* American Culture - Hollywood, Television, Capitalist Lifestyles, Reality TV, the like.

** TV, Movies, Music, Musicals

* British Culture

** TV, Movies, Music

* Japanese Culture

** AV, Movies, Music, Anime

* Taiwanese Culture - Has a deep influence on Chinese culture, but never the other way around.

* Chinese Culture
 
 
=== Requires Assistance ===
 
 
Cultural works from these linguistic spheres are currently unfamiliar to our team and require the assistance of team members from those areas.
 
 
* Korean Culture
 
* French Culture
 
* German Culture
 
* Indian Culture - Bollywood
 
* Spanish Language - Absolutely anything in Castellano.
 
* Portuguese Language
 
 
== Tools ==
 
 
We need tools to record what works we need to find in a list system. This list also needs to show what we’ve already “watched” (archived), are “watching” (downloading), and what we “want to watch” (need to grab in the future).
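
A minimal sketch of such a list, assuming an in-memory structure is enough to show the idea. The class and method names are hypothetical; only the three states come from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    WANT = "want to watch"   # need to grab in the future
    WATCHING = "watching"    # downloading
    WATCHED = "watched"      # archived


@dataclass
class Work:
    title: str
    status: Status = Status.WANT


class WorkList:
    """Hypothetical tracker keyed by title, so a work is only listed once."""

    def __init__(self):
        self._works = {}

    def add(self, title):
        """Add a work unless it is already listed; return its entry."""
        return self._works.setdefault(title, Work(title))

    def set_status(self, title, status):
        """Move a work to a new state, adding it first if needed."""
        self.add(title).status = status

    def with_status(self, status):
        """Return the titles currently in the given state, sorted by name."""
        return sorted(w.title for w in self._works.values() if w.status is status)
```

Keying on the title keeps re-submissions of the same work from producing duplicate entries; a real tool would probably need a more robust identifier than the title alone.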
 
 
=== American TV ===
 
 
We need some sort of list system for American TV.
 
 
IMDB has some good lists by year, sorted by box office or by popularity rating:
 
 
By box office for 2015 gross:
 
 
http://www.imdb.com/search/title?sort=boxoffice_gross_us&title_type=feature&year=2015,2015
 
 
By popularity for 2015:
 
 
http://www.imdb.com/search/title?sort=moviemeter&title_type=feature&year=2015,2015
 
 
Depending on the amount of storage available, you may want to archive the top 10, 25, or 50 movies and so on, grabbing the rest when you have more room.
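
The two links above differ only in their `sort` value and year, so the same list for other years could be generated rather than typed by hand. The sketch below simply rebuilds the query strings shown above; the function name is hypothetical, and whether IMDB still serves these URLs in this form has not been checked.

```python
from urllib.parse import urlencode

IMDB_SEARCH = "http://www.imdb.com/search/title"


def imdb_list_url(year, sort):
    """Build a feature-film search URL for a single year.

    sort is "boxoffice_gross_us" or "moviemeter", as in the links above.
    """
    query = urlencode(
        {"sort": sort, "title_type": "feature", "year": f"{year},{year}"},
        safe=",",  # keep the comma in the year range unescaped
    )
    return f"{IMDB_SEARCH}?{query}"
```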
 
 
=== Hummingbird.me (Anime) ===
 
 
This is the best way to list the anime we’ve collected and the anime that is worth collecting.
 
 
== Chronologic Archival ==
 
 
One method of deciding what to archive is to do so chronologically: that is, season by season, from newest to oldest.
 
 
The benefit of this method is that the freshest content is easier to retrieve, and things thin out around the oldest content anyway.
 
 
One drawback is that the seasons of a multiseason series might get separated from one another.
 
 
=== Anime: Summer and Winter ===
 
 
Japanese anime is usually scheduled by broadcast season (Winter, Spring, Summer, and Fall); Summer and Winter are the seasons singled out here.
 
 
== Systematic Archival ==
 
 
Another method of deciding what to archive is to do so on the basis of the first airing. This has the benefit of matching all the seasons of a show together.
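
The difference between the chronologic and systematic orderings can be sketched as two sort functions over the same entries. The `Entry` fields and the `(year, slot)` encoding of an airing season are assumptions made for illustration, not part of any existing tool.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    series: str
    season: int    # season number within the series
    aired: tuple   # (year, slot); slot orders airing seasons within a year


def chronological(entries):
    """Newest airing first, ignoring which series an entry belongs to."""
    return sorted(entries, key=lambda e: e.aired, reverse=True)


def systematic(entries):
    """Order series by first airing and keep each series' seasons together."""
    by_series = defaultdict(list)
    for entry in entries:
        by_series[entry.series].append(entry)
    # A series is placed according to the earliest airing of any of its seasons.
    ordered = sorted(by_series.values(), key=lambda group: min(e.aired for e in group))
    return [e for group in ordered for e in sorted(group, key=lambda e: e.aired)]
```

With a two-season series and an unrelated show airing in between, `chronological` interleaves the unrelated show between the two seasons, while `systematic` keeps both seasons adjacent.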
 
 
= EternalArchive Acquisition =
 
 
The Acquisition department of EternalArchive is responsible for actually obtaining the cultural content, drawing from the lists that the [[etarc/discover|Discovery]] Department has made.
 
 
Once the content is obtained, it is pushed to the [[etarc/archive|Archival]] Department for archival on LTO tape or the Internet Archive (if public domain).
 
 
== Sources ==
 
 
* Torrents - From a legal standpoint, the idea is that anyone uploading copyrighted material is storing their own legal backup, offered as a service with a review process, whether or not the user acquired the backup of the purchased media through other means. We cannot verify legitimacy or encrypt the files on a per-user basis, however, and we don’t want to turn into the next TPB.
 
* Download Sites
 
* Streaming Sites - Some streaming sites are great sources of data, especially if they’re all you can eat.
 
* Direct Rip - For some rather rare things, a direct rip from DVD or Blu-ray may be necessary.
 
 
=== To Grab ===
 
 
* manga.madokami.com - Massive ~4TB private collection of LNs and manga scans in English, rescued from the dying mangatraders. Ask Sunako for rsync access on irc.rizon.net.
 

Latest revision as of 03:56, 5 March 2017

The EternalArchive Project is an organization set on archiving and digitizing cultural works from various language groups.

Team members meet at #eternalarchive on irc.rizon.net.

The organization has various roles:

  1. EternalArchive/Solutions - Think Tank for figuring out how to archive a tremendous wealth of data safely and cheaply.
  2. EternalArchive/Discovery - Figuring out what content to archive and whether it is worth saving.
  3. EternalArchive/Acquisition - Acquisition of the cultural works from various sources, through totally legal means :^).
  4. Archival - Infrastructure for actually storing the data on LTO Tape or the Internet Archive.