Archival Tools
From Bibliotheca Anonoma
Revision as of 17:46, 16 October 2016 by Antonizoon
Complete Website Archival
Wget
Outputs plain HTML.
wget -mbc -np "http://aya.shii.org" \
  --convert-links \
  --adjust-extension \
  --page-requisites \
  --no-check-certificate \
  --restrict-file-names=nocontrol \
  -e robots=off \
  --waitretry 5 \
  --timeout 60 \
  --tries 5 \
  --wait 1 \
  --user-agent "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27"
- `-k` and `-K` (`--convert-links` and `--backup-converted`) - used together, these keep the original file as `*.orig` after `--adjust-extension` renames it. Otherwise, when remirroring with wget, each renamed page will be redownloaded.
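As a rough sketch of the remirroring note above, the mirror command can be wrapped in a small shell function that passes both `--convert-links` (`-k`) and `--backup-converted` (`-K`). The function name `mirror_site` and the `DRY_RUN` switch are assumptions made for illustration and are not part of wget. The option list is trimmed from the full command (no user agent, no `--no-check-certificate`).

```shell
# mirror_site: hypothetical wrapper around the mirror command.
# Set DRY_RUN=1 (an illustrative switch) to print the command instead of running it.
mirror_site() (
    url="$1"
    # Build the command in the positional parameters so that every
    # option stays a separate, correctly quoted word.
    set -- wget -m -b -c -np "$url" \
        --convert-links --backup-converted \
        --adjust-extension \
        --page-requisites \
        --restrict-file-names=nocontrol \
        -e robots=off \
        --waitretry 5 --timeout 60 --tries 5 --wait 1
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"    # show what would be executed
    else
        "$@"                  # run wget (-b sends it to the background)
    fi
)
```

Running `DRY_RUN=1 mirror_site "http://aya.shii.org"` prints the assembled command line without touching the network, which is a cheap way to check the flags before starting a long mirror.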
Wget WARC
Outputs in WARC format, ready for upload to the Internet Archive.
wget -mbc -np "http://aya.shii.org" \
  --page-requisites \
  --no-check-certificate \
  --restrict-file-names=nocontrol \
  -e robots=off \
  --waitretry 5 \
  --timeout 60 \
  --tries 5 \
  --wait 1 \
  --warc-file=aya.shii.org \
  --warc-cdx \
  --warc-max-size=1G \
  --user-agent "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27"
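In the command above, the `--warc-file` name repeats the site's host name. When archiving several sites, that name could be derived from the URL instead of typed by hand. The helper below is a minimal sketch of that idea. `warc_name_from_url` is a hypothetical name, not a wget feature, and it assumes ordinary `scheme://host[:port]/path` URLs (no IPv6 literals).

```shell
# warc_name_from_url: hypothetical helper that prints the host part of a URL,
# for use as the --warc-file base name.
warc_name_from_url() (
    rest="${1#*://}"      # drop the scheme ("http://", "https://")
    rest="${rest%%/*}"    # drop the path and query
    rest="${rest##*@}"    # drop any "user:password@" prefix
    rest="${rest%%:*}"    # drop any ":port" suffix
    printf '%s\n' "$rest"
)

# Usage (not run here):
#   url="http://aya.shii.org"
#   wget -m -np "$url" --warc-file="$(warc_name_from_url "$url")" --warc-cdx
```

The function body runs in a subshell, so the temporary variable `rest` does not leak into the calling shell.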
Youtube-dl
youtube-dl "http://www.youtube.com/playlist?list=PL3634152194A90D8B&feature=mh_lolz" \
  --write-thumbnail \
  --write-description \
  --write-info-json \
  --write-annotations \
  --write-sub \
  --all-subs \
  --add-metadata \
  --embed-subs \
  --restrict-filenames \
  --user-agent "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27"