Archival Tools

From Bibliotheca Anonoma
Revision as of 17:46, 16 October 2016 by Antonizoon

Complete Website Archival

Wget

Outputs plain HTML, with links rewritten so the mirror can be browsed offline.

wget -mbc -np "http://aya.shii.org" \
   --convert-links \
   --adjust-extension \
   --page-requisites \
   --no-check-certificate \
   --restrict-file-names=nocontrol \
   -e robots=off \
   --waitretry 5 \
   --timeout 60 \
   --tries 5 \
   --wait 1 \
   --user-agent "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27"
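  • `-k` and `-K` together (`--convert-links` with `--backup-converted`) - keep the original file as `*.orig` after `--adjust-extension` renames it. Otherwise, when remirroring with wget, each renamed page will be redownloaded. The command above already passes `--convert-links`, so only `-K` needs to be added.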
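
The remirroring note above can be sketched as a small wrapper. This is only a minimal sketch, not a script from this page: the function name `remirror_site` and the `WGET` override variable are hypothetical, the flag list is a shortened version of the command above with `--backup-converted` added, and `-b` is left out so the run stays in the foreground.

```shell
#!/bin/sh
# Hypothetical helper: re-run a mirror so that unchanged pages can be skipped.
# -m turns on timestamping; --convert-links plus --backup-converted keeps an
# X.orig copy beside each converted page.
remirror_site() {
    url=$1
    # WGET is an assumed override hook (useful for dry runs); defaults to wget.
    "${WGET:-wget}" -m -c -np "$url" \
        --convert-links \
        --backup-converted \
        --adjust-extension \
        --page-requisites \
        -e robots=off \
        --wait 1
}
```

Calling `remirror_site "http://aya.shii.org"` a second time in the same directory should then reuse the `*.orig` copies. Setting `WGET=echo` first prints the command instead of running it.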

Wget WARC

Outputs in WARC format, ready for upload to the Internet Archive.

wget -mbc -np "http://aya.shii.org" \
   --page-requisites \
   --no-check-certificate \
   --restrict-file-names=nocontrol \
   -e robots=off \
   --waitretry 5 \
   --timeout 60 \
   --tries 5 \
   --wait 1 \
   --warc-file=aya.shii.org \
   --warc-cdx \
   --warc-max-size=1G \
   --user-agent "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27"
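
Before uploading, it may be worth checking that the WARC files are intact. The sketch below is an illustration under assumptions, not part of the original page: `warc_name_from_url` and `verify_warcs` are hypothetical helper names, the first only handles plain host names (no user info or IPv6 literals), and the glob assumes wget's default gzip-compressed WARC output.

```shell
#!/bin/sh
# Derive a --warc-file prefix from a URL by keeping only the host part,
# e.g. "http://aya.shii.org/foo" -> "aya.shii.org".
warc_name_from_url() {
    host=${1#*://}      # drop the scheme, if any
    host=${host%%/*}    # drop the path
    host=${host%%:*}    # drop an explicit port
    printf '%s\n' "$host"
}

# Check that every compressed WARC with the given prefix is an intact gzip
# stream. Returns non-zero if any file fails or if no file is found.
verify_warcs() {
    prefix=$1
    found=0
    status=0
    for f in "$prefix"*.warc.gz; do
        [ -e "$f" ] || continue   # glob matched nothing
        found=1
        if gzip -t "$f" 2>/dev/null; then
            echo "ok:  $f"
        else
            echo "bad: $f"
            status=1
        fi
    done
    [ "$found" -eq 1 ] || return 1
    return "$status"
}
```

For the command above, `verify_warcs "$(warc_name_from_url "http://aya.shii.org")"` would test each `aya.shii.org*.warc.gz` file in the current directory. Note that `gzip -t` only checks the compression layer, not the WARC records themselves.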

Youtube-dl

youtube-dl "http://www.youtube.com/playlist?list=PL3634152194A90D8B&feature=mh_lolz" \
   --write-thumbnail \
   --write-description \
   --write-info-json \
   --write-annotations \
   --write-sub \
   --all-subs \
   --add-metadata \
   --embed-subs \
   --restrict-filenames \
   --user-agent "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27"
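
For playlists that are archived repeatedly, youtube-dl's `--download-archive` option records finished video IDs in a file and skips them on later runs. The wrapper below is a hedged sketch rather than part of the original command: `archive_playlist`, the `YTDL` override variable, and the default file name `downloaded.txt` are all hypothetical, and the flag list is a shortened version of the one above.

```shell
#!/bin/sh
# Hypothetical helper: archive a playlist and remember finished video IDs,
# so that a later run only fetches videos not yet recorded in the list.
archive_playlist() {
    url=$1
    list=${2:-downloaded.txt}   # assumed default name for the ID list
    # YTDL is an assumed override hook (useful for dry runs).
    "${YTDL:-youtube-dl}" "$url" \
        --download-archive "$list" \
        --ignore-errors \
        --write-thumbnail \
        --write-description \
        --write-info-json \
        --write-sub \
        --all-subs \
        --add-metadata \
        --embed-subs \
        --restrict-filenames
}
```

A call such as `archive_playlist "http://www.youtube.com/playlist?list=PL3634152194A90D8B"` can be repeated safely. `--ignore-errors` lets the run continue past videos that are unavailable.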