Editing Archival Tools
From Bibliotheca Anonoma
Latest revision | Your text | ||
Line 37: | Line 37: | ||
--warc-max-size=1G \ | --warc-max-size=1G \ | ||
--user-agent "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27" | --user-agent "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27" | ||
</pre> | </pre> | ||
Line 79: | Line 49: | ||
# After you click “Alphabetize Text”, copy+paste your link list into a text file and save it. | # After you click “Alphabetize Text”, copy+paste your link list into a text file and save it. | ||
# Open a Linux terminal, then type and run this command ( [https://github.com/chfoo/wpull based on this command] ): | # Open a Linux terminal, then type and run this command ( [https://github.com/chfoo/wpull based on this command] ): | ||
#* <pre>wpull -i TEXTFILE --page-requisites --no-robots --no-check-certificate --tries 3 --timeout 60 --delete-after --warc-file WARCNAME --warc-max-size=4294967296 --database DATABASE.db --output-file OUTPUT.log --user-agent "Scraper v1.0"</pre> | #* <pre>wpull -i TEXTFILE --page-requisites --no-robots --no-check-certificate --tries 3 --timeout 60 --delete-after --warc-file WARCNAME --warc-max-size=4294967296 --database DATABASE.db --output-file OUTPUT.log --user-agent "Scraper v1.0"</pre> | ||
#* <code>--youtube-dl</code> - (Optional) add this if there are videos you want to download. '''Download video hosting links in a separate run, because this argument causes problems when used while downloading ordinary web pages.''' | #* <code>--youtube-dl</code> - (Optional) add this if there are videos you want to download. '''Download video hosting links in a separate run, because this argument causes problems when used while downloading ordinary web pages.''' | ||
#* <code>--warc-append</code> - (Optional) add this if a WARC stopped downloading and you want to resume. | #* <code>--warc-append</code> - (Optional) add this if a WARC stopped downloading and you want to resume. | ||
Line 85: | Line 55: | ||
# After installing internetarchive, use this command to upload, and you are finished: | # After installing internetarchive, use this command to upload, and you are finished: | ||
<pre | <pre> | ||
ia upload <identifier> <file/foldername> \ | ia upload <identifier> <file/foldername> \ | ||
--metadata="title:<title>" \ | --metadata="title:<title>" \ | ||
--metadata="subject:<tag>;<tag>; | --metadata="subject:<tag>;<tag>;etc... | ||
</pre> | |||
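The upload command above takes an identifier, a file or folder, a title, and a semicolon-separated subject list. As a rough sketch (not part of the original instructions), the shell function below assembles that command from arguments and prints it for review instead of running it. The function name <code>build_ia_upload</code> and all example values (<code>my-warc-item</code>, <code>WARCNAME.warc.gz</code>, the title and tags) are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print an `ia upload` command line without executing it.
# build_ia_upload is a hypothetical helper, not part of the ia tool.
# Usage: build_ia_upload <identifier> <file/foldername> <title> [tag ...]

build_ia_upload() {
    local identifier="$1" target="$2" title="$3"
    shift 3
    # Join the remaining arguments (the tags) with semicolons,
    # matching the subject:<tag>;<tag> form used above.
    local IFS=';'
    local subject="$*"
    printf '%s\n' "ia upload ${identifier} ${target} --metadata=\"title:${title}\" --metadata=\"subject:${subject}\""
}

# Example call with placeholder values; copy the printed line and run it
# only after checking that the identifier and metadata are what you want.
build_ia_upload my-warc-item WARCNAME.warc.gz "Example crawl" archiving warc
```

Printing the command first makes it easier to catch a wrong identifier or a missing tag before anything is sent to the Internet Archive.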
== Youtube-dl == | == Youtube-dl == | ||
Line 106: | Line 76: | ||
--user-agent "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27" | --user-agent "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27" | ||
</pre> | </pre> | ||