LTO Tape
LTO Tapes are hands down the best and cheapest way to archive large amounts of data.
Using LTO Tapes can be a bit of a challenge, but if you're willing to use some simple Linux commands you can do it. We've detailed the whole process of buying some tape drives and tapes, setting them up with a normal desktop with PCI-E (or better yet an ECC RAM Server), and how to actually write to the tapes.
It was a tremendous challenge to build this guide, but thankfully it won't be so hard for you once you read through the details below.
Introduction
Store anywhere from 500GB up to 6.25 terabytes of information on $35 tapes. That's roughly 1-2 cents per gigabyte! LTO tapes are a cheap and long-lasting data archival solution for video editors, webserver admins, or online archivists.
Here's a great overview of the LTO format and how it benefits video production, or virtually anyone who needs to safely store hundreds of gigabytes of data for decades to come.
Choosing an LTO Type
Since LTO drives were designed for the server market, it is difficult to buy a new drive as an individual, let alone at prices under $100.
On the other hand, since organizations toss waves of old electronics all the time, you can easily find an LTO-3 (400GB native, 800GB compressed) tape drive for $50 on eBay, and even less at your local surplus store. This is more than enough for a small-time archivist, and you can upgrade anytime. So choose the LTO level that fits your needs:
- LTO-3 (400/800GB)
  - Tapes: $15 each
  - Drives: ~$50-80 (used)
- LTO-4 (800GB/1.6TB)
  - Tapes: $20 each
  - Drives: ~$150-375 (used)
- LTO-5 (1.5/3TB)
  - Tapes: $30-40 each
  - Drives: ~$300 (used), ~$1,000-2,000 (new)
- LTO-6 (2.5/6.25TB)
  - Tapes: $50 each
  - Drives: ~$500 (used), ~$3,500 (new)
Here's a comparison chart.
Hooking up an LTO Drive
Now that you have an LTO drive and some tapes, you need special adapters and cables to hook its server-grade SAS (or SCSI) interface up to a typical computer. Unfortunately, this can be somewhat difficult. But that's why this guide exists.
- Desktops - It is significantly easier and cheaper to find a PCI adapter card than a laptop one, since many servers are based on PCI-Express anyway. And it's not like your tape drive or the library is portable anyway, so grab a cheap desktop and plug in one of these cards.
- SCSI
- SAS to PCI - An expansion card that gives your computer the ability to use SAS drives.
- HP Smart Array P411 - A great card for desktops, if you can find one. HP makes a wide range of variants, so check the Model Comparison list under it for more info. Most of these cards require Full size SAS to Mini SAS adapters, which should only cost an extra $20.
- HP Smart Array P400 - An older generation of the above. Since we're no server junkies, this should be good enough for most people.
- $20 - SPlusDirect - HP Smart Array P400 RAID
- Laptops - It can be a challenge to connect SAS to a laptop.
- SCSI - Not sure how to interface with this ancient protocol.
- SAS to Expresscard - The most popular method. Instructions here.
- $3,500 - MLogic MTape LTO-6
- Sal Guarisco's LTO Drive Connection System
- SAS to PCI to PE4C Bridge - Info here? If you happen to have a PE4C bridge from an eGPU, the best method is to grab one of the HP SmartArray desktop PCI cards and plug them into the Expresscard bridge. Otherwise, don't bother, a PE4C costs $90.
- Thunderbolt - If you have a recent MacBook, a Mac Mini, or even some high-end gaming desktop motherboards, they all come with Intel Thunderbolt, which conveniently carries PCIe over a cable.
- But if you don't need a whole desktop, get Intel's Thunderbolt-equipped Next Unit of Computing (`DC3217BY` for $225 as a kit, `D33217CK` for $140 board-only), a full desktop on a single small board with CPU. It's essentially the same hardware as a Mac Mini, just cheaper.
- SAS to Thunderbolt - The fastest, easiest method.
And no, you can't just hook up SAS to a SATA port. They are different protocols.
Storing Data
Now that your drive is set up, what format do you store data in?
We strongly recommend that you use free open-source tools and filesystems, such as LTFS. Everything else costs thousands of dollars, and might not last as long as your tapes will.
- LTFS
Tape Tips
Remember that Tapes are a sequential access technology, unlike random-access hard drives. Therefore, tapes should be used for archival, not day-to-day usage.
You dump your massive files in, and put the tape on the shelf when it's full. If you ever need to access the files again (perhaps after your hard drive got corrupted, or just to look at previous records), that's when you reach for the tape.
Why use Tapes for Archival/Backup?
Why a Tape Backup System is still popular
LTO tapes are far beyond your ol' VHS tapes: they can last up to 30 years in a well air-conditioned room, and within 5 years you'll probably have moved up two LTO generations anyway, once the drives get cheaper.
You could take a hammer to a tape cartridge and still rewind the tape onto a new spool. You'd lose whatever part got damaged, but everything else would be salvageable.
They survive drops and falls much better than 4TB hard drives. They're easy to send off site. LTO drives can read tapes from up to two generations back.
Most of all, LTO-3 and LTO-4 tapes and drives are very inexpensive, at $10-15 for a 400-800GB tape, and around $100 for a good used drive.
There's a reason some very important stuff is backed up to Tape. I'll use 4 TB drives or Blu-rays to back up archived movies. I'll use tapes to back up family photos.
And large cloud data providers such as Google or Amazon would be teetering on hard drive destruction without cheap tape backups. Google once lost a significant portion of Gmail data from runaway bad code, but recovered everything just by picking out a tape from the shelf (with a robot arm).
Read/Write Speeds
One of the big reasons that LTO drives are so expensive when sold new (but not used) is that these drives use probably some of the fastest motors ever developed, and read data at very high speeds.
Notice that with each increasing LTO level, the minimum drive read/write speed increases as well. LTO-3 requires 40MB/s, while LTO-4 requires 100MB/s, and that's just uncompressed.
This can be mitigated by providing enough GBs of RAM for buffering, since RAM can write data very quickly. We recommend about 2-4GBs of RAM for LTO-3, and 4-8GBs for LTO-4.
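To get a feel for what those buffer sizes buy you, here is a rough back-of-the-envelope sketch. The helper name `buffer_seconds` is made up for illustration, and it assumes the drive drains the buffer at a constant rate while the source disk is completely stalled, which is a simplification.

```shell
# buffer_seconds BUFFER_GB DRIVE_MB_PER_S
# Prints how many seconds a full RAM buffer can keep the drive streaming
# if the source delivers nothing at all (assumption: 1GB = 1024MB).
buffer_seconds() {
    awk -v gb="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", gb * 1024 / rate }'
}

buffer_seconds 2 40     # 2GB buffer feeding an LTO-3 drive at 40MB/s: prints 51.2
buffer_seconds 4 100    # 4GB buffer feeding an LTO-4 drive at 100MB/s: prints 41.0
```

In other words, the recommended sizes give the source disk well over half a minute to catch up before the drive runs dry.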
Any modern desktop computer with a PCI-E slot should easily work with LTO-3, 4, and 5. We do recommend ECC RAM for mission-critical data (standard on Xeon servers), which detects and corrects errors in the data held inside the RAM itself. But for movies, text, and images, you can probably live without error checking.
Data Migration
One aspect to watch out for is obsolescence.
It's not hard to fight obsolescence. Here's how you do it:
- Don't use proprietary software/standards. Stick to `tar`, which has been around forever and won't stop being supported on Linux. Or on LTO-5 and up, use LTFS, which looks like a standard hard drive filesystem. This way, you know what format your tapes use, and you can recover a piece even if the tape is sliced up (not ideal, but try that with a hard drive).
- Migrate your old tapes if you're upgrading to a new standard - You shouldn't be too worried about the arrival of the next LTO standard. Drives can read tapes from up to two generations back. In addition, a tape one generation up stores roughly 2 times the data, and a tape two generations up roughly 4 times, so your old backups will just take up less room upon transfer.
Alternatives
https://www.digitalrebellion.com/blog/posts/backup_options_for_filmmakers
- Hard drives are out of the question, since there's been big issues
LTO Types
Note that all standards LTO-4 and lower must use tarballs for storing data. There is no need to compress in software, since the drive does hardware compression on its own. LTO-5 and up support "partitioning" and can use LTFS, an open format which is incredibly easy to work with.
Blu-ray
The first step is to compare against the price of typical Blu-ray discs. It's not an exact comparison, but it's a good marker. With Blu-ray you do have to swap discs and use dvdisaster, but that is easily offset by how unintuitive tapes are to work with. Thus, to be practical, the tape price must beat Blu-ray's cost per gigabyte:
- $23 - Plexdisc 50 pack, 1,250GB, 1.84 cents per gigabyte, 54GB for $1, 46 cents per 25GB disc.
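The cost-per-gigabyte comparison is simple division, and a tiny helper makes it easy to redo with whatever prices you find. This is only a sketch: the function name `cents_per_gb` is made up here, and the prices are the ones quoted in this guide.

```shell
# cents_per_gb PRICE_IN_DOLLARS CAPACITY_IN_GB
# Prints the media cost in cents per gigabyte, rounded to two decimals.
cents_per_gb() {
    awk -v price="$1" -v gb="$2" 'BEGIN { printf "%.2f\n", price * 100 / gb }'
}

cents_per_gb 23 1250    # 50-pack of 25GB Blu-ray discs: prints 1.84
cents_per_gb 15 400     # one LTO-3 tape at native capacity: prints 3.75
```

Pass the compressed capacity instead of the native one if your data is known to compress well.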
LTO-3
The absolute minimum you should go for. It's 3.75 cents per gigabyte at $15, or $1 for 22-45GB. That's actually not that cost effective against Blu-rays (LTO-4 would be much better value), but if the data is compressible you can fit more, at least the drives are significantly cheaper, and what you get is volume and longevity over burned Blu-Rays.
I could in fact salvage one for Justin, get all the parts except for the drive (which he buys himself). Most importantly, a second PCIe to SCSI card from a Dell PowerEdge.
- Examples
- https://www.youtube.com/watch?v=zmYq71gKDd8
- https://www.reddit.com/r/freenas/comments/30cbpd/any_one_else_backup_their_server_to_lto_tape/
- Configuration Guide for Linux
- Parts - Rings up to about $50-70, depending on which items you already have.
- Drive Comparison
- $33 HP StorageWorks 960 - This drive is dirt cheap and uses SCSI (68-pin Internal).
- $30 - LSI Logic PCI-E to SCSI Adapter - This is used to connect to the drive in question. If you want to be extra cheap, find a 2000s era computer and buy a PCI-X SCSI card for just $10.
- I'm going to need to find more of these at the scrapyard, they're so valuable. Find at least two more.
- $10 Internal SCSI Cable with built-in Terminator - If you're using an internal drive, this is what you need.
- $10 68pin to VHDCI SCSI cable - If you're lucky enough to have a drive equipped to work externally (with its own terminator), this cable lets you wire the drive to the small VHDCI port on the SCSI board.
- $0 - Molex Power Supply - Use a good ol' PC Power supply to fire it up.
Other LTO Types
- LTO-4 - The best value when it comes to media, at 1.85 cents per gigabyte, and $1 for 40-80GB. This handily beats Blu-ray. The drives are more expensive than LTO-3 at $200, but if you bid you might get lucky with $50-100. At least you're not scraping the bottom of the barrel.
- Think about it: you get double the storage at the same price as LTO-3, so you would save $15 per tape. Weigh that against the cost of an LTO-4 drive.
- LTO-5 - $1 for 75-150GB - At this point, the price of drives starts to climb up significantly, to $500. Whether that will pay for itself in a year is your challenge. Hopefully we can find it on a Dell PowerEdge blade, but don't push your luck.
- LTO-6 Photographers and Moviemakers have been using external LTO drives for years.
Silicon Mechanics Server
The best way is to get a mini SATA server to run the tape library. This little blade server is the ideal system.
A few upgrades will be necessary:
- 2TB Hard Drive - The Hard Drive functions as the data cache, while the data is being written over.
- 4Gigabit Fiber Channel HBA - I already have one of these, and
- LTO-5 Effect - It looks like LTO-5 will only saturate at about 150MB/s, so a 4Gb/s link (roughly 400MB/s) is more than enough.
Usage
256k Block Size
For LTO-4 and up, only use `dd` with the block size specified as `bs=256k`. HP's reference manual (quoted below) recommends blocks no larger than 256 KB, because the HP-UX tape driver subdivides anything larger into 256 KB pieces anyway, so we stick to that size here as well.
- "A block size no larger than 256 KB (262144 bytes) is strongly recommended when working with HP-UX and tape or VTL devices. Backup applications should be configured to work with I/O block sizes that are no larger than 256 KB. Please check your application documentation to find out how to check or configure block sizes used for transfers to and from tape or VTL devices.
- This is because, by default, the HP-UX stape driver processes a block size larger than 256 KB by subdividing it into 256 KB blocks for writing to tape (giving a net effect of 256 KB I/O transfers)
- For example a 1 MB block (1048576 bytes) is written to tape as four 256 KB blocks. During restore, stape attempts to reconstruct the original block size that was larger than 256 KB with the 256 KB blocks from tape. This subdivision and subsequent reconstruction process of block sizes larger than 256 KB adds unnecessary complexity and risk to tape positioning and restore operations and offers no net gain in terms of increased block size. It should therefore be avoided."
- Source: HP LTO Ultrium 6 Tape Drive Reference Manual, Volume 5 p. 6 (Visit HP Enterprise for access)
Prerequisites
Make sure you have chosen your LTO type. Then, get a Host Bus Adapter that is compatible with your chosen LTO drive, and wire it up following one of the directions below:
Dependencies
`tar` and `mt` come by default with most Linux systems. However, you will want the `mt-st` and `mtx` packages, which provide more detailed drive information (`tapeinfo`) and control of robotic autochangers (if equipped).
We will also want `pv` to view progress, and `mbuffer` to buffer the data before pushing.
sudo apt-get install mtx mt-st pv mbuffer
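After installing, a quick check confirms that every tool used later in this guide is actually on the `PATH`. This is a minimal sketch; `missing_tools` is a made-up helper name, and `tapeinfo` is assumed to be shipped by the `mtx` package as on Debian and Ubuntu.

```shell
# missing_tools TOOL...
# Prints the name of every listed tool that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# No output means everything this guide relies on is installed.
missing_tools tar mt mtx tapeinfo pv mbuffer
```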
Kernel Module
You will need to load the kernel module `st` before the system is able to detect tape drives.
This will enable it temporarily:
sudo modprobe st
However, we want to enable it on every boot. Check how to do so on your distribution. On Ubuntu 14.04, just add `st` on its own line to `/etc/modules`.
mtx Tapeinfo
We want to figure out specific information about the tape drive we're using, so let's run the following command to get a status report:
# tapeinfo -f /dev/nst0
Product Type: Tape Drive
Vendor ID: 'HP '
Product ID: 'Ultrium 3-SCSI '
Revision: 'G63W'
Attached Changer API: No
SerialNumber: 'HU1061001U'
MinBlock: 1
MaxBlock: 16777215
SCSI ID: 3
SCSI LUN: 0
Ready: yes
BufferedMode: yes
Medium Type: Not Loaded
Density Code: 0x44
BlockSize: 0
DataCompEnabled: yes
DataCompCapable: yes
DataDeCompEnabled: yes
CompType: 0x1
DeCompType: 0x1
BOP: yes
Block Position: 0
Partition 0 Remaining Kbytes: 400308
Partition 0 Size in Kbytes: 400308
ActivePartition: 0
EarlyWarningSize: 0
NumPartitions: 0
MaxPartitions: 0
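If you only care about one or two fields of that report (say, whether compression is on), a small filter saves some squinting. This sketch assumes `tapeinfo` prints one `Name: value` pair per line, as in the report above; `tapeinfo_field` is a made-up helper name.

```shell
# tapeinfo_field NAME
# Reads a tapeinfo report on standard input and prints the value of NAME.
tapeinfo_field() {
    awk -F': ' -v key="$1" '$1 == key { print $2 }'
}

# Usage (as root, with a drive attached):
#   tapeinfo -f /dev/nst0 | tapeinfo_field DataCompEnabled
```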
tar and mt
LTO-4 drives and lower cannot use LTFS, so we store data as classic UNIX tarballs. Yeah, you got that right: `tar` stands for Tape ARchive, and is a sequential archive format as a result.
You will also need the `mt` command (our version is provided by the package `mt-st`) to operate the drive, for example to check its status or rewind.
/dev/st0 and /dev/nst0
The major difference is that after performing a task, `/dev/st0` rewinds to the beginning of the tape, while `/dev/nst0` does not. Normally `/dev/nst0` is used for daily backups appended to the same tape, and `/dev/st0` for weekly or monthly backups that each get a tape of their own.
- Use `/dev/st0` if you only want to push a single directory to fill the entire tape (it rewinds upon completion).
- Use `/dev/nst0` if you want to add multiple directories to the tape incrementally.
Note: Whenever you use `/dev/st0`, the system automatically rewinds the tape as soon as the command completes.
Operation
Note: Run all commands as root.
Before loading a new tape (at least for an HP LTO-3 drive), you need to run the following to set up `mt-st`:
# mt -f /dev/nst0 stsetoptions scsi2logical
If you've already written data to the tape, make sure to wind to the end of the recorded data first, otherwise existing data will be overwritten. On a new tape, the drive simply stays at block 0. (Alternatively, if you are fine with overwriting the whole tape, you can start from the beginning.)
# mt -f /dev/nst0 eod
Figure out what block you are at:
# mt -f /dev/nst0 tell
Now send a folder to the tape drive using tar:
Tip: We don't need to set software compression with tar, because the LTO drive will do hardware compression on its own.
# tar -cvf /dev/nst0 /path/to/dir
However, even an LTO-3 tape drive requires 40-80MB/s of sustained RW speeds or it will undergo a "shoe shining" effect, where not enough data is being pushed to fill the LTO drive's buffer, and the head has to stop or rewind the tape to wait. This can cause serious wear to both tape and drive.
Although today's hard drives are just barely able to push 50-80MB/s, there's a better method: buffer to RAM.
Buffered Writes
A better method is to buffer the data to 2-4GB RAM before pushing (of course, this means you need that much free RAM).
We also want to make checksums of the tar to ensure that everything went well.
Here's a good one-liner (check the source link for more info about what it does). First, go to the directory where you want to store the write logs.
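In the following command, enter a name in `bkname="name-of-backup-YYYY-MM-DD"`, then enter the folders/files to back up in `tobk="/home /var/www"`.
Caution: `$tobk` is expanded unquoted, so the shell splits it on whitespace. Quotes placed inside the string (as in `tobk="'/mnt/extdisk/Disc 7/ISOs' /var/log"`) are not interpreted and will not protect a path that contains spaces, so rename or symlink such folders to space-free paths first. Finally, run it as root in Bash, since it relies on process substitution (`>(...)`).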
bkname="001_external-backup-items_2016-02-25"; tobk="/mnt/extdisk/Backup" ; totalsize=$(du -csb $tobk | tail -1 | cut -f1) ; tar cvf - $tobk | tee >(sha512sum > $bkname.sha512) >(tar -tv > $bkname.lst) | mbuffer -m 3G -P 100% | pv -s $totalsize -w 100 | dd of=/dev/nst0 bs=256k
The buffered writes command we used above writes a `tar` archive with a block size of 256k (`bs=256k`), so don't forget it. Write this block size on your tape case, so you can use it later when reading.
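If you would rather have this as a reusable function that copes with spaces in folder names, something along these lines could work. It is only a sketch, not the method from the source link: the name `write_record` is made up, the `mbuffer` and `pv` stages are left out for brevity (they can be inserted in front of `dd` exactly as in the one-liner), and the `.lst` file holds plain file names from `tar -v` rather than a long listing.

```shell
# write_record DEVICE NAME PATH...
# Streams PATH... as one tar archive to DEVICE in 256k blocks, and writes
# NAME.sha512 (checksum of the stream) and NAME.lst (names of archived files).
write_record() {
    device=$1
    name=$2
    shift 2

    # A named pipe lets sha512sum read a copy of the stream without needing
    # Bash-only process substitution.
    fifo=$(mktemp -u "${TMPDIR:-/tmp}/tapesum.XXXXXX")
    mkfifo "$fifo" || return 1
    sha512sum < "$fifo" > "$name.sha512" &
    sum_pid=$!

    # "$@" keeps every path as a single argument, so spaces are safe.
    # With the archive on stdout, tar -v prints file names (and any
    # warnings) on stderr, which we capture as the listing.
    tar -cvf - "$@" 2> "$name.lst" | tee "$fifo" | dd of="$device" bs=256k 2>/dev/null
    status=$?    # exit status of dd, the last stage of the pipeline

    wait "$sum_pid"
    rm -f "$fifo"
    return "$status"
}

# Usage (as root, tape positioned at end of data):
#   write_record /dev/nst0 "001_external-backup-items_2016-02-25" "/mnt/extdisk/Disc 7/ISOs" /var/log
```

Passing the device as an argument also makes it easy to rehearse against an ordinary file before touching a real tape.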
Tip: Make sure to put the resulting `.lst` and `.sha512` files onto a small flash drive and/or CD-R kept alongside the LTO tape (consider online storage as a third copy). It will be extremely helpful, if not crucial, to refer to these the next time you have to extract data, so they need to be accessible.
Tip: Always number the `bkname` above, so that you know the order of the files on the tape. Also write down a sequential, numbered list of each record you've pushed to the tape (the plastic case usually has a notepad). That way you'll be able to figure out which record to access by that same index number.
Tip: If at all possible, have a primary hard drive where the OS can store the generated logs and programs (e.g. a small SSD), and keep your backup data on a separate hard drive. This way you can run this command from a directory on the primary hard drive, to reduce lag.
Warning: `mbuffer` will take up 3GB of RAM in the above example. If you have less than 3GB of free RAM, performance will actually suffer, not improve. Reduce the buffer size to fit your RAM, or buy more RAM.
Tip: If the data you want to push is already in `.tar` format (no compression), replace `tar cvf - $tobk` with `cat $tobk`. If the data is in `.tar.gz` format (gzip compression), replace it with `zcat $tobk`. If the data is in `.tar.xz`, use `xzcat $tobk`, and for `.tar.bz2` use `bzcat $tobk`.
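The choice of command in the tip above depends only on the file extension, so it can be picked automatically. A small sketch, with `stream_cmd` being a made-up helper name:

```shell
# stream_cmd FILE
# Prints the command that turns FILE into an uncompressed tar stream.
stream_cmd() {
    case "$1" in
        *.tar.gz|*.tgz) echo zcat ;;
        *.tar.xz)       echo xzcat ;;
        *.tar.bz2)      echo bzcat ;;
        *.tar)          echo cat ;;
        *)              echo "unsupported archive type: $1" >&2; return 1 ;;
    esac
}

# Usage, in place of the "tar cvf - $tobk" stage of the one-liner:
#   "$(stream_cmd "$file")" "$file" | dd of=/dev/nst0 bs=256k
```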
- Command Line Fu - Backup to LTO Tape with Progress, Checksums, and buffering
- http://dampfnudel.blogspot.com/2007/10/tape-backups.html
Once you have multiple records on the tape, you can navigate between them using the following commands:
Forward 1 record:
# mt -f /dev/nst0 fsf 1
Go to previous record:
# mt -f /dev/nst0 bsfm 1
Go to the last written byte of the tape:
# mt -f /dev/nst0 eod
Caution: If you're appending a record to a previously written tape, make sure to wind to the last written byte of the tape (`eod`), or you will overwrite existing data.
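If you numbered your records as suggested earlier, jumping straight to record N is just a rewind followed by skipping N-1 file marks. The sketch below only prints the commands rather than running them; `seek_commands` is a made-up helper name, and it assumes every record on the tape is followed by exactly one file mark.

```shell
# seek_commands N [DEVICE]
# Prints the mt commands that position the tape at the start of record N
# (1-based, matching the numbered bkname prefix such as 003).
seek_commands() {
    device=${2:-/dev/nst0}
    # awk treats "003" as the decimal number 3, avoiding octal surprises.
    skip=$(awk -v n="$1" 'BEGIN { print n - 1 }')
    echo "mt -f $device rewind"
    if [ "$skip" -gt 0 ]; then
        echo "mt -f $device fsf $skip"
    fi
}

seek_commands 003
# prints:
#   mt -f /dev/nst0 rewind
#   mt -f /dev/nst0 fsf 2
```

Check the result with `mt -f /dev/nst0 status` before reading, since the reported file number should match what you expect.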
Reading Tape Archives
Since we piped in data using `dd`, we might as well pipe out data using it, with the same block size (`bs=256k`). Here's how you list all files in one section of the tape (navigate to your desired section first):
dd if=/dev/nst0 bs=256k | tar tvBpf -
Once you know that this section is what you want, move back to its start (`mt -f /dev/nst0 bsfm 1`), and extract with the following command (it drops the files in the current directory):
dd if=/dev/nst0 bs=256k | tar xvBpf -
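The `.sha512` file saved at write time can also be used to check a record without extracting it. A minimal sketch, assuming the checksum file was produced by `sha512sum` reading the tar stream from a pipe (as in the write one-liner) and that the tape is already positioned at the start of the record; `verify_record` is a made-up helper name.

```shell
# verify_record DEVICE SUMFILE
# Reads one record from DEVICE in 256k blocks and compares its SHA-512
# with the hash stored in SUMFILE. Succeeds only if they match.
verify_record() {
    device=$1
    sumfile=$2
    expected=$(cut -d' ' -f1 "$sumfile")
    actual=$(dd if="$device" bs=256k 2>/dev/null | sha512sum | cut -d' ' -f1)
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Usage (as root):
#   verify_record /dev/nst0 001_external-backup-items_2016-02-25.sha512 && echo OK
```

Reading the record moves the tape forward, so reposition before extracting.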
Ejecting the Tape
Note: We strongly recommend that the tape is ejected when it is not in use. Otherwise, dust may enter the cartridge. The drive will also have to wind the motor every hour or so, to keep the tape taut. The LTO Drive can run pretty darn fast, so it won't take much time to wind forward.
Finally, when you are finished with the tape, rewind and eject it with the following command:
# mt -f /dev/st0 offline
Hardware Compression
The LTO Drives support basic, but very fast hardware compression algorithms, burned into the chips of the tape drive. This way, if you have tons of text logs to push, you don't have to compress your data before putting it in the drive, the drive will do it for you. You also don't have to decompress afterwards.
For the most part, the LTO drive has compression on by default, but if it is off for some reason, run the following commands to enable it.
Enable compression until drive is turned off:
# mt -f /dev/nst0 compression 1
Enable compression for all writes by default:
# mt -f /dev/nst0 defcompression 1
Note that only some kinds of data compress well enough to reach the advertised 2:1 ratio (which is how a 400GB tape gets labelled as 800GB). This includes text, logs, documents, etc.
Since images (jpg, png) and video (mp4, mpeg) are already compressed, further compression is much less effective. In fact, due to the primitive compression algorithm used, it might even make already-compressed data bigger, by as much as 15%.
As a result, you might want to consider disabling hardware compression whenever you're storing such data, or using your own software compression with `gz` or `xz` (with pigz or parallel xz for speed; modern CPUs should handle it fine). Before you use these, make sure to turn hardware compression off:
Disable compression until drive is turned off:
# mt -f /dev/nst0 compression 0
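Disable compression for all writes by default (in `mt-st`, a value of `0` sets the default to off, while `-1` only clears the driver's default setting):
# mt -f /dev/nst0 defcompression 0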
Notice that the tape drive can automatically detect HW compressed data on an existing tape and automatically enable it. Thus, if you aren't using a fresh tape, you'll need to relabel it accordingly.
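To decide whether compression is worth having on for a given batch of files, you can test a sample with `gzip` first. This is only a rough proxy, since the drive uses its own hardware algorithm rather than gzip, and `gzip_percent` is a made-up helper name.

```shell
# gzip_percent FILE
# Prints the gzip-compressed size of FILE as a whole-number percentage of
# its original size. Around 50 or less suggests the data compresses well;
# close to 100 means it is effectively incompressible.
gzip_percent() {
    original=$(wc -c < "$1")
    [ "$original" -gt 0 ] || return 1    # avoid dividing by zero on empty files
    compressed=$(gzip -c "$1" | wc -c)
    awk -v o="$original" -v c="$compressed" 'BEGIN { printf "%d\n", c * 100 / o }'
}

# Usage:
#   gzip_percent /var/log/syslog
#   gzip_percent holiday-video.mp4
```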
https://wiki.zmanda.com/index.php/Hardware_compression
Sources
- Ubuntu Forums - mt-st HP LTO Input/Output Error
- http://www.cyberciti.biz/faq/linux-tape-backup-with-mt-and-tar-command-howto/
- http://www.linuxquestions.org/questions/linux-hardware-18/difference-between-dev-st0-and-dev-nst0-493760/
LTFS
LTO-5 and later generations support tape partitioning, which the open source LTFS filesystem builds on. LTFS makes the tape look like an ordinary filesystem, rather than a series of tape archives back to back, which makes it easier to deal with.
The flip side of this is that tape is still a sequential medium, so beware if you try to use it like a disk. Avoid random reads and just push and pull data sequentially.
Sources
LTO Autoloaders using MTX
http://surf.ml.seikei.ac.jp/~nakano/dump-restore/dump-restore-mini-HOWTO.en.html#ss2.2