LTO Tape

From Bibliotheca Anonoma
Revision as of 22:09, 30 January 2018 by Antonizoon (talk | contribs)

LTO Tapes are hands down the best and cheapest way to archive large amounts of data.

Using LTO Tapes can be a bit of a challenge, but if you're willing to use some simple Linux commands you can do it. We've detailed the whole process of buying some tape drives and tapes, setting them up with a normal desktop with PCI-E (or better yet an ECC RAM Server), and how to actually write to the tapes.

It was a tremendous challenge to build this guide, but thankfully it won't be so hard for you once you read through the details below.

How tapes work: https://books.google.com/books?id=rGjkBQAAQBAJ&lpg=PA76&ots=DHpi1-f7K1&dq=decompress%20before%20lto%20tape&pg=PA75#v=onepage&q=decompress%20before%20lto%20tape&f=false

Introduction

Store anywhere from 500GB up to 6.25TB of information on a $35 tape. That's roughly 1-2 cents per gigabyte! LTO tapes are a cheap and long-lasting data archival solution for video editors, webserver admins, or online archivists.

Here's a great overview of the LTO format and how it benefits video production, or virtually anyone who needs to safely store hundreds of gigabytes of data for decades to come.

Choosing an LTO Type

Since LTO drives were designed for the server market, it is difficult to buy one as an individual, let alone for less than $100.

On the other hand, organizations toss out waves of old electronics all the time, so you can easily find an LTO-3 (400/800GB) tape drive for $50 on eBay, and even less at your local surplus store. This is more than enough for a small-time archivist, and you can upgrade anytime. So choose the LTO level that fits your needs:

  • LTO-3 (400/800GB)
    • Tapes: $15 each
    • Drives: ~$50-80 (used)
  • LTO-4 (800GB)
    • Tapes: $20 each
    • Drives: ~$150-375 (used)
  • LTO-5 (1.5/3TB)
    • Tapes: $30-40 each
    • Drives: ~$300 (used), ~$1,000-2,000 (new)
  • LTO-6 (2.5/6.25TB)
    • Tapes: $50 each
    • Drives: $500 (used), ~$3,500 (new)
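
To get a feel for which generation fits, it helps to estimate how many tapes a given data set would need. This is a minimal sketch that assumes native (uncompressed) capacities; tapes_needed is a hypothetical helper name, not part of any tape tool:

```shell
# Number of tapes needed for a data set (rounds up to whole tapes).
# $1 = total data in GB, $2 = native tape capacity in GB
tapes_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

tapes_needed 2000 400    # 2TB of data on LTO-3 tapes: prints 5
tapes_needed 2000 800    # the same data on LTO-4 tapes: prints 3
```

Multiply the result by the tape price and add the drive price to compare generations for your own data volume.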

Here's a comparison chart.

Hooking up an LTO Drive

Now that you have an LTO drive and some tapes, you need special adapters and cables to hook its server-grade SAS interface up to a typical computer. Unfortunately, this can be somewhat difficult. But that's why this guide exists.

  • Desktops - It is significantly easier and cheaper to find a PCI adapter card than a laptop one, since many servers are based on PCI-Express. And it's not like your tape drive or library is portable anyway, so grab a cheap desktop and plug in one of these cards.
    • SCSI
    • SAS to PCI - An expansion card that gives your computer the ability to use SAS drives.
      • HP Smart Array P411 - A great card for desktops, if you can find one. HP makes a wide range of variants, so check the Model Comparison list under it for more info. Most of these cards require Full size SAS to Mini SAS adapters, which should only cost an extra $20.
      • HP Smart Array P400 - An older generation of the above. Since we're no server junkies, this should be good enough for most people.
    • $20 - SPlusDirect - HP Smart Array P400 RAID
  • Laptops - It can be a challenge to connect SAS to a laptop.
    • SCSI - Not sure how to interface with this ancient protocol.
    • SAS to Expresscard - The most popular method. Instructions here.
    • $3,500 - MLogic MTape LTO-6
    • Sal Guarisco's LTO Drive Connection System
    • SAS to PCI to PE4C Bridge - Info here? If you happen to have a PE4C bridge from an eGPU, the best method is to grab one of the HP SmartArray desktop PCI cards and plug it into the Expresscard bridge. Otherwise, don't bother, a PE4C costs $90.
  • Thunderbolt - If you have a recent Macbook, a Mac Mini, or even some high-end gaming desktop motherboards, they all come with Intel Thunderbolt, a convenient PCIe plug.
    • But if you don't need a whole desktop, get Intel's Thunderbolt-equipped Next Unit of Computing (DC3217BY for $225 kit, D33217CK for $140 board-only), a full Desktop on a chip+CPU. It's the exact same hardware as a Mac Mini, just cheaper.
    • SAS to Thunderbolt - The fastest, easiest method.

And no, you can't just hook up SAS to a SATA port. They are different protocols.

Storing Data

Now that your drive is set up, what format do you store data in?

We strongly recommend that you use free open-source tools and filesystems, such as LTFS. Everything else costs thousands of dollars, and might not last as long as your tapes will.

  • LTFS

Tape Tips

Remember that Tapes are a sequential access technology, unlike random-access hard drives. Therefore, tapes should be used for archival, not day-to-day usage.

You dump your massive files in, and put the tape on the shelf when it's full. If you ever need to access the files again (perhaps after your hard drive got corrupted, or just to look at previous records), that's when you reach for the tape.

Why use Tapes for Archival/Backup?

Why a Tape Backup System is still popular

https://youtu.be/RIDa2EobXqo

LTO Tapes are beyond your ol' VHS tapes: they can last up to 30 years in a good air conditioned room, and within 5 years you'll probably upgrade two generations up in LTO once the drives get cheaper.

You could take a hammer to a tape cartridge and still rewind the tape onto a new reel. You'd lose whatever part got damaged, but everything else would be salvageable.

They survive drops and falls much better than 4TB hard drives. They're easy to send off site. LTO drives can also read tapes from up to two generations back.

Most of all, LTO-3 and LTO-4 tapes and drives are very inexpensive, at $10-15 for a 400-800GB tape, and around $100 for a good used drive.

There's a reason some very important stuff is backed up to Tape. I'll use 4 TB drives or Blu-rays to back up archived movies. I'll use tapes to back up family photos.

And large cloud data providers such as Google or Amazon would be teetering on hard drive destruction without cheap tape backups. Google once lost a significant portion of Gmail data from runaway bad code, but recovered everything just by picking out a tape from the shelf (with a robot arm).

Read/Write Speeds

One of the big reasons that LTO drives are so expensive when sold new (but not used) is that these drives use probably some of the fastest motors ever developed, and read data at very high speeds.

Notice that with each increasing LTO level, the minimum drive read/write speed increases as well. LTO-3 requires 40MB/s, while LTO-4 requires 100MB/s, and that's just uncompressed.

This can be mitigated by providing enough RAM for buffering, since RAM can take in and hand out data far faster than a hard drive. We recommend about 2-4GB of RAM for LTO-3, and 4-8GB for LTO-4.
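
As a rough back-of-the-envelope check, the time a full RAM buffer can keep the drive streaming is the buffer size divided by the drive speed. The sketch below assumes 1GB = 1000MB and uses a hypothetical helper name, buffer_seconds:

```shell
# Seconds a full buffer can feed the drive with no new data arriving.
# $1 = buffer size in GB, $2 = drive speed in MB/s
buffer_seconds() {
    echo $(( $1 * 1000 / $2 ))
}

buffer_seconds 3 40     # 3GB buffer, LTO-3 at 40MB/s: prints 75
buffer_seconds 4 100    # 4GB buffer, LTO-4 at 100MB/s: prints 40
```

In practice the source disk keeps refilling the buffer, so the buffer only has to cover the stretches where the disk falls behind.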

Any modern desktop computer with a PCI-E slot would easily work with LTO 3, 4, and 5. We do recommend ECC RAM for mission critical data (standard on Xeon server blades), which performs error checking on the data stored inside the RAM itself. But for movies, text, and images, you can probably live without it.

Data Migration

One aspect to watch out for is obsolescence.

It's not hard to fight obsolescence. Here's how you do it:

  1. Don't use proprietary software/standards. Stick to tar, which has been around forever and won't stop being supported on Linux. Or on LTO-5 and up, use LTFS, which looks like a standard hard drive filesystem. This way, you know what format your tapes used and you can recover a piece even if they are sliced up (not ideal, but try that with a hard drive).
  2. Migrate your old tapes if you're upgrading to a new Standard - You shouldn't be too worried about the coming of the next LTO standard. They have read backwards compatibility up to two generations back. In addition, an LTO tape one generation up can store 2 times the data, and 4 times for two generations up. Your old backups will just get smaller upon transfer.

Alternatives

https://www.digitalrebellion.com/blog/posts/backup_options_for_filmmakers

  • Hard drives are out of the question, since there's been big issues

LTO Types

Note that all standards LTO-4 and lower must use tarballs for storing data. There is no need to compress, since the drive does hardware compression on its own. LTO-5 and up support "partitioning" and can utilize LTFS, an open backup format which is incredibly easy to work with.

Blu-ray

The first step is to compare against the price of typical Blu-ray discs. It's not an exact comparison, but it's a good marker. With Blu-ray you do have to swap out discs and use dvdisaster, but that is an easy trade against the non-intuitiveness of tapes. Thus, to be practical, the tape price must beat Blu-ray's cost per gigabyte.

  • $23 - Plexdisc 50 pack, 1,250GB, 1.84 cents per gigabyte, 54GB for $1, 46 cents per 25GB disc.

LTO-3

The absolute minimum you should go for. It's 3.75 cents per gigabyte at $15, or $1 for 22-45GB. That's actually not that cost effective against Blu-rays (LTO-4 would be much better value), but if the data is compressible you can fit more, at least the drives are significantly cheaper, and what you get is volume and longevity over burned Blu-Rays.
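
The cost-per-gigabyte figures used in this comparison are simple divisions, so you can redo them with current prices. A minimal sketch, with cents_per_gb as a hypothetical helper name:

```shell
# Media cost in cents per gigabyte.
# $1 = price in dollars, $2 = capacity in GB
cents_per_gb() {
    awk -v price="$1" -v gb="$2" 'BEGIN { printf "%.2f\n", price * 100 / gb }'
}

cents_per_gb 15 400     # LTO-3 tape at $15: prints 3.75
cents_per_gb 23 1250    # 50-pack of 25GB Blu-ray discs at $23: prints 1.84
```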

I could in fact salvage one for Justin, get all the parts except for the drive (which he buys himself). Most importantly, a second PCIe to SCSI card from a Dell PowerEdge.

Other LTO Types

  • LTO-4 - The best value when it comes to media, at 1.85 cents per gigabyte, and $1 for 40-80GB. This handily beats Blu-ray. The drives are more expensive than LTO-3 at $200, but if you bid you might get lucky with $50-100. At least you're not scraping the bottom of the barrel.
  • Think about it: you have double the storage at the same price as LTO-3, so you would save $15 per tape. Weigh that against the cost of an LTO-4 drive.
  • LTO-5 - $1 for 75-150GB - At this point, the price of drives starts to climb up significantly, to $500. Whether that will pay for itself in a year is your challenge. Hopefully we can find it on a Dell PowerEdge blade, but don't push your luck.
  • LTO-6 - Photographers and moviemakers have been using external LTO drives for years.

Silicon Mechanics Server

The best way is to get a mini SATA server to run the tape library. This little blade server is the ideal system.

Some upgrades will be necessary:

  • 2TB Hard Drive - The Hard Drive functions as the data cache, while the data is being written over.
  • 4Gigabit Fiber Channel HBA - I already have one of these, and
  • LTO-5 Effect - It looks like LTO-5 will only saturate at about 150MB/s, so a 4Gbit/s link is more than enough.

Usage

Prerequisites

Make sure you have chosen your LTO type. Then, get a Host Bus Adapter that is compatible with your chosen LTO drive, and wire it up following one of the directions below:

Dependencies

tar and mt come by default with all Linux systems. However, you will want the mt-st and mtx packages to provide more detailed tapeinfo, and control robotic autochangers (if equipped).

We will also want pv to view progress, and mbuffer to buffer the data before pushing.

sudo apt-get install mtx mt-st pv mbuffer

Kernel Module

You will need the st kernel module loaded (ideally on every boot) before you will be able to detect tape drives.

This will enable it temporarily:

sudo modprobe st

However, we want to enable it on every boot. Check how to do so. On Ubuntu 14.04 just add st on a line to /etc/modules.
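
One way to add that line without creating duplicates is a small idempotent helper. This is only a sketch; ensure_module_line is a hypothetical name, and the file path is whatever your distribution reads at boot:

```shell
# Append a module name to a modules file only if it is not already listed.
# $1 = module name, $2 = path to the modules file
ensure_module_line() {
    grep -qx "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# As root, on a system that reads /etc/modules:
#   ensure_module_line st /etc/modules
```

On systemd-based distributions the equivalent location is a .conf file under /etc/modules-load.d/.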

mtx Tapeinfo

We want to figure out specific information about the tape drive we're using, so let's run the following command to get a status report:

# tapeinfo -f /dev/nst0

Product Type: Tape Drive
Vendor ID: 'HP      '
Product ID: 'Ultrium 3-SCSI  '
Revision: 'G63W'
Attached Changer API: No
SerialNumber: 'HU1061001U'
MinBlock: 1
MaxBlock: 16777215
SCSI ID: 3
SCSI LUN: 0
Ready: yes
BufferedMode: yes
Medium Type: Not Loaded
Density Code: 0x44
BlockSize: 0
DataCompEnabled: yes
DataCompCapable: yes
DataDeCompEnabled: yes
CompType: 0x1
DeCompType: 0x1
BOP: yes
Block Position: 0
Partition 0 Remaining Kbytes: 400308
Partition 0 Size in Kbytes: 400308
ActivePartition: 0
EarlyWarningSize: 0
NumPartitions: 0
MaxPartitions: 0
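
If you want to use this report in a script (for example, to check whether a tape is loaded or compression is on), the fields can be picked out with awk. A minimal sketch, assuming the "Name: value" layout shown above; tapeinfo_field is a hypothetical helper name:

```shell
# Print the value of one field from tapeinfo output read on stdin.
# $1 = field name exactly as printed, e.g. "Medium Type"
tapeinfo_field() {
    awk -F': ' -v key="$1" '$1 == key { print $2; exit }'
}

# Example with a few lines copied from the report above:
printf '%s\n' 'Ready: yes' 'Medium Type: Not Loaded' 'DataCompEnabled: yes' |
    tapeinfo_field 'Medium Type'
```

With a real drive you would pipe the live report in instead: tapeinfo -f /dev/nst0 | tapeinfo_field DataCompEnabled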

tar and mt

LTO-4 drives and lower only support classic UNIX tarballs. Yeah, you got that right: tar stands for Tape ARchive, and is a sequential archive format as a result.

You will also need the mt command to operate the drive (our version provided by the package mt-st), such as checking status or rewinding.

/dev/st0 and /dev/nst0

The major difference is that after performing a task, /dev/st0 rewinds to the beginning of the tape, while /dev/nst0 does not. Normally we use nst0 for daily backups (leaving the same tape in the drive), and st0 for weekly/monthly backups (using a single tape for each backup).

  • Use /dev/st0 if you only want to push a single directory to fill the entire tape (it rewinds upon completion).
  • Use /dev/nst0 if you want to add multiple directories to the tape incrementally.

Note: The moment you use /dev/st0, when a command completes the system will automatically rewind it.

Operation

Note: Run all commands as root.

Before loading a new tape (at least for an HP LTO3 Drive), you need to run the following to set up mt-st:

# mt -f /dev/nst0 stsetoptions scsi2logical

If you've already written data to the tape, make sure to wind to the last written block, otherwise existing data could be lost. If this is a new tape, the drive will stay at 0.

(Alternatively, if you are fine with overwriting the whole tape, you can start from the beginning.)

# mt -f /dev/nst0 eod

Figure out what block you are at:

# mt -f /dev/nst0 tell

Now send a folder to the tape drive using tar:

Tip: We don't need to set software compression with tar, because the LTO drive will do hardware compression on its own.

# tar -cvf /dev/nst0 /path/to/dir
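
The three steps above (wind to the end, check the position, write) can be wrapped into one helper so that the seek is never forgotten. This is a sketch under stated assumptions: append_to_tape, TAPE and RUN are hypothetical names, and by default the helper only prints the commands instead of running them:

```shell
# Dry-run sketch: append one directory to the tape as a new record.
TAPE="${TAPE:-/dev/nst0}"   # non-rewinding device, as used above
RUN="${RUN-echo}"           # default "echo" only prints; run with RUN= to execute

append_to_tape() {
    $RUN mt -f "$TAPE" eod || return 1      # wind past the last written record
    $RUN mt -f "$TAPE" tell || return 1     # show the block we are starting at
    $RUN tar -cvf "$TAPE" "$1"              # write the directory as a tar record
}

append_to_tape /path/to/dir
```

Stopping on the first failed step matters here: if the seek to the end of data fails, writing anyway could overwrite existing records.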

However, even an LTO-3 tape drive requires 40-80MB/s of sustained RW speeds or it will undergo a "shoe shining" effect, where not enough data is being pushed to fill the LTO drive's buffer, and the head has to stop or rewind the tape to wait. This can cause serious wear to both tape and drive.

Although today's hard drives are just barely able to push 50-80MB/s, there's a better method: buffer to RAM.

Buffered Writes

A better method is to buffer the data to 2-4GB RAM before pushing (of course, this means you need that much free RAM).

We also want to make checksums of the tar to ensure that everything went well.

Here's a good one-liner (check the source link for more info about what it does). First, go to the directory where you want to store the write logs.

In the following command (which needs bash for the >( ) process substitution), enter a name in bkname="name-of-backup-YYYY-MM-DD", then list folders/files in the array tobk=(/home /var/www). If a folder name contains spaces, put single quotes around that entry: tobk=('/mnt/extdisk/Disc 7/ISOs' /var/log). The iflag=fullblock option makes dd collect a full 256k block from the pipe before each write. Finally, run it as root.

bkname="001_external-backup-items_2016-02-25"; tobk=(/mnt/extdisk/Backup); totalsize=$(du -csb "${tobk[@]}" | tail -1 | cut -f1); tar cvf - "${tobk[@]}" | tee >(sha512sum > "$bkname.sha512") >(tar -tvf - > "$bkname.lst") | mbuffer -m 3G -P 100% | pv -s "$totalsize" -w 100 | dd of=/dev/nst0 bs=256k iflag=fullblock

The buffered writes command we used above writes the tar archive with a block size of 256k (bs=256k). Write this block size on your tape case, so you can use it again when reading.

Tip: Make sure to put the resulting .lst and .sha512 files into a small flash drive and/or CD-R alongside the LTO tape. (consider online storage as the third backup) It will be extremely helpful, if not crucial, to refer to these next time you have to extract data, so they need to be accessible.

Tip: Always number the bkname above, so this way you will know the order of the files on the tape. Also write down a sequential, numbered list of each record you've pushed to the tape (the plastic case usually has a notepad). That way you'll be able to figure out which record to access by that same index number.
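
A small helper can keep the numbering consistent. This is a sketch following the naming pattern used in the example bkname; make_bkname is a hypothetical name:

```shell
# Build a numbered backup name: NNN_label_YYYY-MM-DD
# $1 = record number on this tape (no leading zeros), $2 = label,
# $3 = date (defaults to today)
make_bkname() {
    printf '%03d_%s_%s\n' "$1" "$2" "${3:-$(date +%Y-%m-%d)}"
}

make_bkname 1 external-backup-items 2016-02-25
# prints 001_external-backup-items_2016-02-25
```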

Tip: If at all possible, have a primary hard drive where the OS can store the generated logs and programs (e.g. a small SSD), and keep your backup data on a separate hard drive. This way you can run this command from a directory on the primary hard drive, to reduce lag.

Warning: mbuffer will take up 3GB of RAM in the above example. If you have less than 3GB of free RAM, performance will actually suffer, not improve. Reduce the buffer size to fit your RAM, or buy more RAM.
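
To pick a buffer size that fits the machine, you can derive it from the memory that is currently available. The rule used here (half of available memory, capped at 3GB) is an assumption chosen for illustration, and buffer_size_mb is a hypothetical helper name:

```shell
# Suggest an mbuffer size in MB: half of available memory, capped at 3072MB.
# $1 = available memory in kB (the unit used by /proc/meminfo)
buffer_size_mb() {
    half=$(( $1 / 2 / 1024 ))
    if [ "$half" -gt 3072 ]; then half=3072; fi
    echo "$half"
}

buffer_size_mb 2048000    # about 2GB available: prints 1000
buffer_size_mb 8000000    # about 8GB available: prints 3072
```

On kernels that report MemAvailable, the input can be read with awk '/^MemAvailable:/ { print $2 }' /proc/meminfo, and the result passed to mbuffer as -m followed by the number and an M suffix.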

Tip: If the data you want to push is already in .tar format (no compression), replace the tar cvf - stage at the start of the pipeline with cat followed by the path to the .tar file. If the data is in .tar.gz format (gzip compression), use zcat instead. For .tar.xz use xzcat, and for .tar.bz2 use bzcat.

Tape Record Navigation

Once you have multiple records on the tape, you can navigate between them using the following commands:

Forward 1 record:

# mt -f /dev/nst0 fsf 1

Go to previous record:

# mt -f /dev/nst0 bsfm 1

Go to the last written byte of the tape:

# mt -f /dev/nst0 eod

Caution: If you're appending a record to a previously written tape, make sure to wind to the last written byte of the tape (eod), or you will overwrite existing data.
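
If you kept a numbered list of records, jumping straight to record N is a rewind followed by skipping N-1 records. This sketch assumes each record ends with a single filemark (what the st driver writes by default when the device is closed after writing); seek_record, TAPE and RUN are hypothetical names, and by default the commands are only printed:

```shell
# Dry-run sketch: position the tape at the start of record N (1-based).
TAPE="${TAPE:-/dev/nst0}"
RUN="${RUN-echo}"           # default "echo" only prints; run with RUN= to execute

seek_record() {
    $RUN mt -f "$TAPE" rewind || return 1
    if [ "$1" -gt 1 ]; then
        $RUN mt -f "$TAPE" fsf $(( $1 - 1 ))    # skip the records before N
    fi
}

seek_record 3
```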

Reading Tape Archives

Since we piped in data using dd, we might as well pipe out data using it and the same block size (bs=256k) as well. Here's how you list all files in one section of the tape (navigate to your desired section first):

dd if=/dev/nst0 bs=256k | tar tvBpf -

Once you know that this section is what you want, rewind (mt -f /dev/nst0 bsfm 1), and extract with the following command (drops in the current directory):

dd if=/dev/nst0 bs=256k | tar xvBpf -
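
The .sha512 file saved by the buffered write command can be used to confirm that a record reads back intact. A minimal sketch, assuming the file holds the sha512sum output of the original tar stream; verify_stream is a hypothetical helper name:

```shell
# Compare a stream on stdin against the hash stored in a sha512sum file.
# $1 = path to the .sha512 file; returns 0 on match, non-zero otherwise
verify_stream() {
    expected=$(cut -d' ' -f1 "$1")
    actual=$(sha512sum | cut -d' ' -f1)
    [ "$expected" = "$actual" ]
}

# With the tape positioned at the start of the record:
#   dd if=/dev/nst0 bs=256k | verify_stream name-of-backup.sha512 && echo OK
```

This reads the whole record once without extracting anything, so it is a cheap check to run right after writing.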

Ejecting the Tape

Note: We strongly recommend that the tape is ejected when it is not in use. Otherwise, dust may enter the cartridge. The drive will also have to wind the motor every hour or so, to keep the tape taut. The LTO Drive can run pretty darn fast, so it won't take much time to wind forward.

Finally, when you are finished with the tape, rewind and eject it with the following command:

# mt -f /dev/st0 offline

Hardware Compression

The LTO Drives support basic, but very fast hardware compression algorithms, burned into the chips of the tape drive. This way, if you have tons of text logs to push, you don't have to compress your data before putting it in the drive, the drive will do it for you. You also don't have to decompress afterwards.

For the most part, the LTO drive has compression on by default, but if it is off for some reason, run the following commands to enable it.

Enable compression until drive is turned off:

# mt -f /dev/nst0 compression 1

Enable compression for all writes by default:

# mt -f /dev/nst0 defcompression 1

Note that only some kinds of data will compress well enough for 2:1 compression ratio (as they advertise to transform 400GB to 800GB). This includes text, logs, documents, etc.

Since images (jpg, png) and video (mp4, mpeg) are already compressed, further compression is much less effective. In fact, due to the primitive compression algorithm used, it might even make compressed data bigger, at 15% loss.

As a result, you might want to consider disabling hardware compression whenever you're storing such data, or using your own software compression: gz or xz algorithms (with pigz or parallel xz for speed, modern CPUs should handle it fine). Before you use these, make sure to turn compression off:

Disable compression until drive is turned off:

# mt -f /dev/nst0 compression 0

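Disable compression for all writes by default (in mt-st, -1 clears the driver's stored default rather than forcing compression off, so if the drive still compresses afterwards, try defcompression 0 instead):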

# mt -f /dev/nst0 defcompression -1
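
To judge whether a data set is likely to benefit from compression at all, you can compress a sample file in software and look at the size. gzip is only a rough stand-in for the drive's own algorithm, so treat the number as a hint; compressed_percent is a hypothetical helper name:

```shell
# Size of a file after gzip, as a percentage of its original size.
# $1 = path to a sample file
compressed_percent() {
    orig=$(wc -c < "$1")
    comp=$(gzip -c "$1" | wc -c)
    [ "$orig" -gt 0 ] || { echo 100; return; }
    echo $(( comp * 100 / orig ))
}

# Example: compressed_percent /var/log/syslog
```

A result near 100 suggests the data is already compressed (jpg, mp4 and the like), while text and logs usually land far lower.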

Note that the tape drive can detect hardware-compressed data on an existing tape and enable compression automatically. Thus, if you aren't using a fresh tape, you'll need to relabel it accordingly.

https://wiki.zmanda.com/index.php/Hardware_compression

Sources

LTFS

LTO-5 and later tapes support tape partitioning, with the open source LTFS filesystem. This makes the tape look like an ordinary filesystem, rather than tape archives back to back, making it easier to deal with.

The flip side of this is that tape is still a sequential medium, so beware if you try to use it like a disk. Avoid random reads and just push and pull data sequentially.
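
A typical LTFS session is format once, mount, copy sequentially, unmount. The sketch below assumes the open-source reference LTFS tools, where mkltfs takes the device with -d and ltfs takes it with -o devname=. The device path /dev/sg3 and mount point /mnt/ltfs are placeholders, ltfs_session, DEV, MNT and RUN are hypothetical names, and by default the commands are only printed:

```shell
# Dry-run sketch: format, mount, copy to, and unmount an LTFS tape.
# /dev/sg3 and /mnt/ltfs are placeholders; pick the real ones for your system.
DEV="${DEV:-/dev/sg3}"
MNT="${MNT:-/mnt/ltfs}"
RUN="${RUN-echo}"           # default "echo" only prints; run with RUN= to execute

ltfs_session() {
    $RUN mkltfs -d "$DEV" || return 1               # format (erases the tape)
    $RUN ltfs -o devname="$DEV" "$MNT" || return 1  # mount via FUSE
    $RUN cp -r "$1" "$MNT"/ || return 1             # one sequential copy
    $RUN umount "$MNT"                              # unmount when done
}

ltfs_session /path/to/dir
```

Copying one directory tree at a time keeps the writes sequential, which is what the drive is built for.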

Sources

LTO Autoloaders using MTX

http://surf.ml.seikei.ac.jp/~nakano/dump-restore/dump-restore-mini-HOWTO.en.html#ss2.2