What's Your Back-up Plan?
Volume Number: 23 (2007)
Issue Number: 12
Column Tag: Hardware
Why you must protect an ever-increasing volume of valuable data
by Mike Cobb
Have you ever been stuck in a traffic jam on the freeway and thought, "Gee, they really ought to widen this road and add a couple of lanes to handle all this traffic?" Except, you know what would happen. Those lanes would fill up with cars in no time and you'd be stuck again. Or, as someone said, "Nature abhors a vacuum."
That's sort of how it is with today's hard drives. As manufacturers have been able to create more and more digital capacity in less and less physical space, users (Mac users in particular) have eagerly poured in more and more data to fill that extra space, often right to the limit.
The problem is that it takes longer to back up these expanding data sets. Many individuals don't bother, and many companies don't budget adequately for the right tools to keep up with the volume.
Meanwhile, the drives themselves have become more intricate, with smaller components and tighter tolerances. So the risk of failure is increased. Drive failure has been a fact of life since the first computers, but with the capacity of today's machines and the propensity of users to use it all, the sheer volume of data at risk today is staggering.
Whether you edit home movies on your laptop for fun or manage a room full of servers for business, it's time for a back-up plan.
A data salvager's perspective
As a Director of Engineering at DriveSavers, I'm responsible for overseeing the data recovery process for Mac and Unix systems, which includes everything from iPods to Xserve RAID. We deal with the challenge of data overflow every day from both the end-user and the enterprise level.
When you've been doing this for a few years (13 in my case), you get an interesting perspective on how much data volume has grown. A typical data recovery job in 1994 involved a hard drive with storage capacity in the 20-40 megabyte range. For the recovery process, we used 240 MB hard drives to hold the data we recovered, and the average file count, including all the OS files as well as the user's data, was around 25,000 files per recovery. In those days, floppy disks were the primary back-up medium and the medium DriveSavers used to send customers their recovered data.
Today the average recovery for a Mac is 60 gigabytes and the average number of files is 160,000. Only 25% of the recoveries can go on CDs or DVDs. Most are sent back in new internal or external hard drives because of the large data sets and file sizes. In fact, a lot of our customers have files that are bigger than the 8-gig capacity of a dual-layer DVD. External drives, not CDs or DVDs, have become today's floppies.
Of course, the Mac world is different from the PC world, as you might expect. The average PC recovery is 10-15 GB, so most of those recoveries go out on DVDs. On the Mac side we work for a lot of creative professionals like photographers, filmmakers and audio engineers who are creating huge files with applications like Pro Tools and Final Cut Pro. It sounds like a cliche straight out of the "Mac vs. PC" commercials, but it's true. Mac and PC users have one thing in common, however: they keep all their e-mail and e-mail attachments forever.
Drive failure happens. Here's why.
So how do all these cluttered drives end up in our shop? Drive failure is inevitable, and its causes are many. A few are extreme. We have recovered data from computers that have been dropped, run over, burned, drowned and shot. But those "disk-asters", as we call them, are the exception. The everyday causes of drive failure are more mundane: breakdowns in the inner workings of the drives themselves, brought on by the very complexity that makes them so powerful.
Many drives come out of the factory with some kind of defect that will eventually surface. The average service life of a drive these days is 3-5 years. Drive manufacturers claim the failure rate is about 1% of all drives in use per year, but some independent estimates put it as high as 4% and even up to 13%.
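To put those annual rates in perspective, here is a rough sketch in Python of how they add up over a drive's service life. It assumes each drive fails independently at a constant annual rate, which is a simplification made for illustration and not a DriveSavers figure; the function name is hypothetical.

```python
def chance_of_any_failure(annual_rate, drives=1, years=1):
    """Probability that at least one of `drives` fails within `years`.

    Assumption (for illustration only): failures are independent and the
    annual failure rate stays constant over the whole period.
    """
    survive_one = (1.0 - annual_rate) ** years   # one drive survives every year
    survive_all = survive_one ** drives          # every drive survives
    return 1.0 - survive_all


# The three annual rates quoted above, applied to one drive over five years.
for rate in (0.01, 0.04, 0.13):
    p = chance_of_any_failure(rate, drives=1, years=5)
    print(f"{rate:.0%} per year -> {p:.1%} chance of failure within 5 years")
```

Raising `drives` shows how quickly the odds climb for a room full of machines, which is the practical reason to plan for failure instead of hoping to avoid it.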
Just as it can't be avoided, neither can it be predicted. There have been various efforts at "smart" failure prediction, but a majority of drive failures happen immediately, like bad accidents, without warning. When the drive heads suddenly decide they're going to crash into a spinning platter, no one can see it coming.
Simple electrical failure caused by a few inherently bad sectors used to be the primary cause of failure. Nowadays, we see less electrical failure and more physical media damage resulting from the tight packing of ever-shrinking, fast-moving mechanical parts, especially the head-to-platter surface interface.
Power surges are a common cause, too. They're especially bad for the users who are conscientious about backing up, because their back-up drives are usually plugged into the same power source as their main ones.
Here's one that surprises people: hard drives are sensitive to altitude. They have a higher rate of failure over 10,000 feet, even in pressurized airplane cabins where every other person seems to have a laptop or an iPod. In an unpressurized environment, like a mountaintop in the Andes, they simply won't function over 10,000 feet.
User error is a less common cause of failure, but it certainly happens. It might be as simple as unplugging a FireWire or USB drive without first "ejecting" it. You might even get away with it nine times and get lulled into thinking nothing's going to happen, until the tenth.
At the enterprise level, we repeatedly hear from IT managers who thought that, in a RAID server, back-up was "built in" (if one drive failed, another would take over), never anticipating that a second drive failure would crash the entire system. In fact, when one drive fails the remaining drives start working overtime, running faster and hotter than normal, increasing the risk of complete failure.
IT support people often call DriveSavers, too, after they've done a reinstall on a desktop system to try to eradicate some kind of corruption, only to find there were some crucial documents that the user didn't back up to the server.
People ask us how they can prevent their disks from failing. The short answer is you can't. It's not a question of if, but when. The more pertinent question, though, is, how can you prevent your data from being lost, or avoid going through the downtime and costs that a recovery entails? Based on my experiences, there's only one answer.
Back up. Back up. And back up some more.
Being told you need to back up regularly is kind of like having the dentist tell you that you need to floss. You know it's true. You vow to be better about it. You have the means and every intention. But you forget, or you put it off. And next thing you know, you're getting a root canal.
Cost used to be an impediment. After paying for a computer, who really wanted to shell out the money for an extra hard drive? But external drives are comparatively cheap now. We see more and more of them coming in for data recovery. The reason? Users are buying external drives for back-up, and then they wind up using them for data overflow. So, the data on their external drives is just as much at risk as that on their computers. The only data that is not at risk is data that has been backed up.
So, what's the best back-up system for the heavy hobbyist or small creative business? The answer is the one you're most likely to use: if it encourages you to back up instead of discouraging you, it's right for you. If you have large files of photos, movies, music and the like, CDs or DVDs are simply not a practical option. The handling and storage of them is also a bit cumbersome. Tape back-up was once the standard in business, but nowadays it's costly and slow compared to other options and does not give you the flexibility to restore on other systems. FireWire or USB external drives are the way to go, provided you're not tempted to use that extra capacity for your data overflow. If that's inevitable, you need to buy another drive. If you can't be bothered to remember to back up on a regular basis, there are programs you can buy to schedule automatic backups.
Of course, in a creative business, you probably have very large data sets and a large number of files, possibly more than you can back up in one night. And if your back-up time cuts into your work time, that translates to downtime. One solution is to upgrade your network to gigabit Ethernet. It's ten times faster than 100BASE-T Ethernet, currently the standard in many businesses, and you can transfer as much as two gigabytes per minute (vs. 200 MB).
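To see whether a data set fits into an overnight window, the arithmetic can be sketched in a few lines of Python. The two transfer rates are the rough figures quoted above (about 2 GB per minute on gigabit, about 0.2 GB per minute on 100 Mbit Ethernet); real throughput depends on your drives, file count and network load, and the function and constant names are illustrative.

```python
# Rough transfer rates taken from the figures quoted above (assumptions,
# not measurements): gigabytes moved per minute on each kind of network.
GIGABIT_GB_PER_MIN = 2.0
FAST_ETHERNET_GB_PER_MIN = 0.2


def backup_hours(data_gb, gb_per_minute):
    """Hours needed to move `data_gb` gigabytes at a steady rate."""
    return data_gb / gb_per_minute / 60.0


# Compare a few data-set sizes on both networks.
for size_gb in (60, 500, 2000):
    slow = backup_hours(size_gb, FAST_ETHERNET_GB_PER_MIN)
    fast = backup_hours(size_gb, GIGABIT_GB_PER_MIN)
    print(f"{size_gb:>5} GB: {slow:6.1f} h on 100 Mbit, {fast:5.1f} h on gigabit")
```

If the slower figure is longer than the time your office sits idle overnight, the backup will spill into working hours.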
Strategies for the RAID Environment
The issue is trickier for enterprise IT departments that measure their volume in terabytes rather than gigabytes, because it comes down to the fundamental business tradeoffs of time and money. If you have a RAID 1 mirror or RAID 5 striped with distributed parity, you're off to a good start. But if there's corruption on any of the drives, it will be mirrored as well. You still need a backup. Tape can take days to back up a high volume of data. Now, you can get external drives capable of holding up to 2 terabytes, allowing you to back up to multiple drives and restore the data to any Mac, an advantage that tape doesn't give you.
Do you need to back up your entire system every time you back up? One strategy to consider is incremental back-ups. Start with a full system back-up, and then back up only the data that has been changed in succeeding intervals. You'll still need to do full back-ups regularly, and you'll need to determine the schedule based on the amount of your data and the nature of your business: for instance, incremental back-ups nightly and full back-ups on the weekends.
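The idea behind an incremental back-up can be sketched in Python as follows. This is a minimal illustration, not a replacement for a real back-up product: it decides what has changed purely by file modification time, it does not track deleted files, and the function name and parameters are hypothetical.

```python
import shutil
from pathlib import Path


def incremental_backup(source, dest, since):
    """Copy files under `source` modified after `since` into `dest`.

    `since` is a timestamp in seconds since the epoch, typically the time
    of the previous back-up. Passing 0 copies everything (a full back-up).
    `dest` is assumed to live outside `source`.
    Returns the list of relative paths that were copied.
    """
    source, dest = Path(source), Path(dest)
    copied = []
    for path in sorted(source.rglob("*")):
        if path.is_file() and path.stat().st_mtime > since:
            rel = path.relative_to(source)
            target = dest / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves timestamps
            copied.append(rel)
    return copied
```

A nightly job would record the time it started, call `incremental_backup` with the previous run's timestamp, and fall back to `since=0` for the weekend's full back-up.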
The real killer in the movement of data is not always the amount of data in gigabytes or terabytes, but the file count. Is there a way to consolidate multiple files (directories, for example) into a single file? Fewer, larger files will be easier and faster to back up and restore than more, smaller files, even if the total volume of data is the same.
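One way to consolidate a directory into a single file is to bundle it into an archive before it is backed up. The sketch below uses the tar format from Python's standard library, without compression; the choice of tar and the function name are assumptions for illustration, and any archive format your restore machine can read would serve the same purpose.

```python
import tarfile
from pathlib import Path


def bundle_directory(directory, archive_path):
    """Pack `directory` into one uncompressed tar file.

    `archive_path` is assumed to be outside `directory`.
    Returns the number of regular files stored in the archive.
    """
    directory = Path(directory)
    with tarfile.open(archive_path, "w") as archive:
        # arcname keeps paths inside the archive relative to the folder name
        archive.add(directory, arcname=directory.name)
    # Re-open the archive to confirm what was actually written.
    with tarfile.open(archive_path, "r") as archive:
        return sum(1 for member in archive.getmembers() if member.isfile())
```

The back-up then moves one large file instead of thousands of small ones; the tradeoff is that restoring a single document means opening the archive first.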
Of course, all the backup planning in the world is for naught if you don't also have a plan and the means for restoring the data. That means having yet another server dedicated to restoration. It's advisable if not imperative to do an occasional practice restore on another computer; obviously you can't restore to one that's failed. It doesn't have to be the identical computer, just one that's sufficiently robust to handle the data.
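A practice restore is only useful if you check the result. One simple check, sketched here in Python, is to compare a checksum of every original file against its restored copy. The use of SHA-256 and the function names are assumptions for illustration; this is not a description of any particular back-up product.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original, restored):
    """Return relative paths that are missing or differ in the restored copy."""
    original, restored = Path(original), Path(restored)
    problems = []
    for path in sorted(original.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(original)
        copy = restored / rel
        if not copy.is_file() or file_digest(path) != file_digest(copy):
            problems.append(rel)
    return problems
```

An empty list means every file in the original folder came back intact on the test machine; anything else is a list of files to investigate before you need the back-up for real.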
Managing the movement and storage of data is a secondary priority, if not a more distant one, at most companies relative to their primary business, which is why IT departments have to keep cajoling management on the importance of having adequate backup resources, and on the cost of downtime. At our company, handling data is the main thing we do, so our experience may be instructive for other companies.
We didn't even have servers when I started. Now the Mac group has 40 terabytes of online fibre storage and 60 terabytes of offline disk storage using 500 GB and 750 GB SATA and PATA drives. The combined storage capacity is the equivalent of 100 million floppy disks. All of our systems are on a gigabit network, through which we push between 6 and 8 terabytes per day, nearly as much as the total data permanently stored at the Library of Congress. Where it used to take us a week to back up 3 terabytes to tape and another week to restore it, we can now back up this amount of data to a server in just one day. This is over a 14-fold savings in time and resources.
Our business is all about keeping data safe, but when you think about it, it really ought to be a priority for any business.
When all else fails: data recovery
So, you vowed to be better about backing up, but you got too busy and just plain forgot. Or everything got backed up except one crucial file. Or your well-planned back-up system didn't do its job properly. Well, good news. Your chances are good, actually better than good, that the data is not really lost and that it can be restored.
At our company, the data gets imaged from the original drive in the cleanroom then sent to our Mac engineers. Once we ascertain whether the damage is logical or physical, we'll determine the best strategy for an optimal recovery. We typically call our customers to find out what is most important to them and what they are hoping to have recovered. The recovered data is returned on external hard drives or DVDs, and also backed up onto our own servers. The data is held for a period of time in case the customer has any issues or questions. The length of turnaround and cost is based on the service selected (priority, standard or economy), the drive capacity, the operating system and the complexity of the recovery.
More data than ever, that matters more than ever.
As we've witnessed the increase in data volume over the years, we've also noted that it is proportional to the rise in the value of data, because so much of our work and lives now exists primarily in digital form. (Ironically, our company started out doing drive repair, until we figured out that people were more concerned with their data than the drives themselves.) Data loss can cripple businesses and send everyday users to the edge of despair.
With today's data recovery capabilities, despair is unwarranted in the vast majority of cases. However, the best recovery plan is the user's own. Think about what that growing volume of data is really worth to you and figure out how to safeguard it before it overflows. Hard drive failure may be unavoidable, but with the right tools, strategies and mindset, you can avoid losing what really matters.
Mike Cobb is the Director of Macintosh and Unix Engineering at DriveSavers, a data recovery services company. He joined the company in 1994, and has performed recoveries on all types of hard drives. Before joining DriveSavers, he worked as a tech support supervisor and beta-test coordinator for a manufacturer of Mac-based RAID mirroring hardware and software, among other products.