George Crump

Backup Deduplication 2.0 Needs Better RAID

As we wrap up our series on what is needed in the next generation of backup deduplication devices, one of the key needs is going to be a better drive protection capability. Today most deduplication systems leverage RAID for that drive protection; however, as capacities increase, RAID rebuild times are going to get worse. Vendors need to provide a better solution.

The first part of the problem is that disk backup systems are always going to be among the first storage platforms in a data center to adopt larger drive capacities, so they can continue to narrow the price gap between disk and tape. As these capacities increase, so does the time it takes for a RAID rebuild to complete if a drive fails. With RAID 5, this means a longer period of time during which your backups are exposed to complete failure. With RAID 6, while you can sustain a second failure without data loss, overall backup and recovery performance is still impacted. Of course, the longer the rebuild time, the greater the chance that there could actually be a second drive failure.
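To make the exposure window concrete, here is a back-of-the-envelope estimate of how long a rebuild takes. The capacity and rebuild rate below are illustrative assumptions, not vendor figures; real rebuilds are usually throttled so production backup I/O can continue.

```python
# Illustrative rebuild-time estimate: a rebuild must read/write the
# entire replacement drive, limited by a sustained rebuild rate.

def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    """Hours to rebuild one drive at a sustained rebuild rate."""
    capacity_mb = capacity_tb * 1_000_000  # decimal TB, as drives are sold
    return capacity_mb / rebuild_mb_per_s / 3600

# A 4 TB drive rebuilt at a throttled 50 MB/s:
print(f"{rebuild_hours(4, 50):.1f} hours exposed")  # roughly 22 hours
```

Doubling drive capacity doubles this window, which is exactly why larger drives make the RAID 5 exposure problem worse.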

You may ask yourself, "Why does a total data loss matter?" Fair question; this is, after all, backup data. First, most backup administrators, armed with disk backup, now count on not having to perform as many full backups, instead running higher numbers of incremental, differential, or synthetic full daily backups. The full backup window may now be designed to happen once a quarter. The time saved by eliminating the weekly full backup has probably been absorbed by some other process. Total data loss is especially costly on backup deduplication systems, since their efficiency depends on previous generations of files. A total failure means that the entire deduplication process would essentially need to start all over again.

Until drive manufacturers make drives that never fail, the key is for backup deduplication systems to get through this rebuild process sooner or to use a different process altogether. RAID is RAID, and the larger drives get, the more work will be involved in the rebuild process. There are ways around this, though. First, you can throw more storage horsepower at the problem. While you are limited by drive mechanics, the faster the parity calculations can be done, the better. Another option is to not fail the entire drive, but to use intelligence to mark out the bad section of the drive and keep on going.
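The parity calculation at the heart of a RAID 5 rebuild is simple XOR math, which is why faster controllers help: a minimal sketch of reconstructing a lost strip from the survivors.

```python
# RAID 5 parity sketch: the missing strip equals the XOR of every
# surviving strip (remaining data strips plus the parity strip).

def xor_strips(strips):
    """Byte-wise XOR of equal-length strips."""
    out = bytearray(len(strips[0]))
    for strip in strips:
        for i, b in enumerate(strip):
            out[i] ^= b
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]      # data strips across three drives
parity = xor_strips(data)               # parity strip on a fourth drive
lost = data.pop(1)                      # simulate one failed drive
rebuilt = xor_strips(data + [parity])   # XOR of all survivors
assert rebuilt == lost
```

A real rebuild repeats this for every stripe on the drive, which is why the work scales directly with capacity.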

Another option is to use a different data protection algorithm than RAID. There are erasure coding or Reed-Solomon techniques that may have better rebuild times. These and other techniques understand which blocks on a drive contain data and rebuild only those blocks, which again is faster. The other option, probably the least attractive in the disk backup space, is mirroring, since, again, disk backup is trying to compete with tape on cost.
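The "rebuild only what holds data" idea can be sketched with a hypothetical allocation bitmap: the rebuild walks the map and skips every unallocated block, so a half-empty drive rebuilds in roughly half the time.

```python
# Sketch of a data-aware rebuild (hypothetical structures): an
# allocation bitmap marks which blocks are in use, and the rebuild
# touches only those, skipping empty regions entirely.

def blocks_to_rebuild(allocation_bitmap):
    """Return only the block numbers that actually hold data."""
    return [i for i, in_use in enumerate(allocation_bitmap) if in_use]

bitmap = [True, False, False, True, False, True]
print(blocks_to_rebuild(bitmap))  # [0, 3, 5] -- half the work skipped
```

Traditional RAID has no such map and must reconstruct every sector, used or not.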

A final option may be to actually use smaller, faster drives and then, through backup virtualization, leverage tape to keep the size of the front-end disk smaller. As we discussed in our recent article "Breaking The Top Four Myths Of Tape vs. Disk Backup," tape is not subject to the cost-per-GB scrutiny that disk is when it is used as part of the backup process. It may sound a little like turning back the clock, but this small disk-based cache, backed by an increasingly reliable tape library, or even used as a front end to a deduplicated disk backend, may be an ideal solution.
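The disk-cache-in-front-of-tape idea boils down to a simple tiering policy. A minimal sketch, with entirely hypothetical names: the newest backup sets stay on the small disk tier, and older sets spill out to tape.

```python
# Hypothetical tiering sketch: a bounded disk cache in front of tape.
from collections import deque

class DiskCacheTier:
    def __init__(self, max_sets: int):
        self.max_sets = max_sets
        self.disk = deque()   # newest backup sets stay on fast disk
        self.tape = []        # older sets migrate out to tape

    def ingest(self, backup_set: str):
        self.disk.append(backup_set)
        while len(self.disk) > self.max_sets:
            self.tape.append(self.disk.popleft())  # spill oldest to tape

tier = DiskCacheTier(max_sets=2)
for s in ["mon", "tue", "wed", "thu"]:
    tier.ingest(s)
print(list(tier.disk), tier.tape)  # ['wed', 'thu'] ['mon', 'tue']
```

Keeping the disk tier small keeps the RAID rebuild exposure small, while tape absorbs the capacity growth.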

Additional Blogs in this Series:

Deduplication 2.0 - Recovery Performance
Backup Deduplication 2.0 - Density
Backup Deduplication 2.0 - Power Savings
Backup Deduplication 2.0 - Integration


George Crump is lead analyst of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. Find Storage Switzerland's disclosure statement here.
