Commentary | George Crump | 1/25/2011 11:32 AM

Deduplication 2.0 - Recovery Performance

"It's all about recovery", you'll here it in almost every sales presentation by a backup vendor. That advice holds true for backup deduplication devices as well. A common mistake is to assume that because deduplication products, most often disk based, that they also offer the best recovery performance. This is not always the case and as we move into the next dedupe era it has to improve.

"It's all about recovery", you'll here it in almost every sales presentation by a backup vendor. That advice holds true for backup deduplication devices as well. A common mistake is to assume that because deduplication products, most often disk based, that they also offer the best recovery performance. This is not always the case and as we move into the next dedupe era it has to improve.A common cause of poor recovery performance seems to be in poor meta data management. Most deduplication systems build some form of a table that tracks what type of data has been written to disk and where it is stored. It is the responsibility of this table to compare new inbound data to data that is already on disk and eliminate the redundant segments. It is also the responsibility of this table, in most cases, to put these segments back together when the backup application requests a file to be recovered. Interestingly as we discussed a while ago in our article "All Deduplication Is Not Created Equal" and what we have seen in repeated testing in our labs is still true today how well this table is managed and accessed can impact recovery performance. In some cases we have seen that the further you get away from the original data set the more of a performance hit poor meta-data management makes.

For example, do 40 full backups of a data set that changes slightly between sets, meaning that the deduplication ratio is fairly high, and then try to recover from the 3rd copy and then the 37th copy. With some deduplication systems you will find a significant difference in the time it takes to recover the data between those two iterations of the backup data set. This is certainly something to test in any deduplication system you are evaluating, to make sure your prospective vendor has addressed the issue. It is also something that all deduplication vendors need to keep working on to make sure their systems don't have the problem. Versus straight, un-deduplicated disk, a small performance loss (less than 5%) is probably acceptable, but anything more could begin to significantly impact recovery windows.
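
The effect is easy to model. The toy simulation below is our own construction, not any vendor's layout: it assumes an append-only container store where changed segments land at the tail of the log, with an arbitrary 2% change rate and container size. Counting how many container-to-container jumps a front-to-back restore needs shows the 37th copy scattered far more widely than the 3rd, which is exactly the divergence worth timing when you test.

import random

SEGMENTS_PER_BACKUP = 10_000
CHANGE_RATE = 0.02       # assume 2% of segments change between full backups
CONTAINER_SEGS = 256     # assume 256 segments per on-disk container

random.seed(1)

log_size = SEGMENTS_PER_BACKUP                    # generation 1 is written contiguously
generations = [list(range(SEGMENTS_PER_BACKUP))]  # each entry: on-disk position of every segment

for g in range(39):                               # 40 full backups in total
    cur = generations[-1][:]
    for i in random.sample(range(SEGMENTS_PER_BACKUP), int(SEGMENTS_PER_BACKUP * CHANGE_RATE)):
        cur[i] = log_size                         # a changed segment is appended at the tail of the log
        log_size += 1
    generations.append(cur)

def container_switches(positions):
    """Count how often a front-to-back restore must jump to a different container."""
    switches, last = 0, None
    for p in positions:
        c = p // CONTAINER_SEGS
        if last is not None and c != last:
            switches += 1
        last = c
    return switches

for n in (3, 37):
    print(f"copy {n:2d}: {container_switches(generations[n - 1]):5d} container switches")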

The other area where recovery performance is going to become increasingly critical is as data protection solutions continue to add a recover-in-place type of capability, as we discuss in our article "Virtualization Powered Recovery". In this case you can leverage the fact that disk backup technology is, in fact, disk, and running a server instance or other type of data set directly from the backup device is now possible. The performance focus shifts from fast streaming reads to purely random, interactive reads. While no one is expecting primary-storage-like performance, deduplication hardware vendors need to make sure they can handle this change in requirement from the deduplicated area, or they may need to provide a non-deduplicated staging area, to at least keep that performance acceptable.
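
To see why that shift matters, a quick sketch like the one below (ours, not any vendor's tool; the file size and block size are arbitrary) reads the same blocks twice, once streaming and once in random order. On rotating disk the random pass can be dramatically slower, and that is the gap a recover-in-place workload exposes.

import os
import random
import time

FILE = "readtest.bin"      # hypothetical scratch file created for the test
BLOCK = 4096
BLOCKS = 25_000            # ~100MB; size is arbitrary for the sketch

if not os.path.exists(FILE):
    with open(FILE, "wb") as f:
        f.write(os.urandom(BLOCK * BLOCKS))

def timed_read(offsets):
    with open(FILE, "rb") as f:
        start = time.perf_counter()
        for off in offsets:
            f.seek(off)
            f.read(BLOCK)
        return time.perf_counter() - start

sequential = [i * BLOCK for i in range(BLOCKS)]
shuffled = sequential[:]
random.shuffle(shuffled)

# Same bytes read both times; only the access pattern differs. A warm OS cache
# shrinks the gap, so test against storage larger than RAM for honest numbers.
print(f"streaming reads: {timed_read(sequential):6.2f}s")
print(f"random reads:    {timed_read(shuffled):6.2f}s")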

Another event that impacts recovery performance is a disk failure on the backup deduplication system: what happens when you need to recover data while the rebuild is underway? We will address RAID data protection and how it is implemented on deduplicated systems in an upcoming entry.

Track us on Twitter: http://twitter.com/storageswiss

Subscribe to our RSS feed.

George Crump is lead analyst of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. Find Storage Switzerland's disclosure statement here.
