The answer to one of disk backup's biggest weaknesses is figuring out how to integrate power-managed drives into disk deduplication systems. These drives can be troublesome in disk solutions that use deduplication. Deduplication makes heavy use of indexing to identify redundant data; it performs frequent data integrity checks and often uses garbage collection to remove old data that no longer has active pointers. All of this constant access makes it difficult to spin down a drive for any significant amount of time.
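To make the conflict concrete, here is a minimal sketch of the spin-down decision a power-managed array has to make. The activity names, intervals, and threshold below are illustrative assumptions, not any vendor's actual schedule; the point is that frequent index, scrub, and garbage collection activity keeps shrinking the idle window.

```python
from datetime import datetime, timedelta

# Hypothetical activity schedule for a deduplication store. These names and
# intervals are assumptions for illustration, not a real product's behavior.
SPIN_DOWN_THRESHOLD = timedelta(minutes=30)  # minimum idle time worth spinning down for

def can_spin_down(next_activity_times: dict, now: datetime) -> bool:
    """A drive may spin down only if every scheduled activity (index lookups,
    integrity scrubs, garbage collection) is at least the threshold away."""
    return all(t - now >= SPIN_DOWN_THRESHOLD for t in next_activity_times.values())

now = datetime(2010, 1, 1, 0, 0)

# Typical dedupe system: the index is touched constantly, so spin-down never wins.
busy = {
    "index_lookup": now + timedelta(minutes=5),
    "integrity_scrub": now + timedelta(hours=1),
    "garbage_collection": now + timedelta(hours=4),
}

# A system with maintenance deferred into a narrow window can actually idle.
quiet = {
    "index_lookup": now + timedelta(hours=2),
    "integrity_scrub": now + timedelta(hours=3),
    "garbage_collection": now + timedelta(hours=4),
}
```

With the busy schedule the drive never qualifies for spin-down; with maintenance deferred, it does.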
There are ways to get some power management in backup deduplication systems. For example, you can add deduplication technology to a spin-down system, as we discussed in "Power Managed Dedupe". The deduplication software can be optimized to narrow its garbage collection and error checking windows so the system can stay in spin-down mode for the bulk of the non-backup window. Further, multiple systems could be used over time, with backups redirected to different units at different times, alternating by quarter for example. The downside to this approach, of course, is some increase in redundancy of the backup data set, but it would improve power efficiency. Over time, though, deduplication systems are going to have to learn to self-isolate old data.
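The two policies above can be sketched in a few lines. The window hours, unit names, and quarterly rotation below are assumptions for illustration only; a real system would expose these as configuration.

```python
from datetime import datetime

# Hypothetical policy sketch: confine maintenance (garbage collection, error
# checking) to a narrow window, and rotate the active backup unit by calendar
# quarter so idle units can spin down. All values here are assumptions.

MAINTENANCE_HOURS = range(2, 4)  # 02:00-03:59, just after the backup window

def maintenance_allowed(ts: datetime) -> bool:
    """Allow GC and scrubbing only inside the narrow window, so drives
    stay spun down the rest of the non-backup period."""
    return ts.hour in MAINTENANCE_HOURS

def target_unit(ts: datetime,
                units=("dedupe-a", "dedupe-b", "dedupe-c", "dedupe-d")) -> str:
    """Redirect backups to a different unit each quarter; the other units
    can power-manage their drives until their quarter comes around."""
    quarter = (ts.month - 1) // 3
    return units[quarter % len(units)]
```

The trade-off noted above shows up directly: each unit builds its own dedupe index, so cross-quarter redundancy is not eliminated.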
Backup software applications that can do their own deduplication may be able to handle this for you as well. By setting up different drive groups in a power-managed array, or even using different arrays, you could send deduplicated backup data to separate backup pools, which would give the system more time to power the drives down.
Clustered or scale-out disk backup systems are going to have to take all of this a step further, since each node is a potential power consumer. They will need to be able to move data to older nodes and then power down, or at least idle, those nodes. Steps could be taken not only to power the drives down but also to lower fan and processor speeds, which could lead to a very efficient scale-out story. That will require either sophisticated node communication or an internal sub-dividing of the nodes to segregate the infrequently accessed data set.
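A per-node policy along these lines might look like the following sketch. The power states, the 30-day threshold, and the `Node` fields are hypothetical, invented here to illustrate the idea of stepping drives, fans, and processors down together on nodes that hold only cold data.

```python
from dataclasses import dataclass

# Hypothetical per-node power policy for a scale-out backup cluster.
# States and thresholds are assumptions, not a specific vendor's design.

@dataclass
class Node:
    name: str
    days_since_last_read: int   # how cold this node's data set is
    holds_active_backups: bool  # is it receiving current backup streams?

def power_state(node: Node) -> str:
    if node.holds_active_backups:
        return "full"       # fully powered: ingesting and deduplicating
    if node.days_since_last_read < 30:
        return "idle"       # spin drives down, keep node quick to respond
    return "deep-idle"      # also reduce fan and processor speed

cluster = [
    Node("node-1", days_since_last_read=0, holds_active_backups=True),
    Node("node-2", days_since_last_read=10, holds_active_backups=False),
    Node("node-3", days_since_last_read=120, holds_active_backups=False),
]
```

The hard part the text points to is not this policy but the data movement and node-to-node communication needed to make `node-3` genuinely cold.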
Another option for power efficiency is backup virtualization, as we discussed in our recent article "Backup Virtualization Brings Flexibility to Disk Backup". Leveraging this technology, backups could be sent to a very small, high-speed disk cache, quickly spooled to a disk deduplication system for medium-term storage, and finally spilled to tape as the data becomes old and infrequently accessed. This uses each backup device for what it is already best at instead of waiting for technology to fill in the gaps.
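The age-based routing described above reduces to a simple tiering rule. The tier names and age boundaries here are assumptions for illustration; the actual cutoffs would depend on retention policy and cache size.

```python
# Hypothetical sketch of the backup-virtualization tiering described above:
# land on a fast cache, spool to deduplicated disk, spill to tape with age.
# Tier names and day boundaries are assumptions, not prescribed values.

def tier_for_age(age_days: int) -> str:
    if age_days < 1:
        return "disk-cache"   # very small, high-speed landing zone
    if age_days < 90:
        return "dedupe-disk"  # medium-term, space-efficient retention
    return "tape"             # old, infrequently accessed data

# A restore or migration engine would walk the catalog and move any backup
# whose current location no longer matches tier_for_age(its age).
```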
George Crump is lead analyst of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. Find Storage Switzerland's disclosure statement here.