Capacity optimization should come in at least two forms: compression and deduplication. Deduplication, the ability to identify redundant data and store it only once, captures all the attention. While it is important in a keep-data-forever strategy, an archive should not contain the redundancy that backup does. As we stated in our article "Backup vs. Archive", backups send essentially the same data over and over again. Archiving should be a one-time event: data is archived once, replicated to a redundant archive, and then removed from primary storage. While some duplication will certainly exist, deduplication will not deliver the same return on investment in an archive that it does in backup. Your mileage will vary, but expect roughly a 3X to 5X reduction.
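At its core, deduplication works by fingerprinting each chunk of data and storing a given fingerprint's contents only once; repeated backups of mostly unchanged data then cost almost nothing. A minimal sketch in Python (the chunk contents and weekly-backup scenario are invented for illustration):

```python
import hashlib

def dedupe(chunks):
    """Store each unique chunk once, keyed by its SHA-256 digest.

    Returns the chunk store and the list of digests (references)
    that reconstruct the original stream."""
    store = {}
    refs = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # only the first copy is kept
        refs.append(digest)
    return store, refs

# Simulated weekly full backups: most blocks repeat week to week.
week1 = [b"blockA", b"blockB", b"blockC"]
week2 = [b"blockA", b"blockB", b"blockD"]  # only one block changed
store, refs = dedupe(week1 + week2)
logical = sum(len(c) for c in week1 + week2)   # bytes as presented
physical = sum(len(c) for c in store.values())  # bytes actually stored
print(f"dedup ratio: {logical / physical:.1f}x")
```

In a true one-time archive, each chunk tends to appear only once, so the ratio stays close to 1x — which is the article's point about archive versus backup.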
While compression does not deliver the same percentage gains that deduplication does when duplicate data is present, compression works across almost all data, redundant or not. A 50% reduction on all files may provide greater savings than a 3X reduction on a few files. The best choice, though, is to combine the two techniques for maximum total reduction.
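The arithmetic behind that comparison can be made concrete. The figures below are assumptions for illustration: a hypothetical 100 TB archive, 2:1 compression on everything, and 3:1 deduplication that only applies to the 20% of the data that is actually duplicated:

```python
total_tb = 100.0       # hypothetical archive size
dup_fraction = 0.2     # assumed share of the data that is duplicated

# 2:1 compression applies to all data.
compressed = total_tb * 0.5

# 3:1 dedup helps only the duplicated portion; the rest is untouched.
deduped = total_tb * (1 - dup_fraction) + total_tb * dup_fraction / 3

# Combining both: dedup first, then compress what remains.
combined = deduped * 0.5

print(f"compression alone: {compressed:.1f} TB")
print(f"dedup alone:       {deduped:.1f} TB")
print(f"combined:          {combined:.1f} TB")
```

Under these assumed numbers, compression alone stores less than deduplication alone, and combining the two stores the least, which is why the article recommends using both.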
Tape as a means to contain capacity costs cannot be left out of the capacity discussion. As we discuss in our article "What is LTFS?", IBM's new tape-based file system for LTO makes tape more viable than ever for long-term data retention. Tape and disk archives should no longer be viewed as competitors but as complementary: disk fills the intermediate role of storing data for three to seven years, and tape stores data for the remainder of its retention period. Several solutions can automatically move data from disk to tape after a given timeframe.
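The age-based migration such solutions perform is conceptually simple: anything on the disk tier older than the policy cutoff gets moved to the tape tier. A minimal sketch, assuming hypothetical directory paths and a staging directory that a tape library ingests from (real products do this with their own policy engines, not a script like this):

```python
import os
import shutil
import time

def migrate_old_files(src, dst, max_age_days):
    """Move files from the disk tier (src) to the tape staging
    directory (dst) once their modification time exceeds the
    retention window for the disk tier."""
    cutoff = time.time() - max_age_days * 86400
    moved = []
    for name in os.listdir(src):
        path = os.path.join(src, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            shutil.move(path, os.path.join(dst, name))
            moved.append(name)
    return moved

# Example policy: hypothetical paths, five years on disk before tape.
# migrate_old_files("/archive/disk", "/archive/tape_out", max_age_days=5 * 365)
```

A real implementation would also leave a stub or catalog entry behind so the file remains findable after it moves to tape.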
George Crump is lead analyst of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. Find Storage Switzerland's disclosure statement here.