Primary file system storage optimization, i.e. squeezing more data into the same space, continues to grow in popularity. The challenge is that deduplication of primary storage is not without its rules: you can dedupe this, you can't dedupe that, and you have to be cognizant of the performance impact on a deduplicated volume. EMC has announced deduplication on its Celerra platform, and NetApp has had it for a while. Others have added it in a near-active fashion, compressing and deduplicating data after it becomes stagnant, while companies like Storwize provide optimization in the form of inline, real-time compression.
As storage virtualization and thin provisioning have proven, primary storage is better when you don't have to compromise. The problem with imposing conditions for use on primary storage is that things can get complicated and that complication can lead people to not use the technology. The more transparent and universally applicable a technology is, the greater its chances for success.
The challenge with some primary storage optimization is that it is largely dependent on the type of data you have and the workload that is accessing that data. Obviously, for deduplication to generate any benefit there has to be duplicate data, which is why backup, with its weekly fulls, is such an ideal application for deduplication. Primary storage, on the other hand, is not full of duplicate data.
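To make the point about duplicate data concrete, here is a minimal Python sketch that estimates a deduplication ratio by hashing fixed-size chunks. Everything in it is an assumption for illustration: the 4 KB chunk size, the `dedup_ratio` helper, and the synthetic data, where four identical "weekly fulls" stand in for a backup stream and random bytes stand in for primary data with no duplicates. Real products chunk and index data in their own ways, so treat this as a rough model rather than how any vendor does it.

```python
import hashlib
import os

CHUNK_SIZE = 4096  # assumed fixed chunk size; real products vary


def dedup_ratio(data: bytes, chunk_size: int = CHUNK_SIZE) -> float:
    """Return logical size divided by the size left after dropping duplicate chunks."""
    if not data:
        return 1.0
    seen = set()
    stored = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        if digest not in seen:
            # First time this chunk content appears, so it must be stored.
            seen.add(digest)
            stored += len(chunk)
    return len(data) / stored


# Synthetic stand-ins, not real workloads.
full_backup = os.urandom(CHUNK_SIZE * 64)
backup_stream = full_backup * 4              # four weekly fulls of unchanged data
primary_data = os.urandom(CHUNK_SIZE * 256)  # same logical size, no repeated chunks

print(f"backup stream: {dedup_ratio(backup_stream):.1f}x")
print(f"primary data:  {dedup_ratio(primary_data):.1f}x")
```

In this toy model the repeated fulls reduce by the number of copies, while the data with no repeated chunks does not reduce at all, which is the gap between backup and primary storage described above.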
In addition, primary storage deduplication is going to have issues with heavy write IO and with random read/write IO. In these situations the performance impact of applying deduplication may be felt by users.
As a result, most vendors suggest limiting deployment of the technology to home directories and VMware images, where the likelihood of duplicate data is high and the workloads are more read intensive.
Databases in particular are left out of the process. Concerns arise around the amount of duplicate data that would be found in a database and the performance impact associated with the process. As we stated in our article on database storage optimization, Data Reducing Oracle, inline, real-time compression solutions may be a better fit here. Databases are very compressible, whether there is duplicate data or not, and in most cases real-time compression has no direct impact on performance.
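The claim that data can be compressible without containing duplicates can be illustrated with a small Python sketch. It builds synthetic, database-like rows in which every row is unique, counts how many 4 KB chunks are exact duplicates, and then compresses the same bytes with `zlib` from the standard library. The row layout, the `make_rows`, `duplicate_chunks`, and `compression_ratio` helpers, and the chunk size are all hypothetical, and `zlib` is only a stand-in for a vendor's real-time compression engine. The sketch says nothing about performance.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed chunk size for the duplicate check


def make_rows(count: int) -> bytes:
    """Build synthetic CSV-style rows; every row carries a unique ID."""
    rows = [
        f"{i:08d},customer_{i:08d},ACTIVE,region-{i % 7}\n"
        for i in range(count)
    ]
    return "".join(rows).encode("ascii")


def duplicate_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> int:
    """Count fixed-size chunks whose content exactly repeats an earlier chunk."""
    digests = [
        hashlib.sha256(data[offset:offset + chunk_size]).digest()
        for offset in range(0, len(data), chunk_size)
    ]
    return len(digests) - len(set(digests))


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data, level))


table = make_rows(50_000)
print("duplicate chunks:", duplicate_chunks(table))
print(f"compression ratio: {compression_ratio(table):.1f}x")
```

Because each chunk contains different row IDs, block-level deduplication finds nothing to remove, yet the repeated field names and values inside the rows still give the compressor plenty to work with.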
As data growth continues to accelerate, more data optimization will be required, and applying multiple techniques may be the only way to stem the tide. Compression may be applied universally, as a complement to deduplication, which should be applied to specific workloads. That deduplicated data should then be moved to an archive and out of primary storage altogether. Finally, as I stated in the last few entries, all of this has to be wrapped around tools that increase IT personnel efficiency in step with resource efficiencies.
George Crump is founder of Storage Switzerland, an analyst firm focused on the virtualization and storage marketplaces. It provides strategic consulting and analysis to storage users, suppliers, and integrators. An industry veteran of more than 25 years, Crump has held engineering and sales positions at various IT industry manufacturers and integrators. Prior to Storage Switzerland, he was CTO at one of the nation's largest integrators.