News
6/5/2009
11:20 AM
George Crump
Commentary

What Is Deduplication And Why Should You Care?

A couple of days ago I was speaking at an event in Dallas and was reminded that sometimes those of us in storage get too wrapped up in, well, storage and that IT professionals have other things to worry about than just storage. I asked the audience how many of them had done anything with deduplication. Only 30% had, although 100% wanted to know more.

With all the news about NetApp and EMC in a bidding war to buy Data Domain, it might make sense for us to pause a moment and explain why these two companies are willing to pay almost $2 billion for the market-leading provider of this technology.

Deduplication, at its simplest level, examines data and compares it to other data that is already stored. If the data is identical, then instead of storing a second copy, the deduplication technology establishes a link to the original data. It requires significantly less storage space to establish a link than to actually store the file.
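
To make the idea concrete, here is a minimal Python sketch of file-level deduplication. It is an illustration only, not how any particular product works: the DedupStore class and its method names are hypothetical, it assumes a SHA-256 digest is a good enough test for "identical" data, and commercial systems typically work on pieces of files rather than whole files.

```python
import hashlib


class DedupStore:
    """Toy in-memory store that keeps one copy of each unique payload."""

    def __init__(self):
        self.blocks = {}  # digest -> payload bytes (stored once)
        self.links = {}   # name -> digest (the "link" to the stored data)

    def put(self, name, data):
        """Store data under name; return True if new bytes had to be stored."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blocks
        if is_new:
            self.blocks[digest] = data
        # Either way, the name only records a link to the stored payload.
        self.links[name] = digest
        return is_new

    def get(self, name):
        """Follow the link for name back to the stored payload."""
        return self.blocks[self.links[name]]

    def stored_bytes(self):
        """Total payload bytes actually held, ignoring the small links."""
        return sum(len(block) for block in self.blocks.values())


store = DedupStore()
payload = b"quarterly report" * 1000
first = store.put("monday/report.doc", payload)    # new data, stored
second = store.put("tuesday/report.doc", payload)  # identical, linked only
```

In this sketch the second put stores no new payload bytes; both names resolve to the same stored copy.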

Deduplication first gained traction as a technology to enhance disk backup. Without deduplication, your disk backup had to scale to store multiple full backups and several weeks' worth of incremental backups. Even with the plummeting price of ATA storage, the cost to configure a disk array to store even a month's worth of backups, let alone the power, cooling, and space required by the array, was enormous.

If you do backups, you know that this data, especially in full backups, is highly redundant, and this is where deduplication shines. As a result, backup was the first market in which the technology became a requirement, and companies like Data Domain and Avamar became market leaders. Avamar was snatched up by EMC, but Data Domain made it all the way to becoming a public company.

What really drove Data Domain's success in the backup space is the ability to replicate backup data to another site. This was an often-requested feature when disk-to-disk backup first started to become viable, but given the way and the speed at which backup data is created, standard replication wouldn't work across normal WAN bandwidth. Deduplication gets around this because it stores only changed or net-new data, so only that data needs to be replicated, which is much more WAN friendly.
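
A rough Python sketch shows why this is WAN friendly. Everything here is an assumption made for illustration: the function names are hypothetical, the remote site is modeled as a plain dictionary, and the data is cut into fixed 4096-byte chunks, which may not match how a real product segments data.

```python
import hashlib


def chunk_digests(data, chunk_size=4096):
    """Split data into fixed-size chunks and map each digest to its chunk."""
    chunks = {}
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        chunks[hashlib.sha256(chunk).hexdigest()] = chunk
    return chunks


def replicate(backup, remote_chunks, chunk_size=4096):
    """Copy only chunks the remote site lacks; return bytes sent over the WAN."""
    sent = 0
    for digest, chunk in chunk_digests(backup, chunk_size).items():
        if digest not in remote_chunks:
            remote_chunks[digest] = chunk
            sent += len(chunk)
    return sent


remote = {}  # stands in for the chunk store at the second site
night1 = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
night2 = b"A" * 4096 + b"B" * 4096 + b"D" * 4096  # one chunk changed

sent_first = replicate(night1, remote)   # every chunk is new to the remote
sent_second = replicate(night2, remote)  # only the changed chunk travels
```

On the second night only one of the three chunks crosses the link, even though the backup itself is the same size as the first.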

These capabilities in backup alone are not enough to justify a $2 billion investment in deduplication technology. What is driving these companies to pay this kind of money is what deduplication can do for the rest of the storage spectrum: primary storage and archive storage.

For example, if a company armed with deduplication can implement it in primary storage with little to no performance impact yet increase storage efficiency by 60% to 70%, that could get interesting. Imagine you needed 80TB of storage but one of your vendors needed to supply only 40TB because it had this technology; clearly, that vendor would have a significant advantage in winning your business.
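
The arithmetic behind that comparison is simple enough to sketch in Python. The function names are hypothetical, and the sketch assumes that a "60% to 70%" efficiency gain means that much less physical capacity has to be purchased, which is one reading of the figure rather than a vendor definition.

```python
def physical_tb(logical_tb, pct_saved):
    """Capacity that must be supplied after saving pct_saved percent."""
    return logical_tb * (100 - pct_saved) / 100


def savings_pct(logical_tb, supplied_tb):
    """Percentage of capacity avoided when supplied_tb covers logical_tb."""
    return (logical_tb - supplied_tb) * 100 / logical_tb


example = savings_pct(80, 40)  # the 80TB versus 40TB case above
low = physical_tb(80, 60)      # capacity to supply at a 60% saving
high = physical_tb(80, 70)     # capacity to supply at a 70% saving
```

Under that reading, the 80TB-versus-40TB example corresponds to a 50% saving, while savings of 60% and 70% would bring the same 80TB requirement down to 32TB and 24TB.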

Clearly, this technology is not limited to Data Domain; a host of other vendors can provide compression, deduplication, or both at different levels of storage. The publicity generated by this bidding war obviously helps Data Domain, but it also helps many of the other deduplication suppliers.

What all of this should be telling you is that deduplication is important, that how it is used and implemented in the various storage tiers matters, and that this is as good a time as any to begin learning about and implementing the technology.

Track us on Twitter: http://twitter.com/storageswiss.


George Crump is founder of Storage Switzerland, an analyst firm focused on the virtualization and storage marketplaces. It provides strategic consulting and analysis to storage users, suppliers, and integrators. An industry veteran of more than 25 years, Crump has held engineering and sales positions at various IT industry manufacturers and integrators. Prior to Storage Switzerland, he was CTO at one of the nation's largest integrators.
