Dark Reading is part of the Informa Tech Division of Informa PLC


News | Commentary
9/10/2010 12:27 PM
George Crump

The DeDupe Performance Boost

Deduplication, the elimination of redundant data, is typically associated with optimizing storage utilization. I've spent some time lately defending our stance that deduplication in primary storage can be done without a performance penalty. What is not often discussed is that deduplication also offers a potential performance gain, one that may outweigh the resource costs of the process.

First, on the performance-penalty side: by no penalty I mean that, configured correctly, the right combination of hardware and software should not negatively impact the user or application experience. As I discuss in a recent article, this may mean the vendor has to up the ante on the storage processor, though most of the time the storage processor is idle. It definitely means the software code has to be written very efficiently.

Dedupe can potentially boost both read and write performance in a storage system. The read side is fairly easy to reason about, as long as deduplication is out of the read path and data does not need to be rehydrated on read. This is typically done by leveraging the storage system's existing extent strategy, similar to how snapshots work today. If deduplication has consolidated data into fewer bits of information, the storage system has less work to do: there is less head movement because there is less data to seek. Deduplication should also increase the likelihood of a cache hit, since in a logical sense more data can fit into the same cache. This is especially true in a virtualized server environment, where much of each server OS image deduplicates. The active files from the core virtual machine image are shared across multiple virtual servers all reading the same or similar information, which, thanks to dedupe, is really only one cache-friendly instance.
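The cache-hit effect above can be sketched with a toy example. The block contents, cache size, and SHA-256 fingerprinting here are hypothetical illustrations, not any vendor's actual implementation: ten logical blocks from several VMs booting the same OS image collapse to two distinct physical blocks, so the whole working set fits in a cache that is far smaller than the logical data.

```python
import hashlib

CACHE_SLOTS = 2  # tiny cache: holds only 2 physical blocks


def fingerprint(block: bytes) -> str:
    # Content hash used as the dedupe fingerprint (illustrative choice).
    return hashlib.sha256(block).hexdigest()


# Ten logical blocks from three VMs running the same OS image:
# only two distinct contents exist.
logical_blocks = [b"os-kernel"] * 5 + [b"os-libs"] * 5

# Without dedupe, each logical block would occupy its own cache slot;
# with dedupe, identical contents share one physical block.
physical_dedup = {fingerprint(b) for b in logical_blocks}

print(len(logical_blocks))                 # 10 logical blocks requested
print(len(physical_dedup))                 # 2 physical blocks to cache
print(len(physical_dedup) <= CACHE_SLOTS)  # True: working set fits in cache
```

Without deduplication, the same 10-block working set would need 10 cache slots; with it, every logical read after the first two is a cache hit.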

Most deduplication vendors agree that reads are the easy part; writes are hard. To see a write boost from deduplication, to the best of my reasoning, is going to require an inline process. Post-process or even parallel-process deduplication means that the write, redundant or not, always goes to the disk system first and is only later (within seconds, in the parallel case) eliminated if a redundancy is found. While this can be done with minimal, if any, performance penalty, it would be hard to claim a write performance gain as a result. A write occurred that potentially did not need to, and if that write is found to be redundant, some work has to be done to erase or release the redundant information.
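The inline-versus-post-process distinction can be counted directly. This is a minimal sketch with a made-up write stream, assuming a simple fingerprint lookup before each write; real systems add reference counting, metadata persistence, and hash-collision handling.

```python
import hashlib


def fp(block: bytes) -> str:
    # Illustrative content fingerprint.
    return hashlib.sha256(block).hexdigest()


incoming = [b"A", b"B", b"A", b"A", b"C", b"B"]  # write stream with redundancy

# Inline: look up the fingerprint BEFORE writing; a redundant block
# never touches disk, only its reference is recorded.
seen = set()
inline_disk_writes = 0
for block in incoming:
    h = fp(block)
    if h not in seen:
        seen.add(h)
        inline_disk_writes += 1  # only unique data is written

# Post-process (or parallel): every block hits disk first; duplicates
# are found and released later, which is extra work, not a write saving.
post_disk_writes = len(incoming)

print(inline_disk_writes)  # 3 (A, B, C)
print(post_disk_writes)    # 6
```

Both approaches end at the same stored capacity; only the inline path actually avoids the redundant disk writes.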

With an inline deduplication model, a redundant write can be identified before it ever occurs. If that write does not happen, neither does the parity calculation nor the subsequent parity writes. In a RAID 6 configuration, not having to save a file, or parts of a file, that already exists could save hundreds of block writes and parity writes.
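The parity arithmetic works out as follows. The block count here is a hypothetical example; the constant reflects that RAID 6 maintains two parity blocks (P and Q), so each avoided data-block write also avoids two parity updates.

```python
# Hypothetical workload: illustration only.
redundant_blocks = 100        # data-block writes eliminated inline
parity_blocks_per_write = 2   # RAID 6 updates two parity blocks (P and Q)

data_writes_saved = redundant_blocks
parity_writes_saved = redundant_blocks * parity_blocks_per_write
total_writes_saved = data_writes_saved + parity_writes_saved

print(total_writes_saved)  # 300 disk writes avoided
```

And that count ignores the read-modify-write cycles a parity update can trigger, so the real savings on a busy array may be larger still.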

As I discussed in an SSD chat on Enterprise Efficiency, where inline deduplication in primary storage could become very interesting is in FLASH-based systems. First, FLASH memory is more expensive per GB than mechanical hard drives, so every percent saved in capacity has roughly 10X more value. Second, FLASH's weakness is writes: while FLASH controllers are addressing much of the wear-leveling issue, the fewer the writes, the longer the life of the solid state system. Finally, FLASH storage typically has performance to spare, and using the excess to increase its value and reliability makes deduplication a good investment.
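The economics point can be made concrete. The prices below are hypothetical placeholders, assuming only the rough 10X per-GB gap between FLASH and mechanical disk mentioned above; the same reclaimed capacity is simply worth about ten times more on FLASH.

```python
# Hypothetical per-GB prices, illustration only (assumes ~10x gap).
hdd_cost_per_gb = 0.10    # mechanical hard drive
flash_cost_per_gb = 1.00  # FLASH, ~10x HDD

capacity_saved_gb = 500   # capacity reclaimed by deduplication

hdd_savings = capacity_saved_gb * hdd_cost_per_gb
flash_savings = capacity_saved_gb * flash_cost_per_gb

print(hdd_savings)    # 50.0  -> $50 of HDD capacity reclaimed
print(flash_savings)  # 500.0 -> $500 of FLASH capacity reclaimed
```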

FLASH or HDD, inline deduplication on primary storage has the potential to improve overall write performance in a storage system. The question is whether the work required to determine redundancy will negate the gains in write performance. That, as with other forms of deduplication, depends largely on how well the deduplication software is written, how efficient its lookups are, and how much memory and processing power can affordably be put on the storage system itself.

In either case, make no mistake: we are heading into an era where deduplication without a performance penalty is a reality. The software can be made efficient enough, and the hardware has the power, to make it happen.

Track us on Twitter: http://twitter.com/storageswiss

Subscribe to our RSS feed.

George Crump is lead analyst of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. Find Storage Switzerland's disclosure statement here.
