Dark Reading is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


5/12/2008
04:45 PM

Deduplication Checklist

Here are five key questions to ask before committing to a data deduplication system.

No doubt about it: data deduplication can be a magic bullet for backup. Organizations that apply it intelligently will see faster backups, easier restores, and a reduction in power, space, and cooling costs. But put in the wrong solution, and you may instead find yourself walking the unemployment line.

Nowhere in IT does the phrase "Your mileage may vary" apply more than with data deduplication. Data reduction ratios vary depending on the type of data being backed up, the rate at which data changes between backups, and the backup scheme used.
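To see why reduction ratios swing so widely, here is a rough, hypothetical model (not any vendor's formula) of a scheme that keeps a number of weekly full backups. It assumes the first full is stored entirely as unique data and each later full adds only the data that changed in between; compression, deletions, and data growth are ignored.

```python
def estimated_dedup_ratio(retained_fulls: int, change_rate: float) -> float:
    """Rough dedup ratio when N full backups of the same data set are retained.

    Hypothetical model:
    - the first full backup is stored entirely as unique data;
    - each later full adds only the fraction of data that changed since the last one;
    - compression, file deletions, and data growth are ignored.
    """
    if retained_fulls < 1:
        raise ValueError("need at least one retained full backup")
    if not 0.0 <= change_rate <= 1.0:
        raise ValueError("change_rate must be between 0 and 1")
    logical = retained_fulls  # data the backup app thinks it wrote, in units of one full
    physical = 1 + (retained_fulls - 1) * change_rate  # unique data actually stored
    return logical / physical


for rate in (0.01, 0.05, 0.20):
    print(f"12 weekly fulls, {rate:.0%} weekly change: "
          f"{estimated_dedup_ratio(12, rate):.1f}:1")
```

Under this model the same 12 retained fulls deduplicate roughly three times better at a 1% weekly change rate than at 20%. Schemes built on incremental backups write less redundant data in the first place, so they would show lower ratios.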

To help companies choose the best technology for their needs, I've identified five key questions to ask:


Where to deduplicate?
Organizations looking to bring sanity to remote-office backups should consider remote-office/back-office backup software, such as Asigra's Televaulting or EMC's Avamar, that deduplicates data at the source server, reducing the bandwidth needed to back up across the WAN. Larger branches, or those with less reliable WAN connections, are better served by deduplicating appliances that can replicate globally deduplicated data.
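Source-side deduplication saves WAN bandwidth because the source fingerprints its data and ships only chunks the central store has never seen. The sketch below is a generic illustration, not how Asigra or EMC implement it: it uses fixed-size chunks and SHA-256 fingerprints for simplicity (real products may use variable-length chunking), and a plain Python set stands in for the target's fingerprint index.

```python
import hashlib

# Hypothetical chunk size; real products choose their own, often variable-length.
CHUNK_SIZE = 4096


def chunk_fingerprints(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks and yield (fingerprint, chunk) pairs."""
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        yield hashlib.sha256(chunk).hexdigest(), chunk


def source_side_backup(data: bytes, target_index: set[str]) -> tuple[list[str], int]:
    """Back up data from the source, sending only chunks the target lacks.

    target_index stands in for the fingerprints already stored at the data
    center. Returns the ordered fingerprints needed to rebuild the data
    (the "recipe") and the number of chunk bytes that crossed the WAN.
    """
    recipe: list[str] = []
    bytes_sent = 0
    for fingerprint, chunk in chunk_fingerprints(data):
        recipe.append(fingerprint)
        if fingerprint not in target_index:
            target_index.add(fingerprint)  # target stores the new chunk
            bytes_sent += len(chunk)       # only new chunks are transmitted
    return recipe, bytes_sent


# Usage: two nightly backups of a 16 KiB file where only the last 4 KiB changed.
index: set[str] = set()
monday = b"".join(bytes([i]) * CHUNK_SIZE for i in range(4))
tuesday = monday[:-CHUNK_SIZE] + b"x" * CHUNK_SIZE
recipe_mon, sent_mon = source_side_backup(monday, index)
recipe_tue, sent_tue = source_side_backup(tuesday, index)
print(f"Monday sent {sent_mon} bytes, Tuesday sent {sent_tue} bytes")
```

The second night ships one chunk instead of four. The sketch ignores the fingerprint lookups themselves, which also cross the WAN in a real system but are far smaller than the data they replace.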

Pure Windows shops can look at Data Storage Group's ArchiveIQ, an innovative backup program that deduplicates data at the backup server. I expect EMC, CommVault, and Symantec to add backup-server deduplication over the next year or two.

How fast do you need to back up?
While vendors like to talk about speeds and feeds, what really matters is whether a given backup device is fast enough for your needs. Vendors claim their in-line deduplicating targets can handle 200 GB to 800 GB an hour, and their post-processing virtual tape libraries (VTLs) have data ingestion rates of up to 34 TB an hour. But post-processing VTLs may then need several hours to deduplicate the data.

In addition to overall performance, make sure to look at how fast the appliance you're considering can handle a single backup stream from your biggest backup job, and how long the deduplicating post-process will take to complete.
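A back-of-the-envelope comparison helps frame these numbers. The sketch below uses made-up rates, not vendor specifications, and assumes post-process deduplication starts only after all data has been ingested; if a product overlaps the two phases, its second figure would be shorter. In both cases the biggest job, running as a single stream, can set the floor on the backup window.

```python
def hours_needed(size_gb: float, rate_gb_per_hour: float) -> float:
    """Hours to move size_gb at a sustained rate given in GB per hour."""
    return size_gb / rate_gb_per_hour


def inline_window(total_gb, device_rate, largest_job_gb, stream_rate):
    """Backup window for an in-line deduplicating target.

    Data is deduplicated as it is written, so the window is set by whichever
    is slower: the whole night's data at the device's aggregate rate, or the
    biggest job at the single-stream rate.
    """
    return max(hours_needed(total_gb, device_rate),
               hours_needed(largest_job_gb, stream_rate))


def post_process_window(total_gb, ingest_rate, largest_job_gb, stream_rate, dedup_rate):
    """Return (hours to ingest, hours until deduplication finishes).

    Assumes deduplication begins only after ingestion completes.
    """
    ingest = max(hours_needed(total_gb, ingest_rate),
                 hours_needed(largest_job_gb, stream_rate))
    return ingest, ingest + hours_needed(total_gb, dedup_rate)


# Hypothetical night: 4 TB in total, largest single job 1.5 TB.
inline = inline_window(4000, device_rate=800, largest_job_gb=1500, stream_rate=300)
ingest, done = post_process_window(4000, ingest_rate=2000, largest_job_gb=1500,
                                   stream_rate=500, dedup_rate=1000)
print(f"in-line: {inline:.1f} h; post-process: ingest {ingest:.1f} h, "
      f"deduplicated after {done:.1f} h")
```

With these assumed figures the post-process box frees the backup clients sooner (3 hours versus 5), but its data is not fully deduplicated until 7 hours in, which matters if replication or the next night's jobs depend on it.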


Howard Marks

Does the technology work with your backup software?
Content-aware products rely on knowing the formats in which specific backup applications write their data. Pair a content-aware solution with a backup application it isn't equipped to handle, and you won't get any deduplication.

What interface?
Deduplicating targets come with network-attached storage (NAS) and/or VTL interfaces. The NAS interface makes it easier to manage data; after all, you can't delete part of a tape, real or virtual. The problem with NAS is that it's limited to 1-Gbps Ethernet, while VTLs run over 4-Gbps Fibre Channel interfaces. If you need backup speeds above 300 GB an hour, VTL is the way to go.
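The 300-GB-an-hour threshold follows from link speed. As a minimal sketch of the arithmetic: 1 Gbps is 125 MB per second, or 450 GB an hour at the theoretical maximum, and backup traffic gets only part of that. The efficiency factor below is an assumption, not a measured figure; with roughly two-thirds of the link usable, the ceiling lands near the 300 GB an hour cited above. The raw Fibre Channel figure likewise overstates real throughput because encoding and protocol overhead are ignored.

```python
SECONDS_PER_HOUR = 3600


def link_ceiling_gb_per_hour(link_gbps: float, efficiency: float = 1.0) -> float:
    """Rough ceiling on backup throughput over one link, in decimal GB per hour.

    link_gbps is the nominal link speed. efficiency is an assumed fraction of
    that speed left for backup data after encoding and protocol overhead;
    the default of 1.0 gives the raw theoretical maximum.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    bytes_per_second = link_gbps * 1e9 / 8  # bits to bytes
    return bytes_per_second * efficiency * SECONDS_PER_HOUR / 1e9


for label, gbps, eff in [("1-Gbps Ethernet, raw", 1, 1.0),
                         ("1-Gbps Ethernet, ~2/3 usable (assumed)", 1, 2 / 3),
                         ("4-Gbps Fibre Channel, raw", 4, 1.0)]:
    print(f"{label}: {link_ceiling_gb_per_hour(gbps, eff):.0f} GB/hour")
```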

How does the technology scale?
The first data corollary to Murphy's Law states, "Data will grow to fill all available space." As a result, no matter what backup appliance or VTL you buy today, in two or three years you'll need a bigger one.

Look for devices that can expand to at least twice the capacity you initially buy. Gateway devices that use storage area networks (SANs), and appliances that can be expanded by adding drive trays, are more flexible than standalone devices. NEC's Hydrastor, using a grid architecture of accelerator and storage nodes with essentially no maximum capacity, is especially well-suited to those with fast-growing or unpredictable needs.
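To see why doubling headroom matters, a simple compound-growth estimate shows how long a given capacity lasts. This is a hypothetical planning model: it assumes data grows by a constant fraction each year and that the deduplication ratio stays the same as the data grows.

```python
import math


def years_until_full(stored_tb: float, capacity_tb: float, annual_growth: float) -> float:
    """Years until stored (post-deduplication) data reaches capacity.

    Assumes compound growth at a constant annual rate and a stable
    deduplication ratio; both are simplifications for planning only.
    """
    if stored_tb <= 0 or capacity_tb <= 0 or annual_growth <= 0:
        raise ValueError("sizes and growth rate must be positive")
    if stored_tb >= capacity_tb:
        return 0.0  # already full
    return math.log(capacity_tb / stored_tb) / math.log(1 + annual_growth)


# Hypothetical site: 10 TB stored after deduplication, growing 50% a year.
for capacity in (20, 40):
    years = years_until_full(10, capacity, 0.5)
    print(f"{capacity} TB of capacity lasts about {years:.1f} years")
```

At 50% annual growth, a device bought at twice today's stored data fills in about 1.7 years; being able to double it again stretches that to about 3.4 years, in line with the two-to-three-year horizon above.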

Illustration by John Hersey

Return to the story:
With Data Deduplication, Less Is More
