Dark Reading is part of the Informa Tech Division of Informa PLC



6/3/2011
12:08 PM
George Crump
Commentary

How To Design A 100 Year Data Retention Strategy

A cost-effective hardware strategy is only the first step; a process and software strategy is vital to identifying data for retention and moving it off primary storage.

One hundred years is a long time to retain anything, let alone electronic data. While not everyone needs to retain data for that long, most organizations have retention needs of at least seven to 10 years. Most modern storage systems, however, are not designed to last more than five years, so how do you create a storage strategy that can retain data for more than a century?

There are two components to a 100-year retention strategy. The first is to develop a hardware strategy that can cost-effectively store that data for the next 100 years. The second is to develop a software and process strategy that will identify data and move it to the retention storage area, ideally removing it from primary storage. I believe we need to be driving toward a data center where primary storage is small, fast, and used only for the most active set of data. Even at today's prices, many environments could be solid-state storage only for their primary tier.
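To make the "identify" half of that software strategy concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any particular product: the function name `find_retention_candidates`, the `min_idle_days` threshold, and the choice of last-modified time as a stand-in for "inactive" are all assumptions. Real environments would likely use richer metadata, since modification time alone says nothing about how often a file is read.

```python
import os
import time
from pathlib import Path


def find_retention_candidates(root, min_idle_days, now=None):
    """Return files under `root` not modified in the last `min_idle_days` days.

    Hypothetical policy: last-modified time is used as a rough proxy for
    inactivity. `now` can be passed in (seconds since the epoch) to make
    the result reproducible.
    """
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * 86400  # 86400 seconds per day
    candidates = []
    for path in Path(root).rglob("*"):
        # Skip directories; only regular files are candidates.
        if path.is_file() and path.stat().st_mtime < cutoff:
            candidates.append(path)
    return sorted(candidates)
```

A call such as `find_retention_candidates("/data/projects", 365)` (the path is made up) would list files untouched for a year. Moving them to the retention storage area and removing them from primary storage would be a separate step.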

I am specifically avoiding calling this storage area an archive tier. Using the term archive implies that this data will be moved to the archive, never to be accessed again. Thanks to initiatives like analytics, litigation management, and compliance, this data will be accessed, and the system needs to be able to deliver it in a timely manner relative to its age. No matter how old it is, all of the data needs to be easy to find.

This does not mean, though, that the retention tier needs to be disk only. I struggle to see how organizations are going to afford to keep 100 years of data on spinning disk. I don't think all the power management and deduplication in the world is going to make 100 years of disk-only retention a reality. Additionally, tape has overcome some of its challenges as a long-term archive medium, specifically in the form of the Linear Tape File System (LTFS), as we discuss in our article "What is LTFS?". The answer for the retention storage area is going to be a mixture of tape and disk.

The disk component needs to be a scalable infrastructure where nodes of storage can be added to the disk area. More importantly, as we describe in our recent article "Building Affordable, Scalable Storage Infrastructures", these scalable designs need to support mixed node types. This means nodes of varying disk capacity and processor types that still act as one within the cluster. This is important because it allows for a rolling migration of storage nodes as equipment ages: over time, you can add new nodes with the latest processors and storage while gradually deactivating older nodes. This allows you to upgrade the cluster without a massive data migration, which, depending on the archive, may be almost impossible because of the capacity of the storage area.
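The rolling migration described above can be sketched as a toy model. This is a simplified illustration under stated assumptions, not how any specific scale-out system rebalances: the `Node` class, the `retire_node` function, and the rule of spreading the departing node's data across the remaining nodes in proportion to their free space are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A storage node in a mixed cluster (capacities may differ per node)."""
    name: str
    capacity_tb: float
    used_tb: float = 0.0

    @property
    def free_tb(self):
        return self.capacity_tb - self.used_tb


def retire_node(nodes, name):
    """Drain node `name` onto the other nodes and return the remaining nodes.

    Assumed policy: each remaining node receives a share of the drained
    data proportional to its free space at the time of the call.
    """
    leaving = next(n for n in nodes if n.name == name)
    remaining = [n for n in nodes if n.name != name]
    total_free = sum(n.free_tb for n in remaining)
    if leaving.used_tb > total_free:
        raise ValueError("not enough free capacity to drain node")
    if total_free > 0:
        # Compute every share first so later updates do not skew the ratios.
        shares = [leaving.used_tb * n.free_tb / total_free for n in remaining]
        for node, share in zip(remaining, shares):
            node.used_tb += share
    return remaining
```

In this model an upgrade is two small steps repeated over time: append a new, larger `Node` to the list, then call `retire_node` on the oldest one. Only one node's worth of data moves at a time, which is the point of supporting mixed node types.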

The size of the disk component of this retention tier, though, should be kept at a reasonable level for what you need. Retention areas for analytics (Big Data) will need larger disk areas because of the amount of data that needs to be scanned. Compliance and other forms of retention areas can have smaller disk areas, but these will still be large in comparison to primary storage. The fact that scale-out systems can potentially scale to hundreds of nodes does not mean that you want to power, cool, and protect hundreds of nodes. At some point, and I know the disk guys won't like this, you really do need to push to tape. In the past, I have advocated for a disk-only repository, but LTFS in large part changes all that. I'll explain why and how to use tape in this 100-year retention strategy in our next entry.
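One way to picture capping the disk area and pushing the rest to tape is a simple placement rule. The sketch below is an assumption for illustration, not a recommended policy: the function `place_datasets`, the tuple layout, and the rule "keep the youngest data on disk until the disk budget is reached, then send everything older to tape" are all hypothetical.

```python
def place_datasets(datasets, disk_budget_tb):
    """Split datasets between disk and tape under a fixed disk budget.

    `datasets` is a list of (name, age_days, size_tb) tuples. Datasets are
    considered youngest first. Once one does not fit in the remaining
    disk budget, it and every older dataset go to tape.
    """
    disk, tape = [], []
    used_tb = 0.0
    spilled = False
    for name, age_days, size_tb in sorted(datasets, key=lambda d: d[1]):
        if not spilled and used_tb + size_tb <= disk_budget_tb:
            disk.append(name)
            used_tb += size_tb
        else:
            spilled = True  # strict age cutoff: nothing older returns to disk
            tape.append(name)
    return disk, tape
```

An analytics-heavy environment would simply run the same rule with a larger `disk_budget_tb` than a compliance-only one, which mirrors the sizing point made above.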


George Crump is lead analyst of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. Storage Switzerland's disclosure statement.

 
