May 7, 2008
Your servers are probably bloated with data that is years old, and yet despite your retention policy, if you have one, you keep it all. The relatively low price of disk capacity has made it easier to keep everything on primary disk storage. When you think of primary storage, you think of active data: databases, current documents, e-mail, and so on. But because storage is so affordable, primary storage has also become the archive. Data is kept on disk "just in case." It seems easier to simply add more disk space to primary storage than to force users to manage it; as a result, "Data Keepage" begins.

The effects of data keepage are widespread, but two of them demand immediate attention.
Impact On Backup

Data keepage's largest impact is on the backup process. Most data centers continue to run weekly full backups. These backups get bogged down backing up millions of files that haven't been touched in years, let alone since the last backup. Hardware acquisition costs are driven up because the backup target, be it disk or tape, needs to be larger. Backup data deduplication systems can help with storage capacity costs, but they do little to thin the backup as it crosses the network. Additionally, most backup applications have difficulty handling millions of small files because they have to walk the file systems that hold them. Block-level incremental backup (BLIB) applications, like those from Network Appliance and Syncsort, thin the backup across the network by sending only changed blocks. They also are immune to the millions-of-small-files problem because they analyze what has changed at a much lower level, which is more efficient than working at the file-system level.
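To make the changed-block idea concrete, here is a minimal Python sketch. It is not how Network Appliance or Syncsort implement BLIB; it simply assumes a volume image held in memory, a hypothetical 4 KB block size, and per-block SHA-256 digests saved from the previous backup. Real products typically track changed blocks directly rather than rehashing everything.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return one SHA-256 digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(data), block_size)
    ]


def changed_blocks(previous: list[str], data: bytes,
                   block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Map block index to block contents for blocks that differ from the last backup."""
    changed = {}
    for index, digest in enumerate(block_digests(data, block_size)):
        # A block is sent if it is new or its digest no longer matches.
        if index >= len(previous) or previous[index] != digest:
            changed[index] = data[index * block_size:(index + 1) * block_size]
    return changed


# Example: a three-block image in which only the middle block changes.
old_image = b"a" * BLOCK_SIZE + b"b" * BLOCK_SIZE + b"c" * BLOCK_SIZE
new_image = b"a" * BLOCK_SIZE + b"X" * BLOCK_SIZE + b"c" * BLOCK_SIZE
delta = changed_blocks(block_digests(old_image), new_image)
```

In this example only one of the three blocks would cross the network, no matter how many files live inside the image. That is why the approach sidesteps the file-walking cost described above.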
Another solution, which can be deployed in conjunction with BLIB or independently, is disk-based archiving: getting the data off primary storage and onto a less expensive but more secure device. Cleaning off primary storage has been less than desirable in the past, but with features like data deduplication, easy access via a network mount point, massive scalability, and power management, these solutions make the process viable and the effort worthwhile.
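Before moving anything, it helps to know how much of primary storage is actually inactive. The following Python sketch is one rough way to find out, assuming "inactive" means "not modified in a given number of days." The function name and threshold are illustrative, and last-modified time is only a proxy for activity.

```python
import os
import time


def find_archive_candidates(root, max_age_days, now=None):
    """Yield (path, size_bytes) for files not modified in max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if info.st_mtime < cutoff:
                yield path, info.st_size
```

Summing the sizes it yields for a file share, for example with `sum(size for _path, size in find_archive_candidates(share, 365))`, gives a first estimate of how much capacity an archive tier could take off primary disk and out of the weekly full backup.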
Violation Of Retention Policies

If you have a strict retention policy, whether for e-mail, files, or other data types, data keepage probably puts you in violation of that policy. For example, if your policy defines retention of one year or three years, which is not uncommon, this typically means that you are only going to maintain backups for that period of time. It essentially is a restore policy to protect you from legal action. The problem is that if it can be proven that the data being sought is still on a server somewhere, you will be forced to deliver that information. Saying you don't have backups of that data is no defense unless the data itself has been deleted. Obviously, deleting data after notification of legal action is directly against the law.
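A retention policy only protects you if it is checked against what is actually on disk. Here is a hedged Python sketch of such a check. The record layout, the `RETENTION_DAYS` values, and the `legal_hold` flag are all hypothetical, standing in for whatever inventory and policy your organization really has.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical policy: retention period in days for each data type.
RETENTION_DAYS = {"email": 365, "file": 3 * 365}


@dataclass
class Record:
    path: str
    data_type: str
    last_modified: date
    legal_hold: bool = False  # True once legal action has been notified


def past_retention(records, today, policy=RETENTION_DAYS):
    """Split records older than their retention period into (deletable, held)."""
    deletable, held = [], []
    for record in records:
        limit = policy.get(record.data_type)
        if limit is None:
            continue  # no policy defined for this type; leave it alone
        if today - record.last_modified > timedelta(days=limit):
            # Data under legal hold must be kept even though it is past retention.
            (held if record.legal_hold else deletable).append(record)
    return deletable, held
```

Everything in the first list is data the policy says you should no longer have. Everything in the second is past retention but cannot be touched because of pending legal action.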
The solution here is disk-based archive, with added functionality to make it enterprise class. Capabilities such as Write Once Read Many (WORM, used to prevent changes to data), encryption, and content indexing all become critical in managing retained data.
These are just two of the critical problems that giving in to the temptation of data keepage causes. Solutions exist to clear off this old data and reserve the investment in primary storage for only the most active set of data, resulting in better use of storage expenditures, improved backup windows, and better litigation preparedness.
George Crump is founder of Storage Switzerland, an analyst firm focused on the virtualization and storage marketplaces. It provides strategic consulting and analysis to storage users, suppliers, and integrators. An industry veteran of more than 25 years, Crump has held engineering and sales positions at various IT industry manufacturers and integrators. Prior to Storage Switzerland, he was CTO at one of the nation's largest integrators.
About the Author(s)
President, Storage Switzerland
George Crump is president and founder of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. With 25 years of experience designing storage solutions for datacenters across the US, he has seen the birth of such technologies as RAID, NAS, and SAN. Prior to founding Storage Switzerland, he was CTO at one of the nation’s largest storage integrators, where he was in charge of technology testing, integration, and product selection. George is responsible for the storage blog on InformationWeek's website and is a regular contributor to publications such as Byte and Switch, SearchStorage, eWeek, SearchServerVirtualization, and SearchDataBackup.