The event data retention question has no simple answer. Organizations must not only consider their security and compliance analysis needs, but also the e-discovery risk posed by storing all of that data. They must consider what format to store the data in, the taxonomy by which it is all grouped together, the ability of the security department to actually store and manage the data, and -- perhaps most important -- how all of that data will be accessed and searched to extract actionable information when an incident occurs or regulators come knocking.
"As organizations see monitoring become more significant, they're also going to have to realize this means big data," says Scott Crawford, research director of security for Enterprise Management Associates, a consulting firm. "So we're going to need to adapt how we manage data in security, and we're going to have to become more literate in the tools and techniques of big data management."
Your ability to manage and search log data will depend largely on how long the data is kept -- and for what purpose. If the organization is simply looking to comply with regulations, then it is all pretty cut-and-dried because compliance mandates are fairly clear, says Mark Seward, director of security and compliance marketing for Splunk. But as organizations move beyond check-box compliance and really want to drive value from their event data, things change.
"If you want to store it to detect some sort of advanced persistent threat or some sort of attack that could occur over months or years, you may want to store it longer," Seward says. "I don't think that necessarily all needs to be stored on spinning disk all the time, but you want to store it someplace where it can be quickly resurrected -- preferably using a system that time-indexes all of the data so that you can take it, put it on a server, and immediately start a forensics investigation, if that's what you want to do."

According to Joe Gottlieb, CEO of SIEM vendor SenSage, his company's most advanced customers retain log data far longer than regulations require. "Our most advanced customers are very progressive in terms of embracing the fact that the more data you can hold onto and intelligently process, the more you can understand about your security operations, your security posture, what's working, what's not working, and how you might actually support the new investments to shore up one area versus another and so on," Gottlieb says. "Sometimes that holding period could fly in the face of what you are feeling obligated to do in terms of compliance."
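The time-indexed archive Seward describes can be illustrated with a minimal sketch. This is a hypothetical example, not any vendor's implementation: it assumes events carry a Unix timestamp field (`ts`) and writes them into gzip-compressed, date-keyed segments, so a forensic investigation can "resurrect" only the time window it needs rather than scanning the whole store.

```python
import gzip
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical archive layout: one compressed JSON-lines segment per UTC day.
ARCHIVE = Path("log_archive")

def archive_event(event: dict) -> None:
    """Append an event to the compressed segment for its day."""
    day = datetime.fromtimestamp(event["ts"], tz=timezone.utc).strftime("%Y-%m-%d")
    ARCHIVE.mkdir(exist_ok=True)
    with gzip.open(ARCHIVE / f"{day}.jsonl.gz", "at") as seg:
        seg.write(json.dumps(event) + "\n")

def restore_window(days: list[str]) -> list[dict]:
    """Pull back only the day segments a forensic investigation needs."""
    events = []
    for day in days:
        path = ARCHIVE / f"{day}.jsonl.gz"
        if path.exists():
            with gzip.open(path, "rt") as seg:
                events.extend(json.loads(line) for line in seg)
    return sorted(events, key=lambda e: e["ts"])
```

Because each day is a self-contained segment, cold segments can sit on cheap storage and be copied to a server and queried immediately, which is the property Seward is after.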
Not only do longer retention periods give organizations the chance to make operational adjustments to thwart attacks, but they can be crucial in breach investigations, particularly for breaches that unfold over the course of months or years.
"From a breach preparation standpoint, it is very useful to have a decent amount of data that will allow you to identify what data was compromised and what data wasn't for a given breach: which customer records were accessed, which weren't, and some of the ratios there," says Gottlieb, who says two years is a good starting rule of thumb for organizations seeking to retain event data for longer periods.
"One of the biggest benefits of some of this data retention is that, in the event of a breach, you have a more immediate understanding that you can use to put a boundary around the breach and then start to shape your damage-control activities," Gottlieb says.
Once an organization has figured out how long to keep data, two key issues that must be addressed are scalability and data taxonomy.
"Anything you can do along the lines of providing your own sort of taxonomy -- to make sure you can quickly find and identify pieces of information that are relevant to a particular use case -- is essential," Seward says. "Putting a whole library of books somewhere without some sort of system to keep everything straight is a recipe for disaster.
"If you're a data-intensive organization and need to keep massive amounts of data around -- for use either by your own internal employees or by customers -- then you've got to look at scalability, and you've got to find a system that can scale, potentially to petabytes of data," Seward says.
Data management is a function of how the data itself is structured, Crawford observes. Data format issues have long plagued SIEM practitioners, and they will be further exacerbated by keeping data longer, he notes.
Organizations should keep close tabs on formats, such as Common Event Expression -- or even vendor-based Common Event Format -- to see how they evolve, Crawford advises. But, ultimately, the effective management of petabytes of stored log data will require the security world to adopt data normalization methods used in other data-intensive disciplines.
"Normalization basically has to be done by what is understood in the data management world as ETL functionality," Crawford says. "So it is done within the pipeline somewhere -- not necessarily on the fly, but it has to be converted to a format that can be consumed by data management resources. That would be a pretty substantial, fundamental change in security and event management technology, but we're a couple of years away at least."
The absence of a structured, cross-vendor format is one reason why tools like Splunk have made a dent in the SIEM world, experts say. Security practitioners use Splunk and tools like it to run an analytics language that can still run complicated search queries on unstructured data, as well as data sources that might not be normalized, Crawford says.
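The late-binding approach Crawford credits for Splunk's traction can be illustrated with a small sketch (not Splunk's actual engine): the query filters raw, non-normalized lines directly and extracts fields at search time, so no schema has to exist before the data is stored.

```python
import re

def search(lines, *terms, extract=None):
    """Keep lines containing every term; optionally pull fields at query time."""
    hits = [line for line in lines if all(t in line for t in terms)]
    if extract:
        pattern = re.compile(extract)
        return [m.groupdict() for line in hits if (m := pattern.search(line))]
    return hits

# Invented sample of unstructured, mixed-source log lines.
logs = [
    "Jan 10 sshd: Failed password for admin from 203.0.113.9",
    "Jan 10 nginx: GET /index.html 200",
    "Jan 11 sshd: Failed password for root from 203.0.113.9",
]
attackers = search(logs, "Failed password", extract=r"from (?P<ip>[\d.]+)")
```

The trade-off against the ETL approach above is the classic one: schema-on-read keeps ingest cheap and flexible, while schema-on-write makes large-scale queries cheaper.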
Crawford cites SenSage for its efforts to offer security data feeds in a way that enables security departments to analyze it using mainstream business intelligence and data management tools -- a concept that is being championed by the Open Security Intelligence forum. Other vendors that offer both SIEM and storage management capabilities -- such as EMC, HP, and IBM -- still have a ways to go to offer a mature tool set for log data management and searching, he notes.
Quick searches are the key to long-running stores of security data, experts say, because ultimately all of these retention efforts are for naught if the data isn't actually put to use efficiently in the long run.
"All this data storage is a waste if you can't access it -- and access it in a way that is going to be timely in the event of an incident response scenario," says Gottlieb. "Perhaps more importantly, on a proactive level [the stored data should] actually be feeding your exception filtering, reporting, review, and management triage processes."