"Your security analytics are only as accurate and useful as the data you put in," says Gidi Cohen, CEO of Skybox Security. "If the data has gaping holes, misses important network zones, or lacks input from security controls, then you will have gaping holes in your view and miss key dependencies between the myriad security tools and processes you use."
So what does a data-centric analysis process look like? It starts with recognizing that you have access to more relevant data than you think. Most organizations already possess everything they need to know themselves for analytics purposes, says Kelly White, vice president and information security manager of a top 25 U.S. financial institution, who shared best practices on the condition that his employer not be named.
[Do you see the perimeter half empty or half full? See Is The Perimeter Really Dead?]
"If you just think about and internalize the amount of information your systems produce -- just by the fact that they're running on your network -- if you think about all of the security information that your users produce as they go about their daily work, it's not something that you have to go out and buy from somebody," White says. "You don't need to subscribe to a report. Really, everything you need to know yourself, you've got already."
Organizations that get creative with their data sourcing tend to get more value out of analytics than those that simply lump security system log data into a SIEM, or that treat threat intelligence from outside sources as interchangeable with security analytics.
Data sources that can help form more complete data sets include network footprint data, platform configuration information, log-in and identity management data, database server logs, and NetFlow data. White's organization even goes so far as to use a Google appliance to index and search unstructured data stores such as SharePoint servers, finding relevant information -- such as unstructured repositories of PII -- and creating a map of data that would otherwise present blind spots when assessing security risks.
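To make the idea concrete, here is a minimal sketch of how such internal feeds might be merged into a single per-asset view, so that gaps surface as missing fields. The feed names, record layouts, and field names are all illustrative assumptions, not details from White's environment:

```python
# Hypothetical sketch: merge several internal data feeds, each keyed by
# host, into one dictionary per asset so coverage gaps become visible.
from collections import defaultdict

def build_asset_map(netflow, configs, identity, unstructured_hits):
    """Combine host-keyed feeds into one record per asset."""
    assets = defaultdict(dict)
    for rec in netflow:            # e.g. {"host": "10.0.0.5", "peers": [...]}
        assets[rec["host"]].setdefault("peers", []).extend(rec["peers"])
    for rec in configs:            # e.g. {"host": ..., "os": ..., "patched": ...}
        assets[rec["host"]]["config"] = rec
    for rec in identity:           # e.g. {"host": ..., "last_login": ...}
        assets[rec["host"]]["identity"] = rec
    for rec in unstructured_hits:  # e.g. indexed PII findings on file shares
        assets[rec["host"]].setdefault("pii_hits", []).append(rec["path"])
    return dict(assets)

def blind_spots(assets):
    """Hosts seen on the wire but with no configuration record."""
    return [h for h, a in assets.items() if "config" not in a]
```

A host that shows up in NetFlow but has no configuration record is exactly the kind of blind spot the mapping exercise is meant to expose.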
Identifying potential internal sources of data is only the first step in ensuring that it can provide value to an analytics program. Organizations also must groom and prepare the data to make sure it is of reliable quality and in a useful format. This means doing a bit of quality assurance -- a sort of presecurity analytics, as Mike Lloyd, CTO of RedSeal Networks, calls it -- to make sure gaps are filled and sources are refined so their feeds are accurate enough to base operational assumptions on.
"If the data quality is bad, you have to do analysis on that first to decide what's wrong with the data, how bad a problem is it and what you can do about it to make it useable," Lloyd says, explaining that the more data sources you combine to get slightly different views of the same environment, the easier it is to do this. "When you combine data, you can criticize the data feed itself and not rush headlong into security analytics."
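Lloyd's point about combining feeds to criticize the data itself can be sketched as a simple cross-check: two feeds that should describe the same environment are compared, and disagreements are flagged before any analytics run. This is a hypothetical illustration of the idea, not a RedSeal technique:

```python
# Hypothetical pre-analytics QA check: two feeds that should cover the
# same set of hosts are compared, and mismatches are reported so the
# feeds themselves can be questioned before analysis begins.
def feed_discrepancies(feed_a, feed_b):
    """Return hosts present in one feed but missing from the other."""
    hosts_a, hosts_b = set(feed_a), set(feed_b)
    return {"only_in_a": sorted(hosts_a - hosts_b),
            "only_in_b": sorted(hosts_b - hosts_a)}
```

A non-empty result here is a data-quality finding, not a security finding: it says one of the feeds has a gap that needs investigating.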
And this kind of criticism of data feeds shouldn't just happen on the front end of the analytics process -- it should be an ongoing routine, because, as Rajesh Goel, CTO of Brainlink International, points out, changes from infrastructure vendors can greatly impact data feeds.
"Vendor updates, patches and changes can change the meaning of the raw data generated and subsequent analytics. Some vendors communicate the changes clearly, others bury them in massive updates, and do NOT take into account that the events being generated have changed," he says. "It's important to confirm/validate that we're still getting the needed data and that the value of threats or events hasn't changed."
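The kind of confirmation Goel describes might look like a recurring schema check run against incoming events: after any vendor update, verify that each event still carries the fields, and the field types, the analytics depend on. The field names and types below are assumptions for illustration:

```python
# Hypothetical feed-validation routine: after a vendor update, confirm
# each incoming event still matches the schema the analytics expect.
EXPECTED_FIELDS = {"timestamp": str, "src_ip": str, "severity": int}

def validate_events(events):
    """Return indices of events that no longer match the expected schema."""
    bad = []
    for i, ev in enumerate(events):
        for field, ftype in EXPECTED_FIELDS.items():
            if field not in ev or not isinstance(ev[field], ftype):
                bad.append(i)
                break
    return bad
```

Running a check like this on a schedule turns "did the update silently change our events?" from a guess into a routine answer.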
Even if the data itself is good, it may not be dispensed by a particular piece of software or hardware in any format usable to a security analytics team.
"The data required to perform accurate and thorough security big data analytics exists; the challenge is in having to consume vast amounts of dissimilar and proprietary formats," says Jim Butterworth, CSO of HBGary.
This is why normalization may also play an important role in getting data ready for analytics prime time.
"In order for the data to be useful, it must be collected and normalized, so that all of the data is speaking the same language," says Cohen. "Once the data is normalized, your analytical tools can operate on that data in a common way, which reduces the amount of vendor-specific expertise needed."
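The normalization Cohen describes can be illustrated with a small sketch: two invented vendor log layouts are mapped onto one common schema, after which downstream tools never need vendor-specific knowledge. Both source formats and the target schema are assumptions made up for this example:

```python
# Hypothetical normalization step: map two vendor-specific log layouts
# onto one common schema so all records "speak the same language".
SEVERITY_MAP = {"LOW": 1, "MEDIUM": 2, "HIGH": 3}

def normalize_vendor_a(rec):
    # Assumed vendor A layout: {"ts": ..., "source": ..., "level": "HIGH"}
    return {"timestamp": rec["ts"],
            "src_ip": rec["source"],
            "severity": SEVERITY_MAP[rec["level"]]}

def normalize_vendor_b(rec):
    # Assumed vendor B layout: {"time": ..., "ip": ..., "sev": 3}
    return {"timestamp": rec["time"],
            "src_ip": rec["ip"],
            "severity": rec["sev"]}
```

Once both feeds emit the same `timestamp`/`src_ip`/`severity` records, analytical tools can operate on them in a common way, which is exactly the reduction in vendor-specific expertise Cohen points to.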
However, organizations shouldn't worship at the normalization altar to the point where it holds back nimble analysis.
"I would argue that you don't necessarily have to normalize everything. There's going to be a lot of unstructured data that doesn't necessarily have to be structured," says Michael Roytman, data scientist for Risk I/O. He explains that, for example, an organization might take a piece of external data from a report like the DBIR saying its industry is 12% more likely to experience something like a SQL injection attack, and add a "fudge factor" that increases the weight of those vulnerabilities. "It's about looking at that data and figuring out a quick, easy and dirty way to apply that to your target asset."
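Roytman's "fudge factor" can be sketched in a few lines: a multiplier drawn from an external report boosts the score of the vulnerability classes that report flags as more likely for your industry. The mapping and the base-score scale here are assumptions; only the 12% figure comes from the example above:

```python
# Hypothetical "fudge factor" weighting: vulnerability classes an
# external report flags as more likely in your industry get their
# scores boosted (1.12 mirrors the 12% figure in the example above).
INDUSTRY_WEIGHTS = {"sql_injection": 1.12}  # assumed mapping

def weighted_score(vuln_class, base_score):
    """Scale a base score by the industry multiplier, defaulting to 1.0."""
    return base_score * INDUSTRY_WEIGHTS.get(vuln_class, 1.0)
```

This is the "quick, easy and dirty" application Roytman describes: the external data never gets normalized into the internal schema, it just reweights what is already there.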