Dark Reading is part of the Informa Tech Division of Informa PLC

Big Data Detectives

Could big data be the key to identifying sophisticated threats? Security experts are on the case.

How Big Is Big?

Big data itself isn't a technology or a method of analysis. It's a concept that involves collecting, managing and making sense of larger volumes and newer sources of data. It's about analyzing the "dark data" (data that is collected but rarely used) created by business devices and systems. For companies, that means collecting orders of magnitude more data.

Business projects aimed at using big data to support security typically follow two paths. In the first, security teams gain access to a company's operational data and run an analysis against that data to highlight events that may indicate a security threat. Alternatively, the team can store data from security devices and other related systems and analyze the security-specific data for correlations that flag a potential attack.

Which types of data should be analyzed? Opinions vary. Many SIEM vendors argue that the proliferation of device log data creates a big data problem. Other companies, such as RSA, use a stricter definition. For them, big data means monitoring all of the information that crosses the enterprise network -- perhaps an unsurprising opinion for a company owned by storage system maker EMC.

"People think that any time you collect security information, that is big data," says Eddie Schwartz, chief information security officer for RSA. "No, it 's a new way of looking at information. Big data means that we're looking at transactional information, we're looking at the full context and content of network traffic."

For large companies, a big data store of security information may arise as a byproduct of normal business, or building one may be a deliberate goal.

But some big data advocates urge companies to search for more data sources under a "more is better" mantra. "One of the tenets of big data is that if I have a larger data set, I may see correlations that I might not have seen before," says Samuel Harris, director of enterprise risk management for Teradata.

Yet deriving security intelligence from a large collection of business data requires hard work. Many enterprises have tried to merge additional analytics capabilities into SIEM systems, but that has caused more headaches than hits, says Lucas Zaichkowsky, enterprise defense architect for AccessData, a computer forensics and security consulting firm.

"A company can have so much data and try to do so much with it, and there are no SIEM solutions that can handle it," he says. "There are a lot of failed SIEM projects."

In fact, growth in the types and volume of data produced by networking hardware creates the greatest challenge for companies trying to mine network data. In a study of companies' attitudes toward using big data analytics for security, half of 706 respondents had trouble handling the growth of network data, the Ponemon Institute found. Only 5% of IT security respondents believed the growth in data is an opportunity.

From Big Data To Bad Guys

Nevertheless, there are success stories in combining big data and security. In 2009, IT security firm BeyondTrust embarked on its own big data project. To help security managers focus on the most pressing vulnerabilities, the company pulled together frequently updated internal information -- such as the configuration of every machine in a 100,000-client network -- with information on the latest vulnerabilities, exploit kits and attacks.

Combining external and internal sets of data can help companies focus on the few vulnerabilities that really make a difference -- situations where the company has systems using vulnerable software, and attackers know about the software flaws and are actively exploiting them.
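The intersection described above can be sketched in a few lines of Python. This is a minimal illustration, not BeyondTrust's implementation: the `Host` record, the `ACTIVELY_EXPLOITED` feed, and every product name and identifier in it are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """One machine from the internal inventory (hypothetical record layout)."""
    name: str
    software: frozenset  # set of (product, version) pairs installed on the host


# Hypothetical external feed: (product, version) -> advisory ID for flaws
# that attackers are known to be exploiting right now.
ACTIVELY_EXPLOITED = {
    ("ExampleReader", "9.1"): "EX-0001",
    ("ExampleServer", "2.3"): "EX-0002",
}


def prioritize(hosts, exploited):
    """Return (host, product, version, advisory) for every installed package
    that also appears in the actively exploited feed."""
    findings = []
    for host in hosts:
        for product, version in sorted(host.software):
            advisory = exploited.get((product, version))
            if advisory is not None:
                findings.append((host.name, product, version, advisory))
    return findings


inventory = [
    Host("desk-01", frozenset({("ExampleReader", "9.1"), ("ExampleMail", "4.0")})),
    Host("srv-07", frozenset({("ExampleServer", "2.4")})),
]

findings = prioritize(inventory, ACTIVELY_EXPLOITED)
for finding in findings:
    print(finding)
```

Only the machine running a version that is both installed and actively exploited is reported; the patched server and the unaffected mail client drop out of the list.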

"As a customer, it lets me determine what do I have to do this week and what do I have to do next week to prevent my company from being hacked," says Marc Maiffret, CTO for BeyondTrust.

Another benefit is that BeyondTrust customers can see where they are vulnerable and also query the data for more specific information. "We know there is no way that we have thought of every scenario of how people will use this data, so we give them the tools and let them work with the data," Maiffret says.

Another success story: At the RSA Conference in 2012, Preston Wood, chief security officer at Salt Lake City-based Zions Bancorporation, outlined the bank's use of analytics to mine security events. Zions used open source Hadoop, which implements the MapReduce model popularized by Google, along with business intelligence tools to correlate logs from antivirus, databases, firewalls, intrusion-detection systems and financial-industry-specific sources of information, such as credit applications and data. Using these methods, Zions has been able to collect and take action on security information in minutes when it used to take hours, Wood said.
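To make the idea of correlating logs across sources concrete, here is a small map-and-reduce style sketch in plain Python. It is not Zions' pipeline and does not use Hadoop; the log sources, field names (`host`, `msg`) and sample events are all assumptions made for illustration.

```python
from collections import defaultdict


def map_event(source, event):
    """Map step: key every event by the host it concerns."""
    return event["host"], (source, event["msg"])


def reduce_by_host(pairs):
    """Reduce step: collect all (source, message) values under each host key."""
    grouped = defaultdict(list)
    for host, value in pairs:
        grouped[host].append(value)
    return dict(grouped)


def correlate(logs):
    """Return only the hosts that show up in more than one log source."""
    pairs = [map_event(source, event)
             for source, events in logs.items()
             for event in events]
    grouped = reduce_by_host(pairs)
    return {host: values for host, values in grouped.items()
            if len({source for source, _ in values}) > 1}


# Hypothetical sample events from three separate systems.
logs = {
    "antivirus": [{"host": "desk-01", "msg": "quarantined file"}],
    "firewall": [{"host": "desk-01", "msg": "outbound connection blocked"},
                 {"host": "desk-02", "msg": "outbound connection allowed"}],
    "ids": [{"host": "srv-07", "msg": "port scan signature"}],
}

correlated = correlate(logs)
print(correlated)
```

A host that appears in a single source may be noise; one that appears in several independent sources at once is a stronger candidate for an analyst's attention.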

In most cases, big data techniques are used to detect compromises that have already occurred, rather than to prevent them. Because companies are living in a state of compromise, they need to gather as much information as possible on what is happening in their network, says AccessData's Zaichkowsky. "They are accepting that there always will be a Victim Zero, and instead focus on spotting the activity."

Using statistical techniques such as linear regression, general linear models and machine learning, a security analyst looking at data can find odd behavior, suspicious events and other anomalies indicative of a compromise. While some events -- such as an internal system accessed from Russia at night -- are easy to identify as suspicious, more subtle transactions are missed because an analyst hasn't created a rule to watch for the activity. Mapping access attempts from each system, for example, could help security teams pinpoint when a compromised computer is trying every system on a network.
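The access-mapping example lends itself to a short sketch. The code below counts how many distinct systems each source contacts and flags sources whose count is a statistical outlier. It uses a median-based (modified z-score) test rather than the regression or machine-learning methods named above, simply because it fits in a few lines; the threshold of 3.5, the fallback when the deviation is zero, and the sample log are assumptions.

```python
from collections import defaultdict
from statistics import median


def fan_out(access_log):
    """Count the distinct destinations contacted by each source system."""
    destinations = defaultdict(set)
    for source, destination in access_log:
        destinations[source].add(destination)
    return {source: len(dests) for source, dests in destinations.items()}


def flag_scanners(access_log, threshold=3.5):
    """Flag sources whose fan-out is far above the median of all sources."""
    counts = fan_out(access_log)
    if not counts:
        return []
    values = list(counts.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        mad = 1.0  # assumption: avoid dividing by zero when most hosts behave identically
    return sorted(source for source, count in counts.items()
                  if 0.6745 * (count - med) / mad > threshold)


# Hypothetical access log of (source, destination) pairs.
access_log = [
    ("ws-a", "mail"), ("ws-a", "files"),
    ("ws-b", "mail"), ("ws-b", "files"), ("ws-b", "wiki"),
    ("ws-c", "mail"), ("ws-c", "wiki"),
    ("ws-d", "mail"), ("ws-d", "files"), ("ws-d", "wiki"),
]
# One workstation tries 20 different servers.
access_log += [("ws-x", f"srv-{i:02d}") for i in range(20)]

suspects = flag_scanners(access_log)
print(suspects)
```

No analyst had to write a rule naming `ws-x` in advance; the workstation stands out because its behavior differs sharply from that of its peers.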

"If I ask a business person what 'bad' looks like, it's not an easy question," says RSA's Schwartz. "But mathematically, these types of anomalous transactions are much more obvious when you do statistical analysis."

RSA, for example, regularly explores different data sets within its own business to find new sources of data that can be mined for security information, Schwartz says.

Unlike log data, which resembles the summary information on a phone bill, big data systems collect detailed records, network packet data, and other data and metadata that are important to enterprise security.

For instance, a SIEM system may note that an EXE file had been downloaded to a desktop, and that the domain it came from was not on any blacklist. However, using other data, a different picture can emerge: The program was packed and obfuscated, downloaded from a nonstandard port and sent from a domain that was only 3 days old.
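The contrast between the two views can be sketched as follows. This is an illustrative model only: the `Download` record, the set of "standard" ports, and the 30-day cutoff for a young domain are assumptions, not thresholds taken from any product.

```python
from dataclasses import dataclass

# Assumption: ports treated as "standard" for web downloads.
STANDARD_WEB_PORTS = {80, 443}


@dataclass
class Download:
    """An executable download plus context gathered from other data sources."""
    domain: str
    port: int
    packed: bool
    domain_age_days: int
    blacklisted: bool


def siem_view(download):
    """Log-only view: the event is a concern only if the domain is blacklisted."""
    return "alert" if download.blacklisted else "ok"


def enriched_indicators(download, min_domain_age_days=30):
    """Enriched view: list every contextual reason the download looks suspicious."""
    reasons = []
    if download.blacklisted:
        reasons.append("domain on blacklist")
    if download.packed:
        reasons.append("packed or obfuscated binary")
    if download.port not in STANDARD_WEB_PORTS:
        reasons.append(f"nonstandard port {download.port}")
    if download.domain_age_days < min_domain_age_days:
        reasons.append(f"domain only {download.domain_age_days} days old")
    return reasons


event = Download(domain="files.example.test", port=8081, packed=True,
                 domain_age_days=3, blacklisted=False)

print(siem_view(event))
print(enriched_indicators(event))
```

The same event passes the blacklist check yet accumulates three independent warning signs once the packet-level and domain-registration context is taken into account.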

"Using full-packet capture solutions and big data analytics, we see everything," says John Vecchi, VP of product strategy for Solera Networks, a security analytics firm acquired by Blue Coat in May. "We are going to be able to see things and derive information that you would never be able to know from looking at log data."

In addition to allowing security analysts a deeper look, scrutinizing big data gives them more flexibility to find indicators of compromise that may not be immediately evident. One problem with current SIEM systems is that they typically predefine the searches and analyses that can be performed on log files, giving the user less flexibility, says Mark Seward, a senior director at Splunk, which offers tools for searching and analyzing security data.

"If I let my vendor determine in advance what data I am going to see, then I am already essentially compromised," Seward says.

chart: Is your security data considered big data?

Veteran technology journalist of more than 20 years. Former research engineer. Written for more than two dozen publications, including CNET News.com, Dark Reading, MIT's Technology Review, Popular Science, and Wired News. Five awards for journalism, including Best Deadline ...

Chuck Brooks,
User Rank: Apprentice
10/19/2013 | 6:20:48 PM
re: Big Data Detectives
Situational awareness on both internal and external threats can be enhanced via data analytics. False alarms are a major issue, and the predictive and forensic aspects of data analysis are imperative. Vigilant is a good acquisition for Deloitte.
User Rank: Moderator
10/14/2013 | 10:01:35 PM
re: Big Data Detectives
Ugh, another Big Data article. "APT" must be running out of steam. I'd like to see some discussion of the NEW analytics being performed. Don't just backend your [SIEM] solution with Hadoop and call it a "Big Data" solution. This re-branding has been going on for as long as I've been in the security industry. This industry makes me want to move into the woods and make moonshine for a living.