Big Data Detectives

Cybersecurity Analytics

Could big data be the key to identifying sophisticated threats? Security experts are on the case.

Robert Lemos, Contributing Writer

October 10, 2013

Download the Dark Reading October 2013 Digital Issue

For Vigilant, it started in 2009. And as with most companies, it started small.

The security services startup, now part of audit and consulting firm Deloitte, wanted a way to bring information about external threats to clients that were using SIEM (security information and event management) systems to monitor their own environments. The Vigilant team knew that the combination of external threat data with internal security event data could be a powerful way to improve enterprise defenses, but crunching all that data would be a monumental task.

Vigilant began combining threat intelligence feeds, filtering the data to pull out the most important information for each client, and then transmitting the data to their clients' SIEM systems. The company started with two threat lists: domains serving malware, and domains compromised by the Trojans SpyEye and Zeus. To reduce false alarms and aid in analysis, the company began adding more data feeds.
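The filtering step described above can be pictured with a minimal Python sketch. It is not Vigilant's actual pipeline: the feed layout, the `filter_feed_for_client` helper and the `.example` domains are all assumptions made for illustration. The idea is simply to keep the feed entries that a given client has actually encountered.

```python
# Hypothetical threat feed entries as (domain, category) pairs.
# The schema and the domains are invented for illustration.
THREAT_FEED = [
    ("bad-downloads.example", "malware-hosting"),
    ("zeus-c2.example", "zeus"),
    ("spyeye-drop.example", "spyeye"),
]


def filter_feed_for_client(feed, client_domains_seen):
    """Keep only feed entries whose domain shows up in the client's own traffic."""
    seen = {domain.lower() for domain in client_domains_seen}
    return [(domain, category) for domain, category in feed
            if domain.lower() in seen]


# Domains one (hypothetical) client resolved today, as a SIEM might record them.
client_dns_log = ["intranet.example", "ZEUS-C2.example", "news.example"]

matches = filter_feed_for_client(THREAT_FEED, client_dns_log)
print(matches)
```

Only the matching entries would then be forwarded to that client's SIEM, which keeps the volume of alerts manageable.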

Vigilant's analysts quickly became addicted to the analysis. Each new source of data gave them the ability to tease out additional information on threats. By 2011, the company was processing about 50 to 100 GB per day. But the company's systems couldn't keep up with the flow of data, and it started missing performance deadlines, says Joe Magee, co-founder and former CTO of Vigilant, who is now a director at Deloitte.

"We were not able to catch up," Magee says. "We were not able to process the information and push it out fast enough, and that's when it became a big data issue for us. We needed to be able to rip through this data in Google-like fashion."

The volume of data and rate of change caused the problem, because most of the data came in the form of feeds updated daily with gigabytes of data. It overwhelmed the company's initial database built on top of Postgres. In 2011, Vigilant moved to Hadoop and became one of many companies -- both vendors and enterprises -- that are advocating the use of big data analytics to improve the response to security threats.

Big Data Still Just A Promise

For security teams, the use of analytics on massive quantities of security data -- from device and application logs to collections of captured network packets and operational business data -- promises better visibility into the security threats that elude current defenses.

Big data analytics can be more complex than the log collection and analysis conducted by most SIEM systems, so automating the number crunching is often needed to let security pros more easily use statistical correlations to discover trends and anomalies. Tracking days or weeks of business activity allows the system to find outliers -- a user who accesses far more data on a daily basis than the average employee, or a system that has a sudden spike in resource consumption. Analysts then can dig deeper into the large data sets of security information for any flagged events.
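As a rough illustration of the outlier hunting described above, the Python sketch below flags users whose daily data access sits far above the norm. It uses a modified z-score based on the median and the median absolute deviation, which is one common choice rather than a method named by anyone quoted here. The user names, the volumes and the 3.5 threshold are assumptions.

```python
from statistics import median


def flag_heavy_users(daily_mb_by_user, threshold=3.5):
    """Return users whose daily volume is far above the median.

    Uses the modified z-score: 0.6745 * (x - median) / MAD.
    Only values above the median are flagged, since the concern
    is a user pulling far more data than colleagues.
    """
    values = list(daily_mb_by_user.values())
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # All users look alike; nothing can be called an outlier.
        return []
    return sorted(
        user for user, v in daily_mb_by_user.items()
        if 0.6745 * (v - center) / mad > threshold
    )


# Hypothetical megabytes accessed per user in one day.
usage = {"alice": 120, "bob": 95, "carol": 110,
         "dave": 105, "erin": 2400, "frank": 100}

outliers = flag_heavy_users(usage)
print(outliers)
```

The median-based score is used here because a single extreme user would otherwise inflate the mean and standard deviation and partly hide itself.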

"Big data is not just about gaining insights, it's about helping remediate issues faster," says Jason Corbin, director of security intelligence strategy for IBM Security Systems. "The big problem is that [security teams] are overwhelmed with information they have. All that information goes to some guy who has to sift through tons of incidents or vulnerability reports and decide what they need to patch or virtually patch or fix. Security teams fall behind, and that's how companies suffer breaches based on known but unpatched vulnerabilities."

But for many companies, the promise of big data in security is just that -- a promise. While security teams hope to gain more awareness of what is going on in their networks by collecting and analyzing more of their data, the technology is still in its adolescence. "Hadoop has been around for a while, but it is still figuring out what it is and what it wants to be," says Adrian Lane, CTO for security consultancy Securosis.

Still, the potential is huge, Lane adds. Companies that kick off a big data project for security can collect an immense volume of data and have a security analyst poke through the information, ask queries of the data and make important discoveries.

chart: Which of these big data tools are in use at your company

How Big Is Big?

Big data itself isn't a technology or a method of analysis. It's a concept that involves collecting, managing and making sense of more and new data sources. It's about analyzing the "dark data" (data that is collected but rarely used) created by business devices and systems. For companies, that means collecting orders of magnitude more data.

Business projects aimed at using big data to support security typically follow two paths. In the first, security teams gain access to a company's operational data and run an analysis against that data to highlight events that may indicate a security threat. Alternatively, the team can store data from security devices and other related systems and analyze the security-specific data for correlations that flag a potential attack.

Which types of data should be analyzed? Opinions vary. Many SIEM vendors argue that the proliferation of device log data creates a big data problem. Other companies, such as RSA, use a more strict definition. For them, big data means monitoring all of the information that crosses the enterprise network -- perhaps an unsurprising opinion for a company owned by storage system maker EMC.

"People think that any time you collect security information, that is big data," says Eddie Schwartz, chief information security officer for RSA. "No, it's a new way of looking at information. Big data means that we're looking at transactional information, we're looking at the full context and content of network traffic."

For large companies, a big data store of security information may arise as a byproduct of normal business, or building one may be a goal in itself.

But some big data advocates urge companies to search for more data sources under a "more is better" mantra. "One of the tenets of big data is that if I have a larger data set, I may see correlations that I might not have seen before," says Samuel Harris, director of enterprise risk management for Teradata.

Yet deriving security intelligence from a large collection of business data requires hard work. Many enterprises have tried to merge additional analytics capabilities into SIEM systems, but that has caused more headaches than hits, says Lucas Zaichkowsky, enterprise defense architect for AccessData, a computer forensics and security consulting firm.

"A company can have so much data and try to do so much with it, and there are no SIEM solutions that can handle it," he says. "There are a lot of failed SIEM projects."

In fact, growth in the types and volume of data produced by networking hardware creates the greatest challenge for companies trying to mine network data. In a study of companies' attitudes toward using big data analytics for security, half of 706 respondents had trouble handling the growth of network data, the Ponemon Institute found. Only 5% of IT security respondents believed the growth in data is an opportunity.

From Big Data To Bad Guys

Nevertheless, there are success stories in combining big data and security. In 2009, IT security firm BeyondTrust embarked on its own big data project. To help security managers focus on the most pressing vulnerabilities, the company pulled together frequently updated internal information -- such as the configuration of every machine in a 100,000-client network -- with information on the latest vulnerabilities, exploit kits and attacks.

Combining external and internal sets of data can help companies focus on the few vulnerabilities that really make a difference -- situations where the company has systems using vulnerable software, and attackers know about the software flaws and are actively exploiting them.
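That intersection of "what we run" and "what attackers are using" can be sketched in a few lines of Python. This is not BeyondTrust's implementation. The host names, package versions and the `VULN-A` style identifiers are placeholders, and the `prioritize` helper is hypothetical.

```python
def prioritize(inventory, known_vulns, actively_exploited):
    """Map each actively exploited flaw to the hosts that are exposed to it.

    inventory:          {host: [(package, version), ...]}   internal data
    known_vulns:        {(package, version): [vuln_id, ...]} external data
    actively_exploited: set of vuln_ids seen in current attacks
    """
    urgent = {}
    for host, packages in inventory.items():
        for package in packages:
            for vuln_id in known_vulns.get(package, []):
                if vuln_id in actively_exploited:
                    urgent.setdefault(vuln_id, []).append(host)
    return urgent


# Placeholder data; identifiers are invented, not real advisories.
inventory = {
    "ws-001": [("reader", "9.0"), ("browser", "21.0")],
    "ws-002": [("reader", "9.0")],
    "srv-01": [("webserver", "2.2")],
}
known_vulns = {
    ("reader", "9.0"): ["VULN-A"],
    ("browser", "21.0"): ["VULN-B"],
    ("webserver", "2.2"): ["VULN-C"],
}
actively_exploited = {"VULN-A"}

urgent = prioritize(inventory, known_vulns, actively_exploited)
print(urgent)
```

Of the three flaws present in this toy network, only the one under active attack reaches the "fix this week" list.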

"As a customer, it lets me determine what do I have to do this week and what do I have to do next week to prevent my company from being hacked," says Marc Maiffret, CTO for BeyondTrust.

Another benefit is that BeyondTrust customers can see where they are vulnerable and also query the data for more specific information. "We know there is no way that we have thought of every scenario of how people will use this data, so we give them the tools and let them work with the data," Maiffret says.

Another success story: At the RSA Conference in 2012, Preston Wood, chief security officer at Salt Lake City-based Zions Bancorporation, outlined the bank's use of analytics to mine security events. Zions used open source Hadoop coupled with Google's MapReduce and business intelligence tools to correlate logs from antivirus, databases, firewalls, intrusion-detection systems and financial-industry-specific sources of information, such as credit applications and data. Using these methods, Zions has been able to collect and take action on security information in minutes when it used to take hours, Wood said.

In most cases, big data techniques are used to detect compromises that have already occurred, rather than to prevent them. Because companies are living in a state of compromise, they need to gather as much information as possible on what is happening in their network, says AccessData's Zaichkowsky. "They are accepting that there always will be a Victim Zero, and instead focus on spotting the activity."

Using statistical techniques such as linear regression, general linear models and machine learning, a security analyst looking at data can find odd behavior, suspicious events and other anomalies indicative of a compromise. While some events -- such as an internal system accessed from Russia at night -- are easy to identify as suspicious, more subtle transactions are missed because an analyst hasn't created a rule to watch for the activity. Mapping access attempts from each system, for example, could help security teams pinpoint when a compromised computer is trying every system on a network.
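The access-mapping idea at the end of that paragraph does not need heavy statistics to demonstrate. The Python sketch below simply counts how many distinct systems each source has tried to reach and flags any source that touches most of the network. It is a plain fan-out count, not regression or machine learning, and the addresses, the `find_scanners` helper and the 80% cutoff are assumptions.

```python
from collections import defaultdict


def find_scanners(access_attempts, all_hosts, min_fraction=0.8):
    """Flag sources that tried to reach at least min_fraction of the other hosts.

    access_attempts: iterable of (source, destination) pairs
    all_hosts:       set of known systems on the network
    """
    targets = defaultdict(set)
    for src, dst in access_attempts:
        if src != dst:
            targets[src].add(dst)

    flagged = []
    for src, dsts in targets.items():
        others = len(all_hosts - {src})
        if others and len(dsts & all_hosts) / others >= min_fraction:
            flagged.append(src)
    return sorted(flagged)


# A hypothetical five-host network.
hosts = {"10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4", "10.0.0.5"}

# Two workstations talk to one server; 10.0.0.4 tries everything.
attempts = [("10.0.0.2", "10.0.0.1"), ("10.0.0.3", "10.0.0.1")]
attempts += [("10.0.0.4", h) for h in sorted(hosts) if h != "10.0.0.4"]

scanners = find_scanners(attempts, hosts)
print(scanners)
```

No analyst had to write a rule naming the suspicious machine in advance; the pattern falls out of the aggregated data.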

"If I ask a business person what 'bad' looks like, it's not an easy question," says RSA's Schwartz. "But mathematically, these types of anomalous transactions are much more obvious when you do statistical analysis."

RSA, for example, regularly explores different data sets within its own business to find new sources of data that can be mined for security information, Schwartz says.

Unlike log data, which resembles the summary information on a phone bill, big data systems collect detailed records, network packet data, and other data and metadata that are important to enterprise security.

For instance, a SIEM system may note that an EXE file had been downloaded to a desktop, and that the domain it came from was not on any blacklist. However, using other data, a different picture can emerge: The program was packed and obfuscated, downloaded from a nonstandard port and sent from a domain that was only 3 days old.
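To make the contrast concrete, here is a small Python sketch that enriches a single download event with the extra context mentioned above. The field names, the file name, the list of standard ports and the 30-day "young domain" cutoff are all assumptions chosen for the example, not settings from any product.

```python
from datetime import date

# Assumed set of ports considered normal for web downloads.
STANDARD_WEB_PORTS = {80, 443}

# Assumed cutoff: domains younger than this are treated as suspicious.
YOUNG_DOMAIN_DAYS = 30


def risk_indicators(download, today):
    """Return the list of warning signs found for one download event."""
    indicators = []
    if download["packed"]:
        indicators.append("packed or obfuscated binary")
    if download["port"] not in STANDARD_WEB_PORTS:
        indicators.append("nonstandard port")
    age_days = (today - download["domain_registered"]).days
    if age_days < YOUNG_DOMAIN_DAYS:
        indicators.append(f"domain only {age_days} days old")
    return indicators


# A hypothetical event: the blacklist check alone would have passed it.
download = {
    "file": "update.exe",
    "packed": True,
    "port": 8081,
    "domain_registered": date(2013, 10, 7),
}

indicators = risk_indicators(download, today=date(2013, 10, 10))
print(indicators)
```

Each signal is weak on its own. Seen together, they paint the different picture the log entry missed.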

"Using full-packet capture solutions and big data analytics, we see everything," says John Vecchi, VP of product strategy for Solera Networks, a security analytics firm acquired by Blue Coat in May. "We are going to be able to see things and derive information that you would never be able to know from looking at log data."

In addition to allowing security analysts a deeper look, scrutinizing big data gives them more flexibility to find indicators of compromise that may not be immediately evident. One problem with current SIEM systems is that they typically predefine the searches and analyses that can be performed on log files, giving the user less flexibility, says Mark Seward, a senior director at Splunk, which offers tools for searching and analyzing security data.

"If I let my vendor determine in advance what data I am going to see, then I am already essentially compromised," Seward says.

chart: Is your security data considered big data?

Waiting For Maturity

While big data analysis holds promise for security, a number of factors have slowed its adoption. First, most enterprises don't have a line item in the budget for big data security projects. "Big data is about solving business problems, and security is generally, in the beginning, not one of those business problems," says Hadi Nahari, chief security architect for graphics chipmaker Nvidia. Some companies are also concerned that big data projects might introduce risk by forcing changes to the way security systems collect and report data, he notes.

Another major obstacle is the shortage of experts with the skills to mine large security databases for information. In addition to having the abilities of a data scientist, any big data security project leader also needs security expertise and a focus on usability, says Teradata's Harris.

The lack of skilled personnel was the third most significant barrier to a strong security posture among enterprises, according to the Ponemon Institute's "Big Data Analytics In Cyber Defense" report, commissioned by Teradata.

The top two barriers, according to the report, were a lack of effective security technology and an insufficient view into business processes -- chosen by 43% and 42% of respondents, respectively. During its RSA 2012 presentation, Zions Bancorporation introduced a team of three employees, including a data scientist, who created and run the company's big data project. But most companies can't afford to hire so many people for a big data security project.

Protecting Big Data

Using big data could be a boon to security, but enterprises should not forget about protecting the big data itself.

Because big data can be a complete record of a business's operations, it's important to lock it down, says Erik Jarlstrom, VP of technology solutions at Dataguise. Companies need to secure big data stores early to avoid delaying the project.

Big data resides in highly distributed clusters of computers, so securing the entire system is a challenge, according to Adrian Lane, CTO of security consultancy Securosis, which recently released a research paper on big data security. Because data is spread among the nodes and stored in multiple copies, it's difficult to know where your data resides. In most cases, there is no generally available encryption for repositories, and no role-based administrative controls.

Lane advises that companies should use the Kerberos protocol to authenticate big data nodes and add file encryption. "We hear [from security architects] the most popular security model is to just hide the entire cluster within their infrastructure," Lane writes. "But those repositories are now Web accessible and very attractive targets."

Another hurdle to using big data in security is the relative immaturity of the market. While a number of security products now tout some tie-in with big data analytics, they require a great deal of expertise to use and maintain. "Big data has been around for a while, but it's only in its second generation," Securosis's Lane says. "It's not ready for prime time for many companies."

The easiest way for a company to get started in analyzing its security data is to buy a large server and start collecting information, says Vigilant's Magee. Many Vigilant clients are considering buying a large 32- or 64-CPU server and a fast data store, and some of them work with business teams that are already familiar with Hadoop.

"We can leverage Moore's Law to get out in front of this problem. We can start putting data into it and analyze it," Magee says. "While that may seem like a very simple or mundane version of SIEM, companies want that ability. They want to ask questions of their data."

For small and midsize businesses that don't have the resources to start up their own big data project, the only likely solution is to settle for services that incorporate external feeds and security analytics, says Jon Oltsik, senior principal analyst with the Enterprise Strategy Group. While big data analytics can be more effective than SIEM, it isn't easy to incorporate into a business.

"Easy is the key word," Oltsik says. "Big data is too complex and too costly for most midsize businesses, so the question is who can deliver the intelligence of big data at a lower cost than doing it themselves. For most smaller companies, that will be a service provider."

chart: When will you use big data analytics for cyber defense?

About the Author(s)

Robert Lemos, Contributing Writer

Veteran technology journalist of more than 20 years. Former research engineer. Written for more than two dozen publications, including CNET News.com, Dark Reading, MIT's Technology Review, Popular Science, and Wired News. Five awards for journalism, including Best Deadline Journalism (Online) in 2003 for coverage of the Blaster worm. Crunches numbers on various trends using Python and R. Recent reports include analyses of the shortage in cybersecurity workers and annual vulnerability trends.
