Dark Reading is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


6/27/2013
11:37 PM

'BinaryPig' Uses Hadoop To Sniff Out Patterns In Malware

At Black Hat next month, researchers will release a new set of big-data tools that can find patterns in security firms' massive databases of malware

As the menagerie of malware collected by security firms continues to multiply, researchers are looking for new ways to analyze the massive data sets to find interesting information in their malware zoos.


At the Black Hat Security Briefings in late July, a trio of researchers plans to release a framework that uses Hadoop and the Apache Pig parallel-processing platform to make analyzing large sets of malware easier. The three researchers -- Zachary Hanif, Telvis Calhoun, and Jason Trost of Endgame Systems -- developed the framework, dubbed BinaryPig, while trying to analyze a fast-growing collection of millions of malware samples gathered by the company over the past three years.

Originally, the researchers wanted to mine their malware collection for historical trends, but the sheer number of binaries -- now 20 million -- made the collection difficult to process. By moving to big-data analytics, the researchers can now analyze patterns in executable headers, look for specific file features, and even perform entropy analysis, says Hanif, a senior researcher with Endgame.

"It is comparatively shallow analysis compared to what heavyweight reverse-engineers do, but at scale we can take that shallow analysis and extract deep insight," he says.
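The article doesn't describe BinaryPig's actual feature extractors. As a rough illustration of the kind of shallow, per-file analysis Hanif describes, the Python sketch below assumes each sample is a Windows PE file already loaded into memory. It reads three standard COFF header fields (Machine, NumberOfSections, TimeDateStamp) and computes whole-file Shannon entropy. The function names are hypothetical, and this is not Endgame's code.

```python
import math
import struct
from collections import Counter
from typing import Dict, Optional


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 (one repeated byte) up to 8.0."""
    if not data:
        return 0.0
    total = len(data)
    # Sum of p * log2(1/p) over every byte value that occurs in the data.
    return sum((c / total) * math.log2(total / c) for c in Counter(data).values())


def pe_header_fields(data: bytes) -> Optional[Dict[str, int]]:
    """Read a few COFF file-header fields from a PE image, or None if not PE."""
    # The DOS header starts with "MZ"; e_lfanew (offset of the PE header) is at 0x3C.
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # The 4-byte "PE\0\0" signature is followed by the 20-byte COFF file header.
    if pe_offset + 24 > len(data) or data[pe_offset:pe_offset + 4] != b"PE\0\0":
        return None
    # The first COFF fields are Machine (u16), NumberOfSections (u16), TimeDateStamp (u32).
    machine, num_sections, timestamp = struct.unpack_from("<HHI", data, pe_offset + 4)
    return {"machine": machine, "num_sections": num_sections, "timestamp": timestamp}


def extract_features(data: bytes) -> Dict[str, object]:
    """Cheap per-sample features of the kind that can be aggregated at scale."""
    features: Dict[str, object] = {"size": len(data), "entropy": shannon_entropy(data)}
    header = pe_header_fields(data)
    features["is_pe"] = header is not None
    if header:
        features.update(header)
    return features
```

Entropy close to 8 bits per byte is commonly treated as a hint of packed or encrypted content. Each call is cheap. The value comes from running it over millions of samples and aggregating the results, for example by counting compile timestamps per month.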

The security industry has begun using big-data analytics to extract intelligence from security data, from enterprises looking for signs of breaches in their log data to security companies looking for patterns in attack data from their sensor networks.

Malware analysis is an appropriate application of the techniques because attackers are generating so many variants of their programs, as a way to dodge defenses, that security firms' malware zoos have become overpopulated. McAfee's zoo, for example, topped 128 million malicious programs after gaining more than 14 million in the first quarter of 2013, according to the firm's quarterly report.

Using Hadoop and other big-data analysis methods, McAfee and other companies can reduce the tens of thousands of malware samples arriving each day to a more manageable number, says Adam Wosotowsky, messaging security architect for McAfee.

"You are able to say, 'Here are the things that we definitely think are bad, here are the things that we definitely think are good, and here is the gray area,'" he says.

Hadoop's advantage in working with big data is that it minimizes the movement of data between machines. Instead of copying the data to the code that processes it, Hadoop moves the processing functions to the machines where the data resides, which takes less time.
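To illustrate the data-locality idea, here is a minimal single-process Python sketch of the map/reduce pattern that Hadoop schedules across a cluster. The node names, shard contents, and the `machine` feature are hypothetical. In real Hadoop, or in a Pig script compiled into MapReduce jobs, the framework itself handles task placement, shuffling, and fault tolerance.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical setup: each "node" already stores its own shard of per-sample
# feature records. In Hadoop, a shard would be HDFS blocks on that node's disk.
Record = Dict[str, str]


def map_local(shard: Iterable[Record], feature: str) -> Counter:
    """Runs where `shard` lives: turn many raw records into a small count table."""
    return Counter(rec[feature] for rec in shard if feature in rec)


def reduce_counts(partials: Iterable[Counter]) -> Counter:
    """Merge the small per-node summaries. Only these would cross the network."""
    total: Counter = Counter()
    for part in partials:
        total.update(part)  # Counter.update adds counts rather than replacing them
    return total


def run_job(cluster: Dict[str, List[Record]], feature: str) -> Counter:
    # Send the function to each shard instead of pulling every shard to one machine.
    partials = [map_local(shard, feature) for shard in cluster.values()]
    return reduce_counts(partials)


if __name__ == "__main__":
    cluster = {
        "node-a": [{"machine": "i386"}, {"machine": "i386"}, {"machine": "amd64"}],
        "node-b": [{"machine": "i386"}, {"machine": "arm"}],
    }
    print(run_job(cluster, "machine"))
```

The design point is the size asymmetry. Each map step reads its full local shard but emits only a few counts, so the data moved between machines is proportional to the number of distinct feature values, not the number of samples.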


The approach opens the door to novel kinds of malware analysis. For example, the Endgame researchers have analyzed the bitmaps that malware uses for icons, buttons, and controls to find commonalities within malware families. Some samples display the old Windows XP icon for PDF files, while others use far more recognizable images, such as skulls, says Endgame's Hanif.

"There are a handful of malware authors out there, or at least malware families, that seem to have differences in which icon they use to masquerade as a different file type," he says. "We are trying to see what the possibilities are for doing some clustering and classification based on those images."
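The article does not say how Endgame compares icons. As a minimal sketch of a first step, assume the icon bitmaps have already been extracted from each sample's resources. Samples can then be grouped by a SHA-256 digest of the raw icon bytes. `group_by_icon` and the sample IDs below are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def group_by_icon(icons: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Group sample IDs whose extracted icon bytes are byte-for-byte identical.

    `icons` maps a (hypothetical) sample ID, e.g. a file hash, to the raw bytes
    of an icon bitmap already pulled from that sample's resources.
    Returns {icon_sha256_hex: sorted sample IDs} for icons shared by 2+ samples.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for sample_id, icon_bytes in icons.items():
        digest = hashlib.sha256(icon_bytes).hexdigest()
        groups[digest].append(sample_id)
    # Keep only icons that recur; a singleton says little about a family.
    return {d: sorted(ids) for d, ids in groups.items() if len(ids) > 1}


if __name__ == "__main__":
    icons = {"sample1": b"pdf-icon", "sample2": b"skull", "sample3": b"pdf-icon"}
    print(group_by_icon(icons))
```

Exact hashing catches only byte-identical icons. The clustering and classification Hanif describes would need a similarity measure that tolerates small pixel changes, such as a perceptual hash compared by Hamming distance.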

The three researchers plan to release the framework as open-source tools at the Black Hat Security Briefings, allowing others to use and build on the work.

Dean De Beer, chief technology officer for malware analysis firm ThreatGRID, sees the release of the tools as important for helping researchers and open-source intelligence projects deal with big-data problems. While ThreatGRID has created a non-Hadoop framework for storing features culled from static and binary analysis of malware, De Beer says the BinaryPig approach can help improve analysis.

"If there is a way that people can find a far more efficient means to search, query, and extract content, then I think that is a very, very powerful tool," he says. "It would be nice to see it evolve from static extraction to handle dynamic feature extraction, however."

For Trost, Hanif, and Calhoun, releasing the framework is a way to give the security industry more tools for adopting big-data analysis.

"Big data technology is going to revolutionize the security industry," Endgame's Trost says. "A lot of other industries have started to ride this wave, and I really am hoping that the security industry will take advantage of this."

Veteran technology journalist of more than 20 years. Former research engineer. Written for more than two dozen publications, including CNET News.com, Dark Reading, MIT's Technology Review, Popular Science, and Wired News. Five awards for journalism, including Best Deadline ...
