Dark Reading is part of the Informa Tech Division of Informa PLC



7/22/2015
01:50 PM

Researchers Enlist Machine Learning In Malware Detection

No sandbox required for schooling software to speedily spot malware, researchers will demonstrate at Black Hat USA.

In 100 milliseconds or less, researchers are now able to determine whether a piece of code is malware or not -- and without the need to isolate it in a sandbox for analysis.

Welcome to the age of machine learning as a tool for more efficiently detecting malware, via so-called "deep learning" techniques. Researchers have built a machine learning module that uses static analysis of a piece of code to quickly spot -- and ultimately, stop -- malware infections. A pair of researchers plans to demonstrate live at Black Hat USA next month how this approach can spot malware in live malware feeds.

Matt Wolff, chief data scientist at Cylance, says his team is applying deep learning -- a more granular subset of machine learning -- to malware detection by training the software on both legitimate and malicious files and teaching the algorithm which is which. The application can then take files it has never seen before and spot malware, he says.
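The train-then-predict workflow Wolff describes can be illustrated with a toy sketch. This is not Cylance's system: it swaps the deep learning model for a simple nearest-centroid classifier over byte-frequency histograms, and the function names and sample bytes are hypothetical, chosen only to show labeled files going in and a verdict on an unseen file coming out.

```python
import math
from collections import Counter

# Toy illustration only: a nearest-centroid classifier stands in for the
# deep learning model in the article. All names here are hypothetical.

def byte_histogram(data: bytes) -> list[float]:
    """Static feature: normalized frequency of each of the 256 byte values."""
    counts = Counter(data)
    total = len(data) or 1
    return [counts.get(value, 0) / total for value in range(256)]

def centroid(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of a list of equal-length feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def train(benign: list[bytes], malicious: list[bytes]) -> dict[str, list[float]]:
    """Learn one average histogram per label from the labeled samples."""
    return {
        "benign": centroid([byte_histogram(s) for s in benign]),
        "malicious": centroid([byte_histogram(s) for s in malicious]),
    }

def classify(model: dict[str, list[float]], sample: bytes) -> str:
    """Label an unseen sample by its closest centroid; the sample is never run."""
    features = byte_histogram(sample)
    return min(model, key=lambda label: math.dist(features, model[label]))

# Made-up training data: readable text versus byte patterns invented for the demo.
model = train(
    benign=[b"hello world " * 20, b"plain readable text " * 10],
    malicious=[bytes([0x90] * 50 + [0xCC] * 50), bytes([0x90] * 80 + [0xEB, 0xFE] * 10)],
)
print(classify(model, bytes([0x90] * 60 + [0xCC] * 40)))
```

A production system would use far richer features and, per the article, a deep learning model trained on a vastly larger data set; the sketch only shows the shape of the workflow.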

It uses a static analysis approach. When you run malware to test it, the malware has a window "to fight back before you can stop it," Wolff says. "We don't run it [the malware], so the malware doesn't have a chance. And it's fast," he says, faster than sandboxing and analyzing malware.

The concept of applying machine learning and deep learning to malware detection isn't really new, but only in the past few years has it become realistic to deploy, thanks to cloud-based computing options that make big-data computing more affordable. You don't have to build a data center of hundreds of machines anymore; you can rent the necessary processing power for machine learning. "Advances in processors, memory, etc., lend themselves to help make these techniques more powerful," Wolff says. "We don't see anyone [else] applying algorithms to … malware detection" yet, he says.

"The main premise behind machine learning is matching patterns. When you look at malware, you may not see any patterns. But when you look at a half of a billion samples, there may be tons of patterns that are relatively easy to discern," he says. "The goal of this model is to find these patterns."

A typical malware characteristic would be the ability for the code to use functions that capture and log keystrokes, for example.
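As a rough illustration of how such a characteristic might be read off a file statically, the sketch below searches a file's raw bytes for the names of Windows API functions commonly associated with keystroke capture. The function list and the helper name are assumptions made for this example, not details from the researchers' tool, and a match is only a hint: legitimate software calls these functions too, so in practice it would be one feature among many.

```python
# Hypothetical static check: look for keystroke-related Windows API names
# in a file's bytes without executing it. The list is illustrative only.
KEYSTROKE_APIS = (
    b"GetAsyncKeyState",
    b"GetKeyboardState",
    b"SetWindowsHookExA",
    b"SetWindowsHookExW",
)

def keystroke_api_hits(data: bytes) -> list[str]:
    """Return the keystroke-related API names that appear in the raw bytes."""
    return [name.decode("ascii") for name in KEYSTROKE_APIS if name in data]

# Made-up byte strings standing in for file contents.
suspicious = b"MZ\x00\x00user32.dll\x00GetAsyncKeyState\x00"
harmless = b"MZ\x00\x00kernel32.dll\x00ExitProcess\x00"
print(keystroke_api_hits(suspicious))
print(keystroke_api_hits(harmless))
```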

Machine learning and deep learning are especially helpful in keeping up with the increasingly polymorphic nature of malware. "If a malware author two months later comes up with a new [variant], there's a high probability the module you wrote is going to detect that. It has a predictive capability," Wolff says.

With mountains of malware generated daily, a more automated and intelligent method to learn, adapt, and catch malware is crucial. Cylance has some 1 to 2 petabytes of data in its data set for machine learning: "We typically have a few hundred CPUs running for days to process and work through the data, and weeks and months running and training the machines to learn these things," Wolff says. It takes hundreds of gigabytes of memory, CPUs and "big machines," he says.

The machine learning-based method for now is all about detection. It's up to the security analyst or other tools to decide what to do next with the newly discovered malicious code, he says.

A deep learning system could ultimately replace today's existing malware detection tools, Wolff says. "A machine learning engine is more effective" than a signature-based engine, he says.

Wolff and his colleague Andrew Davis, a machine learning scientist at Cylance, will feed their deep-learning module some fresh malware live during their talk at Black Hat, called "Deep Learning on Disassembly." "We'll … see what it catches," says Wolff.


Kelly Jackson Higgins is the Executive Editor of Dark Reading. She is an award-winning veteran technology and business journalist with more than two decades of experience in reporting and editing for various publications, including Network Computing, Secure Enterprise ...
 
