Dark Reading is part of the Informa Tech Division of Informa PLC

9/6/2016
02:30 PM
Guy Caspi
Commentary

Introducing Deep Learning: Boosting Cybersecurity With An Artificial Brain

With nearly the same speed and precision with which the human eye can identify a water bottle, the technology of deep learning is enabling the detection of malicious activity at the point of entry in real time.

Editor’s Note: Last month, Dark Reading editors named Deep Instinct the most innovative startup in its first annual Best of Black Hat Innovation Awards program at Black Hat 2016 in Las Vegas. For more details on the competition and other results, read “Best Of Black Hat Innovation Awards: And The Winners Are.”

It’s hot outside and you’re thirsty. As you reach for a water bottle, you don’t pause to analyze its material, size or shape in order to determine whether it’s a water bottle. Instead, you immediately reach for it, with complete confidence in its identification.

If I show the same water bottle to any traditional computer vision module, it will easily recognize it. If I partially obstruct the image with my fingers, then traditional computer vision modules will have difficulty recognizing it. But, if I apply an advanced form of artificial intelligence that is called deep learning, which is resistant to small changes and can generalize from partial data, it would be very easy for the computer vision module to correctly recognize the water bottle, even when most of the image is obstructed.

Deep learning, which is built on deep neural networks, is “inspired” by the brain’s ability to learn to identify objects. Take vision as an example. Our brain can process raw data derived from our sensory inputs and learn the high-level features all on its own. Similarly, in deep learning, raw data is fed through the deep neural network, which learns to identify the object on which it is trained. Traditional machine learning, on the other hand, requires manual intervention in selecting which features to process through the machine learning modules. As a result, the process is slower and accuracy can be affected by human error. Deep learning's self-learning capability can result in higher accuracy and faster processing.

As in image recognition, in cybersecurity more than 99% of new threats and malware are actually very small mutations of previously existing ones. And even the 1% of supposedly brand-new malware consists of rather substantial mutations of existing malicious threats and concepts. But, despite this fact, cybersecurity solutions -- even the most advanced ones that use dynamic analysis and traditional machine learning -- have great difficulty in detecting a large portion of this new malware. The result is vulnerabilities that leave organizations exposed to data breaches, data theft, seizure for ransom, data corruption, and destruction. We can address this problem by applying deep learning to cybersecurity.

The history of malware detection in a nutshell
Signature-based solutions are the oldest form of malware detection, which is why they are also called legacy solutions. To detect malware, the antivirus engine compares the contents of an unidentified piece of code to its database of known malware signatures. If the malware hasn’t been seen before, these methods rely on manually tuned heuristics to generate a handcrafted signature, which is then released as an update to clients. This process is time-consuming, and sometimes signatures are released months after the initial detection. As a result, this detection method can’t keep up with the million new malware variants that are created daily. This leaves organizations vulnerable to the new threats as well as threats that have already been detected but have yet to have a signature released.
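
The weakness described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a signature is a SHA-256 hash of the whole file; real antivirus engines use richer signatures such as byte patterns, and the sample bytes and names below are hypothetical.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of known-malicious files.
# The sample contents are made up for this sketch.
KNOWN_BAD_SAMPLE = b"pretend this is a known malicious payload"
SIGNATURES = {hashlib.sha256(KNOWN_BAD_SAMPLE).hexdigest()}


def is_known_malware(contents: bytes) -> bool:
    """Return True only if the exact contents match a stored signature."""
    return hashlib.sha256(contents).hexdigest() in SIGNATURES


original = KNOWN_BAD_SAMPLE
variant = KNOWN_BAD_SAMPLE + b"\x00"  # a tiny mutation: one padding byte

print(is_known_malware(original))  # True: exact match with the database
print(is_known_malware(variant))   # False: the mutation has no signature yet
```

Under this simplification, a one-byte change is enough to slip past the lookup until a new signature is written and distributed.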

Heuristic techniques identify malware based on the behavioral characteristics in the code, which has led to behavioral-based solutions. This malware detection technique analyzes the malware’s behavior at runtime, instead of considering the characteristics hardcoded in the malware code itself. The main limitation of this malware detection method is that it is able to discover malware only once the malicious actions have begun. As a result, prevention is delayed, sometimes available only once it’s too late.
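
To make the timing limitation concrete, here is a small sketch of a behavior rule that fires only after a suspicious sequence of runtime events has been observed. The event names and the rule itself are hypothetical and chosen only for illustration; they do not come from any particular product.

```python
# Hypothetical rule: alert once these events have all occurred, in this order.
SUSPICIOUS_SEQUENCE = ("enumerate_files", "encrypt_file", "delete_backup")


def first_alert_index(events):
    """Return the index of the event that completes the rule, or None."""
    matched = 0
    for i, event in enumerate(events):
        if event == SUSPICIOUS_SEQUENCE[matched]:
            matched += 1
            if matched == len(SUSPICIOUS_SEQUENCE):
                return i
    return None


# Hypothetical runtime trace of a ransomware-like process.
trace = [
    "open_window",
    "enumerate_files",
    "encrypt_file",
    "encrypt_file",
    "delete_backup",
    "encrypt_file",
]

alert_at = first_alert_index(trace)
already_encrypted = trace[:alert_at].count("encrypt_file")
print(alert_at, already_encrypted)
```

In this toy trace the alert is raised only at the `delete_backup` event, by which point files have already been encrypted, which is the delay the paragraph above describes.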

Sandbox solutions are a development of the behavioral-based detection method. Instead of detecting the behavioral fingerprint at runtime, these solutions execute the malware in a virtual (sandbox) environment to determine whether the file is malicious or not. Although this technique has been shown to be quite effective in its detection accuracy, that accuracy is achieved at the cost of real-time protection because of the time-consuming process involved. Additionally, newer types of malicious code that can evade sandbox detection by stalling their execution in a sandbox environment are posing new challenges to this type of malware detection and, consequently, to prevention capabilities.

Malware detection using AI: machine learning & deep learning
Incorporating AI to enable more sophisticated detection is the latest step in the evolution of cybersecurity solutions. Malware detection methods that are based on traditional machine learning apply elaborate algorithms to classify a file’s behavior as malicious or legitimate according to features that are engineered manually. However, this process is time-consuming and requires massive human resources to tell the technology which parameters, variables or features to focus on during the file classification process. Additionally, the rate of malware detection is still far from 100%.
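
The manual step described above can be sketched as follows. This is a toy illustration, not a real detector: the two features (byte entropy and printable-character ratio), the nearest-centroid classifier, and the training samples are all assumptions picked for brevity, and high entropy alone does not make a real file malicious (compressed files are high-entropy too).

```python
import math
from collections import Counter


def extract_features(data):
    """Hand-picked features: byte entropy (bits/byte) and printable ratio."""
    counts = Counter(data)
    n = len(data)
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    printable = sum(1 for b in data if 32 <= b < 127) / n
    return (entropy, printable)


def train(samples):
    """samples: list of (bytes, label). Returns one centroid per label."""
    by_label = {}
    for data, label in samples:
        by_label.setdefault(label, []).append(extract_features(data))
    return {
        label: tuple(sum(col) / len(col) for col in zip(*rows))
        for label, rows in by_label.items()
    }


def classify(model, data):
    """Assign the label whose centroid is nearest in feature space."""
    feats = extract_features(data)
    return min(model, key=lambda label: math.dist(feats, model[label]))


# Toy, made-up training data: plain text vs. packed-looking byte runs.
training = [
    (b"hello world, this is a plain text configuration file", "legitimate"),
    (b"user=admin; theme=dark; language=en; autosave=true", "legitimate"),
    (bytes(range(256)), "malicious"),
    (bytes(range(256)) * 2, "malicious"),
]
model = train(training)

print(classify(model, b"the quick brown fox jumps over the lazy dog"))
print(classify(model, bytes(range(255, -1, -1))))
```

Everything the classifier can learn is limited to what `extract_features` exposes, so a human has to decide in advance which properties of a file matter.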

Deep learning is an advanced branch of machine learning built on deep neural networks, so called because they are "inspired" by the way the human brain works. In our neocortex, the outer layer of our brain where high-level cognitive tasks are performed, we have several tens of billions of neurons. These neurons, which are largely general purpose and domain-agnostic, can learn from any type of data. This is the great revolution of deep learning because deep neural networks are the first family of algorithms within machine learning that do not require manual feature engineering. Instead, they learn on their own to identify the object on which they are trained by processing and learning the high-level features from raw data -- very much like the way our brain learns on its own from raw data derived from our sensory inputs.
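
For contrast with manual feature engineering, the sketch below feeds raw bytes directly into a very small neural network and lets gradient descent adjust the weights. It is only a toy under stated assumptions: a single hidden layer (a production deep network has many), hypothetical layer sizes, and four made-up training samples with arbitrary labels. It is not the architecture of any real product.

```python
import math
import random

INPUT_LEN, HIDDEN = 8, 4  # toy sizes, chosen only for illustration


def to_input(data):
    """Scale the first INPUT_LEN raw bytes to [0, 1]; no hand-picked features."""
    padded = data[:INPUT_LEN].ljust(INPUT_LEN, b"\x00")
    return [b / 255 for b in padded]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(INPUT_LEN)]
                   for _ in range(HIDDEN)]
        self.b1 = [0.0] * HIDDEN
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)]
        self.b2 = 0.0

    def forward(self, x):
        """Return hidden activations and the output probability."""
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, y

    def train_step(self, x, target, lr=0.5):
        """One stochastic gradient step on the cross-entropy loss."""
        h, y = self.forward(x)
        d_out = y - target  # gradient w.r.t. the output pre-activation
        for j in range(HIDDEN):
            # Use the current w2[j] before it is updated below.
            d_hid = d_out * self.w2[j] * h[j] * (1.0 - h[j])
            self.w2[j] -= lr * d_out * h[j]
            self.b1[j] -= lr * d_hid
            for i in range(INPUT_LEN):
                self.w1[j][i] -= lr * d_hid * x[i]
        self.b2 -= lr * d_out

    def loss(self, dataset):
        """Mean binary cross-entropy, clamped to avoid log(0)."""
        total = 0.0
        for x, t in dataset:
            _, y = self.forward(x)
            y = min(max(y, 1e-12), 1.0 - 1e-12)
            total -= t * math.log(y) + (1 - t) * math.log(1.0 - y)
        return total / len(dataset)


# Made-up samples: label 1 = "malicious", label 0 = "legitimate".
dataset = [
    (to_input(b"\x90\x90\x90\x90\xcc\xcc\xcc\xcc"), 1),
    (to_input(b"\xff\xfe\xfd\xfc\xfb\xfa\xf9\xf8"), 1),
    (to_input(b"hello wo"), 0),
    (to_input(b"config=1"), 0),
]

net = TinyNet()
initial_loss = net.loss(dataset)
for _ in range(300):
    for x, t in dataset:
        net.train_step(x, t)
final_loss = net.loss(dataset)
print(initial_loss, final_loss)
```

Nothing in this sketch tells the network which byte properties matter; whatever internal representation it uses is learned from the raw inputs during training.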

When applied to cybersecurity, the deep learning core engine is trained to learn, without any human intervention, whether a file is malicious or legitimate. Deep learning exhibits potentially groundbreaking results in detecting first-seen malware, compared with classical machine learning. In real-environment tests on publicly known databases of endpoint, mobile and APT malware, for example, a deep learning solution detected over 99.9% of both substantially and slightly modified malicious code. These results are consistent with improvements achieved by deep learning in other fields, such as computer vision, speech recognition and text understanding.

In the same way humans can immediately identify a water bottle in the real world, the technology advancements of deep learning -- applied to cybersecurity -- can enable the precise detection of new malware threats and fill in the critical gaps that leave organizations exposed to attacks.

Guy Caspi is a leading mathematician and a data scientist global expert. He has 15 years of extensive experience in applying mathematics and machine learning in a technology elite unit of the Israel Defense Forces (IDF), financial institutions and intelligence organizations ...