Dark Reading is part of the Informa Tech Division of Informa PLC


9/6/2016
02:30 PM
Guy Caspi
Commentary

Introducing Deep Learning: Boosting Cybersecurity With An Artificial Brain

With nearly the same speed and precision with which the human eye can identify a water bottle, deep learning is enabling the detection of malicious activity at the point of entry in real time.

Editor’s Note: Last month, Dark Reading editors named Deep Instinct the most innovative startup in their first annual Best of Black Hat Innovation Awards program at Black Hat 2016 in Las Vegas. For more details on the competition and other results, read "Best Of Black Hat Innovation Awards: And The Winners Are."

It’s hot outside and you’re thirsty. As you reach for a water bottle, you don’t pause to analyze its material, size or shape in order to determine whether it’s a water bottle. Instead, you immediately reach for it, with complete confidence in its identification.

If I show the same water bottle to any traditional computer vision module, it will easily recognize it. If I partially obstruct the image with my fingers, however, traditional computer vision modules will have difficulty recognizing it. But if I apply an advanced form of artificial intelligence called deep learning, which is resistant to small changes and can generalize from partial data, the computer vision module can correctly recognize the water bottle with ease, even when most of the image is obstructed.

Deep learning, which is built on deep neural networks, is “inspired” by the brain’s ability to learn to identify objects. Take vision as an example. Our brain can process raw data derived from our sensory inputs and learn the high-level features all on its own. Similarly, in deep learning, raw data is fed through the deep neural network, which learns to identify the object on which it is trained. Traditional machine learning, on the other hand, requires manual intervention in selecting which features to process through the machine learning modules. As a result, the process is slower and accuracy can be affected by human error. Deep learning's more sophisticated, self-learning capability results in higher accuracy and faster processing.

As with image recognition, in cybersecurity more than 99% of new threats and malware are actually very small mutations of previously existing ones. And even the remaining 1% of supposedly brand-new malware consists of rather substantial mutations of existing malicious threats and concepts. But, despite this fact, cybersecurity solutions -- even the most advanced ones that use dynamic analysis and traditional machine learning -- have great difficulty in detecting a large portion of this new malware. The result is vulnerabilities that leave organizations exposed to data breaches, data theft, seizure of data for ransom, data corruption, and destruction. We can solve this problem by applying deep learning to cybersecurity.

The history of malware detection in a nutshell
Signature-based solutions are the oldest form of malware detection, which is why they are also called legacy solutions. To detect malware, the antivirus engine compares the contents of an unidentified piece of code to its database of known malware signatures. If the malware hasn’t been seen before, these methods rely on manually tuned heuristics to generate a handcrafted signature, which is then released as an update to clients. This process is time-consuming, and signatures are sometimes released months after the initial detection. As a result, this detection method can’t keep up with the million new malware variants that are created daily. Organizations are left vulnerable both to new threats and to threats that have already been detected but do not yet have a released signature.
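
To make the limitation concrete, here is a minimal sketch of signature matching. It assumes, purely for illustration, that a signature is a SHA-256 hash of the whole file; real antivirus engines use much richer signatures. The sample bytes and the names `SIGNATURE_DB` and `is_known_malware` are hypothetical.

```python
import hashlib


def signature(data: bytes) -> str:
    """Return a SHA-256 digest, used here as a stand-in for a malware signature."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical sample; the bytes are made up for this example.
KNOWN_SAMPLE = b"\x4d\x5a\x90\x00 hypothetical malicious payload v1"

# The engine's database of signatures for malware it has already seen.
SIGNATURE_DB = {signature(KNOWN_SAMPLE)}


def is_known_malware(data: bytes) -> bool:
    """Flag a file only if its signature is already in the database."""
    return signature(data) in SIGNATURE_DB


# A very small mutation of the known sample: two bytes differ.
mutated_sample = KNOWN_SAMPLE.replace(b"v1", b"v2")

original_flagged = is_known_malware(KNOWN_SAMPLE)
mutated_flagged = is_known_malware(mutated_sample)
print(original_flagged, mutated_flagged)
```

In this sketch the known sample is flagged, but the slightly mutated copy is not, because nothing in the database matches it until a new signature is produced and distributed.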

Heuristic techniques identify malware based on behavioral characteristics in the code, which has led to behavior-based solutions. This detection technique analyzes the malware’s behavior at runtime instead of considering the characteristics hardcoded in the malware code itself. Its main limitation is that it can discover malware only once the malicious actions have begun. As a result, prevention is delayed and sometimes arrives only when it’s too late.
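
The delay described above can be illustrated with a small sketch of a runtime monitor. The event names, weights, and threshold below are assumptions invented for this example and are not taken from any real product.

```python
# Hypothetical weights for suspicious runtime events.
SUSPICIOUS_EVENTS = {
    "disable_backup": 3,
    "encrypt_file": 2,
    "contact_unknown_host": 1,
}
ALERT_THRESHOLD = 5


def monitor(events):
    """Score events as they occur.

    Returns (index at which the alert fired, events that had already run),
    or (None, all events) if the threshold was never reached.
    """
    score = 0
    for index, event in enumerate(events):
        score += SUSPICIOUS_EVENTS.get(event, 0)
        if score >= ALERT_THRESHOLD:
            return index, list(events[: index + 1])
    return None, list(events)


# A made-up runtime trace of a ransomware-like process.
trace = ["open_document", "disable_backup", "encrypt_file", "encrypt_file", "encrypt_file"]
alert_index, already_run = monitor(trace)
files_encrypted_before_alert = already_run.count("encrypt_file")
print(alert_index, files_encrypted_before_alert)
```

In this toy trace the alert fires only at the first encryption event, so one file has already been encrypted by the time the monitor reacts.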

Sandbox solutions are a development of the behavior-based detection method. Instead of detecting the behavioral fingerprint at runtime, these solutions execute the malware in a virtual (sandbox) environment to determine whether the file is malicious. Although this technique has been shown to be quite accurate, that accuracy comes at the cost of real-time protection because of the time-consuming process involved. Additionally, newer types of malicious code can evade sandbox detection by stalling their execution in a sandbox environment, which poses new challenges to this type of malware detection and, consequently, to prevention.
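
The stalling evasion can be sketched with a simulated sandbox that observes a sample for a fixed time budget. Everything here is a simplifying assumption: a sample is modeled as a time-ordered list of `(seconds, action)` pairs, and the action names and the 120-second budget are hypothetical.

```python
def run_in_sandbox(sample, budget_seconds):
    """Simulate observing a sample for a limited time.

    `sample` is a list of (time_offset_seconds, action) pairs sorted by time.
    Only actions that occur within the budget are observed.
    """
    observed = []
    for time_offset, action in sample:
        if time_offset > budget_seconds:
            break  # the sandbox stops watching once its budget is spent
        observed.append(action)
    return observed


def verdict(observed):
    """Toy rule: the file is malicious if it was seen encrypting files."""
    return "malicious" if "encrypt_files" in observed else "clean"


eager_sample = [(1, "read_config"), (5, "encrypt_files")]
stalling_sample = [(1, "read_config"), (600, "encrypt_files")]  # waits ten minutes first

eager_verdict = verdict(run_in_sandbox(eager_sample, budget_seconds=120))
stalling_verdict = verdict(run_in_sandbox(stalling_sample, budget_seconds=120))
print(eager_verdict, stalling_verdict)
```

Lengthening the budget would catch the stalling sample in this sketch, but only by making every file wait longer for a verdict, which is the real-time cost the paragraph describes.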

Malware detection using AI: machine learning & deep learning
Incorporating AI to enable more sophisticated detection is the latest step in the evolution of cybersecurity solutions. Malware detection methods based on traditional machine learning apply elaborate algorithms to classify a file’s behavior as malicious or legitimate according to feature engineering that is conducted manually. However, this process is time-consuming and requires substantial human effort to tell the technology which parameters, variables or features to focus on during file classification. Additionally, the rate of malware detection is still far from 100%.
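
As a rough illustration of manual feature engineering, the sketch below has a human choose two features (byte entropy and the share of printable characters) and then fits a simple nearest-centroid classifier on them. The features, the classifier, and the tiny made-up training set are all assumptions chosen for brevity; they do not describe any particular product.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in Counter(data).values())


def printable_ratio(data: bytes) -> float:
    """Fraction of bytes that are printable ASCII characters."""
    if not data:
        return 0.0
    return sum(32 <= b < 127 for b in data) / len(data)


def extract_features(data: bytes):
    """The hand-picked features: a person decided these two matter."""
    return [byte_entropy(data), printable_ratio(data)]


def train(samples):
    """Compute one centroid (mean feature vector) per label."""
    by_label = {}
    for data, label in samples:
        by_label.setdefault(label, []).append(extract_features(data))
    return {
        label: [sum(column) / len(column) for column in zip(*rows)]
        for label, rows in by_label.items()
    }


def classify(model, data: bytes) -> str:
    """Assign the label whose centroid is closest in feature space."""
    features = extract_features(data)
    return min(model, key=lambda label: math.dist(features, model[label]))


# Made-up training data: text-like files versus high-entropy, packed-looking blobs.
training_set = [
    (b"hello world, this is a plain text file", "legitimate"),
    (b"config: name=value; mode=read-only", "legitimate"),
    (bytes(range(256)), "malicious"),
    (bytes((i * 37 + 11) % 256 for i in range(512)), "malicious"),
]
model = train(training_set)

text_label = classify(model, b"just another ordinary readme file")
blob_label = classify(model, bytes((i * 91 + 5) % 256 for i in range(256)))
print(text_label, blob_label)
```

The classifier can only ever be as good as the features a person thought to write: a malicious file that looks like ordinary text on these two measures would slip through until someone engineers a new feature.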

Deep learning is an advanced branch of machine learning built on artificial “neural networks,” so called because they are "inspired" by the way the human brain works. In our neocortex, the outer layer of our brain where high-level cognitive tasks are performed, we have several tens of billions of neurons. These neurons, which are largely general purpose and domain-agnostic, can learn from any type of data. This is the great revolution of deep learning: deep neural networks are the first family of algorithms within machine learning that do not require manual feature engineering. Instead, they learn on their own to identify the object on which they are trained by processing and learning the high-level features from raw data -- very much like the way our brain learns on its own from raw data derived from our sensory inputs.
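
The idea of learning from raw data can be sketched with a very small feed-forward neural network trained by gradient descent directly on scaled byte values, with no hand-written features. This is a toy under stated assumptions: it has a single hidden layer, so it is far shallower than a production deep learning model, and the input width, layer size, learning rate, and training samples are all hypothetical.

```python
import math
import random

INPUT_BYTES = 16  # hypothetical input width: the first 16 raw bytes of a file
HIDDEN = 4        # hypothetical hidden layer size


def to_input(data: bytes):
    """Scale raw bytes to [0, 1], padding short inputs with zeros."""
    padded = data[:INPUT_BYTES].ljust(INPUT_BYTES, b"\x00")
    return [b / 255 for b in padded]


def sigmoid(z: float) -> float:
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1 / (1 + math.exp(-z))


class TinyNet:
    """One tanh hidden layer followed by a sigmoid output unit."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(INPUT_BYTES)] for _ in range(HIDDEN)]
        self.b1 = [0.0] * HIDDEN
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        p = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, p

    def train_step(self, x, y, lr=0.1):
        """One stochastic gradient descent step on the cross-entropy loss."""
        h, p = self.forward(x)
        d_out = p - y  # gradient with respect to the output pre-activation
        for j in range(HIDDEN):
            d_hidden = d_out * self.w2[j] * (1 - h[j] ** 2)
            self.w2[j] -= lr * d_out * h[j]
            self.b1[j] -= lr * d_hidden
            for i in range(INPUT_BYTES):
                self.w1[j][i] -= lr * d_hidden * x[i]
        self.b2 -= lr * d_out


def mean_loss(net, dataset):
    """Average cross-entropy loss over the dataset."""
    total = 0.0
    for x, y in dataset:
        p = min(max(net.forward(x)[1], 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(dataset)


# Made-up samples: label 0.0 = legitimate, 1.0 = malicious.
legitimate = [b"plain text notes", b"config: mode=read"]
malicious = [bytes(128 + (i * 7) % 128 for i in range(16)),
             bytes(255 - (i * 5) % 100 for i in range(16))]
dataset = ([(to_input(d), 0.0) for d in legitimate]
           + [(to_input(d), 1.0) for d in malicious])

net = TinyNet()
initial_loss = mean_loss(net, dataset)
for _ in range(300):
    for x, y in dataset:
        net.train_step(x, y)
final_loss = mean_loss(net, dataset)
print(initial_loss, final_loss)
```

Nothing in the code says which properties of the bytes matter; the weights that separate the two classes are found during training, which is the contrast with manual feature engineering that the paragraph draws.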

When applied to cybersecurity, the deep learning core engine is trained to learn, without any human intervention, whether a file is malicious or legitimate. Deep learning exhibits potentially groundbreaking results in detecting first-seen malware, compared with classical machine learning. In real-environment tests on publicly known databases of endpoint, mobile and APT malware, for example, a deep learning solution detected over 99.9% of both substantially and slightly modified malicious code. These results are consistent with improvements achieved by deep learning in other fields, such as computer vision, speech recognition and text understanding.

In the same way humans can immediately identify a water bottle in the real world, the technological advances of deep learning -- applied to cybersecurity -- can enable the precise detection of new malware threats and fill in the critical gaps that leave organizations exposed to attacks.

Guy Caspi is a leading mathematician and a global expert in data science. He has 15 years of extensive experience in applying mathematics and machine learning in an elite technology unit of the Israel Defense Forces (IDF), financial institutions and intelligence organizations ...