Dark Reading is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them.Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Attacks/Breaches

3/31/2016
10:00 AM
Commentary
Commentary
Commentary
50%
50%

Machine Learning In Security: Seeing the Nth Dimension in Signatures

How adding "supervised" machine learning to the development of n-dimensional signature engines is moving the detection odds back to the defender.

Second in a series of two articles about the history of signature-based detections and how the methodology has evolved to identify different types of cybersecurity threats.

Many security vendors are now applying increasingly sophisticated machine learning elements into their cloud-based analysis and classification systems, and into their products. All of these techniques have already proven their value in Internet search, targeted advertising and social networking business arenas.

For example, supervised learning models lie at the heart of ensuring that the best and most applicable results are returned when searching for the phrase “never going to give you up.”

In the information security world, supervised learning models are a natural progression of the one, two, and multi-dimensional signature systems discussed in my earlier article. At its core, instead of humans arguing over which features and attributes of a threat are most relevant to a detection, mathematics and science are used to find and evaluate the most important artifacts, and to automatically construct a sophisticated signature.

 N-dimensional Signatures

Multidimensional signatures and the security products that use them rely heavily on human researchers and analysts to observe and classify each behavior for efficacy.

If a threat exhibits a new malicious behavior (or a false positive behavior has been identified in the field), the analyst must manually create or edit a new signature element and its classification, and include it as an update. The assumption is that humans will be the most relevant elements of a threat and can label them.

The application of machine learning to the problem largely removes humans and their biases to the development of an n-dimensional signature (or often called a “classification model”).

Instead of manually trying to figure out and label all the good, bad, and suspicious behaviors, a machine is fed a bunch of “known bad” and “known good” samples, which could be binary files, network traffic, or even photographs. 

It then takes and compares all the observable behaviors of the collected samples, automatically determines which behaviors were more prevalent or less prevalent to each class of samples, calculates a weighting factor for each behavior, and combines all that intelligence in to a single model of n-dimensions – where n is a variable size based upon the type and number of samples and behaviors the machine used. 

Enter ‘Supervised Learning’

Different sample volumes and differing samples supplied over time will often affect n. In machine learning terminology, this process is called “supervised learning.” 

Historically, there existed a class of threat detection referred to as “Anomaly Detection Systems” (ADS) that effectively operated on the premise of baselining a network or host activity. In the case of network ADS (i.e. NADS), the approach would entail constructing a map of network devices, identifying who talks to who over what ports and protocols, how often, and in what kind of volume.

Once that baseline is established (typically over a month), any new chatter that was an anomaly to that model (e.g. a new host added to the network) generated an alert – subject to certain thresholds being defined. Obviously that approach generated incredibly high volumes of alerts and detection was governed by those threshold settings. As a technology, ADS represented a failed branch of the threat detection evolutionary tree.

Without getting into the math, unsupervised machine learning has allowed security vendors to revisit the ADS path and detection objectives – and overcome most of the alerting and threshold problems. The detection models and engines that use unsupervised machine learning still require an element of baselining, but continually learn and reassess that baseline on an hourly or daily basis. 

As such, these new detection systems are capable of identifying attack vectors such as “low-and-slow” data exfiltration, lateral movement, and staging servers. These threats are difficult or cumbersome to detect using signature systems.

This is why signature-based detection systems will continue to be valuable in to the future – not as a replacement, but as a companion to the new advancements in unsupervised machine learning. In other words, what the current generation of unsupervised machine learning brings to security is the ability to detect threats that are anomalies or unclassified events and behaviors.

It is inevitable that machine learning approaches will play an increasingly important role in future generations of threat detection technology. Just as their use has been critical to the advancement of Internet search and social media applications, their application to information security will be just as great. 

Signature-based threat detection systems have been evolving for more than two decades, and the application of supervised machine learning to the development of n-dimensional signature engines over the last couple of years is already moving the detection odds back to the defender. When combined with the newest generation of unsupervised machine learning systems, we can expect that needle to shift more rapidly in the defender’s favor.

Return to part 1: Machine Learning In Security: Good & Bad News About Signatures

Related Content: 

Interop 2016 Las Vegas

Find out more about security threats at Interop 2016, May 2-6, at the Mandalay Bay Convention Center, Las Vegas. Click here for pricing information and to register.

 

Recommended Reading:

Comment  | 
Print  | 
More Insights
Comments
Newest First  |  Oldest First  |  Threaded View
investigators3
50%
50%
investigators3,
User Rank: Apprentice
7/27/2016 | 8:32:17 AM
Machine Learning In Security
Hello i am cyber security investigator please inform me,how i am learn about machine learning in security.
Jeremseo
100%
0%
Jeremseo,
User Rank: Strategist
4/5/2016 | 10:42:12 AM
THE Future
This is why signature-based detection systems will continue to be valuable in to the future- Glad to hear this opinion, not for replace but for makeing better.
mhkang589
50%
50%
mhkang589,
User Rank: Apprentice
4/2/2016 | 1:31:54 AM
String matching and reading are pretty similar.
And reading and understand context is hard for computer and easy for human.
The problem of signature-based detections is just so many many logs and alerts.
But ultimately, machine learning will be trend.

 
COVID-19: Latest Security News & Commentary
Dark Reading Staff 8/14/2020
Lock-Pickers Face an Uncertain Future Online
Seth Rosenblatt, Contributing Writer,  8/10/2020
Hacking It as a CISO: Advice for Security Leadership
Kelly Sheridan, Staff Editor, Dark Reading,  8/10/2020
Register for Dark Reading Newsletters
White Papers
Video
Cartoon
Current Issue
7 New Cybersecurity Vulnerabilities That Could Put Your Enterprise at Risk
In this Dark Reading Tech Digest, we look at the ways security researchers and ethical hackers find critical vulnerabilities and offer insights into how you can fix them before attackers can exploit them.
Flash Poll
The Changing Face of Threat Intelligence
The Changing Face of Threat Intelligence
This special report takes a look at how enterprises are using threat intelligence, as well as emerging best practices for integrating threat intel into security operations and incident response. Download it today!
Twitter Feed
Dark Reading - Bug Report
Bug Report
Enterprise Vulnerabilities
From DHS/US-CERT's National Vulnerability Database
CVE-2020-17475
PUBLISHED: 2020-08-14
Lack of authentication in the network relays used in MEGVII Koala 2.9.1-c3s allows attackers to grant physical access to anyone by sending packet data to UDP port 5000.
CVE-2020-0255
PUBLISHED: 2020-08-14
** REJECT ** DO NOT USE THIS CANDIDATE NUMBER. ConsultIDs: CVE-2020-10751. Reason: This candidate is a duplicate of CVE-2020-10751. Notes: All CVE users should reference CVE-2020-10751 instead of this candidate. All references and descriptions in this candidate have been removed to prevent accidenta...
CVE-2020-14353
PUBLISHED: 2020-08-14
** REJECT ** DO NOT USE THIS CANDIDATE NUMBER. ConsultIDs: CVE-2017-18270. Reason: This candidate is a duplicate of CVE-2017-18270. Notes: All CVE users should reference CVE-2017-18270 instead of this candidate. All references and descriptions in this candidate have been removed to prevent accidenta...
CVE-2020-17464
PUBLISHED: 2020-08-14
** REJECT ** DO NOT USE THIS CANDIDATE NUMBER. ConsultIDs: none. Reason: This candidate was withdrawn by its CNA. Further investigation showed that it was not a security issue. Notes: none.
CVE-2020-17473
PUBLISHED: 2020-08-14
Lack of mutual authentication in ZKTeco FaceDepot 7B 1.0.213 and ZKBiosecurity Server 1.0.0_20190723 allows an attacker to obtain a long-lasting token by impersonating the server.