Dark Reading is part of the Informa Tech Division of Informa PLC



3/31/2016
10:00 AM
Commentary

Machine Learning In Security: Seeing the Nth Dimension in Signatures

How adding "supervised" machine learning to the development of n-dimensional signature engines is moving the detection odds back to the defender.

Second in a series of two articles about the history of signature-based detections and how the methodology has evolved to identify different types of cybersecurity threats.

Many security vendors are now building increasingly sophisticated machine learning elements into their cloud-based analysis and classification systems, and into their products. These techniques have already proven their value in Internet search, targeted advertising, and social networking.

For example, supervised learning models lie at the heart of ensuring that the best and most applicable results are returned when searching for the phrase “never going to give you up.”

In the information security world, supervised learning models are a natural progression of the one-, two-, and multi-dimensional signature systems discussed in my earlier article. At the core of the approach, instead of humans arguing over which features and attributes of a threat are most relevant to a detection, mathematics and science are used to find and evaluate the most important artifacts, and to automatically construct a sophisticated signature.

N-dimensional Signatures

Multidimensional signatures and the security products that use them rely heavily on human researchers and analysts to observe and classify each behavior for efficacy.

If a threat exhibits a new malicious behavior (or a false-positive behavior has been identified in the field), the analyst must manually create or edit a signature element and its classification, and include it in an update. The assumption is that humans will know the most relevant elements of a threat and can label them.

The application of machine learning to the problem largely removes humans and their biases from the development of an n-dimensional signature (often called a “classification model”).

Instead of manually trying to figure out and label all the good, bad, and suspicious behaviors, a machine is fed a bunch of “known bad” and “known good” samples, which could be binary files, network traffic, or even photographs. 

It then takes and compares all the observable behaviors of the collected samples, automatically determines which behaviors were more or less prevalent in each class of samples, calculates a weighting factor for each behavior, and combines all that intelligence into a single model of n dimensions, where n varies with the type and number of samples and behaviors the machine used.

Enter ‘Supervised Learning’

Different sample volumes and differing samples supplied over time will often affect n. In machine learning terminology, this process is called “supervised learning.” 
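The train-then-score loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any vendor's engine: the behavior names are hypothetical, each sample is reduced to a set of observed behaviors, and the weighting factor is a smoothed log-odds ratio (a simplified naive-Bayes-style scorer chosen here for brevity).

```python
import math
from collections import Counter

def train(samples):
    """Learn one weight per behavior from labeled samples.

    samples: list of (set_of_behaviors, label), label is "bad" or "good".
    Returns a dict mapping behavior -> weight; its size is the model's n.
    """
    counts = {"bad": Counter(), "good": Counter()}
    totals = {"bad": 0, "good": 0}
    for behaviors, label in samples:
        totals[label] += 1
        counts[label].update(behaviors)

    weights = {}
    for behavior in set(counts["bad"]) | set(counts["good"]):
        # Smoothed prevalence of the behavior in each class (add-one
        # smoothing avoids division by zero for one-sided behaviors).
        p_bad = (counts["bad"][behavior] + 1) / (totals["bad"] + 2)
        p_good = (counts["good"][behavior] + 1) / (totals["good"] + 2)
        # Positive weight: more prevalent in "known bad" samples.
        weights[behavior] = math.log(p_bad / p_good)
    return weights

def score(weights, behaviors):
    """Sum the weights of the behaviors seen in a new sample.

    Behaviors the model never saw contribute nothing.
    """
    return sum(weights.get(b, 0.0) for b in behaviors)
```

Retraining with more or different samples changes the set of behaviors in `weights`, which is one way to picture why n is not fixed. A production classifier would use far richer features and a properly validated model.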

Historically, there existed a class of threat detection referred to as “Anomaly Detection Systems” (ADS) that operated on the premise of baselining network or host activity. In the case of network ADS (NADS), the approach would entail constructing a map of network devices, identifying who talks to whom over what ports and protocols, how often, and in what kind of volume.

Once that baseline was established (typically over a month), any new chatter that was an anomaly to that model (e.g. a new host added to the network) generated an alert, subject to certain thresholds being defined. That approach generated incredibly high volumes of alerts, and detection was governed by those threshold settings. As a technology, ADS represented a failed branch of the threat detection evolutionary tree.
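A toy version of that static-baseline approach makes the threshold problem easier to see. The sketch below is an assumption-laden illustration, not a description of any real NADS product: flows are reduced to hypothetical (source, destination, port) tuples, and `min_seen` stands in for the thresholds mentioned above.

```python
from collections import Counter

def build_baseline(flows):
    """Count how often each (src, dst, port) flow appeared while learning."""
    return Counter(flows)

def alerts(baseline, new_flows, min_seen=1):
    """Return every new flow seen fewer than min_seen times in the baseline.

    The baseline is frozen after learning, so anything unfamiliar
    (for example, a newly added host) raises an alert.
    """
    return [flow for flow in new_flows if baseline[flow] < min_seen]
```

Raising `min_seen` flags rarely seen flows as well as unseen ones, which increases alert volume; lowering it does the opposite. Because the baseline never updates, ordinary network change keeps generating alerts.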

Without getting into the math, unsupervised machine learning has allowed security vendors to revisit the ADS path and detection objectives – and overcome most of the alerting and threshold problems. The detection models and engines that use unsupervised machine learning still require an element of baselining, but continually learn and reassess that baseline on an hourly or daily basis. 

As such, these new detection systems are capable of identifying attack vectors such as “low-and-slow” data exfiltration, lateral movement, and staging servers. These threats are difficult or cumbersome to detect using signature systems.

This is why signature-based detection systems will continue to be valuable into the future: not as a replacement, but as a companion to the new advancements in unsupervised machine learning. In other words, what the current generation of unsupervised machine learning brings to security is the ability to detect threats that are anomalies or unclassified events and behaviors.
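To make the idea of a continually reassessed baseline concrete, here is a minimal sketch that swaps in a much simpler technique than real unsupervised learning: an exponentially weighted moving mean and variance over a single metric (say, a host's hourly outbound byte count, a hypothetical choice). The class name, parameters, and defaults are all assumptions made for illustration.

```python
import math

class RollingBaseline:
    """Baseline for one metric that is re-estimated on every observation."""

    def __init__(self, alpha=0.1, threshold=3.0, warmup=5):
        self.alpha = alpha          # how quickly old behavior is forgotten
        self.threshold = threshold  # allowed deviation, in standard deviations
        self.warmup = warmup        # observations required before alerting
        self.mean = 0.0
        self.var = 0.0
        self.n = 0

    def observe(self, value):
        """Return True if value deviates from the baseline, then update it."""
        anomalous = False
        if self.n >= self.warmup:
            std = max(math.sqrt(self.var), 1e-9)
            anomalous = abs(value - self.mean) > self.threshold * std

        if self.n == 0:
            self.mean = value
        else:
            # Standard incremental update for an exponentially weighted
            # mean and variance.
            delta = value - self.mean
            self.mean += self.alpha * delta
            self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)
        self.n += 1
        return anomalous
```

Unlike a frozen baseline, this one drifts with normal change, so a gradual shift in traffic does not alert indefinitely. Note that anomalous values are also folded into the baseline here; whether to exclude them is a design choice this sketch does not attempt to settle.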

It is inevitable that machine learning approaches will play an increasingly important role in future generations of threat detection technology. Their use has been critical to the advancement of Internet search and social media applications, and their impact on information security is likely to be just as great.

Signature-based threat detection systems have been evolving for more than two decades, and the application of supervised machine learning to the development of n-dimensional signature engines over the last couple of years is already moving the detection odds back to the defender. When combined with the newest generation of unsupervised machine learning systems, we can expect that needle to shift more rapidly in the defender’s favor.

Return to part 1: Machine Learning In Security: Good & Bad News About Signatures

Comments
investigators3,
User Rank: Apprentice
7/27/2016 | 8:32:17 AM
Machine Learning In Security
Hello i am cyber security investigator please inform me,how i am learn about machine learning in security.
Jeremseo,
User Rank: Strategist
4/5/2016 | 10:42:12 AM
THE Future
This is why signature-based detection systems will continue to be valuable in to the future- Glad to hear this opinion, not for replace but for makeing better.
mhkang589,
User Rank: Apprentice
4/2/2016 | 1:31:54 AM
String matching and reading are pretty similar.
And reading and understand context is hard for computer and easy for human.
The problem of signature-based detections is just so many many logs and alerts.
But ultimately, machine learning will be trend.

 