Dark Reading is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them.Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Attacks/Breaches

3/31/2016
10:00 AM
Commentary
Commentary
Commentary
50%
50%

Machine Learning In Security: Seeing the Nth Dimension in Signatures

How adding "supervised" machine learning to the development of n-dimensional signature engines is moving the detection odds back to the defender.

Second in a series of two articles about the history of signature-based detections and how the methodology has evolved to identify different types of cybersecurity threats.

Many security vendors are now applying increasingly sophisticated machine learning elements into their cloud-based analysis and classification systems, and into their products. All of these techniques have already proven their value in Internet search, targeted advertising and social networking business arenas.

For example, supervised learning models lie at the heart of ensuring that the best and most applicable results are returned when searching for the phrase “never going to give you up.”

In the information security world, supervised learning models are a natural progression of the one, two, and multi-dimensional signature systems discussed in my earlier article. At its core, instead of humans arguing over which features and attributes of a threat are most relevant to a detection, mathematics and science are used to find and evaluate the most important artifacts, and to automatically construct a sophisticated signature.

 N-dimensional Signatures

Multidimensional signatures and the security products that use them rely heavily on human researchers and analysts to observe and classify each behavior for efficacy.

If a threat exhibits a new malicious behavior (or a false positive behavior has been identified in the field), the analyst must manually create or edit a new signature element and its classification, and include it as an update. The assumption is that humans will be the most relevant elements of a threat and can label them.

The application of machine learning to the problem largely removes humans and their biases to the development of an n-dimensional signature (or often called a “classification model”).

Instead of manually trying to figure out and label all the good, bad, and suspicious behaviors, a machine is fed a bunch of “known bad” and “known good” samples, which could be binary files, network traffic, or even photographs. 

It then takes and compares all the observable behaviors of the collected samples, automatically determines which behaviors were more prevalent or less prevalent to each class of samples, calculates a weighting factor for each behavior, and combines all that intelligence in to a single model of n-dimensions – where n is a variable size based upon the type and number of samples and behaviors the machine used. 

Enter ‘Supervised Learning’

Different sample volumes and differing samples supplied over time will often affect n. In machine learning terminology, this process is called “supervised learning.” 

Historically, there existed a class of threat detection referred to as “Anomaly Detection Systems” (ADS) that effectively operated on the premise of baselining a network or host activity. In the case of network ADS (i.e. NADS), the approach would entail constructing a map of network devices, identifying who talks to who over what ports and protocols, how often, and in what kind of volume.

Once that baseline is established (typically over a month), any new chatter that was an anomaly to that model (e.g. a new host added to the network) generated an alert – subject to certain thresholds being defined. Obviously that approach generated incredibly high volumes of alerts and detection was governed by those threshold settings. As a technology, ADS represented a failed branch of the threat detection evolutionary tree.

Without getting into the math, unsupervised machine learning has allowed security vendors to revisit the ADS path and detection objectives – and overcome most of the alerting and threshold problems. The detection models and engines that use unsupervised machine learning still require an element of baselining, but continually learn and reassess that baseline on an hourly or daily basis. 

As such, these new detection systems are capable of identifying attack vectors such as “low-and-slow” data exfiltration, lateral movement, and staging servers. These threats are difficult or cumbersome to detect using signature systems.

This is why signature-based detection systems will continue to be valuable in to the future – not as a replacement, but as a companion to the new advancements in unsupervised machine learning. In other words, what the current generation of unsupervised machine learning brings to security is the ability to detect threats that are anomalies or unclassified events and behaviors.

It is inevitable that machine learning approaches will play an increasingly important role in future generations of threat detection technology. Just as their use has been critical to the advancement of Internet search and social media applications, their application to information security will be just as great. 

Signature-based threat detection systems have been evolving for more than two decades, and the application of supervised machine learning to the development of n-dimensional signature engines over the last couple of years is already moving the detection odds back to the defender. When combined with the newest generation of unsupervised machine learning systems, we can expect that needle to shift more rapidly in the defender’s favor.

Return to part 1: Machine Learning In Security: Good & Bad News About Signatures

Related Content: 

Interop 2016 Las Vegas

Find out more about security threats at Interop 2016, May 2-6, at the Mandalay Bay Convention Center, Las Vegas. Click here for pricing information and to register.

Comment  | 
Print  | 
More Insights
Comments
Newest First  |  Oldest First  |  Threaded View
investigators3
50%
50%
investigators3,
User Rank: Apprentice
7/27/2016 | 8:32:17 AM
Machine Learning In Security
Hello i am cyber security investigator please inform me,how i am learn about machine learning in security.
Jeremseo
100%
0%
Jeremseo,
User Rank: Strategist
4/5/2016 | 10:42:12 AM
THE Future
This is why signature-based detection systems will continue to be valuable in to the future- Glad to hear this opinion, not for replace but for makeing better.
mhkang589
50%
50%
mhkang589,
User Rank: Apprentice
4/2/2016 | 1:31:54 AM
String matching and reading are pretty similar.
And reading and understand context is hard for computer and easy for human.
The problem of signature-based detections is just so many many logs and alerts.
But ultimately, machine learning will be trend.

 
MoviePass Leaves Credit Card Numbers, Personal Data Exposed Online
Kelly Sheridan, Staff Editor, Dark Reading,  8/21/2019
New FISMA Report Shows Progress, Gaps in Federal Cybersecurity
Curtis Franklin Jr., Senior Editor at Dark Reading,  8/21/2019
Aviation Faces Increasing Cybersecurity Scrutiny
Kelly Jackson Higgins, Executive Editor at Dark Reading,  8/22/2019
Register for Dark Reading Newsletters
White Papers
Video
Cartoon
Current Issue
7 Threats & Disruptive Forces Changing the Face of Cybersecurity
This Dark Reading Tech Digest gives an in-depth look at the biggest emerging threats and disruptive forces that are changing the face of cybersecurity today.
Flash Poll
The State of IT Operations and Cybersecurity Operations
The State of IT Operations and Cybersecurity Operations
Your enterprise's cyber risk may depend upon the relationship between the IT team and the security team. Heres some insight on what's working and what isn't in the data center.
Twitter Feed
Dark Reading - Bug Report
Bug Report
Enterprise Vulnerabilities
From DHS/US-CERT's National Vulnerability Database
CVE-2019-15513
PUBLISHED: 2019-08-23
An issue was discovered in OpenWrt libuci (aka Library for the Unified Configuration Interface) as used on Motorola CX2L MWR04L 1.01 and C1 MWR03 1.01 devices. /tmp/.uci/network locking is mishandled after reception of a long SetWanSettings command, leading to a device hang.
CVE-2019-15504
PUBLISHED: 2019-08-23
drivers/net/wireless/rsi/rsi_91x_usb.c in the Linux kernel through 5.2.9 has a Double Free via crafted USB device traffic (which may be remote via usbip or usbredir).
CVE-2019-15505
PUBLISHED: 2019-08-23
drivers/media/usb/dvb-usb/technisat-usb2.c in the Linux kernel through 5.2.9 has an out-of-bounds read via crafted USB device traffic (which may be remote via usbip or usbredir).
CVE-2019-15507
PUBLISHED: 2019-08-23
In Octopus Deploy versions 2018.8.4 to 2019.7.6, when a web request proxy is configured, an authenticated user (in certain limited special-characters circumstances) could trigger a deployment that writes the web request proxy password to the deployment log in cleartext. This is fixed in 2019.7.7. Th...
CVE-2019-15508
PUBLISHED: 2019-08-23
In Octopus Tentacle versions 3.0.8 to 5.0.0, when a web request proxy is configured, an authenticated user (in certain limited OctopusPrintVariables circumstances) could trigger a deployment that writes the web request proxy password to the deployment log in cleartext. This is fixed in 5.0.1. The fi...