
Threat Intelligence

2/11/2016 12:30 PM
Giora Engel
Commentary

3 Flavors of Machine Learning: Who, What & Where

To get beyond the jargon of ML, you have to consider who (or what) performs the actual work of detecting advanced attacks: vendor, product or end-user.

The great promise machine learning holds for the security industry is its ability to detect advanced and unknown attacks -- particularly those leading to data breaches. These range from traditional uses -- such as malware detection -- to new areas like attack detection for hackers who have circumvented preventative security.

Unfortunately, machine learning, which is rapidly becoming a popular marketing term, has lost much of its meaning because virtually all vendors define it differently. One way to get beyond the jargon is to look at ML from the perspective of who actually performs it, and where. But first, some basic concepts and definitions.

Any ML algorithm is only as strong as the data modeling behind it; the actual algorithm in use plays only a secondary role. If the selected data does not contain parameters that can predict the result, you can use fancy algorithms, but the accuracy of the results will be very low, and they will generate a lot of noise when used outside of a lab environment.

A basic principle in data science is that simple schemes with the right data modeling work better than complex schemes. So in evaluating options, it’s wise to look for vendors that have real domain expertise rather than a large staff of PhDs. That’s because understanding the parameters and various scenarios is more important than the development of an algorithm for correlating data. Domain expertise directly affects the quality of the data modeling. Consequently, if it’s hard to understand how ML is used, it probably means that it is not relevant to the way the product works.
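To see that principle in action, here is a minimal sketch on synthetic data (scikit-learn is assumed purely for illustration; it is not a claim about any vendor's tooling): a simple model trained on one informative feature handily beats a more complex model trained on features with no predictive power.

```python
# Minimal sketch: simple model + informative feature vs. complex model + noise.
# Synthetic data only; the features and labels are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 5000
y = rng.integers(0, 2, n)                    # ground truth: 0 = benign, 1 = malicious

X_good = (y + rng.normal(0, 0.5, n)).reshape(-1, 1)   # one feature that predicts y
X_bad = rng.normal(0, 1, (n, 10))                     # ten features with no signal

for name, X, model in [
    ("simple model, informative feature", X_good, LogisticRegression()),
    ("complex model, uninformative features", X_bad, GradientBoostingClassifier()),
]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model.fit(X_tr, y_tr)
    print(name, round(accuracy_score(y_te, model.predict(X_te)), 2))
```

With good data modeling, the plain logistic regression performs well above chance; the heavier algorithm trained on noise hovers around 50%.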

As for understanding the various flavors of ML, one approach is to divide products into categories based on who (or what) actually performs the machine learning work: the vendor, the product or the end-user.

The Vendor
In the vast majority of cases, the term machine learning actually describes one of the tools that the vendor uses to develop its product or generate threat intelligence. In these cases, the vendor performs ML in its lab, rather than the product doing it on premises.

A typical example: AV and URL filtering vendors that perform ML behind the scenes. To keep their signatures (or threat intelligence) reasonably current and to process the heavy volume of malware and viruses they encounter, vendors leverage ML in their labs to automate classification and signature creation. This use of ML occurs in the vendor’s lab and results in signatures or threat intelligence that the product then uses to detect specific patterns or artifacts.
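As a rough, hypothetical illustration of that division of labor (the features, labels and file name below are invented for the example and do not describe any real vendor's pipeline), the vendor trains a classifier on lab-labeled samples and ships only the frozen model:

```python
# Hypothetical lab-side training: the vendor runs ML here, not the shipped product.
import pickle
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Assumed static features per sample: [file size, entropy, suspicious-import count],
# with labels produced by analysts in the vendor's lab (1 = malicious, 0 = benign).
X_lab = np.array([[120_000, 7.8, 12],
                  [ 45_000, 4.1,  0],
                  [310_000, 7.5,  9],
                  [ 22_000, 3.9,  1]])
y_lab = np.array([1, 0, 1, 0])

clf = RandomForestClassifier(random_state=0).fit(X_lab, y_lab)

# Only the frozen model (or signatures derived from it) is shipped; the product
# on premises applies it deterministically and does no learning of its own.
with open("model.pkl", "wb") as f:
    pickle.dump(clf, f)
```

The learning happens entirely in the lab; by the time the product runs in a customer environment it is a fixed artifact, which is what makes this category deterministic.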

Typical products: AV, sandboxing, anti-bot, whitelisting and rule-based event correlation.

Advantage: the products are deterministic and will always operate in the same way, regardless of the environment.

Disadvantage: the products are rule-based and can leverage only known artifacts, which leads to low detection accuracy (e.g., AVs inherently don’t detect new malware well). Attackers can also test against the product and circumvent detection.

The Product
Some products perform ML as an integral part of their function, typically for behavioral detection. In this case the product “learns” the specific environment and uses that information for detection. For example, the product might observe a user or machine starting to access resources it never accessed before and that the user’s peer group doesn’t typically access. There is no predetermined rule, signature or pattern that can detect this. You can only achieve accurate detection by profiling normal behavior in the particular network and applying that knowledge to detect anomalous behavior.
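Here is a minimal sketch of that idea, with an invented event format and toy data: during a profiling period the product records which resources each user and each peer group touch, then flags an access that is new to both.

```python
# Sketch of environment-specific profiling; the event format (user, group, resource)
# and the sample data are assumptions for illustration.
from collections import defaultdict

profiling_events = [
    ("alice", "finance", "erp"), ("alice", "finance", "mail"),
    ("bob",   "finance", "erp"), ("carol", "devops",  "git"),
]

user_baseline = defaultdict(set)
group_baseline = defaultdict(set)
for user, group, resource in profiling_events:   # self-learning/profiling period
    user_baseline[user].add(resource)
    group_baseline[group].add(resource)

def is_anomalous(user, group, resource):
    """Flag an access that is new for both the user and the user's peer group."""
    return resource not in user_baseline[user] and resource not in group_baseline[group]

print(is_anomalous("alice", "finance", "erp"))        # False: seen during profiling
print(is_anomalous("alice", "finance", "build-srv"))  # True: new to user and peers
```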

“Behavioral analysis” by itself doesn’t mean machine learning. Many products look at behaviors and apply rules or signatures. For example, sandboxing products typically run a malware sample in a sandbox environment, examine its behavior and then compare that behavior against a list of rules previously developed by the vendor in its lab (using different methods, including machine learning). In this case the product itself does not perform any ML. A product that performs ML must have a self-training/learning/profiling period. Products that don’t operate this way do not belong in this category, even if they are said to perform “behavioral analysis” or “detection”.

A relatively new security application for machine learning is detection of attacks that have evaded preventative security. While malware detection doesn’t necessarily need ML-capable products, more general behavioral attack detection centers on the activities of a human attacker or insider. The system essentially has to customize its logic to the environment in order to accurately detect those activities. This area represents a substantial break from traditional security in that the goal is to identify unknown anomalous behaviors that neither the end user nor the vendor specified in advance, rather than to evaluate against known, already-defined technical artifacts.

Typical products: fraud detection, anomaly detection, attack detection, behavioral detection. A product in this category has to have a self-learning/profiling period, so other “behavioral analysis” products are not included here.

Advantage: Leveraging ML, these products can achieve higher detection accuracy and a lower rate of false positives. They automatically optimize their detection for each specific environment and can detect unknown behaviors that neither the end-user nor the vendor specified in advance. Additionally, they can’t be “gamed” by hackers the way a statically defined technical artifact can be known and thus circumvented by an attacker.

Disadvantage: Detection depends on the profile of the specific environment, making the process less predictable. The products are optimized less for generic queries on the data and more for automated detection.

The End-user
This category includes products that are toolkits used by data scientists to perform ML. For example, business intelligence (BI) tools enable the end user to define datasets and run correlations, regressions and clustering algorithms. In this case the end user is the data scientist who leverages ML, and the product is only a tool at his or her disposal. The end user decides which data to process, what parameters to use and how to interpret the results.
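For instance (a hypothetical analyst workflow, not any particular product’s API), an analyst might hand-pick per-account features, run an off-the-shelf clustering routine, and then judge the resulting clusters themselves:

```python
# Assumed analyst-driven workflow: the human chooses the data, the algorithm and
# the interpretation; the toolkit merely supplies the clustering routine.
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical per-account features: [logins per day, distinct hosts, failed logins]
X = np.array([[ 5,  2, 0],
              [ 6,  3, 1],
              [ 4,  2, 0],
              [80, 40, 9]])   # one account behaves very differently from the rest

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # the analyst still decides what, if anything, each cluster means
```

The tool runs the math, but problem definition, feature selection and interpretation all stay with the analyst.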

Typical products: Business intelligence products, mathematical/statistical analysis toolkits, SIEM products with analytics toolkits.

Advantage: Lets the user perform custom analytics on custom datasets.

Disadvantage: Can only be leveraged if the security team has data scientists. The responsibility for defining the problem, the input data and the conclusions rests on the analyst rather than the tool, and the analyst would not be able to see patterns he or she wasn’t looking for. In addition, the data collection needed to support custom analytics is a heavy task that requires additional products and storage.


Giora Engel, vice president of product & strategy at LightCyber, is a serial entrepreneur with many years of technological and managerial experience. For nearly a decade, he served as an officer in an elite technological unit in the Israel Defense Forces, where he initiated and ...
 

Comments
JouCTO, 2/14/2016 | 9:38:22 AM
Outstanding
A refreshingly accurate and honest review of machine learning. Thank you, Giora!

 