Dark Reading is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them.Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Threat Intelligence

3/14/2019
02:30 PM
Rosaria Silipo
Rosaria Silipo
Commentary
Connect Directly
Twitter
LinkedIn
RSS
E-Mail vvv
50%
50%

Anomaly Detection Techniques: Defining Normal

The challenge is identifying suspicious events in training sets where no anomalies are encountered. Part two of a two-part series.

The problem of anomaly detection is not new, and a number of solutions have already been proposed over the years. However, before starting with the list of techniques, let's agree on a necessary premise: All anomaly detection techniques must involve a training set where no anomaly examples are encountered. The challenge consists of identifying suspicious events, even in the absence of examples.

We talk in this case of a training set formed of only "normal" events. The definition of "normal" is, of course, arbitrary. In the case of anomaly detection, a "normal" event refers just to the events represented in the training set. Here are four common approaches.

Statistical Methods
Everything that falls outside of the statistical distribution calculated over the training set is considered an anomaly.

The simplest statistical method is the control chart. Here the average and standard deviation for each feature is calculated on the training set. Thresholds are then defined around the average value as k*std deviation where k is an arbitrary coefficient, usually between 1.5 and 3.0, depending on how conservative we want the algorithm to be. During deployment, a point trespassing the thresholds in both directions is a suspicious candidate for an anomaly event.

Such methods are easy to implement and understand, fast to execute, and fit both static and time series data. However, they might be too simple to detect more subtle anomalies.

Clustering
Other proposed methods are often clustering methods. Since the anomaly class is missing from the training set, clustering algorithms might sound suitable for the task.

The concept here is clear. The algorithm creates a number of clusters on the training set. During deployment, the distance between the current data point and the clusters is calculated. If the distance is above a given threshold, the data point becomes a suspicious candidate for an anomaly event. Depending on the distance measure used and on the aggregation rules, different clustering algorithms have been designed and different clusters are created.

This approach, however, does not fit time series data since a fixed set of clusters cannot capture the evolution in time.

Supervised Machine Learning
Surprised? Supervised machine learning algorithms can also be used for anomaly detection. They would even cover all data situations since supervised machine learning techniques can be applied to static classification as well as to time series prediction problems. However, since they all require a set of examples for all involved classes, we need a little change in perspective.

In the case of anomaly detection, a supervised machine learning model can only be trained on "normal" data — i.e., on data describing the system operating in "normal" conditions. The evaluation of whether the input data is an anomaly can only happen during deployment after the classification/prediction has been made.There are two popular approaches for anomaly detection relying on supervised learning techniques.

The first one is a neural autoassociator (or autoencoder). The autoassociator is trained to reproduce the input pattern onto the output layer. The pattern reproduction works fine as long as the input patterns are similar to the examples in the training set — i.e., “normal.” Things do not work quite as well when a new, different shape vector appears at the input layer. In this case, the network will not be able to adequately reproduce the input vector onto the output layer. If a distance is calculated between the input and the output of the network, the distance value will be higher for an anomaly rather than for a "normal" event. Again, defining a threshold on this distance measure should find the anomaly candidates. This approach works well for static data points but does not fit time series data.

The second approach uses algorithms for time series prediction. The model is trained to predict the value of the next sample based on the history of previous n samples on a training set of "normal" values. During deployment, the prediction of the next sample value will be relatively correct — i.e., close to the real sample value, if the past history comes from a system working in "normal" conditions. The predicted value will be farther from reality if the past history samples come from a system not working in "normal" conditions anymore. In this case, a distance measure calculated between the predicted sample value and the real sample value would isolate candidates for anomaly events.

Related Content: 

 

 

Join Dark Reading LIVE for two cybersecurity summits at Interop 2019. Learn from the industry's most knowledgeable IT security experts. Check out the Interop agenda here.

Rosaria Silipo, Ph.D., principal data scientist at KNIME, is the author of 50+ technical publications, including her most recent book "Practicing Data Science: A Collection of Case Studies". She holds a doctorate degree in bio-engineering and has spent more than 25 years ... View Full Bio
 

Recommended Reading:

Comment  | 
Print  | 
More Insights
Comments
Oldest First  |  Newest First  |  Threaded View
Commentary
Cyberattacks Are Tailored to Employees ... Why Isn't Security Training?
Tim Sadler, CEO and co-founder of Tessian,  6/17/2021
Edge-DRsplash-10-edge-articles
7 Powerful Cybersecurity Skills the Energy Sector Needs Most
Pam Baker, Contributing Writer,  6/22/2021
News
Microsoft Disrupts Large-Scale BEC Campaign Across Web Services
Kelly Sheridan, Staff Editor, Dark Reading,  6/15/2021
Register for Dark Reading Newsletters
White Papers
Video
Cartoon
Current Issue
The State of Cybersecurity Incident Response
In this report learn how enterprises are building their incident response teams and processes, how they research potential compromises, how they respond to new breaches, and what tools and processes they use to remediate problems and improve their cyber defenses for the future.
Flash Poll
How Enterprises are Developing Secure Applications
How Enterprises are Developing Secure Applications
Recent breaches of third-party apps are driving many organizations to think harder about the security of their off-the-shelf software as they continue to move left in secure software development practices.
Twitter Feed
Dark Reading - Bug Report
Bug Report
Enterprise Vulnerabilities
From DHS/US-CERT's National Vulnerability Database
CVE-2021-32716
PUBLISHED: 2021-06-24
Shopware is an open source eCommerce platform. In versions prior to 6.4.1.1 the admin api has exposed some internal hidden fields when an association has been loaded with a to many reference. Users are recommend to update to version 6.4.1.1. You can get the update to 6.4.1.1 regularly via the Auto-U...
CVE-2021-32717
PUBLISHED: 2021-06-24
Shopware is an open source eCommerce platform. In versions prior to 6.4.1.1 private files publicly accessible with Cloud Storage providers when the hashed URL is known. Users are recommend to first change their configuration to set the correct visibility according to the documentation. The visibilit...
CVE-2021-32712
PUBLISHED: 2021-06-24
Shopware is an open source eCommerce platform. Versions prior to 5.6.10 are vulnerable to system information leakage in error handling. Users are recommend to update to version 5.6.10. You can get the update to 5.6.10 regularly via the Auto-Updater or directly via the download overview.
CVE-2021-32713
PUBLISHED: 2021-06-24
Shopware is an open source eCommerce platform. Versions prior to 5.6.10 suffer from an authenticated stored XSS in administration vulnerability. Users are recommend to update to the version 5.6.10. You can get the update to 5.6.10 regularly via the Auto-Updater or directly via the download overview.
CVE-2021-32710
PUBLISHED: 2021-06-24
Shopware is an open source eCommerce platform. Potential session hijacking of store customers in versions below 6.3.5.2. We recommend to update to the current version 6.3.5.2. You can get the update to 6.3.5.2 regularly via the Auto-Updater or directly via the download overview. For older versions o...