SOC Turns to Homegrown Machine Learning to Catch Cyber Intruders

A do-it-yourself machine learning system helped a French bank detect three types of exfiltration attacks missed by current rules-based systems, attendees will learn at Black Hat Europe.

Using an internally developed machine learning model trained on log data, the information security team for a French bank found it could detect three new types of data exfiltration that rules-based security appliances did not catch.

Carole Boijaud, a cybersecurity engineer with Credit Agricole Group Infrastructure Platform (CA-GIP), will take the stage at next week's Black Hat Europe 2022 conference to detail the research into the technique, in a session titled "Thresholds Are for Old Threats: Demystifying AI and Machine Learning to Enhance SOC Detection." The team took daily summary data from log files, extracted interesting features from the data, and used those features to find anomalies in the bank's Web traffic.

The research focused on how to better detect data exfiltration by attackers, and resulted in identification of attacks that the company's previous system failed to detect, she says.

"We implemented our own simulation of threats, of what we wanted to see, so we were able to see what [we] could identify in our own traffic," she says. "When we didn't detect [a specific threat], we tried to figure out what is different, and we tried to understand what was going on."

Although machine learning has become a buzzword in the cybersecurity industry, some companies and academic researchers are still making headway by experimenting with their own data to find threats that might otherwise hide in the noise. Microsoft, for example, used data collected from the telemetry of 400,000 customers to identify specific attack groups and, using those classifications, predict the attackers' future actions. Other firms are using machine learning techniques, such as genetic algorithms, to help detect accounts on cloud computing platforms that have too many permissions.

Analyzing your own data with a homegrown system offers a variety of benefits, says Boijaud. Security operations centers (SOCs) gain a better understanding of their network traffic and user activity, and security analysts gain more insight into the threats attacking their systems. While Credit Agricole has its own platform group to manage infrastructure, handle security, and conduct research, even smaller enterprises can benefit from applying machine learning and data analysis, Boijaud says.

"Developing your own model is not that expensive and I'm convinced that everyone can do it," she says. "If you have access to the data, and you have people who know the logs, they can create their own pipeline, at least in the beginning."

Finding the Right Data Points to Monitor

The cybersecurity engineering team used a data-analysis technique known as clustering to identify the most important features to track in their analysis. The features deemed most significant included the popularity of domains, the number of times systems reached out to specific domains, and whether the request used an IP address or a standard domain name.
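As a rough illustration of what such features could look like in practice, the following Python sketch summarizes one day of requests. The article does not describe the bank's log format or pipeline, so the `(host, destination)` record shape, the function names, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict
import ipaddress


def is_ip_literal(destination):
    """Return True when the destination is a raw IP address, not a domain name."""
    try:
        ipaddress.ip_address(destination)
        return True
    except ValueError:
        return False


def daily_features(records):
    """Summarize one day of (host, destination) request records.

    Returns a dict keyed by (host, destination) holding three features:
    how many distinct hosts contacted the destination (popularity), how
    many requests this host sent to it, and whether it is an IP literal.
    """
    request_counts = Counter(records)
    hosts_per_destination = defaultdict(set)
    for host, destination in records:
        hosts_per_destination[destination].add(host)

    return {
        (host, destination): {
            "popularity": len(hosts_per_destination[destination]),
            "requests": count,
            "is_ip": is_ip_literal(destination),
        }
        for (host, destination), count in request_counts.items()
    }


# Hypothetical sample: three machines visit a popular domain, while one
# machine repeatedly contacts a bare IP address that nobody else uses.
records = [
    ("pc-01", "example.com"),
    ("pc-02", "example.com"),
    ("pc-03", "example.com"),
    ("pc-01", "203.0.113.7"),
    ("pc-01", "203.0.113.7"),
]
features = daily_features(records)
```

In this toy data, the bare IP address stands out on all three features at once: it is unpopular, contacted repeatedly by a single machine, and not a domain name.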

"Based on the representation of the data and the fact that we have been monitoring the daily behavior of the machines, we have been able to identify those features," says Boijaud. "Machine learning is about mathematics and models, but one of the important facts is how you choose to represent the data and that requires understanding the data and that means we need people, like cybersecurity engineers, who understand this field."

After selecting the features that are most significant for classification, the team used a technique known as "isolation forest" to find the outliers in the data. The isolation forest algorithm builds a set of trees that repeatedly split the data at randomly chosen feature values; points that end up isolated after only a few splits are treated as outliers. The approach scales easily to handle a large number of features and is relatively light, processing-wise.
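To make the idea concrete, here is a minimal pure-Python sketch of an isolation forest. It is not the bank's implementation, which the article does not detail, and a production pipeline would more likely rely on a maintained library such as scikit-learn's `IsolationForest`. The feature tuples (popularity, request count, IP flag) and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split points on a random feature at a random value."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}
    # Only features that still vary inside this node can separate points.
    varying = [f for f in range(len(points[0]))
               if min(p[f] for p in points) != max(p[f] for p in points)]
    if not varying:
        return {"size": len(points)}
    feature = rng.choice(varying)
    low = min(p[feature] for p in points)
    high = max(p[feature] for p in points)
    split = rng.uniform(low, high)
    return {
        "feature": feature,
        "split": split,
        "left": build_tree([p for p in points if p[feature] < split],
                           depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[feature] >= split],
                            depth + 1, max_depth, rng),
    }


def path_length(tree, point, depth=0):
    """Number of splits needed to reach a leaf, adjusted for unsplit leaf size."""
    if "feature" not in tree:
        return depth + c(tree["size"])
    branch = "left" if point[tree["feature"]] < tree["split"] else "right"
    return path_length(tree[branch], point, depth + 1)


def anomaly_scores(points, n_trees=100, sample_size=256, seed=0):
    """Score each point in (0, 1]; values near 1 mean 'isolated quickly'."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = n_trees * c(sample_size)
    return [2 ** (-sum(path_length(t, p) for t in trees) / norm) for p in points]


# Hypothetical feature rows: (domain popularity, request count, is_ip flag).
points = [(40 + i % 20, 5 + (i * 7) % 11, 0) for i in range(50)]
points.append((1, 400, 1))  # unpopular bare IP contacted very often

scores = anomaly_scores(points)
suspect = max(range(len(points)), key=scores.__getitem__)
```

Here `suspect` is the index of the highest-scoring row. Choosing the score threshold above which a row becomes an alert is a separate tuning decision that the article does not describe.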

The initial efforts resulted in the model learning to detect three types of exfiltration attacks that the company would not otherwise have detected with existing security appliances. Overall, about half the exfiltration attacks could be detected with a low false-positive rate, Boijaud says.

Not All Network Anomalies Are Malicious

The engineers also had to find ways to determine which anomalies indicated malicious attacks and which were nonhuman but benign traffic. Advertising tags and requests sent to third-party tracking servers were also flagged by the system, because they tend to match the definition of an anomaly, but they could be filtered out of the final results.
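One simple way to do that kind of filtering is to drop flagged destinations that belong to a curated list of known advertising or tracking domains. The sketch below assumes such an allowlist exists; the article does not say how the team filtered its results, and the domain names and function names here are placeholders.

```python
def is_known_benign(domain, benign_suffixes):
    """True when domain equals, or is a subdomain of, an allowlisted suffix."""
    domain = domain.lower().rstrip(".")
    return any(domain == suffix or domain.endswith("." + suffix)
               for suffix in benign_suffixes)


def filter_alerts(flagged_domains, benign_suffixes):
    """Keep only the flagged destinations that are not on the allowlist."""
    return [d for d in flagged_domains
            if not is_known_benign(d, benign_suffixes)]


# Hypothetical allowlist of advertising and tracking domains.
BENIGN_SUFFIXES = {"ads.example", "tracker.example"}

flagged = ["pixel.ads.example", "203.0.113.7", "notads.example", "Tracker.example."]
alerts = filter_alerts(flagged, BENIGN_SUFFIXES)
```

Matching on a leading dot keeps look-alike names such as `notads.example` in the alert list instead of silently allowlisting them.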

Automating the initial analysis of security events can help companies more quickly triage and identify potential attacks. By doing the research themselves, security teams gain additional insight into their data and can more easily determine what is an attack and what may be benign, Boijaud says.

CA-GIP plans to expand the analysis approach to use cases beyond detecting data exfiltration in Web traffic, she says.

About the Author(s)

Robert Lemos, Contributing Writer

Veteran technology journalist of more than 20 years. Former research engineer. Written for more than two dozen publications, including CNET, Dark Reading, MIT's Technology Review, Popular Science, and Wired News. Five awards for journalism, including Best Deadline Journalism (Online) in 2003 for coverage of the Blaster worm. Crunches numbers on various trends using Python and R. Recent reports include analyses of the shortage in cybersecurity workers and annual vulnerability trends.
