
Myth-Busting Machine Learning In Security

Black Hat USA presentation to help quell misconceptions and confusion over machine learning methods in today's security tools.

As more and more security tools are touted as being backed by big data, anomaly detection, behavioral analysis, and algorithmic technology, security practitioners should be wary. According to a talk slated for Black Hat USA next week, interest has grown among the security rank and file in employing machine learning to solve tough security problems.

But it's mostly the marketing arms of vendors that have caught up to this interest -- not the actual technology. 

"Don't get me wrong, there are techniques that work if applied correctly," says Alex Pinto, a former security consultant and founder of the MLSec Project, a year-old research project dedicated to investigating machine learning techniques using live data from real enterprises. "But most of the claims I am seeing are rebranded old techniques."

As Pinto explains, most of the tools marketed as using advanced algorithmic analysis of behavior or anomaly detection rely on techniques that have shipped in products in exactly the same form for 10 years and have existed in the research literature for 30. And some of the mathematical capabilities vendors claim to have cracked are in fact very hard "open" problems in the theoretical world.

"When you think about true anomaly detection or behavior analysis, the challenge is that security is grasping at straws because it wants the algorithm to figure out if something is normal or not," says Pinto, who will lead a talk called "Secure Because Math: A Deep Dive On Machine Learning-Based Monitoring at Black Hat." "That works well if you're only measuring one variable. But if you increase that and try to analyze, say, the NetFlow of 1,000 different machines talking to each other, today's theoretical mathematical capabilities have no chance."

If they did, they would already be delivering breakthroughs in DNA analysis, figuring out whether certain people are susceptible to specific diseases, or, more lucratively, driving data-driven marketing campaigns, because the underlying mathematical problems are similar. His goal with the talk is to give security people, who may not have much theoretical math background, the right lines of questioning for grilling vendors who come knocking with claims of advanced algorithms but little transparency about what is actually under the hood.

"If people are not able to answer these questions, they either don't know what they're doing or they're just point-blank lying about using machine learning," Pinto tells us.

He says he has three main things to warn buyers to look out for. The first is understanding where the information for the machine learning models is coming from.

"If it comes all from your environment, it could be susceptible to tampering by attackers. Also not having strictly identified what are the events that should be singled out by the models makes them very fragile. Pure and simple anomaly detection usually falls into this dangerous area, and these tools end up being very prone to false positives."

The second is asking about the underlying assumptions of the models. One of the big pitfalls of user behavior analysis and leak detection technology is the assumption that the buying organization already has good information classification and labeling of its data, along with solid hierarchical definitions of users' roles. He warns that most corporate organizations have a long road ahead to establish that. Finally, the third question to be answered is how the technology is meant to be used and how it integrates with current security processes.

"Even the best models have some degree of false positives, and you need support from your organization to manage and handle this. This is again at odds with the life-saving and miraculous way machine learning is portrayed. You know Facebook ads? They get stuff wrong all the time. It is the same tech! Sometimes the ML might just get in the way of your processes."

Pinto says that he came up with the topic idea almost in direct opposition to his talk last year, which was meant to pump up interest in machine learning's potential to improve security. He's still a big believer in machine learning, but he wants to bring the hype down a bit so that scurrilous marketing doesn't give the entire field a bad name. The idea is to get security practitioners to understand that just because a vendor claims to have PhD mathematicians on staff doesn't mean it has magically solved security's problems.

"To be honest, I want to provoke vendors who are doing this to be more transparent."

In conjunction with his warnings, he'll also offer a silver lining in the form of an update on some of the research breakthroughs his team at MLSec has had in the past year.

Notably, he'll discuss how his team is looking for better ways to use machine learning to interrogate threat intelligence feeds, in order to fine-tune and speed up analysis and point practitioners and operations staff in the right direction more quickly. As he explains, experienced security practitioners know that while some of these indicator feeds can be used as simple blacklists, they're most valuable as raw material for experienced analysis. That human investigation usually involves learning patterns and intrinsic characteristics, such as Internet routing positions or the data centers hosting threats.

"These are very labor-intensive processes, and it turns out they can be efficiently mined for features for machine learning," says Pinto. "In a nutshell, our models are able to extrapolate the knowledge of existing threat intelligence feeds as experienced analysts would. When you run them against log data from one of our participants, their data helps fill out some knowledge gaps and biases the model may develop during the training process and returns a very short list of potential compromised machines."

He says to think of it like an Amazon recommendation system for network security:

" 'Your peers have just been hacked like this -- you may want to look at these guys on your network.' Of course, it is not magical, and some few false positives creep up now and then, but the organizations working with us have been very receptive to the results."

Ericka Chickowski specializes in coverage of information technology and business innovation. She has focused on information security for the better part of a decade and regularly writes about the security industry as a contributor to Dark Reading.

Kelly Jackson Higgins, User Rank: Strategist
7/28/2014 | 5:06:52 PM
Re: Machine Learning
Good point, @tgulati. It is definitely evolving, and I think this talk should help advance that.
User Rank: Apprentice
7/28/2014 | 4:25:36 PM
Re: Machine Learning
I would argue that multidimensional analytics, achieved via some combination of machine learning and algorithms, will advance threat detection capabilities. In my experience, unidimensional rulesets written on single datasets, or reliance on singular techniques, tend to produce the most false positives. Some of the smartest security practitioners are able to apply their knowledge of their networks and cross-dataset threat signatures to detect threats. The smarter technologies are the ones that can use similar analytics to get better results. I do believe this field will continue to evolve, and the serious security vendors will focus more on the solution than on the underlying big data technologies.
Marilyn Cohodas, User Rank: Strategist
7/28/2014 | 3:18:20 PM
Re: interesting
With all the buzz about big data and how security teams can harness the technology to combat threats, this session will definitely be an eye-opener. For background, check out the blog "6 Tips for Using Big Data to Hunt Cyberthreats," by Timber Wolfe, Principal Security Engineer at TrainAC

Kelly Jackson Higgins, User Rank: Strategist
7/28/2014 | 1:59:01 PM
This should be a very revealing presentation. I would think quite a few enterprise folks would be interested in gaining some insight on machine learning differences, etc.