Commentary | Simon Crosby | 10/29/2015, 10:30 AM

Machine Learning Is Cybersecurity's Latest Pipe Dream

Rather than waste money on the unproven promises of ML and AI, invest in your experts, and in tools that enhance their ability to search for and identify components of a new attack.

Go to any security conference and you’ll likely hear someone say cybersecurity is a “big data problem” or claim artificial intelligence (AI) and machine learning (ML) are the best hope we have to improve our defenses. Most organizations collect mountains of data and urgently need help sifting through it all to find signs of an attack or breach. First-generation tools, such as SIEM (security information and event management) systems, gave security teams a way to correlate events and triage the data. Then solutions with powerful search and indexing capabilities emerged, enabling security teams to comb quickly through massive amounts of indexed data.

These tools have helped enormously, but they leave us with two challenges: vast amounts of data of unknown value that we don’t know when to discard, and the nagging worry that the security team might have missed a needle somewhere in the haystack.

Can machine learning help? Can an algorithm reliably find the needles and give us the confidence to discard the haystack of data that represents normal activity? It’s an appealing idea. We’ve all experienced the power of ML systems in Google search, the recommendation engines of Amazon and Netflix, and the powerful spam-filtering capabilities of Web mail providers. Former Symantec CTO Amit Mital once said that ML offers one of the “few beacons of hope in this mess.”

But it’s important not to succumb to hubris. Google’s fabled ability to identify flu epidemics turned out to be woefully inaccurate. And the domain of cybersecurity is characterized by weak signals, intelligent actors, a large attack surface, and a huge number of variables. Here, there is no guarantee that using ML/AI will leave you any better off than relying on skilled experts to do the hard work.

Unfortunately, that hasn’t stopped the marketing spin. 

What’s Normal, Anyway?
It’s important to remember there is no silver bullet in security, and there’s no evidence at all that these tools help. ML is good at finding similarities between things (such as spam emails), but it’s not so good at finding anomalies. In fact, any discussion of anomalous behavior presumes that it is possible to describe normal behavior. Unfortunately, decades of research confirm that human activity, application behavior, and network traffic are all heavily auto-correlated, making it hard to pin down what activity is normal. This gives malicious actors plenty of opportunity to “hide in plain sight,” and even the opportunity to train the system to accept malicious activity as normal.
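To make that last risk concrete, here is a minimal sketch in Python (every number invented for illustration) of a naive baseline detector and an attacker who escalates slowly enough that the detector keeps re-learning the malicious traffic as normal:

```python
import statistics

def is_anomalous(history, value, tolerance=1.10):
    """Flag a day whose volume exceeds the trailing mean by more than 10%."""
    return value > tolerance * statistics.mean(history)

history = [100.0] * 30                # baseline: ~100 MB/day outbound traffic
volume = 100.0
for day in range(200):
    volume *= 1.005                   # exfiltration grows just 0.5% per day
    if is_anomalous(history, volume):
        print(f"day {day}: alert at {volume:.0f} MB")
    history = history[1:] + [volume]  # detector re-baselines on what it saw

# Nothing is ever printed: each step stays inside the tolerance, so after 200
# days the "learned" baseline is roughly 250 MB/day, about 2.5x the original,
# and the exfiltration never tripped an alert.
print(f"learned baseline: {statistics.mean(history):.0f} MB/day")
```

Real products are far more sophisticated than this, but the underlying dilemma stands: “normal” is defined by whatever the system has recently observed.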

Trained vs. Untrained Learning
Any ML system must attempt to separate and differentiate activity based either on pre-defined (i.e. trained learning) or self-learned classifications. Training an ML engine using human experts seems like a great idea, but assumes that the attackers won’t subtly vary their behavior over time in response. Self-learned categories are often impossible for humans to understand. Unfortunately, ML systems are not good at describing why a particular activity is anomalous, and how it is related to others. So when the ML system delivers an alert, you still have to do the hard work of understanding whether it is a false positive or not, before trying to understand how the anomaly is related to other activity in the system. 
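A small sketch illustrates the interpretability gap. This uses synthetic data and hypothetical feature names, not any vendor’s product: an unsupervised model can separate the sessions, but its cluster IDs carry no meaning an analyst can act on.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Per-session features: [logins/hour, GB uploaded, distinct hosts touched]
normal = rng.normal([2.0, 5.0, 3.0], 1.0, size=(500, 3))
odd = rng.normal([2.0, 5.0, 40.0], 1.0, size=(5, 3))  # fans out to many hosts
X = np.vstack([normal, odd])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
flagged = np.argmin(np.bincount(labels))              # the smaller cluster
print("sessions flagged:", int(np.sum(labels == flagged)))  # expect 5

# The model reports "cluster 1", not "possible lateral movement". An analyst
# must still inspect every flagged session to rule out a false positive and
# work out how it relates to the rest of the activity in the system.
```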

Is It Real?
There is a huge difference between being pleased when Netflix recommends a movie you like and expecting Netflix to never recommend a movie you don’t like. So while applying ML to your security feeds might deliver some helpful insights, you cannot rely on such a system to deliver only valid results. In our industry, the difference is cost: time spent understanding why an alert was triggered and whether or not it is a false positive. Ponemon estimates that a typical large enterprise spends up to 395 hours per week processing false alerts, at a cost of about $1.27 million per year. Unfortunately, you also cannot rely on an ML system to find all anomalies, so you have no way to know whether an attacker is still lurking on your network, and no way to know when it is safe to throw away data.
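Taken at face value, the Ponemon figures are worth a back-of-the-envelope check; the hourly rate below is derived from them, not taken from the study:

```python
# Back-of-the-envelope check on the cited Ponemon figures.
hours_per_week = 395
hours_per_year = hours_per_week * 52           # 20,540 analyst-hours
annual_cost = 1_270_000                        # $1.27M per year, as cited
print(f"{hours_per_year:,} hours/year")        # 20,540 hours/year
print(f"implied cost: ${annual_cost / hours_per_year:.0f}/hour")  # ~$62/hour
```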

Experts Are Still Better
Cybersecurity is a domain where human expertise will always be needed to pick through the subtle differences between anomalies. Rather than waste money on the unproven promises of ML and AI-based security technologies, I recommend that you invest in your experts, and in tools that enhance their ability to quickly search for and identify components of a new attack. In the context of endpoint security, an emerging category of tools that Gartner calls “Endpoint Detection & Response” plays an important role in equipping the security team with real-time insight into indicators of compromise on the endpoint. Here, both continuous monitoring and real-time search are key.

ML Cannot Protect You
A final caution, obvious as it may be: post-hoc analysis of monitoring data cannot prevent a vulnerable system from being compromised in the first place. Ultimately, we need to quickly adopt technologies and infrastructure that are more secure by design. By way of example, segmenting the enterprise network, placing all PCs on a separate, routed network segment, and forcing users to authenticate to gain access to privileged applications makes it much harder for malware to penetrate the organization and move laterally. Virtualization and micro-segmentation take this a step further, restricting flows of activity in your networks and making your applications more resilient to attack. Overall, good infrastructure architecture can make the biggest difference in your security posture, reducing the size of the haystack and making the business of defending the enterprise much easier.
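As a rough sketch of why segmentation helps (the topology is invented for illustration), model which segments unauthenticated traffic can reach and compare a flat network with a segmented one:

```python
from collections import deque

def reachable(edges, start):
    """Segments reachable from `start`, by breadth-first search."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Edges are paths open to unauthenticated traffic. In the segmented design,
# the gateway-to-servers hop requires a user to authenticate, so it is absent.
flat = {"pc1": ["pc2", "servers"], "pc2": ["pc1", "servers"], "servers": []}
segmented = {"pc1": ["auth_gateway"], "pc2": ["auth_gateway"],
             "auth_gateway": [], "servers": []}

print(sorted(reachable(flat, "pc1")))       # ['pc1', 'pc2', 'servers']
print(sorted(reachable(segmented, "pc1")))  # ['auth_gateway', 'pc1']
```

In the flat network, malware on one PC reaches every peer and server directly; in the segmented design, its only unauthenticated path dead-ends at the gateway.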


Simon Crosby is co-founder and CTO at Bromium. He was founder and CTO of XenSource prior to the acquisition of XenSource by Citrix, and then served as CTO of the Virtualization & Management Division at Citrix. Previously, Simon was a principal engineer at Intel where he led ...
Comments
dfabbri | User Rank: Apprentice | 11/19/2015, 6:42:56 PM
Explanation-Based Auditing: a different type of machine learning method for cybersecurity
The author rightly comments that standard anomaly detection systems are difficult to apply to cybersecurity. Outlier detection may help identify large-scale scraping, but individual and subtle inappropriate acts are hard to find (e.g., accessing an ex-girlfriend's medical record).

As the author notes:

"Any ML system must attempt to separate and differentiate activity based either on pre-defined (i.e. trained learning) or self-learned classifications"

Thus, for a cybersecurity machine learning system to be effective, it must have some principled and structured method to differentiate appropriate from inappropriate access. Moreover, the system must have the correct context to make such a decision.

The author makes the statement that ML systems struggle to do this:

"Unfortunately, ML systems are not good at describing why a particular activity is anomalous, and how it is related to others. So when the ML system delivers an alert, you still have to do the hard work of understanding whether it is a false positive or not, before trying to understand how the anomaly is related to other activity in the system."

I would point the author to a new line of machine learning algorithms for access auditing called Explanation-Based Auditing.

A detailed peer-reviewed publication can be found at vldb.org/pvldb/vol5/p001_danielfabbri_vldb2012.pdf.

The general idea is to learn why accesses to data occur (e.g., the doctor accessed a record because of an appointment with the patient). This can be modeled as a graph search between the person accessing the data and the data accessed. When such an "explanation" is found, the system can determine the reason for access, filtering it away from manual review.
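A toy sketch of that graph search, with the schema and names invented for illustration:

```python
from collections import deque

# Hypothetical access graph: nodes are people, events, and records; edges are
# operational relationships such as "has an appointment with".
edges = {
    "dr_smith": ["appt_1042"],
    "appt_1042": ["patient_jones"],
    "dr_brown": [],                 # no appointment, referral, or other tie
}

def explained(user, record):
    """Search for an explanation path from the user to the accessed record."""
    seen, queue = {user}, deque([user])
    while queue:
        node = queue.popleft()
        if node == record:
            return True
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

print(explained("dr_smith", "patient_jones"))  # True  -> filter from review
print(explained("dr_brown", "patient_jones"))  # False -> flag for the officer
```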

Thus, as the previous comment states, such a system can remove a tremendous number of false positives, allowing the privacy or security officer to focus on the unexplained and the suspicious.
gyp | User Rank: Author | 11/5/2015, 4:01:36 PM
Experts vs. ML is a false dichotomy
The role of ML in security is not, and never was, to replace the experts; rather, it is to free them from tedious tasks and give them efficient tools. You don't want your experts manually going through tens of thousands of log lines, or doing manual data munging, every time they need to investigate an incident. If they are true experts, they will be fed up with that quite fast. Build or buy is a valid question (can an off-the-shelf product truly find the problems in your scenario better than something custom-built could?), but discarding data science as part of security altogether would be a mistake.
RyanSepe | User Rank: Ninja | 10/31/2015, 5:04:38 PM
Re: Experts>AI
Very much agree. There will most certainly need to be a human element in cybersecurity.

Unfortunately, improving AI is a catch-22: you need the manpower to put hours into bettering the technology, but those are hours taken away from putting that person's security knowledge to use in the workforce.
Whoopty | User Rank: Ninja | 10/30/2015, 8:02:29 AM
Experts>AI
I have to agree with the idea of continuing to pay experts rather than relying solely on AI. Although I think machine learning has the potential to become a major tool for combating security issues, and can already be used sparingly in some cases, there really is no substitute for an intuitive security expert.

Not only can they improvise on the fly and make far greater leaps of reasoning than a computer-controlled system can, they can also intuitively work out how human attackers think and provision for them. That's something AI will take much longer to figure out, as machines will never understand how we think quite the way we do.