
Threat Intelligence

5/26/2020
10:00 AM

The Problem with Artificial Intelligence in Security

Any notion that AI is going to solve the cyber skills crisis is very wide of the mark. Here's why.

If you believed everything you read, you would think artificial intelligence (AI) is the savior of cybersecurity. According to Capgemini, 80% of companies are counting on AI to help identify threats and thwart attacks. That's a big expectation to live up to because, in reality, few nonexperts really understand the value of AI to security or whether the technology can effectively address information security's many potential use cases.

A cynic would call out the proliferation of claims about using AI for what it is — marketing hype. Even the use of the term "AI" is misleading. "Artificial intelligence" makes it sound like the technology has innate generalized intelligence that can tackle different problems. In reality, what you have in most cases is a machine learning (ML) algorithm that has been tuned for a specific task.

The algorithms that are embedded in some security products could, at best, be called narrow (or weak) AI. They perform highly specialized tasks in a single (narrow) field and have been trained on large volumes of data, specific to a single domain. This is a far cry from general (or strong) AI, which is a system that can perform any generalized task and answer questions across multiple domains. We are a long way from that type of solution hitting the market.

Having a technology that can do only one job is no replacement for a general-purpose member of your team. So, any notion that AI is going to solve the cyber skills crisis is very wide of the mark. In fact, these solutions often require more, not less, time from security teams, a fact that is frequently overlooked.

For example, take the case of anomaly detection. It's really valuable for your security operations center analysts to be able to find any "bad stuff" in your network, and machine learning can be well-suited to this problem. However, an algorithm that finds far more "bad stuff" than you ever did before might not be as good as it sounds. Every ML algorithm has a false-positive rate (identifying events as "bad" when they are benign), and that rate is one side of a trade-off against detection rate and other desired behaviors. You therefore still tend to need a human to triage the results, and the more "bad" the algorithm finds, the more events there are for your team members to assess.
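
To make that triage cost concrete, here is a minimal back-of-the-envelope sketch. The event volume, prevalence, and detection rates below are assumptions chosen purely for illustration and are not drawn from the article or any particular product:

```python
# Back-of-the-envelope triage math (all numbers below are assumptions for
# illustration, not measurements from any real product or environment).

def triage_load(events_per_day, prevalence, true_positive_rate, false_positive_rate):
    """Return (alerts_per_day, precision) for a detector with the given rates."""
    malicious = events_per_day * prevalence
    benign = events_per_day - malicious

    true_alerts = malicious * true_positive_rate   # real incidents flagged
    false_alerts = benign * false_positive_rate    # benign events flagged

    alerts = true_alerts + false_alerts
    precision = true_alerts / alerts if alerts else 0.0
    return alerts, precision

# Hypothetical scenario: 1M events/day, 0.1% actually malicious, and a detector
# that catches 90% of the bad events with a 1% false-positive rate.
alerts, precision = triage_load(1_000_000, 0.001, 0.90, 0.01)
print(f"Alerts to triage per day: {alerts:,.0f}")         # ~10,890
print(f"Share that are real incidents: {precision:.1%}")  # ~8.3%
```

Even with a false-positive rate of just 1%, roughly nine out of ten alerts in this scenario are benign, which is exactly the triage burden described above.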

None of this is a surprising result to anyone familiar with ML. But it is not necessarily common knowledge to the teams that may wish to employ these solutions, and that gap can lead to inflated expectations of how much time ML will free up for them.

Whereas the example above was about ML algorithms doing some of the work of a security team directly, algorithms can also assist the team indirectly by helping users avoid mistakes that pose a risk. This approach is exciting because it starts to reduce the number of possible events coming into the funnel, rather than trying to identify and mitigate them at the end, when they contribute to a security event. Solving the most obvious problem is not always what brings about the desired outcomes in the long term.

The other issue that is easy to overlook when considering ML is data. Any ML algorithm can only work when it has enough data to learn from. It takes time to learn; just think, how many Internet cat pictures do you need to show it before it recognizes a cat? How long does the algorithm need to run before the model starts to work? The learning process can take much longer than expected, so security teams need to factor this in. Furthermore, labeled data, which is optimal for some use cases, is in short supply in security. This is another area where a "human in the loop" may be needed to classify security events and assist in training the algorithm.
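
As a rough illustration of what "human in the loop" can look like in practice, here is a minimal sketch, assuming scikit-learn is available and that events have already been reduced to numeric feature vectors; the features, labels, and review threshold below are hypothetical:

```python
# A minimal "human in the loop" sketch: a model scores events, low-confidence
# results are routed to an analyst, and the analyst's labels feed the next
# training run. Features, thresholds, and labels here are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

# A small seed set an analyst has already labeled (0 = benign, 1 = malicious).
X_labeled = np.array([[0.1, 2.0], [0.3, 1.5], [5.0, 0.2], [4.2, 0.4]])
y_labeled = np.array([0, 0, 1, 1])

model = LogisticRegression().fit(X_labeled, y_labeled)

# New, unlabeled events scored by the model.
X_new = np.array([[0.2, 1.8], [2.5, 1.0], [4.8, 0.3]])
confidence = model.predict_proba(X_new).max(axis=1)

# Anything the model is unsure about goes back to a human for classification.
REVIEW_THRESHOLD = 0.8
for event, conf in zip(X_new, confidence):
    action = "send to analyst" if conf < REVIEW_THRESHOLD else "auto-triage"
    print(f"event {event}, confidence {conf:.2f} -> {action}")
```

The key point is the loop itself: the model reduces the volume a human sees, but human judgment still produces the labels the model needs in order to keep improving.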

There is a lot of promise for machine learning to augment tasks that security teams must undertake, as long as the need for both data and subject matter experts is acknowledged. Rather than talking about "AI solving a skill shortage," we should be thinking of AI as enhancing or assisting with the activities that people are already performing.

So, how can CISOs best take advantage of the latest advances in machine learning, as its use in security tooling increases, without being taken in by the hype? The key is to approach it with a critical eye. Consider in detail what type of impact you want ML to have and where in your overall security process you want it to sit. Do you want to find more "bad," help prevent user error, or address one of the many other possible applications?

This choice will point you toward different solutions. Make sure the trade-offs of any ML algorithm employed in these solutions are abundantly clear to you; that is possible without understanding the finer points of the math under the hood. Finally, weigh the benefits against the less obvious, potentially negative second-order effects on your existing team, such as more events to triage.
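
That trade-off can be rehearsed before a purchase on your own (or synthetic) data. The sketch below is purely illustrative, using made-up anomaly scores, and simply shows how sweeping an alert threshold trades detections caught against alerts your team must triage:

```python
# Purely illustrative threshold sweep over made-up anomaly scores, showing how
# catching more "bad" also hands the team more alerts to triage.
import numpy as np

rng = np.random.default_rng(0)
benign_scores = rng.normal(0.2, 0.1, 99_900)   # assumed benign events per day
bad_scores = rng.normal(0.6, 0.15, 100)        # assumed malicious events per day

for threshold in (0.4, 0.5, 0.6):
    caught = int((bad_scores > threshold).sum())
    alerts = int((benign_scores > threshold).sum()) + caught
    print(f"threshold {threshold}: {caught}/100 bad events caught, "
          f"{alerts:,} alerts for the team to triage")
```

Numbers like these, run against your own event volumes, make the second-order cost visible before procurement rather than after.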

Whichever type of problem you're hoping to solve, the availability of high-quality, up-to-date data is crucial to your success with emerging ML capabilities. Organizations can lay the foundations for this now by investing in security data collection and analysis capabilities and in their security team's data skill sets. Security SMEs who can interpret machine learning output, whether as part of a formal "human in the loop" solution or simply as analysts triaging results after processing, will remain fundamental for the foreseeable future.


Dr. Leila Powell started out as an astrophysicist, using supercomputers to study the evolution of galaxies. Now she tackles more down-to-earth challenges. As the lead data scientist at Panaseer, she helps information security functions in global organizations understand and ... View Full Bio
 


Comments
Yoni Kahana, User Rank: Author
5/27/2020 | 4:03:29 AM
Trust your AI as much as you trust the data
One of the main issues with AI is that its output is only as trustworthy as its inputs. Ensure the integrity of the inputs; otherwise the output won't be valid, no matter how good the algorithm is.
drich188, User Rank: Apprentice
5/26/2020 | 11:08:44 AM
Phrasing is absolutely outstanding
This is exactly what we at Enveloperty have been trying to communicate and solve! The way you formulated the article and your choice of wording are impeccable. Excellent work; I will definitely be sharing this article as much as I can!
COVID-19: Latest Security News & Commentary
Dark Reading Staff 9/25/2020
Hacking Yourself: Marie Moe and Pacemaker Security
Gary McGraw Ph.D., Co-founder Berryville Institute of Machine Learning,  9/21/2020
Startup Aims to Map and Track All the IT and Security Things
Kelly Jackson Higgins, Executive Editor at Dark Reading,  9/22/2020
Register for Dark Reading Newsletters
White Papers
Video
Cartoon
Current Issue
Special Report: Computing's New Normal
This special report examines how IT security organizations have adapted to the "new normal" of computing and what the long-term effects will be. Read it and get a unique set of perspectives on issues ranging from new threats & vulnerabilities as a result of remote working to how enterprise security strategy will be affected long term.
Flash Poll
How IT Security Organizations are Attacking the Cybersecurity Problem
How IT Security Organizations are Attacking the Cybersecurity Problem
The COVID-19 pandemic turned the world -- and enterprise computing -- on end. Here's a look at how cybersecurity teams are retrenching their defense strategies, rebuilding their teams, and selecting new technologies to stop the oncoming rise of online attacks.
Twitter Feed
Dark Reading - Bug Report
Bug Report
Enterprise Vulnerabilities
From DHS/US-CERT's National Vulnerability Database
CVE-2020-15208
PUBLISHED: 2020-09-25
In tensorflow-lite before versions 1.15.4, 2.0.3, 2.1.2, 2.2.1 and 2.3.1, when determining the common dimension size of two tensors, TFLite uses a `DCHECK` which is no-op outside of debug compilation modes. Since the function always returns the dimension of the first tensor, malicious attackers can ...
CVE-2020-15209
PUBLISHED: 2020-09-25
In tensorflow-lite before versions 1.15.4, 2.0.3, 2.1.2, 2.2.1 and 2.3.1, a crafted TFLite model can force a node to have as input a tensor backed by a `nullptr` buffer. This can be achieved by changing a buffer index in the flatbuffer serialization to convert a read-only tensor to a read-write one....
CVE-2020-15210
PUBLISHED: 2020-09-25
In tensorflow-lite before versions 1.15.4, 2.0.3, 2.1.2, 2.2.1 and 2.3.1, if a TFLite saved model uses the same tensor as both input and output of an operator, then, depending on the operator, we can observe a segmentation fault or just memory corruption. We have patched the issue in d58c96946b and ...
CVE-2020-15211
PUBLISHED: 2020-09-25
In TensorFlow Lite before versions 1.15.4, 2.0.3, 2.1.2, 2.2.1 and 2.3.1, saved models in the flatbuffer format use a double indexing scheme: a model has a set of subgraphs, each subgraph has a set of operators and each operator has a set of input/output tensors. The flatbuffer format uses indices f...
CVE-2020-15212
PUBLISHED: 2020-09-25
In TensorFlow Lite before versions 2.2.1 and 2.3.1, models using segment sum can trigger writes outside of bounds of heap allocated buffers by inserting negative elements in the segment ids tensor. Users having access to `segment_ids_data` can alter `output_index` and then write to outside of `outpu...