Dark Reading is part of the Informa Tech Division of Informa PLC


Threat Intelligence

10/7/2016
10:00 AM
Nik Whitfield
Commentary

Data Science & Security: Overcoming The Communication Challenge

Data scientists face a tricky task: taking raw data and making it meaningful for both security and business teams. Here's how to bridge the gap.

Today, CISOs and their teams field a wide range of questions about risk from different types of stakeholders. Many of these questions require security professionals to analyze raw data from multiple sources, then communicate insight about impact, exposure, or priorities in terms that are meaningful to people who are not security pros. Doing so is hard: it means understanding the raw data and analyzing it to produce accurate information that is useful in a particular person's decision-making context. This is a skill in itself, and one that data scientists are uniquely placed to provide.

Security's Analysis and Communication Challenge
CISOs often face questions from business or governance, risk management, and compliance stakeholders that operational tools can't answer. This is either because tools are designed to meet a single operational security need rather than correlate data to answer a business risk question, or because tools are designed to "find bad" and detect when something goes wrong rather than enumerate risk. 

As a result, someone in the security team eventually must extract raw data from a technology "Frankenstack," put it into an analysis tool (spreadsheets by default), and then torture the data for answers to questions that inevitably get more complex over time. This is all before working out how best to communicate the output of data analysis to clearly answer "So what?" and "What now?"

How Data Science Can Help
Asking questions of raw data from one source, let alone multiple sources, isn't easy.

First, you have to understand the data that your security tools put out and any quirks that exist in it (in timestamps and field names, for example). In data science, data preparation is one of the most important stages of producing insight. It involves understanding what questions a data set can answer, identifying the limits of the data set (that is, what information is missing or invalid), and looking at other data sets that can improve the completeness of the analysis where a single data set is not sufficient.
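As a rough illustration of this preparation step, the Python sketch below reconciles two data sets that describe the same hosts with different field names and timestamp formats, then reports what is missing. The tool names, field names, and records are all hypothetical and invented for this example; real security tools will have their own quirks.

```python
from datetime import datetime, timezone

# Hypothetical raw records from two tools. Field names, formats, and
# values are assumptions made up for illustration only.
SCANNER_ROWS = [
    {"Host": "web-01", "LastSeen": "2016-10-07T10:00:00Z"},
    {"Host": "db-01", "LastSeen": ""},  # timestamp missing in the export
]
AGENT_ROWS = [
    {"hostname": "WEB-01", "last_seen_epoch": 1475834400},
    {"hostname": "app-02", "last_seen_epoch": 1475838000},
]


def parse_iso_utc(text):
    """Parse an ISO 8601 'Z' timestamp; return None if empty or invalid."""
    if not text:
        return None
    try:
        parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return None
    return parsed.replace(tzinfo=timezone.utc)


def normalize_scanner(row):
    """Map a scanner record onto the shared schema (host, last_seen)."""
    return {
        "host": row["Host"].lower(),
        "last_seen": parse_iso_utc(row.get("LastSeen", "")),
    }


def normalize_agent(row):
    """Map an agent record (epoch seconds) onto the shared schema."""
    return {
        "host": row["hostname"].lower(),
        "last_seen": datetime.fromtimestamp(row["last_seen_epoch"], tz=timezone.utc),
    }


def coverage_report(scanner_rows, agent_rows):
    """Summarize the limits of the data: gaps within and between sources."""
    scanner = [normalize_scanner(r) for r in scanner_rows]
    agent = [normalize_agent(r) for r in agent_rows]
    scanner_hosts = {r["host"] for r in scanner}
    agent_hosts = {r["host"] for r in agent}
    return {
        "missing_timestamp": sorted(r["host"] for r in scanner if r["last_seen"] is None),
        "only_in_scanner": sorted(scanner_hosts - agent_hosts),
        "only_in_agent": sorted(agent_hosts - scanner_hosts),
    }


report = coverage_report(SCANNER_ROWS, AGENT_ROWS)
print(report)
```

Normalizing both sources onto one schema before any analysis makes the gaps explicit, so they can be stated as caveats rather than discovered later.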

Then comes the job of selecting the most appropriate analysis method to answer the question at hand. Data scientists have a spectrum of methods to choose from, each suited to extracting different information from data. Data science as a discipline weighs multiple factors to deliver the most meaningful information in the time available, all with appropriate caveats. What is the current state of knowledge on this topic? What does the consumer of the analysis want to know? The answers set the bar for the complexity of analysis required to learn something new. For example, if a data set hasn't been analyzed before, simple stats can provide valuable insight quickly. Then there's the inevitable trade-off between speed to results and precision. Based on all this, the best analysis method could be anything from simple counts to a machine learning algorithm.
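To make the "simple counts" end of that spectrum concrete, here is a minimal Python sketch using only the standard library. The findings list and its field names are hypothetical; the point is only that a first pass over a data set that has never been analyzed can be a few lines long.

```python
from collections import Counter

# Hypothetical vulnerability findings, invented for illustration.
FINDINGS = [
    {"host": "web-01", "severity": "critical"},
    {"host": "web-01", "severity": "low"},
    {"host": "db-01", "severity": "critical"},
    {"host": "app-02", "severity": "medium"},
    {"host": "db-01", "severity": "critical"},
]


def count_by(records, field):
    """Count how many records share each value of the given field."""
    return Counter(record[field] for record in records)


severity_counts = count_by(FINDINGS, "severity")
host_counts = count_by(FINDINGS, "host")

# Most frequent values first: a quick first look at the data.
print(severity_counts.most_common())
print(host_counts.most_common())
```

If counts like these already answer the question, a more complex method may not be worth the extra time.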

Finally comes communication. What view of the data does a decision maker need? For example, the view of vulnerability data that a CISO needs for a strategic quarterly meeting differs from the one a vulnerability manager needs to prioritize what to fix at a tactical level. While these views will be built from the same raw data, the summary for each requires different caveats, because as you summarize, you inevitably exclude details.
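The idea of building different views from the same raw data can be sketched in Python as follows. The records, the severity ranking, and the choice of what each role sees are assumptions made for illustration, not a recommended scoring scheme.

```python
from collections import Counter

# Hypothetical findings; IDs, hosts, and attributes are invented.
FINDINGS = [
    {"id": "F-4", "host": "web-01", "severity": "low", "internet_facing": True},
    {"id": "F-2", "host": "db-01", "severity": "critical", "internet_facing": False},
    {"id": "F-3", "host": "app-02", "severity": "medium", "internet_facing": True},
    {"id": "F-1", "host": "web-01", "severity": "critical", "internet_facing": True},
]

# Assumed ordering of severities, highest first when sorted descending.
SEVERITY_RANK = {"critical": 3, "high": 2, "medium": 1, "low": 0}


def strategic_summary(findings):
    """Strategic view: a few aggregate numbers plus an explicit caveat."""
    total = len(findings)
    by_severity = Counter(f["severity"] for f in findings)
    return {
        "total_findings": total,
        "hosts_affected": len({f["host"] for f in findings}),
        "critical_share": by_severity["critical"] / total if total else 0.0,
        "caveat": "Aggregates hide which hosts are affected and how exposed they are.",
    }


def tactical_queue(findings):
    """Tactical view: every finding, ordered by what to fix first."""
    return sorted(
        findings,
        key=lambda f: (-SEVERITY_RANK[f["severity"]], not f["internet_facing"], f["id"]),
    )


summary = strategic_summary(FINDINGS)
queue_ids = [f["id"] for f in tactical_queue(FINDINGS)]
print(summary)
print(queue_ids)
```

The summary drops per-host detail, which is why it carries its own caveat, while the queue keeps every record but offers no overall picture.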

Merging Data Science and Domain Expertise
Data scientists can't, and shouldn't, work in a silo away from the security team. Far more value is gained by combining their expertise in understanding, analyzing, and communicating data with the domain expertise of security professionals who understand the problem and the questions that need answering.

As more security departments start working with data scientists, here are three key factors to bear in mind:

  1. Time: Understanding multiple data sets, applying the most relevant analysis techniques to them, and delivering meaningful insights based on the question that needs answering won't happen overnight. It takes time.
  2. Domain expertise: There will be gaps in knowledge between your data scientist and your security team. Working in close partnership is critical. Just as you're getting used to the constraints the data scientist has discovered in the data you have, so too is your data scientist coming to grips with new and usually complex log formats in an effort to see what's possible.
  3. The needs of your consumers: Communicating and visualizing insight from data requires different analysis for different roles. The CISO, control manager, IT operations, and C-suite all have different needs, and your data scientist must learn about these roles to strike the right balance between conclusions and caveats for each one.


Nik Whitfield is the founder and CEO at Panaseer. He founded the company with the mission to make organizations cybersecurity risk-intelligent. His team created the Panaseer Platform to automate the breadth and depth of visibility required to take control of ...
 
