Dark Reading is part of the Informa Tech Division of Informa PLC


Analytics | Commentary
8/29/2017 02:00 PM
Nik Whitfield

Security Analytics: Making the Leap from Data Lake to Meaningful Insight

Once you've got a lake full of data, it's essential that your analysis isn't left stranded on the shore.

Second of a two-part series.

Lots of technology and security teams, particularly in finance, are running data lake projects to advance their data analytics capabilities. Their goal is to extract meaningful, timely insights from data, so that security leaders, control managers, IT, and security operations can make effective decisions using information from their live environment.

During the four phases of a data lake project (build data lake; ingest data; do analysis; deliver insight), the hurdles to success are different. In the first two phases, it's easy for a data lake to become a data swamp. In the last two, it's easy to have a lake full of data that delivers a poor return on investment. Here are three steps to avoid that happening.

Step 1: Clean up messy analysis workflows from the get-go.
Security teams know the frustrations of ad hoc data analysis efforts only too well. For example, a risk question from an executive is given to an analyst, who collects data from whatever parts of the technology "frankenstack" he or she can get access to. This ends up in spreadsheets and visualization tools. Then, after days or weeks of battling with hard-to-link, unwieldy data sets, the results of best-effort analysis are sent off in a slide deck. If this doesn't answer the questions "So what?" and "What now?" the cycle repeats.

Automating the ingestion of data from the many and varied security-relevant technologies in an enterprise environment into a data lake, and then simply replicating a process like the one above, means analysis efforts may start sooner. But if the data isn't structured so it's easy to understand and interact with, it doesn't get any easier to deliver meaningful output.

To avoid this, look at ways to optimize your data analysis workflow. Consider everything from how you ingest, store, and model data to how you scope questions with stakeholders and set their expectations about "speed to answer." Build a framework for iterating analysis, and be clinical about quickly proving or disproving the ROI a data set can contribute toward the stakeholder-requested insight. Also think about how to build a knowledge base of which relationships in which data sets are valuable, and which aren't. It's important that analysts' experience isn't trapped in their heads and that teams can avoid repeating analysis that eats up time and money without delivering value.
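As a concrete illustration, that knowledge base can start as something very lightweight. The sketch below (Python; the source names, join keys, and findings are hypothetical) records which data-set joins proved valuable and which were dead ends, so later analyses can skip work the team has already disproven:

```python
from dataclasses import dataclass, field


@dataclass
class JoinFinding:
    """Record of whether linking two data sets produced useful insight."""
    left_source: str
    right_source: str
    join_keys: tuple
    valuable: bool
    notes: str = ""


@dataclass
class AnalysisKnowledgeBase:
    """Shared log of proven and disproven data-set relationships."""
    findings: list = field(default_factory=list)

    def record(self, finding: JoinFinding) -> None:
        self.findings.append(finding)

    def proven_joins(self) -> list:
        """Relationships that delivered value -- candidates for reuse."""
        return [f for f in self.findings if f.valuable]

    def known_dead_ends(self) -> list:
        """Joins already shown not to deliver value -- skip these."""
        return [f for f in self.findings if not f.valuable]


# Hypothetical findings from two past analysis iterations.
kb = AnalysisKnowledgeBase()
kb.record(JoinFinding("vuln_scanner", "cmdb", ("hostname",), True,
                      "Asset-owner enrichment worked after lowercasing hostnames"))
kb.record(JoinFinding("proxy_logs", "hr_feed", ("user_id",), False,
                      "Only a small fraction of proxy events carried a resolvable user_id"))

print(len(kb.proven_joins()), len(kb.known_dead_ends()))  # 1 1
```

Even a registry this simple turns individual analysts' experience into a shared asset the whole team can query before starting a new piece of analysis.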

Step 2: Buy time for the hard work that needs doing up front.
There's often a lot of complexity involved in correlating data sets from different technologies that secure an end-to-end business process. When data analytics teams first start working with data from the diverse security and IT solutions that are in place, they need to do a lot of learning to make sure the insight they present to decision makers is robust and has the necessary caveats.

This learning is on three levels: first, understanding the data; second, implementing the right analysis for the insight required; and third, finding the best way to communicate the insight to relevant stakeholders so they can make decisions.

The item with the biggest lead time is "understanding the data." Even if security teams interact regularly with a technology's user interface, getting to grips with the raw data it generates can be a hard task. Sometimes there's little documentation about how the data is structured, and it can be difficult to interpret and understand the relevance of information from just looking at the labels. As a result, data analytics teams usually need to spend a long time answering questions such as: What do the fields in each data source mean? How does technology configuration influence the data available? Across data sources, how does the format of information referring to the same thing differ? And what quirks can occur in the data that need to be accounted for?
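These questions lend themselves to lightweight profiling before any serious analysis. The sketch below (Python; the tool names, field names, and records are all hypothetical) shows the kind of groundwork involved: profiling which fields each source exposes, then normalizing one common quirk, hostnames recorded in different formats, so two sources can be joined:

```python
import re

# Hypothetical raw records from two security tools that refer to the same
# asset in different formats -- a common quirk when correlating sources.
edr_events = [{"host": "WEB01.CORP.EXAMPLE.COM", "severity": "High"}]
scanner_findings = [{"asset": "web01", "cvss": 9.8}]


def profile(records):
    """Report which fields exist in a source and the value types seen."""
    fields = {}
    for rec in records:
        for key, val in rec.items():
            fields.setdefault(key, set()).add(type(val).__name__)
    return fields


def normalise_host(value: str) -> str:
    """Lowercase and strip the domain suffix so hosts match across sources."""
    return re.sub(r"\..*$", "", value.strip().lower())


print(profile(edr_events))        # {'host': {'str'}, 'severity': {'str'}}
print(profile(scanner_findings))  # {'asset': {'str'}, 'cvss': {'float'}}

# Join the two sources on the normalised host name.
matches = [
    (e, s) for e in edr_events for s in scanner_findings
    if normalise_host(e["host"]) == normalise_host(s["asset"])
]
print(len(matches))  # 1
```

In practice each quirk discovered this way (case differences, stripped domains, missing fields) belongs in the team's documentation of the data source, so the next analyst doesn't rediscover it.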

It's critical to set expectations with budget holders and executives about how important this is and the time it will take. This isn't just because understanding and modeling data is a key enabler of delivering speed to insight in the long term. It's also because it's easy to create "data skeptics" when pressure for fast results leads to rushed analysis that delivers incorrect information.

Step 3: Plan to scale from the start for long-term success.
Providing data-driven risk and security insights for executives and operations teams usually means answering the following questions: What's our status? Is it good or bad? If it's bad, why? Do we act or gather more information? If we act, what are our best cost actions?

Once data analytics teams start providing high-value insights that answer these questions — for example, "What are our best cost options to achieve large reductions in risk due to vulnerability exposure?" — the next challenge is scaling the team's capability to answer more (and usually more difficult) questions.

At this point, the number of people on your data analytics team can become a bottleneck to servicing requests for insight. And when other departments like IT, audit, and compliance see the benefits from data-driven decisions, the volume of requests for insight can increase exponentially.

To avoid this, think about how to enable people who aren't data analytics experts to interact with data so they can "self-serve" insights. Who are your different stakeholders? What insights and metrics do they need to run their business processes? How will they need to explore the relevant dimensions of the data to answer their questions and diagnose issues? Mapping a workflow of "these stakeholders need this insight from this data" helps answer these questions and lets you identify data sets that inform decisions for multiple stakeholders. In turn, this helps you focus efforts on understanding high-value data sets as early as possible.
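One way to make that mapping concrete is a simple "stakeholder, insight, data sets" structure that can then be queried for the data sets serving multiple stakeholders. A minimal sketch, with hypothetical stakeholders, insights, and source names:

```python
from collections import Counter

# Hypothetical "stakeholder -> insight -> data sets" map.
insight_map = {
    "CISO": {
        "vulnerability exposure trend": ["vuln_scanner", "cmdb"],
        "control coverage": ["edr_agent_inventory", "cmdb"],
    },
    "IT Operations": {
        "patch backlog by owner": ["vuln_scanner", "cmdb", "ticketing"],
    },
    "Audit": {
        "unmanaged asset count": ["cmdb", "network_discovery"],
    },
}

# Count how many distinct stakeholders each data set serves. Data sets that
# serve several stakeholders are worth understanding and modelling first.
usage = Counter()
for stakeholder, insights in insight_map.items():
    sources = {src for datasets in insights.values() for src in datasets}
    usage.update(sources)

priority = [src for src, n in usage.most_common() if n > 1]
print(priority)  # ['cmdb', 'vuln_scanner']
```

The output ranks shared data sets first, which is exactly the prioritization signal the mapping exercise is meant to produce.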


Nik Whitfield is the founder and CEO at Panaseer. He founded the company with the mission to make organizations cybersecurity risk-intelligent. His team created the Panaseer Platform to automate the breadth and depth of visibility required to take control of ...