
01:00 PM
Dan Koloski

Better Security Analytics? Clean Up the Data First!

Even the best analytics algorithms using incomplete and unclean data won't yield useful results.

Our industry is losing the cybersecurity war. Not a week goes by in which we don't hear about a new data breach. Overwhelmed security operations center (SOC) personnel, who already were in short supply, are leaving the profession because of sheer exhaustion. The rapid rate of change brought on by DevOps and cloud computing has completely overwhelmed our traditional, rules-based perimeter defense. Sophisticated hacking syndicates and nation-states are coming at us with machines, and we're responding with humans.

The industry's current response to this has been to offer practitioners a dizzying array of shiny, new artificial intelligence (AI)-enabled analytics regimes, each of which claims to have better algorithms than everybody else. Nowhere has this been more pronounced than in standalone user and behavior analytics regimes, but it's undeniable that there has been a rush to add fancy new analytical features to existing, siloed security information and event management tools, intrusion-detection systems, threat feeds, network monitoring, cloud access security brokers, common vulnerabilities and exposures lists, configuration management databases, log tools, and more.

Here's the problem with that approach: even the best analytics algorithms operating against incomplete and unclean data aren't going to yield useful results.

Economists, behavioral scientists, mathematicians, and ethicists often refer to the concept of "imperfect information," in which parties involved in the decision-making process (be it a market, game-theory scenario, or ethical question) do not have equal access to all the information required to make a decision. The concept is important because it is both theoretically and empirically demonstrable that imperfect information leads to bad outcomes; for example: markets don't function as well, game theory doesn't accurately predict what will happen, or one party in a transaction takes advantage of another. The drive toward transparency in many areas of business and life is a direct reflection of the fact that imperfect information is undesirable. Even though truly perfect information may be unachievable, most transactional and behavioral scenarios certainly benefit from the availability of less-imperfect information (or in other words, closer-to-perfect information).

The environment already presents a huge amount of data to the SOC. We have security events, user activity, intrusion detection, threat intelligence, network activity, cloud access, known exploits and vulnerabilities, configuration and IT activity metrics, security and operational logs, identity, and many other sources of data. Each of these sources tends to both emanate from and land in separate data silos. Traditionally, we expected our human SOC operators to be able to work across all of these silos, process all of this data, and turn it into actionable information. That didn't work. SOCs were overwhelmed, exhausted people naturally missed things, we didn't have all the information we needed, and we landed pretty much where we are today.

Now we are expecting that bolted-on, AI-enabled regimes will solve all of our problems. It's true that machines don't get tired and can analyze more data at scale than humans. That's good. But machines can only analyze the data with which they are presented. That means if we apply AI to, say, our user activity data silo, but that data is separated from our configuration information silo, our topology-mapping silo, or our network monitoring silo (you get the idea…), we're back to the imperfect information problem. Fancy analytics against imperfect information still yields decisions you can't entirely trust. If you can't trust the decisions, how can you automate the remediation based on them?
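A toy sketch can make the point concrete. The rule, field names, and thresholds below are all hypothetical and exist only for illustration: the same burst of failed logins gets an inconclusive verdict when only the user activity silo is visible, and an actionable one once configuration context is available.

```python
def assess(login, asset_context=None):
    """Return a verdict for a burst of failed logins.

    `login` and `asset_context` are hypothetical dict shapes, and the
    threshold of 10 is an arbitrary value chosen for this illustration.
    """
    if login["failed_attempts"] < 10:
        return "ignore"
    if asset_context is None:
        # The activity silo alone cannot say whether this host matters.
        return "uncertain"
    if asset_context["internet_facing"] and asset_context["root_login_enabled"]:
        return "block"
    return "monitor"


login = {"host": "web01", "failed_attempts": 40}
context = {"internet_facing": True, "root_login_enabled": True}

print(assess(login))           # activity data only
print(assess(login, context))  # activity data plus configuration data
```

Only the second call produces a verdict that could reasonably drive automated remediation.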

Time for a Fresh Approach
The hard truth is that we need to rethink our data tier. A data tier that perpetuates unconnected silos of data and expects an AI-enabled analytic regime to somehow normalize across them will yield the same "analysis paralysis" that faces human operators: too much uncertainty and too many gray areas to draw a definitive conclusion (and therefore to take action). The common phrase for this is "garbage in, garbage out." That truth is hard because most SOCs already have substantial investment in those siloed data tiers, and there is natural inertia against replacing them.

A better data tier will allow the ingest and normalization of the full operational and security data set as a single data lake that can then be optimized for AI-enabled analysis. I say "operational and security data set" because the two are closely related. For example, user activity drives optimization for performance (operations) and hardening (security). Configuration information is critical to resolving performance issues (operations) and vulnerabilities (security). Derived topology and dependency mapping is as useful for troubleshooting performance problems as it is for data-loss prevention and attack detection.
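As a minimal sketch of what "ingest and normalization" could mean in practice, the snippet below maps records from two hypothetical sources (an authentication log and a configuration-change feed) onto one shared schema and merges them into a single timeline. The `Event` schema, the record shapes, and every field name are assumptions made for illustration, not a description of any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """Hypothetical common schema shared by every source."""
    timestamp: datetime  # always timezone-aware UTC
    host: str            # always lowercase
    source: str
    kind: str
    detail: str


def from_auth_log(record: dict) -> Event:
    # Assumed shape: {"ts": epoch seconds, "hostname", "user", "result"}
    return Event(
        timestamp=datetime.fromtimestamp(record["ts"], tz=timezone.utc),
        host=record["hostname"].lower(),
        source="auth",
        kind="login_" + record["result"],
        detail=record["user"],
    )


def from_config_change(record: dict) -> Event:
    # Assumed shape: {"time": ISO 8601 string with UTC offset, "node", "setting"}
    return Event(
        timestamp=datetime.fromisoformat(record["time"]).astimezone(timezone.utc),
        host=record["node"].lower(),
        source="config",
        kind="config_change",
        detail=record["setting"],
    )


def build_timeline(auth_records, config_records):
    """Normalize both sources and return one list ordered by time."""
    events = [from_auth_log(r) for r in auth_records]
    events += [from_config_change(r) for r in config_records]
    return sorted(events, key=lambda e: e.timestamp)


auth = [{"ts": 1520000100, "hostname": "WEB01", "user": "alice", "result": "failure"}]
config = [{"time": "2018-03-01T00:00:00+00:00", "node": "web01",
           "setting": "sshd.PermitRootLogin=yes"}]

for event in build_timeline(auth, config):
    print(event.timestamp.isoformat(), event.host, event.source, event.kind, event.detail)
```

Once timestamps and host names agree, an operational record (the configuration change) and a security record (the failed login) for the same host sit side by side, which is the precondition for analyzing them together.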

Better data tiers exist, but they aren't bolt-ons to existing silos; they are replacements for them. While that may be hard to swallow, we need to adapt to the new reality, and a bolt-on approach won't get us there. Armed with better and cleaner data, an AI-based analytics regime is better able to reach sound conclusions, and those conclusions can feed directly into automated remediation, yielding a highly automated cyber-defense regime that is more appropriate for today's threat environment.

Think radically. Your attackers are, I assure you.


Dan Koloski is a software industry expert with broad experience as both a technologist working on the IT side and as a management executive on the vendor side. Dan is a Vice President in Oracle's Systems Management and Security products group, which produces the Oracle ...
User Rank: Guru
3/11/2018 | 2:14:13 PM
data quality & normalization
Absolutely agree.  We should be spending as much time normalizing our operational/config/posture data as we are threat data.  