Dark Reading is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them.Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Vulnerabilities / Threats

2/12/2018
01:00 PM
Dan Koloski
Dan Koloski
Commentary
Connect Directly
Twitter
RSS
E-Mail vvv
100%
0%

Better Security Analytics? Clean Up the Data First!

Even the best analytics algorithms using incomplete and unclean data won't yield useful results.

Our industry is losing the cybersecurity war. Not a week goes by in which we don't hear about a new data breach. Overwhelmed security operations center (SOC) personnel, who already were in short supply, are leaving the profession because of sheer exhaustion. The rapid rate of change brought on by DevOps and cloud computing has completely overwhelmed our traditional, rules-based perimeter defense. Sophisticated hacking syndicates and nation-states are coming at us with machines, and we're responding with humans.

The industry's current response to this has been to offer practitioners a dizzying array of shiny, new artificial intelligence (AI)-enabled analytics regimes, each of which claims to have better algorithms than everybody else. Nowhere has this been more pronounced than in standalone user and behavior analytics regimes, but it's undeniable that there has been a rush to add fancy new analytical features to existing, siloed security information and event management tools, intrusion-detection systems, threat feeds, network monitoring, cloud access security brokers, common vulnerabilities and exposures lists, configuration management databases, log tools, and more.

Here's the problem with that approach: even the best analytics algorithms operating against incomplete and unclean data aren't going to yield useful results.

Economists, behavioral scientists, mathematicians, and ethicists often refer to the concept of "imperfect information," in which parties involved in the decision-making process (be it a market, game-theory scenario, or ethical question) do not have equal access to all the information required to make a decision. The concept is important because it is both theoretically and empirically demonstrable that imperfect information leads to bad outcomes; for example: markets don't function as well, game theory doesn't accurately predict what will happen, or one party in a transaction takes advantage of another. The drive toward transparency in many areas of business and life is a direct reflection of the fact that imperfect information is undesirable. Even though truly perfect information may be unachievable, most transactional and behavioral scenarios certainly benefit from the availability of less-imperfect information (or in other words, closer-to-perfect information).

The environment already presents a huge amount of data to the SOC. We have security events, user activity, intrusion detection, threat intelligence, network activity, cloud access, known exploits and vulnerabilities, configuration and IT activity metrics, security and operational logs, identity, and many other sources of data. Each of these sources tends to both emanate from and land in separate data silos. Traditionally, we expected our human SOC operators to be able to work across all of these silos, process all of this data, and turn it into actionable information. That didn't work. SOCs were overwhelmed, exhausted people naturally missed things, we didn't have all the information we needed, and we landed pretty much where we are today.

Now we are expecting that bolted-on, AI-enabled regimes will solve all of our problems. It's true that machines don't get tired and can analyze more data at scale than humans. That's good. But machines can only analyze the data with which they are presented. That means if we apply AI to, say, our user activity data silo, but that data is separated from our configuration information silo, our topology-mapping silo, or our network monitoring silo (you get the idea…), we're back to the imperfect information problem. Fancy analytics against imperfect information still yields decisions you can't entirely trust. If you can't trust the decisions, how can you automate the remediation based on them?

Time for a Fresh Approach
The hard truth is that we need to rethink our data tier. A data tier that perpetuates unconnected silos of data and expects an AI-enabled analytic regime to somehow normalize across them will yield the same "analysis paralysis" that faces human operators: too much uncertainty and too many gray areas to draw a definitive conclusion (and therefore to take action). The common phrase for this is "garbage in, garbage out." The reason that truth is hard is because most SOCs have substantial investment in those siloed data tiers already, and there is natural inertia to consider replacing them.

A better data tier will allow the ingest and normalization of the full operational and security data set as a single data lake that can then be optimized for AI-enabled analysis. I say "operational and security data set" because they are closely related. For example, user activity drives optimization for performance (operations) and hardening (security). Configuration information is critical to resolving performance issues (operations) and vulnerabilities (security). Derived topology and dependency mapping is as equally useful for troubleshooting performance problems as it is for data-loss prevention and attack detection.

Better data tiers exist, but they aren't bolt-ons to existing silos; they are replacements for them. While that may be hard to swallow, we need to adapt to the new reality, and a bolt-on approach won't get us there. Armed with better and cleaner data, an AI-based analytics regime is more able to derive better conclusions, and those conclusions can be used to directly interface with automated remediation, yielding a highly automated cyber-defense regime that is more appropriate for today's threat environment.

Think radically. Your attackers are, I assure you.

Related Content:

 

Black Hat Asia returns to Singapore with hands-on technical Trainings, cutting-edge Briefings, Arsenal open-source tool demonstrations, top-tier solutions and service providers in the Business Hall. Click for information on the conference and to register.

Dan Koloski is a software industry expert with broad experience as both a technologist working on the IT side and as a management executive on the vendor side. Dan is a Vice President in Oracle's Systems Management and Security products group, which produces the Oracle ... View Full Bio
 

Recommended Reading:

Comment  | 
Print  | 
More Insights
Comments
Newest First  |  Oldest First  |  Threaded View
AMoss
50%
50%
AMoss,
User Rank: Guru
3/11/2018 | 2:14:13 PM
data quality & normalization
Absolutely agree.  We should be spending as much time normalizing our operational/config/posture data as we are threat data.  
COVID-19: Latest Security News & Commentary
Dark Reading Staff 11/19/2020
New Proposed DNS Security Features Released
Kelly Jackson Higgins, Executive Editor at Dark Reading,  11/19/2020
How to Identify Cobalt Strike on Your Network
Zohar Buber, Security Analyst,  11/18/2020
Register for Dark Reading Newsletters
White Papers
Video
Cartoon Contest
Write a Caption, Win an Amazon Gift Card! Click Here
Latest Comment: This comment is waiting for review by our moderators.
Current Issue
2021 Top Enterprise IT Trends
We've identified the key trends that are poised to impact the IT landscape in 2021. Find out why they're important and how they will affect you today!
Flash Poll
Twitter Feed
Dark Reading - Bug Report
Bug Report
Enterprise Vulnerabilities
From DHS/US-CERT's National Vulnerability Database
CVE-2020-25159
PUBLISHED: 2020-11-24
499ES EtherNet/IP (ENIP) Adaptor Source Code is vulnerable to a stack-based buffer overflow, which may allow an attacker to send a specially crafted packet that may result in a denial-of-service condition or code execution.
CVE-2020-25654
PUBLISHED: 2020-11-24
An ACL bypass flaw was found in pacemaker before 1.1.24-rc1 and 2.0.5-rc2. An attacker having a local account on the cluster and in the haclient group could use IPC communication with various daemons directly to perform certain tasks that they would be prevented by ACLs from doing if they went throu...
CVE-2020-28329
PUBLISHED: 2020-11-24
Barco wePresent WiPG-1600W firmware includes a hardcoded API account and password that is discoverable by inspecting the firmware image. A malicious actor could use this password to access authenticated, administrative functions in the API. Affected Version(s): 2.5.1.8, 2.5.0.25, 2.5.0.24, 2.4.1.19.
CVE-2020-29053
PUBLISHED: 2020-11-24
HRSALE 2.0.0 allows XSS via the admin/project/projects_calendar set_date parameter.
CVE-2020-25640
PUBLISHED: 2020-11-24
A flaw was discovered in WildFly before 21.0.0.Final where, Resource adapter logs plain text JMS password at warning level on connection error, inserting sensitive information in the log file.