
Data Transparency Hasn't Made Us Safer Yet. Can It Uncover Breach Causality?

Advanced machine learning models within an XDR framework could uncover what actually causes breaches, but first we need better data transparency.

Company leaders worldwide are making huge investments to improve security, but they're still awaiting a big return. According to Gartner, global spending on information security and risk management is expected to top $150 billion this year. One survey found that respondents spend an average of $2.7 million per year on security engineering, yet only 51% rated those efforts effective or very effective.

One reason is that we're still looking backward at vulnerabilities we missed instead of taking a more proactive approach. Our best methodology today is to cast very wide nets that aim to catch technical anomalies or breaks in patterns. There's no feedback loop to make sure the alerts the security operations center (SOC) receives will actually stop resources from becoming compromised.

To get there, we need to understand what causes a breach. We already do this for vulnerabilities, which are direct causes of exploits, but no such parallel exists for users or networks. Verizon's "Data Breach Investigations Report" (DBIR) is the only model that takes a stab at what causes breaches, and it's still a statistical guess.

Data sharing hasn't produced any solid answers on causality yet. Companies get nothing in return for joining threat-sharing agreements or contributing the data that comes with them. With risk-based machine learning models, however, we could give something back: every data submission makes the model better, which makes everybody safer.

Why Data Sharing Hasn't Worked
Breach data is rare, and causal breach data is rarer still. It can contain personally identifiable information (PII), or it can be proprietary and reveal too much about a target, so in those cases it's never made public. We need a broker to mediate that exchange so we can tie telemetry data to a cause.

We have learned from the past 20 years of security that companies won't voluntarily submit their incident data. Some are reluctant to share because disclosure shines a negative light on them, forcing a mea culpa and an admission of exactly what they couldn't stop. In some cases, legal liability keeps them from disclosing anything: they're often fighting legal battles claiming they weren't negligent. This leaves the data that informs our models without enough context.

To improve data transparency, we need to look not at the victims, but at their insurers and vendors. When organizations try to get their insurance companies to pay out for a breach, the scenario is often too complex for the data we need to surface. Looking from the other side of the coin, data from insurers tells us what they're actually paying out for. These are the scenarios CEOs care about most, and the signals present in those breaches would help define a causal relationship. We need that data in aggregate, collected passively and anonymized.
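One way a broker could collect that data passively is to hash away the victim's identity before aggregating. The sketch below is purely illustrative, assuming hypothetical insurer payout records with an organization name and a root-cause label; only the aggregate cause counts ever leave the broker.

```python
import hashlib
from collections import Counter

def anonymize(org_id: str, salt: str) -> str:
    """One-way salted hash so the broker can deduplicate claims
    without retaining or publishing the victim's identity."""
    return hashlib.sha256((salt + org_id).encode()).hexdigest()[:12]

# Hypothetical payout records: (organization, root cause the insurer paid out for)
claims = [
    ("acme-corp", "phishing"),
    ("acme-corp", "phishing"),   # duplicate filing by the same org
    ("globex", "unpatched-vpn"),
    ("initech", "phishing"),
]

SALT = "per-broker-secret"  # assumption: a secret held only by the broker
seen = {(anonymize(org, SALT), cause) for org, cause in claims}  # dedupe per org
by_cause = Counter(cause for _, cause in seen)

print(by_cause.most_common())  # aggregate causes only, no org names
```

The anonymized counts are the artifact worth sharing: they say which causes insurers actually pay for, without exposing who was breached.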

How Machine Learning Can Help
If we had better data — quality, not quantity — we could start saving SOC teams time by building probabilistic models that get closer to showing causality. SOC teams are increasingly bogged down, sifting through noise to find the alerts that matter most, and analysts are leaving despite higher pay.
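In its simplest form, such a probabilistic model is just conditional frequency: given labeled incidents, estimate how often each signal preceded a confirmed breach, then rank live alerts by that probability. The sketch below uses made-up signal names and a toy labeled set to show the shape of the idea, not a production model.

```python
from collections import Counter

# Hypothetical labeled incidents: the signals observed, and whether a
# confirmed breach followed (1) or the alert turned out to be noise (0).
incidents = [
    ({"impossible-travel", "new-device"}, 1),
    ({"impossible-travel"}, 1),
    ({"failed-login-burst"}, 0),
    ({"new-device"}, 0),
    ({"failed-login-burst", "new-device"}, 0),
]

signal_total = Counter()
signal_breach = Counter()
for signals, breached in incidents:
    for s in signals:
        signal_total[s] += 1
        signal_breach[s] += breached

def breach_probability(signal: str) -> float:
    """Estimate P(breach | signal), with Laplace smoothing so
    unseen signals get 0.5 rather than a divide-by-zero."""
    return (signal_breach[signal] + 1) / (signal_total[signal] + 2)

# Rank a batch of live alerts so the SOC triages likely breaches first
alerts = ["failed-login-burst", "impossible-travel", "new-device"]
ranked = sorted(alerts, key=breach_probability, reverse=True)
print(ranked)
```

Every new labeled incident sharpens the counts, which is the feedback loop the article argues is missing: submitted data improves the model, and the improved ranking pays contributors back in saved triage time.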

Machine learning has already produced suggestive models in scenarios where the stakes are much lower, like what else we'd like to watch on Netflix, which credit cards would benefit us most as consumers, or which accounts we should follow on social media. The same techniques could soon save time for SOC analysts.

Right now, the best we have isn't good enough. Just look at the detection concept called an indicator of compromise (IoC). Even the name is an admission that there is no certainty, yet companies sell IoC data because, for buyers trying to determine causality, that guess is at least something.
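The limits of that guess are easy to see in how IoC data is actually used: an exact match against a feed of known-bad indicators. The sketch below, with invented indicator values and field names, shows why a hit means "possibly compromised" and never explains why the breach happened.

```python
# Hypothetical IoC feed: known-bad indicators by type, with no causal context
ioc_feed = {
    "ip": {"203.0.113.7", "198.51.100.23"},
    "domain": {"malicious.example"},
}

# Hypothetical telemetry events from the environment
events = [
    {"type": "ip", "value": "203.0.113.7", "host": "db-01"},
    {"type": "ip", "value": "192.0.2.10", "host": "web-03"},
    {"type": "domain", "value": "update.example", "host": "web-03"},
]

# A "hit" is nothing more than set membership
hits = [e for e in events if e["value"] in ioc_feed.get(e["type"], set())]
print(hits)
```

The match tells you an indicator was seen on `db-01`, but not how the attacker got in, what they did, or which control failed, which is exactly the causal context the article argues we're missing.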

XDR's Influence on Decision-Making
As machine learning supports decision-making, we're also seeing an evolution from endpoint detection and response (EDR) to extended detection and response (XDR). Gartner describes the latter as a "SaaS-based, vendor-specific, security threat detection and incident response tool that natively integrates multiple security products into a cohesive security operations system that unifies all licensed components."

Although there is no standard framework for XDR, it provides the technology that allows us to centralize data and extend the telemetry to get closer to finding causation. The promise of XDR is that it analyzes endpoints, networks, servers, clouds, SIEM, email, and more, contextualizing attacker behavior to drive meaningful action.

Marrying more comprehensive data to the prospect of what XDR can accomplish is our best bet for being able to show causality in real time, saving SOC teams time and solving one of the biggest pain points in cybersecurity.