Throughout my military career, I had two- and three-star generals ask — no, demand — that our security and operations center have measurable cybersecurity metrics. They’d challenge me with the same gamut of questions: “How do you know we are making a difference? Are we getting any better? How do we calculate our return on investment if we don’t know what to measure?”
I retired from the U.S. Army in 2012. I was never able to answer any of those demands for “good” cybersecurity metrics.
One of the metrics I talked myself out of providing was our number of infected hosts. Is a low number good or bad? If it is low, I am paranoid that I am missing threat activity. If the number is high, there’s a bigger problem at hand. No matter the number, you can never find the denominator (i.e., the actual number of infected hosts).
From there, I considered another metric: number of security events. This caused me concern as well. Most complex environments detect billions of daily security events. It is impossible to characterize all of them as true positives or false positives. Plus, I can’t be sure of the number of dreaded false negatives. How many events evaded detection by our security sensors?
Nothing felt informative or effective.
After I left the military, I finally figured it out. I was fortunate enough to manage an incident response and forensics team. Everything a forensics team does in its investigations is in the context of the Kill Chain. This is the seven-step sequence of events that must occur for a threat actor to achieve their objectives (e.g., steal or destroy data).
While examining the Kill Chain, the idea dawned on me. I could measure the one variable that a threat actor had to have in order to be successful: dwell time in the network. I needed to eliminate or reduce the amount of time they have to complete the Kill Chain. That’s it. If I could limit dwell time, the threat actor would not have what they needed to progress through the Kill Chain.
Dwell time, which is the duration a threat actor has in an environment before they are detected or eliminated by the security team, is something I could measure fairly accurately with a good forensics investigation.
There are a number of well-known dwell time benchmarks that provide a good baseline to measure against. Most of the major annual cybersecurity reports now cite an average dwell time of over 200 days. We can do better. We must do better.
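To make the measurement concrete, here is a minimal Python sketch of how dwell time could be computed from the timeline a forensics investigation produces and compared against the 200-day figure. The incident IDs, timestamps, and the names `INCIDENTS`, `dwell_days`, and `summarize` are all hypothetical and exist only for illustration; a real program would pull these values from its own case management records.

```python
from datetime import datetime
from statistics import mean, median

# Hypothetical incident records (illustrative values, not real data):
# (incident id, estimated initial compromise, eradication)
INCIDENTS = [
    ("IR-001", datetime(2023, 1, 10), datetime(2023, 3, 1)),
    ("IR-002", datetime(2023, 2, 5), datetime(2023, 2, 19)),
    ("IR-003", datetime(2023, 4, 1), datetime(2023, 4, 4)),
]

# The "over 200 days" average cited by the annual reports.
BENCHMARK_DAYS = 200


def dwell_days(compromised_at, eradicated_at):
    """Return dwell time in days between compromise and eradication."""
    if eradicated_at < compromised_at:
        raise ValueError("eradication cannot precede compromise")
    return (eradicated_at - compromised_at).total_seconds() / 86400


def summarize(incidents):
    """Summarize dwell time across a list of incident records."""
    values = [dwell_days(start, end) for _, start, end in incidents]
    return {
        "mean": mean(values),
        "median": median(values),
        "max": max(values),
        "below_benchmark": all(v < BENCHMARK_DAYS for v in values),
    }


if __name__ == "__main__":
    for name, value in summarize(INCIDENTS).items():
        print(f"{name}: {value}")
```

Tracking the median alongside the mean is a deliberate choice in this sketch: a single long-running intrusion can skew the average, while the median shows what a typical incident looks like.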
With this renewed focus, I centered my security strategy around reducing dwell time by:
- Leveraging hardened CIS server builds
- Building an aggressive patching program focused on the most likely targeted servers in our data centers
- Using on-access scans for anti-malware tools
- Integrating traffic shaping at the edge, with IP reputation management, to remove the noise for network intrusion detection and Layer 7 inspection
- Deploying a ‘zero-trust’ model when provisioning servers (i.e., only ports and protocols required for operation are open)
- Leveraging a SIEM with strong event correlation
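The ‘zero-trust’ provisioning item in the list above lends itself to a simple automated check. The following Python sketch compares the ports observed open on a server against an allowlist for its role. The role names, port numbers, and the names `REQUIRED_PORTS` and `audit_open_ports` are assumptions made for illustration; the observed ports would come from whatever scanner or inventory tool is already in use.

```python
# Hypothetical allowlist: the only (protocol, port) pairs each server role
# needs in order to operate. Roles and ports are illustrative assumptions.
REQUIRED_PORTS = {
    "web": {("tcp", 443)},
    "db": {("tcp", 5432)},
}


def audit_open_ports(role, open_ports):
    """Compare observed open ports with the allowlist for a server role.

    An unknown role has an empty allowlist, so every open port on it is
    reported as unexpected (default deny).
    """
    allowed = REQUIRED_PORTS.get(role, set())
    observed = set(open_ports)
    return {
        "unexpected": sorted(observed - allowed),  # open but not required
        "missing": sorted(allowed - observed),     # required but not open
    }


if __name__ == "__main__":
    # Example: a web server that also exposes SSH.
    print(audit_open_ports("web", [("tcp", 443), ("tcp", 22)]))
```

Anything reported under `unexpected` is a candidate for closing before the server goes into production.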
Dwell time is my obsession. Through diligence and careful process, we continue to see this number drop in our customer environments. This change in thinking rallies the team around one standard (the amount of time from initial compromise to detection and eradication) that is quantifiable and can be used to calculate the effectiveness of a security strategy and overall posture.
No metric is perfect. But any other approach has too many unknowns that will overrun you with false positives. Until a new standard is found, dwell time will continue to be my obsession.