4/10/2018
Joshua Goldfarb
Commentary
20 Ways to Increase the Efficiency of the Incident Response Workflow

Despite all the good intentions of some great security teams, we are still living in a "cut-and-paste" incident management world.

I am a big fan of efficiency. Why do I love efficiency? Mainly because introducing efficiencies into processes saves time and money. There are other benefits as well, such as a decreased chance of human error, improved accuracy, and increased productivity.

Unfortunately, in the incident response world, the overall state of inefficiency still reigns supreme. Despite the good intentions of some great security teams, we are still largely living in a "cut-and-paste" incident management world. I know quite a few talented teams in good organizations that struggle to introduce efficiencies into their incident response process.

That's not to say that there aren't a lot of people talking about introducing efficiencies into incident response. But somehow all that talk hasn't resulted in a lot of change on the ground. There are likely a number of different reasons. Part of me wonders if there is a gap in understanding where organizations end up sinking time on manual incident management tasks. Perhaps it would help to enumerate areas where organizations are likely begging for efficiencies in their incident response workflow. Here are 20:

Image Credit: DuMont Television/Rosen Studios. Public domain, via Wikimedia Commons.

  1. Intelligent mapping between alerts/events and tickets/work queue. A security organization may work with billions of events, tens or hundreds of thousands of alerts, and hundreds of tickets on a given day. Alerts are typically generated automatically based on logic covering one or more events. One can debate the quality and fidelity of the alerts, but the process is relatively automated. But which alerts make it into a ticket that needs to be worked? Unfortunately, that process is far less well-defined and, for the most part, intensely manual.
  2. Pre-emptive prioritization. It has always amazed me that we wait until our alerts get into our work queue before thinking about prioritization. That means that our teams need to comb through tens of thousands of data points that add little to no value to our security postures in order to get to the data points that do add value. Why not think about the risks and threats we face and look to prioritize at the beginning of the content development process, before a single alert finds its way to the work queue?
  3. Front-loading analytics. Why run analytics over a mass of data whose context and meaning we know very little about? Why not run analytics strategically at the beginning of the content development process to produce higher-quality alerts and more contextually aware, meaningful data to send to the work queue?
  4. User identification. We all need to identify the user when looking into an alert. So, why do we continue to do this step manually?
  5. Asset identification. See No. 4.
  6. Vet the alert. Chances are that we check the same five or 10 things when vetting most alerts. We might even follow a written procedure instructing us how to vet a given family of alerts. So why do we not automate much of this work?
  7. Understand the alert. Once we vet an alert, we need to gain at least a basic understanding of what is going on. That typically involves reviewing the alert, along with additional supporting evidence. Why not pull in that supporting evidence automatically?
  8. Extract IOCs. Chances are that if you're investigating something involving malicious code or a malicious link, you will want to extract the indicators of compromise for a variety of reasons. In 2018, don't you wish you didn't have to perform this step manually?
  9. Build the narrative. Decisions require context and understanding. So, as you build the narrative around the alert you're investigating, wouldn't you prefer to have much of the manual work done for you automatically so that you can focus on analysis and incident response?
  10. Analyze. Ah, analysis. Quite possibly your favorite part of the entire workflow. So, why are you still cutting and pasting into and out of Excel? Can't we do better than that?
  11. Identify the infection/intrusion vector. As one result of all of our analysis, we'll want to identify gaps in our security posture and close them. Chances are that once we identify any gaps, we will need to log in to one or more entirely separate systems to take any action toward closing those gaps.
  12. Pivot. Once we have isolated one or more hosts that are behaving oddly, we will likely want to pivot to study what those very hosts have been up to recently. Yup, you guessed it. That likely involves cutting and pasting, along with setting up additional queries.
  13. Look for related activity. Once we have a decent understanding of what we're working with, another type of pivot that needs to happen is one that will enable us to look for similar types of activity elsewhere. I know you won't be surprised when I tell you that we are once again looking at a lot of cutting and pasting, alongside additional queries.
  14. Identify/fill gaps in alerting. In the event that we missed something important, we will need to understand why and address the gap in alerting. Of course, we will need to drive this process ourselves. Wouldn't it be nice if our tooling could suggest how we might identify something we missed more proactively in the future?
  15. Identify root cause. After any incident, it's important to understand what the root cause was. But that is a very manual process. Wouldn't it be nice to have some assistance here?
  16. Improve security posture. Say I discover a new set of malicious domains or something analogous. I might want to block them, sinkhole them, or take some other action. Manually, of course.
  17. Include everything in the ticket. I once worked for a boss who said, "If it isn't written down, it didn't happen." He was absolutely right. But why does recording everything in the incident ticket have to involve so much cutting and pasting?
  18. Report. Large or serious incidents typically involve a post-incident report. If I already recorded all of the important details in my incident ticket, why do I need to redo all that work to put together a respectable report that I can be proud to share with management, executives, and other stakeholders?
  19. Communicate. Clear, concise, and timely communication can make all the difference in handling an incident. So, why do I find myself cutting and pasting into emails instead of pulling automatically from the system out of which I'm running the incident?
  20. Extract lessons learned. No security program is perfect. We can always take lessons learned from anything we work with. Wouldn't it be great to have a little help from our tooling?
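To make steps like No. 4, No. 5, and No. 7 concrete, here is a minimal sketch of automated enrichment: attaching user and asset context to an alert before an analyst ever sees it. The lookup tables stand in for whatever identity and CMDB sources an organization has; all names and fields here are hypothetical.

```python
# Hypothetical identity and asset sources; in practice these would be
# queries against a directory service and a CMDB, not in-memory dicts.
USER_BY_IP = {"10.0.0.5": "jdoe"}
ASSET_BY_IP = {"10.0.0.5": {"hostname": "fin-laptop-12", "owner_dept": "Finance"}}

def enrich_alert(alert):
    """Return a copy of the alert with user and asset context attached,
    so the analyst starts from an enriched record instead of raw fields."""
    ip = alert.get("src_ip")
    enriched = dict(alert)
    enriched["user"] = USER_BY_IP.get(ip, "unknown")
    enriched["asset"] = ASSET_BY_IP.get(ip, {})
    return enriched
```

The point is not the two-line lookup; it is that the lookup happens once, in code, rather than once per alert, per analyst, per shift.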
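No. 6 is equally automatable: if the vetting procedure for a family of alerts is already written down, it can be encoded as small checks that run before a human looks at the ticket. The check names and signature values below are made up for illustration.

```python
# Each function encodes one line of a (hypothetical) written vetting
# procedure for a given alert family.
def check_internal_source(alert):
    # Procedure step: confirm the source is on our internal 10.x range.
    return alert.get("src_ip", "").startswith("10.")

def check_known_signature(alert):
    # Procedure step: confirm the signature is one we have content for.
    return alert.get("signature") in {"ET-12345", "ET-67890"}

VETTING_CHECKS = [check_internal_source, check_known_signature]

def vet_alert(alert):
    """Run every check; return an overall verdict plus per-check results
    the analyst can review instead of re-deriving."""
    results = {check.__name__: check(alert) for check in VETTING_CHECKS}
    return all(results.values()), results
```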
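The manual step No. 8 laments can be sketched in a few lines. The patterns below cover a few common indicator types and are illustrative rather than exhaustive; a production extractor would also handle defanged indicators and a fuller TLD list.

```python
import re

# Illustrative indicator patterns -- not exhaustive, and not from any
# specific product.
IOC_PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "domain": re.compile(r"\b(?:[a-z0-9-]+\.)+(?:com|net|org|info|io)\b", re.I),
}

def extract_iocs(text):
    """Return a dict of indicator type -> sorted unique matches found in
    the alert body, so nobody has to copy them out by hand."""
    return {kind: sorted(set(p.findall(text))) for kind, p in IOC_PATTERNS.items()}
```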
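The pivots in No. 12 and No. 13 are another place where the cutting and pasting can be generated instead. The query syntax below is a generic illustration, not any particular SIEM's language.

```python
# Hypothetical pivot templates: given one indicator, produce every query
# an analyst would otherwise assemble by hand in separate consoles.
PIVOT_QUERIES = [
    'src_ip="{ioc}" OR dest_ip="{ioc}"',
    'dns_query="{ioc}"',
    'file_hash="{ioc}"',
]

def build_pivots(ioc):
    """Return the ready-to-run pivot queries for a single indicator."""
    return [query.format(ioc=ioc) for query in PIVOT_QUERIES]
```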
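Finally, No. 17 and No. 18 fit together: if the ticket already holds the facts in structured fields, the post-incident report can be rendered from it rather than re-typed. The ticket schema below is hypothetical.

```python
# A report template rendered straight from (hypothetical) ticket fields,
# so the report is a projection of the ticket, not a second copy of it.
REPORT_TEMPLATE = """Incident {id}: {title}
Severity: {severity}
Root cause: {root_cause}
Actions taken:
{actions}"""

def render_report(ticket):
    """Render a post-incident report from a structured ticket dict."""
    actions = "\n".join(f"- {action}" for action in ticket["actions"])
    fields = {k: v for k, v in ticket.items() if k != "actions"}
    return REPORT_TEMPLATE.format(actions=actions, **fields)
```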


Josh (Twitter: @ananalytical) is an experienced information security leader with broad experience building and running Security Operations Centers (SOCs). Josh is currently co-founder and chief product officer at IDRRA and also serves as security advisor to ExtraHop.