Data Visualization: Keeping an Eye on Security
Visualization can be one of the most powerful approaches a security team can use to make sense of vast quantities of data. So why does it end up as an afterthought?
Have you ever recognized someone but had trouble recalling their name? Or perhaps you've felt as if you had met someone before but couldn't place where? It turns out that this phenomenon is quite common. It has to do with the way our brains are wired, and with how we process visual and non-visual information differently.
While I'm not a scientist, I do know from experience that the human eye can often identify visual patterns quite quickly. As an example of this, consider a bar graph with one marked outlier. If you look at the bar graph, you will likely identify this outlier fairly quickly. But what if I gave you all the empirical data in table form? It would likely take you far longer to identify the outlier, right?
There is a lesson in here for security, but not the one you might think. Visualization is often something on an organization's to-do list, and for good reason. Visualization is one of the most powerful approaches a security team can use to help make sense of vast quantities of data. But more often than not, organizations struggle to get the value out of visualization that they had hoped for. Instead of becoming one of the key tools for the security team, visualization often ends up as an afterthought relegated to a few monitors on the fringe of the Security Operations Center.
Why is this? To better understand what is going on here, we first need to take a step back and think about what we are trying to accomplish with visualization. In this context, visualization is essentially being used as an analytic. In other words, the human eye is being used as an analytical tool to better understand the data it is looking at, and to try to identify patterns or outliers in it. So what is causing the disconnect between the desired outcome and the reality of the matter?
As you might already know, analytics work best when focused on answering specific questions, or addressing specific use cases. For example, I'm sure you can appreciate the difference between trying to use analytics to find "something interesting" versus "privileged accounts that appear to be compromised." And therein lies the reason that most visualization efforts are so underwhelming. They are simply not aimed towards answering any particular question or addressing any particular use case.
What do I mean by this? Think about how most organizations approach visualization. Generally, these organizations take a bunch of raw, unprocessed data and represent it in any number of graph types (time series, scatter plots, bar graphs, and so on). There is no focus here at all! If I were to ask these organizations the simple question "What are you looking to find with this visualization?", they would most likely have no answer. Not surprisingly, the results of these visualization attempts almost always disappoint.
What's missing from this approach to visualization are the right questions. Questions force us to pause and think about what we’re actually trying to accomplish. As an example, think about a case where we are interested in looking for callbacks to potential command and control sites that may not yet be online. When a system is infected with malicious code, it often calls back to a command and control infrastructure seeking further instructions. Sometimes, the command and control infrastructure is not yet online, or the attacker wants the malicious code to "sleep" for a period of time before activating it. If we look for this type of activity, we can sometimes identify malicious callback domains that may not yet be widely known (and thus will not match any known signature or intelligence source).
We will likely want to go to our DNS data for this example. Further, we need to filter the data to look for domain requests that return no answer over a period of time (say, the last 24 hours). Lastly, we'll want to aggregate, by domain name, a count of the number of requests matching these criteria. If we visualize the data that results from asking this question, we will likely have a wildly different visualization experience.
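The filter-and-aggregate step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (timestamp, queried domain, number of answers returned), the function name count_unanswered, and the sample domains are all hypothetical, and real DNS logs will need their own parsing.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def count_unanswered(records, now, window=timedelta(hours=24)):
    """Count, per domain, the DNS requests in the window that returned no answer.

    Each record is assumed to be a tuple of
    (timestamp, queried_domain, answer_count). This layout is hypothetical.
    """
    cutoff = now - window
    counts = Counter()
    for timestamp, domain, answer_count in records:
        in_window = cutoff <= timestamp <= now
        if in_window and answer_count == 0:
            # Normalize case and any trailing dot so variants aggregate together.
            counts[domain.lower().rstrip(".")] += 1
    return counts


# Made-up sample data, for illustration only.
now = datetime(2017, 1, 2, 12, 0, tzinfo=timezone.utc)
records = [
    (now - timedelta(hours=1), "updates.example.com", 2),      # answered: ignored
    (now - timedelta(hours=2), "exmaple.com", 0),              # likely a typo
    (now - timedelta(hours=3), "c2-sleeper.example.net", 0),
    (now - timedelta(hours=4), "C2-Sleeper.example.net.", 0),  # same domain, different form
    (now - timedelta(hours=5), "c2-sleeper.example.net", 0),
    (now - timedelta(hours=30), "c2-sleeper.example.net", 0),  # outside the 24-hour window
]

counts = count_unanswered(records, now)
for domain, count in counts.most_common():
    print(domain, count)
```

The point of the sketch is that the question ("which domains are being requested repeatedly without ever resolving?") determines the filter and the grouping before any graph is drawn.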
Let's say we order by count descending and use a bar graph to visualize the data set. We may have some instances of a small number of requests for a given domain that return no answer. These could be mistyped domain names, or perhaps some type of a misconfiguration. But if we have infected systems exhibiting this type of behavior, we will likely see a higher number of requests for one or more domain names that return no answer. Our human eye will be treated to something it can process quite easily and use to identify outliers very quickly.
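To make the ordering and the bar graph concrete, here is a minimal sketch that renders per-domain counts as a text bar chart, sorted by count descending. The function name bar_chart, the domain names, and the counts are invented for illustration; a plotting library could stand in for the text rendering.

```python
def bar_chart(counts, width=40):
    """Return text bar-chart lines for a {domain: count} mapping, largest first."""
    if not counts:
        return []
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    top = ordered[0][1]
    label_width = max(len(domain) for domain, _ in ordered)
    lines = []
    for domain, count in ordered:
        # Scale each bar against the largest count; always draw at least one mark.
        bar = "#" * max(1, round(count * width / top))
        lines.append(f"{domain.ljust(label_width)}  {bar} {count}")
    return lines


# Made-up counts of unanswered requests per domain, for illustration only.
sample = {
    "exmaple.com": 2,                # probably a mistyped name
    "printer.corp.example": 3,       # probably a misconfiguration
    "c2-sleeper.example.net": 140,   # the kind of outlier worth investigating
    "intranet-old.example": 1,
}

chart = bar_chart(sample)
print("\n".join(chart))
```

With data shaped like this, the one long bar stands out immediately against the short ones, which is exactly the job we want the eye to do.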
So you see, it's all in how you interrogate the data. Visualization can be a powerful tool, but you have to know how to use it properly. When looking to leverage visualization, it is helpful to first ask yourself the question "What am I looking for?" The answer to this question can guide you to interrogate the data using a variety of queries and pivots to get it into a state where the actual visualization can be successfully leveraged.
Get the picture?
Josh (Twitter: @ananalytical) is an experienced information security leader with broad experience building and running Security Operations Centers (SOCs). Josh is currently co-founder and chief product officer at IDRRA and also serves as security advisor to ExtraHop. Prior to ...