Breaking the Code: The Role of Visualization in Security Research
In today's interconnected, data-rich IT environments, passive inspection of information is not enough. The human retina can transmit visual information to the brain at roughly the rate of an Ethernet connection, while reading text transmits information at roughly the rate of a dial-up modem.
Relying on text alone to present data therefore has drawbacks, especially in security research, which depends on monitoring and analyzing large-scale, constantly evolving data sets. Smart data visualization combined with intelligent data mining, by contrast, lets researchers draw connections between data points even in loosely related data, without the gradual reading of text files otherwise needed to reach the same results. Visualization can also surface observations and conclusions that may not be obvious in text.
The security field offers many uses for the visualization of loosely related data. Alerts from firewalls, intrusion detection and prevention systems (IDS/IPS), and malware infections could, for instance, be visualized to expose a malicious actor’s previously unrecognized activity patterns. By processing and analyzing very large log files, data visualization can help summarize and simplify the current state of a complex IT system in an accurate and elegant fashion.
The process
To get from data to visualization, semantic networks are key. Also called frame networks, semantic networks can represent any desired relationship between any defined concepts or entities, and can be applied to nearly any problem.
Such networks consist of nodes (also called vertices) that represent the entities being examined, and edges (the connections between the nodes) that describe the relationships between the entities. A semantic network representing a company’s IT environment might consist of nodes that represent various types of servers and environments (HTTP, Mail, NTP, SSH ...), and edges that specify relationships and their attributes (Channels, Ports, Traffic, Bandwidth, etc.).
When creating a semantic network, it is up to the user to define the entities and relationships. Taken together, the nodes and edges of a semantic network are called its domain and represent the model of the underlying information.
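As a rough illustration of such a model, here is a minimal Python sketch of a semantic network held as a dictionary of nodes and a list of edges, each carrying free-form attributes. The class name, the host names ("web-01", "mail-01", "bastion") and the attribute keys are hypothetical, chosen only to mirror the IT-environment example; a real model would be shaped by the data at hand.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticNetwork:
    """Illustrative container: nodes are entities, edges are relationships."""
    nodes: dict = field(default_factory=dict)   # name -> attribute dict
    edges: list = field(default_factory=list)   # (source, target, attribute dict)

    def add_node(self, name, **attributes):
        self.nodes[name] = attributes

    def add_edge(self, source, target, **attributes):
        # Refuse relationships between entities the model has not defined.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be defined as nodes first")
        self.edges.append((source, target, attributes))

    def neighbors(self, name):
        """Names of nodes directly connected to `name`, in either direction."""
        found = set()
        for source, target, _ in self.edges:
            if source == name:
                found.add(target)
            elif target == name:
                found.add(source)
        return found


# Hypothetical servers and traffic relationships.
network = SemanticNetwork()
network.add_node("web-01", kind="HTTP")
network.add_node("mail-01", kind="Mail")
network.add_node("bastion", kind="SSH")
network.add_edge("bastion", "web-01", relation="Traffic", port=22)
network.add_edge("web-01", "mail-01", relation="Traffic", port=25)
```

Keeping attributes as plain keyword arguments leaves the choice of entities and relationships entirely to the person building the model, which is the point the text makes.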
Of course, there is more than one way to model any given problem, but it is always best to approach the problem with the available data in mind. When a model has been decided upon, the source data should be parsed so as to populate a relational data set that follows the model.
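The parsing step might look something like the following sketch. It assumes a made-up, whitespace-separated firewall log format (timestamp, action, source, destination, port); real log formats differ, and both the sample lines and the function name `parse_edges` are illustrative only.

```python
from collections import Counter

# Hypothetical log lines; the five-field format is an assumption.
SAMPLE_LOG = """\
2015-03-01T10:00:00 ALLOW 10.0.0.5 10.0.0.9 443
2015-03-01T10:00:02 DENY 10.0.0.7 10.0.0.9 22
2015-03-01T10:00:05 ALLOW 10.0.0.5 10.0.0.9 443
malformed line
"""


def parse_edges(log_text):
    """Count each (source, destination, port, action) relationship in a log."""
    edges = Counter()
    for line in log_text.splitlines():
        fields = line.split()
        # Skip lines that do not fit the assumed five-field format.
        if len(fields) != 5 or not fields[4].isdigit():
            continue
        _timestamp, action, source, destination, port = fields
        edges[(source, destination, int(port), action)] += 1
    return edges


edges = parse_edges(SAMPLE_LOG)

# The node set falls out of the edges: every host seen at either end.
nodes = {host for source, destination, _, _ in edges
         for host in (source, destination)}
```

Here each distinct relationship becomes one edge, and the count of matching log lines becomes an attribute of that edge, which could later drive something like edge thickness in the visualization.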
Data-driven layouts
With the model and the data in hand, the next logical step is to derive insights from the shape of the resulting semantic network. A common method is to use force-directed layouts, where the data drives its own layout.
To get results, the semantic model is treated as a particle physics experiment. Each node is treated as a particle, and the relationships between nodes are treated as attracting or repelling forces: connected nodes attract each other, and unconnected nodes repel each other.
Many physics variables can be used to control the movement of the nodes (gravity, charge, mass, temperature, etc.) and bring the forces on the nodes into equilibrium. The result is usually a molecule-like layout where relational clusters are aggregated in the same areas.
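To make the idea concrete, here is a deliberately small 2D sketch in plain Python. It uses a common simplification rather than any particular engine's method: every pair of nodes repels like equal charges, and each edge additionally pulls its endpoints together like a spring. The parameter names (`repulsion`, `attraction`, `step`, `max_move`), their default values, and the circular starting positions are all assumptions made for illustration.

```python
import math
from itertools import combinations


def force_layout(nodes, edges, iterations=300, repulsion=0.05,
                 attraction=0.1, step=0.05, max_move=0.1):
    """Return {node: (x, y)} after a simple force-directed relaxation."""
    # Assumption: start on a unit circle so that no two nodes coincide.
    pos = {}
    for i, n in enumerate(nodes):
        angle = 2 * math.pi * i / len(nodes)
        pos[n] = [math.cos(angle), math.sin(angle)]

    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}

        # Every pair of nodes repels, with strength falling off as 1/d^2.
        for a, b in combinations(nodes, 2):
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            push = repulsion / dist ** 2
            force[a][0] += push * dx / dist
            force[a][1] += push * dy / dist
            force[b][0] -= push * dx / dist
            force[b][1] -= push * dy / dist

        # Each edge pulls its two endpoints together, like a linear spring.
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            force[a][0] += attraction * dx
            force[a][1] += attraction * dy
            force[b][0] -= attraction * dx
            force[b][1] -= attraction * dy

        # Move each node along its net force; the cap on displacement is a
        # crude stand-in for the "temperature" control mentioned above.
        for n in nodes:
            mx, my = step * force[n][0], step * force[n][1]
            length = math.hypot(mx, my)
            if length > max_move:
                mx, my = mx * max_move / length, my * max_move / length
            pos[n][0] += mx
            pos[n][1] += my

    return {n: tuple(p) for n, p in pos.items()}


# Two triangles with no edges between them (hypothetical data).
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f")]
layout = force_layout(nodes, edges)
```

With this input, the two triangles should settle into two separate tight groups, since nothing holds them together while repulsion pushes them apart. The all-pairs loop is quadratic in the number of nodes, so a sketch like this only suits small graphs; large-scale engines rely on spatial approximations or GPU computation.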
The general concept is relatively simple: by implementing a physics engine, we can transform relational data, however loosely related, into a 2D or 3D structure (a visualization). Since the structure is defined by the relationships in the data, previously unnoticed clusters or patterns can effectively highlight themselves. Consider the following example:
This image represents a graph of all email communication inside a company. All the nodes represent employees and the connections signify that an email was sent between them. This visualization instantly exposes three conditions: First, three main central clusters can be identified. This could
Thibault Reuille is a security researcher at OpenDNS and creator of OpenGraphiti, an open-source 3D data visualization engine. Prior to OpenDNS, he was a software engineer for Nvidia, where he helped develop the Nvidia Parallel Nsight integrated development environment for ...