Handling Threat Intelligence Across Billions of Data Points

Graph databases can play a role in threat intelligence and unraveling sprawling data.

Sherman Ye, co-founder, CEO, vesoft

October 11, 2021

Most large, well-known organizations face constant cybersecurity threats, which is why threat intelligence is arguably important enough to warrant its own team. But threat intelligence involves many factors that, more than ever, demand a newer, more sophisticated approach. It begins with figuring out how data can best be used to fight security threats.

Threat intelligence has many facets. The entities involved are diverse and can include websites, apps, back-office systems, user accounts, and many other entry or access points. These systems can all have complex associations and relationships, not just with each other but also over time. For large organizations, the amount of data that can be collected is practically unlimited.

In fact, these data sets can contain billions to trillions of combinations of data points. Viewed in isolation, these data entities can be meaningless, but understanding how they relate can be highly revealing. As a result, a graph database is ideal for unraveling the mystery of sprawling data.

Why a Graph Database?
A common relational database degrades in performance as the volume of data grows. In particular, it performs even more poorly when handling relational operations over complicated data. Simply put, relational databases are outdated and not built for the task of traversing billions or more data points or of relating those data points to each other.

As a result, graph databases, though they have been around for a while, have recently grown in popularity. Threat intelligence happens to be an ideal use case for a graph database. Graph databases are specifically built to uncover relationships between data and between data sets, not just to retrieve data. How they work can get complex. The important takeaway is that graph databases differ from relational databases because they store deep relationship characteristics about the data within the data itself.
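To make the idea of following relationships concrete, here is a minimal sketch in plain Python rather than in any particular graph database. The entity names and the `within_hops` helper are hypothetical and exist only for illustration. It stores each entity's direct relationships next to the entity and walks them breadth-first to find everything within a given number of hops.

```python
from collections import deque

# Hypothetical toy data: each key is an entity, and each value lists the
# entities it is directly related to.
GRAPH = {
    "account:alice": ["device:phone-1"],
    "device:phone-1": ["account:alice", "ip:203.0.113.7"],
    "ip:203.0.113.7": ["device:phone-1", "domain:bad.example"],
    "domain:bad.example": ["ip:203.0.113.7"],
}


def within_hops(graph, start, max_hops):
    """Return {entity: hop_count} for every entity reachable within max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return seen
```

Calling `within_hops(GRAPH, "account:alice", 3)` links the account to a suspicious domain three hops away, a connection that is not visible from any single record. A production graph database would run this kind of traversal over indexed, distributed storage rather than an in-memory dictionary.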

Basic Data Capability Needs
Central to a graph database solution is the ability to write and query data quickly. An organization such as a government agency or multinational company might have billions or more data points, so it might require a database that supports offline batch importing of the data generated each day. Tens of billions of relationship records might be generated daily, and this data needs to be written to the database within hours so the system is ready again for the next day.

Next, the graph database should ideally support online, real-time queries, with results returned within milliseconds. Filtering capabilities are also essential. For example, a data scientist will most likely need to query vertices and edges by property.

So, essentially, the graph database should support writing data both in real time and offline, as well as querying graph data online. These basics are fundamental for big data analytics involving large-scale threat intelligence.
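A quick back-of-the-envelope calculation shows what this import requirement implies. The figures below (10 billion records and a 4-hour window) are assumptions chosen only to illustrate the arithmetic, and `required_write_rate` is a hypothetical helper, not part of any product.

```python
def required_write_rate(records_per_day, import_window_hours):
    """Sustained records per second needed to finish the daily batch in the window."""
    if import_window_hours <= 0:
        raise ValueError("import window must be positive")
    return records_per_day / (import_window_hours * 3600)


if __name__ == "__main__":
    # Assumed workload: 10 billion records to load within 4 hours.
    rate = required_write_rate(10_000_000_000, 4)
    print(f"{rate:,.0f} records/second")
```

Under these assumed numbers, the cluster would need to sustain roughly 694,000 writes per second for the full window, which is why batch import throughput deserves as much attention as query latency.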
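As a rough illustration of what querying vertices and edges by property means, the sketch below models a tiny property graph in plain Python. The `Vertex` and `Edge` classes, the property names, and `filter_by_props` are all hypothetical. A real graph database would answer such a query from an index rather than by scanning every item.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    vid: str
    tag: str  # kind of entity, e.g. "ip" or "account"
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    src: str
    dst: str
    edge_type: str
    props: dict = field(default_factory=dict)


def filter_by_props(items, **wanted):
    """Return the vertices or edges whose properties match every wanted key/value."""
    return [
        item
        for item in items
        if all(item.props.get(key) == value for key, value in wanted.items())
    ]
```

For example, `filter_by_props(vertices, risk="high")` would return only the vertices whose `risk` property equals `"high"`, and the same function works unchanged on a list of edges.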

Modularity in Graph Databases
Another important factor to consider when figuring out graph database structure is how much data will need to be handled. As mentioned, a large organization, especially one with numerous assets where data points are captured or stored, usually generates tens of billions or even a trillion graph data entities.

Separating the compute engine from the storage engine is ideal. Each can then be scaled and managed independently. Scalability support adds convenience and can enable redundancy. DevOps teams might also need to consider whether they want the ability to scale their clusters online without stopping service in a production environment.
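The following is a deliberately simplified sketch of that separation, not a description of how any specific product implements it. `StorageCluster` and `QueryNode` are hypothetical names, and the sketch leaves out replication, rebalancing, and networking entirely. The point it illustrates is that query nodes hold no data, so more of them can be added without moving anything in storage.

```python
import zlib


class StorageCluster:
    """Storage layer: owns the data, split across a fixed number of partitions."""

    def __init__(self, num_partitions):
        self.partitions = [dict() for _ in range(num_partitions)]

    def _partition_for(self, vid):
        # crc32 gives a stable hash across runs, unlike Python's built-in hash() for str.
        return zlib.crc32(vid.encode("utf-8")) % len(self.partitions)

    def put(self, vid, props):
        self.partitions[self._partition_for(vid)][vid] = props

    def get(self, vid):
        return self.partitions[self._partition_for(vid)].get(vid)


class QueryNode:
    """Compute layer: stateless, it only forwards requests to the storage layer."""

    def __init__(self, storage):
        self.storage = storage

    def lookup(self, vid):
        return self.storage.get(vid)
```

Because every `QueryNode` reads from the same `StorageCluster`, a second query node created during peak load returns the same answers as the first, which is the property that makes independent scaling of the two layers possible.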

Basics of Graphing for Threat Intelligence
If we break down threat opportunities and protection points into their network layers, we can begin to define how graph models can help. For example, a bottom layer might consist of file hash values as a threat point, with file storage and transport as the corresponding point of defense. Next up might be the IP address or domain name as a threat point, with the network layer as its point of defense. We can continue up to a mobile phone number and its user as a threat point, with authentication of the user and device as points of defense.

At each of these layers, a hacker and a cybersecurity response team are normally in adversarial roles, and this can be used to begin defining the model. Traditionally, there was no good way to link these layers through any specific relationship. With a graph database, using vertices, edges, and properties, this becomes possible. We can form a three-dimensional hierarchical network to understand attack methods, tools used, and more.

For example, a connected device involves a network layer, a device layer, an account layer, and a user layer. At each of these layers, the device has its own identifier. With the help of the graph database, we can complete a three-dimensional risk recognition for this device.

Relationships between an account and a device should be weighted. For example, if an account usually uses a device, we can conclude the account is strongly linked with the device, so the weight of the relationship should be higher. Conversely, if an account uses a device to commit criminal activity, it can mean the account is weakly linked with the device, so the weight of this relationship is lower.

Such edges should not have only weight properties. They should also have time properties. This way you can better correlate account usage with devices at the times they are typically used versus atypical times.
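One way to picture this layered view is sketched below in plain Python. The layer names come from the paragraph above, while the identifiers, the `build_device_graph` and `flagged_layers` helpers, and the idea of checking against a set of known-bad identifiers are assumptions made for illustration.

```python
LAYER_ORDER = ["network", "device", "account", "user"]


def build_device_graph(identifiers):
    """Turn {layer: identifier} into vertices plus edges linking adjacent layers."""
    vertices = [(layer, identifiers[layer]) for layer in LAYER_ORDER]
    edges = list(zip(vertices, vertices[1:]))
    return vertices, edges


def flagged_layers(vertices, known_bad):
    """Return the layers whose identifier appears in a set of known-bad identifiers."""
    return [layer for layer, identifier in vertices if identifier in known_bad]


# Hypothetical identifiers for one connected device.
DEVICE = {
    "network": "203.0.113.7",
    "device": "imei-0001",
    "account": "alice",
    "user": "user-42",
}
```

With this structure, `flagged_layers(build_device_graph(DEVICE)[0], {"203.0.113.7"})` reports that the network layer is suspicious, and the edges make it possible to walk from that layer to the account and user behind it.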
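A small sketch can show how weight and time properties might sit on the same edge. The weighting rule used here (the share of an account's sessions that occur on a given device) is an assumption for illustration, not a method prescribed by the article, and `build_usage_edges` and `is_unusual_hour` are hypothetical helpers.

```python
from collections import Counter, defaultdict
from datetime import datetime


def build_usage_edges(sessions):
    """Build account-to-device edges from (account, device, timestamp) sessions.

    Each edge carries a weight (share of the account's sessions on that device)
    and a time property (how many sessions occurred in each hour of the day).
    """
    per_account = Counter()
    per_pair = Counter()
    hours = defaultdict(Counter)
    for account, device, ts in sessions:
        per_account[account] += 1
        per_pair[(account, device)] += 1
        hours[(account, device)][ts.hour] += 1
    return {
        pair: {"weight": count / per_account[pair[0]], "hours": dict(hours[pair])}
        for pair, count in per_pair.items()
    }


def is_unusual_hour(edge, ts):
    """True if the edge has never been active during the hour of this timestamp."""
    return ts.hour not in edge["hours"]


# Hypothetical sessions: three on a phone in the morning, one on a laptop at night.
SESSIONS = [
    ("alice", "phone-1", datetime(2021, 10, 4, 9, 5)),
    ("alice", "phone-1", datetime(2021, 10, 5, 9, 30)),
    ("alice", "phone-1", datetime(2021, 10, 6, 9, 45)),
    ("alice", "laptop-2", datetime(2021, 10, 6, 2, 15)),
]
```

Here the phone edge ends up with a weight of 0.75 and the laptop edge with 0.25, and a phone login at 3 a.m. would be reported as unusual because that edge has only been active during the 9 a.m. hour.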

Getting Started with Graph Databases
As illustrated, interrelating data sets to make big data meaningful is complex. This is true across applications, from threat intelligence to real-time recommendations. However, seasoned programmers can easily get started. Open source graph database projects are available to test the waters.

About the Author(s)

Sherman Ye

co-founder, CEO, vesoft

Sherman Ye is co-founder and CEO of vesoft. Ye previously worked at Facebook and Ant Financial where he led graph database efforts. He also open sourced the distributed graph database Nebula Graph in 2019.
