The New Secret Weapon in Breach Detection: Math and Data Science
It's time for organizations across industries to use math and data science to assess the probabilities of a breach. Here's how.
August 24, 2021
Source: WrightStudio via Adobe Stock
The days of looking at log files to find security breaches are long gone. Don't get me wrong: log files are still useful. They are vital to confirming a breach and its cause, and necessary for forensics and remediation workflows. But manually sifting through logs to identify trouble is a waste of time when data volumes grow exponentially, seemingly by the hour. The problem is compounded by the complexity, interconnectedness, and opacity of the digital supply chain needed to deliver modern services.
For many of us, sitting through high school and college math courses, such as calculus, spurred a common question: "When am I ever going to use this in real life?" But for those of us who found our way into the information security world, the answer to that question is "now."
It's time for organizations across industries to take a page from the financial services book and use math and data science to assess the probabilities of a breach. Specifically, security teams can leverage time series data to build mathematical models that describe user behavior, and then look for anomalies and assign a probability that something is wrong.
Here are some of the elements and basic concepts of math and data science that organizations can use to improve their breach detection:
Derivatives. The word "derivative" may sound fancy, but here it simply means the rate of change of a quantity with respect to time. For our purposes, a sudden increase in the number of authentication failures per unit of time (per hour, per day, and so on) is a derivative worth watching. For example, if authentication failures jump from five or ten per day to 100 or more, it's a sign that a breach is being attempted (best case) or has already happened (worst case). The point is to watch how fast the count is changing, not just the count itself.
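As a minimal sketch of the rate-of-change idea, the snippet below computes the day-over-day change in authentication failures and flags any day whose increase meets a threshold. The function names, the sample counts, and the threshold of 50 are assumptions made for illustration, not values from any particular product.

```python
from typing import List, Tuple


def daily_rate_of_change(counts: List[int]) -> List[int]:
    """Discrete derivative: the change in count from one day to the next."""
    return [curr - prev for prev, curr in zip(counts, counts[1:])]


def flag_spikes(counts: List[int], threshold: int) -> List[Tuple[int, int]]:
    """Return (day_index, increase) for each day whose increase over the
    previous day is at least `threshold`."""
    deltas = daily_rate_of_change(counts)
    # deltas[i] is the change going into day i + 1
    return [(i + 1, d) for i, d in enumerate(deltas) if d >= threshold]


# Hypothetical authentication failures per day (illustrative data only).
failures_per_day = [5, 8, 6, 10, 7, 100, 120]

# Assumed threshold; a real one would be tuned to the environment.
spikes = flag_spikes(failures_per_day, threshold=50)
print(spikes)  # [(5, 93)]
```

Note that day 6 (120 failures) is not flagged here because its increase over day 5 is only 20. That is the design trade-off of watching the derivative alone; in practice it would likely be paired with a check on the absolute level.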
Mathematical models. Another useful concept in our field is building mathematical models of asset behavior. For example, think of a software-as-a-service product or platform as an asset. How can we establish a baseline of normal behavior that can then be used to spot anomalies? If you use GitHub as a code repository, you might model it by monitoring metrics over time for a set of critical operations, such as "clone," "merge," "delete," "add user," or "generate access token."
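One simple way to sketch such a baseline is to record daily counts per operation, compute the mean and standard deviation of each, and score new observations by how many standard deviations they sit from the mean (a z-score). This is only one possible model, chosen here for brevity. The operation counts, the function names, and the cutoff of 3.0 are all hypothetical, and nothing below calls a real GitHub API.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def build_baseline(history: Dict[str, List[int]]) -> Dict[str, Tuple[float, float]]:
    """Learn (mean, sample standard deviation) of daily counts per operation."""
    return {op: (mean(counts), stdev(counts)) for op, counts in history.items()}


def anomaly_scores(
    baseline: Dict[str, Tuple[float, float]], today: Dict[str, int]
) -> Dict[str, float]:
    """Z-score of today's count for each operation in the baseline."""
    scores = {}
    for op, (mu, sigma) in baseline.items():
        observed = today.get(op, 0)
        if sigma == 0:
            # No historical variation: any deviation at all is maximally unusual.
            scores[op] = 0.0 if observed == mu else float("inf")
        else:
            scores[op] = (observed - mu) / sigma
    return scores


# Hypothetical daily counts of critical operations (illustrative data only).
history = {
    "clone": [20, 22, 19, 21, 18],
    "delete": [1, 0, 2, 1, 1],
    "generate access token": [0, 1, 0, 0, 1],
}
today = {"clone": 21, "delete": 14, "generate access token": 1}

baseline = build_baseline(history)
scores = anomaly_scores(baseline, today)

# Assumed cutoff: flag anything more than 3 standard deviations from normal.
flagged = [op for op, z in scores.items() if abs(z) > 3.0]
print(flagged)  # ['delete']
```

A z-score assumes the counts are roughly bell-shaped, which small event counts often are not, so a production model would probably need something more robust. The shape of the workflow stays the same: learn the norm, then measure distance from it.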
Cardinality. These models can also incorporate cardinality, the number of elements in a set. One example is the number of distinct known devices a user logs in from: a change in that count, like a change in the quantity of specific critical operations, is a possible indicator of compromise. But before we can detect a change, we first need to "learn" the normal value. As a basic example, say the number of devices the CEO uses to log in is almost always three per day: a phone, a tablet, and a laptop. If that number grows to four or five, it could be that the CEO started working on a new device or two (still worth confirming). But if it jumps significantly, there is a high probability of a breach.
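The CEO example can be sketched by counting distinct device identifiers per day and comparing that count with a learned baseline. The event format, the device names, the baseline of three, and the tolerance of two are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def distinct_devices_per_day(events: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Cardinality of the set of device IDs seen on each day.

    Each event is a (day, device_id) pair; repeat logins from the same
    device on the same day count once.
    """
    devices = defaultdict(set)
    for day, device_id in events:
        devices[day].add(device_id)
    return {day: len(ids) for day, ids in devices.items()}


def flag_days(cardinality: Dict[str, int], baseline: int, tolerance: int) -> List[str]:
    """Days on which the device count exceeds baseline + tolerance."""
    return [day for day, n in sorted(cardinality.items()) if n > baseline + tolerance]


# Hypothetical login events for one user (illustrative data only).
events = [
    ("2021-08-01", "phone"), ("2021-08-01", "tablet"),
    ("2021-08-01", "laptop"), ("2021-08-01", "phone"),
    ("2021-08-02", "phone"), ("2021-08-02", "tablet"),
    ("2021-08-02", "laptop"),
    ("2021-08-03", "phone"),
]
# Nine logins from previously unseen devices on the third day.
events += [("2021-08-03", f"unknown-{i}") for i in range(9)]

counts = distinct_devices_per_day(events)
print(counts)  # {'2021-08-01': 3, '2021-08-02': 3, '2021-08-03': 10}

# Assumed baseline of 3 devices, tolerating up to 2 new ones before alerting.
print(flag_days(counts, baseline=3, tolerance=2))  # ['2021-08-03']
```

The tolerance reflects the article's distinction between a plausible new device or two, which is worth confirming but not alarming, and a significant jump.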
Many organizations and security teams still do breach detection the old-fashioned way, collecting logs from anywhere and everywhere and searching them for patterns or regular expressions, but that is clearly not adequate. Again, logs are still useful for forensics. But to limit the window of exposure and shorten time-to-detection so that remediation can start sooner, combining time series data with math and data science principles is proving to be extremely valuable.
About the Author(s)
CISO at InfluxData
As the Chief Information Security Officer (CISO) at InfluxData, Peter Albert is responsible for ensuring the security of InfluxData's information systems and services. With more than 30 years of experience in the security, technology, and telecommunications industries, Peter brings tremendous technical leadership and operational expertise to the company.
Prior to joining InfluxData, Peter spent 3 years at IOActive, a premier, boutique security consultancy, where he advised various Global 1000 companies on their security program. Before that, he was responsible for managing global operations and expansion of the QualysGuard global SaaS infrastructure, overseeing its worldwide security operation centers (SOCs). He has also held various leadership positions in architecture, engineering, and operations with iPass Inc. and General Magic.
Having grown up in Silicon Valley, Peter joined his first start-up at age 16 managing databases.