Dark Reading is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them.Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Endpoint

10/26/2017
02:30 PM
Connect Directly
LinkedIn
RSS
E-Mail vvv
100%
0%

Why Data Breach Stats Get It Wrong

It's not the size of the stolen data dump that is important. It's the window between the date of the breach and the date of discovery that represents the biggest threat.

Earlier this month, Yahoo announced that it had drastically underestimated the impact of the data breach it reported in 2016. In December 2016, the company reported that, in addition to its previous breach of 500 million accounts, an additional 1 billion accounts had been compromised in a separate breach. Now, it believes that all of its accounts were compromised — affecting over 3 billion users.

How did Yahoo get it so wrong? And what do these revised breach numbers mean? To understand this, we need to examine the dismal science behind calculating the impact of data breaches.

In one out of four cases, third parties discover the breach, typically after being affected by it or seeing the data distributed in darknets. In other cases, internal investigations discover anomalous behavior, such as a user accessing a database he shouldn't. Either way, it can be difficult to determine how much data was stolen. When a breach is discovered by third parties, it represents only a sample of exposed data, and the attacker may have accessed additional data.

In breaches found by internal probes, seeing that an attacker accessed one file or database does not mean other resources weren't accessed using methods that would appear differently to a forensic investigator (for example, logging in as one user to access one file, while using a different account to access another). There is a great deal of detective work — and estimation — required to describe the scope of a breach.

The round numbers that Yahoo provided illustrate this perfectly. Few consumer companies have sets of user information stored in neatly defined buckets of 500 million and 1 billion. At first, Yahoo likely found evidence to suggest certain data was accessed and had to extrapolate estimates from there. But there is always an element of estimation, or to put it another way, guesswork. Yahoo's latest statement reads: "the company recently obtained new intelligence and now believes [...] that all Yahoo user accounts were affected by the August 2013 theft." Companies are compelled to make, and revise, educated guesses at each stage to demonstrate control of the situation, and to be as transparent and responsible as possible. So it's not surprising that these figures often grow.

The Lag on Lag Time
Another challenge that affects a company's ability to get these figures right is lag time. The breaches reported by Yahoo happened years earlier. There was also a significant lag between breach detection and public notice with Equifax — nearly six weeks. Sophisticated attackers are not only adept at finding system vulnerabilities, but they typically have plenty of time to access data carefully and hide their tracks. Add in several years of buffer time for a company's own available analytics to atrophy for useful forensic purposes (e.g., regularly cleared log files) and detecting unauthorized access and estimating damage becomes even more difficult.

In many ways, lag time is the most critical problem. The days, weeks, and years that pass before breaches are discovered (if they are discovered at all), gives attackers all the time they need to extract full value from the data they have stolen. In the case of stolen usernames and passwords, they are used in credential stuffing attacks to compromise millions of accounts at banks, airlines, government agencies, and other high-profile companies. Businesses are starting to follow recent NIST guidelines that recommend searching darknets for "spilled" credentials. But, by then, the original attacker has already used the credentials to break into accounts. At that point, the credentials are commoditized and are becoming worthless.

So, how should we think differently about data breach statistics? First, we need to remember that huge breaches may have already occurred but remain undiscovered. By definition, we can discuss only the breaches we know about. The more sophisticated the attacker, the greater the likelihood that it will take time to detect their breach. The second thing to remember is that a data breach is like a natural disaster, in that it has follow-on effects throughout the Internet ecosystem and our economy, enabling credential stuffing, account takeover, identity theft, and other forms of fraud. The indirect impact of data breaches is harder to quantify than the scope of the original breaches, and may outstrip the original breach in total harm by orders of magnitude.

The larger a data breach is suspected to be, the more attention it receives. But the scope of the problem is vast and hard to quantify; the projected numbers are just the tip of the iceberg in representing the risk consumers and business users face. It's not the size of the stolen data dump that we need to focus on. It's the window between the date of the breach and the date of discovery that represents the real danger zone. This is when cybercriminals are doing the most harm, using stolen data to break into more accounts, steal more data and identities, and transfer funds. The smart move for every corporate user or consumer is to create strong passwords, never reuse them across sites, monitor financial accounts, and be cautious with all data and shared online services.

Related Content:

Join Dark Reading LIVE for two days of practical cyber defense discussions. Learn from the industry’s most knowledgeable IT security experts. Check out the INsecurity agenda here.

 

Shuman Ghosemajumder is global head of artificial intelligence at F5 Networks (NASDAQ: FFIV). Shuman was previously chief technology officer of Shape Security, which was acquired by F5 in 2020. Shape's technology platform is the primary application defense for the world's ... View Full Bio
 

Recommended Reading:

Comment  | 
Print  | 
More Insights
Comments
Newest First  |  Oldest First  |  Threaded View
COVID-19: Latest Security News & Commentary
Dark Reading Staff 7/2/2020
Ripple20 Threatens Increasingly Connected Medical Devices
Kelly Sheridan, Staff Editor, Dark Reading,  6/30/2020
DDoS Attacks Jump 542% from Q4 2019 to Q1 2020
Dark Reading Staff 6/30/2020
Register for Dark Reading Newsletters
White Papers
Video
Cartoon
Current Issue
How Cybersecurity Incident Response Programs Work (and Why Some Don't)
This Tech Digest takes a look at the vital role cybersecurity incident response (IR) plays in managing cyber-risk within organizations. Download the Tech Digest today to find out how well-planned IR programs can detect intrusions, contain breaches, and help an organization restore normal operations.
Flash Poll
The Threat from the Internetand What Your Organization Can Do About It
The Threat from the Internetand What Your Organization Can Do About It
This report describes some of the latest attacks and threats emanating from the Internet, as well as advice and tips on how your organization can mitigate those threats before they affect your business. Download it today!
Twitter Feed
Dark Reading - Bug Report
Bug Report
Enterprise Vulnerabilities
From DHS/US-CERT's National Vulnerability Database
CVE-2020-9498
PUBLISHED: 2020-07-02
Apache Guacamole 1.1.0 and older may mishandle pointers involved inprocessing data received via RDP static virtual channels. If a userconnects to a malicious or compromised RDP server, a series ofspecially-crafted PDUs could result in memory corruption, possiblyallowing arbitrary code to be executed...
CVE-2020-3282
PUBLISHED: 2020-07-02
A vulnerability in the web-based management interface of Cisco Unified Communications Manager, Cisco Unified Communications Manager Session Management Edition, Cisco Unified Communications Manager IM & Presence Service, and Cisco Unity Connection could allow an unauthenticated, remote attack...
CVE-2020-5909
PUBLISHED: 2020-07-02
In versions 3.0.0-3.5.0, 2.0.0-2.9.0, and 1.0.1, when users run the command displayed in NGINX Controller user interface (UI) to fetch the agent installer, the server TLS certificate is not verified.
CVE-2020-5910
PUBLISHED: 2020-07-02
In versions 3.0.0-3.5.0, 2.0.0-2.9.0, and 1.0.1, the Neural Autonomic Transport System (NATS) messaging services in use by the NGINX Controller do not require any form of authentication, so any successful connection would be authorized.
CVE-2020-5911
PUBLISHED: 2020-07-02
In versions 3.0.0-3.5.0, 2.0.0-2.9.0, and 1.0.1, the NGINX Controller installer starts the download of Kubernetes packages from an HTTP URL On Debian/Ubuntu system.