Dark Reading is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them.Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Endpoint

10/26/2017
02:30 PM
Connect Directly
LinkedIn
RSS
E-Mail vvv
100%
0%

Why Data Breach Stats Get It Wrong

It's not the size of the stolen data dump that is important. It's the window between the date of the breach and the date of discovery that represents the biggest threat.

Earlier this month, Yahoo announced that it had drastically underestimated the impact of the data breach it reported in 2016. In December 2016, the company reported that, in addition to its previous breach of 500 million accounts, an additional 1 billion accounts had been compromised in a separate breach. Now, it believes that all of its accounts were compromised — affecting over 3 billion users.

How did Yahoo get it so wrong? And what do these revised breach numbers mean? To understand this, we need to examine the dismal science behind calculating the impact of data breaches.

In one out of four cases, third parties discover the breach, typically after being affected by it or seeing the data distributed in darknets. In other cases, internal investigations discover anomalous behavior, such as a user accessing a database he shouldn't. Either way, it can be difficult to determine how much data was stolen. When a breach is discovered by third parties, it represents only a sample of exposed data, and the attacker may have accessed additional data.

In breaches found by internal probes, seeing that an attacker accessed one file or database does not mean other resources weren't accessed using methods that would appear differently to a forensic investigator (for example, logging in as one user to access one file, while using a different account to access another). There is a great deal of detective work — and estimation — required to describe the scope of a breach.

The round numbers that Yahoo provided illustrate this perfectly. Few consumer companies have sets of user information stored in neatly defined buckets of 500 million and 1 billion. At first, Yahoo likely found evidence to suggest certain data was accessed and had to extrapolate estimates from there. But there is always an element of estimation, or to put it another way, guesswork. Yahoo's latest statement reads: "the company recently obtained new intelligence and now believes [...] that all Yahoo user accounts were affected by the August 2013 theft." Companies are compelled to make, and revise, educated guesses at each stage to demonstrate control of the situation, and to be as transparent and responsible as possible. So it's not surprising that these figures often grow.

The Lag on Lag Time
Another challenge that affects a company's ability to get these figures right is lag time. The breaches reported by Yahoo happened years earlier. There was also a significant lag between breach detection and public notice with Equifax — nearly six weeks. Sophisticated attackers are not only adept at finding system vulnerabilities, but they typically have plenty of time to access data carefully and hide their tracks. Add in several years of buffer time for a company's own available analytics to atrophy for useful forensic purposes (e.g., regularly cleared log files) and detecting unauthorized access and estimating damage becomes even more difficult.

In many ways, lag time is the most critical problem. The days, weeks, and years that pass before breaches are discovered (if they are discovered at all), gives attackers all the time they need to extract full value from the data they have stolen. In the case of stolen usernames and passwords, they are used in credential stuffing attacks to compromise millions of accounts at banks, airlines, government agencies, and other high-profile companies. Businesses are starting to follow recent NIST guidelines that recommend searching darknets for "spilled" credentials. But, by then, the original attacker has already used the credentials to break into accounts. At that point, the credentials are commoditized and are becoming worthless.

So, how should we think differently about data breach statistics? First, we need to remember that huge breaches may have already occurred but remain undiscovered. By definition, we can discuss only the breaches we know about. The more sophisticated the attacker, the greater the likelihood that it will take time to detect their breach. The second thing to remember is that a data breach is like a natural disaster, in that it has follow-on effects throughout the Internet ecosystem and our economy, enabling credential stuffing, account takeover, identity theft, and other forms of fraud. The indirect impact of data breaches is harder to quantify than the scope of the original breaches, and may outstrip the original breach in total harm by orders of magnitude.

The larger a data breach is suspected to be, the more attention it receives. But the scope of the problem is vast and hard to quantify; the projected numbers are just the tip of the iceberg in representing the risk consumers and business users face. It's not the size of the stolen data dump that we need to focus on. It's the window between the date of the breach and the date of discovery that represents the real danger zone. This is when cybercriminals are doing the most harm, using stolen data to break into more accounts, steal more data and identities, and transfer funds. The smart move for every corporate user or consumer is to create strong passwords, never reuse them across sites, monitor financial accounts, and be cautious with all data and shared online services.

Related Content:

Join Dark Reading LIVE for two days of practical cyber defense discussions. Learn from the industry’s most knowledgeable IT security experts. Check out the INsecurity agenda here.

 

Shuman Ghosemajumder is CTO at Shape Security, which operates a global defense platform to protect web and mobile applications against sophisticated cybercriminal attacks. Shape is the primary application defense for the world's largest banks, airlines, retailers, and ... View Full Bio
Comment  | 
Print  | 
More Insights
Comments
Newest First  |  Oldest First  |  Threaded View
7 Tips for Infosec Pros Considering A Lateral Career Move
Kelly Sheridan, Staff Editor, Dark Reading,  1/21/2020
For Mismanaged SOCs, The Price Is Not Right
Kelly Sheridan, Staff Editor, Dark Reading,  1/22/2020
Register for Dark Reading Newsletters
White Papers
Video
Cartoon Contest
Write a Caption, Win a Starbucks Card! Click Here
Latest Comment:   It's a PEN test of our cloud security.
Current Issue
IT 2020: A Look Ahead
Are you ready for the critical changes that will occur in 2020? We've compiled editor insights from the best of our network (Dark Reading, Data Center Knowledge, InformationWeek, ITPro Today and Network Computing) to deliver to you a look at the trends, technologies, and threats that are emerging in the coming year. Download it today!
Flash Poll
How Enterprises are Attacking the Cybersecurity Problem
How Enterprises are Attacking the Cybersecurity Problem
Organizations have invested in a sweeping array of security technologies to address challenges associated with the growing number of cybersecurity attacks. However, the complexity involved in managing these technologies is emerging as a major problem. Read this report to find out what your peers biggest security challenges are and the technologies they are using to address them.
Twitter Feed
Dark Reading - Bug Report
Bug Report
Enterprise Vulnerabilities
From DHS/US-CERT's National Vulnerability Database
CVE-2019-5124
PUBLISHED: 2020-01-25
An exploitable out-of-bounds read vulnerability exists in AMD ATIDXX64.DLL driver, version 26.20.13001.50005. A specially crafted pixel shader can cause a denial of service. An attacker can provide a specially crafted shader file to trigger this vulnerability. This vulnerability can be triggered fro...
CVE-2019-5146
PUBLISHED: 2020-01-25
An exploitable out-of-bounds read vulnerability exists in AMD ATIDXX64.DLL driver, version 26.20.13025.10004. A specially crafted pixel shader can cause a denial of service. An attacker can provide a specially crafted shader file to trigger this vulnerability. This vulnerability can be triggered fro...
CVE-2019-5147
PUBLISHED: 2020-01-25
An exploitable out-of-bounds read vulnerability exists in AMD ATIDXX64.DLL driver, version 26.20.13003.1007. A specially crafted pixel shader can cause a denial of service. An attacker can provide a specially crafted shader file to trigger this vulnerability. This vulnerability can be triggered from...
CVE-2019-5183
PUBLISHED: 2020-01-25
An exploitable type confusion vulnerability exists in AMD ATIDXX64.DLL driver, versions 26.20.13031.10003, 26.20.13031.15006 and 26.20.13031.18002. A specially crafted pixel shader can cause a type confusion issue, leading to potential code execution. An attacker can provide a specially crafted shad...
CVE-2020-5226
PUBLISHED: 2020-01-24
Cross-site scripting in SimpleSAMLphp before version 1.18.4. The www/erroreport.php script allows error reports to be submitted and sent to the system administrator. Starting with SimpleSAMLphp 1.18.0, a new SimpleSAML\Utils\EMail class was introduced to handle sending emails, implemented as a wrapp...