10/26/2017
02:30 PM

Why Data Breach Stats Get It Wrong

It's not the size of the stolen data dump that is important. It's the window between the date of the breach and the date of discovery that represents the biggest threat.

Earlier this month, Yahoo announced that it had drastically underestimated the impact of the data breach it reported in 2016. In December 2016, the company reported that, in addition to its previous breach of 500 million accounts, an additional 1 billion accounts had been compromised in a separate breach. Now, it believes that all of its accounts were compromised — affecting over 3 billion users.

How did Yahoo get it so wrong? And what do these revised breach numbers mean? To understand this, we need to examine the dismal science behind calculating the impact of data breaches.

In one out of four cases, third parties discover the breach, typically after being affected by it or seeing the data distributed in darknets. In other cases, internal investigations discover anomalous behavior, such as a user accessing a database they shouldn't. Either way, it can be difficult to determine how much data was stolen. When third parties discover a breach, the data they find is only a sample of what was exposed, and the attacker may have accessed more.

In breaches found by internal probes, seeing that an attacker accessed one file or database does not mean other resources weren't accessed using methods that would appear differently to a forensic investigator (for example, logging in as one user to access one file, while using a different account to access another). There is a great deal of detective work — and estimation — required to describe the scope of a breach.

The round numbers that Yahoo provided illustrate this perfectly. Few consumer companies have sets of user information stored in neatly defined buckets of 500 million and 1 billion. At first, Yahoo likely found evidence to suggest certain data was accessed and had to extrapolate estimates from there. But there is always an element of estimation, or to put it another way, guesswork. Yahoo's latest statement reads: "the company recently obtained new intelligence and now believes [...] that all Yahoo user accounts were affected by the August 2013 theft." Companies are compelled to make, and revise, educated guesses at each stage to demonstrate control of the situation, and to be as transparent and responsible as possible. So it's not surprising that these figures often grow.

The Lag on Lag Time
Another challenge that affects a company's ability to get these figures right is lag time. The breaches reported by Yahoo happened years earlier. With Equifax, there was also a significant lag between breach detection and public notice: nearly six weeks. Sophisticated attackers are not only adept at finding system vulnerabilities; they also typically have plenty of time to access data carefully and hide their tracks. Add several years in which a company's own forensic evidence decays (for example, log files that are regularly cleared), and detecting unauthorized access and estimating damage become even more difficult.

In many ways, lag time is the most critical problem. The days, weeks, and years that pass before breaches are discovered (if they are discovered at all) give attackers all the time they need to extract full value from the data they have stolen. Stolen usernames and passwords are used in credential stuffing attacks to compromise millions of accounts at banks, airlines, government agencies, and other high-profile companies. Businesses are starting to follow recent NIST guidelines that recommend searching darknets for "spilled" credentials. But, by then, the original attacker has already used the credentials to break into accounts. At that point, the credentials are commoditized and are becoming worthless.
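
To make the lag concrete, here is a minimal sketch in Python that measures the window between a breach and its disclosure. The function name exposure_window_days is hypothetical. The Yahoo dates are given only to the month (an August 2013 theft, reported in December 2016), so the first day of each month is used as a placeholder, and the report date stands in for the discovery date. Treat the result as a rough illustration, not a precise figure.

```python
from datetime import date


def exposure_window_days(breach: date, discovered: date) -> int:
    """Return the number of days between a breach and its discovery."""
    if discovered < breach:
        raise ValueError("discovery cannot precede the breach")
    return (discovered - breach).days


# Assumption: only the months are known, so day 1 is a placeholder,
# and the December 2016 report is used as a proxy for discovery.
yahoo_breach = date(2013, 8, 1)
yahoo_reported = date(2016, 12, 1)

window = exposure_window_days(yahoo_breach, yahoo_reported)
print(f"Yahoo exposure window: {window} days (~{window / 365.25:.1f} years)")
```

Under these assumptions the window runs to more than three years, during which the stolen credentials could be used before anyone knew to reset them.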

So, how should we think differently about data breach statistics? First, we need to remember that huge breaches may have already occurred but remain undiscovered. By definition, we can discuss only the breaches we know about. The more sophisticated the attacker, the greater the likelihood that it will take time to detect their breach. The second thing to remember is that a data breach is like a natural disaster, in that it has follow-on effects throughout the Internet ecosystem and our economy, enabling credential stuffing, account takeover, identity theft, and other forms of fraud. The indirect impact of data breaches is harder to quantify than the scope of the original breaches, and may outstrip the original breach in total harm by orders of magnitude.

The larger a data breach is suspected to be, the more attention it receives. But the scope of the problem is vast and hard to quantify; the projected numbers are just the tip of the iceberg in representing the risk consumers and business users face. It's not the size of the stolen data dump that we need to focus on. It's the window between the date of the breach and the date of discovery that represents the real danger zone. This is when cybercriminals are doing the most harm, using stolen data to break into more accounts, steal more data and identities, and transfer funds. The smart move for every corporate user or consumer is to create strong passwords, never reuse them across sites, monitor financial accounts, and be cautious with all data and shared online services.

Shuman Ghosemajumder is chief technology officer at Shape Security, a security company located in Mountain View, California. As one of the largest processors of login traffic in the world, Shape Security prevents fraud resulting from credential stuffing attacks, when breached ...