The dataset contains 200 million rows of information stolen from websites across industries, likely via opportunistic access.

Kelly Sheridan, Former Senior Editor, Dark Reading

May 18, 2018

A dataset containing more than 200 million lines of Japanese personally identifiable information (PII) has been found on the Chinese underground market, researchers report. It's believed the data is authentic and was exfiltrated from multiple Japanese website databases.

Experts at FireEye iSIGHT Intelligence first noticed the actor advertising the dataset in December 2017. This actor has sold site databases on Chinese underground forums since at least 2013 and is likely connected to someone living in China's Zhejiang province.

The team identified the actor and the data as part of its regular monitoring of the cyber threat landscape, explains Oleg Bondarenko, senior manager for international research at FireEye. The Chinese underground primarily consists of instant messenger groups, such as those on QQ, he says. This dataset was not discovered on a forum but rather in a group for sharing and offering data.

"Yes, we've observed actors who were selling Japanese PII data or interested in purchase," Bondarenko continues. "However [we] have never observed at such scale."

Given the number of sources and different types of data included, it's likely the data was taken via opportunistic compromise and not targeted attacks. The means of obtaining this data have not been confirmed, but Bondarenko says one possible way would be collecting data from previous public leaks and taking over victims' accounts. Motivation was likely financial gain.

Data types in this set include names, credentials, email addresses, birthdates, phone numbers, and home addresses. The data seemingly comes from between 11 and 50 Japanese websites across industries including financial, retail, food and beverage, transportation, and entertainment. One folder indicated its data was collected between May and June 2016; another showed its data was acquired in May and July 2013.

The actor claims all credential sets are unique and has priced the full dataset at ¥1,000 CNY ($150.96 USD).

In a random sample of 200,000 leaked email addresses, most had previously been leaked in major data breaches, a sign the addresses included in this dataset were not created specifically for it. Since most of the leaked data didn't come from one specific leak or public website, researchers don't think the actor scraped the info from other data leaks and resold it as a new product.

"The data was extremely varied and not available through publicly available data sources; therefore, we believe that the advertised data is genuine," researchers explain in a report.

That said, they do believe the number of real and unique credentials is lower than the actor claims. In a sample of 190,000 credentials, researchers noticed more than 36% contained duplicate values, and a significant number of the email addresses were fake. Several actors commented on the ad to express interest in buying the data. However, the same actors later posted negative feedback, claiming they didn't receive the product advertised.
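A duplicate share like the one cited for the credential sample can be estimated with a simple count. The sketch below is illustrative only: the report does not describe how researchers measured duplicates, so the `duplicate_share` function, the (email, password) record layout, the choice to ignore case and surrounding whitespace in email addresses, and the sample values are all assumptions.

```python
from collections import Counter


def duplicate_share(records):
    """Return the fraction of records whose normalized value occurs more than once.

    `records` is an iterable of (email, password) pairs. Emails are compared
    case-insensitively with surrounding whitespace removed; passwords are
    compared exactly. This normalization is an assumption, not the
    researchers' documented method.
    """
    keys = [(email.strip().lower(), password) for email, password in records]
    if not keys:
        return 0.0
    counts = Counter(keys)
    # Count every record that belongs to a group of two or more identical keys.
    duplicated = sum(n for n in counts.values() if n > 1)
    return duplicated / len(keys)


# Hypothetical sample: two pairs of repeated records plus one unique record.
sample = [
    ("a@example.jp", "pw1"),
    ("A@example.jp ", "pw1"),
    ("b@example.jp", "pw2"),
    ("c@example.jp", "pw3"),
    ("c@example.jp", "pw3"),
]
share = duplicate_share(sample)
```

Counting every member of a repeated group is one of several reasonable definitions; counting only the surplus copies would yield a lower figure for the same sample.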

Most of the information advertised is of the kind commonly stored on websites with customer login and profile information. Researchers didn't notice the actor selling sensitive email or business data that would indicate he or she had access beyond servers connected to a site or Web portal.

Bondarenko says the team hasn't noticed any similar type of activity from a specific group in China. The actor behind this dataset was active for a while and was selling the data during that time.

"However, there are no other insights available for the actor because he became inactive recently, so we've been closely monitoring to understand the reason behind that and potentially getting additional insights," he adds.

Since much of the data advertised had already been exposed in large leaks, researchers don't think this specific dataset will enable large-scale cyberattacks against the people whose credentials are included. It is worth noting, however, that the leaked PII could be used to target other entities if those people reused credentials between the compromised sites and other personal or business accounts.

About the Author(s)

Kelly Sheridan was formerly a Staff Editor at Dark Reading, where she focused on cybersecurity news and analysis. She is a business technology journalist who previously reported for InformationWeek, where she covered Microsoft, and Insurance & Technology, where she covered financial services. Sheridan earned her BA in English at Villanova University. You can follow her on Twitter @kellymsheridan.
