Simulated or 'anonymized' data is a better option than exposing live data to outside sources

If you use real, live customer data in your testing and development of applications, you may want to think twice about the risks of exposing that data.

Organizations that use live data in their testing do so basically because it makes the testing more real-world and better puts the app through its paces. Trouble is, it also can expose sensitive data to engineering staff who normally wouldn't have access to that data, as well as to consultants and other outside contractors working with your organization on the testing process.

But you don't have to use the real thing in app testing and development: "It needs to be real enough, but it's better if it's not people's confidential information," says Gary McGraw, CTO of Cigital.

Still, it's common practice among many organizations today. According to a new study from the Ponemon Institute, which was commissioned by Compuware, 69 percent of the more than 800 IT professionals surveyed said they use live data for testing their applications, and 62 percent said they do so in their software development. More than 50 percent outsource their app testing, and of that group, 49 percent share live data with the outsourcing organization.

"This flies under the radar. It's actually a common practice, although I don't know any statistics on it," says Chris Eng, director of security research for Veracode. "When we're doing penetration testing of an application… we find that, yes, those testing environment databases were just copied from a production/live database," for example.

But compliance and other pressures are pushing some organizations to reassess that practice, Eng says. "The minute you stick production data into a test environment, you're suddenly exposing it to everyone in the organization and quality assurance, as well as to any consultants or external contractors, local or offshore. You're now widening the net and exposing it to a lot of parties that shouldn't have access to it."

One option is to develop simulated data that looks a lot like the real thing, notes Cigital's McGraw. His firm did so for the Financial Industry Regulatory Authority (FINRA) during a transaction application testing process. "It requires writing a little code," he notes. For the FINRA app, that meant ensuring the key trade elements were there -- dollar amounts and valid stock ticker symbols, for instance -- so it appeared as close to the real thing as possible.
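For a sense of what "a little code" might look like, here is a minimal Python sketch of a simulated trade generator. It is not Cigital's or FINRA's code, which the article does not describe. The ticker list, the field names, and the `make_trade` and `make_trades` helpers are all hypothetical, chosen only to show records carrying plausible dollar amounts and valid-looking symbols.

```python
import random
from decimal import Decimal

# Hypothetical symbol list for illustration; a real test set would use
# whatever symbols the application under test actually accepts.
TICKERS = ["IBM", "MSFT", "GE", "XOM", "JPM"]


def make_trade(rng):
    """Return one simulated trade with a listed symbol and a dollar amount."""
    shares = rng.randint(1, 10_000)
    # Build the price from whole cents so amounts stay exact to two decimals.
    price = Decimal(rng.randint(100, 50_000)) / 100
    return {
        "symbol": rng.choice(TICKERS),
        "side": rng.choice(["BUY", "SELL"]),
        "shares": shares,
        "price": price,
        "amount": price * shares,
    }


def make_trades(count, seed=0):
    """Return `count` simulated trades; the same seed gives the same data."""
    rng = random.Random(seed)
    return [make_trade(rng) for _ in range(count)]


if __name__ == "__main__":
    for trade in make_trades(5, seed=42):
        print(trade)
```

Seeding the generator keeps test runs repeatable, so a failing test can be reproduced with the same simulated records.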

In the Ponemon study, 89 percent of the organizations running live data in their testing use customer files, and 74 percent, customer lists. Among the live data: employee and vendor records, customer account numbers, credit card numbers, Social Security numbers, and credit, debit, and payment information.

The study points to one case last year where an outside consultant hired by an insurance firm to develop applications turned around and sold some of the firm's customer data he had been privy to during the development project.

Meanwhile, Veracode's Eng says the problem with fake data is that it can sometimes compromise testing. Even something as minor as a missing punctuation mark can throw things off. "An apostrophe isn't something you'd think was a big deal, but it's important in SQL queries, for example," he says. "If you don't have a realistic set of data it may not be possible to catch all the SQL injection issues."
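Eng's apostrophe point can be illustrated with a short Python sketch that uses the standard-library `sqlite3` module. The `customers` table and the helper functions are hypothetical, and the sample names are made up; the point is only that a name containing an apostrophe exposes a query built by string concatenation, while a plain name sails through.

```python
import sqlite3


def insert_unsafe(conn, name):
    # Query built by string concatenation: the flaw realistic data can expose.
    conn.execute("INSERT INTO customers (name) VALUES ('" + name + "')")


def insert_safe(conn, name):
    # Parameterized query: the driver handles quoting of the value.
    conn.execute("INSERT INTO customers (name) VALUES (?)", (name,))


def try_insert(insert, name):
    """Run one insert against a fresh in-memory database and report the result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT)")
    try:
        insert(conn, name)
        return "ok"
    except sqlite3.OperationalError:
        return "syntax error"
    finally:
        conn.close()


if __name__ == "__main__":
    for name in ["Smith", "O'Brien"]:
        print(name, "unsafe:", try_insert(insert_unsafe, name))
        print(name, "safe:", try_insert(insert_safe, name))
```

A test set made up only of names like "Smith" would never trigger the failure in the concatenated query, which is the gap Eng describes.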

"Anonymizing" the data is an effective option, he says. "It boils down to taking the real data and masking it and transforming different parts that make it no longer identifiable," he says. "It's no longer the real data, but the structure of the data is the same."
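As a rough illustration of masking that keeps the structure of the data, the Python sketch below swaps every letter and digit for a random one of the same kind and leaves punctuation, spacing, and length alone. It is not a method described by Eng or any vendor in this story, and the `mask_value` and `anonymize` helpers and the sample record are hypothetical. It also uses Python's ordinary `random` module for repeatability, so treat it as a sketch of the idea rather than a vetted anonymization scheme.

```python
import random
import string


def mask_value(value, rng):
    """Replace letters and digits with random ones; keep everything else."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isupper():
            out.append(rng.choice(string.ascii_uppercase))
        elif ch.islower():
            out.append(rng.choice(string.ascii_lowercase))
        else:
            # Apostrophes, hyphens, and spaces survive, so the shape is unchanged.
            out.append(ch)
    return "".join(out)


def anonymize(record, fields, seed=None):
    """Return a copy of `record` with the named string fields masked."""
    rng = random.Random(seed)
    return {
        key: mask_value(val, rng) if key in fields else val
        for key, val in record.items()
    }


if __name__ == "__main__":
    # Made-up sample record, not real customer data.
    record = {"name": "Pat O'Brien", "ssn": "123-45-6789", "plan": "gold"}
    print(anonymize(record, {"name", "ssn"}, seed=1))
```

Because the apostrophe in the name and the hyphens in the number stay in place, the masked record still exercises the same parsing and query paths as the original.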

But securing live data in testing isn't a priority for most organizations, says Paul Vallely, solutions sales director for enterprise solutions at Compuware. "All the organizations we interact with say this is a risk, but it's not being prioritized enough. They have so many projects that they need to deliver" that it gets lost in the shuffle.


About the Author(s)

Kelly Jackson Higgins, Editor-in-Chief, Dark Reading

Kelly Jackson Higgins is the Editor-in-Chief of Dark Reading. She is an award-winning veteran technology and business journalist with more than two decades of experience in reporting and editing for various publications, including Network Computing, Secure Enterprise Magazine, Virginia Business magazine, and other major media properties. Jackson Higgins was recently selected as one of the Top 10 Cybersecurity Journalists in the US, and named as one of Folio's 2019 Top Women in Media. She began her career as a sports writer in the Washington, DC metropolitan area, and earned her BA at William & Mary. Follow her on Twitter @kjhiggins.
