Web-Searchable Databases An Increasing Security Risk

Breaches at Yale and Southern California Medical-Legal Consultants demonstrate the importance of ensuring that databases that touch Web-facing interfaces aren't exposed by Web searches

Dark Reading Staff, Dark Reading

August 26, 2011

Two database breaches that came to light recently highlight a common but frequently overlooked problem: misconfigured databases that leave sensitive information exposed to Web searches.

The first was a breach at Yale University, which left a data store containing sensitive information belonging to 43,000 individuals on an FTP server that Google indexed in September 2010. The second occurred at Southern California Medical-Legal Consultants, Inc. (SCMLC), which exposed a database with sensitive information on nearly 300,000 people behind a Web application that required no password and could be indexed by search engines.

According to security experts, search engines are the great equalizer when it comes to ferreting out gaps in database policy compliance.

"The thing about search is that it is thorough and most people's defenses are not thorough," says Dr Mike Lloyd, CTO of RedSeal Systems. "We find that most organizations that are trying to follow policies like 'Don't put sensitive data in FTP servers that are open to the Internet' traditionally feel pretty good about 95 percent compliance with those policies. The thing is that search makes it clear that anything less than 100 percent compliance with your policy is useless. If you make one mistake in a million, the search engines will find it for you."

Yale first discovered its mistake in late June and announced it publicly last Friday. At that time, its security team blocked search engine access to the FTP server and deleted the store of sensitive information, which included Social Security numbers (SSNs) but no addresses, birth dates, or financial information. By that point, however, the information had been publicly available for the ten months since Google rolled out the capability to crawl and index FTP servers last year.
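A policy like the one Lloyd describes can be audited before a search engine does it for you. The following is a minimal sketch, not Yale's actual procedure: it assumes the exposed files are readable as text and that a simple `NNN-NN-NNNN` pattern is a good enough signal for an SSN. The function names `find_ssn_like` and `scan_tree` are hypothetical, and the pattern will produce false positives.

```python
import re
from pathlib import Path

# Matches strings shaped like a U.S. Social Security number, e.g. 123-45-6789.
# This is a shape check only (an assumption of this sketch), so expect
# false positives such as part numbers or other hyphenated identifiers.
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssn_like(text):
    """Return every SSN-shaped substring found in text."""
    return SSN_LIKE.findall(text)


def scan_tree(root):
    """Map each readable file under root to its count of SSN-shaped strings."""
    hits = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        count = len(find_ssn_like(text))
        if count:
            hits[str(path)] = count
    return hits
```

Pointing `scan_tree` at the directory a public FTP server exports (for example, a hypothetical `/srv/ftp/pub`) returns only the files that contain matches, which gives a reviewer a short list to inspect by hand.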

Meanwhile, the breach at SCMLC was made public this week by a researcher from Identity Finder, who in June uncovered several gigabytes of SCMLC databases, spreadsheets, and other documents containing sensitive information, all readily available through Web searches. The database files in particular were a gold mine for hackers who knew what to look for.

"This isn't just a simple case of entering a few keywords and to find what you're looking for; you need to know exactly what strings you're looking for and you need to have some type of idea how databases work and how database information is being stored," says Frank Kenney, former Gartner analyst and VP global strategy at Ipswitch. "But it is very interesting because the people you definitely don't want getting a hold of this stuff are the ones who know how to do it."

In fact, many of the LulzSec exposures over the last few months resulted from participants trolling Google for just the right kind of database information. Many in the security field believe that as Google continues to add features such as FTP and PDF indexing to bolster its Web and desktop search functionality, the risk of poorly configured databases being exposed by the engine will skyrocket.

While it may seem convenient to blame Google for the problem, organizations ultimately have to remember that doing so is simply killing the messenger, says Lloyd.

"Blaming Google for this is really getting it all backwards," he says. "Google just makes it clear that there is a problem. If you left the door unlocked on a store room for years and then Google Maps came along and put a photograph showing there was no lock on the door, the fact that the photograph went up isn't the problem. The problem was that the door was unlocked for years."

Kenney believes that organizations will need to become more cognizant of what their Web-facing databases contain as the ease of database connectivity and the power of the search engines that could index their information increase in tandem.

"In many cases they don't know that they're wide open," he explains. "The databases that exist today have ultimately been designed to allow the easiest access from a multitude of devices and places. In many people's minds they think you need to access a server with an application running on that and that there is a measure of safety for the data sitting underneath the application because the application is secure. But your database is sitting out there and in many cases when it came out of the box it came configured to be connected to the Internet."
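One rough way to test Kenney's point is to check whether a database port answers from outside the network. The sketch below is illustrative only: the helper names `is_port_open` and `exposed_databases` are hypothetical, the port list covers only the default ports of a few common products, and a successful TCP connection shows that the service is reachable, not that its data can be read without credentials. Run it only against hosts you are authorized to test.

```python
import socket

# Default listening ports for a few common database servers.
# A server moved to a non-default port will not be caught by this list.
COMMON_DB_PORTS = {
    "MySQL": 3306,
    "PostgreSQL": 5432,
    "Microsoft SQL Server": 1433,
    "MongoDB": 27017,
}


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def exposed_databases(host, timeout=3.0):
    """List the database products whose default port accepts connections."""
    return [
        name
        for name, port in COMMON_DB_PORTS.items()
        if is_port_open(host, port, timeout)
    ]
```

Calling `exposed_databases("db.example.com")` (a placeholder hostname) from a machine outside the firewall returns the names of any listed products that accept a connection. An empty list is the result an administrator would hope to see for a server that is meant to sit behind an application.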

