Welcome Guest. | Log In| Register | Membership Benefits
Dark Reading's Security Views Weblog
Topics:   Database Security Tech Center : Security Views

  • Email this page E-mail this page
  • Print this page Print this page
  • Bookmark and Share

What Data Discovery Tools Really Do


Posted by Adrian Lane, Jan 20, 2010 09:00 AM

Data discovery tools are becoming increasingly necessary for getting a handle on where sensitive data resides. When you have a production database schema with 40,000 tables, most of which are undocumented by the developers who created them, finding information within a single database is cumbersome. Now multiply that problem across financial, HR, business processing, testing, and decision support databases -- and you have a big mess.

And, honestly, criminals don't really care whether they steal data from production servers or test machines -- whichever is easier. Whether your strategy is to remove, mask, encrypt, or secure sensitive data, you cannot act until you know where it is.

So how do these tools work? Let's say you want to find credit card numbers. Data discovery tools for databases use a couple of methods to find and then identify information. Most use special login credentials to scan internal database structures, itemize tables and columns, and then analyze what was found. Three basic analysis methods are employed:

1. Metadata: Metadata is data that describes data, and all relational databases store metadata that describes tables and column attributes. In our credit card example, we examine column attributes to determine whether the name of the column, or the size and data type, resembles a credit card number. If the column is a 16-digit number or the name is something like "CreditCard" or "CC#", then we have a high likelihood of a match. Of course, the effectiveness of each product will vary depending on how well the analysis rules are implemented. This remains the most common analysis technique.

2. Labels: Labeling is where data elements are grouped with a tag that describes the data. This can be done at the time the data is created, or tags can be added over time to provide additional information and references to describe the data. In many ways it is just like metadata, but slightly less formal. Some relational database platforms provide mechanisms to create data labels, but this method is more commonly used with flat files, becoming increasingly useful as more firms move to ISAM or quasi-relational data storage, like Amazon's simpleDB, to handle fast-growing data sets. This form of discovery is similar to a Google search, with the greater the number of similar labels, the greater likelihood of a match. Effectiveness is dependent on the use of labels.

3. Content analysis: In this form of analysis, we investigate the data itself by employing pattern matching, hashing, statistical, lexical, or other forms of probability analysis. In the case of our credit card example, when we find a number that resembles a credit card number, a common method is to perform a LUHN check on the number itself. This is a simple numeric checksum used by credit card companies to verify a number is a valid credit card number. If the number we discover passes the LUHN check, then it is a very high probability that we have discovered a credit card number. Content analysis is a growing trend, and one being used successfully in data loss prevention (DLP) and Web content analysis products.

Some discovery tools are available as stand-alone offerings, but most are packaged within other products, such as data masking, configuration management, or vulnerability assessment.

Adrian Lane is an analyst/CTO with Securosis LLC, an independent security consulting practice. Special to Dark Reading.

« Share Your New Security Innovations | Main | Emergency Microsoft Internet Explorer Patch Arrives Thursday »



Sign up now for the weekly InformationWeek Blog Newsletter.


This is a public forum. United Business Media and its affiliates are not responsible for and do not control what is posted herein. United Business Media makes no warranties or guarantees concerning any advice dispensed by its staff members or readers.

Community standards in this comment area do not permit hate language, excessive profanity, or other patently offensive language. Please be aware that all information posted to this comment area becomes the property of United Business Media LLC and may be edited and republished in print or electronic format as outlined in United Business Media's Terms of Service.

Important Note: This comment area is NOT intended for commercial messages or solicitations of business.