Dark Reading is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


1/20/2010
09:00 AM
Adrian Lane
Commentary

What Data Discovery Tools Really Do

Data discovery tools are becoming increasingly necessary for getting a handle on where sensitive data resides. When you have a production database schema with 40,000 tables, most of which are undocumented by the developers who created them, finding information within a single database is cumbersome. Now multiply that problem across financial, HR, business processing, testing, and decision support databases -- and you have a big mess. And, honestly, criminals don't really care whether they steal data from production servers or test machines -- whichever is easier. Whether your strategy is to remove, mask, encrypt, or secure sensitive data, you cannot act until you know where it is.

So how do these tools work? Let's say you want to find credit card numbers. Data discovery tools for databases use a couple of methods to find and then identify information. Most use special login credentials to scan internal database structures, itemize tables and columns, and then analyze what was found. Three basic analysis methods are employed:

1. Metadata: Metadata is data that describes data, and all relational databases store metadata that describes table and column attributes. In our credit card example, we examine column attributes to determine whether the name of the column, or its size and data type, resembles a credit card number. If the column is defined as a 16-digit number or a 16-character string, or its name is something like "CreditCard" or "CC#", then we have a high likelihood of a match. Of course, the effectiveness of each product will vary depending on how well its analysis rules are implemented. This remains the most common analysis technique.

2. Labels: Labeling is where data elements are grouped with a tag that describes the data. This can be done at the time the data is created, or tags can be added over time to provide additional information and references to describe the data. In many ways it is just like metadata, but slightly less formal. Some relational database platforms provide mechanisms to create data labels, but this method is more commonly used with flat files, and it is becoming increasingly useful as more firms move to ISAM or quasi-relational data storage, like Amazon's SimpleDB, to handle fast-growing data sets. This form of discovery is similar to a Google search: the more matching labels, the greater the likelihood of a match. Its effectiveness depends on how consistently and thoroughly labels are applied.

3. Content analysis: In this form of analysis, we investigate the data itself by employing pattern matching, hashing, statistical, lexical, or other forms of probability analysis. In the case of our credit card example, when we find a number that resembles a credit card number, a common method is to perform a Luhn check on the number itself. This is a simple numeric checksum (the "mod 10" algorithm) that valid credit card numbers are constructed to satisfy, so it quickly rules out digit strings that cannot be card numbers. If a number we discover both resembles a card number and passes the Luhn check, there is a very high probability that we have discovered a credit card number. Content analysis is a growing trend, and one being used successfully in data loss prevention (DLP) and Web content analysis products.
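The metadata method (method 1 above) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular product works: it uses SQLite's catalog (`sqlite_master` and `PRAGMA table_info`) as a stand-in for a production database's system tables, and the name and type rules (`NAME_PATTERN`, `TYPE_PATTERN`) and the function `scan_metadata` are hypothetical examples of the kind of analysis rules described.

```python
import re
import sqlite3

# Hypothetical name rule: names such as "CreditCard", "CC#", "card_number".
NAME_PATTERN = re.compile(r"credit_?card|card_?(num|no)|^cc(#|_?num)?$", re.IGNORECASE)
# Hypothetical type rule: a declared size of 16, e.g. CHAR(16) or NUMERIC(16).
TYPE_PATTERN = re.compile(r"\(\s*16\s*\)")


def scan_metadata(conn: sqlite3.Connection) -> list[tuple[str, str, str]]:
    """Return (table, column, reason) for columns whose metadata resembles a card number."""
    hits = []
    # Itemize tables from the catalog (SQLite's equivalent of system tables).
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        # The table name comes from the catalog, not from user input.
        for _cid, column, decl_type, *_rest in conn.execute(f'PRAGMA table_info("{table}")'):
            if NAME_PATTERN.search(column):
                hits.append((table, column, "name"))
            elif TYPE_PATTERN.search(decl_type or ""):
                hits.append((table, column, "type"))
    return hits


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE orders (id INTEGER, CreditCard TEXT, total REAL)")
    demo.execute("CREATE TABLE t_0417 (c1 CHAR(16), c2 VARCHAR(64))")  # undocumented table
    demo.execute('CREATE TABLE "CUST" ("CC#" TEXT, name TEXT)')
    for hit in scan_metadata(demo):
        print(hit)
```

In the demo, the undocumented column `t_0417.c1` is flagged only because of its declared type, which is why the quality of the rules matters so much: a looser type rule would flag far more false positives.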
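Label-based discovery (method 2 above) depends entirely on whatever tags an organization actually applies, so the following is only a hypothetical sketch: the data set names in `DATASETS`, the label vocabulary in `TARGET_LABELS`, and the `threshold` value are invented for illustration. It scores each data set by the fraction of target labels it carries, so more matching labels means a higher likelihood of a match.

```python
# Hypothetical inventory: data set -> labels applied at creation time or later.
DATASETS = {
    "exports/q4_orders.csv": {"finance", "payment", "PCI"},
    "hr/benefits.csv": {"hr", "benefits"},
    "test/card_fixture.dat": {"test", "credit-card", "cardholder", "payment"},
}

# Hypothetical vocabulary of labels that suggest card data (lowercase).
TARGET_LABELS = {"payment", "pci", "cardholder", "credit-card"}


def label_score(labels: set[str], target: set[str]) -> float:
    """Fraction of the target labels present, compared case-insensitively."""
    normalized = {label.lower() for label in labels}
    return len(normalized & target) / len(target)


def rank_by_labels(datasets: dict[str, set[str]], target: set[str],
                   threshold: float = 0.25) -> list[tuple[str, float]]:
    """Return data sets scoring at or above the threshold, best match first."""
    scored = [(name, label_score(labels, target)) for name, labels in datasets.items()]
    matches = [item for item in scored if item[1] >= threshold]
    return sorted(matches, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_by_labels(DATASETS, TARGET_LABELS):
        print(f"{score:.2f}  {name}")
```

Note that the test fixture ranks highest here: as the article points out, attackers do not care whether sensitive data sits on a production or a test machine, so discovery should not skip test data either.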
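Content analysis with a Luhn check (method 3 above) can be sketched as follows. The candidate regular expression (13 to 19 digits, optionally separated by single spaces or hyphens) and the function names are assumptions chosen for illustration; real products combine several patterns and other tests. The `luhn_valid` function follows the standard algorithm: starting from the rightmost digit, double every second digit, subtract 9 from any doubled value above 9, and check that the total is divisible by 10.

```python
import re

# Hypothetical candidate pattern: 13-19 digits, optionally separated by spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` satisfy the Luhn (mod 10) checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Pattern-match candidate numbers, then keep only those passing the Luhn check."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


if __name__ == "__main__":
    sample = "Order 12 paid with 4111 1111 1111 1111; ref 4111111111111112."
    print(find_card_numbers(sample))
```

Because one in ten arbitrary digit strings of a given length passes the Luhn check by chance, the check works best as a filter applied after a pattern match rather than as proof on its own.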

Some discovery tools are available as stand-alone offerings, but most are packaged within other products, such as data masking, configuration management, or vulnerability assessment.

Adrian Lane is an analyst/CTO with Securosis LLC, an independent security consulting practice. Special to Dark Reading. Adrian Lane is a Security Strategist and brings over 25 years of industry experience to the Securosis team, much of it at the executive level. Adrian specializes in database security, data security, and secure software development. With experience at Ingres, Oracle, and ...

 
