Dark Reading is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them.Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Perimeter

1/20/2010
09:00 AM
Adrian Lane
Adrian Lane
Commentary
50%
50%

What Data Discovery Tools Really Do

Data discovery tools are becoming increasingly necessary for getting a handle on where sensitive data resides. When you have a production database schema with 40,000 tables, most of which are undocumented by the developers who created them, finding information within a single database is cumbersome. Now multiply that problem across financial, HR, business processing, testing, and decision support databases -- and you have a big mess.

Data discovery tools are becoming increasingly necessary for getting a handle on where sensitive data resides. When you have a production database schema with 40,000 tables, most of which are undocumented by the developers who created them, finding information within a single database is cumbersome. Now multiply that problem across financial, HR, business processing, testing, and decision support databases -- and you have a big mess.And, honestly, criminals don't really care whether they steal data from production servers or test machines -- whichever is easier. Whether your strategy is to remove, mask, encrypt, or secure sensitive data, you cannot act until you know where it is.

So how do these tools work? Let's say you want to find credit card numbers. Data discovery tools for databases use a couple of methods to find and then identify information. Most use special login credentials to scan internal database structures, itemize tables and columns, and then analyze what was found. Three basic analysis methods are employed:

1. Metadata: Metadata is data that describes data, and all relational databases store metadata that describes tables and column attributes. In our credit card example, we examine column attributes to determine whether the name of the column, or the size and data type, resembles a credit card number. If the column is a 16-digit number or the name is something like "CreditCard" or "CC#", then we have a high likelihood of a match. Of course, the effectiveness of each product will vary depending on how well the analysis rules are implemented. This remains the most common analysis technique.

2. Labels: Labeling is where data elements are grouped with a tag that describes the data. This can be done at the time the data is created, or tags can be added over time to provide additional information and references to describe the data. In many ways it is just like metadata, but slightly less formal. Some relational database platforms provide mechanisms to create data labels, but this method is more commonly used with flat files, becoming increasingly useful as more firms move to ISAM or quasi-relational data storage, like Amazon's simpleDB, to handle fast-growing data sets. This form of discovery is similar to a Google search, with the greater the number of similar labels, the greater likelihood of a match. Effectiveness is dependent on the use of labels.

3. Content analysis: In this form of analysis, we investigate the data itself by employing pattern matching, hashing, statistical, lexical, or other forms of probability analysis. In the case of our credit card example, when we find a number that resembles a credit card number, a common method is to perform a LUHN check on the number itself. This is a simple numeric checksum used by credit card companies to verify a number is a valid credit card number. If the number we discover passes the LUHN check, then it is a very high probability that we have discovered a credit card number. Content analysis is a growing trend, and one being used successfully in data loss prevention (DLP) and Web content analysis products.

Some discovery tools are available as stand-alone offerings, but most are packaged within other products, such as data masking, configuration management, or vulnerability assessment.

Adrian Lane is an analyst/CTO with Securosis LLC, an independent security consulting practice. Special to Dark Reading. Adrian Lane is a Security Strategist and brings over 25 years of industry experience to the Securosis team, much of it at the executive level. Adrian specializes in database security, data security, and secure software development. With experience at Ingres, Oracle, and ... View Full Bio

Comment  | 
Print  | 
More Insights
Comments
Newest First  |  Oldest First  |  Threaded View
Data Leak Week: Billions of Sensitive Files Exposed Online
Kelly Jackson Higgins, Executive Editor at Dark Reading,  12/10/2019
Intel Issues Fix for 'Plundervolt' SGX Flaw
Kelly Jackson Higgins, Executive Editor at Dark Reading,  12/11/2019
Register for Dark Reading Newsletters
White Papers
Video
Cartoon
Current Issue
The Year in Security: 2019
This Tech Digest provides a wrap up and overview of the year's top cybersecurity news stories. It was a year of new twists on old threats, with fears of another WannaCry-type worm and of a possible botnet army of Wi-Fi routers. But 2019 also underscored the risk of firmware and trusted security tools harboring dangerous holes that cybercriminals and nation-state hackers could readily abuse. Read more.
Flash Poll
Rethinking Enterprise Data Defense
Rethinking Enterprise Data Defense
Frustrated with recurring intrusions and breaches, cybersecurity professionals are questioning some of the industrys conventional wisdom. Heres a look at what theyre thinking about.
Twitter Feed
Dark Reading - Bug Report
Bug Report
Enterprise Vulnerabilities
From DHS/US-CERT's National Vulnerability Database
CVE-2019-5252
PUBLISHED: 2019-12-14
There is an improper authentication vulnerability in Huawei smartphones (Y9, Honor 8X, Honor 9 Lite, Honor 9i, Y6 Pro). The applock does not perform a sufficient authentication in a rare condition. Successful exploit could allow the attacker to use the application locked by applock in an instant.
CVE-2019-5235
PUBLISHED: 2019-12-14
Some Huawei smart phones have a null pointer dereference vulnerability. An attacker crafts specific packets and sends to the affected product to exploit this vulnerability. Successful exploitation may cause the affected phone to be abnormal.
CVE-2019-5264
PUBLISHED: 2019-12-13
There is an information disclosure vulnerability in certain Huawei smartphones (Mate 10;Mate 10 Pro;Honor V10;Changxiang 7S;P-smart;Changxiang 8 Plus;Y9 2018;Honor 9 Lite;Honor 9i;Mate 9). The software does not properly handle certain information of applications locked by applock in a rare condition...
CVE-2019-5277
PUBLISHED: 2019-12-13
Huawei CloudUSM-EUA V600R006C10;V600R019C00 have an information leak vulnerability. Due to improper configuration, the attacker may cause information leak by successful exploitation.
CVE-2019-5254
PUBLISHED: 2019-12-13
Certain Huawei products (AP2000;IPS Module;NGFW Module;NIP6300;NIP6600;NIP6800;S5700;SVN5600;SVN5800;SVN5800-C;SeMG9811;Secospace AntiDDoS8000;Secospace USG6300;Secospace USG6500;Secospace USG6600;USG6000V;eSpace U1981) have an out-of-bounds read vulnerability. An attacker who logs in to the board m...