Welcome Guest. | Log In| Register | Membership Benefits
Dark Reading's security-views Weblog

Topics:   Database Security Tech Center : Security Views
  • Email this page E-mail this page
  • |  Print Print this page
  • |   Bookmark and Share

What Data Discovery Tools Really Do

Data discovery tools are becoming increasingly necessary for getting a handle on where sensitive data resides. When you have a production database schema with 40,000 tables, most of which are undocumented by the developers who created them, finding information within a single database is cumbersome. Now multiply that problem across financial, HR, business processing, testing, and decision support databases -- and you have a big mess.

Jan 20, 2010 | 09:00 AM | 

By Adrian Lane
Dark Reading
Data discovery tools are becoming increasingly necessary for getting a handle on where sensitive data resides. When you have a production database schema with 40,000 tables, most of which are undocumented by the developers who created them, finding information within a single database is cumbersome. Now multiply that problem across financial, HR, business processing, testing, and decision support databases -- and you have a big mess.And, honestly, criminals don't really care whether they steal data from production servers or test machines -- whichever is easier. Whether your strategy is to remove, mask, encrypt, or secure sensitive data, you cannot act until you know where it is.

So how do these tools work? Let's say you want to find credit card numbers. Data discovery tools for databases use a couple of methods to find and then identify information. Most use special login credentials to scan internal database structures, itemize tables and columns, and then analyze what was found. Three basic analysis methods are employed:

1. Metadata: Metadata is data that describes data, and all relational databases store metadata that describes tables and column attributes. In our credit card example, we examine column attributes to determine whether the name of the column, or the size and data type, resembles a credit card number. If the column is a 16-digit number or the name is something like "CreditCard" or "CC#", then we have a high likelihood of a match. Of course, the effectiveness of each product will vary depending on how well the analysis rules are implemented. This remains the most common analysis technique.

2. Labels: Labeling is where data elements are grouped with a tag that describes the data. This can be done at the time the data is created, or tags can be added over time to provide additional information and references to describe the data. In many ways it is just like metadata, but slightly less formal. Some relational database platforms provide mechanisms to create data labels, but this method is more commonly used with flat files, becoming increasingly useful as more firms move to ISAM or quasi-relational data storage, like Amazon's simpleDB, to handle fast-growing data sets. This form of discovery is similar to a Google search, with the greater the number of similar labels, the greater likelihood of a match. Effectiveness is dependent on the use of labels.

3. Content analysis: In this form of analysis, we investigate the data itself by employing pattern matching, hashing, statistical, lexical, or other forms of probability analysis. In the case of our credit card example, when we find a number that resembles a credit card number, a common method is to perform a LUHN check on the number itself. This is a simple numeric checksum used by credit card companies to verify a number is a valid credit card number. If the number we discover passes the LUHN check, then it is a very high probability that we have discovered a credit card number. Content analysis is a growing trend, and one being used successfully in data loss prevention (DLP) and Web content analysis products.

Some discovery tools are available as stand-alone offerings, but most are packaged within other products, such as data masking, configuration management, or vulnerability assessment.

Adrian Lane is an analyst/CTO with Securosis LLC, an independent security consulting practice. Special to Dark Reading.



Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dark Reading encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dark Reading moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing/SPAM. Dark Reading further reserves the right to disable the profile of any commenter participating in said activities.

Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
Subscribe to RSS









  1. Cookies, Social Media And FireSheep
  2. SMB Guide To Credit Card Regulations, Part 2: The Low-Hanging Fruit
  3. HP And The Scary Corporate Fifth Column Concept
  4. Taking USB Attacks To The Next Level
  5. NoSQL: Not Much, Anyway
  1. Taking Cybersecurity Lessons To The Bank
  2. Researchers See Real-Time Phishing Jump
  3. 'BlackSheep' Sniffs Out Firesheep WiFi-Hacking
  4. Slideshow: Ten Free Security Monitoring Tools
  5. A Different Spin On Sleuthing Stuxnet
  6. M&A Activity Muddles Database Security
  1. Secure Managed Web Hosting Saves 960.gs from Malicious Hackers
  2. Access Governance as a Business Service: An Integrated Strategy for Automation with ITSM
  3. Business Driven Access Management and Governance: Simplifying the Delivery and Governance of Access Throughout
 
 


 
  Ars Technica
Boing Boing
Channel 9 Forums
CRN Blogs
Dr.Dobb's Portal: Blogs
Engadget
Gizmodo
GrokLaw
  Lifehacker
Schneier on Security
Slashdot
TechCrunch
Techdirt
Techmeme
Valleywag
 
  February 2012
January 2012
December 2011
November 2011
October 2011
September 2011
August 2011
July 2011
June 2011
May 2011
April 2011
March 2011
February 2011
January 2011
December 2010
November 2010
October 2010
September 2010
August 2010
July 2010
June 2010
  May 2010
April 2010
March 2010
February 2010
January 2010
December 2009
November 2009
October 2009
September 2009
August 2009
July 2009
June 2009
May 2009
April 2009
March 2009
February 2009
January 2009
December 2008
November 2008
October 2008
September 2008