Attacks/Breaches
5/20/2011
01:03 PM

Tech Insight: Finding And Securing Your Enterprise's Most Sensitive Data

The headlines are full of companies facing serious breaches. Here are some basic steps to protect your enterprise's critical data -- and stay out of the news

No matter what your business, information is likely one of your most valuable assets -- both to you and to the attacker. With so many breaches in the news -- from Sony to Epsilon to Heartland Payment Systems -- we all must understand what sensitive data we have, the risk associated with the data, and how to protect it.

But before we can do anything, we must know what sensitive data we have and where it's stored. We can’t protect what we don’t know about. Finding the data could be very simple or hugely complex, depending on your organization.

The first step is to list all of the sensitive data types your organization handles: employee personal information, customers' personally identifiable information, cardholder data, medical records, and corporate intellectual property, such as source code and transaction information. These data types will hold different risks for different companies -- if you're a software company, for example, the loss of source code is more damaging than the loss of externally regulated data.

Once you have a list of what you believe is critical data, ask department heads to add any other data types they know of or believe should be included. Use this as an opportunity to also ask teams to identify the places where these data types are utilized. At this point, all data types should be added to the list -- later, you’ll filter and prioritize based on risk and which you can most easily protect.

Once you know what sensitive data you're looking for, you'll inevitably find it in places where it shouldn’t be stored. During a data audit years ago, I found credit card information in the /tmp directory on a server. A production support staff member had been debugging a data load problem that couldn’t be reproduced in staging, so he dumped the data load into the /tmp directory to compare its structure against that of staging. After finding the problem, he forgot to remove the dump, leaving it there for anyone who accessed the server.

After you've talked with your people about what types of data to look for and where it might be found, the best way to find sensitive data is to scan and monitor for it. Most data in your organization can be fingerprinted in a way that allows for searching. For instance, credit card numbers and Social Security numbers follow predefined, well-documented formats.
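As a rough illustration of fingerprinting by format, here is a minimal Python sketch. It assumes card numbers are 13 to 16 digits, optionally separated by spaces or hyphens, and that Social Security numbers appear in the NNN-NN-NNNN layout. The function names are invented for this example. The Luhn checksum, which payment card numbers carry, is used to discard digit runs that merely look like card numbers.

```python
import re

# Assumption: 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"(?<!\d)\d(?:[ -]?\d){12,15}(?!\d)")
# Assumption: SSNs written in the common NNN-NN-NNNN layout.
SSN_RE = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_cards(text: str) -> list:
    """Return candidate card numbers (separators stripped) that pass Luhn."""
    hits = []
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_ok(digits):
            hits.append(digits)
    return hits


def find_ssns(text: str) -> list:
    """Return strings shaped like Social Security numbers."""
    return SSN_RE.findall(text)
```

A format match alone is only a lead. The checksum removes many false positives for card numbers, but SSN-shaped strings still need a human or contextual check.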

A quick web search turns up dozens of pages that show you how to search for these data types, using everything from OpenDLP to simple Perl, PHP, and Python scripts. Custom data types, such as source code, typically can be fingerprinted and searched using header information added to the code at check-in; standard comments that may apply to all files, such as copyright information; or by searching for common strings that appear in the code, such as variable names or custom include files.

If your source code doesn’t have something unique defined in every file checked in, then add a unique signature to every file so that it can be searched for in the future. This process can be automated through most source control software. Utilizing OpenDLP or a commercial alternative to perform searches -- rather than writing your own scripts -- is a great way to start. Of course, OpenDLP is Windows-centric, so other solutions might be required.
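One way to add such a signature can be sketched in a few lines of Python. The marker string, the function names, and the choice of "#" as the comment prefix are all assumptions made for illustration. A script like this could be run from whatever check-in automation your source control system provides.

```python
from pathlib import Path

# Hypothetical marker; pick a string that will not occur by accident.
SIGNATURE = "EXAMPLECORP-CONFIDENTIAL-7f3a9c"


def stamp(text: str, comment_prefix: str = "#") -> str:
    """Return text with the signature comment added, unless already present."""
    if SIGNATURE in text:
        return text
    header = f"{comment_prefix} {SIGNATURE}\n"
    if text.startswith("#!"):
        # Keep a shebang line first so scripts stay executable.
        first, _sep, rest = text.partition("\n")
        return first + "\n" + header + rest
    return header + text


def stamp_tree(root, suffix: str = ".py", comment_prefix: str = "#") -> list:
    """Stamp every file under root with the given suffix; return changed paths."""
    changed = []
    for path in Path(root).rglob(f"*{suffix}"):
        original = path.read_text(encoding="utf-8")
        stamped = stamp(original, comment_prefix)
        if stamped != original:
            path.write_text(stamped, encoding="utf-8")
            changed.append(path)
    return changed
```

Because the stamp is skipped when the marker is already present, the script can be run repeatedly without piling up headers.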

What if you need to search for data that is in an uncommon format? Easy. Find the pattern, break out your favorite regular expression helper, like Reggy for OS X, or call your resident regex guru and build your own regex to reduce false positives.
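To make the false-positive point concrete, here is a small Python sketch built around an invented identifier format: "PT", a two-digit year, a hyphen, and six digits (for example, PT11-004521). The format and the names LOOSE, STRICT, and find_ids are hypothetical. The point is only that anchoring the pattern at both ends keeps it from firing inside longer strings.

```python
import re

# Hypothetical format: "PT" + two-digit year + "-" + six digits.
LOOSE = re.compile(r"PT\d{2}-\d{6}")

# Same pattern, but it must not be glued to a preceding letter or digit
# or followed by another digit.
STRICT = re.compile(r"(?<![A-Za-z0-9])PT\d{2}-\d{6}(?!\d)")


def find_ids(text: str, pattern=STRICT) -> list:
    """Return all substrings of text that match the given pattern."""
    return pattern.findall(text)
```

On a string such as "RPT11-0045219", the loose pattern reports a hit buried inside a longer token, while the strict pattern reports nothing.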

There are two types of data that you will run into when searching: structured and unstructured. Within these, there will be countless formats -- and these formats will be the bane of your search. Structured data is data stored in a known format -- such as in a database. Unstructured data is data that is stored in unpredictable or varied formats, such as Word files, text files, Excel files, or any other ad hoc format.

Searching plain text files, XML files, and databases is pretty straightforward. For a database, you connect and run a query across each table. For flat files, you search the file system, iterate over each line of each file, and you’re pretty much done.
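Both approaches can be sketched in Python using only the standard library. This is a minimal illustration, not a finished scanner. SQLite stands in for whatever database you actually run, the SSN pattern is just one example fingerprint, and the function names are made up for this sketch.

```python
import os
import re
import sqlite3

# Example fingerprint: SSN-shaped strings (NNN-NN-NNNN).
SSN_RE = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")


def scan_files(root, pattern=SSN_RE):
    """Yield (path, line_number) for each matching line in files under root."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if pattern.search(line):
                            yield path, lineno
            except OSError:
                continue  # unreadable file; skip it


def scan_sqlite(db_path, pattern=SSN_RE):
    """Yield (table, column) pairs in which at least one text value matches."""
    conn = sqlite3.connect(db_path)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        for table in tables:
            # Table names come from the database's own catalog, not from users.
            cur = conn.execute(f'SELECT * FROM "{table}"')
            columns = [desc[0] for desc in cur.description]
            flagged = set()
            for row in cur:
                for col, value in zip(columns, row):
                    if isinstance(value, str) and pattern.search(value):
                        flagged.add(col)
            for col in sorted(flagged):
                yield table, col
    finally:
        conn.close()
```

Reading every row of every table is slow on large databases, so in practice you would likely sample rows or limit the scan to text columns.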

Archives, Word files, Excel files, and data stored externally will cause more of a problem -- not to mention the data types that aren’t well-documented or predictable. If you’re trying to roll your own solution, then Perl, Python, Java, and just about every other language offer libraries for reading common file formats. All of the commercial data loss prevention (DLP) and search products can identify common file formats, too.
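As a small example of the roll-your-own route, Python's standard zipfile module can open ZIP archives, and the newer .docx and .xlsx Office formats are themselves ZIP containers of XML parts. The sketch below, with an invented function name and an SSN-shaped example pattern, searches the raw bytes of each member and descends into nested archives up to a fixed depth.

```python
import io
import re
import zipfile

# Example fingerprint as a bytes pattern: SSN-shaped strings.
SSN_RE = re.compile(rb"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")


def scan_zip(source, pattern=SSN_RE, _depth=0, max_depth=3):
    """Return names of archive members whose bytes match the pattern.

    source is a path or a binary file-like object. Nested archives are
    reported as "outer.zip!inner_member".
    """
    hits = []
    with zipfile.ZipFile(source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Reads the whole member into memory; cap sizes for untrusted input.
            data = zf.read(info)
            if _depth < max_depth and zipfile.is_zipfile(io.BytesIO(data)):
                nested = scan_zip(io.BytesIO(data), pattern, _depth + 1, max_depth)
                hits.extend(f"{info.filename}!{name}" for name in nested)
            elif pattern.search(data):
                hits.append(info.filename)
    return hits
```

A raw byte search is only a first pass. Word processors can split a single value across several XML elements, and older binary .doc and .xls files are not ZIP containers at all, so a format-aware parser or a commercial product will catch more.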

What if your sensitive data is stored in the cloud? Many enterprises don't have good visibility into data that is stored and shared externally -- whether it's a cloud services provider or a third-party service, such as the ones that might be managing your payroll or benefits.

Next: How to find data in the cloud.
