Tech Insight: Finding And Securing Your Enterprise's Most Sensitive Data
The headlines are full of companies facing serious breaches. Here are some basic steps to protect your enterprise's critical data -- and stay out of the news
No matter what your business, information is likely one of your most valuable assets -- both to you and to the attacker. With so many breaches in the news -- from Sony to Epsilon to Heartland Payment Systems -- we all must understand what sensitive data we have, the risk associated with the data, and how to protect it.
But before we can do anything, we must know what sensitive data we have and where it's stored. We can’t protect what we don’t know about. Finding the data could be very simple or hugely complex, depending on your organization.
More Security Insights
- Forrester Study: The Total Economic Impact of VMware View
- Securing Executives and Highly Sensitive Documents of Corporations Globally
The first step is to list all of the sensitive data types your organization handles: employee personal information, customers' personally identifiable information, cardholder data, medical records, and corporate intellectual property, such as source code and transaction information. These data types will hold different risks for different companies -- if you're a software company, for example, the loss of source code is more damaging than the loss of externally regulated data.
Once you have a list of what you believe is critical data, ask department heads to add any other data types they know of or believe should be included. Use this as an opportunity to also ask teams to identify the places where these data types are utilized. At this point, all data types should be added to the list -- later, you’ll filter and prioritize based on risk and which you can most easily protect.
Once you know what sensitive data you're looking for, you'll inevitably find it in places where it shouldn’t be stored. During a data audit years ago, I found credit card information in the /tmp directory on a server. A production support staff member was debugging a data load that couldn’t be reproduced in staging and dumped the data load into the /tmp directory to review the data structure against that of staging. Once finding the problem, he forgot to remove the dump, thus leaving it there for anyone who accessed the server.
After you've talked with your people about what types of data to look for and where it might be found, the best way to find sensitive data is to scan and monitor for it. Most data in your organization can be fingerprinted in a way that allows for searching. For instance, credit card numbers and Social Security numbers follow a predefined format that’s well-documented.
There are literally dozens of pages on Google that show you how to search for these data types, utilizing everything from OpenDLP to simple Perl, PHP, and Python scripts. Custom data types, such as source code, typically can be fingerprinted and searched using header information added to the code on check in; standard comments that may apply to all files, such as copyright information; or by searching for common strings that appear in the code, such as variable names or custom include files.
If your source code doesn’t have something unique defined in every file checked in, then add a unique signature to every file so that it can be searched for in the future. This process can be automated through most source control software. Utilizing OpenDLP or a commercial alternative to perform searches -- rather than writing your own scripts -- is a great way to start. Of course, OpenDLP is Windows-centric, so other solutions might be required.
What if you need to search for data that is in an uncommon format? Easy. Find the pattern, break out your favorite regular expression helper, like Reggy for OS X, or call your resident regex guru and build your own regex to reduce false positives.
There are two types of data that you will run into when searching: structured and unstructured. Within these, there will be countless formats -- and these formats will be the bane of your search. Structured data is data stored in a known format -- such as in a database. Unstructured data is data that is stored in unpredictable or various formats, such as Word files, text files, Excel files, or any other random format.
Searching plain text files, XML files, and databases is pretty straightforward. You connect to the database and run a query across each table or for flat file. Search the file system, iterate over each line of each file, and you’re pretty much done.
Archives, Word files, Excel files, and data stored externally will cause more of a problem -- not to mention the data types that aren’t well-documented or predictable. If you’re trying to roll your own solution, then Perl, Python, Java, and just about every other language offer libraries of common file formats. All of the commercial data loss prevention (DLP) and search products can identify common file formats, too.
What if your sensitive data is stored in the cloud? Many enterprises don't have good visibility into data that is stored and shared externally -- whether it's a cloud services provider or a third-party service, such as the ones that might be managing your payroll or benefits.
Next: How to find data in the cloud.