First, as you did with storage locations, determine which services are in use. Circulate a list to department heads and key users, and check with legal, IT, and accounting. You'll find some services in use that aren't "official," so you'll have to work with the end users and departments on those. The big ones, such as Salesforce, you can probably find through accounting or legal, since they leave a trail of licenses and budget. Once you know which services your organization is using, determine the best approach for each.
A good way to start is to monitor the network for data patterns. Pull out your trusty regex again and add it to your IDS or IPS -- or install a DLP solution to monitor network traffic for sensitive data. If the service uses encryption, you won't see much, but it's a start.
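To make that concrete, here is a minimal Python sketch of the kind of pattern matching a DLP tool or IDS rule performs. The regexes and sample payloads are illustrative assumptions, not production-grade signatures, and in practice the payloads would come from a network tap, IDS alert, or pcap export.

```python
import re

# Illustrative patterns only -- tune them for your environment and expect
# false positives (the card pattern, for instance, does no Luhn check).
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def scan_payload(payload: str) -> list[str]:
    """Return the names of any sensitive-data patterns found in a payload."""
    return [name for name, rx in PATTERNS.items() if rx.search(payload)]

if __name__ == "__main__":
    samples = [
        "order confirmation sent to jane@example.com",
        "SSN on file: 123-45-6789",
    ]
    for text in samples:
        hits = scan_payload(text)
        if hits:
            print(f"ALERT {hits}: {text[:60]}")
```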
Identifying unencrypted, approved transactions is a good first step. We need to know data is protected when we share or move it. During this process, you might discover unapproved or unknown services and data transfers. You’ll be surprised what you find floating over email, IM, FTP, and HTTP.
Now the real work starts. If you can’t find the data traversing the network, how can you find data in outsourced services? You have a few options, though none of them is great. First, if the services have APIs that can be used to search data, get out your favorite IDE (I’m hardcore, so I code in vi), and whip up a script to search the service, report on data found, or take some action.
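For illustration, here is a rough Python sketch of such a script. The endpoint, token, pagination scheme, and response fields are all hypothetical stand-ins -- check your provider's actual API documentation before borrowing any of it.

```python
import re
import requests  # third-party; pip install requests

SENSITIVE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # SSN-style pattern, as an example

# Hypothetical endpoint and credentials -- substitute the service's real search API.
API_URL = "https://api.example-service.com/v1/documents"
TOKEN = "REPLACE_ME"

def find_sensitive_documents():
    """Page through a (hypothetical) document API and flag records that match."""
    headers = {"Authorization": f"Bearer {TOKEN}"}
    page = 1
    while True:
        resp = requests.get(API_URL, headers=headers,
                            params={"page": page}, timeout=30)
        resp.raise_for_status()
        docs = resp.json().get("items", [])
        if not docs:
            break
        for doc in docs:
            if SENSITIVE.search(doc.get("content", "")):
                print(f"Sensitive data found in document {doc.get('id')}")
        page += 1

if __name__ == "__main__":
    find_sensitive_documents()
```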
If there is no API, or if you don't have the coding skills, then manual searching and auditing might be in your future. Logging in as an administrator and running queries against stored data can yield the same results, but it takes more time and resources. Make this your last resort -- you don't want to spend all day searching cloud services unless you have no other choice.
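Even on the manual route, a quick ad hoc script can speed up the auditing. The sketch below assumes a local SQLite export with a hypothetical customers table and notes column; against a production database you would use the appropriate driver and point it at the real schema.

```python
import re
import sqlite3  # stand-in; in practice use your DBMS driver (psycopg2, etc.)

CARD = re.compile(r"\b(?:\d[ -]?){13,16}\b")

# Database, table, and column names here are assumptions -- adjust to the real schema.
conn = sqlite3.connect("crm_export.db")
for row_id, notes in conn.execute("SELECT id, notes FROM customers"):
    if notes and CARD.search(notes):
        print(f"Possible card number in customers.id={row_id}")
conn.close()
```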
As you’re searching all of these locations and finding sensitive data, two things will likely occur to you. First, how can you prevent data from being stored in so many places? Second, how should you protect the sensitive data that you've found? Let’s tackle these in reverse order.
Protecting the data will help control its flow. Each type of data poses a different risk to the organization if it is lost, destroyed, compromised, or exposed. Understand the risk each data type poses and align your protections to fit those risks. If the data is highly valuable, then encryption and/or restricted access could be required.
Start with access restrictions. Restrict access to the storage locations on a "need to know" basis, limiting both the level of rights granted and the set of users who actually need the data. The smaller the group and the lower the privileges, the less risk of compromise. Access controls can be applied at the file, directory, database, or application level. Use whatever makes sense.
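As a simple illustration, here is a Python sketch that locks a hypothetical sensitive directory down to its owner on a POSIX system. In practice you would more likely use group ACLs or your platform's native permission tooling rather than a blanket chmod.

```python
import stat
from pathlib import Path

# Hypothetical sensitive share -- replace with the real location.
SENSITIVE_DIR = Path("/srv/finance/reports")

# Owner-only access: 0700 on directories, 0600 on files.
for path in [SENSITIVE_DIR, *SENSITIVE_DIR.rglob("*")]:
    if path.is_dir():
        path.chmod(stat.S_IRWXU)
    else:
        path.chmod(stat.S_IRUSR | stat.S_IWUSR)
```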
Once you have identified the proper storage location for your sensitive data and restricted access to it, encryption might be the next step. There is a slew of possibilities, ranging from file-level encryption to full-disk encryption. Modern database management systems support encryption natively; add an external key management product, and you can build a solid solution.
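For the file-level end of that range, here is a minimal sketch using the Python cryptography library's Fernet recipe. The file name is hypothetical, and the key is generated in place only for demonstration -- in a real setup it would be fetched from your key management system, never stored beside the data.

```python
from cryptography.fernet import Fernet  # third-party; pip install cryptography

# Stand-in for a key retrieved from an external key management system.
key = Fernet.generate_key()
cipher = Fernet(key)

# Encrypt a (hypothetical) sensitive export and write it back out.
with open("payroll_export.csv", "rb") as f:
    ciphertext = cipher.encrypt(f.read())

with open("payroll_export.csv.enc", "wb") as f:
    f.write(ciphertext)

# Later, with the same key: plaintext = Fernet(key).decrypt(ciphertext)
```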
If, for some reason, this doesn’t work for you, there are file system-level encryption products that allow you to drop your database -- and other files -- into an encrypted mount point. From there, you can control which processes and users will be allowed to access the decrypted data.
Never forget to provide your users with a method of encrypting data when stored locally or on the network. I know we all say never to store sensitive data locally, but let's face it: Sometimes there is a need for local storage. Provide a method and guidance for protecting that data.
Using a commercial full-disk encryption product might be the solution for local encryption; in some cases, the open-source TrueCrypt does the job. I like full-disk encryption because I don't have to worry about whether the user stored the file in the correct folder -- everything on the disk is encrypted. Others argue that full-disk encryption is overkill because the user really needs to encrypt only a tiny subset of the files stored on the system. To each his own.
With access controls and encryption in place, we also gain a way to monitor the data for unauthorized access. If network monitoring doesn't catch someone transferring data, the access logs associated with the storage location should help us identify any access attempts.
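A small log review script can help here. The sketch below assumes a simple CSV-style access log and a hypothetical list of approved accounts; adjust both to whatever your file server or database actually emits.

```python
import csv

# Assumed log format: timestamp,user,path,action -- adjust to your environment.
APPROVED_USERS = {"finance_app", "jsmith", "backup_svc"}  # hypothetical accounts

with open("sensitive_share_access.log", newline="") as f:
    for ts, user, path, action in csv.reader(f):
        if user not in APPROVED_USERS:
            print(f"{ts}: unexpected access by {user} -> {path} ({action})")
```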
After we have identified and protected the approved locations for storing sensitive data, it's time to tackle the rogue locations. Identify these locations, send notifications to their owners, and work with each to move the data to the approved locations -- or bring the existing locations into compliance.
If the owners of the rogue locations don’t listen or respond, then get buy-in from your management. Then comes the fun part. Find the location, restrict all access, and wait to see who complains. When they complain, inform them that the data violates policy and must be stored in the approved location or in an approved method.
Now we have all of our data identified, stored, and protected. The last thing to tackle is retention and documentation. Be sure that your organization documents the data types, the control types required for each, and where each data type can be stored. These guidelines should be high-level enough to allow flexibility while still keeping sensitive data types secure.
Data retention is a key element of protecting sensitive data. One of the easiest ways to reduce data risk is to collect and store only the data your organization really uses. Purge old data or back it up and store it offline. By reducing the amount of data that's easily accessible, organizations naturally decrease risk. Establish purge policies for customer data that’s no longer required. Purge old, unused files on "public" shared drives. Be sure to clean up data stored with external vendors and cloud services.
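A scheduled cleanup job is one way to enforce that. The Python sketch below flags files on a hypothetical shared drive that fall outside an assumed three-year retention window; a real job would archive candidates offline before deleting anything.

```python
import time
from pathlib import Path

RETENTION_DAYS = 365 * 3           # assumed policy; set per data type
SHARE = Path("/srv/public_share")  # hypothetical "public" shared drive
cutoff = time.time() - RETENTION_DAYS * 86400

for path in SHARE.rglob("*"):
    if path.is_file() and path.stat().st_mtime < cutoff:
        # In practice: move to the offline archive first, then delete.
        print(f"Candidate for purge/archive: {path}")
```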
Finding data across the organization and external services is not an easy task. There are many places to look and many technologies to work with. Start small, learn what you can about data usage, storage, and types, and gradually tackle the problem. If you can, use commercial products that continually expand their scope and capabilities. Open-source solutions, single-purpose tools, or custom scripts could also work -- but require more commitment.