Keep in mind that what we are talking about is applying security at the data layer, because we can't make assumptions about the security of our environment in either public or private clouds. Further, the technologies we use will be very different depending on the type of cloud service (SaaS, IaaS, or PaaS) and the type of database (traditional relational databases, database-as-a-service platforms, or indexed flat files). For the majority of traditional relational database platforms, in public or private clouds, platform-as-a-service environments will be the deployment model of choice.
The first phase in the data security life cycle is to classify sensitive data as it's "created." Created, in this context, can mean several things. It can mean the moment data is literally created, the point at which data is moved from traditional IT systems into a cloud database, or the discovery process of looking for data already residing in a cloud database. In essence, we discover what kind of data we have so we can define the proper security controls. In all cases we are creating a catalog of sensitive information, noting type and location, and designating how data will be protected.
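As a rough illustration of the catalog just described, the sketch below records the three things mentioned: type, location, and intended protection. The class name, field names, and sample values are all hypothetical and chosen only for this example; a real catalog would likely carry more metadata.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class CatalogEntry:
    """One discovered item of sensitive data (field names are illustrative)."""
    data_type: str   # kind of data, e.g. "ssn"
    location: str    # where it lives, e.g. "hr.employees.ssn"
    protection: str  # planned control, e.g. "label_security"


def build_catalog(findings):
    """Turn (data_type, location, protection) tuples into a de-duplicated,
    location-sorted list of CatalogEntry objects."""
    entries = {CatalogEntry(*finding) for finding in findings}
    return sorted(entries, key=lambda e: (e.location, e.data_type))


# Hypothetical findings; the duplicate is collapsed by build_catalog.
catalog = build_catalog([
    ("ssn", "hr.employees.ssn", "label_security"),
    ("email", "crm.contacts.email", "encryption"),
    ("ssn", "hr.employees.ssn", "label_security"),
])
print(json.dumps([asdict(entry) for entry in catalog], indent=2))
```

Keeping the catalog as plain records makes it easy to export and to compare before and after a migration.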
If that sounds like the starting point for other security processes you have read about in the past, it is. But how you do it is very different. In many cases you don't have easy and direct access to the information you need. Many discovery, classification, and some rights management tools cannot be deployed in the cloud. Before you move data into the cloud, even before you create your policies and controls, you have to determine how to scan and classify data. For cataloging data as it is moved into the cloud:
* Use DLP and content discovery tools to locate and classify unstructured data. Specify tags that will identify flat-file contents and direct rights management. Apply tags as you move data into the cloud.
* Use data "crawlers" to scan relational databases for sensitive information. Define labels for relational cloud databases. All of the major relational platforms have label security options that you can leverage, but you want to add these to the cloud schema prior to data migration. You'll most likely do bulk inserts, so labels and security policies must be defined first. Plan on writing post-insertion scripts to apply controls and to verify that labels and access controls are in place.
* Use DRM, or encryption, to protect data where labeling technologies and access controls are insufficient. Different encryption keys can be used to help segregate who can access information, but this will need to be built into the application layer.
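The crawler idea in the list above can be sketched in a few lines. This is a minimal example under stated assumptions, not a replacement for a commercial discovery tool: it uses Python's built-in `sqlite3` module as a stand-in for whatever relational platform you run, and the `PATTERNS` table, the `crawl` function, and the sample schema are all hypothetical.

```python
import re
import sqlite3

# Illustrative patterns only; real discovery tools use far richer detection.
PATTERNS = {
    "ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
}


def crawl(conn, sample_size=100):
    """Return {(table, column): label} for every column whose sampled,
    non-NULL values all match one of the patterns above."""
    findings = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    for table in tables:
        # PRAGMA table_info returns one row per column; index 1 is the name.
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        for column in columns:
            rows = conn.execute(
                f'SELECT "{column}" FROM "{table}" '
                f'WHERE "{column}" IS NOT NULL LIMIT ?',
                (sample_size,)).fetchall()
            values = [str(row[0]) for row in rows]
            if not values:
                continue
            for label, pattern in PATTERNS.items():
                if all(pattern.fullmatch(value) for value in values):
                    findings[(table, column)] = label
                    break
    return findings


# Hypothetical schema and data, held in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, ssn TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(1, "Alice", "123-45-6789", "alice@example.com"),
     (2, "Bob", "987-65-4321", "bob@example.com")])
print(crawl(conn))
```

The same function could be rerun after a bulk insert as part of a post-insertion check, comparing its findings against the labels you defined before migration.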
That's the easy part. It's harder for those of you who are already in the cloud, especially those using SaaS. For data that already resides within the cloud:
* SaaS: This is one of the harder problems to solve because the database is hidden from you by design.
Determine whether your SaaS service will provide you with a database schema -- with column definitions -- to assist in locating sensitive information. This is a manual process of examining the storage structure to find what data needs to be protected. Because your service provider might not allow traditional data discovery tools in its multitenant environment, request an archive of your data and use scanning tools to locate sensitive information. You'll manually review CSV files to determine which columns contain sensitive information. You will need to work with the provider to see what options for encryption, label security, or authorization mapping are available. You will need to review, and possibly modify, your service-level agreements (SLAs) to enforce security controls.
* IaaS: In this case, IaaS equates to database-as-a-service. The good news is you have more access to the database infrastructure, so you can discover database structure, system catalog, and content. The bad news is most databases provided as a service are not the same relational platforms you know and love. This means many of the discovery, rights management, and classification tools you use today are not supported. Column and table encryption, label security, and discovery tools are not provided.
As with SaaS, analyze archives to locate sensitive information, or scan database contents and structure with your own scripts. For tagged file or ISAM storage, use tags to designate classification. Work with your provider to determine what authorization facilities are available for fine-grained access controls. You will need to modify your queries to inspect the tags in order to mimic labeling capabilities, but for now we are focused on finding and tagging data so we know how we will protect information in subsequent steps.
* PaaS: Platform-as-a-service offers great flexibility because you can use the same databases in the cloud that you are already familiar with. And all of the existing tools for discovery, classification, and rights management will still work. The biggest problem is finding all of the data. Many deployments leverage global resources to vertically partition databases, deploy multinode grids, or even mirror data. Most use unstructured storage to take database snapshots in lieu of traditional tape archives. Data gets strewn everywhere, so you need to take care in understanding the complete data life cycle prior to discovery.
If it seems like we are doing a lot of work before we even get data into the cloud, we are. For most data centers, security has been an evolutionary process, with controls added in bits and pieces over time. For cloud deployments, we are not only moving the data center, but altering many of the underlying assumptions, so we need to rethink data usage and security controls.
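For the archive review described under SaaS and IaaS above, a small script can take some of the manual effort out of deciding which CSV columns hold sensitive data. The sketch below is one possible approach, not a provider-supplied tool: the patterns, the `sensitive_columns` function, the 80 percent threshold, and the sample archive are all assumptions made for illustration.

```python
import csv
import io
import re

# Illustrative patterns only; extend them for the data types you care about.
PATTERNS = {
    "card_number": re.compile(r"\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}"),
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
}


def sensitive_columns(csv_text, threshold=0.8):
    """Map column name -> label when at least `threshold` of the column's
    non-empty values match that label's pattern."""
    totals = {}  # column -> count of non-empty values
    hits = {}    # (column, label) -> count of matching values
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            # Skip overflow fields (key None) and missing or empty values.
            if column is None or not value or not value.strip():
                continue
            value = value.strip()
            totals[column] = totals.get(column, 0) + 1
            for label, pattern in PATTERNS.items():
                if pattern.fullmatch(value):
                    hits[(column, label)] = hits.get((column, label), 0) + 1
    return {column: label
            for (column, label), count in hits.items()
            if count / totals[column] >= threshold}


# Hypothetical archive contents, as a provider might export them.
ARCHIVE = """account,contact,card,notes
1001,ann@example.com,4111-1111-1111-1111,renewal due
1002,bob@example.com,5500 0000 0000 0004,
1003,,4012888888881881,called twice
"""
print(sensitive_columns(ARCHIVE))
```

A threshold below 100 percent tolerates the occasional malformed value; the flagged columns are candidates for review, not a final classification.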
Adrian Lane is an analyst/CTO with Securosis LLC, an independent security consulting practice. Special to Dark Reading.