The first problem is relational databases require that you define the data type prior to storage. VARCHAR() is a common database data type for storing application data, but requires a pre-defined size. Encryption algorithms typically output binary data, whose output length is not known beforehand. This creates a mismatch that requires redefining, and in most cases rebuilding, the database to accommodate encrypted data. The second and more serious issue is you cannot perform queries or functions on encrypted data. You can't check date ranges or make comparisons inside the database when data is encrypted. And you can effectively use indexes to sort and mange data either.
There are several ways encryption is employed today to address these issues, most commonly a) using a form of transparent encryption or b) encrypting at the application layer. With transparent encryption data stored on disk is encrypted, but processed inside the database in clear text. With encryption at the application layer, the app decrypts and processes data locally and uses the database purely as a place to store data.
But what if you don't trust the DBA? Or you just don't trust your cloud service provider? Worse, what if you think the database engine may be compromised by an attacker? I came across a post on Werner Vogels' blog Back-to-the-Future Weekend Reading - CryptDB, where he discusses a research paper on processing encrypted data within a relational database. The idea that is presented in this research paper is "SQL-aware Encryption." The goal is to keep data protected even if the database server and app server have been compromised. Their approach is to provide encryption that still allows normal relational database functions to work.
What does this mean? It means comparisons of two encrypted values like "=", or ">" would work on encrypted data. Database functions and most comparisons operations would continue to work in the scheme being described. SQL queries of the most common types will continue to work as before, so you get full database functionality on encrypted data. That sounds ideal, right? Not so fast.
The concept the authors are trying to duplicate is homomorphic encryption. But there is no true homomorphic encryption available commercially today. What they are in fact doing is using "off-the-shelf" encryption algorithms like AES, only without initialization vectors or nonce to randomize the output of the block cipher. That means when you encrypt the word "SELECT" with a specific key, you get the same binary result every time.
And that makes it a lot easier to guess the encrypted values! Keep in mind that SQL queries have a common structure and finite set of elements. It's fairly easy to pre-compute encrypted values on the words SELECT, FROM, WHERE, MAX, SORT, GROUP BY, DISTINCT, etc. If all data is stored under Bob's schema is encrypted with Bob's single key, text can be guessed by their frequency of occurrence.
So what's going on here is we are sacrificing a degree of security encryption provides us to make it harder for an attacker to steal sensitive information should they compromise the database server, the application server, or both. The degree of security is inverse to the level of utility. The more complex the query operation provided, the less secure the encryption variant. The data won't be sitting in clear text where a malicious party can steal it. However, if the host platform has been compromised, your data is still subject to several types of attack. It's much more likely an attacker will conduct word-frequency attacks and guess the contents of the database -- with a reasonable degree of accuracy. It's more security, but a 'speed-bump' rather than a barrier.
The lesson here is there is no free lunch. If you want strong crypto to preserve the privacy and integrity of data for long periods of time, some of the variations described in CryotDB will not be a good option. It will -- as the paper posits -- raise the bar on data privacy while allowing the relational database platform to still function. There are several small commercial vendors that offer this type of technology today -- with the same basic methods and the same basic flaws. But if you have a database environment you suspect will be compromised, there are better technologies available. Use tokenization or masking to create non-sensitive random copies that also preserve data value and database operations. Those technologies completely remove the risk without the performance penalty or complexity.
Adrian Lane is an analyst/CTO with Securosis LLC, an independent security consulting practice. Special to Dark Reading.