12:06 PM
Adrian Lane

A Look At Encrypted Query Processing

Stupid encryption tricks, only without a funny YouTube video

Encrypting data is one of the most basic -- and most effective -- data security measures we have at our disposal. But when used with relational databases, encryption creates two major problems.

The first problem is that relational databases require you to define the data type prior to storage. VARCHAR() is a common data type for storing application data, but it requires a pre-defined size. Encryption algorithms typically output binary data whose length differs from, and is usually larger than, the original value. This mismatch requires redefining, and in most cases rebuilding, the database to accommodate encrypted data.

The second and more serious issue is that you cannot perform queries or functions on encrypted data. You can't check date ranges or make comparisons inside the database when data is encrypted. Nor can you effectively use indexes to sort and manage data.
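To make the column-size mismatch concrete, here is a small sizing sketch. It assumes AES in CBC mode with PKCS#7 padding, a 16-byte IV stored next to the ciphertext, and Base64 encoding so the binary result fits in a text column. Those are illustrative choices, not anything the article prescribes. With these assumptions the size is predictable, but it is always larger than the plaintext, so the original VARCHAR() definition no longer fits.

```python
import base64

AES_BLOCK = 16  # AES block size in bytes
IV_LEN = 16     # assumption: a random 16-byte IV is stored with each value

def cbc_ciphertext_len(plaintext_len: int) -> int:
    """AES-CBC output length with PKCS#7 padding, which always adds 1..16 bytes."""
    return (plaintext_len // AES_BLOCK + 1) * AES_BLOCK

def varchar_size_needed(plaintext_len: int) -> int:
    """Characters needed to store IV + ciphertext as Base64 text in a VARCHAR column."""
    raw = IV_LEN + cbc_ciphertext_len(plaintext_len)
    # Base64 turns every 3 bytes into 4 ASCII characters, padding the last group
    return len(base64.b64encode(bytes(raw)))

for n in (9, 16, 40):
    print(f"plaintext {n:>2} bytes -> VARCHAR({varchar_size_needed(n)})")
# plaintext  9 bytes -> VARCHAR(44)
# plaintext 16 bytes -> VARCHAR(64)
# plaintext 40 bytes -> VARCHAR(88)
```

A 9-character value that lived happily in VARCHAR(9) now needs VARCHAR(44), which is why schemas usually have to be altered or rebuilt.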

There are several ways encryption is employed today to address these issues, most commonly a) using a form of transparent encryption or b) encrypting at the application layer. With transparent encryption, data stored on disk is encrypted but processed inside the database in clear text. With application-layer encryption, the application decrypts and processes data locally and uses the database purely as a place to store data.

But what if you don't trust the DBA? Or you just don't trust your cloud service provider? Worse, what if you think the database engine itself may be compromised by an attacker? I came across a post on Werner Vogels' blog, "Back-to-the-Future Weekend Reading - CryptDB," in which he discusses a research paper on processing encrypted data within a relational database. The idea presented in the paper is "SQL-aware encryption." The goal is to keep data protected even if the database server and the application server have been compromised. The approach is to provide encryption that still allows normal relational database functions to work.

What does this mean? It means comparisons of two encrypted values, such as "=" or ">", would work on encrypted data. Database functions and most comparison operations would continue to work under the scheme described, so the most common types of SQL queries run as before and you get near-full database functionality on encrypted data. That sounds ideal, right? Not so fast.
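The sketch below shows the core trick: if equal plaintexts always map to equal ciphertexts, the server can evaluate "=" without seeing the data, and if the mapping preserves order, it can evaluate ">". Both functions are toy stand-ins written for illustration, not the schemes the CryptDB paper uses. `det_tag` uses HMAC-SHA256, which is one-way rather than decryptable, and `toy_ope` is a hypothetical order-preserving encoding for small non-negative integers built from a running sum of keyed positive gaps.

```python
import hashlib
import hmac

KEY = b"hypothetical-demo-key"  # illustrative key only

def det_tag(value: str) -> bytes:
    """Deterministic keyed tag: equal inputs give equal outputs, so '=' still works.
    HMAC-SHA256 is a stand-in; unlike real deterministic encryption it cannot be decrypted."""
    return hmac.new(KEY, value.encode(), hashlib.sha256).digest()

def _gap(i: int) -> int:
    """Keyed pseudorandom gap (0..65535) placed before position i."""
    digest = hmac.new(KEY, i.to_bytes(4, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "big")

def toy_ope(x: int) -> int:
    """Toy order-preserving encoding for small integers x >= 0: every gap is at least 1,
    so a < b always implies toy_ope(a) < toy_ope(b)."""
    return sum(1 + _gap(i) for i in range(x + 1))

# The "server" compares encoded values without ever seeing the plaintexts.
print(det_tag("Bob") == det_tag("Bob"))    # True
print(det_tag("Bob") == det_tag("Alice"))  # False
print(toy_ope(1999) < toy_ope(2012))       # True, so a WHERE year > 1999 filter still works
```

The same property that makes these comparisons possible is what leaks information, which is the problem the rest of this article examines.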

The concept the authors are trying to emulate is homomorphic encryption, but no true homomorphic encryption is available commercially today. What they are in fact doing, to support comparisons, is using "off-the-shelf" encryption algorithms like AES without an initialization vector (IV) or nonce to randomize the block cipher's output. That means when you encrypt the word "SELECT" with a specific key, you get the same binary result every time.
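The effect of dropping the IV or nonce can be shown with any cipher. Python's standard library has no AES, so this sketch assumes a toy stream cipher (XOR with SHA-256 of key, nonce, and a block counter) purely for illustration; in practice you would use a vetted cipher such as AES-GCM. With a fixed nonce, "SELECT" encrypts to identical bytes every time. With a fresh random nonce, repeats are hidden, and the nonce is stored alongside the ciphertext so it can still be decrypted.

```python
import hashlib
import os

def toy_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """Toy stream cipher: XOR with SHA-256(key || nonce || counter) keystream blocks.
    Illustration only; not a substitute for a vetted cipher."""
    out = bytearray()
    for offset in range(0, len(plaintext), 32):
        counter = (offset // 32).to_bytes(8, "big")
        keystream = hashlib.sha256(key + nonce + counter).digest()
        chunk = plaintext[offset:offset + 32]
        out += bytes(p ^ k for p, k in zip(chunk, keystream))
    return bytes(out)

toy_decrypt = toy_encrypt  # XOR with the same keystream undoes the encryption

key = b"k" * 32        # hypothetical fixed key
FIXED = bytes(16)      # "no IV": the same all-zero nonce every time

ct1 = toy_encrypt(key, FIXED, b"SELECT")
ct2 = toy_encrypt(key, FIXED, b"SELECT")
print(ct1 == ct2)  # True: identical ciphertexts reveal repeated plaintext

n1, n2 = os.urandom(16), os.urandom(16)
rc1 = toy_encrypt(key, n1, b"SELECT")
rc2 = toy_encrypt(key, n2, b"SELECT")
print(rc1 == rc2)  # False (with overwhelming probability): random nonces hide repeats
print(toy_decrypt(key, n1, rc1))  # b'SELECT'
```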

And that makes it a lot easier to guess the encrypted values! Keep in mind that SQL queries have a common structure and a finite set of elements. It's fairly easy to pre-compute encrypted values of words such as SELECT, FROM, WHERE, MAX, ORDER BY, GROUP BY, and DISTINCT. If all data stored under Bob's schema is encrypted with Bob's single key, its contents can be guessed from how often each value occurs.
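A frequency attack needs no key at all, only deterministic ciphertexts and a rough idea of which plaintexts are common. The sketch below assumes a hypothetical column of US state codes, uses HMAC-SHA256 as a stand-in for deterministic encryption, and gives the attacker an assumed most-to-least-common ranking of the values. Matching ciphertext frequency ranks to that ranking recovers the whole column.

```python
import hashlib
import hmac
from collections import Counter

def det_encrypt(key: bytes, value: str) -> str:
    """Stand-in for deterministic encryption (HMAC-SHA256, truncated for display)."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:12]

def frequency_attack(ciphertexts, expected_ranking):
    """Guess plaintexts by matching frequency ranks.
    expected_ranking: plaintexts the attacker believes are most to least common."""
    ranked = [c for c, _ in Counter(ciphertexts).most_common()]
    return dict(zip(ranked, expected_ranking))

# Hypothetical column copied off a compromised server.
secret_key = b"never seen by the attacker"
column = ["CA"] * 50 + ["TX"] * 30 + ["NY"] * 15 + ["WA"] * 5
stolen = [det_encrypt(secret_key, v) for v in column]

# Attacker's assumed prior: the ranking of values from most to least common.
guesses = frequency_attack(stolen, ["CA", "TX", "NY", "WA"])
recovered = [guesses[c] for c in stolen]
print(recovered == column)  # True: every value recovered without the key
```

Real data is noisier than this, so an attacker's guesses are usually partial rather than perfect, but skewed columns such as states, genders, or status codes fall quickly.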

So what's going on here is that we give up some of the security encryption normally provides in exchange for functionality, while still making it harder for an attacker to steal sensitive information should they compromise the database server, the application server, or both. The level of security is inversely related to the level of utility: the more complex the query operation supported, the less secure the encryption variant. The data won't be sitting in clear text where a malicious party can simply steal it. However, if the host platform has been compromised, your data is still subject to several types of attack. An attacker is likely to conduct word-frequency attacks and guess the contents of the database with a reasonable degree of accuracy. It's more security, but a speed bump rather than a barrier.
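The security-versus-utility tradeoff can be modeled as picking the most secure encryption layer that still supports the operations a query needs. The layer names below (RND for randomized, DET for deterministic, OPE for order-preserving) follow the CryptDB paper, but the operation sets and leakage descriptions are simplified assumptions for illustration, not the paper's exact rules; the paper also describes layers for sums and keyword search that are omitted here.

```python
# Ordered from most to least secure. Operation sets are illustrative assumptions.
LAYERS = [
    # (layer, operations the server can evaluate, what the server can learn)
    ("RND", frozenset(), "nothing beyond ciphertext length"),
    ("DET", frozenset({"=", "GROUP BY", "DISTINCT"}),
     "which values are equal, and therefore their frequencies"),
    ("OPE", frozenset({"=", "GROUP BY", "DISTINCT", "<", ">", "ORDER BY", "MIN", "MAX"}),
     "the relative order of all values, plus equality"),
]

def strongest_usable_layer(ops):
    """Return (layer, leakage) for the most secure layer supporting every requested op."""
    for name, supported, leaks in LAYERS:
        if set(ops) <= supported:
            return name, leaks
    raise ValueError(f"no modeled layer supports {sorted(set(ops))}")

print(strongest_usable_layer([])[0])                 # RND
print(strongest_usable_layer(["="])[0])              # DET
print(strongest_usable_layer(["=", "ORDER BY"])[0])  # OPE
```

Each added query capability moves the column down the list, and each step down leaks more to anyone who can read the stored ciphertexts.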

The lesson here is that there is no free lunch. If you want strong crypto to preserve the privacy and integrity of data over long periods of time, some of the variations described in CryptDB are not a good option. As the paper posits, it does raise the bar on data privacy while still allowing the relational database platform to function. Several small commercial vendors offer this type of technology today, with the same basic methods and the same basic flaws. But if you have a database environment you suspect will be compromised, better technologies are available. Use tokenization or masking to create non-sensitive random substitutes that preserve the data's format and still support database operations. Those technologies remove the sensitive data from the database entirely, eliminating the risk without the performance penalty or complexity.
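For comparison, here is a minimal sketch of vault-based tokenization and static masking. The `TokenVault` class is hypothetical and is not any vendor's product; in a real deployment the vault would live on a separate, hardened system. Tokens are random rather than derived from the data, so there is no key or pattern in the database to attack, yet they keep the original length and digit format, and the same input always yields the same token so equality and joins still work.

```python
import secrets

class TokenVault:
    """Hypothetical minimal token vault for card numbers (digits only, length > 4)."""

    def __init__(self):
        self._to_token = {}
        self._to_value = {}

    def tokenize(self, pan: str) -> str:
        if len(pan) <= 4 or not pan.isdigit():
            raise ValueError("expected a digit string longer than 4 characters")
        if pan in self._to_token:  # consistent tokens keep equality and joins working
            return self._to_token[pan]
        while True:
            # Random digits of the same length, keeping the last four for display
            token = "".join(secrets.choice("0123456789") for _ in range(len(pan) - 4)) + pan[-4:]
            if token != pan and token not in self._to_value:
                break
        self._to_token[pan] = token
        self._to_value[token] = pan
        return token

    def detokenize(self, token: str) -> str:
        return self._to_value[token]

def mask(pan: str) -> str:
    """Static masking: irreversibly replace all but the last four characters."""
    return "*" * (len(pan) - 4) + pan[-4:]

vault = TokenVault()
t = vault.tokenize("4111111111111111")
print(len(t), t[-4:])             # 16 1111
print(vault.detokenize(t))        # 4111111111111111
print(mask("4111111111111111"))   # ************1111
```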

Adrian Lane is an analyst/CTO with Securosis LLC, an independent security consulting practice. Special to Dark Reading. Adrian Lane is a Security Strategist and brings over 25 years of industry experience to the Securosis team, much of it at the executive level. Adrian specializes in database security, data security, and secure software development. With experience at Ingres, Oracle, and ...
