Perimeter
9/20/2012
12:06 PM
Adrian Lane
Commentary

A Look At Encrypted Query Processing

Stupid encryption tricks, only without a funny YouTube video

Encrypting data is one of the most basic -- and most effective -- data security measures we have at our disposal. But when used with relational databases, encryption creates two major problems.

The first problem is that relational databases require you to define the data type prior to storage. VARCHAR() is a common database data type for storing application data, but it requires a predefined size. Encryption algorithms typically output binary data whose length is not known beforehand. This creates a mismatch that requires redefining, and in most cases rebuilding, the database to accommodate encrypted data. The second and more serious issue is that you cannot perform queries or functions on encrypted data. You can't check date ranges or make comparisons inside the database when data is encrypted. And you can't effectively use indexes to sort and manage data, either.
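
To make the size mismatch concrete, here is a rough sketch of the arithmetic. It assumes AES in CBC mode with PKCS#7 padding, a 16-byte IV stored alongside the ciphertext, and Base64 encoding so the result fits in a text column. None of those choices come from the article; other modes and encodings give different numbers, and the function names are made up for illustration.

```python
import math

AES_BLOCK = 16  # AES block size in bytes


def cbc_ciphertext_len(plaintext_len: int, iv_len: int = AES_BLOCK) -> int:
    """Bytes needed for the IV plus the PKCS#7-padded AES-CBC ciphertext."""
    # PKCS#7 always adds between 1 and 16 bytes, so a full block is added
    # even when the plaintext is already a multiple of the block size.
    padded = (plaintext_len // AES_BLOCK + 1) * AES_BLOCK
    return iv_len + padded


def base64_len(raw_len: int) -> int:
    """Characters needed to store raw_len bytes as padded Base64 text."""
    return 4 * math.ceil(raw_len / 3)


for n in (9, 16, 40):
    raw = cbc_ciphertext_len(n)
    print(f"plaintext {n} bytes -> {raw} ciphertext bytes -> {base64_len(raw)} Base64 chars")
```

Under these assumptions, a value that fit comfortably in VARCHAR(16) needs 48 bytes of binary storage, or 64 characters once Base64-encoded, which is why existing column definitions usually have to change.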

There are several ways encryption is employed today to address these issues, most commonly a) using a form of transparent encryption or b) encrypting at the application layer. With transparent encryption, data stored on disk is encrypted but processed inside the database in clear text. With encryption at the application layer, the app decrypts and processes data locally and uses the database purely as a place to store data.

But what if you don't trust the DBA? Or you just don't trust your cloud service provider? Worse, what if you think the database engine may be compromised by an attacker? I came across a post on Werner Vogels' blog, "Back-to-the-Future Weekend Reading - CryptDB," where he discusses a research paper on processing encrypted data within a relational database. The idea presented in the paper is "SQL-aware encryption." The goal is to keep data protected even if the database server and app server have been compromised. The authors' approach is to provide encryption that still allows normal relational database functions to work.

What does this mean? It means comparisons of two encrypted values, like "=" or ">", would work on encrypted data. Database functions and most comparison operations would continue to work in the scheme being described. SQL queries of the most common types will continue to work as before, so you get full database functionality on encrypted data. That sounds ideal, right? Not so fast.

The concept the authors are trying to duplicate is homomorphic encryption. But there is no true homomorphic encryption available commercially today. What they are in fact doing is using "off-the-shelf" encryption algorithms like AES, only without initialization vectors or nonces to randomize the output of the block cipher. That means when you encrypt the word "SELECT" with a specific key, you get the same binary result every time.
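
The difference between deterministic and randomized output can be sketched in a few lines. Python's standard library has no AES, so this example uses HMAC-SHA256 as a stand-in: it is a keyed hash, not a cipher, and it cannot be decrypted, but it shows the same property of interest (same key and same input give the same output unless a random nonce is mixed in). The key and function names are hypothetical.

```python
import hashlib
import hmac
import os

KEY = b"demo-key-for-illustration-only"  # hypothetical key


def deterministic_token(key: bytes, value: str) -> bytes:
    # Stand-in for a cipher run without an IV or nonce:
    # the same key and input always produce the same output.
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()


def randomized_token(key: bytes, value: str) -> bytes:
    # Stand-in for a cipher run with a fresh random nonce:
    # the output changes on every call, even for identical input.
    nonce = os.urandom(16)
    return nonce + hmac.new(key, nonce + value.encode("utf-8"), hashlib.sha256).digest()


# Equality comparison still works on deterministic output...
print(deterministic_token(KEY, "SELECT") == deterministic_token(KEY, "SELECT"))
# ...but not on randomized output.
print(randomized_token(KEY, "SELECT") == randomized_token(KEY, "SELECT"))
```

The first comparison succeeding is exactly what lets a database match rows without seeing plaintext; it is also what lets an observer tell when two stored values are the same.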

And that makes it a lot easier to guess the encrypted values! Keep in mind that SQL queries have a common structure and a finite set of elements. It's fairly easy to pre-compute encrypted values for the words SELECT, FROM, WHERE, MAX, SORT, GROUP BY, DISTINCT, etc. If all data stored under Bob's schema is encrypted with Bob's single key, the underlying text can be guessed from how frequently each encrypted value occurs.
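
A toy version of that frequency attack might look like the following. The column contents, the key, and the attacker's ranking of likely values are all invented for illustration, and HMAC-SHA256 again stands in for a deterministic cipher. The point is only that the attacker never touches the key.

```python
import hashlib
import hmac
from collections import Counter

KEY = b"bobs-single-key"  # hypothetical; the attacker never sees this


def deterministic_token(key: bytes, value: str) -> bytes:
    # Stand-in for deterministic encryption: same input, same output.
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()


# Hypothetical column of surnames, all protected under one key.
column = ["Smith"] * 5 + ["Jones"] * 3 + ["Lee"]
stored = [deterministic_token(KEY, name) for name in column]


def guess_by_frequency(ciphertexts, public_ranking):
    """Pair the most common stored tokens with the most common expected plaintexts."""
    ranked = [token for token, _ in Counter(ciphertexts).most_common()]
    return dict(zip(ranked, public_ranking))


# The attacker has only the stored tokens and a public guess at name popularity.
guesses = guess_by_frequency(stored, ["Smith", "Jones", "Lee"])
recovered = [guesses[token] for token in stored]
print(recovered == column)
```

Real data is noisier than this example, so an attacker would recover values with some error rate rather than perfectly, but skewed distributions (names, cities, status codes) leak a great deal.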

So what's going on here is that we are giving up a degree of the security encryption provides in order to keep the database functional, while still making it harder for an attacker to steal sensitive information should they compromise the database server, the application server, or both. The degree of security is inversely related to the level of utility: the more complex the query operation supported, the less secure the encryption variant. The data won't be sitting in clear text where a malicious party can steal it. However, if the host platform has been compromised, your data is still subject to several types of attack. It's much more likely that an attacker will conduct word-frequency attacks and guess the contents of the database -- with a reasonable degree of accuracy. It's more security, but a 'speed bump' rather than a barrier.

The lesson here is that there is no free lunch. If you want strong crypto to preserve the privacy and integrity of data for long periods of time, some of the variations described in CryptDB will not be a good option. The approach will -- as the paper posits -- raise the bar on data privacy while allowing the relational database platform to still function. There are several small commercial vendors that offer this type of technology today -- with the same basic methods and the same basic flaws. But if you have a database environment you suspect will be compromised, there are better technologies available. Use tokenization or masking to create non-sensitive random copies that also preserve data value and database operations. Those technologies completely remove the risk without the performance penalty or complexity.
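
For readers unfamiliar with tokenization, a minimal in-memory sketch follows. The TokenVault class and its method names are hypothetical, and a real deployment would keep the mapping in a separately secured service rather than in the same process. This sketch only preserves equality (the same value always maps to the same token, so joins and lookups still work); it does not model format- or order-preserving tokens.

```python
import secrets


class TokenVault:
    """Toy token vault: maps sensitive values to random, meaningless tokens."""

    def __init__(self):
        self._to_token = {}
        self._to_value = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so equal values stay equal in the database.
        if value not in self._to_token:
            token = secrets.token_hex(8)
            while token in self._to_value:  # regenerate on the unlikely collision
                token = secrets.token_hex(8)
            self._to_token[value] = token
            self._to_value[token] = value
        return self._to_token[value]

    def detokenize(self, token: str) -> str:
        # Only code with access to the vault can recover the original value.
        return self._to_value[token]


vault = TokenVault()
card = "4111-1111-1111-1111"  # well-known test card number, not real data
token = vault.tokenize(card)
print(token != card, vault.tokenize(card) == token, vault.detokenize(token) == card)
```

Because the token is random rather than derived from the value, there is nothing to attack cryptographically in the database itself; the sensitive mapping lives only in the vault.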

Adrian Lane is an analyst/CTO with Securosis LLC, an independent security consulting practice. Special to Dark Reading. Adrian Lane is a Security Strategist and brings over 25 years of industry experience to the Securosis team, much of it at the executive level. Adrian specializes in database security, data security, and secure software development. With experience at Ingres, Oracle, and ...
