9/20/2012
12:06 PM
Adrian Lane
Commentary

A Look At Encrypted Query Processing

Stupid encryption tricks, only without a funny YouTube video

Encrypting data is one of the most basic -- and most effective -- data security measures we have at our disposal. But when used with relational databases, encryption creates two major problems.

The first problem is that relational databases require you to define the data type prior to storage. VARCHAR() is a common database data type for storing application data, but it requires a pre-defined size. Encryption algorithms typically output binary data whose length is larger than the plaintext and was not anticipated when the column was defined. This creates a mismatch that requires redefining, and in most cases rebuilding, the database to accommodate encrypted data. The second and more serious issue is that you cannot perform queries or functions on encrypted data. You can't check date ranges or make comparisons inside the database when data is encrypted. And you can't effectively use indexes to sort and manage data either.
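To make the size mismatch concrete, here is a rough sketch of the arithmetic. It assumes one particular storage layout that the article does not specify: AES in CBC mode with PKCS#7 padding, a 16-byte IV stored in front of the ciphertext, and base64 encoding so the result fits in a character column. The function name and the sample column names are hypothetical.

```python
import math

AES_BLOCK_BYTES = 16  # AES block size; also the IV size in CBC mode


def stored_length(plaintext_bytes: int) -> int:
    """Characters needed to store one encrypted value as base64 text.

    Assumed layout (hypothetical, for illustration only):
    IV || AES-CBC ciphertext with PKCS#7 padding, then base64-encoded.
    """
    # PKCS#7 always adds 1..16 bytes, rounding up to the next full block.
    padded = (plaintext_bytes // AES_BLOCK_BYTES + 1) * AES_BLOCK_BYTES
    raw = AES_BLOCK_BYTES + padded      # prepend the 16-byte IV
    return 4 * math.ceil(raw / 3)       # base64 emits 4 chars per 3 bytes


# Hypothetical columns and their original VARCHAR widths.
for name, size in [("ssn", 9), ("card_number", 16), ("email", 40)]:
    print(f"{name}: VARCHAR({size}) would need VARCHAR({stored_length(size)})")
```

Under these assumptions a 16-character card number needs 64 characters once encrypted, which is why existing columns usually have to be redefined.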

There are several ways encryption is employed today to address these issues, most commonly a) using a form of transparent encryption or b) encrypting at the application layer. With transparent encryption, data stored on disk is encrypted but processed inside the database in clear text. With encryption at the application layer, the app decrypts and processes data locally and uses the database purely as a place to store data.

But what if you don't trust the DBA? Or you just don't trust your cloud service provider? Worse, what if you think the database engine may be compromised by an attacker? I came across a post on Werner Vogels' blog, "Back-to-the-Future Weekend Reading - CryptDB," where he discusses a research paper on processing encrypted data within a relational database. The idea presented in the paper is "SQL-aware Encryption." The goal is to keep data protected even if the database server and app server have been compromised. The authors' approach is to provide encryption that still allows normal relational database functions to work.

What does this mean? It means comparisons of two encrypted values, such as "=" or ">", would work on encrypted data. Database functions and most comparison operations would continue to work in the scheme being described. SQL queries of the most common types will continue to work as before, so you get full database functionality on encrypted data. That sounds ideal, right? Not so fast.

The concept the authors are trying to duplicate is homomorphic encryption. But there is no true homomorphic encryption available commercially today. What they are in fact doing is using "off-the-shelf" encryption algorithms like AES, only without initialization vectors or nonces to randomize the output of the block cipher. That means when you encrypt the word "SELECT" with a specific key, you get the same binary result every time.
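The difference between deterministic and randomized output can be shown with a small sketch. To stay within the Python standard library, it uses a toy stream cipher built from SHA-256 as a stand-in; this is not AES, not what CryptDB implements, and not safe for real use (reusing a keystream is dangerous in its own right). The key, function names, and nonce size are all assumptions made for illustration.

```python
import hashlib
import os

KEY = b"demo-key-not-for-production"  # hypothetical key


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key, nonce, and a counter. NOT AES."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def encrypt(key: bytes, plaintext: bytes, nonce: bytes = b"") -> bytes:
    """XOR the plaintext with the keystream; the nonce (if any) is prepended."""
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt(key: bytes, ciphertext: bytes, nonce_len: int = 0) -> bytes:
    """Split off the nonce, regenerate the keystream, and XOR it back out."""
    nonce, body = ciphertext[:nonce_len], ciphertext[nonce_len:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


# No nonce: the same word under the same key always encrypts identically,
# so the database can test equality, and so can anyone reading the table.
print(encrypt(KEY, b"SELECT") == encrypt(KEY, b"SELECT"))

# Fresh random nonce per value: equal plaintexts no longer look equal.
a = encrypt(KEY, b"SELECT", os.urandom(16))
b = encrypt(KEY, b"SELECT", os.urandom(16))
print(a == b, decrypt(KEY, a, 16) == decrypt(KEY, b, 16))
```

The randomized form hides repeats but breaks equality matching inside the database; the deterministic form does the opposite.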

And that makes it a lot easier to guess the encrypted values! Keep in mind that SQL queries have a common structure and a finite set of elements. It's fairly easy to pre-compute encrypted values for the words SELECT, FROM, WHERE, MAX, SORT, GROUP BY, DISTINCT, etc. If all data stored under Bob's schema is encrypted with Bob's single key, values can be guessed from their frequency of occurrence.
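A frequency guess of this kind can be sketched in a few lines. The example below uses a keyed HMAC as a stand-in for deterministic encryption, since all that matters here is that equal plaintexts produce equal outputs. The column contents, the key, and the attacker's prior knowledge of which value is most common are invented for illustration.

```python
import hashlib
import hmac
from collections import Counter

KEY = b"bobs-single-key"  # hypothetical: one key for the whole schema


def det_encrypt(value: str) -> str:
    """Stand-in for deterministic encryption: same input, same output."""
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()


# What the database actually stores (the attacker sees only this column).
plaintext_column = ["active"] * 6 + ["suspended"] * 3 + ["closed"]
encrypted_column = [det_encrypt(v) for v in plaintext_column]

# The attacker does not know the key, only a rough expectation of which
# values are most common (assumed to be public or easily guessed knowledge).
expected_ranking = ["active", "suspended", "closed"]

# Rank the ciphertexts by how often they appear and line the two lists up.
observed_ranking = [ct for ct, _ in Counter(encrypted_column).most_common()]
guess = dict(zip(observed_ranking, expected_ranking))

recovered = [guess[ct] for ct in encrypted_column]
print(recovered == plaintext_column)
```

Real data is messier than three status values, but the smaller and more skewed the set of possible values, the better this kind of guess works.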

So what's going on here is that we are sacrificing a degree of the security encryption provides in exchange for query functionality, while still making it harder for an attacker to steal sensitive information should they compromise the database server, the application server, or both. The degree of security is inversely related to the level of utility. The more complex the query operation provided, the less secure the encryption variant. The data won't be sitting in clear text where a malicious party can steal it. However, if the host platform has been compromised, your data is still subject to several types of attack. It's much more likely an attacker will conduct word-frequency attacks and guess the contents of the database -- with a reasonable degree of accuracy. It's more security than clear text, but a 'speed-bump' rather than a barrier.

The lesson here is that there is no free lunch. If you want strong crypto to preserve the privacy and integrity of data for long periods of time, some of the variations described in CryptDB will not be a good option. The approach will -- as the paper posits -- raise the bar on data privacy while allowing the relational database platform to still function. There are several small commercial vendors that offer this type of technology today -- with the same basic methods and the same basic flaws. But if you have a database environment you suspect will be compromised, there are better technologies available. Use tokenization or masking to create non-sensitive random copies that also preserve data value and database operations. Those technologies completely remove the risk without the performance penalty or complexity.
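As a rough illustration of the tokenization idea, the sketch below swaps each sensitive value for a random token of the same length and keeps the mapping in a separate vault. The class and method names are hypothetical, the vault is an in-memory dictionary rather than a hardened service, and only digit strings are handled.

```python
import secrets


class TokenVault:
    """Toy token vault: maps digit strings to random same-length tokens."""

    def __init__(self) -> None:
        self._to_token: dict[str, str] = {}
        self._to_value: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        if not value.isdigit():
            raise ValueError("this sketch only handles non-empty digit strings")
        # Reuse the existing token so equality joins and lookups still work.
        if value in self._to_token:
            return self._to_token[value]
        # Draw random digits until the token is unused and differs from the
        # original; it has no mathematical relationship to the real value.
        while True:
            token = "".join(secrets.choice("0123456789") for _ in value)
            if token != value and token not in self._to_value:
                break
        self._to_token[value] = token
        self._to_value[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._to_value[token]


vault = TokenVault()
card = "4111111111111111"  # well-known test card number, not real data
token = vault.tokenize(card)
print(len(token) == len(card), vault.tokenize(card) == token)
print(vault.detokenize(token) == card)
```

The database stores only the token, which fits the existing column and still supports equality lookups, while the real value lives in the vault outside the database.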

Adrian Lane is an analyst/CTO with Securosis LLC, an independent security consulting practice. Special to Dark Reading. Adrian Lane is a Security Strategist and brings over 25 years of industry experience to the Securosis team, much of it at the executive level. Adrian specializes in database security, data security, and secure software development. With experience at Ingres, Oracle, and ...
