Commentary

Data Masking Primer

Data masking is an approach to data security used to conceal sensitive information. Unlike encryption, which renders data unusable until it is restored to clear text, masking is designed to protect data while retaining business functionality.
Masking is most commonly used with relational databases, where it maintains the complex data relationships that database applications rely on. In essence, masking scrambles data so as to render individual data points meaningless while preserving business utility and the database's functional dependencies. One example: shuffling patient care data so that individual data points cannot be traced to one person, but medical trend data can still be derived from the database as a whole.
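The shuffling idea can be sketched in a few lines of Python. This is a toy illustration, not a product implementation: the in-memory records and field names are hypothetical stand-ins for a real database table.

```python
import random

# Hypothetical patient records: each row pairs an identity with a measurement.
records = [
    {"patient_id": "P001", "blood_pressure": 128},
    {"patient_id": "P002", "blood_pressure": 141},
    {"patient_id": "P003", "blood_pressure": 119},
    {"patient_id": "P004", "blood_pressure": 133},
]

def shuffle_column(rows, column):
    """Shuffle one column across rows: individual values no longer line up
    with their original record, but the set of values is unchanged."""
    values = [row[column] for row in rows]
    random.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]

masked = shuffle_column(records, "blood_pressure")

# Aggregate trend data survives the shuffle.
original_avg = sum(r["blood_pressure"] for r in records) / len(records)
masked_avg = sum(r["blood_pressure"] for r in masked) / len(masked)
assert original_avg == masked_avg
```

After the shuffle, no masked row can be trusted to describe the patient it names, yet averages and distributions computed over the whole table are identical to the originals.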

The two most common business use cases for masking are testing and analytics. Using real customer data is the best way to confirm application functionality, but moving sensitive production data (patient records, financial transactions, customer history) into lower-security test systems is very risky. Moving sensitive data into business analytics and decision-support systems is similarly risky, with correspondingly greater exposure to loss. Masking provides test applications and business analytics with valuable data while simultaneously securing sensitive information.

"Data masking" is the industry-accepted term for this market segment, though it is something of a misnomer: masking implies concealment without alteration, yet most data masking products alter the copied data. There are many other ways to scramble data, including transposition, substitution, obfuscation, concatenation, statistical averaging, and hashing algorithms, just to name a few. These technologies transform information into something that looks like the original, but the original values are obliterated and the new data cannot be reverse-engineered.
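Two of the scrambling techniques named above, substitution and hashing, can be illustrated with a minimal Python sketch. The lookup table and salt here are hypothetical; real products use far more sophisticated transforms.

```python
import hashlib

# Substitution: swap the real value for a realistic-looking fake from a
# lookup table (the names here are invented for illustration).
FAKE_NAMES = {"Alice Smith": "Jane Roe", "Bob Jones": "John Doe"}

def substitute(value, lookup):
    # Unknown values fall back to an explicit redaction marker.
    return lookup.get(value, "REDACTED")

def one_way_hash(value, salt="demo-salt"):
    # Hashing: a one-way transform; the digest looks nothing like the
    # input and cannot be reversed into the original value.
    return hashlib.sha256((salt + value).encode()).hexdigest()
```

Substitution keeps the data realistic enough for application testing, while hashing preserves only the ability to match identical values, not to recover them.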

Data masking is commonly employed using three basic strategies:

1. ETL (Extract, Transform and Load): This describes the process most commonly associated with data masking. As data is queried or archived from the production database, it is run through a transformation algorithm and then loaded into a test or decision-support database. The original production database remains intact, but the copies have been transformed into a safe state.
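The ETL flow can be sketched as three small Python functions. This is a minimal sketch under toy assumptions: Python lists stand in for the production and test databases, and the card-number field is a hypothetical example of sensitive data.

```python
def extract(production_db):
    # Extract: read rows from the production store.
    return list(production_db)

def transform(rows):
    # Transform: mask the sensitive field before it leaves the pipeline,
    # keeping only the last four digits for test realism.
    return [
        {**row, "card_number": "****-****-****-" + row["card_number"][-4:]}
        for row in rows
    ]

def load(rows, test_db):
    # Load: write the masked copy into the lower-security test store.
    test_db.extend(rows)
    return test_db

production = [{"customer": "Alice", "card_number": "4111-1111-1111-1234"}]
test_copy = load(transform(extract(production)), [])
```

Note that `production` is never modified: the safety of this model comes from transforming the copy on its way out, leaving the original untouched.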

2. Dynamic In-Place Masking: This is a new catchphrase for the masking market and, unlike ETL, does not create a new copy. Dynamic masking keeps the original data but applies a transformation "mask" dynamically, as queries are received. Implemented as a database view or trigger, the mask transforms query results before they are returned to the user. Depending on a user's credentials, the query may return unaltered data or masked data. This allows masking to run in parallel with the original data set on the same database installation, but it comes at some cost in performance.
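A view-style dynamic mask amounts to a transformation applied between the stored rows and the caller, gated on credentials. The following Python sketch uses invented role names and an SSN field purely for illustration; a real deployment would express this as a database view or trigger.

```python
def masked_view(query_results, user_role):
    # Apply the mask at query time: privileged users see raw rows,
    # everyone else sees a transformed copy. The stored data is never
    # changed, so the mask costs a little work on every query.
    if user_role == "dba":
        return query_results
    return [
        {**row, "ssn": "***-**-" + row["ssn"][-4:]}
        for row in query_results
    ]

rows = [{"name": "Alice", "ssn": "123-45-6789"}]
analyst_sees = masked_view(rows, "analyst")
dba_sees = masked_view(rows, "dba")
```

The per-query transformation is what produces the performance cost the text mentions: the mask is recomputed on every read rather than once at load time.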

3. Static In-Place Masking: In this model, the original data within the database is obfuscated in place. The vendors provide the capability to make these changes without breaking data relationships. This model allows complex, multi-transformation algorithms to be applied simultaneously to keep the obfuscated values close to the originals. There is no performance degradation or additional space requirement, but the database requires periodic checking so that new entries are also masked.
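The static model can be sketched as a function that overwrites the sensitive column in the original rows. As before, the in-memory list and email field are hypothetical stand-ins for a real table.

```python
customers = [
    {"name": "Alice Smith", "email": "alice@example.com"},
    {"name": "Bob Jones", "email": "bob@example.com"},
]

def mask_email(email):
    # Keep the first character and the domain so the value still looks
    # like an email address, which preserves test realism.
    local, _, domain = email.partition("@")
    return local[0] + "***@" + domain

def mask_in_place(rows):
    # Overwrite the sensitive field in the original rows: there is no
    # second copy, so no extra storage and no per-query overhead.
    # Must be re-run periodically to catch rows added since the last pass.
    for row in rows:
        row["email"] = mask_email(row["email"])

mask_in_place(customers)
```

Because the original values are destroyed, this model trades recoverability for simplicity: there is nothing left to leak, but also nothing to restore.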

Adrian Lane is an analyst/CTO with Securosis LLC, an independent security consulting practice. Special to Dark Reading.
