Commentary

Data Masking Primer

Data masking is an approach to data security used to conceal sensitive information. Unlike encryption, which renders data unusable until it is restored to clear text, masking is designed to protect data while retaining business functionality.
Masking is most commonly used with relational databases, where it maintains the complex data relationships that database applications rely on. In essence, masking scrambles data so as to render individual data points meaningless while preserving business utility and the database's functional dependencies. One example: shuffling patient care data so that individual data points cannot be traced to one person, but medical trend data can still be derived from the database as a whole.
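The shuffling idea can be sketched in a few lines of Python. This is a toy illustration, not a product implementation: the in-memory records and field names are hypothetical stand-ins for a real database table.

```python
import random

# Hypothetical patient records: each row pairs an identity with a measurement.
records = [
    {"patient_id": "P001", "blood_pressure": 128},
    {"patient_id": "P002", "blood_pressure": 141},
    {"patient_id": "P003", "blood_pressure": 119},
    {"patient_id": "P004", "blood_pressure": 133},
]

def shuffle_column(rows, column):
    """Shuffle one column across rows: individual values no longer line up
    with their original record, but the set of values is unchanged."""
    values = [row[column] for row in rows]
    random.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]

masked = shuffle_column(records, "blood_pressure")

# Aggregate trend data survives the shuffle.
original_avg = sum(r["blood_pressure"] for r in records) / len(records)
masked_avg = sum(r["blood_pressure"] for r in masked) / len(masked)
assert original_avg == masked_avg
```

After the shuffle, no masked row can be trusted to describe the patient it names, yet averages and distributions computed over the whole table are identical to the originals.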

The two most common business use cases for masking are testing and analytics. Using real customer data is the best way to confirm application functionality, but moving sensitive production data (patient records, financial transactions, customer history) into lower-security test systems is very risky. Moving sensitive data into business analytics and decision-support systems is similarly risky, with correspondingly greater exposure to loss. Masking provides test applications and business analytics with valuable data while simultaneously securing sensitive information.

"Data masking" is the industry-accepted term for this market segment, though it is something of a misnomer: masking implies concealment without alteration, yet most data masking products alter the copied data. There are many other ways to scramble data, including transposition, substitution, obfuscation, concatenation, statistical averaging, and hashing algorithms, just to name a few. These technologies transform information into something that looks like the original, but the original values are obliterated and the new data cannot be reverse-engineered.
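Two of the scrambling techniques named above, substitution and hashing, can be illustrated with a minimal Python sketch. The lookup table and salt here are hypothetical; real products use far more sophisticated transforms.

```python
import hashlib

# Substitution: swap the real value for a realistic-looking fake from a
# lookup table (the names here are invented for illustration).
FAKE_NAMES = {"Alice Smith": "Jane Roe", "Bob Jones": "John Doe"}

def substitute(value, lookup):
    # Unknown values fall back to an explicit redaction marker.
    return lookup.get(value, "REDACTED")

def one_way_hash(value, salt="demo-salt"):
    # Hashing: a one-way transform; the digest looks nothing like the
    # input and cannot be reversed into the original value.
    return hashlib.sha256((salt + value).encode()).hexdigest()
```

Substitution keeps the data realistic enough for application testing, while hashing preserves only the ability to match identical values, not to recover them.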

Data masking is commonly employed using three basic strategies:

1. ETL (Extract, Transform and Load): This describes the process most commonly associated with data masking. As data is queried or archived from the production database, it is run through a transformation algorithm and then loaded into a test or decision-support database. The original production database remains intact, but the copies have been transformed into a safe state.
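The ETL flow can be sketched as three small Python functions. This is a minimal sketch under toy assumptions: Python lists stand in for the production and test databases, and the card-number field is a hypothetical example of sensitive data.

```python
def extract(production_db):
    # Extract: read rows from the production store.
    return list(production_db)

def transform(rows):
    # Transform: mask the sensitive field before it leaves the pipeline,
    # keeping only the last four digits for test realism.
    return [
        {**row, "card_number": "****-****-****-" + row["card_number"][-4:]}
        for row in rows
    ]

def load(rows, test_db):
    # Load: write the masked copy into the lower-security test store.
    test_db.extend(rows)
    return test_db

production = [{"customer": "Alice", "card_number": "4111-1111-1111-1234"}]
test_copy = load(transform(extract(production)), [])
```

Note that `production` is never modified: the safety of this model comes from transforming the copy on its way out, leaving the original untouched.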

2. Dynamic In-Place Masking: This is a new catchphrase for the masking market and, unlike ETL, does not create a new copy. Dynamic masking keeps the original data but applies a transformation "mask" dynamically, as queries are received. Implemented as a database view or trigger, the mask transforms query results before they are returned to the user. Depending on a user's credentials, the query may return unaltered data or masked data. This allows masking to run in parallel with the original data set on the same database installation, but it comes at some cost in performance.
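A view-style dynamic mask amounts to a transformation applied between the stored rows and the caller, gated on credentials. The following Python sketch uses invented role names and an SSN field purely for illustration; a real deployment would express this as a database view or trigger.

```python
def masked_view(query_results, user_role):
    # Apply the mask at query time: privileged users see raw rows,
    # everyone else sees a transformed copy. The stored data is never
    # changed, so the mask costs a little work on every query.
    if user_role == "dba":
        return query_results
    return [
        {**row, "ssn": "***-**-" + row["ssn"][-4:]}
        for row in query_results
    ]

rows = [{"name": "Alice", "ssn": "123-45-6789"}]
analyst_sees = masked_view(rows, "analyst")
dba_sees = masked_view(rows, "dba")
```

The per-query transformation is what produces the performance cost the text mentions: the mask is recomputed on every read rather than once at load time.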

3. Static In-Place Masking: In this model, the original data within the database is obfuscated in place. The vendors provide the capability to make these changes without breaking data relationships. This model allows complex, multi-transformation algorithms to be applied simultaneously to keep the obfuscated values close to the originals. There is no performance degradation or additional space requirement, but the database requires periodic checking so that new entries are also masked.
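The static model can be sketched as a function that overwrites the sensitive column in the original rows. As before, the in-memory list and email field are hypothetical stand-ins for a real table.

```python
customers = [
    {"name": "Alice Smith", "email": "alice@example.com"},
    {"name": "Bob Jones", "email": "bob@example.com"},
]

def mask_email(email):
    # Keep the first character and the domain so the value still looks
    # like an email address, which preserves test realism.
    local, _, domain = email.partition("@")
    return local[0] + "***@" + domain

def mask_in_place(rows):
    # Overwrite the sensitive field in the original rows: there is no
    # second copy, so no extra storage and no per-query overhead.
    # Must be re-run periodically to catch rows added since the last pass.
    for row in rows:
        row["email"] = mask_email(row["email"])

mask_in_place(customers)
```

Because the original values are destroyed, this model trades recoverability for simplicity: there is nothing left to leak, but also nothing to restore.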

Adrian Lane is an analyst/CTO with Securosis LLC, an independent security consulting practice. Special to Dark Reading.
