The EU's General Data Protection Regulation means that organizations must look at new ways to keep data secure as it moves.

Rick Bilodeau, Vice President of Marketing, StreamSets

May 3, 2018


The EU's General Data Protection Regulation (GDPR) will take effect on May 25 in response to data breaches and demands for greater oversight of the security of personally identifiable information (PII). As the recent Equifax and Cambridge Analytica debacles show, the risks to PII are real: digital transformation makes all interaction data usable, and the Internet of Things (IoT) is causing an explosion of new data sources.

GDPR is the latest of numerous laws governing the use of PII. These laws often vary by jurisdiction, industry, and data type, making for a complex puzzle for enterprise data governance. For companies with large global customer bases, complying with the strictest regulation that applies anywhere in the customer base ends up being the prudent course, because it can be difficult to apply geography-specific restrictions to PII.

Adding to this complexity is the fact that digital data takes many forms. Efforts to analyze data to improve the business end up distributing PII throughout the enterprise. This business imperative to move and change data means that organizations must look at novel ways to keep it safe and secure.

This complexity is borne out by Gartner's prediction that by the end of 2018, more than half of the organizations affected by GDPR won't be in compliance. Given the high stakes of noncompliance, organizations must have technology and processes in place to protect PII.

Keeping the Genie in the Bottle
Many organizations already have solutions that scan for and protect personal "data at rest." However, in the time between when the data arrives and when it's masked or encrypted, it might have already been shared. And, with the growth of real-time stream processing, the time between arrival and sharing compresses to almost nothing. In short, the genie may be out of the bottle before you even know you have PII. 

Additionally, any arriving PII is moved across data stores and computing platforms for a valid business reason and to be available for use. A balance must be established between data protection and data availability. This balance can be achieved through governance zones that allow different levels of access based on the type of data and the type of user; however, achieving this adds another layer of complexity to data protection and compliance.

The problem of big data sources and data drift (where fields are added or data types are changed without notice) further complicates matters. New data sources such as IoT devices, API data, and log files are added all the time in the name of digital transformation and business agility, and they may include PII. Plus, many of these sources, particularly those governed by others or only loosely governed (such as unstructured data sources), are subject to data drift. As a result, a data protection solution that is compliant on day 1 may be noncompliant by day 3.
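
To make data drift concrete, here is a minimal Python sketch, not a reference to any particular product. It compares an incoming record against a baseline of expected field names and types and reports anything added, changed, or missing. The function name `detect_drift`, the baseline, and the sample record are all hypothetical.

```python
from typing import Any, Dict, List


def detect_drift(baseline: Dict[str, type], record: Dict[str, Any]) -> List[str]:
    """Return notes describing how a record deviates from the expected baseline."""
    findings = []
    for field, value in record.items():
        if field not in baseline:
            # A field nobody registered: this is where unexpected PII tends to show up.
            findings.append(f"new field: {field}")
        elif not isinstance(value, baseline[field]):
            findings.append(
                f"type change: {field} "
                f"({baseline[field].__name__} -> {type(value).__name__})"
            )
    for field in baseline:
        if field not in record:
            findings.append(f"missing field: {field}")
    return findings


# Hypothetical IoT feed that originally carried only a device ID and a reading.
baseline = {"device_id": str, "temperature": float}
record = {
    "device_id": "sensor-7",
    "temperature": "21.5",            # the type changed without notice
    "owner_email": "a@example.com",   # a new field that happens to hold PII
}
drift = detect_drift(baseline, record)
```

In this example the new `owner_email` field is exactly the kind of change that can turn a compliant pipeline into a noncompliant one without anyone touching the protection rules.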

Data Protection Should Start When Data Is Born
The pressures of real-time data, data sharing, and data drift mean that sole reliance on "scan at rest" across every data store is risky. Discovering PII and mitigating compliance exposure must start at the point of data ingestion. A multilayered strategy that includes both incoming pathways and the data stores is optimal.

First, inspect for patterns in the live data. Your chief vulnerability is sensitive data that you don't expect to see but that arrives anyway, either because it's impossible to keep track of all data efforts across the company or simply because of data drift. To catch this data, you must scan the contents of your data flows, inspect the data, and compare it to known or likely PII patterns. Some form of probabilistic matching will allow you to catch both known patterns and those that may be new or specific to your industry or company.
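
As a rough illustration of inspecting record contents in flight, the sketch below checks every string value against a small set of regular expressions. The patterns, the `PII_PATTERNS` table, and the `scan_record` function are illustrative assumptions. They are deliberately simplistic, and a production system would likely add validation (checksums, context, confidence scoring) on top of plain pattern matching.

```python
import re
from typing import Any, Dict, List, Tuple

# Illustrative patterns only; real detectors need far more care than this.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def scan_record(record: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (field, pii_type) pairs for each string value matching a pattern."""
    hits = []
    for field, value in record.items():
        if not isinstance(value, str):
            continue  # this sketch only inspects string values
        for pii_type, pattern in PII_PATTERNS.items():
            if pattern.search(value):
                hits.append((field, pii_type))
    return hits


# A free-text field is a typical place for unexpected PII to appear.
record = {"note": "call jane.doe@example.com re: 123-45-6789", "count": 3}
hits = scan_record(record)
```

Scanning values rather than field names is the point: the `note` field gives no hint that it carries an email address and a Social Security number.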

Second, you must be able to act on that data as soon as the PII pattern is detected and have a wide variety of actions to take. Then you can customize the approach based on the potential uses of the data.
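
The sketch below shows what a small menu of actions might look like once a field has been flagged. The three actions (masking, hashing, and dropping), the `ACTIONS` table, and the `protect` function are hypothetical choices for illustration, and the unsalted hash is used only to keep the example short.

```python
import hashlib
from typing import Any, Callable, Dict, Optional


def mask(value: str) -> str:
    """Replace all but the last four characters with asterisks.

    Values of four characters or fewer are returned unchanged in this sketch.
    """
    return "*" * max(len(value) - 4, 0) + value[-4:]


def pseudonymize(value: str) -> str:
    """Replace the value with its SHA-256 digest (unsalted, for brevity only)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def drop(value: str) -> None:
    """Signal that the field should be removed from the record entirely."""
    return None


ACTIONS: Dict[str, Callable[[str], Optional[str]]] = {
    "mask": mask,
    "hash": pseudonymize,
    "drop": drop,
}


def protect(record: Dict[str, Any], plan: Dict[str, str]) -> Dict[str, Any]:
    """Apply the planned action to each listed field; pass other fields through."""
    out = {}
    for field, value in record.items():
        action = plan.get(field)
        if action is None:
            out[field] = value
            continue
        result = ACTIONS[action](str(value))
        if result is not None:
            out[field] = result
    return out


record = {"ssn": "123-45-6789", "email": "jane@example.com", "amount": 42}
safe = protect(record, {"ssn": "mask", "email": "drop"})
```

Keeping the plan separate from the actions is what lets the same detected field be masked for one use and dropped for another.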

Third, due to the need to classify the use of different data types as well as different user groups, the ideal approach should be based on centrally driven policy management that is integrated with how you protect data at rest to ensure completeness. Enterprise risk teams should set up security service-level agreements for data and expect the system to alert on violations and stop insecure data delivery before it happens.
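
One way to picture centrally driven policy is a single table that states which PII types each destination may receive, checked before delivery. The following sketch assumes hypothetical zone names, a hypothetical `POLICY` table, and a `check_delivery` function. It illustrates the idea of stopping insecure delivery before it happens and is not any vendor's implementation.

```python
from typing import Dict, Set

# Hypothetical central policy: PII types each destination zone may receive unprotected.
POLICY: Dict[str, Set[str]] = {
    "raw_restricted": {"email", "us_ssn"},
    "analytics": {"email"},
    "public_reporting": set(),
}


class PolicyViolation(Exception):
    """Raised when a delivery would put disallowed PII into a zone."""


def check_delivery(zone: str, detected: Set[str]) -> None:
    """Raise PolicyViolation unless every detected PII type is allowed in the zone."""
    allowed = POLICY.get(zone, set())  # unknown zones are allowed nothing
    blocked = detected - allowed
    if blocked:
        raise PolicyViolation(f"{zone}: blocked {sorted(blocked)}")


# The restricted zone may hold both types, so this call returns quietly.
check_delivery("raw_restricted", {"email", "us_ssn"})

# The analytics zone may not receive Social Security numbers, so delivery stops.
try:
    check_delivery("analytics", {"email", "us_ssn"})
    delivered = True
except PolicyViolation as err:
    delivered = False
    message = str(err)
```

Raising an exception before the write, rather than logging after it, corresponds to stopping the delivery instead of merely reporting on it.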

Tooling > Coding
Monitoring and discovering sensitive data in stream can be very difficult. As GDPR takes effect, solutions must mature from ad hoc or DIY approaches focusing on data at rest to tooling that can discover and track data starting with its first appearance. Moving protection from data stores out to first detection is a critical step that will help ensure the integrity and security of PII.

About the Author(s)

Rick Bilodeau is a marketing leader with deep experience with enterprise data, networking, and security innovators. Before joining StreamSets, Rick led outbound marketing functions for B2B technology companies Qualys (IT security), iPass (enterprise mobility), and 3Com (enterprise networks) and also ran product marketing for online education leader Apollo Group.
