A new machine learning tool aims to mine privacy policies on behalf of users.


Aiming to correct the privacy imbalance between consumers and businesses, a group of academics released a tool that uses automation and machine learning to mine privacy policies and deliver easy-to-use options for a consumer to limit a company's use of data.

The browser plug-in, called Opt-Out Easy, is the brainchild of a group of researchers from Carnegie Mellon University, the University of Michigan, Stanford University, and Penn State University and represents the latest shift in the status quo in data collection. The group has analyzed a large number of privacy policies with machine learning algorithms to identify the actionable choices users can take using those policies.

The goal of the tool is to allow consumers to easily apply their own privacy wishes to any website they visit, says Norman Sadeh, a CyLab researcher and professor in Carnegie Mellon’s School of Computer Science.

"Privacy regulations are a great step forward because you need to offer people choices," he says. "On the other hand, what good are those choices to anyone if engaging with these policies is too burdensome? Right now we don't see a lot of people making privacy decisions because they don't know they can."

The tool represents the latest potential disruption to the data economy that businesses may have to contend with this year.

In the past three years, new regulations, such as the European Union's General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), have come into force, driving ever-larger fines for data breaches and privacy violations. In addition, new technologies, such as the Solid project at the Massachusetts Institute of Technology, offer a different approach to data sharing that empowers individuals over businesses.

These changes are already being noted by privacy-focused companies, says Caitlin Fennessy, research director at the International Association of Privacy Professionals. 

"Data is valuable and so companies are still going to want to collect and use it, but if it is not providing value to the company, then it is creating big risk," she says. "With the increase in hacks and breaches ... as well as the increased focus of regulators on enforcing substantive privacy protections, companies are becoming a lot more strategic about how they approach data collection and retention."

In many ways, companies are being dragged into the future. 

The legal framework that allows businesses to collect a broad range of data with purported consumer approval, so-called "notice and consent," has largely failed to provide any meaningful privacy protection. Companies regularly drown out meaningful language in legalese deep inside privacy policies written at a grade level that very few people can, or ever will, read. An analysis of 150 privacy policies by The New York Times, for example, found the vast majority required a college-level reading ability and at least 15 minutes to read.

The university researchers aim to even the playing field. Using machine learning, the group built a model to recognize the choices provided by privacy policies, including opting out of data collection and sharing of data. The approach has been used to identify the opt-out links in nearly 7,000 privacy policies.
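To give a rough sense of what "identifying opt-out links" involves, here is a minimal Python sketch. It is not the researchers' method: their system relies on a trained machine learning model, while this stand-in uses a simple keyword match on link text and URLs. The cue phrases, the `LinkCollector` class, and the `find_opt_out_links` function are all hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser

# Hypothetical cue phrases (an assumption for illustration only).
# The tool described in the article uses a trained classifier, not a keyword list.
OPT_OUT_CUES = ("opt out", "opt-out", "unsubscribe", "do not sell", "your choices")


class LinkCollector(HTMLParser):
    """Collects (href, anchor text) pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the anchor text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def find_opt_out_links(policy_html):
    """Return (href, text) pairs whose text or URL contains an opt-out cue."""
    parser = LinkCollector()
    parser.feed(policy_html)
    parser.close()
    hits = []
    for href, text in parser.links:
        haystack = (text + " " + href).lower()
        if any(cue in haystack for cue in OPT_OUT_CUES):
            hits.append((href, text))
    return hits
```

A keyword list like this would miss choices that are phrased in unusual ways or described only in surrounding prose, which is presumably part of why the researchers turned to a learned model instead.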

The approach could even be used to allow consumers to specify their desired level of sharing and use the machine learning system to find the right settings to achieve that, says CMU's Sadeh. While the tool does not have that capability yet, finding ways to tailor privacy preferences may be preferable to a one-size-fits-all approach. 

"Privacy is ultimately about ethical principles," he says. "Those principles include transparency, they include fairness, but they also include agency. I should be able to take control of what happens to my data."

Fennessy sees the tool as a way to give users more control of privacy without requiring companies to take action, perhaps the best of both worlds.

However, she stresses that widespread adoption of the tool will require companies to better manage the privacy preferences of every user. While automated tools for data security and privacy compliance are available, many companies have not yet adopted them, she says. 

"The more opt-out requests that companies see, the more likely that they will need an automated solution," she says. "Companies who are looking to the future are saying that they need to automate."

She also notes that the automation extends down to whichever companies are being used to process data or transactions. Just as supply chain issues have become a significant consideration for security, third-party suppliers of data processing services are a significant privacy issue as well.

"If you are passing private data onto processors, you will have to work with them to correctly handle the data as well as process correction and deletion requests," she says. "As the volume of transactions increases, handling those different communications will require automation, especially for vendors that are handling a whole bunch of clients."

About the Author(s)

Robert Lemos, Contributing Writer

Veteran technology journalist of more than 20 years. Former research engineer. Written for more than two dozen publications, including CNET News.com, Dark Reading, MIT's Technology Review, Popular Science, and Wired News. Five awards for journalism, including Best Deadline Journalism (Online) in 2003 for coverage of the Blaster worm. Crunches numbers on various trends using Python and R. Recent reports include analyses of the shortage in cybersecurity workers and annual vulnerability trends.
