Dark Reading is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


2/13/2020
04:35 PM

Architectural Analysis IDs 78 Specific Risks in Machine-Learning Systems

The new threat model homes in on ML security at the design stage.

Researchers at the Berryville Institute of Machine Learning (BIML) have developed a formal risk framework to guide development of secure machine-learning (ML) systems.

BIML's architectural risk analysis differs from previous work in this area in that it focuses on the issues engineers and developers need to address at the outset, when designing and building ML systems. Most previous work on securing ML systems has focused on how best to protect operational systems and data against particular attacks, not on how to design them securely in the first place.

"This work provides a very solid technical foundation for taking a look at the risks associated with adopting and using ML," says Gary McGraw, noted security researcher, author, and co-founder of BIML. This kind of risk analysis is critical because very few people are paying attention to ML security at the design stage, even as ML use is growing rapidly, he says.

For the architectural risk analysis, BIML researchers considered nine separate components that they identified as common to setting up, training, and deploying a typical ML system: raw data; dataset assembly; datasets; learning algorithms; evaluation; inputs; trained model; inference algorithm; and outputs. They then identified and ranked multiple data security risks associated with each of those components so engineers and developers can implement controls for mitigating those risks where possible.

For instance, they identified data confidentiality, the trustworthiness of data sources, and data storage as key security considerations around the raw data used in ML systems, such as training data, test inputs, and operational data. Similarly, for the datasets used in ML systems, the researchers identified data poisoning — where an attacker manipulates data to cause ML systems to go awry — as a major risk. For training algorithms, BIML researchers identified the potential for attackers to subtly nudge an online learning system in a direction not intended by its developers as a major concern.
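To make the data poisoning risk concrete, here is a minimal sketch in Python. It is not taken from the BIML report: the one-dimensional nearest-centroid classifier, the sample values, and the names train_centroids and predict are all hypothetical, chosen only to show how a few mislabeled training points can change what a model learns.

```python
from statistics import mean


def train_centroids(samples):
    """Return the mean feature value per label for (value, label) pairs."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def predict(centroids, value):
    """Return the label whose centroid lies closest to the given value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


# Hypothetical clean training set: low values are benign, high values malicious.
clean = [(1.0, "benign"), (2.0, "benign"), (3.0, "benign"),
         (7.0, "malicious"), (8.0, "malicious"), (9.0, "malicious")]

# Assumed attack: four malicious-looking points injected with a benign label.
poison = [(9.0, "benign")] * 4

clean_model = train_centroids(clean)
poisoned_model = train_centroids(clean + poison)

probe = 6.5
print("clean model:   ", predict(clean_model, probe))
print("poisoned model:", predict(poisoned_model, probe))
```

In this toy setup the injected points drag the "benign" centroid from 2.0 to 6.0, so the same probe value that the clean model labels malicious is labeled benign by the poisoned one. Real poisoning attacks and real models are far more complex; the sketch only illustrates why the provenance of training data matters.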

In total, BIML's architectural analysis showed that typical ML systems are exposed to as many as 78 specific security risks across all individual components. The researchers grouped the risks into categories including input manipulation, data manipulation, model manipulation, and extraction attacks, in which threat actors try to extract sensitive data from an ML system dataset.
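One way a team might track such risks against the nine components is a simple risk register. The Python sketch below is an assumption about how that could be organized, not BIML's own tooling: the Risk class, the group_by_component helper, and the pairing of each example risk with a category are illustrative, and only three of the 78 risks are represented.

```python
from collections import defaultdict
from dataclasses import dataclass

# The nine components named in the article.
COMPONENTS = (
    "raw data", "dataset assembly", "datasets", "learning algorithms",
    "evaluation", "inputs", "trained model", "inference algorithm", "outputs",
)


@dataclass(frozen=True)
class Risk:
    """One register entry (hypothetical structure, not from the report)."""
    component: str
    category: str
    description: str

    def __post_init__(self):
        # Reject entries that do not map to one of the nine components.
        if self.component not in COMPONENTS:
            raise ValueError(f"unknown component: {self.component!r}")


def group_by_component(risks):
    """Map every component to its list of risks (empty if none recorded)."""
    grouped = defaultdict(list)
    for risk in risks:
        grouped[risk.component].append(risk)
    return {name: grouped.get(name, []) for name in COMPONENTS}


# Example entries; the category assigned to each is an assumption.
REGISTER = [
    Risk("inputs", "input manipulation",
         "adversarial examples that cause false predictions"),
    Risk("datasets", "data manipulation",
         "data poisoning by an attacker who alters training data"),
    Risk("datasets", "extraction",
         "attacker extracts sensitive data from the dataset"),
]

by_component = group_by_component(REGISTER)
uncovered = [name for name, risks in by_component.items() if not risks]
print("components with no recorded risks yet:", uncovered)
```

Listing the components that have no entries yet is a cheap way to see where a design review has not looked, which fits the article's point that the analysis is meant to be applied at design time.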

McGraw says the BIML analysis is about identifying and discussing ML risks, and not so much about what to do about them. "Identifying the risks is more than half the battle," he says. "Once you know what the risks are, it's a lot easier to design around them."

The BIML report listed the top 10 risks affecting ML systems. According to the think tank, the biggest and most commonly discussed risks to ML systems are so-called "adversarial examples," which use malicious inputs to cause the system to make false predictions or categorizations. Data poisoning, online system manipulation, and attacks affecting data confidentiality, data integrity, and data output were all identified as other top ML security risks.
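The idea behind an adversarial example can be sketched with a toy linear classifier. The Python below is illustrative only and does not come from the BIML report: the weights, the input, and the function names are hypothetical, and the perturbation is a simple sign-based step in the spirit of the fast gradient sign method, applied to a model whose gradient with respect to the input is just its weight vector.

```python
def score(weights, bias, x):
    """Linear decision score: dot(weights, x) + bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def classify(weights, bias, x):
    """Flag the input when the score is positive, otherwise allow it."""
    return "flagged" if score(weights, bias, x) > 0 else "allowed"


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def perturb_toward_allowed(weights, x, epsilon):
    """Shift each feature by at most epsilon in the direction that lowers the score."""
    return [xi - epsilon * sign(w) for w, xi in zip(weights, x)]


# Hypothetical model and input.
weights = [2.0, -1.0, 0.5]
bias = -0.5
original = [0.6, 0.4, 0.2]

adversarial = perturb_toward_allowed(weights, original, epsilon=0.2)

print("original:   ", classify(weights, bias, original))
print("adversarial:", classify(weights, bias, adversarial))
```

Here each feature moves by no more than 0.2, yet the score drops from about 0.4 to about -0.3 and the decision flips from "flagged" to "allowed". Attacks on deployed models are more involved, but the sketch shows why small, deliberate input changes are a design-time concern.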

The Importance of Data Security
"One of the remarkable differences in ML security and, say, normal operational security is that data and data security play a huge role," says McGraw. "When you are training up a system, you can train it up to be racist and xenophobic and horrible if your data are set up that way," he says.

As one example, he points to Microsoft's very short-lived experiment with Tay, an AI-enabled chatbot that learned from interactions on Twitter and quickly began spewing out venomous tweets of its own. "Tay was learning about Twitter by being on it, and what happened was it became a racist, bigoted troll," he says. "Tay learned what it was like to be on Twitter, and it wasn't pretty."  

Such incidents highlight why organizations need to think carefully about the data they are using for machine training, how the data gets sourced, and whether the sources are reliable, he says.

Contrary to what some might assume, attacking a machine-learning system is not all that complicated, McGraw notes. "Imagine the input data for Google Translate is anything that you type in," he says. "If you are using public data sources to train your machine learning model, you have to think about what happens when an attacker starts screwing around with [that data].

"The good news is if you are an engineer or a designer, you can make it harder for someone to attack your system. That's the purpose of this work."


Jai Vijayan is a seasoned technology reporter with over 20 years of experience in IT trade journalism. He was most recently a Senior Editor at Computerworld, where he covered information security and data privacy issues for the publication. Over the course of his 20-year ...
 

Comments
gem@garymcgraw.com
User Rank: Author
2/13/2020 | 11:22:44 PM
BIML Risk Analysis
Thanks for this thoughtful writeup of our work at BIML.  Though there is a link to our work in the body of the article, it may be hard to spot.

You can download the article here: berryvilleiml.com/results (cut and paste)

There is also an interactive risk model you can play with: berryvilleiml.com/interactive (cut and paste)

gem