News, news analysis, and commentary on the latest trends in cybersecurity technology.

Understanding the risks of generative AI and the specific defenses to build to mitigate those risks is vital for effective business and public use of GenAI.

+1
Banghua Zhu, Jiantao Jiaoand 1 more

November 20, 2023

4 Min Read
Collaboration between robotic hand and human software developer at computer keyboard solving problems
Source: Pitinan Piyavatin via Alamy Stock Photo

Jailbreaking and prompt injection are new, rising threats to generative AI (GenAI). Jailbreaking tricks the AI with specific prompts to produce harmful or misleading results. Prompt injection conceals malicious data or instructions within typical prompts, resembling SQL injection in databases, that leads the model to produce unintended outputs, creating vulnerabilities or reputational risks.

Reliance on generated content also creates other problems. For example, many developers are starting to use GenAI models, like Microsoft Copilot or ChatGPT, to help them write or revise source code. Unfortunately, recent research indicates that code output by GenAI can contain security vulnerabilities and other problems that developers might not realize. However, there is also hope that over time GenAI might be able to help developers write code that is more secure.

Additionally, GenAI is bad at keeping secrets. Training an AI on proprietary or sensitive data introduces the risk of that data being indirectly exposed or inferred. This may include the leak of personally identifiable information (PII) and access tokens. More importantly, detecting these leaks can be challenging due to the unpredictability of the model's behavior. Given the vast number of potential prompts a user might pose, it's infeasible to comprehensively anticipate and guard against them all.

Traditional Approaches Fall Short

Attacks on GenAI are more similar to attacks on humans — such as scams, con games, and social engineering — than technical attacks on code. Traditional security products like rule-based firewalls, designed primarily for conventional cyber threats, were not designed with the dynamic and adaptive nature of GenAI threats in mind and can't address the emergent threats outlined above. Two common security methodologies — data obfuscation and rule-based filtering — have significant limitations.

Data obfuscation or encryption, which disguises original data to protect sensitive information, is frequently used to ensure data privacy. However, the challenge of data obfuscation for GenAI is the difficulty in pinpointing and defining which data is sensitive. Furthermore, the interdependencies in data sets mean that even if certain pieces of information are obfuscated, other data points might provide enough context for artificial intelligence to infer the missing data.

Traditionally, rule-based filtering methods protected against undesirable outputs. Applying this to GenAI by scanning its inputs and outputs seems intuitive. However, malicious users can often bypass these systems, making them unsuitable for AI safety.

This figure highlights some complex jailbreaking prompts that evade simple rules:

Rule-based defenses can be easily defeated. (Source: Banghua Zhu, Jiantao Jiao, and David Wagner)

Current models from companies like OpenAI and Anthropic use RLHF to align model outputs with universal human values. However, universal values may not be sufficient: Each application of GenAI may require its own customization for comprehensive protection.

Toward a More Robust GenAI Security

As shown in the examples above, attacks on GenAI can be diverse and hard to anticipate. Recent research emphasizes that a defense will need to be as intelligent as the underlying model to be effective. Using GenAI to protect GenAI is a promising direction for defense. We foresee two potential approaches: black-box and white-box defense.

A black-box defense would entail an intelligent monitoring system — which necessarily has a GenAI component — for GenAI, analyzing outputs for threats. It's akin to having a security guard who inspects everything that comes out of a building. It is probably most appropriate for commercial closed-source GenAI models, where there is no way to modify the model itself.

A white-box defense delves into the model's internals, providing both a shield and the knowledge to use it. With open GenAI models, it becomes possible to fine-tune them against known malicious prompts, much like training someone in self-defense. While a black-box approach might offer protection, it lacks tailored training; thus, the white-box method is more comprehensive and effective against unseen attacks.

Besides intelligent defenses, GenAI calls for evolving threat management. GenAI threats, like all technology threats, aren't stagnant. It's a cat-and-mouse game where, for every defensive move, attackers design a countermove. Thus, security systems need to be ever-evolving, learning from past breaches and anticipating future strategies. There is no universal protection for prompt injection, jailbreaks, or other attacks, so for now one pragmatic defense might be to monitor and detect threats. Developers will need tools to monitor, detect, and respond to attacks on GenAI, as well as a threat intelligence strategy to track new emerging threats.

We also need to preserve flexibility in defense techniques. Society has had thousands of years to come up with ways to protect against scammers; GenAIs have been around for only several years, so we're still figuring out how to defend them. We recommend developers design systems in a way that preserves flexibility for the future, so that new defenses can be slotted in as they are discovered.

With the AI era upon us, it's crucial to prioritize new security measures that help machines interact with humanity effectively, ethically, and safely. That means using intelligence equal to the task.

About the Author(s)

Banghua Zhu

Ph.D. candidate, California, Berkeley

Banghua Zhu is a Ph.D. candidate in the Department of Electrical Engineering and Computer Sciences at University of California, Berkeley, advised by Prof. Michael I. Jordan and Prof. Jiantao Jiao. He is the recipient of the 2023 David J. Sakrison Memorial Prize from Berkeley EECS for truly outstanding PhD research. He is affiliated with Berkeley AI Research (BAIR), Berkeley Laboratory for Information and System Sciences (BLISS) and the Center for the Theoretical Foundations of Learning, Inference, Information, Intelligence, Mathematics and Microeconomics at Berkeley (CLIMB). His research interests include generative AI, reinforcement learning with human feedback, and trustworthy machine learning.

Jiantao Jiao

Assistant Professor, University of California

Jiantao Jiao is an Assistant Professor in the Department of Electrical Engineering and Computer Sciences and Department of Statistics at the University of California, Berkeley. He received a Ph.D. from Stanford University in 2018. He co-directs the Center for the Theoretical Foundations of Learning, Inference, Information, Intelligence, Mathematics, and Microeconomics at Berkeley (CLIMB). He is also a member of the Berkeley Artificial Intelligence Research (BAIR) Lab, the Berkeley Laboratory of Information and System Sciences (BLISS), and the Berkeley Center for Responsible, Decentralized Intelligence (RDI). His research has been focusing on generative AI, foundation models, privacy and security in machine learning systems, reinforcement learning, the economic perspective of machine learning, and the applications of machine learning in natural language processing, code generation, computer vision, autonomous driving, and robotics.

David Wagner

Professor of Computer Science, University of Californi

David Wagner is a Professor of Computer Science in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. He earned an A.B. in Mathematics from Princeton and M.S./Ph.D. degrees from UC Berkeley, where he was advised by Eric Brewer. His research interests include computer security, systems security, usable security, and program analysis for security. He is currently working on security for generative AI, software, wearable devices, smartphone security, and other topics in computer security. He has published two books and over 90 peer-reviewed scientific papers. Wagner also served on the committee for the NSA Award for the Best Scientific Cybersecurity Paper, and the editorial boards for CACM Research Highlights and the Journal of Election Technology and Systems (JETS). His paper, "Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples" won the best paper award at ICML 2018. He testified before Congress about remote voting security in 2020.

Keep up with the latest cybersecurity threats, newly discovered vulnerabilities, data breach information, and emerging trends. Delivered daily or weekly right to your email inbox.

You May Also Like


More Insights