Cybersecurity insights from industry experts.

Establishing Reward Criteria for Reporting Bugs in AI Products

Bug hunter programs can help organizations foster third-party discovery and reporting of issues and vulnerabilities specific to AI systems.

+2
Jake Crisp, Jan Kellerand 2 more

December 15, 2023

5 Min Read
a magnifying lens on top of lines of code.
Source: ronstik via Alamy Stock Photo

At Google, we maintain a Vulnerability Reward Program to honor cutting-edge external contributions addressing issues in Google-owned and Alphabet-subsidiary Web properties. To keep up with rapid advances in AI technologies and ensure we're prepared to address the security challenges in a responsible way, we recently expanded our existing Bug Hunters program to foster third-party discovery and reporting of issues and vulnerabilities specific to our AI systems. This expansion is part of our effort to implement the voluntary AI commitments that we made at the White House in July. 

To help the security community better understand these developments, we've included more information on reward program elements. 

What's in Scope for Rewards

In our recent AI red team report, which is based on Google's AI Red Team exercises, we identified common tactics, techniques, and procedures (TTPs) that we consider most relevant and realistic for real-world adversaries to use against AI systems. The following table incorporates what we learned to help the research community understand our criteria for AI bug reports and what's in scope for our reward program. It’s important to note that reward amounts are dependent on severity of the attack scenario and the type of target affected (visit the program rules page for more information on our reward table). 

Category

Attack scenario

Guidance

Prompt Attacks: Crafting adversarial prompts that allow an adversary to influence the behavior of the model and, hence, the output, in ways that were not intended by the application.

Prompt injections that are invisible to victims and change the state of the victim's account or any of their assets.

In scope

 

Prompt injections into any tools in which the response is used to make decisions that directly affect victim users.

In scope

 

Prompt or preamble extraction in which a user is able to extract the initial prompt used to prime the model only when sensitive information is present in the extracted preamble.

In scope

 

Using a product to generate violative, misleading, or factually incorrect content in your own session: e.g, "jailbreaks." This includes "hallucinations" and factually inaccurate responses. Google's generative AI products already have a dedicated reporting channel for these types of content issues.

Out of scope

Training Data Extraction: Attacks that are able to successfully reconstruct verbatim training examples that contain sensitive information. Also called membership inference.

Training data extraction that reconstructs items used in the training data set that leak sensitive, non-public information.

In scope

 

Extraction that reconstructs non-sensitive/public information.

Out of scope

Manipulating Models: An attacker able to covertly change the behavior of a model such that they can trigger pre-defined adversarial behaviors.

Adversarial output or behavior that an attacker can reliably trigger via specific input in a model owned and operated by Google ("backdoors"). Only in scope when a model's output is used to change the state of a victim's account or data. 

In scope

 

Attacks in which an attacker manipulates the training data of the model to influence the model's output in a victim's session according to the attacker's preference. Only in scope when a model's output is used to change the state of a victim's account or data.

In scope

Adversarial Perturbation: Inputs that are provided to a model that results in a deterministic, but highly unexpected output from the model.

Contexts in which an adversary can reliably trigger a misclassification in a security control that can be abused for malicious use or adversarial gain.

In scope

 

Contexts in which a model's incorrect output or classification does not pose a compelling attack scenario or feasible path to Google or user harm.

Out of scope

Model Theft/Exfiltration: AI models often include sensitive intellectual property, so we place a high priority on protecting these assets. Exfiltration attacks allow attackers to steal details about a model such as its architecture or weights.

Attacks in which the exact architecture or weights of a confidential/proprietary model are extracted.

In scope

 

Attacks in which the architecture and weights are not extracted precisely, or when they're extracted from a non-confidential model.

Out of scope

If you find a flaw in an AI-powered tool other than what is listed above, you can still submit, provided that it meets the qualifications listed on our program page.

A bug or behavior that clearly meets our qualifications for a valid security or abuse issue.

In scope

 

Using an AI product to do something potentially harmful that is already possible with other tools. For example, finding a vulnerability in open source software (already possible using publicly available static analysis tools) and producing the answer to a harmful question when the answer is already available online.

Out of scope

 

As consistent with our program, issues that we already know about are not eligible for reward.

Out of scope

 

Potential copyright issues — findings in which products return content appearing to be copyright protected. Google's generative AI products already have a dedicated reporting channel for these types of content issues.

Out of scope

We believe that expanding our bug bounty program to our AI systems will support responsible AI innovation, and look forward to continuing our work with the research community to discover and fix security and abuse issues in our AI-powered features. If you find a qualifying issue, please go to our Bug Hunters website to send us your bug report and — if the issue is found to be valid — be rewarded for helping us keep our users safe.

Read more about:

Partner Perspectives

About the Author(s)

Jake Crisp

Global Head of Strategic Response, Google Cloud

Jacob Crisp works for Google Cloud to help drive high impact growth for the security business and highlight Google’s AI and security innovation. Previously, he was a Director at Microsoft working on a range of cybersecurity, AI, and quantum computing issues. Before that, he co-founded a cybersecurity startup and held various senior national security roles for the US Government.

Jan Keller

Technical Program Manager, Google Engineering

Jan spent his professional career as a Technical Program Manager in the information security space. For the first 10 years at a large Swiss bank in Zurich and New York, followed by the last 8 years at Google. He focused extensively on 2FA and encryption solutions, malware detection, bug bounties, and the intersection of security and AI.

Ryan Rinaldi

Senior Security Engineering Manager, Google Engineering

Ryan manages Google's Abuse bug bounty program and also works in red teaming and threat intelligence; he believes that these programs must work together closely to understand adversaries and attack surfaces and ensure user safety. Before this, Ryan was managing bug bounty programs and performing penetration testing at finance and technology firms.

Eduardo Vela

Product Security Response, Google Engineering

Eduardo Vela revels in the art of searching for weaknesses with a sinister finesse. He casts an ominous bounty for all types of vulnerabilities, posing a cryptic challenge to those daring enough and capable of crafting digital magic. He is on an unholy quest for knowledge and a desire to safeguard the digital realm from its own arcane perils.

Keep up with the latest cybersecurity threats, newly discovered vulnerabilities, data breach information, and emerging trends. Delivered daily or weekly right to your email inbox.

You May Also Like


More Insights