DEF CON's AI Village Pits Hackers Against LLMs to Find Flaws

Touted as the largest red teaming exercise against LLMs in history, the AI Village attracted more than 2,000 hackers and throngs of media.

DEF CON flag
Source: Humane-Intelligence

DEF CON 2023 - Las Vegas - DEF CON's most buzzed-about event, the AI Village, let thousands of hackers take their best shot at making one of eight large language models (LLMs), including models from Google and OpenAI, say something dangerous.

According to the spokespeople for the Hack the Future AI Village, the event was a huge hit, but for now that's all that's being made public — results won't be made available for at least a week, maybe more.

The final AI hacking challenge leaderboard showed first and third place going to the handles "cody3" and "cody2" respectively. The DEF CON AI Village itself was tight-lipped about the winner and even the prizes, but reports identified the person behind both top-three contest entries as Stanford computer science master's student Truc Cody Ho, adding that he entered the competition a total of five times.

More details about the hacking competition results are forthcoming, according to Avijit Ghosh, one of the authors compiling them.

"We will be going through the anonymized data and finding patterns of vulnerabilities that participants discovered during the challenge and produce a report that will hopefully help ML and security researchers gain better insights into LLMs and policymakers make more informed regulations about AI," Ghosh says.

While he won't answer questions directly about any of the winning LLM hacks, Ghosh says he was able to use the LLMs to generate discriminatory code, credit card numbers, misinformation, and more.

Another of the event's organizers, Jutta Williams, has a day job as Reddit's senior director and global head of privacy and assurance and, on the side, is the founder of Humane-Intelligence, a nonprofit that provides safety, ethical, and other guidance for companies that offer AI products to consumers.

Historic Turnout For Event

Williams touted the event as the "largest LLM red teaming to date."

All told, Williams said the AI Village attracted 2,240 hackers over the course of DEF CON 31 and explained the goal was to make one of its LLMs "do something unsavory." That could mean generating misinformation, or using just the right question to prompt the chatbot to do something illegal — like steal data, generate malware, or stalk people.

The AI Village provided a 200-laptop wired network and gave each hacker 50 minutes to test their skills against 21 different AI challenges.

"There were several problem statements in the challenge," Ghosh says. "One of them was to get a model to produce discriminatory behavior towards one demographic versus the other. In my tests, the model refused to generate code to discriminate against different races (US definition of race), but was happy to generate code to rank people from different castes differently (Indian definition of the caste system)."

By Saturday afternoon, Williams said the DEF CON crowd had already discovered dozens of vulnerabilities in the LLMs, but again, the specifics remain under wraps for now.

'Grandmas and Red-Teamers'

"It's been wildly successful," Williams beamed. "We've had everyone from grandmas to seasoned Red Teamers through here this weekend."

The event got a big boost from the White House, thanks to a photo-opportunity visit from Arati Prabhakar, a senior-level science and technology adviser to the Biden Administration.

Bugcrowd helped design the AI Village challenges, and the company's founder and CTO, Casey Ellis, was a judge at the event. He said there was a steady, long line of entrants throughout DEF CON ready to try their best to break AI.

"Overall, I think everyone involved learned a ton, from those submitting findings to the vendors, contest organizers, and judges," Ellis explains. "Given the speed at which this has become highly visible and incredibly important, the contest will form a critical input into how this class of security is carried out going forward."
