ChatGPT Jailbreaking Forums Proliferate in Dark Web Communities

By code or by command, cybercriminals are circumventing ethical and safety restrictions to use generative AI chatbots in the way that they want.

Image: Red and white robots placed beside each other. Source: Charles Taylor

The weaponization of generative AI tools like ChatGPT that everybody has been waiting for is slowly, slowly beginning to take form. In online communities, curious cats are collaborating on new ways to crack ChatGPT's ethics rules, commonly known as "jailbreaking," and hackers are developing a network of new tools to leverage or create large language models (LLMs) for malicious ends.

Just as it has above ground, ChatGPT appears to have inspired a frenzy in underground forums. Ever since December, hackers have been on the hunt for new and inventive prompts to manipulate ChatGPT, and for open source LLMs they can repurpose for malicious ends.

The result, according to a new blog post from SlashNext, is a still nascent but flourishing LLM hacking community, in possession of lots of clever prompts but few AI-enabled malware tools worth a second thought.

What Hackers Are Doing With AI LLMs

Prompt engineering involves cleverly asking chatbots like ChatGPT questions aimed at manipulating them, making them break their programmed rules against, say, creating malware, without the models knowing it. This is an exercise of brute force, explains Patrick Harr, CEO of SlashNext: "Hackers are just trying to look around the guardrails. What are the edges? I just continuously change the prompts, ask it in different ways to do what I want."

Because it's such a tedious task, and because everybody's attacking the same target, it's only natural that healthy-sized online communities have formed around the practice to share tips and tricks. Members of these jailbreak communities scratch one another's backs, helping each other make ChatGPT crack and do things its developers intended to prevent.


Prompt engineers can only achieve so much with fancy wordplay, though, if the chatbot in question is built as resiliently as ChatGPT is. So the more worrying trend is that malware developers are beginning to program LLMs for their own nefarious ends.

The Looming Threat of WormGPT & Malicious LLMs

An offering called WormGPT appeared in July to kick off the malicious LLM phenomenon. It's a black-hat alternative to GPT models specifically designed for malicious activities like BEC, malware, and phishing attacks, marketed on underground forums as being "like ChatGPT but [with] no ethical boundaries or limitations." The creator of WormGPT claimed to have built it on a custom language model, trained on various data sources, with an emphasis on data relating to cyberattacks.

"What it means for hackers," Harr explains, "is I can now take, say, a business email compromise (BEC), or a phishing attack, or malware attack, and do this at scale at very minimal cost. And I could be much more targeted than before."

Since WormGPT, a number of similar products have been bandied about in shady online communities, including FraudGPT, which is advertised as a "bot without limitations, rules, [and] boundaries" by a threat actor who claims to be a verified vendor on various underground Dark Web marketplaces, including Empire, WHM, Torrez, World, AlphaBay, and Versus. And August brought the appearance of the DarkBART and DarkBERT cybercriminal chatbots, based on Google Bard, which researchers at the time said represent a major leap ahead for adversarial AI, including Google Lens integration for images and instant access to the whole of the cyber-underground knowledge base.

According to SlashNext, these tools are proliferating now, with the majority of them built upon open source models like OpenAI's OpenGPT. A slew of lower-skilled hackers simply customize such a model, disguise it in a wrapper, then slap a vaguely ominous "___GPT" name on it (e.g., "BadGPT," "DarkGPT"). Even these ersatz offerings have their place in the community, though, offering few limitations and total anonymity for users.

Defending Against Next-Gen AI Cyberweapons

Neither WormGPT, nor its offspring, nor prompt engineers present a significant danger to businesses quite yet, according to SlashNext. Even so, the rise of underground jailbreaking markets means that more tools are becoming available to cybercriminals, which in turn portends a broad shift in social engineering and in how we defend against it.

Harr advises: "Don't rely on training, because these attacks are very, very specific, and very targeted, much more so than they were in the past."

Instead, he subscribes to the generally agreed-upon view that AI threats require AI protections. "If you don't have AI tools detecting and predicting and blocking these threats, you're going to be on the outside looking in," he says.

About the Author(s)

Nate Nelson, Contributing Writer

Nate Nelson is a freelance writer based in New York City. Formerly a reporter at Threatpost, he contributes to a number of cybersecurity blogs and podcasts. He writes "Malicious Life" -- an award-winning Top 20 tech podcast on Apple and Spotify -- and hosts every other episode, featuring interviews with leading voices in security. He also co-hosts "The Industrial Security Podcast," the most popular show in its field.
