35 years after the Morris worm, we're still dealing with a version of the same issue: data overlapping with control.

3 Min Read
An earthworm in the dirt
Source: Denis Crawford via Alamy Stock Photo

A worm that uses clever prompt engineering and injection is able to trick generative AI (GenAI) apps like ChatGPT into propagating malware and more.

In a laboratory setting, three Israeli researchers demonstrated how an attacker could design "adversarial self-replicating prompts" that convince a generative model into replicating input as output – if a malicious prompt comes in, the model will turn around and push it back out, allowing it to spread to further AI agents. The prompts can be used for stealing information, spreading spam, poisoning models, and more.

They've named it "Morris II," after the infamous 99-line self-propagating malware which took out a tenth of the entire Internet back in 1988.

"ComPromptMized" AI Apps

To demonstrate how self-replicating AI malware could work, the researchers created an email system capable of receiving and sending emails using generative AI.

Next, as a red team, they wrote a prompt-laced email which takes advantage of retrieval-augmented generation (RAG) — a method AI models use to retrieve trusted external data — to contaminate the receiving email assistant's database. When the email is retrieved by the RAG and sent on to the gen AI model, it jailbreaks it, forcing it to exfiltrate sensitive data and replicate its input as output, thereby passing on the same instructions to further hosts down the line.

The researchers also demonstrated how an adversarial prompt can be encoded in an image to similar effect, coercing the email assistant into forwarding the poisoned image to new hosts. By either of these methods, an attacker could automatically propagate spam, propaganda, malware payloads, and further malicious instructions through a continuous chain of AI-integrated systems.

New Malware, Old Problem

Most of today's most advanced threats to AI models are just new versions of the oldest security problems in computing.

"While it's tempting to see these as existential threats, these are no different in threat than the use of SQL injection and similar injection attacks, where malicious users abuse text-input spaces to insert additional commands or queries into a supposedly sanitized input," says Andrew Bolster, senior R&D manager for data science at Synopsys. "As the research notes, this is a 35-year-old idea that still has legs (older in fact; father-of-modern-computing-theory John Von Neumann theorized on this in the 50s and 60s)."

Part of what made the Morris worm novel in its time three decades ago was the fact that it figured out how to jump the data space into the part of the computer that exerts controls, enabling a Cornell grad student to escape the confines of a regular user and influence what a targeted computer does.

"A core of computer architecture, for as long as there have been computers, has been this conceptual overlap between the data space and the control space — the control space being the program instructions that you are following, and then having data that's ideally in a controlled area," Bolster explains.

Clever hackers today use GenAI prompts largely to the same effect. And so, just like software developers before them, for defense, AI developers will need some way to ensure their programs don't confuse user input for machine output. Developers can offload some of this responsibility to API rules, but a deeper solution might involve breaking up the gen AI models themselves into constituent parts. This way, data and control aren't living side-by-side in the same big house.

"We're really starting to work on: How do we go from this everything-in-one-box approach, to going for more of a distributed multiple agent approach," Bolster says. "If you want to really squint at it, this is kind of analogous to the shift in microservices architecture from one big monolith. With everything in a services architecture, you're able to put runtime content gateways between and around different services. So you as a system operator can ask 'Why is my email agent expressing things like images?' and put constraints on."

About the Author(s)

Nate Nelson, Contributing Writer

Nate Nelson is a freelance writer based in New York City. Formerly a reporter at Threatpost, he contributes to a number of cybersecurity blogs and podcasts. He writes "Malicious Life" -- an award-winning Top 20 tech podcast on Apple and Spotify -- and hosts every other episode, featuring interviews with leading voices in security. He also co-hosts "The Industrial Security Podcast," the most popular show in its field.

Keep up with the latest cybersecurity threats, newly discovered vulnerabilities, data breach information, and emerging trends. Delivered daily or weekly right to your email inbox.

You May Also Like


More Insights