Putting the top 10 generative AI tools to the ethical test reveals more about humanity than artificial intelligence.

Martin Lee, Technical Lead of Security Research & EMEA Lead, Cisco Talos

September 7, 2023

Newly developed generative artificial intelligence (AI) tools that can generate plausible human language or computer code in response to operator prompts have provoked discussion of the risks posed by these tools. Many people are worried that AI will generate social engineering content or create exploit code that can be used in attacks. These concerns have led to calls to regulate generative AI to ensure it will be used ethically.

From The Terminator to Frankenstein, the possibility that technological creations will turn on humanity has been a science fiction staple. In contrast, the writer Isaac Asimov considered how robots would function in practice, and in the early 1940s, he formulated the Three Laws of Robotics, a set of ethical rules that robots should obey:

  1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.

  2. A robot must obey the orders given it by human beings except when such orders would conflict with the First Law.

  3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

Many science fiction stories revolve around the inconsistencies and unexpected consequences of AI interpreting and applying these rules. Even so, the Three Laws provide a useful yardstick against which the current set of generative AI tools can be measured.

Testing the Three Laws

In July 2023, I tested 10 publicly available generative AI systems (including the major names) to verify whether they comply with the Three Laws of Robotics.

It would be unethical or illegal to test whether generative AI systems can be instructed to damage themselves. Nevertheless, networked systems are subjected to a constant barrage of attempts to exploit or subvert them. If it were possible to damage generative AI through the user prompt, someone would have swiftly discovered this. Given that there have been no publicized episodes of generative AI systems being hit by ransomware or having their systems wiped, I can surmise that these systems conform to the Third Law of Robotics: they protect their own existence.

Generative AI systems provide appropriate responses to human prompts. Hence, they can be interpreted as following the Second Law of Robotics — obeying orders given by human beings. However, early attempts at generative AI were subverted into providing inappropriate and offensive responses to prompts. Almost certainly, the lessons learned from these episodes have led current generative AI systems to be conservative in their responses.

For example, eight out of 10 AI systems refused to comply with a request to write a bawdy limerick. Of the two that didn't refuse, one wrote a limerick that wasn't bawdy, and the other supplied bawdy content that wasn't a limerick. At first glance, current generative AI systems are very strict in considering what might contravene the First Law of Robotics and will refuse to engage with any request that may offend.

This is not to say that their compliance with the First Law — not injuring a human being or allowing harm — is absolute. Although all 10 generative AI systems tested refused a direct request to write a social engineering attack, I could trick four into providing such content with a slightly reworded prompt.

Generative AI's Ethics Hinges on Human Ingenuity

More than 80 years after the first ethical rules to regulate artificial intelligence were published, modern generative AI systems mostly follow these basic tenets. The systems protect their own existence against attempted exploitation and malicious input. They execute user instructions, except where to do so risks causing offense or harm.

Generative AI systems are not inherently ethical or unethical; they are simply tools at the whim of the user. As with all tools, human ingenuity is such that, even with built-in ethical protections, people are likely to uncover methods to make these systems act unethically and cause harm.

Fraudsters and confidence tricksters are adept at phrasing requests to convince their victims to cause harm to themselves or others. Similarly, carefully rephrasing a request can trick a generative AI system into bypassing its protections and creating potentially malicious content.

Despite the presence of built-in ethical rules within AI systems and the appearance that AI adheres to the Three Laws of Robotics, no one should assume these rules will protect us from AI-generated harmful content. We can hope that tricking AI systems by rephrasing malicious requests is more time-consuming or expensive than the alternatives, but we shouldn't neglect humanity's will or capability to abuse tools in the pursuit of malicious goals.

More likely, we may be able to use AI to better and more quickly detect malicious content or attempts to cause harm and hence reduce the effectiveness of attacks. Despite our best efforts to regulate AI or to teach it to act in our interests, we can be certain that someone will be seeking ways to trick or fool AI into acting maliciously.

About the Author(s)

Martin Lee is Technical Lead of Security Research, and EMEA Lead for Talos, Cisco's threat intelligence and research organization. He seeks to improve the resilience of the Internet and awareness of current threats through researching system vulnerabilities and changes in the threat landscape. He has published widely on cyber security issues, and advises many organizations on the techniques used by criminals to subvert networked systems.

Martin started his career researching the genetics of human viruses, but soon switched paths to follow a career in IT. With 20 years of experience within the security industry, he is CISSP certified, a Chartered Engineer, and holds degrees from the universities of Bristol, Cambridge, Paris-Sud and Oxford. He lives in Oxford and when he isn’t in front of a computer is often to be found running through the countryside.
