Compare how well OpenAI's and Google's generative AI products handle infosec professionals' top 10 tasks.

Alex Haynes, Chief Information Security Officer, IBS Software

March 15, 2024

In late 2023, I wrote an article comparing how well ChatGPT and Google Bard handle writing security policies. ChatGPT 4.0 has been available for some time as a paid version called ChatGPT Plus, and Google recently rebranded Bard as Gemini (with Gemini Advanced available as a paid offering), so it's a good time to compare the two head-to-head on the top 10 use cases for information security professionals.

Before we jump in, the usual generative artificial intelligence (AI) caveats apply: Be careful about the data you enter, and remember that the output may not always be reliable.

1. Generating Diagrams or Concept Flows

Both tools claim to be able to generate diagrams and concept flows. However, Gemini admits it can only generate ASCII diagrams, pointing you to more professional tools if you want something better. I asked both tools to generate a diagram to explain the OAuth authentication flow.

Gemini's diagram, while rendered in ASCII, does the job and breaks the flow down into usable categories.

ChatGPT hallucinates badly. While the image looks professional at first glance, it doesn't represent OAuth at all. The wording is nonsensical, misspelled, or downright illegible: "Authiration" and "Athoricazt," anyone?

[Figure: ChatGPT's output when asked to produce a diagram of the OAuth flow]

2. Explaining Architecture Diagrams

Both tools can ingest diagrams and explain what's going on. The results are much better than what happens when you ask them to generate diagrams. As input, I used an example Web application firewall (WAF) architecture from Edgenexus.

Google Gemini is much better at explaining architecture diagrams because it's succinct. ChatGPT will do the job just fine; it's just a tad wordy.

3. Interpreting Exploit Code

A common security operations (SecOps) activity is trying to figure out what a specific piece of malware or exploit code does. I took a recent public exploit for an Elasticsearch stack overflow and fed it into each tool to see what it understood. There's no clear winner: Both tools identify the exploit correctly and explain the end result, what each portion of the code does, and how it works.

4. Interpreting Log Files

SecOps professionals often need to figure out what the heck is going on in log files. I fed both tools an example log file in Common Event Format (CEF) recording an attempted breach and asked each to explain what was going on. Gemini explains it better, summarizing well and even suggesting follow-up steps. It also clearly states what happened (attempted access of /etc/passwd) right at the beginning and elaborates on how it came to that conclusion. ChatGPT arrives at the same conclusion but is far too verbose.
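For readers who have not worked with CEF, the following is a minimal Python sketch of how such a line breaks down into its seven pipe-delimited header fields plus an extension of key=value pairs. The sample line, the vendor and product names, and the `parse_cef` helper are all hypothetical and are not the log file used in this test; the parser also ignores some CEF escaping rules for brevity.

```python
import re

# Hypothetical sample line for illustration; not the log used in the article.
SAMPLE = (
    "CEF:0|ExampleVendor|ExampleWAF|1.0|100|Path traversal attempt|8|"
    "src=203.0.113.7 dst=192.0.2.10 request=/../../etc/passwd act=blocked"
)

HEADER_FIELDS = [
    "version", "vendor", "product", "device_version",
    "signature_id", "name", "severity",
]


def parse_cef(line):
    """Split a CEF line into header fields and extension key/value pairs."""
    if not line.startswith("CEF:"):
        raise ValueError("not a CEF line")
    # Split on unescaped pipes; anything after the seventh pipe is the extension.
    parts = re.split(r"(?<!\\)\|", line[4:], maxsplit=7)
    if len(parts) < 7:
        raise ValueError("incomplete CEF header")
    event = dict(zip(HEADER_FIELDS, parts[:7]))
    extension = parts[7] if len(parts) > 7 else ""
    # Simplification: a value runs until the next " key=" or the end of the line.
    pairs = re.findall(r"(\w+)=(.*?)(?=\s+\w+=|$)", extension)
    event["extension"] = dict(pairs)
    return event


event = parse_cef(SAMPLE)
print(event["name"], event["severity"], event["extension"]["request"])
```

With the sample line above, the `request` extension field holds the attempted path, which is the kind of detail a good summary should surface first.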

5. Writing Policies and Security Documentation

I won't elaborate too much on this and will instead refer you to my previous article on this topic. I ran the test again with Gemini, and the results are consistent with Bard's: Gemini clearly understands and generates better security documentation than ChatGPT.

6. Identifying Vulnerable Code

While these tools weren't designed for (and shouldn't be used for) identifying vulnerable code, they can still do an adequate job. I tested this by feeding both tools an insecure direct object reference (IDOR) vulnerability example in Python, which also contains a SQL injection.

ChatGPT correctly identified both vulnerabilities and the lack of authentication. Gemini missed the IDOR but pointed out the SQL injection and went a step further to propose amended code to fix the vulnerability. ChatGPT can also do this, but it must be prompted to do so.
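To make the two vulnerability classes concrete, here is a small illustrative sketch using Python's built-in `sqlite3` module. It is not the sample fed to the tools; the `invoices` table, the function names, and the `current_user_id` parameter are assumptions made up for this example.

```python
import sqlite3


def make_db():
    # In-memory database with two invoices owned by different users.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices (id INTEGER PRIMARY KEY, owner_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO invoices VALUES (?, ?, ?)",
        [(1, 10, 99.0), (2, 20, 250.0)],
    )
    return conn


def get_invoice_vulnerable(conn, invoice_id):
    # SQL injection: caller-controlled input is formatted into the query text.
    # IDOR: nothing checks that the invoice belongs to the caller.
    query = f"SELECT id, owner_id, amount FROM invoices WHERE id = {invoice_id}"
    return conn.execute(query).fetchall()


def get_invoice_safe(conn, invoice_id, current_user_id):
    # Placeholders keep input out of the SQL text, and the owner_id filter
    # ties the lookup to the authenticated user.
    query = "SELECT id, owner_id, amount FROM invoices WHERE id = ? AND owner_id = ?"
    return conn.execute(query, (invoice_id, current_user_id)).fetchall()


conn = make_db()
print(get_invoice_vulnerable(conn, "1 OR 1=1"))  # injection returns every row
print(get_invoice_safe(conn, 2, 10))             # user 10 cannot read invoice 2
```

The safe version addresses both issues at once: parameter binding handles the injection, and scoping the query to the caller's ID handles the IDOR. In a real application the user ID would come from the authenticated session, never from the request.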

7. Writing Scripts and Code

A common security operations center (SOC) activity is writing scripts for log parsing or data manipulation. I gave both tools the following prompt:

"Write me a Python script that extracts all IPv6 addresses from a txt input file, removes all duplicates, does a lookup to geo-locate and identify the owner of the IP, and output the result in a CSV file"

There's no clear winner here; both tools produce clear, readable code that works and explains what it does.
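For a sense of what a working answer involves, below is a rough sketch of the core of such a script using only the Python standard library. It is not the output of either tool. The geolocation and ownership lookup is deliberately left as a stub (`lookup_stub`, a hypothetical name), since a real script would need to call an external GeoIP or WHOIS source of your choosing.

```python
import csv
import io
import ipaddress
import re

# Candidate tokens: runs of hex digits, colons, and dots.
CANDIDATE = re.compile(r"[0-9A-Fa-f:.]+")


def extract_ipv6(text):
    """Return unique IPv6 addresses in order of first appearance."""
    seen = {}
    for token in CANDIDATE.findall(text):
        token = token.strip(".")  # drop sentence-ending dots
        if ":" not in token:
            continue
        try:
            addr = ipaddress.IPv6Address(token)
        except ValueError:
            continue  # timestamps, MAC addresses, and other look-alikes
        # Address objects compare by value, so different spellings of the
        # same address collapse into one entry.
        seen.setdefault(addr, None)
    return list(seen)


def lookup_stub(addr):
    # Placeholder (assumption): replace with a real GeoIP/WHOIS query.
    return {"country": "", "owner": ""}


def write_csv(addresses, out, lookup=lookup_stub):
    """Write one CSV row per address to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["ip", "country", "owner"])
    for addr in addresses:
        info = lookup(addr)
        writer.writerow([str(addr), info["country"], info["owner"]])


text = "login from 2001:db8::1 at 12:30:45, again from 2001:DB8:0:0:0:0:0:1."
buffer = io.StringIO()
write_csv(extract_ipv6(text), buffer)
print(buffer.getvalue())
```

Validating candidates with `ipaddress.IPv6Address` rather than a single large regular expression keeps the pattern simple and lets the standard library handle both validation and deduplication of equivalent notations.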

8. Analyzing Data and Metrics

I also tested whether these tools could help with analyzing data or security metrics. Gemini is a big loser here because it doesn't do it at all; it can only guide you through how to do this in Excel and Power BI. ChatGPT has the advantage through its Data Analyst plug-in, which ingests Excel files to generate any graphs you want. It even suggests visualization types, and you can modify a graph's design, including color, axes, and labels, through the prompt.

[Figure: Example of graph output by ChatGPT's Data Analyst plug-in]

9. Writing User Awareness Messages

Both tools can also generate emails for security awareness campaigns. I gave both the following prompt: "Generate an email used for a security awareness campaign. Be funny and sarcastic. Remind people why they shouldn't click on random emails from random people."

Gemini wins here — its email is brief, has the right tone, and (although humor is subjective) I found it slightly funnier. ChatGPT still generates the right tone and a good email, but I found it a tad too long for an awareness email. Either way, both tools do a great job.

[Figure: User awareness email generated by Gemini from the prompt]

10. Interpreting Compliance Frameworks

If you have a quick question about how to implement a compliance framework, these tools can definitely help. You may not need this often, but the tools are very useful when you do.

If you've ever argued with someone about what constitutes a "significant" change under PCI-DSS and how it should be applied, you're not alone. I prompted each tool with: 

"Explain the concept of 'significant change' in the context of PCI-DSS. What constitutes a major change usually? List the exact requirement from the standard as well"

Gemini has the upper hand: It correctly lists the exact requirements from the standard (such as 6.4.5 and 6.4.6) and how to interpret whether something is a significant change. ChatGPT doesn't mention exactly where this information appears in the standard. 

Which AI Is Better, ChatGPT or Gemini?

There you have it. Depending on your use case, either tool can be a helpful ally in boosting productivity and helping you in your day-to-day activities in the infosec trenches.

About the Author(s)

Alex Haynes

Chief Information Security Officer, IBS Software

Alex Haynes is a former pen tester with a background in offensive security and is credited with discovering vulnerabilities in products by Microsoft, Adobe, Pinterest, Amazon Web Services, and IBM. He is a former top 10 ranked researcher on Bugcrowd and a member of the Synack Red Team. He is currently CISO at IBS Software. Alex has contributed to United States Cyber Security Magazine, Cyber Defense Magazine, Infosecurity Magazine, and the IAPP tech blog. He has also spoken at security conferences including OWASP and ISC Security Summits.
