Dark Reading is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them.Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Application Security

End of Bibblio RCM includes -->

GitHub Unveils AI Tool to Speed Development, but Beware Insecure Code

The company has created an AI system, dubbed Copilot, to offer code suggestions to developers, but warns that any code produced should be tested for defects and vulnerabilities.

A machine agent designed to help developers quickly create blocks of code based on a few semantic hints is expected to cut programming time significantly, but it also comes with a warning to watch out for errors and security issues. 

The technical preview, dubbed Copilot, was trained on billions of lines of code from projects on GitHub's development service to predict the intent of a function based on comments, documentation strings (docstrings), the name of the function, and any code entered by the developer. A collaboration between GitHub and OpenAI, Copilot will auto-complete entire blocks of code, but the groups warn that code could have defects, contain offensive language, and potentially have security issues.

Related Content:

Expect an Increase in Attacks on AI Systems

Special Report: Building the SOC of the Future

New From The Edge: 7 Skills the Transportation Sector Needs to Fuel Its Security Teams

"There’s a lot of public code in the world with insecure coding patterns, bugs, or references to outdated APIs or idioms," GitHub stated on its Copilot site. "When GitHub Copilot synthesizes code suggestions based on this data, it can also synthesize code that contains these undesirable patterns."

Technologies based on artificial intelligence (AI) and machine learning (ML) hold great promise for developers looking to reduce vulnerabilities, security analysts triaging alerts, and incident response specialists aiming to remediate issues faster, among other benefits. However, early ML systems are often prone to errors and adversarial attacks. 

In 2016, for example, Microsoft unveiled a chatbot named "Tay" on Twitter. The ML system attempted to converse with anyone sending it a message online, but it also learned from those conversations. A coordinated attack on Tay, however, led to the chatbot parroting offensive phrases and retweeting inappropriate images.

The example highlights how using input from the untrusted Internet to train ML algorithms can lead to unexpected results. GitHub stressed that Copilot is still an early effort, and security will be a focus in the future.

"It's early days, and we're working to include GitHub's own security tooling and exclude insecure or low-quality code from the training set, among other mechanisms," GitHub said in a response to questions from Dark Reading. "We hope to share more in the coming months."

Copilot is based on the OpenAI Codex, a new generative learning system that has been trained on the English language as well as source code from public repositories, such as GitHub. Typing in a comment, a function name, and variables will lead to Copilot auto-completing the body of the function with the most likely result, but the system will also offer other possible code blocks. 

In a test using Python functions, Copilot guessed the correct content in 43% of cases, and in 57% of cases, the correct code block existed in the top 10 results. The service is intended to prove that a machine agent can act as the other half of a pair programming team and significantly speed development. 

"GitHub Copilot tries to understand your intent and to generate the best code it can, but the code it suggests may not always work, or even make sense," the company stated in its response to questions from Dark Reading. "While we are working hard to make GitHub Copilot better, code suggested by GitHub Copilot should be carefully tested, reviewed, and vetted, like any other code."

Such systems are very likely to be targets of attackers. Researchers have taken great interest in finding ways to modify the results or disrupt such systems. The number of research papers focused on AI security and published online jumped to more than 1,500 in 2019, up from 56 three years earlier

In November 2020, MITRE worked with Microsoft and other technology companies to create a dictionary of potential adversarial attacks on AI/ML systems, and gave examples of a significant number of real attacks. 

Insecure code is not the only worry. Personal data inadvertently published to GitHub's site could be included in the output of the code, although the company found such instances "extremely rare" in this testing of the system. A study found that while Copilot could output exact instances of code, the system rarely did so.

Even so, the system is a tool and not a replacement for good coding practices, the company said.

"As the developer, you are always in charge," it said.

Veteran technology journalist of more than 20 years. Former research engineer. Written for more than two dozen publications, including CNET News.com, Dark Reading, MIT's Technology Review, Popular Science, and Wired News. Five awards for journalism, including Best Deadline ... View Full Bio

Comment  | 
Print  | 
More Insights
Threaded  |  Newest First  |  Oldest First
User Rank: Apprentice
2/7/2023 | 2:41:11 AM
Pending Review
This comment is waiting for review by our moderators.
I Smell a RAT! New Cybersecurity Threats for the Crypto Industry
David Trepp, Partner, IT Assurance with accounting and advisory firm BPM LLP,  7/9/2021
Attacks on Kaseya Servers Led to Ransomware in Less Than 2 Hours
Robert Lemos, Contributing Writer,  7/7/2021
It's in the Game (but It Shouldn't Be)
Tal Memran, Cybersecurity Expert, CYE,  7/9/2021
Register for Dark Reading Newsletters
White Papers
Current Issue
The 10 Most Impactful Types of Vulnerabilities for Enterprises Today
Managing system vulnerabilities is one of the old est - and most frustrating - security challenges that enterprise defenders face. Every software application and hardware device ships with intrinsic flaws - flaws that, if critical enough, attackers can exploit from anywhere in the world. It's crucial that defenders take stock of what areas of the tech stack have the most emerging, and critical, vulnerabilities they must manage. It's not just zero day vulnerabilities. Consider that CISA's Known Exploited Vulnerabilities (KEV) catalog lists vulnerabilitlies in widely used applications that are "actively exploited," and most of them are flaws that were discovered several years ago and have been fixed. There are also emerging vulnerabilities in 5G networks, cloud infrastructure, Edge applications, and firmwares to consider.
Flash Poll
How Enterprises are Developing Secure Applications
How Enterprises are Developing Secure Applications
Recent breaches of third-party apps are driving many organizations to think harder about the security of their off-the-shelf software as they continue to move left in secure software development practices.
Twitter Feed
Dark Reading - Bug Report
Bug Report
Enterprise Vulnerabilities
From DHS/US-CERT's National Vulnerability Database
PUBLISHED: 2023-03-27
In Delta Electronics InfraSuite Device Master versions prior to 1.0.5, an attacker could use URL decoding to retrieve system files, credentials, and bypass authentication resulting in privilege escalation.
PUBLISHED: 2023-03-27
In Delta Electronics InfraSuite Device Master versions prior to 1.0.5, an attacker could use Lua scripts, which could allow an attacker to remotely execute arbitrary code.
PUBLISHED: 2023-03-27
Delta Electronics InfraSuite Device Master versions prior to 1.0.5 contains an improper access control vulnerability in which an attacker can use the Device-Gateway service and bypass authorization, which could result in privilege escalation.
PUBLISHED: 2023-03-27
Delta Electronics InfraSuite Device Master versions prior to 1.0.5 are affected by a deserialization vulnerability targeting the Device-DataCollect service, which could allow deserialization of requests prior to authentication, resulting in remote code execution.
PUBLISHED: 2023-03-27
Heap-based Buffer Overflow in GitHub repository gpac/gpac prior to 2.4.0.