Dark Reading is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them.Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Application Security

GitHub Unveils AI Tool to Speed Development, but Beware Insecure Code

The company has created an AI system, dubbed Copilot, to offer code suggestions to developers, but warns that any code produced should be tested for defects and vulnerabilities.

A machine agent designed to help developers quickly create blocks of code based on a few semantic hints is expected to cut programming time significantly, but it also comes with a warning to watch out for errors and security issues. 

The technical preview, dubbed Copilot, was trained on billions of lines of code from projects on GitHub's development service to predict the intent of a function based on comments, documentation strings (docstrings), the name of the function, and any code entered by the developer. A collaboration between GitHub and OpenAI, Copilot will auto-complete entire blocks of code, but the groups warn that code could have defects, contain offensive language, and potentially have security issues.

Related Content:

Expect an Increase in Attacks on AI Systems

Special Report: Building the SOC of the Future

New From The Edge: 7 Skills the Transportation Sector Needs to Fuel Its Security Teams

"There’s a lot of public code in the world with insecure coding patterns, bugs, or references to outdated APIs or idioms," GitHub stated on its Copilot site. "When GitHub Copilot synthesizes code suggestions based on this data, it can also synthesize code that contains these undesirable patterns."

Technologies based on artificial intelligence (AI) and machine learning (ML) hold great promise for developers looking to reduce vulnerabilities, security analysts triaging alerts, and incident response specialists aiming to remediate issues faster, among other benefits. However, early ML systems are often prone to errors and adversarial attacks. 

In 2016, for example, Microsoft unveiled a chatbot named "Tay" on Twitter. The ML system attempted to converse with anyone sending it a message online, but it also learned from those conversations. A coordinated attack on Tay, however, led to the chatbot parroting offensive phrases and retweeting inappropriate images.

The example highlights how using input from the untrusted Internet to train ML algorithms can lead to unexpected results. GitHub stressed that Copilot is still an early effort, and security will be a focus in the future.

"It's early days, and we're working to include GitHub's own security tooling and exclude insecure or low-quality code from the training set, among other mechanisms," GitHub said in a response to questions from Dark Reading. "We hope to share more in the coming months."

Copilot is based on the OpenAI Codex, a new generative learning system that has been trained on the English language as well as source code from public repositories, such as GitHub. Typing in a comment, a function name, and variables will lead to Copilot auto-completing the body of the function with the most likely result, but the system will also offer other possible code blocks. 

In a test using Python functions, Copilot guessed the correct content in 43% of cases, and in 57% of cases, the correct code block existed in the top 10 results. The service is intended to prove that a machine agent can act as the other half of a pair programming team and significantly speed development. 

"GitHub Copilot tries to understand your intent and to generate the best code it can, but the code it suggests may not always work, or even make sense," the company stated in its response to questions from Dark Reading. "While we are working hard to make GitHub Copilot better, code suggested by GitHub Copilot should be carefully tested, reviewed, and vetted, like any other code."

Such systems are very likely to be targets of attackers. Researchers have taken great interest in finding ways to modify the results or disrupt such systems. The number of research papers focused on AI security and published online jumped to more than 1,500 in 2019, up from 56 three years earlier

In November 2020, MITRE worked with Microsoft and other technology companies to create a dictionary of potential adversarial attacks on AI/ML systems, and gave examples of a significant number of real attacks. 

Insecure code is not the only worry. Personal data inadvertently published to GitHub's site could be included in the output of the code, although the company found such instances "extremely rare" in this testing of the system. A study found that while Copilot could output exact instances of code, the system rarely did so.

Even so, the system is a tool and not a replacement for good coding practices, the company said.

"As the developer, you are always in charge," it said.

Veteran technology journalist of more than 20 years. Former research engineer. Written for more than two dozen publications, including CNET News.com, Dark Reading, MIT's Technology Review, Popular Science, and Wired News. Five awards for journalism, including Best Deadline ... View Full Bio

Recommended Reading:

Comment  | 
Print  | 
More Insights
Newest First  |  Oldest First  |  Threaded View
I Smell a RAT! New Cybersecurity Threats for the Crypto Industry
David Trepp, Partner, IT Assurance with accounting and advisory firm BPM LLP,  7/9/2021
Attacks on Kaseya Servers Led to Ransomware in Less Than 2 Hours
Robert Lemos, Contributing Writer,  7/7/2021
It's in the Game (but It Shouldn't Be)
Tal Memran, Cybersecurity Expert, CYE,  7/9/2021
Register for Dark Reading Newsletters
White Papers
Current Issue
Enterprise Cybersecurity Plans in a Post-Pandemic World
Download the Enterprise Cybersecurity Plans in a Post-Pandemic World report to understand how security leaders are maintaining pace with pandemic-related challenges, and where there is room for improvement.
Flash Poll
How Enterprises are Developing Secure Applications
How Enterprises are Developing Secure Applications
Recent breaches of third-party apps are driving many organizations to think harder about the security of their off-the-shelf software as they continue to move left in secure software development practices.
Twitter Feed
Dark Reading - Bug Report
Bug Report
Enterprise Vulnerabilities
From DHS/US-CERT's National Vulnerability Database
PUBLISHED: 2021-09-22
The vCenter Server contains a local privilege escalation vulnerability due to the way it handles session tokens. A malicious actor with non-administrative user access on vCenter Server host may exploit this issue to escalate privileges to Administrator on the vSphere Client (HTML5) or vCenter Server...
PUBLISHED: 2021-09-22
The vCenter Server contains a denial-of-service vulnerability due to improper XML entity parsing. A malicious actor with non-administrative user access to the vCenter Server vSphere Client (HTML5) or vCenter Server vSphere Web Client (FLEX/Flash) may exploit this issue to create a denial-of-service ...
PUBLISHED: 2021-09-22
The Ninja Forms WordPress plugin is vulnerable to sensitive information disclosure via the bulk_export_submissions function found in the ~/includes/Routes/Submissions.php file, in versions up to and including 3.5.7. This allows authenticated attackers to export all Ninja Forms submissions data via t...
PUBLISHED: 2021-09-22
The Ninja Forms WordPress plugin is vulnerable to arbitrary email sending via the trigger_email_action function found in the ~/includes/Routes/Submissions.php file, in versions up to and including 3.5.7. This allows authenticated attackers to send arbitrary emails from the affected server via the /n...
PUBLISHED: 2021-09-22
Talend ESB Runtime in all versions from 5.1 to 7.3.1-R2021-09, 7.2.1-R2021-09, 7.1.1-R2021-09, has an unauthenticated Jolokia HTTP endpoint which allows remote access to the JMX of the runtime container, which would allow an attacker the ability to read or modify the container or software running in...