

GitHub Unveils AI Tool to Speed Development, but Beware Insecure Code

The company has created an AI system, dubbed Copilot, to offer code suggestions to developers, but warns that any code produced should be tested for defects and vulnerabilities.

A machine agent designed to help developers quickly create blocks of code based on a few semantic hints is expected to cut programming time significantly, but it also comes with a warning to watch out for errors and security issues. 

The technical preview, dubbed Copilot, was trained on billions of lines of code from projects on GitHub's development service to predict the intent of a function based on comments, documentation strings (docstrings), the name of the function, and any code entered by the developer. A collaboration between GitHub and OpenAI, Copilot will auto-complete entire blocks of code, but the companies warn that the generated code could have defects, contain offensive language, and potentially have security issues.


"There’s a lot of public code in the world with insecure coding patterns, bugs, or references to outdated APIs or idioms," GitHub stated on its Copilot site. "When GitHub Copilot synthesizes code suggestions based on this data, it can also synthesize code that contains these undesirable patterns."
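GitHub's warning can be made concrete. The sketch below contrasts an insecure SQL pattern that is widespread in public code, and which a code-synthesis tool could plausibly reproduce, with the parameterized form a reviewer should insist on. (The function names and schema are illustrative, not from the article.)

```python
import sqlite3

def find_user_unsafe(conn, username):
    # Insecure pattern common in public repositories: interpolating user
    # input directly into SQL lets a crafted username inject arbitrary SQL.
    query = f"SELECT id FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn, username):
    # Parameterized query: the driver treats the value as data, not SQL,
    # so the same malicious input matches nothing.
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (username,)
    ).fetchall()
```

Because a model trained on public code sees both patterns, reviewers cannot assume a suggestion follows the safe one.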

Technologies based on artificial intelligence (AI) and machine learning (ML) hold great promise for developers looking to reduce vulnerabilities, security analysts triaging alerts, and incident response specialists aiming to remediate issues faster, among other benefits. However, early ML systems are often prone to errors and adversarial attacks. 

In 2016, for example, Microsoft unveiled a chatbot named "Tay" on Twitter. The ML system attempted to converse with anyone sending it a message online, but it also learned from those conversations. A coordinated attack on Tay, however, led to the chatbot parroting offensive phrases and retweeting inappropriate images.

The example highlights how using input from the untrusted Internet to train ML algorithms can lead to unexpected results. GitHub stressed that Copilot is still an early effort, and security will be a focus in the future.

"It's early days, and we're working to include GitHub's own security tooling and exclude insecure or low-quality code from the training set, among other mechanisms," GitHub said in a response to questions from Dark Reading. "We hope to share more in the coming months."

Copilot is based on the OpenAI Codex, a new generative learning system that has been trained on the English language as well as source code from public repositories, such as GitHub. Typing in a comment, a function name, and variables will lead to Copilot auto-completing the body of the function with the most likely result, but the system will also offer other possible code blocks. 
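As a sketch of that interaction, a developer might type only a function name and docstring, and the tool would propose a body. The completion below is hypothetical, not actual Copilot output:

```python
def average(numbers):
    """Return the arithmetic mean of a sequence of numbers."""
    # A body a tool like Copilot might synthesize from the name and
    # docstring alone (illustrative completion, not real output):
    return sum(numbers) / len(numbers)
```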

In a test using Python functions, Copilot produced the correct body as its first suggestion in 43% of cases, and the correct code appeared among its top 10 suggestions in 57% of cases. The service is intended to prove that a machine agent can act as the other half of a pair-programming team and significantly speed development.

"GitHub Copilot tries to understand your intent and to generate the best code it can, but the code it suggests may not always work, or even make sense," the company stated in its response to questions from Dark Reading. "While we are working hard to make GitHub Copilot better, code suggested by GitHub Copilot should be carefully tested, reviewed, and vetted, like any other code."
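That advice is cheap to follow: a single unit test can expose a suggestion that looks right but is not. The pair of functions below is a hypothetical example of a plausible but subtly wrong suggested body and the version a reviewer would write after testing it; neither is actual Copilot output.

```python
def median(values):
    # A plausible-looking suggested body that is subtly wrong: it indexes
    # the middle element without sorting first (hypothetical example).
    return values[len(values) // 2]

def median_reviewed(values):
    # Corrected after review and testing: sort first, and average the
    # middle pair for even-length input.
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2
```

A one-line check such as asserting that the median of [3, 1, 2] is 2 fails for the first version and passes for the second, which is exactly the kind of vetting GitHub recommends.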

Such systems are very likely to be targets of attackers, and researchers have taken great interest in finding ways to modify their results or disrupt them. The number of research papers focused on AI security and published online jumped to more than 1,500 in 2019, up from 56 three years earlier.

In November 2020, MITRE worked with Microsoft and other technology companies to create a dictionary of potential adversarial attacks on AI/ML systems, and gave examples of a significant number of real attacks. 

Insecure code is not the only worry. Personal data inadvertently published to GitHub's site could surface in the generated code, although the company found such instances "extremely rare" in its testing of the system. A study found that while Copilot could reproduce exact instances of code from its training data, the system rarely did so.

Even so, the system is a tool and not a replacement for good coding practices, the company said.

"As the developer, you are always in charge," it said.

Veteran technology journalist of more than 20 years. Former research engineer. Written for more than two dozen publications, including CNET News.com, Dark Reading, MIT's Technology Review, Popular Science, and Wired News. Five awards for journalism, including Best Deadline ...
