Dark Reading is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them.Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Application Security

End of Bibblio RCM includes -->

GitHub Unveils AI Tool to Speed Development, but Beware Insecure Code

The company has created an AI system, dubbed Copilot, to offer code suggestions to developers, but warns that any code produced should be tested for defects and vulnerabilities.

A machine agent designed to help developers quickly create blocks of code based on a few semantic hints is expected to cut programming time significantly, but it also comes with a warning to watch out for errors and security issues. 

The technical preview, dubbed Copilot, was trained on billions of lines of code from projects on GitHub's development service to predict the intent of a function based on comments, documentation strings (docstrings), the name of the function, and any code entered by the developer. A collaboration between GitHub and OpenAI, Copilot will auto-complete entire blocks of code, but the groups warn that code could have defects, contain offensive language, and potentially have security issues.

Related Content:

Expect an Increase in Attacks on AI Systems

Special Report: Building the SOC of the Future

New From The Edge: 7 Skills the Transportation Sector Needs to Fuel Its Security Teams

"There’s a lot of public code in the world with insecure coding patterns, bugs, or references to outdated APIs or idioms," GitHub stated on its Copilot site. "When GitHub Copilot synthesizes code suggestions based on this data, it can also synthesize code that contains these undesirable patterns."

Technologies based on artificial intelligence (AI) and machine learning (ML) hold great promise for developers looking to reduce vulnerabilities, security analysts triaging alerts, and incident response specialists aiming to remediate issues faster, among other benefits. However, early ML systems are often prone to errors and adversarial attacks. 

In 2016, for example, Microsoft unveiled a chatbot named "Tay" on Twitter. The ML system attempted to converse with anyone sending it a message online, but it also learned from those conversations. A coordinated attack on Tay, however, led to the chatbot parroting offensive phrases and retweeting inappropriate images.

The example highlights how using input from the untrusted Internet to train ML algorithms can lead to unexpected results. GitHub stressed that Copilot is still an early effort, and security will be a focus in the future.

"It's early days, and we're working to include GitHub's own security tooling and exclude insecure or low-quality code from the training set, among other mechanisms," GitHub said in a response to questions from Dark Reading. "We hope to share more in the coming months."

Copilot is based on the OpenAI Codex, a new generative learning system that has been trained on the English language as well as source code from public repositories, such as GitHub. Typing in a comment, a function name, and variables will lead to Copilot auto-completing the body of the function with the most likely result, but the system will also offer other possible code blocks. 

In a test using Python functions, Copilot guessed the correct content in 43% of cases, and in 57% of cases, the correct code block existed in the top 10 results. The service is intended to prove that a machine agent can act as the other half of a pair programming team and significantly speed development. 

"GitHub Copilot tries to understand your intent and to generate the best code it can, but the code it suggests may not always work, or even make sense," the company stated in its response to questions from Dark Reading. "While we are working hard to make GitHub Copilot better, code suggested by GitHub Copilot should be carefully tested, reviewed, and vetted, like any other code."

Such systems are very likely to be targets of attackers. Researchers have taken great interest in finding ways to modify the results or disrupt such systems. The number of research papers focused on AI security and published online jumped to more than 1,500 in 2019, up from 56 three years earlier

In November 2020, MITRE worked with Microsoft and other technology companies to create a dictionary of potential adversarial attacks on AI/ML systems, and gave examples of a significant number of real attacks. 

Insecure code is not the only worry. Personal data inadvertently published to GitHub's site could be included in the output of the code, although the company found such instances "extremely rare" in this testing of the system. A study found that while Copilot could output exact instances of code, the system rarely did so.

Even so, the system is a tool and not a replacement for good coding practices, the company said.

"As the developer, you are always in charge," it said.

Veteran technology journalist of more than 20 years. Former research engineer. Written for more than two dozen publications, including CNET News.com, Dark Reading, MIT's Technology Review, Popular Science, and Wired News. Five awards for journalism, including Best Deadline ... View Full Bio

Comment  | 
Print  | 
More Insights
//Comments
Newest First  |  Oldest First  |  Threaded View
anushiya
anushiya,
User Rank: Apprentice
2/7/2023 | 2:41:11 AM
Pending Review
This comment is waiting for review by our moderators.
Edge-DRsplash-10-edge-articles
I Smell a RAT! New Cybersecurity Threats for the Crypto Industry
David Trepp, Partner, IT Assurance with accounting and advisory firm BPM LLP,  7/9/2021
News
Attacks on Kaseya Servers Led to Ransomware in Less Than 2 Hours
Robert Lemos, Contributing Writer,  7/7/2021
Commentary
It's in the Game (but It Shouldn't Be)
Tal Memran, Cybersecurity Expert, CYE,  7/9/2021
Register for Dark Reading Newsletters
White Papers
Video
Cartoon
Current Issue
Everything You Need to Know About DNS Attacks
It's important to understand DNS, potential attacks against it, and the tools and techniques required to defend DNS infrastructure. This report answers all the questions you were afraid to ask. Domain Name Service (DNS) is a critical part of any organization's digital infrastructure, but it's also one of the least understood. DNS is designed to be invisible to business professionals, IT stakeholders, and many security professionals, but DNS's threat surface is large and widely targeted. Attackers are causing a great deal of damage with an array of attacks such as denial of service, DNS cache poisoning, DNS hijackin, DNS tunneling, and DNS dangling. They are using DNS infrastructure to take control of inbound and outbound communications and preventing users from accessing the applications they are looking for. To stop attacks on DNS, security teams need to shore up the organization's security hygiene around DNS infrastructure, implement controls such as DNSSEC, and monitor DNS traffic
Flash Poll
How Enterprises are Developing Secure Applications
How Enterprises are Developing Secure Applications
Recent breaches of third-party apps are driving many organizations to think harder about the security of their off-the-shelf software as they continue to move left in secure software development practices.
Twitter Feed
Dark Reading - Bug Report
Bug Report
Enterprise Vulnerabilities
From DHS/US-CERT's National Vulnerability Database
CVE-2023-33196
PUBLISHED: 2023-05-26
Craft is a CMS for creating custom digital experiences. Cross site scripting (XSS) can be triggered by review volumes. This issue has been fixed in version 4.4.7.
CVE-2023-33185
PUBLISHED: 2023-05-26
Django-SES is a drop-in mail backend for Django. The django_ses library implements a mail backend for Django using AWS Simple Email Service. The library exports the `SESEventWebhookView class` intended to receive signed requests from AWS to handle email bounces, subscriptions, etc. These requests ar...
CVE-2023-33187
PUBLISHED: 2023-05-26
Highlight is an open source, full-stack monitoring platform. Highlight may record passwords on customer deployments when a password html input is switched to `type="text"` via a javascript "Show Password" button. This differs from the expected behavior which always obfuscates `ty...
CVE-2023-33194
PUBLISHED: 2023-05-26
Craft is a CMS for creating custom digital experiences on the web.The platform does not filter input and encode output in Quick Post validation error message, which can deliver an XSS payload. Old CVE fixed the XSS in label HTML but didn’t fix it when clicking save. This issue was...
CVE-2023-2879
PUBLISHED: 2023-05-26
GDSDB infinite loop in Wireshark 4.0.0 to 4.0.5 and 3.6.0 to 3.6.13 allows denial of service via packet injection or crafted capture file