Dark Reading is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them.Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Application Security

Researchers Find Bugs Using Single-Codebase Inconsistencies

A Northeastern University research team finds code defects -- and some vulnerabilities -- by detecting when programmers used different code snippets to perform the same functions.

Repeatable, consistent programming is considered a best practice in software development, and it becomes increasingly important as the size of a development team grows. Now, research from Northeastern University shows that detecting inconsistent programming — code snippets that implement the same functions in different ways — can also be used to find bugs and, potentially, vulnerabilities. 

In a paper to be presented at the USENIX Security Conference in August, a team of researchers from the university used machine learning to find bugs by first identifying code snippets that implemented the same functionality and then comparing the code to determine inconsistencies. The project, dubbed "Functionally-similar yet Inconsistent Code Snippets" (FICS), found 22 new and unique bugs by analyzing five open source projects, including QEMU and OpenSSL.

Related Content:

Developers Need More Usable Static Code Scanners to Head Off Security Bugs

Special Report: Assessing Cybersecurity Risk in Today's Enterprises

New From The Edge: Ghost Town Security: What Threats Lurk in Abandoned Offices?

The research is not meant to replace other forms of static analysis but to give developers another weapon in their arsenal to analyze their code and find potential errors, says Mansour Ahmadi, a former post-doc research associate at Northeastern University who now works as a security engineer at Amazon.

Other static analysis approaches have to have previously encountered an issue or be given a rule to detect an issue to recognize the pattern, he says. 

"If there is a bug in the system with no previously found variant, [those approaches] will fail to find the bug," Ahmadi says. "In contrast, if there are correct implementations of the functionally similar code snippets to the buggy counterpart, FICS can detect that."

The research uses machine-learning techniques — not to find matches to know vulnerability patterns, as many other projects do — but to find functionally similar code that is implemented in different, or inconsistent, ways. Such bugs can be easily verified by developers and testers when presented with both implementations, the researchers stated in a prepublication paper.

"[F]rom basic bugs such as absent bounds checking to complex bugs such as use-after-free, as long as the codebase contains non-buggy code snippets that are functionally similar to a buggy code snippet, the buggy one can be detected as an inconsistent implementation of the functionality or logic," the researchers state. "This observation is more obvious in software projects of reasonable sizes, which usually contain many clusters of functionally-similar code snippets, often contributed by different developers."

The FICS system aims to find bugs and not vulnerabilities, but it is not uncommon that the issues found impact security, Ahmadi says. The list of bugs found by the researchers include memory leaks, missing checks of values, and bad typecasting. 

The researchers believe that some of the issues should be considered vulnerabilities, but the developers maintaining the project produced patches for the defects without much consideration for their exploitability.

"We have requested CVE for a couple of the bugs, without providing the exploits that we found. While we were acknowledged by the developers for our findings, the developers did not proceed to assign CVEs to them as they believe the bugs are not exploitable," Ahmadi says. "Overall, this is the drawback of all static analyzers as it is hard to prove if a bug is exploitable without providing a proof-of-concept."

The researchers used two types of unsupervised clustering, in which the machine-learning system organizes data with similar features into groupings. First, the researchers transformed code into functional constructs so that parts of a program's code could be clustered together based on their functionality. After that, the researchers compared code in the same clusters and used machine learning to group them by implementation. A code snippet that accounted for the majority of implementations in a specific functional cluster is considered to be the correct way of coding.

False positives are a problem. The researchers used filtering to reduce the total reported consistencies by a factor of 10, which still left 1,821 identified inconsistencies. Of those, 218 are considered valid cases. The high level of false positives is an issue with all static analyzers, but specifically in the case of FICS, is not a showstopper because verification is fairly simple, says Ahmadi.

"The manual vetting effort is not as heavy as required to validate results from many other static analyzers," he says. "The ease of manual validation of FICS's reports is largely due to the presence of both the consistent and the inconsistent constructs and the highlighted differences."

The technique could be fooled into deciding the wrong code snippet is the correct one if the developer used the incorrect method more often than the correct one. Yet, this error is rare and only occurred in a single instance during the research, when two similar code snippets were incorrect and the single inconsistent code snippet was correct, Ahmadi says.

The research team also included Northeastern University PhD students Reza Mirzazade Farkhani and Ryan Williams, and Long Lu, an associate professor of computer science.

Veteran technology journalist of more than 20 years. Former research engineer. Written for more than two dozen publications, including CNET News.com, Dark Reading, MIT's Technology Review, Popular Science, and Wired News. Five awards for journalism, including Best Deadline ... View Full Bio

Recommended Reading:

Comment  | 
Print  | 
More Insights
Newest First  |  Oldest First  |  Threaded View
I Smell a RAT! New Cybersecurity Threats for the Crypto Industry
David Trepp, Partner, IT Assurance with accounting and advisory firm BPM LLP,  7/9/2021
Attacks on Kaseya Servers Led to Ransomware in Less Than 2 Hours
Robert Lemos, Contributing Writer,  7/7/2021
It's in the Game (but It Shouldn't Be)
Tal Memran, Cybersecurity Expert, CYE,  7/9/2021
Register for Dark Reading Newsletters
White Papers
Current Issue
How Enterprises are Attacking the Cybersecurity Problem
Concerns over supply chain vulnerabilities and attack visibility drove some significant changes in enterprise cybersecurity strategies over the past year. Dark Reading's 2021 Strategic Security Survey showed that many organizations are staying the course regarding the use of a mix of attack prevention and threat detection technologies and practices for dealing with cyber threats.
Flash Poll
How Enterprises are Developing Secure Applications
How Enterprises are Developing Secure Applications
Recent breaches of third-party apps are driving many organizations to think harder about the security of their off-the-shelf software as they continue to move left in secure software development practices.
Twitter Feed
Dark Reading - Bug Report
Bug Report
Enterprise Vulnerabilities
From DHS/US-CERT's National Vulnerability Database
PUBLISHED: 2021-10-21
Rasa is an open source machine learning framework to automate text-and voice-based conversations. In affected versions a vulnerability exists in the functionality that loads a trained model `tar.gz` file which allows a malicious actor to craft a `model.tar.gz` file which can overwrite or replace bot...
PUBLISHED: 2021-10-21
Sulu is an open-source PHP content management system based on the Symfony framework. In versions before 1.6.43 are subject to stored cross site scripting attacks. HTML input into Tag names is not properly sanitized. Only admin users are allowed to create tags. Users are advised to upgrade.
PUBLISHED: 2021-10-21
"HCL Connections Security Update for Reflected Cross-Site Scripting (XSS) Vulnerability"
PUBLISHED: 2021-10-21
Reflected Cross-Site Scripting (XSS) vulnerability in WordPress Ivory Search plugin (versions <= 4.6.6). Vulnerable parameter: &post.
PUBLISHED: 2021-10-21
The Catch Themes Demo Import WordPress plugin is vulnerable to arbitrary file uploads via the import functionality found in the ~/inc/CatchThemesDemoImport.php file, in versions up to and including 1.7, due to insufficient file type validation. This makes it possible for an attacker with administrat...