Dark Reading is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.



Researchers Find Bugs Using Single-Codebase Inconsistencies

A Northeastern University research team finds code defects, and some vulnerabilities, by detecting when programmers used different code snippets to perform the same functions.

Repeatable, consistent programming is considered a best practice in software development, and it becomes increasingly important as the size of a development team grows. Now, research from Northeastern University shows that detecting inconsistent programming — code snippets that implement the same functions in different ways — can also be used to find bugs and, potentially, vulnerabilities. 

In a paper to be presented at the USENIX Security Conference in August, a team of researchers from the university used machine learning to find bugs by first identifying code snippets that implemented the same functionality and then comparing the code to determine inconsistencies. The project, dubbed "Functionally-similar yet Inconsistent Code Snippets" (FICS), found 22 new and unique bugs by analyzing five open source projects, including QEMU and OpenSSL.


The research is not meant to replace other forms of static analysis but to give developers another weapon in their arsenal to analyze their code and find potential errors, says Mansour Ahmadi, a former post-doc research associate at Northeastern University who now works as a security engineer at Amazon.

Other static analysis approaches can recognize a bug pattern only if they have encountered the issue before or have been given a rule to detect it, he says.

"If there is a bug in the system with no previously found variant, [those approaches] will fail to find the bug," Ahmadi says. "In contrast, if there are correct implementations of the functionally similar code snippets to the buggy counterpart, FICS can detect that."

The research uses machine-learning techniques not to match code against known vulnerability patterns, as many other projects do, but to find functionally similar code that is implemented in different, or inconsistent, ways. Bugs found this way can be easily verified by developers and testers when they are presented with both implementations, the researchers stated in a prepublication paper.
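Why seeing both implementations side by side makes verification easy can be illustrated with Python's standard-library `difflib`. The two C functions below are hypothetical, stored as strings purely for display, and the output is an ordinary unified diff, not the actual report format produced by FICS:

```python
import difflib

# Hypothetical majority implementation: checks the length before copying.
consistent = [
    "int copy_name(char *dst, const char *src, size_t len) {",
    "    if (len >= NAME_MAX) return -1;",
    "    memcpy(dst, src, len);",
    "    return 0;",
    "}",
]
# Hypothetical deviating implementation: same job, but no bounds check.
inconsistent = [
    "int copy_label(char *dst, const char *src, size_t len) {",
    "    memcpy(dst, src, len);",
    "    return 0;",
    "}",
]

# lineterm="" because the input lines carry no trailing newlines.
diff = list(difflib.unified_diff(
    consistent, inconsistent,
    fromfile="majority: copy_name", tofile="suspect: copy_label", lineterm="",
))
print("\n".join(diff))
```

The removed `if (len >= NAME_MAX)` line stands out immediately in the diff, which is the kind of highlighted difference a reviewer would check by hand.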

"[F]rom basic bugs such as absent bounds checking to complex bugs such as use-after-free, as long as the codebase contains non-buggy code snippets that are functionally similar to a buggy code snippet, the buggy one can be detected as an inconsistent implementation of the functionality or logic," the researchers state. "This observation is more obvious in software projects of reasonable sizes, which usually contain many clusters of functionally-similar code snippets, often contributed by different developers."

The FICS system aims to find bugs rather than vulnerabilities, but it is not uncommon for the issues it finds to affect security, Ahmadi says. The bugs found by the researchers include memory leaks, missing value checks, and bad typecasting.

The researchers believe that some of the issues should be considered vulnerabilities, but the projects' maintainers patched the defects without much consideration of their exploitability.

"We have requested CVE for a couple of the bugs, without providing the exploits that we found. While we were acknowledged by the developers for our findings, the developers did not proceed to assign CVEs to them as they believe the bugs are not exploitable," Ahmadi says. "Overall, this is the drawback of all static analyzers as it is hard to prove if a bug is exploitable without providing a proof-of-concept."

The researchers used two types of unsupervised clustering, a form of machine learning that organizes data with similar features into groups. First, they transformed the code into functional constructs so that parts of a program could be clustered by functionality. Then they compared the code within each cluster and used machine learning to group it by implementation. The implementation used by the majority of snippets in a functional cluster is considered the correct way of coding, and snippets that deviate from it are reported as inconsistencies.
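The two-stage process can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the researchers' implementation: it assumes each snippet has already been reduced to two hypothetical feature sets (one describing what the code does, one describing how it does it), and it swaps in a simple greedy Jaccard-similarity grouping for the machine-learning clustering the team actually used. The `Snippet` class, the feature labels, and the thresholds are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    """Hypothetical pre-processed code snippet (feature extraction not shown)."""
    name: str
    functionality: frozenset   # assumed abstraction of WHAT the code does
    implementation: frozenset  # assumed abstraction of HOW it does it


def jaccard(a: frozenset, b: frozenset) -> float:
    """Set similarity in [0, 1]; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def greedy_cluster(items, key, threshold):
    """Stand-in for unsupervised clustering: add each item to the first cluster
    whose first member is at least `threshold` similar, else start a new one."""
    clusters = []
    for item in items:
        for cluster in clusters:
            if jaccard(key(cluster[0]), key(item)) >= threshold:
                cluster.append(item)
                break
        else:
            clusters.append([item])
    return clusters


def find_inconsistencies(snippets, func_threshold=0.8, min_cluster=3):
    """Return (suspect, reference) pairs; `reference` uses the majority implementation."""
    reports = []
    # Stage 1: group snippets that appear to implement the same functionality.
    for func_cluster in greedy_cluster(snippets, lambda s: s.functionality, func_threshold):
        if len(func_cluster) < min_cluster:
            continue  # too few peers to establish a majority (assumed cutoff)
        # Stage 2: group by implementation; threshold 1.0 means only identical
        # implementation feature sets end up together.
        impl_clusters = greedy_cluster(func_cluster, lambda s: s.implementation, 1.0)
        impl_clusters.sort(key=len, reverse=True)
        majority = impl_clusters[0]
        if len(impl_clusters) > 1 and len(impl_clusters[1]) == len(majority):
            continue  # tie: no single implementation is the majority
        # Every snippet outside the majority implementation is flagged.
        for minority in impl_clusters[1:]:
            for suspect in minority:
                reports.append((suspect, majority[0]))
    return reports


# Usage: three hypothetical lookup routines, one of which skips a bounds check.
lookups = [
    Snippet("get_a", frozenset({"index", "read"}), frozenset({"bounds_check", "read"})),
    Snippet("get_b", frozenset({"index", "read"}), frozenset({"bounds_check", "read"})),
    Snippet("get_c", frozenset({"index", "read"}), frozenset({"read"})),
]
for suspect, reference in find_inconsistencies(lookups):
    missing = sorted(reference.implementation - suspect.implementation)
    print(f"{suspect.name} differs from {reference.name}; missing: {missing}")
# prints: get_c differs from get_a; missing: ['bounds_check']
```

The minimum cluster size and the tie check are choices made for this sketch rather than details from the research: with fewer than three peers, or an even split, there is no clear majority implementation to treat as correct.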

False positives are a problem. The researchers used filtering to reduce the total number of reported inconsistencies by a factor of 10, which still left 1,821 reports. Of those, 218 were considered valid cases. High false-positive rates are an issue for all static analyzers, but in the case of FICS they are not a showstopper because verification is fairly simple, Ahmadi says.
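The reported figures imply a low precision, which is why the ease of manual review matters. A quick back-of-the-envelope calculation, treating the "factor of 10" reduction as approximate:

```python
reported_after_filtering = 1_821  # inconsistencies left after filtering
valid = 218                       # reports judged to be valid cases
filter_factor = 10                # "a factor of 10", as reported; approximate

precision = valid / reported_after_filtering
false_positives = reported_after_filtering - valid
reports_per_valid = reported_after_filtering / valid
approx_before_filtering = reported_after_filtering * filter_factor

print(f"precision ≈ {precision:.1%}")                             # precision ≈ 12.0%
print(f"false positives: {false_positives}")                      # false positives: 1603
print(f"reports per valid case ≈ {reports_per_valid:.1f}")        # reports per valid case ≈ 8.4
print(f"reports before filtering ≈ {approx_before_filtering:,}")  # reports before filtering ≈ 18,210
```

In other words, a reviewer would vet roughly eight or nine reports for every valid case, so the cost of each individual check dominates the overall effort.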

"The manual vetting effort is not as heavy as required to validate results from many other static analyzers," he says. "The ease of manual validation of FICS's reports is largely due to the presence of both the consistent and the inconsistent constructs and the highlighted differences."

The technique could be fooled into treating the wrong code snippet as the correct one if developers used the incorrect method more often than the correct one. However, this error is rare and occurred only once during the research, when two similar code snippets were incorrect and the single inconsistent snippet was correct, Ahmadi says.

The research team also included Northeastern University PhD students Reza Mirzazade Farkhani and Ryan Williams, and Long Lu, an associate professor of computer science.

Veteran technology journalist of more than 20 years. Former research engineer. Written for more than two dozen publications, including CNET News.com, Dark Reading, MIT's Technology Review, Popular Science, and Wired News. Five awards for journalism, including Best Deadline ...
