CISA Flags Memory-Unsafe Code in Major Open Source Projects
Despite more than 50% of all open source code being written in memory-unsafe languages like C++, we are unlikely to see a massive overhaul to codebases anytime soon.
June 28, 2024
A comprehensive new study has unearthed fresh details on the extensive and troubling use of memory-unsafe code in major open source software (OSS) projects.
However, the chances that fresh insight into a long-known issue will spur any immediate changes to the software landscape remain bleak, given how enormous, costly, and complex a task it is to rewrite codebases entirely in memory-safe code.
Memory-unsafe programming languages such as C and C++ give programmers more direct control over memory-related functions in code, which can often lead to very common application security issues like buffer overflows and use-after-free errors. Such flaws represent a large proportion of all vulnerabilities in modern application software. In contrast, memory-safe languages (the most common examples of which include Rust, Python, Java, and Go) offer guardrails such as built-in runtime and compile-time checks to mitigate common memory-related errors.
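As a rough illustration of those guardrails, the Rust sketch below shows a runtime bounds check and a compile-time ownership check. The function name `read_at` and the sample values are hypothetical, chosen only for this example; they do not come from the report.

```rust
/// Hypothetical helper: read one byte without risking an out-of-bounds access.
fn read_at(buf: &[u8], index: usize) -> Option<u8> {
    // `get` performs a runtime bounds check and returns None instead of
    // reading past the end of the buffer, as an unchecked C array read could.
    buf.get(index).copied()
}

fn main() {
    let buf = [10u8, 20, 30];
    println!("{:?}", read_at(&buf, 1)); // in bounds
    println!("{:?}", read_at(&buf, 7)); // out of bounds, handled safely

    // Compile-time check: ownership rules reject use-after-free patterns.
    let data = vec![1, 2, 3];
    drop(data); // the vector's memory is released here
    // println!("{:?}", data); // would not compile: use of moved value `data`
}
```

In C, the equivalent out-of-bounds read or use of freed memory would typically compile without complaint and fail, if at all, only at run time.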
Most OSS Projects Contain Memory-Unsafe Code
The US Cybersecurity and Infrastructure Security Agency (CISA), along with the FBI and counterparts at the Australian Cyber Security Centre and the Canadian Centre for Cyber Security, this week released a report summarizing the results of their investigation into the use of memory-unsafe code in OSS.
The findings, while troubling, are not entirely unexpected given past data on the extensive use of memory-unsafe languages in almost all modern codebases. Fifty-two percent of the 172 major open source projects that the research authors looked at contained code written in a memory-unsafe language. More than half (55%) of the total lines of code in all the projects combined were written in a memory-unsafe language, with the larger projects being the worst culprits.
Some 95% of the total lines of code in Linux, for instance, were memory-unsafe. For MySQL Server, that number was 84%; for TensorFlow, 64%; for Zephyr, 84%; and for Chromium, 51%. In each of the 10 largest open source projects, more than 26% of the total lines of code were memory-unsafe. Even projects written in memory-safe languages were at risk from dependencies on unsafe components.
"Most critical open source projects analyzed, even those written in memory-safe languages, potentially contain memory safety vulnerabilities," the report noted. "This can be caused by direct use of memory-unsafe languages or external dependency on projects that use memory-unsafe languages."
In addition, the tendency — and often the need — to disable memory-safety features to accommodate functional requirements in applications can often neutralize the benefits of using otherwise memory-safe languages.
"These limitations highlight the need for continued diligent use of memory safe programming languages, secure coding practices, and security testing," the report authors noted.
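The point about disabled safety features can be sketched in Rust, where the `unsafe` keyword lets a developer opt out of checks the compiler would otherwise enforce. The functions `checked` and `unchecked` below are hypothetical, written only to contrast the two styles; they are not taken from the report.

```rust
/// Hypothetical example: a lookup that keeps the language's bounds check.
fn checked(values: &[i32], i: usize) -> Option<i32> {
    values.get(i).copied()
}

/// The same lookup with the bounds check disabled.
/// The caller must guarantee `i < values.len()`; the compiler no longer does.
unsafe fn unchecked(values: &[i32], i: usize) -> i32 {
    // SAFETY: relies entirely on the caller's promise about `i`.
    unsafe { *values.get_unchecked(i) }
}

fn main() {
    let values = [1, 2, 3];

    // The checked version turns a bad index into a recoverable None.
    assert_eq!(checked(&values, 5), None);

    // Sound only because index 2 is in bounds; passing 5 here would be
    // undefined behavior, the same class of bug found in C and C++.
    let last = unsafe { unchecked(&values, 2) };
    println!("{}", last);
}
```

Code like `unchecked` is sometimes justified for performance or hardware access, but every such block brings back the manual reasoning that memory-safe languages are meant to remove.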
CISA Findings Consistent With Previous OSS Data
The findings are consistent with numerous previous studies that have examined the extensive problems tied to the use of memory-unsafe languages.
And indeed, concerns over the ubiquity of the problem have prompted calls for change over the years. The most recent is a February 2024 technical report from the White House that urged industry stakeholders to go back to the building blocks and start over, using memory-safe code in all software. In 2022, the US National Security Agency (NSA) urged software makers and all organizations developing software to consider adopting memory-safe languages to reduce risk from memory management-related issues in modern codebases. The continued pounding away at the topic has spurred some change, but most expect it will take years, if not decades, for a wholesale shift to memory-safe languages to happen.
"Adopting memory-safe code is challenging, primarily because changing a programming language often requires a complete rewrite of existing code," says Neatsun Ziv, CEO and co-founder of OX Security. Without significant economic incentives, the cost and effort required to undertake such a massive overhaul will likely make any change a slow process.
Making the World Memory-Safe: A Huge & Complex Challenge
Omkhar Arasaratnam, general manager at OpenSSF, says memory safety issues aren't specific to either open or closed source software. They are a problem for all modern software.
"There are many memory-safe languages available today like JavaScript, Python, and Java, but software engineers often use memory-unsafe older languages like C/C++ for performance or low-level hardware access," he says.
Also, while Rust has emerged in recent years as a viable alternative to C/C++ for low-level systems programming, there are many embedded systems and safety-critical applications for which Rust is not appropriate, he adds.
"While it is certainly possible to write memory-safe code in a memory-unsafe language, 25 years of CVEs tells us it is highly unlikely," Arasaratnam says. "It is not that people are bad programmers, but defensively writing code that is memory-safe in a memory-unsafe language is very difficult," he notes. As newer projects adopt memory-safe languages, expect the use of memory-unsafe languages to decrease over time, in all but niche applications.
Tim Mackey, head of software supply chain risk strategy at Synopsys Software Integrity Group, says the new report does a good job showing how some major open source software projects such as Kubernetes and WordPress are authored in a memory-safe language. However, there are other issues that remain unexplored, he says. For example, it would be interesting to know if memory-safe languages are being used in new projects on GitHub, and whether memory-safe libraries are being used as dependencies in larger projects.
"We can safely say that awareness of memory safe languages is growing, but is it growing at a rate that would displace older languages? For example, are the creators of new embedded software solutions using C++ or Rust, and to what degree?" Mackey asks.