
Hundreds of Open Source Components Could Undermine Security, Census Finds

The Linux Foundation and Harvard University create lists of the top 500 most popular open source projects, highlighting critical software that needs to be secured.

The Linux Foundation and Harvard's Lab for Innovation Science this week released the rankings of the top 500 open source projects in two major ecosystems in the first step toward cataloging the critical software components on which much of the Internet, applications, and device firmware rely.

The lists of common software components, dubbed the Census II of Free and Open Source Software – Application Libraries, ranked the top 500 packages from the JavaScript-focused Node Package Manager (NPM) ecosystem and the top 500 components from non-NPM ecosystems, including the Java-focused Maven repository, the Python Package Index (PyPI), and the .NET-focused NuGet package repository. The Linux Foundation and Harvard created four different lists for each ecosystem, based on whether the packages were directly or indirectly imported and whether the analysis counted each version separately.
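The four-list breakdown can be illustrated with a small sketch. This is not the census methodology itself, only a toy model of the two distinctions described above. The applications, package names, versions, and the `rank` and `indirect_imports` helpers are all hypothetical.

```python
from collections import Counter

# Hypothetical data: the (package, version) pairs each application imports directly.
APPS = {
    "app-a": [("web-lib", "1.0")],
    "app-b": [("web-lib", "2.0"), ("left-pad", "1.0")],
    "app-c": [("cli-lib", "1.0")],
}

# Hypothetical dependency graph: what each package version itself depends on.
DEPS = {
    ("web-lib", "1.0"): [("left-pad", "1.0")],
    ("web-lib", "2.0"): [("left-pad", "1.0")],
    ("cli-lib", "1.0"): [("left-pad", "1.1")],
    ("left-pad", "1.0"): [],
    ("left-pad", "1.1"): [],
}


def indirect_imports(direct):
    """Return every package version reached through the dependencies of
    the directly imported packages (assumed definition of "indirect")."""
    seen = set()
    stack = [dep for pkg in direct for dep in DEPS.get(pkg, [])]
    while stack:
        pkg = stack.pop()
        if pkg not in seen:
            seen.add(pkg)
            stack.extend(DEPS.get(pkg, []))
    return seen


def rank(version_specific, indirect):
    """Count how many applications use each package for one of the four lists."""
    counts = Counter()
    for direct in APPS.values():
        pkgs = indirect_imports(direct) if indirect else set(direct)
        # Either keep (name, version) pairs or collapse all versions into one name.
        keys = pkgs if version_specific else {name for name, _ in pkgs}
        counts.update(keys)  # each application counts at most once per key
    return counts


if __name__ == "__main__":
    for indirect in (False, True):
        for version_specific in (False, True):
            label = ("indirect" if indirect else "direct",
                     "per-version" if version_specific else "all versions")
            print(label, rank(version_specific, indirect).most_common())
```

In this toy data, the hypothetical `left-pad` package is imported directly by only one application, yet it appears in the indirect list for all three. A direct-only ranking would miss that kind of gap.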

In addition to public repository data, the census used telemetry from three software composition analysis firms to gain information on what software components are commonly used by companies in their applications, says Frank Nagle, an assistant professor of business administration at Harvard Business School and one of the five authors of the Census II report.

"We want to bring a little bit more science to how we are going to invest in securing open source software," he says. "By no means is this list perfect — it is not the be-all, end-all of open source, but at the same time, it is something that is based on the use of open source at thousands of companies."

Securing the open source and commercial software included in applications has become a major issue for the US government and businesses this year. Open source software typically accounts for 70% to 90% of code in Web and cloud applications. Application security firm Synopsys found that 98% of applications analyzed using its service included open source software and that 75% of the average codebase came from open source projects. The average software application relied on more than 500 open source dependencies, the company said.

Biden's Software Security Order
In January, as companies struggled to hunt down and patch applications affected by vulnerabilities in the Log4j component, the White House held a software summit to direct the public and private sectors to invest more resources into securing open source software. A score of technology companies, led by Microsoft and Google, have put money into the Open Source Security Foundation (OpenSSF) to identify critical software components and fund security training, additional developers, and application testing, including as part of the OpenSSF's Alpha-Omega Project.

The Census II project aims to find widespread open source projects that use outdated versions, popular components that are maintained by overworked developers, and common components that have slow vulnerability remediation times.

"[T]here may be integral FOSS projects whose simplicity or size may belie their vital importance to the modern economy," the authors stated in the report. "As such, the overarching goal [of the project] is to reinforce this infrastructure and guard against systemic vulnerabilities."

The goal of the project is to give three groups more data on the usage of open source software, Nagle says. The first group, government agencies and open source security organizations, needs the data to determine where to invest the funds that are being allocated by public initiatives, such as the Biden administration's executive order on software security, and by private efforts, such as the OpenSSF. The second group, companies, needs the information to determine what software components to use in their applications in the future, he says.

"When companies think about what code they are using and what code they are relying on, often they are only thinking about what packages their developers are putting directly into their products," Nagle says. "However, we have all these layers and layers of dependencies. ... When you think about what software you use and what you are reliant on, it just can't be the highest level, the tip of the iceberg; you have to dig deeper than that."

Finally, the top 500 lists can be a notification to open source project maintainers that their code affects far more users than they might have expected, he says.

"The individual developers often don't know — they often don't know how widely spread their packages are," he says.

Second Phase of Linux Foundation Initiative
The Census II project is an extension of a previous effort to find the most critical open source software. In 2015, the Linux Foundation's Core Infrastructure Initiative completed the initial Census Project, or Census I, to survey the Debian Linux distribution and identify the components that were most critical to the core operating system's operation and security. Census II dramatically expands that with more data, Brian Behlendorf, executive director at the Linux Foundation's Open Source Security Foundation (OpenSSF), said in the announcement.

"Understanding what FOSS packages are the most widely used in society allows us to proactively engage the critical projects that warrant operations and security support," he said. "Open source software is the foundation upon which our day-to-day lives run, from our banking institutions to our schools and workplaces. Census II provides the foundational detail we need to support the world’s most critical and valuable infrastructure."