There's never enough time or staff to scan code repositories. To avoid dependency confusion attacks, use automated CI/CD tools to make fixes in hard-to-manage software dependencies.

Nicholas Lang, Threat Research Engineer, Sysdig

March 2, 2023


Software dependencies, the pieces of software that an application requires to function, are notoriously difficult to manage and constitute a major software supply chain risk. If you're not aware of what's in your software supply chain, an upstream vulnerability in one of your dependencies can be fatal.

A simple React-based Web application can have upward of 1,700 transitive Node.js (npm) dependencies, and after a few months "npm audit" will reveal that a relatively large number of those dependencies have security vulnerabilities. The case is similar for Python, Rust, and every other programming language with a package manager.

I like to think of dependencies as decaying fruit in the unrefrigerated section of the code grocer, especially npm packages, which are often written by unpaid developers who have little motivation to put in more than the bare minimum of effort. They're often written for personal use and they're open sourced by chance, not by choice. They're not written to last.

Recently (as of the writing of this article), PyTorch, one of the top two most popular machine learning libraries for Python, was compromised for five days via its dependency "torchtriton." The attacker was able to collect system information and steal up to 1,000 files from each affected user's home directory. In 2021, a critical vulnerability was famously disclosed in Log4j, and because the library was bundled into basically every Web-facing Java project, DevSecOps teams were under a lot of pressure to identify where exactly they were vulnerable.

Even 20 intentional dependencies are too many for a development team to continually audit, much less 2,000 transitive ones.

What's Been Done So Far?

Not all hope is lost. For known (reported and accepted) vulnerabilities, tools exist, such as pip-audit, which scans a developer's Python working environment for vulnerabilities. The "npm audit" command does the same for Node.js packages. Similar tools exist for every major programming language and, in fact, Google recently released OSV-Scanner, which attempts to be a Swiss Army knife for software dependency vulnerabilities. Whether developers are encouraged (or forced) to run these audits regularly is beyond the scope of this analysis, as is whether they actually take action to remediate these known vulnerabilities.
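To make the idea of a known-vulnerability lookup concrete, here is a minimal Python sketch that asks the public OSV database (the data source behind OSV-Scanner) about one package version. It is not how pip-audit or OSV-Scanner are implemented; the function names are my own, and the package and version in the usage comment are placeholders.

```python
import json
import urllib.request

# Public OSV query endpoint; one package and version per request.
OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_osv_query(name, version, ecosystem="PyPI"):
    """Build the JSON body for a single-package OSV query."""
    return {
        "package": {"name": name, "ecosystem": ecosystem},
        "version": version,
    }


def known_vulnerability_ids(name, version, ecosystem="PyPI", timeout=10):
    """Return the IDs of advisories OSV lists for this exact version.

    Needs network access. An empty list only means nothing has been
    reported and accepted yet, not that the package is safe.
    """
    body = json.dumps(build_osv_query(name, version, ecosystem)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # OSV omits the "vulns" key entirely when there are no matches.
    return [vuln["id"] for vuln in payload.get("vulns", [])]


# Usage (placeholder package and version, requires network access):
# print(known_vulnerability_ids("some-package", "1.0.0"))
```

Running a check like this for every pinned dependency in a CI job is roughly what the dedicated audit tools automate, with far better handling of lockfiles and version ranges.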

Luckily for all of us, automated CI/CD tools like Dependabot exist to make these fixes as painless as possible. These tools will continually scan your code repositories for out-of-date packages and automatically submit a pull request (PR) to fix them. Searching for "dependabot[bot]" or "renovate[bot]" on GitHub and filtering to active PRs yields millions of results! However, how much roughly 3 million dependency fixes matter against the hundreds of millions of PRs active at any given time is impossible to quantify without an in-depth analysis.

State of the Art Isn't Quite Good Enough

You need to keep in mind that these auditing tools do nothing to protect developers against zero-day exploits or N-days that were reported but have yet to be officially accepted. In the case of the PyTorch compromise previously mentioned, none of the auditing tools would have caught this class of dependency vulnerability, because there was no traditional "software vulnerability" being exploited. These "dependency confusion" attacks take advantage of the fact that package managers look to their "default" repository before checking any other repositories that a project has configured for its dependencies. In the case of PyTorch, the "torchtriton" dependency was self-hosted by the PyTorch Foundation. When the attacker uploaded their version to PyPI, it took precedence over the official version.
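The precedence problem described above can be sketched with a toy model in Python. This is not pip's actual resolver; it only mimics the "default repository first" behavior the attack relies on. The index contents and function names are hypothetical.

```python
def resolve_default_first(name, default_index, extra_indexes):
    """Toy resolver: return the label of the first index serving `name`.

    Each index is a (label, set_of_package_names) pair. The default
    index is always consulted before any extra index.
    """
    for label, packages in [default_index, *extra_indexes]:
        if name in packages:
            return label
    raise LookupError(f"{name} not found on any configured index")


def shadowed_names(private_packages, public_packages):
    """Private package names that also exist on the public index."""
    return sorted(set(private_packages) & set(public_packages))


# Hypothetical index contents for illustration only.
PUBLIC = ("public", {"requests", "numpy", "torchtriton"})  # attacker's upload
PRIVATE = ("private", {"torchtriton"})                     # the official build

# The public copy wins even though the project meant the private one.
print(resolve_default_first("torchtriton", PUBLIC, [PRIVATE]))

# A simple audit: which of our private names can be shadowed today?
print(shadowed_names(PRIVATE[1], PUBLIC[1]))
```

A check like `shadowed_names` only reports names that have already been claimed publicly. Registering placeholder packages under your private names, or configuring a single trusted index, addresses the cause rather than the symptom.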

How long will this continue? Forever. If there is value in exploiting these vulnerabilities, attackers will continue to take advantage of them. Luckily, in the case of PyTorch, "only" data exfiltration took place. Future victims might not be so lucky, however. A sophisticated attacker only needs one successful attempt at access to remain persistent in a victim's system (and network).

What Does This Mean for Me?

Did you install your packages from the command line? If so, did you type them in properly? Now that you've installed your dependencies "correctly," did you verify that the code for each dependency does exactly what you think it does? Did you verify that each dependency was installed from the expected package repository? Did you ….
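The "did you type them in properly?" question is one of the few on this list that is cheap to automate. Here is a minimal sketch, assuming you maintain your own allowlist of package names you intend to use. The function name, the example names, and the 0.8 similarity cutoff are arbitrary choices of mine.

```python
import difflib


def possible_typos(requested, known_good, cutoff=0.8):
    """Map each requested name to the allowlisted name it closely resembles.

    Names that are on the allowlist, or that resemble nothing on it,
    are left out of the result.
    """
    allowlist = [name.lower() for name in known_good]
    suspicious = {}
    for name in requested:
        normalized = name.lower()
        if normalized in allowlist:
            continue  # exact match: this is what we meant to install
        matches = difflib.get_close_matches(normalized, allowlist, n=1, cutoff=cutoff)
        if matches:
            suspicious[name] = matches[0]
    return suspicious


# Hypothetical allowlist and install request.
ALLOWLIST = ["requests", "numpy", "flask"]
print(possible_typos(["requests", "reqeusts", "numpy"], ALLOWLIST))
```

This catches only near misses of names you already trust. It says nothing about what the package's code does or which repository it came from, which is why the remaining questions are so much harder to answer by hand.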

Probably not, and that's OK! It's inhumane to expect developers to do this for every single dependency. The best bet for software developers, software companies, and even individual tinkerers is to have some form of runtime protection/detection. Luckily for us all, a healthy and competitive ecosystem of relatively new detection and response tools now exists! Many of them, like Falco, Sysdig Open Source, and osquery, even have free and open source components. Most even come with a default set of rules/protections.

About the Author(s)

Nicholas Lang

Threat Research Engineer, Sysdig

Nicholas Lang is a security researcher at Sysdig, where he aids in the team's offensive efforts. He works on offensive research and cryptocurrency analysis and most recently helped to research and write the cryptomining section of the "2022 Sysdig Cloud Native Threat Report." Prior to Sysdig, Nicholas worked at the boutique security research firm Narf Industries, working in both an offensive and defensive capacity for clients like DARPA and the NRL.
