A recent project to scan the main Python repository's 268,000 packages found only a few potentially malicious programs, but work earlier this year uncovered hundreds of instances of malware.


Open source repositories form the backbone of modern software development: nearly every software project includes at least one open source component. But security experts increasingly worry that attackers are trying to infect systems by inserting malicious code into popular repositories.

A number of projects have kicked off this year to search for such Trojan horses. Last week, Stripe engineer Jordan Wright published the results of a home-brew research project that downloaded every Python component from the Python Package Index (PyPI) and looked for system calls that could indicate malicious intent. Overall, he found hundreds of packages that created network connections — most by including a common dependency — and a few packages that seemed risky. These included two that appeared to be test cases — one named "i-am-malicious" and another named "maliciouspackage" — and a third that used obfuscation to hide commands.

However, none of the scanned packages seemed outright malicious, Wright said in his analysis.

"Looking through the data, I didn't find any packages doing significantly harmful activity that didn't also have 'malicious' somewhere in the name, which was good," he said. "But it's always possible I missed something, or that [attackers installing malicious code] would happen in the future."

In fact, such attacks have already happened. Two years ago, for example, an attacker compromised a developer's account and published malicious versions of two components of the popular JavaScript package ESLint to the Node Package Manager (NPM) service. While the package has millions of weekly downloads, the project group received a warning and unpublished the malicious versions within two hours, limiting the impact.

Such attacks often take another form: typosquatting, where attackers create Trojan horses that have names similar to common packages. In April, an attacker seeded the Ruby package repository, RubyGems, with more than 760 malicious packages with names similar to legitimate packages. Typosquatting attempts to take advantage of mistyped install commands, which are relatively rare, perhaps, but devastating if they produce a compromise.
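One way to picture how a repository or a cautious developer might screen for typosquatting is to compare a package name against a list of well-known names and flag near misses. The sketch below is illustrative only: the `POPULAR_PACKAGES` list, the function names, and the 0.85 threshold are assumptions chosen for the example, not anything used by RubyGems, PyPI, or the researchers mentioned here.

```python
from difflib import SequenceMatcher
from typing import List

# Hypothetical short list of popular package names. A real check would use a
# much larger list, for example one built from download statistics.
POPULAR_PACKAGES = ["requests", "urllib3", "numpy", "django", "setuptools"]


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the two names are identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def possible_typosquat_targets(name: str, threshold: float = 0.85) -> List[str]:
    """List popular packages whose names are close to, but not equal to, `name`."""
    normalized = name.lower()
    return [
        popular
        for popular in POPULAR_PACKAGES
        if popular != normalized and similarity(normalized, popular) >= threshold
    ]


if __name__ == "__main__":
    # A transposed-letter typo of a well-known package name.
    print(possible_typosquat_targets("reqeusts"))
```

A similarity ratio is a blunt instrument: it will miss names that swap in look-alike characters and will flag some legitimate packages, so a hit is a reason for a closer look rather than proof of malice.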

Last year, the Python core development team asked the community for ways of finding malicious code inserted into the modules and packages used by Python. For open source projects, these issues are particularly challenging, said Mike Myers, principal security engineer at Trail of Bits, a software security consultancy, in a comment responding to the request.

"[T]he Google and Apple app stores have both invested heavily in runtime analysis sandboxes and static analysis approaches for detecting malice in their app stores," he said. "The difference there being, they can run their detections in secret, and adversaries can't develop an evasion in advance without disclosing it in a submission."

A team of researchers from the Georgia Institute of Technology carried out a similar analysis for three major repositories: Python's PyPI, the Node Package Manager (NPM), and RubyGems. Their system, dubbed MalOSS, combines metadata analysis, static code analysis, and dynamic runtime analysis to determine whether a package is behaving maliciously. The researchers found seven malicious packages in PyPI, 41 in NPM, and 291 in RubyGems, according to their paper published in February 2020.
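To give a sense of what the static code analysis stage of such a pipeline can look like, here is a minimal sketch that parses Python source and reports calls that often show up in malicious install scripts. It is not the MalOSS implementation: the `SUSPICIOUS_CALLS` watch list and the helper names are assumptions made for illustration.

```python
import ast

# Hypothetical watch list for illustration; not the rules used by MalOSS.
SUSPICIOUS_CALLS = {
    "eval",
    "exec",
    "os.system",
    "subprocess.Popen",
    "subprocess.run",
    "socket.socket",
}


def dotted_name(node: ast.AST) -> str:
    """Rebuild a dotted name such as 'os.system' from a call target, or '' if not possible."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else ""
    return ""


def find_suspicious_calls(source: str):
    """Return sorted (line number, call name) pairs for calls on the watch list."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in SUSPICIOUS_CALLS:
                findings.append((node.lineno, name))
    return sorted(findings)


if __name__ == "__main__":
    sample = "import os\nos.system('id')\nprint('hi')\nexec('x = 1')\n"
    print(find_suspicious_calls(sample))
```

Name matching like this is easy to evade with import aliases or obfuscated strings, which is one reason a system that also observes behavior at runtime can catch what a static pass misses.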

Inspired by the Georgia Tech work, Wright aimed to look for signs that attackers inserted malicious code into packages by analyzing the system functions called during installation. Using the PyPI API, he downloaded 268,000 packages into a container, installed each, and watched for suspicious changes. The entire process cost about $120 in cloud fees, he said.
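The first step of a scan like this, finding what to download, can be sketched with PyPI's public JSON endpoint, which returns per-package metadata including download URLs. The article does not describe Wright's scripts, so the function names below are assumptions, and the sketch stops at locating source archives. Installing untrusted packages should only ever happen inside a disposable, isolated container.

```python
import json
from urllib.request import urlopen

# Public PyPI JSON endpoint for a package's latest release.
PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def sdist_urls(metadata: dict) -> list:
    """Pick source-distribution download URLs out of a PyPI JSON response."""
    return [
        entry["url"]
        for entry in metadata.get("urls", [])
        if entry.get("packagetype") == "sdist"
    ]


def fetch_metadata(name: str) -> dict:
    """Fetch metadata for the latest release of `name` (requires network access)."""
    with urlopen(PYPI_JSON_URL.format(name=name), timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Offline example using a trimmed, made-up response in the endpoint's shape.
    fake_response = {
        "info": {"name": "example-package"},
        "urls": [
            {"packagetype": "bdist_wheel", "url": "https://files.example/pkg.whl"},
            {"packagetype": "sdist", "url": "https://files.example/pkg.tar.gz"},
        ],
    }
    print(sdist_urls(fake_response))
```

Source distributions are the interesting case for this kind of research because installing one runs the package's own setup code, whereas a wheel is unpacked without executing an install script.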

Wright plans to expand the effort to continuously monitor PyPI and add repositories for other platforms in the future.

"This found a few instances of potentially malicious behavior that you can find in the post, but the real power will be setting up continuous monitoring moving forward," he stated on Twitter.

Overall, Wright makes the case that each of the major repositories needs to implement its own security and continuously monitor for malicious supply chain attacks. Otherwise, installing packages from the repositories presents too great a risk, he said.

"I still don't like that it's possible to run arbitrary commands on a user's system just by them pip installing a package," Wright said. "I get that the majority of use cases are benign, but it opens up risk that must be considered. Hopefully by increasingly monitoring various package managers we can identify signs of malicious activity before it has a significant impact."

About the Author(s)

Robert Lemos, Contributing Writer

Veteran technology journalist of more than 20 years. Former research engineer. Written for more than two dozen publications, including CNET News.com, Dark Reading, MIT's Technology Review, Popular Science, and Wired News. Five awards for journalism, including Best Deadline Journalism (Online) in 2003 for coverage of the Blaster worm. Crunches numbers on various trends using Python and R. Recent reports include analyses of the shortage in cybersecurity workers and annual vulnerability trends.
