Attackers are mimicking the names of existing packages on public registries in hopes that users or developers will accidentally download these malicious packages instead of legitimate ones.

Kim Lewandowski & Bentz Tozer, Product Manager, Google Security / Senior Member of Technical Staff, Cyber Practice, In-Q-Tel

March 18, 2021

When typosquatting is mentioned, most people think of domain typosquatting, a form of the cybersquatting that the Anticybersquatting Consumer Protection Act (ACPA) of 1999 defines as registering, trafficking in, or using an Internet domain name with bad-faith intent to profit from the goodwill of a trademark belonging to someone else. Domain (or URL) cybersquatting was commonplace before the passage of the ACPA, as individuals looked to profit by registering domains associated with well-known companies and registered trademarks. Since the passage of the ACPA and the creation of other regulations to resolve disputes over domain name control, clear policies and processes have been in place to address this type of typosquatting.

This article focuses on a different type of typosquatting, called package typosquatting, where there is less oversight and more opportunity for bad actors to cause harm. Here's how it works. Modern software development and usage rely on package managers that support code reuse, including code from public registries where developers upload their built software packages for others to download and use. Package typosquatting is a type of software supply chain attack in which the attacker mimics the name of an existing package on a public registry in hopes that users or developers will accidentally download the malicious package instead of the legitimate one.

Because there is no central body for managing or validating software packages, it's easy for attackers to upload a malicious package whose name is very similar to the real one, and there are no real repercussions if they are caught. For example, a developer may try to install an image editor package named "moving_images," while an attacker has uploaded a package named "moving-images," replacing the underscore with a dash. Attackers can also try slight misspellings or swap the order of the words in a name (e.g., nmap-python instead of python-nmap) in hopes of confusing the developer into picking the malicious package.
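
The two tricks described above, swapped separators and reordered words, can be checked for mechanically. The following is a minimal Python sketch, not any registry's actual rule set. The helper names (normalize, token_key, looks_confusable) are hypothetical, and the normalization step borrows the convention from Python's PEP 503, which treats runs of "-", "_" and "." as equivalent and ignores case.

```python
import re


def normalize(name: str) -> str:
    """Lowercase the name and collapse runs of '-', '_' and '.' into one '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def token_key(name: str) -> tuple:
    """Order-insensitive key: the sorted tokens of the normalized name."""
    return tuple(sorted(normalize(name).split("-")))


def looks_confusable(candidate: str, known: str) -> bool:
    """True if the names differ as written but match once separators,
    case, and word order are ignored."""
    if candidate == known:
        return False
    return token_key(candidate) == token_key(known)


if __name__ == "__main__":
    # The examples from the text above.
    print(looks_confusable("moving-images", "moving_images"))
    print(looks_confusable("nmap-python", "python-nmap"))
```

A check like this only covers separator and word-order variants; misspellings need a distance-based comparison instead.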

While package typosquatting is relatively obscure compared with other attack techniques, it's growing at an alarming rate. In 2018 alone, research indicated that more than 100 malicious packages had been downloaded more than 600 million times in total. In April 2020, more than 700 malicious typosquatting libraries were found in the RubyGems repository alone. One of the better-known package typosquatting events occurred in December 2019, when it was reported that two Trojanized Python libraries on PyPI (the Python Package Index) were mimicking other, more popular libraries and, if used, would steal SSH and GPG keys from the projects of infected developers.

Stay Protected
A simple countermeasure is for developers to do due diligence before adding a package: double-check the package name carefully, look for similar names, and make sure that the package's "date added" and "number of downloads" are what they'd expect. This can counter many attacks, but developers make mistakes, and sometimes malicious packages are already in use, so more needs to be done.

On the research side, one security technique being worked on to minimize the threat is the use of string-comparison algorithms that measure how close two names are to each other, in hopes of capturing and flagging misspellings (a common package typosquatting technique). Other researchers are looking at the relative commit activity and popularity of packages with similar names. While there are examples of this work being done in the wild (as in the PyPI community for Python), not all package registries have prioritized the prevention of typosquatting attacks.
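
To make the string-comparison idea concrete, here is a minimal Python sketch that uses Levenshtein edit distance (the number of single-character insertions, deletions, or substitutions needed to turn one name into another). This is one common closeness measure, not necessarily the one any particular registry or research group uses. The function names, the list of popular packages, and the threshold of 1 are all assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed row by row."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]


def near_misses(name: str, popular: list, max_distance: int = 1) -> list:
    """Popular package names that are close to, but not equal to, name."""
    return [p for p in popular
            if p != name and edit_distance(name, p) <= max_distance]


if __name__ == "__main__":
    popular = ["requests", "flask", "numpy"]  # hypothetical list
    print(near_misses("requsts", popular))    # a one-letter misspelling
```

A low threshold keeps false positives down, since many legitimate packages have short, similar names; combining the distance with popularity or commit activity, as mentioned above, is one way to decide which near misses deserve a closer look.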

One way to help mitigate package typosquatting attacks is to use your own internal registry, one that references only packages that have been determined to be what you expected. With products such as Sonatype Nexus, JFrog Artifactory, and Google Artifact Registry, it's not too difficult to build your own software registry internally. That way, if you make a typo, you don't have to worry about what someone may have uploaded to take advantage of your mistake. However, even private registries aren't guaranteed to stop every attack, as shown by Alex Birsan's recent post on dependency hijacking.
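
The same "only what we have vetted" idea can also be applied as a lightweight check before anything is installed. The Python sketch below is an illustration under stated assumptions, not a feature of any of the products named above: it assumes a pip-style requirements list with one package per line, a hand-maintained set of approved names, and a hypothetical helper called unvetted. Its parsing is deliberately simplistic and ignores option lines such as "-r other.txt".

```python
import re

# Leading package name in a requirement line, e.g. "requests==2.31.0".
NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """Lowercase and collapse runs of '-', '_' and '.' into one '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def unvetted(requirement_lines, approved):
    """Return normalized names from the requirements that are not approved."""
    approved_norm = {normalize(n) for n in approved}
    missing = []
    for line in requirement_lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = NAME_RE.match(line)
        if match is None:
            continue  # option lines (starting with '-') are skipped
        name = normalize(match.group(1))
        if name not in approved_norm:
            missing.append(name)
    return missing


if __name__ == "__main__":
    approved = {"requests", "python-nmap"}  # hypothetical vetted set
    lines = ["requests==2.31.0", "nmap-python  # typo?", "-r other.txt"]
    print(unvetted(lines, approved))
```

Run in a build pipeline, a check like this fails fast on a mistyped name instead of letting the package manager resolve it against a public registry.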

Another potential solution to fight package typosquatting would be for additional package registries and managers to support namespacing, a technique employed to avoid collisions with other objects or variables in the global namespace. Trusted publishers and identity verification mechanisms would also help mitigate attacks by linking across source code repositories and package registries and making it much more difficult for attackers to act, but there are numerous issues with implementing this type of system that are not limited to package management. It might also help to have more law enforcement involvement (as with ACPA) when typosquatting is detected; typosquatting is often a trademark violation (even if the trademark is unregistered), and creating a package to intentionally access a computer without authorization is against the law in many countries.

We know the list of everything a developer must worry about is daunting, package typosquatting included, and any distraction that impedes progress is unwelcome. Developers want to code and build applications, not spend all their time wading through dependencies looking for vulnerabilities and backdoors. Since the average Python or JavaScript library has more than 100 dependencies, that's a full-time job! That's why automated safeguards are needed. Luckily, there's a lot of interest across the industry in helping to mitigate package typosquatting attacks. It is an industrywide problem that affects everyone, so discussion and collaboration must be part of the solution. Discussions are starting to happen now in the Securing Critical Projects Working Group meetings in the OpenSSF.

About the Author(s)

Kim Lewandowski (she/her) is a Product Manager on Google's Open Source Security Team. She represents Google on the Governing Board of the Open Source Security Foundation (OpenSSF.org). At Google, she has shipped a number of Cloud products and is now focusing on improving security of the open source software we all depend on. Prior to joining Google, Kim wrote code for the world's most powerful laser at a nuclear research lab, and also worked at a number of startups, including her own.

Bentz Tozer is a Senior Member of Technical Staff in In-Q-Tel's Cyber Practice, where he identifies and works with startups with the potential for high impact on national security. In previous roles, he has performed security research and software development with a focus on IoT devices and embedded systems. He has a PhD in Systems Engineering from The George Washington University.
