Cyberattackers Torch Python Machine Learning Project

The popular PyTorch Python project for data scientists and machine learning developers has become the latest open source project to be targeted with a dependency confusion attack.


An unknown attacker slipped a malicious binary into the PyTorch machine learning project by registering a malicious project with the Python Package Index (PyPI), infecting users' machines if they downloaded a nightly build between Dec. 25 and Dec. 30.

The PyTorch Foundation stated in an advisory on Dec. 31 that the effort was a dependency confusion attack, in which an unknown entity created a package in the Python Package Index with the same name, torchtriton, as a code library on which the PyTorch project depends. The malicious library included the functions normally used by PyTorch but with a malicious modification: It would upload data from the victim's system to a server at a now-defunct domain.

The malicious function would grab a variety of system-specific information: the username, environment variables, a list of hosts to which the victim's machine connects, the list of password hashes, and the first 1,000 files in the user's home directory.

"Since the PyPI index takes precedence, this malicious package was being installed instead of the version from our official repository," the advisory stated. "This design enables somebody to register a package by the same name as one that exists in a third party index, and [the package manager] will install their version by default."
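The failure mode the advisory describes can be pictured with a toy model. The sketch below is not the package manager's actual resolver, and the version numbers and index labels are hypothetical. It assumes only that candidates from every configured index are pooled by name, so nothing ties a package name to the index it is supposed to come from. A second function shows one possible mitigation, pinning a name to a single index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    version: tuple  # hypothetical version numbers, e.g. (2, 0, 0)
    index: str      # label of the index that served this candidate


def naive_resolve(name, indexes):
    """Pool candidates from every index and pick the highest version.

    Toy model: the index a candidate came from is never consulted.
    """
    pool = [c for idx in indexes for c in idx if c.name == name]
    return max(pool, key=lambda c: c.version)


def pinned_resolve(name, indexes, pins):
    """Like naive_resolve, but honor an explicit name-to-index pin if present."""
    pool = [c for idx in indexes for c in idx if c.name == name]
    if name in pins:
        pool = [c for c in pool if c.index == pins[name]]
    return max(pool, key=lambda c: c.version)


# Hypothetical index contents, for illustration only.
project_index = [Candidate("torchtriton", (2, 0, 0), "project")]
public_index = [Candidate("torchtriton", (3, 0, 0), "public")]

chosen = naive_resolve("torchtriton", [project_index, public_index])
print(chosen.index)  # public

safe = pinned_resolve(
    "torchtriton", [project_index, public_index], {"torchtriton": "project"}
)
print(safe.index)  # project
```

In this model, the attacker needs only to publish a same-named package that looks preferable to the installer. The pin removes that ambiguity by stating where the name is allowed to come from.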

The attack is the latest software supply chain attack to target open source repositories. In mid-December, for example, researchers discovered a malicious package disguised as a client from cybersecurity firm SentinelOne that had been uploaded to PyPI. In another dependency confusion attack in November, attackers created more than two dozen clones of popular software with names designed to fool unwary developers. Similar attacks have targeted the .NET-focused NuGet repository and the Node.js Package Manager (npm) ecosystem.

Same Name, Different Packages

In the latest attack on PyTorch, the attacker used the name of a software package that PyTorch developers would load from the project's private repository, and because the malicious package existed in the PyPI repository, it gained precedence. The PyTorch Foundation removed the dependency in its nightly builds and replaced the PyPI project with a benign package, the advisory stated.

The group also removed any nightly builds that depend on the torchtriton dependency from the project's download page and says it plans to take ownership of the torchtriton project on PyPI.

Fortunately, because the torchtriton dependency was only imported into the nightly builds of the program, the impact of the attack did not propagate to typical users, Paul Ducklin, a principal research scientist at cybersecurity firm Sophos, said in a blog post.

"We're guessing that the majority of PyTorch users won't have been affected by this, either because they don't use nightly builds, or weren't working over the vacation period, or both," he wrote. "But if you are a PyTorch enthusiast who does tinker with nightly builds, and if you've been working over the holidays, then even if you can't find any clear evidence that you were compromised, you might nevertheless want to consider generating new SSH key pairs as a precaution, and updating the public keys that you've uploaded to the various servers that you access via SSH."

The PyTorch Foundation confirmed that users of the stable version of the PyTorch library would not be affected by the issue.
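As a first-pass check, a user could see whether a distribution named torchtriton is present in a given Python environment. The following is a minimal sketch using the standard library's `importlib.metadata`. It is not the check published in the advisory, and the helper name `installed_version` is an illustrative assumption. Presence of the package does not prove compromise, particularly since the PyPI project was later replaced with a benign package, and absence does not rule out an earlier installation.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("torchtriton")
if version is None:
    print("torchtriton is not installed in this environment")
else:
    print(f"torchtriton {version} is installed; review how and when it was installed")
```

The check covers only the interpreter that runs it, so it would need to be repeated in each virtual environment that might have pulled a nightly build.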

Mistaken Intentions?

In a widely circulated mea culpa, the attacker claimed that they are a legitimate researcher and that the issue resulted from their investigation into dependency confusion issues.

"I want to assure that it was not my intention to steal someone's secrets," the person wrote, claiming to have notified Facebook on Dec. 29 of the issue and made reports to companies using the HackerOne crowdsourcing platform. "Had my intents been malicious, I would never have filled [sic] any bug bounty reports, and would have just sold the data to the highest bidder."

Because of the statement, some experts considered the PyTorch advisory to be a "false alarm," but other attackers have donned the mantle of a misunderstood researcher before.

Moreover, the attack could have exposed victims' sensitive information even if the person behind the malware had good intentions, Sophos' Ducklin wrote in a blog post about the software supply chain attack.

"How is this a 'false alarm'?" he also said in a tweet. "This malware deliberately steals your data… and transmits it scrambled, not encrypted ... so anyone on your network path who recorded it can trivially decode it."

About the Author(s)

Robert Lemos, Contributing Writer

Veteran technology journalist of more than 20 years. Former research engineer. Written for more than two dozen publications, including CNET News.com, Dark Reading, MIT's Technology Review, Popular Science, and Wired News. Five awards for journalism, including Best Deadline Journalism (Online) in 2003 for coverage of the Blaster worm. Crunches numbers on various trends using Python and R. Recent reports include analyses of the shortage in cybersecurity workers and annual vulnerability trends.
