Data Scientists, Watch Out: Attackers Have Your Number

Researchers should take extra care in deploying data-science applications to the cloud, as cybercriminals are already targeting popular data-science tools such as Jupyter Notebook.


Always looking for an easy compromise, attackers are now scanning for misconfigurations in data-science applications such as Jupyter Notebook and JupyterLab, as well as in cloud servers and containers, cloud-protection firm Aqua Security stated in an advisory published on April 13.

The two popular data-science applications, frequently used with Python and R for data analysis, are generally secure by default, but a small fraction of instances are misconfigured, allowing attackers to access the servers without a password, according to Aqua Security's researchers. In addition, after setting up its own server as a honeypot, the company detected in-the-wild attacks that attempted to install cryptomining tools and ransomware on accessible instances of the software.

Signs that attackers are targeting data-science environments are worrisome, considering that the researchers setting up those environments are largely uninformed about cybersecurity, says Assaf Morag, lead data analyst with Aqua Security.

"We know, based on our experience with application security, that developers are starting to learn more about security, but what about data scientists?" he says. "Are they gaining a proper education? My training is as a data scientist, and there was no focus on data security."

Looking for Misconfigurations
In the past, threat actors have frequently scanned the Internet for servers running insecurely configured applications. Last year, for example, a misconfigured Git server using default login credentials allowed attackers to steal source code for Nissan's mobile apps, market-research tools, and vehicle-connected services. Over the past five years, a significant number of data breaches have been caused by misconfigured cloud storage, such as Amazon Simple Storage Service (S3) buckets, that attackers discovered.

Research conducted in 2020 found that misconfigured cloud services, containers, and servers are attacked within hours of appearing online. In that study, an insecurely configured Elasticsearch instance was targeted by 175 scans over 11 days, with activity increasing to 435 requests over the following two weeks, including searches for terms such as "password" and "wallet."

According to Aqua Security's advisory, its honeypots showed that the attacks on data-science tools used similar tactics.

"Most of the attacks got initial access via misconfigured environments," the company stated. "After gaining access, adversaries attempted to achieve persistence by creating a new user in the notebook or adding Secure Shell (SSH) keys. Then, most of the attacks executed a cryptominer, trying to get a quick gain."

The problem lies with the small minority of instances configured to let anyone access the notebook server. During the research, for example, the Aqua Security researchers observed a novel attempt to use access gained through an exposed Jupyter Notebook instance to run a Python-based crypto-ransomware program.

"Normally, access to the online application should be restricted, either with a token or password or by limiting ingress traffic," Aqua Security wrote in the ransomware advisory. "However, sometimes these notebooks are left exposed to the internet with no authentication means, allowing anyone to easily access the notebook via a web browser."
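The safeguards named in that quote (a token or password, and limited ingress) correspond to a few settings in a Jupyter configuration file. The fragment below is a minimal sketch, not Aqua Security's guidance, and it assumes Jupyter Server 2.x, which JupyterLab and Notebook 7 run on; older Notebook releases used `NotebookApp` option names instead. The port and file location shown are only the usual defaults.

```python
# ~/.jupyter/jupyter_server_config.py
# Create a template with: jupyter server --generate-config
c = get_config()  # noqa: F821  (get_config is injected when Jupyter loads this file)

# Limit ingress: listen only on the loopback interface and reach the server
# through an SSH tunnel or an authenticating reverse proxy, not a public port.
c.ServerApp.ip = "127.0.0.1"
c.ServerApp.port = 8888

# Keep token authentication on. By default Jupyter Server generates a random
# token at startup; setting it to an empty string disables it. Do NOT do this:
# c.IdentityProvider.token = ""

# Optionally set a password. Generate the hash interactively with:
#   python -c "from jupyter_server.auth import passwd; print(passwd())"
# then replace the placeholder below with the printed value:
# c.PasswordIdentityProvider.hashed_password = "<hash printed by passwd()>"
```

Binding to loopback is only one way to limit ingress; a firewall rule or cloud security group that admits only trusted addresses serves the same purpose when the server must listen on a public interface.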

Lock Down Data Tools
Overall, data scientists and the Jupyter Project appear to be doing a credible job of securing the software. Fewer than 1% of the approximately 10,000 Jupyter Notebook instances found in scans using the Shodan search engine are configured for open access. The fact that attackers are targeting the servers, however, should prompt defenders to make sure their data-science tools are locked down, says Aqua Security's Morag.
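An "open access" instance is one that answers API requests carrying no credentials, so a defender can run the same kind of check against a server they own. The following Python sketch does that with only the standard library. It assumes the Jupyter Server REST API, where /api/kernels normally rejects unauthenticated requests (typically with HTTP 403) when a token or password is configured. The function names are hypothetical, and this is not Aqua Security's or Shodan's methodology.

```python
"""Self-audit sketch: does a Jupyter server answer API calls without credentials?"""
import urllib.error
import urllib.request


def classify_response(status: int, content_type: str = "") -> str:
    """Turn the result of an unauthenticated API request into a verdict."""
    if status == 200 and content_type == "application/json":
        # The REST API returned data even though no token or password was sent.
        return "open"
    if status in (401, 403):
        # The server refused the request because credentials were missing.
        return "protected"
    # Anything else (for example, an HTML login page) needs a manual look.
    return "unknown"


def check_notebook_auth(base_url: str, timeout: float = 5.0) -> str:
    """Request /api/kernels with no credentials and return a verdict string.

    Assumes the Jupyter Server REST API, where /api/kernels requires
    authentication whenever a token or password is configured.
    """
    url = base_url.rstrip("/") + "/api/kernels"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_response(resp.status, resp.headers.get_content_type())
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions but still carry a status code.
        return classify_response(err.code)
    except OSError:
        # Connection refused, DNS failure, timeout, and similar network errors.
        return "unreachable"


if __name__ == "__main__":
    # Only probe servers you own or are explicitly authorized to test.
    print(check_notebook_auth("http://127.0.0.1:8888"))
```

A verdict of "open" means anyone who can reach the port can start kernels and run code on the server, which is exactly the foothold the honeypot attackers used; "unknown" is deliberately conservative and calls for a manual look rather than a guess.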

"There will always be a zero-day or another misconfiguration that users did not know about, like Log4Shell or Spring4Shell," he says. "While those do not apply to Jupyter Notebook, they show that you need to have another layer of protection. If you rely on the network to protect you, it is not enough these days, I think."

The Aqua research is not the first to expose vulnerabilities and misconfigurations in data-science tools. In 2021, researchers from a cloud-security firm found that Microsoft had created an insecure implementation for linking Jupyter Notebook instances to Azure Cosmos DB databases. The flaw allowed attackers to create a connection between an instance of Jupyter Notebook and the database service, and from there to access all other databases using the same service.

Targeting a new community of technical users is a familiar attacker strategy. Threat actors have already started going after machine-learning researchers and the data behind artificial-intelligence and machine-learning systems, according to experts.

About the Author(s)

Robert Lemos, Contributing Writer

Veteran technology journalist of more than 20 years. Former research engineer. Written for more than two dozen publications, including CNET, Dark Reading, MIT's Technology Review, Popular Science, and Wired News. Five awards for journalism, including Best Deadline Journalism (Online) in 2003 for coverage of the Blaster worm. Crunches numbers on various trends using Python and R. Recent reports include analyses of the shortage in cybersecurity workers and annual vulnerability trends.
