Data Scientists Dial Back Use of Open Source Code Due to Security Worries
Data scientists, who often choose open source packages without considering security, increasingly face concerns over the unvetted use of those components, a new study shows.
September 21, 2022
Vulnerabilities in open source components — such as the widespread flaws revealed 10 months ago in Log4j 2.0 — have forced data scientists to reevaluate the open source code frequently used in analysis and the creation of machine learning models.
According to a report by Anaconda, a data science platform firm, 40% of surveyed data scientists, business analysts, and students scaled back their use of open source components in the past year, while a third held steady and only 7% incorporated more open source code into their projects. Only 18% of those surveyed report to the information technology department; the largest share (47%) work within their own data science or research and development group, according to Anaconda's "2022 State of Data Science" report, released last week.
While software developers and IT teams have already started vetting code for security, concern over the security of open source software is a relatively new trend in the data science world, says Peter Wang, co-founder and CEO of Anaconda.
"We see a tremendous portion of people who are at organizations where IT has created a very strict posture around open source and Python," he says. "These are not expert developers. ... They are data scientists and machine learning people who may not be very seasoned developers at all, using whatever they could download to do their analysis, and then they handed that over to IT."
The security of open source components, and of the software supply chain in general, has become a primary consideration for software developers, businesses, and national governments over the past two years. In May, for example, the US National Institute of Standards and Technology (NIST) issued guidance for addressing software supply chain risks. In addition, a growing number of software vendors have joined the Linux Foundation's Open Source Security Foundation (OpenSSF).
While many data science teams scan open source components for vulnerabilities, many create their own software instead. Source: Anaconda's "2022 State of Data Science" report.
Overall, the maturity of organizations' security efforts has improved. About half of firms have an open source security policy in place, and those firms perform better on measures of security readiness, according to a June survey. In addition, efforts to control open source risk have jumped by 51% over the past 12 months, according to a study of security maturity released on Sept. 21.
"[W]ith the attention placed on software supply chains, most enterprise organizations are taking a risk-based approach to application security," Jason Schmitt, general manager of the Synopsys Software Integrity Group, said in a statement announcing the study. "Such an approach recognizes that security isn't limited to the codebase; it includes the process of software development where security reviews and testing 'shift everywhere' to continuously improve security outcomes."
Devs Expand Use of Open Source
Software companies are not seeing any sort of decrease in open source usage, according to other data. Instead, development organizations are focusing on improving the security of open source software and using security as a primary guide in selecting components.
The data science community's self-reported move away from open source packages likely reflects greater awareness of security issues rather than a wholesale jettisoning of open source components in development, says Tracy Miranda, head of open source at Chainguard.
While data science teams and development teams may have reacted differently to major security issues, such as the Log4j 2.0 flaws, companies that move away from one open source package have little recourse other than to adopt a different package whose maintainers put greater emphasis on security, she says.
"Companies leverage open source as a way to increase their velocity, so if they are scaling back, what are they scaling back to? Writing code in-house? Using third-party versions packaged up?" Miranda says, adding: "I do think we can expect to see companies be more discerning about the quality of the open source they use, especially related to security features."
Data Scientists Are Playing Catch-up
In addition, while data science professionals work at companies that overwhelmingly (87%) allow open source software, about a quarter (26%) face minimal oversight by the IT department of their open source choices, the Anaconda report found. In another 18% of companies, the IT department specifies only about half of the available open source components.
The maintainers of the most critical projects, of which there are hundreds if not thousands, need to use secure dependencies, test their own code, and validate the trustworthiness of contributors. They should also publish a security scorecard using Scorecard, a Google-created initiative now managed by the OpenSSF, which grades a project's security based on nearly 20 different criteria.
While awareness is likely increasing, there is no quick solution, Miranda says.
"The reality is that the more secure options have not previously existed," she says. "Trimming unnecessary dependencies to reduce attack surface is sensible, but it's hard to do once the dependency tree has grown large."