More than five out of every 1,000 commits to GitHub included a software secret, half again the rate in 2021, putting applications and businesses at risk.

The rate at which developers leaked critical software secrets, such as passwords and API keys, jumped by half to reach 5.5 out of every 1,000 commits to GitHub repositories.

That's according to a report published by secrets-management firm GitGuardian this week. Though the rate seems small, the firm detected at least 10 million instances of secrets leaking to a public repository, accounting for more than 3 million unique secrets in total, the company stated in its "2023 State of Secrets Sprawl" report.

While generic passwords accounted for the majority (56%) of secrets, more than a third (38%) were high-entropy secrets, a category that includes API keys, random number generator seeds, and other sensitive strings.
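
To make "high-entropy" concrete, here is a minimal Python sketch that scores a string by its Shannon entropy. It is an illustration only, not GitGuardian's detection method; the function names and the length and entropy thresholds are assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_high_entropy(token: str, min_length: int = 20, threshold: float = 4.0) -> bool:
    """Flag long strings whose characters look close to random.

    min_length and threshold are illustrative assumptions, not values
    taken from the report.
    """
    return len(token) >= min_length and shannon_entropy(token) > threshold
```

A short dictionary-style password such as "password123" falls below the length cutoff, while a long string of mostly distinct characters, the shape of many API keys, scores well above the threshold.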

As more companies move their application infrastructure and operations to the cloud, API keys, credentials, and other software secrets have become critical to the security of their business. When those secrets leak, the results can be devastating, or at the very least, expensive.

"Secrets are the crown jewels of any business or organization — they really can grant access into all of your systems and infrastructure," says Mackenzie Jackson, security and developer advocate at GitGuardian. "The risk can be anything, from complete system takeovers, to small data exposures, or various other things."

Chart of the most sensitive file types. Environment files (env) typically contain the most sensitive data. Source: GitGuardian

"Millions of such keys accumulate every year, not only in public spaces, such as code-sharing platforms, but especially in closed spaces such as private repositories or corporate IT assets," GitGuardian stated in its "2023 State of Secrets Sprawl" report.

And even those private spaces can be vulnerable. In January, for instance, collaboration and messaging platform Slack warned users that a "limited number of Slack employee tokens" had been stolen by a threat actor, who then downloaded private code repositories. Last May, cloud application platform provider Heroku, a subsidiary of Salesforce, acknowledged that an attacker had stolen a database of hashed and salted passwords after gaining access to the OAuth tokens used to integrate with GitHub.

Infrastructure — as ... Whoops!

Part of the reason for the increase in leaked secrets is that infrastructure-as-code (IaC) has become much more popular. IaC is the managing and provisioning of infrastructure through code instead of through manual processes, and in 2022, the number of IaC-related files and artifacts pushed to GitHub repositories increased by 28%. The vast majority (83%) of those files were configuration files for Docker, Kubernetes, or Terraform, according to GitGuardian.

IaC allows developers to specify the configuration of the infrastructure used by their application, including servers, databases, and software-defined networking. To control all those components, secrets are often necessary, Jackson says.

"The attack surface keeps expanding," he says. "Infrastructure-as-code has become this new thing and it's exploded with popularity, and infrastructure needs secrets, so infrastructure-as-code [files] often contains secrets."

In addition, three filetypes commonly used as caches for sensitive application information — .env, .key, and .pem — are considered the most sensitive, defined as having the most secrets per file. Developers should almost always avoid publishing those files to a public repository, Jackson says.

"If one of these files is in your Git repository, then you know you have holes in your security," he says. "Even if the file doesn't contain secrets, they just should never be there. You should have prevention in place to make sure that they're not there and alerting in place to know when they are there."
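
The kind of prevention Jackson describes can start with something as simple as checking tracked file names. The Python sketch below flags paths that end in .env, .key, or .pem; the helper names are hypothetical, and the decision to also treat names like ".env.local" as environment files is an assumption made for the example.

```python
from pathlib import PurePosixPath

# The three file types the report singles out as most sensitive.
SENSITIVE_SUFFIXES = {".env", ".key", ".pem"}


def is_sensitive_path(path: str) -> bool:
    """Return True if the file name looks like an env, key, or pem file."""
    name = PurePosixPath(path).name.lower()
    # A bare ".env" has no suffix according to pathlib, so handle it
    # explicitly, along with variants such as ".env.local" (an assumption).
    if name == ".env" or name.startswith(".env."):
        return True
    return PurePosixPath(name).suffix in SENSITIVE_SUFFIXES


def find_sensitive_paths(paths):
    """Return the subset of paths that should not be in a repository."""
    return [p for p in paths if is_sensitive_path(p)]
```

Feeding it the output of `git ls-files`, one path per line, would give a quick list of files to investigate; a non-empty result could be used to fail a pre-commit check.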

For that reason, companies should continuously scan systems and files for secrets, gaining visibility and the ability to block potentially dangerous files, Jackson adds.

"You want to scan all of your infrastructure to make sure you have visibility," he says. "And then the next steps include implementing tools to check engineers and developers ... to detect any secrets when they slip out."
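
As a rough illustration of what such a scan looks for, the following Python sketch searches text for hardcoded, quoted values assigned to suspicious-sounding names. The pattern, the keyword list, and the eight-character minimum are all assumptions made for this example; commercial scanners rely on far larger rule sets and validation steps.

```python
import re

# Hypothetical rule: a name containing a secret-like keyword, followed by
# "=" or ":", followed by a quoted literal of at least 8 characters.
SECRET_ASSIGNMENT = re.compile(
    r"(password|passwd|secret|api[_-]?key|token)\w*\s*[:=]\s*[\"']([^\"']{8,})[\"']",
    re.IGNORECASE,
)


def scan_text(text: str):
    """Return (line_number, keyword) pairs for lines that look like leaks.

    The secret value itself is deliberately left out of the findings so
    that scan reports do not become another place where it is exposed.
    """
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        match = SECRET_ASSIGNMENT.search(line)
        if match:
            findings.append((line_number, match.group(1).lower()))
    return findings
```

Running a check like this over every file before a commit is accepted, and again across existing repositories, covers both the prevention and the alerting halves of Jackson's advice.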

About the Author(s)

Robert Lemos, Contributing Writer

Veteran technology journalist of more than 20 years. Former research engineer. Written for more than two dozen publications, including CNET, Dark Reading, MIT's Technology Review, Popular Science, and Wired News. Five awards for journalism, including Best Deadline Journalism (Online) in 2003 for coverage of the Blaster worm. Crunches numbers on various trends using Python and R. Recent reports include analyses of the shortage in cybersecurity workers and annual vulnerability trends.