Source Code Leaks: The Real Problem Nobody Is Paying Attention To

Source code is a corporate asset like any other, which makes it an attractive target for hackers.

Mackenzie Jackson, Developer Advocate, GitGuardian

December 14, 2021


At 10:30 p.m. PT on Oct. 6, Twitch released the following statement on its corporate blog: "We have learned that some data was exposed to the internet due to an error in a Twitch server configuration change that was subsequently accessed by a malicious third party."

Then, on Oct. 15, Twitch released an updated version of its statement revealing more details about the leak and confirming that the "exposed data primarily contained documents from Twitch's source code repository, as well as a subset of creator payout data." GitGuardian inspected the 6,000 leaked Git repositories for secrets and sensitive data. While most of the attention has focused on the leaked creator revenues, the results point to a much more serious problem, one that extends beyond this breach.

This leak can be added to a long list: Symantec in 2012, Adobe in 2013, Microsoft in 2017, Apple and Snapchat in 2018, Samsung in 2019, and dozens of enterprise companies in a single high-profile operation carried out by a Swiss hacker in 2020. Hardly a year goes by without another such horror story in the world of cybersecurity. But what makes source code such an attractive target for hackers?

Is Source Code More Than Just Lines of Code?
Source code is a corporate asset like any other. It takes thousands of hours to design, write, test, release, fix, and improve. Technology companies like Twitch consider source code a blueprint that describes the internals of their digital platforms and the products they build and offer. Code is arguably one of the most valuable assets such companies own, and a source of business opportunity and value creation.

However, a blueprint, like any technical or engineering drawing of physical goods, isn't enough on its own to reproduce the goods it describes. For many cybersecurity analysts, the same reasoning applies to source code leaks. From a technical standpoint, they don't consider these leaks to be dramatic events that threaten business continuity. In isolation, most source code is deemed to have no real value or use unless the attackers also have the other pieces of technology and, more importantly, the people and talent to use it. Moreover, stolen source code depreciates rapidly without support and improvements from its original maintainers.

This is by no means an excuse for organizations to stop caring about securing their source code and enforcing strict internal security and access management policies. Source code remains an important piece of intellectual property, allowing anyone who reads it to understand the technologies and logic that went into building an application. For hackers looking to inflict deeper damage, source code also reveals logic flaws and vulnerabilities that can be exploited. In addition, it often contains secrets and credentials that give easy access to all or part of an organization's IT systems.

The Larger Problem: Secrets in Source Code
In the context of software development, the term "secrets" refers to digital authentication credentials, such as API keys, passwords, and tokens, that grant access to systems or data.

Modern Web applications rely on hundreds of independent building blocks to function. Secrets tie these building blocks together by authenticating the connections between components.

The Twitch source code leak revealed significant shortcomings in how secrets were handled. Credentials and other sensitive data regularly used by developers and applications were stored in plaintext in many of the 6,000 Git repositories, in disregard of application security best practices and standards.
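As a minimal illustration of the difference (not code taken from the Twitch repositories), the Python sketch below contrasts a credential hardcoded in plaintext with one read from the environment at runtime. The variable name `DATABASE_URL`, the placeholder connection string, and the `get_database_url` helper are hypothetical; in practice, the environment would be populated by a secrets manager or the deployment platform.

```python
import os

# Anti-pattern: the credential is committed in plaintext, so anyone who
# obtains the repository (as in a leak) obtains the credential too.
# The value below is a made-up placeholder, not a real credential.
HARDCODED_DB_URL = "postgres://app_user:s3cr3t-password@db.internal:5432/app"


def get_database_url() -> str:
    """Read the connection string from the environment at runtime.

    DATABASE_URL is a hypothetical variable name chosen for this example.
    It would be injected by a secrets manager or deployment platform,
    never by a value committed to Git.
    """
    url = os.environ.get("DATABASE_URL")
    if url is None:
        # Fail loudly instead of silently falling back to a hardcoded default.
        raise RuntimeError("DATABASE_URL is not set")
    return url
```

Because the value never appears in the repository, a leak of the code alone no longer hands the credential to whoever obtains it.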

What's the Worst That Could Happen?
The main narrative in the media is that this leak doesn't present a security risk and that no significant customer data was exposed. The presence of secrets in the source code, however, challenges that view. Even if no customer information was compromised directly in the breach, the sheer number of exposed secrets raises security concerns that Twitch must address.

Out of the nearly 6,000 repositories, more than 1,100 contained at least one secret candidate, that is, a string that looks like a secret. The findings covered a range of secret types, from credentials for cloud services such as AWS and Google to keys for messaging APIs such as Twilio and Vonage Nexmo. Full database connection strings and GitHub OAuth keys were also among the key findings of our investigation.
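The Python sketch below shows, under simplifying assumptions, how pattern-based scanning can surface secret candidates like these and count the repositories containing at least one. It covers only three of the types mentioned (AWS access key IDs, GitHub OAuth tokens, and database connection strings with inline passwords); the regular expressions, the input shape, and the helper names are hypothetical illustrations, not GitGuardian's detectors.

```python
import re

# Simplified, illustrative patterns; real scanners use many more detectors
# plus validation steps to reduce false positives.
SECRET_PATTERNS = {
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # GitHub OAuth access tokens: "gho_" prefix plus 36 alphanumeric characters.
    "github_oauth_token": re.compile(r"\bgho_[A-Za-z0-9]{36}\b"),
    # Database URLs with an inline password: scheme://user:password@host
    "db_connection_string": re.compile(
        r"\b(?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?)://"
        r"[^\s:/@]+:[^\s@/]+@[^\s/\"'@]+"
    ),
}


def find_secret_candidates(text: str) -> list[tuple[str, str]]:
    """Return (detector name, matched string) pairs found in text."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits


def count_repos_with_candidates(repos: dict[str, list[str]]) -> int:
    """Count repositories in which at least one file has a secret candidate.

    `repos` maps a repository name to the contents of its files
    (a hypothetical input shape for this example).
    """
    return sum(
        1
        for files in repos.values()
        if any(find_secret_candidates(content) for content in files)
    )
```

Simple patterns like these both miss formats and flag harmless strings, which is why scan results are treated as candidates that still need review.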

When secrets are exposed this way, the company's "attack surface," meaning the set of points where an unauthorized user could gain access to internal systems or data, expands. In the case of Twitch, what started as "a server configuration change that allowed improper access by an unauthorized third party," as the company described it, could turn into hundreds, if not thousands, of new, unsecured entry points for anyone using the exposed credentials.

Leaking Secrets Can Be Prevented
The uncontrolled spread of secrets across systems and repositories is commonly called "secret sprawl." In truth, it is extremely difficult to prevent entirely because of human error and lack of discipline. In 2020 alone, research on public GitHub found more than 2 million secrets exposed on the code-sharing platform.

However, this doesn't mean organizations are helpless. Rigorously transforming software engineering teams' practices and development tooling can ultimately keep secrets out of code. The first step is developer education: making sure all code contributors are aware of the risks and consequences. Better tooling also helps. Secrets managers, often called vaults, provide secure storage and handling of secrets along with a clear process for everyone on the team to follow. In addition, organizations can automate secrets detection in code as a safety net, triggering real-time alerts whenever a policy break occurs.
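One common heuristic for catching generic secrets that match no known format is Shannon entropy, H = -Σ p·log2(p) over character frequencies: randomly generated keys tend to use many distinct characters evenly. The Python sketch below assumes a unified diff as input (for example, the output of `git diff --cached` inside a pre-commit hook) and flags high-entropy tokens on added lines. The `scan_diff` function, the 20-character minimum, and the 4.0-bit threshold are arbitrary example choices, not a standard.

```python
import math
import re
from collections import Counter

# Runs of 20+ characters drawn from sets common in keys and tokens.
TOKEN_RE = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")


def shannon_entropy(s: str) -> float:
    """Bits of entropy per character, from character frequencies in s."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def scan_diff(diff_text: str, threshold: float = 4.0) -> list[tuple[int, str]]:
    """Flag high-entropy tokens on lines added in a unified diff.

    Returns (line number within the diff, token) pairs. A non-empty result
    is what a hook would treat as a policy break.
    """
    findings = []
    for lineno, line in enumerate(diff_text.splitlines(), start=1):
        # Only inspect added lines; skip the "+++ b/<file>" header line.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for token in TOKEN_RE.findall(line[1:]):
            if shannon_entropy(token) >= threshold:
                findings.append((lineno, token))
    return findings
```

A hook could block the commit or raise an alert whenever `scan_diff` returns a non-empty list. Entropy on its own is noisy, since some structured keys score below any reasonable threshold and some harmless strings score above it, so scanners generally combine it with pattern-based detectors.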

Holistic approaches that embed security practices throughout the software development life cycle, from initial design to implementation, are the safest bet for containing the damage if and when source code leaks happen.

About the Author

Mackenzie Jackson

Developer Advocate, GitGuardian

Mackenzie Jackson is the developer advocate at GitGuardian. He is passionate about technology and building a community of engaged developers to shape future tools and systems.
