Never before have organizations handled more information, or been more concerned that it might fall into the wrong hands. This concern applies to all data, but especially to the source code they rely on to keep their processes running.
Businesses and individuals alike rely on platforms such as GitHub, GitLab, and Bitbucket to store and manage their source code and keep their development projects running. These platforms are wildly popular: GitHub has more than 73 million developers and 200 million repositories, GitLab estimates 30 million registered users, and Bitbucket reported 10 million users in 2019.
If security teams aren't worried about the source code stored on these platforms, they should be, because chances are their developers keep at least a few projects there. Attacks in recent years have highlighted the threat: A 2019 ransomware attack wiped Git source code repositories across platforms and replaced them with a ransom demand. There is also the risk of downtime, as when GitHub was down for at least two hours in June 2020.
The cost of losing source code is high, says John Bambenek, principal threat hunter at Netenrich.
"Anything that is critical to an organization should be backed up," he says. "A good rule of thumb is, 'Can the company continue to operate without this?' and if the answer is no, there needs to be a backup plan."
There are many reasons a company might not back up its source code. Part of it may be a desire to save money, and part may be a sense of invulnerability to attacks that could compromise that code. There's also the reality that backups cost money without delivering any tangible benefit until they're needed, notes Mark Loveless, senior security engineer at GitLab.
"For the most part, you're just doing something where you don't see an immediate gain," he says. "That's the way backups are. You don't see an immediate gain, and you never want to see an immediate gain on backups because you're hoping that everything works out and you never have to resort to them. But you need a plan for that."
Awareness is another issue. Some people may not back up their source code because they don't think they have to, he adds. GitLab, GitHub, and Bitbucket, much like the major cloud providers, follow a "shared responsibility model," in which the users and the provider of a service share responsibility for protecting the data.
GitLab backs up its own servers "pretty much constantly," says Loveless, but many people run their own GitLab instance, either in a private cloud or on a physical server in their data center. In those cases, users should consider which cloud provider they're using, what kinds of backups are being kept, and how far back those backups need to go.
"Git ... since it stores a history of code check-ins and you can do rollbacks to a previous version of code, [users] have a tendency to think that there's a backup," Loveless says. "There is, as far as revisions and your code changes ... but those are stored in a database [and] data files, and those need to be backed up."
A working copy of the repository on each developer's computer should not be considered a backup, as it typically contains only the source code and not the issues, comments, pull requests, and other metadata associated with it. It's common to think that a Git repository or other version control system is sufficient, adds Taylor Gulley, senior application security consultant at nVisium. But version control, while very useful, still leaves the authoritative copy of your code in a single, centralized location.
"Unless your disaster recovery plan is to pull the code from a developer's local machine — assuming there are any that survive the incident that took down the server — proper backups are critical," Gulley says.
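For the Git data itself, one common approach is to keep a mirror of the server rather than relying on whatever clones developers happen to have. The Python sketch below is illustrative only: it assumes `git` is installed and the job can read the remote, and the function names, directory layout, and snapshot naming are hypothetical. `git clone --mirror` copies every ref on the server (not just the branches and tags a default clone fetches), and each run writes a timestamped `git bundle` file as a point-in-time snapshot. It does not capture issues, comments, or pull requests, which live outside Git and have to be exported through each platform's API.

```python
from __future__ import annotations

import subprocess
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical helper names; a sketch, not any vendor's implementation.


def mirror_clone_cmd(remote_url: str, mirror_dir: Path) -> list[str]:
    # --mirror copies every ref on the server (refs/*), not only the
    # branches and tags that a default clone fetches.
    return ["git", "clone", "--mirror", remote_url, str(mirror_dir)]


def bundle_cmd(mirror_dir: Path, bundle_path: Path) -> list[str]:
    # "git bundle create <file> --all" packs every ref into a single file
    # that can later be cloned from like a remote.
    return ["git", "-C", str(mirror_dir), "bundle", "create", str(bundle_path), "--all"]


def snapshot_name(repo: str, when: datetime) -> str:
    # A UTC timestamp in the file name keeps each run's snapshot separate,
    # so a bad upstream change does not overwrite older copies.
    return f"{repo}-{when.astimezone(timezone.utc):%Y%m%dT%H%M%SZ}.bundle"


def back_up_repository(remote_url: str, repo: str, workdir: Path) -> Path:
    """Hypothetical job: refresh (or create) a mirror, then write a timestamped bundle."""
    workdir = workdir.resolve()  # absolute paths, because "git -C" changes directory
    mirror_dir = workdir / f"{repo}.git"
    if mirror_dir.exists():
        # Fetch new and updated refs into the existing mirror.
        subprocess.run(["git", "-C", str(mirror_dir), "remote", "update"], check=True)
    else:
        subprocess.run(mirror_clone_cmd(remote_url, mirror_dir), check=True)
    bundle_path = workdir / snapshot_name(repo, datetime.now(timezone.utc))
    subprocess.run(bundle_cmd(mirror_dir, bundle_path), check=True)
    return bundle_path
```

Run on a schedule, each call leaves another timestamped bundle next to the previous ones; how many to keep, and where to copy them, is a separate retention decision.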
What Companies Should Know About the Process
Backups for source code can take multiple forms. Organizations can choose to manage their own backups and take ownership of the associated infrastructure, processes, and repair costs. While this gives them greater control over their data, it may cost more in the long run due to resources spent on maintenance.
Manual backups also involve technical challenges. Because each vendor has its own API and its own way of handling processes, comments, and issues, it's difficult to keep all assets consistent enough to restore them to any Git repository. API rate limits pose another obstacle: Backing up Git data usually means sending many requests to the Git provider's API, and providers cap how many requests can be sent within a given period of time.
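To illustrate the rate-limit problem, here is a minimal Python sketch of a retry loop around a single API call. It assumes GitHub-style response headers (`Retry-After` in seconds, `X-RateLimit-Remaining`, and `X-RateLimit-Reset` as a Unix timestamp) and treats HTTP 429, or a 403 that carries those headers, as a rate-limit signal. Other providers use different header names, and `fetch` is a hypothetical stand-in for whatever HTTP client a backup job uses.

```python
import time
from typing import Callable, Mapping, Tuple

# (status code, headers, body). Assumption: headers arrive as a plain dict
# with exactly these key spellings; real HTTP headers are case-insensitive.
Response = Tuple[int, Mapping[str, str], bytes]


def is_rate_limited(status: int, headers: Mapping[str, str]) -> bool:
    # 429 always means "too many requests"; a 403 counts only if it carries
    # rate-limit headers, so genuine permission errors are not retried.
    if status == 429:
        return True
    return status == 403 and (
        "Retry-After" in headers or headers.get("X-RateLimit-Remaining") == "0"
    )


def wait_seconds(headers: Mapping[str, str], attempt: int, now: float,
                 base_delay: float = 1.0, max_delay: float = 300.0) -> float:
    if "Retry-After" in headers:
        # The server states how long to wait, in seconds.
        return min(float(headers["Retry-After"]), max_delay)
    if headers.get("X-RateLimit-Remaining") == "0" and "X-RateLimit-Reset" in headers:
        # Quota exhausted: wait until the window resets (Unix timestamp).
        return min(max(float(headers["X-RateLimit-Reset"]) - now, 0.0), max_delay)
    # No hint from the server: fall back to exponential backoff.
    return min(base_delay * 2 ** attempt, max_delay)


def fetch_with_backoff(fetch: Callable[[], Response], max_retries: int = 5,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.time) -> Response:
    for attempt in range(max_retries + 1):
        status, headers, body = fetch()
        if not is_rate_limited(status, headers):
            return status, headers, body
        if attempt < max_retries:
            sleep(wait_seconds(headers, attempt, clock()))
    raise RuntimeError(f"still rate-limited after {max_retries} retries")
```

Returning permission errors immediately instead of retrying them is a deliberate choice: An expired token should fail the job loudly rather than stall it.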
Alternatively, organizations can turn to a third party to handle backup management. In many cases, cloud services can help with this, Bambenek notes. One option is a service such as GitProtect.io, a tool designed to back up code on GitHub, GitLab, and Bitbucket.
"The need was found inside our own company," says GitProtect product development manager Greg Bak of the product's creation. "We had some internal scripts to protect those repositories, but no one was able to guarantee that we will always be able to restore those repositories ... that they are protected properly, that our backups are tested. So we decided to [build] it."
GitProtect is available in two models, backup-as-a-service and on-premises, so organizations can either deploy it to the public cloud or install it locally. The product's goal is to protect not only source code but also all the related metadata needed to keep a repository consistent, such as comments, issues, and CI/CD tasks, Bak says.
Beyond attacks targeting repositories and potential disruption of these platforms, a number of other threats could compromise source code. Human error and unwanted changes to the code itself could also make backups necessary to get processes running again, he adds.
Backup Best Practices
Regardless of how you decide to back up your source code, GitLab's Loveless advises bringing a security expert into the room.
"Invest in some security people," he says. "If you can have people in there, experienced people who know how to do this, invest in people and you should get a lot better results."
Experts also advise storing backups encrypted and in a safe place. If you're running a multicloud environment, rotate backups off-site or off-system. Gulley recommends keeping a couple of copies on-site and one off-site, in case the on-site location is compromised. The automated backup processes and accounts should not be able to modify or delete previous backups.
The experts agree that making backups of source code is not enough; it's also important to test them and make sure they work. If they don't, you don't want to discover that at the moment you need them. Test the process of accessing and using the backups to make sure you can use them and that everyone involved understands their role in the event of an attack, outage, or compromise.
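One way to make that testing routine is a scripted restore drill. The Python sketch below is illustrative only: It assumes backups are Git bundle files with a SHA-256 checksum recorded when each was written, and the manifest format and function names are hypothetical. It first checks that each stored file still matches its recorded checksum, then restores a bundle into a scratch directory with `git clone` and checks the result with `git fsck --full`.

```python
import hashlib
import subprocess
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical helpers; the manifest maps a backup file name to its SHA-256 hex digest.


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def corrupted_entries(manifest: Dict[str, str], read: Callable[[str], bytes]) -> List[str]:
    """Return names whose current contents no longer match the recorded digest."""
    bad = []
    for name, expected in sorted(manifest.items()):
        try:
            if sha256_hex(read(name)) != expected:
                bad.append(name)
        except OSError:
            bad.append(name)  # a missing or unreadable file also fails the drill
    return bad


def restore_drill_cmds(bundle: Path, scratch: Path) -> List[List[str]]:
    # Cloning from the bundle shows it can actually be restored;
    # fsck then checks the validity and connectivity of the restored objects.
    return [
        ["git", "clone", str(bundle), str(scratch)],
        ["git", "-C", str(scratch), "fsck", "--full"],
    ]


def run_drill(bundle: Path, scratch: Path) -> None:
    for cmd in restore_drill_cmds(bundle, scratch):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

A passing drill only covers the Git data; restoring issues, comments, and pull requests, and confirming people know their roles, still has to be rehearsed separately.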