In 2016, Justin Smith wrote a great article about securing cloud-native platforms such as CloudFoundry and OpenShift. Smith describes what he calls the "Three R's" — rotate, repair, and repave — and how they can improve security. The model is powerful, but Smith focuses on the platform itself. Those same techniques, with minor modifications, can be extended to securing the apps running on top of the platform.
Let's dive into the three R's, and see how they apply to the platform.
The first R stands for rotate. A cloud-native platform relies on automation, so different parts of the system need to access and modify others. To do this, they use a mix of different tokens, ranging from username and password to UUIDs (universally unique IDs, a 128-bit number) to private keys and certificates. Over time, many systems and individuals have access to these tokens, inevitably resulting in tokens being leaked, giving attackers control over our systems.
The rotate principle means you should rotate your keys as often as possible. This way, leaked credentials will quickly become irrelevant, often before the attacker even gets to use them. Rotating credentials requires rethinking how you manage them, switching from a reliance on manual tracking to use of a central key management system, and the automation of the creation and consumption of access keys. (CloudFoundry recently launched CredHub for this purpose.)
This security principle applies to cloud-native apps pretty much as is. Although platforms rely on credentials for steps like SSH (Secure Shell) access and system provisioning, apps need them for accessing databases, third-party services, and interacting with the platform itself. Cloud apps typically store credentials in the platform's configuration or environment variables, which makes the values available to the app. Such a centralized credential storage makes it fairly easy to rotate keys — simply generate a new token, update the app's configuration, and restart all running instances.
Rotating application credentials is a great way to improve app security in cloud-native apps. All it requires is a bit of automation. Note that this often requires having two valid credentials at any given time — one used by the current system and one being rolled out. Such pairs allow you to rotate credentials with no downtime, as old running instances will keep using the old keys until all are restarted.
The next R stands for repair, as in repairing known vulnerabilities in your systems. Known vulnerabilities account for the majority of successful exploits, and high-profile vulnerabilities such as Heartbleed and Shellshock demonstrate their severity well. Each of these are usually easy to fix, often by simply grabbing the fixed version, but doing so at scale is hard.
Using a cloud-native platform implies that you can automate the creation and deployment of your containers. This presents the opportunity to automate the tracking of known vulnerabilities in your operating system or its dependencies, trigger the creation of new containers and images, and streamline the rollout of those new systems to the cloud. Handling such automated repair well should be a key criterion when picking the platform to use.
Repair applies to applications, as well. As demonstrated by the Equifax breach, a vulnerable application library (in that case, a remote command execution in the Java Struts2 library) can cause as much damage as an OS dependency, and it won't be fixed by patching the underlying server. The steps are similar as well — automate tracking of newly disclosed vulnerabilities, repackage (in this case, rebuild) and roll out the new version.
The key distinction between app and platform lies in where this process takes place. Most developers are uncomfortable rolling out a library update without going through their automated testing, implying that this tracking should occur earlier in the dev process, integrated into your source control (e.g., GitHub, BitBucket, or GitLab), or as part of your continuous integration and continuous delivery/deployment (CI/CD).
The last-but-not-least R is repave. The original repave concept involves recreating servers frequently in case they've been compromised, removing the risk of a compromised server causing damage. Smith suggests switching from boasting about how long our servers can run to how quickly we can shut them down.
This concept doesn't really apply to apps, as code always needs some server to run on (even if you're using "serverless" computing). Frequent repaving of the underlying server would also help deal with servers compromised via an app-layer vulnerability. However, I suggest replacing this R with a new one: readjust.
Readjust/Divide and Conquer
Cloud-native applications take great advantage of microservices. In addition to being easier to maintain, they make our systems more flexible, as we use the same services in different constellations and execution paths, creating new business value with far less effort. Cloud-native platforms typically support locking down the permissions of each microservice individually, granting it only the permissions it needs to get the job done.
Unfortunately, maintaining granular permissions across many applications is hard, so developers reuse the same policies and credentials in multiple applications to lessen the load. This means each policy and service account needs to have the total permissions each microservice needs to operate — far more than what an individual policy would have allowed. To make things worse, it's not clearly known which app needs each permission in the policy, so it's hard to make them stricter over time.
To successfully mitigate the damage a compromised microservice can cause, make sure you regularly readjust the permissions to the minimum necessary for each. Doing so with cross-service policies will be hard, so I recommend requiring that each microservice maintains its own policy and credentials, making such constant adjustments feasible.
Switching to a cloud-native platform may be scary, as it shuffles your priorities. However, if this is done well, you can use the increase in automation and greater speed to your advantage and make both the platform and the apps running on it more secure.