The proper identification of indicators of compromise (IoCs) — whether generated from misguided negligence, a well-intentioned operational change, or the acts of a malicious insider or external attacker via compromised account credentials — often occurs only after data exfiltration has succeeded.
Fortunately, IT infrastructure (specifically, application processes, workloads, sessions, and network connections) doesn't lie. It tells the tale with a trail of threat crumbs that can be observed well before a breach has done its damage. To find and collect those crumbs, targeted monitoring of IT infrastructure and application behavior remains the most effective way to detect, and hopefully shut down, attacks in progress.
Let's consider which specific crumbs, or telemetry, are best to use and how they can help organizations respond to threats earlier in the kill chain.
It all starts with a process. Consider a shell script or Java app — the time, process identifier (pid), arguments, and checksum of the process are all important factors.
For instance, in brute-force or more sophisticated attacks, a rogue change in process behavior offers the first IoC, such as the result of an attacker living off the land. Implementing techniques with the fidelity to detect rogue process behavior is therefore a critical front-line defense.
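As a minimal sketch (the pid, arguments, and binary contents below are hypothetical), the telemetry fields above can be captured per process, and a checksum that changes while the pid and arguments stay the same can be flagged as a rogue-behavior IoC:

```python
import hashlib
import time

def checksum(executable_bytes: bytes) -> str:
    """SHA-256 of the process's executable image, one of the key telemetry fields."""
    return hashlib.sha256(executable_bytes).hexdigest()

def record_process(pid: int, args: list[str], executable_bytes: bytes) -> dict:
    """Capture the fields named above: time, pid, arguments, and checksum."""
    return {
        "time": time.time(),
        "pid": pid,
        "args": args,
        "checksum": checksum(executable_bytes),
    }

# Hypothetical baseline vs. a later observation of "the same" process image.
baseline = record_process(4242, ["/usr/bin/java", "-jar", "app.jar"], b"original binary")
observed = record_process(4242, ["/usr/bin/java", "-jar", "app.jar"], b"tampered binary")

# Same pid and arguments, different checksum: a candidate IoC worth alerting on.
print(baseline["checksum"] != observed["checksum"])  # True
```

In production this record would come from the endpoint agent or OS (e.g., hashing the file behind the process's executable path) rather than from in-memory bytes, but the comparison logic is the same.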
Modern business applications are distributed across the enterprise, its data centers, and its cloud instances. Their network traffic flows across these environments, between APIs and Internet of Things platforms, and increasingly through containers and microservices, so network metrics without context are useless. An in-depth understanding of both interprocess and interapplication network traffic is essential to build the context needed to distinguish what's normal from what's not.
For example, many of the most recent high-profile breaches with exfiltration have exploited the compromise of privileged, non-human, service account credentials. One way to detect this type of attack is to graph network traffic to and from an application or business service as well as its underlying process activity — namely, its pid, executed commands, and arguments, and even resulting changes in its network connections. Access to this time-series data with real-time alerting for leading and trailing forensics can help ensure infosec teams aren't late to the party.
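One way to operationalize this, sketched below with a hypothetical service-account name and endpoints, is to baseline the destinations each privileged account connects to and alert when a connection to an unseen destination appears:

```python
from collections import defaultdict

class ConnectionBaseline:
    """Tracks which (host, port) destinations each service account has been seen
    talking to; an unseen destination is a candidate leading indicator."""

    def __init__(self):
        self.seen = defaultdict(set)

    def observe(self, account: str, dest: tuple[str, int]) -> bool:
        """Record a connection; return True if the destination is new for this account."""
        is_new = dest not in self.seen[account]
        self.seen[account].add(dest)
        return is_new

baseline = ConnectionBaseline()
# Learning phase: the payments service normally talks only to its database.
baseline.observe("svc_payments", ("10.0.0.5", 5432))
baseline.observe("svc_payments", ("10.0.0.5", 5432))

# Later: the same credentials open a connection to an external host.
alert = baseline.observe("svc_payments", ("203.0.113.9", 443))
print(alert)  # True: unseen destination for a privileged service account
```

A real deployment would feed this from flow or connection telemetry, age out stale entries, and correlate the alert with the process-level data (pid, commands, arguments) described above.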
Observing process data alone is insufficient, whether the software is commercial off-the-shelf (COTS) or open source (OSS). It's also necessary to verify and validate that no Web application archive (WAR) files, binaries, secrets, or configurations have been compromised. This may be accomplished via file system scans and focused tests against manifests or checksums.
However, a better practice is to build this capability into the toolchain. An emerging threat carries the risk of polluting build systems — perhaps by an insider or maybe through an upstream dependency. Putting measures in place to monitor the software that is built and deployed is critical. An unfortunate example of this would be a compromise of a library that Ruby's strong password library depends on. At the very least, organizations should measure ownership, permissions, and checksums and build assurance into the toolchain to yield metrics during test, release, and operation.
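A focused checksum test against a build-time manifest can be sketched as follows (the file name, directory, and contents are illustrative):

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_file(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def verify_manifest(manifest: dict[str, str], root: Path) -> list[str]:
    """Compare deployed files against the checksums recorded at build time;
    return the relative paths that are missing or modified."""
    drift = []
    for rel_path, expected in manifest.items():
        deployed = root / rel_path
        if not deployed.exists() or sha256_file(deployed) != expected:
            drift.append(rel_path)
    return drift

# Hypothetical deployment: record an artifact in the manifest, then tamper with it.
root = Path(tempfile.mkdtemp())
(root / "app.war").write_bytes(b"built artifact")
manifest = {"app.war": sha256_file(root / "app.war")}

(root / "app.war").write_bytes(b"tampered artifact")
print(verify_manifest(manifest, root))  # ['app.war']
```

Run at build time to produce the manifest and again during test, release, and operation, this yields exactly the kind of toolchain-integrated assurance metrics described above.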
Many organizations already implement controls on users and may even be using user and entity behavior analytics (UEBA). However, many accounts don't exist in the directory and are not part of user management processes. System and service accounts litter systems, yet they are the backbone of the infrastructure servicing Web, application, and RDBMS servers. These are the accounts that must be observed with context, to avoid taking action against an account on which business services depend.
For example, while the Apache user should never log in, the Oracle user should be allowed to, but only by a small, whitelisted set of users and from predetermined, controlled locations.
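That policy can be expressed directly; the sketch below uses hypothetical account names and IP addresses, and in practice would be evaluated against real authentication events:

```python
# Hypothetical policy: which service accounts may log in interactively, and from where.
POLICY = {
    "apache": {"may_login": False, "allowed_sources": set()},
    "oracle": {"may_login": True, "allowed_sources": {"10.0.0.10", "10.0.0.11"}},
}

def check_login(account: str, source_ip: str) -> str:
    """Evaluate a single login event against the policy."""
    rule = POLICY.get(account)
    if rule is None:
        return "alert: unknown account"
    if not rule["may_login"]:
        return "alert: interactive login by a non-interactive service account"
    if source_ip not in rule["allowed_sources"]:
        return "alert: login from an unapproved source"
    return "ok"

print(check_login("apache", "10.0.0.10"))    # the Apache user should never log in
print(check_login("oracle", "198.51.100.7"))  # right account, wrong location
print(check_login("oracle", "10.0.0.10"))     # permitted account and source
```

The context matters here: the response to a violation should be an alert and investigation, not an automatic lockout that takes down the business service depending on the account.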
Infosec and attacker philosophies are orthogonal — while the endpoint is the unit of measure for IT, the attacker's currency is the target's data. And attackers look to exploit any weakness in the network, UI, APIs, or operating systems to get to it. To realign, organizations must observe an entire service and how services interact.
To illustrate, should the fact that the CPU spiked on an application server or, conversely, that it dropped, be investigated? The only way to make that determination is by knowing the context. Was there a planned change, or a new product or geography launched? Is a campaign being run? A macro-level view is the only way to avoid false positives. Leveraging information from service delivery automation helps make organizations more change-aware, by adding more depth to the context of events.
Let's cement this with another example. In preparation for a product launch, an organization pre-provisions four times more capacity in its Web and application server tier and redistributes the load using HAProxy. No change information reaches the security tooling beforehand, so it is unaware of the additional capacity. On launch day there's a huge increase in volume on the database servers, which, absent that context, might indicate that perimeter defenses have been breached and confidential customer data accessed.
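Feeding change windows from service-delivery automation into the alerting pipeline resolves this; a minimal sketch (the dates, window length, and class name are assumptions for illustration):

```python
from datetime import datetime, timedelta

class ChangeAwareAlerter:
    """Suppresses anomaly alerts that fall inside a registered change window."""

    def __init__(self):
        self.windows: list[tuple[datetime, datetime]] = []

    def register_change(self, start: datetime, duration: timedelta) -> None:
        """Fed from service-delivery automation, e.g., the pre-launch HAProxy change."""
        self.windows.append((start, start + duration))

    def should_alert(self, anomaly_time: datetime) -> bool:
        return not any(start <= anomaly_time <= end for start, end in self.windows)

alerter = ChangeAwareAlerter()
launch = datetime(2024, 6, 1, 9, 0)
alerter.register_change(launch, timedelta(hours=12))

# The launch-day database volume spike falls inside a known change window: no alert.
print(alerter.should_alert(datetime(2024, 6, 1, 10, 30)))  # False
# The same spike a week later, with no registered change, warrants investigation.
print(alerter.should_alert(datetime(2024, 6, 8, 10, 30)))  # True
```

The design choice here is suppression by context rather than by threshold: the anomaly is still recorded, but the planned change explains it away instead of generating a false positive.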
Bringing It Together
Taken in isolation, each of the metrics discussed loses its value without context. The full picture of the relationships between apps and their underlying hardware platforms, operating systems, network connections, performance, processes, and the identities and times of usage is required to detect threats as they unfold.