Attackers who use standard system commands during a compromise — a technique known as living off the land (LotL) — to avoid detection by defenders and endpoint security software may find their activities in the spotlight if a machine learning project open sourced by software firm Adobe this week bears fruit.
The project, dubbed LotL Classifier, uses supervised learning and an open source dataset of real-world attacks to extract features from specific commands, then classifies each command using a model built on features that mirror how human analysts assess a command line. Those features are used to determine whether the command is good or bad and to label the command with a set of tags that can be used for anomaly detection.
Each feature by itself (such as access to the /etc/shadow file, where password hashes are typically stored, or access to Pastebin) may seem suspicious but usually is not malicious, says Andrei Cotaie, technical lead for security intelligence and engineering at Adobe.
"On their own, most of the tags — or tag types — have a high FP [false positive] rate, but combining them and feeding this combination through the machine learning algorithm can generate a higher rate of accuracy in the classifier," he says, adding that Adobe has benefited from the machine learning model. "The LotL Classifier is operational in our environment and based on our experience, by suppressing reoccurring alerts, the LotL Classifier generates a few alerts per day."
Living off the land has become a widely used tactic for attackers targeting enterprises. Attacks are now just as likely to begin with a PowerShell command or a Windows Script Host command, two common administrative tools that can escape notice, as with a more traditional malware executable. In 2019, CrowdStrike's incident response group found that "malware-free" attacks, another name for LotL, surpassed malware-based incidents. By the summer of 2021, they accounted for more than two-thirds of investigated incidents.
"Attackers are increasingly attempting to accomplish their objectives without writing malware to the endpoint, using legitimate credentials and built-in tools (living off the land) — which are deliberate efforts to evade detection by traditional antivirus products," CrowdStrike stated in its "2021 Threat Hunting Report."
The LotL Classifier uses a supervised machine learning approach to extract features from a dataset of command lines and then creates decision trees that match those features to the human-determined conclusions. The dataset combines "bad" samples drawn from open source data, such as industry threat intel reports, with "good" samples from Hubble, an open source security compliance framework, and from Adobe's own endpoint detection and response tools.
The feature extraction process generates tags focused on binaries, keywords, command patterns, directory paths, network information, and the similarity of the command to known patterns of attack. Examples of suspicious tags might include a system-command execution path, a Python command, or instructions that attempt to spawn a terminal shell.
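As a rough illustration of what tag-based feature extraction can look like, the Python sketch below matches a command line against a handful of regular expressions and returns the tags that fire. The tag names, the patterns, and the extract_tags function are hypothetical and chosen only for this example; they are not taken from Adobe's rule set.

```python
import re
from typing import List

# Hypothetical tag rules for illustration only; not the LotL Classifier's real rules.
TAG_RULES = {
    "PATH_ETC_SHADOW": re.compile(r"/etc/shadow\b"),
    "KEYWORD_PASTEBIN": re.compile(r"pastebin\.com", re.IGNORECASE),
    "BINARY_PYTHON": re.compile(r"(?:^|[\s/;|&])python[0-9.]*(?=\s|$)"),
    "SPAWN_SHELL": re.compile(r"pty\.spawn|/bin/(?:ba)?sh\s+-i"),
    "NETWORK_IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def extract_tags(command: str) -> List[str]:
    """Return the sorted names of every rule whose pattern appears in the command."""
    return sorted(name for name, pattern in TAG_RULES.items() if pattern.search(command))


if __name__ == "__main__":
    for cmd in [
        "ls -la /tmp",
        "cat /etc/shadow",
        "python3 -c 'import pty; pty.spawn(\"/bin/bash\")'",
    ]:
        print(cmd, "->", extract_tags(cmd))
```

A command such as "cat /etc/shadow" picks up a single tag here, which on its own says little about intent. That matches Cotaie's point that individual tags carry a high false-positive rate.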
"The feature extraction process is inspired by human experts and analysts: When analyzing a command line, people/humans rely on certain cues, such as what binaries are being used and what paths are accessed," Adobe stated in its blog post. "Then they quickly browse through the parameters and, if present in the command, they look at domain names, IP addresses, and port numbers."
Using those tags, the LotL Classifier applies a random forest model, which combines several decision trees, to determine whether the command is malicious or legitimate.
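The way several trees combine into one verdict can be sketched in a few lines of Python. The example below shows only the voting step: the three "trees" are hand-written stand-ins, whereas a real random forest learns each tree from a random sample of labeled commands and features. The tag names, the tree logic, and the classify function are all assumptions made for illustration and do not reflect Adobe's trained model.

```python
from typing import Callable, FrozenSet, Iterable, List

# A tree takes the set of tags on a command and returns True to vote "bad".
Tree = Callable[[FrozenSet[str]], bool]


# Hand-written stand-ins for learned trees; tag names are hypothetical.
def tree_shell(tags: FrozenSet[str]) -> bool:
    if "SPAWN_SHELL" in tags:
        return "BINARY_PYTHON" in tags or "NETWORK_IP" in tags
    return False


def tree_exfil(tags: FrozenSet[str]) -> bool:
    if "KEYWORD_PASTEBIN" in tags:
        return "PATH_ETC_SHADOW" in tags
    return False


def tree_network(tags: FrozenSet[str]) -> bool:
    if "NETWORK_IP" in tags:
        return "SPAWN_SHELL" in tags or "PATH_ETC_SHADOW" in tags
    return False


FOREST: List[Tree] = [tree_shell, tree_exfil, tree_network]


def classify(tags: Iterable[str], forest: List[Tree] = FOREST) -> str:
    """Label a command "bad" only when a strict majority of trees vote that way."""
    tag_set = frozenset(tags)
    votes = sum(1 for tree in forest if tree(tag_set))
    return "bad" if votes * 2 > len(forest) else "good"


if __name__ == "__main__":
    print(classify({"PATH_ETC_SHADOW"}))
    print(classify({"SPAWN_SHELL", "NETWORK_IP"}))
```

In this toy forest, no single tag is enough to win the vote; a command is labeled "bad" only when a combination of tags convinces most of the trees.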
"Interestingly, these stealthy moves are exactly why it's often very difficult to determine which of these actions are a valid system administrator and which are an attacker," the company stated in a blog post.
The machine learning model can benefit companies in a variety of threat-analysis pipelines, says Adobe's Cotaie. Threat hunters could run it as a local service, or the model could process global security information and event management (SIEM) data to find anomalies by feeding its output into another open source tool released by Adobe, the One-Stop Anomaly Shop (OSAS). The model has one component for Windows systems and a separate one for Linux, but it is otherwise context independent.
"The classifier is integrated into ... One Stop Anomaly Shop (OSAS)," he says. "The parent project can model local or group system behavior using many context-dependent features and its anomaly detection features are complementary to the LotL classifier model."