To understand how the bad guys have become so adept at producing the flood of uniquely hashed malware, we need to look at what our adversaries have been doing the past few years.
Why go back in history? Because software takes years to spread through society according to an "adoption curve." Despite its unconventional path from programmer to user, malware follows this same multi-year curve before it pops up on our radar. Take today’s ransomware headlines, Mario Vuksan, CEO of ReversingLabs points out, "Ransomware has been around for a long time, and it's just exploded the last two years."
No Magic in Building Zero Days
A black hat programmer in possession of malware's source code always has the option to make slight alterations and build new binaries with unique hash values. The variants created through custom builds are referred to as part of a malware family, because they come from common source code. Many times cybercriminals adept at programming make their living selling these builds in online crime markets.
To really see the decline of file hashing, we need to step back in time to look at tools that have lowered the bar for those lacking source code and programming savvy to create polymorphic malware. A simple example would be packing tools.
Packers allow the insertion of malware into existing binaries, creating a distinct executable with a unique hash that runs malicious code. Anyone who can run a command line utility can pack executables even without owning any source code.
New Malware "Families" Produce Unique Children, Lots of Children
Possibly the most obvious trend leading to the proliferation of zero-day binaries are those crimeware technologies which come with simple user consoles, and include builder functionality that create unique binaries at the click of a button.
Our industry loves to come up with creative names for malware categories. Remote Access Trojans (RATs), or C2 Trojans (Command and Control Trojan) as they're more commonly called now, caused a lot of trouble for government agencies in 2014 and 2015. The PlugX RAT, for example, lead to the historic theft of 18 million classified identities from OPM. To give you a little feel for the C2 Trojan adoption curve, PlugX was first discovered six years prior, in 2008.
While PlugX's UI is Chinese, the Gh0st RAT console pictured below is another Trojan which caused havoc. It has a UI remarkably similar to PlugX, except in English. Gh0st includes everything a novice needs to own their enemy including a "Create" button that produces unique Trojan files in about a second. Using this console, it's actually impossible to create a Trojan binary with a known hash; building zero days is the standard workflow within the UI.
Why We Should Identify Malware Families
In days past an analyst could look through threat intel to see overlapping intelligence where a given hacking crew hit their organization and other victims using the same malware hashes. Today, how do you track your malware sample back to a crew of bad actors who work off a common code base, or use common builders if they use uniquely hashed malware against all their victims? With all the zero-day malware, URLs and network communications are probably better used for attribution.
Malware reverse-engineers can manually deconstruct binaries back to their source code to identify familial DNA. But while rapid hashing of binary instances have been a mainstay of malware identification, no automated method to classify familial DNA has emerged.
Recognizing Polymorphic Malware
Builds of variants may morph their file hashes with small changes. Yet since a malware family centers around source code which defines common capabilities, sections of binaries holding this functionality remain constant across all their children.
Some vendors are able to recognize malware by noticing sections of binary files implementing functionality rather than hashing the entire file. As Tomislav Pericin, chief software architect at ReversingLabs noted, polymorphic malware can’t be correlated "based on hashing all the bits of the file anymore, that's why we developed our own algorithms to say these files are functionally similar" and thus part of a malicious family.
We're seeing examples of companies innovating new ways to detect polymorphic variants with partial hashing algorithms. Maybe in the future vendors will extend these approaches to cataloging families for threat intelligence, and as aides to attribution.
It won't happen overnight, this task is bigger than just the threat intelligence vendors. We’d have to see the industry as a whole move towards standardized ways to classify malware's familial DNA.
This is the second in a two-part series on the slow death of malware fingerprinting. You can click on What To Do When All Malware Is Zero-Day? to read the first installment.