Cybersecurity is built, at least in part, on fingerprinting and cataloging malware. Polymorphic malware has always existed, but the recent proliferation of do-it-yourself builders, which allow novice hackers to easily create unique crimeware, is sending ripples through the threat intelligence industry.
The primary method of identifying malware has always been file hashing. A file hash is produced by a mathematical operation that creates an effectively unique fingerprint for a file, allowing security vendors to compare a suspicious sample against known files from the past.
The weakness of the file hash is that if even a single byte changes, the hash value changes too. The ease of building "zero-day" hash variations killed the old antivirus industry, which relied too heavily on looking up hashes in signature databases. Today's detection industry has already adjusted to polymorphic malware. Instead of relying on hashes, modern detection products monitor malware behavior on the endpoint or in sandboxes, or use machine learning to inspect file contents and recognize similarities to known malware.
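To make the single-byte weakness concrete, here is a minimal sketch using SHA-256 from Python's standard `hashlib` module. The article does not name a particular hash algorithm; SHA-256 is assumed here as one common choice, and the sample bytes are a hypothetical stand-in for a real binary.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical stand-in for a malware binary; any bytes would do.
original = b"MZ\x90\x00" + b"example payload" * 4

# Flip one bit in the last byte, the kind of trivial change a builder
# could make by altering padding or appending junk.
variant = bytearray(original)
variant[-1] ^= 0x01
variant = bytes(variant)

h1 = sha256_hex(original)
h2 = sha256_hex(variant)

print("bytes differing:", sum(a != b for a, b in zip(original, variant)))
print("original:", h1)
print("variant: ", h2)
print("hashes match:", h1 == h2)
```

Although the two inputs differ in exactly one byte, the digests share no useful relationship, so a signature database keyed on the original hash will not match the variant.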
In today's detection industry, hashing is best thought of as a shortcut for finding the easy stuff or ruling out known-good files (whitelisting). It's also a data transfer shortcut: rather than moving an entire file across the network or into the cloud, one can send a small hash value and query it against a hash database.
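Both shortcuts can be sketched in a few lines of Python. This is an illustrative sketch, not any vendor's implementation: the in-memory `KNOWN_GOOD` set and the `triage` helper are hypothetical, standing in for a real whitelist database and lookup service, and SHA-256 is assumed as the hash.

```python
import hashlib
import os
import tempfile

def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file incrementally so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical whitelist of known-good digests; in practice this would be
# a vendor- or organization-maintained database, not an in-memory set.
KNOWN_GOOD = {hashlib.sha256(b"trusted installer contents").hexdigest()}

def triage(path: str) -> str:
    """Hypothetical triage step: only the 64-character digest is looked up."""
    digest = file_sha256(path)
    if digest in KNOWN_GOOD:
        return "known good: skip deeper analysis"
    return "unknown: send for behavioral or ML analysis"

with tempfile.TemporaryDirectory() as d:
    good = os.path.join(d, "good.bin")
    odd = os.path.join(d, "odd.bin")
    with open(good, "wb") as f:
        f.write(b"trusted installer contents")
    with open(odd, "wb") as f:
        f.write(b"trusted installer contents!")
    good_result = triage(good)
    odd_result = triage(odd)

print(good_result)
print(odd_result)
```

Note that a miss only means "unknown," not "malicious": the hash lookup rules out the easy cases, and everything else still goes to behavioral or machine learning analysis.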
While detection products have adjusted, file hashes are still used to categorize malware, share intelligence, and work backward to identify your adversary, a process known as attribution. Herein lies a growing problem.
Threat Intel to Know Your Enemy and Predict Behaviors
Humans are creatures of habit; they do not wake up each morning and learn an entirely new set of tools and ways of operating. Attackers fall into patterns of "Tactics, Techniques, and Procedures," or TTPs, which can be used to profile and predict their behavior. Because TTPs include the tendency of hackers to reuse malware against multiple targets, there is value in organizations comparing their suspicious samples with others across the industry.
For example, upon finding a file sample in your organization, a researcher might want to tap into threat intel to identify the type and family of the malware and learn about its behavior and capabilities. Thus, the threat intelligence workflow is often, "I have malware with this hash; who else has seen it?" But what happens when uniquely hashed malware proliferates to the point that every sample your organization sees is unique to it? This erodes the collaborative value of threat intel.
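The "who else has seen it?" lookup, and why unique hashes defeat it, can be illustrated with a toy shared-intelligence store. Everything here is hypothetical: `SIGHTINGS`, the organization names, and `who_else_has_seen` are invented for illustration and do not reflect any real threat intel platform's API.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical shared intel store: digest -> organizations that reported it.
known_sample = b"reused crimeware build"
SIGHTINGS = {sha256_hex(known_sample): {"org-a", "org-b", "org-c"}}

def who_else_has_seen(digest: str, my_org: str) -> set:
    """Return the other organizations that reported this exact digest."""
    return SIGHTINGS.get(digest, set()) - {my_org}

# A reused sample: other organizations have reported the same hash.
reused = who_else_has_seen(sha256_hex(known_sample), "org-a")
# A builder-generated variant differing by one appended byte.
unique = who_else_has_seen(sha256_hex(known_sample + b"\x00"), "org-a")

print("reused sample seen by:", sorted(reused))
print("unique variant seen by:", sorted(unique))
```

The reused sample connects `org-a` to two peers, while the near-identical variant returns nothing, which is exactly the erosion of collaborative value described above.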
It would be extreme to say the threat intelligence industry has lost its value. Intelligence also includes correlating malware behavior, as well as the URLs and IP addresses of the command and control servers that malware beacons to. Additionally, not all malware will be unique; there are cases, such as advanced persistent threats designed to sit on networks for many months, where completely unique files would draw the attention of infosec personnel.
Yet the industry is seeing a definite trend toward increasing malware uniqueness. Verizon's 2015 Data Breach Investigations Report (DBIR), commenting on malware hashes, proclaimed in capital letters that "Seventy to ninety percent OF MALWARE SAMPLES ARE UNIQUE." The following year, Verizon doubled down, stating, "We first wanted to reaffirm what we found last year regarding the uniqueness of hashes." The 2017 DBIR claims that, in the data sets it monitors, 99% of malware files are replaced by uniquely hashed binaries within 58 seconds of appearing.
The industry needs methods to classify malware, determine who is behind breaches, and decide what can be done to stop them. File hashing certainly appears to be becoming less useful for these aims. It's time to adjust our thinking.
Editor's Note: This is the first of a two-part series. Next week's installment, Why We Need To Reinvent How We Catalogue Malware, will discuss how hackers have become adept at producing uniquely hashed malware, and what can be done, if anything, to classify this new ocean of unique cyberthreats.
Prior to becoming an independent analyst, Paul Shomo was one of the engineering and product leaders behind the forensics software EnCase. In addition to his work in the digital forensics and incident response (DFIR) space, he developed code for OSes that power many of today's ...