New Model Uses 'Malicious Language Of The Internet' To Find Threats Fast

OpenDNS's new NLPRank tool may identify malicious domains before they are even put to nefarious use.

While many in the security industry are pushing for better methods of assigning attribution for cyberattacks after the damage is done, there is also a growing effort to strengthen early-stage defenses -- to stop attacks before they have a chance to do much harm. OpenDNS has introduced a new tool that fits into that second category. NLPRank is an advanced threat detection model that uses the "malicious language of the Internet" to identify suspicious domains almost as soon as they're registered.

"Only recently have we been able to prove just how valuable [NLPRank] is," says OpenDNS director of security research Andrew Hay. Now, Hay says, the threat model has proven able to sniff out attack campaigns "long before" indicators of compromise or attribution theories are publicly released.

OpenDNS security researcher Jeremiah O'Connor first got the idea for NLPRank in November, after Kaspersky Lab revealed details about the DarkHotel campaign. O'Connor realized that the DarkHotel attackers and the APT1 hacking group -- detailed by Mandiant in 2013 -- follow the same basic patterns when choosing the domain names they use in phishing campaigns.

"The way that attackers 'sell' a spear-phishing attack is by spoofing a domain so that it looks like it comes from a legitimate company," O'Connor said in an OpenDNS blog post today. "After running detailed analytics on the data from these types of campaigns, I found that these domain names were predictable."

Using DarkHotel and APT1 as test cases, O'Connor found that some common English words show up a lot -- the calls to action that phishers favor, like "update," "install," and "download." Coupled with those are other common parts of legitimate domain names, like "java," "gmail," or "adobe." NLPRank begins by cross-referencing domain names against the terms the researchers have identified as part of the "malicious language of the Internet" lexicon.
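
As a rough illustration of that first step, the sketch below flags a hostname only when it pairs a call-to-action word with a brand-like word. It is not OpenDNS's code: the two word sets contain just the six examples quoted above, the function names are hypothetical, and plain substring matching is an assumed stand-in for whatever matching NLPRank actually performs.

```python
# Assumed mini-lexicon: only the six example terms quoted in the article.
ACTION_WORDS = {"update", "install", "download"}
BRAND_WORDS = {"java", "gmail", "adobe"}


def lexicon_hits(domain):
    """Return (action_terms, brand_terms) found in the hostname.

    The last label (for example "com") is dropped before matching.
    Multi-part suffixes such as "co.uk" are not handled in this sketch.
    """
    parts = domain.lower().rstrip(".").split(".")
    name = ".".join(parts[:-1]) or parts[0]
    actions = sorted(w for w in ACTION_WORDS if w in name)
    brands = sorted(w for w in BRAND_WORDS if w in name)
    return actions, brands


def is_candidate(domain):
    """Flag a hostname only when it combines an action word with a brand word."""
    actions, brands = lexicon_hits(domain)
    return bool(actions) and bool(brands)


if __name__ == "__main__":
    for d in ("installadobe.com", "adobe.com", "update-java.net"):
        print(d, is_candidate(d))
```

Requiring both kinds of term is a design choice made for this sketch; it keeps a legitimate name like "adobe.com" from being flagged on the brand word alone.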

Then, as the blog explains, NLPRank "uses alignment techniques from computational biology to grade permutations of these domain names, like 'installad0be,' and then judge the likelihood they will be used in spear phishing."
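
The blog does not say which alignment technique is used. One well-known candidate from computational biology is Smith-Waterman local alignment, which finds the best-matching stretch between two sequences while tolerating substitutions and gaps. The sketch below applies it to a lexicon term and a domain label; the scoring values (match 2, mismatch -1, gap -1) and the function names are assumptions chosen for illustration.

```python
MATCH = 2       # assumed reward for identical characters
MISMATCH = -1   # assumed penalty for a substituted character
GAP = -1        # assumed penalty for an inserted or deleted character


def local_alignment_score(a, b):
    """Smith-Waterman score of the best local alignment between a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            curr[j] = max(
                0,                  # start a fresh alignment here
                prev[j - 1] + step, # align a[i-1] with b[j-1]
                prev[j] + GAP,      # skip a character of a
                curr[j - 1] + GAP,  # skip a character of b
            )
            best = max(best, curr[j])
        prev = curr
    return best


def similarity(term, label):
    """Alignment score divided by the score of a perfect match of term."""
    return local_alignment_score(term, label) / (MATCH * len(term))


if __name__ == "__main__":
    print(similarity("adobe", "installad0be"))
```

With these values, "adobe" aligned against "installad0be" scores 7 out of a possible 10: four matching characters and one substitution where "0" stands in for "o". A threshold on that ratio would decide which permutations count as likely spoofs.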

When one of those likely domain names shows up in OpenDNS's scans of DNS records, NLPRank checks additional information to see whether the suspicion is well-founded. For example, it checks the WHOIS information to see whether the registrar used by the suspect domain is the same as the parent company's. It also checks the HTML to see whether outbound links are heading to suspicious locations and to look for other subtle differences.
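
The HTML check could look something like the following sketch, which collects the hosts that a page's links point to and reports any that fall outside an expected set. The article does not define what makes a location "suspicious," so the rule here (any host that is not under an expected domain) is an assumption, as are the class and function names.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the hostnames of absolute links found in <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.hosts = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                host = urlparse(value).hostname  # None for relative links
                if host:
                    self.hosts.append(host.lower())


def unexpected_link_hosts(html, expected_domains):
    """Return sorted link hosts that are not under any expected domain."""
    collector = LinkCollector()
    collector.feed(html)
    return sorted({
        h for h in collector.hosts
        if not any(h == d or h.endswith("." + d) for d in expected_domains)
    })


if __name__ == "__main__":
    page = (
        '<a href="https://www.adobe.com/help">Help</a>'
        '<a href="http://evil.example/get">Download</a>'
        '<a href="/local">Home</a>'
    )
    print(unexpected_link_hosts(page, {"adobe.com"}))
```

Relative links are ignored because they stay on the page's own host. The WHOIS comparison is left out because it requires a live lookup.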

Hay explains that when attackers first register a domain, they usually test it several times before putting it to any meaningful malicious use. "We can see that initial bump in the wire," says Hay, "and then we can block it before it becomes a full-blown campaign."

They've already started applying NLPRank to that task. Last month they used it to discover a PayPal phishing campaign. Plus, when they had a look at Kaspersky Lab's data on the impressively sophisticated Carbanak banking crime ring, they saw that NLPRank had flagged the Carbanak command-and-control domains weeks earlier. 

Hay says they're "not quite ready" to put NLPRank into production. They want to wait until further testing proves the tool doesn't produce too many false positives. After that, they may consider expanding the lexicon to include terms that are popular in the most heavily targeted industries (like finance). However, Hay says they couldn't go too crazy with that effort lest they turn the whole exercise into "looking for a needle in a pile of needles."

OpenDNS has expanded its research team recently. Hay says to expect more innovations like NLPRank to come soon. 
