Supranamaya "Soups" Ranjan, a research scientist, says he and a team of colleagues have developed a prototype method for detecting botnets like Conficker, Kraken, and Torpig that use so-called DNS domain-fluxing for their command-and-control (C&C) infrastructure. Domain-fluxing relies on a domain generation algorithm (DGA) that produces pseudo-random domain names: each bot queries a long series of these names, but the botnet operator registers just one of them. To get to the C&C, botnet researchers typically reverse-engineer the bot malware to work out which domains it generates on a regular basis. That is a time- and resource-intensive process: the aim is to discern every domain name the botnet could use so that researchers can jump ahead, register those names themselves, and gain a foothold in their investigation.
"Botnets such as Kraken, Conficker, and Torpig came up with domain fast-flux, where even the domain name that each bot queries for is randomly generated," Ranjan says. "Each bot queries for tens of thousands of domain names hoping that the botnet operator has registered for at least one of them via DNS. Now consider security vendors, who in this situation have no way of predicting which DNS queries are related to a botnet."
Ranjan, who is with Narus Inc., and Sandeep Yadav, Ashwath Reddy, and A.L. Narasimha Reddy, all with Texas A&M, created a method for analyzing all DNS traffic in real time for domain-flux activity. The researchers presented their findings this week at the ACM Internet Measurement Conference in Melbourne. Their method looks at the pattern and distribution of alphabetic characters in a domain name to determine whether it is malicious or legitimate, which lets them spot domain names generated by an algorithm rather than chosen by humans. The bottom line: because most readable domain names are already taken, botnet operators have to settle for gibberish-looking names like the ones Conficker's bots generate, such as joftvvtvmx.org, gcvwknnxz.biz, and vddxnvzqjks.ws.
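To make the character-distribution idea concrete, here is a minimal sketch, not the researchers' actual code. It assumes that a group of domain names can be scored by comparing its letter distribution against a reference distribution built from known-legitimate names, using a symmetric Kullback-Leibler divergence. The function names, the add-one smoothing, and the example reference list are illustrative assumptions.

```python
import math
from collections import Counter
from string import ascii_lowercase


def second_level_label(domain):
    """Return the label just left of the TLD, e.g. 'gcvwknnxz' for 'gcvwknnxz.biz'."""
    parts = domain.lower().strip(".").split(".")
    return parts[-2] if len(parts) >= 2 else parts[0]


def char_distribution(domains, alpha=1.0):
    """Add-alpha smoothed frequency of each letter a-z across a group of domain labels."""
    counts = Counter()
    for domain in domains:
        counts.update(c for c in second_level_label(domain) if c in ascii_lowercase)
    total = sum(counts.values()) + alpha * len(ascii_lowercase)
    return {c: (counts[c] + alpha) / total for c in ascii_lowercase}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats; smoothing keeps every q[c] > 0."""
    return sum(p[c] * math.log(p[c] / q[c]) for c in ascii_lowercase)


def symmetric_kl(p, q):
    """Average of both KL directions, so the score does not depend on argument order."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
```

Scoring groups of names rather than single names is a deliberate choice in this sketch: a handful of letters from one short domain says little, while dozens of names queried by the same host give a more stable distribution. Any threshold separating "human-chosen" from "generated" would have to be learned from labeled traffic.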
Domain-fluxing makes the botnet researcher's job of tracking botnets even more difficult. "This [domain-fluxing] is obviously a defensive headache for us, but for the attacker it exposes possible future rally points that the good guys can block," says Jose Nazario, senior security researcher at Arbor Networks. "We expect this trend to continue, so the work [here] makes sense: speed up the identification of these in the malcode analysis steps or from packet traces, making analysis more efficient."
Conficker-A, for example, generated about 250 different domains every three hours, using the current UTC date and time as its seed, according to Ranjan and his team. Conficker's creators upped the ante with Conficker-C, which generates 50,000 domain names per bot to make it harder for researchers to preregister them, they said in their report.
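The sketch below shows why a date-seeded generator lets bots and their operator agree on rendezvous domains without communicating. It is a toy DGA, not Conficker's actual algorithm: the SHA-256 seeding, the 8-to-11-letter labels, and the TLD list are all hypothetical choices made for illustration. It borrows only the "about 250 domains per three-hour window, seeded by UTC time" shape described above.

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical TLD list for illustration; not taken from any real malware.
TLDS = ["com", "net", "org", "info", "biz", "ws"]


def generate_domains(when, count=250, window_hours=3):
    """Toy DGA: derive `count` pseudo-random domains from the current UTC time window."""
    when = when.astimezone(timezone.utc)
    # Every bot inside the same window computes the same seed, so all of them
    # (and the operator) end up with the same candidate list.
    window = when.replace(hour=when.hour - when.hour % window_hours,
                          minute=0, second=0, microsecond=0)
    seed = window.strftime("%Y-%m-%d %H").encode()
    domains = []
    for i in range(count):
        digest = hashlib.sha256(seed + i.to_bytes(4, "big")).digest()
        length = 8 + digest[0] % 4  # label of 8 to 11 letters
        label = "".join(chr(ord("a") + b % 26) for b in digest[1:1 + length])
        tld = TLDS[digest[-1] % len(TLDS)]
        domains.append(f"{label}.{tld}")
    return domains
```

The operator runs the same function ahead of time and registers one of the names. A researcher who has recovered the algorithm can do the same for a future date and preregister the names first, which is the reverse-engineering race described above; raising `count` to tens of thousands makes that preregistration far more expensive.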
But this isn't the only method for tracing these stealthy botnets. Gunter Ollmann, vice president of research at Damballa, says a dynamic reputation system developed by researchers at Georgia Tech works well. "This is probably the most advanced assisted machine-learning approach to the problem," he says. The system doesn't need copies of the malware to detect botnets that use domain-fluxing, he says.
Damballa also uses another technique, so-called NX Domain analysis, which has been in use since 2009, Ollmann says. When a bot queries a generated domain that doesn't exist, the TLD name server returns an NX Domain (NXDOMAIN) response, meaning the domain doesn't exist. "It's relatively simple to detect at the network level the fluxing attempts by the malware to locate these dynamically generated domains, and to also see the number/heuristics of the NX Domain responses from the DNS servers," Ollmann says. "Simple machine-learning algorithms are trained using known data sets for an assortment of malware samples, and the system then automatically detects new, known or suspicious malware infections. The clustering algorithms automatically identify the malware family."
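A minimal sketch of the counting step behind NX Domain analysis might look like this. It is not Damballa's system: the `DnsResponse` record, the thresholds `min_nx` and `min_ratio`, and the flagging rule are hypothetical. It flags hosts whose lookups are dominated by NXDOMAIN answers for many distinct names, which is the signature a bot produces while it walks its generated list.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class DnsResponse:
    client: str  # IP of the host that sent the query
    qname: str   # domain name that was queried
    rcode: str   # response code, e.g. "NOERROR", "NXDOMAIN", "SERVFAIL"


def nxdomain_suspects(responses, min_nx=20, min_ratio=0.5):
    """Map each suspicious client to its count of distinct NXDOMAIN names (thresholds are hypothetical)."""
    total = Counter()
    nx_count = Counter()
    nx_names = defaultdict(set)
    for r in responses:
        total[r.client] += 1
        if r.rcode == "NXDOMAIN":
            nx_count[r.client] += 1
            nx_names[r.client].add(r.qname.lower())
    return {
        client: len(nx_names[client])
        for client in nx_count
        if len(nx_names[client]) >= min_nx
        and nx_count[client] / total[client] >= min_ratio
    }
```

Counting distinct failing names, not just failures, keeps a host that retries one mistyped name from looking like a bot. In practice the thresholds would be tuned against traffic from known-clean and known-infected hosts.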
Narus' Ranjan says NX Domain analysis is limited in that it can only find DNS anomalies, such as too many DNS queries returning failure messages. "So they may be used as a first signal for detecting domain flux. Our methodology goes one step further and we can distinguish between cases of legitimate queries that are returning failure responses -- due to network failures -- versus domain flux queries," he says.
He says his method also differs from Georgia Tech's in that it uses more detailed statistics about the domain names.
But the next big thing, he says, is botnets using both IP fast-flux and domain fast-flux, something his team has already spotted in the wild. IP fast-flux is a round-robin technique in which infected bots serve as proxies or hosts for malicious websites; the bots are constantly rotated in and out of the domain's DNS records to avoid discovery by researchers, ISPs, or law enforcement. Ranjan says his team's new detection method also works for detecting IP fast-flux.
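IP fast-flux leaves a different footprint: one domain resolving to many rotating addresses with short TTLs. The sketch below is a simple heuristic of that kind, not the detection method Ranjan's team describes. The `(domain, ip, ttl)` input format and the thresholds `min_ips` and `max_median_ttl` are hypothetical.

```python
from collections import defaultdict
from statistics import median


def fast_flux_candidates(observations, min_ips=10, max_median_ttl=300):
    """Return domains seen with many distinct A-record IPs and low TTLs.

    observations: iterable of (domain, ip, ttl_seconds) tuples taken from DNS answers.
    Thresholds are illustrative assumptions, not published values.
    """
    ips = defaultdict(set)
    ttls = defaultdict(list)
    for domain, ip, ttl in observations:
        ips[domain].add(ip)
        ttls[domain].append(ttl)
    return sorted(
        d for d in ips
        if len(ips[d]) >= min_ips and median(ttls[d]) <= max_median_ttl
    )
```

Content delivery networks also use many addresses and short TTLs, so a real detector would need extra features, such as how many unrelated networks the addresses come from, to avoid flagging them.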
While running the prototype against live traffic, the researchers found some new botnet behavior from a botnet they've christened "Storm2.0." "The domain names mapping to the C&C server IP address are composed of two words from the English language. A similar behavior is observed with the [original] Storm botnet where the domain names were composed of one English language word and a randomized string," Ranjan says.
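A name built from two English words defeats a pure "gibberish" test, so a detector has to recognize that pattern separately. As a minimal sketch, assuming a dictionary word set is available, the check below reports whether a domain's second-level label splits cleanly into exactly two dictionary words. The word list and the example domains in the usage are made up for illustration; they are not actual Storm2.0 domains.

```python
def split_two_words(label, words):
    """Return (first, second) if `label` is exactly two dictionary words, else None."""
    label = label.lower()
    for i in range(1, len(label)):
        if label[:i] in words and label[i:] in words:
            return label[:i], label[i:]
    return None


def two_word_domains(domains, words):
    """Map each domain whose second-level label is two concatenated words to that split."""
    hits = {}
    for domain in domains:
        parts = domain.lower().rstrip(".").split(".")
        if len(parts) < 2:
            continue
        split = split_two_words(parts[-2], words)
        if split:
            hits[domain] = split
    return hits


# Hypothetical word list and domains, for illustration only.
WORDS = {"blue", "stone", "green", "river"}
print(two_word_domains(["bluestone.ru", "greenriver.ru", "vddxnvzqjks.ws"], WORDS))
```

Many legitimate sites are also two joined words, so this is best treated as one feature among several (for example, combined with many such names resolving to the same addresses) rather than as a classifier on its own.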
All of the domain names for Storm2.0 are in the ".ru" top-level domain, he says. "In fact, we observed several IP addresses for Storm2.0, once again highlighting that bots have begun using a combination of domain- as well as IP-fluxing," he says.
Ranjan says organizations need to incorporate this type of analysis in order to fight botnets. "A system such as ours should be the first alarm that goes off whenever a new domain fast-flux botnet becomes active. After that an organization can take steps to capture the traffic corresponding to the IP addresses suspected to harbor such bots and examine them further to develop signatures," he says. "But not the other way around, where previously researchers had to scramble to discover the exact algorithm used by Conficker and only then did they register all the domain names that Conficker was going to query for."
Full technical details on the research are available here.