Bot detection over IP networks isn't easy, but it's becoming a fundamental part of network security practice.

Avidan Avraham, Security Researcher at Cato Networks

November 22, 2019


With the rise of security breaches using malware, ransomware, and other remote access hacking tools, identifying malicious bots operating on your network has become an essential part of protecting your organization. Bots are often the source of malware, which makes identifying and removing them critical.

But that's easier said than done. Every operating environment has its share of "good" bots, such as software updaters, that are important for good operation. Distinguishing between malicious bots and good bots is challenging. No one variable provides for easy bot classification. Open source feeds and community rules purporting to identify bots are of little help; they contain far too many false positives. In the end, security analysts wind up fighting alert fatigue from analyzing and chasing down all of the irrelevant security alerts triggered by good bots.

At Cato, we faced a similar problem in protecting our customers' networks. To solve the problem, we developed a new, multidimensional approach that identifies 72% more malicious incidents than would have been possible using open source feeds or community rules alone. Best of all, you can implement a similar strategy on your network.

Your tools will be the stock-in-trade of any network engineer: access to your network, a way to capture traffic (such as a tap sensor), and enough disk space to store a week's worth of packets. The idea is to gradually narrow the field from sessions generated by people to those sessions likely to indicate a risk to your network. You'll need to:

  • Separate bots from people

  • Distinguish between browsers and other clients

  • Distinguish between bots within browsers

  • Analyze the payload

  • Determine a target's risk

Let's dive into each of those steps.

Separate Bots from People
The first step is to distinguish between bots (good and bad) and humans. We do this by identifying those machines repeatedly communicating with a target. Statistically, the more uniform these communications, the greater the chance that they are generated by a bot.
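One way to put a number on "uniform" is the coefficient of variation (standard deviation divided by mean) of the gaps between consecutive requests from one client to one target. The sketch below is illustrative only: the function names and the `max_cv` and `min_events` thresholds are assumptions, not values from Cato's implementation, and would need tuning against your own traffic.

```python
from statistics import mean, pstdev


def interarrival_cv(timestamps):
    """Coefficient of variation of the gaps between consecutive requests.

    Returns None when there are too few requests to judge.
    """
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(gaps) < 2:
        return None
    avg = mean(gaps)
    if avg == 0:
        return 0.0  # all requests share one timestamp: perfectly uniform
    return pstdev(gaps) / avg


def looks_automated(timestamps, max_cv=0.1, min_events=5):
    """Flag a client/target pair whose request timing is nearly periodic.

    max_cv and min_events are hypothetical thresholds chosen for illustration.
    """
    cv = interarrival_cv(timestamps)
    return len(timestamps) >= min_events and cv is not None and cv <= max_cv


# Seconds since the first request, for two made-up client/target pairs.
beacon = [0, 60, 120, 181, 240, 300]   # roughly one request per minute
human = [0, 4, 95, 110, 400, 2000]     # irregular browsing

print(looks_automated(beacon), looks_automated(human))
```

A low value means the gaps are nearly identical, which is what a timer-driven client produces. Bots that add random jitter to their schedule will score higher, so this is a first filter rather than a verdict.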

Distinguish Between Browsers and Other Clients
Having isolated the bots, you then need to look at the initiating client. Typically, "good" bots exist within browsers, while "bad" bots operate outside the browser.

Operating systems have different types of clients and libraries generating traffic. For example, "Chrome," "WinInet," and "Java Runtime Environment" are all different client types. At first, client traffic may look the same, but there are some ways to distinguish between clients and enrich our context.

Start by looking at application-layer headers. Because most firewall configurations allow HTTP and TLS to any address, many bots use these protocols to communicate with their targets. You can identify bots operating outside of browsers by identifying groups of client-configured HTTP and TLS features.

Every HTTP session has a set of request headers defining the request and how the server should handle it. These headers, their order, and their values are set when composing the HTTP request. Similarly, TLS session attributes, such as cipher suites, extensions list, ALPN (Application-Layer Protocol Negotiation), and elliptic curves, are established in the initial TLS packet, the "client hello" packet, which is unencrypted. Clustering sessions by their distinct sequences of HTTP and TLS attributes will likely reveal different bots.

Doing so will, for example, let you spot TLS traffic whose cipher suites differ from those a browser offers. That is a good sign the traffic is being generated outside the browser, which is very non-humanlike behavior and hence a good indicator of bot traffic.
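A simple way to group clients by these features is to hash the ordered attribute lists into a fingerprint and bucket sessions by it. The following is a minimal sketch under stated assumptions: the session dictionaries, their field names, and the sample values are hypothetical stand-ins for whatever your capture tooling extracts, and exact-match hashing is a simplification of real clustering.

```python
import hashlib
from collections import defaultdict


def client_fingerprint(header_names, cipher_suites, extensions):
    """Hash the ordered HTTP header names and TLS client hello attributes.

    Order matters: two clients sending the same headers in a different
    order get different fingerprints.
    """
    parts = [
        ",".join(name.lower() for name in header_names),
        ",".join(str(code) for code in cipher_suites),
        ",".join(str(code) for code in extensions),
    ]
    digest = hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
    return digest[:16]


def group_by_fingerprint(sessions):
    """Map each fingerprint to the source addresses that produced it."""
    groups = defaultdict(list)
    for s in sessions:
        fp = client_fingerprint(s["headers"], s["ciphers"], s["extensions"])
        groups[fp].append(s["src"])
    return dict(groups)


# Made-up sessions: two hosts share one client stack, a third differs.
sessions = [
    {"src": "10.0.0.5", "headers": ["Host", "User-Agent", "Accept"],
     "ciphers": [4865, 4866, 49195], "extensions": [0, 10, 11, 16]},
    {"src": "10.0.0.6", "headers": ["Host", "User-Agent", "Accept"],
     "ciphers": [4865, 4866, 49195], "extensions": [0, 10, 11, 16]},
    {"src": "10.0.0.9", "headers": ["User-Agent", "Host"],
     "ciphers": [49195, 47], "extensions": [0, 10]},
]

for fp, sources in group_by_fingerprint(sessions).items():
    print(fp, sources)
```

Fingerprints seen on only a handful of machines, or ones that match no browser installed in your environment, are the ones worth a closer look.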

Distinguish Between Bots within Browsers
Another method for identifying malicious bots is to look at specific information contained in HTTP headers. Internet browsers usually present a clear, standard set of headers. In a normal browsing session, clicking a link within a browser generates a "Referer" header (the HTTP spelling of "referrer") that is included in the next request for that URL. Bot traffic will usually not have a "Referer" header or, worse, it will be forged. Bots whose headers look identical in every traffic flow are likely malicious.

User-agent is the best-known string representing the program initiating a request. Various sources, such as fingerbank.org, match user-agent values with known program versions. Using this information can help identify abnormal bots. For example, most recent browsers use the "Mozilla/5.0" string in the user-agent field. Seeing a lower version of Mozilla or its complete absence indicates an abnormal bot user-agent string. No trustworthy browser will create traffic without a user-agent value.
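The header checks described above can be expressed as a few simple rules. This is a rough sketch, not a complete detector: the function name and the reason strings are invented for illustration, and a missing referrer header (spelled `Referer` in HTTP) is only a weak signal, since typed URLs and bookmarks legitimately omit it.

```python
def header_anomalies(headers, is_first_request=False):
    """Return reasons a request's headers look unlike a modern browser's.

    headers: mapping of header name to value for one HTTP request.
    is_first_request: True when no earlier page could have linked here,
    in which case a missing Referer is not counted.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    reasons = []

    user_agent = normalized.get("user-agent", "").strip()
    if not user_agent:
        reasons.append("missing user-agent")
    elif not user_agent.startswith("Mozilla/5.0"):
        reasons.append("outdated or non-browser user-agent")

    if not is_first_request and "referer" not in normalized:
        reasons.append("missing referer")

    return reasons


# Made-up requests for illustration.
browser_like = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Referer": "https://example.com/",
}
old_client = {"User-Agent": "Mozilla/4.0 (compatible)"}

print(header_anomalies(browser_like))
print(header_anomalies(old_client))
```

Because user-agent and referrer values are trivially forged, these rules are best used to enrich context alongside the timing and fingerprint signals rather than as standalone verdicts.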

Analyze the Payload
That said, we don't want to limit our search for "bad" bots to the HTTP and TLS protocols. We have also observed known malware samples using proprietary, unknown protocols over well-known ports; such traffic can be flagged using application identification.

In addition, the traffic direction (inbound or outbound) is significant here. Devices that are connected directly to the Internet are constantly exposed to scanning operations, and therefore these bots should be considered inbound scanners. On the other hand, scanning activity going outbound indicates a device infected with a scanning bot. This could be harmful for the target being scanned and puts the organization's IP address reputation at risk.
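To act on direction, each flow first has to be labeled inbound or outbound, and outbound flows can then be checked for scan-like fan-out. The sketch below assumes your internal address space is the RFC 1918 private ranges and that a scanner is any internal host contacting at least `min_targets` distinct external addresses on one port; both are illustrative assumptions to replace with your own network definitions and thresholds.

```python
import ipaddress
from collections import defaultdict

# Assumption: internal hosts live in the RFC 1918 private ranges.
INTERNAL_NETS = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_internal(addr):
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in INTERNAL_NETS)


def flow_direction(src, dst):
    """Label a flow by where its initiator and responder sit."""
    src_in, dst_in = is_internal(src), is_internal(dst)
    if src_in and not dst_in:
        return "outbound"
    if dst_in and not src_in:
        return "inbound"
    return "internal" if src_in else "external"


def outbound_scanners(flows, min_targets=3):
    """Internal sources hitting many distinct external hosts on one port.

    flows: iterable of (src, dst, dst_port). min_targets is a hypothetical
    threshold, kept tiny here so the example data triggers it.
    """
    targets = defaultdict(set)
    for src, dst, port in flows:
        if flow_direction(src, dst) == "outbound":
            targets[(src, port)].add(dst)
    return sorted({src for (src, _), dsts in targets.items()
                   if len(dsts) >= min_targets})


# Made-up flows using documentation address ranges for external hosts.
flows = [
    ("10.0.0.5", "203.0.113.1", 445),
    ("10.0.0.5", "203.0.113.2", 445),
    ("10.0.0.5", "203.0.113.3", 445),
    ("10.0.0.8", "198.51.100.20", 443),
    ("198.51.100.9", "10.0.0.5", 22),
]

print(outbound_scanners(flows))
```

In practice the threshold would be far higher than three and evaluated over a time window, since busy clients legitimately reach many hosts on ports such as 443.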

Determine a Target's Risk
Until now, we've looked for bot indicators in the frequency of client-server communications and in the type of clients. Now, let's pull in another dimension — the destination or target. To determine malicious targets, consider two factors: target reputation and target popularity.

Target reputation calculates the likelihood of a domain being malicious based on the experience gathered from many flows. Reputation is determined either by third-party services or through self-calculation by noting whenever users report a target as malicious.

All too often, though, simple sources for determining a target's reputation, such as URL reputation feeds, are insufficient on their own. Every month, millions of new domains are registered. With so many new domains, domain reputation mechanisms lack sufficient context to categorize them properly, delivering a high rate of false positives.
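One way to compensate for thin reputation data is to weigh it against target popularity. The article does not define popularity, so the sketch below makes a loud assumption: popularity is the fraction of distinct clients on your network that contact a target, and rarely visited targets with unknown reputation deserve more scrutiny. The equal weighting, the 0 to 1 reputation scale, and the default for unknown reputation are likewise illustrative choices, not Cato's scoring.

```python
from collections import defaultdict


def target_popularity(flows):
    """Fraction of distinct clients that contacted each target.

    flows: iterable of (client, target) pairs.
    """
    clients = set()
    clients_by_target = defaultdict(set)
    for client, target in flows:
        clients.add(client)
        clients_by_target[target].add(client)
    total = len(clients)
    return {t: len(c) / total for t, c in clients_by_target.items()}


def target_risk(reputation, popularity, unknown_reputation=0.5):
    """Blend reputation (0 = clean, 1 = malicious) with rarity.

    reputation may be None when no feed knows the target, which is common
    for newly registered domains. Weights are hypothetical.
    """
    rep = unknown_reputation if reputation is None else reputation
    return round(0.5 * rep + 0.5 * (1.0 - popularity), 3)


# Made-up flows: four clients use an update server, one visits a rare domain.
flows = [
    ("a", "updates.example.com"),
    ("b", "updates.example.com"),
    ("c", "updates.example.com"),
    ("d", "updates.example.com"),
    ("a", "rare.example.net"),
]

popularity = target_popularity(flows)
print(target_risk(0.0, popularity["updates.example.com"]))
print(target_risk(None, popularity["rare.example.net"]))
```

The resulting score is only meaningful relative to other targets on the same network, and it should be combined with the bot indicators from the earlier steps before raising an alert.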

Bot detection over IP networks is not an easy task, but it's becoming a fundamental part of network security practice and malware hunting specifically. By combining the five techniques we've presented here, you can detect malicious bots more efficiently.

For detailed graphics and practical examples on applying this methodology, go here.


About the Author(s)

Avidan Avraham

Security Researcher at Cato Networks

Avidan Avraham is a Security Researcher at Cato Networks. Avidan has a strong interest in cybersecurity, from OS internals and reverse engineering to network protocol analysis and malicious traffic detection. Avidan is also a big data and machine learning enthusiast who enjoys solving complex problems related to this world. Previously, he worked at IBM Trusteer and was responsible for a vast part of the company's detection and prevention capabilities: enterprise security threats and exploits, financial malware variants, and more. Today, in Cato Research Labs, he's focusing on network-based security research and novel methods for finding threats in enterprise network environments.
