Dark Reading is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them.Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Vulnerabilities / Threats

10:00 AM
Avidan Avraham
Avidan Avraham
Connect Directly
E-Mail vvv

The 5-Step Methodology for Spotting Malicious Bot Activity on Your Network

Bot detection over IP networks isn't easy, but it's becoming a fundamental part of network security practice.

With the rise of security breaches using malware, ransomware, and other remote access hacking tools, identifying malicious bots operating on your network has become an essential component to protecting your organization. Bots are often the source of malware, which makes identifying and removing them critical.

But that's easier said than done. Every operating environment has its share of "good" bots, such as software updaters, that are important for good operation. Distinguishing between malicious bots and good bots is challenging. No one variable provides for easy bot classification. Open source feeds and community rules purporting to identify bots are of little help; they contain far too many false positives. In the end, security analysts wind up fighting alert fatigue from analyzing and chasing down all of the irrelevant security alerts triggered by good bots.

At Cato, we faced a similar problem in protecting our customers' networks. To solve the problem, we developed a new, multidimensional approach that identifies 72% more malicious incidents than would have been possible using open source feeds or community rules alone. Best of all, you can implement a similar strategy on your network.

Your tools will be the stock-and-trade of any network engineer: access to your network, a way to capture traffic, like a tap sensor, and enough disk space to store a week's worth of packets. The idea is to gradually narrow the field from sessions generated by people to those sessions likely to indicate a risk to your network. You'll need to:

  • Separate bots from people
  • Distinguish between browsers and other clients
  • Distinguish between bots within browsers
  • Analyze the payload
  • Determine a target's risk

Let's dive into each of those steps.

Separate Bots from People
The first step is to distinguish between bots (good and bad) and humans. We do this by identifying those machines repeatedly communicating with a target. Statistically, the more uniform these communications, the greater the chance that they are generated by a bot.

Distinguish Between Browsers and Other Clients
Having isolated the bots, you then need to look at the initiating client. Typically, "good" bots exist within browsers while "bad" will operate outside of the browser.

Operating systems have different types of clients and libraries generating traffic. For example, "Chrome," "WinInet," and "Java Runtime Environment" are all different client types. At first, client traffic may look the same, but there are some ways to distinguish between clients and enrich our context.

Start by looking at application-layer headers. Because most firewall configurations allow HTTP and TLS to any address, many bots use these protocols to communicate with their targets. You can identify bots operating outside of browsers by identifying groups of client-configured HTTP and TLS features.

Every HTTP session has a set of request headers defining the request and how the server should handle it. These headers, their order, and their values are set when composing the HTTP request. Similarly, TLS session attributes, such as cipher suites, extensions list, ALPN (Application-Layer Protocol Negotiation), and elliptic curves, are established in the initial TLS packet, the "client hello" packet, which is unencrypted. Clustering the different sequences of HTTP and TLS attributes will likely indicate different bots. 

Doing so, for example, will allow you to spot TLS traffic with different cipher suites. It's a good indicator that the traffic is being generated outside of the browser — a very non-humanlike approach and hence a good indicator of bot traffic.

Distinguish Between Bots within Browsers
Another method for identifying malicious bots is to look at specific information contained in HTTP headers. Internet browsers usually have a clear and standard header image. In a normal browsing session, clicking on a link within a browser will generates a "referrer" header that will be included in the next request for that URL. Bot traffic will usually not have a "referrer" header — or worse, it will be forged. Identifying bots that look the same in every traffic flow likely indicates maliciousness.

User-agent is the best-known string representing the program initiating a request. Various sources, such as fingerbank.org, match user-agent values with known program versions. Using this information can help identify abnormal bots. For example, most recent browsers use the "Mozilla 5.0" string in the user-agent field. Seeing a lower version of Mozilla or its complete absence indicates an abnormal bot user-agent string. No trustworthy browser will create traffic without a user-agent value.

Analyze the Payload
Having said that, we don't want to limit our search for "bad" bots only to the HTTP and TLS protocols. We have also observed known malware samples using proprietary unknown protocols over known ports and such could be flagged using application identification.

In addition, the traffic direction (inbound or outbound) has a significant value here. Devices that are connected directly to the Internet are constantly exposed to scanning operations and therefore these bots should be considered as inbound scanners. On the other hand, scanning activity going outbound indicates a device infected with a scanning bot. This could be harmful for the target being scanned and puts the organization IP address reputation at risk.

Determine a Target's Risk
Until now, we've looked for bot indicators in the frequency of client-server communications and in the type of clients. Now, let's pull in another dimension — the destination or target. To determine malicious targets, consider two factors: target reputation and target popularity.

Target reputation calculates the likelihood of a domain being malicious based on the experience gathered form many flows. Reputation is determined either by third-party services or through self-calculation by noting whenever users report a target as malicious.

All too often, though, simple sources for determining targets reputation, such as URL reputation feeds, alone are insufficient. Every month, millions of new domains are registered. With so many new domains, domain reputation mechanisms lack sufficient context to categorize them properly, delivering a high rate of false positives.

Bot detection over IP networks is not an easy task, but it's becoming a fundamental part of network security practice and malware hunting specifically. By combining the five techniques we've presented here, you can detect malicious bots more efficiently.

For detailed graphics and practical examples on applying this methodology, go here.

Related Content:


Check out The Edge, Dark Reading's new section for features, threat data, and in-depth perspectives. Today's top story: "What's in a WAF?"

Avidan Avraham is a Security Researcher in Cato Networks. Avidan has a strong interest in cybersecurity, from OS internals and reverse engineering, to network protocols analysis and malicious traffic detection. Avidan is also a big data and machine learning enthusiast who ... View Full Bio

Recommended Reading:

Comment  | 
Print  | 
More Insights
Newest First  |  Oldest First  |  Threaded View
COVID-19: Latest Security News & Commentary
Dark Reading Staff 10/23/2020
7 Tips for Choosing Security Metrics That Matter
Ericka Chickowski, Contributing Writer,  10/19/2020
Russian Military Officers Unmasked, Indicted for High-Profile Cyberattack Campaigns
Kelly Jackson Higgins, Executive Editor at Dark Reading,  10/19/2020
Register for Dark Reading Newsletters
White Papers
Current Issue
Special Report: Computing's New Normal
This special report examines how IT security organizations have adapted to the "new normal" of computing and what the long-term effects will be. Read it and get a unique set of perspectives on issues ranging from new threats & vulnerabilities as a result of remote working to how enterprise security strategy will be affected long term.
Flash Poll
How IT Security Organizations are Attacking the Cybersecurity Problem
How IT Security Organizations are Attacking the Cybersecurity Problem
The COVID-19 pandemic turned the world -- and enterprise computing -- on end. Here's a look at how cybersecurity teams are retrenching their defense strategies, rebuilding their teams, and selecting new technologies to stop the oncoming rise of online attacks.
Twitter Feed
Dark Reading - Bug Report
Bug Report
Enterprise Vulnerabilities
From DHS/US-CERT's National Vulnerability Database
PUBLISHED: 2020-10-23
A Cross-Site Request Forgery (CSRF) vulnerability is identified in FruityWifi through 2.4. Due to a lack of CSRF protection in page_config_adv.php, an unauthenticated attacker can lure the victim to visit his website by social engineering or another attack vector. Due to this issue, an unauthenticat...
PUBLISHED: 2020-10-23
FruityWifi through 2.4 has an unsafe Sudo configuration [(ALL : ALL) NOPASSWD: ALL]. This allows an attacker to perform a system-level (root) local privilege escalation, allowing an attacker to gain complete persistent access to the local system.
PUBLISHED: 2020-10-23
NVIDIA GeForce Experience, all versions prior to, contains a vulnerability in the ShadowPlay component which may lead to local privilege escalation, code execution, denial of service or information disclosure.
PUBLISHED: 2020-10-23
An arbitrary command execution vulnerability exists in the fopen() function of file writes of UCMS v1.4.8, where an attacker can gain access to the server.
PUBLISHED: 2020-10-23
NVIDIA GeForce Experience, all versions prior to, contains a vulnerability in NVIDIA Web Helper NodeJS Web Server in which an uncontrolled search path is used to load a node module, which may lead to code execution, denial of service, escalation of privileges, and information disclosure.