Staying Ahead of the Bot Landscape
Thinking of the bot landscape as homogeneous paints an overly simplistic picture.
By Renny Shen, Director of Product Marketing, Akamai
Akamai launched Bot Manager three years ago. Since then, the bot landscape has continued to evolve, and we've introduced a number of improvements to our bot detections to stay ahead of it.
One lesson learned from having a wide range of both basic and advanced bot detections at our disposal is that thinking of the bot landscape as homogeneous paints an overly simplistic picture. There isn't a single bot interacting with your website; there's a crowd. And the individual bots in that crowd vary widely in sophistication: in their footprint, in their behavior, and in the technology they use to try to circumvent bot detection.
Perhaps counterintuitively, simple signature-based detections remain useful, especially when they are continuously updated by threat research. These detections often identify up to half of the bots we see going to any customer's site, even on higher-value pages such as login and account creation pages. However, that isn't the whole picture. Simpler bot detections can remove the cruft (and there's a lot of cruft), but the bots that remain can have an outsized impact. Consider credential stuffing and account takeover: a single bot that successfully validates thousands of account credentials can do serious damage to your bottom line.
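To make the idea concrete, here is a minimal Python sketch of signature-based filtering, assuming signatures are simple regular expressions matched against the User-Agent header. The signature list, the `Request` type, and the helper names are hypothetical illustrations, not Akamai's detection logic. It separates the obvious cruft from the traffic that needs deeper analysis, and it also hints at the approach's ceiling: a bot that sends a browser-like user agent passes straight through.

```python
import re
from dataclasses import dataclass

# Hypothetical signature list; in practice signatures would be
# maintained and updated continuously by threat research.
KNOWN_BOT_SIGNATURES = [
    re.compile(r"python-requests/", re.IGNORECASE),
    re.compile(r"curl/", re.IGNORECASE),
    re.compile(r"HeadlessChrome", re.IGNORECASE),
]


@dataclass
class Request:
    user_agent: str
    path: str


def matches_signature(req: Request) -> bool:
    """Return True if the request's user agent matches any known bot signature."""
    return any(sig.search(req.user_agent) for sig in KNOWN_BOT_SIGNATURES)


def split_by_signature(requests):
    """Separate signature-matched bots (the 'cruft') from the remaining
    traffic, which still needs more advanced analysis."""
    flagged, remaining = [], []
    for req in requests:
        (flagged if matches_signature(req) else remaining).append(req)
    return flagged, remaining


if __name__ == "__main__":
    traffic = [
        Request("python-requests/2.31.0", "/login"),
        Request("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36", "/login"),
        Request("curl/8.4.0", "/account/create"),
    ]
    flagged, remaining = split_by_signature(traffic)
    print(len(flagged), len(remaining))  # 2 1
```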
A signature-based approach will always reach its limit because signatures are created by humans. When looking at a vast sea of data, humans can identify trends, yet finding a needle in a haystack often comes down to luck, and luck isn't repeatable. This is why it's smart to complement signature-based detections with detections that use machine learning to sift through vast amounts of data.
What often gets overlooked with machine learning is the importance of data. Most conversations compare algorithms ("my algorithms are better than yours"), which, unless you're a data scientist, is rarely a productive conversation. Instead, consider that the output of any machine-learning algorithm is only as good as the quantity and quality of the data that feed it.
This is where advanced bot detection technologies – namely, unsupervised device anomaly and adaptive anomaly clustering – come into play. (Both are part of Akamai's March Release of Bot Manager Premier.)
Now, the notion of device anomaly detection is not new. For example, many bot detection vendors look at the browser user agent, using human analysts to build rules that define legal user agents. However, this human-based approach has several pitfalls: it is limited to a few obvious properties, it requires continuous manual updating, and it is vulnerable to spoofing.
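A hand-built user-agent rule set might look something like the following Python sketch. The regular expressions and the `is_legal_user_agent` helper are hypothetical and deliberately simplified; they are not any vendor's actual rules. The sketch exhibits all three pitfalls: it inspects a single property, every new browser string format needs a manual rule update, and a bot that simply copies a genuine browser's user agent is accepted as legal.

```python
import re

# Hypothetical analyst-written rules describing "legal" user agents.
# Each pattern must be maintained by hand as browsers change their formats.
LEGAL_UA_RULES = [
    # Chromium-style user agents
    re.compile(r"Mozilla/5\.0 \(.+\) AppleWebKit/537\.36 \(KHTML, like Gecko\) "
               r"Chrome/\d+\.[\d.]+ Safari/537\.36"),
    # Firefox-style user agents
    re.compile(r"Mozilla/5\.0 \(.+; rv:\d+\.\d+\) Gecko/20100101 Firefox/\d+\.\d+"),
]


def is_legal_user_agent(ua: str) -> bool:
    """Return True if the whole user-agent string matches a hand-written rule."""
    return any(rule.fullmatch(ua) for rule in LEGAL_UA_RULES)


if __name__ == "__main__":
    genuine = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
               "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
    print(is_legal_user_agent("python-requests/2.31.0"))  # False: obvious bot
    # Spoofing: a script that copies the genuine string passes the rule.
    print(is_legal_user_agent(genuine))  # True
```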
Unsupervised device anomaly is designed to avoid those pitfalls by using an unsupervised machine-learning approach to identify illegal device properties and attributes. It automatically analyzes signals across the device stack, including hardware, OS, application, and network, not only for legality but also for statistical overuse: for example, a spike in the occurrence of a device characteristic that is normally seen in only a fraction of a percent of the population. This type of detection is most effective (and accurate) on large platforms that interact with millions of unique devices every day.
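The "overuse" half of that idea can be illustrated with a simple frequency comparison. The Python sketch below assumes a baseline of historical observations for one device attribute (for example, screen resolution) and flags values that are rare in the baseline but suddenly over-represented in the current traffic window. The function name and thresholds (`rare_share`, `spike_ratio`, `min_count`) are hypothetical, and this frequency-ratio check is a deliberately simple stand-in, not Akamai's machine-learning model. It does show why data volume matters: with a small baseline, "rare" cannot be measured reliably.

```python
from collections import Counter


def overused_values(baseline, window, rare_share=0.01, spike_ratio=10.0, min_count=50):
    """Flag attribute values that are rare in the baseline population but
    over-represented in the current traffic window.

    baseline, window: iterables of observed values for one device attribute.
    Returns {value: window_share / baseline_share} for flagged values.
    """
    base_counts = Counter(baseline)
    win_counts = Counter(window)
    base_total = sum(base_counts.values())
    win_total = sum(win_counts.values())
    flagged = {}
    for value, count in win_counts.items():
        if count < min_count:
            continue  # too few observations to call it a spike
        # Add-one smoothing so a value never seen before gets a small,
        # nonzero baseline share instead of dividing by zero.
        base_share = (base_counts[value] + 1) / (base_total + len(base_counts) + 1)
        win_share = count / win_total
        if base_share < rare_share and win_share / base_share >= spike_ratio:
            flagged[value] = win_share / base_share
    return flagged


if __name__ == "__main__":
    baseline = (["1920x1080"] * 6000 + ["1366x768"] * 3000
                + ["2560x1440"] * 995 + ["1x1"] * 5)
    window = ["1920x1080"] * 500 + ["1366x768"] * 250 + ["1x1"] * 250
    print(sorted(overused_values(baseline, window)))  # ['1x1']
```

Here "1920x1080" is common in both periods and is ignored, while "1x1" jumps from 0.05% of the baseline to a quarter of the window and is flagged.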
Adaptive anomaly clustering works a little differently. This detection technology does not look for specific characteristics, so it is not constrained by its human creators' preconceived notions of which characteristics, signals, or patterns indicate a bot. Instead, it combines unsupervised clustering with deep learning to examine all of the available signals and look for clusters of signals that are out of step with the population as a whole.
Consider a more tangible example: If you walked into a room and saw a person wearing a blue shirt, you wouldn't think anything was amiss. Likewise, an algorithm that identified inappropriate attire based on preconceived notions of propriety would not flag a person wearing a blue shirt because it's an entirely appropriate choice of attire. However, if you walked into a room with 100 people wearing the same blue shirt, your human instinct would tell you that there's something going on. Yet the algorithm would miss it – because each person is making an appropriate choice. This is precisely the scenario that adaptive anomaly clustering is designed to address.
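The blue-shirt scenario can be approximated in a few lines of Python. This sketch assumes each session is summarized as a tuple of signals (browser, OS, screen, time zone) and groups identical profiles, flagging any group far larger than the population's per-signal frequencies would predict if the signals were independent. The function name and thresholds are hypothetical, and exact-match grouping with an independence baseline is a simple stand-in for illustration, not the unsupervised clustering and deep learning the detection itself uses. Note that no individual value is suspicious; only the size of the cluster is.

```python
from collections import Counter
from itertools import product
from math import prod


def anomalous_clusters(profiles, min_size=20, ratio=5.0):
    """Flag groups of identical signal profiles that are much larger than
    the population's per-signal frequencies would predict.

    profiles: list of equal-length tuples, one per session.
    Returns {profile: observed_count} for flagged profiles.
    """
    n = len(profiles)
    width = len(profiles[0])
    # Marginal frequency of each value at each signal position.
    marginals = [Counter(p[i] for p in profiles) for i in range(width)]
    flagged = {}
    for profile, observed in Counter(profiles).items():
        if observed < min_size:
            continue
        # Expected group size if each signal were chosen independently.
        expected = n * prod(marginals[i][v] / n for i, v in enumerate(profile))
        if observed / expected >= ratio:
            flagged[profile] = observed
    return flagged


if __name__ == "__main__":
    # Synthetic "humans": every combination of ordinary values, 5 sessions each.
    humans = [combo for combo in product(
        ["Chrome", "Firefox", "Safari", "Edge"],
        ["Windows", "macOS", "Linux", "Android"],
        ["1920x1080", "1366x768", "1440x900", "2560x1440"],
        ["UTC-5", "UTC+0", "UTC+1", "UTC+9"],
    ) for _ in range(5)]
    # 100 bots all "wearing the same blue shirt".
    bots = [("Chrome", "Linux", "1440x900", "UTC+9")] * 100
    print(anomalous_clusters(humans + bots))
    # {('Chrome', 'Linux', '1440x900', 'UTC+9'): 105}
```

A large cluster alone is not enough: if most of the population genuinely shares one configuration, its expected size is also large and it is not flagged.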
Unsupervised device anomaly and adaptive anomaly clustering are excellent examples of how much more powerful machine learning becomes when it can analyze large amounts of data. Visibility drives data, and data drives intelligence. In turn, intelligence drives the continuous evolution of bot detection, as well as the ability to protect customers against an ever-changing bot landscape.