Dark Reading is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Threat Intelligence

4/21/2020
05:35 PM

Automated Bots Are Increasingly Scraping Data & Attempting Logins

The share of bot traffic to online sites declines, but businesses are seeing an overall increase in automated scraping of data, login attempts, and other detrimental activity.

The share of Internet traffic generated by automated software, or bots, has declined to its lowest point in at least six years, but the share due to unwanted automated activity, or "bad" bots, has risen to its highest level over the same period, according to cybersecurity firm Imperva in a report published on April 21.

In 2019, bad bots accounted for 24% of all Internet traffic seen by Imperva's customers, 5.5 percentage points above the low recorded in 2015, the company stated in its "Bad Bot Report 2020." Bad bots are automated software programs that perform unwanted activities, such as scraping price data or availability information from websites, or conduct outright malicious activities, such as account-takeover attempts or credit card fraud.

Acceptable bot activity has fallen by nearly two-thirds to 13% of all traffic in 2019, down from 36% in 2014, the report states. The move to a data-driven economy has created an incentive for more bots while at the same time making their activities less acceptable, says Kunal Anand, chief technology officer for Imperva. 
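The traffic shares quoted in this article can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures cited here (24% bad bots and 13% good bots in 2019, 36% good bots in 2014, and a 5.5-point rise since the 2015 low). The variable names are illustrative, and the derived values are rounded estimates rather than numbers taken from Imperva's report.

```python
# Figures as quoted in the article (percent of all traffic).
bad_bots_2019 = 24.0
good_bots_2019 = 13.0
good_bots_2014 = 36.0
rise_since_2015_low = 5.5  # percentage points

# Relative decline in the good-bot share between 2014 and 2019.
good_bot_decline = (good_bots_2014 - good_bots_2019) / good_bots_2014

# Bad-bot share implied for 2015, the low point.
bad_bots_2015 = bad_bots_2019 - rise_since_2015_low

# Combined bot share and the remaining human share in 2019.
all_bots_2019 = bad_bots_2019 + good_bots_2019
humans_2019 = 100.0 - all_bots_2019

print(f"Good-bot decline 2014-2019: {good_bot_decline:.1%}")  # 63.9%
print(f"Implied bad-bot share in 2015: {bad_bots_2015}%")     # 18.5%
print(f"All bots in 2019: {all_bots_2019}%")                  # 37.0%
print(f"Humans in 2019: {humans_2019}%")                      # 63.0%
```

A relative decline of about 64% is consistent with the "nearly two-thirds" description, and the two bot categories together come to roughly 37% of traffic.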

"The digital transformation and the movement of information to the Web is a major driver that makes running bots more lucrative," he says. "There is also increased awareness, and companies are controlling what bots they allow through whitelisting or allow what are seen as good bots."

Bots are a natural evolution of connecting computers and software to the Internet, but they are problematic for companies that have to expose their intellectual property online as part of their business. Airlines, for example, need to give flight information and pricing to customers, but a competitor using bots can scrape that same data and gain a valuable competitive insight.

Businesses that see their online metrics deteriorate, such as poor conversion rates, content appearing on other sites, or a rise in failed logins, have likely been targeted by bots, according to Imperva's report.
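A rise in failed logins is a signal that can be checked against authentication logs. The sketch below shows one common heuristic, not a method described in Imperva's report: flag source IP addresses whose failed logins span many distinct usernames. The function name, the `(ip, username, success)` event shape, and the threshold of five usernames are all assumptions made for illustration.

```python
from collections import defaultdict


def suspected_stuffing_sources(events, min_usernames=5):
    """Return source IPs whose failed logins span many distinct usernames.

    `events` is an iterable of (ip, username, success) tuples. Both the
    event shape and the threshold are hypothetical, chosen for this sketch.
    """
    failed_users = defaultdict(set)
    for ip, username, success in events:
        if not success:
            failed_users[ip].add(username)
    return sorted(
        ip for ip, users in failed_users.items() if len(users) >= min_usernames
    )


# One address fails against six different accounts; another belongs to a
# single user who mistypes a password twice before logging in.
events = [("203.0.113.7", f"user{i}", False) for i in range(6)]
events += [
    ("198.51.100.2", "alice", False),
    ("198.51.100.2", "alice", False),
    ("198.51.100.2", "alice", True),
]

print(suspected_stuffing_sources(events))  # ['203.0.113.7']
```

A per-IP count like this would only catch bots that operate from a single address. Bots that spread requests across many addresses would need other signals.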

"The two biggest problems from bad bots are credential stuffing to attack account logins and scraping of data, [such as] pricing and/or content," Anand says. "Almost every website suffers from both of these."

About a quarter of bots are considered simple: their traffic comes from a single IP address and does not use a browser user-agent header to pretend to be legitimate. More-complex bots use browser automation software, such as Selenium, to masquerade as a legitimate visitor. Selenium is an open source project that is commonly used to automate browsers for testing web applications. The most sophisticated bots move the mouse to mimic human behavior, Imperva states in the report.
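The three levels of sophistication described in this paragraph can be written as a small decision rule. This is a minimal sketch under stated assumptions, not Imperva's classifier: the `BotSession` fields and the tier names are hypothetical, and the sketch assumes that whether a session uses browser automation or simulates mouse movement has already been determined by some other means.

```python
from dataclasses import dataclass


@dataclass
class BotSession:
    """Hypothetical summary of traffic already attributed to one bot."""
    distinct_ips: int
    sends_browser_user_agent: bool
    uses_browser_automation: bool
    simulates_mouse: bool


def sophistication_tier(session: BotSession) -> str:
    """Map a session to an illustrative tier, checking the strongest trait first."""
    if session.simulates_mouse:
        return "sophisticated"
    if session.uses_browser_automation:
        return "moderate"
    if session.distinct_ips == 1 and not session.sends_browser_user_agent:
        return "simple"
    return "unclassified"


print(sophistication_tier(BotSession(1, False, False, False)))  # simple
print(sophistication_tier(BotSession(3, True, True, False)))    # moderate
print(sophistication_tier(BotSession(40, True, True, True)))    # sophisticated
```

Checking the most advanced trait first means a bot that both automates a browser and moves the mouse lands in the highest tier.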

More than 55% of bots impersonated Google's Chrome browser, the highest percentage yet, the company found.

Different industries see different levels of bad bot activity. The financial industry encountered very little "good" bot traffic, with a little less than 48% of traffic due to bad bots and a little more than 51% of traffic from humans. Similarly, the education and IT services sectors are seeing around 45% of their traffic accounted for by bad bots.

Online data firms and business service firms encountered the largest share of traffic from good bots, which accounted for 51% and 54% of their traffic, respectively.

Nearly 46% of unwanted bot traffic came from the United States in 2019, and in many cases, the bot activity is likely legal. In September 2019, a US appeals court upheld the ability of HiQ Labs, a provider of intelligence on employees, to scrape LinkedIn and other services to compile profiles of professionals.

"Our definition of good bot is typically a tool that the business is willing to allow to be on its site — search engines and SEO tools fall into this list," Anand says. "Companies typically also whitelist other tools that they use themselves, like a vulnerability scanner that they control when it is being deployed. Bad bots are classified as those requests that don't come from a recognized browser and are there for another reason that wasn't authorized by the company."


Veteran technology journalist of more than 20 years. Former research engineer. Written for more than two dozen publications, including CNET News.com, Dark Reading, MIT's Technology Review, Popular Science, and Wired News. Five awards for journalism, including Best Deadline ...
 

Comments
Lex2525,
User Rank: Apprentice
4/25/2020 | 4:58:02 AM
Didn't know about the company Selenium
I can see how the airline industry would be affected negatively by bot software that scans for prices. The same goes for other sectors that have significant price option offers, like hotels and cruises. The 55% of bots showing up as Google Chrome makes sense given that the majority of users use it as their web browser. Well, with all problems comes the opportunity for individuals to tackle and execute. And it is quite interesting to note that the majority of the bot traffic is coming from the US and that a good portion of the traffic is legal. Thank you for sharing what Selenium is and how it functions to help with testing websites for vulnerabilities. I was completely unaware of this company.