
Operations

5/21/2020
04:55 PM

Web Scrapers Have Bigger-Than-Perceived Impact on Digital Businesses

The economic impact of bot traffic can be unexpectedly substantial, a PerimeterX-commissioned study finds.

Automated bots that collect content, product descriptions, pricing, inventory data, and other public-facing information from websites have a greater economic and performance impact than many organizations might realize, a new study suggests.

Bot mitigation company PerimeterX recently commissioned market intelligence firm Aberdeen Group to look into how web-scraping bots might be affecting the revenues of digital businesses.

The study found bots account for between 40% and 60% of total website traffic in certain industries and can impact businesses in multiple ways, including overloading their infrastructure, skewing analytics data, and diminishing the value of their IP, marketing, and SEO investments. The impact to revenues from such factors is considerable, according to PerimeterX.

"Web scraping hurts your revenue in more ways than you know," says Deepak Patel, security evangelist at PerimeterX. For the e-commerce sector, website scraping can dilute overall annual website profitability by as much as 80%, the study shows.

"For the media sector, the median annual business impact of website scraping is as much as 27% of overall website profitability," Patel adds.

Many organizations don't view web-scraping bots as a security threat because the bots don't breach the network or exploit a security flaw. However, they do pose a significant threat to the business logic and proprietary content essential for maintaining a competitive edge.

"Malicious web-scraping bots can steal your exclusive, copyrighted content and images," says Patel, adding that it can also damage a site's SEO rankings when search engines detect pages with duplicate content.

Organizations routinely use web scrapers to look up information on their competition, to build services based on third-party data, or for a variety of other reasons. The bots scour websites, in much the same way search engine crawlers do, and collect any publicly posted information that might be useful to the organization operating them.
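In practice, a scraper of the kind described above fetches a page the same way a browser would and then pulls structured data out of the markup. The following is a minimal, stdlib-only sketch; the `class="product"` markup and the sample HTML are hypothetical stand-ins for a real site's product listings, and a real scraper would fetch the page over HTTP rather than use a hardcoded string.

```python
from html.parser import HTMLParser

class ProductScraper(HTMLParser):
    """Collects the text of elements tagged class="product" --
    a stand-in for the public product listings a site exposes."""
    def __init__(self):
        super().__init__()
        self._in_product = False
        self.products = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if ("class", "product") in attrs:
            self._in_product = True

    def handle_endtag(self, tag):
        self._in_product = False

    def handle_data(self, data):
        if self._in_product and data.strip():
            self.products.append(data.strip())

# In a real scraper this HTML would come from an HTTP GET, just as a
# browser's would; a hardcoded snippet keeps the sketch offline.
page = ('<ul><li class="product">Widget - $19.99</li>'
        '<li class="product">Gadget - $24.50</li></ul>')
scraper = ProductScraper()
scraper.feed(page)
print(scraper.products)  # ['Widget - $19.99', 'Gadget - $24.50']
```

The point of the sketch is how little machinery is involved: anything rendered to anonymous visitors is equally available to a script.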

Though there are some questions over the legality of the practice, numerous products and services are available that allow organizations to scrape another firm's website for information that is available publicly. In a lawsuit involving talent management advisory firm hiQ Labs and LinkedIn, the Ninth Circuit Court of Appeals last year held that the scraping of publicly available data does not violate US computer fraud laws. LinkedIn had wanted hiQ to stop scraping publicly available data from its site, which the latter was using to create analytics tools to help companies deal with employee retention issues.

"As a technical matter, web scraping is simply machine-automated web browsing and accesses and records the same information, which a human visitor to the site might do manually," the Electronic Frontier Foundation noted in welcoming the appellate court's decision.

Bad Bots
The study shows that while humans and "good bots," such as those used by search engines, represented a substantial proportion of web traffic, "bad bots" accounted for a significant share as well. Nearly 17% of all traffic on e-commerce websites, for example, consisted of bad bots. On travel sites the proportion was closer to 31%, and on media sites it was around 9.5%.

Patel says bad bots are bots that crawl websites to perform abusive or malicious actions, including account takeover and content plagiarism. Such bots often mimic human behavior and use multiple IPs to evade detection.

They also can scrape content that other sites have invested substantially in developing, such as SEO-optimized product descriptions or marketing content. For companies doing the scraping, such content can reduce or even eliminate the need to develop their own. For the digital businesses being targeted, meanwhile, web scraping can erode the value of those investments, the study found. Similarly, information that companies need to put on their sites, such as pricing and product availability, can give rivals valuable insight for making their own decisions.

Bot traffic can also overload web infrastructure by sending millions of requests to a specific path, such as login or checkout pages, causing a slowdown for users, Patel says. According to him, 80% of account logins originate from bad bots.
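One common first line of defense against the login-flooding pattern Patel describes is per-IP rate limiting on sensitive paths. The sketch below uses a sliding window; the thresholds and the class name are illustrative assumptions, not PerimeterX's product logic, and as the article notes later, distributed bots that spread requests across many IPs can slip under per-IP limits like this one.

```python
import time
from collections import defaultdict, deque

class LoginRateLimiter:
    """Sliding-window limiter: at most max_requests per window_seconds
    per client IP on a sensitive path such as /login.
    Thresholds here are illustrative, not production-tuned."""
    def __init__(self, max_requests=5, window_seconds=60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False  # throttle: likely automated login traffic
        q.append(now)
        return True

limiter = LoginRateLimiter(max_requests=3, window_seconds=60.0)
results = [limiter.allow("203.0.113.7", now=float(t)) for t in range(5)]
print(results)  # [True, True, True, False, False]
```

A single IP hammering the login page trips the limit on its fourth request within the window, while other IPs are unaffected.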

"Scraping bots can significantly impact website performance since they have to collect a lot of data quickly," Patel says. On retail sites, for example, the traffic from bots trying to keep pace with new product listings or pricing changes can degrade performance.

Many commercially available tools are designed to help digital businesses deal with web scrapers.

"But today's bots, unlike more crude, basic bots of the past, are becoming more adept at mimicking actual users and disguising their true purpose," Patel says. "Hyper-distributed scraping attacks, achieved by using many different user agents, IPs, and [autonomous system numbers] are even more dangerous, resulting in higher volume and higher difficulty of detection."
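Because a hyper-distributed attack keeps each individual IP under any per-IP threshold, one defensive heuristic is to look at aggregate volume per path instead: a path receiving heavy traffic where no single IP stands out fits the pattern Patel describes. The sketch below illustrates that idea; the thresholds, function name, and sample log are illustrative assumptions, not a vendor's detection logic.

```python
from collections import Counter

def flag_distributed_scraping(requests, path_threshold=100, per_ip_max=3):
    """Flag paths whose total request volume is high even though no
    single IP exceeds a modest per-IP count -- the hyper-distributed
    pattern that evades per-IP rate limits.
    requests: iterable of (ip, path) pairs; thresholds are illustrative."""
    per_path = Counter()
    per_path_ip = Counter()
    for ip, path in requests:
        per_path[path] += 1
        per_path_ip[(path, ip)] += 1
    flagged = []
    for path, total in per_path.items():
        heaviest_ip = max(c for (p, ip), c in per_path_ip.items() if p == path)
        if total >= path_threshold and heaviest_ip <= per_ip_max:
            flagged.append(path)
    return flagged

# Simulated log: 150 requests to /products spread across 150 IPs
# (one request each), plus a handful of ordinary hits elsewhere.
log = [(f"198.51.100.{i % 250}", "/products") for i in range(150)]
log += [("203.0.113.9", "/home")] * 10
print(flag_distributed_scraping(log))  # ['/products']
```

Real bot-mitigation products combine many more signals (user agents, ASNs, behavioral fingerprints), but the aggregate-versus-per-source contrast is the core of why distributed scraping is harder to detect.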

Jai Vijayan is a seasoned technology reporter with over 20 years of experience in IT trade journalism. He was most recently a Senior Editor at Computerworld, where he covered information security and data privacy issues for the publication.
 
