Automated bots that collect content, product descriptions, pricing, inventory data, and other public-facing information from websites have a greater economic and performance impact than many organizations might realize, a new study suggests.
Bot mitigation company PerimeterX recently commissioned market intelligence firm Aberdeen Group to look into how web-scraping bots might be affecting the revenues of digital businesses.
The study found that bots account for between 40% and 60% of total website traffic in certain industries and can affect businesses in multiple ways, including overloading their infrastructure, skewing analytics data, and diminishing the value of their IP, marketing, and SEO investments. The impact of such factors on revenue is considerable, according to PerimeterX.
"Web scraping hurts your revenue in more ways than you know," says Deepak Patel, security evangelist at PerimeterX. For the e-commerce sector, website scraping can dilute overall annual website profitability by as much as 80%, the study shows.
"For the media sector, the median annual business impact of website scraping is as much as 27% of overall website profitability," Patel adds.
Many organizations don't view web-scraping bots as a security threat because they don't breach the network or exploit a security flaw. However, they do pose a big threat to the business logic and proprietary content essential for maintaining a competitive edge.
"Malicious web-scraping bots can steal your exclusive, copyrighted content and images," says Patel, adding that scraping can also damage a site's SEO rankings when search engines detect pages with duplicate content.
Organizations routinely use web scrapers to look up information on their competition, to build services based on third-party data, or for a variety of other reasons. The bots scour websites, in much the same way search engine crawlers do, and collect any publicly posted information that would be useful to the organization using the bots.
Though there are some questions over the legality of the practice, numerous products and services are available that allow organizations to scrape another firm's website for information that is available publicly. In a lawsuit involving talent management advisory firm hiQ Labs and LinkedIn, the Ninth Circuit Court of Appeals last year held that the scraping of publicly available data does not violate US computer fraud laws. LinkedIn had wanted hiQ to stop scraping publicly available data from its site, which the latter was using to create analytics tools to help companies deal with employee retention issues.
"As a technical matter, web scraping is simply machine-automated web browsing and accesses and records the same information, which a human visitor to the site might do manually," the Electronic Frontier Foundation had noted in welcoming the appellate court's decision.
The study found that while humans and "good bots," such as those used by search engines, represented a substantial proportion of web traffic, "bad bots" represented a significant proportion as well. Nearly 17% of all traffic on e-commerce websites, for example, came from bad bots. On travel sites, the proportion was closer to 31%, and on media sites it was around 9.5%.
Patel says bad bots are bots that crawl websites to perform abusive or malicious actions, including account takeover and content plagiarism. Such bots often mimic human behavior and use multiple IPs to evade detection.
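To see why spreading requests across multiple IPs makes detection harder, consider the minimal Python sketch below. The `Request` record, the `summarize_sources` helper, and the synthetic traffic are all hypothetical and are not taken from the study. The user-agent and ASN fields are included only as illustrative extra signals a defender might log.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Request(NamedTuple):
    """One logged request (hypothetical record layout)."""
    ip: str
    user_agent: str
    asn: int
    path: str


def summarize_sources(requests: Iterable[Request]) -> dict:
    """Summarize how widely a batch of requests is spread across sources."""
    requests = list(requests)
    per_ip = Counter(r.ip for r in requests)
    return {
        "total": len(requests),
        "distinct_ips": len(per_ip),
        "distinct_user_agents": len({r.user_agent for r in requests}),
        "distinct_asns": len({r.asn for r in requests}),
        # Highest request count seen from any single IP address.
        "max_per_ip": max(per_ip.values(), default=0),
    }


# Synthetic example: 1,000 requests for one path spread over 500 addresses.
sample = [
    Request(ip=f"ip-{i % 500}", user_agent=f"ua-{i % 20}",
            asn=64512 + i % 10, path="/products")
    for i in range(1000)
]
summary = summarize_sources(sample)
```

In this made-up sample, no single address sends more than two requests, so a check that only counts requests per IP would see nothing unusual even though the path received 1,000 requests in total.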
They also can scrape content that other sites might have invested in substantially to develop — like SEO-optimized product descriptions or marketing content, for instance. For companies that are doing the scraping, such content can help reduce or even eliminate the need to develop their own content. Conversely, for digital businesses that are the targets, web scraping can potentially erode the value of their investments, the study found. Similarly, information that companies need to put on their sites — like pricing information or product availability — could help rivals gain valuable insight for making their own decisions.
Bot traffic can also overload web infrastructure by sending millions of requests to a specific path, such as login or checkout pages, causing a slowdown for users, Patel says. According to him, 80% of account logins originate from bad bots.
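One simple way to notice this kind of path-focused load is to count requests per path over a sliding time window. The sketch below is only an illustration under assumed names and values (`PathRateMonitor`, a 60-second window, a threshold of 100 requests); it is not a method described in the study or by PerimeterX.

```python
from collections import defaultdict, deque


class PathRateMonitor:
    """Sliding-window request counter keyed by URL path (hypothetical helper)."""

    def __init__(self, window_seconds: float = 60.0, threshold: int = 100):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._hits = defaultdict(deque)  # path -> timestamps inside the window

    def record(self, path: str, timestamp: float) -> bool:
        """Record one request; return True if the path exceeds the threshold."""
        hits = self._hits[path]
        hits.append(timestamp)
        # Discard timestamps that have fallen out of the window.
        while hits and hits[0] <= timestamp - self.window_seconds:
            hits.popleft()
        return len(hits) > self.threshold


# Usage with small numbers so the behavior is easy to follow.
monitor = PathRateMonitor(window_seconds=10, threshold=3)
flags = [monitor.record("/login", t) for t in (0, 1, 2, 3)]
later = monitor.record("/login", 20)
```

Counting by path rather than by client is a deliberate choice here: it reflects total pressure on a page such as login or checkout regardless of how many sources the requests come from.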
"Scraping bots can significantly impact website performance since they have to collect a lot of data quickly," Patel says. On retail sites, for example, the traffic from bots trying to keep pace with new product listings or pricing changes can degrade performance.
Many commercially available tools are designed to help digital businesses deal with web scrapers.
"But today's bots, unlike more crude, basic bots of the past, are becoming more adept at mimicking actual users and disguising their true purpose," Patel says. "Hyper-distributed scraping attacks, achieved by using many different user agents, IPs, and [autonomous system numbers] are even more dangerous, resulting in higher volume and higher difficulty of detection."