Dark Reading is part of the Informa Tech Division of Informa PLC



Gather Intelligence On Web Bots To Aid Defense

BotoPedia, a registry of Web bots, could help companies keep their sites open to good crawlers but closed to attackers and site scrapers

Automated traffic to Web sites has steadily increased, driven by legitimate search-engine indexing, questionable crawlers, and malicious attackers. Companies need to know which is which.

To that end, Web-security cloud service Incapsula launched a site on Wednesday for cataloging Web bots, the automated programs that crawl websites to index pages, grab competitive price information, gather information on social-networking users, or scan for vulnerabilities. With the site, dubbed BotoPedia, the company is gathering data on the Internet addresses used by Web bots as well as the user-agent strings and any other identifying information. The catalog will be open, but moderated, in much the same way as Wikipedia, says Marc Gaffan, co-founder and vice president of business development for Incapsula.

"This is essentially trying to take the gray area and classify it to a higher level of granularity, so that website operators have got the ability to cherry-pick who they want to let in and who they don't," Gaffan says.

While many services attempt to identify bots by their user-agent strings -- which typically indicate browser information -- that signature is too easily changed to be useful on its own, he says. Instead, BotoPedia will include the user-agent string, IP addresses, and other identifying details.

While that is a simple change, it's an important one, says Bogdan Botezatu, senior threat analyst with security firm BitDefender.

"If you block my spider, I will change its name and come back and crawl your Web server in a few minutes without losing much money or time," Botezatu says. "But if you block my IP address, then I will have to either change my IP or change my provider or move to a different data center."

[ Researchers release free search engine-based data mining tools to identify and extract sensitive information from many popular cloud-based services. See Researchers To Launch New Tools For Search Engine Hacking. ]

BotoPedia was initially seeded with data on the top 50 bots, but another dozen had been submitted by outside sources by Wednesday evening. While the operators of good Web bots will self-submit, researchers will likely add information on bad bots as well, Gaffan says.

"I do expect a lot of bad bots to get in there, but obviously not by them coming forward," he says.

The rise of automated Web traffic is playing out against the backdrop of an estimated quadrupling of Internet traffic by the year 2016, according to networking giant Cisco's efforts to predict future bandwidth demand. Web traffic will increase slightly faster, expanding some five-fold between 2011 and 2016, the company estimates.

Automated traffic is taking an increasing share of the pie. Slightly more than half of all traffic to websites currently comes from bots, according to Incapsula's data: 20 percent from good page indexers and other desired bots, another 19 percent from intelligence-gathering bots that sites may not want, and the remaining 12 percent from scrapers, comment spammers, and outright attacks.

Attacks can take the form of automated SQL injection against back-end databases, scraping of user information, or simply automated login attempts. Overall, sites should expect each Web application to suffer a sustained attack nearly 120 days of each year, according to a report issued earlier this week by Web security firm Imperva. Companies should prepare for intense automated attacks, the company says.

"The success of the whole mission depends on the defense performance when under attack," states the report. "Therefore, the defense solutions and procedures should be designed to accommodate attack bursts."

Increasingly, attackers will cloak themselves in the appearance of legitimacy. By appearing to be a search-engine index bot, attackers will be able to bypass most filters, Incapsula's Gaffan says.

In a study of 1,000 customers, Incapsula found that more than 16 percent encountered Web bots that impersonated Google's automated crawlers. Because Google search rankings are so important, no site wants to block the company from indexing its pages.
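One widely documented defense against this kind of impersonation is forward-confirmed reverse DNS: resolve the visitor's IP address to a hostname, check that the hostname belongs to the claimed crawler's domain, then resolve that hostname forward again and confirm it maps back to the original IP. The sketch below keeps the decision logic separate from the DNS lookups themselves (which a caller would perform with, say, the standard socket module) so it can be checked offline; the hostname suffixes are the ones Google publishes for Googlebot.

```python
def verify_crawler_identity(source_ip: str,
                            ptr_hostname: str,
                            forward_ips: list,
                            allowed_suffixes=(".googlebot.com", ".google.com")) -> bool:
    """Forward-confirmed reverse DNS check for a claimed search-engine crawler.

    source_ip:    the address the request came from
    ptr_hostname: result of a reverse (PTR) lookup on source_ip
    forward_ips:  results of a forward (A) lookup on ptr_hostname
    A bot merely copying Googlebot's user-agent string fails both tests:
    its PTR record is not in Google's domain, and even a spoofed PTR
    cannot survive the forward re-resolution.
    """
    return (ptr_hostname.endswith(allowed_suffixes)
            and source_ip in forward_ips)
```

A caller would wire this up with `socket.gethostbyaddr()` for the reverse lookup and `socket.gethostbyname_ex()` for the forward confirmation; keeping the lookups outside the function makes the logic easy to test without network access.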

In the end, Incapsula hopes the online catalog will empower companies to make better decisions about which automated traffic they allow to peruse their sites and which they block, Gaffan says.

"This will give website owners a lot of different information and better awareness into who they want to let in," Gaffan says.
