Dark Reading is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Google Captcha Dumps Distorted Text Images

Tired of reading those wavy words? Changes to Google's reCaptcha system -- which doubles as quality control for its book and newspaper scanning projects -- prioritize bot-busting puzzles based on numbers.

Google is making changes to its reCaptcha system: distorted text images are out, while numbers and more-adaptive, puzzle-based authentication checks are in.

The change is necessary because text-only Captchas are no longer blocking a sufficient number of automated log-in attempts, according to Google's reCaptcha product manager, Vinay Shet. "Over the last few years advances in artificial intelligence have reduced the gap between human and machine capabilities in deciphering distorted text," he said in a Friday blog post. "Today, a successful Captcha solution needs to go beyond just relying on text distortions to separate man from machine."

Based on extensive user testing, Google thinks it can separate real users from bots more reliably through improved risk analysis. That analysis is based in part on watching what a supposed user does before, during and after the check, and on serving up multiple puzzle-based checks. Although Shet didn't spell out exactly what these puzzles might look like, he did say that unlike humans, bots have a tough time with numbers.


"We've recently released an update that creates different classes of Captchas for different kinds of users. This multi-faceted approach allows us to determine whether a potential user is actually a human or not, and serve our legitimate users Captchas that most of them will find easy to solve," he said. "Bots, on the other hand, will see Captchas that are considerably more difficult and designed to stop them from getting through."

The Captcha challenge-response technique -- the name is an acronym for Completely Automated Public Turing test to tell Computers and Humans Apart -- was first developed at Carnegie Mellon University in 2000. The approach is designed to create a test that humans can pass, but computers can't. In theory, Captchas can be used for a variety of tasks, including preventing automated spam from appearing in blog comments, blocking automated spam-bot signup attempts for email services -- such as free Gmail accounts -- and safeguarding Web pages that site administrators don't want to be tracked by search bots.

Google purchased reCaptcha in 2009 in a bid to better block spammers who signed up for free accounts. The approach offered by reCaptcha was notable not just for presenting users with a Captcha phrase, but for drawing those images from scans of books. That squares with Google's own Google Books and Google News Archive Search projects, which rely on optical character recognition (OCR) scans of printed source material that aren't 100% accurate. By designating scanned content for use with the reCaptcha system, Google killed two birds with one stone: creating a security check while also tapping users to manually enter or verify scanned text for free.

In short order, Google also rolled out -- and still offers -- reCaptcha as "a free anti-bot service that helps digitize books," which is available for use by any website. "Answers to reCaptcha challenges are used to digitize textual documents," according to Google's reCaptcha overview. "It's not easy, but through a sophisticated combination of multiple OCR programs, probabilistic language models, and most importantly the answers from millions of humans on the internet, reCaptcha is able to achieve over 99.5% transcription accuracy at the word level."
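As a rough illustration of how guesses from several OCR programs and answers from many humans might be combined into one transcription, here is a minimal sketch using weighted voting. It is not Google's actual method, which is not described in detail here and also involves probabilistic language models. The function names, weights and thresholds are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different answers match."""
    return " ".join(answer.lower().split())


def consensus_word(
    ocr_guesses: Iterable[str],
    human_answers: Iterable[str],
    ocr_weight: float = 1.0,       # hypothetical weight for one OCR guess
    human_weight: float = 2.0,     # hypothetical weight for one human answer
    min_total_weight: float = 4.0, # hypothetical: minimum evidence required
    min_share: float = 0.6,        # hypothetical: share the winner must hold
) -> Optional[str]:
    """Return the agreed transcription, or None if there is no clear winner."""
    votes: Counter = Counter()
    for guess in ocr_guesses:
        votes[normalize(guess)] += ocr_weight
    for answer in human_answers:
        votes[normalize(answer)] += human_weight

    total = sum(votes.values())
    if total < min_total_weight:
        return None  # not enough evidence yet; keep showing the word to users

    word, weight = votes.most_common(1)[0]
    return word if weight / total >= min_share else None


if __name__ == "__main__":
    # Two OCR engines disagree; three humans agree on "morning".
    print(consensus_word(["rnorning", "morning"], ["morning", "Morning ", "morning"]))
```

A word that fails to reach consensus would simply stay in the pool and be shown to more users, which is one plausible way a system like this could trade speed for accuracy.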

But no information security challenge-response system -- at least to date -- is perfect. Spam rings also have access to OCR tools, and have duly defeated many Captcha systems. Other criminal groups, echoing Google's crowd-sourced reCaptcha approach, have even tricked users into recording target sites' Captcha phrases -- most sites have a finite pool of possibilities -- with the lure of free porn.

By adopting a more adaptive approach to verifying people's identities via reCaptcha, Google has taken a page from Facebook's login verification system, which looks at a variety of factors when someone attempts to log in to an account, including their geographic location, and whether they're using a computer that Facebook has seen before. For unusual types of log-ins, Facebook's system can hit would-be users with an escalating series of security challenges.

Similarly, RSA's Adaptive Authentication system, which is used by about 70 of the country's 100 biggest banks to verify their customers' identity, assesses a number of risk factors before granting access. Depending on those risk factors, users can be made to jump through more hoops before the system believes that they are who they say they are.
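The adaptive pattern these systems share (score an attempt on several risk factors, then escalate the challenge as the score rises) can be sketched in a few lines. This is an illustrative sketch only, not Google's, Facebook's or RSA's implementation. The factor names, weights, thresholds and challenge tiers are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Challenge(IntEnum):
    """Hypothetical challenge tiers, ordered from least to most friction."""
    NONE = 0
    EASY_NUMERIC = 1
    HARD_PUZZLE = 2
    BLOCK = 3


@dataclass
class LoginAttempt:
    known_device: bool          # has this computer been seen before?
    usual_location: bool        # does the location match past log-ins?
    recent_failures: int        # failed checks shortly before this attempt
    requests_per_minute: float  # request rate from this client


def risk_score(attempt: LoginAttempt) -> int:
    """Add up illustrative penalty points; higher means more bot-like."""
    score = 0
    if not attempt.known_device:
        score += 2
    if not attempt.usual_location:
        score += 2
    score += min(attempt.recent_failures, 3)  # cap so one factor can't dominate
    if attempt.requests_per_minute > 30:      # assumed threshold for automation
        score += 3
    return score


def choose_challenge(attempt: LoginAttempt) -> Challenge:
    """Map the risk score to an escalating challenge tier."""
    score = risk_score(attempt)
    if score == 0:
        return Challenge.NONE
    if score <= 3:
        return Challenge.EASY_NUMERIC
    if score <= 6:
        return Challenge.HARD_PUZZLE
    return Challenge.BLOCK


if __name__ == "__main__":
    regular = LoginAttempt(True, True, 0, 1.0)
    suspect = LoginAttempt(False, False, 3, 100.0)
    print(choose_challenge(regular).name, choose_challenge(suspect).name)
```

In this sketch a familiar user on a known machine sees no extra friction, while a client that looks automated on several counts is stopped outright. A production system would presumably tune such weights from observed traffic, not hard-code them.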

It's been a busy month for Captcha researchers. Earlier this month, a team of Carnegie Mellon researchers unveiled an inkblot-based Captcha system that's designed to defeat automated attacks.

This week, startup firm Vicarious claimed it has created an algorithm that can successfully defeat any text-based Captcha system, as well as defeat reCaptcha -- widely seen as the toughest Captcha system available -- 90% of the time, New Scientist reported. But Luis von Ahn, who was part of the Carnegie Mellon team that created Captchas, remains skeptical, saying he's counted 50 such Captcha-breaking claims since 2003. "It's hard for me to be impressed since I see these every few months," he told Forbes.

Arlo James Barnes
User Rank: Apprentice
11/5/2013 | 6:21:40 PM
re: Google Captcha Dumps Distorted Text Images
"Although Shet didn't spell out exactly what these puzzles might look like, he did say that unlike humans, bots have a tough time with numbers."
Ironically, perhaps.