Dark Reading is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Google Captcha Dumps Distorted Text Images

Tired of reading those wavy words? Changes to Google's reCaptcha system -- which doubles as quality control for its book and newspaper scanning projects -- prioritize bot-busting puzzles based on numbers.

Google is making changes to its reCaptcha system: distorted text images are out, while numbers and more-adaptive, puzzle-based authentication checks are in.

The change is necessary because text-only Captchas are no longer blocking a sufficient number of automated log-in attempts, according to Google's reCaptcha product manager, Vinay Shet. "Over the last few years advances in artificial intelligence have reduced the gap between human and machine capabilities in deciphering distorted text," he said in a Friday blog post. "Today, a successful Captcha solution needs to go beyond just relying on text distortions to separate man from machine."

Based on extensive user testing, Google thinks it can better separate real users from bots through improved risk analysis. That analysis is based in part on watching what a supposed user is doing before, during and after the check, and on serving up multiple puzzle-based checks. Although Shet didn't spell out exactly what these puzzles might look like, he did say that unlike humans, bots have a tough time with numbers.


"We've recently released an update that creates different classes of Captchas for different kinds of users. This multi-faceted approach allows us to determine whether a potential user is actually a human or not, and serve our legitimate users Captchas that most of them will find easy to solve," he said. "Bots, on the other hand, will see Captchas that are considerably more difficult and designed to stop them from getting through."

The Captcha challenge-response technique, whose name is an acronym for Completely Automated Public Turing test to tell Computers and Humans Apart, was first developed at Carnegie Mellon University in 2000. The approach is designed to create a test that humans can pass, but computers can't. In theory, Captchas can be used for a variety of tasks, including preventing automated spam from appearing in blog comments, blocking automated spam-bot signup attempts for email services -- such as free Gmail accounts -- and safeguarding Web pages that site administrators don't want to be tracked by search bots.

In fact, Google purchased reCaptcha in 2009, in a bid to better block spammers who signed up for free accounts. The approach offered by reCaptcha was notable not just for presenting users with a Captcha phrase, but also for drawing those phrases from scans of books. That squares with Google's own Google Books and Google News Archive Search projects, which rely on optical character recognition (OCR) scans of printed source material that aren't 100% accurate. By designating scanned content for use with the reCaptcha system, however, Google killed two birds with one stone: creating a security check, while also tapping users to manually enter or verify scanned text for free.

In short order, Google also rolled out -- and still offers -- reCaptcha as "a free anti-bot service that helps digitize books," which is available for use by any website. "Answers to reCaptcha challenges are used to digitize textual documents," according to Google's reCaptcha overview. "It's not easy, but through a sophisticated combination of multiple OCR programs, probabilistic language models, and most importantly the answers from millions of humans on the internet, reCaptcha is able to achieve over 99.5% transcription accuracy at the word level."
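To make the idea of combining OCR output with human answers concrete, here is a minimal sketch in Python. It is not Google's pipeline: the weighted-vote rule, the function names, and the weights and threshold are all assumptions chosen for illustration, and the probabilistic language models mentioned in the overview are left out entirely.

```python
from collections import Counter


def normalize(answer: str) -> str:
    """Ignore case and surrounding whitespace when comparing answers."""
    return answer.strip().lower()


def transcribe_word(ocr_guesses, human_answers,
                    ocr_weight=0.5, human_weight=1.0, threshold=2.5):
    """Return the agreed transcription of one scanned word, or None.

    Hypothetical rule: every OCR guess and every human answer casts a
    weighted vote, and the top candidate is accepted only once its total
    weight reaches `threshold`. All numbers here are illustrative.
    """
    votes = Counter()
    for guess in ocr_guesses:
        votes[normalize(guess)] += ocr_weight
    for answer in human_answers:
        votes[normalize(answer)] += human_weight

    if not votes:
        return None

    word, weight = votes.most_common(1)[0]
    # Below the threshold the word stays unresolved and would be shown
    # to more users before being accepted.
    return word if weight >= threshold else None


if __name__ == "__main__":
    # Two OCR engines disagree; three humans agree with one of them.
    print(transcribe_word(["morning", "modern"],
                          ["morning", "Morning ", "morning"]))
    # Only one human answer so far: not enough agreement yet.
    print(transcribe_word(["morning", "modern"], ["morning"]))
```

In this sketch human answers count for more than OCR guesses, reflecting the overview's statement that human answers matter most, but the exact ratio is a guess.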

But no information security challenge-response system -- at least to date -- is perfect. Spam rings also have access to OCR tools, and have duly defeated many Captcha systems. Other criminal groups, echoing Google's crowd-sourced reCaptcha approach, have even tricked users into recording target sites' Captcha phrases -- most sites have a finite pool of possibilities -- with the lure of free porn.

By adopting a more adaptive approach to verifying people's identities via reCaptcha, Google has taken a page from Facebook's login verification system, which looks at a variety of factors when someone attempts to log into an account, including their geographic location, and whether they're using a computer that Facebook has seen before. For unusual types of log-ins, Facebook's system can hit would-be users with an escalating series of security challenges.
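The escalation logic described above can be sketched roughly as follows. This is a simplified illustration in Python, not Facebook's or Google's implementation: the two signals come from the factors named in this article (location and a previously seen computer), while the class names, the challenge names and the one-challenge-per-unusual-signal rule are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LoginAttempt:
    country: str
    device_id: str


@dataclass
class AccountHistory:
    usual_countries: set = field(default_factory=set)
    known_devices: set = field(default_factory=set)


# Hypothetical escalation ladder: each additional unusual signal
# unlocks one more challenge.
CHALLENGE_LADDER = ["captcha", "secondary_verification"]


def risk_level(attempt: LoginAttempt, history: AccountHistory) -> int:
    """Count how many signals look unusual for this account."""
    level = 0
    if attempt.country not in history.usual_countries:
        level += 1
    if attempt.device_id not in history.known_devices:
        level += 1
    return level


def challenges_for(attempt: LoginAttempt, history: AccountHistory) -> list:
    """Return the challenges to present, from none up to the full ladder."""
    return CHALLENGE_LADDER[:risk_level(attempt, history)]


if __name__ == "__main__":
    history = AccountHistory(usual_countries={"US"}, known_devices={"laptop-1"})
    print(challenges_for(LoginAttempt("US", "laptop-1"), history))
    print(challenges_for(LoginAttempt("US", "phone-9"), history))
    print(challenges_for(LoginAttempt("BR", "phone-9"), history))
```

A production system would presumably weigh many more signals and score them probabilistically; the point here is only that a familiar log-in sees no extra checks while an unusual one sees progressively more.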

Similarly, RSA's Adaptive Authentication system, which is used by about 70 of the country's 100 biggest banks to verify their customers' identity, assesses a number of risk factors before granting access. Depending on how risky a log-in attempt appears, users can be made to jump through more hoops before the system believes that they are who they say they are.

It's been a busy month for Captcha researchers. Earlier this month, a team of Carnegie Mellon researchers unveiled an inkblot-based Captcha system that's designed to defeat automated attacks.

This week, startup firm Vicarious claimed it has created an algorithm that can successfully defeat any text-based Captcha system, as well as defeat reCaptcha -- widely seen as the toughest Captcha system available -- 90% of the time, New Scientist reported. But Luis von Ahn, who was part of the Carnegie Mellon team that created Captchas, remains skeptical, saying he's counted 50 such Captcha-breaking claims since 2003. "It's hard for me to be impressed since I see these every few months," he told Forbes.

Arlo James Barnes
User Rank: Apprentice
11/5/2013 | 6:21:40 PM
re: Google Captcha Dumps Distorted Text Images
"Although Shet didn't spell out exactly what these puzzles might look like, he did say that unlike humans, bots have a tough time with numbers."
Ironically, perhaps.