
Google Captcha Dumps Distorted Text Images

Tired of reading those wavy words? Changes to Google's reCaptcha system -- which doubles as quality control for its book and newspaper scanning projects -- prioritize bot-busting puzzles based on numbers.

Google is making changes to its reCaptcha system: distorted text images are out, while numbers and more-adaptive, puzzle-based authentication checks are in.

The change is necessary because text-only Captchas are no longer blocking a sufficient number of automated log-in attempts, according to Google's reCaptcha product manager, Vinay Shet. "Over the last few years advances in artificial intelligence have reduced the gap between human and machine capabilities in deciphering distorted text," he said in a Friday blog post. "Today, a successful Captcha solution needs to go beyond just relying on text distortions to separate man from machine."

Based on extensive user testing, Google thinks it can better separate real users from bots by using better risk analysis. This is based in part on watching what a supposed user is doing before, during and after the check, and serving up multiple puzzle-based checks. Although Shet didn't spell out exactly what these puzzles might look like, he did say that unlike humans, bots have a tough time with numbers.


"We've recently released an update that creates different classes of Captchas for different kinds of users. This multi-faceted approach allows us to determine whether a potential user is actually a human or not, and serve our legitimate users Captchas that most of them will find easy to solve," he said. "Bots, on the other hand, will see Captchas that are considerably more difficult and designed to stop them from getting through."
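The tiering Shet describes can be pictured with a minimal sketch. Google has not disclosed which signals it scores or how, so every name below (`Session`, `risk_score`, `choose_challenge`), every signal and every threshold is a hypothetical stand-in, not the reCaptcha implementation.

```python
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical behavioral signals; Google has not said what it measures.
    seconds_on_page: float   # time spent on the page before submitting
    failed_attempts: int     # earlier failed checks in this session
    known_device: bool       # whether this device has been seen before


def risk_score(session: Session) -> int:
    """Add up illustrative risk points; a higher score looks more bot-like."""
    score = 0
    if session.seconds_on_page < 2.0:       # submitted implausibly fast
        score += 2
    score += min(session.failed_attempts, 3)  # cap the penalty at 3 points
    if not session.known_device:
        score += 1
    return score


def choose_challenge(session: Session) -> str:
    """Map the risk score to a challenge class (thresholds are assumptions)."""
    score = risk_score(session)
    if score <= 1:
        return "easy"
    if score <= 3:
        return "medium"
    return "hard"
```

Under these assumed thresholds, a returning visitor who lingers on the page is served the easy class, while a session that submits instantly from an unknown device after repeated failures is served the hard one.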

The Captcha challenge-response technique -- the name is an acronym for Completely Automated Public Turing test to tell Computers and Humans Apart -- was first developed at Carnegie Mellon University in 2000. The approach is designed to create a test that humans can pass, but computers can't. In theory, Captchas can be used for a variety of tasks, including preventing automated spam from appearing in blog comments, blocking automated spam-bot signup attempts for email services -- such as free Gmail accounts -- and safeguarding Web pages that site administrators don't want to be tracked by search bots.

In fact, Google purchased reCaptcha in 2009, in a bid to better block spammers who signed up for free accounts. The approach offered by reCaptcha was notable not just for presenting users with a Captcha phrase, but also for drawing those phrases from scans of books. That squares with Google's own Google Books and Google News Archive Search projects, which rely on optical character recognition (OCR) scans of printed source material that aren't 100% accurate. By designating scanned content for use with the reCaptcha system, however, Google killed two birds with one stone: creating a security check, while also tapping users to manually enter or verify scanned text for free.

In short order, Google also rolled out -- and still offers -- reCaptcha as "a free anti-bot service that helps digitize books," which is available for use by any website. "Answers to reCaptcha challenges are used to digitize textual documents," according to Google's reCaptcha overview. "It's not easy, but through a sophisticated combination of multiple OCR programs, probabilistic language models, and most importantly the answers from millions of humans on the internet, reCaptcha is able to achieve over 99.5% transcription accuracy at the word level."
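The crowd-sourced part of that pipeline can be illustrated with a simple agreement rule. This is only a sketch under stated assumptions: the function name `consensus` and the `min_votes` and `min_share` thresholds are hypothetical, and the OCR programs and language models Google mentions are left out entirely.

```python
from collections import Counter
from typing import Optional


def consensus(answers: list[str], min_votes: int = 3,
              min_share: float = 0.6) -> Optional[str]:
    """Return the agreed transcription of a scanned word, or None.

    A word is accepted only when enough people typed the same thing.
    Both thresholds are illustrative assumptions, not Google's values.
    """
    # Normalize case and whitespace so "Morning " and "morning" match.
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if not normalized:
        return None
    word, votes = Counter(normalized).most_common(1)[0]
    if votes >= min_votes and votes / len(normalized) >= min_share:
        return word
    return None  # not enough agreement; the word needs more answers
```

For example, `consensus(["morning", "Morning ", "morning", "mourning"])` accepts "morning" because three of the four answers agree, while two conflicting answers return `None` and the word would stay in circulation.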

But no information security challenge-response system -- at least to date -- is perfect. Spam rings also have access to OCR tools, and have duly defeated many Captcha systems. Other criminal groups, echoing Google's crowd-sourced reCaptcha approach, have even tricked users into recording target sites' Captcha phrases -- most sites have a finite pool of possibilities -- with the lure of free porn.

By adopting a more adaptive approach to verifying people's identities via reCaptcha, Google has taken a page from Facebook's login verification system, which looks at a variety of factors when someone attempts to log in to an account, including their geographic location, and whether they're using a computer that Facebook has seen before. For unusual types of log-ins, Facebook's system can hit would-be users with an escalating series of security challenges.

Similarly, RSA's Adaptive Authentication system, which is used by about 70 of the country's 100 biggest banks to verify their customers' identity, assesses a number of risk factors before granting access. Depending on those risk factors, users can be made to jump through more hoops before the system believes that they are who they say they are.

It's been a busy month for Captcha researchers. Earlier this month, a team of Carnegie Mellon researchers unveiled an inkblot-based Captcha system that's designed to defeat automated attacks.

This week, startup firm Vicarious claimed it has created an algorithm that can successfully defeat any text-based Captcha system, as well as defeat reCaptcha -- widely seen as the toughest Captcha system available -- 90% of the time, New Scientist reported. But Luis von Ahn, who was part of the Carnegie Mellon team that created Captchas, remains skeptical, saying he's counted 50 such Captcha-breaking claims since 2003. "It's hard for me to be impressed since I see these every few months," he told Forbes.

Arlo James Barnes
User Rank: Apprentice
11/5/2013 | 6:21:40 PM
re: Google Captcha Dumps Distorted Text Images
"Although Shet didn't spell out exactly what these puzzles might look like, he did say that unlike humans, bots have a tough time with numbers."
Ironically, perhaps.