8/18/2010
05:52 PM

Researcher Cracks ReCAPTCHA

Homegrown algorithms for cheating Google's reCAPTCHA released earlier this month

A researcher earlier this month demonstrated how he solved Google's reCAPTCHA program even after recent improvements made to the anti-bot and anti-spam tool by the search engine giant.

Chad Houck, an independent researcher, also released the algorithms he wrote to crack reCAPTCHA. Houck had published a white paper on the hack before presenting his research at Defcon in Las Vegas, and says that Google made fixes to reCAPTCHA that defeated several of his algorithms before he was scheduled to give his presentation. He then quickly devised a few additional approaches, and says he was able to beat the updated reCAPTCHA 30 percent of the time.
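
To put the 30 percent figure in perspective, here is a minimal arithmetic sketch. It assumes each automated attempt succeeds independently with the same probability, which is a simplification not stated in Houck's research; the function names are illustrative.

```python
def p_at_least_one(success_rate, attempts):
    """Chance that at least one of `attempts` independent tries succeeds."""
    return 1 - (1 - success_rate) ** attempts


def expected_attempts(success_rate):
    """Average number of independent tries needed for one success."""
    return 1 / success_rate


if __name__ == "__main__":
    rate = 0.30  # success rate Houck reports against the updated reCAPTCHA
    for n in (1, 3, 5, 10):
        print(n, round(p_at_least_one(rate, n), 3))
    print("expected tries per solve:", round(expected_attempts(rate), 2))
```

Under that assumption, a bot that can simply retry needs only a handful of attempts per solved challenge, so a per-challenge failure rate of 70 percent is little protection on its own.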

"[ReCAPTCHA] has never been wholly secure. There are always ways to crack it," says Houck, whose algorithms have been available online since Defcon. "The information [about the research] is out there. Google still hasn't changed it, which kind of surprises me."

Google, however, has so far seen no signs of the technique being actively used in the wild.

A Google spokesperson says the company had strengthened the verification words in the program both before and after Houck's paper was published. "We introduced changes both before and after its appearance to improve the strength of our verification words," the spokesperson says. "We've found reCAPTCHA to be far more resilient while also striking a good balance with human usability, and we've received very positive feedback from customers. Even so, it's good to bear in mind that while CAPTCHAs remain a powerful and effective tool for fighting abuse, they are best used in combination with other security technologies."

ReCAPTCHA, which was originally created by Carnegie Mellon University and later purchased by Google, protects websites from bots and spam by generating distorted text that humans can read but software, including optical character recognition (OCR) programs, cannot. The words used by the reCAPTCHA program come from books that are being digitized. The program, which runs on many major websites as a way to validate that the user on the site is a human and not an automated bot or spammer, presents the user with two real words to type into a box, one of which is for verification and the other for digitization purposes.

Houck's hack works using a combination of his own algorithms, including one that decodes the "ribboning" protections reCAPTCHA uses to mask the words from software, a homemade OCR, and a dictionary attack.

He says the weakness of the reCAPTCHA program is in the way it's designed. "It presents two words, one for verification and one for digitization," he says. "Every time someone types the verification word correctly, [the program] assumes they also typed the digitization word correctly."
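
The two-word check Houck describes can be modeled in a few lines. This is a hypothetical sketch of the behavior in his quote, not reCAPTCHA's actual code; `Challenge` and `check_response` are invented names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Challenge:
    # The only answer the service knows in advance (hypothetical model).
    control_word: str


def check_response(challenge: Challenge, typed_control: str,
                   typed_unknown: str) -> Tuple[bool, Optional[str]]:
    """Grade only the control word; if it matches, accept the other
    word as a transcription without being able to check it."""
    passed = typed_control.strip().lower() == challenge.control_word.lower()
    recorded = typed_unknown.strip() if passed else None
    return passed, recorded


if __name__ == "__main__":
    challenge = Challenge(control_word="morning")
    # A solver that reads only the control word still passes.
    print(check_response(challenge, "morning", "anything"))
    print(check_response(challenge, "mourning", "anything"))
```

In this model an attacker has to solve only one of the two words, and whatever is typed for the second word is recorded unchecked.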

Google's latest tweaks to the program took out what Houck calls the "inverted blob," or ellipses that help mask the text from bots, and increased the vertical ribboning and dilation of the text, which positions the characters so they overlap slightly and aren't easy to segment, he says. "[But] I solved that," he says. "So all of their security features are flawed."
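
Why overlapping characters are hard to segment can be seen with a naive approach that splits a binarized image wherever a column contains no ink. This is an illustrative sketch of a generic technique, not Houck's algorithm; the function name and the tiny test images are made up for the example.

```python
def segment_columns(image):
    """Split a binary image (list of equal-length rows of 0/1) into
    (start, end) column ranges separated by fully blank columns."""
    width = len(image[0]) if image else 0
    # A column "has ink" if any row has a 1 in that column.
    ink = [any(row[x] for row in image) for x in range(width)]
    segments, start = [], None
    for x, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = x                      # a glyph begins
        elif not has_ink and start is not None:
            segments.append((start, x))    # a blank column ends the glyph
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


if __name__ == "__main__":
    separated = [[1, 1, 0, 1, 1],
                 [1, 1, 0, 1, 1]]
    overlapping = [[1, 1, 1, 0, 0],
                   [0, 0, 1, 1, 1]]
    print(segment_columns(separated))    # two glyphs found
    print(segment_columns(overlapping))  # glyphs merge into one segment
```

When two glyphs share even one column, the blank-column test no longer finds a boundary, which is the effect the tighter character spacing aims for.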

His so-called "blanket algorithm" straightens out the text so it's machine-readable. "And it segments the characters and gets run through the OCR," which scans them, he says. "I also used a dictionary attack, which makes it a lot more efficient."
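
The dictionary step can be illustrated by snapping a noisy OCR guess to the closest known word. The sketch below uses Python's standard `difflib` module as a stand-in; the word list, the cutoff value, and the function name are assumptions for the example, and Houck's own matching method may differ.

```python
import difflib

# Toy dictionary; a real attack would load a full word list.
WORDS = ["morning", "overlooks", "following", "finding", "upon"]


def snap_to_dictionary(ocr_guess, words=WORDS, cutoff=0.6):
    """Return the dictionary word most similar to the OCR output,
    or None if nothing is similar enough."""
    matches = difflib.get_close_matches(
        ocr_guess.lower(), words, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


if __name__ == "__main__":
    # "rn" is a classic OCR misreading of "m".
    print(snap_to_dictionary("rnorning"))
    print(snap_to_dictionary("zzzz"))
```

Because the verification words are real words, the OCR stage does not need to read every character correctly; it only needs to get close enough for the dictionary lookup to pick the right candidate.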

Houck says he emailed recaptcha.net about his research, but never got a reply.

Just how difficult would it be for a bad guy to exploit this? "As long as you know how to program well enough, it would take a day to implement my algorithms," he says.


Kelly Jackson Higgins is Executive Editor at DarkReading.com. She is an award-winning veteran technology and business journalist with more than two decades of experience in reporting and editing for various publications, including Network Computing, Secure Enterprise ...
