Simplifying SQL Injection Detection

Black Hat researcher releases new lexical analysis tool that doesn't rely on regular expressions

Years after gaining prominence as one of the most popular and convenient ways for criminals to break into corporate databases through vulnerable web applications, SQL injection remains the apple of many a black hat hacker's eye. While plenty of factors conspire against enterprises doing a better job of preventing these attacks, one of the most fundamental is that SQL injection attacks are very difficult to detect. This week at Black Hat, a researcher released a new tool that can be embedded in applications to make that detection process easier.


Part of the problem with many existing detection mechanisms, including those in many web application firewalls, is their dependence on regular expressions to do that detection, Nick Galbreath, director of engineering at Etsy, told his audience yesterday. Analysis using regular expressions quickly gets bogged down because SQL is such a rich, complicated language. He cited a 2005 Black Hat talk by Hanson and Patterson that showed how regular expressions can be prone to breaking down and producing false positives.

"So what happens is, a lot of the web application firewalls have sort of ended up using what I call regular expression soup," he says. "It's impossible to debug and test against. Regular expressions, no matter what you do, are gonna miss something, and something that you don't want is going to be flagged as a false positive."

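To make the "soup" problem concrete, here is a minimal Python sketch. The pattern is a hypothetical rule written for this illustration, not one taken from any real web application firewall, and the sample inputs are invented. It shows a single rule both flagging harmless text and missing an attack variant.

```python
import re

# Hypothetical rule of the "regular expression soup" kind (an assumption
# for illustration): a SQL-ish keyword followed later by =, -- or a quote.
NAIVE_SQLI = re.compile(r"(?i)\b(select|union|drop|or)\b.*(=|--|')")


def looks_like_sqli(value: str) -> bool:
    """Return True when the naive pattern matches anywhere in the input."""
    return NAIVE_SQLI.search(value) is not None


samples = [
    "1' OR '1'='1",                      # classic attack: flagged
    "Pick red or blue -- I don't mind",  # harmless comment: also flagged
    "1' || '1'='1",                      # MySQL treats || as OR by default: missed
]

for sample in samples:
    print(f"{sample!r}: {looks_like_sqli(sample)}")
```

Patching the rule to cover the third input tends to widen the second problem, which is how such rule sets grow hard to debug and test.
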
One of the big difficulties in screening user input for SQL injection is that it is very tough to automatically tell benign values such as phone numbers or Twitter handles apart from the snippets of SQL that attackers inject.

"It turns out to be a difficult problem. How do you detect if user input is SQL, good input, or what? Is that my phone number or an arithmetic expression? Is it a Twitter handle or is it a SQL variable?" he says. "So trying to disambiguate these things turns out to be a hard problem."

As Galbreath examined that problem, he considered using existing SQL parsers to do the heavy lifting. But as he dove into them, he found that each one parses only its particular flavor of SQL and that they are not really designed to handle partial bits of code. They are also hard to extend and are very concerned with correctness, because they are usually meant to ensure code runs properly. Someone hunting for SQL injection isn't so worried about correctness.

So instead of depending on tools not specifically meant for SQL injection analysis, Galbreath wrote his own.

"It sounds crazy but it turns out it's pretty straightforward and not so bad (because) we don't need it to actually run SQL," he says. "What it does is it converts input into a stream of tokens. There's a master list of keywords and functions which is sort of combined against all the major databases. It's not completely intractable and it handles also the comments, strings, literals and all the weird cases and things like that."

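The tokenizing step Galbreath describes can be pictured with a small hand-written scanner. The Python sketch below is an illustration only and is not libinjection's code: the keyword list is a tiny stand-in for the master list, and the single-letter type codes (`k`, `n`, `1`, `s`, `c`, `v`, `o`) and the helper names `tokenize` and `kinds` are assumptions made for this example.

```python
# Tiny stand-in for the "master list" of keywords (assumption for this sketch).
KEYWORDS = {"select", "union", "from", "where", "and", "or", "insert", "drop"}
OPERATOR_CHARS = set("+-*/=<>!|&%")
COMMENT_STARTS = ("--", "/*")


def tokenize(text):
    """Yield (type, value) pairs without validating or running any SQL."""
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1
        elif text.startswith("--", i) or ch == "#":
            end = text.find("\n", i)          # line comment runs to end of line
            end = n if end == -1 else end
            yield ("c", text[i:end])
            i = end
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)      # block comment, possibly unclosed
            end = n if end == -1 else end + 2
            yield ("c", text[i:end])
            i = end
        elif ch in "'\"":
            end = text.find(ch, i + 1)        # unterminated strings run to the end
            end = n if end == -1 else end + 1
            yield ("s", text[i:end])
            i = end
        elif ch.isdigit():
            j = i
            while j < n and (text[j].isdigit() or text[j] == "."):
                j += 1
            yield ("1", text[i:j])
            i = j
        elif ch == "@":
            j = i + 1
            while j < n and (text[j].isalnum() or text[j] in "_@"):
                j += 1
            yield ("v", text[i:j])            # SQL variable or Twitter handle
            i = j
        elif ch.isalpha() or ch == "_":
            j = i
            while j < n and (text[j].isalnum() or text[j] == "_"):
                j += 1
            word = text[i:j]
            yield ("k" if word.lower() in KEYWORDS else "n", word)
            i = j
        elif ch in OPERATOR_CHARS:
            j = i
            while (j < n and text[j] in OPERATOR_CHARS
                   and not text.startswith(COMMENT_STARTS, j)):
                j += 1
            yield ("o", text[i:j])
            i = j
        else:
            yield (ch, ch)                    # parentheses, commas, anything else
            i += 1


def kinds(text):
    """Join the token type codes into one string for easy comparison."""
    return "".join(kind for kind, _ in tokenize(text))


print(kinds("1 UNION SELECT @@version -- hi"))  # 1kkvc
print(kinds("415-555-1212"))                    # 1o1o1
print(kinds("@nickgalbreath"))                  # v
```

The last line shows the ambiguity from the talk: at the token level a Twitter handle and a SQL variable look the same, so later steps have to disambiguate from context.
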
Called libinjection, the open source C library takes a lexical analysis approach and was trained with real user input from his company's site, a top 50 internet site with a rich base of such data. The tokenization approach keeps the tool lightweight and streamlines the analysis of user data.

"So it goes through, disambiguates, merges tokens, specializes, merges strings together, does all the stuff it needs to do and then it does one last step, which is really designed to reduce false positives," he says. "If it sees a bunch of arithmetic operations together, it just merges them all together. My phone number just turns into 1. We don't actually care what the value is because SQL injection doesn't care what the value is, just that there's a number there. Same thing with multiple nested parentheses, it just gets rid of them."

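The false-positive reduction step can be sketched as a fold over an already tokenized input. This Python example is a simplified illustration, not libinjection's algorithm: the `(type, value)` pairs, the type codes `1` (number), `o` (operator) and `k` (keyword), and the function names `fold` and `kinds` are assumptions, and dropping every parenthesis is a deliberate simplification.

```python
ARITHMETIC = {"+", "-", "*", "/", "%"}


def fold(tokens):
    """Collapse number-operator-number runs into one number; drop parentheses."""
    out = []
    for kind, value in tokens:
        if kind in ("(", ")"):
            continue  # nesting depth is ignored in this simplified sketch
        if (kind == "1" and len(out) >= 2
                and out[-1][0] == "o" and out[-1][1] in ARITHMETIC
                and out[-2][0] == "1"):
            out.pop()  # discard the operator; the earlier number stands in
            continue
        out.append((kind, value))
    return out


def kinds(tokens):
    """Join token type codes into one string."""
    return "".join(kind for kind, _ in tokens)


# Hand-built token lists (assumed output of an earlier tokenizing step).
phone = [("1", "415"), ("o", "-"), ("1", "555"), ("o", "-"), ("1", "1212")]
nested = [("(", "("), ("(", "("), ("1", "1"), ("o", "+"), ("1", "2"),
          (")", ")"), (")", ")"), ("o", "*"), ("1", "3")]
attack = [("1", "1"), ("k", "OR"), ("1", "1"), ("o", "="), ("1", "1")]

print(kinds(fold(phone)))   # 1
print(kinds(fold(nested)))  # 1
print(kinds(fold(attack)))  # 1k1o1
```

The phone number and the nested arithmetic both shrink to a single number token, while the comparison in the attack sample survives because `=` is not treated as arithmetic here.
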
By parsing and analyzing tokens in this way, Galbreath found that his tool doesn't have to sift through bytes and bytes of user data to determine whether input is SQL injection or benign. In fact, through testing millions of samples of user input and SQL injection input, he found that the magic number of tokens needed to distinguish between SQL injection and benign input was just five.

"That's pretty interesting compared to regular expression, because then you're parsing the entire input. If you have 10 megs of input, it's going to be parsing 10 megs of data," he says. "This, as soon as it hits 5 tokens, done."

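The early exit Galbreath describes falls out naturally when tokens are produced lazily. The Python sketch below assumes a deliberately crude tokenizer that classifies whitespace-separated words, and a hypothetical `stats` dictionary that records how far scanning got; only the five-token limit comes from the talk. What a detector then does with the resulting string is outside this sketch.

```python
from itertools import islice

MAX_TOKENS = 5  # the number of tokens Galbreath reported as sufficient
KEYWORDS = {"select", "union", "from", "where", "and", "or"}


def lazy_tokens(text, stats):
    """Yield one type code per whitespace-separated word, scanning on demand."""
    i, n = 0, len(text)
    while i < n:
        while i < n and text[i].isspace():
            i += 1
        j = i
        while j < n and not text[j].isspace():
            j += 1
        if j > i:
            word = text[i:j]
            stats["chars_scanned"] = j  # how far into the input we have read
            if word.isdigit():
                yield "1"
            elif word.lower() in KEYWORDS:
                yield "k"
            else:
                yield "n"
        i = j


def fingerprint(text, stats):
    """Return the type codes of the first MAX_TOKENS tokens and then stop."""
    return "".join(islice(lazy_tokens(text, stats), MAX_TOKENS))


stats = {"chars_scanned": 0}
big_input = "1 UNION SELECT 1 FROM " + "x " * 1_000_000
print(fingerprint(big_input, stats))             # 1kk1k
print(stats["chars_scanned"] < len(big_input))   # True
```

Because `islice` stops pulling from the generator after five items, the roughly two-megabyte tail of the input is never scanned.
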

Don A. Bailey is a pioneer in security for mobile technology, the Internet of Things, and embedded systems. He has a long history of ground-breaking research, protecting mobile users from worldwide tracking systems, securing automobiles from remote attack, and mitigating ...
