Dark Reading is part of the Informa Tech Division of Informa PLC

7/26/2012
10:12 AM

Simplifying SQL Injection Detection

Black Hat researcher releases new lexical analysis tool that doesn't rely on regular expressions

Even many years after gaining prominence as one of the most popular and convenient ways for criminals to break into corporate databases through vulnerable web applications, SQL injection remains the apple of many a black hat hacker's eye. While plenty of factors conspire against enterprises trying to do a better job of preventing these attacks, one of the most fundamental is that SQL injection attacks are very difficult to detect. This week at Black Hat, a researcher released a new tool, meant to be embedded in applications, that makes detection easier.

Part of the problem with many existing detection mechanisms, including those in many web application firewalls, is their dependence on regular expressions, Nick Galbreath, director of engineering at Etsy, told his audience yesterday. Analysis using regular expressions quickly gets bogged down because SQL is such a rich, complicated language. He cited a 2005 Black Hat talk by Hansen and Patterson showing how regular expressions are prone to breaking down and producing false positives.

"So what happens is, a lot of the web application firewalls have sort of ended up using what I call regular expression soup," he says. "It's impossible to debug and test against. Regular expressions, no matter what you do, are gonna miss something, and something that you don't want is going to be flagged as a false positive."
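To make the "regular expression soup" problem concrete, here is a single hypothetical rule of the kind a filter might use; it is not taken from any real web application firewall or from the talk. It flags ordinary English text while missing a real injection that merely puts a newline between `UNION` and `SELECT`, because in Python's `re` module `.` does not match a newline unless `re.DOTALL` is set.

```python
import re

# Hypothetical, deliberately naive rule: "UNION" followed later by "SELECT".
NAIVE_RULE = re.compile(r"\bunion\b.*\bselect\b", re.IGNORECASE)

def naive_filter(user_input: str) -> bool:
    """Return True if the naive rule would block this input."""
    return NAIVE_RULE.search(user_input) is not None

# Benign text that happens to contain both words: a false positive.
benign = "Labor union members select a new steward"
# Valid SQL injection with a newline between the keywords: missed,
# because "." does not match "\n" without re.DOTALL.
attack = "1 UNION\nSELECT password FROM users"

print(naive_filter(benign))  # True  (false positive)
print(naive_filter(attack))  # False (false negative)
```

Adding `re.DOTALL` would fix this one case, but each such patch is one more pattern to maintain and test, which is the kind of accumulation Galbreath is describing.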

One of the big difficulties in analyzing user input for potential SQL injection is that it is very hard to automatically distinguish things like phone numbers or Twitter handles from the SQL snippets attackers inject.

"It turns out to be a difficult problem. How do you detect if user input is SQL, good input, or what? Is that my phone number or an arithmetic expression? Is it a Twitter handle or is it a SQL variable?" he says. "So trying to disambiguate these things turns out to be a hard problem."

As Galbreath examined that problem, he considered using existing SQL parsers to do the heavy lifting. But when he dove into them, he found that each parses only its own flavor of SQL and that none is really designed to handle partial bits of code. They're also hard to extend and are strict about correctness, because they're usually meant to ensure that code runs properly. Someone hunting for SQL injection, however, isn't so worried about correctness.

So instead of depending on tools not specifically meant for SQL injection analysis, Galbreath wrote his own.

"It sounds crazy, but it turns out it's pretty straightforward and not so bad, [because] we don't need it to actually run SQL," he says. "What it does is it converts input into a stream of tokens. There's a master list of keywords and functions which is sort of combined across all the major databases. It's not completely intractable, and it also handles the comments, strings, literals and all the weird cases and things like that."
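The tokenizing step Galbreath describes can be sketched in a few dozen lines. This is a minimal illustration under our own assumptions, not libinjection's source or API: the tiny `KEYWORDS` set stands in for the much larger cross-database master list, and the single-character token types (`k` keyword, `n` bareword, `1` number, `s` string, `o` operator, `c` comment, with punctuation standing for itself) are hypothetical labels chosen for readability. It never tries to validate the SQL; it only labels the pieces of possibly partial input.

```python
# Minimal SQL-ish tokenizer sketch (hypothetical; not libinjection's code).
# Token types: k=keyword, n=bareword/name, 1=number, s=string, o=operator,
# c=comment; the punctuation tokens ( ) , ; use themselves as their type.

# Hypothetical stand-in for the cross-database "master list" of keywords.
KEYWORDS = {"SELECT", "UNION", "FROM", "WHERE", "AND", "OR", "NOT",
            "INSERT", "UPDATE", "DELETE", "DROP", "NULL", "ORDER", "BY"}
# Multi-character operators are listed first so they win over single chars.
OPERATORS = ("<=", ">=", "<>", "!=", "||", "=", "<", ">", "+", "-", "*", "/", "%")

def tokenize(text):
    """Return a list of (type, value) tuples for arbitrary, possibly partial input."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1
        elif text.startswith("--", i) or ch == "#":
            # Line comment: runs to end of line (or end of input).
            end = text.find("\n", i)
            end = n if end == -1 else end
            tokens.append(("c", text[i:end]))
            i = end
        elif text.startswith("/*", i):
            # Block comment; an unterminated one runs to end of input.
            end = text.find("*/", i + 2)
            end = n if end == -1 else end + 2
            tokens.append(("c", text[i:end]))
            i = end
        elif ch in "'\"":
            # String literal; a doubled quote is an escaped quote, and an
            # unterminated string (common in partial input) runs to the end.
            j = i + 1
            while j < n:
                if text[j] == ch:
                    if j + 1 < n and text[j + 1] == ch:
                        j += 2
                        continue
                    j += 1
                    break
                j += 1
            tokens.append(("s", text[i:j]))
            i = j
        elif ch.isdigit():
            j = i
            while j < n and (text[j].isdigit() or text[j] == "."):
                j += 1
            tokens.append(("1", text[i:j]))
            i = j
        elif ch.isalpha() or ch in "_@":
            # Words, including "@name" (a SQL variable or a Twitter handle).
            j = i
            while j < n and (text[j].isalnum() or text[j] in "_@$"):
                j += 1
            word = text[i:j]
            tokens.append(("k" if word.upper() in KEYWORDS else "n", word))
            i = j
        elif ch in "(),;":
            tokens.append((ch, ch))
            i += 1
        else:
            # Longest matching operator, or any other single character.
            op = next((o for o in OPERATORS if text.startswith(o, i)), ch)
            tokens.append(("o", op))
            i += len(op)
    return tokens
```

For example, `tokenize("1 UNION SELECT password FROM users")` yields a number followed by keyword, keyword, bareword, keyword, while a Twitter handle such as `@etsy` comes out as a single bareword token.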

Called libinjection, the tool is an open source C library that takes a lexical analysis approach and was trained on real user input from Etsy, a top-50 Internet site with a rich base of user input data. Because it works on tokens, the tool is lightweight and streamlines the analysis of user data.

"So it goes through, disambiguates, merges tokens, specializes, merges strings together, does all the stuff it needs to do, and then it does one last step, which is really designed to reduce false positives," he says. "If it sees a bunch of arithmetic operations together, it just merges them all together. My phone number just turns into 1. We don't actually care what the value is, because SQL injection doesn't care what the value is, just that there's a number there. Same thing with multiple nested parentheses: it just gets rid of them."
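The last step Galbreath describes can be illustrated with a small folding pass. This sketch is our own simplified reading of the quote, not libinjection's code: it works on hypothetical `(type, value)` token pairs (`1` number, `o` operator, parentheses as themselves), collapses any number-operator-number chain into a single number token whose value is just `"1"`, and squeezes runs of identical parentheses down to one. Comparison operators such as `=` are deliberately left alone in this sketch, since a payload like `1=1` depends on them.

```python
# Hypothetical folding pass, a simplified reading of the step described above.
# Tokens are (type, value) pairs: "1" = number, "o" = operator,
# "(" and ")" = parentheses; any other types pass through unchanged.
ARITHMETIC = {"+", "-", "*", "/", "%"}

def fold(tokens):
    """Merge arithmetic chains into one number and collapse repeated parens."""
    out = []
    i = 0
    while i < len(tokens):
        ttype, _value = tokens[i]
        if ttype == "1":
            # Swallow trailing "<arithmetic op> <number>" pairs,
            # so 555 - 1212 collapses to a single number token.
            j = i + 1
            while (j + 1 < len(tokens)
                   and tokens[j][0] == "o" and tokens[j][1] in ARITHMETIC
                   and tokens[j + 1][0] == "1"):
                j += 2
            out.append(("1", "1"))  # the value is irrelevant, only "a number" matters
            i = j
        elif ttype in ("(", ")") and out and out[-1][0] == ttype:
            # Drop a parenthesis identical to the previous token: "(((" becomes "(".
            i += 1
        else:
            out.append(tokens[i])
            i += 1
    return out

phone = [("1", "555"), ("o", "-"), ("1", "1212")]
print(fold(phone))  # [('1', '1')]
```

Applied to a phone number tokenized as number, minus, number, the pass returns a single number token, in the spirit of Galbreath's phone-number example, while `OR 2*3=6` keeps its shape as keyword, number, `=`, number.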

By parsing and analyzing tokens this way, Galbreath's tool doesn't have to sift through bytes and bytes of user data to decide whether input is SQL injection or benign. In fact, through testing millions of benign and SQL injection inputs, he found that the magic number of tokens needed to distinguish between the two was just five.

"That's pretty interesting compared to regular expressions, because then you're parsing the entire input. If you have 10 megs of input, it's going to be parsing 10 megs of data," he says. "With this, as soon as it hits five tokens, [it's] done."
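The early-exit idea can be sketched as follows. This is an assumption-laden illustration, not libinjection's interface: a crude lexer built on Python's `re.finditer` (which yields matches lazily, and is used here only to split tokens, not to judge maliciousness), `itertools.islice` to stop after `MAX_TOKENS = 5` tokens, and a hand-written `KNOWN_SQLI_FINGERPRINTS` set that is invented for this example. The article says the real tool was trained on real user input; that data is not reproduced here.

```python
import itertools
import re

MAX_TOKENS = 5  # the token budget cited in the talk

# Hypothetical keyword subset; a real tool's list would be far larger.
KEYWORDS = {"SELECT", "UNION", "FROM", "WHERE", "AND", "OR", "NOT", "DROP"}

# Crude lexer: the regex only splits tokens; it does not decide maliciousness.
TOKEN_RE = re.compile(
    r"(?P<num>\d+)"
    r"|(?P<word>[A-Za-z_@]\w*)"
    r"|(?P<str>'[^']*'?)"
    r"|(?P<op>[=<>!+\-*/%]+)"
    r"|(?P<punct>[(),;])"
)

# Invented fingerprints for illustration only (not from any real data set).
KNOWN_SQLI_FINGERPRINTS = {"1kknk", "1k1o1"}

def fingerprint(text: str) -> str:
    """Concatenate the type codes of at most MAX_TOKENS leading tokens."""
    codes = []
    # finditer is lazy and islice stops pulling after MAX_TOKENS matches,
    # so scanning ends once the fifth token has been matched.
    for m in itertools.islice(TOKEN_RE.finditer(text), MAX_TOKENS):
        kind = m.lastgroup
        if kind == "num":
            codes.append("1")
        elif kind == "word":
            codes.append("k" if m.group().upper() in KEYWORDS else "n")
        elif kind == "str":
            codes.append("s")
        elif kind == "op":
            codes.append("o")
        else:
            codes.append(m.group())  # punctuation stands for itself
    return "".join(codes)

def looks_like_sqli(text: str) -> bool:
    """Flag input whose five-token fingerprint is in the (hypothetical) known set."""
    return fingerprint(text) in KNOWN_SQLI_FINGERPRINTS
```

Here `fingerprint("1 OR 1=1")` gives `"1k1o1"` and is flagged, while a phone number such as `555-1212` gives `"1o1"` and passes. Appending megabytes of trailing text to the first input does not change the result, and the scan still stops at the fifth token.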

Don A. Bailey is a pioneer in security for mobile technology, the Internet of Things, and embedded systems. He has a long history of ground-breaking research, protecting mobile users from worldwide tracking systems, securing automobiles from remote attack, and mitigating ...
