Neural fuzzing can help uncover bugs in software better than traditional tools, company says.

Microsoft has developed a new technique to test software for security flaws that uses deep neural networks and machine learning techniques to improve upon current testing approaches.

Early experiments with the new "neural fuzzing" method show that it can help organizations uncover bugs in software better than traditional fuzzing tests, according to Microsoft.

Fuzzing is an approach where security testers try to uncover commonly exploitable flaws in applications by targeting the software with deliberately malformed data—or inputs—to see if it will crash so they can investigate and fix the cause.
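A minimal sketch of that loop, for illustration only: the `parse_record` target, its crash condition, and the single-byte mutation strategy are all hypothetical and are not taken from Microsoft's work or from any real fuzzer.

```python
import random

def parse_record(data: bytes) -> bytes:
    """Hypothetical toy parser: the first byte is a length, the rest is payload."""
    if not data:
        raise ValueError("empty input")  # graceful rejection, not a crash
    length = data[0]
    payload = data[1:]
    if length > len(payload):
        # A real C parser might read out of bounds here; we simulate that crash.
        raise IndexError("declared length exceeds payload")
    return payload[:length]

def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte of the seed with a random value."""
    buf = bytearray(seed)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

def fuzz(seed: bytes, iterations: int, rng: random.Random) -> list[bytes]:
    """Feed mutated inputs to the target and collect the ones that crash it."""
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            parse_record(candidate)
        except ValueError:
            continue  # input was rejected cleanly
        except IndexError:
            crashes.append(candidate)  # keep the input for later investigation
    return crashes

crashing_inputs = fuzz(b"\x03abc", iterations=1000, rng=random.Random(0))
print(f"{len(crashing_inputs)} crashing inputs found")
```

Each saved input is a reproducible test case that a developer can replay against the parser to locate and fix the underlying flaw.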

The effectiveness of these tests depends to a large extent on how well the malicious inputs are crafted. Generally, the more inputs you find that trigger a crash, the more application vulnerabilities you are likely to identify and close.

Security testers can use a few different methods to generate the malicious or malformed input.

Microsoft's approach improves on one method that uses data from previous fuzz tests as feedback for creating new mutations or tests. The approach involves a "learning technique that uses neural networks to learn patterns in the input files from past fuzzing explorations to guide future fuzzing explorations," according to Microsoft.

"The neural models learn a function to predict good (and bad) locations in input files to perform fuzzing mutations based on the past mutations and corresponding code coverage information," the company said.
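The idea of scoring input locations from past results can be sketched as follows. This is not Microsoft's model: a simple per-position success rate stands in for the neural network, and the function names and the history log are hypothetical.

```python
from collections import defaultdict

def learn_location_scores(history):
    """history: iterable of (position, gained_coverage) pairs from past mutations.

    Returns {position: fraction of mutations at that byte that reached new code}.
    A frequency table is used here purely as a stand-in for a learned model.
    """
    tried = defaultdict(int)
    useful = defaultdict(int)
    for position, gained_coverage in history:
        tried[position] += 1
        if gained_coverage:
            useful[position] += 1
    return {pos: useful[pos] / tried[pos] for pos in tried}

def best_locations(scores, k):
    """Return the k highest-scoring positions (ties broken by lower position)."""
    return sorted(scores, key=lambda pos: (-scores[pos], pos))[:k]

# Hypothetical log: mutating bytes 0 and 1 (say, a header) often reached new
# code, while mutating bytes 8 and 9 (say, padding) never did.
past = [(0, True), (0, True), (0, False), (1, True),
        (8, False), (8, False), (9, False)]
scores = learn_location_scores(past)
top = best_locations(scores, 2)
print(top)  # [1, 0]
```

A fuzzer using such scores would concentrate its mutations on the "good" locations and spend less time on bytes that have never led to new code paths.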

Neural networks are computing systems loosely modeled on the human brain that can autonomously learn from observed data. Many modern applications such as face and voice recognition and weather prediction use such networks.

Microsoft has implemented the new technique in American Fuzzy Lop (AFL), an open-source fuzzer that uses observed behavior from previous fuzzing executions to guide future tests. Researchers from the company tested the method on four target programs using parsers for the ELF, XML, PNG and PDF file formats.
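The feedback loop that this kind of fuzzer relies on can be illustrated with a toy example. The sketch below is not AFL's actual algorithm or code: the instrumented target, its branch names, and the mutation step are hypothetical, and only the general principle (keep any input that reaches code not seen before) is shown.

```python
import random

def target_coverage(data: bytes) -> frozenset:
    """Hypothetical instrumented target: returns the IDs of the branches it ran."""
    hit = {"entry"}
    if len(data) > 0 and data[0] == ord("P"):
        hit.add("magic_1")
        if len(data) > 1 and data[1] == ord("N"):
            hit.add("magic_2")
            if len(data) > 2 and data[2] == ord("G"):
                hit.add("magic_3")
    return frozenset(hit)

def coverage_guided_fuzz(seed: bytes, iterations: int, rng: random.Random):
    """Mutate inputs from a growing corpus, keeping those that reach new branches."""
    corpus = [seed]
    seen = set(target_coverage(seed))
    for _ in range(iterations):
        parent = rng.choice(corpus)
        child = bytearray(parent)
        child[rng.randrange(len(child))] = rng.randrange(256)
        child = bytes(child)
        new_branches = target_coverage(child) - seen
        if new_branches:
            # Feedback step: this input exercised new code, so future
            # mutations may build on it.
            seen |= new_branches
            corpus.append(child)
    return corpus, seen

corpus, seen = coverage_guided_fuzz(b"AAAA", iterations=50000, rng=random.Random(1))
print(len(corpus), sorted(seen))
```

Because inputs that pass the first check are kept and mutated further, the loop can reach the nested branches one byte at a time instead of having to guess all three bytes at once.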

The results were "very encouraging," Microsoft Development Lead William Blum said in a blog post Monday. "We saw significant improvements over traditional AFL in terms of code coverage, unique code paths and crashes for the four input formats," he said.

For instance, for the ELF parser, the neural AFL reported more than 20 crashes compared to zero with the traditional AFL fuzzer. Similarly, the neural AFL found 38% more crashes than traditional AFL for text-based file formats such as XML.

"This is astonishing given that neural AFL was trained on AFL itself," Blum said. The only area where the neural AFL did not perform as well was with the PDF format, likely because of the large size of the files.

"AFL is essentially a mutational fuzzer that uses feedback from the target software to determine how the test cases are mutated," says Jonathan Knudsen, security strategist at Synopsys. "Microsoft has modified the feedback mechanism with a neural network."

The challenge with any fuzzing lies in finding the finite set of malformed inputs that are most likely to trigger bugs, Knudsen says. "Microsoft has modified how AFL chooses its inputs in a way that gave them better code coverage and more bugs for a couple of applications."

Moreno Carullo, co-founder and chief technical officer of Nozomi Networks, says Microsoft has broken new ground with this technique. "It is absolutely a new [and] innovative approach," Carullo says.

"Neural fuzzing increases the speed of fuzzing as a test method and also finds more issues that cause a crash," he notes. The automation behind the approach can drastically reduce the time organizations require to test their software for security vulnerabilities. "Without this, it would take a team of heuristics experts a lot more time to discover the issues."

According to Blum, the capability that Microsoft has demonstrated only scratches the surface of what can be achieved using neural networks in fuzzing. "Right now, our model only learns fuzzing locations, but we could also use it to learn other fuzzing parameters such as the type of mutation or strategy to apply," he said.

About the Author(s)

Jai Vijayan, Contributing Writer

Jai Vijayan is a seasoned technology reporter with over 20 years of experience in IT trade journalism. He was most recently a Senior Editor at Computerworld, where he covered information security and data privacy issues for the publication. Over the course of his 20-year career at Computerworld, Jai also covered a variety of other technology topics, including big data, Hadoop, Internet of Things, e-voting, and data analytics. Prior to Computerworld, Jai covered technology issues for The Economic Times in Bangalore, India. Jai has a Master's degree in Statistics and lives in Naperville, Ill.
