I had an interesting conversation recently about the after-effects of Heartbleed and the challenges facing static analysis with Barton Miller, the chief scientist of the Software Assurance Marketplace (SWAMP), which is a project I'm sponsoring at the Department of Homeland Security to improve software quality and raise the bar of static analysis capabilities.
I wanted to know whether the problems associated with static analysis can be attributed to a lackluster analysis engine. Are the core engines in static analysis tools robust enough to keep pace with the complexity and size of modern software? These tools appear to lack depth and breadth, so they oversimplify and make inaccurate assumptions about code; as a result, they miss (simple) things and produce a large number of false positives.
False positives can be annoying and very time consuming for developers to triage. However, when static analysis tools miss things, users need to know what was missed. That's why it is important for me, in the role that I'm in, to sponsor and support tool studies to understand what a tool can and cannot do. Examples include the studies run by the National Institute of Standards and Technology (NIST) through its Software Assurance Metrics and Tool Evaluation (SAMATE) program, the tool study from the National Security Agency (NSA) Center for Assured Software, and a tool study I'm sponsoring through the Security and Software Engineering Research Center (S2ERC).
Tool studies are essential for improving static analysis capabilities. They model the behavior of tools to help identify gaps in techniques, and they provide some evidence of a tool's strengths and weaknesses. It is important to note that tools perform differently on different program structures. Not all Java code is written the same, and not all C/C++ code is written the same, so the program structure (as seen with OpenSSL) strongly affects how static analysis tools perform. My end goal with tool studies is to understand where the gaps are and then innovate: sponsor research and development projects that create new techniques and capabilities to advance the state of the art, specifically by improving open-source static analysis capabilities.
Many organizations rely on a variety of development contracts (some outsourced), with a host of developers who have different coding styles and use different programming languages to support their enterprise-wide application environments. Given this complexity, it is not realistic for an organization to use one static analysis tool to satisfy all of its software assurance needs.
Misunderstanding what static analysis can and cannot do also creates residual risk in many organizations: weaknesses are present in the code, but the tool is not able to produce evidence that ties them to a particular coding violation. An organization that uses static analysis to assess a new system or application gets a report from a tool, remediates what is stated in the report, and then deploys that system or application in a production network without knowing what risks remain. That residual risk could give an adversary an attack vector to exploit vulnerable systems.
There is no über tool; all tools struggle to some degree with coverage. Every tool has a sweet spot (or several sweet spots), something it does very well. For instance, some tools may be really good at identifying SQL injection, cross-site scripting, or code quality issues, but may not analyze other weakness classes as well.
The results from tool studies suggest that using multiple tools together can improve coverage and the accuracy of results. As tool studies produce more powerful analytics and results, mixing and matching tools (open-source and commercial) can help organizations reach deeper into their code and reduce the residual risk associated with static analysis.
This is where SWAMP plays a key role in helping create better tools and in giving software researchers a way to discover new techniques and capabilities in static analysis. The SWAMP analysis framework uses CodeDx to bring together the many sweet spots of static analysis tools. CodeDx takes results from disparate static analysis tools, then normalizes and correlates them. I like to say, "The sum of many is better than the sum of one."
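To make the idea of normalizing and correlating results more concrete, here is a minimal sketch in Python. It is not how CodeDx works internally; it only illustrates the general technique. The tool names (`tool_a`, `tool_b`), the rule names, the report fields, and the sample findings are all made up for illustration. The sketch assumes each tool's rule can be mapped to a Common Weakness Enumeration (CWE) identifier and that two findings describe the same issue when they share a file, a line, and a CWE.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical mapping from each tool's native rule name to a shared CWE id.
RULE_TO_CWE = {
    ("tool_a", "SQL_INJECTION"): "CWE-89",
    ("tool_b", "sqli"): "CWE-89",
    ("tool_a", "XSS_REFLECTED"): "CWE-79",
    ("tool_b", "buffer-overrun"): "CWE-120",
}


@dataclass(frozen=True)
class Finding:
    tool: str
    path: str
    line: int
    cwe: str


def normalize(tool, raw):
    """Convert one tool-specific record into the shared Finding shape."""
    cwe = RULE_TO_CWE.get((tool, raw["rule"]), "UNMAPPED")
    path = raw["file"].replace("\\", "/")  # unify path separators
    return Finding(tool, path, int(raw["line"]), cwe)


def correlate(findings):
    """Group findings by (path, line, cwe) and record which tools reported each."""
    groups = defaultdict(set)
    for f in findings:
        groups[(f.path, f.line, f.cwe)].add(f.tool)
    return dict(groups)


# Made-up reports from two imaginary tools, each in its own format quirks.
reports = {
    "tool_a": [
        {"rule": "SQL_INJECTION", "file": "src/db.c", "line": "42"},
        {"rule": "XSS_REFLECTED", "file": "src/web.c", "line": 10},
    ],
    "tool_b": [
        {"rule": "sqli", "file": "src\\db.c", "line": 42},
        {"rule": "buffer-overrun", "file": "src/buf.c", "line": 7},
    ],
}

findings = [normalize(tool, raw) for tool, raws in reports.items() for raw in raws]
merged = correlate(findings)

for (path, line, cwe), tools in sorted(merged.items()):
    print(f"{path}:{line} {cwe} reported by {sorted(tools)}")
```

In this toy data, the SQL injection finding is reported by both tools and collapses into a single entry, while the other two weaknesses are each caught by only one tool. That is the "sweet spot" effect in miniature: the merged view covers more than either report alone. A real correlation engine would need fuzzier matching than an exact file-and-line comparison.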