OWASP Benchmark Clarifications
Jason, this discussion is great and I'm thrilled that the OWASP Benchmark is driving improvements in application vulnerability detection tools. But I did want to add a few clarifications on how the Benchmark works.
In your Benchmark results table, you indicate: "True Positives detected by Fortify SCA, and declared Secure by Benchmark" - 9,206. While it's great that Fortify found all these additional vulnerabilities in the Benchmark, the Benchmark makes no claim that there are no other vulnerabilities in it beyond the ones specifically tested for and scored. Any such results found by any tool are simply ignored by the Benchmark scoring system, so they have no effect on the score one way or the other. So saying that Fortify found a bunch of issues the project wasn't aware of and other tools did not find simply isn't accurate. Most of the tools we tested found a bunch of additional issues, just like Fortify did.
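To make "ignored" concrete, here is a minimal sketch of that scoring behavior. It is not the project's actual scorecard code, and the test-case names and expected-results mapping are hypothetical:

```python
# Minimal sketch (not the real Benchmark scorecard generator) of how
# findings outside the scored test cases are ignored. The test-case
# names and the 'expected' mapping below are hypothetical.

# The Benchmark only scores findings that map to a known test case.
expected = {
    "BenchmarkTest00001": True,   # True  = test case is a real vulnerability
    "BenchmarkTest00002": False,  # False = test case is safe (true negative)
}

tool_findings = ["BenchmarkTest00001", "SomeOtherIssueTheToolFlagged"]

tp = fp = 0
for finding in tool_findings:
    if finding not in expected:
        continue  # findings outside the scored test cases have no effect
    if expected[finding]:
        tp += 1   # correctly flagged a real vulnerability
    else:
        fp += 1   # flagged a safe test case

print(tp, fp)  # -> 1 0 : the extra, unscored finding changed nothing
```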
As part of our 1.2 effort, we have eliminated a number of unintended vulnerabilities of the types tested for in the Benchmark, particularly XSS. This is an ongoing effort, and we have more work to do there. In fact, if you can send us your results, we'll be happy to use them to help us track down and eliminate more of them. That said, these 'extra' vulnerabilities are, and should be, ignored, as they simply aren't measured or scored.
You also mention: "False Positives reported by Fortify SCA" - 4,852. In Benchmark v1.1, there are 9,206 true negative test cases, meaning 9,206 test cases that are safe and do not contain the type of vulnerability they test for. Fortify reported 4,852 of them as actual vulnerabilities (false positives, as you said). The Benchmark scores that as 4,852 out of 9,206, which is a 52.7% false positive rate. So if your true positive rate is actually 100%, as you claim, the Benchmark would produce an overall score for Fortify of 100% - 52.7% = 47.3%. That score is higher than the scores the project has seen with the results we were able to generate, so we are pleased that your team's efforts have improved Fortify's score against the Benchmark and that your customers will ultimately benefit from these improvements.
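For anyone who wants to reproduce that arithmetic, here is a small sketch; the formula (true positive rate minus false positive rate) matches the calculation in the paragraph above:

```python
# Worked version of the arithmetic above: the Benchmark score is the
# true positive rate minus the false positive rate.
true_negative_cases = 9206   # safe test cases in Benchmark v1.1
false_positives = 4852       # safe cases Fortify reported as vulnerable

fpr = false_positives / true_negative_cases   # 0.527 -> 52.7%
tpr = 1.0                                     # 100%, as claimed

score = tpr - fpr                             # 0.473 -> 47.3%
print(f"FPR = {fpr:.1%}, score = {score:.1%}")
```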
I think discussions like this are incredibly healthy, and I hope lots of vendors of both commercial and free tools will get involved to make both the OWASP Benchmark project and their tools better for the community we both serve. Given the number of discussions I'm having with project participants at OWASP, the conversation is just getting started, and many tools, including Fortify, are getting better already. In fact, I'm going to talk about exactly that at my OWASP AppSec USA talk on the Benchmark project tomorrow afternoon at 4. If any of you are around, please come by!
Dave Wichers
OWASP Benchmark Project Lead
9/25/2015 | 5:49:01 PM