Dark Reading is part of the Informa Tech Division of Informa PLC



02:15 PM

Black Hat, Data Science, Machine Learning, and… YOU!

The time has come for security pros to start honing their machine learning skills. Here's why.

Black Hat 2015 is in the books, and once again it was a crazy year in Las Vegas, with a mountain of stellar speakers and content. There is an incredible amount to learn in our industry, and all of us struggle to keep up. So when Dark Reading asked me to write a post about an interesting session, I was happy to oblige. In some ways, I have a bit of an edge. As a longtime Black Hat speaker, I have the honor of serving on the speaker review board that helps select many of the presentations on the schedule, all of which I’m always excited to see firsthand!

At the same time, choosing which sessions to attend is a serious challenge, because there’s simply not enough time to see them all. In the end, I narrowed in on a talk about data science and machine learning, presented by Invincea Labs’ Joshua Saxe, to discuss in this blog post.

Josh’s research and presentation dove into why data science and machine learning apply to malware, in particular to malware detection, threat intelligence, malware analysis, and the scalability of malware analytics. For me, this application of advanced math is absolutely fascinating. So few of us in the information security industry have a background in this stuff, and it’s time to get familiar. The sooner the better! I’m also incredibly curious about how data science and machine learning may apply to other areas of security, particularly application security, my day job and passion.

In his presentation, Josh correctly illustrates that the battle we face in security is asymmetric: the bad guys’ workload remains constant, while the good guys’ workload keeps increasing. Bottom line: given enough time, and right now it ain’t a lot, the bad guys win, which is to say, we get hacked. Yeah, we already know that. Ahead of the hack, the net effect on a defender’s day job is being buried in a sea of log data, which is not only annoying but expensive.

Malware identification, analysis, and classification are now more difficult than ever, and their effectiveness is little better than a coin flip. For corporate defenders, there is simply too much log data to ever sort through manually, no matter how proficient you are with grep. Getting real, actionable value at scale requires machine learning to isolate the signal from the noise.

On stage, Josh showed a visualization of the fight, using malware as an example. SUPER cool stuff! He showed how to take in a large set of sample applications and analyze them with machine learning algorithms to detect malware. These algorithms cluster related malware strains together, making them easy to identify with the naked eye. The process can happen in seconds or minutes, which is exactly what we need to make security decisions quickly, versus manual log analysis, which can take months or never happen at all. We’ll lose every battle that way.

As more malware samples are identified and analyzed, the machine learning algorithms become smarter and malware families start to cluster together. The machine gets smarter with each new strain, making it harder for malware or viruses to break through successfully without adapting, which increases the attackers’ costs.
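The clustering idea can be sketched in a few lines. This is a minimal illustration, not Saxe’s or Invincea’s implementation. It assumes each sample is represented by the set of printable strings it contains (a crude, hypothetical feature choice; real systems use much richer features), measures pairwise Jaccard similarity, and links any two samples at or above an arbitrary threshold into the same cluster (single-linkage clustering via union-find). The names `printable_strings`, `jaccard`, and `cluster_samples` are illustrative only.

```python
from __future__ import annotations

import re
from itertools import combinations


def printable_strings(data: bytes, min_len: int = 5) -> set[bytes]:
    """Hypothetical feature extractor: runs of printable ASCII, similar to Unix `strings`."""
    return set(re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data))


def jaccard(a: set[bytes], b: set[bytes]) -> float:
    """Jaccard similarity: shared features divided by all distinct features."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_samples(samples: dict[str, bytes], threshold: float = 0.5) -> list[set[str]]:
    """Single-linkage clustering: samples at or above `threshold` similarity end up together."""
    features = {name: printable_strings(data) for name, data in samples.items()}
    parent = {name: name for name in samples}

    def find(x: str) -> str:
        # Walk to the root of x's tree, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Compare every pair (O(n^2): fine for a sketch, not for millions of samples).
    for a, b in combinations(samples, 2):
        if jaccard(features[a], features[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in samples:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


if __name__ == "__main__":
    # Made-up byte blobs standing in for real binaries.
    samples = {
        "a.exe": b"\x00\x01connect_c2_server\x00steal_passwords\x00http://evil.example\x00",
        "b.exe": b"\xffconnect_c2_server\x00steal_passwords\x00http://evil.example\x00extra_module\x00",
        "c.exe": b"\x00hello_world\x00print_report\x00",
    }
    for group in cluster_samples(samples):
        print(sorted(group))
```

With these made-up samples, `a.exe` and `b.exe` share three of four distinct strings (Jaccard 0.75) and land in one cluster, while `c.exe` stays on its own. Every new sample adds more pairs that can be linked, which is one loose way to see why families become more visible as the corpus grows.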

Josh also cautioned that algorithms tend to go stale over time; they must evolve, or they risk generating an increasing volume of time-wasting false positives. The idea here, and hopefully the eventual outcome, is that the workload remains fixed for the good guys while the bad guys have to increase theirs. Turning the economic tables!
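The post doesn’t say how to tell when an algorithm has gone stale. One simple, assumed approach is to keep a sliding window of analyst verdicts on recent alerts and flag the model for retraining when the false-positive rate in that window drifts above an expected baseline. The class name `DriftMonitor` and the window, baseline, and tolerance values are hypothetical, chosen only for illustration.

```python
from collections import deque


class DriftMonitor:
    """Hypothetical staleness check based on the false-positive rate of recent alerts."""

    def __init__(self, window: int = 100, baseline_fp_rate: float = 0.05, tolerance: float = 0.05):
        # Each entry is True if an analyst marked that alert as a false positive.
        # maxlen makes the deque drop the oldest verdict once the window is full.
        self.verdicts = deque(maxlen=window)
        self.baseline = baseline_fp_rate
        self.tolerance = tolerance

    def record(self, was_false_positive: bool) -> None:
        self.verdicts.append(was_false_positive)

    def fp_rate(self) -> float:
        return sum(self.verdicts) / len(self.verdicts) if self.verdicts else 0.0

    def needs_retraining(self) -> bool:
        # Wait for a full window so a few early alerts can't trigger a retrain on their own.
        window_full = len(self.verdicts) == self.verdicts.maxlen
        return window_full and self.fp_rate() > self.baseline + self.tolerance
```

For example, with `window=10`, `baseline_fp_rate=0.1`, and `tolerance=0.1`, nine good alerts and one false positive give a rate of 0.1 and no retrain. Three more false positives push the oldest good alerts out of the window, the rate climbs to 0.4, and `needs_retraining()` returns `True`.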

Data science and machine learning can be applied to application security in the same way. Machine learning can identify application entry points, vulnerabilities, defects, and more in code much faster than most current technology, let alone a human working manually. This is a technique WhiteHat has used successfully for the past decade in login detection, 404 page detection, page crawling, attack surface detection, and other areas where machine learning is crucial.
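The post doesn’t describe how WhiteHat’s 404 detection works, so the sketch below is not its method. It uses a simple, assumed similarity heuristic rather than a trained model to illustrate the underlying problem: many sites answer requests for missing pages with a "soft 404," an error page served with status 200. A scanner can fetch a URL that almost certainly doesn’t exist to get a baseline, then treat any page whose normalized body closely matches that baseline as "not found." The function names and the 0.9 threshold are hypothetical; `difflib.SequenceMatcher` is the Python standard library’s sequence matcher.

```python
import difflib
import re


def normalize(body: str, requested_path: str) -> str:
    """Strip the echoed path and collapse whitespace so two error pages compare equal."""
    # Custom error pages often repeat the requested path back to the user.
    text = body.replace(requested_path, "") if requested_path else body
    return re.sub(r"\s+", " ", text).strip().lower()


def is_soft_404(status: int, body: str, path: str,
                baseline_body: str, baseline_path: str,
                threshold: float = 0.9) -> bool:
    """Return True if the response is a real 404 or looks like the site's 'not found' page."""
    if status == 404:
        return True
    similarity = difflib.SequenceMatcher(
        None, normalize(body, path), normalize(baseline_body, baseline_path)
    ).ratio()
    return similarity >= threshold
```

If the baseline for `/qz8x1k-does-not-exist` is a templated "The page ... could not be found." response with status 200, then the same template returned for `/admin/old` normalizes to an identical string and is flagged, while a genuine admin login page scores well below the threshold. A production crawler would need more than this (dynamic content, several baselines per directory), which is where learned models become useful.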

Taking this same idea into other areas of security will speed up the process of making the Internet a safer and more secure place for everyone. Machine learning is unquestionably a very powerful tool, and I believe it points the way forward for the industry. I enjoyed Josh’s talk very much and highly recommend it to other practitioners. I, for one, am going to start honing my machine learning skills. The hardest part is knowing where to begin, but this is as good a place as any.

[Learn more about the pitfalls and promises of data science directly from Josh Saxe in his Dark Reading Radio interview with Community Editor Marilyn Cohodas.]

Jeremiah Grossman, Chief of Security Strategy, SentinelOne, Professional Hacker, Black Belt in Brazilian Jiu-Jitsu, & Founder of WhiteHat Security. Jeremiah Grossman's career spans nearly 20 years. He has lived a literal lifetime in computer security to become one of the ...

User Rank: Apprentice
1/26/2016 | 10:41:02 AM
Data Science & Machine Learning Training at BlackHat
I totally agree with this article. As data volumes increase and the stakes get higher, data science and machine learning are going to become indispensable tools for security analysts. It's with this in mind that my team and I are offering two courses for network professionals, the Crash Course in Data Science and the Crash Course in Machine Learning, at Black Hat 2016. If you come, you'll definitely walk away with some useful skills that can help you protect your network.
User Rank: Ninja
8/17/2015 | 2:51:47 PM
Machine Learning
The ending line of that article sounded like the ending statement to a post-apocalyptic movie.

I agree heuristics are going to play a major role in future security measures and techniques. How is the main question! Your guess is as good as mine.