Dark Reading is part of the Informa Tech Division of Informa PLC



02:15 PM

Black Hat, Data Science, Machine Learning, and YOU!

The time has come for security pros to start honing their machine learning skills. Here's why.

Black Hat 2015 is in the books, and once again it was a crazy year in Las Vegas with a mountain of stellar speakers and content. There is an incredible amount to learn in our industry, and all of us struggle to keep up. So when Dark Reading asked me to author a post on an interesting session, I was happy to oblige. In some ways, I have a bit of an edge. As a longtime Black Hat speaker, I have the honor of serving on the speaker review board to help select many of the presentations on the schedule, all of which I’m always excited to see firsthand!

At the same time, choosing which sessions to attend is a serious challenge, because there’s simply not enough time to see them all. In the end, I settled on a talk about data science and machine learning, presented by Invincea Labs’ Joshua Saxe, to discuss in this blog post.

Josh’s research and presentation dove into the reasons why data science and machine learning apply to malware, and in particular malware detection, threat intelligence, malware analysis, as well as the scalability of malware analytics. For me, this particular technology and application of advanced math is absolutely fascinating. So few of us in the information security industry have a background in this stuff, and it’s time to get familiar. The sooner the better! I’m also incredibly curious about how data science and machine learning may apply to other areas of security, particularly application security – my day job and passion.

In his presentation, Josh correctly illustrates that the battle we face in security is asymmetric: the bad guys’ workload remains constant, while the good guys’ work is ever increasing. Bottom line, given enough time, and right now it ain’t a lot, the bad guys win, which is to say, we get hacked. Yeah, we already know that. The net effect on a defender’s day job, ahead of the hack, is being buried in a sea of log data, which is not only annoying, but expensive.

Malware identification, analysis, and classification are now more difficult than ever and effectiveness is little better than a coin flip. For corporate defenders, there is simply too much log data to ever sort through manually, no matter how proficient you are at grep. Getting any real actionable value at scale requires machine learning to isolate signals in the noise.

On stage, Josh showed a visualization of the fight using malware as an example. SUPER cool stuff! He showed how to take in a bunch of sample applications and analyze them with machine learning algorithms to detect malware. These algorithms cluster related malware strains together, making them easier to identify with the naked eye. This process can happen in seconds or minutes, which is exactly what we need to make security decisions quickly, versus manual log analysis, which can take months or never happen at all. We’ll lose every battle that way.

The more malware samples the algorithms process, the smarter they become, and the more clearly malware families start to cluster together. The machine gets smarter with each new strain, making it harder for malware or viruses to break through without adapting, which increases the attackers’ costs.
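To make the clustering idea concrete, here is a minimal sketch in Python. It is not the method from Josh’s talk, which this post does not detail. It assumes each sample has already been reduced to a set of features (the file names and imported API names below are made up for illustration), measures overlap with Jaccard similarity, and links any two samples at or above a hypothetical threshold of 0.5.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_samples(features: dict, threshold: float = 0.5) -> list:
    """Group samples whose feature sets are at least `threshold` similar.

    Single-linkage clustering via union-find: if A resembles B and B
    resembles C, all three land in the same cluster.
    """
    parent = {name: name for name in features}

    def find(x: str) -> str:
        # Walk up to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(features, 2):
        if jaccard(features[a], features[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in features:
        groups.setdefault(find(name), set()).add(name)
    # Sort clusters by their member names so the output is deterministic.
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical samples: each maps to the set of API names it imports.
samples = {
    "a.exe": {"CreateRemoteThread", "WriteProcessMemory", "VirtualAllocEx", "connect"},
    "b.exe": {"CreateRemoteThread", "WriteProcessMemory", "VirtualAllocEx", "send"},
    "c.exe": {"RegSetValueExA", "CryptEncrypt", "FindFirstFileA", "DeleteFileA"},
    "d.exe": {"RegSetValueExA", "CryptEncrypt", "FindFirstFileA", "MoveFileA"},
    "e.exe": {"MessageBoxA", "GetOpenFileNameA"},
}

clusters = cluster_samples(samples)
for cluster in clusters:
    print(sorted(cluster))
```

With this toy data the two injector-like samples and the two encryptor-like samples each share three of five distinct imports (similarity 0.6), so they pair up, while `e.exe` stays on its own. Real systems would use far richer features and a more careful similarity measure; the pairwise loop here is quadratic and only suitable for small sets.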

Josh also cautioned that algorithms tend to go stale over time: they must evolve or risk generating an increasing volume of time-wasting false positives. The idea here, and hopefully the eventual outcome, is that the workload will remain fixed for the good guys, while the bad guys have to increase theirs. Turning the economic tables!
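One simple way to notice a model going stale is to track how often analysts dismiss its alerts. The sketch below is an assumption-laden illustration, not something presented in the talk: the class name `FalsePositiveMonitor`, the window size, and the 20% threshold are all hypothetical choices.

```python
from collections import deque


class FalsePositiveMonitor:
    """Tracks analyst verdicts on recent alerts in a sliding window."""

    def __init__(self, window: int = 100, max_fp_rate: float = 0.2):
        # Hypothetical defaults: judge the last 100 alerts, tolerate 20% noise.
        self.verdicts = deque(maxlen=window)
        self.max_fp_rate = max_fp_rate

    def record(self, alert_was_real: bool) -> None:
        """Store whether an analyst confirmed the alert as a true detection."""
        self.verdicts.append(alert_was_real)

    def fp_rate(self) -> float:
        """Fraction of alerts in the window that were false positives."""
        if not self.verdicts:
            return 0.0
        return self.verdicts.count(False) / len(self.verdicts)

    def is_stale(self) -> bool:
        """True once the window is full and false positives exceed the limit."""
        window_full = len(self.verdicts) == self.verdicts.maxlen
        return window_full and self.fp_rate() > self.max_fp_rate


monitor = FalsePositiveMonitor(window=10, max_fp_rate=0.3)
for verdict in [True] * 9 + [False]:
    monitor.record(verdict)
print(monitor.fp_rate(), monitor.is_stale())

# Later, the model starts misfiring: five dismissed alerts in a row.
for _ in range(5):
    monitor.record(False)
print(monitor.fp_rate(), monitor.is_stale())
```

When `is_stale()` flips to true, that is a cue to retrain or retune rather than let analysts drown in noise. This only measures false positives that someone actually reviewed; missed detections need a separate check.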

Data science and machine learning can also be applied to application security in the same way. It’s possible to identify application entry points, vulnerabilities, defects and more in code using machine learning much faster than with most current technology, let alone faster than a human could manually. This is a technique that WhiteHat has been using with success for the past decade in login detection, 404 page detection, page-crawling, attack surface detection, and other areas where machine learning is crucial.
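As a small illustration of the 404 page detection problem, consider sites that return a friendly "not found" page with an HTTP 200 status. The sketch below is not WhiteHat’s implementation, which this post does not describe, and it uses a plain text-similarity heuristic rather than a trained model. It assumes you have already fetched the body for a URL that cannot exist, and the 0.9 threshold is a hypothetical starting point.

```python
from difflib import SequenceMatcher


def looks_like_soft_404(body: str, known_missing_body: str,
                        threshold: float = 0.9) -> bool:
    """Return True if `body` is nearly identical to a known 'not found' page.

    `known_missing_body` is the response for a URL that cannot exist
    (for example, a random path). The threshold is an assumption to tune.
    """
    matcher = SequenceMatcher(None, body, known_missing_body, autojunk=False)
    return matcher.ratio() >= threshold


# Hypothetical responses, inlined so the example runs without a network.
TEMPLATE = (
    "<html><body><h1>Sorry</h1><p>The page {path} could not be found. "
    "Please check the address or return to the home page.</p></body></html>"
)
missing_page = TEMPLATE.format(path="/zzz-1234")
candidate_page = TEMPLATE.format(path="/old-report")
real_page = (
    "<html><body><h1>Quarterly report</h1><table><tr><td>Revenue</td>"
    "<td>42</td></tr></table></body></html>"
)

print(looks_like_soft_404(candidate_page, missing_page))
print(looks_like_soft_404(real_page, missing_page))
```

A crawler could use a check like this to avoid treating every error page as new attack surface. A learned classifier would generalize better across sites with dynamic error pages, which is where machine learning earns its keep.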

Taking this same idea into other areas of security will speed up the process of making the Internet a safer and more secure place for everyone. Machine learning is unquestionably a very powerful tool, and I believe it points a way forward for the industry. I enjoyed Josh’s talk very much and highly recommend it to other practitioners. I for one am going to start honing my machine learning skills. The hardest part is knowing where to begin, but this is as good a place as any.

[Learn more about the pitfalls and promises of data science directly from Josh Saxe in his Dark Reading Radio interview with Community Editor Marilyn Cohodas.]

Jeremiah Grossman, Chief of Security Strategy, SentinelOne, Professional Hacker, Black Belt in Brazilian Jiu-Jitsu, & Founder of WhiteHat Security. Jeremiah Grossman's career spans nearly 20 years. He has lived a literal lifetime in computer security to become one of the ...

1/26/2016 | 10:41:02 AM
Data Science & Machine Learning Training at BlackHat
I totally agree with this article. As the data volumes increase and the stakes get higher, data science and machine learning are going to become indispensable tools for sec analysts. It's with this in mind that my team and I are offering two courses for network professionals, the Crash Course in Data Science and the Crash Course in Machine Learning at BlackHat 2016. If you come, you'll definitely walk away with some useful skills that can help you protect your network.
8/17/2015 | 2:51:47 PM
Machine Learning
The ending line of that article sounded like the ending statement to a post-apocalyptic movie.

I agree heuristics are going to play a major role in future security measures and techniques. How is the main question! Your guess is as good as mine.