Fighting malware is a modern arms race. Not only has malware evolved to be more evasive and harder to detect, but their vast numbers make it even more difficult to handle. As a result, detecting a malware has become a big data problem which requires the help of self-learning machines to scale the knowledge of analysts, handle the complexity beyond human capabilities, and improve the accuracy of threat detection.
There are number of approaches to this problem; choosing the right algorithm to serve the security engine’s purpose is not an easy task. In this article, we will refer to machine learning (ML) as an application of artificial intelligence (AI) where computers learn without being explicitly programmed. We will look into some use cases and challenges, starting with an interesting question: why do we see this growing trend now? The answer has to do with lower costs and increased availability of private and public cloud technology for collecting, storing and analyzing big data in real time, and the academic research progress in ML and related algorithms such as Deep Neural Networks (DNN).
Putting together a successful ML cybersecurity implementation is a multidisciplinary task, which requires coding capabilities, as well as cyber domain expertise, and deep math/statistics knowledge, originally described by Drew Conway in his data science Venn diagram. ML models can be used to classify malicious files (including ransomwares), analyze abnormal user and network behavior, perform advanced event analytics, identify encrypted malware traffic, synthesize threat intelligence feeds, and fuse in-direct telemetry signals with security events in cloud deployments.
Implementing a complete solution requires embedding the selected ML algorithm into a three-stage workflow of operation. First, the ML engine performs analysis, usually enhanced with other detection technologies to deliver open and integrated defense in depth. Then, enforcement is performed across the entire network preferably in an automatic and unified way. And finally, Cyber Threat Intelligence (CTI) is shared and received with other systems and entities, to further enrich and add context to the next analysis task -- feedback.
Cyber Defense Challenges and Machine Learning
A ML model is only as good as the content from the data sources that feed it (better known as: garbage in, garbage out). Similarly, performing analysis without domain expertise and context can be misleading, and measuring the engine’s performance/accuracy is tricky.
Another challenge is that attackers also use machines for different attack phases, as described by Intel Security in their 2017 threat predictions report. But the most interesting challenge is the risk of attackers actually manipulating ML defense engines. A visible example, as described by Dave Gershgorn in Popular Science last year, was presented by Google’s researchers who manipulated road signs to deceive a driverless car, using black-box attack principles that can be leveraged also in the cyber domain to fool the machine.
Machines are taking over many aspects of our lives (Did anyone say autonomous cars?), but given the pros and cons described, should we let the machines take over our defense systems? The answer is yes and no. On the one hand, machines can outsmart human capabilities on certain aspects of scale and complexity. On the other hand, they can be manipulated, but so can humans. The debate is ongoing But based on the buzz in the market it's clear that machines are already transforming the way we perform cyber defense.