Cybersecurity is arguably the most rapidly evolving industry, driven by the digitalization of services, our dependency on Internet-connected devices, and the proliferation of malware and hacking attempts in search of data and financial gain. More than 600 million malware samples currently stalk the Internet, and that’s just the tip of the iceberg in terms of cyber threats.
Advanced persistent threats, zero-day vulnerabilities and cyber espionage cannot be identified and stopped by traditional signature-based detection mechanisms. Behavior-based detection and machine learning are just two of the technologies in the arsenal of security companies, with the latter considered by some the best line of defense.
What is Machine Learning?
The simplest definition is that machine learning is a set of algorithms that can learn from data without being explicitly programmed for each task. Although we’re far from achieving anything remotely similar to human-level capabilities – or even consciousness – these algorithms are pretty handy when properly trained to perform a specific repetitive task. Unlike humans, who tire easily, a machine learning algorithm doesn’t complain and can go through far more data in a short amount of time.
The concept has been around for decades, starting with Arthur Samuel in 1959, and at its core is the drive to overcome static programming instructions by enabling an algorithm to make predictions and decisions based on input data. Consequently, the training data used by the machine learning algorithm to create a model is what makes the algorithm’s output statistically correct. The expression “garbage in, garbage out” is widely used to describe how poor-quality training data produces incorrect or faulty output in machine learning algorithms.
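The point about training data can be made concrete with a toy example. The sketch below (pure Python; every feature value, label and threshold is invented for illustration, not taken from any real detection product) trains a simple nearest-centroid classifier twice, once on correctly labeled samples and once on partially mislabeled ones, to show how input quality drives output quality.

```python
# Toy demonstration of "garbage in, garbage out": a nearest-centroid
# classifier trained on clean vs. corrupted labels. All numbers are
# invented for illustration -- they are not real malware features.

def train_centroids(samples, labels):
    """Return the mean feature vector (centroid) for each label."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        sums.setdefault(y, [0.0] * len(x))
        counts[y] = counts.get(y, 0) + 1
        sums[y] = [s + v for s, v in zip(sums[y], x)]
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict(centroids, x):
    """Assign x to the label of the nearest centroid (squared distance)."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)))

# Hypothetical 2-feature samples: (file entropy, fraction of packed sections)
benign  = [(3.1, 0.0), (2.8, 0.1), (3.5, 0.0), (2.9, 0.05)]
malware = [(7.6, 0.9), (7.9, 0.8), (7.2, 0.95), (7.8, 0.85)]
samples = benign + malware

clean_labels = ["benign"] * 4 + ["malware"] * 4
model = train_centroids(samples, clean_labels)
print(predict(model, (7.5, 0.9)))   # -> malware (a packed, high-entropy file)

# Corrupt half the labels ("garbage in") and the same file is
# now misclassified ("garbage out").
noisy_labels = ["malware", "malware", "benign", "benign"] * 2
noisy_model = train_centroids(samples, noisy_labels)
print(predict(noisy_model, (7.5, 0.9)))
```

With the corrupted labels, the two centroids land almost on top of each other, so the model's answer for even an obviously suspicious sample is essentially arbitrary.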
Is There a Single Machine Learning Algorithm?
While the term is loosely used across all fields, machine learning is not an algorithm per se, but a field of study. The various types of algorithms take different approaches to solving specific problems, but underneath it’s all statistics and probability. Decision trees, neural networks, deep learning, genetic algorithms and Bayesian networks are just a few of the approaches used to develop machine learning algorithms that solve specific problems.
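As one concrete instance of these approaches, a decision tree reduces to a chain of threshold tests learned from data. The sketch below (pure Python; the single feature and its values are invented for illustration) learns a one-level tree, often called a decision stump, by picking the threshold that best separates two classes.

```python
# A one-level decision tree ("decision stump") learned from labeled data.
# The feature and all values are invented for illustration.

def learn_stump(values, labels):
    """Pick the threshold on one feature that minimizes misclassifications."""
    best = None
    for t in sorted(set(values)):
        # Rule: predict 1 ("malware") when value > t, else 0 ("benign")
        errors = sum((v > t) != bool(y) for v, y in zip(values, labels))
        if best is None or errors < best[1]:
            best = (t, errors)
    return best[0]

# Hypothetical feature: number of suspicious API calls observed per sample
values = [1, 2, 3, 4, 10, 12, 14, 15]
labels = [0, 0, 0, 0, 1, 1, 1, 1]   # 0 = benign, 1 = malware

threshold = learn_stump(values, labels)
print(threshold)  # -> 4: the stump predicts "malware" above this value
```

A full decision tree simply repeats this split recursively on each resulting subset, which is why trees remain one of the most interpretable ML techniques.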
Machine learning is usually broken down by the method an algorithm uses to learn, rather than by a single technique. Supervised learning is one such method, involving training the algorithm to learn a general rule based on examples of inputs and desired outputs. Unsupervised learning and reinforcement learning are also commonly used in cybersecurity to enable the algorithm to discover for itself hidden patterns in data, or to dynamically interact with malware samples to achieve a goal (e.g. malware detection) based on feedback in the form of penalties and rewards.
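Unsupervised learning can be sketched just as briefly. The toy k-means implementation below (pure Python; the data points are invented stand-ins for some behavioral metric) groups samples into two clusters without ever seeing a label, so the hidden pattern emerges from the data alone.

```python
# Minimal 1-D k-means (k=2): groups unlabeled points into two clusters.
# The data points are invented stand-ins for some behavioral metric.

def kmeans_1d(points, iterations=10):
    """Cluster 1-D points into two groups; returns the two cluster centers."""
    lo, hi = min(points), max(points)  # initialize centers at the extremes
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center
        a = [p for p in points if abs(p - lo) <= abs(p - hi)]
        b = [p for p in points if abs(p - lo) > abs(p - hi)]
        # Update step: move each center to the mean of its cluster
        lo = sum(a) / len(a)
        hi = sum(b) / len(b)
    return lo, hi

# Unlabeled samples: two groups are hidden in the data, and k-means
# recovers them without any labels at all.
points = [1, 2, 1, 2, 8, 9, 8, 9]
centers = kmeans_1d(points)
print(sorted(centers))  # -> [1.5, 8.5]
```

In a security setting the same idea scales up: clustering telemetry from thousands of endpoints can surface a small, unusual group of machines worth investigating, with no labeled examples required.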
Is Machine Learning Enough for Cybersecurity?
Some security companies argue that machine learning technologies are enough to identify and detect all types of attacks on companies and organizations. Regardless of how well trained an algorithm is, though, there is a chance it will “miss” some malware samples or behaviors. Even among a large set of machine learning algorithms, each trained to identify a specific malware strain or a specific behavior, chances are that one of them could miss something.
This “silver shotgun shell” approach, using many task-oriented machine learning algorithms rather than a single catch-all one, is definitely the best implementation, as task-oriented algorithms are not only more accurate and reliable, but also more efficient. Still, the notion that machine learning is all cybersecurity should be about is misguided.
When protecting physical or virtual endpoints, it’s vital to have multiple layers of defense against malware. Behavior-based detection that monitors processes and applications throughout their entire execution lifetime, web filtering and application control all play a part in covering the attack vectors that could compromise a system.