Researchers with MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) believe they can offer the security world a huge boost in incident response and preparation with a new artificial-intelligence platform that they say could eventually become a secret weapon for squeezing the most productivity out of security analyst teams.
Dubbed AI2, the technology has shown three times the predictive capability of today's analytics methods, with drastically fewer false positives.
CSAIL gave a sneak peek into AI2 in a presentation to the academic community last week at the IEEE International Conference on Big Data Security, which detailed the specifics of a paper released to the public this morning. The driving force behind AI2 is its blending of artificial intelligence with what researchers at CSAIL call “analyst intuition,” essentially finding an effective way to continuously model data with unsupervised machine learning while layering in periodic human feedback from skilled analysts to inform a supervised learning model.
“You can think about the system as a virtual analyst,” says CSAIL research scientist Kalyan Veeramachaneni, who developed AI2 with former CSAIL postdoc Ignacio Arnaldo, now a chief data scientist at PatternEx. “It continuously generates new models that it can refine in as little as a few hours, meaning it can improve its detection rates significantly and rapidly.”
This offers the best of both worlds across what has become a bright-line division in security analytics. For the most part, security systems today either depend on analyst-driven solutions that rely on rules created by human experts, or they lean heavily on machine-learning anomaly detection, which tends to produce disruptively high false-positive rates.
In the paper released today, Veeramachaneni, Arnaldo and their team showed how the system performed when tested on 3.6 billion pieces of log data generated by millions of users over three months. During this test, the platform detected 85% of attacks, three times better than previous benchmarks, while reducing false positives by a factor of five.
Efforts to meld human- and computer-based approaches to machine learning have long run into stumbling blocks because of the challenge of manually labeling cybersecurity data for algorithms. The specialized nature of the analysis makes it a difficult data set to crack with the crowdsourcing strategies typically employed in other arenas of big data analysis. The average person on a site like Amazon Mechanical Turk would be hard-pressed to apply accurate labels to data indicating DDoS or exfiltration attacks, Veeramachaneni explained.
Meanwhile, security experts have already tried several generations' worth of supervised machine learning models, only to find that “feeding” these systems ends up creating more work for analysts rather than saving them time. This is what has led many organizations to dump early analytics solutions in the proverbial waste bin after experiencing those frustrations.
AI2 is able to perform better by bringing together three different unsupervised learning models to sift through raw data before presenting anything to the analyst. On day one, the system offers 200 of the most abnormal events to an analyst, who then manually sifts through them to identify the real attacks. That information is fed back into the system, and even within a few days the unsupervised system is presenting as few as 30 to 40 events for verification.
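For readers who want a feel for that feedback loop, here is a minimal Python sketch. It is not CSAIL's implementation: two toy outlier detectors (a z-score test and distance from the average event) stand in for AI2's three unsupervised models, a simple nearest-centroid scorer stands in for the supervised model, and every name in it, including `review_round` and `analyst_label`, is hypothetical.

```python
import math
import statistics

def rank_normalize(scores):
    """Convert raw scores to ranks in [0, 1] so different detectors are comparable."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    for rank, i in enumerate(order):
        ranks[i] = rank / max(len(scores) - 1, 1)
    return ranks

def zscore_detector(events):
    """Toy outlier score: largest absolute z-score over an event's features."""
    cols = list(zip(*events))
    means = [statistics.fmean(c) for c in cols]
    devs = [statistics.pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero
    return [max(abs(x - m) / d for x, m, d in zip(e, means, devs)) for e in events]

def centroid_detector(events):
    """Toy outlier score: Euclidean distance from the average event."""
    center = [statistics.fmean(c) for c in zip(*events)]
    return [math.dist(e, center) for e in events]

def supervised_scores(events, labeled):
    """Nearest-centroid score built from analyst labels (higher = more attack-like)."""
    attacks = [e for e, is_attack in labeled.items() if is_attack]
    benign = [e for e, is_attack in labeled.items() if not is_attack]
    if not attacks or not benign:
        return None  # not enough feedback yet to learn from
    a = [statistics.fmean(c) for c in zip(*attacks)]
    b = [statistics.fmean(c) for c in zip(*benign)]
    return [math.dist(e, b) - math.dist(e, a) for e in events]

def review_round(events, labeled, analyst_label, k):
    """Rank events, show the top k to the analyst, and record the verdicts."""
    votes = [rank_normalize(d(events)) for d in (zscore_detector, centroid_detector)]
    learned = supervised_scores(events, labeled)
    if learned is not None:
        votes.append(rank_normalize(learned))  # blend in what the analyst taught us
    combined = [sum(v) / len(votes) for v in zip(*votes)]
    top = sorted(range(len(events)), key=lambda i: combined[i], reverse=True)[:k]
    for i in top:
        labeled[events[i]] = analyst_label(events[i])  # human feedback
    return top

# Hypothetical usage: events are feature tuples; the lambda plays the analyst.
events = [(i % 10, (i * 7) % 13) for i in range(100)] + [(100, 100), (-80, 90)]
known_attacks = {(100, 100), (-80, 90)}
labeled = {}
first = review_round(events, labeled, lambda e: e in known_attacks, k=5)
second = review_round(events, labeled, lambda e: e in known_attacks, k=3)
```

In this sketch the review budget `k` is lowered by hand between rounds; how AI2 itself decides how many events to present is not described in the article.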
“The more attacks the system detects, the more analyst feedback it receives, which, in turn, improves the accuracy of future predictions,” Veeramachaneni says. “That human-machine interaction creates a beautiful, cascading effect.”
Check out this video for a quick overview of the way AI2 works.