Dark Reading is part of the Informa Tech Division of Informa PLC


Vulnerabilities / Threats

1/30/2019
10:30 AM
Andrew Fast
Commentary

Open Source & Machine Learning: A Dynamic Duo

In recent months, machine-learning code has become readily available in the open source community, putting security analysts on a path toward easier data pattern recognition.

As a data scientist, I'm always looking for new patterns and insights that guide action — especially ways to make data science more effective for cybersecurity. One pattern I see consistently throughout the industry is the inability to operationalize machine learning in a modern security operations center. The challenge is that the capabilities behind different machine-learning models are difficult to explain. And if those of us in security can't understand how something works, and how to apply it to what we do, why on earth would we trust it?

Machine learning (ML) can revolutionize the security industry, change the way we identify threats, and mitigate disruption to the business. We've all heard that. Things break down when we start to talk about ML more in practice and less in theory.  

Trust is built through education, testing, and experience. Unfortunately, commercial interests have muddied the picture. Far too often, we see commercial offerings rolled out with assurances that customers can hit the ground running on day one, without any explanation of how the artificial intelligence (AI) behind them arrives at specific insights. We call this a "black-box approach." What we need instead are more "explainable AI" approaches. We don't need to be told why to use a hammer. We need to be told how.

Understanding the "how" comes from practice and learning from others. This points to another fundamental requirement: easy access to ML code with which to experiment and share outcomes and experiences with a like-minded community.

That door leads us to the open source community. The typical security analyst comes to the table with a specific challenge to solve in a network environment, such as defending against sophisticated threat actors. The analyst knows how to write rules to prevent a specific tactic or technique from being used again, but cannot detect the patterns needed to proactively hunt threats without models that dynamically assess data as it arrives. If machine learning can be demonstrated to solve particular use cases in an open forum, more analysts will be willing to adopt the technology in their workflows.
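To make that contrast concrete, here is a minimal sketch of a static, rule-style threshold next to a simple model that re-baselines itself as data arrives. All thresholds, window sizes, and traffic numbers below are hypothetical, chosen only to illustrate the idea:

```python
from collections import deque
from statistics import mean, stdev

# Static "rule": alert whenever a host exceeds a fixed request count.
STATIC_THRESHOLD = 100

def static_rule(requests_per_minute):
    return requests_per_minute > STATIC_THRESHOLD

# Simple "model": keep a rolling window of recent observations and
# alert on deviation from that baseline, so the threshold adapts
# to whatever is normal for this network.
class RollingBaseline:
    def __init__(self, window=30, sigmas=3.0):
        self.history = deque(maxlen=window)
        self.sigmas = sigmas

    def is_anomalous(self, value):
        anomalous = False
        if len(self.history) >= 5:  # need a minimal baseline first
            mu, sd = mean(self.history), stdev(self.history)
            anomalous = sd > 0 and abs(value - mu) > self.sigmas * sd
        self.history.append(value)
        return anomalous

baseline = RollingBaseline()
traffic = [20, 22, 19, 21, 20, 23, 18, 21, 90]  # hypothetical counts
alerts = [t for t in traffic if baseline.is_anomalous(t)]
print(alerts)  # → [90]
```

The static rule never fires on this traffic, while the rolling baseline flags the spike at 90 because it sits far outside the learned norm, which is exactly the kind of dynamic assessment a hand-written rule cannot do.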

Sharing code for use and constant alteration by others, for the good of others and the enterprises they serve, has proved to be a powerful learning mechanism. Two decades ago, security engineers and analysts faced a similar challenge: few understood how to accurately assess network packets. Then Snort came along and changed the game. Engineers and analysts learned at their own pace, experimented with the code and rules in simple ways, and in time began to trust real-time traffic analysis in the network intrusion detection system. That open signature ecosystem has since grown into a global effort.
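For readers who missed that era, a Snort signature is just a human-readable rule. This illustrative local rule (the message text, SID, and threshold values are made up for the example) would alert on a burst of inbound SSH connection attempts from one source:

```
alert tcp $EXTERNAL_NET any -> $HOME_NET 22 (msg:"LOCAL Possible SSH brute-force burst"; flags:S; detection_filter:track by_src, count 5, seconds 60; classtype:attempted-recon; sid:1000001; rev:1;)
```

Rules like this live in a local rules file and get tuned against live traffic, which is precisely the experiment-and-adjust loop that built trust in open signatures.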

In recent months, ML code has become readily available in the open source community, offering security analysts opportunities to explore, experiment with, and exchange ideas about ML models, putting them on a path toward easier data pattern recognition. As analysts begin their journey testing ML code and models for themselves, here are three best practices to keep in mind:

  • Come prepared with a specific problem: No technology is magic. Machine learning can only solve problems for which it is well-suited. Coming to the table with a defined problem makes it easier to determine whether ML can help and, more importantly, helps avoid wasted time and spinning wheels that end back at square one.
  • Start with the end in mind: Having an idea about how the model could be used in production is a helpful guide during model development. A great model that can't be deployed in production is worthless. Starting with the end guides decisions about algorithm choice, data selection, and which question to address.
  • Remember that simplicity is the name of the game: Start with simple data counts, look at frequencies and standard deviations, and only then graduate to statistical methods and ML models. Simpler approaches can be deployed more easily. Remember: A model in the lab does not produce value until it is used on live data.
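The "start simple" progression above can be sketched with nothing beyond the standard library. The log records, host addresses, and the two-sigma threshold here are all hypothetical, invented purely to illustrate counts-before-models:

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical log: (source_host, event_type) pairs from one hour.
# Ten quiet hosts with two failed logins each, plus one noisy host.
events = [(f"10.0.0.{i}", "login_fail") for i in range(10, 20) for _ in range(2)]
events += [("10.0.0.5", "login_fail")] * 25
events += [("10.0.0.5", "login_ok")]  # non-failure events are ignored below

# Step 1: simple counts per host.
fail_counts = Counter(h for h, e in events if e == "login_fail")

# Step 2: frequency baseline and standard deviation across hosts.
counts = list(fail_counts.values())
mu, sd = mean(counts), stdev(counts)

# Step 3: flag hosts far above the baseline before reaching for ML.
outliers = [h for h, c in fail_counts.items() if sd > 0 and c > mu + 2 * sd]
print(outliers)  # → ['10.0.0.5']
```

If a count-and-deviation pass like this already surfaces the problem host, there may be no need for a heavier model; if it doesn't, it still establishes the baseline a model would be judged against.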

Sharing one's experiences experimenting with models is vital to advancing the adoption of machine learning and building trust over time. As more problems are shared, a deeper catalog of use-case recipes can be built to help analysts optimize their ML models. Analysts helping other analysts, for the good of the community and of enterprises, is common. It is very easy to detect a pattern here. All doors lead to open source.

Andrew Fast is the chief data scientist and co-founder of CounterFlow AI. CounterFlow AI builds advanced network traffic analysis solutions for world-class security operations centers (SOCs). Previously, Dr. Fast served as the chief scientist at Elder Research, a ...
Comments

tdsan | 2/4/2019, 1:18 PM
Interesting Article - Open Community Model

→ "And if those of us in security can't understand how something works, and how to apply it to what we do, why on earth would we trust it?"

Interesting; I would beg to differ. Do we really know how a TV works, or a refrigerator, or even the concept of an alternator? Look at what companies are using every day: "Cloud Computing." It was adopted just like the "Smart Phone"; people use smartphones and cloud services without understanding their underpinnings. So even from a security perspective, people often use what they feel comfortable with and don't take the time to explore better alternatives. For example, Juniper and Cisco provided major products in the security field; their solutions work, but did individuals understand that Juniper created its security platforms from FreeBSD (open source) and Cisco developed its platform from Linux (open source)? Often we use things because of overall social acceptance, not because they are the best. If that were the case, why didn't the US move toward IPv6 when it initially came out? From a security perspective, IPv6 provides IPsec AES-256 ESP/AH VPN capability, tunnelling, on the order of 17 quintillion addresses per subnet, resistance to man-in-the-middle attacks, and improved efficiency. IPv6 uses a hexadecimal addressing scheme that computers can interpret more effectively than IPv4's; the protocol is 10 times more secure than IPv4 and is being made more secure still.

→ We don't need to be told why to use a hammer. We need to be told how.

Again, I would disagree; I think we need both. To determine the direction and use of a product, one must clearly understand its use and outcome. The "how" will come with training (you mentioned this in your commentary); the "why" and "for what" follow once we clearly understand the application's intentions, so we can narrow the focus for improved performance and security. Understand the entire stack so there is no confusion about the proposed outcome; if the outcome does not match your expectations, you can make adjustments until it does.

→ The analyst knows how to write rules to prevent a specific tactic or technique from being used again, but he cannot detect the patterns to proactively hunt threats because one does not have the models to dynamically assess data as it arrives...Start with simple data counts, look at frequency and standard deviations

Interesting assumption. I think the concepts of SIEM, security analytics tools, and various security frameworks effectively address this area of concern. McAfee Nitro, Cisco Sourcefire, IBM QRadar, Splunk, and others provide the capability you say security analysts lack, deriving patterns in an attacker/actor's behavior (it is not just the attack but the amount of data that results from the event, or precedes it; a company called Gigamon is addressing this area). From a historical perspective, large amounts of data are correlated, relationships are developed, attack vectors are pinpointed, and analytics using a decentralized model build an attack profile for individuals. In addition, companies like ExtraHop, Virtual Instruments, Dynatrace, AppDynamics, and Extreme Networks (NetSight Atlas) are able to pinpoint weaknesses in the application environment and infrastructure arena. To identify a possible or impending threat, we need to amass this information by combining "Big Data," ML, SIEM, and baselining techniques to determine where anomalies arise. Companies like DomainTools, Infoblox, and BluVector give users the ability to create analytics from IP addresses, traffic patterns, and rogue DNS sites, building patterns from sites already identified as nefarious; we can block countries with a single mouse click. I am not sure why more organizations are not employing this technology, but again, it does not seem to be socially accepted, per the point made earlier.

→ Analysts helping other analysts — for the good of the community, for the good of enterprises — is common. It is very easy to detect a pattern here. All doors lead to open source.

While I do agree with sharing information to make organizations better, it is not just open source; it is more about people than anything. It is an "Open-Community" model (Palo Alto Networks, Symantec, IBM, Cisco, Juniper - https://www.weforum.org/agenda/archive/cyber-security/), where thoughts are brought to the forefront and people incorporate their ideas into one framework. The problem is that we are flawed in our thinking (greed and envy hold us back); when one organization feels it is eating less than someone else on the board, it doesn't want to play anymore. It really comes down to human nature and the basic ideals of life. You have to ask yourself why China, Russia, Iran, Iraq, Africa, and Pakistan all have a problem with America (other than the bad deals, mass incarceration, and murders): it is about resources and quality of life. You also have to ask why hackers do what they do (respect, money, prestige, political statement, and control). Once we address these problems and start being honest with one another, we can reduce the emphasis on money and power and redirect that value toward human life (gold); then we could start taking the blinders off to reveal the beauty that each one of us brings.

T