Dark Reading is part of the Informa Tech Division of Informa PLC


Vulnerabilities / Threats

1/30/2019
10:30 AM
Andrew Fast
Commentary

Open Source & Machine Learning: A Dynamic Duo

In recent months, machine-learning code has become readily available in the open source community, putting security analysts on a path toward easier data pattern recognition.



As a data scientist, I'm always looking for new patterns and insights that guide action — especially ways to make data science more effective for cybersecurity. One pattern I see consistently throughout the industry is the inability to operationalize machine learning in a modern security operations center. The challenge is that the capabilities behind different machine-learning models are difficult to explain. And if those of us in security can't understand how something works, and how to apply it to what we do, why on earth would we trust it?

Machine learning (ML) can revolutionize the security industry, change the way we identify threats, and mitigate disruption to the business. We've all heard that. Things break down when we start to talk about ML more in practice and less in theory.  

Trust is built through education, testing, and experience. Unfortunately, commercial interests have complicated matters. Far too often, commercial offerings are rolled out with assurances that customers can hit the ground running on day one, without any explanation of how the artificial intelligence (AI) behind them arrives at specific insights. We call this a "black-box approach." What we need instead are more "explainable AI" approaches. We don't need to be told why to use a hammer. We need to be told how.

Understanding the "how" comes from practice and learning from others. This points to another fundamental requirement: easy access to ML code with which to experiment and share outcomes and experiences with a like-minded community.

That door leads us to the open source community. The typical security analyst comes to the table with a specific challenge to solve in a network environment, such as defending against sophisticated threat actors. The analyst knows how to write rules to prevent a specific tactic or technique from being used again but cannot detect the patterns needed to proactively hunt threats without models that dynamically assess data as it arrives. If machine learning can be demonstrated to solve particular use cases in an open forum, more analysts will be willing to adopt the technology in their workflows.
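To make the contrast between a static rule and a model that assesses data as it arrives more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the author's method: the class name `RunningBaseline`, the choice of Welford's online algorithm for the running mean and standard deviation, and the default threshold of three standard deviations are all hypothetical choices made for this example.

```python
import math


class RunningBaseline:
    """Running mean and standard deviation, updated one value at a time.

    Hypothetical helper for illustration; uses Welford's online algorithm.
    """

    def __init__(self):
        self.n = 0        # number of values seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x):
        """Fold a new observation into the baseline."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def stdev(self):
        """Sample standard deviation of the values seen so far."""
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_anomalous(self, x, threshold=3.0):
        """True if x lies more than `threshold` standard deviations from the mean."""
        sigma = self.stdev()
        if self.n < 2 or sigma == 0:
            return False  # not enough history to judge
        return abs(x - self.mean) / sigma > threshold
```

In use, each new measurement (for example, connections per minute from a host) would be scored with `is_anomalous` before being passed to `update`, so the baseline adapts as traffic changes instead of relying on a fixed, hand-written limit.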

Sharing code for others to use and continually modify, for the good of those users and the enterprises they serve, has proved to be a wonderful learning mechanism. Two decades ago, security engineers and analysts faced a similar challenge when few understood how to accurately assess network packets. Then along came Snort, which changed the game. Analysts learned how to assess packets at their own pace, experimented with the code and models in simple ways, and in time began to trust real-time traffic analysis in the network intrusion detection system. The open signature ecosystem has since grown into a global effort.

In recent months, ML code has become readily available in the open source community, offering security analysts opportunities to explore, experiment with, and exchange ideas about ML models, putting them on a path toward easier data pattern recognition. As analysts begin testing ML code and models for themselves, here are three best practices to keep in mind:

  • Come prepared with a specific problem: No technology is magic. Machine learning can only solve problems for which it is well-suited. Coming to the table with a defined problem will make it easier to determine whether ML can help and, more importantly, will help avoid the wasted time and wheel-spinning that force a return to square one.
  • Start with the end in mind: Having an idea about how the model could be used in production is a helpful guide during model development. A great model that can't be deployed in production is worthless. Starting with the end guides decisions about algorithm choice, data selection, and which question to address.
  • Remember that simplicity is the name of the game: Start with simple data counts, look at frequency and standard deviations, and gradually move to statistics and then on to the ML models. Simpler approaches can be deployed more easily. Remember: A model in the lab does not produce value until it is used on live data.
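The "simple data counts, frequency, and standard deviations" starting point from the last practice can be sketched in a few lines of Python. This is only an illustration: the function names `count_events` and `flag_outliers`, the use of source IP addresses as keys, and the threshold of two standard deviations are assumptions made for this example, not part of the article.

```python
from collections import Counter
from statistics import mean, stdev


def count_events(events):
    """Count how often each key (for example, a source IP) appears."""
    return Counter(events)


def flag_outliers(counts, threshold=2.0):
    """Return keys whose count is more than `threshold` sample standard
    deviations above the mean count (hypothetical cutoff)."""
    values = list(counts.values())
    if len(values) < 2:
        return []  # a standard deviation needs at least two data points
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # every key has the same count, so nothing stands out
    return sorted(k for k, v in counts.items() if (v - mu) / sigma > threshold)
```

For instance, if nine hosts each produce 10 events and a tenth produces 100, `flag_outliers(count_events(events))` returns only the tenth host. A check this simple is easy to explain and deploy, which is the point of starting here before moving to heavier models.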

Sharing one's experiences of experimenting with models is vital to advancing the adoption of machine learning and building trust over time. As more problems are shared, a deeper catalog of use-case recipes can be generated to help analysts optimize their ML models. Analysts helping other analysts, for the good of the community and of enterprises, is common. It is very easy to detect a pattern here. All doors lead to open source.


Andrew Fast is the chief data scientist and co-founder of CounterFlow AI. CounterFlow AI builds advanced network traffic analysis solutions for world-class security operations centers (SOCs). Previously, Dr. Fast served as the chief scientist at Elder Research, a ...
tdsan,
User Rank: Strategist
2/4/2019 | 1:18:22 PM
Interesting Article - Open Community Model
Quotes -

→ "And if those of us in security can't understand how something works, and how to apply it to what we do, why on earth would we trust it?"

Interesting, but I would beg to differ. Do we really know how a TV or a refrigerator works, or even the concept of an alternator? Let's look at it from what companies are using every day: "Cloud Computing". It was adopted just like the "Smart Phone"; people use the "Smart Phone" and "Cloud Services" but they don't understand their underpinnings. So even from a security perspective, people often use what they feel comfortable with and don't take the time to explore better alternatives. For example, Juniper and Cisco are companies that provide major products in the security field. Their solutions work, but did individuals understand that Juniper created its security platforms from FreeBSD (open source) and Cisco developed its platform from Linux (open source)? Often, we use things because of overall social acceptance and not because they are the best. If that were not the case, then why didn't the US move towards IPv6 when it initially came out? If we are looking at things from a security perspective, IPv6 provides IPSEC AES256 ESP/AH VPN capability, tunnelling, 17 quintillion addresses, no man in the middle attacks and improved efficiency. IPv6 uses a hexadecimal numbering scheme where computers can interpret the key string much more effectively than IPv4; this protocol is 10 times more secure and is currently being made more secure than IPv4.

→ We don't need to be told why to use a hammer. We need to be told how.

Again, I would disagree. I think we need both: to determine the direction and use of a product, one must clearly understand its use and outcome. The "how" will come with training (you mentioned this in your commentary); the "why" and "for what" will follow when we have a clear understanding of the application's intentions, so we can narrow down the focus for improved performance and security (understand the entire stack so there is no confusion as to the proposed outcome; if the outcome does not match your expectations, then you can make adjustments so that the outcome is in line with your intended expectations).

→ The analyst knows how to write rules to prevent a specific tactic or technique from being used again, but he cannot detect the patterns to proactively hunt threats because one does not have the models to dynamically assess data as it arrives...Start with simple data counts, look at frequency and standard deviations

Interesting assumption. I think the concepts of SIEM, security analytics tools, and various security frameworks effectively address this area of concern. Products such as McAfee Nitro, Cisco Sourcefire, IBM QRadar, Splunk, and others provide this capability, whereas you stated that security analysts are not able to derive patterns in an attacker/actor's behavior (this is just not true; it is not just the attack but the amount of data that results from the event or precedes it, and a company called Gigamon is addressing this area). In addition, from a historical perspective, large amounts of data are correlated, relationships can be and are being developed to pinpoint attack vectors, and analytics using a decentralized model are being derived to create an attack profile. In addition, companies like Extrahop, Virtual Instruments, Dynatrace, AppDynamics and Extremenetworks (Netsight Atlas) are able to pinpoint infrastructure weaknesses in the application environment and infrastructure arena. To help identify a possible and/or impending threat, we need to be able to amass this information in a way where we take "Big Data", ML, SIEM and baselining techniques to determine where anomalies arise (companies like DomainTools, InfoBlox and BlueVector give users the ability to create analytics using IP, traffic patterns and rogue DNS sites to create patterns from sites that have been identified as nefarious; we can block countries with a single mouse click. I am not sure why more organizations are not employing this technology, but again, it does not seem to be socially accepted, to the point made earlier).

→ Analysts helping other analysts — for the good of the community, for the good of enterprises — is common. It is very easy to detect a pattern here. All doors lead to open source.

Where I do agree with sharing information to make organizations better, it is not just open-source, it is more about people than anything, it is an "Open-Community" model (Palo Alto Networks, Symantec, IBM, Cisco, Juniper - https://www.weforum.org/agenda/archive/cyber-security/). Where thoughts are brought to the forefront and people incorporate their ideas into one framework, the problem is that we are flawed in our thinking (greed, envy keep us back), when one organization feels they are eating less than someone else on the board, then they don't want to play anymore. It really comes down to human nature and the basic ideals of life. You have to ask yourself, why is it that China, Russia, Iran, Iraq, Africa, Pakistan all have a problem with America (other than the bad deals, mass incarceration and murders)? It is about resources and the quality of life. And you have to ask yourself, why do hackers do what they do (respect, money, prestige, political statement, and control). Once we address these problems and start being honest with one another, the world, at that point we can reduce the emphasis on money/power and redirect that value on human life (Gold), then we could start taking the blinders off to reveal the beauty that each one of us brings.

T