Joe Vadakkan

Cloud Monitoring: The New 'Alert Overload' Problem & How to Fix It

While cloud computing offers a variety of proven business benefits, from a security perspective, IT teams are often still wavering in uncharted territory - and cloud monitoring is one such area.

"Alert overload" in cybersecurity is a well-understood phenomenon. You'd be hard-pressed to find an IT security professional who hasn't experienced the pains associated with trying to keep up with a cacophony of security tools and services, each of which generates a deluge of alerts warranting analysis and action. The security industry is working to solve this problem by using automation, artificial intelligence, machine learning and other technologies designed to cut down on the "noise." Unfortunately for IT security professionals, as they tackle this issue, another overload problem is emerging -- one that is even more onerous and dangerous: cloud monitoring.

The cloudy state of affairs
The public cloud is becoming the underlying fabric of enterprise IT organizations. According to Gartner, Inc.: "The worldwide public cloud services market is projected to grow 17.5 percent in 2019 to total $214.3 billion, up from $182.4 billion in 2018." While cloud computing offers a variety of proven business benefits, from a security perspective, IT teams are often still wavering in uncharted territory -- and cloud monitoring is one such area.

The intended purpose of cloud monitoring is to analyze cloud applications, services, assets and environments to quickly detect and remediate potential threats in the cloud. While its purpose is straightforward, the act of successfully executing cloud monitoring functions can be much more complex. There are two main reasons for this:

1. Cloud environments are in a state of continuous change, thanks to next-generation application development processes, such as DevOps and continuous delivery; multi-cloud and hybrid-cloud architectures; multiple data sources, including from third parties; and the general flexibility and elasticity of cloud environments. This means new vulnerabilities and potential compliance violations are continuously created, making it impossible for IT security teams to keep up using traditional manual monitoring and remediation processes.

2. There are too many tools issuing alerts, and not enough IT staff available to manage them. Not only are alerts being generated at a rapid pace due to the dynamic nature of cloud environments, but IT infrastructures today are overly complex, with too many vendor applications, services and tools issuing alerts. In fact, the average enterprise has 70 different security vendors in its infrastructure. In this state, organizations simply cannot hire enough people to monitor and remediate all of the issues arising in the turbulent cloud, especially with the industry's chronic shortage in cybersecurity and cloud skills.

In short, over-stretched IT teams are struggling to monitor, manage and secure dynamic hybrid- and multi-cloud environments from ever-evolving threats, vulnerabilities and compliance violations, leading to increased enterprise risk.

The fix: first, see clearly in the cloud
As a starting point, organizations normally discover cloud issues in one of two ways: something goes wrong, or they take steps to attain full visibility into the cloud environment and detect issues before something goes wrong. Obviously, the second approach is far preferable. The way to achieve full visibility is to use cloud monitoring tools, which can effectively track disk configurations, mislabeled tags, CIS benchmarks, regulatory modules, NIST frameworks, etc. Fortunately, many cloud providers already offer strong monitoring tools within their own platforms, such as Google Cloud Stackdriver and its Cloud Security Command Center, Microsoft Azure Security Center and AWS CloudWatch/CloudTrail, which integrate with AWS's Macie, GuardDuty and Inspector. IT teams can use these tools natively and get good visibility into the criticality state of a cloud environment (low, medium, high, etc.). If you start validating an environment with these tools, your organization will find out how vulnerable its cloud security posture really is.
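Once such tools are exporting findings, the first practical step is triage by criticality. The sketch below is a minimal, hypothetical illustration of filtering and ranking findings by severity; the finding schema here is invented, since each provider's tooling (Security Command Center, Azure Security Center, Security Hub, etc.) defines its own format.

```python
# Hypothetical sketch: triaging findings exported from a cloud monitoring
# tool. The finding dictionaries below are illustrative, not a real schema.

SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2}

def triage(findings, min_severity="medium"):
    """Return findings at or above min_severity, highest severity first."""
    threshold = SEVERITY_ORDER[min_severity]
    kept = [f for f in findings if SEVERITY_ORDER[f["severity"]] >= threshold]
    return sorted(kept, key=lambda f: SEVERITY_ORDER[f["severity"]], reverse=True)

findings = [
    {"resource": "vm-1", "check": "disk-encryption", "severity": "high"},
    {"resource": "bucket-7", "check": "public-read", "severity": "medium"},
    {"resource": "vm-2", "check": "tag-missing", "severity": "low"},
]

for f in triage(findings):
    print(f["severity"], f["resource"], f["check"])
```

In practice the filter threshold would come from policy, and the output would feed a ticketing or remediation pipeline rather than a print loop.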

Where this approach falls short is in the next step: once someone sees a high- or medium-criticality finding, it may get fixed on a one-time basis, but that is not enough. You need to fix the infrastructure code across the entire environment (via configuration management) and add guardrails so that the security posture continually evolves. This requires more than a one-time point fix.
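As a rough illustration of a guardrail, the sketch below checks declared resources against simple policies before deployment and blocks the rollout on any violation, rather than patching findings one at a time after the fact. The resource records and policy rules are hypothetical, standing in for whatever configuration-management or policy-as-code tooling an organization actually uses.

```python
# Hypothetical pre-deploy guardrail: scan declared resources and fail the
# pipeline if any violate policy. Resource shapes and rules are invented.

POLICIES = {
    "bucket": lambda r: not r.get("public_read"),     # buckets must not be public
    "disk": lambda r: r.get("encrypted", False),      # disks must be encrypted
}

def check_guardrails(resources):
    """Return the names of all resources that violate a policy."""
    violations = []
    for r in resources:
        policy = POLICIES.get(r["type"])
        if policy and not policy(r):
            violations.append(r["name"])
    return violations

resources = [
    {"type": "bucket", "name": "assets", "public_read": False},
    {"type": "disk", "name": "db-disk", "encrypted": False},
]

violations = check_guardrails(resources)
if violations:
    print("Blocked deploy:", violations)
```

Because the check runs against the full declared environment on every deploy, a misconfiguration can never be reintroduced without tripping the guardrail, which is the "continually evolving posture" the point fix cannot deliver.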

Auto-remediation = automation and action
This post-visibility point is where auto-remediation can have a significant impact -- it allows IT teams to create functions within their environment with logic written around them. For example, you can use if/then statements such as "if there is a misconfiguration, then do XYZ task," or "if there is an S3 bucket with a public-read permission, then shut it off or encrypt every object that gets uploaded." This approach moves beyond cloud visibility and monitoring -- it puts the information through the DevOps lifecycle and applies automation to take the appropriate action to make the environment secure.
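The if/then logic above can be sketched as a small rule function. This is a hypothetical illustration only -- the resource shape and action names are invented, not a real cloud provider API; a production version would invoke the provider's SDK from a serverless function triggered by the monitoring tool's alert.

```python
# Hypothetical sketch of the if/then auto-remediation rules described above.
# The bucket record and action names are illustrative placeholders.

def remediate_s3(resource):
    """If the bucket is misconfigured, return the remediation actions to run."""
    actions = []
    if resource.get("public_read"):
        # "if there is an S3 bucket with public read, then shut it off"
        actions.append("remove-public-read")
    if not resource.get("default_encryption"):
        # "...or encrypt every object that gets uploaded"
        actions.append("enable-default-encryption")
    return actions

bucket = {"name": "logs-bucket", "public_read": True, "default_encryption": False}
print(remediate_s3(bucket))
```

Each returned action would map to a concrete API call, so the same rule can run unattended every time an alert fires rather than waiting on a human to triage it.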

A broad shift in processes (and mindset)
It's important to note that organizations first need to evolve their business processes in order to use auto-remediation effectively. As enterprises move from on-premises infrastructure to the cloud, they may still be using traditional (legacy) change-management processes. That is where things can really slow down. Old processes are often at odds with automation, so they need to be modernized. This is a struggle that many, if not most, enterprises currently face, which explains why so few organizations are taking advantage of auto-remediation.

The problem is not the tools -- there are plenty of those -- it's the shift in mindset that is needed around how to automate workflows. It comes back to human behavior -- the muscle-memory of doing things a certain way over a long period of time. (Anyone who's used self-parking technology in a car knows how unnerving it can be to take their hands off the steering wheel for the first time.) Automation needs to be the basis of the "new process," and people need to understand that the old process is not a solution to anything -- it's actually the root cause of the cloud monitoring and remediation problem.

As usual, it's people, process and technology
When it comes to cloud monitoring, the old "people, process and technology" approach to organizational change still applies. In most cases today, the process is outdated, the technology is not being used properly, and people are in a no-win situation. By implementing technology that enables full cloud visibility and auto-remediation, adopting agile change-management processes that accommodate the new world of automation, and expanding the concept of "people" to include third-party specialists who take the burden off staff, organizations can stop the cloud-monitoring overload problem before it metastasizes into breaches and compliance violations.

— Joe Vadakkan is the global cloud security leader at Optiv Security. He also serves as the president of the Cloud Security Alliance, Southwest Chapter.
