10 Tips for Better Security Data Management

CISOs must build out a security data management and security data architecture to get the most out of their security data for the least amount of investment.

Ericka Chickowski, Contributing Writer

March 13, 2024

10 Min Read

[Image: a blue funnel shape made of 0s and 1s, suggesting a tunnel of data. Source: Robert Eastman via Alamy Stock Photo]

Whether it is to support compliance efforts for regulator-mandated logging, feed daily security operations center (SOC) work, support threat hunters, or bolster incident response capabilities, security telemetry data is the lifeblood of a healthy cybersecurity program. But the more security relies on data and analysis to carry out its core missions, the more data it must manage, curate, and protect, all while keeping data-related costs tightly under control.

As such, security data management and security data architecture are quickly becoming key competencies that CISOs must build out over time. This will take careful consideration and action at both the tactical and strategic levels. The following are some best practices that security leaders should keep in mind as they seek to improve security data management in order to get the most out of their security data for the least amount of investment.

Normalization and Correlation Can Be a Heavy Lift

With so many sources of data (log data from varying systems, telemetry data from security monitoring, and threat intelligence from numerous internal and external sources among them), one of the hardest parts of security data management is simply normalizing this data so that it can be combined and queried consistently across all sources.

"The biggest mistakes security operations teams make today involve underestimating the complexity of integrating diverse security data sources and not prioritizing the effective normalization and correlation of data, leading to inefficiencies and potential security gaps," says John Pirc, vice president at Netenrich, a San Jose, Calif.-based security and operations analytics SaaS company.

Before SOCs pick out and start using shiny, new data-driven tools, they need to think carefully about whether those tools will play nicely with existing systems and data streams. Data ingestion and mobility costs can quickly spiral, and much of that has to do with barriers to integration and correlation that stem from normalization and data quality issues.

"For SOCs evaluating or deploying data-focused tools, the most important best practices are ensuring the tool's scalability and compatibility with existing systems and verifying that it provides actionable insights rather than just data collection," Pirc says.

Standard Field Scheme for Log Data

One way that a security team can extend its ability to use more tooling and get the most out of the data sources available for security analysis is to be proactive about normalization.

"Security operations teams should establish a clear and standardized default field scheme for all log data within the organization," recommends Or Saya, cybersecurity architect at CardinalOps, a detection posture management company. "This involves defining the standard set of fields that should be present in every log entry, such as time stamp, source IP, destination IP, user, and action taken. Ensure consistency across different log sources to facilitate correlation and analysis."

As Saya explains, this standardization can help analysts map even the most obscure log sources to an understandable model, which makes it easier to build detection and correlation content around new sources. But this will take investment, as someone will need to babysit the process to continuously validate that the data is normalized against the scheme. If it isn't validated, then the organization is likely to suffer from blind spots that will be tough to pick up on.
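To make the idea concrete, here is a minimal Python sketch of mapping two log sources onto a shared field scheme and then checking each event against it. The standard field names follow the ones Saya lists; the source names ("firewall", "webproxy"), their raw field names, and the helper functions are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timezone

# Standard field scheme, using the fields named in the quote above.
STANDARD_FIELDS = ("timestamp", "source_ip", "destination_ip", "user", "action")

# Hypothetical per-source mappings: raw field name -> standard field name.
FIELD_MAPS = {
    "firewall": {"ts": "timestamp", "src": "source_ip", "dst": "destination_ip",
                 "uname": "user", "verdict": "action"},
    "webproxy": {"time": "timestamp", "client_ip": "source_ip",
                 "server_ip": "destination_ip", "login": "user",
                 "method": "action"},
}


def normalize(source: str, raw: dict) -> dict:
    """Rename a raw record's fields to the standard scheme."""
    mapping = FIELD_MAPS[source]
    event = {std: raw[src] for src, std in mapping.items() if src in raw}
    # Convert epoch timestamps to ISO 8601 UTC strings so sources compare alike.
    if isinstance(event.get("timestamp"), (int, float)):
        event["timestamp"] = datetime.fromtimestamp(
            event["timestamp"], tz=timezone.utc
        ).isoformat()
    event["log_source"] = source
    return event


def missing_fields(event: dict) -> list:
    """Return the standard fields that are absent or empty in an event."""
    return [f for f in STANDARD_FIELDS if not event.get(f)]


fw = normalize("firewall", {"ts": 1710331200, "src": "10.0.0.5",
                            "dst": "10.0.0.9", "uname": "alice",
                            "verdict": "deny"})
px = normalize("webproxy", {"time": "2024-03-13T12:00:00+00:00",
                            "client_ip": "10.0.0.7", "method": "GET"})

print(missing_fields(fw))  # []
print(missing_fields(px))  # ['destination_ip', 'user']
```

Running a check like `missing_fields` continuously against incoming events is one way to do the ongoing validation described above: a source that starts failing the check is a candidate blind spot.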

Capabilities for Creating Content on Top of Data Streams

Relying solely on prebuilt artificial intelligence (AI) detection rules provided by a security product may not adequately address the organization's specific threat landscape and unique risks. It is important to acknowledge that while AI detection rules in security products are valuable, they may not cover all scenarios. SOC teams should implement a strategy for creating custom detection rules tailored to the organization's environment, industry, and specific risks. These custom rules can enhance the precision of threat detection and response by addressing context-specific threats that may not be covered by generic AI rules.

Training Data Lineage to Assure Trustworthy AI-Backed Correlation

Security data correlation and detection capabilities have come a long way through the use of data science — and that is bound to only accelerate through the intelligent use of AI and large language models (LLMs).

"The area of security operations most ripe for automation is the extraction of security-relevant signals from what looks like a pile of noise," says Brian Neuhaus, CTO of Americas at Vectra AI. However, the reliability of AI and LLMs in crunching security data for meaningful signals will hinge on a lot of data lineage and data management issues.

"Companies that don't have any experience with language models are beginning to integrate them into their products to analyze and reason about security incidents without understanding how those models operate, what data they were trained on, or why LLMs can hallucinate answers to the questions they shouldn't be able to answer, as well as hallucinating answers to questions they should be able to answer," Neuhaus says. "Poorly integrated AI and LLM capabilities will result in people having an ersatz sense of security, without actually being secured." Security leadership will need to vet AI-driven security correlation tooling carefully, particularly the data lineage of the training data that went into developing the models.

Evaluate Data Sources With an Eye Toward Costs

Ingesting poor-quality data into a security information and event management (SIEM) system or other security tool can be expensive and can distract security analysts from finding meaningful insights. Security operations teams should think carefully about the sources they lean on for analysis, evaluating and choosing sources with a sense of purpose and an eye toward costs.

"Defining clear objectives and requirements and how exactly more or better quality data will drive better decision-making will greatly benefit organizations," says Balazs Greksza, threat response lead at Ontinue, a managed detection and response (MDR) provider. "Data integrations should serve a purpose and have a perceived value beforehand to help prioritize the meaningful ones. Balancing lower TCO with security value and time to value, while integrating with all important internal data sources and tools, is a difficult equation that needs to be solved."

Beware Garbage Data

As organizations evaluate the data sources that feed their detection and correlation engines, they should look for ways to excise the noise from their data streams.

"We really try to suppress garbage data from getting even near our environment," says Greg Notch, CISO of MDR firm Expel and a longtime security veteran who served as CISO for the National Hockey League prior to this job. This data is neither high fidelity nor does it point toward meaningful outcomes.

Some examples of garbage data include network detections that don't come from highly restricted environments and untuned Windows logs (other than authentication logs), he says.

"These alerts are not high fidelity. They're not going to help us deliver a security outcome for you, so we're going to ignore it," Notch says, explaining the process his team takes to eliminate garbage data. "We've got very smart folks who are thinking about that data ingestion, what to take, what to leave behind, what things matter, how they fit together, so how an alert from your EDR [endpoint detection and response tool] would fit together with an alert from your network connectivity, and only taking the pieces of that that matter to make that correlation and give you the package data."
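As a rough illustration of suppressing low-fidelity data before it is ingested, the Python sketch below applies the two examples mentioned above as filter rules. It is not Expel's process: the event fields (`kind`, `source_ip`, `event_id`), the restricted network range, and the `keep` function are all assumptions made for the example. The Windows event IDs 4624 and 4625 are the standard successful and failed logon events.

```python
import ipaddress

# Assumed: network segments treated as highly restricted (hypothetical range).
RESTRICTED_NETS = [ipaddress.ip_network("10.50.0.0/16")]

# Windows Security event IDs for successful (4624) and failed (4625) logons.
AUTH_EVENT_IDS = {4624, 4625}


def keep(event: dict) -> bool:
    """Decide whether an event is worth ingesting."""
    kind = event.get("kind")
    if kind == "network_detection":
        # Keep network detections only when they come from a restricted segment.
        addr = ipaddress.ip_address(event["source_ip"])
        return any(addr in net for net in RESTRICTED_NETS)
    if kind == "windows_log":
        # Keep only authentication events from untuned Windows logs.
        return event.get("event_id") in AUTH_EVENT_IDS
    # Other sources pass through untouched in this sketch.
    return True


events = [
    {"kind": "network_detection", "source_ip": "10.50.3.4"},
    {"kind": "network_detection", "source_ip": "192.168.1.20"},
    {"kind": "windows_log", "event_id": 4625},
    {"kind": "windows_log", "event_id": 7036},
    {"kind": "edr_alert", "host": "laptop-17"},
]

kept = [e for e in events if keep(e)]
print(len(kept))  # 3
```

Filtering at the edge like this means the dropped events never incur ingestion or storage costs downstream.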

Cross-Pollinate SecOps Teams With Data Science Expertise

Picking the right data sources for effective analysis — and then coming up with the detection content to use those sources effectively — requires a blend of security and data science know-how. Whether it is by hiring security analysts with strong data science knowledge, training existing analysts in these concepts, hiring data science pros to work side by side with the security experts, or some combination of the three, security operations teams will increasingly need to cross-pollinate their skill sets with data science expertise.

In a robust organization such as an MSP or large enterprise, adding data scientists to the mix is increasingly a best practice.

"There's a yin and yang to the data science part of it and the people who are doing the security part of it," Notch says, noting that the right combination will feed more cost-effective design of security data architecture and execution of security data management. "The people who are building the detections that are both for a specific tool and span multiple tools, they understand what data they need to build those detections. They look for it in the data sets, and they communicate with the data science people who are very much about the cost optimization of the data pipelines. They're saying, 'Well, all right, we can get you just the pieces of that you need without having to bring along all of the other logging and all of the other telemetry information that comes along with it, or you can go query this other system where we don't have to pull it in.'"

Decouple Data for Flexibility

Many security strategists have been grabbing for the elusive brass ring of security data consolidation for decades. That was for so long the promise of SIEM — to provide a "single pane of glass" look into security-related data and offer a unified platform for data correlation and detection. But data ingestion and data egress costs across enterprise architecture, along with issues of normalization and parsing, have all contributed to clouding these waters. Some experts say that security needs to rethink the consolidation narrative, at least for the short- and medium-term.

"What you want to be able to do is decouple your analytics, your data and detection components, and even the incident response so that you can start mixing and matching them and basically removing them and adding them as you need to," says Oliver Rochford, a longtime security industry analyst and security futurist.
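One way to picture this kind of decoupling in code is to put the data store, the detection logic, and the response action behind small interfaces so that any one of them can be swapped without touching the others. The Python sketch below is only an illustration of that idea; `EventStore`, `InMemoryStore`, `run_pipeline`, and the sample events are hypothetical names and data, not part of any product.

```python
from typing import Callable, Iterable, List, Protocol


class EventStore(Protocol):
    """Anything that can return events for a named source."""

    def query(self, source: str) -> Iterable[dict]: ...


class InMemoryStore:
    """Stand-in for a SIEM, data lake, or other queryable store."""

    def __init__(self, events: Iterable[dict]) -> None:
        self._events = list(events)

    def query(self, source: str) -> List[dict]:
        return [e for e in self._events if e["log_source"] == source]


Detector = Callable[[dict], bool]
Responder = Callable[[dict], None]


def run_pipeline(store: EventStore, source: str,
                 detector: Detector, responder: Responder) -> int:
    """Apply a detector to one source and hand matches to a responder."""
    hits = 0
    for event in store.query(source):
        if detector(event):
            responder(event)
            hits += 1
    return hits


alerts: List[str] = []
store = InMemoryStore([
    {"log_source": "edr", "user": "alice", "action": "process_blocked"},
    {"log_source": "edr", "user": "bob", "action": "process_allowed"},
    {"log_source": "proxy", "user": "carol", "action": "process_blocked"},
])

hits = run_pipeline(
    store,
    "edr",
    detector=lambda e: e["action"] == "process_blocked",
    responder=lambda e: alerts.append(e["user"]),
)
print(hits, alerts)  # 1 ['alice']
```

Because `run_pipeline` depends only on the `query` method and two callables, replacing the in-memory store with a different back end, or the responder with a ticketing integration, would not require changes to the detection logic.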

A Data Lake for More Cost-Effective Observability

As a part of that decoupling, an increasing number of security organizations are layering security data lakes into their analytics architecture. These unstructured pools of security data provide a flexible place to quickly and cheaply ingest new data sources that can still be directly queried and upon which new security analytics capabilities can be built or integrated.

"Security data lakes provide security teams more flexibility and faster time to value as they are not having to monkey with their back-end data architectures. A lot of legacy SIEMs require full-time employees just to manage the data infrastructure, and it requires a lot of care and feeding, particularly as you add new data sources," explains Ken Westin, field CISO of Panther Labs.

At the same time, he cautions not to get caught in the weeds with implementation.

"One mistake I have seen organizations make is to try and roll their own security data lake, which becomes a science project taking their security team's attention off of finding threats and more time as system administrators," he says.

Capabilities for Creating Content on Top of Data Streams

Telemetry and log data both play a role in the security data ecosystem, but the detection content built on top of that data is what SOC analysts prize most. As Netenrich's Pirc recommends, teams should be seeking data-driven security tools that provide those detection rules and security analysis content right out of the box. But prebuilt rules are probably not going to fully meet an organization's need to sift through the data and find the risks unique to it. No matter the architecture, organizations also need to pair their security data management capabilities with the ability to create good content on top of the data pipeline.

"It is important to acknowledge that while AI detection rules in security products are valuable, they may not cover all scenarios. SOC teams should implement a strategy for creating custom detection rules tailored to the organization's environment, industry, and specific risks," CardinalOps' Saya says. "These custom rules can enhance the precision of threat detection and response by addressing context-specific threats that may not be covered by generic AI rules."
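As a small example of what a custom detection rule might look like, the Python sketch below flags source IPs that produce a burst of failed logins within a short window. The rule itself, the thresholds, the `login_failed` action value, and the event field names are assumptions for illustration; a real rule would be tuned to the organization's own environment and written in whatever rule language its tooling supports.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def failed_login_bursts(events, threshold=3, window=timedelta(minutes=5)):
    """Return source IPs with `threshold` or more failed logins inside `window`."""
    by_ip = defaultdict(list)
    for e in events:
        if e["action"] == "login_failed":
            by_ip[e["source_ip"]].append(datetime.fromisoformat(e["timestamp"]))

    flagged = set()
    for ip, times in by_ip.items():
        times.sort()
        # Check each run of `threshold` consecutive failures for this IP.
        for i in range(len(times) - threshold + 1):
            if times[i + threshold - 1] - times[i] <= window:
                flagged.add(ip)
                break
    return flagged


def failure(ip: str, hhmm: str) -> dict:
    """Build a sample failed-login event (hypothetical data)."""
    return {"source_ip": ip, "action": "login_failed",
            "timestamp": f"2024-03-13T{hhmm}:00+00:00"}


events = [
    failure("10.0.0.5", "12:00"),
    failure("10.0.0.5", "12:01"),
    failure("10.0.0.5", "12:03"),   # three failures in three minutes
    failure("10.0.0.8", "12:00"),
    failure("10.0.0.8", "12:30"),
    failure("10.0.0.8", "13:00"),   # three failures, but spread over an hour
]

print(failed_login_bursts(events))  # {'10.0.0.5'}
```

The `threshold` and `window` parameters are where organization-specific context comes in: a value that is noisy in one environment may be appropriate in another.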

Future-Proof for New Data Sources

With the security market moving so quickly and new digital systems that must be monitored and logged arriving at an ever-faster pace, security teams are going to need to future-proof their security analytics capabilities. This is why security leaders should evaluate their analytics and data management tools not just against today's needs but also for the flexibility to handle unknown future needs without ripping and replacing.

"We don't know what key data sources will be needed five years from now," says Olivier Spielmann, global lead of managed detection and response services at Kudelski Security. "So it is important that we have some capabilities to have a platform and services to be able to ingest those new, unknown security controls that will be put in place and without having to change every two years."

About the Author

Ericka Chickowski, Contributing Writer

Ericka Chickowski specializes in coverage of information technology and business innovation. She has focused on information security for the better part of a decade and regularly writes about the security industry as a contributor to Dark Reading.
