Dark Reading is part of the Informa Tech Division of Informa PLC


Partner Perspectives: Connecting marketers to our tech communities.
By Vincent Weafer

10 Questions To Ask Yourself About Securing Big Data

Big data introduces new wrinkles for managing data volume, workloads, and tools. Securing increasingly large amounts of data begins with a good governance model across the information life cycle. From there, you may need specific controls to address various vulnerabilities. Here is a set of questions to help ensure that you have everything covered.

1. What is your high-risk and high-value data?

Data classification is labor intensive, but you have to do it. It just makes sense: The most valuable or sensitive data requires the highest levels of security. Line-of-business teams have to collaborate with legal and security personnel to get this right. A well-defined classification system should be paired with determination of data stewardship. If everybody owns the data, nobody is really accountable for its care and appropriate use, and it will be more difficult to apply information lifecycle policies.
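One way to make classification and stewardship concrete is to record both in a single policy map, so every level has a named, accountable owner and an explicit set of required controls. A minimal sketch (the levels, stewards, and control names are all hypothetical):

```python
# Hypothetical sketch: pair each classification level with the controls
# it mandates and a named data steward, so accountability is explicit.
CLASSIFICATION_POLICY = {
    "public":       {"steward": "marketing", "controls": []},
    "internal":     {"steward": "it-ops",    "controls": ["access_logging"]},
    "confidential": {"steward": "legal",     "controls": ["access_logging",
                                                          "encryption_at_rest"]},
    "restricted":   {"steward": "security",  "controls": ["access_logging",
                                                          "encryption_at_rest",
                                                          "mfa_required"]},
}

def required_controls(level: str) -> list[str]:
    """Return the controls mandated for a given classification level."""
    return CLASSIFICATION_POLICY[level]["controls"]
```

Because the map is data rather than prose, lifecycle tooling can query it directly when provisioning storage or granting access.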

2. What is your policy for data retention and deletion?

Every company needs clear direction on which data is kept, and for how long. Like any good policy, it needs to be clear, so everyone can follow it, and enforced, so everyone will.

More data means more opportunity, but it can also mean more risk. The first step to reducing that risk is to get rid of what you don't need. This is a classic tenet of information lifecycle management. If data doesn't have a purpose, it's a liability. One idea for reducing that liability with regard to privacy is to apply de-identification techniques before storing data. That way you can still look for trends, but the data can't be linked to any individual. De-identification might not be appropriate for every business need, but it can be a useful approach to have in your toolbox.
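One common de-identification technique is to replace direct identifiers with a keyed hash before storage: the same person always maps to the same pseudonym, so trend analysis still works, but the stored value cannot be reversed to the identity. A sketch under that assumption (field names and the secret are illustrative; a real secret would come from a key-management service):

```python
import hashlib
import hmac

# Secret "pepper" kept outside the data store; in practice, load it
# from a key-management service -- never hard-code it like this.
PEPPER = b"example-secret-do-not-hardcode"

def pseudonymize(value: str) -> str:
    """Replace a direct identifier with a keyed hash, so records can
    still be grouped for trend analysis without exposing the person."""
    return hmac.new(PEPPER, value.encode("utf-8"), hashlib.sha256).hexdigest()

record = {"email": "alice@example.com", "purchase_total": 42.50}
stored = {**record, "email": pseudonymize(record["email"])}
```

Using an HMAC rather than a bare hash means an attacker who steals the data store cannot simply hash a list of known email addresses to re-identify records, as long as the pepper stays secret.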

3. How do you track who accesses which data?

Tracking the data, and who has access to it, is a foundational element of security. As your analytics programs become more successful, you are likely to be exposed to more sensitive data, so tools and storage mechanisms should have that tracking capability built in from the beginning. After all, if you don't have the right tracking tools in place at the outset, it's hard to add them after the fact.
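Building tracking in from the beginning can be as simple as routing every data access through a function that appends a structured audit record. A minimal sketch (the record fields and log location are hypothetical; production systems would ship these to a tamper-resistant store):

```python
import json
import time

def log_access(user: str, dataset: str, action: str, log_path: str) -> None:
    """Append one structured audit record per data access, as
    newline-delimited JSON so it is easy to query later."""
    entry = {"ts": time.time(), "user": user,
             "dataset": dataset, "action": action}
    with open(log_path, "a") as f:
        f.write(json.dumps(entry) + "\n")
```

The point of the structured format is that "who accessed which data" becomes an answerable query rather than a forensic reconstruction.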

4. Are users creating copies of your corporate data?

Of course they are. Data tends to be copied. A department might want a local copy of a database for faster analysis. A single user might decide to put some data in an Excel spreadsheet, and so on.

So the next question to ask yourself is this: what is the governance model for this process, and how are policies for control passed through to the new copy and the maintainer of this resource? Articulating a clear answer for your company will help prevent sensitive data from leaking out by gradually passing into less secure repositories.
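One way to keep policy attached to copies is to carry the classification and steward as metadata on the dataset object itself, so any derived copy inherits them by construction. An illustrative sketch (the types and field names are hypothetical):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Dataset:
    name: str
    classification: str  # e.g. "confidential"
    steward: str         # accountable owner of this copy

def make_copy(src: Dataset, new_name: str, new_steward: str) -> Dataset:
    """Derive a copy that inherits the source's classification, so the
    control policy follows the data into the new repository."""
    return replace(src, name=new_name, steward=new_steward)

master = Dataset("customer_db", "confidential", "dba-team")
local = make_copy(master, "customer_db_local", "analytics-team")
```

The key design choice is that a copy can change its name and steward, but never silently shed its classification.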

5. What types of encryption and data integrity mechanisms are required?

Beyond technical issues of cryptographic strength, hashing, salting, and so on, here are some sometimes-overlooked questions to address:

- Is your encryption setup truly end-to-end, or is there a window of vulnerability between data capture and encryption, or at the point when data is decrypted for analysis? A number of famous data breaches have occurred when hackers grabbed data at the point of capture.

- Does your encryption method work seamlessly across all databases in your environment?

- Do you store and manage your encryption keys securely, and who has access to those keys?
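On the last point, "who has access to those keys" should be an enforced policy with an audit trail, not tribal knowledge. A minimal sketch of gating key retrieval behind an allow-list (the service names are hypothetical, and the returned bytes are a placeholder; real keys would come from a KMS or HSM):

```python
# Hypothetical sketch: gate key retrieval behind an explicit allow-list
# and log every request, so "who can reach the keys" is answerable.
AUTHORIZED_KEY_USERS = {"etl-service", "backup-service"}
key_access_log: list[tuple[str, str, bool]] = []

def fetch_key(requester: str, key_id: str) -> bytes:
    """Return key material only to authorized requesters; record
    every attempt, allowed or denied."""
    allowed = requester in AUTHORIZED_KEY_USERS
    key_access_log.append((requester, key_id, allowed))
    if not allowed:
        raise PermissionError(f"{requester} may not read key {key_id}")
    return b"\x00" * 32  # placeholder; real keys come from a KMS/HSM
```

Logging denied attempts as well as granted ones is deliberate: failed key requests are often the earliest signal of a compromised service.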

Encryption protects data from theft, but doesn’t guarantee its integrity. Separate data integrity mechanisms are required for some use cases, and become increasingly important as data volumes grow and more data sources are incorporated. For example, to mitigate the risk of data poisoning or pollution, a company can implement automatic checks flagging incoming data that doesn’t match the expected volume, file size or pattern.
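The volume-based check described above can be sketched in a few lines: flag any incoming batch whose record count deviates too far from the historical mean. The tolerance and history here are illustrative; a production check might also compare file sizes, schemas, or value distributions:

```python
def flag_anomalous_batch(record_count: int, history: list[int],
                         tolerance: float = 0.5) -> bool:
    """Flag an incoming batch whose record count deviates from the
    historical mean by more than `tolerance` (a fraction of the mean)."""
    if not history:
        return False  # nothing to compare against yet
    mean = sum(history) / len(history)
    return abs(record_count - mean) > tolerance * mean

history = [1000, 980, 1020, 1005]
flag_anomalous_batch(2500, history)  # → True: volume spike, worth a look
```

A flagged batch doesn't have to be rejected outright; quarantining it for review is usually enough to blunt a poisoning attempt without stalling the pipeline.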

6. If your algorithms or data analysis methods are proprietary, how do you protect them?

Protecting proprietary discoveries? That’s old hat. What’s easier to miss is the way you arrive at those discoveries. In a competitive industry, a killer algorithm can be a valuable piece of intellectual property.

The data and systems get most of the glory, but analysis methods may deserve just as much protection, with both technical and legal safeguards. Have you vetted and published a plan for securely handling this type of information?

7. How do you validate the security posture of all physical and virtual nodes in your analysis computing cluster?

Big-data analysis often relies on the power of distributed computing. A rogue or infected node can cause your cluster to spring a data leak. Hardware-based controls deserve consideration.
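One hedged sketch of posture validation, assuming each node reports digests of its installed artifacts and the cluster maintains a known-good baseline (all names and the baseline content are hypothetical; real deployments would use signed attestation, ideally hardware-backed):

```python
import hashlib

def digest(blob: bytes) -> str:
    """SHA-256 digest of an artifact's bytes, hex-encoded."""
    return hashlib.sha256(blob).hexdigest()

# Hypothetical baseline: digests of artifacts approved for the cluster.
KNOWN_GOOD = {"agent.bin": digest(b"approved build 1.2.3")}

def node_is_trusted(reported: dict[str, str]) -> bool:
    """Admit a node only if it reports every baseline artifact and
    each reported digest matches the approved one."""
    return KNOWN_GOOD.keys() <= reported.keys() and all(
        reported[name] == good for name, good in KNOWN_GOOD.items()
    )
```

Checking that the node reports *every* baseline artifact, not just that the ones it reports match, prevents a rogue node from passing by omission.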

8. Are you working with data generated by Internet of Things sensors?

The key with IoT is to ensure that data is consistently secured from the edge to the data center, with a particular eye on privacy-related data. IoT sensors may present their own security challenges. Are all gateways or other edge devices adequately protected? Industrial devices can be difficult to patch and may have a less mature vulnerability management process.

9. What role does the cloud play in your analytics program?

You'll want to review the contractual obligations and internal policies of any third parties hosting your data or processing it. It's important to know which physical locations they will use, and whether all those facilities have consistent physical (not just logical) security controls. And of course, the geographic locations may impact your regulatory compliance programs.

10. Which individuals in your IT organization are developing security skills and knowledge specific to your big-data tool set?

Over time, your project list, data sets, and toolbox are likely to grow. The more in-house knowledge you develop, the better your own security questions will be. 

Vincent Weafer is Senior Vice President of Intel Security, managing more than 350 researchers across 30 countries. He is also responsible for managing millions of sensors across the globe, all dedicated to protecting customers from the latest cyber threats.