Dark Reading is part of the Informa Tech Division of Informa PLC

Partner Perspectives  Connecting marketers to our tech communities.
10:00 AM
Vincent Weafer

10 Questions To Ask Yourself About Securing Big Data

Big data introduces new wrinkles for managing data volume, workloads, and tools. Securing increasingly large amounts of data begins with a good governance model across the information life cycle. From there, you may need specific controls to address various vulnerabilities. Here is a set of questions to help ensure that you have everything covered.

1. What is your high-risk and high-value data?

Data classification is labor-intensive, but you have to do it. It just makes sense: The most valuable or sensitive data requires the highest levels of security. Line-of-business teams have to collaborate with legal and security personnel to get this right. A well-defined classification system should be paired with clearly assigned data stewardship. If everybody owns the data, nobody is really accountable for its care and appropriate use, and it will be more difficult to apply information lifecycle policies.

2. What is your policy for data retention and deletion?

Every company needs clear directions on which data is kept, and for how long. Like any good policy, it needs to be clear, so everyone can follow it, and it needs to be enforced, so they will.
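
One way to make a retention policy enforceable is to write it down as data that a scheduled job can check. The following is a minimal Python sketch of that idea, not a prescribed implementation: the categories, the retention periods, and the names RETENTION_DAYS, is_expired, and records_to_delete are all hypothetical.

```python
from datetime import date, timedelta

# Hypothetical retention schedule: data category -> days to keep.
RETENTION_DAYS = {
    "web_logs": 90,
    "support_tickets": 365,
    "financial_records": 7 * 365,
}

def is_expired(category, created, today):
    """Return True when a record has outlived its retention period.

    An unknown category raises KeyError, so unclassified data is
    surfaced instead of being silently kept or deleted.
    """
    limit = RETENTION_DAYS[category]
    return today - created > timedelta(days=limit)

def records_to_delete(records, today):
    """Select the records whose retention period has passed."""
    return [r for r in records if is_expired(r["category"], r["created"], today)]

records = [
    {"id": 1, "category": "web_logs", "created": date(2016, 1, 1)},
    {"id": 2, "category": "web_logs", "created": date(2016, 4, 1)},
    {"id": 3, "category": "financial_records", "created": date(2012, 1, 1)},
]
expired = records_to_delete(records, today=date(2016, 4, 27))
print([r["id"] for r in expired])
```

Keeping the schedule in one table means legal and security teams can review the policy without reading the code that applies it.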

More data means more opportunity, but it can also mean more risk. The first step to reducing that risk is to get rid of what you don’t need. This is a classic tenet of information lifecycle management. If data doesn’t have a purpose, it’s a liability. One idea for reducing that liability with regard to privacy is to apply de-identification techniques before storing data. That way you can still look for trends, but the data can’t be linked to any individual. De-identification might not be appropriate for every business need, but it can be a useful approach to have in your toolbox.
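
As a rough illustration of de-identification before storage, the Python sketch below drops one direct identifier and replaces another with a keyed hash, so that records from the same person can still be grouped for trend analysis. The field names, the key, and the functions pseudonymize and deidentify are assumptions made for this example. Keyed hashing is strictly pseudonymization, only one technique among several, and the remaining fields may still allow re-identification if they are detailed enough.

```python
import hashlib
import hmac

def pseudonymize(value, secret_key):
    """Replace an identifier with a keyed SHA-256 digest (HMAC)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()

def deidentify(record, secret_key, id_fields=("email",), drop_fields=("name",)):
    """Return a copy with direct identifiers dropped or replaced by tokens."""
    cleaned = {}
    for field, value in record.items():
        if field in drop_fields:
            continue  # not needed for analysis, so never stored
        if field in id_fields:
            cleaned[field] = pseudonymize(value, secret_key)
        else:
            cleaned[field] = value
    return cleaned

# Hypothetical key for the example; a real key belongs in a key manager.
key = b"example-key-keep-real-keys-in-a-key-manager"
a = deidentify({"name": "Pat Doe", "email": "pat@example.com", "purchase": 42.0}, key)
b = deidentify({"name": "Pat Doe", "email": "pat@example.com", "purchase": 13.5}, key)

# The same person maps to the same token, so trends remain visible.
print(a["email"] == b["email"])
```

Using a keyed hash rather than a plain hash means that someone who obtains the stored data cannot simply hash a list of known email addresses to reverse the tokens, provided the key is kept separately.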

3. How do you track who accesses which data?

Tracking the data, and who has access to it, is a foundational element of security. As your analytics programs become more successful, you are likely to be exposed to more sensitive data, so tools and storage mechanisms should have that tracking capability built in from the beginning. After all, if you don’t have the right tracking tools in place at the outset, it’s hard to add them after the fact.
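
To show what built-in tracking can look like, here is a small Python sketch in which every read goes through a wrapper that records who asked for which data set and whether the request was allowed. The class name AuditedStore, the permission model, and the sample data are hypothetical; a production system would write to tamper-resistant storage rather than an in-memory list.

```python
import json
from datetime import datetime, timezone

class AuditedStore:
    """Wrap a dict-like data store and record every read attempt."""

    def __init__(self, data, permissions):
        self._data = data
        self._permissions = permissions  # user -> set of dataset names
        self.audit_log = []

    def read(self, user, dataset):
        allowed = dataset in self._permissions.get(user, set())
        # Log before enforcing, so denied attempts are recorded too.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "dataset": dataset,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{user} may not read {dataset}")
        return self._data[dataset]

store = AuditedStore(
    data={"sales": [100, 250], "payroll": [5000]},
    permissions={"analyst": {"sales"}},
)
store.read("analyst", "sales")
try:
    store.read("analyst", "payroll")
except PermissionError:
    pass  # the denied attempt is still in the log

for entry in store.audit_log:
    print(json.dumps(entry))
```

Because the log entry is written inside the only read path, there is no way to reach the data without leaving a record, which is the property that is hard to retrofit later.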

4. Are users creating copies of your corporate data?

Of course they are. Data tends to be copied. A department might want a local copy of a database for faster analysis. A single user might decide to put some data in an Excel spreadsheet, and so on.

So the next question to ask yourself is this: What is the governance model for this process, and how are policies for control passed through to the new copy and the maintainer of this resource? Articulating a clear answer for your company will help prevent sensitive data from gradually leaking into less secure repositories.

5. What types of encryption and data integrity mechanisms are required?

Beyond technical issues of cryptographic strength, hashing and salting and so on, here are sometimes-overlooked questions to address:

·       Is your encryption setup truly end-to-end, or is there a window of vulnerability between data capture and encryption, or at the point when data is decrypted for analysis? A number of famous data breaches have occurred when hackers grabbed data at the point of capture.

·       Does your encryption method work seamlessly across all databases in your environment?

·       Do you store and manage your encryption keys securely, and who has access to those keys?

Encryption protects data from theft, but doesn’t guarantee its integrity. Separate data integrity mechanisms are required for some use cases, and become increasingly important as data volumes grow and more data sources are incorporated. For example, to mitigate the risk of data poisoning or pollution, a company can implement automatic checks flagging incoming data that doesn’t match the expected volume, file size or pattern.
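
The automatic checks described above can be quite simple. The Python sketch below flags an incoming batch whose size, row count, or line format falls outside what is expected. The thresholds, the line pattern, and the names EXPECTED and check_batch are assumptions chosen for illustration; real limits would come from the observed history of each data source.

```python
import re

# Hypothetical expectations for one incoming feed.
EXPECTED = {
    "min_rows": 3,
    "max_rows": 1000,
    "max_bytes": 1_000_000,
    # date, three-letter code, numeric amount
    "line_pattern": re.compile(r"^\d{4}-\d{2}-\d{2},[A-Z]{3},\d+(\.\d+)?$"),
}

def check_batch(raw, expected=EXPECTED):
    """Return a list of reasons to flag the batch; empty means it passed."""
    problems = []
    size = len(raw.encode("utf-8"))
    if size > expected["max_bytes"]:
        problems.append(f"size {size} bytes exceeds limit")
    lines = [line for line in raw.splitlines() if line.strip()]
    if not expected["min_rows"] <= len(lines) <= expected["max_rows"]:
        problems.append(f"row count {len(lines)} outside expected range")
    for number, line in enumerate(lines, start=1):
        if not expected["line_pattern"].match(line):
            problems.append(f"line {number} does not match expected pattern")
    return problems

good = "2016-04-25,USD,10.5\n2016-04-26,USD,11\n2016-04-27,EUR,9.75\n"
bad = "2016-04-25,USD,10.5\nDROP TABLE sales\n"

print(check_batch(good))
print(check_batch(bad))
```

A flagged batch would typically be quarantined for review rather than rejected outright, since an unusual volume can also be legitimate.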

6. If your algorithms or data analysis methods are proprietary, how do you protect them?

Protecting proprietary discoveries? That’s old hat. What’s easier to miss is the way you arrive at those discoveries. In a competitive industry, a killer algorithm can be a valuable piece of intellectual property.

The data and systems get most of the glory, but analysis methods may deserve just as much protection, with both technical and legal safeguards. Have you vetted and published a plan for securely handling this type of information?

7. How do you validate the security posture of all physical and virtual nodes in your analysis computing cluster?

Big-data analysis often relies on the power of distributed computing. A rogue or infected node can cause your cluster to spring a data leak. Hardware-based controls deserve consideration.

8. Are you working with data generated by Internet of Things sensors?

The key with IoT is to ensure that data is consistently secured from the edge to the data center, with a particular eye on privacy-related data. IoT sensors may present their own security challenges. Are all gateways or other edge devices adequately protected? Industrial devices can be difficult to patch, and their vulnerability management processes are often less mature.

9. What role does the cloud play in your analytics program?

You’ll want to review the contractual obligations and internal policies of those hosting your data or processing. It’s important to know which physical locations they will use, and whether all those facilities have consistent physical (not just logical) security controls. And of course, the geographic locations may impact your regulatory compliance programs.

10. Which individuals in your IT organization are developing security skills and knowledge specific to your big-data tool set?

Over time, your project list, data sets, and toolbox are likely to grow. The more in-house knowledge you develop, the better your own security questions will be. 

Vincent Weafer is Senior Vice President of Intel Security, managing more than 350 researchers across 30 countries. He's also responsible for managing millions of sensors across the globe, all dedicated to protecting our customers from the latest cyber threats. Vincent's team ...
Peter Fretty,
4/27/2016 | 1:44:57 PM
IoT necessity
Great advice. There is a huge need for a focus on developing and strengthening big data governance and security strategies, especially as we head into the IoT realm. Peter Fretty, IDG blogger for SAS Big Data Forum 