Dark Reading is part of the Informa Tech Division of Informa PLC

Partner Perspectives
4/27/2016
10:00 AM
Vincent Weafer

10 Questions To Ask Yourself About Securing Big Data

Big data introduces new wrinkles for managing data volume, workloads, and tools. Securing increasingly large amounts of data begins with a good governance model across the information life cycle. From there, you may need specific controls to address various vulnerabilities. Here is a set of questions to help ensure that you have everything covered.

1. What is your high-risk and high-value data?

Data classification is labor intensive, but you have to do it. It just makes sense: The most valuable or sensitive data requires the highest levels of security. Line-of-business teams have to collaborate with legal and security personnel to get this right. A well-defined classification system should be paired with clearly assigned data stewardship. If everybody owns the data, nobody is really accountable for its care and appropriate use, and it will be more difficult to apply information lifecycle policies.

2. What is your policy for data retention and deletion?

Every company needs clear directions on which data is kept, and for how long. Like any good policy, it needs to be clear, so everyone can follow it, and it needs to be enforced, so they will.
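One way to make a retention policy enforceable is to express it as data that a scheduled job can apply. The following is a minimal sketch under stated assumptions: the category names, the retention periods, and the record layout are all hypothetical and are not taken from any particular product or regulation.

```python
from datetime import date, timedelta

# Hypothetical retention periods per data category (illustrative values only).
RETENTION_DAYS = {
    "web_logs": 90,
    "support_tickets": 365,
}


def is_expired(category: str, created: date, today: date) -> bool:
    """Return True when a record has outlived its category's retention period."""
    # An unknown category raises KeyError on purpose: unclassified data
    # should be surfaced for review rather than silently kept or deleted.
    limit = RETENTION_DAYS[category]
    return today - created > timedelta(days=limit)


def select_for_deletion(records, today):
    """Return the ids of records that the policy says should be deleted."""
    return [
        r["id"]
        for r in records
        if is_expired(r["category"], r["created"], today)
    ]


if __name__ == "__main__":
    sample = [
        {"id": 1, "category": "web_logs", "created": date(2016, 1, 1)},
        {"id": 2, "category": "support_tickets", "created": date(2016, 1, 1)},
    ]
    print(select_for_deletion(sample, today=date(2016, 4, 27)))
```

Keeping the periods in one table means the written policy and the enforced policy can be reviewed side by side.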

More data means more opportunity, but it can also mean more risk. The first step to reducing that risk is to get rid of what you don’t need. This is a classic tenet of information lifecycle management. If data doesn’t have a purpose, it’s a liability. One idea for reducing that liability with regard to privacy is to apply de-identification techniques before storing data. That way you can still look for trends, but the data can’t readily be linked to any individual. De-identification might not be appropriate for every business need, but it can be a useful approach to have in your toolbox.
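As an illustration of de-identification before storage, the sketch below drops direct identifiers and replaces a customer ID with a keyed hash (HMAC-SHA256), so records for the same customer can still be grouped for trend analysis. This is only one possible technique, strictly speaking pseudonymization rather than full anonymization, and the field names (`customer_id`, `name`, `email`) and the way the key is supplied are assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical names of fields that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "email"}


def pseudonymize(value: str, key: bytes) -> str:
    """Map a value to a stable token using a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: dict, key: bytes) -> dict:
    """Drop direct identifiers and replace the customer id with a token."""
    cleaned = {
        k: v
        for k, v in record.items()
        if k not in DIRECT_IDENTIFIERS and k != "customer_id"
    }
    cleaned["customer_token"] = pseudonymize(str(record["customer_id"]), key)
    return cleaned


if __name__ == "__main__":
    # In practice the key would come from a key management system,
    # not from source code; this literal is for demonstration only.
    demo_key = b"demo-key-not-for-production"
    raw = {"customer_id": 42, "name": "Ann", "email": "ann@example.org",
           "purchase_total": 19.99}
    print(deidentify(raw, demo_key))
```

Anyone holding the key can recompute a token from a known ID, so the key needs the same protection as the original identifiers, and the remaining fields may still allow re-identification when combined.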

3. How do you track who accesses which data?

Knowing who has access to data, and tracking who actually uses it, is a foundational element of security. As your analytics programs become more successful, they are likely to draw in more sensitive data, so tools and storage mechanisms should have tracking capability built in from the beginning. It is hard to add the right tracking tools after the fact.
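To show what built-in tracking can look like, here is a minimal sketch of a data store whose only read path also writes an audit entry. The class name `AuditedStore` and the entry fields are hypothetical; a real deployment would more likely rely on the audit features of its database or analytics platform and ship entries to tamper-resistant storage.

```python
from datetime import datetime, timezone


class AuditedStore:
    """A toy key-value store that records every read attempt."""

    def __init__(self, data):
        self._data = dict(data)
        self.audit_log = []  # in-memory only, for illustration

    def read(self, user: str, key: str):
        found = key in self._data
        # Record the attempt before returning, so failed lookups are logged too.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": "read",
            "key": key,
            "found": found,
        })
        if not found:
            raise KeyError(key)
        return self._data[key]


if __name__ == "__main__":
    store = AuditedStore({"salaries": [50000, 62000]})
    store.read("alice", "salaries")
    for entry in store.audit_log:
        print(entry["user"], entry["action"], entry["key"], entry["found"])
```

Because callers cannot reach the data without going through `read`, the log is complete by construction, which is the property that is hard to retrofit later.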

4. Are users creating copies of your corporate data?

Of course they are. Data tends to be copied. A department might want a local copy of a database for faster analysis. A single user might decide to put some data in an Excel spreadsheet, and so on.

So the next question to ask yourself is this: what is the governance model for this process, and how are policies for control passed through to the new copy and the maintainer of this resource? Articulating a clear answer for your company will help prevent sensitive data from leaking out by gradually passing into less secure repositories.
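One possible way to pass controls through to a copy is to make every copy carry its source's classification and name an accountable maintainer. The sketch below assumes a hypothetical `Dataset` record with `classification`, `steward`, and `parent` fields; it illustrates the idea and does not describe any specific governance tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Dataset:
    name: str
    classification: str            # e.g. "public", "internal", "restricted"
    steward: str                   # person or team accountable for this dataset
    parent: Optional[str] = None   # name of the dataset this was copied from


def make_copy(source: Dataset, new_name: str, maintainer: str) -> Dataset:
    """Register a copy that inherits the source's classification."""
    if not maintainer:
        raise ValueError("every copy needs a named maintainer")
    return Dataset(
        name=new_name,
        classification=source.classification,  # a copy is never less sensitive
        steward=maintainer,
        parent=source.name,
    )


if __name__ == "__main__":
    crm = Dataset("crm_master", "restricted", "sales-ops")
    local = make_copy(crm, "crm_marketing_extract", "marketing-analytics")
    print(local)
```

Recording the `parent` link also makes it possible to find every downstream copy when the source's classification or retention rules change.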

5. What types of encryption and data integrity mechanisms are required?

Beyond technical issues of cryptographic strength, hashing, salting, and so on, here are some sometimes-overlooked questions to address:

· Is your encryption setup truly end-to-end, or is there a window of vulnerability between data capture and encryption, or at the point when data is decrypted for analysis? A number of famous data breaches have occurred when hackers grabbed data at the point of capture.

· Does your encryption method work seamlessly across all databases in your environment?

· Do you store and manage your encryption keys securely, and who has access to those keys?

Encryption protects the confidentiality of data, but doesn’t guarantee its integrity. Separate data integrity mechanisms are required for some use cases, and become increasingly important as data volumes grow and more data sources are incorporated. For example, to mitigate the risk of data poisoning or pollution, a company can implement automatic checks that flag incoming data that doesn’t match the expected volume, file size, or pattern.
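The automatic checks described above can be sketched in a few lines. This example flags a batch whose row count falls outside an expected range and any row that does not match an expected pattern. The thresholds, the `sensor-<n>,<reading>` row format, and the function name `check_batch` are assumptions chosen for illustration.

```python
import re


def check_batch(rows, expected_min, expected_max, pattern):
    """Return a list of human-readable problems found in an incoming batch."""
    problems = []
    # Volume check: too few or too many rows may signal a broken or hostile feed.
    if not expected_min <= len(rows) <= expected_max:
        problems.append(
            f"row count {len(rows)} outside {expected_min}-{expected_max}"
        )
    # Pattern check: every row must match the expected format in full.
    regex = re.compile(pattern)
    for i, row in enumerate(rows):
        if regex.fullmatch(row) is None:
            problems.append(f"row {i} does not match expected pattern")
    return problems


if __name__ == "__main__":
    batch = ["sensor-1,20.5", "sensor-2,abc"]
    for problem in check_batch(batch, 1, 10, r"sensor-\d+,-?\d+(\.\d+)?"):
        print("FLAG:", problem)
```

Checks like these only flag data for review; deciding whether to quarantine or reject a flagged batch remains a policy choice.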

6. If your algorithms or data analysis methods are proprietary, how do you protect them?

Protecting proprietary discoveries? That’s old hat. What’s easier to miss is the way you arrive at those discoveries. In a competitive industry, a killer algorithm can be a valuable piece of intellectual property.

The data and systems get most of the glory, but analysis methods may deserve just as much protection, with both technical and legal safeguards. Have you vetted and published a plan for securely handling this type of information?

7. How do you validate the security posture of all physical and virtual nodes in your analysis computing cluster?

Big-data analysis often relies on the power of distributed computing. A rogue or infected node can cause your cluster to spring a data leak. Hardware-based controls deserve consideration.

8. Are you working with data generated by Internet of Things sensors?

The key with IoT is to ensure that data is consistently secured from the edge to the data center, with a particular eye on privacy-related data. IoT sensors may present their own security challenges. Are all gateways or other edge devices adequately protected? Industrial devices can be difficult to patch, or may have a less mature vulnerability management process.

9. What role does the cloud play in your analytics program?

You’ll want to review the contractual obligations and internal policies of those hosting your data or processing. It’s important to know which physical locations they will use, and whether all those facilities have consistent physical (not just logical) security controls. And of course, the geographic locations may impact your regulatory compliance programs.

10. Which individuals in your IT organization are developing security skills and knowledge specific to your big-data tool set?

Over time, your project list, data sets, and toolbox are likely to grow. The more in-house knowledge you develop, the better your own security questions will be. 

Vincent Weafer is Senior Vice President of Intel Security, managing more than 350 researchers across 30 countries. He's also responsible for managing millions of sensors across the globe, all dedicated to protecting our customers from the latest cyber threats. Vincent's team ...
Comments
Peter Fretty,
User Rank: Moderator
4/27/2016 | 1:44:57 PM
IoT necessity
Great advice. There is a huge need for a focus on developing and strengthening big data governance and security strategies, especially as we head into the IoT realm. Peter Fretty, IDG blogger for SAS Big Data Forum 