Dark Reading is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them.Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Perimeter

12/7/2012
01:44 PM
Adrian Lane
Adrian Lane
Commentary
50%
50%

What Is Big Data?

Big data is not about buying more big iron

When someone says big data, what do you think of?

Do you think of mainframes? Data warehouses? Do you think of Oracle Grids, Exadata, or Teradata clusters?

Perhaps you think of Hadoop, MongoDB, Cassandra, or CouchDB? Or maybe it's any NoSQL database?

Or perhaps you think it's just a giant mass of data in one place?

If you read press articles on big data, then it's all of these things. It's my belief that no good definition of big data exists today. In fact, the term is so overused, and I think intentionally so, that it's almost meaningless. I want to address that problem here.

And I'll state up front that the big data phenomena is not because people are buying more big iron.

During the past year, I've spent an inordinate amount of time researching security in and around big data clusters. It has been a challenge; each time I think I have a handle on one aspect of what constitutes big data, I find an exception that breaks the conceptual model I've created. Every time I think I've quantified a specific attribute or feature, I find another variation of NoSQL that's an exception to the rule. It was even a struggle to just define what big data actually is, with definitions from Wikipedia and other sources missing several essential ingredients: In fact, the definition section of the Wikipedia entry on big data does not really offer a definition at all. All in all, this is one of the most difficult, and interesting, research projects I've been involved with.

I want to share some of the results of that research here because I think it will be helpful in understanding why securing big data is difficult, and how the challenge is not the same as relational platforms many of you are familiar with. In a future post, I'll discuss some of the fundamental differences in how big data systems are deployed and managed from a security perspective, but before I can talk about how to secure "it," I need to define what "it" is.

Yes, big data is about lots of data, of differing types, coming in at velocities that cripple most traditional database systems. But there are other essential characteristics besides size and the need for fast insertion, such as the ability to elastically scale as the data set grows. It's about distributed, parallel processing to tackle massive analysis tasks. It's about data redundancy to provide failure resistant operation, which is critical when computing environments span so many systems that hardware failures are to be expected during the course of operation.

And just as importantly, these systems are hardware-agnostic, accessible from complexity standpoint, extensible, and relatively inexpensive. These characteristics define big data systems.

The poster child for big data is Hadoop, which is a framework that at its core provides data management and query (map-reduce) services across (potentially) thousands of servers. Everything about big data clusters is designed to address storage and processing of multiple terabytes of data across as many systems as needed, in an elastic, expansive way. In fact, these clusters are so large that the prospect or failure increases to the point where it's probable a node will fail. Without elasticity, resiliency, and potential to process requests in more than one location, that makes big data different than the databases that have come before it.

But the reason why big data is a major trend is because of the convergence of three things: huge amounts of data with cheap computing resources and free (or nearly free) analytic tools. Enterprises and midmarket firms are all embracing big data not because they can suddenly afford to invest millions of dollars in data warehouse systems, MPPs, mainframes, or giant systems in-a-box. It's because they can now afford data analysis on massive data sets without spending much money up front. Cheap, commodity, or cloud computing resources with free and easy data management systems like Hadoop make it possible.

If you need to understand what big data is, then consider the characteristics outlined above. They should help you differentiate traditional systems from big data.

Adrian Lane is an analyst/CTO with Securosis LLC, an independent security consulting practice. Special to Dark Reading. Adrian Lane is a Security Strategist and brings over 25 years of industry experience to the Securosis team, much of it at the executive level. Adrian specializes in database security, data security, and secure software development. With experience at Ingres, Oracle, and ... View Full Bio

Comment  | 
Print  | 
More Insights
Comments
Newest First  |  Oldest First  |  Threaded View
Cryptodd
50%
50%
Cryptodd,
User Rank: Moderator
12/10/2012 | 5:49:16 PM
re: What Is Big Data?
Adrian from Securosis did a fantastic piece of research on securing Big Data that provides a nice summary of the topic.- The paper is available for download at http://www.vormetric.com/resou...-. Enjoy, TT
Mobile Banking Malware Up 50% in First Half of 2019
Kelly Sheridan, Staff Editor, Dark Reading,  1/17/2020
7 Tips for Infosec Pros Considering A Lateral Career Move
Kelly Sheridan, Staff Editor, Dark Reading,  1/21/2020
For Mismanaged SOCs, The Price Is Not Right
Kelly Sheridan, Staff Editor, Dark Reading,  1/22/2020
Register for Dark Reading Newsletters
White Papers
Video
Cartoon Contest
Write a Caption, Win a Starbucks Card! Click Here
Latest Comment:   It's a PEN test of our cloud security.
Current Issue
IT 2020: A Look Ahead
Are you ready for the critical changes that will occur in 2020? We've compiled editor insights from the best of our network (Dark Reading, Data Center Knowledge, InformationWeek, ITPro Today and Network Computing) to deliver to you a look at the trends, technologies, and threats that are emerging in the coming year. Download it today!
Flash Poll
How Enterprises are Attacking the Cybersecurity Problem
How Enterprises are Attacking the Cybersecurity Problem
Organizations have invested in a sweeping array of security technologies to address challenges associated with the growing number of cybersecurity attacks. However, the complexity involved in managing these technologies is emerging as a major problem. Read this report to find out what your peers biggest security challenges are and the technologies they are using to address them.
Twitter Feed
Dark Reading - Bug Report
Bug Report
Enterprise Vulnerabilities
From DHS/US-CERT's National Vulnerability Database
CVE-2020-7245
PUBLISHED: 2020-01-23
Incorrect username validation in the registration processes of CTFd through 2.2.2 allows a remote attacker to take over an arbitrary account after initiating a password reset. This is related to register() and reset_password() in auth.py. To exploit the vulnerability, one must register with a userna...
CVE-2019-14885
PUBLISHED: 2020-01-23
A flaw was found in the JBoss EAP Vault system in all versions before 7.2.6.GA. Confidential information of the system property's security attribute value is revealed in the JBoss EAP log file when executing a JBoss CLI 'reload' command. This flaw can lead to the exposure of confidential information...
CVE-2019-17570
PUBLISHED: 2020-01-23
An untrusted deserialization was found in the org.apache.xmlrpc.parser.XmlRpcResponseParser:addResult method of Apache XML-RPC (aka ws-xmlrpc) library. A malicious XML-RPC server could target a XML-RPC client causing it to execute arbitrary code. Apache XML-RPC is no longer maintained and this issue...
CVE-2020-6007
PUBLISHED: 2020-01-23
Philips Hue Bridge model 2.X prior to and including version 1935144020 contains a Heap-based Buffer Overflow when handling a long ZCL string during the commissioning phase, resulting in a remote code execution.
CVE-2012-4606
PUBLISHED: 2020-01-23
Citrix XenServer 4.1, 6.0, 5.6 SP2, 5.6 Feature Pack 1, 5.6 Common Criteria, 5.6, 5.5, 5.0, and 5.0 Update 3 contains a Local Privilege Escalation Vulnerability which could allow local users with access to a guest operating system to gain elevated privileges.