So what exactly is big data? In a nutshell, it's a data set that's too big to be crunched by traditional database tools. Whether it comes from scientific or environmental sensors spewing out a cascade of readings, financial systems producing a mounting cavalcade of information, or web and social media apps creating a snowballing mass of records, big data is typically classed as such when it exhibits three essential dimensions. These are what Gartner analyst Doug Laney, then of META Group, dubbed the 3Vs of data management back in 2001: volume, variety and velocity. The first is obvious: something wouldn't be called big data if there weren't a heck of a lot of it. But big data is also a swarm of largely unstructured data that has to be fast to store, fast to retrieve and, most importantly, fast to analyze.
"While many analysts were talking about, many clients were lamenting, and many vendors were seizing the opportunity of these fast-growing data stores, I also realized that something else was going on," Laney wrote recently in a retrospective on that first report. "Sea changes in the speed at which data was flowing, mainly due to electronic commerce, along with the increasing breadth of data sources, structures and formats due to the post-Y2K ERP application boom, were as or more challenging to data management teams than was the increasing quantity of data."
When Laney first wrote about the 3Vs 11 years ago, he was mostly addressing the data management challenges that had contributed to the evolution of data warehousing. These types of data stores gain their value mainly through analysis--which is why data warehousing and business intelligence had gone hand-in-hand for years before "big data" became common parlance. The potential benefits of analyzing big data include the ability to make better business decisions and to reduce waste in vertical markets such as the public and health care sectors. According to a study by the McKinsey Global Institute (MGI), retailers properly utilizing big data could increase their operating margin by a whopping 60 percent.
Whether big data is going to reside in the data warehouse or some other, more scalable data store remains up in the air. One thing is certain, though: big data is not easily handled by the relational databases that the typical DBA is used to wrangling in the traditional enterprise database server environment.
"What’s emerging is a new world of horizontally scaling, unstructured databases that are better at solving some old problems. More importantly, they’re prompting us to think of new problems to solve whose resolution was never attempted before, because it just couldn’t be done," say the authors of the Accenture Technology Vision 2012 report released last week. "We foresee a rebalancing of the database landscape as data architects embrace the fact that relational databases are no longer the only tool in the toolkit."
The question for security professionals, of course, is this: if this growing mass of data is becoming increasingly unstructured and is accessed by an ever-more-distributed cloud of users and applications looking to slice and dice it in a million and one ways, how can they be sure they're keeping tabs on the regulated information in the mix?
"Organizations aren’t realizing the importance of such areas as PCI or PHI and failing to take necessary steps because it is flowing with other basic data," says Jon Heimerl, director of strategic security for Solutionary. "Mainly, big data stores are leading organizations to not worry enough about very specific pieces of information."
Joe Gottlieb, president and CEO of Sensage, says that the healthcare example is one of the most important for compliance executives as they examine how big data creation, storage and flow works in their organizations.
"The move to electronic health record (EHR) systems driven by HIPAA/HITECH is causing a dramatic increase in the accumulation, access and inter-enterprise exchange of PII," he says. "For the largest healthcare providers and payers, this has already become a big data problem that must be solved to maintain compliance."
While the prospect of proving compliance within massively muddled big data stores may seem daunting, the slow development of laws and regulations may work in favor of CISOs trying to get a bead on big data.
"From a compliance perspective, many of the laws and regulations have not addressed the unique challenges of data warehousing. Many of the regulations don’t address the rules around protecting data from different customers at different levels," says Tom McAndrew, executive vice president of professional services at Coalfire. "For example, if a database has credit card data and healthcare data, do PCI and HIPAA apply to the entire data store, or only the parts of the data store that have the data? The answer is highly dependent on your interpretation of the requirements and the way you have implemented the technology."
Similarly, social media applications that are collecting tons of unregulated yet potentially sensitive data may not yet be a compliance concern. But they are still a security problem--one that, if not properly addressed now, may well be regulated in the future.
"Social networks are accumulating massive amounts of unstructured data--a primary fuel for the big data problem, but they are not yet regulated so this is not a compliance concern but remains as a security concern," Gottlieb says.
According to McAndrew, security professionals concerned about how things like Hadoop and NoSQL deployments are going to affect their compliance efforts need to take a deep breath and remember that the general principles of data security still apply.
"It really starts with knowing where your data resides. The good news is that with the newer database solutions, there are automated ways of detecting data and triaging systems that appear to have data they shouldn’t," he says. "As you get your organization to map and understand your data, look for opportunities to automate and monitor compliance and security through data warehouse technologies. Automation has the ability to decrease compliance and security costs and get higher levels of assurance that you know where your data is and where it is going."
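The kind of automated data detection McAndrew describes can be sketched in a few lines. The example below is a hypothetical illustration, not any vendor's product: it scans free-form records for strings that look like payment card numbers (13 to 16 digits passing the Luhn checksum) and flags the systems holding them as candidates for PCI scope. The field names and pattern are assumptions for illustration only.

```python
import re

# Illustrative pattern: runs of 13-16 digits, optionally separated
# by spaces or dashes, as card numbers often appear in raw text.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum,
    which weeds out most random digit runs that merely look like cards."""
    digits = [int(d) for d in number][::-1]
    total = 0
    for i, d in enumerate(digits):
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def flag_systems(records: dict) -> set:
    """Given {system_name: [text records]}, return the systems that
    appear to hold card data and so likely fall under PCI scope."""
    flagged = set()
    for system, texts in records.items():
        for text in texts:
            for match in CARD_PATTERN.findall(text):
                if luhn_valid(re.sub(r"[ -]", "", match)):
                    flagged.add(system)
    return flagged
```

A real discovery tool would cover many more data types (SSNs, health record identifiers) and sample stores rather than scanning everything, but the principle is the same: automate the inventory so humans only triage the flagged systems.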
In addition to understanding where the important data sits, organizations also need to find ways to segregate it, which will make the deployment of security measures such as encryption and monitoring more manageable.
"After organizations better understand their data, they need to take important steps to segregate it. The more data you silo as high-level, the easier it will be to protect and control it," Heimerl says. "Smaller sample sizes are easier to protect and can be monitored separately for specific necessary controls."
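The segregation Heimerl recommends can be thought of as a routing step at ingestion time. The following is a minimal sketch under assumed policy rules: records carrying any regulated field are diverted into a small high-sensitivity silo that can then receive stricter controls (encryption, separate monitoring) than the bulk store. The field names and silo labels are illustrative assumptions, not part of any standard.

```python
# Assumed policy: any record carrying one of these fields is regulated
# data (PII, cardholder data, or PHI) and belongs in the high silo.
SENSITIVE_FIELDS = {"ssn", "card_number", "diagnosis"}

def classify(record: dict) -> str:
    """Label a record 'high' if it carries any regulated field."""
    return "high" if SENSITIVE_FIELDS & record.keys() else "low"

def segregate(records):
    """Split an incoming stream of records into per-sensitivity silos,
    so the small 'high' silo can be encrypted and monitored separately."""
    silos = {"high": [], "low": []}
    for rec in records:
        silos[classify(rec)].append(rec)
    return silos
```

The payoff is exactly Heimerl's point: the high-sensitivity silo stays small, so applying expensive controls to it, and monitoring it for specific regulatory requirements, remains tractable even as the bulk store grows.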