|Click here for more articles.|
While many RSA attendees had a hard time even figuring out what the heck vendors meant when they referred to "big data" at the show -- and perhaps even the vendors themselves were a bit fuzzy on the definitions -- talk about big data in security wasn't purely hype. In fact, the show acted as the proving grounds for practitioners at one financial institution to show how they’ve been able to use the power of Hadoop-driven clusters and business intelligence tools (BI) to parse more data far more quickly than with traditional SIEM tools.
The result has given that institution, Salt Lake City-based Zions Bancorporation, the ability to come closer to tasting that elusive fruit of the security monitoring world: achieving actionable intelligence on a real-time basis.
According to Preston Wood, CSO at Zions and the moderator of a panel of his Zion team members, the institution has been trying to move to a more data-driven approach to its security practice during the past several years. But it was finding that it was continually running into the limitations of its traditional SIEM tools.
In order to drive deeper forensics and to train statistical machine-learning models, Zions found it needed months or even years of data before it became functionally useful. This quantity of data and the frequency analysis of events was too much for SIEM to handle alone.
“We [knew] we’d be bumping our heads against the ceiling with SIEM fairly early on,” Wood said. “The underlying data technology just couldn’t handle it.”
What’s more, the analysis itself was watery. The team was swimming in data but had a hard time turning that into action.
“The SIEM is good for telling the data what to do,” Wood said. “But who is telling us what to do?”
The pivotal point came with Hadoop, which allowed the company to use data in a new, more effective way. Open-source Hadoop, when coupled with Google’s MapReduce, has made life much different for Zions.
“The crux of the system is the distributed file system,” said Mike Fowkes, director of fraud prevention and analytics for Zions. The file system makes it easy for administrators to run Java-based queries that will then run against data spread across multiple systems. This allows more timely analysis of a greater sum of data than was before possible.
Zions’ results have been dramatic. In an environment where its security systems generate 3 terabytes of data a week, just loading the previous day’s logs into the system can be a challenge. It used to take a full day, Foust said.
“With MapReduce, HIVE, and Hadoop, we’re doing it in near-real-time fashion,” he said. “We’re pulling in data every five minutes, hourly, every two minutes -- it just depends on the frequency of how fresh our data needs to be.”
And actual searches can be even more dramatically fast. Searching among a month’s load of logs could take anywhere between 20 minutes to an hour depending on how busy the server was, he said.
“In our environment within HIVE, it has been more like a minute to get the same deal,” Fowkes said.
Aside from a boost in data-mining firepower, Hadoop’s HDFS file system brings a robust level of availability to the data warehouse environment, too.
“If you’re running a job and something fails on a system, it will dynamically readjust,” said Fowkes, explaining that a failure of a node or a hard drive isn’t the show-stopper it used to be. Instead, the system is able to reapportion the data based on the number of remaining nodes.
With a fast and effective infrastructure set up and running, Zions uses the data for dozens of purposes. Database logs, firewall, antivirus, IDS logs, plus industry-specific logs like wire ACS deposit applications and credit data are all pulled together into a centralized syslog server.
While queries are written in Java, it takes more than an off-the-shelf Java programmer to put together meaningful queries and make sense of what they return. That’s where Aaron Caldiero comes in. As senior data scientist at Zions, he plays the part of “part computer scientist, part statistician, and part graphic designer,” he explains.
Caldiero's job is to collect and centralize the data, design methods of synthesizing it (ranging from basic logic to machine-learning algorithms), and then present it in a coherent way.
His approach has achieved incredible results for his organizations, but it may be foreign for security professionals.
“It’s a bottom-up process where you’re putting the data first,” Caldiero said.
Compiling huge amounts of data allows analysts to draw trends, patterns, or correlations that they might never have found had they put the questions first and sorted through terabytes of data for the answers.
It’s an approach that has worked well for Zion and Wood, and his team believes it could be well-applied elsewhere. Wood stressed that the power of big data analytics isn’t just for big companies, either.
“You can start with a single box in your environment,” he said, stressing that it is a technology well-suited for security, but the expectation needs to be set that “big data strategy is a journey, not a destination. It’s not a product you’re going to buy; it’s not something you’re going to stand up there and be done with.”
Have a comment on this story? Please click "Add Your Comment" below. If you'd like to contact Dark Reading's editors directly, send us a message.