When I first started in the nascent cybersecurity field as a pimply-faced intern in '92, we analyzed each and every computer virus by hand — you might call it an artisan process. Fast forward 26 years and, like most other fields, cybersecurity is now driven by big data. Malware detection, cyber-risk management, incident investigation and response, and dozens of other cybersecurity use cases are all driven by the collection and analysis of huge amounts of security-relevant metadata.
The cybersecurity vendors were the first to get this — about a decade ago, antivirus vendors shifted from artisan analysis to the collection and analysis of massive amounts of machine telemetry from their customers' devices and networks. A handful of the largest corporations soon followed; they began collecting not only traditional security alerts (antivirus, firewall, intrusion-detection system), but also other types of data such as DNS records, web proxy and authentication logs, NetFlow records, and endpoint detection and response logs.
These early adopters realized that these secondary data feeds, while of huge volume and expensive to store and process, were indispensable for rooting out latent attacks. Some of these corporations imported this data into complex graph analysis solutions such as Palantir. But the biggest players built their own proprietary big data lakes, began archiving these data feeds, hired a stable of high-paid data scientists, and set out to make sense of the data.
Those are the heavy hitters, but what about everyone else? Today, most corporations have the capacity to collect only traditional security alerts (including antivirus, data-loss prevention, and firewall alerts). They can't afford the huge costs of storing, backing up, and processing the mounds of high-value metadata generated by these other sources. Because they can't afford to build and maintain their own data lakes, let alone hire highly paid data scientists to trawl the data, they pump what they can into an off-the-shelf security information and event management (SIEM) system such as Splunk, discard it after a few months to save money, and enable simple log-searching and alert-aggregation use cases. These corporations can't even begin to think about collecting, storing, or processing other, more useful signals. It's as if a bank had to piece together a heist from just the alarm and the broken glass, with no video footage to consult.
So, this is the state of the cybersecurity world.
A Different Way
Let's try to envision a different future. Imagine a future in which companies could, at reasonable expense, collect, store, index, and analyze all of the cybersecurity-relevant data from their environment. Not just alerts, but the full spectrum of security-relevant telemetry from their devices, networks, and cloud systems. And not just a few weeks or months of data (to save costs), but potentially years of historical data.
Furthermore, imagine what would happen if this data were stored in a well-documented, standardized form, encrypted and secured in enterprise-controlled data vaults. This would not only enable internal security teams to analyze it at scale for various cybersecurity use cases but also potentially enable service providers, with the permission of the data owner, to deliver additional services and derive new insights from this data. This would unlock the value of this data, and it would open it up to an entire ecosystem of providers. It would also enable the typical enterprise — not just the heavy hitters — to drastically improve their security posture.
This model is working in other industries and areas where extremely sensitive data has historically been siloed. For instance, the Open Government Partnership is enabling citizens to use public records to gain greater insight into the data collected on their behalf. Banks are unlocking customer data as part of the open banking movement so that customers can make better use of their information. And in healthcare, HMOs are using data about medicines, patient behaviors, and outcomes to generate insights across data sources that can improve patient care.
What would it look like if more than a tiny fraction of enterprises had access to all the signals hidden in their big data today? They could put that data to work on the full range of use cases that today only the heavy hitters can tackle: malware detection, cyber-risk management, incident investigation and response, and dozens more.
The biggest enterprises might build some of these solutions themselves, but the rest would rely on trusted service providers to slice and dice these data feeds and apply their secret sauce to solve these use cases. A robust analytics marketplace would emerge, with competing companies offering products that wring the most value out of their customers' data.
Enterprises are sitting on all the intelligence they need to better protect themselves from cyberattacks and other threats; they just need to make better use of it to turn the data into an effective defense. Other industries are seeing benefits from liberating data from proprietary silos, and in the future world we envision, security teams can too. The stakes are too high not to.
Carey Nachenberg is Chronicle's Chief Scientist and a founding member of the Chronicle team. Prior to joining Chronicle, Carey served as Fellow and Chief Engineer at Symantec. During his 22-year tenure in the cybersecurity industry, Carey pioneered numerous cybersecurity ...