Log analysis and log management are often considered dirty words to enterprises, unless they're forced to adopt them for compliance reasons. It's not that log analysis and management have a negative impact on the security posture of an organization -- just the opposite. But their uses and technologies are regularly misunderstood, leading to the potential for security breaches going unnoticed for days, weeks, and sometimes months.
According to the Verizon 2009 Data Breach Investigation Report, "66 percent of victims had sufficient evidence available within their logs to discover the breach had they been more diligent in analyzing such resources." This raises the question: Why is it that organizations big and small fail to do proper log analysis? Or, after going through the effort to set up logging, why aren't they using those logs to detect issues as they arise?
The root cause of such problems is a fundamental lack of understanding about what should be logged, how the data should be centralized, and how it should be analyzed once collected. Even when a company implements a solution to address the first two issues, it's the third that sends its staff straight into information overload -- a problem that can be just as bad as not having logs at all.
Log analysis is a daunting task, but it can benefit an organization both proactively and reactively. For example, if log analysis is done regularly, the events leading up to a user account being broken into or a piece of hardware failing can be caught before serious damage occurs. Likewise, if a security breach does occur, log analysis can provide the forensic evidence to determine what happened and which systems were affected.
Centralization is the first step. System administrators and security professionals do not have time to log into tens, hundreds, or even thousands of systems, so logs need to be shipped from the source to a hardened log collection server. The central collection server, whether centralized by geography or by organizational unit, can be as simple as a syslog server or as complex as a security information and event management (SIEM) solution.
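As a rough illustration of shipping logs off the source system, the sketch below uses Python's standard-library SysLogHandler to forward an application's records to a collector over UDP. The logger name, the "myapp" tag, and the collector address are placeholders chosen for this example, not details from any particular product.

```python
import logging
import logging.handlers


def build_forwarding_logger(collector_host: str, collector_port: int = 514) -> logging.Logger:
    """Return a logger that forwards records to a central syslog collector (UDP)."""
    logger = logging.getLogger("app.forwarded")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False

    # Remove handlers from earlier calls so records are not sent twice.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()

    handler = logging.handlers.SysLogHandler(
        address=(collector_host, collector_port),  # UDP is the default transport
        facility=logging.handlers.SysLogHandler.LOG_AUTH,
    )
    # "myapp" is an illustrative tag so the collector can tell sources apart.
    handler.setFormatter(logging.Formatter("myapp: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

A caller would then write something like build_forwarding_logger("logs.example.internal").warning("failed login for user=alice"), where the hostname is an assumed collector address. UDP delivery is best effort, so a production setup would likely want a more reliable transport.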
Collecting logs is the easy part. It's how to manage them that's difficult for most IT shops: how to store the logs, how long to store them, whether you keep all of them, etc. Sometimes that can be handled by a SIEM or complementary product that focuses just on those tasks.
Then there's the analysis. Because no one wants to sit around and stare at logs all day, automation and correlation play a huge role in separating the wheat from the chaff. The key for any solution is to bring the events of interest to the surface so an analyst can alert the right people to address the problem. Essentially, log analysis, whether done through homegrown tools or an expensive SIEM, needs to feed actionable information to operations staff dealing with daily issues and to security teams handling incidents.
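To make "bringing events of interest to the surface" concrete, here is a minimal homegrown sketch that counts failed SSH logins per source address and reports only the sources that cross a threshold. The sample lines are invented for illustration (they imitate the common OpenSSH message wording), and the threshold of three is an arbitrary assumption.

```python
import re
from collections import Counter

# Matches lines such as "Failed password for root from 203.0.113.5 port 4243 ssh2".
FAILED_LOGIN = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)")

# Invented sample data, not taken from a real system.
SAMPLE_LINES = [
    "Jan 10 03:14:01 web01 sshd[811]: Failed password for invalid user admin from 203.0.113.5 port 4242 ssh2",
    "Jan 10 03:14:03 web01 sshd[811]: Failed password for root from 203.0.113.5 port 4243 ssh2",
    "Jan 10 03:14:05 web01 sshd[811]: Failed password for root from 203.0.113.5 port 4244 ssh2",
    "Jan 10 03:15:00 web01 sshd[900]: Failed password for alice from 198.51.100.7 port 5000 ssh2",
    "Jan 10 03:16:00 web01 sshd[901]: Accepted password for alice from 198.51.100.7 port 5001 ssh2",
]


def suspicious_sources(lines, threshold=3):
    """Return {source_ip: failure_count} for sources with at least `threshold` failures."""
    failures = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            failures[match.group(2)] += 1
    return {ip: count for ip, count in failures.items() if count >= threshold}


if __name__ == "__main__":
    for ip, count in suspicious_sources(SAMPLE_LINES).items():
        print(f"ALERT: {count} failed logins from {ip}")
```

The single mistyped password from 198.51.100.7 stays below the threshold and is never shown to the analyst, which is the point of the exercise.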
The issues and solutions discussed above map directly onto several stages of the incident response process. To be successful at incident handling, you need to have the logs available and be able to query them quickly for the necessary information.
The first stage is preparation, which involves setting up logging, getting the tools in place, verifying logs are being collected, and making a decision about how long the logs should be retained.
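Verifying that logs are actually being collected can be automated. The sketch below assumes the collector can report the time it last heard from each source (the last_seen mapping and the 15-minute window are hypothetical) and lists the sources that have gone quiet.

```python
from datetime import datetime, timedelta


def silent_sources(last_seen, now, max_silence=timedelta(minutes=15)):
    """Return the names of log sources not heard from within `max_silence`.

    `last_seen` maps a source name to the timestamp of its most recent record.
    """
    return sorted(
        name for name, timestamp in last_seen.items()
        if now - timestamp > max_silence
    )


if __name__ == "__main__":
    now = datetime(2009, 6, 1, 12, 0)
    # Hypothetical source names and timestamps.
    last_seen = {
        "firewall-01": datetime(2009, 6, 1, 11, 58),
        "ids-01": datetime(2009, 6, 1, 9, 30),
    }
    print(silent_sources(last_seen, now))
```

Running a check like this on a schedule helps catch a source that stopped logging long before an incident makes the gap obvious.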
The second stage is identification, which is where log analysis begins to flex its muscle. Once a call comes in that one of your systems is attacking another company's network, it's time to start analyzing the logs collected from the firewalls, routers, IDS, etc., to determine whether the call is indicative of an actual compromise and outbound attack, or is a false positive due to backscatter.
Containment is the third stage, which calls for triage in order to prevent any additional systems from becoming compromised and data from being exfiltrated out of your company. Real-time alerting is important here because it helps confirm that new firewall rules were applied and compromised accounts were disabled properly.
In the fourth stage, eradication, the clean-up takes place and protections are put in place to prevent the issue from happening again. For example, if a rootkit was installed or an account was compromised, logs from the antivirus management server can show you which systems need to be rebuilt. Similarly, you can query the logs to see which hosts that account logged into so a deeper investigation of those hosts can be conducted.
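A query for the hosts a compromised account touched might look like the following sketch. It assumes authentication logs have already been parsed into simple records; the LoginEvent type, its fields, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoginEvent:
    """One parsed authentication record (hypothetical schema)."""
    timestamp: str
    host: str
    account: str
    success: bool


def hosts_touched_by(events, account):
    """Return the sorted, de-duplicated hosts where `account` logged in successfully."""
    return sorted({e.host for e in events if e.account == account and e.success})


if __name__ == "__main__":
    # Invented sample records for illustration.
    events = [
        LoginEvent("2009-06-01T02:10", "web01", "svc_backup", True),
        LoginEvent("2009-06-01T02:12", "db01", "svc_backup", True),
        LoginEvent("2009-06-01T02:13", "hr01", "svc_backup", False),
        LoginEvent("2009-06-01T08:00", "web01", "alice", True),
    ]
    print(hosts_touched_by(events, "svc_backup"))
```

Failed attempts are excluded here, although in a real investigation they may also be worth reviewing as evidence of where the attacker tried to go.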
Stage five, recovery, is another area where real-time alerting can help. Once an incident has been cleaned up, the affected hosts need to be monitored to be sure they are operating normally before placing them back into production. Configuring alerts to look for anomalies can help the sysadmins and security team be sure they addressed everything properly during the eradication phase.
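One simple way to alert on anomalies during recovery is to compare a host's current event volume with its pre-incident baseline. The sketch below flags a count that is more than a chosen number of standard deviations from the baseline mean; the three-sigma default and the idea of using hourly event counts are assumptions made for this example, and real deployments often need something more robust.

```python
from statistics import mean, pstdev


def is_anomalous(baseline_counts, observed, max_sigma=3.0):
    """Return True if `observed` deviates from the baseline by more than `max_sigma` deviations."""
    mu = mean(baseline_counts)
    sigma = pstdev(baseline_counts)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is worth a look.
        return observed != mu
    return abs(observed - mu) / sigma > max_sigma


if __name__ == "__main__":
    hourly_events = [10, 12, 11, 9, 10]  # invented pre-incident counts
    for count in (11, 40):
        status = "ALERT" if is_anomalous(hourly_events, count) else "ok"
        print(count, status)
```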
The final stage, lessons learned, provides the different teams involved with a chance to look back and see where things failed, what could be improved, and what needs to be done to prevent similar incidents. This is a good time to confirm the logs you needed during the incident response process were available and easy to access. If not, come up with a plan to improve your log management and analysis process, and then present it to management during the debriefing.
Managing and analyzing logs in an enterprise is not an easy task, and although it is clearly an important one, it is often overlooked or left by the wayside. When done right, however, it is a process that can improve response time for both operational and security staff.