Historically, enterprise organizations have not sufficiently monitored their employees' activities within internal business applications. They were essentially (and blindly) trusting their employees. This trust has unfortunately caused severe business damage due to the actions of some malicious insiders.
Monitoring is hard because existing solutions for detecting malicious activities in business applications rely mainly on rules that must be written and maintained separately for each application, since each application has a bespoke set of activities and log formats. Rules-based detection solutions also generate many false positives (i.e., false alerts) and false negatives (i.e., malicious activities that go undetected).
Detection needs to be agnostic to the meaning of an application's activities so it can be applied to any business application.
The solution to this challenge lies in analyzing sequences of activities instead of analyzing each activity on its own. This means we should analyze user journeys (i.e., sessions) to monitor authenticated users in business applications. A detection engine learns all typical journeys for each user, or cohort, and uses them to detect a journey that deviates from the typical ones.
The two main challenges a detection engine needs to address are:
- Each application has a different set of activities and log format.
- We need to accurately learn typical user journeys in each application and across applications.
Standardizing the Detection Model
To apply one detection model to any application-layer log, we can extract the following three sequence-based features (i.e., characteristics) from each journey:
- The set of activities, each denoted by a numeric code.
- The order in which activities were performed in the session.
- Time intervals between activities during the session.
These three characteristics can be applied to any application session, and even to sessions across applications.
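As a minimal sketch of how these three characteristics might be pulled from a raw session, the Python below assumes each log record has already been reduced to a timestamp and an activity name. The names `JourneyFeatures`, `extract_features`, and `codebook` are hypothetical and chosen only for illustration; they do not come from any particular product.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class JourneyFeatures:
    activity_set: frozenset  # which activity codes occurred in the session
    order: tuple             # activity codes in the order they were performed
    intervals: tuple         # seconds elapsed between consecutive activities


def extract_features(events, codebook):
    """Turn one session into the three sequence-based features.

    events:   list of (iso_timestamp, activity_name) pairs for one session
              (an assumed, simplified log shape).
    codebook: dict mapping activity names to integer codes. Unseen names get
              the next free code, assuming the codebook is only ever filled
              through this function.
    """
    # Sort by time so the order feature does not depend on log ordering.
    parsed = sorted(
        ((datetime.fromisoformat(ts), name) for ts, name in events),
        key=lambda pair: pair[0],
    )

    # The model never sees activity names, only numeric codes.
    codes = tuple(
        codebook.setdefault(name, len(codebook) + 1) for _, name in parsed
    )

    # Gap in seconds between each activity and the one before it.
    times = [t for t, _ in parsed]
    intervals = tuple(
        (later - earlier).total_seconds()
        for earlier, later in zip(times, times[1:])
    )

    return JourneyFeatures(frozenset(codes), codes, intervals)


session = [
    ("2024-01-01T09:00:00", "login"),
    ("2024-01-01T09:00:05", "view_invoice"),
    ("2024-01-01T09:00:35", "login"),
]
codebook = {}
features = extract_features(session, codebook)
```

Because the function only needs a timestamp and an opaque activity label, the same code could be pointed at a different application's log once that log is mapped to this two-field shape.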
The figure below illustrates the three characteristics of a user journey based on five activities. Each activity is denoted by a number because, from the model's perspective, an activity is simply a numeric code.
Learning Typical User Journeys Across Apps
As explained above, the detection of abnormal journeys is based on learning all typical user journeys. Clustering technology groups similar data points to learn these user journeys and generate a typical user journey for each group of similar journeys. This process runs continuously as new log data becomes available.
Once the system learns the journeys typical of the user, the detection solution can check every new journey to see whether it is similar to a previously learned one. If the current journey does not resemble past sessions, the solution flags it as an anomaly. It's also possible to compare the current journey against journeys associated with the cohort the user belongs to.
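The check described above can be sketched as a nearest-neighbor comparison against the learned typical journeys. The similarity measure here (edit distance over activity codes, normalized by journey length) and the `threshold` value are assumptions made for illustration; the text does not specify which measure a real detection engine uses, and this sketch ignores the time-interval feature.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of activity codes."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def journey_distance(a, b):
    """Edit distance scaled to [0, 1] by the longer journey's length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def is_anomalous(journey, typical_journeys, threshold=0.5):
    """Flag a journey whose nearest typical journey is farther than threshold.

    typical_journeys may hold the user's own learned journeys or those of
    the user's cohort. With nothing learned yet, every journey is flagged.
    """
    if not typical_journeys:
        return True
    nearest = min(journey_distance(journey, t) for t in typical_journeys)
    return nearest > threshold


typical = [(1, 2, 3, 4, 5), (1, 2, 6)]
familiar = is_anomalous((1, 2, 3, 5), typical)       # one step skipped
unfamiliar = is_anomalous((9, 8, 7, 9, 8), typical)  # unseen activities
```

Passing the cohort's typical journeys instead of the user's own gives the cohort-level comparison mentioned above without changing the function.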
A detection solution must be based on an extremely accurate clustering engine tailored for sequence clustering. The engine's running time must remain almost linear in the number of journeys it clusters, and it must not require prior knowledge of how many clusters to generate. In addition, it has to detect outliers, remove them from the data set to enhance clustering accuracy, and identify these outliers as anomalies. That's how the clustering engine that generates groups of similar user journeys can also detect abnormal user journeys in historical data and report them as anomalies.
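To make two of these requirements concrete (no preset cluster count, and outliers separated from the typical groups), here is a deliberately simple single-pass "leader" clustering sketch. It is a stand-in technique, not the clustering engine the text describes: it is order-dependent, far less accurate than a purpose-built sequence clusterer, and its cost grows with the number of journeys times the number of clusters rather than almost linearly. The `radius` and `min_size` parameters are assumptions, and the distance uses Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def distance(a, b):
    """1 minus difflib's similarity ratio; 0.0 means identical sequences."""
    return 1.0 - SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster_journeys(journeys, radius=0.3, min_size=2):
    """Group journeys without fixing the number of clusters in advance.

    Each journey joins the first cluster whose leader lies within `radius`;
    otherwise it starts a new cluster. Clusters smaller than `min_size` are
    treated as outliers and reported separately from the typical journeys.
    """
    clusters = []  # list of (leader, members)
    for journey in journeys:
        for leader, members in clusters:
            if distance(journey, leader) <= radius:
                members.append(journey)
                break
        else:
            clusters.append((journey, [journey]))

    # The leader stands in as the "typical journey" of each kept cluster.
    typical = [leader for leader, members in clusters
               if len(members) >= min_size]
    outliers = [m for _, members in clusters
                if len(members) < min_size
                for m in members]
    return typical, outliers


history = [
    (1, 2, 3, 4), (1, 2, 3, 5), (1, 2, 3, 4),
    (7, 8, 9), (7, 8, 9, 9),
    (4, 4, 4, 1),  # resembles nothing else in the history
]
typical, outliers = cluster_journeys(history)
```

In this toy history, the two recurring patterns become typical journeys and the lone session ends up in `outliers`, which mirrors the idea that the same pass that learns typical journeys can also surface anomalies in historical data.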