Table 1.
Comparison of methods for log anomaly detection.
Fig 1.
Illustration of the LSTM autoencoder algorithm.
Fig 2.
A concrete example showing a few log lines from a VMware log file.
Fig 3.
A schematic diagram of a single log file.
The diagram illustrates how logs are generated. Each virtual machine produces log files in chronological order. Because the intervals between successive log files are often inconsistent, some virtual machines generate many log files within a given time window ‘K’ while others generate only a few. The number of log lines per log file also varies.
Fig 4.
One case of log file anomaly detection is shown.
Fig 5.
Another case of log file anomaly detection is shown.
The latest log file on each virtual machine at time T is the object to be detected. The Discriminator is the detection system. Normal and Anormal (abnormal) denote the two classes into which log files are divided. In the first case (Fig 4), T3 is a noisy normal log file that is falsely flagged as an anomaly. In the second case (Fig 5), T3 is a noisy normal log file that is correctly classified as normal.
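The caption defines the detection target as the latest log file on each virtual machine at time T. The sketch below shows one way to select those targets. The `LogFile` record and its `created_at` field are hypothetical. The sketch assumes each file carries a creation timestamp and that only files generated at or before T are eligible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogFile:
    vm_id: str          # virtual machine that produced the file
    created_at: float   # hypothetical creation timestamp
    name: str


def latest_files_at(files, t):
    """Return, for each VM, the most recent log file created at or before time t."""
    latest = {}
    for f in files:
        if f.created_at > t:
            continue  # file did not exist yet at time t
        current = latest.get(f.vm_id)
        if current is None or f.created_at > current.created_at:
            latest[f.vm_id] = f
    return latest


files = [
    LogFile("vm1", 1.0, "T1"),
    LogFile("vm1", 2.0, "T2"),
    LogFile("vm1", 3.0, "T3"),
    LogFile("vm2", 1.5, "T1"),
]
targets = latest_files_at(files, t=2.5)
print({vm: f.name for vm, f in sorted(targets.items())})
# {'vm1': 'T2', 'vm2': 'T1'}
```

Each selected file would then be passed to the discriminator and labeled Normal or Anormal. Because virtual machines generate files at irregular intervals, different machines can contribute files of very different ages at the same T.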
Fig 6.
A brief overview of virtual machine log anomaly detection.
In the training phase, the training log set is parsed to obtain log templates. The log templates are then sorted by length to build a dictionary that maps each template to a numerical value, and this dictionary converts the log data into numerical data. The feature vectors obtained through feature extraction serve as input for training the SVM discriminator. In the testing phase, the test log set is mapped into numerical data using the dictionary built during training, and the extracted feature vectors are fed to the trained SVM discriminator to detect anomalies.
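The following sketch illustrates the template-to-number mapping step under several stated assumptions. A real log parser such as Drain would extract templates. Here a hypothetical regex-based `to_template` stands in for it by masking numbers and hex IDs with `<*>`. Templates are sorted by ascending length, with ties broken alphabetically. ID 0 is reserved for templates never seen in training. The caption does not specify any of these choices, so none of them should be read as the authors' exact method.

```python
import re

# Hypothetical masking rule standing in for a real log parser:
# hex IDs and (dotted) numbers are replaced with a wildcard token.
_VARIABLE = re.compile(r"0x[0-9a-fA-F]+|\d+(?:\.\d+)*")


def to_template(line: str) -> str:
    return _VARIABLE.sub("<*>", line.strip())


def build_template_dict(train_lines):
    """Map each distinct training template to an integer, ordered by template length."""
    templates = {to_template(line) for line in train_lines}
    # Ascending length; ties broken alphabetically so the mapping is deterministic.
    ordered = sorted(templates, key=lambda t: (len(t), t))
    # IDs start at 1; 0 is reserved for templates unseen during training.
    return {tpl: idx for idx, tpl in enumerate(ordered, start=1)}


def encode(lines, template_dict):
    """Convert log lines to integer IDs using the training-phase dictionary."""
    return [template_dict.get(to_template(line), 0) for line in lines]


train = [
    "Connected to host 10.0.0.5",
    "Connected to host 10.0.0.7",
    "Disk error code 17",
    "VM started",
]
d = build_template_dict(train)
print(d)
# {'VM started': 1, 'Disk error code <*>': 2, 'Connected to host <*>': 3}

test = ["VM started", "Connected to host 10.0.0.9", "Unknown event"]
print(encode(test, d))  # [1, 3, 0]
```

The test phase reuses the dictionary built during training rather than rebuilding it. This keeps the numerical IDs consistent between the two phases.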
Fig 7.
Data processing diagram: logs are classified and converted into numerical vectors by Algorithm 1.
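The captions do not spell out Algorithm 1 itself. As a hedged illustration of turning a log file's template IDs into a fixed-length numerical vector, the sketch below uses a template count vector. Count vectors are a common choice, but they are an assumption here and may differ from the paper's feature extraction. The helpers `count_vector` and `normalize` are hypothetical. Index 0 collects templates unseen during training.

```python
from collections import Counter


def count_vector(template_ids, vocab_size):
    """Fixed-length vector: entry i counts occurrences of template ID i.

    Index 0 holds templates unseen in training (hypothetical convention);
    IDs 1..vocab_size are the known templates.
    """
    counts = Counter(template_ids)
    return [counts.get(i, 0) for i in range(vocab_size + 1)]


def normalize(vec):
    """Scale counts to frequencies so files with different line counts are comparable."""
    total = sum(vec)
    return [c / total for c in vec] if total else [0.0] * len(vec)


ids = [1, 3, 3, 2, 3, 0]           # template IDs of one log file
vec = count_vector(ids, vocab_size=3)
print(vec)                         # [1, 1, 1, 3]
print(normalize(vec))
```

Normalization is included because the number of log lines per file varies. Without it, longer files would produce larger vectors regardless of their content. The resulting vectors could then be passed to an SVM classifier such as scikit-learn's `SVC`.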
Fig 8.
Overview of Algorithm 2.
Table 2.
D1 training without noise—Testing without noise.
Table 3.
D2 training without noise—Testing with noise.
Table 4.
D3 training with noise—Testing without noise.
Table 5.
D4 Training with noise—Testing with noise.
Table 6.
D5 training without noise—Log sequence disorder—Testing without noise.
Fig 9.
Comparison of F1 score.
Fig 10.
Comparison of accuracy.
Fig 11.
Ablation study.
Table 7.
Training time.