Fig 1.
Structure of literature review.
Fig 2.
Complete experimental flow chart.
Fig 3.
As can be seen from the graph, the data are severely imbalanced: 96.774% of records belong to the majority class and only 3.226% to the minority class.
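The imbalance ratio in Fig 3 can be verified directly from the label vector; a minimal sketch with numpy (the label array is hypothetical, constructed to mirror the 3.226% minority share):

```python
import numpy as np

# Hypothetical labels mirroring the imbalance in Fig 3:
# 0 = healthy firm (majority), 1 = bankrupt firm (minority).
rng = np.random.default_rng(0)
y = np.zeros(6200, dtype=int)
y[:200] = 1          # 200 of 6200 records -> ~3.226% minority
rng.shuffle(y)

minority_pct = 100 * y.mean()
majority_pct = 100 - minority_pct
print(f"majority: {majority_pct:.3f}%, minority: {minority_pct:.3f}%")
```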
Table 1.
Anomaly records.
Fig 4.
The histogram of this attribute is visibly skewed.
Fig 5.
A spot check of the 'Equity to Liability' attribute shows that its skewness exceeds 7.40.
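A skewness spot check like the one behind Fig 5 can be done with the Fisher-Pearson coefficient; a sketch, assuming a hypothetical right-skewed stand-in for the 'Equity to Liability' ratio:

```python
import numpy as np

def skewness(x):
    # Fisher-Pearson coefficient of skewness (biased estimator),
    # the quantity pandas/scipy report up to a small-sample correction.
    x = np.asarray(x, dtype=float)
    m, s = x.mean(), x.std()
    return ((x - m) ** 3).mean() / s ** 3

# Hypothetical heavy-tailed ratio standing in for 'Equity to Liability'.
rng = np.random.default_rng(42)
equity_to_liability = rng.lognormal(mean=0.0, sigma=1.2, size=10_000)
print(f"skew = {skewness(equity_to_liability):.2f}")
```

A symmetric sample gives a skewness near zero, so a value in the high single digits, as reported for 'Equity to Liability', signals a strongly right-skewed distribution.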
Fig 6.
Although some pairs are strongly correlated (0.93 between Attr1 and Attr2, 0.98 between Attr1 and Attr3, and 0.92 between Attr6 and Attr7), most variable pairs show only weak correlation.
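The correlation structure summarized in Fig 6 comes from a pairwise correlation matrix; a sketch with numpy on synthetic data (the attribute names and generating process are hypothetical, built to show a few strong pairs among mostly weak ones):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
# Hypothetical attributes: attr2 and attr3 are noisy copies of attr1
# (strongly correlated pairs), attr4 is independent (weak pair).
attr1 = rng.normal(size=n)
attr2 = attr1 + 0.35 * rng.normal(size=n)
attr3 = attr1 + 0.15 * rng.normal(size=n)
attr4 = rng.normal(size=n)

X = np.column_stack([attr1, attr2, attr3, attr4])
corr = np.corrcoef(X, rowvar=False)  # 4x4 Pearson correlation matrix
print(np.round(corr, 2))
```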
Table 2.
Binary confusion matrix.
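The entries of the binary confusion matrix in Table 2 determine the F2-measure used throughout the experiments; a minimal implementation of the F-beta formula (the counts in the final line are illustrative, not results from the paper):

```python
def f_beta(tp, fp, fn, beta=2.0):
    # F-beta from confusion-matrix counts; beta=2 weights recall four
    # times as heavily as precision, which suits bankruptcy detection
    # where missing a failing firm costs more than a false alarm.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Illustrative counts: precision = 0.6, recall = 0.3.
print(round(f_beta(tp=30, fp=20, fn=70), 3))  # -> 0.333
```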
Fig 7.
Among all the models, NB had the best F2-score, followed by LDA with an F2-measure of 0.333; LR, XGB, and NN had comparable scores (0.235, 0.295, and 0.304, respectively); and SVM had the lowest score, equal to the baseline.
Table 3.
F2-measure of machine learning models.
Fig 8.
Boxplot of undersampling for LDA.
All five sampling methods had similar scores; however, NCR had the lowest standard deviation.
Table 4.
F2-measure of LDA.
Fig 9.
Boxplot of undersampling for NB.
The ENN score was the highest (0.423), the RENN score was far lower (0.079), and the other three undersampling methods had similar scores.
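The Edited Nearest Neighbours (ENN) rule compared in Fig 9 removes majority-class samples whose k nearest neighbours vote for a different class; a brute-force numpy sketch (imbalanced-learn's EditedNearestNeighbours is the production route, and the toy data below is hypothetical):

```python
import numpy as np

def enn_undersample(X, y, k=3, target_class=0):
    # ENN sketch: drop a target_class (majority) sample when the
    # majority vote of its k nearest neighbours disagrees with its label.
    keep = np.ones(len(y), dtype=bool)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbour
    for i in range(len(y)):
        if y[i] != target_class:
            continue                     # only edit the majority class
        nn = np.argsort(d[i])[:k]
        if np.bincount(y[nn], minlength=2).argmax() != y[i]:
            keep[i] = False
    return X[keep], y[keep]

# Toy data: one class-0 point sits inside the class-1 cluster and is removed.
X = np.array([[0, 0], [0.1, 0], [0, 0.1], [0.05, 0.05],
              [5, 5], [5.1, 5], [5, 5.1]], dtype=float)
y = np.array([1, 1, 1, 0, 0, 0, 0])
Xr, yr = enn_undersample(X, y)
print(len(yr), (yr == 0).sum(), (yr == 1).sum())
```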
Table 5.
F2-measure of NB.
Fig 10.
Boxplot of undersampling for NN.
ENN, RENN, and OSS shared similarly high scores; however, their standard deviations were not low, and the RENN score was 0.350.
Fig 11.
Centroid undersampling for NB.
Without undersampling, the highest F2-score is 0.3975. Increasing the undersampling rate does not monotonically reduce the evaluation score: the score rises again when the undersampling rate is approximately 54% and 80%.
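Centroid undersampling, swept in Fig 11, shrinks the majority class by replacing it with the centroids of a k-means clustering; a minimal numpy sketch using plain Lloyd iterations on hypothetical data (imbalanced-learn's ClusterCentroids provides this in one call):

```python
import numpy as np

def centroid_undersample(X_maj, n_centroids, iters=20, seed=0):
    # Replace the majority class with n_centroids k-means centroids
    # (plain Lloyd's algorithm, brute-force distances).
    rng = np.random.default_rng(seed)
    centers = X_maj[rng.choice(len(X_maj), n_centroids, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X_maj[:, None] - centers[None, :], axis=2)
        labels = d.argmin(axis=1)        # nearest centroid per sample
        for j in range(n_centroids):
            pts = X_maj[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers

# Hypothetical majority class of 500 points; an 80% undersampling rate
# (under the assumption that the rate is the fraction removed) keeps
# 100 centroids in place of the original samples.
rng = np.random.default_rng(3)
X_maj = rng.normal(size=(500, 2))
centers = centroid_undersample(X_maj, n_centroids=100)
print(centers.shape)
```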
Fig 12.
Centroid undersampling for LDA.
LDA reaches its highest F2-measure (0.3994) at an undersampling rate of 30% and its lowest at 54%; the score increases again beyond 54%.