Table 1.
Dataset description.
Fig 1.
A flowchart of our experimental process.
Fig 2.
The information gain ranking of the attributes of the dataset.
Fig 3.
AUC of the different models with different percentages of synthetic examples created using SMOTE, evaluated using 10-fold cross-validation.
Fig 4.
AUC of the different ML models using the Spread Subsample technique.
Table 2.
Comparison of the performance of the Support Vector Machine (SVM) classifier with SMOTE sampling, using polynomial, normalized polynomial and Puk kernels with complexity parameters 0.1, 10 and 30, evaluated using 10-fold cross-validation.
Table 3.
Comparison of the performance of the Artificial Neural Network (ANN) classifier with gradient descent back-propagation, using hidden units {1, 2, 4, 8} and momentum {0, 0.2, 0.5}, with SMOTE, evaluated using 10-fold cross-validation.
Table 4.
Comparison of the performance of the Bayesian Network (BN) classifier with different search algorithms (K2, Hill Climbing, Repeated Hill Climber, LAGD Hill Climbing, TAN, Tabu and Simulated Annealing), with SMOTE, evaluated using 10-fold cross-validation.
Fig 5.
AUC Curves for the Different Machine Learning Models using SMOTE and evaluated using 10-fold cross-validation.
Fig 6.
AUC Curves for the Different Machine Learning Models using SMOTE and evaluated using holdout (70/30).
Fig 7.
AUC Curves for the Different Machine Learning Models using SMOTE and evaluated using holdout (80/20).
Table 5.
The performance of the Different Machine Learning Models evaluated using the 10-fold cross-validation method using SMOTE.
The RTF model achieves the highest AUC (0.93), F-Score (86.70%), Sensitivity (69.96%) and Specificity (91.71%).
Table 6.
The performance of the Different Machine Learning Models evaluated using the Hold Out method (70/30) using SMOTE.
The RTF model achieves the highest AUC (0.88), Sensitivity (74.30%), Precision (73.50%) and F-Score (73.90%).
Table 7.
The performance of the Different Machine Learning Models evaluated using the Hold Out method (80/20) using SMOTE.
The RTF model achieves the highest AUC (0.89), Sensitivity (75%), Precision (73%) and F-Score (74%). The SVM model achieves the highest Specificity (88.9%).
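The metrics named in Tables 5 to 7 (Sensitivity, Specificity, Precision and F-Score) follow the standard confusion-matrix definitions. The sketch below is a minimal illustration of those definitions, not the authors' evaluation code; the function names and the example counts are hypothetical and are used only to show the arithmetic.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: share of actual positives correctly identified."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: share of actual negatives correctly identified."""
    return tn / (tn + fp)


def precision(tp: int, fp: int) -> float:
    """Share of predicted positives that are actually positive."""
    return tp / (tp + fp)


def f_score(prec: float, rec: float) -> float:
    """Harmonic mean of precision and sensitivity (recall)."""
    return 2 * prec * rec / (prec + rec)


# Hypothetical confusion-matrix counts, chosen only for illustration.
tp, fn, fp, tn = 30, 10, 20, 40
sens = sensitivity(tp, fn)   # 30 / 40
spec = specificity(tn, fp)   # 40 / 60
prec = precision(tp, fp)     # 30 / 50
f1 = f_score(prec, sens)
```

As a consistency check, applying `f_score` to the Precision (73.50%) and Sensitivity (74.30%) reported for the RTF model in Table 6 gives approximately 73.90%, which matches the F-Score stated in that caption.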