Table 1.
Confusion matrix for a binary outcome (adapted from Fawcett [30]).
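The operating characteristics compared in the later tables and figures (accuracy, sensitivity, specificity, PPV and NPV) are all derived from the four cells of this confusion matrix. The Python sketch below applies the standard formulas. `ConfusionMatrix` and `operating_characteristics` are hypothetical names chosen for illustration; they are not code from this study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # true positives: positive cases predicted positive
    fp: int  # false positives: negative cases predicted positive
    fn: int  # false negatives: positive cases predicted negative
    tn: int  # true negatives: negative cases predicted negative


def operating_characteristics(cm: ConfusionMatrix) -> dict:
    """Standard operating characteristics from a 2x2 confusion matrix.

    No zero-division guard: every row and column total must be non-zero.
    """
    total = cm.tp + cm.fp + cm.fn + cm.tn
    return {
        "Acc": (cm.tp + cm.tn) / total,   # share of all cases classified correctly
        "Sens": cm.tp / (cm.tp + cm.fn),  # true positive rate
        "Spec": cm.tn / (cm.tn + cm.fp),  # true negative rate
        "PPV": cm.tp / (cm.tp + cm.fp),   # precision of positive predictions
        "NPV": cm.tn / (cm.tn + cm.fn),   # precision of negative predictions
    }


if __name__ == "__main__":
    print(operating_characteristics(ConfusionMatrix(tp=40, fp=10, fn=20, tn=130)))
```

With these illustrative counts, accuracy is 170/200 = 0.85 and PPV is 40/50 = 0.8, while sensitivity (40/60) is lower than specificity (130/140), the pattern one expects when the positive class is the minority.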
Table 2.
Comparison of biological and biochemical values between FH and non-FH patients, stratified by statin use.
Table 3.
Areas under the ROC and PR curves for medicated patients, using the original and the SMOTE-resampled sample data.
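SMOTE balances the classes by creating synthetic minority samples at random points on the line segments joining a minority sample to one of its k nearest minority neighbours. The following is a minimal pure-Python sketch, assuming numeric feature vectors and the commonly used default k = 5. `smote` is a hypothetical helper; this caption does not state which implementation or parameters the study actually used.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of numeric feature vectors (lists) from the minority class.
    Sketch only: no categorical-feature handling, O(n^2) neighbour search.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    rng = random.Random(seed)

    # Pre-compute the k nearest minority neighbours (Euclidean) of each sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))  # random base sample
        j = rng.choice(neighbours[i])     # one of its k nearest neighbours
        gap = rng.random()                # interpolation factor in [0, 1)
        synthetic.append(
            [a + gap * (b - a) for a, b in zip(minority[i], minority[j])]
        )
    return synthetic
```

Because every synthetic point lies between two real minority points, oversampling fills in the minority region rather than duplicating records. In practice, SMOTE is applied only to the training folds so that synthetic samples do not leak into the evaluation data.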
Fig 1.
Comparison of areas under the ROC and PR curves for each replica of the dataset.
The left column shows the results obtained with the original sample data, and the right column shows the results obtained with the SMOTE-resampled data. AUROC: area under the receiver operating characteristic curve; AUPRC: area under the precision-recall curve; LR: logistic regression; NB: naive Bayes; RF: random forest; XGB: extreme gradient boosting; SMOTE: synthetic minority oversampling technique.
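Both summary metrics can be computed directly from predicted scores. The sketch below uses two standard estimators: AUROC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case (the Mann-Whitney formulation, ties counted as 1/2), and AUPRC as average precision, which is the mean precision at the rank of each positive case. The function names are hypothetical, and the study may have used a different AUPRC estimator, such as interpolation of the PR curve.

```python
def auroc(y_true, scores):
    """AUROC via the Mann-Whitney formulation; ties contribute 1/2.

    Requires at least one positive (1) and one negative (0) label.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Average precision: mean of the precision at the rank of each positive.

    Ranks by descending score; tied scores are not treated specially.
    """
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    n_pos = sum(y_true)
    tp = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / rank  # precision at this cut-off
    return total / n_pos


if __name__ == "__main__":
    y = [1, 0, 1, 0, 1]
    s = [0.9, 0.8, 0.7, 0.3, 0.2]
    print(auroc(y, s), average_precision(y, s))
```

In this toy example, AUROC is 3/6 = 0.5 and average precision is (1 + 2/3 + 3/5) / 3 = 34/45 ≈ 0.756. This illustrates why the two metrics can rank classifiers differently: AUPRC depends on how the positive (minority) cases are ordered at the top of the ranking, whereas AUROC weights all positive-negative pairs equally.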
Table 4.
Mean and standard deviation of operating characteristics (OC) for different classification algorithms and strategies for handling data imbalance, together with the values obtained with the Simon Broome (SB) criteria.
Fig 2.
Comparison of operating characteristic values across classification algorithms and strategies for handling data imbalance.
The dashed line represents the value obtained when applying the SB criteria. Acc: accuracy; Sens: sensitivity; Spec: specificity; PPV: positive predictive value; NPV: negative predictive value; SMT: synthetic minority oversampling technique; YI: Youden index; LR: logistic regression; NB: naive Bayes; RF: random forest; XGB: extreme gradient boosting; SB: Simon Broome criteria.
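The Youden index J = sensitivity + specificity - 1 is often used to choose a decision threshold instead of the default 0.5, which helps when classes are imbalanced. The sketch below assumes that YI here refers to choosing the cutoff on the classifier's scores that maximises J. This reading of the caption is an assumption, and `youden_threshold` is a hypothetical helper.

```python
def youden_threshold(y_true, scores):
    """Return (threshold, J) maximising J = sensitivity + specificity - 1.

    A case is predicted positive when its score >= threshold. Candidate
    thresholds are the observed scores; ties in J keep the lowest threshold.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1  # sensitivity + specificity - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


if __name__ == "__main__":
    y = [0, 0, 0, 1, 0, 1, 1]
    s = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
    print(youden_threshold(y, s))
```

In this toy example, a cutoff of 0.4 captures all three positive cases (sensitivity 1.0) and misclassifies one of the four negative cases (specificity 0.75), giving J = 0.75. That cutoff is well below 0.5, the point at which the default threshold would trade sensitivity for specificity.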
Table 5.
Significant differences in operating characteristic (OC) values among the classification methods.