Comparative study on the performance of different classification algorithms, combined with pre- and post-processing techniques to handle imbalanced data, in the diagnosis of adult patients with familial hypercholesterolemia

doi:10.1371/journal.pone.0269713

Table 1.

Confusion matrix for a binary outcome (adapted from Fawcett [30]).
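
The four cells of the binary confusion matrix can be tallied directly from true and predicted labels. The sketch below assumes the outcome is coded 1 for FH (positive) and 0 for non-FH (negative). The function name `confusion_counts` is hypothetical and does not come from the study.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, int]:
    """Tally TP, FP, FN and TN, assuming labels are coded 1 = FH (positive), 0 = non-FH."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual not in (0, 1) or predicted not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if predicted == 1:
            # Predicted FH: correct if the patient truly has FH, otherwise a false alarm.
            counts["TP" if actual == 1 else "FP"] += 1
        else:
            # Predicted non-FH: a missed FH case, or a correct rejection.
            counts["FN" if actual == 1 else "TN"] += 1
    return counts


if __name__ == "__main__":
    print(confusion_counts([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
    # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
```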

Table 2.

Comparison of biological and biochemical values between FH and non-FH patients, according to statin use.

Table 3.

Area under the ROC and PR curves for medicated patients, using the original and SMOTE sample data.
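
SMOTE creates synthetic minority-class (here, FH) records by interpolating between a minority sample and one of its nearest minority-class neighbours. Existing records are not simply duplicated. The sketch below shows that core idea in plain Python. It is not the implementation used in the study. It assumes numeric feature vectors and Euclidean distance. The function name `smote_sketch` is hypothetical. For simplicity, the base sample is drawn at random instead of cycling through every minority sample.

```python
import math
import random
from typing import Sequence

Point = tuple[float, ...]


def smote_sketch(minority: Sequence[Point], n_synthetic: int, k: int = 5, seed: int = 0) -> list[Point]:
    """Generate n_synthetic points on segments joining minority samples to one of
    their k nearest minority-class neighbours (hypothetical helper, Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: list[Point] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority samples to `base`.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1): position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


if __name__ == "__main__":
    fh_cases = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    print(smote_sketch(fh_cases, n_synthetic=3, k=2, seed=1))
```

Because every synthetic point lies on a segment between two real minority samples, oversampling stays inside the region already occupied by the minority class. Only the training data should be oversampled, so that evaluation is done on real patients.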

Fig 1.

Comparison of areas under the ROC and PR curves for each replica of the dataset.

The left column shows the results using the original sample data, and the right column the results using SMOTE sample data. AUROC: area under the receiver operating characteristic curve; AUPRC: area under the precision-recall curve; LR: logistic regression; NB: naive Bayes; RF: random forest; XGB: extreme gradient boosting; SMOTE: synthetic minority oversampling technique.
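
The two summary measures in this figure can be computed from predicted scores. AUROC can be computed as the probability that a randomly chosen FH patient receives a higher score than a randomly chosen non-FH patient, with ties counted as one half. AUPRC is commonly estimated by average precision. The sketch below assumes that higher scores mean "more likely FH". Which AUPRC estimator the study used is not stated in this section, so average precision is an assumption. The function names are hypothetical.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUPRC estimate: mean of the precision at the rank of each positive.
    Tied scores keep their input order in this sketch."""
    n_pos = sum(1 for y in y_true if y == 1)
    if n_pos == 0:
        raise ValueError("at least one positive is required")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            total += tp / rank  # precision among the top `rank` patients
    return total / n_pos


if __name__ == "__main__":
    y = [1, 1, 0, 1, 0, 0]
    s = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
    print(auroc(y, s), average_precision(y, s))
```

The chance level of AUROC is 0.5. The chance level of AUPRC is instead the prevalence of the positive class. For this reason AUPRC is often reported alongside AUROC when the data are imbalanced.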

Table 4.

Mean and standard deviation of operating characteristics (OC) for different classification algorithms and techniques to cope with data imbalance, together with the values obtained with the Simon Broome (SB) criteria.
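
The operating characteristics follow from the confusion matrix counts by their standard definitions. The sketch below computes them and summarises them across dataset replicas as mean and standard deviation. Several details are assumptions: the use of the sample standard deviation, the NaN convention for empty denominators, and the function names.

```python
import math
import statistics
from typing import Sequence


def _ratio(num: int, den: int) -> float:
    # Undefined ratios (e.g. PPV when no patient is predicted positive) become NaN.
    return num / den if den else math.nan


def operating_characteristics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    return {
        "Acc": _ratio(tp + tn, tp + fp + fn + tn),
        "Sens": _ratio(tp, tp + fn),  # true positive rate
        "Spec": _ratio(tn, tn + fp),  # true negative rate
        "PPV": _ratio(tp, tp + fp),
        "NPV": _ratio(tn, tn + fn),
    }


def summarise(replicas: Sequence[tuple[int, int, int, int]]) -> dict[str, tuple[float, float]]:
    """Mean and sample SD of each OC over (TP, FP, FN, TN) tuples, one per replica.
    Requires at least two replicas (statistics.stdev needs two values)."""
    per_replica = [operating_characteristics(*counts) for counts in replicas]
    summary: dict[str, tuple[float, float]] = {}
    for name in per_replica[0]:
        values = [oc[name] for oc in per_replica]
        summary[name] = (statistics.mean(values), statistics.stdev(values))
    return summary


if __name__ == "__main__":
    for name, (mean, sd) in summarise([(2, 1, 1, 2), (3, 0, 0, 3)]).items():
        print(f"{name}: {mean:.3f} ± {sd:.3f}")
```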

Fig 2.

Comparison of operating characteristic values across classification algorithms and strategies for dealing with data imbalance.

The dashed line represents the value obtained when applying the SB criteria. Acc: accuracy; Sens: sensitivity; Spec: specificity; PPV: positive predictive value; NPV: negative predictive value; SMT: synthetic minority oversampling technique; YI: Youden Index; LR: logistic regression; NB: naive Bayes; RF: random forest; XGB: extreme gradient boosting; SB: Simon Broome criteria.
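
The Youden Index (YI) is listed among the strategies for dealing with imbalance. A common way to use it as a post-processing step is to replace the default 0.5 probability cut-off with the threshold that maximises J = sensitivity + specificity - 1. The sketch below assumes that reading. The function name and the rule "positive when score >= threshold" are assumptions, not details taken from the study.

```python
from typing import Sequence


def youden_threshold(y_true: Sequence[int], scores: Sequence[float]) -> tuple[float, float]:
    """Return (threshold, J) maximising J = Sens + Spec - 1, where a patient is
    classified as FH when score >= threshold (hypothetical helper)."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    best_t, best_j = scores[0], -1.0
    for t in sorted(set(scores)):  # every observed score is a candidate cut-off
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # strict '>' keeps the lowest (most sensitive) threshold on ties
            best_t, best_j = t, j
    return best_t, best_j


if __name__ == "__main__":
    y = [1, 1, 0, 1, 0, 0]
    s = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
    print(youden_threshold(y, s))
```

On this toy data the chosen threshold is 0.6. That cut-off catches all three positives and misclassifies one negative. To avoid an optimistic bias, the threshold would normally be chosen on training or validation data and then applied unchanged to the test data.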

Table 5.

Significant differences in operating characteristic (OC) values among the classification methods.
