Reliability-enhanced data cleaning in biomedical machine learning using inductive conformal prediction
Fig 4
Model performance in AUROC and AUPRC with training data cleaning in the COVID-19 patient ICU admission prediction task under different percentages of training data label permutation. The AUROC (A) and AUPRC (B) on the validation set, and the AUROC (C) and AUPRC (D) on the test set with a wrongly labeled data detection threshold of 0.8. The mean and 95% confidence intervals are shown. Statistically significant improvements in accuracy are marked as follows: .: p < 0.1, *: p < 0.05, **: p < 0.01, ***: p < 0.001. First row: LR models; second row: LDA models.