Reliability-enhanced data cleaning in biomedical machine learning using inductive conformal prediction
Fig 6
Model accuracy and F1 score with training-data cleaning in the breast cancer subtype prediction task under different percentages of training-label permutation. Shown are the classification accuracy (A) and macro-averaged F1 score (B) on the validation set, and the classification accuracy (C) and macro-averaged F1 score (D) on the test set, using a wrongly labeled data detection threshold of 0.5. Means and 95% confidence intervals are shown. Statistically significant improvements in accuracy are marked as follows: .: p < 0.1, *: p < 0.05, **: p < 0.01, ***: p < 0.001. First row: LR models; second row: LDA models.
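The significance legend and the mean/interval summary in this caption can be reproduced with a small helper, sketched below. The marker thresholds follow the caption exactly. The confidence interval, however, uses a normal approximation (z = 1.96) purely as an assumption, because the caption does not say how its 95% intervals were computed. The function names `significance_marker` and `mean_ci95` are hypothetical.

```python
import math
import statistics


def significance_marker(p: float) -> str:
    """Map a p-value to the marker legend used in the figure caption."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    if p < 0.1:
        return "."
    return ""  # not marked as a significant improvement


def mean_ci95(values):
    """Return (mean, lower, upper) for a 95% CI.

    Assumption: normal approximation with z = 1.96 and the sample
    standard deviation; the original analysis may use a different method.
    """
    m = statistics.mean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width


if __name__ == "__main__":
    print(significance_marker(0.03))   # "*"
    print(mean_ci95([0.80, 0.82, 0.84]))
```

Because all the comparisons are strict, a p-value of exactly 0.05 receives "." rather than "*". This matches the "p < 0.05" wording in the legend.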