Reliability-enhanced data cleaning in biomedical machine learning using inductive conformal prediction
Fig 3
Improvement in model accuracy from training data cleaning in the DILI literature classification task, based on W2V embeddings, under different percentages of training data label permutation. Classification accuracy is shown on the validation set (A–C) and on the test set (D–F) with a wrongly labeled data detection threshold of 0.8 (A,D), 0.5 (B,E), and 0.2 (C,F). The mean and 95% confidence intervals are shown. Statistically significant improvements in accuracy are marked as follows: .: p < 0.1, *: p < 0.05, **: p < 0.01, ***: p < 0.001; first row: LR models, second row: LDA models.
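The significance markers in the caption can be read as a simple threshold lookup. The sketch below is only an illustration of that convention: the function name `significance_marker` is hypothetical, the strict inequalities follow the caption, and the statistical test that produces the p-values is not specified here and is left out.

```python
def significance_marker(p_value: float) -> str:
    """Map a p-value to the marker used in the caption.

    Hypothetical helper for illustration; returns an empty string
    when the p-value is not below 0.1 (no marker shown).
    """
    # Thresholds are checked from the strictest to the loosest,
    # so the strongest applicable marker is returned.
    thresholds = [(0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, ".")]
    for cutoff, marker in thresholds:
        if p_value < cutoff:
            return marker
    return ""
```

For example, a p-value of 0.03 falls below 0.05 but not below 0.01, so it would be marked with a single asterisk.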