Reliability-enhanced data cleaning in biomedical machine learning using inductive conformal prediction
Fig 10
Number of wrong labels detected at different percentages of permuted training labels in the TCGA breast cancer subtype prediction task. Number of training samples detected as wrongly labeled by the LR models (A–C) and the LDA models (D–F) at detection thresholds of 0.8 (A,D), 0.5 (B,E), and 0.2 (C,F). The cleaning process shown uses conformal predictor hyperparameters optimized on the validation dataset separately for each classifier and each percentage of permuted labels.
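To make the role of the detection threshold concrete, the following is a minimal sketch of how an inductive conformal predictor could flag suspect labels. It is not the authors' implementation. The function names `icp_p_values` and `flag_wrong_labels`, the nonconformity score (one minus the predicted class probability), and the flagging rule (a label other than the given one has a p-value at or above the threshold and above that of the given label) are all assumptions made for illustration; the caption does not specify them. The toy probabilities stand in for the output of any probabilistic classifier such as LR or LDA.

```python
import numpy as np


def icp_p_values(cal_probs, cal_labels, probs):
    """Inductive conformal p-values for every candidate label of every sample.

    Assumed nonconformity score: 1 - predicted probability of the label.
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels)
    probs = np.asarray(probs, dtype=float)
    # Scores of the calibration samples under their own labels.
    cal_scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    # Scores of each sample under each candidate label: (n_samples, n_classes).
    scores = 1.0 - probs
    # Count calibration scores at least as large as each candidate score.
    counts = (cal_scores[None, None, :] >= scores[:, :, None]).sum(axis=2)
    return (counts + 1) / (len(cal_scores) + 1)


def flag_wrong_labels(p_values, labels, threshold):
    """Hypothetical flagging rule (an assumption, not taken from the source):
    flag a sample when some label other than the given one has a p-value
    at or above `threshold` and above the p-value of the given label.
    """
    labels = np.asarray(labels)
    idx = np.arange(len(labels))
    given_p = p_values[idx, labels]
    others = p_values.copy()
    others[idx, labels] = -np.inf  # exclude the given label from the maximum
    best_other = others.max(axis=1)
    return (best_other >= threshold) & (best_other > given_p)


# Toy calibration set: classifier probabilities and true labels (made up).
cal_probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8]]
cal_labels = [0, 0, 1, 1]

# Two training samples, both labeled 0; the second looks like class 1.
train_probs = [[0.95, 0.05], [0.1, 0.9]]
train_labels = [0, 0]

p_vals = icp_p_values(cal_probs, cal_labels, train_probs)
for thr in (0.8, 0.5, 0.2):
    flags = flag_wrong_labels(p_vals, train_labels, thr)
    print(f"threshold={thr}: flagged {int(flags.sum())} sample(s)")
```

Under this assumed rule, lowering the threshold can only keep or increase the number of flagged samples, which is the kind of trade-off between panels (A,D), (B,E), and (C,F) that the figure compares.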