Reliability-enhanced data cleaning in biomedical machine learning using inductive conformal prediction

doi:10.1371/journal.pcbi.1012803

Fig 10

Number of wrong labels detected at different percentages of permuted training labels in the TCGA breast cancer subtype prediction task. Number of wrongly labeled samples detected with LR models (A–C) and LDA models (D–F) at different detection thresholds: 0.8 (A,D), 0.5 (B,E), and 0.2 (C,F). The cleaning process shown uses conformal predictor hyperparameters optimized on the validation dataset for each classifier and each percentage of permuted labels.

doi: https://doi.org/10.1371/journal.pcbi.1012803.g010
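
The caption of Fig 10 refers to permuting a percentage of training labels and then counting the wrong labels detected at thresholds 0.8, 0.5, and 0.2. The following is a minimal sketch of how such a count could be produced with inductive conformal p-values, and it is not the paper's implementation. Everything in it is an assumption made for illustration: the synthetic two-class data, the nearest-centroid nonconformity score (used here in place of the LR and LDA models), the pooled calibration scores, the hypothetical helper names, and in particular the flagging rule, which marks a sample when some label other than its given one has a p-value at or above the threshold and above the p-value of the given label.

```python
import numpy as np


def permute_labels(y, fraction, n_classes, rng):
    """Reassign a fraction of labels to a different, randomly chosen class."""
    y_noisy = y.copy()
    n_flip = int(round(fraction * len(y)))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    # An offset in 1..n_classes-1 (mod n_classes) always yields a different class.
    offsets = rng.integers(1, n_classes, size=n_flip)
    y_noisy[idx] = (y[idx] + offsets) % n_classes
    return y_noisy, idx


def centroid_scores(X, centroids):
    """Nonconformity of each sample to each class: distance to the class centroid."""
    return np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)


def icp_p_values(cal_scores, scores):
    """Inductive conformal p-values for every (sample, candidate label) pair."""
    ge = (cal_scores[None, None, :] >= scores[:, :, None]).sum(axis=2)
    return (ge + 1) / (len(cal_scores) + 1)


def flag_wrong_labels(p_values, y_given, threshold):
    """Assumed rule: flag a sample if another label is credible at the threshold."""
    rows = np.arange(len(y_given))
    p_given = p_values[rows, y_given]
    p_alt = p_values.copy()
    p_alt[rows, y_given] = -np.inf  # exclude the given label from the maximum
    best_alt = p_alt.max(axis=1)
    return (best_alt >= threshold) & (best_alt > p_given)


# Synthetic stand-in for the real data (assumption: two Gaussian classes).
rng = np.random.default_rng(0)
n_classes = 2
means = np.array([[0.0, 0.0], [4.0, 4.0]])
y_train = rng.integers(0, n_classes, size=300)
X_train = means[y_train] + rng.normal(size=(300, 2))
y_cal = rng.integers(0, n_classes, size=100)  # calibration labels assumed clean
X_cal = means[y_cal] + rng.normal(size=(100, 2))

# Permute 20% of the training labels, then fit centroids on the noisy labels.
y_noisy, flipped_idx = permute_labels(y_train, 0.2, n_classes, rng)
centroids = np.stack([X_train[y_noisy == k].mean(axis=0) for k in range(n_classes)])

# Calibrate on the clean set and compute p-values for the training samples.
cal_scores = centroid_scores(X_cal, centroids)[np.arange(len(y_cal)), y_cal]
p_values = icp_p_values(cal_scores, centroid_scores(X_train, centroids))

# Count flagged samples at the three thresholds named in the caption.
counts = {t: int(flag_wrong_labels(p_values, y_noisy, t).sum()) for t in (0.8, 0.5, 0.2)}
print(counts)
```

Under this assumed rule, a higher threshold demands stronger evidence for an alternative label, so the number of flagged samples can only stay the same or shrink as the threshold rises from 0.2 to 0.8.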