Reliability-enhanced data cleaning in biomedical machine learning using inductive conformal prediction

doi:10.1371/journal.pcbi.1012803

Fig 7

The number of wrong labels and outliers detected under different percentages of training data label permutation in the DILI literature prediction task with W2V embeddings. The number of wrongly labeled data points (A-C) and outliers (D-F) detected at different detection thresholds for wrongly labeled data: 0.8 (A, D), 0.5 (B, E), and 0.2 (C, F). The visualization of the cleaning process is based on W2V embeddings and fixed hyperparameters for the conformal predictor. Here, "total" denotes the total number of wrongly labeled data points detected, regardless of label.

doi: https://doi.org/10.1371/journal.pcbi.1012803.g007