Reliability-enhanced data cleaning in biomedical machine learning using inductive conformal prediction

Fig 1

The design of the reliability-based training data cleaning method built on inductive conformal prediction (ICP), and of the validation process. The left half of the figure (module 1) shows the training data cleaning method based on conformal prediction; the right half (module 2) shows the modeling of the downstream classification tasks and the evaluation on the validation and test sets. Following the standard ICP method, the training dataset is partitioned into a proper training set and a calibration set. The proper training set represents the noisy training data, while the calibration set represents the well-curated dataset. Mislabeled samples and outliers in the proper training set are detected and corrected using p-values computed against the distribution of nonconformity measures on the calibration set. The cleaned training set is then used to train classifiers for the downstream classification tasks, and these classifiers are compared against baselines.
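
The p-value step in module 1 can be illustrated with a minimal sketch. It assumes that nonconformity scores have already been computed for every proper-training sample under every candidate label (for example, 1 minus a classifier's predicted probability for that label; this choice is an assumption, not taken from the caption), and it uses the standard ICP p-value, the fraction of calibration scores at least as large as the query score with a +1 correction. The decision rule in the hypothetical helper `clean_labels` (flag a sample whose given label has p < alpha, relabel it to the most conforming label if that label reaches alpha, otherwise mark it as an outlier) is an illustrative assumption, not the authors' exact procedure.

```python
import numpy as np


def icp_p_values(cal_scores, scores):
    """Inductive conformal p-values of query scores against calibration scores.

    p(s) = (#{i : cal_i >= s} + 1) / (n + 1), where n is the calibration set size.
    """
    cal = np.sort(np.asarray(cal_scores, dtype=float))
    n = cal.size
    # Count calibration scores greater than or equal to each query score.
    n_ge = n - np.searchsorted(cal, np.asarray(scores, dtype=float), side="left")
    return (n_ge + 1) / (n + 1)


def clean_labels(cal_scores, cand_scores, given_labels, alpha=0.05):
    """Hypothetical cleaning rule (an assumption, not the paper's exact method).

    cand_scores: (m, K) nonconformity score of each proper-training sample
                 under each of the K candidate labels.
    Returns the cleaned labels (-1 marks a suspected outlier) and the p-values.
    """
    cand_scores = np.asarray(cand_scores, dtype=float)
    p = icp_p_values(cal_scores, cand_scores.ravel()).reshape(cand_scores.shape)
    new_labels = np.asarray(given_labels).copy()
    rows = np.arange(new_labels.size)

    # The given label is suspicious if its p-value falls below alpha.
    suspect = p[rows, new_labels] < alpha
    best = p.argmax(axis=1)
    best_p = p[rows, best]

    # Relabel when another label conforms; otherwise treat the sample as an outlier.
    relabel = suspect & (best_p >= alpha)
    new_labels[relabel] = best[relabel]
    new_labels[suspect & (best_p < alpha)] = -1
    return new_labels, p


# Toy example: 19 calibration scores 0.05, 0.10, ..., 0.95 and three samples.
cal = np.arange(1, 20) / 20.0
cand = np.array([[0.12, 0.88],   # given label 0 conforms: kept
                 [0.99, 0.12],   # label 0 nonconforming, label 1 conforms: relabeled
                 [0.99, 0.99]])  # no label conforms: flagged as outlier
labels, p = clean_labels(cal, cand, given_labels=[0, 0, 0], alpha=0.1)
print(labels)  # [ 0  1 -1]
```

Here calibration is class-agnostic for brevity; a label-conditional (Mondrian) variant would compare each candidate label only against calibration scores from samples of that class.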

doi: https://doi.org/10.1371/journal.pcbi.1012803.g001