Harnessing clinical annotations to improve deep learning performance in prostate segmentation
Fig 3
Evaluation metrics for PX2 and P12 datasets.
Soft Dice coefficients (A) and average Hausdorff distances (B) for every sample in the ProstateX-2 (PX2, n = 99) and PROMISE12 (P12, n = 50) datasets, as evaluated with the baseline, BraTS, and refined primary baseline models. Each solid dot represents a single sample. The models trained by refining either the BraTS pretrained model or the baseline pretrained model both exhibited improved performance and reduced variance on both evaluation metrics; the refined primary baseline model exhibited the highest performance and lowest variance. Detailed statistics are available in Tables 1, 2, and 4.
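For readers unfamiliar with the two metrics plotted here, the following is a minimal sketch of how they are commonly computed. It is not the authors' evaluation code. It assumes the soft Dice form 2Σpg / (Σp + Σg) over predicted probabilities p and binary labels g, and the symmetric mean-of-minimum-distances form of the average Hausdorff distance; other definitions exist (for example, squared terms in the Dice denominator, or taking the maximum of the two directed averages). The function names, the `eps` smoothing term, and the flat-list and point-tuple inputs are illustrative assumptions.

```python
import math


def soft_dice(pred, truth, eps=1e-7):
    """Soft Dice between predicted probabilities and a binary mask.

    Both inputs are flat sequences of equal length (assumed layout).
    eps is an assumed smoothing term that avoids division by zero.
    """
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return (2.0 * intersection + eps) / (total + eps)


def _directed_avg_distance(a, b):
    """Mean distance from each point in a to its nearest point in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def average_hausdorff(a, b):
    """Symmetric average Hausdorff distance between two non-empty point sets.

    Assumed definition: the mean of the two directed average distances.
    """
    return 0.5 * (_directed_avg_distance(a, b) + _directed_avg_distance(b, a))
```

A higher soft Dice coefficient and a lower average Hausdorff distance both indicate closer agreement between the predicted and reference segmentations.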