
Fig 1.

The overall framework for evaluating the reliability of MIL models follows a three-step process.

First, a MIL model is trained on the weakly supervised task of predicting slide-level labels. Next, the trained model is used to predict scores for individual image patches. Finally, the reliability value is computed from the predicted patch scores and their corresponding annotations; in the annotation visualization, tumor patches are highlighted in green and normal patches in orange.
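The third step above can be sketched minimally, assuming patch-level AUROC as the reliability measure (the paper's actual reliability metrics are defined in the main text; the function name is hypothetical):

```python
import numpy as np

def patch_level_auroc(scores, labels):
    """Reliability sketch: the probability that a randomly chosen tumor
    patch (label 1) scores higher than a randomly chosen normal patch
    (label 0), computed by exhaustive pairwise comparison."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Count pairs where the tumor patch outscores the normal patch;
    # ties contribute half a point each.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Toy example: tumor patches tend to receive higher scores.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 1, 0, 0]
print(patch_level_auroc(scores, labels))  # 1.0
```

A value near 1.0 indicates that the model's patch scores separate tumor from normal tissue cleanly; a value near 0.5 indicates scores no better than chance.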


Fig 2.

(I) The test-30 slide with ground truth annotations (green) overlaid on the tissue section. (II) The corresponding heatmap generated by MAX-POOL, showing the distribution of predicted patch scores from low (blue) to high (red).

The annotation and heatmap are spatially aligned for comparison.
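The blue-to-red color mapping described in these heatmaps can be sketched as a simple linear interpolation over normalized patch scores (a minimal illustration, not the figure-generation code used in the paper):

```python
import numpy as np

def scores_to_heatmap(scores_grid):
    """Map a 2D grid of patch scores in [0, 1] to RGB colors,
    interpolating linearly from blue (low score) to red (high score)."""
    s = np.clip(np.asarray(scores_grid, dtype=float), 0.0, 1.0)
    rgb = np.zeros(s.shape + (3,), dtype=np.uint8)
    rgb[..., 0] = (255 * s).astype(np.uint8)        # red channel grows with score
    rgb[..., 2] = (255 * (1 - s)).astype(np.uint8)  # blue channel fades with score
    return rgb

# A 2x2 grid of patch scores rendered as colors.
grid = np.array([[0.0, 0.5], [1.0, 0.25]])
heat = scores_to_heatmap(grid)
print(heat[0, 0], heat[1, 0])
```

Because the heatmap grid mirrors the patch layout of the slide, it stays spatially aligned with the annotation overlay by construction.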


Table 1.

Average reliability, classification, and computational metrics (± standard deviation) over five repetitions for the CAMELYON16 dataset.


Table 2.

Average reliability, classification, and computational metrics (± standard deviation) over five repetitions for the CAMELYON16 dataset using additive models.


Fig 3.

(I) The test-40 slide with ground truth annotations (green) overlaid on the tissue section. (II) The corresponding heatmap generated by MEAN-POOL-INS, showing the distribution of predicted patch scores from low (blue) to high (red).

The annotation and heatmap are spatially aligned for comparison.


Fig 4.

(I) A slide from CATCH with ground truth annotations (green) overlaid on the tissue section. (II) The corresponding heatmap generated by MAX-POOL, showing the distribution of predicted patch scores from low (blue) to high (red).

The annotation and heatmap are spatially aligned for comparison.


Table 3.

Average reliability, classification, and computational metrics (± standard deviation) over five repetitions for CATCH.


Table 4.

Average reliability, classification, and computational metrics (± standard deviation) over five repetitions for the CATCH dataset using additive models.


Fig 5.

(I) A slide from CATCH with ground truth annotations (green) overlaid on the tissue section. (II) The corresponding heatmap generated by ACMIL/4, showing the distribution of predicted patch scores from low (blue) to high (red).

The annotation and heatmap are spatially aligned for comparison.


Table 5.

Average reliability, classification, and computational metrics (± standard deviation) over five repetitions for TCGA BRCA.


Fig 6.

(I) A slide from TCGA BRCA with ground truth annotations (green) overlaid on the tissue section. (II) The corresponding heatmap generated by MADMIL/3, showing the distribution of predicted patch scores from low (blue) to high (red).

The annotation and heatmap are spatially aligned for comparison.


Fig 7.

(I) A slide from TCGA BRCA with ground truth annotations (green) overlaid on the tissue section. (II) The corresponding heatmap generated by MEAN-POOL-INS, showing the distribution of predicted patch scores from low (blue) to high (red).

The annotation and heatmap are spatially aligned for comparison.


Table 6.

Average reliability, classification, and computational metrics (± standard deviation) over five repetitions for TCGA BRCA using additive models.


Fig 8.

Bar plots comparing the models across the different evaluation metrics.


Table 7.

The overall mean of the average reliability, classification, and computational metrics across the CAMELYON16, CATCH, and TCGA BRCA datasets.


Table 8.

The overall mean of the average reliability, classification, and computational metrics for additive models across the CAMELYON16, CATCH, and TCGA BRCA datasets.


Table 9.

Analysis of reliability metrics showing the effect of excluding each metric on model ranking.

Rankings are obtained by summing the scores of the selected reliability metrics (with equal weight) across CAMELYON16, CATCH, and TCGA BRCA datasets, and ranking models based on the aggregated score.
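The equal-weight aggregation and ranking procedure described above can be sketched as follows (model names and scores are illustrative placeholders, not values from the paper):

```python
# Per-model reliability scores on each dataset (illustrative values only).
scores = {
    "MAX-POOL":      {"CAMELYON16": 0.81, "CATCH": 0.74, "TCGA-BRCA": 0.69},
    "MEAN-POOL-INS": {"CAMELYON16": 0.77, "CATCH": 0.79, "TCGA-BRCA": 0.72},
    "ACMIL/4":       {"CAMELYON16": 0.80, "CATCH": 0.70, "TCGA-BRCA": 0.75},
}

# Sum each model's scores across datasets with equal weight ...
totals = {model: sum(per_dataset.values()) for model, per_dataset in scores.items()}

# ... and rank models by the aggregated score, highest first.
ranking = sorted(totals, key=totals.get, reverse=True)
print(ranking)
```

Excluding a metric from the aggregation (as Table 9 examines) amounts to dropping its term from the sum before ranking.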
