Fig 1.
Components: A) the WSI preprocessing pipeline; B) the deep learning backend to extract deep features; C) the Data Analysis Plan (DAP) for the machine learning models; and D) the UMAP module and other modules for unsupervised analysis.
Fig 2.
The tissue detection pipeline.
The tissue bounding box is identified on the WSI thumbnail in three steps: a) binarization of the grayscale image with Otsu's thresholding; b) binary dilation and hole filling; c) selection of the largest connected region as the tissue region and computation of the vertices of its bounding rectangle.
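The three steps can be illustrated with a minimal, dependency-free sketch. It assumes the thumbnail is already a 2D list of 0-255 grayscale values, that tissue is darker than the background glass (so tissue pixels fall at or below the Otsu threshold), a 3x3 structuring element for dilation, and 4-connectivity for regions. Function names are hypothetical, and scaling the box back to full WSI resolution is not shown.

```python
from collections import deque

def otsu_threshold(gray):
    """Otsu threshold for a 2D list of 0-255 integers (maximizes between-class variance)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var, w_bg, sum_bg = 0, -1.0, 0, 0
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg, mean_fg = sum_bg / w_bg, (sum_all - sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

def _neighbors(y, x, h, w):
    # 4-connected neighbours inside the image
    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
        if 0 <= ny < h and 0 <= nx < w:
            yield ny, nx

def dilate(mask):
    """Binary dilation with a 3x3 square structuring element."""
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        if 0 <= y + dy < h and 0 <= x + dx < w:
                            out[y + dy][x + dx] = True
    return out

def fill_holes(mask):
    """Fill background pixels that cannot be reached from the image border."""
    h, w = len(mask), len(mask[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if (y in (0, h - 1) or x in (0, w - 1)) and not mask[y][x]:
                outside[y][x] = True
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for ny, nx in _neighbors(y, x, h, w):
            if not mask[ny][nx] and not outside[ny][nx]:
                outside[ny][nx] = True
                queue.append((ny, nx))
    return [[not outside[y][x] for x in range(w)] for y in range(h)]

def largest_component_bbox(mask):
    """Bounding box (x_min, y_min, x_max, y_max) of the largest connected region."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and not seen[sy][sx]:
                seen[sy][sx] = True
                comp, queue = [], deque([(sy, sx)])
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for ny, nx in _neighbors(y, x, h, w):
                        if mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(comp) > len(best):
                    best = comp
    if not best:
        return None
    ys, xs = [y for y, _ in best], [x for _, x in best]
    return min(xs), min(ys), max(xs), max(ys)

def tissue_bbox(gray):
    t = otsu_threshold(gray)
    mask = [[v <= t for v in row] for row in gray]  # assumption: tissue is darker
    return largest_component_bbox(fill_holes(dilate(mask)))
```

On a white 10x10 thumbnail with a dark 3x5 block and an isolated dark speck in the corner, the speck forms its own small region after dilation, so only the dilated block determines the box.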
Table 1.
Total: total number of tiles in the dataset; Min: number of tiles in the class with the fewest samples; Max: number of tiles in the class with the most samples; Average: average number of tiles per class.
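The four columns follow directly from the per-class tile counts. A minimal sketch, assuming the counts are available as a dictionary mapping class name to number of tiles (the function name and the toy counts are hypothetical, not the paper's data):

```python
def tile_stats(tiles_per_class):
    """Summarize per-class tile counts with the Total/Min/Max/Average columns."""
    counts = list(tiles_per_class.values())
    total = sum(counts)
    return {
        "Total": total,
        "Min": min(counts),    # class with the fewest tiles
        "Max": max(counts),    # class with the most tiles
        "Average": total / len(counts),
    }
```

For toy counts {"a": 120, "b": 80, "c": 100} this gives Total 300, Min 80, Max 120 and Average 100.0.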
Table 2.
Statistics of the backend architectures.
Table 3.
Summary of experiments with the backend architectures.
Table 4.
Matthews correlation coefficient (MCC) values for each experiment and classifier head pair on the HINT dataset.
The average cross-validation MCC with 95% CI (H-MCCt) and the MCC on the external validation set (H-MCCv) are reported. The best-performing combination of backend network and classifier head on each dataset is shown in bold.
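For multiclass problems such as HINT, the MCC can be computed directly from the confusion matrix using the generalization by Gorodkin, which is also the form used by scikit-learn's `matthews_corrcoef`. The sketch below assumes rows are true classes and columns are predicted classes; the function name is hypothetical.

```python
import math

def mcc(cm):
    """Multiclass MCC from a square confusion matrix (rows = true, cols = predicted)."""
    k = len(cm)
    s = sum(sum(row) for row in cm)                       # total samples
    c = sum(cm[i][i] for i in range(k))                   # correctly classified
    t = [sum(cm[i]) for i in range(k)]                    # true counts per class
    p = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predicted counts per class
    num = c * s - sum(pk * tk for pk, tk in zip(p, t))
    den = math.sqrt((s * s - sum(pk * pk for pk in p)) *
                    (s * s - sum(tk * tk for tk in t)))
    return num / den if den else 0.0
```

For two classes this reduces to the familiar binary formula (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).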
Table 5.
Accuracy (ACC) values for each experiment and classifier head pair on the HINT dataset.
The average cross-validation ACC with 95% CI and the ACC on the external validation set are reported. The best-performing combination of backend network and classifier head on each dataset is shown in bold.
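The captions do not state how the 95% CI over cross-validation runs is obtained. As one possible illustration, the sketch below computes a percentile bootstrap interval for the mean of per-run scores; the choice of percentile bootstrap, the number of resamples, and the function name are all assumptions and may differ from the DAP's actual procedure.

```python
import random
import statistics

def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Mean of per-run scores and a percentile-bootstrap (1 - alpha) CI for it.

    Assumption: percentile bootstrap; the DAP may use a different CI method.
    """
    rng = random.Random(seed)
    n = len(values)
    # Resample runs with replacement and record the mean of each resample
    means = sorted(statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot))
    lo = means[round(alpha / 2 * n_boot)]
    hi = means[round((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(values), lo, hi
```

Applied to the list of per-run MCC or ACC values from the internal cross-validation runs, it returns the average reported in the table together with the lower and upper CI bounds.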
Fig 3.
Comparison of DAPPER cross-validation MCC (H-MCCt) vs. MCC on the external validation set (H-MCCv) for each classifier head.
(a) FCH; (b) SVM; (c) RF.
Fig 4.
Confusion matrix for the ResNet+SVM model on HINT20.
Red-shaded cells indicate the most confused classes.
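The most confused classes correspond to the largest off-diagonal cells of the confusion matrix. The exact criterion used for shading is not given, so the sketch below simply ranks off-diagonal cells by count, assuming rows are true classes and columns are predicted classes; `most_confused` and the `top` parameter are hypothetical.

```python
def most_confused(cm, labels, top=3):
    """Top off-diagonal cells as (true label, predicted label, count), largest first."""
    n = len(cm)
    cells = [
        (labels[i], labels[j], cm[i][j])
        for i in range(n)
        for j in range(n)
        if i != j and cm[i][j] > 0
    ]
    cells.sort(key=lambda cell: cell[2], reverse=True)
    return cells[:top]
```

With cm = [[8, 2, 0], [5, 4, 1], [0, 0, 9]] and labels A, B, C, the most frequent error is B tiles predicted as A (5 tiles).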
Table 6.
Performance of the DAPPER framework with the VGG backend network and the classifier heads (FCH, SVM, RF) on the KIMIA24 dataset.
The average cross-validation MCC (K24-MCCt) and ACC (K24-ACCt) with 95% CI, as well as the MCC (K24-MCCv) and ACC (K24-ACCv) on the external validation set, are reported.
Fig 5.
Canberra stability indicator on the HINT and KIMIA datasets.
For each architecture, a set of ranked deep feature lists is generated, one for each internal training run of the nested cross-validation schema, with features ranked by KBest. Canberra stability is computed as in [32]; lower values indicate more stable rankings.
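The idea behind the indicator can be shown with a simplified version: the Canberra distance between two rankings of the same features, averaged over all pairs of ranked lists. This sketch uses full lists without the top-k truncation or normalization of [32], so its values are not directly comparable to Fig 5; function names are hypothetical.

```python
from itertools import combinations

def canberra(rank_a, rank_b):
    """Canberra distance between two rankings of the same features (best first, ranks from 1)."""
    pos_a = {f: i + 1 for i, f in enumerate(rank_a)}
    pos_b = {f: i + 1 for i, f in enumerate(rank_b)}
    return sum(abs(pos_a[f] - pos_b[f]) / (pos_a[f] + pos_b[f]) for f in pos_a)

def canberra_stability(rankings):
    """Mean pairwise Canberra distance over a set of ranked lists (lower = more stable)."""
    pairs = list(combinations(rankings, 2))
    return sum(canberra(a, b) for a, b in pairs) / len(pairs)
```

Because swaps near the top of a list have small rank sums in the denominator, they are penalized more than swaps further down, which suits feature rankings where the top features matter most.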
Table 7.
WSI-level metrics for an increasing number of tiles per WSI.
Metrics are computed on a subset of the HINT20 external validation set consisting of 15 WSI per class (300 WSI in total). The class of each WSI is the most frequent class predicted by the ResNet+SVM model among the considered tiles.
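The WSI-level rule is a majority vote over tile predictions. A minimal sketch, assuming the tile predictions for one slide are given as a list of class labels and that the "considered tiles" are the first `n_tiles` of that list (the caption does not say how tiles are selected). Ties are broken by first occurrence, which is how `Counter.most_common` orders equal counts; this tie-breaking rule is also an assumption.

```python
from collections import Counter

def wsi_label(tile_predictions, n_tiles=None):
    """WSI-level class as the most frequent predicted class among the considered tiles.

    Assumptions: the first n_tiles predictions are used (n_tiles >= 1), and ties
    go to the class seen first.
    """
    considered = tile_predictions if n_tiles is None else tile_predictions[:n_tiles]
    return Counter(considered).most_common(1)[0][0]
```

Repeating this for each slide with a growing `n_tiles` and comparing against the slide labels yields metrics as a function of the number of tiles per WSI.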
Fig 6.
Confusion matrix for the pathologist's classification on a subset of the HINT20 external validation set.
Red-shaded cells indicate the most confused classes.
Table 8.
Tissue classification performance of DAPPER vs pathologist.
DAPPER with the ResNet+SVM model outperforms the pathologist at tile level. Metrics are computed on a subset of the HINT20 external validation set (2,000 tiles).
Fig 7.
UMAP projection of the external validation set for the VGG-20 experiment.
Table 9.
Histology types well separated by the ResNet+SVM model on HINT20.
Per-class accuracy is computed from the confusion matrix in Fig 4 and expressed as a percentage, together with the total number of samples in each class.
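Per-class accuracy here is the fraction of a class's samples that lie on the diagonal of the confusion matrix, i.e. the diagonal cell divided by the row total. A minimal sketch, assuming rows are true classes; the function name and toy matrix are hypothetical.

```python
def per_class_accuracy(cm, labels):
    """Map each label to (accuracy in %, number of samples), with rows = true class."""
    result = {}
    for i, label in enumerate(labels):
        n = sum(cm[i])  # total samples of this class
        result[label] = (100.0 * cm[i][i] / n if n else 0.0, n)
    return result
```

For cm = [[8, 2], [1, 9]] this gives 80.0% for the first class and 90.0% for the second, each over 10 samples.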
Fig 8.
Representative tiles with predictions from the VGG-20 experiment.
A) Examples from two well-separated clusters observed in the UMAP embedding. B) Examples of misclassified tiles from tissue types that partially overlap in the UMAP embedding.