Evaluating reproducibility of AI algorithms in digital pathology with DAPPER

doi:10.1371/journal.pcbi.1006269

Fig 1.

The DAPPER environment.

Components: A) The WSI preprocessing pipeline; B) the deep learning backend to extract deep features; C) the Data Analysis Plan (DAP) for the machine learning models; and D) the UMAP module and other modules for unsupervised analysis.

Fig 2.

The tissue detection pipeline.

The tissue bounding box is identified on the WSI thumbnail in three steps: a) binarization of the grayscale image by Otsu's thresholding; b) binary dilation and hole filling; c) selection of the largest connected region as the tissue region and computation of the vertices of its bounding rectangle.
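
The three tissue-detection steps can be sketched in plain Python as follows. This is a minimal illustration, not DAPPER's code. It assumes the thumbnail is a 2D list of grayscale values from 0 to 255, that tissue is darker than the background, a 3x3 square dilation element, and 4-connectivity. All function names are hypothetical.

```python
from collections import deque


def otsu_threshold(img):
    """Return the gray level t that maximizes between-class variance (Otsu)."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_b = sum_b = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]                      # pixels with value <= t
        if w_b == 0:
            continue
        w_f = total - w_b                   # pixels with value > t
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b, m_f = sum_b / w_b, (sum_all - sum_b) / w_f
        var = w_b * w_f * (m_b - m_f) ** 2  # between-class variance (unscaled)
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def neighbours(y, x, h, w):
    """Yield the 4-connected neighbours of (y, x) inside an h x w grid."""
    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ny, nx = y + dy, x + dx
        if 0 <= ny < h and 0 <= nx < w:
            yield ny, nx


def dilate(mask):
    """Binary dilation with a 3x3 square structuring element."""
    h, w = len(mask), len(mask[0])
    return [[any(mask[y + dy][x + dx]
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                 if 0 <= y + dy < h and 0 <= x + dx < w)
             for x in range(w)]
            for y in range(h)]


def fill_holes(mask):
    """Set to True every background pixel not reachable from the image border."""
    h, w = len(mask), len(mask[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if (y in (0, h - 1) or x in (0, w - 1)) and not mask[y][x]:
                outside[y][x] = True
                queue.append((y, x))
    while queue:  # flood-fill the background from the border
        y, x = queue.popleft()
        for ny, nx in neighbours(y, x, h, w):
            if not mask[ny][nx] and not outside[ny][nx]:
                outside[ny][nx] = True
                queue.append((ny, nx))
    return [[mask[y][x] or not outside[y][x] for x in range(w)] for y in range(h)]


def largest_component_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) of the largest True region, or None."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and not seen[sy][sx]:
                seen[sy][sx] = True
                comp, queue = [], deque([(sy, sx)])
                while queue:  # breadth-first search over one component
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for ny, nx in neighbours(y, x, h, w):
                        if mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(comp) > len(best):
                    best = comp
    if not best:
        return None
    ys, xs = [y for y, _ in best], [x for _, x in best]
    return min(xs), min(ys), max(xs), max(ys)


def detect_tissue_bbox(gray):
    """Steps a) to c): Otsu binarization, dilation + hole filling, largest region."""
    t = otsu_threshold(gray)
    mask = [[v <= t for v in row] for row in gray]  # tissue assumed darker
    return largest_component_bbox(fill_holes(dilate(mask)))
```

In practice, library routines such as `skimage.filters.threshold_otsu`, `scipy.ndimage.binary_dilation`, `scipy.ndimage.binary_fill_holes` and `skimage.measure.label` would normally replace these hand-written helpers; they are spelled out here only to make each step explicit.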

Table 1.

Summary of the HINT datasets.

Total: total number of tiles in the dataset; Min: number of tiles in the class with the fewest samples; Max: number of tiles in the class with the most samples; Average: average number of tiles per class.
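
The summary columns follow directly from the per-class tile counts. A minimal sketch, where the function name and class labels are hypothetical:

```python
def summarize_tiles(counts):
    """counts: mapping from class name to number of tiles in that class."""
    values = list(counts.values())
    return {
        "Total": sum(values),                  # tiles in the whole dataset
        "Min": min(values),                    # smallest class
        "Max": max(values),                    # largest class
        "Average": sum(values) / len(values),  # mean tiles per class
    }
```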

Table 2.

Backend architecture statistics.

Table 3.

Summary of experiments with the backend architectures.

Table 4.

Matthews correlation coefficient (MCC) values for each experiment and classifier head pair on the HINT dataset.

The average cross-validation MCC with 95% CI (H-MCCt) and the MCC on the external validation set (H-MCCv) are reported. The best-performing combination of backend network and classifier head on each dataset is shown in bold.
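
For the multiclass setting, the MCC can be computed from the confusion matrix using Gorodkin's generalization, the same formula scikit-learn's `matthews_corrcoef` applies to multiclass labels. The sketch below assumes rows index the true class and columns the predicted class. The function name is hypothetical, and this is not claimed to be DAPPER's implementation.

```python
from math import sqrt


def multiclass_mcc(cm):
    """Gorodkin's multiclass MCC; cm[i][j] = samples of true class i predicted as j."""
    k = len(cm)
    s = sum(sum(row) for row in cm)                          # total samples
    c = sum(cm[i][i] for i in range(k))                      # correct predictions
    t = [sum(cm[i]) for i in range(k)]                       # true count per class
    p = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predicted count per class
    num = c * s - sum(pk * tk for pk, tk in zip(p, t))
    den = sqrt((s * s - sum(pk * pk for pk in p)) * (s * s - sum(tk * tk for tk in t)))
    return num / den if den else 0.0  # 0 by convention when undefined
```

For two classes this reduces to the familiar binary MCC, (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).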

Table 5.

Accuracy values for each experiment and classifier head pair on the HINT dataset.

The average cross-validation accuracy (ACC) with 95% CI and the ACC on the external validation set are reported. The best-performing combination of backend network and classifier head on each dataset is shown in bold.
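
The caption does not state how the 95% CI is obtained. As one possible approach, a percentile bootstrap over the per-run cross-validation scores is sketched below. The bootstrap method, resample count and seed are all assumptions, and the function name is hypothetical.

```python
import random
from statistics import mean


def mean_with_bootstrap_ci(scores, n_boot=1000, alpha=0.05, seed=0):
    """Mean of the per-run scores with an (assumed) percentile bootstrap CI."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(scores)
    # Resample the runs with replacement and record each resample's mean
    boots = sorted(mean(rng.choices(scores, k=n)) for _ in range(n_boot))
    lo = boots[int((alpha / 2) * n_boot)]
    hi = boots[int((1 - alpha / 2) * n_boot) - 1]
    return mean(scores), lo, hi
```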

Fig 3.

Comparison of DAPPER cross-validation MCC (H-MCCt) vs MCC on the external validation set (H-MCCv) for each classifier.

(a) FCH; (b) SVM; (c) RF.

Fig 4.

Confusion matrix for ResNet+SVM model on HINT20.

Red shaded cells indicate the most confused classes.
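
The caption does not define the shading criterion. One simple reading, sketched here as an assumption, ranks the off-diagonal cells of the confusion matrix (rows are true classes, columns are predicted classes) by count. The function name and labels are hypothetical.

```python
def most_confused_pairs(cm, labels, top=3):
    """Return (true, predicted, count) for the largest off-diagonal cells."""
    pairs = [(cm[i][j], labels[i], labels[j])
             for i in range(len(cm)) for j in range(len(cm))
             if i != j and cm[i][j] > 0]
    pairs.sort(key=lambda p: p[0], reverse=True)  # stable: ties keep matrix order
    return [(true, pred, n) for n, true, pred in pairs[:top]]
```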

Table 6.

Performance of the DAPPER framework with the VGG backend network and classifier heads (FCH, SVM, RF) on the KIMIA24 dataset.

The average cross-validation MCC (K24-MCCt) and ACC (K24-ACCt) with 95% CI, as well as the MCC (K24-MCCv) and ACC (K24-ACCv) on the external validation set, are reported.

Fig 5.

Canberra stability indicator on HINT and KIMIA datasets.

For each architecture, a set of deep feature lists is generated, one list for each internal training run of the nested cross-validation schema, each ranked with KBest. Canberra stability is computed as in [32]: lower values indicate more stable lists.
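
A simplified sketch of a Canberra stability indicator follows. It assumes each list is a ranking of feature names and that features ranked beyond position k (or absent from a list) are assigned rank k + 1, as in the top-k Canberra distance. The indicator is taken here as the mean pairwise distance over all lists; any normalization applied in [32] is omitted, so the values are not directly comparable with Fig 5. Function names are hypothetical.

```python
from itertools import combinations


def canberra_topk(rank_a, rank_b, k):
    """Top-k Canberra distance between two {feature: 1-based rank} mappings."""
    d = 0.0
    for f in set(rank_a) | set(rank_b):
        a = min(rank_a.get(f, k + 1), k + 1)  # ranks beyond k collapse to k + 1
        b = min(rank_b.get(f, k + 1), k + 1)
        d += abs(a - b) / (a + b)
    return d


def canberra_stability(ranked_lists, k):
    """Mean pairwise top-k Canberra distance (unnormalized); lower is more stable."""
    if len(ranked_lists) < 2:
        raise ValueError("need at least two ranked lists")
    ranks = [{f: i + 1 for i, f in enumerate(lst)} for lst in ranked_lists]
    pairs = list(combinations(ranks, 2))
    return sum(canberra_topk(a, b, k) for a, b in pairs) / len(pairs)
```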

Table 7.

Metrics at WSI-level for increasing number of tiles per WSI.

Metrics are computed on a subset of the HINT20 external validation set consisting of 15 WSIs per class (300 WSIs in total). The class of each WSI is the class most frequently predicted by the ResNet+SVM model over the considered tiles.
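
The WSI-level label is a majority vote over tile predictions. A minimal sketch follows. It assumes ties are broken in favour of the first-encountered class (the documented ordering of `Counter.most_common`) and that the first `n_tiles` predictions of each WSI are used. Both choices, and the function names, are hypothetical.

```python
from collections import Counter


def wsi_label(tile_predictions):
    """Majority vote over the predicted classes of a WSI's tiles."""
    return Counter(tile_predictions).most_common(1)[0][0]


def wsi_accuracy(wsi_tiles, wsi_truth, n_tiles):
    """Fraction of WSIs whose vote over the first n_tiles tiles matches the truth."""
    hits = sum(wsi_label(tiles[:n_tiles]) == wsi_truth[w]
               for w, tiles in wsi_tiles.items())
    return hits / len(wsi_tiles)
```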

Fig 6.

Confusion matrix for pathologist classification on a subset of the HINT20 external validation set.

Red shaded cells indicate the most confused classes.

Table 8.

Tissue classification performance of DAPPER vs pathologist.

DAPPER with the ResNet+SVM model outperforms the pathologist at tile level. Metrics are computed on a subset of the HINT20 external validation set (2,000 tiles).

Fig 7.

UMAP projection of the external validation set for the VGG-20 experiment.

Table 9.

Histology types well separated by the ResNet+SVM model on HINT20.

Accuracy is computed from the confusion matrix in Fig 4 and expressed as a percentage, together with the total number of samples for each class.
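
Per-class accuracy can be read off the confusion matrix as the diagonal entry divided by the row total. A sketch assuming rows are true classes; the function name and toy labels are hypothetical.

```python
def per_class_accuracy(cm, labels):
    """Map each class to (accuracy in %, number of samples of that class)."""
    out = {}
    for i, label in enumerate(labels):
        total = sum(cm[i])  # samples whose true class is `label`
        pct = round(100 * cm[i][i] / total, 1) if total else 0.0
        out[label] = (pct, total)
    return out
```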

Fig 8.

Representative tiles predicted in the VGG-20 experiment.

A) Examples from two well-separated clusters in the UMAP embedding. B) Examples of misclassified tiles from tissue types that partially overlap in the UMAP embedding.
