Fig 1.
Overall workflow. A. MSI classification model development. (a) H&E whole slide images (WSIs) of colorectal (CRC), stomach (STAD), and endometrial (UCEC) cancers were downloaded from the public TCGA database. (b) WSIs of patients diagnosed as MSI-H or MSS by PCR testing were selected. (c) WSIs were cut into 521 × 521 pixel tiles and color normalized by Macenko’s method. (d) Tumor tiles were selected from all tiles by the tumor classification model. (e) The training data were used to train the convolutional neural networks with 5-fold cross-validation, and the testing set was used to evaluate the trained models for each cancer type. Four models were trained, using either individual cancer types or combined tissues as training data. (f) The resulting models (CRC-, STAD-, UCEC-, and multi-tissue trained models) inferred the MSI status (MSI-H or MSS) at the tile level for each cancer tissue. (g) Each model calculated slide-level probabilities by averaging tile-level probabilities, and the models were compared using AUC. B. Tumor classification model development. (a) Tile images collected by Kather et al. [13] were downloaded from a publicly available website (doi.org/10.5281/zenodo.2530788) to pretrain a tumor tissue classifier. Tiles were color normalized by Macenko’s method. (b) The classifier performed well at classifying tissue types (overall accuracy = 99.67%) and at detecting tumor tiles (accuracy = 99.8%).
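Step (g) can be illustrated with a minimal sketch, which is not the authors' code. It assumes tile-level outputs arrive as (slide ID, MSI probability) pairs and that slide labels are coded 1 for MSI-H and 0 for MSS. The function names `slide_probabilities` and `auc` are hypothetical, and the AUC is computed here as a pairwise rank statistic, not with any particular library.

```python
from collections import defaultdict


def slide_probabilities(tile_probs):
    """Average tile-level MSI probabilities per slide.

    tile_probs: iterable of (slide_id, probability) pairs (assumed format).
    Returns a dict mapping slide_id to the mean tile probability.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for slide_id, prob in tile_probs:
        sums[slide_id] += prob
        counts[slide_id] += 1
    return {slide_id: sums[slide_id] / counts[slide_id] for slide_id in sums}


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for MSI-H, 0 for MSS (assumed coding). Ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative slide")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Illustrative values only, not data from the study.
tiles = [("s1", 0.9), ("s1", 0.7), ("s2", 0.2), ("s2", 0.4), ("s3", 0.6), ("s3", 0.5)]
truth = {"s1": 1, "s2": 0, "s3": 0}

probs = slide_probabilities(tiles)
slide_ids = sorted(probs)
slide_auc = auc([truth[s] for s in slide_ids], [probs[s] for s in slide_ids])
```

Averaging treats every tumor tile equally, so a slide with many tiles contributes a single slide-level score to the AUC, the same as a slide with few tiles.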
Table 1.
Datasets for constructing the MSI classification model.
Fig 2.
Examples of whole slide image (WSI) and tumor tile predictions.
The WSIs comprised both normal and tumor tissues (a, c) and were divided into 521 × 521 pixel patches. A tumor classifier was then used to select only the tumor patches, which were assembled into a single image (b, d) shown beside the corresponding WSI for validation. Annotated regions of normal tissue are indicated by blue or green lines, while regions of tumor tissue are denoted by red lines (A-a and B-a,c). In the case of A-c, the green line represents the annotated tumor region.
Table 2.
The performance of our models.
Fig 3.
Visualization of MSI probability heatmap at the slide level.
A. Whole slide images. B. Corresponding predicted MSI heatmaps for the images shown in A, visualizing patch-level MSI scores generated by the three single-tissue trained models and the CRC-STAD-UCEC tissue trained model. The average patch-level MSI score beneath each heatmap represents the slide’s MSI value. The heatmap bar illustrates MSI scores ranging from 0 to 1, where values closer to 1 indicate MSI-H and values closer to 0 indicate a higher probability of MSS.
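As a rough illustration of how such a heatmap and its slide-level value relate, the sketch below places patch-level scores on a grid by patch position and averages the scored patches. It is not the authors' plotting code. The (x, y, score) input format, the top-left pixel coordinate convention, and the function names `heatmap_grid` and `slide_score` are all assumptions made for this example.

```python
def heatmap_grid(patches, patch_size):
    """Arrange patch-level MSI scores on a 2D grid.

    patches: list of (x, y, score), with x, y the top-left pixel
             coordinates of each tumor patch (assumed convention).
    Returns a list of rows; None marks positions with no scored patch
    (for example, non-tumor tissue or background).
    """
    if not patches:
        return []
    n_cols = max(x for x, _, _ in patches) // patch_size + 1
    n_rows = max(y for _, y, _ in patches) // patch_size + 1
    grid = [[None] * n_cols for _ in range(n_rows)]
    for x, y, score in patches:
        grid[y // patch_size][x // patch_size] = score
    return grid


def slide_score(grid):
    """Mean of all scored patches, i.e. the slide-level MSI value."""
    scores = [v for row in grid for v in row if v is not None]
    if not scores:
        raise ValueError("no scored patches on this slide")
    return sum(scores) / len(scores)


# Illustrative values only, not data from the study.
patches = [(0, 0, 0.9), (521, 0, 0.7), (0, 521, 0.2)]
grid = heatmap_grid(patches, patch_size=521)
value = slide_score(grid)
```

Positions left as None are excluded from the average, so the slide value reflects only patches that received a score.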
Fig 4.
Comparative examples of MSI prediction patterns and tissue morphology.
The prediction heatmaps (a and c) display results generated using an EfficientNet Model1 architecture with multi-tissue training. These maps show predicted microsatellite instability status across tissue samples, with corresponding H&E histology images (b and d) revealing the actual tissue morphology from regions marked by white boxes. A and B represent colorectal cancer and stomach cancer, respectively, with results showing microsatellite instability high (a, MSI scores: 0.85 and 0.88) and microsatellite stable (c, MSI scores: 0.18 and 0.08) status.
Fig 5.
False results of microsatellite instability (MSI) prediction.
A. MSS falsely classified as MSI-H (false positive). B. MSI-H falsely classified as MSS (false negative). The left image is a whole slide image, and the two images on the right are visualizations of MSI probability heatmaps at the slide level. The average patch-level MSI score beneath each heatmap represents the slide’s MSI value. The heatmap bar illustrates MSI scores ranging from 0 to 1, where values closer to 1 indicate MSI-H and values closer to 0 indicate a higher probability of MSS.