Evaluating deep learning-based melanoma classification using immunohistochemistry and routine histology: A three center study

  • Christoph Wies ,

    Contributed equally to this work with: Christoph Wies, Lucas Schneider

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft

    Affiliations Digital Biomarkers for Oncology Group, German Cancer Research Center (DKFZ), Heidelberg, Germany, Medical Faculty, University Heidelberg, Heidelberg, Germany

  • Lucas Schneider ,

    Contributed equally to this work with: Christoph Wies, Lucas Schneider

    Roles Conceptualization, Data curation, Investigation, Methodology, Project administration, Supervision, Validation, Visualization, Writing – original draft

    Affiliation Digital Biomarkers for Oncology Group, German Cancer Research Center (DKFZ), Heidelberg, Germany

  • Sarah Haggenmüller,

    Roles Resources, Writing – review & editing

    Affiliation Digital Biomarkers for Oncology Group, German Cancer Research Center (DKFZ), Heidelberg, Germany

  • Tabea-Clara Bucher,

    Roles Project administration, Writing – review & editing

    Affiliation Digital Biomarkers for Oncology Group, German Cancer Research Center (DKFZ), Heidelberg, Germany

  • Sarah Hobelsberger,

    Roles Resources, Writing – review & editing

    Affiliation Department of Dermatology, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany

  • Markus V. Heppt,

    Roles Resources, Writing – review & editing

    Affiliation Department of Dermatology, Uniklinikum Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany

  • Gerardo Ferrara,

    Roles Conceptualization, Resources, Writing – review & editing

    Affiliation Anatomic Pathology and Cytopathology Unit—Istituto Nazionale Tumori di Napoli, IRCCS “G. Pascale”, Naples, Italy

  • Eva I. Krieghoff-Henning ,

    Roles Conceptualization, Funding acquisition, Project administration, Supervision, Writing – review & editing

    ‡ EIKH and TJB also contributed equally to this work.

    Affiliation Digital Biomarkers for Oncology Group, German Cancer Research Center (DKFZ), Heidelberg, Germany

  • Titus J. Brinker

    Roles Conceptualization, Funding acquisition, Project administration, Supervision, Writing – review & editing

    titus.brinker@dkfz.de

    ‡ EIKH and TJB also contributed equally to this work.

    Affiliation Digital Biomarkers for Oncology Group, German Cancer Research Center (DKFZ), Heidelberg, Germany

Abstract

Pathologists routinely use immunohistochemical (IHC)-stained tissue slides against MelanA in addition to hematoxylin and eosin (H&E)-stained slides to improve their accuracy in diagnosing melanomas. The use of diagnostic Deep Learning (DL)-based support systems for the automated examination of tissue morphology and cellular composition has been well studied for standard H&E-stained tissue slides. In contrast, few studies have analyzed IHC slides using DL. We therefore investigated the separate and joint performance of ResNets trained on MelanA and corresponding H&E-stained slides. The MelanA classifier achieved areas under the receiver operating characteristic curve (AUROC) of 0.82 and 0.74 on out-of-distribution (OOD) datasets, similar to the H&E-based benchmark AUROCs of 0.81 and 0.75, respectively. A combined classifier using MelanA and H&E achieved AUROCs of 0.85 and 0.81 on the OOD datasets. DL MelanA-based assistance systems thus perform on par with the benchmark H&E classification and may be further improved by multi-stain classification to assist pathologists in their clinical routine.

Introduction

Melanoma diagnoses have increased in recent decades [1] and melanoma is the fifth most common cancer in the United States [2]. Despite its relatively high frequency, melanoma is often difficult to differentiate histopathologically from nevi, and a high diagnostic discordance rate has been reported even among experienced histopathologists [3]. If a melanoma is initially misclassified as a nevus and therefore diagnosed at a later stage, the patient’s chances of survival may be significantly reduced and therapy will probably have to be more intense. Conversely, if harmless benign lesions are diagnosed as melanoma, the patient suffers an unnecessary psychological and physical burden. In individual cases, overdiagnosis can even lead to unnecessary, expensive and stressful therapies, which are also associated with high costs for the healthcare system and unnecessary toxicity for the affected patients [4]. More precise diagnostic options could contribute to overcoming these problems.

Due to the rapid technological advances of the last few years, AI-based assistance systems may become powerful tools for pathological cancer diagnostics. Deep Learning (DL) with Convolutional Neural Networks (CNN) has shown promise in studies aimed at distinguishing melanomas from nevi on digitized hematoxylin and eosin (H&E)-stained whole slide images (WSI) [5, 6]. In some cases, the DL approach could even outperform humans [7]. However, the accuracy of these classifiers, especially on external data, still leaves room for improvement.

In addition to standard H&E-stained slides, immunohistochemical (IHC)-stained tissue sections are often available for many cancer entities and represent a source of complementary prognostic and/or predictive information. However, the analysis of IHC-stained slides with DL models is a relatively new area of research. Recent studies have employed DL for the successful classification of non-skin cancer entities on IHC-stained slides, e.g., to determine HER2 status in breast cancer [8] and to evaluate immune cell multistains as prognostic and predictive biomarkers in colorectal cancer [9]. Moreover, as shown in previous work, the fusion of different data modalities often improves the generalizability and performance of DL models [9–11].

IHC stains routinely used by pathologists to better differentiate melanomas from other, usually benign, lesions include MelanA (MART-1) [12], HMB-45 [13], Ki-67 [14], tyrosinase [15], S100 [16] and PRAME [17]. The melanocyte antigen MelanA (also called melanoma antigen recognized by T cells 1, MART-1) is a lineage-specific melanocytic marker commonly used by histopathologists for the routine diagnosis of melanocytic neoplasms, since it highlights the cytomorphology and the distribution of melanocytes.

IHC expression of MelanA can be automatically analyzed using state-of-the-art artificial intelligence (AI) methods. In this study, we investigate the use of DL-based image analysis models on MelanA-stained tissue for melanoma classification in comparison and in addition to the standard H&E-based diagnosis.

Materials and methods

The presented study investigates lesions that were classified as melanoma-suspicious by dermatoscopic examination and histopathologically verified as melanoma or nevus. We use DL models to classify whether a lesion is a melanoma or a nevus based on MelanA-stained or H&E-stained tumor tissue, or a combination of both stains.

Ethics approval was obtained from the ethics committee of the Technische Universität Dresden. Patients provided informed written consent. This work was performed in accordance with the Declaration of Helsinki [18].

Datasets

The inclusion criteria for our study were an age of at least 18 years and a melanoma-suspicious skin lesion that was biopsied after dermoscopic examination. Suspicious lesions that had been pre-biopsied or were located near the eye, under the fingernails or under the toenails were excluded. The ground truth labels were histopathologically confirmed by at least one reference dermatopathologist examining at least the H&E-stained reference slide. MelanA (MART-1) [19, 20] immunohistochemical (IHC) and hematoxylin and eosin (H&E) stained tissue slides from the university hospital in Dresden were used for training, validation and hold-out testing. Slides from the university hospital in Erlangen and from the National Cancer Institute of Naples were used for out-of-distribution (OOD) testing. Table 1 describes the population of all three cohorts. The Dresden, Erlangen and Naples cohorts were collected prospectively, with data collection starting in April 2021 and ending in April 2023; participants provided informed written consent. Data was physically transferred in batches. Data received from the university hospital in Dresden before 2023 was used as the training set, while data received later was used as the hold-out test dataset. The labels of all datasets were pathologically verified. The MelanA staining differs between the three cohorts: antibodies from different manufacturers and different dilutions were used at each site (Table 2). Representative WSI thumbnails from all three cohorts and for both stains are shown in the supplements in S1 Fig.

Table 1. Description of the population in our datasets.

For continuous features we report the median, range, and number of missing values (NAs); for categorical features we report the total number of observations per group. The training population as well as all three test populations are described. Melanoma in situ describes the early stage of a malignant melanoma that has not yet broken through the basement membrane. However, features at the cellular level do not differ between melanoma in situ and malignant melanoma.

https://doi.org/10.1371/journal.pone.0297146.t001

Table 2. Antibodies and parameters of staining methods used by the different clinics.

https://doi.org/10.1371/journal.pone.0297146.t002

Pre-processing

IHC and adjacent H&E slides from the Dresden, Erlangen and Naples cohorts were digitized with an Aperio® AT2 Slide Scanner at 40× magnification, resulting in WSIs with a resolution of 0.25 μm/px. Tumor boundaries were manually annotated under expert supervision with the QuPath digital pathology software, version 0.3 [21]. WSIs were tessellated into patches of 237 × 237 px by an in-house developed QuPath script, at four magnifications (40×, 20×, 10×, 5×) for IHC WSIs and at 40× magnification for H&E WSIs. Tiles at 40× magnification cover an area of 60 × 60 μm, which corresponds to 237 × 237 px; tiles at 20×, 10× and 5× magnification cover 120 × 120 μm, 240 × 240 μm and 480 × 480 μm, respectively. All tiles used for training, validation and hold-out testing were extracted without stride/overlap.
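
As an illustration of this tiling scheme, the following minimal Python sketch (not the authors' QuPath script; the helper and the derived base resolution are assumptions inferred from the reported 237 px / 60 μm tile) relates magnification, pixel resolution and the physical field of view of a fixed 237 × 237 px tile.

```python
# Minimal sketch (assumed helper, not the authors' QuPath script): relation between
# magnification, pixel resolution, and physical tile size for fixed 237 x 237 px tiles.
TILE_PX = 237                      # patch edge length in pixels (from the paper)
BASE_UM_PER_PX = 60.0 / TILE_PX    # ~0.253 um/px at 40x so that 237 px spans ~60 um

def tile_geometry(magnification: int) -> tuple[float, float]:
    """Return (um_per_px, field_of_view_um) for a 237-px tile at a given magnification."""
    downsample = 40 / magnification            # 40x is the scan magnification
    um_per_px = BASE_UM_PER_PX * downsample    # coarser resolution at lower magnification
    return um_per_px, um_per_px * TILE_PX

for mag in (40, 20, 10, 5):
    res, fov = tile_geometry(mag)
    print(f"{mag}x: {res:.2f} um/px, {fov:.0f} x {fov:.0f} um per tile")
```

This reproduces the resolutions and tile areas named in the text (roughly 0.25, 0.5, 1.0 and 2.0 μm/px covering 60, 120, 240 and 480 μm).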

Models

To classify pigmented lesions as melanomas or nevi, the ResNet architecture introduced by He et al. [22] was selected as the model for all data modalities. The hyperparameters of the different models were tuned individually using the Bayesian optimization framework Optuna [23] and five-fold cross-validation; all models were loaded from the timm library [24]. To avoid overfitting to slides containing a very large number of tiles, we used weighted sampling to train with a predefined number of tiles per slide in every epoch. The hyperparameters we tuned were the size of the ResNet, the learning rate, the number of training epochs, the type of pooling, the number of tiles used per training epoch, and whether or not the ResNet was initialized with ImageNet-pretrained weights. The parameterization of all models is shown in the supplements in S1 Table.
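
A hedged sketch of such a search is shown below. `train_and_validate` is a hypothetical helper standing in for the five-fold cross-validated training loop with weighted per-slide tile sampling, and the search ranges are illustrative assumptions rather than the values used in the study.

```python
# Hedged sketch of the hyperparameter search described above (train_and_validate and
# the search ranges are hypothetical stand-ins, not the authors' code).
import optuna
import timm

def objective(trial: optuna.Trial) -> float:
    params = {
        "arch": trial.suggest_categorical("arch", ["resnet18", "resnet34", "resnet50"]),
        "pretrained": trial.suggest_categorical("pretrained", [True, False]),
        "lr": trial.suggest_float("lr", 1e-5, 1e-2, log=True),
        "epochs": trial.suggest_int("epochs", 5, 30),
        "pooling": trial.suggest_categorical("pooling", ["avg", "max"]),
        "tiles_per_slide": trial.suggest_int("tiles_per_slide", 50, 500),
    }
    model = timm.create_model(
        params["arch"],
        pretrained=params["pretrained"],
        num_classes=2,
        global_pool=params["pooling"],
    )
    # train_and_validate: assumed helper running five-fold CV with weighted
    # per-slide tile sampling; returns the mean validation AUROC.
    return train_and_validate(model, params)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
```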

The slide prediction procedure for the different image modalities is as follows: the models were trained at the tile level (with resolutions of 0.25, 0.5, 1.0 and 2.0 μm/px), using the slide label as the label for each tile of the slide. At inference, all tiles of a slide were predicted and the slide score was calculated by averaging all tile scores (Fig 1). To train models capable of handling domain shifts, the color jitter augmentation of PyTorch [25] was used as part of the training process, as suggested by Tellez et al. [26]. In contrast to H&E-stained slides, features of protein expression can be distributed over a larger area of the cytoplasm; therefore, different magnifications were used to capture these larger-scale features.
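
The sketch below illustrates this aggregation and the color jitter augmentation under stated assumptions: `model` is any trained tile classifier, `tile_loader` is an assumed DataLoader yielding batches of tiles from a single slide, and the jitter strengths are illustrative rather than the tuned values.

```python
# Minimal sketch of tile-level color jitter and slide-level score aggregation
# (tile_loader and model are assumed inputs; this is not the authors' exact code).
import torch
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.05),
    transforms.ToTensor(),
])

@torch.no_grad()
def slide_score(model: torch.nn.Module, tile_loader) -> float:
    """Average the per-tile melanoma probabilities over all tiles of one slide."""
    model.eval()
    scores = []
    for tiles in tile_loader:                       # batches of tiles from a single WSI
        probs = torch.softmax(model(tiles), dim=1)  # shape: (batch, 2)
        scores.append(probs[:, 1])                  # probability of the melanoma class
    return torch.cat(scores).mean().item()
```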

Fig 1. Schematic diagram of the different models.

The red box shows the pipeline for MelanA-stained WSIs and the purple box the pipeline for H&E-stained WSIs. We tessellated MelanA-stained WSIs at the different magnifications and trained individual models on each tile size. The class probabilities for each tile were predicted and aggregated into a slide score by averaging all tile scores. For the H&E-based model we proceeded in the same way.

https://doi.org/10.1371/journal.pone.0297146.g001

Combined models

Unimodal classifiers were combined to build models based on multiple data modalities. A classifier based on all four MelanA magnifications was built, in which predictions with higher certainty contribute more to the combined prediction: the scores of the different magnification models were averaged, weighted by their distance to the optimal decision threshold. Other fusion approaches, such as unweighted averaging or weighting based on the models’ validation performance, were investigated and yielded comparable results (shown in the supplements, S2 Table). The H&E classifier was combined with the MelanA multiscale classifier using the same fusion method.
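
A minimal sketch of this distance-weighted fusion is given below, assuming each unimodal model contributes one slide score and one optimal decision threshold; the exact weighting formula is not spelled out in the text, so this is one plausible reading rather than the authors' implementation.

```python
# Hedged sketch of the score fusion: each unimodal slide score is weighted by its
# distance to that model's optimal decision threshold, so more confident models
# contribute more to the fused slide score.
import numpy as np

def fuse_scores(scores: np.ndarray, thresholds: np.ndarray) -> float:
    """scores[i]: slide score of model i; thresholds[i]: its optimal decision threshold."""
    weights = np.abs(scores - thresholds)          # confidence = distance from threshold
    if weights.sum() == 0:                         # degenerate case: all models at threshold
        return float(scores.mean())
    return float(np.average(scores, weights=weights))

# e.g. fusing four MelanA magnification models with hypothetical thresholds of 0.5:
# fused = fuse_scores(np.array([0.9, 0.7, 0.55, 0.8]), np.array([0.5] * 4))
```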

Motivated by clinical practice, we investigated another setup, called the hierarchical setup, in which we first predict the label with the H&E classifier and add the MelanA-based classifier only for those lesions where the H&E WSI leads to an uncertain prediction.

To determine whether an H&E-based prediction was uncertain, we calculated confidence intervals (CIs) of the slide-level score via bootstrapping and then checked whether the optimal decision threshold was contained in the 95% CI. For cases where the threshold was contained in the CI of the slide-level score, we added the MelanA-based classifier.
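
A sketch of this uncertainty check is given below. It assumes the slide-level CI is obtained by bootstrapping the tile scores of the slide in question; the text does not spell out the resampled unit, so treat that choice as an assumption.

```python
# Sketch of the uncertainty check, assuming the slide-level CI is obtained by
# bootstrapping the tile scores of a single slide.
import numpy as np

def is_uncertain(tile_scores: np.ndarray, threshold: float,
                 n_boot: int = 10_000, seed: int = 0) -> bool:
    """True if the optimal decision threshold lies inside the 95% CI of the slide score."""
    rng = np.random.default_rng(seed)
    boot_means = np.array([
        rng.choice(tile_scores, size=len(tile_scores), replace=True).mean()
        for _ in range(n_boot)
    ])
    lower, upper = np.quantile(boot_means, [0.025, 0.975])
    return lower <= threshold <= upper
```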

Reporting

For all results, 95% CIs are given next to the corresponding areas under the receiver operating characteristic curve (AUROCs) of the models. CIs were calculated using the bootstrap method [27]: the method was applied to the predicted values of a cohort, and the AUROC was then calculated for each bootstrap cohort. After 10,000 repetitions, the 2.5% and 97.5% quantiles, and thus the 95% CI, were calculated.
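
A minimal sketch of this bootstrap procedure follows; only the 10,000 repetitions and the 2.5%/97.5% quantiles are taken from the text, while the helper itself is an illustrative assumption.

```python
# Sketch of the AUROC confidence interval: resample slide-level predictions with
# replacement 10,000 times and take the 2.5% / 97.5% quantiles of the AUROCs.
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auroc_ci(y_true: np.ndarray, y_score: np.ndarray,
                       n_boot: int = 10_000, seed: int = 0) -> tuple[float, float, float]:
    rng = np.random.default_rng(seed)
    aurocs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))   # resample the cohort
        if len(np.unique(y_true[idx])) < 2:               # AUROC needs both classes
            continue
        aurocs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lower, upper = np.quantile(aurocs, [0.025, 0.975])
    return roc_auc_score(y_true, y_score), float(lower), float(upper)
```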

Results

The unimodal H&E classifier is based on previous works [5, 10, 28–30] and was adapted to the corresponding MelanA resolutions. The MelanA models were combined and fused with the H&E model into one multi-modal classifier. All described models were tested in-distribution (InD) on the Dresden hold-out set, and OOD on the Erlangen and Naples cohorts (see Table 1). AUROCs and bootstrapped CIs for all models are shown in Table 3.

Table 3. AUROC values as well as 95% bootstrapped CIs for the three test cohorts and all developed models.

https://doi.org/10.1371/journal.pone.0297146.t003

All results differ significantly from random guessing, since no CI contains the critical value of 0.5. Thus, melanoma can be classified with all developed MelanA-based models as well as with the benchmark H&E-based model. Moreover, it should be highlighted that almost all models on all cohorts achieve an AUROC significantly better than 0.7, which makes the findings potentially relevant for clinical practice. However, the CIs overlap in several cases, indicating that the different models perform similarly and thus probably rely on a large amount of shared information.

In addition, we investigated a hierarchical approach motivated by clinical practice, using MelanA-stained slides only for cases where the H&E-based model is uncertain.

The ROC diagrams of the MelanA-based, the H&E-based, and the combined models for all three cohorts are shown in Fig 2. Another representation of this plot, which allows a better comparison of the models within one cohort, is shown in the supplements in S4 Fig. Additional ROC plots of the individual MelanA models, which consider only one magnification, and results of the hierarchical approach are shown in the supplementary material in S2 and S3 Figs.

Fig 2. ROC plots by data modality with corresponding AUROC values.

The different subplots show the results for the individual developed models. A: MelanA-based performance taking all magnifications into account; B: H&E-based performance; C: combined model using H&E as well as MelanA by aggregating the individual scores. The colors of the ROC curves indicate the data source site of the results. Red: internal results (Dresden); Blue: external results (Erlangen); Purple: external results (Naples).

https://doi.org/10.1371/journal.pone.0297146.g002

MelanA-based classifiers

The ROC curves of the different magnifications are shown in the supplementary material in S2 Fig. They overlap at several points in all cohorts, which means that for different sensitivity/specificity trade-offs, different magnifications lead to the best results. In the internal cohort, the classifiers reached AUROCs between 0.85 and 0.92, in the Erlangen cohort AUROCs between 0.67 and 0.78, and in the Naples cohort AUROCs between 0.75 and 0.80. The CIs of the different magnifications overlap in all cohorts, so no single magnification performs significantly best overall.

The combination of all four magnifications, shown in Fig 2A, was not significantly different from the models that use only one magnification.

In the Dresden (0.88) and Erlangen (0.74) cohorts, the AUROC of the combined MelanA model, without considering CIs, is lower than that of the 0.50 μm/px model as a stand-alone classifier. For the Naples cohort, the AUROC of the combined MelanA classifier (0.82) is slightly, but not significantly, better than those of all individual models.

H&E-based classifier

The classifier using only H&E-stained tissue, which serves as our benchmark, achieved an AUROC of 0.96 on the internal test set and AUROCs of 0.75 and 0.81 on the external cohorts, respectively. The ROC plot in Fig 2B and the results in Table 3 show that the internal performance is significantly better than the external performance. The performance on the two external data sets is not significantly different.

Combined classifiers using H&E and MelanA

The model based on both data modalities, the H&E-stained tissue as well as the MelanA-stained tissue at all investigated resolutions, shown in Fig 2C, performs numerically slightly worse than the H&E model on the Dresden cohort, reaching an AUROC of 0.94.

However, in the external cohorts the combined model performs best in absolute numbers, reaching AUROCs of 0.81 and 0.85 on the Erlangen and Naples cohorts, respectively. Nevertheless, the performance of the combined model is not significantly different from the MelanA-based model or from the H&E-based model for any of the investigated cohorts.

The hierarchical approach, in which MelanA predictions are only taken into account when the H&E-based prediction is uncertain and which better reflects the diagnostic path, leads to the ROC plots shown in S3 Fig. This approach resulted in the numerically best, albeit still not significantly different, performance on the internal cohort. It hardly affected the results on the external cohorts, since the H&E-based model was uncertain for only one sample within the Naples cohort and was certain for all samples in the Erlangen cohort.

Discussion

In this work, we were able to classify melanomas and nevi across multiple datasets on MelanA slides with similar accuracy as on benchmark H&E slides using DL-based image analysis. Furthermore, the results may suggest that the multi-stain approach has the potential to improve prediction accuracy and robustness, since on both external cohorts the combined model reached the highest AUROCs.

To integrate the presented work into clinical practice, a method for AI-pathologist interaction needs to be developed. For this purpose, we are developing an explainable artificial intelligence system in collaboration with dermatologists [31], which produces easily interpretable explanations based on dermatoscopic images and is intended to be integrated as an AI tool into digital pathology and clinical practice. Such a system can be expanded to include other data modalities such as immunohistochemistry or routine histology.

In clinical practice, pathologists often use H&E-stained tissue sections for melanoma diagnosis and resort to IHC-stained tissue in uncertain cases [32]. While DL-assisted detection of melanoma on H&E sections has been well studied [7], few studies have been performed using additional routine IHC-stained slides. Digital image analysis with automated quantification of the proliferation marker Ki-67 has been used to distinguish melanoma from nevi as a diagnostic and prognostic aid [33]. Recently, an improved DL annotation method for H&E/SOX10 dual stains was developed to better identify tumor cells in cutaneous melanoma [34]. In the study presented here, MelanA-stained tissue was selected as an additional diagnostic tool since it highlights the cytomorphology and the distribution of melanocytes, thereby allowing a more accurate evaluation of the architecture of any melanocytic tumor, along with the size and shape of single cells. Other IHC stains such as HMB45, p16, and PRAME were excluded because they are useful only in selected cases. SOX10 was not chosen because it is a nuclear marker and provides no information about the actual size of melanocytes or about the morphologic features of their dendritic processes. Finally, Ki67, although widely used in routine diagnostics, is of little help in the recognition of in situ and early invasive melanoma [13–16].

In the current pathological routine, IHC markers including MelanA are used heterogeneously in different hospitals and laboratories. At the university hospital in Dresden, generally all dermatologically melanoma-suspicious skin lesions are stained with MelanA, providing an unbiased training dataset for our study. In contrast, the OOD datasets likely contain more challenging lesions, since at the university hospital in Erlangen MelanA-stained tissue was only prepared when the H&E-stained slides yielded uncertain pathological results. The Naples dataset contained 40% in situ melanomas, all of which are small in size and generate few tiles, which potentially makes classification more difficult in general.

The Dresden test set apparently does not benefit from the inclusion of additional data (S4 Fig), since the H&E-stained tissue slides are already sufficient to yield maximum accuracy. This may be due to the rather unambiguous dataset, and thus the very high performance, and to the broad dataset with many subclasses. In contrast, the OOD datasets benefit from incorporating the additional MelanA-stained slides, making the classifier more robust externally. A combined classifier thus provides an advantage here, a finding we already made when predicting BRAF status from H&E, clinical and methylation data in melanoma [10], suggesting that a multi-stain classifier can lead to better generalizability. Although the information contained in the H&E- and the MelanA-stained slides is probably partially redundant, a benefit of combining both stains is still apparent on OOD data.

A detailed analysis of the various misclassifications revealed that it was not individual histopathological features that caused the model to underperform, but mainly technical artifacts such as overlapping section fragments, low staining intensity combined with strong pigmentation, and very small lesions that yielded only a small number of tiles. This also explains why the MelanA+H&E classifier did not perform better in the Dresden cohort: the simple H&E classifier is already sufficiently good, and the combination does not lead to any improvement in individual cases of the Dresden hold-out dataset, as these contain individual, previously unseen staining artifacts. This could be avoided by increasing the amount of data in the training set or by excluding the staining artifacts. In clinical pathology, MelanA staining is used in parallel with as well as in addition to H&E staining to classify melanomas and nevi. The combined model design was therefore conceived as a parallel evaluation, whereby the model has the greatest possible amount of information at its disposal.

Due to the cytoplasmic distribution of the MelanA protein, tiles from a higher magnification can potentially be too small to capture all relevant features. Pathologists frequently examine MelanA stains at lower magnifications to evaluate the silhouette and overall architecture of the lesion, which also contain valuable information. Our data could not identify a single best magnification. However, each magnification seems to contain partly different information, as the combination of all four magnifications brings a slight overall improvement, which can be attributed to ensembling.

Contrary to clinical practice, the hierarchical approach did not lead to any improvement on the external datasets. This suggests that an unbiased dataset is preferable for training a DL model, since the network can make better decisions with larger datasets. Interestingly, the uncertain Dresden specimens are lesions with large diameters of 8 mm to 17 mm in which a melanoma has developed in the center of a nevus, with melanoma features smoothly merging into nevus features, which probably confuses the model, as all tiles are weighted equally in our approach. In contrast, the uncertain lesion in Naples is very small, with a diameter of <1.0 mm.

Limitations

Overall, the major limitation of this study is the relatively small sample size of the external test sets. In addition, the above-mentioned variability in the pathological routine as well as the different staining protocols of the respective clinics complicate the comparison of the results and findings. Furthermore, non-negligible label noise must be taken into account: although the labels were histopathologically verified according to the gold standard of care, a high interrater variability must be assumed, as shown in previous studies [3, 35].

Conclusions

With DL analysis of MelanA-stained tissue, we were able to classify melanomas and nevi in two distinct OOD cohorts with similar accuracy as with H&E-stained tissue. The numerically, but not statistically significantly, better classification results achieved by combining the H&E and MelanA classifiers suggest that the combination of these image modalities may lead to improved generalizability and performance. However, these results need to be confirmed in larger studies containing more lesions.

Supporting information

S1 Fig. Representative thumbnails for melanoma, melanoma in-situ, and nevus for all three cohorts.

https://doi.org/10.1371/journal.pone.0297146.s001

(TIF)

S2 Fig. ROC plots by data source site with corresponding AUROC values.

A: Results from Dresden; B: Results from Erlangen; C: Results from Naples. Red: 40× magnification; Blue: 20× magnification; Purple: 10× magnification; Gray: 5× magnification.

https://doi.org/10.1371/journal.pone.0297146.s002

(TIF)

S3 Fig. ROC plot of the hierarchical compared to the combined approach with corresponding AUROC values by data source site.

A: Results from Dresden; B: Results from Erlangen; C: Results from Naples. Black: results of the combined approach using H&E and MelanA for all lesions; Red: hierarchical approach using MelanA-stained tissue only for lesions with uncertain H&E-based predictions.

https://doi.org/10.1371/journal.pone.0297146.s003

(TIF)

S4 Fig. ROC plots by data modality with corresponding AUROC values.

A: Results from Dresden; B: Results from Erlangen; C: Results from Naples. Red: MelanA-based performance taking all magnifications into account; Purple: H&E-based performance; Black: combined model using H&E as well as MelanA by aggregating the individual scores.

https://doi.org/10.1371/journal.pone.0297146.s004

(TIF)

S1 Table. Hyperparameters of all developed models.

https://doi.org/10.1371/journal.pone.0297146.s005

(XLSX)

S2 Table. Additional results derived by using different fusion approaches: dist-opt means weighted by the distance to the individual models' optimal thresholds; dist-05 means weighted by the distance to the default threshold of 0.5; avg denotes fusion by a simple average of all scores; perf means weighted based on the individual models' validation performance, such that better performing models contribute more to the fused result.

https://doi.org/10.1371/journal.pone.0297146.s006

(XLSX)

References

  1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin. 2021 May;71(3):209–49. pmid:33538338
  2. Saginala K, Barsouk A, Aluru JS, Rawla P, Barsouk A. Epidemiology of Melanoma. Med Sci. 2021 Oct 20;9(4):63. pmid:34698235
  3. Elmore JG, Barnhill RL, Elder DE, Longton GM, Pepe MS, Reisch LM, et al. Pathologists’ diagnosis of invasive melanoma and melanocytic proliferations: observer accuracy and reproducibility study. BMJ. 2017 Jun 28;357:j2813. pmid:28659278
  4. Niebling MG, Haydu LE, Karim RZ, Thompson JF, Scolyer RA. Pathology review significantly affects diagnosis and treatment of melanoma patients: an analysis of 5011 patients treated at a melanoma treatment center. Ann Surg Oncol. 2014 Jul;21(7):2245–51. pmid:24748128
  5. Höhn J, Krieghoff-Henning E, Jutzi TB, von Kalle C, Utikal JS, Meier F, et al. Combining CNN-based histologic whole slide image analysis and patient data to improve skin cancer classification. Eur J Cancer Oxf Engl 1990. 2021 May;149:94–101. pmid:33838393
  6. Li M, Abe M, Nakano S, Tsuneki M. Deep Learning Approach to Classify Cutaneous Melanoma in a Whole Slide Image. Cancers. 2023 Mar 22;15(6):1907. pmid:36980793
  7. Brinker TJ, Schmitt M, Krieghoff-Henning EI, Barnhill R, Beltraminelli H, Braun SA, et al. Diagnostic performance of artificial intelligence for histologic melanoma recognition compared to 18 international expert pathologists. J Am Acad Dermatol. 2022 Mar;86(3):640–2. pmid:33581189
  8. Tewary S, Mukhopadhyay S. AutoIHCNet: CNN architecture and decision fusion for automated HER2 scoring. Appl Soft Comput. 2022 Apr;119:108572.
  9. Foersch S, Glasner C, Woerl AC, Eckstein M, Wagner DC, Schulz S, et al. Multistain deep learning for prediction of prognosis and therapy response in colorectal cancer. Nat Med [Internet]. 2023 Jan 9 [cited 2023 Jan 23]; Available from: https://www.nature.com/articles/s41591-022-02134-1 pmid:36624314
  10. Schneider L, Wies C, Krieghoff-Henning EI, Bucher TC, Utikal JS, Schadendorf D, et al. Multimodal integration of image, epigenetic and clinical data to predict BRAF mutation status in melanoma. Eur J Cancer. 2023 Apr;183:131–8. pmid:36854237
  11. Schneider L, Laiouar-Pedari S, Kuntz S, Krieghoff-Henning E, Hekler A, Kather JN, et al. Integration of deep learning-based image analysis and genomic data in cancer pathology: A systematic review. Eur J Cancer. 2022 Jan;160:80–91. pmid:34810047
  12. Kawakami Y, Eliyahu S, Delgado CH, Robbins PF, Rivoltini L, Topalian SL, et al. Cloning of the gene coding for a shared human melanoma antigen recognized by autologous T cells infiltrating into tumor. Proc Natl Acad Sci. 1994 Apr 26;91(9):3515–9. pmid:8170938
  13. Gown AM, Vogel AM, Hoak D, Gough F. Monoclonal Antibodies Specific for Melanocytic Tumors Distinguish Subpopulations of Melanocytes. 1986;9.
  14. Soyer HP. Ki 67 immunostaining in melanocytic skin tumors. Correlation with histologic parameters. J Cutan Pathol. 1991 Aug;18(4):264–72.
  15. Chen YT, Stockert E, Tsang S, Coplan KA, Old LJ. Immunophenotyping of melanomas for tyrosinase: implications for vaccine development. Proc Natl Acad Sci. 1995 Aug 29;92(18):8125–9. pmid:7667256
  16. Cho KH, Hashimoto K, Taniguchi Y, Pietruk T, Zarbo RJ, An T. Immunohistochemical study of melanocytic nevus and malignant melanoma with monoclonal antibodies against s-100 subunits. Cancer. 1990 Aug 15;66(4):765–71. pmid:2201426
  17. Watari K, Tojo A, Nagamura-Inoue T, Nagamura F, Takeshita A, Fukushima T, et al. Identification of a melanoma antigen, PRAME, as a BCR/ABL-inducible gene. FEBS Lett. 2000 Jan 28;466(2–3):367–71. pmid:10682862
  18. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig L, et al. STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. BMJ. 2015 Oct 28;351:h5527. pmid:26511519
  19. Coulie PG, Brichard V, Van Pel A, Wölfel T, Schneider J, Traversari C, et al. A new gene coding for a differentiation antigen recognized by autologous cytolytic T lymphocytes on HLA-A2 melanomas. J Exp Med. 1994 Jul 1;180(1):35–42. pmid:8006593
  20. Kawakami Y, Eliyahu S, Sakaguchi K, Robbins PF, Rivoltini L, Yannelli JR, et al. Identification of the immunodominant peptides of the MART-1 human melanoma antigen recognized by the majority of HLA-A2-restricted tumor infiltrating lymphocytes. J Exp Med. 1994 Jul 1;180(1):347–52. pmid:7516411
  21. Bankhead P, Loughrey MB, Fernández JA, Dombrowski Y, McArt DG, Dunne PD, et al. QuPath: Open source software for digital pathology image analysis. Sci Rep. 2017 Dec 4;7(1):16878. pmid:29203879
  22. He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) [Internet]. Las Vegas, NV, USA: IEEE; 2016 [cited 2023 Jun 6]. p. 770–8. Available from: http://ieeexplore.ieee.org/document/7780459/
  23. Akiba T, Sano S, Yanase T, Ohta T, Koyama M. Optuna: A Next-generation Hyperparameter Optimization Framework [Internet]. arXiv; 2019 [cited 2022 Dec 13]. Available from: http://arxiv.org/abs/1907.10902
  24. Wightman R, Raw N, Soare A, Arora A, Ha C, Reich C, et al. rwightman/pytorch-image-models: v0.8.10dev0 Release [Internet]. Zenodo; 2023 [cited 2023 Nov 20]. Available from: https://zenodo.org/record/4414861
  25. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library [Internet]. arXiv; 2019 [cited 2023 Jun 5]. Available from: http://arxiv.org/abs/1912.01703
  26. Tellez D, Litjens G, Bándi P, Bulten W, Bokhorst JM, Ciompi F, et al. Quantifying the effects of data augmentation and stain color normalization in convolutional neural networks for computational pathology. Med Image Anal. 2019 Dec;58:101544. pmid:31466046
  27. Efron B. Bootstrap Methods: Another Look at the Jackknife. Ann Stat. 1979;7(1):1–26.
  28. Zhou Z, Ren Y, Zhang Z, Guan T, Wang Z, Chen W, et al. Digital histopathological images of biopsy predict response to neoadjuvant chemotherapy for locally advanced gastric cancer. Gastric Cancer. 2023 Sep;26(5):734–42. pmid:37322381
  29. Kulkarni PM, Robinson EJ, Sarin Pradhan J, Gartrell-Corrado RD, Rohr BR, Trager MH, et al. Deep Learning Based on Standard H&E Images of Primary Melanoma Tumors Identifies Patients at Risk for Visceral Recurrence and Death. Clin Cancer Res. 2020 Mar 1;26(5):1126–34.
  30. Wessels F, Schmitt M, Krieghoff-Henning E, Kather JN, Nientiedt M, Kriegmair MC, et al. Deep learning can predict survival directly from histology in clear cell renal cell carcinoma. Huk M, editor. PLOS ONE. 2022 Aug 17;17(8):e0272656. pmid:35976907
  31. Chanda T, Hauser K, Hobelsberger S, Bucher TC, Garcia CN, Wies C, et al. Dermatologist-like explainable AI enhances trust and confidence in diagnosing melanoma. arXiv [Internet]. Available from: https://arxiv.org/abs/2303.12806
  32. Kim SW, Roh J, Park CS. Immunohistochemistry for Pathologists: Protocols, Pitfalls, and Tips. J Pathol Transl Med. 2016 Nov;50(6):411–8. pmid:27809448
  33. Nielsen PS, Riber-Hansen R, Raundahl J, Steiniche T. Automated quantification of MART1-verified Ki67 indices by digital image analysis in melanocytic lesions. Arch Pathol Lab Med. 2012 Jun;136(6):627–34. pmid:22646269
  34. Nielsen PS, Georgsen JB, Vinding MS, Østergaard LR, Steiniche T. Computer-Assisted Annotation of Digital H&E/SOX10 Dual Stains Generates High-Performing Convolutional Neural Network for Calculating Tumor Burden in H&E-Stained Cutaneous Melanoma. Int J Environ Res Public Health. 2022 Nov 2;19(21):14327.
  35. Lodha S, Saggar S, Celebi JT, Silvers DN. Discordance in the histopathologic diagnosis of difficult melanocytic neoplasms in the clinical setting. J Cutan Pathol. 2008 Apr;35(4):349–52. pmid:18333894