Annotation-free prediction of immunotherapy response in melanoma using single-cell transcriptomic data
Fig 2
Gene-based classification model using scRNA-seq data for immunotherapy response.
(A) Schematic overview of classification model building using differentially expressed genes (DEGs). The dataset (n = 16,290; baseline and post-treatment) was split into 80% training and 20% test sets. Several models (XGBoost, Random Forest, Logistic Regression, SVM, FNN, and 1D CNN) were trained on the training set. Hyperparameters were optimized through cross-validation, and the best parameters were used to retrain the models. The retrained models were then evaluated on the test set to obtain final performance metrics. (B) ROC curves of the classification models. XGBoost achieved the highest area under the curve (AUC = 0.87), followed by Random Forest (AUC = 0.86) and SVM (AUC = 0.86). (C) Scatter plot comparing feature importance rankings from XGBoost and Random Forest. A subset of 29 genes was consistently identified as important for classification. (D) To assess the reproducibility of the predictive model, we permuted the dataset 100 times and measured the AUC of the model; the results were similar to those of the original test (Spearman correlation test). (E) Dot plot showing expression levels of the 29 genes in responders and non-responders (related to Fig 2C). (F) Differences in CCR7 and MTRNR2L2 expression between responders and non-responders in baseline and post-treatment samples, respectively (Wilcoxon rank-sum test). %Exp indicates the percentage of cells expressing the gene.
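As an illustration of the evaluation steps in panels A and B, the following minimal Python sketch shows an 80/20 split of sample indices and a rank-based ROC AUC computation. It is not the authors' code: the function names `train_test_split_indices` and `roc_auc`, the fixed random seed, and the toy labels and scores are assumptions made for illustration only. In practice, a library such as scikit-learn would typically be used for both steps.

```python
import random


def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle the indices 0..n-1 and split them into train and test lists."""
    rng = random.Random(seed)  # seed is an arbitrary choice, for repeatability
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def roc_auc(labels, scores):
    """ROC AUC from the rank-sum (Mann-Whitney U) statistic.

    labels: 0/1 integers (1 = positive class, e.g. responder).
    scores: predicted scores; tied scores receive their average rank.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the current group of tied scores.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC requires at least one sample of each class")

    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


# Split sizes for a dataset of the size given in panel A (n = 16,290).
train_idx, test_idx = train_test_split_indices(16290)
print(len(train_idx), len(test_idx))

# Toy labels and scores (hypothetical values, not data from the study).
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The rank-based form is used here because it equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, which is the quantity summarized by the area under each ROC curve.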