Fig 1.
PCA of GSE218007 and GSE237752 datasets before and after batch effect adjustment.
(A) PCA prior to batch effect adjustment. The plot shows a clear separation of samples from the two datasets (GSE218007, red circles; GSE237752, cyan triangles) along the first principal component. (B) PCA following batch effect adjustment. After correction, the samples from both datasets show substantial overlap.
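As a rough illustration of why samples from the two datasets can overlap after adjustment, the sketch below applies the simplest kind of location correction: per-batch mean centering of a single gene. The captions do not name the adjustment method used in the study, so this is an assumed stand-in rather than the authors' procedure, and the expression values are invented for the example.

```python
def center_by_batch(values, batches):
    """Subtract each batch's mean from its samples, then add back the overall mean."""
    overall = sum(values) / len(values)
    sums, counts = {}, {}
    for value, batch in zip(values, batches):
        sums[batch] = sums.get(batch, 0.0) + value
        counts[batch] = counts.get(batch, 0) + 1
    means = {batch: sums[batch] / counts[batch] for batch in sums}
    return [value - means[batch] + overall for value, batch in zip(values, batches)]


# Hypothetical expression values for one gene (illustration only, not study data).
expr = [5.0, 6.0, 7.0, 9.0, 10.0, 11.0]
batch = ["GSE218007"] * 3 + ["GSE237752"] * 3

adjusted = center_by_batch(expr, batch)
print(adjusted)
```

Mean centering removes only a constant shift between batches. It would also remove a genuine biological difference if that difference were confounded with batch, which is one reason dedicated batch-correction methods model known covariates explicitly.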
Fig 2.
The color scale represents normalized expression levels, with red indicating high expression and blue indicating low expression.
Fig 3.
(A) GO enrichment analysis. The bar plots show the significantly enriched terms for the DEGs across the three GO categories: Biological Process (BP), Cellular Component (CC), and Molecular Function (MF). The length of each bar represents the number of genes associated with the term, and the color corresponds to the statistical significance, with red indicating greater significance. (B) GSEA enrichment plots. The plots show representative KEGG pathways that are significantly enriched in the control (left) and keloid (right) groups.
Fig 4.
Machine learning workflow for feature gene selection and model performance evaluation.
(A) Identification of the optimal regularization parameter (λ) using cross-validation. The binomial deviance (red curve) across values of log(λ) is plotted, with the gray band representing the confidence interval. (B) LASSO coefficient shrinkage paths. As regularization intensifies, coefficients for less relevant features are shrunk to zero, with 14 genes retaining non-zero coefficients at the optimal threshold. (C) Random Forest out-of-bag (OOB) error convergence. The plot depicts the decline in OOB error rate as the number of trees increases. (D) Variable importance based on MeanDecreaseGini. Genes are ranked by their importance scores, with longer bars indicating greater contribution to classifying keloid versus normal samples. (E) SVM-RFE feature selection accuracy curve. The plot shows the 10-fold cross-validation (CV) accuracy versus the number of selected features. (F) SVM-RFE cross-validation error profile. The relationship between feature subset size and 10-fold CV classification error is displayed.
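The shrinkage behavior described for panel (B) can be sketched with the soft-thresholding operator, which gives the exact LASSO solution in the idealized case of an orthonormal design. This is a simplified illustration under that assumption, not the fitting procedure used in the study; the gene names and coefficient values are hypothetical.

```python
def soft_threshold(beta, lam):
    """Shrink a coefficient toward zero by lam; return 0.0 once |beta| <= lam."""
    if beta > lam:
        return beta - lam
    if beta < -lam:
        return beta + lam
    return 0.0


def lasso_path(coefs, lambdas):
    """Apply soft-thresholding to every coefficient at each penalty value."""
    return {lam: {gene: soft_threshold(b, lam) for gene, b in coefs.items()}
            for lam in lambdas}


def n_nonzero(coef_map):
    """Count the features that survive the penalty."""
    return sum(1 for b in coef_map.values() if b != 0.0)


# Hypothetical unpenalized coefficients (illustration only, not study data).
unpenalized = {"gene_a": 1.50, "gene_b": -0.80, "gene_c": 0.25, "gene_d": -0.10}

path = lasso_path(unpenalized, [0.0, 0.2, 0.5, 1.0])
for lam, coef_map in path.items():
    print(lam, n_nonzero(coef_map))
```

As the penalty grows, weaker coefficients reach exactly zero first, which is the pattern a coefficient path plot displays. In practice the penalty is chosen by cross-validation, as in panel (A).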
Fig 5.
Identification of consensus biomarker genes for keloid diagnosis.
Consensus feature gene selection by LASSO, Random Forest, and SVM-RFE.
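Consensus selection of this kind amounts to intersecting the gene sets returned by the three methods. The sketch below shows the idea; TNFRSF9 and AMPH are the consensus biomarkers named in the Fig 6 caption, while the `GENE_X*` entries and the composition of each list are hypothetical placeholders, not the study's actual selections.

```python
# Hypothetical per-method selections; only TNFRSF9 and AMPH come from the captions.
lasso_genes = {"TNFRSF9", "AMPH", "GENE_X1", "GENE_X2"}
rf_genes = {"TNFRSF9", "AMPH", "GENE_X2", "GENE_X3"}
svm_rfe_genes = {"TNFRSF9", "AMPH", "GENE_X1", "GENE_X4"}

# Genes selected by all three methods.
consensus = sorted(lasso_genes & rf_genes & svm_rfe_genes)
print(consensus)
```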
Fig 6.
(A) Validation of candidate gene expression in the testing cohort GSE44270 (n = 12; 9 keloid and 3 normal). Box plots comparing the expression levels of the two consensus biomarkers, (left) TNFRSF9 and (right) AMPH, between Keloid (red) and Control (blue) samples. (B) ROC curve for features in the merged training cohort (total n = 35; 26 keloid, 9 normal). (C) ROC curve for features in the testing cohort GSE44270 (n = 12; 9 keloid and 3 normal).
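The area under a ROC curve such as those in panels (B) and (C) equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition. The scores are invented for illustration and do not reproduce any value from the study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical expression scores: 1 = keloid, 0 = normal (illustration only).
scores = [7.1, 6.8, 5.6, 7.4, 5.9, 5.4]
labels = [1, 1, 1, 1, 0, 0]

auc = roc_auc(scores, labels)
print(auc)
```

With only a few negative samples, as in a cohort containing 3 normal samples, the AUC can take only a small number of distinct values, so estimates from such a cohort are coarse.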
Fig 7.
PPI network construction and hub gene identification.
(A) PPI network of DEGs. The network depicts interactions among keloid-related DEGs, with nodes representing proteins and edges representing interactions. (B) Topological analysis of hub genes using CytoHubba. The core hub genes were consistently identified by three distinct centrality algorithms: Closeness, Degree, and Edge Percolated Component (EPC). EPC ranks nodes by the connected subgraphs formed through edge percolation, a process in which edges in a network are probabilistically selected based on a specific criterion, such as a weight or confidence score.
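Two of the centrality measures named in panel (B) can be sketched on a toy undirected network. Degree counts a node's direct neighbors. Closeness is shown here in its classical normalized form, the number of reachable nodes divided by the sum of shortest-path distances, which is an assumption and may differ from the exact formula CytoHubba applies. EPC is omitted because it relies on randomized edge percolation. The node names and edges are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (a, b) edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree_centrality(adj):
    """Number of direct interaction partners per node."""
    return {node: len(neighbors) for node, neighbors in adj.items()}


def closeness_centrality(adj):
    """Reachable node count divided by the sum of BFS shortest-path distances."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


# Hypothetical toy network (placeholder proteins, not interactions from the study).
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P4", "P5")]
adj = build_adjacency(edges)

degree = degree_centrality(adj)
closeness = closeness_centrality(adj)
hub = max(closeness, key=closeness.get)
print(degree, closeness, hub)
```

In this toy network the same node ranks first under both measures, which mirrors the idea of reporting hub genes that several algorithms agree on.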
Fig 8.
Identification of Aloe vera targets for keloid.
Identification of overlapping targets between Aloe vera and keloid.
Fig 9.
Validation of Aloe vera targets for keloid.
Validation of candidate gene expression in an independent cohort. Box plots comparing the expression levels of HAS2, IL6, and TOP2A between keloid and normal skin (Con) samples in the independent validation cohort GSE44270 (n = 12; 9 keloid and 3 normal).
Fig 10.
Molecular docking analysis of quercetin with key keloid-related targets.
The 3D structures (left) show quercetin (green sticks) embedded in the binding pocket of HAS2 or IL6, with calculated binding energies of −8.6 kcal/mol and −7.7 kcal/mol, respectively. The 2D interaction maps (right) detail hydrogen bonds (green dashed lines) and hydrophobic interactions (purple dashed lines) with key residues.