Table 1.
GEO Microarray Chip Information.
Fig 1.
Flowchart for comprehensive analysis of IMRDEGs.
PA: Pediatric asthma, TF: Transcription factor, DEGs: Differentially expressed genes, KEGG: Kyoto Encyclopedia of Genes and Genomes, IMRGs: Iron-metabolism-related genes, GO: Gene ontology, GSEA: Gene Set Enrichment Analysis, IMRDEGs: Iron-metabolism-related differentially expressed genes.
Fig 2.
Batch effect removal in combined GEO datasets.
(A). Box plots showing the distribution of the combined GEO datasets before batch effect removal. (B). Box plots depicting the distribution of the same combined datasets after batch effect removal, illustrating the improved homogeneity across the datasets. (C). PCA plot showing the distribution of the datasets prior to batch effect correction. (D). PCA plot of the integrated combined GEO datasets following batch effect removal, demonstrating the improved clustering and reduced technical variation. In the PCA plots, the PA datasets are represented as follows: GSE27011 (orange), GSE40888 (green), and GSE40732 (purple). PCA: Principal component analysis, PA: Pediatric asthma, GEO: Gene Expression Omnibus.
Fig 3.
Differential gene expression analysis.
(A). Volcano plot of the differentially expressed gene (DEG) analysis between control and PA groups in the combined GEO datasets. (B). Venn diagram of DEGs and iron metabolism-related genes (IMRGs) in integrated GEO datasets (combined datasets). (C). Heatmap of iron-metabolism-related differentially expressed genes (IMRDEGs) sorted by logFC in integrated GEO datasets (combined datasets). Yellow represents the control group, and brown represents the PA group. Red in the heatmap denotes high expression, and blue denotes low expression. Not sig.: Not significant, GEO: Gene Expression Omnibus, logFC: log fold change.
Fig 4.
Correlation analysis of IMRDEGs.
(A). Group comparison plot of expression differences in iron-metabolism-related differentially expressed genes (IMRDEGs) in the combined GEO datasets. (B). Correlation heatmap of the 15 IMRDEGs in integrated GEO datasets (combined datasets). ns indicates no statistical significance (p ≥ 0.05); *p < 0.05; **p < 0.01; ***p < 0.001. An absolute correlation coefficient (r-value) of <0.3 indicated weak or no correlation, whereas an r-value of 0.3–0.5 indicated weak correlation. Brown denotes the PA group, and yellow denotes the control group. Red indicates a positive correlation, and blue represents a negative correlation. PA: Pediatric asthma, IMRDEGs: Iron metabolism-related differentially expressed genes, Cor: Correlation, GEO: Gene Expression Omnibus.
Table 2.
Results of GO and KEGG Enrichment Analysis for IMRDEGs.
Fig 5.
GO and KEGG enrichment analysis for IMRDEGs.
(A). Bubble plot of the gene ontology (GO) and pathway (KEGG) enrichment analysis of iron metabolism-related differentially expressed genes (IMRDEGs): biological process (BP), cellular component (CC), molecular function (MF), and biological pathway (KEGG). The abscissa shows GO and KEGG terms. The bubble size indicates the number of genes, and the bubble color reflects the p-value: a deeper red denotes a smaller p-value, whereas a deeper blue indicates a larger p-value. (B–E). Network diagrams of the GO and KEGG enrichment results for IMRDEGs, showing BP (B), CC (C), MF (D), and KEGG (E). Yellow nodes denote items, green nodes denote molecules, and the lines denote the relationships between items and molecules. A p-value <0.05 and FDR value (q-value) <0.25 served as screening criteria for GO and pathway (KEGG) enrichment analysis. IMRDEGs: Iron metabolism-related differentially expressed genes, CC: Cellular component, KEGG: Kyoto Encyclopedia of Genes and Genomes, GO: Gene Ontology, BP: Biological process, MF: Molecular function, FDR: false discovery rate.
Table 3.
Results of GSEA for Combined Datasets.
Fig 6.
(A). Gene set enrichment analysis (GSEA) mountain plot of biological functions in the integrated Gene Expression Omnibus (GEO) datasets (combined datasets). (B–E). GSEA results showed significant enrichment for all genes in the following pathways: Stambolsky Targets of Mutated TP53 DN (B), Rutella Response to CSF2RB and IL4 DN (C), TGF β signaling pathway (D), and Hinata NFKB targets fibroblast up (E). In the mountain plot, the color represents the adjusted p-value (adj.p-value): darker red indicates a smaller adj.p-value, and darker blue indicates a larger adj.p-value. In the bubble plot, the bubble size represents the gene set size, and the bubble color indicates the adj.p-value, with darker red corresponding to smaller values and darker blue corresponding to larger values. Red in the heatmap denotes high expression, whereas blue denotes low expression. GSEA screening criteria were false discovery rate (FDR) value (q-value) <0.25 and adj.p-value <0.05, with Benjamini-Hochberg (BH) as the p-value correction method.
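The Benjamini-Hochberg correction and the screening thresholds named in this legend can be illustrated with a minimal sketch. The function names `bh_adjust` and `passes_screen` are hypothetical and are not taken from the study's pipeline; the q-value is assumed to be supplied separately by the enrichment tool, and the input p-values are made-up illustration data.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def passes_screen(adj_p, q, adj_p_cutoff=0.05, q_cutoff=0.25):
    """Apply the legend's criteria: adj.p-value < 0.05 and q-value < 0.25."""
    return adj_p < adj_p_cutoff and q < q_cutoff


# Hypothetical raw p-values for four gene sets.
raw = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(raw)
```

In practice the adjusted p-values and q-values would come directly from the GSEA software; the sketch only shows what the BH step computes.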
Fig 7.
(A). The diagnostic value of the 15 IMRDEGs included in the logistic regression model for PA is visualized through a forest plot. The horizontal axis for each gene represents the odds ratio and its 95% confidence interval; the red dots indicate the p-value of the gene, reflecting its statistical significance in the model. (B). The trend of accuracy changes in the SVM model. As the number of genes increases, the accuracy gradually rises, with the best accuracy being observed for four genes (0.6835). (C). Changes in the trend of error rates. As the number of features (genes) increases, the error rate gradually decreases, indicating that the model performs optimally when four genes are selected, with the error rate reaching its lowest point (0.365). (D). Visualization of the LASSO regression model. The error changes of the model under different regularization parameters (λ) are depicted, with the error decreasing as λ increases; the optimal λ value is marked as the one that minimizes the cross-validation error. (E). The coefficient changes of each gene in the LASSO regression model. Different colors represent different genes; as λ increases, the coefficients of each gene gradually decrease, illustrating the feature selection process. This figure demonstrates the importance of the 15 IMRDEGs in the diagnosis of PA and indicates the diagnostic potential of four key genes (C19orf12, IREB2, XK, and GDF15). PA: Pediatric asthma, LASSO: Least Absolute Shrinkage and Selection Operator, IMRDEGs: Iron Metabolism-Related Differentially Expressed Genes, SVM: Support vector machine.
Fig 8.
PA diagnostic and validation analysis.
(A). Nomogram of the model genes in the PA diagnostic model, based on the combined Gene Expression Omnibus (GEO) datasets. (B–C). Calibration curve plot (B) and decision curve analysis (DCA) plot (C) of the pediatric asthma (PA) diagnostic model based on the RiskScore in integrated GEO datasets (combined datasets). (D). ROC curve of RiskScore in the integrated GEO datasets (combined datasets). (E). Comparative charts of model genes in the high-risk and low-risk PA groups. (F–G). ROC curves of model genes C19orf12 and IREB2 (F), XK and GDF15 (G) in the PA group. (H). Boxplot of functional similarity (Friends) analysis outcomes of model genes. The ordinate of the DCA plot represents the net benefit, whereas the abscissa denotes the threshold probability. **p < 0.01; ***p < 0.001. An AUC between 0.7 and 0.9 was taken to indicate moderate accuracy. Pink represents the low-risk group, whereas purple represents the high-risk group. PA: Pediatric asthma, ROC: Receiver operating characteristic, AUC: Area under the curve, DCA: Decision curve analysis.
Fig 9.
Differential gene expression analysis and GSEA according to risk group.
(A–B). Volcano plot (A) and expression heatmap (B) of the differentially expressed gene analysis between high-risk and low-risk groups in the combined Gene Expression Omnibus (GEO) datasets. (C). Mountain plot of four biological functions from gene set enrichment analysis (GSEA) of pediatric asthma (PA) specimens from integrated GEO datasets (combined datasets). (D–G). GSEA results indicated that PA specimens were significantly enriched in Reactome TP53 Regulates Transcription of Caspase Activators and Caspases (D), Plasari TGF-β Targets 10hr DN (E), Rutella Response to Hgf Vs Csf2rb and IL4 Up (F), and Wp Notch Signaling Pathway (G). Pink denotes the low-risk group, whereas purple denotes the high-risk group. In the mountain plot, the color represents the adjusted p-value (adj.p-value): darker red indicates a smaller adj.p-value, and darker blue indicates a larger adj.p-value. In the bubble plot, the bubble size represents the gene set size, and the bubble color indicates the adj.p-value, with darker red shades corresponding to smaller values and darker blue shades corresponding to larger values. Red in the heatmap denotes high expression, whereas blue denotes low expression. GSEA screening criteria were adj.p-value <0.05 and FDR value (q-value) <0.25, with Benjamini-Hochberg (BH) as the p-value correction method.
Table 4.
Results of GSEA for Risk Group.
Fig 10.
Regulatory network of model genes.
(A). mRNA-TF regulatory network of model genes. (B). mRNA-miRNA regulatory network of model genes. Green denotes mRNA, blue denotes TF, and yellow denotes miRNA. TF: Transcription factor.
Fig 11.
Combined dataset immune infiltration analysis by the CIBERSORT algorithm.
(A–B). The proportion of immune cells in integrated Gene Expression Omnibus (GEO) datasets (combined datasets) is shown as a bar graph (A) and group comparison graph (B). (C). Correlation heatmap illustrating relationships among immune cells in integrated GEO datasets (combined datasets). (D). Correlation bubble plots showing the association between immune cell infiltration abundance and model genes in combined GEO datasets. ns indicates no statistical significance (p ≥ 0.05); *p < 0.05; **p < 0.01; ***p < 0.001. An absolute correlation coefficient (r-value) of <0.3 indicated weak or no correlation, 0.3–0.5 indicated weak correlation, 0.5–0.8 indicated moderate correlation, and >0.8 indicated strong correlation. Yellow represents the control group, and brown represents the PA group. Red denotes a positive correlation, whereas blue denotes a negative correlation. Color depth reflects correlation strength. PA: Pediatric asthma.
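The correlation-strength bands stated in this legend can be written as a small lookup, sketched below. The function name `correlation_strength` is hypothetical. The legend does not say which band the boundary values 0.5 and 0.8 belong to, so assigning both to "moderate" here is an assumption.

```python
def correlation_strength(r):
    """Label a correlation coefficient using the legend's |r| bands."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation coefficient must lie in [-1, 1]")
    magnitude = abs(r)
    if magnitude < 0.3:
        return "weak or none"
    if magnitude < 0.5:
        return "weak"
    # Assumption: the boundary values 0.5 and 0.8 are treated as moderate,
    # since the legend reserves "strong" for values above 0.8.
    if magnitude <= 0.8:
        return "moderate"
    return "strong"


# The sign gives the direction (red for positive, blue for negative in the
# figure); only the magnitude determines the strength label.
label = correlation_strength(-0.62)
```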
Fig 12.
Risk group immune infiltration analysis by the CIBERSORT algorithm.
(A). Comparison of immune cells in low- and high-risk PA groups. (B–C). Heatmap of the correlation between immune cells in the low-risk (B) and high-risk (C) groups of pediatric asthma (PA) specimens. (D–E). Bubble plot of the correlation between immune cell infiltration abundance and model genes in the low-risk (D) and high-risk (E) groups of PA specimens. An absolute correlation coefficient (r-value) of <0.3 indicated weak or no correlation, 0.3–0.5 indicated weak correlation, 0.5–0.8 indicated moderate correlation, and >0.8 indicated strong correlation. Pink indicates the low-risk group, whereas purple indicates the high-risk group. Red denotes a positive correlation, whereas blue denotes a negative correlation. Color depth reflects correlation strength.
Fig 13.
qRT-PCR validation of biomarkers in clinical samples.
(A–D). Relative mRNA expression levels of C19orf12, GDF15, XK, and IREB2 were measured in peripheral blood samples obtained from patients with PA (n = 12) and controls (n = 8). Data are presented as mean ± standard error of the mean of relative quantification values calculated using the 2^−ΔΔCt method with β-actin as the internal reference. Statistical significance was assessed by unpaired Student’s t-test. *p < 0.05, **p < 0.01, ***p < 0.001.
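For reference, the 2^−ΔΔCt calculation named above can be written out as follows. This is the standard form of the method. Using the mean ΔCt of the control group as the calibrator is an assumption, since the legend does not state the calibrator, and the numbers in the worked example are hypothetical.

```latex
\begin{align}
\Delta Ct &= Ct_{\text{target}} - Ct_{\beta\text{-actin}} \\
\Delta\Delta Ct &= \Delta Ct_{\text{sample}} - \overline{\Delta Ct}_{\text{control}} \\
\text{Relative expression} &= 2^{-\Delta\Delta Ct}
\end{align}
% Worked example with hypothetical values:
% Ct_target = 24, Ct_beta-actin = 18, so Delta Ct = 6;
% mean control Delta Ct = 7, so Delta Delta Ct = 6 - 7 = -1;
% relative expression = 2^{1} = 2, i.e. twofold higher than the control mean.
```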