Abstract
Recurrent miscarriage (RM) is a distressing reproductive condition affecting approximately 1–3% of couples. The contribution of oxidative stress to RM remains largely unclear and requires further research. This study used bioinformatics approaches to identify oxidative stress-responsive genes that are differentially expressed in RM and to elucidate their potential involvement in the disorder's etiology. Analysis of data retrieved from the Gene Expression Omnibus (GEO) identified 18 oxidative stress-responsive differentially expressed genes (OSRDEGs). Bioinformatics analyses elucidated the biological processes and molecular functions associated with these genes and identified six hub genes (ARRB2, BMF, SORCS2, STK3, UCN2, and VIPR1) as potential biomarkers for RM. Gene set enrichment analysis (GSEA) revealed significant enrichment between the RM and control groups in pathways such as hypoxia, epithelial‒mesenchymal transition (EMT) in breast tumors, upregulation of Wnt signaling in liver cancer progenitors, and TGFβ-induced EMT. The mRNA‒miRNA interaction network analysis revealed five hub genes interacting with 41 miRNAs, with STK3 exhibiting the highest connectivity. Additionally, immune cell infiltration analysis demonstrated a substantial inverse relationship between ARRB2 and the levels of plasma cells and neutrophils. Conversely, UCN2 was positively associated with CD8 T cells and inversely associated with monocytes. Furthermore, immunohistochemical (IHC) analysis of endometrial tissues from RM patients and matched controls confirmed significantly elevated protein expression of the six hub genes in the RM group, consistent with the bioinformatics findings. This study establishes a diagnostic model and provides insights into immune-modulation therapies for RM.
Citation: Wang M, Zhu L, Cheng X (2025) Identification of oxidative stress-responsive genes in recurrent miscarriage and their role in disease pathogenesis. PLoS One 20(12): e0337362. https://doi.org/10.1371/journal.pone.0337362
Editor: Gary S. Stein, University of Vermont, UNITED STATES OF AMERICA
Received: January 24, 2025; Accepted: November 6, 2025; Published: December 2, 2025
Copyright: © 2025 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All data relevant to the study are included in the article or available as supplemental information. Gene expression data are available in the GEO repository (https://www.ncbi.nlm.nih.gov/geo/) under accession numbers GSE22490, GSE26787, and GSE165004.
Funding: This work was supported by the Municipal Health Commission of Nantong (No. MSZ2023057). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Recurrent miscarriage (RM) refers to the phenomenon of experiencing two or more sequential pregnancy losses before the 20th week of gestation, impacting approximately 1–3% of couples attempting to conceive [1,2]. Despite its common occurrence, the underlying causes of RM are not fully understood, and many cases remain unexplained, with no definitive solution currently available [3]. Therefore, there is an urgent need for improved diagnostic tools and treatment strategies to address these complex reproductive health issues.
Oxidative stress (OS) occurs when the production of free radicals exceeds the body’s antioxidant capacity to neutralize them [4]. Research has shown that OS can lead to endothelial damage, compromised placental vasculature, immune dysfunction, and negative pregnancy outcomes [5]. It has also been implicated in various pregnancy-related disorders, including RM, preeclampsia, fetal growth restriction, preterm labor, and preterm premature rupture of membranes [6]. Most knowledge regarding the role of OS in pregnancy comes from animal studies. Ishii et al. [7] demonstrated that oxidative stress in the mitochondria could cause placental inflammation and halt embryonic development in mice. These findings suggest that reducing OS could be a potential therapeutic strategy for treating spontaneous RM.
Additionally, Wang et al. [8] used bioinformatic analysis to identify differentially expressed genes in endometrial samples from the RM and control groups, thereby pinpointing potential biomarkers for RM. However, previous studies have not explored the differential expression of genes related to RM and OS.
In this study, we retrieved the GSE22490 and GSE26787 datasets from the Gene Expression Omnibus (GEO) database [9] and merged them into a combined dataset. The combined dataset served as the test set, and GSE165004 served as the validation set. We comprehensively analyzed the oxidative stress-responsive differentially expressed genes (OSRDEGs) in RM by integrating data from multiple gene expression datasets. We examined the differences in gene expression specific to RM and conducted functional, pathway, and gene set enrichment analyses. Furthermore, we applied machine learning algorithms to build diagnostic models. In addition, we analyzed immune cell infiltration in RM samples to elucidate the interplay between OS and the immune response in this disease. This integrated approach is expected to provide a novel understanding of the molecular underpinnings of RM and establish a foundation for innovative diagnostic and therapeutic approaches.
Materials and methods
Data download
We retrieved the datasets GSE22490 [10], GSE26787 [10], and GSE165004 [10] from patients with RM from the GEO database [9] via GEOquery [11]. All datasets were confirmed to be derived from Homo sapiens. The GSE22490 dataset consists of microarray gene expression profiles of the placenta, totaling 10 samples. Samples with > 12 gestational cycles were excluded. The resulting dataset included three RM samples (RM group) and five normal samples (control group). These samples were analyzed on the Affymetrix Human Genome U133 Plus 2.0 Array (platform GPL570 [HG-U133_Plus_2]). The GSE26787 dataset consists of microarray gene expression profiles of the endometrium, totaling 15 samples, of which five RM samples and five normal samples were included. These 10 samples were also analyzed on the Affymetrix Human Genome U133 Plus 2.0 Array (platform GPL570 [HG-U133_Plus_2]). The GSE165004 dataset consists of microarray gene expression profiles of the endometrium, totaling 72 samples, of which 24 RM and 24 normal samples were included. These 48 samples were analyzed on the Agilent-039494 SurePrint G3 Human GE v2 8 × 60K Microarray 039381 (platform GPL16699). The GSE22490 and GSE26787 datasets were merged to form the test cohort, and the GSE165004 dataset served as the validation cohort for further analysis. The specific dataset information is presented in Table 1.
We assembled a collection of oxidative stress-responsive genes (OSRGs) from three sources. First, we searched the GeneCards database, a repository offering extensive information on human genes [12], with the keyword “oxidative stress” and obtained 1558 OSRGs. Second, we searched the Molecular Signatures Database (MSigDB) [13] with the same keyword and identified 120 OSRGs from the GOBP Cell Death In Response To Oxidative Stress and BioCarta ARENRF2 Pathway reference gene sets. Third, 279 OSRGs were obtained from the literature [14]. Finally, the OSRGs from the three sources were merged and intersected with the genes present in the combined datasets and the GSE165004 dataset, yielding 1571 OSRGs for subsequent analysis. The gene names are listed in S1 Table.
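The merge-then-intersect step described above can be sketched with plain set operations. This is a minimal illustration in Python, not the authors' R workflow; the function name `build_osrg_set` and the toy gene lists are hypothetical, and the assumption that "intersected" means "kept only if measured in every expression dataset" is ours.

```python
def build_osrg_set(sources, dataset_gene_lists):
    """Union the candidate gene lists, then keep genes measured in every dataset."""
    candidates = set().union(*sources)
    measured = set.intersection(*(set(genes) for genes in dataset_gene_lists))
    return candidates & measured

# Toy symbols only; these are NOT the real GeneCards/MSigDB/literature lists.
genecards = {"STK3", "IL18", "CD40"}
msigdb = {"STK3", "NFE2L2"}
literature = {"TREM2", "IL18"}
combined_genes = ["STK3", "IL18", "TREM2", "CD40", "ACTB"]
validation_genes = ["STK3", "IL18", "TREM2", "GAPDH"]

osrgs = build_osrg_set([genecards, msigdb, literature],
                       [combined_genes, validation_genes])
print(sorted(osrgs))
```

Taking the union first removes duplicates across sources, which is why the final count can be far smaller than the sum of the three source sizes.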
Differentially expressed genes correlated with oxidative stress
Initially, we employed the sva R package [15] to perform batch correction on the GSE22490 and GSE26787 datasets, forming a combined dataset containing eight RM samples and ten control samples. Principal component analysis (PCA) [16], which reduces data dimensions by extracting feature vectors from high-dimensional data and mapping them into a lower-dimensional space for display in two- or three-dimensional graphs, was used to visualize the samples before and after correction.
To identify differentially expressed genes (DEGs) between the RM and control groups, we utilized the limma package [17] for differential expression analysis of the combined dataset. We set a threshold of |logFC| > 0.5 and P.adjust < 0.05 to identify DEGs, defining upregulated genes as those with logFC > 0.5 and P.adjust < 0.05 and downregulated genes as those with logFC < −0.5 and P.adjust < 0.05.
We then intersected the DEGs meeting |logFC| > 0.5 and P.adjust < 0.05 with the 1571 OSRGs and constructed a Venn diagram to identify the OSRDEGs. Subsequently, volcano plots and a differential ranking map of the differential expression results were constructed via the ggplot2 R package [18], and a heatmap [19] was drawn via the pheatmap R package.
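The threshold rule stated above can be written as a small classifier. The sketch below is in Python for illustration only (the study used limma in R); `classify_gene` is a hypothetical helper, and only the cutoffs 0.5 and 0.05 come from the text.

```python
def classify_gene(log_fc, p_adjust, fc_cutoff=0.5, p_cutoff=0.05):
    """Label one gene as 'up', 'down', or 'ns' (not significant)."""
    if p_adjust < p_cutoff and log_fc > fc_cutoff:
        return "up"
    if p_adjust < p_cutoff and log_fc < -fc_cutoff:
        return "down"
    return "ns"

# Made-up (logFC, adjusted P) pairs, purely illustrative.
toy_results = {"geneA": (0.9, 0.01), "geneB": (-0.7, 0.03), "geneC": (1.2, 0.20)}
labels = {gene: classify_gene(fc, p) for gene, (fc, p) in toy_results.items()}
print(labels)
```

Note that the downregulated branch compares against the negative cutoff, so a gene with logFC between −0.5 and 0.5 is never called a DEG regardless of its P value.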
Functional enrichment analysis (GO) and pathway enrichment analysis (KEGG) between the RM group and the control group
Gene Ontology (GO) [20] serves as a standard method for conducting large-scale functional classification studies and includes the examination of biological processes (BP), molecular functions (MF), and cellular components (CC). The Kyoto Encyclopedia of Genes and Genomes (KEGG) [21] is an extensive database that archives data on genomes, biological pathways, diseases, and pharmaceuticals. Using the R package clusterProfiler [22], we performed GO and KEGG pathway analyses of the OSRDEGs between the RM and control groups within the combined dataset. P values were corrected via the Benjamini‒Hochberg (BH) approach, and terms with P.adjust < 0.05 and FDR value (q value) < 0.25 were deemed statistically significant.
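Over-representation analyses of this kind are commonly based on a hypergeometric test followed by multiple-testing correction. The following Python sketch shows both steps under that assumption; it is not the clusterProfiler implementation, the function names are hypothetical, and the separately reported q value is not reproduced here.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): n query genes drawn from N background genes, K of them in the term."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Toy example: 3 of 18 query genes fall in a 50-gene term, background of 1571 genes.
p_term = hypergeom_pvalue(3, 18, 50, 1571)
print(p_term, bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

The background size and term size in the usage line are illustrative numbers, not values reported in the study.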
GSEA of the RM and control groups
Gene set enrichment analysis (GSEA) [23] determines the contribution of a predefined gene set to a phenotype by examining the distribution of its members in a gene list ranked by association with that phenotype. The genes in the combined dataset were first ranked by their association with the phenotype (RM versus control). Next, we employed the clusterProfiler package to perform enrichment analysis on all genes between the RM and control groups. For the GSEA, the following parameters were specified: a seed value of 2022, 1000 permutations, a minimum gene set size of 10, a maximum gene set size of 500, and BH correction for P values. We obtained the c2.Cp.All.V2022.1.Hs.Symbols GMT gene set from the MSigDB repository as a reference. The thresholds for considering enrichment as significant were set at P.adjust < 0.05 and a q value < 0.05.
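The core GSEA statistic, the enrichment score, is the maximum deviation of a weighted running sum over the ranked gene list. The Python sketch below illustrates that statistic only; it omits the permutation step and normalization, and the function name, the `weight` exponent default, and the toy data are assumptions rather than details from the study.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Signed maximum deviation of the weighted running sum (classic GSEA statistic)."""
    hits = [gene in gene_set for gene in ranked_genes]
    hit_weights = [abs(s) ** weight if h else 0.0 for s, h in zip(scores, hits)]
    total_hit = sum(hit_weights)
    n_miss = len(ranked_genes) - sum(hits)
    if total_hit == 0 or n_miss == 0:
        return 0.0  # the set is absent from, or identical to, the ranked list
    running, best = 0.0, 0.0
    for w, h in zip(hit_weights, hits):
        # Step up on a set member (weighted), step down on a non-member.
        running += w / total_hit if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Toy ranked list (descending score); names are placeholders.
genes = ["A", "B", "C", "D"]
stats = [2.0, 1.0, -1.0, -2.0]
print(enrichment_score(genes, stats, {"A", "B"}))
print(enrichment_score(genes, stats, {"C", "D"}))
```

A positive score indicates that the set clusters at the top of the ranking, and a negative score that it clusters at the bottom.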
Constructing a diagnostic model for RM
Initially, we applied the R package glmnet [24] to the OSRDEGs with the random seed set to 500. OSRDEGs were selected via least absolute shrinkage and selection operator (LASSO) regression, which extends linear regression with a penalty term, lambda multiplied by the sum of the absolute values of the coefficients, to reduce overfitting and improve predictive accuracy. The outcomes of the LASSO regression were depicted via a diagnostic model diagram and a variable trajectory diagram.
We developed a support vector machine (SVM) [25] model utilizing the OSRDEGs from the combined dataset, retaining the gene subset with the highest accuracy and lowest error rate. The genes selected by the LASSO regression and SVM models were compared, and a Venn diagram was used to identify the overlapping OSRDEGs. Additionally, the RCircos package [26] was used to visualize the chromosomal distribution of the genes.
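For reference, the penalized objective described above can be written out explicitly. The form below is the squared-error version, with n samples, p candidate genes, outcome y_i, expression vector x_i, and tuning parameter lambda; the notation is ours. For a binary outcome such as RM versus control, the squared-error term is usually replaced by the negative binomial log-likelihood, and the text does not state which form was used.

```latex
\hat{\beta} \;=\; \arg\min_{\beta_0,\,\beta}\;
\frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^{2}
\;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

Because the penalty is on absolute values, sufficiently large lambda drives some coefficients exactly to zero, which is what makes LASSO usable as a gene selection step.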
Genes resulting from the overlap between the LASSO and SVM analyses were identified as hub genes for subsequent studies. The hub genes are shown in Table 2.
Semantic comparison within the GO annotation framework offers a numerical approach for assessing gene similarity, a critical foundation for numerous bioinformatics methodologies. Semantic similarity among hub genes was determined via the GOSemSim package [27], and the geometric mean of the BP, CC, and MF similarities was computed as the final score. The results of the functional similarity analysis were visualized via the ggplot2 package.
To derive the OSRDEG diagnostic model within the combined dataset, we subsequently conducted logistic regression analysis on the hub genes, with the binary outcome (RM or control) as the dependent variable and hub gene expression as the independent variables. The RM samples were divided into a high predictive value group (High) and a low predictive value group (Low) on the basis of the median predictive value derived from the logistic regression. In addition, a nomogram [28], a graphical tool that represents the functional relationship between multiple independent variables as a set of disjoint line segments in a rectangular coordinate system, was generated via the rms package to indicate the relationships between hub genes within the logistic regression model.
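The combination of the three ontology-specific similarities into one score is a geometric mean. A minimal Python sketch follows; the function name and the input values are hypothetical, and computing the per-ontology similarities themselves (done with GOSemSim in the study) is outside its scope.

```python
def combined_similarity(bp, cc, mf):
    """Geometric mean of the BP, CC, and MF semantic similarity scores."""
    return (bp * cc * mf) ** (1.0 / 3.0)

# Made-up similarities for one gene pair.
print(combined_similarity(0.6, 0.4, 0.9))
```

Unlike an arithmetic mean, the geometric mean is pulled strongly toward zero when any one ontology shows little similarity.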
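The two operations in this paragraph, computing a predicted value from a fitted logistic model and splitting samples at the median, can be sketched as follows. This is illustrative Python, not the authors' R code; the coefficients and expression values are made up, and assigning samples exactly at the median to the Low group is our assumption.

```python
from math import exp
from statistics import median

def predicted_probability(expression, coefficients, intercept):
    """Logistic model output for one sample: 1 / (1 + exp(-(b0 + sum(b_g * x_g))))."""
    linear = intercept + sum(coefficients[g] * expression[g] for g in coefficients)
    return 1.0 / (1.0 + exp(-linear))

def split_by_median(scores):
    """Label each sample 'High' if its score exceeds the median, else 'Low'."""
    cutoff = median(scores.values())
    return {sample: ("High" if value > cutoff else "Low")
            for sample, value in scores.items()}

# Hypothetical coefficients and expression values, for illustration only.
coefs = {"STK3": 1.5, "BMF": -0.8}
sample = {"STK3": 2.0, "BMF": 1.0}
print(predicted_probability(sample, coefs, intercept=-1.0))
print(split_by_median({"s1": 0.1, "s2": 0.4, "s3": 0.8, "s4": 0.9}))
```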
Finally, a calibration analysis was conducted to produce a calibration curve, which served to evaluate the precision and discriminatory power of the logistic regression model developed from the hub genes. Decision curve analysis (DCA) [29] is a straightforward approach for appraising clinical forecasting models, diagnostic assays, and molecular markers. The ggDCA R package was used to generate a DCA plot, which facilitated the assessment of the model’s accuracy and discrimination.
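Decision curve analysis plots net benefit against the threshold probability. The standard net benefit formula is TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold. The Python sketch below evaluates it at a single threshold; it is not the ggDCA implementation, and the function name and toy predictions are hypothetical.

```python
def net_benefit(probabilities, labels, threshold):
    """Net benefit of treating every sample whose predicted risk is >= threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probabilities, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probabilities, labels) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))

# Toy predictions and outcomes (1 = RM, 0 = control).
probs = [0.9, 0.8, 0.3, 0.2]
truth = [1, 0, 1, 0]
print(net_benefit(probs, truth, 0.5))
```

Repeating the calculation over a grid of thresholds and comparing against the "treat all" and "treat none" strategies gives the usual decision curve.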
Correlation analysis of the hub genes
We employed the Spearman method to assess the correlation between the hub gene expression levels, and the findings were depicted via a correlation heatmap.
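Spearman's coefficient is the Pearson correlation computed on ranks, with tied values given their average rank. A dependency-free Python sketch is shown below for illustration; the function names are ours, and the code assumes neither input is constant (otherwise the denominator is zero).

```python
def average_ranks(values):
    """1-based ranks, with ties assigned the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Toy expression vectors for two genes across four samples.
print(spearman([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]))
```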
GSEA between the high and low groups
Initially, we ranked the genes within the combined dataset on the basis of their association with the High and Low groups. The clusterProfiler package was subsequently used to perform enrichment analysis on all genes between the High and Low groups. For the GSEA parameters, we set the seed to 2022, used 1000 permutations, limited gene sets to a minimum of 10 and a maximum of 500 genes, and applied the BH method for P value correction. Our reference gene set was the c2.Cp.All.V2022.1.Hs.Symbols GMT collection from the MSigDB database, with enrichment significance defined as P.adjust < 0.05 and a q value < 0.05.
Construction of mRNA‒miRNA and mRNA‒transcription factor networks
The miRDB database [30] (http://mirdb.org) is an online tool for predicting functional miRNA targets. We used the miRDB database to predict interactions between miRNAs and hub genes and subsequently constructed an mRNA‒miRNA network via Cytoscape software.
The ChIPBase database (version 2.0; https://rna.sysu.edu.cn/chipbase/) [31] uses ChIP-seq data of DNA-binding proteins to identify binding motifs and binding sites, and it predicts a large number of regulatory relationships between transcription factors (TFs) and their target genes. The hTFtarget database [32] (http://bioinfo.life.hust.edu.cn/hTFtarget) is a comprehensive resource on human TFs and their regulatory targets. We searched the ChIPBase and hTFtarget databases to identify TFs that bind to the hub genes and depicted their interactions via Cytoscape software.
Analysis of immune cell infiltration (CIBERSORTx)
To assess the distribution of various immune cell types within RM samples, we used the analytical tool CIBERSORTx [33]. We uploaded the gene expression data to the CIBERSORTx online resource (http://cibersortx.stanford.edu/) and combined them with the LM22 signature gene matrix to obtain a matrix indicating the extent of immune cell infiltration. A filter was then applied to retain only data with immune cell enrichment scores greater than zero, yielding the final immune cell infiltration matrix. The differences in immune cell infiltration between the High and Low groups of RM samples, as defined by the logistic regression model, were displayed via stacked bar charts. The correlations among the different immune cells within the RM samples were evaluated via Spearman’s correlation coefficient and displayed via the ggplot2 package in R. By integrating the gene expression data from the combined dataset, we calculated the Spearman correlations between immune cells and hub genes in the two groups and visualized them in a heatmap.
Immunohistochemical detection of hub proteins
Endometrial tissue samples were obtained via hysteroscopic biopsy during the luteal phase from three patients diagnosed with RM and three matched control subjects. All tissue samples were fixed in formaldehyde and embedded in paraffin for sectioning. Following deparaffinization and rehydration, antigen retrieval was conducted on the paraffin-embedded tissue sections via citrate buffer (pH 6.0). Endogenous peroxidase activity was inhibited by incubation with 3% H₂O₂, and nonspecific binding was blocked with normal goat serum. The sections were incubated overnight at 4°C with primary antibodies, including BMF (Abcam, UK) as well as SORCS2, STK3, VIPR1, UCN2, and ARRB2 (all from Proteintech, USA). This was followed by incubation with an HRP-conjugated secondary antibody at room temperature for 30 minutes. Immunoreactivity was visualized using diaminobenzidine (DAB), and the nuclei were counterstained with hematoxylin. Two blinded pathologists independently evaluated all the slides via a semiquantitative H-score system. The scoring criteria were defined as follows: three points for strong positivity, two points for moderate positivity, one point for weak positivity, and zero points for negative staining.
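The intensity points listed above are the inputs to an H-score. In its conventional form, the H-score weights the percentage of cells at each staining intensity by that intensity's points, giving a value between 0 and 300. The Python sketch below implements that conventional definition; the text does not spell out how percentages were combined, so this weighting is an assumption, and the function name and inputs are hypothetical.

```python
def h_score(pct_weak, pct_moderate, pct_strong):
    """Conventional H-score: 1 x %weak + 2 x %moderate + 3 x %strong (range 0-300)."""
    percentages = (pct_weak, pct_moderate, pct_strong)
    if any(p < 0 for p in percentages) or sum(percentages) > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    # Negative cells contribute zero points, so they are not passed in.
    return 1 * pct_weak + 2 * pct_moderate + 3 * pct_strong

# Made-up slide: 20% weak, 30% moderate, 10% strong, 40% negative.
print(h_score(20, 30, 10))
```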
The study protocol was approved by the Institutional Research Ethics Committee of Affiliated Maternity and Child Health Care Hospital of Nantong University. All procedures were conducted in accordance with the ethical principles outlined in the Declaration of Helsinki.
Data analysis and interpretation
All statistical analyses in this study were performed with R software version 4.3.1. For continuous variables that followed a normal distribution, we used Student’s t test to evaluate the statistical significance of differences between two groups. For variables that were not normally distributed, we used the Mann‒Whitney U test, a nonparametric alternative. For comparisons involving three or more groups, we used the Kruskal‒Wallis test. We employed either the chi-square test or Fisher’s exact test to compare categorical variables between two groups. Spearman’s rank correlation was used to determine the correlation coefficients between molecules. Unless stated otherwise, all P values were two-tailed, with P < 0.05 considered statistically significant.
Results
Technology roadmap
To systematically investigate the role of OSRGs in RM, we designed a comprehensive analytical workflow (Fig 1). Our approach began with the retrieval and processing of gene expression data from multiple GEO datasets, followed by the identification of DEGs and OSRGs. Through the intersection of these gene sets, we identified OSRDEGs. We subsequently conducted extensive bioinformatics analyses, including GO and KEGG pathway enrichment, GSEA, and the construction of diagnostic models via LASSO regression and SVM. To further elucidate the molecular mechanisms underlying RM, we conducted an integrated analysis of transcription factor regulatory networks and immune cell infiltration. Finally, immunohistochemical (IHC) analysis was performed to validate the protein-level expression of the key candidate genes, thereby providing experimental corroboration for our bioinformatic predictions.
DEGs, differentially expressed genes. OSRGs, oxidative stress-responsive genes. OSRDEGs, oxidative stress-responsive differentially expressed genes. GO, Gene Ontology. KEGG, Kyoto Encyclopedia of Genes and Genomes. GSEA, Gene Set Enrichment Analysis. LASSO, least absolute shrinkage and selection operator. SVM, support vector machine. TF, transcription factor. ROC, receiver operating characteristic. IHC, immunohistochemistry. RM, recurrent miscarriage.
Data collection and correction
Initially, we employed the sva R package to remove batch effects from the GSE22490 and GSE26787 datasets, yielding a combined dataset. Distribution boxplots (Fig 2A and 2B) and PCA plots (Fig 2C and 2D) were generated to compare the datasets before and after batch effect correction. Both the boxplots and the PCA plots confirmed that the batch effect was substantially diminished in the combined dataset after correction.
A. Distribution boxplot of the dataset before batch effect correction. B. Distribution boxplot of the dataset after batch effect correction. C. PCA plot of the dataset before batch effect correction. D. PCA plot of the dataset after batch effect correction. PCA, principal component analysis.
Oxidative stress-responsive differentially expressed gene analysis
To identify the OSRDEGs, we divided the samples of the combined dataset into RM and control groups and performed differential expression analysis. The analysis identified 290 DEGs that met the predefined thresholds of |logFC| > 0.5 and P.adjust < 0.05. Among these genes, 151 were upregulated in the RM group relative to the control group (logFC > 0.5 and P.adjust < 0.05), and 139 were downregulated (logFC < −0.5 and P.adjust < 0.05). A volcano plot was constructed to visualize the results of the differential analysis (Fig 3A). Intersecting the 290 DEGs with the pool of 1571 OSRGs yielded 18 OSRDEGs (ADM2, AOC3, APLN, ARRB2, BMF, CD40, GADD45G, GPT, IKBKE, IL18, MMACHC, MSRB3, NPPA, SORCS2, STK3, TREM2, UCN2, and VIPR1), as shown in the Venn diagram (Fig 3B). On the basis of this intersection, we assessed the expression differences of the OSRDEGs between the RM and control groups in the combined dataset and illustrated them with a heatmap (Fig 3D) and a group comparison plot (Fig 3F).
In parallel, the expression of the OSRDEGs was compared between the RM and control samples of the GSE165004 dataset and depicted in a heatmap (Fig 3E) and a group comparison plot (Fig 3G). The genes GADD45G, IKBKE, MMACHC, and MSRB3 differed significantly between groups in both the combined and GSE165004 datasets. Finally, the chromosomal positions of the 18 OSRDEGs were examined via the RCircos R package and visualized in a chromosome distribution map (Fig 3C). Chromosome mapping revealed that the OSRDEGs were located on chromosomes 1, 3, 4, 6, 8, 9, 11, 12, 15, 17, 20, 22, and X.
A. Volcano plot of DEGs between the RM and control groups in the combined dataset; the OSRDEGs are highlighted as marker genes. B. Venn diagram illustrating the intersection of DEGs and OSRGs. C. Chromosomal mapping of OSRDEGs. D. Heatmap of OSRDEGs between the RM and control groups in the combined dataset. E. Heatmap of OSRDEGs in the GSE165004 dataset between the RM and control groups. F. Group comparison plot of OSRDEGs expression between RM and control groups in the combined dataset. G. Group comparison plot of OSRDEGs expression between RM and control groups in the GSE165004 dataset. Asterisks indicate statistical significance: ns, P ≥ 0.05 (not significant); *, P < 0.05; **, P < 0.01; ***, P < 0.001. DEGs, differentially expressed genes. OSRGs, oxidative stress-responsive genes. OSRDEGs, differentially expressed oxidative stress-responsive genes. RM, recurrent miscarriage.
GO and KEGG profiling of OSRDEGs in the RM and control groups
To elucidate the BPs, MFs, and CCs of the 18 OSRDEGs in the RM and control groups, we performed GO enrichment analysis to characterize the roles of these genes and KEGG pathway analysis to chart the pathways linked to them. Fig 4A presents the results of the GO and KEGG enrichment analyses as bar graphs. The network diagram in Fig 4B shows the associations between the OSRDEGs and the enriched terms; the lines denote the relationships between molecules and terms, and the size of each node corresponds to the number of molecules that the term contains. Using the thresholds of P.adjust < 0.05 and q value < 0.25 for statistical significance, the analysis revealed that the 18 OSRDEGs were enriched mainly in BPs such as positive regulation of heart contraction, positive regulation of blood circulation, negative regulation of blood pressure, and lymphocyte-mediated immunity; in CCs such as perikaryon, CD40 receptor complex, dendritic spine, and neuron spine; and in MFs such as hormone activity, neuropeptide receptor binding, receptor ligand activity, and signaling receptor activator activity (Table 3).
A. Bar graph of the GO and KEGG analysis results of OSRDEGs. B. Network diagram of the GO and KEGG analysis results of OSRDEGs. C. Circle diagram of the GO and KEGG analysis of OSRDEGs combined with logFC results. D. Chord graph of the GO and KEGG analysis of OSRDEGs combined with logFC results. OSRDEGs, differentially expressed oxidative stress- responsive genes. GO, Gene Ontology. BP, biological process. CC, cellular component. MF, molecular function. KEGG, Kyoto Encyclopedia of Genes and Genomes. RM, recurrent miscarriage.
We subsequently performed GO and KEGG enrichment analyses combined with the logFC values of the 18 OSRDEGs. The corresponding z scores were calculated from the logFC values of the OSRDEGs in the differential analysis of the combined dataset. The results are shown in a circle diagram (Fig 4C) and a chord diagram (Fig 4D). These diagrams show that the enrichment was concentrated mainly in MFs such as receptor ligand activity and signaling receptor activator activity. Detailed information is presented in Table 3.
GSEA profiling of OSRDEGs in the RM and control groups
To elucidate the role of gene expression in the etiology of RM and compare the RM and control groups, we employed GSEA. This analysis focused on scrutinizing the gene expression profiles and their linked biological pathways within the combined datasets. The thresholds for identifying significant enrichment were set as P.adjust < 0.05 and a q value < 0.25. Using these criteria, a mountain plot (Fig 5A) was constructed to visualize the results of the enrichment analysis.
A. Mountain plot depicting the top four significantly enriched gene sets from GSEA of OSRDEGs in the combined dataset. B-E. OSRDEGs were significantly enriched in Hypoxia Dn (B), Emt Breast Tumor Dn (C), HCC Progenitor Wnt Up (D), and Prodrank Tgfb Emt Up (E). The significant enrichment screening criteria were p.adjust < 0.05 and q value < 0.25. GSEA, Gene Set Enrichment Analysis. OSRDEGs, oxidative stress-responsive differentially expressed genes. RM, recurrent miscarriage.
The comparison of gene expression between the RM and control groups within the combined dataset revealed considerable enrichment of genes in defined gene sets. Specifically, enrichment was significant for Hypoxia Dn (Fig 5B), Emt Breast Tumor Dn (Fig 5C), HCC Progenitor Wnt Up (Fig 5D), and Prodrank Tgfb Emt Up (Fig 5E), among other pathways. Specific information is provided in Table 4.
LASSO analysis coupled with support vector machine (SVM) modeling
A LASSO regression model was built from the expression profiles of the 18 OSRDEGs (ADM2, AOC3, APLN, ARRB2, BMF, CD40, GADD45G, GPT, IKBKE, IL18, MMACHC, MSRB3, NPPA, SORCS2, STK3, TREM2, UCN2, and VIPR1) in the combined dataset. The LASSO variable trajectory diagram (Fig 6A) and the LASSO regression model diagram (Fig 6B) were drawn for visualization. The LASSO regression model retained seven OSRDEGs: ARRB2, BMF, IL18, SORCS2, STK3, UCN2, and VIPR1. To further verify the value of the RM diagnostic model, a forest plot was drawn based on the seven OSRDEGs (Fig 6C).
A. Variable trajectory plot of the LASSO regression model. B. Diagnostic model plot of the LASSO regression model. C. Forest plot of the LASSO regression model. D-E. Plots of the cross-validation error rate (D) and accuracy (E) against the number of genes in the SVM model. F. OSRDEGs selected by the LASSO regression model and SVM Venn diagram. G. Box plot for functional similarity analysis of hub genes. OSRDEGs, differentially expressed oxidative stress-responsive genes. LASSO, least absolute shrinkage and selection operator. SVM, support vector machine.
On the basis of the 18 OSRDEGs (ADM2, AOC3, APLN, ARRB2, BMF, CD40, GADD45G, GPT, IKBKE, IL18, MMACHC, MSRB3, NPPA, SORCS2, STK3, TREM2, UCN2, and VIPR1), an SVM model was developed, and the gene set with the lowest error rate (Fig 6D) and the highest accuracy (Fig 6E) was identified. The results indicated that the SVM model achieved peak accuracy with a gene count of 10; the 10 genes were STK3, VIPR1, TREM2, SORCS2, MSRB3, UCN2, ARRB2, GADD45G, BMF, and GPT.
Next, to obtain the common OSRDEGs, defined as hub genes, we intersected the OSRDEGs from the LASSO regression model with those from the SVM model and constructed a Venn diagram (Fig 6F). This identified a total of 6 hub genes: ARRB2, BMF, SORCS2, STK3, UCN2, and VIPR1.
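The overlap step can be checked directly from the two gene lists reported above. The following is a minimal Python sketch, not the authors' code; the variable names are introduced here for illustration, and the gene symbols are those listed in the text for the LASSO and SVM models.

```python
# Gene lists as reported in the text for each feature-selection step.
lasso_genes = {"ARRB2", "BMF", "IL18", "SORCS2", "STK3", "UCN2", "VIPR1"}
svm_genes = {"STK3", "VIPR1", "TREM2", "SORCS2", "MSRB3", "UCN2",
             "ARRB2", "GADD45G", "BMF", "GPT"}

# Hub genes are the genes selected by both methods (the Venn overlap).
hub_genes = sorted(lasso_genes & svm_genes)
lasso_only = sorted(lasso_genes - svm_genes)
svm_only = sorted(svm_genes - lasso_genes)

print(hub_genes)   # ['ARRB2', 'BMF', 'SORCS2', 'STK3', 'UCN2', 'VIPR1']
print(lasso_only)  # ['IL18']
print(svm_only)    # ['GADD45G', 'GPT', 'MSRB3', 'TREM2']
```

The six shared genes match the hub genes named in the text; IL18 is the only LASSO-selected gene not retained by the SVM model.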
Finally, to evaluate the relative importance of the hub genes, we used the GOSemSim package to analyze the functional similarity among them and displayed the results as a functional similarity box plot (Fig 6G). The results showed that VIPR1 and BMF were relatively more important functionally.
Construction and diagnostic performance of the hub gene diagnostic model
To assess the diagnostic potential of the six hub genes (ARRB2, BMF, SORCS2, STK3, UCN2, and VIPR1) in the combined datasets, we performed logistic regression analysis to develop a prediction model. The model was based on these six hub genes, and RM samples were stratified into high-risk (High) and low-risk (Low) groups by the median of the model's predicted values. To further validate the diagnostic value of this model, we constructed a forest plot (Fig 7A) and a nomogram (Fig 7B) to illustrate the contribution of these genes to the model. The logistic regression model revealed that the predictive efficacy of STK3 gene expression markedly surpassed that of the other variables. The formula for calculating the predicted value is as follows:
A-B. Forest plot (A) and nomogram (B) of the 6 hub genes in the logistic regression model. C-D. Calibration plot (C) and DCA plot (D) of the logistic regression model. E. ROC curves of the logistic model prediction values in the combined datasets. F. ROC curves of ARRB2, BMF, and SORCS2 expression in the combined datasets. G. ROC curves of STK3, UCN2, and VIPR1 gene expression in the combined dataset. AUC, area under the curve. DCA, decision curve analysis. ROC, receiver operating characteristic curve.
Finally, to assess the accuracy and discriminatory power of the logistic regression model, we constructed a calibration curve via calibration analysis. The predictive performance of the model against the actual outcomes was assessed by comparing the ideal theoretical probabilities (solid lines) with the probabilities predicted by the model (dashed lines) across the scenarios depicted in the figure (Fig 7C). The clinical utility of the logistic regression model was assessed via decision curve analysis (DCA), and the findings are displayed in Fig 7D. In a DCA plot, the model provides a net benefit wherever its curve lies above both the all-positive and all-negative lines; the wider the range of thresholds over which this holds, the greater the net benefit and the better the model performs. On the basis of these findings (Fig 7C-D), the model constructed in this study has high accuracy in the diagnosis of RM.
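The quantity plotted in a decision curve can be illustrated with a short Python sketch. It uses the standard net-benefit definition (true positives per sample minus false positives per sample weighted by the threshold odds); the function names and the toy labels and probabilities are hypothetical and are not taken from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of calling positive every sample with probability >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # Threshold odds: the harm of one false positive relative to one true positive.
    weight = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: call every sample positive (the 'all' line)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data (NOT from the study): 1 = RM, 0 = control.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

for pt in (0.25, 0.5, 0.75):
    model = net_benefit(y_true, y_prob, pt)
    treat_all = net_benefit_treat_all(y_true, pt)
    # The 'none' strategy (call nobody positive) always has net benefit 0.
    print(f"threshold={pt:.2f}  model={model:.3f}  all={treat_all:.3f}  none=0.000")
```

A model is useful at a given threshold when its net benefit exceeds both reference strategies, which is the comparison the DCA plot makes across the whole threshold range.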
To further validate the logistic model and the hub gene diagnostic model, receiver operating characteristic (ROC) analysis was performed using the predicted probabilities of the logistic model, the expression levels of the six hub genes, and the group (RM/Control) labels from the combined datasets. The results revealed that the logistic model (AUC = 0.975, Fig 7E) was highly accurate in distinguishing the RM and control groups. Fig 7F indicates that the expression levels of ARRB2 (AUC = 0.960), BMF (AUC = 0.912), and SORCS2 (AUC = 0.938) had high diagnostic accuracy in distinguishing between the RM and control groups. Fig 7G shows that the expression levels of STK3 (AUC = 0.936), UCN2 (AUC = 0.975), and VIPR1 (AUC = 0.975) likewise exhibited high diagnostic accuracy.
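As a reminder of what an AUC value measures, the sketch below computes it as the probability that a randomly chosen RM sample scores higher than a randomly chosen control. It is a minimal pure-Python illustration, not the study's pipeline; the function name and the toy scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy values (NOT from the study): 1 = RM, 0 = control.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
print(roc_auc(labels, scores))  # 0.9375 (15 of 16 pairs ranked correctly)
```

The same function applies whether the score is a model-predicted probability or the expression level of a single gene.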
Hub gene correlation analysis
By utilizing the gene expression matrix for the six hub genes (ARRB2, BMF, SORCS2, STK3, UCN2, and VIPR1) from both the combined datasets and the GSE165004 dataset, we initially employed Spearman’s rank correlation method to assess the relationships between the expression levels of the six hub genes, subsequently presenting the outcomes in a correlation heatmap (Fig 8A-B).
A-B. Correlation heatmap of the hub genes based on Spearman's rank correlation analysis in the combined dataset (A) and the GSE165004 dataset (B). C‒G. Scatter plots with correlations in the combined dataset: BMF and ARRB2 (C), SORCS2 and ARRB2 (D), SORCS2 and BMF (E), UCN2 and ARRB2 (F), and UCN2 and SORCS2 (G). H-L. Scatter plots with correlations in the GSE165004 dataset: BMF and ARRB2 (H), SORCS2 and ARRB2 (I), SORCS2 and BMF (J), UCN2 and ARRB2 (K), and UCN2 and SORCS2 (L). Asterisks in the correlation heatmap indicate statistical significance: ns, P ≥ 0.05 (not significant); *, P < 0.05; **, P < 0.01; ***, P < 0.001.
The results indicated that, in both the combined datasets (Fig 8A) and the GSE165004 dataset (Fig 8B), the expression of each of the six hub genes was significantly correlated with that of most of the other hub genes. On the basis of the correlation heatmaps in Fig 8A and Fig 8B, we selected the gene pairs that were significantly correlated in both the combined datasets and the GSE165004 dataset and plotted correlation scatter plots (Fig 8C-L). In the combined datasets, there was a weak positive correlation between BMF and ARRB2 (R = 0.474, P = 0.049; Fig 8C) and a moderate positive correlation between SORCS2 and ARRB2 (R = 0.701, P = 0.002; Fig 8D). There was a moderate positive correlation between SORCS2 and BMF (R = 0.583, P = 0.013; Fig 8E) and between UCN2 and ARRB2 (R = 0.748, P < 0.001; Fig 8F). A moderate positive correlation was also observed between UCN2 and SORCS2 (R = 0.562, P = 0.017; Fig 8G). In the GSE165004 dataset, there was a weak positive correlation between BMF and ARRB2 (R = 0.360, P = 0.012; Fig 8H) and between SORCS2 and ARRB2 (R = 0.373, P = 0.009; Fig 8I). There was a moderate positive correlation between SORCS2 and BMF (R = 0.637, P < 0.001; Fig 8J) and between UCN2 and ARRB2 (R = 0.629, P < 0.001; Fig 8K). A moderate positive correlation was also observed between UCN2 and SORCS2 (R = 0.646, P < 0.001; Fig 8L).
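Spearman's rank correlation, used for the heatmaps and scatter plots above, is the Pearson correlation of the rank-transformed values. The sketch below shows this in plain Python under that definition; the helper names and the expression values are hypothetical and do not come from the study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression values for two genes (NOT from the study).
gene_a = [5.1, 6.3, 5.8, 7.2, 6.9, 5.5]
gene_b = [2.0, 2.9, 3.1, 3.8, 3.3, 2.4]
print(round(spearman(gene_a, gene_b), 3))  # 0.943
```

Because only ranks enter the calculation, the coefficient is insensitive to monotone transformations of expression values such as log scaling.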
Combined datasets GSEA profiling of OSRDEGs in the high and low groups
To determine the link between gene expression patterns and predicted values in the high- and low-risk groups stratified by the logistic regression model, we performed GSEA on the combined datasets to investigate the relationship between gene expression and the associated BPs, CCs, and MFs. The thresholds for identifying significant enrichment were set as P.adjust < 0.05 and a q value < 0.25. A mountain plot (Fig 9A) was constructed to visualize the results of the enrichment analysis. The results revealed that the genes separating the high and low groups in the combined datasets were significantly enriched in pathways related to hypoxia (Fig 9B), apoptosis (Fig 9C), Tgfb Emt Up (Fig 9D), the Il23 pathway (Fig 9E), and other relevant pathways. Further details are presented in Table 5.
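The statistic at the core of GSEA is a running sum over the ranked gene list: it steps up at genes in the gene set and down at genes outside it, and the enrichment score is its largest deviation from zero. The Python sketch below illustrates only this running sum, following the published weighted Kolmogorov-Smirnov-like definition. The permutation-based P.adjust and q values used as thresholds above are not computed here, and the gene names, scores, and function name are hypothetical.

```python
def enrichment_score(ranked_genes, ranked_scores, gene_set, weight=1.0):
    """Running-sum enrichment score over a ranked gene list.

    ranked_genes and ranked_scores must already be sorted from the most
    up-regulated to the most down-regulated gene. Returns (ES, running_sum),
    where ES is the running-sum value with the largest absolute deviation
    from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_total = sum(abs(s) ** weight for s, h in zip(ranked_scores, hits) if h)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering it")

    running, total = [], 0.0
    for s, h in zip(ranked_scores, hits):
        # Step up (weighted by the ranking score) on a hit, down on a miss.
        total += abs(s) ** weight / hit_total if h else -1.0 / n_miss
        running.append(total)
    es = max(running, key=abs)
    return es, running


# Hypothetical ranked list (NOT from the study), e.g. sorted by log fold change.
genes = ["G1", "G2", "G3", "G4", "G5", "G6"]
scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]

es_top, curve_top = enrichment_score(genes, scores, {"G1", "G2"})
es_bottom, _ = enrichment_score(genes, scores, {"G5", "G6"})
print(es_top, es_bottom)
```

A positive score indicates that the gene set concentrates at the top of the ranking and a negative score that it concentrates at the bottom; enrichment plots of this kind conventionally display the running-sum curve itself.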
A. Mountain plot displaying the top four significantly enriched gene sets from GSEA between the high and low groups in the combined dataset. B-E. Genes in the high and low groups in the combined dataset were significantly enriched in the hypoxia (B), apoptosis (C), Tgfb Emt Up (D), and Il23 pathways (E). GSEA, Gene Set Enrichment Analysis. The significant enrichment screening criteria: p.adjust < 0.05 and FDR value (q value) < 0.25.
Development of mRNA‒miRNA and mRNA‒TF interaction networks
We used mRNA‒miRNA interaction data from the miRDB database to predict miRNAs that interact with the six hub genes (ARRB2, BMF, SORCS2, STK3, UCN2, and VIPR1). The interaction network of mRNAs and miRNAs was subsequently constructed via Cytoscape software (Fig 10A). In the network diagram, purple ovals indicate mRNAs and green ovals indicate miRNAs. The mRNA‒miRNA interaction network included five hub genes (ARRB2, BMF, SORCS2, STK3, and UCN2) and 41 miRNA molecules, resulting in 41 mRNA‒miRNA interaction associations. STK3 exhibited the highest level of interaction with miRNAs in the network and was associated with 17 miRNAs. The specific information is presented in Table 6.
A. The mRNA-miRNA interaction network of the hub genes and miRNAs. Nodes represent molecules (purple ovals: hub gene mRNAs; green ovals: miRNAs). B. The mRNA-TF interaction network. Nodes represent molecules (purple ovals: hub gene mRNAs; orange ovals: TFs). TF, transcription factor.
We searched the ChIPBase and hTFtarget databases to identify the TFs that bind to the hub genes (ARRB2, BMF, SORCS2, STK3, UCN2, and VIPR1). We retrieved interaction data from both databases, calculated their intersection, and identified six hub genes (ARRB2, BMF, SORCS2, STK3, UCN2, and VIPR1) along with 94 transcription factors. A total of 201 mRNA‒TF interaction associations were established and graphically represented via Cytoscape (Fig 10B). In the network diagram, purple ovals indicate mRNAs and orange ovals indicate TFs. BMF exhibited the highest level of interaction with TFs in the mRNA‒TF interaction network and was associated with 56 TFs. The specific information is presented in Table 7.
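The two operations described above, keeping only interactions supported by both databases and then counting how many regulators connect to each hub gene, can be sketched in a few lines of Python. The TF names and pairs below are hypothetical placeholders, not records from ChIPBase or hTFtarget; the same counting applies to the mRNA‒miRNA network.

```python
from collections import Counter

# Hypothetical (TF, target gene) pairs from two sources (NOT the study's data).
chipbase_pairs = {("TF_A", "BMF"), ("TF_B", "BMF"), ("TF_C", "STK3"), ("TF_D", "UCN2")}
htftarget_pairs = {("TF_A", "BMF"), ("TF_B", "BMF"), ("TF_C", "STK3"), ("TF_E", "VIPR1")}

# Keep only interactions reported by both databases.
shared_pairs = chipbase_pairs & htftarget_pairs

# Degree of each hub gene = number of distinct TFs linked to it.
degree = Counter(gene for _tf, gene in shared_pairs)
top_gene, top_degree = degree.most_common(1)[0]
print(top_gene, top_degree)  # BMF 2
```

The gene with the highest degree is the most connected node in the network, which is the sense in which the text describes BMF and STK3 as having the highest level of interaction.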
Analysis of immune system infiltration across the high and low groups in the combined datasets (CIBERSORTx)
Immune cells play crucial roles in the onset and progression of RM. We estimated the infiltration abundance of 22 immune cell types via CIBERSORT analysis and compared the abundance of each immune cell population between the high and low groups in the combined datasets via the Wilcoxon test (Fig 11A). The results revealed that 22 types of immune cells (B.cells.naive, B.cells.memory, Plasma.cells, T.cells.CD8, T.cells.CD4.naive, T.cells.CD4.memory.resting, T.cells.CD4.memory.activated, T.cells.follicular.helper, T.cells.regulatory.Tregs., T.cells.gamma.delta, NK.cells.resting, NK.cells.activated, Monocytes, Macrophages.M0, Macrophages.M1, Macrophages.M2, Dendritic.cells.resting, Dendritic.cells.activated, Mast.cells.resting, Mast.cells.activated, Eosinophils, Neutrophils) were differentially abundant across the high and low groups. We created a heatmap (Fig 11B) illustrating the associations between the abundance of infiltrating immune cells and the six hub genes (ARRB2, BMF, SORCS2, STK3, UCN2, and VIPR1). Plasma.cells, T.cells.CD8, Monocytes, and Neutrophils were significantly correlated with ARRB2 or UCN2, and correlation scatter plots were created (Fig 11C-F). These revealed a moderate negative correlation between ARRB2 and plasma cells (R = −0.714, P = 0.047; Fig 11C), a moderate positive correlation between UCN2 and T.cells.CD8 (R = 0.791, P = 0.019; Fig 11D), a pronounced negative correlation between UCN2 and monocytes (R = −0.810, P = 0.022; Fig 11E), and a pronounced negative correlation between ARRB2 and neutrophils (R = −0.976, P < 0.001; Fig 11F).
A. Stacked bar plot of 22 immune cell types in the high and low groups under the CIBERSORT algorithm. B. Correlation heatmap between the infiltration abundance of 22 immune cells and the expression of hub genes. C-F. Scatter plots showing significant correlations between hub genes and specific immune cells: (C) ARRB2 vs Plasma.cells, (D) UCN2 vs T.cells.CD8, (E) UCN2 vs Monocytes, (F) ARRB2 vs Neutrophils. Asterisks in the heatmap (B) indicate statistical significance: ns, P ≥ 0.05 (not significant); *, P < 0.05; **, P < 0.01; ***, P < 0.001.
Immunohistochemical analysis of hub protein expression
IHC analysis revealed significantly higher expression levels of six proteins (ARRB2, SORCS2, VIPR1, UCN2, STK3 and BMF) in the endometrial tissues of the RM group than in those of the control group, as detailed in Fig 12.
Discussion
RM, which is characterized by the loss of two or more pregnancies, is a troubling reproductive disorder that affects approximately 1–2% of women of reproductive age [2]. RM profoundly influences the emotional and psychological states of affected individuals [34]. Despite its prevalence and impact, the underlying causes of RM remain poorly understood, necessitating further research on its pathophysiology and potential therapeutic targets.
Exploring the differential expression of OSRGs in RM offers a promising avenue for understanding the pathophysiology of this disease. OS is associated with placental dysfunction and pregnancy complications [5], leading us to hypothesize that altered OSRG expression contributes to RM development. This study aimed to uncover the molecular basis of RM by examining the differential expression and functional roles of these genes and potentially identifying new biomarkers for diagnosis and treatment. These findings may ultimately improve patient prognosis.
This study identified 18 OSRDEGs in RM, highlighting the intricate relationship between OS and pregnancy maintenance. Through intersectional analysis, six hub genes (ARRB2, BMF, SORCS2, STK3, UCN2, and VIPR1) were identified. Beta-arrestin 2 (ARRB2) is a versatile protein that modulates G protein-coupled receptor signaling and affects various cellular processes, such as receptor desensitization, internalization, and downstream signaling pathways [35]. ARRB2 may regulate inflammatory responses by influencing the infiltration of immune cells, such as macrophages, neutrophils, and T cells [36]. RM is closely associated with the dysregulation and infiltration of immune cells at the maternal–fetal interface. Specifically, an increase in proinflammatory Th17 cells and a decrease in regulatory T cells represent a core mechanism disrupting immune tolerance [37]. Additionally, the polarization of macrophages toward the proinflammatory M1 phenotype and the formation of neutrophil extracellular traps (NETs) by neutrophils exacerbate local inflammation and thrombosis, collectively creating a proinflammatory and prothrombotic microenvironment unfavorable for pregnancy [38]. Therefore, ARRB2 may play a significant role in the inflammatory regulation of RM by modulating the recruitment or functional states of these immune cells. As a member of the BH3-only protein family, BMF is a key regulator of the mitochondrial apoptotic pathway [39]. It is activated under stress conditions such as hypoxia [40] and endoplasmic reticulum stress [41], and it induces changes in membrane permeability by antagonizing antiapoptotic proteins such as Bcl-2, thereby initiating the apoptotic cascade [42]. Excessive apoptosis of trophoblasts or endometrial cells is closely associated with gestational disorders such as placental insufficiency and implantation failure [43].
SORCS2 (sortilin-related VPS10 domain containing receptor 2) is a VPS10P domain receptor that is expressed primarily in the nervous system. It influences neuronal survival and apoptosis by modulating neurotrophic signaling, such as the BDNF/TrkB pathway, and participates in signal transduction cascades, including the MAPK and PI3K/Akt pathways [44,45]. In terms of immunoregulation, SORCS2 is actively involved in neuroinflammatory processes [46]. This involvement, mediated through the regulation of glial cell activity and the release of inflammatory factors, underpins its role in neurological disorders [47]. Although direct evidence remains limited, the potential function of SORCS2 in regulating OS, cell migration, and cellular homeostasis at the maternal–fetal interface warrants further investigation. STK3, also known as mammalian sterile 20-like kinase 2 (MST2), participates in the Hippo signaling pathway, where it functions as a core kinase to regulate cell proliferation, apoptosis, and OS responses, thereby playing a vital role in maintaining homeostasis and development in various tissues [48]. STK3 regulates cellular proliferation and apoptosis through downstream signaling axes such as the PI3K/AKT axis [49]. Loss of Mst1/2 function results in increased oxidative damage, phagocyte senescence, and cell death [50]. Dysregulated Hippo pathway activity is closely associated with impaired differentiation, migration, and invasion of trophoblast cells, underscoring its critical role in embryonic implantation and placental development [51,52]. UCN2, a member of the corticotropin-releasing factor (CRF) family, plays a significant role in stress responses and immunoregulation [53]. Alterations in UCN2 expression during immune and cellular stress responses are closely associated with pregnancy outcomes such as preterm birth and preeclampsia [54–56].
VIPR1 (also known as VPAC1) mediates the signaling of vasoactive intestinal peptide (VIP) and is involved in neural, immune, and vascular regulation [57]. At the maternal–fetal interface, VIP modulates immune tolerance, trophoblast function, and vascular development primarily through the VPAC1 and VPAC2 receptors, and deficiency in this signaling pathway can disrupt placental homeostasis, thereby leading to adverse pregnancy outcomes [58]. In summary, the six hub genes (BMF, SORCS2, STK3, VIPR1, UCN2, and ARRB2) operate synergistically within molecular networks regulating OS, apoptosis, signal transduction, and immune microenvironment dynamics at the maternal–fetal interface. They play central roles in sustaining interface integrity, supporting embryo implantation and development, and modulating inflammatory and immune homeostasis. Dysregulation of these genes may lead to impaired maternal–fetal crosstalk and defective embryonic growth, ultimately promoting RM. This concept is experimentally supported by IHC analysis of endometrial tissues from RM patients, which demonstrated significantly elevated expression of six hub genes compared with controls. These findings provide mechanistic insight into RM pathogenesis and establish a foundation for early risk detection, personalized management, and the development of targeted therapeutic interventions.
Enrichment analysis of the OSRDEGs revealed their involvement in crucial biological processes and pathways. GSEA revealed that the genes distinguishing the RM and control groups in the combined datasets were significantly enriched in pathways including Hypoxia Dn, Emt Breast Tumor Dn, HCC Progenitor Wnt Up, and Prodrank Tgfb Emt Up. Although these gene sets are named for their roles in oncogenesis or specific tissue contexts, they reflect fundamental molecular mechanisms, including EMT, Wnt signaling, and TGF-β regulation, that are conserved across a range of physiological and pathological conditions. In the context of pregnancy, EMT is indispensable for trophoblast function during embryo implantation and placental development, as it critically regulates cell migration, invasion, and differentiation [59–61]. Similarly, the Wnt and TGF-β signaling pathways contribute to trophoblast differentiation, immune modulation, and oxidative stress management [62–64]. Dysregulation of these pathways resulting from aberrant activation or suppression can disrupt trophoblast function, compromise immune tolerance at the maternal–fetal interface, and perturb redox homeostasis, thereby potentially contributing to the pathogenesis of RM. Our GSEA findings support the relevance of these pathways in RM, highlighting their importance in clarifying disease mechanisms and suggesting their potential utility for future therapeutic targeting.
Our analysis investigated the intricate interplay between OS and immune responses in RM. ARRB2 was significantly negatively correlated with neutrophils, which are implicated in the inflammatory responses triggered by various microbial infections. The ARRB2-mediated GPR43-NF-κB signaling pathway might be a plausible target for a variety of inflammatory diseases [65], and the negative regulatory role of ARRB2 in inflammation has been confirmed by the fact that ARRB2 deficiency leads to an increase in neutrophils [66]. ARRB2 acts as a negative regulator of CXCR2 signaling within the G protein-coupled receptor (GPCR) family [67], and the chemokine receptor CXCR2 is crucial for neutrophil migration and inflammatory cytokine production [67]. The positive correlation between UCN2 and CD8+ T cells (T.cells.CD8) suggests a potential role for UCN2 in T-cell subsets. CD8+ T cells are not only capable of directly inducing cell death [68] but also produce effector cytokines and regulatory molecules in the decidua. RM patients present a considerable increase in decidual T cells, particularly CD8+ T cells, and this population might be associated with the disruption of maternal–fetal immune tolerance [69]. Ran et al. [70] reported a transcriptome analysis of RM, revealing abnormal infiltration of immune cells, particularly an abnormal increase in CD8+ T cells and neutrophils in patients with RM. These findings indicate that the gene expression patterns of immune cells could serve as markers of underlying immune dysregulation, leading to a deeper understanding of the etiology of RM and offering insights into potential therapeutic targets.
On the basis of the constructed mRNA–miRNA regulatory network, STK3 was identified not only as a signaling hub gene but also as a target under extensive posttranscriptional regulation by multiple miRNAs. Previous studies have confirmed that miRNAs can directly or indirectly modulate the expression of Hippo pathway components, including STK3 [71]. In addition, existing evidence indicates that miRNAs participate in the regulation of trophoblast proliferation, apoptosis, and oxidative stress responses, thereby influencing placental structure and function [72–74]. On the basis of these findings, we hypothesize that in RM, dysregulated interactions between STK3 and specific miRNAs may concurrently disrupt Hippo signaling and impair trophoblast function, ultimately compromising embryo implantation and placental development, and increasing the risk of miscarriage. STK3 and its associated miRNAs may represent promising molecular targets for the diagnosis and treatment of RM.
Despite the systematic bioinformatics analyses conducted across multiple cohorts using public databases to identify OSRGs associated with RM and establish a diagnostic model, several limitations warrant consideration. First, the absence of in vivo or in vitro functional studies hinders confirmation of the functional roles of the pivotal OSRGs and associated mechanisms in RM, despite the addition of IHC experiments that provided preliminary protein-level validation. Second, while the sample size in both the bioinformatics analysis and IHC experiments was sufficient for generating preliminary data, it remains relatively small and may not fully capture the heterogeneity of the RM population. Third, the absence of prospective clinical studies to validate the diagnostic model’s real-world applicability means that clinical deployment requires rigorous evaluation in larger, multicenter cohorts. Finally, the use of multiple datasets in this study introduces the potential for batch effects, although efforts have been made to address this through statistical adjustments. Future studies will validate and refine the molecular mechanisms and diagnostic model through integrated analysis of clinical samples combined with in vitro and in vivo experiments, aiming to facilitate the translation of these bioinformatic discoveries into clinical applications.
Conclusions
In summary, this study identified numerous OSRDEGs associated with RM, with functional enrichment analysis shedding light on BP, CC, and MF and the pathways that these genes might influence. The construction of diagnostic models based on these genes has the potential to improve the prediction and comprehension of RM, which is preliminarily supported by our IHC results from three case‒control cohorts. The observed correlations between hub genes and their potential roles in immune cell infiltration offer new insights into the pathophysiology of RM. Further analysis is necessary to ascertain the clinical applicability of this diagnostic model for predicting RM. These findings, including the initial IHC data, provide a foundation for future larger studies and experimental validation to translate bioinformatics discoveries into clinical practice, ultimately enhancing RM management and treatment.
References
- 1. Rai R, Regan L. Recurrent miscarriage. Lancet. 2006;368(9535):601–11. pmid:16905025
- 2. Practice Committee of the American Society for Reproductive Medicine. Evaluation and treatment of recurrent pregnancy loss: a committee opinion. Fertil Steril. 2012;98(5):1103–11. pmid:22835448
- 3. ESHRE Guideline Group on RPL, Bender Atik R, Christiansen OB, Elson J, Kolte AM, Lewis S, et al. ESHRE guideline: recurrent pregnancy loss. Hum Reprod Open. 2018;2018(2):hoy004. pmid:31486805
- 4. Pizzino G, Irrera N, Cucinotta M, Pallio G, Mannino F, Arcoraci V, et al. Oxidative Stress: Harms and Benefits for Human Health. Oxid Med Cell Longev. 2017;2017:8416763. pmid:28819546
- 5. Gupta S, Agarwal A, Banerjee J, Alvarez JG. The role of oxidative stress in spontaneous abortion and recurrent pregnancy loss: a systematic review. Obstet Gynecol Surv. 2007;62(5):335–47; quiz 353–4. pmid:17425812
- 6. Fujimaki A, Watanabe K, Mori T, Kimura C, Shinohara K, Wakatsuki A. Placental oxidative DNA damage and its repair in preeclamptic women with fetal growth restriction. Placenta. 2011;32(5):367–72. pmid:21435716
- 7. Ishii T, Miyazawa M, Takanashi Y, Tanigawa M, Yasuda K, Onouchi H, et al. Genetically induced oxidative stress in mice causes thrombocytosis, splenomegaly and placental angiodysplasia that leads to recurrent abortion. Redox Biol. 2014;2:679–85. pmid:24936442
- 8. Wang H, Liu Z, Meng L, Zhang X. Comprehensive bioinformation analysis of differentially expressed genes in recurrent pregnancy loss. Hum Fertil (Camb). 2023;26(5):1015–22. pmid:35306956
- 9. Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C, et al. NCBI GEO: mining tens of millions of expression profiles--database and tools update. Nucleic Acids Res. 2007;35(Database issue):D760-5. pmid:17099226
- 10. Zhao X, Jiang Y, Luo S, Zhao Y, Zhao H. Intercellular communication involving macrophages at the maternal-fetal interface may be a pivotal mechanism of URSA: a novel discovery from transcriptomic data. Front Endocrinol (Lausanne). 2023;14:973930. pmid:37265689
- 11. Davis S, Meltzer PS. GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics. 2007;23(14):1846–7. pmid:17496320
- 12. Stelzer G, Rosen N, Plaschkes I, Zimmerman S, Twik M, Fishilevich S, et al. The GeneCards Suite: From Gene Data Mining to Disease Genome Sequence Analyses. Curr Protoc Bioinformatics. 2016;54:1.30.1-1.30.33. pmid:27322403
- 13. Liberzon A, Birger C, Thorvaldsdóttir H, Ghandi M, Mesirov JP, Tamayo P. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 2015;1(6):417–25. pmid:26771021
- 14. Wu Z, Wang L, Wen Z, Yao J. Integrated analysis identifies oxidative stress genes associated with progression and prognosis in gastric cancer. Sci Rep. 2021;11(1):3292. pmid:33558567
- 15. Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012;28(6):882–3. pmid:22257669
- 16. Ringnér M. What is principal component analysis? Nat Biotechnol. 2008;26(3):303–4. pmid:18327243
- 17. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47. pmid:25605792
- 18. Zhao Z, Yang H, Ji G, Su S, Fan Y, Wang M, et al. Identification of hub genes for early detection of bone metastasis in breast cancer. Front Endocrinol (Lausanne). 2022;13:1018639. pmid:36246872
- 19. Li G-M, Zhang C-L, Rui R-P, Sun B, Guo W. Bioinformatics analysis of common differential genes of coronary artery disease and ischemic cardiomyopathy. Eur Rev Med Pharmacol Sci. 2018;22(11):3553–69. pmid:29917210
- 20. Yu G. Gene Ontology Semantic Similarity Analysis Using GOSemSim. Methods Mol Biol. 2020;2117:207–15. pmid:31960380
- 21. Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1):27–30. pmid:10592173
- 22. Yu G, Wang L-G, Han Y, He Q-Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS. 2012;16(5):284–7. pmid:22455463
- 23. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545–50. pmid:16199517
- 24. Engebretsen S, Bohlin J. Statistical predictions with glmnet. Clin Epigenetics. 2019;11(1):123. pmid:31443682
- 25. Sanz H, Valim C, Vegas E, Oller JM, Reverter F. SVM-RFE: selection and visualization of the most relevant features through non-linear kernels. BMC Bioinformatics. 2018;19(1):432. pmid:30453885
- 26. Zhang H, Meltzer P, Davis S. RCircos: an R package for Circos 2D track plots. BMC Bioinformatics. 2013;14:244. pmid:23937229
- 27. Yu G, Li F, Qin Y, Bo X, Wu Y, Wang S. GOSemSim: an R package for measuring semantic similarity among GO terms and gene products. Bioinformatics. 2010;26(7):976–8. pmid:20179076
- 28. Park SY. Nomogram: An analogue tool to deliver digital knowledge. J Thorac Cardiovasc Surg. 2018;155(4):1793. pmid:29370910
- 29. Van Calster B, Wynants L, Verbeek JFM, Verbakel JY, Christodoulou E, Vickers AJ, et al. Reporting and Interpreting Decision Curve Analysis: A Guide for Investigators. Eur Urol. 2018;74(6):796–804. pmid:30241973
- 30. Chen Y, Wang X. miRDB: an online database for prediction of functional microRNA targets. Nucleic Acids Res. 2020;48(D1):D127–31. pmid:31504780
- 31. Zhou K-R, Liu S, Sun W-J, Zheng L-L, Zhou H, Yang J-H, et al. ChIPBase v2.0: decoding transcriptional regulatory networks of non-coding RNAs and protein-coding genes from ChIP-seq data. Nucleic Acids Res. 2017;45(D1):D43–50. pmid:27924033
- 32. Zhang Q, Liu W, Zhang H-M, Xie G-Y, Miao Y-R, Xia M, et al. hTFtarget: A Comprehensive Database for Regulations of Human Transcription Factors and Their Targets. Genomics Proteomics Bioinformatics. 2020;18(2):120–8. pmid:32858223
- 33. Steen CB, Liu CL, Alizadeh AA, Newman AM. Profiling Cell Type Abundance and Expression in Bulk Tissues with CIBERSORTx. Methods Mol Biol. 2020;2117:135–57. pmid:31960376
- 34. Toth B, Jeschke U, Rogenhofer N, Scholz C, Würfel W, Thaler CJ, et al. Recurrent miscarriage: current concepts in diagnosis and treatment. J Reprod Immunol. 2010;85(1):25–32. pmid:20185181
- 35. Wess J, Oteng AB, Rivera-Gonzalez O, Gurevich EV, Gurevich VV. beta-Arrestins: Structure, Function, Physiology, and Pharmacological Perspectives. Pharmacol Rev. 2023;75(5):854–84. pmid:37028945
- 36. Gao P-P, Li L, Chen T-T, Li N, Li M-Q, Zhang H-J, et al. β-arrestin2: an emerging player and potential therapeutic target in inflammatory immune diseases. Acta Pharmacol Sin. 2025;46(9):2347–62. pmid:39349766
- 37. Lee SK, Kim JY, Hur SE, Kim CJ, Na BJ, Lee M, et al. An imbalance in interleukin-17-producing T and Foxp3⁺ regulatory T cells in women with idiopathic recurrent pregnancy loss. Hum Reprod. 2011;26(11):2964–71. pmid:21926059
- 38. Andreescu M, Tanase A, Andreescu B, Moldovan C. A review of immunological evaluation of patients with recurrent spontaneous abortion (RSA). Int J Mol Sci. 2025;26(2). pmid:39859499
- 39. Puthalakath H, Villunger A, O’Reilly LA, Beaumont JG, Coultas L, Cheney RE, et al. Bmf: a proapoptotic BH3-only protein regulated by interaction with the myosin V actin motor complex, activated by anoikis. Science. 2001;293(5536):1829–32. pmid:11546872
- 40. Whelan KA, Caldwell SA, Shahriari KS, Jackson SR, Franchetti LD, Johannes GJ, et al. Hypoxia suppression of Bim and Bmf blocks anoikis and luminal clearing during mammary morphogenesis. Mol Biol Cell. 2010;21(22):3829–37. pmid:20861305
- 41. Puthalakath H, O’Reilly LA, Gunn P, Lee L, Kelly PN, Huntington ND, et al. ER stress triggers apoptosis by activating BH3-only protein Bim. Cell. 2007;129(7):1337–49. pmid:17604722
- 42. Pervushin NV, Nilov DK, Zhivotovsky B, Kopeina GS. Bcl-2 modifying factor (Bmf): “a mysterious stranger” in the Bcl-2 family proteins. Cell Death Differ. 2025. pmid:40849581
- 43. Sharp AN, Heazell AEP, Crocker IP, Mor G. Placental apoptosis in health and disease. Am J Reprod Immunol. 2010;64(3):159–69. pmid:20367628
- 44. Kaas M, Dinesen SB, Ahlgreen O, Madsen P, Molgaard S, Dalby A, et al. A low frequency damaging SORCS2 variant identified in a family with ADHD compromises receptor stability and quenches activity. Mol Psychiatry. 2025. pmid:40968259
- 45. Chen W-J, Zhu B-L, Qian J-J, Zhao J, Zhang F, Jiang B, et al. Hippocampal SorCS2 overexpression represses chronic stress-induced depressive-like behaviors by promoting the BDNF-TrkB system. Pharmacol Biochem Behav. 2024;242:173820. pmid:38996926
- 46. Li J, Li H, Wei C, Chen C, Zheng Z. Astragalus polysaccharide attenuates retinal ischemia reperfusion-induced microglial activation through sortilin-related vacuolar protein sorting 10 domain containing receptor 2/laminin subunit alpha 1 upregulation. Cytojournal. 2025;22:2. pmid:39958884
- 47. Khabazeh A, Cho E, Ekuta V, Kumar J, Poursahdi N, Wong T, et al. Investigating the Pathological Relevance of N-acylsphingosine Amidohydrolase 2 (ASAH2) and Related Proteins in Alzheimer’s Disease. Cureus. 2025;17(7):e87463. pmid:40772191
- 48. Dey A, Varelas X, Guan K-L. Targeting the Hippo pathway in cancer, fibrosis, wound healing and regenerative medicine. Nat Rev Drug Discov. 2020;19(7):480–94. pmid:32555376
- 49. Chen J, Liu F, Wu J, Yang Y, He J, Wu F, et al. Effect of STK3 on proliferation and apoptosis of pancreatic cancer cells via PI3K/AKT/mTOR pathway. Cell Signal. 2023;106:110642. pmid:36871796
- 50. Wang P, Geng J, Gao J, Zhao H, Li J, Shi Y, et al. Macrophage achieves self-protection against oxidative stress-induced ageing through the Mst-Nrf2 axis. Nat Commun. 2019;10(1):755. pmid:30765703
- 51. Du X, Dong Y, Shi H, Li J, Kong S, Shi D, et al. Mst1 and Mst2 are essential regulators of trophoblast differentiation and placenta morphogenesis. PLoS One. 2014;9(3):e90701. pmid:24595170
- 52. Kong Q, Li J, Zhao L, Shi P, Liu X, Bian C, et al. Human cytomegalovirus inhibits the proliferation and invasion of extravillous cytotrophoblasts via Hippo-YAP pathway. Virol J. 2021;18(1):214. pmid:34717661
- 53. Deussing JM, Chen A. The Corticotropin-Releasing Factor Family: Physiology of the Stress Response. Physiol Rev. 2018;98(4):2225–86. pmid:30109816
- 54. Imperatore A, Rolfo A, Petraglia F, Challis JRG, Caniggia I. Hypoxia and preeclampsia: increased expression of urocortin 2 and urocortin 3. Reprod Sci. 2010;17(9):833–43. pmid:20616367
- 55. Novembri R, Torricelli M, Bloise E, Conti N, Galeazzi LR, Severi FM, et al. Effects of urocortin 2 and urocortin 3 on IL-10 and TNF-α expression and secretion from human trophoblast explants. Placenta. 2011;32(12):969–74. pmid:22000474
- 56. Petraglia F, Imperatore A, Challis JRG. Neuroendocrine mechanisms in pregnancy and parturition. Endocr Rev. 2010;31(6):783–816. pmid:20631004
- 57. Latek D, Langer I, Krzysko K, Charzewski L. A molecular dynamics study of vasoactive intestinal peptide receptor 1 and the basis of its therapeutic antagonism. Int J Mol Sci. 2019;20(18). pmid:31491880
- 58. Calo G, Hauk V, Vota D, Van C, Condro M, Gallino L, et al. VPAC1 and VPAC2 receptor deficiencies negatively influence pregnancy outcome through distinct and overlapping modulations of immune, trophoblast and vascular functions. Biochim Biophys Acta Mol Basis Dis. 2023;1869(2):166593. pmid:36328148
- 59. Oghbaei F, Zarezadeh R, Jafari-Gharabaghlou D, Ranjbar M, Nouri M, Fattahi A, et al. Epithelial-mesenchymal transition process during embryo implantation. Cell Tissue Res. 2022;388(1):1–17. pmid:35024964
- 60. Davies JE, Pollheimer J, Yong HEJ, Kokkinos MI, Kalionis B, Knöfler M, et al. Epithelial-mesenchymal transition during extravillous trophoblast differentiation. Cell Adh Migr. 2016;10(3):310–21. pmid:27070187
- 61. DaSilva-Arnold SC, Zamudio S, Al-Khan A, Alvarez-Perez J, Mannion C, Koenig C, et al. Human trophoblast epithelial-mesenchymal transition in abnormally invasive placenta. Biol Reprod. 2018;99(2):409–21. pmid:29438480
- 62. Dietrich B, Haider S, Meinhardt G, Pollheimer J, Knöfler M. WNT and NOTCH signaling in human trophoblast development and differentiation. Cell Mol Life Sci. 2022;79(6):292. pmid:35562545
- 63. Horvat Mercnik M, Schliefsteiner C, Sanchez-Duffhues G, Wadsack C. TGFβ signalling: a nexus between inflammation, placental health and preeclampsia throughout pregnancy. Hum Reprod Update. 2024;30(4):442–71. pmid:38519450
- 64. Adu-Gyamfi EA, Ding Y-B, Wang Y-X. Regulation of placentation by the transforming growth factor beta superfamily. Biol Reprod. 2020;102(1):18–26. pmid:31566220
- 65. Lee SU, In HJ, Kwon MS, Park B, Jo M, Kim M-O, et al. β-Arrestin 2 mediates G protein-coupled receptor 43 signals to nuclear factor-κB. Biol Pharm Bull. 2013;36(11):1754–9. pmid:23985900
- 66. Sharma D, Malik A, Lee E, Britton RA, Parameswaran N. Gene dosage-dependent negative regulatory role of β-arrestin-2 in polymicrobial infection-induced inflammation. Infect Immun. 2013;81(8):3035–44. pmid:23753627
- 67. Su Y, Raghuwanshi SK, Yu Y, Nanney LB, Richardson RM, Richmond A. Altered CXCR2 signaling in beta-arrestin-2-deficient mouse models. J Immunol. 2005;175(8):5396–402. pmid:16210646
- 68. Sears JD, Waldron KJ, Wei J, Chang C-H. Targeting metabolism to reverse T-cell exhaustion in chronic viral infections. Immunology. 2021;162(2):135–44. pmid:32681647
- 69. Huang X, Liu L, Xu C, Peng X, Li D, Wang L, et al. Tissue-resident CD8+ T memory cells with unique properties are present in human decidua during early pregnancy. Am J Reprod Immunol. 2020;84(1):e13254. pmid:32329123
- 70. Ran Y, He J, Chen R, Qin Y, Liu Z, Zhou Y, et al. Investigation and Validation of Molecular Characteristics of Endometrium in Recurrent Miscarriage and Unexplained Infertility from a Transcriptomic Perspective. Int J Med Sci. 2022;19(3):546–62. pmid:35370464
- 71. Sadri F, Hosseini SF, Rezaei Z, Fereidouni M. Hippo-YAP/TAZ signaling in breast cancer: Reciprocal regulation of microRNAs and implications in precision medicine. Genes Dis. 2023;11(2):760–71. pmid:37692482
- 72. Guo Y, Lee C-L, Meng Y, Li Y, Wong SCS, Leung HKM, et al. Unraveling the role of miRNAs in placental function: insights into trophoblast biology and pregnancy pathology. Placenta. 2025:S0143-4004(25)00662-9. pmid:40841282
- 73. Gallagher LT, Bardill J, Sucharov CC, Wright CJ, Karimpour-Fard A, Zarate M, et al. Dysregulation of miRNA-mRNA expression in fetal growth restriction in a caloric restricted mouse model. Sci Rep. 2024;14(1):5579. pmid:38448721
- 74. Gerede A, Stavros S, Danavasi M, Potiris A, Moustakli E, Machairiotis N, et al. MicroRNAs in Preeclampsia: Bridging Diagnosis and Treatment. J Clin Med. 2025;14(6):2003. pmid:40142811