
Identification and validation of diagnostic markers and drugs for pediatric bronchopulmonary dysplasia based on integrating bioinformatics and molecular docking analysis

  • Rui Guo,

    Roles Data curation, Methodology, Validation, Visualization, Writing – original draft

    Affiliation Neonatology, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China

  • Qirui Zheng,

    Roles Data curation, Methodology, Visualization, Writing – original draft

    Affiliation Department of Ultrasound, The People’s Hospital of China Medical University, The People’s Hospital of Liaoning Province, Shenyang, Liaoning Province, China

  • Liang Zhang

    Roles Conceptualization, Funding acquisition, Methodology, Supervision, Writing – review & editing

    19802450958@163.com

    Affiliation Neonatology, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China

Abstract

Background

Bronchopulmonary dysplasia (BPD) is a prevalent chronic lung disease of infancy with lifelong impacts. Its early diagnosis and treatment are hindered by complex pathophysiology and limited mechanistic understanding. This study seeks to establish a foundation for early diagnosis and targeted therapy by identifying diagnostic markers and exploring drug-gene associations.

Methods

Gene expression data were retrieved from the GEO database. Functional enrichment analyses were conducted on the differentially expressed genes (DEGs). The DEGs were used to construct a protein-protein interaction (PPI) network. Three machine learning algorithms were applied to identify diagnostic markers. Immune cell infiltration was analyzed using the CIBERSORT tool, and the relationships between immune cells and diagnostic markers were assessed. Molecular docking was performed to evaluate interactions between predicted candidate drugs and diagnostic markers.

Results

Six hub genes were identified as diagnostic markers. Diagnostic markers showed significant correlations with specific immune cells. Resveratrol and progesterone were found to stably bind to all six diagnostic markers in molecular docking analyses, suggesting therapeutic potential.

Conclusion

In conclusion, our results show that IL7R, CXCL10, DEFA4, PRTN3, NCAPG and CCNB1 are BPD diagnostic indicators, and they reveal immunological features associated with BPD. The molecular interactions of resveratrol and progesterone with these key targets suggest their potential as therapeutic drugs for treating BPD.

1. Introduction

Bronchopulmonary dysplasia (BPD) is a major disease affecting the prognosis and quality of life of preterm infants. With advancements in perinatal medicine, the survival rates of preterm infants with lower gestational age and birth weight have increased in most countries, and the incidence of BPD has increased in parallel [1]. Previous studies have shown that BPD has lifelong impacts on adult health and quality of life, potentially leading to severe long-term pulmonary sequelae [2]. Follow-up studies of BPD survivors have reported lifelong consequences, including compromised pulmonary function, asthma-like symptoms, pulmonary hypertension and exercise intolerance [3–6]. BPD imposes an increasingly significant burden on the healthcare system [7].

The pathophysiology of BPD involves a complex interplay of multiple factors and its prevention and lifelong consequences remain difficult to characterize. Therefore, it is important and valuable to identify diagnostic markers of BPD for improving diagnostic accuracy and guiding targeted therapies, which can enhance our understanding of the pathological development of BPD.

High-throughput sequencing and microarrays have been extensively employed to identify potential genomic biomarkers and to analyze gene expression changes within organisms, providing valuable insights for disease diagnosis and prognosis evaluation. Meanwhile, machine learning (ML) algorithms have shown significant value in analyzing potential relationships within high-dimensional data and can be applied to identify biologically meaningful genes [8,9].

Therefore, our study analyzed high throughput sequencing data of BPD via integrating bioinformatics and performed Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis, Gene set enrichment analysis (GSEA) and protein-protein interaction (PPI) network analysis on the differentially expressed genes (DEGs) identified. Following that, we applied topological algorithms and ML algorithms to identify hub genes.

Moreover, considering the underlying crucial role of immune responses in the pathogenesis of BPD [10,11], our study also adopted immune cell infiltration analysis to investigate the relationship between diagnostic markers and immune cells, which can enhance our comprehension of the molecular immune mechanisms in BPD.

Currently, treatment options for BPD typically include surfactant, glucocorticosteroids (GCS), caffeine, diuretics, inhaled bronchodilators, and vitamin A, among others [12–16]. However, the mechanisms that lead to lung injury are not completely understood, which may explain why these drugs yield satisfactory, ineffective, or even adverse outcomes under different circumstances [17]. Further studies are therefore needed to investigate the potential therapeutic role of drugs in BPD and their detailed pharmacological mechanisms. Network pharmacology is a recent frontier in systematic drug research [18], and it can assist in evaluating the efficiency of multi-component, multi-target compound formulas and in exploring more therapeutic strategies by targeting a specific network [19]. In contrast to previous studies, the present study is the first to use network pharmacology and molecular docking to reveal the interactions between therapeutic drugs and their targets.

2. Materials and methods

2.1 Data collection from GEO and identification of DEGs

The study flowchart is depicted in Fig 1. The gene expression datasets were downloaded from the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/), a worldwide public database of high-throughput microarray and next-generation sequencing functional genomic datasets submitted by the research community [20]. GSE220135 and GSE32472 were obtained using “BPD” as the keyword and “Homo sapiens” as the species. GSE220135 was set as the discovery cohort, and GSE32472 was used as the validation cohort. GSE220135 [21] contains 162 normal samples and 38 BPD samples. From GSE32472 [22], 112 samples without BPD were used as controls and 42 severe BPD samples were used as the BPD group.

Fig 1. The study flowchart.

Abbreviations: GEO, Gene Expression Omnibus; BPD, bronchopulmonary dysplasia; PPI, protein-protein interaction; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; DMNC, Density of Maximum Neighborhood Component; EPC, Edge Percolated Component; MCC, Maximal Clique Centrality; MNC, Maximum Neighborhood Component; SVM-RFE, Support Vector Machine-Recursive Feature Elimination; ssGSEA, single-sample gene set enrichment analysis; CCNB1, cyclin B1; CXCL10, C-X-C motif chemokine 10; IL7R, IL-7 receptor; DEFA4, defensin alpha 4; PRTN3, proteinase 3; NCAPG, non-SMC condensin I complex subunit G.

https://doi.org/10.1371/journal.pone.0323006.g001

For the discovery cohort, the count matrix was read using R. The R package “DESeq2” was then used to identify DEGs based on a threshold of |logFC| > 1.0 and an adjusted p-value < 0.05. The R package “ggplot2” was used to generate a volcano plot to visualize the identified DEGs.
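The thresholding step described above can be illustrated with a minimal Python sketch. It is not the authors' R/DESeq2 code: it assumes that a table of per-gene log fold changes and adjusted p-values has already been computed, and the names `GeneResult` and `select_degs` as well as the toy values are hypothetical.

```python
from typing import NamedTuple


class GeneResult(NamedTuple):
    gene: str
    log_fc: float  # log fold change, BPD vs. control
    padj: float    # multiple-testing adjusted p-value


def select_degs(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Split results into up- and downregulated DEGs using the stated cutoffs."""
    up, down = [], []
    for r in results:
        # A gene must pass both the significance and the effect-size filter.
        if r.padj < padj_cutoff and abs(r.log_fc) > lfc_cutoff:
            (up if r.log_fc > 0 else down).append(r.gene)
    return up, down


# Hypothetical toy values, not data from the study.
toy = [
    GeneResult("GENE_A", 2.3, 0.001),
    GeneResult("GENE_B", -1.7, 0.020),
    GeneResult("GENE_C", 0.4, 0.001),  # fails the fold-change cutoff
    GeneResult("GENE_D", 3.1, 0.200),  # fails the adjusted p-value cutoff
]
up, down = select_degs(toy)
print("upregulated:", up)
print("downregulated:", down)
```

Genes whose adjusted p-value is missing (NaN) fail the comparison and are therefore excluded, which mirrors the usual practice of dropping untested genes.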

2.2 Functional and pathway enrichment analyses

We used the online analysis tool DAVID (https://david.ncifcrf.gov/) to perform GO enrichment and KEGG pathway analyses of the DEGs, with an adjusted p-value < 0.05 as the cut-off. The Gene Ontology (GO) knowledge base (http://geneontology.org) is a comprehensive resource concerning the functions of genes and gene products (proteins and non-coding RNAs) [23], and its terms cover biological process (BP), cellular component (CC), and molecular function (MF). The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a manually curated database resource that integrates various biological objects, categorized into systems, genomic, chemical and health information [24].

2.3 PPI network construction and pre-identification of hub genes

To better understand the interactions among protein-coding genes, we used the Search Tool for the Retrieval of Interacting Genes (STRING) [25] online database (http://string-db.org) to construct a PPI network of the DEGs, which was then imported into Cytoscape (version 3.7.2) [26]. In this network, we hid nodes without connections to other nodes and set the minimum required interaction score to 0.7. We then used the CytoHubba plug-in of Cytoscape with 12 topological analysis methods (Betweenness, BottleNeck, Closeness, Clustering Coefficient, Degree, Density of Maximum Neighborhood Component (DMNC), EcCentricity, Edge Percolated Component (EPC), Maximal Clique Centrality (MCC), Maximum Neighborhood Component (MNC), Radiality, Stress) to screen for hub genes. The overlapping DEGs from the 12 topological algorithms were visualized with the R package “venn”. Furthermore, the R package “heatmap” was used to generate a heat map of the overlapping DEGs.
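The overlap step can be sketched as a set intersection over the top-ranked genes of each method. The Python sketch below is illustrative only and does not reproduce CytoHubba: it assumes that each method yields a score per gene and that a higher score means a more central node. The function names and toy scores are hypothetical.

```python
def top_n(scores, n):
    """Return the n highest-scoring genes for one ranking method."""
    # Sort by descending score; break ties alphabetically for reproducibility.
    ranked = sorted(scores, key=lambda gene: (-scores[gene], gene))
    return set(ranked[:n])


def overlapping_hubs(method_scores, n):
    """Intersect the top-n gene sets of every method (at least one method required)."""
    top_sets = [top_n(scores, n) for scores in method_scores.values()]
    return set.intersection(*top_sets)


# Hypothetical toy scores for three of the twelve methods.
toy_scores = {
    "Degree":      {"G1": 9,   "G2": 7,   "G3": 5,   "G4": 1},
    "Closeness":   {"G1": 0.9, "G2": 0.2, "G3": 0.8, "G4": 0.7},
    "Betweenness": {"G1": 40,  "G2": 3,   "G3": 35,  "G4": 10},
}
hubs = overlapping_hubs(toy_scores, n=2)
print(sorted(hubs))
```

Requiring membership in every top-n list is a strict criterion: a gene ranked highly by eleven methods but not the twelfth is dropped, which keeps the candidate list short at the cost of sensitivity.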

2.4 Identification of diagnostic markers via ML algorithms

Support vector machine (SVM) is a supervised machine learning (ML) method capable of learning from data and making decisions [27]. Random Forest (RF) is a powerful feature selection, classification, and prediction method that offers several advantages, including no limitations on variable conditions and superior accuracy, sensitivity and specificity [28,29]. The least absolute shrinkage and selection operator (LASSO) is a data-mining method that is commonly used in multiple linear regression to streamline models and is widely used in a variety of fields [30].

Therefore, we employed three machine learning algorithms to obtain diagnostic markers of BPD. SVM was performed using the R package “e1071”, with 5-fold cross-validation used during training to improve model performance. RF was performed using the R package “randomForest”. LASSO regression analysis was performed using the R package “glmnet”.

2.5 Immune cell infiltration analysis

To estimate the relative abundance of each cell type in bulk samples from their gene expression profiles, we used the web tool CIBERSORT [31] (http://CIBERSORT.stanford.edu/) to calculate immune cell infiltration and explore the disease immune microenvironment. We then used a histogram to visualize the proportion of different immune cells in each sample and a box plot to visualize the differences in immune cell proportions between the BPD group and the control group. Furthermore, Spearman correlation analysis was used to calculate the correlation between diagnostic markers and immune cells. The correlation between different immune cells was also analyzed. Only coefficients with P values below 0.05 were included in the plots.
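As an illustration of the correlation step, the following Python sketch computes a Spearman coefficient as the Pearson correlation of average ranks. It is a minimal sketch rather than the authors' code: the function names and toy values are hypothetical, the inputs are assumed to be non-constant, and the p-value filter described above is omitted because it requires a t-distribution or a permutation test.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy values: marker expression vs. an immune cell fraction.
expression = [5.1, 6.3, 7.8, 9.0, 12.4]
cell_fraction = [0.30, 0.22, 0.15, 0.10, 0.02]
rho = spearman(expression, cell_fraction)
print(f"Spearman rho = {rho:.2f}")
```

Because only ranks enter the calculation, the coefficient is insensitive to the skewed distributions that estimated cell fractions often show.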

2.6 Single sample gene set enrichment analysis (ssGSEA)

ssGSEA is an extended version of gene set enrichment analysis that calculates an enrichment score for each sample-gene set pair. We conducted ssGSEA enrichment analyses using the signalling pathway dataset from the MSigDB (h.all.v2024.1.Hs.entrez.gmt) database [32] as the background. We subsequently utilized enrichplot to display the top 10 pathways based on normalized enrichment score (NES).

2.7 Receiver operating characteristic (ROC) evaluation and nomogram construction

We used GraphPad Prism (version 9.5) to establish ROC curves to evaluate the diagnostic value of the diagnostic markers in the discovery cohort and the validation cohort. We calculated the area under the ROC curve (AUC) and its 95% CI to quantify this value. An AUC value greater than 0.6 was considered to indicate acceptable diagnostic value. The R package “rms” was used to construct the nomogram based on logistic regression.
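The AUC has a simple rank-based interpretation that a short Python sketch can make concrete: it equals the probability that a randomly chosen case has a higher marker value than a randomly chosen control, with ties counted as one half. This is not the GraphPad Prism procedure used in the study and it does not compute the confidence interval. The function name and toy values are hypothetical.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs in which the case scores higher."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties contribute half a win
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical toy expression values, not data from the study.
bpd = [2.0, 3.5, 4.0, 5.0]
control = [1.0, 2.5, 3.0, 4.0]
auc = roc_auc(bpd, control)
print(f"AUC = {auc:.4f}; passes the 0.6 cutoff: {auc > 0.6}")
```

For a marker that is lower in cases than in controls, this pairwise definition gives a value below 0.5; the direction of the comparison would then be reversed before applying the cutoff.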

2.8 Network pharmacology and molecular docking

The combination of ML with network pharmacology is considered an important topic in the biomedical sciences [19]. Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries [33]. We uploaded the diagnostic markers to the Enrichr website (http://amp.pharm.mssm.edu/Enrichr) to predict therapeutic drugs using the DSigDB database. The protein structures of the diagnostic markers were retrieved from the UniProt database (https://www.uniprot.org/) in PDB format. The 3D structures of the candidate drugs were obtained from PubChem (http://pubchem.ncbi.nlm.nih.gov) and converted into MOL2 file format with the Open Babel Toolkit (version 3.1.1). Finally, we used AutoDock software (version 1.5.7) to optimize the protein structures for molecular docking and Open-Source PyMOL (version 3.0.4) to visualize the docking results.

2.9 Statistical analysis

For comparisons between two groups, a t-test was used when the equality-of-variance test was satisfied; otherwise, a non-parametric test was employed. A P-value < 0.05 was considered statistically significant. R software (version 4.4.1) and GraphPad Prism (version 9.5.0) were used to perform statistical analyses.

3. Results

3.1 Screening of DEGs and enrichment analysis

The research process is illustrated in Fig 1.

A total of 1103 DEGs were identified in the discovery cohort, with 615 upregulated and 488 downregulated genes (Fig 2A). We then conducted GO and KEGG enrichment analyses on the 1103 DEGs using the DAVID online tool. The full results are presented in S1 Table. GO enrichment analysis showed: (1) CC: The DEGs were enriched in various locations, such as “extracellular region”, “extracellular space”, “T cell receptor complex”, “specific granule lumen”, “azurophil granule lumen” (Fig 3A). (2) BP: The DEGs were significantly enriched in several processes, including “RNA processing”, “adaptive immune response”, “immune response”, “defense response to bacterium”, “antibacterial humoral response” (Fig 3B). (3) MF: The DEGs were significantly enriched in “antigen binding” and “immunoglobulin receptor binding” (Fig 3C). Meanwhile, KEGG enrichment analysis demonstrated that the DEGs were significantly enriched in the “cytokine-cytokine receptor interaction” and “IL-17 signaling pathway” pathways (Fig 3D).

Fig 2. DEGs.

(A) The volcano plot shows all DEGs, of which red and blue dots refer to significant DEGs. (B) Visualization of the PPI network genes by Cytoscape. (C) Venn diagram shows that 38 genes are identified from Cytoscape using 12 topological analysis methods. (D) The heatmap displays the 38 upregulated and downregulated DEGs identified from Cytoscape, and each column represents one BPD case or control. Red and blue represent upregulated and downregulated gene expression, respectively. DEGs, differentially expressed genes.

https://doi.org/10.1371/journal.pone.0323006.g002

Fig 3. Enrichment analysis.

(A-C) GO analysis of the DEGs, including biological process (BP), cellular component (CC), and molecular function (MF), respectively. The y-axis represents different GO terms, the x-axis represents gene ratio enriched in relative GO terms, the circle size refers to gene numbers, and the color represents p-value. (D) KEGG pathway analysis of the intersection of genes. Different colors represent various significant pathways and related enriched genes.

https://doi.org/10.1371/journal.pone.0323006.g003

3.2 PPI network construction and identification of diagnostic markers

All DEGs were uploaded to the STRING database to obtain the PPI network, using the high-confidence threshold (0.700) and the “hide disconnected nodes in network” option. A PPI network containing 216 nodes was finally obtained (S1 Fig). To explore potential interactions between the DEGs, we imported the PPI network data into Cytoscape (Fig 2B). We employed the CytoHubba plug-in in Cytoscape to detect hub nodes and applied 12 topological algorithms to calculate hub genes in the PPI network. After selecting the top 108 hub genes from each of the 12 algorithms (S2 Table), 38 overlapping hub genes were identified and displayed using a Venn diagram (Fig 2C), and the heat map based on the 38 genes is presented in Fig 2D.

We then employed three ML algorithms to identify diagnostic markers, and 6 overlapping genes (CCNB1, CXCL10, IL7R, DEFA4, PRTN3, and NCAPG) were ultimately identified (Fig 4). The complete results of the three ML algorithms are listed in S3 Table.

Fig 4. Machine learning.

(A-B) Biomarkers were screened using SVM-RFE with 5-fold cross-validation. (C-D) The LASSO logistic regression algorithm was used to screen diagnostic markers. (E) The RF algorithm was used to screen biomarkers. (F) Venn diagram shows that 6 genes are identified by the three ML algorithms. SVM-RFE, support vector machine-recursive feature elimination; RF, random forest; LASSO, least absolute shrinkage and selection operator; ML, machine learning.

https://doi.org/10.1371/journal.pone.0323006.g004

3.3 Immune cell infiltration

We used CIBERSORT to quantify the proportions of 22 immune cell types, and each sample was shown in a stacked bar plot (Fig 5A). The box plot (Fig 5B) revealed that the proportions of neutrophils, memory B cells and eosinophils were higher in the BPD group than in the control group. However, the proportions of naive B cells, naive CD4 T cells, resting memory CD4 T cells and activated NK cells were lower in the BPD group than in the control group. Moreover, the correlations among infiltrating immune cells were investigated (S2 Fig). Activated dendritic cells had the strongest positive correlation with M1 macrophages (r = 0.54), while neutrophils had the strongest negative correlation with CD8 T cells (r = -0.57).

Fig 5. Immuno-infiltration analysis of the BPD dataset.

(A) Histogram displays the results of 20 immune cell infiltrations. (B) Box plot displays the differences in immune cell infiltration between the BPD and non‐BPD groups. An asterisk (*) signifies a P-value of < 0.05, indicating statistical significance. (C) Heat map displays the correlation between infiltrating immune cells and the six hub genes; red indicates a positive correlation, while blue represents a negative correlation.

https://doi.org/10.1371/journal.pone.0323006.g005

Subsequently, we conducted correlation analyses between infiltrating immune cells and diagnostic markers (Fig 5C). In our analysis, IL7R showed a positive correlation with CD4+ naive T cells and a negative correlation with eosinophils, whereas DEFA4 showed a positive correlation with neutrophils and a negative correlation with CD8+ T cells.

3.4 ssGSEA

Each ssGSEA enrichment score indicates the degree of upregulation or downregulation of genes within a specific gene set in the sample (S4 Table). Based on the results, the top 10 pathways associated with the six genes, as determined by NES, were visualized (Fig 6). The results showed that all six hub genes were enriched in pathways such as MYC targets V1, interferon gamma response and inflammatory response.

Fig 6. The enrichplot of ssGSEA for CCNB1, CXCL10, DEFA4, IL7R, NCAPG and PRTN3.

https://doi.org/10.1371/journal.pone.0323006.g006

3.5 Nomogram construction and ROC curves

Next, the nomogram showed that IL7R, CXCL10, and NCAPG had high predictive values (Fig 7H). Meanwhile, ROC curves were established for each of the six candidate hub genes to evaluate their diagnostic specificity and sensitivity (Fig 7G), with the following results: IL7R (AUC 0.7524, CI 0.6689–0.8360), CCNB1 (AUC 0.6494, CI 0.5582–0.7407), CXCL10 (AUC 0.5127, CI 0.3982–0.6272), DEFA4 (AUC 0.6174, CI 0.5089–0.7260), PRTN3 (AUC 0.5550, CI 0.4440–0.6660) and NCAPG (AUC 0.6222, CI 0.5251–0.7192). The expression of the hub genes was displayed using box plots generated in GraphPad Prism (Fig 7A–7F). Furthermore, the expression of five hub genes (IL7R, CXCL10, DEFA4, PRTN3, NCAPG) was verified in GSE32472 (S3 Fig).

Fig 7. The diagnostic value evaluation and nomogram construction of the discovery cohort.

(A-F) Box plots show the expression of hub genes in the BPD and non-BPD groups. (G) The ROC curves of the 6 hub genes in BPD. The number in parentheses represents the AUC (area under the curve).

https://doi.org/10.1371/journal.pone.0323006.g007

3.6 Network pharmacology

We uploaded the six identified hub genes to the Enrichr website and obtained the candidate drugs predicted using DSigDB (S5 Table). Based on the proportion of genes associated with the predicted drugs and their safety profiles in humans, we ultimately selected resveratrol and progesterone as candidate drugs for further research. Next, we conducted molecular docking to identify potential binding between each drug and the six diagnostic markers. All interactions in the ligand-macromolecule complexes were mediated by hydrogen bonds, which supports the stability of these bindings. The results showed that both resveratrol (Fig 8) and progesterone (Fig 9) successfully bound to DEFA4, IL7R, CXCL10, PRTN3, CCNB1, and NCAPG. The docking energies for resveratrol were −5.49, −5.20, −3.76, −5.00, −5.07 and −4.76 kcal/mol, respectively, and those for progesterone were −8.83, −6.65, −7.72, −7.46, −8.39 and −6.86 kcal/mol, respectively. The lower the docking energy value, the greater the stability of the binding between the drug and the target protein.
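To make the comparison of docking energies easier to follow, the Python sketch below tabulates the values reported in this section and picks, for each drug, the target with the lowest (most favorable) energy. It assumes that the energies are listed in the same order as the six targets, which the text implies but does not state explicitly. The variable and function names are hypothetical.

```python
# Targets in the order given in the text (assumed to match the energy lists).
targets = ["DEFA4", "IL7R", "CXCL10", "PRTN3", "CCNB1", "NCAPG"]

# Docking energies in kcal/mol as reported above.
docking_kcal_per_mol = {
    "resveratrol":  [-5.49, -5.20, -3.76, -5.00, -5.07, -4.76],
    "progesterone": [-8.83, -6.65, -7.72, -7.46, -8.39, -6.86],
}


def strongest_target(energies):
    """Return the (target, energy) pair with the lowest docking energy."""
    return min(zip(targets, energies), key=lambda pair: pair[1])


for drug, energies in docking_kcal_per_mol.items():
    target, energy = strongest_target(energies)
    mean_energy = sum(energies) / len(energies)
    print(f"{drug}: lowest {energy:.2f} kcal/mol ({target}), mean {mean_energy:.2f} kcal/mol")
```

Under that ordering assumption, progesterone shows a lower docking energy than resveratrol for every target, which is consistent with the ranges listed in the text.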

Fig 8. Molecular docking diagram of the six diagnostic biomarkers with resveratrol.

https://doi.org/10.1371/journal.pone.0323006.g008

Fig 9. Molecular docking diagram of the six diagnostic biomarkers with progesterone.

https://doi.org/10.1371/journal.pone.0323006.g009

4. Discussion

In this work, we not only attempted to identify diagnostic markers for BPD, but also used network pharmacology and molecular docking to reveal the interactions between predicted drugs and diagnostic markers. We also investigated the impact of immune cells on BPD and the relationship between diagnostic biomarkers and immune cells. Enrichment analysis revealed that the 1,103 differentially expressed genes we screened were mostly enriched in terms such as “adaptive immune response”, “T cell receptor complex”, and “immunoglobulin receptor binding”. We then identified diagnostic markers by constructing a PPI network and using three ML algorithms. Ultimately, we identified DEFA4, IL7R, PRTN3, CXCL10, CCNB1, and NCAPG as the diagnostic markers for BPD.

IL7R regulates the activation and proliferation of immune cells through the activation of the JAK-STAT signaling pathways [34]. IL-7 receptor (IL-7R) is continually expressed by T cells in both the naive and memory states [35], and signaling through this receptor is essential for the long-term maintenance of all T cell populations. T lymphocytes play a crucial role not only in the recognition and elimination of pathogens, but also in coordinating tissue inflammatory responses and repair [36]. Alveolar macrophages (AMs) [37] and alveolar epithelial cells (AECs) [38] are crucial in the inflammatory response underlying lung inflammation. Studies have shown that certain drugs can successfully treat lung inflammation by blocking the IL-7/IL-7R pathways between macrophages and epithelial cells [39].

In addition to being a chemoattractant for different types of immune cells, human α-defensins can also induce the production of cytokines and chemokines [40]. Defensin Alpha 4 (DEFA4) is a member of the defensin family with antibacterial activity, and it is also known as corticostatin because it inhibits corticotropin-stimulated corticosterone production [41]. Numerous studies have demonstrated that DEFA4 plays a crucial role not only in anti-microbial activity [42] and antiviral activity [43], but also in infectious diseases [43] and autoimmune diseases [44]. Furthermore, DEFA4 also plays a significant role in the respiratory system, and it is upregulated in various respiratory disorders, such as asthma [45] and idiopathic pulmonary fibrosis (IPF) [46].

In addition to binding to its receptor and triggering chemotaxis, cell growth, and apoptosis [47], the C-X-C motif chemokine 10 (CXCL10) also modulates the immune response by recruiting inflammatory cells to the site of inflammation [48]. CXCL10 is an inflammation- and macrophage-regulated chemokine [49], and loss of CXCL10 in BPD may restrict macrophage infiltration into the lungs, reduce lung apoptosis, attenuate pulmonary fibrotic remodeling in neonates, and promote alveolar growth [50].

Proteinase 3 (PR3) is a member of neutrophil serine proteases (NSPs) family [51] and it is encoded by the gene PRTN3 [52]. NSPs are considered to be multifunctional enzymes that participate in both pathogenic agent killing and regulation of inflammatory processes [53]. PR3 plays an important role in the anti-microbial activity of neutrophils and activation of neutrophils can result in the release of various cytotoxic products, leading to lung injury [54]. In addition to limiting lung development and contributing to BPD through the release of ROS, neutrophils can also disrupt the process of alveolar formation by releasing exosomes [55].

Cyclin B1 participates in mitochondrial dynamics [56] and mitochondria-regulated apoptosis [57], and it plays a significant role in the cell cycle. When the CCNB1 gene is knocked out, the cell cycle is arrested [58]. Similarly, knockout of NCAPG, also known as non-SMC condensin I complex subunit G, weakens cell migration and invasion and promotes cell apoptosis. However, our understanding of CCNB1 and NCAPG and their associated mechanisms in BPD is limited.

Although inflammation is a natural response to injury that aids in the healing process, it can also lead to further damage and dysfunction of the affected organ [59]. Inflammation plays a significant role in BPD, and excessive inflammatory responses are the primary pathogenic mechanism of lung disease. We analyzed immune infiltration in BPD to gain a more comprehensive understanding of the impact of immune cell infiltration on the development of BPD. In our study, IL7R, DEFA4 and CXCL10 were associated with various immune cells; DEFA4 and CXCL10 were upregulated, while IL7R was downregulated. These associations could offer new treatment avenues.

To further reveal the potential functions of these six hub genes, we conducted ssGSEA enrichment analysis. To test the predictive performance of the genes, we constructed nomogram and ROC curves and conducted external validation.

Subsequently, resveratrol and progesterone were selected as predicted drugs and subjected to molecular docking with the six aforementioned hub genes. Resveratrol (3,4’,5-trihydroxy-trans-stilbene) is a phytoalexin that naturally exists in many dietary sources. It not only functions as an antioxidant by scavenging free radicals [60], but also possesses potential anti-inflammatory effects [61]. Studies have shown that resveratrol not only has anti-fibrotic effects [62], but also has a protective effect against neonatal oxygen-induced airway hyperreactivity [63]. The development of pharmaceutical formulations of resveratrol may become a prospective targeted treatment approach for BPD. Due to their potent anti-inflammatory effects, corticosteroids are used in the prevention or treatment of BPD, but their use can also increase the risk of neurodevelopmental disorders and other related diseases [64,65]. Both progesterone and corticosteroids are steroid hormones, and progesterone is a critical participant in the interaction between the endocrine and immune systems [66]. Further exploration is needed to determine whether progesterone can be used to treat BPD.

Interestingly, although the expression of only five hub genes (IL7R, CXCL10, DEFA4, PRTN3, NCAPG) was validated in GSE32472, both of our predicted drugs successfully docked with all six genes. In addition, while CXCL10 was upregulated in the discovery cohort, it was downregulated in the validation cohort, indicating the need for further research into the mechanisms of CXCL10 and CCNB1 in BPD.

The present study offers valuable insights into the potential of diagnostic markers and therapeutic agents for BPD. However, the study is not without its limitations. Firstly, the limited BPD sample size may reduce statistical robustness. Moreover, the analyses were predominantly based on transcriptomic data, with little integration of other omics data (e.g., proteomic and metabolomic data). Furthermore, molecular docking results are typically based on static protein structures, which cannot fully model the dynamic behavior of proteins in vivo, necessitating further experimental validation, especially through in vitro and in vivo biological experiments. Finally, while the molecular docking results indicated that resveratrol and progesterone bind well to the diagnostic markers, the clinical development of these drugs remains challenging due to issues of bioavailability, toxicity, and safety for long-term use. Future studies should address these limitations by increasing the sample size, integrating multi-omics data, performing more comprehensive experimental validation, and further exploring the clinical translational potential of candidate therapeutic agents.

Supporting information

S1 Table. GO and KEGG pathway enrichment analysis of overlapping DEGs (FDR < 0.05).

https://doi.org/10.1371/journal.pone.0323006.s001

(DOCX)

S2 Table. Complete list of the top 108 DEGs from Degree, Closeness and Betweenness algorithms via CytoHubba plug-in.

https://doi.org/10.1371/journal.pone.0323006.s002

(DOCX)

S3 Table. Complete list of three machine learning algorithms.

https://doi.org/10.1371/journal.pone.0323006.s003

(DOCX)

S4 Table. Complete list of ssGSEA results (adj.P value <0.05).

https://doi.org/10.1371/journal.pone.0323006.s004

(DOCX)

S5 Table. Candidate drug predicted using DSigDB (adj.P value <0.05).

https://doi.org/10.1371/journal.pone.0323006.s005

(DOCX)

S1 Fig. The PPI network via STRING database.

https://doi.org/10.1371/journal.pone.0323006.s006

(DOCX)

S2 Fig. The correlated heatmap showed the correlation analysis of immune infiltrated cells.

https://doi.org/10.1371/journal.pone.0323006.s007

(DOCX)

S3 Fig. The diagnostic value evaluation and nomogram construction of the validation cohort.

https://doi.org/10.1371/journal.pone.0323006.s008

(DOCX)

Acknowledgments

The authors thank all the participants for their cooperation and are grateful for the support of Neonatology, The First Affiliated Hospital of China Medical University.

References

  1. Lui K, Lee SK, Kusuda S, Adams M, Vento M, Reichman B, et al. Trends in outcomes for neonates born very preterm and very low birth weight in 11 high-income countries. J Pediatr. 2019;215:32–40.e14. pmid:31587861
  2. Davidson LM, Berkelhamer SK. Bronchopulmonary dysplasia: chronic lung disease of infancy and long-term pulmonary outcomes. J Clin Med. 2017;6(1):4. pmid:28067830
  3. Vom Hove M, Prenzel F, Uhlig HH, Robel-Tillig E. Pulmonary outcome in former preterm, very low birth weight children with bronchopulmonary dysplasia: a case-control follow-up at school age. J Pediatr. 2014;164(1):40–5.e4. pmid:24055328
  4. Fawke J, Lum S, Kirkby J, Hennessy E, Marlow N, Rowell V, et al. Lung function and respiratory symptoms at 11 years in children born extremely preterm: the EPICure study. Am J Respir Crit Care Med. 2010;182(2):237–45. pmid:20378729
  5. Kim D-H, Kim H-S, Choi CW, Kim E-K, Kim BI, Choi J-H. Risk factors for pulmonary artery hypertension in preterm infants with moderate or severe bronchopulmonary dysplasia. Neonatology. 2012;101(1):40–6. pmid:21791938
  6. Joshi S, Powell T, Watkins WJ, Drayton M, Williams EM, Kotecha S. Exercise-induced bronchoconstriction in school-aged children who had chronic lung disease in infancy. J Pediatr. 2013;162(4):813–8.e1. pmid:23110946
  7. Álvarez-Fuente M, Arruza L, Muro M, Zozaya C, Avila A, López-Ortego P, et al. The economic impact of prematurity and bronchopulmonary dysplasia. Eur J Pediatr. 2017;176(12):1587–93. pmid:28889192
  8. Uddin S, Khan A, Hossain ME, Moni MA. Comparing different supervised machine learning algorithms for disease prediction. BMC Med Inform Decis Mak. 2019;19(1):281. pmid:31864346
  9. Handelman GS, Kok HK, Chandra RV, Razavi AH, Lee MJ, Asadi H. eDoctor: machine learning and the future of medicine. J Intern Med. 2018;284(6):603–19. pmid:30102808
  10. Kalikkot Thekkeveedu R, Guaman MC, Shivanna B. Bronchopulmonary dysplasia: a review of pathogenesis and pathophysiology. Respir Med. 2017;132:170–7. pmid:29229093
  11. Sun Y, Chen C, Zhang X, Weng X, Sheng A, Zhu Y, et al. High neutrophil-to-lymphocyte ratio is an early predictor of bronchopulmonary dysplasia. Front Pediatr. 2019;7:464. pmid:31781524
  12. Kribs A, Roberts KD, Trevisanuto D, O’Donnell C, Dargaville PA. Surfactant delivery strategies to prevent bronchopulmonary dysplasia. Semin Perinatol. 2023;47(6):151813. pmid:37805275
  13. van de Loo M, van Kaam A, Offringa M, Doyle LW, Cooper C, Onland W. Corticosteroids for the prevention and treatment of bronchopulmonary dysplasia: an overview of systematic reviews. Cochrane Database Syst Rev. 2024;4(4):CD013271. pmid:38597338
  14. Bruschettini M, Brattström P, Russo C, Onland W, Davis PG, Soll R. Caffeine dosing regimens in preterm infants with or at risk for apnea of prematurity. Cochrane Database Syst Rev. 2023;4(4):CD013873. pmid:37040532
  15. 15. Ó Briain E, Byrne AO, Dowling J, Kiernan J, Lynch JCR, Alomairi L, et al. Diuretics use in the management of bronchopulmonary dysplasia in preterm infants: a systematic review. Acta Paediatr. 2024;113(3):394–402. pmid:38214373
  16. 16. Ng G, da Silva O, Ohlsson A. Bronchodilators for the prevention and treatment of chronic lung disease in preterm infants. Cochrane Database Syst Rev. 2016;12(12):CD003214. pmid:27960245
  17. 17. Principi N, Di Pietro GM, Esposito S. Bronchopulmonary dysplasia: clinical aspects and preventive and therapeutic strategies. J Transl Med. 2018;16(1):36. pmid:29463286
  18. 18. Noor F, Tahir Ul Qamar M, Ashfaq UA, Albutti A, Alwashmi ASS, Aljasir MA. Network pharmacology approach for medicinal plants: review and assessment. Pharmaceuticals (Basel). 2022;15(5):572. pmid:35631398
  19. 19. Noor F, Asif M, Ashfaq UA, Qasim M, Tahir Ul Qamar M. Machine learning for synergistic network pharmacology: a comprehensive overview. Brief Bioinform. 2023;24(3):bbad120. pmid:37031957
  20. 20. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, et al. NCBI GEO: archive for functional genomics data sets--update. Nucleic Acids Res. 2013;41(Database issue):D991-5. pmid:23193258
  21. 21. Wang X, Cho H-Y, Campbell MR, Panduri V, Coviello S, Caballero MT, et al. Epigenome-wide association study of bronchopulmonary dysplasia in preterm infants: results from the discovery-BPD program. Clin Epigenetics. 2022;14(1):57. pmid:35484630
  22. 22. Pietrzyk JJ, Kwinta P, Wollen EJ, Bik-Multanowski M, Madetko-Talowska A, Günther C-C, et al. Gene expression profiling in preterm infants: new aspects of bronchopulmonary dysplasia development. PLoS One. 2013;8(10):e78585. pmid:24194948
  23. 23. Gene Ontology Consortium, Aleksander SA, Balhoff J, Carbon S, Cherry JM, Drabkin HJ, et al. The gene ontology knowledgebase in 2023. Genetics. 2023;224(1):iyad031. pmid:36866529
  24. 24. Kanehisa M, Furumichi M, Sato Y, Kawashima M, Ishiguro-Watanabe M. KEGG for taxonomy-based analysis of pathways and genomes. Nucleic Acids Res. 2023;51(D1):D587–92. pmid:36300620
  25. 25. Szklarczyk D, Gable AL, Nastou KC, Lyon D, Kirsch R, Pyysalo S, et al. The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 2021;49(D1):D605–12. pmid:33237311
  26. 26. Doncheva NT, Morris JH, Holze H, Kirsch R, Nastou KC, Cuesta-Astroz Y, et al. Cytoscape stringApp 2.0: analysis and visualization of heterogeneous biological networks. J Proteome Res. 2023;22(2):637–46. pmid:36512705
  27. 27. Valkenborg D, Rousseau A-J, Geubbelmans M, Burzykowski T. Support vector machines. Am J Orthod Dentofacial Orthop. 2023;164(5):754–7. pmid:37914440
  28. 28. Ellis K, Kerr J, Godbole S, Lanckriet G, Wing D, Marshall S. A random forest classifier for the prediction of energy expenditure and type of physical activity from wrist and hip accelerometers. Physiol Meas. 2014;35(11):2191–203. pmid:25340969
  29. 29. Boateng EY, Otoo J, Abaye DA. Basic tenets of classification algorithms K-nearest-neighbor, support vector machine, random forest and neural network: a review. JDAIP. 2020;08(04):341–57.
  30. 30. Motamedi F, Pérez-Sánchez H, Mehridehnavi A, Fassihi A, Ghasemi F. Accelerating big data analysis through LASSO-random forest algorithm in QSAR studies. Bioinformatics. 2022;38(2):469–75. pmid:34979024
  31. 31. Le T, Aronow RA, Kirshtein A, Shahriyari L. A review of digital cytometry methods: estimating the relative abundance of cell types in a bulk of cells. Brief Bioinform. 2021;22(4):bbaa219. pmid:33003193
  32. 32. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545–50. pmid:16199517
  33. 33. Kuleshov MV, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z, et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 2016;44(W1):W90-7. pmid:27141961
  34. 34. Rochman Y, Spolski R, Leonard WJ. New insights into the regulation of T cells by gamma(c) family cytokines. Nat Rev Immunol. 2009;9(7):480–90. pmid:19543225
  35. 35. Schluns KS, Kieper WC, Jameson SC, Lefrançois L. Interleukin-7 mediates the homeostasis of naïve and memory CD8 T cells in vivo. Nat Immunol. 2000;1(5):426–32. pmid:11062503
  36. 36. Toldi G, Hummler H, Pillay T. T lymphocytes, multi-omic interactions and bronchopulmonary dysplasia. Front Pediatr. 2021;9:694034. pmid:34169050
  37. 37. Fan EKY, Fan J. Regulation of alveolar macrophage death in acute lung inflammation. Respir Res. 2018;19(1):50. pmid:29587748
  38. 38. Conlon TM, John-Schuster G, Heide D, Pfister D, Lehmann M, Hu Y, et al. Inhibition of LTβR signalling activates WNT-induced regeneration in lung. Nature. 2020;588(7836):151–6. pmid:33149305
  39. 39. Wang Y, Wei H, Song Z, Jiang L, Zhang M, Lu X, et al. Inhalation of panaxadiol alleviates lung inflammation via inhibiting TNFA/TNFAR and IL7/IL7R signaling between macrophages and epithelial cells. J Ginseng Res. 2024;48(1):77–88. pmid:38223829
  40. 40. Chaly YV, Paleolog EM, Kolesnikova TS, Tikhonov II, Petratchenko EV, Voitenok NN. Neutrophil alpha-defensin human neutrophil peptide modulates cytokine production in human monocytes and adhesion molecule expression in endothelial cells. Eur Cytokine Netw. 2000;11(2):257–66. pmid:10903805
  41. 41. Singh A, Bateman A, Zhu QZ, Shimasaki S, Esch F, Solomon S. Structure of a novel human granulocyte peptide with anti-ACTH activity. Biochem Biophys Res Commun. 1988;155(1):524–9. pmid:2843187
  42. 42. Ericksen B, Wu Z, Lu W, Lehrer RI. Antibacterial activity and specificity of the six human {alpha}-defensins. Antimicrob Agents Chemother. 2005;49(1):269–75. pmid:15616305
  43. 43. Furci L, Sironi F, Tolazzi M, Vassena L, Lusso P. Alpha-defensins block the early steps of HIV-1 infection: interference with the binding of gp120 to CD4. Blood. 2007;109(7):2928–35. pmid:17132727
  44. 44. Villanueva E, Yalavarthi S, Berthier CC, Hodgin JB, Khandpur R, Lin AM, et al. Netting neutrophils induce endothelial damage, infiltrate tissues, and expose immunostimulatory molecules in systemic lupus erythematosus. J Immunol. 2011;187(1):538–52. pmid:21613614
  45. 45. Bigler J, Boedigheimer M, Schofield JPR, Skipp PJ, Corfield J, Rowe A, et al. A severe asthma disease signature from gene expression profiling of peripheral blood from U-BIOPRED cohorts. Am J Respir Crit Care Med. 2017;195(10):1311–20. pmid:27925796
  46. 46. Molyneaux PL, Willis-Owen SAG, Cox MJ, James P, Cowman S, Loebinger M, et al. Host-microbial interactions in idiopathic pulmonary fibrosis. Am J Respir Crit Care Med. 2017;195(12):1640–50. pmid:28085486
  47. 47. Neville LF, Mathiak G, Bagasra O. The immunobiology of interferon-gamma inducible protein 10 kD (IP-10): a novel, pleiotropic member of the C-X-C chemokine superfamily. Cytokine Growth Factor Rev. 1997;8(3):207–19. pmid:9462486
  48. 48. Lee J-H, Kim B, Jin WJ, Kim H-H, Ha H, Lee ZH. Pathogenic roles of CXCL10 signaling through CXCR3 and TLR4 in macrophages and T cells: relevance for arthritis. Arthritis Res Ther. 2017;19(1):163. pmid:28724396
  49. 49. Tsai C-F, Chen J-H, Yeh W-L. Pulmonary fibroblasts-secreted CXCL10 polarizes alveolar macrophages under pro-inflammatory stimuli. Toxicol Appl Pharmacol. 2019;380:114698. pmid:31394157
  50. 50. Hirani DV, Thielen F, Mansouri S, Danopoulos S, Vohlen C, Haznedar-Karakaya P, et al. CXCL10 deficiency limits macrophage infiltration, preserves lung matrix, and enables lung growth in bronchopulmonary dysplasia. Inflamm Regen. 2023;43(1):52. pmid:37876024
  51. 51. Heutinck KM, ten Berge IJM, Hack CE, Hamann J, Rowshani AT. Serine proteases of the human immune system in health and disease. Mol Immunol. 2010;47(11–12):1943–55. pmid:20537709
  52. 52. Korkmaz B, Moreau T, Gauthier F. Neutrophil elastase, proteinase 3 and cathepsin G: physicochemical properties, activity and physiopathological functions. Biochimie. 2008;90(2):227–42. pmid:18021746
  53. 53. Pham CTN. Neutrophil serine proteases: specific regulators of inflammation. Nat Rev Immunol. 2006;6(7):541–50. pmid:16799473
  54. 54. Tetley TD. New perspectives on basic mechanisms in lung disease. 6. Proteinase imbalance: its role in lung disease. Thorax. 1993;48(5):560–5. pmid:8322246
  55. 55. Vargas A, Roux-Dalvai F, Droit A, Lavoie J-P. Neutrophil-derived exosomes: a new mechanism contributing to airway smooth muscle remodeling. Am J Respir Cell Mol Biol. 2016;55(3):450–61. pmid:27105177
  56. 56. Kashatus DF, Lim K-H, Brady DC, Pershing NLK, Cox AD, Counter CM. RALA and RALBP1 regulate mitochondrial fission at mitosis. Nat Cell Biol. 2011;13(9):1108–15. pmid:21822277
  57. 57. Xie B, Wang S, Jiang N, Li JJ. Cyclin B1/CDK1-regulated mitochondrial bioenergetics in cell cycle progression and tumor resistance. Cancer Lett. 2019;443:56–66. pmid:30481564
  58. 58. Wang F, Chen X, Yu X, Lin Q. Degradation of CCNB1 mediated by APC11 through UBA52 ubiquitination promotes cell cycle progression and proliferation of non-small cell lung cancer cells. Am J Transl Res. 2019;11(11):7166–85. pmid:31814919
  59. 59. Korkmaz B, Horwitz MS, Jenne DE, Gauthier F. Neutrophil elastase, proteinase 3, and cathepsin G as therapeutic targets in human diseases. Pharmacol Rev. 2010;62(4):726–59. pmid:21079042
  60. 60. de la Lastra CA, Villegas I. Resveratrol as an antioxidant and pro-oxidant agent: mechanisms and clinical implications. Biochem Soc Trans. 2007;35(Pt 5):1156–60. pmid:17956300
  61. 61. Ganesan K, Xu B. A critical review on polyphenols and health benefits of black soybeans. Nutrients. 2017;9(5):455. pmid:28471393
  62. 62. Özdemir ÖMA, Gözkeser E, Bir F, Yenisey Ç. The effects of resveratrol on hyperoxia-induced lung injury in neonatal rats. Pediatr Neonatol. 2014;55(5):352–7. pmid:24630815
  63. 63. Reçica R, Kryeziu I, Thaçi Q, Avtanski D, Mladenov M, Basholli-Salihu M, et al. Protective effects of resveratrol against airway hyperreactivity, oxidative stress, and lung inflammation in a rat pup model of bronchopulmonary dysplasia. Physiol Res. 2024;73(2):239–51. pmid:38710061
  64. 64. Doyle LW, Cheong JL, Ehrenkranz RA, Halliday HL. Early (< 8 days) systemic postnatal corticosteroids for prevention of bronchopulmonary dysplasia in preterm infants. Cochrane Database Syst Rev. 2017;10(10):CD001146. pmid:29063585
  65. 65. Doyle LW, Cheong JL, Hay S, Manley BJ, Halliday HL. Early (< 7 days) systemic postnatal corticosteroids for prevention of bronchopulmonary dysplasia in preterm infants. Cochrane Database Syst Rev. 2021;10(10):CD001146. pmid:34674229
  66. 66. Arck P, Hansen PJ, Mulac Jericevic B, Piccinni M-P, Szekeres-Bartho J. Progesterone during pregnancy: endocrine-immune cross talk in mammalian species and the role of stress. Am J Reprod Immunol. 2007;58(3):268–79. pmid:17681043