Identification of hub genes associated with COVID-19 and idiopathic pulmonary fibrosis by integrated bioinformatics analysis

Introduction The coronavirus disease 2019 (COVID-19), which emerged in late 2019, is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Idiopathic pulmonary fibrosis (IPF) and COVID-19 are reported to share risk factors. This study aimed to determine the potential role of differentially expressed genes (DEGs) common to IPF and COVID-19. Materials and methods Using the GEO database, we obtained DEGs from one SARS-CoV-2 dataset and five IPF datasets. A series of enrichment analyses were performed to identify the functions of the upregulated and downregulated DEGs, respectively. Two Cytoscape plugins, Cytohubba and MCODE, were used to identify hub genes after construction of a protein-protein interaction (PPI) network. Finally, candidate drugs were predicted to target the upregulated DEGs. Results A total of 188 DEGs were common to COVID-19 and IPF, of which 117 were upregulated and 71 were downregulated. The upregulated DEGs were involved in cytokine function, while the downregulated DEGs were associated with extracellular matrix disassembly. Twenty-two hub genes were upregulated in both COVID-19 and IPF, for which 155 candidate drugs were predicted (adj.P.value < 0.01). Conclusion Identifying the hub genes aberrantly regulated in both COVID-19 and IPF may enable development of the molecules encoded by those genes as therapeutic targets for preventing IPF progression and SARS-CoV-2 infection.


Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a novel enveloped RNA beta coronavirus, is responsible for the ongoing outbreak of coronavirus disease 2019 (COVID-19) [1,2], which places an enormous global burden on society. COVID-19 has resulted in over 224 million confirmed cases and over 4.68 million deaths globally. Research and development of anti-COVID-19 vaccines is ongoing; moreover, controlling the disease also requires the development of effective drugs to cure it.
Idiopathic pulmonary fibrosis (IPF) is a chronic progressive disease that leads to irreversible advanced lung failure. IPF patients suffer from lung function decline, respiratory failure, and ultimately death [3]. IPF and COVID-19 are reported to share risk factors [4]. However, the molecular mechanism underlying the crosstalk between COVID-19 and IPF remains poorly defined. Identification of novel molecular targets has thus become imperative for the advancement of targeted therapy for COVID-19 with antifibrotic strategies.
The goal of the current study was to investigate the potential role of differentially expressed genes (DEGs) in the association between COVID-19 and IPF. We overlapped the DEGs of the two diseases using one SARS-CoV-2 dataset and five IPF datasets, and then distinguished the upregulated from the downregulated genes. Based on a series of enrichment analyses, we interpreted the functions of the upregulated and downregulated DEGs. Furthermore, we carried out a protein-protein interaction (PPI) network analysis in which 22 upregulated hub genes and 11 downregulated hub genes were identified. We then analyzed the prominent functions of the 22 upregulated hub genes, which revealed that hub genes upregulated in both COVID-19 and IPF were involved in cytokine mediation, such as the cellular response to interferon. Finally, we performed a drug-target analysis, which identified 155 candidate drugs targeting the upregulated hub genes. The workflow of the current study is shown in Fig 1. Our findings suggest that the hub genes and the candidate drugs may be beneficial to COVID-19 treatment. They also suggest that candidate drugs against virus variants such as Delta SARS-CoV-2 could be designed and developed when the disease caused by the variant shares risk factors with a different disease.
Fig 1. Workflow of the current study. The high-throughput data of SARS-CoV-2 infection were obtained from a biopsy of a COVID-19 patient in GSE147507. The IPF data were obtained from biopsies of IPF patients in five datasets: GSE13065, GSE110147, GSE101286, GSE53845, and GSE24206. A Venn diagram was used to reveal the overlapping DEGs. The magenta circle represents DEGs in GSE147507 and the yellow one represents DEGs in the IPF datasets. Subsequently, the common DEGs were subjected to a series of enrichment analyses and a PPI network investigation. Based on the identification of highly expressed hub genes, candidate drugs were predicted that target the crosstalk between COVID-19 and IPF during COVID-19 therapy.

Materials and methods

The collection of databases and the identification of DEGs
DEGs were obtained from six datasets in the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) database [5,6]. The DEGs related to SARS-CoV-2 were obtained from GSE147507 (as of Apr 07, 2021), which includes SARS-CoV-2 infection in human lung epithelium and lung alveolar cells [7,8]. Five GEO datasets were collected to obtain the DEGs related to IPF: GSE13065, with 3 IPF samples and 3 normal samples, last updated on May 02, 2019 [9]; GSE110147, with 22 IPF samples and 11 normal samples, last updated on Aug 19, 2018 [10]; GSE101286, with 12 IPF samples and 3 normal samples, last updated on Jul 25, 2021 [11]; GSE53845, with 40 IPF samples and 8 normal samples, last updated on Jan 23, 2019 [12]; and GSE24206, with 17 IPF samples and 6 normal samples, last updated on Mar 25, 2019 [13]. DEGs for the datasets were analyzed with the GEO2R (https://www.ncbi.nlm.nih.gov/geo/geo2r/) web tool, which uses the limma package to identify DEGs, and were visualized with the ggplot2 package in R. The Benjamini-Hochberg method was applied to both the SARS-CoV-2 and the IPF datasets to control the false discovery rate (FDR). The cut-off criteria for GSE147507 were an adjusted P-value < 0.05 and an absolute log2-fold change > 1.0. All data generated or analyzed during this study are included in this published article and its supplementary information files.
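The selection step described above can be illustrated with a minimal Python sketch. It assumes the raw P-values and log2-fold changes have already been computed (for example by GEO2R/limma); the function names, the record layout, and the toy genes are hypothetical and are not part of the original analysis.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(records, alpha=0.05, lfc_cutoff=1.0):
    """Keep genes with adjusted P < alpha and |log2FC| > lfc_cutoff."""
    adj = benjamini_hochberg([r["p"] for r in records])
    return [
        {**r, "adj_p": q}
        for r, q in zip(records, adj)
        if q < alpha and abs(r["log2fc"]) > lfc_cutoff
    ]


# Hypothetical toy input: gene names and values are made up for illustration.
toy = [
    {"gene": "GENE_A", "log2fc": 2.3, "p": 0.001},
    {"gene": "GENE_B", "log2fc": -1.8, "p": 0.004},
    {"gene": "GENE_C", "log2fc": 0.4, "p": 0.002},
    {"gene": "GENE_D", "log2fc": 1.5, "p": 0.60},
]
degs = select_degs(toy)
print([d["gene"] for d in degs])
```

In this toy input, GENE_C is excluded by the fold-change threshold and GENE_D by the adjusted P-value threshold.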

Identification of common genes between COVID-19 and IPF
To characterize the DEGs in more detail, the genes were further divided according to their direction of aberrant expression in each dataset. An adjusted P-value < 0.05 and log2-fold change > 1.0 were used as the cut-off criteria for highly expressed (upregulated) DEGs, and an adjusted P-value < 0.05 and log2-fold change < −1.0 for lowly expressed (downregulated) DEGs.
The upregulated and downregulated DEGs in GSE147507 were each overlapped with those from the five IPF datasets.
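The direction split and overlap can be sketched as plain set operations. This sketch assumes that each dataset is given as a mapping from gene symbol to log2-fold change for genes that already passed the adjusted P-value filter, and that the five IPF datasets are pooled (union) before intersecting with GSE147507; the pooling choice, the function names, and the toy values are assumptions made for illustration.

```python
def split_by_direction(degs, lfc_cutoff=1.0):
    """Split a {gene: log2FC} mapping into upregulated and downregulated sets."""
    up = {g for g, lfc in degs.items() if lfc > lfc_cutoff}
    down = {g for g, lfc in degs.items() if lfc < -lfc_cutoff}
    return up, down


def common_degs(covid, ipf_datasets):
    """Intersect COVID-19 DEGs with the pooled IPF DEGs, per direction."""
    covid_up, covid_down = split_by_direction(covid)
    ipf_up, ipf_down = set(), set()
    for dataset in ipf_datasets:
        up, down = split_by_direction(dataset)
        ipf_up |= up      # assumption: pool the IPF datasets by union
        ipf_down |= down
    return covid_up & ipf_up, covid_down & ipf_down


# Hypothetical toy data; gene names and fold changes are invented.
covid = {"G1": 2.0, "G2": -1.5, "G3": 1.2, "G4": -2.2}
ipf = [
    {"G1": 1.4, "G4": 1.1},
    {"G2": -3.0, "G3": 0.5, "G5": 2.0},
]
common_up, common_down = common_degs(covid, ipf)
print(sorted(common_up), sorted(common_down))
```

Keeping the two directions separate means a gene that is upregulated in one disease and downregulated in the other (G4 in the toy data) is not counted as common.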

Enrichment analysis for common DEGs
To understand the functional characteristics of the DEGs in COVID-19 and IPF, a series of enrichment analyses were used to obtain detailed information on biological functions and pathways. Gene Ontology (GO) analysis was performed for its three categories: biological process, molecular function, and cellular component [14]. The Kyoto Encyclopedia of Genes and Genomes (KEGG) was used to identify metabolic pathways [15]. The online tool Enrichr (https://amp.pharm.mssm.edu/Enrichr/) was used to enrich the significant pathways from the WikiPathways, Reactome, and BioCarta databases [16,17]. Based on the enrichment analysis, we concentrated on the biological functions of the DEGs shared by COVID-19 and IPF.
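Over-representation analyses of this kind typically ask whether a gene list overlaps an annotated term more often than expected by chance, which can be scored with a one-sided hypergeometric (Fisher-type) test. The following is a minimal sketch of that calculation, not a reproduction of the scoring used by any particular tool; the function name, the background size of 20000 genes, and the term size of 200 are hypothetical, while 117 and 11 echo the counts reported in this study.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_term, n_query, n_overlap):
    """One-sided P(X >= n_overlap) for a query list drawn from the background.

    n_background: number of genes in the background (assumed value)
    n_term:       number of background genes annotated to the term
    n_query:      number of genes in the query list (e.g. common DEGs)
    n_overlap:    number of query genes annotated to the term
    """
    upper = min(n_term, n_query)
    total = comb(n_background, n_query)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Small case that can be checked by hand: (6*6 + 4*1) / 120 = 1/3.
small_p = hypergeom_enrichment_p(10, 4, 3, 2)

# Illustrative call: 11 of 117 query genes fall in a hypothetical 200-gene term.
example_p = hypergeom_enrichment_p(20000, 200, 117, 11)
print(small_p, example_p)
```

The resulting P-values would still need multiple-testing correction across all tested terms before being compared with a significance threshold.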

PPI network analysis for the identification of hub genes
To assess the associations between DEGs, we established a PPI network with the Search Tool for the Retrieval of Interacting Genes (STRING) (https://string-db.org/) [18], which predicts physical and functional associations between proteins. Subsequently, we determined hub genes using Cytohubba and MCODE in Cytoscape (3.8.2). Cytohubba (http://apps.cytoscape.org/apps/cytohubba) is a Cytoscape plugin that explores protein associations according to topological algorithms. Cytohubba was set to return the top 10 nodes in order to obtain hub genes from the DEGs. Molecular Complex Detection (MCODE) (http://apps.cytoscape.org/apps/mcode) is another plugin, which identifies clusters of subnetworks. The MCODE parameters were Degree Cutoff = 2, Node score cutoff = 2, and K-core = 2.
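As an illustration of topological hub ranking, the sketch below ranks nodes of an undirected PPI network by degree and returns the top entries. Degree is only one of several ranking methods such plugins offer, and the text does not state which one was used, so this choice is an assumption; the function name and the toy edge list are hypothetical.

```python
from collections import defaultdict


def top_hubs_by_degree(edges, top_n=10):
    """Rank nodes of an undirected network by degree (number of distinct partners)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbors[a].add(b)
            neighbors[b].add(a)
    # Sort by descending degree, breaking ties alphabetically for reproducibility.
    ranked = sorted(neighbors, key=lambda g: (-len(neighbors[g]), g))
    return [(g, len(neighbors[g])) for g in ranked[:top_n]]


# Hypothetical toy network; protein names are placeholders.
toy_edges = [
    ("P1", "P2"),
    ("P1", "P3"),
    ("P1", "P4"),
    ("P2", "P3"),
    ("P4", "P5"),
]
hubs = top_hubs_by_degree(toy_edges, top_n=3)
print(hubs)
```

In practice the edge list would come from a STRING export, and a confidence-score filter on the interactions would usually be applied before ranking.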

Prediction of candidate drugs for hub genes
The final stage of the study was designed to determine candidate drugs for the highly expressed hub genes. The Drug Signatures Database (DSigDB) was accessed through the Enrichr (https://amp.pharm.mssm.edu/Enrichr/) platform. DSigDB contains the largest number of drug/compound-related gene sets to date, which were extracted and compiled from quantitative inhibition data of drugs/compounds from a variety of databases and publications [19]. Enrichr is mostly used as an enrichment analysis platform that provides numerous visualizations of the collective functions of the genes supplied as input. We predicted candidate drugs targeting the hub genes. An adj.P.value < 0.01 was considered statistically significant. The candidate drugs were ranked by adj.P.value and combined score.
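The filtering and ranking of candidate drugs can be sketched as follows. The sketch assumes the enrichment results have been exported as records with an adjusted P-value and a combined score; the field names, the function name, and the toy records are hypothetical, and sorting by adjusted P-value first with the combined score as tie-breaker is one reasonable reading of the ranking described above.

```python
def rank_candidate_drugs(results, alpha=0.01):
    """Keep drugs with adjusted P < alpha, then sort by adjusted P (ascending)
    and combined score (descending) as a tie-breaker."""
    significant = [r for r in results if r["adj_p"] < alpha]
    return sorted(significant, key=lambda r: (r["adj_p"], -r["combined_score"]))


# Hypothetical toy records; drug names and values are placeholders.
toy_results = [
    {"drug": "Drug_A", "adj_p": 0.0004, "combined_score": 310.0},
    {"drug": "Drug_B", "adj_p": 0.02, "combined_score": 500.0},
    {"drug": "Drug_C", "adj_p": 0.0004, "combined_score": 450.0},
    {"drug": "Drug_D", "adj_p": 0.003, "combined_score": 120.0},
]
ranked = rank_candidate_drugs(toy_results)
ranked_names = [r["drug"] for r in ranked]
print(ranked_names)
```

Drug_B is dropped despite its high combined score because its adjusted P-value does not pass the 0.01 threshold.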

Results

Identification of common DEGs between COVID-19 and IPF
From the GSE147507 dataset, we identified 812 DEGs including 396 upregulated genes and 417 downregulated genes (Fig 2). Out of 5977 DEGs identified from the five IPF GEO datasets, 2369 were upregulated and 3608 were downregulated. We then overlapped the DEGs from the SARS-CoV-2 dataset with those from the five IPF datasets. A total of 117 and 71 genes were identified as common upregulated (Fig 3A) and downregulated DEGs (Fig 3B), respectively. Next, we sought to identify the functions of the common DEGs involved in the progression of COVID-19 and IPF.

GO and pathway identification by gene set enrichment analysis
To further understand the functions and pathways of the common DEGs, enrichment analysis was performed. It showed that the common upregulated DEGs in COVID-19 and IPF were involved in cytokine mediation, such as the cellular response to interferon (Fig 4, Table 1). The DEGs downregulated in both diseases, however, were associated with the disassembly of cellular components and extracellular matrix (Fig 5, Table 2). The upregulated DEGs were mainly located in cellular component, and the main molecular function of these genes was binding to small molecules and metabolites. Among the upregulated DEGs, 25 genes were involved in the cytokine-mediated signaling pathway, 11 genes in the cellular response to type I interferon, and 11 genes in the type I interferon signaling pathway in GO terms. The downregulated DEGs, located at the intracellular membrane organelle and nucleus, appeared to bind RNA and catalytic enzymes and to mediate channel activity. The GO results were consistent with those of a series of pathway analyses, including KEGG, Reactome, WikiPathways, and BioCarta. For instance, Reactome and WikiPathways revealed that upregulated DEGs were related to interferon signaling and the cytokine signaling pathway (S1 Fig

Enrichment analysis of hub genes
The GO enrichment analysis revealed that the 22 upregulated hub genes were enriched in the cellular response to type I interferon and the type I interferon signaling pathway. The analysis also showed significant involvement of the mitochondrial envelope and adenylyl transferase activity in the upregulated group (S3 Fig). In the downregulated group, the 11 hub genes were mostly enriched in the nucleolus and nuclear lumen, and appeared to be involved in RNA binding and mitotic G1 DNA damage checkpoint signaling (S4 Fig).

Candidate drug prediction for targeting hub genes between COVID-19 and IPF
To further investigate the role of the common hub genes, candidate drugs targeting the 22 upregulated hub genes were predicted (Table 3). A total of 155 candidate drugs were identified with adj.P.value < 0.01 (S3 Table). These drugs were further examined for their effect on the molecular activity of the 22 hub genes and their downstream molecules, which are displayed as a list (S4 Table). Among these drugs, 11 were predicted to target more than 10 hub molecules, while 69 drugs targeted fewer than 3 hub molecules.
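The tally of how many hub molecules each drug targets can be expressed as a short counting step. The sketch below assumes a mapping from each drug to its predicted target genes; the function name, the placeholder hub genes H1 to H12, and the toy drugs are hypothetical, and the thresholds mirror the "more than 10" and "fewer than 3" groups mentioned above.

```python
def summarize_target_counts(drug_to_genes, hub_genes, high=10, low=3):
    """Count hub genes hit by each drug and list broad and narrow drugs."""
    hub = set(hub_genes)
    counts = {drug: len(hub & set(genes)) for drug, genes in drug_to_genes.items()}
    broad = sorted(d for d, n in counts.items() if n > high)   # more than `high` hubs
    narrow = sorted(d for d, n in counts.items() if n < low)   # fewer than `low` hubs
    return counts, broad, narrow


# Hypothetical toy data: H1..H12 stand in for hub genes.
hub_genes = [f"H{i}" for i in range(1, 13)]
drug_to_genes = {
    "Drug_X": [f"H{i}" for i in range(1, 12)],            # 11 hub genes
    "Drug_Y": ["H1", "H2"],                                # 2 hub genes
    "Drug_Z": ["H1", "H2", "H3", "H4", "H5", "OTHER"],     # 5 hub genes, 1 non-hub
}
counts, broad, narrow = summarize_target_counts(drug_to_genes, hub_genes)
print(counts, broad, narrow)
```

Genes outside the hub list (such as OTHER in the toy data) are ignored, so the counts reflect hub coverage only.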

Discussion
A strong association between COVID-19 and IPF has been previously reported [4,20,21], and IPF has been reported as a risk factor for COVID-19 [4]. Conversely, anti-fibrosis therapies are available for inhibiting severe COVID-19 progression [4]. Moreover, COVID-19 has changed the approach to treating IPF patients, since SARS-CoV-2 infection is reported to affect the prognosis of IPF patients [22]. The link between COVID-19 and IPF is presumed to involve genes that are up- or downregulated in both diseases. One COVID-19 dataset and five IPF datasets were analyzed; the latter comprise only lung samples. These datasets were published from 2011 to 2019 and range from America to East Asia, to ensure that our study is broadly representative. Our finding of aberrantly expressed genes from 6 GEO datasets suggested that these DEGs influence the crosstalk between COVID-19 and IPF. In addition, the present study was designed to identify hub genes and predict potential drugs against them, which may provide novel molecular targets for new COVID-19 strategies that include antifibrotic treatment. Given that common DEGs can drive the development of drugs against COVID-19 and IPF, we concentrated on DEG-related function after dividing the DEGs into upregulated and downregulated genes.

PLOS ONE
Role of differentially expressed genes common in IPF and COVID-19

Apart from immune response and defense response to virus, it is somewhat surprising that the upregulated DEGs are enriched in inflammatory molecules, especially cytokine-related functions. The type I interferon signaling pathway and the cytokine-mediated signaling pathway were mainly related to the upregulated DEGs. An association between type I interferon and IPF has been reported, in which the type I interferon pathway may drive chronic inflammation and fibrosis [23]. The type I interferon response was amplified according to ex vivo evidence from IPF [24]. Similar cytokine profiles have been reported in IPF and COVID-19 [22], which is consistent with the observation that serum levels of profibrotic mediators are increased in COVID-19 patients. Our finding provides evidence supporting antifibrotic therapy for COVID-19 patients through the mediation of cytokine signaling. For the genes downregulated in both COVID-19 and IPF, the biological function was enriched in the disassembly of cellular components and extracellular matrix. The pathological changes in IPF develop from an alteration of the extracellular matrix, which can replace healthy lung tissue and contributes to the deterioration of lung compliance [25]. The lung architecture is destroyed by the secretion of excessive amounts of extracellular matrix from fibroblast and myofibroblast foci [26]. Our findings are in accordance with this previous research. In our study, matrix metalloproteinases (MMPs), which account for the disassembly of extracellular matrix, were downregulated in both COVID-19 and IPF. These findings suggest that reduced expression of these genes in COVID-19 patients might induce progression to fibrosis.
The common hub genes between COVID-19 and IPF were the most strongly associated among all DEGs. The hub genes were indeed relevant to IPF progression. For example, our study revealed that several hub genes were related to the interferon signaling pathway, which has been shown to influence IPF treatment. In addition, 19 hub genes were involved in the enrichment of chemokines. Previous research showed that the chemokine CCL2 and its downstream pathways are key to the development of IPF [27]. Our findings from the PPI network analysis were consistent with the functional enrichment described above, suggesting that these hub genes could be novel therapeutic targets shared by COVID-19 and IPF.
Considering that the hub genes played a vital role in the crosstalk between COVID-19 and IPF, we used the hub genes to identify potential candidate drugs. We found several candidate drugs that might contribute to the treatment of COVID-19 and IPF. Among all candidate drugs, the current study highlights the 10 most significant. Among them, drugs that target exogenous pathogens may be an important approach, for example suloctidil, which has been suggested as a potential antifungal agent [28]. 3'-Azido-3'-deoxythymidine CTD 00007047 has been used as an anti-viral agent and a reverse transcriptase inhibitor active against HIV-1, and thioridazine has been shown to exhibit anti-viral activity [29]. Moreover, a previous study revealed that myofibroblast activation and uncontrolled proliferation link IPF with cancer [30]. Several candidate drugs exhibit anticancer activities. Chlorophyllin CTD 00000324 was found to deactivate ERKs and inhibit breast cancer cell proliferation [31]. Prochlorperazine has been shown to exhibit anticancer activity towards different types of human cancer [32]. Terfenadine, which was shown to be effective against PC-3 and DU-145 cells (two prostate cancer cell lines) by inducing apoptosis [33], and etoposide were identified as anticancer drugs because they induce cancer cell apoptosis [34]. It can be assumed that candidate drugs that possess anticancer activity through the inhibition of cell proliferation and fibroblast activation might contribute to the treatment of IPF and COVID-19. In summary, the present study raises the possibility that existing drugs and compounds may be available for the development of COVID-19 therapy.
Although IPF and COVID-19 share risk factors, our study provides insufficient evidence to support the clinical use of the candidate drugs for COVID-19 and IPF treatment. Because of this limitation, the downstream molecules of the hub genes should be determined in the future, and the role of the hub genes in the crosstalk between COVID-19 and IPF should be confirmed using clinical samples and experimental models. Although research on COVID-19 is ongoing and the COVID-19 data in GEO are growing rapidly, the GSE147507 dataset has been verified to be reliable with solid evidence. Our conclusions were based on the responses of 5 GSE datasets in the GEO database and so might not reflect the processes observed in in vivo and in vitro experiments.

Conclusion
In summary, our results provide the common DEGs between COVID-19 and IPF, which adds to the accumulating evidence supporting antifibrotic therapy for COVID-19 patients in the pulmonology ward. Through a series of enrichment analyses, we offer new insights into the development of COVID-19 treatment on the basis of biological function. The current study unveiled a potential role of hub genes in COVID-19 and IPF, contributing to a combined COVID-19 treatment. Moreover, our findings offer some suggestions for therapeutic target identification in diseases caused by the Delta SARS-CoV-2 variant, once the risk factors that Delta shares with a distinct disease are uncovered.

Table 3. Prediction of top 10 candidate drugs for highly expressed hub genes.

Name of drugs P-value Adjusted P-value Genes
Supporting information S1 Fig