Comprehensive analysis of an immune infiltrate-related competitive endogenous RNA network reveals potential prognostic biomarkers for non-small cell lung cancer

Globally, non-small cell lung cancer (NSCLC) is the most common malignancy and its prognosis remains poor because of the lack of reliable early diagnostic biomarkers. The competitive endogenous RNA (ceRNA) network plays an important role in the tumorigenesis and prognosis of NSCLC. Tumor immune microenvironment (TIME) is valuable for predicting the response to immunotherapy and determining the prognosis of NSCLC patients. To understand the TIME-related ceRNA network, the RNA profiling datasets from the Genotype-Tissue Expression and The Cancer Genome Atlas databases were analyzed to identify the mRNAs, microRNAs, and lncRNAs associated with the differentially expressed genes. Weighted gene co-expression network analysis revealed that the brown module of mRNAs and the turquoise module of lncRNAs were the most important. Interactions among microRNAs, lncRNAs, and mRNAs were prognosticated using miRcode, miRDB, TargetScan, miRTarBase, and starBase databases. A prognostic model consisting of 13 mRNAs was established using univariate and multivariate Cox regression analyses and validated by the receiver operating characteristic (ROC) curve. The 22 immune infiltrating cell types were analyzed using the CIBERSORT algorithm, and results showed that the high-risk score of this model was related to poor prognosis and an immunosuppressive TIME. A lncRNA–miRNA–mRNA ceRNA network that included 69 differentially expressed lncRNAs (DElncRNAs) was constructed based on the five mRNAs obtained from the prognostic model. ROC survival analysis further showed that the seven DElncRNAs had a substantial prognostic value for the overall survival (OS) in NSCLC patients; the area under the curve was 0.65. In addition, the high-risk group showed drug resistance to several chemotherapeutic and targeted drugs including cisplatin, paclitaxel, docetaxel, gemcitabine, and gefitinib. The differential expression of five mRNAs and seven lncRNAs in the ceRNA network was supported by the results of the HPA database and RT-qPCR analyses. This comprehensive analysis of a ceRNA network identified a set of biomarkers for prognosis and TIME prediction in NSCLC.


Introduction
Lung cancer is the leading cause of cancer-related deaths [1]. Non-small cell lung cancer (NSCLC) accounts for approximately 83% of all lung cancers [2]; its two dominant histological phenotypes include lung adenocarcinoma (LUAD,~50%) and lung squamous cell carcinoma (LUSC,~40%) [3]. Although remarkable advances have been made for lung cancer diagnoses and treatment strategies, 60-month overall survival (OS) rate and 5-year survival rates remain poor (68% and 0%-10% at stages IB and IV, respectively) [4]. Thus, accurate detection of NSCLC at an early stage can provide a good prognosis. However, localized diseases at stages I or II are diagnosed in only 16% of patients [5]. A low-dose computerized tomography scan is currently the most practical method for the early diagnosis of lung cancer [6]. However, its high false-positive rate of up to 96.4% [7] requires greater medical attention than needed and is an unnecessary psychological burden for the patients. Diagnostic precision can be enhanced by developing biomarkers that can accurately classify the patients according to their probable disease risk, diagnosis, and prognosis or response to treatment [8]. In addition, functional biomarkers with known underlying mechanisms related to the disease can be used as potential therapeutic targets [8]. For instance, identification of epidermal growth factor receptor (EGFR), anaplastic lymphoma kinase (ALK), c-ros proto-oncogene 1 receptor tyrosine kinase (ROS1), kirsten rat sarcoma viral oncogene homolog (KRAS), serine/threonine-protein kinase B-Raf (BRAF), mesenchymal-epithelial transition factor (MET), proto-oncogene tyrosine-protein kinase receptor Ret (RET), human epidermal growth factor receptor 2 (HER2), and neurotrophic receptor tyrosine kinase (NTRK) tyrosine kinase inhibitors (TKIs) have improved the outcomes for oncogenepredisposed NSCLC patients [9]. However, patient responses to TKIs are usually temporary because tumors eventually develop resistance to targeted therapies through on-or off-target mechanisms [10]. NSCLC patients show a positive response to immune checkpoint inhibitors (ICIs) that target programmed cell death-1 (PD-1)/programmed cell death ligand-1 (PD-L1) interaction [11]. Nevertheless, most NSCLC patients do not show any initial responses to ICIs, whereas others who respond for a certain period relapse due to an immunosuppressive microenvironment [12]. Thus, future studies should focus on possible options such as combination strategies involving an agent that can address a specific on-or off-target resistance mechanism upfront or during disease progression [12,13]. Therefore, the identification of novel molecular network biomarkers for early screen to improve prognosis and treatment in NSCLC is needed.
Cellular and molecular immune markers in the tumor immune microenvironment (TIME) play an important role in NSCLC development. The density and localization of tumor-infiltrating immune cells greatly affect the OS in NSCLC patients [14]. In NSCLC patients, a high intra-tumoral density of immature dendritic cells, regulatory T cells, and M2 macrophages has been associated with poor prognosis, whereas a high intra-tumoral density of CD8+ T cells, CD4+ T cells, M1 macrophages, mature dendritic cells, and natural killer cells has been correlated with better prognosis [14]. PD-1 is the main immune checkpoint for immunotherapy in NSCLC. Several meta-analyses have recently shown a possible link between the PD-1/PD-L1 axis and the prognosis of NSCLC patients [15]. The expression of chemokines, including CXCL9, CXCL -10, and CXCL -16, is also correlated with the prognosis of these patients [16]. Thus, the complex regulation of TIME in NSCLC must be examined to identify effective immune targets for accurate prognosis and increase the efficacy of clinical immunotherapy. Pandolfi et al. [17] first proposed the competing endogenous RNA (ceRNA) hypothesis. Accordingly, a post-transcriptional regulatory network allows all RNA transcripts including mRNAs, long non-coding RNAs (lncRNAs), and circular RNAs, to regulate each other, by competing for shared microRNAs (miRNAs). miRNA response elements (MREs) are the keystone for the competitive binding of miRNAs with lncRNAs and mRNAs [18]. ceRNAs participate in the pathogenesis of multiple cancers, including NSCLC, and their abnormal expression is substantially associated with the prognosis of the patients [19,20]. Jiang et al. [21] constructed a ceRNA network to identify a prognostic signature in bladder cancer. Liu et al. [22] developed the seven-lncRNA prognostic signature in melanoma through an integrative analysis of the DElncRNA-DEmiRNA-DEmRNA ceRNA network. For the ceRNA network of NSCLC, through gene expression studies, some biomarkers associated with prognosis and tumor progression have been identified [23,24]. Based on the alternative splicing events, Li et al. [25] developed a high-performance prognostic predictor model for risk stratification in patients with NSCLC. The above studies have focused on the comparative analyses between limited peritumoral samples and paired tumor samples from The Cancer Genome Atlas (TCGA) database. However, this paradigm is far from ideal because of the differences in the transcriptomic profiles between para-tumor and healthy tissues [26]. The Genotype-Tissue Expression (GTEx) dataset is a large-scale public resource for genome-wide association studies in healthy human tissues [27]. GTEx and TCGA databases allow unprecedented studies of gene expression to address the limitation [28]. Weighted gene co-expression network analysis (WGCNA) is an effective approach to examine the intrinsic links between clinical outcomes and co-expression gene modules [29]; however, only a few studies have employed this method to analyze the relevance of co-expression patterns for ceRNAs.

PLOS ONE
In this study, a TIME-associated ceRNA regulatory network for the prognostic prediction of NSCLC was constructed using the data from TCGA and GTEx databases by multiple bioinformatic analytic methods including WGCNA and The Cell Type Identification by Estimating Relative Subsets of RNA Transcripts (CIBERSORT). Fig 1 depicts the study design. The RNA expression levels in 944 NSCLC samples and 578 normal lung samples were evaluated. A total of 1838 significantly up-regulated mRNAs and 3097 down-regulated mRNAs were identified and termed as differentially expressed mRNAs (DEmRNAs) in NSCLC; their distribution is shown in Fig 2A. Gene Ontology (GO) analysis was performed to identify the enriched biological mechanisms underlying DEmRNAs, and the GO enrichment results are shown in Fig 2B. In the biological process (BP), DEmRNAs were significantly enriched in cornification, epidermis development, skin development, keratinization, extracellular structure organization, and keratinocyte differentiation. The Z-score of GO enrichment was closer to blue module, which indicated that most of the BPs were likely underrepresented ( Fig 2C). Gene Set Enrichment Analysis (GSEA) showed that the N-glycan biosynthesis, p53 signaling pathway, cell cycle, and DNA replication-related genes were up-regulated, whereas calcium, VEGF, mTOR, and MAPK signaling pathways were significantly down-regulated ( Fig 2D).

Analyses of mRNA modules by WGCNA
The top 80% mRNAs (14428 genes) obtained from variance comparison were used to construct the gene modules using the WGCNA algorithm. Soft-thresholding was set at β = 8 and module size cut-off at 25 to characterize a scale-free network. After the co-expressed genes were categorized by average linkage clustering, a total of 35 gene color modules were identified ( Fig 3A). Unassigned genes were categorized into the gray module. A total of 1000 genes from the mRNA color modules were randomly chosen to plot the network heatmap and visualize the topological overlap matrix (TOM) (Fig 3B). Among the 35 color modules, the module eigengene (ME) of the brown module showed the highest association with NSCLC and normal traits ( Fig 3C). Thus, the brown module containing 9041 genes was considered as the key module for NSCLC. GO term enrichment analysis was performed for the mRNAs in the brown module and the functional enrichment and gene interactions in BPs were identified (Fig 3D  and 3E). These genes were mainly related to the developmental processes in reproduction, multi-organism reproductive processes, and cell cycle regulation; these were clustered in focal adhesion, protein processing in the endoplasmic reticulum, mTOR signaling pathway, and autophagy according to the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis ( Fig 3F).

Analysis of lncRNA modules by WGCNA and prediction of target genes
The constructed lncRNA network was used to identify the hub modules. The top 60% lncRNAs (8102 genes) obtained from variance comparison were analyzed by WGCNA. Softthresholding was set at β = 3 and module size cut-off at 25. A total of 23 co-expressed lncRNA modules were identified ( Fig 4A). Among them, the turquoise module, which included 1587 lncRNAs as shown in Fig 4B, was the most significantly associated with NSCLC (r = 0.92). Furthermore, lncRNAs in the turquoise module had the most remarkable correlation with gene significance for NSCLC in all modules ( Fig 4C, cor = 0.96). A total of 316 differentially expressed miRNAs (DEmiRNAs) were obtained from the comparison of TCGA miRNA-seq data between 961 NSCLC and 90 normal lung tissue samples ( Fig 4D). The lncRNA-miRcode-miRNA relationship between 1587 differentially expressed lncRNAs (DElncRNAs) in the turquoise module and 316 DEmiRNAs was evaluated using the online miRcode tool. Overlapping miRNAs (60) between lncRNA-miRcode-miRNA (326) and DEmiRNAs (316) were then obtained. Using miRNA target prediction tools, 60 common miRNAs from the intersection were used for subsequent analyses. A total of 8704 predicted target mRNAs were acquired ( Fig 4E). As shown in Fig

Construction of 13-mRNA-based prognostic model to predict chemotherapeutic response
TCGA patient data including OS and clinical features were obtained for 944 NSCLC patients. The clinicopathological characteristics of NSCLC patients are shown in Table 1. A univariate regression analysis was performed to investigate the relevance of the expression levels of the 1094 target genes with OS in NSCLC. The cut-off for significant association was set at p < 0.05. As a result, 31 genes were obtained and further included in the subsequent multivariate cox regression analysis (    Fig 5B displays the correlation between the expressions of each gene in the 13 gene-based model. The risk assessment scoring was computed for each NSCLC patient, and the threshold for risk score was set at 1.006. Accordingly, the patients were divided into high-(n = 471) and low-risk (n = 473) groups as shown in Table 3. The accuracy of the 13 gene-based prognostic model in predicting NSCLC clinical outcomes were estimated by Kaplan-Meier (K-M) survival analysis, Cox proportional hazard regression model, and ROC. As shown in Fig 5C, the K-M survival curves showed that OS in the high-risk group was significantly shorter (P < 0.001) than that for patients with low-risk prediction. Fig 5D shows the comparison between the risk score distribution, survival status, and expression heatmap for 13 genes between the high-and low-risk groups. According to the univariate analysis, OS was significantly associated with the risk score and the TNM, T, N, and M stages (P<0.001) ( Fig 5E). However, multivariate analysis indicated that among the above-mentioned prognostic factors, the risk score was the only independent prognostic predictor (HR = 1.960, 95% CI = 1.689-2.275, P<0.001) (Fig 5F). The area under the curve (AUC) for the risk score was 0.690 ( Fig 5G); higher than that for other clinical factors. The relationship between the clinicopathological parameters and the 13 genebased prognostic model is shown in Fig 5H. The values for the prognostic prediction model were higher in T3-4 than in T1-2 stages (P = 0.011); in N1-3 than in N0 (P < 0.001), and in stages III-IV than in stages I-II (P < 0.001). The estimated half-maximal inhibitory concentrations (IC50) of three commonly used NSCLC chemotherapeutic agents, including, cisplatin, docetaxel, and paclitaxel, were compared between the low-and high-risk groups of patients using the pRRophetic algorithm. The three drugs showed higher IC50 values and lower sensitivity in the high-risk as compared to low-risk groups (all P < 0.01) ( Fig 5I). Thus, the result suggested that high-risk patients with NSCLC exhibit resistance to cisplatin, docetaxel, and paclitaxel. Taken together, the prognostic model was effective in predicting the therapeutic response of high-and low-risk NSCLC patients towards the three drugs.   (Fig 6A). However, only three miRNAs were further selected. These three miRNAs were retained after overlapping with the 29 predicted miRNAs. Thus, five mRNAs were obtained from the 13 prognostic mRNAs which corresponded to the three miRNAs ( Fig 6B). Finally, a ceRNA network containing five mRNAs, three miRNAs, and 69 lncRNAs was constructed as shown in Fig 6C. The relationships between clinicopathological parameters and expression of the five mRNAs in the ceRNA network are shown in Fig 6D.  Relationship between the risk score from ceRNA network and tumor immune microenvironment

Construction of a ceRNA network
The differences in the 22 infiltrating immune cell types were determined between low-and high-risk NSCLC patients using the CIBERSORT algorithm. Fig 7A summarizes the proportion of immune cell types in 944 patients from the TCGA database. The high-risk patients had higher infiltration levels of activated CD4 memory and follicular helper T cells (Fig 7B and  7D), whereas greater infiltration of CD4 memory resting T cells, M0 macrophages, resting dendritic cells, and neutrophils were observed in low-risk patients (Fig 7B and 7E-7H). Given the important role of chemokines and immune checkpoints in TIME and immune response, we analyzed the correlation between the expression of these molecules and the risk score. The expression of PD-1, positively associated with the risk score, was significantly upregulated in the high-risk group as compared to the low-risk group (Fig 7I and 7J). In addition, the expression of chemokines for immune activation (i.e., CXCL9, CXCL10, and CXCL16) was significantly lower in the high-risk group than in the low-risk group (Fig 7K). These results indicated that ceRNA risk signatures may be implicated in the NSCLC immunosuppressive microenvironment.

PLOS ONE
Immune microenvironmental prognostic ceRNA in NSCLC The results of lncRNA analysis were similar to the findings through mRNA analysis. Univariate regression analysis was performed to evaluate the relevance of the expression of the 69 lncRNAs from the ceRNA network with OS. Nine lncRNAs were obtained by setting the cutoff for significant association at P < 0.05. These were further included for the subsequent multivariate cox regression analysis (Table 4). A prognosis model for OS was constructed with the following seven lncRNAs: LINC00707, LINC00460, FEZF1-AS1, LINC00593, OTX2-AS1, LINC01833, and CASC8. The risk score was calculated for each NSCLC patient, and the threshold was set at 0.9755. Accordingly, the patients were classified into high-(n = 471) and low-risk (n = 473) groups. Fig 8A shows the distribution of risk scores for NSCLC patients, the survival status in different groups, and the heatmap of the expression of seven lncRNAs. The accuracy of the seven-lncRNA-based prognostic model in predicting NSCLC clinical outcomes was estimated by K-M survival analysis, Cox regression model, and ROC. As shown in Fig 8B, the K-M survival curves showed that OS in the high-risk group was significantly shorter than that in patients with low predicted risk (P < 0.001). According to the univariate analysis, the risk score and TNM, T, N, and M stages were significantly correlated with OS in NSCLC (P<0.001) (Fig 8C). However, multivariate analysis results showed that among them, only the risk score was an independent prognostic predictor (HR = 2.050, 95% CI = 1.702-2.470, P<0.001) (Fig 8D). The AUC for risk score was 0.650 ( Fig 8E); higher than those for other clinical factors. Fig 8F illustrates the relationship between the clinicopathological parameters and the seven-lncRNA-based model. Higher prognostic prediction values were found in T3-4 than in T1-2 (P = 0.007) stages; in N1-3 than in N0 (P = 0.012), and in stages III-IV than in stages I-II (P = 0.013). Gefitinib, gemcitabine, and cisplatin are drugs commonly used in NSCLC treatment [30]. The estimated IC50 values of these drugs for the low-and high-risk patients were compared using the pRRophetic algorithm. All three drugs had higher IC50 values and lower sensitivity in the high-risk patients as compared to those for low-risk patients (all P < 0.01) (Fig 8G). This result indicated that high-risk NSCLC patients were resistant to gefitinib, gemcitabine, and cisplatin. Based on the seven-lncRNA signature, a prognostic ceRNA network involving five mRNAs of the 13 gene-based prognostic model and three miRNAs were extracted (Fig 8H).

Validation of the risk mRNAs and risk lncRNAs in ceRNA network
We analyzed the immunohistochemical staining data obtained from the Human Protein Atlas (HPA) database to compare the expression levels of five risk genes (FSTL3, CPS1, PTPN21,  The results showed that the expression at the protein level for two risk genes (DEPDC1B and COL9A3) was significantly higher in LUSC or LUAD tissues than in normal lung tissues (Fig 9D and 9E); two genes (FSTL3 and PTPN21) showed an opposing trend (Fig 9A and 9C), which was also consistent at the transcriptional level ( Fig 5A). Only CPS1 expression at the protein level was not detected in both the normal lung and NSCLC tissues in the HPA database (Fig 9B).

Discussion
NSCLC is one of the leading causes of cancer-related death, worldwide [31]. Despite the rapid development in immunotherapy and targeted therapy for NSCLC, the 5-year overall survival of NSCLC patients remains low, due to the lack of availability of effective biomarkers for early diagnosis and a poor understanding of the pathogenesis of NSCLC. Powerful computational models to predict potential disease-related non-coding RNAs for experimental validation are helpful for in-depth interpretation of the pathogenesis and processes underlying NSCLC development and improving related treatment strategies, and may dramatically decrease the time and expenditure on biological experiments [32][33][34][35]. The ceRNA network hypothesis was proposed to describe a new RNA regulatory crosstalk where different RNA transcripts can potentially regulate each other through MREs [17]. This concept unfolds a novel paradigm for the lncRNA-miRNA-mRNA regulatory network that may explain the mechanism underlying tumorigenesis. Recently, the advancement in interaction prediction studies using computational biology provides valuable insights into the development of lncRNA-mediated ceRNA network [36][37][38]. TIME composition is strongly associated with neoplastic progression, antitumor immune response, and clinical outcome in NSCLC [14,39]. Therefore, an immunerelated ceRNA network must be explored to improve the accuracy of prognostic prediction and clinical response to immunotherapy. To the best of our knowledge, this is the first study that establishes an immune-related ceRNA prognostic model based on GTEx-TCGA data, WGCNA, pRRophetic algorithm, and TIME in NSCLC.
In this study, the RNA-Seq data from 944 NSCLC and 578 normal lung specimens were acquired from TCGA and GTEx databases, respectively. Key mRNAs and lncRNA signatures associated with NSCLC prognosis were identified using bioinformatic methods, including edgeR, WGCNA, GO term enrichment analysis, GSEA, KEGG pathway enrichment analysis, ceRNA network construction, and Cox regression analyses. A ceRNA network containing five mRNA signatures, seven lncRNA signatures, and three miRNAs were significantly related to OS in NSCLC patients. Additionally, correlations between the risk score of ceRNA prognostic model and TIME, and with sensitivity towards cancer drugs were investigated. Finally,

PLOS ONE
Immune microenvironmental prognostic ceRNA in NSCLC immunohistochemical data from the HPA database and RT-qPCR were used to validate the findings from this study.
The lncRNA-miRNA-mRNA ceRNA network has an important influence on NSCLC pathogenesis and prognosis. Wang et al. [23] investigated a ceRNA network including 113 DElncRNAs, 12 DEmiRNAs, and 36 DEmRNAs with significant association with NSCLC. Wang et al. [40] constructed a ceRNA network involving 155 lncRNAs, 30 miRNAs, and 68 mRNAs to identify novel targets and pathways in NSCLC. In both these studies, the tumor and non-tumor samples were obtained from the same patients from the TCGA database to accurately elucidate the ceRNA interactive mechanism underlying NSCLC biology. Their findings highlight the prognostic value of ceRNA for NSCLC patients. However, the number of normal samples in the TCGA-NSCLC cohort is limited. Biomolecules secreted by cancer cells can be engulfed by the neighboring normal cells, thereby altering the gene expression and biological processes in these two cells [41]. The idea that cancer-related genetic aberrancy also exists in para cancer tissues adjacent to the highly genetically abnormal tumor tissues may be biased [42]. Hence, the use of a large number of normal samples from individuals with no tumors is necessary [43]. In the present work, edgeR was used to identify the DEmRNAs between NSCLC and normal lung samples. KEGG-GSEA results showed that DEmRNAs were enriched in the cell cycle, DNA replication-related genes, and the mTOR pathway. Further WGCNA analysis showed that the brown gene modules were significantly related to NSCLC. GO term enrichment results showed that the mRNAs in the brown modules were involved in cell cycle regulation, and KEGG analysis showed that they were clustered in the mTOR signaling pathway and DNA replication processes. PI3K/Akt/mTOR signaling commonly occurs in tumors and controls DNA replication and replisome stability, thereby regulating cell cycle progression [44]. Therefore, PI3K/Akt/mTOR signaling may be an essential mechanism underlying NSCLC initiation and progression.
Only a few ceRNA network studies discuss the relationship of expression patterns among ceRNAs associated with NSCLC. WGCNA was used to identify the co-expression modules in NSCLC using the data from the TCGA and GTEx consortia. GTEx provides abundant data on gene expression from healthy people, thus benefitting the tumor purity quantification of the samples [41]. The WGCNA-brown mRNA module and the WGCNA-turquoise lncRNA module were the key modules for NSCLC. Among the predicted mRNAs, DEmRNAs, and mRNAs in the brown module, 1094 overlapping mRNAs were identified. Furthermore, a survival prognostic model with 13 genes (FSTL3, CPS1, PTPN21, DEPDC1B, COL9A3, DSG2, LAMB1, STYK1, RBM6, DEPDC1, GTSE1, NAV3, and FKBP5) was constructed for NSCLC through K-M estimator and Cox proportional hazard regression model.
Non-coding RNAs can promote PD-L1/PD-1 expression through ceRNA regulation of NSCLC mechanisms [45]. However, given the complexity of the immune system, knowledge of a single biomarker that can precisely identify patients who can potentially benefit from ICIs is lacking. Immune-related gene signatures, immune infiltrating cells, and chemokines are important in predicting the clinical response to immunotherapy and prognosis of NSCLC patients [14]. A previous study shows that activated memory CD4 T cells and helper T cells are positively correlated with improvement in survival AMONG lung cancer patients [14]. By contrast, suppressive immune cells, including resting dendritic cells, M0 macrophages, and neutrophils, form an immunosuppressive microenvironment [46] and indicate a poorer prognosis. Integrated analysis of the correlation between ceRNA and TIME is necessary to improve the accuracy of prognostic predictions. The ceRNA signature in NSCLC is related to immune infiltrating cell types and PD-1 expression. Results from the execution of the CIBER-SORT algorithm showed that the proportion of activated CD4 memory T cells and follicular helper T cells was positively correlated with the risk score of 13 gene-based ceRNA network, and the proportion of CD4 memory resting T cells, M0 macrophages, resting dendritic cells, and neutrophils was negatively associated with the 13 gene-based ceRNA risk score. Chemokines, especially CXCL9, CXCL-10, and CXCL11/CXCR3 axis, play key roles in TIME and immune response [39]. Reduced levels of CXCL9 and CXCL10 are correlated with the immunosuppressive microenvironment, negative responses to ICIs, and poor prognosis among cancer patients [47]. In the present study, CXCL9 and CXCL10 were downregulated in the highrisk group, which further promoted immunosuppressive TIME. Taken together, our ceRNA risk model may effectively predict the TIME in NSCLC patients.
Furthermore, five mRNAs were obtained from the 13 gene-based prognostic model which corresponded to the three miRNAs selected from the prediction from among the 69 DElncR-NAs. Based on these five mRNAs (FKBP5, DEPDC1, CPS1, COL9A3, and PTPN21), a ceRNA network that included 69 DElncRNAs and three DEmiRNAs (miR-17-5p, miR-20b-5p, and miR-429) was further established. Survival analysis revealed that seven DElncRNAs (LINC00707, LINC00460, FEZF1-AS1, LINC00593, OTX2-AS1, LINC01833, and CASC8) in the ceRNA network were associated with the OS in NSCLC patients. Finally, the prognostic ceRNAs from a coregulatory network involving five mRNAs from the 13-gene-based prognostic model and three DEmiRNAs were identified based on the seven-lncRNA signature.
Although chemotherapy and targeted anti-cancer therapies have improved the outcomes in NSCLC patients, resistance to chemotherapeutic and targeted drugs pose a serious challenge. The mechanisms of drug resistance remain unclear owing to the lack of approaches to accurately predict clinical responses to drugs. In this study, we predicted the association between agents and signatures in the ceRNA network using the pRRophetic algorithm. Our results demonstrated that the signatures in the ceRNA network were correlated with resistance to chemotherapeutic and targeted drugs, including cisplatin, paclitaxel, docetaxel, gemcitabine, and gefitinib, thus allowing prediction of drug response for developing personalized treatment of NSCLC patients.
In the five-gene signatures in the ceRNA network, the expression levels of DEPDC1, CPS1, COL9A3, and PTPN21 were significantly different between early-and advanced-stage tumor tissues. Meanwhile, FKBP5 was associated with a subtype of NSCLC. In progressive stages of NSCLC, DEPDC1, CPS1, and COL9A3 were overexpressed, while PTPN21 was down-regulated. DEPDC1 is a putative oncogene [48] that can function as a transcriptional co-repressor during transcriptional regulation [49]. The results of a previous study strongly indicate that DEPDC1 expression is positively related to the poor prognosis in NSCLC [50], which is consistent with the findings of the present study. Another previous study reports that DEPDC1 promotes angiogenesis and invasion by activating chemokines in hepatocellular carcinoma [51]. CPS1 is the mitochondrial enzyme involved in the first committed step of the urea cycle [52]. Its expression is positively correlated with tumor growth and is also associated with poor NSCLC prognosis [53]. COL9A3 encodes instructions for generating one of the three alpha chains of type IX collagen [54] and is highly expressed in salivary adenoid cysts, breast basallike carcinomas, and melanoma [55]. However, its function in cancer remains unknown. The correlation between COL9A3 and NSCLC has not yet been reported. The present study is the first to show that abnormal COL9A3 expression is associated with the tumor size in NSCLC patients. PTPN21, also called PTPD-1, is a member of the protein tyrosine phosphatase (PTP) family and controls cell growth, migration, and oncogenic transformation [56]. Given its inverse correlation with tumor invasion in bladder cancer tissues [57] and its immunosuppressive function [58], PTPN21 may influence the early stages of tumor progression by inhibiting the immunosuppressive TIME and tumor invasion. This phenomenon could explain the high expression of PTPN21 in the N0 stage.
miR-17-5p, miR-20b-5p, and miR-429 are all involved in lung cancer [59][60][61]. miR-429, a member of the miRNA-200 family, functions as a tumor promoter in human NSCLC cells [62]. It is also positively correlated with the expression of immune checkpoints in NSCLC patients [63]. miR-429 is known to regulate six lncRNAs (LINC00707, LINC00460, FEZF1-AS1, LINC00593, LINC01833, and CASC8) out of the seven lncRNAs in the proposed prognostic model. Among them, LINC00707 is a novel long intergenic non-coding RNA that enhances sensitivity towards cisplatin in NSCLC cells [64], regulates mRNA stability, and can function as a potential diagnostic and prognostic marker in gastric cancer [65]. LINC00460 promotes cell migration and invasion in NSCLC; it is a potential prognostic marker and a treatment target for patients with NSCLC [66]. FEZF1-AS1 can promote the proliferation and invasion in tumor cells by suppressing Wnt/β-catenin signaling in NSCLC [67]. lncRNA CASC8A is a potential diagnostic biomarker for cancer [68] which can predict treatment response to platinum-based therapy and toxicity in patients with lung cancer [69]. Previous research findings corroborate our prediction of the ceRNA network and the prognostic models. Using the 13-mRNA-based prediction model and the seven-lncRNA-based prognostic model, we estimated the risk scores of NSCLC patients. The AUC of the ROC curve was > 0.65, which indicated that the prediction value had high accuracy in these prognostic models. Patients with NSCLC belonging to the high-risk group exhibited a high risk of progression according to the ceRNA network. Thus, these results provide a strong basis for the potential use of these novel predictive biomarkers for NSCLC prognosis and may guide molecular treatment approaches and further clinical research in NSCLC.
However, there were some inherent limitations in this study. First, was the lack of robust experimental verification. The function and mechanism of key ceRNAs should be further explored in the in vivo and in vitro settings. Second, was the need for prospective clinical trials to validate the prognosticators of the five-mRNA and seven-lncRNA signatures in the ceRNA network. Third, some superior computational algorithms could be used for biomarker identification [32][33][34][35] and prediction of miRNA-lncRNA interactions [36][37][38] in NSCLC. Finally, only the correlation between the 13-gene-based ceRNA prognostic model and NSCLC TIME was investigated, but further research on the predicted effect of the relevant lncRNAs in the microenvironment remains unknown. Despite these limitations, this study provided reliable prognostic markers that could be used to predict TIME and outcome in NSCLC patients. A multi-dimensional perspective was also proposed for the molecular regulatory mechanism of NSCLC through the TIME-related ceRNA network.

TCGA and GTEx RNA sequencing datasets
The RNA-seq data for 455 LUAD samples and 489 LUSC samples were obtained from the TCGA dataset (https://gdc.cancer.gov/). All data for normal tissue samples were obtained from the GTEx Analysis V8 release version (https://gtexportal.org/home/datasets) which included a total of 578 lung samples. The miRNA sequence data of 455 tumor tissues with 45 control samples from LUAD patients and 489 tumor tissues with 45 adjacent non-cancerous lung tissues from LUSC patients were retrieved from the TCGA database.

Identification of differentially expressed genes
The ensemble IDs of NSCLC samples were transformed into gene symbols using The Human GENCODE Gene Set-Release 34 version (https://www.gencodegenes.org/human/). The trimmed mean of the M-values was used to normalize the raw RNA-seq data. DEmRNAs, DElncRNAs, and DEmiRNAs between NSCLC and normal lung tissues were identified using the edgeR package in the Bioconductor project (version 3.11; http://www.bioconductor.org/). Absolute |logFC| > 1.5 and statistically adjusted P value of < 0.05 (P < 0.05) were considered as statistically significant DEmRNAs; |LogFC| > 1 and P < 0.05 were considered as statistically significant DEmiRNAs, and |LogFC| > 4 and P < 0.05 were considered as statistically significant DElncRNAs. The volcano map of DEmRNA and DEmiRNA expressions was visualized using the ggplot2 package (version 3.3.1, https://github.com/tidyverse/ggplot2) in R.

Gene function and pathway enrichment analysis
ClusterProfiler R package [70] was used to perform GO functional annotation, KEGG pathway enrichment, and GSEA analyses for DEmRNAs. GO analysis was conducted to describe gene functions including BPs, cellular components (CCs), and molecular functions (MFs). KEGG-GSEA was performed for identification of the enriched signaling pathways, and P < 0.05 was considered as statistically significant enrichment.

Co-expression network construction by WGCNA
The WGCNA package in R [71] was utilized to establish a co-expression network and identify the co-expression modules of mRNAs and lncRNAs. First, the variances of mRNAs and lncRNAs were computed by analyzing the variance across LUAD and LUSC samples. The top 80% mRNAs and the top 60% lncRNAs with the highest variances in WGCNA were chosen. Second, an appropriate soft-thresholding power (β) and a scale-free fit R^2 > 0.9 were set to establish a weighted adjacency matrix based on the scale-free topology criterion. After the modules with the highest correlation coefficient were analyzed, the adjacency matrix was transformed into a topological overlap matrix [72]. Co-expression modules were obtained using the dynamicTreeCut package in R, and their expression was analyzed by ME. After merging the highly correlated modules, the gene's connectivity was calculated using the WGCNA package in R. Finally, GO-GSEA and KEGG analyses showed the functional enrichment of mRNAs in key modules.

Construction of target mRNAs-related prognostic model
Clinical data of NSCLC patients were obtained to validate the potential prognostic applications of the target genes. The correlation between gene expression and patient survival was identified through univariate cox proportional hazard regression analysis using the "survival" package in R. Multivariate Cox proportional hazards regression model was subsequently utilized to construct the prognostic gene-based model. The risk score was calculated for each patient using the following formula: A high-risk score indicated a high risk of poor prognosis. The coefficient of genes, "β", was calculated using the multivariate Cox regression model, and the expression of the corresponding mRNAs was represented as "exp" [73]. Patients with NSCLC were categorized into high-and low-risk groups with the medium value set as the demarcation point. Survival differences between the two groups were examined by K-M analysis and log-rank test. A ROC curve of risk score and the clinical features, including age, gender, and T, N, M, and TNM stages were plotted using the "survival ROC" package in R. The accuracy of the predictions of the prognostic model was evaluated using ROC curves. Associations between risk score or identified genes and clinical features were evaluated by stratified analysis. R software (version 3.6.1) was used for all statistical analyses.

Analyses of tumor-infiltrating immune cells, PD-1, and chemokines
The fractions of 22 types of infiltrating immune cells, PD-1, and chemokines were identified using the gene expression data of NSCLC patients from the TCGA database to establish the relationship between ceRNA network and TIME. The differential immune infiltration levels were compared between the low-and high-risk groups using the CIBERSORT algorithm as previously described [74]. Only immune cells with a CIBERSORT P < 0.05 were selected for subsequent correlation analysis. The correlation coefficient between risk mRNAs in the ceRNA network and immune cell infiltration was evaluated through Pearson analysis.

Construction of the ceRNA co-expression network
Competing endogenous RNA (ceRNA) networks were constructed using prognostic mRNA signatures, predicted miRNAs, and DElncRNAs. mRNA-miRNA interactions were identified using the miRDB, miRTarBase, StarBase, and TargetScan databases. The potential target lncRNAs for DEmiRNAs were identified using miRcode. Overlaps between mRNA-miRNA and lncRNA-miRNA interactions with prognostic mRNAs, DEmiRNAs, and DElncRNAs were further used to construct the ceRNA network. The open-source Cytoscape platform (https://cytoscape.org/) was used to visualize the regulatory relationships in the ceRNA network.

Cox regression analyses for lncRNAs
A Cox proportional hazard regression analysis was performed for lncRNAs and mRNAs. Relevant associations between the lncRNAs in the ceRNA network and the OS in NSCLC patients were evaluated through the univariate Cox proportional hazard regression model. Prognostic lncRNAs were identified by the multivariate Cox regression model. The risk score formula for lncRNA was identical to that of mRNA and was as follows: Risk score ¼ exp mRNA1 � β mRNA1 þ exp mRNA2 � β mRNA2 þ . . . þ exp mRNAn � β mRNAn A high-risk score indicated a high risk of poor prognosis. The regression coefficient of lncRNAs, "β", was obtained through the multivariate Cox regression model. The expression of the corresponding lncRNAs was represented as "exp" [73]. ROC curve and clinicopathological correlation analyses were used to estimate the potential applicability of the prognostic signatures for predicting the NSCLC outcomes.

Estimation of the sensitivity towards chemotherapeutic and targeted molecular agents
The IC50 values and sensitivity of chemotherapeutic and targeted therapeutic drugs were compared between the NSCLC subgroups classified by risk scores using the "pRRophetic" package in R. TCGA mRNA and lncRNA expression levels were analyzed based on the mRNA and lncRNA risk scores, respectively.

Validation of the risk signatures from ceRNA network at protein level
The HPA database was used to validate the protein level expression of five risk mRNAs and compare them at the gene transcriptional level.

Statistical analysis
Statistical analyses for bioinformatic experiments were performed using the previously mentioned R packages in the R software (v.3.6.3). Differential expression levels between groups through RT-qPCR analyses were compared using the parametric one-way ANOVA in Graph-Pad Prism version 8.0 (GraphPad Software) and SPSS 23.0 software (IBM). Statistical significance was considered at P < 0.05.