Construction and analysis of a competing endogenous RNA network to reveal potential prognostic biomarkers for Oral Floor Squamous Cell Carcinoma

Background Patients diagnosed with Oral Floor Squamous Cell Carcinoma (OFSCC) face considerable challenges in physiology and psychology. This study explored prognostic signatures to predict prognosis in OFSCC through a detailed transcriptomic analysis. Method We built an interactive competing endogenous RNA (ceRNA) network that included lncRNAs, miRNAs and mRNAs. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) were used to predict the gene functions and regulatory pathways of mRNAs. Least absolute shrinkage and selection operator algorithm (LASSO) analysis and Cox regression analysis were used to screen prognosis factors. The Kaplan-Meier method was used to analyze the survival rate of prognosis factors. Risk score was used to assess the reliability of the prediction model. Results A specific ceRNA network consisting of 56 mRNAs, 16 miRNAs and 31 lncRNAs was established. Three key genes (HOXC13, TGFBR3, KLHL40) and 4 clinical factors (age, gender, TNM, and clinical stage) were identified and effectively predicted the for survival time. The expression of a gene signature was validated in two external validation cohorts. The signature (areas under the curve of 3 and 5 years were 0.977 and 0.982, respectively) showed high prognostic accuracy in the complete TCGA cohort. Conclusions Our study successfully developed an extensive ceRNA network for OFSCC and further identified a 3-mRNA and 4-clinical-factor signature, which may serve as a biomarker.


Introduction
Oral cancer is one of the most common malignant tumors of the head and neck [1]. Only 10% of patients with advanced metastatic oral cancer survive for 5 years following therapy [2]. The incidence of Oral Floor Squamous Cell Carcinoma (OFSCC) is only inferior to tongue cancer. Surgery is the main radical treatment [3].
MiRNA combined with targeted mRNA and render it untranslatable normally, silencing the corresponding genes. In 2011, Salmena [4] proposed the competitive endogenous RNA (ceRNA) hypothesis, which states that lncRNA could combine competitively with miRNAs, thus eliminating the inhibition of miRNAs on target genes and regulating the expression of target genes.
Therefore, the pathogenesis of ceRNA has attracted considerable attention. In many malignant tumors, such as colon cancer, liver cancer and lung adenocarcinoma, the mechanism of tumorigenesis and development induced by lncRNA-miRNA-mRNA has been elucidated [5].
In the present study, mRNA and mature miRNA material of patients with oral floor OSSC were collected and analyzed. A ceRNA network was designed. Meanwhile, a functional analysis of mRNAs from the ceRNA network was performed. We identified 3-mRNA as novel candidate biomarkers for OFOSSC. The mRNA expression profiles were combined with the clinical features, as the potential model to predict survival.

Patients and TCGA data retrieval
The publicly available TCGA dataset (https://portal.gdc.cancer.gov/) concentrates cancer genetic information from previous studies. This initiative is of considerable significance in the diagnosis and prevention of cancer in future generations. Because all data in this paper were extracted from TCGA, this study strictly followed the publication guidelines approved by TCGA (https://cancergenome.nih.gov/publications/publicationguidelines), and an ethics application was not required. The RNA sequence data (mRNA and lncRNA; Illumina HiSeq RNA-Seq platform), mature miRNA sequence information (Illumina HiSeq miRNA-Seq platform), and clinical data of 54 samples from OFSCC and 3 samples from normal tissues were retrieved and downloaded from TCGA (up to Aug 29, 2019). (Refer to S1 File for retrieval and download files). The RNA material was merged by Perl (See "6.merge_files.pl" in the S1 File) and transformed into lncRNAs (sense_overlapping, lncRNA, 3prime_overlapping_ncrna, pro-cessed_transcript, antisense, sense_intronic) and mRNAs (protein coding) through the Ensembl database (http://www.ensembl.org/index.html, version 89) [6]. Ensembl is a software system capable of automatic annotation and maintenance of eukaryotic genomes. Mature miRNA data manual was extracted and merged by Perl (See "6.merge_files.pl" in the S1 File).
(See the " 7. The RNAs (lncRNAs, miRNAs, mRNAs) expression of the studied samples" folder of S1 File for the sorted data.)

Differential analysis of DERNAs
Differentially expressed RNAs (DERNAs) contain mRNA, miRNA and lncRNA. The Ensembl database was used to separate lncRNAs from mRNAs. We normalized the classified data and conducted gene identification. The false discovery rate (FDR) was used to correct the statistical significance of the multiples test. Log2 fold changes(log2FC) > 1 and FDR < 0.05 were considered to be significant. All of the above operations were implemented by the edgeR software package [7]. Then, we generated volcano plots with three types of DERNAs obtained from the previous step using the pilots package.

Construction of a ceRNA network
MiRcode was used to predict the targeted DEmiRNAs of DElncRNAs [8]. Three databases, Targetscan (http://www.targetscan.org/), miRDB [9] (http://www.mirdb.org/) and miRTar-Base [10], were used to predict the target mRNAs of DEmiRNAs by combined utilization. The intersection of DEmRNAs and targeted mRNAs was perceived by the VennDiagram software package. The visualization of coexpression constructed by cytoscape 3.7.1. revealed the potential relationship of RNAs in OFSCC

Functional enrichment analysis
To further analyze the functional characteristics of DEmRNAs in OFSCC, GO enrichment and KEGG pathway analyses were performed based on the Database for Annotation, Visualization and Integrated Discovery (DAVID) bioinformatics resources (version 6.8) https:// david.ncifcrf.gov/ [11]. A significance level of P< 0.05 was set as the cutoff criteria.

Survival analysis and a predictive model for prognosis construction
We used the "survival" package in R software for Kaplan-Meier (K-M) survival analysis of DEmRNAs in the ceRNA network.
The DEmRNAs were evaluated using univariate Cox'sproportional hazard regression model. Genes with P-value <0.05 were considered as candidate variables and entered into a stepwise LASSO regression [12][13][14][15] and multivariate Cox regression analysis.

Risk assessment models and ROC curve construction
The risk score was the linear combination of key genes weighted by the regression coefficients. A risk score of patients was determined on the basis of the following Equation.
In this Equation, Exp represents the value of expression of key genes, and Coe represents their corresponding coefficients from the multivariable Cox regression analysis. Finally, we established a prognostic risk score system based on screen-out of mRNAs. A time-dependent receiver operating characteristic (ROC) curve was used to estimate the specificity and sensitivity of the prognosis factors in predicting survival and to estimate the distinguishing and predictive abilities of the risk score system. The AUC of ROC was calculated to predict the 3-or 5-year survival rate of patients with OFSCC.

Validation of the key genes in GEO dataset
Three GEO Datasets (GSE74530 with 6 normal samples and 6 tumor samples, GSE30784 with 45 normal samples and 167 tumor samples, GSE23558 with 5 normal samples and 27 tumor samples) provided external validation of key genes. The adj.P value, log2FC and expression of the key genes in both the TCGA and three GEO datasets were calculated by the limma package.

Combining the key genes with clinical characteristic prediction for OFSCC
To evaluate the prognostic value of different clinical characteristics, such as age, gender, clini-cal_stage, TNM, invasive degree (T stage), lymph node status (N stage), metastasis (M stage), and risk scores of key genes, a Cox proportional hazards model was constructed to investigate the prediction capability of prognosis in OFSCC. The result of the multivariable Cox analysis was displayed by nomogram, which was created with the R package rms. Then, risk score and the ROC curve were used to evaluate the accuracy of the prediction model. Moreover, we investigated the potential prediction ability of prognosis in OFSCC by combining risk score and clinical characteristics (age + stage) using a Kaplan-Meier estimator and log-rank test.

Construction of the ceRNA network
To further understand the role of DERNAs in OFSCC and to further elucidate the interactions between these DEmRNAs, DElncRNAs and DEmiRNAs, we constructed a ceRNA regulatory network of DERNAs step by step.
Step 1: We predicted the miRNAs targeted by the 372 DElncRNAs using the miRcode database.
Step 2: We predicted that 16 miRNAs could interact with the 31 lncRNAs.
Step 4: A total of 71 DEmRNAs were obtained from the intersection of 595 targeted mRNAs and 2198 DEmRNAs.
Step 5: A total of 56 mRNAs were obtained with the reverse trend of related miRNA expression. We constructed the ceRNA network of OFSCC using 56 mRNAs, 16

GO and KEGG analyses
In addition, to elucidate the mechanisms underlying OFSCC and to further understand the functional characteristics of DEmRNAs, GO and KEGG analyses were performed via the DAVID and ggplot2 packages of R software. 16 significantly enriched GO terms and 12 significantly enriched KEGG pathways are listed in Fig 3 and S1 and S2 Tables.

Key genes were screened out by comprehensive analysis
Only 9 of 56 mRNAs in ceRNA network were statistically significant though unvariable Cox analysis. Further, 7 mRNAs with statistical difference were screened out, though lasso analysis (Fig 4). So we carried out survival analysis on 7 mRNA respectively, and found that 4 mRNAs had statistical differences, which were TGFBR3, KLHL4, HOXC13 and HADHB. Though literature analysis, we selected these three mRNAs as key genes for verification (S1 File). TGFBR3, KLHL40, and HOXC13 were screened out, and a predictive mRNA model was constructed (Fig 5A and Table 1). The risk score based on mRNA expression according to their relative coefficient in the multivariate Cox regression was calculated as: Risk score = ExpHOXC13 � 0.16163+ExpKLHL40 � 0.19592+ExpTGFBR3 � (-0.52294).
We calculated the risk score for each patient. With the median as the critical value, patients were divided into high-risk and low-risk groups. (S3 Table and Fig 5B) Survival analysis of 3 key genes was performed using the Kaplan-Meier method with a log-rank statistical test. (Fig 5C-5E) The ROC curve shows that 3 key genes exhibited relatively accurate prognostic capacity for predicting survival rates in OFSCC with AUCs of 0.853 (3-year) and 0.914 (5-year). (Fig 5F)

Validation of the expression of key gene molecules with GEO data
Three studies were screened out from GEO to verify the differential expression of key mRNAs in OFSCC. As shown in Fig 6 and Table 2, the expression trend of the 3 molecules essentially is consistent with our results.

Combining risk score with clinical significance prognostic prediction for OFSCC
Based on the clinical data, 4 clinical (age, gender, TNM, clinical stage) factors were selected, which may affect the prognosis of the cancer. We further combined the clinical features with the risk level of three mRNAs to construct a prognostic model and evaluated the influence of the above factors on the survival time by multivariate Cox regression analysis. In the multivariate analysis, age (age 50-59, age 60-69, age 70-79, age > = 80), gender, T stage, N stage, M stage, clinical stage, and 3 mRNA signatures were associated with the survival time (Table 3). Additionally, we performed a log-rank test and Kaplan-Meier survival analysis for 3 key genes and clinical features to obtain factors that exhibit a close relation to the prognosis and survival time of patients with OFSCC.
Kaplan-Meier curves also showed that patient prognosis separated by the risk levels of 3 key genes, N stage, M stage and clinical stage was significantly different (Fig 7). Patients with lower risk score and tumor grade have obviously better prognosis. We constructed a nomogram that integrated the risk score of a model with 3 mRNAs and clinicopathological features to predict the survival probability of patients with OFSCC. We calculated concordance indexes (c-indexes). The C-index quantified the discrimination between random patients, with a C-index of 0.5 indicating no discrimination and 1 indicating perfect discrimination. The C-index for the model was 0.901 (95% CI: 0.8422-

Discussion
The initial stage of OFSCC is usually asymptomatic, therefore early diagnosis is our priority to optimizing survival rate and the life quality of patients. OFSCC is a molecularly heterogeneous disease, and there are few single genetic driver factors that retain the dominant position in malignant invasive disease identification [16,17]. Therefore, it is essential for earlier detection and targeted treatment options optimization to novel molecular network biomarkers construction. It generally accepted that ceRNA alters gene expression through a mechanism mediated by miRNAs, thereby affecting cell function, which may lead to cancer [18]. We established a ceRNA network including 74 mRNAs, 31 lncRNAs and 16 miRNAs with a series of bioinformation datasets. 3-mRNA (TGFBR3KLHL40 and HOXC13) model was identified in the network and that was associated with the clinical outcome of floor of mouth cancer according to univariate and multivariate Cox proportional regression analyses. In this study, we developed a predictive model based on three mRNAs and four clinical outcomes, and the reliability of the model was also proven. In prostate cancer, breast cancer, renal cell carcinoma and endometrial cancer, TGFBR3 has been identified as a tumor suppressor gene [19][20][21][22][23][24]. Wei Z found that hsa_circ_0042666 regulated laryngeal squamous cell carcinoma (LSCC) cell proliferation and invasion by the miR-223/TGFBR3 axis [25]. HOXC13 has been reported to be correlated with progression from leukoplakia to OFSCC arising in the Gingivo Buccal Complex (GBC) with integrative genome-wide analysis [26]. HOXC13 was identified as the novel oral cancer driving

PLOS ONE
genes. KLHL40-deficient patients were observed to phenocopy muscle abnormalities. The relationship between it and cancer has not been previously confirmed and could be new prognostic indicators for patients with cancer [27,28]. The ceRNA network analysis revealed that HOXC13 was regulated by miR-31, and that KLHL40 was a target for miR-211. A research showed that "passenger" miR-31-3p was significantly upregulated in well and moderately differentiated head and neck cancers [29]. Long non-coding RNA LOC554202 promotes laryngeal squamous cell carcinoma progression through regulating miR-31. GO and KEGG analysis showed that the DEmRNAs were mainly enriched in the functions. "Positive regulation of transcription from RNA polymerase ii promoter", "Intracellular signal transduction" and "Sequence-specific DNA binding" and in the pathways including "HIF-1 signaling pathway" and "AMPK signaling pathway", which are closely associated with tumorigenesis. Survival analysis showed that age, TNM stage, and clinical stage were significantly associated with survival time of OFSCC. Although unvariable Cox analysis shows that there was no correlation between smoking and survival of OFSCC, both clinical experience and scientific research recognized that smoking contributes to mortality.
This paper is the first study of ceRNA in OFSCC, a branch of HNSCC. Mature miRNA instead of RNA seq was analyzed, eliminating the interference of RNA precursor and improving the accuracy of detection. However, our study was limited that experimental verifications of the results using clinical specimens or cell lines have not been included due to insufficient material. https://doi.org/10.1371/journal.pone.0238420.g008

Conclusion
We established a mRNA-associated ceRNA network in OFSCC samples. The 3-mRNA model could serve as potential prognostic indicator alone or in combination with other clinicopathological for patients with OFSCC. Meanwhile, the prognostic model have been validated in the GEO database.