A five-gene signature predicts overall survival of patients with papillary renal cell carcinoma

Background The present study aims to investigate the gene expression changes in papillary renal cell carcinoma(pRCC) and screen several genes and associated pathways of papillary renal cell carcinoma progression. Methods The papillary renal cell carcinoma RNA sequencing (RNA-seq) data set was downloaded from TCGA (The Cancer Genome Atlas). We identified the differentially expressed mRNAs between cancer and normal tissues and performed annotation of differentially expressed mRNAs to figure out the functions and pathways they were enriched in. Then, we constructed a risk score that relied on the 5-mRNA. The optimal value for the patients’classification risk level was identified by ROC analysis. The relationship between mRNA expression and prognosis of papillary renal cell carcinoma was evaluated by univariate Cox regression model. The 5-mRNA based risk score was validated in both complete set and testing set. Result In general, the 5-mRNA (CCNB2, IGF2BP3, KIF18A, PTTG1, and BUB1) were identified and validated, which can predict papillary renal cell carcinoma patient survival. This study revealed the 5-mRNA expression profile and the potential function of a single mRNA as a prognostic target for papillary renal cell carcinoma. Conclusion In addition, these findings may have significant implications for potential treatments options and prognosis for patients with papillary renal cell carcinoma.


Methods
The papillary renal cell carcinoma RNA sequencing (RNA-seq) data set was downloaded from TCGA (The Cancer Genome Atlas). We identified the differentially expressed mRNAs between cancer and normal tissues and performed annotation of differentially expressed mRNAs to figure out the functions and pathways they were enriched in. Then, we constructed a risk score that relied on the 5-mRNA. The optimal value for the patients'classification risk level was identified by ROC analysis. The relationship between mRNA expression and prognosis of papillary renal cell carcinoma was evaluated by univariate Cox regression model. The 5-mRNA based risk score was validated in both complete set and testing set.

Result
In general, the 5-mRNA (CCNB2, IGF2BP3, KIF18A, PTTG1, and BUB1) were identified and validated, which can predict papillary renal cell carcinoma patient survival. This study revealed the 5-mRNA expression profile and the potential function of a single mRNA as a prognostic target for papillary renal cell carcinoma.

Conclusion
In addition, these findings may have significant implications for potential treatments options and prognosis for patients with papillary renal cell carcinoma. PLOS  renal cell carcinoma, (2) patients underwent radical nephrectomy or partial nephrectomy, (3) patients that didn't receive any neoadjuvant treatment, (4) patients with corresponding RNAseq data, (5) patients with explicit clinical prognostic data. Besides, the corresponding clinical characteristics, including sex, age, pathological stage, clinical stage, definite TNM stage and histological type of our cohort were integrated as a clinical statistical chart in Table 1. Signal value for all genes were transformed to the log base 2. Standardization of the level of mRNA expression for each sample was normalized with median quantification. Since the data comes from the TCGA database, no further approval is required from the Ethics Committee.

Screening of differentially expressed mRNA
The RNA sequencing data (level 3) of 321 pRCC samples was downloaded from the TCGA data portal. The differentially expressed mRNAs between pRCC tissues and matched normal ones were analyzed using the limma package in R [15]. The edgeR package and DESeq2 package in R were subsequently used for the calculation of DEGs. The adjust P value <0.05 and | log 2 FC|>1 were set as the cut-off criteria. The genes that presented in both edgeR and DESeq2 package analysis results were selected as the final DEGs.
In our study, we mainly used the program code written in R language to analyze and deal with RNA-seq data.

Identification and selection of prognostic related mRNA
The patients were further randomly divided into the training set (n = 143) and the testing set (n = 142) with caret package in R language. In the training set, the association between mRNAs expression and the overall survival time of patients were analyzed. Univariate Cox regression analysis was performed with Survival package in R language [16]. In the survival analysis, We selected the time that patient's initial pathological diagnosis to papillary renal cell carcinoma as the start point, and the time that death or censored events occurred as the end point. Messenger RNAs with expressing significance p values <0.05 were selected as a target mRNA. To further ensure the reliability and feasibility of clinical prognosis based on mRNA, we made a Robust likelihood-based survival analysis by using Rbsurv package in R [17,18]. All 285 samples were again randomly assigned to a training set or a testing set with the caret package in R language. We fitted a gene to the training set of samples and obtained a corresponding parameter estimate. Then we evaluated log likelihood with the parameter estimate and the testing set of samples. This procedure was repeated 10 times independently, which resulted in 10 log-likelihood for each gene. The best gene, g (1), with the largest mean log likelihood was selected. We searched the next best gene by evaluating every two-gene model and selected an optimal one with the largest mean log likelihood. Akaike information criterions (AICs) for all the candidate models were computed and an optimal predictive model was selected as with the lowest AIC value. The above procedure was repeated 1000 times, thus obtained 1000 loglike � s for each gene. The mRNA with the highest frequency (freq>300) was selected as the final target mRNA.

Establishment and validation risk score formula
The association between DEGs and patients'overall survival time was analysed in training set. Univariate Cox regression analysis was performed in R language by Survival package with the threshold of p value set as 0.05. In the univariate Cox regression analysis of the training set, these prognostic-related mRNAs were included in their estimated regression coefficients, and a risk scoring formula was established. In the training set, the patients were divided into highrisk score or low-risk score group based on the optimal risk score cutoff value, which represented the point at which the Youden index (sensitivity + specificity-1) reached a maximum value in training set. Then, we applied the optimal risk score cutoff value to the validation set and the whole pRCC cohort to evaluate the classification performance of the model. By using survival ROC package in R, the receiver operating characteristic (ROC) curve was obtained. Then, choosing the optimal cutoff point with maximum sensitivity and specificity. Based on the optimal cut-off value, the survival difference between the low-risk and high-risk group was evaluated. The accuracy of the risk score formula was then further verified in the testing set and the complete set.

Functional and pathway enrichment analysis of DEGs
The ConsensusPathDB (http://cpdb.molgen.mpg.de/) database is a molecular functional interaction database, integrating information on protein interactions, genetic interactions signaling, metabolism, gene regulation, and drug-target interactions in humans. In order to better understand the biological functions and characteristic, the present study performed Gene ontology (GO) using clusterProfiler package in R and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses in ConsensusPathDB database. All genes from human beings were used as reference. In this study, we analyzed the DEGs that were significantly upand down-regulated as determined from pRCC data. The p value of the univariate survival analysis was less than 0.05 and considered statistically significant. To correct errors following multiple comparison analysis, the Benjamini-Hochberg step-up method was used to control the FDR.

Differentially expressed mRNAs in pRCC
With a cut-off value of P < 0.05 and |log2FC| >1.0, 4776 and 4831 differentially expressed mRNAs were identified by using the DESeq2 and edgeR package in R language, respectively (S1 File). The genes that presented in both edgeR and DESeq2 package analysis results were selected as the final DEGs. According to this standard, 2632 up-regulated mRNAs and 1921 down-regulaed ones were identified (Fig 1A and 1B). We displayed the distribution of all the differentially expressed mRNAs in both the-log(FDR) and logFC dimensions through a volcano plot in Fig 2.

Identification and selection of prognostic related mRNA
The total of 4553 mRNAs was further randomized into a training set and a testing set. A total of 749 differential expressed mRNAs were identified by univariable cox regression with the P <0.05 in training set (S2 File). The top of 20 genes with the lowest p value were selected and presented in Table 2. Random data analysis was performed by using Robust likelihood-based modelling for 1000 times. Statistic frequency analysis of the significantly altered mRNAs indicated that all the selected mRNAs had a high frequency (Fig 3). In further analysis, differentially expressed mRNA with a frequency above 300 was picked out. Finally, there were 5 mRNAs identified as prognostic feature (Table 3). Among these genes, four mRNAs (CCNB2, IGF2BP3, KIF18A, PTTG1) had positive coefficients which suggested that higher expression level was associated with worse survival time and one (BUB1) had negative coefficients suggested that higher levels of expression were related with better survival time.

The 5-mRNA signature predicts the survival of patients with pRCC
A risk score formula based on the expression level and coefficient of 5 mRNAs was created as follows: Risk score = (-0.560 � expression level of BUB1) + (0.337 � expression level of CCNB2) + (0.424 � expression level of IGF2BP3) + (0.516 � expression level of KIF18A) + (0.310 × expression level of PTTG1). Then, the mRNAs signature based risk score for each patient in training set were calculated, and patients in the cohort were assigned into high-risk group (n = 51) and low risk group (n = 92). The risk score was inversely related to the patients'survival time. The patients with lower risk score had longer survival time and death patients had higher risk score (Fig 4). The patients in high risk group had a tendency to have higher expression levels of IGF2BP3, KIF18A, PTTG1, BUB1 and CCNB2.

Available prognostic indicator in pRCC patients
We performed ROC analysis for the 5-mRNA risk score in the training set to further validate the sensitivity and specificity of survival prediction. The optimal cut-off value for false negative and false positive minimum was 0.951, and the area under the curve (AUC) was 0.82 ( Fig 5A). The Kaplan-Meier curves showed that patients in the high-risk group had significantly shorter overall survival than those in the low-risk group (log-rank test p < 0.0001) (Fig 5B).

External validation of five-gene signature
The predictive efficiency of the five-mRNA signature in testing set with 142 patients was then evaluated. The patients in the testing set were classified into high-risk (n = 53) and low-risk groups (n = 89) by using the same model and criteria. Similar to the training set, overall survival was significantly lower in the high-risk group than in the low-risk group (p < 0.0001)( Fig  6A). Risk score-based classification of the external complete set also yielded similar results as shown in Fig 6B. Despite unbalanced samples in each group, the results of survival analysis were similar to those of low-risk patients with significantly longer survival times than those with high-risk patients (p < 0.0001). The clinical characteristics available from the TCGA database were integrated and univariate Cox regression analysis was performed to detect the candidate clinical prognostic parameters. To validate independent predictive power of five-gene signature from clinicopathological factors in pRCC cohort such as age, pathological stage, clinical stage, pathological type and so on, the stratified Cox proportional hazard analysis was constructed and visualized in Fig 7 and Fig 8, which suggested this signature could distinguish the high-risk subgroup from low-risk one in each clinical subtype and further convinced the independent prognostic efficacy of the final gene signature.

Functional and pathway enrichment analysis of DEGs
After performing GO analysis of DEGs with clusterProfiler package in R language, the DEGs were classified into three groups: molecular function group, biological process group and cellular component group. The biological results revealed that DEGs were primarily enriched in chromosome segregation, mitotic nuclear division, nuclear chromosome segregation, urogenital system development, renal system development and kidney development. The cellular component results indicated that DEGs were mainly riched in chromosomal region, proteinaceous extracellular matrix, condensed chromosome, spindle, chromosome and centromeric region. The molecular function results showed that DEGs were mainly enriched in ion channel activity and substrate-specific channel activity (Fig 9A-9C). To investigate pathway enrichment, KEGG signaling pathway analysis was used to identify the pathways, including cAMP signaling pathway, phospholipase D signaling pathway, Hippo signaling pathway, cell cycle, complement and coagulation cascades, TGF-beta signaling pathway, inflammatory mediator regulation of TRP channels and melanoma (Fig 9D).

Discussion
In the present study, the R language was used to analysis the gene data downloaded from TCGA databases. The total of 4553 DEGs in pRCC compared with control samples were identified, which included 2632 up-regulated and 1921 down-regulated genes. The DEGs were mainly enriched in 60 GO terms, including ion channel activity, chromosome segregation and chromosomal region. The KEGG pathway enrichment analysis result showed that the DEGs were associated with calcium,cAMP,phosphollipase D, and hippo signaling pathway. The Ca2 +-mediated signaling pathways have been implicated either directly or indirectly related to tumorigenesis and tumor progression [19][20][21]. In this pathway, Orai1 exaggerates cell proliferation, migration, invasion and evasion of apoptosis [19]. The cAMP plays a complex and context-dependent role in regulating cell migration [22]. Usually the focus has been mainly on PKA-mediated migratory effects. Shaikh et al. suggests that hypoxia enhances cAMP-dependent protein kinase activity by up-regulating PKA gene expression in a HIF dependent mechanism and that PKA plays a key role in hypoxia-mediated EMT, migration, and invasion in lung cancer cells [23]. There is increasing evidence that the Hippo pathway is dysregulated in many human cancers, and dysregulation of the Hippo pathway exerts a significant impact on cancer development, including liver, breast, lung, colon, ovary, and others [24]. Aberrant phospholipase D (PLD) expression has been identified in multiple facets of complex pathological states, including cancer and inflammatory diseases. PLD contributes to various mitogenic or oncogenic signaling pathways [25]. Therefore, monitoring of these signaling pathway maybe beneficial to understanding the mechanism of carcinogenesis and researching treatment of prostate cancer. We identified 5 mRNAs that are associated with the survival of patients with papillary renal cell carcinoma,namely CCNB2, IGF2BP3, KIF18A, PTTG1, and BUB1 in the training set. The prognostic related mRNAs were further selected to construct a risk score formula by Cox  regression model. Besides, we used the ROC analysis to identify the optimal cutoff point, and divided patient into high-risk and low-risk groups. In the univariate Cox regression model, the survival time of high-risk patients was significantly shortened. Subsequently, the GO and KEGG pathway analysis suggest that mRNA plays a crucial role in molecular pathogenesis and progression of pRCC. Among the 5 mRNAs which are associated with the prognosis of pRCC, some have been reported to express in cancer or other diseases, but have not been examined in pRCC. For example, elevated cytoplasmic CCNB2 protein levels are strongly associated with short-term disease-specific survival of breast cancer patients [26]. Cyclin B2 (CCNB2), the member of cyclin family proteins, regulates the activities of cyclin dependent kinases (CDKs) and different cyclins function spatially and temporally in specific phases of the cell cycle [27]. CCNB2 . Therefore, CCNB2 is important in pRCC and may be used as a prognostic indicator. It has been reported that insulin-like growth factor 2 mRNA binding protein 3 (IGF2BP3) expression correlates with malignancy[36]. Schaeffer et al. suggested that IGF2BP3 was found to be selectively overexpressed in pancreatic ductal adenocarcinoma tissues but not in benign pancreatic tissues [37]. In ovarian cancer, high expression of IGF2BP3 was associated with poor survival, and women diagnosed at advanced stages with elevated IGF2BP3 was at higher risk of developing chemoresistance [38]. Lochhead et al. revealed that normal colorectal epithelium was negative for IGF2BP3 in patients of normal mucosa adjacent to carcinoma, and IGF2BP3 was associated with poor differentiation, stage III-IV disease, BRAF mutation, and LINE-1 hypomethylation [39]. Lin et al. demonstrated that IGF2BP3 enhances cell invasion ability and tumorigenicity in human OSCC in vitro and in vivo [40]. In the present study, it was demonstrated that the pRCC patients with IGF2BP3 alterations exhibited a poorer survival rate compared with those without the genetic alterations. This result suggested that the mutation in IGF2BP3 reduces the survival rate of patients with pRCC. Kif18A, a member of the kinesin super family of molecular motor proteins, regulates chromosome congregation and suppresses kinetochore movements to control mitotic chromosome alignment in the pre-anaphase state of the mammalian cell cycle [41]. Notably, Nagahara et al. demonstrated that colorectal cancer cells transfected with Kif18A cDNA demonstrated significant enhanced migration and invasion compared to mocktransfected cells [42]. KIF18A might be a biomarker for HCC diagnosis and an independent predictor of DFS and OS after surgical resection [43]. The results of the present study were The gene signature predicts overall survival in pRCC patients consistent with the results of previous studies. KIF18A may have an important role in progression of pRCC, however this requires further study, in order to verify the specific molecular marker role of KIF18A in the diagnosis of patients with pRCC. Pituitary tumour transforming gene 1 (PTTG1) is over-expressed in a vast array of malignancies including pituitary [44,45], thyroid [46], colorectal [47] and lung [48] cancer. The high level of PTTG1 is commonly associated with an enhanced proliferative capacity, increased tumour grade and high invasive potential. Current evidences regarding the biological role of BUB1 in cancer is contradictory. Mutations in BUB1, some of which are functional, occur in some cancers, including those that originate in the lung, colon, and are reported to be associated with chromosomal instability and lymph node metastasis, suggesting that silencing of this kinase may mediate aggressive clinical behavior [49]. Moreover, Aurora b hyperactivation caused by overexpression of BUB1 leads to misaggregation of chromosomes, thereby inducing aneuploidy [50].

Conclusion
Taken all together, by performing a comprehensive analysis for differentially expressed mRNA profiles and corresponding clinical information, our study demonstrated that five-mRNA signature was a potential diagnostic marker in pRCC, and was an independent prognostic factor in pRCC patients. This signature has a lot of potential prognostic and therapeutic implications for the pRCC patient management. However, further research is needed to validate our findings and establish molecular mechanisms for mRNAs interactions and papillary renal cell carcinoma progression.