Integrated analysis of the functions and prognostic values of RNA-binding proteins in neuroblastoma

Background Neuroblastoma (NB) is the most common solid tumor in children. NB treatment has made significant progress; however, given the high degree of heterogeneity, basic research findings and their clinical application to NB still face challenges. Herein, we identify novel prognostic models for NB. Methods We obtained RNA expression data of NB and normal nervous tissue from TARGET and GTEx databases and determined the differential expression patterns of RNA binding protein (RBP) genes between normal and cancerous tissues. Lasso regression and Cox regression analyses identified the five most important differentially expressed genes and were used to construct a new prognostic model. The function and prognostic value of these RBPs were systematically studied and the predictive accuracy verified in an independent dataset. Results In total, 348 differentially expressed RBPs were identified. Of these, 166 were up-regulated and 182 down-regulated RBPs. Two hubs RBPs (CPEB3 and CTU1) were identified as prognostic-related genes and were chosen to build the prognostic risk score models. Multivariate Cox analysis was performed on genes from univariate Cox regression and Lasso regression analysis using proportional hazards regression model. A five gene prognostic model: Risk score = (-0.60901*expCPEB3)+(0.851637*expCTU1) was built. Based on this model, the overall survival of patients in the high-risk subgroup was lower (P = 2.152e-04). The area under the curve (AUC) of the receiver-operator characteristic curve of the prognostic model was 0.720 in the TARGET cohort. There were significant differences in the survival rate of patients in the high and low-risk subgroups in the validation data set GSE85047 (P = 0.1237e-08), with the AUC 0.730. The risk model was also regarded as an independent predictor of prognosis (HR = 1.535, 95% CI = 1.368–1.722, P = 2.69E-13). Conclusions This study identified a potential risk model for prognosis in NB using Cox regression analysis. RNA binding proteins (CPEB3 and CTU1) can be used as molecular markers of NB.

Introduction Neuroblastoma (NB) is the main cause of tumor-related deaths in children worldwide [1]. Diagnosis and treatment have made great progress in the past 20 years and the average 5-year relative survival rate of NB has reached 50% [2]. Currently, the diagnosis of NB mainly relies on histopathological examination, imaging results, and molecular biomarkers [3]. Early detection of NB is difficult. This may be the most important factor affecting the mortality of patients with NB [4]. Therefore, further study of the molecular mechanisms underlying NB and identification of effective molecular markers for early cancer screening are essential to enhance the therapeutic outcomes and quality of life of children [5].
RNA binding proteins (RBPs) are pleiotropic proteins that can regulate gene expression by interacting with various types of RNA [6], including rRNA, ncRNA, snRNA, miRNA, mRNA, tRNA, and snoRNA [7]. Currently, over 1500 different types of RBPs have been identified in the human genome through whole-genome sequencing [8]. Specifically the ribonucleoprotein complex formed by the binding of RBP and target RNA regulates the stability of mRNA at the post-transcriptional level, thereby affecting RNA processing, splicing, localization, export, and translation [9][10][11]. Ultimately, post-transcriptional modification of mRNA leads to the regulation of various important physiological processes of cells [12]. Current research has found that RBPs play an important role in many human diseases and are key regulators of the development and progression of cardiovascular diseases [13], myotonic muscular dystrophy [14], neurological diseases [15], and cancer [16].
In recent years, new bioinformatics approaches, such as bipartite network projection, IRWNRLPI (Integrating Random Walk and Neighborhood Regularized Logistic Matrix Factorization for lncRNA-Protein Interaction), and HLPI-Ensemble, have greatly improved the research and prediction of RBP function [17][18][19]. Herein, we used high-throughput bioinformatics analysis to identify RBPs that are differentially expressed in cancer samples and normal tissue samples, and systematically investigated their expression profiles, functional effects, and potential mechanisms to understand their role in tumors. This study will deepen our understanding of the molecular mechanism of NB and provide potential diagnostic or prognostic biomarkers for NB.

Data sets and preprocessing
We obtained RNA expression datasets and corresponding clinical data of NB patients from the Therapeutically Applicable Research To Generate Effective Treatments project database (TAR-GET, https://target-data.nci.nih.gov/Public/NBL/mRNA-seq/L3/), and normal neural tissues samples datasets from Genotype-Tissue Expression Database (GTEx, https://gtexportal.org/), respectively. All data derived from an open access data platform, and thus, this study did not require ethics committee approval. To determine differentially expressed genes between NB tissue and normal samples, the Limma software package was used for analysis. The GSE85047 dataset was downloaded from Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih. gov/geo/query/acc.cgi?acc=gse85047) and was used as a validation cohort.

Prognostic model construction
The R package "survival" was applied to carry out univariate Cox regression analysis using all differentially expressed RBPs to identify prognostic genes, and Lasso regression was performed to further screen important key genes. Finally, based on the preliminary screening of the above key candidate genes, we built a multivariate Cox proportional hazard regression model and evaluated the survival of patients using the following risk score formula: where, β was the value of the risk coefficient, and Exp represented the value of the expression of a certain gene. Based on the median value of the risk score, NB patients were divided into two groups: low-risk group and high-risk group, and the survival differences between the two subgroups were compared through survival analysis. In addition, the prognostic ability of the above model was estimated through receiver operating characteristic curve (ROC) analysis. A sample of 276 NB patients with reliable follow-up information from the GSE85047 data set was used as the validation group to evaluate the predictive power of the prognostic model. P<0.05 was considered a statistically significant difference. Constructing the lncRNA-miRNA-mRNA network of key RBPs. To study the relationship of lncRNA, miRNA and mRNA, based on the interactions of miRNA-lncRNA and miRNA-mRNA, the online database such as starBase (http://starbase.sysu.edu.cn/ agoClipRNA.php?), targetscan(http://www.targetscan.org/vert_72/) [23], and LncBase (http:// carolina.imis.athenannovation.gr/diana_tools/web/index.php?r=lncbasev2%2Findex) were used to build the lncRNA-miRNA-mRNA network.

Identification of differently expressed RBPs in NB patients
We performed a methodical analysis of the key role and prognostic value of RBP in NB. The study workflow is illustrated in Fig 1. The NB data was downloaded from TARGET, which contained 144 tumor samples, and the normal nerve tissue data was downloaded from the GTEx database and contained 278 samples. After analyzing the currently known 1542 RBPs, 348 RBPs with significant differences in expression (P-adjusted <0.05, |log 2 fold change [FC]|>1.0) were identified, and comprised 166 up-regulated RBPs and 182 down-regulated RBPs (Fig 2) (S1 Table).

Enrichment analysis of the differently expressed RBPs
In order to study the functions and mechanisms of the selected RBP, we used the R package clusterProfiler for enrichment analysis. The results showed that enriched biological processes mainly involved mRNA processing, RNA splicing; ncRNA metabolic processes; RNA phosphodiester bond hydrolysis; RNA splicing, via transesterification reactions with bulged adenosine as a nucleophile; mRNA splicing, via spliceosome; RNA splicing, via transesterification reactions; nucleic acid phosphodiester bond hydrolysis; and RNA catabolic processes. Molecular functions included catalytic activity, acting on RNA ribonuclease activity; nuclease activity; mRNA 3'-UTR binding; endonuclease activity; translation regulator activity; catalytic activity, acting on a tRNA; mRNA binding; double-stranded RNA binding; endoribonuclease activity; and single-stranded RNA binding. Cellular components involved mainly the ribonucleoprotein granule, cytoplasmic ribonucleoprotein granule, ribosome, ribosomal subunit, organellar ribosome, mitochondrial ribosome, P-body, mitochondrial matrix, P granule, and pole plasm. The KEGG analysis [24] indicated mainly enrichment in RNA transport, the mRNA surveillance pathway, ribosome biogenesis in eukaryotes, RNA degradation, ribosome,

PPI network building and subnet detection
To further study the function of differentially expressed RBPs and their role in the development of NB, we used Cytoscape software to create a PPI network that contained 311 nodes and 1766 edges. The co-expression network was analyzed using MCODE to recognize potential key modules (Fig 4). The RBPs in subgroup 1 were mainly enriched in ribosome biogenesis in the eukaryote pathway, ribosome biogenesis, rRNA processing, ncRNA processing, maturation of SSU-rRNA, ribosomal small subunit biogenesis, rRNA metabolic process, maturation of SSU-rRNA from tricistronic rRNA transcript (SSU-rRNA, 5.8S rRNA, LSU-rRNA), and in ribosomal large subunit biogenesis.

Selection of prognosis-related RBP
Difference analysis identified a total of 348 key RBPs. To identify the prognostic significance of these RBPs and their effects on clinical outcome and survival, we performed univariate Cox regression analysis and obtained four candidate RBPs related to prognosis ( Fig 5). Subsequently, using Lasso regression, the prognostic risk score including multi-factor Cox regression values was constructed (Fig 6, Table 3).

Prognosis-related RBPs model building and analysis
CPEB3 and CTU1 were identified as the key prognostic genes using the multivariate Cox regression analysis. We used these two hub genes to construct the predictive model. The risk score for each child was calculated based on the following formula: Then, based on the median value of the individual risk scores, 144 NB patients were stratified into two groups: the low-risk and high-risk groups. The results showed that compared with patients in the low-risk group, patients in the high-risk group had significantly poorer survival (P = 2.152e-04). The value of the area under the curve (AUC) in the TARGET model was 0.720 (Figs 7A, 7B, and 8A).

Validation of hub RBPs
To evaluate the prognostic value of the RBP prediction model, we used the GSE85047 patient cohort to verify the relationship between risk score and survival. In the GSE85047 cohort, groups were also grouped based on the median value of risk score in the TARGET model. The survival of patients with high-risk scores was poorer than for patients having lower risk scores (P = 0.1237e-08), and the AUC was 0.730 (Figs 7C, 7D, and 8B).

The RBP risk score was an independent prognostic factor
We assessed the prognostic value of risk scores of RBPs. For the NB TARGET cohort, the risk scores on univariate analysis were significantly correlated with overall survival (OS) (HR = 1.535, 95% CI = 1.368-1.722, P = 2.69E-13) (Fig 9A). The multivariate analysis showed that the risk score was an independent prognostic indicator (HR = 1.518, 95% CI = 1.344-1.715, P = 1.91E-11) (Fig 9B). We constructed a nomogram that integrated multiple risk factors to quantify individual risks to be used in the clinical setting to predict the OS probability at 1, 2, and 3 years (Fig 9C).

Constructing the lncRNA-miRNA-mRNA network of key RBPs
Using online databases, we constructed the lncRNA-miRNA-mRNA network of key RBPs (Fig 10).

Discussion
The prognosis of different NB patients varies greatly, given the extensive tumor heterogeneity of NB. For low-risk NB patients (most commonly infants), simple observation or surgical treatment can often achieve good results; but for high-risk NB patients, even if a variety of intensive treatment options are combined [25], the prognosis is still not ideal. The true cause of NB is still unclear. In recent years, with the emergence of immunotherapy and new drugs, the survival of patients in the high-risk group has improved to a certain extent [26].
RBPs have always played a role in the life of RNA. It is not an exaggeration to say that without RBP, RNA cannot achieve anything. The main role of RBPs is to mediate RNA maturation, transport, localization, and translation; one RBP may have multiple target RNAs, and defect in expression may cause multiple diseases. Recently, the importance of RBP in tumor occurrence, development, and metastasis has increasingly been revealed [27].
In our study, we identified 348 RBPs based on NB datasets from the TARGET NB dataset. We systematically analyzed the related biological functions and built RBP PPI networks and the relative subnets. In addition, we performed univariate Cox regression analysis, survival analysis, Lasso regression analysis, and multivariate Cox regression analysis of differential RBPs to further investigate their biological function and prognostic value.  The GO enrichment and KEGG pathway analysis of these differentially expressed RBPs indicated that RBP were used in mRNA monitoring pathways, RNA transport, ribosomal biogenesis in eukaryotes, RNA degradation, ribosomes, aminoacyl-tRNA biosynthesis, and spliceosomes. RBPs were significantly enriched in the RNA polymerase pathway, and play a critical role in mRNA processing, RNA splicing, ncRNA metabolism, RNA phosphodiester bond hydrolysis, catalytic activity, and act on RNA ribonuclease and nuclease activity. The PPI network revealed the relationship between RBPs. At present, with the development of computational biology, it is possible to predict the complex regulatory relationship between proteins, which provides a powerful tool for further research [28]. At present, many studies have described the role of RBPs role in metabolism and disease. RBPs plays a dual and opposite role in tumorigenesis, regulating the proliferation of early tumor cells and promoting tumor progression and metastasis in advanced cancer. According to these reports, abnormal expression of multiple RBPs had been found in many malignant tumors [9,29,30]. However, the impact of RBP on the occurrence and development of cancer is still poorly understood.
In our study, two RBPs were identified as hub RBPs related to NB prognosis: CPEB3 and CTU1. CPEB3, also known as the Cytoplasmic Polyadenylation Element Binding Protein is an RBP that shuttles through the cytoplasm. It mainly exists in the cytoplasm and can inhibit the translation of target RNA. GO analysis showed that CPEB3 was associated with mRNA processing, RNA catabolic processes, mRNA 3'-UTR binding, translation regulator activity, and  mRNA binding. CPEB3 can disrupt the crosstalk between cancer cells and tumor-associated macrophages through the IL-6R/STAT3 signaling pathway, and thereby inhibit epithelial-mesenchymal transition. Studies have found that CPEB3 is related to the tumorigenesis and development of glioma [31], high-grade serous ovarian cancer [32], colorectal cancer [33], hepatocellular carcinoma [34], and cervical cancer [35]. CPEB3 is expressed in both the brain and heart and is involved in the regulation of synaptic plasticity by down-regulating the expression of several plasticity-related proteins (PRPs), such as N-methyl-D-aspartate receptor (NMDAR) and postsynaptic density protein 95 (PSD95). Down-regulation of CPEB3 expression can affect NMDAR activated CaMKII α, which overcomes the inhibition of synaptic transmission under stress [36,37]. In addition, CPEB3 knockout mice exhibited hippocampus-dependent neurological dysfunction, including acquisition and extinction of long-term spatial memory and short-term fear memory. In nerve cells, activation of NMDAR can increase the expression of CPEB3 in the nucleus and redistributes its content in nucleus and cytoplasm [38]. CPEB3 has low expression in colon cancer and HPV-positive cervical cancer, which indicates it may be involved in tumorigenesis as a tumor suppressor, but the specific mechanism still needs further study [39]. CPEB protein family members contain multiple miRNA binding sites, and can be regulated by a

PLOS ONE
variety of miRNAs [40]. After up regulating the expression of mir-107 in hepatoma cell lines Huh7 and HepG2, the mir-107 bound to the 3'-UTR region of the CPEB3 transcript, and resulted in an increased expression of EGFR and phosphorylated Akt and decreased expression of p21. Mir-107 and CPEB3 interaction plays an important role in the occurrence and development of hepatocellular carcinoma by regulating the EGFR signaling pathway [34].
CTU1, or cytoplasmic sulfurylase subunit 1, is a protein-coding gene. Diseases associated with CTU1 include spinal cord septal membrane tumors and spinal gliomas. The related pathways include gene expression and tRNA processing, and its function is related to tRNA binding and nucleotide transferase activity. GO analysis shows CTU1 was associated with ncRNA metabolic process and ncRNA processing. CTU1 promotes cancer resistance to targeted therapy [41], its expression is associated with higher rates of morbidity and mortality in spinal cord gliomas [42], and up-regulation of CTU1 is involved in human breast cancer metastasis.
In our study, based on the two hubs RBPs identified in the TARGET cohort training set, multi-step Cox regression analysis produced a risk score model that could predict the prognosis of NB. In the NB TARGET and GSE85047 cohorts, the survival outcomes of the high-and low-risk subgroups were significantly different. The ROC values of the risk score model of the training set and validation set were 0.72 and 0.73, respectively, indicating that the 2-gene marker prognostic model applied to evaluate the prognosis of NB patients has a certain value. Both genes were differentially expressed in NB and correlated with survival, which strongly suggests that these two genes are potential tumor-related genes and require further study. In the future, we intend to establish a real-life cohort of NB patients to confirm the validity of these genes. However, the molecular mechanisms involving these two RBPs are unknown, and further study of their underlying function may be valuable.
In recent years, more and more studies have confirmed that long non-coding RNA (lncRNA) and micro RNA (miRNA) and their interactions play an important role in the diagnostic biomarkers and therapeutic targets of various diseases. Some theoretical methods have played a role in predicting potential lncRNA-miRNA associations. With the development of bioinformatics, some new prediction methods, such as the lncRNA-miRNA interactions prediction by logistic matrix factorization with neighborhood regularized (LMFNRLMI), enable us to study lncRNA-miRNA interactions more accurately, that is, to predict lncRNA-miRNA interactions [43].
In summary, we systematically studied the function and prognostic value of RBPs differently expressed in NB. These RBPs may be associated with the occurrence, development, invasion, and metastasis of NB. The establishment of a prognostic model of NB based on two RBP coding genes is conducive to clinical application. Our results contribute to a better understanding of the pathogenesis of NB and the development of new therapeutic and prognostic molecular markers. Although our gene signature and nomogram showed excellent performance in the training set and validation set, both inevitably had some limitations. First, although our risk score performed well in predicting the survival rate of NB patients, it lacks confirmation from large-scale prospective trials. Secondly, the validation data derived only from the GSE 85047 dataset; thus, the predictive value of our model requires further verification. Third, the molecular mechanisms involving CPEB3 and CTU1 have not been verified in NB cells. Thus, our follow-up studies will verify the conclusions reached in this study from the aspect of their clinical application and molecular mechanisms.
Supporting information S1