Identification of ubiquitination-related genes in human glioma as indicators of patient prognosis

Ubiquitination is a dynamic and reversible process of a specific modification of target proteins catalyzed by a series of ubiquitination enzymes. Because of the extensive range of substrates, ubiquitination plays a crucial role in the localization, metabolism, regulation, and degradation of proteins. Although the treatment of glioma has been improved, the survival rate of patients is still not satisfactory. Therefore, we explore the role of ubiquitin proteasome in glioma. Survival-related ubiquitination related genes (URGs) were obtained through analysis of the Genotype-Tissue Expression (GTEx) and the Cancer Genome Atlas (TCGA). Cox analysis was performed to construct risk model. The accuracy of risk model is verified by survival, Receiver operating characteristic (ROC) and Cox analysis. We obtained 36 differentially expressed URGs and found that 25 URGs were related to patient prognosis. We used the 25 URGs to construct a model containing 8 URGs to predict glioma patient risk by Cox analysis. ROC showed that the accuracy rate of this model is 85.3%. Cox analysis found that this model can be used as an independent prognostic factor. We also found that this model is related to molecular typing markers. Patients in the high-risk group were enriched in multiple tumor-related signaling pathways. In addition, we predicted TFs that may regulate the risk model URGs and found that the risk model is related to B cells, CD4 T cells, and neutrophils.


Introduction
The ubiquitination pathway is a highly conserved post-translational modification process regulated by a series of enzymes [1,2]. It plays a leading role in the degradation of most proteins in eukaryotic cells. Owing to the wide range of substrates that are targeted by ubiquitination, the ubiquitination pathway participates in multiple cellular events, some of which underlie a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 368) [19]. The author obtained 95 deubiquitination genes by consulting the literature [19]. Using the limma package of R software (https://www.r-project.org/), we analyzed the differential expression of non-tumor brain tissue and glioma tissue in the GTEx and TCGA databases, and then we only retained the URGs in the differentially expressed genes ( Table 2). URGs satisfying FDR<0.05 and log2 |fold change|> 1 are considered statistically significant. Using the R software survival package, we discovered URGs related to the prognosis of patients. A pvalue less than 0.05 was considered statistically significant.

Survival curve analysis and receiver operating characteristic (ROC) curve analysis
We analyzed the relationship between differentially expressed URGs and the risk model with patient prognosis using the survival package of R software. The results were analyzed by survival curves. The survival ROC package of R software was used to analyze the accuracy of URGs and the risk model in predicting patient prognosis. P<0.05 (for survival curves) and AUC>0.7 (for ROC curves) were considered to indicate statistical significance.

Univariate and multivariate Cox analysis
To analyze whether the URGs and the risk model were prognostic factors for glioma patients, we analyzed the relationship between the individual factors and the survival of patients by univariate Cox analysis. We used multivariate Cox to analyze whether the factors and model can be used as prognostic factors for patients. The analyses were performed using R software.

Risk model of URGs and patient risk level calculations
We analyzed the correlation between differentially expressed URGs and survival through R software, and obtained URGs related to survival. Then, we performed multivariate Cox regression analysis on survival-related URGs, excluded URGs that could complement survival relationships, and constructed an optimal risk model for predicting patient prognosis. After obtaining the risk model, we multiplied the risk model gene expression level of each patient by the model coefficient to obtain the risk score of each patient. We divided patents into highand low-risk groups according to the median value of the risk score.

Gene Set Enrichment Analysis (GSEA) of risk model-enriched KEGG pathways
We used GSEA (http://software.broadinstitute.org/gsea/index.jsp) to analyze the risk modelenriched KEGG signaling pathways. We calculate the risk value of each patient according to the risk model formula. According to the median risk value, patients were divided into high and low risk groups. Then the KEGG signaling pathway enriched in the high-and low-risk groups was analyzed by GSEA.

Correlation analysis between risk model and clinical markers
To explore the relationship between the risk model and common clinical traits, we analyzed the relationship between the risk model and these common clinical traits, such as ATRX status, Chr 19/20 co-gain, Chr 7 gain/Chr 10 loss, MGMT promoter status, 1P/19q co-deletion, IDH status, mutation count, histology, gender, grade, age and fustat (state of existence) [20]. We found that the risk model is related to all clinical traits except gender. �� p<0.01, ��� p<0.001.

Analysis of transcription factors (TFs) regulating URGs in the risk model
Cistrome Cancer contains a regulatory link between TFs and the transcriptome, which is constructed by TCGA tumor molecules, 23,000 ChIP-seq and chromatin [21]. The Cistrome Cancer database contains 318 TFs. We used clinically relevant TFs to construct regulatory networks of TFs and URGs. A P-value<0.001 was considered statistically significant.

Analysis of the correlation between the risk model and immune cells
We used TIMER (Tumor Immune Estimation Resource) to analyze the numbers of six subtypes of tumor-infiltrating immune cells, including B cells, CD4 T cells, CD8T cells, dendritic cells, macrophages, and neutrophils of tumor-infiltrating immune cells, in 10,897 samples from 32 cancers of TCGA [22]. We downloaded the level of glioma immune infiltration by TIMER (https://cistrome.shinyapps.io/timer/) and analyzed the relationship between the risk score obtained and immune cell infiltration.

Statistical analysis
R software was used for statistical analyses. The differentially expressed URGs were obtained by Wilcoxon test. Statistical significance was defined as a two-tailed P value < 0.05. GSEA was performed using GSEA software, where |NES (normalized enrichment score)| > 1, NOM pvalue < 0.05, and FDR q-value < 0.25 were considered to indicate statistical significance.

Identification of differentially expressed URGs and survival-related ubiquitination genes
By referring to the literature, we obtained ubiquitination-related genes (URGs), including 929 ubiquitin ligase genes and 95 deubiquitination enzyme genes. To determine whether URGs can be used as markers to judge the prognosis of glioma patients, we first obtained 36 differentially expressed URGs, including 19 downregulated genes and 17 upregulated genes, by comparing 1157 non-tumor brain tissues with 697 glioma tissues ( Table 2). Cox analysis showed that 25 of these URGs were associated with patient survival (Fig 1). Among the 25 URGs, 11 were high-risk genes and 14 were low-risk genes. Survival analysis of CDC20, UBE2C, WDR62, DTL, HOXB4 and TRIM38 were statistically significant, and the AUC value was greater than 0.7 (Figs 2 and 3). We performed univariate and multivariate Cox analyses to determine the impact of individual genes on patient prognosis. Univariate analysis found that the six genes and more clinical traits (age, grade, histology, IDH status, 1p/19q co-deletion, MGMT promoter status, Chr 7 gain/Chr 10 loss, Chr 19/20 co-gain, ATRX status) are related to the prognosis of the patient without the influence of other factors ( Fig 4A, Table 3). However, multivariate Cox analysis found that only age, IDH status and Chr 19/20 co-gain were independent factors for patient prognosis (Fig 4B, Table 3).

Construction of a clinical prognostic model for URG evaluation
Cox analysis found no independent prognostic factors among the survival-related URGs. Therefore, R software was used to analyze the relationship between the 25 differentially expressed URGs and the prognosis of patients and remove the URGs that can be correlated. We obtained an optimized risk model for predicting the prognosis of glioma patients (Table 4). Through multivariate Cox regression analysis, glioma patients were divided into high-and-low risk groups according to the median value of the risk value, and the risk value was calculated according to the risk model formula. A model was constructed to predict patient prognosis (Fig 5, Table 4). The formula is as follows: [CDC20 expression level × (0.343) . Survival analysis showed that the 3-year and 5-year survival rates of patients in the high-risk group were 32.2% and 22.4%, respectively, and those in the low-risk group were 92.4% and 78.5%, respectively ( Fig 6A). ROC curve analysis found that the model accuracy rate was 85.3% ( Fig 6B). The model we constructed is expressed by riskScore, and univariate and multivariate Cox analyses showed that age, grade, IDH status, Chr 19/20 co-gain and riskScore can be used as independent prognostic factors (Fig 7, Table 5). In addition, we also constructed prognostic models for low-level and high-level groups, survival analysis and ROC analysis for verification in the TCGA and GSE108474 databases (Tables 6 and 7, S1 and S2 Figs).

The relationship between the risk model and clinical traits, signaling pathways, TFs and immune cells
We analyzed the relationship between the risk model and multiple clinical molecular markers and found that the risk model is related to ATRX status, Chr 19/20 co-gain, Chr 7 gain/Chr 10 loss, MGMT promoter status, 1P/19q co-deletion, IDH status, mutation count, histology, grade, and age (Fig 8). GSEA analysis revealed that the high-risk group is enriched in ECMreceptor interaction, focal adhesion, homologous recombination, Jak-STAT signaling pathway, leukocyte transendothelial migration, mismatch repair, and the Toll-like receptor signaling pathway (Fig 9A). However, no meaningful enrichment was found in the low-risk group. Cancer is caused by aberrant gene regulation. TFs are important molecules that directly regulate gene expression. To explore potential molecular mechanisms corresponding to the URGs in the risk model, we thus explored the potential TFs that may function in regulating the URGs in the risk model. Cytoscape was used to display TFs that regulate risk model genes ( Fig  9B). GSEA analysis found that the risk score was related to leukocyte transendothelial migration. Since immune cell infiltration is very important in tumor progression, we next examined the relationship between the risk score and immune cells and found that it was related to B cells, CD4 T cells, and neutrophils (Fig 10).

Discussion
By analyzing 929 ubiquitination ligase genes and 95 deubiquitination genes in the TCGA and GTEx databases, we obtained 36 differentially expressed URGs. Our results showed that 25 of the differentially expressed URGs were related to survival in glioma patients; six genes, CDC20, UBE3C, WDR62, DTL, HOXB4, and TRIM38, were statistically significant with an AUC value greater than 0.7. However, none of the six genes were independent prognostic factors for glioma patients. We constructed a model using the URGs that predicts the prognosis    of glioma patients. Patients were stratified into high-and low-risk groups using the model, and the 5-year survival rates of the patients in the high-and low-risk groups were 22.4% and 78.5%, respectively, with a true positive rate of 0.853. Cox analysis showed that the risk model was independent factor for patient prognosis, in addition to age, grade, IDH status, and Chr 19/20 co-gain. The risk model was related to multiple factors such as ATRX status, Chr 19/20 co-gain, Chr 7 gain/Chr 10 loss, MGMT promoter status, 1P/19q co-deletion, IDH status, and mutation count, among other factors. GSEA found that the high-risk group was enriched in ECM-receptor interaction, focal adhesion, homologous recombination, Jak-STAT signaling pathway, leukocyte transendothelial migration, mismatch repair, and Toll-like receptor signaling pathway. In addition, we identified TFs that may regulate these genes and were related to B cells, CD4 T cells and neutrophils. The degradation of intracellular proteins is mainly accomplished by the ubiquitin-proteasome system, and the ubiquitin-proteasome have strong selectivity for protein degradation [23]. Studies have shown that the ubiquitination pathway is closely related to the occurrence and development of tumors [1,2,8,9,19]. Here we identified six differentially expressed URGs that were related to survival in glioma patients. CDC20 drives the aggressiveness and self-renewal of gliomas and is associated with genomic instability [24,25]. CDC26 is a component of the late cell cycle anaphase promotion complex, which is activated by CDC20 to regulate the cell cycle [26]. CORO6 is a member of the coronin family and is an actin-binding protein [27,28]. HOXB4 is associated with the prognosis of ovarian cancer patients, leukemia resistance, and renal cancer [29][30][31]. RNF212 is associated with postoperative survival rate and TMZ chemoradiation response in GBM patients as well as mammalian meiosis [32,33]. TRIM38 is associated with innate immunity and inflammation [34,35]. Few studies have

PLOS ONE
examined RFPL4B and UBE2NL. Our constructed model of URGs can be used as an independent prognostic factor to predict the prognosis of glioma patients. With the completion of the Human Genome Project, some gene mutations have been shown to be aberrant in patients with gliomas and can be used for diagnosis and treatment of gliomas, such as IDH status, MGMT promoter status, and 1P/19q co-deletion etc [36][37][38]. We analyzed the relationship between the score of the risk model and these molecular markers of glioma and found that the risk score was closely related to the above molecular markers, which proves the reliability of the model. GSEA identified multiple tumor signaling pathways that may lead to disease progression in high-risk groups. Previous studies showed that the ubiquitination pathway regulates tumor progression through an immune response [1,39]. GSEA found that the high-risk group was enriched for leukocyte transendothelial migration, and correlation analysis found that the risk score is related to the number of B cells, CD4 T cells and neutrophils.
In summary, we built a model of URGs to predict prognosis in glioma patients. GSEA results may have identified a possible mechanism in the patients with poor prognosis. Our results may provide new targets for the prognosis and treatment of glioma patients.