Identification of biomarkers for Barcelona Clinic Liver Cancer staging and overall survival of patients with hepatocellular carcinoma

The aim of the current study was to identify biomarkers that correlate with the Barcelona Clinic Liver Cancer (BCLC) staging system and prognosis of patients with hepatocellular carcinoma (HCC). We downloaded 4 gene expression datasets from the Gene Expression Omnibus database (http://www.ncbi.nlm.nih.gov/geo), and screened for genes that were differentially expressed between HCC and normal liver tissues, using significance analysis of the microarray algorithm. We used a weighted gene co-expression network analysis (WGCNA) to identify hub genes that correlate with BCLC staging, functional enrichment analysis to associate hub genes with their functions, protein-protein interaction network analysis to identify interactions among hub genes, UALCAN analysis to assess gene expression levels based on tumour stage, and survival analyses to clarify the effects of hub genes on patients’ overall survival (OS). We identified 50 relevant hub genes using WGCNA; among them, 13 genes (including TIGD5, C8ORF33, NUDCD1, INSB8, and STIP1) correlated with OS and BCLC staging. Significantly enriched gene ontology biological process terms included RNA processing, non-coding RNA processing and phosphodiester bond hydrolysis, and 6 genes were found to interact with 10 or more hub genes. We identified several candidate biomarkers that correlate with BCLC staging and OS of HCC. These genes might be used for prognostic assessment and selection of HCC patients for surgery, especially those with intermediate or advanced disease.


Introduction
Hepatocellular carcinoma (HCC) ranks fifth in cancer incidence and third in cancer mortality worldwide [1]. Barcelona Clinic Liver Cancer (BCLC) staging is a commonly used staging system for HCC that considers tumour burden, liver function, and patients' general condition; for patients with early-stage HCC, it also analyses indications for hepatic resection [2]. However, increasing evidence indicates that patients with intermediate or advanced HCC might benefit from hepatic resection [3][4][5][6][7][8][9]. Our previous study showed that prognoses of patients at the same BCLC stage varied significantly after surgery. For  stage B, 14.7% died within 1 year after surgery whereas 33.8% were still alive after 5 years [10]. We conclude that HCC is a highly heterogeneous and comprehensive tumour even within the same clinical stage. Differences in prognosis after hepatectomy for HCC might result from different biological behaviours of tumours, which could be assessed by different expression levels of genes correlated with BCLC staging and patients' overall survival (OS). Thus, identification of differentially expressed genes (DEGs) in HCC is clinically significant, as their products could be markers for survival, prognostic effects of liver resection, or surgical indication, especially among patients with intermediate or advanced HCC. Other studies have used microarray technology to identify HCC-associated DEGs in daily clinical practice, including applications for differential diagnosis, risk assessment, determination of prognosis, disease monitoring and prediction of treatment response [11][12][13]. For instance, Sakabe T et al. reported that DKK1 might be an unfavourable prognostic marker for HCC patients [11]; Zheng H et al. found that TMED3 promotes HCC progression [12]; and Emma MR et al. identified NUPR1 as a promising target in HCC for its effects in controlling cell growth, migration, invasion and sorafenib resistance [13]. However, their respective gene expression datasets were not fully utilized due to restricted purposes or limited tools, and information on potential biomarkers associated with BCLC staging and patients' prognosis remains scarce.
In the present study we identified HCC-associated DEGs through differential analysis of 3 gene expression datasets. Another dataset from surgical specimens was used to perform a weighted gene co-expression network analysis (WGCNA) for upregulated DEGs. Gene modules with close relationships to the BCLC staging system were screened. We also used UAL-CAN analysis, an online tool based on The Cancer Genome Atlas (TCGA) datasets, to reassess the expression levels of hub genes associated with tumour stages and OS.
The raw data were pre-processed. Probes were matched to genes and gene expression levels were taken as the average probe values for genes corresponding to more than one probe. Log2 conversion and quantile normalization were applied to data if appropriate. Genes with more than 20% missing values were removed.

Functional enrichment analysis
Gene ontology (GO) is a major bioinformatics tool for gene annotation that uses a highly structured vocabulary including three main categories: molecular functions (MF), biological processes (BP) and cellular components (CC) [19]. The Kyoto Encyclopaedia of Genes and Genomes (KEGG) is a database used to associate related genes by pathways [20]. The Database for Annotation, Visualization, and Integrated Discovery (DAVID; http://david.ncifcrf.gov; version 6.8) provides a comprehensive set of functional annotation tools that enables investigators to understand the biological meaning behind large lists of genes [21]. In the present study, DAVID online tools were used to perform GO annotation and KEGG pathway enrichment analyses of DEGs. P<0.05 and false discovery rate (FDR) <0. 25 were set as thresholds.

WGCNA
A gene co-expression network was constructed to screen for genes related to BCLC staging among the DEGs. The WGCNA package of R was used [22]. Before network construction, obvious outlier samples or samples with excessive numbers of missing entries were removed using a sample network method for outlier detection.
Step-by-step network construction and module detection were performed [22]. The soft-threshold power was selected based on the criterion of approximate scale-free topology. We calculated adjacencies using soft-threshold power and then transformed the adjacency into Topological Overlap Matrix (TOM) and calculated the corresponding dissimilarities. Average linkage hierarchical clustering was performed based on the TOM with a minimum module size of 30 and a medium sensitivity of 2, and gene modules with very similar expression probes were identified and merged. Furthermore, after correlating gene modules with external clinical traits, the associated gene significance (GS, the absolute value of the correlation between the gene and the trait) and module membership (MM, the correlation of the module gene and the gene expression profile) were assessed. Based on GS and MM, specific gene modules related to BCLC staging were selected and the top 50 core genes in the selected modules were screened and regarded as hub genes.

Functional annotation analysis for hub genes
To analyse hub gene functions at the molecular level, GO functional enrichment and KEGG pathway analysis were performed using the DAVID online tool. The Search Tool for the Retrieval of Interacting Genes (STRING; http://string-db.org; version 10.5) database was used to construct a protein-protein interaction (PPI) network [23]. An interaction with a combined score >0.4 was considered statistically significant.

UALCAN analysis
UALCAN is an interactive web-portal for facilitating tumour subgroup gene expression and survival analyses (http://ualcan.path.uab.edu/analysis.html) [24]. We used UALCAN analysis to estimate the effects of hub gene expression levels based on tumour stages in the Cancer Genome Atlas (TCGA) liver cancer datasets. Available TCGA patient survival data were also used for Kaplan-Meier survival analyses.

Identification of DEGs
Sets of 19611, 20258 and 22185 genes were detected in the datasets of GSE87630, GSE84598, and GSE45267, respectively, after pre-recession of raw data. DEGs were screened in 3 datasets, which contained both tumour and normal tissues. Cohorts of 1163, 1914, and 1582 DEGs were obtained in GSE87630, GSE84598, and GSE45267, respectively. Only 352 genes were common among the 3 sets of DEGs; 623 genes were common between 2 sets of DEGs; and 2340 genes were unique to one set. This suggests a high heterogeneity among HCC samples. The combined 3 sets of DEGs, a total of 3315 genes, of which 1573 were upregulated and 1748 were downregulated, were regarded as HCC-related DEGs for further analysis.

Functional enrichment analysis
Results of GO analysis showed that upregulated DEGs were significantly enriched in cell division, sister chromatid cohesion, and DNA replication in BP; protein binding, poly-A RNA binding and ATP binding in MF; and nucleoplasm, nucleus and cytosol in CC. Among downregulated DEGs, significantly enriched GO-terms included oxidation-reduction process, immune response and xenobiotic metabolic process in BP; haem binding, monooxygenase activity and oxygen binding in MF; and extracellular space, extracellular region and extracellular exosome in CC ( Table 1).
The KEGG pathways of the upregulated DEGs were enriched in cell cycle, DNA replication, and viral carcinogenesis; and those of the downregulated DEGs were enriched in complement and coagulation cascades, metabolic pathways, chemical carcinogenesis, and drug metabolism by cytochrome P450 (Table 2).

WGCNA
To further screen relationships between the 1573 upregulated DEGs and clinical traits, we performed WGCNA. Among 1573 DEGs, 1568 gene expressions mapped to probes in GSE76427 (S1 Table), which contained pre-processed raw data. Based on WGCNA results, systematic clustering of upregulated DEGs was generated (Fig 1). The gene modules were signified by different colours. According to the hierarchical clustering tree of genes, 7 gene modules were identified with the soft-threshold power of 3 and the gene number of gene modules ranged from 41 (black) to 756 (turquoise). The heat map of upregulated DEGs classified by gene modules was shown in Fig 2. The gene modules were also associated with external clinical traits (S2 Table) (Fig 3). Genes in the Green module clearly showed a closer relationship with BCLC staging than any other gene modules, and were also correlated with OS. Based on GS and MM, the top 50 core genes in the Green module were regarded as hub genes, including PUF60, BOP1, and SCRIB (S3 Table).

Functional annotation analysis for hub genes
Results of GO analysis showed that hub genes were mainly concentrated in RNA processing, non-coding RNA processing and RNA phosphodiester bond hydrolysis in BP; RNA binding, nucleic acid binding and poly-A-RNA binding in MF; and nucleus, nuclear lumen and nuclear part in CC (Table 3). No other significantly enriched pathways of hub genes were identified. Based on the information in the STRING databases, we created the PPI network of hub genes. HSP90AB1, BOP1, DCAF13, RPL8 and STIP1 were the 5 genes with the most interactions with other hub genes, and were at the core of the PPI network.

Discussion
Recent studies have shown that many patients with HCC whose disease exceeds BCLC staging recommendations for resection might benefit from hepatectomy [3][4][5][6][7][8][9]. However, in our previous studies 18.6% of these patients had unfavourable surgical outcomes, such as death within 1 year of surgery with tumour recurrence; therefore, these liver resections were regarded as futile procedures [10]. We believe that HCC is a highly heterogeneous tumour even when classified into the same clinical stage, and a thorough selection process is essential to ensure that only patients whose survival is likely to improve undergo liver resection. Biomarkers are substances found in tissue, blood, or other body fluids that indicate diagnostic, prognostic, predictive, therapeutic, or other clinically relevant properties [25]. Many biomarkers for HCC and their corresponding targeted agents have been identified for use in  the diagnosis and treatment of HCC [11][12][13]. In the present study, we aimed to find candidate biomarkers for which high expression was associated with higher BCLC staging and poorer OS of HCC patients, as they may reflect tumour characteristics indicative of prognosis (such as malignancy) and could thus be used to improve selection of candidates for surgery; patients with comparatively low expression of these biomarkers might have less malignant tumours and therefore might be likely to enjoy longer OS after hepatectomy. We found a total of 3315 HCC-related DEGs, among which 1568 genes were upregulated in HCC and were matched with probes in another dataset containing samples from 115 HCC patients with BCLC stage 0-C disease who underwent surgery. Using WGCNA, we identified 50 hub genes that were significantly correlated with BCLC staging. The results of functional enrichment analysis of upregulated DEGs showed significant enrichment of genes involved in cell division and mitotic nuclear division, which might be related to proliferation of cancer cells. Pathway enrichment analysis showed that cell cycle, DNA replication, viral carcinogenesis, and RNA transport were significantly represented in the DEGs; these pathways are closely related to the occurrence and progression of tumours [26,27]. However, functional enrichment analysis of 50 hub genes gave quite different results. The significant enrichment terms of hub genes were RNA processing, non-coding RNA processing and RNA phosphodiester bond hydrolysis, rather than the GO-BP terms related to cell proliferation identified for DEGs, such as positive regulation of cell division, mitotic nuclear division, and cell cycle. Furthermore, these hub genes showed no significant enrichments in biological pathways. We speculate that the hub gene products have complex molecular functions; that is advanced HCC reflects more than accelerated cell proliferation or enhanced inhibition of apoptosis, and may not result from alterations in only a few central molecular pathways.
Unsurprisingly, genes related to BCLC are also associated with HCC prognosis, as shown by the WGCNA results; in previous studies this may have been interpreted as a decline in patient prognosis as BCLC stage progresses [2]. To reassess hub gene expression levels  [28][29][30][31]. C8ORF33 has been associated with OS in HCC patients and could be used to distinguish poorly differentiated from well differentiated HCCs [28]. STIP1 acts as an adaptor protein that coordinates functions of HSP70 and HSP90 in protein folding, and as a secretory protein that regulates malignant cell growth. Autocrine STIP1 may inhibit HCC apoptosis through the PI3K-AKT-dependent pathway and is associated with poor prognosis in HCC [29]. Hsp90AB1 is frequently upregulated in cancer [30]. In the HCC cell line HepG2, HSP90AB1 expression is upregulated by hepatitis B virus encoded X protein (HBx), which may help reveal the role of HBx in hepatocarcinogenesis [31]. HSP90AB1 may also regulate angiogenesis, an important function in tumour progression [30]. PRKDC has been identified as critical to HCC pathogenesis and prognosis through RNA sequencing data [32]. Decreased PRKCA expression results in decreased cell proliferation, migration and invasion of HCC cells, suggesting a role for PRKDC in the malignant progression of HCC [33].
With regard to the other genes, NUDCD1 (also known as CML66 or OVA66) has been reported to have an oncogenic function in cervical carcinoma by favouring tumour cell proliferation, invasion, and metastasis associated with multiple pathways [34]. NUDCD1 is also expressed abundantly in primary acute myeloid leukaemia cells, acute lymphoid leukaemia cells, and chronic myelogenous leukaemia cells in advanced phase, compared with normal cells [35]. INTS8 (also known as C8orf52) has been identified as a potentially mutational driver gene in endometrial carcinoma [36]; studies based on microarrays and exome-sequencing have proposed roles for INTS8 in gastric cancer [37] and peripheral T-cell lymphoma [38]. DSCC1 is implicated in sister chromatid cohesion and DNA replication, and its elevated expression decreases apoptosis in colorectal cancer cells [39]. These data suggest that NUDCD1, INTS8, and DSCC1 might be novel indicators for HCC staging and prognosis, although the link between these three genes and HCC is currently unclear.
TIGD encodes POP1 (also known as ANXD2), which belongs to the Tigger subfamily of the Pogo superfamily of DNA-mediated transposons in humans. POP1 is a ribonuclease that localizes to the nucleus and affects pre-RNA processing. ARHGAP39 (also known as CrGAP or Vilse) is a Rho-GTPase activating protein that may affect dendritic structure and synaptic function in the brain [40]. The functions of ZNF623, YDJC, and PUSL1 are unclear. However, their correlations with liver cancers warrant further study, as these genes and their products are potential prognostic markers of HCC.

Conclusion
We identified 50 hub genes that correlated with BCLC staging of HCC patients, of which 13 hub genes (TIGD5, C8ORF33, NUDCD1, INTS8, ZNF623, STIP1, HSP90AB1, DSCC1, POP1, ARHGAP39, PRKDC, YDJC and PUSL1) also correlated with OS in HCC patients. We hypothesize that patients with BCLC stage B/C HCC and higher expression of these genes are likely to have poor prognoses after surgery, and curative surgical treatment might be avoided in such cases. Prospective studies are needed to confirm the gene functions and clarify the mechanisms of their effects on HCC staging and progression.
Supporting information S1 Table. Gene expressions mapped with probes in GSE76427. Gene expressions mapped with probes in GSE76427 with pre-processed raw data. (XLSX) S2 Table. External clinical traits of GSE76427. External clinical traits of GSE76427. (XLSX) S3 Table. Gene significance and module membership based on WGCNA. Based on gene significance and module membership, the top 50 core genes in the Green module were regarded as hub genes. (XLSX)