Identifying Prognostic Features by Bottom-Up Approach and Correlating to Drug Repositioning

Background Traditionally top-down method was used to identify prognostic features in cancer research. That is to say, differentially expressed genes usually in cancer versus normal were identified to see if they possess survival prediction power. The problem is that prognostic features identified from one set of patient samples can rarely be transferred to other datasets. We apply bottom-up approach in this study: survival correlated or clinical stage correlated genes were selected first and prioritized by their network topology additionally, then a small set of features can be used as a prognostic signature. Methods Gene expression profiles of a cohort of 221 hepatocellular carcinoma (HCC) patients were used as a training set, ‘bottom-up’ approach was applied to discover gene-expression signatures associated with survival in both tumor and adjacent non-tumor tissues, and compared with ‘top-down’ approach. The results were validated in a second cohort of 82 patients which was used as a testing set. Results Two sets of gene signatures separately identified in tumor and adjacent non-tumor tissues by bottom-up approach were developed in the training cohort. These two signatures were associated with overall survival times of HCC patients and the robustness of each was validated in the testing set, and each predictive performance was better than gene expression signatures reported previously. Moreover, genes in these two prognosis signature gave some indications for drug-repositioning on HCC. Some approved drugs targeting these markers have the alternative indications on hepatocellular carcinoma. Conclusion Using the bottom-up approach, we have developed two prognostic gene signatures with a limited number of genes that associated with overall survival times of patients with HCC. Furthermore, prognostic markers in these two signatures have the potential to be therapeutic targets.


Introduction
Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related death in the world, especially in Asia and Africa [1]. Surgical resection is one of the most important curative treatments for HCC, while long-term survival of HCC remains poor because of high recurrence rate. Improvements in early diagnosis and accurate staging systems can help guide patients to take optimum treatment strategies that may suppress recurrence and prolong survival [2].
Currently, beyond tumor-node-metastasis(TNM) staging system, several prognostic algorithms used to predict survival among patients with hepatocellular carcinoma have been established, Barcelona Clinic Liver Cancer (BCLC) and Cancer of the Liver Italian Program (CLIP) systems are among the most commonly used systems worldwide [3][4][5][6][7]. Nonetheless, all these staging systems only select optional clinical and serum biochemical indexes such as tumor size, vascular invasion, alpha fetoprotein (AFP), albumin(ALB), etc. Although these clinicopathologic staging systems have been proven useful, their predictive accuracy remains limited and they failed to provide molecular biological characteristics of HCC that might be genetic and heterogenic. With the recent advances in genome studies, gene expression profiling-based studies have improved our understanding of cancer biology and gene expression signatures have been successfully used as prognostic tools especially in breast cancer [8][9][10].
Recently, two main strategies have been used for prognostic gene signature identification, symbolized as "top-down" or "bottom-up". In 'top-down' approach firstly genes with different expression patterns between case samples and control samples were sought, secondly gene(s) with expression level(s) significantly correlating to histological grades or biological phenotypes were selected as candidate genes, then a regression model with best predictors was built to construct the final gene signature. Many studies have applied this approach and identified quite a few gene signatures with prognostic values, such as MammaPrint [11]. Unfortunately, prognostic signatures identified by this approach in HCC present minimal overlaps and few of them have been adopted in routine clinical practice [12][13][14]. The 'bottom-up' approach was firstly based on supervised analysis of genes which are directly associated with occurrence of the event studied (metastasis, survival), Secondly some genes were selected by machine learning algorithms or significant enrichment analysis of specific pathways or biological functions, and then the prognostic value of these gene sets could be calculated [15,16]. This 'bottom-up' approach was applied by some groups in several cancers but few of assays used this approach to identify HCC prognostic signatures. While the candidate gene set was determined by "topdown" or "bottom-up" approach, various machine learning algorithms including regression models can be used to identify the final gene signatures. However, overfitting and the low accuracy in independent cohort limited the clinical application of these algorithms [17]. Meanwhile, analysis of networks and modular biological processes has shown the effective capacity to estimate key genes which may have impact on patient outcomes [18].
In addition to identifying prognostic signatures in tumor tissues, many research groups showed that genes in adjacent non-tumor tissues appear to be implicated in tumour progression and aggressiveness, gene expression profiles in surrounding non-tumor tissues can be helpful to identify signatures associated with outcomes [9]. This can be especially true with adjacent non-tumor tissues for HCC, where often pathological change such as cirrhosis is present because of long-term inflammation caused by hepatitis B or C virus (HBV or HCV) infection [9,19].
Over the past years, large investments have been input to discover novel markers for valuable biological insights to mechanisms of many common diseases. Nonetheless, the translation from genetic findings to clinical applications remains limited. Recently, some works explored one potential application of these genetic molecular markers: drug repositioning [20,21]. We suppose that prognostic gene signatures could not only be used for predicting patient outcome, but also can be used to gain important information for drug discovery, which can then be utilized to provide appropriate therapeutic advices to patients.
In this study, we have mainly implicated the 'bottom-up' approach to develop HCC prognostic signatures both in tumor and adjacent non-tumor tissues. The workflow of our analysis is shown in Fig. 1. In the training cohort, we first identified genes significantly associated with clinical staging systems or patient survival times, then selected some representative and important markers by network and pathway enrichment analyses, finally developed two signatures that can predict overall survival of HCC patients. Likewise, the 'top-down' approach have also been tried. Differentially expressed genes(DEGs) in tumor tissues compared to adjacent nontumor tissues were firstly obtained, then we adopted the same bioinformatics methods to identify some core genes, based on which a DEGs-related signature was finally constructed. The three signatures were further tested in an independent cohort. The results showed that signatures developed by the 'bottom-up' approach were more efficient compared to signatures identified by the 'top-down' approach, and two signatures identified by 'bottom-up' approach were validated in the independent cohort. In addition, we have evaluated the potential application of our molecular markers as drug targets by drug repositioning analysis.

Datasets: Genomic profiles and patient information
Publicly available datasets with whole-genome gene expression measures in 225 tumors and 220 adjacent non-tumor tissues from 225 primary HCC patients were downloaded from microarray databases Gene Expression Omnibus (GEO: http://www.ncbi.nlm.nih.gov/projects/ geo/), the accession number is GSE14520 [14]. Pre-processed series of matrixes originally provided by the authors were used in our analysis. Accessory available clinical and follow-up data were also provided by the authors. Patient and tumor features are detailed in Table 1. Among the 225 patients, four patients were excluded due to the lack of survival time. The validation set that included 80 tumors and 82 adjacent non-tumor liver tissues from 82 primary HCC patients was also retrieved from Gene Expression Omnibus, the accession number is GSE10143 [9]. For each sample, the expression values of all probes for a given gene were reduced to a single value by taking the average expression value.
'Top-down' approach: identification of differentially expressed genes (DEGs) As the first step of the 'top-down' approach, we identified the most varying genes between tumors and non-tumor tissues using Linear Models for Microarray Data (limma) [22] analysis. Eventually, only the DEGs with a false discovery rate(FDR) < 0.05 and fold change 2 were selected. The workflow of prognostic model construction. Firstly, prognostic genes were selected for ProgGenes and ProgNGenes sets (bottom-up approach). Differentially expressed genes(DEGs) were selected by comparing the profiles of tumors and non-tumors synchronously (top-down approach). Next, the three gene clusters identified by both approaches went through both network and pathway analysis to obtain most important genes for prognostic signature assembling and model construction. Finally the signatures were validated in one independent clinical data set. 'Bottom-up' approach: identification of outcome correlated genes in tumor and adjacent non-tumor tissues In this study, the 'Bottom-up' approach was mainly applied to develop HCC prognostic signatures. To obtain genes with potential prognostic value in tumor and adjacent non-tumor tissues, we evaluated all the genes in two aspects (Fig. 1).

Survival time related genes
Survival-directed prognostic genes contains recurrence free survival(RFS) and overall survival (OS) related genes respectively. The correlation to RFS or OS of each gene is tested by both univariate Cox proportional hazards regression and supervised principal components analysis (SPCA) [23,24]. For univariate-Cox analysis, genes with a p-value < 0.05 was selected as Cox RFS or OS prognostic genes. In our work, the patient outcomes were used as response variables and the first principal component of SPCA were selected to determine most important score for each gene. Finally, top 2000 most important genes were selected as SPCA RFS or OS prognostic genes. At last, overlapping genes identified by both univariate-Cox and SPCA were considered as RFS or OS prognostic genes. Stage related genes. HCC staging systems are commonly used to indicate outcomes. As noted in Table 1, the three clinical staging systems TNM, BCLC, and CLIP are most significantly associated with RFS and OS. We may reason that genes significantly correlated to TNM, BCLC and CLIP also have prognostic values. Therefore, genes associated with the three clinical staging systems were selected. TNM and CLIP related genes were selected by logistic regression analysis and Kruskal-Wallis test. BCLC related genes were selected using logistic regression analysis and t-test. Genes with a p-value <0.05 were regarded as significant genes.
According to the above analysis, five gene sets were identified in tumor tissues as well as adjacent non-tumor tissues, which are related to RFS, OS, TNM, BCLC or CLIP respectively. Finally genes identified in two of five types of prognostic gene sets were defined as prognostic genes( Fig. 1), which was termed as ProgGenes in tumors. Similarly, a set of prognostic genes termed ProgNGenes in adjacent non-tumors were also determined.

Ranking of candidate gene sets by network prioritization
Networks construction. To identify the key genes involved in HCC from the aforementioned three larger sets of candidate genes separately, we constructed three protein-protein interaction (PPI) networks and performed topological analysis for each. Each gene set was firstly converted to be the seed proteins. Initial interactions among these proteins with a confidence score > 0.4 were obtained from STRING database (version 9.1) (Search Tool for the Retrieval of Interacting Genes/Proteins; http://string.embl.de/). Then, to ensure these interactions truly exist in the gene expression profiles, only gene pairs with a p-value from Pearson Correlation Test less than 0.05 were retained. After that, an undirected network was finally constructed based on these gene pairs.
Topological analysis of protein interaction networks. In order to analyze these three networks and to search topologically important nodes, three fundamental measurements in network: degree, betweenness centrality(BC) and closeness centrality(CC) were calculated. Degree measures how many neighbors a node directly links to. A node with high degree centrality may have more influence over others. BC measures how often nodes occur on the shortest paths between other nodes [25], and CC measures the average length from a node to all other nodes [26]. In the PPI network, the nodes with high degree are defined as hub nodes, the nodes with high BC were defined as bottleneck nodes [27], and the nodes with high CC are also important [28], all the three types of nodes are key nodes in the network. In this study, the prioritization of candidate genes was based on the average ranking derived from the three parameters. For each gene in the network, a comprehensive rank score(RS) was calculated as follow: Where Rank degree , Rank BC and Rank CC represent the rank of the gene in the network according to degree, BC and CC.
We selected the top 5% nodes as the key nodes for later analysis. Network visualizations were generated by Cytoscape [29] and network topological parameters were calculated using the Network Analyzer Cytoscape plugin [30].

Pathway Enrichment Analysis
By using the online software GSEA(gene set enrichment analysis, http://www.broad.mit.edu/ gsea/, v3.87), KEGG pathway enrichment analysis was carried out to find the main functional and metabolic pathways involved in HCC. The most significantly enriched genes (FDR<0.001) were mapped to the corresponding KEGG pathways.

Integration of network prioritization and pathway enrichment
In order to find vital genes not only playing key roles in networks but also possessing important biological functions in pathways, we integrated the network prioritization result and pathway enrichment result. P-value of hypergeometric probability distribution were calculated to examine whether genes in each significantly enriched pathway were also enriched in key genes selected from PPI network. Significant genes involved in pathways of the same main category were combined into a gene cluster. At last, three types of gene clusters were retrieved, which were representative for DEGs, ProgGenes and ProgNGenes.

Construction of prognostic signatures
Each gene cluster of DEGs, ProgGenes and ProgNGenes type was tested for its prognostic value separately. Both leave one out cross validation (LOOCV) combined with Cox proportional regression and unsupervised hierarchical clustering were used for patient classification. LOOCV involves using each sample in turn as the validation data for prediction, and the remaining samples as the training set to establish Cox proportional hazards model. The classification of the left-out sample was based on predicted value compared with the mean predicted value of the remaining samples. Finally all samples were divided into two classes, then we evaluate whether the two classes show significantly different overall survival [31]. Unsupervised hierarchical clustering is a more popular classifier in many prognostic studies. Here Ward's method was applied for clustering [32]. Gene clusters that performed well in both LOOCV and hierarchical clustering method are retained to construct integrated prognostic signature for each type of DEGs, ProgGenes and ProgNGenes.

Validation in the independent cohort
For each type of DEGs, ProgGenes and ProgNGenes, a best integrated prognostic signature was constructed by combining gene clusters in the same category as a final prognostic signature. To evaluate the consistency and robustness of our signatures, we tested their prognostic power in an independent HCC microarray dataset(GSE10143) using LOOCV and hierarchical clustering. Further, we compared the performance of our signatures with other 21 signatures reported with prognostic value for either recurrence or survival of HCC from Molecular Signature Database (v4.0, http://www.broadinstitute.org/gsea/msigdb).

Survival Analysis
We used log-rank test and Kaplan-Meier method to assess survival. Survival analyses were performed using the Cox proportional regression model. All statistical analyses were conducted using the open-source R software (http://www.r-project.org). All P values were two sided.

Drug repositioning analysis
Genes identified by our methods play very important roles in HCC and approved drugs targeting these genes might also be effective for treating HCC. Information on drugs, drug targets and drug indications was obtained through a combination of Therapeutic Target Database (TTD, http://xin.cz3.nus.edu.sg/group/ttd/ttd.asp) and DrugBank (http://www.drugbank.ca/).

Identification of top-down and bottom-up candidate prognostic gene sets in HCC
To identify a robust molecular signature to predict outcome of HCC, we analyzed the genomewide expression profiling of 225 HCC patients from three aspects (see flowchart in Fig. 1).
According to the top-down approach, we first compared gene expression profiles between tumor tissues and adjacent non-tumor tissues. A total of 1400 transcripts corresponding to 1079 genes were selected as DEGs.
In bottom-up approach, 1312 genes were significantly related to outcome or prognostic staging systems within tumor tissues, which were termed as ProgGenes. Similarly, 872 prognostic genes obtained from adjacent non-tumor tissues were regarded as ProgNGenes. In the following analysis, each of the three candidate gene sets will be analyzed to identify a prognostic gene signature separately.

Prioritization of candidate gene clusters by network topology and function analysis
In order to excavate stable and important genes involved in the development of hepatocellular carcinoma, we combined network and pathway enrichment analysis.
The three candidate gene sets were converted to seed proteins. Then functional linkages among these proteins were acquired from the STRING database. However, these associations were derived through wide diversity of experiments or prediction algorithms, and detected in various types of cell rather than in hepatic cells. Therefore, we next evaluated all the interactive pairs with our expression profiles. Only significantly related pairs were retained to construct subsequent molecular networks. At last, three PPI networks were constructed based on these linkages respectively, as shown in S1 Fig. A-C. The interactions of DEGs network were most closely linked(density: 0.019, and average neighbors: 15.68). And the next is the ProgNGenes network, of which density and average neighbors is 0.009 and 4.96 separately. The density and average neighbors of ProgGenes network is 0.007, 6.35 respectively. To extract key genes from one big network effectively, we analyzed and processed the networks with measurements of degree, betweenness centrality(BC) and closeness centrality(CC). The average ranking of these three measurements for each node in the undirected networks was evaluated. Nodes at top 5% of the networks, which are more important than other less connected nodes, were selected as the core network gene sets (S1 Table).
KEGG pathway enrichment analyses were carried out for the three candidate gene sets separately. The DEGs were significantly enriched in 78 pathways, out of which 28 belong to metabolism related subcategories(S2 Table). Besides, the ProgGenes were significantly enriched in 86 pathways, and 39 out of which also belong to metabolism related subcategories(S3 Table). Relatively, ProgNGenes were enriched in 52 pathways, out of which only 4 related to metabolism, the other enriched pathways belong to 19 subcategories such as signal transduction and immune system(S4 Table).
In order to find vital genes not only playing key roles in networks but also possessing important biological functions, we examined whether genes in each significantly enriched pathway were at the same time enriched in core genes derived from PPI network analysis. The significant genes are shown in S5-S7 Tables. It can be seen that many genes involved in different pathways belong to the same main category. To generate representative signatures for each type of DEGs, ProgGenes, and ProgNGenes, significant genes belonging to the same main category were combined into one functional gene cluster. Eventually, five gene clusters termed as D1-D5 which were most representative for DEGs, four gene clusters P1-P4 for ProgGenes and five clusters PN1-PN5 for ProgNGenes were obtained. In addition, because that immune microenviroment has been proven with value for predicting outcomes [33] and many immune related pathways were identified in non-tumor tissues, we supposed that genes related to immune system to be important for HCC prognosis prediction. These genes were singled out as PN6. Moreover, as we all know, most cancers exhibit similar characteristics and share some common related genes such as EGFR, P53. We hypothesized that our final identified genes embedded in cancer related pathways may also be indicators for HCC outcome, and these genes in DEGs, ProgGenes and ProgNgenes were group as signature D6, P5 and PN7.

Evaluation of prognosis performance for functional gene clusters and construction of final combined prognostic gene signatures
For our ultimate goal of prognosis analysis for HCC patients, we first assessed the prognosis performance of all the functional gene clusters. We adopted supervised LOOCV and unsupervised hierarchical clustering methods to test whether each gene cluster could classify patients into two groups with significantly different overall survival(OS) rates (good and poor prognosis respectively). The corresponding p-values for each functional gene cluster are shown in Table 2 and S2 Fig. From the table we can see most of the gene clusters show significant prognostic relevance, clusters in non-tumor perform even better than clusters of DEGs and ProgGenes (from tumor). Among all the clusters identified in DEGs and ProgGenes, metabolism related cluster D1, replication and repair cluster D2, cell growth and death cluster D3, carbohydrate metabolism cluster P1, mTOR signaling cluster P2, cancer related cluster P5 were significant (p<0.05) or showed a tendency (p<0.1) for prognostic value for HCC, evaluated by both LOOCV and hierarchical clustering. In non-tumor tissues, one carbon pool by folate cluster PN1, signal transduction cluster PN2, Human Diseases cluster PN5, immune related cluster PN6 and cancer related cluster PN7 were tested to have prognosis value for HCC. These clusters with a small number of genes showed great prognostic potential for hepatocellular carcinoma.
In order to retrieve a comprehensive and representative prognostic gene signature for each type of DEGs, ProgGenes and ProgNGenes, we combined gene clusters in each type as listed above which possess prognosis values in HCC. At last, three representative prognostic gene signatures were finally constructed, D123 was combined by D1, D2 and D3, P125 was united by P1, P2, P5, and similarly PN1, PN2, PN5, PN6, PN7 were combined into PN12567. Next we reexamined the prognosis power of these representative three signatures in the original patient cohort of HCC. The clustering result for D123, P125 and PN12567 was shown in S3 Fig. It can be seen from the survival curves shown in Fig. 2A-B, D123 failed to show significant predictive performance according to both LOOCV and HC. P125 successfully divided patients into two significant groups by LOOCV (p = 0.024) but not by hierarchical clustering (Fig. 2C-D). PN12567 performed well in HC (p = 0.021, Fig. 2E) and patients classified by LOOCV also showed different OS rates but not significant (p = 0.065, Fig. 2F).  Fig. 3D, F).

Validation of prognostic gene signatures from DEGs, ProgGenes and ProgNGenes in one independent cohort
Furthermore, we compared our results with the performances of 21 reported signatures related to tumor recurrence, metastasis or survival in HCC. Most of these reported signatures were derived from DEGs and analyzed by the pipeline of top-down approaches. None of them successfully classified patients into two groups with significantly different overall survival by LOOCV(S8 Table). These results suggest that prognostic gene signature identified by bottom up approaches might be more robust than those identified by top-down approaches, and also contain fewer genes.

Indications of prognostic markers in drug-repositioning
After all the above analyses, we have finally obtained two prognostic gene signatures P125 and PN12567 for HCC patients. For decades, only a few drugs have been effective for the treatment of hepatocellular cancer such as the widely used Sorafenib. We investigated the potential of identifying indications to HCC for existing drugs based on these prognostic biomarkers, since they have been proven to be consistently correlated with HCC patients outcome. By scanning the Therapeutic Target Database and DrugBank [34,35], we found that 10 out of 11 genes in P125 and 10 out of 14 genes in PN12567 are annotated as druggable targets (S9 Table). These proportions are much higher than if derived from the whole genome, which contains 19027 protein coding genes and 3421 available druggable targets(18%, p < 7e -9 for P125 and p < 2e -6 Identifying Prognostic Features and Correlating to Drug Repositioning for PN12567). By retrieving the DrugBank database, we furthermore found that 11 human drug targets(8 for P125, 3 for PN12567) can be modulated by 40 unique approved drugs (Table 3). However, all the indications of these approved drugs were unrelated to hepatocellular cancer before, which means, they might be repositioned to HCC-related as well. Meanwhile, there are 3 genes in P125 and 11 genes in PN12567 left which are not annotated by DrugBank. Furthermore, we studied their interactions with drug targets by analysis of the subnetworks of P125 and PN12567. Interestingly, all the genes in each signature set were connected closely (S5 Fig.). These results suggested that some signatures which play important roles in HCC tumorigenesis and are not directly targeted by drugs, such as HIF1A, IGF1 in P125 subnetwork and TP53, PIK3CB in PN12567 subnetwork, were the direct/indirect interactive genes of drug targets.

Discussion
Prognosis analysis is probably the most frequently performed clinical modeling in cancer research based on gene expression profiles. A top-down approach pipeline is quite conceivable, that is, first differentially expressed genes(DEGs) in cancer versus control were identified by statistical methods, then certain gene prioritization methods were utilized to reduce the DEGs to a smaller gene signature set that possess the best prognostic power, finally appropriate   [36,37].
In this work, we emphasize another direction, the bottom-up approach for prognostic signature identification based on gene expression profiles. It is more like an objective-oriented approach. That is to say, in a training set with patient samples and survival information and gene expression datasets, outcome(survival time, tumor stage) correlated genes were statistically identified in the first step, then prognostic gene signatures formed by core functional genes were extracted by gene prioritization, finally clinical prognostic signatures were constructed based on prioritized gene signatures. The established prognostic signatures must be validated in independent datasets. In this work gene prioritization utilized combination of network topology and pathway enrichment.
The finding that adjacent non-tumor tissues to hepatocellular cancer actually showed strong tendency for prognosis may have several biological implications. First it states for the importance of tumor surrounding environment. How well a tumor patient will survive may actually be determined on how adverse the tumor surrounding environment is, rather than or in addition to how adverse the tumor itself is. Second it provides the molecular base for strong pathological value of adjacent tissue to HCC which has been clinically repeatedly confirmed with long-term inflammation and cirrhosis.
Because the prognostic gene signatures identified from bottom-up approach are certainly related to patient outcome such as survival time and tumor stage, they must be genes that are important in tumor development and progression. For examples, IGF1 reflects hepatic synthetic function and takes part in the development and progression of various cancers, and low IGF1 level indicates a poor prognosis in hepatocellular carcinoma [38,39]; HIF1A plays an essential role in embryonic vascularization, tumor angiogenesis and its overexpression is related to an unfavorable prognosis in HCC [40,41]; PPARG involved in PPAR signaling pathway exerts an inhibitory effect on tumor cell growth in hepatocellular carcinoma and therefore its dysregulation affects HCC outcome [42,43]; EGFR plays prominent roles in HCC proliferation, and metastasis [44]. CXCR4 is a comprehensive cytokine receptor with an important role in the dissemination and metastasis in HCC [45,46]; Another immune response gene H2AFX is reported to be associated with fibrosis progression and hepatocarcinogenesis [47]. It should be rational to consider them as good candidate targets for intervention. Previously, we have integrated mutation data from various sources to find significantly mutated genes in HCC [48]. Here in order to find direct genomic variation evidence for our signature genes, we downloaded 202 HCC patients' mutation information from TCGA (http://cancergenome.nih.gov/) to find high frequently mutated genes which might play important roles in HCC pathogenesis. Only the mutated frequencies of EGFR and HSP90AA1 in P125 reached 5%. And they have been targeted by approved drugs (see Table 3). Therefore it might be rational to search for other druggable targets.
We therefore in our work made a preliminary try to investigate the potential of using such prognostic genes to guide drug-repositioning. To our expectation, a very high percentage of the prognostic gene signatures are druggable targets which have been indicated by approved drugs but not having been referred to hepatocellular cancer. For examples, PPAGR, included in P125, is a target for a variety of drugs. Current indication for Rosiglitazone (DrugBank ID: DB00412) that targets PPAGR is diabetes mellitus. Rosiglitazone appears to activate PPAGR and has an anti-inflammatory effect that down-regulates nuclear factor kappa-B (NFκB) level [49,50]; Sulfasalazine(SASP, DB00795), another PPAGR-targeted drug, for which the current indication is inflammatory bowel disease and rheumatoid arthritis, is also useful in restricting lymphoma growth and for therapy of prostate cancers by inhibiting cystine transporter [51,52]; Telmisartan (DB00966) is effective for prevention and treatment of prostate cancer [53]. These observations imply that Rosiglitazone, Sulfasalazine and Telmisartan could have alternative indications for intervention of hepatocellular carcinoma. Another representative marker TYMS in PN12567, plays an important role in DNA methylation, synthesis, and repair. Fluorouracil (DB00544) targeting TYMS is an antineoplastic anti-metabolite, which could masquerade as purine or pyrimidine and form the building blocks of DNA and result in DNA synthesis being inhibited. Fluorouracil injection is indicated in the palliative management of some types of cancer, including colon, breast, renal cell and so on [54]. It was reported that enhancement of TYMS is also related to a reduced risk of HCC through discouraging the misincorporation of uracil into DNA, and some HCC patients may be benefited from 5-Fluorouracil-based chemotherapy [55,56]. Similarly, Gemcitabine(DB00441) inhibits thymidylate synthetase, leading to inhibition of DNA synthesis and cell death, is indicated for the treatment of advanced cancers [57]. These reports also support that Fluorouracil and Gemcitabine may be implicated in drugrepositioning for HCC.
The main limitation to this study is that the training cohort was not big enough or representative enough. Therefore the carefully elaborated prognostic gene signatures although can be validated in one or more independent patient cohort(s), it is hard to expect them to possess universal values for prognosis for HCC patients in general. The road to identifying as good prognostic features in hepatocellular cancers as in breast cancers is still long. Identifying common prognostic signatures is more difficult for HCC due to the high heterogeneity as well as the population diversity. Our work represented by bottom-up prognostic gene signature identification and drug-repositioning analysis may be proved as useful approaches for HCC outcome study and intervention, after being tested on more datasets in the future.  Table. The three core gene sets representative for DEGs, ProgGenes and ProgNGenes by network analysis. Genes at top 5% of the networks, which should be more important than other less connected nodes, were selected as the core network gene sets. (XLSX) S2  Table. Genes annotated as druggable targets in P125 and PN12567. 10 out of 11 genes in P125 and 10 out of 14 genes in PN12567 are potential drug targets. (XLSX)