The Expression of Glyceraldehyde-3-Phosphate Dehydrogenase Associated Cell Cycle (GACC) Genes Correlates with Cancer Stage and Poor Survival in Patients with Solid Tumors

Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) is often used as a stable housekeeping marker for constant gene expression. However, the transcriptional levels of GAPDH may be highly up-regulated in some cancers, including non-small cell lung cancers (NSCLC). Using a publically available microarray database, we identified a group of genes whose expression levels in some cancers are highly correlated with GAPDH up-regulation. The majority of the identified genes are cell cycle-dependent (GAPDH Associated Cell Cycle, or GACC). The up-regulation pattern of GAPDH positively associated genes in NSCLC is similar to that observed in cultured fibroblasts grown under conditions that induce anti-senescence. Data analysis demonstrated that up-regulated GAPDH levels are correlated with aberrant gene expression related to both glycolysis and gluconeogenesis pathways. Down-regulation of fructose-1,6-bisphosphatase (FBP1) in gluconeogenesis in conjunction with up-regulation of most glycolytic genes is closely related to high expression of GAPDH in the tumors. The data presented demonstrate that up-regulation of GAPDH positively associated genes is proportional to the malignant stage of various tumors and is associated with an unfavourable prognosis. Thus, this work suggests that GACC genes represent a potential new signature for cancer stage identification and disease prognosis.


Introduction
Although the gene encoding glyceraldehyde-3-phosphate dehydrogenase (GAPDH) is frequently used as a stable marker for constitutive gene expression, its expression is not always constant, especially in cancer. For example, in one study of non-small cell lung cancer (NSCLC), GAPDH was the least stable of the 6 ''reference'' genes examined [1]. This housekeeping glycolytic enzyme has been implicated in multiple functions and has been found to be over-expressed in certain cancers [2].
More than 50 years ago, Warburg hypothesized that cancer growth is facilitated by tumors generating their energy through aerobic glycolysis [3]. Recent studies aimed at evaluating this hypothesis have demonstrated that cancer cells have adapted their metabolism to facilitate the uptake and incorporation of nutrients into the biomass required to produce new cells [4]. Tumor development and progression are indeed correlated with enhanced glucose uptake and/or aberrant glucose metabolism [5][6][7][8]. The hypoxic environment in which tumor cells reside leads to an increase in glycolytic metabolism. As a key intermediate component of glycolysis, GAPDH could serve an important role in cancer cell development and tumor progression. While it is known that most glycolytic enzymes, including GAPDH, are activated and highly expressed to respond to oxygen deprivation in the tumor [9], the role of up-regulated GAPDH in NSCLC remains unclear. In one possibly cancer-related scenario, GAPDH was found to be a prosurvival regulator of caspase-independent cell death (CICD) [10].
In the current study, a publically available microarray database was employed to identify cell cycle-dependent genes that correlate with GAPDH up-regulation and anti-apoptotic activity. This analysis has identified a set of cell cycle-based signature genes, designated GAPDH Associated Cell Cycle (GACC) genes, whose up-regulation is correlated with the aggressiveness of several tumor types and their unfavourable prognosis. Identification of GACC genes may be useful in efforts aimed at elucidating pathways that connect carbohydrate metabolism with cell cycle-based cancer cell development, which might lead to the novel cancer targets based on GACC gene expression patterns in cancers. created from three independent cohorts that were directly downloaded from the publically available European Bioinformatics Institute ArrayExpress database: E-GEOD-18842, E-GEOD-19188 and E- GEOD-19804. The combined dataset, designated the full cohort, consisted of 174 NSCLCs and 156 control tissues. Preliminary analysis suggested the transcription of some cell cycle genes in the NSCLC dataset might correlate with the transcription of GAPDH. Given the role of GAPDH in glycolysis, this finding encouraged further gene expression profile analysis of the tumors for the relationship between carbohydrate metabolism and cell cycle regulation.
Gene expression correlation analysis identified many genes whose up-regulation in the tumors is closely correlated with GAPDH expression. Specifically, within the cancer cohort, 341 upregulated genes were identified with a GAPDH expression correlation coefficient greater than or equal to 0.6. Of these, 117 genes (34%) are described as cell cycle-related based on the Gene Ontology biological process terminology. In the original GeneChip Human Genome U133 Plus 2.0 Array with 54,613 probes, only 2,044 related genes (3.7%) are associated with Gene Ontology biological process term ''cell cycle,'' which implies more than a 9-fold enrichment for cell cycle-related genes in the tumors. An even higher percentage of cell cycle-related genes (18/26, 69%; a 17-fold enrichment for cell cycle-related genes) are found when the correlation coefficient is set more stringently, at greater than or equal to 0.72 (Table 1). These genes are designated here as GAPDH Associated Cell Cycle (GACC) genes. The genes in the list encode proteins related to G1/S and/or G2/M phase transitions (FOXM1, CDCA3 (Tome-1), CCNB2 (cyclin B2), BIRC5, CCNB1 (cyclin B1), CDC45, PRC1 and CCNA2 (cyclin A2)), and proteins involved in mitotic processes (NCAPD2, TPX2, AURKB, PSMD2, KIF4A, KIF2C, UBE2S, and FAM83D). KPNA2 (importin alpha 1) and CDKN3 (cyclin-dependent kinase inhibitor 3) are also cell cycle-related. Top ranked GAPDH positively associated genes in this class include triosephosphate isomerase 1 (TPI1) and glucose-6-phosphate isomerase (GPI), each of which encodes a key glycolytic enzyme. Thus, only 6 of the 26 genes in the list (CENPA, RAD51AP1, SLC2A1, KIF4A, RFC4, and RRM2) may not be designated as Table 1. Top ranked up-regulated GAPDH positively associated genes in NSCLC (correlation coefficient greater than or equal to 0.72). related to the cell cycle or glycolysis. At least a sub-set of these six genes may not actually be exceptions, as CENPA is an essential centromeric protein subject to cell cycle-dependent changes [11]. There is also some evidence that suggests RAD51AP1 may contribute to neoplastic growth [12], whereas SLC2A1 is involved in glycolysis related glucose transport. Finally, KIF4A is implicated in chromosome condensation and segregation during mitosis [13], RFC4 is essential for proliferating cell nuclear antigen (PCNA) dependent DNA synthesis [14], and RRM2 encodes ribonucleotide reductase M2, which catalyzes the formation of deoxyribonucleotides from ribonucleotides in a cell-cycle dependent fashion [15]. In sum, two gene classes have been identified in the list. Class I consists of glucose metabolic pathway related genes while Class II covers cell cycle related genes. The highest ranked GACC genes are highly associated to the pathway regulated by the transcriptional factor Forkhead Box M1 (FoxM1, r = 0.77) in NSCLC. Table 2 lists the genes in the FoxM1 pathway in which their expression is related to G2/M progression, chromosomal segregation, and cytokinesis. The r values between the expression of a majority of these genes (19/29, 68%) and the expression of GAPDH in the list are above 0.6.
One hundred twenty-three down-regulated genes, having a correlation coefficient less than or equal to 20.6, were identified in NSCLC in the course analyzing the data (data not shown). None of the genes in this group are described by the Gene Ontology biological process as cell cycle related.
In contrast to the cancer tissues, analysis of the cancer-free control tissues in the cohort identified only 5/70 (7%) of the genes that correlated positively with GAPDH expression (r greater than or equal to 0.6). The Gene Ontology biological process describes these genes as cell cycle. The correlated cell cycle genes include CDK4 (r = 0.62) and CHK1 (r = 0.62). These two genes were also observed to be highly expressed within the cancer tissues. As implied by the above findings, for most of the highly expressed GACC genes in the tumors, there was only a weak correlation for GAPDH and GACC gene expression observed in the control tissue samples (correlation coefficients were mostly in 0.2-0.5 ranges). By contrast, the correlation coefficients for the 15 highest ranked GACC genes from the cancer tissues were greater than 0.7 ( Figure 1). Expression of genes within the glycolytic pathway (GAPDH, TPI1 and GPI) remained highly correlated in both in NSCLCs and non-cancer control tissue samples.

Up-regulation of GAPDH in NSCLC and glycolysis/ gluconeogenesis pathways
The forgoing results imply that GAPDH expression may be associated with cell cycle related activities. To explore the relationship between the up-regulated GACC genes in lung cancer and the possibly aberrant expression of glucose metabolic pathways, we analyzed the expression values of genes implicated in glycolysis and gluconeogenesis. The expression of each gene in these pathways is up-regulated or unchanged, in the tumor cells, with the exception of fructose-1,6-bisphosphatase (FBP1) (Table 3, Figure 2). Up-regulation of GAPDH is most statistically significant among all 10 glycolysis (G1-10 in Table 3) steps and 4 bypass (BP1-3 in Table 3) gluconeogenesis steps (1.07 fold increase and one-tailed t-test p = 1.44E-57). Approximately 75% or 35% of NSCLC displays both increased GAPDH expression and decreased FBP1 expression in the dataset using either the median or 25% quartile value as cut-off point. Furthermore, the expression of GAPDH and FBP1 are co-regulated with the up-regulation of the GACC gene FoxM1 (r = 0.77) in tumors (Figure 3).
Relevance of up-regulated GAPDH positively correlated genes in NSCLC to regulation of cell senescence To further explore the correlation of the up-regulation of GACC genes and that of the other GAPDH positively correlated genes in NSCLC, we speculated that cancer may arise in part from a reduction in the normal regulation of senescence by genes associated with the cell cycle. We therefore examined gene expression levels in cultured cells in which anti-senescence regulation had been induced, and compared them to the genes whose up-regulated expression were found to be positively correlated with GAPDH expression in the NSCLC cohort. This comparison employed microarray dataset E-GEOD-19018, in which low passage IMR-90 human diploid fibroblasts had been cultured in 20% oxygen (normal replicative capacity) or 3% oxygen (increasing replicative capacity), after which RNA had been extracted and hybridized on microarrays. When the top 341 GAPDH positively correlated genes (r greater than or equal to 0.6) identified in the NSCLC dataset were compared with those from the cultured fibroblasts, the up-regulated GAPDH positively correlated genes (including at least 34% of the GACC genes) in both the NSCLC and the fibroblasts cultured in 3% oxygen (low oxygen tension and anti-senescence) were found to be similar ( Figure 4). A good correlation (r = 0.71) was found between an expression signal ratio of cancer/control and the ratio of 3% oxygen/20% oxygen cultured cells in the expression of those genes.

GACC genes are novel gene signatures for NSCLC and other solid tumors
Using various datasets available to us, we evaluated whether GACC gene expression levels may be relevant to cancer stage and prognosis. As shown in the heat map plot in Figure 5, the top ranked GACC genes, as well as the glycolysis related genes, are clustered in the tumors. Furthermore, these genes can be used to distinguish different cancer stages.
Specifically, the cohorts from E-GEOD-19804, which include 24 NSCLCs with different tumor stages (I, II, III) that had been randomly selected from the original dataset (6 stage I, 6 stage II, and 12 stage III), are preferentially clustered according to stage. Cluster 1 is preferentially associated with more advanced stage tumors, as it has 8 stage III (89%) tumors and 1 stage II tumor. By contrast, cluster 2, which has 11 stage I/II (73%) and 4 stage III tumors, is enriched for early stage tumors ( Figure 5A).
The cohorts from E-GEOD-10927, which include 33 adrenocortical cancers with tumor grades (high or low), can also be clustered. Cluster 1 is enriched for high-grade tumors, as it has 13 high grade (87%) and 2 low-grade tumors. On the other hand, cluster 2 is relatively enriched for low-grade tumors, as has 11 lowgrade (61%) and 7 high-grade tumors ( Figure 5B).
The cohorts from E-GEOD-29431, which contain 28 breast cancers with tumor grades (I, II, III), are clustered. Fourteen stage I/II and 14 stage III samples were randomly selected from original dataset. Cluster 1 has 8 stage III (73%) and 3 stage II tumors, while cluster 2 has 11 stage I/II (65%) and 6 stage III tumors ( Figure 5C).
The cohorts from E-GEOD-6764, which contain hepatocellular carcinoma (HCC) and normal liver, are clustered. Cluster 1 is composed almost entirely of very advanced/advanced III HCC (91%), while cluster 3 has mainly very early/early tumors (67%). All 8 normal samples are in cluster 2 (73%), which also includes 2 very early and 1 early tumors ( Figure 5D).
The pattern of gene clustering suggests that GACC related gene expression might be relevant as biomarkers to predict cancer prognosis. To evaluate this possibility, GACC gene expression levels were analyzed from a dataset of 442 lung adenocarcinomas [16] and correlated to 60-month survival outcomes. High expression of the top ranked GACC genes is associated with poor disease outcome ( Figure 6A). Combining GACC gene expression level with GAPDH level can improve the predictive power of GACC gene level alone in most cases ( Figure 6B).

Discussion
In this study, a publically available microarray database [17] was employed to evaluate transcriptional levels of GAPDH and associated gene expression in solid tumors. The study data were analyzed using the Affymetrix GeneChip Human Genome U133 plus2 or U133A Array formats, which are commonly used for gene expression detection. Data analysis relied on a larger NSCLC cohort (total 330 samples) that combined independent data from various sources. Our identification of statistically significant changes in GAPDH expression in tumors enabled the evaluation of the gene expression profile that correlated with that of GAPDH.
Elevated levels of GAPDH have been observed in oncogeneinduced transformation [18], angiogenesis [19] and anti-apoptotic function [20][21][22]. In other studies, however, GAPDH has been implicated in promoting apoptosis [23][24][25]. The reason for this paradox is poorly understood [26]. Barbini's work [27] has suggested that differential subcellular localization of GAPDH may contribute to its opposing biological activities in apoptotic and proliferating hepatocytes. The different functions of GAPDH may also be regulated by various levels of post-translational modification of the protein [28].
Our analysis shows that the profiles of GACC gene upregulation are proportional to anti-senescence in the profiles in the tumors, rather than correlating with the promotion of senescence or apoptosis. We have identified a group of genes for which mRNA expression strongly correlates with GAPDH expression in tumor cells. In general, two such gene classes have been identified in the tumors. Class I consists of metabolic pathway related genes. Top ranked GAPDH positively associated genes in this class include triosephosphate isomerase 1 (TPI1) and glucose-6-phosphate isomerase (GPI), each of which encodes a key glycolytic enzyme. Class II covers cell cycle related genes that typically encode proteins involved in G2/M transition and M phase cell cycle regulation. Of these, the most informative protein appeared to be transcriptional factor Forkhead Box M1 (FoxM1), a crucial element for cell cycle regulation [29] and an important regulator of tumor metastasis [30]. FoxM1 associated genes CCNB2 (r = 0.75), CENPA (r = 0.75), AURKB (r = 0.73), BIRC5 (survivin r = 0.73), NEK2 (r = 0.69) are among the top GACC genes. The nuclear protein CENPF, which is a transcriptional target of FoxM1 (CENPF, r = 0.67), regulates the spindle assembly checkpoint to ensure proper chromosome stability and segregation during mitosis [31]. Up-regulation of these genes is consistently associated with high-expressions of GAPDH.   Table 3. Gene expression of regulatory enzymes of glycolysis and gluconeogenesis. The mechanism by which GAPDH and FoxM1 may be coregulated is unclear. In order to trigger cell cycle related events, it is possible that both GAPDH and FoxM1 translocate from the cytoplasm to the nucleus in cancer cells. Nuclear translocation of GAPDH may be regulated by acetylation [32]. FoxM1 is localized predominantly in the cytoplasm in late G1 and S phases. Nuclear translocation of FoxM1 occurs just before progression into the G2/M phase of the cell cycle and requires activity within the Raf/ MEK/MAPK signaling pathway [33]. Both GAPDH and FoxM1 co-translocate to the nucleus during the G2/M transition phase through their interaction with other proteins. In this process, KPNA2 (importin alpha 1) may interact with GAPDH given that the correlation coefficient of KPNA2 expression with GAPDH in NSCLC is 0.75.
GACC genes are up-regulated in the cells with 3% oxygen incubation, which have greater replication capacity (anti-senescence) than those cultured in 20% oxygen. This result is in agreement with recent studies demonstrating that GAPDH depletion switches human tumor cells to a senescent phenotype [34]. Rescue experiments that employed metabolic and genetic models confirmed that GAPDH has important regulatory functions that link energy metabolism and cell cycle networks. In sum, our data suggest that GAPDH serves a key role in cell cycle regulation and cell senescence during cancer cell development.
In addition to NSCLC, the up-regulation of GAPDH and associated genes, including GACC genes, was observed in other tumor types. The results suggest that transcription of GAPDH in tumors is an important step in cancer development, where it may contribute to increased cell cycle-related cell proliferation. This process may also include aberrant glycolysis and gluconeogenesis, as most genes in both pathways are up-regulated. However, the gluconeogenesis gene FBP1 is down-regulated in the tumors. During normal glucose metabolism, excess GAPDH is continuously metabolized by glycolysis to pyruvic acid, which is then converted by FBP1 to fructose-1,6-bisphosphate during gluconeogenesis. In cancer, down-regulation of FBP1 can result in GAPDH accumulation within the cytoplasm and may also cause translocation of excess GAPDH to the nucleus. Down-regulation of FBP1 in cancer cells has been reported recently [35], [36]. FBP1 is considered a tumor suppressor gene in gastric cancer cells and down-regulation of FBP1 that is mediated by promoter hypermethylation is found in human hepatocellular carcinoma and colon cancer. Over-expression of GAPDH in the tumors may connect aberrant glucose metabolism with cell proliferation. However, our analysis cannot exclude the possibility that the observed up-regulation of both GAPDH and GACC genes may be a secondary effect of the cancer that is attributable to the highenergy demands required for rapid growth. While apoptosis is inhibited, cellular metabolism is increased and key metabolic pathways are activated. In addition, this study has been limited to measuring transcriptional levels based on the microarray data, and cannot establish a correlation between gene expression and cellular protein levels within the high metabolic environments. Further experimentation is required to address these issues.
Our data indicate that up-regulation of GACC genes within tumor cells is proportional to their malignant status, and thus can be a potential predictor of disease prognosis. Based on the GAPDH positively associated gene expression levels, the subclass of NSCLC, adrenocortical carcinoma, breast cancer and HCC can also be classified to various stages. The up-regulation of GAPDH positively associated genes correlates with more severe and/or later stages of cancer. Most importantly, using both GAPDH transcription and GACC gene expression levels in survival analysis greatly improve the predictive power of using GACC gene level alone in most cases.

Methods
The Gene expression microarray analyses reported in this study employed data from ArrayExpress (http://www.ebi.ac.uk/ arrayexpress) of the European Institute of Bioinformatics (EBI) and caArray (https://array.nci.nih.gov/caarray/home.action) of  GEOD-19018) and a cohort from caArray containing lung adenocarcinomas (jacob-00182) for Affymetrix GeneChip Human Genome U133 plus 2 or U133A Arrays. The CEL files containing the raw data from each experiment were directly downloaded from the EBI or NCI website with particular accession number. Data were then normalized with CEL file quality control evaluation using 39 Expression Arrays Robust Multi-array Analysis (RMA) from the Affymetrix software Expression Console (http:// www.affymetrix.com). The normalized expression values represent the probe set intensity on a log-2 scale. The integrated NSCLC dataset from ArrayExpress include three independent NSCLC datasets for analysis, in which all CEL files from these sources were combined for RMA analysis.
The student's t-test (one-tailed) and Pearson's correlation coefficient calculation were carried out using Microsoft Excel. P values of t-test less than 0.05 were considered as statistically different. Chi Square test (chisq.test), heat map drawing (heatmap) and Kaplan-Meier survival analysis were carried out using the open source statistical tool R (version 2.14.1) (Supporting Information). In the gene expression analyses, the value of a selected gene expression level was compared with the median value or 25% quartile value of the gene expression in each cohort. The numbers higher or lower than the median or 25% quartile value are plotted in the results. For survival analysis, values higher or lower than median in each gene group were placed in ''high'', ''low'', or different combinations for the analysis. All survival times have been adjusted to months.