Distinct Functional Patterns of Gene Promoter Hypomethylation and Hypermethylation in Cancer Genomes

Background: Aberrant DNA methylation plays important roles in carcinogenesis. However, the functional significance of genome-wide hypermethylation and hypomethylation of gene promoters in carcinogenesis remains unclear.

Principal Findings: Based on genome-wide methylation data for five cancer types, we showed that genes with promoter hypermethylation were highly consistent in function across different cancer types, and so were genes with promoter hypomethylation. Functions related to “developmental process” and “regulation of biological process” were significantly enriched with hypermethylated genes but were depleted of hypomethylated genes. In contrast, functions related to “cell killing” and “response to stimulus”, including immune and inflammatory response, were associated with an enrichment of hypomethylated genes and depletion of hypermethylated genes. We also observed that some families of cytokines secreted by immune cells, such as IL10 family cytokines and chemokines, tended to be hypomethylated in various cancer types. These results provide new hints for understanding the distinct functional roles of genome-wide hypermethylation and hypomethylation of gene promoters in carcinogenesis.

Conclusions: Genes with promoter hypermethylation are highly consistent in function across different cancer types, as are genes with promoter hypomethylation, but these two groups of genes tend to be enriched in different functions associated with cancer. In particular, we speculate that hypomethylation of gene promoters may play roles in inducing immunity and inflammation disorders in precancerous conditions, which may provide hints for improving epigenetic therapy and immunotherapy of cancer.


Introduction
DNA hypermethylation and hypomethylation play important roles in the initiation, progression and metastasis of cancer [1,2]. It is commonly believed that DNA hypermethylation and hypomethylation are independent processes governed by different mechanisms, and they appear to play separate roles in tumor progression [3,4]. Specifically, DNA hypermethylation in cancer genomes usually occurs in the promoter regions of tumor suppressor genes, which can result in silencing of tumor suppressor genes [5]. In contrast, DNA hypomethylation often targets DNA repeats, which may induce genomic instability and mutation events in cancer genomes [6,7,8,9]. There is evidence that promoter hypomethylation of some genes may be associated with the development of cancer by regulating the activity of genes [10] and that promoter hypomethylation of specific immunity-related genes may promote carcinogenesis [11,12]. For example, the promoter hypomethylation of cytokine IL-10 can activate its expression to inhibit the generation of immune response in breast cancer [11], and the promoter hypomethylation of SPAN-Xb, an immunogenic antigen, can induce de novo B-cell response in myeloma cells [12]. However, the biological significance of promoter hypomethylation in cancer is still poorly understood [13].
In this work, we explored the distinct roles of genes with promoter hypermethylation and hypomethylation in cancer (hereafter referred to as hypermethylated and hypomethylated genes for simplicity) using the promoter methylation profiles of five cancer types. First, we evaluated the consistency of functions enriched with hypermethylated (or hypomethylated) genes across different cancer types. Then, we identified hypermethylation-specific (or hypomethylation-specific) functions significantly enriched with hypermethylated genes (or hypomethylated genes) and significantly depleted of hypomethylated genes (or hypermethylated genes). Finally, we discuss potential links between hypomethylated genes in cancer and immune and inflammatory response disorders in precancerous conditions.

Methylation Data
The promoter methylation datasets for five cancer types were extracted from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) database (http://tcga-data.nci.nih.gov/tcga), as described in Table 1. For each dataset, the data were derived from paired samples of tumor and adjacent normal tissues from the same organ site, and the percentage of tumor cells in each tumor sample of TCGA was higher than 70% [14]. Details about the preparation of tissues can be found in the TCGA document (http://rcb.cancer.gov/rcb-internet/appl/rfp/07013/SOWAttachmentNo3-BCR-3-10.pdf). To avoid a potential batch effect [15], we selected the batch with the largest sample size for each cancer type for analysis. All data were collected with the Illumina HumanMethylation27 platform, which measures the methylation value of 27578 CpG loci located within the proximal promoter regions of transcription start sites of 14495 genes.
We used level_1 data with methylated signal intensity (M) and unmethylated signal intensity (U). The methylation level (beta value) for each CpG locus was calculated as max(M, 0) / (|U| + |M| + 100), where the constant 100 was added to regularize the beta value when both U and M values were small [16]. Thus, a beta value between 0 (unmethylated) and 1 (fully methylated) was assigned to each CpG locus in each sample. For each dataset, the detection P value reported by BeadStudio (Illumina) was used as a quality control measure of probe performance. We excluded samples in which >5% of probes had detection P values >0.05, and probes for which >10% of samples had detection P values >0.05. A total of 1092 CpG loci within promoters of 605 sex chromosome genes were also excluded from the analysis to eliminate gender-specific bias.
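The beta value formula and the two detection P value filters can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline used in the study: the function names and the nested-dictionary layout are hypothetical, and the sketch assumes that both filters are evaluated on the unfiltered data, since the order of the two filtering steps is not stated in the text.

```python
def beta_value(m, u, offset=100.0):
    """Beta value from methylated (M) and unmethylated (U) signal intensities."""
    return max(m, 0.0) / (abs(u) + abs(m) + offset)


def failing_fraction(p_values, threshold=0.05):
    """Fraction of detection P values that exceed the threshold."""
    return sum(p > threshold for p in p_values) / len(p_values)


def qc_filter(detection_p, max_sample_fail=0.05, max_probe_fail=0.10):
    """Apply the sample and probe filters.

    detection_p maps sample -> {probe: detection P value} (hypothetical layout).
    Assumption: both filters are computed on the unfiltered matrix.
    Returns (kept_samples, kept_probes) as sorted lists.
    """
    samples = sorted(detection_p)
    probes = sorted({p for s in samples for p in detection_p[s]})
    # Drop samples in which more than 5% of probes fail detection.
    kept_samples = [
        s for s in samples
        if failing_fraction(list(detection_p[s].values())) <= max_sample_fail
    ]
    # Drop probes that fail detection in more than 10% of samples.
    kept_probes = [
        p for p in probes
        if failing_fraction([detection_p[s][p] for s in samples]) <= max_probe_fail
    ]
    return kept_samples, kept_probes
```

For instance, M = 900 and U = 0 give a beta value of 0.9, whereas a negative background-corrected M is clipped to 0 before the ratio is taken.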

Selection of Differentially Methylated Genes
The non-parametric Mann-Whitney U test was applied to select differentially methylated (DM) CpG loci around the promoter regions of genes [18] by comparing the beta values of each CpG locus between normal and cancer samples. The false discovery rate (FDR) was controlled by the Benjamini and Hochberg procedure [19]. If the promoter of a gene had both hypermethylated and hypomethylated CpG loci, this gene was excluded from subsequent analyses [20]. The remaining genes with at least one DM CpG locus were termed DM genes. By comparing the mean beta values of DM CpG loci between normal and cancer samples, we classified the DM genes into hypermethylated and hypomethylated genes.
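The FDR control and the gene-level classification described above can be illustrated as follows. The sketch assumes that raw per-locus P values have already been obtained from a Mann-Whitney U test (not implemented here); the function names, the tuple layout and the "mixed" label for excluded genes are hypothetical choices made for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def classify_gene(loci, alpha=0.05):
    """Classify one gene from its promoter CpG loci.

    loci: list of (adjusted_p, mean_beta_tumor, mean_beta_normal) tuples.
    Returns "hyper", "hypo", "mixed" (excluded from the analysis),
    or None when the gene has no DM locus.
    """
    directions = set()
    for adjusted_p, tumor, normal in loci:
        if adjusted_p < alpha and tumor != normal:
            directions.add("hyper" if tumor > normal else "hypo")
    if not directions:
        return None
    if len(directions) == 2:
        return "mixed"
    return directions.pop()
```

A gene whose promoter carries one significant locus with a higher mean beta value in tumor and no significant locus in the opposite direction would be returned as "hyper".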

Functional Enrichment and Consistency Analysis
Using the GO-function algorithm [21] with an FDR <0.05, we selected GO terms (biological processes) [22] that were significantly enriched with hypermethylated (or hypomethylated) genes for each cancer type, and then treated the local redundancy. For treatment of local redundancy, when both an ancestor and its offspring term(s) were detected to be statistically significant, GO-function extracted only the ancestor term as being relevant if there was evidence that the remaining genes in the ancestor term were still likely to be relevant to the disease after the removal of genes in its significant offspring term(s) [21]; otherwise, only the offspring term was kept.
If there were N significantly hypermethylated terms in dataset 1, among which K1 terms were also identified as significantly hypermethylated in dataset 2, the PO (percentage of overlaps) score of the two term lists (from dataset 1 to dataset 2) was calculated as K1/N. Then, we proposed a score, denoted as the POE (percentage of overlaps extended) score, to evaluate the consistency of these two lists of significant GO terms. For a hypermethylated GO term extracted from dataset 1, if its raw P value of enrichment with hypermethylated genes for dataset 2 was lower than 0.05, then it was defined to be tentatively significant in dataset 2. If K2 of the N hypermethylated terms extracted from dataset 1 were significant or tentatively significant in dataset 2, the POE score of the two term lists (from dataset 1 to dataset 2) was calculated as K2/N. Finally, we performed random experiments to demonstrate that the observed POE score was unlikely to be produced by chance. From dataset 2, we randomly extracted genes as “hypermethylated genes”, with the same number of hypermethylated genes extracted from dataset 2, and then performed the functional analysis and calculated the random POE scores. This process was repeated 10,000 times, and the P value of the observed score from dataset 1 to dataset 2 was calculated as the percentage of the random scores exceeding the observed score. The same analysis was performed for the hypomethylated terms.
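The PO and POE scores and the empirical P value defined above reduce to simple set arithmetic. The following Python sketch is illustrative only; the function names and the dictionary of raw P values are hypothetical, and the random POE scores are assumed to come from a separate randomization step that is not shown.

```python
def po_score(terms_1, significant_2):
    """PO score from dataset 1 to dataset 2: K1 / N."""
    terms = set(terms_1)
    return len(terms & set(significant_2)) / len(terms)


def poe_score(terms_1, significant_2, raw_p_2, cutoff=0.05):
    """POE score from dataset 1 to dataset 2: K2 / N.

    raw_p_2 maps a GO term to its raw enrichment P value in dataset 2.
    A term counts if it is significant or tentatively significant there.
    """
    terms = set(terms_1)
    significant = set(significant_2)
    k2 = sum(
        1 for t in terms
        if t in significant or raw_p_2.get(t, 1.0) < cutoff
    )
    return k2 / len(terms)


def empirical_p(observed, random_scores):
    """Fraction of random scores that exceed the observed score."""
    return sum(s > observed for s in random_scores) / len(random_scores)
```

With four terms from dataset 1, of which two are significant in dataset 2 and a third has a raw P value of 0.03 there, the PO score is 0.5 and the POE score is 0.75.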

Extensive Hypermethylation and Hypomethylation of Gene Promoters in Cancers
We selected DM genes using the Mann-Whitney U test with an FDR <5%. As shown in Figure 1, approximately one third of all the measured genes for each dataset were found to be differentially methylated. On average, 56% of the DM genes were hypomethylated in the five cancer types (Figure 1).

Functional Consistency of Methylation Alterations Across Different Cancer Types
For each dataset, with an FDR of 5%, the GO-function algorithm [21] was used to identify GO terms that were significantly enriched with hypermethylated genes, called hypermethylated terms. The hypermethylated terms extracted from different datasets appeared to have low PO scores (Figure 2A). For example, only 22 to 24 of the 43 hypermethylated terms extracted from the dataset for kidney carcinoma could be found in the datasets for the other cancer types, with PO scores of 51-55%. However, even for the same cancer type, the significant hypermethylated (or hypomethylated) terms extracted from different datasets tended to have low PO scores due to the inherent limitations of the statistical decision [21]. To address this problem, we proposed the POE score to evaluate the functional consistency of the hypermethylated terms extracted from different datasets (see Materials and Methods). For example, 41 to 43 of the 43 hypermethylated terms extracted for kidney carcinoma had raw enrichment P values less than 0.05 in all of the datasets for the other cancer types, with POE scores of 95-100%. In contrast, an average of fewer than two of the terms extracted for kidney carcinoma had an enrichment P value less than 0.05 in 10,000 randomized datasets for each of the other cancer types (see Materials and Methods), which is significantly fewer than the number observed in the original dataset (P<0.0001). These results suggested that the recurrence in the other four cancer types was not random for the majority of the hypermethylated terms for kidney carcinoma. Similar results were observed for hypermethylated terms extracted for the other four cancer types (Figure 2B). Thus, hypermethylated genes for different cancer types are highly consistent in their function.
With an FDR <0.05, we identified 117 terms that were consistently hypermethylated across the five cancer types. Each of these terms was significant in at least one cancer type and tentatively significant (P<0.05) in all of the other four cancer types, which was unlikely to be observed by chance (binomial test, P<6.25E-06). As shown in Table S1, these terms are mainly related to “developmental process” (including “cell differentiation” and “cell development”), “transport” (including “calcium ion transport” and “neurotransmitter transport”), “response to stimulus” (including “response to chemical stimulus” and “behavior”) and the “regulation of biological process” (including “regulation of transcription, DNA-dependent” and “regulation of signaling”). Notably, when a term and one of its offspring terms are both detected to be significant, researchers are often interested in the specific offspring term, assuming that specific GO terms might be more biologically relevant [21]. However, in some cases, the general parent term could be globally disturbed. Taking the term “cell differentiation” (GO:0030154) as an example, the genes remaining after the removal of the genes in its four significant offspring terms were still significantly enriched with hypermethylated genes in the dataset for colon adenocarcinoma (hypergeometric test, P = 7.98E-05). This result suggested that “regulation of cell differentiation” might be widely disturbed in this cancer.
Similarly, the lists of hypomethylated terms extracted for different cancer types with an FDR of 5% had low percentages of overlap (Figure 3A). For example, only 6 to 11 of the 21 hypomethylated terms extracted from the dataset for colon adenocarcinoma were also found in the datasets for the other cancer types, with PO scores of 28-52%. However, 19 of the 21 hypomethylated terms for colon adenocarcinoma had raw enrichment P values less than 0.05 in all of the other four cancer types, and the other two terms had raw enrichment P values less than 0.05 in at least one of the other cancer types, all with POE scores greater than 90%. In 10,000 randomized experiments for each cancer type (see Materials and Methods), fewer than one of the terms extracted for colon adenocarcinoma, on average, had enrichment P values less than 0.05 in all of the other four cancer types, which was significantly fewer than the corresponding number observed in the original dataset (P<0.0001). Thus, the recurrence of most of the hypomethylated terms for colon adenocarcinoma in the datasets for the other four cancer types was not random. Similar results were observed for hypomethylated terms extracted for the other four cancer types (Figure 3B). Therefore, hypomethylated genes for different cancer types were also highly consistent in their function.
Finally, we identified 41 terms that were consistently hypomethylated across the five different cancer types (Table S2). Each of these terms was significant in at least one cancer type and tentatively significant (P<0.05) in all of the other cancer types, which was unlikely to be observed by chance (binomial test, P<6.25E-06) (see Materials and Methods). These terms were mainly related to “response to stimulus” (including “immune response”, “defense response” and its offspring terms “inflammatory response”, “cellular defense response” and “defense response to bacterium”), and epidermis development (including “keratinocyte differentiation” and its offspring “keratinization”). Taking the term “defense response” as an example, the genes remaining after exclusion of the genes of its three significant offspring terms were still significantly enriched with hypomethylated genes in colon adenocarcinoma (hypergeometric test, P = 1.30E-04). Thus, “defense response” might be widely hypomethylated in cancer. As the hypomethylation of genes in “immune response” and “inflammatory response” could be induced by infiltration of lymphocytes in cancer tissue [23], we needed to evaluate the effect of infiltrated lymphocytes on the epigenetic changes of genes annotated in these two terms. Here, we analyzed only the dataset for invasive breast cancer because lymphocyte infiltration data for cancer and adjacent normal tissues were available only for this cancer type. We focused on 7 pairs of tumor and adjacent normal tissues with an equal percentage of lymphocytes in each pair of samples and found that hypomethylated genes were still significantly enriched in “immune response” (P = 2.41E-05) but not in “inflammatory response” (P = 3.74E-01), which could be due to the low power of detecting hypomethylated genes with an FDR <5% for “inflammatory response” [24].
As the functional enrichment analysis is rather robust to the false discoveries of DM genes [25], we selected hypomethylated genes with an FDR <10% and found that “inflammatory response” was also enriched with hypomethylated genes (P = 1.92E-02). These results indicated that the methylation changes in “immune response” as well as in “inflammatory response” could not be explained by the infiltration of lymphocytes in cancer tissue.
Notably, we found that some typical cancer-associated functions such as “cell cycle” and “apoptosis” were not enriched with hypermethylated genes or hypomethylated genes. On the contrary, some of these functions were significantly depleted of both hypermethylated and hypomethylated genes for all five cancer types. For example, “cell cycle” was significantly depleted of both hypermethylated and hypomethylated genes for all five cancer types (all P<2.09E-10). This result could be partially due to the strong target gene specificity of methylation alterations [26,27]. On the other hand, we still observed some genes in these functions that were consistently differentially methylated across the five cancer types. For example, 80 genes associated with cell cycle showed consistent hypermethylation or hypomethylation changes across the five cancer types, indicating that they are also common targets of methylation alterations in these cancer types.

Hypermethylation- and Hypomethylation-specific Functions
From the 117 terms consistently hypermethylated across the five cancer types, we defined hypermethylation-specific functions as those that were not significantly enriched with hypomethylated genes in any of the cancer types and significantly depleted of hypomethylated genes in at least one cancer type. The depletion analysis was performed using a one-sided hypergeometric distribution test [28]. We found 58 hypermethylation-specific functions, most of which are related to “regulation of biological process” and “developmental process”. Table S3 contains a complete list of hypermethylation-specific functions.
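The one-sided hypergeometric tests for enrichment and depletion can be written directly from the hypergeometric probability mass function. The sketch below is an illustration under stated assumptions and not the authors' implementation: the function and parameter names are hypothetical, with N the number of measured genes, K the number of genes annotated to a GO term, n the number of hypomethylated (or hypermethylated) genes, and k the number of those genes annotated to the term.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k) when drawing n genes without replacement from N,
    of which K are annotated to the term."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def enrichment_p(k, N, K, n):
    """One-sided enrichment P value, P(X >= k)."""
    upper = min(K, n)
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, upper + 1))


def depletion_p(k, N, K, n):
    """One-sided depletion P value, P(X <= k)."""
    lower = max(0, n - (N - K))
    return sum(hypergeom_pmf(i, N, K, n) for i in range(lower, k + 1))
```

Under these definitions, a term would qualify as hypermethylation-specific when its enrichment test for hypomethylated genes is non-significant in every cancer type and its depletion test for hypomethylated genes is significant in at least one.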
Similarly, from the 41 terms consistently hypomethylated across the five cancer types, we defined hypomethylation-specific functions as those that were not significantly enriched with hypermethylated genes in any of the cancer types and significantly depleted of hypermethylated genes in at least one cancer type (Table S4). We found 24 hypomethylation-specific functions, the majority of which are related to response to stimulus (including “immune response”, “response to fungus”, “defense response” and its offspring “inflammatory response”), immune system process and cell killing (Table 2). Considering that immune cells affect malignant cells through the production of various types of cytokines, we found that cell cytokines collected in the KEGG database were significantly hypomethylated in each of the five cancer types (P = 3.72E-13, 5.11E-11, 6.60E-08, 2.44E-08 and 5.94E-09 for colon, kidney, stomach, lung and breast cancers, respectively). Specifically, we found that the hematopoietin, TNF, IL1, IL10 and IL17 families of cytokines had a significant tendency to be hypomethylated in all five cancer types. For example, an average of 70.0% of genes in the IL10 family, which promote innate immune responses from tissue epithelia to limit the damage caused by infection or inflammation [29], were hypomethylated in all five cancer types (Figure 4).
We noticed that hypermethylation- and hypomethylation-specific functions are related to different types of “response to stimulus”, as shown in Figure 5. The hypomethylation-specific functions are mainly related to “immune response”, “response to fungus” and “defense response” (including its offspring “inflammatory response”), which are mainly performed by immune cells in an organism in response to a potential threat (such as cancer cells and bacteria); in these processes, cells communicate with each other through the use of signal molecules, such as cytokines [30]. In contrast, the hypermethylation-specific functions are mainly related to “signal transduction” within the cell and “behavior” (specific actions or reactions) of an organism in response to external or internal stimuli (Figure 5).

Discussion
Our results showed that genes with promoter hypermethylation in different cancer types are highly consistent in function, as are genes with promoter hypomethylation. Although different tissues have specific methylation patterns [31], this high level of consistency suggests that different cancer types undergo similar functional changes in methylation. Our results also indicated that gene promoter hypermethylation and hypomethylation tend to target different biological processes associated with tumor progression. Hypermethylation-specific functions are mostly associated with “developmental process” and “regulation of biological process”, whereas hypomethylation-specific functions are mostly related to “response to stimulus” (including “immune response”, “response to fungus” and “inflammatory response”), “immune system process” and “cell killing”. These results suggest that DNA hypermethylation and hypomethylation might be independent processes in carcinogenesis [8]. In accordance with previous reports that the methylation state of genes can be modified by environmental stimulus [32], our results showed that both hypermethylated and hypomethylated functions are related to “response to stimulus”. Specifically, our results further revealed that hypermethylation and hypomethylation are associated with different types of “response to stimulus”. Notably, if a function is significantly enriched with hypomethylated (or hypermethylated) genes, a significant portion of genes in this function are hypomethylated (or hypermethylated) in cancer; however, this does not mean that the function cannot include a small number of hypermethylated (or hypomethylated) genes. For example, in the hypomethylation-specific function “immune response”, IRF4, which negatively regulates toll-like-receptor signalling that is central to the activation of innate and adaptive immune systems [33], was observed to be hypermethylated in all five cancer types.
We also found that the “G-protein coupled receptor protein signaling pathway” was significantly enriched with both hypermethylated genes and hypomethylated genes across all five cancer types. These results may be due to the hypomethylation of chemokine receptors and the hypermethylation of genes related to signal transduction within the cell, both of which may disturb pathways contributing to carcinogenesis [34].
Although our results showed that genes with promoter methylation alterations in different cancer types are highly consistent in their function, cancer is a highly heterogeneous disease with respect to different DM genes in different patients. Even for the same cancer type, unique subtypes are characterized by distinct epigenetic alterations [35,36,37], which should also be consistent in function. For example, we found that the four lists of hypermethylated genes for the four colon cancer subtypes (CIMP-H, CIMP-L, cluster 3 and cluster 4) reported by Hinoue et al. [35] were highly consistent in function (Figure S1), although these subtypes differ in terms of their hypermethylated genes [35]. Specifically, for the 117 terms consistently hypermethylated across different cancer types, we found that 115 terms were consistently enriched with hypermethylated genes for all four subtypes (hypergeometric test, P<0.05) and the other two terms were also marginally significant for all four subtypes (hypergeometric test, P<0.1). Different samples for a particular cancer type may harbor different methylation alterations, which could also be consistent in function.
To extrapolate the functional consequence of methylation alterations of gene promoters in cancer genomes, researchers often investigate the relationship between gene methylation and gene expression. Hypermethylation of gene promoters is significantly correlated with the down-regulation of gene expression, but hypomethylation of gene promoters is not or is only weakly correlated with gene up-regulation [38,39]. Similar complex relationships were also observed at the functional level. To exemplify this, we analyzed 16 of the 20 pairs of samples for invasive breast cancer that contained both methylation and expression data. The differentially expressed genes were selected using the SAM (significance analysis of microarray) algorithm [40] with an FDR <0.05. Then, for the 117 terms consistently hypermethylated across the five cancer types, we found that 64 terms were significantly enriched with down-regulated genes (hypergeometric test with an FDR <0.05) and 83 terms were marginally enriched with down-regulated genes (hypergeometric test with P<0.1). However, for the 41 terms that were consistently hypomethylated across the five cancer types, we found that none was enriched with up-regulated genes. On the contrary, 10 hypomethylated terms (including “inflammatory response” and “leukocyte migration”) were even enriched with down-regulated genes. One possible explanation for this phenomenon is the hypothesis that hypomethylation of gene promoters must cooperate with other key activators, such as appropriate levels of transcription factors [37,41], to control gene expression. For example, as inflammatory genes tend to be hypomethylated in inflammatory diseases [42,43,44], we could hypothesize that hypomethylation of inflammatory gene promoters may happen in precancerous inflammatory disorders, which together with the activation of the coupled activators could induce hyperactivation of inflammatory response in the precancerous conditions.
During the development of cancer, the hypomethylation of inflammatory genes could be inherited through cell division, whereas the coupled activators could lose function due to genome instability induced by the protumorigenic microenvironment [45,46,47]. Thus, we could observe down-regulation of these genes coexisting with hypomethylation in cancer. To test this hypothesis, we need to monitor the methylation and expression changes during the progression from precancerous inflammation to cancer, which is a difficult task but deserves future study.
Insight into the functional roles of DNA methylation alterations in cancer genomes may help improve the epigenetic therapy of cancer. Currently, most epigenetic drugs are hypomethylating agents that target hypermethylated genes in cancer [48]. However, because promoter hypomethylation of genes may also play an important role in carcinogenesis, agents targeting hypomethylated genes in cancer might be useful for cancer therapy. For example, reversal of the hypomethylation status of urokinase (uPA) promoter blocks breast cancer growth and metastasis [49]. Considering the close link between promoter hypomethylation and immunity, epigenetic therapy and immunotherapy may need to be combined for the treatment of cancer.

Table 2. Hypomethylation-specific functions. (Columns: Functional classification; GO-accession #; GO Term.)

Figure S1 The POE scores between every two lists of the hypermethylated functions extracted for the four methylation-based subtypes of colon cancer. Each row represents the scores between the hypermethylated terms for one subtype and the hypermethylated terms for the other subtypes. The POE score 1 is shown in red and 0 is indicated in blue. The details of the four subtypes are described in [35].