Identification of endometrial cancer methylation features using combined methylation analysis methods

Background DNA methylation is a stable epigenetic mark that is frequently altered in tumors. DNA methylation features are attractive biomarkers for disease states given the stability of DNA methylation in living cells and in biologic specimens typically available for analysis. Widespread accumulation of methylation in regulatory elements in some cancers (specifically the CpG island methylator phenotype, CIMP) can play an important role in tumorigenesis. High resolution assessment of CIMP for the entire genome, however, remains cost prohibitive and requires quantities of DNA not available for many tissue samples of interest. Genome-wide scans of methylation have been undertaken for large numbers of tumors, and higher resolution analyses for a limited number of cancer specimens. Methods for analyzing such large datasets and integrating findings from different studies continue to evolve. An approach for comparison of findings from a genome-wide assessment of the methylated component of tumor DNA and more widely applied methylation scans was developed.
Methods Methylomes for 76 primary endometrial cancer and 12 normal endometrial samples were generated using methylated fragment capture and second generation sequencing, MethylCap-seq. Publicly available Infinium HumanMethylation 450 data from The Cancer Genome Atlas (TCGA) were compared to MethylCap-seq data.
Results Analysis of methylation in promoter CpG islands (CGIs) identified a subset of tumors with a methylator phenotype. We used a two-stage approach to develop a 13-region methylation signature associated with a “hypermethylator state.” High level methylation for the 13-region methylation signature was associated with mismatch repair deficiency, high mutation rate, and low somatic copy number alteration in the TCGA test set. In addition, the signature devised showed good agreement with previously described methylation clusters devised by TCGA.
Conclusion We identified a methylation signature for a “hypermethylator phenotype” in endometrial cancer and developed methods that may prove useful for identifying extreme methylation phenotypes in other cancers.

Introduction
Cancers develop and progress as a result of accumulation of mutations that alter the coding sequence of genes, as well as changes in gene expression. Changes in gene expression in cancers are associated with alteration in transcription factors, mutations in DNA binding elements, miRNAs, and chromatin remodeling. Chromatin remodeling, including epigenetic modifications to histones and DNA methylation, normally plays a key role in cell differentiation, stably switching cellular pathways on or off until the cells reach a terminally differentiated state that is typically irreversible. Epigenetic changes can lead to tumor suppressor silencing or re-expression of oncogenes in tumor cells, contributing to dysregulation of genes and pathways important in tumorigenesis [1].
DNA methylation is one of the better understood mechanisms of epigenetic control. DNA methylation in humans is mediated by the DNA methyltransferases DNMT1 and DNMT3, which add a methyl group to the 5-carbon (C5 position) of cytosine. In differentiated cells, DNA methylation occurs in the context of cytosine followed by guanine (CpG) in the DNA sequence. DNA methylation in promoter CpG islands (CGIs) has been shown to mediate stable gene silencing. Tumor suppressor silencing associated with DNA methylation is found in a wide range of tumor types [2]. It is therefore not surprising that DNA methylation is recognized as a potential biomarker [3]. Methods that can be linked to DNA sequencing have been developed to assess methylation, including affinity-based capture of methylated regions and bisulfite conversion [4].
The CpG island methylator phenotype (CIMP) is a cancer-specific accumulation of DNA methylation in CGIs. Originally identified in colorectal cancer [5], CIMP has since been identified in multiple cancer types: glioma [6], breast cancer [7], acute myeloid leukemia [8], gastric cancer [9], clear cell renal cell carcinoma [10], oral squamous cell carcinoma [11], hindbrain ependymomas [12] and endometrial cancer (EC) [13,14]. CIMP arises early in tumorigenesis as evidenced in some colorectal serrated adenomas prior to malignant progression and the development of microsatellite instability (MSI) [15,16]. It can lead to multiple changes in gene expression and, with that, altered tumor biology. CIMP is associated with good prognosis in some cancer types (e.g., colorectal, breast) and poor prognosis in others (e.g., renal cell carcinoma) [17]. In addition to potential roles as a biomarker for precancerous lesions and prognosis in established tumors, CIMP could also represent a therapeutic target for demethylating therapies [18]. Despite its potential diagnostic, prognostic and therapeutic value, CIMP and its manifestations in different cancer types remain poorly understood. Defining CIMP at the genome level requires extensive methylome profiling. Methylome profiling in ECs has been completed largely through sampling small numbers of CpGs, either at candidate regions or using more general methods in a modest number of tumors. Several methods have emerged to profile the methylome, including the Infinium beadchip [19] and affinity-based methylation capture followed by shotgun sequencing (e.g., MethylCap-seq [20]). The Infinium beadchip has been a method of choice for analyzing tumor DNAs because it is cost-effective, scalable, has demonstrated high accuracy and reproducibility, and has a user-friendly analysis pipeline. The method relies on hybridization of bisulfite-converted DNA to the beadchip, followed by single-base extension.
The end result is a readout of percent methylation for individual CpGs, with ~7 CpGs assessed per promoter CGI using the HumanMethylation 450 platform. At the genome level, approximately 8% of the CpGs in promoter CGIs are evaluated. The methylation status of CpGs near those directly assessed is assumed to be similar. MethylCap-seq is one of several affinity-based capture methods that leverage shotgun sequencing to assess methylation patterns. MethylCap-seq uses the MBD2 protein to capture methylated fragments, which are then sequenced to yield piles of methylation tags across the genome [21]. By comparing tag frequency between samples, relative methylation levels can be inferred for a given region. As sequencing costs continue to fall, MethylCap-seq and similar methods will become increasingly cost-effective. For analysis of promoter CGIs, MethylCap-seq has a particular advantage over Infinium: average methylation over the regions is measured, rather than assumed.
The purpose of this study was to develop a signature for methylation in endometrial cancers that distinguishes tumor from normal endometrium, and that has potential to classify tumors as having discrete levels of DNA methylation. We developed a 13-region signature that stratified endometrioid endometrial tumors based on CGI methylation status. The signature distinguishes tumors from both normal controls and adjacent normal tissue. This signature was based on a training set of MethylCap-seq data and validated using TCGA Infinium datasets. This signature could prove useful for detecting and classifying endometrioid endometrial carcinomas.

Patient samples and sequence data
Seventy-six primary human endometrioid endometrial cancer and 12 nonmalignant endometrial samples were analyzed from a previously published cohort [22]. The normal endometrial tissues were from patients who did not have endometrial cancer and are thus referred to as "unmatched". Cohort characteristics are shown in S1 Table. A sequencing read summary is provided in S2 Table. All studies involving human endometrial cancer samples were approved by the Human Studies Committee at the Washington University and at The Ohio State University.

MethylCap-seq quality control
MethylCap-seq quality control was implemented as previously described [23]. Fourteen of 102 samples showed evidence of poor methylated fragment enrichment or poor sequencing reproducibility and were excluded from analysis, leaving 76 tumors and 12 normals. This method was demonstrated to reduce noise in methylation signal and improve the ability to discriminate between tumors and normal tissue.

MethylCap-seq data analysis
Sequence files were aligned and processed as previously described [23]. Reads were extended to the average fragment length and the resulting count distribution was normalized against the total aligned reads by conversion to reads per million (RPM). Differentially methylated promoter CGIs were identified by performing a Wilcoxon rank sum test for each CGI across the two sample groups being considered. Results were adjusted for multiple comparisons by setting a false discovery rate (FDR) cutoff of 0.05. Methylation was categorized by genomic feature as follows: CpG islands (CGI, as defined in the UCSC genome browser), promoters (2kb in length, 1kb upstream and downstream of the transcription start site (TSS)), CGI shores (200bp to 2kb distant from both ends of each CGI), and the first exon of RefSeq genes. CGIs were further subdivided by proximity to promoters (within 10kb upstream or 1kb downstream of a 2kb promoter), and 2kb promoters were subdivided by overlap with CGI.
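The normalization and multiple-testing steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names, the example counts, and the example p-values are hypothetical, and the Benjamini-Hochberg procedure is assumed here as the FDR method because the text does not name one. The per-CGI Wilcoxon rank sum test itself is not implemented; its p-values are taken as given inputs.

```python
def to_rpm(region_counts, total_aligned_reads):
    """Convert raw read counts per region to reads per million (RPM)."""
    return {
        region: count * 1_000_000 / total_aligned_reads
        for region, count in region_counts.items()
    }


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used as the FDR procedure; the source only
    states that an FDR cutoff of 0.05 was applied.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical extended-read counts for three promoter CGIs in one sample.
counts = {"cgi_a": 120, "cgi_b": 15, "cgi_c": 0}
rpm = to_rpm(counts, total_aligned_reads=20_000_000)

# Hypothetical per-CGI p-values from a rank sum test between two groups.
p_vals = [0.001, 0.04, 0.03, 0.20]
q_vals = benjamini_hochberg(p_vals)
significant = [q <= 0.05 for q in q_vals]
```

In this toy input only the first CGI passes the 0.05 cutoff after adjustment, which shows how a nominally significant p-value (0.03 or 0.04) can fail once multiple comparisons are taken into account.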

Infinium validation of methylation signature candidates
Eleven of 76 tumors were chosen for technical validation (S1 Table) using the Infinium HumanMethylation 450 beadchip platform, a well-validated bisulfite-based method for assessing methylation of individual CpGs genome-wide. The assay was performed according to manufacturer protocol by the University of Southern California Epigenome Center. Methylation was reported using beta-values, a number which represents the fraction of DNA fragments that were methylated at a given CpG site.

Computation of methylation score using the 13-promoter CGI signature
Methylation score was computed by taking the average of the beta-values for all probes within a promoter CGI, then averaging the result across the 13 promoter CGIs in the signature. The final signature comprised a total of 88 Infinium HumanMethylation 450 probes.
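The two-level averaging described above can be written as a short Python sketch. The function name, the CGI labels, and the beta-values below are hypothetical and serve only to illustrate the arithmetic; a real signature would carry 13 promoter CGIs and 88 probes.

```python
from statistics import mean


def methylation_score(probe_betas_by_cgi):
    """Average probe beta-values within each CGI, then across CGIs.

    Averaging per CGI first means a CGI with many probes carries the
    same weight in the final score as a CGI with few probes.
    """
    per_cgi_means = [mean(betas) for betas in probe_betas_by_cgi.values()]
    return mean(per_cgi_means)


# Hypothetical beta-values for a three-CGI toy signature.
example = {
    "cgi_1": [0.10, 0.30],        # CGI mean 0.20
    "cgi_2": [0.60, 0.80, 0.70],  # CGI mean 0.70
    "cgi_3": [0.30],              # CGI mean 0.30
}
score = methylation_score(example)  # approximately 0.40
```

Note that this differs from pooling all probes into a single mean: pooling the six toy values gives about 0.47, because the three-probe CGI would then count three times as much as the single-probe CGI.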

In silico analysis of TCGA endometrioid endometrial tumors
Methylation was analyzed for 203 endometrioid endometrial tumors from the original published TCGA cohort of 373 endometrial tumors [24]. For 170 tumors, Infinium HumanMethylation 450 data were lacking. Non-endometrioid endometrial cancers were not analyzed. Some analyses assessed fewer than 203 samples due to gaps in data availability for each assay. Methylation was assessed using Level 3 data from The Cancer Genome Atlas Data Portal, while clinical and molecular correlating data were gathered from cBioPortal for Cancer Genomics (Memorial Sloan Kettering Cancer Center).

Replicate signature analysis
To demonstrate the reproducibility of our method for identifying tumors with a CpG island methylator phenotype, two additional 13-region signatures were compiled from the original list of top differentially methylated promoter CGIs between CG island highly methylated (CGI-H) and CG island low level methylation (CGI-L) tumors in the initial MethylCap-seq analysis. CGI-L tumors in the discovery set were defined as showing promoter CGI methylation signal of less than 5000 RPM, while CGI-H tumors were defined as showing signal greater than 15000 RPM (three-fold difference in signal). Normal controls showed an average methylation signal of 4771 RPM and a maximum methylation signal of 8261 RPM. These definitions were intended to capture the most extreme methylation phenotypes for subsequent analysis, rather than include all tumors with aberrant methylation patterns. For the replicate signature analysis, regions that had already been considered for the original signature were excluded. Mirroring the technical validation of the original signature, candidate regions that showed <0.1 difference in average beta-value between groups in the Infinium technical validation set were discarded. An additional negative control signature was populated with the 13 promoter CGIs that showed the smallest differences in methylation between groups in the discovery set (as determined by fold change). Endometrioid endometrial tumors from the test set were indexed using all four signatures, and methylation score was computed using the average beta-value of the regions in each signature. Rank correlation of tumor methylation scores between replicate signatures and the original signature was assessed using a Spearman test.
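As an illustration of the discovery-set definitions and the rank correlation used to compare signatures, the following Python sketch applies the RPM cutoffs given above and computes a Spearman coefficient from scratch. The function names, the "intermediate" label for tumors between the two cutoffs, and the example scores are assumptions made for illustration; the p-value of the Spearman test is not computed here.

```python
def classify_discovery_tumor(promoter_cgi_rpm, low=5000, high=15000):
    """Label a tumor by aggregate promoter CGI signal (RPM).

    The 'intermediate' label is a hypothetical name for tumors that
    fall between the CGI-L and CGI-H cutoffs.
    """
    if promoter_cgi_rpm < low:
        return "CGI-L"
    if promoter_cgi_rpm > high:
        return "CGI-H"
    return "intermediate"


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)


# Hypothetical methylation scores for four tumors under two signatures.
original_scores = [0.10, 0.25, 0.40, 0.55]
replicate_scores = [0.12, 0.20, 0.50, 0.45]
rho = spearman_rho(original_scores, replicate_scores)  # approximately 0.8
```

With these cutoffs, a sample at the reported normal-control average (4771 RPM) would fall in the CGI-L range, while one at the reported normal maximum (8261 RPM) would fall between the two groups.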

Characterizing a CpG island methylator phenotype
Methylome data from a previously reported MethylCap-seq study of 76 endometrioid endometrial carcinomas and 12 normal endometrial tissue controls [25] were analyzed (S1 Table). Patterns of methylation in the tumors and normal DNAs were compared [26,27]. Normal tissues had low level methylation compared to tumors with much less variability in overall methylation than was seen in tumors. Overall, cancers showed a nearly 2-fold increase in methylation of promoter CGIs, with less pronounced gains in methylation of CGI shores ( Fig 1A). The increased methylation in genic regions was greatest at the promoters, but was also seen in first exons. Overall, promoter CGI methylation was highly variable with the greatest variation seen in tumor DNAs ( Fig 1B). CGI tumor methylation ranged from slightly below the levels seen in normal endometrial tissues to 5-fold higher than normals. Among the 76 tumors investigated, five stood out as having distinctly higher levels of CGI methylation (referred to as CGI-H for highly methylated) and a number of tumors had methylation levels comparable to that seen in the normal endometrial tissues (CGI-L for low level methylation)(S1 Table). To determine if different efficiencies in the methylated fragment enrichment (rather than biological differences in methylation) might explain the variation in methylation seen across the DNAs investigated, we compared levels of nuclear CGI and mitochondrial DNA methylation. A positive correlation could indicate sample-specific differences in efficiency of capture of methylated DNA. Nuclear and mitochondrial methylation levels were not correlated (Spearman r = -0.15, p = 0.2, data not shown). Given the fact that nuclear and mitochondrial methylation are mediated by different processes in distinct cellular compartments [28,29], we reasoned that the lack of correlation made it unlikely that differences in overall methylation were attributable to technical differences/enrichment bias.
Comparison of promoter CGI methylation in normal and tumor tissues revealed an overall increase in methylation in tumors, consistent with a CIMP (Fig 2A). The differences in CGI methylation between the most highly methylated (5 CGI-H tumors) and least methylated tumors (8 CGI-L tumors) were, as expected, almost all gains (4,672 hypermethylated vs 17 hypomethylated) (Fig 2B). The extensive variability in CGI methylation involved 29% of all promoter CGIs. Among the 4,672 CGIs hypermethylated in the CGI-H tumors, 2,269 (49%) overlapped with the hypermethylated CGIs for the tumor vs normal comparison (Fig 2C). The overlap is more than twice that expected (49% vs 23%, Chi squared p-value < 0.001). The loci in the overlap presumably include "hotspots" in the genome that are likely to acquire methylation in endometrial tumorigenesis. Pathway analysis of these 2,269 shared hypermethylated promoter CGIs showed enrichment for known targets of epigenetic regulation, including targets of the Polycomb Repressor Complex and regions known to be methylated in other cancers (S3 Table).

Technical validation of highly methylated CGIs and development of an endometrial cancer methylation signature
The 16 CGIs showing the most significant or largest fold differences between CGI-H and CGI-L tumors, and that had distinguished tumor and normal DNAs, were considered candidates for a "highly methylated" signature for ECs. The number of CpGs in the 16 CGIs ranged from 23 to 234, with the MethylCap fold enrichment ranging from 11.7X to 18.9X (Table 1). When the methylation levels of the 16 CGIs were compared in the CGI-H and CGI-L tumors (5 and 8 cases, respectively), 15 of 16 candidates were, as expected, more methylated in the CGI-H tumors. The exception was the TMEM115 CGI, which was hypermethylated in only 3 of the CGI-H tumors (Fig 3A).
Analysis of CGI-H and CGI-L tumors using the Infinium HumanMethylation 450 beadchip validated the observed increase in methylation seen with MethylCap for 13 of 16 candidates (Fig 3B). Because DNA was not available for some of the tumors studied by MethylCap (four tumors for which DNA stocks were depleted), the orthogonal validation included only nine of the previously studied cases (4 CGI-H and 5 CGI-L)(S1 Table). One additional case with methylation near the CGI-H cut-off and three additional tumors with methylation close to the CGI-L cut-off ( Fig 1B) were analyzed. Candidate hypermethylated loci were considered to be technically validated using the following criteria: beta-value difference of greater than +0.1 (CGI-H-CGI-L) or Student's t-test p<0.1. We purposely set low beta and permissive p values to avoid over-fitting. The 13 validated signature regions comprise a total of 88 Infinium HumanMethylation 450 probes located within the respective CGIs, with a median of six probes per region and a range of 2 to 14 ( Table 2).
The validated signature regions robustly distinguished 5 out of 6 of the CGI-H tumors from the CGI-L tumors (Fig 3B). The aggregate signature composed of these 13 CGI promoter regions likewise distinguished CGI-H from CGI-L tumors (mean average beta-value of 0.47 vs 0.26, Student's t-test p<0.05) (Fig 3C).

Methylation signature stratifies endometrioid endometrial tumors by methylation phenotype and distinguishes tumors from normal controls in the endometrial cancer TCGA dataset
To test whether the 13-CGI signature that we developed can distinguish methylation phenotypes in an independent cohort, we examined the methylation profiles for endometrioid endometrial carcinomas from TCGA. The available data for 203 ECs generated using the Infinium HumanMethylation 450 beadchip were analyzed [24]. TCGA over-sampled for high grade endometrioid cases with approximately one-third of cases being grade 1, one-third grade 2 and one-third grade 3. The cohort is otherwise largely representative of women with ECs [24]. Because the 13-CGI methylation signature we devised came from comparison of individual CGIs in tumors that showed the largest differences in overall promoter CGI methylation, we first assessed the relationship between our 13-CGI signature and overall CGI methylation in the TCGA cohort. Our 13-CGI signature and overall CGI methylation proved to be highly correlated (Fig 4A), as would be expected for a marker for genome-wide CIMP. Our 13-CGI signature was also highly correlated with methylation clusters (MC1-4) developed by TCGA (Fig 4B). The 13-CGI signature scores (beta values) were significantly different for tumors assigned to TCGA MC1 and MC2 clusters (very highly and highly methylated groups) compared to the other two groups (MC3 with methylation comparable to levels seen in normal and MC4 with intermediate methylation). ANOVA revealed that the 13-CGI signature scores were significantly different across the four groups.
The MC3/MC4 comparison, however, showed that these two groups have indistinguishable scores/beta values (Fig 4B). Methylation scores for tumors were compared to the scores for matched and unmatched normal control tissues (Fig 4C). Ninety-five percent (192 of 203) of endometrioid ECs showed a higher methylation score than unmatched normal controls (N = 11), suggesting that overall the 13-region signature could reliably distinguish tumor and normal tissues. Likewise, for the small number of cases with matched normal and tumor tissues (N = 13), tumors had an average 3-fold increase in methylation compared to normal, and only one tumor had methylation levels in the same range as normal tissues. The 13-CGI methylation signature showed a sensitivity of 0.95 ± 0.03 and a specificity of 0.93 ± 0.07 (95% confidence interval) for distinguishing tumors from normal tissue at a methylation score threshold of 0.16 (Fig 4D).
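The reported sensitivity and its confidence interval can be checked with a few lines of Python. This is a sketch under two assumptions that the text does not state explicitly: that the 192 of 203 tumors scoring above normal controls are the true positives at the 0.16 threshold, and that the interval is a normal-approximation (Wald) interval. The function name is hypothetical.

```python
from math import sqrt


def proportion_with_ci(successes, total, z=1.96):
    """Return a proportion and the half-width of its Wald 95% interval.

    Assumption: a normal-approximation interval; the source does not
    say which interval method was used.
    """
    p = successes / total
    half_width = z * sqrt(p * (1 - p) / total)
    return p, half_width


# 192 of 203 endometrioid tumors scored above unmatched normal controls.
sensitivity, half_width = proportion_with_ci(192, 203)
```

Under these assumptions the result rounds to 0.95 with a half-width of about 0.03, in line with the reported sensitivity. The specificity is not recomputed here because the counts behind it are not given in the text.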
To determine if methylation levels for the 13 CGIs in our signature were related to gene expression, we evaluated the transcript levels using the publicly available RNA-seq data. Seven of the 13 genes showed a significant association between methylation and transcript levels: increased methylation was associated with reduced levels of transcripts (Table 3). EPHX3 expression decreased notably with increasing promoter methylation (S1 Fig).

High methylation score is associated with mismatch repair deficiency, high mutation rate, and low somatic copy number alteration
In colorectal cancer, CIMP is a feature of tumors with defective DNA mismatch repair (MMR) [15]. Tumors with DNA MMR defects have elevated mutation levels and have characteristically accumulated large numbers of strand-slippage mutations that give rise to the MSI phenotype. MMR defects are frequently seen in EC. Epigenetic silencing of the MLH1 MMR gene associated with hypermethylation in its promoter accounts for the vast majority of MSI-positive/MMR deficient ECs [30-32]. When we compared our 13-CGI methylation scores for TCGA tumors stratified based on MSI status, mutation rate cluster and copy number cluster, clear differences with all three features were evident (Fig 5). Methylation score correlated with MSI status (MSI+ vs microsatellite stable (MSS), median 0.40 vs. 0.27, Wilcoxon rank sum p<0.001, Fig 5A) and mutation frequency (High vs Low clusters, mean 0.38 vs 0.28, ANOVA with Holm-Sidak post-hoc p<0.001, Fig 5B). Methylation score also varied with somatic copy number alteration (SCNA) cluster. Clusters 2 and 3 have higher methylation than SCNA cluster 4 (median 0.33 and 0.39 vs. 0.24, Kruskal-Wallis with Bonferroni-corrected Student's t-test post-hoc p<0.01). SCNA cluster 3 also has significantly higher methylation than very low SCNA cluster 1 (median 0.39 vs. 0.28), with cluster 2 having an intermediate value (Fig 5C).
Given the highly significant association between MSI and methylation of the MLH1 promoter in sporadic endometrioid endometrial cancers [32], the strong correlation observed between methylation score and MSI was expected. The high mutation TCGA group is greatly enriched for MSI-positive tumors and thus our 13-CGI signature score was similarly higher in these tumors. The inverse relationship between methylation score and SCNA suggests that CIMP/MSI and chromosomal instability may be features of two distinct pathways of tumorigenesis [24]. When we compared our 13-feature CGI methylation score across other published endometrioid endometrial TCGA cancer cluster data (mRNA and miRNA expression) and with clinicopathologic and demographic variables (BMI, stage, grade and relapse-free survival), no significant relationships with methylation score were seen (threshold of p<0.01, data not shown).

Methodological validity
The reproducibility of the methods used to generate the 13-CGI methylation signature was tested by generating two additional "replicate signatures", each including 13 different promoter CGIs selected using similar methods. As shown in S2 Fig, when the "replicate signatures" were applied to the TCGA set, they performed similarly (r = 0.82 and 0.89 for replicates R1 and R2 vs original signature, p<0.001; r = 0.144 for negative control vs original signature, p>0.01). This suggests our approach to developing a methylation score is robust. The relationships with the molecular features associated with our 13-CGI score (MSI, mutation rate, copy number alteration) were evaluated using the "replicate signatures", revealing similarly strong correlations.

Relationship between a reduced feature CIMP signature and TCGA methylation clusters
We assessed the relationship between the 13-feature CGI signature developed using MethylCap and TCGA methylation clusters [24]. The 13-feature CGI signature (Fig 3B) captures most of the most highly methylated TCGA tumors (those assigned to methylation cluster 1, MC1) with a threshold beta-value of 0.4. All tumors assigned to methylation clusters 3 and 4 (25 and 56 tumors, respectively) have beta-values <0.4 (Fig 4B). The ≥0.4 value, which we consider a marker for high CIMP (CIMP-H), also excludes one of the MethylCap-seq CGI-H tumors we profiled with the Infinium HumanMethylation 450 platform (Fig 3C). When the

Discussion
Aberrant DNA methylation is a feature of most cancers and can be an early event in tumorigenesis [33-36]. There is tremendous variability in both the extent and patterns of methylation across and within cancer types, and profiling methylation has increasingly become part of the molecular phenotyping for tumors. DNA methylation is an attractive biomarker with potential diagnostic, prognostic, and therapeutic applications [3,18,37-39].
Although a CIMP has been defined in a variety of tumor types, there are a limited number of studies that have leveraged genome-wide methylome profiling techniques to examine CIMP in endometrial cancer [24,40,41]. We combined measurement of methylation over CGI regions (MethylCap-seq) which defines large-scale methylation differences with Infinium data, which relies on a smaller number of data points to generate a signature for global differences in methylation that is based on a small number of features. By doing so, we leveraged the increased CpG coverage of enrichment-based methylation profiling to determine which of the smaller number of features best capture large increases in methylation over a CGI region.
Our analysis was based on the premise that a CIMP could be identified based on aggregate methylation, which has not typically been used to define CIMP markers. We demonstrated that the approach for identifying CIMP based on aggregate methylation shows general agreement with a clustering-based approach (Fig 4B), and furthermore show that the methylation score yielded by our signature reflects aggregate CpG island methylation (Fig 4A). In addition, most tumors show more aggregate CGI methylation than normal controls (Figs 1, 4C and 4D), suggesting that promoter methylation is more prevalent in endometrioid endometrial cancer than previously thought.
The aggregate methylation analysis approach that we took is unlike unsupervised clustering methods used in many genome-wide/global methylation studies. MethylCap-seq methods do not require complex data normalization and correction for batch effects necessary for clustering, but do require rigorous quality assurance to avoid technical bias for poor CpG enrichment (elimination of cases with very low levels of CpG methylation and evaluation of mitochondrial methylation). By excluding tumors with very low CpG methylation (presumed to be poor capture of methylated fractions) from analyses, there is the possibility that we fail to consider samples that do indeed have very low CGI methylation (significantly below that of normal tissue). An obvious implication of the bias towards increased methylation is that should a subset of endometrial cancers have a "CpG island hypomethylator phenotype", they would likely go undetected. The validity of our approach for identifying CIMP in endometrioid endometrial cancer was best evidenced by the strong correlation between what we measured as aggregate promoter CGI methylation in the TCGA data set, and the previously assigned TCGA methylation clusters (Fig 4A). The TCGA data set was not only for a completely different set of tumors, but also relied on an entirely different platform for measuring methylation.
Methylation in normal tissues is highly tissue-specific, and it is not surprising that tumor-specific methylation abnormalities tend to be related to cell of origin [42]. Given the specificity of methylation in normal tissues, it follows that the markers used to define CIMP vary from one cancer type to another. Broad changes in DNA methylation are shared by many tumor types, as are a range of sequence/locus-specific changes, but these general methylation abnormalities are not markers for tissue-type CIMP. CGIs for three genes known to be methylated in other tumor types are part of our 13-region signature for endometrioid endometrial cancer: EPHX3 (ABHD9), FGF12, and ASCL1. Methylation of EPHX3 is seen in primary prostate cancers [43]. Methylation of FGF12 has been reported in colorectal tumors but not in matched controls [44], as has methylation of ASCL1 [42]. It is appealing to suggest that ASCL1 and FGF12 methylation might have diagnostic potential (ability to discriminate between tumor and normal tissues) in both colorectal and endometrial cancer or reflect similarities in the biology underlying these cancer types.
Our study corroborates and expands on the methylation profiling and cluster analysis performed by the TCGA endometrial cancer consortium. The CIMP developed by TCGA was based on Infinium methylation data (Infinium HumanMethylation450 platform) that includes 113,521 probes from CpG islands (average of 7 probes per CGI). These promoter CGIs have an average length of 904bp and 84 CpGs per island; therefore the Infinium platform measures methylation of 8% of CpGs in promoter CGIs. Although it is widely accepted that methylation of Infinium probes is representative of regional methylation, this may not be the case for all tissues or tumor types. Similarly, the small number of probes per region may not reflect the region as a whole in tumors with profound dysregulation of genome-wide methylation patterns. Our methylation signature is based on genome-wide promoter CGI data collected using MethylCap-seq, and the agreement of our methylation score data with the clusters in the TCGA Consortium study validates their method as well as our own. Our CIMP classification based on the 13 loci has an additional advantage in that it requires measuring methylation of only 82 CpGs relative to the large number of probes from across the genome used for clustering in the TCGA Consortium study. Such a 13-feature signature could easily be formatted for low cost high-throughput analysis. The methylation threshold of 0.4 that we established for identifying CIMP endometrial tumors could be used to dichotomize the methylation state. Our data, however, suggest that CIMP in endometrioid endometrial cancer could be viewed as a continuum rather than as a discrete phenomenon (Figs 1B and 4C). We suggest that a score threshold of 0.4 for our 13 features distinguishes CIMP-H tumors in the TCGA data set of 203 endometrioid tumors, but analysis of additional endometrial cancer methylation data sets is warranted.
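The probe coverage figure and the proposed dichotomization can be sketched in Python as follows. The names are hypothetical, the example scores are invented for illustration, and the cut-off is assumed to be inclusive (a score of 0.4 or more counts as CIMP-H), which the text does not spell out.

```python
CIMP_H_THRESHOLD = 0.4  # score cut-off proposed in the text


def is_cimp_high(score, threshold=CIMP_H_THRESHOLD):
    """Dichotomize a 13-CGI methylation score.

    Assumption: the cut-off is inclusive (score >= threshold).
    """
    return score >= threshold


# Fraction of promoter CGI CpGs assayed by the array, using the averages
# quoted in the text (7 probes and 84 CpGs per island).
probes_per_cgi = 7
cpgs_per_cgi = 84
coverage = probes_per_cgi / cpgs_per_cgi  # about 0.083, i.e. roughly 8%

# Hypothetical scores for three tumors.
calls = [is_cimp_high(s) for s in (0.27, 0.40, 0.52)]
```

Because the scores appear to form a continuum, a single cut-off of this kind is best read as a convenience for reporting rather than as evidence of two discrete tumor classes.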
It is important to note that methylation of the 13 CpG islands included in our methylation signature is correlated with broad gains in CpG island methylation in both our data and in the TCGA data set. The significance of an assigned methylation score lies in this underlying correlation, not in the methylation of the individual islands. Methylation of an arbitrary set of islands could be useful for diagnosis of cancer and could predict response to treatment, but may not in and of itself indicate an underlying methylator phenotype.

Conclusion
In summary, we used two methylome profiling techniques to stratify tumors by overall promoter CGI methylation, identified a signature to reproduce this stratification, and verified that classification of tumors using this signature reproduced known characteristics of CIMP tumors (e.g., the association with MSI). More generally, we demonstrated an approach for translating methylome profiling findings to the Infinium platform, which will become increasingly important as more publicly available methylation datasets become available and the associated clinical data mature. Our analyses suggest that widespread promoter methylation is more prevalent in endometrioid endometrial cancer than previously appreciated, and that promoter methylation could be a useful marker for distinguishing tumors and normal tissue.

Comparison of original methylation signature methylation levels with values for replicate signatures in TCGA data. Two hundred and three endometrioid endometrial tumors from TCGA were indexed using the average beta-value of all regions in the signature, and relative index values between replicates were compared by plotting as a normalized log2 transformed heatmap. Samples were ranked by the original signature index (O) for visual comparison. Statistical comparison of rank correlation vs. the original signature was performed using a Spearman test (r = 0.82, 0.89 for replicates and p<0.001; r = 0.14 for NC and p>0.01). R1: replicate signature 1, R2: replicate signature 2, NC: negative control. (TIF)