Breast Cancer DNA Methylation Profiles Are Associated with Tumor Size and Alcohol and Folate Intake

Although tumor size and lymph node involvement are the current cornerstones of breast cancer prognosis, they have not been extensively explored in relation to tumor methylation attributes in conjunction with other tumor and patient dietary and hormonal characteristics. Using primary breast tumors from 162 (AJCC stage I–IV) women from the Kaiser Division of Research Pathways Study and the Illumina GoldenGate methylation bead-array platform, we measured 1,413 autosomal CpG loci associated with 773 cancer-related genes and validated select CpG loci with Sequenom EpiTYPER. Tumor grade, size, estrogen and progesterone receptor status, and triple negative status were significantly (Q-values <0.05) associated with altered methylation of 209, 74, 183, 69, and 130 loci, respectively. Unsupervised clustering, using a recursively partitioned mixture model (RPMM), of all autosomal CpG loci revealed eight distinct methylation classes. Methylation class membership was significantly associated with patient race (P<0.02) and tumor size (P<0.001) in univariate tests. Using multinomial logistic regression to adjust for potential confounders, patient age and tumor size, as well as known disease risk factors of alcohol intake and total dietary folate, were all significantly (P<0.0001) associated with methylation class membership. Breast cancer prognostic characteristics and risk-related exposures appear to be associated with gene-specific tumor methylation, as well as overall methylation patterns.


Introduction
Breast cancer is the most common non-skin cancer among American women. The American Cancer Society's estimates indicate approximately 1.3 million new cases of invasive breast cancer were diagnosed globally in 2007; and nearly 500,000 women died from the disease [1]. Currently, there are over 2.5 million breast cancer survivors in the US, and an estimated $8.1 billion dollars is spent each year on treatment of breast cancer [2].
The principal prognostic indicator currently in clinical use for breast cancer is the tumor-node-metastasis (TNM) stage [3,4]. Morphological attributes of malignant tumors that influence disease prognosis are the size of the primary tumor (T), presence and extent of regional lymph node involvement (N) and presence of distant metastases (M). Molecular attributes of tumors are also considered in clinical decision-making; loss of hormone receptor expression [5] and increased expression of ERBB2 [6] have each been associated with poor prognosis. Although numerous recent studies have demonstrated that alterations of DNA methylation in breast cancers are common and may be important etiologic and prognostic markers [7][8][9][10][11][12][13][14], large gaps in our knowledge remain. There is a notable lack of studies examining tumor DNA methylation in relation to breast cancer risk factors such as diet or reproductive factors in conjunction with other important tumor markers. Patient exposures such as alcohol and folate intake have potentially strong mechanistic links to epigenetic dysregulation [15]. In addition, recent work in-vitro and in animal models suggest that long term exposure to estrogen may lead to epigenetic effects and altered profiles of DNA methylation [16,17]. To explore associations of tumor methylation with important tumor and patient characteristics, we analyzed tumors from breast cancer patients in the Kaiser Permanente Division of Research Pathways Study using a large scale methylation array. Table 1 shows the patient demographic, hormonal, dietary and tumor characteristics for the 162 women overall (and stratified by menopausal status in Table S1). Results of unsupervised hierarchical clustering of the 750 most variable CpG loci indicate the epigenetic heterogeneity of these tumors ( Figure 1).

Unsupervised clustering and locus-by-locus analysis
In array-wide locus-by-locus analysis the strongest associations of methylation of individual loci (Q-values ,0.05) were observed for tumor grade (loci n = 209), tumor size (loci n = 74), estrogen receptor status (loci n = 183), progesterone receptor status (loci n = 69), and triple negative status (tumors negative for both estrogen and progesterone receptors as well as ERBB2; loci n = 130; Table S2). Together with tumor size, patient lymph node status is used in tumor staging. Among five CpG loci whose methylation was significantly associated (Q,0.05) with lymph node status, four (two in COL1A2, and one each in LOX and P2RX7) were also associated with tumor size (Q,0.05). Additionally, there was a trend of increased methylation associated with increased tumor size: for all 74 CpG loci that were significantly associated with tumor size (Q,0.05) methylation increased with larger tumor size. Similarly, all five CpGs associated with diseasepositive lymph nodes had increased methylation in tumors in women with disease-positive lymph nodes. Details of locus-bylocus analyses for tumor grade, size, hormone receptor, and triple negative status (loci with Q,0.05) are given in Table S3.

Array validation
Methylation array validation was performed at CpGs with highly ranked associations from locus-by-locus analysis. The array CpG whose methylation was most significantly increased with increasing tumor stage was in the FES gene (Table S3) and array methylation was significantly correlated with Sequenom methylation (rho = 0.68, P = 1.1E-12, n = 85; Figure 2A). Promoter CpGs in P2RX7 and HSD17B12 had significantly increased methylation (Q,0.0001, and Q = 0.01 respectively) with increasing tumor size (Table S3) and array methylation at these CpGs were significantly correlated with Sequenom methylation (P2RX7; rho = 0.65, P = 8.6E-12, n = 88; HSD17B12; rho = 0.34, P = 5.4E-05, n = 137; Figure 2B and 2C). A promoter CpG in GSTM2 had significantly increased methylation with increasing tumor grade Recognizing the increasing use of pre-operative chemotherapy for patients with operable, early-stage disease, there is added complexity in breast cancer staging. Since chemotherapy can considerably decrease tumor size, it is still unclear whether pre-operative or post-operative stage best informs prognosis and treatment decisions for patients electing pre-operative chemotherapy. However, our data clearly illustrate the promise of tumor DNA methylation for augmenting tumor staging and can be attained with minimal tissue in a pre-operative context. (Table S3) and array methylation was significantly correlated with Sequenom methylation (rho = 0.83, P,2.2E-16, n = 140; Figure 2D). Additionally, in all cases, Sequenom methylation values were significantly associated with respective covariates; tumor stage with FES methylation (P = 0.05), tumor size with P2RX7 (P,0.005) and HSD17B12 methylation (P,0.02), and tumor grade with GSTM2 methylation (P,0.001). Furthermore, relative mRNA expression of GSTM2 was significantly decreased among tumors with high array methylation at both CpGs associated with tumor grade (P,0.001 and P,0.03, Figure S1).

Clustering of DNA methylation patterns with RPMM
In order to explore overall methylation profiles of these tumors and their potential relationships with patient demographic, tumor and exposure characteristics we applied a modified model-based form of unsupervised clustering known as recursively partitioned mixture modeling (RPMM) [18]. The RPMM resulted in the eight methylation classes (average methylation profiles shown in Figure 3). Patient race was significantly associated with methylation class membership (P = 0.015, Table 2), with the majority of African Americans (54%) residing in class 2, and 40% of Hispanic cases residing in class 4. An association between methylation class membership and alcohol consumption approached statistical significance (P = 0.07, ever vs. never drinker, Table 2). Both supplemental folic acid intake (mg/day) and total dietary folate (mg/day) had associations with methylation class membership that approached statistical significance (P = 0.06 and P = 0.08 respectively; Table 2). For both folate variables, cases in methylation class 4 had the lowest intake and cases in methylation class 6 had the highest intake. Of the tumor characteristic variables, only tumor size was significantly associated with overall methylation profile (P = 0.0006, Table 2).  Trends of DNA methylation related to alcohol and folate intake Associations between alcohol intake and dietary folate and methylation class membership approached statistical significance. While methylation of only one CpG locus (in IL17RB) was significantly associated with folate intake in locus-by-locus tests (Q,0.05), regression coefficients from univariate locus-by-locus analysis plotted against their respective P-values revealed trends in the pattern of methylation for both alcohol and folate intake. Figure 4A illustrates the strong trend for patients with increasing alcohol intake to have negative regression coefficients, indicative of decreased methylation. In contrast, the trend for patients with increasing total dietary folate shows a strong shift to positive regression coefficients, indicative of increased methylation ( Figure 4B).

Multivariate modeling of RPMM classes
The relationships between methylation classes and several covariates of interest were then modeled together using multinomial logistic regression in order to adjust for other factors in the model. Patient age, alcohol consumption, total dietary folate, and tumor size were each strongly associated with methylation class membership when controlling for all modeled variables (all Wald P-values ,0.0001) and complete model details are given in Table   S4. Figure 5 displays an illustration of the model results for covariates significantly associated with methylation classes. As alcohol consumption increased, there was an increased probability of cases residing in methylation classes 3 and 8, and a concomitant decrease in the probability of cases residing in classes 2 and 4 ( Figure 5B). Increasing total dietary folate intake imparted a striking increase in the probability of membership in class 6, and a decreased probability of class membership in classes 1, 3, 4, and 7 ( Figure 5C). The strong association between tumor size and methylation class membership remained after controlling for potential confounders, with the probability of patients being in class 2 increasing from about 20% to about 60% across the span of tumor size from 0 mm to 80+mm ( Figure 5D). Accompanying this trend for tumor size were simultaneous decreases in the probability of cases with increasingly large tumors residing in classes 1 and 5-8, while tumor size had less influence on the probability for residing in classes 3 or 4 ( Figure 5D).

Hormone receptor status among postmenopausal cases
Although neither estrogen nor progesterone receptor status were significantly associated with RPMM methylation profiles, large numbers of specific CpG loci had significant methylation associations with these tumor characteristics in locus-by-locus analysis (Table S2 and Table S3). Compared to the overall population of women diagnosed with breast cancer in the Kiaser Permanente Northern California cancer registry from 200-2009, this surgical cohort has a higher prevalence of hormone receptor positivity (78% overall vs. 88% here), particularly among premenopausal women's tumors (74% overall vs. 95% here). We therefore stratified on menopausal status, running RPMM on methylation data from post menopausal patients' tumors only (n = 117). This model resulted in eleven methylation classes ( Figure  S2) and methylation class membership was significantly associated with estrogen receptor status (P,0.03), and the association for triple negative tumors approached significance (P = 0.07) detailed results available in Table S5.

Discussion
It is becoming increasingly common to include data on molecular alterations from patient tumor samples into routine clinical practice as a means of improving prognosis and evaluating the predictive power of alterations of interest. As technology improves and population-based studies and clinical trials are conducted, medicine is being ushered into a new era of molecular characterization of disease. Tumor-node-metastasis (TNM) stage is the current prognostic indicator for breast cancer, though several clinical trials are currently under way to investigate the utility of molecular markers [19], and as more patients elect neoadjuvant therapy (specifically pre-operative chemotherapy), improved clinical staging and additional staging tools are poised to have great impact. Most current studies and one commercially available tool (Oncotype DX) are focused on gene expression markers, though the inherent instability of mRNA may make implementation of these strategies challenging outside of major surgical centers or centralized commercial laboratories. In contrast, DNA methylation is a stable mechanism of control of transcription, and the stability of DNA makes it an attractive target for accurate and reproducible assessment. Here we reported that tumor size, a cornerstone of breast cancer prognosis, is associated with tumor DNA methylation profile. In addition, we found that alcohol and folate intake, exposures related to disease risk, are independently associated with tumor DNA methylation profiles. This work sheds light on the relationship between important etiologic exposures and molecular subclasses of disease, extends the evidence for the utility of molecular characterization in tumor staging, and can be accomplished with minimal tissue in a pre-operative context.
The recently updated American Joint Committee on Cancer (AJCC) staging manual for breast cancer does not include additional molecular markers, though the committee acknowledged their consideration of markers such as hormone receptor status and stated that TNM staging ''may play increasingly less important roles than understanding the biology of the cancer'' [4]. Examining TNM variables we found that overall DNA methylation profile and methylation alterations in dozens of individual CpG loci were significantly associated with tumor size (all increased methylation). In contrast, methylation alterations of only five CpG loci (two in COL1A2, and one each in FAS, LOX, and P2RX7) were significantly associated with disease-positive lymph nodes. However, methylation of four of five lymph-nodepositive associated CpGs (excepting FAS) were also significantly associated with tumor size, suggesting that these phenotypes are mechanistically related, and at least in part manifest via epigenetic alterations. As FAS encodes a TNF-receptor involved in regulating apoptosis it is not surprising that methylationinduced silencing of this receptor is associated with diseasepositive lymph node status. In addition, hypermethylation of COL1A2 (collagen type I, alpha 2) has been associated with both proliferation and migration activity in bladder cancer [20], LOX     is involved in the control of normal collagen deposition [21], and P2RX7 loss has been linked to morphologic changes in stroma related to altered collagen fibril alignment [22]. Collectively these data suggest that perturbations in collagen and collagen-related genes promote tumor growth and invasion, perhaps by altering the architecture of connective tissues in the tumor microenvironment. In support of this hypothesis, recent work in a mouse model has shown that altered mammary stromal tissue collagen expression significantly increases tumor formation and invasiveness potential [23]. Additionally, Chernov et al. showned that epigenetic alterations in collagen and collagen-related genes allows the deposition of an invasion-promoting collagen matrix in both breast and brain tumor cell lines [24]. The primary objective of TNM staging is to provide a standard prognosis nomenclature for patient care [4], and our results suggest that methylation markers may be a robust proxy for tumor size. Importantly, broader application of neoadjuvant therapy complicates breast cancer staging since chemotherapy can considerably decrease tumor size prior to surgical treatment, and it is still unclear whether clinical or pathologic stage best informs prognosis and treatment decisions [19]. The AJCC has added methodology (yc or ypTNM) for differentiating clinical and pathologic staging; in part, this is from recognition of the increasing use of neoadjuvant therapy for patients with operable, early stage disease [4,25,26]. Our data illustrate the promise of tumor DNA methylation for augmenting tumor staging. However, additional study of the relationship between tumor methylation and size in both pretreatment and postoperative samples is necessary. Specifically, the value of methylation to act as an additional marker of size in the neoadjuvant setting should be evaluated in future studies that compare both imaging and pathologically based size determination. In order to evaluate the predictive power of DNA methylation profiles and individual loci for disease prognosis and recurrence, these patients continue to be followed for these events. Associations between DNA methylation and patient survival have been reported for individual genes such as GSTP1 and PITX [7,8,10], though overall DNA methylation profiles, or patterns of methylation at selected CpG loci or genes, may improve predictive power. Well recognized molecular subtypes of breast cancer such as hormone receptor negative and ERBB2 over-expressing tumors are known to be associated with reduced survival [27], and it will be necessary to extensively examine methylation markers stratified by commonly used molecular tumor markers. However, we did not find significant associations between ERBB2 status and CpG methylation in our analysis. Nonetheless, other well recognized molecular subtype markers; estrogen receptor, progesterone receptor, and triple negative status were among the covariates with the highest number of significant CpGs from array-wide locus-by-locus analysis. However, hormone receptor status and triple negativity were not associated with methylation profile when modeling all cases. Premenopausal patients' tumors in our surgical cohort had a higher prevalence of hormone receptor positivity compared to the overall population of premenopausal patients diagnosed with breast cancer. In order to address the potential bias this introduced we modeled the methylation profiles of postmenopausal patients' tumors separately and found a significant association between estrogen receptor status and methylation class. Additional study will be needed to better understand the role of hormone receptor and growth factor receptor expression in these tumors as they relate to methylation profile in the context of a patient's menopausal status.
We found significant, independent associations between both alcohol and folate intake and overall tumor DNA methylation profiles when controlling for potential confounders. Folate is a B vitamin that donates its methyl group for homocysteine remethylation to methionine as part of one-carbon metabolism. In turn, methionine is the methyl donor for DNA methylation via Sadenosyl methionine. However, alcohol is known to interfere with folate absorption in the intestine and hepatic release of folate, and hence, supply to tissues [28]. In fact, strong evidence of an etiologic role for alcohol in breast cancer has been reported in multiple meta-analyses of prospective and case-control studies with an excess risk for each alcoholic drink per day of about 10% [29,30]. In contrast, meta-analysis of prospective studies has not provided clear support for an overall protective association between folate intake and breast cancer risk [31]. Yet, metaanalysis of case control studies of dietary folate, including results from the Shanghai Breast Cancer Study (whose participants are not regular alcohol drinkers) generally support a protective role for folate [31,32].
While there have been numerous studies of alcohol and folate in relation to risk of breast cancer, investigations of the relationship between these exposures and epigenetic alterations in tumors themselves are scarce. Tao et al. reported that the prevalence of breast tumor methylation at CDKN2A, CDH1, and RARB did not differ by folate intake or lifetime alcohol consumption in genotype strata of one-carbon metabolism enzymes methylenetetrahydrofolate reductase (MTHFR) and methionine synthase (MTR) [33]. Consistent with these findings (and perhaps the lack of similar null results in the literature), we too did not find associations between alcohol or folate and methylation of CpG loci in CDKN2A, CDH1, and RARB. Further, after correcting for multiple comparisons, no CpG loci had significant alcohol-related methylation, and only one CpG locus (in the IL17RB promoter) was associated with folate intake. Alone, these results suggested that folate and alcohol intake do not influence tumor DNA methylation. However, plots of regression coefficients indicated strong independent trends for increased folate and reduced alcohol intake associations with increased CpG methylation. Since global, low-level effects of alcohol and folate intake on CpG methylation may not be detectable at individual CpGs in a genome-wide context, we examined the global relationships between alcohol or folate intake and DNA methylation using RPMM methylation classes. Modeling both exposures together revealed highly significant, independent associations between alcohol and folate and DNA methylation profile. Another human cancer for which alcohol is an important etiologic factor is head and neck squamous cell carcinoma, and previous work from our group demonstrated a similar relationship between DNA methylation profiles of these tumors and alcohol consumption [34]. Taken together with the weak mutagenic potential of alcohol [35], these results suggest that a major carcinogenic mechanism of action of alcohol is interference with epigenetic regulation through disruption of one-carbon metabolism.
In summary, we found tumor DNA methylation associated with tumor characteristics predictive of prognosis, and DNA methylation and patient exposures known to be related to disease risk. Additional study is needed to determine the prognostic value of DNA methylation markers. However, the potential clinical utility of tumor-size-related DNA methylation is apparent.

Study population
The Pathways Study is a prospective cohort study of breast cancer survival actively recruiting women diagnosed with invasive breast cancer from the Kaiser Permanente Northern California (KPNC) patient population since January 2006. Further study details are provided elsewhere [36]. Written informed consent is obtained from all participants before they are enrolled in the study. The study was approved by the IRB of KPNC and all collaborating sites.

Demographic, hormonal, and dietary factors
During the in-person baseline interview, participants were asked detailed information on family history of cancer and reproductive history, including: age at first full-term pregnancy, number of biological children, breastfeeding, and menopausal status. Additional information was collected on smoking, alcohol use, hormone use (oral contraceptives, hormone replacement therapy), and demographics (age at breast cancer diagnosis, race/ethnicity, household income, education). Self-reported height and weight around diagnosis was obtained to calculate body mass index (BMI, kg/m 2 ). Any missing values were supplemented by concurrent information from KPNC electronic medical records.

Tumor characteristics
Data on estrogen and progesterone receptor status and ERBB2 expression were obtained from the KPNC Cancer Registry [37]. Tumor size was measured in a uniform manner by participating study pathologists. Data are collected, coded, and added to the KPNC registry approximately four months post-diagnosis to allow for the completion of treatment. For all breast surgical specimens, hormone receptor status and ERBB2 expression is routinely determined by IHC at the KPNC regional IHC lab, and if the IHC staining for ERBB2 expression is equivocal (less than 30% strong staining, but more than 10% weak staining), by fluorescence in situ hybridization at the KPNC regional cytogenetics lab.

Study samples
162 tumor specimens from the initial diagnostic biopsy were obtained from the KPNC tumor biorepository for methylation analysis. All tumor specimens were from patients who did not receive neoadjuvant chemotherapy.

Methylation analysis
FFPE tissue DNA was extracted using the QIAamp DNA mini kit according to the manufacturer's protocol (Qiagen, Valencia, CA). DNA was treated with sodium bisulfite to convert unmethylated cytosines to uracil using the EZ DNA Methylation Kit (Zymo Research, Orange, CA) according to the manufacturer's protocol. Illumina GoldenGate methylation bead arrays were used to simultaneously interrogate 1505 CpG loci associated with 803 cancer-related genes. Bead arrays have a similar sensitivity as quantitative methylation-specific PCR and were run at the UCSF Institute for Human Genetics, Genomics Core Facility according to the manufacturer's protocol and as described by Bibikova et al [38]. GoldenGate array methylation data are publicly available on the Gene Expression Omnibus archive, accession GSE22290.

Array methylation validation by Sequenom EpiTYPER mass spectroscopy
Array methylation was validated with Sequenom EpiTYPER base-specific cleavage and MALDI-TOF MS of bisulfite treated DNA [39]. EpiTYPER assays were designed for CpG loci both with significant associations between methylation and tumor characteristic variables as well as a high standard deviation of methylation values across samples. One assay (for COL1A2) failed the design process. Samples were processed at the UCSF Institute for Human Genetics, Genomics Core Facility. Briefly, PCR with primers located on either side of the CpG sites of interest are transcribed into an RNA transcript and cleaved base specifically. The cleavage products are analyzed by MALDI-TOF MS, and a characteristic mass signal pattern that distinguishes methylcytosine from thymine is obtained.

Gene expression by RT-PCR
Messenger RNA expression was measured using RT-PCR with preamplification using a validated approach [40]. RNA extraction was performed using the RecoverAll (Ambion), with a 16 hour tissue digestion and yields were determined using a Nanodrop spectrophotometer. Samples were concentration-normalized and reverse-transcribed with iScript cDNA synthesis kit (BioRad). Following cDNA synthesis, we performed linear, gene specific preamplification of samples and controls using the TaqMan preamp protocol (Applied Biosystems). Relative expression was measured using a HT7900 real time PCR instrument (Applied Biosystems).

Statistical analysis
Data assembly. Data were assembled with BeadStudio methylation software from Illumina (SanDiego, CA). All array data points are represented by fluorescent signals from both methylated (Cy5) and unmethylated (Cy3) alleles, and methylation level is given by b = (max(Cy5, 0))/(|Cy3|+|Cy5|+100), the average methylation (b) value is derived from the ,30 replicate methylation measurements. Raw average b values were analyzed without normalization as recommended by Illumina. At each locus for each sample the detection P-value was used to determine sample performance; all samples, had detection P-values ,1e-5 at more than 75% of CpG loci and passed performance criteria. CpG loci with a median detection P-value .0.05 (n = 8, 0.5%), were eliminated from analysis. All CpG loci on the X chromosome were excluded from analysis. The final dataset contained 1413 CpG loci associated with 773 genes.
Unsupervised clustering. Subsequent analyses were carried out using the R software [41]. For exploratory and visualization purposes, hierarchical clustering was performed using R function hclust with Manhattan metric and average linkage. To discern and describe the relationships between CpG methylation and patient and tumor covariates a modified model-based form of unsupervised clustering known as recursively partitioned mixture modeling (RPMM) was used as described in [18] and as used in [42]. Permutation tests (running 10,000 permutations) were used to test for association with methylation class by generating a distribution of the test statistic for the null distribution for comparison to the observed distribution. For continuous variables, the permutation test was run with the Kruskal-Wallis test statistic. For categorical variables we used the standard chisquare statistic for testing association between two categorical variables.
Locus-by-locus analysis. Associations between covariates and methylation at individual CpG loci were tested with a generalized linear model. The b-distribution of average b values was accounted for with a quasi-binomial logit link with an estimated scale parameter constraining the mean between 0 and 1, in a manner similar to that described by Hsuing et al. [43]. Arraywide scanning for CpG loci associations with sample type or covariate used false discovery rate estimation and Q-values computed by the qvalue package in R [44].
Multinomial logistic regression. Multinomial logistic regression was used to model methylation class while controlling for potential confounders. Referent class selection does not affect the underlying interpretation of the model and as class three was neither the largest, nor the smallest class, and was relatively hypomethylated it was chosen as the referent class. Because of the potentially large number of methylation classes, logistic regression coefficients were regularized using a ridge (L2) penalty, with coefficients for a common (non-intercept) covariate across outcome levels shrunk toward zero [34] the tuning parameter was selected by minimizing Bayesian information criterion.
Sequenom EpiTYPER methylation and RT-PCR. Spearman correlation coefficients and test P-values are reported for correlation between array and Sequenom methylation values. Tests for association between methylation and mRNA expression used relative mRNA expression versus array methylation average b stratified into two groups around 0.5 with the Kruskal-Wallis test statistic.