Epigenotyping in Peripheral Blood Cell DNA and Breast Cancer Risk: A Proof of Principle Study

Background Epigenetic changes are emerging as one of the most important events in carcinogenesis. Two alterations in the pattern of DNA methylation in breast cancer (BC) have been previously reported: active estrogen receptor-α (ER-α) is associated with decreased methylation of ER-α target (ERT) genes, and polycomb group target (PCGT) genes are more likely than other genes to have promoter DNA hypermethylation in cancer. However, it is unclear whether DNA methylation in normal, unrelated cells is associated with BC risk and whether these imprints can be related to factors that can be modified by the environment. Methodology/Principal Findings Using quantitative methylation analysis in a case-control study (n = 1,083), we found that DNA methylation of peripheral blood cell DNA provides good prediction of BC risk. We also report that invasive ductal and invasive lobular BC are characterized by two different sets of genes, the latter in particular by genes involved in the differentiation of the mesenchyme (PITX2, TITF1, GDNF and MYOD1). Finally, we demonstrate that only ERT genes predict ER-positive BC; lack of peripheral blood cell DNA methylation of ZNF217 predicted BC independent of age and family history (odds ratio 1.49; 95% confidence interval 1.12–1.97; P = 0.006) and was associated with ER-α bioactivity in the corresponding serum. Conclusion/Significance This first large-scale epigenotyping study demonstrates that DNA methylation may serve as a link between the environment and the genome. Factors that can be modulated by the environment (like estrogens) leave an imprint in the DNA of cells that are unrelated to the target organ and indicate the predisposition to develop a cancer. Further research will need to demonstrate whether DNA methylation profiles will be able to serve as a new tool to predict the risk of developing chronic diseases with sufficient accuracy to guide preventive measures.


Introduction
Each year, more than 1.15 million new cases of breast cancer are diagnosed worldwide [1]. Currently the Gail model, which is based on epidemiological risk factors, is the best available tool for predicting the overall risk of invasive breast cancer; its concordance statistic for correctly classifying women with respect to this outcome is 58-59% [2].
Epigenetic changes, in particular DNA methylation, are emerging as one of the most important events in carcinogenesis [3][4][5][6]. Recently, an increasing body of evidence from animal studies demonstrated that environmental factors can result in epigenetic modifications and subsequent changes in the risk of developing disease [7,8]. There is preliminary evidence from small studies that peripheral blood cell DNA contains epigenetic information, which is a valuable predictive marker of an individual's risk of developing cancer [9,10,11]. Only the largest of these studies, involving 172 individuals and analyzing one locus, demonstrated that epigenetic alterations are able to predict the risk of colorectal cancer independently of family history [9].
We have undertaken a large-scale study to test the hypothesis that methylation in peripheral blood cell DNA may be a predictor of breast cancer risk. Recently, two alterations in the pattern of DNA methylation in breast cancer have been described. First, active estrogen receptor-α (ER-α), which reflects breast cancer risk, is associated with decreased methylation of ER-α target genes (ERT) [12,13]. Second, polycomb group target genes (PCGT), which play a key role in stem cell biology, are more likely to have promoter DNA hypermethylation in cancer than other genes [14,15,16]. These observations led to our hypothesis that the pattern of peripheral blood cell DNA methylation of ERT genes and PCGT genes is an important predictor of breast cancer risk. Based on our previous data we hypothesized that, as a function of time and dose, cumulative lifetime estrogen exposure leaves an epigenetic signature in peripheral white blood cell DNA (depending on a woman's genetic and lifestyle background), which is associated with the postmenopausal risk of developing breast cancer. Physiologically, aging and the decrease of estrogens after menopause may lead to methylation of ERT genes. Lack of methylation, however, may indicate long-term high estrogen exposure; the expectation would therefore be that ERT genes have a lower frequency of methylation in peripheral blood cell DNA of women with breast cancer compared to controls. Furthermore, embryonic stem cells rely on polycomb group proteins to reversibly repress genes required for differentiation. We recently reported that stem cell polycomb group targets are up to 12-fold more likely to have cancer-specific promoter DNA hypermethylation [14], supporting a stem cell origin of cancer where reversible gene expression is replaced by permanent silencing, locking the cell into a perpetual state of self-renewal that predisposes to subsequent malignant transformation.
Based on this, our expectation would be that PCGT genes have a higher frequency of methylation in peripheral blood cell DNA of women with breast cancer compared to controls.
In this study we have demonstrated that methylation in peripheral blood cell DNA is related to breast cancer risk. This observation opens opportunities for progress in risk assessment and prevention of breast cancer and is a model for investigation in other chronic diseases.

Results and Discussion
In order to test our hypothesis that peripheral blood cell DNA methylation is a predictor of breast cancer risk, we took a three-step approach. In step 1 a total of 49 genes were selected as shown in Table 1. These include (a) 19 ERT genes, previously reported to be associated with decreased DNA methylation [12,13], (b) 4 genes which are not known to be ER-α target genes, but have been demonstrated to be differently methylated depending on hormone receptor status (DMHR) [12], (c) 20 PCGT genes, which play a key role in stem cell biology and are more likely to have promoter DNA hypermethylation in cancer [14,15,16] and (d) 6 genes which are known to be methylated in breast cancer (MBC) [12] and are not members of the other three groups.
In step 2 we used MethyLight [17] to analyze the 49 genes (Table 1; for additional information see Table S1a-b) in peripheral blood cell DNA of 83 healthy postmenopausal women (participating in UKOPS; see Materials and Methods) blinded for any history of cancer. DNA methylation at each locus (expressed as percentage of methylated reference; PMR) was divided into methylated (PMR > 0) and unmethylated (PMR = 0), given that methylation was present only in a minority of women for most genes. Based on the hypothesis that lack of methylation of ERT (and DMHR) genes and methylation of PCGT genes predict breast cancer risk, we selected for further analysis in the case-control study (step 3) those ERT loci with a higher frequency of methylation and those PCGT loci with a lower frequency of methylation in the healthy postmenopausal women analyzed. All 4 DMHR and the 2 MBC gene loci with a lower frequency of methylation in the healthy individuals were also selected for further analysis. On this basis a total of 25 genes were further analyzed in a case-control population (indicated with an asterisk in Table 1).
Table 1 legend: Gene loci (either one locus or two separate loci of the same gene) were analyzed and were selected based on their frequency of methylation in peripheral blood cell DNA in the general population of 83 healthy postmenopausal women. Genes that were selected for further analysis in the case/control study are indicated with an asterisk. ERT = estrogen receptor-α target; DMHR = differently methylated depending on hormone receptor status; PCGT = stem cell polycomb group target; MBC = methylated in breast cancer. doi:10.1371/journal.pone.0002656.t001
Step 3 involved a carefully designed case-control study of 1,083 samples from healthy women and breast cancer patients provided by the ESTHER study (see Materials and Methods). 353 cases and 730 age-matched controls, recruited in a state-wide study in Saarland, Germany, were included. Cases were more likely to have first-degree relatives with breast cancer (odds ratio (OR) 1.90; 95% confidence interval (CI) 1.27-2.85; P = 0.002). In addition, there was a trend towards increased breast cancer risk for women with late menopause (50+ vs. <50 yrs: OR 1.31; 95% CI 0.98-1.75; P = 0.10). There was no significant difference between cases and controls regarding other features (Table 2). Results were essentially unchanged when all continuous covariates were entered as such in the model. A multiple logistic regression model based on the risk factors addressed in Table 2 showed a concordance statistic [18] of 0.585, which is identical to the concordance statistic reported for the Gail model [2]. Jointly, the various risk factors contributed significantly to the prediction of disease status (P-value for likelihood ratio test = 0.007, 12 degrees of freedom). Clinicopathological features of the cases are equivalent to those of an average cohort of women with breast cancer (Table S2).
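For readers less familiar with the concordance statistic, the following is a minimal sketch of one common pairwise definition: the proportion of case-control pairs in which the case is assigned the higher predicted risk, with ties counted as one half. It is not the SAS procedure used in this study, and the function name and example risk scores are hypothetical.

```python
def concordance_statistic(case_scores, control_scores):
    """Share of case-control pairs in which the case has the higher score.

    Ties contribute 0.5. A value of 0.5 means no discrimination,
    1.0 means perfect separation of cases from controls.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups must contain at least one score")
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks, for illustration only.
cases = [0.8, 0.6, 0.4]
controls = [0.7, 0.3, 0.4]
# 6.5 of the 9 possible pairs are concordant (one tie counts one half).
print(concordance_statistic(cases, controls))
```

The double loop is quadratic in the number of subjects, which is acceptable for a few hundred cases and controls; a rank-based formulation would be preferable for much larger data sets.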
Using PMR values, 7 of the 25 genes analyzed in step 3 demonstrated significant differences between the 353 cases and 730 controls (Table S3). 6 of these 7 genes retained their significant differences after dividing each locus into methylated (PMR > 0) or unmethylated (PMR = 0) and adjusting for age (Table 3). 5 of those were significantly associated with breast cancer risk even after adjusting for family history of breast cancer. A model based on the methylation status of the 25 genes (methylated vs. unmethylated) showed a concordance statistic of 0.628. Jointly, the methylation status of the 25 genes was significantly related to disease risk (P-value for likelihood ratio test with 25 degrees of freedom = 0.048). The combination of both risk factors and methylation status increased the concordance statistic to 0.647. The likelihood ratio test for adding traditional risk factors (family history, age at menarche, age at first childbirth, age at menopause, BMI at age 20, history of breastfeeding, history of hormone replacement therapy; 7 degrees of freedom) to a model including methylation status of the 25 genes and the matching factor age yielded a P-value of 0.011. Conversely, the likelihood ratio test for adding methylation status of the 25 genes (25 degrees of freedom) to a model including traditional risk factors yielded a P-value of 0.057. Taken together, these patterns suggest exactly what we had expected based on the selection of genes we used for this study: the predictions of risk by traditional risk factors and by methylation patterns are not entirely independent. This again supports our hypothesis that DNA methylation, on the one hand, may act as a surrogate for genetic risk and lifelong environmental exposure, but on the other hand independently reflects the individual response of women to these factors. In this context the results of the first genome-wide breast cancer susceptibility study, which was recently reported, are relevant.
The study identified 5 novel breast cancer susceptibility loci after a three stage procedure, which started with analysis of 227,876 single nucleotide polymorphisms (SNPs). The per allele odds ratio for an association with breast cancer for the top five SNPs was between 1.07 and 1.26 [19]. This study suggests that it is unlikely that genetic tests alone will achieve the sensitivity and specificity required for risk assessment to guide preventive measures.
Furthermore, the two main histological subtypes of breast cancer are invasive ductal and invasive lobular. It is known that there are specific carcinogenic pathways for these two breast cancer groups. There is growing evidence suggesting that a change in tumor tissue architecture, known as epithelial-mesenchymal transition, takes place in the genesis of invasive lobular breast cancer [20]. To study whether this is also reflected in the epigenotype of peripheral blood cells, we calculated separate ORs which predict each histological subtype (Table S4a-b). The genes which predicted invasive ductal breast cancer (ZNF217, NEUROD1, SFRP1) were completely different from the genes associated with invasive lobular cancer (PITX2, TITF1, GDNF, MYOD1, DCC). Interestingly, genes which predicted invasive lobular cancer are involved in diseases that affect the mesenchyme, like Hirschsprung disease, or are known to be involved in epithelial-mesenchymal transition [21,22,23,24,25]. Besides histology, estrogen receptor status is the most important feature in breast cancer. It is known that estrogen exposure (e.g. serum estradiol in postmenopausal women) is only associated with ER-positive breast cancer [26]. Therefore we tested which of the 25 genes predict ER-positive and which predict ER-negative breast cancer. Interestingly, only the ERT gene loci (ZNF217: OR 1.48, 95% CI 1.01-2.16, P = 0.042; NUP155: OR 1.46, 95% CI 1.05-2.02, P = 0.023) predicted ER-positive breast cancer (Table 4), whereas a PCGT gene locus (SFRP1: OR 2.37, 95% CI 1.23-4.57, P = 0.01) predicted ER-negative breast cancer (Table 5).
Given the relationship between ERT genes and breast cancer risk we considered the possibility that current serum estrogen activity may have an impact on DNA methylation of these genes. To assess this possibility we analyzed ER-α bioactivity in the serum of the cases and controls using a functional assay, which measures the potential for binding to ER-α and transactivating the estrogen responsive elements (EREs). This ER-α ERE-GFP (green fluorescent protein) reporter test system in Saccharomyces cerevisiae has been recently described [27,28]. Only peripheral blood cell DNA methylation of ZNF217, one of the 10 ERT genes, demonstrated a significant inverse correlation (r = -0.112; P = 0.046) with ER-α bioactivity in the corresponding serum (Figure 1). The possibility that this correlation could be triggered by the current use of hormone replacement therapy was considered. After excluding women currently taking hormone replacement therapy, the inverse correlation remained significant (r = -0.166; P = 0.013). Our data suggest that lack of ZNF217 methylation is a long-term surrogate marker of estrogen exposure, indicating a woman's risk of breast cancer. This observation fits with evidence for changes in the activity of ZNF217 in breast carcinogenesis. ZNF217 encodes a transcription factor which mainly represses genes involved in differentiation [29]. Introduction of ZNF217 into human mammary epithelial cells leads to immortalization [30]. In addition, ZNF217 is amplified and overexpressed in 40% and 18% of breast cancer cell lines and tissues, respectively [31,32].
Interestingly, when analyzing solely women currently taking hormone replacement therapy, PGR, which is also an ERT gene, was the only locus demonstrating a significant inverse correlation with corresponding serum ER-α bioactivity (r = -0.242; P = 0.044). Peripheral blood cell DNA methylation of PGR, the gene coding for the progesterone receptor whose expression is strongly regulated by ER-α, was not linked to breast cancer risk. PGR methylation is not significantly different between non-neoplastic breast tissue and breast cancer in premenopausal women [14], but PGR methylation analyzed in breast cancer tissue, adjusted for age and expression of progesterone receptor, is the most significant predictor of ER status in breast cancer [12]. Using logistic regression and after adjustment for age and progesterone receptor status, lack of peripheral blood cell PGR methylation predicted the ER status of the corresponding breast cancer (OR 1.52; 95% CI 1.10-2.11; P = 0.012), whereas the methylation status of none of the other 24 gene loci was an ER status predictor. This finding indicates that the epigenetic pathways and interactions, which have previously only been identified in cancer tissue, are also reflected in peripheral blood, a source of DNA completely unrelated to the breast cancer tissue. The results did not, however, confirm the hypothesis that peripheral blood cell DNA methylation of PCGT genes is associated with an increased breast cancer risk. One possible explanation for this could be that lack of methylation at specific gene loci is only a surrogate for general global hypomethylation. This hypothesis is supported by data demonstrating that lack of folic acid supply (a condition leading to depletion of available methyl groups and subsequent global hypomethylation) is associated with some forms of cancer [33].
In order to study global methylation we analyzed peripheral blood cell DNA methylation of ALU repetitive elements, which has recently been demonstrated to be an excellent marker for global methylation [34]. None of the single predictive loci was associated with global methylation in the 349 samples analyzed for methylation of ALU (Table 6). ALU methylation did not differ between cases and controls (controls (n = 180), median PMR 54.8 and interquartile range 34.0-84.0; cases (n = 169), median PMR 51.1 and interquartile range 34.7-73.0; Wilcoxon-Mann-Whitney test P = 0.25). This again indicates that the predictive potential of DNA methylation is only reflected by specific gene loci rather than by a surrogate marker which reflects global single-carbon metabolism.
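Spearman's rho, which was used to relate locus-specific methylation to ALU methylation, is the Pearson correlation of the ranked values. The sketch below is an illustrative pure-Python version, not the SAS implementation used for the analysis. Tied values, which are common in PMR data because many samples have PMR = 0, receive average ranks. The function names and the example values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical locus PMR values (with ties at 0) and ALU PMR values.
locus_pmr = [0.0, 0.0, 1.2, 3.5, 0.4]
alu_pmr = [54.8, 34.0, 84.0, 51.1, 73.0]
print(spearman_rho(locus_pmr, alu_pmr))
```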
It is worth noting that the samples used in this study were obtained after the diagnosis of breast cancer. In women with metastatic breast cancer, tumor cells (5 tumor cells in 7.5 mL blood) can be identified in the peripheral blood [35]. This means that on average 1 out of 40 million cells in the peripheral blood of women with metastatic breast cancer (fewer in non-metastatic disease) is a breast cancer cell. Although MethyLight is very sensitive, the assay can detect methylated alleles only in the presence of up to a 10,000-fold excess of unmethylated alleles [17]. The possibility that the methylation signal we detect in peripheral blood cell DNA is influenced by tumor cell DNA can therefore be discounted. In this first large-scale epigenotyping study, we were able to demonstrate that particular DNA methylation patterns in peripheral blood may serve as a surrogate marker for breast cancer risk. The current report also provides a basis for further research to assess the role of a combination of genotyping and epigenotyping as a clinical tool to predict an individual's risk of developing breast cancer, other cancers and chronic diseases with sufficient accuracy to guide preventive measures.
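The dilution argument made above for circulating tumor cells can be checked with simple arithmetic. The sketch below uses only the two figures quoted in the text (1 tumor cell per 40 million blood cells, and a MethyLight detection limit of one methylated allele in a 10,000-fold excess of unmethylated alleles); the function name is hypothetical, and the calculation assumes, conservatively, that every tumor cell allele at the locus is methylated.

```python
def fold_below_detection_limit(cells_per_tumor_cell, detectable_excess):
    """How many times rarer tumor DNA is than the rarest detectable allele.

    cells_per_tumor_cell: blood cells per circulating tumor cell
                          (40 million, as stated in the text).
    detectable_excess:    largest excess of unmethylated alleles in which
                          one methylated allele is still detected
                          (10,000, as stated in the text).
    """
    if cells_per_tumor_cell <= 0 or detectable_excess <= 0:
        raise ValueError("both arguments must be positive")
    return cells_per_tumor_cell / detectable_excess


# Tumor-derived alleles are 4,000-fold too dilute to be detected.
print(fold_below_detection_limit(40_000_000, 10_000))  # 4000.0
```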

Materials and Methods
Samples
The UKOPS (United Kingdom Ovarian Cancer Population Study) study is being carried out in 10 regional centers in England, Wales and Northern Ireland. Recruitment started in January 2006 and aims to include 2,000 ovarian cancer patients, 1,500 benign cases and 5,000 controls. Recruitment of cancer cases has been carried out during visits to the Gynecological Oncology wards. Control participants have been recruited from the UKCTOCS (United Kingdom Collaborative Trial of Ovarian Cancer Screening) trial centers (www.ukctocs.org.uk). Detailed information about the medical history was obtained by a standardized questionnaire. Diagnosis data from histology and cytology reports have also been included. Furthermore, serum and whole blood were collected from all subjects. Written consent was obtained from each participant. Ethical approval was granted by the joint University College London Committees on the Ethics of Human Research (Committee A). In the current study 83 healthy postmenopausal women (blinded for cancer history) were selected for step 2.
ESTHER (www.esther.dkfz.org/esther/) is a population-based study carried out in the state of Saarland (located in South West Germany). In the clinical arm of the study, 1,981 cancer patients aged 50 to 75, including 380 women with breast cancer, were recruited during their first stay in the hospital for primary cancer treatment. In the community arm of the study, 9,953 women and men aged 50-75 were recruited during routine health examinations by their general practitioners. Recruitment and baseline examinations were carried out in 2000-2003. In both study arms, detailed information about medical history, including family history, sociodemographic and lifestyle factors and current health status, was obtained by standardized questionnaire. In addition, serum and whole blood samples were collected. Written informed consent was obtained from all subjects. The protocol was approved by the Ethics Committee of the Medical Faculty of the University of Heidelberg, Germany. In this study, all 353 cases with postmenopausal invasive breast cancer were included, and a stratified random subset of 730 age-matched postmenopausal women was selected as controls.

DNA isolation and storage of DNA
A standard chloroform-based DNA isolation protocol was used to extract DNA from whole blood from the UKOPS samples. DNA was extracted from whole blood samples of the ESTHER study participants using Invisorb extraction kits (Invitek; www.invitek.de). DNA from both sample collections was dissolved in distilled water and stored at -80 °C until analysis.

Analysis of DNA Methylation
Sodium bisulfite conversion (Zymo Research; www.zymoresearch.com), MethyLight analysis (Applied Biosystems; www.appliedbiosystems.com) and nucleotide sequences for most MethyLight primers and probes in the promoter or 5' end region were described recently [12,14,34,36]. Each MethyLight reaction at a specific locus covers on average 5-7 CpG dinucleotides. A detailed list of primers and probes (Metabion; www.metabion.com) for all analyzed loci is provided in Table S1. Briefly, two sets of primers and probes, designed specifically for bisulfite-converted DNA, were used: a methylated set for the gene of interest and a reference set (COL2A1) to normalize for input DNA. Specificity of the reactions for methylated DNA was confirmed separately using SssI-treated (New England Biolabs; www.newenglandbiolabs.co.uk) human white blood cell DNA (heavily methylated). The percentage of fully methylated molecules at a specific locus was calculated by dividing the GENE:COL2A1 ratio of a sample by the GENE:COL2A1 ratio of the SssI-treated human white blood cell DNA and multiplying by 100. The abbreviation PMR (Percentage of Methylated Reference) indicates this measurement.
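The PMR calculation described above can be written compactly. The following is a minimal sketch, assuming that the gene and COL2A1 signals have already been converted into relative quantities by the real-time PCR software; the function names, argument names and example quantities are hypothetical. The second function applies the dichotomization used in the Results (methylated if PMR > 0).

```python
def pmr(gene_sample, col2a1_sample, gene_sssi, col2a1_sssi):
    """Percentage of Methylated Reference for one locus in one sample.

    PMR = 100 * (GENE:COL2A1 ratio of the sample)
              / (GENE:COL2A1 ratio of the SssI-treated reference DNA)
    """
    if col2a1_sample <= 0 or gene_sssi <= 0 or col2a1_sssi <= 0:
        raise ValueError("reference quantities must be positive")
    sample_ratio = gene_sample / col2a1_sample
    reference_ratio = gene_sssi / col2a1_sssi
    return 100.0 * sample_ratio / reference_ratio


def is_methylated(pmr_value):
    """Dichotomize a locus: methylated if PMR > 0, unmethylated if PMR = 0."""
    return pmr_value > 0


# Hypothetical quantities: sample ratio 0.2, fully methylated reference 0.8.
value = pmr(gene_sample=2.0, col2a1_sample=10.0, gene_sssi=8.0, col2a1_sssi=10.0)
print(round(value, 6), is_methylated(value))
```

Normalizing to COL2A1 corrects for differences in the amount of input DNA, and dividing by the SssI-treated reference expresses the result relative to fully methylated DNA.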
The analysis was performed blinded and cases and controls were randomly mixed for bisulfite treatment and real-time PCR. The concentration of bisulfite modified DNA (assessed by the level of the reference gene COL2A1) was the same between cases and controls.

ER-α serum bioactivity assay
ER-α bioactivity was measured as recently described [27]. Each serum was tested blindly in quadruplicate (20 µL for each reaction in a total of 100 µL) and a mean value was calculated based on four independent measurements (done on two different days).

Statistical Analysis
Descriptive analyses were performed on age and known risk factors for breast cancer among cases and controls and on clinical features among breast cancer cases. Percent methylated, mean and median values of PMR were calculated for the selected gene loci. Differences in PMR values between cases and controls were analyzed by means of Wilcoxon-Mann-Whitney test. In addition, odds ratios for breast cancer (overall and according to estrogen receptor status) associated with absence of methylation, adjusted for age (matching factor) and for positive family history (at least one first degree relative with breast cancer) as well as for other established breast cancer risk factors were calculated by multiple logistic regression. Multiple imputation was employed to deal with missing covariate data in multivariable analyses. Initially, a linear term for percent methylation in addition to the dichotomous term for absence of methylation was included in the models to address a potential dose-response relationship. As this term did not significantly improve prediction for any of the 25 genes assessed, it was dropped from the final models. Joint contribution of methylation status of all 25 genes to the prediction of breast cancer risk was evaluated by concordance statistics [18] and likelihood ratio tests. Spearman's rho was calculated to assess the correlation of global methylation (ALU repetitive elements) with methylation at single gene loci. All statistical analyses were done using SAS Software, version 8.2.
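As an illustration of how an odds ratio and its 95% confidence interval arise, the sketch below computes an unadjusted odds ratio with a Wald interval from a 2x2 table. This is a simplification: the odds ratios reported in this study were estimated by multiple logistic regression in SAS with adjustment for age and other covariates, which this example does not reproduce. The function name and all counts are hypothetical.

```python
import math


def odds_ratio_wald(exposed_cases, unexposed_cases,
                    exposed_controls, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    'Exposed' could mean, for example, absence of methylation at a locus.
    z = 1.96 corresponds to a two-sided 95% interval.
    """
    a, b, c, d = (exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls)
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_, lower, upper = odds_ratio_wald(20, 80, 10, 90)
print(f"OR {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

An interval that excludes 1 corresponds to a significant association at the 5% level in this unadjusted setting.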

Supporting Information
Table S1 Information on genes analyzed. S1A: Primers and probe sequences for MethyLight. S1B: General gene information. Gene alternative names, chromosomal location and amplicons' position relative to the transcription start site are indicated. Found at: doi:10.1371/journal.pone.0002656.s001 (0.14 MB DOC)