Aging and Environmental Exposures Alter Tissue-Specific DNA Methylation Dependent upon CpG Island Context

Epigenetic control of gene transcription is critical for normal human development and cellular differentiation. While alterations of epigenetic marks such as DNA methylation have been linked to cancers and many other human diseases, interindividual epigenetic variations in normal tissues due to aging, environmental factors, or innate susceptibility are poorly characterized. The plasticity, tissue-specific nature, and variability of gene expression are related to epigenomic states that vary across individuals. Thus, population-based investigations are needed to further our understanding of the fundamental dynamics of normal individual epigenomes. We analyzed 217 non-pathologic human tissues from 10 anatomic sites at 1,413 autosomal CpG loci associated with 773 genes to investigate tissue-specific differences in DNA methylation and to discern how aging and exposures contribute to normal variation in methylation. Methylation profile classes derived from unsupervised modeling were significantly associated with age (P<0.0001) and were significant predictors of tissue origin (P<0.0001). In solid tissues (n = 119) we found striking, highly significant CpG island–dependent correlations between age and methylation; loci in CpG islands gained methylation with age, loci not in CpG islands lost methylation with age (P<0.001), and this pattern was consistent across tissues and in an analysis of blood-derived DNA. Our data clearly demonstrate age- and exposure-related differences in tissue-specific methylation and significant age-associated methylation patterns which are CpG island context-dependent. This work provides novel insight into the role of aging and the environment in susceptibility to diseases such as cancer and critically informs the field of epigenomics by providing evidence of epigenetic dysregulation by age-related methylation alterations. 
Collectively we reveal key issues to consider both in the construction of reference and disease-related epigenomes and in the interpretation of potentially pathologically important alterations.


Introduction
While all somatic cells in a given individual are genetically identical (excepting T and B cells), different cell types form highly distinct anatomic structures and carry out a wide range of disparate physiologic functions. The vast repertoire of cellular phenotypes is made possible largely via epigenetic control of gene expression, which is known to play a critical role in cellular differentiation. Epigenetics is the study of mitotically and/or meiotically heritable changes in gene function that cannot be explained by changes in DNA sequence [1], and includes critical normal processes such as X-chromosome inactivation and genomic imprinting. Alterations in epigenetic control have been linked to several human pathologic conditions including cancers, and Rett, ICF, and Beckwith-Wiedemann syndromes [2][3][4][5]. The most widely studied epigenetic mark is DNA cytosine methylation, most often investigated in the context of CpG dinucleotides in promoter regions, which often have concentrations of CpGs known as CpG islands. Normal cells are thought to generally maintain CpG islands in an unmethylated state permissive to transcription [6]. However, emerging work has established the presence of tissue-specific methylation patterns in normal tissue at these islands [7][8][9][10]. Further, just as normal genetic variation is now understood to be associated with a predisposition to a vast array of human diseases [11], it is important to begin defining the underlying interindividual differences in tissue-specific methylation, so that we can understand the relationships that govern these crucial tissue-specific differences.
We have previously distinguished normal and tumor tissues using methylation profiling [12,13]. These studies demonstrated variability in the methylation profiles of pleural mesotheliomas and tumors of the head and neck that was, in part, attributable to etiologically important exposures. In a similar manner, there is a basic need for epigenetic profiling of normal tissues to more completely characterize the normal pattern of promoter methylation variation in development, aging, and in response to common environmental exposures such as alcohol and tobacco smoke.
Efforts to describe the methylation profiles of normal tissues are now underway. Recent genome-wide studies of methylation in normal human tissues have shown that DNA methylation profiles are tissue-specific and correlated with sequence elements [8][9][10][14][15][16]. However, while these studies are groundbreaking in showing that tissues have different patterns of methylation, the underlying causes and extent of tissue-specific and non-specific interindividual variation in DNA methylation patterns remain largely unknown. In fact, in a follow up experiment from a larger effort, Illingworth et al. observed significant variation among individuals when bisulfite sequencing a particular CpG island, and suggested that larger-scale studies are required to determine the extent of interindividual variability in methylation patterns [9]. Epigenetic variation has been hypothesized to cause underlying differences in disease susceptibility among monozygotic twins, and young twin-pairs have been shown to be more epigenetically similar than older monozygotic twins [17]. Therefore, the aging process and differences in environment have been hypothesized to influence clinically significant changes in methylation profiles as individuals accumulate varying exposures with age. In fact, recent work has shown an overall trend of increased methylation associated with older age in normal human prostate and colon tissues in several genes [18,19]. Although an increase in promoter methylation with aging is generally accepted, recent evidence from Bjornsson et al. suggests a more complex picture. These authors found both increased and decreased intra-individual global methylation levels (enriched for promoter regions) in peripheral blood cell DNA over time [20]. In this background, it is crucial to more extensively characterize the contribution of aging and the environment to tissue-specific interindividual epigenetic variation.
In this study we used Illumina's GoldenGate methylation platform to investigate cytosine methylation in 217 normal human tissue specimens from 10 different anatomic sites in order to begin to understand variation both between and within tissues across individuals. Profiling CpG methylation of normal human tissues allowed us to begin characterizing the role of aging and environmental exposures in interindividual methylation variation, as well as the specific gene-loci that determine normal tissue specificity. This work highlights the dynamic nature of epigenomes, and begins to disentangle the roles of aging, environmental factors, and innate variability among individual epigenomic profiles, both within and across tissues.

Unsupervised clustering
Array methylation data were first assembled for exploration and visualization with unsupervised hierarchical clustering using Manhattan distance and average linkage for the 500 most variable autosomal CpG loci (Figure 1). Epigenetic profiles among these normal tissues are strikingly different. Applying recursively partitioned mixture modeling (RPMM) [21] to methylation data from all autosomal CpG loci across all 217 normal human tissue samples resulted in 23 methylation classes and their average methylation profiles (Figure 2A). Among the 23 classes in this model, 16 classes (70%) perfectly captured only a single tissue type (Table 1), and methylation profile classes were a highly significant predictor of sample tissue type (permutation P<0.0001). Further, age was significantly associated with methylation classes (P<0.0001). Separating samples into groups as placenta, blood, or other solid tissue, we found a significant association between group and methylation profile classes (P<0.0001).
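The exploratory clustering step described above can be illustrated with a minimal sketch in standard-library Python. It assumes a hypothetical input layout (one list of average beta values per sample) and hypothetical function names; it is not the software used in the study. It ranks loci by variance, keeps the top k, and then performs naive agglomerative clustering with Manhattan distance and average linkage. For simplicity it stops at a requested number of clusters rather than building the full dendrogram shown in a heatmap.

```python
from statistics import pvariance


def most_variable_loci(betas, k):
    """Return the (sorted) indices of the k loci with the highest variance.

    betas: list of samples, each a list of beta values (hypothetical layout).
    """
    n_loci = len(betas[0])
    variances = [pvariance([sample[j] for sample in betas]) for j in range(n_loci)]
    ranked = sorted(range(n_loci), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def manhattan(a, b):
    """Manhattan (L1) distance between two methylation profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def average_linkage(points, n_clusters):
    """Naive agglomerative clustering with average linkage.

    Repeatedly merges the two clusters with the smallest mean pairwise
    Manhattan distance until n_clusters remain. Returns lists of sample indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(
                    manhattan(points[p], points[q])
                    for p in clusters[i]
                    for q in clusters[j]
                )
                mean_dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or mean_dist < best[0]:
                    best = (mean_dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy data (invented for illustration): 4 samples x 3 loci.
toy = [
    [0.10, 0.50, 0.90],
    [0.12, 0.50, 0.88],
    [0.80, 0.50, 0.15],
    [0.82, 0.51, 0.10],
]
keep = most_variable_loci(toy, 2)
reduced = [[sample[j] for j in keep] for sample in toy]
groups = average_linkage(reduced, 2)
```

In the toy data the second locus is nearly constant, so it is dropped before clustering, and the remaining two loci separate the samples into two groups. With real array data, k would be 500 and the input would contain all 217 samples.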

Supervised clustering
Random Forests (RF) classification of all samples based on methylation average beta values at all autosomal loci returned a confusion matrix showing which samples were correctly classified, which were misclassified, and the misclassification error rate for each sample type (Table 2). Overall, 19 samples were confused with different tissue types, giving an overall misclassification error rate of 8.8%, significantly lower than expected under the null hypothesis (P<0.0001). Not unexpectedly, tissue types with larger sample sizes showed significantly reduced misclassification error rates (P<0.05). The mean and standard deviation of average beta values for all autosomal CpG loci in each tissue type, and values for the decrease in random forest classification accuracy with locus removal, are given in Table S1. In an RF analysis that examined whether samples could be correctly classified as placenta, blood, or other solid tissue, no samples were misclassified (misclassification error = 0%, P<0.0001). The mean and standard deviation of average beta values for all autosomal CpG loci in each of placenta, blood, or solid tissue, and values for the decrease in random forest classification accuracy with locus removal, are given in Table S2.
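As a worked illustration of how per-class and overall misclassification error rates follow from a confusion matrix, the sketch below uses a hypothetical nested-dictionary layout and invented toy counts (they are not the values in Table 2). The only figures taken from the text are the 19 misclassified samples out of 217, which reproduce the reported overall rate.

```python
def misclassification_rates(confusion):
    """Compute per-class and overall misclassification error.

    confusion[true_label][predicted_label] = count (hypothetical layout).
    Returns (per_class_error_dict, overall_error).
    """
    per_class = {}
    total = 0
    wrong = 0
    for true_label, row in confusion.items():
        n = sum(row.values())
        errors = n - row.get(true_label, 0)
        per_class[true_label] = errors / n if n else float("nan")
        total += n
        wrong += errors
    return per_class, wrong / total


# Invented toy counts, for illustration only.
toy_confusion = {
    "blood": {"blood": 9, "lung": 1},
    "lung": {"lung": 8, "blood": 2},
}
per_class, overall = misclassification_rates(toy_confusion)

# Overall rate implied by the text: 19 of 217 samples misclassified.
reported_rate_percent = round(19 / 217 * 100, 1)
```

Here `reported_rate_percent` evaluates to 8.8, matching the overall error rate stated above.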

Author Summary
The causes and extent of tissue-specific interindividual variation in human epigenomes are underappreciated and, hence, poorly characterized. We surveyed over 200 carefully annotated human tissue samples from ten anatomic sites at 1,413 CpGs for methylation alterations to appraise the nature of phenotypically, and hence potentially clinically, important epigenomic alterations. Within tissue types, across individuals, we found variation in methylation that was significantly related to aging and environmental exposures such as tobacco smoking. Individual variation in age- and exposure-related methylation may significantly contribute to increased susceptibility to several diseases. As the NIH-funded HapMap project is critically contributing to annotating the human reference genome defining normal genetic variability, our work raises key issues to consider in the construction of reference epigenomes. It is well recognized that understanding genetic variation is essential to understanding disease. Our work, and the known interplay of epigenetics and genetics, makes it equally clear that a more complete characterization of epigenetic variation and its sources must be accomplished to reach the goal of a complete understanding of disease. Additional research is absolutely necessary to define the mechanisms controlling epigenomic variation. We have begun to lay the foundations for essential normal tissue controls for comparison to diseased tissue, which will allow the identification of the most crucial disease-related alterations and provide more robust targets for novel treatments.

Tissue-specific RPMM
Variation in tissue-specific methylation relative to differences between tissue types was first explored visually. Scatter plots of methylation values for representative samples from two different tissues were less well correlated than similar plots of two representative samples from the same tissue type, though variation in tissue-specific methylation was also evident (Figure S1). Tissue-specific methylation patterns for adult blood, lung, and pleural tissue samples were then modeled with RPMM to investigate potential associations of age and exposures with methylation profiles. An RPMM of adult bloods (n = 30) resulted in two methylation classes (Figure 2B), and age differed significantly by methylation class (P<0.005), though we did not detect significant associations between methylation class and smoking status, packyears, or alcohol consumption. An RPMM of lung tissues (n = 53) resulted in five methylation classes (Figure 2C) where class membership was not associated with age or smoking status. An RPMM of pleural tissues (n = 18) resulted in five methylation classes (Figure 2D), and class membership was not associated with age; yet, an association between methylation class and asbestos exposure approached significance (P<0.07).

Locus-by-locus analysis of exposure-related methylation
While exposures were not strongly associated with array-wide methylation profiles, locus-specific analysis revealed several exposure-related methylation alterations. Among pleural tissues 24 CpG loci had asbestos-related alterations in methylation, all of which were increases in methylation (Q<0.05, Table S3). In adult bloods, increasing packyears of smoking was significantly associated with MLH1 (Q<0.0001), and RIPK3 (Q<0.002) methylation; and over 30 CpG loci had significantly altered methylation in never versus ever drinkers (Q<0.05, Table S4). Among lung tissues, smoking status (never/ever) was associated with altered methylation at 138 CpG loci (Q<0.05, Table S5).

CpG locus-specific, age-related methylation
Given our results from RPMM and previous reports of age-related increases in methylation in normal tissues [18,19,22] we next focused on age-related methylation at specific CpG loci. We began by examining gene-loci that other investigators have reported to be associated with age and found that ESR1, GSTP1, IGF2, MGMT, MYOD1, RARB, and RASSF1 had significant age-associated methylation alterations, the majority of which were increases (P<0.05, age range >0, n = 139, Table 3). Hypothesizing that alterations in epigenetic regulatory genes or genes involved in aging processes could lead to the observed associations between age and methylation profiles from RPMM, we tested CpG loci in epigenetic regulatory genes, telomere maintenance genes, and a premature aging syndrome gene, again finding significant age-related methylation alterations (Table 3). For example, LAMB1, which is involved in subchromosome domain positioning [23], had increased methylation with age. Significant age-related methylation alterations in telomere maintenance gene-loci TERT, ERCC1, RAD50, and the Werner syndrome gene-locus (WRN) were also observed. Additionally, and in contrast to the predominantly increased age-associated methylation at other gene-loci, there was a significant age-related decrease in CpG methylation of the de novo methyltransferase DNMT3B; and unlike the vast majority of other CpGs tested, DNMT3B_P352 was not located in a CpG island (Table 3).

Array-wide, locus-by-locus analysis of age-related methylation
To expand the examination of age-associated methylation alterations, we performed array-wide locus-by-locus analysis of CpGs. For all tissues (age range >0, n = 139), after correcting for multiple comparisons, over 300 CpG loci had age-related methylation alterations (Q<0.05, Table 4). Restricting analysis to solid tissues (n = 119) revealed over 250 CpG loci with age-related methylation alterations (Q<0.05, Table 4). Tissue-specific locus-by-locus analysis of age-related methylation was also performed (tissue types with n >10), detailed in Table S6.
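To make the multiple-comparisons step concrete, the following sketch adjusts a list of per-locus P-values for the false discovery rate. This section does not state how the reported Q-values were computed, so the Benjamini-Hochberg procedure is used here purely as an assumed, commonly used stand-in; the function name and the toy P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values, in the input order.

    Each P-value is scaled by m / rank (rank 1 = smallest P), and a running
    minimum from the largest rank downward enforces monotonicity.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented P-values for four hypothetical loci.
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_q = benjamini_hochberg(toy_p)
significant = [i for i, q in enumerate(toy_q) if q < 0.05]
```

Loci whose adjusted value falls below the chosen threshold (0.05 in the text) would be reported as having age-related methylation alterations.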
There is now a considerable literature suggesting that genome structure affects both the initial placement of DNA methylation marks in development [24] and the protection of silenced regions from perturbation later in life [25]. To examine the possibility that genomic structure can affect the changes we observed in normal methylation, we assessed the potential effects of CpG island status on age-related methylation, with the hypothesis that there may be differential susceptibility to changes in DNA methylation in queried regions defined as canonical islands compared to those not in CpG islands. A CpG island is defined according to Takai and Jones [26], as a region of 200 bp with a GC content of >55% and an observed to expected ratio of CpG >0.65. This analysis used generalized estimating equations (GEE), which are robust to within-person correlation and to the influence of aberrant observations [27], and estimated mean associations between age and methylation by CpG island status. Among all solid tissues (n = 119), the direction of correlation between age and methylation differed depending on whether the CpG was found in a CpG island. Loci in CpG islands had significantly positive correlations between methylation and aging, while loci not in CpG islands had significant losses of methylation with aging (P = 7.0E-04; Table 5). Similar trends were observed for other solid tissue types; age-related associations with methylation were significantly positive for loci in CpG islands for pleural tissues, and significantly negative for loci not in islands in brain tissues (Table 5). Interestingly, among adult blood samples, significantly negative correlations between age and methylation alterations were observed irrespective of CpG island status (P = 5.2E-05, Table 5).
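The CpG island criteria quoted above (length, GC content, and observed-to-expected CpG ratio) can be checked for a single sequence with a short sketch. The thresholds are those stated in the text; the function names are hypothetical, and the observed-to-expected ratio is computed with the commonly used formula (number of CpG × length) / (number of C × number of G), which is an assumption about the exact formula behind the array annotation. The sketch evaluates one candidate window and does not scan a genome.

```python
def cpg_island_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_content = (c + g) / n if n else 0.0
    expected = c * g / n if n else 0.0
    obs_exp = cpg / expected if expected else 0.0
    return gc_content, obs_exp


def meets_island_criteria(seq, min_len=200, min_gc=0.55, min_obs_exp=0.65):
    """Apply the thresholds quoted in the text to one candidate region."""
    gc_content, obs_exp = cpg_island_stats(seq)
    return len(seq) >= min_len and gc_content > min_gc and obs_exp > min_obs_exp


# Artificial sequences, for illustration only.
cpg_rich = "CG" * 100   # 200 bp, GC content 1.0, obs/exp 2.0
at_rich = "AT" * 100    # 200 bp, no C or G at all
```

With these artificial inputs, `meets_island_criteria(cpg_rich)` is true and `meets_island_criteria(at_rich)` is false; the thresholds are exposed as parameters so that alternative island definitions can be compared.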
To investigate CpG-dependent correlations between aging and methylation in more detail, we clustered CpGs (rather than samples) with RPMM, grouping CpGs with similar methylation profiles into eight separate classes. The CpG island status of all loci was plotted, and illustrates the well known tendency for CpGs located in islands to be unmethylated, while non-island CpGs tend to be methylated (Figure 3). We again used GEE, here estimating RPMM class-specific mean associations between age and methylation, and plotted the estimates with their 95% confidence intervals. In a class-specific model for solid tissue samples, there was a positive correlation between age and methylation in classes whose loci were predominantly located in CpG islands (P = 1.9E-05, Figure 3A). The tissue-specific analysis of pleura demonstrated that classes rich in CpG island loci had significant age-associated increases in methylation (P = 2.3E-08, Figure 3B). Interestingly, the pattern of class-specific correlations between age and methylation in adult bloods was similar to those for solid tissue types, though the correlation between age and methylation was shifted towards the negative such that there was a significant decrease in age-related methylation among loci not in CpG islands (P = 6.3E-06, Figure 3C). Lung tissues displayed a similar pattern of class-specific correlations between age and methylation, although these correlations did not reach statistical significance (P = 0.13, Figure 3D). Finally, both brain, and head and neck samples demonstrated increased age-associated methylation in classes rich in CpG island loci, and decreases in age-associated methylation in classes rich in loci not in CpG islands (P = 7.0E-04, P = 5.2E-08, Figure 3E and 3F, respectively).

Array validation and independent confirmation
Bisulfite modified DNA pyrosequencing was performed to validate array results. Array average beta values were significantly correlated with pyrosequencing percent methylation for sequenced array target CpGs: RARA_P176 (P = 0.003), DNMT3B_P352 (P = 0.008), and LIF_P383 (P = 3.0E-06, Figure S2). Consistent with array-based results, increased RARA_P176-local methylation was associated with reported asbestos exposure in pleural samples (n = 16, P = 0.10, Figure 4A). To confirm the observed associations between age or smoking packyears with methylation, specific loci were sequenced in array samples (n = 28) and an independent set of control blood DNAs (total n = 112). Sequencing DNMT3B_P352 both validated the association between decreased methylation and aging, and confirmed it in an independent population (P = 0.03, n = 112, Figure 4B). Similarly, the association between LIF_P383 methylation and packyears smoked from array results (P<0.02) was validated by pyrosequencing (P<0.02, n = 112, Figure 4C). In addition, pyrosequencing FZD9_E458-local CpGs confirmed the association between increased methylation and aging (P<0.001, n = 112, Figure 4D).

Discussion
Epigenetic patterning and maintenance are of paramount importance for normal cellular functioning and identity. Hence, pursuing the annotation of normal human tissue-specific epigenomes is an important and necessary endeavor. However, such a project is considerably more challenging than sequencing the genome because of the tissue-specific and dynamic nature of epigenomes. Thus, a more complete understanding of what constitutes a normal epigenome, and the degree to which epigenomes vary (in a tissue dependent fashion) based on aging and the environment has the potential to dramatically improve the success of studies of epigenetic alterations in disease. Hence, our work characterized methylation of phenotypically important CpG loci across several human tissue types, elucidating interindividual tissue-specific variation in methylation profiles and the contribution of CpG island context to age associated methylation alterations. This work increases our appreciation for the dynamic nature of the epigenome, and begins to define basic tenets to follow in pursuit of both constructing reference epigenomes and elucidating epigenetic alterations truly indicative of disease states.
Using recursively-partitioned mixture modeling and random forests approaches, we differentiated tissues based on CpG methylation profile, consistent with other recent studies conducting genome-wide DNA methylation profiling [8][9][10][15][16]. These studies used high resolution methylation data and together have now shown that tissues have distinct methylation profiles. This novel and consistent body of work has, however, not addressed exposures in relation to interindividual variations in methylation. Not only do our findings confirm that tissue-specific epigenetic patterns can be readily defined with a targeted promoter-based CpG array, but they identify target sets of gene-loci most consistently capable of differentiating tissue types.
Factors known to contribute to methylation alterations include carcinogen exposures, inflammation, and diet. Several carcinogen exposures such as tobacco, alcohol, arsenic, and asbestos have been associated with methylation-induced gene-inactivation in various human cancers including bladder cancer, head and neck squamous cell carcinoma, and mesothelioma [28][29][30][31][32][33]. It is therefore reasonable to suggest that various and potentially accumulating exposures throughout life may directly or indirectly lead to methylation alterations and impact disease susceptibility. Carcinogens are well known to induce genetic abnormalities that can lead to clonal selection and expansion in normal appearing tissues (termed "field effect"). Hence, the association of carcinogen exposures with the occurrence of altered methylation at phenotypically important loci may arise as a consequence of altered ("initiated") clones. Our data suggest that large epigenetic changes occur in normal appearing tissues, and the relationship of these changes to companion genetic changes is of interest to study in the future. Cancer is a disease of aging, and initial studies of age-related methylation in normal tissues were motivated in large part by studies of methylation in cancer [34]. An early report from Issa et al. described an association between aging colonic mucosa and estrogen receptor methylation [35]. In general, trends of global (repeat element) hypomethylation and promoter hypermethylation found in cancer also have been observed in normal tissues with aging [36]. In recent reports of age-related methylation in normal human prostate and colon tissues, several CpG-island-containing genes were reported to have age-related increases in methylation [18,19]. Our results confirm these findings and, in addition, document that age-related alterations in these CpG loci are tissue-dependent.
More importantly, our examination of loci with previously reported age-associated methylation alterations, in conjunction with reports from others, suggested that the relationship between aging and promoter CpG methylation is complex. For example, using restriction-landmark genome scanning of over 2000 CpG loci in T lymphocytes comparing newborns, middle age, and elderly people, Tra et al. reported that 29 loci had age-related methylation alterations, with 23 loci displaying increased methylation with age and 6 decreasing with age [37]. In addition, measuring intra-individual global methylation changes over >10 years, Bjornsson et al. found both increased and decreased methylation levels dependent on the individual, with over 50% of participants exhibiting >5% change in methylation [20].
Figure 3. The direction of correlations for age-associated methylation alterations differs dependent upon CpG island status. For (A-F), the top plot is the estimate of mean regression coefficients for age-associated methylation (by decade), and its 95% confidence interval from GEE for each CpG RPMM class. The middle plot is of CpGs clustered with RPMM into eight classes for each group of samples. The bottom plot indicates the CpG island status for each locus (where magenta = CpG island locus, green = non-CpG island locus). (A) Estimates of class-specific age-associated methylation among all solid tissues (n = 119), RPMM clustering of CpGs, and CpG island status. Age-associated methylation is significantly increased among classes with a high prevalence of CpG-island loci (P = 2.3E-08). (B) Estimates of class-specific age-associated methylation among blood samples (n = 29), RPMM clustering of CpGs, and CpG island status. Age-associated methylation is significantly decreased among classes with a high prevalence of non-CpG-island loci (P = 6.3E-06). (C) Estimates of class-specific age-associated methylation among pleural tissues (n = 18), RPMM clustering of CpGs, and CpG island status. Age-associated methylation is significantly increased among classes with a high prevalence of CpG-island loci (P = 2.3E-08). (D) Estimates of class-specific age-associated methylation among lung tissues (n = 52), RPMM clustering of CpGs, and CpG island status. (E) Estimates of class-specific age-associated methylation among brain tissues (n = 11), RPMM clustering of CpGs, and CpG island status. Age-associated methylation is significantly increased for the predominantly CpG island loci in class 3, and significantly decreased among classes with a high prevalence of non-CpG-island loci (P = 7.0E-04). (F) Estimates of class-specific age-associated methylation among head and neck tissues (n = 10), RPMM clustering of CpGs, and CpG island status. Age-associated methylation is significantly decreased among classes with a high prevalence of non-CpG-island loci (P = 5.2E-08). doi:10.1371/journal.pgen.1000602.g003
Stratifying our data on CpG-island status of loci, we showed that both the direction and strength of correlation between age and methylation were largely dependent upon CpG island status. More specifically, we found a propensity for CpG-island loci to gain methylation with age, and non-island CpGs to lose methylation with age. Our data are consistent with the literature that has demonstrated age-related increases in methylation at gene-loci found within CpG islands [18,19,22], as well as the findings of Tra et al. and Bjornsson et al. who showed bi-modal age-related methylation in normal tissues. A direct comparison, by examination of the data of Bjornsson et al., indicated that a high percentage of their top 50 most age-altered loci (all decreases in methylation) are not located in CpG islands; among 24 of 30 autosomal CpGs in their Table 1 (where CpG island status can be identified by readers), only 5/24 (21%) are located in CpG islands, whereas 70% would be expected. Our results from blood samples corroborate their findings, and extend them to demonstrate similar trends in multiple other tissue types, where the strongest negative correlation between age and methylation occurs at CpGs which are not in CpG islands, and the strongest positive correlation between age and methylation occurs at loci in CpG islands.
The observed pattern of age-associated methylation was irrespective of tissue type, suggesting a common mechanism or dysregulation to explain these alterations. Reduced fidelity of maintenance methyltransferases with aging is one potential explanation for age-related decreases in methylation, while age-related increases in methylation could potentially reflect the accumulation of stochastic methylation events over time. As the examined tissues do not have a pathologic phenotype, methylated CpGs in these cells may not indicate dramatic functional consequences upon gene expression. However, the (in part selective) accumulation of alterations without readily detectable functional consequences should not be interpreted as biologically insignificant. Age-related drift of normal epigenomes without prominent changes in gene expression may nonetheless confer significantly increased risk of conversion to a pathologic phenotype by enhancing both the likelihood and frequency of methylation events that ultimately result in aberrant expression or altered genomic stability. For example, in the context of acquired "nonfunctional" CpG methylation in the promoter region of an aged individual, continued stochastic methylation events (e.g. "methylation spreading") increase the chance of methylation-induced silencing at that promoter (or silencing of another locus through action at a distance via silencing of other important regions such as enhancers), and hence, progression to a pathologic phenotype. Certainly, this hypothesis is especially plausible for the many diseases of aging. Alternatively, aberrant CpG methylation that silences a gene on a single allele may not appear to have a functional consequence if the complementary allele can provide compensatory expression. As a result, for example, clusters of cellular clones with mono-allelic gene expression could contribute to an increased risk of progression to a pathologic phenotype (e.g. loss of the second functional allele).
Future population-based studies addressing the potential of quantifying age and/or exposure associated methylation alterations indicative of disease risk are necessary.
We have provided clear evidence of interindividual variation in tissue-specific methylation related to aging and environmental exposures at disease-relevant CpGs across 10 normal human tissue types. We have demonstrated both general and tissue specific alterations, uncovered a CpG island context-dependent directionality to age associated methylation alterations, and provided a novel path for examining the mechanistic basis of these alterations. By enumerating the methylation status of a panel of cancer-related genes known to stably control transcription in normal tissues, we have also afforded important controls for comparison to diseased tissues, potentially aiding in identification of the most critical alterations in specific diseases and providing more robust targets for novel treatments. Importantly, we have also begun to disentangle the contributions of aging and environmental factors to methylation alterations in normal tissues. Uncovering age and exposure-related methylation changes and their clear contextual dependence is an important contribution to our basic understanding of epigenetic maintenance as it relates to both aging and the pathologic process, provides a potential avenue to pursue clinically useful biomarkers, as well as to identify novel markers of disease susceptibility.

Study samples
Normal human tissues were assembled by a collaborative, multiinstitutional network of investigators conducting molecular epidemiologic studies of human cancer. Tissues were obtained through Institutional Review Board approved studies already underway at these institutions, or were obtained from the National Disease Research Interchange (NDRI, Philadelphia, PA). Briefly, normal brain tissues (n = 12) were contributed by the Wiencke lab at UCSF through the San Francisco Adult Glioma Study [38]. Normal lung tissues (n = 49) were obtained from adjacent nontumor portions of lung in patients treated for NSCLC [39] or from the NDRI from non-diseased individuals at autopsy (n = 4). Peripheral blood DNA was obtained from controls enrolled in a study of bladder cancer (n = 15) [40], controls enrolled in a study of head and neck cancer (n = 15) [41], and newborn infants (n = 55) [42]. Non-tumorigenic pleural samples (n = 18) were obtained from grossly disease-uninvolved regions of incident mesothelioma [28]. Head and neck anatomic sites (n = 11), bladder (n = 5), kidney (n = 6), and small intestine (n = 5) were obtained from the NDRI, all from individuals with no gross diseases or tumors of the obtained tissues. Non-pathologic placenta samples (n = 19) were obtained as residual tissues from control infant term births as part of an ongoing hospital-based case-control study of intrauterine growth restriction at Women and Infants Hospital in Providence RI. All tissues obtained from patients with disease (lung, pleura) were histologically confirmed as normal by independent study pathologist review of tissue samples prior to DNA extraction.

Methylation analysis
Fresh frozen tissue and whole blood DNA were extracted using the QIAamp DNA mini kit according to the manufacturer's protocol (Qiagen, Valencia, CA). DNA was modified by sodium bisulfite to convert unmethylated cytosines to uracil using the EZ DNA Methylation Kit (Zymo Research, Orange, CA) according to the manufacturer's protocol. Illumina GoldenGate methylation bead arrays were used to simultaneously interrogate 1505 CpG loci associated with 803 cancer-related genes. Bead arrays have a sensitivity similar to that of quantitative methylation-specific PCR. They were run at the UCSF Institute for Human Genetics, Genomics Core Facility according to the manufacturer's protocol and as described by Bibikova et al. [43].

Pyrosequencing
Quantification of cytosine percent methylation was performed by pyrosequencing bisulfite-converted DNA using the PyroMark MD pyrosequencing system (Biotage). Specific pyrosequencing primers were designed to amplify array CpG sites and as many downstream CpGs as conditions permitted (2 to 5 additional) using Biotage Assay Design Software v1.0.6.

Statistical analysis
Data assembly. Data were assembled with BeadStudio methylation software from Illumina (San Diego, CA). All array data points are represented by fluorescent signals from both methylated (Cy5) and unmethylated (Cy3) alleles, and methylation level is given by β = max(Cy5, 0)/(|Cy3| + |Cy5| + 100); the average methylation (β) value is derived from the ~30 replicate methylation measurements. Raw average beta values were analyzed without normalization, as recommended by Illumina. Each array CpG is annotated with the gene name followed by the CpG location (P for promoter, E for exonic, or (rarely) seq to reference a specific sequence) and its physical distance from the transcription start site. At each locus for each sample the detection P-value was used to determine sample performance; 5 samples (2%) had detection P-values >1.0E-05 at more than 25% of CpG loci and were removed from subsequent analysis. Similarly, CpG loci with a median detection P-value >0.05 (n = 8, 0.5%) were eliminated from analysis. Finally, all CpG loci on the X chromosome were excluded from analysis. The final dataset contained 217 samples and 1413 CpG loci associated with 773 genes. The manufacturer-recommended CpG island designation of array CpGs was used and follows the definition of CpG island in [26].
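The beta-value formula and the two detection P-value filters described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the BeadStudio implementation: the function names, the nested-list data layout (one row per sample, one column per locus), and the toy numbers are all assumptions made for the example.

```python
from statistics import median

def beta_value(cy5, cy3, offset=100.0):
    """Methylation level: max(Cy5, 0) / (|Cy3| + |Cy5| + offset)."""
    return max(cy5, 0.0) / (abs(cy3) + abs(cy5) + offset)

def failing_samples(det_p, p_cut=1e-5, max_frac=0.25):
    """Row indices of samples whose detection P-value exceeds p_cut
    at more than max_frac of the loci."""
    bad = []
    for i, row in enumerate(det_p):
        frac = sum(p > p_cut for p in row) / len(row)
        if frac > max_frac:
            bad.append(i)
    return bad

def failing_loci(det_p, p_cut=0.05):
    """Column indices of loci whose median detection P-value exceeds p_cut."""
    n_loci = len(det_p[0])
    return [j for j in range(n_loci)
            if median(row[j] for row in det_p) > p_cut]

# Hypothetical detection P-values: 3 samples (rows) x 4 loci (columns).
toy_det_p = [
    [1e-8, 1e-9, 0.2, 1e-7],
    [1e-3, 1e-2, 0.3, 1e-8],
    [1e-8, 1e-8, 0.1, 1e-9],
]
print(beta_value(900.0, 0.0))      # strongly methylated signal
print(failing_samples(toy_det_p))  # samples to drop
print(failing_loci(toy_det_p))     # loci to drop
```

The constant 100 in the denominator keeps the ratio stable when both fluorescent signals are close to zero.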
Unsupervised clustering. Subsequent analyses were carried out using the R software [44]. For exploratory and visualization purposes, hierarchical clustering was performed using the R function hclust with the Manhattan metric and average linkage. To discern and describe the relationships between CpG methylation and tissue type (sample clustering), and the relationships between CpGs with coordinate methylation (CpG clustering), a modified model-based form of unsupervised clustering known as recursively partitioned mixture modeling (RPMM) was used as described in [21] and as used in [33]. This approach, a model-based version of HOPACH [45] and Dynamic Tree Cutting [46], builds classes of samples from their methylation profiles, using a mixture of beta distributions to recursively split samples into parsimoniously differentiated classes [47][48][49][50]. For sample clustering the number of classes was determined by recursively splitting the data via 2-class models, with the Bayesian information criterion (BIC) used at each potential split to decide whether the split was to be maintained or abandoned [21,51,52]. For CpG clustering the same approach was used and models were subsequently pruned to eight terminal classes. Permutation tests (running 10,000 permutations) were used to test for association with methylation class by generating the null distribution of the test statistic for comparison with the observed statistic. For continuous variables, the permutation test was run with the Kruskal-Wallis test statistic. For categorical variables we used the standard chi-square statistic for testing association between two categorical variables.
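The permutation test for a continuous covariate can be illustrated with a short Python sketch. It is not the authors' R code; the function names, the toy ages, and the class labels are hypothetical. The Kruskal-Wallis statistic is computed here without the tie correction, which does not change the permutation P-value because the correction factor is identical for every relabeling of the same values.

```python
import random

def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(values, labels):
    """Kruskal-Wallis H statistic (no tie correction)."""
    n = len(values)
    rank_sums, counts = {}, {}
    for r, g in zip(midranks(values), labels):
        rank_sums[g] = rank_sums.get(g, 0.0) + r
        counts[g] = counts.get(g, 0) + 1
    s = sum(rank_sums[g] ** 2 / counts[g] for g in rank_sums)
    return 12.0 / (n * (n + 1)) * s - 3.0 * (n + 1)

def permutation_p(values, labels, n_perm=10000, seed=0):
    """Share of label permutations whose H is at least the observed H."""
    rng = random.Random(seed)
    observed = kruskal_wallis_h(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if kruskal_wallis_h(values, shuffled) >= observed - 1e-9:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical ages for samples assigned to three methylation classes.
ages = [31, 35, 42, 44, 58, 61, 63, 70, 72, 75, 66, 39]
classes = ["c1", "c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3", "c3", "c2", "c1"]
print(permutation_p(ages, classes))
```

Adding one to both the numerator and the denominator counts the observed labeling among the permutations, so the estimated P-value is never exactly zero.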
Random forests. R was also used to build classifiers with the Random Forest (RF) approach. RF is a tree-based classification algorithm similar to Classification and Regression Tree (CART) [53,54] and was performed on CpG average beta values using the randomForest R package version 4.5-18 by Liaw and Wiener. RF builds each individual tree by taking a bootstrap sample (sampling with replacement) of the original data; on average about 1/3 of the original data are not sampled (out of bag, or OOB). Those sampled are used as the training set to grow the trees (here 500,000), and the OOB data are used as the test set. At each node of the tree, a random sample of m out of the total M variables is chosen and the best split is found among the m variables. We utilized the default value for m in the randomForest R package, √M (√1413 ≈ 38). The misclassification error rate is the percentage of time the RF prediction is incorrect.
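The statement that roughly one third of the samples fall out of bag follows from the bootstrap itself: a given sample is missed by all n draws with probability (1 - 1/n)^n, which approaches 1/e ≈ 0.37 as n grows. The Python sketch below checks this by simulation for n = 217 and evaluates √M for M = 1413. It illustrates only the sampling arithmetic and is not a random forest implementation; the function names and the number of simulated trees are assumptions for the example.

```python
import math
import random

def expected_oob_fraction(n_samples):
    """Probability that a given sample is absent from one bootstrap sample."""
    return (1.0 - 1.0 / n_samples) ** n_samples

def simulated_oob_fraction(n_samples, n_trees=200, seed=0):
    """Average out-of-bag fraction over n_trees simulated bootstrap samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trees):
        in_bag = {rng.randrange(n_samples) for _ in range(n_samples)}
        total += (n_samples - len(in_bag)) / n_samples
    return total / n_trees

n_samples, n_loci = 217, 1413
print(expected_oob_fraction(n_samples))   # close to exp(-1)
print(simulated_oob_fraction(n_samples))  # simulation estimate
print(math.sqrt(n_loci))                  # about 37.6, reported as 38 in the text
```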
Locus-by-locus analysis. Associations between covariates and methylation at individual CpG loci were tested with a generalized linear model. The beta distribution of average beta values was accounted for with a quasi-binomial logit link with an estimated scale parameter constraining the mean between 0 and 1, in a manner similar to that described by Hsiung et al. [55]. When testing for associations between age and CpG methylation, sample types with an age range of 0 (placenta, infant bloods) were excluded. Tissue types with <10 samples (bladder, kidney, and small intestine) were not analyzed independently. CpG loci for which an a priori hypothesis existed were tested separately; these included loci previously associated with aging in normal tissues [18,19], as well as genes involved in epigenetic regulation, telomere maintenance (selected with GO term lists), and the premature aging syndrome (Werner) gene WRN. Array-wide scanning for CpG loci associations with sample type or covariate used false discovery rate estimation and Q-values computed by the qvalue package in R [56].
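A quasi-binomial model with a logit link can be fit by iteratively reweighted least squares (IRLS). The Python sketch below does this for a single CpG locus and a single covariate (age), estimating the scale parameter from the Pearson residuals. It is a minimal illustration under stated assumptions, not the authors' R analysis: the function names and toy data are hypothetical, only one covariate is handled, and the standard error is model-based.

```python
import math

def _weighted_sums(x, y, b0, b1):
    """Accumulate the IRLS weighted sums at the current coefficients."""
    sw = swx = swxx = swz = swxz = pearson = 0.0
    for xi, yi in zip(x, y):
        eta = b0 + b1 * xi
        mu = 1.0 / (1.0 + math.exp(-eta))  # inverse logit keeps the mean in (0, 1)
        w = mu * (1.0 - mu)                # quasi-binomial variance function
        z = eta + (yi - mu) / w            # working response
        sw += w
        swx += w * xi
        swxx += w * xi * xi
        swz += w * z
        swxz += w * xi * z
        pearson += (yi - mu) ** 2 / w
    return sw, swx, swxx, swz, swxz, pearson

def fit_quasibinomial(x, y, max_iter=100, tol=1e-10):
    """Fit logit(E[y]) = b0 + b1 * x; return (b0, b1, se_b1, phi)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        sw, swx, swxx, swz, swxz, _ = _weighted_sums(x, y, b0, b1)
        det = sw * swxx - swx ** 2
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det
        done = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if done:
            break
    sw, swx, swxx, _, _, pearson = _weighted_sums(x, y, b0, b1)
    phi = pearson / (len(x) - 2)  # estimated scale (dispersion) parameter
    se_b1 = math.sqrt(phi * sw / (sw * swxx - swx ** 2))
    return b0, b1, se_b1, phi

# Hypothetical ages and average beta values at one CpG locus.
ages = [25, 34, 41, 48, 55, 62, 70, 77]
betas = [0.21, 0.24, 0.23, 0.28, 0.30, 0.29, 0.35, 0.36]
b0, b1, se_b1, phi = fit_quasibinomial(ages, betas)
print(b1, se_b1, phi)
```

A positive slope b1 corresponds to methylation increasing with age on the logit scale. Dividing b1 by its standard error gives a Wald-type statistic for the association.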
To test the hypothesis that there are associations of age and exposure with methylation, we constructed measures of partial methylation sets, in analogy with global DNA methylation, which is measured on repeats [57]. Specifically, we hypothesize that only CpGs that have limited functional significance (within a tissue) are allowed to acquire methylation alterations, and that these events are stochastic. Consequently, only CpGs whose methylation varies across specimens of the same type will show evidence of association with age or exposure, but individual CpG methylation, as a binary event with low individual probability, may demonstrate correlation too weak to detect and validate in a subsequent sample. On the other hand, average methylation over a defined set of CpGs will have increased power to measure the totality of such events within the set. To define such sets, we conducted RPMM clustering on the CpGs (rather than specimens) over all specimens of a defined tissue type, pruning the resulting tree to 8 classes. Within each class we averaged methylation values to form a single, CpG-class-specific partial methylation statistic. For each set, we fit a quasi-binomial model [55] expressing the logit mean partial methylation for each class as a linear function of age or exposure; standard errors were computed using generalized estimating equations (GEE) [27], from which confidence intervals were constructed. Furthermore, to properly account for multiple comparisons in a statistically efficient manner, we tested the omnibus null hypothesis of no association in any of the 8 partial methylation classes using a Wald test statistic constructed from the GEE estimates and robust variance-covariance matrix. To test a similar methylation hypothesis while distinguishing between loci in CpG islands and those not in CpG islands, we followed a similar methodology, replacing the 8 CpG RPMM class designation with CpG island designation.
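The partial methylation statistic described above is a per-sample average of beta values over the CpGs in each class. The short Python sketch below shows that averaging step only, for either RPMM-derived CpG classes or CpG island designation. The function name, data layout, and toy values are assumptions for the example, and the subsequent GEE and Wald test steps are not reproduced here.

```python
def partial_methylation(betas, cpg_class):
    """Average beta values within each CpG class, separately for each sample.

    betas: list of per-sample lists, one beta value per CpG locus.
    cpg_class: class label for each CpG locus, in the same order.
    Returns one dict per sample mapping class label to mean beta.
    """
    classes = sorted(set(cpg_class))
    result = []
    for row in betas:
        sums = {c: 0.0 for c in classes}
        counts = {c: 0 for c in classes}
        for b, c in zip(row, cpg_class):
            sums[c] += b
            counts[c] += 1
        result.append({c: sums[c] / counts[c] for c in classes})
    return result

# Hypothetical data: 2 samples x 4 CpG loci, labeled by CpG island status.
toy_betas = [
    [0.10, 0.30, 0.80, 0.90],
    [0.20, 0.20, 0.60, 1.00],
]
toy_labels = ["island", "island", "non-island", "non-island"]
for sample in partial_methylation(toy_betas, toy_labels):
    print(sample)
```

Each per-sample class mean would then serve as the response in the quasi-binomial model, with age or exposure as the covariate.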
Pyrosequencing analysis. Sequencing data were processed using Pyro Q-CpG software v1.0.9 (Biotage) under default analysis parameters and exported for subsequent analysis in R software [44]. Associations between pyrosequencing percent methylation values and array average beta values or covariates such as age or environmental exposures were tested with a Spearman correlation test.
Figure S1 Pairwise plots comparing average beta values (A) between all blood and all head & neck samples, (B) individual blood sample versus an individual head & neck sample, comparisons within tissue type between individual samples for (C) blood and (D) head & neck. Average beta value scatterplots between tissue types indicate significant differences between tissues, and scatterplots within tissue type indicate relative similarity in the presence of interindividual variation. A) Mean of average betas for all blood samples (n = 30) versus mean of average betas for all head and neck samples (n = 11) indicates relatively high variability between tissue types, R² = 0.84. B) Representative blood sample 1 average betas versus representative head and neck sample 1 average betas indicate similarly high variability between tissue types at the individual sample level, R² = 0.87. C) Representative blood sample 2 versus representative blood sample 3 indicates relative similarity between individuals within a tissue type in the presence of interindividual variation, R² = 0.97. D) Representative head and neck sample 1 versus representative head and neck sample 2 indicates relative similarity between individuals within a tissue type in the presence of interindividual variation, R² = 0.96. Found at: doi:10.1371/journal.pgen.1000602.s001 (0.07 MB TIF)
Figure S2 Bisulfite pyrosequencing mean percent methylation across all CpGs measured for RARA, DNMT3B, and LIF versus their respective CpG of interest on the array. A) Mean bisulfite pyrosequencing percent methylation across array target CpG RARA_P176 and 5 downstream CpGs plotted versus Illumina GoldenGate methylation array average beta demonstrates a significant correlation between sequencing and array methylation (P = 0.03; n = 16). B) Mean bisulfite pyrosequencing percent methylation across array target CpG DNMT3B_P352 and 2 downstream CpGs plotted versus Illumina GoldenGate methylation array average beta demonstrates a significant correlation between sequencing and array methylation (P = 0.02; n = 28). C) Mean bisulfite pyrosequencing percent methylation across array target CpG LIF_P383 and 2 downstream CpGs plotted versus Illumina GoldenGate methylation array average beta demonstrates a significant correlation between sequencing and array methylation (P = 7.7E-08; n = 28). Found at: doi:10.1371/journal.pgen.1000602.s002 (0.04 MB TIF)