Association between Genetic Variants in DNA and Histone Methylation and Telomere Length

Telomere length, a biomarker of aging and age-related diseases, exhibits wide variation between individuals. Common genetic variation may explain some of the individual differences in telomere length. To date, however, only a few genetic variants have been identified in previous genome-wide association studies. As emerging data suggest epigenetic regulation of telomere length, we investigated 72 single nucleotide polymorphisms (SNPs) in 46 genes that are involved in DNA and histone methylation as well as telomerase and telomere-binding proteins and DNA damage response. Genotyping and quantification of telomere length were performed in blood samples from 989 non-Hispanic white participants of the Sister Study, a prospective cohort of women aged 35–74 years. The association of each SNP with logarithmically-transformed relative telomere length was estimated using multivariate linear regression. Six SNPs were associated with relative telomere length in blood cells with p-values <0.05 (uncorrected for multiple comparisons). The minor alleles of BHMT rs3733890 G>A (p = 0.041), MTRR rs2966952 C>T (p = 0.002) and EHMT2 rs558702 G>A (p = 0.008) were associated with shorter telomeres, while minor alleles of ATM rs1801516 G>A (p = 0.031), MTR rs1805087 A>G (p = 0.038) and PRMT8 rs12299470 G>A (p = 0.019) were associated with longer telomeres. Five of these SNPs are located in genes coding for proteins involved in DNA and histone methylation. Our results are consistent with recent findings that chromatin structure is epigenetically regulated and may influence the genomic integrity of the telomeric region and telomere length maintenance. Larger studies with greater coverage of the genes implicated in DNA methylation and histone modifications are warranted to replicate these findings.


Introduction
As in most eukaryotic organisms, human chromosomes are capped with a 6 base pair telomeric repeat (-TTAGGG-) that helps prevent incomplete DNA replication and genomic degradation [1]. Telomeres shorten with each cell division, and this progressive shortening has been postulated to be a causal factor, or at least an indicator, of organismal aging [2]. Telomere length in blood cells has been inversely related to chronological age and to age-related disorders such as hypertension [3] and cardiovascular disease [4,5].
Significant associations with telomere length have also been found for SNPs in MEN1, MRE11A, RECQL5, and TNKS in a study evaluating 43 telomere-associated genes such as genes encoding telomerase, shelterin proteins and proteins involved in DNA repair [19]. Although not included as a candidate pathway in that study [19], emerging data suggest that epigenetic modifications might be another regulatory mechanism of telomere length. In mouse models, knockout of histone methyltransferases [20] or of DNA methyltransferases [21] has been shown to result in abnormal telomere elongation. Telomeric DNA repeats lack CpG sites and are not directly methylated, but subtelomeric DNA is heavily methylated, and this methylation correlates with telomere length and telomeric recombination in human cancer cell lines [22].
The purpose of the present study is to investigate genetic variants in DNA and histone methylation as well as other telomere biology-associated proteins in relation to telomere length in blood cells.

Results
Relative telomere length, estimated from the ratio of telomeric DNA relative to a single copy gene DNA (T/S ratio), ranged from 0.43 to 2.71 with an average of 1.25 among 989 women in the present study. Table 1 shows the associations between relative telomere length and 38 SNPs in genes involved in telomere biology and DNA damage response. Only one of these SNPs (ATM rs1801516 G>A) was found to be associated with relative telomere length after adjustment for age and breast cancer diagnosis (p = 0.031 for recessive model). In contrast, we found suggestive evidence for an association with relative telomere length for five of 33 SNPs in genes involved in DNA and histone methylation (Table 2).
For the six SNPs that were significantly associated with relative telomere length at α = 0.05, we further estimated multivariable-adjusted relative telomere length by genotype under different genetic models and report the model with the smallest p-value (Table 3). The minor alleles of BHMT rs3733890 G>A, MTRR rs2966952 C>T and EHMT2 rs558702 G>A were associated with shorter telomeres, while minor alleles of ATM rs1801516 G>A, MTR rs1805087 A>G and PRMT8 rs12299470 G>A were associated with longer telomeres. Age and lifestyle factors like obesity and smoking are known to be important determinants of telomere length. In our data, however, there were no significant associations of obesity or smoking with relative telomere length [23]. Age was significantly associated with telomere length, but there was no evidence of effect modification of the association between telomere length and individual SNPs by age group (age <55 years vs. ≥55 years).

Discussion
A few SNPs have been related to telomere length, but other common genetic variations related to telomere length remain to be discovered. In the present study, we carried out an analysis of common genetic variations in candidate genes in relation to telomere length in blood cells, and observed suggestive evidence for associations with telomere length for several polymorphisms in genes involved in DNA and histone methylation. We found that women inheriting the variant alleles of BHMT rs3733890, MTRR rs2966952 and EHMT2 rs558702 had shorter telomeres, whereas women inheriting the variant alleles of MTR rs1805087 and PRMT8 rs12299470 had longer telomeres.
Epigenetic modifications are associated with telomere length [24]. Telomeres are flanked by large blocks of heterochromatin, which stabilize repetitive DNA sequences by inhibiting recombination between homologous repeats [25]. DNA methylation and histone H3 methylation at lysine 9 are associated with repressed chromatin [26], and deregulation of epigenetic modifications has long been known to affect the integrity of the telomeric region [25]. Conversely, in the absence of telomerase the progressive shortening of telomeres results in a variety of epigenetic changes including increased histone acetylation, decreased histone methylation, and subtelomeric DNA methylation [24]. Together, these findings suggest that epigenetic changes and the regulation and maintenance of telomere length are intertwined processes.
We observed that rs12299470, located in intron 1 of PRMT8, is associated with long telomeres. PRMT8 belongs to a family of protein arginine methyltransferases (PRMTs) [27], and recognizes a glycine- and arginine-rich (GAR) motif as a preferred methylation site [28]. It was recently found that the shelterin component TRF2 contains the GAR motif, and deletion of PRMT1 promotes the formation of dysfunctional telomeres via inhibiting the binding of TRF2 to telomeric DNA [29]. The role of PRMT8 in telomere stability and function remains to be fully elucidated. However, it is interesting to note that PRMT8 was identified because of a high degree of sequence homology with PRMT1 [27].
Euchromatic histone-lysine N-methyltransferase 2 (EHMT2) is a key histone methyltransferase [30] and is known to be particularly important for histone methylation of euchromatin [30]. The EHMT2 rs558702 SNP is located in the 5′ flanking region of the EHMT2 gene (4,862 bp upstream from the transcriptional start position) and is predicted by TFSEARCH (http://www.cbrc.jp/research/db/TFSEARCH.html) to be in a putative binding site of the v-Myb or c-Myb transcription factors. Therefore, it is possible that the variant allele limits the binding of Myb transcription factors to their consensus site and reduces EHMT2 gene expression.
Enzymes of folate single-carbon metabolism play an essential role in the synthesis of DNA precursors and remethylation of homocysteine for S-adenosylmethionine (SAM)-dependent DNA methylation [31]. Among those enzymes are betaine:homocysteine methyltransferase (BHMT), methyltetrahydrofolate:homocysteine methyltransferase (MTR) and 5-methyltetrahydrofolate-homocysteine methyltransferase reductase (MTRR). In the present study, the BHMT rs3733890 G>A variant was associated with shorter telomere length, and the MTR rs1805087 A>G variant was related to longer telomeres. The BHMT rs3733890 SNP is a missense mutation resulting in the conversion of an arginine residue to a glutamine residue at codon 239 in exon 6, although the variant allele does not appear to change enzyme activity or homocysteine levels [32,33]. However, carriers of the variant alleles have been reported to have favorable health profiles such as low prevalence of coronary artery diseases [32] and reduced risk of several congenital anomalies such as orofacial cleft [34] and neural tube defect [35]. The MTR rs1805087 SNP is a missense change resulting in an amino acid substitution from aspartic acid to glycine at codon 919 in exon 26. The variant allele of this SNP has been associated with a moderate increase in homocysteine levels [36].
We also observed shorter telomeres associated with the MTRR rs2966952 C>T variant. This SNP was selected in the present analysis because it is located in the 5′ flanking region of the MTRR gene (1187 bp upstream from the transcriptional start position) and is predicted by TFSEARCH (http://www.cbrc.jp/research/db/TFSEARCH.html) to destroy the binding site of the transcription factor C/EBPβ [37]. However, the SNP also leads to a lysine to arginine amino acid change at codon 56 in exon 2 of the FASTKD3 gene, which encodes FAST kinase domain-containing protein 3, a mitochondrial protein essential for cellular respiration [38]. Although the functional implication of this gene in telomere length is not known, it is possible that the observed association between rs2966952 and telomere length is mediated through effects on FASTKD3 rather than MTRR.
Two SNPs in the MTHFR gene (rs1801131 and rs1801133) were not associated with relative telomere length. This finding is in agreement with a previous report showing a weak association between MTHFR 677C>T polymorphism (rs1801133) and longer telomeres only among those having lower than median plasma folate concentration but no overall association between the variant allele and telomere length in men [39]. Possible effect modification by plasma folate status could not be evaluated in the present study.
The ATM (ataxia telangiectasia mutated) gene encodes a protein kinase, ATM, that regulates a large number of proteins including the checkpoint kinases CHK1 and CHK2 [40,41]. Induction of the checkpoint kinases is crucial for cell cycle arrest in response to DNA damage, and defective checkpoint responses can cause genomic instability and neoplastic transformation [40]. The present study found longer telomere length associated with the ATM rs1801516 G>A variant. The polymorphism is a missense mutation resulting in an amino acid change from aspartic acid to asparagine (dbSNP) that was predicted by PolyPhen [42] to be possibly damaging. However, a previous study by Mirabello et al. examined this SNP and 8 additional SNPs in the ATM gene, and did not find significant association with telomere length [19].
Several limitations in the present study should be discussed. First, as a large number of statistical tests were performed, our findings are particularly subject to type I (false-positive) error. However, we chose to report the p-values without correction for multiple comparisons because the SNPs in our study were not selected randomly but from candidate genes based on functional prediction using SIFT [43] and PolyPhen [42]. Still, the results need to be interpreted with caution given that none of the associations would have reached the same level of significance after adjustment for multiple comparisons. Second, some important candidate genes were not evaluated in this study. For example, the H3-K9 methyltransferases SUV39H1 and SUV39H2 were shown to be associated with the heterochromatin protein HP1 [44], and their absence resulted in modification of telomeric chromatin structure and subsequent alteration in telomere length [20]. However, none of the SNPs in these two genes met our criteria for selection. Lastly, we should point out that this study took place within a cohort enriched for a family history of breast cancer. While these women might have different allele frequencies or telomere lengths than a sample of the general population, we have no a priori reason to believe that specific characteristics of this population would impact the observed relationship of SNPs and telomere length. However, such an effect remains a possibility until verified in other general populations.
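To make the multiple-comparison caveat concrete, the following minimal sketch applies a Bonferroni adjustment to the six nominal p-values reported above. The number of tests is an assumption: it is set to 72, one test per genotyped SNP, which understates the true burden if each genetic model is counted as a separate test. The Bonferroni method itself is used only for illustration and is not a procedure described by the study.

```python
# Nominal p-values reported in the text for the six SNPs.
nominal_p = {
    "BHMT rs3733890": 0.041,
    "MTRR rs2966952": 0.002,
    "EHMT2 rs558702": 0.008,
    "ATM rs1801516": 0.031,
    "MTR rs1805087": 0.038,
    "PRMT8 rs12299470": 0.019,
}

N_TESTS = 72   # assumption: one test per genotyped SNP
ALPHA = 0.05
threshold = ALPHA / N_TESTS  # Bonferroni per-test significance threshold

for snp, p in sorted(nominal_p.items(), key=lambda item: item[1]):
    adjusted = min(1.0, p * N_TESTS)  # Bonferroni-adjusted p-value, capped at 1
    print(f"{snp:18s} nominal p={p:.3f}  adjusted p={adjusted:.3f}  "
          f"passes={p < threshold}")
```

Under this assumption even the smallest nominal p-value (0.002 for MTRR rs2966952) does not fall below the per-test threshold, in line with the statement that none of the associations would remain significant after adjustment.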
In conclusion, the present study found associations with telomere length for candidate SNPs (BHMT rs3733890, MTRR rs2966952, EHMT2 rs558702, MTR rs1805087 and PRMT8 rs12299470) that are implicated in DNA and histone methylation. These results support existing findings of epigenetic regulation of telomere length. These novel associations with telomere length require further replication in larger studies with more substantial genomic coverage as well as functional characterization of the variant alleles.

Ethics Statement
All individuals were informed about purposes, requirements, and rights as study participants. Written informed consent was

Study Population and Telomere Length Measurement
Data are from the Sister Study, which is a nationwide cohort study of environmental and genetic risk factors for breast cancer among women aged 35 to 74 years who have a sister with breast cancer [45]. A case-cohort analysis within the Sister Study was performed to examine the relationship between telomere length in blood cells and breast cancer risk in 342 incident breast cancer cases and 736 subcohort members who were randomly selected from 29,026 participants enrolled by June 1, 2007. Methods for relative telomere length measurement and characteristics of the study population have been previously described [23]. Briefly, genomic DNA was extracted from prospectively collected frozen blood samples using an Autopure LS (Qiagen) in the NIEHS Molecular Genetics Core Facility, and 10 ng of the extracted DNA was robotically aliquoted and plated in duplicate onto each of 4 replicate 384-well plates. Telomere length was determined as the ratio of telomere repeat copy number to single copy gene copy number (T/S ratio) relative to that of an arbitrary reference sample, using the monochrome multiplex quantitative PCR protocol. This method has been shown to give a high correlation (R² = 0.84) with telomere length determined by the traditional Southern blot analysis [46]. Plates were run on a BioRad CFX384 (Hercules, CA) with the cycling parameters previously described [23]. A 5-point standard curve, consisting of a 2.5-fold dilution series ranging from 1.9 to 75 ng, was generated in each assay plate to estimate the value for each sample T (telomere) and S (albumin single copy) using BioRad CFX Manager software. Standard curve efficiencies for both primer sets were above 90%, and regression coefficients were at least 0.99 in all PCR runs. Plates were verified for overall quality control parameters. The average coefficient of variation (%CV) was 11% and the intraclass correlation coefficient (ICC) of a single T/S ratio was 0.85.
Individual estimates were obtained from the average of up to eight replicate T/S ratio values.
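The standard-curve calculation described above can be sketched as follows. This is a generic illustration of standard-curve quantification, not the CFX Manager implementation: the function names, the Cq values, and the idealized curves (perfect doubling per cycle) are all made up for the example, and it assumes that the standard DNA serves as the reference for the T/S ratio.

```python
import math
import statistics

def fit_standard_curve(input_ng, cq):
    """Least-squares line Cq = intercept + slope * log10(input in ng)."""
    x = [math.log10(v) for v in input_ng]
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(cq)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, cq))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def efficiency(slope):
    """Amplification efficiency; 1.0 means perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0

def quantity(cq, slope, intercept):
    """Interpolate the input amount (ng of standard DNA) from a Cq value."""
    return 10 ** ((cq - intercept) / slope)

# 5-point, 2.5-fold dilution series: 75, 30, 12, 4.8, 1.92 ng.
standards_ng = [75 / 2.5 ** i for i in range(5)]

# Hypothetical, idealized standard Cq values for both primer sets.
PERFECT_SLOPE = -1 / math.log10(2)
tel_std_cq = [18.0 + PERFECT_SLOPE * math.log10(ng) for ng in standards_ng]
alb_std_cq = [28.0 + PERFECT_SLOPE * math.log10(ng) for ng in standards_ng]

t_slope, t_int = fit_standard_curve(standards_ng, tel_std_cq)
s_slope, s_int = fit_standard_curve(standards_ng, alb_std_cq)

# Hypothetical replicate (telomere Cq, albumin Cq) pairs for one sample.
replicates = [(14.60, 24.70), (14.55, 24.72), (14.70, 24.68), (14.62, 24.75)]
ts = [quantity(t, t_slope, t_int) / quantity(s, s_slope, s_int)
      for t, s in replicates]

mean_ts = statistics.fmean(ts)                    # individual estimate
cv_percent = 100 * statistics.stdev(ts) / mean_ts  # replicate variability
print(f"efficiency T={efficiency(t_slope):.2f}, S={efficiency(s_slope):.2f}")
print(f"mean T/S={mean_ts:.3f}, CV={cv_percent:.1f}%")
```

Averaging the replicate T/S values, as the study does with up to eight replicates, reduces the influence of well-to-well measurement error on each individual's estimate.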

Genotyping
As a part of our study of telomeres and breast cancer risk, we selected a broad group of candidate genes related to telomere biology such as genes encoding telomerase and telomere-binding proteins, DNA repair and cell cycle checkpoint proteins, and epigenetic regulators of chromatin structure. Candidate SNPs were selected using the SNPinfo GenePipe tool, a web-based SNP selection tool that can integrate GWAS results with SNP functional predictions and linkage disequilibrium (LD) information [47]. Briefly, a list of candidate genes (N = 140), including some previously linked to regulation and maintenance of telomere length, was first filtered against the Cancer Genetic Markers of Susceptibility (CGEMS) breast cancer genome-wide association study (GWAS) [48] results to exclude genes that showed no evidence of association with breast cancer (i.e., had no SNPs with p<0.05). However, some candidate genes that were very poorly represented in the CGEMS GWAS panel were retained even if they had no SNPs with p<0.05. We define poor representation as having <20% of known common SNPs (as reported for that gene in dbSNP and with MAF ≥0.05) in high LD (r² ≥0.8) with at least one SNP in the GWAS panel. We then used SNPinfo to select SNPs from the remaining set of candidate genes. In addition to those showing associations with breast cancer risk, SNPinfo enhances selection of SNPs with predicted functional effects that are in high LD with the GWAS panel.
A total of 72 SNPs in 46 genes were included in the final analysis. Genotyping was conducted by the NIEHS Molecular Genetics Core Facility, using a custom-designed Illumina GoldenGate genotyping panel. A total of 20 HapMap trios (20 × 3 = 60 samples) were genotyped to evaluate parent-parent-child (P-P-C) error. A total of 20 Sister Study sample duplicates were included to monitor replication error. Illumina BeadStudio genotyping software (version 1.6.3) was used to call genotypes. Individual genotypes with an Illumina GenCall (GC) score below 0.25 were assigned as missing. The overall call rate was 0.998. Both averaged P-P-C genotype error and averaged replication error were 0. The concordance between our genotype data and data in HapMap for the 20 HapMap trios was an average of 0.998 for each SNP.
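The poor-representation rule can be written as a small predicate. The sketch below is illustrative only: the function name, the data layout, and the SNP identifiers are hypothetical, and it assumes that a SNP present on the GWAS panel counts as covered and that a gene with no common SNPs counts as poorly represented. It does not reproduce the SNPinfo implementation.

```python
def is_poorly_represented(gene_snps, gwas_panel, r2,
                          maf_min=0.05, r2_min=0.8, coverage_min=0.20):
    """Return True if fewer than 20% of a gene's common SNPs are in high LD
    with at least one GWAS panel SNP.

    gene_snps:  dict mapping SNP id -> minor allele frequency (MAF)
    gwas_panel: set of SNP ids genotyped in the GWAS
    r2:         dict mapping (snp_a, snp_b) -> pairwise r^2 (either order)
    """
    common = [snp for snp, maf in gene_snps.items() if maf >= maf_min]
    if not common:
        return True  # assumption: nothing to cover counts as poor coverage

    def covered(snp):
        if snp in gwas_panel:  # assumption: a panel SNP tags itself
            return True
        return any(
            r2.get((snp, g), r2.get((g, snp), 0.0)) >= r2_min
            for g in gwas_panel
        )

    coverage = sum(covered(snp) for snp in common) / len(common)
    return coverage < coverage_min

# Hypothetical gene with six common SNPs and one rare SNP.
gene_snps = {"snpA": 0.30, "snpB": 0.12, "snpC": 0.08, "snpD": 0.22,
             "snpE": 0.45, "snpF": 0.06, "snpRare": 0.01}
gwas_panel = {"gwas1", "gwas2"}
r2 = {("snpA", "gwas1"): 0.92, ("snpB", "gwas1"): 0.40,
      ("gwas2", "snpC"): 0.55}

# Only snpA is tagged: 1 of 6 common SNPs is below the 20% cutoff.
print(is_poorly_represented(gene_snps, gwas_panel, r2))
```

In the selection procedure described above, a gene flagged this way would be retained even without a GWAS SNP at p<0.05, because the absence of a signal may simply reflect a lack of coverage.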
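The genotype quality-control steps above can be illustrated with a short sketch. It is not the BeadStudio procedure: the function names and example calls are hypothetical, genotypes are coded as minor-allele counts (0, 1 or 2), and autosomal inheritance is assumed for the trio check.

```python
GC_THRESHOLD = 0.25  # GenCall score cutoff stated in the text

def apply_gencall_filter(calls):
    """Set genotypes with a GenCall score below the cutoff to missing (None).

    calls: list of (genotype, gc_score) pairs.
    """
    return [g if score >= GC_THRESHOLD else None for g, score in calls]

def call_rate(genotypes):
    """Fraction of non-missing genotype calls."""
    return sum(g is not None for g in genotypes) / len(genotypes)

# Number of minor alleles a parent with a given genotype can transmit.
TRANSMITTABLE = {0: {0}, 1: {0, 1}, 2: {1}}

def mendelian_consistent(parent1, parent2, child):
    """True if the child's genotype can arise from the two parental genotypes."""
    return any(a + b == child
               for a in TRANSMITTABLE[parent1]
               for b in TRANSMITTABLE[parent2])

# Hypothetical calls for one SNP in four samples.
calls = [(0, 0.91), (1, 0.18), (2, 0.77), (1, 0.25)]
filtered = apply_gencall_filter(calls)
print(filtered, call_rate(filtered))

# Hypothetical parent-parent-child trios.
print(mendelian_consistent(0, 2, 1))  # obligate heterozygote: consistent
print(mendelian_consistent(0, 0, 1))  # minor allele from nowhere: an error
```

A P-P-C error rate of 0, as reported, corresponds to every trio passing a check of this kind at every SNP.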

Statistical Analysis
Out of 1,078 women who had relative telomere length measurements in baseline samples, the current analysis was restricted to 989 non-Hispanic white women, comprising 325 incident breast cancer cases and 664 subcohort members. Relative telomere length was not associated with breast cancer in our data [23], and minor allele frequencies of the SNPs were highly correlated between cases and subcohort members (Pearson r = 0.9981).
Relative telomere length measurements were skewed to the right and therefore were logarithmically transformed. The association of each SNP with relative telomere length was estimated using linear regression models that included age as a continuous variable and breast cancer status. The model fit was evaluated by comparing additive, dominant and recessive linear regression models. Reported p-values are nominal two-tailed p-values and have not been corrected for multiple comparisons. All analyses were performed using Stata 10.0 (College Station, TX).
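The three genetic models differ only in how the genotype enters the regression. The following minimal sketch shows one way to code them and fit the adjusted model; it uses Python with NumPy rather than Stata, the function names are hypothetical, and the data are simulated purely for illustration (the effect sizes and allele frequency are made up and do not come from the study).

```python
import numpy as np

def code_genotype(minor_allele_count, model):
    """Recode 0/1/2 minor-allele counts under a genetic model."""
    g = np.asarray(minor_allele_count, dtype=float)
    if model == "additive":
        return g                        # per-allele effect
    if model == "dominant":
        return (g >= 1).astype(float)   # carriers vs. non-carriers
    if model == "recessive":
        return (g == 2).astype(float)   # minor homozygotes vs. all others
    raise ValueError(f"unknown model: {model}")

def fit_snp(log_tl, genotype, age, case, model):
    """Ordinary least squares of log relative telomere length on the coded
    genotype, adjusted for age (continuous) and breast cancer status.
    Returns (genotype coefficient, its standard error)."""
    x = code_genotype(genotype, model)
    X = np.column_stack([np.ones_like(x), x, age, case])
    y = np.asarray(log_tl, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], np.sqrt(cov[1, 1])

# Simulated data: all parameter values below are arbitrary.
rng = np.random.default_rng(0)
n = 989
genotype = rng.binomial(2, 0.3, size=n)
age = rng.uniform(35, 74, size=n)
case = rng.binomial(1, 0.33, size=n)
log_tl = (0.5 - 0.004 * age + 0.05 * (genotype == 2)
          + rng.normal(0, 0.2, size=n))

for model in ("additive", "dominant", "recessive"):
    b, se = fit_snp(log_tl, genotype, age, case, model)
    # exp(b) is the ratio of geometric mean telomere length per unit of x
    print(f"{model:9s} beta={b:+.4f}  se={se:.4f}  ratio={np.exp(b):.3f}")
```

Because the outcome is log-transformed, the exponentiated coefficient is interpreted as a multiplicative difference in relative telomere length between genotype groups.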