A Genome-Wide Association Study Identifies a Locus on TERT for Mean Telomere Length in Han Chinese

Leukocyte telomere length (LTL) is a predictor of aging and a number of age-related diseases. We performed genome-wide association studies of mean LTL in 2632 individuals,with a two-stage replication in 3917 individuals from Chinese populations. To further validate our findings, we get the results of 696 samples from a cohort of European ancestry. We identified two loci associated with LTL that map in telomerase reverse transcriptase (TERT; rs2736100, P = 1.93×10−5) on chromosome 5p15.33 and near keratin 80 (KRT80; rs17653722, P = 6.96×10−6) on 12q13.13. In Chinese population each C allele of rs2736100 and T allele of rs17653722 was associated with a longer mean telomere length of 0.026 and 0.059 T/S, respectively, equivalent to about 3 and 7 years of average age-related telomere attrition. Our findings provide new insights into telomere regulatory mechanism and even pathogenesis of age-related diseases.


Introduction
Telomeres are structures at the ends of eukaryotic chromosomes, which are made up of a repetitive sequence (in humans, TTAGGG) and have a major role in genomic stability. Telomere length is important in determining telomere function, whose dysregulation can lead to cell death, cell senescence, or abnormal cell proliferation [1]. In humans, leukocyte telomere length (LTL) progressively shortens with age because of the inability of DNA polymerase to fully replicate the 39 end of the DNA strand in mitotic division, and is frequently reported to be relatively shorter in aging-related diseases: such as Alzheimer's disease [2] and vascular dementia [3]. LTL varies among individuals with the same age, and is found to be inheritable in quantitative-trait linkage analyses of sib pairs, with heritability estimates ranging from 36% to 86% [4][5][6][7].
In some cells, such as germ cells and proliferative stem cells that require renewal, telomere length is maintained by telomerase, a large RNA-protein complex consisting of a reverse transcriptase (TERT) and an RNA template (TERC) [8,9]. The latter component complexes with TERT and provides the template for the synthesis of telomere repeats. Mutations in the TERT and TERC genes cause telomere shortening and are the major risk factors for rare syndromes, including idiopathic pulmonary fibrosis, aplastic anemia and dyskeratosis congenita [10]. TERT gene at 5p15.33 encodes the catalytic protein component of telomerase. However, a consistent conclusion could not be drawn from the association studies between TERT and LTL. Atzmon et al. reported that one haplotype of TERT (consisting of rs2853669, rs2736098, rs33954691 and rs2853691) was associated with LTL [11], but this association couldn't be verified by Soerensen et al. [12]. SNPs at 5p15. 33 have been reported to be associated with risk of lung cancer in Chinese Populations [13,14]. TERC, which serves as a template for addition of multiple 6 bp (TTAGGG) telomere repeats, is another main component of telomerase. The association between TERC and LTL has been identified by two recent genome-wide association studies (GWAS) [15,16], and this association was validated in different studies including a recently report in the Han Chinese population [12,17]. Notably, recent genome-wide studies of populations of European ancestry have revealed additional variants associated with LTL [7,16,18,19,20]. However, much of the heritability of LTL remains unaccounted for, and the pathways or biological mechanisms that underlie LTL variation are still largely unknown, e.g. TERC only accounts for no more than 1% of variation in telomere length [15,16]. To identify more variants that affect telomere length, we performed GWAS analyses in one Han Chinese cohort and then replicated promising signals in two further Han Chinese cohorts.

Results
The discovery cohort comprised 2,632 individuals (1,318 Type 2 diabetes patients and 1,312 health controls), which had been used and described in our previous study [21]. Further details of the cohort are given in Table S1. Principal-component analysis (PCA) was used to evaluate the population structure of samples of the discovery stage in comparison to the Hapmap populations, and the first two eigenvectors were plotted in Figure S1, which differentiating the Asian populations (CHB and JPT) and the study samples from the Caucasian (CEU) and Yoruba (YRI) samples clearly. LTL were approximately distributed normally ( Figure S2) and showed the expected decline according to ages ( Figure S3).
We analyzed the association of T/S ratio with individual's genotype in the discovery cohort, adjusted for age, gender and disease status. The quantile-quantile plots for the discovery cohort are shown in Figure S4, which showed little evidence for inflation due to population stratification (genomic inflation factor l = 1.011). In the initial study, no SNP reached genome-wide significance (P,5.0610 28 ), including previously reported SNPs on TERC [11] and OBFC1 [12] ( Figure S5). We set a pragmatic significance threshold of P,1610 24 in the discovery cohort to determine which SNPs to be selected for replication study in an independent cohort comprised of 2,533 individuals (1,173 subjects with Type 2 diabetes and 1,360 healthy controls) (Rep1 cohort). Details of the Rep1 cohort are given in Table S1.
Totally, 20 SNPs were selected for replication, including one SNP rs2736100 (P GWAS = 8.29610 23 ), located on TERT (Table  S2). We also z-transformed mean leukocyte telomere lengths in individual cohort to obtain comparable results and performed a meta-analysis (Table S3). We successfully replicated the association findings near genes KRT80 and TERT in Rep1 cohort, and the associations in the GWAS-Rep1 reached nominal significance (KRT80: rs17653722, P GWAS-Rep1 = 5.85610 27 , p-values for the heterogeneity test: P het = 0.463; TERT: rs2736100, P GWAS-Rep1 = 4.03610 24 , P het = 0.942) ( Table 1), and the direction of the effect was consistent with our findings in the discovery stage. And, we examined the association of these two SNPs with telomere length in additional cohort of Han Chinese (Rep2 cohort) comprised of 1,384 individuals (618 Type 2 diabetes and 766 healthy controls). As shown in Table 1, the meta-analysis results of both rs17653722 (P GWAS-Rep1-Rep2 = 6.96610 26 , P het = 0.071) and rs2736100 (P GWAS-Rep1-Rep2 = 1.93610 25 , P het = 0.790) still showed significant associations with telomere length. For those markers reported to have associations with LTL in previously GWAS, we only found rs12638862, which was in high linkage disequilibrium (r 2 = 0.928 in CHB) with rs1317082 in TERC, showed association with LTL in our GWAS results (P GWAS = 5.57610 23 , A allele b = 0.027). A full listing of SNPs is provided in Table S5.
To further validate our findings, we get the genotyping results of rs2736100 in an additional cohort of European ancestry (Rep3 cohort), including 696 samples in total (Table S1). By meta-analysis, we got significant association of rs2736100 (P GWAS-Rep1-Rep2-Rep3 = 4.45610 26 , P het = 0.915), which strongly support our results.

Discussion
Notably, rs2736100 is located in intron 2 of TERT, which encodes telomerase reverse transcriptase (Figure 1b), played a crucial role in protection of telomere integrity. Interestingly, rs2736100 has been reported to be associated with lung cancer risks in populations of both European descent and Asian ancestry [22]. However, it is unclear whether LTL is predictive of lung cancer risk: retrospective studies report that lung cancer patients at the time of or after diagnosis have shorter telomeres than unaffected controls [23][24][25], but prospective studies report that longer telomere length is associated with lung cancer risk including a study in China [26,27]. We found the risk allele C in lung cancer susceptibility corresponds to a longer telomere length, which is consistent with the results that individuals with longer LTL may have an increased risk of lung cancer. Our finding suggests the TERT locus may influence the cancers development through variation in LTL.
To date, the TERT locus has been identified in several independent populations for its association with telomere length. Atzmon et al. identified a common TERT haplotype that was associated with telomere length, but their association analysis of common TERT variants showed no significant association [11]. Bojesen et al. found SNPs at the TERT locus were associated with telomere length and SNP rs7705526 had the largest effect [28]. However, in their study, SNP rs2736100 was not highly correlated with rs7705526 (r 2 = 0.52), nor did it show independent associations with telomere length after adjustment for rs7705526. Codd et al. identified seven loci affecting mean telomere length including rs2736100 at 5p15.33 (TERT) locus [20]. We found rs2736100 at the TERT locus was associated with mean LTL in Chinese population, observing the association between the minor allele [C] of rs2736100 and LTL (joint P = 1.93610 25 ) with an effect size and direction consistent with that of Codd et al.
On 12q13.13, rs17653722 is located ,2 kb downstream of KRT80, encoding keratin 80 (Figure 1a). This protein is involved in cell differentiation, localizing near desmosomal plaques in earlier stages of differentiation but then dispersing throughout the cytoplasm in terminally differentiating cells. In expression data from the lymphoblastoid cell lines, rs17653722 is nominally associated with the expression of several genes (ACVR1B, C12orf44 and KRT80; P,0.05) (Table S4); According to the cohort used in GWAS stage, we deduced that the age-telomere declining formula for Chinese population is ''(T/ S ratios) = 20.00816YEAR+1.56, R 2 = 0.052, P,10 216 )'', which indicates that LTL declines on average by 0.0081 T/S per year between the ages of 20 and 90. In Chinese population each C allele of rs2736100 and T allele of rs17653722 was associated with a longer mean telomere length of 0.026 and 0.059 T/S, respectively, equivalent to about 3 and 7 years of average agerelated telomere attrition (Table S2), which showed similar effect size as TERC, a key component of telomere length maintaining system [17].
One limitation of this study is that the sample size was not large enough and the statistical power was relatively low to identify common variants of small effects with genome-wide significance. Another limitation is that we were unable to control more environmental factors because the information was not available. The third limitation is our study used different samples as the calibrator sample of telomere length measurement in each cohort. Therefore, we z-transformed the individual telomere length measurements in each cohort to obtain comparable results. Despite these limitations, we conducted a genome-wide association study of mean LTL in Han Chinese for the first time. In summary, we have identified common genetic variants on 5p15.33 and 12q13.13 that affect telomere length in the Han Chinese population. Several promising candidate genes, including TERT, are implicated in these regions. Given the importance of telomeres in cellular function and the central role of telomere length in determining telomere function, the identification of these new common genetic risk variants is the first step of elucidating the telomere regulatory mechanism and even pathogenesis of agerelated diseases. And, future studies aimed at accurately mapping these regions could be fruitful.

Ethics Statement
The study protocols were approved by the institutional review board of the ethics committee of the Shanghai Institute for Biological Sciences and conducted according to the Declaration of Helsinki Principles.

Participants
The characteristics details of the three cohorts are shown in Table S1.Clinical information, such as age and family history, was collected by questionnaire. Written informed consent was obtained from all subjects before enrollment.

DNA extraction
Venous blood samples anti-coagulated with EDTA were collected from all participants. High-molecular-weight genomic DNA was prepared from venous blood using the Quick Gene 610L Automatic DNA/RNA Extraction System (Fujifilm, Tokyo, Japan) and diluted to working concentrations of 50 ng/ml for SNP chip genotyping and 10-20 ng/ml for replication genotyping.

Leukocyte telomere length (LTL) measurement
Mean LTL was measured using a previously described modified version [29] of the quantitative real-time PCR-based assay [30]. Relative telomere length was calculated as a T/S ratio with Rnase P as a reference (ABI) for each sample. For each sample the quantity of telomere repeats and the quantity of Rnase P reference were determined in duplicate in 10 ml reactions in the same plate on an ABI-Applied Biosystems 7900 HT Thermal Cycler (Applied Biosystems).
Telomere reaction contained 16 Sybr green Taqman Gene Expression master mix (Applied Biosystems, Foster City, California), 300 nM of Tel-F, 300 nM Tel-R primers and 1 ng of template DNA.
(Primers: Tel-F: 59-CGGTTTGTTTGGGTTTGGGTTTGGGTTTGGGTTTG-GGTT-39; Tel-R: 59-GGCTTGCCTTACCCTTACCCT-TACCCTTACCCTTACCCT-39). A commercial kit was used according to the manufacturer's instructions to estimate the expression of the RNase P gene as an internal standard (TaqManRNase P Detection Reagents Kit, Applied Biosystems), using 16 Primers and TaqManH probe reagent, 16 TaqManH Genotyping Master mix and 3 ng of template DNA. Cycling conditions for telomere and RNase P were 90uC incubation for 10 mins followed by either 50 cycles of 95uC for 15 sec or 60uC for 1 min.
Alongside the samples each run also contained a Calibrator sample using pooled DNA. Dilution series (0.675-5 ng in two-fold dilutions) were run for both telomere and RNase P assays to establish the linear range. Good linearity was observed across this range (R 2 .0.99). Any samples outside this range were diluted and run again. For quality control all samples were run in duplicate and the duplicate values were checked for concordance. Samples showing a CV .2% were excluded and re-run. In addition, to test reproducibility of the assay, multivariable samples were randomly chosen and run again and a high level of agreement between the T/S ratios from the two runs was observed (R 2 .0.85, P,0.0001).

GWAS genotyping and quality control
The genome-wide association analysis was performed using the Affymetrix Genome-Wide Human SNP Array 6.0. Quality control (QC) filtering of the GWAS data was performed by excluding arrays with Contrast QC ,0.4 from further data analysis. Genotype data were generated using the birdseed algorithm [31]. For sample filtering, arrays that generated genotypes at ,95% of loci were excluded. For SNP filtering (after sample filtering), SNPs with call rates ,95% in either cases or controls were removed in each geographic group. SNPs with minor allele frequency (MAF) ,1% or significant deviation from Hardy-Weinberg equilibrium (HWE; P,0.001) in controls were also excluded. SNPs passing QC were used for further analysis. After QC filtering, there were 585,206 SNPs remaining in the discovery stage.

SNP selection criteria and replication genotyping
We selected representative SNPs fromthe ones with P GWAS ,10 24 for the replication study. Genotyping for the replication study was performed MGB Taqman probe assays from Applied Biosystems, using TaqMan Universal PCR Master Mix reagent kits under the guidelines. PCR plates were read on an ABI7900 system (Applied Biosystems, Foster City, CA, USA). Duplicates of randomly chosen samples were genotyped for quality control and duplicate concordance rates were 100%.

Analysis of population substructure
Population substructure was evaluated using principal-components analysis (PCA) ( Figure S1) using EIGENSTRAT software Table 1. Z-score based meta-analysis results for GWAS and replication studies for two leading SNPs on 5p15.33 and 12q13.13.  [32,33]. Twenty components, some of which were predicted to reflect ancestry differences among subjects, were generated for each sample. Logistic regression was used to determine whether there was a significant difference in component scores between cases and controls; significant components were used as covariates in the association analysis to correct for population stratification.

Statistical methods
Mean telomere length was considered as a quantitative trait, and expressed as a T/S ratio. Multivariable linear regression was conducted to analyze the association of the T/S ratio with SNPs, after adjusted for age, gender, and diabetes status using PLINK [34]. HWE analysis was performed using PLINK, and Haploview [35] was used to generate genome-wide P plots. Quantile-quantile plots were created using the R package, and regional plots were generated using LocusZoom (see URLs). We z-transformed the individual telomere length measurements in each cohort by subtracting the mean and division by the standard deviation and a meta-analysis using the random-effect model was carried out on the basis of the results of the three cohorts using PLINK. Heterogeneity across the data sets was evaluated using the Cochran's Q test. The meta-analysis was carried out using the Mantel-Haenszel method with a random-effects model.

Expression quantitative trait loci (eQTL) analysis
Expression profiles were analyzed in the lymphoblastoid cell lines eQTL data sets [36,37]. The expression data set were downloaded from the NCBI GEO database. The data setconsists of gene expression profiles generated using RNA extracted from lymphoblastoid cell lines derived from the 210 unrelated HapMap individuals from four sample groups (60 CEU, 45 CHB, 45 JPT and 60 Yoruba in Ibadan (YRI)). Expression analysis was performed using Sentrix Human-6 Expression BeadChips (Illumina). The SNP genotypes from HapMap 2 were used in the analysis. The eQTLs were tested by linear regression of normalized expression levels on SNP genotypes (coded as the number of minor alleles at each SNP: 0, 1 or 2). Analyses were conducted for each population and the combined data set.  Figure S1 The principal components analysis (PCA) of samples of discovery stage and HapMap individuals (CEU, CHB+JPT and YRI). Samples from different geographical origin were marked by different colors.

Supporting Information
(TIF) Figure S2 The distribution of LTL in four cohorts. Figure  S2 shows the distribution of LTL in the four cohorts. Telomere length was normally distributed in all cohorts.
(TIFF) Figure S3 Age-related LTL plot in four cohorts. Figure S3 shows the age relationship of the T/S ratio in the four cohorts. All cohorts showed the expected decline in LTL in individuals of increasing age. Regression lines are shown in black. In the cohort used in GWAS stage, we derived an age-telomere declining formula for Chinese population as ''(T/S ratios) = 20.00816 YEAR+1.56, R 2 = 0.052, P,10 216 )'', which indicates that, LTL declined on average by 0.0081 T/S per year between the ages of 20 and 90. (TIFF) Figure S4 Quantile-quantile Plots of GWAS stage datasets.
(TIF) Figure S5 Manhattan plot in the discovery stage.
(TIF)     The results of those markers published in previously genome wide LTL association studies in our study.
(DOC) Figure 1. Regional plots of the two loci associated with telomere length. Results are shown for the 12q13.13 (a) and 5p15.33 (b) regions. Top, 2log 10 P values are shown for SNPs for the region 350 kb on either side of the marker SNPs. P GWAS is for results obtained from the discovery stage and is shown for genotyped (circle). P GWAS-Rep1-Rep2 is for results obtained from the combination of the initial and replication study data (diamond). The marker SNP is shown in purple, and the r 2 values of the other SNPs are indicated by color. The genes within the relevant regions are annotated and shown as arrows. doi:10.1371/journal.pone.0085043.g001