Genetically predicted telomere length is associated with clonal somatic copy number alterations in peripheral leukocytes

Telomeres are DNA-protein structures at the ends of chromosomes that are essential for maintaining chromosomal stability. Observational studies have identified associations between telomere length and elevated cancer risk, including risk of hematologic malignancies, but the biologic mechanisms relating telomere length to cancer etiology remain unclear. Our study sought to better understand the relationship between telomere length and cancer risk by evaluating genetically-predicted telomere length (gTL) in relation to the presence of clonal somatic copy number alterations (SCNAs) in peripheral blood leukocytes. Genotyping array data were acquired from 431,507 participants in the UK Biobank and used to detect SCNAs from intensity information and to infer telomere length using a polygenic risk score (PRS) of variants previously associated with leukocyte telomere length. In total, 15,236 individuals (3.5%) had a detectable clonal SCNA on an autosomal chromosome. Overall, higher gTL value was positively associated with the presence of an autosomal SCNA (OR = 1.07, 95% CI = 1.05–1.09, P = 1.61×10^−15). There was high consistency in effect estimates across strata of chromosomal event location (e.g., telomeric ends, interstitial or whole chromosome event; Phet = 0.37) and strata of copy number state (e.g., gain, loss, or neutral events; Phet = 0.05). Higher gTL value was associated with a greater cellular fraction of clones carrying autosomal SCNAs (β = 0.004, 95% CI = 0.002–0.007, P = 6.61×10^−4). Our population-based examination of gTL and SCNAs suggests that inherited components of telomere length do not preferentially impact autosomal SCNA event location or copy number status, but rather likely influence cellular replicative potential.


Author summary
Telomeres lie at the ends of chromosomes and protect them from damage and chromosomal fusions. Recent studies have identified relationships between telomere length and cancer

Introduction
Telomeres consist of hexanucleotide DNA repeats and a protein structure at chromosome ends that protect genetic information by maintaining chromosomal stability during cellular division [1,2]. Each time a cell divides, small amounts of telomeric DNA are lost due to DNA polymerase's inability to fully extend 3′ DNA ends [1]. Consequently, telomeres shorten with each cell division and can be a marker of cellular aging [1,3,4]. Telomere attrition over time results in critically short telomere lengths and leads naturally to cellular senescence and/or apoptosis in normal cells; cancer cells can bypass this process through upregulation of telomerase as well as inactivation of TP53 or RB or both [5,6]. In contrast, increased telomere length may promote tumorigenesis by allowing cells to continue to divide despite accumulation of mutations and genomic instability [5,7-9]. Epidemiological studies have found genetically-predicted telomere length (gTL) to be associated with increased risk of some hematologic malignancies (e.g., chronic lymphocytic leukemia, small lymphocytic lymphoma) [10] and solid tumors [2,11-13], but the biologic mechanisms connecting telomere length with cancer etiology remain unclear. Genetic mosaicism is the presence of two or more genetically different cell populations in an individual [14]. Clonal expansion of cells harboring somatic copy number alterations (SCNAs; i.e., non-inherited mutations in chromosome copy number resulting in genomic deletions or amplifications as well as copy neutral loss of heterozygosity) results in heterogeneous cellular populations with genetic mosaicism [14]. SCNAs in peripheral blood leukocytes have been robustly associated with increasing age, with approximately 10-20% of individuals acquiring a detectable autosomal SCNA by age 80 [15,16].
Clonal SCNAs in leukocytes have been associated with increased risk of hematopoietic cancers, such as leukemia, lymphoma, and multiple myeloma [17-20], suggesting that SCNAs may reflect underlying chromosomal instability. SCNAs may be related to telomere length, as a representation of overall chromosome degradation or a manifestation of cellular replicative potential, and may further serve as one of many potential mechanisms linking inherited telomere length to elevated cancer risk. Previous studies also suggest that the telomerase reverse transcriptase (TERT) gene is associated with both clonal hematopoiesis and autosomal mosaic events, further supporting a potential association between telomere length and clonal SCNAs [17,21,22].
To our knowledge, no prior study has systematically evaluated the relationship between telomere length and SCNAs, though studies on each measure separately have implicated these measures as risk factors for chronic diseases (e.g., cancer, diabetes) [2,14]. This study aimed to use existing UK Biobank genotyping array data to evaluate the association between gTL and clonal SCNAs.

Results
A total of 431,507 individuals with complete genotyping data were included in this analysis. Mean gTL was similar between males and females (P = 0.6360), but differed by age quartile, smoking status, and self-reported ethnicity (P < 2×10^−16; Table 1). The differences observed in gTL by age quartile and smoking status were no longer significant when adjusted for self-reported ethnicity (P = 0.0800 and P = 0.4295, respectively; S1 and S2 Tables), as both age and smoking vary by ethnicity, which is an important determinant of gTL. Participant demographic characteristics by autosomal SCNA status are described in Table 2. Overall, 15,236 (3.5%) participants had at least one detectable clonal SCNA on an autosomal chromosome. Compared to individuals without autosomal SCNAs, those with autosomal SCNAs were more likely to be male, tended to be older, were more likely to be former or current smokers, and had a higher proportion of European genetic ancestry (P < 2×10^−16). Additionally, participants with autosomal SCNAs had, on average, significantly higher gTL values than those without autosomal SCNAs (P = 3.54×10^−9).
PLOS GENETICS

Further analyses stratified by chromosomal event location (telomeric, interstitial or whole chromosome events) and copy number state (gain, loss, neutral, or undetermined events) also identified an association between autosomal SCNA status and higher gTL value regardless of SCNA chromosomal location or copy number state (Table 3). Tests for heterogeneity indicated no evidence for differences in associations by chromosomal event locations (Phet = 0.3707) or copy number states (Phet = 0.0502). Analyses were also conducted to explore whether the proportion of cells impacted by autosomal SCNAs was associated with gTL. Multivariable results found a positive association such that as gTL increases, so does the proportion of cells with autosomal SCNAs (β = 0.004, 95% CI = 0.002-0.007, P = 6.61×10^−4). Similarly, higher gTL value was associated with a greater expected number of autosomal SCNAs among participants with autosomal SCNAs (IRR = 1.07, 95% CI = 1.02-1.11, P = 1.94×10^−3). The distribution of the total number of autosomal events for each participant is given in S1 Fig.
All analyses were repeated to additionally assess the association between gTL and SCNAs within the sex chromosomes (chromosome X and Y). In total, 12,200 (5.2%) female participants had a chromosome X SCNA, and 38,685 (19.6%) male participants had chromosome Y loss. Only chromosome Y loss was considered for our analyses, as the majority of chromosome Y SCNAs represent a loss. Overall, there was a positive association between gTL and chromosome X SCNAs (OR = 1.04, 95% CI = 1.03-1.06, P = 5.87×10^−6; S7 Table). gTL was also positively associated with the proportion of cells affected with chromosome X SCNAs (β = 0.002, 95% CI = 0.0002-0.003, P = 2.35×10^−2), but not the expected number of chromosome X SCNAs (IRR = 1.45, 95% CI = 0.89-2.35, P = 0.1300), as few women had more than one chromosome X event (N = 14). Conversely, overall analyses suggested gTL was negatively associated with mosaic loss of the Y chromosome (OR = 0.97, 95% CI = 0.96-0.99, P = 2.71×10^−5). Subset analyses by mosaic fraction suggest low cell fraction events (LRR > −0.05; N = 25,338) are the predominant driver of this negative association between gTL and mosaic loss of the Y chromosome; however, analyses on a small set of individuals with greater degrees of mosaic loss of Y (LRR ≤ −0.40; N = 505) suggest a positive association may exist for higher cell fraction events (LRR ≤ −0.40: OR = 1.10, 95% CI = 1.01-1.21, P = 0.0340; S8 Table).
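The odds ratios (OR) and incidence rate ratios (IRR) reported in this section are exponentiated coefficients from logistic and Poisson models, respectively. Since the PRS was standardized, each is presumably expressed per one standard deviation increase in gTL. As a minimal sketch, assuming a Wald-type interval (the text does not state how the confidence intervals were computed), the conversion from a log-scale coefficient and its standard error could look as follows. The function name `ratio_with_ci` and the input values are hypothetical and are not estimates from this study.

```python
import math

def ratio_with_ci(coef, se, z=1.959964):
    """Exponentiate a log-scale coefficient and its Wald confidence limits.

    coef: coefficient on the log-odds (logistic) or log-rate (Poisson) scale
    se:   standard error of that coefficient
    z:    normal quantile for the interval (1.959964 gives a 95% interval)
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    return (math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se))

# Illustrative inputs only (assumed values, not taken from the study).
odds_ratio, lower, upper = ratio_with_ci(coef=0.05, se=0.01)
```

A ratio above 1 with an interval that excludes 1 corresponds to a positive association at the chosen significance level.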

Discussion
Our findings from this large observational study of 431,507 individuals suggest that gTL is associated with the presence of autosomal SCNAs in peripheral blood leukocytes. We observed consistent associations between gTL and autosomal SCNAs by strata of chromosomal event location and copy number state. We also observed that gTL was positively associated with both the autosomal SCNA cellular fraction and the number of SCNA events. These results suggest telomere length has potentially less influence on chromosomal location or copy number status of SCNAs, but rather that longer telomeres could be associated with clonal expansion of SCNAs by increasing cellular replicative potential.
This study represents the first population-based study to assess the relationship between gTL and SCNAs. As measured telomere length was not available from the UK Biobank data, our study used genetic variants associated with leukocyte telomere length as a proxy for measured telomere length [23]. This genetic approach to estimating telomere length does not contain the biases generally attributable to measured telomere length (e.g., differences in DNA extraction or storage [24]). Several studies have demonstrated the power of this approach to identify associations between telomere length and a variety of outcomes [2,10-13,25,26]. Additionally, previous studies that have derived gTL using a PRS have incorporated variant weights from several different genome-wide association studies (GWAS) [2,10,11], which may yield weight estimates that are not comparable because of study-specific differences. Our study only utilized weights from a single large telomere length GWAS [23]. This ensures that all SNP weights are on the same scale, potentially leading to a more accurate estimate of gTL within our study.
SCNAs in the sex chromosomes were analyzed separately from the autosomes as mosaic events on the sex chromosomes vary by frequency and location on the sex chromosomes [19,27,28]. The positive associations found between gTL and autosomal SCNAs were also observed within chromosome X SCNAs. This replication further supports a relationship between gTL and SCNAs. Although an association between gTL and mosaicism was observed with chromosome Y SCNAs, the observed association was negative, potentially reflecting unique molecular drivers of somatic copy number alterations on the haploid male Y chromosome [29]. Further analyses of prior identified Y loss susceptibility variants around telomere-related genes (e.g., RPN1, TERT, ATM, TCL1A) [30-32] indicated these variants were in linkage disequilibrium and negatively associated with telomere length-increasing variants [33], suggesting some component of telomere length may be an important contributor to mosaic Y loss even though the direction of association differs for mosaic Y loss. The telomere length-associated PRS used in our current analysis contains only currently discovered telomere length-associated variants, which likely represent only a small fraction of all genetic variants associated with telomere length and therefore may not capture all telomere length associations.
Our study is not without limitations. While the original GWAS results used to derive our PRS for gTL found 22 telomere length-associated variants [23], two of the variants, rs547680822 and rs4027719, were not found in the UK Biobank imputed data. rs547680822 has a reported alternate allele frequency of 0% within TOPMed European samples [23], so it is not surprising that the variant was not found within our data. Additionally, rs4027719 is an indel on chromosome 11 which was not included in the UK Biobank imputed genotype data. While our analysis is missing data on these two telomere length-associated variants, the resulting missingness is minimal, especially for the rs547680822 variant which is rare in Europeans, and is not anticipated to appreciably change analytic conclusions. Furthermore, the original TOPMed GWAS results did not include standard errors for the weights of the 22 telomere length-associated variants [23], which precluded rigorous Mendelian randomization (MR) tests using our original panel of included variants. Instead, MR tests were conducted using available summary statistics from a telomere length GWAS performed in Europeans [34]. This alternative telomere length variant panel has considerable overlap with our original panel, but also includes a few novel loci. Results from MR analyses provide additional evidence for a directional relationship between the telomere length-associated variants and SCNAs.
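The text does not specify which MR estimators were applied. For illustration only, one widely used summary-statistic estimator is the fixed-effect inverse-variance weighted (IVW) estimate, which combines per-variant effects on the exposure and the outcome. The Python below is a minimal sketch under that assumption; the function name `ivw_estimate` and all input values are hypothetical and do not come from this study.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate and its standard error.

    beta_exposure: per-variant effects on the exposure (e.g., telomere length)
    beta_outcome:  per-variant effects on the outcome (e.g., log odds of an SCNA)
    se_outcome:    standard errors of the per-variant outcome effects
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("inputs must have one entry per variant")
    # Weight for each variant: squared exposure effect over outcome variance.
    weights = [bx * bx / (se * se) for bx, se in zip(beta_exposure, se_outcome)]
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one variant needs a non-zero exposure effect")
    numerator = sum(bx * by / (se * se)
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    return numerator / total, math.sqrt(1.0 / total)

# Hypothetical summary statistics for three variants (assumed values).
bx = [0.10, 0.20, 0.05]
by = [0.02, 0.04, 0.01]
se = [0.01, 0.01, 0.01]
estimate, std_err = ivw_estimate(bx, by, se)
```

In this toy input every variant has the same outcome-to-exposure ratio, so the pooled estimate equals that common ratio; real panels show heterogeneity between variants, which is what pleiotropy checks are designed to detect.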
Our analyses suggest a connection between longer gTL and SCNAs at the population level. Specifically, our analyses indicate cellular fraction of SCNA clones is the predominant driver of the association between gTL and SCNAs, indicating that inherited telomere length may be important for clonal expansion of hematopoietic stem cells harboring SCNAs. Additional studies are needed to further evaluate the association between telomere length and clonal SCNAs, particularly in relation to cancer risk. Prospective studies, which collect serial samples, may be useful in evaluating joint impacts of telomere length and clonal SCNAs over time and further help to explore possible biologic mechanisms that may lead to improved etiologic understanding, with potential relevance for cancer risk modeling and cancer prevention.

Data source
Existing data from the UK Biobank were used to investigate the association between leukocyte telomere length and SCNAs. Briefly, the UK Biobank is a large prospective study based in the United Kingdom which collected blood samples for genotyping as well as medical history and environmental exposures from study participants between 2006 and 2010 [35]. In total, approximately 500,000 participants were genotyped, and additional health outcomes data are available from linked UK national registries and hospital records. Each participant provided signed informed consent at enrollment. Blood samples were provided by participants after informed consent and sent to a central laboratory to be processed, aliquoted and cryopreserved. All research was performed in accordance with relevant regulations, and the UK Biobank study was approved by the National Information Governance Board for Health and Social Care and the National Health Service North West Multi-centre Research Ethics Committee. Data used in this analysis are available through application to the UK Biobank.

Study variables
Participant characteristics were obtained from questionnaire results at the time of enrollment. Based on these results, a detailed 25-level smoking variable was created to give a complete overview of each participant's lifetime smoking history, which included information on smoking status, smoking duration, smoking intensity, time since quitting (for former smokers), and the type of tobacco smoked [29]. Genetic ancestry proportions were also inferred for each participant using SNPWEIGHTS, which uses SNP weights computed from large reference panels to estimate genetic ancestry [36]. This approach to estimating ancestry has several advantages over reference-free approaches (e.g., principal component analysis), namely that it does not depend on sample size and can be used with related individuals [36,37]. The percentage of African, Asian, and European ancestry was computed for each participant, with European ancestry serving as the reference category in analyses. We also performed principal component analysis with SMARTPCA [38,39] on all included UK Biobank participants using the ancestry informative SNP panel described by Yu et al. [40] to calculate the first 10 principal components for a sensitivity analysis in place of genetic ancestry proportions.
Measured leukocyte telomere length was not available within the UK Biobank data. Instead, for each study participant, leukocyte telomere length was estimated using a polygenic risk score (PRS) [2,11]. This telomere length-associated PRS explains approximately 1.5% of the variation in leukocyte telomere length and contains 22 germline genetic variants that were identified as associated with leukocyte telomere length in a GWAS of TOPMed whole genome sequencing data (Table 4) [23]. The telomere length-associated PRS was calculated for the 22 telomere length-associated variants as:

$$\mathrm{PRS}_i = \sum_{j} w_j x_{ij}$$

where $\mathrm{PRS}_i$ is the risk score for individual $i$, $w_j$ is the weight assigned to each telomere length-associated variant given as the change in the number of estimated base pairs per telomere length-increasing allele, and $x_{ij}$ is the number of individual-specific telomere length-increasing alleles for the $j$th telomere length-associated variant. This PRS was further standardized to have mean 0 and standard deviation of 1 and used in analyses as a proxy for telomere length, where a higher standardized PRS value indicates longer gTL.
Clonal SCNAs were detected in study participants using intensity values and haplotype information from SNP genotyping data obtained by hybridizing blood-derived DNA to SNP microarrays (Affymetrix UK BiLEVE Axiom and UK Biobank Axiom arrays) [17,41,42]. Specifically, estimates of log2 R ratio (LRR), B allele frequency (BAF) and genetic haplotype were used to detect large structural clonal SCNAs [17,42]. LRR provides a metric for assessing copy number change (e.g., losses versus gains), whereas BAF is a measure of allelic imbalance (i.e., deviations from Mendelian allelic fractions) [42].
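The PRS defined earlier in this section is a weighted allele count followed by standardization. A minimal Python sketch of that calculation is given below. The three weights and the genotype rows are hypothetical placeholders rather than the published variant weights, the helper names `polygenic_score` and `standardize` are illustrative, and the use of the population standard deviation is an assumption, since the text does not say which form was used.

```python
from statistics import mean, pstdev

def polygenic_score(weights, allele_counts):
    """Weighted sum of telomere length-increasing allele counts for one person."""
    if len(weights) != len(allele_counts):
        raise ValueError("one weight is needed per variant")
    return sum(w * x for w, x in zip(weights, allele_counts))

def standardize(scores):
    """Rescale scores to mean 0 and standard deviation 1 (population SD assumed)."""
    mu = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("scores have no variation")
    return [(s - mu) / sd for s in scores]

# Hypothetical weights (base pairs per length-increasing allele) for three variants.
weights = [50.0, 30.0, 20.0]
# One row per individual; entries are counts (0, 1, or 2) of the length-increasing allele.
genotypes = [
    [0, 1, 2],
    [2, 2, 1],
    [1, 0, 0],
]
raw = [polygenic_score(weights, g) for g in genotypes]
gtl = standardize(raw)  # higher value indicates longer predicted telomere length
```

With imputed genotypes, the allele counts could be fractional dosages instead of integers; the same weighted sum applies.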
SCNA calls within autosomal chromosomes and the Y chromosome were previously generated using a hidden Markov model-based approach that detects allelic imbalances in long-range phased haplotypes (analyzing only the pseudoautosomal region for chromosome Y) [17,30,41]. We extended the previous analysis to the X chromosome, phasing all individuals together using Eagle2 as previously described [17,43] and then restricting the calling algorithm to females. To obtain improved estimates of mosaic fraction for Y chromosome events, LRR values were combined across the whole chromosome after removal of pseudoautosomal regions [29].
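To give intuition for how an LRR value relates to the fraction of cells carrying an event, the sketch below uses an idealized model in which array intensity is proportional to the average copy number across cells. This is an assumption made for illustration and is not the calling procedure cited above; measured intensities are noisy and typically compressed relative to this model. The function names are hypothetical.

```python
import math

def expected_lrr(cell_fraction, normal_copies, altered_copies):
    """Idealized LRR when a fraction of cells carries an altered copy number."""
    if not 0.0 <= cell_fraction <= 1.0:
        raise ValueError("cell_fraction must lie in [0, 1]")
    average_copies = ((1.0 - cell_fraction) * normal_copies
                      + cell_fraction * altered_copies)
    if average_copies <= 0:
        raise ValueError("average copy number must be positive")
    return math.log2(average_copies / normal_copies)

def y_loss_fraction(lrr):
    """Invert the idealized model for loss of the single male Y chromosome.

    With one normal copy and zero copies after loss, LRR = log2(1 - f),
    so the affected cell fraction is f = 1 - 2**LRR.
    """
    return 1.0 - 2.0 ** lrr

# Example: a Y loss present in 30% of cells under the idealized model.
lrr_30 = expected_lrr(0.30, normal_copies=1, altered_copies=0)
```

Under this idealized model, an LRR of −0.40 would correspond to Y loss in roughly a quarter of cells, and an LRR of −0.05 to loss in only a few percent of cells.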
The copy number state of each clonal SCNA was further characterized as gain, loss, neutral, or undetermined based on LRR deviation [17]. Events with low cellular fraction have small deviations in LRR values and are difficult to classify into a distinct copy number state. These events were not called for copy number state and were categorized as undetermined events. Clonal SCNAs were also categorized based on where in the chromosome the event occurred (e.g., telomeric ends, centromeric, interstitial, or whole chromosome event; S2 Fig). Chromosome sizes as well as centromeric positions were pulled from the UCSC Human Genome Browser [44]. Events were defined as follows: (1) events which only occurred around telomeric ends (±1 Mb from chromosome ends) were defined as telomeric events, (2) events which crossed the centromere were defined as centromeric, (3) events that spanned an entire chromosome were defined as whole chromosome events, and (4) all other events were defined as interstitial.
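The four location definitions above can be sketched as a small rule-based classifier. The text does not state how overlapping rules were resolved, so the precedence used here (whole chromosome, then centromeric, then telomeric, then interstitial), the reading of a telomeric event as one whose boundary falls within 1 Mb of a chromosome end, and the reading of a whole chromosome event as one reaching both ends within that tolerance are all assumptions. The function name and the coordinates in the example are hypothetical.

```python
TELOMERE_WINDOW_BP = 1_000_000  # the 1 Mb window around chromosome ends given in the text

def classify_event(start, end, chrom_length, cen_start, cen_end):
    """Assign an SCNA to one of four location classes.

    start, end:         event boundaries in base pairs
    chrom_length:       chromosome size in base pairs
    cen_start, cen_end: centromere boundaries in base pairs
    """
    touches_p_end = start <= TELOMERE_WINDOW_BP
    touches_q_end = end >= chrom_length - TELOMERE_WINDOW_BP
    crosses_centromere = start < cen_start and end > cen_end

    # Assumed precedence: an event reaching both ends spans the chromosome.
    if touches_p_end and touches_q_end:
        return "whole chromosome"
    if crosses_centromere:
        return "centromeric"
    if touches_p_end or touches_q_end:
        return "telomeric"
    return "interstitial"

# Hypothetical 100 Mb chromosome with a centromere at 40-43 Mb.
CHROM = 100_000_000
CEN = (40_000_000, 43_000_000)
labels = [
    classify_event(0, CHROM, CHROM, *CEN),
    classify_event(500_000, 20_000_000, CHROM, *CEN),
    classify_event(30_000_000, 60_000_000, CHROM, *CEN),
    classify_event(50_000_000, 70_000_000, CHROM, *CEN),
]
```

In practice the chromosome sizes and centromere boundaries would come from the genome build used for calling rather than from the placeholder values shown.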

Statistical analysis
Demographic characteristics were first described by gTL and autosomal SCNA status (yes/no). Multivariable logistic regression models were then used to assess the association between genetically predicted telomere length and autosomal SCNA status. Further multivariable models categorized SCNA status by both chromosomal event location and copy number state. The association between the autosomal SCNA cellular fraction, given as the proportion of cells affected, and gTL was assessed using a multivariable linear model. The number of autosomal SCNAs, given as a count variable, was further analyzed with multivariable Poisson regression models. In addition to the gTL analyses performed with a PRS, a variety of MR analyses were conducted in order to combine telomere length-associated variants into a genetic instrument [45-47]. As standard errors were not available for the 22 included TOPMed telomere length-associated variants, other available variants and summary statistics were used to conduct MR analyses [34]. Heterogeneity among the included variants was investigated, and any variants with detected pleiotropic effects (i.e., false discovery rate < 0.2) were removed from subsequent MR analyses [48]. Multivariable analyses adjusted for sex, age, age-squared (age^2), genetic ancestry, and detailed smoking status (25-level indicator variables). All statistical analyses were performed using a 64-bit build of R version 3.5.2, and two-sided significance levels were set at P < 0.05.

Supporting information
S1 Table. Association between genetically-predicted telomere length and chromosome X SCNAs by event type and copy number change. (DOCX)
S8 Table. Association between genetically-predicted telomere length and chromosome Y loss by log2 R ratio.