Are long telomeres better than short? Relative contributions of genetically predicted telomere length to neoplastic and non-neoplastic disease risk and population health burden

Background
Mendelian Randomization (MR) studies exploiting single nucleotide polymorphisms (SNPs) predictive of leukocyte telomere length (LTL) have suggested that shorter genetically determined telomere length (gTL) is associated with increased risks of degenerative diseases, including cardiovascular and Alzheimer’s diseases, while longer gTL is associated with increased cancer risks. These varying directions of disease risk have long begged the question: when it comes to telomeres, is it better to be long or short? We propose to operationalize and answer this question by considering the relative impact of long gTL vs. short gTL on disease incidence and burden in a population.

Methods and findings
We used odds ratios (OR) of disease associated with gTL from a recently published MR meta-analysis to approximate the relative contributions of gTL to the incidence and burden of neoplastic and non-neoplastic disease in a European population. We obtained incidence data for the 9 cancers associated with long gTL and the 4 non-neoplastic diseases associated with short gTL from the Institute for Health Metrics and Evaluation (IHME). Incidence rates of individual cancers from SEER, a database of United States cancer records, were used to weight the ORs in order to align with the available IHME data. These data were used to estimate the excess incidences due to long vs. short gTL, expressed per 100,000 persons per standard deviation (SD) change in gTL. To estimate the population disease burden, we used the Disability Adjusted Life Years (DALY) metric from the IHME, a measure of overall disease burden that accounts for both mortality and morbidity, and similarly calculated the excess DALYs associated with long vs. short gTL.

Results
Our analysis shows that, despite the markedly larger ORs of neoplastic disease, the large incidence of degenerative diseases causes the excess incidence attributable to gTL to balance that of neoplastic diseases. Long gTL is associated with an excess incidence of 94.04 cases/100,000 persons/SD (45.49–168.84, 95%CI) from the 9 cancers, while short gTL is associated with an excess incidence of 121.49 cases/100,000 persons/SD (48.40–228.58, 95%CI) from the 4 non-neoplastic diseases. When considering disease burden using the DALY metric, long gTL is associated with an excess 1255.25 DALYs/100,000 persons/SD (662.71–2163.83, 95%CI) due to the 9 cancers, while short gTL is associated with an excess 1007.75 DALYs/100,000 persons/SD (411.63–1847.34, 95%CI) due to the 4 non-neoplastic diseases.

Conclusions
Our results show that genetically determined long and short telomere length are associated with disease risk and burden of approximately equal magnitude. These results provide quantitative estimates of the relative impact of genetically-predicted short vs. long TL in a human population, and provide evidence in support of the cancer-aging paradox, wherein human telomere length is balanced by opposing evolutionary forces acting to minimize both neoplastic and non-neoplastic diseases. Importantly, our results indicate that odds ratios alone can be misleading in different clinical scenarios, and disease risk should be assessed at both the individual and the population level in order to draw appropriate conclusions about a risk factor’s role in human health.



Introduction
Telomeres are the protective ends of chromosomes consisting of a repeating DNA sequence, which function to preserve genome stability by buffering against the progressive loss of terminal DNA during cell division and other forms of cellular damage [1]. Telomeres shorten as a normal process of human aging, but individuals vary widely in their rate of attrition and in their measured telomere length at any given point in time. Accordingly, measures of telomere length (TL) and rate of attrition have been proposed as biomarkers for risks of age-related diseases [2]. Phenotypically measured telomere length (mTL) is the cumulative result of both genetic and non-genetic contributions [2]. The significant role of inheritance was demonstrated by early reports from twin and sibling studies, which estimated the heritability of telomere length to be 34-82% [3][4][5][6], while a meta-analysis from six independent family-based cohorts estimated the heritability to be 70% (95% CI 64%-76%) [7]. These estimates from family-based studies reflect both the genetic contribution and shared environmental factors due to the relatedness of the study participants, as well as a shared intrauterine environment in some cases, and are therefore likely to be higher than the true portion of telomere length variation determined by genetic inheritance [8]. Genetic inheritance of telomere length includes both genetic variation in non-telomeric regions and direct transmission of the telomere ends from the parental gametes to the zygote, and is therefore only partially determined by gene variants [9,10]. A recent study using genome-wide complex trait analysis (GCTA) estimated that additive genetic variance, that is, the totality of single nucleotide polymorphisms, contributed only 28% of total phenotypic variance of TL in a European American sample [11]. However, it should be noted that this study used salivary DNA for TL measurements, which are less robust than measurements based on leukocyte DNA.
Yet, as of now, only several dozen single nucleotide polymorphisms (SNPs) associated with phenotypically measured telomere length have been reported. Each of these SNPs accounts for a very small percentage of the telomere length variance [15,38]. Therefore, our current understanding of the genetic determinants of telomere length is still very limited.
Nevertheless, Mendelian randomization studies that use genetic scores composed of several TL-associated SNPs as instrumental variables have provided strong evidence for causal links between TL and several major diseases. The Mendelian randomization study design is less susceptible to confounding and reverse causation than observational studies of phenotypically measured telomere length (mTL), which is impacted by lifestyle and other environmental factors. The relationship between mTL and cancer clearly illustrates this point. Observational studies have reported associations of both short and long mTL with cancers in various study design settings [39][40][41][42]. This complex relationship between mTL and cancer may be due to the different roles telomere length plays in cancer at various stages of pathogenesis [43] and to the confounding effects of disease progression and treatment on telomere length.
Mendelian randomization designs have commonly implicated longer genetically predicted telomere length in increased risk for several cancers [44][45][46][47][48][49], while shorter genetically predicted telomere length is associated with increased rates of degenerative disease, such as cardiovascular disease, diabetes, COPD, and Alzheimer's disease [49][50][51][52]. It has been proposed that the opposing directions of these associations imply that human telomere length has evolved to balance the disease risks imposed by both short and long telomeres [53,54], thereby achieving an optimal length over successive generations. This argument would seem to overlook the fact that most of the diseases associated with long or short telomeres, with the exception of a number of rare conditions (e.g., neuroblastoma, testicular cancer, some types of leukemia and lymphoma, and type I diabetes mellitus), manifest in later life, well past the reproductive period, and are therefore unlikely to exert real evolutionary pressure on the population. However, this finding among contemporary humans could alternatively suggest that selection forces have acted on telomere length precisely to lessen the impact of degenerative and neoplastic diseases during the reproductive years [55,56].
These varying directions of disease risk beg the question: what is the relative impact of long gTL vs. short gTL on disease incidence and burden in a population? We sought to determine the contribution of genetically predicted telomere length to disease burden, including DALYs, in a European population for which relevant data were available. We defined gTL as classically genetically determined telomeres (assessed using SNPs). Here, a genetic sum score which included up to 16 telomere length associated SNPs was used as the proxy for gTL. This analysis did not take into account possible contributions from telomeres directly transmitted via parental germlines.
In the most extensive analysis published thus far, the recent meta-analysis by Haycock et al. examined the relationship between genetically determined telomere length (gTL) and 35 cancer types and 48 non-neoplastic diseases using a Mendelian Randomization design. Up to 16 TL-associated SNPs were used to approximate gTL in 420,081 cases and 1,093,105 controls from 45 published cohorts and one unpublished study. They found that 9 cancers (endometrial, ovarian serous LMP, testicular, bladder, kidney, lung adenocarcinoma, malignant skin melanoma, glioma, neuroblastoma) are associated with longer gTL, while 6 non-neoplastic diseases (coronary heart disease, aortic aneurysm, Alzheimer disease and other dementias, type I diabetes mellitus, interstitial lung disease and Celiac disease) are associated with shorter gTL. These results confirm and extend the notion that there exists an evolutionary tradeoff in the regulation of telomere length between cancer and degenerative disease [54].
However, their results, which are presented as odds ratios (ORs), also suggest associations of markedly larger magnitude between gTL and cancers than between gTL and degenerative diseases. For instance, glioma has an OR of 5.27 per SD increase in gTL, while coronary heart disease has an OR of only 0.78 per SD increase in gTL (corresponding to an OR of 1.28 per SD shorter gTL). This imbalanced pattern of effect would seem to suggest that long telomeres have a far larger impact on the incidence of cancer, and therefore far greater clinical significance for cancer, than shorter telomeres have for non-neoplastic diseases. It perhaps even challenges the utility of telomere length as a biomarker of aging, as it has been described in a body of work [2,57].
However, heart disease and Alzheimer's disease are common conditions with high incidence rates, while the cancers associated with long telomeres are much less common. These conditions also have vastly different disease courses and prognoses. As a result, it is unclear what the overall impact of telomere length is on the burden of disease experienced both by individuals and by populations. To this end, we propose a quantitative approach that takes into account the ORs of disease associated with TL, the incidence of specific diseases, and the population burden associated with those diseases in order to explore the role of telomeres in the health of modern populations. We used incidence and Disability-Adjusted Life Years (DALY) data published by the Institute for Health Metrics and Evaluation (IHME) at the University of Washington to consider how the disease risks associated with gTL manifest in a European population. DALYs can be thought of as years of 'healthy life' lost due to disease. They are calculated as the sum of two measures: Years of Life Lost (YLL), which reflects premature mortality attributable to disease, and Years Lost due to Disability (YLD), which reflects the burden of the disease experienced by the patient prior to death. Unlike prevalence, incidence, and mortality, DALYs allow for the direct quantitative comparison of diseases with vastly different trajectories. For instance, the burden of non-fatal chronic diseases such as cardiovascular disease can be directly compared to that of rapidly fatal diseases such as aggressive cancers. Our analysis provides, for the first time, quantitative estimates of the impact of short and long gTL in a human population.

Calculating excess disease incidence and burden associated with gTL
Odds ratios (ORs) of disease associated with gTL were taken from the 2017 Haycock et al. meta-analysis, which identified 16 SNPs as a genetic proxy for telomere length in order to assess the relationship with 56 primary-outcome diseases in a Mendelian Randomization design [49]. An additional 7 ORs of disease were taken from an original publication by Li et al. [38], in which ORs of disease were calculated for 122 diseases using 52 independent variants. Only statistically significant odds ratios were included in the present analysis.
Age-standardized Disability-Adjusted Life Years (DALY) and incidence data for the 2017 year were downloaded through the GBD Results Tool (http://ghdx.healthdata.org/gbd-resultstool). Only the "Europe" population was included in order to maximize consistency with the Haycock et al. meta-analysis, where all participants were of European ancestry.
For several cancers, Haycock et al. report odds ratios for narrower disease definitions than those used in the available IHME data. In order to most accurately approximate the excess impact of these conditions associated with gTL, we used data from the US National Cancer Institute's publicly available SEER Database (seer.cancer.gov) to interpolate the incidence and DALYs for the conditions defined by Haycock et al. Case counts were pulled from the SEER database for both the specific cancer (e.g., squamous cell carcinoma) and the broader category defined by the IHME (e.g., trachea, bronchus, and lung cancers), and the specific cancer's percentage of total cases was calculated. The IHME data were multiplied by this percentage to yield incidence and DALY values for the narrow disease definitions (see S1 File).
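The scaling step described above can be sketched as follows. This is a minimal illustration only: the function name and all numbers are hypothetical placeholders, not values from SEER, the IHME, or S1 File.

```python
def scale_to_narrow_definition(broad_value, narrow_cases, broad_cases):
    """Scale an IHME rate for a broad category to a narrower disease definition.

    broad_value  -- IHME incidence or DALY rate for the broad category
    narrow_cases -- SEER case count for the specific cancer
    broad_cases  -- SEER case count for the whole broad category
    """
    if broad_cases <= 0:
        raise ValueError("broad_cases must be positive")
    if not 0 <= narrow_cases <= broad_cases:
        raise ValueError("narrow_cases must lie between 0 and broad_cases")
    fraction = narrow_cases / broad_cases  # share of the broad category
    return broad_value * fraction


# Hypothetical placeholder inputs, for illustration only.
broad_incidence = 60.0  # cases per 100,000 persons for the broad category
narrow_incidence = scale_to_narrow_definition(
    broad_incidence, narrow_cases=4_000, broad_cases=10_000
)
print(round(narrow_incidence, 2))
```

The same function would apply unchanged to DALY rates, since the text describes multiplying both incidence and DALYs by the same percentage.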
The ORs in Haycock et al. are all given relative to longer gTL. To allow for our calculations, the ORs for non-neoplastic diseases were inverted in order to provide an OR relative to shorter gTL. The ORs in Li et al. are given relative to shorter gTL and are all less than 1; they were also inverted to allow for our calculations.
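As a sketch of the inversion, the snippet below takes the reciprocal of an OR and of its confidence bounds; note that the bounds swap places when inverted. The point estimate of 0.78 for coronary heart disease is the value quoted in the Introduction; the confidence bounds used here are hypothetical placeholders, and the function name is ours.

```python
def invert_odds_ratio(or_point, ci_low, ci_high):
    """Re-express an OR per SD longer gTL as an OR per SD shorter gTL.

    Taking reciprocals reverses the ordering, so the old upper bound
    becomes the new lower bound and vice versa.
    """
    if min(or_point, ci_low, ci_high) <= 0:
        raise ValueError("odds ratios must be positive")
    return 1.0 / or_point, 1.0 / ci_high, 1.0 / ci_low


# OR of 0.78 per SD longer gTL (coronary heart disease, from the text).
# The bounds 0.70 and 0.90 are hypothetical, for illustration only.
point, low, high = invert_odds_ratio(0.78, 0.70, 0.90)
print(round(point, 2))  # 1.28, matching the value quoted in the text
```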
To capture the excess DALYs per 100,000 persons associated with gTL, the formula "(OR - 1) x DALY" was applied. In other words, because an OR of 1 reflects baseline odds of disease, we subtract 1 from the OR to capture the odds of disease predicted by gTL in excess of baseline. This value is multiplied by the corresponding DALY rate to estimate the excess disease burden experienced by the European population per SD longer or shorter gTL. Excess incidence and 95% confidence intervals (CI) for our estimates were calculated by the same method.
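The formula can be illustrated with the glioma and coronary heart disease figures reported in the Results. The sketch below is ours, not the authors' code, and it assumes that "the same method" for confidence intervals means applying the formula to the lower and upper OR bounds while holding the rate at its point estimate.

```python
def excess_per_100k(odds_ratio, rate_per_100k):
    """Excess cases (or DALYs) per 100,000 persons per SD: (OR - 1) x rate."""
    return (odds_ratio - 1.0) * rate_per_100k


def excess_with_ci(or_point, or_low, or_high, rate_per_100k):
    """Assumption: the CI is obtained by applying the formula to the OR bounds."""
    return tuple(
        excess_per_100k(o, rate_per_100k) for o in (or_point, or_low, or_high)
    )


# Glioma: OR 5.27 (3.15-8.81) per SD long gTL, incidence 8.39 per 100,000.
glioma = excess_with_ci(5.27, 3.15, 8.81, 8.39)
print(round(glioma[0], 2))  # 35.83 excess cases, as reported in the text

# Coronary heart disease: OR 1.28 per SD short gTL, incidence 190.81.
chd = excess_per_100k(1.28, 190.81)
print(round(chd, 2))  # 53.43 excess cases, as reported in the text
```

Despite an OR more than four times larger, glioma contributes fewer excess cases than coronary heart disease because its baseline incidence is far lower.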

Excluded odds ratios
Celiac disease and abdominal aortic aneurysm were reported in Haycock et al. as associated with shorter gTL, but were excluded from our analysis because corresponding IHME data were not available. Similarly, uterine polyps and hypothyroidism from Li et al. were excluded.

SNP literature search
A list of all SNPs reported to be associated with TL in the published literature was compiled as follows: 1) A PubMed search for the term ["telomere length" AND (SNP OR "polymorphism, single nucleotide") AND human] was completed on November 26, 2019, yielding 211 results. All abstracts were reviewed, and studies were selected if they included a direct measure of association between peripheral blood leukocyte TL and SNPs. Selected studies include both GWAS and hypothesis-driven tests of association. Participants must have been cancer-free at the time of LTL measurement. Publications were excluded if: N <100; study design and/or statistical methods were considered as unreliable; effect allele could not be determined from data presented in publication; errors/inconsistencies were identified in the publication that precluded interpretation. 2) An additional 2 studies were identified in the GWAS Catalog in November 2019. 3) All 16 SNPs reported by Haycock et al. [49] were included, as well as any other significant SNPs in the original publications used for meta-analysis. These were captured by the above literature search methods. An additional publication by Mangino et al. [35] was identified by focused review of publications by the same authors.
Chromosomal position data for all SNPs were drawn from the PubMed SNP database, using assembly GRCh38.p7. "Short Allele" refers to the short telomere allele and is reported as in original publications. "Long Allele" is reported as in the publication where available, or otherwise it reflects the alternative allele reported in the PubMed SNP database. All alleles are reported in forward orientation, modifying from the original publications where necessary. Short Allele frequencies are based on the '1000Genomes' global population data, found in the PubMed SNP database.

LD determination
Linkage disequilibrium (LD) between SNPs on the same chromosome is given as R² values, obtained from the 'LDlink' suite of applications, provided by the National Cancer Institute Division of Cancer Epidemiology & Genetics (https://ldlink.nci.nih.gov). The 'LDmatrix' tool was used to generate tables of R² values for all chromosomes with at least two identified SNPs. While there are a number of exceptions, the majority of GWAS studies listed in Table 1 were [...] When estimating the number of independent potential sentinel/causal variants, we consider any pair of SNPs with R² < 0.5 as uncorrelated and therefore count them as independent potential sentinel/causal variants.
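One way to turn the R² < 0.5 rule into a count of independent variants is sketched below. It is an assumption on our part that correlated SNPs are grouped transitively (single linkage, via a union-find structure); the text does not specify how chains of partially correlated SNPs were handled. The SNP labels and R² values are hypothetical.

```python
def count_independent_variants(snps, r2, threshold=0.5):
    """Count groups of SNPs, treating pairs with R^2 >= threshold as one signal.

    snps -- list of SNP identifiers
    r2   -- dict mapping (snp_a, snp_b) pairs to R^2 values; pairs that are
            absent (e.g. SNPs on different chromosomes) count as uncorrelated
    """
    parent = {s: s for s in snps}

    def find(x):
        # Follow parent links to the group representative, compressing paths.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), value in r2.items():
        if value >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    return len({find(s) for s in snps})


# Hypothetical SNP labels and R^2 values, for illustration only.
snps = ["snpA", "snpB", "snpC", "snpD"]
r2 = {("snpA", "snpB"): 0.92, ("snpA", "snpC"): 0.10, ("snpB", "snpC"): 0.08}
print(count_independent_variants(snps, r2))  # 3
```

Here snpA and snpB collapse into one signal, while snpC and snpD each count separately.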

Results and discussion
On the face of it, odds ratios alone suggest that long gTL is associated with a much greater excess risk of neoplastic disease than short gTL is with excess risk of degenerative disease. However, taking into account the actual disease incidence tells a different story: the incidences of some of these degenerative conditions (namely coronary heart disease and Alzheimer's disease) far outweigh those of the neoplastic diseases (Table 1), and these conditions have disease trajectories that may be on the order of decades rather than months to years. When we apply our method to estimate the excess incidence of these conditions, we find that, despite markedly larger ORs of neoplastic disease, the large incidence of degenerative diseases causes the excess incidence attributable to gTL to balance or exceed that of the neoplastic diseases. For instance, the largest OR of any disease associated with gTL is that of glioma, at 5.27 (3.15-8.81, 95%CI) per SD long gTL. However, with an incidence of only 8.39 per 100,000 persons (7.29-9.30, 95%CI), the excess incidence per SD long gTL per 100,000 persons amounts to only 35.83 cases. In contrast, coronary heart disease, which is the most common condition examined at an incidence of 190.81 per 100,000 persons (171.96-211.00, 95%CI) but with an OR of only 1.28 per SD short gTL, contributes an excess incidence of 53.43 cases (20.99-93.50, 95%CI) per SD short gTL. When we combine all 9 neoplastic diseases, long gTL is associated with a total excess incidence of 94.04 cases (45.49-168.84, 95%CI; per SD long gTL per 100,000 persons), while short gTL is associated with a total excess incidence of 121.49 cases (48.40-228.58, 95%CI; per SD short gTL per 100,000 persons) from the 4 non-neoplastic diseases.
When the same method is applied to DALYs in order to assess the impact of gTL on total population health burden due to both death and disability, as opposed to incidence alone, a similar pattern is observed. Coronary heart disease is again associated with the greatest impact of gTL, with 589.74 (231.68-1032.05, 95%CI) excess DALYs per SD short gTL per 100,000 persons (Table 1). However, glioma and lung adenocarcinoma follow closely behind, with 574.14 (289.09-1050.12, 95%CI; per SD long gTL per 100,000 persons) and 487.79 (311.83-717.21, 95%CI; per SD long gTL per 100,000 persons) excess DALYs, respectively. These values are the result of their large ORs of disease, as noted, but also of their large DALY burdens relative to their incidence rates. Overall, long gTL is associated with excess DALYs totaling 1,255.25 (662.71-2,163.63, 95%CI; per SD long gTL per 100,000 persons) from the 9 neoplastic diseases, while short gTL is associated with excess DALYs totaling 1,007.75 (411.63-1,847.34, 95%CI; per SD short gTL per 100,000 persons) from the 4 non-neoplastic diseases.
Notably, interstitial lung disease contributes significantly to total excess incidence and excess DALYs associated with short gTL in the Haycock et al. meta-analysis. Idiopathic pulmonary fibrosis (IPF) likely contributes a large portion of the burden of interstitial lung disease associated with telomeres. It is a rare condition of unknown etiology, with a prevalence of only 2-29/100,000 persons [58]. Evidence from rodent models and rare human genetic diseases (telomere syndromes) suggests that mutations in telomere maintenance genes may be directly involved in the pathogenesis of IPF [59][60][61], accounting for the large OR associated with the disease. Data from the Danish National Registry of Patients show that IPF accounts for approximately 26.8% of interstitial lung disease cases, based on the IHME disease definition [62]. If the approximated contribution of IPF is subtracted from our calculations, the total excess incidence of non-neoplastic disease associated with short gTL falls to 109.46, and the excess DALYs fall to 932.64. In other words, even when we subtract out the contribution of a disease where the pathogenesis is known to be directly related to mutations in telomere maintenance genes, the same patterns of relative contribution to disease incidence and DALYs persist.
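The arithmetic behind this sensitivity check can be sketched as follows. The totals and the 26.8% share are taken from the text; the implied excess values for interstitial lung disease are back-calculated under the assumption that the adjustment was simply 26.8% of that disease's excess, and are not figures reported by the authors.

```python
IPF_SHARE = 0.268  # share of interstitial lung disease cases attributed to IPF


def implied_ild_excess(total_before, total_after, ipf_share=IPF_SHARE):
    """Return (reduction, implied ILD excess).

    Assumes the reported drop in the total equals ipf_share x ILD excess.
    """
    reduction = total_before - total_after
    return reduction, reduction / ipf_share


# Totals for short gTL before and after removing the approximated IPF share.
incidence_drop, ild_incidence = implied_ild_excess(121.49, 109.46)
daly_drop, ild_dalys = implied_ild_excess(1007.75, 932.64)

print(round(incidence_drop, 2), round(daly_drop, 2))  # 12.03 75.11
print(round(ild_incidence, 2), round(ild_dalys, 2))   # back-calculated, not reported
```

Removing IPF lowers the short-gTL totals by roughly 10% for incidence and 7% for DALYs, which is why the overall pattern is unchanged.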
Repeating our analysis on another recently published Mendelian Randomization study using 52 SNPs associated with LTL and health data from the UK Biobank reveals a similar pattern of contribution to the incidence and burden of neoplastic disease [38]. The Li et al. study identified five cancer types (thyroid cancer, lymphoma and multiple myeloma, leukemia, lung cancer, skin cancer (including melanoma)) and two benign conditions of abnormal cellular proliferation (uterine fibroid, benign prostatic hyperplasia (BPH)) that were significantly associated with long gTL. Based on the reported ORs, long gTL contributed a total of 231.42 excess cases per 1 SD long gTL per 100,000 persons, and a total of 840.28 excess DALYs per 1 SD long gTL per 100,000 persons (Table 2). The main drivers of excess incidence were uterine fibroids and BPH, with 77.71 and 62.35 excess cases per 1 SD long gTL per 100,000 persons, respectively. The contribution of gTL to cancer incidence is small in comparison, ranging from 8.18 excess cases for thyroid cancer to 37.64 excess cases for skin cancer. In contrast, the excess DALYs were overwhelmingly driven by lung cancer, with 511.26 excess DALYs per 1 SD long gTL per 100,000 persons. Leukemia contributes an additional 151.32 excess DALYs, while uterine fibroids and BPH trail far behind with only 10.71 and 19.61 DALYs, respectively. These results again demonstrate that while long gTL contributes relatively little to the population incidence of cancers, the severe morbidity and mortality associated with some cancers result in a heavy DALY burden. Also of note, Li et al. use a different and much larger set of 52 SNPs as a proxy for gTL and, with the exception of lung cancers and skin cancers, identify non-overlapping associations of gTL with disease. In the cases of lung cancers and skin cancers, the two groups use different disease definitions, with Li et al. adopting broader disease categories in both cases. This highlights the need for standardized methods of SNP selection and case definition in Mendelian Randomization studies of telomere length to facilitate meaningful replication and extrapolation.

PLOS ONE

These results suggest that genetically-determined long and short telomere lengths are associated with disease burden of approximately equal magnitude, despite their vastly different ORs. Short gTL is associated with slightly higher disease incidences, and long gTL with slightly higher DALYs. We believe these results provide the first quantitative estimate of the relative impacts of short and long telomere length on the health of a human population. Our results are also consistent with earlier reports that human telomere length is regulated under opposing evolutionary forces that act to minimize the risks of both neoplastic and non-neoplastic diseases [53].
However, our results should be interpreted with caution. They are based on a calculated measure of genetically-predicted TL using only 16 SNPs as a proxy by Haycock et al. These SNPs account for only a small percentage of the large telomere length variation in the population. The estimate of 28% narrow-sense heritability from genome-wide SNP data suggests that more SNPs with small effect sizes are likely yet to be discovered [11]. How the totality of all TL SNPs contributes to disease burden is unknown. We compiled an updated, comprehensive list of all SNPs associated with measured leukocyte telomere length in the published literature (Table 3), and assessed linkage disequilibrium (LD) between nearby sites (Table 4). LD between two SNPs in the same gene suggests a common causal genetic variant shared between the two loci, while SNPs not in LD suggest two distinct genetic causes of variance in TL. This list includes 106 SNPs on 18 chromosomes. Linkage disequilibrium analysis (Table 4) suggests that these 106 SNPs likely reflect 70 distinct causal variants from 50 genes, only 18 of which are known to be mechanistically involved in telomere maintenance pathways (Table 3). This list provides a tool for future Mendelian Randomization studies using these SNPs as proxies for genetically determined telomere length. However, caution should still be applied when using these SNPs for future studies. Mendelian randomization approaches are based on the assumptions that (1) the selected SNPs are associated with telomere length; (2) the selected SNPs are not associated with confounders; and (3) the selected SNPs are associated with disease exclusively through their effect on telomere length. Therefore, the candidate SNPs to be used in MR studies should be from those genes that have been well-documented as mechanistically involved in telomere maintenance (Table 4).
Even with this caution, we note that some telomere maintenance genes have functions other than the regulation of telomere length. For example, in addition to extending telomeres, the telomerase protein gene hTERT is also reported to be involved in the NF-κB and Wnt/β-catenin transcriptional pathways [63,64] and is localized to mitochondria to inhibit caspase-mediated apoptosis [65,66]. Similarly, it is possible that genes with yet unknown functions will have both telomere and non-telomere functions.
We also note several other limitations. First, genetic determinants of telomere length include both the variation in the non-telomeric regions attributable to SNPs and the direct inheritance of the lengths of telomeres from the oocyte and sperm when the zygote was formed. A recent paper examining the extent of physical telomere sharing among relatives suggests that the direct transmission of telomeres from gametes to zygotes contributes at least 11% of the telomere length variability [9]. The mechanisms by which the telomere lengths of the oocyte and sperm contribute to the zygote, and by which the initial length of telomeres is reset after fertilization, are largely unknown [67]. Telomere length of newborns, which likely reflects the impact of the genetic determinants and the prenatal environment, will play an important role in contributing to the risks of both neoplastic and non-neoplastic disease in adult life [68].
Second, as environmental factors can influence telomere length throughout the whole lifespan, and perhaps differentially during different developmental stages (childhood, adulthood, and old age), the disease risks caused by short or long telomere length can change accordingly. It should be noted, however, that the overall magnitude of the effect of environmental factors is smaller than the inter-individual TL variation at birth [69]. Adding to the complexity is the bidirectionality of the relationship between environmental factors and disease. While environmental factors can lead to telomere length change, which in turn impacts disease risks, the disease itself and its progression and treatment may lead to telomere length change as well. Therefore, estimating disease risks from phenotypically measured telomere length at any given time point is challenging and imprecise without fully accounting for the myriad potential confounding factors. Finally, our analysis reflects our best efforts to accurately estimate the incidences, DALYs, and excesses in both that may be attributed to just one source of telomere length variation: genetically predicted telomere length. Data were pulled from multiple sources, including a meta-analysis and multiple population health databases. While every effort was made to maximize consistency across the study populations (see S1 File), the values presented here are subject to change as new methods and research provide more accurate epidemiological data. Nevertheless, our analysis of the population disease burden due to genetically determined telomere length provides the first such estimation on a population level without confounders from environmental exposure, lifestyle factors, and diseases. Future studies using telomere length as a biomarker for disease and risks need to carefully consider the separate effects of genetic, environmental (both prenatal and postnatal), and lifestyle factors, and their potential interactions.