Genetic Factors Are Not the Major Causes of Chronic Diseases

The risk of acquiring a chronic disease is influenced by a person’s genetics (G) and exposures received during life (the ‘exposome’, E) plus their interactions (G×E). Yet, investigators use genome-wide association studies (GWAS) to characterize G while relying on self-reported information to classify E. If E and G×E dominate disease risks, this imbalance obscures important causal factors. To estimate proportions of disease risk attributable to G (plus shared exposures), published data from Western European monozygotic (MZ) twins were used to estimate population attributable fractions (PAFs) for 28 chronic diseases. Genetic PAFs ranged from 3.4% for leukemia to 48.6% for asthma with a median value of 18.5%. Cancers had the lowest PAFs (median = 8.26%) while neurological (median = 26.1%) and lung (median = 33.6%) diseases had the highest PAFs. These PAFs were then linked with Western European mortality statistics to estimate deaths attributable to G for heart disease and nine cancer types. Of 1.53 million Western European deaths in 2000, 0.25 million (16.4%) could be attributed to genetics plus shared exposures. Given the modest influences of G-related factors on the risks of chronic diseases in MZ twins, the disparity in coverage of G and E in etiological research is problematic. To discover causes of disease, GWAS should be complemented with exposome-wide association studies (EWAS) that profile chemicals in biospecimens from incident disease cases and matched controls.


Introduction
As the world's population ages, mortality increasingly reflects the ravages of complex chronic diseases, particularly cancer and heart disease [1]. A person's risk of succumbing to a chronic disease is linked to his or her genetics (G) and exposome (E, representing all exposures during life) plus G×E interactions. Although geneticists and epidemiologists have debated the importance of G and E as causes of chronic diseases, it is clear that both factors affect disease risks [2,3]. However, most etiologic research has focused on genetic causes and has relegated exposures to secondary roles. For example, when queried on Feb. 6, 2016, there were 566,685 PubMed citations for the keywords "disease causes AND genetics" compared to 71,922 citations for "disease causes AND exposure", a ratio of about eight to one.
This genome-centric view of causation is motivated by the technologic ability to detect and manipulate genes, and fosters the notion that genetic factors are necessary determinants of disease that operate in a causal background of diverse exposures [4,5]. Certainly, technologies spawned by the human genome project led to stunningly comprehensive genome-wide association studies (GWAS) that investigated genomic variability across thousands of diseased and healthy subjects. Yet, because more than 2,000 GWAS rarely reported relative risks greater than 1.2 [6,7], geneticists are turning to whole-genome sequencing in searches for 'missing heritability' [8,9]. This motivation stems, at least in part, from calculations of heritability that do not differentiate disease variation arising from genetic factors and shared exposures [10].
In contrast to GWAS, the epidemiology of causal exposures still relies on self-reported and geographic information plus a few targeted measurements [11,12], much as it did a century ago. Nonetheless, data from the World Health Organization (WHO) have attributed nearly half of global mortality to a handful of exposures (Table 1), mainly particulate air pollution (including indoor smoke and occupational exposure) (14% of all deaths), tobacco smoking and second-hand smoke (13%), high plasma levels of sodium (6%), and alcohol use (which is generally protective but can be harmful with high consumption) (5%) [13]. There is also strong epidemiologic evidence that genetically stable populations experience profound alterations in cancer incidence across generations and with migration that logically reflect changing exposures [3,14,15]. Thus, the empirical evidence promotes the notion that exposures are necessary determinants of disease that operate in a causal background of genetic diversity. However, compared with the genome-wide coverage of GWAS, the universe of exposures that has been investigated for associations with chronic diseases essentially consists of airborne particulate matter plus a set of about 300 environmental chemicals and nutrients [16].
To investigate the global influence of genetic factors on chronic-disease risks, data from cohorts of monozygotic (MZ) twins in Western Europe were compiled to estimate population attributable fractions (PAFs) for 28 chronic diseases, including prominent cancers, cardiovascular diseases, neurologic diseases, lung diseases, and autoimmune diseases. Because pairs of MZ twins have essentially identical genomes and also share many exposures [10], especially in early life, these PAFs estimate proportions of cases that would theoretically be prevented if interventions were able to remove particular combinations of genotypes and shared exposures [17,18]. To further evaluate the impacts of G and E on the risks of chronic diseases, PAFs from MZ twins were linked with mortality statistics from Western Europe to estimate the numbers of deaths attributable to genetic factors and shared exposures for ischemic heart disease and prominent cancers.

Materials and Methods
Data for estimation of PAFs were obtained from publications of disease phenotypes in large MZ-twin cohorts [19-35], some of which had been curated by Roberts et al. [36]. Virtually all of the data were from Western European twins, primarily Swedish, Danish, and Finnish. The 28 diseases included nine types of cancer, cardiovascular diseases (heart disease and stroke), neurological diseases (Parkinson's disease, Alzheimer's disease, dementia, and migraine), lung diseases (chronic obstructive pulmonary disease and asthma), obesity-associated diseases (Type-2 diabetes and gallstone disease), autoimmune diseases (rheumatoid arthritis, Type-1 diabetes, and thyroid autoimmunity), genitourinary diseases (general dystocia, stress urinary incontinence, and pelvic organ prolapse), and three other syndromes (chronic fatigue, irritable bowel syndrome, and gastroesophageal reflux). The data extracted from each study included gender and the twin-pair counts summarized in Table 2 (total, concordant, and discordant pairs). Because the PAFs were estimated for twins from Western Europe, mortality statistics for Western Europeans in 2000 were obtained for ischemic heart disease and relevant cancers from the WHO Global Burden of Disease Database [37,38]. To estimate deaths attributable to genetics and shared exposures, the number of deaths for each disease type was multiplied by the corresponding PAF from MZ twins.
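The multiplication of deaths by PAFs described above can be sketched in a few lines of Python. This is an illustration only: the PAF formula shown is Miettinen's standard case-based estimator, PAF = P(RR - 1)/RR, and applying it to the twin statistics (P as the proportion of case twins with an affected co-twin, RR as the relative risk) is an assumption rather than a calculation stated in this section. The function names, disease labels, and all numbers are hypothetical.

```python
def paf_from_case_fraction(p_cases_exposed: float, relative_risk: float) -> float:
    """Miettinen's case-based estimator: PAF = P * (RR - 1) / RR.

    Assumption: P is the proportion of cases carrying the risk factor
    (here, by analogy, case twins with an affected co-twin).
    """
    if not 0.0 <= p_cases_exposed <= 1.0:
        raise ValueError("p_cases_exposed must be a proportion in [0, 1]")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    return p_cases_exposed * (relative_risk - 1.0) / relative_risk


def attributable_deaths(deaths: dict, pafs: dict) -> dict:
    """Multiply the deaths for each disease by the corresponding PAF."""
    return {disease: deaths[disease] * pafs[disease] for disease in deaths}


# Hypothetical inputs, NOT values from Table 2 or the WHO database.
example_deaths = {"disease_a": 100_000, "disease_b": 50_000}
example_pafs = {"disease_a": 0.20, "disease_b": 0.10}

by_disease = attributable_deaths(example_deaths, example_pafs)
total_attributable = sum(by_disease.values())
overall_fraction = total_attributable / sum(example_deaths.values())

print(by_disease)
print(f"overall attributable fraction: {overall_fraction:.1%}")
```

Because each disease has its own PAF, the overall attributable fraction is a death-weighted average of the disease-specific PAFs rather than their simple mean.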

Results and Discussion
Statistics from studies of MZ twins are summarized in Table 2. Estimated G-related PAFs ranged from 3.4% for leukemia to 48.6% for asthma, with a median value of 18.5% and an interquartile range of 9.9% to 24.2%. This indicates that fractions of cases attributable to genetics plus shared exposures tend to be modest, with three fourths of the phenotypes having PAFs less than 25%. In fact, G-related PAFs exceeded 40% for only two phenotypes, namely thyroid autoimmunity (42%) and asthma (49%). Fig 1 displays the cumulative distribution for the 28 phenotypes with symbols representing disease categories. Although there was variability within a given category, cancers tended to have the lowest PAFs (median = 8.26%) while neurological (median = 26.1%) and lung (median = 33.6%) diseases had the highest PAFs. Although these are apparently the first estimates of PAFs derived exclusively from MZ twins, Hemminki and Czene reported familial PAFs for cancers in the Swedish Family-Cancer Database (10.2 million individuals) [18] that are consistent with these results.
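The summary statistics quoted above (median, interquartile range, and the share of phenotypes below 25%) can be reproduced for any list of PAFs with Python's standard library. The sketch below uses made-up PAF values, not the entries of Table 2, and the choice of the "inclusive" quantile method is an assumption; other quantile conventions give slightly different quartiles.

```python
import statistics

# Hypothetical PAFs in percent, NOT the values reported in Table 2.
pafs = [4.0, 8.0, 10.0, 12.0, 18.0, 20.0, 24.0, 30.0, 45.0]

median_paf = statistics.median(pafs)

# quantiles(..., n=4) returns the three quartile cut points.
q1, q2, q3 = statistics.quantiles(pafs, n=4, method="inclusive")

# Share of phenotypes with a PAF below 25%.
share_below_25 = sum(1 for p in pafs if p < 25.0) / len(pafs)

print(f"median = {median_paf}, IQR = {q1} to {q3}")
print(f"share of phenotypes with PAF < 25%: {share_below_25:.0%}")
```

An upper quartile just under 25% is what supports a statement that about three fourths of the phenotypes have PAFs below 25%.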
Since heart disease and cancer are the two leading causes of mortality in Western Europe (and worldwide), the contributions of genetics plus shared exposures to deaths from these diseases were estimated, as summarized in Fig 2. Assuming that the populations of MZ twins used to derive PAFs are reasonable surrogates for Western Europeans in the year 2000, 0.25 million of the 1.53 million cancer and heart-disease deaths (16.4%) can be attributed to G-related factors.
Because comprehensive exposure data were not collected in the twin studies, it is not possible to directly estimate E-related PAFs or contributions to disease risks from G×E interactions. But given the modest values of G-related PAFs reported here, it is reasonable to infer that the combined effects of non-shared exposures (E) and G×E would be greater than those of G alone. This conjecture is supported by results of structural equation modeling by Lichtenstein et al. [19], who reported that non-shared exposures in monozygotic and dizygotic twins accounted for between 58% and 82% (median = 62%) of the variation in 12 types of cancer. Nonetheless, the hypothesized dominance of E and G×E on chronic-disease risks is at odds with a recent paper by Tomasetti and Vogelstein, who found a strong correlation between cancer risks and total numbers of stem-cell divisions in various tissues, and concluded from this that 'bad luck' accounts for about two thirds of the variation in cancers [39]. In refuting this conclusion, Wu et al. pointed out that the bad-luck hypothesis is illogical because: it equates correlation with causation; it is inconsistent with the epidemiological evidence; and it requires that mutation signatures of cancers be correlated with age, which is rarely the case [15]. Recognizing that both intrinsic random errors and E-related factors can influence cancer risks, Wu et al. then applied models to the same data used by Tomasetti and Vogelstein [39], which allowed E-related factors to be estimated after adjustment for intrinsic random errors. Results indicated that E-related factors typically explained more than 90% of cancer risk, consistent with the small genetic PAFs observed for cancers in MZ twins (median = 8.26%).
Table 2 legend: N_T, total twin pairs; N_C, concordant twin pairs; N_D, discordant twin pairs; P, proportion of case twins with an affected co-twin (%); RR, relative risk.

Conclusions
Because the human genome project planted the seeds for genome sequencing and large-scale omics technologies [5], it was inevitable that these methods would be used to search for causes of major diseases, and almost 2,000 GWAS have been reported [6]. Yet, the matrix of disease-associated genetic variants does not explain much heritability [7,9]. Indeed, Yang et al. predicted that between 20 and 50 causal genetic variants would be required to explain half the burden of a common disease, depending on the frequency of each variant and risk ratio of the genotype [17]. The small genetic PAFs estimated here from studies of MZ twins (Table 2 and Fig 1) cast further doubt on the notion that our inherited genomes are the primary causes of chronic diseases. Nonetheless, the genome can influence disease outcomes through G×E interactions, and may also contribute through epistasis and heritable epigenetic effects that are as yet unknown. Thus, investigations of causes of chronic diseases should continue to consider genetic factors as part of a balanced strategy that characterizes both E and G with high resolution.
Fig 1. doi:10.1371/journal.pone.0154387.g001
One avenue for discovering E-related risks would be to extend the data-driven approach embodied in GWAS and conduct exposome-wide association studies (EWAS) [40] via untargeted analyses of chemicals in blood (the 'blood exposome') [16]. Since disease processes can alter the blood exposome through dysregulation of systems biology, it is important that EWAS be conducted with archived biospecimens collected prior to diagnosis from incident cases and matched controls in prospective cohort studies. This makes it possible to distinguish chemical signatures of potentially causal exposures from those generated by progression of the disease (reverse causality) [40,41].
A good example of this data-driven approach for EWAS is given by Wang et al. [42], who found 18 chemical features (out of more than 2,000 detected) that were associated with cardiovascular disease in samples totaling only 75 incident cases and 75 matched controls. Three of the features were identified as choline and its metabolites betaine and trimethylamine-N-oxide (TMAO), with TMAO showing the strongest association with disease risk in follow-up studies [43,44]. Since TMAO is a product of joint microbial and human metabolism of choline, the positive association between plasma TMAO and disease risk points to possible involvement of the gut microbiota in the etiology of cardiovascular disease. It is interesting that a study of colorectal cancer by Bae et al. [45] also found a positive association between plasma TMAO and disease risk, again suggesting involvement of the gut microbiota.