No Evidence of a Common DNA Variant Profile Specific to World Class Endurance Athletes

There are strong genetic components to cardiorespiratory fitness and its response to exercise training. It would be useful to understand the differences in the genomic profile of highly trained endurance athletes of world class caliber and sedentary controls. An international consortium (GAMES) was established in order to compare elite endurance athletes and ethnicity-matched controls in a case-control study design. Genome-wide association studies were undertaken on two cohorts of elite endurance athletes and controls (GENATHLETE and Japanese endurance runners), from which a panel of 45 promising markers was identified. These markers were tested for replication in seven additional cohorts of endurance athletes and controls: from Australia, Ethiopia, Japan, Kenya, Poland, Russia and Spain. The study is based on a total of 1520 endurance athletes (835 who took part in endurance events in World Championships and/or Olympic Games) and 2760 controls. We hypothesized that world-class athletes are likely to be characterized by an even higher concentration of endurance performance alleles and we performed separate analyses on this subsample. The meta-analysis of all available studies revealed one statistically significant marker (rs558129 at GALNTL6 locus, p = 0.0002), even after correcting for multiple testing. As shown by the low heterogeneity index (I2 = 0), all eight cohorts showed the same direction of association with rs558129, even though p-values varied across the individual studies. In summary, this study did not identify a panel of genomic variants common to these elite endurance athlete groups. Since GAMES was underpowered to identify alleles with small effect sizes, some of the suggestive leads identified should be explored in expanded comparisons of world-class endurance athletes and sedentary controls and in tightly controlled exercise training studies. 
Such studies have the potential to illuminate the biology not only of world class endurance performance but also of compromised cardiac functions and cardiometabolic diseases.


Introduction
Early studies on the genetic basis of sports performance reported that identical twins who engaged in competitive sports were significantly more likely to participate in the same sports than pairs of dizygotic twins [1]. The first documented attempts to identify genetic markers for sports performance date to the 1968 Mexico and 1976 Montreal Olympic Games and were based on common blood genetic markers. They did not yield any strong positive findings [2][3][4]. A high maximal oxygen uptake (VO2max) is a necessary condition to reach the level of an endurance athlete of international caliber. A high VO2max can only be achieved if an individual is endowed with a very high level in the sedentary state (intrinsic level) in combination with large increases in response to sustained and demanding exercise training regimens (trainability).
Twin and family studies have revealed that the intrinsic level of VO2max is strongly influenced by a genetic component. For instance, the heritability of VO2max adjusted for age, sex and body composition in sedentary families of European descent reached 51% in the HERITAGE Family Study [5]. Evidence of a significant heritability level has also been documented for exercise training-induced improvements in VO2max. Large individual differences in VO2max gains have been found in sedentary young adults subjected to standardized endurance training programs [6]. A series of exercise training experiments conducted with pairs of identical twins revealed that the differences in trainability were not distributed randomly among the twins, with intraclass correlations in response of VO2max (L O2/min) ranging from 0.44 to 0.77 [7][8][9], indicating that members of the same twin pair responded similarly to training. In HERITAGE, the increase in VO2max in 481 individuals from 99 two-generation families of Whites of European descent showed 2.5 times more variance between families than within families for VO2max response, with a maximal heritability estimate of 47% [10]. Adjusting the VO2max response data for baseline VO2max did not modify this estimate, suggesting along with other evidence that the familial and genetic factors underlying VO2max in the sedentary state and its response to exercise training are different [11].
In the case-control GENATHLETE study, single nucleotide polymorphisms (SNPs) in several candidate genes were investigated but none have provided strong evidence for differences in allele and genotype frequencies between elite endurance athletes and controls [12][13][14][15][16][17]. Reports published to date have focused on differences in allele frequencies between athletes and non-athlete controls, mainly for angiotensin I converting enzyme (ACE), α-actinin-3 (ACTN3), and peroxisome proliferator-activated receptor-γ coactivator 1α (PPARGC1A) polymorphisms and for mitochondrial DNA (mtDNA) haplogroup distributions [18]. There were several positive findings but most were derived from post hoc subgroup analyses and only a few studies controlled for multiple testing. Most prior studies have come from elite Japanese runners [19,20], Spanish endurance athletes [21], Ethiopian and Kenyan endurance runners [22,23], Russian endurance athletes [24] as well as endurance athletes from Australia [25], Israel [26] and Poland [27].
Unbiased genome-wide approaches have been used in the search for genomic regions, transcripts or DNA variants linked or associated with endurance performance-related traits [28][29][30][31]. For instance, in a recent study, the association of 324,611 SNPs with the response of VO2max to endurance training in 473 Whites from HERITAGE was investigated [28]. None of the SNPs reached genome-wide significance even though there were several SNPs moderately associated with VO2max trainability.
It is of scientific interest to understand the differences in the genomic profile of highly trained endurance athletes of world class caliber and sedentary controls from the same ethnic ancestry. Such data could illuminate the biology of cardiorespiratory fitness and human adaptability and would likely have implications for common health problems such as those observed in aging individuals with declining health, diabetic patients with compromised cardiovascular fitness or patients with ischemic heart disease or heart failure to name but a few. Given the heterogeneous results reported in previous studies, we established an international consortium (GAMES) in order to reach larger sample sizes of elite endurance athletes and matched controls in a case-control study design. Genome-wide explorations were undertaken on two cohorts of world-class endurance athletes and controls (GENATHLETE and Japanese endurance runners), which generated a panel of 45 markers that were subsequently used for replication in seven additional cohorts of endurance athletes and controls.

Participants and Methods
The GENATHLETE cohort and a sample of endurance athletes and controls from Japan were used for the discovery phase. The characteristics of the all-male GENATHLETE participants are presented in S1 Table and S1 Fig of the Supporting Information. There were 315 elite endurance athletes (national and world-class level) from Germany, Finland, Canada and the USA plus 320 sedentary controls from the same countries. Athletes and controls were closely matched for ethnicity and country of origin. The mean highest recorded VO2max of these athletes was 79.0 mL O2/kg/min with a standard deviation (SD) of 3.4 mL O2/kg/min, while the mean value for the sedentary controls was 40.0 (SD = 7.1). In the Japanese sample, 60 Japanese elite runners (world-class athletes) and 116 controls were available for the discovery phase. (Additional information on these two cohorts is provided in Supporting Information).
Endurance athletes and controls from 7 countries were used for the replication phase (Table 1). Participants were from Australia, Ethiopia, Japan, Kenya, Poland, Russia and Spain. It should be noted that we used a second Japanese cohort (143 athletes, 692 controls) in the replication phase. Again, in each country, endurance athletes were competing at least at the national level and most of them competed in Olympic or world cup events. A total of 1520 athletes and 2760 controls were involved in the present study. Among the endurance athletes, 835 (about 55% of total) took part in endurance events at world championships and Olympic Games. We hypothesized that these world-class athletes are likely to be characterized by a higher concentration of "endurance performance alleles" and we performed separate analyses on this subsample (see Supporting Information for details).

Discovery Phase
GAMES was established after the two discovery studies (GENATHLETE and the Japanese endurance athlete cohort) had already completed their genome-wide screens and analyses. Thus there are differences in the chips used and the analytical strategies. However, the data of the replication studies were all analyzed centrally at the Pennington site.
In GENATHLETE, genomic DNA was extracted from whole-blood samples by commercial DNA extraction kits (Gentra Systems, Inc., Minneapolis, MN), and the DNA stock samples were diluted to 50 ng/μl concentrations. SNPs for the study were those captured in the Illumina CardioMetabochip (Illumina Inc., San Diego, CA), which contains over 195,000 genetic markers including ~66,000 variants implicated in the aetiology of cardiometabolic traits and disease outcomes from discovery GWAS cohorts, as well as variants around known loci for the purposes of fine-mapping [32]. The SNPs were genotyped using the Illumina Infinium II assay on the Illumina iScan platform.
SNPs showing marked deviation from Hardy-Weinberg equilibrium (HWE) (p ≤ 0.00001) were excluded. However, since deviations from HWE may be related to case-control status-related differences in genotype frequencies, an identical non-HWE pattern was confirmed in both endurance athletes and non-athlete controls before a SNP was excluded from the database. A total of 143,000 SNPs were polymorphic and passed the quality control filters. In GENATHLETE, we estimate that we had 80% statistical power to detect odds ratios (ORs) of 2.7 and 2.1 for minor allele frequencies of 0.1 and 0.3, respectively, assuming an additive model and an alpha level of 5 x 10^-8. For the Japanese cohort, total DNA was isolated from saliva or venous blood by use of Oragene DNA Collection Kits (DNA Genotek, Ontario, Canada) or the QIAamp DNA Blood Maxi Kit (QIAGEN, Hilden, Germany), respectively. Total DNA samples were genotyped for more than 700,000 markers using the Illumina HumanOmniExpress Beadchip. The genotype calls were performed with the Illumina GenomeStudio software. Quality control measures were performed as defined in the Supporting Information. After removing SNPs failing quality control, 541,179 autosomal SNPs in 60 Japanese endurance athletes and 116 Japanese controls were available for association analyses. In the Japanese cohort, we had 80% statistical power to detect ORs of 6.1 and 4.5 for minor allele frequencies of 0.1 and 0.3, respectively, assuming an additive model and an alpha level of 5 x 10^-8.
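Power figures of this kind can be approximated with a two-proportion normal approximation to the allelic test. The sketch below is illustrative only: the function name `allelic_power` and the choice of approximation are assumptions, not the authors' stated method, so its output need not match the reported odds ratios exactly.

```python
from statistics import NormalDist


def allelic_power(n_cases, n_controls, maf, odds_ratio, alpha=5e-8):
    """Approximate power of a two-sided allelic test (hypothetical helper).

    Assumes a multiplicative per-allele effect and treats `maf` as the
    minor allele frequency in controls.
    """
    norm = NormalDist()
    # Minor allele frequency in cases implied by the per-allele odds ratio.
    p_case = odds_ratio * maf / (1 - maf + odds_ratio * maf)
    # Each participant contributes two alleles.
    a_case, a_ctrl = 2 * n_cases, 2 * n_controls
    p_pooled = (a_case * p_case + a_ctrl * maf) / (a_case + a_ctrl)
    # Standard errors of the frequency difference under H0 and H1.
    se_null = (p_pooled * (1 - p_pooled) * (1 / a_case + 1 / a_ctrl)) ** 0.5
    se_alt = (p_case * (1 - p_case) / a_case + maf * (1 - maf) / a_ctrl) ** 0.5
    z_crit = norm.inv_cdf(1 - alpha / 2)
    return norm.cdf((abs(p_case - maf) - z_crit * se_null) / se_alt)


# GENATHLETE-sized example taken from the text: 315 athletes, 320 controls.
power = allelic_power(315, 320, maf=0.1, odds_ratio=2.7)
print(f"approximate power: {power:.2f}")
```

The stringent alpha of 5 x 10^-8 is what drives the detectable odds ratios so high for cohorts of a few hundred participants.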

Data Analysis
For the GENATHLETE samples, tests of HWE for each SNP were conducted using the exact test implemented in the PEDSTATS software [33]. Allele frequency differences between athletes and controls were tested using a chi-square test as implemented in the PLINK software package [34]. For the Japanese study, standard allelic association analysis was performed by comparing allele-frequency differences between Japanese endurance runners and controls.
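The allelic association test described above compares minor and major allele counts between athletes and controls in a 2 x 2 table. The following is a minimal sketch of a 1-degree-of-freedom chi-square test without continuity correction; the function name and the allele counts are hypothetical and are not taken from the study data.

```python
from math import erfc, sqrt


def allelic_chi_square(case_minor, case_major, ctrl_minor, ctrl_major):
    """Return (chi2, p_value, odds_ratio) for a 2 x 2 allele-count table."""
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    # Pearson chi-square for a 2 x 2 table, no continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variate with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio


# Hypothetical counts: 30 minor / 70 major alleles in athletes,
# 10 minor / 90 major alleles in controls.
chi2, p_value, odds_ratio = allelic_chi_square(30, 70, 10, 90)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}, OR = {odds_ratio:.2f}")
```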
None of the SNPs reached the genome-wide significance level (p < 5 x 10^-8) in GENATHLETE or the Japanese study. Considering that this may simply be the result of the relatively small sample size of the discovery cohorts, we elected to retain the 45 most promising SNPs for further testing in the replication studies. Among these 45 SNPs, 26 came from the GENATHLETE CardioMetabochip screen (13 based on the full cohort, 13 from the highest VO2max subgroup analyses) and 19 from the Japanese GWAS.

Replication Phase
The 45 SNPs carried forward were genotyped in athletes and controls from Australia, Ethiopia, Japan, Kenya, Poland, Russia and Spain as described in the Supporting Information. Tests of HWE for each SNP were also performed with the PEDSTATS software [33] and allele-frequency differences between athletes and controls were tested using a chi-square test as implemented in PLINK [34].
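For readers unfamiliar with exact HWE testing, the idea can be sketched by enumerating every heterozygote count compatible with the observed allele counts and summing the probabilities of outcomes no more likely than the one observed. This is a generic illustration with a hypothetical function name and made-up genotype counts; it is not the PEDSTATS implementation.

```python
from math import exp, lgamma, log


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact Hardy-Weinberg test p-value from genotype counts (sketch)."""
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab  # copies of allele A
    n_b = 2 * n_bb + n_ab  # copies of allele B

    def log_prob(het):
        # Probability of `het` heterozygotes given the allele counts.
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (het * log(2) + lgamma(n + 1)
                - lgamma(hom_a + 1) - lgamma(het + 1) - lgamma(hom_b + 1)
                + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1))

    # Heterozygote counts must share the parity of the allele counts.
    probs = {het: exp(log_prob(het))
             for het in range(n_a % 2, min(n_a, n_b) + 1, 2)}
    p_obs = probs[n_ab]
    # Sum all outcomes at most as probable as the observed one.
    total = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical genotype counts close to HWE expectations.
print(hwe_exact_p(25, 50, 25))
```

A SNP would be flagged under the exclusion rule used here when this p-value falls at or below 0.00001.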

Meta-analysis
Results from individual studies were combined using a meta-analysis approach. These analyses were done with the meta-analysis routine of the PLINK software using study-specific association test result files as an input. Three subsets were meta-analyzed: all studies based on athletes of European descent that were part of the replication phase (i.e. Australia, Poland, Russia and Spain), the two studies based on participants from Africa, and all studies combined (including athletes from Ethiopia, Kenya and Japan). In each case, analyses were undertaken based on the results of the comparisons between all athletes and controls from each national cohort.
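To make the pooling step concrete, the sketch below combines per-study log odds ratios with a DerSimonian-Laird random-effects model and reports Cochran's Q and the I2 heterogeneity index. The estimator choice is an assumption (the text only states that the PLINK meta-analysis routine was used), and the function name and input values are hypothetical.

```python
from math import erfc, exp, sqrt


def random_effects_meta(log_ors, std_errs):
    """DerSimonian-Laird random-effects meta-analysis (illustrative sketch)."""
    weights = [1 / se ** 2 for se in std_errs]
    w_sum = sum(weights)
    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    fixed = sum(w * b for w, b in zip(weights, log_ors)) / w_sum
    q = sum(w * (b - fixed) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    # Between-study variance, truncated at zero.
    c = w_sum - sum(w ** 2 for w in weights) / w_sum
    tau2 = max(0.0, (q - df) / c)
    re_weights = [1 / (se ** 2 + tau2) for se in std_errs]
    pooled = sum(w * b for w, b in zip(re_weights, log_ors)) / sum(re_weights)
    pooled_se = sqrt(1 / sum(re_weights))
    z = pooled / pooled_se
    return {
        "or": exp(pooled),
        "p": erfc(abs(z) / sqrt(2)),  # two-sided normal p-value
        "q": q,
        "i2": 100 * max(0.0, (q - df) / q) if q > 0 else 0.0,
    }


# Hypothetical per-cohort log odds ratios and standard errors.
result = random_effects_meta([-0.25, -0.10, -0.30], [0.12, 0.20, 0.15])
print(result)
```

When all cohorts point in the same direction with similar effect sizes, Q stays below its degrees of freedom and I2 is reported as 0, which is the pattern the study describes for rs558129.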

Further Analyses Performed on World-Class Endurance Athletes
In some of the national cohorts, all endurance athletes were of world-class caliber. However, in other cohorts (GENATHLETE, Poland, Russia and the second Japanese cohort), there was a combination of national level and world-class caliber endurance athletes. We were able to classify the endurance athletes between those who were truly of world-class caliber, as defined by their participation in World Championship competitions or in Olympic Games, and those who competed at the national level (see Supporting Information). In the case of GENATHLETE, the classification was based on the VO2max measurement, with those exhibiting a VO2max value ≥ 78 mL O2/kg/min being classified as part of the "super elite" contingent. All the analyses described in the previous sections were repeated using only these world-class super elite endurance athletes and controls.

Informed Consent
The GENATHLETE project was originally approved by the Medical Ethics Committee of Laval University (Quebec, Canada). Approval has also been obtained from the medical ethics committees of all institutions that have contributed participants to the GENATHLETE cohort. Continuing approval for the study has been granted by the IRB of Pennington Biomedical Research Center. Each GENATHLETE participant has given written informed consent. Written informed consent was obtained from all participants in the Japanese, Ethiopian, and Kenyan cohorts, and these studies were approved, respectively, by the Institutional Review Board of Tokyo Metropolitan Institute of Gerontology, National Institute of Health and Nutrition, Japan; the Oxford Tropical Research Ethics Committee, the University of Glasgow Ethics Committee, and a committee from the Ethiopian Athletics Federation; and the University Ethics Committee in Kenya. Written informed consent was obtained for all Australian participants. The study was approved by the institutional review boards and Ethics Committees of the Children's Hospital at Westmead, the University of Sydney, and the Australian Institute of Sport. The procedures followed in the study from Poland were approved by the Pomeranian Medical University Ethics Committee and all participants gave informed written consent. The elite athlete study from Russia was approved by the Ethics Committee of the Research Institute for Physical-Chemical Medicine. Written informed consent was obtained from each participant. All participants from the Spanish cohort provided written consent and the study protocol was approved by the IRB of Pablo Olavide University (Spain).

Results

Discovery Phase
In GENATHLETE, a total of 143,000 SNPs were polymorphic and passed the quality control filters. A Manhattan plot depicting associations between elite endurance athletes and sedentary controls for all SNPs across the 22 autosomes is shown in Panel A of Fig 1. The strength of the association is shown on the y-axis as the -log10 of the p-value, which represents the statistical significance of the allele frequency chi-square test. None of the SNPs reached the genome-wide significance threshold of 5 x 10^-8. Fifty SNPs showed p-values of less than 5.0 x 10^-4 and the top 26 SNPs are shown in Table 2. Among the 50 SNPs, 8 were found to be in strong pairwise linkage disequilibrium (r2 > 0.9) and 2 had a minor allele frequency of <5%. From the 40 SNPs left, the top 26 with the most significant p-values in GENATHLETE were retained and are shown in the table.
The strongest evidence of association (1.2 x 10^-5 < p < 6.1 x 10^-5) was detected with a cluster of SNPs located on the long arm of chromosome 15 (15q23) about 69.7 million base pairs from the start of the chromosome (Panel A in Fig 1). While the cluster is located in an intergenic region, the same chromosomal region has been previously reported to be associated with atrial fibrillation in the Framingham Heart Study [35]. Five additional SNPs showed associations with p-values ranging from 2.61 x 10^-5 to 7.37 x 10^-5; these markers were located within or in the vicinity of genes encoding adrenoceptor alpha 1A (ADRA1A), BCL2-like 14 [apoptosis facilitator] (BCL2L14), GLIS family zinc finger 3 (GLIS3), myosin, light-chain 2, regulatory, cardiac, slow (MYL2), and myosin VB (MYO5B).
To evaluate whether these genetic associations were even more prominent among elite athletes with the highest cardiorespiratory fitness level, the analyses were repeated by comparing only those athletes with a VO2max ≥ 78 mL/kg/min (N = 168) versus controls. The rationale was that sequence variants playing a role in maximal endurance capacity should cluster even more tightly in world-class endurance athletes compared to endurance athletes with lower VO2max levels. An overview of the results is shown in Panel B of Fig 1. A total of 50 SNPs showed associations with p-values less than 5.0 x 10^-4 and eight SNPs with p-values less than 9.7 x 10^-5. The strongest evidence of association was detected with a SNP located in the ADRA1A locus (p = 8.69 x 10^-7) and the top SNPs are also listed in Table 2.

Japanese Endurance GWAS
In the Japanese cohort, after removing SNPs and individuals failing quality control, 541,179 autosomal SNPs were available for analysis. A total of 31% of the SNPs were common to the Illumina chips used in the Japanese endurance athlete cohort and GENATHLETE. Allelic association analyses were performed by comparing allele-frequency differences between Japanese endurance athletes and controls. The quantile-quantile plot of observed versus expected -log10(p) values is shown in S2 Fig. No SNPs exceeded the threshold for genome-wide significance (p < 5 x 10^-8). The association results for markers with p < 5 x 10^-5 plus a few markers as defined next are displayed in Table 3. Out of the total SNPs entered into the association analysis, 21 met this threshold. Among them, 3 SNPs were associated at p < 5 x 10^-6. Regional association plots of the top signals were created using LocusZoom Version 1.1 [36], including information on the location and orientation of genes, local estimates of recombination rates and levels of linkage disequilibrium (LD). An individual plot was specified by the SNP of interest, which was treated as the key marker for the region. Markers within a 500 kb flanking region on each side of the index SNP were included. Plots were generated based on Human Genome 19 (hg19). Pairwise LD between the index SNP and the surrounding SNPs and recombination rates were estimated in LocusZoom using 1000G Mar 2012 ASN as the reference population.
Regional association plots of the 21 SNPs with p < 5 x 10^-5 were further inspected, leading to the exclusion of 10 SNPs, which were considered redundant. The regional association plots for the remaining 11 SNPs (see Table 3) are presented in S3-S13 Figs of the Supporting Information. Another set of 7 SNPs with p < 10^-4 was added to the panel after examination of regional association plots (S14-S20 Figs). Furthermore, rs2694093 (p = 0.00014) was independently associated with athlete status and is in a region near the strongest signal rs921665 (chr2:3174321, hg19/1000 Genomes Mar 2012 ASN; S21 Fig). It was also included in the discovery set. All 19 SNPs retained for the replication phase based on the results on the Japanese athletes are summarized in Table 3. These 19 SNPs were combined with the 26 retained from GENATHLETE to constitute the panel of 45 SNPs that was used in the replication cohorts.

Replication Phase
Table 4 depicts the association results for the 42 SNPs that could be tested in at least two of the four replication cohorts of Caucasians, namely Australian, Polish, Russian and Spanish endurance athletes and controls. Meta-analysis showed that none of the markers summarized in Table 4 reached statistical significance after accounting for multiple testing (Bonferroni-corrected statistical significance p < 0.0011), although one marker (rs558129 in GALNTL6) showed a nominal p-value of 0.02. Five SNPs were nominally (p < 0.05) associated with endurance athlete status in the Australian study, as well as one in the Polish cohort, four in the Spanish cohort, and five in the Russian cohort. One SNP (rs7947391) showed nominally significant associations in three replication cohorts (Australian, Polish and Spanish). However, as is evident from the meta-analysis results, the direction of the association was not the same across the three studies; the minor allele frequency was lower in athletes than in controls in the Spanish cohort (as well as in the GENATHLETE discovery cohort), whereas Australian and Polish athletes had a greater minor allele frequency than their non-athlete controls.
The meta-analysis of the subgroup of athletes classified as world class and truly elite endurance athletes did not reveal any significant associations. Five SNPs had nominal p-values less than 0.05 in Australian world-class athletes, one in Polish world-class athletes, four SNPs in Spanish world-class athletes, and three in Russian world-class athletes (Table 5). However, these associations were generally not directionally consistent among cohorts.
Next, we report on the associations between 35 SNPs identified in the discovery phase and endurance athlete status in Ethiopian and Kenyan athletes and controls (Table 6). The lower number of SNPs available for these analyses is due to the fact that several SNPs selected for the replication phase were monomorphic or very rare in African populations. The meta-analysis of the Ethiopian and Kenyan athletes did not reveal any associations that were significant after accounting for multiple testing, although four SNPs had nominal p-values less than 0.05. Five and three SNPs had nominal p-values less than 0.05 in the Kenyan sample (276 athletes versus 83 controls) and in the Ethiopian study (75 athletes versus 198 controls), respectively, but none of these SNPs overlapped between the two cohorts. In fact, the most significant SNPs observed in the Kenyan cohort showed associations in the opposite direction in the Ethiopian cohort.
In an attempt to identify lead SNPs based on all the cohorts contributing to the GAMES project, meta-analyses were performed using all contributing studies. The results are summarized in Table 7. Finally, the summary meta-analyses were repeated with all subsamples of world-class endurance athletes and these results are depicted in Table 8. Among all available athletes and controls (9 studies, including discovery cohorts), the meta-analysis revealed one statistically significant SNP (rs558129 at GALNTL6 locus, p = 0.0002), even after correcting for multiple testing (Bonferroni-corrected statistical significance p < 0.0011). As shown by the low heterogeneity index (I2 = 0), all eight cohorts showed the same direction of association with rs558129, even though p-values varied considerably across the individual studies. When the meta-analysis was repeated without the discovery cohorts, the association persisted near the multiple testing-adjusted significance threshold (p = 0.0019).
When the meta-analysis was restricted to the world-class athletes versus all controls, none of the SNPs retained for replication reached multiple testing-corrected statistical significance, although SNPs rs4288991 (p = 0.0028) and rs10938202 (p = 0.0053) came closest to the threshold. SNP rs558129 was characterized by a nominal p-value of 0.04 in this meta-analysis.
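The multiple-testing threshold of 0.0011 used throughout the replication analyses is consistent with a Bonferroni correction of alpha = 0.05 over the 45-SNP panel. The short check below applies that threshold to the p-values quoted in the text; the variable names are illustrative only.

```python
ALPHA = 0.05
N_SNPS = 45  # size of the replication panel

threshold = ALPHA / N_SNPS  # about 0.0011

# P-values quoted in the text for the combined meta-analyses.
reported = {
    "rs558129 (all cohorts)": 0.0002,
    "rs558129 (replication cohorts only)": 0.0019,
    "rs4288991 (world-class subset)": 0.0028,
    "rs10938202 (world-class subset)": 0.0053,
}

# True only where the p-value clears the Bonferroni threshold.
significant = {label: p < threshold for label, p in reported.items()}
for label, passed in significant.items():
    print(f"{label}: {'significant' if passed else 'not significant'}")
```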

Discussion
It is commonly recognized that to perform at a world-class level in endurance athletic events, one has to be intrinsically well endowed in terms of cardiorespiratory and skeletal muscle potential to exercise at high intensity for sustained periods of time in combination with the ability to respond very favorably to exercise training regimens. Observational genetic epidemiology studies and experimental studies in humans as well as in rodents have shown that there are substantial genetic components to intrinsic cardiorespiratory fitness and its trainability [5,9,10]. However, the exact genomic features responsible for these genetic effects have not been identified despite many years of candidate gene driven research, a topic that has been reviewed in recent times [18,37,38,39]. A limited number of studies have used an unbiased genomic search to identify genomic regions harboring allelic markers of athlete status [40] or response to exercise training of physiological traits [30,31,41,42,43] or GWAS-identified loci associated with determinants of aerobic fitness or its trainability [28][29][30] but most findings have not been subjected to replication studies. The present report constitutes the largest effort to date designed to identify in an unbiased manner common variants that could begin to define the world-class endurance performance genotype. Even though the total sample size was substantially higher than any other published effort in this field, it is recognized that it was not optimal for the identification of common genomic variants with small effect size that could discriminate between elite endurance athletes and sedentary controls. The main conclusion from the present study is that common genetic variants do not appear to be strong determinants of elite endurance athlete status.
From the GWAS performed on a panel of Japanese endurance athletes and controls plus the CardioMetabochip screen on the endurance athletes from four countries and their matched controls of the GENATHLETE cohort, a panel of 45 SNPs was retained and carried forward for replication in seven other cohorts. Importantly, none of these 45 SNPs reached genome-wide significance in the two discovery studies. But these studies were small and were expected to generate targets at the genome-wide level of significance only if SNPs with large effect sizes were contributing to elite endurance athlete status. There was no overlap among the top SNPs identified in the two discovery panels.
When the results from all cohorts were pooled together in meta-analyses, SNP rs558129 located in the GALNTL6 locus on chromosome 4q34.1 was statistically significant (p = 0.0002) even after multiple testing was taken into account. The nominal p-values of the individual cohorts ranged from 0.011 (Australia) to 0.9572 (Spain), but the direction of the association was uniform across all cohorts, that is, the T allele was less frequent in athletes than in controls. The association remained robust after excluding the discovery cohort from the meta-analysis (p = 0.0019). The same SNP in GALNTL6 was also nominally significant (p = 0.037) when the meta-analysis was undertaken on the subsets of world-class endurance athletes. This illustrates how random-effects meta-analysis helps to identify consistent trends across several relatively small cohorts which individually would not allow for the documentation of such an association. The GALNTL6 gene encodes N-acetylgalactosaminyltransferase-like 6 but the functional role of the peptide is not fully elucidated at this time. It is expressed primarily in the testes but also in the brain and to some extent in skeletal muscle [44]. The rs558129 polymorphic site is in the last intron of the gene. A few weak leads may also be of interest in future research. Two SNPs in or near TSSC1, the gene encoding tumor suppressing subtransferable candidate 1, were associated with endurance athlete status in the cohort from Russia and one of these SNPs (rs2694093) was also shown to be nominally significant and directionally consistent in the global meta-analysis of all cohorts for all endurance athletes as well as the world-class athletes. Of potential interest could also be SNPs near TOX3/CHD9 on chr 16 and RPLP1/TLE3 on chr 15 which were nominally associated with endurance athlete status in the global meta-analyses of all endurance athletes and in the subset of world-class athletes as well. None of these genes have been implicated in exercise biology before.
Table 5 notes: Effect allele = minor allele. OR = odds ratio from a random-effects meta-analysis. P = p-value for OR from a random-effects meta-analysis. Q = p-value for Cochran's Q statistic (tests heterogeneity in effects across individual studies). MAP based on GRCh38 (hg38). *The gene located nearest to the SNP. Distance to the gene in kilobases (1,000 bp) is shown in parentheses. If no distance is shown, the SNP is located within the gene locus. doi:10.1371/journal.pone.0147330.t005
Interestingly, SNPs in three candidate genes (CKM, ACTN3 and GNB3) selected on the basis of prior results in Japanese endurance athletes were not associated with endurance athlete status in any of the cohorts or in meta-analyses. Along the same line, 161 SNPs related to 13 candidate genes for endurance performance derived from the human gene map for physical performance [18] were available among the CardioMetabochip markers genotyped in GENATHLETE. These candidate genes were: ACE, ACSL1, ACTN3, ADRB1, ADRB2, AMPD1, BDKRB2, GH1, IL6, KDR, NOS3, PPARA and PPARGC1A. None of the 161 SNPs reached the Bonferroni-adjusted significance threshold of p = 3.5 x 10^-7. Two SNPs in the vicinity of PPARGC1A were the most strongly associated with endurance athlete status at p ≥ 0.0012. None of the SNP associations were close to significance when the subsample of the GENATHLETE world-class endurance athletes (VO2max ≥ 78 mL O2/kg/min) was considered.
Over all, there is no convincing evidence for the contribution of common genomic variants to elite endurance performance in the present report. However, since we were underpowered to identify contributing alleles with small effect sizes, one may be justified in proposing that a few leads, even though supported by very modest levels of evidence, be explored in subsequent studies with adequate statistical power. For instance, the joint contribution from gene-members of a pathway or a network module could be significant, even if individual gene contributions are small and non-significant. We have used this approach successfully in other GWA studies, for example on the analysis of genetic associations to maximal oxygen uptake in response to exercise [45]. An alternative approach employs unsupervised bioinformatics data analysis techniques to generate functional hypotheses on how some of the top candidates from a study may be related to a trait. Graph based methods, such as those based on stochastic random walks [46], are well suited for this type of analysis and can provide useful information for candidate gene prioritization. In this regard, it may be useful to expand the comparison of world-class endurance athletes and sedentary controls to DNA sequence variants in or in the vicinity of a number of genes (for instance: GALNTL6 but also perhaps ADRA1A, ATP8A1, CHD9, CNTN3, RPLP1, TLE3, TOX3, TSSC1, AKT3, BET1, BMP10) in order to shed some light on the genetics, biology and the highly demanding selection process leading to world class endurance performance. Thus, using the 12 genes listed above in a Biograph analysis, in which the query was on athletic performance, we have found additional support for the suggestion that sequence variants in these genes should be further investigated (results not shown). 
Even when SNPs and genes are characterized by small effect size and marginal nominal significance, the joint contributions from gene-members of a pathway or a network may provide useful information.
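As an illustration of how a graph-based method built on stochastic random walks can rank candidates, the sketch below implements a generic random walk with restart on a small network. It is not the Biograph method or the algorithm of reference [46]. The node labels, the edges and the restart probability are all hypothetical, chosen only to show the mechanics.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker that, at each step,
    jumps back to a seed node with probability `restart` and otherwise
    moves to a uniformly chosen neighbour of its current node.

    `adjacency` maps each node to a list of neighbours; every node is
    assumed to have at least one neighbour."""
    nodes = sorted(adjacency)
    for node in nodes:
        if not adjacency[node]:
            raise ValueError(f"node {node!r} has no neighbours")
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(start)
    for _ in range(max_iter):
        # Restart mass goes back to the seeds.
        nxt = {n: restart * start[n] for n in nodes}
        # Remaining mass is spread evenly over each node's neighbours.
        for u in nodes:
            share = (1.0 - restart) * prob[u] / len(adjacency[u])
            for v in adjacency[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if delta < tol:
            break
    return prob


# HYPOTHETICAL toy network: "SEED" stands for a term of interest
# (e.g. a trait), A-D for candidate genes. Edges are undirected.
toy_network = {
    "SEED": ["A", "B"],
    "A": ["SEED", "B"],
    "B": ["SEED", "A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
}

scores = random_walk_with_restart(toy_network, seeds={"SEED"})
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.3f}")
```

Nodes that are closely connected to the seed accumulate more probability than distant ones, which is the property used to prioritize candidates. In a real analysis the network would come from curated knowledge bases rather than from a hand-written dictionary.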
The present report has several limitations. The SNP chips used for the discovery phase differed between GENATHLETE and the Japanese cohort of endurance athletes. These discovery studies were performed independently and later used to identify the panel of the 45 most promising SNPs. It is also recognized that the CardioMetabochip used in GENATHLETE offers less than comprehensive coverage of the genome and may provide less than optimal inclusion of markers and genes that could be important for the peripheral determinants of endurance performance. In some of the replication studies, participants in the control group were from the general population or were recreationally active subjects, while in others they were confirmed sedentary participants. However, even though the endurance athletes were exposed to variable training regimens, which are undoubtedly heterogeneous by athletic event and country, these factors should not have influenced the results, as all athletes achieved national and international levels of competition requiring an extraordinary level of cardiorespiratory endurance. Obviously, sample size continues to be a major issue for this type of study. This challenge is very difficult to overcome, as the number of world-class endurance athletes on the planet is limited and those who reach this level of performance are hesitant to give consent to participate in genetic studies. Finally, one must also consider that a number of the athletes who gave consent to participate in these genetic studies are using performance-enhancing drugs. However, it is unlikely that doping of any kind could have influenced the results, as the use of these drugs is not by itself responsible for an athlete reaching world-class level; rather, their use comes more into play when athletes of world-class caliber compete against one another.
OR = odds ratio from a random-effects meta-analysis. P = p-value for OR from a random-effects meta-analysis.
Q = p-value for Cochran's Q statistic (tests heterogeneity in effects across individual studies).
I2 (I-squared) = heterogeneity index (0-100; 0 = no heterogeneity, 100 = maximum heterogeneity). MAP based on GRCh38 (hg38). *The gene located nearest to the SNP. Distance to the gene in kilobases (1,000 bp) is shown in parentheses. If no distance is shown, the SNP is located within the gene locus.
In all likelihood, attaining the sample size of world-class caliber endurance athletes required for adequate statistical power in the search for critical sequence variants will require buy-in from the relevant world sports federations and the International Olympic Committee. With their support and active participation, thousands of world-class endurance athletes could be enrolled in genomics studies aimed at understanding the fundamentals of the inherited biological traits that are necessary to perform at the world-class level. Such an effort, particularly if it relied on whole genome sequencing, would allow for the exploration of not only common polymorphisms but also rare variants and copy number variants, and could be complemented by the investigation of epigenomic signatures in accessible tissues.
In summary, we found that the T allele in GALNTL6 was less frequent in endurance athletes of all studies compared to ethnicity-matched controls. However, we could not find evidence for a detailed genomic signature that differentiates endurance athletes from controls.
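For readers who want to see how the quantities defined in the table notes (OR, P, Q and I2) relate to one another, the sketch below computes them from per-study log odds ratios and standard errors. It assumes the DerSimonian-Laird estimator for the between-study variance, which is a common choice but is not stated in the text, and the input values are hypothetical, not results from this study.

```python
import math


def random_effects_meta(log_ors, ses):
    """Random-effects meta-analysis (DerSimonian-Laird, ASSUMED estimator)
    of per-study log odds ratios with their standard errors."""
    k = len(log_ors)
    w = [1.0 / se ** 2 for se in ses]               # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]    # random-effects weights
    sw_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_ors)) / sw_re
    se_pooled = math.sqrt(1.0 / sw_re)
    z = pooled / se_pooled
    p_value = math.erfc(abs(z) / math.sqrt(2.0))     # two-sided normal p-value
    # I-squared: share of total variation attributable to heterogeneity (0-100).
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"or": math.exp(pooled), "p": p_value, "q": q, "df": df,
            "tau2": tau2, "i2": i2}


# HYPOTHETICAL per-study odds ratios and standard errors of log(OR).
example_ors = [0.85, 0.90, 0.80, 0.88]
example_ses = [0.10, 0.15, 0.12, 0.20]

result = random_effects_meta([math.log(x) for x in example_ors], example_ses)
print(f"Pooled OR = {result['or']:.3f}, P = {result['p']:.4g}")
print(f"Cochran's Q = {result['q']:.3f} on {result['df']} df, I2 = {result['i2']:.1f}")
```

The sketch reports the Q statistic itself. Converting it to the p-value listed in the table notes requires the upper tail of a chi-square distribution with k − 1 degrees of freedom, which is not in the Python standard library (for example, `scipy.stats.chi2.sf(q, df)` could be used). When all studies point to the same effect, Q falls below its degrees of freedom and I2 is truncated to 0, as reported for rs558129.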