There are strong genetic components to cardiorespiratory fitness and its response to exercise training. It would be useful to understand the differences in the genomic profile of highly trained endurance athletes of world class caliber and sedentary controls. An international consortium (GAMES) was established in order to compare elite endurance athletes and ethnicity-matched controls in a case-control study design. Genome-wide association studies were undertaken on two cohorts of elite endurance athletes and controls (GENATHLETE and Japanese endurance runners), from which a panel of 45 promising markers was identified. These markers were tested for replication in seven additional cohorts of endurance athletes and controls: from Australia, Ethiopia, Japan, Kenya, Poland, Russia and Spain. The study is based on a total of 1520 endurance athletes (835 who took part in endurance events in World Championships and/or Olympic Games) and 2760 controls. We hypothesized that world-class athletes are likely to be characterized by an even higher concentration of endurance performance alleles and we performed separate analyses on this subsample. The meta-analysis of all available studies revealed one statistically significant marker (rs558129 at GALNTL6 locus, p = 0.0002), even after correcting for multiple testing. As shown by the low heterogeneity index (I2 = 0), all eight cohorts showed the same direction of association with rs558129, even though p-values varied across the individual studies. In summary, this study did not identify a panel of genomic variants common to these elite endurance athlete groups. Since GAMES was underpowered to identify alleles with small effect sizes, some of the suggestive leads identified should be explored in expanded comparisons of world-class endurance athletes and sedentary controls and in tightly controlled exercise training studies. 
Such studies have the potential to illuminate the biology not only of world class endurance performance but also of compromised cardiac functions and cardiometabolic diseases.
Citation: Rankinen T, Fuku N, Wolfarth B, Wang G, Sarzynski MA, Alexeev DG, et al. (2016) No Evidence of a Common DNA Variant Profile Specific to World Class Endurance Athletes. PLoS ONE 11(1): e0147330. doi:10.1371/journal.pone.0147330
Editor: Stuart Raleigh, University of Northampton, UNITED KINGDOM
Received: April 29, 2015; Accepted: January 1, 2016; Published: January 29, 2016
Copyright: © 2016 Rankinen et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are provided within the paper and its Supporting Information files. Regulations regarding confidentiality and access to original and confidential data on athletes vary from country to country. The original athlete data are controlled by the Principal Investigator of each study. Interested researchers may access the data of the two discovery cohorts (GENATHLETE and Japanese cohort) at Figshare (Rankinen, Tuomo (2015): GAMES discovery data sets http://dx.doi.org/10.6084/m9.figshare.1619893) and at www.athlomeconsortium.org. For the data of the various replication studies, the principal investigator of each contributing study should be contacted. If problems arise, interested scientists may contact Claude Bouchard (email@example.com) who will facilitate the interactions with the investigators of the contributing studies.
Funding: Partial funding for the study has been received from the Prince Faisal Prize awarded to Drs. C. Bouchard, T. Rankinen, M. Sarzynski and B. Wolfarth. CB is partially funded by the John W. Barton Jr Chair in Genetics and Nutrition. The study in Russia was supported by a grant from the Federal Medical-Biological Agency (“Sportgen project”), http://fmbaros.ru/en/fmba/infor/. The Spanish group (AL, CAM, CS, TY) was funded by Fondo de Investigaciones Sanitarias (grant # PI12/00914). Research in sports genetics to A. Lucia in Spain was funded by grants from the Consejo Superior de Deportes (CSD). This work in Japan was supported in part by grants from the programs Grants-in-Aid for Challenging Exploratory Research (24650414 to NF) from the Ministry of Education, Culture, Sports, Science and Technology; and by a grant-in-aid for scientific research from the Ministry of Health, Labor, and Welfare of Japan (to MM). No specific funding for this work was received for the studies performed in Australia, Poland, Kenya and Ethiopia. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: CB was a member of the Science Advisory board of Pathway Genomics from 2011–2014. CB was a member of the Advisory board of Nike-SPARK from 2012–2014. All other authors have declared that no competing interests exist. This does not alter the authors' adherence to PLOS ONE policies on sharing data and materials.
Early studies on the genetic basis of sports performance reported that identical twins who engaged in competitive sports were significantly more likely to participate in the same sports than pairs of dizygotic twins. The first documented attempts to identify genetic markers for sports performance date to the 1968 Mexico and 1976 Montreal Olympic Games and were based on common blood genetic markers. They did not yield any strong positive findings [2–4]. A high maximal oxygen uptake (VO2max) is a necessary condition to reach the level of an endurance athlete of international caliber. A high VO2max can only be achieved if an individual is endowed with a very high level in the sedentary state (intrinsic level) in combination with large increases in response to sustained and demanding exercise training regimens (trainability).
Twin and family studies have revealed that the intrinsic level of VO2max is strongly influenced by a genetic component. For instance, the heritability of VO2max adjusted for age, sex and body composition in sedentary families of European descent reached 51% in the HERITAGE Family Study. Evidence of a significant heritability level has also been documented for exercise training-induced improvements in VO2max. Large individual differences in VO2max gains have been found in sedentary young adults subjected to standardized endurance training programs. A series of exercise training experiments conducted with pairs of identical twins revealed that the differences in trainability were not distributed randomly among the twins, with intraclass correlations in response of VO2max (L O2/min) ranging from 0.44 to 0.77 [7–9], indicating that members of the same twin pair responded similarly to training. In HERITAGE, the increase in VO2max in 481 individuals from 99 two-generation families of Whites of European descent showed 2.5 times more variance between families than within families for VO2max response, with a maximal heritability estimate of 47%. Adjusting the VO2max response data for baseline VO2max did not modify this estimate, suggesting along with other evidence that the familial and genetic factors underlying VO2max in the sedentary state and its response to exercise training are different.
In the case-control GENATHLETE study, single nucleotide polymorphisms (SNPs) in several candidate genes were investigated but none have provided strong evidence for differences in allele and genotype frequencies between elite endurance athletes and controls [12–17]. Reports published to date have focused on differences in allele frequencies between athletes and non-athlete controls, mainly on angiotensin I converting enzyme (ACE), α-actinin-3 (ACTN3), and peroxisome proliferator-activated receptor-γ coactivator 1α (PPARGC1A) polymorphisms and on mitochondrial DNA (mtDNA) haplogroup distributions. There were several positive findings but most were derived from post hoc subgroup analyses and only a few studies controlled for multiple testing. Most prior studies have been based on elite Japanese runners [19, 20], Spanish endurance athletes, Ethiopian and Kenyan endurance runners [22, 23], Russian endurance athletes, as well as endurance athletes from Australia, Israel and Poland.
Unbiased genome-wide approaches have been used in the search for genomic regions, transcripts or DNA variants linked or associated with endurance performance-related traits [28–31]. For instance, in a recent study, the association of 324,611 SNPs with the response of VO2max to endurance training in 473 Whites from HERITAGE was investigated. None of the SNPs reached genome-wide significance even though there were several SNPs moderately associated with VO2max trainability.
It is of scientific interest to understand the differences in the genomic profile of highly trained endurance athletes of world class caliber and sedentary controls from the same ethnic ancestry. Such data could illuminate the biology of cardiorespiratory fitness and human adaptability and would likely have implications for common health problems such as those observed in aging individuals with declining health, diabetic patients with compromised cardiovascular fitness or patients with ischemic heart disease or heart failure to name but a few. Given the heterogeneous results reported in previous studies, we established an international consortium (GAMES) in order to reach larger sample sizes of elite endurance athletes and matched controls in a case-control study design. Genome-wide explorations were undertaken on two cohorts of world-class endurance athletes and controls (GENATHLETE and Japanese endurance runners), which generated a panel of 45 markers that were subsequently used for replication in seven additional cohorts of endurance athletes and controls.
Participants and Methods
The GENATHLETE cohort and a sample of endurance athletes and controls from Japan were used for the discovery phase. The characteristics of the all-male GENATHLETE participants are presented in S1 Table and S1 Fig of the Supporting Information. There were 315 elite endurance athletes (national and world-class level) from Germany, Finland, Canada and the USA plus 320 sedentary controls from the same countries. Athletes and controls were closely matched for ethnicity and country of origin. The mean highest recorded VO2max of these athletes was 79.0 mL O2/kg/min (SD = 3.4), while the mean value for the sedentary controls was 40.0 mL O2/kg/min (SD = 7.1). In the Japanese sample, 60 Japanese elite runners (world-class athletes) and 116 controls were available for the discovery phase. Additional information on these two cohorts is provided in the Supporting Information.
Endurance athletes and controls from 7 countries were used for the replication phase (Table 1). Participants were from Australia, Ethiopia, Japan, Kenya, Poland, Russia and Spain. It should be noted that we used a second Japanese cohort (143 athletes, 692 controls) in the replication phase. Again, in each country, endurance athletes were competing at least at the national level and most of them competed in Olympic or world cup events. A total of 1520 athletes and 2760 controls were involved in the present study. Among the endurance athletes, 1045 (about 69% of total) took part in endurance events at world championships and Olympic Games. We hypothesized that these world-class athletes are likely to be characterized by a higher concentration of “endurance performance alleles” and we performed separate analyses on this subsample (see Supporting Information for details).
GAMES was established after the two discovery studies (GENATHLETE and the Japanese endurance athlete cohort) had already completed their genome-wide screens and analyses. Thus there are differences in the chips used and the analytical strategies. However, the data of the replication studies were all analyzed centrally at the Pennington site.
In GENATHLETE, genomic DNA was extracted from whole-blood samples by commercial DNA extraction kits (Gentra Systems, Inc., Minneapolis, MN), and the DNA stock samples were diluted to 50 ng/μl concentrations. SNPs for the study were those captured in the Illumina CardioMetabochip (Illumina Inc., San Diego, CA), which contains over 195,000 genetic markers including ~66,000 variants implicated in the aetiology of cardiometabolic traits and disease outcomes from discovery GWAS cohorts, as well as variants around known loci for the purposes of fine-mapping. The SNPs were genotyped using the Illumina Infinium II assay on the Illumina iScan platform.
SNPs showing marked deviation from Hardy-Weinberg equilibrium (HWE) (p ≤ 0.00001) were excluded. However, since deviations from HWE may be related to case-control status-related differences of genotype frequencies, an identical non-HWE pattern was confirmed both in endurance athletes and non-athlete controls before the SNP was excluded from the database. A total of 143,000 SNPs were polymorphic and passed the quality control filters. In GENATHLETE, we estimate that we had 80% statistical power to detect odds ratios (ORs) of 2.7 and 2.1 for minor allele frequencies of 0.1 and 0.3, respectively, assuming an additive model and an alpha level of 5 x 10−8.
For the Japanese cohort, total DNA was isolated from venous blood or saliva by use of the QIAamp DNA Blood Maxi Kit (QIAGEN, Hilden, Germany) or Oragene DNA Collection Kits (DNA Genotek, Ontario, Canada), respectively. Total DNA samples were genotyped for more than 700,000 markers using the Illumina® HumanOmniExpress Beadchip. The genotype calls were performed with the Illumina GenomeStudio software. Quality control measures were performed as defined in the Supporting Information. After removing SNPs failing quality control, 541,179 autosomal SNPs in 60 Japanese endurance athletes and 116 Japanese controls were available for association analyses. In the Japanese cohort, we had 80% statistical power to detect ORs of 6.1 and 4.5 for minor allele frequencies of 0.1 and 0.3, respectively, assuming an additive model and an alpha level of 5 x 10−8.
For the GENATHLETE samples, tests of HWE for each SNP were conducted using the exact test implemented in the PEDSTATS software. Allele frequency differences between athletes and controls were tested using a chi-square test as implemented in the PLINK software package. For the Japanese study, standard allelic association analysis was performed by comparing allele-frequency differences between Japanese endurance runners and controls.
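The HWE filter described above can be illustrated with a minimal sketch. It rests on two assumptions that are not taken from the study: it uses the simpler chi-square goodness-of-fit test (1 df) in place of the exact test applied in the actual analysis, and the function names `hwe_chi_square_p` and `exclude_snp` are hypothetical. The sketch encodes the rule that a SNP is dropped only when the deviation is seen in both athletes and controls.

```python
import math


def hwe_chi_square_p(n_aa, n_ab, n_bb):
    """Approximate HWE p-value from genotype counts (chi-square, 1 df).

    Assumption: a goodness-of-fit approximation, not the exact test
    used in the study.
    """
    n = n_aa + n_ab + n_bb
    p_a = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    p_b = 1.0 - p_a
    observed = [n_aa, n_ab, n_bb]
    expected = [n * p_a ** 2, 2 * n * p_a * p_b, n * p_b ** 2]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def exclude_snp(athlete_counts, control_counts, threshold=1e-5):
    """Exclude a SNP only if HWE is violated in both groups."""
    return (hwe_chi_square_p(*athlete_counts) <= threshold
            and hwe_chi_square_p(*control_counts) <= threshold)
```

Requiring the deviation in both groups guards against discarding a SNP whose genotype distribution departs from HWE only in athletes, which could reflect a true association rather than a genotyping artifact.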
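As an illustration of the allelic association test, the following sketch computes a Pearson chi-square statistic (1 df, no continuity correction) on a 2 x 2 table of allele counts. It is a generic textbook version, not the PLINK implementation; the function name and the counts in the usage line are hypothetical.

```python
import math


def allelic_chi_square(case_minor, case_major, ctrl_minor, ctrl_major):
    """Chi-square test of allele counts in athletes (cases) vs. controls.

    Returns (chi2, p_value, odds_ratio). Assumes all four counts are > 0.
    """
    table = [[case_minor, case_major], [ctrl_minor, ctrl_major]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_minor + ctrl_minor, case_major + ctrl_major]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    # Odds of carrying the minor allele in athletes relative to controls.
    odds_ratio = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    return chi2, p_value, odds_ratio


# Hypothetical allele counts, for illustration only.
chi2, p_value, odds_ratio = allelic_chi_square(150, 480, 200, 440)
```

Each individual contributes two alleles, so the counts passed in are allele counts (twice the number of genotyped participants per group), not genotype counts.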
None of the SNPs reached genome-wide significance level (p<5 x 10−8) in GENATHLETE or the Japanese Study. Considering that this may be simply the result of the relatively small sample size of the discovery cohorts, we elected to retain the 45 most promising SNPs (all p<1 x 10−4) for further testing in the replication studies. Among these 45 SNPs, 26 came from GENATHLETE CardioMetabochip (13 based on full cohort, 13 from the highest VO2max subgroup analyses) and 19 from the Japanese GWAS.
The 45 SNPs carried forward were genotyped in athletes and controls from Australia, Ethiopia, Japan, Kenya, Poland, Russia and Spain as described in the Supporting Information. Tests of HWE for each SNP were also performed with the PEDSTATS software and allele-frequency differences between athletes and controls were tested using a chi-square test as implemented in PLINK.
Results from individual studies were combined using a meta-analysis approach. These analyses were done with the meta-analysis routine of the PLINK software using study-specific association test result files as an input. Three subsets were meta-analyzed: all studies based on athletes of European descent that were part of the replication phase (i.e. Australia, Poland, Russia and Spain), the two studies based on participants from Africa, and all studies combined (including athletes from Ethiopia, Kenya and Japan). In each case, analyses were undertaken based on the results of the comparisons between all athletes and controls from each national cohort.
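The way study-level results are pooled can be sketched with a generic inverse-variance, fixed-effect meta-analysis of log odds ratios, together with Cochran's Q and the I2 heterogeneity index. This is a minimal sketch of the general technique, not the PLINK routine itself; the function name is hypothetical, and the random-effects estimate is omitted.

```python
import math


def fixed_effect_meta(log_ors, std_errs):
    """Inverse-variance fixed-effect meta-analysis.

    log_ors: per-study log odds ratios; std_errs: their standard errors.
    Returns (pooled_log_or, pooled_se, p_value, cochran_q, i2_percent).
    """
    weights = [1.0 / se ** 2 for se in std_errs]
    w_sum = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / w_sum
    pooled_se = math.sqrt(1.0 / w_sum)
    z = pooled / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q: weighted squared deviations from the pooled effect.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    # I2: share of total variation attributable to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, pooled_se, p_value, q, i2
```

An I2 of 0 arises whenever Q does not exceed its degrees of freedom, that is, when the between-study variation is no larger than expected from sampling error alone.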
Further Analyses Performed on World-Class Endurance Athletes
In some of the national cohorts, all endurance athletes were of world-class caliber. However, in other cohorts (GENATHLETE, Poland, Russia and the second Japanese cohort), there was a combination of national level and world-class caliber endurance athletes. We were able to classify the endurance athletes between those who were truly of world-class caliber, as defined by their participation in World Championship competitions or in Olympic Games, and those who competed at the national level (see Supporting Information). In the case of GENATHLETE, the classification was based on the VO2max measurement, with those exhibiting a VO2max value ≥78 mL O2/kg/min being classified as part of the “super elite” contingent. All the analyses described in the previous sections were repeated using only these world-class super elite endurance athletes and controls.
The GENATHLETE project was originally approved by the Medical Ethics Committee of Laval University (Quebec, Canada). Approval has also been obtained from the medical ethics committees of all institutions that have contributed participants to the GENATHLETE cohort. Continuing approval for the study has been granted by the IRB of Pennington Biomedical Research Center. Each GENATHLETE participant has given written informed consent. Written informed consent was obtained from all participants from the Japanese, Ethiopian, and Kenyan cohorts, respectively, and was approved by the Institutional Review Board of Tokyo Metropolitan Institute of Gerontology, National Institute of Health and Nutrition, Japan; the Oxford Tropical Research Ethics Committee, the University of Glasgow Ethics Committee, and a committee from the Ethiopian Athletics Federation; and the University Ethics Committee in Kenya. Written informed consent was obtained for all Australian participants. The study was approved by the institutional review boards and Ethics Committees of the Children’s Hospital at Westmead, the University of Sydney, and the Australian Institute of Sport. The procedures followed in the study from Poland were approved by the Pomeranian Medical University Ethics Committee and all participants gave informed written consent. The elite athlete study from Russia was approved by the Ethics Committee of the Research Institute for Physical-Chemical Medicine. Written informed consent was obtained from each participant. All participants from the Spanish cohort provided written consent and the study protocol was approved by the IRB of Pablo Olavide University (Spain).
In GENATHLETE, a total of 143,000 SNPs were polymorphic and passed the quality control filters. A Manhattan plot depicting associations between elite endurance athletes and sedentary controls for all SNPs across the 22 autosomes is shown in Panel A of Fig 1. The strength of the association is shown on the y-axis as the −log10 of the p-value, which represents the statistical significance of the allele frequency chi-square test. None of the SNPs reached the genome-wide significance threshold of 5 x 10−8. Fifty SNPs showed p-values of less than 5.0 x 10−4 and the top 26 SNPs are shown in Table 2. Among the 50 SNPs, 8 were found to be in strong pairwise linkage disequilibrium (r2>0.9) and 2 had a minor allele frequency of <5%. From the 40 SNPs left, the top 26 with the most significant p values in GENATHLETE were retained and are shown in the table.
(A) Athletes with VO2max ≥ 75 ml O2/kg/min (N = 315) vs. Controls (N = 320). (B) Elite Athletes with VO2max ≥ 78 ml O2/kg/min (N = 168) vs. Controls (N = 320).
The strongest evidence of association (1.2 x 10−5 < p < 6.1 x 10−5) was detected with a cluster of SNPs located on the long arm of chromosome 15 (15q23) about 69.7 million base pairs from the start of the chromosome (Panel A in Fig 1). While the cluster is located in an intergenic region, the same chromosomal region has been previously reported to be associated with atrial fibrillation in the Framingham Heart Study. Five additional SNPs showed associations with p-values ranging from 2.61 x 10−5 to 7.37 x 10−5; these markers were located within or in the vicinity of genes encoding adrenoceptor alpha 1 A (ADRA1A), BCL2-like 14 [apoptosis facilitator] (BCL2L14), GLIS family zinc finger 3 (GLIS3), myosin, light-chain 2, regulatory, cardiac, slow (MYL2), and myosin VB (MYO5B).
To evaluate whether these genetic associations were even more prominent among elite athletes with the highest cardiorespiratory fitness level, the analyses were repeated by comparing only those athletes with a VO2max of ≥78 mL/kg/min (N = 168) versus controls. The rationale was that sequence variants playing a role in maximal endurance capacity should cluster even more tightly in world-class endurance athletes compared to endurance athletes with lower VO2max levels. An overview of the results is shown in Panel B of Fig 1. A total of 50 SNPs showed associations with p-values less than 5.0 x 10−4 and eight SNPs with p-values less than 9.7 x 10−5. The strongest evidence of association was detected with a SNP located in the ADRA1A locus (p = 8.69 x 10−7) and the top SNPs are also listed in Table 2.
Japanese Endurance GWAS
In the Japanese cohort, after removing SNPs and individuals failing quality control, 541,179 autosomal SNPs were available for analysis. A total of 31% of the SNPs were common to the Illumina chips used in the Japanese endurance athlete cohort and GENATHLETE. Allelic association analyses were performed by comparing allele-frequency differences between Japanese endurance athletes and controls. The Quantile-Quantile p-value plot of observed versus expected −log10(p) values is shown in S2 Fig. The genomic inflation factor (λ) value in Japanese was 1.002, indicating that there was no substantial evidence of population stratification. A Manhattan plot of −log10(p) values for associations of elite Japanese endurance status with markers in 22 autosomes is shown in Fig 2.
Red line refers to p = 5 x 10−6; blue line refers to p = 5 x 10−5.
No SNPs exceeded the threshold for genome-wide significance (p value < 5 x 10−8). The association results for markers with p < 5 x 10−5 plus a few markers as defined next are displayed in Table 3. Out of the total SNPs entered into the association analysis, 21 met this threshold. Among them, 3 SNPs were associated at p < 5 x 10−6. Regional association plots of the top signals were created using LocusZoom Version 1.1, including information on the location and orientation of genes, local estimates of recombination rates and levels of linkage disequilibrium (LD). An individual plot was specified by the SNP of interest and treated as the key marker for the region. Markers within a 500 kb flanking region on each side of the index SNP were included. Plots were generated based on Human Genome19 (hg19). Pairwise LD between the index SNP and the surrounding SNPs and recombination rates were estimated in LocusZoom using 1000G Mar 2012 ASN as the reference population.
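The genomic inflation factor reported above is conventionally defined as the median of the observed 1-df chi-square statistics divided by the median expected under the null (about 0.455). The sketch below follows that common definition and back-transforms p-values to chi-square statistics; it is an assumption that the study computed λ in this way, and the function name is hypothetical.

```python
from statistics import NormalDist, median

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(p_values):
    """Estimate lambda from association p-values (each in (0, 1])."""
    nd = NormalDist()
    # A 1-df chi-square statistic is a squared standard normal deviate,
    # so the statistic for p is the squared quantile at p / 2.
    chi2_stats = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2_stats) / CHI2_1DF_MEDIAN
```

Values of λ close to 1, such as the 1.002 reported for the Japanese cohort, indicate that the bulk of the test statistics follow the null distribution.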
Regional association plots of the 21 SNPs with a p < 5 x 10−5 were further inspected, leading to the exclusion of 10 SNPs, which were considered as redundant. The regional association plots for the remaining 11 SNPs (see Table 3) are presented in S3–S13 Figs of the Supporting Information. Another set of 7 SNPs with a p < 10−4 were added to the panel after examination of regional association plots (S14–S20 Figs). Furthermore, rs2694093 (p = 0.00014) was independently associated with athlete status and is in a region near the strongest signal rs921665 (chr2:3174321, hg19/1000 Genomes Mar 2012 ASN; S21 Fig). It was also included in the discovery set. All 19 SNPs retained for the replication phase based on the results on the Japanese athletes are summarized in Table 3. These 19 SNPs were combined with the 26 retained from GENATHLETE to constitute the panel of 45 SNPs that was used in the replication cohorts.
Table 4 depicts the association results for the 42 SNPs that could be tested in at least two of the four replication cohorts of Caucasians, namely Australian, Polish, Russian and Spanish endurance athletes and controls. Meta-analysis showed that none of the markers summarized in Table 4 reached statistical significance after accounting for multiple testing (Bonferroni-corrected statistical significance p<0.0011), although one marker (rs558129 in GALNTL6) showed a nominal p-value of 0.02. Five SNPs were nominally (p<0.05) associated with endurance athlete status in the Australian study, as well as one in the Polish cohort, four in the Spanish cohort, and five in the Russian cohort. One SNP (rs7947391) showed nominally significant associations in three replication cohorts (Australians, Polish and Spanish). However, as is evident from the meta-analysis results, the direction of the association was not the same across the three studies; the minor allele frequency was lower in athletes than in controls in the Spanish cohort (as well as in the GENATHLETE discovery cohort), whereas Australian and Polish athletes had greater minor allele frequency than their non-athlete controls.
The meta-analysis of the subgroup of athletes classified as world class and truly elite endurance athletes did not reveal any significant associations. Five SNPs had nominal p-values less than 0.05 in Australian world-class athletes, one in Polish world-class athletes, four SNPs in Spanish world-class athletes, and three in Russian world-class athletes (Table 5). However, these associations were generally not directionally consistent among cohorts.
Next, we report on the associations between 35 SNPs identified in the discovery phase and endurance athlete status in Ethiopian and Kenyan athletes and controls (Table 6). The lower number of SNPs available for these analyses is due to the fact that several SNPs selected for the replication phase were monomorphic or very rare in African populations. The meta-analysis of the Ethiopian and Kenyan athletes did not reveal any associations that were significant after accounting for multiple testing, although four SNPs had nominal p-value less than 0.05. Five and three SNPs had nominal p-value less than 0.05 in the Kenyan sample (276 athletes versus 83 controls) and in the Ethiopian study (75 athletes versus 198 controls), respectively, but none of these SNPs overlapped between the two cohorts. In fact, the most significant SNPs observed in the Kenyan cohort showed associations in the opposite direction in the Ethiopian cohort.
In an attempt to identify lead SNPs based on all the cohorts contributing to the GAMES project, meta-analyses were performed using all contributing studies. The results are summarized in Table 7. Finally, the summary meta-analyses were repeated with all subsamples of world-class endurance athletes and these results are depicted in Table 8. Among all available athletes and controls (9 studies, including discovery cohorts), the meta-analysis revealed one statistically significant SNP (rs558129 at GALNTL6 locus, p = 0.0002), even after correcting for multiple testing (Bonferroni-corrected statistical significance p<0.0011). As shown by the low heterogeneity index (I2 = 0), all eight cohorts showed the same direction of association with rs558129, even though p-values varied considerably across the individual studies. When the meta-analysis was repeated without the discovery cohorts, the association persisted near the multiple testing-adjusted significance threshold (p = 0.0019).
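The multiple-testing threshold can be reproduced with simple arithmetic, assuming it was obtained by dividing a nominal alpha of 0.05 by the 45 SNPs in the replication panel; this assumption matches the reported p<0.0011 but is not stated explicitly in the text. The helper name is hypothetical and the p-values are those reported above for rs558129.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Assumed: alpha = 0.05 spread over the 45-SNP replication panel.
threshold = bonferroni_threshold(0.05, 45)

# p-values reported in the text for rs558129.
reported = {
    "all cohorts": 0.0002,
    "replication cohorts only": 0.0019,
}
significant = {label: p < threshold for label, p in reported.items()}
```

Under this assumption the all-cohort result passes the corrected threshold, whereas the replication-only result falls just short of it, in line with the description of that association as persisting near the threshold.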
When the meta-analysis was restricted to the world class athletes versus all controls, none of the SNPs retained for replication reached multiple testing-corrected statistical significance, although SNPs rs4288991 (p = 0.0028) and rs10938202 (p = 0.0053) were trending close. SNP rs558129 was characterized by a nominal p value of 0.04 in this meta-analysis.
It is commonly recognized that to perform at a world-class level in endurance athletic events, one has to be intrinsically well endowed in terms of cardiorespiratory and skeletal muscle potential to exercise at high intensity for sustained periods of time in combination with the ability to respond very favorably to exercise training regimens. Observational genetic epidemiology studies and experimental studies in humans as well as in rodents have shown that there are substantial genetic components to intrinsic cardiorespiratory fitness and its trainability [5, 9, 10]. However, the exact genomic features responsible for these genetic effects have not been identified despite many years of candidate gene driven research, a topic that has been reviewed in recent times [18, 37–39]. A limited number of studies have used an unbiased genomic search to identify genomic regions harboring allelic markers of athlete status or response to exercise training of physiological traits [30, 31, 41–43] or GWAS-identified loci associated with determinants of aerobic fitness or its trainability [28–30] but most findings have not been subjected to replication studies. The present report constitutes the largest effort to date designed to identify in an unbiased manner common variants that could begin to define the world-class endurance performance genotype. Even though the total sample size was substantially higher than any other published effort in this field, it is recognized that it was not optimal for the identification of common genomic variants with small effect size that could discriminate between elite endurance athletes and sedentary controls. The main conclusion from the present study is that common genetic variants do not appear to be strong determinants of elite endurance athlete status.
From the GWAS performed on a panel of Japanese endurance athletes and controls plus the CardioMetabochip screen on the endurance athletes from four countries and their matched controls of the GENATHLETE cohort, a panel of 45 SNPs was retained and carried forward for replication in seven other cohorts. Importantly, none of these 45 SNPs reached genome-wide significance in the two discovery studies. But these studies were small and were expected to generate targets at the genome-wide level of significance only if SNPs with large effect sizes were contributing to elite endurance athlete status. There was no overlap among the top SNPs identified in both discovery panels.
When the results from all cohorts were pooled together in meta-analyses, SNP rs558129 located in the GALNTL6 locus on chromosome 4q34.1 was statistically significant (p = 0.0002) even after multiple testing was taken into account. The nominal p-values of the individual cohorts ranged from 0.011 (Australia) to 0.9572 (Spain), but the direction of the association was uniform across all cohorts, that is the T allele was less frequent in athletes than in controls. The association remained robust after excluding the discovery cohort from the meta-analysis (p = 0.0071). The same SNP in GALNTL6 was also nominally significant (p = 0.037) when the meta-analysis was undertaken on the subsets of world-class endurance athletes. This illustrates how random effects meta-analysis helps to identify consistent trends across several relatively small cohorts which individually would not allow for the documentation of such an association. The GALNTL6 gene encodes N-acetylgalactosaminyltransferase-like 6 but the functional role of the peptide is not fully elucidated at this time. It is expressed primarily in the testes but also in the brain and to some extent in skeletal muscle . The rs558129 polymorphic site is in the last intron of the gene.
A few weak leads may also be of interest in future research. Two SNPs in or near TSSC1, the gene encoding tumor suppressing substransferable candidate 1, were associated with endurance athlete status in the cohort from Russia and one of these SNPs (rs2694093) was also shown to be nominally significant and directionally consistent in the global meta-analysis of all cohorts for all endurance athletes as well as the world-class athletes. Of potential interest could also be SNPs near TOX3/CHD9 on chr 16 and RPLP1/TLE3 on chr 15 which were nominally associated with endurance athlete status in the global meta-analyses of all endurance athletes and in the subset of world-class athletes as well. None of these genes have been implicated in exercise biology before.
Interestingly, SNPs in three candidate genes (CKM, ACTN3 and GNB3) selected on the basis of prior results in Japanese endurance athletes were not associated with endurance athlete status in any of the cohorts or in meta-analyses. Along the same lines, 161 SNPs related to 13 candidate genes for endurance performance derived from the human gene map for physical performance were available among the CardioMetabochip markers genotyped in GENATHLETE. These candidate genes were: ACE, ACSL1, ACTN3, ADRB1, ADRB2, AMPD1, BDKRB2, GH1, IL6, KDR, NOS3, PPARA and PPARGC1A. None of the 161 SNPs reached the Bonferroni-adjusted significance threshold of p = 3.5 × 10⁻⁷. Two SNPs in the vicinity of PPARGC1A were the most strongly associated with endurance athlete status at p ≤ 0.0012. None of the SNP associations were close to significance when the subsample of the GENATHLETE world-class endurance athletes (VO2max ≥78 mL O2/kg/min) was considered.
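The arithmetic behind a Bonferroni threshold can be sketched as follows. The family-wise error rate of 0.05 is an assumption, since the text does not state it. Under that assumption, the quoted threshold corresponds to a correction for roughly 143,000 tests, which would suggest it was set across the genotyped markers rather than across the 161 candidate SNPs alone; this reading is also an assumption.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


ALPHA = 0.05                 # assumed family-wise error rate (not stated in the text)
reported_threshold = 3.5e-7  # threshold quoted in the text
best_candidate_p = 0.0012    # strongest candidate-gene association quoted in the text

# Number of tests implied by the quoted threshold under the assumed alpha.
implied_tests = round(ALPHA / reported_threshold)     # 142857

# Threshold if only the 161 candidate SNPs were corrected for.
candidate_threshold = bonferroni_threshold(ALPHA, 161)  # about 3.1e-4

passes_reported = best_candidate_p <= reported_threshold
passes_candidate_only = best_candidate_p <= candidate_threshold

print(implied_tests, candidate_threshold, passes_reported, passes_candidate_only)
```

Under either correction, the strongest PPARGC1A association (p ≤ 0.0012) would not reach significance.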
Overall, there is no convincing evidence for the contribution of common genomic variants to elite endurance performance in the present report. However, since we were underpowered to identify contributing alleles with small effect sizes, one may be justified in proposing that a few leads, even though supported by very modest levels of evidence, be explored in subsequent studies with adequate statistical power. For instance, the joint contribution from gene-members of a pathway or a network module could be significant, even if individual gene contributions are small and non-significant. We have used this approach successfully in other GWA studies, for example in the analysis of genetic associations with maximal oxygen uptake in response to exercise. An alternative approach employs unsupervised bioinformatics data analysis techniques to generate functional hypotheses on how some of the top candidates from a study may be related to a trait. Graph-based methods, such as those based on stochastic random walks, are well suited for this type of analysis and can provide useful information for candidate gene prioritization. In this regard, it may be useful to expand the comparison of world-class endurance athletes and sedentary controls to DNA sequence variants in or in the vicinity of a number of genes (for instance: GALNTL6 but also perhaps ADRA1A, ATP8A1, CHD9, CNTN3, RPLP1, TLE3, TOX3, TSSC1, AKT3, BET1, BMP10) in order to shed some light on the genetics, biology and the highly demanding selection process leading to world-class endurance performance. Indeed, using the 12 genes listed above in a BioGraph analysis, in which the query was on athletic performance, we have found additional support for the suggestion that sequence variants in these genes should be further investigated (results not shown).
Even when SNPs and genes are characterized by small effect size and marginal nominal significance, the joint contributions from gene-members of a pathway or a network may provide useful information.
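The idea of ranking candidates by a stochastic random walk on a graph can be illustrated with a generic random walk with restart. This is a minimal sketch of the general technique, not the BioGraph algorithm itself. The toy network, the node names (`seed_gene`, `gene_A` to `gene_D`) and the restart probability are all hypothetical and carry no biological meaning.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walker that, at each step,
    jumps back to a seed node with probability `restart` and otherwise moves to a
    uniformly chosen neighbour of its current node."""
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                # Node without neighbours: return its mass to the seeds.
                for n in nodes:
                    nxt[n] += (1.0 - restart) * p[u] * p0[n]
                continue
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if diff < tol:
            break
    return p


# Hypothetical undirected toy network, written as symmetric adjacency lists.
toy_network = {
    "seed_gene": ["gene_A", "gene_B"],
    "gene_A": ["seed_gene", "gene_C"],
    "gene_B": ["seed_gene", "gene_C"],
    "gene_C": ["gene_A", "gene_B", "gene_D"],
    "gene_D": ["gene_C"],
}

scores = random_walk_with_restart(toy_network, seeds={"seed_gene"})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Nodes that are closer to the seed, or connected to it through more paths, receive higher scores, which is the property that makes this family of methods useful for prioritizing candidate genes relative to a query term or a set of known genes.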
The present report has several limitations. The SNP chips used for the discovery phase differed between GENATHLETE and the Japanese cohort of endurance athletes. These discovery studies were performed independently and later used to identify the panel of the 45 most promising SNPs. It is also recognized that the CardioMetabochip used in GENATHLETE offers less than comprehensive coverage of the genome and may provide less than optimal inclusion of markers and genes that could be important for the peripheral determinants of endurance performance. In some of the replication studies, participants in the control group were from the general population or were recreationally active subjects, while in others they were confirmed sedentary participants. However, even though the endurance athletes were exposed to variable training regimens, which are undoubtedly heterogeneous by athletic event and country, these factors should not have an influence on the results, as all athletes achieved national and international levels of competition requiring an extraordinary level of cardiorespiratory endurance. Obviously, sample size continues to be a major issue for this type of study. This is a challenge that is very difficult to overcome, as the number of world-class endurance athletes on the planet is limited and those who reach this level of performance are hesitant to give consent to participate in genetic studies. Finally, one also has to consider that a number of the athletes who gave consent to participate in these genetic studies are using performance-enhancing drugs. However, it is unlikely that doping of any kind could have influenced the results, as the use of these drugs is not by itself responsible for an athlete reaching world-class level; rather, their use comes more into play when athletes of world-class caliber are engaged competitively against one another.
In all likelihood, attaining the sample size of world-class endurance athletes required for adequate statistical power in the search for critical sequence variants will require a buy-in from the relevant world sports federations and the International Olympic Committee. With their support and active participation, thousands of world-class endurance athletes could be enrolled in genomics studies aimed at understanding the fundamentals of inherited biological traits that are necessary to perform at the world-class level. Such an effort, particularly if it relied on whole genome sequencing, would allow for the exploration of not only common polymorphisms but also rare variants and copy number variants and could be complemented by the investigation of epigenomic signatures in accessible tissues. In summary, we found that the T allele of rs558129 in GALNTL6 was less frequent in endurance athletes than in ethnicity-matched controls in all studies. However, we could not find evidence for a detailed genomic signature that differentiates endurance athletes from controls.
S1 Fig. Distribution of GENATHLETE VO2max of the 315 Elite Endurance Athletes.
S2 Fig. Quantile-Quantile plot of observed vs expected −log10 p-values for genome-wide data from Japanese athletes and controls.
S3 Fig. Regional association plot of the index SNP—rs921665.
S4 Fig. Regional association plot of the index SNP—rs6548153.
S5 Fig. Regional association plot of the index SNP—rs7650685.
S6 Fig. Regional association plot of the index SNP—rs10007111.
S7 Fig. Regional association plot of the index SNP—rs558129.
S8 Fig. Regional association plot of the index SNP—rs2910756.
S9 Fig. Regional association plot of the index SNP—rs11975386.
S10 Fig. Regional association plot of the index SNP—rs16906888.
S11 Fig. Regional association plot of the index SNP—rs17690338.
S12 Fig. Regional association plot of the index SNP—rs2761291.
S13 Fig. Regional association plot of the index SNP—rs4541108.
S14 Fig. Regional association plot of the index SNP—rs10874242.
S15 Fig. Regional association plot of the index SNP—rs12047209.
S16 Fig. Regional association plot of the index SNP—rs2361506.
S17 Fig. Regional association plot of the index SNP—rs9355947.
S18 Fig. Regional association plot of the index SNP—rs6959675.
S19 Fig. Regional association plot of the index SNP—rs3780169.
S20 Fig. Regional association plot of the index SNP—rs9580890.
S21 Fig. Regional association plot of the index SNP—rs2694093.
S1 File. Supporting Information Text.
S1 Table. The Discovery Phase: The GENATHLETE Cohort.
The contributions of Monique Chagnon and Anne-Marie Bricault from Laval University, Quebec, Canada, Kathryn Cooper and Jessica Watkins from Pennington Biomedical Research Center, Baton Rouge, LA, USA, to the GENATHLETE study are gratefully acknowledged. Gratitude is expressed to Dr. Vladimir A. Naumov from the Research Institute for Physical-Chemical Medicine, Russia, and to Dr. Leysan J. Gabdrakhmanova, Dr. Emiliya S. Egorova and Dr. Albina A. Galeeva from the Volga Region State Academy of Physical Culture, Sport and Tourism, Russia.
Conceived and designed the experiments: CB YP TR KNN NF AL PC IIA MAS. Performed the experiments: BW GW DGA MRB NE MLF FCG EVG VMG PJH TK ESK NAK AKL AMK MM CAM HM EAO SP AVP ONP CS MS RAS VVU TY LP RR. Analyzed the data: TR GW MAS PJH SG YP CB. Contributed reagents/materials/analysis tools: FCG PJH PC NE KNN AL CAM CS TY IIA MLF VMG VVU AMK MS MM TK BW SG YP. Wrote the paper: CB TR YP AL KNN IIA NF PC MAS.
- 1. Gedda L. Sports and genetics. A study on twins (351 pairs). Acta Genet Med Gemellol (Roma). 1960;9:387–406. Epub 1960/10/01. pmid:13704140.
- 2. Chagnon YC, Allard C, Bouchard C. Red blood cell genetic variation in Olympic endurance athletes. J Sport Sci. 1984;2(2):121–9. doi: 10.1080/02640418408729707
- 3. Couture L, Chagnon M, Allard C, Bouchard C. More on red blood cell genetic variation in Olympic athletes. Can J Appl Sport Sci. 1986;11(1):16–8. Epub 1986/03/01. pmid:3698155.
- 4. de Garay A, Levine L, Carter JEL. Genetic and Anthropological Studies of Olympic Athletes. New York: Academic Press; 1974.
- 5. Bouchard C, Daw EW, Rice T, Perusse L, Gagnon J, Province MA, et al. Familial resemblance for VO2max in the sedentary state: the HERITAGE family study. Medicine and science in sports and exercise. 1998;30(2):252–8. Epub 1998/03/21. pmid:9502354. doi: 10.1097/00005768-199802000-00013
- 6. Lortie G, Bouchard C, Leblanc C, Tremblay A, Simoneau JA, Theriault G, et al. Familial similarity in aerobic power. Hum Biol. 1982;54(4):801–12. Epub 1982/12/01. pmid:7166301. doi: 10.1080/03014468400007201
- 7. Bouchard C, Tremblay A, Despres JP, Theriault G, Nadeau A, Lupien PJ, et al. The response to exercise with constant energy intake in identical twins. Obes Res. 1994;2(5):400–10. Epub 1994/09/01. pmid:16358397. doi: 10.1002/j.1550-8528.1994.tb00087.x
- 8. Hamel P, Simoneau JA, Lortie G, Boulay MR, Bouchard C. Heredity and muscle adaptation to endurance training. Medicine and science in sports and exercise. 1986;18(6):690–6. Epub 1986/12/01. pmid:3784881. doi: 10.1249/00005768-198612000-00015
- 9. Prudhomme D, Bouchard C, Leblanc C, Landry F, Fontaine E. Sensitivity of maximal aerobic power to training is genotype-dependent. Medicine and science in sports and exercise. 1984;16(5):489–93. pmid:6542620 doi: 10.1249/00005768-198410000-00012
- 10. Bouchard C, An P, Rice T, Skinner JS, Wilmore JH, Gagnon J, et al. Familial aggregation of VO(2max) response to exercise training: results from the HERITAGE Family Study. Journal of applied physiology (Bethesda, Md: 1985). 1999;87(3):1003–8. Epub 1999/09/14. pmid:10484570.
- 11. Skinner JS, Jaskolski A, Jaskolska A, Krasnoff J, Gagnon J, Leon AS, et al. Age, sex, race, initial fitness, and response to training: the HERITAGE Family Study. Journal of applied physiology (Bethesda, Md: 1985). 2001;90(5):1770–6. Epub 2001/04/12. pmid:11299267.
- 12. Rankinen T, Wolfarth B, Simoneau JA, Maier-Lenz D, Rauramaa R, Rivera MA, et al. No association between the angiotensin-converting enzyme ID polymorphism and elite endurance athlete status. Journal of applied physiology (Bethesda, Md: 1985). 2000;88(5):1571–5. Epub 2000/05/08. pmid:10797114.
- 13. Rivera MA, Dionne FT, Simoneau JA, Perusse L, Chagnon M, Chagnon Y, et al. Muscle-specific creatine kinase gene polymorphism and VO2max in the HERITAGE Family Study. Medicine and science in sports and exercise. 1997;29(10):1311–7. Epub 1997/11/05. pmid:9346161 doi: 10.1097/00005768-199710000-00006
- 14. Rivera MA, Perusse L, Gagnon J, Dionne FT, Leon AS, Rao DC, et al. A mitochondrial DNA D-loop polymorphism and obesity in three cohorts of women. Int J Obes Relat Metab Disord. 1999;23(6):666–8. Epub 1999/07/20. pmid:10411243. doi: 10.1038/sj.ijo.0800900
- 15. Wolfarth B, Rankinen T, Muhlbauer S, Ducke M, Rauramaa R, Boulay MR, et al. Endothelial nitric oxide synthase gene polymorphism and elite endurance athlete status: the Genathlete study. Scand J Med Sci Sports. 2008;18(4):485–90. Epub 2007/12/11. SMS717 [pii] doi: 10.1111/j.1600-0838.2007.00717.x pmid:18067521.
- 16. Wolfarth B, Rankinen T, Muhlbauer S, Scherr J, Boulay MR, Perusse L, et al. Association between a beta2-adrenergic receptor polymorphism and elite endurance performance. Metabolism. 2007;56(12):1649–51. Epub 2007/11/14. S0026-0495(07)00268-5 [pii] doi: 10.1016/j.metabol.2007.07.006 pmid:17998016.
- 17. Wolfarth B, Rivera MA, Oppert JM, Boulay MR, Dionne FT, Chagnon M, et al. A polymorphism in the alpha2a-adrenoceptor gene and endurance athlete status. Medicine and science in sports and exercise. 2000;32(10):1709–12. Epub 2000/10/20. pmid:11039642. doi: 10.1097/00005768-200010000-00008
- 18. Bray MS, Hagberg JM, Perusse L, Rankinen T, Roth SM, Wolfarth B, et al. The human gene map for performance and health-related fitness phenotypes: the 2006–2007 update. Medicine and science in sports and exercise. 2009;41(1):35–73. Epub 2009/01/06. pmid:19123262. doi: 10.1249/mss.0b013e3181844179
- 19. Mikami E, Fuku N, Takahashi H, Ohiwa N, Pitsiladis YP, Higuchi M, et al. Polymorphisms in the control region of mitochondrial DNA associated with elite Japanese athlete status. Scand J Med Sci Sports. 2013;23(5):593–9. Epub 2012/02/01. doi: 10.1111/j.1600-0838.2011.01424.x pmid:22288660.
- 20. Mikami E, Fuku N, Takahashi H, Ohiwa N, Scott RA, Pitsiladis YP, et al. Mitochondrial haplogroups associated with elite Japanese athlete status. Br J Sports Med. 2011;45(15):1179–83. Epub 2010/06/17. pmid:20551160. doi: 10.1136/bjsm.2010.072371
- 21. Buxens A, Ruiz JR, Arteta D, Artieda M, Santiago C, Gonzalez-Freire M, et al. Can we predict top-level sports performance in power vs endurance events? A genetic approach. Scand J Med Sci Sports. 2011;21(4):570–9. Epub 2010/05/13. pmid:20459474. doi: 10.1111/j.1600-0838.2009.01079.x
- 22. Ash GI, Scott RA, Deason M, Dawson TA, Wolde B, Bekele Z, et al. No association between ACE gene variation and endurance athlete status in Ethiopians. Medicine and science in sports and exercise. 2011;43(4):590–7. Epub 2010/08/28. pmid:20798657. doi: 10.1249/mss.0b013e3181f70bd6
- 23. Scott RA, Georgiades E, Wilson RH, Goodwin WH, Wolde B, Pitsiladis YP. Demographic characteristics of elite Ethiopian endurance runners. Medicine and science in sports and exercise. 2003;35(10):1727–32. Epub 2003/10/03. pmid:14523311. doi: 10.1249/01.mss.0000089335.85254.89
- 24. Ahmetov II, Druzhevskaya AM, Astratenkova IV, Popov DV, Vinogradova OL, Rogozkin VA. The ACTN3 R577X polymorphism in Russian endurance athletes. Br J Sports Med. 2010;44(9):649–52. Epub 2008/08/23. pmid:18718976. doi: 10.1136/bjsm.2008.051540
- 25. Yang N, MacArthur DG, Gulbin JP, Hahn AG, Beggs AH, Easteal S, et al. ACTN3 genotype is associated with human elite athletic performance. Am J Hum Genet. 2003;73(3):627–31. Epub 2003/07/25. pmid:12879365; PubMed Central PMCID: PMC1180686. doi: 10.1086/377590
- 26. Eynon N, Meckel Y, Alves AJ, Yamin C, Sagiv M, Goldhammer E. Is there an interaction between PPARD T294C and PPARGC1A Gly482Ser polymorphisms and human endurance performance? Exp Physiol. 2009;94(11):1147–52. Epub 2009/08/12. pmid:19666693. doi: 10.1113/expphysiol.2009.049668
- 27. Zarebska A, Sawczyn S, Kaczmarczyk M, Ficek K, Maciejewska-Karlowska A, Sawczuk M, et al. Association of rs699 (M235T) polymorphism in the AGT gene with power but not endurance athlete status. J Strength Cond Res. 2013;27(10):2898–903. Epub 2013/01/05. pmid:23287839. doi: 10.1519/jsc.0b013e31828155b5
- 28. Bouchard C, Sarzynski MA, Rice TK, Kraus WE, Church TS, Sung YJ, et al. Genomic predictors of the maximal O(2) uptake response to standardized exercise training programs. Journal of applied physiology (Bethesda, Md: 1985). 2011;110(5):1160–70. Epub 2010/12/25. pmid:21183627; PubMed Central PMCID: PMC3098655. doi: 10.1152/japplphysiol.00973.2010
- 29. Rankinen T, Sung YJ, Sarzynski MA, Rice TK, Rao DC, Bouchard C. Heritability of submaximal exercise heart rate response to exercise training is accounted for by nine SNPs. Journal of applied physiology (Bethesda, Md: 1985). 2012;112(5):892–7. Epub 2011/12/17. pmid:22174390; PubMed Central PMCID: PMC3311659. doi: 10.1152/japplphysiol.01287.2011
- 30. Timmons JA, Knudsen S, Rankinen T, Koch LG, Sarzynski M, Jensen T, et al. Using molecular classification to predict gains in maximal aerobic capacity following endurance exercise training in humans. Journal of applied physiology (Bethesda, Md: 1985). 2010;108(6):1487–96. Epub 2010/02/06. pmid:20133430; PubMed Central PMCID: PMC2886694. doi: 10.1152/japplphysiol.01295.2009
- 31. Argyropoulos G, Stutz AM, Ilnytska O, Rice T, Teran-Garcia M, Rao DC, et al. KIF5B gene sequence variation and response of cardiac stroke volume to regular exercise. Physiol Genomics. 2009;36(2):79–88. Epub 2008/11/06. pmid:18984674; PubMed Central PMCID: PMC2636926. doi: 10.1152/physiolgenomics.00003.2008
- 32. Voight BF, Kang HM, Ding J, Palmer CD, Sidore C, Chines PS, et al. The metabochip, a custom genotyping array for genetic studies of metabolic, cardiovascular, and anthropometric traits. PLoS Genet. 2012;8(8):e1002793. Epub 2012/08/10. pmid:22876189; PubMed Central PMCID: PMC3410907. doi: 10.1371/journal.pgen.1002793
- 33. Wigginton JE, Cutler DJ, Abecasis GR. A note on exact tests of Hardy-Weinberg equilibrium. Am J Hum Genet. 2005;76(5):887–93. Epub 2005/03/25. S0002-9297(07)60735-6 [pii] doi: 10.1086/429864 pmid:15789306; PubMed Central PMCID: PMC1199378.
- 34. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–75. Epub 2007/08/19. S0002-9297(07)61352-4 [pii] doi: 10.1086/519795 pmid:17701901; PubMed Central PMCID: PMC1950838.
- 35. Larson MG, Atwood LD, Benjamin EJ, Cupples LA, D'Agostino RB Sr., Fox CS, et al. Framingham Heart Study 100K project: genome-wide associations for cardiovascular disease outcomes. BMC Med Genet. 2007;8 Suppl 1:S5. Epub 2007/10/16. 1471-2350-8-S1-S5 [pii] doi: 10.1186/1471-2350-8-S1-S5 pmid:17903304; PubMed Central PMCID: PMC1995607.
- 36. Pruim RJ, Welch RP, Sanna S, Teslovich TM, Chines PS, Gliedt TP, et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics. 2010;26(18):2336–7. Epub 2010/07/17. doi: 10.1093/bioinformatics/btq419 pmid:20634204; PubMed Central PMCID: PMC2935401.
- 37. Pitsiladis Y, Wang G, Wolfarth B, Scott R, Fuku N, Mikami E, et al. Genomics of elite sporting performance: what little we know and necessary advances. Br J Sports Med. 2013;47(9):550–5. Epub 2013/05/02. pmid:23632745. doi: 10.1136/bjsports-2013-092400
- 38. Rankinen T, Sarzynski MA, Bouchard C. Genes and response to training. In: Bouchard C, Hoffman EP, editors. Genetic and Molecular Aspects of Sport Performance. Volume XVIII of the Encyclopaedia of Sports Medicine. An IOC Medical Commission Publication. West Sussex, UK: Wiley-Blackwell; 2011. p. 177–84.
- 39. Wolfarth B. Genes and Endurance Performance. In: Bouchard C, Hoffman EP, editors. Genetic and Molecular Aspects of Sport Performance. Volume XVIII of the Encyclopaedia of Sports Medicine. An IOC Medical Commission Publication. West Sussex, UK: Wiley-Blackwell; 2011. p. 151–8.
- 40. De Moor MH, Spector TD, Cherkas LF, Falchi M, Hottenga JJ, Boomsma DI, et al. Genome-wide linkage scan for athlete status in 700 British female DZ twin pairs. Twin Res Hum Genet. 2007;10(6):812–20. Epub 2008/01/09. doi: 10.1375/twin.10.6.812 pmid:18179392.
- 41. Rice TK, Sarzynski MA, Sung YJ, Argyropoulos G, Stutz AM, Teran-Garcia M, et al. Fine mapping of a QTL on chromosome 13 for submaximal exercise capacity training response: the HERITAGE Family Study. Eur J Appl Physiol. 2012;112(8):2969–78. Epub 2011/12/16. doi: 10.1007/s00421-011-2274-8 pmid:22170014; PubMed Central PMCID: PMC4109813.
- 42. Spielmann N, Leon AS, Rao DC, Rice T, Skinner JS, Rankinen T, et al. Genome-wide linkage scan for submaximal exercise heart rate in the HERITAGE family study. Am J Physiol Heart Circ Physiol. 2007;293(6):H3366–71. Epub 2007/10/09. 00042.2007 [pii] doi: 10.1152/ajpheart.00042.2007 pmid:17921336.
- 43. Stutz AM, Teran-Garcia M, Rao DC, Rice T, Bouchard C, Rankinen T. Functional identification of the promoter of SLC4A5, a gene associated with cardiovascular and metabolic phenotypes in the HERITAGE Family Study. Eur J Hum Genet. 2009;17(11):1481–9. Epub 2009/04/23. pmid:19384345; PubMed Central PMCID: PMC2766005. doi: 10.1038/ejhg.2009.64
- 44. Peng C, Togayachi A, Kwon YD, Xie C, Wu G, Zou X, et al. Identification of a novel human UDP-GalNAc transferase with unique catalytic activity and expression profile. Biochemical and biophysical research communications. 2010;402(4):680–6. Epub 2010/10/28. doi: 10.1016/j.bbrc.2010.10.084 pmid:20977886.
- 45. Ghosh S, Vivar JC, Sarzynski MA, Sung YJ, Timmons JA, Bouchard C, et al. Integrative pathway analysis of a genome-wide association study of (V)O(2max) response to exercise training. Journal of applied physiology (Bethesda, Md: 1985). 2013;115(9):1343–59. doi: 10.1152/japplphysiol.01487.2012 pmid:23990238; PubMed Central PMCID: PMC3841836.
- 46. Liekens AM, De Knijf J, Daelemans W, Goethals B, De Rijk P, Del-Favero J. BioGraph: unsupervised biomedical knowledge discovery via automated hypothesis generation. Genome biology. 2011;12(6):R57. doi: 10.1186/gb-2011-12-6-r57 pmid:21696594; PubMed Central PMCID: PMC3218845.