Genome-wide association screens for Achilles tendon and ACL tears and tendinopathy

Achilles tendinopathy or rupture and anterior cruciate ligament (ACL) rupture are substantial injuries affecting athletes, associated with delayed recovery or inability to return to competition. To identify genetic markers that might be used to predict risk for these injuries, we performed genome-wide association screens for these injuries using data from the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort consisting of 102,979 individuals. We did not find any single nucleotide polymorphisms (SNPs) associated with either of these injuries with a p-value that was genome-wide significant (p<5x10-8). We found, however, four and three polymorphisms with p-values that were borderline significant (p<10−6) for Achilles tendon injury and ACL rupture, respectively. We then tested SNPs previously reported to be associated with either Achilles tendon injury or ACL rupture. None showed an association in our cohort with a false discovery rate of less than 5%. We obtained, however, moderate to weak evidence for replication in one case; specifically, rs4919510 in MIR608 had a p-value of 5.1x10-3 for association with Achilles tendon injury, corresponding to a 7% chance of false replication. Finally, we tested 2855 SNPs in 90 candidate genes for musculoskeletal injury, but did not find any that showed a significant association below a false discovery rate of 5%. We provide data containing summary statistics for the entire genome, which will be useful for future genetic studies on these injuries.

The purpose of this study was to perform a screen of the entire genome for polymorphisms associated with either Achilles tendon injury or ACL rupture in a large dataset. We also analyzed specific SNPs previously reported to be associated with either injury.

Materials and methods
Genome-wide association screens were performed for either Achilles tendon injury (defined as tendinopathy, rupture or repair) or ACL rupture using data from the Genetic Epidemiology Determination of genetic ancestry was performed by principal component analysis (PCA), as previously described [24]. These ancestry principal components were used in the GWAS to adjust for genetic ancestry.

Phenotype definition
Achilles tendon injuries were identified in the GERA cohort based on clinical diagnoses and surgical procedures captured in the KPNC Electronic Health Record system. Cases were defined as individuals with at least one International Classification of Disease, Ninth Revision (ICD9) or Common Procedure Terminology, Fourth Edition (CPT4) code: ICD726_71, ICD727_67, and/or CPT27650, describing Achilles bursitis or tendinitis, nontraumatic rupture of Achilles tendon, and primary repair of Achilles tendon rupture, respectively ( Table 1). The Electronic Medical Record includes injuries over the entire lifetime of the patients; i.e. those that occurred prior to enrollment in KPNC as well as those that occurred after the genotyping analysis was performed, and until this study was initiated July 22, 2015.
For ACL rupture, KPNC patients with any potential ACL injury were identified by search of the electronic medical record for the ICD9 codes 717.83 and 844.2 and the CPT4 codes 29888 and 27407 at any time. The imaging studies and surgical reports from these patients were reviewed and those who had strong evidence for a full or partial ACL rupture on MRI and/or underwent ACL reconstruction were considered to have had an ACL rupture. The charts of the remaining patients were then manually reviewed by one of the study physicians (AA) with the assistance of the study orthopedist (JD). Those patients who had confirmatory evidence of ACL rupture (e.g., unambiguous history in a progress note) were also classified as having had an ACL rupture. Patients with ambiguous histories or ACL injury without rupture were not considered to have had a full or partial ACL rupture.

Genome-wide association and meta-analysis
We tested for SNP associations with a logistic regression model using allele in an additive genetic model for each of the five ancestry groups separately, as in Roos et al., 2016 [24]. The model was adjusted for sex, age at enrollment into the RPGEH cohort, ancestry group using principal components, and genotyping chip version for all populations.
For Achilles tendinopathy or rupture, the principal components used were: AFR (6 PCs), EAS (6 PCs), EUR (10 PCs), LAT (6 PCs) and SAS (6 PCs). For ACL rupture, there were only 7 cases in the SAS and AFR populations, and so these populations were not analyzed further. For the remaining populations for ACL rupture, the principal components used were: EAS (2 PCs), EUR (4 PCs) and LAT (3 PCs). 10,551,193 SNPs were processed in the meta-analysis for Achilles tendinopathy or rupture and 8,303,052 SNPs were processed for ACL rupture metaanalysis. To assess for inflation due to population stratification, the genomic control parameter (λ) was calculated [28]. For Achilles tendinopathy or rupture, genomic inflation was small for GWA analyses in the EUR, LAT, EAS, AFR populations (λ between .99 and 1.02), but substantial in the SAS population (λ~0.37). For ACL rupture, λ was between .93 and 1.0 for the EAS, EUR and LAT populations. Subsequently, p-values were adjusted for genomic control in each population. The linkage disequilibrium score method was used to evaluate the genomic inflation factor using the python script ldsc from Bulik-Sullivan et al. 2015 [29]. LD Hub was used to compute the genetic correlation between the GWAS for the two injuries from this paper and those from 198 published studies (http://ldsc.broadinstitute.org/) [30]. The input data contained the unadjusted p-values from the European populations from either the Achilles tendon injury or the ACL injury GWAS. SNPs that overlapped those in HAPMAP3 were retained for analysis. For both the Achilles and ACL injury phenotypes, there was a low intercept of the LD score regression, indicating lack of bias from population structure (S1 Table). Results from each population were combined by inverse-variance, fixed-effects meta-analysis and by inverse-variance, random-effects meta-analysis. Results are reported using fixed-effects metaanalysis p-values. SNPs were discarded from the meta-analysis if data were missing from the EUR population, which comprises over 80% of the cohort.
The fixed-effects and random-effects p-values for all of the SNPs tested in this study are available from NIH GRASP: https://grasp.nhlbi.nih.gov/FullResults.aspx. These data could be useful for future studies on the genetic risk for Achilles tendon injury or ACL rupture.

Replication analysis of previously reported SNPs
We searched our meta-analysis results for previously-published genetic associations with either Achilles injury or ACL rupture [1,2,[4][5][6][7][8][9][10][11][12]13,[14][15][16][17][18][19][20][21][22]. We searched the public databases PubMed/MEDLINE (http://www.ncbi.nlm.nih.gov/pubmed) and Scopus (http://www.scopus. com/) for previously published studies on the genetics of Achilles tendon injury or ACL rupture. Inclusion criteria for filtering results were: research papers with original data, English language, full-text article available (or abstract with enough data), genetic association studies with human subjects, cases with Achilles tendon injury or ACL rupture phenotypes, and gene or SNP associations reported. Articles were included regardless of whether or not the associations were significant and regardless of the size of the study population. Furthermore, the references from each publication were used to find other articles not captured in the initial search. The final searches were conducted on June 1, 2016. A Benjamini-Hochberg false discovery rate threshold (q = 0.05) was used to account for multiple testing [31].

Analysis of novel candidate SNPs
Based on our knowledge of biological functions of the Achilles tendon and ACL, we generated a list of 90 candidate genes potentially associated with either Achilles or ACL injury. These genes code for structural components or the development of ligaments and/or tendons. A set of all known SNPs found within 2kb of the start and end of these genes was generated using SCAN (http://www.scandb.org/) [31]. Rare SNPs (MAF<0.005) were removed because they would not have enough statistical power to show an association in gene association studies. SNPs not found in our GWAS results were removed, yielding 14,734 SNPs. To select a single tag SNP in each linkage disequilibrium block, SNP pruning with an LD threshold (r 2 >0.5) was performed in PLINK v1.90 (b3.34), resulting in 2,855 candidate SNPs. For the candidate genes that had been tested earlier, previous studies tested only one or a few SNPs from each gene whereas our list contains many tag SNPs spanning the entire gene. Because we tested many candidate SNPs, we used the Benjamini-Hochberg method to set the false discovery rate to q = .05; i.e. the top SNP in the list would require a p-value of 1.7x10 -5 in order to be deemed significant [32].
Gene-based testing was performed using the VErsatile Gene-bASed test VEGAS2 [33]. 2855 SNPs from 90 candidate genes and their fixed-effects p-values from either the ACL rupture or the Achilles tendon injury meta-analyses were used. DNA variants that were not SNPs were filtered from the input files. Gene-based analysis was performed using the online VEGAS2 software from https://vegas2.qimrberghofer.edu.au/ [33].

Ethical considerations
This study analyzed stored data from RPGEH. The health and genotype data for the subjects were de-identified. All study procedures were approved by the Institutional Review Board of the Kaiser Foundation Research Institute.

Study population and genotype information
We obtained access to injury information and genotype data from the Research Program on Genes, Environment and Health cohort (Materials and Methods). This program includes genotype and medical data from 102,979 patients in the Kaiser-Permanente system in Northern California. For Achilles tendon injury, we interrogated the electronic medical records of these individuals for those that had incurred tendinopathy or a rupture (Table 1)(Materials and Methods). For ACL rupture, cases were defined as patients that showed ACL rupture by MRI or that had undergone a procedure for ACL reconstruction (Methods). In total, there were 5,148 cases of Achilles tendon injury (consisting of 290 with ruptures and 4,858 with bursitis or tendinitis) from 102,979 individuals. For ACL rupture, there were only 7 cases each from the AFR and SAS populations, so these ancestry groups were excluded from further analysis. This left 598 cases of ACL rupture from 99,342 individuals in the EAS, EUR and LAT ancestry groups. Descriptive factors for these injuries are shown in Table 2. For ACL rupture, the average age of the cases is 10 years less than that of the controls (p<10 −100 ). One possible explanation is that there could be an ascertainment bias against ACL ruptures in older patients that incurred the injury when they were young before MRI was commonly used.
Genome-wide studies for association with either achilles tendon Injury or ACL rupture Fig 1 shows QQ plots with the observed p-values from the GERA cohort compared to the pvalues that would be expected by chance. For either injury, the p-values from the association study roughly overlap those that would be expected by chance.
We used Linkage Disequilibrium Score Regression to analyze the GWAS data for: 1) heritability, 2) fraction of the genetic association signal derived from polygenic associations, and 3) overlap with GWAS data for other traits [29,30]. First, genetic heritability refers to amount of the trait that can be explained by the sum of all of the SNPs in the GWAS. For both Achilles tendon injury and ACL rupture, the heritability was low (0.5-1.1%)(S1 Table). Second, LD score regression is able to discern whether the observed genetic associations are due to associations with small effects from many loci versus associations due to population structure. If the associations are due to polygenic associations, then the intercept from LD score regression will be less than the genomic inflation factor. For both Achilles tendon injury and ACL rupture, LD score regression suggests that population structure did not have a strong influence in the observed association results. Third, the amount of genetic correlation between the results from the two studies analyzed here with those from 198 previously published studies was determined using LD Hub (Materials and Methods) [30]. Neither of the GWAS results from Achilles tendon injury or ACL rupture showed a significant genetic correlation with any of the 198 datasets (S2 Table).
We plotted the p-value for every SNP on Manhattan plots (Fig 2). We did not find any SNP that matched the genome-wide statistical threshold of p<5x10 -8 . Table 3 lists the independent SNPs that reach p<10 −6 . There were four SNPs for Achilles tendon injury and three SNPs for ACL rupture.
Re-testing candidate SNPs for association with achilles tendon Injury or ACL rupture Previous studies have reported candidate SNPs that have shown an association with either Achilles tendon injury or ACL rupture using p<0.05 as a cutoff. Specifically, 19 SNPs in 12 candidate genes have been reported to show an association with Achilles tendinopathy (using p<0.05 as a cutoff) [1,2,[4][5][6][7][8][9][10][11][12]. Likewise, nine SNPs in eight candidate genes have been reported to show an association with ACL rupture [7,13,[14][15][16][17][18][19][20][21][22]. There is a small difference in ages between cases and controls (p = 0.01). c The difference in ages between the cases and controls is highly significant (p<10 −100 ). https://doi.org/10.1371/journal.pone.0170422.t002 For Achilles tendinopathy or rupture, 14 of these SNPs were contained in our dataset whereas for ACL rupture, 6 SNPs were present. We attempted to replicate the previous results for these candidate SNPs using our dataset. Because we are testing only a small number of SNPs, the threshold for statistical significance can be much lower than the genome-wide threshold that we used for the genome-wide study above (p<5x10 -8 ). It is still important, however, to adjust the p-value threshold to compensate for multiple testing. We used the Benjamini-Hochberg method to set the false discovery rate to q = 0.05. In addition, the association should show the same direction of effect (i.e. same risk allele) as in the previous publication.  Minor Allele Frequency in the control population. c P-value from fixed effects (FE) or random effects (RE) meta-analysis. d For Achilles tendinopathy or rupture, none of the 14 SNPs were found to be significant using the Benjamini-Hochberg threshold (Table 4). However, rs1045485 and rs4789932 showed small p-values of p<0.01, but in both cases the direction of the effect in our study was opposite to the direction reported previously (Table 4). For ACL rupture, none of the SNPs showed an association in our data set using the Benamini-Hochberg cutoff (Table 4).
In our gene association study, we used sex, age and ancestry group as covariates whereas previous studies did not use these covariates. To control for this difference, we re-tested the candidate SNPs without using covariates in order to align our analysis with those that had been done previously (S3 Table). For Achilles tendinopathy or rupture, one SNP in MIR608 (rs4919510) showed a low p-value (p = 5.1x10 -3 ) and had the same direction of effect (G is protective) as seen previously [5]. The p-value for rs4919510 (p = 5.1x10 -3 ) is close to the p-value threshold set using the Benjamini-Hochberg method (p = 3.5x10 -3 ); i.e. a p-value of 5.1x10 -3 has about a 7% chance to be a false replication using 14 tests. Additionally, four other SNPs (rs1045485, rs591058, rs679620 and rs4789932) showed a small p-value but their direction of effect was opposite to the effect seen previously, so these SNPs do not provide support for the previous findings (S3 Table).
For ACL rupture, when we repeated the analysis without using covariates, we found that rs1800255 in COL3A1 had a small p-value of 0.03 with the same direction as found previously (S3 Table)[21]. With seven tests, this p-value does not meet the Benjamini-Hochberg threshold (p = 7x10 -3 ) and has about a 21% chance of being a false discovery.

Testing new candidate SNPs for association with achilles tendon Injury or ACL rupture
To demonstrate how these genome-wide data can be applied, we used the data to evaluate a new set of candidate genes for association with Achilles tendinopathy or rupture and with ACL rupture. We created a list of 90 genes encoding proteins involved in the formation of ligaments or tendons, including the set of candidate genes from previous publications (S4 Table).
We assembled a list of all independent, common SNPs in these 90 genes. We tested each of the candidate SNPs for association with these injuries by looking up their p-values in the metaanalysis results (S5 Table). None of the candidate SNPs had a significant p-value for either injury (using the Benjamini-Hochberg false discovery rate of q = 0.05). The top SNP for Achilles tendon injury (rs4660148) had p = 5x10 -5 , which corresponds to a false discovery rate of about 14%. The top SNP for ACL rupture (rs8090) had p = 6x10 -4 , which corresponds to a false discovery rate greater than 50% (S5 Table). Besides evaluating each SNP individually, SNPs were analyzed as part of genes using genebased analysis [33]. In gene-based analysis, SNPs are assigned to individual genes in close proximity, and then each gene is tested for association with a phenotype. Gene-based testing confers an advantage in cases where there is allelic heterogeneity-when there are multiple, independent variants influencing a trait at the same locus. With gene-based testing, the signals for genetic association from independent variants are combined, possibly yielding a p-value below a given significance threshold even though the individual SNPs in the gene may not themselves have significant p-values. We used gene-based analysis to test the 90 candidate genes for association with either Achilles tendinopathy or rupture or with ACL rupture (Methods). For Achilles tendinopathy or rupture, none of the genes showed a significant association (S6 Table).
For ACL rupture, GLT25D1 (also referred to as COLGALT1) initially showed a borderline significant association using gene-based analysis. Specifically, GLT25D1 had a p-value of 5.7x 10 -4 among 87 genes tested, which is just below the threshold for Bonferroni significance (p = 5.8x10 -4 )(S6 Table). GLT25D1 encodes collagen beta(1-O)galactosyltransferase 1, which transfers galactose to hydroxylysine residues of collagen [34]. Four SNPs (rs2082001, rs2375637, rs8110571 and rs55960725) within GLT25D1 showed an association with ACL rupture with p<0.05 (S1 Fig, S7 Table). One possibility is that each of these SNPs might influence ACL rupture independently (allelic heterogeneity; four independent alleles). Another possibility is that only one SNP is responsible for the association and the other SNPs may show an association due to weak linkage (r 2 was between 0.03 and 0.43 between the four SNPs). To distinguish between these two possibilities, we performed a conditional analysis in which the genotype of the SNP with the strongest p-value (rs55960725) was added as a covariate in the logistic regression for ACL rupture. In the conditional analysis, none of the other SNPs (rs2082001, rs2375637 or rs8110571) had p-values that fell below 0.05 and their effect sizes shrank, indicating that their association with ACL rupture was partially dependent on rs55960725 (S7 Table).

Discussion
Achilles tendon and ACL injury are common in recreational and elite athletes, and even in non-athletes [35][36][37][38][39][40]. Prior studies have identified 12 genes associated with Achilles tendinopathy and 8 genes associated with ACL rupture [1,2,[4][5][6][7][8][9][10][11][12]13,[14][15][16][17][18][19][20][21][22]. These prior studies, however, evaluated a small number of candidate genes among small cohorts of athletes. With the advent of large-scale genotyping programs, it is now possible to screen the entire genome for polymorphisms associated with sports injury risks such as Achilles tendon or ACL injury. In principle, a genome-wide screen for injury could provide new insight about the differences between individuals regarding their inherent propensity for injury. Furthermore, because the genotype data includes most common polymorphisms, a genome-wide screen reports the strongest associations in the genome without bias, and these associations are the most useful for predicting individual risks for injury.
Here, we performed a study to find genetic variants associated with Achilles tendon or ACL injury by obtaining access to large-scale genotype and phenotype data from the Research Program on Genes, Environment and Health. The data contains information from 102,979 individuals of whom 5,418 had Achilles tendinopathy or rupture (from all five ancestry groups) and 598 had an ACL rupture (from the EAS, EUR and LAT ancestry groups). To date, this is the largest gene association study for either Achilles tendon or ACL injury reported in terms of number of SNPs that were genotyped and number of injury cases.
We were unable to find any SNPs associated with either injury type at a genome-wide significance level. For a hypothetical common SNP with a minor allele frequency of 5% and a genotype relative risk of 1.3, power calculations indicate that this SNP would have about a 78% chance of being detected in our Achilles study and a 59% chance for detection in our ACL study. Thus, it is unlikely that there are many common SNPs with a medium-to-strong association with these injuries (i.e. genotype relative risk > 1.3).
We found four independent polymorphisms, however, associated with Achilles tendinopathy or rupture with a borderline significant p-value between 1x10 -6 and 5x10 -8 (Table 3). Similarly, three independent polymorphisms were associated with ACL rupture with borderline significance (Table 3). A previous analysis tallied available SNP associations with borderline associations (between p 10 −7 and p>5x10 -8 ), and found that 73% were replicated in subsequent studies [41]. For p-values between 10 −6 and 10 −7 the replication rate is likely to be much smaller, but not negligible [42]. Thus, some of the SNPs presented in Table 3 may be replicated in future studies.
Although we re-tested candidate SNPs that were previously reported to show an association with either Achilles or ACL injuries, we did not find strong evidence for replication of the candidate gene results. We performed the replication analysis using age, sex and ancestry group as covariates (as was done in the genome-wide analysis) as well as without these covariates (as was done in the candidate gene studies). For Achilles tendinopathy or rupture, we found moderate evidence for replication of one of the 14 tested SNPs; specifically, rs4919510 in MIR608 had a p-value of 5.1x10 -3 without using covariates that suggests a false discovery rate of about 7%, which is almost statistically significant (S3 Table). None of the other SNPs showed evidence for replication either with or without using covariates in the analysis.
One possible explanation for the failure to replicate previous candidate gene studies is that our cohort consists of patients in the Kaiser-Permanente medical system in Northern California regardless of activity level, whereas previous studies evaluated cohorts of competitive athletes. Varying levels of physical activity may affect risk for sustaining an Achilles tendon injury or ACL rupture.
For Achilles tendon injury, another possible explanation for the failure to replicate previous results is that cases were identified based on diagnosis and procedure codes in the electronic medical record. Patients in large, administrative data sets may have been misdiagnosed, introducing information bias. Non-differential misclassification of Achilles tendon injury would tend to reduce the strength of associations. A second possible explanation is that we evaluated patients with Achilles tendinopathy, bursitis or rupture as a single injury group. Achilles tendon rupture may represent an increased injury severity versus tendinopathy with a correspondingly stronger genetic effect, or conversely may be associated with acute trauma with limited genetic effect. A third possibility is that the genetic risk factors for Achilles bursitis may be different from those for intrinsic Achilles tendon pathology. Identifying cases of ACL rupture was more definitive, as cases had either shown an ACL rupture by MRI or had undergone an ACL reconstruction procedure.
We performed a candidate-gene study for Achilles tendon injury and ACL rupture using 2855 SNPs in 90 candidate genes. In the first analysis, SNPs were tested individually but none showed a significant association with either type of injury. In the second analysis, gene-based analysis was used to aggregate SNPs into genes and then test each of the genes for association with these injuries. None of the genes showed an association with Achilles tendon injury but GLT25D1 (which encodes a protein that glycosylates collagen) showed a borderline significant association with ACL rupture. GLT25D1 contained four SNPs with nominally significant pvalues. However, it is unclear if the gene-based result for GLT25D1 is due to multiple, independent SNPs contributing to risk for ACL rupture or due to weak linkage between SNPs in the gene.
The summary statistics from the GWAS for Achilles tendon and ACL injury are available to the public at NIH GRASP: https://grasp.nhlbi.nih.gov/FullResults.aspx. These tables may be used in future genetic studies. For instance, one could make a list of additional genes and genetic pathways involved in the development, formation or structure of tendons/ligaments, and then use the data presented here to test those genes for an association. In addition, the data from this paper could be used to validate SNPs found in a first-stage GWAS for Achilles tendon or ACL injury. Finally, these data could be combined with data from other musculoskeletal injuries (e.g. rotator cuff injuries) in a cross-phenotype meta-analysis in order to find SNPs associated with musculoskeletal injuries in general.