Lifelong Reduction of LDL-Cholesterol Related to a Common Variant in the LDL-Receptor Gene Decreases the Risk of Coronary Artery Disease—A Mendelian Randomisation Study

Background Rare mutations of the low-density lipoprotein receptor gene (LDLR) cause familial hypercholesterolemia, which increases the risk for coronary artery disease (CAD). Less is known about the implications of common genetic variation in the LDLR gene regarding the variability of cholesterol levels and risk of CAD. Methods Imputed genotype data at the LDLR locus on 1 644 individuals of a population-based sample were explored for association with LDL-C level. Replication of association with LDL-C level was sought for the most significant single nucleotide polymorphism (SNP) within the LDLR gene in three European samples comprising 6 642 adults and 533 children. Association of this SNP with CAD was examined in six case-control studies involving more than 15 000 individuals. Findings Each copy of the minor T allele of SNP rs2228671 within LDLR (frequency 11%) was related to a decrease of LDL-C levels by 0.19 mmol/L (95% confidence interval (CI) [0.13–0.24] mmol/L, p = 1.5×10⁻¹⁰). This association with LDL-C was uniformly found in children, men, and women of all samples studied. In parallel, the T allele of rs2228671 was associated with a significantly lower risk of CAD (odds ratio per copy of the T allele: 0.82, 95% CI [0.76–0.89], p = 2.1×10⁻⁷). Adjustment for LDL-C levels by logistic regression or Mendelian Randomisation models completely abolished the significant association between rs2228671 and CAD, indicating a functional link between the genetic variant at the LDLR gene locus, change in LDL-C and risk of CAD. Conclusion A common variant at the LDLR gene locus affects LDL-C levels and, thereby, the risk for CAD.


Introduction
A causal link between serum cholesterol levels and coronary artery disease (CAD) risk is supported by multiple epidemiological association studies as well as treatment trials involving lipid-lowering drugs [1]. Less clear are the implications of acquired versus heritable variability of low-density lipoprotein cholesterol (LDL-C) levels, as the latter may already take effect during childhood [2]. Some insights in this respect come from relatively rare mutations that either decrease or increase LDL-C levels. For example, a variant in the PCSK9 gene has been associated with a substantial decrease in serum LDL-C levels and an almost abolished myocardial infarction (MI) risk [3]. Similarly, rare mutations in the LDL-receptor gene (LDLR) are known to cause familial hypercholesterolemia (FH), a condition characterized by markedly increased LDL-C levels resulting in premature manifestation of atherosclerotic disease [4]. More recently, several genome-wide association studies on lipid traits have identified common SNPs at the LDLR gene locus showing significant association with LDL-C levels [5,6,7]. The study by Willer and coworkers also presented suggestive evidence for the association of SNP rs6511720 with CAD in one population [6]. This SNP is in high linkage disequilibrium with SNP rs2228671, whose association with LDL-C and CAD is investigated in detail and across multiple populations in the present study.
Inheritance of either relatively low or high LDL-C levels may be interpreted as a Mendelian Randomisation study, which allows the genetic component in the variability of this risk factor to be analysed specifically [8,9]. In particular, such studies offer the opportunity to test a potentially causal association between risk factor and disease, given that interference from behavioural, social, and environmental factors that may otherwise confound the relationship is largely excluded by "randomisation to the genetic variant". Here, we attempted to identify more frequent genetic variants affecting LDL-C levels and thereby to further study the relation between the inherited variability of serum cholesterol and CAD risk in the European population.

Study design
Initially, the LDLR genomic region was screened in 1 644 population-based subjects of the KORA F3 study for association with LDL-C levels using genotype data obtained with the GeneChip® Human Mapping 500K Array (Affymetrix) enriched by imputed genotypes based on the genome-wide dataset. The single nucleotide polymorphism (SNP) at the LDLR gene locus showing the strongest association with LDL-C levels was further investigated. In particular, association with LDL-C was examined in several population-based studies comprising adults and children, and association with coronary artery disease (CAD) was tested in six case-control studies. Figures 1a and 1b provide an overview of the experimental strategy and study populations involved. Details on each of the study populations are provided in the supplementary methods (Methods S1 and Table S1). Briefly, the MONICA/KORA F3 and S4 studies as well as the PopGen population controls represent individuals from strictly age- and gender-stratified surveys of the German population [10,11,12,13]. A total of 533 overweight but otherwise healthy children and adolescents (mean age 10.8 ± 3.1 years) came from the West German Obesity Cohort of the Bonn/Datteln Childhood Obesity Study Group [14].
To investigate the association of SNP rs2228671 with CAD, we genotyped this SNP in four case-control studies (German MI Family Study II [15], PopGen CAD cases [12], Left Main Disease Study [16], Aachen Heart Study [17]) and imputed genotypes for this SNP in two further studies for which genome-wide association data were available (German MI Family Study I, WTCCC CAD study) [18]. Control subjects from the KORA F3 and S4 samples were matched to cases for the German MI Family Study I, Left Main Disease Study and Aachen Heart Study. The matching procedure ensured that each control subject was used only once across these three samples. No LDL-C values were available for the subjects of the Aachen, WTCCC and Left Main Disease studies. All samples were established in accordance with the principles of the Declaration of Helsinki and were approved by the respective local ethics committees. LDL-C was measured in non-fasting individuals by standard enzymatic methods after precipitation of apolipoprotein-B containing lipoproteins. Array genotyping in the MONICA/KORA F3 survey, the WTCCC CAD study and the German MI Family Study I was performed with the Affymetrix GeneChip® 500K Mapping Array Set as previously described in detail [18]. For the MONICA/KORA S4, PopGen and German MI Family Study II studies and the West German Obesity Cohort, genotyping for rs2228671 was done using a 5′ exonuclease activity (TaqMan) assay on an HT7900 (Applied Biosystems, Darmstadt, Germany). SNP assays were ordered from Applied Biosystems as Custom TaqMan® SNP genotyping assays.

Statistical analysis
At the LDLR locus (chromosome 19p13.2, base pair position 10,900,000 to 11,300,000 according to NCBI Build 36.1), we investigated 33 genotyped and 223 imputed SNPs for association with LDL-C levels. To assess the effects of variants not represented on the Affymetrix GeneChip® 500K Mapping Array Set, haplotypes were estimated using the CEU sample from the HapMap project (http://www.hapmap.org/) [19] and further variants were imputed using MACH 1.0 (http://www.sph.umich.edu/csg/abecasis/MACH/index.html). Association analyses of LDL-C on SNPs, adjusted for age and gender, were performed with MACH2QTL. SNPs with a predicted coefficient of determination r² < 0.3 were excluded from the analyses.
Allele frequencies were analyzed in control populations by a χ² test.
For every study group individually, both deviation of the genotype distribution from Hardy-Weinberg equilibrium (HWE) and compatibility with it were analyzed (for details, see supplement).
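As an illustration of the deviation test, a Pearson χ² test (1 degree of freedom) for HWE can be sketched as follows. This is a minimal sketch, not the procedure documented in the supplement: the function name `hwe_chi_square` and the genotype counts are hypothetical, chosen only so that the T allele frequency is 11%, and the sketch assumes that both alleles are observed in the sample.

```python
import math

def hwe_chi_square(n_cc, n_ct, n_tt):
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium.

    Returns the T allele frequency, the chi-square statistic and its p-value.
    Assumes both alleles occur, so that all expected counts are positive.
    """
    n = n_cc + n_ct + n_tt
    p_t = (2 * n_tt + n_ct) / (2 * n)  # frequency of the T allele
    p_c = 1.0 - p_t
    # Genotype counts expected under HWE: p_c^2, 2*p_c*p_t, p_t^2
    expected = (n * p_c ** 2, 2 * n * p_c * p_t, n * p_t ** 2)
    observed = (n_cc, n_ct, n_tt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p_t, chi2, p_value

# Hypothetical counts (CC, CT, TT), not data from any of the studies
freq_t, chi2, p_value = hwe_chi_square(792, 196, 12)
print(f"T allele frequency {freq_t:.3f}, chi2 {chi2:.4f}, p {p_value:.3f}")
```

A large p-value only indicates that no deviation was detected; it does not by itself demonstrate compatibility with HWE, which is presumably why a separate compatibility analysis is mentioned above.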
LDL-C was regressed on rs2228671 for each population-based group and subjects of the West German Obesity Cohort separately. A prospectively planned pooled analysis using individual data as proposed by Blettner et al. [20] was performed using all control groups studied by regressing LDL-C on rs2228671 with study as random effect (details see supplement). We additionally stratified the analysis by gender and age groups. Means, standard errors, and p-values are reported.
The power for detecting an association between rs2228671 and CAD was calculated using Quanto (http://hydra.usc.edu/GxE/) assuming a log-additive model, α = 0.05 and power = 80% (details see supplement).
Differences in genotype frequencies of rs2228671 were compared between CAD cases and controls for every study using the two-sided asymptotic Cochran-Armitage trend test. Heterozygous and homozygous odds ratios (OR) for the minor allele and corresponding 95% confidence intervals (CI) were estimated. In the prospectively planned pooled analysis [20] we investigated the additive effect of rs2228671 on CAD unadjusted and adjusted for LDL-C by random effect logistic regression models. The most likely genetic model for rs2228671 on serum LDL-C and for rs2228671 on CAD was investigated using logistic regression models and the Minelli approach [21] (details see supplement) [16,22].
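The per-study comparison described above can be sketched in a few lines. The following is a minimal illustration, assuming additive scores 0, 1, 2 for the number of T alleles and Woolf (log-based) confidence intervals for the genotype odds ratios; the function names and all genotype counts are hypothetical and do not come from any of the six studies.

```python
import math

def cochran_armitage_trend(cases, controls, scores=(0, 1, 2)):
    """Two-sided asymptotic Cochran-Armitage trend test for a 2 x 3 table.

    cases and controls hold genotype counts ordered CC, CT, TT.
    Returns the z statistic and the two-sided p-value.
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    r = sum(cases)
    sum_tr = sum(t * c for t, c in zip(scores, cases))
    sum_tn = sum(t * m for t, m in zip(scores, totals))
    sum_t2n = sum(t * t * m for t, m in zip(scores, totals))
    u = sum_tr - r * sum_tn / n  # observed minus expected score sum in cases
    var_u = r * (n - r) * (n * sum_t2n - sum_tn ** 2) / n ** 3
    z = u / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

def odds_ratio_ci(case_exp, ctrl_exp, case_ref, ctrl_ref, z=1.96):
    """Odds ratio versus the reference genotype with a Woolf 95% CI."""
    odds_ratio = (case_exp * ctrl_ref) / (case_ref * ctrl_exp)
    se = math.sqrt(1 / case_exp + 1 / ctrl_exp + 1 / case_ref + 1 / ctrl_ref)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical genotype counts (CC, CT, TT)
cases = (830, 160, 10)
controls = (790, 195, 15)

z_trend, p_trend = cochran_armitage_trend(cases, controls)
or_het = odds_ratio_ci(cases[1], controls[1], cases[0], controls[0])  # CT vs CC
or_hom = odds_ratio_ci(cases[2], controls[2], cases[0], controls[0])  # TT vs CC
print(f"trend z = {z_trend:.2f}, p = {p_trend:.4f}")
print("CT vs CC: OR %.2f [%.2f, %.2f]" % or_het)
print("TT vs CC: OR %.2f [%.2f, %.2f]" % or_hom)
```

A negative z statistic here means that the T allele is less frequent among cases than among controls. The pooled random effect logistic regression models are not reproduced by this sketch.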
There was no adjustment for population stratification (details see supplement).
The association of rs2228671 with serum LDL-C was utilized to investigate the possible causal effect of serum LDL-C on CAD (Mendelian Randomisation). The inheritance of rs2228671 should be subject to the random assortment of parental alleles at the time of gamete formation. If serum LDL-C increases the risk of CAD, then carriage of an allele that exposes individuals to a long-term elevation in serum LDL-C should confer an increased risk of CAD proportional to the difference in serum LDL-C attributable to the allele. This relationship is largely unconfounded and free of reverse causality bias if specific assumptions are met [23,24,25,26,27] e.g., that rs2228671 is independent of CAD given serum LDL-C. To investigate these relationships between LDL-C, CAD and rs2228671, we fitted a structural equation model using data from cases and controls with known non-use of lipid lowering agents or estimated before-treatment LDL-C levels by correction of measured values for the dosage of a given lipid lowering agent according to a previously published algorithm [28]. In addition, structural equation models including known confounders (age, gender, smoking status, diabetes, and hypertension) were fitted. In case of non-linear variables, the structural equation approach allows a simple interpretation of the regression coefficients when compared with an instrumental variable approach [5]. Estimation was done using MPlus 5, which allows specification of categorical variables, and sensitivity analyses were performed with SEPATH using correlation coefficients estimated by use of the polychor package within R. For this analysis, adjustments for population stratification in the present studies are not required because only German Caucasian subjects were included in this analysis, and population heterogeneity is negligible in the German population as previously demonstrated [Samani et al. 2007;Steffens et al. 2006].
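The proportionality argument can be made concrete with a Wald ratio, a simpler instrumental variable estimator than the structural equation model fitted here. The sketch below is a back-of-the-envelope illustration only: it uses the per-allele estimates quoted in the abstract (0.19 mmol/L lower LDL-C, OR 0.82 with 95% CI 0.76 to 0.89), treats the genotype to LDL-C effect as known without error, and the function name `wald_ratio` is hypothetical.

```python
import math

def wald_ratio(beta_gx, or_gy, or_ci, z=1.96):
    """Wald ratio: log odds of disease per unit of exposure.

    beta_gx: per-allele change in the exposure (here LDL-C, mmol/L)
    or_gy:   per-allele odds ratio for the outcome (here CAD)
    or_ci:   (lower, upper) 95% CI of or_gy
    The standard error is a first-order approximation that ignores
    uncertainty in beta_gx (an assumption of this sketch).
    """
    beta_gy = math.log(or_gy)
    se_gy = (math.log(or_ci[1]) - math.log(or_ci[0])) / (2 * z)
    ratio = beta_gy / beta_gx
    se_ratio = se_gy / abs(beta_gx)
    return ratio, se_ratio

# Per-allele estimates for the T allele of rs2228671 as given in the abstract
ratio, se_ratio = wald_ratio(beta_gx=-0.19, or_gy=0.82, or_ci=(0.76, 0.89))

# Express the result as an odds ratio per 1 mmol/L lower LDL-C
or_per_mmol_lower = math.exp(-ratio)
ci_low = math.exp(-(ratio + 1.96 * se_ratio))
ci_high = math.exp(-(ratio - 1.96 * se_ratio))
print(f"OR per 1 mmol/L lower LDL-C: {or_per_mmol_lower:.2f} "
      f"[{ci_low:.2f}, {ci_high:.2f}]")
```

The ratio is only interpretable as a causal effect under the assumptions named above, in particular that rs2228671 influences CAD solely through LDL-C.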

Results
The genomic region of the LDLR located at 19p13.2 was screened for association with LDL-C levels using data from a gene array study performed in the MONICA/KORA F3 survey. Nine of 33 SNPs representing the region on the Affymetrix GeneChip® 500K Mapping Array Set (black squares in Fig. 2a) showed association with LDL-C levels with a p-value < 0.05 (see Table S2). A higher density of SNPs mapping to the LDLR genomic region was achieved by imputation, which predicted genotypes for 223 additional SNPs located at 19p13.2. Of these, 185 passed quality analyses (blank circles in Fig. 2a; see Methods section), and 50 were associated with LDL-C levels with a p-value < 0.05. Two of these SNPs showed markedly higher levels of significance for association with LDL-C than the surrounding SNPs: rs6511720 (predicted p-value = 4.6×10⁻⁶) and rs2228671 (predicted p-value = 2.4×10⁻⁶). The SNP with the strongest effect (0.36 mmol/L mean difference per T allele, 95% CI [0.21–0.52] mmol/L; rs2228671; red circle in Fig. 2a) was chosen for replication in further populations.
To verify the association predicted by the imputation experiment, we genotyped SNP rs2228671 using the 5′-exonuclease method in 1 644 subjects of the MONICA/KORA F3 study, which showed a slightly higher (3.6%) minor allele frequency than the imputed genotypes. However, the relation between rs2228671 and LDL-C remained consistent for the genotyped SNP (Fig. 3a). We then confirmed this association in the larger MONICA/KORA S4 study comprising 4 184 individuals, with a mean difference per T allele of 0.20 mmol/L (95% CI [0.13–0.28] mmol/L; p = 2.6×10⁻⁸), and in the population-based PopGen study comprising 2 458 individuals, with a mean difference per T allele of 0.14 mmol/L (95% CI [0.06–0.23] mmol/L; p = 0.0012; Fig. 3a).
In order to test if the association of rs2228671 with LDL-C was already detectable before adulthood we performed additional genotyping in children and adolescents referred for treatment of obesity. In this sample we found a mean difference per T allele of 0.20 mmol/L (95% CI [0.03-0.37] mmol/L; p = 0.0220; Fig. 3a).
The genetic effect of rs2228671 on LDL-C is most likely additive (see Methods S1). This implies that individuals carrying a copy of the T allele have a lifelong exposure to lower LDL-C levels than homozygous C allele carriers, with a reduction of 0.19 mmol/L per T allele (95% CI [0.13–0.24] mmol/L; p = 1.5×10⁻¹²; Fig. 3a). Furthermore, the genetic variant represented by SNP rs2228671 affected LDL-C levels to a similar extent in men and women and across different age groups (Fig. 3b).
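To show how the per-study estimates combine, the following sketch pools the three replication estimates reported in the text (MONICA/KORA S4, PopGen, and the children of the West German Obesity Cohort) by fixed-effect inverse-variance weighting. This is an assumption-laden simplification, not the individual-data random effect model used for the pooled estimate: standard errors are back-calculated from the rounded 95% CIs, and the genotyped MONICA/KORA F3 estimate is omitted because it is not stated numerically in the text, so the result is not expected to match 0.19 mmol/L exactly. The function names are hypothetical.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Back-calculate a standard error from a symmetric 95% CI."""
    return (upper - lower) / (2 * z)

def pool_fixed_effect(estimates, z=1.96):
    """Inverse-variance (fixed-effect) pooling.

    estimates: list of (effect, ci_lower, ci_upper) tuples.
    Returns the pooled effect with its 95% CI.
    """
    weights = [1 / se_from_ci(lo, hi) ** 2 for _, lo, hi in estimates]
    total = sum(weights)
    pooled = sum(w * b for w, (b, _, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1 / total)
    return pooled, pooled - z * pooled_se, pooled + z * pooled_se

# LDL-C decrease per T allele in mmol/L with 95% CI, as reported in the text
replication = [
    (0.20, 0.13, 0.28),  # MONICA/KORA S4
    (0.14, 0.06, 0.23),  # PopGen
    (0.20, 0.03, 0.37),  # West German Obesity Cohort (children)
]

pooled, pooled_low, pooled_high = pool_fixed_effect(replication)
print(f"pooled decrease per T allele: {pooled:.2f} mmol/L "
      f"[{pooled_low:.2f}, {pooled_high:.2f}]")
```

The larger adult samples dominate the weights, which is why the comparatively wide interval in children has little influence on the pooled value.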
Since carriers of the minor allele of SNP rs2228671 have a lifetime of lower LDL-C levels, we hypothesized that this translates into a lower CAD risk. To explore this hypothesis, we imputed SNPs located at the LDLR gene locus in two genome-wide association studies, the German MI Family Study I and the WTCCC CAD study (Fig. 1). In both studies we observed a significant association of the imputed T allele of rs2228671 with CAD risk, resulting, as expected, in decreased ORs for disease manifestation (Fig. 4). Additionally, the association of rs2228671 with CAD was explored by genotyping in four case-control studies for CAD comprising a total of 4 310 cases and 8 086 controls. Across all six studies, we found an association with CAD with an OR per copy of the T allele of 0.82 (95% CI [0.76–0.89], p = 2.1×10⁻⁷, Fig. 4; for genotype distribution and p-values of individual studies see Table S3).
According to the PROCAM-Algorithm [29] a reduction in LDL-C of 0.38 mmol/L, as found in TT versus CC homozygous individuals in the population-based studies translates into a 21% reduction in the risk of cardiovascular events over a 10 year period. In fact, reduction in risk of CAD by 23% was observed for homozygosity of the minor allele of rs2228671 in the combined analysis of all individuals.
To further examine whether the lower risk of CAD observed in carriers of the T allele is causally linked to the genetically determined lower LDL-C associated with this common SNP in the LDLR gene, we performed a Mendelian Randomisation study. For this analysis we focused on individuals for whom native LDL-C levels, i.e., levels not affected by or corrected for lipid lowering medication, were available or could be estimated (1 324 cases and 6 255 controls). The structural equation model revealed a highly significant association between the genotype and LDL-C as well as between LDL-C and CAD risk, but the association between rs2228671 and the risk of CAD was abolished (OR 0.99, 95% CI [0.85–1.16], p = 0.95, Fig. 5). Including known confounders, which were found to be independent of rs2228671, did not change the results (see Fig. S1). Further, these findings were confirmed by regression analysis adjusting the CAD risk related to rs2228671 for LDL-C (see Table S3). Thus, the protective effect on CAD risk observed for the T allele is likely to be causally linked to the reduction in LDL-C associated with this particular variant in the LDLR gene (Fig. 5).

Discussion
We identified a common variant in the LDLR gene that is related to lower LDL-C levels and a lower risk of CAD and MI. In the PROCAM and Framingham risk scores, a decrease of LDL-C by 0.38 mmol/L, as found in homozygote carriers of the T allele, matches a 21-23% decrease of MI risk [29,30]. We observed an effect of such magnitude in six consecutive samples with CAD cases and healthy controls.
We replicated the association of rs2228671 with LDL-C levels in individuals across a wide age spectrum including children. Interestingly, the magnitude of difference in LDL-C levels was stable in all subgroups studied suggesting a stepwise decrease of lifetime exposure to lower cholesterol levels for each copy of the T allele. In parallel, we observed a stepwise decrease in CAD risk. Together with the data from the Mendelian Randomisation, the findings plausibly suggest a functional link between the decrease in cholesterol and the decrease in CAD risk related to the genetic variant or a closely linked variant.
From a clinical point of view, the variability in LDL-C modulated by rs2228671 genotypes might be considered rather small in comparison to the profound LDL-C lowering achieved in recent therapeutic trials (examples: CARE [31]: Δ1.1 mmol/L, Heart Protection Study [32]: Δ1.0 mmol/L, TNT [33]: Δ1.9 mmol/L) [34]. Yet the risk reduction for CAD events reached almost the same magnitude as observed with statin treatment in these trials [1]. Thus, it seems conceivable that a lifelong reduction of LDL-C by 0.38 mmol/L in homozygotes for the T rather than the C allele or in individuals followed long-term in epidemiological cohorts is comparable to the effects of 3-5 years of statin treatment, which by itself results in much larger LDL-C lowering. In this respect it may be of interest that the variability of LDL-C levels carries a high heritability, such that the effects observed in the population may indeed reflect to a large extent the effects of underlying genetic factors [35]. Previous studies using the Mendelian Randomisation approach have observed an even greater change in CAD risk through lifelong genetic lowering of cholesterol than predicted by epidemiological scores [2,3]. However, the number of carriers of the rare variants was small, so the conclusions of these previous studies on PCSK9 remain in line with our study, in which the risk reduction observed for the protective allele was similar to that predicted by epidemiological scores.
Several genes are known to affect LDL-C and MI risk, including those coding for APOB [36,37], LDLR [38], and PCSK9 [3]. However, until recently only variants with fairly low allele frequencies, ranging from 0.001 to 0.08% in the APOB gene [36], 0.01 to 0.02% in the LDLR gene [39], and 2 to 3% in the PCSK9 gene, had been identified [3]. With the availability of GWA-based DNA arrays, a number of more frequent variants affecting LDL-C levels emerged [5,6,7,35,40]. Indeed, association of rs2228671 with LDL-C can be found in the comprehensive data of a GWA meta-analysis of three British populations comprising 4 337 individuals [5]. Furthermore, the SNP has previously been implicated in hyperlipidemia in interaction models and haplotype analyses [41,42]. However, a direct association of this SNP with LDL-C levels and CAD risk has not been demonstrated so far. While it was unclear whether the T allele of the LDLR polymorphism rs2228671 also affects coronary artery disease risk, the relatively high prevalence of the minor allele (frequency 11%) made it plausible that the contribution of rs2228671 to the variability of LDL-C serum levels also translates to a modification in CAD risk at the population level.
In parallel to the present work, Kathiresan and coworkers studied two polymorphisms from the LDLR locus (rs1529729 and rs688) in a CAD prediction score that was based on genotypes that had previously displayed association with lipid levels. Indeed, this score allowed prediction of cardiovascular events in a prospective cohort [35]. However, neither of the two LDLR SNPs included in this score is in significant linkage disequilibrium with rs2228671, such that the present findings are likely to further extend the implications of common genetic variation for prediction of LDL-C variability and CAD risk.
Strengths of the present study include the utilisation of several population-based samples and several cohort studies displaying consistently significant effects across multiple sub-groups as well as broad replication in multiple case-control studies. Moreover, evidence for a causal link between modulation of LDL-C levels and CAD risk is strongly supported by the Mendelian Randomisation approach. Nevertheless, limitations of the present work need consideration, including the fact that we cannot mechanistically explain the decrease in LDL-C associated with the T allele. SNP rs2228671 is located within the first exon of LDLR in the third position of a triplet coding for amino acid 27, a cysteine (Fig. 2b). However, allelic variation in this SNP does not result in an amino acid change. No further polymorphisms in LDLR that are known to cause amino acid changes or that were previously associated with LDL-C levels or CAD were found to be in linkage disequilibrium with rs2228671 or rs6511720 [6,7], suggesting that hitherto unknown mechanisms are responsible for this association. Either a gain of function or increased expression of the LDLR could affect LDL-C serum levels. There is even a remote chance that another gene in linkage disequilibrium with the variant accounts for the observed effects.
Figure 5. In the structural equation model, carriage of the T allele at rs2228671 leads to lower LDL-C levels, and higher LDL-C levels lead to an increased risk of CAD. Given this, there is no additional direct path from rs2228671 to CAD risk, indicating that the functional pathway between the genetic variant at the LDLR gene locus and risk of CAD is through changes in LDL-C. doi:10.1371/journal.pone.0002986.g005
Whatever the underlying molecular mechanism is, from a genetic point of view this study on multiple populations offers robust data that the rs2228671 T-allele goes along with lower LDL-C levels and, secondary to this effect, a lower risk of CAD.