Genome-Wide Association Study for Incident Myocardial Infarction and Coronary Heart Disease in Prospective Cohort Studies: The CHARGE Consortium

Background: Data are limited on genome-wide association studies (GWAS) for incident coronary heart disease (CHD). Moreover, it is not known whether genetic variants identified to date also associate with risk of CHD in a prospective setting.
Methods: We performed a two-stage GWAS analysis of incident myocardial infarction (MI) and CHD in a total of 64,297 individuals (including 3898 MI cases, 5465 CHD cases). SNPs that passed an arbitrary threshold of 5×10^-6 in Stage I were taken to Stage II for further discovery. Furthermore, in an analysis of prognosis, we studied whether known SNPs from former GWAS were associated with total mortality in individuals who experienced MI during follow-up.
Results: In Stage I, 15 loci passed the threshold of 5×10^-6: 8 loci for MI and 8 loci for CHD, of which one locus overlapped and none were reported in previous GWAS meta-analyses. We took 60 SNPs representing these 15 loci to Stage II of discovery. Four SNPs near QKI showed nominally significant association with MI (p-value < 8.8×10^-3) and three exceeded the genome-wide significance threshold when Stage I and Stage II results were combined (top SNP rs6941513: p = 6.2×10^-9). Despite excellent power, the 9p21 locus SNP (rs1333049) was only modestly associated with MI (HR = 1.09, p-value = 0.02) and marginally with CHD (HR = 1.06, p-value = 0.08). Among an inception cohort of those who experienced MI during follow-up, the risk allele of rs1333049 was associated with a decreased risk of subsequent mortality (HR = 0.90, p-value = 3.2×10^-3).
Conclusions: QKI represents a novel locus that may serve as a predictor of incident CHD in prospective studies. The association of the 9p21 locus both with increased risk of first myocardial infarction and longer survival after MI highlights the importance of study design in investigating genetic determinants of complex disorders.


Introduction
There is strong and consistent evidence that coronary heart disease (CHD) is highly heritable and is influenced by a wide range of genetic factors [1,2]. Recently, genome-wide association studies (GWAS) have identified common genetic variants involved in cardiovascular disease and its risk factors [3]. The loci reported by the latest and largest GWAS together explain around 10% of CHD heritability [4].
To date, GWAS for CHD have been conducted mostly in cross-sectional case-control settings, and this design, which uses prevalent cases, typically oversamples those with long post-event survival times. Although such a design often makes it possible to collect information from a large number of patients, this approach may incorrectly identify factors that are associated with a high or low case-fatality rate. For instance, a factor associated with a low case-fatality rate will be enriched among surviving cases and may appear to increase the risk of disease when prevalent cases are compared with controls. This bias is known as incidence-prevalence (Neyman) bias [5,6]. One major advantage of studying incident cases rather than prevalent cases is that incident cases properly represent the fatal cases and persons with only brief post-event survival. To date, the strongest and most reliable evidence for identifying and assessing factors that predict future clinical disease, such as LDL-cholesterol and systolic blood pressure, has come from well-designed, population-based, prospective cohort studies that collect large numbers of incident cases [7].
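The incidence-prevalence bias described above can be illustrated with a small calculation. The sketch below is purely illustrative: the allele frequency and survival proportions are hypothetical numbers, not estimates from this study, and the calculation is simplified to allele counts. It assumes an allele that has no effect on incidence but is associated with better survival after the event, and shows that this allele nevertheless appears as a "risk allele" when surviving (prevalent) cases are compared with controls.

```python
def allele_freq_in_survivors(freq_in_incident_cases, survival_carrier, survival_noncarrier):
    """Allele frequency among cases who are still alive when cases are sampled."""
    carriers = freq_in_incident_cases * survival_carrier
    noncarriers = (1.0 - freq_in_incident_cases) * survival_noncarrier
    return carriers / (carriers + noncarriers)


def odds_ratio(freq_cases, freq_controls):
    """Allelic odds ratio comparing cases with controls."""
    return (freq_cases / (1.0 - freq_cases)) / (freq_controls / (1.0 - freq_controls))


# Hypothetical inputs: the allele frequency is identical in controls and in
# incident cases, so the allele does not affect the risk of the event itself.
freq = 0.50
# Hypothetical survival proportions after the event, by allele.
prevalent = allele_freq_in_survivors(freq, survival_carrier=0.80, survival_noncarrier=0.70)

print(odds_ratio(freq, freq))                 # incident cases vs controls: 1.0
print(round(odds_ratio(prevalent, freq), 2))  # prevalent cases vs controls: 1.14
```

With these made-up inputs, an allele with no effect on incidence shows an odds ratio of about 1.14 in a prevalent-case design, solely because carriers are more likely to survive long enough to be sampled.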
Here we aimed to study genetic variants that affect the incidence of myocardial infarction (MI) and CHD in prospective, population-based cohorts, and to examine whether the genetic variants identified to date are also associated with risk of CHD in a prospective setting. Moreover, we investigated whether the known genetic variants are associated with total mortality after MI. To this end, we used data from the Cohorts for Heart and Aging Research in Genome Epidemiology (CHARGE) Consortium [8] and collaborating prospective studies.

Study Population
We performed our study in two stages. Stage I studies comprised participants from five prospective cohort studies that form the CHARGE consortium [8]: the Age, Gene Environment Susceptibility Reykjavik Study (AGES) [9]; the Atherosclerosis Risk in Communities (ARIC) Study [10]; the Cardiovascular Health Study (CHS) [11]; the Framingham Heart Study (FHS) [12]; and the Rotterdam Study (RS) [13,14]. Stage II comprised individuals from: The Health, Aging, and Body Composition (Health ABC) Study; The Health Professionals Follow-Up Study (HPFS); The Nurses' Health Study (NHS); PROSPER/PHASE Study; the Study of Health in Pomerania (SHIP); The Women's Genome Health Study (WGHS); the MOnica Risk, Genetics, Archiving and Monograph (MORGAM) Study comprising the Alpha-Tocopherol, Beta-Carotene (ATBC) Study; The FINRISK Study; The PRIME Study (including the PRIME cohorts of Belfast, Lille, Strasbourg and Toulouse); The Northern Sweden Study. Participants in Stages I and II were of European ancestry. Participants with a history of MI or CHD at baseline were excluded. All studies had protocols approved by local institutional review boards. Participants provided written informed consent and gave permission to use their DNA for research purposes. The Supplementary Document provides details about the design and characteristics for these studies.

Case Definitions for MI and CHD
The definitions of incident MI were consistent among the participating studies and included both fatal and non-fatal MI. CHD included fatal or non-fatal MI and, in most studies, fatal CHD or sudden death. The definitions of MI and CHD for each cohort study are summarized in S1 Table and S2 Table.

Statistical Analysis
The date of entry to the analysis was the date of cohort entry (AGES, ARIC, CHS, RS) or DNA collection (FHS). Within each study, Cox proportional hazards regression models were used to test the association between each SNP and time to incident MI or CHD, while adjusting for sex and baseline age. FHS adjusted for familial correlation by clustering on pedigree. Analyses in CHS and ARIC were additionally adjusted for study site, and analyses in FHS for generation and for ancestry using principal components [15]. The censor date was the date of MI or CHD diagnosis, death, last contact, or end of follow-up, whichever came first. For each SNP, additive genetic models were used to estimate the regression coefficient for the hazard ratio (HR) for allele dosage and its respective standard error. For each analysis, a genomic control coefficient (λ) was calculated to estimate the extent of underlying population structure. Further information on the analysis methods can be found in S3 Table and S4 Table. Information regarding genotyping and imputation, as well as genotype quality control, is found in S5 Table and S6 Table. SNPs with a minor allele frequency of less than 1%, imputation quality less than 0.3, or very large regression coefficients (absolute value larger than 5) were excluded from meta-analysis. Results from individual studies were meta-analyzed for a total of 2,543,842 autosomal SNPs based on Phase 2 HapMap.
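As a minimal sketch of two of the steps above, the following code applies the stated SNP exclusion criteria and computes a genomic control coefficient from per-SNP regression coefficients and standard errors. The function names are hypothetical, and the λ formula used here (median of the observed 1-df chi-square statistics divided by the median of the 1-df chi-square distribution) is the conventional definition, assumed rather than taken from the study's supplementary methods.

```python
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def passes_qc(maf, imputation_quality, beta):
    """Keep a SNP unless it meets any of the exclusion criteria in the text."""
    return maf >= 0.01 and imputation_quality >= 0.3 and abs(beta) <= 5


def genomic_control_lambda(betas, standard_errors):
    """Conventional lambda: median observed chi-square over its expected median."""
    chi2 = [(b / se) ** 2 for b, se in zip(betas, standard_errors)]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


# Hypothetical per-SNP results: (MAF, imputation quality, beta, SE).
snps = [
    (0.25, 0.95, 0.08, 0.05),
    (0.005, 0.90, 0.40, 0.30),  # excluded: MAF below 1%
    (0.10, 0.20, 0.02, 0.06),   # excluded: imputation quality below 0.3
    (0.40, 0.99, -0.03, 0.04),
]
kept = [s for s in snps if passes_qc(s[0], s[1], s[2])]
print(len(kept))
print(genomic_control_lambda([s[2] for s in kept], [s[3] for s in kept]))
```

A λ close to 1 suggests little inflation of the test statistics; in practice it would be computed over all SNPs in a genome-wide analysis, not over a handful as in this toy input.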
A fixed-effects, inverse-variance weighted meta-analysis approach was implemented in METAL [16] to combine the regression coefficients and their standard errors, producing a summary regression coefficient and standard error from which a p-value was computed. An arbitrary significance threshold for follow-up in Stage II was set at 5.0×10^-6. When more than one SNP clustered at a locus, we carried forward the four SNPs with the smallest p-values in the associated locus for further investigation in Stage II.
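The fixed-effects, inverse-variance weighted combination described above follows a standard formula, which can be sketched as follows. This is not METAL's code; the function name is hypothetical and the per-study inputs are made-up log hazard ratios used only to show the calculation.

```python
import math


def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effects summary of per-study estimates.

    Returns the pooled coefficient, its standard error, and a two-sided
    p-value from the normal approximation.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p_value


# Hypothetical per-study log hazard ratios and standard errors for one SNP.
beta, se, p = fixed_effects_meta([0.10, 0.06, 0.12], [0.05, 0.04, 0.08])
print(f"HR = {math.exp(beta):.3f}, SE = {se:.4f}, p = {p:.2g}")
```

Studies with smaller standard errors receive larger weights, so the pooled estimate is dominated by the most precise cohorts.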
In Stage II, three studies provided data for both incident MI and CHD (HABC, MORGAM, and WGHS), two studies provided data only for MI (PROSPER, SHIP), and two others provided data only for CHD (HPFS, NHS). Each Stage II study used the same analytic method as in Stage I to examine the association of the 60 SNPs with MI or CHD. As in the Stage I meta-analysis, we used inverse-variance weighted fixed-effects meta-analysis to evaluate the Stage II results. We applied a Bonferroni correction for 60 SNPs and set 8.3×10^-4 as the significance threshold. Finally, results from all studies in Stages I and II were combined using inverse-variance weighted fixed-effects meta-analysis.
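The Stage II threshold can be reproduced with a one-line calculation. The sketch below assumes a family-wise significance level of 0.05, which is not stated explicitly in the text but is the value that yields the reported threshold for 60 tests; the function name is hypothetical.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


threshold = bonferroni_threshold(60)
print(f"{threshold:.1e}")  # 8.3e-04
```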
We further studied each of the 46 SNPs reported by the CARDIoGRAMplusC4D Consortium [4], for association with incident events in our meta-analysis of longitudinal cohort studies. Moreover, the SNPs were combined into a weighted genetic risk score using beta estimates from the CARDIoGRAMplusC4D Consortium report [4]. The association of each SNP, as well as the score from the combination of all 46 SNPs, was examined with incident MI and CHD using the results of Stage I meta-analysis.
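The text does not specify how the weighted score was evaluated from meta-analysis results. One common way to do this with summary statistics, shown here only as an assumed sketch, is to combine the per-SNP estimates by inverse-variance weighting, using the externally reported effect sizes as weights and treating the SNPs as independent. The function name and all numeric inputs below are hypothetical.

```python
import math


def grs_from_summary(weights, betas, standard_errors):
    """Effect of a weighted genetic risk score estimated from summary statistics.

    weights: external per-allele effect sizes (e.g. log odds ratios from a
        prior GWAS), aligned to the same risk allele as the betas.
    betas, standard_errors: per-SNP log hazard ratios and standard errors
        from the outcome analysis. SNPs are assumed to be independent.
    """
    numerator = sum(w * b / se ** 2 for w, b, se in zip(weights, betas, standard_errors))
    denominator = sum(w ** 2 / se ** 2 for w, se in zip(weights, standard_errors))
    estimate = numerator / denominator
    se = math.sqrt(1.0 / denominator)
    p_value = math.erfc(abs(estimate / se) / math.sqrt(2.0))
    return estimate, se, p_value


# Hypothetical inputs for three SNPs.
estimate, se, p = grs_from_summary(
    weights=[0.10, 0.07, 0.05],
    betas=[0.06, 0.03, 0.04],
    standard_errors=[0.04, 0.04, 0.05],
)
print(f"score effect = {estimate:.3f}, SE = {se:.3f}, p = {p:.2g}")
```

Under this formulation, an estimate of 1 would mean that the effects on incident events match the external weights exactly, while an estimate between 0 and 1 would indicate weaker effects in the prospective setting than in the source of the weights.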
We applied a Cox proportional hazards model adjusted for age and sex to examine the association of the known SNPs with mortality after MI. Five studies (AGES, ARIC, CHS, FHS and the Rotterdam Study) provided data for this analysis; in total, 2953 individuals were followed after incident MI, of whom 1828 died. The median follow-up time ranged from 2.3 years in AGES to 4.7 years in FHS. The baseline characteristics of the study populations for this analysis are presented in S7 Table. Since this analysis was meant to explore potential reasons for weak association or lack of association with incident MI and CHD, we limited it to the three SNPs for which we had more than 80% power in Stage I to detect the estimated associations with incident MI and CHD.

Fig 1 describes Stage I and Stage II of the study. The Stage I panel included five prospective cohort studies comprising a total of 24,024 participants who were free of MI and CHD at baseline. The average age ± standard deviation ranged from 54.1±5.6 years in ARIC to 74.6±5.5 years in AGES. More than half of the participants (54.5%) were women. The basic characteristics of the participating studies are shown in Table 1. A total of 1570 incident MI events (6.5%) and 2406 incident CHD events (10.0%) occurred over an average of 8.2 years and 8.1 years of follow-up for MI and CHD, respectively. The average age at the time of MI ranged from 65.2 years in ARIC to 80.8 years in CHS.
In Stage II, we sought additional evidence for associations in eight loci for MI (QKI, ODZ3, DGKB, FOXL1, CALCOCO2, BARD1, COL8A1, ATXN1) and eight loci for CHD (PAP2D, GPC5, CTNNA3, BARHL2, IGFBP3, LRFN2, ATXN1, SNCA) using four SNPs per locus, for a total of 60 SNPs in 15 loci (ATXN1 was associated with both MI and CHD). Baseline characteristics of the participants of Stage II are shown in S8 and S9 Tables. The results for all 60 SNPs are presented in S10 Table and S11 Table, for MI and CHD, respectively. None of the SNPs passed the Bonferroni-adjusted threshold of 8.3×10^-4. The results for the best association in Fig 2 presents the linkage disequilibrium (LD) and p-values of regional markers for this locus. We tested for evidence of replication of this association in 8201 African American individuals, including 546 incident cases, from the PAGE Study [17]; however, rs6941513 was not significantly associated with risk of MI in this population (p = 0.49).
We sought evidence for the association of 46 SNPs recently reported in the largest GWAS to date for coronary artery disease [4] with the incidence of MI and CHD (Table 4). Despite excellent power, we found only modest evidence for replication of the association with the 9p21 locus (CDKN2A/B), the most established finding from previous cross-sectional case-control GWAS. The most replicated SNP at the 9p21 locus, rs1333049, was nominally associated with MI (HR: 1.09 [95% CI: 1.01, 1.18], p-value = 0.02) and marginally with CHD (HR: 1.06 [95% CI: 0.99, 1.13], p-value = 0.08). The most significant association with MI was found for rs15563, a SNP in UBE2Z (HR: 1.12 [95% CI: 1.04, 1.20], p-value = 1.9×10^-3), and the most significant association with CHD was found for rs10947789, a SNP within the KCNK5 locus (HR: 1.13 [95% CI: 1.05, 1.22], p-value = 5.6×10^-4).
We found nominally significant associations (p<0.05) with SNPs annotated to CDKN2A/B for MI, LIPA for CHD and COL4A2, TCF21, PDGFD, KCNK5, VAMP8, MRAS, UBE2Z and TCF21 for both MI and CHD (Table 4). A weighted genetic risk score composed of these 46 SNPs was associated with MI (p-value = 1.3×10^-3) and CHD (p-value = 1.2×10^-3) in the Stage I meta-analysis.
Among individuals who experienced MI during follow-up, the risk allele of rs1333049 was associated with a significantly decreased risk of mortality (HR: 0.90 [95% CI: 0.84, 0.97], p-value = 5.5×10^-3) (Table 5). For both SNPs at the 9p21 locus, the "risk allele" from cross-sectional case-control GWAS was associated with longer survival after MI and would have been enriched in surviving prevalent cases. Fig 3 illustrates the inverse association of 78 top SNPs at the 9p21 locus, as reported by the CARDIoGRAMplusC4D Consortium [4], with survival after MI. We also examined the association of rs6941513 with mortality after MI; however, the association was not significant.

Discussion
We performed a GWAS on incident MI and CHD and examined whether the gene variants identified to date are also associated with risk of CHD in a prospective setting. In a two-stage design, involving 37,561 participants with 2,328 cases of incident MI, we identified a novel genome-wide significant locus, QKI, associated with incident MI. This finding requires further replication. The results also highlighted the difference between the genes identified in prospective versus cross-sectional case-control studies. The 9p21 locus was associated with both an increased risk of incident MI and, during follow-up post-MI, a decreased risk of total mortality, indicating that genetic variants may operate differently in different settings.
In this two-stage design, we found evidence for MI-associated genetic variants near QKI (KH domain containing, RNA binding). The combined p-value for three out of the four genetic variants that were examined in the region exceeded the genome-wide significance threshold. Although these data provide evidence for an association between the QKI locus and incident MI, this finding should be confirmed by further studies, since these variants attained conventional levels of genome-wide significance only in the combined meta-analysis.
If confirmed, the QKI finding may represent a novel pathway in the development of CHD. QKI is known to be involved in cell cycle regulation, a pathway for which there is emerging evidence of a key role in the development of atherosclerotic plaques and cardiovascular disease [18,19]. A functional study has reported that QKI is a central regulator of vascular smooth muscle cell phenotypic plasticity and that intervention in QKI activity can improve pathogenic fibro-proliferative responses to vascular injury [20]. Moreover, a recent paper shows that the RNA-binding properties of QKI play a critical role in regulating human monocyte to macrophage differentiation [21]. de Bruin and co-workers found that the conversion of monocytes to both pro- and anti-inflammatory macrophages with GM-CSF or M-CSF, respectively, markedly increased expression of the QKI, which all were readily detected in CD68+ macrophages of fibrous cap atheromata and atherosclerotic lesions with intraplaque hemorrhage. Furthermore, reduced expression of QKI in monocytes delayed their differentiation into macrophages, perturbed their capacity to become lipid-engorged foam cells, and led to a reduction in monocyte infiltration in atherosclerotic lesions [21]. Altogether, we propose that QKI is involved in inflammatory responses to injury and could be a potential therapeutic target to prevent cardiovascular disease. Further functional investigation is needed to robustly identify the mechanisms involved at this locus.
Prior GWAS, which included extremely large sample sizes, did not report QKI, though they should have had enough statistical power to detect a locus with such an effect. Indeed, rs6941513 was not associated with CAD in the CARDIoGRAMplusC4D GWAS (OR = 1.01, p-value = 0.45). In contrast to former GWAS, we used a prospective, longitudinal cohort design to examine genetic association with incident cases of MI and CHD. It is possible that the magnitude of the effect is smaller with prevalent cases than with incident cases; thus the locus was not detected by previously published GWAS, which primarily use a case-control design.
Although CHD includes MI events by definition, the loci we found for MI and CHD overlapped at only one locus (ATXN1). One reason could be differences in the mechanisms involved in the restrictive diagnosis of MI versus the broader diagnosis of CHD. However, unstable effect estimates and p-values due to lack of statistical power could have contributed to this observation as well.
Despite excellent statistical power, we identified only a modest signal at the 9p21 locus. This locus, initially identified by GWAS, has been validated by numerous studies in different geographic and ethnic subgroups. However, our study is not the first to report a weak signal or lack of association at this locus. In fact, prominent differences have been observed between cross-sectional case-control and longitudinal studies. For instance, in a meta-analysis by Chan et al [22], cross-sectional analyses of angiographically defined cases and controls showed a strong per-allele association with 9p21 (OR: 1.31, 95% CI: 1.20, 1.43). However, in a meta-analysis of follow-up studies by Patel et al [23], the per-allele hazard ratio of the 9p21 variants for fatal and non-fatal adjudicated MI was 1.09 (95% CI: 1.03, 1.16). The latter is the same as what we report in this study, though the meta-analysis includes earlier reports from some of our studies. One explanation for this inconsistency is incidence-prevalence bias. Most GWAS for coronary artery disease to date have consisted of cross-sectional case-control studies, a design that overrepresents patients who survived their MI or CHD event. Using data from five population-based cohort studies, we found that the reported risk alleles for this locus are associated with longer survival after MI. This finding, which has also been reported previously [23-25], supports this explanation. Thus, the high prevalence of the risk allele in various types of cross-sectional analyses may not be due entirely to a high risk of experiencing MI or CHD, but also to an improved chance of survival after MI.
The molecular biology behind the protective effect of the risk alleles at 9p21 is as yet unclear; however, there is a growing body of evidence that the 9p21 locus increases the risk of CHD only for the first event and not for subsequent events. For instance, Patel et al found no association with subsequent CHD events in a recent meta-analysis of 25,163 individuals with established CHD [23]. Thus, it could be concluded that the 9p21 locus contributes to the formation and progression of plaques and not to their instability prior to events; therefore, the association is observed only in early stages of the disease. This is in agreement with the report by Palomaki [26], which suggests a diminishing effect of the 9p21 locus with age, a finding that is confirmed by Patel et al for secondary events. It should be noted that the mean age of participants was more than 70 years in two and more than 60 years in four of the participating cohorts. In this context, the older mean age of our population could be another reason why our findings do not replicate known loci such as 9p21.
Our study is the largest collection of population-based prospective GWAS on incident MI and CHD and includes high-quality genotyping and phenotyping data from well-known cohort studies in the field of cardiovascular disease. Moreover, similar case definitions for MI and CHD, comparable quality control for genotyped data, harmonized imputation strategies and collaboratively designed analysis plans are further strengths of our study. Despite these strengths, there are several limitations that merit discussion. First, nearly all studies that contributed to our GWAS are also members of the CARDIoGRAMplusC4D Consortium [27]; however, they used only their prevalent cases in the CARDIoGRAMplusC4D project, and therefore there is no overlap between the two GWAS. Second, since our sample size was limited, further susceptibility variants with weaker effects may have been missed in our study. Third, we tried to use consistent definitions for MI; however, slight differences exist between the definitions for CHD. This might have introduced heterogeneity in our case definition. Finally, our findings may not be directly generalizable to non-European populations.
A potential clinical application of risk alleles identified from GWAS is the prospective prediction of cardiovascular disease. To date, the totality of evidence from prospective studies suggests that there is only modest, independent prediction of increased cardiovascular disease risk using genetic information with small to modest incremental reclassification for prediction beyond the known clinical CVD risk scores [28]. This lack of success has been attributed to the small percentage of variance explained by known genetic factors. However, our results also suggest that genetic risk prediction needs to consider differences in genetic variants that predict the risk of cardiovascular disease in prospective and cross-sectional settings.
In summary, using the largest collection of population-based prospective genome-wide association studies, we have identified QKI as a potential locus for incident myocardial infarction. Furthermore, we have shown that the genes associated with risk of cardiovascular disease may differ in effect size when studied in cross-sectional case-control versus cohort settings. The role of the 9p21 locus may be complex, increasing the risk of incident MI and decreasing mortality among those with CHD. This highlights the importance of examining longitudinal cohort studies in the study of etiology, even for genetic factors. These findings may have implications for the application of genetic variants in risk estimation for cardiovascular disease, an effort that so far has not provided strong evidence for incremental risk prediction by genetic markers.