Improvement in Prediction of Coronary Heart Disease Risk over Conventional Risk Factors Using SNPs Identified in Genome-Wide Association Studies

Objective We examined whether a panel of SNPs, systematically selected from genome-wide association studies (GWAS), could improve risk prediction of coronary heart disease (CHD), over-and-above conventional risk factors. These SNPs have already demonstrated reproducible associations with CHD; here we examined their use in long-term risk prediction. Study Design and Setting SNPs identified from meta-analyses of GWAS of CHD were tested in 840 men and women aged 55–75 from the Edinburgh Artery Study, a prospective, population-based study with 15 years of follow-up. Cox proportional hazards models were used to evaluate the addition of SNPs to conventional risk factors in prediction of CHD risk. CHD was classified as myocardial infarction (MI), coronary intervention (angioplasty or coronary artery bypass surgery), angina and/or unspecified ischaemic heart disease as a cause of death; additional analyses were limited to MI or coronary intervention. Model performance was assessed by changes in discrimination and net reclassification improvement (NRI). Results There were significant improvements with addition of 27 SNPs to conventional risk factors for prediction of CHD (NRI of 54%, P<0.001; C-index 0.671 to 0.740, P = 0.001), as well as for MI or coronary intervention (NRI of 44%, P<0.001; C-index 0.717 to 0.750, P = 0.256). ROC curves showed that, for prediction of MI or coronary intervention, addition of SNPs improved discrimination most when the sensitivity of conventional risk factors was low. Conclusion There was significant improvement in risk prediction of CHD over 15 years when SNPs identified from GWAS were added to conventional risk factors. This effect may be particularly useful for identifying individuals with a low prognostic index who are in fact at higher risk of disease than indicated by conventional risk factors alone.


Introduction
There has been much discussion of personalised medicine and the use of genetic risk scores for identifying people at increased risk for chronic diseases including coronary heart disease (CHD). The expectation is that such individuals might benefit from targeted interventions, thereby reducing their risk of developing disease. The Framingham risk score [1] is the most commonly used method of CHD risk prediction, and has been widely assessed for validity. However, the accuracy of this score differs between populations, commonly over-estimating risk in European countries [2], and overall accuracy is generally low for individuals not at the extremes of risk distributions. Alternative risk prediction models have been developed which incorporate a range of additional risk factors, such as biomarkers [3], socio-economic indicators, or family history [4], but these still have limited predictive power.
Family history is predictive of CHD after adjusting for other conventional risk factors [5,6], and CHD is estimated to be approximately 40-50% heritable [7,8]. Despite this, genetic information has so far generally not resulted in appreciable improvements in prediction over non-genetic risk factors (apart from monogenic disease). This is likely due in part to the small effects exerted by individual single nucleotide polymorphisms (SNPs) relative to established risk factors, but the selection of SNPs for evaluation, and methods of inclusion in a predictive model, are also likely contributors. Previous genetic risk prediction models have often relied on candidate SNPs that have a known biological role in, or association with, CHD or atherosclerosis [9]. The publication of genome-wide association studies (GWAS) has provided another method for identification of SNPs, independent of known biological function, but based on statistical evidence of association. Models have often used genetic risk scores, essentially a count of the number of risk alleles, which do not take into account the individual effect sizes and assume independence of these alleles.
The primary aim of this analysis was to determine whether a systematically selected panel of SNPs, already found individually to be reproducibly associated with CHD through GWAS, could improve prediction of CHD over and above well established conventional risk factors, thereby contributing additional clinical utility. Since the majority of coronary events occur in individuals with Framingham-based risk scores of less than 20% [10], the inclusion of genetic information has the potential to create a more personalised and accurate risk evaluation.

Study Population
Details of the Edinburgh Artery Study (EAS) have been published previously [11,12]. In brief, the EAS enrolled 1592 men (809) and women (783) aged 54-75 years living in Edinburgh, Scotland. Recruitment used an age-stratified random sample from ten general practices, resulting in a geographical and socioeconomic representation of the population of Edinburgh. Clinical examinations were held during 1987/8, and DNA samples were collected at a five-year follow-up examination (attended by 1165 (73%) subjects). At the time of genotyping for the current study (2009), DNA was available for 856 subjects, of whom 840 were successfully genotyped (409 men, 431 women). Reasons for not having a DNA sample included refusal to provide a blood sample or allow genotyping at the 5-year examination, or insufficient sample remaining. Baseline characteristics of the full EAS population and the population used for the current analysis were very similar (Table 1).
Data collection for identification and validation of coronary events at baseline and throughout follow-up included the WHO chest pain questionnaire, ECG (coded using the Minnesota Classification Code), self-reported doctor diagnosis of disease, record linkage to hospital discharge data and death certificates, and scrutiny of general practitioner records [12]. Conventional risk factors measured at baseline included lipids and blood pressure. Complete follow-up was available until June 2003, a mean follow-up of 15 years.
The classification of CHD used in the current analyses was based on validated events and comprised fatal or non-fatal myocardial infarction (MI), angioplasty, coronary artery bypass surgery, angina and/or unspecified ischaemic heart disease as a cause of death. To reduce the potential for misclassification, further analyses were restricted to fatal or non-fatal MI or coronary intervention (angioplasty or coronary artery bypass surgery). Family history was also collected at baseline, but was limited to unconfirmed self-reports of MI or angina in a parent.

Ethical Approvals
Ethical approval for the EAS was granted by the Lothian Health Board Medical Research Ethics Committee. Written informed consent was obtained from all participants.

SNP Identification
Selection of SNPs used recent large-scale meta-analyses of GWAS of CHD to identify SNPs that have demonstrated reproducible associations with CHD [13,14]. This provided 36 SNPs, of which six were not available on the MetaboChip (rs10953541, rs1412444, rs17609940, rs216172, rs46522, rs964184) and had no available proxy; rs4977574 was replaced with rs133049 (r² = 0.97, D' = 1.0). Three SNPs were removed because they were in linkage disequilibrium (LD; r² > 0.85) with other included SNPs (rs646776, rs1199338, rs12526453). SNPs used in prediction models are summarised in Table 2 and detailed in Table S1.
Additional SNPs for use in a secondary, exploratory analysis were selected based on nominal significance (P<1×10⁻⁵) in GWAS of CVD, significant associations with lipids in GWAS, and/or biological plausibility. This provided an additional 44 SNPs (detailed in Table S2) that were available and successfully genotyped in the study population, resulting in a total set of 74 SNPs for use in secondary analysis. This was a more subjectively selected and therefore potentially biased set of SNPs.
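The two LD measures quoted when substituting and pruning SNPs (r² and D') can be computed from allele and haplotype frequencies. The following is a minimal sketch of the standard definitions, not the procedure used in the study; the function names and the frequencies in the usage lines are hypothetical and chosen only for illustration.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a, p_b: frequencies of allele A and allele B
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # deviation of the haplotype frequency from independence
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


def in_high_ld(r2, threshold=0.85):
    """Flag a SNP pair as redundant using the r2 > 0.85 rule quoted in the text."""
    return r2 > threshold


# Hypothetical frequencies: allele B only ever occurs on haplotypes carrying A.
d, d_prime, r2 = ld_stats(p_ab=0.4, p_a=0.5, p_b=0.4)
print(round(d_prime, 3), round(r2, 3), in_high_ld(r2))
```

In this hypothetical pair D' reaches its maximum while r² stays well below the pruning threshold, which is why a proxy SNP is usually judged on r² rather than on D' alone.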

Genotyping
Genotyping used the Illumina MetaboChip, from which the chosen SNPs were extracted. Quality control was carried out on the full MetaboChip results; 16 samples with call rates below 75% were excluded. Table S1 reports call rates (mean genotypic call rate 97.7%, range 85.5-99.5%), Hardy-Weinberg equilibrium (HWE; one SNP, rs4773144, showed deviation from HWE), and minor allele frequencies (MAF; range 3-49%).
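A deviation from HWE is commonly assessed by comparing observed genotype counts with those expected from the allele frequencies. The text does not state which test was applied, so the sketch below assumes a one-degree-of-freedom chi-square goodness-of-fit test; the function name and genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotype counts for a single SNP in 840 subjects.
chi2, p_value = hwe_chi_square(n_aa=410, n_ab=350, n_bb=80)
print(round(chi2, 3), round(p_value, 3))
```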

Statistical Analysis
Statistical analysis used R version 2.14.0 [15]; all p-values were two-sided. Prediction of coronary risk used multivariate adjusted Cox proportional hazards models in the survival library [16]; the assumption of proportional hazards was satisfied for all models. Conventional risk factors were based on the Framingham model [1], and included: sex, baseline age, systolic blood pressure, smoking (Yes/No), diabetes and/or glucose intolerance (Yes/No), and total cholesterol/HDL cholesterol. SNPs were added as covariates to the conventional risk factors, assuming an additive model. This was thought preferable to creation of a single genetic risk score as it allows more influential SNPs to exert more of an effect on the model, whereas a composite risk score assumes all SNPs have the same effect size. The derived β coefficients were used to calculate prognostic indices (PI), thereby creating weighted prediction models. Prognostic indices were converted to predicted probabilities as $1 - S_0(t)^{\exp(\mathrm{PI})}$, where $S_0(t)$ is the baseline survival at time $t$ [1].
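The difference between a weighted prognostic index and an unweighted allele count, and the conversion of the index to a predicted probability, can be illustrated with a short sketch. The coefficients, genotypes, and baseline survival below are hypothetical values, not estimates from the EAS, and the prognostic index is assumed to be on the same scale (for example, the same centring) as the baseline survival.

```python
import math


def prognostic_index(betas, covariates):
    """Weighted sum of covariates: PI = sum(beta_k * x_k)."""
    return sum(beta * covariates[name] for name, beta in betas.items())


def allele_count_score(genotypes):
    """Unweighted genetic risk score: total number of risk alleles."""
    return sum(genotypes.values())


def predicted_risk(baseline_survival, pi):
    """Convert a prognostic index to a probability: 1 - S0(t) ** exp(PI)."""
    return 1.0 - baseline_survival ** math.exp(pi)


# Hypothetical log hazard ratios per risk allele (not estimates from the study).
betas = {"snp_a": 0.30, "snp_b": 0.05}
person_1 = {"snp_a": 2, "snp_b": 0}
person_2 = {"snp_a": 0, "snp_b": 2}
s0 = 0.90  # hypothetical baseline survival at the end of follow-up

for person in (person_1, person_2):
    pi = prognostic_index(betas, person)
    print(allele_count_score(person), round(pi, 2), round(predicted_risk(s0, pi), 3))
```

Both hypothetical subjects carry two risk alleles, so an allele count cannot separate them, whereas the weighted index assigns the carrier of the larger-effect alleles a higher predicted risk.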
Model performance was evaluated by C-indices, net reclassification indices (NRI), integrated discrimination improvement (IDI), and plotted ROC curves. ROC curves were plotted using the ROCR library [17]; C-indices, NRI, and IDI used the Hmisc library [18]. The C-index used in survival analysis is analogous to the area under the ROC curve used in logistic regression; put simply, it is a measure of the concordance between predicted and observed survival times across pairs of subjects [19]. NRI was based on event-specific reclassification and used continuous measures rather than categories, which increases statistical power. NRI can be used to compare the clinical impact of different models; put simply, it summarises the proportion of subjects with disease who have appropriately increased risk scores with the new model, and the proportion of subjects without disease who have appropriately decreased risk scores with the new model [20]. IDI represents desired improvements in average sensitivity corrected for undesirable increases in 1-specificity; it therefore assesses whether the new models improved sensitivity without affecting specificity, as described in Pencina et al. (2008). ROC curves are plots of 1-specificity against sensitivity.
All analyses used first incident events only; subjects with a diagnosis of prevalent CHD at baseline were excluded, as appropriate. Time to event was determined individually for both CHD and fatal or non-fatal MI or coronary intervention, based on appropriate diagnostic criteria. Since models based on different subjects could differ, risk prediction models that were compared contained identical population groups. Power was calculated using the gap library [21]. Though the study was underpowered to detect significant associations for individual SNPs, it was hypothesised that a set of SNPs with high prior probability could jointly have a sufficiently large effect size. There was 80% power to detect an effect size of 1.5 with a minor allele frequency (MAF) of 30% in a multiplicative model, at 5% significance with a disease prevalence of 20%.
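The performance measures described above (C-index, continuous NRI, and IDI) can be written out in a few lines. This is a simplified sketch rather than the Hmisc implementation used in the analysis: the NRI and IDI functions treat event status at the end of follow-up as binary and ignore censoring, the C-index skips tied event times, and all data in the usage lines are hypothetical.

```python
def harrell_c(times, events, risks):
    """Concordance index: share of usable pairs ranked correctly by predicted risk."""
    concordant, usable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a pair is usable only if the shorter time ends in an event
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / usable


def continuous_nri(events, p_old, p_new):
    """Category-free NRI: net upward moves in events plus net downward moves in non-events."""
    net_events = net_nonevents = 0
    for e, old, new in zip(events, p_old, p_new):
        direction = (new > old) - (new < old)  # +1 up, -1 down, 0 unchanged
        if e:
            net_events += direction
        else:
            net_nonevents -= direction
    n_events = sum(events)
    return net_events / n_events + net_nonevents / (len(events) - n_events)


def idi(events, p_old, p_new):
    """IDI: mean change in predicted risk among events minus that among non-events."""
    def mean_change(flag):
        diffs = [new - old for e, old, new in zip(events, p_old, p_new) if bool(e) == flag]
        return sum(diffs) / len(diffs)
    return mean_change(True) - mean_change(False)


# Hypothetical follow-up times (years), event indicators, and predicted risks.
times = [2, 5, 8, 10]
events = [1, 1, 0, 0]
p_old = [0.2, 0.3, 0.2, 0.3]
p_new = [0.4, 0.2, 0.1, 0.1]
print(harrell_c(times, events, p_old), harrell_c(times, events, p_new))
print(continuous_nri(events, p_old, p_new), round(idi(events, p_old, p_new), 3))
```

Because the continuous NRI sums two net proportions, it ranges from -2 to 2 (or -200% to 200%), which is the scale on which values such as 54% should be read.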
An exploratory method of selecting SNPs used regression trees in the rpart library [22]. To identify SNPs that were informative after conventional risk factors, the residuals of a model containing conventional risk factors were used as the dependent variable. This analysis included the full collection of 74 SNPs as potential covariates. Tree development used the Gini Index as the splitting rule; SNPs were treated as ordinal, and splits were only considered as dominant or recessive models. Regression trees sequentially selected SNPs that best partitioned subjects into the appropriate group [23]; the sets of SNPs that were identified by the regression trees were then used to develop prediction models.
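The restriction of splits to dominant or recessive models means that each SNP offers only two candidate cut-points on the 0/1/2 allele count. The sketch below illustrates the Gini criterion for those two cut-points only; it is not the rpart procedure used in the analysis, and it assumes a binary group label in place of the model residuals. Function names and data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a set of binary labels: 2 * p * (1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def split_gain(genotypes, labels, threshold):
    """Decrease in Gini impurity when splitting at allele count >= threshold."""
    left = [y for g, y in zip(genotypes, labels) if g < threshold]
    right = [y for g, y in zip(genotypes, labels) if g >= threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


def best_genetic_split(genotypes, labels):
    """Compare the dominant (0 vs 1/2) and recessive (0/1 vs 2) splits of one SNP."""
    gains = {
        "dominant": split_gain(genotypes, labels, 1),
        "recessive": split_gain(genotypes, labels, 2),
    }
    return max(gains, key=gains.get), gains


# Hypothetical risk-allele counts and group labels for six subjects.
genotypes = [0, 0, 1, 1, 2, 2]
labels = [0, 0, 1, 1, 1, 1]
model, gains = best_genetic_split(genotypes, labels)
print(model, {k: round(v, 3) for k, v in gains.items()})
```

A tree-building routine would repeat this comparison across all candidate SNPs at each node and split on the SNP and coding with the largest gain.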

Risk Prediction Using SNPs with Confirmed Associations with CHD
Twenty-seven SNPs identified in meta-analysis of GWAS of CHD were successfully genotyped in the EAS population (Table 2), and results of prediction models are summarised in Table 3 (hazard ratios given in Tables S3 and S4). Addition of the 27 SNPs to conventional risk factors in prediction of CHD increased the C-index from 0.671 to 0.740 (P = 0.001) and NRI was 54% (95%CI 35-74; P<0.001). When restricted to fatal or non-fatal MI or coronary intervention the C-index increased from 0.717 to 0.750 (P = 0.256), and NRI was 44% (95%CI 20-67; P<0.001). The results were almost identical when family history of CHD was also included in the models.
Plotted ROC curves (Figure 1) showed that addition of SNPs improved prediction over much of the curve for CHD. For fatal or non-fatal MI or coronary intervention, however, the models performed differently at different sensitivities when SNPs were added: here the addition of SNPs improved discrimination most when the sensitivity of conventional risk factors was lower, translating to improved identification of individuals with a low prognostic index who are in fact at increased risk of an event. This was mirrored in density plots, in which a second distribution of higher risk scores for subjects with events emerged upon addition of SNPs (Figure S1). Addition of SNPs to conventional risk factors moved 10 subjects to predicted risk ≥20%, and the odds ratio (OR) of having any CHD given a ≥20% predicted risk increased from 3.86 (95%CI 2.52, 5.93) to 5.42 (95%CI 3.54, 8.38). When restricted to fatal or non-fatal MI or coronary intervention, 16 subjects moved to predicted risk ≥20%, and the OR of having an event given a ≥20% predicted risk increased from 4.42 (95%CI 1.78, 10.46) to 12.18 (95%CI 6.30, 24.03). Reclassification tables are presented in Table S5.
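An odds ratio of this kind comes from a 2x2 table cross-classifying event status against whether predicted risk is at least 20%. The text does not state how the confidence intervals were obtained, so the sketch below assumes the common Wald (Woolf) interval on the log odds ratio; the counts are hypothetical and are not taken from Table S5.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald (Woolf) 95% CI from a 2x2 table.

    a: events with predicted risk >= 20%    b: non-events with predicted risk >= 20%
    c: events with predicted risk < 20%     d: non-events with predicted risk < 20%
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
odds_ratio, lower, upper = odds_ratio_ci(a=20, b=80, c=10, d=90)
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))
```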

Discussion
In this prospective, population-based cohort of men and women from Edinburgh, Scotland, a systematically selected set of SNPs improved prediction of CHD over 15 years, over-and-above conventional risk factors. A total of 27 SNPs that were significantly associated with CHD, when added to the Framingham-based conventional risk factors of age, sex, SBP, total cholesterol/HDL cholesterol, diabetes and/or glucose intolerance, and smoking, improved prediction as indicated by significant improvements in NRI and C-indices. NRI were used to evaluate the clinical impact of addition of SNPs. Given that an estimated 15-20% of MI occur in individuals considered as lower risk based on conventional risk factors [24], the ability of this genetic model to identify such subjects and increase their predicted risk indicates potential clinical utility. The highest risk category of at least 20% CHD risk was of interest as individuals in this category are often considered suitable for clinical intervention, and the risk of misclassification is decreased [25]. The appropriate reclassification of subjects to ≥20% predicted risk on addition of SNPs to conventional risk factors suggests that such a model could affect treatment decisions for a number of individuals.
Regression trees were used to evaluate whether a smaller collection of SNPs was sufficient to improve prediction, to account for the possibility that not all SNPs contribute to prediction. This allowed for selection of additional SNPs as it was not expected that GWAS would have sufficient power to identify all associated and/or predictive SNPs. Though regression trees are prone to overfitting, they were an exploratory method to limit the number of SNPs included in the models. They also provided branching patterns that may show that an effect at one SNP may only occur in the presence of another SNP. This would indicate that only SNPs with independent effects should be included, in order to obtain a more accurate population-based risk associated with each SNP.
Previous studies that added candidate SNPs to conventional risk factors, using either genetic risk scores (a count of the number of risk alleles) or weighting of SNPs, have generally not significantly improved model discrimination as measured by C-index. They have however indicated through NRI [26] and/or increased hazard ratios that SNPs could improve risk prediction [27,28,29,30,31], with significant associations reported between incident CHD and genetic risk scores [26,27,28]. Humphries et al. (2007) [32] found a significant improvement in C-index in the Northwick Park Heart Study II (P<0.001), which was further improved after inclusion of an interaction with smoking (P = 0.01).
More recently there have been other studies that used GWAS-identified SNPs in prospective cohorts; these contained many but not all of the GWAS SNPs used in the present analyses. Paynter et al. (2010) [6] assessed the predictive ability of adding genetic risk scores to conventional risk factors for prediction of any CVD (MI, stroke, arterial revascularization, and cardiovascular death) in a large cohort, and found no improvement in discrimination or reclassification. Additionally the investigators found that the genetic risk score alone was not associated with risk of CVD; this may be due to the use of a broader phenotype. Davies et al. (2010) [33] assessed the predictive utility for CHD and found a significant improvement in the C-index when SNPs were added to conventional risk scores, from 0.801 to 0.809 (P = 0.0073). They additionally found that weighting SNPs led to models that performed better than unweighted models. Ripatti et al. (2010) [34] also reported an association between incident CHD and genetic risk score after adjusting for conventional risk factors; however, this was observed through improvements in IDI and NRI, and did not lead to significant changes in C-index. They also reported that though family history was associated with increased risk of CVD, adjustment for family history did not change the risk estimates of the genetic risk score. The use of a genetic risk score results in equal weighting of all SNPs, possibly missing relevant information on the relative effects of each SNP within the model [32]. Also, in the development of a model in which covariates are not unrelated, the β coefficients need to be adjusted to account for the impact covariates have on each other to prevent distortion of the model. ROC curves measure discrimination but are 'insensitive to change' [19,35]. However, as our curves showed, risk prediction did not change consistently over the full range of sensitivities; a large change in one portion of the curve may be clinically relevant but not represented in summary measures. The clinical value was demonstrated by the increased NRI, and the subsequent increased odds of subjects with CHD having a predicted risk ≥20%, showing that addition of GWAS SNPs can have clinical applicability.
There were a number of strengths and weaknesses of the current study. A strength of the EAS population was the long follow-up of 15 years, and the prospective method that included regular contact with study participants and general practitioners, as well as use of hospital discharge records and death certificates. This enabled confirmation of reported events, providing accurate phenotypic data and minimising misclassification bias, as well as detailed and accurate records for subjects that died during follow-up, thereby removing survivor bias. Here we found that genetic data was more informative than self-reported family history of CHD. This was possibly due to the difficulty in collecting accurate reports of family history in epidemiological studies, which would also exist clinically and therefore not result in accurate risk prediction.
As the cohort was recruited from Edinburgh only and was primarily white, the risk of population stratification was low. However, the EAS study population was small for a genetic study. There may also have been temporal trends that affected CHD risk and consequently risk prediction, such as smoking habits and primary prevention of CHD. At baseline, medications for CHD risk factors were not as commonly used as they are now, and during follow-up a considerable portion of the population were prescribed anti-hypertensive, lipid-lowering, and/or diabetes treatments. With such a long follow-up this may have been a confounder that was not accounted for. This is not a definitive list of predictive SNPs. Further analysis of GWAS and fine mapping is necessary to identify causal SNPs that will be more accurate in risk prediction. There remains the possibility that some of the GWAS significant SNPs did not contribute to risk prediction. It is also likely that there are gene-environment and gene-gene interactions that were not accounted for; for example, Humphries et al. (2007) found an interaction with smoking [32], and HMGCR genotypes may affect lipid-lowering responses to statins [36]. Though use of GWAS results removed sources of bias associated with the inclusion of candidate gene study results, there remain problems specifically associated with GWAS, such as poor representation of low MAF SNPs, which debatably have larger effect sizes [37]. However, we have shown that use of a systematically selected panel of SNPs can significantly improve prediction of CHD risk over-and-above conventional risk factors, indicating that this approach to incorporating genotypic data into prediction models has potential clinical utility.

Figure 1. ROC curves of prediction of coronary heart disease when GWAS significant SNPs were added to conventional risk factors. A: ROC curves for CHD, comprising fatal or non-fatal MI, angioplasty, coronary artery bypass surgery, angina and/or unspecified ischaemic heart disease as a cause of death; B: ROC curves for diagnoses limited to fatal or non-fatal MI or coronary intervention (angioplasty or coronary artery bypass surgery).
doi:10.1371/journal.pone.0057310.g001

Table 1. Comparison of baseline characteristics of the EAS population used in genetic risk prediction models and full study population.
doi:10.1371/journal.pone.0057310.t001

Table 2. SNPs identified from meta-analysis of GWAS of CHD used in risk prediction models.

Table 3. Incidence, Discrimination, and Calibration Estimates of Models Using Conventional Risk Factors* and GWAS or Regression Tree SNPs in the EAS.
*Conventional risk factors = Age, Sex, SBP, Total Cholesterol/HDL Cholesterol, Diabetes and/or glucose intolerance, Smoking. Each analysis used only subjects without a diagnosis at baseline, as appropriate to investigate incident events, and with full genotypic data for included SNPs.
doi:10.1371/journal.pone.0057310.t003