Perspectives on the Use of Multiple Sclerosis Risk Genes for Prediction

Objective. A recent collaborative genome-wide association study replicated a large number of susceptibility loci and identified novel loci. This increase in known multiple sclerosis (MS) risk genes raises questions about clinical applicability of genotyping. In an empirical set we assessed the predictive power of typing multiple genes. Next, in a modelling study we explored current and potential predictive performance of genetic MS risk models.
Materials and Methods. Genotype data on 6 MS risk genes in 591 MS patients and 600 controls were used to investigate the predictive value of combining risk alleles. Next, the replicated and novel MS risk loci from the recent and largest international genome-wide association study were used to construct genetic risk models simulating a population of 100,000 individuals. Finally, we assessed the required numbers, frequencies, and ORs of risk SNPs for higher discriminative accuracy in the future.
Results. Individuals with 10 to 12 risk alleles had a significantly increased risk compared to individuals with the average population risk for developing MS (OR 2.76 (95% CI 2.02–3.77)). In the simulation study we showed that the area under the receiver operating characteristic curve (AUC) for a risk score based on the 6 SNPs was 0.64. The AUC increases to 0.66 using the well replicated 24 SNPs and to 0.69 when including all replicated and novel SNPs (n = 53) in the risk model. An additional 20 SNPs with allele frequency 0.30 and ORs 1.1 would be needed to increase the AUC to a slightly higher level of 0.70, and at least 50 novel variants with allele frequency 0.30 and ORs 1.4 would be needed to obtain an AUC of 0.85.
Conclusion. Although new MS risk SNPs emerge rapidly, the discriminatory ability in a clinical setting will be limited.


Introduction
Multiple sclerosis (MS) is caused by an interplay of multiple genetic variants and environmental factors. The genetic influence on MS is substantial, as evidenced by the 20-fold risk increase for siblings of MS patients [1]. Part of the genetic risk is explained by the MHC class II locus (HLA-DR15) [2]. In 2007 several novel risk alleles for MS were identified by a genome-wide association (GWA) study [3], and others confirmed the susceptibility loci by meta-analyses and replication [4]. Since that first GWA study progress has been rapid, and more new risk loci have been identified and confirmed [5,6,7,8,9]. A recent study in 9,772 cases and 17,376 controls identified 53 associated variants [9].
Given the gene-environmental and multi-genetic causes of MS, these susceptibility variants mainly have weak effects and are likely to contribute to a small increase in MS risk individually. It is commonly agreed that testing single susceptibility genes is not useful for prediction of MS risk, but the question remains whether combining susceptibility loci in risk models could have an added value on MS prediction in individuals. The predictive performance of genetic risk models has been investigated for other diseases in simulation studies [10,11]. These studies suggest that the predictive value improves by combining multiple common low-risk loci.
We investigated the extent to which MS risk can be predicted using genetic risk models. First, we tested in our empirical data the predictive performance of 6 genotyped SNPs combined, using risk scores compared with the a priori chance of someone in our population having MS. However, whether genetic risk models will be used in clinical or public health practice depends on the accuracy with which the test discriminates between individuals who will develop MS and those who will not. The discriminative accuracy is generally expressed as the area under the receiver operating characteristic curve (AUC). Second, we therefore tested the potential performance of SNP genotyping in a simulation study by adding risk genes to the model. For this, we constructed a risk model based on 1) the 6 genotyped SNPs, 2) the 24 recently well replicated genome-wide associated polymorphisms [9] and 3) the 53 replicated genome-wide associated polymorphisms including the 29 newly identified polymorphisms [9]. Finally, we included hypothetical variants in the risk model in order to investigate the future potential.

Empirical study
Ethics Statement. This study was approved by the Ethics Committee of the Erasmus University Medical Centre, METC Erasmus MC Rotterdam. All participants were recruited in Erasmus University Medical Centre and written informed consent was obtained.
Study population. A total of 591 MS patients and 600 controls were included in this study. The MS patients were recruited and ascertained as part of an ongoing nationwide study on genetic susceptibility in MS and fulfilled McDonald criteria for MS [12]. Details on ascertainment are given elsewhere [13].
Risk score analysis. All statistical analyses on empirical data were performed using SPSS version 15. Associations of individual SNPs were investigated using logistic regression. We also applied logistic regression analyses to investigate the combined predictive value of the risk allele score based on all SNPs with and without HLA-DRB (rs3135388), using the a priori probability of an individual in our population developing MS as reference. As we tested a total of 6 SNPs in our empirical study, the Bonferroni-corrected significance threshold was 0.008 (0.05/6). The weighted risk allele score was calculated for each participant with complete genotype information by multiplying the number of risk alleles at each SNP by that SNP's effect size obtained from the literature and summing the products over all SNPs. Risk alleles were defined as the alleles associated with increased risk of MS. All analyses were adjusted for age and sex.
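The weighted score and the multiple-testing threshold described above can be illustrated with a minimal Python sketch. It is not the SPSS analysis used in the study. The SNP names and ORs are hypothetical placeholders, and taking the natural logarithm of the per-allele OR as the "effect size" is an assumption, since the text does not state the scale on which effect sizes were entered.

```python
import math

# Hypothetical per-allele odds ratios (placeholders, not the study's values).
PER_ALLELE_OR = {"snp_a": 2.0, "snp_b": 1.2, "snp_c": 1.1}


def weighted_risk_score(risk_allele_counts, per_allele_or=PER_ALLELE_OR):
    """Sum over SNPs of (number of risk alleles) x ln(per-allele OR).

    Returns None if any genotype is missing, mirroring the restriction to
    participants with complete genotype information.
    """
    score = 0.0
    for snp, odds_ratio in per_allele_or.items():
        count = risk_allele_counts.get(snp)
        if count is None:
            return None
        if count not in (0, 1, 2):
            raise ValueError(f"risk allele count for {snp} must be 0, 1 or 2")
        # Assumption: the effect size is the log odds ratio.
        score += count * math.log(odds_ratio)
    return score


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical participant: 2, 0 and 1 risk alleles at the three SNPs.
example_score = weighted_risk_score({"snp_a": 2, "snp_b": 0, "snp_c": 1})
threshold = bonferroni_threshold(0.05, 6)
```

With six tests and an overall alpha of 0.05, `bonferroni_threshold` gives 0.05/6, which rounds to the 0.008 quoted in the text.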

Simulation study
Modelling strategy. We used a modelling procedure that has been developed and published previously [14], and which has also been used by others [15]. Briefly, the procedure creates a dataset with information on genotypes and disease status for a population of 100,000 individuals. The dataset is constructed in such a way that the odds ratios and frequencies of the genotypes and the disease risk match the specified values, which are obtained from the literature. Predicted MS risks are calculated using Bayes' theorem, which states that the posterior odds of MS for each individual are obtained by multiplying the prior odds by the likelihood ratio (LR) of their genotype status on all polymorphisms. The prior odds are calculated from the baseline population MS risk (p) using the formula p/(1-p). Under the assumption of independent genetic effects, i.e., no linkage disequilibrium between the genetic variants, the LR is obtained by multiplying the LRs of all individual genotypes that are included in the risk model [16]. The LRs of the genotypes of each single genetic variant are calculated from a genotype by disease status contingency table [14]. This table is constructed from the frequencies and ORs of the genotypes and the population MS risk. The table can also be constructed from allele frequencies and per-allele ORs when Hardy-Weinberg Equilibrium is assumed for the distribution of the genotypes. The frequencies and ORs are all specified as study parameters and varied between the simulation scenarios. The posterior odds are converted into MS risks using the formula odds/(1+odds).
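The calculation from prior odds to posterior risk can be sketched as follows. This is an illustrative Python reconstruction under stated assumptions, not the published procedure [14]: genotypes are taken to follow Hardy-Weinberg proportions, the per-allele OR is taken to act multiplicatively, and `freq` is treated as the risk allele frequency in the whole population. The function names and the bisection step are choices made for this sketch.

```python
def genotype_likelihood_ratios(freq, per_allele_or, pop_risk):
    """Likelihood ratios for carrying 0, 1 or 2 risk alleles of one SNP."""
    genotype_freqs = [(1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2]

    def population_risk(base_odds):
        # Average disease risk when non-carriers have odds `base_odds`.
        total = 0.0
        for k, g in enumerate(genotype_freqs):
            odds = base_odds * per_allele_or ** k
            total += g * odds / (1 + odds)
        return total

    # Bisection: find the non-carrier odds that reproduce the population risk.
    low, high = 0.0, 1.0e6
    for _ in range(200):
        mid = (low + high) / 2
        if population_risk(mid) < pop_risk:
            low = mid
        else:
            high = mid
    base_odds = (low + high) / 2

    prior_odds = pop_risk / (1 - pop_risk)
    # LR = P(genotype | MS) / P(genotype | no MS) = genotype odds / prior odds.
    return [base_odds * per_allele_or ** k / prior_odds for k in range(3)]


def posterior_risk(pop_risk, likelihood_ratios):
    """Bayes' theorem on the odds scale, assuming independent variants."""
    odds = pop_risk / (1 - pop_risk)      # prior odds p/(1-p)
    for lr in likelihood_ratios:
        odds *= lr                        # multiply the single-SNP LRs
    return odds / (1 + odds)              # convert odds back to a risk


# Hypothetical individual: two risk alleles at one SNP, none at another.
lrs_a = genotype_likelihood_ratios(freq=0.30, per_allele_or=1.4, pop_risk=0.001)
lrs_b = genotype_likelihood_ratios(freq=0.20, per_allele_or=1.2, pop_risk=0.001)
risk = posterior_risk(0.001, [lrs_a[2], lrs_b[0]])
```

The identity used in the last step of `genotype_likelihood_ratios` follows directly from the contingency table: the LR of a genotype equals the disease odds within that genotype divided by the prior odds.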
Discriminative accuracy. The discriminative accuracy is the extent to which the test results can discriminate between individuals who will develop MS and those who will not [17]. The AUC gives an assessment of the discriminative accuracy of a prediction model and ranges from 0.5 (equal to tossing a coin) to 1.0 (perfect prediction). All simulations were repeated 100 times to obtain robust estimates of the AUC. All results are presented as averages of the repeated simulations. The obtained confidence intervals were extremely small, often equal to the point estimate, and therefore not presented in this paper. Analyses were performed using R software (version 2.12.1) [18].
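As an illustration of what the AUC measures, the sketch below computes it directly from its probabilistic definition: the chance that a randomly chosen case has a higher predicted risk than a randomly chosen control, with ties counted as one half. It is written in Python for illustration only; the study's analyses were run in R, and the function name is hypothetical.

```python
def auc(case_scores, control_scores):
    """AUC as the proportion of case-control pairs ranked correctly.

    Ties contribute one half. The double loop is quadratic in sample size,
    so this form is meant for small illustrative inputs.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

A score that carries no information gives 0.5 under this definition, and a score that ranks every case above every control gives 1.0, matching the range described above.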
Simulation scenarios. Recently, a large GWA study was presented as part of the collaboration between Wellcome Trust Case Control Consortium 2 (WTCCC2) and the International Multiple Sclerosis Genetics Consortium (IMSGC) [9]. Twenty-three MS-associated non-major histocompatibility complex (MHC) loci were replicated in the primary GWAS involving 9,772 cases and 7,296 controls with P_GWAS < 1 × 10^-3. Table 2 provides the 23 replicated non-MHC SNPs with the combined ORs and p-values. The risk allele frequency is the allele frequency in the UK control population, which was the largest sample. Table 2 also includes the HLA-DRB1*15:01 MHC SNP, which has been shown to significantly increase the risk for MS. These 24 risk SNPs also include the 6 polymorphisms of our empirical data. The collaboration also presented the identification of 29 novel susceptibility loci, as shown in Table 3. This leads to a total of 53 risk SNPs.
Three different simulation scenarios were considered. In each scenario genotypes and MS status were simulated for 100,000 individuals, assuming a lifetime MS risk of 0.1%. The first scenario calculated the AUC within the empirical data weighted on literature frequency. The second scenario assessed the increase in AUC by adding additional risk alleles, starting with the 6 genotyped risk loci given the replicated ORs. We compared this to the calculated AUC for validation of the simulation model. Next, the AUC was calculated with the 24 replicated SNPs in the recent Nature paper including the 6 genotyped SNPs. And finally, the AUC was assessed on a risk model including the 29 novel susceptibility loci on top of the replicated SNPs, leading to a total of 53 SNPs. The third scenario investigated the magnitude of the allele ORs of 1 to 100 polymorphisms that need to be added to the risk model to increase the discriminative accuracy. Since there are no models known in the literature for predicting MS risk we pursued AUCs known to be used for other diseases in the literature [19,20]. We investigated AUC thresholds of 0.70, 0.75, 0.80 and 0.85. The ORs were obtained for different frequencies of the risk alleles.
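A scenario of the third kind, adding a number of hypothetical variants with a common allele frequency and OR, can be approximated with a small Monte Carlo sketch. This is a simplified Python illustration and not the published modelling procedure [14]: disease status is drawn from a logistic model whose intercept is centred approximately on the population risk, all variants share one frequency and one OR, and the AUC is computed from rank sums for speed. All function and parameter names are hypothetical.

```python
import math
import random


def rank_auc(case_scores, control_scores):
    """AUC from the Mann-Whitney rank-sum statistic, using midranks for ties."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    labelled = sorted([(s, 1) for s in case_scores] +
                      [(s, 0) for s in control_scores])
    rank_sum_cases = 0.0
    i = 0
    while i < len(labelled):
        j = i
        while j < len(labelled) and labelled[j][0] == labelled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        rank_sum_cases += midrank * sum(label for _, label in labelled[i:j])
        i = j
    n1, n0 = len(case_scores), len(control_scores)
    return (rank_sum_cases - n1 * (n1 + 1) / 2) / (n1 * n0)


def simulate_auc(n_people, n_snps, freq, per_allele_or, pop_risk, seed=0):
    """Simulate genotypes and disease status; return the AUC of the risk score."""
    rng = random.Random(seed)
    beta = math.log(per_allele_or)
    # Approximate calibration: a person with the average number of risk
    # alleles is given odds equal to the population odds.
    intercept = math.log(pop_risk / (1 - pop_risk)) - n_snps * 2 * freq * beta
    case_scores, control_scores = [], []
    for _ in range(n_people):
        # Two independent alleles per SNP (Hardy-Weinberg, no linkage
        # disequilibrium between variants).
        alleles = sum(rng.random() < freq for _ in range(2 * n_snps))
        score = alleles * beta
        risk = 1 / (1 + math.exp(-(intercept + score)))
        if rng.random() < risk:
            case_scores.append(score)
        else:
            control_scores.append(score)
    return rank_auc(case_scores, control_scores)


# Hypothetical scenario: 20 variants, allele frequency 0.30, OR 1.1.
# A disease risk far above 0.1% is used so a small sample still contains cases.
example_auc = simulate_auc(n_people=2000, n_snps=20, freq=0.30,
                           per_allele_or=1.1, pop_risk=0.2, seed=1)
```

Repeating `simulate_auc` with different seeds and averaging the results would mirror the repeated-simulation design described above. At a lifetime risk of 0.1%, a population of 100,000 contains only about 100 cases, which is one reason single runs are noisy.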

Empirical study
A total of 588 cases and 599 controls were successfully genotyped for at least one polymorphism, while complete genotype information on all polymorphisms was available for 564 cases and 581 controls. The mean age (SD) of the cases and controls was 45 (12) and 49 (17) years, respectively. Of the cases, 71% were female, compared with 55% of the controls. None of the polymorphisms deviated significantly from Hardy-Weinberg Equilibrium (lowest Hardy-Weinberg p-value = 0.15 for IL2RA: rs2104286). Table 1 shows the individual effects of each SNP on MS risk in our genotyped population. Increased risk for MS was confirmed for the minor alleles of EVI5, HLA-DRB and CLEC16A, and for the major alleles of CD58 and IL7R. For IL2RA the association was not statistically significant (OR 1.14, 95% CI 0.95-1.38). After adjustment for multiple testing only HLA-DRB, CLEC16A and CD58 remained statistically significant. Figure 1A shows the risk score when including all SNPs in the model. The reference category is based on the a priori risk for developing MS, which in our population was 49% (564 cases out of the 1,145 individuals with complete genotype information). Individuals with 0 to 5 risk alleles have a significantly decreased risk for developing MS (OR 0.28, 95% CI 0.16-0.48) compared to the a priori risk. At the other end of the spectrum, individuals with 10 to 12 risk alleles have a significantly increased risk (OR 2.76, 95% CI 2.02-3.77). Figure 1B shows that, when excluding the variant with the strongest risk effect (HLA-DRB) from the risk score, individuals with 0 to 5 risk alleles have a decreased risk (OR 0.50, 95% CI 0.34-0.73) and individuals with 8 to 10 risk alleles have an increased risk.

Simulation study
Table 2 provides the 24 replicated SNPs from the recent Nature paper [9], which have been shown to significantly increase the risk for MS. These 24 risk SNPs also include the 6 polymorphisms of our empirical data. Table 3 shows the 29 newly identified polymorphisms in this Nature paper, leading to a total of 53 risk SNPs.
First, we calculated that the AUC for the genotyped 6 SNPs within the empirical data weighted on literature frequency was 0.64. Second, in the simulation study we assessed the AUC increase obtained by including additional risk alleles. The AUC for the recently replicated ORs of the 6 SNPs used in the empirical study was 0.64, the same as the AUC calculated from the empirical study. Next, when the 24 known polymorphisms were included in the model the AUC rose to 0.66, and it increased slightly to 0.69 after including all 53 SNPs in the model (Figure 2). Finally, we explored the future possibilities offered by risk alleles that are yet to be discovered. Table 4 shows the number of new risk genes with specific allele frequencies in combination with different ORs that would be needed in addition to the original 53 risk variants to obtain AUCs of 0.70, 0.75, 0.80 and 0.85. For example, to increase the AUC just slightly to 0.70 we would have to add 20 new variants to our model, with a realistic OR of 1.1 and an allele frequency of 0.30. However, to increase the AUC to 0.85 we would have to add 50 new variants with an OR of 1.4 and an allele frequency of 0.30. With more realistic ORs, even more polymorphisms would have to be added to the model.

Discussion
This study investigated the extent to which MS can be predicted by genetic risk models, using empirical and simulation data based on the most up-to-date genetic information for MS. First, we showed that the predictive performance of testing multiple genes can be enhanced by using a combination of individual MS risk alleles. As expected, HLA-DR influences the ability to predict MS considerably due to its high OR. However, even without HLA-DR there was an increased, but small, risk for developing MS in people with 8 to 10 risk alleles. This underlines the current insight that multiple genes exert a small effect on developing MS on top of the major influence of HLA-DR [21,22]. Next, after validating the genetic risk models with simulated genotype and MS status in a population of 100,000 individuals, we estimated that the predictive value as reflected in the AUC would be 0.66 when all 24 well replicated GWA-derived polymorphisms were considered. Moreover, we showed that including the 29 novel risk genes increased the AUC only slightly to 0.69, illustrating that even more than doubling the number of risk SNPs does not increase the AUC sufficiently to make it useful in clinical practice. The AUC of 0.69 is comparable to that of other risk prediction models in MS [23,24,25]. In 2009, De Jager and colleagues investigated the predictive value of 16 MS susceptibility loci using weighted genetic risk scores in three cohorts [23]. They demonstrated a consistent discriminatory ability in three independent samples (AUC varying from 0.64 to 0.70). Gourraud and colleagues also investigated the aggregation of genetic MS risk markers in individuals by comparing multiple and single case families [24]. They showed that a greater genetic burden in siblings of MS patients was associated with an increased MS risk (OR 2.1, p = 0.001). However, the AUC for genetic burden differences between probands and siblings was only 0.57, indicating that the available genetic data are not sufficient to achieve case-control prediction of MS. Gourraud and colleagues also used 16 MS susceptibility loci, which partly overlapped with those of De Jager et al.
Before interpreting the clinical relevance of our findings, a methodological issue needs to be addressed. We assumed that genetic variants are inherited independently and that the combined effect of the genetic variants on disease risk followed a multiplicative risk model of independent effects (i.e., no statistical interaction terms were included in the model). Although so far no studies have demonstrated gene-gene interactions with MS risk, it is still possible that these will be discovered in future studies in larger populations. However, gene-gene interactions only improve MS risk prediction if their effect sizes are substantial (e.g., OR > 5). When interaction effects are smaller, their effects on the predictive accuracy will be comparable with those of single gene effects, because by definition their frequencies are lower.
With the current model including 53 variants, we are still not able to differentiate with reasonable accuracy between individuals who will develop MS and those who will not (AUC 0.69). The model is therefore not clinically useful, which raises the question of how MS prediction can be improved.
We demonstrated in the simulation study that in order to obtain higher AUCs, a considerable number of additional common genetic variants or more strongly associated variants with high ORs (Table 4) need to be identified. The per-allele OR of the polymorphisms identified in GWA studies ranges from 1.08 to 2.1. If future GWA studies identify polymorphisms with per-allele ORs around 1.1, the predictive ability of the genetic risk model can theoretically be improved beyond that of the existing models. Yet even a small improvement to 0.70 still requires the discovery of 20 new statistically significant variants. Even with this increase, the model would still not be clinically applicable: even for a disease that is readily treatable and even preventable, such as coronary heart disease, a risk model with an AUC of about 0.80 is used (the Framingham Risk Score) [26]. For MS there is still no cure or preventive treatment available, and so a higher predictive accuracy is desirable to prevent false positives. We have shown that to pursue an AUC of 0.85, we have to include 50 new variants with ORs of 1.4 or a few common variants (minor allele frequency > 30%) with high ORs (Table 4). This may prove to be difficult, because the common genetic variants with high ORs may already have been identified, which would imply that even higher numbers of common genetic variants with relatively smaller ORs, or many exceedingly rare variants (minor allele frequency < 1%) with high ORs, will be needed. This does not seem feasible. Of note, unlike HLA-DR, most of the genetic risk factors identified so far have only a slight effect on susceptibility to MS (with ORs that range from 1.1 to 1.2) [23]. However, more high-risk genetic MS variants can be expected in the near future [27]. With novel techniques such as next generation sequencing we can expect new rare variants with high ORs to be discovered [28]. This approach has already proven successful in rare Mendelian disorders and can potentially also identify rare variants explaining the high recurrence rate of MS within families [29]. This technique may also allow us to find the causal variants for MS, which will most likely have higher ORs than those found in GWA studies. Another approach to improve MS prediction could be combining genetic with nongenetic risk factors such as infection with Epstein-Barr virus (EBV), smoking, and serum vitamin D concentrations [30]. It is likely that risk prediction models that include nongenetic factors will perform better, as ORs for SNPs tend to be smaller than ORs for nongenetic factors (e.g. infectious mononucleosis [31]). De Jager and colleagues showed an enhanced discriminatory ability of 16 susceptibility genes by the inclusion of sex (AUC increasing from 0.70 to 0.74) and of smoking and immune response to EBV (AUC increasing from 0.64 to 0.68). Others have performed studies combining the effects of HLA-DR and non-genetic factors like smoking and anti-EBV serum levels [32,33]. Also, integration of transcriptional, proteomic, and clinical factors will probably improve the prediction model and with that our understanding of MS genetics [34]. However, the added value of the SNPs might then be questioned. For other diseases it has been shown that the AUC does not improve much when SNPs are added to clinical risk factors. It should be noted, though, that in these studies only small numbers of SNPs were added to the clinical risk factors.
Table 4. Odds ratios and related allele frequencies needed to obtain AUCs of 0.70-0.85 in addition to the 53 statistically significant genetic susceptibility variants (AUC = 0.69).

Even if we can improve the prediction of MS in the future, the question remains what the clinical implications of such predictive risk models would be. The discriminative accuracy that is required in preventive or clinical care depends on the goal of testing, the availability of (preventive) treatment, and the adverse effects of false-positive and false-negative test results.
Although the early results from GWA studies have not yet been used clinically, at least a partial goal of understanding the genetic basis of MS is to investigate the use of these variants to predict disease risk, so that environmental changes or therapeutic interventions can be initiated before the inflammatory demyelinating process progresses or even starts. Also, by better mapping the genetics of MS, we hope to improve our understanding of the pathophysiology of MS. This could help us find better and new therapeutic drugs. By combining family history with a quantitative measure of genetic risk, a screening method might eventually be implemented that could identify clinically silent evidence of disease among first-degree relatives of MS patients, who have a 20-50 times higher risk of developing MS [35]. However, their absolute risk is only 2-5%, and therefore the models could be more useful in high-risk populations of individuals who have had a clinically isolated syndrome suggesting MS. These patients present with a neurological disability during their productive years of life and face the possibility of a chronic disease. Thus, they yearn for more clarity about their future. Improved risk prediction would also enable us to identify individuals at risk in whom early treatment could be started to reduce the accumulation of neurological disability [36].
Given the possible clinical consequences of false-positive results in these patients, the AUC required for presymptomatic diagnosis is considerably higher than the AUC required for prediction after a clinically isolated syndrome. It has been suggested that identified genetic variants have stronger effects in multiplex families [37]. It is of note that the ORs assessed up to now in GWA studies and validation studies are generally derived from datasets on sporadic cases. In a multiplex family setting, with potentially stronger effects for individual risk variants, our estimates may prove to be conservative.
In conclusion, our analyses show that prediction of MS risk based on low-risk susceptibility variants can theoretically improve as more variants are discovered. However, the discriminatory ability in a clinical setting will be limited.

Ethics approval
The empirical study was conducted with the approval of the institutional medical ethics committee of Erasmus MC, Rotterdam, The Netherlands.