Combinations of Susceptibility Genes Are Associated with Higher Risk for Multiple Sclerosis and Imply Disease Course Specificity

Multiple sclerosis (MS) is a chronic autoimmune disease of the central nervous system that predominantly affects young adults. The genetic contributions to this multifactorial disease were underscored by a genome wide association study (GWAS) conducted by the International Multiple Sclerosis Genetic Consortium in a multinational cohort prompting the discovery of 57 non-MHC MS-associated common genetic variants. Hitherto, few of these newly reported variants have been replicated in larger independent patient cohorts. We genotyped a cohort of 1033 MS patients and 644 healthy controls with a consistent genetic background for the 57 non-MHC variants reported to be associated with MS by the first large GWAS as well as the HLA DRB1*1501 tagging SNP rs3135388. We robustly replicated three of the 57 non-MHC reported MS-associated single nucleotide polymorphisms (SNPs). In addition, our study revealed several genotype-genotype combinations with an evidently higher degree of disease association than the genotypes of the single SNPs. We further correlated well-defined clinical phenotypes, i.e. ataxia, visual impairment due to optic neuritis and paresis with single SNPs and genotype combinations, and identified several associations. The results may open new avenues for clinical implications of the MS associated genetic variants reported from large GWAS.


Introduction
Multiple sclerosis (MS) is an immune mediated, demyelinating disease of the central nervous system and is the most common non-traumatic cause for neurologic disability in young adults in the Western world [1]. While the etiology of MS remains unknown, results of multinational and multidisciplinary projects revealed genetic and epigenetic as well as environmental influences causing MS [2]. The genetic basis of MS, like other complex multifactorial diseases, has been a matter of investigation for the last four decades. Recently, enabled by decisive progress in genetic technology, large genome-wide association studies (GWAS) have allowed for more precise investigations into this matter, coming a long way from simple linkage studies [3]. In view of many questions concerning the impact of the genetic risk in MS-etiology, two statements find consensus in the field: a) there are no rare variants with large effects following Mendelian traits that are attributable to MS, and b) there are most likely a relatively large number of common genetic variants each with small effects associated with MS which moderately add to disease risk [4]. Ensuing from an overall small risk for MS in the general population, i.e. deduced from the MS prevalence of~0.001, it has been suggested that, even in a hypothetical scenario where all associated genetic variants have been identified, screening for them would not allow reliable prediction of MS [4].
A large international GWAS containing~10,000 MS patients and >17,000 controls with European backgrounds from 15 countries has helped to determine genetic variants contributing to the genetic risk of MS by mapping, in an initial investigation, 57 non-MHC susceptibility loci [5], with 48 additional loci reported after further enlargement of the cohort to 29,300 MS patients and 50,794 controls [6]. Although genotyping for susceptibility loci does not seem feasible to serve as a predictive or diagnostic tool in MS, in-depth analysis of the biological role of these variants [7,8], might prove useful both in predicting the clinical course of MS and for precision therapy.
In our current study, we genotyped 1,033 MS patients and 644 controls in an independent (from the International MS Genetics Consortium, IMSGC, and the Wellcome Trust Case Control Consortium 2, WTCCC2) cohort for the previously-reported 57 susceptibility loci [5] and the HLA DRB1 Ã 1501 tagging single nucleotide polymorphism (SNP) rs3135388. The aim of our study was to replicate previous findings, test for clinical parameters that may correlate with the genetic variations, as well as analyze whether genotype-genotype combinations and the number of risk loci present may partially explain the odds of developing MS.

Materials and Methods
Subjects DNA samples were obtained via isolation from peripheral white blood cells from 1,033 unrelated German MS patients (336 males with a mean age at blood withdrawal of 42.05±11.13 years, 694 females with a mean age at blood withdrawal of 41.33±11.70 years, three samples for which patients' sex was not confirmed). In the MS cohort, 648 patients showed relapsing remitting (RR), 229 secondary progressive (SP), 145 primary progressive (PP), and 11 clinically isolated syndrome (CIS) course according to Poser's or McDonald's criteria. Because collection of available DNA occurred over the last 15 years, not all patients had initially been stratified according to the McDonald criteria. Subsequent reevaluation revealed that, despite this, all CIS patients met the criteria established by McDonald et al., i.e. objective clinical evidence of one lesion [9]. SPMS was defined as continuous disability progression, i.e. motor dysfunction in the absence of or in conjunction with superimposed relapses for at least six months despite the use of immunomodulatory/immunosuppressive drugs in patients who had presented a RRMS disease course in the past. PPMS was defined as continuous disability progression without any history of past relapses. For some analyses, the MS cohort was stratified for disease progression into two subcategories: one containing only PPMS patients, the other comprised of RRMS, SPMS, and CIS patients.
DNA was also obtained from 644 age-matched healthy control subjects (380 males with a mean age of 43.28±12.05 years, 260 females with a mean age of 40.16±12.26 years, four persons with lacking sex data) residing in the Rhein-Ruhr and Hamburg areas (Germany). This current study was approved by the Ethics Committee of the Medical Faculty of the Ruhr-University Bochum, Germany (register number 4745-13). All patients and controls gave written consent for their participation.
Clinical and paraclinical data, i.e. MRI, were not available for all patients; however, correlations assessed for MRI and genotypes have been published separately [10]. For those patients whose data was available, retrospectively-acquired non-imaging clinical data obtained during regular routine examinations in our university outpatient clinic were assessed for our correlation study. The clinical data included paresis (n = 545), visual impairment due to optic neuritis (n = 545), and ataxia (n = 545) as evaluated by the functional scale scores for EDSS. These measures were correlated with either single SNPs or genotype-genotype combinations. Visual impairment due to optic neuritis included abnormal visually evoked potentials but not optical coherence tomography (OCT). The phenotypes paresis, visual impairment, and ataxia were utilized due to their quantitative nature, unlike other phenotypes consistently documented in clinical examinations. Furthermore, family history was consulted for regression analysis assessing familial clustering (n = 372), with familial history defined as an individual having any affected relatives in a direct relationship line over two generations.

Genotyping
Investigated SNP markers were selected based on the previously published study "Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis" [5] with the addition of the HLA DRB1 Ã 1501 tagging SNP rs3135388. Cohorts were genotyped for a total of 58 SNPs via TaqMan assays according to the manufacturer's protocol (Applied Biosystems, Life Technologies) on fast PCR cycling machines (Veriti Thermal Cycler, StepOnePlus Real-Time PCR System). Genotypes were accepted for automated quality calls exceeding 98% or after manual review.

Statistical analysis
Principle component analysis (PCA). Homogeneity of the analyzed cohorts was assessed using STRUCTURE v. 2.3.4 [11]. Under the assumption of K = 3 populations, 100,000 burn-in periods, 100,000 Markov chain Monte Carlo (MCMC) replications were included after the burn-in period. Correlations, as well as independency of allele frequencies models, among the tested populations were analyzed separately. The simulations were performed 20 times for each allele frequency model, and the mean value of the proportion of membership of each pre-defined population was calculated including its corresponding standard deviation. No significant differences were observed for the two tested cohorts as indicated by the triangle plot for K = 3 populations (see Figure A in S1 File and Table A in S1 File).
Genotyping. Single SNP-marker testing: Allele and genotype frequencies were compared by χ 2 testing. P-values were evaluated uncorrected as well as after Benjamini and Hochberg correction. Markers would have been excluded when Hardy-Weinberg equilibrium (HWE) yielded values less than 0.001; however, all genotyped markers, with exception of CYP24A1 (p = 0.000583 in the control group) passed HWE-selection criteria. For completeness, the results for CYP24A1 remain included.
Of the 58 SNPs tested, 21 (36.2%) markers share p uncorr -values of <0.05. After Benjamini and Hochberg correction for multiple testing, 4 (6.9%) markers pass a p corr -value threshold of p<0.006. Comparison of consistency for the previously-identified risk alleles reveals one deviation for the 21 (4) SNPs with p uncorr <0.05 (p corr <0.006) for ZNF746 as underlined in Table 1.
Sex effects. We stratified the cohorts with respect to sex. Allele and genotype frequencies were compared by χ 2 testing. P-values were evaluated uncorrected as well as after Benjamini Weighted genetic risk score for MS. A weighted genetic risk score (wGRS) has been previously reported using 16 established MS risk loci [12]. Here, we tested whether a higher genetic load for MS-risk factors can be identified in the MS-cohort (and sub-cohorts) in comparison to the control-cohort. Therefore calculated ORs were used as weighting factors and the amount of risk alleles as multiplicative variables (no risk allele = 0, one risk allele = 0.5, two risk alleles = 1) in order to obtain the sum above all MS-risk loci.
Only individuals with no missing genotypes for the investigated loci were included. We then compared the wGRS values of all groups using Student's t-test (N all MS = 868; N controls = 535; N RRMS+SPMS+CIS = 743; N PPMS = 125). We confirmed that a significant correlation is present between wGRS scores and MS outcome (p = 6.5 Ã 10 -28 ) with a mean wGRS of 34.169 ± 2.784 for controls in comparison to a wGRS of 35.820 ± 2.621 for the entire MS cohort (see Fig 1) for the tested 58 risk loci. Comparison of the MS sub-cohorts with the control-cohort yields significant differences (p = 4.2 Ã 10 -12 for PPMS and p = 6.5 Ã 10 -25 for RRMS+SPMS+CIS). Comparison of the MS sub-cohorts among each other revealed no significant differences with regard of wGRS and disease progression. In addition, reciprocal subtraction of the wGRS curves indicates that MS patients in general have a higher presence of wGRSs between 34.5 and 43.0 whereas controls show in general a higher frequency of wGRS between 24.0 and 34.5 (see Fig 2). χ 2 -testing for wGRS>34.5 being a risk factor reveals statistical significance for higher wGRS, with a two-tailed p-value of 2.4 Ã 10 -16 , an OR of 2.516 with a 95% C.I. of 2.003-3.160.
The wGRS was also tested for association with the age of onset and the EDSS score, for the entire as well as the sub-cohorts.
Regression analysis. Using regression analysis, we investigated whether certain genotype/ genotype combinations led to increased MS-risk when compared to corresponding single genotype contributions. We also included testing of different inheritance models (recessive, dominant and co-dominant), and thus allele combinations, in relation to MS-manifestation by designating one of the recoded genotype combinations (see Figure C in S1 File) as risk factor and consolidating the remaining eight combinations as a group for the absence of the risk factor. χ 2 -testing was performed for all nine combinations for the corresponding models (see Figure C in S1 File), comparing all MS patients vs. controls, and, in addition, the MS subcohorts (PPMS and RR+SPMS+CIS) vs. the controls. The HLA tagging SNP was excluded from genotype-genotype analysis.
Significantly-associated genotype-genotype combinations were further processed after passing a false discovery rate (FDR) control of <0.2. In order to evaluate the contribution of given genotype-genotype combination and MS risk, we further compared the obtained combination ORs (OR combined ) with the ORs of the underlying single genotypes (OR single ) present in that given combination, excluding all those where no significant difference was present between OR combined and the corresponding OR single , using following calculation: 4. p À value for the ratio z ¼ d

SEðdÞ
Genotype-genotype combinations recurring in more than one model were filtered based on best fit (p-value/OR), thus only one combination is listed for clarity. The genotype-genotype combinations passing all abovementioned criteria are listed in the Tables B-D in S1 File. In addition to the sole positive hits for each sub-cohort and model tested, we listed the results for the other cohorts and the same test category in order to estimate whether the identified genotypegenotype combination is only relevant for a MS sub cohort or the whole MS population tested.
Power analysis. Power analysis for the replication of previously reported risk loci [5] were conducted using QUANTO [13] under the assumption of a conservative recessive model and the reported ORs.

Results and Discussion
In the present study, we uniformly recruited MS patients from our outpatient university clinic. Though all patients and controls originate from a confined geographic area in Germany, principle component analyses confirmed the representative nature of the data set (see Figure A in S1 File). For the phenotype correlation analyses, we included retrospective data of patients who had been monitored for clinical and paraclinical outcome measures, i.e. MRI and disease course.
Twenty-one of 58 SNPs tested in our cohort were significantly associated with MS, with the HLA DRB1 Ã 1501 tagging SNP being the strongest risk locus and 4 of the 21 passing significance threshold after correction for multiple testing (HLA, no gene rs13192841, ZNF746, TNSF14; Table 1).
Considering the power of our data set and the resulting limited probability of~8-23% per SNP that these loci occur in independent cohorts, the confirmation of 4 (21) susceptibility loci (after and before Benjamini and Hochberg correction, respectively) underscores the robustness of the GWAS data. Important to note, the risk allele in the ZNF746 susceptibility locus in our cohort deviated from the IMSGC&WTCC2 risk allele (see Table 1), which may represent an independent effect in our cohort. Taking into account that the majority of the individual cohorts included in the GWAS were smaller than our single center cohort (nine countries included considerably smaller numbers of MS patients [5]), our results also imply that the set of recently identified MS-associated SNPs may contain false-positive common variants and/or that population-specific susceptibility loci need to be reconsidered, e.g. by larger population specific GWAS. This line of hypothesis is supported the observation that certain loci show inverse associations within different populations, i.e. rs180515 in the French (risk allele) and the Norwegian ("protective" allele) populations [5]. Association analysis after sex stratification and subsequent Benjamini and Hochberg correction showed only one sex-specific effect: ZFP36L1. Both female and male MS/sex subcategories show opposing trends for genotype distribution (see Figure B in S1 File) when compared to their corresponding counterparts, possibly explaining why no significant association for ZFP36L1 was apparent in the overall evaluation (all MS patients vs. all controls; see Table 1). However, this sex-specific result has to be considered with caution since the male to female ratio is 0.48:1 and 1.46:1 in the MS and control cohorts, respectively. Nevertheless, previous epidemiological studies have shown that the within-cohort sex distribution has no significant effect on the obtained results [14].
Assuming that the genetic risk of MS is determined by many common genetic variants, each with a small effect, it is feasible to conclude that individual MS patient genomes contain a higher load of disease associated SNPs than respective non-affected individuals in the same population [4]. This assumption has been translated into a predictive value for MS detection in a study which utilized 16 known susceptibility loci [12]. It has also previously been suggested that the accumulation of disease-associated SNPs is likely to differ in individual MS patients in multi-case families from sporadic cases, with familial cases having a higher load of associated SNPs [15]. In order to evaluate whether the SNP load in our MS cohort revealed differences from the controls, we assessed the wGRS, which considers the number and relative contribution of the risk alleles, as previously reported [12] based on the 58 susceptibility loci genotyped in our MS and control cohort. Indeed, the SNP load in MS patients [mean wGRS = 35.820 ± 2.621; p = 6.5 Ã 10 -28 ] was significantly higher than in the control group [mean wGRS = 34.169 ± 2.784], within the whole cohort (not distinguishing between familial and sporadic MS). Moreover, reciprocal subtraction of the wGRS curves indicates that MS patients in general have a higher wGRS (between 34.5 and 43.0), whereas controls show a frequency of wGRS values between 24.0 and 34.5. Estimating the probability to develop MS based on a wGRS>34.5, we were able to confirm that an increased load of risk SNP alleles clearly elevates the risk for MS (OR = 2.516, 95% C.I. of 2.003-3.160, Figs 1 and 2). Correlation of the wGRS with age of onset and EDSS score, for the entire as well as the corresponding sub-cohorts, revealed no significant associations. As discussed in prior studies, higher wGRS scores are of very limited value in MS. Yet, these findings underscore the concept of cumulative genetic risks in MS through the additive effect of tiny contributions by common genetic variants [4].
We next evaluated whether single SNP-SNP genotype combinations within the 57 non-HLA susceptibility loci, including different dominance models for any two risk-loci, exerted a higher risk for MS than the respective single risk genotypes. Risk association was analyzed for the MS patients vs. controls and for different clinical subgroups stratification (RRMS and SPMS v. PPMS). To test our hypothesis, we evaluated a total of 14,364 possible genotype combinations for each group and inheritance model (all MS; RR/SPMS and CIS were considered one group considering the clinical course of MS; PPMS; controls) by binary logistic regression analyses. To reduce statistic artifacts to a minimum, we rigorously discarded all combinations with a higher FDR of > 0.2. Our analyses revealed in total 37 genotype-genotype combinations possibly contributing greater to MS pathogenesis than their underlying single genotypes, four of which show p-values <1 Ã 10 -4 that were significantly associated either with ORs>1 or ORs<1 from either the entire MS, or RR/SPMS as depicted for the different scenarios in Fig 3. All four genotype-genotype combinations revealed higher or respectively lower ORs than the single genotypes and, of special relevance, seven loci that were not significantly associated after correction with MS in our cohort were represented in these extended genotype-genotype combinations (IL7, ARL6IP4, CXCR5, TMEM39A(CD80), TNFRSF1A, no gene rs669607, PVT1; Table 1 and Fig 3). Despite the size of this single center cohort, we could not overcome the limitation of statistical power with respect to the SNP-SNP combination analysis after sex stratification. Therefore, we cannot exclude the possibility that sex, and in this present cohort also sex composition, may reveal further SNP-SNP genotype combinations which should be addressed in future larger GWAS.
In our extended study, we correlated well-defined MS clinical phenotypes, i.e. paresis, visual impairment due to optic neuritis, and ataxia with MS associated SNPs. Here, we saw several single genetic variations that revealed significant correlation with the candidate clinical phenotypes assessed in our study (see Table E in S1 File). Interestingly, we also found an association of identified genotype-genotype combinations with these clinical phenotypes which seem to outweigh the single SNP effect as shown for the prevalence of visual impairment (as shown in Fig 4 for visual impairment and Table F in S1 File). However, due to limited available data, the clinical correlation data need to be interpreted with caution with respect to statistical robustness. At most, the present study may indicate the possibility how GWAS can be applied to the clinical context. incl. 95% C.I.) for the significantly associated genotype-genotype combinations, including the contributing single genotype ORs. Lead genotypegenotype combinations have been listed and colored for ORs with a p-value <1*10 -4 , red = OR>1, green = OR<1. ORs have been listed for all tested cohorts/disease courses (All MS; RRMS, SPMS and CIS; PPMS) in order to distinguish disease course specificity. The contributing genotype is stated in brackets behind the corresponding heading gene name. On the right handed side, the percentage distribution of each genotype-genotype combination in the given cohort is presented as a bar chart. The significantly associated genotype-genotype combination from the left handed graphic is highlighted via frame and colored arrow; red = OR>1, green = OR<1. A, B, C: correspond to significantly associated genotype-genotype combinations obtained from models A, B and C. Our replication data of associated SNPs identified by the latest GWAS suggest that larger, population-specific GWASs are needed to identify all potentially population-specific relevant MS-associated genetic variants. While this paradigm opposes the notion that MS shares the same causal pathogenetic features among different populations, it is important to remember that all individual MS-risk alleles confer a small contribution to the greater disease course. The interaction of (risk) genes among each other [16], as shown in other autoimmune disorders such as lupus erythematosus [17], and with environmental risk factors, as recently demonstrated for esophageal malignancies [18] and previously for MS [19], will most likely differ within populations with diverse genetic backgrounds and lead to innumerable complex combinations that predispose for MS. For instance, it seems rather unlikely that Vitamin D deficiency, a suspected environmental risk factor for MS, has the same risk effect among populations situated in distant-vs. near-equatorial locations. Based on recently-identified risk loci, our data further imply that genotype-genotype combinations of MS associated SNPs reveal higher risks for MS than corresponding single SNPs (Fig 3 and Tables B-D in S1 File), bearing in mind that the analyzed SNPs may just tag the MS-relevant variations. This observation is consistent with previously proposed gene-protein pathway analyses based on mathematical models [20]. These results require confirmation in large independent replication studies, and, importantly, they require functional correlates before assumptions about pathogenetic gene-gene combination effects can be truly made. In view of additional MS associated risk loci expected to be reported soon, it is vital to elaborate further on how these loci contribute to disease.
Supporting Information S1 File.   incl. 95% C.I.) for the significantly with visual impairment associated genotype-genotype combination, including the contributing single genotype ORs. The lead genotype-genotype combination has been listed and colored in red for OR>1. The contributing genotype is stated in brackets behind the corresponding heading gene name. On the right handed side the percentage distribution of each genotype-genotype combination in the given cohort is presented as a bar chart. The significantly associated genotype-genotype combination from the left handed graphic is highlighted via frame and colored arrow; red = OR>1.