The Nature of Genetic Susceptibility to Multiple Sclerosis

OBJECTIVE To explore the nature of MS-susceptibility and, by extension, that of other complex-genetic diseases.
BACKGROUND Basic-epidemiological parameters of MS (e.g., prevalence, recurrence-risks for siblings and twins, time-dependent changes in sex-ratio, etc.) are well-established. Moreover, >200 genetic-loci are unequivocally MS-associated, especially the HLA-DRB1*15:01~HLA-DQB1*06:02~a1 haplotype.
DESIGN/METHODS We define the “genetically-susceptible” subset-(G) to include everyone with any non-zero life-time chance of developing MS. We analyze, mathematically, the implications that these epidemiological observations have regarding genetic susceptibility. In addition, we use the sex-ratio change (observed over a 35-year interval) to derive the relationship between MS-probability and an increasing likelihood of a suitable environmental-exposure.
RESULTS We demonstrate that genetic-susceptibility is restricted to less than 4.7% of populations across Europe and North America. Among carriers of the HLA-DRB1*15:01~HLA-DQB1*06:02~a1 haplotype, fewer than 20% are even in the subset-(G). Women are less likely to be susceptible than men although their MS-penetrance is considerably greater. Response-curves for MS-probability increase with an increasing likelihood of a suitable environmental-exposure, especially among women. These environmental response-curves plateau at under 50% for women and at a significantly lower level for men.
CONCLUSIONS MS is fundamentally a genetic disorder. Despite this, a suitable environmental-exposure is also critical for disease-pathogenesis. Genetic-susceptibility requires specific combinations of non-additive genetic risk-factors. For example, the HLA-DRB1*15:01~HLA-DQB1*06:02~a1 haplotype, by itself, poses no MS-risk. Moreover, the fact that environmental-response-curves plateau below 50% indicates that disease-pathogenesis is partly stochastic. By extension, other diseases for which monozygotic-twin recurrence-risks greatly exceed disease-prevalence (e.g., rheumatoid arthritis, diabetes, and celiac disease) must have a similar genetic basis.

Author Summary
We define a “genetically-susceptible” subset (G) of the general population (Z) to include everyone with any non-zero chance of developing MS over their life-time. Using well-established epidemiological data from across Europe and North America, we establish that genetic-susceptibility is confined to less than 4.7% of these populations. Thus, the large majority of individuals have no chance whatsoever of developing MS, irrespective of any environmental conditions that they may experience during their lifetimes. In this sense, MS is fundamentally a genetic disorder. Indeed, more than 200 genetic-loci, in multiple genomic locations, are now well-established as being associated with MS. Notably, however, the HLA-DRB1*15:01~HLA-DQB1*06:02~a1 or (H+) haplotype, which has, by far, the strongest MS-association of any, has a carrier frequency of 23% in North America and Europe. Therefore, with genetic susceptibility in the population being less than 4.7%, more than 80% of (H+)-haplotype carriers must not be genetically-susceptible and, thus, have no chance of developing MS. In this circumstance, genetic susceptibility to MS must arise from a combination of this haplotype with “susceptible states” at other genetic loci. By itself, the (H+)-haplotype poses no risk. Indeed, genetic-susceptibility, generally, seems to require specific combinations of non-additive genetic risk-factors. Naturally, the conclusion that MS is fundamentally genetic does not preclude the possibility that environmental events are also critical to disease-pathogenesis.
We use epidemiological data about the world-wide increase in the (F:M) sex-ratio for MS to construct (for men and women separately) the response curves relating an increasing likelihood of MS to an increasing likelihood of a sufficient environmental exposure (i.e., an exposure sufficient to cause MS in a susceptible individual). This analysis provides insight into both disease-susceptibility and disease-pathogenesis. First, men are more likely to be susceptible than women although susceptible women are considerably more likely to actually develop MS. Second, men seem to have a lower environmental threshold than women for developing MS. Nevertheless, women are more responsive than men to changes in environmental conditions. Third, even with a maximal environmental exposure, susceptible women never exceed a 50% chance of developing MS. By contrast, susceptible men have a significantly lower likelihood (<10% chance) of developing MS. This indicates that stochastic factors must also be critical in disease pathogenesis. Finally, the nature of genetic susceptibility developed here for MS is applicable to many other complex genetic disorders. Indeed, for any disease in which the proband-wise MZ-twin concordance rate greatly exceeds the disease-prevalence in the population (e.g., type I diabetes, rheumatoid arthritis, and celiac disease), only a small fraction of the population can possibly be genetically susceptible, as defined.

Despite the undoubted influence of genetic and environmental factors in MS-pathogenesis, susceptibility to MS might be envisioned in a number of different ways. Four examples of disease states whose pathophysiology is generally understood help to highlight some of the issues that might also be involved in MS pathogenesis.
First, sickle cell disease (SCD) occurs in ~3% of individuals in certain sub-Saharan regions of Africa [25]. All affected individuals are homozygous for the HbS mutation of the hemoglobin gene.
Despite the fact that the clinical expression of SCD can be influenced by environmental factors such as strenuous exercise, high-altitude, infection, and dehydration, SCD is fundamentally a genetic disorder.
Second, each year, 5−20% of the population in North America gets the flu [25]. Although the genetic make-up might make one person more or less susceptible to a particular year's variant, presumably, everyone could develop the flu if they had a sufficient exposure to the influenza virus.
Therefore, despite the possible genetic differences in susceptibility, the flu is fundamentally an environmental (infectious) disease.
Third, the life-time probability of breast cancer in the US is ~12.5% in women and ~0.1% in men.
Individuals (especially women) who carry the BRCA1 or BRCA2 mutations (<1% of the population) have 4-7 times the risk of the general population [25]. Nevertheless, presumably, there is a baseline risk of breast cancer such that no one is completely risk-free. Although the genetic make-up (including gender) influences the baseline risk and the environment likely affects the penetrance of the BRCA mutations, some breast cancer cases are fundamentally genetic and others are fundamentally environmental (of unclear type, but possibly due to exposures such as toxins, radiation, pregnancy, or other occurrences).
Fourth, the human immunodeficiency virus (HIV) can infect anyone in the population although individuals who engage in certain high-risk behaviors (e.g., having unprotected anal-receptive sex or using IV drugs and sharing needles) are particularly susceptible [25]. Among persons of northern European extraction, ~1% are homozygous for the Δ-32 mutation of the CCR5 gene and are almost completely resistant to HIV. Consequently, HIV infection is fundamentally an environmental disorder (infectious) with an interaction between two environmental factors (i.e., the virus and specific high-risk behaviors). However, certain genetic traits (e.g., the Δ-32 mutation) can be decisive in determining the degree of susceptibility.
Whether susceptibility to MS resembles any of these disease-states (or some other) is unknown although its polygenic nature is certain [5][6][7][8][9][10][11][12][13][14]. Nevertheless, several recent epidemiological observations in MS bear directly on the different possibilities. In this paper, we utilize directly observable, and well-established, "population parameters" (e.g., twin concordance rates, the percent women among MS patients, the population prevalence of MS, time-dependent changes in the gender-ratio, etc.) to logically infer the values of other non-observable parameters of interest (e.g., the population probability of being genetically susceptible, the percentage of susceptible individuals who are women, the likelihood that a susceptible individual receives a sufficient environmental exposure, etc.).

Methods
For the purpose of this analysis we define, explicitly, five general terms (Table 1). The first term is {P(MS)}, which represents the expected life-time probability that a random individual from the general population (Z) will develop MS. As discussed below, this parameter is related to, but not the same as, the population prevalence.
The second term is {P(G)}, which represents the expected probability that a random individual from (Z) is also a member of the subset-(G). In turn, we define the subset-(G) to include everyone who has any non-zero chance of developing MS (i.e., regardless of how small that risk might be). We also define the set {X} to be the set of penetrance values for members of the subset-(G). In our analysis, we will only consider the possibility that set {X} has either a unimodal or a bimodal distribution. We will discount the possibility that {X} has a more extreme distribution such as one that is trimodal or multimodal in nature.
The third term, {P(E)}, represents the expected probability that a member of the (G)-subset will experience an environmental exposure sufficient to cause MS, given the prevailing environmental conditions. [...] given the fact that their co-sibling either has or will develop MS.
Lastly, the term {P(MS│IGMS)} represents the adjusted proband-wise concordance rate for MZ-twins. Such an adjustment may be necessary because concordant MZ-twins, in addition to sharing their identical genotypes (IG), also share similar intrauterine (IU) and early post-natal environments. Thus, it is possible that these shared early environmental experiences of twins might significantly impact the likelihood of their developing MS in the future. One method to estimate the adjustment necessary in such a circumstance is to consider the difference in concordance rates between non-twin siblings and fraternal twins (i.e., siblings who share the same genetic relationship but who are divergent in their IU and early post-natal experiences). Although epidemiological studies have differed somewhat with regard to the magnitude of any such differences [27][28][29][30][31][32][33][34], population-based studies out of Canada suggest that the impact of these early environmental events may be substantial [29]. As demonstrated in the Supplemental Material, we can use the observed recurrence-rate data (Table 2) to make this adjustment such that:
From these definitions and relationships, we can use well-established values for the different population parameters to logically deduce the value of another, non-observable, parameter {P(MS│G)}, which represents the conditional life-time probability of getting MS for a member of the (G)-subset. This term is referred to as the expected penetrance for the (G)-subset.
We note that, from the definition of the (G)-subset, everyone who actually develops MS during their life-time must belong to this subset. Therefore, the joint probability {P(MS, G)} must be the same as {P(MS)}:
P(MS, G) = P(MS) and, analogously: P(MZMS, G) = P(MZMS)
so that, by definition:
P(MS│G) = P(MS, G) / P(G) = P(MS) / P(G)
By contrast, if P(G) < 1, this indicates that only certain individuals can possibly get the disease (e.g., SCD) and that, therefore, MS must be fundamentally a genetic disorder (i.e., unless a person has the correct genetic make-up, they have no chance, whatsoever, of getting the disease, regardless of their environmental exposure). Naturally, also, such a conclusion would have no bearing on whether disease pathogenesis also requires the co-occurrence of specific environmental events. Also, in this circumstance, how we might characterize the nature of genetic susceptibility would depend upon the degree to which P(G) was less than unity and upon the magnitude of the disparity between any so-called "high" and "low" penetrance subgroups. For example, in HIV, if homozygous Δ-32 mutations were completely protective, then: P(G) ≈ 0.99. In this circumstance, however, we would likely characterize the disease as being fundamentally environmental and the homozygous Δ-32 mutations as being protective rather than characterizing every other genotype as being "susceptible". By contrast, in SCD, where: P(G) ≈ 0.03, we would characterize carrying homozygous HbS mutations as the defining trait for membership in the "genetically susceptible" subset. Even if it were possible, in extremely rare circumstances, for an individual to develop SCD in the absence of homozygous HbS mutations, we would still consider this disease to be fundamentally genetic.

Argument:
One possible estimate of P(MS) could be the prevalence of MS in a population.
However, because the clinical onset of MS occurs largely between the ages of 15 and 45 years (e.g., Fig 1), the measured cross-sectional prevalence of MS (using the entire population as the denominator) will necessarily include individuals with different likelihoods of having already developed MS [36]. For example, using the 2010 United States census data (for the total resident population; see Fig 2) as an approximation, we can divide the general population (Z) into the three mutually exclusive age-bands (A1, A2, and A3), such that:
Because so few MS patients have their disease onset prior to the age of 15 years (e.g., Fig 1), it seems a reasonable approximation that:
By contrast, as noted above, the age group (15-45 years) accounts for the large majority of clinical onsets, which have a roughly symmetrical distribution with a mean of 28.3 years (Fig 1). If the distribution were exactly symmetrical and centered on 30 years, the measured prevalence in this age band would be ~50% of the value of P(MS). Therefore, it seems reasonable to estimate:
For the older age band (>45 years) most patients will have already developed the disease (Fig 1).
Thus, on the one hand, one might expect the measured prevalence in this age-band to be equal to P(MS). On the other hand, there is a small but definite excess mortality in MS such that life expectancy is reduced in MS-patients by about 5-10 years [37][38][39][40][41]. This will make the estimate too small by some amount. However, it seems unlikely for this reduction to be more than 25%. Thus, a range of plausible estimates is likely to be:
Combining these three different estimates yields the estimate:
Defining the measured prevalence in the population as (prev), this estimate translates to:
A second method to estimate P(MS) would be to use a measured prevalence for MS that is restricted to the age-band of 45-54 years. Within this age-band, almost all patients will have already experienced their clinical-onset and only a few will have experienced their (expected) excess mortality. Consequently, by this method:
A third method would be to use population-based death data and to consider the percentage of death certificates that mention the diagnosis of MS (not necessarily as, but including, the immediate,
Such an estimate is consistent with many other published studies in northern populations, which generally find the prevalence of MS to be 100-200 per 100,000 population [42].
Similarly, in a Swedish study by Sundström and co-workers [43], the age-specific prevalence of MS in the 45-54 year age-band was reported to be 304 per 100,000 population.
And, finally, in a recent population-based multiple-cause-death study from British Columbia [44], a diagnosis of MS was mentioned on 0.28% of the death certificates.
Thus, all three of these methods of estimation are quite consistent with each other and support the conclusion that, in the northern populations of Europe and the Americas:
However, despite the notable consistency of these three estimates, each of these methods relates only to "diagnosed" MS in the general population (Z). If undiagnosed (i.e., pathological) MS is included in the calculation [45][46][47][48], this estimate may increase by as much as 50-100% (see #8 below).
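The age-band reasoning above can be sketched numerically. The following is a minimal illustration rather than the calculation actually performed: the population shares assigned to the three age-bands are hypothetical placeholders (the census-derived values are not reproduced here), and the function name is introduced only for this example. The onset fractions (about 0 for A1, about 0.5 for A2, and 0.75 to 1.0 for A3) and the two observed figures (304 per 100,000 and 0.28%) are taken from the text.

```python
# Illustrative sketch: relate cross-sectional prevalence to the lifetime risk P(MS).
# The age-band population shares are HYPOTHETICAL placeholders, not census figures.

def prevalence_to_lifetime_ratio(shares, onset_fractions):
    """Return prev / P(MS): the population-weighted fraction of eventual
    cases that would be counted in a cross-sectional prevalence survey."""
    assert abs(sum(shares) - 1.0) < 1e-9, "age-band shares must sum to 1"
    return sum(s * f for s, f in zip(shares, onset_fractions))

# Age-bands A1 (<15 y), A2 (15-45 y), A3 (>45 y); shares are assumed values.
shares = (0.20, 0.40, 0.40)

# Fractions from the text: ~0 for A1, ~0.5 for A2, 0.75 to 1.0 for A3.
low = prevalence_to_lifetime_ratio(shares, (0.0, 0.5, 0.75))
high = prevalence_to_lifetime_ratio(shares, (0.0, 0.5, 1.00))

# Two observed figures quoted in the text, expressed as probabilities.
prev_45_54 = 304 / 100_000   # Swedish age-specific prevalence [43]
death_cert = 0.28 / 100      # British Columbia death certificates [44]

print(f"prev / P(MS) lies between {low:.2f} and {high:.2f}")
print(f"P(MS) from the 45-54 year prevalence: {prev_45_54:.5f}")
print(f"P(MS) from death certificates:        {death_cert:.5f}")
```

With these placeholder shares, a whole-population prevalence would capture roughly half to three-fifths of P(MS); different shares would shift that range, which is why the second and third methods provide a useful cross-check.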

Adjusting for the Shared Early Environment of MZ-twins -P(MS│IGMS)
Conclusion:

Argument:
Most epidemiological studies in northern populations report the proband-wise concordance rate for MZ-twins to be in the range of 25-30% [27][28][29][30][31][32][33][34][35]. Using the population data out of Canada (Table 2) leads to the estimate of:
Suppose that each of the individuals within the general population (Z) has a unique genotype (Gk) such that: We can then define the term (IGMS) such that: where: Therefore: This is just the expected "adjusted" penetrance for the (MZMS) subset. As discussed earlier and, as developed in the Supplemental Material, can be estimated from the difference in proband-wise concordance rates between siblings and fraternal twins. Using the Canadian population-based data (Table 2) on the recurrence risks in non-twin siblings and DZ-twins (concordance rates for siblings=2.9%; concordance rates for DZ-twins=5.4%) to make this adjustment (see above) leads to the estimate of:
Also, we can partition the subset (G) into two mutually exclusive sub-subsets, (G1) and (G2), such that the sub-subset (G1) has a penetrance as great or greater than that of the sub-subset (G2). Again, for ease of notation, we further define the quantities such that: and:
In earlier iterations of this analysis [3,4,49,50], we defined the subset-(G) differently -i.e., . We have chosen the current definition because it considerably simplifies the biological interpretation of the findings. Nevertheless, we note that, when , the new (G1)-subset is, effectively, identical to the (G)-subset defined earlier (see below).
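To give a sense of the size of the twin adjustment discussed above, the sketch below scales the MZ-twin concordance by the ratio of the sibling to the DZ-twin concordance. This proportional scaling is an assumption made purely for illustration; the actual derivation is in the Supplemental Material and may differ. The function name is hypothetical; the sibling and DZ-twin rates (2.9% and 5.4%) and the 25-30% MZ range are those quoted in the text.

```python
def adjusted_mz_concordance(mz_rate, sibling_rate, dz_rate):
    """Scale the MZ-twin concordance by the sibling-to-DZ-twin ratio.

    ASSUMPTION: proportional scaling is an illustrative stand-in for the
    adjustment derived in the Supplemental Material.
    """
    if not 0 < sibling_rate <= dz_rate:
        raise ValueError("expected 0 < sibling_rate <= dz_rate")
    return mz_rate * sibling_rate / dz_rate

SIBLING_RATE = 0.029   # non-twin siblings, as quoted in the text
DZ_RATE = 0.054        # DZ-twins, as quoted in the text

for mz_rate in (0.25, 0.30):   # reported range for MZ-twins
    adjusted = adjusted_mz_concordance(mz_rate, SIBLING_RATE, DZ_RATE)
    print(f"MZ concordance {mz_rate:.2f} -> adjusted {adjusted:.3f}")
```

Under this assumed scaling, the shared early environment would account for nearly half of the observed MZ-twin concordance, which illustrates why the adjustment matters for every downstream estimate.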

In turn, the term
can be re-written as: Combining these two Equations (i.e., 1 & 2 above) yields: However: Where: Consequently: and, with rearrangement: Notably, this equation can also be rearranged to yield a quadratic in (x) of: In turn, this quadratic equation can be solved to yield: which has real, non-negative, solutions only for: The maximum variance for any distribution [52,53] on the closed interval is: Consequently, the maximum variance from the quadratic is the same as that for the interval , which is: In addition, this maximum variance, , occurs when the distribution of individual penetrance values in the set {X} is bimodal [52,53], such that half of the (G)-subset has a penetrance of and the other half has a penetrance of . At this point, for each of the two quadratic solutions: From this point the variance of the (G)-subset decreases both when: (the upper solution) and when: (the lower solution). By definition, any solution requiring for any portion of (G) is excluded.
Therefore, the upper solution becomes: And the lower solution becomes: Moreover:
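The structure of the two quadratic solutions can be explored with a short sketch. It rests on assumptions that should be read as such: (x) is taken to be the expected penetrance P(MS│G), (x') the adjusted MZ-twin concordance P(MS│IGMS), and the quadratic is taken to be x² − x'·x + σ² = 0, which follows if two individuals with the same genotype develop MS independently given their shared penetrance (so that x' = E[X²]/E[X] = x + σ²/x). The numerical value of x' is hypothetical.

```python
import math

def penetrance_solutions(x_prime, variance):
    """Solve x**2 - x_prime*x + variance = 0 for the mean penetrance x.

    Returns (upper, lower). ASSUMED form: it follows from
    x_prime = E[X**2] / E[X] = x + variance / x.
    """
    discriminant = x_prime ** 2 - 4.0 * variance
    if discriminant < 0:
        raise ValueError("variance exceeds x_prime**2 / 4; no real solution")
    root = math.sqrt(discriminant)
    return (x_prime + root) / 2.0, (x_prime - root) / 2.0

def two_point_variance(low, high):
    """Variance when half of the mass sits at `low` and half at `high`."""
    mean = (low + high) / 2.0
    return 0.5 * (low - mean) ** 2 + 0.5 * (high - mean) ** 2

X_PRIME = 0.15                      # HYPOTHETICAL adjusted MZ concordance
max_variance = X_PRIME ** 2 / 4.0   # largest variance with a real solution

print("two-point variance on [0, x']:", two_point_variance(0.0, X_PRIME))
print("bound x'^2 / 4:               ", max_variance)

for fraction in (0.0, 0.5, 0.9):
    upper, lower = penetrance_solutions(X_PRIME, fraction * max_variance)
    print(f"variance = {fraction:.1f} * max: upper = {upper:.4f}, lower = {lower:.4f}")
```

The symmetric two-point distribution on the interval from 0 to x' attains the bound of x'²/4, and as the variance shrinks the upper solution moves toward x' while the lower solution moves toward zero, in line with the description of the two branches.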

The Upper Solution
The upper solution, as: , represents the gradual transition from a bimodal distribution to a unimodal distribution and, ultimately, to a distribution, in which every genotype in (G) has exactly the same penetrance . As noted earlier (above), the upper solution requires that: Also, as demonstrated by others [52], the maximum variance of any unimodal distribution on the closed interval is: . Considering a unimodal distribution in the interval , therefore: Substituting this limit into the upper quadratic solution (above) -assuming this limit applies equally to the set {X} -yields: Consequently, for a unimodal distribution:
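The variance limit for unimodal distributions can be checked numerically. The sketch below assumes that the bound cited from [52] is the standard result that a unimodal distribution on a closed interval of width w has variance of at most w²/9, and that the relevant interval runs from 0 to (x'); both are assumptions about equations not reproduced here. It evaluates a family of unimodal mixtures (a point mass at the lower end plus a uniform component) and then converts the bound into a floor for the upper solution, using the assumed quadratic x² − x'·x + σ² = 0.

```python
import math

def point_plus_uniform_variance(weight_point, a=0.0, b=1.0):
    """Variance of a mixture: a point mass at `a` (weight `weight_point`)
    plus a Uniform[a, b] component (weight 1 - weight_point)."""
    w_uniform = 1.0 - weight_point
    uniform_mean = (a + b) / 2.0
    uniform_second = (a * a + a * b + b * b) / 3.0   # E[U**2] for Uniform[a, b]
    mean = weight_point * a + w_uniform * uniform_mean
    second = weight_point * a * a + w_uniform * uniform_second
    return second - mean ** 2

# Scan the mixing weight on a grid and keep the largest variance found.
weights = [i / 1000 for i in range(1001)]
best_weight = max(weights, key=point_plus_uniform_variance)
best_variance = point_plus_uniform_variance(best_weight)

print(f"largest variance on [0, 1]: {best_variance:.6f} at point weight {best_weight:.3f}")
print(f"assumed unimodal bound 1/9: {1 / 9:.6f}")

# Floor for x / x' on the upper branch if variance <= x'**2 / 9 (ASSUMED interval [0, x']).
unimodal_floor = (1.0 + math.sqrt(1.0 - 4.0 / 9.0)) / 2.0
print(f"upper solution floor: x >= {unimodal_floor:.4f} * x'")
```

Under these assumptions, a unimodal distribution confines the expected penetrance to a narrow band just below (x'), which is what makes the upper solution so restrictive.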

The Lower Solution
By contrast, the lower solution as: , represents an increasingly asymmetric bimodal distribution of penetrance values within the (G)-subset where: . As noted above, for this lower solution to apply, it is required that: Notably, the value of represents an observed population parameter and is, therefore, "fixed", allowing for the possibility of error in its observation. Even with this constraint, however, some Lower Solution distributions are possible, which meet both of these two requirements and, for which, (see #4, below).

Breast Cancer
As an example, it is instructive to apply this same analysis to the risk in women of developing breast cancer (described briefly in the Introduction). Clearly, this distribution is bimodal with <1% of women possessing the BRCA mutations, and with these individuals having 4-7 times the breast-cancer risk of everyone else. For this analysis, we assume that the subsets of women with (G1) and without (G2) BRCA mutations have a uniform penetrance within each subset. We will also use parameter values that conform to the known epidemiology of breast cancer in women (BC) such that: Under these conditions, and in all circumstances, it is the case that: Although, unlike MS, we don't have "observational" estimates for the adjusted MZ-twin recurrence risk (x'), these circumstances for breast cancer, clearly, conform to the upper solution of the quadratic equation (above). For example, if this recurrence risk were (~15%) then: and: . In this case, the fact that the distribution is bimodal is confirmed by the fact that the value of (x) is below the lower limit for a unimodal distribution (see above). By contrast, if all breast cancers are, to some degree, purely genetic disorders -{i.e., if: } -then, as P(G) decreases, the value of (x) will increase. Nevertheless, the bimodality of the distribution will still be evident down to . Below this point, however, the bimodal nature of the distribution will no longer be distinguishable (purely by consideration of the variance) from a unimodal distribution. Regardless, however, using these parameter values, the distribution would not actually become unimodal until the point at which: .
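The breast-cancer example can be made concrete with a two-subset calculation. This is a sketch under stated assumptions, not the parameterization used in the analysis: every woman is taken to be at some risk (P(G) = 1), the carrier fraction is set to 0.01 (the text says only that it is below 1%), penetrance is uniform within each subset, and the implied recurrence risk uses the assumed relation x' = x + σ²/x. The life-time risk of 12.5% and the 4-7 fold relative risk are taken from the text; the function name is hypothetical.

```python
def two_group_summary(carrier_fraction, relative_risk, overall_risk):
    """Penetrances, variance, and implied identical-genotype recurrence risk
    for a population split into carriers and non-carriers.

    ASSUMPTIONS: uniform penetrance within each group, and
    recurrence = mean + variance / mean.
    """
    baseline = overall_risk / (carrier_fraction * relative_risk
                               + (1.0 - carrier_fraction))
    carrier = relative_risk * baseline
    variance = (carrier_fraction * (1.0 - carrier_fraction)
                * (carrier - baseline) ** 2)
    implied_recurrence = overall_risk + variance / overall_risk
    return baseline, carrier, variance, implied_recurrence

OVERALL_RISK = 0.125       # life-time risk in women, from the text
CARRIER_FRACTION = 0.01    # ASSUMED; the text gives only "<1%"

for relative_risk in (4.0, 7.0):   # range quoted in the text
    baseline, carrier, variance, recurrence = two_group_summary(
        CARRIER_FRACTION, relative_risk, OVERALL_RISK)
    print(f"RR = {relative_risk:.0f}: non-carriers = {baseline:.3f}, "
          f"carriers = {carrier:.3f}, variance = {variance:.5f}, "
          f"implied recurrence = {recurrence:.3f}")
```

Because carriers are rare, the population mean stays close to the non-carrier penetrance, and the variance contributed by the small high-risk subset raises the implied recurrence risk only modestly above the life-time risk.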

Argument:
From the Upper Solution in Proposition #1 and in conjunction with our estimate for , it follows directly, that: Using the relationship developed in the Methods of: With this we have all the data necessary to establish the limits for the percentage of the population who are members of the (G)-subset. Thus, using this range for P(MS│G), together with our estimate for P(MS) -see #1 above -it follows that: which seems to be independent of latitude (Table 3).
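The final step of this argument is a single division, P(G) = P(MS) / P(MS│G), which can be sketched as follows. The inputs are illustrative only: the P(MS) values of 0.3% and 0.6% reflect the roughly 0.3% figures quoted earlier and a doubling for undiagnosed disease, while the penetrance values of 0.13 and 0.15 are hypothetical stand-ins for the range derived from the Upper Solution.

```python
def susceptible_fraction(p_ms, penetrance):
    """P(G) = P(MS) / P(MS|G), from the definition of the (G)-subset."""
    if not 0 < penetrance <= 1:
        raise ValueError("penetrance must lie in (0, 1]")
    if not 0 <= p_ms <= penetrance:
        raise ValueError("P(MS) cannot exceed the penetrance of (G)")
    return p_ms / penetrance

P_MS_VALUES = (0.003, 0.006)       # diagnosed only; doubled for undiagnosed MS
PENETRANCE_VALUES = (0.13, 0.15)   # HYPOTHETICAL values of P(MS|G)

for p_ms in P_MS_VALUES:
    for penetrance in PENETRANCE_VALUES:
        p_g = susceptible_fraction(p_ms, penetrance)
        print(f"P(MS) = {p_ms:.3f}, P(MS|G) = {penetrance:.2f} -> P(G) = {p_g:.3f}")
```

The point of the exercise is the sensitivity: because P(MS) is so much smaller than any plausible penetrance, P(G) stays at a few percent across the whole grid of inputs.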
Notably, we arrived at this estimate for by adjusting the observed value of downward to account for the presumed impact of the shared IU and early post-natal environments of MZ-Twins (see #2 above). To do this, we estimated this impact from the increased recurrence risks in DZ-twins compared to that in non-twin siblings (see Supplemental Material).
Although, the Canadian data suggests a larger discrepancy between {P(MS│DZMS ) and P(MS│SMS )} compared to other studies [27][28][29][30][31][32][33][34][35], it is still possible that our adjustment is too small. Even so, there is a limit to how large any adjustment can be. Thus, from Table 2, it must be the case that: Otherwise, there would be no increased risk of MS in persons who both have 100% of their genes in common and don't share their IU and early post-natal environments compared to persons who both have only 50% of their genes in common and also don't share their IU and early post-natal environments.
Importantly, however, even with such an extreme adjustment: Therefore, even using this extreme estimate, the large majority of the population (>79%) would have no chance of getting MS, regardless of their environmental exposures (see Proposition #1).
Nevertheless, these considerations pertain only to the Upper Solution and, as discussed in #5 (below), the observations from Canada regarding recurrence risks for the gender partition in MS make it clear that the set {X} is bimodal and, moreover, that it conforms to the Lower Solution. This circumstance will increase the upper limit for genetic susceptibility to MS from the 4.5% estimated here. Nevertheless, even in this circumstance, there are constraints on possible solutions such that: and: Consequently, although Lower Solutions exist for which, , none of these simultaneously match the constraints placed on the observed values of for the partitions based on gender or HLA-status (see #5 & #6, below). We conclude, therefore, that the circumstance of is excluded and that, as a result, developing MS is not a possibility for some portion of the population. Moreover, in earlier iterations of this analysis [3,4,49,50], we defined the subset-(G) differently -i.e., as . It is noteworthy that, in the present analysis, using our new definitions, our older definition (in the circumstance of the Lower Solution) corresponds to defining only members of the (G1)-subset as being "genetically susceptible" to MS (see Supplemental Material).

Argument:
For ease of notation, we will use the parameters already defined in Proposition #1 (above Because we are only considering the possibility of either unimodal or bimodal distributions for the set {X}, therefore, the sets (G1) and (G2), considered separately, must each be unimodal and, thus, meet conditions for the Upper Solution. Using logic identical to that of Proposition #1 for the (M / F) partition, considering susceptible women and susceptible men , using the estimated adjustments for the similar early environment of twins for these two subgroups (see Supplemental Material), and using the data provided in Table 2, it follows that:

(Equation #1b)
The proportion of MS patients who are women from using the smallest (i.e., the most conservative) of these estimated gender imbalances, using the above ranges for men and women, and from the definition of the (G)-subset, we can estimate that: Because: Therefore: And similarly: These possible ranges for men and women don't overlap. Therefore, at a minimum, the excess of men in the (G)-subset must be: where: so that: In fact, this gender imbalance is likely even greater than this (see
Third, it is not possible for the variance of penetrance values for the subset to be at its maximum value. Thus, because , then the maximum variance for the -subset -exceeds the maximum total variance possible for the entire (G)-subset -. Consequently, the lower limit for the value of (x1) in Equation #1a -i.e., at its maximum possible variance -must be too low.
And fourth, some of the maximum possible variance in the {X} set must be accounted for just by the separation of from -see Supplemental Material. Thus, following the standard development of variance relationships [55], and taking each of these factors into account, leads to the conclusion that: and that:
The estimate derived from Table 2 for the quantity , because it is based on only two observations, seems likely to be the least reliable of any in the Table. However, even if this estimated penetrance were doubled, there would still be an excess of men in the (G)-subset such that: and also: Consequently, not only is genetic susceptibility extremely rare in the population, but also men are more likely than women to be genetically susceptible to MS.
At first pass, it might seem biologically improbable that men would be two to four times more likely than women to be in the genetically susceptible subset-(G). Thus, if membership in the (G)-subset is envisioned as being due to an individual possessing a sufficient combination of some number of loci in a "susceptible state" [56], it is unclear how men could be more likely than women (or vice versa) to possess certain combinations and not others. This seems especially unlikely given that one association study, specifically focused on the X-chromosome, failed to identify any susceptibility loci on this chromosome [7], that another large GWAS found that all but one of the ~200 MS-associated loci were located on autosomal chromosomes [14], and that no major gender interaction term has been reported in the literature. Indeed, considering the different "risk" haplotypes in the HLA region identified in the WTCCC, men and women seem equally likely to be carriers [57]. Nevertheless, we can designate (Gak) to represent each of the (n) autosomal genotypes in the general population (Z) − i.e., omitting any specification of gender.
In this circumstance, it is entirely possible that: and, yet, for some specific autosomal genotypes to have the characteristic that: Indeed, such an explanation for the excess in susceptible men would fit well with the observation that the specific genetic combinations, which underlie susceptibility to MS, seem to be unique to each individual (see #9, below; see also Supplemental Material). In addition, such a circumstance might also help to rationalize the finding that men likely have a lower threshold of environmental exposure for developing MS compared to women (see #7, below).
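The way in which a higher penetrance in women can coexist with an excess of men in the (G)-subset follows from Bayes' rule, and can be sketched as below. The odds of being a woman within (G) equal the (F:M) sex-ratio among patients multiplied by the ratio of male to female penetrance. The sex-ratio of 3.2 is the Canadian figure quoted elsewhere in the text; the two penetrance values are hypothetical and are not the estimates derived from Table 2.

```python
def women_share_of_susceptible(sex_ratio_fm, penetrance_women, penetrance_men):
    """P(F|G) from the F:M ratio among patients and the sex-specific penetrances.

    Bayes' rule gives: odds(F|G) = (F:M among patients) * (pen_men / pen_women).
    """
    if min(sex_ratio_fm, penetrance_women, penetrance_men) <= 0:
        raise ValueError("all inputs must be positive")
    odds = sex_ratio_fm * penetrance_men / penetrance_women
    return odds / (1.0 + odds)

SEX_RATIO_FM = 3.2        # F:M ratio among patients (1976-1980), from the text
PENETRANCE_WOMEN = 0.30   # HYPOTHETICAL
PENETRANCE_MEN = 0.05     # HYPOTHETICAL

share_women = women_share_of_susceptible(
    SEX_RATIO_FM, PENETRANCE_WOMEN, PENETRANCE_MEN)
men_per_woman = (1.0 - share_women) / share_women

print(f"P(F|G) = {share_women:.3f}; susceptible men per susceptible woman = {men_per_woman:.2f}")
```

Whenever the female-to-male penetrance ratio exceeds the female-to-male ratio among patients, men must outnumber women within the (G)-subset, whatever the absolute penetrance values are.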

Argument:
We define (ET) to be the prevailing environmental conditions (whatever these are) experienced by the population during some time-period (T). As noted in the Methods, we define (Ei) to be the specific environmental exposure, which is sufficient for MS to develop in the i th susceptible individual (however many events are involved, whenever these events need to act, and whatever these events might be) -i.e., both the events need to occur jointly in order for MS to develop in the Notably, also, this expression for P(E) explicitly incorporates the possibility that each genotype in (G) may require a unique set of environmental events in order for MS to develop in that individual.
Nevertheless, despite this possibility, the existing epidemiological data suggests that many (or most) MS patients are responding to similar environmental events and, thus, any large variability in this regard is probably not a major factor in MS pathogenesis.
For example, despite the fact that every MS patient (except MZ-twins) has a unique combination of "states" at the (>200) susceptibility loci (see Supplemental Material), the data from Canada indicate that the change in general environmental conditions (whatever these are), which has taken place between the time periods of (1941-1945) and (1976-1980), has produced, at a minimum, a 32% increase in the (Ei and Gi) prevalence of MS (see Supplemental Material). Moreover, because this increase has occurred world-wide and predominantly in women [3,4,49,50,54], the (F:M) sex ratio for MS in Canada has increased during every 5-year increment except one between these two time-periods [54]. Over the entire interval, the ratio has increased from 2.2 in (1941-1945) to 3.2 in (1976-1980). These changes are far too rapid to be genetically based.
It is conceivable that this observed sex-ratio change might be artifactual. For example, if women were more likely than men to have minimally symptomatic MS, then, with such patients now being diagnosed by our improved imaging and laboratory methods, women might represent a disproportionate number of these newly diagnosed cases. Alternatively, in earlier eras, vague symptoms of MS in women may have been written off as "non-organic" more often than they were in men. Nevertheless, four lines of evidence argue strongly against this change being an artifact. First, this increase in the sex ratio began before, and continued up to, the advent of modern imaging and laboratory methods [54]. Because these (a) units are arbitrary, we can assign "1 unit" of environmental exposure in men to be the difference in exposure level between any two time points (e.g., a1 and a2) such that: For women, we can similarly transform exposure into a different scale of so-called "apparent" exposure units (a app ) such that: and where we now define "1 unit" of environmental exposure (on this scale) as: The choice of which gender (men or women) to assign to which scale is completely arbitrary.
A standard derivation from survival analysis methods [63] demonstrates that the survival curves are exponential with respect to their hazard functions.
Thus, for men: and, for women: So that, for men: and, for women: In considering the probability of failure (i.e., of developing MS), we will use subscripts (1) and (2)   Moreover, as demonstrated in the Supplemental Material, assuming that MS prevalence has either remained stable or increased over the time-interval, we can define the term (C) such that: and, thereby, re-express (Zw1) and (Zm1) in terms of (Zw2) and (Zm2).

Thus:
and: where: And, thus: Consequently, based on the population data from Canada, the prevalence of MS must have increased by more than 32% between these two time periods.
Finally (see Supplemental Material), we can estimate the value of both (c) and (d) as: and: Thus, using the observed change in the (F:M) sex-ratio over time in Canada, together with our estimates for P(G) and P(F│G), we have all the data needed to construct the complete response curves for the cumulative probability of developing MS in genetically susceptible women and men (Fig. 3). What these curves make clear is that both P(E) and P(MS) are changing over time, which indicates that specific environmental conditions, in addition to specific susceptible genetic combinations, are necessary for MS to develop. Thus, MS develops when the right genetic constitution is exposed to the right environmental conditions (i.e., it is fundamentally due to a gene-environment interaction).
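As a rough illustration of how such response curves behave, the Python sketch below implements an exponential-saturation curve of the kind implied by a constant hazard per unit of exposure. It is a sketch under stated assumptions, not the fitted model: the functional form, the function name, and every numeric value are assumptions made here for illustration. The actual parameter estimates are those described in the Supplemental Material.

```python
import math


def response_curve(exposure, plateau, threshold=0.0, hazard_ratio=1.0):
    """Cumulative probability of developing disease at a given exposure.

    ASSUMED form (not taken from the paper's fitted equations):
        plateau * (1 - exp(-hazard_ratio * (exposure - threshold)))
    for exposure above the threshold, and 0 at or below it.
    """
    if not 0.0 <= plateau <= 1.0:
        raise ValueError("plateau must be a probability")
    if exposure <= threshold:
        return 0.0
    cumulative_hazard = hazard_ratio * (exposure - threshold)
    return plateau * (1.0 - math.exp(-cumulative_hazard))


# Hypothetical values, chosen only to show the shape of the two curves.
for a in (0.0, 0.5, 1.0, 2.0, 5.0):
    men = response_curve(a, plateau=0.15)
    women = response_curve(a, plateau=0.40, threshold=0.3, hazard_ratio=1.5)
    print(f"a={a:.1f}  men={men:.3f}  women={women:.3f}")
```

In this sketch, each curve is zero up to its threshold, rises with exposure, and levels off at its plateau, so no amount of additional exposure pushes the probability above that limiting value.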
Because, as noted above, the scales for the response curves in women and men are assumed to be proportional, the plot for women on the (a) scale will be stretched or compressed (along the x-axis), depending upon the value of (R), compared to the plot for women on the (a_app) scale (Fig. 3). The threshold for men is at the intersection-point and the threshold for women is at the intersection-point . By the definitions of (E) and (a), one of these two thresholds must occur at . However, these thresholds need not be the same, so we define the threshold difference between women and men as: . Thus, if women have a higher threshold than men: . On the (a_app) scale, the x-intercept always occurs at , and, thus, is independent of R.
Three final points are also worth making. First, because is independent of R, we can use the condition of to evaluate . In this circumstance, these exponential equations (see Supplemental Material) can be re-arranged to yield: Consequently, basic epidemiologic data can be used to determine the difference in threshold that exists between women and men. This analysis leads to the conclusion that: $\forall C > 0.50:\ 0.37 < \lambda < 4.67$ and, in fact: $\forall C > 0:\ \lambda > 0$. Therefore, if the hazards are proportional, men have a lower threshold for developing MS compared to women (Fig. 3; Supplemental Material). A lower threshold in men is also suggested by a report from Europe and the United States [64], which found that, prior to 1922, men accounted for 58% of the MS cases (Table 4). By our definition of P(E), these thresholds indicate the exposure at which MS becomes possible. If women required a fundamentally different kind of exposure than men, it would be very hard to rationalize a difference in threshold because, in such a circumstance, in some environments, women would be more likely and, in other environments, less likely than men to receive the correct exposure. Rather, a difference in threshold implies that men and women are responding to similar events but that men require a less extreme degree of exposure in order to develop MS. For example, perhaps men become susceptible with a lesser degree of vitamin D deficiency, or with EBV infection occurring over a broader age-range, compared to women.
Second, we note that: , so that (Zm2) can be re-expressed as: This equation can be rearranged to yield: From above: Assuming that there has been less than a two-fold increase in MS prevalence in Canada over the 35-year interval (i.e., ) then: And, similarly, for women: These results strongly suggest that the relevant environmental exposures (especially when these are multiple) are currently occurring at population-wide levels. For example, if three equally likely and independent environmental events (EE1, EE2, and EE3), possibly sequential [49,50], were necessary to produce MS in a susceptible individual, then: or: so that, under the stated circumstances, more than 94% of the population would experience each environmental event. This conclusion is fully consistent with that reached from studies in adopted individuals, in siblings and half-siblings raised together or apart, in conjugal couples, and in brothers and sisters of different birth order, which have generally indicated that MS-risk is currently unaffected by the micro-environment of families [65-71].
And third, it is clear that both of these response curves plateau well below 100% failure (Fig. 3).
Therefore, there must be stochastic processes that partially determine whether a susceptible individual with a sufficient environmental exposure will actually develop disease (see #9, below).
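The three-event example in the second point above rests on simple arithmetic: if several independent and equally likely events must all occur, the probability of each one is the corresponding root of their joint probability. The Python sketch below illustrates this. The function name is hypothetical, and the joint probability used is back-calculated here from the quoted 94% figure rather than taken from the text.

```python
def per_event_probability(joint_probability: float, n_events: int = 3) -> float:
    """Probability of each event when n independent, equally likely events
    must all occur and their joint probability is known."""
    if not 0.0 <= joint_probability <= 1.0:
        raise ValueError("joint_probability must lie between 0 and 1")
    if n_events < 1:
        raise ValueError("n_events must be at least 1")
    return joint_probability ** (1.0 / n_events)


# Assumption for illustration: a joint probability equal to 0.94 cubed.
joint = 0.94 ** 3
print(f"joint = {joint:.3f}, per event = {per_event_probability(joint):.3f}")
```

A joint probability of about 0.83 therefore already requires each of three such events to occur in about 94% of the population, and any higher joint probability requires each event to be more common still.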

2.
Argument: As a sufficient environmental exposure {P(E)} becomes more likely, the quantity {P(MS│IGMS)} will, of necessity, change. Earlier, we described this term as having removed the impact of the shared IU and early post-natal environments of MZ-twins. This description, however, is not quite accurate. For example, we can break down a "sufficient" environmental exposure (see Supplemental Material) into those factors that are shared exclusively by MZ-twins (E1), those factors that are shared by the population generally (E2), and those factors that are shared exclusively within the family microenvironment (E3). As noted above, however, the family microenvironment seems not to have any impact on the likelihood of MS [65-71]. In this circumstance, because only factors (E1 and E2) are necessary for a sufficient exposure, then: and: If an individual's identical twin is known to have MS, it is likely that this individual, also, has experienced a "sufficient" IU and early post-natal environment.
Conceived of in this way, the term {P(MS│IGMS)} can be rewritten as: and the adjusted penetrance {P(MS│IGMS)} has not really "removed" the impact of these early environmental similarities. Rather, has simply been reset to its population level . Because MZ-twins share both identical genotypes and the same IU and early post-natal environment, we expect that: . Consequently, as increases in the population to: the term {P(MS│IGMS)} will approach, and ultimately reach: In this case, therefore, the limiting value for in men (c) and women (d) (see #7 above) must conform to the constraints of: The reason for the inequality is that, in those circumstances where: it must be that both: . Naturally, the fact that has increased to unity does not guarantee that has done the same, so that the limiting value for may be greater than .
Nevertheless, if it is currently true (see #7 above), that: then it must also be true that: Regardless, however, the depicted curves (Fig. 3) must be inaccurate because, in the Figure: and: Clearly there are several variables that can be adjusted to match the values for both (c) and (d) with these observed MZ-twin concordance rates. Therefore, it is possible to consider different combinations of these variables and determine those combinations that match these constraints. For example, using variable ranges of: , , , , and , and further requiring the estimates for (c) and (d) to be within (± 15%) of the observed values for the proband-wise MZ-twin concordance rates (Table 2), there are numerous combinations that match these constraints. The solution space covered by these combinations includes the full range of possibilities for the parameters of (C) and (R). By contrast, the ranges for both P(F|G) and P(G) are restricted: and . This restricted range for P(G) fits, generally, within the framework developed previously (see #5 above). The range for P(F|G), however, is outside of the range developed previously (see #5 above) although, as discussed in the Supplemental Material, this could relate to an underestimate for the parameter from Table 2.
Also, 94% of these potential solutions require the condition that . This latter circumstance might reflect an under-ascertainment of cases when estimating the disease prevalence for the general population (Z). Indeed, several autopsy studies have indicated that the prevalence of undiagnosed (pathological) MS is ~0.1% [45-48]. Thus, with minimally symptomatic (or asymptomatic) MS occurring in as many as 0.1% of the population, this could potentially increase the estimated P(MS) by as much as 50-100%. Although such diagnostic errors are probably less common in the modern era, many minimally symptomatic (or asymptomatic) patients still go undiagnosed during life [59].
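The arithmetic behind this range can be sketched in a few lines of Python. The undiagnosed prevalence of 0.1% is the autopsy figure cited above. The two diagnosed-prevalence values are assumptions, back-calculated here from the quoted 50-100% range rather than taken from the text, and the function name is hypothetical.

```python
def relative_increase(diagnosed_prevalence: float,
                      undiagnosed_prevalence: float = 0.001) -> float:
    """Fractional rise in estimated prevalence if undiagnosed cases were
    added to the diagnosed ones."""
    if diagnosed_prevalence <= 0:
        raise ValueError("diagnosed_prevalence must be positive")
    return undiagnosed_prevalence / diagnosed_prevalence


# Assumed diagnosed prevalences of 0.2% and 0.1% (illustrative only).
for diagnosed in (0.002, 0.001):
    print(f"diagnosed = {diagnosed:.1%}: "
          f"estimate rises by {relative_increase(diagnosed):.0%}")
```

Under these assumptions, adding an undiagnosed 0.1% raises the prevalence estimate by half when diagnosed prevalence is 0.2% and doubles it when diagnosed prevalence is 0.1%.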
Moreover, any such under-ascertainment is likely to be less for MZ-twins, DZ-twins, and siblings. For example, an initially unaffected twin or non-twin sibling of a patient with MS will, almost certainly, be more carefully monitored for possible MS symptoms (i.e., for minimally symptomatic presentations) than will an individual in the general population. In such a circumstance, these diagnostic failures will be fewer in the (MZMS), (DZMS), and (SMS) populations than in the general population and the MZ-twin concordance rates will, thus, provide a more accurate reflection of the maximum likelihood of getting MS {i.e., } than will those estimates of derived from the MS-prevalence in the general population. Such a circumstance might help to rationalize this apparent discrepancy.

Missing Heritability?
Conclusions: 1. Both "genetic" and "environmental" factors are necessary for MS expression; neither alone is sufficient. 2. A large portion of the "causal pathway" to MS is stochastic. 3. There is no need to invoke any "missing heritability" in MS. Argument: Only a small proportion of the population seems to be genetically susceptible to developing MS, which implies that this is a "genetic" disorder. In addition, a suitable environmental exposure, like a suitable genetic constitution, is also a necessary part of MS pathogenesis. Despite this, however, the combination of a susceptible genotype together with a sufficient environmental exposure does not invariably lead to the disease of MS and, in fact, the response curves in both men and women plateau well below 100% (Fig. 3),

Discussion
The present analysis provides considerable insight into the nature and basis of susceptibility to MS [23,24]. Even considering the large number and variety of these highly selected CEHs, however, genetic susceptibility cannot be explained on the basis of the state of the MHC.
Despite a significant variability in the observed disease-association among the different (H+)-carrying CEHs, every such CEH (regardless of its rarity) is strongly MS-associated [23,24].
Moreover, it seems clear that, although certain genetic combinations increase the likelihood of (G)-subset membership, the actual combinations that do this are quite heterogeneous, and only a small proportion of genetically susceptible individuals (who actually develop MS) share even the same 4-locus genetic combination (Supplemental Material). These observations also suggest that susceptibility to MS, although genetically based, is idiosyncratic.
Despite the conclusion that MS is genetic, however, MS is equally an environmental disease.
Specific environmental exposures are also necessary for disease-pathogenesis. Indeed, the fact that there has been a marked recent increase in both MS-prevalence and the (F:M) sex-ratio, indicates that a sufficient environmental exposure is required for MS to develop (Fig. 3). If you are not exposed to a sufficient environment, you cannot get MS, regardless of your genetic make-up. However, neither environment nor genetics alone is sufficient. Rather, MS is due to an interaction between the two.
Several environmental events, probably sequential, seem necessary for MS to develop in a genetically susceptible individual [3,4,49,50,60-62]. The first environmental event, as discussed previously [49], is one that occurs during the IU or early post-natal period. Support for such a factor comes from the discrepancy in recurrence-rates between twin and non-twin siblings, from the fact that concordant half-siblings are twice as likely to share the mother as the father, and from the periodic, circannual effect that month-of-birth has on the subsequent likelihood of developing MS [49]. In the northern hemisphere, this periodicity to MS-susceptibility peaks just before the summer months and dips to its nadir just before winter, and this pattern is inverted in the southern hemisphere [49]. Such a pattern of periodicity implicates an environmental factor, occurring near birth, that is coupled to the solar cycle [49].
A second environmental event is implied by the published migration data [49]. Thus, when an individual relocates (prior to ~15 years of age) from an area of high-prevalence to an area of low-prevalence (or vice versa), their MS risk is similar to that of the area to which they moved. By contrast, when they make the same relocation after this time, their MS risk seems to remain that of the area from which they moved. These observations implicate an environmental event, involved in MS-pathogenesis, which occurs at or around puberty [49]. And third, the clinical onset of MS generally occurs long after the first and second events have already taken place (Fig. 1). However, even when the correct genetic background occurs together with an environmental exposure sufficient to cause MS in someone of that background, more than 50% of such individuals will still not develop clinical disease. Some of these individuals, no doubt, will have subclinical disease [45-48,59]. However, although such a circumstance will increase our estimate of P(MS) by as much as 50-100%, this is still insufficient to get the plateaus of the response curves (Fig. 3) to exceed the 50% mark. In men (who have a plateau significantly lower than that of women), this conclusion is even more evident (Fig. 3). Consequently, because a sufficient environmental exposure has been defined broadly (to include both factors that are known or suspected as well as factors that are completely unknown), the fact that some individuals with the proper combination of genes and environment still fail to develop disease indicates that stochastic processes are also involved in disease-pathogenesis.
And finally, it is worth noting that the nature of genetic susceptibility developed in this manuscript is applicable to a wide range of other complex polygenetic disorders such as type-1 diabetes mellitus, celiac disease, and rheumatoid arthritis. Indeed, based solely upon Proposition #1, if the proband-wise MZ-twin concordance rate, for any disease, greatly exceeds the prevalence of disease in the general population, then only a tiny fraction of the population has any possibility of getting the illness.
Moreover, any disease for which the proband-wise MZ-twin concordance rate is substantially less than 100% must, in addition to genetic susceptibility, include environmental factors, stochastic factors, or both in the causal pathway leading to the disease.

[42]. A range is given because, often, a range of estimates is available for a particular region.
† Estimates are presented as proband-wise concordance rates [26]. Sometimes concordance was reported as a pair-wise rate and, in these cases, the estimates have been converted into proband-wise rates assuming random sampling of twin-pairs [26]. Nevertheless, in at least some reports [e.g., 32], this assumption is almost certainly violated.
†† For the purposes of determining the probability of "genetic-susceptibility" {P(G)} in each region, we have taken: (see Text) and we have adjusted the MZ-twin concordance rates using the Canadian data for differences between fraternal-twin and sibling concordance (see Text) as:
Finally, the range of values for P(G) is taken both from the range of the prevalence data and also from the range provided by Proposition #1 (see Text).

,546 110,053 116,493 248,791 121,284 127,507 281,425 138,056 143,368 308,746 151,781 156

Increasing the estimate of makes the two curves slightly closer to each other. If the hazard is not proportional, for women, each of the points would be the same as depicted for , although the scale of the x-axis for the two exponential curves would be transformed non-linearly and, thus, the response-curve in men could not be compared directly to the curve in women. Moreover, the x-intercept for the under any circumstances, women (relative to men) would still be seen to exhibit a greater responsiveness to those changes in environmental exposure, which have taken place between the two time-periods.