The nature of genetic and environmental susceptibility to multiple sclerosis

Objective To understand the nature of genetic and environmental susceptibility to multiple sclerosis (MS) and, by extension, susceptibility to other complex genetic diseases.

Background Certain basic epidemiological parameters of MS (e.g., population-prevalence of MS, recurrence-risks for MS in siblings and twins, proportion of women among MS patients, and the time-dependent changes in the sex-ratio) are well-established. In addition, more than 233 genetic-loci have now been identified as being unequivocally MS-associated, including 32 loci within the major histocompatibility complex (MHC), and one locus on the X chromosome. Despite this recent explosion in genetic associations, however, the association of MS with the HLA-DRB1*15:01~HLA-DQB1*06:02~a1 (H+) haplotype has been known for decades.

Design/Methods We define the “genetically-susceptible” subset (G) to include everyone with any non-zero life-time chance of developing MS. Individuals who have no chance of developing MS, regardless of their environmental experiences, belong to the mutually exclusive “non-susceptible” subset (G–). Using these well-established epidemiological parameters, we analyze, mathematically, the implications that these observations have regarding the genetic-susceptibility to MS. In addition, we use the sex-ratio change (observed over a 35-year interval in Canada), to derive the relationship between MS-probability and an increasing likelihood of a sufficient environmental exposure.

Results We demonstrate that genetic-susceptibility is confined to less than 7.3% of populations throughout Europe and North America. Consequently, more than 92.7% of individuals in these populations have no chance whatsoever of developing MS, regardless of their environmental experiences. Even among carriers of the HLA-DRB1*15:01~HLA-DQB1*06:02~a1 haplotype, far fewer than 32% can possibly be members of the (G) subset.
Also, despite the current preponderance of women among MS patients, women are less likely to be in the susceptible (G) subset and have a higher environmental threshold for developing MS compared to men. Nevertheless, the penetrance of MS in susceptible women is considerably greater than it is in men. Moreover, the response-curves for MS-probability in susceptible individuals increase with an increasing likelihood of a sufficient environmental exposure, especially among women. However, these environmental response-curves plateau at under 50% for women and at a significantly lower level for men.

Conclusions The pathogenesis of MS requires both a genetic predisposition and a suitable environmental exposure. Nevertheless, genetic-susceptibility is rare in the population (< 7.3%) and requires specific combinations of non-additive genetic risk-factors. For example, only a minority of carriers of the HLA-DRB1*15:01~HLA-DQB1*06:02~a1 haplotype are even in the (G) subset and, thus, genetic-susceptibility to MS in these carriers must result from the combined effect of this haplotype together with the effects of certain other (as yet, unidentified) genetic factors. By itself, this haplotype poses no MS-risk. By contrast, a sufficient environmental exposure (however many events are involved, whenever these events need to act, and whatever these events might be) is common, currently occurring in, at least, 76% of susceptible individuals. In addition, the fact that environmental response-curves plateau well below 50% (especially in men), indicates that disease pathogenesis is partly stochastic. By extension, other diseases, for which monozygotic-twin recurrence-risks greatly exceed the disease-prevalence (e.g., rheumatoid arthritis, diabetes, and celiac disease), must have a similar genetic basis.

Despite the undoubted influence of genetic and environmental factors in MS-pathogenesis, susceptibility to MS might be envisioned in a number of different ways. Four examples of disease states, for which we understand, generally, the pathophysiology, can help to highlight some of the issues that might also be involved in MS pathogenesis.
First, sickle cell disease (SCD) occurs in ~3% of individuals in certain sub-Saharan regions of Africa [25]. All affected individuals are homozygous for the HbS mutation of the hemoglobin gene. Despite the fact that the clinical expression of SCD can be influenced by environmental factors such as strenuous exercise, high-altitude, infection, and dehydration, SCD is fundamentally a genetic disorder.
Second, each year, 5−20% of the population in North America gets the flu [25]. Although the genetic make-up might make one person more or less susceptible to a particular year's variant, presumably, everyone could develop the flu if they had a sufficient exposure to the influenza virus. Therefore, despite the possible genetic differences in susceptibility, the flu is fundamentally an environmental (infectious) disease.
Third, the life-time probability of breast cancer in the US is ~12.5% in women and ~0.1% in men. Individuals (especially women) who carry the BRCA1 or BRCA2 mutations (<1% of the population) have 4-7 times the risk of the general population [25]. Nevertheless, presumably, there is a baseline risk of breast cancer such that no one is completely risk-free. Although the genetic make-up (including gender) influences the baseline risk and the environment likely affects the penetrance of the BRCA mutations, some breast cancer cases are fundamentally genetic and others are fundamentally environmental (of unclear type, but possibly due to exposures such as toxins, radiation, pregnancy, or other occurrences).
Fourth, the human immunodeficiency virus (HIV) can infect anyone in the population although individuals who engage in certain high-risk behaviors (e.g., having unprotected anal-receptive sex or using intravenous drugs and sharing needles) are particularly susceptible. Among persons of northern European extraction, ~1% are homozygous for the Δ-32 mutation of the CCR5 gene and are almost completely resistant to HIV [25]. Consequently, HIV infection is fundamentally an environmental disorder (infectious) with an interaction between two environmental factors (i.e., the virus and specific high-risk behaviors). However, certain genetic traits (e.g., the Δ-32 mutation) can be decisive in determining the degree of susceptibility.
Whether susceptibility to MS resembles any of these disease-states (or some other) is unknown although its polygenic nature is certain [5][6][7][8][9][10][11][12][13][14]. Nevertheless, several basic epidemiological observations in MS bear directly on the different possibilities. In this paper, we utilize directly observable, and well-established, "population parameters" (e.g., the concordance rates in twins and siblings, the proportion of women among MS patients, the population prevalence of MS, the time-dependent changes in the sex-ratio, etc.) to logically infer the values of other non-observable parameters of interest (e.g., the population probability of being genetically susceptible, the likelihood that a susceptible person actually develops MS, the proportion of susceptible individuals who are women, the likelihood that a susceptible individual experiences a sufficient environmental exposure, etc.).

Methods
For the purpose of this analysis we define, explicitly, five general terms (Table 1) and, in addition, provide a set of parameter abbreviations to be used for the purposes of notational simplicity (Table 2). The first term is {P(MS)}, which represents the expected life-time probability that a random individual from the general population (Z) will develop MS {i.e., the expected penetrance of MS is P(MS) = P(MS|Z)}. As discussed below, this parameter is related to the population prevalence.
The second term is {P(G)}, which represents the expected probability that a random individual from (Z) is also a member of the (G) subset, i.e., P(G) = P(G|Z). In turn, we define the (G) subset to include everyone who has any non-zero chance of developing MS (i.e., regardless of how small that risk might be). All individuals who are not in the subset (G) are considered to be in the mutually exclusive subset (G−) of non-susceptible individuals who have no chance of getting MS, regardless of their environmental experiences. We also define the set {X} to be the set of penetrance values for members of the (G) subset. If the variance of penetrance values in {X} is non-zero then, for at least one partition, the subset (G) can be divided into two mutually exclusive sub-subsets, (G1) and (G2), suitably defined, such that the expected penetrance for the sub-subset (G1) is greater than that for (G2). If this difference in expected penetrance between the two sub-subsets is statistically significant, we will restrict our analysis to those circumstances in which the sub-subsets {(G1) and (G2)}, considered separately, each have a penetrance distribution that conforms to the Upper Solution (see Proposition #1, below).

Table 1 (fragment). P(E): Expected probability of an environmental exposure "sufficient to cause MS" in the subset (G), given the prevailing environmental conditions of the time; the prevailing environmental conditions during a specific time-period (T); (E 1); that part of the sufficient environmental exposure shared exclusively by MZ- or DZ-twins, especially during the IU and early post-natal period; that part of the sufficient environmental exposure shared by the population generally; the potential part of a sufficient environmental exposure due exclusively to the shared microenvironment of families. However, observationally [62][63][64][65][66][67][68]: …
Although such an analysis focuses on possible unimodal and bimodal distributions for the set {X}, this constraint does not impose a bimodal distribution on it. Rather, the distribution of the set {X} could still be unimodal, bimodal, trimodal or multi-modal {NB: however, if the set {X} is bimodal and the subset (G−) is non-empty, then the population (Z) has a trimodal distribution of penetrance values}.

The third term is {P(E)}, which represents the expected probability that a member of the (G) subset will experience an environmental exposure sufficient to cause MS, given the prevailing environmental conditions of the time (E_T), i.e., P(E) = P(E|G, E_T). Using this definition for environmental exposure, even for those circumstances in which MS is either "purely genetic" or "purely environmental", we note that in all cases: P(MS,E) = P(MS).
The fourth is a set of related terms {P(MS|MZ_MS), P(MS|DZ_MS), and P(MS|S_MS)}. The first two terms, {P(MS|MZ_MS)} and {P(MS|DZ_MS)}, represent the expected conditional life-time probability of developing MS for an individual from either a monozygotic (MZ) or a dizygotic (DZ) twin-ship, given the fact that their co-twin either has or will develop MS. These probabilities are estimated by the observed proband-wise concordance rate for either MZ-twins or DZ-twins [26]. In a similar manner, the term {P(MS|S_MS)} represents the expected conditional life-time probability of developing MS in a sibling (S), given the fact that their co-sibling either has or will develop MS.

The last term is {P(MS|IG_MS)}, which represents the adjusted proband-wise concordance rate for MZ-twins. Such an adjustment may be necessary because concordant MZ-twins, in addition to sharing their identical genotypes (IG), also share the intrauterine (IU) and certain other (especially early) post-natal environments. Thus, it is possible that these shared environmental experiences of MZ-twins might significantly impact the likelihood of their developing MS in the future. One method to estimate the adjustment necessary in such a circumstance is to consider the difference in concordance rates between non-twin siblings and fraternal twins (i.e., siblings who share the same genetic relationship but who are divergent in their IU and certain post-natal experiences). Although epidemiological studies have differed somewhat with regard to the magnitude of any such differences [27][28][29][30][31][32][33][34], population-based studies out of Canada suggest that the impact of these shared environmental events may be substantial [29].

Table 2 (fragment). p = P(G1|G): the proportion of the (G) subset that is also in (G1). A further entry defines the limiting value of the exponential curve for men.
As demonstrated in the S1 File (#1), we can use the observed recurrence-rate data to make this adjustment.

From these definitions and relationships, we can use well-established values for the different population parameters to logically deduce the value of another, non-observable, parameter {P(MS|G)}, which represents the conditional life-time probability of developing MS for a member of the (G) subset. This term is referred to as the expected penetrance for the (G) subset. We note that, from the definition of the (G) subset, everyone who actually develops MS during their life-time must belong to this subset. Therefore, the joint probability {P(MS, G)} must be the same as {P(MS)}, so that, by definition:

$$P(\mathrm{MS}, G) = P(\mathrm{MS}) \quad \text{and, analogously:} \quad P(\mathrm{MZ_{MS}}, G) = P(\mathrm{MZ_{MS}})$$

From this, and from the definition of conditional probability:

$$P(\mathrm{MS} \mid G) = \frac{P(\mathrm{MS}, G)}{P(G)} = \frac{P(\mathrm{MS})}{P(G)}$$

This equation can be re-arranged to yield:

$$P(G) = P(\mathrm{MS}) / P(\mathrm{MS} \mid G)$$
This relationship, once established, can then be used to assess the nature of MS pathogenesis. For example, if {P(G) = 1}, then anyone can get the disease under the right environmental circumstances (e.g., flu, breast cancer, & HIV) and we would conclude that MS must, in some cases, be caused by "purely environmental" factors. Notably, however, such a circumstance does not preclude the possibility that genetic factors strongly influence the likelihood of disease (e.g., breast cancer & HIV).
By contrast, if {P(G)<1}, this indicates that only certain individuals can possibly get the disease (e.g., SCD) and, therefore, that MS must be a genetic disorder (i.e., unless a person has the correct genetic make-up, they have no chance, whatsoever, of getting the disease, regardless of their environmental exposure). Naturally, such a conclusion would have no bearing on whether disease pathogenesis also requires the co-occurrence of specific environmental events. Also, in this circumstance, how we might characterize the nature of genetic susceptibility would depend upon the degree to which P(G) was less than unity and upon the magnitude of the disparity between any so-called "high" and "low" penetrance subgroups. For example, in HIV, if homozygous Δ-32 mutations (occurring in 1% of a northern European population) were completely protective, then: P(G) = 0.99. In this circumstance, however, we would likely characterize the disease as being fundamentally environmental and the homozygous Δ-32 mutations as being protective rather than characterizing every other genotype as being "susceptible". By contrast, in SCD, where: P(G) = 0.03 (i.e., 3% of certain African populations), we would characterize carrying homozygous HbS mutations as the defining trait for membership in the "genetically-susceptible" subset (G). Even if it were possible, in extremely rare circumstances, for an individual to develop SCD in the absence of homozygous HbS mutations, we would still consider this disease to be fundamentally genetic.
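As a minimal sketch of how the relationship P(G) = P(MS)/P(MS|G) behaves, the following Python snippet evaluates it for P(MS) = 0.003 (the estimate argued for in this paper) and a few penetrance values. The function name `prob_susceptible` and the example penetrances are assumptions introduced purely for illustration; they are not values taken from the analysis.

```python
def prob_susceptible(p_ms: float, penetrance: float) -> float:
    """Return P(G) = P(MS) / P(MS|G).

    p_ms       -- life-time probability of MS in the general population, P(MS)
    penetrance -- expected penetrance within the susceptible subset, P(MS|G)
    """
    if not 0.0 < penetrance <= 1.0:
        raise ValueError("P(MS|G) must lie in (0, 1]")
    if not 0.0 <= p_ms <= penetrance:
        # P(MS) = P(MS|G) * P(G), so P(MS) can never exceed P(MS|G)
        raise ValueError("P(MS) must lie in [0, P(MS|G)]")
    return p_ms / penetrance


if __name__ == "__main__":
    P_MS = 0.003  # estimate of P(MS) used in this paper
    # Hypothetical penetrances, chosen only to show the range of behaviour
    for penetrance in (0.003, 0.03, 0.3):
        p_g = prob_susceptible(P_MS, penetrance)
        print(f"P(MS|G) = {penetrance:.3f} -> P(G) = {p_g:.3f}")
```

When the penetrance equals P(MS), P(G) = 1 and everyone is susceptible (the "flu-like" case); any larger penetrance implies a proportionally smaller susceptible subset.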

Conclusion: P(MS) ≈ 0.003
Argument: One possible estimate of P(MS) could be the prevalence of MS in a population. However, because the clinical onset of MS occurs largely between the ages of 15 and 45 years (e.g., Fig 1), the measured cross-sectional prevalence of MS (using the entire population as the denominator) will necessarily include individuals with different likelihoods of having already developed MS [35]. For example, using the 2010 United States census data (for the total resident population; see Fig 2) as an approximation, we can divide the general population (Z) into the three mutually exclusive age-bands (A1, A2, and A3), i.e., younger than 15 years, 15-45 years, and older than 45 years, such that:

$$P(A1 \mid Z) \approx 0.20; \quad P(A2 \mid Z) \approx 0.41; \quad P(A3 \mid Z) \approx 0.39$$

Because so few MS patients have their disease onset prior to the age of 15 years (e.g., Fig 1), it seems a reasonable approximation that the measured prevalence in the youngest age-band (A1) is close to zero. By contrast, as noted above, the age group (15-45 years) accounts for the large majority of clinical onsets, which have a roughly symmetrical distribution with a mean of 28.3 years (Fig 1). If the distribution were exactly symmetrical and centered on 30 years, the measured prevalence in this age band would be ~50% of the value of P(MS). Therefore, it seems reasonable to estimate the measured prevalence in the age-band (A2) as ~50% of P(MS).

For the older age band (>45 years) most patients will have already developed the disease (Fig 1). Thus, on the one hand, one might expect the measured prevalence in this age-band to be equal to P(MS). On the other hand, there is a small but definite excessive mortality in MS such that life expectancy is reduced in MS-patients by about 5-10 years [36][37][38][39][40], although a recent study from Denmark [41] reported that short-term survival has steadily improved for patients beginning in 1950 and continuing through 1999. This excessive mortality will make the estimate too small by some amount, although it seems unlikely that this reduction will be more than 25%. Thus, a range of plausible estimates for the measured prevalence in the age-band (A3) is likely to be 75-100% of P(MS). Combining these three different estimates yields the estimate:

$$P(\mathrm{MS}) \approx 0.20 \cdot P(\mathrm{MS} \mid A1) + 0.41 \cdot P(\mathrm{MS} \mid A2) + 0.39 \cdot P(\mathrm{MS} \mid A3)$$

Defining the measured prevalence in the population as (prev), this estimate translates to:

$$1.7 \cdot (\mathrm{prev}) < P(\mathrm{MS}) < 2 \cdot (\mathrm{prev})$$
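The age-band argument above can be checked with a short calculation. The sketch below assumes, following the text, that the measured prevalence is ~0% of P(MS) in the youngest band, ~50% in the 15-45 year band, and between 75% and 100% in the oldest band; the function and variable names are illustrative only.

```python
# Share of the general population (Z) in each age-band, as quoted in the text:
# A1 = younger than 15 years, A2 = 15-45 years, A3 = older than 45 years.
AGE_BAND_SHARE = {"A1": 0.20, "A2": 0.41, "A3": 0.39}


def prevalence_over_p_ms(a3_fraction: float) -> float:
    """Measured whole-population prevalence, expressed as a fraction of P(MS).

    a3_fraction -- fraction of P(MS) that is observed as prevalence in the
                   oldest band (1.0 = no excess mortality, 0.75 = a 25% loss).
    """
    observed_fraction = {"A1": 0.0, "A2": 0.5, "A3": a3_fraction}
    return sum(AGE_BAND_SHARE[band] * observed_fraction[band]
               for band in AGE_BAND_SHARE)


def p_ms_multiplier(a3_fraction: float) -> float:
    """Factor by which the measured prevalence is multiplied to obtain P(MS)."""
    return 1.0 / prevalence_over_p_ms(a3_fraction)


if __name__ == "__main__":
    low = p_ms_multiplier(1.0)    # no excess mortality in the oldest band
    high = p_ms_multiplier(0.75)  # 25% reduction from excess mortality
    print(f"{low:.2f} * prev < P(MS) < {high:.2f} * prev")
```

Under these assumptions the two multipliers come out at roughly 1.68 and 2.01, in line with the range of 1.7 to 2 quoted in the text.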
A second method to estimate P(MS) would be to use a measured prevalence for MS that is restricted to the age-band of 45-54 years. Within this age-band, almost all patients will have already experienced their clinical-onset and only a few will have experienced their (expected) excessive mortality. Consequently, by this method:

$$P(\mathrm{MS}) \approx \text{prevalence of MS in the 45-54 year age-band}$$

A third method would be to use population-based death data and to consider the percentage of death certificates that mention the diagnosis of MS (not necessarily as, but including, the immediate, underlying, or contributing cause). By the time of death, any case of clinically evident MS must, by definition, have already declared itself. Consequently, by this method:

$$P(\mathrm{MS}) \approx \text{\% of death certificates mentioning MS}$$

In 2001, we took a cursory (unpublished) look at the Kaiser northern California database. At the time, there were 4,352 unique persons in the database with a diagnostic code for MS. With 2.9 million persons enrolled in Kaiser northern California at the time, and if this population is a representative sample, this would translate to an MS-prevalence in northern California of 150 per 100,000 population. Such an estimate is consistent with many other published studies in northern populations, which generally find the prevalence of MS to be 100-200 per 100,000 population [42]. A recent study from the United States, using multiple administrative health claims (AHC) datasets [43], estimated the prevalence of MS in adults (i.e., age ≥ 18 years), who represent ~75% of the US population (see Fig 2), as 288-309 per 100,000 population. For comparison purposes, this translates to a prevalence in the entire population of 216-232 per 100,000 individuals.
Similarly, in a Swedish study by Sundström and co-workers [44], the age-specific prevalence of MS in the 45-54 year age-band was reported to be 304 per 100,000 population. In the AHC study [43], the estimate for this same age-band (considering the entire population) was 314-337 per 100,000 individuals.
And, finally, in a recent population-based multiple-cause-death study from British Columbia [45], a diagnosis of MS was mentioned on 0.28% of the death certificates.
Thus, all three of these methods of estimation are quite consistent with each other. Collectively, these observations best support the conclusion that, in the northern populations of Europe and the Americas, P(MS) ≈ 0.003. However, despite the notable consistency of these three estimates, each of these methods relates only to "diagnosed" MS in the general population (Z). If undiagnosed (i.e., pathological) MS is included in the calculation [46][47][48][49], this estimate may increase by as much as 50-100% (see #8 below).
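The three estimation methods can be placed side by side with a few lines of arithmetic. The sketch below uses only figures quoted in the text; the helper names are hypothetical, and the rescaling of the adult prevalence assumes (as the text's conversion implicitly does) that MS cases under 18 years of age are negligible.

```python
PER_100K = 100_000


def prevalence_per_100k(cases: int, population: int) -> float:
    """Crude prevalence per 100,000 population."""
    return cases / population * PER_100K


def rescale_to_whole_population(adult_prevalence: float, adult_share: float) -> float:
    """Convert an adult-only prevalence to a whole-population prevalence,
    assuming no cases occur outside the adult group."""
    return adult_prevalence * adult_share


if __name__ == "__main__":
    kaiser = prevalence_per_100k(4_352, 2_900_000)
    ahc_low, ahc_high = (rescale_to_whole_population(v, 0.75) for v in (288, 309))

    # Method 1: P(MS) lies between 1.7x and 2x the whole-population prevalence
    method_1 = (1.7 * kaiser / PER_100K, 2.0 * ahc_high / PER_100K)
    # Method 2: prevalence in the 45-54 year age-band (Swedish and AHC figures)
    method_2 = (304 / PER_100K, 337 / PER_100K)
    # Method 3: share of death certificates mentioning MS (British Columbia)
    method_3 = 0.28 / 100

    print(f"Kaiser prevalence: {kaiser:.0f} per 100,000")
    print(f"AHC whole-population prevalence: {ahc_low:.0f}-{ahc_high:.0f} per 100,000")
    print(f"Method 1: {method_1[0]:.4f} to {method_1[1]:.4f}")
    print(f"Method 2: {method_2[0]:.4f} to {method_2[1]:.4f}")
    print(f"Method 3: {method_3:.4f}")
```

All three ranges bracket or lie close to 0.003, which is the sense in which the methods agree.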
This is just the expected "adjusted" penetrance for the (MZ_MS) subset. As discussed earlier and, as developed in the S1 File (#1), {P(MS|IG_MS)} can be estimated from the difference in observed concordance rates between siblings and fraternal twins. Using the Canadian population-based data (Table 3; Fig 3) on the recurrence risks in non-twin siblings and DZ-twins (concordance rates for siblings = 2.9%; concordance rates for DZ-twins = 5.4%) to make this adjustment (see above) leads to the estimate of:

$$P(\mathrm{MS} \mid \mathrm{IG_{MS}}) \approx 0.134$$

Fig 3 (caption, in part). The value for P(H+) was provided by D. Sadovnick, was based on 400 Canadian controls, and the rate was confirmed in a large transplant database (personal communication). The F:M sex-ratio in the general population of Canada was taken from the 2010 Canadian census. Recurrence risks for monozygotic (MZ) twins, dizygotic (DZ) twins, and siblings (S) were taken from the study of Willer et al. [29]. The other summary data were taken from Table 3, and/or from the study of Willer et al. [29]. The F:M sex-ratio among Canadian MS patients at each of the 5-year time-periods (1941-1945 & 1976-1980) was taken from the study of Orton et al. [56]. https://doi.org/10.1371/journal.pone.0246157.g003

Proposition #1 (concluding items only).

$$\forall \{x > x'/2\}: \quad p < (2 - b^{2}s)/(a^{2}r - b^{2}s)$$

5. And finally, for more extreme non-unimodal distributions of (G), i.e., P(MS|G) < x'/2, the Lower Solution applies and: […]

Proof: For notational simplicity, as discussed previously, we use abbreviated terms for several parameters (see Table 2). Among the (n) individuals in the general population (Z), we have already defined the (G) subset, which consists of everyone who has any non-zero life-time probability of developing MS. Thus, each of the (m) individuals in the (G) subset (i = 1,2,...,m) has a unique genotype (G_i), such that:

$$\forall G_i \in G: \quad P(\mathrm{MS} \mid G_i) > 0$$

We define (see Table 2) the parameters, (x_i) and (x), such that:

$$x_i = P(\mathrm{MS} \mid G_i) \quad \text{and} \quad x = P(\mathrm{MS} \mid G)$$

Thus, (x_i) represents the expected penetrance for MS in the i-th individual of the (G) subset. Even if this penetrance exactly matches that of another person, (x_i) is still unique to the i-th individual. Also, considering the penetrance values for each of the members of the (G) subset, we can define the set {X} such that:

$$\{X\} = \{x_1, x_2, \ldots, x_m\}$$

Because the (G) subset forms a partition of the population (Z), each of the (m_2 = n−m) individuals, who are not in the (G) subset, belongs to the mutually exclusive "non-susceptible" subset (G−). Moreover, each of the (m_2) individuals in the (G−) subset (j = 1,2,...,m_2) has a unique genotype (G_j), which has a zero conditional life-time probability of developing MS, so that:

$$\forall G_j \in G{-}: \quad P(\mathrm{MS} \mid G_j) = 0 \quad \text{and, thus:} \quad P(\mathrm{MS} \mid G{-}) = 0$$

Also, if {Var(X) ≠ 0}, we can partition the subset (G) into two mutually exclusive sub-subsets, (G1) and (G2), suitably defined, such that the sub-subset (G1) has a penetrance greater than that of (G2).
Again, for ease of notation, we define the quantities (x', x_1, x_1', x_2, & x_2') as given in Table 2.

In earlier iterations of this analysis [3,4,51,52], we defined the subset (G) differently, i.e., ∀G_i ∈ G: P(MS|G_i) ≥ P(MS). We have chosen the current definition because it considerably simplifies the biological interpretation of the findings. Nevertheless, we note that, when circumstances fit the conditions of the Lower Solution (see below), the new sub-subset (G1) is, effectively, identical to the subset (G) defined earlier.
We define the term {P(MZ_MS)} to represent the life-time probability of developing MS for any single individual from an MZ twin-ship (i.e., where the status of their co-twin is unknown). Because identical twinning is considered non-hereditary [53], we expect that:

$$P(\mathrm{MZ_{MS}}) = P(\mathrm{MS})$$

As noted earlier, we also define the set {X} to consist of the individual MS-penetrance values for all members of the (G) subset. Thus, the variance (σ_X²) of the set {X} can be expressed as:

$$\sigma_X^{2} = \frac{1}{m} \sum_{i=1}^{m} (x_i - x)^{2}$$

It follows directly from the definitions of {P(G)} and {P(MS|IG_MS)} (see Methods & #2, above) that:

$$x' = P(\mathrm{MS} \mid \mathrm{IG_{MS}}) = P(\mathrm{MS} \mid G, \mathrm{IG_{MS}})$$

Therefore, the probability {P(MS,G_i|G,IG_MS)} can be re-written as:

$$P(\mathrm{MS}, G_i \mid G, \mathrm{IG_{MS}}) = x_i \cdot P(G_i \mid G, \mathrm{IG_{MS}}) \qquad (1)$$

In turn, the term P(G_i|G,IG_MS) can be re-written as:

$$P(G_i \mid G, \mathrm{IG_{MS}}) = x_i \cdot P(G_i \mid G) / x \qquad (2)$$

Combining these two Equations (i.e., 1 & 2 above) yields:

$$x' = \sum_{i=1}^{m} P(\mathrm{MS}, G_i \mid G, \mathrm{IG_{MS}}) = \frac{1}{x} \sum_{i=1}^{m} x_i^{2} \cdot P(G_i \mid G) = \frac{E(X^{2})}{x}$$

However:

$$E(X^{2}) = \sigma_X^{2} + x^{2}$$

Consequently:

$$x' = (\sigma_X^{2} + x^{2})/x$$

and, with rearrangement:

$$\sigma_X^{2} = x\,(x' - x)$$

Notably, this equation can also be rearranged to yield a quadratic in (x) of:

$$x^{2} - x'x + \sigma_X^{2} = 0$$

In turn, this quadratic equation can be solved to yield:

$$x = (x'/2) \pm \sqrt{(x'/2)^{2} - \sigma_X^{2}}$$

which has real, non-negative, solutions only for:

$$\sigma_X^{2} \le (x'/2)^{2}$$

The maximum variance for any distribution [54,55] on the closed interval [a,b] is:

$$\sigma^{2} \le (b - a)^{2}/4$$

Consequently, the maximum variance for the set {X} is identical to that for the interval [0,x'], which is:

$$\sigma_X^{2} \le (x'/2)^{2}$$

In addition, this maximum variance, (x'/2)², occurs when the distribution of penetrance values in the set {X} is bimodal [54,55], such that half the (G) subset has a penetrance of (0) and the other half has a penetrance of (x'). From this point of maximum variance, the variance of the {X} subset decreases both when: x→x' and: x>x'/2 (the Upper Solution) and when: x→0 and: x<x'/2 (the Lower Solution). By definition, any solution requiring {P(MS|G_i) = 0} for any portion of (G) is excluded. Therefore, the Upper Solution limits become: x'/2 < x ≤ x'. And the Lower Solution limits become: 0 < x < x'/2.

Moreover, from the definitions in Table 2, there are, notably, three other related equivalences (Eqs 1A, 1B, and 1C). Because, by definition, (x_1 > x_2) (see Methods), therefore, applying Eq 1B: x_1 > x > x_2; also: if: x_2 > x'/2; then: x > x'/2; and the distribution of {X} will conform to the Upper Solution (see above). Also, applying Eq 1C: if: (x_1' > x_2'); then: (x_1' > x' > x_2').

The Upper Solution. The Upper Solution, as: (x→x'), represents the gradual transition from a bimodal distribution to a unimodal distribution and, ultimately, to a distribution in which every genotype in (G) has exactly the same penetrance (i.e., x = x').
As noted earlier (above), the Upper Solution requires that: x'/2 < x ≤ x'. Alternatively, using the parameters (p, a, b, r, & s) defined in Table 2, the condition under which the Upper Solution applies is shown in the S1 File (#3b). Also, others [54] have demonstrated an upper limit on the variance of any unimodal distribution on the closed interval [a,b]. Substituting this limit into the upper quadratic solution (above), and assuming this limit applies equally to the set {X}, yields a lower bound on (x) of the form:

$$x \ge (x'/2) + \sqrt{(x'/2)^{2} - (\text{maximum unimodal variance})}$$

Consequently, in order for {X} to have a unimodal distribution, (x) must lie at or above this bound.

The Lower Solution. By contrast, the Lower Solution, as: (x→0), represents an increasingly asymmetric non-unimodal distribution of penetrance values within the (G) subset. Nevertheless, as noted above, all Lower Solutions require that: 0 < x < x'/2. Alternatively (as above), the condition under which the Lower Solution applies can be expressed using the parameters (p, a, b, r, & s) defined in Table 2. Because, by definition, (x_1 > x_2), and because we assume that sub-subsets (G1) and (G2) having different penetrances, considered separately, each conform to an Upper Solution (see Methods), therefore: x_1 > x_1'/2 and: x_2 > x_2'/2 and, consequently: x_1' < 2x_1 and: x_2' < 2x_2.
Therefore, for all Lower Solutions: x_1' > x_2' and from Eq 1C (above): x_1' > x' > x_2'. Notably, the values of {x', x_1', x_2', & P(MS)} represent observed population parameters (or are derived from observed parameters). As such, these values should be considered as "fixed" although, naturally, there is always the possibility of error in their observation.
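The link between the variance of {X} and the two solutions of the quadratic in Proposition #1 can be illustrated numerically. The sketch below is not the authors' code. It assumes that every genotype in (G) is equally weighted and that, given a genotype, the outcomes of the two MZ-twins are independent, so that x' = E(X²)/E(X); under these assumptions the variance satisfies σ² = x(x' − x), which is consistent with the statements above that the variance is maximal, (x'/2)², at x = x'/2 and vanishes when x = x'. All function names and the example penetrances are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def population_variance(values):
    centre = mean(values)
    return sum((v - centre) ** 2 for v in values) / len(values)


def adjusted_mz_recurrence(penetrances):
    """x' = E(X^2) / E(X): expected penetrance of the identical twin of a case
    (assumes equal genotype weights and independence given genotype)."""
    return mean([v * v for v in penetrances]) / mean(penetrances)


def quadratic_solutions(x_prime, variance):
    """Lower and upper roots of x^2 - x'x + variance = 0."""
    discriminant = (x_prime / 2) ** 2 - variance
    if discriminant < -1e-15:
        raise ValueError("variance exceeds the maximum (x'/2)^2")
    root = math.sqrt(max(discriminant, 0.0))  # clamp rounding noise
    return x_prime / 2 - root, x_prime / 2 + root


if __name__ == "__main__":
    penetrances = [0.05, 0.15]  # hypothetical two-genotype susceptible subset
    x = mean(penetrances)
    x_prime = adjusted_mz_recurrence(penetrances)
    variance = population_variance(penetrances)
    lower, upper = quadratic_solutions(x_prime, variance)
    which = "Upper" if x > x_prime / 2 else "Lower"
    print(f"x = {x:.4f}, x' = {x_prime:.4f}, variance = {variance:.6f}")
    print(f"roots: {lower:.4f} (lower), {upper:.4f} (upper); x matches the {which} Solution")
```

For this example x exceeds x'/2, so the observed penetrance coincides with the upper root, i.e., the distribution conforms to the Upper Solution.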
Breast cancer. As an example, it is instructive to apply this same analysis to the risk in women of developing breast cancer (described briefly in the Introduction). Clearly, this distribution is bimodal with <1% of women possessing the BRCA mutations, and with these individuals having 4-7 times the risk of breast cancer as that for everyone else. For this analysis, we assume that the subsets of women with (G1) and without (G2) BRCA mutations have a uniform penetrance within each subset. We will also use parameter values that conform to the known epidemiology of breast cancer in women (BC). Under these conditions, although, unlike MS, we don't have "observational" estimates for the adjusted MZ-twin recurrence risk (x'), the circumstances for breast cancer clearly conform to the Upper Solution of the quadratic equation (above). For example, if this recurrence risk were (~15%) then: {P(G) = 1} and: {x = 0.83 × x'}. In this case, the fact that the distribution is bimodal is confirmed by the fact that the value of (x) is below the lower limit for a unimodal distribution (see above). By contrast, if all breast cancers are, to some degree, genetic disorders {i.e., if: P(G)<1} then, as P(G) decreases, the value of (x) will increase. Nevertheless, the bimodality of the distribution will still be evident down to P(G) = 0.86. Below this point, however, the bimodal nature of the distribution will no longer be distinguishable (purely by consideration of the variance) from a unimodal distribution. Regardless, however, using these parameter values, the distribution would not actually become unimodal until the point at which: {x = x'}.
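The breast cancer figures quoted above can be reproduced with the relationship x = P(BC)/P(G). The sketch below assumes P(BC) = 0.125 (the life-time risk in women given in the Introduction) and the hypothetical recurrence risk x' = 0.15 used in the text; the function names are illustrative. It does not attempt to reproduce the threshold at which bimodality ceases to be evident, since that depends on the unimodal variance limit of reference [54].

```python
P_BC = 0.125    # life-time probability of breast cancer in women (from the text)
X_PRIME = 0.15  # hypothetical adjusted MZ-twin recurrence risk ("~15%" in the text)


def penetrance(p_disease: float, p_g: float) -> float:
    """Expected penetrance in the susceptible subset: x = P(disease) / P(G)."""
    return p_disease / p_g


def p_g_when_uniform(p_disease: float, x_prime: float) -> float:
    """P(G) at which x = x', i.e., every susceptible genotype has the same penetrance."""
    return p_disease / x_prime


if __name__ == "__main__":
    for p_g in (1.0, 0.95, 0.90, 0.86):
        ratio = penetrance(P_BC, p_g) / X_PRIME
        print(f"P(G) = {p_g:.2f} -> x = {ratio:.2f} * x'")
    print(f"x = x' when P(G) = {p_g_when_uniform(P_BC, X_PRIME):.3f}")
```

With P(G) = 1 the ratio x/x' is about 0.83, matching the value in the text, and it rises toward 1 as P(G) falls toward P(BC)/x'.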

Genetic susceptibility to MS-general considerations
4a. The Upper Solution. Conclusions: 0.022 ≤ P(G) ≤ 0.045

Argument: From the Upper Solution in Proposition #1 and in conjunction with our estimate from #2 (above) for {P(MS|IG_MS)}, it follows directly, that:

$$x'/2 < P(\mathrm{MS} \mid G) \le x'$$

We can then apply the relationship developed in the Methods that:

$$P(G) = P(\mathrm{MS}) / P(\mathrm{MS} \mid G)$$

With this we have all the data necessary to establish the limits for the percentage of the population who are members of the (G) subset. Thus, using this range for P(MS|G), together with our estimate for P(MS) (see #1 above), it follows that:

$$0.022 \le P(G) \le 0.045$$

Consequently, by this analysis, only 4.5% or less of the general population (Z) could possibly be genetically susceptible to getting MS and the remainder of the population would have no possibility of getting this condition, regardless of their environmental experiences. Multiple reports from other MS-populations throughout Europe and North America yield very similar Upper Solution estimates for P(G), which seem to be independent of latitude (Table 4).
Notably, we arrived at this estimate for {P(MS|IG_MS)} by adjusting the observed value of {P(MS|MZ_MS)} downward to account for the presumed impact of the shared IU and early post-natal environments of MZ-twins (see #2 above). To do this, we estimated the magnitude of this impact from the increased recurrence risk in DZ-twins compared to that in non-twin siblings (see Methods; see also #1, in S1 File). Although the Canadian data suggest a larger discrepancy between {P(MS|DZ_MS) and P(MS|S_MS)} compared to other studies [27][28][29][30][31][32][33][34,50], it is still possible that our adjustment is too small. Even so, there is a limit to how large any adjustment can be. Thus, from Fig 3, it must be the case that:

$$P(\mathrm{MS} \mid \mathrm{IG_{MS}}) > P(\mathrm{MS} \mid \mathrm{S_{MS}}) = 0.029$$

Otherwise, there would be no increased risk of MS in persons who have 100% of their genes in common and don't share their IU and post-natal environments compared to persons who have only 50% of their genes in common and also don't share their IU and post-natal environments. Importantly, however, even in this case:

$$P(G) < 0.003/(0.029/2) = 0.21$$

Therefore, even using this extreme estimate, the large majority of the population (>79%) would have no chance of getting MS, regardless of their environmental exposures (see Proposition #1).

Table 4 footnotes (fragment). * Per 100,000 population. The prevalence of MS for each region is taken from data provided in [42]. A range is given because, often, a range of estimates is available for a particular region. † Estimates are presented as proband-wise concordance rates [26]. Sometimes concordance was reported as a pair-wise rate and, in these cases, the estimates have been converted into proband-wise rates assuming random sampling of twin-pairs [26]. Nevertheless, in at least some reports [e.g., 32], this assumption is almost certainly …
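The Upper Solution limits on P(G) follow from a two-line calculation, sketched below. P(MS) = 0.003 and the sibling recurrence risk of 0.029 are taken from the text; the value x' = 0.134 is an assumption inferred from the text's use of x'/2 = 0.067 in the expression 0.003/0.067. The function name is hypothetical.

```python
def upper_solution_p_g_bounds(p_ms: float, x_prime: float):
    """Bounds on P(G) under the Upper Solution.

    Since x'/2 < P(MS|G) <= x' and P(G) = P(MS) / P(MS|G),
    it follows that P(MS)/x' <= P(G) < P(MS)/(x'/2).
    """
    if not 0.0 < x_prime <= 1.0:
        raise ValueError("x' must lie in (0, 1]")
    return p_ms / x_prime, p_ms / (x_prime / 2)


if __name__ == "__main__":
    P_MS = 0.003
    X_PRIME = 0.134       # assumption: implied by x'/2 = 0.067 in the text
    SIBLING_RISK = 0.029  # recurrence risk in non-twin siblings (Canadian data)

    lower, upper = upper_solution_p_g_bounds(P_MS, X_PRIME)
    print(f"{lower:.3f} <= P(G) < {upper:.3f}")

    # Extreme case: x' can be no smaller than the sibling recurrence risk
    _, extreme = upper_solution_p_g_bounds(P_MS, SIBLING_RISK)
    print(f"extreme upper limit: P(G) < {extreme:.2f}")
```

The first pair of bounds rounds to 0.022 and 0.045, and the extreme case rounds to 0.21, matching the values quoted in this section.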

Argument:
The considerations in #4a pertain only to an Upper Solution, and the observations from Canada regarding recurrence risks for the gender partition in MS make it clear that the set {X} is, at least, bimodal (see #5, below). Moreover, given the magnitude of the gender imbalance in the (G) subset, it seems possible that the distribution of {X} might conform to a Lower Solution. Such a circumstance may increase the upper limit for genetic susceptibility to MS from the 4.5% estimated in #4a (above). Nevertheless, even in this case, there are constraints on possible solutions. For example, because we are assuming that the sub-subsets (G1) and (G2), which have significantly different expected penetrance values, each conform (when considered separately) to an Upper Solution (see Methods), the application of Eq 1A (above), together with the fact that (x1 > x'/2) (see Proposition #1, above) and with our observational estimates for P(MS) and (x') (see #1 & #2, above), indicates that: or, with substitution: P(G1) < 0.003/0.067 = 0.045. Consequently, using these estimates, no more than 4.5% of the population can possibly be in the (G1) subset. In addition, we undertook an analysis that incorporated possible errors in these epidemiological observations. We iteratively assigned, to each input parameter {g, p, x', r, s, & P(MS)}, values that spanned their entire plausible ranges, solved Eqs 5a & 5b (see #3a, in S1 File) for the Lower Solution using the different parameter combinations, and determined which combinations satisfied the constraints placed by the epidemiological observations (see #3b, in S1 File). From this analysis we conclude that:  Table 3; Fig 3 &  #5, below). Indeed, this analysis demonstrated that: which is far removed from the actual observational data (Table 3; Fig 3).
It seems, therefore, that the circumstance of {P(G) = 1} is excluded, even for Lower Solutions, in all but the most extreme distributional circumstances and, thus, for the majority of the population, developing MS is not possible. In earlier iterations of this analysis [3,4,51,52], we defined the (G) subset differently, i.e., as ∀G i ∈ G: P(MS|G i) ≥ P(MS). Also, we note that, in the present analysis for Lower Solutions, our older definition effectively corresponds to defining only members of the (G1) subset as being genetically-susceptible to MS.

Genetic susceptibility in the gender partition-P(F│G) & P(M│G)
Conclusions: 1 Because the sub-subsets (G1) and (G2) have significantly different expected penetrances, we assume that each, considered separately, conforms to the Upper Solution (see Methods). Therefore, from the estimated adjustments for the similar environment of MZ-twins for this partition (see #1.1b, in S1 File), together with the data in Fig 3, it follows that: and: 0.017 < x2 = P(MS|M,G) ≤ 0.034 (Eq 2B). These possible ranges for men and women don't overlap. Therefore, for this partition, we have defined (above) the sub-subsets (G1) and (G2) correctly because: (x1 > x > x2) (see Methods). In this case: (a > 1 > b) and, as a consequence, P(G1|G,MS) must be greater than P(G1|G) (see #2a, in S1 File). The proportion of MS patients who are women from Table 3; Fig 3 is 66%. For the WTCCC data this number is 72%. From the study of Orton and colleagues [56] out of Canada, in the most recent epoch, the percentage of MS patients who are women is 76%. From a recent prevalence estimate for the United States [43], the percentage of women among MS patients is 74%. Using the data from Table 3 In fact, the gender imbalance may be even greater than this (see #4, in S1 File). Thus, there are four serious concerns about undertaking any calculations that use the limits for (x1) and (x2) set forth by Eqs 2A & 2B, above. First, in making the above calculation, we are positing an extreme and tri-modal distribution for the set {X}, i.e., not the unimodal or bimodal distributions under primary consideration. Thus, this calculation envisions a distribution in which half of the women have a uniform penetrance of slightly greater than zero and the other half have a uniform penetrance of (x1') (i.e., women have the maximum variance possible) and in which every man has exactly the same penetrance of (x2'), which is intermediate between these two extreme penetrance groups of women (i.e., men have a zero variance).
Second, such an extreme distribution seems unlikely, especially for circumstances in which partitioning the (G) subset by a different MS-associated characteristic, i.e., HLA-status (see #6, below), doesn't even give a hint of the bimodal nature of {X}. Third, it is not possible for the variance of penetrance values for the (F,G) subset to be at its maximum value. Thus, because (x1' > x') (see Table 3; Fig 3), the maximum variance for the sub-subset (F,G), which is (x1'/2)^2, exceeds the maximum total variance possible for the entire (G) subset, which is (x'/2)^2. Consequently, the lower limit for the value of (x1) in Eq 2A (i.e., at its maximum possible variance) must be too low. And fourth, some of the maximum possible variance in the {X} set must be accounted for just by the separation of (x1) from (x2) (see #4, in S1 File).
Following the standard development of variance relationships [57], and taking each of these factors into account (see #4, in S1 File), including all solutions (either Upper or Lower), in which the penetrance values of (G1) and (G2) each follow an Upper Solution, leads to the conclusion that: Importantly, however, if the distribution of {X} follows an Upper Solution, those limits still apply (see #4a, above) although the somewhat different estimates for P(G), in this circumstance, would need to be reconciled. Because the estimate derived from Table 3 G2) are distributed in a unimodal manner would also help (see #4, in S1 File), as would an underestimate (from Table 3; Fig 3) for the proportion of women among MS patients (see above, this section, see #8, below, & see #3b, in S1 File).
Regardless, however, it seems clear not only that genetic susceptibility is rare in the population, even for Lower Solutions, but also that men are more likely than women to be genetically susceptible to MS. At first pass, it might seem biologically improbable that men would be more likely than women to be in the genetically-susceptible subset (G). Thus, if membership in the (G) subset is envisioned as being due to an individual possessing a sufficient combination of some number of loci in a "susceptible state" [58], it is unclear how men could be more likely than women (or vice versa) to possess certain combinations and not others. This seems especially unlikely for circumstances where one association study, specifically focused on the X chromosome, failed to identify any susceptibility loci on this chromosome [7], where another large GWAS found that all but one of the 233 MS-associated loci were located on autosomal chromosomes [14], and where no major gender interaction term has been reported in the literature. Indeed, considering the different "risk" haplotypes in the HLA region identified in the WTCCC, men and women seem equally likely to be carriers [59]. Nevertheless, we can designate (G ak) to represent each of the (n) autosomal genotypes (k = 1,2,...,n) in the general population (Z), i.e., omitting any specification of gender. In this circumstance, it is entirely possible that:
∀G ak ∈ Z: P(G ak|M) = P(G ak|F) = 0.5 × P(G ak)
and, yet, for some specific autosomal genotypes to have the characteristic that:
P(G ak, M) ∈ G and: P(G ak, F) ∉ G
Indeed, such an explanation for the excess in susceptible men would fit well with the observation that the specific genetic combinations, which underlie susceptibility to MS, seem to be unique to each individual (see #9, below; see also #7, in S1 File).
In addition, such a circumstance might also help to rationalize the finding that men seem to have a lower threshold of environmental exposure for developing MS compared to women (see #7, below).
It is clear that (H+) status is considerably enriched in the MS population compared to controls. For example, in WTCCC controls {P(H+) = 0.23}, whereas in cases {P(H+|MS) = 0.50}. This enrichment of (H+) status in MS could occur in two ways (see #5, in S1 File). First, (H+) could make membership in the (G) subset more likely than it is for the (H−) subset, i.e., it is due to an impact on the ratio of: P(G|H+)/P(G|H−). Second, members of the (G,H+) subset may have a greater penetrance for MS than members of the (G,H−) subset, i.e., it is due to an impact on the ratio of: P(MS|G,H+)/P(MS|G,H−). The available epidemiological data (see #5, in S1 File) suggest that the majority of the enrichment is due to the first of these two possible mechanisms and that: P(G|H+) ≈ 3.35 × P(G|H−). In addition, the observation (from the Lower Solution) that less than 7.3% of the population is genetically susceptible (see #5; above), together with the WTCCC observation that P(H+) = 0.23, indicates that fewer than 32% (7.3/23) of (H+)-carriers are even genetically susceptible to MS. Indeed, taken together, the fact that only half of MS-patients are in the (H+) subset and the fact that this estimate for genetic susceptibility represents an upper bound for the Lower Solution indicate that the actual percentage of (H+) carriers who are genetically susceptible must be far less than this 32% figure. Nevertheless, essentially all of the conserved extended haplotypes (CEHs) that carry (H+), even those with a single representation in the WTCCC dataset, are associated with MS [60]. Therefore, it seems likely that all (H+)-carrying CEHs can contribute to genetic susceptibility. Despite this contribution, however, the majority of (H+) subset members have no chance whatsoever of developing MS.
Therefore, at least with respect to the (H+)-carrying CEHs, genetic susceptibility to MS must result from the combined effect of (H+) together with the effects of certain other (as yet, unidentified) genetic factors (see #7, in S1 File). By itself, however, (H+) membership poses no MS-risk.
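The (H+) arithmetic in this section can be checked with a short sketch. It computes three quantities: the relative risk P(MS|H+)/P(MS|H−) implied (via Bayes' rule) by the WTCCC carrier frequencies; the upper bound P(G|H+) ≤ P(G)/P(H+); and, as a purely illustrative assumption, the value of P(G|H+) that would result if all of that relative risk arose from differences in (G) membership. The function names are hypothetical and are not taken from the paper or its S1 File.

```python
def relative_risk(p_h_cases: float, p_h_controls: float) -> float:
    """P(MS|H+)/P(MS|H-) from carrier frequencies in cases and controls."""
    carriers = p_h_cases / p_h_controls
    non_carriers = (1 - p_h_cases) / (1 - p_h_controls)
    return carriers / non_carriers


def max_carrier_susceptibility(p_g_max: float, p_h: float) -> float:
    """Bound on P(G|H+): P(G,H+)/P(H+) cannot exceed P(G)/P(H+)."""
    return p_g_max / p_h


def carrier_susceptibility_given_ratio(p_g: float, p_h: float, k: float) -> float:
    """P(G|H+) when P(G|H+) = k * P(G|H-) is assumed, using
    P(G) = P(G|H+)*P(H+) + P(G|H-)*(1 - P(H+))."""
    return p_g / (p_h + (1 - p_h) / k)


rr = relative_risk(0.50, 0.23)                   # close to the 3.35 in the text
bound = max_carrier_susceptibility(0.073, 0.23)  # the "fewer than 32%" figure
refined = carrier_susceptibility_given_ratio(0.073, 0.23, 3.35)
print(round(rr, 2), round(bound, 3), round(refined, 3))
```

Under the stated assumption, the refined figure is roughly half of the 32% bound, which agrees with the statement that the true percentage must be well below that bound.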

Stochastic factors play an important role in MS pathogenesis
Argument: As noted in the Methods, we define (E T) to be the prevailing environmental conditions (whatever these are) experienced by the population during some time-period (T). We also define (E i) to be the specific environmental exposure, which is sufficient for MS to develop in the i th susceptible individual (however many events are involved, whenever these events need to act, and whatever these events might be); i.e., both of the events (E i) and (G i) need to occur jointly in order for MS to develop in the i th individual. Because genetic susceptibility is independent of the environmental conditions, the probability of a sufficient environmental exposure {P(E)} in the (G) subset at time-period (T) can be expressed as: When {P(E) = 0}, it is not possible for any susceptible person to experience an environment sufficient to cause MS. By contrast, when {P(E) = 1}, every susceptible person experiences an environment sufficient to cause MS. If there are some susceptible individuals for whom any environmental experience is sufficient to cause MS (i.e., these individuals have "purely genetic" MS), then: 0 < P(E) ≤ 1 and, thus, {P(E) = 0} cannot be observed. Importantly, those circumstances in which {P(E) = 0} only imply that, whatever environmental exposures take place (i.e., E T), these are insufficient to cause MS in anyone. Regardless, considering the definitions of both P(E) and the (G) subset (see Methods), it is clear that:

P(MS, G, E) = P(MS)
Notably, also, the above expression for P(E) explicitly incorporates the possibility that each genotype in (G) may require a unique set of environmental events in order for MS to develop in that individual. Nevertheless, despite this possibility, the existing epidemiological data suggests that many (or most) MS patients are responding to similar environmental events and, thus, any large variability in this regard is probably not a major factor in MS pathogenesis.
For example, despite the fact that every MS patient (except MZ-twins) has a unique combination of "states" at the (>200) susceptibility loci (see #7, in S1 File), the population-based data from Canada indicate that the changes in general environmental conditions (whatever these are), which have taken place between the time periods of (1941-1945) and (1976-1980), have produced, at a minimum, a 32% increase in the prevalence of MS (see #6d, in S1 File). Moreover, because this increase has occurred world-wide and predominantly in women [3,4,51,52,56], the (F:M) sex ratio for MS in Canada has increased during every 5-year increment except one between these two time-periods [56]. Over the entire interval, the ratio has increased from 2.2 in (1941-1945) to 3.2 in (1976-1980). These changes are far too rapid to be genetically based.
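The (F:M) ratios above translate directly into the proportions of women and men among MS patients at each time-period. The sketch below makes that conversion explicit; the function name is hypothetical and the inputs are the two Canadian ratios quoted in the text.

```python
def female_fraction(f_to_m_ratio: float) -> float:
    """Convert an (F:M) sex ratio into P(F|MS) = ratio / (1 + ratio)."""
    if f_to_m_ratio <= 0:
        raise ValueError("sex ratio must be positive")
    return f_to_m_ratio / (1 + f_to_m_ratio)


p_f_early = female_fraction(2.2)  # 1941-1945
p_f_late = female_fraction(3.2)   # 1976-1980

# Relative change in the share of each sex among patients between periods.
female_share_change = p_f_early / p_f_late
male_share_change = (1 - p_f_early) / (1 - p_f_late)

print(round(p_f_early, 4), round(p_f_late, 4))
print(round(female_share_change, 4), round(male_share_change, 4))
```

The later ratio of 3.2 corresponds to about 76% women among patients, which matches the Canadian figure cited in #5.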
It is conceivable that this observed sex-ratio change might be artifactual. For example, if women were more likely than men to have minimally symptomatic MS, then, with such patients now being diagnosed by our improved imaging and laboratory methods, women might represent a disproportionate number of these newly diagnosed cases. Alternatively, in earlier eras, vague symptoms of MS in women may have been written off as "non-organic" more often than they were in men. Nevertheless, four lines of evidence argue strongly against this change being an artifact. First, this increase in the sex ratio began before, and continued up to, the advent of modern imaging and laboratory methods [56]. Second, among asymptomatic individuals incidentally found to have MS by MRI, the (F:M) ratio is approximately the same as current estimates for symptomatic MS, and 80% of those with spinal cord lesions (i.e., the lesions having, by far, the greatest odds for progression to "clinical" MS) are women [61]. Third, if (as seems likely) women have a higher threshold for developing MS than men, this would require the difference in exposure between the genders to be one of degree, not one of kind (see below, this section; see also #6e, in S1 File). Finally, and most persuasively, the greater penetrance of MS in women is confirmed independently by the MZ-twin data (see #5 above). Consequently, the increase observed in the (F:M) sex ratio of Canada [56] almost certainly has an environmental basis.
In addition, a prior Epstein-Barr virus (EBV) infection seems to be a prerequisite for most (or all) genotypes in (G) to develop MS [3,4,51,52,62-64]. Indeed, if (as suggested by these studies) a prior EBV infection occurs in 100% of MS cases, this would indicate that EBV exposure can be designated as a 'necessary factor' and, as such, must be part of the causal pathway leading to MS [51]. In addition, the likelihood that members of the (G) subset will develop MS seems to be influenced greatly by vitamin D deficiency, latitude, migration, and the IU environment [3,4,51,52,62-64]. Each of these additional observations also indicates that similar environmental changes can affect a large proportion of genetically susceptible individuals in a similar manner (i.e., contribute to MS pathogenesis).
Using the standard methods of survival analysis [65], we can define the cumulative survival {S(u)} and failure {F(u)} functions as well as the hazard-rate functions {h(u)} and {g(u)} for developing MS at different environmental exposures in "susceptible" men and women (respectively). These hazard-rate functions are assumed (initially) to be proportional. The implications of non-proportionality are considered in S1 File (#6e) and in the legend of Fig 4. However, assuming proportionality, then: For men, we can transform exposure from (u) units into (a) units, first by defining {H(u)} to be the definite integral of the hazard-function {h(u)} from a (u) level of exposure to a (0) level of exposure and, second, by defining the (a) units to be: Because these (a) units are arbitrary, we can assign "1 unit" of environmental exposure in men to be the difference in exposure level between any two time points (e.g., a1 and a2) such that: For women, we can similarly transform exposure into a different scale of so-called "apparent" exposure units (a app) such that: a app = R × a and where we now define "1 unit" of environmental exposure (on this scale) as: The choice of which gender (men or women) to assign to which scale is completely arbitrary. A standard derivation from survival analysis methods [65] demonstrates that the survival curves are exponential with respect to their hazard functions.
Thus, for men: ln[S(u)] = − and, for women: So that, for men: S(a) = e^(−a) and: F(a) = 1 − e^(−a) and, for women: S(a app) = e^(−a app) and: F(a app) = 1 − e^(−a app). In considering the probability of failure (i.e., of developing MS), we will use subscripts (1) and (2) to denote the failure probabilities and the values of other parameters at the 1st and 2nd time-periods respectively. Importantly, unlike true survival (where everyone fails given a sufficient amount of time), the probability of developing MS may not become 100% as the probability of a sufficient environmental exposure increases to {P(E) = 1}. Moreover, the limiting value for the cumulative probability of developing MS in men (c) need not be the same as that in women (d). However, because the new definition of the subset (G) differs from earlier iterations of our analysis [3,4,51,52], the environmental exposure at which the development of MS becomes possible (i.e., the threshold) must occur at {P(E) = 0} for, at least, one of these two sub-subsets, provided that this exposure level is possible for either one or both of these 2 gender subgroups (see Fig 4, and above).
Fig 4 (legend, continued). Increasing the estimate of P(F|G) will reduce the separation of the response curves by lowering the plateau for women and raising it for men; increasing the estimate of P(F|MS)2 will increase the separation between the curves for men and women for the opposite reason; increasing the estimate of P(G) will reduce the plateaus of both response curves. Response curves for women under conditions (R = 0.67) and (R = 1.5) are also depicted and are shown in grey lines (dashed and dotted, respectively). Changes to the value of (C) will slightly alter the units of the y-axis. As seen in the Figure, men have a lower threshold for developing MS compared to women (see #7, Text), and changes to the value of (R) alter how quickly the curves reach their plateau (limit). If the hazard is not proportional, for women, each of the points (Zw1, Zw2, Zm1, and Zm2) would be the same as depicted for (R = 1), although the scale of the x-axis for the two exponential curves would be transformed non-linearly and, thus, the response-curve in men could not be plotted on the same graph as women. Moreover, the x-intercept for the curve in women would be at (a app = λw = 0). Nevertheless, the limiting values (c) and (d) would be unchanged and, under any circumstances, women (relative to men) would still be seen to exhibit a greater responsiveness to those changes in environmental exposure, which have taken place between the two time-periods. If some cases of MS were "purely genetic" (i.e., P(E|M) = 0, or P(E|F) = 0 or both were not possible), this could elevate the zero-point on the y-axis for "environmental" MS to the intersection of the curves for men and women (see Text) and this would make the threshold difference then disappear (i.e., λ = 0); see #7, Text. https://doi.org/10.1371/journal.pone.0246157.g004
From these definitions, the failure probability for susceptible women (Zw) and men (Zm) at the 1st time period is: and: By our definitions of "1 exposure unit", these equations, at the 2nd time point, become: and: Because the observations at time-periods (1) and (2) represent two points on the exponential response curves for both women and men, and because any two points on an exponential curve define the curve (both uniquely and completely), we can use the observations regarding the (F:M) sex-ratio change over time in Canada [56] to derive and construct these two response curves.
Thus, from the definition of {P(E)} and using {P(G,F) = P(F|G) × P(G) = 0.25 × 0.044 = 0.011} and {P(F|MS)2 = 0.66}, i.e., with P(F|G) and P(G) in the middle of their estimated ranges (see #5 above; see also #4, in S1 File) and with P(F|MS)2 taken from Table 3, we can estimate the values of (Zw2) and (Zm2) as: Moreover, as demonstrated in S1 File (#6a & #6d), we define the term (C) such that: and, thereby, re-express (Zw1) and (Zm1) in terms of (Zw2) and (Zm2) such that: Consequently, based on the population data from Canada, the prevalence of MS must have increased by more than 32% between these two time periods.
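One way to obtain values of this kind is sketched below. It assumes that (Zw2) and (Zm2) are the penetrances P(MS|F,G) and P(MS|M,G) at the 2nd time-period and that every MS patient belongs to (G), so that Bayes' rule gives P(MS|F,G) = P(F|MS) × P(MS)/P(F,G). The value P(MS) = 0.003 is an assumption carried over from the bounds in #4, and the function name is hypothetical; the numbers produced are illustrative and are not claimed to be the paper's own estimates.

```python
def penetrance_in_susceptible(p_sex_given_ms: float, p_ms: float,
                              p_sex_given_g: float, p_g: float) -> float:
    """P(MS | sex, G) = P(sex|MS) * P(MS) / [P(sex|G) * P(G)].

    Valid when P(G|MS) = 1, i.e. when every patient is in the (G) subset.
    """
    return p_sex_given_ms * p_ms / (p_sex_given_g * p_g)


P_MS = 0.003   # assumed population prevalence (value used in #4)
P_G = 0.044    # mid-range estimate quoted in the text
P_F_G = 0.25   # P(F|G), mid-range estimate quoted in the text
P_F_MS2 = 0.66  # P(F|MS) at the 2nd time-period (Table 3)

zw2 = penetrance_in_susceptible(P_F_MS2, P_MS, P_F_G, P_G)
zm2 = penetrance_in_susceptible(1 - P_F_MS2, P_MS, 1 - P_F_G, P_G)
print(round(zw2, 3), round(zm2, 3))
```

With these inputs, the penetrance in susceptible women comes out several times greater than that in susceptible men, even though women make up the smaller share of (G).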
Finally (see #6b, in S1 File), we can estimate the value of both (c) and (d) as:
c = (Zm2) × {1 − [P(M|MS)1/P(M|MS)2] × C × e^(−1)}/(1 − e^(−1))
and:
d = (Zw2) × {1 − [P(F|MS)1/P(F|MS)2] × C × e^(−1)}/(1 − e^(−1))
Thus, using the observed change in the (F:M) sex-ratio over time in Canada, together with our estimates for P(G) and P(F│G), we have all the data needed to construct the complete response curves for the probability of developing MS with a changing environmental exposure in genetically susceptible women and men (Fig 4). What these curves make clear is that both P(E) and P(MS) are changing over time, which indicates that specific environmental conditions, in addition to specific susceptible genetic combinations, are necessary for MS to develop. Thus, MS develops when the right genetic constitution is exposed to the right environmental conditions (i.e., it is fundamentally due to a gene-environment interaction).
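The expressions for (c) and (d) can be evaluated, and checked for internal consistency, with a brief sketch. It assumes that (C) is the ratio of MS prevalence at the 1st time-period to that at the 2nd, so that Z1 = Z2 × [P(sex|MS)1/P(sex|MS)2] × C, and that the two time-periods lie exactly one exposure unit apart on an exponential response curve. The input values (Z2 = 0.031 and 0.18, C = 0.70) and the function names are illustrative assumptions, not estimates taken from the paper.

```python
import math


def plateau(z2: float, share1: float, share2: float, c_ratio: float) -> float:
    """Limiting value: z2 * {1 - [share1/share2] * C * e^-1} / (1 - e^-1)."""
    e1 = math.exp(-1)
    return z2 * (1 - (share1 / share2) * c_ratio * e1) / (1 - e1)


def check_two_point_fit(z2: float, share1: float, share2: float,
                        c_ratio: float) -> float:
    """Rebuild Z2 from the plateau and Z1; should return z2 again."""
    limit = plateau(z2, share1, share2, c_ratio)
    z1 = z2 * (share1 / share2) * c_ratio
    a1 = -math.log(1 - z1 / limit)           # exposure above threshold, period 1
    return limit * (1 - math.exp(-(a1 + 1)))  # one unit later, period 2


# Shares of women among patients from the (F:M) ratios 2.2 and 3.2.
pf1, pf2 = 2.2 / 3.2, 3.2 / 4.2
C = 0.70  # assumed prevalence ratio P(MS)1/P(MS)2

c_men = plateau(0.031, 1 - pf1, 1 - pf2, C)
d_women = plateau(0.18, pf1, pf2, C)
print(round(c_men, 4), round(d_women, 4))
```

The consistency check confirms that a curve with the computed plateau passes through both observed points, which is the sense in which two points determine the exponential curve.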
Because the scales for the response-curves for women and men are initially assumed to be proportional, they can be plotted on the same graph (see Fig 4; see also #6e, in S1 File) and, when this is done, the threshold (x-intercept) occurs at {(a,Zm) = (λm,0)} for men and {(a,Zw) = (λw,0)} for women. By the definitions of (E) and (a), one of these thresholds must occur at {(a,Z) = (0,0)}, provided this exposure level is possible (see Fig 4 & above, same section). However, because these thresholds need not be the same, we define the difference in threshold between women and men as: (λ = λw − λm) such that, if women have a higher threshold than men: (λ > 0). However, as noted (above), the (a) scale (for men) may be different than the (a app) scale for women, so that plotting them on the same graph requires the conversion of (a app) units into (a) units (see #6c, in S1 File).
Three final points are also worth making. First, because, as demonstrated in S1 File (#6c), (λw) is independent of R, we can use the condition of (R = 1) to evaluate (λw). In this circumstance, these exponential equations can be re-arranged (#6c, in S1 File) to yield: Consequently, basic epidemiologic data can be used to determine the difference in threshold (λ) that exists between women and men. As demonstrated in S1 File (#6c), this leads to the two conclusions that: ∀C > 0.50: 0.37 < λ < 4.67; and: ∀C > 0: λ > 0. Moreover, the value of (λ) depends only upon the value of (C) and the sex-ratio change over time so that, if the hazards are proportional, men must have a lower threshold for developing MS compared to women (Fig 4; see also #6c, in S1 File). A lower threshold in men is also suggested by a report from Europe and the United States [66], which found that prior to 1922 men accounted for 58% of the MS cases (Table 5). By our definition of P(E), these thresholds indicate the exposure at which MS becomes possible. If women required a fundamentally different kind of exposure than men, it would be very hard to rationalize a difference in threshold because, in such a circumstance, in some environments, women would be more likely and, in other environments, less likely than men to receive the correct exposure. Rather, a difference in threshold implies that men and women are responding to similar events but that men require a less extreme degree of exposure in order to develop MS. For example, perhaps, susceptible men develop MS with a lesser degree of vitamin D deficiency or with EBV infection occurring over a broader age-range compared to susceptible women.
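The dependence of (λ) on (C) and the sex-ratio change can be illustrated numerically. The sketch below is a reconstruction under stated assumptions and is not the derivation given in S1 File: it takes (R = 1), treats each response curve as Z = plateau × {1 − e^(−(a − threshold))}, places the two time-periods one exposure unit apart, and reads (C) as the ratio of MS prevalence at the 1st time-period to that at the 2nd. Under those assumptions the plateau cancels, and λ depends only on (C) and the two (F:M) ratios. The function name is hypothetical.

```python
import math


def threshold_difference(fm1: float, fm2: float, c_ratio: float) -> float:
    """lambda = lambda_w - lambda_m for R = 1, from two (F:M) sex ratios."""
    pf1, pf2 = fm1 / (1 + fm1), fm2 / (1 + fm2)
    r_women = pf1 / pf2              # P(F|MS)1 / P(F|MS)2
    r_men = (1 - pf1) / (1 - pf2)    # P(M|MS)1 / P(M|MS)2
    if max(r_women, r_men) * c_ratio >= 1:
        raise ValueError("C too large: a Z1 would reach or exceed its Z2")
    e1 = math.exp(-1)

    def unfailed(r: float) -> float:
        # e^-(a1 - threshold) = 1 - Z1/plateau, after the plateau cancels
        return (1 - r * c_ratio) / (1 - r * c_ratio * e1)

    # Both curves are evaluated at the same a1, so the ratio isolates lambda.
    return math.log(unfailed(r_women) / unfailed(r_men))


lam_half = threshold_difference(2.2, 3.2, 0.50)
print(round(lam_half, 2))
```

With the Canadian ratios of 2.2 and 3.2 and C = 0.50, this reconstruction gives λ of about 0.37, which agrees with the lower limit quoted above, and it remains positive for smaller values of C.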
{NB: even if there were no threshold difference, proportionality, by itself, would suggest that the difference in exposure was one of degree but not kind.} Alternatively, there may be an environment-gender interaction such that susceptible men, in any given environment (i.e., E T), are more likely to experience a sufficient exposure than susceptible women. For example, perhaps men are more likely than women to engage in "risky" behaviors, or are more likely to be "sun-averse". Having said this, however, it is not clear how (or whether) "individual" differences in behavior (even if they are biologically driven) could lead to a "population-level" difference in threshold (Fig 4). More likely, any such interactions would have to be related to physiological differences between the genders.
Another possibility is that a small percentage of MS patients (in men or women or both) have "purely genetic" MS, whereby any environment is sufficient to cause MS, given their genotypes. Such a circumstance renders the points (λw = 0), or (λm = 0), or both unobservable, as drawn in Fig 4 (see above; same section). For example, in Fig 4, if ~1.8% of both susceptible men and women had "purely genetic" MS, this would raise the zero point of the y-axis for "environmental" MS such that this threshold difference would disappear (i.e., λ = 0) and both men and women would begin their "environmental" response at the new (0,0) point in Fig 4, i.e., at the point of intersection of the two curves. The same would be true if only men had this percentage of "purely genetic" MS except that, in this case, men would begin their "environmental" response at the point of intersection, i.e., at (Zm = 0.018), whereas women would begin at (Zw = 0), which would define the new onset point (0,0). If both had more than this percentage (or other combinations), the exact relationship between the curves at the start would change but, depending upon the exact situation, there still could be no difference in the threshold for "environmental" MS (Fig 4). Clearly, this example only applies to the specific conditions of Fig 4. Nevertheless, because (λ > 0) and because, at every exposure at or below the exposure at the point of intersection, (0 < Zm < Zm1), in this circumstance, only a small amount of "purely genetic" MS would be necessary to eliminate the threshold difference for every condition (see also #8 below).
Second, we note that: {P(MS|G,E,M) = c}, so that (Zm2) can be re-expressed as: These results strongly suggest that the relevant environmental exposures (especially when these are multiple) are currently occurring at population-wide levels. For example, if three equally likely and independent environmental events (EE1, EE2, and EE3), possibly sequential [51,52], were sufficient to produce MS in a susceptible individual, then: so that, under the stated circumstances, more than 94% of the population would experience each environmental event. Such a conclusion is fully consistent with the same conclusion reached from studies in adopted individuals, in siblings and half-siblings raised together or apart, in conjugal couples, and in brothers and sisters of different birth order, which have generally indicated that MS-risk is unaffected by the micro-environments of families but, rather, results from population-wide exposures [67-73]. And third, it is clear that both of these response curves plateau well below 100% failure, especially in men (Fig 4). Therefore, there must be stochastic processes that partially determine whether a susceptible individual with a sufficient environmental exposure will actually develop disease (see #9, below). As a sufficient environmental exposure {P(E)} becomes more likely, the quantity {P(MS│IG MS)} will, of necessity, change. Earlier, we described this term as having removed the impact of the shared IU and certain (especially early) post-natal environments of MZ-twins. This description, however, is not quite accurate. For example, we can break down a "sufficient" environmental exposure (see #1, in S1 File) into those factors that are shared exclusively by MZ-twins (E1), those factors that are shared by the population generally (E2), and those factors that are shared exclusively within the family micro-environment (E3).
As noted above, however, the family micro-environment seems not to have any impact on the likelihood of MS [67-73]. In this circumstance, assuming only factors (E1 and E2) are necessary for a sufficient exposure, then:

Conclusions
If an individual's identical twin is known to have MS, it is likely that this individual, also, has experienced a "sufficient" (E1) exposure.
Conceived of in this way, the term {P(MS│IG MS)} can be rewritten as: The reason for the inequality is that, in those circumstances where: it must be that both: P(E1) = 1 and: P(E2|E1) = 1. Naturally, the fact that {P(E1)} has increased to unity does not guarantee that {P(E2|E1)} has done the same, so that the limiting value for P(MS,E|G) may be greater than P(MS|MZ MS).
Nevertheless, if it is currently true (see #7 above) that: P(E) > 0.76 and, thus: P(E2|E1) > 0.76; then it must also be true that: {c ≥ P(MS|M,MZ MS) and: d ≥ P(MS|F,MZ MS)}. Regardless, however, the depicted curves (Fig 4) must be inaccurate because, in the Figure:  (Table 3; Fig  3). In this analysis, we found numerous combinations that matched these constraints. The solution space covered by these matching combinations included the full range of possibilities for the parameters of (C) and (R). By contrast, the ranges for both P(F|G) and P(G) were restricted: {0.33 ≤ P(F|G) ≤ 0.5} and {0.02 ≤ P(G) ≤ 0.055}. This restricted range for P(G) fits, generally, within the framework developed previously and confirms the conclusion that developing MS is not a possibility for a large majority of the population (see #4a & #4b above). Similarly, this analysis confirms that women are less likely than men to be in the (G) subset, although the estimated range for P(F|G) is somewhat higher than the ranges developed previously (see #5 above; see also #4 in S1 File). As discussed in S1 File (#4), however, this could relate to an underestimate for the parameter {P(MS|M,MZ MS)}, which is based upon only 2 observations of concordant male MZ-twins (Table 3).
Also, the 5 potential solutions for which {P(MS) ≤ 0.003} accounted for 11% of the total matching combinations. By contrast, the 5 potential solutions for which {P(MS) ≥ 0.004} accounted for 79% of the total. This circumstance suggests that we are under-estimating P(MS) when using the observed disease prevalence in the general population (Z). Indeed, several autopsy studies have indicated that the prevalence of undiagnosed (pathological) MS is ~0.1% [46-51]. Thus, with minimally symptomatic (or asymptomatic) MS occurring in as many as 0.1% of the population, this could potentially increase the estimated P(MS) by as much as 50−100%. Although such diagnostic errors are probably less common in the modern era, many minimally symptomatic (or asymptomatic) patients still go undiagnosed during life [59]. Moreover, any such under-ascertainment is likely to be less for MZ-twins, DZ-twins, and siblings than in the general population. For example, an initially unaffected twin or non-twin sibling of a patient with MS will, almost certainly, be more carefully monitored for possible MS symptoms (i.e., for minimally symptomatic presentations) than will an individual in the general population. In such a circumstance, these diagnostic failures will be fewer in the (MZ MS), (DZ MS), and (S MS) populations than in the general population and the MZ-twin concordance rates will, thus, provide a more accurate reflection of the maximum likelihood of getting MS {i.e., P(MS|G,E1,E2)} than will those estimates of P(MS) derived from the MS-prevalence in the general population. Such a circumstance would help to account for this apparent discrepancy.

Missing heritability?
Conclusions:
1. Both "genetic" and "environmental" factors are necessary for MS expression; neither alone is sufficient.
2. A large portion of the "causal pathway" to MS is stochastic.
3. There is no need to invoke any "missing heritability" in MS.
Argument: Only a small proportion of the population seems to be genetically susceptible to developing MS, which implies that MS is a genetic disorder. In addition, a suitable environmental exposure, like a suitable genetic constitution, is also a necessary part of MS pathogenesis. Despite this, however, the combination of a susceptible genotype together with a sufficient environmental exposure does not invariably lead to MS. In fact, the response curves in both women and, especially, men plateau well below 100% (Fig 4), even when everyone receives an environmental exposure suitable for their particular genotype (i.e., when {P(E) = 1}). This variance in the likelihood of getting MS for certain susceptible genotypes cannot be attributed to unidentified environmental conditions because the definition of the term {P(E)} (see #7 above) explicitly includes all such factors, whether they are known (or suspected) or completely unknown. Therefore, a large portion of the overall variance in MS disease-expression must be due to stochastic processes.
In this context, dividing the total variance in disease expression into genetic and environmental components, at least for MS, mischaracterizes the situation. This has important implications regarding current estimates for the "missing heritability" in MS [74][75][76]. First, as noted above, a large portion of the variability in MS expression must be due to stochastic processes that are neither environmental nor genetic. And second, specific gene-gene combinations (likely unique to individuals or very small groups of individuals) must underlie genetic susceptibility to MS (see #6 above; see also #7, in S1 File). Thus, with over 200 MS-associated loci [14], each (potentially) having more than one "susceptible state" (e.g., the MHC), the number of possible combinations of states at these loci is so large that, almost certainly, everyone (except MZ-twins) possesses a unique combination of these "susceptible states" (see #7, in S1 File). Indeed, considering (H+)-status together with only the first 102 of these MS-associated SNPs [13], everyone (including both cases and controls) in the WTCCC population does, in fact, possess a unique combination (#7, in S1 File). Consequently, if only a few such combinations are members of the (G) subset, even among those combinations that are quite similar to each other (see #6 above; see also #7, in S1 File), then there are more than enough genetic associations already identified to account fully for (G) subset membership. Naturally, many more loci may yet be identified, although positing their existence is unnecessary.
Alternatively, if "missing heritability" is only meant to imply that our genetic model cannot accurately predict the occurrence of MS, then, indeed, almost all of the heritability of MS remains unexplained. Thus, the environmental factors, the actual (as opposed to associated) genetic factors involved in causing disease, the necessary gene-gene combinations, the various gene-environment interactions, and the stochastic factors (all of which contribute importantly to whether MS can, or will, develop in a specific individual) are poorly understood, thereby making any accurate prediction of MS occurrence impossible at present.
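The combinatorial point made above (that more than 200 loci yield far more combinations of "susceptible states" than there are people) can be illustrated with a minimal Python sketch. It rests on simplifying assumptions that are not part of the analysis itself: each locus is treated as having only two states, loci are treated as independent with all combinations equally likely, and the world population is taken as roughly 8 billion. Real allele frequencies are skewed and loci are in linkage disequilibrium, so the figures indicate scale only; the function names are illustrative.

```python
def n_combinations(n_loci: int, states_per_locus: int = 2) -> int:
    """Number of distinct multi-locus combinations (exact, using Python integers)."""
    return states_per_locus ** n_loci


def shared_combination_bound(n_people: int, n_combos: int) -> float:
    """Union bound on the chance that any two people share a combination.

    Assumes every combination is equally likely (a strong simplification).
    """
    pairs = n_people * (n_people - 1) // 2
    return min(1.0, pairs / n_combos)


WORLD_POPULATION = 8_000_000_000  # assumed round figure

combos_200 = n_combinations(200)   # "over 200 MS-associated loci", two states each
combos_103 = n_combinations(103)   # (H+)-status plus the first 102 SNPs, two states each

print(f"200 loci: {combos_200:.2e} combinations")
print(f"103 loci: {combos_103:.2e} combinations")
print(f"bound on any shared combination worldwide (103 loci): "
      f"{shared_combination_bound(WORLD_POPULATION, combos_103):.1e}")
```

Even with only two states per locus, the number of combinations exceeds the assumed world population by many orders of magnitude, which is in keeping with the observation that every individual in the WTCCC population carries a unique combination.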

Discussion
The present analysis provides considerable insight into the nature and basis of susceptibility to MS and into the role of genetic determinants in polygenic diseases. Firstly, we establish that, fundamentally, MS pathogenesis requires both a genetic predisposition and a sufficient environmental exposure. Moreover, only a fraction of the population (less than 7.3%) is genetically susceptible. Consequently, more than 92.7% of the population has no chance of developing MS, regardless of the environmental conditions that these individuals experience. Thus, the correct genetic make-up is essential for disease pathogenesis. The basis of this genetic susceptibility, however, is complex. Single genes or single haplotypes, by themselves, do not contribute much. For example, in MS, the Class II HLA-DRB1*15:01~HLA-DQB1*06:02~a1, or (H+), haplotype is the genetic trait with the largest (by far) disease-association of any in the genome (for the WTCCC: OR = 3.28; p << 10^-300). Nevertheless, despite this strong association, more (and, likely, far more) than 68% of individuals who carry this haplotype have no MS-risk whatsoever. In this circumstance, it must be that genetic susceptibility depends upon the possession of this haplotype in combination with other genetic traits. Notably, this haplotype is only a part of much larger CEHs, which span the entire MHC region [23,24]. Even considering the large number and variety of these highly selected CEHs, however, genetic susceptibility cannot be explained on the basis of the state of the MHC. Despite a significant variability in the observed disease-association among the different (H+)-carrying CEHs, every such CEH (regardless of its rarity) seems to be strongly MS-associated [23,24].
In addition, it seems clear that, although certain genetic combinations increase the likelihood of (G) subset membership, the actual combinations that do this are quite heterogeneous, and only a small proportion of genetically susceptible individuals (who actually develop MS) share even the same 4-locus genetic combination (see #7, in S1 File). These observations also suggest that susceptibility to MS, although genetically based, is idiosyncratic.
Despite the conclusion that MS is genetic, however, MS is equally an environmental disease. Specific environmental exposures are also necessary for disease-pathogenesis. Indeed, the fact that there has been a marked recent increase in both MS-prevalence and the (F:M) sex-ratio, indicates that a sufficient environmental exposure is required for MS to develop (Fig 4). If a person is not exposed to a sufficient environment, they cannot get MS, regardless of their genetic make-up. However, neither environment nor genetics alone is sufficient. Rather, MS is due to an interaction between the two.
Several environmental events, probably sequential, seem necessary for MS to develop in a genetically susceptible individual [3,4,51,52,62-64]. The first environmental event, as discussed previously [51], is one that occurs during the intrauterine (IU) or early post-natal period. Support for such a factor comes from the discrepancy in recurrence-rates between twin and non-twin siblings, from the fact that concordant half-siblings are twice as likely to share the mother as the father, and from the periodic, circa-annum, effect that month-of-birth has on the subsequent likelihood of developing MS [51]. In the northern hemisphere, this periodicity in MS-susceptibility peaks just before the summer months and dips to its nadir just before winter, and this pattern is inverted in the southern hemisphere [51]. Each of these three observations implicates an environmental event, involved in MS pathogenesis, that occurs near birth [51]. The evidence for a circa-annum periodicity in susceptibility suggests that this event is coupled to the solar cycle [51].
A second environmental event is implied by the published migration data [51]. Thus, when an individual relocates (prior to ~15 years of age) from an area of high-prevalence to an area of low-prevalence (or vice versa), their MS risk is similar to that of the area to which they moved. By contrast, when they make the same relocation after this time, their MS risk seems to remain that of the area from which they moved. These observations implicate an environmental event, involved in MS-pathogenesis, which occurs at or around puberty [51]. And third, the clinical onset of MS generally occurs long after the first and second events have already taken place (Fig 1), suggesting that one or more additional environmental events are also required for clinical MS to develop.
Naturally, there is no guarantee that the environmental events that are sufficient to cause MS in one person are the same as those that are sufficient in another. Nevertheless, those factors or events that have been implicated in MS-pathogenesis so far appear to affect a large proportion of susceptible individuals in a similar manner. Indeed, the fact that we even have evidence for the first two factors (as described above) suggests this. In addition, a prior Epstein Barr viral (EBV) infection has been strongly linked to MS, especially when this infection results in symptomatic mononucleosis. Indeed, such an infection prior to clinical onset occurs in ~100% of MS cases [3,4,51,52,62-64] and, if this is the case, it would indicate that EBV exposure is a 'necessary factor' in the causal pathway leading to MS [51]. Finally, there is a considerable amount of circumstantial evidence that suggests a role for vitamin D deficiency in this causal pathway [51].
However, even when the correct genetic background occurs together with an environmental exposure sufficient to cause MS in someone of that background, more than 50% of such individuals will still not develop clinical disease. Some of these individuals, no doubt, will have subclinical disease [46][47][48][49][61]. However, although such a circumstance will increase our estimate of {P(MS)} by as much as 50-100%, this is still insufficient to get the plateaus of the response curves (Fig 4) to exceed the 50% mark. In men (who have a plateau significantly lower than that of women), this conclusion is even more evident (Fig 4). Consequently, because a sufficient environmental exposure has been defined broadly (to include factors that are known or suspected as well as factors that are completely unknown), the fact that some individuals with the proper combination of genes and environment still fail to develop disease indicates that stochastic processes are also involved in disease-pathogenesis.
And finally, it is worth noting that the nature of genetic susceptibility developed in this manuscript is applicable to a wide range of other complex polygenic disorders such as type-1 diabetes mellitus, celiac disease, and rheumatoid arthritis. Indeed, based solely upon Proposition #1, if the proband-wise MZ-twin concordance rate, for any disease, greatly exceeds the prevalence of disease in the general population, then only a tiny fraction of the population has any possibility of getting the illness. Moreover, any disease for which the proband-wise MZ-twin concordance rate is substantially less than 100% must, in addition to genetic susceptibility, include environmental factors, stochastic factors, or both in the causal pathway leading to the disease.