Numerical Integration of the Master Equation in Some Models of Stochastic Epidemiology

The processes by which disease spreads in a population of individuals are inherently stochastic. The master equation has proven to be a useful tool for modeling such processes. Unfortunately, solving the master equation analytically is possible only in limited cases (e.g., when the model is linear), and thus numerical procedures or approximation methods must be employed. Available approximation methods, such as the system size expansion method of van Kampen, may fail to provide reliable solutions, whereas current numerical approaches can induce appreciable computational cost. In this paper, we propose a new numerical technique for solving the master equation. Our method is based on a more informative stochastic process than the population process commonly used in the literature. By exploiting the structure of the master equation governing this process, we develop a novel technique for calculating the exact solution of the master equation – up to a desired precision – in certain models of stochastic epidemiology. We demonstrate the potential of our method by solving the master equation associated with the stochastic SIR epidemic model. MATLAB software that implements the methods discussed in this paper is freely available as Supporting Information S1.


Introduction
Stochasticity can play an important role when studying a disease that spreads through a population of individuals [1][2][3]. A common approach to modeling this problem is by means of a Markov process, whose probability distribution satisfies a deterministic differential equation known as the master equation. Solving the master equation analytically however is not in general possible and Monte Carlo sampling, based on the Gillespie algorithm [4], is often used to accomplish this goal. Unfortunately, accurate evaluation of the probability distribution of a Markov process requires a prohibitively large number of Monte Carlo samples for most systems of interest. As a consequence, Monte Carlo sampling is mostly used to estimate statistical summaries of the underlying stochastic population dynamics, such as means and variances.
To evaluate the solution of the master equation, a number of approximation techniques have been proposed in the literature, such as the system-size expansion method of van Kampen [1,5,6]. While approximations may work well in certain circumstances, they often fail when the underlying assumptions are not satisfied. The system-size expansion method for example can only produce a normal approximation to the solution of the master equation. Therefore, if the probability distribution of the population process is bimodal, then this method will produce erroneous results. Some effort has recently shifted away from Monte Carlo sampling and approximation techniques and has focused on exploiting the linear structure of the master equation associated with the population process. This results in a numerical solution to the master equation through matrix exponentiation; e.g., see [2,[7][8][9][10]. A popular technique along these lines employs a Krylov subspace approximation (KSA) method [7,8] that dramatically reduces the size of matrix exponentiation and results in an attractive iterative algorithm for solving the master equation. However, the KSA technique is based on several approximations, whose cumulative effect may appreciably affect the method's accuracy, numerical stability, and computational efficiency.
There are two main issues that can affect performance of the KSA method. One is choosing the dimension of the approximating Krylov subspace used. If the dimension is chosen too small, the method may produce an inaccurate solution to the master equation, whereas, a value that is too large can result in an appreciable decrease of computational efficiency. Unfortunately, there is no rigorous way to optimally determine an appropriate value for this parameter, which is chosen manually, even in advanced implementations such as Expokit [7]. Another issue is the fact that, at each step, the KSA method may not necessarily produce a probability vector (i.e., a vector composed of nonnegative elements that sum to one). This problem can be addressed by using a sufficiently small step-size, but this may seriously affect the method's computational efficiency. In practice, the KSA method is equipped with a heuristic step that zeros-out all negative values and re-normalizes the positive values so that they sum to one. This step however introduces its own errors, which may affect the quality of the approximation in a manner that is not easy to predict.
Instead of using the population process, we can describe the stochastic spread of a disease by a more informative stochastic process known as the degree-of-advancement (DA). Exploiting the structure of the master equation governing this process results in a novel numerical algorithm for calculating the exact solution of the master equation, up to a desired precision, which we refer to as the implicit Euler (IE) method. This technique enjoys several advantages over the KSA method: its global error is of first-order with respect to the stepsize, it is numerically stable regardless of the step-size used, and always produces a solution whose elements are nonnegative and sum to one. As we will discuss in this paper, the IE method shows great promise for solving certain problems in stochastic epidemiology in which the sample space associated with the DA process is reasonably sized. It is not however meant to replace the KSA method, which is still the best numerical method available for solving the master equation in problems where implementation of the IE method is not computationally attractive or possible. To illustrate the potential of the proposed IE method, we calculate the solution of the master equation associated with the stochastic SIR epidemic model and use this solution to study some important properties of this model.

Disease dynamics
The classical SIR epidemic model (without births, deaths, or imports of disease) is one of the simplest models in epidemiology [11]. Here, each individual in a population is either susceptible to a disease, infected, or recovered. If we denote by S, I, and R the susceptible, infected and recovered individuals, respectively, and by S(t), I(t) and R(t) their corresponding (and possibly random) population numbers, we can characterize the state of the SIR model at time t by using the 3|1 vector ½S(t) I(t) R(t) T , where T denotes vector transpose. The state depends on time due to the (possibly random) occurrences of the following two reactions: which model infection of a susceptible individual (first reaction) as well as recovery of an infected individual (second reaction). We can model a complex epidemiological system in more general terms by using the following reactions: where N :~f1,2, . . . ,Ng and M :~f1,2, . . . ,Mg. This model congregates individuals into N different groups, X 1 ,X 2 , . . . ,X N , which interact through M coupled reactions. Parameters n nm §0 and n' nm §0 are the stoichiometry coefficients of the m th reaction. These parameters tell us how individuals interact with each other as well as their status after occurrence of a particular reaction. For example, in the aforementioned SIR model, we may set X 1~S , X 2~I , X 3~R , resulting in n 11~n21~n22~n ' 32~1 ,n' 21~2 , with the remaining coefficients being zero. The usual way to characterize an epidemiological system is by means of the N|1 random vector X(t) with elements X n (t),n[N , where X n (t) denotes the population of the n th group of individuals present in the system at time t §0. By convention, we set X(0)~x(0), for some known value x(0) (i.e., we assume that we know the initial population numbers at time t~0). We refer to the multivariate stochastic process fX(t),tw0g as the population process.
Let Z m (t) be the (possibly random) number of times that the m th reaction occurs during the time interval ½0,t) and Z(t) be the M|1 random vector with elements Z m (t),m[M. Then, fZ m (t),tw0g is a counting process, known as the degree of advancement (DA) of the m th reaction [5]. We set Z m (0)~0 and refer to the multivariate stochastic process fZ(t),t §0g as the DA process. Note that where is the N|M net stoichiometry matrix of the system with elements s nm :~n' nm {n nm . Therefore, and for a given initial population vector x(0), equation (3) allows us to uniquely determine the population process X(t) from the DA process Z(t). However, we cannot in general determine the DA process from the population process. This can only be done when the nullity of is zero, in which case Z(t)~( T ) {1 T ½X(t){x(0) . As a consequence, the DA process is more informative than the population process.

The master equation
To model an epidemiological system governed by the reactions in (2), we must specify, for each m[M, the probability that one reaction m will occur within an infinitesimally small time interval ½t,tzdt). For most systems of interest, this probability is given by p m (x)dtzo(dt), where o(dt) is a term that goes to zero faster than dt and p m (x) is a (usually nonlinear) function of individual populations at time t, known as the propensity function of the m th reaction [12]. It turns out that the DA process fZ m (t),t §0g is a Markovian counting process with intensity p m (X(t)). As a result, it can be shown that the probability mass function p Z (z; t) of the DA process satisfies the following master equation [13,14]: for tw0, initialized by p z (z; 0)~d(z), with d(z) being the Kronecker delta function, where and e m is the m th column of the M|M identity matrix. In the theory of Markov processes, the master equation is a special case of the more general forward Kolmogorov equation [5]. We can use the solution p z (z; t) of the previous master equation to calculate the probability mass function p X (x; t) of the population process. Since we are dealing with discrete random variables, this calculation is straightforward: where B(x) :~fzDx~x(0)z zg. Therefore, by solving the master equation (4) [i.e., by calculating the probability mass function p z (z; t), for tw0], we can completely specify the dynamic properties of a Markovian model. However, and for most cases of interest, this is a notoriously difficult task, both analytically and computationally. In the following, we propose a promising numerical method to address this problem and illustrate its potential using a simple example based on the stochastic SIR epidemic model.

Exploiting structure
Most available algorithms for solving the master equation focus on the population process instead of the DA process. It turns out that, by using the DA process, we may reap some benefits that can lead to a simple numerical solver for the general master equation (4).
In the following, we assume that statistical analysis of an epidemiological model of interest is limited within a finite time interval T :~½0,t max , where t max is the maximum time for which the DA process is almost surely contained within an M-dimensional discrete and finite sample space Z; i.e., We index the elements in Z by z k , k~1,2, . . . ,K, where K is the cardinality of Z (i.e., the total number of elements in Z). We can then define the K|1 vector w(t) with elements w k (t)~p z (z k ; t), for k~1,2, . . . ,K. Clearly, w(t) specifies the probability mass function p z (z; t). It can be seen from (4) that w(t) can be calculated by solving the following system of K linear ordinary differential equations (ODEs): where is a K|K matrix that can be directly constructed from the master equation. In the theory of Markov processes, is known as the generator matrix. Note that the k th column of contains zeros in most places except for the k th element that takes value Therefore, the elements of each column of add to zero. Note also that equation (7) is initialized by a vector w(0) whose first element equals 1 (assuming that z 1~0 ), whereas, the remaining elements are all zero.
The main advantage of using the DA process Z(t) is that, under an appropriate ordering of the elements in Z, the generator matrix will be lower triangular. We will shortly demonstrate that this can result in substantial simplification of the numerical algorithm used to solve (7).
To obtain a matrix that is lower triangular, we must order the points z k in the sample space Z lexicographically, such that z k [z kz1 , for k~1,2, . . . ,K{1, where [ denotes that one variable is lexicographically smaller than another [e.g., (z 1 ,z 2 )[(z' 1 ,z' 2 ) if and only if z 1 vz' 1 or z 1~z ' 1 and z 2 vz' 2 ]. Because a reaction can only increase (by one) the value of a single element of z, it is not possible for probability mass to be transferred from z k' to z k when z k [z k' . Such monotonic transfer of probability does not generally occur when the population process X(t) is used. Therefore, when the points z k , k~1,2, . . . ,K, in Z are ordered lexicographically, the (k,k') element of matrix will be zero when k'wk and, therefore, will be lower triangular. An example is provided in Supporting Information S2.

Numerical solver
We now proceed by exploiting the three key structural characteristics of matrix : its stability, triangularity, and sparsity. We have noted that the diagonal elements of are non-positive. However, since is triangular, its diagonal elements will be the eigenvalues of . Thus, the linear constant coefficient system of ODEs given by (7) is stable, ensuring the efficacy of implicit ODE solvers [15]. As a consequence, we can use the implicit Euler method to estimate w(t) at discrete time points t j :~jt, j~1,2, . . ., for a given time step t. Then, given an estimateŵ w(t j{1 ) of w(t j{1 ), we can obtain an estimateŵ w(t j ) of w(t j ) by solving the following system of linear equations: where is the K|K identity matrix. By initializing this computation withŵ w(0)~w(0), we can therefore recursively calculate the values of the probability mass function p z (z; t) of the DA process at the discrete time points t j , j~1,2, . . .. In Supporting Information S2, we show that solving the previous system is always possible, for any tw0, due to the invertibility of matrix {t . We also show that this procedure always returns a probability vector for any stepsize t §0. Moreover, we demonstrate that the resulting method is a first-order solver, since the global error the global error is proportional to the step-size t). Finally, since the implicit Euler step is always stable for any choice of t [15], the errors from previous iterations will not be amplified in later stages, regardless of the step-size used. Therefore, a desired error can be achieved by simply reducing the value of the step-size t. We refer to the resulting technique for solving the master equation based on (8) as the implicit Euler (IE) method. In general, solving (8) would require O(K 3 ) computations, where K is the cardinality of the sample space Z, which will be prohibitive. However, since is a triangular matrix, we can use forward substitution whose cost is usually of O(K 2 ). But since is a sparse matrix, with each column having only Mz1 non-zero elements, forward substitution can be done at a cost of O(MK) [16], where M is the number of reactions. In addition, calculating the probability mass function at time t j requires storage of O(MK) nonzero numbers. In particular, we need to store MK nonzero elements of matrix {t as well as 2(K{1) elements of vectorsŵ w(t j ) and w w(t j{ 1 ) [note that the elements of each column of matrix {t and the elements of each of the two vectorsŵ w(t j ) andŵ w(t j{ 1 ) sum to one]. Since K&M, the computational and memory requirements of the IE method will be O(K), which grow linearly in terms of K.

Matrix exponentiation
Instead of the previous approach, we may attempt to solve the master equation governing the population process X(t) by a matrix exponentiation method [2,9]. Let X be an N-dimensional discrete and finite sample space such that If we index the elements in X by x l , l~1,2, . . . ,L, where L is the cardinality of X , then we can define the L|1 vector h(t) with elements h l (t)~p X (x l ; t), for l~1,2, . . . ,L. In this case, the probability mass function h(t) can be calculated by solving the following system of L linear ODEs: where is a sparse L|L matrix whose structure can be inferred directly from the master equation [9]. Note that (9) is initialized by a vector h(0) whose first element equals 1 [assuming that x 1~x (0)], whereas, the remaining elements are all zero.
We can obtain estimatesĥ h(t j ) of h(t j ), for j~1,2, . . ., by using the following recursion: initialized withĥ h(0)~h(0). The main issue with this equation however is the need to evaluate the matrix exponential exp(t ).
Although many techniques are currently available to do this job, they are not very satisfactory due to issues related to stability, accuracy, and computational efficiency [17]. The best method currently available to compute (10) is based on a Krylov subspace approximation (KSA) method [7,8,18]. In its simplest form, the method approximates h(tzt)~exp(t )h(t), for a small tw0, byĥ h(tzt)~DDh(t)DD 2 (t) expft (t)ge 1 , where (t) is an L|L 0 matrix whose columns form an orthonormal basis for the L 0 -dimensional Krylov subspace K(t) :~span fh(t),t h(t), . . . , (t ) L0{1 h(t)g, (t) is an L 0 |L 0 matrix computed during the well-known Arnoldi procedure used to calculate (t), and e 1 is the first column of the L 0 |L 0 identity matrix. Then, the value of h(t j ) is recursively approximated bŷ for j~1,2, . . ., withĥ h(0)~h(0). The KSA method reduces the problem of calculating the exponential of the large and sparse L|L matrix to the problem of calculating the exponential of the much smaller and dense L 0 |L 0 matrix (note that L 0 %L, with L 0~3 0-50 being sufficient for most applications). Computation of the reduced size problem can be done by standard methods, such as Chebyshev or Padé approximation [7,8,17].
As we mentioned in the Introduction, there are several disadvantages of using the KSA method: possible error accumulation that may lead to instabilities, inability to produce, at each step, a probability vector without heuristically modifying the calculated values, and a need to specify an appropriate dimensionality for the Krylov subspace without appreciably affecting computational efficiency while achieving acceptable numerical accuracy. These issues are nicely circumvented by the IE method, but with a price: the proposed method can be applied to a smaller set of problems than the KSA method.

Practical considerations
In general, the computational and memory requirements of matrix exponentiation grow quadratically in terms of the cardinality L of the sample space X , and can quickly become prohibitive for large values of L. The KSA method however can greatly reduce this expense to O(L 0 (MzL 0 )L) computations and O((MzL 0 )L) memory locations, where L 0 is the dimension of the approximating Krylov subspace used and M is the number of reactions; see our discussion in Supporting Information S2. Thus, the relative efficiency of the IE method, which requires O(MK) computation and storage cost, to the KSA approach will depend on the relative values of the cardinalities K and L of the sample spaces Z and X , respectively.
As we mentioned before, if the nullity of the net stoichiometry matrix is zero, then there is a one-to-one correspondence between x~x(0)z z and z. As a consequence of (6), the cardinalities of X and Z will be the same, in which case K~L. Under these circumstances, the IE method will outperform the KSA method. This is a consequence of the fact that MK~MLvL 0 (MzL 0 )L and MK~MLv(MzL 0 )L in this case. We can easily verify that, for the simple SI model (SzI?2I), the SIR epidemic model characterized by (1), and the SEIR model (SzI?EzI, E?I, I?R, where E denotes a group of individuals exposed to disease but not yet infectious), the nullity of is indeed zero and, therefore, the IE method will be superior to the KSA method.
In general, the IE method will be computationally superior to the KSA method, provided that the cardinality of the sample space Z is not appreciably larger than L 0 (MzL 0 )=M times the cardinality of the sample space X [or not much larger than (MzL 0 )=M times the cardinality of the sample space X , if we also consider memory requirements]. Of course, in situations where the nullity of is large, the sample space Z can become appreciably larger than X , in which case the KSA method will be more preferable. Note that there are cases in which Z and X can become infinite (e.g., suppose an influx of people at some constant rate 1?X n , in which case both sample spaces will be unbounded). In these situations, the use of a finite state projection approach [9] is required to reduce the sample spaces, and the relative efficiency of the two methods will depend on the sizes of the resulting subspaces.
For a given step-size t, the IE method described so far generates a sequence of probability vectorsŵ w(t j ), j~1,2, . . .. Assuming that the true solution w(t j{1 ) is known at time t j{1 , we can show [see equation (S.11) in Supporting Information S2] that the local error Ew(t j ){ŵ w(t j Dt j{1 )E 1 is of O(t 2 ), whereŵ w(t j Dt j{1 ) is the approximation of w(t j ) obtained by the IE method for a given value of w(t j{1 ). We can further improve this result by employing a powerful computational tool known as Richardson extrapolation [19].
We have shown in the Supporting Information S2 that, if w w t (t j Dt j{1 ) andŵ w t=2 (t j Dt j{1 ) are the approximations of w(t j ) obtained from w(t j{1 ) by the IE method with step-sizes t and t=2, respectively, thenŵ w Ã (t j Dt j{1 ) :~2ŵ w t=2 (t j Dt j{1 ){ŵ w t (t j Dt j{1 ) also approximates w(t j ), but with a local error of O(t 3 ). We therefore expect thatŵ w Ã (t j Dt j{1 ) is a better approximation to w(t j ) thanŵ w t (t j Dt j{1 ) [or evenŵ w t=2 (t j Dt j{1 ); see Supporting Information S2] for a sufficiently small step-size t. This suggests that we can use a valuable modification of the IE method to obtain a better approximation to the solution of the master equation than the original technique. This modification combines two runs of the IE method, with time steps t and t=2, and produces a solutionŵ w Ã (t j ), given bŷ where ½x min denotes the minimum value of the elements of vector x. In this case,ŵ w Ã (t j ) is given by the 'improved' vector 2ŵ w t=2 (t j ){ŵ w t (t j ) only when all elements of that vector are nonnegative. Otherwise,ŵ w Ã (t j ) is given by the vectorŵ w t=2 (t j ) calculated by the IE method with the smaller step-size t=2. This assures that w w Ã (t j ) is always a probability vector. We will be referring to the resulting technique as the Richardson-based implicit Euler (RIE) method. We illustrate one step of this method in Figure 1. Many ODE solvers, including the KSA method, adjust the stepsize at each iteration to assure that the local error ERR is less than a pre-specified error tolerance TOL while minimizing the computational effort required to accomplish this goal. We can also modify the RIE method to accommodate variable step-sizes. By following our analysis in the Supporting Information S2, we can approximately calculate the local error ERR j at step j by [see equation (S.16)] where we use a factor of 1:1 to compensate for the possibility that the true (but unknown) local error is larger (by 10%) than the actual error calculated by Eŵ w t=2 (t j ){ŵ w t (t j )E 1 . If ERR j vTOL, then we consider the step successful and increase the step-size from t to t Ã , where [see equation (S.17) However, if ERR j wTOL, then the step is unsuccessful. In this case, we decrease the step-size from t to t Ã by using (12) and redo the RIE step. We finally note that some readers might be concerned with precision loss in the forward substitution step of the IE and RIE methods. To address this issue, we could employ the standard numerical technique of iterative improvement [15] with moderate additional computational cost. However, we show in the Supporting Information S2 that the matrix {t being inverted is never singular. Moreover, this matrix is far from being singular (i.e., its eigenvalues are not numerically close to zero) as t?0. We therefore suggest that a preferable method of combating precision loss is to reduce t, since the step-size tightly regulates the global error as well (see Supporting Information S2). Although in the subsequent example we did not perform iterative improvement, the results indicate that any precision loss is negligible, despite the large dynamic range of probability values involved in the solution.

Results
To demonstrate the efficacy of our method, we tackle the problem of modeling a well-documented 1978 influenza epidemic in an English boarding school [20]. A deterministic SIR model was originally developed to analyze these data [21]. Subsequently, the model was extended to the stochastic case and approximately solved using van Kampen's system-size expansion method [1]. In the following, we use the IE method to compute the exact solution of the underlying master equation up to a desired precision.
There are three classes of individuals, S, I and R, representing Q~763 susceptible, infected and recovered pupils. Spreading of the epidemic is governed by the reactions in (1) with propensity functions p 1 (S(t),I(t),R(t))~k 1 S(t)I(t) and p 2 (S(t),I(t),R(t))~k 2 I(t), where k 1~0 :00218/day and k 2~0 :44036/day are the rate constants of infection and recovery, respectively [1]. The initial conditions are given by S(0)~762, I(0)~1, R(0)~0, reflecting the fact that only one pupil is infected at the start of the epidemic. We take the sample space Z to be the rectangular region in the z plane that begins at (0,0) and extends to include the maximal point (762,763). This is due to the fact that the first reaction can occur at most 762 times, after which all pupils will have been infected, whereas, the second reaction can occur at most 763 times, after which all pupils will have recovered from the infection. As a consequence, the sample space Z contains K~763|764~582,932 points.
Numerically solving the master equation over a period of 25 days by means of the KSA method using Expokit [7] took 72 minutes of CPU time on a 2.20 GHz Intel Mobile Core 2 Duo T7500 processor running Matlab 7.7. The resulting solution produces an L2 error DDh(25){ĥ h(25)DD 2~1 :48|10 {3 , where h is a solution of the master equation obtained by a stringent run of Expokit, which we consider to be the 'true' solution. The required TOL value (used to determine a desired error tolerance for the KSA method and for the RIE method with variable step-size) was set to 1|10 {3 . We obtained the Expokit solution by using a Krylov subspace approximation with dimension L 0~6 5. This value was determined by starting with the default value of L 0~3 0 and successively increasing it by 5 until the resulting Expokit error estimate was less than TOL~1|10 {3 . The reported L2 errors were calculated using a solution obtained by a computationally more expensive Expokit run with L 0~7 0 and TOL~1|10 {4 , which we consider it to be the 'true' solution. This is based on the premise that Expokit will produce the true solution for sufficiently large L 0 and small TOL.
To be compatible with Expokit, we report here the L2 error. Note however that the error analysis of our method, provided in the Supporting Information S2, is based on the L1 error. On the other hand, using equation (8) with t~0:01 days, the IE method took a mere 53 seconds of CPU time, achieving a smaller (by a factor of 2.8) final L2 error of 5:35|10 {4 . We can achieve a further reduction of the L2 error by using the RIE method with fixed step-size. This is clear from the results summarized in Table 1. This performance can be achieved however at the expense of increasing the CPU time required to calculate the solution. Note that we may be able to decrease the CPU time by using the RIE method with variable step-size (see Table 1). This method however results in a noticeable decrease of accuracy (at least for the example considered here), with an L2 error that is 2.8 Figure 1. One step of the RIE method. The upper branch implements the standard IE method with step-size t, whereas, the lower branch implements the IE method with step-size t=2. ''OR'' implements equation (11). doi:10.1371/journal.pone.0036160.g001 times larger than the one obtained with the KSA method. Since R(t)~Q{S(t){I(t), it suffices to focus on the joint probability mass function Pr½S(t),I(t) of susceptible and infected pupils. It turns out however that the epidemic-free state occurs with high probability Pr½S(t),I(t)~0, a situation that visually obscures the values of Pr½S(t),I(t). For this reason, instead of Pr½S(t),I(t), we depict in Figure 2 a snapshot of the calculated joint conditional probability mass function Pr½S(t),I(t)DI(t)w0 of the susceptible and infected pupils at the end of the 6th day, given that at least one pupil is infected. The Supporting Information S3 contains a .gif movie that depicts the dynamic evolutions of Pr½S(t),I(t)DI(t)w0 and Pr½R(t)DI(t)w0 during the 25 day period. We have obtained these and all subsequent results by exclusively using the basic IE method.
In Figure 3, we depict the dynamic profiles of the mean numbers of susceptible, infected and recovered pupils (solid green lines) as well as the dynamic profiles of the +1 standard deviations (dashed red lines), computed directly from the joint probability mass function Pr½S(t),I(t),R(t). We also depict the observed data (blue circles) obtained from the literature [20]. These results are identical to the results obtained by Monte Carlo estimation based on 1,000 trajectories sampled from the master equation using the Gillespie algorithm (only data related to the infected pupils are shown), and assures that the IE method produces the correct results. Unfortunately, we cannot employ the Gillespie algorithm to accurately estimate the joint probability mass function Pr½S(t),I(t),R(t) in a reasonable time, due to the prohibitively large number of samples required by this method.
The bimodal nature of the probability mass function depicted in Figure 2 clearly demonstrates that the system-size expansion method used previously in [1] is not appropriate for this model, since the method leads to a unimodal Gaussian approximation. As a matter of fact, the results depicted in Figure 3 are different than the mean and standard deviation profiles depicted in Figures 3-4 of [1]. Because of the Gaussian nature of the system-size expansion method, the results reported in [1] over-estimate the means and under-estimate the standard deviations, since this technique is blind to the bimodal nature of the probability distribution. As a matter of fact, using the means and standard deviations to characterize the stochastic properties of individual classes in the SIR model is not appropriate. This is also evident by the fact that the +1 standard deviations can take negative values as well as values greater than 763. In Figure 3, we have truncated these misleading values.
We can use the calculated joint probability mass functions Pr½S(t),I(t),R(t) to study a number of dynamic properties of the SIR model in a stochastic setting. In Figure 4(a), for example, we depict the evolution of the expected number of recovered pupils (solid green line), as well as the +1 standard deviations (dashed red lines), given that at least one pupil is always infected. During the first few days, few infections occur, and the expected number of recovered pupils will almost be zero. Subsequently, this number increases monotonically to 763, following a near sigmoidal curve. The +1 standard deviation curves and the evolution of the Fano factor (variance/mean) depicted in Figure 4(b) indicate that there is appreciable fluctuation in the number of recovered pupils during days 3-10, after which most pupils recover from the infection. According to the results depicted in Figure 4(b), the maximum fluctuation in the number of recovered pupils occurs during the 6th day.
In Figure 4(c), we depict the dynamic evolution of the calculated probability of extinction Pr½I(t)~0, tw0, during a period of 50 days. This evolution is characterized by four phases. During phase I (days 1-4), the probability of extinction increases rapidly from 0% to about 26%, due to the small number of infectious pupils. During phase II (days 5-17), the probability remains relatively constant to about 26%. During this period of time, the epidemic takes its natural course, increasingly infecting susceptible individuals, who eventually recover from the disease. As a consequence, we do not expect the probability of extinction to increase during this phase. On the other hand, during phase III (days 18-40), the number of infected pupils monotonically decreases to zero. It is therefore expected that, during this phase, the probability of extinction will monotonically increase to its maximum value of one. Finally, during phase IV (days 40-50), there is no infectious pupils present. As a result, the influenza virus cannot be transmitted to the remaining susceptible pupils and the epidemic ceases to exist.
When studying an epidemic model with extinction, a task of practical interest is to calculate the number of individuals that escape infection. This is usually done by evaluating the expected number e of individuals that escape infection (or the average number of susceptible individuals that remain after extinction) as the mean value of the stationary probability mass function Pr½S(?),I(?)~0 [2]. In Figure 4(d), we depict the joint probability Pr½S(50),I(50)~0 at time t~50 days, which we assume to be a very close approximation to the stationary probability mass function Pr½S(?),I(?)~0. By using this probability, we compute e^547. Note however that, due to the bimodal nature of Pr½S(50),I(50)~0, calculating e is misleading. On the other hand, by using the result depicted in Figure 4(d), we can confirm that there is a 73:35% chance that 40 pupils or less, and a 26:53% chance that 753 pupils or more, escape infection. Clearly, these 'confidence intervals' provide a more accurate statistical assessment of the number of individuals that escape infection than e. Interestingly, there is only 0:12% chance that the number of pupils escaping infection is within the range ½41,752, which includes the value of e.

Discussion
Modeling the stochastic dynamics of a disease that spreads through a small and well-mixed population of individuals is an increasingly important subject of modern epidemiology. Unfortunately, even for the simplest model, calculating the underlying probability distribution is a daunting task.
In an effort to address this problem, we have introduced in this paper a new approach to numerically compute the probability mass function of a Markovian population process governed by the master equation. Implementation of this approach is feasible when the number of possible states is not prohibitively large. In this case, the proposed method can lead to exact statistical analysis -up to a desired precision -of certain stochastic epidemiological models of interest, such as the SIR epidemic model.
The method introduced in this paper is linear -both in terms of memory and computational requirements -with respect to the cardinality K of the sample space Z of the degrees of advancement of the underlying reactions. As a consequence, the method is feasible any time Z is relatively small. In general, however, the cardinality of Z may grow arbitrarily large, making implementation of the method impossible without an appropriate FSP approximation [9]. Thus, the proposed technique is only applicable to models that constrain the number of reaction events, such as the SIR epidemic model considered in this paper, or models for which the number of reaction events is sufficiently small during a time period of interest (i.e., models without 'fast' reactions). Moreover, due to the well-known problem of the 'curse of dimensionality,' K grows exponentially with respect to the number of reactions M. Hence, models with many reactions cannot be solved by the proposed method.
An effort is currently underway to reduce the size of the sample space Z, without compromising accuracy. A plausible way to accomplish this goal is to reduce the number of reactions involved by removing 'fast' reactions using a multi-scale approximation technique, such as one of the techniques introduced for biochemical reaction systems [13,14,22], and to adaptively update Z at each time point t by confining it to the smallest possible subspace Z(t) of Z. Because of the lower-triangular and sparse nature of matrix in (8), it is also plausible that we employ optimized algorithms developed for solving sparse triangular systems of linear equations on parallel and distributed memory computer architectures [23], indicating that future efforts towards solving the master equation could potentially focus on using high-performance computing systems.
Finally, it was brought to our attention by one of the reviewers that, in an earlier work, K. N. Crank proposed a method to map a general Markovian population process on a countable sample space to an augmented Markovian process with triangular generator matrix [24] by appropriately ordering that space. Crank's technique can be easily combined with our implicit Euler method to construct an alternative algorithm for numerically solving the master equation with focus on the population process instead of the DA process. However, we cannot find any advantage of using Crank's approach over ours. We believe that an approach for numerically solving the master equation based on the DA process is more preferable than an approach based on the population process. The former can provide the probability distributions of both the DA and population processes, whereas, the latter can only produce the probability distribution of the population process. Moreover, the IE method based on the DA process is easier to implement, due primarily to a faster and more natural implementation of the lexicographic ordering used by this approach as opposed to the more complex ordering of the population sample space proposed by Crank. For more details on this issue, see our discussion in Supporting Information S2.

Supporting Information
Supporting Information S1 This file contains the MATLAB code used to generate the results presented in the paper.

(ZIP)
Supporting Information S2 This file contains additional information and proofs that elucidate various mathematical and numerical aspects of the work presented in the paper. It also provides a brief discussion of an alternative method for solving the master equation using the implicit Euler method, based on ordering the population sample space instead of the DA sample space. (PDF) Supporting Information S3 This file contains a video of the dynamic evolution of the joint conditional probability mass function of susceptible and infected pupils in an influenza epidemic predicted by the SIR model. (GIF)