Spatiotemporal Infectious Disease Modeling: A BME-SIR Approach

This paper is concerned with the modeling of infectious disease spread in a composite space-time domain under conditions of uncertainty. We focus on stochastic modeling that accounts for basic mechanisms of disease distribution and multi-sourced in situ uncertainties. Starting from the general formulation of population migration dynamics and the specification of transmission and recovery rates, the model studies the functional formulation of the evolution of the fractions of susceptible-infected-recovered individuals. The suggested approach is capable of: a) modeling population dynamics within and across localities, b) integrating the disease representation (i.e. susceptible-infected-recovered individuals) with observation time series at different geographical locations and other sources of information (e.g. hard and soft data, empirical relationships, secondary information), and c) generating predictions of disease spread and associated parameters in real time, while considering model and observation uncertainties. Key aspects of the proposed approach are illustrated by means of simulations (i.e. synthetic studies), and a real-world application using hand-foot-mouth disease (HFMD) data from China.


Introduction
Understanding infectious disease patterns (i.e. space-time variations and/or changes) has always been a challenging affair. Disease diffusion can vary significantly from place to place and from time to time for a number of reasons, including heterogeneity of the hosts and pathogens, physical and social environments, and interactions across space and time. Moreover, uncertainties linked to population movement and records of infected individuals can increase the difficulty of understanding the spatiotemporal spread of an infectious disease. A number of key studies have shown that infectious disease spread depends significantly upon the spatial features of a population [1][2][3][4][5] whereas major benefits of spatial disease modeling include the assessment of disease intervention and control strategies (e.g., border control and quarantine). Accordingly, several models have been proposed to quantify the spatial disease features at both population and individual scales [1,[6][7][8]. Among the best-known models are the gravity, the spatial micro-simulation, and the network models [1,6,9]. Most of these models focus primarily on interactions between the susceptible and infected populations across geographical locations, without considering the continuous local population dynamics of disease evolution. This is especially the case for the gravity model, where the geographical distribution and interaction patterns of populations are discretized into separated locations. Stochastic ''Susceptible-Infected-Recovered'' models (SIR, [10][11][12][13]) have been widely implemented to represent disease evolution of populations over time. Spatial metapopulation approaches extend SIR models to explicitly account for the local or global population movements between different geographical locations, in terms of patches or networks with deterministic or stochastic characteristics [14][15][16].
The present study proposes a realistic space-time extension of a purely temporal SIR model, i.e. metapopulation model, in the context of Bayesian maximum entropy (BME) theory [17,18]. The space-time BME-SIR model has certain attractive features: (1) it represents the population dynamics of infectious diseases within and across localities; (2) it takes into consideration the composite space-time variation of disease features; (3) it accounts for observation uncertainties (e.g., in the records of infected individuals); (4) in addition to the susceptible-infected-recovered disease dynamics, it integrates different sources of knowledge (e.g., hard and soft disease data together with epidemic models and physical laws); and (5) it updates the space-time model parameters in real time.

Results
Theoretical SIR model vs. simulated data This simulation (synthetic) study assumes an initial distribution of infected population fraction, X s,0 , in a gridded domain of size 20620 square cells with unit area. Subsequently, the space-time distributions of X s,t , Y s,t , and Z s,t are generated in terms of Monte-Carlo simulation using the SIR model in Eqs. (1a-c). In this study, the sptial variability of X s,0 is described by a covariance function of the exponential form p X (r)~p 0 e {3r=ar , where r denotes physical distance, p 0~1 and a r~5 . To describe population movement, we used a Gaussian kernel function, k s , with bandwidth b~0:5. In this simulation, the population portion that resides at a certain location and does not migrate is estimated as where r ij is the distance between grid points i and j. Equation (7) gives k ii~q~0 :682, i.e. approximately 70% of the population at each unit cell of the grid has residence time equal to the time step of the simulation. In this case, the recovery and transmission rates are assumed to be 0.1 and 0.4, respectively. The simulated spacetime distributions of the infected, X s,t , susceptible, Y s,t , and recovered, Z s,t , population fractions are plotted in Figures 1a-d for t = 5, 10, 20 and 30 (note the changing color scales). Figure 2 shows how the expected value of the ratio R c between the right-and left-handsides of Eq. (S2) (see File S1) varies with the distance Ds{s'D at different times t = 2, 8, 10 and 30. One sees that for all times t and distances Ds{s'D, R c is close to 1 and, hence, the constraint in Eq. (S2) is satisfied to a good approximation. That said, the SIR model in Eqs. (1) can be accurately described using a spatially homogeneous Q t -function.
Similar results have also been obtained for different values of the parameters p 0 , a r and b in the corresponding space-time disease covariance models [19]. As noted earlier, Q t is a monotonically decreasing function of time t. In addition, Q t should include at least two parameters that allow variation at t~0 and produce different rates of decay as t increases; and a third parameter so that at t~0, Q 0 [(0, 1), and at t??, Q ? §0. For example, one may choose the function Q t~l exp({ct e ), where the parameters l[(0,1) and c,ew0. Other possibilities also exist. Figure 3 shows empirical estimates of the function w t calculated in terms of 300 synthetic realizations of the SIR model in Eqs. (1ac). One sees that the obtained estimates are fitted well by the functional form Q t~l exp({ct e ) with parameters l~1:0006, c~6:6777|10 -6 , and e~3:7237. The theoretical covariance of the infected population fraction is calculated from Eq. (S5) (see File S1) using Q s, t~Qt~l exp({ct e ) with l~1:0006, c~6:6777| 10 -6 , and e~3:7237. Figure 4 compares the theoretical spatial correlation function of the SIR model at times t = 2, 8, 10, 20, and 30 and the associated empirical correlations calculated directly from the simulated infected distributions. Calculation of the former is based on the theoretical covariances obtained from Eq. (S5) (see File S1) using the aforementioned exponential form of the Q-function. The latter are empirical estimates of the normalized to max-1 covariance function at different times t, obtained through Monte Carlo simulation. For small times t,10, one observes a very good fit between the empirical and theoretical covariances across space. However, as t increases and for large distances Ds{s'D the deviations between simulated and theoretical covariances become larger. We have investigated the matter to some extent and found that this is due to tiny differences in the initial condition c X ;s,0;s',0 , which propagate over time.

SIR model sensitivity analysis
Next, simulations of the infected (X s,t ), susceptible (Y s,t ) and recovered (Z s,t ) population fractions are generated assuming several numerical values for the SIR parameters. Significant features of the respective temporal evolution curves are illustrated and discussed in relation to the different scenarios. Figure 5a presents a comparison of the temporal evolution of X s,t (solid lines), Y s,t (dashed lines), and Z s,t (pointed lines) at a certain location assuming different values of the probability of infection transmission b~0:1 (red color), 0.2 (blue), 0.3 (green); the probability of recovery is a~0:1, the population fraction that resides at the domain of interest is q~0:7, and the kernel bandwidth is b~0:5. In Fig. 5b, a synthetic representation is given on the simplex triangle X s,t zY s,t zZ s,t~1 (for a technical discussion, see [20,21]. The distance between dots corresponds to time intervals starting from the low right corner of the simplex triangle. Several intuitive results are quantitatively represented in Fig. 5a-b. Note that in Fig. 5a, for higher b values: (i) the maximum infected fraction is greater and it is reached at an earlier stage; (ii) accordingly, the reduction of the susceptible fraction is faster with time, and (iii) the increase of the recovered population fraction is faster. Moreover, as b increases, the limit over time of the susceptible population fraction, that is the population fraction which finally remains unaffected by the disease, tends to be closer to zero. This limiting behavior is more clearly visualized in Fig. 5b. Fig. 6a presents a comparison of the temporal evolution of X s,t (solid lines), Y s,t (dashed lines), and Z s,t (pointed lines) assuming different values of a~0:3 (red color), 0.4 (blue), 0.6 (green). The probability of infection transmission is b~0:4, q~0:7, and b~0:5. As before, Fig. 6b is a simplex triangle representation of Fig. 6a. Note that variation of a leads to the ''inverse'' SIR behavior than that of b in Fig. 5a-b. In Fig. 6a, for smaller values of a: (i) the maximum infected fraction is greater and it is reached at an earlier stage; (ii) the reduction of the susceptible fraction is faster with time, and (iii) the increase of the recovered population fraction is faster. In this case, the maximum (over time) of the susceptible population fraction tends to be closer to zero for smaller values of a. Note that for a = 0.6, more than half of the population remains free of the disease. Figure 7a presents a comparison of the temporal evolution of X s,t (solid lines), Y s,t (dashed lines), and Z s,t (pointed lines) at a certain location considering the purely temporal model (red color), and the spatiotemporal model for two different kernel bandwidths b~0:5 (blue) and 3 (green); with b~0:4, a~0:1, and q~0:7. As before, Figure 7b is a simplex triangle representation of Figure 7a. It is worth noting that independently of the value of b near all population becomes infected and the maximum infected fraction remains unaffected, although it is reached at an earlier stage for smaller spatial spreads (b~0:5 vs. b~3). Also, the reduction of the susceptible population fraction is slower with time for larger values of b (b~3 vs. b~0:5). The simplex triangle paths are similar for the three cases, but the SIR velocities are different as reflected in the corresponding inter-point distances. Figure 8a presents a comparison of the temporal evolution of X s,t (solid lines), Y s,t (dashed lines), and Z s,t (pointed lines) at a certain location, again considering the purely temporal model (red color), and the spatiotemporal model for two different values of   q~0 (blue), and 0.7 (green). The kernel bandwidth has been set to b~0:5, b~0:4, and a~0:1. As before, Figure 8b is a simplex triangle representation of Fig. 8a. Note that in Fig. 8a, for q~0:7 the increase of the infected population fraction (or equivalently the reduction of the susceptible fraction) is faster with time than for q~0. In addition, the maximum infected fraction remains the same, but it is reached faster in the purely temporal case (red color), and when a considerable fraction of the population resides within the domain of interest (q~0:7 vs. q = 0). Similar conclusions can be drawn from the study of the plots in Fig. 8b, with shape similarities translated into coincidental paths with different SIR velocities in the simplex domain.

A Study of Hand-Foot-Mouth Disease Data
In what follows, the theoretical space-time BME-SIR method is applied to a real-world study of the spread of hand-foot-mouth disease in China (HFMD; [22]). HFMD is the most common infectious disease in China [23], hence there is considerable interest in understanding the evolution of its spatiotemporal patterns and potential correlations to environmental factors. For example, Wang (2011) explores HFMD and climate associations across Eastern China [23]. The HFMD data was obtained from China Center of Disease Control (see File S1).
The study focuses on a specific set of Chinese counties with relatively higher disease incidence; in particular, we focus on the disease evolution in 145 counties that extend between 111uE to 118uE, and 32uN to 37uN (Fig. 9). The data are weekly-aggregated HFMD rates (cases of infecteds per 10000 people) over a period of 20 weeks that span from September 27-October 3, 2008 (t~1) to February 7-13, 2009 (t~20). In the example, we account for uncertainty in the data survey by assuming all observations to be uncertain measurements. We consider each rate as a randomly sampled value from a uniform distribution that is 1 unit wide. Observed rates that were reported to be exactly 0 are represented by soft uniform distributions with rates between [0, 1]. The soft intervals width selection is a conservative, arbitrary estimate on the basis of the recorded national average rates for HFMD (3. For initial conditions, the initial spatial spread of infecteds, X s,t~1 , is given by the observed rates at t~1. We start with no recovered individuals at t = 1. By considering an approximate disease duration of 1 week, the remaining part of the population are susceptibles to the disease. For the present illustration, we also assume the following: (i) Relocation occurs sparsely during the 20-week study period, and it is accounted for by means of a Gaussian kernel function k s of bandwidth b~0:1 that results in factor sizes k ii with a mean value k k ii~0 :9755 and a very skewed distribution towards high values (sample skewness 22.68). This means that on average, 97.55% of the population does not relocate during the study period.  (ii) Constant recovery a s,t~a and transmission b s,t~b probabilities with initial values a~0:1 and b~0:4, and variances s 2 a~0 :05 and s 2 b~0 :1. The covariances P(tz1Dt) at the subsequent instances are based on the initial covariance that is computed for the initial spatial distribution of X s,t~1 . The covariance at t = 1 was estimated from the observed values at that instance, and was fitted by a correlation model with a nugget effect equal to 0.07 (rate variance units) and a spherical model with sill 0.07 (rate variance units) and spatial range 3u.
On the basis of the above input, the BME-SIR method produces space-time distributions of the infected X s,t , susceptible Y s,t , and recovered Z s,t , population fractions of HFMD throughout the 20-week study period. At each consecutive time instance, the general knowledge (BME-SIR model) drives the model parameters a and b progressively closer to the values that best interpret the present HFMD dataset. This process is also guided by updating the model with new specificatory (data) information at every time step. Figure 10 illustrates how the predicted parameter values from the current HFMD data reach equilibrium. The BME-SIR model predicts an approximate mean transmission rate b&0:17, and an approximate mean recovery rate a&0:21. One observes that despite the arbitrary initial values of a and b, relatively accurate parameter estimates are reached relatively fast within about 2-4 weeks. For the scope of the present illustration, the above initial values have been  selected rather arbitrarily. In more elaborate examples, it might be desirable to provide better-informed initial estimates for these rates. In the absence of expert knowledge, one possible way to tackle such cases could be to use existing data to obtain SIRbased regression estimates for the initial values of a and b [24].
Also, maps of the BME-SIR predicted mean of the infected distributions X s,t are produced for each of the 20 weeks of this study. Figure 11 shows these means inside the region of interest at selected week instances, and Fig. 12 illustrates the corresponding prediction error for those instances. The prediction error throughout the study was found to range between 0.0067 and 0.2884. These values are comparable to the corresponding predicted values, and reflect that the BME-SIR predictions also account for the uncertainty in the HFMD observations. In summary, this real-world case study indicates that BME-SIR can provide an informative overview of the disease evolution. Also, this application shows how BME-SIR can be effectively used to estimate the disease spread based on highly uncertain data, without any distributional assumptions. The BME-SIR estimation can assimilate both theoretical disease diffusion dynamics and the uncertain disease space-time data. As a result, the characteristics of disease evolution can be revealed over time, even in cases when the disease data are highly uncertain.

Discussion
Characterizing space-time diffusion dynamics is a challenging effort due to complexities in population movement, disease transmission and recovery mechanisms, and uncertainties in observations. SIR models have for a long time been applied to study population-based disease diffusion at a specific site over time.
To account for spatial diffusion, studies have been focusing on integrating detailed geographical information into SIR models by using multiple patches or networks to characterize the population movements and interactions. The detailed geographical topology can possibly consider the spatial heterogeneity of disease transmission. Under the framework of SIR-extended models, space-time disease diffusion can be studied based on the knowledge of the parameters of disease dynamics inferred from data; e.g. transmission rate. However, in most cases, detailed and accurate information on population interactions is partially available. In addition, infectious disease data can be sparse and highly noisy and, therefore, the characteristics of disease dynamics are highly uncertain, especially at the initial stage of the disease outbreak. In this study, we account for uncertainties in the available data, as well as the unknown characteristics of disease dynamics, by proposing a spatiotemporal BME-SIR method of infectious disease spread. Based upon the SIR concept, the BME framework allows space-time disease modeling to account for patch-based population movements and multiple-sourced uncer-  tainties, including: 1) unknown prior knowledge of disease dynamics, i.e. transmission and recovery rates, and 2) uncertainties in disease data from direct or indirect observations.
To gain additional insight of the complete space-time SIR model, we progressively simplified the model to: (a) be expressed in a linear state-space form (i.e., using the Q-function approximation), and (b) be described by analytical solutions (i.e., static population assumption). Overall, the linearized SIR model showed good performance in reproducing the infected, susceptible and recovered population fractions, their empirical correlations (Fig. 4), and in inferring the transmission and recovery rates from data with low estimation error (see Fig. 13 and the error bars in Figures 14-15) and minimal computational effort. This makes the developed BME-SIR model an ideal framework for real-world application studies, where one needs to model the spread of infectious diseases in space-time, for different initial conditions, using a minimum number of parameters. The latter should suffice to reproduce the covariance structure of the susceptible, infected and recovered population fractions at different times over the whole simulation grid. The BME-SIR model effectively achieves this goal using only two model parameters (transmission and recovery rates), which can be easily inferred from data and, in more demanding studies, can vary systematically in both space and time.
For the purposes of disease control, real-time prediction of space-time disease spread is required by governmental agencies. For the cases of emerging infectious diseases, real-time prediction is essential due to higher risks and increased uncertainties in the infected cases and disease parameters, e.g. reproduction number. Modeling of the spatiotemporal patterns of emerging disease spread involves uncertainties from various sources, e.g. model uncertainty, parameter and data uncertainties. Data assimilation approaches can continuously incorporate new observations into the physical process, and has been widely used in a variety of applications, e.g. geosciences [25,26]; however, relatively few studies investigated the application of data assimilation approaches to infectious disease predictions [27]. Kalman filter is one of the most widely used data assimilation approaches for real-time prediction. It is based upon the state-space model and assumes the model and observation uncertainties are Gaussian-distributed. The BME-SIR method combines the linearized state-space model (i.e. general knowledge), with disease data with various levels of uncertainty (i.e. site-specific knowledge), to produce real-time disease estimates. Similar to the other data assimilation methods (e.g. Kalman filter), BME-SIR can update the model predictions, whenever new observations become available. The proposed spatiotemporal BME-SIR filtering framework can incorporate multi-sourced uncertainties (like exKF), and produce real-time disease estimates in a space-time domain. The distinction between exKF and BME-SIR methods is that BME-SIR can account for data uncertainties without any underlying distributional assumptions. Real-time estimates of infected population fractions as well as the transmission and recovery rates are shown in Figs. 14 and 15. Note that the estimates obtained by BME-SIR and exKF are consistently updated as new observations become available. The parameter values predicted by BME-SIR and exKF reach an equilibrium after about 10 weeks (see Fig. 10). Spatial maps of the predicted mean of the infected population fraction, X s,t , were produced on a spatial grid of 30625 = 750 nodes for each of the 20 weeks of the study; see Fig. 11.

The Space-Time Disease Model
Disease spread is a fundamentally spatiotemporal phenomenon, the rigorous study of which should account for a number of uncertainty sources (e.g. disease variability, imperfect observation conditions, population density fluctuations, physiographic features, meteorological matters). This constitutes sufficient motivation for extending the original SIR model in the space-time context under conditions of real world uncertainty. The distribution of the fraction of infected population is represented as a spatiotemporal random field X p~Xs,t [28,29], where p~(s,t) denotes a physical location with spatial coordinates s~(s 1 ,s 2 ) at time t. Similarly, Y p~Ys,t and Z p~Zs,t are random fields representing the distributions, respectively, of the fraction of the population that is susceptible to become infected and the fraction of the population that has recovered and is immune. The basic relationships between X p , Y p , Z p are, X p zY p zZ p~1 , Y s,0~1 {X s,0 and Z s,0~0 , where X s,0 , Y s,0 , and Z s,0 denote the initial conditions (IC) of the corresponding population fractions [30]. The proposed modeling of the combined space-time distributions X p , Y p , Z p is described by the following generalized SIR model in continuous time (for the discrete time case, see [19] and references therein), where q s,t is the population fraction that resides (i.e., does not displace from) at the space-time domain p~(s,t), 1{q s,t is the fraction that migrates during the time period t?tzdt, d s{u is the delta function, and k s{s',t is a spatially homogeneous kernel (e.g., Gaussian kernel with finite variance) that controls population movement across space, with spatial integral being equal to 1{q s,t . In addition, a' s,t is the rate [T {1 ] that an infected individual, at the space-time domain p~(s,t), recovers and becomes immune, and b' s,t is the corresponding rate [T {1 ] of infection transmission during an encounter of one infected and one susceptible individual. Note that a' s,t and b' s,t allow one to include information about regional topography and local climatic conditions. Moreover, under conditions of in situ disease control (quarantine, vaccination etc.), transmission and recovery rates are time-varying. The stochastic SIR model (1a-c) is, by construction, a composite space-time representation of disease spread. The space-time covariances of X p , Y p , Z p are derived from the SIR model (details in File S1).  (1a-c) In the case when q s,t , k s,t , a' s,t and b' s,t vary slowly with t (say *log t), or they are constant in time (i.e. q s,t~qs , k s,t~ks , a' s,t~a ' s and b' s,t~b ' s ), Eqs (1a-c) satisfy the following set of integrodifferential space-time SIR equations: where a s,t~a ' s,t dt[(0,1 is the probability that an infected individual, at p~(s,t), recovers and becomes immune, and b s,t~b ' s,t dt[(0,1 is the probability of infection transmission during an encounter of one infected and one susceptible individual. BME is a stochastic approach for spatiotemporal modeling and prediction in conditions of space-time heterogeneity and in-situ uncertainty [31]. BME disease modeling can rigorously integrate different disease knowledge bases, e.g. laws of disease evolution dynamics with available space-time disease datasets to provide informative and accurate predictions of disease spread. BME distinguishes between two major disease knowledge bases (KBs): (a) the core or general KB, G-KB, which includes physical and biological laws (e.g., the SIR model); and (b) the site-specific KB, S-KB, which includes hard or exact data and soft or uncertain data (e.g., observations of disease counts across space-time as exact numerical values or as interval and probability distributions of possible values). The BME method integrates both knowledge bases (i.e. K~G|S) in terms of the following fundamental BME equations [17,18] where the vector x denotes realizations of the distribution of X p in space-time (clearly, the equation could be written in terms of Y p and Z p realizations, as well), g is a vector of functions that represent stochastically the G-KB under consideration (the bar denotes statistical expectation), and m is a vector of coefficients that depend on the space-time coordinates (m is linked to g). BME is not constrained by assumptions commonly used in the literature, such as Gaussian probability distribution of disease attributes and linear estimator forms. In the present case, disease evolution is governed by the space-time SIR model (1a-c); therefore, g includes the mean, covariance and cross-covariance functions of X p , Y p , and Z p derived from the theoretical space-time SIR model (1a-c) to account for disease trends and correlation patterns in the population at different geographical locations and during different time periods. (More details about the space-time BME-SIR method will be given later.) f S represents the available S-KB, which can be direct or indirect disease observations across space and time in terms of fixed values or probabilistic distributions of disease attributes. A is a normalization parameter, and f K is the probability density function (pdf) of estimated disease counts at each space-time point (the subscript K means that f K is based on the blending of core and site-specific KBs). g and f S are the inputs to Eqs. (3a-c), whereas the unknowns m and f K vary from place to place and from time to time. Estimates of the unknown parameters in vector m are generally obtained by means of optimization techniques [32][33][34]. In this study, since the means and covariances of the space-time SIR model are used in the G-KB, the unknown parameters in m can be directly derived from analytical statistical physics formulas [17]. The estimation of f K at different spatial locations and temporal instances (i.e. space-time points) is based on operational Bayesian theory which does not require any distributional assumptions [35]. Another way to look at Eqs (3) is that they generate a stochastic solution of the integodifferential SIR equations (2a-c) thatcompared to the standard solutions of general integodifferential equations-has the unique feature to also account for several other kinds of available knowledge (hard and soft data, empirical relationships, secondary information) and multi-sourced uncertainties (in the composite space-time disease variation, the records of infected individuals etc.). In section IV, the BME-SIR disease dynamics model will be discussed together with certain simulation cases, in which the BME-SIR method fuses the SIR disease model (G-KB) and disease related observations and records (S-KB).

Special Cases of SIR Dynamics with Closed-Form Solutions
Valuable insight about the spatiotemporal SIR dynamics is gained by considering some special cases of the SIR model (1a-c). For example, if the space-time dependence of the infected (X p ) and the susceptible (Y p ) distributions satisfy certain separability conditions (which assure system linearity), a function Q s,t can be defined that has a smooth shape similar to that of the covariance function of X s,t and Y s,t (details in File S1). For example, Q s,t may be chosen to be a monotonically decreasing function of time t with sufficient flexibility to represent the behavior of the population fraction that is susceptible to infection; see also section VI below.
Let us start by assuming that during the time period of interest the population is static (i.e., it does not move in space) while the disease spreads (i.e., q s,t~1 ), and the infecteds IC (X s,0 ) are spatially homogeneous. Note that in the case when q s,t~1 , the time-independent kernel that controls population movement across space, k s,t , does not play any role and, hence, the integral terms in Eqs (1a-c) can be neglected. In this case, Eqs (1a-c) reduce to The mathematical expressions of the covariance and crosscovariance functions of the disease variables X p , Y p and Z p are shown in Table 1. One can see that all space-time disease covariance and cross-covariance functions: (a) depend on c X;s{s',0 (covariance between the ICs X s,0 and X s',0 ); and (b) are broadly non-homogeneous (in space) and non-stationary (in time). In the case when a' s,t~a ' s , b' s,t~b ' s ,and Q s,t~Qs are constant in time, the parameters A s,t , B s,t , and C s,t receive the following closed analytical forms ð6a À ÀcÞ Below we consider some numerical applications of the SIR model presented above.

SIR in the BME setting
In practice, spatiotemporal disease modeling is performed in uncertain conditions, e.g., erratic disease observations, incomplete prior knowledge of disease transmission and recovery rates [36]. In most cases of SIR modeling, a rather incomplete knowledge of the in situ susceptible and recovered population fractions is possible. The numerical study shows that when the disease observations are uncertain and follow a non-Gaussian law, the BME method in Eq.
(3)( [28,32,37,38]) can further improve updating in the SIR model by representing erratic observations in the form of probabilistic data and by incorporating transmission and recovery rate uncertainties. In the BME framework, the core or general KB (G) includes the SIR equations and the associated disease covariance models, whereas the site-specific KB (S) includes the uncertain infected observations and the initial conditions of the transmission and recovery rates. The SIR states are then formulated in the matrix form where X(t) and X(tz1Dt) are vectors containing the current and predicted states of infected disease counts and the rates of the SIR model; i.e. recovery rate a' and transmission rate b', respectively. A and A _ are the transition and Jacobian matrices characterizing the dynamics of the SIR model. W t models the uncertainty of infected states across space, which cannot be represented by SIR modeling, and is characterized by the covariance matrix Q t . The observation matrix H contains only zeros and ones indicating data presence across space. P(t), P(tz1Dt) and P(tz1Dtz1) are the current, predicted and updated state covariances. Equation (8) involve the general KB containing the stochastic properties of disease dynamics (details of the matrix formulation of Equation (S8) shown in File S1). Concerning the site-specific KB, for estimation purposes the hard (accurate) data are randomly sampled over time at 44 spatial cells of the disease grid mentioned earlier. In addition, soft (uncertain) data that follow uniform probability distributions with uncertain ranges are sampled from another set of 29 cells. The sample locations are shown in Fig. 13.
Numerical comparisons between the simulation results for model prediction and parameter estimation by the BME-SIR method are shown in Figs. 14, 15, 16. For comparison purposes, the results obtained using the extended Kalman filter (exKF) are also shown (technical details of the exKF SIR model can be found in [19]). Figure 14 shows that both methods predict almost equally well the infected population fractions at different times t (meansquare errors: 23.93 for BME-SIR and 28.92 for the exKF). Improvements in estimation uncertainty gained by using the BME-SIR over the exKF method are also shown. Similar results were obtained for the susceptible and recovered population fractions. Figs. 15, 16 demonstrate the performance of the two methods in estimating the transmission and recovery rates at different times t. One sees that both methods provide effective Table 1. Covariances and cross-covariances of X p , Y p and Z p . X s', t' Y s', t' Z s', t' X s,t A s,t A s,t c X;s{s',0 {A s,t (1zB s',t 0 )c X;s{s',0 A s,t C s',t' c X;s{s',0 Y s, t {A s',t' (1zB s,t )c X;s{s',0 (1zB s,t )(1zB s',t 0 )c X;s{s',0 {(1zB s,t )C s',t 0 c X;s{s',0 Z s, t A s',t' C s,t c X;s{s',0 {(1zB s',t 0 )C s,t c X;s{s',0 C s,t C s',t 0 c X;s{s',0 doi:10.1371/journal.pone.0072168.t001 Figure 16. Comparison between simulated and estimated transmission rates using the exKF and BME-SIR methods. doi:10.1371/journal.pone.0072168.g016 (5a-c) (6a-c) estimates of the SIR recovery rates at all times, but the corresponding estimates of the transmission rates for large times t (i.e., t.50) are poor. The changes in the recovery and transmission rates show some interesting temporal patterns. When t is small (e.g., t,10), the estimated rates are closely associated with their initial guess and therefore the deviations of both the BME-SIR and exKF rate estimates are large. The improvement of rate estimation is shown over time. The transmission rate estimation accuracy obtained by both methods is low when t.40. This is due to the low portion of susceptibles after time t = 40 (i.e., the percentage of susceptible population after t.40 is less than 3%), which yields transmission rate estimates insensitive to observations. However, even in the case of low infected population fractions, both the BME-SIR and exKF methods produce accurate recovery rate estimates. Clearly, the SIR model is more sensitive to changes in the recovery rate rather than the transmission rate (Figs. 15-16). As a result, real-time data assimilation should lead to better estimates of real-time transmission rates.

Supporting Information
File S1 Supplementary materials.