A Simulation Optimization Approach to Epidemic Forecasting

Reliable forecasts of influenza can aid in the control of both seasonal and pandemic outbreaks. We introduce a simulation optimization (SIMOP) approach for forecasting the influenza epidemic curve. This study represents the final step of a project aimed at using a combination of simulation, classification, statistical and optimization techniques to forecast the epidemic curve and infer underlying model parameters during an influenza outbreak. The SIMOP procedure combines an individual-based model and the Nelder-Mead simplex optimization method. The method is used to forecast epidemics simulated over synthetic social networks representing Montgomery County in Virginia, Miami, Seattle and surrounding metropolitan regions. The results are presented for the first four weeks. Depending on the synthetic network, the peak time could be predicted within a 95% CI as early as seven weeks before the actual peak. The peak infected and total infected were also accurately forecasted for Montgomery County in Virginia within the forecasting period. Forecasting of the epidemic curve for both seasonal and pandemic influenza outbreaks is a complex problem, however this is a preliminary step and the results suggest that more can be achieved in this area.

Individual-based epidemiology models are used to study the effects of public health interventions and changes in individual behavior to the dynamics of infectious diseases [1]. A detailed description of the individual-based model used in simulating the outbreaks in this study is presented in [2]. The individual-based model is made up of a dynamic social contact network and a disease transmission model. The social contact network which is a synthetic representation of a particular geographical region is generated using United States Census data. The structure of the underlying network is important since interactions between agents influences how disease propagates during epidemic simulations. In this initial study, neither changes in individual behavior nor intervention strategies are explored. However, in later studies to better understand the implications of specific intervention strategies, we would forecast disease dynamics given that effective measures have been introduced to control the spread of an outbreak.
The construction of the dynamic social contact network requires two steps. In step 1 a synthetic population is created. Individuals in the synthetic population are tagged to realistic demographics while preserving the confidentiality of the original data sets. Each individual is assigned to a household using a decision tree based on demographic information (such as number of people in a household, number of children etc.) from data provided in SF3 and PUMA (Public Use Microdata Area) census files. Households are placed in realistic geographical regions.
The synthetic population is statistically identical to the census aggregated at the block level. More information can be found in [3], [4] and [5].
Step 2 assigns a daily activity list with specified durations (start and end times) to each household based on several thousand responses to an activity or time-use survey. The activity templates which vary by region given factors such as the geographical location and age composition of the population, are created based on the National
This process results in the assignment of a minute-by-minute schedule to each of the agents in the synthetic population. The contact between agents during activities results in a realistic contact graph G PL , where P is the set of people and L is the set of locations. If person p ∈ P visits location l ∈ L, there is an edge (p, l, label) ∈ E(G PL ) between them with type and duration of activity recorded as label [1]. Edges exist between individuals in the same activity location. Synthetic individuals mimic the behaviors of real people by participating in everyday activities such as socializing, shopping etc. and multiple edges can be used between each person and the locations representing their frequency of visits. See [6] and [7] for additional information. The modeling approaches used in the individual-based as presented in [1] are given in Table S1.
After generating the social contact network, a computational model is developed to represent disease progres- The probability of transmission with contact duration w(u, v) is given by: where r is the probability of disease transmission for a unit of contact time. Each infected node progresses through the succeeding transmission states where it stays for a defined period of time. The transition durations can be described using probability distributions based on the disease of interest. Several studies have been implemented to validate specific components of the model and the general approach. See [3], [16], and [13] for structural validity of these models.
To run a simulation experiment, a population (contact network), characteristics of a disease and initial conditions are specified. Each simulation is seeded with a randomly selected set of initially infected individuals and several realizations of the stochastic process of disease propagation are computed. Intervention options such as vaccination, antiviral and social distancing can be applied during the outbreak to control its propagation.

The Epidemic Parameter Search Problem
Let G(V, E) denote a contact network on a population V of size |V| = n. Each edge e = (u, v) ∈ E denotes contact between two nodes u, v ∈ V and is weighted by the duration of contact w(u, v). Each node u ∈ V is assumed to be in one of four health states -susceptible, exposed, infectious, or recovered -and the set of all health states of nodes in V at the start of the simulation is given by the n-dimensional vector X 0 .
The three disease parameters used in the network-based disease model are the disease transmissibility, the incubation period distribution, and the infectious period distribution. The transmissibility of the disease (S) is defined as the probability that a node in the infectious state in the contact network infects a neighboring susceptible node for each unit of contact time. An infected node in the contact network has an incubation period of i days and an infectious period of j days with probabilities p E i and p I j , respectively, where i = 1, · · · , t E max , and j = 1, · · · , t I max .
The upper bounds on the support of the incubation and infectious period distributions, t E max and t I max , are obtained from literature [13,21].
As part of the epidemic forecasting problem, we seek a set of disease parameters θ true that satisfies y(θ) ; p I 1 , p I 2 , · · · , p I t I max ; S) denotes the set of nonnegative real valued epidemic parameter values and α ∈ t denotes the epidemic curve observed up to time t for the set of disease parameters θ true . The outcome of y : d → t (d = t E max + t I max + 1) is stochastic and can be estimated using Y k (θ) = 1 k k i=1 z i (θ), where z : d → t is the simulated outcome of the network-based epidemic model for parameter θ, and the superscript i indicates the i th simulated replicate.
For a fixed contact network G(V, E), an initial health state X 0 of the population V, and α and Y k defined as above, the epidemic parameter search problem at time t can be stated as a simulation optimization problem as follows. Minimize We note that α here is only one instance of the stochastic event observed for the set of disease parameters θ true , but is considered as the expected epidemic outcome in the simulation optimization formulation. Also, for the purposes of this study we consider a Euclidean norm in the objective function. However, the problem formulation allows for its substitution with any other applicable norm. At every step of the algorithm, one of the above-mentioned operations is used to generate a new parameter set that replaces a vertex in the simplex representing parameters with the worst objective value. If no better parameter set is found, all vertices are drawn halfway toward the current best vertex. This requires new simulation runs to evaluate the objective function for the updated parameter set. The dimension of the polytope always remains the same; containing m + 1 vertices. The algorithm converges if the relative difference between the best and worst function values in the new polytope is less than the defined relative tolerance. The method is illustrated in Algorithm S1. Based on [22,23] best values for the Nelder-Mead parameters are ρ = 1 (reflection coefficient),
The modified version of the Nelder-Mead algorithm is also given in Algorithm S2. The algorithm proposes new values for the transmissibility, in addition to the probability values for the incubation and infectious period distributions. The probabilities must be non-negative and sum to one independently for the incubation and infectious periods.