The Separatrix Algorithm for Synthesis and Analysis of Stochastic Simulations with Applications in Disease Modeling

Decision makers in epidemiology and other disciplines are faced with the daunting challenge of designing interventions that will be successful with high probability and robust against a multitude of uncertainties. To facilitate the decision-making process in the context of a goal-oriented objective (e.g., eradicating polio by a target date), stochastic models can be used to map the probability of achieving the goal as a function of parameters. Each run of a stochastic model can be viewed as a Bernoulli trial in which "success" is returned if and only if the goal is achieved in simulation. However, each run can take a significant amount of time to complete, and many replicates are required to characterize each point in parameter space, so specialized algorithms are required to locate desirable interventions. To address this need, we present the Separatrix Algorithm, which strategically locates parameter combinations that are expected to achieve the goal with a user-specified probability of success (e.g., 95%). Technically, the algorithm iteratively combines density-corrected binary kernel regression with a novel information-gathering experiment design to produce results that are asymptotically correct and work well in practice. The Separatrix Algorithm is demonstrated on several test problems, and on a detailed individual-based simulation of malaria.


Introduction
The Separatrix Algorithm is composed of two primary sub-algorithms. The Separatrix Inference Algorithm estimates the distribution of the probability of success at one or more inference points. The Separatrix Interest Guided Design of Experiments chooses new simulation experiments to run so as to maximally gain information about the separatrix. Here, we provide a concise summary of each of these algorithmic components.

Separatrix Inference Algorithm
The inference portion of the Separatrix Algorithm can be concisely summarized as follows:
1. Compute the density of sample points at each sample point using (4).
2. Compute local sample point count at each inference point using (6).
3. Compute and apply the kernel.
(a) The kernel bandwidth comes from (5), replacing N with N(x), the local sample point count computed above.
(b) Compute the kernel, L_g, between sample and inference points. For all results presented in the paper, we have applied a squared exponential kernel here.
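The steps above can be sketched in code. This is an illustrative reconstruction, not the published implementation: the bandwidth rule standing in for equation (5), the density kernel, and all function names are assumptions, and the sketch is limited to one dimension for clarity.

```python
# Illustrative sketch of density-corrected binary kernel regression.
# The bandwidth rule and density estimator are assumptions standing in
# for the paper's equations (4)-(6).
import numpy as np

def squared_exponential_kernel(d, h):
    """Squared exponential kernel over distances d with bandwidth h."""
    return np.exp(-0.5 * (d / h) ** 2)

def infer_success_probability(x_samples, y_samples, x_inference, h_density=0.1):
    """Estimate P(success) at inference points from binary outcomes.

    x_samples   : (N,) parameter values of completed simulations
    y_samples   : (N,) binary outcomes (1 = goal achieved)
    x_inference : (M,) points at which to estimate P(success)
    """
    # Step 1: density of sample points at each sample point.
    d_ss = np.abs(x_samples[:, None] - x_samples[None, :])
    density = squared_exponential_kernel(d_ss, h_density).sum(axis=1)

    # Step 2: local sample point count N(x) at each inference point.
    d_is = np.abs(x_inference[:, None] - x_samples[None, :])
    local_count = squared_exponential_kernel(d_is, h_density).sum(axis=1)

    # Step 3a: bandwidth shrinks where data are dense (an assumed
    # Silverman-style rule, in place of the paper's equation (5)).
    h_local = h_density * local_count ** (-1.0 / 5.0)

    # Step 3b: kernel between sample and inference points, density-
    # corrected so clustered samples are not over-weighted.
    w = squared_exponential_kernel(d_is, h_local[:, None]) / density[None, :]
    return (w * y_samples[None, :]).sum(axis=1) / w.sum(axis=1)
```

In one dimension with a step-function ground truth, the estimate is near 0 on the failing side and near 1 on the succeeding side, with the transition width controlled by the locally adapted bandwidth.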

Separatrix Interest Guided Design of Experiments
The portion of the Separatrix Algorithm responsible for selecting M_BDOE new simulations to run for the next iteration, called igBDOE, can be described as follows:
1. Choose T test points and C = λM_BDOE candidate sample points: • For the first iteration, choose candidate and test points using Latin hypercube sampling (LHS).
• For subsequent iterations, load these points from the previous iteration.
2. Estimate the mode, f̂, at each candidate sample point using the inference algorithm.
3. Compute the baseline interest distribution at all test points using (12) and (13).
4. Compute the expectation of the average (over test points) KL divergence resulting from the addition of each candidate sample point. For each candidate sample point:
• Assume a TRUE outcome will be observed at the candidate and recompute the interest distribution at all test points; then compute the KL divergence with respect to the baseline.
• Assume a FALSE outcome will be observed at the candidate and recompute the interest distribution at all test points; then compute the KL divergence with respect to the baseline.
• Compute the expectation of the KL divergence as the convex combination of the above two divergences, weighted by f̂.
5. Select the best M_BDOE candidates, as ranked by the expected KL divergence.
6. Prepare and store test and candidate sample points for the next iteration:
(a) Compute the variance of the baseline interest distribution.
(b) Move the test and candidate points using Markov-chain Monte Carlo techniques so that the points become distributed according to the variance of the baseline interest distribution. Interpolate as needed.
(c) If desired, apply a Gaussian blur by displacing each point by a normally distributed amount.
(d) Save the modified test and candidate points for the next iteration.
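Step 6(b) can be illustrated with a simple Metropolis-style random walk that treats the variance of the baseline interest distribution as an unnormalized target density. This is a hedged sketch under that interpretation, not the authors' implementation; `variance_at` is a hypothetical stand-in for an interpolator built from the variance computed in step 6(a).

```python
# Sketch of step 6(b): redistribute points so they concentrate where the
# baseline interest distribution has high variance. `variance_at` is an
# assumed callable mapping point locations to (interpolated) variance.
import numpy as np

def redistribute_points(points, variance_at, n_steps=50, step_size=0.05,
                        bounds=(0.0, 1.0), rng=None):
    """Move points toward high-variance regions via Metropolis steps."""
    rng = np.random.default_rng() if rng is None else rng
    points = np.array(points, dtype=float)
    for _ in range(n_steps):
        proposal = points + rng.normal(0.0, step_size, size=points.shape)
        proposal = np.clip(proposal, *bounds)
        # Accept with probability min(1, variance(proposal)/variance(current)).
        accept = rng.uniform(size=len(points)) < (
            variance_at(proposal) / np.maximum(variance_at(points), 1e-12)
        )
        points[accept] = proposal[accept]
    return points
```

An optional Gaussian blur, as in step 6(c), would amount to one further `rng.normal` displacement of the returned points.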
The iteration is stopped when either the expected information gain is below a threshold or a maximum number of iterations is reached.
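Steps 4 and 5 of the procedure above reduce to ranking candidates by expected information gain. The following is a minimal sketch, assuming the recomputed interest distributions and mode estimates f̂ have already been obtained from the inference algorithm; the KL direction (new distribution relative to baseline) and all names are assumptions.

```python
# Hedged sketch of candidate ranking by expected KL divergence.
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions over the test points."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return np.sum(p * np.log(p / q))

def rank_candidates(baseline, interest_if_true, interest_if_false, f_hat, m_bdoe):
    """Return indices of the m_bdoe candidates with highest expected gain.

    baseline          : (T,) baseline interest distribution at test points
    interest_if_true  : (C, T) interest distribution if candidate c is TRUE
    interest_if_false : (C, T) interest distribution if candidate c is FALSE
    f_hat             : (C,) estimated success probability at each candidate
    """
    # Convex combination of the TRUE/FALSE divergences, weighted by f-hat.
    expected_gain = np.array([
        f_hat[c] * kl_divergence(interest_if_true[c], baseline)
        + (1.0 - f_hat[c]) * kl_divergence(interest_if_false[c], baseline)
        for c in range(len(f_hat))
    ])
    # Highest expected information gain first.
    return np.argsort(expected_gain)[::-1][:m_bdoe]
```

A candidate whose hypothetical outcome would leave the interest distribution unchanged contributes zero expected gain and is ranked last, which is what drives simulations toward the separatrix.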