Bayesian History Matching of Complex Infectious Disease Models Using Emulation: A Tutorial and a Case Study on HIV in Uganda

Advances in scientific computing have allowed the development of complex models that are routinely applied to problems in disease epidemiology, public health and decision making. The utility of these models depends in part on how well they can reproduce empirical data. However, fitting such models to real-world data is greatly hindered both by large numbers of input and output parameters and by long run times, such that many modelling studies lack a formal calibration methodology. We present a novel method that has the potential to improve the calibration of complex infectious disease models (hereafter called simulators). We present this in the form of a tutorial and a case study in which we history match a dynamic, event-driven, individual-based stochastic HIV simulator, using extensive demographic, behavioural and epidemiological data available from Uganda. The tutorial describes history matching and emulation. History matching is an iterative procedure that reduces the simulator's input space by identifying and discarding areas that are unlikely to provide a good match to the empirical data. History matching relies on the computational efficiency of a Bayesian representation of the simulator, known as an emulator. Emulators mimic the simulator's behaviour, but are often several orders of magnitude faster to evaluate. In the case study, we use a 22-input simulator, fitting its 18 outputs simultaneously. After 9 iterations of history matching, a non-implausible region of the simulator's input space was identified that was times smaller than the original input space. Simulator evaluations made within this region were found to have a 65% probability of fitting all 18 outputs. History matching and emulation are useful additions to the toolbox of infectious disease modellers. Further research is required to explicitly address the stochastic nature of the simulator, as well as to account for correlations between outputs.

u(x) \sim \mathrm{GP}(0, c(x, x')). \quad (1)

The term c(x, x') is known as the covariance function and plays a fundamental role in Gaussian processes, as it determines the association between u(x) and u(x') based on the distance between x and x'. If we denote this distance by W, the covariance function can take a multitude of forms; some of the most common are shown in Table 1. Covariance functions cannot take an arbitrary form, as they have to satisfy the constraint of positive definiteness (see chapter 4 of [1]).

Correlation function    Formula
Gaussian                \exp(-W^2)
Exponential             \exp(-W)

Table 1. Some common correlation functions c(x, x'), with W denoting a weighted distance between x and x'.

Suppose that we have two p-dimensional points x = [x_1, x_2, \ldots, x_p] and x' = [x'_1, x'_2, \ldots, x'_p]. Their weighted distance is defined as

W = \sqrt{\sum_{i=1}^{p} \frac{(x_i - x'_i)^2}{\delta_i^2}}. \quad (2)

The parameters \delta = [\delta_1, \delta_2, \ldots, \delta_p] are known as correlation lengths and weight the distance between x and x'. A large \delta_i implies that u(x) and u(x') will be strongly correlated along the i-th dimension of x, and vice versa. Suppose now that we observe the Gaussian process at n discrete points D = \{x_i : i = 1, \ldots, n\}, giving the vector of observations u(D) = [u(x_1), u(x_2), \ldots, u(x_n)]^T. D and u(D) here represent the emulator's training points. According to the definition of a Gaussian process, the observations u(D) follow a joint Gaussian distribution,

u(D) \mid \delta \sim N(0, \tilde{A}), \quad (3)

where \tilde{A} is the covariance matrix of the design points u(D), defined as \tilde{A}_{ij} = c(x_i, x_j). The distribution of u(D) is conditioned on the covariance-function hyperparameters \delta. One way of training the emulator is to search for the value of \delta that maximises equation (3), i.e.

\hat{\delta} = \arg\max_{\delta} \, p(u(D) \mid \delta).
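As a concrete illustration, the weighted distance, the Gaussian correlation function from Table 1, and the training covariance matrix \tilde{A} can be sketched as follows. This is a minimal sketch; the function names are ours, not part of the paper.

```python
import numpy as np

def weighted_distance(x, x2, delta):
    # W = sqrt( sum_i ((x_i - x'_i) / delta_i)^2 ), as in equation (2)
    d = (np.asarray(x) - np.asarray(x2)) / np.asarray(delta)
    return np.sqrt(np.sum(d ** 2))

def gaussian_corr(x, x2, delta):
    # Gaussian correlation function from Table 1: c(x, x') = exp(-W^2)
    return np.exp(-weighted_distance(x, x2, delta) ** 2)

def corr_matrix(D, delta):
    # Covariance matrix of the training points, A~_ij = c(x_i, x_j)
    n = len(D)
    A = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            A[i, j] = gaussian_corr(D[i], D[j], delta)
    return A

# Three 1-dimensional training inputs with correlation length delta = 1
A = corr_matrix([np.array([0.0]), np.array([0.5]), np.array([1.0])],
                np.array([1.0]))
```

For distinct design points, the resulting matrix is symmetric, has a unit diagonal, and is positive definite, as required of a valid covariance function.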
A more involved approach would be to define a prior distribution for \delta and draw samples from the posterior distribution p(\delta \mid u(D)). For the sake of simplicity, we assume the first approach here.
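The maximum-likelihood approach can be sketched as below, evaluating the negative log-likelihood of the zero-mean Gaussian process over a grid of candidate log-correlation lengths. This is an illustrative stand-in for a proper numerical optimiser; the training data and all names are hypothetical.

```python
import numpy as np

def neg_log_lik(log_delta, D, u):
    # Negative log-likelihood of the zero-mean GP in equation (3),
    # using the Gaussian correlation function from Table 1.
    delta = np.exp(log_delta)  # work on the log scale so delta > 0
    W2 = (((D[:, None, :] - D[None, :, :]) / delta) ** 2).sum(axis=-1)
    A = np.exp(-W2) + 1e-6 * np.eye(len(D))  # small jitter for stability
    L = np.linalg.cholesky(A)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, u))
    # log|A|/2 + u^T A^{-1} u / 2 (additive constant dropped)
    return np.log(np.diag(L)).sum() + 0.5 * u @ alpha

# Hypothetical 1-dimensional training data, for illustration only
rng = np.random.default_rng(0)
D = np.sort(rng.uniform(0.0, 1.0, (10, 1)), axis=0)
u = np.sin(4.0 * D[:, 0])

grid = np.linspace(-2.0, 2.0, 41)  # candidate values of log(delta)
nll = [neg_log_lik(np.array([g]), D, u) for g in grid]
delta_hat = np.exp(grid[int(np.argmin(nll))])
```

In practice one would replace the grid with a gradient-based or simplex optimiser, but the grid makes the idea of maximising equation (3) over \delta explicit.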
We are now interested in estimating the value of the Gaussian process at a new point x, conditional on the observed data u(D) and \hat{\delta}. This distribution is

u(x) \mid u(D), \hat{\delta} \sim N\left( c(x)^T \tilde{A}^{-1} u(D), \; c(x, x) - c(x)^T \tilde{A}^{-1} c(x) \right), \quad (4)

with the vector c(x) defined as c(x) = [c(x, x_1), c(x, x_2), \ldots, c(x, x_n)]^T. The above distribution is the posterior distribution of a zero-mean, unit-variance emulator.
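The posterior mean and variance of the zero-mean, unit-variance emulator can be computed directly; a minimal sketch (function names are ours):

```python
import numpy as np

def gp_posterior(x, D, u, delta):
    # Posterior of the zero-mean, unit-variance emulator at a new point x:
    #   mean     c(x)^T A~^{-1} u(D)
    #   variance c(x, x) - c(x)^T A~^{-1} c(x)
    W2 = lambda a, b: (((a - b) / delta) ** 2).sum(axis=-1)
    A = np.exp(-W2(D[:, None, :], D[None, :, :]))  # training covariance A~
    c = np.exp(-W2(D, x))                          # c(x) = [c(x, x_1), ..., c(x, x_n)]
    Ainv_c = np.linalg.solve(A, c)
    mean = Ainv_c @ u
    var = 1.0 - c @ Ainv_c                         # c(x, x) = 1 for a correlation function
    return mean, var

D = np.array([[0.0], [0.5], [1.0]])   # hypothetical training inputs
u = np.array([0.0, 1.0, 0.0])         # hypothetical training observations
delta = np.array([1.0])

m, v = gp_posterior(np.array([0.0]), D, u, delta)    # at a training point
m_far, v_far = gp_posterior(np.array([10.0]), D, u, delta)  # far from the data
```

At a training point the emulator interpolates the data with zero posterior variance, while far from the data the variance reverts to the prior value of 1.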

The general case
We will now present the equations of the more general non-zero-mean emulator with arbitrary variance. The model assumed is

g(x) = h^T(x)\beta + u(x), \quad u(x) \sim \mathrm{GP}(0, \sigma^2 c(x, x')), \quad (5)

which is the same as equation (1), apart from the mean term h^T(x)\beta and the parameter \sigma^2, which controls the scaling (variance) of the process. Assuming again a set of n observations g(D), the likelihood of this model is

g(D) \mid \beta, \sigma^2, \theta_c \sim N(H\beta, \sigma^2 A), \quad (6)

where H = [h(x_1), h(x_2), \ldots, h(x_n)]^T. The matrix A is defined as A = \tilde{A} + \nu I, where I is the identity matrix and \nu is a parameter commonly known as the nugget [2]. The role of the nugget is to account for the existence of noise in the model (i.e. if the observations g(D) were noisy measurements of the process, rather than the process itself); it can also safeguard against numerical instabilities that can occur in the calculations involved in training and fitting the emulator. The parameters \delta and \nu are jointly referred to as \theta_c = [\delta, \nu].
If we assume a non-informative prior for \sigma^2 and \beta, p(\sigma^2, \beta) \propto \sigma^{-2} [3], these two parameters can be marginalised in the Bayesian sense, yielding

p(g(D) \mid \theta_c) \propto |A|^{-1/2} \, |H^T A^{-1} H|^{-1/2} \, (\hat{\sigma}^2)^{-(n-q)/2}, \quad (7)

where q is the number of regression functions in h(x) and

\hat{\sigma}^2 = \frac{g(D)^T \left( A^{-1} - A^{-1} H (H^T A^{-1} H)^{-1} H^T A^{-1} \right) g(D)}{n - q - 2}.

This was the expression we optimised w.r.t. \theta_c when we trained the emulators in the manuscript.
In the same fashion, one can obtain the posterior distribution of the emulator, p(g(x) \mid g(D), \hat{\theta}_c), which is a multivariate t-distribution with n - q degrees of freedom, mean

m^*(x) = h^T(x)\hat{\beta} + c(x)^T A^{-1} \left( g(D) - H\hat{\beta} \right), \quad (8)

and variance

v^*(x) = \hat{\sigma}^2 \left[ c(x, x) - c(x)^T A^{-1} c(x) + \left( h(x) - H^T A^{-1} c(x) \right)^T (H^T A^{-1} H)^{-1} \left( h(x) - H^T A^{-1} c(x) \right) \right], \quad (9)

with

\hat{\beta} = (H^T A^{-1} H)^{-1} H^T A^{-1} g(D).

If the degrees of freedom (n - q) of the t-distribution are sufficiently large (e.g. n - q > 20), we can consider p(g(x) \mid g(D), \hat{\theta}_c) as being a multivariate normal distribution with the same mean and variance as above.
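The general-case predictions can be sketched as below, assuming a linear mean function h(x) = [1, x] and the Gaussian correlation function. This is an illustrative sketch under our own naming conventions, not the paper's implementation; the training data are hypothetical.

```python
import numpy as np

def emulator_predict(x, D, g, delta, nu):
    # Posterior mean m*(x) and variance v*(x) of the general emulator,
    # with the t-distribution treated as Gaussian for simplicity.
    h = lambda z: np.concatenate(([1.0], np.atleast_1d(z)))  # assumed linear mean
    n = len(D)
    W2 = (((D[:, None, :] - D[None, :, :]) / delta) ** 2).sum(axis=-1)
    A = np.exp(-W2) + nu * np.eye(n)                  # A = A~ + nu * I (nugget)
    H = np.array([h(z) for z in D])                   # n x q regression matrix
    q = H.shape[1]
    Ainv = np.linalg.inv(A)
    K = H.T @ Ainv @ H
    beta = np.linalg.solve(K, H.T @ Ainv @ g)         # generalised least-squares beta-hat
    resid = g - H @ beta
    sigma2 = resid @ Ainv @ resid / (n - q - 2)       # sigma-hat squared
    c = np.exp(-((((x - D) / delta) ** 2).sum(axis=-1)))
    t = h(x) - H.T @ Ainv @ c
    mean = h(x) @ beta + c @ Ainv @ resid
    var = sigma2 * (1.0 - c @ Ainv @ c + t @ np.linalg.solve(K, t))
    return mean, var

# Hypothetical training set: 8 points from a smooth 1-d function
D = np.linspace(0.0, 1.0, 8).reshape(-1, 1)
g = np.sin(3.0 * D[:, 0])
mean, var = emulator_predict(np.array([0.0]), D, g, np.array([0.3]), 1e-6)
```

With a small nugget the emulator nearly interpolates the training data, and the variance remains strictly positive, reflecting the extra uncertainty from estimating \beta.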