Fig 1.
Flowchart summarizing the different steps of our analysis.
We sample the parameter space by creating quasi-random grids of simulations using the agent-based models (ABMs). We then employ the sampled parameter sets (Θsim) and the simulation output (xsim) to train different machine learning (ML) methods: neural networks (NN), mixture density networks (MDN), and Gaussian processes (GP). These ML methods can be used either as emulators or as direct inference machines. Emulators mimic the simulations at low computational cost. They can be employed in Bayesian sampling schemes, such as Markov chain Monte Carlo (MCMC) and likelihood-free inference (LFI), to infer model parameters from the observations. Direct inference machines can be seen as black boxes that, given the observations, produce samples from the posterior distribution of the model parameters. We use synthetic data obtained from the ABMs, which allows us to compare the obtained posterior distributions of the model parameters with the ground truth. For both the emulators and the direct inference machines, we thus perform a classical inference task, amounting to a self-consistent test. Moreover, we quantify how well the emulators capture the behaviour of the ABMs in separate comparisons.
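To make the emulation branch of the flowchart concrete, the following minimal sketch (Python; a toy one-parameter simulator, not the code used in the paper) trains a Gaussian-process emulator on (Θsim, xsim) and then places it inside a plain Metropolis MCMC loop. All names, ranges, and settings are illustrative assumptions.

    # Minimal sketch of the emulation branch of Fig 1 (illustrative only).
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor

    rng = np.random.default_rng(0)

    def simulator(theta):
        # Toy stand-in for an ABM: one parameter in, one noisy summary out.
        return theta ** 2 + 0.1 * rng.normal()

    # Step 1: sample the parameter space (the paper uses quasi-random grids).
    theta_sim = rng.uniform(0.0, 2.0, size=200)
    x_sim = np.array([simulator(t) for t in theta_sim])

    # Step 2: train the emulator; alpha absorbs the (assumed known) noise.
    gp = GaussianProcessRegressor(alpha=0.1 ** 2).fit(theta_sim[:, None], x_sim)

    # Step 3: place the cheap emulator inside an MCMC loop to infer theta.
    theta_true = 0.7
    x_obs = simulator(theta_true)

    def log_posterior(theta):
        if not 0.0 <= theta <= 2.0:          # flat prior on the sampled range
            return -np.inf
        mu = gp.predict(np.array([[theta]]))[0]
        return -0.5 * ((x_obs - mu) / 0.1) ** 2

    samples, theta = [], 1.0
    for _ in range(5000):                    # plain Metropolis random walk
        prop = theta + 0.1 * rng.normal()
        if np.log(rng.uniform()) < log_posterior(prop) - log_posterior(theta):
            theta = prop
        samples.append(theta)

    print("posterior mean:", np.mean(samples[1000:]), "truth:", theta_true)

A direct inference machine would instead be trained to map xsim to samples from the posterior over the parameters, so that no sampling loop is needed at inference time.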
Table 1.
Upper and lower boundaries for the model parameters (Θ) in the constructed grids.
Fig 2.
Comparison of the performance of the emulators as a function of the size of the training set (resolution) for the agent-based cellular automata (CA) model of brain tumours.
The plot shows the results for three emulators: a neural network (NN), a mixture density network (MDN), and a Gaussian process (GP). The subscripts ‘pred’ and ‘sim’ refer to the predictions by the emulators and to the ground truth obtained from simulations, i.e. the synthetic data, respectively. The four panels contain our four performance metrics for emulators: the error in the predicted mean (μpred), the relative absolute error in the predicted median (Mpred), the ratio between the predicted width of the marginals, measured by the standard deviation (σpred), and the true width, and the Wasserstein distance (cf. the section Metrics). The figure summarizes these quantities for one of the output variables of the CA model, the number of glioblastoma stem-like cells (GSC). For the other outputs, we refer the reader to S7 Fig.
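For concreteness, given samples from the predicted and simulated marginals of an output variable, the four metrics can be computed along the following lines (a sketch of one plausible reading; the precise definitions are given in the Metrics section, and the Gaussian toy samples merely stand in for GSC counts):

    # Sketch of the four emulator metrics of Fig 2 (illustrative reading).
    import numpy as np
    from scipy.stats import wasserstein_distance

    rng = np.random.default_rng(1)
    x_sim = rng.normal(100.0, 10.0, size=10_000)    # ground-truth marginal
    x_pred = rng.normal(102.0, 12.0, size=10_000)   # emulator prediction

    mean_err = np.mean(x_pred) - np.mean(x_sim)     # error in the mean
    median_rel_err = (abs(np.median(x_pred) - np.median(x_sim))
                      / abs(np.median(x_sim)))      # rel. abs. error in median
    width_ratio = np.std(x_pred) / np.std(x_sim)    # sigma_pred / sigma_sim
    w1 = wasserstein_distance(x_pred, x_sim)        # 1-D Wasserstein distance

    print(mean_err, median_rel_err, width_ratio, w1)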
Fig 3.
Counterpart to Fig 2 for our epidemiological SIRDS model.
The plot shows different metrics of the performance of emulators for one of the output variables of the SIRDS model, the total duration of the epidemic. For the other outputs, we refer the reader to S8 Fig.
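To illustrate what the total duration of the epidemic means as a scalar output, the stochastic compartmental SIRDS sketch below returns the day on which the infected compartment empties. The paper's model is agent-based; this discretization and all rates here are assumptions for illustration only.

    # Illustrative stochastic SIRDS sketch (not the paper's agent-based model).
    import numpy as np

    rng = np.random.default_rng(2)

    def sirds_duration(beta=0.3, gamma=0.1, mu=0.01, omega=0.005,
                       n=10_000, i0=10, tmax=5_000):
        s, i, r, d = n - i0, i0, 0, 0
        for day in range(1, tmax + 1):
            new_inf = rng.binomial(s, 1.0 - np.exp(-beta * i / n))  # S -> I
            new_rec = rng.binomial(i, gamma)                        # I -> R
            new_dead = rng.binomial(i - new_rec, mu)                # I -> D
            new_susc = rng.binomial(r, omega)                       # R -> S (waning)
            s += new_susc - new_inf
            i += new_inf - new_rec - new_dead
            r += new_rec - new_susc
            d += new_dead
            if i == 0:                  # epidemic over: total duration in days
                return day
        return tmax

    print("epidemic duration:", sirds_duration(), "days")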
Fig 4.
Performance across different inference schemes for our cancer CA model, in terms of the residuals between the means of the marginal posterior distributions and the true parameter values (θo).
For other metrics, we refer the reader to S9 Fig. The plot includes the results from the emulation-based approaches (emu.) and our direct inference machines (inf.). We include two machine learning approaches: a neural network (NN) and a Gaussian process (GP). For the emulators, we distinguish between results obtained using rejection approximate Bayesian computation (ABC) and MCMC. For each approach, the label specifies the size of the training set; for the emulators, we consistently used 10⁴ simulations.
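The emulation-based entries paired with rejection ABC work by drawing parameters from the prior, predicting the corresponding outputs with the trained emulator, and keeping the draws whose predictions lie closest to the observation. A minimal sketch with a toy emulator follows (illustrative names; the budget is scaled down from the best 10⁴ of 10⁷ draws used in the paper):

    # Sketch of rejection ABC with a cheap emulator (illustrative only).
    import numpy as np

    rng = np.random.default_rng(3)

    def emulator(theta):
        return theta ** 2                # toy stand-in for a trained NN/GP

    theta_true = 0.7
    x_obs = emulator(theta_true) + 0.05 * rng.normal()

    n_draws, n_keep = 100_000, 100       # paper: best 10^4 of 10^7 draws
    theta_prior = rng.uniform(0.0, 2.0, size=n_draws)
    dist = np.abs(emulator(theta_prior) - x_obs)   # distance in output space
    posterior = theta_prior[np.argsort(dist)[:n_keep]]

    print("posterior mean:", posterior.mean(), "truth:", theta_true)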
Table 2.
Median and mean autocorrelation length for different MCMC setups.
For each of the two ABMs (our cancer CA and epidemiological SIRDS models), we use two ML emulation algorithms (NN and GP). We distinguish between two different likelihoods (cases a and b; cf. the Methods section). The first column specifies each setup by first naming the ABM, followed by the emulator and the likelihood.
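The integrated autocorrelation length can be estimated per chain with the standard windowed-sum (Sokal) estimator; ensemble samplers such as emcee provide an equivalent routine (emcee.autocorr.integrated_time). A self-contained sketch, checked against an AR(1) chain whose exact autocorrelation length is (1 + ρ)/(1 − ρ) = 19 for ρ = 0.9:

    # Windowed-sum estimate of the integrated autocorrelation length.
    import numpy as np

    def autocorr_length(chain, c=5.0, max_lag=1_000):
        x = np.asarray(chain, dtype=float)
        x = x - x.mean()
        acf = np.array([np.dot(x[: x.size - k], x[k:]) for k in range(max_lag)])
        acf /= acf[0]                          # normalized autocorrelation
        tau = 2.0 * np.cumsum(acf) - 1.0       # running estimate of tau
        m = np.argmax(np.arange(max_lag) >= c * tau)  # Sokal window: M >= c*tau
        return tau[m]

    rng = np.random.default_rng(4)
    chain = np.zeros(20_000)                   # one walker, 20,000 samples
    for t in range(1, chain.size):
        chain[t] = 0.9 * chain[t - 1] + rng.normal()

    print("estimated autocorrelation length:", autocorr_length(chain))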
Fig 5.
Counterpart to Fig 4 for our epidemiological SIRDS model.
Here, we show the performance across different inference schemes for our SIRDS model in terms of the residuals between the means of the marginal distributions and the true parameter values (θo), in units of the width of the sampled parameter range (Δθ). In contrast to Fig 4, we scale the residuals by Δθ because the different parameters cover very different ranges. For other metrics, we refer the reader to S10 Fig.
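In code, the scaling amounts to a single division by the grid widths from Table 1; the arrays below are hypothetical placeholders:

    # Residuals in units of the sampled parameter range (hypothetical values).
    import numpy as np

    theta_o = np.array([0.30, 1.5e-4, 12.0])      # true parameter values
    theta_mean = np.array([0.28, 1.7e-4, 11.2])   # posterior marginal means
    theta_lo = np.array([0.10, 1.0e-5, 5.0])      # lower grid boundaries
    theta_hi = np.array([0.50, 5.0e-4, 20.0])     # upper grid boundaries

    scaled_residuals = (theta_mean - theta_o) / (theta_hi - theta_lo)
    print(scaled_residuals)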
Fig 6.
Required CPU time across different methods for our cancer CA model (upper panel) and the epidemiological SIRDS model (lower panel).
The light magenta bars show the total time required for each inference technique. The remaining three bars specify the time required for the simulations involved (red), the time required for training and validation (yellow), and the time needed to infer the parameters for a single set of observations (blue). The number of required simulations is specified in the labels. We include the NN direct inference machine for three different sizes of the training set. Moreover, we include the NN emulator in tandem with the rejection ABC and MCMC algorithms. The ABC accepts the best 10⁴ among 10⁷ randomly drawn samples. The ensemble MCMC uses 8 walkers, drawing 20,000 samples for each walker. For the MCMC, we average over the CPU times for cases a and b. For comparison, we include the estimated time required for running the ABC with 10⁷ simulations. Note that the ordinate is logarithmic and that the time needed to run any given simulation depends on the number of agents and the model parameters.
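Schematically, each bar group decomposes the total cost as total = simulations + training + per-observation inference; the numbers below are made up for illustration:

    # Schematic CPU-time bookkeeping behind Fig 6 (all numbers invented).
    t_sim_per_run = 2.0      # CPU seconds per ABM simulation (assumed)
    n_train = 10_000         # simulations used to train the emulator
    t_train = 600.0          # training and validation time (assumed)
    t_infer = 30.0           # inference time for one set of observations

    t_total = n_train * t_sim_per_run + t_train + t_infer
    print(f"total: {t_total:.0f} s = {n_train * t_sim_per_run:.0f} (sim) "
          f"+ {t_train:.0f} (train) + {t_infer:.0f} (infer)")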