
The authors have declared that no competing interests exist.

Conceived and designed the experiments: KA MR JT PSA MC. Performed the experiments: KA. Analyzed the data: KA MR. Contributed reagents/materials/analysis tools: MR HVF MC. Wrote the paper: KA MR JT PSA HVF MC. Provided biological expertise in the development of the simulator and understanding of the results it generated: HVF MC.

Integrating computer simulation with conventional wet-lab research has proven valuable in furthering the understanding of biological systems. Success requires that the relationship between the simulation and the real-world system be established: substantial aspects of the biological system are typically unknown, and the abstract nature of simulation can complicate interpretation of results.

The integration of computer simulation with current experimental techniques has become a popular approach to aid the understanding of biological systems.

Whereas a number of packages have been developed that aid simulation development, fewer tools exist to support the statistical analysis of simulation behaviour.

In previous work we have utilised computer simulation to model the process of lymphoid tissue development.


Prior to any simulator being used as a tool to complement wet-lab investigations, it is critical that the effect of inherent simulation stochasticity on results be understood.

Consistency analysis operates by contrasting distributions of simulation responses, all generated using the same fixed set of parameter values and containing identical numbers of simulation samples. By varying the number of samples comprising the distributions, the analysis determines the number required to obtain statistically consistent distributions. Larger sample sizes produce increasingly similar distributions, thereby mitigating the effect of simulation stochasticity on results. In the description by Read et al., 20 such distributions are generated for each sample size examined.

As an example, consider analysing sample sizes of 5, 50, 100, and 300 to determine the number of simulation runs required to mitigate aleatory uncertainty. A set of parameter values is fixed and used for all runs. For each sample size being analysed, 20 sets of simulation results are then gathered, each containing that number of runs. In this example, where a sample size of 5 is being examined, 20 sets of simulation results should be generated, each containing the results of 5 runs; where a sample size of 300 is being analysed, each of the 20 sets should contain the results of 300 runs. When this is complete, each sample size is analysed in turn: for each of its 20 sets, a distribution comprising the median response of each constituent run is generated.

Distributions 2–20 are each contrasted with the distribution from the 1st set using the Vargha-Delaney A-Test, a non-parametric measure of the magnitude of difference between two distributions.
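As an illustrative sketch (in Python, rather than the spartan R implementation), the A statistic can be computed directly from its definition: the probability that a randomly drawn sample from one distribution exceeds one from the other, with ties counted half. A score of 0.5 indicates no difference between the distributions; Vargha and Delaney suggest 0.56, 0.64, and 0.71 as thresholds for small, medium, and large differences.

```python
def a_test(x, y):
    """Vargha-Delaney A statistic: P(X > Y) + 0.5 * P(X == Y).

    0.5 indicates the two samples are stochastically indistinguishable;
    scores near 0 or 1 indicate a large difference.
    """
    greater = sum(1 for a in x for b in y if a > b)
    equal = sum(1 for a in x for b in y if a == b)
    return (greater + 0.5 * equal) / (len(x) * len(y))

# identical distributions -> no difference
print(a_test([1, 2, 3], [1, 2, 3]))  # 0.5
# fully separated distributions -> maximal difference
print(a_test([4, 5, 6], [1, 2, 3]))  # 1.0
```

Under consistency analysis, a sample size is deemed sufficient when all 19 comparison scores fall within the small-effect band around 0.5.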

Any biological simulation will feature biologically-derived parameters for which values are fully or partially unknown: some biological values cannot be determined experimentally, whereas others cannot be represented easily within a simulation. For example, diffusion of a chemoattractant could be implemented using a particular mathematical function for which values cannot be verified, as the corresponding biological quantities cannot currently be measured. Robustness analysis examines the implications of biological uncertainty or parameter estimation on simulation results. Where a simulation is found to be highly sensitive to such parameters, caution must be exercised when interpreting results; they may be artefacts of the parametrisation rather than representations of the biology.

Robustness to parameter perturbation is explored using a ‘one at a time’ approach: each parameter is perturbed in isolation, across a range of values, while all other parameters are held at their calibrated baseline values.
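The sampling scheme can be sketched as follows (a minimal Python illustration, not spartan's implementation; the parameter names and values are hypothetical):

```python
def one_at_a_time_sets(baseline, perturbations):
    """Build parameter sets in which exactly one parameter deviates
    from its calibrated (baseline) value at a time."""
    sets = []
    for name, values in perturbations.items():
        for value in values:
            params = dict(baseline)  # all other parameters stay at baseline
            params[name] = value
            sets.append(params)
    return sets

# hypothetical calibrated values and a perturbation range for one parameter
baseline = {"chemo_expression": 0.04, "adhesion_threshold": 0.5}
sets = one_at_a_time_sets(
    baseline, {"chemo_expression": [0.01, 0.02, 0.08, 0.16]})
```

Each resulting set is then run the number of times indicated by consistency analysis, and its response distribution compared against the baseline distribution.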

Though robustness analysis elucidates the effects of perturbing single parameters, it cannot reveal compound effects that occur when two or more are adjusted simultaneously: the effect one parameter has may rely on the value of another. Global sensitivity analyses reveal such effects, showing how different parameters could be coupled, and can indicate the parameters that have the greatest influence on simulation responses.

A latin-hypercube sampling approach is used to select parameter sets from within these ranges, whilst minimising correlations in parameter values across the sets and ensuring an efficient coverage of parameter space.
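The stratified sampling described above can be sketched as follows (a generic numpy implementation, not spartan's own; the ranges in the example are made up): each parameter's range is divided into n equal-probability strata, one value is drawn from every stratum, and the strata are shuffled independently per parameter so that values are uncorrelated across sets.

```python
import numpy as np

def latin_hypercube(n_samples, ranges, seed=None):
    """Draw n_samples parameter sets; each parameter's range is split into
    n_samples strata and every stratum is sampled exactly once."""
    rng = np.random.default_rng(seed)
    k = len(ranges)
    u = np.empty((n_samples, k))
    for j in range(k):
        # one point per stratum, strata shuffled independently per parameter
        u[:, j] = (rng.permutation(n_samples) + rng.random(n_samples)) / n_samples
    lo = np.array([r[0] for r in ranges])
    hi = np.array([r[1] for r in ranges])
    return lo + u * (hi - lo)

# e.g. 500 sets over two hypothetical parameter ranges
samples = latin_hypercube(500, [(0.0, 1.0), (10.0, 50.0)], seed=42)
```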

The extended Fourier Amplitude Sensitivity Test (eFAST) developed by Saltelli et al. is a variance-based global sensitivity analysis technique.

For each simulation parameter included in this analysis, a range over which values are to be explored is provided. Taking each parameter in turn as the parameter of interest, values are chosen for all parameters using sinusoidal functions of particular frequencies through the parameter space, with the frequency assigned to the parameter of interest differing greatly from those used for its complementary set. A number of parameter values are selected from points along each of these curves, creating a set of simulation parameters for each parameter of interest. An illustration of this sampling approach can be seen in Marino et al.
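The search-curve sampling can be sketched as follows (a numpy illustration of the sinusoidal curves described by Saltelli et al.; the frequencies and parameter ranges shown are made up, and spartan's implementation differs in detail). Each parameter oscillates through its range along x(s) = 1/2 + (1/π)·arcsin(sin(ω·s)), with the parameter of interest assigned a much higher frequency ω than its complementary set.

```python
import numpy as np

def efast_samples(n_samples, frequencies, ranges):
    """Sample parameter sets along eFAST sinusoidal search curves.

    Each parameter i traverses its range as
    x_i(s) = 0.5 + arcsin(sin(w_i * s)) / pi, which oscillates within
    [0, 1] at frequency w_i before being rescaled to the parameter's range.
    """
    s = np.linspace(-np.pi, np.pi, n_samples)  # points along the curve
    columns = []
    for (lo, hi), w in zip(ranges, frequencies):
        x = 0.5 + np.arcsin(np.sin(w * s)) / np.pi  # in [0, 1]
        columns.append(lo + x * (hi - lo))
    return np.column_stack(columns)

# 65 samples; the first parameter is the parameter of interest (high frequency)
sets = efast_samples(65, [11, 2, 3], [(0.0, 1.0), (0.0, 1.0), (10.0, 20.0)])
```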

Simulation results are analysed taking into account the frequencies used to generate the parameter sets. Through Fourier analysis at these frequencies, variation in output can be partitioned between the parameters, giving an indication of the impact each has on simulation response. Using the equations given in Marino et al., two sensitivity indices are calculated for each parameter: the first-order index (Si) and the total-order index (STi), which additionally captures non-linear interactions with the other parameters.

The example simulation data, from which the results presented here were generated, is taken from our recently published lymphoid tissue development simulator.

In our published analyses

To determine the number of simulation runs required to obtain a representative result, we analysed sample sizes of 1, 5, 50, 100, 300, and 500 runs. Parameter values were kept constant at their calibrated values. Each sample size was analysed in turn using the procedure described above, generating 20 subsets per sample size; this analysis thus required 19,120 individual runs. The online tutorial examines the first five sample sizes.

In our case study

Parameter value sets for the six parameters were created using the methods in the

Each parameter is addressed in turn, and the simulation results for each value assigned to that parameter are analysed. 500 simulation executions are performed for each parameter value, in accordance with the consistency analysis results; in our case, this resulted in 32,500 individual simulation runs. The distribution of response values obtained for each parameter value is then contrasted with the distribution obtained using baseline parameter values, using the Vargha-Delaney A-Test.

A: A-Test scores for simulations perturbing the initial expression level of a chemoattractant. This parameter has a large effect on both simulation responses. B: A-Test scores for simulations perturbing the upper limit of chemoattractant expression, which when perturbed has no significant effect on simulation response. C: Distribution of cell displacement responses for the parameter perturbed in A.

In this analysis we sought to identify any compound effects that become apparent when the values of the six parameters examined in Technique 2 above are perturbed simultaneously. This has revealed the parameters that are highly influential on simulation behaviour, and provided unique biological insight into the factors that are important at this stage of tissue development. The online tutorial demonstrates how both the parameter samples and results described in this section have been generated.


Five hundred parameter value sets were generated from the parameter space using the latin-hypercube sampling approach. With results from Technique 1 suggesting 500 simulation executions are required to gain a representative result from our simulation, a total of 250,000 simulation executions were performed to generate the data required for this analysis. The median output responses for each parameter value set were then calculated from its 500 replicate runs. Taking each parameter in turn, median response values are plotted against the parameter values that generated them, and partial rank correlation coefficients are calculated.
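The partial rank correlation coefficient calculation can be sketched as follows (a generic numpy implementation, not spartan's): parameter values and responses are rank-transformed, the linear influence of the remaining parameters is regressed out of both the parameter of interest and the response, and the residuals are correlated. The toy data at the end are invented purely to exercise the function.

```python
import numpy as np

def prcc(X, y):
    """Partial rank correlation of each column of X with response y.

    Assumes no tied values (reasonable for latin-hypercube samples).
    """
    rank = lambda a: np.argsort(np.argsort(a)).astype(float)
    n, k = X.shape
    R = np.column_stack([rank(X[:, j]) for j in range(k)])
    ry = rank(y)
    coeffs = []
    for j in range(k):
        # regress out the (ranked) remaining parameters, correlate residuals
        others = np.column_stack([np.ones(n), np.delete(R, j, axis=1)])
        beta_x, *_ = np.linalg.lstsq(others, R[:, j], rcond=None)
        beta_y, *_ = np.linalg.lstsq(others, ry, rcond=None)
        res_x = R[:, j] - others @ beta_x
        res_y = ry - others @ beta_y
        coeffs.append(np.corrcoef(res_x, res_y)[0, 1])
    return np.array(coeffs)

# toy check: the response depends monotonically on the first parameter only
rng = np.random.default_rng(1)
X = rng.random((200, 3))
y = X[:, 0] ** 3 + 0.01 * rng.standard_normal(200)
coeffs = prcc(X, y)
```

A coefficient near ±1 indicates a strong monotone influence once the other parameters are accounted for; values near 0 indicate little influence.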

For online tutorial 3,

A: Parameter that captures the chemoattractant expression level required to influence cell motility. No trend or effects are apparent. B: Parameter that captures the level of adhesion required to restrict cell motility. A clear trend is apparent, suggesting this has a large influence on simulated cell behaviour.

In this analysis we examined the same six parameters as above, and determined the proportion of variation in simulation response that can be explained by perturbing the value of each parameter, through use of the eFAST approach.

Parameter value sets were generated using the sinusoidal curve sampling approach. With seven parameters (the six above plus the ‘dummy’ used for statistical comparison), 65 parameter values taken from each curve, and three re-sampling curves, this produced 1,365 parameter value sets, 195 per parameter.

Simulation responses are analysed using the Fourier frequency approach described above.

Si (black): the fraction of output variance that can be explained by the value assigned to that parameter; STi (grey): the variance caused by higher-order non-linear effects between that parameter and the others explored. Error bars are standard error over three resample curves. A: Velocity response. B: Displacement response.

Spartan R package for Linux and Mac OS. Includes tutorials for each technique.

(ZIP)

Spartan R package for Windows OS. Includes tutorials for each technique.

(ZIP)

The authors would like to thank the members of the Timmis lab who tested the package.