The authors have declared that no competing interests exist.

Conceived and designed the experiments: JL SF MK MPHS. Performed the experiments: JL SF. Analyzed the data: JL SF. Contributed reagents/materials/analysis tools: JL SF MK. Wrote the paper: JL SF MK MPHS.

Our understanding of most biological systems is in its infancy. Learning their structure and intricacies is fraught with challenges, and often side-stepped in favour of studying the function of different gene products in isolation from their physiological context. Constructing and inferring global mathematical models from experimental data is, however, central to systems biology. Different experimental setups provide different insights into such systems. Here we show how concepts from Bayesian inference and information theory can be combined to identify experiments that maximize the information content of the resulting data. This approach allows us to incorporate preliminary information; it is global rather than constrained to some local neighbourhood in parameter space, and it readily yields information on parameter robustness and confidence. We develop the theoretical framework and apply it to a range of exemplary problems that highlight how we can improve experimental investigations into the structure, dynamics and behaviour of biological systems.

For most biological signalling and regulatory systems we still lack reliable mechanistic models. And where such models exist, e.g. in the form of differential equations, we typically have only rough estimates for the parameters that characterize the biochemical reactions. In order to improve our knowledge of such systems we require better estimates for these parameters and here we show how judicious choice of experiments, based on a combination of simulations and information theoretical analysis, can help us. Our approach builds on the available, frequently rudimentary information, and identifies which experimental set-up provides most additional information about all the parameters, or individual parameters. We will also consider the related but subtly different problem of which experiments need to be performed in order to decrease the uncertainty about the behaviour of the system under altered conditions. We develop the theoretical framework in the necessary detail before illustrating its use and applying it to the repressilator model, the regulation of Hes1 and signal transduction in the Akt pathway.

Mathematical models of biomolecular systems are by design and necessity abstractions of a much more complicated reality

These challenges have prompted the development of novel statistical and inferential tools, required to construct (or improve) mathematical models of such systems. We can loosely group these methods into (i) those aimed at reconstructing network models

Inferential tools have been developed that, given some observed biological data and a suitable mathematical candidate model, provide us with parameters that best describe the system's dynamics. Unfortunately obtaining reliable parameter estimates for dynamical systems is plagued with difficulties

We use

Performing different experiments is costly, however, in terms of both money and time, and not all experiments are equally informative. Ideally we would like to perform only those experiments which yield

Experimental design in systems biology is different from classical experimental design studies. The latter theory was first developed at a time when the number of alternative hypotheses was smaller than the amount of available data and replicates

Several authors have used the information theoretical framework, in particular the expected gain in Shannon information to assess the information content of an experiment

Below we first develop the theoretical concepts before demonstrating the use (and usefulness) of the Bayesian experimental design approach in the context of a number of biological systems that exemplify the set of problems encountered in practice. To demonstrate the practical applicability of our approach we investigate two simple models (the repressilator and Hes1 systems), as well as a complex signalling pathway (Akt) with experimentally measured dynamics.

To achieve their full functionality mathematical models require parameter values that generally need to be inferred from experimental data. The extraction of this information is, however, a nontrivial task and is further compounded by the need to assess the statistical confidence of parameter estimates. In the Bayesian framework for example, we seek to evaluate the conditional probability distribution,

Rather than providing a single parameter estimate the posterior distribution allows us to assess how well a parameter is constrained by data (see

(A) The regions of plausible parameter values for three different experiments. Each ellipse defines the set of parameters which are commensurate with the output

Given the importance of the predictive role of mathematical modelling it is also of interest to reduce the uncertainty of model predictions; intriguingly and perhaps counterintuitively — but demonstrably and provably (see below and

Below we use three examples of different complexity to show how this combination of rigorous Bayesian and information theoretical frameworks allows us to design/choose optimal experimental setups for parameter/model inference and prediction, respectively.

To investigate the potential of our experimental design method for parameter estimation we first apply it to the repressilator model, a popular toy model for gene regulatory systems

(A) Illustration of the original repressilator model. The model consists of 3 mRNA species (coloured wavy lines, labeled

To infer the parameters of this model,

To determine which experiment to carry out we compute the mutual information between the parameter prior distribution and the system output via Monte-Carlo estimation. We use uniform priors over

Top: The mutual information

Sometimes we are interested in estimating only some of the parameters, e.g. those that have a direct physiological meaning or are under experimental control. To investigate this aspect we consider the Hes

(A) Diagram of the Hes ^{−1} as experimentally determined by

This can again be further substantiated by simulations. We perform parameter inference based on such simulated data (simulated data are shown in

We next focus on a scenario where we aim to predict the behaviour of a biological system

(A) Diagram of the model of the EGF-dependent Akt pathway. Epidermal growth factor (EGF, red triangle) is a stimulus for a signalling cascade, which results in the phosphorylation (green circle) of Akt (blue square) and S6 (purple square). EGF binds to the EGF membrane receptor EGFR (orange), which is generated from a pro-EGFR. The binding results in the phosphorylation of the receptor, which consequently leads to the activation of downstream cascades (thick black circle). This simplified model was shown to capture the experimentally determined dynamics

We are interested in predicting the dynamics under multiple pulsed stimuli with EGF in the presence of background noise, as shown in

(A) The noisy

To obtain better predictions we can use data from other experiments measuring the time course of the

In

This ability to predict the time courses extends to much greater signal distortion and even with a noise level of

We have found that maximizing the mutual information between our target information — here either model parameter values or predictions of system behaviour — and the (simulated) output of potentially available experiments offers a means of arriving at optimally informative experiments. The experiments that are chosen from a set of candidates are always those that add most to existing knowledge: they are, in fact, the experiments that most challenge our current understanding of a system.

This framework has a number of advantages: First, we can simulate cheaply any experimental set-up that can in principle be implemented; second, using simulations allows us to propagate the model dynamics and to quantify rigorously the amount of (relevant) information that is generated by any given experimental design; third, our information measure gives us a means of meaningfully comparing different designs; finally, our approach can be used to design experiments sequentially — our preferred route as this will enable us to update iteratively our knowledge of a system along the way — or in parallel, i.e. selecting more than one experiment. Previous approaches had taken a more local approach

Here we have focussed on designing experiments that increase our ability to estimate model parameters and to predict model behaviour. The latter depends on model parameters in a very subtle way: not all parameters affect system output equally and under all conditions. Target conditions could, for example, include clinical settings which are generally not experimentally amenable (at least in early stage research); here the current approach offers a rationale for designing

With an optimal design we can overcome the problems of sloppy parameters

We tested and applied our approach in three different contexts: while the repressilator serves as a toy model with hypothetical data and experiments, the Hes1 transcription regulatory system and the EGF induced Akt pathway are relevant biological systems. The question to answer in the Hes

The approach we presented here yields the potential for model discrimination or checking the target of our analysis

Our aim is to choose an experiment

We first consider the task of choosing an experiment that will on average provide most information about model parameters measured through the reduction in their respective uncertainties. In the information theoretic language, as by Lindley
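For reference, the expected gain in Shannon information from an experiment is usually written as the mutual information between parameters and data. The block below gives this standard textbook form. The symbols are generic notation assumed here for illustration ($\theta$ for the parameters, $y$ for the data, $e$ for the experiment) and need not match the notation used elsewhere in this paper.

```latex
I(\Theta; Y \mid e)
  = \int\!\!\int p(\theta)\, p(y \mid \theta, e)\,
      \log \frac{p(y \mid \theta, e)}{p(y \mid e)}
      \,\mathrm{d}y \,\mathrm{d}\theta
  = H(\Theta) - \mathbb{E}_{y \sim p(y \mid e)}
      \bigl[ H(\Theta \mid Y = y, e) \bigr],
\qquad
p(y \mid e) = \int p(\theta)\, p(y \mid \theta, e)\, \mathrm{d}\theta .
```

The second equality makes the interpretation explicit: the mutual information is the prior entropy minus the posterior entropy averaged over the data the experiment could produce, so maximizing it over $e$ selects the experiment with the largest expected reduction in parameter uncertainty.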

Here we specifically consider models such that the output is of the form

Similar reasoning leads us to a criterion for selecting an experiment

The mutual information for models of type (9) can be estimated using Monte Carlo simulations
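As an illustration of how such an estimate can be obtained, the following is a minimal sketch of a nested Monte Carlo estimator of the mutual information between parameters and output. It is not the implementation used in this work. It assumes a deterministic model output with additive, independent Gaussian noise of known standard deviation `sigma`; the toy decay model, the uniform prior range, the candidate experiments and all function names are hypothetical and chosen only to keep the example self-contained.

```python
import numpy as np


def log_gaussian_likelihood(y, mean, sigma):
    """Log density of y under independent Gaussian noise around `mean`."""
    d = y.shape[-1]
    resid = y - mean
    return (-0.5 * np.sum(resid ** 2, axis=-1) / sigma ** 2
            - 0.5 * d * np.log(2.0 * np.pi * sigma ** 2))


def estimate_mutual_information(simulate, sample_prior, sigma,
                                n_outer=500, n_inner=500, rng=None):
    """Nested Monte Carlo estimate of I(theta; y) in nats.

    simulate(theta) maps an (n, p) array of parameters to an (n, d) array
    of noise-free outputs; sample_prior(n, rng) draws n parameter vectors.
    """
    rng = np.random.default_rng(rng)

    # Outer samples: draw (theta_i, y_i) from the joint distribution.
    theta_outer = sample_prior(n_outer, rng)
    mean_outer = simulate(theta_outer)
    y = mean_outer + sigma * rng.standard_normal(mean_outer.shape)
    log_lik = log_gaussian_likelihood(y, mean_outer, sigma)

    # Inner samples: approximate p(y_i) by averaging p(y_i | theta_j).
    theta_inner = sample_prior(n_inner, rng)
    mean_inner = simulate(theta_inner)
    log_cond = log_gaussian_likelihood(
        y[:, None, :], mean_inner[None, :, :], sigma)  # (n_outer, n_inner)

    # Numerically stable log of the inner average (log-sum-exp trick).
    shift = log_cond.max(axis=1, keepdims=True)
    log_marginal = shift[:, 0] + np.log(
        np.mean(np.exp(log_cond - shift), axis=1))

    return float(np.mean(log_lik - log_marginal))


def make_simulator(times):
    """Hypothetical toy model: exponential decay observed at `times`."""
    times = np.asarray(times, dtype=float)
    return lambda theta: np.exp(-theta[:, [0]] * times[None, :])


def sample_prior(n, rng):
    """Assumed uniform prior on a single decay rate in [0, 2]."""
    return rng.uniform(0.0, 2.0, size=(n, 1))


# Two hypothetical candidate experiments, defined by their sampling times.
candidates = {
    "early_times": [0.5, 1.0, 2.0],   # output depends on the decay rate
    "t_zero_only": [0.0, 0.0, 0.0],   # output is 1 whatever the rate
}

scores = {
    name: estimate_mutual_information(
        make_simulator(times), sample_prior, sigma=0.1, rng=0)
    for name, times in candidates.items()
}
best = max(scores, key=scores.get)
```

In this sketch the experiment that only measures at time zero carries no information about the decay rate, so its score is zero up to rounding, and the design with later time points is selected. Because the inner average underestimates the log marginal on average, the estimator is biased upwards for finite `n_inner`; larger inner sample sizes reduce this bias at greater computational cost.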

Similarly, we can estimate the mutual information between any single component

To finish we consider the estimation of the mutual information between the output of the system for two different experiments

We implemented the algorithms in

Once data

The estimation of entropy has been performed only to test and confirm our experimental choice, which is based on Monte Carlo estimation of mutual information. For each experiment

To compute the entropy

The experimental data sets used to investigate the Akt model were collected and published by the lab of S. Kuroda. The data are normalised Western blot measurements as described in

Information content of different parameter regimes. The mutual information

(TIFF)

The simulated evolution of the mRNA and protein concentration in the repressilator model for each experimental setup. The parameter vector used for simulations is

(TIFF)

Mutual information between the parameter and each species (

(TIFF)

The posterior distribution given the data represented in

(TIFF)

Simulated trajectories of the mRNA and protein concentrations (dots). The parameter used for simulation is

(TIFF)

(A) Simulated trajectories of the mRNA and protein concentration (dots) for the parameter

(TIFF)

Ordinary differential equations which describe the dynamics of the

(TIFF)

The time course of phosphorylated EGF receptor (pEGFR), phosphorylated Akt (pAKT) and phosphorylated S6 (pS6) in response to an impulse input of EGF over

(TIFF)

The time course of phosphorylated EGF receptor (pEGFR), phosphorylated Akt (pAKT) and phosphorylated S6 (pS6) in response to a step input of EGF over

(TIFF)

(A) A noisy

(TIFF)

The predicted time course of the proteins pEGFR, pAKT and pS6 under the noisy

(TIFF)