Bayesian parameter estimation for dynamical models in systems biology

doi:10.1371/journal.pcbi.1010651

Fig 1.

A comprehensive Bayesian parameter estimation and uncertainty quantification framework for dynamical models in systems biology.

(A) Model development in systems biology begins with model construction and data collection. Dynamical models in systems biology typically involve a system of ODEs that capture the dynamics of the concentrations of different chemical species in the system (A1). The reaction rates associated with these concentration changes are usually mass action, Michaelis-Menten kinetics, or cooperative kinetics represented by the Hill equation (A2). The free parameters in these models include kinetic rate constants, e.g. k, V_max, equilibrium constants, e.g. K_m, K_A, and Hill coefficients, e.g. n. These parameters are first constrained by best-guess values based on physiological ranges and typical values of model parameters from the literature (A3). Finally, the model needs experimental data for validation; these data can come either from published work or from new experiments. (B) Parameter preprocessing and Bayesian parameter estimation with the CIUKF-MCMC algorithm. First, structural identifiability and global sensitivity analyses on the entire parameter set reduce the set of free parameters that can be estimated (B1). Next, we perform Bayesian parameter estimation for this reduced set of parameters to learn their posterior distributions. The posterior distribution is the parameter distribution conditioned on the data (B2). Bayes’ rule relates the posterior distribution to the product of the prior distribution and the likelihood function. The prior distribution encodes known information about the parameters, and the likelihood function (which requires simulating the model) measures the misfit between predictions and the data. A state-constrained unscented Kalman filter approximates the likelihood function to account for uncertainty in the model equations. Although Bayes’ rule provides a means to evaluate the posterior distribution at specific points in the parameter space, we use Markov chain Monte Carlo (MCMC) sampling (B3) to characterize the entire distribution.
(C) The posterior distributions enable uncertainty analysis of model outputs through ensemble simulation. We perform simulations using the posterior parameter samples to propagate parameter uncertainty through the model (C1). Statistical analysis of the ensemble enables us to compute uncertainty intervals and study various system behaviors, for example, the statistics of the steady state values (C2).
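
The workflow in panels B2 and B3 can be illustrated with a minimal sketch. It is not the CIUKF-MCMC algorithm: it swaps in a plain Gaussian likelihood for the Kalman-filter likelihood and a random-walk Metropolis sampler for the samplers used in the paper. The one-parameter decay model, the uniform prior bounds, the noise level, and all function names are assumptions chosen only for illustration.

```python
import numpy as np

def simulate(k, t, x0=1.0):
    """Toy model (assumed): solution of dx/dt = -k x with x(0) = x0."""
    return x0 * np.exp(-k * t)

def log_prior(k, lower=0.0, upper=5.0):
    """Uniform prior on an assumed plausible range for k."""
    return 0.0 if lower <= k <= upper else -np.inf

def log_likelihood(k, t, data, sigma):
    """Gaussian misfit between model predictions and data (up to a constant)."""
    resid = data - simulate(k, t)
    return -0.5 * np.sum((resid / sigma) ** 2)

def log_posterior(k, t, data, sigma):
    """Bayes' rule in log form: log posterior = log prior + log likelihood + const."""
    lp = log_prior(k)
    if not np.isfinite(lp):
        return -np.inf
    return lp + log_likelihood(k, t, data, sigma)

def metropolis(log_post, k_init, n_steps, step_size, rng):
    """Random-walk Metropolis: propose a step, accept with prob min(1, ratio)."""
    samples = np.empty(n_steps)
    k, lp = k_init, log_post(k_init)
    for i in range(n_steps):
        k_new = k + step_size * rng.normal()
        lp_new = log_post(k_new)
        if np.log(rng.uniform()) < lp_new - lp:
            k, lp = k_new, lp_new
        samples[i] = k
    return samples

rng = np.random.default_rng(0)
t_obs = np.linspace(0.1, 2.0, 20)
k_true, sigma = 1.0, 0.05
data = simulate(k_true, t_obs) + sigma * rng.normal(size=t_obs.size)

chain = metropolis(lambda k: log_posterior(k, t_obs, data, sigma),
                   k_init=2.0, n_steps=5000, step_size=0.1, rng=rng)
posterior_samples = chain[1000:]  # discard the first 1,000 steps as burn-in
```

The retained samples approximate the posterior of k; their spread is what the ensemble simulations in panel C would propagate through the model.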

Fig 2.

Examples of point estimates depicted on an arbitrary probability density function (black line).

The MAP (maximum a posteriori) point is located at the most probable point (blue dashed line). The long tail of the distribution shifts the mean (gray dotted line) away from the MAP point. Secondary modes (red dashed-dotted line) can affect the quality of a point estimate. Additionally, the green shaded region highlights the 95% credible interval, the region between the 2.5th and 97.5th percentiles, which is used to capture the uncertainty in an estimate.
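
As a rough illustration of these summaries, the sketch below computes them from a set of samples. The samples are drawn from a log-normal distribution purely as a stand-in for a long-tailed posterior, and the histogram-based mode is only a crude one-dimensional approximation of a MAP point; the function name and bin count are assumptions.

```python
import numpy as np

def summarize_samples(samples, bins=100):
    """Return an approximate mode, the mean, and the 95% credible interval."""
    samples = np.asarray(samples, dtype=float)
    # Approximate the most probable value by the center of the fullest bin.
    counts, edges = np.histogram(samples, bins=bins)
    peak = np.argmax(counts)
    map_estimate = 0.5 * (edges[peak] + edges[peak + 1])
    # 95% credible interval: region between the 2.5th and 97.5th percentiles.
    lower, upper = np.percentile(samples, [2.5, 97.5])
    return {"map": map_estimate, "mean": samples.mean(), "ci95": (lower, upper)}

rng = np.random.default_rng(1)
samples = rng.lognormal(mean=0.0, sigma=0.75, size=50_000)  # long right tail
summary = summarize_samples(samples)
```

For a right-skewed distribution like this one, the mean lies to the right of the mode, which mirrors the shift described in the caption.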

Fig 3.

Parameter estimation for a simple two-state model.

(A) Top row: Network diagram of the two-state model with states x₁(t) and x₂(t), input function u(t), and four unknown parameters. Bottom row: Trajectories of the input function u(t) and corresponding state trajectories. The input has at least one non-zero derivative to ensure that all model parameters are globally structurally identifiable following [55]. (B) Marginal posterior distributions of the model parameters show increasing uncertainty in the parameter estimates (i.e., widening and flattening) with increasing levels of additive, zero-mean, normally distributed measurement noise. We control the noise level by setting the noise covariances to the specified percentage of the standard deviation of each state variable. The dashed black vertical lines indicate each parameter’s nominal (true) value. Marginal posteriors are visualized by fitting a kernel density estimator to 20,000 MCMC samples obtained using CIUKF-MCMC with the delayed rejection adaptive Metropolis (DRAM) MCMC algorithm after discarding the first 10,000 samples as burn-in. (C) Posterior distributions of the trajectory of x₁(t) reflect the increasing parameter estimation uncertainty in panel B. The true trajectory (solid black line) shows the dynamics with the nominal parameters, the dashed black line shows the trajectory with the most probable set of parameters (MAP point), and the empty circles show the noisy data at the specified noise level. The 95% credible interval shows the region between the 2.5th and 97.5th percentiles that contains 95% of the 5,000 trajectories. (D) Marginal posterior distributions of the model parameters show increasing uncertainty (widening and flattening) with increasing data sparsity (fewer samples). We simulate data sparsity by sampling the simulation from 0 ≤ t ≤ 2 with three time steps, Δt = 0.05 (40 experimental samples), Δt = 0.1 (20 experimental samples) and Δt = 0.2 (10 experimental samples). Marginal posteriors are fit to 20,000 MCMC samples obtained as in panel B. (E) Posterior distributions of the trajectory of x₁(t) reflect the increasing parameter estimation uncertainty seen in panel D.
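
The synthetic data sets in panels B and D could be generated along the following lines. This is a sketch under stated assumptions: the sample times exclude t = 0 so that the counts match those quoted (40, 20, and 10), the noise variance of each state is taken literally as the given fraction of that state's standard deviation, and the two placeholder trajectories are hypothetical rather than solutions of the two-state model.

```python
import numpy as np

def sample_times(dt, t_end=2.0):
    """Equally spaced sample times in (0, t_end]; t = 0 is excluded (assumed)."""
    n = int(round(t_end / dt))
    return dt * np.arange(1, n + 1)

def add_measurement_noise(states, noise_fraction, rng):
    """Add zero-mean Gaussian noise to an array of shape (n_times, n_states).

    The noise variance of each state is noise_fraction times the standard
    deviation of that state's trajectory (one reading of the caption).
    """
    variances = noise_fraction * states.std(axis=0)
    return states + rng.normal(size=states.shape) * np.sqrt(variances)

rng = np.random.default_rng(2)
for dt in (0.05, 0.1, 0.2):
    t = sample_times(dt)
    # Hypothetical placeholder trajectories standing in for x1(t), x2(t).
    states = np.column_stack([np.exp(-t), 1.0 - np.exp(-2.0 * t)])
    noisy = add_measurement_noise(states, noise_fraction=0.1, rng=rng)
    print(f"dt = {dt}: {t.size} samples, noisy data shape {noisy.shape}")
```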

Fig 4.

Parameter estimation for a simplified MAPK cascade that exhibits multistability.

(A) Network diagram of the model of the core MAPK signaling cascade. The red line indicates inhibition; the black lines indicate activation. (B) Trajectories of x₃(t) with the sets of nominal parameters that produce bistability (top) and limit cycle oscillations (bottom). The two dynamical regimes correspond to two different sets of nominal parameter values. The low (black dashed line) and high (solid green line) steady states are reached by choosing different initial conditions x₀. (C and D) Sobol sensitivity indices for the MAPK model parameters. All parameters except the total concentrations, S_1t, S_2t and S_3t, the exponents, n₁ and n₂, and the locally identifiable K₁ and K₂ are varied uniformly over the identified ranges (see S2 Table). We use 5,000 and 15,000 samples for the bistable and limit cycle regimes, respectively. (C) Sensitivity indices for the bistable regime. We use the steady state values of x₂(t) and x₃(t) for both the high and low steady states as quantities of interest. By selecting the two most sensitive parameters for the four quantities of interest, we reduce the set of free parameters. (D) Sobol sensitivity indices for a set of free parameters that contribute to limit cycle behavior. We show the first-order sensitivity indices S_i and the total-order indices for the limit cycle amplitude and period of x₃(t). We reduce the number of free parameters by selecting those with S_i > 10⁻³ across both output quantities.
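
First-order and total-order Sobol indices can be estimated by Monte Carlo sampling. The sketch below uses a standard pick-and-freeze scheme (Saltelli-style first-order and Jansen-style total-order estimators) on a toy linear function; it is not necessarily the estimator or software used for this figure, and the toy model, bounds, and function names are assumptions.

```python
import numpy as np

def sobol_indices(model, bounds, n_samples, rng):
    """Estimate first-order (S_i) and total-order Sobol indices.

    model  : maps an (n, d) array of parameter sets to an (n,) array of outputs
    bounds : (d, 2) array of lower/upper limits; parameters are varied uniformly
    """
    bounds = np.asarray(bounds, dtype=float)
    dim = bounds.shape[0]
    low, high = bounds[:, 0], bounds[:, 1]
    a = low + (high - low) * rng.random((n_samples, dim))
    b = low + (high - low) * rng.random((n_samples, dim))
    f_a, f_b = model(a), model(b)
    variance = np.var(np.concatenate([f_a, f_b]))
    first = np.empty(dim)
    total = np.empty(dim)
    for i in range(dim):
        ab = a.copy()
        ab[:, i] = b[:, i]  # matrix A with column i taken from B
        f_ab = model(ab)
        first[i] = np.mean(f_b * (f_ab - f_a)) / variance
        total[i] = 0.5 * np.mean((f_a - f_ab) ** 2) / variance
    return first, total

def toy_model(x):
    """Additive toy output; the third parameter has no influence."""
    return 4.0 * x[:, 0] + 2.0 * x[:, 1] + 0.0 * x[:, 2]

rng = np.random.default_rng(4)
first, total = sobol_indices(toy_model, [[0, 1], [0, 1], [0, 1]], 20_000, rng)

# Keep parameters whose first-order index exceeds a threshold.
threshold = 1e-3
selected = [i for i, s in enumerate(first) if s > threshold]
```

For this additive toy model the analytic first-order indices are 0.8, 0.2, and 0, and the total-order indices coincide with them because there are no interactions.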

Fig 5.

Varying levels of uncertainty in the parameters associated with the MAPK model impact steady state prediction.

(A) Marginal posterior distributions of the model parameters for parameter estimation from noisy data of the low steady state. Posterior distributions are visualized by fitting a kernel density estimator to 325,200 (150 walkers with 2,660 steps each) MCMC samples obtained using CIUKF-MCMC with the affine invariant ensemble sampler (AIES) for MCMC after discarding the first 840 samples per walker as burn-in. (B) Marginal posterior distributions of the model parameters for parameter estimation from noisy data of the high steady state reveal larger uncertainty in the model parameters when compared to the low steady state. We visualize distributions by fitting a kernel density estimator to 347,700 (150 walkers with 2,644 steps each) MCMC samples obtained using CIUKF-MCMC with the affine invariant ensemble sampler (AIES) for MCMC after discarding the first 856 samples per walker as burn-in. (C) Posterior distribution of the trajectory of x₃(t) with initial conditions that yield the low steady state highlights low uncertainty in the predicted dynamics. The true trajectory (dashed black line) shows the dynamics with the nominal parameters, the dotted blue line shows the trajectory evaluated at the MAP point, and the empty circles show the noisy data (covariance is 50% of the standard deviation of the true trajectory). The 95% credible interval shows the region between the 2.5th and 97.5th percentiles that contains 95% of 30,000 posterior trajectories. (D) Posterior distribution of the trajectory of x₃(t) with initial conditions that yield the high steady state highlights the ambiguity about which steady state is reached. All lines and computations are the same as in panel (C), except simulations were run using an initial condition that results in the high steady state.
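
The credible intervals in panels C and D are pointwise percentiles over an ensemble of simulated trajectories. A minimal sketch follows, assuming a toy exponential-decay model and normally distributed stand-ins for the posterior samples; neither is taken from the MAPK model.

```python
import numpy as np

def credible_band(trajectories, level=95.0):
    """Pointwise credible band over an ensemble of shape (n_samples, n_times)."""
    tail = (100.0 - level) / 2.0
    lower, upper = np.percentile(trajectories, [tail, 100.0 - tail], axis=0)
    return lower, upper

rng = np.random.default_rng(3)
t = np.linspace(0.0, 5.0, 51)
k_samples = rng.normal(1.0, 0.1, size=2000)  # stand-in for posterior samples
ensemble = np.exp(-np.outer(k_samples, t))   # one trajectory per sample

lower, upper = credible_band(ensemble)       # 2.5th and 97.5th percentiles
inside = (ensemble >= lower) & (ensemble <= upper)
coverage = inside.mean(axis=0)               # fraction inside at each time
```

By construction, about 95% of the ensemble lies inside the band at every time point where the trajectories differ.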

Fig 6.

Parameter estimation results for the MAPK model in the limit cycle regime with data only sampled from the oscillations.

(A) Marginal posterior distributions of the model parameters. Distributions are visualized by fitting a kernel density estimator to 1,305,720 (30 walkers with 43,524 steps each) MCMC samples obtained using CIUKF-MCMC with the affine invariant ensemble sampler (AIES) after discarding the first 7,447 samples per walker as burn-in. (B) Posterior distribution of the trajectory of x₃(t) in the limit cycle regime. The true trajectory (dashed black line) shows the dynamics with the nominal parameters, the dotted blue line shows the trajectory evaluated at the MAP point, and the points show the noisy data (covariance is 1% of the variance of the true trajectory). The 95% credible interval shows the region between the 2.5th and 97.5th percentiles that contains 95% of 30,000 posterior trajectories. (C) Sample posterior trajectories (50 out of 30,000 total) reveal that most trajectories closely match the true limit cycles. Additionally, several trajectories that reach a fixed point are shown. (D) Quantification of the fraction of the 30,000 sample trajectories that produce limit cycle oscillations, 90.6% (27,182 samples), or reach a fixed point, 9.4% (2,818 samples). (E) Histograms quantify the variability in limit cycle amplitude and period for the 27,182 trajectories that show limit cycle oscillations. We define the limit cycle amplitude as the peak-to-peak difference for one oscillation, and the period is the time to complete an oscillation. The vertical black lines show these quantities for the true trajectory.
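
One simple way to separate limit cycle trajectories from fixed points, and to measure amplitude and period, is sketched below. The heuristic (inspect the second half of a smooth, noise-free simulated trajectory and threshold its peak-to-peak range) and the tolerance are assumptions; the classification criterion used for this figure is not stated in the caption.

```python
import numpy as np

def classify_trajectory(t, x, tail_fraction=0.5, amp_tol=1e-3):
    """Label a smooth trajectory as 'limit cycle' or 'fixed point'.

    Amplitude is the peak-to-peak difference over the tail of the trajectory;
    period is the mean spacing between successive local maxima in the tail.
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    start = int(len(x) * (1.0 - tail_fraction))
    t_tail, x_tail = t[start:], x[start:]
    amplitude = float(x_tail.max() - x_tail.min())
    if amplitude < amp_tol:
        return {"kind": "fixed point", "amplitude": amplitude, "period": None}
    # Interior local maxima: larger than the previous point, not smaller than the next.
    is_peak = (x_tail[1:-1] > x_tail[:-2]) & (x_tail[1:-1] >= x_tail[2:])
    peak_times = t_tail[1:-1][is_peak]
    period = float(np.mean(np.diff(peak_times))) if peak_times.size >= 2 else None
    return {"kind": "limit cycle", "amplitude": amplitude, "period": period}

# Two hypothetical trajectories: a sustained oscillation and a settling one.
t = np.linspace(0.0, 100.0, 10001)
oscillating = np.sin(2.0 * np.pi * t / 10.0)
settling = 1.0 + np.exp(-t)

results = [classify_trajectory(t, x) for x in (oscillating, settling)]
fraction_limit_cycle = np.mean([r["kind"] == "limit cycle" for r in results])
```

Applied to a full posterior ensemble, the same loop would give the fractions in panel D and the amplitude and period histograms in panel E. Slowly damped oscillations would need a longer simulation or a stricter criterion.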

Fig 7.

Parameter estimation results for the MAPK model in the limit cycle regime with varied sampling strategies.

(A) The equidistant sampling data includes 30 samples taken every 2 minutes over 0 < t ≤ 60 (min). The non-equidistant sampling data includes 20 total samples with two sampling rates; there are 5 samples taken every 5 minutes for the first 30 minutes and 15 samples taken every 2 minutes for 30 additional minutes. The oscillations-only data set includes only samples from the oscillations, with 15 samples taken every 2 minutes over the interval t ∈ (30, 60]. (B) The fraction of the 30,000 simulations that yield limit cycle or fixed point trajectories, with parameter samples from the prior and from the posterior distribution associated with each data set. (C–E) Marginal posterior distributions of the model parameters. Distributions are visualized by fitting a kernel density estimator to 2,524,800 samples for the equidistant sampling data, 763,080 for the non-equidistant sampling data, and 1,305,720 for the oscillations-only data. (F–H) Two-dimensional scatter plots reveal relationships between k₄ and α that are necessary to produce limit cycle oscillations. Simulations with blue points produce limit cycle oscillations, and those with red points produce fixed points. Darker regions indicate a higher probability of observing the corresponding parameter values.
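
The three sampling schedules in panel A can be written out explicitly. The equidistant and oscillations-only schedules follow directly from the caption. For the non-equidistant schedule, the exact times of the five early samples are an assumption (taken here as t = 5, 10, ..., 25 min), chosen so that the total is 20 samples.

```python
import numpy as np

# 30 samples every 2 minutes over 0 < t <= 60 (min).
equidistant = 2.0 * np.arange(1, 31)

# 15 samples every 2 minutes over the interval (30, 60].
oscillations_only = 30.0 + 2.0 * np.arange(1, 16)

# 5 early samples every 5 minutes (assumed at t = 5, ..., 25),
# followed by the same 15 samples every 2 minutes over (30, 60].
non_equidistant = np.concatenate([5.0 * np.arange(1, 6), oscillations_only])

for name, times in [("equidistant", equidistant),
                    ("non-equidistant", non_equidistant),
                    ("oscillations only", oscillations_only)]:
    print(f"{name}: {times.size} samples, t in [{times[0]}, {times[-1]}] min")
```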

Fig 8.

Parameter estimation for a coupled kinase-phosphatase switch for long-term potentiation and long-term depression in neurons as a function of calcium input.

(A) Network diagram of the simplified coupled kinase-phosphatase signaling model where calcium Ca²⁺(t) acts as the input. (B) Trajectories of the three state variables in response to long-term potentiation (LTP; pulse of Ca²⁺(t) ≡ 4.0 [μM] from 2 ≤ t ≤ 3 (sec)) and long-term depression (LTD; pulse of Ca²⁺(t) ≡ 2.2 [μM] from 2 ≤ t ≤ 3 (sec)) inducing calcium inputs. The calcium level is set to a baseline of Ca²⁺(t) ≡ 0.1 [μM] before and after the stimulus. We compute normalized EPSP by normalizing A(t) to its initial condition as described in [57]. The synthetic noisy data for the LTP and LTD cases are indicated by the black square and green circle marks, respectively, with the noise covariance equal to 1% of the variance of the data. (C and D) Sobol sensitivity indices for all free model parameters in response to LTP-inducing and LTD-inducing inputs, respectively. The quantities of interest are the steady state values of each state variable. We show both the first-order sensitivity indices S_i and the total-order indices. We select a reduced set of free parameters by choosing the parameters whose first-order sensitivity index is greater than 0.05, i.e., S_i > 0.05. This gives us the same set of free parameters, which includes k₆, K₀, P₀, K_tot, P_tot, and A_tot, for both the LTP and LTD cases. Remaining model parameters are fixed to the nominal values in S4 Table.
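
The calcium inputs described in panel B are rectangular pulses on a constant baseline. A small sketch of such an input function follows; the function and constant names are hypothetical, while the levels and timing are those given in the caption.

```python
import numpy as np

LTP_PULSE = 4.0  # [uM], LTP-inducing calcium level
LTD_PULSE = 2.2  # [uM], LTD-inducing calcium level

def calcium_input(t, pulse_level, baseline=0.1, t_on=2.0, t_off=3.0):
    """Calcium concentration [uM] at time t [sec].

    Equals pulse_level for t_on <= t <= t_off and baseline otherwise.
    """
    t = np.asarray(t, dtype=float)
    return np.where((t >= t_on) & (t <= t_off), pulse_level, baseline)

t = np.linspace(0.0, 5.0, 11)  # 0.0, 0.5, ..., 5.0 sec
ltp_input = calcium_input(t, LTP_PULSE)
ltd_input = calcium_input(t, LTD_PULSE)
```

Passing such a function as the forcing term of the ODE right-hand side reproduces the stimulus protocol. The discontinuities at 2 and 3 seconds may require the solver to restart or take small steps at those times.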

Fig 9.

Comprehensive parameter estimation and uncertainty quantification reveal failures to predict the correct long-term model behavior.

(A) We estimated marginal posterior distributions of the model parameters from noisy data with an LTP-inducing calcium input. Distributions are visualized by fitting a kernel density estimator to 547,500 (150 walkers with 3,649 steps each) MCMC samples obtained using CIUKF-MCMC with AIES for MCMC after discarding the first 4,351 samples per walker as burn-in. (B) Posterior distributions of the trajectories of the state variables show LTP (normalized EPSP >1) and LTD (normalized EPSP <1) responses for an LTP-inducing input. The true trajectory (dashed black line) shows the dynamics with the nominal parameters, the dotted blue line shows the trajectory evaluated at the MAP point, and the points show the noisy data (covariance is 1% of the variance of the true trajectory). The 95% credible interval shows the region between the 2.5th and 97.5th percentiles that contains 95% of 30,000 posterior trajectories. (C) Sample posterior trajectories (100 out of 30,000 total) highlight the LTP (blue lines) and LTD (black lines) responses to the LTP-inducing input (top) and the LTD-inducing input (bottom). The dashed green lines show the true trajectories for the respective calcium inputs. (D) Histograms reveal the distribution of the long-term responses to the LTP-inducing input (top) and the LTD-inducing input (bottom). The dashed black lines show the true response for the respective calcium inputs. (E) Quantification of the percentage of the 30,000 sample trajectories that produce LTD and LTP responses for each calcium input. The LTP-inducing input yields 76.59% (22,972 samples) of the responses in the LTP state and 23.41% (7,024 samples) in the LTD state. The LTD-inducing input yields 68.93% (20,680 samples) of the responses in the LTP state and 31.07% (9,320 samples) in the LTD state.
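
The percentages in panel E follow from thresholding the long-term normalized EPSP at 1. The sketch below shows that bookkeeping under stated assumptions: the function name is hypothetical, and the input array is a placeholder built to match the counts quoted for the LTD-inducing input rather than actual simulation output.

```python
import numpy as np

def response_fractions(final_epsp):
    """Count LTP (normalized EPSP > 1) and LTD (normalized EPSP < 1) responses."""
    final_epsp = np.asarray(final_epsp, dtype=float)
    n = final_epsp.size
    n_ltp = int(np.count_nonzero(final_epsp > 1.0))
    n_ltd = int(np.count_nonzero(final_epsp < 1.0))
    return {"LTP": (n_ltp, 100.0 * n_ltp / n),
            "LTD": (n_ltd, 100.0 * n_ltd / n)}

# Placeholder long-term responses matching the counts quoted for the
# LTD-inducing input; the values 1.5 and 0.5 are arbitrary stand-ins.
final_epsp = np.concatenate([np.full(20_680, 1.5), np.full(9_320, 0.5)])
fractions = response_fractions(final_epsp)

for label, (count, percent) in fractions.items():
    print(f"{label}: {count} samples ({percent:.2f}%)")
```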
