
Fig 1.

Comparison of likelihoodist and Bayesian probability densities.

Using a noise model, the likelihood function normalized by its integral gives a likelihoodist probability density function (red) for the independent variable. In A the resulting probability that the variable of interest lies below 1 is clearly positive. In a hypothetical scenario where the variable of interest is known to be ≥ 1, a shifted exponential prior could be assigned (B). The posterior probability (orange) is then obtained via Bayes’ rule (Eq 1) and gives the desired 0 probability of the variable being less than 1. A may be viewed as a special case of the Bayesian perspective with a flat prior (blue). Observations and distribution parameters were chosen to obtain a good layout: , d = [2.0, 2.3, 2.4].
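
The two panels can be reproduced numerically on a grid. The following is a minimal sketch, not the code used for the figure: it takes the observations d from the caption, but the normal noise model, its standard deviation (SIGMA) and the rate of the shifted exponential prior (RATE) are assumptions, because the caption's parameter values are not available here.

```python
import math

# Observations from the caption. SIGMA and RATE are assumed values.
d = [2.0, 2.3, 2.4]
SIGMA = 0.8   # assumed standard deviation of a normal noise model
RATE = 1.0    # assumed rate of the shifted exponential prior
SHIFT = 1.0   # the variable of interest is known to be >= 1

def likelihood(x):
    """Product of normal densities of all observations, given the value x."""
    return math.prod(
        math.exp(-0.5 * ((obs - x) / SIGMA) ** 2) / (SIGMA * math.sqrt(2 * math.pi))
        for obs in d
    )

def shifted_exponential(x):
    """Prior density that is zero below SHIFT."""
    return RATE * math.exp(-RATE * (x - SHIFT)) if x >= SHIFT else 0.0

def normalize(values, dx):
    """Scale values so that they integrate to 1 on the grid (Riemann sum)."""
    total = sum(values) * dx
    return [v / total for v in values]

dx = 0.001
grid = [-2.0 + i * dx for i in range(8001)]  # covers [-2, 6]

# Panel A: likelihood normalized by its integral (equivalent to a flat prior)
likelihoodist = normalize([likelihood(x) for x in grid], dx)
# Panel B: likelihood times prior, normalized (Bayes' rule)
posterior = normalize([likelihood(x) * shifted_exponential(x) for x in grid], dx)

def prob_below(pdf, threshold):
    """Probability mass of a gridded density below the threshold."""
    return sum(p for x, p in zip(grid, pdf) if x < threshold) * dx

p_flat = prob_below(likelihoodist, SHIFT)
p_post = prob_below(posterior, SHIFT)
print(f"P(x < 1), flat prior:          {p_flat:.4f}")
print(f"P(x < 1), shifted exponential: {p_post:.4f}")
```

With the flat prior a small but positive probability remains below 1, whereas the shifted exponential prior forces it to exactly 0.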


Fig 2.

Relationship of independent and dependent variable.

The distribution of measurement responses (dependent variable) can be modeled as a function of the independent variable. This measurement response probability distribution (here: Student-t) is parametrized by the mean μ (solid green line) and the spread parameters σ and ν. Some or all of the distribution's parameters are modeled as a function of the independent variable.
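
As an illustration of this structure, the sketch below evaluates a Student-t log-density whose location and scale depend on the independent variable. The functions mu_of_x and scale_of_x and the value of NU are hypothetical choices for illustration and are not taken from the article.

```python
import math

def student_t_logpdf(y, mu, scale, nu):
    """Log-density of a Student-t distribution with location mu, scale and nu."""
    z = (y - mu) / scale
    return (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
        - math.log(scale)
        - (nu + 1) / 2 * math.log1p(z * z / nu)
    )

# Hypothetical parameter functions (assumptions for illustration only)
def mu_of_x(x):
    """Location parameter as a linear function of the independent variable."""
    return 0.1 + 1.5 * x

def scale_of_x(x):
    """Scale parameter as a linear function of the location parameter."""
    return 0.02 + 0.05 * mu_of_x(x)

NU = 5.0  # degrees of freedom, kept constant

def loglikelihood(x, y):
    """Log-likelihood of observing response y at independent variable x."""
    return student_t_logpdf(y, mu_of_x(x), scale_of_x(x), NU)

print(loglikelihood(1.0, 1.6))   # observation at the modeled location
print(loglikelihood(1.0, 2.1))   # observation far from the modeled location
```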


Fig 3.

Diagnostic plots of model fits.

The raw data (blue dots) and corresponding fit are visualized in the top row alongside 95, 90, and 68% likelihood bands of the model. Linear and logistic models were fitted to synthetic data to show three kinds of lack-of-fit error (columns 1–3) in comparison to a perfect fit (column 4). The underlying structure of the data and model is as follows: A: Homoscedastic linear model, fitted to homoscedastic nonlinear data. B: Homoscedastic linear model, fitted to heteroscedastic linear data. C: Homoscedastic linear model, fitted to homoscedastic linear data that is Lognormal-distributed. D: Heteroscedastic logistic model, fitted to heteroscedastic logistic data. The residual plots in the middle row show the distance between the data and the modeled location parameter (green line). The bottom row shows how many data points fall into the percentiles of the predicted probability distribution. Whereas the lack-of-fit cases exhibit systematic under- and over-occupancy of percentiles, only in the perfect-fit case are all percentiles approximately equally occupied.
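
The percentile occupancy in the bottom row can be computed by evaluating the cumulative distribution function of the predicted distribution at every observation and counting how many values fall into each percentile bin. The sketch below assumes a normal distribution instead of the Student-t for brevity, and the synthetic data and parameter values are hypothetical.

```python
import random
from statistics import NormalDist

random.seed(1)

def occupancy(xs, ys, mu, sigma, n_bins=10):
    """Count observations per percentile bin of the predicted distribution."""
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        u = NormalDist(mu(x), sigma(x)).cdf(y)  # percentile of y under the model
        counts[min(int(u * n_bins), n_bins - 1)] += 1
    return counts

# Synthetic heteroscedastic linear data (comparable to case B)
xs = [random.uniform(0.0, 10.0) for _ in range(2000)]
ys = [random.gauss(2.0 * x, 0.2 + 0.3 * x) for x in xs]

# Model that matches the data-generating process
good = occupancy(xs, ys, mu=lambda x: 2.0 * x, sigma=lambda x: 0.2 + 0.3 * x)
# Homoscedastic model applied to the same heteroscedastic data
bad = occupancy(xs, ys, mu=lambda x: 2.0 * x, sigma=lambda x: 1.7)

print("matching model:      ", good)
print("homoscedastic model: ", bad)
```

For the matching model the counts scatter around the same value in every bin, while the homoscedastic model shows a much larger spread between bins.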


Fig 4.

Uncertainty about the independent variable.

An intuition for inferring the independent variable from an observed dependent variable is to cut (condition) the green probability distribution model at the observed value (blue slices) and normalize its area to 1. The resulting (blue) slice is a potentially asymmetric probability distribution that describes the likelihood of the observation, given the independent variable. Its maximum (the maximum likelihood estimate) is the value of the independent variable that best describes the observation. For multiple observations, the probability density function for the independent variable corresponds to the product of the PDFs of the observations. The red shoulders mark the regions outside of the 90% equal-tailed interval.
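
The procedure described in this caption can be sketched on a grid: evaluate the likelihood of every observation at each candidate value of the independent variable, multiply, and normalize. The calibration functions mu and sigma, the normal noise model and the two observations are assumptions chosen for illustration; the function infer_x is hypothetical and is not the calibr8 implementation.

```python
import math

def mu(x):
    """Hypothetical saturating calibration curve for the location parameter."""
    return 2.0 * x / (0.5 + x)

def sigma(x):
    """Hypothetical spread that grows with the location parameter."""
    return 0.02 + 0.05 * mu(x)

def normal_pdf(y, m, s):
    return math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

def infer_x(observations, lower=0.0, upper=5.0, n=5001):
    """Normalized product of per-observation likelihoods on a grid of x values."""
    dx = (upper - lower) / (n - 1)
    grid = [lower + i * dx for i in range(n)]
    dens = [
        math.prod(normal_pdf(y, mu(x), sigma(x)) for y in observations)
        for x in grid
    ]
    area = sum(dens) * dx
    return grid, [v / area for v in dens], dx

def equal_tailed_interval(grid, pdf, dx, mass=0.9):
    """Cut off (1 - mass) / 2 of the probability mass on each side."""
    tail = (1 - mass) / 2
    cum, lo, hi = 0.0, grid[0], grid[-1]
    for x, p in zip(grid, pdf):
        prev, cum = cum, cum + p * dx
        if prev < tail <= cum:
            lo = x
        if prev < 1 - tail <= cum:
            hi = x
    return lo, hi

grid, pdf, dx = infer_x([1.50, 1.55])
x_mle = grid[max(range(len(grid)), key=pdf.__getitem__)]  # grid point of maximum density
lo, hi = equal_tailed_interval(grid, pdf, dx)
print(f"maximum at x = {x_mle:.3f}, 90% ETI = [{lo:.3f}, {hi:.3f}]")
```

Because the assumed calibration curve saturates, the resulting density is asymmetric, with a longer tail towards high values of the independent variable.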


Fig 5.

Reparametrized asymmetric logistic function.

When parametrized as shown in Eq 11, each of the 5 parameters can be manipulated without influencing the others. Note that, for example, the symmetry parameter c can be changed without affecting the x-coordinate of the inflection point (Ix), or the slope S at the inflection point (gray vs. black).


Fig 6.

calibr8 class diagram.

All calibr8 models inherit from the same CalibrationModel and DistributionMixin classes that define attributes, properties and method signatures that are common to all calibration models. Some methods, like loglikelihood() or objective(), are implemented by CalibrationModel directly, whereas others are implemented by the inheriting classes. Specifically, the predict_* methods depend on the choice of the domain expert. With a suite of Base*T classes, calibr8 provides base classes for models based on Student-t distributed observations. A domain expert may start from any of these levels to implement a custom calibration model for a specific application.
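
The division of labor between base class and inheriting classes can be illustrated with a toy hierarchy. The classes ToyCalibrationModel and ToyLinearModel below are hypothetical and only mirror the pattern described in the caption; they are not the calibr8 classes, and a normal distribution is assumed instead of the Student-t.

```python
import math
from abc import ABC, abstractmethod

class ToyCalibrationModel(ABC):
    """Base class: stores parameters and implements shared methods."""

    def __init__(self, theta):
        self.theta = list(theta)

    @abstractmethod
    def predict_dependent(self, x):
        """Return (mu, sigma) of the measurement response at x."""

    def loglikelihood(self, x, y):
        """Sum of normal log-densities, implemented once for all subclasses."""
        total = 0.0
        for xi, yi in zip(x, y):
            mu, sigma = self.predict_dependent(xi)
            total += -0.5 * ((yi - mu) / sigma) ** 2
            total -= math.log(sigma * math.sqrt(2 * math.pi))
        return total

class ToyLinearModel(ToyCalibrationModel):
    """Supplied by the domain expert: linear mu, constant sigma."""

    def predict_dependent(self, x):
        intercept, slope, sigma = self.theta
        return intercept + slope * x, sigma

    def predict_independent(self, y):
        """Invert the linear location function."""
        intercept, slope, _ = self.theta
        return (y - intercept) / slope

model = ToyLinearModel([0.5, 2.0, 0.1])
print(model.predict_dependent(2.0))
print(model.predict_independent(4.5))
print(model.loglikelihood([1.0, 2.0], [2.4, 4.6]))
```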


Fig 7.

Data structures and computation graph of murefi models.

Elements in a comprehensive parameter vector are mapped to replicate-wise model instances. In the depicted example, the model instances for both replicates “B1” and “B2” share θ1,global as the first element in their parameter vectors. The second model parameter θ2 is local to the replicates, hence the full parameter vector (left) comprises three elements. Model predictions are made such that they resemble the structure of the observed data, having the same number of time points for each predicted time series. An objective function calculating the sum of log-likelihoods is created by associating predicted and observed time series via their respective calibration models. By associating the calibration models based on the dependent variable name, a calibration model may be re-used across multiple replicates, or kept local if, for example, the observations were obtained by different methods.


Fig 8.

Linear (top) and logistic (bottom) calibration model of glucose assay.

A calibration model comprising linear functions for both the location parameter μA365 and the scale parameter of a Student-t distribution was fitted to calibration data of glucose standard concentrations () and absorbance readouts by maximum likelihood estimation (A-C). The calibration data used to fit the linear model is the subset of standards that were spaced evenly on a log-scale up to (B, E). Likewise, a calibration model with a 5-parameter asymmetric logistic function for the μ parameter of the Student-t distribution was fitted to the full calibration dataset (D-E). In both models, the scale parameter was modeled as a 1st-order polynomial function of μ and the degree of freedom ν as a constant. The extended range of calibration standard concentrations up to reveals a saturation kinetic of the glucose assay (A, D) and depending on the glucose concentration, the residuals (C, F) with respect to the modeled location parameter are scattered by approximately 5%. Modeling the scale parameter of the distribution as a 1st-order polynomial function of μ describes the broadening of the distribution at higher concentrations (C).


Fig 9.

Calibration model of biomass-dependent backscatter measurement.

Backscatter observations from two independent calibration experiments (1400 rpm, gain = 3) on the same BioLector Pro cultivation device were pooled. A non-linearity of the backscatter/CDW relationship is apparent already from the data itself (A). The evenly spaced calibration data (B) are well-described with little lack-of-fit error (C). At low biomass concentrations the relative spread of the measurement responses starts at ca. 20% and reduces to approximately 2% at concentrations above .


Fig 10.

Independent variable PDFs in various observation scenarios.

Posterior densities inferred from various numbers of observations corresponding to different biomass concentrations are shown (A). The ends of the drawn lines in A indicate the 95% equal-tailed interval. Near biomass concentrations of 0, the posterior density is asymmetric (A, blue), indicating that very low concentrations cannot be distinguished. As the number of observations grows, the probability mass is concentrated and the ETIs shrink (A, oranges). The choice of a Student-t distribution model can lead to a multi-modality of the inferred posterior density when observations lie far apart (B). For asymmetric distributions, the median (dashed line) does not necessarily coincide with a mode and equal-tailed and highest-density intervals (ETI, HDI) can be different. Maximum likelihood estimates from individual observations, as obtained via predict_independent, are shown as arrows. Note: and the model’s ν parameter were chosen at extreme values for illustrative purposes.
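
The multi-modality in panel B can be reproduced with a small numerical experiment. The sketch below assumes an identity relationship between the independent variable and the location parameter, a scale of 1 and ν = 1; these values and the two observations are hypothetical and chosen only to make the effect visible.

```python
import math

def t_kernel(y, x, scale, nu):
    """Unnormalized Student-t density of observation y with location x."""
    return (1 + ((y - x) / scale) ** 2 / nu) ** (-(nu + 1) / 2)

def normal_kernel(y, x, scale):
    """Unnormalized normal density of observation y with location x."""
    return math.exp(-0.5 * ((y - x) / scale) ** 2)

def count_modes(values):
    """Number of strict local maxima in a sequence of density values."""
    return sum(
        values[i - 1] < values[i] > values[i + 1]
        for i in range(1, len(values) - 1)
    )

observations = [1.0, 5.0]                      # two observations that lie far apart
grid = [-5.0 + 0.01 * i for i in range(1601)]  # x from -5 to 11

# Unnormalized densities for x: product over the observations
t_density = [math.prod(t_kernel(y, x, 1.0, 1.0) for y in observations) for x in grid]
n_density = [math.prod(normal_kernel(y, x, 1.0) for y in observations) for x in grid]

print("modes with Student-t model:", count_modes(t_density))
print("modes with normal model:   ", count_modes(n_density))
```

Under these assumptions the heavy tails of the Student-t distribution keep one mode near each observation, whereas the product of normal densities always has a single mode between them.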


Table 1.

Tabular representation of a parameter mapping.

With columns corresponding to the parameter names of a naive Monod process model, the parametrization of each replicate, identified by a replicate ID (rid), is specified in a tabular format. Parameter identifiers that appear multiple times (e.g. S0) correspond to a parameter shared across replicates. Accordingly, replicate-local parameter names simply do not appear multiple times (e.g. X0_A06). Numeric entries are interpreted as fixed values and will be left out of parameter estimation. Columns do not need to be homogeneously fixed/shared/local, but parameters can only be shared within the same column. The parameter mapping can be provided as a DataFrame object.
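
The rules stated in this caption can be expressed in a few lines of plain Python. The sketch below is a hypothetical illustration and not the murefi implementation: apart from S0 and X0_A06, the column names, the replicate B02 and all numeric values are assumptions.

```python
# Hypothetical mapping table: one row per replicate ID (rid), one entry per
# model parameter column. Strings name free parameters, numbers are fixed.
COLUMNS = ["S0", "X0", "mu_max", "K_S", "Y_XS"]
MAPPING = {
    "A06": ["S0", "X0_A06", "mu_max", 0.02, "Y_XS"],
    "B02": ["S0", "X0_B02", "mu_max", 0.02, "Y_XS"],
}

def free_parameter_names(mapping):
    """Unique free parameter names in order of first appearance."""
    names = []
    for row in mapping.values():
        for entry in row:
            if isinstance(entry, str) and entry not in names:
                names.append(entry)
    return names

def check_sharing(mapping):
    """Enforce that parameters are only shared within the same column."""
    column_of = {}
    for row in mapping.values():
        for col, entry in enumerate(row):
            if isinstance(entry, str) and column_of.setdefault(entry, col) != col:
                raise ValueError(f"{entry!r} appears in more than one column")

def unpack(theta, mapping):
    """Map the full parameter vector to replicate-wise parameter lists."""
    check_sharing(mapping)
    names = free_parameter_names(mapping)
    if len(theta) != len(names):
        raise ValueError(f"expected {len(names)} values, got {len(theta)}")
    values = dict(zip(names, theta))
    return {
        rid: [values[e] if isinstance(e, str) else float(e) for e in row]
        for rid, row in mapping.items()
    }

names = free_parameter_names(MAPPING)
per_replicate = unpack([20.0, 0.25, 0.42, 0.5, 0.30], MAPPING)
print(names)
for rid, params in per_replicate.items():
    print(rid, dict(zip(COLUMNS, params)))
```

In this example the full parameter vector has five elements: three are shared by both replicates, two are replicate-local initial biomass concentrations, and the fixed value in the K_S column is excluded from estimation.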


Fig 11.

Measurements and maximum likelihood estimate of C. glutamicum growth Monod model.

Original measurement responses of online biomass (backscatter) and at-line endpoint glucose assay measurements (absorbance) are shown in (A). Glucose measurements were obtained by sacrificing culture wells, hence each backscatter time series terminates at the time of glucose assay observations. The time and well ID of sacrifices are marked by arrows, colored by row in the cultivation MTP. The inset plot shows a typical layout of the cultivation plate (FlowerPlate). The preculture wells are highlighted in green, main cultures in black. In B, the observations and MLE predictions of the ODE process model are shown in SI units. Observations were transformed from original units using the predict_independent method of the respective calibration model. Whereas all curves start at the same global initial substrate concentration S0, each well has individual initial biomass concentrations, resulting in the time shifts visible in the zoomed-in inset plot. Biomass observations in the inset plot (●) correspond to the posterior median inferred from each backscatter observation individually.
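
The time shift caused by well-specific initial biomass concentrations can be illustrated with a simple simulation of the standard Monod equations, dX/dt = μmax · S / (KS + S) · X and dS/dt = −(1 / YXS) · dX/dt. The sketch below uses explicit Euler integration and hypothetical parameter values; it is not the process model implementation used in the study.

```python
def simulate_monod(x0, s0, mu_max, k_s, y_xs, t_end=12.0, dt=0.001):
    """Explicit Euler integration of Monod growth; returns lists t, X, S."""
    t, x, s = [0.0], [x0], [s0]
    for step in range(1, int(round(t_end / dt)) + 1):
        substrate = max(s[-1], 0.0)  # guard against overshoot below zero
        growth = mu_max * substrate / (k_s + substrate) * x[-1]  # dX/dt
        x.append(x[-1] + growth * dt)
        s.append(s[-1] - growth / y_xs * dt)  # dS/dt = -(1 / Y_XS) * dX/dt
        t.append(step * dt)
    return t, x, s

def depletion_time(t, s, threshold=0.01):
    """First time point at which the substrate falls below the threshold."""
    return next(ti for ti, si in zip(t, s) if si < threshold)

# Hypothetical parameters: shared S0, well-specific X0 (all in g/L, h)
SHARED = dict(s0=20.0, mu_max=0.42, k_s=0.02, y_xs=0.5)
t_a, x_a, s_a = simulate_monod(x0=0.25, **SHARED)
t_b, x_b, s_b = simulate_monod(x0=0.30, **SHARED)

dep_a = depletion_time(t_a, s_a)
dep_b = depletion_time(t_b, s_b)
print(f"substrate depleted after {dep_a:.2f} h (X0 = 0.25 g/L)")
print(f"substrate depleted after {dep_b:.2f} h (X0 = 0.30 g/L)")
```

Both simulated wells follow the same kinetics, but the well with the higher initial biomass concentration reaches substrate depletion earlier.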


Fig 12.

Parameter correlations, data and posterior distributions of growth curves.

Each kernel density estimate (KDE) in the top half shows a 2-dimensional cross section of the full posterior, visualizing correlations between some of the model parameters. For example, the topmost KDE shows that the posterior samples of Foffset,D04 are correlated with X0,μ. Axis labels correspond to the lower and upper bound of 90% HDIs. The large pair plot shows just the marginals that are relevant for the replicates D04 and D06, whereas the small pair plot shows the dimensions for all parameters (high resolution in S2 Fig). In the bottom half of the figure, the kinetics of replicates D04 and D06 are drawn. The red (substrate) and green (biomass) densities correspond to the distribution of predictions obtained from posterior samples, as described in Section 3.2.7. The red violins visualize the posterior inferred from single glucose measurement responses without the use of the process model. Likewise, the green vertical bars on the biomass concentrations show the 90% HDI.


Fig 13.

Comparison of Monod model fit with linear error model.

Two Monod kinetic process models were fitted to the same observations from culture well D06 utilizing either a linear calibration model for the biomass/backscatter relationship (orange in A, calibration in D) or the previously established logistic model (blue in A). In A the posterior distribution of backscatter observations (density bands) is overlaid with actual backscatter observations. A linear calibration (D) model with fixed intercept (Section 3.1.5) was fitted to the subset of calibration data points up to such that it covers the range of biomass concentrations expected in the experiment. Residual plots of the observations compared to the posterior predictive distribution of backscatter observations (B, C) show that the fit obtained with the logistic calibration model (blue) has much less lack-of-fit compared to the one with the linear model (orange). Note that the backscatter residuals of ±1% are small compared to the amplitude of the absolute values going from close to 0 to approximately 20. The discrepancy between the two models is also evident from the 90% HDI of the maximum growth rate μmax of [0.414, 0.423] h−1 in the logistic and [0.480, 0.530] h−1 in the linear case.


Fig 14.

Predictions, observations and residuals of Monod model fitted to backscatter data.

A: On a logarithmic y-axis, both the process model (blue density) and the obtained from the biomass calibration model with individual observations (orange) describe an exponentially increasing biomass concentration up to approximately 9 hours. B: The residuals between prediction and observed backscatter (black) and the posterior predictive backscatter distribution (green density) show that the lack-of-fit is consistently less than ±0.25 backscatter units with the exception of a fluctuation at the time of substrate depletion.


Fig 15.

Posterior group mean and well-specific initial biomass concentrations X0.

Variability between the growth curves in separate wells is described by well-specific initial biomass concentrations X0,well. Their posterior probability distribution is wide if the well was sacrificed early (left) and narrows down with the number of observed time points (right). Their common hyperprior (a.k.a. group mean prior) X0,μ for the mean of each X0,well was updated to a posterior with .


Table 2.

Comparison with related software packages.

DSL: Domain-Specific Language, GUI: Graphical User Interface.
