Fig 1.
Reconstructing a probability density function by density estimation or regression.
Monte Carlo sampling yields samples drawn from the target probability distribution. With density estimation, the locations of the samples are used to reconstruct the probability distribution. With regression, both the sample locations and the unnormalized, relative probabilities are used to reconstruct the probability distribution. The example function is a t-distribution with ν = 4 centered at 7.5.
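The contrast in Fig 1 can be illustrated with a minimal sketch. It assumes a Gaussian kernel with Silverman's rule-of-thumb bandwidth for the density estimate and piecewise-linear interpolation as a stand-in for the regression; neither choice is taken from the paper, and the names `draw_t`, `kde` and `regression` are hypothetical.

```python
import bisect
import math
import random

NU, CENTER = 4, 7.5  # t-distribution parameters from the caption


def unnormalized_t(x):
    """Relative (unnormalized) density of the t-distribution."""
    return (1.0 + (x - CENTER) ** 2 / NU) ** (-(NU + 1) / 2)


def t_pdf(x):
    """Normalized density, used here only as a reference."""
    norm = math.gamma((NU + 1) / 2) / (math.sqrt(NU * math.pi) * math.gamma(NU / 2))
    return norm * unnormalized_t(x)


def draw_t(rng):
    """Draw one sample as a normal divided by sqrt(chi-square / nu)."""
    chi2 = sum(rng.gauss(0, 1) ** 2 for _ in range(NU))
    return CENTER + rng.gauss(0, 1) / math.sqrt(chi2 / NU)


def kde(samples):
    """Density estimation: uses only the sample locations."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    h = 1.06 * std * n ** (-0.2)  # Silverman's rule of thumb (an assumption)

    def density(x):
        total = sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
        return total / (n * h * math.sqrt(2 * math.pi))

    return density


def regression(samples):
    """Regression: interpolates (location, relative probability) pairs."""
    xs = sorted(set(samples))
    ys = [unnormalized_t(x) for x in xs]
    # The trapezoid rule approximates the normalizing constant over the sampled range.
    area = sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(len(xs) - 1))

    def density(x):
        if x <= xs[0] or x >= xs[-1]:
            return 0.0  # no extrapolation outside the sampled range
        i = bisect.bisect_right(xs, x) - 1
        w = (x - xs[i]) / (xs[i + 1] - xs[i])
        return ((1 - w) * ys[i] + w * ys[i + 1]) / area

    return density


rng = random.Random(0)
samples = [draw_t(rng) for _ in range(500)]
kde_fit = kde(samples)
reg_fit = regression(samples)
```

Evaluating `kde_fit(x)` and `reg_fit(x)` on a grid and comparing them with `t_pdf(x)` gives a picture analogous to the one the caption describes.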
Fig 2.
Density estimation methods applied to a bivariate example.
At the top, we start with samples obtained through Monte Carlo sampling from the posterior of two variables (x1 and x2). The two variables are βkill and γ from the (bounded) Lotka-Volterra example discussed later. On the left, a kernel density estimate (KD) or a Gaussian mixture (GM) is fitted to the samples. In the middle, the variables are first transformed to an unbounded domain (in this case through a scaled logit transform) before a KD or GM is fitted. On the right, the variables are transformed to have uniform marginal distributions between 0 and 1, using either a parametric mixture or an empirical cumulative distribution with Pareto tails. Subsequently, a copula function is fitted to the transformed variables. Finally, on the bottom row, new samples are drawn from each of the approximations. Where necessary, the new samples are transformed back with the inverse of the original transformation. In each case the distribution of the new samples is similar to the original sample distribution, but slight differences between the approximations can be observed as well.
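The scaled logit transform used in the middle column can be sketched as follows, assuming a variable bounded on an open interval (lower, upper). The function names are hypothetical and this is not the mvdens implementation; the log-Jacobian term is what would be added to a log-density fitted in the unbounded space to obtain the log-density in the original, bounded space.

```python
import math


def scaled_logit(x, lower, upper):
    """Map x in (lower, upper) to the real line."""
    u = (x - lower) / (upper - lower)
    return math.log(u / (1.0 - u))


def inverse_scaled_logit(y, lower, upper):
    """Map y on the real line back to (lower, upper)."""
    return lower + (upper - lower) / (1.0 + math.exp(-y))


def log_jacobian(x, lower, upper):
    """log |dy/dx| of the scaled logit transform at x."""
    return math.log(upper - lower) - math.log(x - lower) - math.log(upper - x)
```

New samples drawn from the approximation in the unbounded space would be passed through `inverse_scaled_logit` to return them to the original domain.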
Fig 3.
Comparison of the approximation methods for reconstructing known target distributions with increasing dimensionality.
The density approximations were trained on 500 samples, and the accuracy was evaluated by the root mean square error (RMSE) calculated over 500 new samples. This procedure was repeated 100 times and the boxplots show the resulting RMSEs. Note that the scale of the RMSE is different for different dimensionalities and test cases, as the mean density at the location of the samples is also different.
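A minimal sketch of the accuracy measure, assuming the error is taken between the approximated and the true density at each new sample location (the caption implies this but does not spell it out). The helper names are hypothetical.

```python
import math


def rmse(approx_density, true_density, test_samples):
    """Root mean square error between two densities at the test samples."""
    squared = [(approx_density(x) - true_density(x)) ** 2 for x in test_samples]
    return math.sqrt(sum(squared) / len(squared))


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution, used as a toy target."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Toy comparison: a slightly too wide approximation of a standard normal.
test_points = [-1.0, 0.0, 1.0]
error = rmse(lambda x: normal_pdf(x, sigma=1.2), normal_pdf, test_points)
```

Because the error is in units of density, its scale depends on the typical density at the samples, which is why the caption warns against comparing RMSE values across dimensionalities.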
Fig 4.
Lynx-hare datasets and posterior predictive distributions.
The lynx data provides an estimate of lynx density (number of animals per surface area) and the hare data provides an estimate of hare density and natality. Black dots indicate the data, the thick blue line is the median, and the shaded blue area is the 90% confidence interval of the posterior predictive distribution.
Fig 5.
Lynx and hare dataset posterior approximation.
(A) Marginal posterior densities after seeing the lynx data; the graphs were constructed using kernel density estimation with plug-in bandwidth selection. (B) Correlations between the parameters in the lynx posterior. (C) Scatter plot of the samples for one parameter combination. (D) Approximation accuracy as a function of sample size. Spearman correlation and root mean square error were calculated by comparing the approximation with another 1,000 MCMC samples from the target.
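The Spearman correlation in panel D can be sketched as a Pearson correlation of ranks, applied to the approximated and target densities evaluated at the same held-out samples. This is a plain reference implementation with average ranks for ties, not the code used for the figure; the function names are hypothetical.

```python
import math


def ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(a, b):
    """Spearman rank correlation between two equally long sequences."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / math.sqrt(var_a * var_b)
```

Unlike the RMSE, the rank correlation only checks whether the approximation orders the samples by density in the same way as the target, so it is insensitive to a wrong normalization constant.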
Fig 6.
Sequential inference performance.
(A) Marginal density of one of the parameters (for clarity, only the GM approximation and importance reweighting results are shown). The dashed lines indicate the posteriors of the two datasets separately, and the black line is the true joint. Other colors are the same as in C. (B) Empirical cumulative distribution of the same parameter, showing all approximation methods. The colors are the same as in C. (C) Kolmogorov-Smirnov statistics for the comparison of the marginal distributions of the true joint to the marginals of the posterior obtained after sequential inference with each of the approximation methods. Each dot indicates one of the parameters.
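The Kolmogorov-Smirnov statistic in panel C is the largest vertical distance between two empirical cumulative distributions, such as those drawn in panel B. A minimal two-sample sketch follows; the function name is hypothetical and the code is not taken from the paper.

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    distance = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every observation <= x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        distance = max(distance, abs(i / len(a) - j / len(b)))
    return distance
```

For one parameter, `a` would hold the marginal samples from the true joint posterior and `b` the marginal samples obtained after sequential inference; a value near 0 indicates closely matching marginals.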
Table 1.
Log marginal likelihood estimates (± estimation variance if available).
Fig 7.
Sequential inference with bounded priors.
(A) Marginal posterior densities after seeing the lynx data; compare with Fig 5A. (B) Correlations between the parameters in the lynx posterior. (C) Scatter plot of the samples for one parameter combination. (D) Sequential inference accuracy; same as in Fig 6, with the addition of transformed and truncated variations.
Fig 8.
Accuracy of joint versus sequential inference.
Each point represents the mean and standard deviation of the marginal posterior distribution from one run. Each run has the same total number of model evaluations (1.8 million). The dashed line indicates the mean and standard deviation of the posterior from a joint inference run with 100-fold more model evaluations.
Table 2.
Time complexity of training and evaluation of the approximation methods.
Evaluation is the cost of evaluating one new sample. N = number of Monte Carlo samples used for the estimation, D = dimensionality, G = number of mixture components. The training time gives the number of seconds required to fit a 10-dimensional approximation on 1,000 samples using mvdens. For mixture fitting, the time for fitting a 5-component model is reported; the cost of optimizing the number of components grows linearly with the number of components considered.
Fig 9.
Sequential inference in the breast cancer signaling model.
(A) Signaling model in Systems Biology Graphical Notation format. (B) Data and posterior predictive distributions. Black dots indicate the data and the blue shaded area is the 90% confidence interval of the predictive mean. “p-Akt_S473” is the measurement of p and “p-PRAS40_T246” is the measurement of q. (C) Performance in sequential inference when the data is split by first using the measurement of p and then q (i.e. first use the data shown in the top two graphs in (B), and then the bottom two). (D) Performance in sequential inference when the data is split by pre-treatment and on-treatment (i.e. first use the data shown in the left two graphs in (B), and then the right two). (E) Density of one of the parameters as inferred by joint inference (black line) and through sequential approximation split by treatment (colored lines). The grey lines indicate the separate posteriors of the pre-treatment and on-treatment data. (F) Contour plot of the bivariate posterior density of two of the parameters obtained from either dataset alone or the true joint.