Hierarchical Bayesian inference for concurrent model fitting and comparison for group studies

doi:10.1371/journal.pcbi.1007043

Fig 1.

Performance of the HBI in a synthetic dataset.

10 and 30 artificial subjects were generated according to the reinforcement learning (RL) and Kalman filter (KF) models, respectively. A) Model selection by HBI using protected exceedance probabilities (PXP); B) Model frequencies estimated by the HBI. C) Model attribution at the individual level by the HBI. Responsibility estimates are plotted for true attributions (TA), in which the true model has been attributed, and for false attributions (FA), in which the incorrect model is attributed. The HBI shows lower levels of responsibility for FA. Inset: percentage of correct assignment of the model by the HBI at the individual level. D) Comparison of accuracy of model selection with HPE and NHI; E, F) Error in estimating individual parameters of the RL (E) and the Kalman filter model (F). The estimation error is defined as the absolute difference between estimated parameters and the true parameters. In all plots, error-bars are standard errors of the mean obtained across 20 simulations.
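The error measure and error bars described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the absolute error is averaged over subjects within each simulation and that the standard error of the mean is then taken across simulations. The function names and the numbers are hypothetical.

```python
import math
import statistics


def mean_abs_error(estimated, true):
    """Mean absolute difference between estimated and true parameters."""
    if len(estimated) != len(true):
        raise ValueError("estimated and true must have the same length")
    return sum(abs(e - t) for e, t in zip(estimated, true)) / len(true)


def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical per-simulation errors (illustrative numbers, not from the paper).
# Each call compares estimated parameters with the true ones for one simulation.
errors_per_simulation = [
    mean_abs_error([0.4, 0.6, 0.5], [0.5, 0.5, 0.5]),
    mean_abs_error([0.3, 0.7, 0.5], [0.5, 0.5, 0.5]),
    mean_abs_error([0.5, 0.5, 0.4], [0.5, 0.5, 0.5]),
]

bar_height = statistics.mean(errors_per_simulation)  # plotted value
bar_error = sem(errors_per_simulation)               # error-bar half-length
```

With 20 simulations, as in the caption, `errors_per_simulation` would hold 20 values, one per simulated dataset.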


Fig 2.

Robustness of model selection to outliers.

The same 20 datasets simulated in the previous section were used as the base datasets (i.e. 0 outliers) and the effects of adding 1, 2 or 3 outliers to each dataset were examined. The HPE shows severe sensitivity to outliers, while the other two (random effects) methods are robust.


Fig 3.

Performance of the HBI in a synthetic dataset including models with different numbers of parameters.

10 and 30 artificial subjects were generated according to the RL and dual-α RL models, respectively. A) Model selection by HBI using protected exceedance probabilities (PXP); B) Model frequencies estimated by the HBI. C) Model attribution at the individual level by the HBI. Responsibility estimates are plotted for true attributions (TA) and for false attributions (FA). The HBI shows lower levels of responsibility for FA. Inset: percentage of correct assignment of the model by the HBI at the individual level. D) Model selection performance of NHI, HPE, and HBI; E, F) Error in estimating individual parameters of the RL (E) and the dual-α RL model (F). The estimation error is defined as the absolute difference between estimated parameters and the true parameters. In all plots, error-bars are standard errors of the mean obtained across 20 simulations.


Fig 4.

Comparison of HBI with NHI in model selection and model attribution.

We compared the performance of HBI and NHI in three simulation analyses with different ratios of subjects expressing each model. The first simulation includes 10 subjects expressing the RL model and 30 subjects expressing the dual-α RL model (10/30). The second includes 20 subjects per model (20/20), and the third includes 30 subjects expressing the RL model and 10 expressing the dual-α RL model (30/10). A) Mean protected exceedance probabilities (PXP) estimated by the HBI and NHI; B) Mean model frequency of RL across all simulations (true frequencies are also plotted). C-D) Model selection performance at PXP>0.5 (C) and PXP>0.95 (D). For the 20/20 simulations, each model should be selected in 50% of simulations at the chance level, i.e. PXP>0.5, and none of the models should be selected at PXP>0.95. E) Model attribution performance, at the individual level, using responsibility (r) parameters at a threshold of 0.95 across all three simulations. The HBI is more accurate than the NHI in model attribution and shows more true attributions (TA) and fewer false attributions (FA). F) ROC curves, across all three simulations, for HBI and NHI, which illustrate model attribution performance at various threshold settings. Inset: area under the curve (AUC) of the ROC, as a metric for model attribution performance. The HBI shows better performance than the NHI according to this metric. In A-B, error-bars are standard errors of the mean obtained across 20 simulations.
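The attribution metrics in this caption (counts of TA and FA at a fixed responsibility threshold, and the AUC over all thresholds) can be illustrated with a small sketch. It assumes a two-model setting in which each subject has a responsibility score for one model and a binary label saying whether that model truly generated the subject. The function names and data are hypothetical, and the AUC is computed with the rank-based (Mann-Whitney) formulation rather than by tracing the curve; this is not the authors' implementation.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def attribution_counts(scores, labels, threshold=0.95):
    """Count true (TA) and false (FA) attributions to the model among
    subjects whose responsibility exceeds the threshold."""
    ta = sum(1 for s, y in zip(scores, labels) if s > threshold and y)
    fa = sum(1 for s, y in zip(scores, labels) if s > threshold and not y)
    return ta, fa


# Hypothetical responsibilities for one model and the true generating labels.
responsibilities = [0.99, 0.97, 0.60, 0.96, 0.10, 0.02]
truly_this_model = [1, 1, 1, 0, 0, 0]

auc = roc_auc(responsibilities, truly_this_model)
ta, fa = attribution_counts(responsibilities, truly_this_model)
```

An AUC of one corresponds to perfect separation of the two groups of subjects, and 0.5 to chance.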


Fig 5.

Performance of the HBI as a function of the number of trials.

10 and 30 artificial subjects were generated according to the RL and dual-α RL models, respectively. These simulations were performed with different numbers of trials (T) per subject. A) The accuracy of model selection by NHI, HPE, and HBI for T = 50, T = 100, and T = 200 trials; B) Mean error in estimating individual parameters across both models and parameters. Note that the estimation errors here are computed on the normally distributed parameters. The estimation error is defined as the absolute difference between estimated parameters and the true parameters. In all plots, error-bars are standard errors of the mean obtained across 20 simulations.


Fig 6.

Performance of the HBI as a function of the number of subjects.

In this analysis, simulations were repeated 1000 times. In half of the simulations, subjects expressing the RL model were three times as frequent as those expressing the dual-α RL model, and vice versa in the other half. A) Protected exceedance probabilities (PXP) of the most frequent model estimated by the HBI and NHI; B) Model frequency of the most frequent model across all simulations. The black line indicates the true frequency (0.75). C-D) Model selection performance by the HBI and NHI at PXP>0.5 and PXP>0.95, respectively. The NHI almost never selects the most frequent model at PXP>0.95. E) Model selection performance using area under the ROC curve. Higher values indicate better performance (one corresponds to perfect model selection). The HBI performance improves as the number of subjects increases. F) Error in estimating individual parameters across both models and parameters. Estimation errors are computed on the normally distributed parameters. The estimation error is defined as the absolute difference between estimated parameters and the true parameters. In A, B, and F, the median across 1000 simulations is plotted and error-bars represent the first and third quartiles.


Fig 7.

The sensitivity of parameter estimation to outliers.

30 subjects are simulated using the RL model. A) In scenario 1, a number of outliers are also simulated with the same learning rate but a small decision noise parameter. B) In scenario 2, outliers are simulated with a small learning rate and a small decision noise parameter. Errors in recovering the group-level parameters are plotted (for the learning rate and the decision noise). HBI performs better than the alternatives. The estimation error is defined as the absolute difference between estimated group-level parameters and the true parameters. In all plots, error-bars are standard errors of the mean obtained across 20 simulations.


Fig 8.

Performance of the HBI in a large model space.

HBI was tested in a large model space including RL, dual-α (DA) RL, Kalman filter (KF) and actor-critic (AC) models in four scenarios. In each scenario, one model (the dominant model) was used to generate 30 subjects, and each of the other three models was used to generate 10 subjects. A) Model selection by HBI using protected exceedance probabilities (PXP). B) Model frequencies estimated by the HBI. Note that in each scenario, the true model frequency is 0.5 for the dominant model (30 of 60 subjects) and about 0.17 for each of the other models (10 of 60). C) Model selection performance (at 50%) of NHI, HPE, and HBI. D) Error in estimating individual parameters across both models and parameters. Estimation errors are computed on the normally distributed parameters, defined as the absolute difference between estimated parameters and the true parameters. In all plots, error-bars are standard errors of the mean obtained across 20 simulations.


Fig 9.

Performance of the HBI in the two-step Markov decision task.

30, 10 and 10 artificial subjects have been generated using the hybrid, the model-based (MB) and the model-free (MF) models, respectively. A) Model selection by HBI using protected exceedance probabilities (PXP). B) Model frequencies estimated by the HBI. C) Model selection performance (at 50%) of NHI, HPE, and HBI. D) Error in estimating the critical weight parameter of the hybrid model at the individual level. HBI shows less error than other methods in all simulations. In all plots, error-bars are standard errors of the mean obtained across 20 simulations.


Fig 10.

Performance of the HBI t-test for making inference at the population level.

RL agents with a bias parameter were generated according to different mean (effect size) values in two simulations where A) there is only one model in the model-space (scenario 1); or B) there are two models in the model-space (scenario 2). The HBI makes inference using the HBI t-test; the NHI makes inference by performing a t-test on its estimated parameters; and the HPE makes inference by comparing the full fit and the null fit (in which the group-level prior mean for the bias parameter is fixed). The sensitivity (or power) of the tests in detecting true effects at P < 0.05, i.e. the true positive rate, is plotted for a number of different effect sizes. For the HPE, log-evidence of at least 3 was considered as significant. The HPE shows lower sensitivity than the other methods in both scenarios. Moreover, the HBI shows higher sensitivity than the NHI in scenario 2.


Fig 11.

Performance of the HBI t-test under the null.

A bias parameter was generated under the null (effect size is 0) in two simulations where A) there is only one model in the model-space (scenario 1); or B) there are two models in the model-space (scenario 2). The probability distribution of P-value is obtained by repeating the simulation 2000 times. Note that under the null hypothesis, the resulting P-value is theoretically expected to have a uniform distribution. The error-bars are 95% confidence intervals for the binomial distribution.
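The reasoning behind this check can be sketched numerically. If P-values are uniform under the null, every histogram bin should hold the same proportion of the 2000 simulated P-values, and a binomial interval shows how much each bin may deviate by chance. The sketch below is an illustration under stated assumptions: the number of bins (20) is assumed because the caption does not give it, the interval uses the normal approximation to the binomial (the caption does not say which interval method was used), and the function names are hypothetical.

```python
import math


def binomial_ci_normal(p, n, z=1.96):
    """Approximate 95% interval for a proportion p observed over n trials,
    using the normal approximation to the binomial distribution."""
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def pvalue_histogram(p_values, n_bins=20):
    """Proportion of P-values in each of n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for p in p_values:
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        counts[idx] += 1
    return [c / len(p_values) for c in counts]


n_simulations = 2000     # number of repetitions, from the caption
n_bins = 20              # assumed bin count, not stated in the caption
expected = 1.0 / n_bins  # uniform P-values put equal mass in every bin

low, high = binomial_ci_normal(expected, n_simulations)
```

Under these assumptions, a bin whose observed proportion falls well outside `(low, high)` would suggest that the test is not calibrated under the null.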


Fig 12.

Performance of the HBI t-test when samples are drawn from a skewed distribution.

A) The skewed distribution (skewness of −0.5). The mean, variance and kurtosis of the distribution are 0, 1 and 3 (i.e. the kurtosis of the normal distribution), respectively. This distribution was used to generate the bias parameter, which was then used to generate datasets of 20 and 50 subjects according to the biased RL model. B-C) Inference at P < 0.05 for the HBI t-test on estimated parameters and for a t-test on true parameters, as a benchmark, when there is no effect (under the null). Note that this is an unrealistic benchmark because it is based on true parameters that the HBI does not have access to. D-E) The probability distribution of the P-value is obtained under the null hypothesis by repeating simulations 2000 times. Under the null hypothesis, the resulting P-value is theoretically expected to have a uniform distribution. Increasing the number of subjects improves the performance of the HBI t-test. The error-bars are 95% confidence intervals for the binomial distribution.


Fig 13.

Using HBI for making inference on empirical datasets.

A) HBI has been applied to a dataset of the two-step Markov decision task. The model space consisted of the hybrid, the model-based (MB) and the model-free (MF) models. Protected exceedance probabilities (PXP), model frequencies and estimated parameters of the winning model (the hybrid) are plotted. The error-bars are obtained by applying the corresponding transformation function on the hierarchical errors and, therefore, are not necessarily symmetric.


Fig 14.

Using HBI for making inference on data from patients with Parkinson’s disease.

A) HBI has been applied to a dataset of 31 Parkinson’s disease (PD) patients performing a probabilistic reward and punishment learning task. The model space consisted of a null non-learning (NL) model, the RL model, and the dual-α RL model. Protected exceedance probabilities (PXP), model frequencies and estimated parameters of the winning model (the dual-α RL) are plotted. The HBI revealed that the dual-α RL is more likely across PD patients. B) The same model space was fitted to a dataset of 20 healthy control subjects performing the same task. In contrast to PD patients, the RL model is more likely across the control group. In addition to the decision noise, β, and learning rate parameters, both RL models also modeled the tendency to repeat or avoid the previous choice regardless of outcomes using a perseveration parameter, p. A permutation test revealed that the dual-α model is more likely than the RL model in PD compared with the controls. The error-bars are obtained by applying the corresponding transformation function on the hierarchical errors and, therefore, are not necessarily symmetric.
