
Fig 1.

Fitting of the HGF model on a dataset with changing variance.

Two signals, with low (0.1) and high (1) variance respectively, were simulated successively for 200 trials each. A two-level HGF and the HAFVF were fitted to this simple dataset. A. The HGF treated the lower-variance component as a “tonic” factor, whereas all the additional variance of the second part of the signal was assigned to the “phasic” (time-varying) volatility component. This corresponded to a high second-level activation during the second phase of the experiment (B.), reflecting a low estimate of signal stability. The corresponding Maximum a Posteriori (MAP) estimate of the HAFVF yielded a much better variance estimate for both the first and second parts of the experiment (A.) and, in contrast to the HGF, its stability measure (B.) decreased only at the time of the contingency change. Shaded areas represent the 95% (approximate) posterior confidence interval of the mean. Green dots represent the values of the observations.
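As a minimal sketch, a dataset like the one described above can be generated as follows (the zero mean and the random seed are assumptions; the caption specifies only the two variances and the 200-trial phase lengths):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two successive regimes of 200 trials each, differing only in variance:
# a low-variance phase (0.1) followed by a high-variance phase (1.0).
# The common mean of 0 is an assumption; the caption specifies only variances.
y = np.concatenate([
    rng.normal(0.0, np.sqrt(0.1), 200),  # phase 1: variance 0.1
    rng.normal(0.0, np.sqrt(1.0), 200),  # phase 2: variance 1.0
])

# Empirical variances of the two phases, close to 0.1 and 1.0.
v_low, v_high = y[:200].var(), y[200:].var()
```

A filter tracking this trace must explain the tenfold variance increase at trial 200 either as volatility (the HGF's attribution) or as a drop in stability (the HAFVF's).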

Table 1.

HAFVF prior parameters in the case of normally distributed variables.

Horizontal lines separate the various levels.

Fig 2.

Directed Acyclic Graph of the HAFVF model.

Plain circles represent observed variables, white circles represent latent variables and dots represent prior distribution parameters. Dashed circles and dashed arrows represent approximate posteriors and approximate posterior dependencies. A weighted prior latent node is highlighted.

Fig 3.

Simulated policies for the AC-HAFVF as a function of reward variance βj and number of effective observations κj, for a fixed value of the posterior mean rewards (μ1 = −μ2 = 1), shape parameters α1 = α2 = 3, threshold ζ = 2, start point z0 = ζ/2 and τ = 0.

A. Choices were more random for noisier reward distributions (i.e. high values of βj) and for mean estimates with a higher variance (i.e. with a lower number of observations κj). B. Decisions were faster when the difference between the means was clearer (high κj) and when the reward distributions were noisy (high βj). Subjects were slower to decide what to do for noisy mean values but precise rewards, reflecting the high cognitive cost of the decision process in these situations.
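The qualitative pattern in panel A can be illustrated with a generic drift-diffusion simulation (a sketch only, not the NIGDM used in the article; the drift and noise values below are invented): with the same drift, larger diffusion noise pushes choices toward chance level.

```python
import numpy as np

# Generic drift-diffusion sketch (not the article's NIGDM): evidence z
# starts at z0 = zeta / 2 and diffuses toward one of two thresholds {0, zeta}.
def simulate_ddm(drift, sigma, zeta=2.0, z0=1.0, dt=1e-3, max_t=10.0, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    z, t = z0, 0.0
    while 0.0 < z < zeta and t < max_t:
        z += drift * dt + sigma * np.sqrt(dt) * rng.normal()
        t += dt
    return z >= zeta, t  # (upper-boundary choice, decision time)

rng = np.random.default_rng(1)
# With identical drift, noisier evidence (larger sigma) makes the
# upper-boundary choice less reliable, i.e. behaviour closer to chance.
p_precise = np.mean([simulate_ddm(0.5, 0.5, rng=rng)[0] for _ in range(200)])
p_noisy = np.mean([simulate_ddm(0.5, 2.0, rng=rng)[0] for _ in range(200)])
```

Here the accuracy under low noise exceeds the accuracy under high noise, mirroring the caption's observation that choices were more random for noisier reward distributions.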

Fig 4.

HAFVF and HGF performance on the same dataset.

Shaded areas represent the ±3 standard error interval. The two models were fitted to the first 400 trials, and then tested on the whole trace of observations. A. Observations, mean and standard error of the mean estimated by both models. B. The variance estimates show that the HAFVF adapted better to the variance in the first part of the experiment, better reflected the surprise at the contingency change, and successfully adapted its estimate when the environment was highly stable. The HGF, on the contrary, rapidly degenerated in its estimate of the variance and did not show a significant trace of surprise when the contingency was altered. C. The value of the effective memory of the HAFVF, represented by the approximate posterior parameter κμ, together with the maximum memory (efficient memory; see Bayesian Q-Learning and the problem of flexibility) allowed by the model at each trial.
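The idea of a bounded effective memory under forgetting can be illustrated with a standard discount recursion (an illustrative sketch; the HAFVF's actual update of κμ is adaptive and more elaborate):

```python
# With a constant forgetting factor w in (0, 1), the effective number of
# observations follows kappa_{t+1} = w * kappa_t + 1 and saturates at the
# maximum memory 1 / (1 - w). Illustrative only: the HAFVF adapts w online.
def effective_memory_trace(w, n_trials, kappa0=0.0):
    kappa, trace = kappa0, []
    for _ in range(n_trials):
        kappa = w * kappa + 1.0
        trace.append(kappa)
    return trace

trace = effective_memory_trace(w=0.95, n_trials=500)
max_memory = 1.0 / (1.0 - 0.95)  # = 20 effective observations
```

The trace rises monotonically toward its ceiling of 20 effective observations, which is the "maximum memory" bound the caption refers to.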

Table 2.

This table summarizes the parameters of the beta prior of the two forgetting factors w and b used in the Learning and flexibility assessment section, as well as the initial prior over the mean and variance.

A low value of the initial number of observations κ0 was used in order to give the learner a large prior variance over the value of the mean. Each subject will be referred to by its expected memory at the lower and higher levels (i.e. L = long, S = short memory). For instance, subject number 3 (LS) is expected to have a long first-level memory but a short second-level memory, which should make her more flexible after a long training than subject 2 (SL), who has a short first-level memory but a long second-level memory.
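The L/S labels can be made concrete with a small helper mapping a Beta prior over a forgetting factor to an expected asymptotic memory (the Beta parameters below are invented for illustration; Table 2 lists the values actually used, and evaluating the memory at E[w] is a simplification):

```python
# Map a Beta(a, b) prior over a forgetting factor w to the asymptotic
# effective memory evaluated at its mean, 1 / (1 - E[w]). The (a, b)
# pairs below are invented; Table 2 lists the values actually used.
def expected_memory(a, b):
    w_mean = a / (a + b)         # E[w] under Beta(a, b)
    return 1.0 / (1.0 - w_mean)

long_memory = expected_memory(19.0, 1.0)   # E[w] = 0.95 -> 20 observations
short_memory = expected_memory(3.0, 1.0)   # E[w] = 0.75 -> 4 observations
```

A "long" (L) level thus corresponds to a prior concentrated near w = 1, and a "short" (S) level to a prior with substantial mass at lower w.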

Fig 5.

HAFVF predictions after a contingency change (CC).

Each column displays the results of a specific hyperparameter setting. The blue traces and subplots represent learning in an experiment with a long training; the orange traces and subplots show learning during a short-training experiment. A. The stream of observations in the two training cases is shown together with the average posterior expected value of the mean. The box line width illustrates the ranking of the ELBO of each configuration for the dataset considered, with bolder borders corresponding to larger ELBOs. For both training conditions, the winning model (i.e. the model that best reflected the data) was the Long-Short memory model. This can be explained by the fact that the first two models trusted their initial knowledge too much after the CC, whereas the Short-Short learner was too cautious. B. Efficient memory for the first level (plain line) and second level (dashed line).

Fig 6.

HAFVF predictions after an isolated unexpected event.

The figure is structured as Fig 5, to which we refer for a more detailed description. Here, the winning model was the one with a high memory at both the first and second levels.

Table 3.

Average ELBOs for Experiment 1 and 2.

Higher ELBOs stand for more probable models.

Fig 7.

Simulated behavioral results.

A. The values of the two available rewards are shown with dotted lines. The average drift rate μ1 − μ2 is shown in plain lines for two selected simulated subjects, n1 and n2, and for the population average. Subject n1 was more flexible than subject n2 on both the first and the second level, making her more prone to adapt after the CCs, located at trials 400, 500 and 600. This result is highlighted in the zoomed box below. B. The subjects’ expected variance (blue, log-valued) correlated negatively with the mean RT. The same correlation existed with the expected stability on the first level (orange, logit-valued), but not with the second level, which correlated positively with the average RT (green, logit-valued). Pearson correlation coefficients and respective p-values are shown in rounded boxes. C. Similarly, subjects with a higher expected variance and first-level stability had a lower average accuracy. Again, second-level memory expectation had the opposite effect.

Fig 8.

Correlation between the true (x axis) and the posterior estimate (y axis) of the parameters of the prior distributions across subjects.

The first row displays the correlations between the true values and the estimated θ0. The second row focuses on ϕ0 and β0, whereas the third row shows the correlations for the NIGDM parameters (threshold, relative start point and non-decision time). Correlation coefficients and associated p-values (with respect to the posterior expected value) are displayed in blue boxes. All parameters are displayed in the unbounded space they were generated from. Overall, all parameters correlated well with their true values, except for α0 (a).

Fig 9.

Average quadratic approximation to the posterior covariance of the HAFVF parameters at the mode.

Fig 10.

Correlation between true and expected values of the variances and forgetting factors.

All the fitted values (y-axis) are derived from the expected value of θ0 (A., variance) and {ϕ0, β0} (B., forgetting factors) under the fitted approximate posterior distribution. Each dot represents a different subject. A. True (x-axis) to fitted (y-axis) correlation for the reward (blue) and mean-reward (red) variance. Both expected values correlated well with their generative parameters, although the initial number of observations of the gamma prior, α0, did not correlate well with its generative counterpart. B. True (x-axis) to fitted (y-axis) correlation for the first-level (blue) and second-level (red) expected forgetting factors.

Fig 11.

MDPs of experiment 1 (A) and 2 (B).

Rewarded states are displayed in green. Each action had a 90% probability of leading to the end state indicated by the red and black arrows (left and right action, respectively). The remaining 10% of the transition probability was evenly distributed among the other states. For clarity, the thick arrows show the optimal path the agent should aim to take under the two contingencies. Note that the only difference between experiments (i) and (ii) is the location of the rewarded state after the CC (state 2 for (i) and state 5 for (ii)).
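The transition rule in the caption can be sketched as follows (the number of states used here is an assumption for illustration, not taken from the article):

```python
import numpy as np

# Transition rule from the caption: the intended end state receives
# probability 0.9, and the remaining 0.1 is spread evenly over the other
# states. The state count of 6 is an assumption for illustration.
def transition_row(n_states, intended):
    row = np.full(n_states, 0.1 / (n_states - 1))
    row[intended] = 0.9
    return row

row = transition_row(6, intended=2)  # row of the transition matrix; sums to 1
```

Stacking one such row per (state, action) pair yields the full stochastic transition matrix of the MDP.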

Fig 12.

Behavioural results in the first (Fig 11A) and second (Fig 11B) experiments.

A. and B. Heat plots of the probability of visiting each state and selecting each action for the 64 simulated agents. (i) Agents progressively learned the first optimal actions (left action in states 3-4-5) during the first half of the experiment, then adapted their behaviour to the new contingency (right action in states 1-4-2). (ii) Similarly, in the second experiment, agents adapted their behaviour according to the new contingency (left action in states 1-3-5). C. Efficient memory on the first and second levels, and foreseeing capacity. Since the CC was less important in (ii) than in (i), because the left action in state 3 kept being rewarded, the expected value of w dropped less. The behaviour of the foreseeing capacity and, therefore, of the expected value of γ, is indicative of the effect that a CC had on this parameter: when the environment became less stable, the foreseeing capacity tended to increase, which had the effect of increasing the impact of future states on the current value. D. (i) The reward rate dropped after the CC, whereas the RT increased. The fact that subjects made slower choices after the CC can be viewed as a mark of the increased task complexity caused by the re-learning phase. Along the same line, RT decreased again when the subjects became confident about the structure of the environment. (ii) The CC also had a smaller impact on the reward rate and RT in experiment (ii).
