Figure 1.
A The six-arm restless bandit is implemented graphically as a board game. Six locations correspond to the six arms. Locations are color-coded: blue locations have lower average unexpected uncertainty than red locations. Blue locations pay 1, 0 or −1 CHF (Swiss francs). Red locations pay 2, 0 or −2 CHF. The chosen option is highlighted (in this case, location 5). Participants can freely choose a location on each trial. Histories of outcomes in previously chosen locations are shown by means of coin piles.
B Visual representation of risk and estimation uncertainty. Risk can be tracked using entropy, which depends on the relative magnitudes of the outcome probabilities, i.e., the relative heights of the bars in the left chart. The bars represent the three estimated outcome probabilities (means of the posterior probability distribution, or PPD). Entropy (risk) is maximal when all bars are equal. Estimation uncertainty is represented by the widths of the posterior distributions of the outcome probabilities, depicted in the right chart.
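The distinction drawn in panel B can be illustrated with a minimal sketch. It assumes, purely for illustration, that beliefs about the three outcome probabilities of one location follow a Dirichlet posterior with parameters `alpha`; the function name `dirichlet_summary` and the example parameter values are hypothetical and are not taken from the paper. Risk is computed as the entropy of the posterior means (the bar heights), and estimation uncertainty as the posterior standard deviation of each probability (the widths).

```python
import math

def dirichlet_summary(alpha):
    """Summarize an assumed Dirichlet posterior over outcome probabilities.

    Returns (posterior means, entropy of the means, posterior std per outcome).
    """
    alpha0 = sum(alpha)
    # Posterior mean of each outcome probability (the "bar heights").
    mean = [a / alpha0 for a in alpha]
    # Shannon entropy of the mean probabilities, used here as the risk measure.
    entropy = -sum(p * math.log(p) for p in mean if p > 0)
    # Posterior standard deviation of each probability (the "widths").
    std = [math.sqrt(a * (alpha0 - a) / (alpha0 ** 2 * (alpha0 + 1)))
           for a in alpha]
    return mean, entropy, std

# Hypothetical parameter values, chosen only to contrast the two quantities.
few = dirichlet_summary([1, 1, 1])       # equal bars, little evidence
many = dirichlet_summary([10, 10, 10])   # equal bars, more evidence
skewed = dirichlet_summary([28, 1, 1])   # one outcome dominates

print("entropy:", few[1], many[1], skewed[1])
print("std of first probability:", few[2][0], many[2][0], skewed[2][0])
```

In this sketch, `few` and `many` have the same (maximal) entropy because their bars are equal, but `many` has narrower posteriors, so risk and estimation uncertainty can move independently.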
Figure 2.
Three kinds of uncertainty in the task.
A Evolution of the estimation uncertainty (entropy of mean posterior outcome probabilities) of chosen options in one instance of the board game. Learning is based on the choices of one participant in our experiment. Blue dots on the horizontal axis indicate trials when a blue location was chosen; red dots indicate trials when a red location was chosen.
B Evolution of the unexpected uncertainty of chosen options in one instance of the board game, measured (inversely) as the probability that no jump has occurred. Learning is based on the choices of one participant in our experiment. Blue dots on the horizontal axis indicate trials when outcome probabilities for the visited blue location jumped; red dots indicate trials when outcome probabilities for the visited red location jumped.
C Average estimated risk (entropy of outcome probabilities) in one instance of the board game, by location (numbered 1 to 6). Learning is based on the choices of one participant in our experiment. Locations are arranged by level of unexpected uncertainty (blue: low; red: high). Average estimated risks are compared with true risks. The participant managed to distinguish risk differentials across blue locations, but not across red locations. Average estimated risks regress towards the grand mean because of estimation uncertainty after each jump in outcome probabilities.
Figure 3.
Evolution of the (logarithm of the) Bayesian learning rate for two options in one instance of the board game.
Learning is based on the choices of one participant in our experiment. The top option has low average unexpected uncertainty (low chance of jumps) and low risk (one outcome probability was very high); the bottom option has high average unexpected uncertainty and low risk. Crosses on the horizontal axis indicate trials when the option was chosen.
Figure 4.
Goodness of fit of the Bayesian models, with (right) and without (left) penalty for ambiguity.
Based on approximately 500 choices of 62 participants. Data are from [9]. Heights of bars indicate the mean of the individual negative log-likelihoods; line segments indicate standard deviations.
Figure 5.
Replication of the experiment in [9].
Mean BICs and standard deviations of the Bayesian, reinforcement and Pearce-Hall learning models without structural uncertainty (Treatment 3). Based on the choices of 30 participants in approximately 500 trials of our board game. The Bayesian model is the base version (unadjusted for ambiguity aversion).
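As a reminder of how the plotted quantities relate, the following minimal sketch computes a BIC per participant from a negative log-likelihood and then the mean and standard deviation across participants. It uses the textbook definition BIC = 2·NLL + k·ln(n); the paper may use a different scaling convention. The function names, the parameter count and the negative log-likelihood values are hypothetical and are not data from the experiment.

```python
import math
from statistics import mean, stdev

def bic(neg_log_lik, n_params, n_obs):
    """Textbook BIC: 2 * negative log-likelihood + k * ln(n)."""
    return 2.0 * neg_log_lik + n_params * math.log(n_obs)

def summarize(neg_log_liks, n_params, n_obs):
    """Mean and sample standard deviation of per-participant BICs."""
    scores = [bic(nll, n_params, n_obs) for nll in neg_log_liks]
    return mean(scores), stdev(scores)

# Made-up negative log-likelihoods for three hypothetical participants,
# each with about 500 choices and a model with 2 free parameters.
example_nlls = [310.0, 295.5, 330.2]
mean_bic, sd_bic = summarize(example_nlls, n_params=2, n_obs=500)
print(mean_bic, sd_bic)
```

Lower BIC indicates a better fit after penalizing the number of free parameters, which is why models with different parameter counts can be compared on the same bar chart.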
Figure 6.
Goodness of fit of the Bayesian and reinforcement learning models under varying levels of structural uncertainty.
A Goodness of fit of the Bayesian and reinforcement learning models under full structural uncertainty (Treatment 1). Based on the choices of 43 participants in approximately 500 trials of our board game. The Bayesian model includes a penalty for estimation uncertainty; as in the data from [9], this model turned out to fit the data better than the base version of the Bayesian model. Heights of bars indicate the mean of the individual Bayesian Information Criterion (BIC) values; line segments indicate standard deviations. The difference in the mean BIC is not significant.
B Goodness of fit of the Bayesian and reinforcement learning models under partial structural uncertainty (Treatment 2). Shown are the mean BICs and standard deviations of the two models. Based on the choices of 32 participants in approximately 500 trials of our board game. The Bayesian model includes a penalty for estimation uncertainty. Participants knew the structure of the game except for the jumps in outcome probabilities. They were told that the description of the structure was incomplete.