Conceived and designed the experiments: EP-LN. Performed the experiments: EP-LN. Analyzed the data: EP-LN. Wrote the paper: EP-LN PB. Supervised EP-LN: PB.

The authors have declared that no competing interests exist.

Recently, evidence has emerged that humans approach learning using Bayesian updating rather than (model-free) reinforcement algorithms in a six-arm restless bandit problem. Here, we investigate what this implies for human appreciation of uncertainty. In our task, a Bayesian learner distinguishes three equally salient levels of uncertainty. First, the Bayesian perceives irreducible uncertainty or risk: even when the payoff probabilities of a given arm are known, the outcome remains uncertain. Second, there is (parameter) estimation uncertainty or ambiguity: payoff probabilities are unknown and need to be estimated. Third, the outcome probabilities of the arms change: the sudden jumps are referred to as unexpected uncertainty. We document how the three levels of uncertainty evolved during the course of our experiment and how they affected the learning rate. We then zoom in on estimation uncertainty, which has been suggested to be a driving force in exploration, in spite of evidence of widespread aversion to ambiguity. Our data corroborate the latter. We discuss neural evidence that foreshadowed the ability of humans to distinguish between the three levels of uncertainty. Finally, we investigate the boundaries of human capacity to implement Bayesian learning. We repeat the experiment with different instructions, reflecting varying levels of structural uncertainty. Under this fourth notion of uncertainty, choices were no better explained by Bayesian updating than by (model-free) reinforcement learning. Exit questionnaires revealed that participants remained unaware of the presence of unexpected uncertainty and failed to acquire the right model with which to implement Bayesian updating.

The ability of humans to learn changing reward contingencies implies that they perceive, at a minimum, three levels of uncertainty: risk, which reflects imperfect foresight even after everything is learned; (parameter) estimation uncertainty, i.e., uncertainty about outcome probabilities; and unexpected uncertainty, or sudden changes in the probabilities. We describe how these levels of uncertainty evolve in a natural sampling task in which human choices reliably reflect optimal (Bayesian) learning, and how their evolution changes the learning rate. We then zoom in on estimation uncertainty. The ability to sense estimation uncertainty (also known as ambiguity) is a virtue because, besides allowing one to learn optimally, it may guide more effective exploration; but aversion to estimation uncertainty may be maladaptive. Here, we show that participant choices reflected aversion to estimation uncertainty. We discuss how past imaging studies foreshadowed the ability of humans to distinguish the different notions of uncertainty. Also, we document that the ability of participants to make this distinction relies on sufficient revelation of the payoff-generating model. When we induced structural uncertainty, participants did not gain awareness of the jumps in our task, and fell back to model-free reinforcement learning.

In an environment where reward targets and loss sources are stochastic, and subject to sudden, discrete changes, the key problem humans face is learning. At a minimum, they need to be able to assess

To correctly gauge estimation uncertainty, two additional statistical properties of the environment ought to be evaluated:

With Bayesian learning, the three notions of uncertainty are tracked explicitly. This is because Bayesians form a model of the environment that delineates the boundaries of risk, estimation uncertainty and unexpected uncertainty. The delineation is crucial: estimation uncertainty tells Bayesians how much still needs to be learned, while unexpected uncertainty leads them to forget part of what they learned in the past.

This contrasts with model-free reinforcement learning. There, uncertainty is monolithic: it is the expected magnitude of the prediction error

Recently, evidence has emerged that, in environments where risk, estimation uncertainty and unexpected uncertainty all vary simultaneously, humans choose as if they were Bayesians

The finding that humans choose like Bayesians implies that they must have tracked the three levels of uncertainty. Here, we discuss how the levels differentially affected the Bayesian learning rate in our restless bandit task, and how participants could have distinguished between them.

Neural implementation of Bayesian learning would require separate encoding of the three levels of uncertainty. Recent human imaging studies appear to be consistent with this view. The evidence has only been suggestive, however, as no imaging study to date involved independent control of risk, estimation uncertainty and unexpected uncertainty.

Indeed, to our knowledge, ours is the first comprehensive study of risk, estimation uncertainty, and unexpected uncertainty. Many studies have focused on risk

The task in

This is important because, here, we are interested in re-visiting the data in

In our six-arm restless bandit problem, estimation uncertainty varied substantially over time and across arms, thus providing power to detect the presence of an exploration bonus in valuation, and hence, an effect of estimation uncertainty on exploration. Before our study, behavioral evidence in favor of an exploration bonus had been weak:

Firing of dopaminergic neurons in response to novel, uncertain stimuli has been interpreted as signaling exploration value

We re-visited the choices generated by the restless six-arm bandit problem of

Finally, we studied to what extent the empirical support for Bayesian learning depended on the level of detail participants received regarding the outcome generating process. In

Here, we report new results that clarify to what extent

Our task was a six-arm restless bandit problem, visually presented as a board game (see

Outcome probabilities within a color group jumped simultaneously. Participants did not know the jump frequencies. Nor did they know when jumps occurred. As such, there was unexpected uncertainty. After a jump, the outcome probabilities took on new, unknown values; in particular, they did not revert to old values as in reversal learning tasks (e.g.,

In the version of this task in

To analyze the results, we implemented a

In each trial

We start from the same prior distribution of outcome probabilities for all options. It is denoted

In Eqn 3,

The posterior mean outcome probabilities are computed as follows:

To model adjudication between the six options, we opted for a softmax rule. Specifically, in trial

Risk can be measured by the entropy of the outcome probabilities. Since outcome probabilities are unknown throughout our experiment, entropy needs to be estimated. We compute entropy based on the posterior mean of the outcome probabilities. See
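The entropy computation can be sketched as follows. This is a minimal illustration in Python; the option probabilities shown are hypothetical, not taken from the experiment.

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a discrete outcome distribution;
    zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical posterior mean outcome probabilities for two options:
low_risk = [0.90, 0.05, 0.05]    # one outcome dominates -> low entropy
high_risk = [0.34, 0.33, 0.33]   # near-uniform -> entropy near log2(3)
```

Under this measure, an option whose posterior mean probabilities are nearly uniform carries maximal risk.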

The presence of unexpected uncertainty and the recurring parameter estimation uncertainty make it more difficult to correctly assess risk.

The latter illustrates the

The different levels of uncertainty affect the learning rate in complex ways. Inspection of Eqn. 4 shows that the learning rate

This shows how unexpected uncertainty affects estimation uncertainty, and hence, the learning rate. Estimation uncertainty itself also affects the learning rate, though only indirectly, through its effect on unexpected uncertainty. This can be verified by inspecting the formula for the probability that no jump occurred in trial

An analogous result obtains for risk – here defined as the entropy of the outcome probabilities. Intuitively, entropy is the variability in the probabilities across possible outcomes. If all outcome probabilities are the same, entropy is maximal. If one or more outcome probabilities are extreme (high or low), then entropy will be low. Eqn. 6 shows that unexpected uncertainty depends on outcome probabilities. The intuition is simple: if a particular outcome is estimated to occur with low probability, and that outcome does realize, the likelihood that it occurred because there was a jump is higher; conversely, if an outcome had high estimated probability, its realization provides little evidence that a jump took place.

Consequently, while the three levels of uncertainty separately influence the learning rate, unexpected uncertainty is pivotal. That is, estimation uncertainty and risk impact the learning rate through their effect on unexpected uncertainty. For instance, if the probability of an outcome is estimated with low precision (estimation uncertainty is high) or if it is estimated to be average (around
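To make the intuition concrete, the posterior probability that no jump occurred can be sketched with Bayes' rule. This is an illustrative reading, not the paper's exact formula: we assume for simplicity that, conditional on a jump, the realized outcome is a priori uniform over the possible outcomes.

```python
def prob_no_jump(outcome, est_probs, jump_rate):
    """Posterior probability that no jump occurred in the current trial,
    given the realized outcome. `est_probs` are the current posterior mean
    outcome probabilities; `jump_rate` is the prior jump probability.
    Simplifying assumption: after a jump, outcomes are a priori uniform."""
    n = len(est_probs)
    evidence_no_jump = (1 - jump_rate) * est_probs[outcome]
    evidence_jump = jump_rate * (1.0 / n)
    return evidence_no_jump / (evidence_no_jump + evidence_jump)
```

A surprising outcome (one with low estimated probability) lowers the no-jump probability, that is, it raises unexpected uncertainty; a well-predicted outcome does the opposite.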

Learning is based on the choices of one participant in our experiment. Top option has low average unexpected uncertainty (low chance of jumps) and low risk (one outcome probability was very high); bottom option has high average unexpected uncertainty and low risk. Crosses on the horizontal axis indicate trials when the option was chosen.

One can easily discern the effect of a reduction in estimation uncertainty on the learning rate. During episodes when the participant chooses an option, she learns about the outcome probabilities, which reduces estimation uncertainty, and hence, the learning rate. This continues until she stops visiting the location at hand, at which point the (now imaginary) learning rate increases again. (We call the learning rate "imaginary" because there are no outcomes with which to update beliefs; beliefs about the unchosen options evolve only because of what one learns about the chosen options.)

To implement Bayesian learning, the decision maker has, at a minimum, to track estimation uncertainty. As such, the decision maker senses that she does not know the parameters, and hence, she is ambiguity sensitive.

In multi-armed bandit settings, exploration is valuable. Only by trying out options will one be able to learn, thus reducing estimation uncertainty. As such, there should be a bonus to exploration of options with high ambiguity. This was recently proposed

Here, we re-visit behavior in our six-arm restless bandit task to determine to what extent choices reflect the presence of an exploration bonus or an ambiguity penalty, both equal to the level of estimation uncertainty. To this end, we added to the expected value of each option an exploration bonus, or alternatively, we subtracted an ambiguity penalty – computational details are provided in

The model with exploration bonus fitted worse than the one without any correction of valuations for estimation uncertainty. In contrast, the model with ambiguity penalty generated a better likelihood than did the base version of the Bayesian model for

Based on approximately 500 choices of 62 participants. Data are from

To investigate to what extent the evidence in favor of Bayesian updating is related to our providing subjects with ample structural knowledge of the outcome generating process, we ran a new experiment. We considered three treatments. In the first treatment, we provided subjects only with the rules of the game, and no structural information. In the second treatment, subjects were given some structural information (e.g., within a color group, one option was "biased" in the sense that its entropy was lower, while another option was close to random), but were left ignorant about the presence of jumps in the outcome probabilities; that is, they were not informed about the possibility of unexpected uncertainty. The third treatment was a replication of the original setting in

43 undergraduates from the same institution (Ecole Polytechnique Fédérale Lausanne) participated in the first treatment;

To calibrate the results, we first compare the fits of the third treatment to those of

Mean BICs and standard deviations of the Bayesian, reinforcement and Pearce-Hall learning models without structural uncertainty (Treatment 3). Based on the choices of 30 participants in approximately 500 trials of our board game. The Bayesian model is the base version (unadjusted for ambiguity aversion).

Having replicated the results with full disclosure of the structure of the outcome generating process, we turn to the first treatment, where subjects were not given any structural information.

Common to both Treatments 1 and 2 is the absence of information on the presence of unexpected uncertainty. The findings suggest that participants were not able to recognize that outcome probabilities jumped. To verify this conjecture, we examined the answers to the debriefing questionnaire after the experiment – participant answers are available upon request. Pooling the first two treatments (with a total of 75 cases), only

These findings are significant. In no way did the instructions attempt to mislead the participants. On the contrary, we stated explicitly that subjects had to watch out for features of the outcome generating process other than those spelled out in the instructions. In contrast, in the third treatment (as well as in the original experiment of

On occasion, humans have been shown to choose like Bayesian decision makers. In a context where outcome contingencies change constantly, this implies that humans should be able to distinguish various types of uncertainty, ranging from unexpected uncertainty through (parameter) estimation uncertainty to risk. We will argue here that there is emerging neurobiological evidence for separate encoding of these categories of uncertainty. As such, key components for neural implementation of Bayesian learning have now been identified in the human brain.

Numerous studies have localized neural signals correlating with risk. Some sub-cortical regions are also involved in tracking expected reward (striatal regions;

Estimation uncertainty, or ambiguity as it is referred to in economics, has also recently been investigated in imaging studies. Early evidence pointed to involvement of the amygdala and lateral orbitofrontal cortex

Involvement of locus coeruleus and the neurotransmitter norepinephrine in tracking unexpected uncertainty has been conjectured a number of times and the evidence in its favor is suggestive

Activation of the amygdala-hippocampus complex to novel images in a learning context may be conjectured to reflect unexpected uncertainty

Evidence has thus emerged that the distinction between the three forms of uncertainty exists at the neuronal level. The well-documented sensitivity of humans to ambiguity (estimation uncertainty) further demonstrates that the distinction can readily be observed in behavior. Confirming humans' sensitivity to estimation uncertainty, we presented evidence here that participants' tendency to explore in a six-arm restless bandit task decreased with estimation uncertainty. This finding falsifies the hypothesis that estimation uncertainty ought to induce exploration. It is, however, consistent with evidence of ambiguity aversion in the experimental economics literature, starting with

The reader may wonder why we have not augmented the reinforcement learning model with an ambiguity penalty, and examined the behavioral fit of this version of model-free reinforcement learning. The point is that non-Bayesians do not sense ambiguity. Indeed, the concept of a posterior belief is foreign to non-Bayesian updating, and hence, the variance or entropy of the posterior distribution of outcome probabilities, our two measures of estimation uncertainty, are quintessentially Bayesian. Since the representation of ambiguity is absent in the context of model-free reinforcement learning, a fortiori ambiguity cannot weigh in the exploration strategy. In light of this, one should not combine model-free reinforcement learning with an ambiguity penalty/bonus.

A third major finding was that full Bayesian updating is reflected in human learning only if enough structural information of the outcome generating process is provided. Specifically, the ability to track unexpected uncertainty, and hence, to detect jumps in the outcome probabilities, appeared to rely on instructions that such jumps would occur. When participants were not informed about the presence of unexpected uncertainty, their choices could equally well be explained in terms of simple reinforcement learning. This evidence emerged despite suggestions to watch for features of the outcome generating process that were not made explicit in the instructions.

Situations where decision makers are ignorant of the specifics of the outcome generating process entail model or structural uncertainty. Our study is the first to discover that humans cannot necessarily resolve model uncertainty. In our experiment, many participants failed to recognize the presence of unexpected uncertainty. Consequently, in the exit questionnaires they often took the arms to be "random" [in our language, risky], which illustrates the antagonistic relationship between risk and unexpected uncertainty: jumps were confounded with realizations of risk.

Our participants' failure to detect jumps may suggest that their “mental models” excluded nonstationarity

Structural uncertainty was originally suggested in the economics literature, where it is referred to as

Nevertheless, we think it is important to refrain from reducing structural uncertainty to mere parameter estimation uncertainty, because the number of possible models of the outcome generating process in any given situation is large, and hence, the number of parameters to be added to capture structural uncertainty can be prohibitively high

The latter may explain our finding that human choice in our six-arm restless bandit task reveals less evidence of Bayesian updating when we introduce structural uncertainty. Since reinforcement learning provides ready guidance in situations where Bayesian updating may fail, our participants understandably switched learning strategies. Because they became (model-free) reinforcement learners, they no longer detected unexpected uncertainty. Indeed, uncertainty is monolithic in the absence of a model of the outcome generating process; there is no distinction between risk, estimation uncertainty, unexpected uncertainty, or even model uncertainty.

To conclude, our results suggest that learning-wise, structural uncertainty should not be thought of as an extension of ambiguity. We thus advocate a separation of situations entailing structural uncertainty and situations entailing ambiguity in future studies of decision making under uncertainty. We would also advocate a clear separation of situations where the outcome probabilities change suddenly and the related but mathematically distinct situations, where outcome probabilities change continuously. The former entail unexpected uncertainty. The latter are analogous to the contexts where Kalman filtering provides optimal forecasts, but where risk is stochastic. In financial economics, one therefore uses the term

In our six-arm restless bandit, the three levels of uncertainty change in equally salient ways. Future imaging studies could therefore rely on our task to better localize the encoding of uncertainty and its three components. In addition, our task could allow one to investigate engagement of brain structures in the determination of the learning rate.

All the experiments reported here were approved by the ethics commission of the Ecole Polytechnique Fédérale Lausanne.

We implemented a six-arm restless bandit task with a board game. See

In our Bayesian learning model, the distribution of outcome probabilities is updated using Bayes' law and a

Since our task involves multinomial outcomes, we chose a Dirichlet prior to initiate learning. Without jumps, posterior distributions will be Dirichlet as well. As initial (first-trial) prior, we take the uninformative Dirichlet with center

Let

In a stationary world, this would provide the optimal inference. Because jumps may occur (outcome probabilities may change), we augment standard Bayesian updating with a forgetting operator, which we denote

After a jump in trial

In the absence of a jump, the decision maker should use the standard Bayesian posterior, here denoted

Therefore, in principle, the new posterior should either be

From minimization of a Bayes risk criterion,

Consequently, the forgetting operator equals:

The geometric mean is a tractable way to introduce information on unexpected uncertainty in the updating because, for large

Another advantage of the forgetting operator, important for our purposes, is that updating can be expressed directly in terms of a learning rate. Usually, with Bayesian updating, learning rates are only implicit (because the Bayes transformation is generally non-linear). We shall use the symbol

Specifically, with the forgetting algorithm, the posterior mean probability vector is computed as follows:
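The full updating step can be sketched as follows. This is our illustrative reconstruction under simplifying assumptions, not the paper's exact code: we assume a jump resets beliefs to the prior, and we use the fact that a weighted geometric mean of two Dirichlet densities is itself Dirichlet, with the convex combination of the two parameter vectors. Variable names are ours.

```python
def bayes_update_with_forgetting(alpha, alpha0, outcome, jump_rate):
    """One trial of Dirichlet updating with forgetting (illustrative sketch).
    alpha     : current Dirichlet pseudo-counts
    alpha0    : prior pseudo-counts, to which beliefs reset after a jump
    outcome   : index of the realized outcome
    jump_rate : prior probability of a jump in this trial
    """
    est = [a / sum(alpha) for a in alpha]        # posterior mean before the trial
    reset = [a / sum(alpha0) for a in alpha0]    # mean after a hypothetical jump
    # Posterior probability that no jump occurred (Bayes' rule).
    w = ((1 - jump_rate) * est[outcome]) / (
        (1 - jump_rate) * est[outcome] + jump_rate * reset[outcome])
    # Standard (no-jump) Bayesian posterior: one extra pseudo-count.
    post = [a + (1 if i == outcome else 0) for i, a in enumerate(alpha)]
    # Forgetting operator: geometric mean of the two candidate posteriors,
    # which for Dirichlet densities is a convex combination of parameters.
    return [w * p + (1 - w) * a0 for p, a0 in zip(post, alpha0)]
```

Because the total pseudo-count grows by less than one per trial whenever a jump is possible, the learner partially "forgets" past observations, which keeps the effective learning rate from collapsing to zero.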

The learning rate

For model-free reinforcement learning, we applied a simple Rescorla-Wagner rule. Let

If

If

Here, the learning rate is fixed but color-specific. As such, the reinforcement learning model allows for adjustment of the learning rate to the average level of unexpected uncertainty (red options jump more often than blue ones), in line with evidence that the learning rate increases with average unexpected uncertainty
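A minimal Rescorla-Wagner sketch (our illustration; the fixed learning rate plays the role of the color-specific rate described above):

```python
def rescorla_wagner(value, reward, learning_rate):
    """Delta rule: move the value estimate toward the observed reward
    by a fixed fraction of the prediction error."""
    return value + learning_rate * (reward - value)

# Hypothetical reward stream for one arm:
v = 0.0
for r in [1, 1, 0, 1]:
    v = rescorla_wagner(v, r, learning_rate=0.3)
```

Unlike the Bayesian learner, this rule carries a single, monolithic notion of uncertainty (the prediction error) and cannot separate risk from estimation or unexpected uncertainty.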

We also fit a modified reinforcement learning model, where the learning rate adjusts to the size of the prediction error in the last trial. This is the Pearce-Hall model
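One common formulation of the Pearce-Hall idea can be sketched as follows (an illustrative version; symbols and parameter names are ours): the associability, which plays the role of the learning rate, tracks the absolute prediction error of recent trials.

```python
def pearce_hall_step(value, assoc, reward, gamma, kappa):
    """One Pearce-Hall update. `assoc` is the associability (the effective
    learning rate); it drifts toward the absolute prediction error at
    rate `gamma`, while `kappa` scales the value update."""
    prediction_error = reward - value
    value = value + kappa * assoc * prediction_error
    assoc = (1 - gamma) * assoc + gamma * abs(prediction_error)
    return value, assoc
```

Large surprises thus raise the learning rate on subsequent trials, a crude, model-free analogue of the Bayesian response to unexpected uncertainty.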

The computations, which are provided in

Unexpected uncertainty, the chance that a jump has occurred, is complementary to the chance that no jump has occurred. At the red location, it equals

Estimation uncertainty is the dispersion of the posterior distribution of outcome probabilities. It can be measured either by the variance or the entropy.
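For a Dirichlet posterior, the variance measure has a closed form (a sketch using the standard Dirichlet marginal-variance formula):

```python
def dirichlet_marginal_variances(alpha):
    """Variance of each outcome probability under a Dirichlet(alpha)
    posterior -- one possible measure of estimation uncertainty."""
    s = sum(alpha)
    return [a * (s - a) / (s ** 2 * (s + 1)) for a in alpha]
```

As pseudo-counts accumulate with observation, the variances shrink: estimation uncertainty falls while the learner keeps sampling an option.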

The

We used the softmax function to transform valuations for the options into choice probabilities. It generated a probability distribution

A couple of alternative versions were considered, by taking the average of the expected payoff and a bonus or (if negative) a penalty. The bonus/penalty was equal to the level of parameter estimation uncertainty (variance or entropy of the posterior distribution as defined above). In the model with bonus,
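The choice rule with an uncertainty adjustment can be sketched as follows. Our parameterization: `sign = +1` averages the expected payoff with an exploration bonus equal to estimation uncertainty, `sign = -1` averages it with an ambiguity penalty, and `sign = 0` recovers the base model.

```python
import math

def softmax_choice_probs(values, uncertainties, temperature, sign=0):
    """Softmax choice probabilities over option valuations, optionally
    averaging each expected payoff with an estimation-uncertainty
    bonus (sign=+1) or penalty (sign=-1)."""
    if sign == 0:
        adjusted = list(values)
    else:
        adjusted = [(v + sign * u) / 2 for v, u in zip(values, uncertainties)]
    exps = [math.exp(a / temperature) for a in adjusted]
    total = sum(exps)
    return [e / total for e in exps]
```

With the penalty, an option with higher estimation uncertainty receives a lower choice probability than under the base model; with the bonus, a higher one.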

Using participant choices, we fitted the free parameters of each model:
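The fitting and model-comparison machinery can be sketched as follows (generic maximum-likelihood/BIC code, not the paper's exact implementation):

```python
import math

def negative_log_likelihood(trial_choice_probs, choices):
    """Sum of -log P(chosen option) over trials, given each trial's
    model-implied choice probabilities."""
    return -sum(math.log(probs[c])
                for probs, c in zip(trial_choice_probs, choices))

def bic(neg_log_lik, n_free_params, n_trials):
    """Bayesian Information Criterion (lower is better): twice the
    negative log-likelihood plus a complexity penalty per parameter."""
    return 2 * neg_log_lik + n_free_params * math.log(n_trials)
```

Each candidate model (Bayesian, reinforcement, Pearce-Hall) is fitted per participant by minimizing the negative log-likelihood over its free parameters, and the models are then ranked by mean BIC.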

Graphical display of the individual (negative) log-likelihoods of the Bayesian models, with penalty for ambiguity (Y-axis) and without (X-axis).

(0.01 MB PDF)

Graphical display of the individual (negative) log-likelihoods of the models in Treatment 3.

(0.02 MB PDF)

Graphical display of the individual (negative) log-likelihoods of the models in Treatment 1.

(0.02 MB PDF)

Graphical display of the individual (negative) log-likelihoods of the models in Treatment 2.

(0.02 MB PDF)

Supplemental material.

(0.20 MB PDF)

We are grateful to Chen Feng for programming the board game application.