Conceived and designed the experiments: JRC KJF. Performed the experiments: JRC DRB. Analyzed the data: JRC. Contributed reagents/materials/analysis tools: JRC JD EF GF. Wrote the paper: JRC RJD KJF. Mathematical/computational modelling: JRC.

The authors have declared that no competing interests exist.

Adaptive behavior often exploits generalizations from past experience by applying them judiciously in new situations. This requires a means of quantifying the relative importance of prior experience and current information, so they can be balanced optimally. In this study, we ask whether the brain generalizes in an optimal way. Specifically, we used Bayesian learning theory and fMRI to test whether neuronal responses reflect context-sensitive changes in ambiguity or uncertainty about experience-dependent beliefs. We found that the hippocampus expresses clear ambiguity-dependent responses that are associated with an augmented rate of learning. These findings suggest candidate neuronal systems that may be involved in aberrations of generalization, such as over-confidence.

Intelligent behavior requires flexible responses to new situations, which exploit learned principles or abstractions. When no such principles exist, the imperative is to learn quickly from scratch. Behaviorally, we show that subjects learn action-reward relationships in a manner that enables them to generalize rules to new situations. Our fMRI results show that when subjects have no evidence that such a rule exists, medial temporal lobe responses (that reflect uncertainty) predict their augmented learning.

Successful behavior in new situations often requires us to apply ‘rules-of-thumb’. However, acquiring and applying abstract rules from limited experience presents a fundamental computational problem

Probabilistic inference in a natural environment is confounded by multiple sources of uncertainty

At the behavioral level, human subjects readily abstract probabilistic rules and use them to generalize

In this study, we examined the neuronal correlates of generalization with a special focus on the hippocampus: The hippocampus is involved in generalization

Nineteen subjects (age 19–31, 11 female) were recruited from the UCL Psychology Department subject pool. All subjects gave informed consent after reading a brief description of the task, which was then performed under fMRI. The study protocol was approved by the local UCL ethics committee.

Images were acquired on a 3 T Allegra head scanner (Siemens Medical Systems) with a head coil for RF transmission and signal reception. We used BOLD signal sensitive T2*-weighted transverse single-shot gradient-echo echo-planar imaging (EPI; flip angle 90°; bandwidth BW, 3551 Hz/pixel; phase-encoding (PE) direction, anterior–posterior; bandwidth in PE direction BWPE, 47.3 Hz/pixel; TE, 30 ms; effective TR, 2600 ms). An automatic 3D-shim procedure was performed at the beginning of each experiment. Each volume contained 40 slices of 2-mm thickness (1-mm gap between slices; field of view, 192×192-mm^{2}; matrix size, 64×64). Sensitivity losses due to susceptibility artifacts were minimized by applying a

Whole-brain anatomical scans were acquired using a modified driven equilibrium Fourier transform (MDEFT) sequence with optimized parameters

Functional imaging data were analyzed with statistical parametric mapping (SPM8; Wellcome Trust Centre for Neuroimaging;

While our goal was to identify domain-general computational processes, the paradigm was framed as a social inference task: Subjects were told that two

A face was presented for 600 ms before two choice options were displayed. The choice options cued the subjects' guess, which was then indicated by a yellow border around the selected option. Audio and visual feedback indicated whether the choice was rewarded (correct) or not (incorrect).

Unbeknownst to subjects, individuals from one group had similar preferences, while the other group had more between-individual variability. This meant that subjects had to make guesses about choices in two distinct contexts established by the group an individual belonged to: in the

The generalization context therefore contained a probabilistic rule prescribing the best guess, even in the absence of learning about an individual's preferences. Conversely, in the ambiguous context, subjects had to learn about individual preferences because their group membership provided no clues about what they would preferentially choose. Trials were arranged into blocks, in which the same individual was presented for ten consecutive trials. The blocks alternated between AC and GC, with a new individual (face) for each block. This resulted in

The blue and purple options were presented with equal probability on the left and right of the screen on each trial. Individuals (faces) were randomly assigned to the two groups anew for each subject. All subjects experienced the same feedback contingencies (with randomly reassigned cues). Subjects had three short breaks during the task: for each, they were first cued ‘PLEASE HAVE A SHORT REST AND RELAX’ before being prompted to restart thirty seconds later: ‘OK! PLEASE PRESS ANY KEY TO CONTINUE’.

Bayesian learning theory predicts that subjects should learn more quickly about a new individual from the ambiguous group, relative to an individual from the generalization group. This is based upon the assumption that subjects are making Bayes-optimal guesses using a notion of group or context. The increase in learning rate with higher levels of ambiguity is related to increases in learning rate in situations with a high degree of volatility
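This ambiguity-dependent speeding of learning can be illustrated with a simple conjugate Beta-Bernoulli learner. This is not the fitted model M1, and the prior parameters are chosen purely for illustration, but it shares the relevant property: accumulated contextual evidence acts like a strong prior, so the effective learning rate is small in the generalization context and large in the ambiguous context.

```python
# Illustrative sketch (not the fitted model M1): a Beta-Bernoulli learner.
# In the generalization context, accumulated group evidence acts like a
# strong prior, so one disconfirming trial barely moves the belief; in the
# ambiguous context the prior is weak and the same trial moves it a lot.

def posterior_mean(alpha, beta, successes, trials):
    """Posterior mean of a Bernoulli rate under a Beta(alpha, beta) prior."""
    return (alpha + successes) / (alpha + beta + trials)

# One disconfirming observation (the individual chose 'blue', not 'purple'):
prior_strong = posterior_mean(9.0, 1.0, 0, 0)   # GC-like prior mean: 0.9
prior_weak = posterior_mean(1.0, 1.0, 0, 0)     # AC-like prior mean: 0.5

shift_strong = prior_strong - posterior_mean(9.0, 1.0, 0, 1)
shift_weak = prior_weak - posterior_mean(1.0, 1.0, 0, 1)

print(shift_strong, shift_weak)  # the weak-prior (ambiguous) belief moves further
```

The effective learning rate of this learner is 1/(alpha + beta + n), which shrinks as prior evidence accumulates; this is the sense in which ambiguity augments learning.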

To make optimal guesses about the choices of each group member, subjects have to infer their preferences, i.e. the probability that this individual will choose a particular option, say ‘purple’. We denote this probability with

In what follows, we consider alternative models that subjects might have used to infer the

The critical feature of

The form of our model appeals to behavioral evidence that human rule learning resembles non-parametric Bayesian inference

We assume that subjects store the number of times (out of

The implicit form of generalization is more transparent when we integrate over

To predict subjects' responses we require M1's posterior belief about the behavioral contingencies. This quantifies the ambiguity as well as the value of their response options. For posterior inference, one can obtain a sample from the posterior of

Having assumed

This procedure approximates the trial-by-trial evolution of posterior belief about preferences,
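A generic sketch of such trial-by-trial updating uses a grid approximation to the posterior over a single individual's preference probability. This is an illustrative Bernoulli filter, not the sampling scheme used for M1; the grid resolution and observation sequence are ours.

```python
import numpy as np

# Generic sketch of trial-by-trial Bayesian updating of a preference
# probability theta (P(individual chooses 'purple')) on a discrete grid.
# Illustrative only; M1's actual inference scheme differs.

theta = np.linspace(0.01, 0.99, 99)   # grid over the preference probability
posterior = np.ones_like(theta)       # flat prior (maximal ambiguity)
posterior /= posterior.sum()

observations = [1, 1, 0, 1]           # 1 = chose purple, 0 = chose blue
for y in observations:
    likelihood = theta if y == 1 else (1.0 - theta)
    posterior = posterior * likelihood   # Bayes' rule (unnormalized)
    posterior /= posterior.sum()

# Posterior mean after four trials (three purple, one blue):
print(posterior @ theta)
```

With a flat prior and three of four ‘purple’ choices, the posterior mean approximates the Beta(4, 2) mean of 2/3, and the posterior after each trial serves as the prior for the next.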

We used two measures of this time-dependent posterior as explanatory variables to predict the behavioral and neurophysiological responses of each subject. Firstly, we operationalized the ambiguity about each new individual using the Shannon entropy
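For a binary predictive distribution, this entropy measure can be computed as follows (an illustrative helper in our own notation):

```python
import math

def shannon_entropy(probs):
    """Shannon entropy H(p) = -sum_i p_i * log2(p_i), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Ambiguity is maximal when both options are equally likely (1 bit),
# and low when one option is strongly expected.
print(shannon_entropy([0.5, 0.5]))   # maximal ambiguity
print(shannon_entropy([0.9, 0.1]))   # low ambiguity
```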

We have emphasized that greater prior ambiguity (i.e. higher number of inferred subgroups/higher predictive entropy) is accompanied by a diminished

Subjects do not know the true expected reward,

To assess the predictions of M1 in relation to a null model, we also considered the predictions under M2, where subjects learn about each individual without generalization. Under this assumption, the expected reward (correct choice) can be modeled with classical Rescorla-Wagner learning
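The Rescorla-Wagner update underlying M2 takes the familiar delta-rule form; a minimal sketch (the learning rate and initial value here are illustrative, not fitted parameters):

```python
def rescorla_wagner(outcomes, alpha=0.2, v0=0.5):
    """Classical Rescorla-Wagner value update: V <- V + alpha * (r - V)."""
    v = v0
    values = []
    for r in outcomes:
        values.append(v)            # value held going into the trial
        v = v + alpha * (r - v)     # prediction-error-driven update
    return values

# Value of the 'purple' option over ten consecutively rewarded trials:
print(rescorla_wagner([1] * 10))
```

Because the learning rate alpha is fixed, this learner updates at the same speed in both contexts; it has no notion of contextual ambiguity and so cannot generalize across individuals.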

The resulting sequence of values for our two models

M1 tracks current information when necessary (AC), and otherwise exploits generalization to limit the impact of spurious outcomes on action (GC). M2 is ignorant about each new individual and myopically chases reward. Red circles indicate the actual guesses of a typical subject.

Without evidence of a contextual norm (in the AC), subjects are uncertain about what to do with an unfamiliar person, and must learn quickly. This time-series, convolved with a hemodynamic response function, predicted hippocampal fMRI responses (see main text).

To ensure we had not overlooked other explanations for the subjects' responses, we performed secondary analyses, to establish the explanatory power of M1 in the context of alternative models, M3–M5. M3 was a generalization of M2

We used logistic regression to predict trial-by-trial choices from the value (expected reward) based on

Our analyses of the fMRI data used a conventional approach in which the parametric effects of variables from our formal Bayesian model were used to predict the amplitude of fMRI responses, after convolution with a suitable hemodynamic response function

1) The prediction ‘risk’ under M1 (time-locked to the choice presentation).
2) The reward predictions under M1, conditioned on the subject's choice (time-locked to the choice).
3) The model-based (Shannon) surprise at the outcome under M1 (time-locked to the outcome).
4) The signed model-based prediction error under M1, conditional on the subject's choice (time-locked to the outcome).
5) The trial outcome: correct/incorrect, coded as 1 and 0 respectively (time-locked to the outcome).

In terms of our hypothesis, these regressors can be regarded as modeling nuisance effects. Our final regressor was the key effect of interest; namely, the trial-specific ambiguity as measured by the Shannon entropy above. It is this measure that reflects an encoding of contextual uncertainty that weakens generalization. The entropy entered as parametrically modulated delta functions at the time of choice, but before feedback. Six columns describing scan-specific rigid-body translations and rotations were included as confounds. The data were temporally filtered to remove low-frequency drifts below 1/128 Hz.
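The construction of such a parametrically modulated regressor can be sketched as follows. The double-gamma parameters approximate the commonly used canonical HRF, but this is an illustration, not the SPM implementation; the onset times and entropy values are invented.

```python
import math

# Sketch of a parametric fMRI regressor: delta functions at event onsets,
# scaled by the trial-wise entropy, convolved with a canonical double-gamma
# HRF. Illustrative parameter values only (not the SPM code path).

def gamma_pdf(t, shape, scale):
    """Gamma probability density; zero for non-positive t."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t / scale) / (math.gamma(shape) * scale ** shape)

def canonical_hrf(dt=0.1, duration=32.0):
    """Double-gamma HRF: a peak near 5 s minus a scaled undershoot near 15 s."""
    ts = [i * dt for i in range(int(duration / dt))]
    return [gamma_pdf(t, 6.0, 1.0) - gamma_pdf(t, 16.0, 1.0) / 6.0 for t in ts]

def convolve(stick, kernel):
    """Discrete convolution, truncated to the length of the stick function."""
    out = [0.0] * len(stick)
    for i, s in enumerate(stick):
        if s == 0.0:
            continue
        for j, k in enumerate(kernel):
            if i + j < len(out):
                out[i + j] += s * k
    return out

dt = 0.1
stick = [0.0] * 600              # 60 s of scan time at 10 Hz resolution
stick[100] = 1.2                 # choice onset at t = 10 s, entropy = 1.2 bits
stick[350] = 0.3                 # choice onset at t = 35 s, entropy = 0.3 bits
regressor = convolve(stick, canonical_hrf(dt))
```

Because each delta function is scaled by the trial's entropy before convolution, the predicted BOLD response is larger after high-ambiguity choices, which is precisely the effect tested in the hippocampal ROI analysis.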

A secondary behavioral analysis assessed the specificity of M1 predictions by examining the explanatory power of M1 in the context of the alternative models, M2 to M5. For each subject, we used logistic regression to explain subjects' choices as a mixture of predictions from five models (M1 to M5), plus a constant term. Having estimated the logistic regression model for each subject, we again considered the subject-specific estimates for the coefficient reporting on M1 predictions. A two-tailed Student's
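The logic of this mixture regression can be sketched on simulated data (the value predictors and coefficients below are invented stand-ins, not the paper's fitted models):

```python
import numpy as np

# Sketch of the secondary behavioral analysis: logistic regression of
# trial-by-trial choices on competing models' value predictions plus a
# constant, fit by gradient ascent on the log-likelihood. Simulated data;
# here only two stand-in predictors are used rather than the five models.

rng = np.random.default_rng(0)
n_trials = 400
X = np.column_stack([
    np.ones(n_trials),            # constant term
    rng.normal(size=n_trials),    # stand-in for M1 value predictions
    rng.normal(size=n_trials),    # stand-in for a competing model (e.g. M2)
])
true_beta = np.array([0.0, 1.5, 0.0])    # choices actually driven by 'M1'
p = 1.0 / (1.0 + np.exp(-X @ true_beta))
y = (rng.random(n_trials) < p).astype(float)

beta = np.zeros(3)
for _ in range(2000):                    # gradient ascent on the log-likelihood
    pred = 1.0 / (1.0 + np.exp(-X @ beta))
    beta += 0.5 * X.T @ (y - pred) / n_trials

print(beta)  # the coefficient on the 'M1' column should dominate
```

The partial regression coefficient on the M1 column measures M1's contribution to choice over and above the competing predictors, which is the quantity taken to the second (between-subject) level.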

To summarize, we used standard regression techniques to ask if, having accounted for competing models, a component of choice behavior reflects Bayes-optimal generalization (M1). Specifically, we included several model predictions in one linear model and estimated the partial regression coefficient for the predictor of interest (action-values derived from M1). One can therefore

While M1 differs from other models in many ways, the important aspect for the fMRI analysis is that M1 provides an ambiguity measure. We therefore tested the hypothesis that the fMRI signal is sensitive to ambiguity, as quantified by the Shannon entropy of prior belief (see above). In our fMRI data, fourteen subjects satisfied the inclusion criteria for a second-level between-subject analysis (no interruptions to the scanner session or rapid head movement, as estimated by co-registration). We conducted regional and whole-brain analyses. All fMRI results presented here are based on the same general linear model, including the confounding factors (i.e., with nine regressors). In view of our specific hypothesis, region of interest (ROI) analyses asked whether activity within the bilateral hippocampi tracked ambiguity about the current contingencies.

As with the behavioral data, we next examined the between-subject correlation between the hippocampal ambiguity coefficients and the total number of rewards attained in the experiment (correlation

In an exploratory whole-brain analysis, we then smoothed the data with a Gaussian kernel

Behaviorally, we have shown that subjects learn action-reward relationships in a manner that enables them to generalize rules to new situations. Crucially, this enables subjects to adapt their learning rate to provide an optimal balance between pre-existing generalizations and new information. We established this by showing that the accuracy of subjects' guesses evolved over trials in a way that was predicted by Bayes-optimal generalization, using a statistical model equipped with prior beliefs that allowed for contextual ambiguity. Furthermore, we established that a significant component of hippocampal responses could be explained by fluctuations in ambiguity under this model. These regionally specific responses were also significant in a whole brain SPM analysis.

We provide empirical support for a model that explains how experience moderates decision making. In this model, the bias towards rule-based choices is determined by low ambiguity. We show that both learning and hippocampal responses are attenuated when the underlying rule is learned and applied in an unambiguous context. Conventional ‘model-free’ reinforcement learning cannot easily explain such effects because these schemes do not include contextual ambiguity. As noted in the

As in previous treatments

Previous work

The learning rate in (model-free) reinforcement learning prescribes the sensitivity of belief updates to current information. When this information is under- or over-weighted, inefficient learning ensues. While classical RL is non-probabilistic (i.e. has a degraded uncertainty representation

We would like to thank Peter Dayan, Jon Rosier, Read Montague, Antonio Rangel, John O'Doherty, Grit Hein and our anonymous reviewers for invaluable discussions and feedback.