Vicarious Neural Processing of Outcomes during Observational Learning

  • Elisabetta Monfardini ,

    elisabetta.monfardini@inserm.fr

    Affiliations INSERM, U1028; CNRS, UMR5292; Lyon Neuroscience Research Center, ImpAct Team, Lyon, France, Institut de Médecine Environnementale, Paris, France

  • Valeria Gazzola,

    Affiliations University Medical Center Groningen, University of Groningen, Department of Neuroscience, BCN NeuroImaging Center, Groningen, The Netherlands, Netherlands Institute for Neuroscience, Royal Netherlands Academy of Arts and Sciences, Amsterdam, The Netherlands

  • Driss Boussaoud,

    Affiliation Institut de Neuroscience des Systèmes, UMR 1106, INSERM, Aix-Marseille Université, Marseille, France

  • Andrea Brovelli ,

    Contributed equally to this work with: Andrea Brovelli, Christian Keysers, Bruno Wicker

    Affiliation Institut de Neurosciences de la Timone, CNRS & Aix-Marseille Université, UMR 7289, Marseille, France

  • Christian Keysers ,

    Contributed equally to this work with: Andrea Brovelli, Christian Keysers, Bruno Wicker

    Affiliations Netherlands Institute for Neuroscience, Royal Netherlands Academy of Arts and Sciences, Amsterdam, The Netherlands, University Medical Center Groningen, University of Groningen, Department of Neuroscience, BCN NeuroImaging Center, Groningen, The Netherlands

  • Bruno Wicker

    Contributed equally to this work with: Andrea Brovelli, Christian Keysers, Bruno Wicker

    Affiliations Institut de Neurosciences de la Timone, CNRS & Aix-Marseille Université, UMR 7289, Marseille, France, Integrative Neuroscience Laboratory, Physics Department, University of Buenos Aires, Capital Federal, Argentina


Abstract

Learning what behaviour is appropriate in a specific context by observing the actions of others and their outcomes is a key constituent of human cognition, because it saves time and energy and reduces exposure to potentially dangerous situations. Observational learning of associative rules relies on the ability to map the actions of others onto our own, process outcomes, and combine these sources of information. Here, we combined newly developed experimental tasks and functional magnetic resonance imaging (fMRI) to investigate the neural mechanisms that govern such observational learning. Results show that the neural systems involved in individual trial-and-error learning and in action observation and execution both participate in observational learning. In addition, we identified brain areas that specifically activate for others’ incorrect outcomes during learning in the posterior medial frontal cortex (pMFC), the anterior insula and the posterior superior temporal sulcus (pSTS).

Introduction

The capacity to vicariously learn from others which action is most rewarding in a particular situation is one of the most basic forms of human social cognition [1]–[3]. Learning-by-observation (LeO) plays a crucial role in many adaptive behaviours such as foraging and predator avoidance [4], and it has been observed in several animal species including rats [5], dogs [6], pigeons [7] and monkeys [8]–[11]. LeO relies on multiple functions, including the ability to infer others’ intentions from action observation, process others’ action outcomes (i.e. successes and errors) and combine these sources of information to learn arbitrary stimulus-action-outcome associations that can later serve the selection of behaviours leading to desired outcomes.

During individual trial-and-error learning (TE), decades of research have uncovered a detailed mechanistic understanding of how learning to select the most rewarding action in response to a stimulus is governed by multiple reward-related signals. Reward prediction-error signals (i.e. the difference between obtained and expected rewards) are represented in the ventral striatum [12], [13] and ventral tegmental area [14]–[16]. fMRI activations correlating with the absolute value of prediction-error signals have been found in the dorsal striatum [17] and in the dorsal fronto-parietal network [18]; and first correct outcomes selectively activate the left dorsolateral prefrontal cortex in humans [18] and produce specific signals in anterior cingulate cortex in monkeys [19]. Such a detailed mechanistic understanding is still lacking for LeO. Recent results suggest that LeO depends on observational action prediction-errors (i.e. the actual minus the predicted action of others) and observational outcome prediction-errors (i.e. the actual minus predicted outcome received by others) that selectively recruit the dorsolateral and ventromedial prefrontal cortex, respectively ([20]; see also [21]). The relationship of these signals with those recruited during TE remains, however, poorly understood.

Observing the actions of others is known to vicariously recruit brain regions traditionally associated with action execution [22]–[24]. The network of brain regions common to action observation and execution in humans has been dubbed the putative mirror neuron system (pMNS) in analogy to the mirror neurons found in similar brain regions in monkeys [25], [26]. This pMNS includes the ventral and dorsal premotor cortex, the inferior parietal lobule and adjacent somatosensory areas, and the middle temporal gyrus (see [24] for review). Such vicarious motor activations in the pMNS and what we know about mirror neurons from animal studies provide a powerful conceptual framework to understand how observers can learn to reproduce the observed actions of others (see [27] for review). But during LeO, how do observers learn which of many observed and vicariously activated actions are most rewarding in response to a particular stimulus? Here we explore whether activations in the pMNS coexist with representations of the outcomes obtained by the observed agents to make such LeO possible. Specifically, we explore whether representations of the outcomes of others depend on the vicarious recruitment of the brain circuits normally involved in individual TE and/or whether such information triggers activity in regions that are less involved during TE. To this aim, we scanned human participants using functional magnetic resonance imaging (fMRI) while they learned stimulus-action-outcome associations either by TE (i.e. first hand) or LeO (i.e. vicariously). This allowed us to identify, and for the first time directly compare, the brain networks mediating the processing of errors and successes during individual and observational learning.

Materials and Methods

Subjects

Eighteen healthy, right-handed volunteers (7 males) participated in the study (mean age: 27.6±4.5 years), but one was discarded because of technical problems and two because of poor learning performance. Consequently, fifteen subjects were included in the analysis (6 males; mean age: 27.1±4.7 years). The subjects were screened to rule out medication use, history of neurological or psychiatric disorders, head trauma, substance abuse or other serious medical conditions. Written consent was obtained after the procedure had been fully explained. The study was approved by the Medical Ethical Commission (METc) of the University Medical Center Groningen (NL). Volunteers were paid for their participation.

Task Design

The experiment was built as an event-related paradigm with nine experimental runs. Each run consisted of a single task, corresponding to one of the following experimental conditions: learning by trial-and-error (TE), learning-by-observation (LeO), pMNS localizer. The two learning conditions were repeated four times in order to increase the number of events per condition. The ordering of runs was randomized across subjects. Each task was explained to the subjects step by step before scanning.

Learning by trial-and-error (TE).

During scanning, participants had to learn the correct associations between each of 3 coloured stimuli and 1 of 4 possible joystick movements (Fig. 1A). Subjects performed 4 TE learning sessions. To avoid confusion across runs, the coloured stimuli had a different geometric shape in each run (e.g. triangles of 3 colours in one run, circles of 3 colours in another, rhombuses in another, and squares in yet another). On each trial, subjects were presented with a coloured shape and had to make a decision within 1.5 s by moving the joystick in one of the 4 possible directions. After a variable delay ranging from 4 to 10 s (randomly drawn from a log-normal distribution) following the disappearance of the coloured stimulus, an outcome image was presented (Fig. 1A). The outcome image lasted 1 s and informed the subject whether the response was correct (green tick-mark), incorrect (red cross) or late (question mark, if the reaction time exceeded 1.5 s). In case of a late trial, the same visual stimulus was repeated in the next trial in order to obtain the same number of valid trials per session. Late trials (mean±standard error of the mean per subject: 1.62±0.26) were modelled at the first level of analysis with a predictor of no interest and thus excluded from the regressors of interest in later analyses. The next trial started after a variable delay ranging from 4 to 10 s with the presentation of another visual stimulus. Visual stimuli were pseudo-randomized in blocks of three trials. Each learning session comprised 18 trials, with 3 stimulus types (i.e. identical shape but different colours, S1, S2, and S3) and 4 possible joystick movements. Thereafter, subjects performed 12 trials in which they were tested on their knowledge of the associations (TE-test trials). In these trials, the stimuli appeared on the screen (1.5 s) and subjects were asked to perform the correct movement within 1.5 s. No feedback was presented, to prevent further improvement in performance.

Figure 1. fMRI task design.

(A) Learning by trial-and-error (TE). A trial started with the presentation of a coloured stimulus. Participants had to displace the joystick in one of the four possible directions (up, down, right and left) within 1.5 seconds. After a variable delay, a feedback stimulus was presented for 1 second indicating whether the action was correct (green tick), incorrect (red cross) or late (question mark). (B) Learning-by-observation (LeO). Each trial started with the presentation of a video showing a hand on a joystick performing one of the four possible movements in response to the presentation of a coloured stimulus on a monitor. The camera view was set to the actor’s perspective. The video lasted 2 seconds and the coloured stimulus was presented for 1.5 seconds, as in the trial-and-error condition. The outcome images were presented after a variable delay and they were identical to those used in the TE condition. Participants were instructed to learn the correct stimulus-action-outcome associations by looking at the videos and outcomes. (C) Task design of an exemplar learning session. Stimuli were randomised in blocks of 3 trials. (D) Matrix of all possible stimulus-response combinations corresponding to the exemplar session in (C). Correct associations were not set a priori, but they were assigned as subjects advanced in the task. The first presentation of each stimulus was always followed by an incorrect outcome, irrespective of the motor response (from trial 1 to 3). On the second presentation of S1 (the blue circle), any untried joystick movement was always followed by a correct outcome (trial 4). The correct response for S2 and S3 (red and green circles, respectively) was found after 2 and 3 incorrect joystick movements (at trials 7 and 9, respectively). In other words, the correct response was the 2nd joystick movement (different from the first tried response) for stimulus S1, the 3rd joystick movement for stimulus S2, and the 4th for stimulus S3.
This task design ensured a minimum number of incorrect trials during acquisition (one for S1, two for S2 and three for S3) and fixed representative steps during learning. The LeO task was built using a design similar to the one used for the TE learning task. Given the scarcity of repetition and maintenance errors in TE, in LeO the actor neither repeated incorrect actions while searching for the correct association (i.e. no repetition errors in the acquisition phase of learning), nor made errors after the first correct response (i.e. no maintenance errors). Therefore, learning-by-observation consisted of 6 incorrect (one for S1, two for S2 and three for S3) and 12 correct trials. (E) Observation and execution of actions. Participants observed a video of a hand performing a joystick movement in response to a grey stimulus (i.e. action observation). After a variable delay, subjects were instructed to perform the movement they had previously observed (i.e. action execution).

https://doi.org/10.1371/journal.pone.0073879.g001

In order to induce reproducible performances across runs and subjects, we adapted a task design previously developed by Brovelli et al. [27], [28] that ensures a similar number of successful and unsuccessful attempts across learning sessions. The stimulus-response associations were not established a priori, but assigned as the subject progressed in the learning task (cf. legend of Fig. 1C, D). Consequently, the task design ensured a minimum number of incorrect trials during acquisition (one for S1, two for S2 and three for S3) and fixed representative steps during learning.
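The assignment rule described above and in the legend of Fig. 1D can be illustrated with a minimal Python sketch. The class name `AdaptiveAssociation` and its interface are hypothetical and are not taken from the original experiment code; the sketch also assumes that repeating an already-tried movement counts as a (repetition) error and does not advance the search.

```python
class AdaptiveAssociation:
    """Assigns the correct movement for one stimulus as the learner explores.

    Hypothetical illustration: `n_errors_required` is 1 for S1, 2 for S2
    and 3 for S3, following the legend of Fig. 1D.
    """

    def __init__(self, n_errors_required):
        self.n_errors_required = n_errors_required
        self.tried = []       # distinct movements tried so far, in order
        self.correct = None   # fixed once the learner has "found" it

    def respond(self, movement):
        """Return True if the outcome shown for this movement is 'correct'."""
        if self.correct is not None:
            # After assignment, any other movement is a maintenance error.
            return movement == self.correct
        if movement in self.tried:
            # Assumption: repeating a tried movement is a repetition error.
            return False
        self.tried.append(movement)
        if len(self.tried) > self.n_errors_required:
            # Enough distinct errors have been made: this untried movement
            # becomes the correct response for the rest of the session.
            self.correct = movement
            return True
        return False


# Usage: for S1, the second distinct movement tried becomes the correct one.
s1 = AdaptiveAssociation(n_errors_required=1)
outcomes = [s1.respond(m) for m in ["up", "left", "left"]]
```

Under this rule every learner makes at least one, two and three errors for S1, S2 and S3 respectively, whatever movements are chosen.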

Learning-by-observation task (LeO).

The LeO task was built using a similar task design. Subjects were asked to learn the associations between stimuli and joystick movements by observation of a video showing an actor learning the associations (Fig. 1B). The video lasted 2 seconds but the coloured stimulus was presented for 1.5 seconds, as in the TE condition, to make the timing of the conditions identical. After a variable delay ranging from 4 to 10 s, positive or negative feedback appeared on the screen to indicate whether the actor’s action was correct or incorrect. The subjects were instructed to learn the correct stimulus-response associations via the observation of the movies and the outcomes given to the actor. To ensure that both learning conditions contained similar numbers of successful and unsuccessful attempts, the progression of the actor performance was comparable to the actual performances of the subjects in the TE condition (Figure 1C, D). The actor never repeated the same incorrect action while searching for the correct association with a given stimulus (i.e. no repetition errors in the early phases of learning) and never made errors after the correct response (i.e. no maintenance errors). Each LeO session was composed of 18 learning trials as described above, 6 of which were error trials. Visual stimuli were pseudo-randomized in blocks of three trials, except for the last trial of the third block (i.e. the 9th), which was always correct (this explains why the actor reached 100% correct responses on trial 9, cf. Fig. 2A). Thereafter, subjects performed 12 trials in which they were tested on their knowledge of the associations (LeO-test trials). As in the TE-test trials, the outcome was not presented.

Figure 2. Behavioural performances of subjects in the fMRI learning sessions.

(A) Mean learning curve averaged across runs and subjects for the TE condition (gray curve) and the LeO condition (black curve). Note that the LeO curve represents the progression of the actor performance in the videos shown to the participant. Error bars indicate the standard error of the mean (SEM). (B) Mean percentage of correct responses in the TE-test and LeO-test sessions following learning. Error bars indicate the standard error of the mean (SEM).

https://doi.org/10.1371/journal.pone.0073879.g002

pMNS localizer task.

This task was created to functionally map the brain areas activated during both action observation and execution, irrespective of learning (Fig. 1E). To map action observation, subjects observed the movies (2 s) used in the LeO condition to guarantee comparable visual characteristics across conditions. The colour of the visual stimulus was masked to remove the possibility of implicitly learning a visuomotor association. Participants were instructed to observe the action with the intent to repeat it. After a variable delay ranging from 4 to 10 s, a go signal (a green cue) appeared on the screen to instruct the participants to execute the movement performed by the actor in the video. The execution phase lasted 1.5 s. Subjects saw 72 videos and therefore executed 72 actions in a single fMRI run.

Experimental set-up.

Visual stimuli were projected at the centre of a screen positioned at the back of the scanner. Subjects could see the image reflected on a mirror (15×9 cm) suspended 10 cm in front of their faces and subtending visual angles of 42° horizontally and 32° vertically. The subject’s responses were recorded using an fMRI-compatible joystick (fORP, CurrentDesigns, Inc., Philadelphia, USA). Before the experiment, the participants were instructed that the correct stimulus-response associations were: (i) completely arbitrary, and (ii) not mutually exclusive (all stimuli could be associated with the same joystick movement), meaning that the subjects could not infer correct associations by excluding previous correct movements.

fMRI Data Acquisition and Preprocessing

Images were acquired using a Philips Intera 3T Quasar, a synergy SENSE head coil, 30 mT/m gradients and a standard single-shot EPI with TE = 30 ms, TR = 2 s, 37 axial slices of 3 mm thickness, with no slice gap and a 3×3 mm in-plane resolution acquired to cover the entire brain and cerebellum. The slices were acquired in an interleaved spatial order. The first three volumes of each participant’s data were discarded to allow for longitudinal relaxation time equilibration.

Data were preprocessed with SPM5 (Wellcome Trust Centre for Neuroimaging, London, UK; http://www.fil.oin.ucl.ac.uk/software/spm5/). EPI images from all sessions were slice-time corrected and aligned to the first volume of the first session of scanning to correct for head movement between scans. A mean image was created using the realigned volumes. T1-weighted structural images were first co-registered to the mean EPI image of each participant. Normalization parameters between the co-registered T1 and the standard MNI T1 template were then calculated, and applied to the anatomy and all EPI volumes. Data were then smoothed using an 8 mm full-width-at-half-maximum isotropic Gaussian kernel to accommodate inter-subject differences in anatomy.

fMRI Data Analysis

The statistical analysis of the pre-processed event-related BOLD signals was performed using a general linear model (GLM) approach. Each trial in the TE and LeO conditions consisted of two events. The first (SR: stimulus+response) was associated with the processing of the stimuli and the selection of motor response (TE), or the observation of movies (LeO). The second was associated with the processing of outcomes (O). To dissociate the two events in each trial, the regressors were constructed by convolving the canonical hemodynamic response function with delta functions of constant or varying amplitudes aligned on the time of SR and O onsets. Given that learning of stimulus-action-outcome associations only happens in the outcome phase, we only present the results related to brain activations recruited during the processing of outcomes (O). For the pMNS localizer trials, one regressor was aligned with the onset of videos presentation (action observation), the other with the go-cue onset (action execution).
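As an illustration of how such event regressors are typically built, the following Python sketch convolves a train of delta functions with a double-gamma haemodynamic response function. It is not the SPM5 code used in the study: the function names are hypothetical, the HRF parameters (response shape 6, undershoot shape 16, ratio 1/6) are the commonly quoted defaults and are assumed here, and onsets are simply rounded to the nearest scan.

```python
import math
import numpy as np


def canonical_hrf(tr, duration=32.0):
    """Double-gamma HRF sampled every `tr` seconds (assumed default shape)."""
    t = np.arange(0.0, duration, tr)
    response = t ** 5 * np.exp(-t) / math.gamma(6)      # gamma pdf, shape 6
    undershoot = t ** 15 * np.exp(-t) / math.gamma(16)  # gamma pdf, shape 16
    hrf = response - undershoot / 6.0
    return hrf / hrf.sum()


def event_regressor(onsets_s, n_scans, tr, amplitudes=None):
    """Convolve delta functions at the event onsets with the HRF.

    `amplitudes` allows constant (default, all ones) or varying heights.
    Onsets are rounded to the nearest scan, a simplification of this sketch.
    """
    sticks = np.zeros(n_scans)
    if amplitudes is None:
        amplitudes = np.ones(len(onsets_s))
    for onset, amp in zip(onsets_s, amplitudes):
        sticks[int(round(onset / tr))] += amp
    return np.convolve(sticks, canonical_hrf(tr))[:n_scans]


# Usage: one outcome event 10 s into a 40-scan run acquired with TR = 2 s.
outcome_regressor = event_regressor([10.0], n_scans=40, tr=2.0)
```

Because the SR and O events of a trial are separated by a jittered 4 to 10 s delay, their convolved regressors are only partly correlated, which is what allows the two events to be dissociated.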

Single participant analyses.

The goal of the GLM analyses was to identify the cerebral networks involved in the processing of outcomes that displayed learning-related changes during TE and LeO. To do so, we computed two design matrices at the 1st level of analysis. In the first one, the design matrix contained 10 regressors. The first 4 regressors modelled the BOLD responses in the TE condition. The 1st and the 2nd regressors were aligned on the stimulus presentation and included the trials in the acquisition (TE_SR_acquisition) and early consolidation (TE_SR_consolidation) phases of learning, respectively. The acquisition phase included the incorrect and 1st correct trial (Fig. 1C, D). The early consolidation phase was composed of all the trials starting with the second correct. The 3rd and the 4th regressors modelled the same learning phases, but they were aligned on the presentation of the outcome image (TE_O_acquisition and TE_O_consolidation, respectively). The same trial and event types in the LeO condition were modelled from the 5th to the 8th regressors (LeO_SR_acquisition, LeO_SR_consolidation, LeO_O_acquisition and LeO_O_consolidation). The 9th and 10th regressors included the trials of the action observation and execution (pMNS localizer).

In a second GLM, we refined the first analysis to dissociate the neural systems associated with processing of incorrect and first correct trials. We thus created a design matrix at 1st level that contained 12 regressors (6 regressors for both TE and LeO). For each learning condition, three regressors were aligned on the SR event and three on O. Among these, the first regressor included incorrect trials (TE_SR_incorrect, TE_O_incorrect; LeO_SR_incorrect, LeO_O_incorrect), the second included the first correct trial for each association (TE_SR_1stcorrect, TE_O_1stcorrect; LeO_SR_1stcorrect, LeO_O_1stcorrect), whereas the third included subsequent correct trials (TE_SR_consolidation, TE_O_consolidation; LeO_SR_consolidation, LeO_O_consolidation).
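The trial classification underlying the two design matrices can be sketched as follows. This is a hypothetical helper, not the authors' code; it assumes that every incorrect trial is labelled `incorrect` (the rare maintenance errors are not treated separately) and that the acquisition phase of the first GLM is the union of incorrect and first correct trials.

```python
def label_trials(trials):
    """Label each trial for the second GLM.

    `trials` is a list of (stimulus_id, was_correct) tuples in presentation
    order; the result has one label per trial: 'incorrect', '1stcorrect'
    or 'consolidation'.
    """
    n_correct = {}  # number of correct outcomes seen so far, per stimulus
    labels = []
    for stimulus, was_correct in trials:
        if not was_correct:
            labels.append("incorrect")
            continue
        n_correct[stimulus] = n_correct.get(stimulus, 0) + 1
        labels.append("1stcorrect" if n_correct[stimulus] == 1 else "consolidation")
    return labels


def to_phase(label):
    """Collapse second-GLM labels into the two phases of the first GLM."""
    return "consolidation" if label == "consolidation" else "acquisition"


# Usage with a short, made-up sequence for two stimuli.
example = [("S1", False), ("S2", False), ("S1", True), ("S1", True), ("S2", True)]
example_labels = label_trials(example)
example_phases = [to_phase(label) for label in example_labels]
```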

Group analyses.

All the fMRI statistics and P values arise from group random-effects analyses on the outcome phase of learning. Group analyses were thresholded at the voxel level at p<0.001 (uncorrected). The minimum cluster size (k) was 15 voxels, which ensured a cluster p≤0.05. To control the overall rate of false positives and because we searched for significant effects over the entire brain, we only report (unless specified otherwise) results with a False Discovery Rate (FDR) q<0.05 (k = 15 voxels).
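For readers unfamiliar with FDR control, the sketch below shows the Benjamini-Hochberg step-up rule on a list of voxel-wise p values. The text does not state which FDR procedure was applied, so Benjamini-Hochberg is an assumption here, and the function name is hypothetical.

```python
import numpy as np


def bh_threshold(p_values, q=0.05):
    """Largest p value declared significant by Benjamini-Hochberg at level q.

    Returns None when no p value survives.
    """
    p = np.sort(np.asarray(p_values, dtype=float))
    m = p.size
    # The i-th smallest p value is compared with q * i / m.
    below = p <= q * np.arange(1, m + 1) / m
    return float(p[below].max()) if below.any() else None


# Usage: the two smallest p values survive, so the threshold is 0.008.
threshold = bh_threshold([0.001, 0.008, 0.039, 0.041, 0.9], q=0.05)
```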

The brain regions recruited during the acquisition phase of learning in both TE and LeO conditions were mapped using a two-way repeated-measures ANOVA, with 2 learning phases (acquisition and consolidation) × 2 learning conditions (TE and LeO). The learning signals are mainly processed in the acquisition phase, and voxels processing these signals should thus show an effect of phase, with the acquisition phase showing more activation than the consolidation phase. Such an effect dissociates processes associated with early learning (i.e. acquisition) from the sensory processing of the outcome (i.e. consolidation). If this effect is a main effect, without significant interaction with learning condition, the voxel would be similarly involved in learning for TE and LeO. If the voxel additionally shows an interaction with learning condition, it would be evidence of its stronger involvement in one form of learning than in the other.
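The logic of the main effects and the interaction can be made concrete with contrast weights over the four outcome regressors. The sketch below is illustrative only: the cell ordering and variable names are assumptions, and the numbers in the usage lines are invented.

```python
import numpy as np

# Assumed cell order for the 2 x 2 design.
CELLS = ["TE_O_acquisition", "TE_O_consolidation",
         "LeO_O_acquisition", "LeO_O_consolidation"]

PHASE = np.array([1, -1, 1, -1])      # acquisition > consolidation
CONDITION = np.array([-1, -1, 1, 1])  # LeO > TE
INTERACTION = PHASE * CONDITION       # phase effect larger in LeO than in TE


def contrast_value(cell_means, weights):
    """Weighted sum of the four cell means for one voxel."""
    return float(np.dot(cell_means, weights))


# A voxel with the same phase effect in TE and LeO: main effect, no interaction.
shared = [2.0, 1.0, 2.0, 1.0]
shared_phase = contrast_value(shared, PHASE)
shared_interaction = contrast_value(shared, INTERACTION)

# A voxel whose phase effect is present only in LeO: non-zero interaction.
leo_only_interaction = contrast_value([1.0, 1.0, 3.0, 1.0], INTERACTION)
```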

To refine this analysis, an additional two-way repeated-measures ANOVA was implemented, with 2 correctness (incorrect, 1stcorrect) × 2 learning conditions (TE, LeO). The goal of this analysis was to dissociate the neural systems relative to the processing of incorrect and first correct trials during TE and LeO learning conditions.

Finally, we mapped the pMNS by first running two one-sample t-tests on the single participant beta values from the action observation and execution conditions. The thresholded group t-map resulting from the conjunction analysis [29] between observation and execution regressors (t = 3.41, punc<0.001) was used as a localizer mask for the pMNS.
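A conjunction of this kind is often computed with a minimum-statistic test, in which a voxel is retained only if both t-maps exceed the threshold. The sketch below assumes that approach (the text cites [29] without describing the procedure) and uses a hypothetical function name and invented values.

```python
import numpy as np


def conjunction_mask(t_observation, t_execution, t_threshold):
    """Boolean mask of voxels where BOTH t-maps exceed the threshold."""
    return np.minimum(t_observation, t_execution) > t_threshold


# Usage with made-up t values for four voxels and the threshold quoted above.
t_obs = np.array([5.0, 2.0, 4.0, 3.5])
t_exe = np.array([4.2, 6.0, 3.0, 3.6])
pmns_mask = conjunction_mask(t_obs, t_exe, t_threshold=3.41)
```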

The anatomical location of each activated cluster was assessed using the SPM anatomy toolbox (https://www.fz-juelich.de/ime/spm_anatomy_toolbox) [30] and the Talairach Daemon software (http://www.talairach.org) [31]. Graphical display was performed using MRIcron software (http://www.cabiatl.com/mricro/mricron/index.html).

To depict the BOLD dynamics across conditions, we extracted the BOLD responses from all voxels in each activated cluster using the MarsBar toolbox for SPM (http://marsbar.sourceforge.net/). The average BOLD response was calculated by temporally aligning the BOLD time series on outcome onset and by averaging them across trials and subjects for each experimental condition (TE, LeO, OBS and EXE; Fig. 3B). Two separate three-way repeated-measures ANOVAs with 2 conditions (observing others vs. doing) × 2 tasks (learning vs. not learning) × 6 ROIs were then conducted by considering both (i) the peak values and (ii) the Δ scores (peak value minus first point) of the curves shown in Fig. 3B. In addition, we also plotted the mean value of the parameter estimates for the maxima of each cluster (Fig. 4B).
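The event-locked averaging and the two summary measures can be sketched as follows. This is an illustration rather than the MarsBar-based pipeline: the function names are hypothetical, onsets are expressed in scans, and epochs that run past the end of the series are simply dropped.

```python
import numpy as np


def event_locked_average(signal, onset_scans, n_post):
    """Average the signal over epochs of `n_post` scans starting at each onset."""
    epochs = [signal[o:o + n_post] for o in onset_scans
              if o + n_post <= len(signal)]
    return np.mean(np.array(epochs), axis=0)


def peak_and_delta(curve):
    """Return (peak value, delta score), with delta = peak minus first point."""
    peak = float(np.max(curve))
    return peak, peak - float(curve[0])


# Usage with a toy ROI time series containing three identical responses.
roi_signal = np.tile(np.array([0.0, 1.0, 3.0, 1.0, 0.0]), 3)
curve = event_locked_average(roi_signal, onset_scans=[0, 5, 10], n_post=5)
peak, delta = peak_and_delta(curve)
```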

Figure 3. Clusters of activation are superimposed onto the average T1 image derived from all participants.

(A) Brain networks commonly recruited during the acquisition phase of learning (i.e. incorrect trials + 1st correct trial) in both TE and LeO. Active brain regions in both TE (i.e. TE_O_acquisition>TE_O_consolidation) and LeO (i.e. LeO_O_acquisition>LeO_O_consolidation) contrasts (conjunction thresholded at punc<0.001, t = 3.24; k = 15; all clusters also survive qFDR<0.05). See also Fig. S1. (B) Brain networks commonly recruited during the acquisition phase of learning, action observation and execution. Intersection analysis between the results from (A) and the localizer mask for the pMNS. Grand-average BOLD responses in the regions of overlap for TE_O_acquisition and LeO_O_acquisition (black and gray continuous line), OBS and EXE (continuous and dotted light gray) conditions. (C) Brain networks commonly recruited during the processing of 1st correct outcome in TE and LeO. Positive effect of the 1st correct outcome (LeO_O_1stCorrect+TE_O_1stCorrect-LeO_O_incorrect-TE_O_incorrect) exclusively masked with the interaction of correctness by learning condition (t = 3.24; punc<0.001, k = 15; all clusters also survive qFDR<0.05).

https://doi.org/10.1371/journal.pone.0073879.g003

Figure 4. Clusters of activation are superimposed on the average T1 image derived from all participants.

(A) Direct comparison between LeO_O_acquisition and TE_O_acquisition. Results from the LeO_O_acquisition>TE_O_acquisition t-contrast (t = 3.24; punc<0.001, k = 15; all clusters also survive qFDR<0.05). (B) Direct comparison between LeO_O_incorrect and TE_O_incorrect. Areas showing greater activation for the processing of incorrect outcomes in LeO than for the processing of incorrect outcomes in TE (punc<0.001, k = 15; all clusters also survive qFDR<0.05). Plot of the mean value of the parameter estimates (arbitrary units) for the maxima of the left anterior insula, left and right pSTS, left pMFC and middle cingulate cortex.

https://doi.org/10.1371/journal.pone.0073879.g004

Results

Behaviour

In order to compare the neural substrates of trial-and-error and observational learning, we used a task designed to induce comparable performances across sessions, subjects and learning conditions (Fig. 1). Indeed, the mean learning curves, averaged across runs and subjects for the TE condition (Fig. 2A, gray curve), showed a profile comparable to the learning profile in the LeO condition (Fig. 2A, black curve; r = 0.55). The number of repetition and maintenance errors in the TE condition was very limited (mean±standard error of the mean per subject: repetition errors 0.78±0.16; maintenance errors 0.6±0.12). In addition, the mean percentage of correct responses in the test sessions following learning was 94.01%±0.58% (mean±standard error of the mean) and 95.06%±0.53% for the TE-test and LeO-test phases, respectively (p = 0.6; cf. Fig. 2B). Overall, the behavioural results showed that the task design successfully manipulated learning performance and induced reproducible performances across sessions and subjects. Most importantly, no significant difference was observed in the final performance after TE and LeO (Fig. 2B).

Neuroimaging

Networks for the processing of outcomes during TE and LeO.

A group-level 2×2 repeated-measures ANOVA with two phases (acquisition, consolidation) and two learning conditions (LeO, TE) showed a main effect of learning phase (F1,14 = 12.06, punc<0.001; F1,14 = 6.47, qFDR<0.05), and a main effect of learning condition (F1,14 = 12.06, punc<0.001; F1,14 = 9.81; all clusters also survive qFDR<0.05). The interaction of learning phase by learning condition (F1,14 = 12.06, punc<0.001) was only significant in 23 voxels (13 voxels in left anterior insula extending to inferior frontal gyrus, 10 voxels in left posterior superior temporal sulcus) and therefore did not survive an FDR correction (qFDR = 0.35).

Post-hoc t-tests revealed that the activations specifically involved during the acquisition phase of learning (i.e. LeO_O_acquisition+TE_O_acquisition - LeO_O_consolidation - TE_O_consolidation) were localized in a large network of brain regions including the bilateral inferior (IPL, BA40) and superior (SPL, BA7) parietal cortex, the bilateral postcentral gyrus (BA2), the bilateral dorsal premotor (PMd, BA6) and dorsolateral prefrontal (dlPFC, BA9) cortices, the supplementary motor area (SMA), the bilateral middle temporal cortex (BA21/22), the bilateral cerebellum as well as the right caudate nucleus (dorsal striatum), the right inferior frontal gyrus (vlPFC, i.e. ventro-lateral prefrontal cortex, BA44/45/47) and the left anterior vlPFC (BA10) (t = 3.24, punc<0.001, k = 15; all clusters also survive qFDR<0.05; cf. Fig. S1, Table S1). None of these regions showed an interaction of learning phase and condition. Consequently, this result suggests that the learning signal provided by outcomes yielded similar activations in these brain regions during the acquisition phase of both TE and LeO. To confirm the significance of the recruitment of this brain network in both TE and LeO acquisition phase individually, we also computed a conjunction analysis [29] between two contrasts: (TE_O_acquisition>TE_O_consolidation) ∩ (LeO_O_acquisition>LeO_O_consolidation). The results showed that outcome processing during TE and LeO acquisition phase commonly activated the bilateral IPL and SPL (BA40 and BA7, respectively), the bilateral PMd (BA6), the SMA, the bilateral cerebellum, the bilateral dlPFC (BA46/9), the right vlPFC (BA45/44), the left anterior vlPFC (BA10) and the left dorsal striatum (t = 3.24, punc<0.001, k = 15; all clusters also survived qFDR<0.05; cf. Fig. 3A and Table 1A). 
These common brain activations, and the scarcity of voxels showing significant interactions, confirm that the neural mechanisms engaged during the acquisition phase of LeO are strictly similar to those engaged during the acquisition phase of TE learning in humans.

Table 1. (A) Brain networks commonly recruited during the acquisition phase (i.e. incorrect outcomes+1st correct outcome) in both TE and LeO conditions (conjunction thresholded at punc<0.001, t = 3.24; k = 15; all clusters also survive qFDR<0.05).

https://doi.org/10.1371/journal.pone.0073879.t001

To further explore the effect of learning type, we investigated the difference between LeO and TE during the acquisition phase of learning. The contrast TE_O_acquisition>LeO_O_acquisition revealed no significant clusters (punc<0.001, qFDR>0.149). The opposite contrast (LeO_O_acquisition>TE_O_acquisition) revealed BOLD changes reflecting specific LeO-related activity in the middle and anterior cingulate gyri extending to the SMA, in the bilateral posterior superior temporal sulcus (pSTS), the left anterior insula (BA13), the bilateral supramarginal gyrus (BA40), the bilateral fusiform gyrus and the left inferior frontal gyrus (BA44/45; t = 3.24, punc<0.001; all clusters survived qFDR<0.05; cf. Fig. 4A and Table 2A).

Processing of outcomes and the putative Mirror Neuron System.

In order to investigate whether the pMNS is activated when the outcome is revealed during the acquisition phase, we acquired a pMNS localizer (t = 3.4, punc<0.001; cf. Table S2), which identified the key parietal (BA2/PF/PFop and intraparietal sulcus hIP2) and premotor (PMv, PMd, SMA) regions consistently associated with the pMNS [23], [24]. We then inclusively intersected the pMNS localizer with the activations common to TE and LeO during the acquisition phase of learning [(TE_O_acquisition>TE_O_consolidation) ∩ (LeO_O_acquisition>LeO_O_consolidation)]. As shown in Fig. 3B and Table 1B, the overlap analysis between the learning-related network and the pMNS revealed clusters in the bilateral superior (BA7A) and inferior (PF/PFop, hIP2) parietal lobes, the postcentral gyrus (BA2), the bilateral PMd (BA6), the ventral premotor cortex (right inferior frontal gyrus, BA44) and the SMA (Fig. S2, Table S3). The average time courses of the BOLD response, aligned to the time at which the outcome is revealed, are illustrated in Fig. 3B and show a distinctive peak of activity after the outcome in all the clusters. Three-way repeated-measures ANOVAs with 2 conditions (observing others vs. doing) × 2 tasks (learning vs. not learning) × 6 clusters revealed no significant three-way interaction (condition × task × cluster, Δ scores: F5,84 = 1.36, p = 0.25; peak values: F5,84 = 1.46, p = 0.21), meaning that similar patterns of BOLD signal change were found across the different regions of interest (ROIs). However, the condition × task interaction was significant (Δ scores: F1,84 = 13.36, p<0.001; peak values: F1,84 = 68.64, p<0.001), which suggests that the BOLD activity in all six ROIs showed a different effect of condition depending on task. In other words, the BOLD activity in these areas presented an opposite pattern depending on whether the subjects were involved in a learning task or not.

Networks for the processing of incorrect and 1st correct outcomes during trial-and-error and learning-by-observation.

This analysis aimed to refine the understanding of the neural dynamics engaged during the acquisition phase of learning by differentiating the processing of errors and 1st correct trials in both TE and LeO conditions. A two-way repeated-measures ANOVA with 2 levels of correctness (incorrect, 1st correct) × 2 learning conditions (TE, LeO) revealed a main effect of correctness (F1,14 = 12.06, punc<0.001; F1,14 = 5.89, qFDR<0.05); a main effect of learning condition (F1,14 = 12.06, punc<0.001; F1,14 = 12.17, all clusters also survive qFDR<0.05); and a trend toward an interaction of correctness by learning condition (F1,14 = 12.06, punc<0.001 but q = NS; see Table S4). The contrast LeO_O_incorrect+TE_O_incorrect-LeO_O_1stCorrect-TE_O_1stCorrect yielded no significant voxels (punc<0.01, qFDR>0.99), while the opposite contrast LeO_O_1stCorrect+TE_O_1stCorrect-LeO_O_incorrect-TE_O_incorrect revealed significant BOLD increases (t = 3.24, punc<0.001; all clusters also survive qFDR<0.05).
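The main effects and the interaction of this 2 × 2 design can each be written as a weight vector over the four outcome regressors. The sketch below is only an illustration: the regressor order, the dictionary keys and the parameter estimates are assumptions, not values or names taken from the study's analysis scripts.

```python
# Regressor order is an assumption made for this illustration.
REGRESSORS = ["LeO_O_incorrect", "LeO_O_1stCorrect", "TE_O_incorrect", "TE_O_1stCorrect"]

# One weight per regressor; each contrast sums to zero.
CONTRASTS = {
    # LeO_O_1stCorrect + TE_O_1stCorrect - LeO_O_incorrect - TE_O_incorrect
    "first_correct_gt_incorrect": [-1, 1, -1, 1],
    # Main effect of learning condition (LeO > TE)
    "LeO_gt_TE": [1, 1, -1, -1],
    # Interaction: (LeO incorrect - LeO 1st correct) - (TE incorrect - TE 1st correct)
    "correctness_x_condition": [1, -1, -1, 1],
}


def contrast_value(weights, betas):
    """Weighted sum of parameter estimates for one voxel."""
    if len(weights) != len(betas):
        raise ValueError("one weight per regressor is required")
    return sum(w * b for w, b in zip(weights, betas))


# Hypothetical parameter estimates for a single voxel, in REGRESSORS order.
hypothetical_betas = [2.0, 3.0, 1.0, 2.5]

effects = {name: contrast_value(w, hypothetical_betas) for name, w in CONTRASTS.items()}
print(effects)
```

Reversing the sign of every weight in a vector gives the opposite contrast, for example incorrect > 1st correct.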

Before interpreting voxels as being equally modulated by correctness in both learning conditions, we identified voxels showing an interaction between correctness and learning type. Although this interaction was not significant using an FDR correction, it was significant at an uncorrected threshold in some voxels (F1,14 = 12.06, punc<0.001; see Table S4). To isolate the brain regions that showed a similar preference for 1st correct over incorrect trials for LeO and TE, we therefore exclusively masked the results of the positive effect of 1st correct trials with the interaction effect of learning type by correctness (F1,14 = 12.06, punc<0.001). The results revealed BOLD changes bilaterally in the fusiform gyrus, in the left middle temporal gyrus (BA21) and in the middle and anterior cingulate gyrus extending to the left dlPFC (BA9) and vlPFC (BA44/45), as well as in the left SPL (BA7), left postcentral gyrus (BA2), left supramarginal gyrus (BA40), right superior medial gyrus (BA10) and right middle occipital gyrus (BA19; Fig. 3C and Table 1C). The current result suggests that the processing of 1st correct outcomes has a crucial role for both TE and LeO and relies on similar neural computations.
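Exclusive masking, as used here, keeps voxels that pass the threshold for the effect of interest and discards those that also pass the threshold for the masking effect. The following sketch shows the idea on flat lists of voxel statistics. The thresholds are the ones quoted in the text, while the function names and the voxel values are hypothetical.

```python
T_THRESHOLD = 3.24   # threshold for the positive effect of 1st correct trials
F_THRESHOLD = 12.06  # threshold for the correctness-by-learning-type interaction


def exclusive_mask(effect_t, interaction_f, t_thr=T_THRESHOLD, f_thr=F_THRESHOLD):
    """Keep voxels showing the effect but NOT the interaction."""
    if len(effect_t) != len(interaction_f):
        raise ValueError("maps must cover the same voxels")
    return [t > t_thr and not f > f_thr for t, f in zip(effect_t, interaction_f)]


def inclusive_mask(effect_t, interaction_f, t_thr=T_THRESHOLD, f_thr=F_THRESHOLD):
    """Keep voxels showing BOTH the effect and the masking effect."""
    if len(effect_t) != len(interaction_f):
        raise ValueError("maps must cover the same voxels")
    return [t > t_thr and f > f_thr for t, f in zip(effect_t, interaction_f)]


# Hypothetical statistics for four voxels (not data from the study).
first_correct_t = [4.0, 5.2, 2.0, 3.6]
interaction_f = [1.5, 14.0, 0.3, 11.9]

similar_in_both = exclusive_mask(first_correct_t, interaction_f)
differs_by_type = inclusive_mask(first_correct_t, interaction_f)
print(similar_in_both)
print(differs_by_type)
```

The second voxel shows a strong effect of correctness but also an interaction, so it is excluded from the set of regions treated as similarly modulated in LeO and TE.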

We calculated the following contrasts to explore differences in the processing of 1st correct outcomes in TE and LeO: LeO_O_1stcorrect>TE_O_1stcorrect and TE_O_1stcorrect>LeO_O_1stcorrect. In both cases, no brain areas displayed differences that survived our thresholds (i.e. punc<0.001 and qFDR<0.05).

Finally, two t-contrasts were calculated in order to examine whether particular brain regions might be specifically involved in the processing of errors in one of the learning conditions: 1) LeO_O_incorrect>TE_O_incorrect; 2) TE_O_incorrect>LeO_O_incorrect. The contrast LeO_O_incorrect>TE_O_incorrect revealed a network of brain areas showing significantly greater activation when the subject processed others' errors. This network includes the left middle temporal gyrus (BA21) and the bilateral superior temporal gyrus (BA22), including the posterior superior temporal sulci (pSTS), the bilateral anterior insula and the middle and anterior cingulate gyrus (BA32, BA24) encompassing the posterior medial frontal cortex (pMFC) (t = 3.24, punc<0.001; all clusters also survive at qFDR<0.05; cf. Fig. 4B, Table 2B). Other clusters were identified in the bilateral fusiform gyrus extending to the cerebellum in the right hemisphere, in the SMA and in the right postcentral gyrus (BA3). No brain areas were found in the opposite contrast (TE_O_incorrect>LeO_O_incorrect), either at punc<0.001 or at qFDR<0.05.

Discussion

The aim of the current study was to explore the neural substrates allowing us to learn the correct action to perform in a particular situation by observing the successes and failures of others. We investigated the neural systems involved in the processing of others' successes and errors during learning-by-observation (LeO), and compared them to those recruited during trial-and-error (TE) learning. The experimental learning tasks were designed to produce reproducible phases of acquisition and consolidation across sessions and individuals during LeO and TE. This allowed us to compare brain activations across learning types at different stages of learning, from acquisition to early consolidation. In addition, we investigated the role of the pMNS during learning by mapping brain areas involved in both action observation and action execution.

Common Brain Networks Mediating Individual and Observational Learning

Our study shows that, independently of whether learning is achieved by observation or trial-and-error, the processing of outcomes during acquisition (as compared with early consolidation) is mediated by brain regions encompassing three documented cerebral systems: the dorsal fronto-parietal, the fronto-striatal, and the cerebellar networks. These brain systems are activated during both TE and LeO (Fig. 3A, Table 1A; see also Fig. S1 and Table S1), and display stronger activation during the initial learning phase, when outcomes drive learning signals, than during the following correct trials in the early consolidation phase. The dorsal fronto-parietal system, which comprises the superior and inferior parietal lobes and the dorsal premotor cortex bilaterally, is thought to play a key role in sensorimotor transformation [32], [33], in the control of goal-directed attention to salient stimuli and responses [34], and in instrumental learning (e.g. [35], [36]). Previous neuroimaging studies have also confirmed its role in trial-and-error learning [37]–[39] and more specifically in the processing of outcomes [18]. This suggests that the processing of others' successes and errors during LeO partly exploits the same neural system mediating individual learning, visuomotor transformations and the control of goal-directed attention.

Our fronto-striatal network comprises the left dorsal striatum, the anterior ventro-lateral and dorso-lateral prefrontal cortices and the SMA. These structures form the associative fronto-striatal loop thought to subserve goal-directed processes during individual instrumental learning [40]–[47]. Previous work has shown learning-related activities during individual learning in the head of the caudate nucleus and portions of the prefrontal cortex (ventrolateral and dorsolateral), as well as in the premotor and supplementary motor areas [17], [18], [48]–[55]. In particular, the anterior caudate nucleus may integrate information about performance and cognitive control demands during individual instrumental learning [28], whereas the ventrolateral prefrontal cortex is implicated in the retrieval of visuomotor associations learned either by trial-and-error or by observation of others' actions [56]. Again, the overlap in the fronto-striatal network of learning-specific activity during LeO and TE suggests that the processing of outcomes during observational learning relies, in addition to the dorsal fronto-parietal system, on a fronto-striatal network that is pivotal for individual instrumental learning. During TE, this system is thought to create an association between actions and outcomes. We suggest that during LeO, the same network encodes associations between the vicariously represented actions of others and their outcomes, which can later be used to guide the observer's own behaviour.

The last network involved in outcome processing during the early phases of TE learning is located bilaterally in the cerebellum. Clinical reports on cerebellar patients describe severe impairments in cognitive planning and procedural learning (e.g. [57]–[60]). Moreover, using repetitive transcranial magnetic stimulation (rTMS), Torriero et al. [61] provided evidence in favour of a role of cerebellar structures during the acquisition of new motor patterns both by observation and by trial-and-error. The activation of the cerebellum in our study suggests that this structure is involved in both TE and LeO, even when new motor patterns do not need to be learned.

The fact that LeO depends in part on the brain mechanisms of TE is further supported by the observation that a common network of brain areas is also engaged during the processing of first correct outcomes in both learning processes. Previous studies examining TE learning found a selective increase in activity on the first correct trial in the dorsolateral prefrontal cortex (BA9) [18] and in the inferior frontal gyrus [38]. Such selective activation upon first correct outcomes may be responsible for our ability to rapidly learn stimulus-response-outcome associations. The selective activation at first correct outcomes during LeO, as revealed by our study, suggests that the dorsolateral prefrontal cortex is involved in rapid, seemingly one-trial, learning, irrespective of the type of learning mechanism (trial-and-error or observation). Alternatively, this activation may allow the correct implementation of learning strategies such as repeat-stay (perform the same action if previously rewarded). These interpretations are in line with previous reports showing deficits in rapid arbitrary visuomotor learning and strategy use after lesions of the lateral and orbital prefrontal cortex [62] and electrophysiological findings showing a selectivity in the discharge of prefrontal neurons for the type of learning strategy [63].

Taken together, our results suggest that the processing of others' outcomes during the acquisition of visuomotor associations by observation is largely implemented by a neural circuit overlapping with the brain areas involved in individual trial-and-error learning.

Role of the Putative Mirror Neuron System During Learning-by-observation

Previous research on observational learning in humans has focused on the acquisition of novel motor patterns through imitational and mirror-like mechanisms. In these tasks, participants do not need to choose amongst multiple observed actions. Instead, they have to imitate observed actions, without any of the actions leading to positive or negative outcomes. Several of these studies have reported that the fronto-parietal pMNS is strongly recruited while observing actions during the learning of new motor patterns through imitation of others' actions ([64]–[67]; see also [68]). The same pMNS is also activated when participants simply view the actions of others without needing to replicate them, or when they simply execute these actions [23], [24]. Accordingly, it is thought that the pMNS transforms observed actions into motor codes required for the execution of similar actions. However, the role of the pMNS in the acquisition of arbitrary visuomotor associations, where it is critical to distinguish between rewarded (i.e. positive feedback) and unrewarded (i.e. negative feedback) actions in a particular context, remained unexplored. In our task, no novel motor patterns need to be acquired. Instead, novel associations need to be crafted between familiar motor patterns, stimuli and rewards. So far, we have focused on the fact that the processing of outcomes during visuomotor association learning shares neural substrates in our participants when performed by LeO and TE. Since LeO involves the observation of the actions of others during the stimulus presentation, and TE involves the execution of an action during the response phase, we suggest that the pMNS may be activated during the stimulus/response phase of each trial in our experiment. 
Given that previous action observation experiments describing the properties of the pMNS never distinguished correct from incorrect actions, it was unclear whether this system would also be recruited while our participants find out if the action was correct or not. Here, we therefore focused on analysing the outcome phase of each trial, and we found that both LeO and TE involved a brain network also active during simple action execution and observation and corresponding to the pMNS described in the literature. Interestingly, the BOLD activity in these areas presents an opposite pattern depending on whether the subjects were involved in a learning task or not (cf. Fig. 3B). The BOLD signal increase following outcome presentation was generally larger for observation (LeO) than for execution (TE). However, during the action observation/execution task, the signal was larger for execution (EXE) than for observation (OBS). The lesser activation in OBS compared to EXE is a common finding in the pMNS literature and is likely to be related to the fact that only about 10% of premotor neurons respond to action observation in primates [25], [69]. Why LeO has a slightly larger signal than TE in these somatosensory-motor regions is difficult to infer from our data, and we can only speculate about the origin of this effect. One possibility is that the BOLD signal in these somatosensory-motor regions is enhanced in LeO (compared to TE) because in LeO (unlike in TE) the action was not executed by the participant during the SR phase, and the participants may thus have a stronger urge to mentally re-enact the observed action upon finding out whether it was to be associated with the stimulus or not. In the absence of overt execution, this additional mental re-enactment of another's action might be important to consolidate the stimulus-response link that needs to be established during our task.

Recent fMRI evidence from Gazzola and Keysers [23] and meta-analyses [24] showed that action observation and execution do not exclusively recruit the "classic" mirror areas (namely, the ventral premotor cortex and the inferior parietal lobule; see for example [68]), but also additional brain areas such as the dorsal premotor cortex and the superior parietal lobule, as well as the supplementary and cingulate areas. Our results are in line with these findings, and suggest that the processing of outcomes during LeO and TE recruits regions involved in action execution and observation. While it is thus not surprising that pMNS regions are activated while participants move their hand in the TE condition and see others move in the LeO condition (the very definition of pMNS), we demonstrate that these regions activate while processing outcomes during LeO and TE, when no action was perceived or performed. This suggests that motor representations are activated twice during arbitrary stimulus-response-outcome associative learning. In TE, they are activated once when the participant executes a candidate action, and once when the participant finds out whether the action was successful or not during the acquisition phase. The latter activation becomes weaker in the consolidation phase, suggesting that reactivation of motor programs serves learning. During LeO, the first activation during action observation would represent a vicarious sharing of the attempted action, and resemble that often described in action observation experiments [24]. The second, however, would again serve learning, co-opting the mechanisms of TE learning by feeding it with vicarious rather than first-hand motor activations. In other words, we suggest that not only imitation learning (i.e. learning novel motor patterns through observation), but also the learning of abstract visuomotor associations by observation, is partially supported by activation of the pMNS.

Neural Systems Selectively Recruited During Learning-by-observation

Our study also revealed brain areas that are specifically activated during LeO. Whereas no brain activation was found to differentiate the processing of first correct outcomes across TE and LeO, the processing of others' errors showed significant differences across learning conditions. Brain areas emerged as significantly more activated during incorrect outcome presentation in observational versus individual learning. The activated clusters were localised bilaterally in the middle cingulate cortex and posterior medial frontal cortex (pMFC), the anterior insula and the posterior superior temporal sulci (pSTS) (Fig. 4B, Table 2B). Both pMFC and the anterior insula are thought to be components of the error-monitoring network [70]. The pMFC is located in the dorsal anterior cingulate cortex, which has been suggested to be involved in individual learning from errors [71], [72]. Current research indicates that the pMFC plays a crucial role in error-monitoring and subsequent behavioural adjustment [73]. In particular, a performance-monitoring system in the pMFC seems to signal the need for adjustments when action outcomes call for adaptations [74]. In addition, recent data from electrophysiological recordings in the monkey suggest that neurons in the dorsomedial prefrontal cortex selectively respond to another's erroneous actions and that their activity is associated with a subsequent behavioural adjustment [75]. The anterior insular cortex is known to contribute to performance monitoring processes [70]. It has been proposed to be involved in autonomic responses to errors in non-social contexts [74] and to increase its activity with error awareness [76], [77]. This network has also been found to be active during error-detection in non-learning contexts [70], [73], [78] and its activity has not been found to differentiate others' errors from one's own errors [78]–[80] nor to depend on the experimental setting or social context [70]. 
Our results provide critical information about the role of the pMFC - anterior insula network in the processing of others' errors during LeO. Research to date has identified an association between the magnitude of error-related activity and subsequent learning performance [77], [81], [82]. We speculate that the selective activity in the pMFC - anterior insula network may represent a neural correlate of the cognitive biases that psychology and neuroeconomics have described as the predisposition in humans to process the errors of others differently from personal errors. Among these, the 'actor-observer' cognitive bias consists in the tendency to attribute others' failures to their personality, and one's own failures to the situation [83]. Additional neuroimaging and behavioural research is needed to explore the relative effectiveness of individual and observational learning from others' and one's own errors (cf. [11], [84]).

Our study showed that activity in the posterior superior temporal sulcus (pSTS) was also specifically associated with the processing of others' errors during learning-by-observation. Previous non-human primate connectivity data indicate that the STS is anatomically well situated to integrate information derived from both the ventral and dorsal visual pathways [85]–[87]. For this reason, several studies suggest that initial analysis of social cues occurs in the STS region, which is sensitive to stimuli that signal the actions of another individual. Particular attention was given to the posterior part of the STS, which has been characterized as the substrate of goal-driven action understanding [88] and social perception [89]. In general, current literature supports the idea that the perception of agency activates the pSTS [90] and that activity in pSTS may be part of a circuit associating observed actions with motor programs [91], [92]. In addition, the pSTS is thought to be involved in the attribution of mental states to other organisms [93]–[96] and the extraction of contextual and intentional cues from goal-directed behaviour [97]. Importantly, activity in pSTS has previously been found in humans during imitation of actions [98]. Our results show that this region is selectively activated during the processing of error signals early during observational learning. Therefore, our results are compatible with a role of pSTS in the processing of social cues, such as the outcomes of others' actions, a necessary step during early observational learning. In addition, the fact that the pSTS was more activated by the errors of others than by one's own errors could reflect more intensive mentalizing (what does the actor think now that he knows that this action didn't work?) or a reactivation of the visual representation of the observed action in order to reduce its association with the stimulus.

Conclusions

Our results suggest that the processing of others' outcomes during learning-by-observation shares a common brain network with trial-and-error learning. This network includes the dorsal fronto-parietal system, the associative fronto-striatal loop and the cerebellum. In addition, we showed that this shared network overlaps with the putative mirror neuron system, known to be involved during action observation and execution. This suggests that the pMNS, in addition to its role in acquiring new motor patterns during imitation learning, may mediate the vicarious learning of abstract visuomotor associations. Finally, we identified brain areas more activated for others' errors than for one's own errors during learning in the posterior medial frontal cortex (pMFC), the left anterior insula and the bilateral posterior superior temporal sulci (pSTS). We suggest that the pMFC and anterior insula, known to be crucial for error-detection, are involved in error monitoring during learning-by-observation. In parallel, the pSTS seems to provide information about social cues, such as the outcomes of others' actions, a necessary step during the early phases of learning-by-observation. Overall, our study contributes to a better understanding of brain regions involved in vicariously learning stimulus-action-outcome associations by showing that this process recruits the mechanisms of the pMNS and the trial-and-error learning machinery.

Supporting Information

Figure S1.

Brain networks commonly recruited during learning in both TE and LeO conditions. Positive effect of the acquisition phase (i.e. incorrect outcomes + 1st correct outcome), reflecting the common activations of TE and LeO during learning (t = 3.24, punc<0.001; all clusters also survive qFDR<0.05). Clusters of activation are superimposed onto the average T1 image derived from all participants.

https://doi.org/10.1371/journal.pone.0073879.s001

(TIFF)

Figure S2.

Brain networks commonly recruited during the acquisition phase of learning, action observation and execution. The localizer t-map for the pMNS was inclusively intersected with the positive effect of the acquisition phase (see Fig. S1), reflecting the common activations of TE and LeO during learning (t = 3.24, punc<0.001; all clusters also survive qFDR<0.05). Clusters of activation are superimposed onto the average T1 image derived from all participants.

https://doi.org/10.1371/journal.pone.0073879.s002

(TIFF)

Table S1.

Positive effect of the acquisition phase, reflecting the common activations of TE and LeO during learning (t = 3.24, punc<0.001; all clusters also survive qFDR <0.05).

https://doi.org/10.1371/journal.pone.0073879.s003

(DOC)

Table S2.

Localizer t-map for the pMNS (t = 3.24, punc <0.001).

https://doi.org/10.1371/journal.pone.0073879.s004

(DOC)

Table S3.

Intersection analysis between the localizer t-map for the pMNS and the positive effect of the acquisition phase, reflecting the common activations of TE and LeO during learning (t = 3.24, punc <0.001; all clusters also survive qFDR<0.05).

https://doi.org/10.1371/journal.pone.0073879.s005

(DOC)

Table S4.

Interaction of correctness (incorrect, 1st correct) by learning condition (TE, LeO) (F1,14 = 12.06, punc<0.001 but q = NS).

https://doi.org/10.1371/journal.pone.0073879.s006

(DOC)

Acknowledgments

We thank Marc Thioux for help with scanning.

Author Contributions

Conceived and designed the experiments: EM VG DB AB CK BW. Performed the experiments: EM VG CK. Analyzed the data: EM VG AB CK. Wrote the paper: EM VG AB CK BW.

References

  1. 1. Bandura A (1977) Social learning theory. Englewood Cliffs (NJ): Prentice-Hall.
  2. 2. Rendell L, Boyd R, Cownden D, Enquist M, Eriksson K, et al. (2010) Why copy others? Insights from the social learning strategies tournament. Science 328: 208–213.
  3. 3. Frith CD, Frith U (2010) Mechanisms of Social Cognition. Annu Rev Psychol 63: 287–313.
  4. 4. Galef BG, Laland KN (2005) Social learning in animals: empirical studies and theoretical models. Bio Sci 55: 489–499.
  5. 5. Heyes CM, Dawson GR (1990) A demonstration of observational learning in rats using a bidirectional control. Q J Exp Psychol B 42: 59–71.
  6. 6. Range F, Viranyi Z, Huber L (2007) Selective imitation in domestic dogs. Curr Biol 17: 868–872.
  7. 7. Biederman GB, Robertson HA, Vanayan M (1986) Observational learning of two visual discriminations by pigeons: a within-subjects design. J Exp Anal Behav 46: 45–9.
  8. 8. Subiaul F, Cantlon JF, Holloway RL, Terrace HS (2004) Cognitive imitation in rhesus macaques. Science 305: 407–410.
  9. 9. Meunier M, Monfardini E, Boussaoud D (2007) Learning by observation in rhesus monkeys. Neurobiol Learn Mem 88: 243–248.
  10. 10. Chang SW, Winecoff AA, Platt ML (2011) Vicarious reinforcement in rhesus macaques (macaca mulatta). Front Neurosci 5: 27.
  11. 11. Monfardini E, Gaveau V, Boussaoud D, Hadj-Bouziane F, Meunier M (2012) Social learning as a way to overcome choice-induced preferences? Insights from humans and rhesus macaques. Front Neurosci 6: 127.
  12. 12. O'Doherty J, Dayan P, Schultz J, Deichmann R, Friston K, et al. (2004) Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science 304: 452–454.
  13. 13. Pessiglione M, Seymour B, Flandin G, Dolan RJ, Frith CD (2006) Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature 442: 1042–1045.
  14. 14. Schultz W (1998) Predictive reward signal of dopamine neurons. J Neurophysiol 80: 1–27.
  15. 15. Schultz W, Dickinson A (2000) Neural coding of prediction errors. Annu Rev Neurosci 23: 473–500.
  16. 16. D'Ardenne K, McClure SM, Nystrom LE, Cohen JD (2008) BOLD responses reflecting dopaminergic signals in the human ventral tegmental area Science. 319: 1264–1267.
  17. 17. Haruno M, Kawato M (2006) Different neural correlates of reward expectation and reward expectation error in the putamen and caudate nucleus during stimulus– action–reward association learning. J Neurophysiol 95: 948–959.
  18. 18. Brovelli A, Laksiri N, Nazarian B, Meunier M, Boussaoud D (2008) Understanding the neural computations of arbitrary visuomotor learning through fMRI and associative learning theory. Cereb Cortex 18: 1485–1495.
  19. 19. Quilodran R, Rothé M, Procyk E (2008) Behavioral shifts and action valuation in the anterior cingulate cortex. Neuron 57: 314–25.
  20. 20. Burke CJ, Tobler PN, Baddeley M, Schultz W (2010) Neural mechanisms of observational learning. Proc Natl Acad Sci 107: 14431–6.
  21. 21. Suzuki S, Harasawa N, Ueno K, Gardner JL, Ichinohe N, et al. (2012) Learning to simulate others' decisions. Neuron 74: 1125–37.
  22. 22. Rizzolatti G, Craighero L (2004) The mirror-neuron system. Annu Rev Neurosci 27: 169–192.
  23. 23. Gazzola V, Keysers C (2009) The observation and execution of actions share motor and somatosensory voxels in all tested subjects: single-subject analyses of unsmoothed fMRI data. Cereb Cortex 19: 1239–55.
  24. 24. Caspers S, Zilles K, Laird AR, Eickhoff SB (2010) ALE meta-analysis of action observation and imitation in the human brain. Neuroimage 50: 1148–67.
  25. 25. Gallese V, Fadiga L, Fogassi L, Rizzolatti G (1996) Action recognition in the premotor cortex. Brain 119: 593–609.
  26. 26. Fogassi L, Ferrari PF, Gesierich B, Rozzi S, Chersi F, et al. (2005) Parietal lobe: from action organization to intention understanding. Science 308: 662–7.
  27. 27. Rizzolatti G, Sinigaglia C (2010) The functional role of the parieto-frontal mirror circuit: interpretations and misinterpretations. Nat Rev Neurosci 11: 264–274.
  28. 28. Brovelli A, Nazarian B, Meunier M, Boussaoud D (2011) Differential roles of caudate nucleus and putamen during instrumental learning. NeuroImage 57: 1580–90.
  29. 29. Friston KJ, Holmes AP, Price CJ, Büchel C, Worsley KJ (1999) Multisubject fMRI Studies and Conjunction Analysis. NeuroImage 10: 385–396.
  30. 30. Eickhoff SB, Stephan KE, Mohlberg H, Grefkes C, Fink GR, et al. (2005) A new SPM toolbox for combining probabilistic cytoarchitectonic maps and functional imaging data. NeuroImage 25: 1325–1335.
  31. 31. Lancaster JL, Woldorff MG, Parsons LM, Liotti M, Freitas CS, et al. (2000) Automated Talairach atlas labels for functional brain mapping. Hum Brain Mapp 10: 120–131.
  32. 32. Burnod Y, Baraduc P, Battaglia-Mayer A, Guigon E, Koechlin E, et al. (2001) The role of ventral and orbital prefrontal cortex in conditional visuomotor learning and strategy use in rhesus monkeys (Macaca mulatta). Behav Neurosci 115 : 971–82. Erratum in: Behav Neurosci (2001) 115: 1317.
  33. 33. Culham JC, Valyear KF (2006) Human parietal cortex in action. Curr Opin Neurobiol 16: 205–212.
  34. 34. Corbetta M, Shulman GL (2002) Control of goal-directed and stimulus-driven attention in the brain. Nat Rev Neurosci 3: 201–215.
  35. 35. Wise SP, Murray EA (2000) Arbitrary associations between antecedents and actions. Trends Neurosci 23: 271–276.
  36. 36. Suzuki W, Brown E (2005) Behavioral and neurophysiological analyses of dynamic learning processes. Behav Cogn Neurosci Rev 4: 67–95.
  37. 37. Deiber MP, Wise SP, Honda M, Catalan MJ, Grafman J, et al. (1997) Frontal and parietal networks for conditional motor learning: a positron emission tomography study. J Neurophysiol 78: 977–991.
  38. 38. Eliassen JC, Souza T, Sanes JN (2003) Experience-dependent activation patterns in human brain during visual-motor associative learning. J Neurosci 23: 10540–10547.
  39. 39. Law JR, Flanery MA, Wirth S, Yanike M, Smith AC, et al. (2005) Functional magnetic resonance imaging activity during the gradual acquisition and expression of paired-associate memory. J Neurosci 25: 5720–5729.
  40. 40. Yin HH, Knowlton BJ (2006) The role of the basal ganglia in habit formation. Nat Rev Neurosci 7: 464–476.
  41. 41. Balleine BW, Delgado MR, Hikosaka O (2007) The role of the dorsal striatum in reward and decision-making. J Neurosci 27: 8161–8165.
  42. 42. Graybiel AM (2008) Habits, rituals, and the evaluative brain. Annu Rev Neurosci 31: 359–387.
  43. 43. Yin HH, Ostlund SB, Balleine BW (2008) Reward-guided learning beyond dopamine in the nucleus accumbens: the integrative functions of cortico-basal ganglia networks. Eur J Neurosci 28: 1437–1448.
  44. 44. Packard MG (2009) Exhumed from thought: basal ganglia and response learning in the plus-maze. Behav Brain Res 199: 24–31.
  45. 45. White NM (2009) Some highlights of research on the effects of caudate nucleus lesions over the past 200 years. Behav Brain Res 199: 3–23.
  46. 46. Balleine BW, O'Doherty JP (2010) Human and rodent homologies in action control: corticostriatal determinants of goal-directed and habitual action. Neuropsychopharmacology 35: 48–69.
  47. 47. Ashby FG, Turner BO, Horvitz JC (2010) Cortical and basal ganglia contributions to habit learning and automaticity. Trends Cogn Sci 14: 208–215.
  48. 48. Boettiger CA, D'Esposito M (2005) Frontal networks for learning and executing arbitrary stimulus–response associations. J Neurosci 25: 2723–2732.
  49. 49. Seger CA, Cincotta CM (2005) The roles of the caudate nucleus in human classification learning. J Neurosci 25: 2941–2951.
  50. 50. Delgado MR, Miller MM, Inati S, Phelps EA (2005) An fMRI study of reward-related probability learning. NeuroImage 24: 862–873.
  51. 51. Galvan A, Hare TA, Davidson M, Spicer J, Glover G, et al. (2005) The role of ventral frontostriatal circuitry in reward-based learning in humans. J Neurosci 25: 8650–8656.
  52. 52. Grol MJ, de Lange FP, Verstraten FA, Passingham RE, Toni I (2006) Cerebral changes during performance of overlearned arbitrary visuomotor associations. J Neurosci 26: 117–125.
  53. 53. Toni I, Passingham RE (1999) Prefrontal-basal ganglia pathways are involved in the learning of arbitrary visuomotor associations: a PET study. Exp Brain Res 127: 19–32.
  54. 54. Toni I, Ramnani N, Josephs O, Ashburner J, Passingham RE (2001) Learning arbitrary visuomotor associations: temporal dynamic of brain activity. NeuroImage 14: 1048–1057.
  55. 55. Tricomi EM, Delgado MR, Fiez JA (2004) Modulation of caudate activity by action contingency. Neuron 41: 281–292.
  56. 56. Monfardini E, Brovelli A, Boussaoud D, Takerkart S, Wicker B (2008) I learned from what you did: Retrieving visuomotor associations learned by observation. NeuroImage 42: 1207–13.
  57. 57. Gomez-Beldarrain M, Garcia-Monco JC, Rubio B, Pascual-Leone A (1998) Effect of focal cerebellar lesion on procedural learning in the serial reaction time task. Exp Brain Res 120: 25–30.
  58. 58. Appollonio IM, Grafman J, Schwartz V, Massaquoi S, Hallett M (1993) Memory in patients with cerebellar degeneration. Neurology 43: 1536–1544.
  59. 59. Pascual-Leone A, Grafman J, Clark K, Stewart M, Massaquoi S, et al. (1993) Procedural learning in Parkinson’s disease and cerebellar degeneration. Ann Neurol 34: 594–602.
  60. 60. Grafman J, Litvan I, Massaquoi S, Stewart M, Sirigu A, et al. (1992) Cognitive planning deficit in patients with cerebellar atrophy. Neurology 42: 1493–1496.
  61. 61. Torriero S, Oliveri M, Koch G, Caltagirone C, Petrosini L (2007) The what and how of observational learning. J Cogn Neurosci 19: 1656–1663.
  62. 62. Bussey TJ, Wise SP, Murray EA (2001) The role of ventral and orbital prefrontal cortex in conditional visuomotor learning and strategy use in rhesus monkeys (Macaca mulatta). Behav Neurosci 115: 971–82. Erratum in: Behav Neurosci (2001) 115: 1317.
  63. 63. Genovesio A, Brasted PJ, Mitz AR, Wise SP (2005) Prefrontal cortex activity related to abstract response strategies. Neuron 47: 307–20.
  64. 64. Vogt S, Buccino G, Wohlschläger AM, Canessa N, Shah NJ, et al. (2007) Prefrontal involvement in imitation learning of hand actions: effects of practice and expertise. NeuroImage 37: 1371–1383.
  65. 65. Buccino G, Vogt S, Ritzl A, Fink GR, Zilles K, et al. (2004) Neural circuits underlying imitation learning of hand actions: an event-related fMRI study. Neuron 42: 323–34.
  66. 66. Frey SH, Gerry VE (2006) Modulation of neural activity during observational learning of actions and their sequential orders. J Neurosci 26: 13194–13201.
  67. 67. Cross ES, Kraemer DJ, Hamilton AF, Kelley WM, Grafton ST (2009) Sensitivity of the action observation network to physical and observational learning. Cereb Cortex 19: 315–326.
  68. 68. Fabbri-Destro M, Rizzolatti G (2008) Mirror neurons and mirror systems in monkeys and humans. Physiology (Bethesda) 23: 171–179.
  69. 69. Keysers C, Kohler E, Umiltà MA, Nanetti L, Fogassi L, et al. (2003) Audiovisual mirror neurons and action recognition. Exp Brain Res 153: 628–36.
  70. 70. Radke S, de Lange FP, Ullsperger M, de Bruijn ER (2011) Mistakes that affect others: an fMRI study on processing of own errors in a social context. Exp Brain Res 211: 405–413.
  71. 71. Mars RB, Coles MG, Grol MJ, Holroyd CB, Nieuwenhuis S, et al. (2005) Neural dynamics of error processing in medial frontal cortex. NeuroImage 28: 1007–1013.
  72. 72. Holroyd CB, Coles MG (2002) The neural basis of human error processing: reinforcement learning, dopamine, and the error-related negativity. Psychol Rev 109: 679–709.
  73. 73. Ridderinkhof KR, Ullsperger M, Crone EA, Nieuwenhuis S (2004) The role of the medial frontal cortex in cognitive control. Science 306: 443–447.
  74. 74. Ullsperger M, von Cramon DY (2003) Error monitoring using external feedback: specific roles of the habenular complex, the reward system, and the cingulate motor area revealed by functional magnetic resonance imaging. J Neurosci 23: 4308–14.
  75. 75. Yoshida K, Saito N, Iriki A, Isoda M (2012) Social error monitoring in macaque frontal cortex. Nat Neurosci 15: 1307–12.
  76. 76. Ullsperger M, Harsay HA, Wessel JR, Ridderinkhof KR (2010) Conscious perception of errors and its relation to the anterior insula. Brain Struct Funct 214: 629–43.
  77. 77. Klein TA, Neumann J, Reuter M, Hennig J, von Cramon DY, et al. (2007) Genetically determined differences in learning from errors. Science 318: 1642–5.
  78. 78. de Bruijn ER, de Lange FP, von Cramon DY, Ullsperger M (2009) When errors are rewarding. J Neurosci 29: 12183–12186.
  79. 79. van Schie HT, Mars RB, Coles MG, Bekkering H (2004) Modulation of activity in medial frontal and motor cortices during error observation. Nat Neurosci 7: 549–54.
  80. 80. Shane MS, Stevens M, Harenski CL, Kiehl KA (2008) Neural correlates of the processing of another's mistakes: a possible underpinning for social and observational learning. NeuroImage 42: 450–459.
  81. 81. van der Helden J, Boksem MA, Blom JH (2010) The importance of failure: feedback-related negativity predicts motor learning efficiency. Cereb Cortex 20: 1596–603.
  82. 82. Hester R, Barre N, Murphy K, Silk TJ, Mattingley JB (2008) Human medial frontal cortex activity predicts learning from errors. Cereb Cortex 18: 1933–1940.
  83. 83. Jones EE, Nisbett RE (1971) The actor and the observer: divergent perceptions of the causes of behavior. New York: General Learning Press.
  84. 84. Nicolle A, Symmonds M, Dolan RJ (2011) Optimistic biases in observational learning of value. Cognition 119: 394–402.
  85. 85. Pandya DN, Yeterian EH (1985) Architecture and connections of cortical association areas. In: Peters A, Jones EG, editors. Cerebral Cortex vol 4. New York: Plenum. pp. 3–61.
  86. 86. Boussaoud D, Ungerleider LG, Desimone R (1990) Pathways for motion analysis: cortical connections of the medial superior temporal and fundus of the superior temporal visual areas in the macaque. J Comp Neurol 296: 462–95.
  87. 87. Baizer JS, Ungerleider LG, Desimone R (1991) Organization of visual inputs to the inferior temporal and posterior parietal cortex in macaques. J Neurosci 11: 168–190.
  88. 88. Saxe R, Xiao DK, Kovacs G, Perrett DI, Kanwisher N (2004) A region of right posterior superior temporal sulcus responds to observed intentional actions. Neuropsychologia 42: 1435–1446.
  89. 89. Allison T, Puce A, McCarthy G (2000) Social perception from visual cues: role of the STS region. Trends Cogn Sci 4: 267–278.
  90. 90. Tankersley D, Stowe CJ, Huettel SA (2007) Altruism is associated with an increased neural response to agency. Nat Neurosci 10: 150–151.
  91. 91. Keysers C, Gazzola V (2006) Towards a unifying neural theory of social cognition. Prog Brain Res 156: 379–401.
  92. 92. Rilling JK, Sanfey AG, Aronson JA, Nystrom LE, Cohen JD (2004) The neural correlates of theory of mind within interpersonal interactions. NeuroImage 22: 1694–1703.
  93. 93. Frith CD, Frith U (1999) Interacting minds–a biological basis. Science 286: 1692–1695.
  94. 94. Frith U, Frith CD (2003) Development and neurophysiology of mentalizing. Philos Trans R Soc Lond B Biol Sci 358: 459–73.
  95. 95. Samson D, Apperly IA, Chiavarino C, Humphreys GW (2004) The left temporoparietal junction is necessary for representing someone else’s belief. Nat Neurosci 7: 499–500.
  96. 96. Saxe R, Kanwisher N (2003) People thinking about thinking people. The role of the temporo-parietal junction in “theory of mind”. NeuroImage 19: 1835–1842.
  97. 97. Toni I, Thoenissen D, Zilles K (2001) Movement preparation and motor intention. NeuroImage 14: 110–7.
  98. 98. Iacoboni M, Koski LM, Brass M, Bekkering H, Woods RP, et al. (2001) Reafferent copies of imitated actions in the right superior temporal cortex. Proc Natl Acad Sci U S A 98: 13995–9.