Removal of reinforcement improves instrumental performance in humans by decreasing a general action bias rather than unmasking learnt associations

doi:10.1371/journal.pcbi.1010201

Fig 1.

Task structure and participants’ behaviour.

(A) Schematic of the go/no-go learning task. On each trial, a fixation cross was presented for 1000–1600 ms. Then, participants were presented with one stimulus for 500 ms and had 1000 ms to decide whether to perform a go (button press) or no-go (no button press) response. Blocks of reinforced trials alternated with probe blocks (illustrated in the timeline). On reinforced trials (cyan), a go response resulted in reward or punishment (monetary win or loss, indicated by a smiley or frowny, respectively), depending on whether the stimulus was a go or no-go stimulus. No-go responses resulted in no feedback, and in neither reward nor punishment. A progress bar at the bottom of the screen displayed cumulative reward (rewards increased the bar, punishments shrank it). On probe block trials (purple), participants were required to respond as during reinforced blocks, but no feedback following responses was provided. (B) Sensitivity index d’, separately for reinforced (cyan) and probe trials (purple). (C) Time course of go-response probabilities, P(Go), for go trials (green) and no-go trials (red). Darker shades of green and red indicate probe trials. Solid lines in B and C represent mean, shaded areas SEM across participants.


Fig 2.

Behavioural results, expressed as difference between probe trials and preceding reinforced trials.

Results are shown both for the mean across all five probe blocks (left) and separately for each probe block. Points reflect individual participants’ behaviour. (A) The sensitivity index d’ increased in probe compared to reinforced trials. (B) The negative bias criterion c decreased on probe blocks, indicating a reduced propensity to act on probe trials. (C), (D) Both hit rate (HR, C) and false alarm rate (FAR, D) decreased on probe blocks, but the decrease in FAR was more pronounced than the decrease in HR, which led to the increase in d’ represented in (A).
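As a rough illustration of how the indices in panels A and B relate to the rates in panels C and D, the sketch below computes d’ and c from trial counts using the standard signal detection formulas, d’ = z(HR) − z(FAR) and c = −½[z(HR) + z(FAR)]. The function name `sdt_indices`, the log-linear correction for extreme rates, and the trial counts are all assumptions made for illustration; the caption does not state which correction, if any, was applied. The "negative bias criterion" is read here as −c, which is also an assumption.

```python
from statistics import NormalDist


def sdt_indices(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, c) from go/no-go trial counts.

    Hits and misses are counted on go trials; false alarms and
    correct rejections are counted on no-go trials.
    """
    # Log-linear correction keeps the rates away from 0 and 1 so the
    # z-transform stays finite. This correction is an assumption, not
    # necessarily the one used in the original analysis.
    hr = (hits + 0.5) / (hits + misses + 1.0)
    far = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)

    z = NormalDist().inv_cdf  # inverse standard normal CDF
    d_prime = z(hr) - z(far)
    c = -0.5 * (z(hr) + z(far))
    return d_prime, c


# Hypothetical counts (not data from the study): HR drops slightly on
# probe trials, FAR drops more strongly.
reinforced = sdt_indices(hits=45, misses=5, false_alarms=25, correct_rejections=25)
probe = sdt_indices(hits=43, misses=7, false_alarms=15, correct_rejections=35)

print("d' reinforced vs probe:", reinforced[0], probe[0])
print("-c reinforced vs probe:", -reinforced[1], -probe[1])
```

With these made-up counts, d’ is higher and −c is lower on probe trials, which is the qualitative pattern the caption describes: a small drop in HR combined with a larger drop in FAR.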


Fig 3.

Computational modelling results.

(A) Comparison of the Bayesian information criterion (BIC) relative to the baseline model. Negative BIC differences indicate a decrease in BIC relative to the baseline model and hence better fit. Conversely, a positive BIC difference indicates worse fit. The bias model provided the best fit. (B) The bias model contained two separate bias parameters, b_R and b_P, for reinforced and probe blocks, respectively. The bias is reduced on probe compared to reinforced trials. (C) Initial estimates Q₀ of option values. On average, estimates were initialized with positive values. (D) Softmax choice probabilities to select an option as a function of its value. The sigmoids for reinforced and probe trials were generated using the mean fitted parameters. This figure illustrates how a reduction in response bias together with a positive value initialization resulted in the increase in d’ observed in behaviour. Solid vertical grey line indicates average Q₀. As values of go stimuli were acquired (shifting rightwards from the vertical line), the difference in action probabilities between probe and reinforced trials became smaller (green arrow). Conversely, as values of no-go stimuli were acquired (shifting leftwards from the vertical line), the difference became more pronounced (red arrow), thus leading to a stronger reduction in false alarm rates. (E) Time course of simulated go-response probabilities. The probability P(Go) for go trials (green) and no-go trials (red) was simulated based on the bias model. Darker shades of green and red indicate probe trials. Solid lines represent mean, shaded areas SEM across simulations.
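The mechanism described for panel D can be sketched numerically. The snippet below assumes a logistic (two-option softmax) choice rule of the form P(Go) = 1 / (1 + exp(−(β·Q + b))), with a larger bias b on reinforced trials than on probe trials. The exact parameterization of the fitted model is not given in the caption, so this form, the function name `p_go`, and all parameter values are hypothetical and chosen only to show the qualitative effect.

```python
import math


def p_go(q, bias, beta):
    """Probability of a go response under a logistic choice rule with additive bias."""
    return 1.0 / (1.0 + math.exp(-(beta * q + bias)))


# Hypothetical parameters, not the fitted values from the study.
BETA = 3.0     # inverse temperature (assumed)
B_R = 1.5      # bias on reinforced trials (assumed)
B_P = 0.5      # reduced bias on probe trials (assumed)
Q_GO = 0.9     # learnt value of a go stimulus, above a positive initial value
Q_NOGO = -0.3  # learnt value of a no-go stimulus, below the initial value

# Drop in P(Go) when moving from reinforced to probe trials.
drop_go = p_go(Q_GO, B_R, BETA) - p_go(Q_GO, B_P, BETA)
drop_nogo = p_go(Q_NOGO, B_R, BETA) - p_go(Q_NOGO, B_P, BETA)

print("decrease in hit rate:", drop_go)
print("decrease in false alarm rate:", drop_nogo)
```

Under these assumed values, the go stimulus sits on the saturated upper part of the sigmoid, so lowering the bias barely changes P(Go), whereas the no-go stimulus sits on the steep part and P(Go) falls substantially. This asymmetry is what would produce a larger reduction in false alarms than in hits, and therefore a higher d’.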
