Fig 1.
Adaptive decision-making in a detection-theory framework.
a. Illustration of SDT for a two-stimulus, two-response conditional discrimination task. On any given trial, presentation of a stimulus is tantamount to drawing a random sample x from one of the two distributions (S1 and S2), giving rise to a specific value of a decision variable. The distance between the means of the two distributions (expressed in units of standard deviations) is called “sensitivity” or d’ (gray). The subject decides which response to emit (R1 or R2) by comparing x to a criterion c (vertical dotted line): R1 if c < x, R2 if c > x. In classical SDT, the value of the criterion is fixed, whereas the value of x changes from trial to trial. In this example, d’ = 2 and c = 0.25. If S1 and S2 are presented equally often, expected accuracy (fraction of correct trials) is 0.83 across both stimuli (0.89 in S1 trials and 0.77 in S2 trials). b. Definitions of three mechanistic models of adaptive criterion setting: 1) Integrate Rewards (IR, blue), 2) Integrate Reward Omissions (IRO, purple), 3) Integrate Rewards and Reward Omissions (IR&RO, yellow). In all models, the criterion in the next trial t + 1 equals the criterion in the present trial t multiplied by the leak factor γ, incremented or decremented by δ depending on whether the response was rewarded. c. Illustration of the criterion-updating mechanisms of the three models in a sequence of four consecutive trials. Following stimulus presentation, the subject compares the current value of the decision variable x generated by the stimulus with the current value of the criterion c, emits the selected response, receives a reward or not, and then shifts the criterion in accordance with the update rule of the specific model. The IR model shifts the criterion after each reward, the IRO model after each reward omission, and the IR&RO model after both rewards and reward omissions.
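The update rules described in panels b and c can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The decision rule (R1 if x > c) follows the caption. The sign convention is an assumption: a reward shifts the criterion so that the rewarded response becomes more likely, and a reward omission shifts it the other way. The function names and the default values for γ and δ (borrowed from the example in Fig 2) are chosen for illustration only.

```python
def choose(x, c):
    """Decision rule from the caption: R1 if x > c, otherwise R2."""
    return 1 if x > c else 2


def update_criterion(c, response, rewarded, model, gamma=0.99, delta=0.04):
    """Return the criterion for trial t + 1 given the outcome of trial t.

    model is one of "IR", "IRO", "IR&RO".
    Assumed sign convention (not stated in the caption): because R1 is
    chosen when x > c, lowering c makes R1 more likely.
    """
    # Direction of the shift that makes the emitted response more likely.
    toward = -1.0 if response == 1 else 1.0
    step = 0.0
    if rewarded and model in ("IR", "IR&RO"):
        step = toward * delta       # reward: favor the rewarded response
    elif not rewarded and model in ("IRO", "IR&RO"):
        step = -toward * delta      # omission: disfavor the emitted response
    # Leak: the previous criterion is multiplied by gamma on every trial.
    return gamma * c + step


# Example: a rewarded R1 response under each model, starting at c = 0.25.
for name in ("IR", "IRO", "IR&RO"):
    print(name, update_criterion(0.25, response=1, rewarded=True, model=name))
```

Under these assumptions, the IRO model only applies the leak after a rewarded trial, whereas IR and IR&RO additionally shift the criterion by δ.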
Fig 2.
a. Principle of condition design. In each condition, 3-4 out of a set of 5 stimuli were chosen (here, 3). Each stimulus (S) i was assigned to a response category (C) j and had its own presentation probability P(Si) and its own reward probability P(Rew|Si) (rewards were given only when the response was correct). This example refers to the construction of condition “Lean L”. Solid lines in the left and middle bottom panels represent stimulus distributions (as in Fig 1A); bold dashed lines in the bottom right panel represent ‘decision distributions’, i.e., the distributions for each of the two categories (C) j ∈ {1;2} scaled by presentation and reward probability, e.g., for category 1, p(x|C1)*P(C1)*P(Rew|C1,R1). Note that the x-value at the intersection of the two decision distributions equals the optimal (reward-maximizing) criterion (see Methods, section “Criterion setting according to optimal account”). b. Steady-state criterion predictions of the three criterion-setting models and the reward-maximizing account for all experimental conditions. Bold dashed lines in each panel represent the decision distributions. Solid vertical lines denote the steady-state criterion predictions of the three models and the reward-maximizing account. The parameters used for this example are γ = 0.99, δ = υ = 0.04. See Table 1 and S1 Fig for more details on each condition.
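The intersection of the two decision distributions can be located numerically. The sketch below is illustrative only. It assumes unit-variance Gaussian stimulus distributions, and it assumes that a category's decision distribution is the sum of its stimuli's densities, each weighted by presentation probability times reward probability. The stimulus values in the example are hypothetical and are not taken from Table 1. With more than one stimulus per category, the curves may cross more than once, so the search bracket has to be chosen with care.

```python
import math


def normal_pdf(x, mean, sd=1.0):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def decision_density(x, stimuli):
    """Weighted density of one category.

    stimuli: list of (mean, P(Si), P(Rew|Si)) tuples (hypothetical layout).
    """
    return sum(p * r * normal_pdf(x, m) for m, p, r in stimuli)


def intersection(cat1, cat2, lo, hi, tol=1e-9):
    """Bisection search for the x at which the two decision densities cross."""
    def diff(x):
        return decision_density(x, cat1) - decision_density(x, cat2)

    f_lo, f_hi = diff(lo), diff(hi)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change in [lo, hi]; choose another bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = diff(mid)
        if f_lo * f_mid <= 0:
            hi = mid                 # crossing lies in the lower half
        else:
            lo, f_lo = mid, f_mid    # crossing lies in the upper half
    return 0.5 * (lo + hi)


# Hypothetical two-stimulus condition: equal presentation probability,
# but category 1 is rewarded twice as often as category 2.
cat1 = [(-1.0, 0.5, 1.0)]
cat2 = [(1.0, 0.5, 0.5)]
c_opt = intersection(cat1, cat2, lo=-1.0, hi=1.0)
print(c_opt)
```

For this two-Gaussian case the result can be checked against the closed form (μ1 + μ2)/2 + ln(w1/w2)/(μ2 − μ1), where w1 and w2 are the products of presentation and reward probability.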
Table 1.
Comprehensive overview of the experimental conditions. Means: stimulus means on the decision axis used in Fig 2 and S1 Fig. P(Si): presentation probability for stimulus i. P(Rew|Si,Corr): probability of reward in trials in which stimulus i from category (C) j with j ∈ {1;2} was presented and a correct response was emitted.
Table 2.
Stimulus center frequencies (in Hz) of the chords used for each rat.
Table 3.
Grayscale values of the visual stimuli used for each pigeon (monitor grayscale values of 140 and 220 correspond to illuminances of 35 and 76 lux, respectively).
Fig 3.
Auditory single-interval forced-choice (SIFC) task and behavioral results.
a. Schematic drawing of the operant chamber with three conical nose ports and five representative sound waveforms used as stimuli (S1 through S5). b. Schematic outline of the task epochs and possible outcomes. Each rectangle represents the wall with the three nose ports (circles); filled circles represent the ports that the subject visits in each epoch. c. Response bias (fraction R2, i.e., leftward responses, black) and reward density (fraction of rewarded trials, green) across all experimental conditions for all four subjects. Each panel shows results for one subject; data points represent individual sessions. Conditions are denoted by their respective initials: CL & CR for Confuse Left and Confuse Right, LL & LR for Lean Left and Lean Right, and RL & RR for Rich Left and Rich Right, respectively. The gray shaded areas highlight baseline sessions. d. Development of hit rate (HR, blue) for S1 and false alarm rate (FA, red) for S5 over the course of behavioral testing. Each thin line represents data from a single subject; thick lines represent the means over subjects. e. Steady-state criteria observed in the experimental conditions relative to criteria observed in the initial baseline sessions. Points represent the mean steady-state criterion values from the last 3 sessions of each condition for each animal; crosses represent means over animals. Observed session-by-session criteria were calculated using the one-criterion-per-session model (OCPS; see Methods for details).
Fig 4.
Correlation of predicted and observed criterion locations in the steady states of the experimental conditions.
a. Predicted vs. experimentally observed criterion locations for the reward maximization (“optimal”) account, as well as the IR, IRO, and IR&RO models. Each data point represents a pair of predicted and observed mean criteria for a specific animal in a specific condition. Observed criterion locations were computed using the OCPS model (see Methods), whereas predicted criteria were obtained by solving the steady-state criterion equations (see Methods) through numerical optimization, using the fitted parameters for each rat. If predictions were perfect, data points would fall along the main diagonal (dashed line). Conditions are color-coded. r², r, and p-values of the correlations are given for each model. b. Criterion shift from the onset (average of the first three sessions, dark green) to the steady state of each experimental condition (average of the last three sessions, light green), relative to the criterion that would maximize reward in the respective condition. Crosses represent individual animals; points represent means over all four animals. All criteria were normalized to baseline criterion values prior to plotting.
Fig 5.
Trial-by-trial fits of the IR, IRO and IR&RO models to the experimental data.
a. The fraction of leftward responses in each session, P(R2), is plotted for each individual animal across the different experimental conditions, as in Fig 3C. Orange lines are model fits. Blue lines are averages over 1000 simulations; blue shaded areas represent ±1 SD. b. Distributions of fitted parameters of the different models. Each data point pertains to an individual subject. c. Regression weights for rewards and reward omissions. The regression weights indicate the influence of both types of outcomes (blue for rewards, red for reward omissions) in trials t-1, t-2, t-3, and t-4. Dashed lines represent individual rats; thick lines represent means across the four subjects. See Methods for details of the GLM fit.
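To make the lagged-outcome regression in panel c concrete, the sketch below builds one possible design matrix for trials t-1 through t-4. The actual GLM specification is given in the Methods and may differ. The coding used here is an assumption: an outcome following R2 is coded +1, an outcome following R1 is coded -1, and 0 marks the absence of that outcome type. The function name is hypothetical.

```python
def lagged_outcome_regressors(responses, rewards, n_lags=4):
    """Build regressors for a choice GLM with lagged outcomes.

    responses: list of 1 or 2 (emitted response per trial)
    rewards:   list of bool (whether the trial was rewarded)
    Returns (rows, targets): each row holds n_lags reward regressors followed
    by n_lags omission regressors; each target is 1 if R2 was emitted.
    """
    rows, targets = [], []
    for t in range(n_lags, len(responses)):
        reward_cols, omission_cols = [], []
        for k in range(1, n_lags + 1):
            side = 1 if responses[t - k] == 2 else -1   # assumed sign coding
            reward_cols.append(side if rewards[t - k] else 0)
            omission_cols.append(0 if rewards[t - k] else side)
        rows.append(reward_cols + omission_cols)
        targets.append(1 if responses[t] == 2 else 0)
    return rows, targets


# Toy sequence of six trials (hypothetical data).
rows, targets = lagged_outcome_regressors(
    responses=[2, 1, 2, 2, 1, 2],
    rewards=[True, False, True, False, True, True],
)
print(rows, targets)
```

The resulting rows and targets could then be passed to any logistic regression routine; the fitted weights for the first four columns would correspond to rewards and the last four to reward omissions.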
Fig 6.
Session-by-session and steady-state performance of the IR, IR-SLR and IR-SLR-RD models.
a. Fit and simulation results for the two new model versions (IR-SLR and IR-SLR-RD; IR is replotted for comparison) for an example animal (rat 5). Format as in Fig 5A. IR-SLR differs from the original IR model only in the number of learning rates δ (IR: 1, IR-SLR: 5). IR-SLR-RD additionally differs from both other models in that the criterion does not decay towards 0 on unrewarded trials, i.e., when RewR1 = RewR2 = 0. b. Same as in a, but plotting session-by-session criteria. c. Stimulus-specific learning rates returned by the IR-SLR model as a function of the fitted stimulus means. For comparison, all values were normalized to the overall highest value. d. Model comparison using the Bayesian Information Criterion (BIC). This panel shows relative values, i.e., the BIC of the IR-SLR-RD (“full”) model was subtracted from that of the other models, so that positive values indicate worse fits than the full model. e. Same as in Fig 4A, but comparing the steady-state criterion prediction performance of models IR-SLR and IR-SLR-RD with that of the basic IR model. In the IR-SLR-RD plot, the r, r², and p-values exclude an outlier (highlighted by a black arrow); including the outlier, the resulting values are r = 0.71 and r² = 0.51.
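The relative BIC values in panel d can be reproduced from log-likelihoods as sketched below, using the standard definition BIC = k ln(n) - 2 ln(L), with k free parameters and n observations (here, trials). The log-likelihoods, parameter counts, and trial count in the example are hypothetical placeholders, not values from the paper.

```python
import math


def bic(log_likelihood, n_params, n_trials):
    """Bayesian Information Criterion; lower values indicate a better fit."""
    return n_params * math.log(n_trials) - 2.0 * log_likelihood


def relative_bic(bic_by_model, reference="IR-SLR-RD"):
    """Subtract the reference model's BIC from every model's BIC.

    Positive values indicate a worse fit than the reference model.
    """
    ref = bic_by_model[reference]
    return {name: value - ref for name, value in bic_by_model.items()}


# Hypothetical fits: (log-likelihood, number of free parameters).
fits = {"IR": (-5200.0, 2), "IR-SLR": (-5100.0, 6), "IR-SLR-RD": (-5050.0, 6)}
n_trials = 10000  # hypothetical number of trials
bics = {name: bic(ll, k, n_trials) for name, (ll, k) in fits.items()}
print(relative_bic(bics))
```

Because the BIC penalizes each additional parameter by ln(n), a model with more learning rates is only preferred if its likelihood gain outweighs that penalty.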
Fig 7.
Experimental setup and results from a second dataset from pigeons performing a visual task.
a. Pigeons were tested in operant chambers. One side of the chamber featured a touch screen on which three horizontally aligned rectangular areas were designated as “pecking keys”. Visual stimuli were displayed on these keys, and pecks within these areas were counted as responses. In each trial, one of five possible discriminative stimuli (shades of gray, numbered S1 through S5) was presented on the center key. b. Timeline of an example trial. Each trial started with orange illumination of the center key. Following a single peck at this key, the discriminative stimulus was presented for one second. Thereafter, the center key again turned orange, and the pigeon had to emit a single peck at the key to turn it off and illuminate the two side (choice) keys. Pecking at a choice key was followed either by food tray illumination and food delivery, or by a negative feedback sound and the house light being turned off. See Methods for further details. c. Criterion shifts from condition onsets (dark green) to steady states (light green) relative to the reward-maximizing criterion for all subjects and experimental conditions. Format as in Fig 4B. d. Comparison of the initial criterion-setting models (IR, IRO, IR&RO, and the optimal account) in their ability to predict steady-state criteria for all subjects and conditions. e. Fits and simulation results of the IR, IR-SLR, and IR-SLR-RD models to response data from an example pigeon (subject 897). Format as in Fig 5A. f. Model parameters returned by each of the initially considered models. g. Model comparison of all models featuring reward learning. h. Steady-state criterion prediction performance of the IR-SLR and IR-SLR-RD models compared with that of the IR model in the pigeon dataset.