Adolescent and adult mice use both incremental reinforcement learning and short term memory when learning concurrent stimulus-action associations

doi:10.1371/journal.pcbi.1012667

Fig 1.

Behavioral schematic of RL+WM task.

A) In the operant chamber, mice initiate a trial via nosepoke to the center port, where they receive a puff of a single odorant. “GO” lights in the two peripheral ports indicate the availability of water, which mice receive only if they make a correct choice. B) In each session of the full RL+WM task (following early learning), mice must reach a fraction correct of 70% or higher on odors A & B within the first 100 to 200 trials. If mice pass this readiness criterion, they are then exposed to novel odors (see S1 Table for a full list of odors used) in set sizes of either 2 or 4 odors, presented individually in pseudorandom order. If mice fail the readiness criterion, they are retrained on odors A & B during that session.
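The readiness rule and odor presentation in panel B can be sketched in a few lines of Python. This is a minimal illustration under assumptions the caption does not state: "70% fraction correct within the first 100 to 200 trials" is read as the cumulative fraction correct on odors A & B reaching 0.70 at any trial between trial 100 and trial 200, and "pseudorandom order" is modeled as a balanced shuffle in which each odor appears equally often. The names `passes_readiness` and `pseudorandom_sequence` are hypothetical.

```python
import random
from collections import Counter


def passes_readiness(outcomes, threshold=0.70, min_trials=100, max_trials=200):
    """Hypothetical readiness check on odor A & B trials.

    `outcomes` lists 1 (correct) or 0 (incorrect) in trial order. Returns True
    if the cumulative fraction correct reaches `threshold` at any trial
    between `min_trials` and `max_trials` (an assumed reading of the criterion).
    """
    n_correct = 0
    for n, outcome in enumerate(outcomes[:max_trials], start=1):
        n_correct += outcome
        if n >= min_trials and n_correct / n >= threshold:
            return True
    return False


def pseudorandom_sequence(odors, trials_per_odor, seed=None):
    """Hypothetical balanced shuffle: each odor appears `trials_per_odor` times."""
    rng = random.Random(seed)
    sequence = [odor for odor in odors for _ in range(trials_per_odor)]
    rng.shuffle(sequence)
    return sequence


# Usage: a mouse at 75% correct on A & B passes, then a set size = 4 block is drawn.
print(passes_readiness([1, 1, 1, 0] * 50))  # cumulative 0.75 by trial 100
seq = pseudorandom_sequence(["C", "D", "E", "F"], trials_per_odor=5, seed=1)
print(Counter(seq))
```

A balanced shuffle still produces back-to-back repeats of the same odor, and repeats are more common with 2 odors than with 4, which is relevant to the repeat-trial analyses in Fig 2.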

Fig 2.

Male and female mice learn the RL+WM behavioral task similarly across set sizes and development.

Within the first 200 trials per odor in the novel odor phase, both males (A) and females (D) performed (fraction correct) significantly above chance (dotted black line at 50%; dotted green line at 70%) in both set size = 2 and set size = 4. Trials are binned as trials per stimulus to allow direct comparison across set sizes. Males showed a small but significant difference in performance between set sizes that grew more pronounced near the end of each session (set size: p = 0.04; set size × time interaction: p = 0.01). Female mice showed a significant interaction between set size and time that favored set size = 4 (set size × time interaction: p = 0.006). (B, E) Males and females both performed significantly better when an odor stimulus was repeated on two consecutive trials than on non-repeat trials (males: p < 0.0001 in both set sizes; females: p = 0.01 for set size = 2 and p < 0.001 for set size = 4), so this effect was present in both set sizes for both sexes. However, in male mice, performance on repeat trials was higher in set size = 2 than in set size = 4 (possibly driven by the higher proportion of repeat trials when mice are exposed to 2 odors rather than 4), whereas this relationship was reversed on non-repeat trials (p < 0.0001). (C, F) Mean fraction correct in males and females did not change across sessions spanning development from P30 to P90, a period that includes both adolescent development and early young adulthood in mice. Lines connect up to three sessions per mouse and show performance for the first 200 trials per odor in both set size = 2 and set size = 4 for all sessions analyzed; dotted lines reflect 95% confidence intervals. Error bars indicate SEM. Asterisks indicate the results of a repeated-measures (RM) two-way ANOVA (A, D) and, in (B, E), either a paired t-test for within-session comparisons (repeat vs. non-repeat in the same set size) or an unpaired t-test and Wilcoxon signed rank test.
*** p < 0.001, ** p < 0.01, * p < 0.05.
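The repeat versus non-repeat split and the trials-per-stimulus binning used in this figure can be made concrete with a short Python sketch. It assumes a session is stored as two parallel lists (stimulus identity and 1/0 correctness per trial); the function names are hypothetical, and the sketch does not reproduce the statistical tests.

```python
from collections import defaultdict


def repeat_vs_nonrepeat(stimuli, correct):
    """Fraction correct on repeat vs. non-repeat trials.

    A trial is a repeat if its stimulus matches the previous trial's stimulus.
    The first trial has no predecessor and is excluded.
    """
    totals = {"repeat": [0, 0], "non_repeat": [0, 0]}  # [n_correct, n_trials]
    for t in range(1, len(stimuli)):
        key = "repeat" if stimuli[t] == stimuli[t - 1] else "non_repeat"
        totals[key][0] += correct[t]
        totals[key][1] += 1
    return {k: (c / n if n else float("nan")) for k, (c, n) in totals.items()}


def trials_per_stimulus_index(stimuli):
    """For each trial, how many times its stimulus has been shown so far (1-based).

    Binning on this index puts set size = 2 and set size = 4 on a common x-axis.
    """
    seen = defaultdict(int)
    index = []
    for s in stimuli:
        seen[s] += 1
        index.append(seen[s])
    return index


stimuli = ["A", "A", "B", "A", "B", "B"]
correct = [1, 1, 0, 1, 1, 0]
print(repeat_vs_nonrepeat(stimuli, correct))  # {'repeat': 0.5, 'non_repeat': 0.6666666666666666}
print(trials_per_stimulus_index(stimuli))  # [1, 2, 1, 3, 2, 3]
```

Binning by presentation count per stimulus, rather than by raw trial number, is what lets a mouse seeing 4 odors be compared with one seeing 2 at the same amount of experience per odor.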

Fig 3.

Summary regression coefficients from main regressions run in set size = 2 and set size = 4.

To understand the influence of past trials on the current trial t, we developed and tested multiple logistic regressions (logit link function) and fit them to individual sessions. Here we show our preliminary regression (regression #1) and our winning regression model (regression #3), selected after testing a total of six regressions (see S1 Fig). All regressions included a theoretical marker for RL (correct history) and a series of different one-back trial identities that could be used to approximate WM. The influence of correct history was significant for both male and female mice across both set sizes and regressions, indicating the use of RL. Examining one-back and two-back trials in regression #1, we found that the one-back coefficient was significantly above 0 for male mice in both set sizes and for females in set size = 4 only (p = 0.04), whereas the two-back coefficient was not significant for either sex. For males, all regression #3 coefficients were significantly above 0 in both set size = 2 and set size = 4. For females, only the effect of reward (in both set sizes) and the interaction between reward and repeat (in set size = 4 only) were significant. Normality tests on grouped regression coefficients from individual sessions were run as described in Materials and methods, and parametric (one-sample t) or non-parametric (Wilcoxon signed rank) tests were applied accordingly. Each column reflects the mean of individual sessions (3 from each animal in each set size), and error bars correspond to SEM. **** p < 0.0001, *** p < 0.001, ** p < 0.01, * p < 0.05.
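A sketch of how one session's trials might be turned into regression predictors and fit with a logit link, in dependency-free Python. The predictor coding is an assumption for illustration, not the paper's exact specification: "correct history" is taken as the number of earlier correct responses to the current stimulus, and the one-back terms are whether trial t-1 was rewarded, whether the stimulus repeated, and their product. The fit is plain maximum-likelihood logistic regression by Newton-Raphson; in practice a library routine such as statsmodels' `Logit` would typically be used instead. All function names are hypothetical.

```python
import math


def design_matrix(stimuli, correct):
    """Hypothetical regression #3-style predictors for trials t >= 1.

    Columns (assumed coding): intercept, correct history (earlier correct
    responses to the current stimulus), reward on t-1, stimulus repeat,
    and reward x repeat. The outcome y is whether trial t was correct.
    """
    X, y = [], []
    n_correct = {}
    for t, s in enumerate(stimuli):
        if t > 0:
            reward = correct[t - 1]
            repeat = int(s == stimuli[t - 1])
            X.append([1.0, n_correct.get(s, 0), reward, repeat, reward * repeat])
            y.append(correct[t])
        # Update history only after building row t, so it uses past trials only.
        n_correct[s] = n_correct.get(s, 0) + correct[t]
    return X, y


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_logistic(X, y, n_iter=25, damping=1e-6):
    """Maximum-likelihood logistic regression (logit link) via Newton-Raphson.

    A tiny damping term on the Hessian diagonal keeps it invertible when a
    predictor is constant within a session; the fixed point is still grad = 0.
    """
    k = len(X[0])
    w = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        H = [[damping if i == j else 0.0 for j in range(k)] for i in range(k)]
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for b in range(k):
                    H[a][b] += p * (1.0 - p) * xi[a] * xi[b]
        step = solve(H, grad)
        w = [wj + sj for wj, sj in zip(w, step)]
    return w


X, y = design_matrix(["A", "A", "B", "A"], [1, 0, 1, 1])
print(X)  # [[1.0, 1, 1, 1, 1], [1.0, 0, 0, 0, 0], [1.0, 1, 1, 0, 0]]
print(fit_logistic([[1.0]] * 4, [1, 1, 1, 0]))  # intercept-only fit: close to log(3)
```

Fitting each session separately, as the caption describes, yields one coefficient vector per session; those vectors are what get pooled and tested against 0.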

Fig 4.

Male regression coefficients for regression #3 across development in set size = 2 and set size = 4.

To understand the relationship between age and the predictors of the current choice on trial t (see Fig 3 for the logistic regression summary), we examined whether coefficient weight (y-axis) changed over development (x-axis) in male mice. As mice age, they may rely to differing extents on separable learning systems (for example, RL, captured by ‘correct history’, and short-term memory, captured by one-back combinations of stimulus and reward). For each coefficient from regression #3, our winning regression (see S1 Fig), we report the R² and p-value from a simple linear regression (best-fit line); the results of a mixed linear model are reported in the main text. Although ‘correct history’, a measure of RL, was significantly above 0 for both set sizes (Fig 3), its coefficient weight did not change with age (B,G). The intercept did not change significantly with age in either set size (A,F), and some developmental changes that were significant in set size = 2 (reward-repeat: p < 0.0001; noreward-repeat: p < 0.0001) were also reflected in set size = 4 (reward-repeat: p < 0.0001; main effect of reward: p < 0.0001). The significance of the noreward-repeat and main-effect-of-reward weights differed between set sizes (C,J), which could reflect the addition of two more stimuli.
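The best-fit line for each coefficient amounts to an ordinary least-squares fit of coefficient weight on age. The minimal standard-library sketch below computes the slope, R², and the t statistic for a zero slope; turning t into a p-value requires the t distribution with n - 2 degrees of freedom (for example, `scipy.stats.linregress` returns the p-value directly). Because each mouse contributes several sessions, this simple fit treats sessions as independent, which is why the caption defers to a mixed linear model (for example, statsmodels' `mixedlm` with mouse as the grouping factor) for the main-text statistics. The function name and example data are invented for illustration.

```python
import math


def simple_linear_regression(ages, weights):
    """OLS fit of weight = intercept + slope * age.

    Returns (slope, intercept, r_squared, t_stat), where t_stat tests
    slope = 0 with len(ages) - 2 degrees of freedom.
    """
    n = len(ages)
    mean_x = sum(ages) / n
    mean_y = sum(weights) / n
    sxx = sum((x - mean_x) ** 2 for x in ages)
    syy = sum((y - mean_y) ** 2 for y in weights)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, weights))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    residual_ss = syy - slope * sxy  # equals syy * (1 - r_squared)
    se_slope = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope, intercept, r_squared, slope / se_slope


# Invented example: one coefficient's weight from four sessions at P30 to P60.
slope, intercept, r2, t = simple_linear_regression([30, 40, 50, 60], [1.0, 2.0, 2.0, 4.0])
print(f"slope={slope:.3f}, R^2={r2:.3f}, t={t:.2f}")  # slope=0.090, R^2=0.853, t=3.40
```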

Fig 5.

Female regression coefficients for regression #3 across development in set size = 2 and set size = 4.

The influence of age on coefficients from our winning regression, regression #3, in both set size = 2 and set size = 4 is shown in the same format as for males in Fig 4. All statistics were computed identically and are reported in the text. The intercept coefficient weight decreased slightly with age in set size = 2 (A) but not in set size = 4 (F). Otherwise, there were no significant developmental changes in set size = 4, whereas reward-repeat (p < 0.0001) and noreward-repeat (p < 0.0001) changed significantly in set size = 2 only. These results contrast with the significant developmental changes in regression #3 coefficients in male mice, which were partly consistent across set sizes (Fig 4).

Fig 6.

Model comparison and validation.

(A) We examined four strategy parameters (S1–S4) that isolate different ways mice might use information from one trial back. In the best-fitting model, S2 and S4 were later collapsed into one parameter to reduce complexity. In each pair of cartoons, the left cartoon shows the stimulus and the mouse’s action at time t, and the right cartoon shows the choice options available at time t+1 when the mouse is presented with either stimulus. The plus/minus signs reflect the positive/negative direction of each strategy parameter, and the name of each strategy parameter describes its positive value at time t+1. (B) The winning model a0bs1232 (where a indicates a free α₊ parameter, 0 indicates that α₋ = 0 is fixed, and s1232 indicates three free S parameters with S4 = S2) was compared to other a0b models with various combinations of strategy parameters (s) or no strategy parameters (nos), as well as to a model with a free negative learning rate (aabs1232). All competing models were built from the basic RL model described in Materials and methods. The winning model had an average BIC of 859.4, not significantly different from the full model a0bs1234 (BIC = 857.6191). However, a Bayesian model comparison showed that a0bs1232 was significantly more frequent across sessions (C; posterior frequency 39.74% vs. 26.11% for a0bs1234), with a protected exceedance probability (pxp, [12]) of 0.991 (D), confirming that it was the best model at the group level. (E-F) Example logistic regression. To validate the winning model, we simulated behavior using the parameters fit to each individual and session and analyzed it with the same logistic regression approach. The simulated behavior captured the impact of repeat, reward, their interaction, and reward history on a session-by-session basis, as shown by the group-level effects (E-F) and by tight correlations between regression #3 coefficients for mouse behavior and for simulated behavior (sim; G-K).
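To make the model naming concrete, here is a hedged sketch of a likelihood for an a0bs1232-style model: a delta-rule RL learner with a free positive learning rate α₊, α₋ fixed at 0, softmax inverse temperature β, and one-back strategy weights S1, S2 = S4, and S3. The caption does not give the equations, so several pieces are assumptions: there are two actions, Q-values start at 0.5, each strategy weight is added to the logit of the previous trial's action, and the assignment of S1 and S3 to "new stimulus" versus "same stimulus" after a loss is our reading of the parameter names. The BIC helper uses the standard definition, 2·NLL + k·ln(n); under this sketch k = 5 (α₊, β, S1, S2, S3). Function names are hypothetical.

```python
import math


def choice_logprobs(q_row, beta, bias):
    """Log-probabilities of each action under softmax(beta * Q + bias)."""
    logits = [beta * q + b for q, b in zip(q_row, bias)]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def a0bs1232_nll(params, stimuli, actions, rewards, n_actions=2, q0=0.5):
    """Negative log-likelihood of a hypothetical a0bs1232-style model.

    Assumed (not taken from the paper's Methods): Q-values start at q0; only
    positive prediction errors update Q (alpha_minus fixed at 0); one-back
    strategy weights are added to the logit of the previous trial's action,
    selected by (previous reward, whether the stimulus repeated). With two
    actions, a negative bias on the previous action is a bias to shift.
    """
    stay_bias = {
        (1, True): params["S2"],    # win, same stimulus: stay
        (1, False): params["S2"],   # win, new stimulus: stay (S4 tied to S2)
        (0, True): params["S3"],    # lose, same stimulus: stay (assumed S3)
        (0, False): -params["S1"],  # lose, new stimulus: shift (assumed S1)
    }
    Q = {}
    nll = 0.0
    for t, (s, a, r) in enumerate(zip(stimuli, actions, rewards)):
        q_row = Q.setdefault(s, [q0] * n_actions)
        bias = [0.0] * n_actions
        if t > 0:
            key = (rewards[t - 1], s == stimuli[t - 1])
            bias[actions[t - 1]] += stay_bias[key]
        nll -= choice_logprobs(q_row, params["beta"], bias)[a]
        prediction_error = r - q_row[a]
        if prediction_error > 0:  # alpha_minus = 0: negative errors are ignored
            q_row[a] += params["alpha_pos"] * prediction_error
    return nll


def bic(nll, n_params, n_trials):
    """Bayesian information criterion: 2 * NLL + k * ln(n)."""
    return 2.0 * nll + n_params * math.log(n_trials)


# Usage with invented parameters and a five-trial toy session.
params = {"alpha_pos": 0.4, "beta": 5.0, "S1": 0.2, "S2": 0.5, "S3": 0.1}
nll = a0bs1232_nll(params, [0, 0, 1, 1, 0], [0, 0, 1, 0, 0], [1, 1, 1, 0, 1])
print(bic(nll, n_params=5, n_trials=5))
```

In a full fit, this NLL would be minimized per session over α₊, β, S1, S2, and S3, and the same function with the parameters fixed at their fitted values defines the generative model used for the simulation-based validation.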

Fig 7.

Strategy parameters across development for male mice in set size = 2 and set size = 4 from winning computational model.

To understand the relationship between age and the parameters from our winning model (see Fig 6 and Materials and methods for model details), we examined whether parameter weight (y-axis) changed over development (x-axis) in male mice for set size = 2 (A-E) and set size = 4 (F-J). Parameter values are colored by individual, with separate sessions connected by lines. For each parameter, the R² and p-value from a simple linear regression (best-fit line) are reported on the figure; the results of a mixed linear model with age as the predictor and the parameter as the dependent variable, which better accounts for variability driven by repeat sessions from individual mice, are reported in the main text (see Materials and methods for more information about these analyses). The RL learning rate α₊ and the decision-noise parameter softmax β were stable across development in both set size = 2 (A-B) and set size = 4 (F-G). Strategy parameters S1 “Inappropriate Lose-Shift” and S3 “Inappropriate Lose-Stay” both decreased significantly with age in set size = 2 (C: p = 0.02; E: p = 0.04) but not in set size = 4 (H,J). In contrast, strategy parameter S2 = S4 “Stimulus Insensitive Win Stay” increased significantly in male mice in both set size = 2 (D: p < 0.0001) and set size = 4 (I: p = 0.03).

Fig 8.

Strategy parameters across development for female mice in set size = 2 and set size = 4 from winning computational model.

Parameters from our winning model across age for female mice are formatted in the same way as in Fig 7, and all mixed linear model statistics are reported in the main text. The RL learning rate α₊ and the decision-noise parameter softmax β were stable across development in both set size = 2 (A-B) and set size = 4 (F-G). Parameters S2 = S4 “Stimulus Insensitive Win Stay” (D,I) and S3 “Inappropriate Lose-Stay” (E,J) did not change across development in either set size. However, female mice showed a significant decrease in parameter S1 “Inappropriate Lose-Shift” in set size = 4 (H: p = 0.04), with a trend in the same direction in set size = 2 (C: p = 0.08).

Fig 9.

Effect of session on winning model parameters for set size = 2 and set size = 4 for both males and females.

To test whether mice adjusted one-back strategies with experience, we next compared how parameter weights (y-axis) changed across the 6 sessions analyzed for each mouse (x-axis). Because set size = 2 and set size = 4 days were interspersed, we combined set size data and analyzed sessions chronologically. Each mouse is colored individually, with its sessions connected, as described previously (Fig 7). Gray bars indicate the mean ± SEM for each session, with a line connecting the means for easier visualization. Dotted lines mark 0 for strategy parameters. Similar to the age analyses, there was no effect of experience on α₊ (A,F) or β (B,G) for either sex. In male mice, use of one-back strategy parameters changed significantly across sessions: (C) S1 “Inappropriate Lose-Shift” decreased (p = 0.01), (D) S2 = S4 “Stimulus Insensitive Win Stay” increased (p = 0.009), and (E) S3 “Inappropriate Lose-Stay” decreased (p < 0.0001). Female mice showed a trend in the same direction as males for (H) S1 “Inappropriate Lose-Shift” (p = 0.06) but no experience-related changes in (I) S2 = S4 “Stimulus Insensitive Win Stay” or (J) S3 “Inappropriate Lose-Stay”. Full statistics from mixed linear models are reported in the main text.
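Pooling set size = 2 and set size = 4 days and ordering them chronologically amounts to numbering each mouse's sessions by date and then summarizing each session index across mice (the gray bars). The minimal sketch below assumes each session is a dict with hypothetical keys `mouse` and `date` plus one entry per fitted parameter; the example records are invented, and the mixed linear models reported in the main text are not reproduced here.

```python
import math
import statistics
from datetime import date


def chronological_session_index(sessions):
    """Number each mouse's sessions 1, 2, ... by date, pooling set sizes.

    `sessions` is a list of dicts with hypothetical keys 'mouse' and 'date'
    plus fitted parameters; set size = 2 and set size = 4 days are not separated.
    """
    counts = {}
    indexed = []
    for sess in sorted(sessions, key=lambda d: (d["mouse"], d["date"])):
        counts[sess["mouse"]] = counts.get(sess["mouse"], 0) + 1
        indexed.append({**sess, "session_index": counts[sess["mouse"]]})
    return indexed


def mean_sem_by_session(indexed, param):
    """Mean and SEM of `param` across mice at each session index."""
    groups = {}
    for row in indexed:
        groups.setdefault(row["session_index"], []).append(row[param])
    summary = {}
    for idx, values in sorted(groups.items()):
        n = len(values)
        sem = statistics.stdev(values) / math.sqrt(n) if n > 1 else float("nan")
        summary[idx] = (statistics.mean(values), sem)
    return summary


# Invented records: two mice, interleaved set sizes, dates out of order.
sessions = [
    {"mouse": "m1", "date": date(2023, 1, 3), "set_size": 4, "S1": 0.2},
    {"mouse": "m1", "date": date(2023, 1, 1), "set_size": 2, "S1": 0.4},
    {"mouse": "m2", "date": date(2023, 1, 2), "set_size": 2, "S1": 0.6},
    {"mouse": "m2", "date": date(2023, 1, 5), "set_size": 4, "S1": 0.0},
]
print(mean_sem_by_session(chronological_session_index(sessions), "S1"))
```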

Fig 10.

Age-matched GDX and sham controls show no differences in behavior or RL learning parameters and minimal differences in strategy parameters.

A separate cohort of male mice underwent gonadectomy (GDX; n = 6) or sham (n = 5) surgery prior to puberty to test whether gonadal testosterone at puberty is required for age-dependent changes in males. Mice were tested behaviorally between P61 and P96. Three sessions from set size = 2 and set size = 4 were pooled, and mixed linear models were used to compare groups while controlling for repeated measures. (A) GDX and sham animals showed comparable learning curves. (B) The α₊ learning rate was comparable between groups. (C) The decision-noise parameter β was comparable between groups. There were also no group differences in (D) S1: Inappropriate Lose-Shift or (E) S2 = S4: Stimulus-Insensitive Win-Stay. However, GDX and sham animals differed in (F) S3: Inappropriate Lose-Stay, with sham animals having lower values than GDX animals. Notably, sham animals were also significantly lower than intact mice of the same age range (S7 Fig, panel K; β_condition = −0.15, 95% CI = [−0.20, −0.08], p < 0.0001), suggesting a possible effect of sham surgery on this metric or variation between cohorts.
