Fig 1.
Mice display noisy behavior in a deterministic reversal learning task.
(a) (Top) Behavioral task setup for head-fixed mice with a freely rotating wheel. Schematic created with biorender.com. (Bottom) Timing structure of each trial, demarcating the cue, movement, and outcome epochs. (b) Structure of the deterministic reversal learning task in a '100–0' environment. Hidden states alternated between right-states, with high reward probability for right actions, and left-states, with high reward probability for left actions. Block lengths were randomly sampled from a uniform distribution between 15 and 25 trials. (c) Example behavioral performance of one animal in the reversal learning task; block transitions are demarcated by vertical dashed lines. Dots and crosses represent individual correct and incorrect trials, respectively. The black trace indicates rolling performance over a 15-trial window. (d) Session-averaged performance of all mice (n = 21) over 30 training sessions. The dashed line indicates the ideal win-stay-lose-shift (WSLS) strategy. (e) Illustration of the sigmoidal transition function with three parameters: switch offset s, switch slope α, and lapse ε. (f–h) Session-averaged switch offset (f), slope (g), and lapse (h) of all mice (n = 21) over 30 training sessions. Dashed lines indicate the ideal WSLS strategy.
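The three-parameter transition function of panel (e) can be sketched as follows. The exact parameterization (how offset, slope, and lapse enter the logistic curve) is an assumption for illustration, not necessarily the paper's fitted form.

```python
import numpy as np

def transition_function(t, offset, slope, lapse):
    """Probability of choosing the newly rewarded side on trial t after a
    block switch. Assumed form: a logistic curve rising from ~lapse to
    ~1 - lapse, crossing 0.5 at t = offset (illustrative parameterization)."""
    sigmoid = 1.0 / (1.0 + np.exp(-slope * (t - offset)))
    return lapse + (1.0 - 2.0 * lapse) * sigmoid
```

Under this form, the ideal WSLS strategy corresponds to a near-zero offset, a steep slope, and zero lapse.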
Fig 2.
Non–uniform performance of mice in reversal learning.
(a) Example performance of a mouse (E54) in relation to its lapse rate over the last 5 days of training (days 26–30). Individual dots show combinations of performance and lapse rate in single blocks. Lighter blue dots represent early blocks in the session; darker blue dots represent late blocks. Red error bars represent the expected mean and standard deviation of performance and lapse rate assuming the mouse uses a single strategy. (b) Comparison of the observed standard deviation of block performance over the final five training sessions (black vertical lines) with the expected standard deviation for an agent that uses a uniform strategy (box plots, n = 100 bootstrap runs). Each row represents one of the 21 experimental mice (animal IDs shown on the y-axis). The average performance of each animal over the last 5 days of training, E, is shown on the right.
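The uniform-strategy null distribution in panel (b) can be approximated with a simple bootstrap: simulate an agent that is correct with a fixed per-trial probability and measure the spread of its per-block performance. Function name, argument names, and defaults here are hypothetical.

```python
import numpy as np

def uniform_strategy_block_sd(p_correct, n_blocks, trials_per_block=20,
                              n_boot=100, seed=0):
    """Bootstrap the expected standard deviation of block performance for an
    agent using a single (uniform) strategy, i.e. a fixed per-trial accuracy
    p_correct. All argument names and defaults are illustrative."""
    rng = np.random.default_rng(seed)
    sds = []
    for _ in range(n_boot):
        # Simulate one session's worth of blocks and record performance spread
        n_correct = rng.binomial(trials_per_block, p_correct, size=n_blocks)
        sds.append((n_correct / trials_per_block).std())
    return np.array(sds)
```

An observed standard deviation far above this bootstrap distribution argues against a single strategy and for a mixture of behavioral modes.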
Fig 3.
(a) Behavior of an example agent generated by a hidden Markov process with K = 3 components. Colored circles represent the underlying hidden states, zi, which evolve according to a Markov chain. Each state (blue, red, and green shading) follows a different set of underlying switching dynamics. Blue dots represent correct choices; red crosses represent incorrect choices. (Inset) Average transition function across all blocks of the session (black) together with the fitted sigmoidal curve (blue). (b) (Top) Transition functions corresponding to each of the three hidden states, zi = 1, 2, 3. Each sigmoidal curve is parameterized by three features: the slope αi, offset si, and lapse εi. Arrows represent transition probabilities between the states. (Bottom) Equations of the blockHMM generative model. Each hidden state governs the choice sequence in its block according to the sigmoidal transition function (Eqs 1 and 2). The log-likelihood of the observed choices in a block is the sum of the log-likelihoods of the individual trials (Eq 3). (c) (Top) Example behavior in 1000 blocks of trials generated by the same blockHMM mixture shown in panels a and b. Each column represents one block, with trials 1 to 30 running from top to bottom. Red represents incorrect choices and blue represents correct choices. (Middle) True states underlying the behavior shown in the top panel. (Bottom) Latent states inferred by the blockHMM fitting procedure. (d) (Left) Convergence of the log-likelihood during model fitting in panel c to the true log-likelihood of the data (dashed line). (Right) Dependence of the cross-validated log-likelihood on the number of components, K. (e) True and inferred transition matrices for the behavior shown in panel c. (f) Grouping of blocks according to the inferred state after model fitting with K = 3 HMM components. (Top) Raw behavioral performance grouped by the identity of the inferred latent state. (Bottom) Average transition function and fitted sigmoidal curve for all blocks sharing the same inferred state. (g) Comparison of true and inferred parameters for the three components of the behavior shown in panel c.
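The per-block log-likelihood of Eq 3 can be sketched as below. The sigmoid parameterization is an assumption for illustration; in the full blockHMM, these block-level likelihoods would feed into standard HMM inference (e.g., forward-backward) over the sequence of hidden modes.

```python
import numpy as np

def block_log_likelihood(choices, offset, slope, lapse):
    """Sum of per-trial log-likelihoods within one block (cf. Eq 3).
    choices: array of 0/1 (incorrect/correct) for trials 1..T after the block
    switch. The sigmoid form is an assumed illustrative parameterization."""
    choices = np.asarray(choices, dtype=float)
    t = np.arange(1, choices.size + 1)
    # Per-trial probability of a correct choice under this mode's parameters
    p = lapse + (1.0 - 2.0 * lapse) / (1.0 + np.exp(-slope * (t - offset)))
    return float(np.sum(choices * np.log(p) + (1.0 - choices) * np.log(1.0 - p)))
```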
Fig 4.
Component mixtures of expert mice.
(a) blockHMM transition curves of six representative mice (E46, E53, E56, F02, FH02, FH03). The number of behavioral modes ranged from two to six across all experimental mice. In each row, transition curves are sorted in increasing order of performance. (b) Distribution of block performance across behavioral modes. Behavioral modes were divided into three groups based on this distribution: low performance (<65%), intermediate (65–84%), and high (>84%). (c) Distribution of two transition parameters, offset and lapse, for the low-, intermediate-, and high-performance groups. The low-performance regime (blue) had a high offset and a high lapse rate. The intermediate-performance regime consisted of two sub-groups: one (yellow) had a high lapse rate but a low offset, and the other (pink) had a high offset but a low lapse rate. The high-performance regime (green) had both a low lapse rate and a low offset. (d) Block transition dynamics of the four behavioral regimes identified in c. (e) Types of behavioral modes found in the behavior of each experimental mouse (n = 21). All animals performed a mix of behavioral modes ranging from low to high performance.
Fig 5.
Mapping transition dynamics to underlying behavioral strategies.
(a) Implementation of the Q-learning (top) and inference-based (bottom) algorithms used to simulate choice sequences of artificial agents. (b) Example behavior of simulated Q-learning (top) and inference-based (bottom) agents. Each dot or cross represents the outcome of a single trial. In the Q-learning plot, the black and blue traces represent the values of the two actions. In the inference-based plot, the black trace represents the posterior probability of the right state, P(st = R ∣ c1, r1, …, ct−1, rt−1). (c) Computational simulation of an ensemble of Q-learning and inference-based agents taken from grids spanning the Q-learning parameter space (top) or the inference-based parameter space (bottom). Based on the simulation results, the spaces were clustered into six groups (represented by different colors) that showed qualitatively different behavior. (d) Transition functions grouped according to behavioral regime (Q1–4, IB5–6). Black lines represent single agents; the red trace represents the mean across all transition functions in each group. (e) Behavioral regime composition of each of the six algorithmic domains (Q1–4, IB5–6). (f) Cross-validated confusion matrix showing the classification performance of a k-nearest-neighbor (kNN) classifier trained to predict class identity (Q1–4, IB5–6) from the observed transition curve. Diagonal entries show the accuracy for each class.
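The two agent classes in panel (a) can be sketched as single-trial updates. The Q-learning rule is the standard delta rule; the inference-based update applies Bayes' rule to the trial outcome and then mixes in a block-switch hazard. The reward probability and hazard rate below are illustrative assumptions, not the paper's fitted parameters.

```python
import numpy as np

def q_update(q, action, reward, lr=0.2):
    """Model-free delta rule: nudge the chosen action's value toward the
    obtained reward. lr is an illustrative learning rate."""
    q = np.array(q, dtype=float)
    q[action] += lr * (reward - q[action])
    return q

def posterior_right(p_right, chose_right, rewarded, p_rew=0.9, hazard=0.05):
    """One step of inference-based updating of P(state = Right).
    p_rew (reward probability in the correct state) and hazard (per-trial
    block-switch probability) are assumed values for illustration."""
    # Likelihood of the observed outcome under each hidden state
    if chose_right:
        like_r = p_rew if rewarded else 1.0 - p_rew   # P(outcome | right-state)
        like_l = 1.0 - p_rew if rewarded else p_rew   # P(outcome | left-state)
    else:
        like_r = 1.0 - p_rew if rewarded else p_rew
        like_l = p_rew if rewarded else 1.0 - p_rew
    post = like_r * p_right / (like_r * p_right + like_l * (1.0 - p_right))
    # The hidden state may switch before the next trial
    return post * (1.0 - hazard) + (1.0 - post) * hazard
```

Sweeping the learning rate (and an exploration parameter) for the first agent, and p_rew/hazard for the second, would produce parameter grids analogous to those clustered in panel (c).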
Fig 6.
Mice use a combination of model-free and inference-based strategies in reversal learning.
(a) Composition of blockHMM mixtures for individual animals. Each row represents one mouse, with its ID shown on the left. The color of each square represents the decoded behavioral regime of each HMM mode (Q1–4, IB5–6). The number of HMM modes for each animal, K, was selected by cross-validation; rows are sorted in descending order of K. (b) Transition functions of HMM modes for all animals, grouped according to the decoded behavioral regime. (c, d) Distribution of HMM modes for two example animals, f16 and f11, which displayed vastly different behavioral strategies over learning. The average performance of the two animals over the last 5 days of training, E, is shown. (e) Average frequency of HMM modes for n = 21 experimental animals (mean ± standard error), showing the evolution of behavioral mixtures over the course of training.