Fig 1.
Decision-making deconstructed.
Most voluntary decision policies depend on the CBGT circuits (where; left panel). These circuits comprise distributed neuronal populations within the basal ganglia that interact with each other, as well as with cortical and thalamic neurons (connections with circles: inhibition; connections without circles: excitation). This interaction can be described at the algorithmic level by a set of parameters in a process model (here, the DDM) that abstractly simulates evidence accumulation. The goal of this process is to determine the distributions of decision outcomes such as reward rates (what; right panel). Contours were generated by simulations of the DDM with drift rate v and boundary height a selected on a fine grid of values. Other DDM parameters (onset time, t; bias z) were fixed. Different initial parameter values and changes in parameters map to different changes in these decision outcomes (arrows in right panel). Control ensembles within CBGT circuits effectively determine the relative configuration of decision policy parameters (how; middle panel) [29]; that is, each ensemble represents a mapping between a pattern of increases (green) or decreases (magenta) in firing in CBGT regions (middle panel, left column) and increases (green) or decreases (magenta) of DDM parameters (middle panel, right column). What remains unclear, and what we address in this work, is how learning modulates the balance between control ensembles in a way that shifts decision policies so as to maximize reward rate. Cx, cortical PT cells; CxI, inhibitory interneurons; FSI, fast spiking interneurons; d/iSPN, direct/indirect spiny projection neurons; STN, subthalamic nucleus; GPe, external globus pallidus; GPi, internal globus pallidus.
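The contour construction described in this caption can be illustrated with a minimal sketch: simulate a DDM at each (v, a) grid point with the other parameters fixed, then summarize the outcomes. Everything below is an assumption made for illustration, not the authors' code: the Euler-Maruyama integration scheme, unit noise, the step size, the fixed values of t and z, the coarse grid, and the definition of reward rate as accuracy divided by mean RT.

```python
import math
import random

def simulate_ddm(v, a, t0=0.2, z=0.5, dt=0.005, sigma=1.0,
                 n_trials=200, max_t=5.0, seed=0):
    """Euler-Maruyama simulation of a two-boundary DDM (illustrative only).

    v: drift rate, a: boundary height, t0: onset (non-decision) time,
    z: starting point as a fraction of a. The upper boundary is treated
    as the correct choice. Returns (accuracy, mean_rt).
    """
    rng = random.Random(seed)
    step_sd = sigma * math.sqrt(dt)
    max_steps = int(max_t / dt)
    n_correct = 0
    total_rt = 0.0
    for _ in range(n_trials):
        x = z * a
        steps = 0
        # Accumulate noisy evidence until a boundary is crossed or time runs out.
        while 0.0 < x < a and steps < max_steps:
            x += v * dt + step_sd * rng.gauss(0.0, 1.0)
            steps += 1
        if x >= a:
            n_correct += 1  # timeouts and lower-boundary hits count as errors
        total_rt += t0 + steps * dt
    return n_correct / n_trials, total_rt / n_trials

def reward_rate(v, a, **kwargs):
    """Assumed definition: fraction correct per unit of mean RT."""
    accuracy, mean_rt = simulate_ddm(v, a, **kwargs)
    return accuracy / mean_rt

# Coarse hypothetical grid; the figure uses a much finer one.
grid = {(v, a): reward_rate(v, a)
        for v in (0.5, 1.5, 2.5)
        for a in (0.5, 1.0, 2.0)}
```

Contours of `grid` over the (v, a) plane would then play the role of the right panel; swapping `reward_rate` for accuracy or mean RT gives the other outcome surfaces.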
Fig 2.
Dopamine-dependent cortico-striatal plasticity drives CBGT networks in the direction of reward rate maximization.
(A) The evolution of RTs achieved by a DDM fit to CBGT network behavior, projected to (v,a)-space. The average starting positions for the fast (orange), intermediate (brown) and slow (red) networks are shown as stars. The squares indicate the evolution of each network group over the plasticity stages, which converge after 15 trials (shaded elliptical regions). The yellow (purple) colors represent high (low) RTs. The network trajectories do not evolve in the direction that would be expected to minimize the RTs (e.g., optimal direction shown in blue from the initial position of all three speed groups). (B) The yellow (purple) colors represent high (low) accuracy. The networks evolve towards increasing expected accuracy but not in an optimal fashion (trajectories vs. blue arrows). (C) The yellow (purple) colors represent high (low) reward rate. The network evolution aligns closely with the direction that maximizes the reward rate (blue arrows). (D) The cosine distances calculated for every network at each plasticity stage for RT, accuracy and reward rate were pooled together and shown as distributions.
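The comparison in panel D can be sketched as follows: take the step a network makes in (v, a)-space, take the gradient of an outcome surface at the starting point, and measure the angle between them. This is a hedged illustration rather than the authors' procedure. It assumes that cosine distance means one minus cosine similarity, that the optimal direction is the local gradient, and it uses the standard closed-form accuracy and mean decision time of an unbiased, unit-noise DDM as a stand-in surface. The step vector and the 0.2 s onset time are hypothetical.

```python
import math

def cosine_distance(u, w):
    """One minus the cosine similarity of two vectors (assumed definition)."""
    dot = sum(ui * wi for ui, wi in zip(u, w))
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    if norm_u == 0.0 or norm_w == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_w)

def numerical_gradient(f, v, a, h=1e-3):
    """Central-difference gradient of f(v, a) with respect to (v, a)."""
    dv = (f(v + h, a) - f(v - h, a)) / (2.0 * h)
    da = (f(v, a + h) - f(v, a - h)) / (2.0 * h)
    return (dv, da)

def toy_reward_rate(v, a, t0=0.2):
    """Stand-in surface: closed-form unbiased DDM with unit noise."""
    accuracy = 1.0 / (1.0 + math.exp(-v * a))
    mean_rt = t0 + (a / (2.0 * v)) * math.tanh(v * a / 2.0)
    return accuracy / mean_rt

# Hypothetical change in (v, a) for one network over one plasticity stage.
step = (0.3, -0.1)
optimal_direction = numerical_gradient(toy_reward_rate, 1.0, 1.0)
distance = cosine_distance(step, optimal_direction)
```

A distance near 0 means the step points along the direction of steepest increase, 1 means it is orthogonal to it, and 2 means it points the opposite way. Repeating this for RT, accuracy and reward rate surfaces gives the three pooled distributions.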
Fig 3.
Canonical correlation analysis (CCA) identifies control ensembles (cf. [29]).
Given matrices of average firing rates, F (both summed rates across channels, Σ, and between-channel differences, Δ), and fit DDM parameters, D, derived from a set of networks at baseline (left panels), CCA finds low-dimensional projections of the firing rates and of the DDM parameters (right panels) that maximize the correlation, ρ, between the projection of F and the projection of D. Blue lines in the F plot show left channel activity, orange lines show right channel activity, and green lines show populations that span both channels.
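For reference, the CCA objective can be written out explicitly. The notation here is assumed for illustration: F and D are taken to be column-centered, with one row per network, and the projection weight vectors are written as u (for firing rates) and w (for DDM parameters). Under those assumptions, the first canonical pair solves:

```latex
\rho \;=\; \max_{\mathbf{u},\,\mathbf{w}}\;
\frac{\mathbf{u}^{\top} F^{\top} D\, \mathbf{w}}
     {\sqrt{\mathbf{u}^{\top} F^{\top} F\, \mathbf{u}}\;
      \sqrt{\mathbf{w}^{\top} D^{\top} D\, \mathbf{w}}}
```

Each subsequent pair maximizes the same ratio, subject to its projections being uncorrelated with those of the earlier pairs. The number of pairs is at most the smaller of the two column ranks, so with a small number of DDM parameters in D only a few components are available.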
Fig 4.
Plasticity-induced changes of control ensemble influence.
(A) The loading weights of the first 5 PCs of firing rate changes from before to after plasticity, pooled for all networks. (B) The drivers (columns of S), which quantify the modulations of control ensembles (responsiveness, pliancy, choice) that capture each PC (pooled for all network classes). (C) The variance-weighted drivers for the three control ensembles, computed separately for the three network classes (fast, intermediate and slow).
Fig 5.
Suboptimal and optimal choices modulate control ensembles in opposite directions.
(A) The modulation of control ensembles associated with various reward sequences encountered in two initial trials with cortico-striatal plasticity. U represents “Unrewarded” and R represents “Rewarded” trials. (B) The reward rate changes obtained by simulation of networks with synaptic weights frozen after various reward sequences occurred on two initial trials.