Reward-driven changes in striatal pathway competition shape evidence evaluation in decision-making

doi:10.1371/journal.pcbi.1006998

Fig 1.

Multi-level modeling design.

A spike-timing-dependent plasticity (STDP) model of DA effects on Ctx-dMSN and Ctx-iMSN synapses is used to determine how phasic DA signals affect the balance of these synapses. A spiking model of the cortico-basal ganglia-thalamic (CBGT) pathways simulates behavioral responses under different conditions of Ctx-MSN efficacy, set according to the STDP simulations. The simulated behavioral responses from the full CBGT network model are then fit with a drift-diffusion model (DDM) of two-alternative choice behavior. Notation: j-Ctx: cortical population; j-dMSN: direct pathway striatal neurons; j-iMSN: indirect pathway striatal neurons (j ∈ {L, R}); DA: dopamine signal; STR: striatum; GPe: globus pallidus external segment; STN: subthalamic nucleus; GPi: globus pallidus internal segment; FSI: fast spiking interneuron; RT: reaction time; v: DDM drift rate; a: separation between boundaries in DDM; z: bias in starting height of DDM; tr: time after which evidence accumulation begins in DDM.
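
To make the roles of the four DDM parameters in the notation concrete, the sketch below simulates single trials of a two-boundary diffusion process. It is only an illustration: the Euler time step `dt`, the noise scale `sigma`, the timeout `max_t`, and the function name are assumptions, not the authors' fitting procedure. The upper boundary is mapped to L and the lower boundary to R.

```python
import random


def simulate_ddm_trial(v, a, z=0.5, tr=0.2, dt=0.001, sigma=1.0,
                       max_t=5.0, rng=None):
    """Simulate one DDM trial and return (choice, rt_in_seconds).

    v  : drift rate
    a  : separation between the two boundaries
    z  : starting point as a fraction of a (0.5 means no bias)
    tr : time before evidence accumulation begins
    dt, sigma, max_t are illustrative assumptions.
    """
    rng = rng or random.Random()
    x = z * a                      # starting height of the accumulator
    t = 0.0
    step_sd = sigma * dt ** 0.5    # noise standard deviation per step
    while 0.0 < x < a and t < max_t:
        x += v * dt + rng.gauss(0.0, step_sd)
        t += dt
    if x >= a:
        choice = "L"               # upper boundary
    elif x <= 0.0:
        choice = "R"               # lower boundary
    else:
        choice = None              # no boundary reached before max_t
    return choice, tr + t


if __name__ == "__main__":
    rng = random.Random(1)
    trials = [simulate_ddm_trial(v=3.0, a=1.0, rng=rng) for _ in range(200)]
    p_left = sum(c == "L" for c, _ in trials) / len(trials)
    print(f"P(L) = {p_left:.2f}")
```

Raising `v` makes L choices faster and more frequent, while raising `a` slows responses and makes them more accurate.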

Fig 2.

Corticostriatal synaptic weights with probabilistic reward feedback.

First column: p_L = 0.65; second column: p_L = 0.75; third column: p_L = 0.85. A, B, and C: Weights averaged over each of four specific populations of neurons: dMSN neurons selecting action L (solid black); dMSN neurons selecting action R (solid red); iMSN neurons countering action L (dashed black); iMSN neurons countering action R (dashed red). D, E, and F: Evolution of the value estimates for actions L (Q_L) and R (Q_R) obtained by Q-learning, compared with the ratio of the corticostriatal weights onto the dMSNs that facilitate each action to the weights onto the iMSNs that interfere with it. Both the weights and the ratios have been averaged over 8 different realizations. A small jump occurs in the Q_R trace for p_L = 0.65 and is joined by a dashed line; this comes from the time discretization and averaging.
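
As a rough illustration of how value estimates such as Q_L and Q_R evolve under probabilistic reward feedback, the sketch below applies a standard incremental Q-learning update. Several details are assumptions that the caption does not state: rewards are binary, action R is rewarded with probability 1 − p_L, choices are epsilon-greedy, and the values of `alpha` and `epsilon` are arbitrary.

```python
import random


def run_q_learning(p_left, n_trials=2000, alpha=0.05, epsilon=0.1, seed=0):
    """Return the trial-by-trial history of (Q_L, Q_R).

    Assumptions (not from the caption): reward is 1 or 0, action R is
    rewarded with probability 1 - p_left, and choices are epsilon-greedy.
    """
    rng = random.Random(seed)
    p_reward = {"L": p_left, "R": 1.0 - p_left}
    q = {"L": 0.0, "R": 0.0}
    history = []
    for _ in range(n_trials):
        # Explore with probability epsilon, or when the values are tied.
        if rng.random() < epsilon or q["L"] == q["R"]:
            action = rng.choice(["L", "R"])
        else:
            action = max(q, key=q.get)
        reward = 1.0 if rng.random() < p_reward[action] else 0.0
        # Move the chosen action's value toward the observed reward.
        q[action] += alpha * (reward - q[action])
        history.append((q["L"], q["R"]))
    return history


if __name__ == "__main__":
    for p_left in (0.65, 0.75, 0.85):
        q_l, q_r = run_q_learning(p_left)[-1]
        print(f"p_L = {p_left}: Q_L = {q_l:.2f}, Q_R = {q_r:.2f}")
```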

Fig 3.

Single trial example of CBGT dynamics.

Population firing rates of CBGT nuclei, computed as the average of individual unit firing rates within each nucleus in L (black) and R (red) action channels, are shown for a single representative trial in the high reward probability condition. The selected action (L) and corresponding RT (324 ms) are determined by the first action channel to raise its thalamic firing rate to 30 Hz.
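
The decision rule in this caption (the first action channel whose thalamic firing rate reaches 30 Hz determines the choice and RT) can be sketched as a threshold-crossing search. The function name, the sampling step `dt_ms`, and the toy rate traces are hypothetical; measuring time from the first sample is also an assumption.

```python
def first_threshold_crossing(rates_by_channel, dt_ms, threshold_hz=30.0):
    """Return (channel, rt_ms) for the earliest threshold crossing.

    rates_by_channel maps a channel label to a list of thalamic firing
    rates (Hz) sampled every dt_ms milliseconds. Returns None if no
    channel reaches the threshold. On a tie, the channel listed first
    is returned.
    """
    best = None
    for channel, rates in rates_by_channel.items():
        for i, rate in enumerate(rates):
            if rate >= threshold_hz:
                t = i * dt_ms
                if best is None or t < best[1]:
                    best = (channel, t)
                break  # only the first crossing in this channel matters
    return best


# Toy traces (hypothetical values, 1 ms sampling).
traces = {"L": [10.0, 20.0, 31.0, 40.0], "R": [10.0, 15.0, 20.0, 35.0]}
print(first_threshold_crossing(traces, dt_ms=1.0))
```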

Fig 4.

Striatal pathway dynamics and behavioral effects of reward probability in full CBGT network.

A: Time courses show the average population firing rates for L (black) and R (red) dMSNs (top) and iMSNs (bottom) over the trial window. Shaded areas reflect 95% CI. Colored vertical lines depict the average RT in the low (blue), medium (cyan), and high (yellow) reward conditions. B and C: Summary statistics of dMSN and iMSN population firing rates were extracted on each trial and later included as trialwise regressors on parameters of the DDM, allowing specific hypotheses to be tested about the mapping between neural and cognitive mechanisms. In B, lighter colored bars show the difference between dMSN firing rates in the L and R action channels, whereas darker colored bars show the difference between dMSN and iMSN firing rates in the L action channel. Each was computed by averaging normalized values of trialwise estimates of the area under the appropriate firing rate curve (AUC); see main text for details. In C, lighter colored bars show the difference between iMSN firing rates in the L and R action channels, and darker colored bars show the average iMSN firing rate (combined across left and right channels). D: Average accuracy (probability of choosing L) and RT (L choices only) of CBGT choices across levels of reward probability. E: RT distributions for correct choices across levels of reward probability; note that higher reward yields more correct trials. Error bars in B-D show the bootstrapped 95% CI.
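
Two of the computations named in this caption, the area under a firing rate curve and a bootstrapped 95% CI, can be sketched as follows. This is a generic illustration: the trapezoidal rule, the percentile bootstrap, the number of resamples, and the function names are assumptions, and the normalization step described in the main text is not reproduced here.

```python
import random


def auc(rates, dt):
    """Trapezoidal area under a firing rate curve sampled every dt."""
    return sum((r0 + r1) / 2.0 * dt for r0, r1 in zip(rates, rates[1:]))


def bootstrap_ci(values, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of values."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lo = means[int((1.0 - level) / 2.0 * n_boot)]
    hi = means[int((1.0 + level) / 2.0 * n_boot) - 1]
    return lo, hi


# Hypothetical trialwise AUC differences between two populations.
trial_diffs = [auc([0.0, d, d, 0.0], dt=1.0) for d in (1.0, 2.0, 3.0, 4.0, 5.0)]
print(bootstrap_ci(trial_diffs))
```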

Fig 5.

DDM fits to CBGT-simulated behavior reveal pathway-specific effects on drift rate and threshold mechanisms.

A: ΔDIC scores, showing the relative goodness-of-fit of all single- and dual-parameter DDMs considered (top) and all DDM regression models considered (bottom) compared to that of the null model (all parameters held constant across conditions; see Table 2). The ΔDIC score of the best-fitting model at each stage is plotted in green. The best overall fit was provided by DDM regression model III. B: DDM schematic showing the change in v and a across low (blue), medium (cyan), and high (yellow) reward conditions, with the threshold for L and R represented as the upper and lower boundaries, respectively. C: Posterior distributions in each reward condition for a (Eq 1), estimated on each trial as a function of the average iMSN firing rate across left and right action channels (see I_all in Fig 4C), and v (Eq 2), estimated on each trial as a function of the difference between dMSN firing rates in the left and right channels (D_L − D_R in Fig 4B). D: Histograms and kernel density estimates showing the CBGT-simulated and DDM-predicted RT distributions, respectively. E: Point plots showing the CBGT network’s average accuracy and RT across reward conditions overlaid on bars showing the DDM-predicted averages.
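
Eq 1 and Eq 2 are not reproduced in this caption. One common way to write such trialwise dependencies, assuming a linear link, is shown below; the linear form and the β coefficient names are assumptions for illustration and may differ from the equations in the main text.

```latex
% Assumed linear trialwise regressions (t indexes trials):
\begin{align}
  a_t &= \beta_{a,0} + \beta_{a,1}\, I_{\mathrm{all},t} \\
  v_t &= \beta_{v,0} + \beta_{v,1}\, \left(D_L - D_R\right)_t
\end{align}
```

Here I_all is the average iMSN firing rate across both channels and D_L − D_R is the difference between dMSN firing rates in the left and right channels, as defined in the caption.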

Table 1.

Single- and dual-parameter DDM goodness-of-fit statistics.

DIC is a complexity-penalized measure of model fit, DIC = D(θ) + pD, where D(θ) is the deviance of model fit under the optimized parameter set θ and pD is the effective number of parameters. ΔDIC is the difference between each model’s DIC and that of the null model for which all parameters are fixed across conditions. Asterisks denote models providing best fits within the single-parameter group (*) and across both groups (**).
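
The arithmetic behind the ΔDIC column can be sketched in a few lines. The deviance and pD values below are made up for illustration and are not taken from the table; the model labels are hypothetical. Lower DIC indicates a better penalized fit, so the most negative ΔDIC marks the preferred model.

```python
def dic(deviance, p_d):
    """DIC = D(theta) + pD, as defined in the caption."""
    return deviance + p_d


def delta_dic(fits, null_name="null"):
    """Return each model's DIC minus the DIC of the null model.

    fits maps a model label to a (deviance, p_d) pair.
    """
    null_dic = dic(*fits[null_name])
    return {
        name: dic(deviance, p_d) - null_dic
        for name, (deviance, p_d) in fits.items()
        if name != null_name
    }


# Hypothetical (deviance, pD) pairs, for illustration only.
fits = {"null": (1000.0, 4.0), "v": (960.0, 6.0), "v_a": (940.0, 8.0)}
deltas = delta_dic(fits)
best = min(deltas, key=deltas.get)
print(deltas, best)
```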

Table 2.

DDM regression models and goodness-of-fit statistics.

Asterisk denotes best performing model.

Fig 6.

Simulated behavior and striatal influences from randomly sampled networks.

A. Horizontal lines show subject-averaged accuracy (left) and correct response RT (right) means. Individual subject means are displayed as dots, connected by lines across conditions. B. Correct RT distributions in low (blue), medium (cyan), and high (yellow) reward conditions. C-D. Normalized striatal regressors, as in Fig 4B and 4C.

Fig 7.

Model comparison and fits to randomly sampled network data.

A. ΔDIC values for all single (left) and dual (right) free parameter models. B. ΔDIC values for DDM regression models I–XII associated with parameters (v, a). C. Posterior distributions for boundary height (a) and drift rate (v) estimated in the best-fitting regression model III. D. Correct RT distributions generated by the sampled networks (histograms) with predicted distributions from regression model III overlaid as kernel densities for each reward condition (see Fig 5 for corresponding model fit results from original network data).

Table 3.

Synaptic efficacy (g) and probability (P) of connections between populations in the CBGT network, as well as postsynaptic receptor types (AMPA, NMDA, and GABA).

The topology of each connection is labeled as either diffuse, to denote connections with a nonzero probability (P > 0) of projecting to both the left and right action channels, or focal, to denote connections that were restricted to within each channel.

Table 4.

Corticostriatal weights in the CBGT network across levels of reward probability.

In each reward condition (rows), corresponding values of ϕ were used to scale the synaptic efficacy of corticostriatal inputs (g_Ctx-MSN) to the direct (D) and indirect (I) pathways within the left (L) and right (R) action channels.
