Distinct cortico-striatal compartments drive competition between adaptive and automatized behavior

doi:10.1371/journal.pone.0279841

Table 1.

Key additions that extend the model from prior work.

Fig 1.

Diagram of simulated behavioral sessions.

Model instances were trained with reward feedback specific to each behavioral session. The other three types of behavioral sessions (reward reversal, devaluation, and punished outcome) each began after an initial learning session, so the model instance was already trained to select action #1 at the start of each of these sessions.

Fig 2.

A model of bimodal and concurrent learning in the basal ganglia.

(A) Cortico-basal ganglia-thalamo-cortical loops are distributed between the medial and lateral partitions of the model and are organized into distinct channels that represent individual actions (color-coded blue and orange). The medial partition incorporates the prefrontal cortex (PFC) and the portion of the basal ganglia that includes the dorsomedial striatum (DMS). The lateral partition incorporates the premotor cortex (PMC) and the portion of the basal ganglia that includes the dorsolateral striatum (DLS). The DMS is colored red to indicate that dopamine release in this region codes for reward prediction error; the DLS is colored green to indicate that dopamine release in this region codes for salience. Within each compartment of the basal ganglia, each of the two channels processes cortical input via the striatonigral (direct) pathway and the striatopallidal (indirect) pathway, which converge on a combined node representing the substantia nigra pars reticulata and the internal globus pallidus (SNr/GPi). D1-expressing medium spiny neurons (MSNs) directly inhibit the SNr/GPi (direct pathway). D2-expressing MSNs project to the indirect pathway, which includes the external globus pallidus (GPe) and the subthalamic nucleus (STN). Within each partition, cortical excitation of MSNs is modulated by dopaminergic projections from the substantia nigra pars compacta (SNc). These dopamine signals encode different quantities in the DMS and the DLS and thus induce a different mode of learning in each partition: in the DMS, dopamine modulates cortico-striatal synaptic weights according to the reward prediction error (RPE), and in the DLS, according to contextual salience. (B) Sample model activity during a behavioral session, including end-trial cortical activity, cortico-striatal synaptic weights, and reward feedback metrics. (C,D) The cortical partitions (PFC and PMC) perform outcome selection and action selection independently, so PFC and PMC selections may agree (C) or disagree (D).
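
The caption describes one dopamine signal per partition acting on cortico-striatal weights. As a minimal sketch, assuming a generic three-factor rule (cortical activity × MSN activity × dopamine) in which positive dopamine potentiates D1 weights and depresses D2 weights, the two modes of learning could be written as below. The function names, the learning rate `lr`, and the clipping of weights to [0, 1] are hypothetical choices for illustration, not the model's published equations.

```python
import numpy as np


def update_weights(w_d1, w_d2, cortex, msn_d1, msn_d2, dopamine, lr=0.1):
    """Hypothetical three-factor update for cortico-striatal weights.

    Each array entry is one channel (action). Positive dopamine strengthens
    the direct (D1) pathway and weakens the indirect (D2) pathway; negative
    dopamine does the opposite. This rule is an assumption for illustration.
    """
    w_d1 = w_d1 + lr * dopamine * cortex * msn_d1
    w_d2 = w_d2 - lr * dopamine * cortex * msn_d2
    # Bound weights to [0, 1] (an assumed choice, not taken from the paper).
    return np.clip(w_d1, 0.0, 1.0), np.clip(w_d2, 0.0, 1.0)


def dms_dopamine(reward, expected_reward):
    """Medial partition: dopamine encodes reward prediction error."""
    return reward - expected_reward


def dls_dopamine(salience):
    """Lateral partition: dopamine encodes contextual salience (assumed >= 0)."""
    return salience


# Toy usage: channel 0 (action #1) is active and the reward beats expectation.
w_d1, w_d2 = np.array([0.5, 0.5]), np.array([0.5, 0.5])
active = np.array([1.0, 0.0])
w_d1, w_d2 = update_weights(w_d1, w_d2, active, active, active,
                            dms_dopamine(1.0, 0.2))
print(w_d1, w_d2)
```

Because the DMS signal `reward - expected_reward` can be negative, DMS weights in this sketch can shift toward the indirect pathway, whereas a non-negative salience signal in the DLS only ever strengthens the pathway that was active. That asymmetry is one way habit-like persistence could arise in such a scheme.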

Fig 3.

An agent learns to select action #1 in an initial learning behavioral task.

(A) Cortical activity, cortico-striatal synaptic weights, and reward feedback metrics during the initial learning behavioral session. In panels depicting cortico-striatal weights, D1 synaptic weights are solid traces and D2 synaptic weights are dashed traces. Blue traces correspond to action #1 and orange traces correspond to action #2. (B) Insets correspond to trials indicated by markers 1–6 in part (A). (B1) Initial D1 and D2 weights are not biased toward selecting either action. (B2) When the reward is greater than the expected reward, the reward prediction error (RPE) is positive. (B3) The PFC-DMS cortico-striatal synaptic weights evolve to promote the selection of outcome #1. (B4) When the expected reward is high and no reward is delivered, the RPE becomes negative. (B5) Marker *5: selection of outcome #1 is de-emphasized when the PFC selects outcome #1 and the PMC selects the unrewarded action #2. Marker *6: selection of outcome #2 is de-emphasized when the PFC selects outcome #2 and the PMC selects the unrewarded action #2. (C) Learning performance of 100 agents. Performance is defined as the likelihood that an agent selects action #1.
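
Performance in panel (C) is a population statistic. A minimal sketch, assuming each agent's choices are stored as a boolean array of shape `(n_agents, n_trials)` (a hypothetical layout), estimates it as the per-trial fraction of agents that select action #1. The toy data below are synthetic and do not reproduce the model's output.

```python
import numpy as np


def performance(chose_action1):
    """Per-trial fraction of agents that selected action #1.

    chose_action1: array-like of bools, shape (n_agents, n_trials), where
    True means the agent selected action #1 on that trial (assumed layout).
    """
    chose = np.asarray(chose_action1, dtype=float)
    return chose.mean(axis=0)


# Synthetic example: 100 agents whose probability of choosing action #1
# rises linearly over 50 trials (illustrative values only).
rng = np.random.default_rng(0)
p_action1 = np.linspace(0.5, 0.95, 50)
choices = rng.random((100, 50)) < p_action1  # broadcasts over agents
print(performance(choices)[[0, -1]])
```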

Fig 4.

An agent learns to select action #2 in a reward reversal behavioral task.

(A) In panels depicting cortico-striatal weights, D1 synaptic weights are solid traces and D2 synaptic weights are dashed traces. Blue traces correspond to action #1 and orange traces correspond to action #2. The yellow shaded trials before trial 0 indicate trials at the end of the initial learning session. (B) Insets correspond to trials indicated by markers in part (A). (B1) PFC-DMS cortico-striatal synaptic weights respond to the change in the reward feedback rule by de-emphasizing the selection of outcome #1. (B2) The PFC-DMS weights for both channels emphasize the indirect pathway; this configuration promotes exploration in the PFC. (B3) PFC-DMS cortico-striatal synaptic weights evolve to promote the selection of outcome #2.

Fig 5.

The magnitude of reward feedback is reduced in a reward devaluation behavioral task.

The selection of action #1 persists but becomes less likely. (A) In panels depicting cortico-striatal weights, D1 synaptic weights are solid traces and D2 synaptic weights are dashed traces. Blue traces correspond to action #1 and orange traces correspond to action #2. The yellow shaded trials before trial 0 indicate trials at the end of the initial learning session. (B) Insets correspond to trials indicated by markers 1–3 in part (A). (B1) Following reward devaluation, the reward feedback is less than the expected reward. Even though reward feedback remains mostly positive, the reward prediction error (RPE) is negative. (B2) The cortico-striatal synaptic weights for outcome #1 reflect the negative RPE, and the indirect pathways for both outcomes are emphasized. (B3) Because action #1 is still rewarded, the cortico-striatal synaptic weights for outcome #1 eventually recover to emphasize the direct pathway over the indirect pathway.
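
Panel (B1) turns on the difference between the reward received and the reward expected. A minimal sketch, assuming the expected reward is an exponential moving average with a hypothetical rate `alpha` (the model's actual expectation update may differ), shows why a devalued but still positive reward produces a negative RPE that then decays toward zero as the expectation adapts. The reward values 1.0 and 0.4 are illustrative only.

```python
def track_rpe(rewards, expected=0.0, alpha=0.2):
    """Return per-trial reward prediction errors and expected rewards.

    Assumption: after each trial, the expected reward moves a fraction
    alpha of the way toward the reward just received.
    """
    rpes, expectations = [], []
    for reward in rewards:
        rpe = reward - expected  # RPE = received reward - expected reward
        rpes.append(rpe)
        expected += alpha * rpe  # update the expectation for the next trial
        expectations.append(expected)
    return rpes, expectations


# Hypothetical schedule: 30 trials with reward 1.0 (initial learning),
# then 30 devalued trials with reward 0.4.
rewards = [1.0] * 30 + [0.4] * 30
rpes, expectations = track_rpe(rewards)
print(round(rpes[30], 3), round(rpes[-1], 4))
```

On the first devalued trial the reward is still positive, but the expectation built up during initial learning exceeds it, so the RPE is negative; the RPE then shrinks geometrically as the expectation settles at the devalued reward.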

Fig 6.

An agent learns to select action #2 in a punished outcome behavioral task.

Cortical activity, cortico-striatal synaptic weights, and reward feedback metrics evolve to overcome the habit of selecting action #1. The yellow shaded trials before trial 0 indicate trials at the end of the initial learning session. (A) In panels depicting cortico-striatal weights, D1 synaptic weights are solid traces and D2 synaptic weights are dashed traces. Blue traces correspond to action #1 and orange traces correspond to action #2. (B) Insets correspond to trials indicated by markers 1–3 in part (A). (B1) PFC-DMS cortico-striatal synaptic weights for outcome #1 respond to the negative reward prediction error and evolve to emphasize the indirect pathway. (B2) During exploration, the PFC-DMS channels for both outcomes emphasize the indirect pathway. (B3) In the channel for outcome #2, the direct pathway becomes emphasized to promote the selection of the unpunished choice.

Fig 7.

Impairment of PFC decreases learning performance.

(A) Illustration of impaired PFC coding. The projections of the PFC into the DMS and the PMC contained mixed signals in PFC† behavioral sessions. OC1 and OC2 correspond to projections that represent the channels for outcome #1 and outcome #2. (B-C) Performance is defined as the likelihood that agents (N = 100) select action #1; here, agents learn to select action #2. (B) Progression of outcome selection in the PFC and action selection in the PMC during a reward reversal task (black) and a reward reversal task with impaired PFC (red). (C) Progression of outcome selection in the PFC and action selection in the PMC during a punished outcome task (black) and a punished outcome task with impaired PFC (red).

Fig 8.

Analysis of model output across behavioral tasks.

(A) Example model activity of an initial learning session and the following reward reversal session with change point analysis. Trial number here is relative to the beginning of the reward reversal session. (B) Change point analysis indicates the trial at which an ideal observer detects a change in cortical selection from action #1 to action #2. We compared change point performance in control simulations (black) and sessions with impaired PFC (red) (N = 100 agents). Box-and-whisker plots show medians. (C) Performance of agents at the end of each behavioral task, showing the likelihood that agents select action #1. We compared PFC performance (black) to PMC performance (grey) (N = 100 agents). Box-and-whisker plots show medians.
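
The caption does not specify which change point method was used. As a stand-in, not necessarily the procedure in the paper, a standard single-change-point maximum-likelihood estimate for a binary selection sequence can be sketched as follows. Coding a trial as 1 when action #2 is selected, and assuming exactly one change, are assumptions of this sketch; the function names are hypothetical.

```python
import numpy as np


def bernoulli_log_lik(x):
    """Maximized log-likelihood of a 0/1 segment under a single Bernoulli rate."""
    n = len(x)
    k = int(np.sum(x))
    if k == 0 or k == n:
        return 0.0  # a pure segment is fit perfectly (log 1 = 0)
    p = k / n
    return k * np.log(p) + (n - k) * np.log(1 - p)


def change_point(selections):
    """Most likely single change index in a 0/1 selection sequence.

    selections[t] = 1 if action #2 was selected on trial t, else 0.
    Returns tau (1 <= tau <= len - 1) such that selections[:tau] and
    selections[tau:] are best described by separate selection rates.
    """
    x = np.asarray(selections, dtype=int)
    if len(x) < 2:
        raise ValueError("need at least two trials to locate a change")
    scores = [bernoulli_log_lik(x[:tau]) + bernoulli_log_lik(x[tau:])
              for tau in range(1, len(x))]
    return int(np.argmax(scores)) + 1  # scores[0] corresponds to tau = 1


# Toy sequence (synthetic): mostly action #1, then mostly action #2.
print(change_point([0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1]))
```

A likelihood-based split like this acts as an ideal observer only in the limited sense that it uses the whole session in hindsight; an online detector would report the change later.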

Table 2.

Discrepancy between goal-directed learning and habit.

Table 3.

Model parameter values.

Table 4.

Behavioral sessions were defined by the organization of reward feedback.
