Models of heterogeneous dopamine signaling in an insect learning and memory center

doi:10.1371/journal.pcbi.1009205

Fig 1.

Diagram of the mushroom body model.

(A) Kenyon cells (KCs) respond to stimuli and project to mushroom body output neurons (MBONs) via weights W_KC→MBON. These connections are dynamic variables that are modified according to a synaptic plasticity rule gated by dopamine neurons (DANs). Output neurons and dopamine neurons are organized into compartments (dotted rectangles). External signals convey, e.g., reward, punishment, or context to the mushroom body output circuitry according to weights W_ext. A linear readout with weights W_readout determines the behavioral output of the system. Connections among output neurons, dopamine neurons, and feedback neurons (gray) are determined by weights W_recur. Inset: expanded diagram of connections in a single compartment. (B) The form of the dopamine neuron-gated synaptic plasticity rule operative at KC-to-MBON synapses. ΔT is the time difference between Kenyon cell activation and dopamine neuron activation. (C) Illustration of the change in KC-to-MBON synaptic weight ΔW following forward and backward pairings of Kenyon cell and dopamine neuron activity.
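The timing dependence in panels B and C can be illustrated with a minimal sketch. The caption does not give the functional form of the rule, so everything here is an assumption made for illustration: the exponential kernel, the parameters `amplitude` and `tau`, the convention ΔT = t_DAN − t_KC, and the choice that forward pairing (Kenyon cell before dopamine neuron) depresses the synapse while backward pairing potentiates it.

```python
import math


def delta_w(delta_t, amplitude=1.0, tau=1.0):
    """Hypothetical weight change for a single KC/DAN pairing.

    delta_t   : t_DAN - t_KC (assumed sign convention, arbitrary time units)
    amplitude : maximum magnitude of the change (illustrative parameter)
    tau       : decay constant of the timing window (illustrative parameter)

    Assumption: forward pairing (delta_t > 0) depresses the synapse and
    backward pairing (delta_t < 0) potentiates it; the effect shrinks
    exponentially as the two events move apart in time.
    """
    if delta_t == 0:
        return 0.0
    magnitude = amplitude * math.exp(-abs(delta_t) / tau)
    return -magnitude if delta_t > 0 else magnitude


# Forward pairing (KC first) versus backward pairing (DAN first).
forward = delta_w(0.5)
backward = delta_w(-0.5)
```

In this sketch the forward and backward changes are equal in magnitude and opposite in sign; nothing in the caption requires that symmetry.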

Fig 2.

Schematic of meta-learning procedure.

(A) Two phases of meta-learning and testing. Left: During the optimization phase, connections that form the mushroom body output circuitry are updated with gradient descent (orange). Kenyon cell to output neuron weights evolve “online” (within each trial) according to dopamine-dependent synaptic plasticity. Right: After optimization is complete, the network is tested on a new set of trials. In this phase, connections that form the output circuitry are fixed. (B) Illustration of trials involving CS/US associations presented during training (left) and testing (right). Each trial involves new CS/US identities and timing.

Fig 3.

Behavior of network during reward conditioning paradigms.

(A) Behavior of output neurons (MBONs) during first-order conditioning. During training, a CS+ (blue) is presented, followed by a US (green). Top: The network is optimized so that a readout of the output neuron activity during the second CS+ presentation encodes valence (gray curve). Black curve represents the target valence and overlaps with the readout. Bottom: Example responses of output neurons. (B) Same as A, but for CS- presentation without US. (C) Same as A, but for extinction, in which a second presentation of the CS+ without the US partially extinguishes the association. (D) Same as A, but for second-order conditioning, in which a second stimulus (CS2) is paired with a conditioned stimulus (CS1). (E) Error rate averaged across networks in different paradigms. An error is defined as a difference between reported and target valence with magnitude greater than 0.2 during the test period. Networks optimized with recurrent output circuitry (control, black) are compared to networks without recurrence (no recur., red), and networks prior to optimization (initialization, gray). Error rates for each network realization are evaluated over 50 test trials and used to generate p-values with a Mann-Whitney U-test over 20 network realizations.
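The error criterion in panel E can be sketched as follows. This is an illustration only: it assumes each test trial is summarized by a single reported valence and a single target valence, which the caption does not specify, and the function name `error_rate` is hypothetical.

```python
def error_rate(reported, target, threshold=0.2):
    """Fraction of trials whose reported valence misses the target.

    A trial counts as an error when |reported - target| > threshold.
    Assumption: one scalar reported valence and one scalar target
    valence per test trial.
    """
    if len(reported) != len(target):
        raise ValueError("reported and target must have the same length")
    if not reported:
        raise ValueError("at least one trial is required")
    errors = sum(1 for r, t in zip(reported, target) if abs(r - t) > threshold)
    return errors / len(reported)


# Four hypothetical test trials; only the second deviates by more than 0.2.
rate = error_rate([1.0, 0.5, -1.0, 0.0], [1.0, 1.0, -1.0, 0.1])
```

Per-network error rates computed this way could then be compared between conditions with a rank-based test, for example `scipy.stats.mannwhitneyu`.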

Fig 4.

Comparison to networks without dopamine-gated plasticity.

(A) Behavior during first-order conditioning, similar to Fig 3A, but for a non-plastic network. Because of the need for non-plastic networks to maintain information using persistent activity, performance degrades with longer delays between training and test phases. We therefore chose this delay to be shorter than in Fig 3A. Results are shown for a network optimized with 10 odors. (B) Same as A, but for a trial in which a CS-US pairing is followed by the presentation of a neutral CS. (C) Difference in response (reported valence) for CS+ and neutral CS as a function of the number of odors. Each CS+ is associated with either a positive or negative US. For comparison, the corresponding response difference for networks with dopamine-gated plasticity is shown in blue. Error bars represent s.e.m. over 8 network realizations.

Fig 5.

Population analysis of dopamine neuron (DAN) activity.

(A) First-order conditioning trials with positive or negative valence US. (B) Responses of model dopamine neurons from a single network. Neurons are sorted according to hierarchical clustering (illustrated with gray dendrogram) of their responses. (C) Principal components analysis of dopamine neuron population activity. Left: Response to CS before conditioning. Middle: Response to a positive (green) or negative (red) valence US. Right: Response to a previously conditioned CS.

Fig 6.

Behavior of a network that encodes both valence and novelty.

The network is similar to that in Fig 5, but with an added second readout that computes novelty. The novelty readout is active for the first presentation of a given CS and zero otherwise. (A) The addition of novelty as a readout dimension introduces dopamine neuron responses that are selective for novel CS. Compare with Fig 5B. (B) The first principal component (PC1) for the network in A is selective for CS novelty. Compare with Fig 5C.

Fig 7.

Model behavior for long sequences of associations.

(A) Illustration of non-specific potentiation following dopamine neuron activity (compare with Fig 1C). (B) Example sequence of positive and negative associations between two odors (CS+ and CS2+) and the US. Neutral odors (CS-, gray) are also presented randomly. (C) Histogram of synaptic weights after a long sequence of CS and US presentations for networks with (black) and without (red) non-specific potentiation. Weights are normalized to their maximum value. The means of the distributions across 18 network realizations for each condition were significantly different (p < 2 ⋅ 10⁻⁷, Mann-Whitney U-test). (D) Left: Dopamine neuron responses for the sequence of CS and US presentations. Right: Same as left, but for a network without non-specific potentiation. (E) Error rate (defined as a difference between reported and target valence with magnitude greater than 0.5 during a CS presentation; we used a higher threshold than Fig 3 due to the increased difficulty of the continual learning task) for networks with (black) and without (red) non-specific potentiation. Error rates for each network realization are evaluated over 20 test trials and used to generate p-values with a Mann-Whitney U-test over 18 network realizations.

Fig 8.

Behavior of a network whose activity transitions between a sequence of discrete states in addition to supporting associative conditioning.

(A) Brief pulse inputs to the network signal that a switch to a new state should occur. (B) Top: A linear readout of dopamine neuron activity can be used to decode the network state. Bottom: dopamine neuron (DAN) activity exhibits state-dependent fluctuations in addition to responding to CS and US. (C) Decoding of stimuli that predict state transitions. Heatmap illustrates the correlation between output neuron population responses to the presentation of different stimuli that had previously been presented prior to a state transition. Stimuli are ordered based on the state transitions that follow their first presentation. Blue blocks indicate that stimuli that predict the same state transition evoke similar output neuron activity. (D) Performance of networks on conditioning tasks. For each network realization, error rates are computed over 50 test trials and bars represent s.e.m. over 40 network realizations.

Fig 9.

Model behavior for a navigation task.

(A) Top: Schematic of navigation task. After conditioning, the simulated organism uses odor concentration input (blue) and information about wind direction w relative to its heading h to navigate. Bottom: Diagram of a network that uses these signals to compute forward and angular velocity signals for navigation. Velocity signals are read out from other neurons in the mushroom body output circuitry (gray), rather than output neurons. (B) Position of the simulated organism as a function of time during navigation. Black: Simulation with intact dopamine-gated plasticity during navigation; Red: Simulation with plasticity blocked. Arrowheads indicate direction of movement. In the top left plot, the starting location (gray circle) is indicated. (C) Position error (mean-squared distance from rewarded odor source at the end of navigation) for control networks and the same networks in which dopamine-gated plasticity is blocked during the navigation phase. For each network realization, error rates are computed over 50 test trials and bars represent s.e.m. over 30 network realizations. Significance is computed with a Wilcoxon signed-rank test. (D) Forward (top) and angular (bottom) velocity as a function of time during one example navigation trial. (E) Left: Dopamine neuron activity during CS and US presentation in the conditioning phase of a trial. Right: Dopamine neuron activity during the navigation phase of the trial (same trial as in D).
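The position error in panel C can be sketched as a mean squared distance. This is a minimal illustration under stated assumptions: positions are 2D coordinates, the average is taken over the final position of each test trial, and the function name `position_error` is hypothetical.

```python
def position_error(final_positions, source):
    """Mean squared Euclidean distance from the rewarded odor source.

    final_positions : list of (x, y) positions at the end of navigation,
                      one per test trial (assumed 2D layout)
    source          : (x, y) location of the rewarded odor source
    """
    if not final_positions:
        raise ValueError("at least one trial is required")
    sx, sy = source
    squared = [(x - sx) ** 2 + (y - sy) ** 2 for x, y in final_positions]
    return sum(squared) / len(squared)


# Two hypothetical trials: one ends at the source, one ends 5 units away.
err = position_error([(0.0, 0.0), (3.0, 4.0)], (0.0, 0.0))
```

Because the control and plasticity-blocked conditions use the same networks, the per-network errors are paired, which fits the signed-rank test named in the caption (available, for example, as `scipy.stats.wilcoxon`).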
