Models of heterogeneous dopamine signaling in an insect learning and memory center
Fig 3
Behavior of network during reward conditioning paradigms.
(A) Behavior of output neurons (MBONs) during first-order conditioning. During training, a CS+ (blue) is presented, followed by a US (green). Top: The network is optimized so that a readout of the output neuron activity during the second CS+ presentation encodes valence (gray curve). Black curve represents the target valence and overlaps with the readout. Bottom: Example responses of output neurons. (B) Same as A, but for CS- presentation without US. (C) Same as A, but for extinction, in which a second presentation of the CS+ without the US partially extinguishes the association. (D) Same as A, but for second-order conditioning, in which a second stimulus (CS2) is paired with a conditioned stimulus (CS1). (E) Error rate averaged across networks in different paradigms. An error is defined as a difference between reported and target valence with magnitude greater than 0.2 during the test period. Networks optimized with recurrent output circuitry (control, black) are compared to networks without recurrence (no recur., red), and networks prior to optimization (initialization, gray). Error rates for each network realization are evaluated over 50 test trials and used to generate p-values with a Mann-Whitney U-test over 20 network realizations.
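The error metric in panel E can be illustrated with a minimal sketch. It is not the authors' code. It assumes one scalar valence readout per test trial, and it replaces real networks with a hypothetical Gaussian-noise stand-in (`simulate_error_rates`, with made-up noise levels). Only the 0.2 threshold, the 50 test trials, and the 20 network realizations come from the caption. The function `mann_whitney_u` computes only the U statistic; a statistics library such as SciPy's `scipy.stats.mannwhitneyu` would be needed to obtain a p-value.

```python
import random

ERROR_THRESHOLD = 0.2  # from the caption: |reported - target| > 0.2 is an error
N_TRIALS = 50          # test trials per network realization (from the caption)
N_NETWORKS = 20        # network realizations per condition (from the caption)


def error_rate(reported, target, threshold=ERROR_THRESHOLD):
    """Fraction of trials whose valence error magnitude exceeds the threshold."""
    flags = [abs(r - t) > threshold for r, t in zip(reported, target)]
    return sum(flags) / len(flags)


def mann_whitney_u(x, y):
    """U statistic for sample x versus sample y; ties count as 1/2."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def simulate_error_rates(noise_sd, rng):
    """Hypothetical stand-in for trained networks (an assumption, not the model):
    the reported valence is the target plus Gaussian noise."""
    rates = []
    for _ in range(N_NETWORKS):
        target = [rng.choice([0.0, 1.0]) for _ in range(N_TRIALS)]
        reported = [t + rng.gauss(0.0, noise_sd) for t in target]
        rates.append(error_rate(reported, target))
    return rates


rng = random.Random(0)  # fixed seed so the sketch is reproducible
control = simulate_error_rates(0.05, rng)   # illustrative noise level
no_recur = simulate_error_rates(0.30, rng)  # illustrative noise level

u = mann_whitney_u(no_recur, control)
print("mean error rate (control):  ", sum(control) / N_NETWORKS)
print("mean error rate (no recur.):", sum(no_recur) / N_NETWORKS)
print("U statistic (no recur. vs control):", u)
```

Each condition yields 20 error rates, one per network realization, and the test compares these two samples of 20 values rather than the 1000 individual trials.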