Fig 1.
Schematic explanation of the modified reward-modulated STDP rule.
(A) Overview of the whole network. (B) The STDP learning window. (C) The mechanism of synaptic plasticity. The synaptic weight changes in proportion to the product of the eligibility trace c(t) and the dopaminergic signal D(t). (D) Upstate propagation from a presynaptic neuron to a postsynaptic neuron.
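A minimal Python sketch of the plasticity rule in panel (C) follows. It assumes a conventional eligibility-trace formulation: c(t) decays exponentially and is incremented by an exponential STDP window at every pre/post spike pairing (all-to-all pairing), and the weight follows dw/dt = c(t)D(t). The window shape, the pairing scheme, the Euler step, the function names, and every parameter value are hypothetical and not taken from the paper; the upstate propagation of panel (D) is not modeled.

```python
import math

# Hypothetical parameters; the paper's actual values are not given in this caption.
A_PLUS, A_MINUS = 1.0, 1.5          # STDP amplitudes (potentiation, depression)
TAU_PLUS, TAU_MINUS = 20.0, 20.0    # STDP window time constants (ms)
TAU_C = 1000.0                      # eligibility-trace decay time constant (ms)
DT_MS = 1.0                         # Euler step; spike times are integer ms

def stdp_window(delta_t):
    """Exponential STDP window; delta_t = t_post - t_pre (ms)."""
    if delta_t > 0:   # pre before post: potentiation
        return A_PLUS * math.exp(-delta_t / TAU_PLUS)
    if delta_t < 0:   # post before pre: depression
        return -A_MINUS * math.exp(delta_t / TAU_MINUS)
    return 0.0

def run_synapse(pre_spikes, post_spikes, dopamine, w0=1.0, t_end_ms=2000):
    """Integrate dc/dt = -c/TAU_C + STDP terms and dw/dt = c(t) * D(t)."""
    c, w = 0.0, w0
    pre, post = set(pre_spikes), set(post_spikes)
    for t in range(t_end_ms):              # one iteration per DT_MS = 1 ms
        c -= DT_MS * c / TAU_C             # passive decay of the eligibility trace
        if t in post:                      # post spike pairs with earlier pre spikes
            c += sum(stdp_window(t - tp) for tp in pre_spikes if tp < t)
        if t in pre:                       # pre spike pairs with earlier post spikes
            c += sum(stdp_window(tq - t) for tq in post_spikes if tq < t)
        w += DT_MS * c * dopamine(t)       # weight change = trace x dopamine
    return w
```

For example, `run_synapse([10], [15], lambda t: 0.01 if t >= 500 else 0.0)` returns a weight above 1.0: the causal pairing leaves a positive trace, which the later dopamine signal converts into potentiation. If `dopamine` always returns 0.0, the weight stays at 1.0 however the spikes are paired.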
Fig 2.
Upstate propagation improves the reinforcement of poly-synaptic paths.
(A) An example of a successful trial. The initial synaptic weights are represented in color (Left). All neurons are arranged on a grid with 100 μm spacing, and neurons within 200√2 μm of each other are randomly connected with a probability of 0.5. Synaptic connections from the stimulated neuron S are all outward, while synaptic connections to the target neuron T and the false-target neurons F are all inward. By the end of learning, the path from S to T is selectively strengthened (Middle). The difference between the initial and final synaptic weights (Right). (B) A successful example of this task. The firing rate of the target neuron selectively increases. (C) The averaged synaptic weight difference from the initial condition to the 10th, 25th, and 40th trials is plotted. The averaged synaptic weights take the direction of each synaptic connection into account (connections in the opposite direction carry a minus sign). (D) The averaged synaptic weight difference between the initial trial and the last (80th) trial is plotted. (E) The success rate of each condition, averaged over 50 simulations. The shaded area indicates the standard error of the mean. The combination of waves and the tonic dopaminergic signal Dt (red line) gives the best task performance, while the conventional model (black line) fails to complete this task.
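The connectivity rule in (A) can be sketched as follows. This sketch assumes a square n × n grid and independently drawn directed connections (each ordered pair within the cutoff is connected with probability 0.5), with connections into S and out of T or F simply excluded. The grid size, the neuron indices chosen for S, T, and F, and the function name are hypothetical. Squared distances are compared in integer μm² so that pairs lying exactly on the cutoff are not lost to floating-point error.

```python
import random

SPACING_UM = 100  # grid spacing given in the caption

def build_grid_network(n_side, cutoff_sq_um2, p_connect, source=None, sinks=(), seed=0):
    """Return neuron coordinates and a list of directed (pre, post) edges.

    Neurons within sqrt(cutoff_sq_um2) of each other are connected with
    probability p_connect. The source neuron (S) only sends; sink neurons
    (T, F) only receive.
    """
    rng = random.Random(seed)
    coords = [(SPACING_UM * (i % n_side), SPACING_UM * (i // n_side))
              for i in range(n_side * n_side)]
    edges = []
    for pre, (x1, y1) in enumerate(coords):
        for post, (x2, y2) in enumerate(coords):
            if pre == post:
                continue
            if (x1 - x2) ** 2 + (y1 - y2) ** 2 > cutoff_sq_um2:
                continue                      # outside the connection radius
            if post == source or pre in sinks:
                continue                      # S has no inputs; T and F have no outputs
            if rng.random() < p_connect:
                edges.append((pre, post))
    return coords, edges

# Fig 2 cutoff: (200*sqrt(2) um)^2 = 80000 um^2. S, T, F indices are hypothetical.
coords, edges = build_grid_network(5, 2 * 200 ** 2, 0.5, source=0, sinks={20, 24})
```

The same function covers the Fig 3 geometry by passing the squared cutoff (100√5 μm)² = 50000 μm².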
Fig 3.
Wave propagation helps find a shortcut.
(A) An example of a successful trial. Each panel shows the initial synaptic weights (Left), the final synaptic weights (Middle), and the difference between them (Right). Initially, a strong detour path is prepared from the stimulated neuron S at the upper left (100 μm, 500 μm) to the target neuron T at the bottom left (100 μm, 100 μm). Neurons within 100√5 μm of each other are randomly connected with a probability of 0.5. Synaptic connections from S are all outward, while synaptic connections to T are all inward. By the end of the trial, the shortcut paths are strengthened while the detour paths are preserved. (B) The averaged synaptic weight difference from the initial trial to the 10th, 25th, and 40th trials is plotted. The averaged synaptic weights are calculated for each neuron, taking the direction of each synaptic connection into account. (C) The averaged synaptic weight difference between the initial trial and the last (60th) trial is plotted. (D) The amount of reward signal is plotted. The wave conditions successfully escape a locally optimal solution and reach a better solution for this task. The error bars indicate the standard error of the mean. (E) The latency index equals the latency of the first spike of the target neuron after stimulus onset if that latency is below 300 ms, and 300 ms otherwise. The condition with waves and the tonic dopaminergic signal Dt (red) shows the best performance, while the conventional model (black) fails. The error bars indicate the standard error of the mean.
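The latency index in (E) and the error bars can be computed as in the sketch below. It assumes that "after the stimulus onset" means strictly later than onset and that a trial in which the target never fires also scores the 300 ms cap; it also assumes the standard error uses the sample standard deviation. The function names are hypothetical.

```python
import math
import statistics

LATENCY_CAP_MS = 300.0  # cap stated in the Fig 3E caption

def latency_index(target_spike_times_ms, stimulus_onset_ms, cap_ms=LATENCY_CAP_MS):
    """Latency of the first target spike after stimulus onset, capped at cap_ms.

    A trial in which the target never fires after onset also scores cap_ms.
    """
    latencies = [t - stimulus_onset_ms for t in target_spike_times_ms
                 if t > stimulus_onset_ms]
    if not latencies:
        return cap_ms
    return min(min(latencies), cap_ms)

def mean_and_sem(values):
    """Mean and standard error of the mean (sample standard deviation / sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))
```

For instance, with onset at 100 ms and target spikes at 50, 120, and 160 ms, the index is 20 ms; a first spike at 550 ms, or no spike at all, gives 300 ms.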
Fig 4.
Wave propagation is useful for learning a nonlinear function.
(A) A successful example of synaptic weight change: the initial condition (Left), the final condition (Middle), and the difference (Right). Each middle-layer neuron receives one strong and one weak synaptic input from the stimulated neurons and sends its output to T or F. The success rate is initially at chance level. (B) The averaged synaptic weight difference between middle-layer neurons projecting to T and those projecting to F is plotted at the 5th, 10th, and 15th trials. Each path strength is calculated as the product of the averaged synaptic weights from the stimulated neurons to the middle-layer neurons and from the middle-layer neurons to a target neuron. Percentage changes in the averaged synaptic weights are shown in color. (C) The same plots as (B) at the last (25th) trial. The wave condition (top row) successfully learns the correct paths, while the no-wave condition (bottom row) fails. (D) The success rate on the XOR task. Wave conditions (red and yellow) show better results than no-wave conditions (green and black).
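A sketch of the path-strength calculation in (B) is given below. It assumes that the averaging runs over the stimulated neurons feeding each middle-layer neuron, that each middle-layer neuron contributes a single weight to each output neuron, and that the color scale is the percentage change relative to the initial value. The data layout (nested lists) and the function names are hypothetical.

```python
def path_strengths(w_in, w_out):
    """Path strength through each middle-layer neuron to each output neuron.

    w_in[s][m]:  weight from stimulated neuron s to middle-layer neuron m
    w_out[m][o]: weight from middle-layer neuron m to output neuron o (e.g. o=0 for T, 1 for F)
    Returns strengths[m][o] = mean over s of w_in[s][m], times w_out[m][o].
    """
    n_stim = len(w_in)
    strengths = []
    for m, out_row in enumerate(w_out):
        mean_in = sum(w_in[s][m] for s in range(n_stim)) / n_stim
        strengths.append([mean_in * w for w in out_row])
    return strengths

def percent_change(initial, final):
    """Element-wise percentage change relative to the initial values (assumed color-scale quantity)."""
    return [[100.0 * (f - i) / i for i, f in zip(row_i, row_f)]
            for row_i, row_f in zip(initial, final)]
```

With two stimulated and two middle-layer neurons, `path_strengths([[1.0, 2.0], [3.0, 4.0]], [[0.5, 1.0], [2.0, 0.0]])` gives `[[1.0, 2.0], [6.0, 0.0]]`, since the mean inputs to the two middle-layer neurons are 2.0 and 3.0.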
Fig 5.
The whole system of our model.
Excitatory neurons are locally connected via synapses. An inhibitory feedback signal controls the firing rate of the excitatory neurons, a global dopaminergic signal modulates the synaptic weights, and the wavefield created by the activity of other neurons controls the activity level of each excitatory neuron. External input and reward functions are provided externally. The += operator means that the right-hand side is added to the left-hand side when an event occurs (with a delay for Dp). We use the same parameters (except for the three summarized in Table 1) to learn three qualitatively different tasks in different network architectures, which underscores the robustness of the learning rule and the role of traveling waves.
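The delayed event-driven `+=` update for Dp can be illustrated with the small sketch below. Only the idea that an increment is added after a delay comes from the caption; the exponential decay of Dp, its time constant, the delay value, and the class and method names are hypothetical.

```python
import heapq
import math

class DelayedPhasicDopamine:
    """Hypothetical phasic dopamine variable Dp: decays exponentially and
    receives '+=' increments that arrive a fixed delay after each reward event."""

    def __init__(self, tau_ms=200.0, delay_ms=100.0):
        self.tau_ms = tau_ms
        self.delay_ms = delay_ms
        self.value = 0.0
        self._pending = []  # min-heap of (arrival_time_ms, amount)

    def reward_event(self, t_ms, amount):
        """Schedule an increment of Dp at t_ms + delay_ms."""
        heapq.heappush(self._pending, (t_ms + self.delay_ms, amount))

    def step(self, t_ms, dt_ms):
        """Decay Dp over one step, then apply every increment due by t_ms."""
        self.value *= math.exp(-dt_ms / self.tau_ms)
        while self._pending and self._pending[0][0] <= t_ms:
            _, amount = heapq.heappop(self._pending)
            self.value += amount  # the '+=' event update, applied after the delay
        return self.value
```

A heap keeps pending increments ordered by arrival time, so several overlapping reward events are applied in the right order without scanning the whole queue at every step.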
Table 1.
The task-dependent variables are summarized.