A Learning Theory for Reward-Modulated Spike-Timing-Dependent Plasticity with Application to Biofeedback
Figure 7
Results for reinforcement learning of exact spike times through reward-modulated STDP.
(A) Synaptic weight changes of the trained LIF neuron for 5 different runs of the experiment. The curves show the average of the synaptic weights in each of the two groups defined by the value they should converge to: one group is drawn with dashed lines and the other with solid lines, with a different color for each simulation run. (B) Comparison of the output of the trained neuron before learning (top trace) and after learning (bottom trace). The same input spike trains and the same noise inputs were used before and after training for 2 hours. The second trace from the top shows the spike times S* that are rewarded; the third trace shows the realizable part of S* (i.e., those spikes that the trained neuron could potentially learn to reproduce, since the neuron μ* produces them without its 10 extra spike inputs). The close match between the third and fourth traces shows that the trained neuron reproduces the realizable target spikes very well. (C) Evolution of the spike correlation between the spike train of the trained neuron and the realizable part of the target spike train S*. (D) The angle, in radians, between the weight vector w of the trained neuron and the weight vector w* of the neuron μ* during the simulation. (E) Synaptic weights at the beginning of the simulation (marked with ×) and at the end of the simulation (marked with •) for each plastic synapse of the trained neuron. (F) Evolution of the synaptic weights w/w_max during the simulation (separate values had been chosen for synapses with i<50 and for synapses with i≥50).
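The angle plotted in panel (D) is the standard angle between two vectors, arccos(w·w* / (|w| |w*|)). The following is a minimal sketch of that computation, not the authors' code; the function name `weight_angle` and the example vectors are hypothetical and chosen only for illustration.

```python
import math


def weight_angle(w, w_star):
    """Return the angle in radians between weight vectors w and w_star.

    Hypothetical helper for illustration; it assumes both vectors have the
    same length and that neither is the zero vector.
    """
    if len(w) != len(w_star):
        raise ValueError("weight vectors must have the same length")
    dot = sum(a * b for a, b in zip(w, w_star))
    norm_w = math.sqrt(sum(a * a for a in w))
    norm_w_star = math.sqrt(sum(b * b for b in w_star))
    if norm_w == 0.0 or norm_w_star == 0.0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to [-1, 1] so floating-point rounding cannot push the
    # cosine outside the domain of acos.
    cosine = max(-1.0, min(1.0, dot / (norm_w * norm_w_star)))
    return math.acos(cosine)


# Illustrative values only (not data from the paper).
orthogonal = weight_angle([1.0, 0.0], [0.0, 1.0])
parallel = weight_angle([1.0, 2.0], [2.0, 4.0])
```

An angle that shrinks toward 0 over the course of training means that w is becoming proportional to w*, independent of the overall scale of the two vectors.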