A Learning Theory for Reward-Modulated Spike-Timing-Dependent Plasticity with Application to Biofeedback
Figure 4
Simulation of the experiment by Fetz and Baker [17] for the case where an arbitrarily selected neuron triggers global rewards when it increases its firing rate.
(A) Spike response of 100 randomly chosen neurons within the recurrent network of 4000 neurons at the beginning of the simulation (20 sec–23 sec, left plot), and at the end of the simulation (the last 3 seconds, right plot). The firing times of the reinforced neuron are marked by blue crosses. (B) The firing rate of the positively rewarded neuron (blue line) increases, while the average firing rate of 20 other randomly chosen neurons (dashed line) remains unchanged. (C) Evolution of the average weight of excitatory synapses to the reinforced neuron (blue line), and of the average weight of 1663 randomly chosen excitatory synapses to other neurons in the circuit (dashed line). (D) Spike trains of the reinforced neuron before and after learning. (E) Histogram of the time-differences between presynaptic and postsynaptic spikes (bin size 0.5 ms), averaged over all excitatory synapses to the reinforced neuron. The black curve represents the histogram values for positive time differences (when the presynaptic spike precedes the postsynaptic spike), and the red curve represents the histogram for negative time differences.
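The histogram in panel (E) can be illustrated with a minimal sketch, which is not the authors' analysis code. It assumes spike times are given in milliseconds and uses the sign convention from the caption: a time difference is positive when the presynaptic spike precedes the postsynaptic spike. The function name `spike_time_difference_histogram`, its parameters, and the 50 ms window are hypothetical choices made for illustration; only the 0.5 ms bin size comes from the caption.

```python
import numpy as np

def spike_time_difference_histogram(pre_trains, post_spikes,
                                    bin_ms=0.5, window_ms=50.0):
    """Histogram of (t_post - t_pre), averaged over synapses.

    pre_trains  : list of arrays, one array of presynaptic spike times (ms)
                  per synapse onto the same postsynaptic neuron
    post_spikes : array of postsynaptic spike times (ms)
    bin_ms      : bin width (0.5 ms, as in the caption)
    window_ms   : half-width of the histogram (assumed, not from the source)
    """
    n_bins = int(round(2 * window_ms / bin_ms))
    edges = np.linspace(-window_ms, window_ms, n_bins + 1)
    post = np.asarray(post_spikes, dtype=float)
    total = np.zeros(n_bins)
    for pre in pre_trains:
        pre = np.asarray(pre, dtype=float)
        # All pairwise differences; positive means pre precedes post.
        diffs = (post[:, None] - pre[None, :]).ravel()
        diffs = diffs[np.abs(diffs) < window_ms]
        counts, _ = np.histogram(diffs, bins=edges)
        total += counts
    # Average the per-synapse counts over the number of synapses.
    return edges, total / max(len(pre_trains), 1)
```

With this convention, the bins to the right of zero correspond to the black curve in panel (E) and the bins to the left of zero to the red curve. The pairwise-difference matrix grows with the product of the two spike counts, so long recordings would need a windowed search instead.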