A Learning Theory for Reward-Modulated Spike-Timing-Dependent Plasticity with Application to Biofeedback
Figure 8
Test of the validity of the analytically derived conditions 13–15 on the relationship between parameters for successful learning with reward-modulated STDP.
Predicted average weight changes (black bars) calculated from Equation 22 match in sign and magnitude the actual average weight changes (gray bars) in computer simulations, for 6 different experiments with different parameter settings (see Table 1). (A) Weight changes for synapses with . (B) Weight changes for synapses with
. Four cases where constraints 13–15 are not fulfilled are shaded in light gray. In all of these four cases the weights move into the opposite direction, i.e., a direction that decreases rewards.