A Learning Theory for Reward-Modulated Spike-Timing-Dependent Plasticity with Application to Biofeedback

doi:10.1371/journal.pcbi.1000180

A Learning Theory for Reward-Modulated Spike-Timing-Dependent Plasticity with Application to Biofeedback

Figure 6

Setup for reinforcement learning of spike times.

(A) Architecture. The trained neuron receives n input spike trains. The neuron μ* receives the same inputs plus additional inputs not accessible to the trained neuron. The reward is determined by the timing differences between the action potentials of the trained neuron and the neuron μ*. (B) A reward kernel with optimal offset from the origin of t_κ = −6.6 ms. The optimal offset for this kernel was calculated with respect to the parameters from computer simulation 1 in Table 1. Reward is positive if the neuron spikes around the target spike or somewhat later, and negative if the neuron spikes much too early.

doi: https://doi.org/10.1371/journal.pcbi.1000180.g006