A Learning Theory for Reward-Modulated Spike-Timing-Dependent Plasticity with Application to Biofeedback

doi:10.1371/journal.pcbi.1000180

A Learning Theory for Reward-Modulated Spike-Timing-Dependent Plasticity with Application to Biofeedback

Figure 5

Evolution of the dynamics of a recurrent network of 4000 LIF neurons during application of reward-modulated STDP.

(A) Distribution of the synaptic weights of excitatory synapses to 50 randomly chosen non-reinforced neurons, plotted for 4 different periods of simulated biological time during the simulation. The weights are averaged over 10 samples within these periods. The colors of the curves and the corresponding intervals are as follows: red (300–360 sec), green (600–660 sec), blue (900–960 sec), magenta (1140–1200 sec). (B) The distribution of average firing rates of the non-reinforced excitatory neurons in the circuit, plotted for the same time periods as in (A). The colors of the curves are the same as in (A). The distribution of the firing rates of the neurons in the circuit remains unchanged during the simulation, which covers 20 minutes of biological time. (C) Cross-correlogram of the spiking activity in the circuit, averaged over 200 pairs of non-reinforced neurons and over 60 s, with a bin size of 0.2 ms, for the period between 300 and 360 seconds of simulated biological time. It is calculated as the cross-covariance divided by the square root of the product of variances. (D) As in (C), but between seconds 1140 and 1200. (Separate plots of (B), (C), and (D) for two types of excitatory neurons that received different amounts of noise currents are given in Figure S1 and Figure S2.)

doi: https://doi.org/10.1371/journal.pcbi.1000180.g005