Reinforcement Learning Using a Continuous Time Actor-Critic Framework with Spiking Neurons

doi:10.1371/journal.pcbi.1003024

Figure 3

Actor neurons.

A: A ring of actor neurons with lateral connectivity (bottom; green: excitatory, red: inhibitory) embodies the agent's policy (top). B: Lateral connectivity. Each neuron codes for a distinct direction of motion. Each neuron forms excitatory synapses onto similarly tuned neurons and inhibitory synapses onto the other neurons. C: Activity of the actor neurons during an example trial. Neuron activity (vertical axis) is shown as a color map against time (horizontal axis). The lateral connectivity ensures that there is a single bump of activity at any given time. The black line shows the direction of motion (right axis; arrows in panel B) chosen as a result of the neural activity. D: Maze trajectory corresponding to the trial shown in C. The numbered position markers match the times marked in C.

doi: https://doi.org/10.1371/journal.pcbi.1003024.g003
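
The ring arrangement described in panels A to C can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the step-shaped weight profile, the parameter values (`w_exc`, `w_inh`, `width`), and the population-vector readout are all assumptions made for illustration; the caption only states that similarly tuned neurons excite each other, that other neurons are inhibited, and that the direction of motion results from the neural activity.

```python
import math


def preferred_directions(n):
    """Evenly spaced preferred directions (radians) for n ring neurons."""
    return [2 * math.pi * k / n for k in range(n)]


def lateral_weights(n, w_exc=1.0, w_inh=-0.5, width=math.pi / 3):
    """Hypothetical lateral weight matrix for the ring.

    Neurons whose preferred directions differ by less than `width`
    excite each other; all other pairs inhibit each other.
    Self-connections are left at zero. All values are illustrative.
    """
    theta = preferred_directions(n)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = abs(theta[i] - theta[j])
            d = min(d, 2 * math.pi - d)  # angular distance on the ring
            w[i][j] = w_exc if d < width else w_inh
    return w


def decode_direction(rates):
    """Assumed readout: population vector of the activity bump.

    Each neuron votes for its preferred direction, weighted by its rate.
    Returns an angle in [0, 2*pi).
    """
    theta = preferred_directions(len(rates))
    x = sum(r * math.cos(t) for r, t in zip(rates, theta))
    y = sum(r * math.sin(t) for r, t in zip(rates, theta))
    return math.atan2(y, x) % (2 * math.pi)


if __name__ == "__main__":
    n = 16
    w = lateral_weights(n)
    # A single symmetric bump of activity centred on neuron 4.
    rates = [0.0] * n
    rates[3], rates[4], rates[5] = 0.5, 1.0, 0.5
    print("decoded direction (rad):", decode_direction(rates))
```

With a symmetric bump centred on one neuron, the decoded angle coincides with that neuron's preferred direction, which is the sense in which a single bump of activity selects one direction of motion at a time.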