Reinforcement Learning Using a Continuous Time Actor-Critic Framework with Spiking Neurons
Figure 5
A: The acrobot swing-up task features a double pendulum, weakly actuated by a torque at the joint. The state of the pendulum is represented by the two joint angles and the corresponding angular velocities. The goal is to lift the tip above a certain height above the fixed axis of the pendulum, corresponding to the length of the segments. B: Goal reaching latency of 100
TD-LTP agents. The solid line shows the median of the latencies for each trial number, and the shaded area represents the 25th to 75th percentiles of the agents' performance. The red line represents a near-optimal strategy, obtained by the direct search method (see Models). The blue line shows the trajectory of one of the best among the 100 agents. The dotted line shows the time
limit after which a trial was interrupted if the agent did not reach the goal. C: Example trajectory of an agent successfully reaching the goal height (green line).
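The goal condition described in panel A can be made concrete with a small sketch. The code below computes the tip height of a double pendulum from its two joint angles and checks whether it exceeds the goal height; the angle convention (first angle measured from the downward vertical, second angle relative to the first segment), the equal segment lengths, and all function names are assumptions for illustration, not the paper's implementation.

```python
import math

def tip_height(theta1, theta2, l=1.0):
    """Height of the acrobot tip above the fixed axis.

    Assumed convention: theta1 is the angle of the first segment from
    the downward vertical, theta2 the angle of the second segment
    relative to the first; both segments have length l.
    """
    return -l * math.cos(theta1) - l * math.cos(theta1 + theta2)

def goal_reached(theta1, theta2, l=1.0):
    """Goal as stated in the caption: tip above a height equal to
    the segment length l, measured from the fixed axis."""
    return tip_height(theta1, theta2, l) >= l
```

For example, the resting state (both angles zero, pendulum hanging down) gives a tip height of -2l and does not satisfy the goal, while the fully inverted state (first angle pi, second angle zero) gives +2l and does.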