Reinforcement Learning Using a Continuous Time Actor-Critic Framework with Spiking Neurons
Figure 5
A: The acrobot swing-up task features a double pendulum, weakly actuated by a torque at the joint. The state of the pendulum is represented by the two joint angles and the corresponding angular velocities. The goal is to lift the tip above a certain height above the fixed axis of the pendulum, corresponding to the length of the segments. B: Goal reaching latency of 100
TD-LTP agents. The solid line shows the median of the latencies for each trial number, and the shaded area represents the 25th to 75th percentiles of the agents' performance. The red line represents a near-optimal strategy, obtained by the direct search method (see Models). The blue line shows the trajectory of one of the best among the 100 agents. The dotted line shows the time
limit after which a trial was interrupted if the agent did not reach the goal. C: Example trajectory of an agent successfully reaching the goal height (green line).
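The goal condition described in panel A can be made concrete with a small sketch. The code below computes the tip height of a double pendulum from its two joint angles and checks whether it exceeds the goal height; the angle convention (first angle measured from the downward vertical, second angle relative to the first segment), the equal segment lengths, and all function names are assumptions for illustration, not the paper's implementation.

```python
import math

def tip_height(theta1, theta2, l=1.0):
    """Height of the acrobot tip above the fixed axis.

    Assumed convention: theta1 is the angle of the first segment from
    the downward vertical, theta2 the angle of the second segment
    relative to the first; both segments have length l.
    """
    return -l * math.cos(theta1) - l * math.cos(theta1 + theta2)

def goal_reached(theta1, theta2, l=1.0):
    """Goal as stated in the caption: tip above a height equal to
    the segment length l, measured from the fixed axis."""
    return tip_height(theta1, theta2, l) >= l
```

For example, the resting state (both angles zero, pendulum hanging down) gives a tip height of -2l and does not satisfy the goal, while the fully inverted state (first angle pi, second angle zero) gives +2l and does.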