Learning with sparse reward in a gap junction network inspired by the insect mushroom body
Fig 6
Number of steps per episode in the Taxi-v3 task. (A) Number of steps per episode with the dynamic routing model.
The inset is a zoomed-in view of the main plot, showing convergence to around 20 steps. (B) Number of steps per episode taken by Q-learning in the same training configuration as in Fig 5. Q-learning does not converge.