Learning with sparse reward in a gap junction network inspired by the insect mushroom body
Fig 10
Result of using Q-learning to solve Voronoi World under a similar training configuration, that is, a maximum of 10000 steps per episode and sparse reward. Q-learning did not converge under this configuration. (A) Reward per episode. (B) Reward per step.
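As a rough illustration of how the two plotted quantities relate, the following is a minimal sketch of tabular Q-learning with a per-episode step cap and a sparse reward. It is not the authors' implementation: the environment is a hypothetical 1-D chain used as a stand-in for Voronoi World, and the function name `run_q_learning` and all hyperparameters other than the 10000-step cap are assumptions chosen for illustration.

```python
import random


def run_q_learning(n_states=8, n_episodes=50, max_steps=10000,
                   alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain (a stand-in, NOT Voronoi World).

    The agent starts at state 0 and receives reward 1 only on reaching the
    last state (sparse reward). Each episode is cut off after max_steps.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    # One row per state, two actions: 0 = move left, 1 = move right.
    q = [[0.0, 0.0] for _ in range(n_states)]
    reward_per_episode, reward_per_step = [], []

    for _ in range(n_episodes):
        state, total, steps = 0, 0.0, 0
        while steps < max_steps:
            # Epsilon-greedy action selection; ties are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            next_state = max(0, state - 1) if action == 0 else state + 1
            done = next_state == goal
            reward = 1.0 if done else 0.0  # sparse: nonzero only at the goal

            # Standard Q-learning update; no bootstrapping from a terminal state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])

            state = next_state
            total += reward
            steps += 1
            if done:
                break

        reward_per_episode.append(total)        # quantity of the kind shown in (A)
        reward_per_step.append(total / steps)   # quantity of the kind shown in (B)

    return reward_per_episode, reward_per_step
```

In this sketch, reward per step is the episode's total reward divided by the number of steps actually taken, so an episode that hits the step cap without reaching the goal contributes zero to both curves.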