Learning with sparse reward in a gap junction network inspired by the insect mushroom body
Fig 9
Cumulative episode reward in the Voronoi world task by the dynamic routing model.
Step limit: 10,000. Only the reward at the final step is fed to the model. Blue line: episode reward. Yellow line: 100-episode average reward. (A) Reward per episode. (B) Reward per step. The y-axes are linear between -10 and 10 and logarithmic outside this range.
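For readers who want to reproduce this kind of plot, the mixed axis scaling and the 100-episode average can be sketched as below. This is a minimal illustration, not the authors' plotting code. The function names `symlog` and `trailing_mean` are hypothetical, and so are the assumptions that the log region uses base 10 and that the average is a trailing window; the caption specifies neither.

```python
import math


def symlog(y, linthresh=10.0):
    """Map y to an axis position: linear inside [-linthresh, linthresh],
    base-10 logarithmic outside (assumed base; the caption does not say)."""
    if abs(y) <= linthresh:
        return y
    # Continue smoothly from the linear region: one extra `linthresh`
    # of axis length per decade beyond the threshold.
    stretched = linthresh * (1.0 + math.log10(abs(y) / linthresh))
    return math.copysign(stretched, y)


def trailing_mean(values, window=100):
    """Average of the most recent `window` values at each position.
    Early positions average over however many values exist so far."""
    out = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]  # drop the value that left the window
        out.append(total / min(i + 1, window))
    return out
```

Under these assumptions, an episode reward of 100 is drawn at the same height that 20 would occupy on a purely linear axis, so large outliers stay visible without flattening the region near zero. Plotting libraries usually offer an equivalent built-in axis scale, which would be preferable in practice.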