Learning with sparse reward in a gap junction network inspired by the insect mushroom body
Fig 5
Result of using Q-learning with a training configuration similar to that used for our model, i.e., a maximum of 100,000 steps per episode and sparse reward.
Blue line: reward per episode. Yellow line: 100-episode average reward. The left panel shows the reward per episode and the right panel the reward per step. Note that the y-axis scale differs from that of Fig 4. The average episode reward suggests that Q-learning's performance decreased in the early episodes of training and failed to converge.
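As a point of reference for the baseline described above, the sketch below shows a minimal tabular Q-learning loop with a sparse reward and a per-episode step cap. It is a hypothetical illustration on a simple 1-D chain environment, not the paper's actual task or parameters; all names and values here are assumptions.

```python
import random

def q_learning(n_states=10, episodes=200, max_steps=100,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    # Hypothetical toy setup: agent starts at state 0 and must reach the
    # rightmost state; reward is sparse (1 only on reaching the goal).
    rng = random.Random(seed)
    goal = n_states - 1
    # Q[s][a]: action 0 = move left, action 1 = move right
    Q = [[0.0, 0.0] for _ in range(n_states)]
    episode_rewards = []
    for _ in range(episodes):
        s, total = 0, 0.0
        for _ in range(max_steps):  # cap on steps per episode
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2 = max(0, s - 1) if a == 0 else min(goal, s + 1)
            r = 1.0 if s2 == goal else 0.0  # sparse reward signal
            # one-step Q-learning update
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            total += r
            s = s2
            if s == goal:
                break
        episode_rewards.append(total)
    return Q, episode_rewards
```

Under sparse reward, the update only propagates value backward from the goal after an episode happens to reach it, which is one reason Q-learning can stagnate early in training, as the figure shows.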