Learning with sparse reward in a gap junction network inspired by the insect mushroom body
Fig 5
Result of using Q-learning with a training configuration similar to that used for our model, i.e., a maximum of 100,000 steps per episode and sparse reward.
Blue line: reward per episode. Yellow line: 100-episode average reward. The left panel shows the reward per episode and the right panel the reward per step. Note that the y-axis scale differs from that of Fig 4. The average episode reward suggests that Q-learning's performance decreased in the early episodes of training and failed to converge.
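As a point of reference for the baseline described above, the sketch below shows a minimal tabular Q-learning loop with a sparse reward and a per-episode step cap. It is a hypothetical illustration on a simple 1-D chain environment, not the paper's actual task or parameters; all names and values here are assumptions.

```python
import random

def q_learning(n_states=10, episodes=200, max_steps=100,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    # Hypothetical toy setup: agent starts at state 0 and must reach the
    # rightmost state; reward is sparse (1 only on reaching the goal).
    rng = random.Random(seed)
    goal = n_states - 1
    # Q[s][a]: action 0 = move left, action 1 = move right
    Q = [[0.0, 0.0] for _ in range(n_states)]
    episode_rewards = []
    for _ in range(episodes):
        s, total = 0, 0.0
        for _ in range(max_steps):  # cap on steps per episode
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2 = max(0, s - 1) if a == 0 else min(goal, s + 1)
            r = 1.0 if s2 == goal else 0.0  # sparse reward signal
            # one-step Q-learning update
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            total += r
            s = s2
            if s == goal:
                break
        episode_rewards.append(total)
    return Q, episode_rewards
```

Under sparse reward, the update only propagates value backward from the goal after an episode happens to reach it, which is one reason Q-learning can stagnate early in training, as the figure shows.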