Learning with sparse reward in a gap junction network inspired by the insect mushroom body
Fig 3
The learned state network of the full taxi domain task.
The network consists of four disconnected graphs, one for each destination (the destination is fixed within an episode). Each graph consists of four subgraphs, one for each possible location of the passenger; the fifth location of the passenger, at their destination, ends the episode and so is not included. Each subgraph represents the topology of locations in the environment.
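The structure described in this caption can be checked with a minimal sketch that enumerates the state graph directly rather than learning it. The sketch assumes the standard 5 x 5 taxi layout (landmark positions and wall positions as in the commonly used version of the task), treats transitions as undirected edges, and uses hypothetical names (`LANDMARKS`, `WALLS`, `neighbours`, `connected_components`) that do not come from the paper. It illustrates the topology only and is not the authors' network model.

```python
from collections import Counter, deque
from itertools import product

SIZE = 5
# Assumed landmark cells (row, col) for R, G, Y, B in the standard layout.
LANDMARKS = [(0, 0), (0, 4), (4, 0), (4, 3)]
IN_TAXI = 4  # passenger index meaning "riding in the taxi"
# Assumed walls: (row, col) blocks east-west moves between col and col + 1.
WALLS = {(0, 1), (1, 1), (3, 0), (4, 0), (3, 2), (4, 2)}


def build_states():
    """All non-terminal states (row, col, passenger, destination)."""
    return [
        (r, c, p, d)
        for r, c, p, d in product(range(SIZE), range(SIZE), range(5), range(4))
        if p != d  # passenger at the destination ends the episode
    ]


def neighbours(state):
    """States reachable in one step; edges are treated as undirected."""
    row, col, passenger, dest = state
    if row > 0:
        yield (row - 1, col, passenger, dest)
    if row < SIZE - 1:
        yield (row + 1, col, passenger, dest)
    if col > 0 and (row, col - 1) not in WALLS:
        yield (row, col - 1, passenger, dest)
    if col < SIZE - 1 and (row, col) not in WALLS:
        yield (row, col + 1, passenger, dest)
    # Pick-up links a "waiting at landmark" subgraph to the "in taxi" one.
    if passenger != IN_TAXI and (row, col) == LANDMARKS[passenger]:
        yield (row, col, IN_TAXI, dest)
    # The reverse edge: setting the passenger down at a non-destination landmark.
    if passenger == IN_TAXI:
        for index, cell in enumerate(LANDMARKS):
            if (row, col) == cell and index != dest:
                yield (row, col, index, dest)


def connected_components(states):
    """Breadth-first search over the undirected state graph."""
    seen, components = set(), []
    for start in states:
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            for nxt in neighbours(queue.popleft()):
                if nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


components = connected_components(build_states())
for component in components:
    destinations = {s[3] for s in component}
    per_passenger = Counter(s[2] for s in component)
    print(sorted(destinations), len(component), sorted(per_passenger.items()))
```

Under these assumptions each connected component shares a single destination, because no action changes the destination within an episode, and the passenger index partitions each component into four copies of the 25-cell grid that are joined only at the landmark cells.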