Optimal Behavioral Hierarchy
A. Vertices represent states (green = start, red = goal), and edges represent feasible transitions. B. Mean performance of three hierarchical reinforcement learning agents in the rooms task. Inset: Results based on four graph decompositions. Blue: decomposition from panel C. Purple: decomposition from panel D. Black: entire graph treated as one region. Orange: decomposition in which the orange vertices in panel A are segregated out as singleton regions. Model evidence is on a log scale (data range to ). Search time denotes the expected number of trial-and-error attempts needed to discover the solution to a randomly drawn task or subtask (geometric mean; range 685 to 65947; tick mark indicates the origin). Codelength signifies the number of bits required to encode the entire dataset under a Shannon code (range to ). Note that the abscissa represents both model evidence and codelength: model evidence increases from left to right, and codelength increases from right to left. C. Optimal decomposition. D. An alternative decomposition.
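The shared abscissa for model evidence and codelength can be read through the standard identity that an idealized Shannon code assigns data of probability p a length of -log2(p) bits, so higher evidence means a shorter code. The following is a minimal sketch of that relationship and of a geometric mean of the kind used for search time. The function names and all numerical values are hypothetical placeholders chosen for illustration, not results from the figure.

```python
import math


def shannon_codelength_bits(log_evidence_nats):
    """Idealized Shannon codelength in bits for data whose
    natural-log probability (log evidence) is given."""
    return -log_evidence_nats / math.log(2)


def geometric_mean(values):
    """Geometric mean of strictly positive values, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical log-evidence values (in nats) for two decompositions.
log_evidence = {"decomposition_1": -1200.0, "decomposition_2": -1350.0}

# Convert each log evidence to a codelength in bits.
codelength = {
    name: shannon_codelength_bits(value) for name, value in log_evidence.items()
}

# Maximizing evidence and minimizing codelength select the same decomposition.
best_by_evidence = max(log_evidence, key=log_evidence.get)
best_by_codelength = min(codelength, key=codelength.get)

# Hypothetical per-task search times (expected number of attempts).
mean_search_time = geometric_mean([10.0, 1000.0])
```

Because the mapping from evidence to codelength is monotonically decreasing, a single axis can carry both quantities, with the two scales running in opposite directions.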