Fig 1.
State space of the 4-disk Tower of Hanoi (ToH), which has 81 states.
Each state corresponds to a unique configuration of the disks on the three pegs, and edges encode the allowed transitions between states. The task is to reach the configuration associated with a randomly selected target state (for example, 2201 in this figure). Warmer colors indicate higher values of the value function (see Sec. 2.1 for discussion).
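The state space in Fig 1 can be reproduced with a short script. The sketch below assumes a hypothetical encoding in which the i-th digit of a state label gives the peg (0, 1 or 2) holding disk i, with disk 0 the smallest. The captions do not fix this convention. It builds the 81-state move graph and, as one simple illustrative choice of value function (not necessarily the one used in Sec. 2.1), sets V(s) = GAMMA^d(s). Here d(s) is the minimum number of moves from s to the target, and GAMMA is a hypothetical discount factor.

```python
from collections import deque
from itertools import product

N_DISKS = 4
PEGS = (0, 1, 2)


def top_disk(state, peg):
    """Return the smallest disk on `peg` (the one on top), or None if empty."""
    on_peg = [disk for disk, p in enumerate(state) if p == peg]
    return min(on_peg) if on_peg else None


def successors(state):
    """Yield every state reachable from `state` with one legal move."""
    for src in PEGS:
        disk = top_disk(state, src)
        if disk is None:
            continue
        for dst in PEGS:
            top = top_disk(state, dst)
            # A disk may only be placed on an empty peg or on a larger disk.
            if dst != src and (top is None or top > disk):
                nxt = list(state)
                nxt[disk] = dst
                yield tuple(nxt)


def distances_to(graph, target):
    """Breadth-first search from the target. Moves are reversible, so this
    gives the minimum number of moves from every state to the target."""
    dist = {target: 0}
    queue = deque([target])
    while queue:
        s = queue.popleft()
        for nxt in graph[s]:
            if nxt not in dist:
                dist[nxt] = dist[s] + 1
                queue.append(nxt)
    return dist


states = list(product(PEGS, repeat=N_DISKS))            # 3**4 = 81 states
graph = {s: list(successors(s)) for s in states}
n_edges = sum(len(v) for v in graph.values()) // 2      # each move is reversible

GAMMA = 0.9                                              # hypothetical discount factor
target = tuple(int(c) for c in "2201")                   # assumed digit encoding
dist = distances_to(graph, target)
value = {s: GAMMA ** d for s, d in dist.items()}         # illustrative V(s)

print(len(states), n_edges)  # 81 120
```

The three "corner" states (all disks on one peg) have two neighbors and every other state has three, which gives the 120 edges drawn in the figure.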
Fig 2.
Recursive structure of the state space of the 4-disk ToH (81 states).
Each state corresponds to a unique configuration of the disks on the three pegs, and edges encode the allowed transitions between states. The state space can be visualized as three triangular substructures. The states that connect different triangles are critical for transitioning from one triangle to another.
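One way to see the recursive structure in Fig 2 is to group states by the peg of the largest disk. Each group is a 27-state copy of the 3-disk graph. The only edges between groups are moves of the largest disk. The sketch below uses the same hypothetical encoding as before (digit i is the peg of disk i, disk N_DISKS - 1 the largest). It recovers the three triangles and the critical bridge states that join them.

```python
from itertools import product

N_DISKS = 4
PEGS = (0, 1, 2)
LARGEST = N_DISKS - 1  # assumed: disk index grows with disk size


def top_disk(state, peg):
    """Return the smallest disk on `peg`, or None if the peg is empty."""
    on_peg = [disk for disk, p in enumerate(state) if p == peg]
    return min(on_peg) if on_peg else None


def successors(state):
    """Yield every state reachable from `state` with one legal move."""
    for src in PEGS:
        disk = top_disk(state, src)
        if disk is None:
            continue
        for dst in PEGS:
            top = top_disk(state, dst)
            if dst != src and (top is None or top > disk):
                nxt = list(state)
                nxt[disk] = dst
                yield tuple(nxt)


states = list(product(PEGS, repeat=N_DISKS))

# Each triangle: all states whose largest disk sits on a given peg.
triangles = {p: [s for s in states if s[LARGEST] == p] for p in PEGS}

# Edges that move the largest disk are the only links between triangles;
# their endpoints are the critical (bridge) states.
bridge_edges = set()
for s in states:
    for t in successors(s):
        if s[LARGEST] != t[LARGEST]:
            bridge_edges.add(frozenset((s, t)))
bridge_states = sorted({s for edge in bridge_edges for s in edge})

print(len(bridge_edges), bridge_states)
# 3 [(0, 0, 0, 1), (0, 0, 0, 2), (1, 1, 1, 0), (1, 1, 1, 2), (2, 2, 2, 0), (2, 2, 2, 1)]
```

The largest disk can only move when all smaller disks are stacked on the third peg. That is why each pair of triangles is joined by exactly one edge, and each triangle contains exactly two bridge states.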
Fig 3.
Experimental interface used by human participants in the training task of Experiment 5.
Fig 4.
Box plots displaying percentage scores for both training (a) and transfer (b) tasks.
Within each box plot, the median is represented by the red horizontal line, while the lower and upper edges of the box signify the 25th and 75th percentiles, respectively. Whiskers extend to encompass the most extreme data points that are not classified as outliers, and individual outliers are plotted using the symbol ‘+’.
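The box-plot summary described above can be computed directly. The caption does not state the outlier rule. The sketch below assumes the common convention that points more than 1.5 × IQR beyond the box edges are outliers. NumPy's default percentile interpolation may also differ slightly from the software used to draw Fig 4. The input scores are made-up illustrative values, not data from the study.

```python
import numpy as np


def box_plot_stats(scores, whisker=1.5):
    """Median, quartiles, whisker ends and outliers for one box plot.

    Assumes points beyond q1 - whisker*IQR or q3 + whisker*IQR are outliers.
    """
    x = np.asarray(scores, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - whisker * iqr, q3 + whisker * iqr
    inside = x[(x >= low_fence) & (x <= high_fence)]
    return {
        "median": median,                   # red line
        "q1": q1,                           # lower box edge
        "q3": q3,                           # upper box edge
        "whisker_low": inside.min(),        # most extreme non-outliers
        "whisker_high": inside.max(),
        "outliers": x[(x < low_fence) | (x > high_fence)].tolist(),  # '+' markers
    }


# Hypothetical percentage scores, for illustration only.
stats = box_plot_stats([40, 55, 60, 62, 65, 70, 72, 75, 80, 5])
print(stats["median"], stats["q1"], stats["q3"], stats["outliers"])
```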
Table 1.
Percentage of successful trials in the training and transfer tasks.
Fig 5.
Box plots displaying positive percentage scores for both training (a) and transfer (b) tasks.
Fig 6.
Bar plots displaying the mean percentage score for each trial in both the training (a) and transfer (b) tasks.
Fig 7.
IRL plots in training tasks for all states.
IRL plots displaying the learned human rewards in the training tasks for all states, using trajectory datasets from each experiment (trials 6 to 10 of each participant) that include (a) all available trajectories and (b) only successful trajectories, where a trajectory is successful if it reaches the target state. Red denotes high rewards (close to 1) and dark blue denotes rewards close to 0.
Fig 8.
IRL plots in transfer tasks for all states.
IRL plots displaying the learned human rewards in the transfer tasks for all states, using trajectory datasets from each experiment that include (a) all available trajectories and (b) only successful trajectories, where a trajectory is successful if it reaches the target state. Red denotes high rewards (close to 1) and dark blue denotes rewards close to 0.
Fig 9.
IRL plots in training tasks for a subset of 8 states.
IRL plots displaying the learned human rewards in the training tasks for a subset of 8 states, using trajectory datasets from each experiment (trials 6 to 10 of each participant) that include (a) all available trajectories and (b) only successful trajectories, where a trajectory is successful if it reaches the target state. Red denotes high rewards (close to 1) and dark blue denotes rewards close to 0.
Fig 10.
IRL plots in transfer tasks for a subset of 8 states.
IRL plots displaying the learned human rewards in the transfer tasks for a subset of 8 states, using trajectory datasets from each experiment that include (a) all available trajectories and (b) only successful trajectories, where a trajectory is successful if it reaches the target state. Red denotes high rewards (close to 1) and dark blue denotes rewards close to 0.
Table 2.
AIC and BIC values (normalized by the number of observations) for different models allowing non-zero rewards for all 81 states.
Table 3.
AIC and BIC values (normalized by the number of observations) for different models allowing non-zero rewards for a subset of 8 states.
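For reference, the normalized criteria in Tables 2 and 3 follow the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, each divided by the number of observations n. Here k is the number of free parameters and L the maximized likelihood. Lower values indicate a better trade-off between fit and complexity. The sketch below implements these formulas. The usage numbers are hypothetical. The assumption that every state allowed a non-zero reward contributes one free parameter (k = 81 versus k = 8) is illustrative only and is not stated in the captions.

```python
import math


def normalized_aic(log_likelihood, k, n):
    """AIC = 2k - 2 ln L, divided by the number of observations n."""
    return (2 * k - 2 * log_likelihood) / n


def normalized_bic(log_likelihood, k, n):
    """BIC = k ln n - 2 ln L, divided by the number of observations n."""
    return (k * math.log(n) - 2 * log_likelihood) / n


# Hypothetical fits: k = number of states allowed a non-zero reward (assumed).
n_obs = 400
for name, log_lik, k in [("all 81 states", -480.0, 81), ("subset of 8", -500.0, 8)]:
    print(f"{name}: AIC/n = {normalized_aic(log_lik, k, n_obs):.3f}, "
          f"BIC/n = {normalized_bic(log_lik, k, n_obs):.3f}")
```

Because the ln n penalty in BIC exceeds AIC's factor of 2 whenever n > e^2 ≈ 7.4, BIC penalizes the 81-parameter model more heavily than AIC does.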