Fostering human learning in sequential decision-making: Understanding the role of evaluative feedback

doi:10.1371/journal.pone.0303949

Fig 1.

State space of a 4-disk ToH with 81 states.

Each state corresponds to a unique configuration of the disks on three pegs, and edges encode allowed transitions between states. The task is to reach the configuration associated with a randomly selected target state (for example, 2201 in this figure). Warmer colors indicate higher values of the value function (see Sec. 2.1 for discussion).
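
The state space described in this caption can be reproduced with a short script. The following is a minimal sketch, not the authors' code. It assumes each state is a tuple of peg indices with position 0 holding the smallest disk (the digit order of labels such as 2201 is not specified here), and it uses the shortest-path distance to the target as a stand-in for the value function, whose exact definition is given in Sec. 2.1. The names `neighbors` and `distances_to` are hypothetical.

```python
from collections import deque
from itertools import product

N_DISKS, N_PEGS = 4, 3


def neighbors(state):
    """Return all states reachable in one legal move.

    Assumed encoding: state[i] is the peg of disk i, disk 0 being the smallest.
    """
    result = []
    for disk, peg in enumerate(state):
        # A disk can move only if no smaller disk sits on top of it.
        if any(state[smaller] == peg for smaller in range(disk)):
            continue
        for dest in range(N_PEGS):
            # It may land only on a different peg that holds no smaller disk.
            if dest == peg or any(state[smaller] == dest for smaller in range(disk)):
                continue
            result.append(state[:disk] + (dest,) + state[disk + 1:])
    return result


def distances_to(target):
    """Breadth-first search: fewest moves between every state and `target`.

    Moves are reversible, so the distance from the target equals the distance to it.
    """
    dist = {target: 0}
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for nxt in neighbors(current):
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist


states = list(product(range(N_PEGS), repeat=N_DISKS))
n_edges = sum(len(neighbors(s)) for s in states) // 2  # each edge is counted twice
target = tuple(int(c) for c in "2201")  # example target label from the caption
dist = distances_to(target)
print(len(states), "states,", n_edges, "edges")
```

A color map like the one in the figure could then be driven by any quantity that decreases with `dist`, which is an assumption about the shape of the value function rather than its definition.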

Fig 2.

Recursive structure of the state space of a 4-disk ToH with 81 states.

Each state corresponds to a unique configuration of the disks on three pegs, and edges encode allowed transitions between states. The state space can be visualized as comprising three triangular structures. The states that connect different triangular structures are critical for transitioning between them.
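
The recursive structure can be checked numerically. The sketch below is illustrative only and rests on two assumptions: states are tuples of peg indices with the largest disk in the last position, and the three top-level triangles correspond to the peg holding the largest disk. Under those assumptions, the edges that cross between triangles are exactly the moves of the largest disk, and their endpoints are the connecting states. The names `triangles`, `bridges`, and `critical_states` are hypothetical.

```python
from itertools import product

N_DISKS, N_PEGS = 4, 3


def neighbors(state):
    """One-move successors; state[i] is the peg of disk i (disk 0 is the smallest)."""
    result = []
    for disk, peg in enumerate(state):
        # Blocked if a smaller disk sits on the same peg.
        if any(state[smaller] == peg for smaller in range(disk)):
            continue
        for dest in range(N_PEGS):
            if dest == peg or any(state[smaller] == dest for smaller in range(disk)):
                continue
            result.append(state[:disk] + (dest,) + state[disk + 1:])
    return result


states = list(product(range(N_PEGS), repeat=N_DISKS))

# Group states by the peg of the largest disk: one group per triangle (assumed).
triangles = {peg: [s for s in states if s[-1] == peg] for peg in range(N_PEGS)}

# Edges whose endpoints lie in different triangles, stored as unordered pairs.
bridges = {
    frozenset((s, t))
    for s in states
    for t in neighbors(s)
    if s[-1] != t[-1]
}
critical_states = {s for edge in bridges for s in edge}

print({peg: len(group) for peg, group in triangles.items()})
print(len(bridges), "connecting edges,", len(critical_states), "connecting states")
```

Applying the same grouping to the second-largest disk within one triangle splits it into three smaller triangles, which is the sense in which the structure is recursive.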

Fig 3.

Experimental interface for the human subject participating in the training task of Experiment 5.

Fig 4.

Box plots displaying percentage scores for both training (a) and transfer (b) tasks.

Within each box plot, the median is represented by the red horizontal line, while the lower and upper edges of the box signify the 25th and 75th percentiles, respectively. Whiskers extend to encompass the most extreme data points that are not classified as outliers, and individual outliers are plotted using the symbol ‘+’.
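
The quantities drawn in such a box plot can be computed directly. The sketch below is a hedged illustration: the caption does not state the outlier rule, so the common 1.5 × IQR fence is assumed, and the percentile interpolation (`method="inclusive"`) is one of several conventions and may differ from the one used to produce the figure. The function name `box_plot_summary` and the sample scores are hypothetical.

```python
from statistics import median, quantiles


def box_plot_summary(scores, whisker=1.5):
    """Median, quartiles, whisker ends, and outliers for one box.

    Assumes the 1.5 * IQR outlier rule, which the caption does not specify.
    """
    data = sorted(scores)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whisker * iqr, q3 + whisker * iqr
    # Whiskers reach the most extreme points that are still inside the fences.
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }


# Made-up scores for illustration only, not data from the study.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```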

Table 1.

Percentage of successful trials in the training and transfer tasks.

Fig 5.

Box plots displaying positive percentage scores for both training (a) and transfer (b) tasks.

Fig 6.

Bar plots displaying the mean percentage scores for different trials for both training (a) and transfer (b) tasks.

Fig 7.

IRL plots in training tasks for all states.

IRL plots displaying learned human rewards in the training tasks for all states, using trajectory datasets (from trials 6-10 for each participant) from each experiment that encompass (a) all available trajectories and (b) only successful trajectories, where success is defined as reaching the target state. Red represents high rewards close to 1, and dark blue represents rewards close to 0.

Fig 8.

IRL plots in transfer tasks for all states.

IRL plots displaying learned human rewards in the transfer tasks for all states, using trajectory datasets from each experiment that encompass (a) all available trajectories and (b) only successful trajectories, where success is defined as reaching the target state. Red represents high rewards close to 1, and dark blue represents rewards close to 0.

Fig 9.

IRL plots in training tasks for a subset of 8 states.

IRL plots displaying learned human rewards in the training tasks for a subset of 8 states, using trajectory datasets (from trials 6-10 for each participant) from each experiment that encompass (a) all available trajectories and (b) only successful trajectories, where success is defined as reaching the target state. Red represents high rewards close to 1, and dark blue represents rewards close to 0.

Fig 10.

IRL plots in transfer tasks for a subset of 8 states.

IRL plots displaying learned human rewards in the transfer tasks for a subset of 8 states, using trajectory datasets from each experiment that encompass (a) all available trajectories and (b) only successful trajectories, where success is defined as reaching the target state. Red represents high rewards close to 1, and dark blue represents rewards close to 0.

Table 2.

AIC and BIC values (normalized by the number of observations) for different models allowing non-zero rewards for all 81 states.
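
For reference, the normalized criteria can be computed as follows. This is a sketch under assumptions: it uses the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L and divides each by the number of observations n. How the parameter count k and the observations are counted in these tables is not stated in the caption, so the function name `normalized_aic_bic` and the numbers in the example are hypothetical.

```python
import math


def normalized_aic_bic(log_likelihood, n_params, n_obs):
    """Return (AIC / n_obs, BIC / n_obs) from a maximized log-likelihood.

    Standard definitions are assumed:
        AIC = 2k - 2 ln L,   BIC = k ln n - 2 ln L.
    """
    aic = 2 * n_params - 2 * log_likelihood
    bic = n_params * math.log(n_obs) - 2 * log_likelihood
    return aic / n_obs, bic / n_obs


# Made-up inputs for illustration: 8 free reward parameters, 200 observed moves.
aic_per_obs, bic_per_obs = normalized_aic_bic(
    log_likelihood=-100.0, n_params=8, n_obs=200
)
print(aic_per_obs, bic_per_obs)
```

Lower values indicate a better trade-off between fit and model size. BIC penalizes each extra parameter more heavily than AIC once n exceeds about 7 (ln n > 2), which matters when comparing an 81-parameter model against an 8-parameter one.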

Table 3.

AIC and BIC values (normalized by the number of observations) for different models allowing non-zero rewards for a subset of 8 states.
