Novelty is not surprise: Human exploratory and adaptive behavior in sequential decision-making

doi:10.1371/journal.pcbi.1009070

Fig 5

Model comparison of model-based (MB, blue bars), model-free (MF, red bars), and hybrid algorithms (Hyb and SurNoR, purple bars).

Exploratory behavior is induced by optimistic initialization (+OI), uncertainty-seeking (+U), unbiased random action choices (RC), or novelty-seeking (+N); for example, a model-based algorithm with novelty-seeking is denoted as MB+N. SurNoR and the model-free or hybrid algorithms annotated with ‘+S’ use surprise to modulate the learning rate of the model-free TD learner; SurNoR and all algorithms annotated with ‘+S’ also use surprise modulation during model building (see Methods).

A. Difference in log-evidence (with respect to RC) for the algorithms for all episodes of both blocks (left panel), the 1st episode of block 1 (middle panel), and the 1st episode of block 2 (right panel). High values indicate good performance; differences greater than 3 or 10 are considered significant or strongly significant, respectively (see Methods); a value of 0 corresponds to random action choices (RC). The random initialization of the parameter optimization procedure introduces noise, and the small error bars indicate the standard error of the mean over different optimization runs (Methods, statistical model analysis).

B. The expected posterior model probability [52, 53] given the whole dataset (Methods), under a random-effects assumption on the models.

C. Accuracy rate of actions predicted by SurNoR (left scale and purple bars: mean and standard error of the mean across participants) and the average uncertainty of SurNoR (right scale and dashed grey curve: mean entropy of action choice probabilities and standard error of the mean across participants).
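The quantities named in panels A and C can be illustrated with a minimal sketch. It is not the authors' analysis code: the log-evidence values and function names below are hypothetical, the thresholds of 3 and 10 are taken from the caption, and the use of natural-log entropy is an assumption, since the caption does not state the unit.

```python
import math

def classify_log_evidence_difference(delta):
    """Label a log-evidence difference using the caption's thresholds (3 and 10)."""
    if delta > 10:
        return "strongly significant"
    if delta > 3:
        return "significant"
    return "not significant"

def entropy(probs):
    """Shannon entropy of one action-choice distribution (nats: an assumption)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def mean_and_sem(values):
    """Mean and standard error of the mean (sample standard deviation / sqrt(n))."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)

# Hypothetical log-evidence values, for illustration only (not from the paper).
log_evidence = {"RC": -1000.0, "MF+N": -996.0, "MB+N": -985.0}

# Panel A plots differences with respect to RC, so RC itself sits at 0.
differences = {name: value - log_evidence["RC"] for name, value in log_evidence.items()}

for name, delta in differences.items():
    print(name, delta, classify_log_evidence_difference(delta))
```

With `entropy`, a uniform choice over four actions gives the maximum value `log(4)`, and a deterministic choice gives 0; averaging per-participant entropies with `mean_and_sem` corresponds to the kind of summary shown by the dashed grey curve in panel C.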

doi: https://doi.org/10.1371/journal.pcbi.1009070.g005