Humans combine value learning and hypothesis testing strategically in multi-dimensional probabilistic reward learning
Fig 2
Participants’ behavior in the “build your own icon” task.
(A, B): Performance and choices over the course of a game, by game type. (A) Participants’ average probability of reward (based on the number of rewarding features in their configured stimuli) over the course of 1D-, 2D- and 3D-relevant games (left, middle and right columns). Red and blue curves represent “known” and “unknown” conditions, respectively. For all game types, the chance reward probability is 0.4 and the maximum reward probability is 0.8. Shading (ribbons around the lines) represents ±1 s.e.m. across participants. ** p < .01. For grouping of these learning curves by task complexity, see S1 Fig. (B) Same as in (A), but for the number of features selected. (C, D): Responses to post-game questions regarding the rewarding features in each game condition. (C) Average number of correctly identified rewarding features; (D) Average number of false positive responses, i.e., falsely identifying an irrelevant dimension as relevant. *** p < .001. Error bars represent ±1 s.e.m. across participants.
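
For readers who want to reproduce ribbons of this kind, the following is a minimal sketch of how a mean ±1 s.e.m. band across participants could be computed for each trial. It is not the authors' analysis code: the function names `mean_and_sem` and `ribbon` and the example values are hypothetical, and it assumes one value per participant per trial and the sample standard deviation (n − 1 denominator).

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, s.e.m.) given one value per participant.

    Assumption: s.e.m. = sample standard deviation / sqrt(number of participants).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two participants to estimate s.e.m.")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


def ribbon(per_participant_curves):
    """Compute (lower, mean, upper) per trial for a mean +/- 1 s.e.m. band.

    per_participant_curves: list of equal-length lists, one list per
    participant, one entry per trial (hypothetical layout).
    """
    band = []
    # zip(*...) regroups the data so each tuple holds all participants' values for one trial
    for trial_values in zip(*per_participant_curves):
        mean, sem = mean_and_sem(trial_values)
        band.append((mean - sem, mean, mean + sem))
    return band


# Hypothetical reward probabilities: three participants, two trials each
example_curves = [[0.4, 0.5], [0.6, 0.7], [0.8, 0.9]]
example_band = ribbon(example_curves)
```

Each tuple in `example_band` gives the lower edge, the mean, and the upper edge of the band at one trial; the same `mean_and_sem` helper would apply to the error bars in panels (C) and (D) under the same assumptions.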