Metacognitive efficiency in learned value-based choice
Fig 2
Two-outcome reversal learning task.
(A) Participants chose between two bandits (generated by an LLM: OpenAI ChatGPT, model o4-mini; see OpenAI (2025)) and, before receiving feedback on the outcome, reported on a continuous scale their confidence that their choice was correct. (B) The bandits dispensed rewards (the dots) drawn from two normal distributions with means of 40 and 60, which alternated every trials. The variance of both options was 8 in the low-variance condition and 16 in the high-variance condition. Each participant completed both conditions in counterbalanced order, with a time gap of days.
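The reward schedule in panel (B) can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not the authors' task code: the reversal period is not given in this caption, so `reversal_every` is a hypothetical parameter, and `simulate_rewards` is an illustrative name. The caption reports variances (8 or 16), so the sketch converts them to standard deviations before sampling, because NumPy's `normal` takes a standard deviation as `scale`.

```python
import numpy as np


def simulate_rewards(n_trials, variance, reversal_every, seed=0):
    """Sketch of a two-armed reversal bandit schedule (hypothetical helper).

    reversal_every is an assumed parameter: the reversal period is not
    stated in the caption.
    """
    rng = np.random.default_rng(seed)
    # The caption reports a variance; NumPy's normal() expects a standard deviation.
    sd = np.sqrt(variance)
    means = np.empty((n_trials, 2))
    for t in range(n_trials):
        block = t // reversal_every
        # Even blocks: bandit 0 has mean 40, bandit 1 has mean 60; odd blocks swap them.
        means[t] = (40.0, 60.0) if block % 2 == 0 else (60.0, 40.0)
    # One reward per bandit per trial, drawn around that trial's means.
    rewards = rng.normal(loc=means, scale=sd)
    return means, rewards
```

Calling `simulate_rewards(200, 16, 50)` would, for example, produce a high-variance schedule with four blocks, in which the better bandit switches after trials 50, 100 and 150.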