Individual differences in the perception of probability

In recent studies of humans estimating non-stationary probabilities, estimates appear to be unbiased on average, across the full range of probability values to be estimated. This finding is surprising given that experiments measuring probability estimation in other contexts have often identified conservatism: individuals tend to overestimate low probability events and underestimate high probability events. In other contexts, repulsive biases have also been documented, with individuals producing judgments that tend toward extreme values instead. Using extensive data from a probability estimation task that produces unbiased performance on average, we find substantial biases at the individual level; we document the coexistence of both conservative and repulsive biases in the same experimental context. Individual biases persist despite extensive experience with the task, and are also correlated with other behavioral differences, such as individual variation in response speed and adjustment rates. We conclude that the rich computational demands of our task give rise to a variety of behavioral patterns, and that the apparent unbiasedness of the pooled data is an artifact of the aggregation of heterogeneous biases.

Subjects' estimates of these probabilities and their response times were recorded for each ring draw, and subjects were given monetary rewards for the accuracy of their estimates. Notably, unlike other trial-by-trial experimental procedures, the task was self-paced: subjects clicked a button (labeled 'NEXT') to request the next ring draw. The reported probabilities were measured using each subject's slider position at the time the 'NEXT' button was clicked; response times denote the time elapsed between clicks requesting the next ring. We have 10,000 observations for each subject, which gives us enough data to conclude with some confidence that any systematic patterns we identify are reliable. In contrast to the distributions of probabilities reported by individual subjects, the underlying probabilities were drawn uniformly from the unit interval; deviations from the uniform distribution in the underlying probabilities reflect finite sampling. The distributions of reported probabilities differ systematically for each subject, and thus do not represent inherent noise across individual trials, which would wash out over such a large sample. These differences foreshadow the more formal results of both conservatism and repulsive biases, discussed in the next section.

In order to formally test for and characterize potential biases in these data, we turn to a model comparison exercise. We consider a family of computational models of subjects' reports. Letting the reported probability be denoted by R, we estimate a log-odds model of the form

logit(R) = α + β logit(B) + ε,   where logit(p) = log(p/(1 − p)),

where B is the Bayesian estimate, and the random noise in subjects' estimates is assumed to be normally distributed with zero mean and variance σ² across trials, ε ∼ N(0, σ²).
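As an illustration, the log-odds model can be simulated directly. The following sketch uses hypothetical parameter values (not estimates from our data) to show how a reported probability R is generated from a Bayesian estimate B:

```python
import numpy as np

def logit(p):
    """Log-odds transform, log(p / (1 - p))."""
    return np.log(p / (1.0 - p))

def inv_logit(x):
    """Inverse log-odds (logistic) transform."""
    return 1.0 / (1.0 + np.exp(-x))

def simulate_report(B, alpha, beta, sigma, rng):
    """Draw a reported probability R from the model
    logit(R) = alpha + beta * logit(B) + eps, eps ~ N(0, sigma^2)."""
    eps = rng.normal(0.0, sigma, size=np.shape(B))
    return inv_logit(alpha + beta * logit(B) + eps)

rng = np.random.default_rng(0)
B = np.array([0.1, 0.5, 0.9])  # example Bayesian estimates
# beta < 1 compresses reports toward 0.5 (conservatism);
# beta > 1 pushes reports toward the extremes (repulsion).
R_conservative = simulate_report(B, alpha=0.0, beta=0.5, sigma=0.0, rng=rng)
R_repulsive = simulate_report(B, alpha=0.0, beta=2.0, sigma=0.0, rng=rng)
```

Setting sigma to zero isolates the deterministic distortion: with beta = 0.5, a Bayesian estimate of 0.1 is reported as 0.25, whereas with beta = 2 it is reported as roughly 0.012.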

The Bayesian estimate for each draw is the optimal forecast given (i) the constant probability with which the underlying probability changes and (ii) the history of observed ring draws. We allow for two types of systematic distortions. First, the parameter α allows for uniform over- or underestimation: non-zero values of α predict that a subject's estimates will systematically be either higher or lower than the Bayesian response (Fig. 3A). Second, the parameter β allows for a non-linear distortion of the log odds: values of β below one compress estimates toward the middle of the probability range (conservatism), whereas values above one push estimates toward the extremes (repulsion).

Note: Parameter values fixed to null values are shown in gray. For estimations that allow parameters to differ across subjects, the table reports the averages across subjects, in bold. LL is the log-likelihood, k is the number of free parameters, and BIC = −2LL + k ln(N). The number of observations is N = 109,890 for the unconditional estimates and N = 9,673 for estimates conditional on adjustment.
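The BIC penalty described in the note above can be computed mechanically. This sketch uses made-up log-likelihoods (not the values from our estimations) to illustrate how the penalty trades off fit against the number of free parameters:

```python
import math

def bic(log_likelihood, k, n):
    """Bayesian information criterion: BIC = -2*LL + k*ln(N).
    Lower values indicate a better fit after penalizing complexity."""
    return -2.0 * log_likelihood + k * math.log(n)

# Hypothetical comparison: a 1-parameter null model versus a
# 3-parameter model (alpha, beta, sigma) on N = 109,890 observations.
n = 109_890
bic_null = bic(log_likelihood=-52_000.0, k=1, n=n)
bic_full = bic(log_likelihood=-51_900.0, k=3, n=n)
# The richer model is preferred only if its gain in log-likelihood
# outweighs the extra penalty of (3 - 1) * ln(N), here about 23.2.
```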
Electronic copy available at: https://ssrn.com/abstract=3446790

The model with subject-specific parameters explains the data best among those tested here, once again, even after penalizing for model complexity. Cross-sectional variation in the non-linear bias parameter β and in the standard deviation of noise σ generates the biggest improvements in model fit. We report subject-level parameter estimates in Table 2.

Comparing the average level of between- and within-subject variability, we find no significant difference in within- versus across-subject variability for the additive bias parameter α (t(19) = 0.17, p = 0.57). We find support for intra-subject variability being smaller than inter-subject variability for the parameter β (t(19) = −1.53), based on each half's distribution of session-wise parameter estimates across sessions (Table 3). Moreover, the difference between sub-populations appears to be stable across sessions. Non-parametric bootstrap tests confirm that the observed variance ratios, in which between-subject variance exceeds within-subject variance by a factor of around two, are significantly different from the null benchmark (for parameters β and σ²).
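A non-parametric bootstrap test of the kind reported above can be sketched as follows. The data here are simulated, and the details (the resampling unit, the permutation scheme, and the number of draws) are illustrative assumptions rather than our exact procedure:

```python
import numpy as np

def variance_ratio(estimates):
    """Between-subject variance of session-wise parameter estimates,
    divided by the mean within-subject variance.
    `estimates` has shape (n_subjects, n_sessions)."""
    between = np.var(estimates.mean(axis=1), ddof=1)
    within = np.mean(np.var(estimates, axis=1, ddof=1))
    return between / within

rng = np.random.default_rng(1)
n_subjects, n_sessions = 11, 10
# Simulated beta estimates: stable subject-specific means plus session noise.
subject_means = rng.normal(1.0, 0.3, size=(n_subjects, 1))
data = subject_means + rng.normal(0.0, 0.15, size=(n_subjects, n_sessions))

observed = variance_ratio(data)

# Null benchmark: shuffle estimates across subjects within each session,
# destroying subject identity while preserving session-level structure.
null_ratios = []
for _ in range(2000):
    shuffled = data.copy()
    for s in range(n_sessions):
        rng.shuffle(shuffled[:, s])
    null_ratios.append(variance_ratio(shuffled))
p_value = np.mean(np.array(null_ratios) >= observed)
```

When subjects genuinely differ in their stable bias levels, the observed ratio sits far in the right tail of the shuffled distribution, yielding a small p-value.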
A correlation between early and late parameters (of each subject) would further support the hypothesis that bias levels remain relatively constant throughout the task. We find that this is indeed the case.

Similarly, are these extreme estimators adjusting their response sliders more frequently? We begin with an examination of associations between the bias magnitudes exhibited by each subject. We test for correlations between each subject's non-linear bias parameter β and their respective overestimation (α) and randomness (σ²) parameters. We find a significant correlation between β and α (r(9) = 0.60, p < .05); however, average β values do not correlate with the subjects' associated values of σ² (r(9) = 0.40, p = 0.11). We next confirm that individual degrees of the conservative-repulsion bias (using the M.L. estimates of β reported in Table 2) are negatively and significantly correlated with the number of rings drawn between the subject's revisions of estimates (Fig. 7). Hence, subjects who make more conservative predictions also deliberate longer between ring draws. These subjects might also adjust their reports less frequently, waiting for more ring draws to be observed before revising. The observed differences in response times can then be interpreted as resulting from subjects being engaged serially with these different cognitive tasks.
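The correlation tests above are ordinary Pearson correlations over the subject pool, with degrees of freedom n − 2 (hence r(9) for 11 subjects). A minimal sketch on simulated, hypothetical parameter estimates:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))

# Hypothetical subject-level estimates for 11 subjects (not our data):
beta = [0.4, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.3, 1.5, 1.8, 2.1]
alpha = [-0.2, -0.1, -0.15, 0.0, 0.05, 0.0, 0.1, 0.05, 0.2, 0.15, 0.3]
r = pearson_r(beta, alpha)  # with n = 11 subjects, df = n - 2 = 9
```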

The links between individual subjects' biases and secondary measures such as response time also suggest that differing biases may reflect differing approaches to decision making. These approaches may correspond to variation across more general psychological constructs (e.g., patience), and could also be related to differences in apparent preferences (e.g., time and risk preferences).

At present, it is difficult to distinguish between the biases in our subjects' behavior that result from probability estimation as opposed to the detection of change points.

Future experimental designs should seek to isolate one factor from the other. Indeed, a long line of work documents that behavioral response rates match associated rates of reinforcement (e.g., Baum, 1974; Herrnstein, 1970). Extending the scope of this task to include additional probabilistic outcomes will enable comparisons of performance to more general learning paradigms. Overall, future work should aim not only to tease apart the specific cognitive processes that contribute to heterogeneity in subjects' estimation styles, but also to establish to what extent these differing styles imply systematic differences in responses in other estimation paradigms.