A confirmation bias in perceptual decision-making due to hierarchical approximate inference

doi:10.1371/journal.pcbi.1009517

Fig 1.

Differences in “sensory information” and “category information” can explain differences in temporal biases reported by earlier studies.

a) An observer’s “temporal weighting strategy” is an estimate of how their choice depends on a weighted sum of each frame of evidence e_f (more precisely, a weighted sum of the log odds at each frame). Three commonly observed motifs are decreasing weights (primacy), constant weights (optimal), and increasing weights (recency). b) Information in the stimulus about the category may be decomposed into information in each frame about a sensory variable (“sensory information”) and information about the category given the sensory variable (“category information”). c) Category information and sensory information may be manipulated independently, creating a two-dimensional space of possible tasks. Any level of task performance can be the result of different combinations of sensory and category information. A qualitative placement of previous work into this space separates studies that find primacy effects in the upper left (low sensory/high category information or LSHC regime) from studies that find recency effects or optimal weights in the lower right (high sensory/low category information or HSLC regime). Numbered references are: [1] Kiani et al (2008), [2] Nienborg and Cumming (2009), [3] Brunton et al (2013), [4] Wyart et al (2012), [5] Raposo et al (2014), [6] Drugowitsch et al (2016). See S1 Text for justifications of placements.
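
The three weighting motifs in panel (a) can be illustrated with a minimal sketch. It assumes an exponential weight profile (the form used later to summarize biases) and a choice given by the sign of the weighted sum; the function names `make_weights` and `choose` and the slope values are hypothetical, chosen only for illustration.

```python
import numpy as np

def make_weights(n_frames, beta):
    """Exponential weight profile w_f proportional to exp(beta * f), mean 1.

    beta < 0 gives primacy, beta = 0 flat weights, beta > 0 recency.
    """
    f = np.arange(n_frames)
    w = np.exp(beta * f)
    return w / w.mean()

def choose(evidence, weights):
    """Return +1 or -1 from the sign of the weighted sum of per-frame evidence."""
    return 1 if np.dot(weights, evidence) >= 0 else -1

# Illustrative slope values (assumptions, not fitted to any data).
primacy = make_weights(10, -0.2)
flat = make_weights(10, 0.0)
recency = make_weights(10, 0.2)

# Evidence that favors +1 early and -1 late separates the strategies.
evidence = np.array([1.0] * 5 + [-1.0] * 5)
print(choose(evidence, primacy), choose(evidence, recency))
```

With evidence that reverses halfway through the trial, the primacy profile follows the early frames and the recency profile follows the late frames.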

Fig 2.

Information flow during hierarchical inference where categorical beliefs are fed back as a prior on sensory features.

a) Generative model that we assume the brain has learned for a discrimination task, which specifies how sensory observations e_f depend on the category for the trial, C, in two stages: each sensory observation e_f is assumed to be a noisy realization of underlying sensory features, x_f, and each frame of sensory features is itself assumed to be selected according to the trial’s category. b-c) Integrating evidence about C requires updating the current belief about C with new information derived from the sensory representation (left-right “integration” and bottom-up “update” arrows). The posterior distribution over x combines top-down expectations (diagonal “prior” arrows) with new evidence from the stimulus, e_f (bottom-up “likelihood” arrows). Width of arrows indicates average amount of information communicated; red and blue arrows indicate changes in information flow between conditions. Note that when inference is exact, the prior is subtracted from the information in the update during the integration to prevent double-counting early evidence. While the generative model in (a) operates with discrete frames, f, inference in the brain happens in continuous time, t. b) LSHC: Low sensory information means little information in the likelihood about sensory features x_f. High category information means that most of this information is also informative about C. It also means high information in the prior that is fed back to the sensory representation. c) HSLC: High sensory information means high information in the likelihood about sensory features x_f. Low category information means that this information is only weakly predictive of C. It also means little information in the prior that is being fed back to the sensory representation.
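
The two-stage generative model in panel (a) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the category C is binary (±1), each sensory feature x_f matches C with probability `p_match` (a stand-in for category information), and each observation e_f is x_f plus Gaussian noise of scale `sigma` (a stand-in for inverse sensory information). The function and parameter names are hypothetical.

```python
import numpy as np

def sample_trial(rng, n_frames=10, p_match=0.9, sigma=1.0):
    """Sample one trial from a two-stage model C -> x_f -> e_f.

    Assumptions (for illustration only):
      C   is +1 or -1 with equal probability,
      x_f equals C with probability p_match, otherwise -C,
      e_f is x_f plus Gaussian noise with standard deviation sigma.
    """
    c = rng.choice([-1, 1])
    match = rng.random(n_frames) < p_match
    x = np.where(match, c, -c)
    e = x + sigma * rng.normal(size=n_frames)
    return c, x, e

rng = np.random.default_rng(0)
# LSHC-like setting: noisy observations, features almost always match C.
c_lshc, x_lshc, e_lshc = sample_trial(rng, p_match=0.95, sigma=2.0)
# HSLC-like setting: clean observations, features only weakly tied to C.
c_hslc, x_hslc, e_hslc = sample_trial(rng, p_match=0.6, sigma=0.2)
```

In this sketch, lowering `sigma` raises sensory information and moving `p_match` away from 0.5 raises category information, so the two can be varied independently.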

Fig 3.

Changes in bias predicted by approximate hierarchical inference models.

a) Performance of an ideal observer reporting C given ten frames of evidence. White line shows threshold performance, defined as 70% correct. The ideal observer’s temporal weights are always flat (not shown). b) Performance of our sampling-based approximate inference model with no leak (Methods). Colored dots correspond to lines in the next panel. c) Temporal weights in the model transition from flat to a strong primacy effect, all at threshold performance, as the stimulus transitions from the HSLC to the LSHC conditions. d) Visualization of how temporal biases change across the entire task space. Red corresponds to primacy, and blue to recency. White contour as in (b). Black lines are iso-contours for slopes corresponding to highlighted points in (b). e-g) Same as (b-d) but with leaky integration, which lessens primacy effects and produces recency effects when category information is low.
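
An ideal observer of the kind described in panel (a) can be sketched by summing per-frame log odds with equal weight. The sketch assumes a binary category, features that match the category with probability `p_match`, and Gaussian observation noise of scale `sigma`; these parameter names and the simulation settings are assumptions for illustration, not values from the paper.

```python
import numpy as np

def frame_log_odds(e, p_match, sigma):
    """Log odds log p(e | C=+1) / p(e | C=-1) for each frame.

    Each likelihood marginalizes over the hidden feature x in {+1, -1}.
    Gaussian normalizers cancel in the ratio, so they are omitted.
    """
    lp_pos = -(e - 1.0) ** 2 / (2 * sigma ** 2)  # log N(e; +1), up to a constant
    lp_neg = -(e + 1.0) ** 2 / (2 * sigma ** 2)  # log N(e; -1), up to a constant
    num = np.logaddexp(np.log(p_match) + lp_pos, np.log1p(-p_match) + lp_neg)
    den = np.logaddexp(np.log(p_match) + lp_neg, np.log1p(-p_match) + lp_pos)
    return num - den

def ideal_accuracy(rng, n_trials=2000, n_frames=10, p_match=0.9, sigma=1.0):
    """Monte Carlo estimate of ideal-observer fraction correct."""
    c = rng.choice([-1, 1], size=n_trials)
    match = rng.random((n_trials, n_frames)) < p_match
    x = np.where(match, c[:, None], -c[:, None])
    e = x + sigma * rng.normal(size=(n_trials, n_frames))
    # Flat temporal weights: every frame's log odds enters the sum equally.
    total = frame_log_odds(e, p_match, sigma).sum(axis=1)
    choice = np.where(total >= 0, 1, -1)
    return np.mean(choice == c)

rng = np.random.default_rng(1)
print(ideal_accuracy(rng, p_match=0.9, sigma=0.5))
```

Sweeping `p_match` and `sigma` over a grid and marking where accuracy crosses 0.7 would trace a threshold contour analogous to the white line, under these assumptions.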

Fig 4.

Two task conditions that reduce either sensory or category information to threshold level using a staircase.

a) Each trial consisted of a 200ms start cue, followed by 10 stimulus frames presented for 83ms each, followed by a single mask frame of zero-coherence noise. After a 750ms delay, left or right targets appeared and participants pressed a button to categorize the stimulus as “left” or “right.” Stimulus contrast is amplified and spatial frequency reduced in this illustration. b) Category information is determined by the expected ratio of frames in which the orientation matches the correct category, and sensory information is determined by a parameter κ determining the degree of spatial orientation coherence (Methods). At the start of each block, we reset the staircase to the same point, with category information at 9:1 and κ at 0.8. We then ran a 2-to-1 staircase either on κ or on category information. The Low-Sensory-High-Category (LSHC) and High-Sensory-Low-Category (HSLC) ovals indicate sub-threshold trials; only these trials were used in the regression to infer observers’ temporal weights. c) Visualization of a noisy stimulus in the LSHC condition. All frames are oriented to the left. d) Psychometric curves for all observers (thin lines) and averaged (thick line) over the κ staircase. Shaded gray area indicates the median threshold level across all observers. e) Example frames in the HSLC condition. The orientation of each frame is clear, but orientations change from frame to frame. f) Psychometric curves over frame ratios, plotted as in (d).
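
The staircase in panel (b) can be sketched as below, reading “2-to-1” as a two-down/one-up rule: two consecutive correct responses make the task harder and a single error makes it easier. That reading, the additive step size, and the function name `staircase_step` are assumptions for illustration; only the starting value κ = 0.8 comes from the caption.

```python
def staircase_step(level, correct, streak, step=0.05, lo=0.0, hi=0.8):
    """One update of a two-down/one-up staircase on a difficulty parameter.

    level   current value (e.g. kappa); lower values are harder
    correct whether the last response was correct
    streak  number of consecutive correct responses so far
    Returns (new_level, new_streak). Step size and limits are hypothetical.
    """
    if correct:
        streak += 1
        if streak == 2:
            # Two correct in a row: make the task harder, reset the streak.
            return max(lo, level - step), 0
        return level, streak
    # Any error: make the task easier, reset the streak.
    return min(hi, level + step), 0

# Example run starting from kappa = 0.8, as at the start of each block.
level, streak = 0.8, 0
for correct in [True, True, True, False, True, True]:
    level, streak = staircase_step(level, correct, streak)
```

The same update could be applied to the frame ratio instead of κ to run the staircase along the category-information axis.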

Fig 5.

Every observer’s temporal bias consistently changes from primacy to unbiased/recency between conditions as predicted.

a-b) Temporal weights from logistic regression of choices from sub-threshold frames for individual observers. Weights are regularized by a cross-validated smoothness term, and are normalized to have a mean of 1. c-d) To summarize temporal biases, we constrained weights to be an exponential function of time and re-fit them to observers’ choices. Exponential weights had higher cross-validated performance than regularized logistic regression, supporting their use to summarize temporal biases (Fig B in S1 Text; Methods). e) The change in the temporal bias, quantified as the exponential slope parameter (β), between the two task contexts for each observer is consistently positive (combined, p < 0.01, sign test on median slope from bootstrapping). This result is individually significant in 9 of 12 observers by bootstrapping (p < 0.05, p < 0.01, and p < 0.001 indicated by *, **, and *** respectively; non-significant observers plotted with dashed lines). Points are median slope values after bootstrap-resampling each observer’s sub-threshold trials. A slope parameter β > 0 corresponds to a recency bias and β < 0 to a primacy bias. We found similar results using linear rather than exponential weight functions (Fig C in S1 Text).
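
Constraining the weights to an exponential function of time, as in panels (c-d), can be sketched as a logistic model with a single slope parameter β. The sketch below uses a plain grid search over β and a gain term in place of whatever optimizer the authors used; the function names, the grids, and the absence of a bias term are assumptions for illustration.

```python
import numpy as np

def neg_log_lik(beta, gain, signal, choice):
    """Bernoulli negative log-likelihood of choices under exponential weights.

    signal: (n_trials, n_frames) per-frame evidence
    choice: (n_trials,) array of 0/1 choices
    Weights are w_f proportional to exp(beta * f), normalized to mean 1.
    """
    f = np.arange(signal.shape[1])
    w = np.exp(beta * f)
    w = w / w.mean()
    logit = gain * signal @ w
    # -log p(choice) = log(1 + exp(logit)) - choice * logit, computed stably.
    return np.sum(np.logaddexp(0.0, logit) - choice * logit)

def fit_exponential_weights(signal, choice, betas, gains):
    """Grid-search maximum likelihood estimate of (beta, gain)."""
    best = min(
        (neg_log_lik(b, g, signal, choice), b, g)
        for b in betas
        for g in gains
    )
    return best[1], best[2]
```

A fitted β below zero would indicate primacy and above zero recency, matching the sign convention in panel (e). Bootstrap-resampling trials and refitting would give an interval on β.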

Fig 6.

Fitting an extended integration-to-bound (“Extended ITB”) model to data demonstrates that integration dynamics (negative α for confirmation bias, positive α for forgetting), rather than a bound, best accounts for data.

a) Illustration of Extended ITB model. As in classic drift-diffusion models with an absorbing bound, evidence is integrated to an internal bound, after which new evidence is ignored. Compared to perfect integration (α = 0), a positive leak (α > 0) decays information away and results in a recency bias, and a negative leak (α < 0) amplifies already integrated information, resulting in a primacy bias. Since α < 0 may also result in more bound crossings, both leak and bound together determine the shape of the temporal weights. b) Inferred values of the bound and leak parameters in each condition, shown as median ± 68% credible intervals. The classic ITB explanation of primacy effects corresponds to a non-negative leak and a small bound, illustrated here as a shaded green area. Note that the three observers near the ITB regime are points from the HSLC task: two still exhibit mild recency effects and one exhibits a mild primacy effect as predicted by ITB. c) Across both conditions, the temporal slopes (β) implied by the full model fits closely match the slopes in the data. β < 0 corresponds to primacy, and β > 0 to recency. Error bars indicate 68% confidence intervals from bootstrapping trials on β_data and from posterior samples on β_fit. d) Median temporal biases implied by the full model (middle) and by the model with either zero leak (left) or infinite bound (right). Each line corresponds to a single observer. (LSHC condition only; HSLC condition in Fig L in S1 Text.) e) Across the population, the negative leak (confirmation bias) accounted for 99% (68%CI = [93%, 106%]), and bounded integration accounted for 18% (68%CI = [13%, 23%]) of the primacy bias captured by the model. (Additional analyses in Fig L in S1 Text.)
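
The interaction between leak and bound in panel (a) can be sketched with a discrete-time integrator. The update rule y ← (1 − α)·y + e_f is an assumed parameterization chosen so that α = 0 is perfect integration, α > 0 decays past evidence, and α < 0 amplifies it; the paper's exact equations are in its Methods, and the function name `extended_itb` is hypothetical.

```python
import numpy as np

def extended_itb(evidence, leak=0.0, bound=np.inf):
    """Integrate evidence with a leak and an absorbing bound.

    Assumed update (illustrative): y <- (1 - leak) * y + e_f.
    Once |y| reaches the bound, later frames are ignored.
    Returns the final decision variable; its sign is the choice.
    """
    y = 0.0
    for e in evidence:
        y = (1.0 - leak) * y + e
        if abs(y) >= bound:
            return float(np.sign(y) * bound)
    return y

# Perfect integration: the result is the plain sum of the evidence.
print(extended_itb([1, 2, 3]))                      # 6.0
# A small bound absorbs on the first frame, ignoring later evidence.
print(extended_itb([1, -1, -1], bound=1.0))         # 1.0
# Positive leak discounts early evidence (recency).
print(extended_itb([4, -1, -1, -1], leak=0.5))      # -1.25
# Negative leak amplifies early evidence (primacy).
print(extended_itb([1, -1.2], leak=-0.5))
```

In this sketch both a small bound and a negative leak favor early frames, which is why the two mechanisms have to be fit jointly to separate their contributions.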
