Suprathreshold perceptual decisions constrain models of confidence

doi:10.1371/journal.pcbi.1010318

Table 1.

Some implementations of the three most common confidence metric types used for modelling perceptual confidence, with an example study that used each implementation.

Fig 1.

Sensory uncertainty and stimulus strength in the confidence computation.

A-C) Extracting the confidence decision variable from the Type 1 decision process. Scenarios contrasted: a 2-dot cloud (teal; high uncertainty; see inset for stimulus) and a 3-dot cloud (purple; low uncertainty), with both dot clouds’ means equidistant from the screen centre (dotted line of inset). The observer correctly identifies the lateralisation of the generating distribution’s mean, as shown by their response in quotation marks. Vertical line: decision boundary. Horizontal lines: similarities (grey) or differences (coloured) between the distributions. A) Probability metric. Curves: normalised likelihood functions of the distribution mean, given the sensory measurement (marker; the most likely measurement is selected for illustrative purposes). Shaded region: probability of the judgement being correct. The shaded region is larger for the 3-dot scenario, so the observer has higher confidence in this judgement. B) Scaled-distance metric. A signal-to-noise ratio transformation is applied to the distributions in A (i.e., they are rescaled to units of standard deviation while preserving the areas under the curve on either side of the Type 1 criterion). The rescaled sensory measurement in the 3-dot scenario has a greater Distance-from-Criterion (DFC) and is judged with higher confidence. C) Influence of a centred prior (see inset). The posterior distributions (continuous curves), computed according to Bayes’ Rule, are shifted towards the centre from the likelihood function locations (dashed) by different amounts. The 2-dot scenario is shifted more because of its higher uncertainty. Consequently, Probability and DFC metrics yield higher confidence for the 3-dot scenario. D-F) How the confidence metric is affected by stimulus strength and sensory uncertainty (low to high uncertainty represented by colours ranging from purple to teal). The greater the stimulus strength (i.e., distance of mean from centre), the larger the confidence metric.
However, raw probability values asymptote at 100% confidence (D), whereas the confidence decision variable is unbounded if a Log-Probability-Ratio transformation is applied to the probability of being correct (E) or if an Unscaled- or Scaled-Distance metric is used (F). Sensory uncertainty affects the rate at which the confidence metric changes with stimulus strength. Note that the y-axes have been rescaled for illustrative purposes. G) Heuristic confidence metric computed from estimates of stimulus strength (dot-cloud mean) and sensory uncertainty (number of dots) without consideration of the Type 1 process. The observer sets the weights on these factors, w₁ and w₂.
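The confidence metrics contrasted in Fig 1 can be sketched in a few lines. This is a simplified illustration, not the paper's fitted models: it assumes the generating SD (`sigma`) is known, so the likelihood of the mean is Gaussian with variance sigma²/n, and it models the centred prior as a Gaussian with a hypothetical SD. The function names and the linear form of the heuristic are our own assumptions.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1


def gaussian_posterior(dots, sigma, prior_mean=0.0, prior_sd=None):
    """Posterior over the generating mean, given the dot positions.

    Assumes the generating SD `sigma` is known. prior_sd=None gives a flat
    prior (posterior = likelihood); otherwise a Gaussian prior centred on
    prior_mean is combined with the likelihood by Bayes' rule."""
    n = len(dots)
    like_mean = sum(dots) / n
    like_var = sigma ** 2 / n  # more dots -> narrower likelihood
    if prior_sd is None:
        return like_mean, math.sqrt(like_var)
    prior_prec, like_prec = 1.0 / prior_sd ** 2, 1.0 / like_var
    post_var = 1.0 / (prior_prec + like_prec)
    post_mean = post_var * (prior_prec * prior_mean + like_prec * like_mean)
    return post_mean, math.sqrt(post_var)


def probability_correct(post_mean, post_sd, k1=0.0):
    """Probability metric (panel A): posterior mass on the chosen side of k1."""
    return STD_NORMAL.cdf(abs(post_mean - k1) / post_sd)


def log_probability_ratio(p):
    """Unbounded transform of P(correct) (panel E)."""
    return math.log(p / (1.0 - p))


def scaled_dfc(post_mean, post_sd, k1=0.0):
    """Scaled distance-from-criterion, in posterior SD units (panels B and F)."""
    return abs(post_mean - k1) / post_sd


def heuristic_confidence(cloud_mean, n_dots, w1, w2, k1=0.0):
    """Panel G: hypothetical linear combination of strength and dot count."""
    return w1 * abs(cloud_mean - k1) + w2 * n_dots
```

For dots at [0.5, 1.5] versus [0.5, 1.0, 1.5] with `sigma=2.0`, both clouds have mean 1.0, but the 3-dot cloud gives a larger `probability_correct` and `scaled_dfc`. Adding a centred prior with `prior_sd=2.0` pulls the 2-dot posterior mean to 2/3 and the 3-dot mean to 0.75, reproducing the differential shift of panel C. In this Gaussian case `probability_correct` equals `STD_NORMAL.cdf(scaled_dfc(...))`, so the two metrics rank single stimuli identically; the models in the paper are distinguished by their other components (noise, priors, transformations).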

Fig 2.

Experimental methods.

A) Task design. Observers were shown dots drawn from a Gaussian generating distribution with one of seven possible horizontal spatial offsets within ±4° of the screen centre. They judged whether the distribution mean was left or right of centre. After each pair of perceptual decisions (Interval 1 and Interval 2), they reported the interval in which they had higher confidence in their decision. There were six levels of stimulus uncertainty, defined by information quantity (number of dots: 2 or 5) and quality (sampling distribution SD: 1.5, 2, or 2.5°), presented in an interleaved design. B) Occurrence of unique stimulus sets (colour coded). Squares: stimulus set in a single session. Sets are ordered by type rather than session order, and participants are ordered by how often their stimulus sets were repeated (5-pass to 1-pass).
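The trial structure in panel A can be sketched as a stimulus generator. The six uncertainty levels and the ±4° bound come from the caption; the exact seven offsets are not listed there, so the offset is passed in by the caller. Drawing each interval's uncertainty level uniformly at random is our assumption about the interleaving, and all names are hypothetical.

```python
import itertools
import random

# Uncertainty manipulations stated in the Fig 2 caption.
N_DOTS = (2, 5)                 # information quantity
SAMPLING_SDS = (1.5, 2.0, 2.5)  # information quality (degrees)
MAX_OFFSET = 4.0                # offsets lie within +/-4 degrees


def uncertainty_levels():
    """All 2 x 3 = 6 (n_dots, sd) combinations."""
    return list(itertools.product(N_DOTS, SAMPLING_SDS))


def sample_interval(offset, rng):
    """One interval: draw an uncertainty level, then the dots.

    Assumption: levels are drawn uniformly at random for each interval."""
    if abs(offset) > MAX_OFFSET:
        raise ValueError("offset must lie within +/-4 degrees")
    n_dots, sd = rng.choice(uncertainty_levels())
    dots = [rng.gauss(offset, sd) for _ in range(n_dots)]
    # The correct Type 1 answer is the side of the generating mean.
    correct = "right" if offset > 0 else "left" if offset < 0 else None
    return {"offset": offset, "n_dots": n_dots, "sd": sd,
            "dots": dots, "correct": correct}


def sample_confidence_pair(offset1, offset2, rng):
    """Two intervals whose confidence is later compared."""
    return [sample_interval(offset1, rng), sample_interval(offset2, rng)]
```

For example, `sample_confidence_pair(-2.0, 3.0, random.Random(0))` returns two intervals whose correct answers are "left" and "right". Passing an explicit `random.Random` instance keeps stimulus sets reproducible, which matters when the same set is repeated across passes as in panel B.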

Table 2.

Summary of the seven base models.

Models spanned all three metric types and considered different implementations of confidence noise, prior distributions, and confidence-variable transformations. Standard unbiased Gaussian confidence noise is used unless noted otherwise. There are twelve distinct models for comparison when including the prior variants.

Fig 3.

Confidence manipulation checks.

A) Suprathreshold Type 1 task performance. The mean proportion of correct spatial judgements across observers is shown for different trial categories. The left-most bar shows performance for all trials, and the next two bars show trials sorted by confidence: those in the interval chosen as more confident and those in the interval declined. B-C) Raw confidence choices, sorted by stimulus properties across the two intervals of a confidence pair. The colour code represents the proportion of trials on which Interval 2 was chosen as more confident, averaged across observers. Confidence choices are plotted as a function of the across-interval difference in distance from centre and difference in inverse dot spread. Distance from centre and dot spread were calculated from the empirical mean and SD of the dots displayed, and binned in the range ±3° for plotting. Gold line: the confidence-indifference contour, where the observer is equally likely to report Interval 1 or 2, calculated from the Full model in the nested logistic regression analysis. B) Comparisons where the number of dots was the same in each interval. C) Comparisons where the number of dots differed, with stimulus information and confidence choices flipped where needed so that Interval 2 has more dots for plotting purposes. D) Model comparison for the nested logistic regression analysis. AICc scores are reported relative to the Full model (winner), which contained both quantity and quality predictors. The Basic + Quantity and Basic + Quality models contained only one of these predictors, and the Basic model contained neither. Larger positive scores indicate a worse fit. The results show that both the quantity and quality manipulations affected confidence. Error bars: ±SEM.
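The nested comparison in panel D can be illustrated with a small logistic regression and the standard small-sample AICc, AICc = −2 lnL + 2k + 2k(k+1)/(n − k − 1). The predictor names (`d_dist`, `d_qty`, `d_qual`), the composition of each model, and the simulated data are hypothetical stand-ins; the caption does not give the exact regressors, so treat this as a sketch of the technique rather than the authors' analysis.

```python
import numpy as np


def fit_logistic(X, y, n_iter=100, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson (IRLS).

    X must already contain an intercept column. Returns (beta, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    eta = X @ beta
    loglik = float(np.sum(y * eta - np.logaddexp(0.0, eta)))
    return beta, loglik


def aicc(loglik, k, n):
    """Akaike information criterion with small-sample correction (lower = better)."""
    return -2.0 * loglik + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)


# Hypothetical predictor sets mirroring the nested models in panel D.
MODELS = {
    "Basic": ["d_dist"],
    "Basic + Quantity": ["d_dist", "d_qty"],
    "Basic + Quality": ["d_dist", "d_qual"],
    "Full": ["d_dist", "d_qty", "d_qual"],
}


def relative_aicc(predictors, y, models=MODELS, reference="Full"):
    """AICc of each model minus that of the reference model."""
    n = len(y)
    scores = {}
    for name, cols in models.items():
        X = np.column_stack([np.ones(n)] + [predictors[c] for c in cols])
        _, ll = fit_logistic(X, y)
        scores[name] = aicc(ll, X.shape[1], n)
    return {name: s - scores[reference] for name, s in scores.items()}


def simulate_choices(n=2000, seed=0):
    """Synthetic 'Interval 2 chosen' data with both quantity and quality effects."""
    rng = np.random.default_rng(seed)
    predictors = {
        "d_dist": rng.normal(size=n),                   # difference in distance from centre
        "d_qty": rng.choice([-3.0, 0.0, 3.0], size=n),  # difference in dot count (5 - 2)
        "d_qual": rng.normal(size=n),                   # difference in inverse dot spread
    }
    logit = (1.5 * predictors["d_dist"] + 0.3 * predictors["d_qty"]
             + 0.8 * predictors["d_qual"])
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
    return predictors, y
```

On data simulated with both effects present, `relative_aicc(*simulate_choices())` gives 0 for the Full model and positive (worse) scores for the three reduced models, which is the pattern panel D reports for the real choices.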

Fig 4.

Model fit results (n = 16).

A) Relative AICc scores for the Type 1 models. Scores are relative to the winning flat-prior variant models. Bars: average relative AICc score. Markers: individual participant results. Colour: flat-prior (orange) or centred-prior (blue) variant. Error bars: ±SEM. B) Relative AICc scores for the Type 2 models, annotated with the fraction of observers best fit by each model. Models are grouped by metric type. Note the different y-axis scales in panels A and B. C) Average best-fitting Heuristic-model coefficients across all observers. The position coefficient was always fixed at 1 in the model, but is graphed to illustrate the relative coefficient weights. Error bars: ±SEM. D) The best-fitting quantity and quality coefficients per observer. Marker colour and type indicate the best-fitting model for that observer. Dotted lines: the position coefficient value, for comparison. Dashed line: equality line. E) Model recovery results. For each model and observer, 10 data sets were simulated using the participant’s best-fitting parameters (12 models × 16 observers × 10 = 1920 data sets in total) and then fit by each of the 12 models. The best-fitting model was tallied per data set for each simulated model. Dark squares along the downward diagonal indicate high model recovery success. F) Relative AICc scores computed against the simulated (generating) model, which therefore scores 0 along the downward diagonal.
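The bookkeeping behind panels E and F can be sketched as follows, assuming the AICc of every fit is stored in an array indexed by (simulated model, data set, fitted model), with fitted and simulated models in the same order. The array layout and function names are our own; with 12 models and 16 observers × 10 simulations there are 160 data sets per simulated model.

```python
import numpy as np


def recovery_matrix(aicc_scores):
    """Tally of best-fitting models (panel E).

    aicc_scores has shape (n_simulated_models, n_datasets, n_fitted_models).
    counts[i, j] is how many data sets simulated from model i were best fit
    (lowest AICc) by model j."""
    n_sim, _, n_fit = aicc_scores.shape
    best = np.argmin(aicc_scores, axis=2)
    return np.stack([np.bincount(best[i], minlength=n_fit) for i in range(n_sim)])


def relative_to_generating(aicc_scores):
    """Mean AICc of each fitted model minus the generating model's (panel F).

    Assumes fitted and simulated models share the same ordering, so the
    diagonal of the result is exactly 0."""
    n_sim = aicc_scores.shape[0]
    generating = np.stack([aicc_scores[i, :, i] for i in range(n_sim)])
    return (aicc_scores - generating[:, :, None]).mean(axis=1)
```

With a (12, 160, 12) score array the counts sum to 1920, and perfect recovery shows up as 160 on every diagonal cell of `recovery_matrix`.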

Table 3.

Equations for the seven base Type 2 Confidence models.

The models consider the decision evidence, Ev(·), in favour of the perceptual choice, r, given the dot measurements, X, and the Type 1 decision criterion, k₁. The DFC Evidence-Strength models take a point estimate of the posterior’s mode, with or without scaling by the posterior’s spread, and use its unsigned distance from k₁ as the evidence. Confidence forced-choice judgements involve comparing the confidence evidence for the two intervals (w₁ versus w₂), subject to a confidence interval bias, k₂.
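A minimal sketch of the DFC evidence and the forced-choice comparison described above. The caption does not give the noise structure or the sign convention for k₂, so this assumes independent Gaussian confidence noise of equal SD on each interval's evidence and treats a positive k₂ as a bias towards Interval 1; `dfc_evidence` and `p_choose_interval2` are hypothetical names, and `ev1`/`ev2` stand for the per-interval evidence written w₁ and w₂ in the caption.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def dfc_evidence(post_mode, k1=0.0, post_spread=None):
    """DFC evidence: unsigned distance of the posterior mode from k1,
    optionally scaled by the posterior spread."""
    distance = abs(post_mode - k1)
    return distance if post_spread is None else distance / post_spread


def p_choose_interval2(ev1, ev2, k2=0.0, noise_sd=1.0):
    """Probability of reporting Interval 2 as more confident.

    Assumptions: each interval's evidence receives independent Gaussian
    confidence noise with SD noise_sd, and Interval 2 is chosen when its
    noisy evidence exceeds Interval 1's by more than k2 (so k2 > 0 favours
    Interval 1)."""
    diff_sd = noise_sd * math.sqrt(2.0)  # SD of the difference of two noisy values
    return STD_NORMAL.cdf((ev2 - ev1 - k2) / diff_sd)
```

With equal evidence and no bias the choice is at chance (0.5); increasing `ev2` raises the probability of choosing Interval 2, and a positive `k2` lowers it. Any of the evidence functions (probability, log-probability ratio, DFC, heuristic) could be plugged in as `ev1` and `ev2`.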

Fig 5.

Confidence agreement results (n = 15).

Confidence agreement was calculated per trial as the proportion of passes on which the most-selected confidence choice was made, for the participants who completed a 3-, 4-, or 5-pass version of the experiment (Fig 2B). A) Heatmaps of average confidence agreement according to the properties of the two stimuli displayed, pooled across observers. Gold: the indifference lines where each interval is equally likely to be selected as more confident according to the preliminary analyses (Fig 3B). B) The 5-pass confidence agreement of a representative participant (red; #11) compared with the confidence agreement predicted by the models. Green: the best-fitting model for this observer (Heuristic model). Black: other models (flat- and centred-prior variants had similar confidence agreement counts, so only the flat-prior variant is shown). Grey: the Basic-Probability model with additional late noise (1% SD). Model predictions were calculated from 100 simulated datasets using the participant-specific best-fitting parameters. Error bars: ±2 SD. C) Predicted versus observed proportion of trials with the highest agreement count. Each marker is an individual participant, and marker style indicates their best-fitting model (“F” refers to the flat-prior variant and “C” the centred-prior variant). The best-fitting model per observer was used for the confidence-agreement prediction. Dashed line: equality.
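The agreement measure can be computed directly from the repeated choices. This sketch assumes each trial's choices across passes are stored as a list (e.g. 1 or 2 for the interval reported as more confident) and reads "highest agreement count" as all passes giving the same choice; the function names are hypothetical.

```python
from collections import Counter


def agreement_count(choices):
    """Number of passes that gave the most-selected confidence choice for one trial."""
    return max(Counter(choices).values())


def confidence_agreement(choices):
    """Proportion of passes that gave the most-selected choice (0.5 to 1 for two options)."""
    return agreement_count(choices) / len(choices)


def proportion_at_max_agreement(trials):
    """Fraction of trials on which every pass gave the same choice (panel C)."""
    n_passes = len(trials[0])
    return sum(agreement_count(t) == n_passes for t in trials) / len(trials)
```

For a 5-pass trial with choices [1, 1, 2, 1, 2], the agreement count is 3 and the agreement is 0.6; a trial with five identical choices contributes to the highest-agreement proportion.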

Fig 6.

Elements of the decision models.

Depicted is the estimated joint probability of the dot-cloud generating distribution’s mean and precision for an example stimulus presentation of 2 dots. A) Centred normal-gamma prior distribution (based on the stimulus statistics of the experiment), the likelihood function based on the noisy dot observations, and the resulting normal-gamma posterior distribution. All three distributions share the axes shown in the posterior panel. B) The marginal prior (grey), likelihood (orange), and posterior (blue) for estimating the generating distribution mean. An unbiased Type 1 decision criterion, k₁, is also depicted. Inset: the displayed and perceived locations of the two sampled dots, with the dashed line indicating the screen midline.
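The normal-gamma update in panel A follows the standard conjugate formulas for a Gaussian with unknown mean and precision. This sketch treats the perceived dot locations as exact observations (it does not add the perceptual noise that separates displayed from perceived locations in the inset), and the prior parameters in the usage example are hypothetical, not the experiment's stimulus statistics.

```python
import math
from dataclasses import dataclass


@dataclass
class NormalGamma:
    mu: float     # location of the mean
    kappa: float  # pseudo-observation count for the mean
    alpha: float  # gamma shape for the precision
    beta: float   # gamma rate for the precision

    def update(self, xs):
        """Conjugate posterior after observing dot positions xs."""
        n = len(xs)
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)
        kappa_n = self.kappa + n
        mu_n = (self.kappa * self.mu + n * xbar) / kappa_n
        alpha_n = self.alpha + n / 2.0
        beta_n = (self.beta + 0.5 * ss
                  + self.kappa * n * (xbar - self.mu) ** 2 / (2.0 * kappa_n))
        return NormalGamma(mu_n, kappa_n, alpha_n, beta_n)

    def marginal_mean(self):
        """Marginal over the mean: Student-t with (df, loc, scale)."""
        df = 2.0 * self.alpha
        scale = math.sqrt(self.beta / (self.alpha * self.kappa))
        return df, self.mu, scale
```

With a centred prior `NormalGamma(mu=0.0, kappa=1.0, alpha=2.0, beta=2.0)` and two dots at 1.0 and 3.0, the posterior location is 4/3, pulled from the sample mean of 2 towards the centre, as in panel B. The probability that the mean lies to the right of k₁ is then one minus the Student-t CDF at k₁, which could be evaluated with, for example, `scipy.stats.t.cdf(k1, df, loc=loc, scale=scale)`.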
