The folded X-pattern is not necessarily a statistical signature of decision confidence

Manuel Rausch; Michael Zehetleitner

doi:10.1371/journal.pcbi.1007456

Abstract

Recent studies have traced the neural correlates of confidence in perceptual choices using statistical signatures of confidence. The most widely used statistical signature is the folded X-pattern, which was derived from a standard model of confidence assuming an objective definition of confidence as the posterior probability of making the correct choice given the evidence. The folded X-pattern entails that confidence as the subjective probability of being correct equals the probability of 0.75 if the stimulus in neutral about the choice options, increases with discriminability of the stimulus in correct trials, and decreases with discriminability in incorrect trials. Here, we show that the standard model of confidence is a special case in which there is no reliable trial-by-trial evidence about discriminability itself. According to a more general model, if there is enough evidence about discriminability, objective confidence is characterised by different pattern: For both correct and incorrect choices, confidence increases with discriminability. In addition, we demonstrate the consequence if discriminability is varied in discrete steps within the standard model: confidence in choices about neutral stimuli is no longer .75. Overall, identifying neural correlates of confidence by presupposing the folded X-pattern as a statistical signature of confidence is not legitimate.

Author summary

Confidence in perceptual choices is a degree of belief that a choice about a stimulus is correct. To identify the neural correlates of decision confidence, recent studies have widely used statistical signatures of confidence. The most widely used statistical signature is the folded X-pattern, which entails that the subjective probability of being correct is 0.75 when the stimulus is neutral about the choice, increases with discriminability in correct trials, and decreases with discriminability in incorrect trials. Here, we examine the consequences if key assumptions of the folded X-pattern are violated. If decision makers are provided with evidence about discriminability, objective confidence increases with discriminability for correct and incorrect choices. In addition, if discriminability is varied in discrete levels, confidence in choices about neutral stimuli is not 0.75. Overall, this means that researchers should not search for correlates of confidence by assuming the folded X-pattern as signature of confidence a priori.

Citation: Rausch M, Zehetleitner M (2019) The folded X-pattern is not necessarily a statistical signature of decision confidence. PLoS Comput Biol 15(10): e1007456. https://doi.org/10.1371/journal.pcbi.1007456

Editor: Alireza Soltani, Dartmouth College, UNITED STATES

Received: April 5, 2019; Accepted: October 2, 2019; Published: October 21, 2019

Copyright: © 2019 Rausch, Zehetleitner. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Reproduceable computer code and all results reported in the present study are available online at the open science framework website (https://osf.io/pfsqz/).

Funding: The research presented here was in part funded by the Profor+ internal research funding of the Catholic University of Eichstätt-Ingolstadt (www.ku.de, to MR) and in part by the Deutsche Forschungsgemeinschaft (www.dfg.de, grants ZE 887/8-1 to MZ and RA2988/3-1 to MR). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Confidence is a metacognitive evaluation of decision making: Each choice can be accompanied by some degree of confidence that the choice is correct. In neuroscience, confidence has become a flourishing research topic, uncovering the underlying neural mechanisms in humans [1–6] as well as non-human animals [7–13]. A major obstacle to the scientific study of confidence is the inherently subjective nature of the psychological construct of decision confidence. Therefore, a large amount of recent research on confidence has been inspired by a novel approach that formalizes confidence mathematically as an objective statistical quantity [14,15]. This formalization defines confidence as the belief that a choice is correct [16]. From a Bayesian perspective, beliefs are best formalised as probabilities [17,18]. Decision confidence in this formalization is the posterior probability of being correct given the evidence [16,19]. Several predictions about objective confidence have been formally derived from the model to which we subsequently refer to as the standard model of confidence [7,14,15]: First, the average objective confidence in correct choices increases as a function of the discriminability of the stimulus. Second, the average confidence in incorrect choices decreases with discriminability. Finally, when the stimulus is neutral about the choice options, confidence is exactly 0.75. The overall pattern, which we refer to here as folded X-pattern [20], has been dubbed a “statistical signature of confidence” [14,21]. Given that the folded X-pattern follows objectively from the posterior probability of being correct, it has been argued that when the folded X-pattern is detected in another behavioural, neural, or physiological variable, that variable should be considered a correlate of confidence [7,14,15,22]. Thus, it is frequently used to empirically identify correlates of decision confidence [7,8,10,22–25]. Nevertheless, a recent study suggested that the Bayesian calculation of the posterior probability of being correct does not necessarily imply the folded X-pattern [26]. The inverse is also not correct as the folded X-pattern does not necessary imply the Bayesian calculation of confidence [27–29].

Here, we show that the folded X-pattern is no longer expected when confidence is informed by a trial-by-trial representation of discriminability. When objective confidence is calculated from a model of confidence which is more general in the sense that it includes a representation of discriminability, the folded X-pattern occurs only as a special case when the evidence about the discriminability of a specific stimulus is not reliable. When there is accurate information about the discriminability of a stimulus, confidence tends to increase as a function of discriminability in correct and incorrect trials, which is why we refer to this pattern as the double increase pattern.

The standard model of confidence

The standard model of confidence is depicted in Fig 1. According to Sanders et al. [14], when an observer is presented with a stimulus and asked to make a choice ϑ∈{−1,1} about the stimulus, the stimulus d is a continuous variable that differentiates between the two options of ϑ. Negative values of d mean that observers should choose ϑ = −1; d = 0 means no objective feature of the stimulus suggests any of the two options, and positive values indicate that observers ought to choose ϑ = 1. As the sign of d determines what response observers ought to give, we refer to the sign of the stimulus as identity I. The absolute value of |d| is referred to as discriminability: The greater is the distance between d and 0, the easier is the choice. The accuracy of the choice A is 1 if I and ϑ are the same, and 0 otherwise. However, observers cannot perceive d directly, instead, the choice is based on noisy sensory evidence e_I (referred to as percept by Sanders et al.), which can be considered an estimate of d. The most frequent approach is to model e_I as a random sample from a Gaussian with a mean of d, while ϑ is modelled as a deterministic function of d. Finally, given that observers know the distributions from which d and e_I are sampled, the posterior probability of a correct choice given the sensory evidence e_I and the choice ϑ can be calculated using Bayes’ theorem (see S1 Appendix).

Download:

Fig 1. The standard model of confidence.

The stimulus objectively supports the choice options”red” and “green” to varying degrees. As perception is noisy, the percept is a corrupted representation of the degree to which the stimulus favours a specific choice option. Confidence is the probability of making the correct choice given percept and choice.

https://doi.org/10.1371/journal.pcbi.1007456.g001

The standard model has been presupposed to derive the folded X-pattern [14,15], although different aspects of the folded X-pattern come with specific additional assumptions: First, confidence in choices about neutral evidence is .75 only if the distribution of the stimulus is uniform and yields choice accuracies spanning from 0.5 to 1, and if sensory evidence is sampled from a symmetric distribution with a single peak centred on the stimulus, and if choice is deterministic [14,15,26]. Second, the decrease of confidence in incorrect choices presupposes that the observer is not provided with any information about the discriminability of the stimulus at the level of single choices [15]. Although the Bayesian calculation of the probability of being correct implies knowledge of the distribution from which d is sampled, knowledge the distribution of d only implies that observers know the probability of the degrees of discriminability across the experiment. For each specific choice however, the standard model assumes that observers do not possess any knowledge what the discriminability of the stimulus is over and above the distribution from which d is sampled.

The general model of confidence

The general model of confidence extends the standard model by including the possibility that observers perceive or infer the discriminability of the stimulus on the level of single choices. For example, when a driver in heavy rain needs to discern if a traffic light is green or red, the driver might not only be unsure because their colour percept is ambiguous, but they might also be cautious because they see or know their view is hindered by rain. Analogous to traffic lights and rain, many psychophysical experiments do not manipulate the stimulus as one independent variable; instead, two features of the stimulus are varied across the experiment. Therefore, the general model of confidence (see Fig 2) considers identity I and discriminability d as two independent aspects of each single stimulus: The identity, which in each trial can be either -1 or 1, is the variable in the external world that determines which of the choice options is correct. The model generates a choice ϑ about the identity I of the stimulus. For example, the stimulus could be red or green, and participants need to make a choice accordingly. Choices are correct when I and ϑ are both either -1 or 1. Discriminability d is the variable in the external world that determines how easy/difficult the choice is. For instance, many experiments manipulate contrast, presentation time, or luminance orthogonally to stimulus identity I. According to the general model, observers in each single trial obtain sensory evidence about both aspects of the stimulus, i.e. there is sensory evidence for identity e_I, and evidence for discriminability e_d. While e_I depends on I and on d, e_d depends only on d, but not on I. To represent that observers’ do not have direct access to I and d, e_d is sampled from a Gaussian distribution whose mean depends on d, and e_I is sampled from a Gaussian whose mean depends on I and on d. The posterior probability of a correct choice given and the choice ϑ can again be calculated based on Bayes’ theorem (see S2 Appendix).

Download:

Fig 2. The general model of confidence.

The general model is a generalization of the standard model. In many psychophysical experiments, the stimulus varies in two aspects: stimulus identity (symbolized here as red and green colour patches) and discriminability (symbolized here by the noise dots). In the general model, the stimulus generates two internal variables: the evidence about the stimulus identity, a continuous variable that differentiates between the possible identities, and evidence about the discriminability. Objective confidence about the correctness of the choice is based on evidence about the identity as well as evidence about discriminability.

https://doi.org/10.1371/journal.pcbi.1007456.g002

There are at least two possibilities why in an experimental situation, evidence about discriminability e_d may exist separately from the evidence about the identity e_I: First, when stimuli with different degrees of discriminability are not presented in random sequence, for example when discriminability is constant within one block of the experiment, observers can infer the discriminability of the present stimulus. A second possibility is that observers in many cases are able to perceive discriminability directly: Within the visual system, there is not only sensory evidence about the choice-relevant stimulus feature I, but also sensory evidence about other features of the stimulus, irrelevant to the current choice [30,31]. For example, in a masked orientation task, observers may estimate the discriminability not only by their percept of the orientation, but also by their percept of the shape, texture, or presentation time of the stimulus, even when these features are not explicitly manipulated by the experimenter [32]. All sensory evidence irrelevant to the current choice can be used as evidence about the discriminability as long as it is correlated with discriminability.

Why is confidence not exclusively based on sensory evidence dependent on the choice-relevant features of the stimulus if decision confidence is calculated objectively, but also on evidence for the quality and reliability of perception itself? The key fact is that confidence as the posterior probability that the choice is correct given the evidence is only objective if it includes all information that is dependent on the stimulus. Given confidence is objective only if all evidence available is used, and if e_d exists in a specific task, it follows that objective confidence should be based on e_d, too.

Rationale of the present study

In the present study, we used Monte Carlo simulations to trace the statistical patterns of optimal confidence calculated as the posterior probability of being correct given the evidence. Our simulations were based on the standard model as well as on the general model, which extends the standard model by assuming that observers on single trial basis obtain evidence about the discriminability of the stimulus. Based on the general model, we also examined the impact of the reliability of evidence about discriminability on the statistical pattern of confidence. Finally, we examined if relying confidence on evidence about discriminability is a beneficial strategy, or if it is an example of a suboptimal mental shortcut to the probability of being correct [6,27,33–36], i.e. a heuristic [37,38].

Results

Standard model

Fig 3 shows the patterns of confidence obtained from simulations based on the standard model. Only two of the three postulated features of the folded X-pattern consistently follow from the standard model: Independent of the distribution of discriminability |d|, confidence in correct choices always increases as a function of discriminability, and confidence in incorrect choices always decreases with discriminability. However, when the stimulus is neutral about the choice options, confidence is .75 only when |d| is sampled from a continuous uniform distribution that includes high discriminability (see Fig 3F). When |d| is sampled from a discrete uniform distribution (Fig 3A–3C) or a gamma distribution (Fig 3D and 3E), or when the continuous uniform distribution does not support high discriminability (Fig 3A and 3B), confidence in choices about neutral stimuli is not .75.

Download:

Fig 3. Objective confidence given the standard model of confidence.

Confidence (y-axis) is shown as a function of discriminability (x-axis) in correct choices (blue) and incorrect choices (orange). Different panels show different distributions from which discriminability was sampled. Panels a-c: Discrete uniform distributions. Panels d-f: Continuous uniform distributions. Panels g-i: Gamma distributions. In all simulations, the percept e_I was sampled from a normal distribution with a mean equal to the stimulus d and a standard deviation σ_I of 1.

https://doi.org/10.1371/journal.pcbi.1007456.g003

Fig 4 shows the effect of the number of possible values for |d| with a constant maximal level for |d|, assuming a finite number of possible values as well as an equal probability of each value. When there are only few possible values for |d|, confidence in choices with neutral evidence is below .75 (see Fig 4A and 4B). Only when the number of discrete possible values increases–and thus the distribution which |d| is sampled from becomes more similar to a continuous uniform distribution—confidence in choices about neutral evidence becomes close to .75 (see Fig 4C and 4D).

Download:

Fig 4. Objective confidence in standard model depending on the number of different levels of discriminability.

Confidence (y-axis) is shown as a function of discriminability (x-axis) in correct choices (blue) and incorrect choices (orange). Different panels show different numbers of levels of discriminability |d|, sampled from discrete uniform distributions. Possible values of |d| are 0, 2, and 4 (Panel a), 0, 1, 2, 3, and 4 (Panel b), 0, ½, 1, …, 4 (Panel c) or 0, ¼, ½, …, 4 (Panel d). The percept e_I was sampled from a normal distribution with a mean equal to the stimulus d and a standard deviation σ_I of 1.

https://doi.org/10.1371/journal.pcbi.1007456.g004

Fig 5 shows the patterns of confidence assuming only two equally probable values of |d|. In this case, confidence in choices about neutral evidence is not .75, irrespective of whether neutral stimuli are paired with hard decisions (see Fig 5A and 5B), or with easier decisions (see Fig 5C and 5D).

Download:

Fig 5. Objective confidence in the standard model if discriminability is either 0 or maximal.

Discriminability |d| was always sampled from discrete uniform distributions with only two values. One of the two possible values was always 0, indicating neutral stimuli with respect to the choice options. The second possible value of |d| was 1 (Panel a), 2 (Panel b), 3 (Panel c), or 10 (Panel d). Confidence (y-axis) is shown as a function of discriminability (x-axis) in correct choices (blue) and incorrect choices (orange). The percept e_I was sampled from a normal distribution with a mean equal to the stimulus d and a standard deviation σ_I of 1.

https://doi.org/10.1371/journal.pcbi.1007456.g005

General model

What is the pattern of confidence expected from the general model? As can be seen from Fig 6, the general model is compatible with both the folded X-pattern and the double increase pattern. When σ_d is small and thus the evidence about discriminability is reliable (see Fig 6, a1-a9), confidence approaches 0.5 when discriminability is 0. In addition, confidence increases with discriminability for both in correct choices well as in incorrect choices, i.e. confidence is characterised by what we refer to as the double increase pattern. These patterns are the same across different distributions of discriminability (Fig 6, different rows). When σ_d is large and thus there is only corrupted evidence about discriminability (see Fig 6, d1-d9), the pattern of confidence is the same as for the standard model (cf. Fig 3). When σ_d increases (see Fig 6, b1-b9, c1-c9), confidence in choices about stimuli with d = 0 increases. Additionally, when σ_d increases, the correlation between discriminability and confidence in incorrect choices becomes more negative, eventually switching sign from positive to negative.

Download:

Fig 6. Objective confidence according to the general model of confidence.

The sensory evidence about the identity e_I and evidence about discriminability e_d were both sampled from normal distributions, with standard deviations σ_I = 1 and σ_d varying across columns. Confidence (y-axis) is shown as a function of discriminability (x-axis) in correct trials (blue) and incorrect trials (orange). Panels a1-a9: σ_d = 0.1. Panels b1-b9: σ_d = 0.33. Panels c1-c9: σ_d = 1. Panels d1-d9: σ_d = 10. Different rows indicate different distributions of discriminability within the simulated experiments. Rows 1–3: Discrete uniform distributions, rows 4–6: continuous uniform distributions, rows 7–9: Gamma distributions.

https://doi.org/10.1371/journal.pcbi.1007456.g006

Accuracy of confidence

Accuracy of confidence was assessed by the information entropy of choice accuracy conditioned on confidence H(A|c). The information entropy is a measure of prediction error motivated by the free energy principle [39]: H(A|c) reflects the uncertainty with respect to choice accuracy given confidence; if choice accuracy is perfectly specified by confidence, H(A|c) will be zero. Fig 7 compares H(A|c) between confidence based on evidence about the identity e_I only and confidence based on evidence about the identity e_I and evidence about discriminability e_d. The assumption that confidence is based exclusively on evidence about e_I is equivalent to the standard model. Fig 7 shows that when the standard deviation of the evidence about discriminability σ_d is low, confidence based on e_I and e_d is associated with a lower information entropy of accuracy conditioned on confidence than confidence based on e_I alone. This means that when there is an accurate estimate of discriminability, confidence that takes the evidence about discriminability into account is associated with a smaller prediction error than confidence ignoring evidence about discriminability. For larger values of σ_d, H(A|c) is the same between confidence based on e_I and e_d and confidence based on e_I, meaning that there is no longer a benefit of the estimate of discriminability when the estimate was too noisy. Importantly, even when σ_d is very large, there is never a case when confidence based on e_I and e_d is worse than confidence based solely on e_I.

Download:

Fig 7. The information entropy of choice accuracy conditioned on confidence H(A|c).

The noise parameters of the estimate of discriminability σ_d is displayed on the x-axis. Different panels indicate different distributions of discriminability within the simulated experiments. Panels a-c: Discrete uniform distributions. Panels d-f: Continuous uniform distributions. Panels g-i: Gamma distributions. Violet symbols indicate H(A|c) when confidence is calculated exclusively based on sensory evidence about the identity of the stimulus e_I. Orange symbols indicate H(A|c) when confidence is calculated based on evidence about the identity of the stimulus e_I and on evidence about discriminability e_d. The standard deviation of evidence about the identity σ_I was set to 1.

https://doi.org/10.1371/journal.pcbi.1007456.g007

Discussion

The present study showed that the objective calculation of confidence does often not imply the folded X-pattern. When there is sufficient evidence about discriminability as predicted by the general model, the correlation between discriminability and confidence in incorrect trials is positive, not negative. Even if there is no evidence about discriminability, confidence in choices about neutral stimuli is not .75 unless discriminability is sampled from a continuous uniform distribution with high maximal discriminability. We also showed by simulations that if observers make optimal use of the evidence, and if evidence about discriminability is available, then confidence depends on evidence about discriminability.

The observation that the Bayesian calculation of confidence does not always imply the folded X-pattern corroborates the results of a previous study [26]. Adler and Ma showed that the folded X-pattern depends on the distribution from which the stimulus is sampled. Specifically, confidence in incorrect choices no longer decreases with discriminability if stimuli are only probabilistically related to which choice observers ought to make. Likewise, confidence in neutral events is .75 only if the width of the stimulus distribution is quite large compared to the noise in perception. The present study shows that there are at least two more cases where confidence is not expected to follow the folded X-pattern. First, when discriminability does not vary continuously but in a small number of discrete steps, optimal confidence in choices about neutral events is not .75. Notably, previous studies assuming the folded X-pattern typically relied on discrete manipulations of discriminability. Second, when observers can perceive or infer discriminability on a single trial level with sufficient accuracy, objective confidence follows the double increase pattern.

In summary, these observations imply that blind reliance on the folded X-pattern potentially leads to false conclusions. Identifying correlates of confidence by a priori presupposing the folded X-pattern is not advisable because objective confidence may not show the expected properties. Likewise, it is also not advisable to infer the computational principles underlying observed confidence judgments based on statistical signatures alone, because various different models are able to recreate the folded X-pattern [26,28,29,32], just as the double increase pattern [26,32,40]. Importantly, both the folded X-pattern and the double increase pattern are compatible with Bayesian computation of confidence, which is why model fitting is necessary to ascertain which model is the generative model of the data [26].

Why should sensory evidence parallel to the choice improve objective confidence?

The double increase pattern has been regarded as indicative of a suboptimal mental shortcut to the probability of being correct [33], i.e. a heuristic [37,38]. However, as evidence about discriminability in fact decreases the prediction error of confidence, the double increase pattern may in some cases indicate optimal, not suboptimal calculation of confidence.

Too see why it is necessary to include e_d in the calculation of objective confidence, we can look at the formula of posterior probability of the identity according to the general model (see S2 Appendix for the derivation): (1) In formula (1), I represents the identity of the stimulus, e_I the evidence about the identity, d discriminability, and e_d the is the evidence about discriminability. As can be seen from the formula, evidence about the discriminability e_d is needed to calculate the objective posterior probability given the evidence. This means if observers make optimal use of the evidence, and if evidence about the discriminability e_d is available, e_d ought to be included into the calculation of the posterior probability of the identity and hence confidence.

Now, to get some intuition why it is optimal to include e_d in the calculation of confidence, let us look at formula (1) more closely. The Bayesian computation of the posterior probability divides the likelihood of the evidence about the identity e_I given the identity 1 (the terms in the numerator) by the sum of the likelihood of e_I given I = -1 and the likelihood of e_I given I = 1 (the terms in the denominator). Calculating the likelihood of e_I requires knowledge of the distribution from which e_I is sampled. However, according to the model, e_I is sampled from a Gaussian whose mean not only depends on I, but also on d. For this reason, the likelihood of e_I given I is calculated by multiplying the prior probability of a specific level discriminability p(d_k) with the likelihood of e_I given the level discriminability and the identity p(e_I|d_k,I), and summing these terms across all levels of discriminability. Conceptually, these terms imply a consideration how plausible e_I is given the identity and given the level of discriminability, weighted by the plausibility of that level of discriminability. These terms are summed over all possible values of discriminability. The product of p(d_k) and p(e_I|d_k,I) represents the case of the standard model: Observers know how plausible each degree of discriminability is across the experiment, and based on that prior information, they evaluate the plausibility of e_I. The novel feature of the general model is the inclusion of the probability of evidence about discriminability given discriminability p(e_d|d_k). Conceptually, p(e_d|d_k) implies the evaluation how plausible the level of discriminability is based on the evidence about the discriminability. As can be seen in the formula, p(e_d|d_k) is multiplied with p(d_k) and p(e_I|d_k,I). Thus, in the general model, observers attach a weight to p(e_I|d_k,I) not only based the prior knowledge of the distribution of discriminability within the experiment, but they also evaluate the plausibility of each degree of discriminability based on sensory evidence about the discriminability. Thus, evidence about the discriminability improves the efficiency of the evaluation of e_I because evaluating the plausibility of p(e_I|I) requires knowledge about d, and some additional information about the discriminability is better than the prior distribution alone. If p(e_d|d_k) is the same across all levels of discriminability, the general model makes the same predictions as the standard model; conceptually, identical p(e_d|d_k) across all levels of discriminability represents the case when there is no information about discriminability on a single trial basis.

Empirical support for folded X- and the double increase pattern

What is the empirical evidence concerning the two statistical patterns of confidence? Several previous experiments were indeed in accordance with the folded X-pattern. In an auditory discrimination task [14], a general knowledge task [14], as well as a visual two-alternative forced choice tasks [41], confidence increased with discriminability in correct trials, decreased with discriminability in incorrect trials, and was medium when stimuli could not be distinguished. The folded X-pattern was also consistent with rats’ willingness to wait for reward in an odour discrimination task [7,24], which can be seen as a marker of confidence in non-humans.

However, six other studies based on human observers were not consistent with the folded X-pattern, and three of these studies revealed the double increase pattern instead. In two random dot motion discrimination tasks, coherence of motion was positively, not negatively, associated with confidence in incorrect trials [42,43]. Likewise, in a masked orientation discrimination task, confidence in incorrect trials increased with stimulus-onset-asynchrony as well [32]. Two studies revealed a relationship between confidence in incorrect trials and discriminability that was essentially flat. In a second masked orientation discrimination task, in which observers' confidence was assessed by asking observers on which of two subsequent orientation judgments they were willing to bet, confidence in incorrect trials was approximately constant across levels of stimulus contrast [44]. Moreover, in a low-contrast orientation discrimination task, the average confidence in incorrect trials was approximately constant across task difficulty levels [45]. Finally, in a discrimination task about the average orientation of a sequence of oriented Gabor patches, one subset of observers showed the folded X-pattern and another subset the double increase pattern [33], although the interpretation of the inverse variability of sequence of oriented Gabor patches as discriminability is controversial [26].

Overall, these studies suggested that the folded X-pattern is by no means universal. Although there is empirical support for the folded X-pattern in some experiments, in other experiments the pattern is just opposite to what has been considered as the signature of confidence.

How can the differences between those studies be explained? One possibility is that some experimental tasks allow observers to estimate the discriminability on a single trial basis, as predicted by the general model: Strikingly, all studies that reported an increase of confidence and incorrect choices with discriminability were based on psychophysical tasks where the stimulus was composed out of one feature that defined the response as well as an orthogonal manipulation of discriminability: In the random dot motion discrimination tasks, participants responded to the direction of motion, and the discriminability was manipulated by the coherence of the motion signal [42,43]. Likewise, in the masked orientation task, the identity of the stimulus was defined by the orientation of the stimulus, while discriminability was manipulated by the time between stimulus onset and mask onset [32]. In contrast, those studies that observed that confidence in incorrect choices decreased with discriminability all aimed to vary the evidence more directly by using stimulus material providing different mixtures of evidence to the observer: The auditory discrimination experiment delivered click streams to both ears of the observers, and participants had to indicate which click rate was faster. Importantly, evidence was varied by the ratio between click frequencies in the two streams [14]. Likewise, the general knowledge task required observers to decide which of two countries had a greater population, with discriminability defined as the log ratio of the population size of the two countries [14]. Finally, participants in one of the two visual two-alternative forced choice tasks indicated which of two presented textured stimuli showed had un unequal amount of white and black squares. The difficulty of the task was varied by the proportion of white to black squares [41]. In all these tasks, the stimulus consisting of mixtures of evidence about the identity might make it more difficult to estimate discriminability.

An alternative explanation for the differences between studies relying on the timing of the confidence measurement is not consistent with all the existing studies. It has been argued that asking observers to indicate their choice and their confidence at the same time interferes with the confidence report [14]. For example, asking participants to report confidence and choice at the same time might be sufficient to induce a report strategy that is no longer based on posterior probabilities, but on heuristics [36]. Additionally, measuring confidence after the choice may allow observers to collect additional evidence after the choice or even change their minds [3,41,42,46,47]. In favour of the timing-based explanation, those studies to report a decrease of confidence with discriminability assessed first the choice and confidence only after the choice [14,41]. The studies to report the opposite pattern more often recorded confidence simultaneously with the response [42,43]. Nevertheless, at least in the masked orientation discrimination task, the timing of the responses does not provide a satisfying explanation, because an increase of confidence in incorrect choices with discriminability was consistently observed irrespective of whether confidence was assessed at the same time as the choice or afterwards [32]. Future experiments appear necessary to test if the timing of the confidence measurement influences patterns of confidence in the other experimental paradigms.

Is there other empirical support for the hypothesis that confidence is not only based on sensory evidence about the identity of the stimulus, but also on evidence about discriminability? There is evidence that the brain represents estimates of discriminability: A recent neuro-imaging study showed that neural areas in posterior parietal cortex and ventral striatum track sensory reliability independently of the choice [4]. To our knowledge, only one study so far included evidence about discriminability into a formal modelling analysis. In a masked orientation discrimination task, confidence was best explained by a combination of evidence about the identity of the stimulus as well as the general visibility of the stimulus, although the study did not test whether evidence about the identity of the stimulus and visibility were combined in a Bayesian fashion [32]. In contrast, when the double increase pattern was observed in random dot kinematograms, the increase of confidence in errors with discriminability was explained by an influence of decision times of confidence [42,43]. However, at least in the masked orientation discrimination task, decision times cannot not account for the increase of confidence in errors with discriminability because decision time in incorrect trials was uncorrelated with discriminability [32].

Although more experiments are clearly necessary to investigate the relationship between confidence and decision time, the hypothesis regarding e_d gains some plausibility due to converging evidence that human confidence is informed by many cues. One mechanism may rely on the variability of e_I: In a random dot motion discrimination task, confidence depended on the consistency of the random dot motion, although discrimination performance was equated [48]. Additionally, when observers discriminated the average colour of an array of coloured shapes, confidence was not only determined by the distance of the average colour to the category boundary, but was also affected by the variability of colour across the array [49]. A second mechanism may rely on the elapsed time during decision making: In in a global motion discrimination task, the time required to make a decision was varied while the sensory evidence about the motion direction was equated, showing that decision time directly informed confidence [42]. Given that human metacognition appears to make use of such a variety of cues, it seems plausible to us that sensory evidence about discriminability may be involved as well.

Conclusion

To summarize, the present paper argues that the folded X-pattern can be misleading as a signature of confidence. On theoretical grounds, it can be expected that in many psychophysical tasks, confidence in incorrect choices increases, not decreases with discriminability. On empirical grounds, it must be acknowledged that the folded X-pattern can only be observed for some tasks, while it does hold true for other tasks. Overall, it is not legitimate to identify neural correlates of confidence by assuming a specific signature of confidence a priori. When statistical properties are used to track correlates of confidence, it appears essential to empirically assess the pattern of confidence in each single task using behavioural markers of confidence.