Observing the Observer (II): Deciding When to Decide

In a companion paper [1], we have presented a generic approach for inferring how subjects make optimal decisions under uncertainty. From a Bayesian decision theoretic perspective, uncertain representations correspond to “posterior” beliefs, which result from integrating (sensory) information with subjective “prior” beliefs. Preferences and goals are encoded through a “loss” (or “utility”) function, which measures the cost incurred by making any admissible decision for any given (hidden or unknown) state of the world. By assuming that subjects make optimal decisions on the basis of updated (posterior) beliefs and utility (loss) functions, one can evaluate the likelihood of observed behaviour. In this paper, we describe a concrete implementation of this meta-Bayesian approach (i.e. a Bayesian treatment of Bayesian decision theoretic predictions) and demonstrate its utility by applying it to both simulated and empirical reaction time data from an associative learning task. Here, inter-trial variability in reaction times is modelled as reflecting the dynamics of the subjects' internal recognition process, i.e. the updating of representations (posterior densities) of hidden states over trials while subjects learn probabilistic audio-visual associations. We use this paradigm to demonstrate that our meta-Bayesian framework allows for (i) probabilistic inference on the dynamics of the subject's representation of environmental states, and for (ii) model selection to disambiguate between alternative preferences (loss functions) human subjects could employ when dealing with trade-offs, such as between speed and accuracy. Finally, we illustrate how our approach can be used to quantify subjective beliefs and preferences that underlie inter-individual differences in behaviour.


Introduction
How can we infer subjects' beliefs and preferences from their observed decisions? Or in other terms, can we identify the internal mechanisms that led subjects to act, as a response to experimentally controlled stimuli? Numerous experimental and theoretical studies imply that subjective prior beliefs, acquired over previous experience, strongly impact on perception, learning and decision-making ([2][3][4][5][6]). We also know that preferences and goals can impact subjects' decisions in a fashion which is highly context-dependent and which subjects may be unaware of ([7][8]). But how can we estimate and disentangle the relative contributions of these components to observed behaviour? This is the nature of the so-called Inverse Bayesian Decision Theory (IBDT) problem, which has been a difficult challenge for analytical treatments.
In a companion paper [1], we have described a variational Bayesian framework for approximating the solution to the IBDT problem in the context of perception, learning and decision-making studies. Subjects are assumed to act as Bayesian observers, whose recognition of the hidden causes of their sensory inputs depends on the inversion of a perceptual model with subject-specific priors. The Bayesian inversion of this perceptual model derives from a variational formulation, through the minimization of sensory surprise (in a statistical sense). More precisely, the variational Bayesian approach maximizes the so-called "free energy", which is a lower bound on negative (statistical) surprise about the sensory inputs. The ensuing probabilistic subjective representation of hidden states (the posterior belief) then enters a response model of measured behavioural responses. Critically, decisions are thought to minimize expected loss or risk, given the posterior belief and the subject-specific loss (or utility) function that encodes the subject's preferences. The response model thus provides a complete mechanistic mapping from experimental stimuli to observed behaviour. Over time or trials, the response model has the form of a state-space model (e.g., [9]), with two components: (i) an evolution function that models perception and learning through surprise minimization and (ii) an observation function that models decision making through risk minimization.
Solving the IBDT problem, or observing the observer, then reduces to inverting this state-space response model, given experimentally measured behaviour. This meta-Bayesian approach (experimenters make Bayesian inferences about subjects' Bayesian inferences) provides an approximate solution to the IBDT problem in that it enables comparisons of competing (perceptual and response) models and inferences on the parameters of those models. This is important, since evaluating the evidence of, for example, different response models in the light of behavioural responses means we can distinguish between different loss functions (and thus preferences) subjects might have. This paper complements the theoretical account in the companion paper by demonstrating the practical applicability of our framework. Here, we use it to investigate what computational mechanisms operate during learning-induced motor facilitation. While it has often been found that (correct) expectations about sensory stimuli speed up responses to those stimuli (e.g. [10][11]), explaining this acceleration of reaction times in computationally mechanistic terms is not trivial. We argue that such an explanation must take into account the dynamics of subjective representations, such as posterior beliefs about the causes that generate stimuli, and their uncertainty, as learning unfolds over trials. Throughout the text, "representation" refers to posterior densities of states or parameters. We investigate these issues in the context of an audio-visual associative learning task [12], where subjects have to categorize visual stimuli as quickly as possible. We use this task as a paradigmatic example of what sort of statistical inference our model-based approach can provide.
As explained in detail below, this task poses two interesting explananda for computational approaches: (i) it relies upon a hierarchical structure of causes in the world: visual stimuli depend probabilistically on preceding auditory cues whose predictive properties change over time (i.e., a volatile environment), and (ii) it introduces a conflict in decision making, i.e. a speed-accuracy trade-off.
We construct two Bayesian decision theoretic (BDT) response models based upon the same speed-accuracy trade-off (c.f. [13] or [14]), but differing in their underlying perceptual model. These two perceptual models induce different learning rules, and thus different predictions, leading to qualitatively different trial-by-trial variations in reaction times. We have chosen to focus on reaction time data to highlight the important role of the response model and to show that optimal responses are not just limited to categorical choices.
Of course, the validity of a model cannot be fully established by application to empirical data whose underlying mechanisms or "ground truth" are never known with certainty. However, by ensuring that only one of the competing models was fully consistent with the information given to the subjects, we established a reference point against which our model selection results could be compared, allowing us to assess the construct validity of our approach. Furthermore, we also performed a simulation study, assessing the veracity of parameter estimation and model comparison using synthetic data for which the ground truth was known.

Methods
How does learning modulate reaction times? In this section, we first describe the associative learning task, and then the perceptual and response models we have derived to model the reaction time data. We then recall briefly the elements of the variational Bayesian framework which is described in the companion paper in detail and which we use to invert the response model given reaction time data. Next, we describe the Monte-Carlo simulation series we have performed to demonstrate the validity of the approach. Finally, we summarize the analysis of real reaction time data, illustrating the sort of inference that can be derived from the scheme, and establishing the construct validity of the approach.

The associative learning task
The experimental data and procedures have been reported previously as part of a functional magnetic resonance imaging study of audio-visual associative learning [12]. We briefly summarize the main points. Healthy volunteers were presented with visual stimuli (faces or houses) following an auditory cue. The subjects performed a speeded discrimination task on the visual stimuli. On each trial, one of two possible auditory cues was presented (simple tones of different frequencies; $C_1$ and $C_2$), each predicting the subsequent visual cue with a different probability. The subjects were told that the relationship between auditory and visual stimuli was probabilistic and would change over time but that these changes were random and not related to any underlying rule. The reaction time (from onset of visual cue to button press) was measured on each trial.
The probability of a given visual outcome or response cue, say face, given $C_1$ was always the same as the probability of the alternative (house) given $C_2$: $p(\text{face} \mid C_1) = 1 - p(\text{face} \mid C_2)$. Moreover, since the two auditory cues occurred with equal frequency, the marginal probability of a face (or house) on any given trial was always 50%. This ensured that subjects could not be biased by a priori expectations about the outcome. In the original regression analyses in [12] no differences were found between high and low tone cues, nor any interactions between cue type and other experimental factors; here, we therefore consider the trials cued by $C_1$ and $C_2$ as two separate (intermingled, but non-interacting) sequences. This allows us to treat the two sequences as replications of the experiment, under two different auditory cues. We hoped to see that the results were consistent under the high and low tone cues.
A critical manipulation of the experiment was that the probabilistic cue-outcome association pseudorandomly varied over blocks of trials, from strong ($p(\text{face} \mid C) = 0.9$) and moderate ($p(\text{face} \mid C) = 0.7$) to random ($p(\text{face} \mid C) = 0.5$). Our subjects were informed about the existence of this volatility without specifying the structure of these changes (timing and probability levels). We prevented any explicit search for systematic relationships by varying the length of the blocks and by presenting predictive and random blocks in alternation. In one session, each block lasted for 28-40 trials, within which the order of auditory cues was randomized. Each of five sessions lasted approximately seven minutes. On each trial, an auditory cue was presented for 300 ms, followed by a brief (150 ms) presentation of the visual outcome. In order to prevent anticipatory responses or guesses, both the inter-trial interval (2000 ± 650 ms) and visual stimulus onset latency (150 ± 50 ms) were jittered randomly.
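The block structure described above can be sketched in a few lines of Python. This is a minimal illustration, not the original stimulus code: the function names `simulate_task` and `marginal_p_face`, the random choice between the two predictive levels, and the use of Python's `random` module are all assumptions made for the example. It also checks the claim that, with equally frequent cues and mirrored contingencies, the marginal probability of a face is 50%.

```python
import random


def simulate_task(n_blocks=10, seed=0):
    """Return a list of (block, cue, p_face, face) tuples.

    Assumptions (hypothetical, for illustration only): predictive blocks
    pick p(face|C1) from {0.9, 0.7} at random and alternate with random
    blocks (0.5); block lengths are drawn uniformly from 28-40 trials.
    """
    rng = random.Random(seed)
    trials = []
    for block in range(n_blocks):
        # Predictive and random blocks are presented in alternation.
        p_face_c1 = rng.choice([0.9, 0.7]) if block % 2 == 0 else 0.5
        for _ in range(rng.randint(28, 40)):
            cue = rng.choice([1, 2])  # both cues occur equally often
            # Mirrored contingency: p(face|C2) = 1 - p(face|C1).
            p_face = p_face_c1 if cue == 1 else 1.0 - p_face_c1
            face = rng.random() < p_face
            trials.append((block, cue, p_face, face))
    return trials


def marginal_p_face(p_face_c1):
    """Marginal probability of a face when both cues are equiprobable."""
    return 0.5 * p_face_c1 + 0.5 * (1.0 - p_face_c1)
```

Whatever the contingency level, `marginal_p_face` returns 0.5 (up to floating-point error), which is why the outcome alone carries no a priori bias.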
The conventional analysis of variance (ANOVA) of the behavioural measures presented in [12] demonstrated that subjects learned the cue-outcome association: reaction times to the visual stimuli decreased significantly with increasing predictive strengths of the auditory cues. In what follows, we try to better understand the nature of this learning and the implicit perceptual models the subjects were using.

Perceptual and response models
The first step is to define the candidate response models that we wish to consider. In what follows, we will restrict ourselves to two qualitatively different perceptual models, which rest on different prior beliefs and lead to different learning rules (i.e. posterior belief update rules or recognition processes). To establish the validity of our meta-Bayesian framework, the two models used for the analysis of the empirical data were deliberately chosen such that one of them was considerably less plausible than the other: whereas a "dynamic" model exploited the information given to the subjects about the task, the other ("static") model ignored this information. This established a reference point for our model comparisons (akin to the "ground truth" scenario used for validating models by simulated data). These perceptual models were combined with a loss-function embodying the task instructions to form a complete BDT response model. This loss-function had two opposing terms, representing categorization errors and the decision time, respectively, and thus inducing the speed-accuracy trade-off of the task. We now describe the form of these probabilistic models and their inversion.
Perceptual models. The sensory signals (visual outcomes) $u$ presented to the subjects were random samples from two sets of images, composed of eight different faces and eight different houses, respectively. A two-dimensional projection of these images onto their first two principal eigenvectors clearly shows how faces and houses cluster around two centres that can be thought of as an "average" face and house, respectively (see Figure 1). We therefore assumed the sensory inputs $u$ to be a univariate variable (following some appropriate dimension reduction), whose expectation depends upon the hidden state (face or house). This can be expressed as a likelihood that is a mixture of Gaussians:
$$p(u_k \mid x_k^{(1)}) = N(u_k; \eta_1, \alpha^2)^{x_k^{(1)}} \, N(u_k; \eta_2, \alpha^2)^{1 - x_k^{(1)}}$$
Here $(\eta_1, \eta_2)$ are the expected sensory signals caused by houses and faces (the "average" face and house images), $k$ is a trial index, $x_k^{(1)} \in \{0, 1\}$ is an indicator state that signals the category ($x_k^{(1)} = 1$: house, $x_k^{(1)} = 0$: face), and $\alpha$ is the standard deviation of visual outcomes around the average face/house images. During perceptual categorization, subjects have to recognize $x_k^{(1)}$, given all the sensory information to date. As faces and houses are well-known objects for whose categorisation subjects have a life-long experience, it is reasonable to assume that $(\eta_1, \eta_2)$ and $\alpha$ are known to the subjects. The hidden category states $x_k^{(1)}$ have a prior Bernoulli distribution conditioned on the cue-outcome associative strength $x_k^{(2)}$:
$$p(x_k^{(1)} \mid x_k^{(2)}) = s(x_k^{(2)})^{x_k^{(1)}} \, \left(1 - s(x_k^{(2)})\right)^{1 - x_k^{(1)}}$$
The sigmoid function $s(x_k^{(2)}) = p(x_k^{(1)} = 1 \mid C_i)$ maps the associative strength $x_k^{(2)}$ to the probability of seeing a house given the present auditory cue $C_i$ ($i \in \{1, 2\}$). Figure 2 summarises the general structure of the perceptual models of associative learning in this paper.
Figure 1. 2D projection of the visual stimuli that were presented to the subjects (two sets of eight face images and eight house images, respectively). X-axis: first principal component, y-axis: second principal component. On this 2D projection, house and face images clearly cluster (green and blue ellipses) around "average" face and house (green and blue stars), respectively. One might argue that these ellipses approximate the relative ranges of variations of faces and houses, as perceived by the visual system. doi:10.1371/journal.pone.0015555.g001
We considered two perceptual models that differed only in terms of prior beliefs about the associative strength. Although both models have a prior expectation of zero for the associative strength, they differ profoundly in their predictions about how that associative strength changes over time. This is reflected by the different roles of the perceptual parameter $\vartheta$ in the two models:
- The static perceptual model, $m_1^{(p)}$: Subjects were assumed to ignore the possibility of changes in associative strength and treat it as stationary. Under this model, subjects assume that the associative strength has a constant value, $x_0^{(2)}$, across trials and is sampled from a Gaussian prior, i.e. $p(x_0^{(2)} \mid \vartheta) = N(x_0^{(2)}; 0, \vartheta^{-1})$, where $\vartheta$ is its (fixed) prior precision. Here, the perceptual parameter $\vartheta$ effectively acts as an (unknown) initial condition for the state-space formulation of the problem (see Equation 13 below).
- The dynamic perceptual model, $m_2^{(p)}$: Subjects assumed a priori that the associative strength $x_k^{(2)}$ varied smoothly over time, according to a first-order Markov process. This is modelled as a random walk with a Gaussian transition density. Here, $\vartheta$ is the precision hyperparameter which represents the roughness (inverse smoothness) of changes in associative strength (i.e., its volatility).
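To make the generative structure of the two perceptual models concrete, the following Python sketch samples from them. It is an illustration under stated assumptions, not the authors' implementation: the names `sample_association` and `sample_trial`, the numerical defaults, and the parametrisation by standard deviations (`prior_sd`, `step_sd`) are hypothetical, and the exact mapping between these quantities and the perceptual parameter is deliberately left open. The dynamic path is started at the prior expectation of zero, which is also an assumption.

```python
import math
import random


def sigmoid(x):
    """Map associative strength to the probability of a house."""
    return 1.0 / (1.0 + math.exp(-x))


def sample_association(n_trials, model, rng, prior_sd=1.0, step_sd=0.1):
    """Sample a trajectory of associative strength x2 over trials.

    static : one draw from a zero-mean Gaussian prior, constant thereafter.
    dynamic: a Gaussian random walk (first-order Markov process).
    """
    x2 = rng.gauss(0.0, prior_sd) if model == "static" else 0.0
    path = []
    for _ in range(n_trials):
        if model == "dynamic":
            x2 = rng.gauss(x2, step_sd)  # Gaussian transition density
        path.append(x2)
    return path


def sample_trial(x2, rng, eta_house=1.0, eta_face=-1.0, alpha=0.1):
    """Sample the category (1: house, 0: face) and a univariate sensory signal."""
    x1 = 1 if rng.random() < sigmoid(x2) else 0      # Bernoulli prior
    mean = eta_house if x1 == 1 else eta_face         # Gaussian mixture component
    return x1, rng.gauss(mean, alpha)
```

Under the static model every entry of the sampled path is identical, whereas under the dynamic model the path drifts from trial to trial; this is the only structural difference between the two sketches.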
Note that the task information given to subjects did highlight the possibility of changes in cue strength. Therefore, from the point of view of the experimenter, it is more likely that the subjects relied upon the dynamic model to form their prior predictions. The choice of these two models was deliberate as it allowed for a clear prediction: we hoped to see that model comparison would show a pronounced superiority of the dynamic model (see section "Inverting the response model" below).
Recognition: the variational Bayesian inversion of the perceptual model. Given the perceptual models described above, we can now specify the recognition process in terms of their variational Bayesian inversion. The generic derivation of the recognition process is detailed in the companion paper [1]. In brief, subjects update their belief on-line, using successive stimuli to optimise $\lambda_k = \{\mu_k^{(1)}, \mu_k^{(2)}, \sigma_k^{(2)}\}$, the sufficient statistics of the posterior density on the $k$-th trial. Under a mean-field/Laplace approximation to the joint posterior, these sufficient statistics are (i) $\mu^{(1)}$, the first-order moment of the Bernoulli posterior $q(x_k^{(1)})$ about the outcome category $x^{(1)}$, and (ii) $(\mu^{(2)}, \sigma^{(2)})$, the first- and second-order moments of the Gaussian posterior $q(x_k^{(2)})$ about the associative strength $x^{(2)}$. The recognition process derives from the minimization of the surprise conveyed by sensory stimuli at each trial. Within a variational Bayesian framework, negative surprise is measured (or, more precisely, lower-bounded) via the so-called perceptual free-energy $F_k^{(p)}$ [15] (Equation 5), where the expectation is taken under the approximate posterior densities (representations) $q(x_k^{(1)})$ and $q(x_k^{(2)})$. Note also that the perceptual free energy $F_k^{(p)}$ of the $k$-th trial depends on the representation of associative strength at the previous trial, through the sufficient statistics $\mu_{k-1}^{(2)}$ and $\sigma_{k-1}^{(2)}$. Therefore, these affect the current optimal sufficient statistics $\lambda_k$ (including that of the outcome category), allowing learning to be expressed over trials. Optimizing the perceptual free energy $F_k^{(p)}$ with respect to $q(x_k^{(1)})$ and $q(x_k^{(2)})$ yields the updated posterior densities of both the outcome category (face or house) and of the associative strength (Equations 7 and 8). Note that the functional form of the sufficient statistics above depends upon the perceptual model, through the variance parameter $\sigma_0^{(2)}$, which in turn depends upon the precision parameter $\vartheta$ (see Equation 6).
This dependence is important, since it strongly affects the recognition process. Under the static perceptual model, Equation 8 tells us that the subject's posterior variance $\sigma_k^{(2)}$ about the associative strength is a monotonically decreasing function of trial index $k$. This means that observed cue-outcome stimuli will have less and less influence on the associative strength representation, which will quickly converge. Under the dynamic perceptual model, however, $\vartheta$ scales the influence the past representation has on the current one. In other words, it determines the subject's speed of forgetting (discounting): the more volatile the environment, the less weight is assigned to the previous belief (and thus past stimuli) in the current representation. The key difference between the two perceptual models thus reduces to their effective memory.
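The difference in effective memory can be illustrated with a deliberately simplified stand-in. The sketch below is not the variational update of the paper (whose Bernoulli/sigmoid structure is not reproduced here); it swaps in a scalar Kalman filter for a linear-Gaussian analogue, in which a hidden quantity is either constant (static) or follows a random walk (dynamic). The function name `posterior_variances` and all numerical values are assumptions chosen for illustration.

```python
def posterior_variances(n_trials, prior_var, obs_var, drift_var=0.0):
    """Scalar Kalman filter (linear-Gaussian analogue, NOT the paper's update).

    Returns a list of (posterior_variance, gain) pairs, one per trial.
    drift_var = 0 mimics the static model; drift_var > 0 mimics the
    dynamic (random-walk) model.
    """
    var = prior_var
    history = []
    for _ in range(n_trials):
        var = var + drift_var          # prediction: drift inflates uncertainty
        gain = var / (var + obs_var)   # weight given to the newest observation
        var = (1.0 - gain) * var       # update: observation reduces uncertainty
        history.append((var, gain))
    return history


static = posterior_variances(200, prior_var=1.0, obs_var=1.0)
dynamic = posterior_variances(200, prior_var=1.0, obs_var=1.0, drift_var=0.1)
```

In the static case the posterior variance shrinks on every trial and the gain tends to zero, so new stimuli are progressively ignored. In the dynamic case the gain settles at a strictly positive value, so recent stimuli always carry weight and older ones are discounted.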
We (experimentally) estimate the parameter $\vartheta$ through inversion of the response model $m^{(r)}$, as summarized in the next section. This means the optimisation of perceptual representations has to be repeated for every value of $\vartheta$ that is considered when observing the observer, i.e. during inversion of the response model. This is an important operational aspect of meta-Bayesian inference, where inversion of the response model entails a nested inversion of the perceptual model.
Response model: deciding when to decide. Following the description of the perceptual models, we now define the BDT mapping from representations to behaviour. We assume that subjects decide on the basis of an implicit cost that ranks possible decisions in terms of what decision is taken and when it is made. This cost is encoded by a loss-function $\ell_\theta(x^{(1)}, c, t)$, where $c \in \{0, 1\}$ is the subject's choice (face or house) and $t \in \mathbb{R}^+$ is the decision time. The first term of this loss-function makes a categorisation error costly, whereas the second penalizes decision time. This loss-function creates a speed-accuracy conflict, whose optimal solution depends on the loss parameter $\theta_1$. Since the categorization error is binary, the loss parameter $\theta_1$ can be understood as the number of errors subjects are willing to trade against one second delay. It is formally an error rate that controls the subject-dependent speed-accuracy trade-off. This can lead to an interaction between observed reaction times and choices, of the sort that explains why people make mistakes when in a hurry (see below).
This loss function is critical for defining optimal decisions: $\ell_\theta(x^{(1)}, c, t)$ returns the cost incurred by making choice $c$ at time $t$ while the outcome category is $x^{(1)}$. Because subjects experience perceptual uncertainty about the outcome category, the optimal decision $(c^*, t^*)$ minimizes the expected loss, which is also referred to as posterior risk $Q_\theta$ (this is discussed in more detail in the companion paper): $(c^*, t^*) = \arg\min_{c,t} Q_\theta(c, t)$, with $Q_\theta(c, t) = E_{q(x^{(1)})}\left[\ell_\theta(x^{(1)}, c, t)\right]$. Note that because the expectation is taken with regard to the posterior density on the hidden states (i.e., the belief about stimulus identity), optimal decisions (concerning both choice $c$ and response time $t$) do not only depend on the loss-function $\ell$, but also on the perceptual model $m^{(p)}$.
To derive how posterior risk evolves over time within a trial, we make the representation of outcome category a function of within-trial peristimulus time $t$ (dropping the trial-specific subscript $k$ for clarity): $\mu^{(1)} \to \mu^{(1)}(t)$. We can motivate the form of $\mu^{(1)}(t)$ by assuming that the within-trial recognition dynamics derive from a gradient ascent on the perceptual free-energy $F_k^{(p)}$. This has been recently suggested as a neurophysiologically plausible implementation of the variational Bayesian approach to perception ([3,16,17]). Put simply, this means that we account for the fact that optimizing the perceptual surprise with respect to the representation takes time.
At each trial, the subject's representation is initialized at her prior prediction $\mu_0^{(1)} \equiv \mu^{(1)}(0)$, and asymptotically converges to the optimum of the perceptual free energy, $\mu_\infty^{(1)} \equiv \lim_{t \to \infty} \mu^{(1)}(t)$. (Note that the prior prediction at the beginning of a trial, $\mu_0^{(1)}$, changes over trials due to learning the predictive properties of the auditory cue; see Equations 5-8 above.) It turns out (see Appendix S1) that the posterior risk in Equation 10 can be rewritten as a function of within-trial peristimulus time $t$ and the difference $\Delta\mu_0^{(1)} = \mu_\infty^{(1)} - \mu_0^{(1)}$ between the posterior representation and the prior prediction of the outcome category (which can thus be thought of as a post-hoc prediction error). This is Equation 11, in which the second response parameter $\theta_2$ is an unknown scaling factor that controls the sensitivity to post-hoc prediction error. Note that in the present experimental context, the sensory evidence in favour of the outcome category is very strong. Hence, at convergence of the recognition process, there is almost no perceptual uncertainty about the outcome ($\mu_\infty^{(1)} \approx x^{(1)}$). Thus, regardless of the prior prediction $\mu_0^{(1)}$, the post-hoc prediction error $\Delta\mu_0^{(1)}$ is always positive when a house is presented ($x^{(1)} = 1$) and always negative when a face is shown ($x^{(1)} = 0$). This means that categorization errors occur if: (C1) $\Delta\mu_0^{(1)} < 0$ and $c = 1$, or (C2) $\Delta\mu_0^{(1)} > 0$ and $c = 0$. These conditions can be unified by rewriting them as $\Delta\mu_0^{(1)}(2c - 1) < 0$ (see the Appendix for further mathematical details). An interesting consequence is that categorization errors can be interpreted as reflecting optimal decision-making: they occur whenever the (learned) prior prediction of the visual outcome is incorrect (e.g. $\mu_0^{(1)} \approx 0$ despite $x^{(1)} = 1$) and the delay cost is high enough. In other words, categorization errors are optimal decisions if the risk of committing an error quickly is smaller than that of responding correctly after a longer period.
Note that when $\Delta\mu_0^{(1)}(2c - 1) > 0$ (no categorization error), the posterior risk given in Equation 11 is a convex function of decision time $t$. The shape of this convex function is controlled by both the error rate parameter $\theta_1$ and the sensitivity $\theta_2$ to post-hoc prediction error. Finally, Equation 11 yields the optimal reaction time (Equation 12). Note that this equation has two major implications. First, as one would intuit, optimal reaction times and post-hoc prediction error show inverse behaviour: as the latter decreases, the former increases. Second, and perhaps less intuitive, the optimal reaction time when committing perceptual categorization errors is zero, because in this case the post-hoc prediction error is such that $\Delta\mu_0^{(1)}(2c - 1) < 0$. The reader may wonder at this stage whether predicted RTs of zero are at all sensible. It should be noted that this prediction arises from the deterministic nature of Equation 12. When combined with a forward model accounting for random processes like motor noise (see Equation 13 below), non-zero predicted RTs result. Put simply, Equation 12 states that the cost of an error is reduced, if the decision time is very short.
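The speed-accuracy trade-off discussed above can be explored numerically. The sketch below is a toy illustration under loudly stated assumptions, not the paper's Equations 11 and 12: it assumes a loss equal to a binary categorization error plus a linear delay cost `theta1 * t`, and it replaces the gradient ascent on free energy with a hypothetical exponential relaxation of the belief from the prior prediction `mu0` to the converged belief `mu_inf` with time constant `tau`. The function name `optimal_decision` and the grid search are likewise illustrative choices.

```python
import math


def optimal_decision(mu0, mu_inf, theta1, tau=0.1, t_max=2.0, n_grid=2001):
    """Grid search for the (risk, choice, time) that minimizes expected loss.

    mu0, mu_inf: belief that the outcome is a house at t = 0 and at convergence.
    theta1     : delay cost per second (errors traded against one second).
    tau        : hypothetical relaxation time constant of the belief (seconds).
    """
    best = None
    for i in range(n_grid):
        t = t_max * i / (n_grid - 1)
        # Assumed within-trial dynamics: exponential relaxation towards mu_inf.
        mu_t = mu_inf + (mu0 - mu_inf) * math.exp(-t / tau)
        for c in (0, 1):  # 0: face, 1: house
            p_error = 1.0 - mu_t if c == 1 else mu_t
            risk = p_error + theta1 * t  # expected error plus delay cost
            if best is None or risk < best[0]:
                best = (risk, c, t)
    return best


# A house is shown (mu_inf = 1) although the cue predicted a face (mu0 = 0.1).
patient = optimal_decision(mu0=0.1, mu_inf=1.0, theta1=0.1)
hurried = optimal_decision(mu0=0.1, mu_inf=1.0, theta1=0.5)
```

With a low delay cost the minimizer waits for the belief to converge and then responds correctly; with a high delay cost the minimum-risk decision is to respond immediately according to the (wrong) prior prediction, which mirrors the interpretation of categorization errors as optimal decisions made in a hurry.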

Inverting the response model
Together with Equations 7 and 8, Equations 11 and 12 specify the state-space form of our response model $m^{(r)}$ (Equation 13), where $y_k$ is the observed reaction time at trial $k$ and the residuals $\varepsilon_k \sim N(0, \Upsilon)$, with precision $\Upsilon^{-1} = \theta_3$, account for (i.i.d. Gaussian) random variability in behavioural responses (e.g. motor noise). The second (evolution) equation models recognition through the maximization of perceptual free energy (a lower bound on negative sensory surprise) and the first (observation) equation models decision making through the minimization of posterior risk. The functional form of the optimal decision time is given in Equation 12 (evaluating the post-hoc prediction error $\Delta\mu_0^{(1)}$ at the current trial) and that of the perceptual free energy is given in Equation 5 (recall that learning effects are modulated by the perceptual parameter $\vartheta$). Equation 13 basically implies that the current reaction time $y_k$ is a nonlinear function of both the response parameters $\theta$ and the perceptual parameter $\vartheta$, through the history of representations $\lambda_1, \lambda_2, \ldots, \lambda_k$. The trial-to-trial variation of reaction times $y_1, y_2, \ldots, y_k$ therefore informs us about both the hidden loss and the belief structures of the observer.
The complete formulation of the probabilistic response model involves the definition of the likelihood function (directly derived from Equation 13) and the prior density over the unknown model parameters $(\vartheta, \theta)$. Here, we use weakly informative log-normal priors (see [18]) on the perceptual parameter $\vartheta$ and the response parameters $\{\theta_1, \theta_2\}$ to enforce positivity. These are given in Table 1. In addition, the variational Bayesian inversion of the response model makes use of a mean-field approximation $p(\theta, \vartheta \mid y, m^{(r)}) \approx \rho(\theta_1, \theta_2, \vartheta)\, \rho(\theta_3)$ that separates the noise precision parameter $\theta_3$ from the remaining parameters. Lastly, we relied on a Laplace approximation to the marginal posterior $\rho(\theta_1, \theta_2, \vartheta)$. This reduces the Bayesian inversion to finding the first- and second-order moments of the marginal posterior (see equations 13 and 14 in the companion paper [1] for a complete treatment).
The algorithmic implementation of the variational Bayesian inversion of the response model is formally identical to that of a Dynamic Causal Model (DCM, see e.g. [19] for a recent review). The variational Bayesian scheme furnishes the approximate marginal posteriors and a lower bound on the response model evidence (via the response free energy $F^{(r)} \approx \ln p(y \mid m^{(r)})$), which is used for model comparison. One can also recover the representations since these are a function of the perceptual parameter $\vartheta$, for which we obtain a posterior density $\rho(\vartheta)$ (see equations 12 and 14 in the companion paper [1]). Here, all sufficient statistics and gradients are evaluated at the mode $\hat{\vartheta} \equiv \int \vartheta\, \rho(\vartheta)\, d\vartheta$ of the approximate posterior $\rho(\vartheta)$, and $\mathrm{Var}[\vartheta \mid y] \equiv \int (\vartheta - \hat{\vartheta})^2 \rho(\vartheta)\, d\vartheta$ is the experimenter's posterior variance about the perceptual parameter.
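The reference to gradients evaluated at the posterior mode suggests a first-order (delta-method) propagation of the experimenter's uncertainty about the perceptual parameter onto the recovered representations. The sketch below shows that generic technique only; it is an assumption about the kind of computation involved, not a transcription of the companion paper's equations. The name `propagate_uncertainty`, the finite-difference gradient, and the toy mapping `lam` are all hypothetical.

```python
def propagate_uncertainty(f, param_hat, param_var, eps=1e-5):
    """First-order (delta-method) moments of f(param).

    f         : scalar function mapping the parameter to a sufficient statistic.
    param_hat : posterior mode (or mean) of the parameter.
    param_var : posterior variance of the parameter.
    Returns (approximate mean, approximate variance) of f(param).
    """
    # Central finite difference as a stand-in for an analytical gradient.
    grad = (f(param_hat + eps) - f(param_hat - eps)) / (2.0 * eps)
    return f(param_hat), grad ** 2 * param_var


def lam(v):
    """Toy linear mapping from parameter to statistic (hypothetical)."""
    return 3.0 * v + 1.0


mean, var = propagate_uncertainty(lam, param_hat=2.0, param_var=0.5)
```

For a linear mapping the approximation is exact: the variance of the statistic is the squared slope times the parameter variance.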

Results
In what follows, we first apply our approach to simulated data in order to establish the face validity of the scheme, both in terms of model comparison and parameter estimation. We then present an analysis of the empirical reaction-time data from the audio-visual associative learning task in [12].

Monte-Carlo evaluation of model comparison and parameter estimation
We conducted two series of Monte-Carlo simulations (sample size = 50), under the static (series A) and dynamic perceptual models (series B). In each series, the (log) perceptual parameters were sampled from the intervals $[0, 3]$ for series A and $[-2, 2]$ for series B. For both series, the first two (log) response parameters were sampled from the interval $[-2, 2]$. As an additional and orthogonal manipulation, we systematically varied the noise on reaction times across several orders of magnitude. Each simulated experiment comprised a hundred trials and the sequence of stimuli was identical to that used in the real audio-visual associative learning study. We chose the parameters $(\alpha, \mu)$ of the perceptual likelihood such that the discrimination ratio ($\alpha / |\mu_1 - \mu_2| \approx 10$) was approximately similar to that of the natural images (see Figure 1). We did not simulate any categorization error. For each synthetic data set, we used both static and dynamic perceptual models for inversion of the response model and evaluated the relative evidence of the perceptual models. Since we knew the ground truth (i.e., which model had generated the data) this allowed us to assess the veracity of model comparison.
Figure 3 shows a single example of simulated recognition, in terms of the subject's belief about both the stimulus and the cue-outcome association. For this simulation, the volatility of the association was set to $\log \vartheta = -2$ (emulating a subject who assumes a low-volatility environment), both for generating stimuli and recognition. We found that the variational Bayesian recognition recovers the stimulus categories perfectly (see blue line in upper-right panel of Figure 3) and the cue-outcome association strength well (see lower-left panel and green lines in upper-right panels). This demonstrates that variational recognition is a close approximation to optimal Bayesian inference.
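The sampling design of the two simulation series can be sketched as follows. This is a hedged illustration: the text gives the intervals but not the sampling distribution or the base of the logarithm, so uniform sampling and natural logarithms are assumptions here, as are the function name `sample_series` and the dictionary keys.

```python
import math
import random


def sample_series(series, n=50, seed=0):
    """Draw n parameter sets for series 'A' (static) or 'B' (dynamic).

    Assumptions: log-parameters are drawn uniformly from the stated
    intervals, and 'log' denotes the natural logarithm.
    """
    rng = random.Random(seed)
    lo, hi = (0.0, 3.0) if series == "A" else (-2.0, 2.0)
    draws = []
    for _ in range(n):
        draws.append({
            "perceptual": math.exp(rng.uniform(lo, hi)),
            "theta1": math.exp(rng.uniform(-2.0, 2.0)),
            "theta2": math.exp(rng.uniform(-2.0, 2.0)),
        })
    return draws


series_a = sample_series("A")
series_b = sample_series("B", seed=1)
```

Sampling in log-space and exponentiating guarantees positive parameter values, in line with the log-normal priors used for inversion.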
Figure 4 shows the inversion of the response model, given the synthetic reaction time data in Figure 3, which were corrupted with unit noise ($\theta_3 = 1$). Adding this observation noise yielded a very low signal-to-noise ratio (SNR = 0 dB, see Figure 4), where by definition $\mathrm{SNR} = 10 \log_{10}\left(\langle t \rangle^2 / \Upsilon\right)$. We deliberately used this high noise level because it corresponded roughly to that seen in the empirical data reported below. Table 1 lists the priors we placed on the parameters for this example and for all subsequent inversions with the dynamic perceptual model. Despite the low SNR of the synthetic data, the posterior estimates of the response parameters (grey bars) were very close to the true values (green circles), albeit with a slight overconfidence (upper left panel in Figure 4). Furthermore, the posterior correlation matrix shows that the perceptual and the response parameters are identifiable and separable (upper centre panel). The non-diagonal elements in the posterior covariance matrix measure the degree to which any pair of parameters is non-identifiable (see appendix in the companion paper [1]). Note that the model fit looks rather poor and gives the impression that the RT data are systematically "under-fitted" (lower right and lower centre panels of Figure 4). This, however, is simply due to the high levels of observation noise. In contrast, the estimation of the true subjective beliefs is precise and accurate (see upper right and lower left panels of Figure 4). This means that the variational Bayesian model inversion has accurately separated the "observed" reaction time data into noise and signal components. In other words, the estimation of the deterministic trial-by-trial variations of reaction times is not confounded by high levels of observation noise. This result (using simulated data) is important because it lends confidence to subsequent analyses of empirical reaction time data.
Figure 5 shows the results of the model comparison based on series A and B.
This figure shows the Monte-Carlo empirical distribution of the response free-energy differences between the two perceptual models, where $F^{(r)}_{\mathrm{static}}$ (respectively $F^{(r)}_{\mathrm{dynamic}}$) is the approximate log-evidence for the response model under the static (respectively dynamic) perceptual model. This relative log-evidence is the approximate log-Bayes factor or log odds ratio of the two models. It can be seen from the graphs in Figure 5 that model comparison identifies the correct perceptual model with only a few exceptions for the static model (left panel) and always for the dynamic model (note that a log-evidence difference of zero corresponds to identical evidence for both models). Table 2 provides the average free-energy differences over simulations as a function of the true model (simulation series) and SNR.
It is interesting that the free-energy differences are two orders of magnitude larger for series B, relative to series A. In other words, when the data-generating model is the dynamic one, it is easier to identify the true model from reaction times than when the static model generated the data. This might be due to the fact that the static model is a limiting case of the dynamic model; i.e. when the volatility q tends to zero, the dynamical perceptual model can account for the variability in reaction times generated using the static model. However, note that this difference in model complexity does not distort or bias our model comparisons since the free energy approximation to the model evidence accounts for such differences in complexity [20].

Figure 3 (partial caption). [...] Note that the discrimination ratio ($a/|m_1 - m_2|$) is approximately similar to that of the natural images (see Figure 2). Upper right: Subject's posterior belief, as obtained using the inversion of the perceptual model given observed sensory cues (green: cue-outcome association, blue: visual stimulus category; solid line: posterior mean $m^{(2)}$, shaded area: 99% posterior confidence interval, dots: sampled hidden states). Note that on each trial, the category of the visual stimuli was recognized perfectly. Lower left: scatter plot comparing the simulated (sampled, x-axis) versus perceived (estimated, y-axis) cue-outcome associative strength. Lower right: simulated reaction times. doi:10.1371/journal.pone.0015555.g003
As expected there is also a clear effect of noise: the higher the SNR, the larger the relative log-evidences. This means that model comparison will disambiguate models more easily the more precise the experimental data.
We next characterised the accuracy of parameter estimation under the best (correct) model, using the sum of squared error (SSE), in relation to the true values. We computed the Monte-Carlo empirical distribution of the SSE for each set of (perceptual and response) parameters, for each simulation series (A and B) and SNR. Figure 6 shows these distributions and Table 3 provides the Monte-Carlo averages.
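As an illustration of this accuracy measure, the sketch below computes the SSE between estimated and true parameter vectors for a handful of simulated data sets and then averages over them. All numerical values are hypothetical placeholders (they are not the estimates reported in Table 3), and the helper name `sum_squared_error` is ours.

```python
import numpy as np

def sum_squared_error(estimate, truth):
    """Sum of squared errors between an estimated and a true parameter vector."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.sum((estimate - truth) ** 2))

# Hypothetical values: each row holds two (log) parameters of one simulation.
true_params = np.array([[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]])
estimated_params = np.array([[0.4, -1.1], [1.7, 0.2], [-0.3, 0.5]])

# One SSE per simulated data set, then the Monte-Carlo average.
sse_per_simulation = [
    sum_squared_error(est, true)
    for est, true in zip(estimated_params, true_params)
]
mean_sse = float(np.mean(sse_per_simulation))
```

In practice, one such distribution of SSE values would be collected per parameter set, simulation series and SNR level.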
Quantitatively, the parameters are estimated reasonably accurately, except for the perceptual parameter (prior precision on associative strength) of the static model. This reflects the fact that the prior on association strength has little impact on the long-term behaviour of beliefs, and hence on reaction times. This is because within the static perceptual model, q acts as an initial condition for the dynamics of the representation l, which are driven by a fixed point attractor that is asymptotically independent of q. Thus, only the first trials are sensitive to q. The ensuing weak identifiability of q expressed itself as a high estimation error (high SSE). Again, there is a clear effect of noise, such that the estimation becomes more accurate when SNR increases. Also, consistent with the model comparison results above, parameter estimates are more precise for the dynamic model than for the static one.

Application to empirical reaction times
The Monte-Carlo simulations above demonstrate the face validity of the method, in the sense that one obtains veridical model comparisons and parameter estimates, given reaction time data with realistic SNR. We now apply the same analysis to empirical reaction times from nineteen subjects [12]. Specifically, we hoped to show two things to provide evidence for the construct validity of our approach: first, that the dynamic model (which was consistent with the information given to the subjects) would have higher evidence than the static model (which was not), and secondly, that our results would generalise over both auditory cues, both in terms of model comparison and parameter estimates (as explained above, we treated reaction times for the two cues as separate sequences).
We conducted a hierarchical (two-level) analysis of the data from the nineteen subjects. Note that the original study [12] contained twenty subjects. For experimental reasons, one of these subjects experienced a different stimulus sequence than the rest of the group. Even though it would have been perfectly possible to analyze this subject with the present approach, we decided, for reasons of homogeneity in the inter-subject comparison, to focus on subjects with identical stimulus sequence. In a first-level analysis, we inverted both dynamic and static models on both type I cues (high-pitched tones) and type II cues (low-pitched tones) separately, for each subject. As in the simulations above, the parameters $(a, m)$ of the perceptual likelihood (equation 1) were chosen such that stimulus discriminability ($a/|m_1 - m_2| \approx 10$) was similar to that of the natural images (see Figure 1). Also, categorization errors were assigned a response time of zero (see histograms in upper right panels of Figs. 12-13) and a very low precision U, relative to the other trials. This allowed us to effectively remove these trials from the data without affecting the trial-to-trial learning effects. Figure 7 summarizes the model comparison results for each subject, showing the difference in log-evidence for both auditory cues. A log-evidence difference of three (and higher) is commonly considered as providing strong evidence for the superiority of one model over another [21]. Using this conventional threshold, we found that in 13 subjects out of 19 the competing perceptual models could be disambiguated clearly for at least one cue type. It can be seen that for all of these subjects except one the dynamic perceptual model was favoured. Also, it was reassuring to find that the variability of response model evidences across cue types was much lower than its variability across subjects.
In particular, in 10 out of the 13 subjects where the perceptual models could be distinguished clearly, the model evidences were consistent across cue types.
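For readers less familiar with this convention, the following short sketch converts a log-evidence difference into a Bayes factor and, for two models with equal prior probability, into a posterior model probability. The function names are ours; the only assumption beyond the text is the equal prior over the two models.

```python
import math

def bayes_factor(log_evidence_difference):
    """Bayes factor implied by a difference in (natural-log) model evidence."""
    return math.exp(log_evidence_difference)

def posterior_model_probability(log_evidence_difference):
    """Posterior probability of the favoured model, assuming two models
    with equal prior probability."""
    return 1.0 / (1.0 + math.exp(-log_evidence_difference))

# The conventional threshold of three discussed above.
bf_at_threshold = bayes_factor(3.0)                 # roughly 20
p_at_threshold = posterior_model_probability(3.0)   # roughly 0.95
```

A log-evidence difference of three thus corresponds to odds of about twenty to one in favour of the better model.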
In a second step, we performed a Bayesian group-level random effects analysis of model evidences [20]. Assuming that each subject might have randomly chosen either of the two perceptual models, but consistently so for both cues, we used the sum of the subject-specific log-evidences over both cues for model comparison at the group level. Figure 8 shows the ensuing posterior Dirichlet distribution of the probability $q_{\mathrm{dynamic}}$ of the dynamic perceptual model across the group, given all datasets. Its posterior expectation was $E[q_{\mathrm{dynamic}} \mid y] \approx 0.82$. This indicates how frequently the dynamic model won the model comparison within the group, taking into account how discernible these models were.
We also report the so-called ''exceedance probability'' of the dynamic model being more likely than the static model, given all datasets, $P(q_{\mathrm{dynamic}} \geq q_{\mathrm{static}} \mid y)$, which came to 0.999. This measures the overall strength of evidence in favour of the dynamic perceptual model, at the group level. This is a pleasing result because, as described above, the dynamic model (where subjects assume a priori that the cue-outcome association is varying in time) was consistent with the information delivered to the subjects (whereas the static model was not).
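The relation between a posterior Dirichlet distribution over model frequencies, its expectation and the exceedance probability can be sketched as follows. This is a simple Monte-Carlo approximation, not the procedure of [20], and the Dirichlet parameters below are hypothetical values chosen for illustration rather than the ones estimated from our data.

```python
import numpy as np

def exceedance_probability(alpha, n_samples=100_000, seed=0):
    """Monte-Carlo estimate of the probability that each model is the most
    frequent one, given Dirichlet parameters alpha over model frequencies."""
    rng = np.random.default_rng(seed)
    samples = rng.dirichlet(alpha, size=n_samples)  # shape: (n_samples, n_models)
    winners = np.argmax(samples, axis=1)            # most frequent model per sample
    return np.bincount(winners, minlength=len(alpha)) / n_samples

# Hypothetical posterior Dirichlet parameters: [dynamic, static].
alpha = np.array([16.0, 5.0])

expected_frequency = alpha / alpha.sum()   # posterior expectation of model frequencies
xp = exceedance_probability(alpha)         # xp[0]: dynamic more frequent than static
```

With these illustrative parameters the expected frequency of the first model is about 0.76, while its exceedance probability is above 0.99, which shows why the two summaries are reported separately.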
Having established the dynamic model as the more likely model of reaction time data at the group level, we now focus on the actual estimates of both response and perceptual parameters. First, we tested for the reliability of the parameter estimates, that is, we asked whether the subject-dependent posterior densities $r(h, q)$ of the perceptual and response parameters were reproducible across both types of cues. Specifically, we hoped to see that the variability across both types of cues was smaller than the variability across subjects. For the three parameters $[q, h_1, h_2]$, Figures 9, 10 and 11 display the variability of the posterior densities across both cues and all subjects, taking into account the posterior uncertainty $\mathrm{Var}[h, q \mid y]$ (see equation 14). First, it can be seen that there is a consistent relationship between cue-dependent parameter estimates. Second, there is a comparatively higher dispersion of parameter estimates across subjects than across cues. Taken together, this demonstrates the reliability of parameter estimates in the context of empirically measured behavioural data with low SNR (i.e., reaction times). This implies that one can obtain robust and subject-specific inferences with our approach. Such inferences concern both subject-specific a priori beliefs (e.g., about the stability of the environment; see equations 3 and 4) and preferences (as encoded by their individual loss function; see equation 9). To demonstrate the potential of our approach for characterizing inter-individual differences in beliefs and preferences, we present a subject-specific summary of the inverted response model (under the dynamic perceptual model) for two individuals (subjects 5 and 12). These results are summarized by Figures 12 and 13.
First, we would like to stress that, as for the group as a whole, the SNR of empirical data from these two subjects is similar to the SNR of the Monte Carlo simulation series described above (around 0 dB; see Figure 4). It is therefore not surprising that the model fit to the empirical data looks as poor as in our simulations (compare upper left panels in Figs. 12-13 with lower right panel in Fig. 4). Note, however, that our simulations demonstrated that despite this poor fit the model parameters were estimated with high accuracy and precision; this instils confidence in the analysis of the empirical data.
Even though the two histograms of reaction time data from these two subjects were almost identical (compare upper right panels in Figures 12 and 13), the trial-to-trial variations of reaction time data allowed us to identify rather different subject-specific structures of beliefs and preferences (loss functions). Concerning the beliefs of these two individuals, the parameter estimates indicated that, a priori, subject 5 assumed a much more stable environment (i.e., had a much lower prior volatility q) than subject 12; as a consequence, the dynamics of her estimates of the associative strength $m^{(2)}$ are considerably smoother across trials (compare lower left panels in Figures 12 and 13). In other words, she averaged over more past cue-outcome samples when updating her posterior belief or representation than subject 12. Another consequence of this is the fact that subject 5's uncertainty $s^{(2)}$ about the associative strength is much smaller and less ''spiky'' than subject 12's. This has an intuitive interpretation: since subject 12 assumes a volatile environment, a series of predicted visual outcomes (approaching a nearly deterministic association) is highly surprising to her. This causes high perceptual uncertainty about the tracked associative strength whenever its trial-to-trial difference approaches zero.
As for the preferences (loss functions) that guided the actions of these two subjects, subject 12 displays a greater variability in her optimal decision times for very small post-hoc prediction errors ($m^{(1)}_{\infty} - m^{(1)}_0$, see equation 12). As a consequence, her optimal decision time is greater than that of subject 5, for any given magnitude of the post-hoc prediction error (compare lower right panels in Figures 12 and 13). This is because both subject 12's error rate (i.e., $h_1$) and sensitivity to post-hoc prediction error (i.e., $h_2$) are smaller than subject 5's.
In summary, subject 12 is assuming a more variable associative strength. This means that, when compared to subject 5, she discards information about past cue-outcome associations more quickly and has more uncertain (prior) predictions about the next outcome. However, she is willing to make more categorization errors per second of delay than subject 5. This is important, since she effectively needs more time to update her uncertain (i.e. potentially inaccurate) prior prediction to arrive at a correct representation. In contrast, subject 5 is more confident about her prior predictions and is more willing to risk categorization errors in order to gain time.

Discussion
In a companion paper [1], we have described a variational Bayesian framework for approximating the solution to the Inverse Bayesian Decision Theory (IBDT) problem in the context of perception, learning and decision-making studies. We propose a generic statistical framework for (i) comparing different combinations of perceptual and response models and (ii) estimating the posterior distributions of their parameters. Effectively, our approach represents a meta-Bayesian procedure which allows for Bayesian inferences about subjects' Bayesian inferences. In this paper, we have demonstrated this approach by applying it to a simple perceptual categorization task that drew on audio-visual associative learning. We have focused on the problem of 'deciding when to decide', i.e. we have modelled reaction time data as arising from subjective beliefs and preferences under the constraint of a speed-accuracy trade-off. This model is novel and quite different from classical evidence accumulation and 'race' models (e.g. [22,23,10]), in two ways. First, a reaction time is understood in terms of the convergence speed of an optimization process, i.e. perceptual recognition. This is because it takes time for a (variational) Bayesian observer to arrive at an optimal representation or belief. In this paper, the within-trial (peri-stimulus time) dynamics of the recognition process emerged from a gradient ascent on the free-energy, where free-energy is a proxy for (negative) perceptual surprise under a given perceptual model. The resulting form of the response model is analytically tractable and easy to interpret. Second, the variability of reaction times across subjects is assumed to depend on individual differences in prior beliefs (e.g., about the stability of the environment) and preferences (i.e., loss or utility functions). Our approach thus provides insights into both within-trial mechanisms of perception and inter-individual differences in beliefs and preferences.
In this work, we have chosen to focus on modelling reaction time data and have deliberately ignored categorization errors. This is because considering both reaction time and choice data at the same time would have required an extension of the response likelihood. The difficulty here is purely technical: the ensuing bivariate distribution is Bernoulli-Gaussian, whose sufficient statistics follow from the posterior risk (equation 11). Although feasible, deriving this extended response model would have significantly increased its complexity. Since the focus of this article was to provide a straightforward demonstration of our theoretical framework (described in the companion paper), we decided not to include choice data in the response model. Clearly, this is a limitation as we are not fully exploiting the potential information about underlying beliefs and preferences that is provided by observed categorization errors. This extension will be considered in future work.
In our model, categorization errors arise when incorrect prior predictions coincide with high delay costs (see equations 11 and 12). One might think that there is an irreconcilable difference between this deterministic scheme and stochastic diffusion models of binary decisions ( [24]; see also [25], for a related Bayesian treatment). However, there are several ways in which our scheme and stochastic diffusion models can be reconciled. For example, the trial-wise deterministic nature of our scheme can be obtained by choosing the initial condition of the stochastic process such that the probability of reaching the upper or lower decision threshold is systematically biased in a trial-by-trial fashion. Also, delay costs can be modelled by letting the distance between lower and upper diffusion bounds shrink over time. Alternatively, one could motivate the form of stochastic diffusion models by assuming that the brain performs a stochastic (ensemble) gradient ascent on the free energy. This would relate the frequency of categorization errors to task difficulty; for example, when a stimulus is highly ambiguous or uncertain, the perceptual free energy landscape is flat (perceptual uncertainty is related to the local curvature of perceptual free energy; see equations 5 and 6 of the companion paper). In summary, there are several ways in which our approach and stochastic diffusion models could be formally related. The utility of such hybrid models for explaining speed-accuracy tradeoffs (cf. [26]) will be explored in future work.
We initially evaluated the method using Monte-Carlo simulations under different noise levels, focusing on model inversion given synthetic data and on how well alternative models could be disambiguated. This enabled us to assess both the efficiency of parameter estimation and veracity of model comparison as a function of SNR. Importantly, we found that even under very high noise levels (SNR = 0 dB, comparable to the SNR of our empirical data), and therefore poor model fit, the model nevertheless (i) yielded efficient estimates of parameters, enabling us to infer and track the trial-to-trial dynamics of subjective beliefs from reaction time data, and (ii) robustly disambiguated correct and wrong models. We then applied the approach to empirical reaction times from 19 subjects performing an associative learning task, demonstrating that both model selection results and parameter estimates could be replicated across different cue types. Reassuringly, the model selection results were consistent with the information available to the subjects. In addition, we have shown that subject-to-subject variability in reaction times can be captured by significant differences in parameter estimates (consistently again across cue types), where these parameters encode the prior beliefs and preferences (loss functions) of subjects. Together, the simulations and empirical analyses establish the construct validity of our approach and illustrate the type of inference that can be made about subjects' priors and loss functions. Our results suggest that the approach may be fairly efficient when it comes to comparing and identifying models of learning and decision-making on the basis of (noisy) behavioural data such as reaction times.
Some readers may wonder why we have used a relatively complicated criterion to evaluate the relative goodness of competing models, i.e., an approximation to the log-evidence, instead of simply comparing their relative fit. Generally, pure model fit indices are not appropriate for comparing models and should be avoided (cf. [27][28][29]). There are many reasons why a perfectly reasonable model may fit a particular data set poorly; for example, independent observation noise (see Figure 4 for an example). On the other hand, it is easy to construct complex models with excellent or even perfect fit, which are mechanistically meaningless and do not generalize (i.e., ''over-fitting''). In brief, competing models cannot be compared on the basis of their fit alone; instead, their relative complexity must also be taken into account. This is exactly what is furnished by the (log) model evidence, which reports the balance between model fit and complexity (and can be approximated efficiently by the variational techniques used in this paper). This allows us to compare models of different complexity in an unbiased fashion. Crucially, our Bayesian model selection method does not require models to be nested and does not impose any other constraints on the sorts of models that can be compared ( [30,20]). For example, alternative models compared within our framework could differ with regard to the mathematical form of the perceptual or the response model, the priors or the loss function, or any combination thereof. In principle, this makes it possible to investigate the relative plausibility of different explanations, for example whether individual differences in behaviour are more likely to result from individual differences in the perceptual or the response model. For clarity, however, the empirical example shown in this paper dealt with a very simple case, in which the perceptual model was varied while the response model was kept fixed.
Figure 12 (partial caption). [...] and associative strength ($m^{(2)}$ and $s^{(2)}$). See main text for the precise meaning of these variables. Lower right: posterior risk as a function of post-hoc prediction error (y-axis), i.e. the difference between posterior and prior expectations, and decision time (x-axis). The posterior risk is evaluated at subject 5's response parameter estimate $\hat{h}$ for 'house' decisions (i.e. $c = 1$); it can be symmetrically derived for $c = 0$. The white line shows the optimal decision time $t(c)$ for each level of post-hoc prediction error (see Equation 12 in the main text). Note that $t(c)$ is identically zero for all negative post-hoc prediction errors. This signals a perceptual categorization error ($\Delta m^{(1)}_0 (2c - 1) > 0$, see Equation 11 in main text), which is emitted (at the limit) instantaneously. doi:10.1371/journal.pone.0015555.g012

As with all inverse problems, the identifiability of the BDT model parameters depends upon both the form of the model and the experimental design. In our example, we estimated only one parameter of the perceptual models we considered. One might argue that rather than fixing the sensory precision (a, see Equation 1) with infinitely precise priors, we should have tried to estimate it from the reaction time data. It turns out, however, that estimating $(q, a)$ and $(h_1, h_2)$ together represents a badly conditioned problem; i.e. the parameters are not jointly identifiable because of posterior correlations among the estimates. This speaks to the utility of generative models for decision-making: the impact that their form and parameterisation has on posterior correlations can be identified before any data are acquired. Put simply, if two parameters affect the prediction of data in a similar way, their unique estimation will be less efficient. In our particular example, there is no critical need to estimate a from the data. This is because faces and houses are well-known objects, with whose categorisation subjects have life-long experience.
It is therefore reasonable to assume that a is known to the subjects, and its value can be chosen in correspondence with the statistics of the visual stimuli (see above). However, pronounced inter-individual differences can be observed empirically in face-house discrimination tasks, and this may result from differences in the individuals' history of exposure to faces throughout life. A limitation of our model is that it does not account for such inter-subject variability but assumes that a is fixed across subjects.
In contrast to a, which can (and should) be treated as a fixed parameter, it is necessary to estimate the perceptual parameter q. Note that from the subject's perspective, q (similar to a) is quasi-fixed (i.e., with nearly infinite precision) as this prior has been learnt throughout life. From the experimenter's perspective, however, q is an unknown parameter which has to be inferred from the subject's behaviour. Estimating this parameter is critical for the experimenter as its value determines the subject's learning rate. This is best explained by highlighting the link between 'learning rates', as employed by reinforcement learning models, and Bayesian priors, or more precisely prior precision parameters. In the 'dynamic' perceptual model, the learning rule effectively replaces the past history of sensory signals with a summary based on the previous representation (see Eq. 8). In turn, the perceptual representation discounts past sensory signals with an exponential weighting function, whose half-life is an affine transformation of the prior volatility q. The link between q and the subject's learning rate can be seen by considering the solution to equation 8 at convergence (Equation 15), where $m^{(1)}_k$ is the belief about the visual outcome category (face/house) and $s(m^{(2)}_k)$ is its posterior prediction based on the past history of sensory signals. Equation 15 gives the effective update rule for the perceived associative strength $m^{(2)}_k$ when the perceptual free energy has been optimized. Note that the form of Eq. 15 corresponds to the Rescorla-Wagner learning rule [31], in which the change in associative strength $m^{(2)}_k - m^{(2)}_{k-1}$ is proportional to the prediction error, i.e. $d = m^{(2)}_k - s(m^{(2)}_k)$. In summary, for the model in the present paper, the subject's learning rate depends on the prior volatility q of cue-outcome associations.
Note, however, that there may not always be a quantitative relation between prior precision parameters and learning rates because this depends on the specificities of the perceptual model. There is a general qualitative relationship between the two quantities, however, because the prior precision of hidden causes within hierarchical perceptual models controls the relative weight of upcoming sensory information and prior (past) beliefs in forming the actual posterior representation. In short, this means the learning rate itself (and thus any 'forgetting' effect) emerges from optimal Bayesian recognition (see e.g., [32] for a nice example). A full treatment of these issues will be presented in forthcoming work [33].
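To illustrate the role played by a learning rate, the following sketch implements the generic Rescorla-Wagner rule, in which the associative strength is updated in proportion to the prediction error. It is not the update of Equation 15: here the learning rate is a free constant (hypothetical values 0.1 and 0.5), whereas in our model it depends on the prior volatility q, and that mapping is not reproduced. The outcome sequence is likewise invented for illustration.

```python
def rescorla_wagner(outcomes, learning_rate, initial_value=0.5):
    """Generic Rescorla-Wagner rule: the change in associative strength is
    proportional to the prediction error (outcome minus current value)."""
    value = initial_value
    trajectory = []
    for outcome in outcomes:
        prediction_error = outcome - value
        value = value + learning_rate * prediction_error
        trajectory.append(value)
    return trajectory

# Hypothetical binary outcomes (1 = predicted category occurred, 0 = it did not).
outcomes = [1, 1, 0, 1]

# A low learning rate averages over many past outcomes (smooth trajectory);
# a high learning rate discounts the past quickly (volatile trajectory).
slow_learner = rescorla_wagner(outcomes, learning_rate=0.1)
fast_learner = rescorla_wagner(outcomes, learning_rate=0.5)
```

The slow learner's estimate moves little after the single surprising outcome, whereas the fast learner's estimate drops sharply, which mirrors the qualitative difference between observers who assume a stable versus a volatile environment.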
Another analogy concerns the optimal decision time derived from the speed-accuracy trade-off given in Equation 12, which is similar in form to Hick's law. This law relates the reaction times to the amount of extracted information (cf. [34]). In its simplest form, Hick's law is given by $RT = a + b \log(n)$, where RT is the expected reaction time and n is the number of choice alternatives. Here, $\log(n)$ is the perceptual uncertainty (as measured by Shannon entropy). It turns out that when no categorization error is made, Equation 12 could be rewritten as $RT = a + b \log|\Delta m_0|$, where $\Delta m_0$ is the post-hoc prediction error, i.e. posterior minus prior expectation. Put simply, $\log|\Delta m_0|$ measures incoming information.
There are obvious formal (information theoretic) differences between Equation 12 and Hick's law, but they capture similar intuitions about the mechanisms causing variations in reaction times.
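The intuition that a logarithmic dependence on prediction error can arise from the convergence time of an optimization can be illustrated with a toy example. The sketch below runs a discrete gradient ascent on a simple quadratic objective (a stand-in that we assume for illustration, not the perceptual free energy of our model, and not a derivation of Equation 12) and counts the steps needed to move from the prior to the posterior expectation. The step size and tolerance are arbitrary.

```python
def steps_to_converge(prior_mean, posterior_mean, step_size=0.1, tolerance=1e-3):
    """Count gradient-ascent steps on F(m) = -0.5 * (m - posterior_mean)**2,
    starting from the prior expectation, until m is within tolerance."""
    m = prior_mean
    steps = 0
    while abs(posterior_mean - m) > tolerance:
        gradient = posterior_mean - m   # dF/dm for the quadratic objective
        m += step_size * gradient       # each step shrinks the distance by 10%
        steps += 1
    return steps

# A tenfold larger post-hoc prediction error adds a fixed number of steps,
# i.e. convergence time grows linearly with the log of the prediction error.
steps_small_error = steps_to_converge(prior_mean=0.0, posterior_mean=0.1)
steps_large_error = steps_to_converge(prior_mean=0.0, posterior_mean=1.0)
extra_steps_per_decade = steps_large_error - steps_small_error
```

Because the distance to the optimum decays geometrically, the number of steps is an affine function of the logarithm of the initial distance, which has the same form as the rewritten decision time above.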
This paper has demonstrated the practical application of the meta-Bayesian framework described in the companion paper, using empirical reaction time data from an audio-visual associative learning task reported in [12]. The authors of that study presented several analyses of these data, including a formal comparison of alternative learning models. The results provided in the present article finesse the original comparisons and take us substantially beyond the previous report. First, the paper [12] did not provide any decision theoretic explanation for (learning induced) motor facilitation. In that paper, the behavioural comparison of different learning models was a precursor to using prediction error estimates in a model of fMRI data. It therefore only used a very simple response model assuming that (inverse) reaction times scale linearly with prediction error. In contrast, we have proposed a response model that is fully grounded in decision theory and does not assume a specific (e.g., logarithmic) relationship between prediction errors and motor facilitation. Second, we conducted a full two-level analysis of the reaction time data, in order to assess inter-individual differences. This was made possible because, as opposed to the work in [12], we allowed for inter-individual differences in both the perceptual and response parameters (see above).
Finally, we wish to emphasize that, with the ''observing the observer'' (OTO) approach, inference on hidden states and parameters can be obtained in a subject-specific fashion, as demonstrated by our empirical analyses in this paper (see Figs. 9-13). This allows for analyses of inter-individual differences in the mechanisms that generate observed behaviour. Such quantitative inference on subject-specific mechanisms is not only crucial for characterizing inter-individual differences, an important theme in psychology and economics in general, but also holds promise for clinical applications. This is because spectrum diseases in psychiatry, such as schizophrenia or depression, display profound heterogeneity with regard to the underlying pathophysiological mechanisms, requiring the development of models that can infer subject-specific mechanisms from neurophysiological and/or behavioural data [35]. In this context, the approach presented in this paper can be seen as a complement to dynamic causal modelling (DCM): OTO may be useful for inference on subject-specific mechanisms expressed through behaviour, in a similar way as DCM is being used for inference on subject-specific mechanisms underlying neurophysiology.

Supporting Information
Appendix S1. 'Deciding when to decide' is included as supplementary material. It summarizes the mathematical derivation of the optimal reaction times (as given in equation 12) from first principles, within the framework of Bayesian Decision Theory. (DOC)