A predictive coding account of bistable perception - a model-based fMRI study

In bistable vision, subjective perception wavers between two interpretations of a constant ambiguous stimulus. This dissociation between conscious perception and sensory stimulation has motivated various empirical studies on the neural correlates of bistable perception, but the neurocomputational mechanism behind endogenous perceptual transitions has remained elusive. Here, we drew on a generic Bayesian framework of predictive coding and devised a model that casts endogenous perceptual transitions as a consequence of prediction errors emerging from residual evidence for the suppressed percept. Data simulations revealed close similarities between the model's predictions and key temporal characteristics of perceptual bistability, indicating that the model was able to reproduce bistable perception. Fitting the predictive coding model to behavioural data from an fMRI experiment on bistable perception, we found a correlation across participants between the model parameter encoding perceptual stabilization and the behaviourally measured frequency of perceptual transitions, corroborating that the model successfully accounted for participants' perception. We validated the predictive coding model by formal model comparison against established models of bistable perception based on mutual inhibition and adaptation, noise, or a combination of adaptation and noise. Most importantly, model-based analyses of the fMRI data revealed that prediction error time courses derived from the predictive coding model correlated with neural signal time courses in the bilateral inferior frontal gyri and anterior insulae. Voxel-wise model selection indicated that the predictive coding model outperformed conventional analysis approaches in explaining neural activity in these frontal areas, suggesting that frontal cortex encodes prediction errors that mediate endogenous perceptual transitions in bistable perception.
Taken together, our current work provides a theoretical framework for analysing behavioural and neural data from a predictive coding perspective on bistable perception. In doing so, our approach posits a crucial role of prediction error signalling in the resolution of perceptual ambiguities.


Mathematical Appendix
As an example of an ambiguous stimulus, we use the Lissajous figure, which, due to its ambiguous depth structure, is alternately perceived as a clockwise (as viewed from above, i.e. movement of the front surface to the left) and a counter-clockwise (vice versa) rotating object (see also supplementary video).
From a predictive coding perspective, the brain entertains and inverts a generative model of how sensory data are caused. In our case, the sensory environment is constrained to objects that rotate either clockwise or counter-clockwise, while the direction of rotation changes at a specific frequency. Analogously, our model is a generative model of how potentially ambiguous sensory data are caused by objects in the sensory environment, with the frequency of changes governed by an implicit prior for stability.
The inversion of this model is based on the sensory information $\mu_{\text{stereo}}(t)$ and the prediction of the perceived direction of rotation $y(t)$. It allows for the estimation of the model parameters (the initial precision of the stability prior $\pi_{\text{init}}$, the precision of sensory stimulation $\pi_{\text{stereo}}$, and the inverse decision temperature $\zeta$). These parameters govern the updates of the model quantities (the mean and precision of the stability prior, $\mu_{\text{stability}}$ and $\pi_{\text{stability}}$; the mean and precision of the joint prior distribution, $\mu_m$ and $\pi_m$; the probability of perceiving counter-clockwise rotation, $P(\theta > 0.5)$; and the predicted perceptual response $y_{\text{predicted}}$). See Table 7 for a list of model parameters and model quantities.
As new perceptual decisions on the rotation of the Lissajous figure are made almost exclusively at overlapping stimulus configurations ('overlaps', [1,2]), we can convert the continuous time course of stimulus presentation into discrete timepoints $t$ of 'overlaps'. Accordingly, the sampling rate of our model is given by the frequency at which 'overlaps' occur, which depends on the rotational speed of the Lissajous figure.
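To make the discretization concrete, the sketch below maps a participant's button presses onto a grid of overlap timepoints. It is a minimal illustration under assumptions that are not taken from the text: overlaps are treated as equally spaced with a known period `overlap_period` (which would follow from the rotational speed), percepts are coded as 1 for counter-clockwise and 0 for clockwise, and the percept at an overlap is taken to be the most recent report at or before it. The helper name `responses_at_overlaps` is hypothetical.

```python
import numpy as np


def responses_at_overlaps(press_times, press_labels, overlap_period, n_overlaps):
    """Hypothetical helper: sample reported percepts at discrete overlap timepoints.

    Assumptions (not from the source): overlaps occur at k * overlap_period for
    k = 1..n_overlaps, press_times are sorted in ascending order, and labels are
    coded 1 = counter-clockwise, 0 = clockwise. Overlaps before the first press get -1.
    """
    press_times = np.asarray(press_times, dtype=float)
    press_labels = np.asarray(press_labels, dtype=int)
    overlap_times = overlap_period * np.arange(1, n_overlaps + 1)
    # Index of the last button press at or before each overlap (-1 if none yet).
    last = np.searchsorted(press_times, overlap_times, side="right") - 1
    y = np.where(last >= 0, press_labels[np.clip(last, 0, None)], -1)
    return overlap_times, y


# Usage: three presses, overlaps every 1.0 s.
times, y = responses_at_overlaps([0.5, 2.3, 4.1], [1, 0, 1], 1.0, 5)
print(y)  # [1 1 0 0 1]
```

Using `searchsorted` with `side="right"` means a press that coincides exactly with an overlap already counts for that overlap; whether that matches the experimental timing is itself an assumption.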
At each timepoint $t$, the two alternative visual percepts are predicted on the basis of a posterior probability distribution over $\theta$.

Participants responded with button presses indicating the current visual percept.

Similar to other ambiguous stimuli, the Lissajous figure can be disambiguated by additional cues (e.g. by stereodisparity between the two eyes) in order to create a 'replay' condition.
Here, perception is forced to alternate between a clockwise and a counter-clockwise rotating stimulus with a time course similar to that during bistable perception. Hence, we formalize the sensory information in both replay and ambiguity by the Gaussian distribution 'stereodisparity', $\mathcal{N}(\mu_{\text{stereo}}, \pi_{\text{stereo}}^{-1})$, which is used as a weight on a bimodal likelihood distribution (see Equations 7-10, [3]). The mean of this distribution, $\mu_{\text{stereo}}$, is given by the direction of disambiguation at timepoint $t$. Its precision $\pi_{\text{stereo}}$ (the inverse of its variance) encodes the strength of disambiguation and is either estimated as a free parameter or fixed to 0 (thereby eliminating any contribution of disambiguation to the prediction of perceptual decisions).
Furthermore, we hypothesize that the current percept represents an (implicit) prior belief contributing to the stability of visual perception, given by the Gaussian distribution 'perceptual stability', $\mathcal{N}(\mu_{\text{stability}}, \pi_{\text{stability}}^{-1})$. The mean of this prior at timepoint $t$ is defined by the current percept as indicated by the participant at the preceding overlap:

$$\mu_{\text{stability}}(t) = y(t-1)$$

In turn, the impact of this prior distribution on visual perception is reflected by its precision $\pi_{\text{stability}}$. Central to our model of bistable perception, we allow this precision to be modulated by a prediction error signal.
If a new perceptual decision was made at the preceding timepoint, $\pi_{\text{stability}}(t)$ is reset to the initial perceptual precision $\pi_{\text{init}}$:

$$\pi_{\text{stability}}(t) = \pi_{\text{init}}$$

The initial perceptual precision $\pi_{\text{init}}$ represents the strength of the initial perceptual stabilization following a perceptual transition and can be estimated as a free parameter or fixed to 0 (thus eliminating the stability prior from the model).
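The rules for the stability prior can be sketched as a single update step. This is a hedged illustration, not the authors' implementation: it assumes that a "new perceptual decision at the preceding timepoint" means the reports at $t-1$ and $t-2$ differ, it treats the first timepoint as following a transition, and, because the exact form of the precision update for the non-switch case is not reproduced in this text, it takes that update as a caller-supplied function. The names `stability_prior` and `example_update` are hypothetical, and `example_update` is a placeholder rule, not the model's.

```python
from typing import Callable, Sequence, Tuple


def stability_prior(
    y_hist: Sequence[int],
    pi_prev: float,
    pe_prev: float,
    pi_init: float,
    update: Callable[[float, float], float],
) -> Tuple[float, float]:
    """Return (mu_stability(t), pi_stability(t)) from reports up to t-1.

    y_hist holds reported percepts up to and including t-1
    (assumed coding: 1 = counter-clockwise, 0 = clockwise).
    """
    # Prior mean: the percept reported at the preceding overlap.
    mu = float(y_hist[-1])
    # Assumption: a transition occurred at t-1 if the last two reports differ;
    # the very first timepoint is treated as following a transition.
    switched = len(y_hist) < 2 or y_hist[-1] != y_hist[-2]
    # Reset to pi_init after a transition, otherwise update with PE(t-1).
    pi = pi_init if switched else update(pi_prev, pe_prev)
    return mu, pi


def example_update(pi_prev: float, pe_prev: float) -> float:
    """Hypothetical placeholder update: prediction errors erode precision."""
    return max(pi_prev - abs(pe_prev), 0.0)


print(stability_prior([0, 1], 3.0, 0.2, 5.0, example_update))  # (1.0, 5.0)
print(stability_prior([1, 1], 3.0, 0.2, 5.0, example_update))
```

Passing the update in as a function keeps the reset logic separate from the precision update, so a different update rule can be substituted without touching the rest of the step.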
In all other cases, $\pi_{\text{stability}}(t)$ is calculated by updating the perceptual precision of the preceding timepoint, $\pi_{\text{stability}}(t-1)$, with the prediction error of the preceding timepoint, $PE(t-1)$.

To compute a posterior distribution, we combine the prior distribution 'perceptual stability' (parameterized by $\mu_{\text{stability}}$ and $\pi_{\text{stability}}$) with the 'stereodisparity' weight of the likelihood (parameterized by $\mu_{\text{stereo}}$ and $\pi_{\text{stereo}}$) into a joint distribution $m$:

$$\pi_m = \pi_{\text{stereo}} + \pi_{\text{stability}} \tag{7}$$

$$\mu_m = \frac{\pi_{\text{stereo}}\,\mu_{\text{stereo}} + \pi_{\text{stability}}\,\mu_{\text{stability}}}{\pi_m} \tag{8}$$

The joint distribution $m$ is used as a weight on a bimodal likelihood distribution [3] in order to calculate the density ratio of the posterior at the two peak locations $\theta_0 = 0$ and $\theta_1 = 1$. Note that the choice of which of the two directions to consider is arbitrary, as the two posterior probabilities $P(\theta > 0.5)$ and $P(\theta < 0.5)$ sum to 1.
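Equations 7 and 8 are the standard precision-weighted combination of two Gaussians, which the sketch below implements directly. The step from the joint distribution $m$ to $P(\theta > 0.5)$ is where we add an explicit simplifying assumption that is not taken from [3]: if the bimodal likelihood is idealized as two equal, narrow peaks at $\theta_0 = 0$ and $\theta_1 = 1$, the density ratio reduces to $\mathcal{N}(1; \mu_m, \pi_m^{-1}) / \mathcal{N}(0; \mu_m, \pi_m^{-1}) = \exp(\pi_m(\mu_m - 0.5))$, so $P(\theta > 0.5)$ becomes a logistic function of $\pi_m(\mu_m - 0.5)$. The function names `joint_prior` and `p_counter_clockwise` are hypothetical.

```python
import math
from typing import Tuple


def joint_prior(mu_stereo: float, pi_stereo: float,
                mu_stability: float, pi_stability: float) -> Tuple[float, float]:
    """Precision-weighted combination of stereodisparity and stability (Eqs. 7-8)."""
    pi_m = pi_stereo + pi_stability  # Eq. 7
    if pi_m == 0.0:
        # Both precisions fixed to 0: flat weight; mean set to 0.5 by symmetry (assumption).
        return 0.5, 0.0
    mu_m = (pi_stereo * mu_stereo + pi_stability * mu_stability) / pi_m  # Eq. 8
    return mu_m, pi_m


def p_counter_clockwise(mu_m: float, pi_m: float) -> float:
    """P(theta > 0.5) under the assumed two-narrow-peak likelihood at 0 and 1.

    The log density ratio of the Gaussian weight at theta = 1 versus theta = 0
    is pi_m * (mu_m - 0.5); a numerically stable logistic maps it to a probability.
    """
    x = pi_m * (mu_m - 0.5)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# Ambiguity with pi_stereo fixed to 0: the stability prior alone sets the weight.
mu_m, pi_m = joint_prior(0.5, 0.0, 1.0, 4.0)
print(mu_m, pi_m, round(p_counter_clockwise(mu_m, pi_m), 4))  # 1.0 4.0 0.8808
```

With $\pi_{\text{stereo}} = 0$ the joint mean equals $\mu_{\text{stability}}$, so $\mu_{\text{stereo}}$ has no influence, which is what fixing the stereodisparity precision to 0 is meant to achieve.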
By applying a unit sigmoid function parameterized by the inverse decision temperature $\zeta$ to the posterior probability of counter-clockwise rotation, $P(\theta > 0.5)(t)$, we predict the participants' response $y(t)$, which forms the basis for the optimization of the model parameters.

Most importantly, we use the difference between the current percept $y(t)$ as indicated by the participant and the posterior probability of counter-clockwise rotation $P(\theta > 0.5)(t)$ to calculate a prediction error $PE(t)$ that represents the residual evidence in favour of the suppressed percept.

Notably, the inclusion of a stereodisparity weight allows us to treat both ambiguity and replay within the same framework: the prediction error $PE(t)$ is also computed in the replay condition and shows temporal dynamics similar to those in the ambiguity condition. In replay, however, the stereodisparity weight ($\mu_{\text{stereo}} \neq 0.5$) renders the posterior probability $P(\theta > 0.5)$ more similar to the currently induced percept $y(t)$ than in the ambiguity condition, where the stereodisparity weight (mean $\mu_{\text{stereo}} = 0.5$) is uninformative with respect