Humans adapt their anticipatory eye movements to the volatility of visual motion properties

Animal behavior must constantly adapt to changes, for example when the statistical properties of the environment change unexpectedly. For an agent that interacts with this volatile setting, it is important to react accurately and as quickly as possible. It has already been shown that when a random sequence of motion ramps of a visual target is biased to one direction (e.g. right or left), human observers adapt to accurately anticipate the expected direction with their eye movements. Here, we prove that this ability extends to a volatile environment where the probability bias could change at random switching times. In addition, we also recorded the explicit direction prediction reported by observers as given by a rating scale. Both results were compared to the estimates of a probabilistic agent that is optimal in relation to the event switching generating model. Compared to the classical leaky integrator model, we found a better match between our probabilistic agent and the behavioral responses, both for the anticipatory eye movements and the explicit task. Furthermore, by titrating the level of preference between exploration and exploitation in the model, we were able to fit each individual experimental data-set with different levels of estimated volatility and derive a common marker for the inter-individual variability of participants. These results prove that in such an unstable environment, human observers can still represent an internal belief about the environmental contingencies, and use this representation both for sensory-motor control and for explicit judgments. This work offers an innovative approach to more generically test the diversity of human cognitive abilities in uncertain and dynamic environments. Author summary Understanding how humans adapt to changing environments to make judgments or plan motor responses based on time-varying sensory information is crucial for psychology, neuroscience and artificial intelligence. Current theories for how we deal with the environment’s uncertainty most rely on the equilibrium behavior in response to the introduction of some randomness change. Here we show that in the more ecological case where the context switches at random times all along the experiment, an adaptation to this volatility can be performed online. In particular, we show in two behavioral experiments that humans can adapt to such volatility at the early sensorimotor level, through their anticipatory eye movements, but also at a higher cognitive level, through explicit ratings. Our results suggest that humans (and future artificial systems) can use much richer adaptive strategies than previously assumed.

belief about the environmental contingencies, and use this representation both for 23 sensory-motor control and for explicit judgments. This work offers an innovative 24 approach to more generically test the diversity of human cognitive abilities in uncertain 25 and dynamic environments. 26 Author summary 27 Understanding how humans adapt to changing environments to make judgments or plan 28 motor responses based on time-varying sensory information is crucial for psychology, 29 neuroscience and artificial intelligence. Current theories for how we deal with the 30 environment's uncertainty most rely on the equilibrium behavior in response to the 31 introduction of some randomness change. Here we show that in the more ecological case 32 where the context switches at random times all along the experiment, an adaptation to 33 this volatility can be performed online. In particular, we show in two behavioral 34 experiments that humans can adapt to such volatility at the early sensorimotor level, 35 through their anticipatory eye movements, but also at a higher cognitive level, through 36 explicit ratings. Our results suggest that humans (and future artificial systems) can use 37 much richer adaptive strategies than previously assumed. 38 * Institut de Neurosciences de la Timone (UMR 7289), Aix Marseille Univ, CNRS -Marseille, France 1 Motivation 39 40 cognitive systems 41 We live in a fundamentally volatile world for which our cognitive system has to 42 constantly adapt. In particular, this volatility may be generated by processes with 43 different time scales. Imagine for instance you are a general practitioner and that you 44 usually report an average number of three persons infected by measles per week. 45 However, this rate is variable and over the past week you observe that the rate 46 increased to ten cases. As such, two alternative interpretations are available: the first 47 possibility is that there is an outbreak of measles and one should then estimate its 48 incidence (i.e. the rate of new cases) since the inferred outbreak's onset, in order to 49 quantify the infection rate specific to this outbreak, but also to update the value of the 50 environmental volatility (as given by the probability of a new outbreak) at a longer time 51 scale. Alternatively, these cases are "unlucky" coincidences that originate from the 52 natural variability of the underlying statistical process which drive patients to the 53 doctor, but which are instances drawn from a stationary random process. In that 54 option, it may be possible to readjust the estimated baseline rate of infection with this 55 new data. This example illustrates one fundamental problem with which our cognitive 56 system is faced: when observing new sensory evidence, should I stay and continue to 57 exploit this novel data with respect to my current beliefs about the environment's state 58 or should I go and explore a new hypothesis about the random process generating the 59 observations since the detection of a switch in the environment? 60 By definition, volatility measures the temporal variability of the sufficient 61 parameters of a random variable. Such meta-analysis of the environment's statistical 62 properties is an effective strategy at the large scale level of our example, but also at all 63 levels which are behaviorally relevant, such as contextual changes in our everyday life. 64 Inferring near-future states in a dynamic environment, such that one can prepare to act 65 upon them ahead of their occurrence [1] -or at least forming beliefs as precise as 66 possible about a future environmental context -is an ubiquitous challenge for 67 cognitive systems [2]. In the long term, how the human brain dynamically manages this 68 trade-off between exploitation and exploration is essential to the adaptation of the 69 behavior through reinforcement learning [3]. In controlled experimental settings which 70 challenge visual perception or sensorimotor associations, such adaptive processes have 71 been mostly put in evidence by precisely analyzing the participants' behavior in a 72 sequence of experimental trials, typically highlighting sequential effects at the time scale 73 of several seconds to minutes or even hours in the case of the adaptation to a persistent 74 sensorimotor relation. modulated by extra-retinal, predictive information even in the absence of a direct visual 141 stimulation. The anticipatory smooth pursuit behavior is remarkable in different aspects. 142 First, its buildup is relatively fast, such that only a few trials are sufficient to pick up 143 some regularity in the properties of visual motion, such as speed or direction [11,42,43]. 144 Second, it is in general an unconscious process of which participants are not aware of. 145 As such, this behavior is potentially a useful marker to study the internal representation 146 of motion expectancy and in particular to analyze how sensorimotor expectancy 147 interacts dynamically with contextual contingencies in shaping oculomotor behavior. 148 Typically, an aSPEM is observed after a temporal cue and before target motion 149 onset [37,38,42]. It is generally assumed that the role of aSPEMs is to minimize as fast 150 as possible the visual impairment due to the amplitude of eye-to-target position and 151 velocity mismatch. Overall, anticipation can potentially reduce the typical sensorimotor 152 delay between target motion onset and foveation. In a previous study [44], we have 153 analyzed how forthcoming motion properties, such as target speed or direction, can be 154 predicted and anticipated with coherently oriented eye movements. It has been observed 155 that the strength of anticipation, as measured by the mean anticipatory eye velocity, 156 increases when the target repeatedly moves in the same direction [42,45,46]. We 157 similarly found a graded effect of both the speed and the direction-bias on the strength 158 of aSPEM. In particular, this effect is linearly related to the probability of motion's 159 speed or direction. These results are coherent within previous oculomotor findings by 160 our and also other groups [47]. These results imply that the probability bias over a 161 target's direction is one additional factor beyond other physical and cognitive 162 cues [12,47,48] that modulate the common predictive framework driving anticipatory 163 behavior. 164

165
The goal of this study is to generalize the adaptive process observed in the aSPEM 166 response in previous studies [44,47] to more ecological settings and also to broaden its 167 scope by showing that such adaptive processes occur at the conscious level as well. We 168 already mentioned that by manipulating the probability bias for target motion direction, 169 it is possible to modulate the direction and mean velocity of aSPEM. This suggests that 170 probabilistic information may be used to inform the internal representation of motion 171 prediction for the initiation of anticipatory movements. However, it is yet unclear what 172 generative model to use to dynamically manipulate the probability bias and generate an 173 ecologically relevant input sequence of target directions. A possible confound comes 174 from the fact that previous studies have used trial sequences (blocks) of fixed lengths, 175 stacked in a sequence of conditions defined by the different probability biases. Indeed, 176 observers may potentially pick up the information on the fixed block's length to predict 177 the occurrence of a switch (a change in probability bias) during the experiment. Second, 178 we observed qualitatively that following a switch, the strength of aSPEM changed 179 gradually, consistently with other adaptation paradigms [49][50][51]. The estimate of the 180 characteristic temporal parameters for this adaptation mechanism may become 181 particularly challenging in a dynamic context, where the probabilistic contingencies vary 182 in time in an unpredictable way. Finally, whether and how the information processing 183 underlying the buildup of aSPEM and its dynamics is linked to an explicit estimate of 184 probabilities is still largely unknown.

185
To assess the dynamics of the adaptive processes which compensate for the variability 186 within sensory sequences, one may generate random sequences of Target Directions 187 (TDs) using a dynamic value for the probability bias p = Pr(TD is 'right'), with a 188 parametric mechanism controlling for the volatility at each trial. In the Hierarchical 189 Gaussian Filter model [52], for instance, volatility is controlled as a non-linear 190 transformation of a random walk (modeled itself by a Brownian motion with a given We tested the capacity of human participants to adapt to a volatile environment by using a simple, 3-layered generative model of fluctuations in target directions (TD) that we call the Binary Switching Model (BSM). This TD binary variable is chosen using a Bernoulli trial of a given probability bias. This probability bias is constant for as many trials until a switch is generated. At a switch, the bias is chosen at random from a given prior. Switches are generated in the third layer as binary events drawn from a Bernoulli trial with a given hazard rate (defined here as 1/40 per trial). (B) The eye-movements task was an adapted version of a task developed by [44]. Each one of 600 trials consisted of sequentially: a fixation dot (of random duration between 400 and 800 ms), a blank screen (of fixed duration of 300 ms) and a moving ring-shaped target (with 15°/s velocity) which the observers were instructed to follow. The direction of the target (right or left) was drawn pseudo-randomly according to the generative model defined above. (C) In order to titrate the adaptation to the environmental volatility of target direction at the conscious level, we invited each observer to perform on a different day a new variant of the direction-biased experiment, where we asked participants to predict, before each trial, their estimate of the forthcoming direction of the target. As shown in this sample screenshot, this was performed by moving a mouse cursor (black triangle) on a continuous rating scale between "sure left", to "unsure" and finally "sure right". diffusion coefficient). Ultimately, this hierarchical model allows to generate a sequence 192 of binary choices where the variability fluctuates along a given trajectory. Such a 193 forward probabilistic model is invertible using some simplifying assumptions and allows 194 to extract a time-varying inference of the agent's belief about volatility [53]. Herein, to 195 analyze the effect of history length in all generality, we extended the protocol of [44] 196 such that the probability bias is still fixed within blocks but that these blocks have 197 variable lengths, that is, by introducing switches occurring at random times. Therefore, 198 similarly to [54], we will use a model where the bias p in target direction varies 199 according to a piecewise-constant function. In addition, in our previous study the range 200 of possible biases was finite. In the present work, we extended the paradigm by drawing 201 p as a continuous random variable within the whole range of possible probability biases 202 (that is, the segment [0, 1]). As a summary, we first draw random events (that we 203 denote as "switches") with a given mean frequency and which controls the strength of 204 the volatility. Second, the value p of the bias only changes at the moment of a switch, 205 independently of the previous bias' value and is stationary between two switches, 206 forming what we call an "epoch". Third, target direction is drawn as a Bernoulli trial 207 using the current value of p. Such a hierarchical structure is presented in Figure 1-A, 208 where we show the realization of the target's directions sequence, the trajectory of the 209 underlying probability bias (hidden to the observer), and the occurrences of switches.

210
Mathematically, this can be considered as a three-layered hierarchical model defining 211 the evolution of the model at each trial t as the vector ( switches. In the middle layer, the probability bias p of target direction is a random 216 variable that we define as x t 1 ∈ [0, 1]. It is chosen at random from a prior distribution P 217 at the moment of a switch, and else it is constant until the next occurrence of a switch. 218 The prior distribution P can be for instance the uniform distribution U on [0, 1] or 219 Jeffrey's prior J (see Appendix 8.3). Finally, a target moves either to the left or to the 220 right, and we denote this variable (target direction, TD) as x t 0 ∈ {0, 1}. This direction 221 is drawn from a Bernoulli trial parameterized by the direction bias p = x t 1 . In summary, 222 this is described according to the following equations: In practice, we generated a sequence of 600 trials, and there is by construction a switch 224 at t = 0 (that is, x 0 2 = 1). In addition, we imposed in our sequence that a switch occurs 225 after trial numbers 200 and 400, in order to be able to compare adaptation properties 226 across different chunks of the trials sequence. With such a three-layered structure, the 227 model generates the randomized occurrence of switches, itself generating epochs with experiments. We will use that generative model as the basis of an ideal observer model 233 inverting that model to predict probability biases from the observations (TDs) and 234 which we will test as a model for the adaptation of human behavior.

235
This paper is organized in five parts. After this introduction where we presented the 236 motivation for this study, the next section (Section 2) will present an inversion of the model. To our knowledge, such algorithm was not yet available, and we will here 239 provide with an exact analytical solution by extending previous results from [55] to the 240 case of binary data as in the BSM presented above (see Equation 1). In addition, the 241 proposed algorithm is biologically realistic as it uses simple computations and is online, 242 that is, that all computations on the sequence may be done using solely a set of 243 variables available at the present trial, compactly representing all the sequence history 244 seen in previous trials. We will also provide a computational implementation and a 245 quantitative evaluation of this algorithm. Then, we will present in Section 3 the analysis 246 of experimental evidence to validate the generalization of previous results with this 247 novel protocol. In one session, participants were asked to estimate "how much they are 248 confident that the target will move to the right or left in the next trial" and to adjust 249 the cursor's position on the screen accordingly (see Figure 1-C). In the other 250 experimental session on a different day, we showed the same sequence of target 251 directions and recorded participants' eye movements (see Figure 1-B). Indeed, in order 252 to understand the nature of the representation of motion regularities underlying this 253 adaptive behavior, it is crucial to collect both the recording of eye movements and the 254 verbal explicit judgments about expectations on motion direction. Another novelty of 255 our approach is to use that agent as a regressor which will allow us to match 256 experimental results with the BBCP and to compare its predictive power compared to 257 classical models such as the leaky integrator model. Hence, we will show that behavioral 258 results match well with the BBCP model. In Section 4, we will synthesize these results 259 by inferring the volatility parameters inherent to the models by best-fitting it to each 260 each individual participant. This will allow the analysis of inter-individual behavioral 261 responses for each session. In particular, we will test if one could predict observers' 262 prior (preferred) volatility, that is, a measure of the dynamic compromise between 263 exploration ("should I go?") and exploitation ("should I stay?") across the two different 264 sessions challenging predictive adaptive processes at the unconscious and conscious 265 levels. Finally, we will summarize and conclude this study and offer some perspectives 266 for future work in Section 5. To begin, we define a first ideal observer as a control, the leaky integrator (or forgetful 276 agent), which has an exponentially-decaying memory for the events that occurred in the 277 past trials. This agent can equivalently be described as one which assumes that 278 volatility is stationary with a fixed characteristic frequency of switches. Then, we will 279 extend this model to an agent which assumes the existence of (randomly occurring) 280 switches, that is, that the value of the probabilistic bias may change at specific (yet

283
The leaky integrator ideal observer represents a classical, widespread and realistic model 284 of how trial-history shapes adaptive processes in human behavior. It is also well anticipatory SPEMs. In this model, given the sequence of observations x t 0 from trial 0 287 to t, the expectation p =x 1 t+1 of the probability for the next trial direction can be 288 modeled by making a simple heuristic: This probability for an event is the weighted 289 average of the previously estimated probability,x 1 t , with the new information x t 0 , where 290 the weight corresponds to a leak term (or discount) by a factor (1 − h), with 291 h ∈ [0, 1] [56]. At trial t, this model can be expressed with the following equation: wherex 1 t=0 is equal to some prior value (0.5 in the unbiased case), corresponding to the 293 best guess at t = 0 (prior to the observation of any data).

294
In other words, the estimated probabilityx 1 t+1 is computed from the integration of 295 previous instances with a progressive discount of past information. The value of the 296 scalar h represents a compromise between responding rapidly to changes in the 297 environment (h ≈ 1) and not prematurely discarding information still of value for slowly 298 changing contexts (h ≈ 0). As such, we will call this scalar the hazard rate. Similarly, The first term corresponds to the discounted effect of the prior value before any 305 observation and it tends to 0 when t increases. More importantly, as 1 − h < 1, the 306 second term corresponds to the leaky integration of novel observations. Inversely, let us 307 now assume that the true probability bias for direction changes randomly with a mean 308 rate of once every τ trials. As a consequence, the probability that the bias does not 309 change is P r(x t 2 = 0) = 1 − h at each trial. Assuming independence of these 310 occurrences, the estimated probability p =x 1 t+1 is thus proportional to the sum of the 311 past observations weighted by the belief that the bias has not changed during i trials in 312 the past, that is, exactly as defined by the second term of the right-hand side 313 in Equation 3. This shows that assuming that changes occur at a constant rate 314 (x 2 t = h) but ignoring the variability in the temporal occurrence of the switch, the 315 optimal solution to this inference problem is the ideal observer defined in Equation 3,316 which finds an online recursive solution in Equation 2. We therefore proved here that 317 the heuristic derived from [56] is an ideal inversion of the two-layered generative model 318 which assumes a constant hazard rate for the probability bias.

319
The correspondence that we proved between the weighted moving average heuristic 320 and the forgetful agent model as an ideal solution to that generative model leads us to 321 several interim conclusions. First, the time series of inferredx 1 t+1 values can serve as a 322 regressor for behavioral data to test whether human observers follow a similar strategy. 323 In particular, the free parameter h may be fitted to variations of the behavioral data 324 across the sequence, which itself is assumed to depend on the agents' belief in the 325 weight decay. Now, since we have defined a first generative model and the 326 corresponding ideal observer (the forgetful agent), we next define a more complex 327 model, in order to overcome some of the limits of the leaky integrator. Indeed, a first 328 criticism could be that this model is too rigid and does not sufficiently account for the 329 dynamics of contextual changes [57] as the weight decay corresponds to assume a priori 330 a constant precision in the data sequence, contrary to more elaborate Bayesian 331 models [58]. It seems plausible that the memory size (or history length) used by the 332 brain to infer any event probability can vary, and that this variation could be related to 333 the environmental volatility inferred from past data. The model presented in Equation 3 334 uses a constant weight (decaying with the distance to the current trial) for all trials, 335 while the actual precision of each trial can be potentially evaluated and used for 336 precision-weighted estimation of the probability bias. To address this hypothesis, our 337 next model is inspired by the Bayesian Change-point detection model [55] of an ideal 338 agent inferring both the trajectory in time of the probability bias (x t 1 ) but also of the 339 probability P r(x t 2 = 1) of the occurrence of switches.

341
There is a crucial difference between the forgetful agent presented above and an ideal 342 agent which would invert the Binary Switching Model (BSM, see Equation 1). Indeed, 343 at any trial during the experiment, the agent may infer beliefs about the probability of 344 the volatility x t 2 which itself is driving the trajectory of the probability bias x t 1 .

345
Knowing that the latter is piece-wise constant, an agent may have a belief over the 346 number of trials since the last switch. This number, that is called the run-length r t , is 347 useful in two manners. First, it allows the agent to restrict the predictionx 1 t+1 of x t+1 Bayesian predictive model, we introduce the run-length as a latent variable which gives 365 to the agent the possibility to represent different parallel hypotheses about the input. 366 We therefore draw a computational graph (see Figure 2-A) where, at any trial, an 367 hypothesis is formed on as many "nodes" than there are run-lengths (and limited for 368 instance by the total number of trials). As a readout, we can use this knowledge of the 369 predictive probability conditioned on the run-length, such that one can compute the 370 marginal predictive distribution: With these premises, we define the BBCP as a prediction / update cycle which 378 connects nodes from the previous trial to that at the current trial. Indeed, we will 379 predict the probability β . Run-length estimates are expressed as hypotheses about the length of a sub-block over which the probability bias was constant, that is, the number of trials since the last switch. Here, the true probability bias switched from a value of .5 to .9 at trial 7, as can be seen by the trajectory of the true run-length (blue line). The BBCP model tries to capture the occurrences of a switch by inferring the probability of different possible run lengths. At any new datum (trial), this defines a Hidden Markov Model as a graph (treillis), where edges indicate that a message is being passed to update each node's probability (as represented by arrows from trial 13 to 14). Black lines denote a progression of the run length at the next step (no switch), while gray lines stand for the possibility that a switch happened: In this case the run length would fall back to zero. The probability for each node is represented by the grey scale (darker grey colors denote higher probability) and the distribution is shown in the inset for two representative trials: 5 and 11. Overall, this graph shows how the model integrates information to accurately identify a switch and produce a prediction for the next trial (e.g. for t = 14). (B) On a longer sequence of 200 trials, representative of a sub-block of our experimental sequence (see Figure 1-A), we show the actual events which are observed by the agent (TD), along with the (hidden) dynamics of the true probability bias P true (blue line), the value inferred by a leaky integrator (P leaky , orange line) and the results of the BBCP model in estimating the probability bias P BBCP (green line), along with .05 and .95 quantiles (shaded area). This shows that for the BBCP model, the accuracy of the estimated value of the probability bias is higher than for the leaky integrator. Below we show the belief (as grayscales) for the different possible run lengths. The green and orange line correspond to the mean run-length which is inferred, respectively, by the BBCP and leaky models: Note that in the BBCP, while it takes some trials to detect switches, they are in general correctly identified (transitions between diagonal lines) and that integration is thus faster than for the leaky integrator, as illustrated by the inferred value of the probability bias.
a previous trial. In particular, at the occurrence of the first trial, we know for certain 381 that there is a switch and initial beliefs are thus set to the values β (0) 0 = P r(r t = 0) = 1 382 and ∀r > 0, β (r) 0 = P r(r 0 = r) = 0. Then, at any trial t > 0, as we observe a new ) and the transition probabilities defined by the generative 385 model to predict the beliefs over all nodes: In the computational graph, Equation 5 corresponds to a message passing from the 387 nodes at time t − 1 to that at time t. We will now detail how to compute the transition 388 probabilities and the likelihood.
This may be illustrated by a graph in which information will be represented at the 393 different nodes for each trial t. This defines the transition matrix P r(r t |r t−1 ) as a 394 partition in two exclusive possibilities: Either there was a switch or not. It allows us to 395 compute the growth probability for each run-length. On the one hand, the belief of an 396 increment of the run-length at the next trial is: where h is the scalar defining the hazard rate. On the other hand, it also allows to 398 express the change-point probability as: Knowing this probability strength and the previous value of the prediction, we can 401 therefore make a prediction for our belief of the probability bias at the next trial t + 1, 402 prior to the observation of a new datum x t+1 0 and resume the prediction / update cycle 403 (see Equations 4, 7 and 8).

404
Integrated in our cycle, we update beliefs on all nodes by computing the likelihood 405 π (r) t of the current datum x t 0 knowing the current belief at each node, that is, based on 406 observations from trials 0 to t − 1. A major algorithmic difference with the BCP 407 model [55], is that here the observed data is a Bernoulli trial and not a Gaussian 408 random variable. The random variable x t 1 is the probability bias used to generate the 409 sequence of events x t 0 . We will infer it for all different hypotheses on r t , that is, 410 knowing there was a sequence of r t Bernoulli trials with a fixed probability bias in that 411 epoch. Such an hypothesis will allow us to compute the distribution P r(x t+1 0 |r t , x 0:t 0 ) by 412 a simple parameterization. Mathematically, a belief on the random variable x t 1 is 413 represented by the conjugate probability distribution of the binomial distribution, that 414 is, by the beta-distribution As a consequence, ∀r, t; ν (r) t is the sample size corrected by the initial condition. is the average at trial t over 422 the r + 1 last samples, which can also be written in a recursive fashion: This updates for each node the sufficient statistics of the probability density function at 424 the current trial. We can now detail the computation of the likelihood of the current 425 datum x t 0 with respect to the current beliefs : π . This scalar is with Z such that L(r|o = 1) + L(r|o = 0) = 1. The full derivation of this function is ) as a readout. This probability 438 bias is best estimated by its expected valuex 1 t+1 = P r(x t+1 0 |x 0:t 0 ) as it is marginalized 439 over all run-lengths:  incremented by 1 at each trial until a new switch, and the probabilityx 1 t is estimated 473 by integrating sensory evidence in this epoch (it 'stays'). Then, we observe that shortly 474 after a switch (hidden to the agent), the belief diffuses until the relative probability of a 475 continuously increasing run-length is lower that that assigned to a smaller run-length: 476 There is a transition to a new state (the model 'goes'). Such adaptation is similar to the 477 slow / fast heuristic model proposed in other studies [59]. Second, we can use this 478 information to readout the most likely probability bias and use it as a regressor for the 479 behavioral data. Note that the leaky integrator model is implemented by the agent 480 assuming a fixed run-length profile (see orange line in Figure 2  In order to quantitatively evaluate the algorithm and following a similar strategy 489 as [60], we computed an overall cost C as the negative log-likelihood (in bits) of the 490 estimated probability bias, knowing the true probability and averaged over all T trials: The measure C(x t 1 ,x 1 t ) explicitly corresponds to the average score of our model, as the 492 Kullback-Leibler distance ofx 1 t compared to the hidden true probability bias x t 1 . We Velocity of eye °/s We also show the evolution of the value of the probability bias P true (blue line) which is hidden to observers and used to generate the TD sequence above. We have overlaid the results of the BBCP model (see Figure 2-B, green line). This shows qualitatively a good match between the experimental evidence and the model. Note that short pauses occurred every 50 trials (as denoted by vertical black lines, see main text), and we added the assumption in the model that there was a switch at each pause. This is reflected by the reset of the green curve close to the 0.5 level and the increase of the uncertainty after each pause.
protocol: averaged over all observers, for each individual observer or independently in 501 all individual sub-blocks. In a second step, by testing different values of h assumed by 502 the agent but for a fixed hazard rate h = 1/40 in the BSM, we found that the distance 503 given by Equation 13 is minimal for the true hazard rate used to generate the data. In 504 other words, this analysis shows that the agent's inference is best for a hazard rate 505 equal to that implemented in the generative model and which is actually hidden to the 506 BBCP agent. This property will be important in a following section to estimate the 507 hazard rate implicitly assumed by an individual participant on the basis of the set of 508 responses given to the sequence of stimuli (see Section 4). As a summary, for each trial 509 of any given sequence, we obtain an estimate of the probability bias assumed by the 510 ideal observer and which we may use as a regressor. We will now present the analysis of 511 this model's match to our experimental measures of anticipatory eye movements and 512 explicit guesses about target motion direction.

513
3 Results: Anticipatory eye movements and explicit 514 ratings 515 We used the BSM model to generate the (pseudo-)random sequence of the dot's 516 directions (the alternation of leftward/rightward trials) as the sequence of observations 517 that were used in both sessions (see Figure 3). In one session, we recorded the 518 participants' eye movements and we show the anticipatory smooth pursuit velocity for 519 two representative participants (out of 12 participants), throughout a sub-block of 200 520 trials of the experimental sequence. Note that these participants were chosen as those 521 whose fitting score was nearest to the median score in the quantitative analysis that will 522 be illustrated below in Section 4. In the top panel of Figure 3 we the Bernoulli trial. In general, results are more variable when the bias is weak (p ≈ .5) 536 than when it is strong (close to zero or one), consistent with the well-known dependence 537 of the variance of a Bernoulli trial upon the probabilistic bias (Var(p) = p · (1 − p)). In 538 addition, the precision (i.e. the inverse of the variance) of the inferred probability bias 539 x 1 seems to increase in longer epochs (inter-switch blocks) as information is integrated 540 over more trials. As a result, the inferred probability as a function of time seems 541 qualitatively to constitute a reliable regressor for predicting the strength of aSPEM.

542
In addition, the explicit ratings for the next trial's expected motion direction (or bet 543 scores, red curve in Figure 3) provided in the other experimental session seem to 544 qualitatively follow the same trend. Indeed, similarly to the strength of aSPEM, we 545 qualitatively compare in Figure 3 the trace of the bet scores with the inferred 546 probability biasx 1 . As with aSPEM, the series of the participants' bias guesses exhibits 547 a positive correlation with the true probability bias: The next outcome of x t 0 will in 548 general be correctly inferred, as compared to a random choice, as reported 549 previously [64]. Moreover, we observe again that a stronger probability bias leads to a 550 lower variability in the bet scores, compared to bias values close to 0.5. Again, a 551 (hidden) switch in the value of the bias is most of the time correctly identified after only 552 a few trials. Finally, note that at every pause (black vertical bar in Figure 3), 553 participants tended to favor unbiased guesses, closer to 0.5 than at the end of a 554 sub-block of trials. We can speculate that this phenomenon could correspond to a 555 spontaneous resetting mechanism of the internal belief on the probability bias and the bet score, respectively. As a comparison, we have plotted in blue and orange colors the regression lines with respectively the true probability (P true = x t 1 ) and the probability bias estimates P leaky obtained with a leaky integrator. Insets summarize the quantitative measure of this match by computing the Pearson correlation coefficient r and the mutual information (MI) over the whole data set. Dots correspond to these measures for each individual observer. This shows quantitatively that for both experimental measures there is a strong statistical dependency between the behavioral results and the prediction of the BBCP model, but also that this dependency is significantly stronger than that obtained with the true probability and with the estimates obtained with the leaky integrator (see text). was applied to the true value P true and to the estimate obtained by the leaky integrator. 571 We quantitatively estimated the Pearson correlation coefficient and the mutual 572 information between the raw data and the different models. First, as we can see 573 in Figure 4-A, the probability bias P BBCP estimated by the BBCP algorithm is linearly 574 correlated with the aSPEM velocity, both as computed on the whole data or for each 575 observer individually (see insets in the Figure). The respective values for the whole 576 dataset (r = 0.657 and M I = 0.687) and across subjects (r = 0.673 ± 0.079 and 577 M I = 0.707 ± 0.134) are slightly higher than that found by [44] and [12] for aSPEM  have reported the quantitative analysis of the group-pooled data for the fit between 619 experimental and model-inferred estimates of the hidden probability bias. For instance, 620 the experimental measures for two representative participants in Figure 3, support the 621 qualitative match between behavioral data and model predictions, which we then 622 confirmed quantitatively on the whole group of participants. It is important to note 623 that no model fitting procedure was used so far, but only the match of the results of the 624 BBCP-model applied to the sequence of binary target directions presented to the 625 human participants, as shown in Figure 2-B. However, we observed that in both sessions 626 the qualitative match between model and data varied across participants. This was best 627 characterized by differences in the variability of the responses, but also, for instance, by 628 the different characteristic delays after a switch. This reflects the spectrum of individual 629 behavioral choices between exploration versus exploration [57]. As a consequence, we  This provides with 3 values of the best fitted hazard rate for each session and observer. 650 The scatter plot of the best fit values is shown in Figure 5. This figure suggests, in the 651 first place, that there is some variability in the best fitted value of the hazard rate in 652 both sessions. Overall, the value of correlation coefficient of the best fit hazard rate was 653 slightly higher than that computed in Figure 3 with r = 0.682 ± 0.080 for the eye 654 movement session and r = 0.811 ± 0.089 for the rating scale session. A part of the 655 variability in the estimated hazard rates comes from the limited length of the data 656 blocks, while another part is due to intra-individual and inter-individual variabilities.

657
Overall, the median (with 25% and 75% quantiles) are h aSPEM = 0.069 (0.038, 0.093) for 658 the aSPEM session and h bet = 0.025 (0.011, 0.093) for the rating scale. We observe that 659 these values are close to the (hidden) ground truth value (h = 1/40 = 0.025) used to 660 generate the sequence. In addition, the best-fit hazard rate value is higher for aSPEM 661 compared to the true value and the rating scale measures. In addition, we observed a  Figure 3. This plot shows that best fit hazard rates are in general higher than the ground truth (blue line), and in general higher for eye movements (below the diagonal). Note that the histograms of hazard-rate best-fit estimates (grey shaded areas) is much more narrower for the eye movement session than for the bet experiment, as also illustrated by the cumulative distributions (plain lines in black or colors). Such an analysis suggests that participants ultimately have different mechanisms at the unconscious and conscious levels for guiding their tendency of exploration versus exploitation. qualitatively cluster, but the dataset is still insufficiently large to support the 669 significance of such observation at a quantitative level. Moreover, there is a difference in 670 the distribution of observed hazard rates in both sessions. Indeed, we observed that the 671 marginal distribution for each session is different, with the distribution in the aSPEM 672 session being narrower than that observed for the rating scale session. In particular, we 673 also observed the same behavior for each sub-block independently, suggesting that the 674 origin of this variability mainly comes from inter-subject variability. Such an analysis 675 suggests that even though the predictive processes at work in both sessions may reflect 676 a common origin for the evaluation of volatility, this estimation is then more strongly 677 modulated by individual preferences when a more conscious cognitive process is at stake. 678

679
The capacity to adapt our behavior to the environmental regularities has been  The time-varying statistical regularities that characterize the environment are likely to 704 influence several cognitive functions. In this study, we have made the choice to focus on 705 a largely unconscious motor behavior (aSPEM), as well as on the explicit rating of 706 expectation for the forthcoming motion direction. In contrast, we have not addressed 707 the question of whether and how statistical learning affects visual motion perception 708 throughout our model-generated volatile sequences. In an empirical context similar to 709 ours, [11] have recently shown that perceptual adaptation for speed estimation occurs 710 concurrently to priming-based aSPEM throughout a sequence of motion tracking trials 711 with randomly varying speed. They actually found a robust repulsive adaptation effect, 712 with perceptual judgements biased in favor of faster percepts after seeing stimuli that 713 were slower and vice-versa. Concurrently, these authors also found a positive effect on 714 anticipatory smooth pursuit, with faster anticipation after faster stimuli, somehow in 715 agreement with the adaptive properties of aSPEM that we also report here. [11] 716 quantified the trial-history effects on aSPEM and speed perception by fitting a 717 fixed-size memory model similar to the forgetful agent. They found that aSPEM and 718 speed perception change over different time scales, with the priming effects being 719 maximized for short-term stimulus history (around 2 trials) and adaptation for longer 720 stimulus history, around 15 trials. Their main conclusion was that perceptual 721 adaptation and oculomotor priming are the result of two distinct readout processes 722 using the same internal representation of motion regularities. Note that both these 723 history lengths can be considered short in comparison to the several hundreds of trials 724 that are commonly used in psychophysics and sensorimotor adaptation studies and that, 725 similar to the present study, the inferred characteristic times are even shorter for the 726 buildup of anticipatory eye movements. However, it is also important to note that in 727 the study by [11], the generative model underlying the random sequence of motion trials 728 was different and much simpler than in the present study: In particular the role of We can speculate that different statistics can play a different role depending 749 on the context, but altogether the study by [28] and the present one converge to 750 highlight the importance of a dynamic estimate of the hierarchical statistical properties 751 of the environment for efficient behavior. There are also other limits to the agent that 752 we have defined. In this study we assume that data are provided as a sequence of 753 discrete steps. A similar approach using a Poisson point process allows to extend our 754 model to the continuous time domain, such as addressed by [65]: In their experiments, 755 the authors analyzed the licking behavior of rats in a dynamic environment. The 756 generalization to the time-continuous case is beyond the scope of our current protocol, 757 but it would consist in a natural extension of it to more complex and ecological settings. 758 The way expectations act on cognitive processes in general has been investigated in a 759 wide range of domains such as predictive coding [66], active inference [67], motor 760 control [68] and reinforcement learning [12,57,62]. Non-stationary observations can also 761 explain why both local and global effects emerge and why local effects persist in the 762 long run even within purely random sequences [20,32]. This constant update of a 763 general belief on the world can be a consequence of the constant attempt to learn the 764 non-stationary structure of the environment that can change at unpredictable times [20]. 765 Many studies have actually already pointed out the brain's ability to apprehend 766 non-stationary states in environments [64,69]. Future work will be needed to address   772 When we perceive the physical world, make a decision or take an action to interact with 773 it, our brain must deal with an ubiquitous property of it, uncertainty. Uncertainty can 774 arise at different levels and be structured around different characteristic time scales.

775
During the past decades, modern science seems to have completed an epistemological 776 transition, from struggling to reduce or neglect uncertainly to engaging in 777 understanding it as a crucial constituent of the world. In the cognitive neurosciences 778 this transition has been formalized in the theoretical framework of Bayesian 779 probabilistic inference, which has become very popular as a benchmark of optimal 780 behavior in perceptual, sensorimotor and cognitive tasks [70] and gives a unified 781 framework for studying the brain [67]. Furthermore, plausible hypotheses about the 782 implementation of Bayesian computations -or approximations of them-in the activity 783 of neuronal populations have been proposed [71][72][73].

784
However, one should be careful when evaluating the quality of fit of Bayesian 785 inference models for behavioral data, and avoid any over-interpretation of the results.

786
Note that, if we assume that the inversion of the generative model is perfect (that is, if 787 no algorithmic approximation has been done, like in the present study), this means that 788 by fitting different ideal observers to the data, one evaluates as a matter of fact the 789 adequacy of a specific generative model, not of the probabilistic calculus in its detailed 790 implementation. There is a common confusion around the idea of a "Bayesian brain". 791 We actually believe that the challenge here is not to validate the hypothesis that the 792 brain uses or not the Bayes' theorem, or a more complex hierarchical combinations of 793 inferential computations, but rather to test different hypotheses about the different 794 generative models that agents may use. This methodological point will be essential in 795 designing future experimental protocols, and in evaluating quantitatively the results.

796
The brain is probably only "weakly Bayesian" (it does not care about equations but 797 more about sugar, after all!). One remaining question though, is to understand why in 798 cognitive systems the adaptation to hierarchical probabilistic fluctuations occurs and in 799 particular why it may deviate in some pathological disorders such as 800 schizophrenia [5,74] or across the natural variability of autistic traits [75].

801
While it was not our original objective, we have analyzed in this study the individual 802 best-fit parameters (hazard rates) of the BCCP model: despite a consistent variability 803 of such parameters across sub-blocks of the trial sequence, we highlighted some 804 noteworthy tendencies for participants to cluster around specific properties of the 805 dynamic adaptation to a volatile probabilistic environment. Most important, this 806 analysis corroborates and strengthens some recent attempts to realize a computational 807 phenotyping of human participants. However, more extensive studied should be 808 conducted to be able to quantitatively titrate inter-individual tendencies. Finally, neuroeconomists have pointed out a generic aversion to risk [76] such that the 812 value of a possible outcome is weighted by the precision of the inference, leading in 813 general to an under-weighting of high gains and losses. Importantly, [77] compared a 814 classical economic decision task with a motor decision task: they found that 815 participants were more risk seeking in the motor task compared to the first one. More 816 recently, in a task similar to ours, where the behavioral choice was not specifically 817 associated to a reward schedule, [47] found a weak non-linearity in the dependence of 818 aSPEM upon the probability of motion direction, yielding an overweight of the extreme 819 values of probability, whereas an opposite non-linearity (underweight) was observed 820 when the target direction was visually-cued with a given probability of validity. In our 821 data we have not found consistent evidence suggesting a clear non-linearity in either 822 sense. Further work is needed to disentangle the possible specificities (e.g. non 823 linearities), in this respect, of different cognitive tasks, as well as to investigate the 824 dependence of non-linearities upon the environmental volatility.

832
• We applied such a framework to the case of a probability bias in a visual motion 833 task where we manipulated the target direction probability. We observed a good 834 match between the largely unconscious anticipatory smooth eye movements and 835 the results of the model, replicating and providing a novel solid theoretical 836 framework for previous findings [12,44,47].

837
• We also found a good match between model predictions and the explicit rating of 838 the expected target motion direction, a novel result suggesting that this model  The moving target used in our experiments was a white ring (0.35°outer diameter 858 and 0.27°inner diameter) with a luminance of 102 cd/m2 that moved horizontally on a 859 grey background (luminance 42 cd/m 2 ). Each trial started with a central fixation point 860 displayed for a random duration drawn from a uniform distribution ranging between 400 861 and 800 ms. Then a fixed-duration 300 ms gap occurred between the offset of the 862 fixation point and the onset of the moving target, which was presented slightly offset 863 from the fixation location [79] and immediately started moving horizontally at a  The recorded horizontal and vertical raw gaze position data were numerically 888 differentiated to obtain velocity measures. We adopted an automatic conjoint 889 acceleration and velocity threshold method (the default saccade detection implemented 890 by SR Research) to detect ocular saccades. Saccades and eye-blinks were excluded from 891 eye velocity traces (and replaced by Not-a-Number values in the numerical arrays) 892 before trial averaging and data fitting for the extraction of the oculomotor parameters 893 of interest. In order to extract the relevant parameters of the oculomotor responses, we 894 developed new tools based on a best-fitting procedure of predefined oculomotor patterns 895 and in particular the typical smooth pursuit velocity profile that was recorded in our 896 experiment. A piecewise-defined function was fitted to the different epochs of the eye 897 velocity traces: a constant function during fixation, a ramp-like linear function during 898 smooth pursuit anticipation, an increasing sigmoid-function during the initiation of 899 visually-guided smooth pursuit, reaching its saturating value during the pursuit 900 steady-state. This analysis was applied to each trial individually and it allowed in 901 particular to estimate the anticipatory smooth pursuit velocity. Some trials were 902 excluded from the analysis as the proportion of missing data-points, due to eye blinks or 903 saccades was considered too large, namely when the missing data exceeded 45 ms 904 during the gap or one third of the total target motion epoch (4.36% of all trials). In 905 addition, trials were also excluded when the eye-movement fitting procedure did not 906 converge, after visual inspection, to a satisfactory match with the data (3.25% of all 907 trials). The python scripts used to analyze eye movements are available at 908 https://github.com/invibe/ANEMO. before the presentation of the moving target, participants had to answer to the question 913 "How sure are you that the target will go left or right". This was performed by adjusting 914 a cursor on the screen using the mouse (see Figure Figure 1-C). The cursor could be 915 placed at any point along a horizontal segment representing a linear rating scale with 916 three ticks labeled as "Left", "Right" (at the extreme left and right end of the segment 917 respectively), and "Unsure" in the middle. Participants had to validate their choice by 918 clicking on the mouse left-button and the actual target motion was shown thereafter.

919
The rationale to collect rating responses on a continuous scale instead of a simple binary 920 prediction (Right/Left) was to be able to infer the individual estimate of the direction 921 bias at the single trial scale (in analogy to the continuous interval for the strength of 922 aSPEM velocity). We called this experiment the "Bet" experiment, as participants 923 were explicitly encouraged to make reasonable rating estimates, as though they had to 924 bet money on the next trial outcome. Every 50 trials, a "score" was displayed on the 925 screen, corresponding to the proportion of correct direction predictions (Right or Left of 926 the "Unsure" tick) weighted by the confidence attributed to each answer (the distance 927 of the cursor from the center). If we write it for trial t − 1, we have 1161 As such, the integrative formula above becomes an iterative relation: 1162 As such, the definitions in Equation 2 and Equation 3 are equivalent. Beta-distribution. For example, the beta distribution can be used in Bayesian analysis 1173 to describe initial knowledge concerning probability of success such as the probability 1174 that a product will successfully complete a stress test. The beta distribution is a 1175 suitable model for the random behavior of percentages and proportions.

1176
It is usually defined using shape parameters α and β: Note that here, the variable is the probability bias p. The normalization constant 8.3.2 Prediction: run-length distribution 1211 The steps to achieve the update rule are: