Dynamic divisive normalization circuits explain and predict change detection in monkey area MT

Sudden changes in visual scenes often indicate important events for behavior. For their quick and reliable detection, the brain must be able to process these changes as independently as possible from its current activation state. In motion-selective area MT, neurons respond to instantaneous speed changes with pronounced transients, often far exceeding the expected response as derived from their speed tuning profile. We here show that this complex, non-linear behavior emerges from the combined temporal dynamics of excitation and divisive inhibition, and provide a comprehensive mathematical analysis. A central prediction derived from this investigation is that attention increases the steepness of the transient response irrespective of the activation state prior to a stimulus change, and irrespective of the sign of the change (i.e. irrespective of whether the stimulus is accelerating or decelerating). Extracellular recordings of attention-dependent representation of both speed increments and decrements confirmed this prediction and suggest that improved change detection derives from basic computations in a canonical cortical circuitry.


Author summary
The world is a dynamic place: visual scenes are changing rapidly, and decisions have to be taken quickly and reliably to ensure successful behavior and, finally, survival. This creates a challenge for the brain, since it has to fulfill two requirements at the same time: It has to detect changes regardless of their magnitude and sign of change, and it has to primarily focus on behaviorally relevant changes. We here studied transient stimulus-speed change responses of neurons in motion-sensitive area MT and identify a mechanism supporting both of these requirements. This mechanism can be realized by an elementary neural model circuit which closely fits physiological data. The model is based on dynamic divisive inhibition generating fast transient rate modulations in response to rapid input changes. We analyzed this circuit mathematically and arrived at the formal prediction that attention will consistently increase the steepness of the transient, irrespective of the magnitude of the pre-change activation and the sign of the input change, thus allowing for

Introduction
For change detection in behaviorally relevant situations, stimulus-induced, strong transient firing rate modulations provide a powerful signal to downstream visuomotor areas on short timescales [1,2]. Such transients are particularly information-rich [3,4], and the temporal sensitivity of neurons to stimulus changes is suggested to constitute a major function of neuronal tuning properties [5]. Human behavioral performance to detect speed changes [6] correlates with the size of area MT change transients [7] and strongly improves with both spatial and feature-directed attention [8]. Accordingly, both forms of attention were found to increase sustained and transient MT firing rates before and after a stimulus change [9,10], and the transient response after the change also correlates with reaction times [9]. This close relation between neuronal transients and behavioral change detection performance on the one hand, and the general effect of attention to increase neuronal response rates [11,12] on the other, raises the question how attention exerts its beneficial effects under conditions where firing rate increases seem to impede the formation of pronounced transients and counteract behavioral performance. For example, because attention-dependent enhancement of firing rates brings a neuron closer to its maximum activity, a stimulus-induced transient firing rate increase under attention might be smaller than without attention. Furthermore, a negative transient would start from a larger pre-change activation level and is presumably having a smaller absolute negative peak than under conditions of no or remote attention. Absolute firing rates of the transient, therefore, may only allow for a poor prediction of attention-dependent improvements in behavioral change detection. As a consequence, the neuronal circuit processing the change depends on some form of normalization to compensate for differences in absolute activity. 
Moreover, to facilitate change detection, attentional modulation of the circuit should induce a consistent effect on neuronal transients that is independent from the stimulus-induced activation level before the change. It is unclear presently, which feature of a change-transient can be most consistently modulated by attention independent of the specific stimulus condition, and which neuronal mechanism might underlie its corresponding dynamics.
To investigate this issue, we set up and mathematically analyze a biologically plausible, canonical circuit providing divisive inhibition to an excitatory unit, and apply the model to a wide range of different stimulus conditions under, at first, passive viewing conditions [7]. By then introducing an input response gain to simulate top-down modulation of change detection by visual attention, the model circuit predicts a main effect on response rise times not only for positive transients but also for the case of negative transients, i.e. rapid rate decreases in a population of neurons. To test this prediction experimentally, we recorded neurons from motion-sensitive area MT while monkeys were engaged in a change detection task for both speed increments and decrements, eliciting large positive and negative population transients, respectively. We show that MT activity follows the prediction of the model, having steeper response slopes irrespective of the sign of the transient. Thus, the model circuit consistently explains change detection under different stimulus and attention conditions by a rather simple computational unit, realized in the canonical circuitry of the cortex.

Transient neural responses in area MT
Neurons in area MT are well-activated by moving stimuli such as localized drifting Gabor patches presented inside their receptive fields (RFs). Neural responses are tuned to the direction of movement, exhibiting maximal activity when the stimulus is moving in the cell's preferred direction [13]. The additional preference of MT neurons for a particular stimulus speed can be approximated by a Gaussian function in log-velocity space [14,15]. Rapid changes in the input to these neurons result in pronounced, transient changes of their activation. An example from previous work [7] shows the population response of an ensemble of MT neurons (N = 94) to a range of sudden stimulus accelerations and decelerations (Fig 1A). Transients show up as fast increases (or decreases) in firing rate, followed by a slower decrease (or increase) in firing rate back to a sustained level of activation. Such transients are visible in response to any instantaneous change in stimulation (black arrows in Fig 1A): stimulus onset, motion onset, speed change, motion offset, and stimulus offset. Due to their causal relation to visual perception and change detection [16], we aimed to study their properties and non-linear dynamics in a theoretical framework providing access to a comprehensive mathematical analysis, and to experimentally test predictions derived from it.

Model for transient activation in area MT
For modeling transient responses, we consider a circuit where an input I activates an excitatory and an inhibitory unit, with output A i of the inhibitory unit providing divisive inhibition to

PLOS COMPUTATIONAL BIOLOGY
the excitatory unit (Fig 1B). The dynamics of the circuit is given by two coupled differential equations, Eqs (1) and (2), where output $A_e(t)$ of the excitatory unit represents the instantaneous firing rate of a neuron in MT at time $t$, with $I$ being the external input provided by the visual stimulus. Terms $\tau_e$, $\tau_i$ denote time constants, $\sigma$ is a constant positive offset, and $g_e$, $g_i$ represent gain functions realized by threshold-linear rectification, with $m_e$, $m_i$ indicating gain factors, and $\theta_e$, $\theta_i$ indicating thresholds:

$$g_e(I) = m_e (I - \theta_e) \quad \text{for } I > \theta_e, \text{ and } 0 \text{ otherwise} \qquad (3)$$

$$g_i(I) = m_i (I - \theta_i) \quad \text{for } I > \theta_i, \text{ and } 0 \text{ otherwise} \qquad (4)$$

With constant suprathreshold input $I_0$, the steady-state solution for Eq (1) is equivalent to a standard divisive normalization model [17,18], where $A_i$ represents feedback from co-activated neighboring MT columns. As such, our model can be interpreted as a dynamical reformulation of static divisive normalization, with relaxation dynamics similar to those proposed for recurrent divisive normalization or low-pass filtering the output of a divisive normalization circuit [19,20]. The advantage of having two time constants for excitation and inhibition is the explicit representation of transients, allowing activation $A_e$ to quickly follow changes in $I$ on a fast time scale $\tau_e$, while divisive inhibition is acting on a longer time scale $\tau_i$, bringing the output towards a sustained, steady-state level of activation (Fig 1C).
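The circuit described above can be sketched numerically. The following minimal simulation is an illustration under stated assumptions, not the authors' implementation: it assumes the differential equations take the standard form $\tau_e \dot{A}_e = -A_e + g_e(I)/(\sigma + A_i)$ and $\tau_i \dot{A}_i = -A_i + g_i(I)$, which is consistent with the steady state and the maximum activation $m_e/m_i$ described in the text, and it sets both thresholds to zero. The function name `simulate_circuit`, the forward-Euler scheme, and all default parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def simulate_circuit(I_pre, I_post, t_change=0.1, T=0.5, dt=1e-4,
                     tau_e=0.02, tau_i=0.05, m_e=1.0, m_i=1.0, sigma=1.0):
    """Forward-Euler integration of the ASSUMED circuit equations
         tau_e dA_e/dt = -A_e + m_e*I / (sigma + A_i)
         tau_i dA_i/dt = -A_i + m_i*I
    for a step in the input from I_pre to I_post at t_change
    (thresholds theta_e = theta_i = 0, i.e. linear gain functions)."""
    n = int(round(T / dt))
    t = np.arange(n) * dt
    # Start in the steady state belonging to the pre-change input.
    A_i = m_i * I_pre
    A_e = m_e * I_pre / (sigma + A_i)
    out = np.empty(n)
    for k in range(n):
        I = I_pre if t[k] < t_change else I_post
        out[k] = A_e
        dA_e = (-A_e + m_e * I / (sigma + A_i)) / tau_e
        dA_i = (-A_i + m_i * I) / tau_i
        A_e += dt * dA_e
        A_i += dt * dA_i
    return t, out
```

Calling `simulate_circuit(1.0, 2.0)` yields a trace that sits at the pre-change steady state, overshoots after the step because excitation reacts faster than divisive inhibition, and then relaxes to the new sustained level.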

Model fit to transients from experimental data
To investigate how well the model explains experimental data, $A_e(t)$ was fitted to MT stimulus onset transients. Because MT neurons exhibit spontaneous activity with low firing rates even in the absence of a visual stimulus, we assume $\theta_e = \theta_i = 0$ and $I(t) > 0$, which allows replacing Eqs (3) and (4) by linear gain functions to simplify the analysis. By modeling stimulus onset as an instantaneous change in external input at $t = t_{change}$ from $I_{pre}$ to $I_{post}$, Eq (2) can be explicitly solved, assuming that for $t < t_{change}$ the system is in its steady state for a constant input $I_{pre}$. The result can be rewritten in terms of the sustained activation $A_e^{pre}$ before stimulus onset, and the sustained activation $A_e^{post}$ after decay of the transient response, to account for the fact that experimental access is given to the output of the neuron rather than to its input (Eq (6)). The ratio of the two gain factors, $A_e^{max} = m_e/m_i$, designates the theoretical maximum sustained activation of the circuit. We used a grid search to find the remaining free parameters $\tau_e$, $\tau_i$, and $A_e^{max}$ to minimize the average quadratic error between model activation and recorded MT firing rate during the transient response, excluding units yielding total spike counts that were too low to allow for successful fits (cf. Methods).
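The fitting procedure described above can be illustrated with a small grid search. This is a sketch under assumptions rather than the procedure used in the study: it assumes linear gain functions and circuit equations of the form $\tau_e \dot{A}_e = -A_e + m_e I/(\sigma + A_i)$, $\tau_i \dot{A}_i = -A_i + m_i I$, for which the drive to the excitatory unit after a step at $t = 0$ can be written as $A^{post}/\bigl(1 + \tfrac{A^{pre}-A^{post}}{A^{max}-A^{pre}}\, e^{-t/\tau_i}\bigr)$. The helper names `model_rate` and `grid_search`, the Euler integration, and the plain mean squared error are hypothetical choices.

```python
import itertools
import numpy as np

def model_rate(t, A_pre, A_post, tau_e, tau_i, A_max):
    """Model response to a step at t = 0 (ASSUMED form, see lead-in).
    t must be an evenly spaced time axis starting at 0, in seconds."""
    dt = t[1] - t[0]
    c = (A_pre - A_post) / (A_max - A_pre)
    drive = A_post / (1.0 + c * np.exp(-t / tau_i))
    A = np.empty_like(t)
    a = A_pre
    for k in range(t.size):
        A[k] = a
        a += dt * (drive[k] - a) / tau_e   # forward-Euler step
    return A

def grid_search(t, rate, A_pre, A_post, tau_e_grid, tau_i_grid, A_max_grid):
    """Return (error, (tau_e, tau_i, A_max)) minimizing the mean squared
    deviation between model_rate and the measured rate."""
    best = (np.inf, None)
    for tau_e, tau_i, A_max in itertools.product(tau_e_grid, tau_i_grid, A_max_grid):
        if A_max <= max(A_pre, A_post):
            continue  # the maximum must exceed both sustained levels
        err = np.mean((model_rate(t, A_pre, A_post, tau_e, tau_i, A_max) - rate) ** 2)
        if err < best[0]:
            best = (err, (tau_e, tau_i, A_max))
    return best
```

In practice, `rate` would be a trial-averaged PSTH aligned to stimulus onset and `A_pre`, `A_post` the measured sustained rates; the grids should be chosen fine enough for the desired parameter resolution.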
This structurally very simple model allowed for a close approximation of transient and sustained MT responses to motion onsets, despite the significant differences in their shape, caused by, among other things, the different tuning of individual neurons to actual stimulus speeds. Three examples of multi-unit responses and corresponding model fits are given in Fig 2A-2C, and the chi-squared distributions for the total of 48 (monkey 1 (M1)) and 44 (M2) remaining units are shown in Fig 2D. The $\chi^2/N_t$ ratio was sufficiently close to 1 for most units (M1: 1.49 ± 0.74 SD, M2: 1.37 ± 0.5 SD), indicating that fits were estimating the mean response nearly as well as the experimental data. This conclusion was confirmed by a comparison to surrogate distributions with the same statistical power and known "ground truth" (Fig 2D, see Methods for more details).
Excitatory time scales were much faster than inhibitory ones, as to be expected for the dynamics of transients (Fig 2E). For M1, mean $\tau_e$ was 17 ± 16 ms SD, mean $\tau_i$ was 45 ± 30 ms, average ratio $\tau_e/\tau_i$ was 0.4 with SD 0.98, and average $A_e^{max}$ was 87 ± 54 spikes/s. For M2, mean $\tau_e$ was 20 ± 9 ms, mean $\tau_i$ was 52 ± 25 ms, average ratio $\tau_e/\tau_i$ was 0.38 with SD 0.34, and average $A_e^{max}$ was 127 ± 99 spikes/s. We also tested whether fewer than three free parameters per individual MT neuron would be sufficient to explain the different shapes of transient responses across the population of neurons. We therefore fixed one or more parameters ($\tau_e$, $\tau_i$, $A_e^{max}$) for all units in a population, and repeated the fitting procedure. With one fixed global parameter, the time-averaged chi-square measure between model fit and experimental data increased by 3.3% (M1) and 2.8% (M2) on average. Which of the three parameters was globally defined was largely irrelevant, but fixing the time constant ratio $\tau_e/\tau_i$ had the smallest impact, increasing the fitting error by only 1.6% (for both monkeys). With two or three global parameters fixed, however, the error increased strongly (14.7% (M1) and 13.8% (M2) for two fixed parameters, 42.3% (M1) and 36.6% (M2) for three fixed parameters).

Transient response characteristics to stimulus changes
After the model had been calibrated to motion onset transients, we next investigated whether it is able to predict and explain the complex non-linear scaling of transients in response to changes in speed, which depend on the sign and magnitude of the physical speed change and the individual neurons' tuning characteristics [7]. Because time scales $\tau_e$ were shorter than $\tau_i$, an idealized version of the model can be considered by assuming that the excitatory unit reacts infinitely fast to input changes. With this approximation, it is possible to solve Eq (6) explicitly for $\tau_i > 0$ and obtain the peak of the transient shortly after $t = t_{change}$ analytically (cf. Eqs (12) & (13) in Methods).
Peak amplitudes obtained from the model in this manner reproduced an important and so far unexplained characteristic of MT change transients. In MT, peak amplitudes in response to speed changes significantly exceed those expected from the neuron's speed tuning profile if the speed before the change is away from the neuron's preferred speed. Neurons well-tuned to the pre-change speed are very poor change detectors, while neurons for which the pre-change speed is on the flank of their tuning curve have a strong impact on the population response [7]. This tuning-dependent, non-linear relationship between pre-change stimulus speed and individual neuronal tuning profiles is well captured by the model. Simulated speed changes, realized by step functions applied to the circuit's input at time $t_{change}$ (Fig 1C), predict speed change transients very closely matching the experimental data with regard to sign and amplitude of the peak for both changes occurring on the ascending and the descending flank of the tuning curve (Fig 3). Thus, the model reproduces the full dynamics of physiological change transients, including the over- and undershooting of peak amplitudes.
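The idealized limit of an infinitely fast excitatory unit can be written out as a short worked derivation. The expressions below are a sketch under assumptions: they presuppose linear gain functions and circuit equations of the form $\tau_e \dot{A}_e = -A_e + m_e I/(\sigma + A_i)$, $\tau_i \dot{A}_i = -A_i + m_i I$, and they may differ in form from Eqs (12) and (13) of the Methods, which are not reproduced here.

```latex
% Assumed model, linear gains, step input at t = t_change.
% In the limit tau_e -> 0 the excitatory unit follows its drive:
A_e(t) \;\approx\;
  \frac{A_e^{post}}
       {1 + \dfrac{A_e^{pre} - A_e^{post}}{A_e^{max} - A_e^{pre}}\;
            e^{-(t - t_{change})/\tau_i}},
  \qquad t \ge t_{change}.

% The extremum is reached immediately after the change:
A_e^{peak} \;=\; A_e(t_{change}^{+})
  \;=\; A_e^{post}\,
        \frac{A_e^{max} - A_e^{pre}}{A_e^{max} - A_e^{post}} .
```

Under these assumptions the factor multiplying $A_e^{post}$ is larger than 1 exactly when $A_e^{post} > A_e^{pre}$ and smaller than 1 otherwise, so the peak overshoots the new sustained level for rate increases and undershoots it for rate decreases.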

Model predictions for attention-dependent modulation of change transients
Because transients signal stimulus changes in a rapid and pronounced manner, they were previously suggested as a possible target for attentional modulation and a neuronal mechanism to speed up reaction times [9,21,22]. In area MT, using speed changes of 100% magnitude, attention was found to modulate both the peak amplitude and latency of a change transient [9,10]. Modulations due to other magnitudes of change, including negative changes, have not been investigated yet. Therefore, in our model, we next studied the dynamical effects of attention on transients for speed changes of arbitrary magnitude (without separation of time scales as in the previous subsection, allowing for arbitrary combinations of excitatory and inhibitory time constants). We included attention by simply assuming a multiplicative scaling [11] of the input $I$ by a factor $\alpha > 1$, $I \to \alpha I$ (Fig 4A, top left).
Ideally, to improve computation and, ultimately, perception, the effect of attention should be consistent across the entire dynamical range of the circuit. In particular, for attention to be effective, any change in the external input $I$ causing a positive (or negative) transient (as e.g. the traces for the ±100% changes in Fig 1A) should be associated with attention-dependent modulations preserving the sign of the transient independent from the neuron's pre-change activation, as e.g. a consistent increase (or decrease) in the transient's peak amplitude. If, however, peak amplitude is only increased for low pre-change activation levels, but decreased for high pre-change activation levels for otherwise identical stimulus conditions, the effect of attention would be inconsistent. Fig 4A (bottom right) exemplifies a consistent pattern of attentional modulation for stimulus-change responses associated with positive and negative rate changes, normalized to a dynamic range between 0 (lowest level) and 1 (highest level). The colored surface illustrates a consistent (i.e. pre-change activity independent) positive modulation by attention for stimulus-induced rate increases (above diagonal), and a corresponding negative modulation by attention for stimulus-induced rate decreases (below diagonal).

Fig 4. [...] Ideally, attention should elicit a consistent modulation of the neuronal change response, as e.g. enhancing a neural response feature for positive input changes (above diagonal, red shading), and suppressing it for negative input changes (below diagonal, blue shading) irrespective of the particular pre-change activation level. (B) Predicted attention-dependent changes in slope $\Delta F_{rise}$ (top left, indicating consistent modulation by attention), sustained activation $\Delta F_{sus}$ (top right, indicating inconsistent, pre-change activation-dependent modulation by attention), and relative peak height $\Delta F_{peak}$ (bottom plots). Relative peak height critically depends on the time scale ratio $\tau_e/\tau_i$ and becomes similar to $\Delta F_{rise}$ for $\tau_e/\tau_i \to 0$ and similar to $\Delta F_{sus}$ for $\tau_e/\tau_i \to \infty$. Axes and color scales are identical for all plots in (B), indicated at the top left and right chart.
The model was used to investigate three different change-response features regarding their dependence on attention and pre-change activity: the initial slope $F_{rise}$ of the transient (i.e. rise/decay time), the maximal amplitude $F_{peak}$ of the transient, and the sustained activation $F_{sus}$, with activations expressed in the range [0, 1] relative to the hypothetical maximum response $A_e^{max}$. Peak and sustained responses were quantified relative to pre-change activation, assuming that any plausible change detection circuit needs to base its computation on the number of spikes exceeding or falling below this level. The initial slope $F_{rise}$ is obtained analytically, and its attention-induced change $\Delta F_{rise}$ is visualized in Fig 4B (left). Similarly, the sustained activation level $F_{sus}$ is obtained analytically, and its attention-induced change $\Delta F_{sus}$ relative to the pre-change activity is plotted in Fig 4B (right). Finally, the relative change in maximal amplitude $\Delta F_{peak}$ depends on the specific ratio of the excitatory and inhibitory time constants $\tau_e/\tau_i$. Numerical evaluations for different values of τ reveal that for $\tau_e/\tau_i \to 0$, $\Delta F_{peak}$ approaches $\Delta F_{rise}$, while for $\tau_e/\tau_i \to \infty$, it approaches $\Delta F_{sus}$ (Fig 4B, bottom panels). These computations reveal two main insights: First, the slope and the sustained response indicate the two extremes of the analysis. The slope is subject to a strong and consistent pattern of attentional modulation, which is independent of both the overall activity and the sign of the rate change, indicating generally faster transients with attention for both speed increments and decrements. In contrast, the sustained response exhibits an inconsistent pattern of attentional modulation, with the sign of the modulation depending on the overall activation of the neuron before the change. Second, as a function of the specific ratio of excitatory and inhibitory time constants, the modulation of the peak amplitude shifts between these extremes.
Attentional modulation becomes stronger and more consistent the smaller the ratio τ e /τ i , but is attenuated (and, theoretically, even inconsistent) for larger values of τ e /τ i . Thus, the model makes the explicit prediction that, like positive transients, negative transients (i.e. rapid decreases in firing rate) will have a steeper rise time, i.e. shorter latencies, under the influence of attention as well as higher relative peaks, assuming τ e /τ i being in the previously estimated range (Fig 2E). The inconsistent pattern of attentional modulation as predicted for the sustained response can be explained by having a closer look at the steady state in Eq (5) with θ e and θ i set to zero, which can be rearranged into the form of a Naka-Rushton equation.
For an attention-scaled input $\alpha I$ with $\alpha > 1$, the function rises more steeply for small inputs $I$ (Fig 4B right, red region above diagonal) than with $\alpha = 1$ (no attention). From a critical input strength $I > \sigma/(\sqrt{\alpha}\, m_i)$ on, however, this behavior reverses and the function rises more slowly with attention (Fig 4B right, blue region above diagonal).
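The critical input strength can be checked with a short worked derivation. The following is a sketch that assumes the steady state has the form $A_e = m_e I/(\sigma + m_i I)$ for zero thresholds, which matches the maximum activation $m_e/m_i$ stated earlier; the shorthand $s = \sigma/m_i$ is introduced here only for readability.

```latex
% Steady state with zero thresholds, written as a Naka-Rushton function
% (assumed form; s = sigma / m_i is shorthand introduced here):
A_e(I) = \frac{m_e I}{\sigma + m_i I} = A_e^{max}\,\frac{I}{I + s},
\qquad
A_e^{\alpha}(I) = A_e^{max}\,\frac{\alpha I}{\alpha I + s}.

% Slopes with respect to the input:
\frac{dA_e^{\alpha}}{dI} = A_e^{max}\,\frac{\alpha s}{(\alpha I + s)^2},
\qquad
\frac{dA_e}{dI} = A_e^{max}\,\frac{s}{(I + s)^2}.

% The attended curve is steeper iff
\frac{\alpha}{(\alpha I + s)^2} > \frac{1}{(I + s)^2}
\;\Longleftrightarrow\;
\sqrt{\alpha}\,(I + s) > \alpha I + s
\;\Longleftrightarrow\;
I < \frac{s}{\sqrt{\alpha}} = \frac{\sigma}{\sqrt{\alpha}\, m_i}.
```

Below this input strength attention steepens the steady-state input-output function, and above it attention flattens the function, which is the reversal described in the text.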
Interestingly, the qualitative results depicted in Fig 4 remain identical also when considering a more general class of models in which attentional modulation of inputs to the excitatory and inhibitory units can be different (Eqs (16) and (17)). As long as attentional modulation $\alpha_e$ of excitatory input is equal to or larger than the corresponding modulation of inhibitory input $\alpha_i$, the expression in Eq (16) yields a positive result for $a_e^{post} > a_e^{pre}$ (steeper rise), and a negative result (steeper decay) otherwise. Conversely, the expression in Eq (17) can be positive or negative for $a_e^{post} > a_e^{pre}$, depending on the model parameters and pre- and post-change activation. Assuming a different input $I_i$ to the inhibitory population is mathematically equivalent to rescaling the gain factor $m_i$ and also leads to the same qualitative results as long as the corresponding input to the excitatory population $I_e$ is equal or larger.
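The contrast between consistent slope modulation and inconsistent sustained modulation can be checked numerically. The sketch below rests on assumptions that are not taken from the source: circuit equations of the form $\tau_e \dot{A}_e = -A_e + m_e I/(\sigma + A_i)$, $\tau_i \dot{A}_i = -A_i + m_i I$ with linear gains, a single attention factor $\alpha$ scaling the common input, and modulation measured as a simple difference between attended and unattended values. With normalized activation $a = A_e/A_e^{max}$, these assumptions give an initial slope of $(a^{post} - a^{pre})/(\tau_e (1 - a^{post}))$. The function names are hypothetical.

```python
import numpy as np

def attend(a, alpha):
    """Normalized steady state after scaling the input by alpha,
    given the unattended normalized steady state a in (0, 1)."""
    return alpha * a / (1.0 - a + alpha * a)

def initial_slope(a_pre, a_post, tau_e=0.02):
    """Initial slope of the transient right after the input step
    (derived from the ASSUMED model equations, see lead-in)."""
    return (a_post - a_pre) / (tau_e * (1.0 - a_post))

def delta_rise(a_pre, a_post, alpha, tau_e=0.02):
    """Attention-induced change of the initial slope."""
    return (initial_slope(attend(a_pre, alpha), attend(a_post, alpha), tau_e)
            - initial_slope(a_pre, a_post, tau_e))

def delta_sus(a_pre, a_post, alpha):
    """Attention-induced change of the sustained response,
    measured relative to pre-change activation."""
    return (attend(a_post, alpha) - attend(a_pre, alpha)) - (a_post - a_pre)
```

Evaluating `delta_rise` and `delta_sus` on a grid of pre- and post-change activations gives maps of the kind described for Fig 4B under these assumptions: the slope modulation always has the sign of the rate change, whereas the sustained modulation changes sign with the pre-change activation level.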

Experimental investigation of model predictions
Because attention is generally found to facilitate the response of visual neurons to the initial stimulus, any decrease in the population firing induced by a corresponding change of the behaviorally relevant stimulus would be antagonized by the opposite effect of attention, in terms of absolute firing. For this problem, the model's prediction of generally faster rise times and potentially more pronounced relative peak firing rate changes as a result of attention offers a particularly attractive solution for effective detection of stimulus changes. Changes in rise time (and, under appropriate conditions, relative peak firing rate changes), provide a mechanism independent of absolute firing to transmit attentionally selected information to downstream areas of the visuomotor pathway.
To test the model predictions, we recorded neuronal responses from motion-sensitive area MT of two macaques (M3: N = 45 units, M4: N = 25 units). Monkeys performed a speed-change detection task requiring them to either attend towards or away from the recorded unit's RF and to detect increments or decrements of the target's speed (Fig 5A). Peri-stimulus time histograms (PSTHs) aligned to the speed change of the stimulus displayed higher pre-change firing rates when the stimulus inside the RF was attended, and strong, transient increases and decreases of the firing rate in response to increments (accelerations) and decrements (decelerations) of motion speed, respectively (Fig 5B). Interestingly, neither pre-change firing rates nor their attentional modulation differed between blocks of speed increments and decrements (Wilcoxon signed rank tests). This indicates that both activity and attentional modulation during the pre-change period were independent from the sign of the speed change to be detected, despite the block-wise design of the experiment. Spike counts for all attentional conditions and all 25 ms time intervals between 400 ms before and 200 ms after a speed change exhibited a variance close to their mean (Fig 5B, insets), compatible with the statistical properties of a Poisson process (used below for assessing significance of spike count differences).
To analyze these transients according to model predictions, we assessed first, response slopes and second, relative firing rate changes during the transient time period of 50 to 200 ms following the speed change. First, for analyzing slopes, we calculated excess cumulative spike counts representing the number of spikes over- or undershooting the mean firing rate before the speed change as a proxy for the initial slope. Because rates increase or decrease almost linearly for the first 40 ms following the population transient onset, a larger (smaller) cumulative count is equivalent to a steeper positive (negative) slope. In contrast to estimating slopes directly from the PSTHs, the cumulative measure has two advantages. First, because accumulation is based on integration, noise does not become amplified but attenuated. Second, accumulation does not need smoothing of data to more reliably calculate specific response parameters, which is problematic if transients are faster than the width of the PSTH filter kernel used.

Fig 5. [...] After monkeys properly fixated and pressed a lever, two moving gratings appeared, one of which was placed inside the RF of the recorded neuron (dashed white circle), while the other was placed in the opposite hemifield, mirrored across the fixation spot. Following a pseudo-randomized delay of 0.66 to 5.5 s, the RF-stimulus rapidly increased (top row) or decreased speed (bottom row). Any speed change of the stimulus in the uncued hemifield had to be ignored. Keeping fixation and releasing the lever within 750 ms after the speed change was rewarded with some drops of water or diluted grape juice. Depending on the cued stimulus location, the RF stimulus was either attended or non-attended, giving rise to four experimental conditions (right plots, color-coded). (B) PSTHs for two example multi-unit sites (one from each monkey), illustrating rapid firing rate adjustments in response to speed changes. Note that because negative speed changes result in a decrease of the firing rate, the stimulus-induced modulation of the firing rate is opposite to the attention-induced modulation before the change. Insets show spike counts and variances assessed in time intervals of 25 ms from -400 ms to +200 ms relative to stimulus change for all units. Color coding of the different attentional conditions is indicated in panel (A). https://doi.org/10.1371/journal.pcbi.1009595.g005
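The excess cumulative spike count used as a slope proxy can be sketched as follows. This is an illustrative reading of the description above, not the analysis code of the study: the function name `excess_cumulative_count`, the 1 ms bin width, and the argument layout are hypothetical, while the 400 ms baseline and the 40 ms window follow values mentioned in the text. The transient onset has to be supplied by the caller.

```python
import numpy as np

def excess_cumulative_count(spike_times, t_change, onset, window=0.04,
                            baseline=0.4, bin_width=0.001):
    """Cumulative spike count after `onset` (seconds after the speed change)
    minus the count expected from the mean pre-change rate.
    spike_times: list of 1-D arrays with spike times in seconds, one per trial.
    Returns (time since onset, excess count per trial)."""
    n_trials = len(spike_times)
    # Pool all trials and align spikes to the speed change.
    all_spikes = np.concatenate(spike_times) - t_change
    # Mean pre-change firing rate in spikes/s per trial.
    n_pre = np.sum((all_spikes >= -baseline) & (all_spikes < 0.0))
    rate_pre = n_pre / (baseline * n_trials)
    # Bin edges covering [onset, onset + window].
    edges = onset + np.arange(0.0, window + bin_width / 2, bin_width)
    counts, _ = np.histogram(all_spikes, bins=edges)
    t = edges[1:] - onset
    cumulative = np.cumsum(counts) / n_trials
    # Subtract the spikes expected if the rate had stayed at baseline.
    return t, cumulative - rate_pre * t
```

Comparing the returned curves between attended and non-attended conditions then indicates slope differences: a larger positive excess after a speed increment, or a more negative excess after a decrement, corresponds to a steeper transient.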

Consistent with the prediction of the model, the excess cumulative spike count was found to rise more steeply for positive transients, and to decay more rapidly for negative transients when attention was directed to the stimulus (Fig 6A and 6B). Onset of the population transient was around 55 ms after the stimulus change, which is well within the range of typical MT response latencies [23,24]. Attention-dependent differences in excess cumulative spike counts become significantly different from 0 already shortly after response onset (Fig 6C). The steepening of the PSTH's slope by attention causes the neurons to reach a certain rate increase or decrease earlier than without attention. To quantify the induced time advantage, we computed the average smoothed PSTH (Gaussian kernel, RMS width 10 ms) for each condition and monkey, and extracted the time difference in reaching a certain rate increase or decrease with and without attention. Attention provided a time advantage of up to 10 ms and more for both accelerations and decelerations, which corresponds to the typical attention-dependent latency reduction in area MT and goes along with a reaction time decrease of 50 ms and more (cf. Table 1 in Ref. [9]). Significantly different cumulative spike counts following speed increments were found for 60% of all recorded units during the onset of the transient (Fig 6E), and of those, 85% confirmed the predictions of the model (Fig 6F). Likewise, for speed decrements, 46% of all units displayed significantly different spike counts (Fig 6E), and 76% of those confirmed the model's prediction of steeper slopes (Fig 6F). This result was evident for both monkeys, with 84% (M3) and 72% (M4) of all significantly modulated units being in accordance with the model prediction. Interestingly, units with strong rate increases in the acceleration condition also showed strong decreases in the deceleration condition. Performing a linear regression analysis on the excess spike count for deceleration vs. acceleration revealed a consistent anticorrelation with slope -0.20 ± 0.02 SD and offset -0.11 ± 0.02 SD for monkey M3 and slope -0.13 ± 0.04 SD and offset 0.00 ± 0.02 SD for monkey M4 (variability assessed by bootstrap using a leave-one-out method).
Second, for analyzing the transient's peak in response to the stimulus change, the firing rate was calculated in consecutive bins of 25 ms width. For both speed increments and decrements, up to more than 90% of units were found to have significantly higher and lower rates, respectively, in the attend-in condition during the time intervals of 50 to 100 ms post-change (Fig 6G). Yet, because this ratio decreased rapidly for later bins of the transient, consistent attentional modulation was limited to a brief period immediately following the stimulus change.
Taken together, the experimental data confirmed the model predictions for both the effect of attention onto the slope of positive and negative transients and, for the initial part of the transient, onto the modulation of relative peak responses for time constant ratios of $\tau_e/\tau_i < 1$, as found in the response onset fits (Fig 2E). As a novel physiological result, they provide evidence that attention modulates the same features of a negative transient as it does for positive transients, suggesting that processing of visual information, and its perception, can rely on information contained in reductions of firing rates.

Discussion
The ability to detect rapid changes in complex, ever-changing environments is fundamental for both animal and human behavior. Neuronal responses to fast stimulus transitions usually come as brief episodes of increased or decreased neuronal activity, followed by a steady-state level of lower absolute amplitude. Pronounced transient changes in neuronal activation were observed in the brain of many different species, spanning the range from invertebrates to primates [25][26][27][28], suggesting that they represent a basic principle in neuronal network dynamics. We here show that such a canonical computation can be realized by a very simple circuitry, essentially built of only one excitatory and one inhibitory unit, in which the excitatory unit's output time course is normalized through divisive inhibition. The circuitry can be expressed by a set of equations with only three free parameters, obtained by fitting the model to each neuron's onset response (to cover the individual unit's kinetics). Note that, although we assumed a log-Gaussian speed tuning during the steady state of the neurons to calculate sustained firing rates, the prediction of transients did not depend on any assumption about the underlying speed tuning profile (which may be influenced by higher-order parameters as e.g. stimulus history and rapid response adaptation [29,30]). The simplicity of the model allowed for a comprehensive mathematical analysis of neuronal response dynamics and the reproduction and prediction of physiological transients in response to a wide range of stimulus transitions, including the interesting case of attentional modulation of rate-decreasing events.

Non-stationary normalization by divisive inhibition
A key element of the model is normalization of the circuitry's output by divisive inhibition. Normalization by divisive (shunting) inhibition was initially suggested as a means to explain nonlinearities in the response of neurons in visual cortex [17]. It consists of dividing the response of a given neuron, or group of neurons, by the average response of a pool of normalizing units, either within the same cortical area or between areas [31][32][33][34][35]. Since its introduction, the concept was successfully used to explain neural response characteristics in a range of different contexts and neural systems, both with and without attention (for overview: [18,36]). Most work, however, has focused on static divisive normalization to describe modulations of sustained neuronal responses, while the circuitry we here introduce explicitly addresses the temporal response dynamics. Static normalization in our model emerges as the fixed point of the activation dynamics for a constant input. Static models may be able to reproduce MT responses to dynamic stimuli to some extent [37], but they are limited in capturing fast neural responses and are inappropriate for explaining non-monotonic transient responses to sudden input changes.
Temporal low-pass filtering combined with delayed inhibition was previously suggested to circumvent these limitations and allowed modelling the time course of MT responses during pursuit eye movements [38]. While low-pass filtering converts an instantaneous input change into an exponentially saturating input current, divisive inhibition sets in later than excitation, which allowed us to successfully predict change transients as observed experimentally. If inhibition follows the dynamics of excitation instantaneously [39], it is not possible to reproduce the time course and the peak (or trough) of a change transient. The relaxation differential equations in our model thus have the same effect as in [38]: they low-pass filter the input drive and delay inhibition by assuming a larger time constant for the rise of the divisive term. However, the earlier model has a large number of free parameters and multiple functional modules to allow detailed fits of neural responses to pursuit eye movements of different velocities, while the model we introduce here focuses on structural simplicity. This property enables thorough mathematical analysis for a large variety of stimulus conditions, with and without the effects induced by attention, and allowed us to predict neuronal response dynamics under so far untested experimental conditions.
Dynamic divisive normalization has recently been applied in other models as well. One recent study, for example, implemented dynamic divisive normalization to investigate the response to competing visual stimuli under different forms of attention [40], but focused only on stationary states. Other dynamical models incorporate divisive interactions via recurrent inputs. For example, recurrent divisive inhibition was used to study effects of feature-based attention [41], and to create an abstract model of a cortical column that accounts for signal filtering and nonlinear gain control in combined feedforward and feedback pathways [42]. In decision-related neuronal circuits, recurrent divisive inhibition was shown to generate both positive and negative transients [20]. In general, recurrent inhibition allows for very 'rich' model dynamics and, through reverberating activity, can also generate oscillations [19,20,42], whose origin can be understood through linear stability analysis.
Yet, mathematical analysis of the dynamical properties of recurrent models quickly becomes intractable. In contrast, the feedforward model we introduce here allows a thorough mathematical analysis, while still explaining the complex firing rate dynamics of our experimental data. The anatomical origin of divisive normalization, however, may be rooted in a recurrent circuit. A possible test of this, and a continuation of our work, is to implement divisive normalization in a recurrent, 'ring-model'-like network, which consists of identical elementary circuits to represent neuronal columns with different preferred motion directions as in [31,36,43], yet including temporal dynamics for inhibitory feedback [19]. In conjunction with appropriate experimental data, this would make it possible 1) to test whether the effects we describe here are compatible with a recurrent origin of divisive inhibition, 2) to investigate how attentional modulation in this network would affect the response of a heterogeneously tuned population, and 3) to further illuminate the physiological mechanism underlying dynamic divisive inhibition. Note, however, with regard to the latter, that it was recently questioned whether divisive inhibition constitutes a physiological mechanism on its own, or whether it may simply emerge as a network effect [44], possibly in conjunction with noise [45].

Relation between transients and information processing, perception, and behavior
Biologically relevant signals usually occur on very short time scales, and the brains of both vertebrates and invertebrates generate rapid visual percepts, decisions, and motor behaviors. Flies, for instance, track and chase other flies with response times as short as 30 ms [46], carnivorous vertebrates possess extremely fast sensorimotor programs for visually guided pursuit predation [47], and primates, both human and non-human, categorize objects and perform appropriate motor responses within tens of milliseconds [48][49][50]. These findings imply that, while different behaviors in different species may involve different neuronal substrates and mechanisms, the brain strongly relies on fast neuronal codes to account for the strong temporal variability of sensory input during eye movements, self- and object-motion. Transient firing rate changes of small groups of neurons in response to sudden changes in sensory input are likely part of this code. Accordingly, such transients not only participate in detection of objects and events but carry detailed information about stimulus properties. In monkey temporal cortex, transients were shown to exhibit specificity for different head views within 25 ms following onset of the population response, and to contain more information than later epochs of the response [51,52]. A corresponding pattern was found in primary visual cortex V1, reaching a peak for detectability and discriminability of oriented gratings within 150 ms of the onset response, and in area MT, where most information on motion direction is available within the first 100 to 200 ms after stimulus onset [3,4,53]. This higher information content of onset transients is likely due to a larger gain and smaller variance as compared to steady-state activation levels during continuous stimulation [53,54].
Accordingly, because brief episodes of coherent motion or rapid speed changes were found to induce firing rate changes significantly correlating with behavioral choices, transients were linked to perceptual judgments [9,55-59]. The results of the current study show that the simple circuitry we used to implement rapid firing changes in response to input changes fully reproduces the experimentally observed MT responses, including the over- and undershooting during change transients in comparison to firing rate changes expected from steady-state tuning properties [7]. Furthermore, although the excess spike counts in Fig 6 might seem small, they will have a substantial effect in a population of neurons: assuming only 10 to 100 presynaptic neurons projecting to a target unit, about 100 ms after a speed change this unit would have received 10 to 100 spikes more than without attention, a number which can readily make the difference for the target unit to either remain silent or fire an action potential. Assuming a fixed threshold for excess spikes, such as in standard race models, this would allow changes to be detected 10 to 20 ms earlier, given the size of effects in our data (graphs not shown). Because we recently showed that thresholding of transients permitted the read-out of information in full accordance with human behavioral performance both for rate increments and decrements [6], our results provide a mechanistic explanation for computations within sensory cortex underlying the perceptual process of change detection and discrimination, basically realizable by non-stationary divisive inhibition within a simple cortical circuitry.

Modulation of change transients by selective attention
Due to the close relation between transient firing rate patterns and perceptual judgments on the one hand, and the influence of selective attention on neuronal processing and behavioral performance on the other, transients are likely targets for attentional modulation. Recent monkey neurophysiological studies reported attention-dependent modulation of amplitudes, latencies, and gamma coherence during stimulus onset or stimulus change responses in various visual areas [9,21,60-63], and a close relation between reaction times to attended speed increments and the latency of the change transient [9]. All of these results, however, were obtained by investigating stimulus events eliciting an increase in firing rate. If specific parameters of transient firing rate changes indeed underlie perceptual performance, the question arises how attention influences a population of neurons for stimuli inducing a decrease in mean activation. The mathematical analysis of our model dynamics predicted, not without surprise, that negative transients would basically be modulated by attention in the same way as positive ones, with the slope and the relative peak height being the response features that allow for consistent attentional modulation independent of the pre-change activation level of the neuron, and independent of the sign of the transient. The experimental confirmation of this prediction provides new physiological insights for understanding change detection and its modulation by attention. As an important result, these findings indicate a relevant constraint on task-dependent modulation. Because attention was implemented as a positive gain to the input of the circuitry in the model, firing rates during the pre-change epoch were always larger with attention than without, regardless of the type of change occurring later.
These new physiological results show that, although speed increments and decrements were presented block-wise and allowed the animals to make correct predictions on the sign of the upcoming change, pre-change firing rates were not different between acceleration and deceleration trials and attention consistently increased neuronal responses during the pre-change epoch by about the same factor. These results strongly suggest that attention-dependent modulation in early visual cortex is generally associated with a positive gain of neuronal responses, as opposed to a mechanism modulating responses in the same direction as the sensory event. This conclusion is consistent with recent results on both spatial [9] and feature-directed [10] attention, which were found to exert very similar effects during both the pre-change epoch and transient firing. In particular, it was recently shown that feature-directed attention exerts a positive, tuning-independent gain even in the absence of a visual stimulus [10]. This tuning-independent effect likely constitutes the main source of top-down modulation during feature-directed attention, while tuning-dependent modulation, as proposed by the feature-similarity hypothesis [64,65], likely evolves on top of this and seems to be limited to early visual areas [66]. The new results on consistent pre-change attentional modulation in acceleration and deceleration trials are fully in line with this. Moreover, the model predicted steeper response slopes with attention for both stimulus changes inducing an increase and a decrease in neuronal firing. This prediction was a direct consequence of the model's inherent dynamics, since apart from input gain no other parameter of the model was changed to implement attention. In line with this prediction, the physiological experiments show a significant influence of attention on both the rise and the decay time of the change transient, being steeper with attention than without, i.e. attentional modulation was independent of whether the stimuli induced an increase or a decrease in the firing of neurons. Based on the model's temporal dynamics, a mechanistic explanation of this effect is that the stronger drive of both the excitatory and inhibitory unit allows a faster effect of divisive normalization with attention. Because normalization acts in the direction of the input change, it affects the slope of both positive and negative transients alike. Both the computational and the physiological data suggest that not only rapid positive changes in firing rate, but also rapid negative changes provide important information to downstream areas that is used for subsequent visual processing.

Ethics statement
Housing of animals, experimental and surgical procedures were all in accordance with the Directive 2010/63 issued by the European Commission and with the Regulation for the Welfare of Experimental Animals issued by the Federal Government of Germany, and were approved by the local authorities (Der Senator für Gesundheit, Freie Hansestadt Bremen, Az. 522-27-11/02-00).

Subjects
MT data used to develop and test the model were partly recorded in the context of previously published studies [7,10]. Additional data to test model predictions were recorded using nonhuman primate standard behavioral and neurophysiological procedures. Data were acquired from two male macaque monkeys, six and eight years old, both kept pair-wise with another male. Initial surgical interventions were performed under anesthesia and strictly sterile conditions. Anesthesia was initiated by ketamine/medetomidine and was maintained by propofol, supplemented with isoflurane. Remifentanil and Carprofen were given for peri- and postoperative analgesia, respectively. Monkeys were given sufficient time to recover before behavioral training in the laboratory. Further details on surgical procedures are given elsewhere [10,67]. Both monkeys were familiar with laboratory standard procedures and general task conditions. They received additional training on detecting negative speed changes during the course of several weeks using water and fruit juice as reinforcers for performing the task. On non-training and non-recording days, monkeys received fruits and liquid in their home compartments, consisting of large indoor rooms with daily access to equally sized outdoor compartments. All compartments were enriched by a manifold of monkey toys, puzzles, and climbing opportunities. Health and well-being were checked by daily monitoring and regular visits by veterinarians, and body weight was checked several times a week.

Electrophysiological data
Recordings were performed using tungsten microelectrodes (2-5 MOhm, 125 µm shank diameter; Frederic Haer, Bowdoin, ME) and standard electrophysiological equipment. The pre-amplified signal was filtered between 0.7 and 5 kHz and sampled with a frequency of 25 kHz. Spikes were detected online by thresholding the signal. All spike data were then subjected to offline semiautomatic spike sorting using KlustaKwik [68], followed by manual adjustment of spike clusters using a custom-made algorithm for spike form and spike parameter illustration [69]. Visual stimulation, control, and documentation of behavioral data were performed using custom-made Matlab scripts and in-house software. Eye movements were monitored by a custom-made video-oculography system with 0.2-degree resolution.

Visual stimulation and behavioral paradigm
Monkeys were tested in a behavioral task requiring detection of a rapid change in the speed of a moving stimulus, either an acceleration (speed change by a factor of ~2) or a deceleration (speed change by a factor of ~0.5), presented block-wise. The basic task design was the same as in previous studies [9,10]. Each trial started with the appearance of a small red fixation spot (0.14-degree side length) at the center of the screen (22-inch cathode ray tube monitor, resolution 1,280 x 1,024 pixels, 100 Hz refresh rate). Monkeys initiated the trial by gazing at the fixation point, pressing a lever, and holding it. Following a delay of 250 ms, a spatial cue appeared for 700 ms to indicate the location of the upcoming target stimulus, followed by another delay of 200 ms and the subsequent appearance of two static Gabor stimuli (sine wave spatial frequency: 2 cycles/degree, Gaussian envelope: σ = 0.75 degree at half height), one centered above the RF of the recorded neuron and the other one mirrored across the fixation spot. Mean Gabor luminance was identical to background luminance (10 cd/m^2). Gabors started to move intrinsically (speed: 2.17 degree/sec) 200 ms after onset, with motion direction adjusted to the preferred direction of the recorded neuron, as described elsewhere [9]. In about 40%-50% of the trials, the uncued stimulus changed speed before the target stimulus; this change had to be ignored by the monkeys. Following the speed change of the target, monkeys had to keep fixation for another 300 ms (to avoid contamination of the neuronal post-change response by eye movements) and to release the lever within a response window of 150-750 ms. Speed changes occurred between 0.66 and 5.5 sec after motion onset. Subsequent trials were separated by an intertrial interval of 3-4 sec. Releasing the lever outside the response window and eye movements of more than one degree from the fixation point caused immediate termination of a trial. Monkeys were rewarded with a few drops of water or diluted fruit juice for each correctly performed trial.

Model steady-state activation and response to step functions
The model consists of two units driven by external input I, with one unit providing divisive inhibition on the other unit (Fig 1B). Their dynamics are described by the differential Eqs (1, 2) and threshold-linear gain functions (3, 4), with the steady-state activation resulting from a constant, suprathreshold input $I_0$ given by [equations not preserved in the source text]. Inserting (10) into (9) provides Eq (5). Assuming zero thresholds $\theta_e = \theta_i = 0$ and a suprathreshold input $I(t) > 0$ allows replacing (3)-(4) by linear gain functions, simplifying further analysis. Instantaneous stimulus changes are considered as step functions providing a change in constant external input from $I_{pre}$ to $I_{post}$ at $t = t_{change}$. Assuming that the model is in its steady state for $t < t_{change}$, Eq (2) can be explicitly solved and Eq (1) for post-change activation ($t \geq t_{change}$) can be written as: [equation not preserved in the source text]
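As an illustration of the step response described in this section, the following is a minimal numerical sketch. It is not the authors' code, and because Eqs (1)-(6) are not reproduced here, it assumes one generic form of a feedforward divisive circuit: an inhibitory unit that low-pass filters the input with time constant `tau_i`, and an excitatory unit that relaxes with time constant `tau_e` toward the input divided by one plus the inhibitory activation. The function names, the specific equations, and all parameter values are hypothetical.

```python
def steady_state(i, a_max=1.0):
    """Fixed point of the ASSUMED circuit for a constant input i (where A_i = i)."""
    return a_max * i / (1.0 + i)


def simulate_step(i_pre, i_post, tau_e=0.010, tau_i=0.050, a_max=1.0,
                  t_end=0.5, dt=1e-4):
    """Forward-Euler response of the excitatory unit to an input step at t = 0.

    Assumed dynamics (hypothetical, for illustration only):
        tau_i * dA_i/dt = -A_i + I
        tau_e * dA_e/dt = -A_e + a_max * I / (1 + A_i)
    The circuit starts in its pre-change steady state.
    """
    a_i = i_pre                        # pre-change steady state, inhibitory unit
    a_e = steady_state(i_pre, a_max)   # pre-change steady state, excitatory unit
    trace = []
    for _ in range(int(round(t_end / dt))):
        trace.append(a_e)
        da_i = (-a_i + i_post) / tau_i
        da_e = (-a_e + a_max * i_post / (1.0 + a_i)) / tau_e
        a_i += dt * da_i
        a_e += dt * da_e
    return trace


up = simulate_step(1.0, 2.0)      # input increment
down = simulate_step(2.0, 1.0)    # input decrement
print("increment: peak", max(up), "post-change steady state", steady_state(2.0))
print("decrement: trough", min(down), "post-change steady state", steady_state(1.0))
```

Under these assumptions, slower inhibition (`tau_i > tau_e`) lets the excitatory unit overshoot its new steady state after an increment and undershoot it after a decrement, before divisive inhibition catches up.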

Model fit to physiological data
88 recordings from M1 and 53 recordings from M2 (single- and multi-units) were used to fit the model to stimulus onset transients. Transients were caused by the appearance of a grating inside a neuron's RF, moving in its preferred direction while monkeys performed a simple fixation paradigm (cf. Ref. [7] for experimental details). While the model responds immediately to a change in its input at time $t_{change}$, the physiological change response is delayed by the processing time between retina and area MT. To account for this, response onset delays $\Delta\tau$ relative to stimulus onset were estimated by visual inspection of the PSTHs binned at different temporal resolutions, individually for each unit (averages: M1: 31 ± 12 ms SD, M2: 32 ± 10 ms). Sustained responses $A_e^{pre}$ and $A_e^{post}$ before and after stimulus change were determined by computing the average spike rate over all trials in the intervals [-100 ms, 0 ms] and [200 ms, 500 ms], respectively (time denoted relative to stimulus onset). $A_e^{pre}$ and $A_e^{post}$ allow solving Eq (6) numerically and comparing it directly to the delay-compensated, trial-averaged physiological response by computing the chi-square function, which relates the quadratic error between observation and prediction to the standard error of the observation via $\chi^2/N_t = \langle \| A_e(t) - A_e^{exp}(t - \Delta\tau) \|^2 / \sigma^{exp}(t - \Delta\tau) \rangle$, averaged over a 200 ms interval after response onset. Values of $\chi^2/N_t$ around 1 are considered good fits because the average deviation between model and observed data is then comparable to the variability of the data. Experimental $A_e^{exp}$ was sampled within 5 ms time bins and the model's response was downsampled to the same temporal resolution for comparison.
Neural units were excluded from the fitting procedure if their spike count summed over all available trials was less than 100 (i.e., an average of only 2.5 spikes per time bin, due to a poor response to the applied speed, which was purposefully not adjusted to the neuron's tuning properties), which provides insufficient data to yield a well-shaped PSTH. Applying this criterion left 48 and 44 units from monkey M1 and M2, respectively, for fitting. Parameters $\tau_e$, $\tau_i$, and $A_e^{max}$ for explaining physiological dynamics were determined by an iterative grid search minimizing the quadratic error. Because $A_e^{max}$ must be at least the value of $A_e^{post}$, and values higher than $3 A_e^{post}$ were never reached by the optimization procedure, initial grid search ranges for $A_e^{max}$ were set to [$1.03 A_e^{post}$, $3 A_e^{post}$]. Ranges for $\tau_e$ and $\tau_i$ were set to [1 ms, 100 ms] and [1 ms, 500 ms], respectively, with the upper limits being well above the observed rise/decay times of the PSTHs. The resolution of the search grid was 40 bins for $A_e^{max}$, and 15 bins for each of the two time constants. Four iterations with successively refined grids resulted in a precision of 0.01% for the parameter estimates within the initially chosen range. For determining the goodness-of-fit we compared the empirical distribution of $\chi^2/N_t$ with surrogate spike data drawn from model responses generated with parameters sampled from the distributions of the fitted parameters. These surrogate data yielded distributions perfectly shadowing the empirical distributions, and thus all fits of the selected units were considered as appropriate, valid parameter estimates.
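The fitting procedure above combines a normalized error measure with a repeatedly refined parameter grid. The sketch below illustrates both steps in a generic way; it is not taken from the published code repository. The refinement rule (shrinking each range to one grid step on either side of the current best point) is an assumption, as are the function names and the toy cost function used in place of the model-versus-PSTH comparison.

```python
import itertools


def normalized_error(model, observed, norm):
    """Average of squared model-data deviations, each divided by a per-bin
    normalization term (the sigma^exp of the text; for a conventional
    chi-square this would be the squared standard error)."""
    terms = [(m - o) ** 2 / s for m, o, s in zip(model, observed, norm)]
    return sum(terms) / len(terms)


def linspace(lo, hi, n):
    """n equally spaced grid points from lo to hi (inclusive)."""
    step = (hi - lo) / (n - 1)
    return [lo + k * step for k in range(n)]


def iterative_grid_search(cost, ranges, bins, iterations=4):
    """Minimize cost(params) on a grid, then shrink the grid around the best
    grid point and repeat. ranges: list of (lo, hi); bins: grid size per axis."""
    best = None
    for _ in range(iterations):
        axes = [linspace(lo, hi, n) for (lo, hi), n in zip(ranges, bins)]
        best = min(itertools.product(*axes), key=cost)
        # ASSUMED refinement rule: keep one grid step on either side of the
        # best value, clipped to the range of the current iteration.
        refined = []
        for value, (lo, hi), n in zip(best, ranges, bins):
            step = (hi - lo) / (n - 1)
            refined.append((max(lo, value - step), min(hi, value + step)))
        ranges = refined
    return best


# Toy usage: recover known parameters (tau_e, tau_i in seconds, A_max in units
# of A_post) from a quadratic cost standing in for the real fit error.
target = (0.0123, 0.0789, 1.4)
toy_cost = lambda p: sum((a - b) ** 2 for a, b in zip(p, target))
fit = iterative_grid_search(toy_cost,
                            ranges=[(0.001, 0.1), (0.001, 0.5), (1.03, 3.0)],
                            bins=[15, 15, 40])
print(fit)
```

With 15 bins and this refinement rule, each iteration narrows a range to one seventh of its previous width, so four iterations resolve the initial range to a small fraction of a percent.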

Calculation of transient and sustained activation changes
Under the condition $\tau_e \ll \tau_i$ (separation of time scales), the peak response during the transient can be approximated by Eq (12) [equation not preserved in the source text]. Under the assumption of a log-Gaussian velocity tuning [14,15], sustained activation can be expressed in terms of the stimulus velocity v: [equation and intervening text not preserved in the source text] (14), to obtain analytical expressions for $A_e^{peak}/A_e^{pre}$ and $A_e^{post}/A_e^{peak}$ for arbitrary acceleration/deceleration ratios $v_{pre}/v_{post}$ (Fig 3). For evaluating $A_e^{peak}/A_e^{pre}$ and $A_e^{post}/A_e^{peak}$ in comparison to experimental data (Fig 3), we chose $A_0 = 0$, a tuning half-width of 2.5 octaves ($\sigma_v = \log(2^{2.5}) \approx 1.73$) and $A_e^{max} = 1.4 A_e^{pref}$.
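To make the tuning parameters concrete, the following sketch evaluates a log-Gaussian speed tuning curve with the values quoted above. The functional form is an assumption (a common parameterization with baseline `a_0`, amplitude `a_pref`, and width `sigma_v` on a natural-log speed axis); the exact expression used in the paper may differ, and the function name is hypothetical.

```python
import math


def log_gaussian_tuning(v, v_pref, a_pref, sigma_v, a_0=0.0):
    """ASSUMED form: A(v) = a_0 + a_pref * exp(-ln(v / v_pref)^2 / (2 * sigma_v^2))."""
    return a_0 + a_pref * math.exp(-math.log(v / v_pref) ** 2 / (2.0 * sigma_v ** 2))


# Width quoted in the text: 2.5 octaves expressed as a natural logarithm.
sigma_v = math.log(2 ** 2.5)
print(round(sigma_v, 2))

# Sustained rate ratio for a speed doubling that starts at the preferred speed
# (illustrative values only; v_pref = 1 and a_pref = 1 are arbitrary units).
ratio = (log_gaussian_tuning(2.0, 1.0, 1.0, sigma_v)
         / log_gaussian_tuning(1.0, 1.0, 1.0, sigma_v))
print(ratio)
```

Because the curve is symmetric on a logarithmic speed axis, doubling and halving the speed relative to the preferred speed give the same sustained rate under this assumed form.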

Modulation by attention
Attention is modeled by multiplicative modulation of input I by a factor $\alpha > 1$, $I \rightarrow \alpha I$ (Fig 4A).
For assessing the attention-induced modulation in these quantities (Fig 4B), we computed $\Delta F_\alpha^{rise} = F_\alpha^{rise} - F_1^{rise}$ and $\Delta F_\alpha^{sus} = F_\alpha^{sus} - F_1^{sus}$. Peak activation $F_\alpha^{peak}$ during transient responses must be computed numerically, thus $\Delta F_\alpha^{peak} = F_\alpha^{peak} - F_1^{peak}$ was obtained by solving Eq (6) explicitly in dependence on the chosen time constants $\tau_e$ and $\tau_i$.
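The effect of an input gain on the initial slope of the transient can be checked with a few lines of arithmetic. The sketch below is illustrative only and does not reproduce the paper's definitions of $F^{rise}$, $F^{sus}$, or $F^{peak}$: it assumes a generic divisive circuit in which the excitatory unit obeys `tau_e * dA_e/dt = -A_e + a_max * I / (1 + A_i)` and the inhibitory unit still sits at its pre-change steady state `A_i = alpha * i_pre` immediately after the step. The function name and parameter values are hypothetical.

```python
def initial_slope(i_pre, i_post, alpha=1.0, tau_e=0.010, a_max=1.0):
    """dA_e/dt immediately after an input step in the ASSUMED circuit,
    with attention modeled as the input gain alpha (I -> alpha * I)."""
    a_i = alpha * i_pre                          # inhibition has not yet changed
    a_e = a_max * alpha * i_pre / (1.0 + a_i)    # pre-change steady state
    drive = a_max * alpha * i_post / (1.0 + a_i)
    return (drive - a_e) / tau_e


for label, (i_pre, i_post) in {"increment": (1.0, 2.0), "decrement": (2.0, 1.0)}.items():
    without = initial_slope(i_pre, i_post, alpha=1.0)
    with_att = initial_slope(i_pre, i_post, alpha=1.3)
    print(label, "slope without attention:", without, "with attention:", with_att)
```

In this assumed form the initial slope equals `(a_max / tau_e) * alpha * (i_post - i_pre) / (1 + alpha * i_pre)`, whose magnitude grows with `alpha` for either sign of the input change.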
As a control, and for the purpose of generalization, we extended the model to have different attentional modulation factors α e and α i as input to the excitatory and inhibitory units, such that the attention-induced modulations become:

Calculation and comparison of physiological response parameters
In total, N = 45 sites were recorded in M3 and N = 25 sites in M4. For inclusion in the data analysis, we required the speed-up condition to be associated with a significant firing rate increase in the attend-out condition (assessed in the time interval 140 to 160 ms after stimulus change, one-tailed test on Poissonian distribution around mean firing rate before stimulus change, p < 0.05), which was the case for $N_{up}$ = 36 sites in M3 and $N_{up}$ = 19 sites in M4. Similarly, the speed-down condition was required to be associated with a significant firing rate decrease in the attend-out condition, which was fulfilled for $N_{down}$ = 42 sites in M3 and $N_{down}$ = 21 sites in M4. Rise times and relative spike counts of the experimentally observed transients (Fig 5) were calculated to compare physiological data against model predictions for $\Delta F^{rise}$ and $\Delta F^{peak}$. First, rise times of physiological transients were assessed by determining excess cumulative spike counts following the stimulus change, defined as the number of spikes exceeding the spike count of a neuron continuing to fire with its observed pre-change rate. By visual inspection, we estimated an average response delay of $t_{change}$ = 55 ms for both monkeys. For estimating pre-change activity, we computed the summed firing rate $F^{pre}$ over all trials of a given attention condition in the time window [-400 ms, $t_{change}$]. Excess cumulative spike count ec(t) was then defined for $t > t_{change}$ by [equation not preserved in the source text], where $t_k$ denote the times of K spikes k = 1,...,K, considering all trials of the respective condition. For comparing the initial slopes of two attention conditions N and A, we first computed their cumulative excess responses $ec_N(t)$ and $ec_A(t)$. We then evaluated the difference $\Delta ec_{AN}(t) = ec_A(t) - ec_N(t)$ and tested statistically whether $\Delta ec_{AN}(t)$ was significantly deviating from zero.
The transient of condition A was considered to rise significantly faster (or slower) than the transient of condition N, if $\Delta ec_{AN}(t)$ was above (or below) a time-dependent threshold. Thresholds were set to $\pm z \sqrt{(F_A^{pre} + F_N^{pre})(t - t_{change})}$, where z = 2.32 was chosen to yield a significance level of p < 0.01.
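The comparison of initial slopes can be sketched as follows. This is an illustrative reading of the verbal definition, not the published analysis code: the excess cumulative count is assumed to be the number of spikes observed between `t_change` and `t` minus the count expected from the summed pre-change rate, and the threshold follows the expression given above. Function names and the example spike times are hypothetical.

```python
import math


def excess_cumulative_count(spike_times, f_pre, t, t_change):
    """ASSUMED form: spikes in (t_change, t] pooled over trials, minus the count
    expected if firing had continued at the summed pre-change rate f_pre (1/s)."""
    observed = sum(1 for t_k in spike_times if t_change < t_k <= t)
    return observed - f_pre * (t - t_change)


def slope_threshold(f_pre_a, f_pre_n, t, t_change, z=2.32):
    """Time-dependent threshold z * sqrt((F_A_pre + F_N_pre) * (t - t_change))."""
    return z * math.sqrt((f_pre_a + f_pre_n) * (t - t_change))


# Made-up pooled spike times (s, relative to the stimulus change).
spikes_attended = [0.060, 0.064, 0.070, 0.075, 0.081, 0.088, 0.093, 0.200]
spikes_unattended = [0.072, 0.095, 0.180]
t_change, t = 0.055, 0.100
ec_a = excess_cumulative_count(spikes_attended, f_pre=10.0, t=t, t_change=t_change)
ec_n = excess_cumulative_count(spikes_unattended, f_pre=10.0, t=t, t_change=t_change)
delta_ec = ec_a - ec_n
threshold = slope_threshold(10.0, 10.0, t, t_change)
print(delta_ec, threshold, delta_ec > threshold)
```

The square-root term reflects the Poisson-like growth of count variability with elapsed time, so the threshold widens as `t` moves away from `t_change`.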
Second, spike counts for individual attention conditions and during different periods of the transient were calculated as the difference $\Delta ac$ between the absolute spike count before and after the speed change, $\Delta ac(itv) = ac_{post}(itv) - ac_{pre}$, where itv denotes intervals of 25 ms length, taken between 50 ms and 200 ms following the speed change for the transient. The steady-state response before the change was estimated during the period from -400 ms to $t_{onset}$. The difference between two relative spike counts was considered significantly different from zero when it exceeded $\sqrt{var_A^{pre} + var_A^{post} + var_N^{pre} + var_N^{post}}$, where var designates the spike count variance in the corresponding attentional condition pre- or post-stimulus change.
Supporting information
S1 Data Code Repository. The archive 'S1_Data_Code_Repository.zip' contains all relevant data sets and program code for generating the results reported in the manuscript. Running the code requires Matlab R2020a (The MathWorks, Inc.) and Python (Version 3.6). To perform all data analyses and fits, execute script 'PLOS_run_all.m' from the main directory. See the comments in the scripts named 'PLOS_run.m' in the subdirectories for further information on the result plots and files created by running the code. (ZIP)

69. Galashan