Temporal dynamics of short-term neural adaptation across human visual cortex

Neural responses in visual cortex adapt to prolonged and repeated stimuli. While adaptation occurs across the visual cortex, it is unclear how adaptation patterns and computational mechanisms differ across the visual hierarchy. Here we characterize two signatures of short-term neural adaptation in time-varying intracranial electroencephalography (iEEG) data collected while participants viewed naturalistic image categories varying in duration and repetition interval. Ventral- and lateral-occipitotemporal cortex exhibit slower and prolonged adaptation to single stimuli and slower recovery from adaptation to repeated stimuli compared to V1-V3. For category-selective electrodes, recovery from adaptation is slower for preferred than non-preferred stimuli. To model neural adaptation we augment our delayed divisive normalization (DN) model by scaling the input strength as a function of stimulus category, enabling the model to accurately predict neural responses across multiple image categories. The model fits suggest that differences in adaptation patterns arise from slower normalization dynamics in higher visual areas interacting with differences in input strength resulting from category selectivity. Our results reveal systematic differences in temporal adaptation of neural population responses between lower and higher visual brain areas and show that a single computational model of history-dependent normalization dynamics, fit with area-specific parameters, accounts for these differences.

To elucidate these issues, we studied two signatures of adaptation in time-resolved neural responses at short (sub-second) time-scales. First, neural responses reduce in magnitude when a static stimulus is viewed continuously, evident in transient-sustained dynamics in the shape of response time courses (Fig. 1A). Second, when two stimuli are viewed close in time, the response to the second stimulus is reduced, i.e., repetition suppression (RS; Miller et al. 1991; Li et al. 1993; Miller and Desimone 1994; Lueschow et al. 1994; Fig. 1B). Higher visual areas have been found to show slower transients and more slowly decaying responses than lower visual areas in human ECoG (Zhou et al., 2019; Groen et al., 2022) and in simulated neural fMRI responses (Stigliani et al., 2017; Kim et al., 2023), and fMRI studies suggest that higher visual areas show stronger RS than lower areas (e.g., V1; Zhou et al. 2018; Fritsche et al. 2020). Further, a computational model of delayed divisive normalization (Heeger, 1992, 1993) simultaneously predicts transient-sustained dynamics and RS in neural population responses measured with ECoG (Zhou et al., 2019; Groen et al., 2022), implying that both forms of adaptation may reflect divisive normalization mechanisms.
Together, these findings suggest that adaptation signatures differ across the visual hierarchy and that this may reflect differences in history-dependent normalization. However, in most studies, the stimuli were noise patches or simple contrast patterns, which primarily drive responses in lower-level areas. Thus, the observed differences across areas may reflect suboptimal stimuli for higher visual areas, rather than systematic differences in temporal adaptation. Further, neural adaptation also may vary within an area, depending on stimulus type. Monkey and human fMRI studies find that in visual areas with increased sensitivity to stimulus categories such as faces or bodies, preferred stimuli elicit stronger RS than non-preferred stimuli (Sawamura et al., 2006; Weiner et al., 2010; Rangarajan et al., 2020). Thus, to compare and model adaptation across the visual hierarchy, stimulus effectiveness must be considered.
Figure 1. A. For a prolonged single stimulus, adaptation is evident because the neural response, after an initial transient, is followed by a decay plateauing to a sustained response level. B. For two presentations of an identical image with a brief gap in between the stimuli, adaptation is evident because the neural response for the second stimulus is reduced.
To disentangle the influence of hierarchy and stimulus on neural adaptation, we quantified transient-sustained and repetition suppression dynamics of neural responses across multiple visual brain regions in a new set of intracranial EEG (iEEG) recordings from human participants. Participants were presented with naturalistic stimuli from distinct image categories, allowing us to assess stimulus preference and its effect on neural adaptation patterns. By fitting an augmented version of the delayed divisive normalization model (Zhou et al., 2019; Groen et al., 2022) that considers stimulus category preference, we propose explanations for differences in adaptation patterns.
Our results yield three insights. First, we demonstrate systematic differences in neural adaptation between lower and higher human visual areas: lower areas show faster transient-sustained dynamics and faster recovery from repetition suppression. Second, we reveal stimulus-specific differences in recovery from RS in category-selective electrodes: preferred stimuli elicit stronger repetition suppression than non-preferred stimuli. Third, our augmented DN model accurately predicts neural responses to different stimulus categories along the visual hierarchy. Based on the observed model behavior, we propose that observed differences in neural adaptation patterns reflect differences in divisive normalization dynamics.

Results
We collected iEEG recordings while participants viewed single and repeated naturalistic images from six stimulus categories (Fig. 2A), with variable stimulus duration and inter-stimulus-intervals (ISI) (Fig. 2B).
By aggregating responses across four patients, we identified 79 visually responsive electrodes, which we separated into one lower-level visual group (V1-V3) and two higher-level groups, ventral-occipitotemporal cortex (VOTC) and lateral-occipitotemporal cortex (LOTC), using retinotopic atlases (Fig. 2C). Some electrodes in VOTC and LOTC were category-selective, showing higher sensitivity to one stimulus class (Fig. 2C; see Materials and Methods, Data selection). We computed a single average time-resolved broadband response for each temporal stimulus condition and stimulus class, resulting in 72 response time-courses per electrode.
To model neural response dynamics across visual areas and stimuli, the time courses were fitted using a delayed divisive normalization (DN) model. The model takes as input a stimulus time course and produces as output a predicted neural response (Fig. 3A). To take into account category-selectivity, we allowed the model to scale the input stimulus time course as a function of category (Fig. 3B, see Materials and Methods, Computational modeling). Incorporating category-dependent scaling improves model predictions in all visual areas (Fig. 3C).
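The structure of the model described above can be illustrated with a minimal Python sketch. It is not the fitting code used in this study: the gamma-shaped impulse response, the exponential low-pass filter used to build the delayed normalization pool, and all parameter values (`tau1`, `tau2`, `n`, `sigma`, `scale`) are assumptions chosen for illustration, and the helper names `make_stimulus` and `dn_response` are hypothetical.

```python
import numpy as np

def make_stimulus(duration, onset=0.1, total=1.5, dt=0.001):
    """Binary stimulus time course: 1 while the image is on, 0 otherwise."""
    stim = np.zeros(int(round(total / dt)))
    start = int(round(onset / dt))
    stim[start:start + int(round(duration / dt))] = 1.0
    return stim

def dn_response(stim, scale=1.0, tau1=0.05, tau2=0.1, n=2.0, sigma=0.1, dt=0.001):
    """Delayed divisive normalization sketch (placeholder parameter values)."""
    t = np.arange(0, 1.0, dt)
    h1 = t * np.exp(-t / tau1)      # gamma-shaped impulse response function
    h1 /= h1.sum()
    h2 = np.exp(-t / tau2)          # assumed exponential low-pass for the delay
    h2 /= h2.sum()
    # Linear stage: category-dependent scaling of the input, then convolution.
    drive = np.convolve(scale * stim, h1)[:len(stim)]
    # Delayed copy of the input drive that feeds the normalization pool.
    pool = np.convolve(drive, h2)[:len(stim)]
    # Rectify and exponentiate numerator and pool, then divide.
    return np.abs(drive) ** n / (sigma ** n + np.abs(pool) ** n)

stim = make_stimulus(duration=0.533)
preferred = dn_response(stim, scale=1.0)      # strong input drive
non_preferred = dn_response(stim, scale=0.5)  # weaker input drive
```

In this sketch the response shows a transient followed by a lower sustained level because the pool lags behind the input drive, and lowering `scale` reduces the predicted response at every time point.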
We compared the DN model to temporal two-channel models (Stigliani et al., 2017, 2019), which we augmented such that they similarly employ a scaling factor for modulating category-specific input strength. These models predict neural responses using distinct channels responsible for either the transient or the sustained responses observed in neural signals (Horiguchi et al., 2009) and have been shown to accurately predict some aspects of iEEG responses (Groen et al., 2022) and fMRI responses (Zhou et al., 2018; Kim et al., 2023). We distinguished two different model implementations. The L+Q model (Stigliani et al., 2017) consists of a linear sustained channel and a transient channel with quadratic nonlinearity, whereas the A+S model (Stigliani et al., 2019) contains a sustained channel with adaptation and a transient channel with sigmoid nonlinearities. In the current data, the DN model outperforms the L+Q model in V1-V3 and LOTC and the A+S model in V1-V3 (Supp. Fig. 3A). While for the higher visual regions the A+S model (Stigliani et al., 2019) performs nearly on par with the DN model, we see qualitatively poorer fits to the data, which we will discuss in more detail below.
In the following sections, we first characterize transient-sustained dynamics and repetition suppression in lower and higher visual areas and then examine repetition suppression within category-selective electrodes.
Along with the neural data, we present DN model predictions to demonstrate how the observed differences result from divisive normalization dynamics.

Higher visual areas exhibit slower and prolonged responses to single stimuli
Neural time courses to duration-varying stimuli in V1-V3, LOTC and VOTC exhibit different transient-sustained dynamics (Fig. 4A, top panel). In all areas, responses show an initial transient, which for short durations is the only part of the response, while for longer durations, a subsequent lower-amplitude sustained response emerges. However, electrodes in V1-V3 show faster and shorter transients with relatively low sustained responses, while VOTC and LOTC have slower and wider transients with higher sustained responses.
To quantify these differences in response shapes across visual areas, we computed two metrics which capture different characteristics of the time courses. First, responses rise more slowly in higher visual areas as reflected by the time-to-peak, which is shortest for V1-V3, intermediate for VOTC and longest for LOTC (Fig. 4B, circle markers). Second, compared to V1-V3, responses for VOTC and LOTC show a broadening of the transient as reflected by the full-width at half-maximum (Fig. 4C, circle markers). This difference becomes more pronounced as the stimulus duration lengthens for LOTC and to a lesser degree for VOTC. These metrics indicate a slower rise and a slower decay of the response, resulting in a prolonged, more slowly adapting response in higher visual areas.
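The two metrics above can be sketched in a few lines of Python. This is an illustrative implementation only, assuming a single-peaked response sampled on a regular time axis; the function names `time_to_peak` and `fwhm` are hypothetical, and the exact definitions used in the analysis may differ.

```python
import numpy as np

def time_to_peak(resp, t, onset=0.0):
    """Time from stimulus onset to the maximum of the response."""
    return t[np.argmax(resp)] - onset

def fwhm(resp, t):
    """Full width at half maximum: span between the first and last samples
    at or above half of the peak (assumes one dominant peak)."""
    half = resp.max() / 2.0
    above = np.where(resp >= half)[0]
    return t[above[-1]] - t[above[0]]

# Toy example: a Gaussian-shaped transient peaking 300 ms after onset.
t = np.arange(0, 1.0, 0.001)
resp = np.exp(-((t - 0.3) ** 2) / (2 * 0.05 ** 2))
ttp = time_to_peak(resp, t)
width = fwhm(resp, t)
```

For a Gaussian with a standard deviation of 50 ms, the analytic full width at half maximum is about 118 ms, which `width` approximates up to the sampling resolution.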
The DN model accurately captures the broadband responses for the duration trials across all visual areas (Fig. 4A, lower panel). It also predicts the differences in response shapes, that is, the slower rise (Fig. 4B, triangle markers) and the wider transients (Fig. 4C, triangle markers) in LOTC compared to V1-V3, with intermediate values for VOTC. While the A+S model captures the overall response shapes for the different visual areas, it also shows some model failures. First, the model predicts offset responses for longer stimulus durations which are not present in the neural data (Supp. Fig. 1B). Moreover, model predictions show narrower response widths for electrodes in VOTC, more similar to those observed in V1-V3 (Supp. Fig. 1C).
To assess whether the differences in transient-sustained dynamics across areas are affected by stimulus selectivity, we quantified these dynamics separately for each electrode's preferred category (eliciting the maximum response) and for all remaining stimulus categories combined (non-preferred stimuli). While for higher visual areas, the response decay for preferred stimuli seems to be slightly stronger compared to non-preferred stimuli, the neural and DN model time courses overall exhibit the same area-specific differences regardless of whether preferred (Supp. Fig. 2A-C) or non-preferred (Supp. Fig. 2D-F) stimuli were shown. This suggests that the transient-sustained dynamics in higher visual regions are not stimulus-dependent, but rather reflect intrinsically slower temporal integration.
Figure 3 (caption excerpt). Schematic depiction of the delayed divisive normalization (DN) model, originally proposed by Heeger (1992, 1993) and first presented in this form in Zhou et al. (2019). The model is defined by a linear-nonlinear-gain control structure, taking as input a stimulus time course and producing a predicted neural response as output. The linear computation consists of a convolution with an impulse response function (IRF), h1, parameterized as a gamma function with τ1 as a free parameter. The nonlinear computation consists of rectification, exponentiation with a free parameter, n, and division by a semi-saturation constant σ, which is summed with a delayed copy of the input that is also rectified and exponentiated.

Stronger RS and a slower recovery in higher visual areas for repeated stimuli
Viewing repeated stimuli results in repetition suppression in all visual areas (Fig. 5A, top panel), whereby responses to the second stimulus are most suppressed at shortest ISIs and show a gradual recovery as ISI increases. Across conditions, there also appear to be differences in RS between lower and higher visual areas. However, quantifying differences in the degree of recovery in these response time courses is not straightforward: the response to the first stimulus continues after its offset (see Fig. 4A), and as demonstrated above, this continued response is longer in higher visual regions (Fig. 4C). This problem is especially evident for short ISIs: at 17 ms ISI, response amplitudes measured after onset of the second stimulus are higher in LOTC and VOTC than in V1-V3 (Fig. 5A), but this could result from weaker RS of the second stimulus, the continued neural responses to the first stimulus, or a combination.
To disentangle these responses, we estimated the response to the second stimulus in isolation (see Materials and Methods, Summary metrics) while correcting for the ongoing activity caused by the first stimulus (Fig. 5B, Neural data). This shows that recovery from RS qualitatively differs between visual areas: V1-V3 shows less suppression and recovers faster than VOTC and LOTC. These differences between areas are partly due to differences in the peak magnitude of the response to the second stimulus, as well as the faster decay after the peak for higher visual areas (Fig. 5B). We quantified the level of RS in these responses by computing their Area Under the Curve (AUC) divided by the AUC of the first stimulus response (see Supp. [...]). We also fitted neural responses with the A+S model and find that the A+S model poorly aligns with the neural data, predicting an overall higher degree of RS with area-dependent differences for short as opposed to long ISIs (Supp. Fig. 1D).
Figure caption (excerpt): Recovery from adaptation gradually increases as the ISI becomes longer, and the rate of recovery is higher for V1-V3 compared to VOTC and LOTC in both the neural data and the DN model as a result of a higher peak magnitude to the second stimulus and a less strong decay after the peak (black arrows). This figure can be reproduced by mkFigure5 6.py.
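The AUC-based measure of recovery can be sketched as follows. The correction for ongoing activity is implemented here by subtracting the response to a single stimulus from the response to the stimulus pair; this subtraction, the integration window, and the function name `recovery_from_rs` are assumptions made for illustration and may not match the procedure described in Materials and Methods.

```python
import numpy as np

def recovery_from_rs(pair_resp, single_resp, t, onset2, window=0.5):
    """Ratio of the AUC of the isolated second response to the AUC of the first.

    pair_resp   : response to two stimuli separated by an ISI
    single_resp : response to one stimulus alone (first stimulus at t = 0)
    onset2      : onset time of the second stimulus, in seconds
    """
    # Remove the continuing response to the first stimulus (assumed correction).
    second = pair_resp - single_resp
    dt = t[1] - t[0]
    w1 = (t >= 0) & (t < window)
    w2 = (t >= onset2) & (t < onset2 + window)
    auc_first = single_resp[w1].sum() * dt
    auc_second = second[w2].sum() * dt
    return auc_second / auc_first

# Toy data: the second response is a copy of the first, reduced to 60%.
t = np.arange(-0.1, 1.5, 0.001)
bump = lambda t0: np.exp(-((t - t0 - 0.1) / 0.03) ** 2)
single = bump(0.0)
pair = single + 0.6 * bump(0.4)
ratio = recovery_from_rs(pair, single, t, onset2=0.4)
```

A ratio of 1 would indicate full recovery, and smaller values indicate stronger repetition suppression.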
Given prior reports of stimulus-specific differences in RS depending on a neural population's stimulus selectivity (Sawamura et al., 2006), we also quantified RS separately for preferred and non-preferred stimuli in all areas. In both neural responses and model predictions, the differences in RS between areas are most pronounced for preferred stimuli (Supp. Fig. 4), and comparatively less strong for non-preferred stimuli (Supp. Fig. 5). This suggests that the repetition suppression effects in higher visual areas are partly stimulus-dependent.

Differences in adaptation reflect slower normalization in higher visual areas
We showed that lower and higher visual areas show different adaptation patterns, as evident from transient-sustained dynamics and recovery from repetition suppression, which are both accurately captured by the DN model. To better understand the neural computations underlying these response profiles, we examined the temporal dynamics of two components of the DN model: the input drive (i.e., the numerator) and the normalization pool (i.e., the denominator).
To explain differences in transient-sustained dynamics, we considered the model prediction for the longest duration (533 ms, Fig. 7A), because it has the most pronounced sustained response difference across areas. The DN model captures transient-sustained dynamics in neural responses because the input drive dominates the prediction early in the response, resulting in a transient, followed by the normalization pool, resulting in a response decay to sustained levels. The model suggests that lower visual areas exhibit relatively fast dynamics in both the numerator and the denominator, resulting in a fast initial rise and a fast subsequent decay of the response. These dynamics occur at a lower pace in higher visual areas, where both the input drive and normalization pool rise more slowly. This results in broader response shapes, which are most pronounced for LOTC and to a lesser degree for VOTC.
To explain differences in recovery from RS, we again examined the longest temporal condition (ISI of 533 ms, Fig. 7B), because differences in adaptation between lower and higher visual areas were most distinct at this ISI. The DN model captures suppression of repeated stimuli through the dynamics of the normalization pool. After the offset of the first stimulus, the normalization pool decays and approaches the minimum possible value of the denominator, which is set by σ^n. If the normalization pool has not reached this minimum value at the start of the second stimulus, suppression occurs due to the lingering normalization from the first stimulus. Thus, the difference in RS between visual areas is a result of slower dynamics of the normalization pool in VOTC and LOTC, leading to more lingering normalization at the start of the second stimulus presentation and consequently stronger RS and slower recovery.
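The role of lingering normalization can be illustrated with a small simulation. The sketch below assumes a gamma-shaped impulse response and an exponential low-pass filter for the delayed pool, and it uses arbitrary "fast" and "slow" time constants and an arbitrary 100 ms stimulus duration rather than fitted or experimental values; the function name `lingering_normalization` is hypothetical.

```python
import numpy as np

def lingering_normalization(tau1, tau2, isi, n=2.0, sigma=0.1, dur=0.1, dt=0.001):
    """Denominator at the onset of the second stimulus, relative to its
    minimum value sigma**n (1.0 means no lingering normalization)."""
    onset1 = int(round(0.1 / dt))
    offset1 = onset1 + int(round(dur / dt))
    onset2 = offset1 + int(round(isi / dt))
    stim = np.zeros(onset2 + int(round(1.0 / dt)))
    stim[onset1:offset1] = 1.0                       # first stimulus
    stim[onset2:onset2 + (offset1 - onset1)] = 1.0   # second stimulus
    t = np.arange(0, 1.0, dt)
    h1 = t * np.exp(-t / tau1)   # gamma-shaped impulse response
    h1 /= h1.sum()
    h2 = np.exp(-t / tau2)       # assumed exponential low-pass filter
    h2 /= h2.sum()
    drive = np.convolve(stim, h1)[:len(stim)]
    pool = np.convolve(drive, h2)[:len(stim)]
    denominator = sigma ** n + np.abs(pool) ** n
    return denominator[onset2] / sigma ** n

# Arbitrary illustrative time constants, not fitted values.
fast = lingering_normalization(tau1=0.03, tau2=0.1, isi=0.533)
slow = lingering_normalization(tau1=0.1, tau2=0.3, isi=0.533)
```

With these placeholder values, the fast parameter set has almost returned to the minimum of the denominator by the time the second stimulus starts, whereas the slow parameter set has not, so the second response would still be suppressed.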
The differences in temporal adaptation across areas are also reflected in the fitted parameter values (Fig. 7C). Both τ1 (time constant of the IRF) and n (exponentiation) are higher in VOTC and LOTC, reflecting the slower dynamics of the input drive and normalization pool, which give rise to the area-dependent differences in transient-sustained dynamics and RS; τ1 controls the width of the transient, reflected by the time to peak, whereas n controls the decay of the transient response. Thus, these parameters affect the full-width at half maximum and degree of recovery from RS for single and repeated stimuli, respectively. However, τ2 and σ also affect the width and decay of the transient, and fitted parameters (to some degree) trade off. Therefore, parameter differences across the visual hierarchy should be interpreted with caution. Nonetheless, our results suggest that adaptation differences between lower and higher visual areas could arise from underlying differences in temporal normalization dynamics.

Stronger RS for preferred image categories in category-selective electrodes
The results indicated that transient-sustained dynamics are slower in higher than lower visual areas regardless of stimulus preference, whilst repetition suppression differences across areas are most pronounced for preferred stimuli (Supp. Fig. 4). To further investigate how adaptation is influenced by stimulus preference, we directly compared responses within a subset of electrodes in higher visual regions that exhibit strong category-selectivity.
We identified a subset of category-selective electrodes in LOTC and VOTC by calculating a sensitivity measure (d') on the response per stimulus category averaged across all stimulus durations (see Materials and Methods; for electrode positions and counts see Fig. 2C and Table 4, respectively). We then calculated average broadband responses separately for the preferred and non-preferred categories for each ISI and calculated recovery from RS as before. RS occurs for both preferred and non-preferred stimuli (Fig. 8A, top panel), but more strongly for preferred stimuli (Fig. 8B, Neural data). Model simulations show that the DN model also captures these differences, including the overall shape of the neural time courses (Fig. 8A, bottom panel) and stronger RS for preferred stimuli (Fig. 8B, DN model).
Quantifying the recovery from RS for the different stimulus types shows that the stronger RS for preferred image categories is most pronounced for longer ISIs (Fig. 8C, left). This is accurately captured by the model, although it slightly overestimates the degree of recovery for non-preferred stimuli for shorter ISIs (Fig. 8C, right). Preferred stimuli show slower long-term recovery of RS (Fig. 8D, circle markers) in both the neural data and the DN model (Fig. 8D, triangle markers). These differences were robust in both data and model and became even more pronounced when increasing the threshold for category-selectivity selection (see Supp. Fig. 7 and Supp. Fig. 8 for a threshold of d' of 0.75 and 1, respectively, resulting in fewer selected electrodes).
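As an illustration of the sensitivity measure used above to select category-selective electrodes, the sketch below computes d' for one category against all others using a common textbook definition (difference in means divided by the root of the averaged variances). Whether this matches the exact formula in Materials and Methods is an assumption, and the function name `d_prime` and the toy data are hypothetical.

```python
import numpy as np

def d_prime(responses, labels, category):
    """Sensitivity of an electrode for one category versus all others."""
    responses = np.asarray(responses, dtype=float)
    labels = np.asarray(labels)
    pref = responses[labels == category]
    rest = responses[labels != category]
    # Root of the mean of the two sample variances (assumed definition).
    pooled_sd = np.sqrt(0.5 * (pref.var(ddof=1) + rest.var(ddof=1)))
    return (pref.mean() - rest.mean()) / pooled_sd

# Toy responses of one electrode to three categories (arbitrary units).
responses = [3, 4, 5, 1, 2, 3, 1, 2, 3]
labels = ["faces"] * 3 + ["houses"] * 3 + ["scenes"] * 3
d_faces = d_prime(responses, labels, "faces")
is_selective = d_faces > 0.75   # one of the thresholds mentioned in the text
```

Raising the threshold keeps only electrodes with a larger separation between the preferred category and the rest, which is why fewer electrodes remain at stricter thresholds.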

Lingering normalization and stronger input drive result in stronger adaptation and slower recovery rate for preferred stimuli
Our results suggest that preferred stimuli elicit stronger RS than non-preferred stimuli in category-selective electrodes in VOTC and LOTC. The DN model explains this from the balance between the two components that make up the model denominator (Fig. 9). As before, RS for both stimulus types results from lingering normalization at the start of the second stimulus. For preferred stimuli, the input drive is strong, and therefore the lingering normalization amply surpasses the value of the semi-saturation constant, σ^n. Because dynamics are slow, the lingering normalization is (relatively) high at the start of the second response, resulting in strong RS. For non-preferred stimuli, the lingering normalization is much smaller in comparison to σ^n, due to the weaker input drive. While there is still lingering activity at the start of the second stimulus, σ^n comprises a much larger part of the denominator, marginalizing the effect of the lingering normalization. Since σ^n is the same for the first and second stimulus, less RS is observed.
In short, the differences in adaptation between preferred and non-preferred stimuli in category-selective electrodes can be explained by the balance between the normalization pool components, which depends on the initial input drive, in combination with the slower dynamics in higher visual areas.
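The balance described above can be made concrete with a toy calculation. It assumes that the category scaling multiplies the input, so that the lingering pool scales linearly with it before exponentiation; the values of `pool_unit`, `n`, `sigma` and the two scaling factors are arbitrary placeholders, not fitted parameters, and `lingering_fraction` is a hypothetical helper.

```python
def lingering_fraction(scale, pool_unit=0.3, n=2.0, sigma=0.1):
    """Fraction of the denominator at second-stimulus onset that is due to
    lingering normalization rather than the constant sigma**n."""
    lingering = (scale * pool_unit) ** n
    return lingering / (sigma ** n + lingering)

preferred = lingering_fraction(scale=1.0)      # strong input drive
non_preferred = lingering_fraction(scale=0.3)  # weak input drive
```

With these placeholder numbers, lingering normalization makes up 90% of the denominator for the preferred stimulus but less than half of it for the non-preferred stimulus, so the constant term dominates in the latter case and less suppression is expected.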

Discussion
Our aim was to examine how short-term neural adaptation differs across human visual cortex and to pinpoint the underlying neural computations using a model of delayed divisive normalization. We demonstrate that, compared to V1-V3, higher visual areas have more prolonged responses for single stimuli and stronger repetition suppression for repeated stimuli. The DN model accurately predicts the neural response time courses and their adaptation profiles in both lower and higher visual areas by means of a category-dependent scaling on the input stimulus time course. The model fits show that differences in temporal adaptation across areas can be explained by slower dynamics of both the input drive and normalization pool for higher visual regions. We additionally find that neural responses in category-selective electrodes exhibit stronger RS for preferred than non-preferred stimuli, which the DN model explains from the balance of the normalization pool components in combination with slower dynamics in these regions.
We believe this study offers several novel insights. First, we demonstrate clear differences in temporal dynamics along the visual hierarchy when using naturalistic stimuli that drive both low- and high-level visual regions well. Earlier studies which examined temporal dynamics across lower and higher areas used stimuli that mostly drive early visual regions. Our results show that stimulus effectiveness affects short-term adaptation in various ways and should therefore be carefully considered when measuring and modeling adaptation across the visual hierarchy. Second, while previous work used single-cell recordings to study stimulus-specific effects on temporal dynamics of neural responses across the visual cortex, this study is to our knowledge the first to demonstrate and model the neural computations possibly underlying such effects in human data with both high spatial and temporal resolution.

Slower time-scales of neural processing in higher visual areas
We observed prolonged responses with slower transient-sustained dynamics in higher visual areas. This is consistent with the idea that time scales of temporal processing become longer when ascending the visual hierarchy, as suggested based on brain responses to both single (Groen et al., 2022; Zhou et al., 2018) and repeated stimulus presentations (Fritsche et al., 2020; Weiner et al., 2010; Zhou et al., 2018, 2019), as well as the pattern of responses to intact and scrambled natural movies (Hasson et al., 2008; Honey et al., 2012). Increasing temporal windows across the cortical hierarchy may have several computational benefits. First, Heeger (2017) proposed that such a hierarchy is useful for prediction over multiple timescales. Second, temporal windows may be tuned to the temporal regularities of the input features, as demonstrated in both theoretical (Chaudhuri et al., 2015) and empirical work (Hasson et al., 2008; Honey et al., 2012). Different types of image features are likely to exhibit different temporal regularities in natural viewing conditions: low-level features (e.g., orientation, edges, and contrast) change each time an observer moves their eyes, thereby benefiting from shorter processing windows, while high-level features (e.g., holistic representations of faces and objects) are likely to be stable over longer viewing durations, and areas tuned to that information may therefore be tuned to longer timescales. In addition to computational benefits, the ability to integrate and hold information across a variety of time scales is also critical for cognition and flexible behaviour (Soltani et al., 2021).
In addition to a hierarchy within unimodal areas (e.g., visual or auditory cortex), there may also be a hierarchy of time scales in multi-modal processing, with shorter time windows in unimodal regions and longer time windows in association cortex (e.g., lateral prefrontal cortex or the default mode network), which has been observed across several acquisition modalities, species and task states (e.g., Lerner et al. 2011). It is believed that this hierarchy of timescales plays a key role in both integrating and segregating sensory information across time. Regions with shorter timescales may favour temporal segregation, reflected by shorter neural responses, whereas higher areas are involved in temporal integration, reflected by longer neural responses. This balance of temporal integration and segregation may enable the segmentation of continuous inputs (for a review see Wolff et al. 2022), benefiting perception and cognition. Whether similar distinctions can be made between lower and higher regions within unimodal areas in visual cortex, and how this contributes to perception, warrants future investigation.

Slower recovery from RS in higher visual areas
We found differences in the overall degree of repetition suppression and recovery rate from RS between lower and higher visual areas. These results differ from a prior study (Groen et al., 2022), which found that the degree of RS and the recovery rate from RS did not differ between early visual and lateral-occipital retinotopic regions, ranging from V1 to IPS. Here we find stronger RS as well as slower recovery in VOTC and LOTC compared to V1-V3. We attribute the difference between studies to the difference in stimuli: simple contrast patterns in Groen et al. (2022) versus naturalistic stimuli in this study. Simple contrast patterns strongly drive responses in lower visual areas (V1-hV4, Kay et al. 2013; Zhou et al. 2018), but not higher areas that are selective for complex, naturalistic stimuli (Sayres and Grill-Spector, 2008; Arcaro et al., 2009; Silson et al., 2016). The reduced responses in higher areas to simple contrast patterns could have made it more difficult to accurately measure RS. In addition to eliciting weaker responses, sub-optimal stimuli may have also led to less RS in higher areas, making the adaptation patterns more similar to early areas. This explanation is supported by our current observations of similar RS between areas for non-preferred stimuli (Supp. Fig. 5), as well as less RS for non-preferred stimuli within category-selective electrodes (Fig. 8). Furthermore, stimulus type influences not only the magnitude of neural responses but also their temporal stability (Marks and Goard, 2021) as well as their oscillatory components (Hermes et al., 2015), which could also affect RS patterns.
An fMRI study on short-term adaptation by Fritsche et al. (2020) found stronger RS for higher visual regions, consistent with our findings, but did not observe differences in recovery rate between visual areas despite using complex stimuli, differing from our findings. One reason for the discrepancy with our findings could be the way (recovery from) adaptation was computed. As the sluggish nature of the BOLD signal makes it difficult to estimate independent fMRI responses to stimuli presented close in time, Fritsche et al. (2020) used stimulus pairs consisting of either repeated, identical stimuli, or two distinct stimuli, and quantified RS as the difference in the maximal response to identical versus non-identical stimulus pairs. In contrast, we measured iEEG responses only to repeated presentations of the same image, and measured recovery from RS as the ratio of the response AUC for the second stimulus presentation to that for the first.

Differences in temporal dynamics between ventral and lateral occipital cortex
We separated our electrodes into two higher-level groups covering ventral and lateral occipito-temporal cortex, respectively. Previous work has shown differences in the temporal dynamics between these regions using an encoding framework where neural responses were modelled in separate sustained and transient channels (Stigliani et al., 2019). VOTC responded to both transient and sustained visual inputs, while LOTC predominantly responded to visual transients. The authors suggested that VOTC regions are mainly involved in processing of static inputs while LOTC regions process dynamic inputs. In contrast, our data show a more sustained response in LOTC compared to VOTC (Fig. 4C). These sustained responses could indicate that LOTC accumulates information over relatively longer time periods, in line with work suggesting that LOTC regions may also be involved in more stable information processing (Honey et al., 2012).
While Stigliani et al. (2019) showed that VOTC and LOTC both exhibit transient responses, they also observed differences in the dynamics of transient processing across the two visual streams. In LOTC, the onset and offset of the visual stimulus elicited equal increases in neural responses, suggesting that these areas process information regarding moment-to-moment changes in the visual input. In VOTC, the transient responses for the onset and offset of the stimulus were surprisingly asymmetric and were mostly dominated by stimulus offset. The authors hypothesized that this reflected memory traces maintained by these regions after the stimulus is no longer visible. In our data, however, we did not observe strong stimulus offset responses. A similar lack of offset responses was observed in earlier ECoG studies (Groen et al., 2022; Zhou et al., 2019). Zhou et al. (2019) furthermore noted that offset responses were more pronounced for electrodes with peripherally tuned spatial receptive fields (beyond 5 degrees eccentricity). The stimuli used in the current study extended to 8.5 degrees eccentricity; therefore, the lack of offset responses may be related to the spatial coverage of the stimulus. However, other explanations are possible, such as differences in data type (fMRI vs. ECoG), brain areas sampled, or experimental design. In conclusion, further research is needed to elucidate the differences in temporal dynamics between higher-level regions and how they relate to the timescales of the visual input.

Stimulus-specific differences in temporal dynamics in category-selective areas
We found stronger RS for preferred than non-preferred stimuli in category-selective electrodes, consistent with findings from fMRI (Weiner et al., 2010), single-cell recordings (Sawamura et al., 2006; Williams and Olson, 2022) and ECoG (Rangarajan et al., 2020). The DN model shows that the stimulus-specific adaptation differences could result from the balance in normalization pool components in combination with slower normalization dynamics in these areas. The strong input drive for preferred stimuli causes more lingering normalization so that when the second stimulus arrives, there is a reduced response. To model effects of stimulus preference on neural response dynamics, we augmented the DN model by incorporating category-dependent scaling on the stimulus time course. Model fits showed that adding a category-based scaling factor results in better predictions in all visual regions, including V1-V3, which is not typically considered to exhibit category-selectivity. We attribute the scaling benefit in these early visual regions to co-variation of low-level feature differences with the categories in the dataset. Specifically, one of the six categories consisted of scrambled stimuli which had many edge elements, and one of scene stimuli which had a slightly larger retinotopic extent than the other classes. These classes likely are more optimal stimuli for lower visual areas.
While our data revealed stimulus-specific effects on RS for repeating stimuli, we observed weak to no effect of stimulus preference on transient-sustained dynamics during single-stimulus trials. This is in line with previous work on non-human primates using single-cell recordings, which predominantly reports differences in temporal dynamics in the context of RS (Li et al., 1993; Sawamura et al., 2006; McMahon and Olson, 2007; De Baene and Vogels, 2010; Kaliukhovich and Vogels, 2016). While some studies also present stimuli in isolation, they do not further examine adaptation-related differences based on stimulus preference. For example, Sawamura et al. (2006) showed, similar to our experimental paradigm, stimulus sequences with either identical or varying images that elicited weaker or stronger responses depending on stimulus preference. While the authors do make comparisons between repeated stimuli and single stimuli, no analysis is conducted regarding the dynamics of preferred and non-preferred stimuli in isolation. The lack of reports of stimulus-dependent effects on transient-sustained dynamics does not imply that such effects are absent; further research should establish whether stimulus-specific effects on temporal dynamics occur for briefly presented stimuli of varying durations.

Limitations and future work
First, since electrode positioning was determined based on clinical constraints, the number of electrodes localized to individual retinotopic maps was limited. Therefore, our comparisons focused on coarse groupings of the visual areas: early (V1-V3) versus ventral (VOTC) versus lateral (LOTC) maps. For fine-grained comparisons between visual areas across the cortical hierarchy (e.g., V1 vs. V2), different methods are needed. Second, the current model form does not explicitly represent the computations in each stage of processing, and so the model is agnostic to the origin of the divisive signals. Third, the behavioral task participants performed was orthogonal to the temporal stimulus manipulations. This design was chosen deliberately to reduce variability in top-down signals from trial to trial and between participants. Nonetheless, neural adaptation is important for behavior such as priming (Cave, 1997; McMahon and Olson, 2007), and the link between them cannot be directly studied without a task that is relevant to the stimulus.
Several approaches could be undertaken to tackle some of these limitations, including collecting and fitting behavioral measurements of adaptation with the DN model, or measuring transient-sustained dynamics and RS in neural data from animals to allow a more systematic comparison across the visual hierarchy. Another approach is to study adaptation in Artificial Neural Networks (ANNs). ANNs have recently come forward as a powerful new tool to model sensory processing (Yamins and DiCarlo, 2016; Richards et al., 2019; Doerig et al., 2023). These models are image-computable, are trained to process naturalistic stimuli, consist of units whose activations are inspired by biological neuronal signals, and output predictions that can be compared with human behavior. Moreover, these models process inputs sequentially: activations from earlier layers are fed to later layers, comparable to the input-output transformations from lower- to higher-level areas in neural processing. Future studies could examine the link between adaptation phenomena and behavior by implementing biologically plausible adaptation in ANNs, including divisive normalization. Such paradigms could aid in better understanding how different adaptation mechanisms may benefit perception.
Lastly, we note that the DN model as presented in this study is not a circuit-level model, and the predicted neural responses can be the result of a variety of biophysical and cellular mechanisms. Future studies should perform a more in-depth examination, using other types of data such as single-cell recordings and alternative models (e.g., van Rossum et al., 2002), to identify the neural circuitry that could give rise to the observed normalization dynamics across visual areas and stimuli.

Materials and Methods
The methods for collecting and preprocessing the ECoG data have been recently described by Groen et al. (2022). For convenience, the following sections were duplicated with modifications reflecting differences from the previous method: ECoG recordings, Data preprocessing and Electrode localization.

Subjects
Intracranial EEG data were collected from four participants who were implanted with subdural electrodes for clinical purposes at the New York University Grossman School of Medicine (New York, USA). The study was approved by the NYU Grossman School of Medicine IRB, and prior to the experiment participants gave informed consent. All participants had normal or corrected-to-normal vision and were implanted with standard clinical strip, grid and depth electrodes. One participant was additionally implanted with a high-density research grid (HDgrid), for which separate consent was obtained. Detailed information about each participant and their implantation is provided in Table 1 and Supplementary Figure 9.

iEEG recordings
Recordings were made using a Neuroworks Quantum Amplifier (Natus Biomedical) at a sampling rate of 2048 Hz, band-pass filtered at 0.01-682.67 Hz, and then downsampled to 512 Hz. An audio trigger cable, connecting the laptop and the iEEG amplifier, was used to record stimulus onsets together with the iEEG data. Behavioral responses were recorded by an external number pad that was connected to the laptop through a USB port. Participants initiated the start of the next run by pushing a designated response button on the number pad.

Stimuli
Stimuli consisted of natural color images presented on a gray background belonging to one of the following six categories: bodies, buildings, faces, objects, scenes and scrambled (Fig. 2A). Images (568 x 568 pixels) were taken from a set of stimuli used in prior fMRI studies to localize functional category-selective brain regions (Silson et al., 2019, 2022). In total the dataset consisted of 288 images with 48 images per category. Bodies consisted of pictures of hands (24 images) and feet (24 images) taken from a variety of viewpoints. Buildings consisted of a large variety of human-built structures (including houses, apartment buildings, arches, barns, mills, towers, skyscrapers, etc.). Face images were taken from frontal viewpoints, were balanced for gender (24 male, 24 female), and included a variety of races and hairstyles. Objects consisted of both man-made items (24 images, e.g., household items, vehicles, musical instruments, electronics and clothing) and natural items (24 images, e.g., fruits/vegetables, nuts, rocks, flowers, logs, leaves, and plants). Scene images were equally divided between indoor, outdoor man-made and outdoor natural scenes (16 images each). Faces, bodies, buildings and objects were cropped out and placed on gray-scale backgrounds. Scrambled images consisted of an assembly of square image patches created by taking the cropped object images, randomly swapping 48 × 48 pixel 'blocks' across images, and placing them on a gray-scale background. Stimuli were shown on a 15 inch MacBook Pro laptop with a screen resolution of 1280 x 800 pixels (33 cm x 21 cm), which was placed 50 cm from the participant's eyes (at chest level), resulting in stimuli subtending 8.5 degrees of visual angle. Stimuli were presented at a frame rate of 60 Hz using Psychtoolbox-3 (Brainard, 1997; Pelli, 1997; Kleiner et al., 2007).

Experimental procedure
Participants viewed two different types of trials (Fig. 2B). Duration trials showed a single stimulus for one of six durations, defined as powers of two times the monitor dwell time (1/60 s): 17, 33, 67, 134, 267 and 533 ms. Repetition trials contained a repeated presentation of the same image with fixed duration (134 ms) but variable inter-stimulus interval (ISI), ranging between 17-533 ms (same temporal step sizes as the duration trials). These temporal parameters were identical to previous studies (Zhou et al., 2018, 2019; Groen et al., 2022), but here naturalistic color images were presented instead of gray-scale noise patterns.
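As a quick check on these temporal conditions, the sketch below derives the six values from the frame time of a 60 Hz display. The variable names are illustrative and are not taken from the study's code; the printed values are unrounded, whereas the text reports them rounded to whole milliseconds.

```python
# Frame duration of a 60 Hz display, in milliseconds.
FRAME_MS = 1000.0 / 60.0

# Six temporal conditions: 1, 2, 4, 8, 16 and 32 frames (powers of two).
n_frames = [2 ** k for k in range(6)]
durations_ms = [n * FRAME_MS for n in n_frames]

for n, d in zip(n_frames, durations_ms):
    print(f"{n:2d} frames -> {d:6.1f} ms")
```

The same six values serve as stimulus durations in the duration trials and as ISIs in the repetition trials.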
Each participant underwent 2-6 runs of 144 trials each, including 72 duration trials and 72 repetition trials, which each contained 12 stimuli from each of the six stimulus categories. Trial order was randomized, with an inter-trial interval (ITI) randomly chosen from a uniform distribution between 1.25-1.75 s. Participants were instructed to fixate on a cross at the center of the screen and press a button when it changed from black to white or vice versa. Fixation cross changes occurred independently of the stimulus sequence at randomly chosen intervals of 1-5 s. In between runs participants were allowed a short break. Stimuli were divided into two sets, one for even and one for odd runs, with each set containing 144 of the 288 stimuli. The number of odd/even run pairs determined the number of repetitions for a specific trial type. Detailed information about the amount of data collected for each participant is provided in Table 1. Three participants (p12-14) additionally viewed repetition trials in which the second image differed from the first (either a different exemplar from the same category or a different category). These trials are included in the dataset (see Data Availability) but are not further analyzed in this study.

Data preprocessing
Data were read into MATLAB 2020b using the Fieldtrip Toolbox (Oostenveld et al., 2011) and preprocessed with custom scripts available at https://github.com/WinawerLab/ECoG_utils. The raw voltage time series from each electrode, obtained during each recording session, were inspected for spiking, drifts or other artifacts. Electrodes were excluded from analysis if the signal exhibited artifacts or epileptic activity, determined based on visual inspection of the raw data traces and spectral profiles, or at the clinician's indication. Next, data were divided into individual runs and formatted according to the iEEG-BIDS format (Holdgraf et al., 2019). For each run, the data were re-referenced to the common average computed separately for each electrode group (e.g., grid or strip electrodes, see bidsEcogRereference.m) and a time-varying broadband signal was computed for each run (see bidsEcogBroadband.m). First, the voltage traces were band-pass filtered by applying a Butterworth filter (passband ripples < 3 dB, stopband attenuation 60 dB) for 10 Hz-wide bands ranging between 50-200 Hz. Bands that included frequencies expected to carry external noise were excluded (60, 120 and 180 Hz). Next, the power envelope of each band-pass filtered time course was calculated as the squared magnitude of the analytic signal. The resulting envelopes were then averaged across bands using the geometric mean (see ecog_extractBroadband.m), ensuring that the resulting average is not biased towards the lower frequencies. The re-referenced voltage and broadband traces for each run were written to BIDS derivatives directories.
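The role of the geometric mean can be illustrated with a toy calculation. The sketch below is not the preprocessing code itself: the band centers and the 1/f² baseline power are hypothetical values chosen only to mimic the fact that power falls off with frequency. It compares how much the band average changes when a single band doubles in power.

```python
import numpy as np

# Hypothetical band centers (Hz) with a 1/f^2 baseline power per band.
centers = np.array([75.0, 95.0, 135.0, 155.0, 195.0])
baseline = 1.0 / centers ** 2


def arithmetic_mean(power):
    return power.mean()


def geometric_mean(power):
    # exp(mean(log(x))) is the geometric mean for positive values.
    return np.exp(np.log(power).mean())


def relative_change(mean_fn, band_index):
    """Fractional change of the band average when one band doubles in power."""
    evoked = baseline.copy()
    evoked[band_index] *= 2.0
    return mean_fn(evoked) / mean_fn(baseline) - 1.0


low_arith = relative_change(arithmetic_mean, 0)    # lowest band doubles
high_arith = relative_change(arithmetic_mean, -1)  # highest band doubles
low_geo = relative_change(geometric_mean, 0)
high_geo = relative_change(geometric_mean, -1)

print(f"arithmetic mean: low band {low_arith:.3f}, high band {high_arith:.3f}")
print(f"geometric mean:  low band {low_geo:.3f}, high band {high_geo:.3f}")
```

With the arithmetic mean, a doubling in the lowest band moves the average far more than the same doubling in the highest band; with the geometric mean, every band contributes equally to relative changes.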

Electrode localization
Pre- and post-implantation structural MRI images were used to localize intracranial electrode arrays (Yang et al., 2012). Electrode coordinates were computed in native T1 space and visualized on pial surface reconstructions of the T1 scans, generated using FreeSurfer (Fischl, 2012). Boundaries of visual maps were generated for each individual participant based on the preoperative anatomical MRI scan by aligning the surface topology with two atlases of retinotopic organization: an anatomically defined atlas (Benson et al., 2014; Benson and Winawer, 2018) and a probabilistic atlas derived from retinotopic fMRI mapping (Wang et al., 2015) (Supp. Fig. 9A). Using the alignment of the participant's cortical surface to the fsaverage subject retrieved from FreeSurfer, atlas labels defined on the fsaverage were interpolated onto the cortical surface via nearest-neighbor interpolation. Electrodes were then matched to both the anatomical and the probabilistic atlas using the following procedure (bidsEcogMatchElectrodesToAtlas.m): For each electrode, the distance to all the nodes in the FreeSurfer pial surface mesh was calculated and the node with the smallest distance was determined to be the matching node. The matching node was then used to assign the electrode to one of the following visual areas in the anatomical atlas (hereafter referred to as the Benson atlas): V1, V2, V3, hV4, VO1, VO2, LO1, LO2, TO1, TO2, V3a, V3b, or none; and to assign it a probability of belonging to each of the following visual areas in the probabilistic atlas (hereafter referred to as the Wang atlas): V1v, V1d, V2v, V2d, V3v, V3d, hV4, VO1, VO2, PHC1, PHC2, TO2, TO1, LO2, LO1, V3b, V3a, IPS0, IPS1, IPS2, IPS3, IPS4, IPS5, SPL1, FEF, or none. After localization, all electrodes were assigned to one of three visual electrode groups: early (V1-V3), ventral-occipitotemporal (VOTC) and lateral-occipitotemporal (LOTC), according to the following rules (Table 2): electrodes were assigned to V1-V3 if located in V1, V2, V3 according to the Benson atlas or if located in V1v, V1d, V2v, V2d, V3v, V3d according to the Wang atlas. Electrodes were assigned to VOTC if located in hV4, VO1 or VO2 according to either the Benson or Wang atlas. Electrodes were assigned to LOTC if they were located in any of the remaining retinotopic atlas areas (with the exception of SPL1 and FEF). Electrodes that showed robust visual responses according to the inclusion criteria (see Data selection) but were not matched to any retinotopic atlas region (i.e., that obtained the label 'none' from the retinotopic atlas matching procedure described above) were manually assigned to one of the three groups based on visual inspection of their anatomical location and proximity to already-assigned electrodes (e.g., being located on the same electrode strip extending across the lateral-occipital surface, or penetrating the same cortical region as nearby depth electrodes assigned to V1-V3). Detailed information about the subject-wise electrode assignment is provided in Table 3. A schematic layout of the electrodes assigned to the visual regions pooled across all four participants is shown in Figure 2C.
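The nearest-node matching and the grouping rules for the Benson atlas labels can be sketched as follows. This is a minimal illustration and not the bidsEcogMatchElectrodesToAtlas.m implementation: the function names, the toy coordinates and the four-node mesh are hypothetical, and the probabilistic (Wang) atlas and the manual assignment of unlabeled electrodes are left out.

```python
import numpy as np

# Grouping rules for Benson atlas labels, following the text.
EARLY = {"V1", "V2", "V3"}
VENTRAL = {"hV4", "VO1", "VO2"}
LATERAL = {"LO1", "LO2", "TO1", "TO2", "V3a", "V3b"}


def match_electrodes(elec_xyz, node_xyz, node_labels):
    """Give each electrode the atlas label of its nearest surface node."""
    # Pairwise Euclidean distances, shape (n_electrodes, n_nodes).
    diff = elec_xyz[:, None, :] - node_xyz[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    nearest = dist.argmin(axis=1)
    return [node_labels[i] for i in nearest]


def group_for(label):
    """Map an atlas label to one of the three electrode groups."""
    if label in EARLY:
        return "V1-V3"
    if label in VENTRAL:
        return "VOTC"
    if label in LATERAL:
        return "LOTC"
    return "none"  # would require manual assignment


# Toy mesh with four labeled nodes and two electrodes (coordinates in mm).
node_xyz = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0],
                     [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]])
node_labels = ["V1", "LO1", "hV4", "none"]
elec_xyz = np.array([[1.0, 0.5, 0.0], [0.5, 9.0, 1.0]])

labels = match_electrodes(elec_xyz, node_xyz, node_labels)
groups = [group_for(label) for label in labels]
print(labels, groups)
```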

Data selection
Python scripts used for data selection can be found at https://github.com/ABra1993/tAdaptation_ECoG.git. Two consecutive data selection steps were performed: 1) trial selection and 2) selection of visually responsive and category-selective electrodes.

Trial selection
Trial selection was performed on the broadband time courses for each electrode separately (analysis_selectEpochs.py). We first computed the maximum (peak) response within each trial, after which the standard deviation (SD) of these maximum values over all trials was computed. Trials were excluded from analysis if the maximum response was > 2 SD. Across participants, on average 3.25% of epochs (min: 1.81%, max: 3.80%) were rejected. Next, broadband time courses were converted to percentage signal change by point-wise dividing and subtracting the average prestimulus baseline (100 to 0 ms prior to stimulus onset) across all epochs within each run (analysis_baselineCorrection.py).
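The baseline conversion can be sketched as below. This is an illustrative reading of the step rather than the analysis script itself: the function name and array layout (trials by samples) are assumptions, and percent signal change is taken to be 100 × (x − b) / b, with b the prestimulus mean pooled over all epochs of a run.

```python
import numpy as np


def percent_signal_change(epochs, times, baseline=(-0.1, 0.0)):
    """Convert broadband epochs (n_trials, n_samples) to percent signal change.

    The baseline is the mean over the prestimulus window and over all epochs.
    """
    in_window = (times >= baseline[0]) & (times < baseline[1])
    base = epochs[:, in_window].mean()
    return 100.0 * (epochs - base) / base


# Toy run: 3 epochs sampled at 512 Hz, from about -100 ms to +300 ms.
fs = 512
times = np.arange(-51, 154) / fs
epochs = np.full((3, times.size), 2.0)   # prestimulus broadband power
epochs[:, times >= 0] = 3.0              # stimulus-evoked power

psc = percent_signal_change(epochs, times)
print(psc[0, 0], psc[0, -1])
```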

Electrode selection
Electrode selection was performed separately for the analyses focusing on comparison of temporal dynamics across visual areas (Fig. 4, 5, 6 and 7) and analyses focusing on comparison across stimuli in category-selective regions (Fig. 8 and 9).
Selection of visually-responsive electrodes. For the comparisons across areas, electrodes were included when showing a robust broadband response based on the following two metrics computed on the duration trials (analysis_selectElectrodes.py): the z-score ($z = \mu/\sigma$), where the mean $\mu$ and standard deviation $\sigma$ are computed across time samples, and the onset latency of the response computed over the average stimulus duration. The onset latency was determined within a time window of 150 time samples (∼300 ms) after stimulus onset. First, responses were z-scored, after which the onset latency was defined as the first time point at which the response passed a threshold (0.85 std) for a duration of at least 60 time samples (∼120 ms). Note that responses were converted to z-scores only during the electrode selection procedure. Response magnitudes expressed as percent signal change vary strongly across electrodes, whereas determining the response onset latency requires only the relative increase after stimulus presentation; for this reason time courses were converted to z-scores. Electrodes were included in the final selection when i) a response onset could be determined and ii) the z-score was > 0.2. Based on the selection methods described above, on average 37% (min: 14%, max: 57%) of the electrodes assigned to a visual group according to either the Benson or Wang atlas were included.
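A minimal sketch of the onset-latency criterion is given below. It is not the code in the selection script: the function name is hypothetical, the input is assumed to start at stimulus onset, and z-scoring is assumed to subtract the mean and divide by the standard deviation across time samples.

```python
import numpy as np


def onset_latency(response, threshold=0.85, min_samples=60, search_samples=150):
    """Return the first sample index that starts a supra-threshold run.

    The run must last at least `min_samples` samples and must begin within
    the first `search_samples` samples; returns None if no onset is found.
    """
    z = (response - response.mean()) / response.std()
    above = z > threshold
    last_start = min(search_samples, len(z) - min_samples + 1)
    for start in range(last_start):
        if above[start:start + min_samples].all():
            return start
    return None


# Toy response at 512 Hz: a 5-sample blip, then a sustained response from sample 40.
response = np.zeros(256)
response[10:15] = 1.0     # too short to count as an onset
response[40:140] = 1.0    # sustained response

blip_only = np.zeros(256)
blip_only[10:15] = 1.0

print(onset_latency(response), onset_latency(blip_only))
```

The duration requirement is what prevents brief fluctuations from being mistaken for a response onset.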
Selection of category-selective electrodes. Electrodes were considered category-selective if they preferentially responded to a given image category over other image categories (excluding scrambled), computed for the duration trials. Category selectivity of an electrode was measured as d':

$$d' = \frac{\bar{X}_{\text{cat}} - \bar{X}_{\text{other}}}{\sqrt{\frac{\sigma^2_{\text{cat}} + \sigma^2_{\text{other}}}{2}}} \tag{1}$$

where $\bar{X}_{\text{cat}}$ and $\sigma_{\text{cat}}$ represent the mean response and standard deviation for one image category over time, while $\bar{X}_{\text{other}}$ and $\sigma_{\text{other}}$ represent the mean response and standard deviation over time for the other image categories. Category-selective electrodes generally exhibit a low z-score for the non-preferred image categories, possibly leading to exclusion from analysis when considering only the z-score computed over all categories (see above). Therefore, for the comparison across stimuli in category-selective regions, electrodes were included if i) an onset latency for the averaged response over all categories for the duration trials was present and ii) d' passed a threshold of 0.5, 0.75 or 1. A range of threshold values was used to verify whether the observed data patterns depend on the chosen threshold, whereby a lower threshold allows inclusion of electrodes that show weaker selectivity for a specific image category (analysis_selectElectrodes.py). Detailed information about the number of category-selective electrodes included is provided in Table 4, and a schematic layout of the category-selective electrodes is shown in Fig. 2C for a d' threshold of 0.5 (see Supp. Fig. 10A and Supp. Fig. 10B for thresholds of 0.75 and 1.0, respectively).
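The selectivity index can be sketched as follows, assuming the common definition of d' as the difference between the two means divided by the square root of the average of the two variances. The function name and the toy time courses are hypothetical.

```python
import numpy as np


def d_prime(resp_cat, resp_other):
    """d' between the response to one category and to the other categories.

    Assumes d' = (mean_cat - mean_other) / sqrt((var_cat + var_other) / 2),
    with means and variances taken over time samples.
    """
    pooled_sd = np.sqrt(0.5 * (resp_cat.var() + resp_other.var()))
    return (resp_cat.mean() - resp_other.mean()) / pooled_sd


# Toy time courses: means 3 and 1, both with unit variance.
resp_cat = np.array([2.0, 4.0])
resp_other = np.array([0.0, 2.0])

value = d_prime(resp_cat, resp_other)
passes = {thr: bool(value > thr) for thr in (0.5, 0.75, 1.0)}
print(value, passes)
```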

Data summary
The data preprocessing, electrode localization and data selection procedures outlined above resulted in 79 electrodes with robust visual responses over either V1-V3 (n = 17), VOTC (n = 15) or LOTC (n = 47). A subset of these electrodes showed selectivity for specific image categories, where the number of category-selective electrodes depended on the threshold of d' (n = 26, n = 12, n = 6 for a threshold of 0.5, 0.75 and 1, respectively). After averaging the time series within trial types, there were 72 response time courses per electrode: 12 temporal conditions (6 durations and 6 ISIs) times 6 image categories. The time series from these 72 conditions were used to investigate the temporal profile of neural adaptation and constituted the data for model fitting.

Model fitting
Computational models and associated model fitting procedures were implemented using custom Python code available at https://github.com/ABra1993/tAdaptation_ECoG.git. Models were fitted separately to individual electrodes, after which parameters or metrics derived from these fits were averaged within visual areas using a bootstrapping procedure described below. Models were fitted using a nonlinear least-squares algorithm (scipy.optimize.least_squares, SciPy, Python), with bounds on the parameters. The starting points and the upper and lower bounds that were used for fitting can be found in modelling_utils_paramInit.py.
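For readers unfamiliar with bounded nonlinear least squares, the sketch below shows the general pattern with `scipy.optimize.least_squares`. It fits a simple two-parameter exponential to noise-free synthetic data; the model, starting point and bounds are placeholders for illustration and are not the DN model or the values used in the study.

```python
import numpy as np
from scipy.optimize import least_squares


def model(params, t):
    """Toy model: a scaled exponential decay."""
    scale, tau = params
    return scale * np.exp(-t / tau)


def residuals(params, t, data):
    """Residual vector whose sum of squares is minimized."""
    return model(params, t) - data


# Synthetic data sampled at 512 Hz with known parameters.
t = np.arange(0.0, 1.0, 1 / 512)
true_params = np.array([2.0, 0.1])
data = model(true_params, t)

fit = least_squares(
    residuals,
    x0=[1.0, 0.05],                        # starting point (placeholder)
    bounds=([0.0, 0.01], [10.0, 1.0]),     # lower and upper bounds (placeholder)
    args=(t, data),
)
print(fit.x)
```

The bounds keep time constants and gains within a plausible range during the search.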

Delayed normalization model
The broadband time courses for each individual electrode were fitted with a delayed divisive normalization (DN) model previously described conceptually in the appendices of Heeger (1992, 1993) and implemented in Zhou et al. (2019) and Groen et al. (2022). In the DN model, an input drive is divisively normalized by its own delayed activation history, implemented as a low-pass filter on the input drive (DN.py). The model takes a stimulus time course as input and produces a predicted neural response time course as output, by applying a series of transformations which take the form of a Linear-Nonlinear-Gain control (LNG) structure, corresponding to filtering (L), exponentiation (N), and normalization (G). The model contains four free parameters of interest: $\tau_1$, $n$, $\sigma$ and $\tau_2$ (Fig. 3A). In addition, two nuisance parameters are fitted: a shift (delay in response onset relative to stimulus onset) and an electrode-specific scale (i.e., gain of response), to take into account differences in overall response latency and amplitude between electrodes. In the following, we drop the time index for brevity and denote free parameters between parentheses.
The input drive, $r_{\text{input drive}}$, is computed by first convolving a stimulus time course $s$ ($s = 0$ when the stimulus is absent, $s = 1$ when the stimulus is present) with an impulse response function (IRF), $h_1(\tau_1)$, yielding a linear response prediction:

$$r_{\text{linear}} = s * h_1(\tau_1) \tag{2}$$

where $h_1$ is defined as:

$$h_1(t) = t\,e^{-t/\tau_1} \tag{3}$$

The parameter $\tau_1$ is a time constant and determines the peak (i.e., the function peaks when $t = \tau_1$). The input drive is obtained by converting the linear response to a nonlinear response by applying a full-wave rectification and an exponentiation with $n$:

$$r_{\text{input drive}} = |r_{\text{linear}}|^{n} \tag{4}$$

The normalization pool, $r_{\text{normalization}}$, is computed by summing a saturation constant, $\sigma$, and a convolution of the linear response with a low-pass filter followed by rectification, where both terms are exponentiated with $n$:

$$r_{\text{normalization}} = \sigma^{n} + |r_{\text{linear}} * h_2(\tau_2)|^{n} \tag{5}$$

with the low-pass filter taking the form of the following decaying exponential function with a time constant $\tau_2$:

$$h_2(t) = e^{-t/\tau_2} \tag{6}$$

In summary, the delayed divisive normalization is applied as follows:

$$r_{\text{DN}} = \frac{r_{\text{input drive}}}{r_{\text{normalization}}} = \frac{|s * h_1(\tau_1)|^{n}}{\sigma^{n} + |s * h_1(\tau_1) * h_2(\tau_2)|^{n}} \tag{7}$$

The computation of the temporal dynamics by the DN model as described in Equation 7 has the form of a canonical divisive normalization (Carandini and Heeger, 2012), where the normalization pool (i.e., the denominator) consists of a delayed version of the numerator, yielding an output that is characterized by a transient response rise followed by a decay to a sustained response level.
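The chain of transformations can be sketched numerically as below. This is a minimal illustration and not the DN.py implementation. It assumes a gamma-shaped IRF of the form t·exp(−t/τ1), an exponential low-pass filter exp(−t/τ2), and filters normalized to unit sum; the parameter values are illustrative rather than fitted, and the shift and scale nuisance parameters are omitted.

```python
import numpy as np


def dn_model(stim, dt, tau1, tau2, n, sigma):
    """Delayed divisive normalization response to a stimulus time course."""
    t = np.arange(len(stim)) * dt
    h1 = t * np.exp(-t / tau1)   # IRF, peaks at t = tau1
    h2 = np.exp(-t / tau2)       # low-pass filter for the normalization pool
    h1, h2 = h1 / h1.sum(), h2 / h2.sum()   # unit-sum filters (assumption)

    r_linear = np.convolve(stim, h1)[: len(stim)]
    drive = np.abs(r_linear) ** n
    pool = sigma ** n + np.abs(np.convolve(r_linear, h2)[: len(stim)]) ** n
    return drive / pool


def make_stimulus(onsets_s, duration_s, total_s, dt):
    """Binary stimulus time course with one pulse per onset."""
    stim = np.zeros(int(round(total_s / dt)))
    for onset in onsets_s:
        start = int(round(onset / dt))
        stim[start:start + int(round(duration_s / dt))] = 1.0
    return stim


DT = 1 / 512
PARAMS = dict(tau1=0.05, n=2.0, sigma=0.1)   # illustrative, not fitted values

# Single 533 ms stimulus: transient peak followed by a lower sustained level.
single = dn_model(make_stimulus([0.0], 0.533, 1.0, DT), DT, tau2=0.1, **PARAMS)
peak = single.max()
sustained = single[int(round(0.5 / DT))]


def recovery(tau2, isi=0.267, duration=0.134):
    """Peak response to the second stimulus relative to the first."""
    second_onset = duration + isi
    stim = make_stimulus([0.0, second_onset], duration, 1.5, DT)
    resp = dn_model(stim, DT, tau2=tau2, **PARAMS)
    split = int(round(second_onset / DT))
    return resp[split:].max() / resp[:split].max()


fast, slow = recovery(tau2=0.1), recovery(tau2=0.3)
print(f"peak {peak:.2f}, sustained {sustained:.2f}")
print(f"recovery with tau2 = 0.1 s: {fast:.2f}; with tau2 = 0.3 s: {slow:.2f}")
```

In this sketch a longer τ2 leaves more normalization lingering at the onset of the second stimulus, so the second response recovers less, which is the qualitative behavior the model is used to capture.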

Figure 1: Two forms of temporal short-term adaptation observed in neural response time courses A. For a prolonged single

Figure 2: Experimental design and electrode positions A: Stimuli consisted of natural images belonging to one of six image categories (bodies, buildings, faces, objects, scenes and scrambled). For privacy reasons, the face exemplar shown here depicts the first author and was not included in the actual stimulus set. B: Subjects were presented with two different trial types. Duration trials (left) consisted of a single stimulus with one of six durations, ranging from 17-533 ms. Repetition trials (right) comprised two stimulus presentations of 134 ms each with one of six ISIs ranging from 17-533 ms. Subjects fixated on a small cross and were instructed to press a button whenever it changed color. C: Electrodes with robust visual responses were identified in V1-V3 (n = 17), VOTC (n = 15) or LOTC (n = 47). Electrodes not included in the dataset are shown in black. Electrodes were considered category-selective if the average response for one image category was higher than for the other image categories (d' > 0.5, see Eq. 1, Materials & Methods, n = 26). Apparent misalignments between electrode positions and the brain surface in C result from the fact that the electrodes here are displayed on the reconstructions of the average brain surface; electrode assignment was performed in each participant's native T1 space (Supp. Fig. 9). L = lateral, M = medial, D = dorsal, V = ventral, A = anterior, P = posterior. The brain surfaces and electrode positions can be reproduced by mkFigure2.py.

Figure 3: Modeling neural responses using delayed divisive normalization with category-dependent input strength A:

Figure 4: Slower rise and prolonged responses in higher visual areas A: Top, Average, normalized broadband iEEG responses (80-200 Hz) for electrodes assigned to V1-V3 (n = 17), VOTC (n = 15) and LOTC (n = 47) to single stimuli (gray). Responses are shown separately per duration from shortest (17 ms, left) to longest (533 ms, right). Bottom, DN model predictions for the same conditions. The shapes of the neural time courses differ between visual areas and are accurately captured by the DN model. Time courses were smoothed with a Gaussian kernel with standard deviation of σ = 10; the shaded regions indicate 68% confidence interval across 1000 bootstrapped time courses (see Materials & Methods, Bootstrapping procedure and statistical testing). B-C: Summary metrics plotted per visual area derived from the neural responses (circle marker) or model time courses (triangle marker). Time-to-peak (B) computed for the longest duration (533 ms). Full-width at half maximum (C), computed for each stimulus duration. Data points indicate medians and error bars indicate 68% confidence interval across 1000 samples derived from the bootstrapped time courses. Bootstrap test, * = p < 0.05 (two-tailed, Bonferroni-corrected). This figure can be reproduced by mkFigure4.py.

Figure 5: Qualitative differences in repetition suppression across visual areas A: Top, Average, normalized broadband responses for electrodes assigned to V1-V3 (n = 17), VOTC (n = 15) and LOTC (n = 47) to repeated visual stimuli (gray). Responses are shown separately per ISI from shortest (17 ms, left) to longest (533 ms, right). Bottom, DN model predictions for the same data. Time courses differ between visual areas, which is captured by the DN model. B: Estimated, normalized response to the second stimulus for V1-V3, VOTC and LOTC. For each visual area, the left panel shows the neural data and the right panel shows the model prediction. Recovery from adaptation gradually increases as the ISI becomes longer, and the rate of recovery is higher for V1-V3 compared to VOTC and LOTC in both the neural data and the DN model, as a result of a higher peak magnitude to the second stimulus and a weaker decay after the peak (black arrows). This figure can be reproduced by mkFigure5_6.py.

Neural responses show overall stronger RS for shorter compared to longer ISIs (Fig. 6A, left), but also relatively more RS in VOTC and LOTC than in V1-V3. Responses in V1-V3 are nearly fully recovered at the longest ISI of 533 ms, while VOTC and LOTC are still suppressed. Summary metrics of the average recovery across ISIs (Fig. 6B, circle markers) and long-term recovery (Fig. 6C, circle markers) confirm that there is less RS and faster recovery in V1-V3 compared to VOTC and LOTC. Fitting these responses with the DN model again shows accurate predictions: the model captures the overall gradual recovery from RS with longer time lags, closely mimicking the neural data (Fig. 5A, lower panel and Fig. 5B, DN model). The DN model also predicts stronger RS (Fig. 6A, right), reflected in the average level of suppression (Fig. 6B, triangle markers), and slower recovery (Fig. 6C, triangle markers) for higher than lower visual areas, although it underestimates the average suppression in VOTC and LOTC, possibly due to a slight over-prediction of the recovery for shorter ISIs.

Figure 6: Stronger RS and slower recovery rate from adaptation in higher visual areas A: Left, Recovery from adaptation for V1-V3 (n = 17), VOTC (n = 15) and LOTC (n = 47), computed as the ratio of the Area Under the Curve (AUC) between the first and second response. The fitted curves express recovery as a function of the ISI (see Materials and Methods, Summary metrics). Higher visual areas show stronger RS and slower recovery from adaptation. Right, model predictions for the same data. The model captures area-specific recovery from adaptation. B-C: Summary metrics plotted per visual area derived from the neural responses (circle markers) or model time courses (triangle markers). Average recovery (B) from adaptation for each area, computed by averaging the AUC ratios between the first and second stimulus over all ISIs. The long-term recovery (C) reflects the amount of recovery for an ISI of 1 s, obtained by extrapolating the fitted line. Higher visual areas show stronger RS and a slower recovery rate, which is accurately predicted by the DN model. Data points indicate medians and error bars indicate 68% confidence interval across 1000 samples derived from the bootstrapped time courses. Bootstrap test, * = p < 0.05 (two-tailed, Bonferroni-corrected). This figure can be reproduced by mkFigure5_6.py.

Figure 7: Slower normalization in higher visual areas results in prolonged response shapes and stronger RS A: DN model prediction of the neural response for a single stimulus with a duration of 533 ms in V1-V3 (n = 17), VOTC (n = 15) and LOTC (n = 47). For each visual area, an additional panel is shown depicting the input drive (numerator, solid line) and the normalization pool (denominator, dashed line). The slower rise and prolonged response for VOTC and LOTC result from slower dynamics (arrow) of the normalization pool. B: Same as A for a repeated stimulus with an ISI of 533 ms. The stronger RS for higher visual areas results from lingering normalization at the start of the second stimulus, which is stronger for VOTC and LOTC compared to V1-V3. C: Fitted DN model parameters per visual area, from left to right: τ1 (time constant of the IRF), τ2 (time constant of the exponential decay), n (exponent) and σ (semi-saturation constant). Data points indicate medians and error bars indicate 68% confidence interval across 1000 samples derived from the bootstrapped time courses. This figure can be reproduced by mkFigure7.py.

Figure 8: Differences in recovery from adaptation across stimuli in category-selective areas A: Top, Average, normalized broadband responses of category-selective electrodes (threshold d' = 0.5, n = 26) of trials during which preferred (blue) or non-preferred (red) stimuli were presented in repetition (gray). Responses are shown separately per ISI from shortest (17 ms, left) to longest (533 ms, right). Bottom, DN model predictions for the same data. Time courses differ for preferred and non-preferred stimuli, which is captured by the DN model. For non-normalized responses see Supp. Fig. 6. B: Estimated, normalized response to the second stimulus for trials containing preferred and non-preferred stimuli. Per visual area, the left panel shows the neural data and the right panel shows the model prediction. The rate of recovery is higher for non-preferred compared to preferred stimuli. C: Recovery from adaptation computed as the ratio of the AUC between the first and second response derived from the neural data (left) or DN model predictions (right). The fitted curves express the degree of recovery as a function of the ISI (see Materials and Methods, Summary metrics). Responses derived from trials containing preferred stimuli show a stronger degree of RS, and the DN model is able to capture stimulus-specific recovery from adaptation. D: Long-term recovery from adaptation derived from the neural responses (circle marker) or DN model (triangle marker), reflecting the amount of recovery for an ISI of 1 s. Responses for trials presenting preferred stimuli show stronger RS and a slower recovery rate. Data points indicate medians and error bars indicate 68% confidence interval across 1000 samples derived from the bootstrapped time courses. Bootstrap test, * = p < 0.05 (two-tailed). This figure can be reproduced by mkFigure8.py.

Figure 9: [...] in recovery from repetition suppression across visual areas for non-preferred stimuli A. Estimated, normalized response to the second stimulus for V1-V3, VOTC and LOTC. For each visual area, the left panel shows the neural data and the right panel shows the model prediction. Time courses were obtained using a bootstrapping procedure (n = 1000, see Materials & Methods, Bootstrapping procedure and statistical testing). Recovery from adaptation gradually increases as the ISI becomes longer, and the rate of recovery is higher for V1-V3 compared to VOTC and LOTC in both the neural data and the DN model. B: left, Recovery from adaptation for V1-V3 (n = 17), VOTC (n = 15) and LOTC (n = 47), computed as the ratio of the Area Under the Curve (AUC) between the first and second response. The fitted curves express the degree of recovery as a function of the ISI (see Materials and Methods, Summary metrics). Higher visual areas show stronger RS and slower recovery from adaptation. Area-related differences are less pronounced compared to preferred stimulus trials (Supp. Fig. 2). Right, model predictions for the same data. The model is able to capture area-specific recovery from adaptation. B-C: Summary metrics plotted per visual area derived from the neural responses (circle marker) or model time courses (triangle marker). Average recovery (B) from adaptation for each area, computed by averaging the AUC ratios between the first and second stimulus over all ISIs. The long-term recovery (C) reflects the amount of recovery for an ISI of 1 s, obtained by extrapolating the fitted line. Higher visual areas show stronger RS and a slower recovery rate, which is accurately predicted by the DN model. Data points indicate medians and error bars indicate 68% confidence interval across 1000 samples derived from the bootstrapped time courses. Bootstrap test, * = p < 0.05 (two-tailed, Bonferroni-corrected). This figure can be reproduced by mkFigure5_6.py.