The authors have declared that no competing interests exist.
Visual neurons respond to static images with specific dynamics: neuronal responses sum sub-additively over time, reduce in amplitude with repeated or sustained stimuli (neuronal adaptation), and are slower at low stimulus contrast. Here, we propose a simple model that predicts these seemingly disparate response patterns observed in a diverse set of measurements: intracranial electrodes in patients, fMRI, and macaque single-unit spiking. The model takes the contrast time course of a stimulus as input, and produces predicted neuronal dynamics as output. Model computation consists of linear filtering, expansive exponentiation, and a divisive gain control. The gain control signal relates to but is slower than the linear signal, and this delay is critical in giving rise to predictions matched to the observed dynamics. Our model is simpler than previously proposed related models, and fitting the model to intracranial EEG data uncovers two regularities across human visual field maps: estimated linear filters (temporal receptive fields) systematically differ across and within visual field maps, and later areas exhibit more rapid and substantial gain control. The model is further generalizable to account for dynamics of contrast-dependent spike rates in macaque V1, and amplitudes of fMRI BOLD in human V1.
This paper contributes to modeling and understanding the neuronal dynamics of visual cortex in four ways. First, we proposed a model that describes stimulus-driven neuronal dynamics in a simple and intuitive way. Second, we applied the model to intracranial EEG data and found regularities of response dynamics across and within human visual field maps. Third, the model was generalizable across different ways of measuring brain activity, allowing us to potentially link the sources underlying diverse measurements. Fourth, we comprehensively summarized existing models of neuronal dynamics, and identified effective components that give rise to accurate prediction.
Our visual system extracts behaviorally relevant information from a large quantity of inputs spread over space and time. To do so, some aspects of visual inputs are prioritized over others. In space, for example, the center-surround receptive fields in retinal ganglion cells enhance sensitivity to contrast, while attenuating sensitivity to diffuse illumination [
For each phenomenon, we show a schematic with a stimulus time course (gray shading), a linear prediction (black dashed line), and a cartoon illustration of plausible neuronal responses consistent with prior findings (red line). The linear prediction is the result of convolving an impulse response (left) with a stimulus time course. A. For a sustained stimulus, the neuronal response reduces after an initial transient, differing from the sustained linear prediction [e.g.,
To achieve a unified understanding of these seemingly disparate phenomena, here, we developed a general yet simple model that predicts neuronal dynamics in response to a static image whose contrast varies arbitrarily over time. The model is based on canonical neuronal computations [
The delayed normalization model has an LNG structure (Linear, Nonlinear, Gain control). In the linear stage, the model convolves the contrast time course of a single image with an impulse response. The output of this linear computation is then full-wave rectified (absolute valued) and expansively exponentiated. The interpretation of the full-wave rectification depends on the type of measurement. In single-cell measurements, full-wave rectification can be interpreted as the result of a linear combination of half-wave rectified (zeroing negative linear predictions) responses from two cells with complementary receptive fields–the excitatory part of one cell’s receptive field corresponds to the inhibitory part of the other [
The last and the most important computation of the model is a delayed gain control, implemented as a divisive normalization. The numerator here is the exponentially rectified linear output. The denominator consists of two exponentiated components: a constant (semi-saturation) and a low-passed rectified-linear output. The low-pass causal filter (implemented as an exponential decay) in the denominator is what gives rise to the predicted adaptation behavior. Intuitively, at stimulus onset, the linear filter sums the stimulus contrast within some time window and the numerator dominates the predicted initial response, resulting in a sharp rise in the response. The response then starts to decay once the sluggish gain control kicks in and starts to dominate. A history-dependent normalization signal has been proposed, and implemented as part of a feedback circuit [
The DN model was parameterized by five variables: τ1, τ2,
The remaining three variables parameterize the history-dependent or delayed divisive normalization. The numerator of the normalization is a linear response raised point-wise to a power
We assume a self-normalization process in the DN model, i.e. the numerator and the denominator share the same driving signal/linear response (‘L’ in
Following stimulus onset, the DN prediction increases rapidly due to convolution and exponentiation, and then reduces due to normalization, remaining at a lower, sustained level until stimulus offset. Although summation (convolution) and neuronal adaptation (normalization) both occur continuously throughout the predicted time course, different parts of the time course emphasize different neuronal phenomena: the initial response increase primarily reflects temporal summation (combining current inputs with past inputs), whereas the reduction following the initial transient reflects adaptation, since the response level declines when the stimulus is unchanging.
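The sequence of computations described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the impulse response shape (`t * exp(-t / tau1)`), the function name `dn_response`, and all parameter values are assumptions chosen only to show the transient-then-sustained pattern.

```python
import numpy as np

def dn_response(stimulus, dt=0.001, tau1=0.05, tau2=0.1, n=2.0, sigma=0.1):
    """Sketch of a delayed-normalization response to a contrast time course."""
    t = np.arange(0, 1.0, dt)          # 1-s filter support
    irf = t * np.exp(-t / tau1)        # assumed gamma-shaped impulse response
    irf /= irf.sum()                   # unit sum: sustained input of 1 maps to ~1
    lowpass = np.exp(-t / tau2)        # exponential-decay low-pass filter
    lowpass /= lowpass.sum()

    linear = np.convolve(stimulus, irf)[: len(stimulus)]     # linear stage
    rectified = np.abs(linear)                               # full-wave rectification
    pool = np.convolve(rectified, lowpass)[: len(stimulus)]  # delayed normalization pool
    return rectified**n / (sigma**n + pool**n)               # exponentiate and divide

dt = 0.001
time = np.arange(0, 1.0, dt)
stimulus = ((time >= 0.1) & (time < 0.6)).astype(float)  # 500-ms stimulus
response = dn_response(stimulus, dt=dt)
peak_index = int(np.argmax(response))
sustained = response[int(0.59 / dt)]   # level shortly before stimulus offset
```

With these illustrative values the response peaks within the first 200 ms after onset and then settles to a much lower level, because the pool in the denominator lags the numerator.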
In the remaining parts of the Results, we used data from different measurement techniques to examine the 4 temporal phenomena shown in
In this section, we show that the DN model captures the transient-decay neuronal dynamics at sustained (hundreds of milliseconds) presentation of static images (
We extracted part of the intracranial EEG (or ECoG/electrocorticography) signal that is thought to be correlated with the average local neuronal firing rates for model fitting. To do so, we computed the envelope of the high frequency (70–210 Hz, ‘broadband’) time courses from a large set of human ECoG electrodes spanning multiple visual field maps. Spectral patterns of the responses in these electrodes (but not the time courses) were analyzed for a prior publication [
(A) The DN model fits (red) accurately describe the ECoG broadband time course (black) in multiple ROIs. Data were averaged across trials and electrodes within ROIs, and models were fit to the average time course. Each trial had a 500-ms stimulus (gray box) followed by a 500-ms blank. Plots show the mean and 50% CI for data (bootstrapped 100 times across electrodes within an ROI), and the model fit averaged across the 100 bootstraps. The number of electrodes per ROI and the 50% CI of model accuracy (r2 per bootstrap) are indicated in each subplot. (B) The model fits for the 4 ROIs are plotted together, scaled to unit height. For this plot, the latency was assumed to be 0 for each ROI, so that the difference in time to peak reflects a difference in integration time rather than a difference in response latency. (C) Cross-validation over trials and over electrodes. 30-fold leave-one-out cross validation over trials was performed on the 30 repeats. Red dots represent the median r2 across trials, and black dots are the leave-one-out prediction to each trial. Leave-one-out cross validation was also performed over electrodes. Details of the cross-validated fit were presented in
In each trial during the experiment, a static texture (22°-diameter) was presented for 500 ms followed by a 500-ms blank. The textures were noise patterns with 1/
The DN model provided a good fit to the broadband time course from all 4 ROIs, with the variance explained by the model between 90% and 99%, and cross-validated variance explained generally above 70%, especially across trials. The responses in each of the 4 ROIs exhibited the characteristic pattern whereby the amplitude substantially declined following an initial large response (e.g., as depicted in the schematic in
We derived two interpretable summary metrics to quantify model behavior in each ROI (
(A) Temporal summation window length and the extent of gain control increase along the visual hierarchy. The model parameters fit to the data are shown on the right. The model fits were then summarized by two metrics. Tpeak is the duration from the onset of a sustained stimulus to the peak response, excluding the onset latency. Tpeak is longer for later ROIs, ranging from ~115 ms (V1) to ~145 ms (anterior ROIs). Rasymp is the level at which the response asymptotes for a sustained stimulus, as a fraction of the peak response. A smaller Rasymp indicates a greater extent of gain control. Rasymp is largest in V1 (~0.12) and declines in extrastriate areas. See
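The two summary metrics can be read off any predicted time course for a sustained stimulus. The sketch below is one plausible way to compute them, under stated assumptions: the function name `summary_metrics` is hypothetical, the asymptote is approximated by the last sample of the time course, and the toy curve is not fitted to any data.

```python
import numpy as np

def summary_metrics(response, time, onset):
    """Return (t_peak, r_asymp) for a response to a sustained stimulus.

    t_peak: time from stimulus onset to the maximum response.
    r_asymp: final response level as a fraction of the peak
             (the last sample stands in for the asymptote).
    """
    peak_index = int(np.argmax(response))
    t_peak = time[peak_index] - onset
    r_asymp = response[-1] / response[peak_index]
    return t_peak, r_asymp

# Toy transient-plus-sustained curve, for illustration only.
time = np.arange(0, 2.0, 0.001)
toy = 0.2 * (1 - np.exp(-time / 0.05)) + (time / 0.1) * np.exp(1 - time / 0.1)
t_peak, r_asymp = summary_metrics(toy, time, onset=0.0)
```

In practice the input would be the model prediction for a long sustained stimulus with the onset latency removed, so that the time to peak reflects integration time only.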
Previous work has shown that within V1, regions with more peripheral eccentricities are more sensitive to visual transients [
To quantify the offset transient response, we fit the DN model with varying
The model provided excellent fits to the full time-course of the response in individual electrodes including stimulus offset (
Above, we showed that the DN model accurately fit the ECoG broadband time series from different visual areas and different eccentricities. Here, to test generalizability, we fit the model to example time courses from 3 measurement types in early visual cortex obtained from prior publications (
In our prior fMRI studies [
In brief, the fMRI subjects were presented with a large-field contrast pattern either once or twice per 4.5-s trial (
There are two types of temporal profiles used for the fMRI experiment: one-pulse stimuli with varying durations and two-pulse stimuli (134 ms each) with varying ISI. To generate DN model predictions to these stimuli, we used the median DN parameters fit to the V1 broadband time course measured in individual electrodes (
First, we generated the predicted time-varying neuronal response for each stimulus within each of the 4 ROIs using parameters estimated from the ECoG experiment. Because we had more electrodes than ROIs, within each ROI we used the median model parameters across ECoG electrodes (
Second, we summed this predicted time series to quantify the total predicted neuronal response for that trial. Finally, we scaled the responses in order to translate them into units of percentage BOLD based on a fit of the best single scale factor across all 13 temporal conditions. Overall, this procedure yields one predicted number for each temporal condition per ROI. The experimentally measured fMRI BOLD amplitudes (one per stimulus condition) were derived by solving a general linear model (GLM). Because the GLM already accounts for the hemodynamic response function, the BOLD amplitude predicted from the DN model can be compared directly to the fMRI-measured amplitude. Because the DN model parameters were derived from the ECoG data alone, there were no free parameters other than the gain to convert the summed time series to percent BOLD units. Although the DN model parameters were estimated with different participants, different stimuli, and a different instrument, the predictions nonetheless accurately matched the BOLD data (
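The summing and scaling steps can be sketched as follows. This is an illustration under assumptions: the function name `predict_bold` is hypothetical, the single scale factor is taken to be the least-squares gain with no intercept, and the arrays are toy values rather than data from the study.

```python
import numpy as np

def predict_bold(neural_predictions, measured_bold, dt=0.001):
    """Sum each predicted time series, then fit one gain shared by all conditions."""
    summed = np.array([np.sum(p) * dt for p in neural_predictions])
    # Least-squares gain with no intercept: minimizes ||gain * summed - measured||^2.
    gain = np.dot(summed, measured_bold) / np.dot(summed, summed)
    return gain * summed, gain

# Toy example: three conditions whose "measured" amplitudes are exactly 2.5x the sums.
neural_predictions = [np.ones(100), 2 * np.ones(100), np.ones(300)]
measured_bold = np.array([0.25, 0.5, 0.75])
predicted_bold, gain = predict_bold(neural_predictions, measured_bold)
```

Because only the gain is fitted, the relative pattern across temporal conditions is determined entirely by the model parameters estimated elsewhere.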
Both the measured BOLD response and the predictions derived from the ECoG model fits show two patterns consistent with neuronal phenomena schematized in
Second, the BOLD signal shows evidence of reduced responses for repeated stimuli. This can be seen when the inter-stimulus interval for the two-pulse stimuli is short and the response is low (adaptation), compared to when the interval is longer and the response is higher (recovery from adaptation). This pattern is not predicted by a linear model, for which the total predicted response is the same irrespective of the inter-stimulus interval, but it is predicted by the DN model (compare red versus blue lines in
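The claim about the linear model can be checked numerically: the summed output of a convolution depends only on the total stimulus energy, not on when the pulses occur. The sketch below assumes a gamma-shaped impulse response and an arbitrary set of inter-stimulus intervals; neither is taken from the experiment.

```python
import numpy as np

dt = 0.001                       # 1-ms samples, so sample counts equal milliseconds
t = np.arange(0, 0.5, dt)
irf = t * np.exp(-t / 0.05)      # assumed impulse response shape
irf /= irf.sum()

def summed_linear_response(isi_ms, pulse_ms=134, total_ms=2000):
    """Total linear response to two pulses separated by isi_ms."""
    stimulus = np.zeros(total_ms)
    stimulus[:pulse_ms] = 1.0
    start = pulse_ms + isi_ms
    stimulus[start:start + pulse_ms] = 1.0
    # Full convolution, so no part of the response is truncated.
    return np.convolve(stimulus, irf).sum() * dt

totals = [summed_linear_response(isi) for isi in (0, 50, 200, 500)]
```

Every entry of `totals` is the same (two 134-ms pulses give 0.268 in these units), so any dependence of the measured response on the interval has to come from a nonlinearity such as the delayed normalization.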
In this section, the model was generalized to account for single unit peri-stimulus time histograms (PSTHs) measured in macaque V1 in response to stimuli of variable contrasts. In previous sections, the DN model took a time course of binary values as input, where 1 represented stimulus present and 0 represented stimulus absent (neutral gray screen). Here, the model was generalized to take any value
Although the three complex cells differ in their overall dynamics, within each cell, the PSTH dynamics vary in a consistent way with stimulus contrast: compared to high contrast, response peaks tend to be lower and later at low contrast (
The model captures low response amplitude at low contrast, because in the divisive normalization equation, although the numerator (LN response) and the normalization pool response (the second component in the denominator) are similar in amplitude, at low contrast, they are both relatively small compared to the semi-saturation term σ (small numbers divided by a comparatively large number). At high contrast, the numerator is large compared to σ, so the overall response amplitude is relatively large (large numbers divided by a small number). The model captures the slow response dynamics at low contrast for the following reason: the peak time in the response reflects a tradeoff between summing of the impulse response function (resulting in an amplitude increase) and the normalization extent (resulting in an amplitude decrease). Large normalization at high contrast causes an earlier response decrease, and therefore an earlier peak time. At low contrast, the response increases slowly because the impulse response function sums less contrast per unit time, and the normalization tradeoff occurs later because the normalization extent at each time point also depends on temporal summation.
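The argument above can be made explicit with a worked limit. This is a sketch in notation assumed here rather than taken from the text: $c$ is the stimulus contrast, $L(t)$ the linear response to a unit-contrast stimulus, $G(t)$ the low-passed rectified linear response (the normalization pool), and $r_c(t)$ the prediction at contrast $c$. Because the linear stage scales with contrast, both the numerator and the pool scale with $c$.

```latex
r_c(t) = \frac{\left(c\,|L(t)|\right)^{n}}{\sigma^{n} + \left(c\,G(t)\right)^{n}}

% Low contrast, c\,G(t) \ll \sigma: the pool is negligible and the response is small
r_c(t) \approx \frac{c^{n}\,|L(t)|^{n}}{\sigma^{n}}

% High contrast, once the pool has built up, c\,G(t) \gg \sigma: contrast cancels
r_c(t) \approx \left(\frac{|L(t)|}{G(t)}\right)^{n}
```

In the first limit the response simply follows the rise of $|L(t)|^{n}$ with little suppression, so the peak comes late. In the second limit the expression no longer depends on $c$, which is one way to see why predicted time courses at different contrasts can converge late in the stimulus when normalization is strong.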
For all three cells, the model predicts that responses to different stimulus contrasts all converge over the time course of the stimulus presentation. In particular, the model predicts that the response time courses are nearly identical at all contrasts after stimulus offset. The fitted model predicts this convergence here, but 1) the data do not show it consistently across the three cells, and 2) the model does not predict it for all parameter combinations, especially when the extent of normalization is low (small
A descriptive model of different response time courses was proposed by Albrecht et al. 2002 [
We proposed a model of neuronal dynamics that generalized in 3 ways. First, the model accurately accounted for diverse temporal phenomena, including sub-linear summation, neuronal adaptation, and slower dynamics at low contrast. Second, the model generalized across measurement types, including the fMRI BOLD signal, the ECoG broadband signal, and single unit PSTH, among others. Third, parameters of the model varied systematically both within and across visual field maps.
In our previous work [
DN is not the only model that captures all four phenomena of neuronal dynamics described in
Within non-RC circuit models (blue), Horiguchi et al. 2009 [
On the other hand, within the RC-circuit model class, Carandini and Heeger 1994 [
Although multiple models qualitatively capture the temporal phenomena, the DN model is relatively simple to compute, and it is not rooted in the assumption that gain control arises from a change in conductance due to spiking inputs, a mechanism which may not be a sufficient explanation for all forms of normalization [
There exist other models of a similar nature constructed for different experiment or measurement types, and models that include gain control without assuming a divisive form. For example, in Tsai et al. 2012 [
Outside the domain of visual cortical computations, dynamic or history-dependent normalization models have been proposed for decision making [
The presence of a delayed suppressive signal, as proposed in our DN model, does not preclude the possibility that there are also more rapid suppressive signals. In fact, both psychophysical [
The temporal phenomena studied in this paper mostly pertain to neuronal spiking activities. Some other types of neuronal activities, e.g. membrane potential dynamics, exhibit different shapes and different properties [
We observed 3 systematic trends across visual cortex: (1) the temporal window length and (2) the degree of normalization increased from V1 to extrastriate areas; (3) the relative sensitivity to transients (reflected in the response to stimulus offsets) increased from fovea to periphery.
The increase in temporal window length was systematic but small, increasing by about 30% from V1 to the anterior maps just beyond V3. Qualitatively, this is similar to the increase in spatial receptive field size across the cortical hierarchy, but the differences in spatial receptive fields are larger: receptive field size more than doubles from V1 to V3, and increases by at least 4 times from V1 to V4, measured with either single units [
In addition to the increasing temporal summation window length, we also found an increasing extent of normalization from early to late visual areas. This gradation of neuronal adaptation levels is consistent with previous results showing that the anterior visual field maps sum more compressively in time [
Note that the difference in response profiles (earlier response peak and higher response level at stimulus offset) between striate and extra-striate areas can also be qualitatively captured by a cascaded DN model, as compared to a one-stage DN model implemented with different parameters (
We found that temporal dynamics varied not only between maps but also within maps. Specifically, within V1-V3, peripheral response time courses measured by ECoG tended to exhibit a large transient at stimulus offset. As a consequence, the peripheral responses, dominated by the onset and offset transients, are more sensitive to changes in stimulus contrast, whereas the foveal responses are more sensitive to the stimulus duration. It is likely that these differences start to emerge early in visual processing. For example, the ratio of parasol to midget cells is higher in the periphery than the fovea, contributing to higher sensitivity to transients [
Our model establishes baseline performance by demonstrating explanations of several important phenomena obtained for static, large-field images over a few hundred milliseconds. This type of stimulus is well matched to many natural tasks such as scene exploration and reading, in which fixations of (mostly) static images alternate with saccades, at approximately this time scale [
ECoG data were re-analyzed from prior work [
Functional MRI data were re-analyzed from prior work [
The data were pre-processed as in [
At the beginning of each 1-second trial, a large field (22°) noise image was randomly selected from one of 8 image classes. Several of these image classes were chosen for studying gamma oscillations in the original paper, which differs from the purpose of the current study. For this study, we analyzed data from the noise image classes only (3 of the 8 image classes): white, pink, and brown noise (amplitude spectra proportional to 1/
We extracted the broadband component of the ECoG signal for model fitting and other analyses. The broadband signal is thought to be correlated with local spiking activities: the broadband signal correlates with multi-unit activities near electrodes [
A number of steps were taken to extract time-varying broadband signals, which can be summarized as the power of the envelope of band-pass filtered ECoG voltage time courses. First, the raw ECoG time courses were band-pass filtered in several bands within the range between 70 and 210 Hz. Below 70 Hz, signals are more likely to be influenced by low frequency cortical rhythms and other processes and may contaminate the broadband estimation [
Instead of band-pass filtering the entire voltage time course with a single filter, we took ten 10-Hz bins from 70 to 210 Hz (70–80 Hz, 80–90 Hz, and so on, skipping 60 Hz line noise and its harmonics) using a Butterworth filter (passband ripple < 3 dB, stopband attenuation 60 dB). We chose 10-Hz bins because we estimated that bins of at least 10 Hz are needed to capture the sharp transients in the spike rates, and bandwidths broader than 10 Hz do not affect the estimated shapes of the spike rate transients any further. We used multiple bands because the power in field potentials declines with frequency; if we computed the envelope of a single large pass band (say, 70–210 Hz), it would be dominated by signals near 70 Hz. After extracting the multiple bands, we combined their time series by computing a geometric mean, which ensures that the high frequency bands (low power) still contribute.
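A rough sketch of this extraction, using SciPy rather than the original MATLAB code, is shown below. Several details are assumptions made for illustration and are not given in the text: the exact list of band edges (here, bins touching the 120 Hz and 180 Hz harmonics are dropped, which happens to leave ten bins), a fixed filter order in place of the ripple and attenuation specification, zero-phase filtering, and the function name `broadband_power`.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

# Assumed band edges: 10-Hz bins from 70 to 210 Hz, dropping bins that touch
# the 120 Hz and 180 Hz line-noise harmonics. The source does not list them.
BANDS = [(lo, lo + 10) for lo in range(70, 210, 10)
         if lo not in (110, 120, 170, 180)]

def broadband_power(voltage, fs, bands=BANDS, order=4):
    """Geometric mean across bands of the squared Hilbert envelope."""
    log_power = []
    for lo, hi in bands:
        sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
        filtered = sosfiltfilt(sos, voltage)   # zero-phase band-pass (assumption)
        envelope = np.abs(hilbert(filtered))   # amplitude of the analytic signal
        log_power.append(np.log(envelope**2))
    return np.exp(np.mean(log_power, axis=0))  # geometric mean over bands

# Toy signal: white noise whose amplitude triples in the second half.
rng = np.random.default_rng(0)
fs = 1000
voltage = rng.standard_normal(2 * fs)
voltage[fs:] *= 3.0
power = broadband_power(voltage, fs)
```

Averaging in the log domain is what keeps the low-power, high-frequency bands from being swamped by the bands near 70 Hz.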
To convert the unit of the time-varying broadband to percent signal change in each electrode, we first averaged each broadband time series across epochs. We defined the 200 ms prior to stimulus onset as the baseline period for the epoch-averaged time course, and then computed the percent signal change by dividing the entire 1200-ms time course point-wise by the average of the baseline. To equalize the baseline across electrodes, we subtracted the baseline average from the entire time course, so that each electrode has a trial-averaged baseline of 0.
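The normalization just described amounts to a divide-then-subtract step. The sketch below is one reading of it, with a hypothetical function name and toy values; it returns the change in units of the baseline, so a value of 2.0 corresponds to a 200% increase.

```python
import numpy as np

def to_signal_change(epoch_mean, time, baseline=(-0.2, 0.0)):
    """Divide by the mean baseline level, then subtract 1 so the baseline is 0."""
    in_baseline = (time >= baseline[0]) & (time < baseline[1])
    baseline_mean = epoch_mean[in_baseline].mean()
    return epoch_mean / baseline_mean - 1.0

# Toy epoch-averaged broadband power: 1200 ms at 1-ms resolution.
time = np.arange(-200, 1000) / 1000.0
epoch_mean = np.full(time.shape, 2.0)
epoch_mean[(time >= 0) & (time < 0.5)] = 6.0   # stimulus-on period
change = to_signal_change(epoch_mean, time)
```

After division the baseline averages 1, so subtracting 1 is the same as subtracting the normalized baseline average.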
We first selected all electrodes located in identifiable visual areas based on separate retinotopy scans. Among these location-identifiable electrodes, we only chose the electrodes that satisfy the following two criteria for further analysis: 1. electrodes whose trial-averaged broadband response during the stimulus on period (500 ms) is greater than the baseline period on average; 2. electrodes whose maximal trial-averaged broadband response is greater than 150% of the pre-normalized baseline average (see
Based on the retinotopy analysis, we separated the electrodes within V1-V3 into three eccentricity bins (<5, 5–10, >10 degrees) based on their estimated receptive field centers.
We re-analyzed two macaque single-unit data sets from Albrecht et al. 2002. The first data set consists of trial-averaged PSTH from 12 complex cells in V1 (
The second data set (Albrecht et al. 2002
We re-analyzed the data corresponding to the “Contextual modulation” experiment in [
MUA and LFP extractions are exactly the same as described in [
First, for the linear stage, a weighted difference between two gamma functions was used as the form of the impulse response function. Each gamma function is simplified from the following equation [
The function peaks when
Here, we assumed the peak timing of the second gamma function to be 1.5 times the first one, and we vary
The exponential rectification, following the linear computation, is of the form
The exponential rectification may be thought of as a correlate of the non-linearity resulting from thresholded spiking.
The last computation in the model is a dynamic, or history-dependent divisive normalization. The numerator of the division is
The second term in the denominator represents the response dynamics of the normalization pool. The term is a low-pass filtered version of the numerator, and the low-pass filter is implemented by an exponential decay, parameterized by τ2:
Overall, the divisive normalization can be represented as:
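For orientation, the stages described in this section can be written out in one place. This is a sketch consistent with the verbal description, not a transcription of the original equations: the gamma-shaped form $t\,e^{-t/\tau}$, the symbols $s$, $w$, $G$, and $h_{\mathrm{lp}}$, and the omission of normalizing constants are all assumptions.

```latex
% Assumed gamma-shaped function, peaking at t = \tau
h(t;\tau) \propto t\,e^{-t/\tau}

% Linear stage: stimulus s(t) convolved with a weighted difference of two such functions
L(t) = \left[s * \left(h(\cdot\,;\tau_1) - w\,h(\cdot\,;1.5\,\tau_1)\right)\right](t)

% Normalization pool: rectified linear response passed through an exponential low-pass filter
G(t) = \left[\,|L| * h_{\mathrm{lp}}\right](t), \qquad h_{\mathrm{lp}}(t) \propto e^{-t/\tau_2}

% Delayed divisive normalization
r(t) = \frac{|L(t)|^{n}}{\sigma^{n} + G(t)^{n}}
```

Written this way, the only difference between the numerator and the second term of the denominator is the low-pass filter, which is the source of the delay.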
To fit the first DN model to the time series data (SUA, MUA, LFP broadband and ECoG broadband), we vary all four model parameters (
A higher weight of the transient component leads to a higher degree of the offset transient response, and a higher weight of the sustained component leads to a higher level of the sustained response. (See
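The effect of the weight can be illustrated with the linear stage alone. The sketch below assumes the impulse response is a weighted difference of two gamma-shaped functions of the form `t * exp(-t / tau)`, with the second peaking 1.5 times later and weighted by a hypothetical parameter `w`; all values are illustrative.

```python
import numpy as np

dt = 0.001
t = np.arange(0, 1.0, dt)
time = np.arange(0, 1.5, dt)
stimulus = (time < 0.5).astype(float)      # 500-ms stimulus, then blank

def biphasic_irf(tau1, w):
    """Weighted difference of two unit-sum gamma-shaped functions (assumed form)."""
    def gamma(tau):
        g = t * np.exp(-t / tau)
        return g / g.sum()
    return gamma(tau1) - w * gamma(1.5 * tau1)

def rectified_linear(w, tau1=0.05):
    linear = np.convolve(stimulus, biphasic_irf(tau1, w))[: len(stimulus)]
    return np.abs(linear)                  # full-wave rectification

# Level just before offset, and largest response from 20 ms after offset onward.
sustained = {w: rectified_linear(w)[499] for w in (0.0, 1.0)}
offset_peak = {w: rectified_linear(w)[520:].max() for w in (0.0, 1.0)}
```

With `w = 0` the response stays near its sustained level and only decays after offset. With `w = 1` the sustained level is close to zero, and the negative lobe that follows stimulus offset becomes a positive offset transient after rectification.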
For simplicity, we assumed n = 2 when fitting this model to the ECoG time course in
For the search fit, we used a bounded nonlinear search algorithm in MATLAB (
Throughout the paper, we summarized model accuracy as the variance explained,
To ensure that our computational methods are reproducible, all data and all software will be made publicly available via an Open Science Framework web site,
Here we explore how different DN model parameters affect the model predictions to two 500-ms stimulus time courses (contrasts of 1 and 0.3, respectively). The black curve in each panel indicates the predicted response time course to the high contrast stimulus (at a chosen set of parameters), and the gray curve indicates the response to the low contrast stimulus. The range of values we sweep across for each parameter is the range of values we used for the grid search step to fit each model parameter. In general, the DN model predicts an initial transient response followed by a decay. The width of the initial transient increases with increase in
The plots show the ECoG broadband time course in individual electrodes from ECoG subject S1, averaged across 90 trials (30 repeats each of three stimulus types). Each row shows electrodes from one ROI. Some electrodes (e.g., 74) are in two rows, since the electrode was near an ROI boundary. The plots are color coded by eccentricity bin (0–5°, 5–10°, >10°). The pRF location was based on a separate ECoG pRF data set published previously (Winawer et al., 2013). The two mesh images show a magnified view of S1’s right occipital lobe, exposing the medial surface (left) and lateral surface (right). Insets show the zoomed-out view of the cortical mesh. Related to Figs
Cross-validation over trials. During the experiment, the subject was presented with large field white, pink, and brown noise stimuli, and each image class was repeated 30 times. Each electrode’s response to different image classes was slightly different (e.g. a foveal electrode responded with higher amplitude to white noise compared to brown noise stimuli), and the DN model does not have a spatial component to capture such differences. To discount such differences when cross-validating, we took each “trial” as the average response over one repeat of white, pink, and brown noise images. The black curves are the left-out responses, and the red curves are the DN predictions based on the other 29 “trials.” Each row represents a different ROI, and each column represents a left-out trial. Cross-validation over electrodes. The black curves are the trial-averaged responses from the left-out electrodes, and the red curves are the DN model prediction based on the rest of the electrodes within an ROI. Each row represents an ROI, and each column represents a left-out electrode. Related to
(A) Response time courses from 3 different recording methods are shown. In each plot, the data are in black (±1 sem in gray) and the DN model fit in red. Left: single unit spike rates, averaged across neurons in macaque V1. Middle: Multiunit spike rates from human V2/V3. Right: High frequency broadband power (LFP) from human V2/V3. (B) DN model parameters from human ECoG. The model parameters in each of 4 ROIs are shown for the data plotted in the main text (
Individual electrode time courses and DN model fits in V1-V3. The background color indicates the eccentricity bins: 0°-5° (red), 5°-10° (purple), and >10° (green). There is a general tendency toward greater offset responses in more peripheral electrodes. Related to
We thank Dora Hermes for helpful discussion and for helping us analyze ECoG data from prior work. We also thank Josef Parvizi and the Stanford Human Intracranial Cognitive Electrophysiology Program for helping us with ECoG data acquisition for a prior paper, which was re-analyzed for this paper. We thank Bill Geisler for providing single unit data. We thank David Heeger, Mike Landy, and Jennifer M. Yoon for comments on an earlier draft of this manuscript.