Fig 1.
Overview of our divisive normalization (DN) model.
The model takes as input an image covering 1.1° of the visual field and predicts each neuron’s spike count in response to this image (details in Fig 2). The model is split into two parts: a core that computes a shared nonlinear feature space, and a readout that maps this shared feature space to each neuron’s spike count individually. A. Divisive normalization mechanism (simplified). The visual input is convolved with 32 filters, each covering 0.4° of the visual field, and then rectified and exponentiated to produce an excitatory output. The output of each filter is then divided by a weighted sum of the excitatory outputs of all filters, with normalization weights p_kl and a semi-saturation constant σ_l. In our general formulation, all weights and constants are learned from the data. B. Readout that maps the shared feature space to each neuron’s spike count through an individual weighted sum over the entire shared feature space, followed by a pointwise output nonlinearity. The readout weights are factorized into a feature vector (capturing the nonlinear features that a neuron computes) and a spatial mask (localizing each neuron’s receptive field, RF).
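As a rough illustration of panels A and B, the following NumPy sketch applies the normalization and a factorized readout to precomputed filter responses. It is a minimal sketch under stated assumptions, not the fitted model: the array shapes, the function names, the per-filter exponents `n`, the way σ_l enters the denominator, and the softplus output nonlinearity are all illustrative choices; Eq (1) and the Methods define the exact form.

```python
import numpy as np

def divisive_normalization(y, p, sigma, n):
    """Sketch of the mechanism in panel A.

    y     : (K, H, W) linear filter responses (convolution already applied)
    p     : (K, K) non-negative normalization weights, p[k, l]
    sigma : (K,) semi-saturation constants, one per normalized filter l
    n     : (K,) exponents, one per filter (assumed positive)
    """
    # Rectify and exponentiate to obtain the excitatory output of each filter.
    e = np.maximum(y, 0.0) ** n[:, None, None]
    # Weighted sum over all filters k for each normalized filter l.
    pool = np.einsum("kl,khw->lhw", p, e)
    # Assumption: the semi-saturation constant is added to the pooled input.
    return e / (sigma[:, None, None] + pool)

def readout(z, feature_vec, mask):
    """Sketch of panel B for a single neuron.

    z           : (K, H, W) normalized feature space
    feature_vec : (K,) feature weights
    mask        : (H, W) spatial mask
    """
    # Factorized weighted sum over features and space.
    drive = np.einsum("k,hw,khw->", feature_vec, mask, z)
    # Assumption: softplus stands in for the pointwise output nonlinearity.
    return np.log1p(np.exp(drive))
```

With all responses, weights, constants and exponents set to one and K = 2 filters, every normalized output equals 1 / (1 + 2) = 1/3, which is a convenient sanity check for the indexing.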
Fig 2.
Experimental paradigm from Cadena and colleagues [14].
Natural images covering 2° of visual angle were flashed to a monkey, centered on the multi-unit receptive field. Multiple neurons were isolated from recordings with silicon probes inserted into V1 [41]. The images were shown in a fast sequence without blanks, each presented for 60 ms. For each image, spike counts of all isolated neurons were extracted from a 60 ms window starting 40 ms after image onset.
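The spike-count window can be sketched as follows. This is an illustrative helper, not the original analysis code: the function name, the millisecond time base, and the half-open interval convention are assumptions.

```python
import numpy as np

def count_spikes(spike_times_ms, onset_ms, delay_ms=40.0, duration_ms=60.0):
    """Count spikes in a window that starts `delay_ms` after image onset
    and lasts `duration_ms`.

    Assumption: the window is half-open, [start, start + duration).
    """
    t = np.asarray(spike_times_ms, dtype=float)
    start = onset_ms + delay_ms
    stop = start + duration_ms
    return int(np.sum((t >= start) & (t < stop)))
```

For an image shown at 0 ms, this counts spikes between 40 ms and 100 ms after onset.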
Fig 3.
Performance of our models fitted to the data from Cadena and colleagues [14], expressed relative to the gap between the best shallow model (a one-layer subunit convolutional neural network, CNN) and the deeper, data-driven, state-of-the-art three-layer CNN [14].
Non-specific divisive normalization (DN) closes 41% of this gap, and specific DN closes 52%. Absolute performance, in percent of explainable variance explained (FEV), is shown on the right (mean over the ten best models selected by validation set accuracy; see main text for details). Error bars (black) indicate the standard error of the mean. Model performance differs significantly between every pair of model classes (pairwise Wilcoxon signed-rank test on the best models in terms of validation accuracy: p < 0.024, N = 166 neurons, family-wise error rate α = 0.05 using Holm-Bonferroni correction).
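The relative scale of this figure and the multiple-comparison correction can be sketched as below. The function names and any numbers passed to them are hypothetical; the Holm-Bonferroni routine is the standard step-down procedure and is not taken from the original analysis code.

```python
import numpy as np

def fraction_of_gap(fev_model, fev_shallow, fev_deep):
    """Fraction of the shallow-to-deep performance gap closed by a model."""
    return (fev_model - fev_shallow) / (fev_deep - fev_shallow)

def holm_bonferroni(p_values, alpha=0.05):
    """Standard Holm-Bonferroni step-down procedure.

    Returns a boolean array, True where the null hypothesis is rejected
    at family-wise error rate `alpha`.
    """
    p = np.asarray(p_values, dtype=float)
    order = np.argsort(p)
    reject = np.zeros(p.size, dtype=bool)
    for rank, idx in enumerate(order):
        # The smallest p-value is tested against alpha / m, the next
        # against alpha / (m - 1), and so on.
        if p[idx] <= alpha / (p.size - rank):
            reject[idx] = True
        else:
            break  # stop at the first non-rejection
    return reject
```

A model that sits exactly halfway between the shallow and the deep model yields a value of 0.5 on this scale.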
Fig 4.
Cross-orientation inhibition was learned by our DN model and the three-layer CNN, but not significantly by the subunit model.
A. Tuning curves of an example neuron for all three models and various contrast combinations of the optimal Gabor (box on the right; examples for contrasts of 0%, 1% and 2% not shown) and an orthogonal Gabor mask. As the contrast of the orthogonal mask increases, the model prediction (normalized by the maximum response) decreases. The cross-orientation inhibition (inhib.) index measures the percentage by which adding the mask reduces the response relative to the optimal Gabor presented alone, in this case approximately 20%. A. insets: Illustration of plaid stimuli, created by overlaying an optimally oriented Gabor with an orthogonal mask. B. Histograms of the cross-orientation inhibition indices accumulated across the ten best models (in terms of validation set accuracy) per model type, with a kernel density estimate of the underlying distribution. The fraction of cells that show more than 10% cross-orientation inhibition is displayed to the right of the dotted line (mean and 95% confidence interval over the ten best models selected by validation set accuracy). For the DN model, more cells show cross-orientation inhibition than for the other models. The subunit model shows almost no cross-orientation inhibition.
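One way to compute such an index from model predictions is sketched below. The function names are hypothetical, and which contrast combination is used for the plaid response is an assumption left to the caller; the Methods define the exact procedure.

```python
import numpy as np

def cross_orientation_inhibition_index(r_gabor_alone, r_plaid):
    """Percentage by which adding the orthogonal mask reduces the response
    relative to the optimal Gabor presented alone."""
    return 100.0 * (r_gabor_alone - r_plaid) / r_gabor_alone

def fraction_inhibited(indices, threshold=10.0):
    """Fraction of cells whose index exceeds `threshold` percent."""
    indices = np.asarray(indices, dtype=float)
    return float(np.mean(indices > threshold))
```

A response that drops from 1.0 (Gabor alone) to 0.8 (plaid) gives an index of about 20%, comparable to the example neuron in panel A.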
Fig 5.
Structure of divisive normalization.
A. The matrix shows the average strength of the normalizing inputs (products in the denominator of Eq (1), averaged across images; see Methods) for each combination of filter response being normalized (rows) and filter response providing normalizing input (columns). Darker shades of blue indicate stronger normalization. Orientation-selective filters are grouped at the top, ordered by preferred orientation and marked by the black square. The dashed black lines within the square separate pairs of filters with similar (< 45°) and dissimilar (≥ 45°) orientations. Normalizing inputs are stronger for similarly tuned filters. Unoriented filters, which mainly account for orientation-unspecific contrast, are sorted by total normalization input. The darkest blue corresponds to the maximum normalization input within the group of oriented filters; higher normalization input values for the unoriented filters are clipped to that value. Data from the model with the highest accuracy on the validation set are shown. B. Normalization input from similar orientations (< 45°) compared to the normalization input from dissimilar orientations (≥ 45°) for each oriented linear filter. Grey line: identity. Most features are normalized preferentially by the responses of filters with similar preferred orientations. Data from the model with the highest accuracy on the validation set are shown. C. Normalization input, binned by orientation difference in steps of 10°. Each bin was averaged over the top-10 models (assessed on the validation set). The shaded area depicts the standard deviation per bin. C inset. Normalization input (norm. input) vs. cosine similarity between linear filters (cos. sim.), averaged across the top-10 models (assessed on the validation set). A cosine similarity greater than zero corresponds to similar features. Error bars: standard error of the mean. D. Histogram of DN exponents (n_l in Eq (1)) of the ten best-performing models in terms of validation set accuracy. Darker/lighter color: exponents corresponding to driving inputs from oriented/unoriented linear filters. Most values are larger than one, with a few exceptions mainly corresponding to unoriented filters.
Fig 6.
Histogram of feature readout weights of the ten best performing models in terms of validation set accuracy.
For each model, feature weights are normalized across channels and averaged across individual neurons. All channels of each model are used to predict neural activity.
Fig 7.
Size-tuning in silico experiments and spatially extended DN control models.
A. inset: Predictions of the best DN model (chosen by validation set accuracy) for all neurons in response to gratings of increasing size. The gratings’ properties were determined from each unit’s optimally stimulating Gabor pattern. As grating diameter increased, only very few neurons showed suppression, and it was mostly weak. Predictions are normalized to the maximum response per neuron. The suppression index measures asymptotic suppression relative to the maximum prediction. A. main panel: Across all neurons and the ten best DN models (chosen by validation set accuracy), almost no neurons show significant surround suppression. B. Test set performance of the ten best-performing DN models. Performance decreases rapidly as the spatial extent of the normalization pool increases (pool size in degrees of visual angle). The best model on the validation set is indicated by a blue dot. C. & D. Weights of the spatial normalization pool for the best-performing models (by validation set accuracy) with pool sizes of (C.) 1.06° of visual field (5 px × 5 px) and (D.) 1.34° of visual field (7 px × 7 px). For each feature (columns), the two components (rows) of each of the 32 spatial normalization pools are shown. Darker color corresponds to higher weights. Both components are similar. B. insets: Average across features and normalization pool components. On average, the model learned normalization from the receptive field center.
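A suppression index of this kind can be sketched as follows. This is one common reading of "asymptotic suppression relative to the maximum prediction": the function name and the use of the response at the largest diameter as the asymptote are assumptions, and the Methods may define it differently.

```python
import numpy as np

def suppression_index(responses):
    """Size-tuning suppression index for one neuron.

    responses : predicted responses to gratings of increasing diameter.
    Assumption: the response at the largest diameter approximates the
    asymptotic response.
    """
    r = np.asarray(responses, dtype=float)
    r_max = r.max()
    r_asymptote = r[-1]
    return float((r_max - r_asymptote) / r_max)
```

A neuron whose response keeps growing with grating diameter has an index of 0, and a neuron whose response at the largest diameter falls to 90% of its peak has an index of about 0.1.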