How well do models of visual cortex generalize to out-of-distribution samples?

doi:10.1371/journal.pcbi.1011145

Fig 1.

Neural predictivity as a measure of model-brain similarity.

A: We used electrophysiological recordings from area V4 in three macaque monkeys. Each animal's data were collected across one or more recording sessions (1–4 sessions). The data for each session comprise measured neural responses (firing rates) to a set of naturalistic (Nat.) images and a set of synthetically generated (Syn.) images. B: Illustration of the 2-fold cross-validation procedure that uses principal component regression (PCR) to compute the neural predictivity score for naturalistic and synthetic stimuli. The natural dataset (top) is split randomly into two partitions (i.e. folds) along the stimulus dimension. For each fold, one partition is used to fit a PCR model that predicts the neural data from the unit activations of a given layer of a neural network model. The resulting PCR model is then used to predict the responses to both the natural stimuli in the held-out test partition and the synthetic stimuli. The similarity scores for each stimulus domain (S_nat and S_syn) are computed by combining the predictions across the two folds.
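
The cross-validation procedure in panel B can be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the number of principal components (`n_components`) is a free parameter, the per-site score is assumed to be the Pearson correlation between predicted and measured responses, and the synthetic predictions of the two fold models are assumed to be combined by averaging.

```python
import numpy as np


def pcr_fit(X, y, n_components):
    """Principal component regression: PCA on the unit activations X
    (n_stimuli x n_units), then least squares on the PC scores."""
    mu_x, mu_y = X.mean(axis=0), y.mean()
    Xc = X - mu_x
    # Principal axes from the SVD of the centered activations.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T  # (n_units, k)
    w, *_ = np.linalg.lstsq(Xc @ V, y - mu_y, rcond=None)
    return mu_x, V, w, mu_y


def pcr_predict(model, X):
    mu_x, V, w, mu_y = model
    return (X - mu_x) @ V @ w + mu_y


def two_fold_predictivity(X_nat, y_nat, X_syn, y_syn, n_components=10, seed=0):
    """Return (S_nat, S_syn) for one neuronal site and one network layer.

    Assumptions (hypothetical choices, not taken from the paper): the score is
    the Pearson correlation between predicted and measured responses, and the
    synthetic predictions of the two fold models are averaged.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y_nat)), 2)
    pred_nat = np.zeros(len(y_nat))
    pred_syn = np.zeros(len(y_syn))
    for k in range(2):
        train, test = folds[1 - k], folds[k]
        model = pcr_fit(X_nat[train], y_nat[train], n_components)
        # Held-out natural stimuli: each receives exactly one prediction.
        pred_nat[test] = pcr_predict(model, X_nat[test])
        # Synthetic stimuli are never used for fitting; average both folds.
        pred_syn += pcr_predict(model, X_syn) / 2
    s_nat = np.corrcoef(pred_nat, y_nat)[0, 1]
    s_syn = np.corrcoef(pred_syn, y_syn)[0, 1]
    return float(s_nat), float(s_syn)
```

When `n_components` equals the number of units, PCR reduces to ordinary least squares; smaller values regularize the fit by discarding low-variance directions of the activation space.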

Fig 2.

Gap in neural prediction.

A: Scatter plots of normalized measured responses and the corresponding predictions from a layer of the ResNet50 model for two sample neuronal sites. The neuronal model's generalization capability varies widely across neurons: the left and right plots show example neurons with high and low generalization, respectively. B: Scatter plot of Nat. and Syn. predictivity scores of a neuronal model based on ResNet50 unit activations, for all neuronal sites with high internal consistency (greater than 0.7). The corner histogram shows the distribution, across neuronal sites, of the difference between in-distribution and out-of-distribution predictivity scores; C: Neural predictivity score of the ResNet50 neuronal model and internal consistency of the neural data in the naturalistic and synthetic domains, for neural data collected from different animals (M, N, and S) and different recording sessions (S1–4); D: Bar plot of Nat. and Syn. predictivity scores and the neural predictivity gap for 7 different neural network models; E: Comparison of Nat. and Syn. predictivity scores for the ResNet50 model when the regression model was fitted on naturalistic data (left) and on synthetic data (right). The regression model fitted to the synthetic domain generalizes worse to the naturalistic domain. All error bars denote the variance across 5 repetitions of each analysis.
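
The site filter and the per-site gap in panel B can be sketched as follows. The caption does not define internal consistency, so this sketch assumes a Spearman-Brown-corrected split-half reliability computed over repeated presentations; the function names and the `n_splits` parameter are hypothetical. The gap is taken as S_nat minus S_syn, matching the in- minus out-of-distribution difference described above.

```python
import numpy as np


def split_half_consistency(trials, n_splits=20, seed=0):
    """Split-half reliability of one site's responses (assumed definition).

    trials: (n_repeats, n_stimuli) array of single-trial responses. Repeats are
    split randomly in half, the two half-averages are correlated across stimuli,
    and the Spearman-Brown correction 2r / (1 + r) is applied.
    """
    rng = np.random.default_rng(seed)
    n_rep = trials.shape[0]
    estimates = []
    for _ in range(n_splits):
        perm = rng.permutation(n_rep)
        a = trials[perm[: n_rep // 2]].mean(axis=0)
        b = trials[perm[n_rep // 2 :]].mean(axis=0)
        r = np.corrcoef(a, b)[0, 1]
        estimates.append(2 * r / (1 + r))
    return float(np.mean(estimates))


def predictivity_gap(s_nat, s_syn, consistency, threshold=0.7):
    """Per-site gap S_nat - S_syn, restricted to sites whose internal
    consistency exceeds the threshold; also returns the mean gap."""
    keep = np.asarray(consistency) > threshold
    gap = np.asarray(s_nat)[keep] - np.asarray(s_syn)[keep]
    return gap, float(gap.mean())
```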

Fig 3.

Assumptions on brain-model correspondence affect generalization in neural predictivity.

A: Neural predictivity scores from unit activity in each layer of the ResNet50 architecture for individual neuronal sites recorded during example sessions from different animal subjects. From top, rows correspond to M-S1, N-S1, and S-S1. Colors correspond to the neural predictivity score on the natural (green) and synthetic (blue) domains. Different shades correspond to different neuronal sites in the same animal. Bold lines correspond to the average predictivity score in each domain across all neuronal sites within that animal's session; B: Number of neurons with the highest neural predictivity in a given layer, for the same session as the corresponding row in A. Colors are the same as in A; C: Distribution of the difference between the ResNet50 layer at which each neuronal site is best predicted in-distribution (ID) and the layer at which it is best predicted out-of-distribution (OOD). The difference is calculated as the best ID layer number minus the best OOD layer number. The distribution spans a wide range but has a slightly negative mean (-1.29), i.e. on average sites are best predicted out-of-distribution by a slightly deeper layer; D: Comparison of neural predictivity scores on the natural and synthetic domains when the brain-model correspondence follows a Layer-Area (LA) or a Layer-Neuron (LN) mapping assumption.
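
Panels C and D can be sketched from a table of per-layer, per-site predictivity scores. This is an illustration under assumptions, not the paper's code: the function names are hypothetical, and both mappings are assumed to select layers using natural-domain scores only. Under LA a single layer serves the whole area (here, the layer with the highest mean natural score); under LN each site is assigned its own best layer.

```python
import numpy as np


def best_layer_shift(scores_nat, scores_syn):
    """scores_*: (n_layers, n_sites) predictivity of each layer for each site.
    Returns, per site, the best ID layer index minus the best OOD layer index."""
    return np.argmax(scores_nat, axis=0) - np.argmax(scores_syn, axis=0)


def mapping_scores(scores_nat, scores_syn):
    """Mean (natural, synthetic) predictivity under the LA and LN mappings.
    Layers are selected on natural-domain scores only (an assumption)."""
    sites = np.arange(scores_nat.shape[1])
    # Layer-Area: one layer for the whole area.
    la = int(np.argmax(scores_nat.mean(axis=1)))
    la_scores = (scores_nat[la].mean(), scores_syn[la].mean())
    # Layer-Neuron: every site gets its own best layer.
    ln = np.argmax(scores_nat, axis=0)
    ln_scores = (scores_nat[ln, sites].mean(), scores_syn[ln, sites].mean())
    return la_scores, ln_scores
```

Selecting layers and reporting scores on the same natural-domain data, as this sketch does, is optimistic for the natural domain; a held-out split for layer selection would avoid that bias.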

Fig 4.

Object recognition performance and neural predictivity.

A: Example images from five object recognition benchmarks. Due to copyright restrictions, the ImageNet examples shown are publicly available images that are qualitatively similar to those in that dataset. B: Object recognition performance of five example neural network models on different benchmarks. Object recognition performance on the out-of-distribution benchmarks is significantly lower than on the ImageNet validation set. C: Measures of object recognition performance do not correlate with neural predictivity scores on either the natural or the synthetic domain.
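
One way to check a claim like panel C is to correlate a per-model recognition measure with per-model predictivity and assess the correlation by permutation. The caption does not say which correlation or test was used, so the Pearson correlation, the two-sided permutation test, and the function name below are all assumptions.

```python
import numpy as np


def permutation_corr_test(x, y, n_perm=2000, seed=0):
    """Pearson r between two per-model measures (e.g. accuracy and neural
    predictivity), with a two-sided permutation p-value obtained by
    shuffling y across models."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    r_obs = np.corrcoef(x, y)[0, 1]
    rng = np.random.default_rng(seed)
    null = np.array([np.corrcoef(x, rng.permutation(y))[0, 1] for _ in range(n_perm)])
    # Add-one correction so the p-value is never exactly zero.
    p = (np.sum(np.abs(null) >= abs(r_obs)) + 1) / (n_perm + 1)
    return float(r_obs), float(p)
```

With only a handful of models, the number of distinct permutations is small, so the smallest attainable p-value is limited by the number of models as well as by `n_perm`.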

Fig 5.

Robustness to adversarial perturbation improves neural predictivity.

A: Scatter plot of neural predictivity on the natural (top) and synthetic (bottom) domains as a function of robust accuracy (i.e. the average accuracy under several adversarial perturbations); B: Comparison of neural predictivity scores on naturalistic (top) and synthetic (bottom) stimuli across a range of models optimized for improved robustness. All error bars denote the variance across 5 repetitions of each analysis; C: Effect of train-time perturbation magnitude on neural predictivity. Only robust models trained with L₂-norm attacks and similar attack settings are included here; 5 models that use different attack norms, numbers of steps, and step sizes were excluded; D: Scatter plot of neural predictivity scores for ResNet50 and ResNet50 () models. Each dot corresponds to an individual neuronal site.
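
Robust accuracy as defined in panel A is simply a mean over attack settings. The sketch below shows that average together with the projection onto an L₂ ball of radius `eps`, which is the constraint that L₂-bounded attacks such as projected gradient descent (PGD) enforce at each step; the perturbation magnitude in panel C plays the role of `eps`. The function names and the attack names in the usage are hypothetical, and this is not the paper's attack implementation.

```python
import math
from statistics import mean


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def robust_accuracy(preds_by_attack, labels):
    """Average accuracy over several adversarial attack settings.
    preds_by_attack maps a (hypothetical) attack name to the model's
    predictions on the correspondingly perturbed images."""
    return mean(accuracy(p, labels) for p in preds_by_attack.values())


def project_l2(delta, eps):
    """Project a perturbation onto the L2 ball of radius eps."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= eps:
        return list(delta)
    return [d * eps / norm for d in delta]
```

For example, `robust_accuracy({"attack_a": [0, 1, 1, 0], "attack_b": [0, 0, 1, 1]}, [0, 1, 1, 0])` averages accuracies of 1.0 and 0.5.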

Fig 6.

Design choices affect neural predictivity.

A: Bar plot of the neural predictivity score for different variations of the ResNet architecture on the naturalistic (top) and synthetic (bottom) domains. Deeper and wider networks tend to yield higher neural predictivity scores. B: Unsupervised learning algorithms do not significantly improve neural predictivity on natural stimuli, while Momentum Contrast leads to significantly higher neural predictivity on synthetic stimuli. C: Larger datasets with a substantially higher number of categories (Instagram) and examples (CLIP) do not improve neural predictivity in either stimulus domain. All error bars denote the variance across 5 repetitions of each analysis.

Fig 7.

Consistency of predictions and neural predictivity scores across neuronal models.

A: (Left column) Histograms of the consistency of the neural predictivity scores of different neuronal sites across neuronal models constructed from different layers within the same neural network architecture (top row) and from layers of different neural network architectures (bottom row). (Right column) Same as the left column, except that consistency is computed for image-level predictions; B: Comparison of neural predictivity scores between single and ensemble neuronal models in the natural (top) and synthetic (bottom) domains. The single-layer model is the best neuronal model for each neuronal site within a neural network model. Each ensemble model is constructed by aggregating the predictions of the top-5 neuronal models within the same network for each neuronal site; C: Comparison of neural predictivity scores in the natural and synthetic domains for the best single model and for within-model and cross-model ensembles. All error bars denote the variance across 5 repetitions of each analysis.
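
The top-5 ensemble in panel B and the consistency measure in panel A can be sketched as follows. The caption says only that predictions are "aggregated"; equal-weight averaging, ranking by natural-domain predictivity, Pearson correlation as the consistency measure, and the function names are assumptions of this sketch.

```python
import numpy as np


def topk_ensemble(preds, scores, k=5):
    """Ensemble prediction for one neuronal site.

    preds: (n_models, n_stimuli) predictions of candidate neuronal models
    (e.g. one per layer); scores: (n_models,) their natural-domain predictivity.
    The k best-scoring models are averaged with equal weights (an assumption).
    """
    top = np.argsort(scores)[::-1][:k]
    return np.asarray(preds)[top].mean(axis=0)


def score_consistency(scores_a, scores_b):
    """Pearson r between two neuronal models' per-site predictivity scores."""
    return float(np.corrcoef(scores_a, scores_b)[0, 1])
```

A within-model ensemble draws its candidates from the layers of one network, while a cross-model ensemble pools candidates from several networks before ranking. `score_consistency` applies equally to image-level predictions by passing two models' per-image predictions for the same site.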
