Fig 1.
Schematic of neural similarity regularization.
Machine learning models are trained so that the stimulus manifold in the model feature space resembles the manifold in the neural response space. The simplest version considers only pairwise relationships among stimulus instances: two images that are close in the neural response space should also be close in the model feature space.
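The pairwise version of this idea can be sketched as a penalty on the mismatch between model and neural similarity matrices. The following is a minimal illustration only: the cosine similarity measure, the squared-error form, and the names `pairwise_cosine_similarity` and `similarity_penalty` are assumptions made for this sketch and are not taken from the regularization methods cited in [7] or [8].

```python
import numpy as np


def pairwise_cosine_similarity(x):
    """Return the (n, n) cosine similarity matrix for the n rows of x."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)  # guard against zero-length rows
    return unit @ unit.T


def similarity_penalty(model_features, neural_similarity):
    """Mean squared mismatch between model and neural pairwise similarities.

    model_features: (n, d) array, one feature vector per stimulus.
    neural_similarity: (n, n) array of similarities between the neural
        responses to the same n stimuli (assumed to be precomputed).
    Only distinct pairs (i < j) contribute to the penalty.
    """
    model_similarity = pairwise_cosine_similarity(model_features)
    i, j = np.triu_indices(model_similarity.shape[0], k=1)
    diff = model_similarity[i, j] - np.asarray(neural_similarity, dtype=float)[i, j]
    return float(np.mean(diff ** 2))
```

In a training loop, a term of this kind would typically be added to the task loss with a weighting coefficient, so that images with similar neural responses are pushed toward similar model features.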
Fig 2.
Neural regularization boosts model robustness and makes the model less sensitive to high-frequency components of the input.
A ResNet18 model (orange in a–d) was trained on grayscale CIFAR10 with mouse neural similarity regularization [7], and a VGG19 model (orange in e–h) was trained on grayscale TinyImageNet with monkey neural response regularization [8]. (a) Grayscale CIFAR10 classification accuracy under common corruptions at different severity levels. Accuracy averaged over all corruptions is reported for a baseline ResNet model (black) and a mouse regularized model (orange). (b) Success rate of targeted attacks at different perturbation budgets ϵ, using the boundary attack [20] with an L∞ metric. (c) Classification accuracy under different types of corruptions, broken down into three groups based on their frequency characteristics (Table A in S1 Appendix). Model performance is averaged over all severity levels. (d) Radial profile of the Fourier spectrum of adversarial perturbations. We found the minimal adversarial perturbation for every test image and computed the average Fourier spectrum of these perturbations (insets). In each heat map, blue marks the minimum value and red the maximum; a logarithmic color scale is used for better visualization. The fraction of power below different frequency thresholds is compared between the baseline and neurally regularized models. The abscissa is the absolute value of the spatial frequency, normalized by the sampling frequency fs. (e–h) Same as a–d, but comparing a baseline VGG model with a model co-trained with monkey neural data on the grayscale TinyImageNet dataset [21].
Fig 3.
Probing the frequency sensitivity of the mouse regularized model using hybrid images.
(a) Examples of hybrid images at different mixing frequencies. A hybrid image is constructed by combining the low-frequency component of one image with the high-frequency component of another, where the two seed images belong to different categories. Mixing frequencies are normalized by the Nyquist frequency. (b) Model predictions on hybrid images at different mixing frequencies. As more low frequencies from one image are included, the probability plow that a network reports that image's label increases. The reversal frequency frev, where plow = phigh, is smaller for the mouse regularized model (‘neural’) than for the baseline model (‘base’).
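The hybrid image construction in (a) can be sketched in the Fourier domain. This is a minimal illustration that assumes a hard circular cutoff at the mixing frequency; the actual filter shape used for the figure is not specified in the caption, and the function names `radial_frequency` and `hybrid_image` are hypothetical.

```python
import numpy as np


def radial_frequency(shape):
    """Radial spatial frequency of each FFT bin, in units of the Nyquist frequency."""
    fy = np.fft.fftfreq(shape[0])  # cycles per pixel, in [-0.5, 0.5)
    fx = np.fft.fftfreq(shape[1])
    return np.hypot(fy[:, None], fx[None, :]) / 0.5  # Nyquist = 0.5 cycles per pixel


def hybrid_image(low_source, high_source, mixing_frequency):
    """Combine low frequencies of one image with high frequencies of another.

    Frequencies at or below mixing_frequency (normalized by Nyquist) come
    from low_source; all remaining frequencies come from high_source.
    Assumption: a hard cutoff mask, chosen here for simplicity.
    """
    low_source = np.asarray(low_source, dtype=float)
    high_source = np.asarray(high_source, dtype=float)
    if low_source.shape != high_source.shape:
        raise ValueError("both seed images must have the same shape")
    low_pass = radial_frequency(low_source.shape) <= mixing_frequency
    spectrum = np.where(low_pass, np.fft.fft2(low_source), np.fft.fft2(high_source))
    # The mask depends only on |f|, so the result is real up to rounding error.
    return np.fft.ifft2(spectrum).real
```

Sweeping `mixing_frequency` from 0 upward moves the hybrid from being almost entirely the high-frequency seed image toward being entirely the low-frequency seed image, which is the axis varied in (b).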
Fig 4.
Frequency analysis of adversarial attacks on robust models trained on CIFAR10.
(a) The Fourier spectrum of the minimal adversarial perturbations of different models, including six baseline models (‘base’), seven models trained for adversarial robustness (‘adv’), two models trained for corruption robustness (‘crp’), one model with preprocessing by blurring (‘blur’), and one with preprocessing by PCA compression (‘pca’). Model details are listed in Table B in S1 Appendix. The spectrum is averaged over 1000 images, and color maps are normalized separately for each panel. (b) Radial profiles of adversarial perturbation spectra. Light thin lines represent individual models, while thick lines are the average within each group. The frequency at which each line crosses 50% is denoted the half-power frequency f0.5. (c) Scatter plot of minimum adversarial perturbation size versus f0.5 for all models.
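The radial profile and the half-power frequency f0.5 can be sketched as follows. This is an illustration under stated assumptions: frequencies are expressed in cycles per pixel (normalized by the sampling frequency, as in Fig 2d), the 50% crossing is located by linear interpolation, and the helper names `cumulative_power_profile` and `half_power_frequency` are hypothetical. For simplicity the sketch processes a single perturbation, whereas the figure uses a spectrum averaged over many images.

```python
import numpy as np


def cumulative_power_profile(perturbation, thresholds):
    """Fraction of spectral power at radial frequencies <= each threshold.

    Frequencies are in cycles per pixel, i.e. normalized by the sampling
    frequency, so the Nyquist frequency along each axis is 0.5.
    """
    perturbation = np.asarray(perturbation, dtype=float)
    power = np.abs(np.fft.fft2(perturbation)) ** 2
    fy = np.fft.fftfreq(perturbation.shape[0])
    fx = np.fft.fftfreq(perturbation.shape[1])
    radius = np.hypot(fy[:, None], fx[None, :])
    total = power.sum()
    return np.array([power[radius <= t].sum() / total for t in thresholds])


def half_power_frequency(thresholds, profile):
    """Frequency at which the cumulative profile first reaches 50%.

    Assumption: linear interpolation between the two neighboring thresholds.
    """
    thresholds = np.asarray(thresholds, dtype=float)
    profile = np.asarray(profile, dtype=float)
    above = np.nonzero(profile >= 0.5)[0]
    if above.size == 0:
        raise ValueError("profile never reaches 50% within the given thresholds")
    k = above[0]
    if k == 0:
        return float(thresholds[0])
    t0, t1 = thresholds[k - 1], thresholds[k]
    p0, p1 = profile[k - 1], profile[k]
    return float(t0 + (0.5 - p0) * (t1 - t0) / (p1 - p0))
```

A lower f0.5 means that more of the perturbation power sits at low spatial frequencies.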
Fig 5.
Hybrid CIFAR10 image classification performance of robust models.
(a) Difference between the probabilities of choosing the low-frequency label and the high-frequency label of hybrid images. As in Fig 4, light thin lines represent individual models and thick lines are the average within each group. (b) Scatter plot of model accuracy on the CIFAR10-C dataset versus the reversal frequency frev in hybrid image classification. The dashed green line shows the performance of a series of ‘blur’ models with different degrees of low-pass filtering. The left end corresponds to σ = 3 pixels and the right end to σ = 1 pixel, while the green dot is the model with σ = 1.5 pixels listed in Table B in S1 Appendix.
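Given the two probability curves measured at a set of mixing frequencies, the reversal frequency frev can be estimated as the point where their difference changes sign. The sketch below assumes linear interpolation between sampled mixing frequencies; the function name `reversal_frequency` and the interpolation scheme are illustrative choices, not details given in the caption.

```python
import numpy as np


def reversal_frequency(mixing_frequencies, p_low, p_high):
    """Mixing frequency at which p_low - p_high first crosses zero from below.

    mixing_frequencies: increasing sequence of mixing frequencies.
    p_low, p_high: probabilities of reporting the low-frequency and
        high-frequency labels at each mixing frequency.
    Assumption: the crossing is located by linear interpolation.
    """
    f = np.asarray(mixing_frequencies, dtype=float)
    diff = np.asarray(p_low, dtype=float) - np.asarray(p_high, dtype=float)
    crossings = np.nonzero((diff[:-1] < 0) & (diff[1:] >= 0))[0]
    if crossings.size == 0:
        raise ValueError("p_low - p_high never crosses zero from below")
    k = crossings[0]
    return float(f[k] - diff[k] * (f[k + 1] - f[k]) / (diff[k + 1] - diff[k]))
```

A smaller value indicates that a model switches to the low-frequency label after seeing fewer low-frequency components, that is, it relies more on low spatial frequencies.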
Fig 6.
Frequency analysis of models trained on ImageNet.
One baseline model (‘base’), two models (‘adv’) trained for adversarial robustness, and six models (‘crp’) trained for corruption robustness are compared. (a) Minimum adversarial perturbation size ϵ versus the half-power frequency f0.5 calculated from adversarial perturbation spectra. (b) Model accuracy on the ImageNet-C dataset versus the reversal frequency frev calculated from the hybrid image experiment.