
Fig 1.

Illustration of our hybrid model combining SI and EC.

a. Illustration of two common approaches to studying visual systems: system identification, symbolized by the green-labeled branch, aims at predicting responses of neuronal circuits (black rectangle) to specific stimuli, whereas efficient coding (purple-labeled branch) seeks to work out principles of the visual system based on environmental statistics. As these two approaches are rarely combined in a single modeling framework, their potential synergies remain largely unexplored. b. Our hybrid modeling approach combines system identification (green) and efficient coding (purple) in a single model with shared filters (red circle) to predict neural responses to arbitrary visual stimuli.
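The shared-filter idea in (b) can be sketched numerically. The sketch below is a toy illustration only: the shapes, the rectifying nonlinearity, the linear decoder, the stand-in "neural response", and the way a weight w mixes the two objectives are all assumptions for illustration, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared bank of spatial filters feeds both branches (hypothetical sizes).
n_filters, patch = 4, 25                      # 4 filters on 5x5 patches
shared = rng.standard_normal((n_filters, patch))

x = rng.standard_normal((100, patch))          # 100 stimulus patches
z = x @ shared.T                               # shared encoding stage

# SI branch: readout predicting one neuron's response (rectified).
readout = rng.standard_normal(n_filters)
pred = np.maximum(z @ readout, 0)

# EC branch: linear decoder reconstructing the input patch.
decoder = rng.standard_normal((n_filters, patch))
recon = z @ decoder

# A weight w trades off the two objectives during joint training.
w = 0.2
target = np.maximum(x @ shared[0], 0)          # stand-in neural response
loss = (1 - w) * np.mean((pred - target) ** 2) + w * np.mean((recon - x) ** 2)
```

Training such a model updates `shared` through both loss terms at once, which is what couples the two branches.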


Fig 2.

Hybrid model with shared spatial filters.

a,b. Schemata of the SI model (a) and the EC model (b) from [39]. The SI model branch consists of spatial and temporal convolutional layers, a fully connected (FC) layer and a nonlinear layer (see Methods). The EC model branch is a convolutional autoencoder, consisting of an encoder and a decoder network. In the hybrid model, the two branches were trained in parallel with shared spatial filters (all spatial filters were shared; red). InputSI: 8-frame UV-green noise (t1–t8); OutputSI: predicted GCL cell Ca2+ responses; InputEC: UV-green natural images; OutputEC: reconstructed InputEC. c. Examples of the different inputs (natural images, phase-scrambled natural images, and noise) for the EC branch in hybrid models (hybrid-natural, hybrid-pha-scr, hybrid-noise). d. Using PCA filters as basis vectors for the spatial convolutional filters of the SI model; SI-PCA learned 16 weight vectors, each with the same length as the number of PCA basis elements.
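The SI-PCA parameterization in (d) amounts to expressing each spatial filter as a learned linear combination of fixed PCA basis images. A minimal sketch, with hypothetical filter sizes and a random orthonormal basis standing in for the PCA components of natural-image patches:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 16 spatial filters of 9x9 px, each a weighted
# sum of 16 fixed basis images.
n_filters, filt_h, filt_w = 16, 9, 9
n_basis = 16

# Fixed orthonormal basis (stand-in for PCA components of image patches).
basis, _ = np.linalg.qr(rng.standard_normal((filt_h * filt_w, n_basis)))

# Learnable weights: one coefficient vector per filter, with length equal
# to the number of basis elements (as in the caption).
weights = rng.standard_normal((n_filters, n_basis))

# Spatial filters are reconstructed from weights and basis.
filters = (weights @ basis.T).reshape(n_filters, filt_h, filt_w)
```

During training only `weights` would be optimized, which restricts the filters to the subspace spanned by the basis.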


Fig 3.

Neural encoding tasks benefit from natural scene statistics.

a. Region-of-interest (ROI) mask of one recording field in dorsal retina (left) and mean Ca2+ responses (black) of exemplary ROIs in response to 6 repeats of noise stimuli (single trials in gray). b. Three representative GCL cell responses (gray) to the noise stimulus (cf. Fig 2a, left), together with predictions of best performing models on test data (black, SI; red, hybrid with natural scenes as input to the EC path, i.e., InputEC), and learned spatio-temporal receptive fields (RFs) visualized by SVD. c. Model performance (linear correlation coefficient, CC; mean for n = 10 random seeds per model) based on validation data for the hybrid model with natural scenes (red), with phase-scrambled scenes (brown), or with noise (magenta) as InputEC, and for different weights. Note that the correlation values for the validation data are relatively low because these predictions were calculated on a single-trial basis (Methods). d. Best performance (mean for n = 10 random seeds per model) based on test data for SI, SI-PCA (16 bases), SI-DCT (4 bases), hybrid-natural (w = 0.2), hybrid-pha-scr (w = 0.3) and hybrid-noise (w = 0.4; p < 0.0001 for SI vs. hybrid-natural, p = 0.0085 for SI-PCA vs. hybrid-natural, p = 0.0011 for hybrid-natural vs. hybrid-pha-scr, two-sided permutation test, n = 10,000 repeats). e. Scatter plot of model predictions based on test data for hybrid-natural (w = 0.2) vs. SI at one random seed, with each dot representing one neuron. f. Representative spatial filters (shared convolutional filters) for hybrid models with different InputEC and different weights. Upper: with w = 0.5; lower: with optimal w (see (c)) for hybrid models. g. Mean R-squared of fitting a 2D Gaussian to spatial filters (cf. (f)), for the hybrid model with natural scenes (red), with phase-scrambled scenes (brown), or with noise (magenta) as InputEC, and for different w (n = 10 random seeds per model). h. Representative spatial filters (shared convolutional filters) for SI, SI with PCA filters (16 bases) and SI with DCT filters (4 bases). i. Mean R-squared of fitting a 2D Gaussian to the spatial filters for one chromatic stimulus channel (green; n = 10 random seeds per model; p < 0.0001 for SI vs. hybrid-natural, p < 0.0001 for SI-PCA vs. hybrid-natural, p = 0.0074 for hybrid-natural vs. hybrid-pha-scr, two-sided permutation test, n = 10,000 repeats). Error bars in (c),(d),(g),(i) represent 2.5 and 97.5 percentiles obtained from bootstrapping.
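The filter-smoothness metric in (g) and (i) — R-squared of a 2D Gaussian fit — can be sketched as follows. The filter here is synthetic (a noisy Gaussian blob standing in for a learned convolutional filter), and the parameterization (amplitude, center, axis-aligned widths) is an assumption; the paper's exact fitting procedure is described in its Methods.

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss2d(xy, amp, x0, y0, sx, sy):
    """Axis-aligned 2D Gaussian evaluated at coordinate arrays (x, y)."""
    x, y = xy
    return amp * np.exp(-((x - x0) ** 2 / (2 * sx ** 2)
                          + (y - y0) ** 2 / (2 * sy ** 2)))

# Synthetic "spatial filter": a Gaussian blob plus noise.
h = w = 15
yy, xx = np.mgrid[0:h, 0:w]
rng = np.random.default_rng(1)
true = gauss2d((xx, yy), 1.0, 7.0, 7.0, 2.0, 2.5)
filt = true + 0.05 * rng.standard_normal((h, w))

# Fit the Gaussian and compute R-squared of the fit.
p0 = [filt.max(), w / 2, h / 2, 2.0, 2.0]
popt, _ = curve_fit(gauss2d, (xx.ravel(), yy.ravel()), filt.ravel(), p0=p0)
pred = gauss2d((xx.ravel(), yy.ravel()), *popt)
ss_res = np.sum((filt.ravel() - pred) ** 2)
ss_tot = np.sum((filt.ravel() - filt.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
```

A smooth, Gaussian-like filter yields R-squared near 1, while a noisy, unstructured filter yields a value near 0 — which is what makes the metric a proxy for filter quality.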


Fig 4.

Hybrid-natural models have better data efficiency for neural prediction.

a. Mean model performance (top) based on test data for SI and hybrid-natural (w = 0.2; n = 10 random seeds) with different training data sizes and mean difference between SI and hybrid-natural (bottom). b. Mean R-squared (top) of fitting a 2D Gaussian to spatial filters for green stimulus channel for SI and hybrid-natural (w = 0.2; n = 10 random seeds) with different training data sizes, and the mean difference between R-squared for SI and hybrid-natural (bottom). Error bars represent 2.5 and 97.5 percentiles with bootstrapping.
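The error bars reported throughout (2.5 and 97.5 percentiles from bootstrapping) can be computed as below. The per-seed scores here are simulated; the paper's actual values come from its trained models.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical per-seed model performance (e.g., mean CC over neurons
# for n = 10 random seeds).
scores = rng.normal(loc=0.45, scale=0.03, size=10)

# Bootstrap the mean: resample seeds with replacement and take the
# 2.5/97.5 percentiles of the resampled means as the error bar.
n_boot = 10_000
boot_means = np.array([
    rng.choice(scores, size=scores.size, replace=True).mean()
    for _ in range(n_boot)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
```

The interval [lo, hi] is then drawn around the sample mean in the figure panels.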


Fig 5.

Past encoding or future prediction strategies using 3D shared filters perform equally well.

a. Top row: Responses of three exemplary GCL cells to 5-Hz noise stimulus (gray) and predictions of best performing models on test data (black, SI; blue, SI with PCA filters; red solid, hybrid for encoding the past; red dotted, hybrid for predicting the future). Bottom row: Respective learned RFs of the three cells (visualized by SVD). b. Mean model performance based on test data for SI, SI-PCA (128 bases), hybrid-natural-past, and hybrid-natural-future (both w = 0.4; n = 10 random seeds; p < 0.0001 for SI vs. hybrid-natural-past, p = 0.0005 for SI-PCA vs. hybrid-natural-past, p = 0.2563 for hybrid-natural-past vs. hybrid-natural-future, two-sided permutation test, n = 10,000 repeats). c. Representative shared spatial and temporal filters of 3D models (n = 1 random seed, visualized by SVD; temporal kernels for UV and green stimulus channels indicated by purple and green, respectively). d. Mean R-squared of fitting a 2D Gaussian to shared spatial filters (for green stimulus channel; n = 10 random seeds per model; p = 0.0003 for SI vs. hybrid-natural-past, p = 0.4356 for SI-PCA vs. hybrid-natural-past, p = 0.1895 for hybrid-natural-past vs. hybrid-natural-future, two-sided permutation test, n = 10,000 repeats). Error bars in (b),(d) represent 2.5 and 97.5 percentiles with bootstrapping.
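The SVD visualization used for the RFs and the shared 3D filters factorizes a space-time kernel into a temporal kernel and a spatial map via its leading singular component. A minimal sketch with a synthetic, approximately separable RF (all sizes hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical spatio-temporal RF: 8 time frames of a 9x9 spatial map,
# built as an outer product (space-time separable) plus a little noise.
T, H, W = 8, 9, 9
t_true = np.sin(np.linspace(0, np.pi, T))
yy, xx = np.mgrid[0:H, 0:W]
s_true = np.exp(-((xx - 4) ** 2 + (yy - 4) ** 2) / 8.0)
rf = t_true[:, None, None] * s_true[None] + 0.01 * rng.standard_normal((T, H, W))

# SVD of the time-by-space matrix; the leading component gives the
# temporal kernel (left singular vector) and spatial map (right one).
u, s, vt = np.linalg.svd(rf.reshape(T, H * W), full_matrices=False)
temporal = u[:, 0] * s[0]
spatial = vt[0].reshape(H, W)

# Fraction of variance captured by the first (separable) component.
sep_index = s[0] ** 2 / np.sum(s ** 2)
```

Note that SVD leaves the overall sign of each component ambiguous, so the temporal and spatial parts are typically sign-aligned before plotting.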


Fig 6.

Direction-selective (DS) neurons benefit more from hybrid models.

a. Recorded (gray) and predicted (black, SI; red, hybrid-natural-past; response amplitude scaled by a factor of 1.5 for better visualization) responses to noise, RFs, as well as full-field chirp responses and moving bar responses (gray, single trials; black, means) of representative DS and non-DS cells. Note that the RFs were dominated by the UV stimulus channel because cells were recorded in ventral retina (see Methods). b. Mean model performance based on test data for SI, hybrid-natural-past and hybrid-natural-future (both w = 0.7; n = 10 random seeds per model; trained with responses of n = 427 GCL cells to 5-Hz noise stimulus; p < 0.0001 for SI vs. hybrid-natural-past, p = 0.9307 for hybrid-natural-past vs. hybrid-natural-future; two-sided permutation test, n = 10,000 repeats). Note that compared to Fig 5b, these models had a lower predictive performance, as we used a different dataset, with 30% of data for training. c. Difference in mean performance between hybrid-natural-past and SI based on test data for 6 broad functional groups of GCL cells (35 OFF, 59 ON-OFF, 49 fast-ON, 38 slow-ON, and 64 uncertain RGCs, as well as 145 dACs; see Methods and Results; n = 10 random seeds per model). d. Like (b) but for n = 90 DS and n = 300 non-DS cells. e. Cumulative histogram of difference in mean prediction between hybrid-natural-past (w = 0.7) and SI on test data for DS (red) and non-DS cells (black), at one particular seed. Error bars in (b)–(d) represent 2.5 and 97.5 percentiles with bootstrapping.
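The model comparisons cite two-sided permutation tests with n = 10,000 repeats. One common form for paired per-seed scores — shown here with simulated data, and with the pairing-by-seed and sign-flipping scheme being an assumption about the exact test variant — is:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical paired per-seed scores for two models (n = 10 seeds),
# with the hybrid model slightly better on average.
si = rng.normal(0.40, 0.02, size=10)
hybrid = si + rng.normal(0.03, 0.01, size=10)

# Two-sided paired permutation test on the mean difference: randomly
# flip the sign of each paired difference and count permuted means at
# least as extreme as the observed one.
diffs = hybrid - si
obs = abs(diffs.mean())
n_perm = 10_000
signs = rng.choice([-1.0, 1.0], size=(n_perm, diffs.size))
perm_means = np.abs((signs * diffs).mean(axis=1))
p = (np.sum(perm_means >= obs) + 1) / (n_perm + 1)
```

The "+1" in numerator and denominator is the standard correction that keeps the Monte Carlo p-value strictly positive.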


Fig 7.

Predicting neural responses to natural movies does not benefit from efficient coding.

a. Recorded (gray) and predicted (black, SI; red, hybrid-natural) responses to natural movie, RFs, as well as exemplary spatial filters for the SI model trained on the full training data. b. Same as (a), but for the SI model trained on 23% of the training data. c. Same as (a), but for the hybrid-natural model trained on 23% of the training data. d. Predictive performance (top) based on test data for SI and hybrid-natural (w = 0.4; n = 10 random seeds) with different training data sizes, and the difference between SI and hybrid-natural (bottom). e. R-squared (top) of fitting a 2D Gaussian to spatial filters for UV stimulus channel for SI and hybrid-natural (w = 0.4; n = 10 random seeds) with different training data sizes, and the difference between SI and hybrid-natural (bottom). Error bars in (d),(e) represent 2.5 and 97.5 percentiles with bootstrapping.
