Receptive Field Inference with Localized Priors

doi:10.1371/journal.pcbi.1002219

Figure 1.

Neural encoding model and empirical Bayes receptive field inference.

(A) Linear-Gaussian encoding model: the stimulus is projected onto the receptive field, and Gaussian noise is added to produce the neural response. (B) Graphical model for a hierarchical Bayesian receptive field model. The hyperparameters specify a prior over the receptive field, which together with the stimulus determines the conditional probability of the neural response. Circles indicate variables, arrows indicate conditional dependence, and the square denotes a pair of variables (stimulus and response) that are observed many times. (C) Empirical Bayes involves a two-stage inference procedure: first, maximize the evidence for the hyperparameters (left), which can be computed by integrating out the receptive field from the generative model in (B); second, maximize the posterior over the receptive field given the data and the estimated hyperparameters (right). See text for details.
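
The two-stage procedure in panel (C) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the paper's implementation: it uses a simple ridge prior (receptive field coefficients drawn independently with precision `alpha`), treats the noise variance as known, and maximizes the evidence by grid search. The names `alpha`, `log_evidence`, `k_map` and all simulation settings are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate the linear-Gaussian encoding model: y = X k + noise.
n_samples, n_dims = 400, 20
k_true = np.exp(-0.5 * ((np.arange(n_dims) - 10) / 2.0) ** 2)
X = rng.standard_normal((n_samples, n_dims))
noise_var = 1.0  # assumed known here to keep the sketch short
y = X @ k_true + np.sqrt(noise_var) * rng.standard_normal(n_samples)


def log_evidence(alpha):
    """Log marginal likelihood of y with k ~ N(0, I / alpha) integrated out."""
    # Marginally, y ~ N(0, noise_var * I + X X^T / alpha).
    S = noise_var * np.eye(n_samples) + (X @ X.T) / alpha
    _, logdet = np.linalg.slogdet(S)
    quad = y @ np.linalg.solve(S, y)
    return -0.5 * (logdet + quad + n_samples * np.log(2 * np.pi))


# Stage 1: maximize the evidence over the hyperparameter (coarse grid search).
alphas = np.logspace(-3, 3, 61)
evidences = np.array([log_evidence(a) for a in alphas])
alpha_hat = alphas[np.argmax(evidences)]

# Stage 2: maximize the posterior over the filter given alpha_hat (MAP estimate).
A = X.T @ X / noise_var + alpha_hat * np.eye(n_dims)
k_map = np.linalg.solve(A, X.T @ y / noise_var)

# Maximum likelihood (ordinary least squares) estimate for comparison.
k_ml = np.linalg.lstsq(X, y, rcond=None)[0]
```

With a richer prior covariance the structure stays the same: only the marginal covariance in `log_evidence` and the prior precision term in stage 2 change.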

Figure 2.

Comparison of estimators for 1D simulated example.

A 1D difference-of-Gaussians receptive field with 100 elements was stimulated with 2000 samples of correlated (1/F) Gaussian noise. Left column: True filter (top), maximum likelihood (linear regression) estimate (middle), and empirical Bayes ridge regression (L2-penalized) estimate (bottom). Middle column: The lasso (L1-penalized) estimate (top) and the ARD estimate (middle) are sparse but fail to capture smoothness. The ASD estimate (bottom) captures smoothness, but exhibits spurious oscillations in the tails. Right column: Three variants of automatic locality determination (ALD): spacetime localization (ALDs, top), which identifies a spatial region in which the filter coefficients are large; frequency localization (ALDf, middle), which identifies a local region of the frequency domain in which Fourier coefficients are large, leading to a smooth estimate that closely resembles ASD; and joint localization in spacetime and frequency (ALDsf, bottom), which simultaneously identifies a local region in spacetime and frequency, yielding an estimate that is both smooth and sparse.
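
A simulation along the lines of this example can be sketched as follows. Only the filter length (100) and the number of stimuli (2000) come from the caption. The filter widths, the noise level, the choice of Fourier amplitude proportional to 1/f, and the hand-picked ridge penalty `lam` (which stands in for the evidence-optimized value) are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_dims, n_samples = 100, 2000

# Difference-of-Gaussians filter (centers, widths and weights are illustrative).
pos = np.arange(n_dims)


def gauss(center, width):
    return np.exp(-0.5 * ((pos - center) / width) ** 2)


k_true = gauss(50, 4.0) - 0.5 * gauss(50, 8.0)

# Correlated Gaussian noise: shape white noise in the Fourier domain so that
# the amplitude falls off as 1/f (the DC term is capped to keep it finite).
freqs = np.fft.rfftfreq(n_dims)
amp = np.empty_like(freqs)
amp[1:] = 1.0 / freqs[1:]
amp[0] = amp[1]
white = rng.standard_normal((n_samples, n_dims))
X = np.fft.irfft(np.fft.rfft(white, axis=1) * amp, n=n_dims, axis=1)
X /= X.std()  # unit overall stimulus variance

# Noisy linear responses.
noise_std = 1.0
y = X @ k_true + noise_std * rng.standard_normal(n_samples)

# Maximum likelihood (linear regression) estimate.
k_ml = np.linalg.lstsq(X, y, rcond=None)[0]

# Ridge (L2-penalized) estimate with a fixed, hand-picked penalty.
lam = 10.0
k_ridge = np.linalg.solve(X.T @ X + lam * np.eye(n_dims), X.T @ y)

mse_ml = np.mean((k_ml - k_true) ** 2)
mse_ridge = np.mean((k_ridge - k_true) ** 2)
```

Because the stimulus has little power at high frequencies, the regression problem is poorly conditioned in those directions, which is where an unregularized estimate tends to pick up noise.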

Figure 3.

Estimated filters and prior covariances for ALD methods.

(Same example filter as shown in Fig. 2). Left column shows the true filter (dotted black) and ALD estimates (red) replotted from the right-most column of Fig. 2. Top: Space-localized estimate. The estimated prior variance (black trace, middle) is a Gaussian form that controls the falloff in amplitude of filter coefficients (red) as a function of position. The prior covariance (right) is a diagonal matrix with this Gaussian along the diagonal. The prior is thus independent with location-dependent variance. Middle: Frequency-localized estimate. A Gaussian form (reflected around the origin due to symmetries of the Fourier transform) specifies the prior variance as a function of frequency (black trace, middle). The Fourier power of the filter estimate (red) drops quickly to zero outside the estimated region. The prior covariance matrix (right) is diagonal in the Fourier domain, meaning the Fourier coefficients are independent with frequency-dependent variance. Bottom: Space and frequency localized estimate. The estimated prior covariance matrix is not diagonal in spacetime or frequency, but takes the form of a “sandwich matrix” that combines the prior covariances from ALDs and ALDf (see text). The resulting prior covariance matrix can be visualized in either the spacetime domain (left) or the Fourier domain (right). It is localized (has a local region of large prior variance) in both coordinate frames, but has strong dependencies (off-diagonal elements), particularly across space.
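
The three covariance structures described here can be illustrated with a small NumPy sketch. The exact parameterization is given in the paper's text rather than in this caption, so the construction below is an assumption: a Gaussian variance profile in space, a Gaussian profile in absolute frequency, and a sandwich formed by placing the frequency-localized covariance between two copies of the square root of the space-localized one. All parameter values (`mu_s`, `sigma_s`, `mu_f`, `sigma_f`) are hypothetical.

```python
import numpy as np

n = 100
pos = np.arange(n)

# Space-localized prior: diagonal, with a Gaussian variance profile over position.
mu_s, sigma_s = 50.0, 8.0  # hypothetical location and width
var_s = np.exp(-0.5 * ((pos - mu_s) / sigma_s) ** 2)
C_s = np.diag(var_s)

# Frequency-localized prior: diagonal in the Fourier domain, with a Gaussian
# profile in |f| so that it is reflected around the origin.
freqs = np.fft.fftfreq(n)  # signed frequencies in cycles per sample
mu_f, sigma_f = 0.05, 0.03  # hypothetical center frequency and width
var_f = np.exp(-0.5 * ((np.abs(freqs) - mu_f) / sigma_f) ** 2)
B = np.fft.fft(np.eye(n), norm="ortho")  # unitary DFT matrix
# Express the Fourier-domain diagonal covariance in spatial coordinates.
# The result is real because var_f is symmetric in frequency.
C_f = (B.conj().T @ np.diag(var_f) @ B).real

# Joint prior: sandwich the frequency-localized covariance between square
# roots of the space-localized one (assumed form, see lead-in).
S_half = np.diag(np.sqrt(var_s))
C_sf = S_half @ C_f @ S_half
```

In this construction `C_sf` keeps the spatial falloff of `C_s` along its diagonal while inheriting off-diagonal dependencies from `C_f`, which matches the qualitative description of a prior that is localized in both coordinate frames but not diagonal in either.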

Figure 4.

Menagerie of simulated examples.

Noisy responses to 1600 random 1/F Gaussian stimuli were simulated and used for training. The leftmost column shows the true filter (a pixel image), while subsequent columns show various estimates. The mean squared error of each estimate is indicated below in red. Filters shown include: (A) oriented Gabor filter, typical of a V1 simple cell; (B) smaller Gabor filter; (C) center-surround “difference-of-Gaussians” filter, typical of retinal ganglion cells; (D) grid cell with multiple non-zero regions (localized in the Fourier domain but not in space); (E) circularly windowed Gaussian white noise (localized in space but not in frequency); (F) full-field Gaussian noise (not localized in space or frequency). ALDsf performs at or near the optimum for all examples we examined.

Figure 5.

Comparison of error rates on simulated data.

Responses of a pixel Gabor filter (shown in Fig. 4A) were simulated using white noise stimuli (left) or “naturalistic” 1/F Gaussian stimuli (right). (A) Filter error using white noise stimuli, for varying amounts of training data (see Methods). (B) Average filter error under each method. (C–D) Analogous to A–B, but for 1/F stimuli. For both kinds of stimuli, ALDsf achieved error rates almost 2 times smaller than ASD, the next best method. Horizontal slices through panels (A) and (C) show that traditional methods (ML and ridge regression) required four times more data on white noise stimuli, and twenty to thirty times more data on 1/F stimuli, to achieve the same error rate as ALDsf.

Figure 6.

Receptive field estimates for V1 simple cells.

(Data from [53]). Left: Filter estimates obtained by ML, ridge regression, and ALDsf, for three different amounts of training data (1, 2, and 4 min). Numbers in red beneath each filter indicate relative cross-validation error. Middle: Relative cross-validation error for each method, averaged across 16 neurons. ALDsf achieved the lowest average error for all amounts of training data. Right: Number of times more training data required by each method to reach the same error level as ALDsf with 30 s of training data. On average, the ML estimator required 5 times more training data, while ASD required 1.7 times more training data to match the performance of ALDsf.

Figure 7.

Receptive field estimates for the full set of sixteen V1 simple cells analyzed.

(Data from [53]). Left: ML filter estimates from 1 minute of training data. Middle: ALDsf estimates from 1 minute of training data. Right: ML estimates from all data (an average of approximately 40 minutes of data per cell). Note the heterogeneity across cells, and that ALDsf captures the qualitative RF structure even when the 1-minute ML estimate is nearly indistinguishable from noise.

Figure 8.

Comparison of 3D receptive field estimates for retinal data.

(Data from Chichilnisky lab, [55]). Top row: Maximum likelihood and ALDsf estimates for an OFF retinal ganglion cell (RGC) receptive field, stimulated using 1 minute of binary spatiotemporal white noise. Left column shows a schematic of the 100-pixel by 25-time-bin receptive field, containing 2500 total coefficients. Each time bin was 8.33 ms, corresponding to a frame rate of 120 Hz. Colored lines indicate specific pixels whose timecourses are shown at right, together with spatial time-slices depicted as images (taken at the 4th and 8th time bins, indicated by green and purple arrows, respectively). The ML and ALDsf estimates with 1 minute of training data are shown alongside the ML estimate computed from 20 minutes of data. Pixel timecourses were rescaled to be unit vectors, so that differences in temporal profiles (i.e., spacetime non-separability of the filter) can be observed. Bottom row: Similar plots for an ON RGC, with spatial profiles shown for the 5th and 8th time bins. In both cases, ALDsf accurately recovered the shape and timecourse of the RF, while the ML estimate was often indistinguishable from noise. We examined RF estimates from 3 ON and 3 OFF cells, and found that, with 1 minute of training data, the average mean squared error between each estimate and a reference estimate (the ML estimate computed with 20 minutes of data) was 18 times larger for ML and 6.6 times larger for ridge regression than for ALDsf.
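
The rescaling of pixel timecourses to unit vectors can be illustrated with a short NumPy sketch. The array `rf` below is random placeholder data standing in for a receptive field estimate, not real recordings. Its shape assumes 100 pixels, which follows from 2500 total coefficients over 25 time bins. The variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

n_pixels, n_bins = 100, 25  # 2500 coefficients in total
rf = rng.standard_normal((n_pixels, n_bins))  # placeholder RF estimate

# Rescale each pixel's timecourse to unit Euclidean norm, so that only the
# temporal shape (not the amplitude) differs between pixels.
norms = np.linalg.norm(rf, axis=1, keepdims=True)
rf_unit = rf / np.where(norms > 0, norms, 1.0)  # guard against all-zero rows
```

If the filter were separable in space and time, every row of `rf_unit` would be the same timecourse up to sign, so visible differences between rows indicate non-separability.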

Figure 9.

Empirical Bayes (EB) and fully Bayesian (FB) credible intervals on simulated data.

Left: FB and EB 95% credible intervals, computed from 100 samples of training data, for ALDs (top), ALDf (middle), and ALDsf (bottom). The true filter is shown in black. FB intervals are larger than EB intervals, due to the incorporation of uncertainty in the hyperparameters under fully Bayesian inference. Right: Credible intervals computed from 500 samples of training data. As the amount of training data increases, the FB and EB credible regions become indistinguishable, indicating that the evidence is tightly concentrated around its maximum. For both amounts of training data, the posterior means under FB and EB were virtually identical.
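
Under empirical Bayes the hyperparameters are fixed at their estimates, so the posterior over the filter in a linear-Gaussian model is Gaussian and the 95% credible interval for each coefficient is the posterior mean plus or minus 1.96 posterior standard deviations. The sketch below shows this for 100 and 500 training samples. It covers the EB case only and makes simplifying assumptions: a ridge prior with a fixed precision `alpha` stands in for the evidence-optimized ALD prior, the noise variance is known, and the simulated filter is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

n_dims = 30
k_true = np.exp(-0.5 * ((np.arange(n_dims) - 15) / 3.0) ** 2)
noise_var = 1.0
alpha = 1.0  # hypothetical fixed prior precision (stand-in for the EB estimate)

X_all = rng.standard_normal((500, n_dims))
y_all = X_all @ k_true + np.sqrt(noise_var) * rng.standard_normal(500)


def eb_interval(X, y):
    """Posterior mean and 95% credible bounds with hyperparameters held fixed."""
    precision = X.T @ X / noise_var + alpha * np.eye(n_dims)
    cov = np.linalg.inv(precision)
    mean = cov @ (X.T @ y) / noise_var
    half_width = 1.96 * np.sqrt(np.diag(cov))
    return mean, mean - half_width, mean + half_width


# Intervals from the first 100 samples and from all 500 samples.
mean_100, low_100, high_100 = eb_interval(X_all[:100], y_all[:100])
mean_500, low_500, high_500 = eb_interval(X_all, y_all)

width_100 = high_100 - low_100
width_500 = high_500 - low_500
```

A fully Bayesian interval would additionally average over the posterior of the hyperparameters (for example by MCMC), which is why it is wider when data are scarce.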

Figure 10.

Empirical Bayes (EB) and fully Bayesian (FB) estimates on V1 data.

(A) ALDsf estimates for a single V1 simple cell under EB and FB inference, from 30 seconds (top) and 4 minutes (bottom) of training data. There was no significant difference in cross-validation error (numbers below in red, averaged over 100 resampled training sets). (B) Marginal posterior variance of RF coefficients, averaged across pixels and across all 16 cells, under EB and FB inference. As expected, FB estimates of the posterior variance were higher, especially for small datasets, reflecting the effects of posterior uncertainty in the hyperparameters. (C) Average cross-validation error across 16 cells for FB and EB estimates. For all amounts of training data, error rates were nearly identical, indicating that the FB posterior mean (computed via MCMC) is not superior to the computationally cheaper EB estimate.
