
Structured random receptive fields enable informative sensory encodings

Abstract

Brains must represent the outside world so that animals survive and thrive. In early sensory systems, neural populations have diverse receptive fields structured to detect important features in inputs, yet significant variability has been ignored in classical models of sensory neurons. We model neuronal receptive fields as random, variable samples from parameterized distributions and demonstrate this model in two sensory modalities using data from insect mechanosensors and mammalian primary visual cortex. Our approach leads to a significant theoretical connection between the foundational concepts of receptive fields and random features, a leading theory for understanding artificial neural networks. The modeled neurons perform a randomized wavelet transform on inputs, which removes high frequency noise and boosts the signal. Further, these random feature neurons enable learning from fewer training samples and with smaller networks in artificial tasks. This structured random model of receptive fields provides a unifying, mathematically tractable framework to understand sensory encodings across both spatial and temporal domains.

Author summary

Evolution has ensured that animal brains are dedicated to extracting useful information from raw sensory stimuli while discarding everything else. Models of sensory neurons are a key part of our theories of how the brain represents the world. In this work, we model the tuning properties of sensory neurons in a way that incorporates randomness and builds a bridge to a leading mathematical theory for understanding how artificial neural networks learn. Our models capture important properties of large populations of real neurons presented with varying stimuli. Moreover, we give a precise mathematical formula for how sensory neurons in two distinct areas, one involving a gyroscopic organ in insects and the other a visual processing center in mammals, transform their inputs. We also find that artificial models imbued with properties from real neurons learn more efficiently, with shorter training time and fewer examples, and our mathematical theory explains some of these findings. This work expands our understanding of sensory representation in large networks with benefits for both the neuroscience and machine learning communities.

Introduction

It has long been argued that the brain uses a large population of neurons to represent the world [1–4]. In this view, sensory stimuli are encoded by the responses of the population, which are then used by downstream areas for diverse tasks, including learning, decision-making, and movement control. These sensory areas have different neurons responding to differing stimuli while also providing a measure of redundancy. However, we still lack a clear understanding of what response properties are well-suited for different sensory modalities.

One way to approach sensory encoding is by understanding how a neuron would respond to arbitrary stimuli. Experimentally, we typically present many stimuli to the animal, measure the responses of sensory neurons, then attempt to estimate some kind of model for how the neurons respond to an arbitrary stimulus. A common assumption is that the neuron computes a linear filter of the stimulus, which then drives spiking through a nonlinear spike-generating mechanism. Mathematically, this assumption can be summarized as the number of measured spikes for a stimulus x being equal to σ(wᵀx) for a weight vector w and nonlinearity σ. Here, the weights w define the filtering properties of the neuron, also known as its receptive field [5]. This model is known as a linear-nonlinear (LN) model [6], and it is also the most common form of artificial neuron in artificial neural networks (ANNs). LN models have been used extensively to describe the firing of diverse neurons in various sensory modalities of vertebrates and invertebrates. In the mammalian visual system, LN models have been used to characterize retinal ganglion cells [7], lateral geniculate neurons [8], and simple cells in primary visual cortex (V1) [9]. They have also been used to characterize auditory sensory neurons in the avian midbrain [10] and somatosensory neurons in the cortex [11]. In insects, they have been used to understand the response properties of visual interneurons [12], mechanosensory neurons involved in proprioception [13, 14], and auditory neurons during courtship behavior [15].

Given the stimulus presented and neural response data, one can then estimate the receptive fields of a population of neurons. Simple visual receptive fields have classically been understood as similar to wavelets with particular spatial frequency and angular selectivity [9]. In mechanosensory areas, receptive fields are selective to temporal frequency over a short time window [13]. Commonly, parametric modeling (Gabor wavelets [4]) or smoothing (regularization, etc. [16]) are used to produce “clean” receptive fields. Yet, the data alone show noisy receptive fields that are perhaps best modeled using a random distribution [17]. As we will show, modeling receptive fields as random samples produces realistic receptive fields that reflect both the structure and noisiness seen in experimental data. More importantly, this perspective creates significant theoretical connections between foundational ideas from neuroscience and artificial intelligence. This connection helps us understand why receptive fields have the structures that they do and how this structure relates to the kinds of stimuli that are relevant to the animal.

Modeling the filtering properties of a population of LN neurons as samples from a random distribution leads to the study of networks with random weights [18–20]. In machine learning (ML), such networks are known as random feature networks (RFNs) [21–24]. The study of RFNs has rapidly gained popularity in recent years, in large part because it offers a theoretically tractable way to study the learning properties of ANNs where the weights are tuned using data [25–27]. When the RFN contains many neurons, it can approximate functions that live in a well-understood function space. This function space is called a reproducing kernel Hilbert space (RKHS), and it depends on the network details, in particular the weight (i.e., receptive field) distribution [28–30]. Learning can then be framed as approximating functions in this space from limited data.

Several recent works highlight the RFN theory’s usefulness for understanding learning in neural systems. Bordelon, Canatar, and Pehlevan, in a series of papers, have shown that neural codes allow learning from few examples when the spectral properties of their second-order statistics align with the spectral properties of the task [31–33]. When applied to V1, they found that the neural code is aligned with tasks that depend on low spatial frequency components. Harris constructed an RFN model of sparse networks found in associative centers like the cerebellum and insect mushroom body and showed that these areas may behave like additive kernels [34], an architecture also considered by Hashemi et al. [35]. These classes of kernels are beneficial for learning in high dimensions because they can learn from fewer examples and remain resilient to input noise or adversarial perturbation. Xie et al. investigated the relationship between the fraction of active neurons in a model of the cerebellum—controlled by neuron thresholds—and generalization performance for learning movement trajectories [36]. In the vast majority of network studies with random weights, these weights w are drawn from a Gaussian distribution with independent entries. This sampling is equivalent to a fully unstructured receptive field, which looks like white noise.

Closely related to our work, a previous study of ANNs showed that directly learning structured receptive fields could improve image classification in deep networks [37]. Their receptive fields were parametrized as a sum of Gaussian derivatives up to fourth order. This led to better performance against rival architectures in low data regimes.

In this paper, we study the effect of having structured yet random receptive fields and how they lead to informative sensory encodings. Specifically, we consider receptive fields generated by a Gaussian process (GP), which can be thought of as drawing the weights w from a Gaussian distribution with a particular covariance matrix. We show that networks with such random weights project the input to a new basis and filter out particular components. This theory introduces realistic structure of receptive fields into random feature models which are crucial to our current understanding of artificial networks. Next, we show that receptive field datasets from two disparate sensory systems, mechanosensory neurons on insect wings and V1 cortical neurons from mice and monkeys, are well-modeled by GPs with covariance functions that have wavelet eigenbases. Given the success of modeling these data with the GP, we apply these weight distributions in RFNs that are used in synthetic learning tasks. We find that these structured weights improve learning by reducing the number of training examples and the size of the network needed to learn the task. Thus, structured random weights offer a realistic generative model of the receptive fields in multiple sensory areas, which we understand as performing a random change of basis. This change of basis enables the network to represent the most important properties of the stimulus, which we demonstrate to be useful for learning.

Results

We construct a generative model for the receptive fields of sensory neurons and use it for the weights of an ANN. We refer to such a network as a structured random feature network. We first review the basics of random feature networks, the details and rationale behind our generative model, and the process by which we generate hidden weights. Our main theory result is that networks with such weights transform the inputs into a new basis and filter out particular components, thus bridging sensory neuroscience and the theory of neural networks. Next, we show that neurons in two receptive field datasets—insect mechanosensory neurons and mammalian V1 cortical neurons—are well-described by our generative model. There is a close resemblance between the second-order statistics, sampled receptive fields, and their principal components for both data and model. Finally, we show the performance of structured random feature networks on several synthetic learning tasks. The hidden weights from our generative model allow the network to learn from fewer training examples and smaller network sizes.

Theoretical analysis

We consider receptive fields generated by GPs in order to connect this foundational concept from sensory neuroscience to the theory of random features in artificial neural networks. GPs can be thought of as samples from a Gaussian distribution with a particular covariance matrix, and we initialize the hidden weights of RFNs using these GPs. We show that using a GP causes the network to project the input into a new basis and filter out particular components. The basis itself is determined by the covariance matrix of the Gaussian, and is useful for removing irrelevant and noisy components from the input. We use these results to study the space of functions that RFNs containing many neurons can learn by connecting our construction to the theory of kernel methods.

Random feature networks.

We start by introducing the main learning algorithm and the neuronal model of our work, the RFN. Consider a two-layer, feedforward ANN. Traditionally, all the weights are initialized randomly and learned through backpropagation by minimizing some loss objective. In sharp contrast, RFNs have their hidden layer weights sampled randomly from some distribution and fixed. Each hidden unit computes a random feature of the input, and only the output layer weights are trained (Fig 1). Note that the weights are randomly drawn but the neuron’s response is a deterministic function of the input given the weights.

Fig 1. Random feature networks with structured weights.

We study random feature networks as models for learning in sensory regions. In these networks, each neuron’s weight w is fixed as a random sample from some specified distribution. Only the readout weights β are trained. In particular, we specify these distributions to be Gaussian Processes (GPs) whose covariances are inspired by biological neurons; thus, each realization of the GP resembles a biological receptive field. We build GP models of two sensory areas that specialize in processing timeseries and image inputs. We initialize w from these structured GPs and compare them against initialization from an unstructured white-noise distribution.

https://doi.org/10.1371/journal.pcbi.1010484.g001

Mathematically, the hidden layer activations and output are given by

h = σ(Wx), ŷ = β0 + βᵀh, (1)

where x ∈ ℝ^d is the stimulus, h ∈ ℝ^m are the hidden neuron responses, and ŷ ∈ ℝ is the predicted output. We use a rectified linear (ReLU) nonlinearity, σ(x) = max(0, x), applied entrywise in Eq (1). The hidden layer weights W ∈ ℝ^{m×d} are drawn randomly and fixed. Only the readout weights β0 and β are trained in RFNs.

In our RFN experiments, we train the readout weights and offset using a support vector machine (SVM) classifier with squared hinge loss and an ℓ2 penalty, with the regularization strength tuned in the range [10⁻³, 10³] by 5-fold cross-validation. Our RFNs do not include a threshold for the hidden neurons, although this could help in certain contexts [36].
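As a concrete illustration, the readout training just described could look roughly like the following scikit-learn sketch. The function name and the use of LinearSVC/GridSearchCV are our own choices rather than the authors' code, and note that scikit-learn expresses the ℓ2 penalty through an inverse regularization strength C.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV

def train_readout(H_train, y_train):
    """Fit the RFN readout (beta_0, beta) on fixed hidden responses.

    H_train: (n_samples, n_hidden) matrix of hidden activations sigma(W x).
    y_train: (n_samples,) class labels.
    """
    # Squared hinge loss with an l2 penalty; the regularization strength is
    # tuned over a logarithmic grid by 5-fold cross-validation.
    svm = LinearSVC(loss="squared_hinge", penalty="l2", dual=False, max_iter=10000)
    grid = {"C": np.logspace(-3, 3, 7)}   # inverse regularization strength
    search = GridSearchCV(svm, grid, cv=5)
    search.fit(H_train, y_train)
    return search.best_estimator_
```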

In the vast majority of studies with RFNs, each neuron’s weights are initialized i.i.d. from a spherical Gaussian distribution, w ~ 𝒩(0, Id). We will call networks built this way classical unstructured RFNs (Fig 1). We propose a variation where hidden weights are initialized w ~ 𝒩(0, C), where C ∈ ℝ^{d×d} is a positive semidefinite covariance matrix. We call such networks structured RFNs (Fig 1), to mean that the weights are random with a specified covariance. To compare unstructured and structured weights on equal footing, we normalize the covariance matrices so that Tr(C) = Tr(Id) = d, which ensures that the mean square amplitude of the weights 𝔼‖w‖² = d.
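A minimal sketch of the two weight distributions, in our own notation (any positive semidefinite matrix C can stand in for the covariances constructed later in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def unstructured_weights(m, d):
    """Classical RFN: each hidden weight vector is an i.i.d. draw from N(0, I_d)."""
    return rng.standard_normal((m, d))

def structured_weights(m, C):
    """Structured RFN: hidden weights drawn from N(0, C), with Tr(C) normalized to d."""
    d = C.shape[0]
    C = C * (d / np.trace(C))                 # enforce Tr(C) = d
    return rng.multivariate_normal(np.zeros(d), C, size=m, method="eigh")

def hidden_responses(W, X):
    """ReLU random features h = sigma(W x) for a batch of inputs X with shape (n, d)."""
    return np.maximum(0.0, X @ W.T)
```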

Receptive fields modeled by linear weights.

Sensory neurons respond preferentially to specific features of their inputs. This stimulus selectivity is often summarized as a neuron’s receptive field, which describes how features of the sensory space elicit responses when stimulated [5]. Mathematically, receptive fields are modeled as a linear filter in the stimulus space. Linear filters are also an integral component of the widely used LN model of sensory processing [6]. According to this model, the firing rate of a neuron is a nonlinear function applied to the projection of the stimulus onto the low-dimensional subspace of the linear filter.

A linear filter model of receptive fields can explain responses of individual neurons to diverse stimuli. It has been used to describe disparate sensory systems like visual, auditory, and somatosensory systems of diverse species including birds, mammals, and insects [7, 10–12, 15]. If the stimuli are uncorrelated, the filters can be estimated by computing the spike triggered average (STA), the average stimulus that elicited a spike for the neuron. When the stimuli are correlated, the STA filter is whitened by the inverse of the stimulus covariance matrix [38]. Often these STAs are denoised by fitting a parametric function to the STA [6], such as Gabor wavelets for simple cells in V1 [9].

We model the receptive field of a neuron i as its weight vector wi and its nonlinear function as σ. Instead of fitting a parametric function, we construct covariance functions so that each realization of the resulting Gaussian process resembles a biological receptive field (Fig 1).

Structured weights project and filter input into the covariance eigenbasis.

We generate network weights from Gaussian processes (GP) whose covariance functions are inspired by the receptive fields of sensory neurons in the brain. By definition, a GP is a stochastic process where finite observations follow a Gaussian distribution [39]. We find that networks with such weights project inputs into a new basis and filter out irrelevant components. We will see that this adds an inductive bias to classical RFNs for tasks with naturalistic inputs and improves learning.

We view our weight vector w ∈ ℝ^d as the finite-dimensional discretization of a continuous function w(t) which is a sample from a GP. The continuous function has domain T, a compact subset of ℝ^D, and we assume that T is discretized using a grid of d equally spaced points {t1, …, td} ⊂ T, so that wi = w(ti). Let the input be a real-valued function x(t) over the same domain T, which could represent a finite timeseries (D = 1), an image of luminance on the retina (D = 2), or more complicated spatiotemporal sets like a movie (D = 3). In the continuous setting, the d-dimensional ℓ2 inner product gets replaced by the L2(T) inner product ⟨w, x⟩ = ∫_T w(t)x(t) dt.

Every GP is fully specified by its mean and covariance function C(t, t′). We will always assume that the mean is zero and study different covariance functions. By the Kosambi-Karhunen–Loève theorem [40], each realization of a zero-mean GP has a random series representation

w(t) = Σi zi λi ϕi(t) (2)

in terms of standard Gaussian random variables zi ~ 𝒩(0, 1), functions ϕi(t), and weights λi ≥ 0. The pairs (λi², ϕi) are eigenvalue, eigenfunction pairs of the covariance operator (Cϕ)(t) = ∫_T C(t, t′) ϕ(t′) dt′, which is the continuous analog of the covariance matrix C. If C(t, t′) is positive definite, as opposed to just semidefinite, all λi > 0 and these eigenfunctions ϕi form a complete basis for L2(T). Using Eq (2), the inner product between a stimulus and a neuron’s weights is

⟨w, x⟩ = Σi zi λi ⟨ϕi, x⟩. (3)

Eq (3) shows that the structured weights compute a projection of the input x onto each eigenfunction ⟨ϕi, x⟩ and reweight or filter by the eigenvalue λi before taking the ℓ2 inner product with the random Gaussian weights zi.

It is illuminating to see what these continuous equations look like in the d-dimensional discrete setting. Samples from the finite-dimensional GP are used as the hidden weights in RFNs, w ~ 𝒩(0, C). First, the GP series representation Eq (2) becomes w = ΦΛz, where Λ and Φ are matrices of eigenvalues and eigenvectors, and z ~ 𝒩(0, Id) is a Gaussian random vector. By the definition of the covariance matrix, C = 𝔼[wwᵀ], which is equal to ΦΛ²Φᵀ after a few steps of linear algebra. Finally, Eq (3) is analogous to wᵀx = zᵀΛΦᵀx. Since Φ is an orthogonal matrix, Φᵀx is equivalent to a change of basis, and the diagonal matrix Λ shrinks or expands certain directions to perform filtering. This can be summarized in the following theorem:

Theorem 1 (Basis change formula) Assume w ~ 𝒩(0, C) with C = ΦΛ²Φᵀ its eigenvalue decomposition. For x ∈ ℝ^d, define the filtered input

x̃ = ΛΦᵀx. (4)

Then wᵀx = zᵀx̃ for z ~ 𝒩(0, Id).

Theorem 1 says that projecting an input onto a structured weight vector is the same as first filtering that input in the GP eigenbasis and then doing a random projection onto a spherical random Gaussian. The form of the GP eigenbasis is determined by the choice of the covariance function. If the covariance function is compatible with the input structure, the hidden weights filter out any irrelevant features or noise in the stimuli while amplifying the descriptive features. This inductive bias facilitates inference on the stimuli by any downstream predictor. Because the spherical Gaussian distribution is the canonical choice for unstructured RFNs, there is a simple way to evaluate the effective kernel of structured RFNs: it is the classical, unstructured kernel evaluated on the filtered inputs x̃ (see S1 Appendix).
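The basis change formula is easy to check numerically. The following sketch (our own construction, using an arbitrary covariance matrix) samples w = ΦΛz and verifies that wᵀx equals zᵀx̃ with x̃ = ΛΦᵀx:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 50

# An arbitrary positive semidefinite covariance with eigendecomposition C = Phi Lambda^2 Phi^T.
A = rng.standard_normal((d, d))
C = A @ A.T
evals, Phi = np.linalg.eigh(C)
Lam = np.diag(np.sqrt(np.clip(evals, 0.0, None)))

x = rng.standard_normal(d)               # an input
z = rng.standard_normal(d)               # spherical Gaussian vector

w = Phi @ Lam @ z                        # structured weight, distributed as N(0, C)
x_tilde = Lam @ Phi.T @ x                # input filtered in the GP eigenbasis

# Theorem 1: projection onto the structured weight = spherical projection of x_tilde.
assert np.allclose(w @ x, z @ x_tilde)
```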

Our expression for the structured kernel provides a concrete connection to the kernel theory of learning using nonlinear neural networks. For readers interested in such kernel theories, a full example and simulation results of how these work is given in S1 Appendix. There we show that there can be an exponential reduction in the number of samples needed to learn frequency detection using a structured versus unstructured basis (Fig A in S1 Appendix).

Examples of random yet structured receptive fields

Our goal is to model the weights of artificial neurons in a way that is inspired by biological neurons’ receptive fields. Structured RFNs sample hidden weights from GPs with structured covariance, so we construct covariance functions that make the generated weights resemble neuronal receptive fields. We start with a toy example of a stationary GP with a well-understood Fourier eigenbasis and show how the receptive fields generated from this GP are selective to frequencies in timeseries signals. Then, we construct locally stationary covariance models of insect mechanosensory and V1 neuron receptive fields. These models are shown to be a good match for experimental data.

Warm-up: Frequency selectivity from stationary covariance.

To illustrate some results from our theoretical analysis, we start with a toy example of temporal receptive fields that are selective to particular frequencies. This example may be familiar to readers comfortable with Fourier series and basic signal processing. Let the input be a finite continuous timeseries x(t) over the interval T = [0, L]. We use the covariance function

C(t, t′) = Σk λk² cos(ωk(t − t′)), (5)

where ωk = 2πk/L is the kth natural frequency and λk² are the weight coefficients. The covariance function Eq (5) is stationary, which means that it only depends on the difference between the timepoints t − t′. Applying the compound angle formula, we get

C(t, t′) = Σk λk² [cos(ωkt) cos(ωkt′) + sin(ωkt) sin(ωkt′)]. (6)

Since the sinusoidal functions cos(ωkt) and sin(ωkt) form an orthonormal basis for L2(T), Eq (6) is the eigendecomposition of the covariance, where the eigenfunctions are sines and cosines with eigenvalues λk². From Eq (2), we know that structured weights with this covariance form a random series:

w(t) = Σk λk [ak cos(ωkt) + bk sin(ωkt)], (7)

where each ak, bk ~ 𝒩(0, 1) i.i.d. Thus, the receptive fields are made up of sinusoids weighted by λk and the standard Gaussian variables ak and bk.

Suppose we want receptive fields that only retain specific frequency information of the signal and filter out the rest. Take λk = 0 for any k whose frequency ωk/2π lies below flo or above fhi. We call this a bandlimited spectrum with passband [flo, fhi] and bandwidth fhi − flo. As the bandwidth increases, the receptive fields become less smooth since they are made up of a wider range of frequencies. If the λk are all nonzero but decay at a certain rate, this rate controls the smoothness of the resulting GP [41].

When these receptive fields act on input signals x(t), they implicitly transform the inputs into the Fourier basis and filter frequencies based on the magnitude of λk. In a bandlimited setting, any frequencies outside the passband are filtered out, which makes the receptive fields selective to a particular range of frequencies and ignore others. On the other hand, classical random features weight all frequencies equally, even though in natural settings high frequency signals are the most corrupted by noise.
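As a small illustration, receptive fields with a bandlimited spectrum can be sampled directly from the series in Eq (7); the parameter values below are arbitrary choices for the sketch, not values used in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def bandlimited_rf(t, L, f_lo, f_hi):
    """Sample w(t) = sum_k lambda_k (a_k cos(omega_k t) + b_k sin(omega_k t)),
    with lambda_k = 1 inside the passband [f_lo, f_hi] and 0 outside."""
    w = np.zeros_like(t)
    k = 1
    while k / L <= f_hi:                       # natural frequency omega_k / (2 pi) = k / L
        if k / L >= f_lo:
            a, b = rng.standard_normal(2)
            w += a * np.cos(2 * np.pi * k / L * t) + b * np.sin(2 * np.pi * k / L * t)
        k += 1
    return w

L = 0.1                                        # 100 ms window
t = np.arange(200) / 2000.0                    # d = 200 samples at 2 kHz
w = bandlimited_rf(t, L, f_lo=10.0, f_hi=60.0) # passband 10-60 Hz
```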

Insect mechanosensors.

We next consider a particular biological sensor that is sensitive to the time-history of forces. Campaniform sensilla (CS) are dome-shaped mechanoreceptors that detect local stress and strain on the insect exoskeleton [42]. They are embedded in the cuticle and deformation of the cuticle through bending or torsion induces depolarizing currents in the CS by opening mechanosensitive ion channels. The CS encode proprioceptive information useful for body state estimation and movement control during diverse tasks like walking, kicking, jumping, and flying [42].

We will model the receptive fields of CS that are believed to be critical for flight control, namely the ones found at the base of the halteres [43] and on the wings [14] (Fig 2A). Halteres and wings flap rhythmically during flight, and rotations of the insect’s body induce torsional forces that can be felt on these active sensory structures. The CS detect these small strain forces, thereby encoding angular velocity of the insect body [43]. Experimental results show haltere and wing CS are selective to a broad range of oscillatory frequencies [14, 44], with STAs that are smooth, oscillatory, selective to frequency, and decay over time [13] (Fig 2B).

Fig 2. Random receptive field model of insect mechanosensors.

(A) Diagram of the cranefly, Tipula hespera. Locations of the mechanosensors, campaniform sensilla, are marked in blue on the wings and halteres. (B) Two receptive fields of campaniform sensilla are shown in blue. They are smooth, oscillatory, and decay over time. We model them as random samples from distributions parameterized by frequency and decay parameters. Data are from the hawkmoth [14]; cranefly sensilla have similar responses [13]. (C) Two random samples from the model distribution are shown in red. (D) The smoothness of the receptive fields is controlled by the frequency parameter. The decay parameter controls the rate of decay from the origin (not shown).

https://doi.org/10.1371/journal.pcbi.1010484.g002

We model these temporal receptive fields with a locally stationary GP [45] with bandlimited spectrum. Examples of receptive fields generated from this GP are shown in Fig 2C. The inputs to the CS are modeled as a finite continuous timeseries x(t) over the finite interval T = [0, L]. The neuron weights are generated from a covariance function

C(t, t′) = exp(−(t + t′)/γ) Σk λk² cos(ωk(t − t′)), (8)

where ωk = 2πk/L is the kth natural frequency and the spectrum is bandlimited, with λk nonzero only for frequencies ωk/2π inside [flo, fhi]. As in the warm-up, the frequency selectivity of the weights is accounted for by the parameters flo and fhi. As the bandwidth fhi − flo increases, the receptive fields are built out of a wider selection of frequencies. This makes the receptive fields less smooth (Fig 2D). Each field is localized to near t = 0, and its decay with t is determined by the parameter γ. As γ increases, the receptive field is selective to larger time windows.

The eigenbasis of the covariance function Eq (8) is similar to a Fourier eigenbasis modulated by a decaying exponential. The eigenbasis is an orthonormal basis for the span of λk exp(−t/γ) cos(ωkt) and λk exp(−t/γ) sin(ωkt), which are a non-orthogonal set of functions in L2(T). The hidden weights transform timeseries inputs into this eigenbasis and discard components outside the passband frequencies [flo, fhi].
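A sketch of how this covariance could be discretized and sampled (our own code; the expression follows Eq (8) as written above, the parameter values are the fitted ones reported below, and the trace is normalized to the input dimension as elsewhere in the paper):

```python
import numpy as np

def sensilla_covariance(t, f_lo, f_hi, gamma, L):
    """Discretized locally stationary covariance of Eq (8):
    C(t, t') = exp(-(t + t')/gamma) * sum_k cos(omega_k (t - t')),
    with the sum over natural frequencies k/L inside [f_lo, f_hi]."""
    T1, T2 = np.meshgrid(t, t, indexing="ij")
    C = np.zeros((len(t), len(t)))
    k = 1
    while k / L <= f_hi:
        if k / L >= f_lo:
            C += np.cos(2 * np.pi * k / L * (T1 - T2))
        k += 1
    C *= np.exp(-(T1 + T2) / gamma)
    return C * (len(t) / np.trace(C))          # normalize so Tr(C) = d

rng = np.random.default_rng(3)
t = np.arange(1600) / 40000.0                  # 40 ms at 40 kHz, d = 1600
C_model = sensilla_covariance(t, f_lo=75.0, f_hi=200.0, gamma=0.01217, L=0.04)
W = rng.multivariate_normal(np.zeros(len(t)), C_model, size=95, method="eigh")
```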

We fit the covariance model to receptive field data from 95 CS neurons from wings of the hawkmoth Manduca sexta (data from [14]). Briefly, CS receptive fields were estimated as the spike-triggered average (STA) of experimental mechanical stimuli of the wings, where the stimuli were generated as bandpassed white noise (2–300 Hz).

To characterize the receptive fields of this population of CS neurons, we compute the data covariance matrix Cdata by taking the inner product between the receptive fields. We normalize the trace to be the dimension of each receptive field (number of samples), which in this case is 40 kHz × 40 ms = 1600 samples. This normalization sets the overall scale of the covariance matrix. The data covariance matrix shows a tridiagonal structure (Fig 3A). The main diagonal is positive while the off diagonals are negative. All diagonals decay away from the top left of the matrix.
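In code, the data covariance described here is simply the trace-normalized second-moment matrix of the STAs; a minimal sketch, assuming the receptive fields are stacked row-wise in an array `rfs`:

```python
import numpy as np

def data_covariance(rfs):
    """Trace-normalized second-moment matrix of receptive fields.

    rfs: (n_neurons, d) array with one spike-triggered average per row.
    """
    d = rfs.shape[1]
    C = rfs.T @ rfs                   # inner products between receptive field samples
    return C * (d / np.trace(C))      # normalize so Tr(C_data) = d
```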

Fig 3. Spectral properties of mechanosensory RFs and our model are similar.

We compare the covariance matrices generated from (A) receptive fields of 95 mechanosensors from [14], (B) the model Eq (8), and (C) 95 random samples from the same model. All covariance matrices show a tri-diagonal structure that decays away from the origin. (D) The first five principal components of all three covariance matrices are similar and explain 90% of the variance in the RF data. (E) The leading eigenvalue spectra of the data and models show similar behavior.

https://doi.org/10.1371/journal.pcbi.1010484.g003

To fit the covariance model to the data, we optimize the parameters flo, fhi, and γ, finding flo = 75 Hz, fhi = 200 Hz, and γ = 12.17 ms best fit the sensilla data. We do so by minimizing the Frobenius norm of the difference between Cdata and the model (see S1 Appendix). The resulting model covariance matrix (Fig 3B) matches the data covariance matrix (Fig 3A) remarkably well qualitatively. The normalized Frobenius norm of the difference between Cdata and the model is 0.4386. Examples of biological receptive fields and random samples from this fitted covariance model are shown in Fig B in S1 Appendix. To simulate the effect of a finite number of neurons, we generate 95 weight vectors (equal to the number of neurons recorded) and recompute the model covariance matrix (Fig 3C). We call this the finite neuron model covariance matrix Cfinite, and it shows the bump and blob-like structures evident in Cdata but not in Cmodel. This result suggests that these bumpy structures can be attributed to having a small number of recorded neurons. We hypothesize that these effects would disappear with a larger dataset and Cdata would more closely resemble Cmodel.
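The fit itself can be sketched as a small optimization over (flo, fhi, γ), reusing the `sensilla_covariance` and `data_covariance` helpers above. The normalized Frobenius distance below is our reading of the metric reported in the text, and the derivative-free Nelder-Mead search is our own choice of optimizer; the exact procedure is given in S1 Appendix.

```python
import numpy as np
from scipy.optimize import minimize

def frobenius_distance(C_data, C_model):
    """Normalized Frobenius distance between data and model covariance matrices."""
    return np.linalg.norm(C_data - C_model) / np.linalg.norm(C_data)

def fit_sensilla_model(C_data, t, L, theta0=(50.0, 150.0, 0.01)):
    """Fit (f_lo, f_hi, gamma) by minimizing the Frobenius distance to C_data."""
    def objective(theta):
        f_lo, f_hi, gamma = theta
        return frobenius_distance(C_data, sensilla_covariance(t, f_lo, f_hi, gamma, L))
    # The objective is piecewise constant in the band edges, so a derivative-free
    # search (or a grid search) is a natural choice here.
    result = minimize(objective, x0=np.array(theta0), method="Nelder-Mead")
    return result.x, result.fun
```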

For comparison, we also calculate the Frobenius difference for null models, the unstructured covariance model and the Fourier model (5). For the unstructured model, the Frobenius norm difference is 0.9986 while that of the Fourier model is 0.9123. The sensilla covariance model has a much lower difference (0.4386) compared to the null models, fitting the data more accurately. We show the covariance matrices and sampled receptive fields from the null models in Fig C to E in S1 Appendix.

Comparing the eigenvectors and eigenvalues of the data and model covariance matrices, we find that the spectral properties of both Cmodel and Cfinite are similar to that of Cdata. The eigenvalue curves of the models match that of the data quite well (Fig 3E); these curves are directly comparable because each covariance is normalized by its trace, which makes the sum of the eigenvalues unity. Further, all of the data and the model covariance matrices are low-dimensional. The first 10 data eigenvectors explain 97% of the variance, and the top 5 explain 90%. The top 5 eigenvectors of the model and its finite sample match that of the data quite well (Fig 3D).

Primary visual cortex.

We now turn to visually driven neurons from the mammalian primary visual cortex. Primary visual cortex (V1) is the earliest cortical area for processing visual information (Fig 4A). The neurons in V1 can detect small changes in visual features like orientations, spatial frequencies, contrast, and size.

Fig 4. Random receptive field model of Primary Visual Cortex (V1).

(A) Diagram of the mouse brain with V1 shown in blue. (B) Receptive fields of two mouse V1 neurons calculated from their response to white noise stimuli. The fields are localized to a region in a visual field and show “on” and “off” regions. (C) Random samples from the model Eq (9) distribution. (D) Increasing the receptive field size parameter in our model leads to larger fields. (E) Increasing the model spatial frequency parameter leads to more variable fields.

https://doi.org/10.1371/journal.pcbi.1010484.g004

Here, we model the receptive fields of simple cells in V1, which have clear excitatory and inhibitory regions such that light shone on the excitatory regions increases the cell’s response and vice-versa (Fig 4B). The shape of the regions determines the orientation selectivity, while their widths determine the frequency selectivity. The receptive fields are centered at a location in the visual field and decay away from it. They integrate visual stimuli within a small region of this center [46]. Gabor functions are widely used as a mathematical model of the receptive fields of simple cells [9].

We model these receptive fields using another locally stationary GP [45] and show examples of generated receptive fields in Fig 4C. Consider the inputs to the cortical cells to be a continuous two-dimensional image x(t), where the domain T = [0, L] × [0, L′] and t = (t1, t2) ∈ T. Since the image is real-valued, x(t) is the grayscale contrast or single color channel pixel values. The neuron weights are then generated from a covariance function of the following form:

C(t, t′) = exp(−‖t − t′‖²/2f²) exp(−‖t − c‖²/2s²) exp(−‖t′ − c‖²/2s²). (9)

The receptive field center is defined by c, and the size of the receptive field is determined by the parameter s. As s increases, the receptive field extends farther from the center c (Fig 4D). Spatial frequency selectivity is accounted for by the bandwidth parameter f. As f decreases, the spatial frequency of the receptive field goes up, making the weights less smooth (Fig 4E).

The eigendecomposition of the covariance function Eq (9) leads to an orthonormal basis of single scale Hermite wavelets [47, 48]. When c = 0, the wavelet eigenfunctions are Hermite polynomials modulated by a decaying Gaussian:

ϕk(t) = c3 Hk(c1 t) exp(−c2 t²), (10)

where Hk is the kth (physicist’s) Hermite polynomial; eigenfunctions for nonzero centers c are just shifted versions of Eq (10). The detailed derivation and values of the constants c1, c2, c3 and normalization are in S1 Appendix.
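A sketch of the discretized covariance, written exactly as Eq (9) appears above (a squared-exponential term in the pixel separation times Gaussian envelopes around the center c); the grid shape and parameter values follow the V1 data and fit reported below, and the function name is our own.

```python
import numpy as np

def v1_covariance(ny, nx, center, size, freq):
    """Discretize Eq (9) on an ny-by-nx pixel grid:
    C(t, t') = exp(-|t - t'|^2 / 2 freq^2)
             * exp(-|t - c|^2 / 2 size^2) * exp(-|t' - c|^2 / 2 size^2)."""
    ys, xs = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    pts = np.column_stack([ys.ravel(), xs.ravel()]).astype(float)      # (d, 2) pixel coordinates
    diff = pts[:, None, :] - pts[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)                                 # pairwise squared distances
    r2 = np.sum((pts - np.asarray(center, dtype=float)) ** 2, axis=-1) # squared distance to center
    envelope = np.exp(-r2 / (2 * size ** 2))
    C = np.exp(-dist2 / (2 * freq ** 2)) * np.outer(envelope, envelope)
    d = C.shape[0]
    return C * (d / np.trace(C))                                       # normalize so Tr(C) = d

rng = np.random.default_rng(4)
C = v1_covariance(ny=14, nx=36, center=(7, 18), size=1.87, freq=0.70)
W = rng.multivariate_normal(np.zeros(C.shape[0]), C, size=8, method="eigh")
fields = W.reshape(8, 14, 36)            # each row reshapes to a 14 x 36 receptive field
```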

We use Eq (9) to model receptive field data from 8,358 V1 neurons recorded with calcium imaging from transgenic mice expressing GCaMP6s; the mice were headfixed and running on an air-floating ball. We presented 24,357 unique white noise images of 14 × 36 pixels using the Psychtoolbox [49], where the pixels were white or black with equal probability. Images were upsampled to the resolution of the screens via bilinear interpolation. The stimulus was corrected for eye-movements online using custom code. The responses of 45,026 cells were collected using a two-photon mesoscope [50] and preprocessed using Suite2p [51]. Receptive fields were calculated from the white noise images and the deconvolved calcium responses of the cells using the STA. For the covariance analysis, we picked cells above the signal-to-noise (SNR) threshold of 0.4; this gave us 8,358 cells. The SNR was calculated from a smaller set of 2,435 images that were presented twice using the method from [4]. As a preprocessing step, we moved the center of mass of every receptive field to the center of the visual field.

We compute the data covariance matrix Cdata by taking the inner product between the receptive fields. We normalize the trace to be the dimension of each receptive field, which in this case is (14 × 36) pixels = 504 pixels. The data covariance matrix resembles a tridiagonal matrix. However, the diagonals are non-zero only at equally spaced segments. Additionally, their values decay away from the center of the matrix. We show Cdata zoomed in at the non-zero region around the center of the matrix (Fig 5A); this corresponds to the 180 × 180 pixel region around the center of the full 504 × 504 pixel matrix. The full covariance matrix is shown in Fig F in S1 Appendix.

Fig 5. Spectral properties of V1 RFs and our model are similar.

We compare the covariance matrices generated from the (A) receptive fields of 8,358 mouse V1 neurons, (B) the GP model Eq (9), and (C) 8,358 random samples from the model. These resemble a tri-diagonal matrix whose diagonals are non-zero at equally-spaced segments. (D) The leading 10 eigenvectors of the data and model covariance matrices show similar structure and explain 68% of the variance in the data. Analytical Hermite wavelet eigenfunctions are in the last row and differ from the model due to discretization (both cases) and finite sampling (8,358 neurons only). (E) The eigenspectrum of the model matches well with the data. The staircase pattern in the model comes from repeated eigenvalues at each frequency. The model curve with infinite neurons (black) is obscured by the model curve with 8,358 neurons (red).

https://doi.org/10.1371/journal.pcbi.1010484.g005

In the covariance model, the number of off-diagonals, the rate of their decay away from the center, and the location of that center are determined by the parameters f, s, and c, respectively. The covariance between pixels decays as a function of their distance from c. This leads to the equally-spaced non-zero segments. On the other hand, the covariance also decays as a function of the distance between pixels. This gives the model its diagonal structure. When the frequency parameter f increases, the number of off-diagonals increases. Pixels in the generated weights become more correlated and the weights become spatially smoother. When the size parameter s increases, the diagonals decay more slowly from the center c, increasing correlations with the center pixel and leading the significant weights to occupy more of the visual field.

We again optimize the parameters to fit the data, finding s = 1.87 and f = 0.70 pixels, by minimizing the Frobenius norm of the difference between Cdata and the model. We do not need to optimize over the center parameter c, since we preprocess the data so that all receptive fields are centered at c = (7, 18), the center of the 14 × 36 grid. The resulting model covariance matrix (Fig 5B) and the data covariance matrix (Fig 5A) match remarkably well qualitatively. The normalized Frobenius norm of the difference between Cdata and the model is 0.2993. Examples of biological receptive fields and random samples from the fitted covariance model are shown in Fig G in S1 Appendix. To simulate the effect of a finite number of neurons, we generate 8,358 weights, equal to the number of neurons in our data, to compute Cfinite shown in Fig 5C. This finite matrix Cfinite looks even more like Cdata, and it shows that some, but not all, of the negative covariances far from the center result from finite sample size.

For comparison, we also calculate the normalized Frobenius difference for null models, the unstructured covariance model and a translation invariant V1 model. In the translation invariant model, we remove the spatially localizing exponential in Eq (9) and only fit the spatial frequency parameter, f. For the unstructured model, the Frobenius norm difference is 0.9835 while that of the translation invariant model is 0.9727. The V1 covariance model has a much lower difference (0.2993) and is a better fit to the data. We show the covariance matrices and sampled receptive fields from these null models in Fig H to J in S1 Appendix.

Similar spectral properties are evident in the eigenvectors and eigenvalues of Cmodel, Cfinite, Cdata, and the analytical forms derived in Eq (10) (Fig 5D and 5E). The covariances are again normalized to have unit trace. Note that the analytical eigenfunctions are shown on a finer grid than the model and data because the analysis was performed in continuous space. The differences between the eigenfunctions and eigenvalues of the analytical and model results are due to discretization. Examining the eigenvectors (Fig 5D), we also see a good match, although there are some rotations and differences in ordering. These 10 eigenvectors explain 68% of the variance in the receptive field data. For reference, the top 80 eigenvectors explain 86% of the variance in the data and all of the variance in the model. The eigenvalue curves of both the models and the analytical forms match that of the data (Fig 5E) reasonably well, although not as well as for the mechanosensors. In S1 Appendix, we repeat this analysis for receptive fields measured with different stimulus sets in the mouse and a different experimental dataset from non-human primate V1. Our findings are consistent with the results shown above (Fig K to P in S1 Appendix).

Advantages of structured random weights for artificial learning tasks

Our hypothesis is that neuronal inductive bias from structured receptive fields allows networks to learn with fewer neurons, training examples, and steps of gradient descent for classification tasks with naturalistic inputs. To examine this hypothesis, we compare the performance of structured receptive fields against classical ones on several classification tasks. We find that, for most artificial learning tasks, structured random networks learn more accurately from smaller network sizes, fewer training examples, and fewer gradient steps.

Frequency detection.

CS naturally encode the time-history of strain forces acting on the insect body, and sensors inspired by their temporal filtering properties have been shown to accurately classify spatiotemporal data [52]. Inspired by this result, we test sensilla-inspired mechanosensory receptive fields on a timeseries classification task (Fig 6A, top). Each example presented to the network is a 100 ms timeseries sampled at 2 kHz so that d = 200, and the goal is to detect whether or not each example contains a sinusoidal signal. The positive examples are sinusoidal signals with f1 = 50 Hz and corrupted by noise so that their SNR = 1.76 (2.46 dB). The negative examples are Gaussian white noise with matched amplitude to the positive examples. Note that this frequency detection task is not linearly separable because of the random phases in positive and negative examples. See S1 Appendix for additional details including the definition of SNR and how cross-validation was used to find the optimal parameters flo = 10 Hz, fhi = 60 Hz, and γ = 50 ms.
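A sketch of how such a dataset could be generated (our own construction; the exact SNR convention and amplitude matching are defined in S1 Appendix, so the scaling below is only an approximation of the description above):

```python
import numpy as np

rng = np.random.default_rng(5)

def make_frequency_detection(n, d=200, fs=2000.0, f1=50.0, snr=1.76):
    """Positive class: random-phase sinusoid plus white noise. Negative class:
    amplitude-matched white noise. Returns (X, y) with X of shape (n, d)."""
    t = np.arange(d) / fs
    X = np.empty((n, d))
    y = np.arange(n) % 2                       # alternate negative / positive examples
    for i in range(n):
        noise = rng.standard_normal(d)         # unit-power white noise
        if y[i] == 1:
            phase = rng.uniform(0.0, 2.0 * np.pi)
            signal = np.sqrt(2.0 * snr) * np.sin(2.0 * np.pi * f1 * t + phase)
            X[i] = signal + noise              # signal power / noise power = snr
        else:
            X[i] = np.sqrt(1.0 + snr) * noise  # match total power of the positive class
    return X, y

X_train, y_train = make_frequency_detection(n=1000)
```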

Fig 6. Random mechanosensory weights enable learning with fewer neurons in time-series classification tasks.

We show the test error of random feature networks with both mechanosensory and classical white-noise weights against the number of neurons in their hidden layer. For every hidden layer width, we generate five random networks and average their test error. In the error curves, the solid lines show the average test error while the shaded regions represent the standard error across five generations of the random network. The top row shows the timeseries tasks that the networks are tested on. (A, top) In the frequency detection task, an f1 = 50 Hz signal (purple) is separated from white noise (black). (B, top) In the frequency XOR task, f1 = 50 Hz (purple) and f2 = 80 Hz (light purple) signals are separated from white noise (black) and mixtures of 50 Hz and 80 Hz (gray). When their covariance parameters are tuned properly, mechanosensor-inspired networks achieve lower error using fewer hidden neurons on both frequency detection (A, bottom) and frequency XOR (B, bottom) tasks. However, the performance of bio-inspired networks suffers if their weights are incompatible with the task.

https://doi.org/10.1371/journal.pcbi.1010484.g006

For the same number of hidden neurons, the structured RFN significantly outperforms a classical RFN. We show test performance using these tuned parameters in Fig 6A. Even in this noisy task, it achieves 0.5% test error using only 25 hidden neurons. Meanwhile, the classical network takes 300 neurons to achieve similar error.

Predictably, the performance suffers when the weights are incompatible with the task. We show results when flo = 10 Hz and fhi = 40 Hz and the same γ (Fig 6A). The incompatible RFN performs better than chance (50% error) but much worse than the classical RFN. It takes 300 neurons just to achieve 16.3% test error. The test error does not decrease below this level even with additional hidden neurons.

Frequency XOR task.

To challenge the mechanosensor-inspired networks on a more difficult task, we build a frequency Exclusive-OR (XOR) problem (Fig 6B, top). XOR is a binary function which returns true if and only if its two inputs differ; otherwise it returns false. XOR is a classical example of a function that is not linearly separable and thus harder to learn. Our inputs are again 100 ms timeseries sampled at 2 kHz. The inputs either contain a pure frequency of f1 = 50 Hz or f2 = 80 Hz, mixed frequency signals with both f1 and f2, or white noise. In both the pure and mixed frequency cases, we add noise so that the SNR = 1.76. See S1 Appendix for details. The goal of the task is to output true if the input contains either pure tone and false if the input contains mixed frequencies or is white noise.

We tune the GP covariance parameters flo, fhi, and γ from Eq (8) using cross-validation. The cross validation procedure and algorithmic details are identical to that of the frequency detection task. Using cross validation, we find the optimal parameters to be flo = 50 Hz, fhi = 90 Hz, and γ = 40 ms. For incompatible weights, we take flo = 10 Hz, fhi = 60 Hz, and the same γ.

The structured RFN significantly outperforms the classical RFN for the same number of hidden neurons. We show network performance using these parameters in Fig 6B. Classification error of 1% can be achieved with 25 hidden neurons. In sharp contrast, the classical RFN requires 300 hidden neurons just to achieve 6% error. With incompatible weights, the network needs 300 neurons to achieve just 15.1% test error and does not improve with larger network sizes. Out of the four input subclasses, it consistently fails to classify pure 80 Hz sinusoidal signals, which are outside its passband.

Image classification.

We next test the V1-inspired receptive fields on two standard digit classification tasks, MNIST [53] and KMNIST [54]. The MNIST and KMNIST datasets each contain 70,000 images of handwritten digits. In MNIST, these are the Arabic numerals 0–9, whereas KMNIST has 10 Japanese hiragana phonetic characters. Both datasets come split into 60,000 training and 10,000 test examples. With 10 classes, there are 6,000 training examples per class. Every example is a 28 × 28 grayscale image with centered characters.

Each hidden weight has its center c chosen uniformly at random from all pixels. This ensures that the network’s weights uniformly cover the image space and in fact means that the network can represent any sum of locally-smooth functions (see S1 Appendix). We use a network with 1,000 hidden neurons and tune the GP covariance parameters s and f from Eq (9) using 3-fold cross validation on the MNIST training set. Each parameter ranges from 1 to 20 pixels, and the optimal parameters are found with a grid search. We find the optimal parameters to be s = 5 pixels and f = 2 pixels. We then refit the optimal model using the entire training set. The parameters from MNIST were used on the KMNIST task without additional tuning.
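A sketch of assembling the hidden weight matrix for this experiment, reusing the `v1_covariance` sketch from earlier: each neuron gets its own uniformly random center, so the weights are sampled one covariance at a time (transparent, though not the fastest way to do it).

```python
import numpy as np

rng = np.random.default_rng(6)

def v1_rfn_weights(m, ny=28, nx=28, size=5.0, freq=2.0):
    """Hidden weights for a V1-structured RFN on ny-by-nx images.

    Each of the m hidden neurons has its center drawn uniformly over the pixels.
    """
    d = ny * nx
    W = np.zeros((m, d))
    for i in range(m):
        center = (rng.integers(ny), rng.integers(nx))
        C = v1_covariance(ny, nx, center, size, freq)
        W[i] = rng.multivariate_normal(np.zeros(d), C, method="eigh")
    return W

W = v1_rfn_weights(m=1000)       # 1,000 hidden neurons with s = 5 and f = 2 pixels
```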

The V1-inspired network achieves much lower average classification error than the classical RFN for the same number of hidden neurons. We show learning performance using these parameters on the MNIST task in Fig 7A. Achieving 6% error on the MNIST task requires 100 neurons for the structured RFN versus 1,000 neurons for the classical RFN, and the structured RFN achieves 2.5% error with 1,000 neurons. Qualitatively similar results hold for the KMNIST task (Fig 7B), although the overall errors are larger, reflecting the harder task. Achieving 28% error on KMNIST requires 100 neurons for the structured RFN versus 1,000 neurons for the classical RFN, and the structured RFN achieves 13% error with 1,000 neurons.

Fig 7. Random V1 weights enable learning with fewer neurons and fewer examples on digit classification tasks.

We show the average test error of random feature networks with both V1 and classical white-noise weights against the number of neurons in their hidden layer. For every hidden layer width, we generate five random networks and average their test error. The solid lines show the average test error while the shaded regions represent the standard error across five generations of the random network. The top row shows the network’s test error on (A) MNIST and (B) KMNIST tasks. When their covariance parameters are tuned properly, V1-inspired networks achieve lower error using fewer hidden neurons on both tasks. The network performance deteriorates when the weights are incompatible with the task. (C) MNIST and (D) KMNIST with 5 samples per class. The V1 network still achieves lower error on these few-shot tasks when the parameters are tuned properly.

https://doi.org/10.1371/journal.pcbi.1010484.g007

Again, network performance suffers when GP covariance parameters do not match the task. This happens if the size parameter s is smaller than the stroke width or the spatial scale f does not match the stroke variations in the character. Taking the incompatible parameters s = 0.5 and f = 0.5 (Fig 7A and 7B), the structured weights perform worse than the classical RFN in both tasks. With 1,000 hidden neurons, the network achieves the relatively poor test errors of 8% on MNIST (Fig 7A) and 33% on KMNIST (Fig 7B).

Structured weights improve generalization with limited data.

Alongside learning with fewer hidden neurons, V1 structured RFNs also learn more accurately from fewer examples. We test few-shot learning using the image classification datasets. The training examples are reduced from 60,000 to 50, or only 5 training examples per class. The test set and GP parameters remain the same.

Structured encodings allow learning with fewer samples than unstructured encodings. We show these few-shot learning results in Fig 7C and 7D. The networks’ performance saturates past a few hundred hidden neurons. For MNIST, the lowest error achieved by the V1 structured RFN is 27% versus 33% for the classical RFN and 37% using incompatible weights (Fig 7C). The structured network achieves 61% error on the KMNIST task, as opposed to 66% for the classical RFN and 67% using incompatible weights (Fig 7D).

Networks train faster when initialized with structured weights.

Now we study the effect of structured weights as an initialization strategy for fully-trained neural networks where all weights in the network vary. We hypothesized that structured initialization allows networks to learn faster, i.e. that the training loss and test error would decrease faster than with unstructured weights. We have shown that the performance of RFNs improves with biologically inspired weight sampling. However, in RFNs Eq (1) only the readout weights β are modified with training, and the hidden weights W are frozen at their initial value.

We compare the biologically-motivated initialization with a classical initialization where the variance is inversely proportional to the number of hidden neurons. This initialization is widely known as the “Kaiming He normal” scheme and is thought to stabilize training dynamics by controlling the magnitude of the gradients [55]. The classical approach ensures that 𝔼‖w‖² = 2, so for fair comparison we scale our structured weight covariance matrix to have Tr(C) = 2. In our studies with RFNs the trace is equal to d, but this weight scale can be absorbed into the readout weights β due to the homogeneity of the ReLU.

We again compare structured and unstructured weights on MNIST and KMNIST tasks, common benchmarks for fully-trained networks. The architecture is a single hidden layer feedforward neural network (Fig 1) with 1,000 hidden neurons. The cross-entropy loss over the training sets is minimized using simple gradient descent (GD) for 3,000 epochs. For a fair comparison, the learning rate is optimized for each network separately. We define the area under the training loss curve as a metric for the speed of learning. Then, we perform a grid search over the range [10⁻⁴, 1] for the learning rate that minimizes this metric, resulting in learning rates of 0.23, 0.14, and 0.14 for the structured, unstructured, and incompatible networks, respectively. All other parameters are the same as for image classification.
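A PyTorch sketch of this setup (our own code, not the authors'): pre-sampled structured weights, here reusing the `v1_rfn_weights` sketch from above, are rescaled so the implied covariance has Tr(C) = 2, copied into the hidden layer, and then every parameter is trained by full-batch gradient descent. `X_train` and `y_train` are assumed to be preloaded MNIST tensors.

```python
import numpy as np
import torch
import torch.nn as nn

def structured_init(layer, W_structured):
    """Copy pre-sampled structured weights into a Linear layer, rescaled so the
    implied covariance has Tr(C) = 2 rather than Tr(C) = d."""
    d = layer.in_features
    W = np.sqrt(2.0 / d) * W_structured
    with torch.no_grad():
        layer.weight.copy_(torch.tensor(W, dtype=torch.float32))

model = nn.Sequential(nn.Linear(784, 1000), nn.ReLU(), nn.Linear(1000, 10))
structured_init(model[0], v1_rfn_weights(m=1000))          # V1-structured hidden layer

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.23)   # plain full-batch GD
for epoch in range(3000):
    optimizer.zero_grad()
    loss = loss_fn(model(X_train), y_train)   # X_train: (n, 784) float, y_train: (n,) long
    loss.backward()
    optimizer.step()
```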

In both the MNIST and KMNIST tasks, the V1-initialized network minimizes the loss function faster than the classically initialized network. For the MNIST task, the V1 network achieves a loss value of 0.05 after 3,000 epochs compared to 0.09 for the other network (Fig 8A). We see qualitatively similar results for the KMNIST task. At the end of training, the V1-inspired network’s loss is 0.08, while the classically initialized network only reaches 0.12 (Fig 8B). We find that the V1-initialized network performs no better than classical initialization when the covariance parameters do not match the task. With incompatible parameters, the V1-initialized network achieves a loss value of 0.11 on MNIST and 0.15 on KMNIST.

Fig 8. V1 weight initialization for fully-trained networks enables faster training on digit classification tasks.

We show the average test error and the train loss of fully-trained neural networks against the number of training epochs. The hidden layer of each network contains 1,000 neurons. We generate five random networks and average their performance. The solid lines show the average performance metric across five random networks while the shaded regions represent the standard error. The top row shows the network’s training loss on (A) MNIST and (B) KMNIST tasks. The bottom row shows the corresponding test error on (C) MNIST and (D) KMNIST tasks. When their covariance parameters are tuned properly, V1-initialized networks achieve lower training loss and test error under fewer epochs on both MNIST and KMNIST tasks. The network performance is no better than unstructured initialization when the weights are incompatible with the task.

https://doi.org/10.1371/journal.pcbi.1010484.g008

Not only does it minimize the training loss faster, the V1-initialized network also generalizes well and achieves a lower test error at the end of training. For MNIST, it achieves 1.7% test error compared to 3.3% error for the classically initialized network, and 3.6% using incompatible weights (Fig 8C). For KMNIST, we see 9% error compared to 13% error with classical initialization and 15% using incompatible weights (Fig 8D).

We see similar results across diverse hidden layer widths and learning rates (Fig Q to T in S1 Appendix), with the benefits most evident for wider networks and smaller learning rates. Furthermore, the structured weights show similar results when trained for 10,000 epochs (rate 0.1; 1,000 neurons; not shown) and with other optimizers like minibatch Stochastic Gradient Descent (SGD) and ADAM (batch size 256, rate 0.1; 1,000 neurons; not shown). Structured initialization facilitates learning across a wide range of networks.

However, the improvement is not universal: no significant benefit was found by initializing the early convolutional layers of the deep network AlexNet [56] and applying it to the ImageNet dataset [57], as shown in S1 Appendix and Fig U in S1 Appendix. The large amounts of training data and the fact that only a small fraction of the network was initialized with structured weights could explain this null result. Also, in many of these scenarios the incompatible structured weights reach performance on par with the compatible ones by the end of training, once the poor inductive bias is overcome.

Improving representation with structured random weights.

We have shown how structured receptive field weights can improve the performance of RFNs and fully-trained networks on a number of supervised learning tasks. As long as the receptive fields are compatible with the task itself, then performance gains over unstructured features are possible. If they are incompatible, then the networks perform no better, or even worse, than networks with classical unstructured weights.

These results can be understood with our theoretical framework. Structured weights effectively cause the input x to undergo a linear transformation into a new representation x̃ following Theorem 1. In all of our examples, this new representation is bandlimited due to how we design the covariance function. (The V1 weights have all eigenvalues nonzero, but the spectrum decays exponentially, so it acts as a lowpass filter.) By moving to a bandlimited representation, we both filter out noise—high-frequency components—and reduce dimensionality—coordinates in x̃ outside the passband are zero. In general, noise and dimensionality both make learning harder.

It is easiest to understand these effects in the frequency detection task. For simplicity, assume we are using the stationary features of our warm-up to do frequency detection. In this task, all of the signal power is contained in the f1 = 50 Hz frequency, and everything else is due to noise. If the weights are compatible with the task, this means that w is a sum of sines and cosines of frequencies ωk in some passband which includes f1. The narrower we make this bandwidth while still retaining the signal, the higher the SNR of x̃ becomes, since more noise is filtered out (see S1 Appendix).
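A back-of-the-envelope version of this argument, under the simplifying assumption (ours) that the noise is white with its power spread evenly over the d Fourier coordinates while the signal power sits entirely at f1:

```latex
% Signal power P_s is concentrated in the f_1 coordinate; white noise of total
% power P_n contributes P_n / d to each of the d Fourier coordinates.
% Keeping only the K passband coordinates (with f_1 inside the passband) gives
\mathrm{SNR}(\tilde{x}) \;=\; \frac{P_s}{K \, P_n / d} \;=\; \frac{d}{K}\,\mathrm{SNR}(x),
% so narrowing the passband (smaller K) while retaining f_1 improves the SNR by a factor of d/K.
```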

Discussion

In this paper, we describe a random generative model for the receptive fields of sensory neurons. Specifically, we model each receptive field as a random filter sampled from a Gaussian process (GP) with covariance structure matched to the statistics of experimental neural data. We show that two kinds of sensory neurons—insect mechanosensory and simple cells in mammalian V1—have receptive fields that are well-described by GPs. In particular, the generated receptive fields, their second-order statistics, and their principal components match with receptive field data. Theoretically, we show that individual neurons perform a randomized transformation and filtering on the inputs. This connection provides a framework for sensory neurons to compute input transformations like Fourier and wavelet transforms in a biologically plausible way.

Our numerical results using these structured random receptive fields show that they offer better learning performance than unstructured receptive fields on several benchmarks. The structured networks achieve higher test performance with fewer neurons and fewer training examples, unless the frequency content of their receptive fields is incompatible with the task. In networks that are fully trained, initializing with structured weights leads to better network performance (as measured by training loss and generalization) in fewer iterations of gradient descent. Structured random features may be understood theoretically as transforming inputs into an informative basis that retains the important information in the stimulus while filtering away irrelevant signals.

Modeling other sensory neurons and modalities

The random feature formulation is a natural extension of the traditional linear-nonlinear (LN) neuron model. This approach may be applied to other brain regions where LN models are successful, for instance sensory areas with primarily feedforward connectivity such as the somatosensory and auditory systems. Neurons in the auditory and somatosensory systems are selective to both spatial and temporal structure in their stimuli [10, 11, 58], and spatial structure emerges in networks trained on artificial tactile tasks [59]. Their receptive fields could be modeled by GPs with spatiotemporal covariance functions [60]; these could be useful for artificial tasks with spatiotemporal stimuli such as movies and multivariate time series. Neurons with localized but random temporal responses were found to be compatible with manifold coding in a decision-making task [61]. Our GPs are a complementary approach to the traditional sparse coding [62] and efficient coding [63, 64] hypotheses; the connections to these other theories are interesting directions for future research.
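
One simple construction, offered here only as an illustration and not necessarily the approach of [60], is a separable covariance over space s and time t,

\[
k\big((s, t), (s', t')\big) = k_{\mathrm{space}}(s, s')\, k_{\mathrm{time}}(t, t'),
\]

evaluated on a space-time grid to give the covariance of spatiotemporal receptive field weights; nonseparable covariance functions would be needed to capture direction-selective (motion-tuned) structure.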

Receptive fields in development

Our generative model offers new directions to explore the biological basis and computational principles behind receptive fields. Development lays down a basic architecture that is conserved from animal to animal [65, 66], yet the details of every neural connection cannot be specified [67], leading to some inevitable randomness, at least initially [19]. If receptive fields are random with constrained covariance, it is natural to ask how biology implements this. Unsupervised Hebbian dynamics with local inhibition can allow networks to learn principal components of their input [68, 69]. An interesting future direction is whether similar learning rules can give rise to the overcomplete, nonorthogonal structure studied here. This may prove more biologically plausible than weights that result from task-driven optimization.
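
As one concrete instance of such Hebbian dynamics, the following minimal sketch of Oja's rule [68] converges to the leading principal component of its inputs (the data here are synthetic):

```python
import numpy as np

def oja_rule(X, eta=0.01, n_epochs=20, seed=0):
    """Oja's Hebbian update w <- w + eta * y * (x - y * w), with y = w @ x.
    Converges to the (unit-norm) leading principal component of the rows of X."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(n_epochs):
        for x in rng.permutation(X):            # shuffle rows each epoch
            y = w @ x
            w += eta * y * (x - y * w)
    return w

# Synthetic inputs with one dominant direction (per-coordinate variances 9, 1, 0.25).
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3)) * np.array([3.0, 1.0, 0.5])
w = oja_rule(X)
print(np.abs(w))                                 # approximately the first principal axis (1, 0, 0)
```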

The above assumes that receptive field properties actually lie within synaptic weights. For spatial receptive fields, this assumption is plausible [70], but the temporal properties of receptive fields are more likely a result of neurons’ intrinsic dynamics, for which the LN framework is just a model [71–73]. Heterogeneous physiological (e.g. resonator dynamics) and mechanical (position and shape of the mechanosensor relative to body structure) properties combine to produce the diverse temporal receptive field structures [74]. Development thus leverages different mechanisms to build structure into the receptive field properties of sensory neurons.

Connections to compressive sensing

Random projections have seen extensive use in the field of compressive sensing, where a high-dimensional signal can be recovered from only a few measurements so long as it has a sparse representation [75–77]. Random compression matrices are known to have optimal properties; however, in many cases structured randomness is more realistic. Recent work has shown that structured random projections with local wiring constraints (in one dimension) are compatible with dictionary learning [78], supporting previous empirical results [79]. Our work shows that structured random receptive fields are equivalent to employing a wavelet dictionary and dense Gaussian projection.
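
To sketch the reasoning behind this equivalence, write the eigendecomposition of the weight covariance from Theorem 1, with the eigenvectors playing the role of the wavelet dictionary:

\[
C = \Psi \Lambda \Psi^{\top}, \qquad w = \Psi \Lambda^{1/2} g, \quad g \sim \mathcal{N}(0, I)
\quad\Longrightarrow\quad
w^{\top} x = g^{\top} \big( \Lambda^{1/2} \Psi^{\top} x \big),
\]

so each hidden unit computes a dense Gaussian projection g of the (eigenvalue-weighted) dictionary coefficients \(\Psi^{\top} x\).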

Machine learning and inductive bias

An important open question for both neuroscience and machine learning is why certain networks, characterized by features such as their architecture, weights, and nonlinearities, are better than others for certain problems. One perspective is that a network is good for a problem if it is biased towards approximating functions that are close to the target, known as an inductive bias, which depends on an alignment between the features encoded by neurons and the task at hand [32]. Our approach shows that structured receptive fields are equivalent to a linear transformation of the input that can build in such biases. Furthermore, we can describe the nonlinear properties of the network using the kernel, which varies depending on the receptive field structure. If the target function has a small norm in this RKHS, then there is an inductive bias and the target is easier to learn [80, 81]. A small RKHS norm means that the target function varies smoothly over the inputs, and smooth functions are easier to learn than rapidly varying ones. In this way, the receptive field structure influences how easy the target function is to learn. We conjecture that receptive fields from neural-inspired distributions shape the RKHS geometry so that the target function’s norm is small in that RKHS, compared to the RKHS of random white-noise receptive fields. We leave verifying this conjecture in detail to future work.
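
For concreteness, in the wide-network limit the kernel induced by structured random features takes the standard random-features form, with the expectation taken over the structured weight distribution:

\[
k(x, x') = \mathbb{E}_{w \sim \mathcal{N}(0, C)}\!\left[ \sigma(w^{\top} x)\, \sigma(w^{\top} x') \right],
\]

and a target function is easy to learn when its norm in the corresponding RKHS is small [80, 81].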

Networks endowed with principles of neural computation like batch normalization, pooling of inputs, and residual connections have been found to contain inductive biases for certain learning problems [82, 83]. Learning data-dependent kernels is another way to add in inductive bias [84]. We also saw that initializing fully-trained networks from our generative models improved their speed of convergence and generalization compared to unstructured initialization. This result is consistent with known results that initialization has an effect on generalization [85]. The initialization literature has mostly been focused on avoiding exploding/vanishing gradients [55, 86]. Here, we conjecture that the inductive bias in our structured connectivity places the network closer to a good solution in the loss landscape [67].

The random V1-inspired receptive fields that we model can be seen as similar to the filters of a convolutional neural network (CNN) [87]; CNNs have both similarities to and differences from brains [88]. A recent study showed that CNNs with a fixed V1-like convolutional layer are more robust to adversarial perturbations of their inputs [89]. In a similar vein to our work, using randomly sampled Gabor receptive fields in the first layer of a deep network was also shown to improve its performance [90]. The wavelet scattering transform is a multi-layer network in which wavelet coefficients are passed through nonlinearities, a model that is similar to deep CNNs [91–93]. Our framework differs in that it is a randomized model and yields wavelets of a single scale; similar studies of robustness and learning in deep networks with our weights are possible. Adding layers to our model or sampling weights with a variety of spatial frequencies and field sizes would yield random networks that behave similarly to the scattering transform, offering another connection between the brain and CNNs. Directly learning filters in a Hermite wavelet basis led to good performance in ANNs with little data [37], and this idea was extended to multiple scales by [94]. Our structured random features can be seen as an RFN version of those ideas, with supporting evidence that these principles are used in biology.

Limitations and future directions

There are several limitations to the random feature approach. We model neuron responses with scalar firing rates instead of discrete spikes, and we ignore complex neuronal dynamics, neuromodulatory context, and many other details. Like most LN models, the random feature model assumes zero plasticity in the hidden layer neurons. However, associative learning can drive changes in the receptive fields of individual neurons in sensory areas like V1 and auditory cortex [95, 96]. Further, our RFN is purely feedforward and cannot account for feedback connections. Recent work suggests that a feedforward architecture lacks sufficient computational power to serve as a detailed input-output model for a network of cortical neurons; it might need additional layers with convolutional filters [97]. It can be difficult to interpret the parameters found from fitting receptive field data and to connect them to experimental conditions. Also, the GP model of weights only captures covariance (second moments) and neglects higher-order statistics. It remains to be shown how the theory can yield concrete predictions that can be tested under in vivo experimental conditions.

The random feature receptive field model is a randomized extension of the LN neuron model. The LN model fits a parameterized function to each receptive field [6]. In contrast, the random feature framework fits a distribution to an entire population of receptive fields and generates realistic receptive fields from that distribution. A natural question is how the two compare. If the goal is to capture individual differences between neuronal receptive fields, one should use an LN model where each neuron’s receptive field is fit to data. The random feature model is not as flexible, but it provides a direct connection to random feature theory, and it is mathematically tractable and generative. This connection to kernel learning opens the door to techniques that are a mainstay of the machine learning theory literature, for instance to estimate generalization error and sample complexity [80], in the context of learning in more biologically realistic networks.

We see several future directions for structured random features in connecting computational neuroscience and machine learning. As already stated, the auditory, somatosensory, and tactile regions are good candidates for further study, as are the developmental principles that could give rise to random yet structured receptive field properties. To account for plasticity in the hidden layer, one could also analyze the neural tangent kernel (NTK) associated with structured features [98]. These kernels are often used to analyze ANNs trained with gradient descent when the number of hidden neurons is large and the step size is small [26]. To incorporate lateral and feedback connections, the weights could be sampled from GPs with recurrent covariance functions [99]. Our theory may also help explain why CNNs with a fixed V1-like convolutional layer are more robust to adversarial input perturbations [89]: such a layer filters out high-frequency corruptions, and it seems likely that structured random features will also be more robust. It would also be interesting to treat the intermediate-layer weights of fully-trained networks as approximate samples from a GP by examining their covariance structure. Finally, one could develop other covariance functions and further optimize these RFNs for more sophisticated learning tasks to see whether high performance (lower error, faster training, etc.) on more difficult tasks is possible.

Methods

The methods are described throughout the Results section. Further details and additional results are in S1 Appendix.

Supporting information

S1 Appendix. Additional theory and supporting data.

Fig A: Simulation results for the simplified frequency detection task. On the left, test error versus dataset size for kernel ridge regression using structured kernels (purple lines, by bandwidth) and the unstructured kernel (blue). On the right, test accuracy versus dataset size for an SVM classifier readout trained on structured random features (purple lines, by bandwidth) and unstructured random features (blue). In both cases, incorporating task structure improves performance, leading to lower error.
Fig B: Receptive fields of mechanosensory neurons. We show (A) biological receptive fields and (B) random samples from the fitted covariance model.
Fig C: Covariance matrix of mechanosensory receptive fields and unstructured model. We compare the covariance matrices generated from the (A) receptive fields of 95 mechanosensory neurons, (B) unstructured GP model, and (C) 95 random samples from the model.
Fig D: Covariance matrix of mechanosensory receptive fields and the Fourier model (5). We compare the covariance matrices generated from the (A) receptive fields of 95 mechanosensory neurons, (B) Fourier GP model, and (C) 95 random samples from the model.
Fig E: Receptive fields from mechanosensory neurons, the unstructured model, and the Fourier model (5). We show the receptive fields from the (A) mechanosensory neurons, (B) unstructured GP model, and (C) the Fourier GP model.
Fig F: Covariance matrix of V1 receptive fields and our model for white noise stimuli. We show the full structure of the covariance matrices, which are the 180 × 180 pixel region around the centers of these 504 × 504 pixel matrices. These matrices are generated from the (A) receptive fields of 8,358 mouse V1 neurons, (B) the GP model Eq (9), and (C) 8,358 random samples from the model.
Fig G: Receptive fields of V1 neurons from white noise stimuli. We show (A) biological receptive fields and (B) random samples from the fitted covariance model.
Fig H: Covariance matrix of V1 receptive fields and unstructured model for white noise stimuli. We compare the covariance matrices generated from the (A) receptive fields of 8,358 mouse V1 neurons, (B) unstructured GP model, and (C) 8,358 random samples from the model.
Fig I: Covariance matrix of V1 receptive fields and translation invariant V1 model for white noise stimuli. We compare the covariance matrices generated from the (A) receptive fields of 8,358 mouse V1 neurons, (B) translation invariant version of the V1 GP model, and (C) 8,358 random samples from the model.
Fig J: Receptive fields from V1 neurons, the unstructured model, and the translation invariant V1 model for white noise stimuli. We show the receptive fields from the (A) V1 neurons, (B) unstructured GP model, and (C) the translation invariant V1 GP model.
Fig K: Spectral properties of V1 receptive fields and our model for the Ringach dataset. We compare the covariance matrices generated from the (A) receptive fields of 250 macaque V1 neurons, (B) the GP model Eq (9), and (C) 250 random samples from the model. The data are from [100]. (D) The leading 10 eigenvectors of the data and model covariance matrices show similar structure and explain 57% of the variance in the data. Analytical Hermite wavelet eigenfunctions are in the last row. (E) The eigenspectrum of the model matches well with the data.
Fig L: Receptive fields of V1 neurons from the Ringach dataset. We show (A) biological receptive fields and (B) random samples from the fitted covariance model.
Fig M: Spectral properties of V1 receptive fields and our model for natural image stimuli. We compare the covariance matrices generated from the (A) receptive fields of 10,782 mouse V1 neurons, (B) the GP model Eq (9), and (C) 10,782 random samples from the model. (D) The leading 10 eigenvectors of the data and model covariance matrices show similar structure and explain 39% of the variance in the data. Analytical Hermite wavelet eigenfunctions are in the last row. (E) The eigenspectrum of the model compared to the data.
Fig N: Receptive fields of V1 neurons from natural image stimuli. We show (A) biological receptive fields and (B) random samples from the fitted covariance model.
Fig O: Spectral properties of V1 receptive fields and our model for DHT stimuli. We compare the covariance matrices generated from the (A) receptive fields of 2,698 mouse V1 neurons, (B) the GP model Eq (9), and (C) 2,698 random samples from the model. (D) The leading 10 eigenvectors of the data and model covariance matrices. They explain 29% of the variance in the data. Analytical Hermite wavelet eigenfunctions are in the last row. (E) The eigenspectrum of the model matches well with the data.
Fig P: Receptive fields of V1 neurons from DHT stimuli. We show (A) biological receptive fields and (B) random samples from the fitted covariance model.
Fig Q: Training loss on MNIST for fully-trained neural networks initialized with V1 weights. We show the average training loss of fully-trained networks against the number of training epochs across diverse hidden layer widths (50, 100, 400, and 1000) and learning rates (10⁻³, 10⁻², and 10⁻¹). For every hidden layer width, we generate five random networks and average their performance. The solid lines show the average training loss while the shaded regions represent the standard error. When the covariance parameters are tuned properly, V1-initialized networks achieve lower training loss over fewer epochs. The benefits are more significant at larger network widths and lower learning rates. With incompatible weights, V1 initialization leads to similar performance as unstructured initialization.
Fig R: Test error on MNIST for fully-trained neural networks initialized with V1 weights. We show the average test error of fully-trained networks against the number of training epochs across diverse hidden layer widths (50, 100, 400, and 1000) and learning rates (10⁻³, 10⁻², and 10⁻¹). For every hidden layer width, we generate five random networks and average their performance. The solid lines show the average test error while the shaded regions represent the standard error. When the covariance parameters are tuned properly, V1-initialized networks achieve lower test error over fewer epochs. The benefits are more significant at larger network widths and lower learning rates. With incompatible weights, V1 initialization leads to similar performance as unstructured initialization.
Fig S: Training loss on KMNIST for fully-trained neural networks initialized with V1 weights. We show the average training loss of fully-trained networks against the number of training epochs across diverse hidden layer widths (50, 100, 400, and 1000) and learning rates (10⁻³, 10⁻², and 10⁻¹). For every hidden layer width, we generate five random networks and average their performance. The solid lines show the average training loss while the shaded regions represent the standard error. When the covariance parameters are tuned properly, V1-initialized networks achieve lower training loss over fewer epochs. The benefits are more significant at larger network widths and lower learning rates. With incompatible weights, V1 initialization leads to similar performance as unstructured initialization.
Fig T: Test error on KMNIST for fully-trained neural networks initialized with V1 weights. We show the average test error of fully-trained networks against the number of training epochs across diverse hidden layer widths (50, 100, 400, and 1000) and learning rates (10⁻³, 10⁻², and 10⁻¹). For every hidden layer width, we generate five random networks and average their performance. The solid lines show the average test error while the shaded regions represent the standard error. When the covariance parameters are tuned properly, V1-initialized networks achieve lower test error over fewer epochs. The benefits are more significant at larger network widths and lower learning rates. With incompatible weights, V1 initialization leads to similar performance as unstructured initialization.
Fig U: Initializing AlexNet using structured random features shows little benefit for ImageNet. Training and testing loss are shown for classical and structured random initializations of the convolutional layers in AlexNet. These losses are initially lower for structured features, but by 6 epochs the classical initialization catches up and eventually reaches a slightly lower loss than the structured initialization. Note that the training losses are higher than the testing losses because dropout is applied during training.

https://doi.org/10.1371/journal.pcbi.1010484.s001

(PDF)

Acknowledgments

We thank Dario Ringach for providing the macaque V1 data and Brandon Pratt for the hawkmoth mechanosensor data. We are grateful to Ali Weber, Steven Peterson, Owen Levin, and Alice C. Schwarze for useful discussions. We thank Sarah Lindo, Michalis Michaelos, and Carsen Stringer for help with mouse surgeries, calcium imaging, and data processing, respectively.

References

1. Yuste R. From the neuron doctrine to neural networks. Nature Reviews Neuroscience. 2015;16(8):487–497. pmid:26152865
2. Fusi S, Miller EK, Rigotti M. Why neurons mix: high dimensionality for higher cognition. Current Opinion in Neurobiology. 2016;37:66–74. pmid:26851755
3. Saxena S, Cunningham JP. Towards the neural population doctrine. Current Opinion in Neurobiology. 2019;55:103–111. pmid:30877963
4. Stringer C, Pachitariu M, Steinmetz N, Carandini M, Harris KD. High-dimensional geometry of population responses in visual cortex. Nature. 2019;571(7765):361–365. pmid:31243367
5. Sherrington C. The Integrative Action of the Nervous System. Cambridge University Press; 1907.
6. Chichilnisky EJ. A simple white noise analysis of neuronal light responses. Network: Computation in Neural Systems. 2001;12(2):199–213. pmid:11405422
7. Sakai HM, Naka K. Signal transmission in the catfish retina. V. Sensitivity and circuit. Journal of Neurophysiology. 1987;58(6):1329–1350. pmid:2830371
8. Clay Reid R, Alonso JM. Specificity of monosynaptic connections from thalamus to visual cortex. Nature. 1995;378(6554):281–284.
9. Jones JP, Palmer LA. An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. Journal of Neurophysiology. 1987;58(6):1233–1258. pmid:3437332
10. Knudsen EI, Konishi M. Center-surround organization of auditory receptive fields in the owl. Science. 1978;202(4369):778–780. pmid:715444
11. Hsiao S. Central mechanisms of tactile shape perception. Current Opinion in Neurobiology. 2008;18(4).
12. Rusanen J, Frolov R, Weckström M, Kinoshita M, Arikawa K. Non-linear amplification of graded voltage signals in the first-order visual interneurons of the butterfly Papilio xuthus. Journal of Experimental Biology. 2018;221(12). pmid:29712749
13. Fox JL, Fairhall AL, Daniel TL. Encoding properties of haltere neurons enable motion feature detection in a biological gyroscope. Proceedings of the National Academy of Sciences. 2010;107(8):3840–3845.
14. Pratt B, Deora T, Mohren T, Daniel T. Neural evidence supports a dual sensory-motor role for insect wings. Proceedings of the Royal Society B: Biological Sciences. 2017;284(1862):20170969. pmid:28904136
15. Clemens J, Ronacher B. Feature Extraction and Integration Underlying Perceptual Decision Making during Courtship Behavior. Journal of Neuroscience. 2013;33(29):12136–12145. pmid:23864698
16. Park M, Pillow JW. Receptive field inference with localized priors. PLoS Computational Biology. 2011;7(10):e1002219. pmid:22046110
17. Bonin V, Histed MH, Yurgenson S, Reid RC. Local Diversity and Fine-Scale Organization of Receptive Fields in Mouse Visual Cortex. Journal of Neuroscience. 2011;31(50):18506–18521. pmid:22171051
18. Rosenblatt F. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review. 1958;65(6):386. pmid:13602029
19. Caron SJC, Ruta V, Abbott LF, Axel R. Random convergence of olfactory inputs in the Drosophila mushroom body. Nature. 2013;497(7447):113–117. pmid:23615618
20. Litwin-Kumar A, Harris KD, Axel R, Sompolinsky H, Abbott LF. Optimal Degrees of Synaptic Connectivity. Neuron. 2017;93(5):1153–1164.e7. pmid:28215558
21. Broomhead DS, Lowe D. Radial basis functions, multi-variable functional interpolation and adaptive networks. Royal Signals and Radar Establishment Malvern (United Kingdom); 1988.
22. Igelnik B, Pao YH. Stochastic choice of basis functions in adaptive function approximation and the functional-link net. IEEE Transactions on Neural Networks. 1995;6(6):1320–1329. pmid:18263425
23. Rahimi A, Recht B. Random Features for Large-Scale Kernel Machines. In: Platt JC, Koller D, Singer Y, Roweis ST, editors. Advances in Neural Information Processing Systems 20. Curran Associates, Inc.; 2008. p. 1177–1184.
24. Liu F, Huang X, Chen Y, Suykens JAK. Random Features for Kernel Approximation: A Survey in Algorithms, Theory, and Beyond. arXiv:2004.11154 [cs, stat]. 2020.
25. Arora S, Du SS, Hu W, Li Z, Wang R. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks. arXiv:1901.08584 [cs, stat]. 2019.
26. Arora S, Du SS, Hu W, Li Z, Salakhutdinov R, Wang R. On Exact Computation with an Infinitely Wide Neural Net. arXiv:1904.11955 [cs, stat]. 2019.
27. Chen L, Xu S. Deep Neural Tangent Kernel and Laplace Kernel Have the Same RKHS. arXiv:2009.10683 [cs, math, stat]. 2021.
28. Neal RM. Priors for Infinite Networks. New York, NY: Springer New York; 1996. p. 29–53. Available from: https://doi.org/10.1007/978-1-4612-0745-0_2.
29. Williams CKI. Computation with Infinite Neural Networks. Neural Computation. 1998;10(5):1203–1216.
30. Rahimi A, Recht B. Uniform approximation of functions with random bases. In: 2008 46th Annual Allerton Conference on Communication, Control, and Computing. IEEE; 2008. p. 555–561. Available from: http://ieeexplore.ieee.org/document/4797607/.
31. Bordelon B, Canatar A, Pehlevan C. Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks. arXiv:2002.02561 [cs, stat]. 2020.
32. Bordelon B, Pehlevan C. Population Codes Enable Learning from Few Examples By Shaping Inductive Bias. bioRxiv. 2021.
33. Canatar A, Bordelon B, Pehlevan C. Spectral Bias and Task-Model Alignment Explain Generalization in Kernel Regression and Infinitely Wide Neural Networks. arXiv:2006.13198 [cond-mat, stat]. 2021.
34. Harris KD. Additive function approximation in the brain. arXiv:1909.02603 [cs, q-bio, stat]. 2019.
35. Hashemi A, Schaeffer H, Shi R, Topcu U, Tran G, Ward R. Generalization Bounds for Sparse Random Feature Expansions. arXiv:2103.03191 [cs, math, stat]. 2021.
36. Xie M, Muscinelli S, Decker Harris K, Litwin-Kumar A. Task-dependent optimal representations for cerebellar learning. bioRxiv. 2022.
37. Jacobsen JH, van Gemert J, Lou Z, Smeulders AWM. Structured Receptive Fields in CNNs. arXiv:1605.02971 [cs]. 2016.
38. Paninski L. Convergence properties of some spike-triggered analysis techniques. Network: Computation in Neural Systems. 2003. pmid:12938766
39. Rasmussen CE, Williams CKI. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press; 2005.
40. Kosambi DD. Statistics in function space. The Journal of the Indian Mathematical Society, New Series. 1943;7:76–88.
41. Wahba G. Spline Models for Observational Data. SIAM; 1990.
42. Dickerson BH, Fox JL, Sponberg S. Functional diversity from generic encoding in insect campaniform sensilla. Current Opinion in Physiology. 2021;19:194–203.
43. Yarger AM, Fox JL. Dipteran Halteres: Perspectives on Function and Integration for a Unique Sensory Organ. Integrative and Comparative Biology. 2016;56(5):865–876. pmid:27413092
44. Fox JL, Daniel TL. A neural basis for gyroscopic force measurement in the halteres of Holorusia. Journal of Comparative Physiology A. 2008;194(10):887–897. pmid:18751714
45. Genton MG. Classes of Kernels for Machine Learning: A Statistics Perspective. Journal of Machine Learning Research. 2001;2(Dec):299–312.
46. Hubel DH, Wiesel TN. Receptive fields of single neurones in the cat’s striate cortex. The Journal of Physiology. 1959;148(3):574–591. pmid:14403679
47. Marr D, Hildreth E, Brenner S. Theory of edge detection. Proceedings of the Royal Society of London. Series B. Biological Sciences. 1980;207(1167):187–217. pmid:6102765
48. Martens JB. The Hermite transform-theory. IEEE Transactions on Acoustics, Speech, and Signal Processing. 1990;38(9):1595–1606.
49. Kleiner M, Brainard D, Pelli D. What’s new in Psychtoolbox-3? In: Perception—ECVP Abstract Supplement. European Conference on Visual Perception (ECVP 2007), August 27–31, Arezzo, Italy; 2007.
50. Sofroniew NJ, Flickinger D, King J, Svoboda K. A large field of view two-photon mesoscope with subcellular resolution for in vivo imaging. eLife. 2016;5:e14472. pmid:27300105
51. Pachitariu M, Stringer C, Dipoppa M, Schröder S, Rossi LF, Dalgleish H, et al. Suite2p: beyond 10,000 neurons with standard two-photon microscopy. bioRxiv. 2017.
52. Mohren TL, Daniel TL, Brunton SL, Brunton BW. Neural-inspired sensors enable sparse, efficient classification of spatiotemporal data. Proceedings of the National Academy of Sciences. 2018;115(42):10564–10569. pmid:30213850
53. LeCun Y, Cortes C, Burges C. MNIST handwritten digit database. ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist. 2010;2.
54. Clanuwat T, Bober-Irizar M, Kitamoto A, Lamb A, Yamamoto K, Ha D. Deep Learning for Classical Japanese Literature. arXiv:1812.01718 [cs, stat]. 2018.
55. He K, Zhang X, Ren S, Sun J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In: 2015 IEEE International Conference on Computer Vision (ICCV); 2015. p. 1026–1034.
56. Krizhevsky A. One weird trick for parallelizing convolutional neural networks. arXiv:1404.5997 [cs]. 2014.
57. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, et al. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision. 2015;115(3):211–252.
58. Pruszynski JA, Johansson RS. Edge-orientation processing in first-order tactile neurons. Nature Neuroscience. 2014;17(10):1404–1409. pmid:25174006
59. Zhao CW, Daley MJ, Pruszynski JA. Neural network models of the tactile system develop first-order units with spatially complex receptive fields. PLOS ONE. 2018;13(6):e0199196. pmid:29902277
60. Xu D, Ruan C, Korpeoglu E, Kumar S, Achan K. A Temporal Kernel Approach for Deep Learning with Continuous-time Information. arXiv:2103.15213 [cs]. 2021.
61. Koay SA, Charles AS, Thiberge SY, Brody CD, Tank DW. Sequential and efficient neural-population coding of complex task information. bioRxiv. 2021. pmid:34776042
62. Olshausen BA, Field DJ. Sparse Coding with an Overcomplete Basis Set: A Strategy Employed by V1? Vision Research. 1997;37(23):3311–3325. pmid:9425546
63. Barlow HB, et al. Possible principles underlying the transformation of sensory messages. Sensory Communication. 1961;1(01).
64. Chalk M, Marre O, Tkačik G. Toward a Unified Theory of Efficient, Predictive, and Sparse Coding. Proceedings of the National Academy of Sciences. 2018;115(1):186–191.
65. Swanson LW. Brain Architecture: Understanding the Basic Plan. New York, NY, US: Oxford University Press; 2003.
66. Strausfeld NJ. Arthropod Brains: Evolution, Functional Elegance, and Historical Significance. Harvard University Press; 2012. Available from: https://www.jstor.org/stable/j.ctv1dp0v2h.
67. Zador AM. A critique of pure learning and what artificial neural networks can learn from animal brains. Nature Communications. 2019;10(1):3770. pmid:31434893
68. Oja E. Principal components, minor components, and linear neural networks. Neural Networks. 1992;5(6):927–935.
69. Pehlevan C, Sengupta AM, Chklovskii DB. Why Do Similarity Matching Objectives Lead to Hebbian/Anti-Hebbian Networks? Neural Computation. 2018;30(1):84–124. pmid:28957017
70. Ringach DL. Haphazard Wiring of Simple Receptive Fields and Orientation Columns in Visual Cortex. Journal of Neurophysiology. 2004;92(1):468–476. pmid:14999045
71. Ostojic S, Brunel N. From Spiking Neuron Models to Linear-Nonlinear Models. PLOS Computational Biology. 2011;7(1):e1001056. pmid:21283777
72. Weber AI, Pillow JW. Capturing the Dynamical Repertoire of Single Neurons with Generalized Linear Models. Neural Computation. 2017;29(12):3260–3289. pmid:28957020
73. Fairhall A. The receptive field is dead. Long live the receptive field? Current Opinion in Neurobiology. 2014;25:ix–xii. pmid:24618227
74. Barth FG. Mechanics to pre-process information for the fine tuning of mechanoreceptors. Journal of Comparative Physiology A. 2019;205(5):661–686. pmid:31270587
75. Eldar YC, Kutyniok G, editors. Compressed Sensing: Theory and Applications. Cambridge University Press; 2012.
76. Foucart S, Rauhut H. A Mathematical Introduction to Compressive Sensing. Birkhäuser Basel; 2013.
77. Ganguli S, Sompolinsky H. Compressed Sensing, Sparsity, and Dimensionality in Neuronal Information Processing and Data Analysis. Annual Review of Neuroscience. 2012;35(1):485–508. pmid:22483042
78. Fallah K, Willats AA, Liu N, Rozell CJ. Learning sparse codes from compressed representations with biologically plausible local wiring constraints. bioRxiv. 2020.
79. Barranca VJ, Kovačič G, Zhou D, Cai D. Improved Compressive Sensing of Natural Scenes Using Localized Random Sampling. Scientific Reports. 2016;6(1):31976. pmid:27555464
80. Shawe-Taylor J, Cristianini N. Kernel Methods for Pattern Analysis. Cambridge University Press; 2004.
81. Shalev-Shwartz S, Ben-David S. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press; 2014.
82. Yamins DLK, Hong H, Cadieu CF, Solomon EA, Seibert D, DiCarlo JJ. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences. 2014;111(23):8619–8624. pmid:24812127
83. He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. arXiv:1512.03385 [cs]. 2015.
84. Sinha A, Duchi JC. Learning Kernels with Random Features. In: Lee D, Sugiyama M, Luxburg U, Guyon I, Garnett R, editors. Advances in Neural Information Processing Systems. vol. 29. Curran Associates, Inc.; 2016. Available from: https://proceedings.neurips.cc/paper/2016/file/e70611883d2760c8bbafb4acb29e3446-Paper.pdf.
85. Arpit D, Campos V, Bengio Y. How to Initialize your Network? Robust Initialization for WeightNorm & ResNets. In: Wallach H, Larochelle H, Beygelzimer A, d'Alché-Buc F, Fox E, Garnett R, editors. Advances in Neural Information Processing Systems. vol. 32. Curran Associates, Inc.; 2019. Available from: https://proceedings.neurips.cc/paper/2019/file/e520f70ac3930490458892665cda6620-Paper.pdf.
86. Glorot X, Bengio Y. Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. JMLR Workshop and Conference Proceedings; 2010. p. 249–256. Available from: http://proceedings.mlr.press/v9/glorot10a.html.
87. Olah C, Mordvintsev A, Schubert L. Feature Visualization. Distill. 2017.
88. Lindsay GW. Convolutional Neural Networks as a Model of the Visual System: Past, Present, and Future. Journal of Cognitive Neuroscience. 2020. p. 1–15.
89. Dapello J, Marques T, Schrimpf M, Geiger F, Cox D, DiCarlo JJ. Simulating a Primary Visual Cortex at the Front of CNNs Improves Robustness to Image Perturbations. Advances in Neural Information Processing Systems. 2020;33.
90. Illing B, Gerstner W, Brea J. Biologically plausible deep learning—But how far can we go with shallow networks? Neural Networks. 2019;118:90–101. pmid:31254771
91. Mallat S. Group Invariant Scattering. Communications on Pure and Applied Mathematics. 2012;65(10):1331–1398.
92. Bruna J, Mallat S. Invariant Scattering Convolution Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2013;35(8):1872–1886. pmid:23787341
93. Andén J, Mallat S. Deep Scattering Spectrum. IEEE Transactions on Signal Processing. 2014;62(16):4114–4128.
94. Pintea SL, Tomen N, Goes SF, Loog M, van Gemert JC. Resolution learning in deep convolutional networks using scale-space theory. arXiv:2106.03412 [cs]. 2021.
95. Goltstein PM, Meijer GT, Pennartz CM. Conditioning sharpens the spatial representation of rewarded stimuli in mouse primary visual cortex. eLife. 2018;7:e37683. pmid:30222107
96. Fritz J, Shamma S, Elhilali M, Klein D. Rapid task-related plasticity of spectrotemporal receptive fields in primary auditory cortex. Nature Neuroscience. 2003;6:1216–1223. pmid:14583754
97. Beniaguev D, Segev I, London M. Single cortical neurons as deep artificial neural networks. Neuron. 2021;109(17):2727–2739.e3. pmid:34380016
98. Jacot A, Gabriel F, Hongler C. Neural Tangent Kernel: Convergence and Generalization in Neural Networks. In: Advances in Neural Information Processing Systems. vol. 31. Curran Associates, Inc.; 2018. Available from: https://papers.nips.cc/paper/2018/hash/5a4be1fa34e62bb8a6ec6b91d2462f5a-Abstract.html.
99. Mattos CLC, Dai Z, Damianou A, Forth J, Barreto GA, Lawrence ND. Recurrent Gaussian Processes. arXiv:1511.06644 [cs, stat]. 2016.
100. Ringach DL. Spatial structure and symmetry of simple-cell receptive fields in macaque primary visual cortex. Journal of Neurophysiology. 2002;88(1):455–463. pmid:12091567