Fig 1.
Left: generative model. In the generative model (purple), a neural activity pattern (white and black circles), sampled from the prior distribution (Ising model), pψ(r), is mapped by the decoder to a probability distribution over stimuli, pψ(x|r) (dashed purple curve), parametrized by functions of the activity patterns (here, mean and variance of a Gaussian distribution). The sum of the decoder output distributions, weighted by the prior over neural activity, defines the generative distribution (solid purple curve). The objective of the generative model is to produce stimuli according to the stimulus distribution in the environment, π(x) (green curve). In order to do so, the posterior distribution over neural activity, given external stimuli, is approximated by an encoder, qθ(r|x) (right, blue). Neurons emit spikes according to bell-shaped tuning curves (grey curves) in response to a stimulus, x (green dot), drawn from the stimulus distribution π(x) (green curve). The population response consists of a binary neural activity pattern, r (white and black circles). The two models are trained so as to match the generative and stimulus distributions; this objective is approximated by minimizing two loss functions. Minimizing the distortion, D, maximizes the likelihood of the observed stimulus after the encoding-decoding process (upper dashed black arrow). Minimizing the rate, R, reduces the Kullback-Leibler divergence between the conditional encoding distribution and the marginal (prior) distribution of neural activity of the generative model (bottom dashed black arrow).
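To make the two loss terms concrete, the following minimal Python sketch computes the distortion and rate for a single stimulus, with a factorized bell-shaped encoder, a Gaussian decoder, and an Ising prior, enumerating all 2^N activity patterns for a small population. All parameter values and names below are illustrative assumptions, not the trained model of the paper.

```python
# Minimal sketch (not the authors' code) of the two loss terms in Fig 1:
# distortion D = -E_q[log p_psi(x|r)] and rate R = KL(q_theta(r|x) || p_psi(r)).
import itertools
import numpy as np

N = 5                                   # small, so all 2^N patterns can be enumerated
patterns = np.array(list(itertools.product([0, 1], repeat=N)))   # (2^N, N)

def spike_probs(x, centers, widths):
    """Bell-shaped tuning curves: P(r_i = 1 | x)."""
    return np.exp(-0.5 * ((x - centers) / widths) ** 2)

def encoder(x, centers, widths):
    """q_theta(r|x): probability of each binary pattern under a factorized encoder."""
    p = spike_probs(x, centers, widths)
    return np.prod(patterns * p + (1 - patterns) * (1 - p), axis=1)

def ising_prior(h, J):
    """p_psi(r): Ising model over binary patterns, normalized exactly (small N)."""
    energy = patterns @ h + 0.5 * np.einsum("ki,ij,kj->k", patterns, J, patterns)
    w = np.exp(energy)
    return w / w.sum()

def decoder_loglik(x, r, a, b, sigma):
    """log p_psi(x|r): Gaussian decoder whose mean is a linear readout of r."""
    mu = r @ a + b
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
centers, widths = np.linspace(-2, 2, N), np.full(N, 0.8)
h, a = rng.normal(0, 0.1, N), rng.normal(0, 0.5, N)
J = rng.normal(0, 0.1, (N, N)); J = (J + J.T) / 2; np.fill_diagonal(J, 0.0)

x = 0.3                                 # one stimulus drawn from pi(x)
q = encoder(x, centers, widths)         # encoding distribution over patterns
p = ising_prior(h, J)                   # prior of the generative model

D = -np.sum(q * np.array([decoder_loglik(x, r, a, 0.0, 0.5) for r in patterns]))
R = np.sum(q * (np.log(q + 1e-12) - np.log(p + 1e-12)))
print(f"D = {D:.3f}, R = {R:.3f}, -ELBO = {D + R:.3f}")
```

During training, D and R would be averaged over stimuli drawn from π(x), with gradients flowing into both the encoder parameters θ and the generative parameters ψ.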
Fig 2.
In all simulations, N = 12.
(A) Evolution of the negative ELBO, and of the two terms, D and R, with training epochs; plots in log-log scale. (B) Joint evolution of R and D in the rate-distortion plane, colored according to the epoch (increasing from blue to yellow, colors in logarithmic scale). (C) Evolution of β during training.
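As a concrete illustration of panel C, the sketch below shows one way the weight β on the rate term of the loss, D + βR, can be adapted during training so that the rate converges to a target value. The exponentiated-gradient update and the toy rate model are illustrative assumptions, not necessarily the schedule used in the paper.

```python
import numpy as np

def update_beta(beta, rate, target_rate, lr=0.05):
    # beta grows when R overshoots the target rate and shrinks when it undershoots.
    return beta * np.exp(lr * (rate - target_rate))

# Toy demo: pretend the achieved rate decreases smoothly as beta increases.
beta, target_rate = 1.0, 2.0
for epoch in range(200):
    rate = 5.0 / (1.0 + beta)            # stand-in for the rate of a trained model
    beta = update_beta(beta, rate, target_rate)
print(f"final beta = {beta:.2f}, achieved rate = {5.0 / (1.0 + beta):.2f}")
```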
Fig 3.
Qualitatively different optimal configurations.
In all simulations, N = 10. Top row: high-distortion, low-rate solution. Bottom row: low-distortion, high-rate solution. (A) Bell-shaped tuning curves of the encoder (probability that neuron i emits a spike, as a function of x). (B) Comparison between the stimulus distribution, π(x) (green curve), and the generative distribution, pψ(x) = ∑r pψ(x|r)pψ(r) (purple curve). (C) Numerical values of the ELBO, and the distortion and rate terms.
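The generative distribution in panel B is a finite mixture over binary patterns, pψ(x) = ∑r pψ(x|r)pψ(r), which is tractable by direct enumeration for small N. The sketch below evaluates it on a grid under assumed (untrained) decoder and prior parameters, purely for illustration.

```python
import itertools
import numpy as np

# Evaluate p_psi(x) = sum_r p_psi(x|r) p_psi(r) on a grid, enumerating patterns.
N = 4
patterns = np.array(list(itertools.product([0, 1], repeat=N)))
prior = np.ones(len(patterns)) / len(patterns)     # stand-in for the Ising prior
mu = patterns @ np.linspace(-1.5, 1.5, N) / 2      # decoder mean, one per pattern
sigma = 0.4                                        # decoder standard deviation

xs = np.linspace(-3, 3, 200)
gauss = np.exp(-0.5 * ((xs[:, None] - mu[None, :]) / sigma) ** 2) \
        / (sigma * np.sqrt(2 * np.pi))             # p_psi(x|r) on the grid
p_x = gauss @ prior                                # mixture over patterns
print(f"integral of p_psi(x) ~= {p_x.sum() * (xs[1] - xs[0]):.3f}")  # close to 1
```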
Fig 4.
Characterization of the optimal solutions as functions of the target rate.
In all simulations, N = 12. Solid curves illustrate the mean across different initializations and shaded regions correspond to one standard deviation. (A) Solutions of the ELBO optimization problem as a function of the target rate (blue curve), and theoretical optimum (black curve), in the rate-distortion plane. The grey region marks values of the target rate for which the solutions coincide with the theoretical optimum. Solutions depart from the optimal line when the rate is very low (poor generative model) or very high (saturated distortion). Inset: mutual information between stimuli and neural responses as a function of the target rate. (B) Kullback-Leibler divergence between the stimulus and the generative distributions, as a function of the target rate. (C) Optimal tuning curves for different values of the target rate. Each dot represents a neuron: the position on the y-axis corresponds to its preferred stimulus, the size of the dot is proportional to the tuning width, and the color refers to the amplitude (see legend). Tuning curve parameters are averaged across 16 initializations, with neurons ordered as a function of their preferred position. The curve on the right illustrates the data distribution, π(x). (D) Entropy of the prior distribution over neural activity, pψ(r), as a function of the target rate. Insets show two configurations of the coupling matrices, with rows ordered according to the neurons’ preferred stimuli, and coupling strengths colored according to the legend. (E) MSE in the stimulus estimate, obtained as the MAP (blue curve, scale on the left y-axis) or from samples (orange curve, scale on the right y-axis), as a function of the target rate. Inset: MSE (MAP) as a function of the average tuning width.
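The two readouts in panel E can be made concrete with a short simulation: for each stimulus, a binary response is sampled from the encoder, and the estimate is either the mode of the Gaussian decoder (its mean, i.e. the MAP) or a sample from it. The encoder and decoder below are simplified stand-ins (assumed tuning parameters and a heuristic decoder mean), so only the qualitative gap between the two MSEs is meaningful.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 12
centers, widths, sigma_dec = np.linspace(-2, 2, N), np.full(N, 0.6), 0.3

def encode(x):
    """Sample a binary response from bell-shaped tuning curves."""
    p = np.exp(-0.5 * ((x - centers) / widths) ** 2)
    return rng.random(N) < p

def decoder_mean(r):
    """Illustrative decoder mean: average preferred stimulus of active neurons."""
    return centers[r].mean() if r.any() else 0.0

xs = rng.normal(0.0, 1.0, 5000)          # stimuli from a standard-normal pi(x)
err_map, err_smp = [], []
for x in xs:
    mu = decoder_mean(encode(x))
    err_map.append((x - mu) ** 2)                          # MAP estimate: decoder mean
    err_smp.append((x - rng.normal(mu, sigma_dec)) ** 2)   # sampled estimate
print(f"MSE (MAP) = {np.mean(err_map):.3f}, MSE (samples) = {np.mean(err_smp):.3f}")
```

In this sketch the sampled-estimate MSE exceeds the MAP MSE by the decoder variance in expectation, which is why the two curves in panel E are plotted on different scales.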
Fig 5.
Dependence of the results on the population size.
Solid curves illustrate the mean across different initializations and shaded regions correspond to one standard deviation. Same simulations as in Fig 4, with different values of the population size. (A) Optimal solutions (blue curves) for different population sizes, N (legend in panel B), and theoretical bound (black curve), in the rate-distortion plane. (B) Kullback-Leibler divergence between the stimulus and the generative distributions, as a function of the target rate. (C) MSE in the stimulus estimate, obtained as the MAP, as a function of the target rate.
Fig 6.
Characterization of optimal solutions as functions of training set size.
In all simulations, N = 12. Solid curves represent the mean across different initializations, and shaded regions correspond to one standard deviation. The legend in panel A serves as a legend for all panels. (A) Solutions of the ELBO optimization problem as functions of the target rate, for the training set (top) and for the test set (bottom). Top: distortion, D, and rate, R (inset), for the training set as functions of the target rate, for different sizes of the training set, colored according to the legend. For smaller training sets, at higher rates the model tends to overfit the data, resulting in a lower training distortion than optimal (red line, large training set, same data as in Fig 4). Bottom: distortion, D, and rate, R (inset), for the test set as functions of the target rate, for different sizes of the training set. For smaller training sets, at higher rates the model does not generalize to unseen samples, resulting in a large distortion. (B) Left: Kullback-Leibler divergence between the stimulus and the generative distributions, as a function of the target rate, for different sizes of the training set. At higher rates, the generative model fits the stimulus distribution poorly. Right: examples of comparisons between the stimulus distribution (green line) and the generative distribution (red and orange lines) at low (top) and high (bottom) rates, for two sizes of the training set, Ntrn = 100 and Ntrn = 2000, colored according to the legend in panel A. (C) Tuning width, wi, as a function of preferred stimulus location, ci (dots), at low (left) and high (right) rates, for two sizes of the training set, Ntrn = 100 and Ntrn = 1000. The grey curve represents the stimulus distribution, π(x). (D) MSE in the stimulus estimate, obtained as the MAP, as a function of the target rate, for different sizes of the training set.
Fig 7.
Optimal allocation of neural resources.
In all simulations, N = 12. Results are illustrated for regions of the stimulus space where the coding performance is sufficiently high, defined as the region where the MSE is lower than the variance of the stimulus distribution. Below, we report exponents of the power-law fits when the variance explained is larger than a threshold, R2 ≥ 0.7. (A) Neural density as a function of x (dashed curves) and power-law fits (solid curves, R2 = (0.21, 0.83, 0.95), γd = (−, 0.43, 0.62)), for three values of the target rate (low, intermediate, and high); the grey curve illustrates the stimulus distribution. The density is computed by applying kernel density estimation to the set of preferred positions of the neurons. (B) Tuning width, wi, as a function of preferred stimuli, ci (dots), and power-law fits (solid curves, R2 = (0.78, 0.42, 0.82), γw = (1.15, −, 0.7)), for three values of the target rate; the grey curve illustrates the stimulus distribution. (C) Tuning width, wi, as a function of the neural density, d(ci), for three values of the target rate; Pearson correlation coefficient ρ = (−0.74, −0.66, −0.79).
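The two analyses behind panels A and B can be sketched as follows: a kernel density estimate over preferred positions gives the neural density, and a power-law exponent is obtained by a linear fit in log-log coordinates. The preferred positions and widths below are synthetic assumptions; γ and R2 are computed in the same spirit as in the caption.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)
centers = np.sort(rng.normal(0, 1, 12))           # preferred stimuli (assumed)
widths = 0.5 * np.exp(0.2 * np.abs(centers)) * rng.lognormal(0, 0.05, 12)

density = gaussian_kde(centers)                   # neural density d(x) via KDE
d_at_c = density(centers)

# Power-law fit w ~ d^(-gamma), i.e. log w = log a - gamma * log d.
slope, intercept = np.polyfit(np.log(d_at_c), np.log(widths), 1)
pred = intercept + slope * np.log(d_at_c)
ss_res = np.sum((np.log(widths) - pred) ** 2)
ss_tot = np.sum((np.log(widths) - np.log(widths).mean()) ** 2)
print(f"gamma = {-slope:.2f}, R^2 = {1 - ss_res / ss_tot:.2f}")
```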
Fig 8.
Optimal allocation of coding performance.
Same numerical simulations as in Fig 4. (A) MSE (MAP estimate) as a function of x (dashed curves), and power-law fits (solid curves, R2 = (0.98, 0.98, 0.76), γe = (0.87, 0.74, 0.59)), for three values of the target rate. (B),(C) MSE as a function of the neural density (B) and of the tuning width (C), for three values of the target rate; Pearson correlation coefficients ρdensity = (−0.66, −0.96, −0.81), ρwidth = (0.36, 0.59, 0.70).
Fig 9.
Generative models for the distribution of acoustic frequencies.
In all simulations, N = 12. The decoder is either Gaussian (top row) or log-normal (bottom row). (A) Solutions of the optimization problem as a function of the target rate (blue curve), in the rate-distortion plane. Inset: environmental distribution of acoustic frequencies, π(f), and generative model fit for two different values of the target rate, colored according to the legend. (B) Optimal tuning curves for different values of the target rate. Each dot represents a neuron: the position on the y-axis corresponds to its preferred stimulus, the size of the dot is proportional to the tuning width, and the color refers to the amplitude (see legend in Fig 4). The curve on the right illustrates the stimulus distribution, π(f). Insets show two examples. (C) Frequency discrimination as a function of acoustic frequency. Red markers are data points from three different subjects, data from Ref. [61]. Solid curves are the RMSE for three values of the target rate, scaled by a constant factor, with variance explained R2 = (0.42, 0.41, 0.66). (D)-(F) Same as panels (A)-(C) in the case of a log-normal decoder. In panel F, the RMSE is scaled by a constant factor, with variance explained R2 = (0.92, 0.81, 0.96). Solid curves illustrate the mean across different initializations and shaded regions correspond to one standard deviation.
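A note on the log-normal decoder of the bottom row: for positive stimuli such as acoustic frequencies, a log-normal decoder is simply a Gaussian over log f, so the Gaussian-decoder machinery carries over to z = log f. The sketch below (illustrative names and values only) makes this reduction explicit.

```python
import numpy as np
from scipy.stats import lognorm, norm

def lognormal_logpdf(f, mu, sigma):
    """log p(f) for a log-normal: a Gaussian over z = log f, minus log f (Jacobian)."""
    z = np.log(f)
    return -0.5 * ((z - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi)) - z

f, mu, sigma = 440.0, np.log(440.0), 0.5
print(lognormal_logpdf(f, mu, sigma))
print(lognorm.logpdf(f, s=sigma, scale=np.exp(mu)))       # same value, via scipy
print(norm.logpdf(np.log(f), mu, sigma) - np.log(f))      # Gaussian on log f + Jacobian
```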