The Equivalence of Information-Theoretic and Likelihood-Based Methods for Neural Dimensionality Reduction

doi:10.1371/journal.pcbi.1004141

Fig 1.

The linear-nonlinear-Poisson (LNP) encoding model formalizes the neural encoding process in terms of a cascade of three stages.

First, the high-dimensional stimulus s projects onto a bank of filters contained in the columns of a matrix K, resulting in a point K^⊤s in a low-dimensional neural feature space. Second, an instantaneous nonlinear function f maps the filtered stimulus to an instantaneous spike rate λ. Third, spikes r are generated according to an inhomogeneous Poisson process.
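
The three stages can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code. The function name `simulate_lnp`, the time-bin width `dt`, the random filters and the nonlinearity in the usage lines are all hypothetical choices.

```python
import numpy as np


def simulate_lnp(stimuli, K, f, dt=0.01, rng=None):
    """Simulate spike counts from a linear-nonlinear-Poisson cascade.

    stimuli: (n, d) array, one stimulus s per row
    K:       (d, m) filter matrix; its columns are the filters
    f:       nonlinearity mapping (n, m) feature-space points to rates (spikes/s)
    dt:      time-bin width in seconds (assumed value)
    """
    rng = np.random.default_rng() if rng is None else rng
    features = stimuli @ K            # stage 1: K^T s for every stimulus
    rates = f(features)               # stage 2: instantaneous spike rate lambda
    counts = rng.poisson(rates * dt)  # stage 3: Poisson spike count in each bin
    return features, rates, counts


def f(x):
    """Hypothetical 2-D nonlinearity: excitatory along filter 1, suppressive along filter 2."""
    return 50.0 * np.exp(x[:, 0] - 0.5 * x[:, 1] ** 2)


# Hypothetical example: 20-D white-noise stimuli and two orthonormal filters.
rng = np.random.default_rng(0)
n, d = 5000, 20
K = np.linalg.qr(rng.standard_normal((d, 2)))[0]
S = rng.standard_normal((n, d))
x, lam, r = simulate_lnp(S, K, f, dt=0.01, rng=rng)
```

Here `x` holds the points in the two-dimensional feature space, `lam` the rates and `r` the spike counts.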

Fig 2.

Geometric illustration of maximally-informative-dimensions (MID).

Left: A two-dimensional stimulus space, with points indicating the location of raw stimuli (black) and spike-eliciting stimuli (red). For this simulated example, the probability of spiking depended only on the projection onto a filter k_true, oriented at 45°. Histograms (inset) show the one-dimensional distributions of raw (black) and spike-triggered stimuli (red) projected onto k_true (lower right) and its orthogonal complement (lower left). Right: Estimated single-spike information captured by a 1D subspace, as a function of the axis of projection. The MID estimate $\hat{k}_{\mathrm{MID}}$ (dotted) corresponds to the axis maximizing single-spike information. It converges asymptotically to k_true as the dataset size grows.
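
For two-dimensional stimuli, the MID estimate can be sketched as a grid search over projection angles, scoring each axis with a plug-in (histogram) estimate of single-spike information. The function names, the 20-bin default, the sigmoidal nonlinearity and its parameters are hypothetical. With higher-dimensional stimuli, a numerical optimizer would replace the grid.

```python
import numpy as np


def single_spike_info(x, counts, n_bins=20):
    """Plug-in estimate (bits per spike) of KL(p(x | spike) || p(x)) along one axis.

    x:      (n,) projections of all raw stimuli onto a candidate axis
    counts: (n,) spike counts; a k-spike response adds k samples to p(x | spike)
    """
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    p_raw = np.histogram(x, bins=edges)[0] / len(x)
    p_spk = np.histogram(x, bins=edges, weights=counts)[0] / counts.sum()
    nz = p_spk > 0  # empty spike bins contribute 0; p_raw > 0 wherever p_spk > 0
    return np.sum(p_spk[nz] * np.log2(p_spk[nz] / p_raw[nz]))


def mid_1d(stimuli, counts, angles_deg, n_bins=20):
    """Grid-search MID for 2-D stimuli: the axis angle maximizing single-spike information."""
    infos = np.array([
        single_spike_info(stimuli @ np.array([np.cos(a), np.sin(a)]), counts, n_bins)
        for a in np.deg2rad(angles_deg)
    ])
    return angles_deg[np.argmax(infos)], infos


# Simulated example in the spirit of the figure: k_true at 45 degrees.
rng = np.random.default_rng(1)
S = rng.standard_normal((50_000, 2))
k_true = np.array([1.0, 1.0]) / np.sqrt(2)
p_spike = 1 / (1 + np.exp(-(3 * (S @ k_true) - 2)))  # hypothetical sigmoid
spikes = (rng.random(len(S)) < p_spike).astype(int)
angles = np.arange(0.0, 180.0, 1.0)                  # k and -k span the same axis
k_mid_deg, info_curve = mid_1d(S, spikes, angles)
```

`info_curve` plays the role of the right-hand panel, and `k_mid_deg` is the grid-search counterpart of the dotted MID axis.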

Fig 3.

Effects of the number of histogram bins on empirical single-spike information and MID performance.

(A) Scatter plot of raw stimuli (black) and spike-triggered stimuli (gray) from a simulated experiment using two-dimensional stimuli to drive a linear-nonlinear-Bernoulli neuron with sigmoidal nonlinearity. Arrow indicates the direction of the true filter k. (B) Plug-in estimates of p(k^⊤s|spike), the spike-triggered stimulus distribution along the true filter axis, from 1000 stimuli and 200 spikes, using 5 (blue), 20 (green) or 80 (red) histogram bins. Black traces show estimates of the raw distribution p(k^⊤s) along the same axis. (C) True nonlinearity (black) and ML estimates of the nonlinearity (derived from the ratio of the density estimates shown in B). Roughness of the 80-bin estimate (red) arises from undersampling or, equivalently, overfitting of the nonlinearity. (D) Empirical single-spike information vs. direction, calculated using 5, 20 or 80 histogram bins. Note that the 80-bin model overestimates the true asymptotic single-spike information at the peak by a factor of more than 1.5. (E) Convergence of empirical single-spike information along the true filter axis as a function of sample size. With small amounts of data, all three models overfit, leading to upward bias in estimated information. For large amounts of data, the 5-bin model underfits and therefore underestimates information, since it lacks the resolution to adequately describe the shape of the sigmoidal nonlinearity. (F) Filter error as a function of the number of stimuli, showing that the optimal number of histogram bins depends on the amount of data.
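
A small-scale sketch of the plug-in quantities in (B) to (D) follows. The names and simulation parameters are hypothetical. The true filter lies along the first axis, and the neuron is Bernoulli with a sigmoidal nonlinearity. Stimuli and spikes are histogrammed on equal-width bins along one axis. Single-spike information is the KL divergence between the two normalized histograms. The nonlinearity estimate is the per-bin ratio of spike count to stimulus count, which equals p(spike) p(x|spike)/p(x) for the plug-in densities.

```python
import numpy as np


def binned_counts(x, spikes, n_bins, lo, hi):
    """Counts of raw stimuli and of spikes in n_bins equal-width bins over [lo, hi]."""
    u = (x - lo) / (hi - lo)                                # position in [0, 1]
    idx = np.clip((u * n_bins).astype(int), 0, n_bins - 1)  # x == hi goes to the last bin
    n_raw = np.bincount(idx, minlength=n_bins)
    n_spk = np.bincount(idx, weights=spikes, minlength=n_bins)
    return n_raw, n_spk


def plugin_ss_info(n_raw, n_spk):
    """Empirical single-spike information (bits per spike) from bin counts."""
    p_raw = n_raw / n_raw.sum()
    p_spk = n_spk / n_spk.sum()
    nz = p_spk > 0
    return np.sum(p_spk[nz] * np.log2(p_spk[nz] / p_raw[nz]))


def ratio_nonlinearity(n_raw, n_spk):
    """ML estimate of p(spike | bin) = (#spikes in bin) / (#stimuli in bin); NaN if empty."""
    with np.errstate(invalid="ignore", divide="ignore"):
        return n_spk / n_raw


rng = np.random.default_rng(2)
n = 1000
S = rng.standard_normal((n, 2))
k = np.array([1.0, 0.0])                       # hypothetical true filter
p_true = 1 / (1 + np.exp(-(4 * (S @ k) - 3)))  # hypothetical sigmoidal nonlinearity
spikes = (rng.random(n) < p_true).astype(float)
x = S @ k
infos, nonlins = {}, {}
for n_bins in (5, 20, 80):
    n_raw, n_spk = binned_counts(x, spikes, n_bins, x.min(), x.max())
    infos[n_bins] = plugin_ss_info(n_raw, n_spk)
    nonlins[n_bins] = ratio_nonlinearity(n_raw, n_spk)
```

All three bin counts share a common range, and 80 = 4 × 20 = 16 × 5. Each finer partition therefore refines the coarser ones, so on a fixed dataset the plug-in information can only increase as bins are subdivided. With little data, that increase is mostly the upward bias described in (E).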

Fig 4.

Illustration of MID failure mode due to non-Poisson spiking.

(A) Stimuli were drawn uniformly on the unit half-circle, θ ∼ Unif(−π/2, π/2). The simulated neuron had Bernoulli (i.e., binary) spiking, where the probability of a spike increased linearly from 0 to 1 as θ varied from −π/2 to π/2, that is: p(spike|θ) = θ/π + 1/2. Stimuli eliciting “spike” and “no-spike” are indicated by gray and black circles, respectively. For this neuron, the most informative one-dimensional linear projection corresponds to the vertical axis ($\hat{k}_{\mathrm{Ber}}$), but the MID estimator ($\hat{k}_{\mathrm{MID}}$) exhibits a 16° clockwise bias. (B) Information from spikes (black), silences (gray), and both (red), as a function of projection angle. The peak of the Bernoulli information (which defines $\hat{k}_{\mathrm{Ber}}$) lies close to π/2, while the peak of single-spike information (which defines $\hat{k}_{\mathrm{MID}}$) exhibits the clockwise bias shown in A. Note that $\hat{k}_{\mathrm{MID}}$ does not converge to the optimal direction even in the limit of infinite data, due to its lack of sensitivity to information from silences. Although this figure is framed in an information-theoretic sense, equations (19) and (20) detail the equivalence between I_Ber and ℒ_lnb, so that this figure can be viewed from either an information-theoretic or likelihood-based perspective.
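
The equivalence between I_Ber and ℒ_lnb can be checked numerically for plug-in estimates. The sketch below is an illustration, and its function names, bin count and binning scheme are assumptions. It computes two quantities for a binned projection. The first is the empirical Bernoulli information. The second is the per-stimulus log-likelihood gain, in bits, of a linear-nonlinear-Bernoulli model whose nonlinearity is the maximum-likelihood histogram estimate, relative to a model with a constant spike probability. The data are simulated from the neuron described in (A).

```python
import numpy as np


def bin_index(x, n_bins):
    """Assign each projection to one of n_bins equal-width bins spanning its range."""
    u = (x - x.min()) / (x.max() - x.min())
    return np.clip((u * n_bins).astype(int), 0, n_bins - 1)


def bernoulli_info(idx, spikes, n_bins):
    """Plug-in I_Ber (bits per stimulus): sum over r in {0, 1} of p(r) * KL(p(x|r) || p(x))."""
    n = len(spikes)
    p_x = np.bincount(idx, minlength=n_bins) / n
    info = 0.0
    for r in (0, 1):
        n_r = np.bincount(idx, weights=(spikes == r).astype(float), minlength=n_bins)
        nz = n_r > 0
        info += np.sum(n_r[nz] / n * np.log2((n_r[nz] / n_r.sum()) / p_x[nz]))
    return info


def lnb_loglik_gain(idx, spikes, n_bins):
    """Log-likelihood gain (bits per stimulus) of an LNB model with an ML histogram
    nonlinearity over a model with a constant spike probability."""
    n_raw = np.bincount(idx, minlength=n_bins)
    n_spk = np.bincount(idx, weights=spikes.astype(float), minlength=n_bins)
    f_hat = np.divide(n_spk, n_raw, out=np.zeros(n_bins), where=n_raw > 0)  # p(spike | bin)
    p_bar = spikes.mean()
    # Probability each model assigns to the observed response (always > 0 here).
    q_hist = np.where(spikes == 1, f_hat[idx], 1 - f_hat[idx])
    q_const = np.where(spikes == 1, p_bar, 1 - p_bar)
    return np.mean(np.log2(q_hist) - np.log2(q_const))


# Neuron from (A): theta ~ Unif(-pi/2, pi/2), p(spike | theta) = theta/pi + 1/2.
rng = np.random.default_rng(3)
theta = rng.uniform(-np.pi / 2, np.pi / 2, 20_000)
S = np.column_stack([np.cos(theta), np.sin(theta)])
spikes = (rng.random(theta.size) < theta / np.pi + 0.5).astype(int)
for angle in (45.0, 75.0, 90.0):
    k = np.array([np.cos(np.deg2rad(angle)), np.sin(np.deg2rad(angle))])
    idx = bin_index(S @ k, n_bins=30)
    print(angle, bernoulli_info(idx, spikes, 30), lnb_loglik_gain(idx, spikes, 30))
```

For plug-in estimates the two printed columns agree exactly. Writing n_rj for the number of responses r in bin j, both reduce to Σ_{j,r} (n_rj/n) log2(n·n_rj / (n_r·n_j)).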

Fig 5.

A second example Bernoulli neuron for which $\hat{k}_{\mathrm{MID}}$ fails to identify the most informative one-dimensional subspace.

The stimulus space has two dimensions, denoted s₁ and s₂, and stimuli were drawn iid from a standard Gaussian 𝒩(0,1). (A) The nonlinearity f(s₁,s₂) = p(spike|s₁,s₂) is excitatory in s₁ and suppressive in s₂; brighter intensity indicates higher spike probability. (B) Contour plot of the stimulus-conditional densities given the two possible responses: “spike” (red) or “no-spike” (blue), along with the raw stimulus distribution (black). (C) Information carried by silences (I₀), single spikes (I_ss), and total Bernoulli information (I_Ber = I₀ + I_ss) as a function of subspace orientation. The MID estimate $\hat{k}_{\mathrm{MID}} = 90^{\circ}$ is the maximum of I_ss, but the total Bernoulli information is in fact 13% higher at $\hat{k}_{\mathrm{Ber}} = 0^{\circ}$ due to the incorporation of no-spike information. Although both stimulus axes are clearly relevant to the neuron, MID identifies the less informative one. As with the previous figure, equations (19) and (20) detail the equivalence between I_Ber and ℒ_lnb, so that this figure can be viewed from either an information-theoretic or likelihood-based perspective.
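
The terms of the decomposition I_Ber = I₀ + I_ss can be computed with plug-in histograms for each candidate axis. This sketch assumes that each term is weighted by the probability of its response (bits per stimulus), so that the two terms sum to the mutual information between the projection and the binary response. The nonlinearity below is a hypothetical stand-in that is excitatory in s₁ and suppressive in s₂. It is not the one used for the figure, so its numbers need not match panel C.

```python
import numpy as np


def info_terms(x, spikes, n_bins=25):
    """Plug-in (I0, Iss) in bits per stimulus: p(r) * KL(p(x | r) || p(x)) for r = 0, 1."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    p_x = np.histogram(x, edges)[0] / len(x)
    terms = []
    for r in (0, 1):
        n_r = np.histogram(x[spikes == r], edges)[0]
        nz = n_r > 0
        p_joint = n_r[nz] / len(x)    # p(r, bin)
        p_cond = n_r[nz] / n_r.sum()  # p(bin | r)
        terms.append(np.sum(p_joint * np.log2(p_cond / p_x[nz])))
    return terms[0], terms[1]


rng = np.random.default_rng(4)
S = rng.standard_normal((100_000, 2))                            # iid standard Gaussian stimuli
p_spike = 1 / (1 + np.exp(-(2 * S[:, 0] - 2 * S[:, 1] ** 2)))  # hypothetical f(s1, s2)
spikes = (rng.random(len(S)) < p_spike).astype(int)
for name, x in (("s1 axis (0 deg)", S[:, 0]), ("s2 axis (90 deg)", S[:, 1])):
    i0, iss = info_terms(x, spikes)
    print(f"{name}: I0={i0:.3f}  Iss={iss:.3f}  IBer={i0 + iss:.3f}")
```

Comparing the I_ss column alone with the I_Ber column shows how ranking axes by single-spike information can differ from ranking them by total Bernoulli information.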

Fig 6.

Lower bound on the fraction of total information neglected by MID for a Bernoulli neuron, as a function of the marginal spike probability p(spike) = p(r = 1), for the special case of a binary stimulus.

Information loss is quantified as the ratio I₀/(I₀+I_ss), the information due to no-spike events, I₀, divided by the total information due to spikes and silences, I₀+I_ss. The dashed gray line shows the lower bound derived in the limit p(spike) → 0. The solid black line shows the actual minimum achieved for binary stimuli s ∈ {0,1} with p(s = 1) = q, computed via a numerical search over the parameter q ∈ [0, 1] for each value of p(spike). The lower bound is substantially loose for p(spike) > 0, since as p(spike) → 1, the fraction of information due to silences goes to 1.
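
The plotted ratio can be evaluated for any binary-stimulus Bernoulli neuron. This sketch assumes that each term is weighted by the probability of its response (bits per stimulus), so that I₀ + I_ss is the mutual information between stimulus and response. The neuron is parametrized by q = p(s = 1) and by the hypothetical spike probabilities f0 = p(spike|s = 0) and f1 = p(spike|s = 1). The grid search also ranges over f0, which may differ from the search behind the solid curve, so treat its output as illustrative.

```python
import numpy as np


def kl_bits(p, q):
    """KL divergence in bits between discrete distributions p and q (0 log 0 = 0)."""
    nz = p > 0
    return np.sum(p[nz] * np.log2(p[nz] / q[nz]))


def silence_fraction(q, f0, f1):
    """I0 / (I0 + Iss) for s in {0, 1} with p(s = 1) = q and a Bernoulli neuron with
    p(spike | s = 0) = f0, p(spike | s = 1) = f1; terms in bits per stimulus."""
    p_s = np.array([1.0 - q, q])
    f = np.array([f0, f1])
    p_spk = p_s @ f
    i_ss = p_spk * kl_bits(p_s * f / p_spk, p_s)                   # spikes
    i_0 = (1 - p_spk) * kl_bits(p_s * (1 - f) / (1 - p_spk), p_s)  # silences
    return i_0 / (i_0 + i_ss)


def min_silence_fraction(p_spk, grid=60):
    """Hypothetical search over q and f0, with f1 fixed by the marginal p(spike) = p_spk."""
    best = np.inf
    for q in np.linspace(0.01, 0.99, grid):
        for f0 in np.linspace(0.0, 1.0, grid):
            f1 = (p_spk - (1 - q) * f0) / q
            if 0.0 <= f1 <= 1.0 and abs(f1 - f0) > 1e-3:  # skip invalid or unmodulated cells
                best = min(best, silence_fraction(q, f0, f1))
    return best


for p in (0.05, 0.2, 0.5):
    print(p, min_silence_fraction(p))
```

The weak-modulation limit offers a sanity check. When f0 and f1 are both close to p(spike), a second-order expansion of both KL terms shows that the fraction approaches p(spike) itself. This is consistent with the fraction due to silences going to 1 as p(spike) → 1.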

Fig 7.

Two examples illustrating sub-optimality of MID under discrete (non-Poisson) spiking.

In both cases, stimuli were uniformly distributed within the unit circle and the simulated neuron’s response depended on a 1D projection of the stimulus onto the horizontal axis (θ = 0). Each stimulus evoked 0, 1, or 2 spikes. (A) Deterministic neuron. Left: Scatter plot of stimuli labeled by the number of spikes evoked, and the piecewise constant nonlinearity governing the response (below). The nonlinearity sets the response count deterministically, thus dramatically violating Poisson expectations. Middle: information vs. axis of projection. The total information I_count reflects the information from 0-, 1-, and 2-spike responses (treated as distinct symbols), while the single-spike information I_ss ignores silences and treats 2-spike responses as two samples from p(s|spike). Right: Average absolute error in $\hat{k}_{\mathrm{MID}}$ and $\hat{k}_{\mathrm{count}}$ as a function of sample size; the latter achieves 18% lower error due to its sensitivity to the non-Poisson structure of the response. (B) Stochastic neuron with sigmoidal nonlinearity controlling the stochasticity of responses. The neuron transitions from almost always emitting 1 spike for large negative stimulus projections to generating either 0 or 2 spikes with equal probability at large positive projections. Here, the nonlinearity does not modulate the mean spike rate, so Î_ss is approximately zero for all stimulus projections (middle) and the MID estimator does not converge (right). However, the $\hat{k}_{\mathrm{count}}$ estimate converges because the LNC model is sensitive to the change in the conditional response distribution. Equation (37) details the relationship between I_count and ℒ_lnc, so that this figure can be interpreted from either an information-theoretic or likelihood-based perspective.
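
The contrast between I_count and I_ss in (B) can be reproduced with plug-in estimates. The sigmoid steepness, sample size, 20-bin histograms and all function names in this sketch are hypothetical. As the projection x grows, a sigmoid raises the probability of switching the response from 1 spike to either 0 or 2 spikes with equal probability. The mean count therefore stays at 1 for every x. I_count treats 0, 1 and 2 as distinct symbols, while I_ss weights each stimulus by its spike count.

```python
import numpy as np


def count_info(x, counts, n_bins=20, max_count=2):
    """Plug-in I_count (bits per stimulus): mutual information between the binned
    projection x and the spike count, with each count 0..max_count a distinct symbol
    (counts above max_count would be ignored)."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    joint = np.stack([np.histogram(x[counts == c], edges)[0] for c in range(max_count + 1)])
    joint = joint / joint.sum()             # p(count, bin)
    p_c = joint.sum(axis=1, keepdims=True)  # p(count), shape (max_count + 1, 1)
    p_b = joint.sum(axis=0, keepdims=True)  # p(bin), shape (1, n_bins)
    nz = joint > 0
    return np.sum(joint[nz] * np.log2(joint[nz] / (p_c @ p_b)[nz]))


def single_spike_info(x, counts, n_bins=20):
    """Plug-in I_ss (bits per spike): a k-spike response contributes k samples to p(x | spike)."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    p_raw = np.histogram(x, edges)[0] / len(x)
    p_spk = np.histogram(x, edges, weights=counts)[0] / counts.sum()
    nz = p_spk > 0
    return np.sum(p_spk[nz] * np.log2(p_spk[nz] / p_raw[nz]))


# Stimuli uniform in the unit disk; the response depends only on x = s . (1, 0).
rng = np.random.default_rng(6)
n = 50_000
radius, angle = np.sqrt(rng.random(n)), rng.uniform(0, 2 * np.pi, n)
x = radius * np.cos(angle)                 # projection onto the horizontal axis
g = 1 / (1 + np.exp(-8 * x))               # hypothetical sigmoid, steepness 8
variable = rng.random(n) < g               # switch from "1 spike" to "0 or 2 spikes"
counts = np.where(variable, 2 * rng.integers(0, 2, n), 1)
i_count, i_ss = count_info(x, counts), single_spike_info(x, counts)
```

Because E[count | x] = 1 everywhere, p(x | spike) = p(x) and the population value of I_ss is exactly zero. The small positive number returned is plug-in bias, whereas `i_count` stays well above zero.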

Fig 8.

Estimation of high-dimensional subspaces using a nonlinearity parametrized with cylindrical basis functions (CBFs).

(A) The eight most informative filters for an example complex cell, estimated with iSTAC (top row) and cbf-LNP (bottom row). For the cbf-LNP model, the nonlinearity was parametrized with three first-order CBFs for the output of each filter (see Methods). (B) Estimated 1D nonlinearity along each filter axis, for the filters shown in (A). Note that the third and fourth iSTAC filters are suppressive, while the third and fourth cbf-LNP filters are excitatory. (C) Cross-validated single-spike information for iSTAC, cbf-LNP, and rbf-LNP, as a function of the number of filters, averaged over a population of 16 neurons (selected from [29] for having ≥ 8 informative filters). The cbf-LNP estimate outperformed iSTAC in all cases, while rbf-LNP yielded a slight further increase for the first four dimensions. (D) Computation time for the numerical optimization of the cbf-LNP likelihood for up to 8 filters. Even with 30 minutes of data and 8 filters, optimization took about 4 hours. (E) Average number of excitatory filters as a function of the total number of filters, for each method. (F) Information gain from excitatory filters, for each method, averaged across neurons. Each point represents the average amount of information gained from adding an excitatory filter, as a function of the number of filters.
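
A cbf-LNP likelihood of the kind described in (A) can be sketched as follows. Every specific choice here is an assumption for illustration, not the parametrization given in Methods. Each basis function is a Gaussian bump, with hand-picked centers and width, applied to a single filter output (one reading of "first-order"). The output nonlinearity is exponential so that the Poisson log-likelihood is simple to evaluate. Fitting would maximize this quantity jointly over the filters K and the weights with a numerical optimizer, which is not shown.

```python
import numpy as np


def cbf_design(S, K, centers, width):
    """First-order CBF features: Gaussian bumps applied to each filter output separately.

    S: (n, d) stimuli, K: (d, m) filters, centers: (b,) bump centers, width: bump width.
    Returns an (n, m * b) design matrix, ordered filter by filter.
    """
    X = S @ K                                                        # (n, m) filter outputs
    bumps = np.exp(-0.5 * ((X[:, :, None] - centers) / width) ** 2)  # (n, m, b)
    return bumps.reshape(len(S), -1)


def cbf_lnp_loglik(w, b0, S, K, counts, centers, width, dt=1.0):
    """Poisson log-likelihood (nats, omitting the constant -sum(log r!)) for an LNP model
    with rate exp(b0 + Phi @ w) per bin of width dt; the exp output is an assumption."""
    log_rate = b0 + cbf_design(S, K, centers, width) @ w + np.log(dt)
    return np.sum(counts * log_rate - np.exp(log_rate))


# Hypothetical check: simulate from known filters and weights, then compare the
# likelihood of the generating parameters with that of the best constant-rate model.
rng = np.random.default_rng(5)
n, d, m = 20_000, 10, 2
K = np.linalg.qr(rng.standard_normal((d, m)))[0]
centers, width = np.array([-1.5, 0.0, 1.5]), 1.0  # three bumps per filter
S = rng.standard_normal((n, d))
w_true = np.array([-1.0, 0.0, 1.0, 0.5, -0.5, 0.5])
counts = rng.poisson(np.exp(-1.0 + cbf_design(S, K, centers, width) @ w_true))
ll_true = cbf_lnp_loglik(w_true, -1.0, S, K, counts, centers, width)
ll_const = cbf_lnp_loglik(np.zeros(m * 3), np.log(counts.mean()), S, K, counts, centers, width)
```

The number of weights grows only linearly with the number of filters (m × b). This is one reason a first-order basis stays tractable as filters are added.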
