Interpretation of Correlated Neural Variability from Models of Feed-Forward and Recurrent Circuits

The correlated variability in the responses of a neural population to the repeated presentation of a sensory stimulus is a universally observed phenomenon. Such correlations have been studied in much detail, both with respect to their mechanistic origin and to their influence on stimulus discrimination and on the performance of population codes. In particular, recurrent neural network models have been used to understand the origin (or lack) of correlations in neural activity. Here, we apply a model of recurrently connected stochastic neurons to interpret correlations found in a population of neurons recorded from mouse auditory cortex. We study the consequences of recurrent connections on the stimulus dependence of correlations, and we compare them to those from alternative sources of correlated variability, like correlated gain fluctuations and common input in feed-forward architectures. We find that a recurrent network model with random effective connections reproduces observed statistics, like the relation between noise and signal correlations in the data, in a natural way. In the model, we can analyze directly the relation between network parameters, correlations, and how well pairs of stimuli can be discriminated based on population activity. In this way, we can relate circuit parameters to information processing.


1 Definitions and statement of the 'macroscopic equations' for population firing rates and covariances
References to equations and figures in the main text are preceded by the letter "M". In order to develop an intuitive understanding of the way in which correlations generated in a recurrent network influence coding, we examine a highly simplified model. The recurrent network consists of two excitatory sub-populations, labeled E and E′, each made up of N neurons (see sketch in Fig. 1 A). We ask to what extent their activity discriminates between two stimuli when these elicit preferential responses in the two sub-populations, respectively.
The activity in the network is determined by Eqs. (M7) and (M12). For the sake of simplicity, we set the external variance to zero, such that the input to the network is entirely defined by its mean, r_ext. We make some further simplifying assumptions for the sake of calculational ease: each neuron in sub-population L projects to a fixed number, n_KL, of neurons in sub-population K, and all non-zero coupling weights are identical, G_ij = g_E. Additionally, all neurons in the same population receive identical external input. Then the two-component vectors R_ext(s_1) = (1 + ∆, 1)^T and R_ext(s_2) = (1, 1 + ∆)^T define a pair of stimuli; each component denotes the input to each neuron in the corresponding population.
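For concreteness, this microscopic setup can be instantiated numerically. The following sketch builds such a coupling matrix and the two stimulus vectors; the population size, weight, and out-degrees are illustrative choices, not values from the main text:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50                 # neurons per sub-population (illustrative)
g_E = 0.004            # common coupling weight (illustrative)
n_s, n_c = 40, 20      # within- and across-population out-degrees (illustrative)

def block(n_out):
    """Connectivity block in which every presynaptic neuron (column)
    projects to exactly n_out randomly chosen postsynaptic neurons."""
    B = np.zeros((N, N))
    for j in range(N):
        B[rng.choice(N, size=n_out, replace=False), j] = g_E
    return B

# full coupling matrix G for the two populations, ordered as (E, E')
G = np.block([[block(n_s), block(n_c)],
              [block(n_c), block(n_s)]])

# the two stimuli: enhanced input to one sub-population at a time
Delta = 0.2
R_ext_s1 = np.concatenate([np.full(N, 1 + Delta), np.full(N, 1.0)])
R_ext_s2 = np.concatenate([np.full(N, 1.0), np.full(N, 1 + Delta)])
```

By construction, every column of each block sums to the same value, which is the property exploited by the macroscopic reduction below.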
We reduce the dimensionality of the system by considering the population activity at a macroscopic level. The trial-averaged response of population K is defined as R_K = Σ_{k∈K} r_k, and its trial-to-trial variability is described by the population covariance matrix, with elements defined as Σ_KL = Σ_{k∈K, l∈L} C_kl.
It turns out that these macroscopic quantities depend only on the overall number of connections between populations, not on a specific network realization. For example, the sum of the rates of neurons in one population depends only on the sum of the inputs to all neurons, but not on how these inputs are distributed among the neurons. This is a consequence of the linearity of the dynamics and the assumption that each neuron has a fixed number of output connections [1,2], as we show in detail in Sec. 2. More precisely, we define the coupling within each population, Γ_s = n_EE g_E = n_E′E′ g_E, and across populations, Γ_c = n_EE′ g_E = n_E′E g_E, which make up a population coupling matrix,

Γ = ( Γ_s  Γ_c
      Γ_c  Γ_s ).

Applying the definitions of R and Σ to Eqs. (M7) and (M12) then yields the macroscopic equations

R = N P R_ext,    with P ≡ (I − Γ)^{-1},    (4)

Σ = P D[R] P^T,    (5)

where D[R] denotes the diagonal matrix with the vector R on its diagonal.
Up to a factor N , these equations are equivalent to the microscopic equations in the restricted context of the two-population model.

2 Derivation of the macroscopic equations
Here, we derive the macroscopic Eqs. (4) and (5). The homogeneity of each of the two sub-populations implies that, for a neuron l in population L, the sum Σ_{k∈K} G_kl is independent of l, so that we can define the population coupling matrix, Γ, through its elements,

Γ_KL = Σ_{k∈K} G_kl = n_KL g_KL,

where g_KL is the weight of a microscopic connection between a neuron in population K and a neuron in population L. We would like to express R and Σ in terms of Γ. At the microscopic level, the rates, r, and the covariances, C, depend on the transfer matrix, B = (I − G)^{-1}. We define an analogous population transfer matrix, P, that depends only on Γ, and we show that R and Σ can be written in terms of P. Let

P = (I − Γ)^{-1} = Σ_{m=0}^{∞} Γ^m.

In order to relate the matrix P to the microscopic quantities, r and C, we use the fact that the identities

Σ_{k∈K} B_kl = P_KL    (8)

hold for any neuron l in population L. These identities can be proven by considering each term in the infinite sum defining the matrix P, and noting that

Σ_{k∈K} (G^m)_kl = (Γ^m)_KL    for any l ∈ L.

As an illustration of this identity, we consider the case m = 2; we have

Σ_{k∈K} (G²)_kl = Σ_J Σ_{j∈J} (Σ_{k∈K} G_kj) G_jl = Σ_J Γ_KJ Σ_{j∈J} G_jl = Σ_J Γ_KJ Γ_JL = (Γ²)_KL.

To express the firing rate of population K, R_K, we then use Eq. (8) to show that

R_K = Σ_{k∈K} r_k = Σ_{k∈K} Σ_l B_kl r_ext,l = N Σ_L P_KL R_ext,L,

since r_ext,l = R_ext,L for any l ∈ L. In vector form, this can be rewritten as the identity

R = N P R_ext,

which matches Eq. (4). We can apply a similar manipulation to derive the covariances of the population responses, Σ, as

Σ_KL = Σ_{k∈K, l∈L} C_kl = Σ_m (Σ_{k∈K} B_km) r_m (Σ_{l∈L} B_lm) = Σ_M P_KM R_M P_LM,

that is, Σ = P D[R] P^T, which matches Eq. (5).

3 Macroscopic signal and noise correlations as functions of connection strengths

We now evaluate signal and noise correlations in the population activity as functions of the network couplings. The strength of the cross-coupling between populations, Γ_c, determines the population rates and covariances via the 'effective cross-coupling', P_c, in the population transfer matrix:

P = ( P_s  P_c
      P_c  P_s ),    with    P_s = (1 − Γ_s) / ((1 − Γ_s)² − Γ_c²),    P_c = Γ_c / ((1 − Γ_s)² − Γ_c²).    (14)

The network is unstable if the eigenvalues of Γ are larger than unity; in other words, stability requires Γ_s + Γ_c < 1 or, equivalently, P_c < P_s.
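The column-sum identity in Eq. (8) is easy to verify numerically. In the following sketch (illustrative sizes and weights), the sum of B over the rows belonging to one population is the same for every presynaptic neuron of a given population and equals the corresponding element of P:

```python
import numpy as np

rng = np.random.default_rng(1)
N, g = 40, 0.005            # illustrative population size and weight
n_s, n_c = 30, 15           # illustrative out-degrees

def block(n_out):
    """Each column has exactly n_out non-zero entries of weight g."""
    B = np.zeros((N, N))
    for j in range(N):
        B[rng.choice(N, size=n_out, replace=False), j] = g
    return B

G = np.block([[block(n_s), block(n_c)],
              [block(n_c), block(n_s)]])
Gamma = g * np.array([[n_s, n_c],
                      [n_c, n_s]])

B = np.linalg.inv(np.eye(2 * N) - G)     # microscopic transfer matrix
P = np.linalg.inv(np.eye(2) - Gamma)     # population transfer matrix

# sum_{k in K} B_kl = P_KL, independent of the presynaptic neuron l in L
assert np.allclose(B[:N, :N].sum(axis=0), P[0, 0])   # K = E,  L = E
assert np.allclose(B[N:, :N].sum(axis=0), P[1, 0])   # K = E', L = E
```

The identity holds for every column individually, not only on average, which is why the macroscopic description is independent of the particular network realization.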
In a heterogeneous model with random coupling strengths, the signal and noise covariances were traced back to the mean and variance of the elements of the effective coupling matrix. Similarly, here in the two-population model, the population behavior is ruled by the mean and variance of the elements of the matrix P or, stated more simply, by the two quantities P_s + P_c and P_s − P_c. To make this statement more precise, we consider an ensemble of two-component input vectors, R_ext(s) = (r_ext(s), r′_ext(s))^T, with the random external inputs, r_ext and r′_ext, as elements, chosen independently across a set of stimuli, s, from a normal distribution with mean µ_rext and variance σ²_rext. The average population response, R = N P R_ext, then yields signal covariances, Σ^S_ij = cov(R_i, R_j), across the stimulus ensemble, given by

Σ^S = N² σ²_rext P P^T.

And the average signal correlation for the population activity is calculated as

ρ_S = Σ^S_EE′ / Σ^S_EE = (µ_P² − σ_P²) / (µ_P² + σ_P²) = 2 P_s P_c / (P_s² + P_c²),    (16)

where we have used the notation µ_P = (P_s + P_c)/2 and σ_P² = (P_s − P_c)²/4 for the mean and the variance of the elements of P. From Eq. (16), the normalized variance of the elements of P, σ_P²/µ_P², determines the strength of signal correlations. It is small, and signal correlations are correspondingly strong, if the network is strongly connected, that is, if Γ_c is large. A similar relation holds for the average noise correlations, if we average first over stimuli, then over neurons: the average of the noise covariance matrix over stimuli is ⟨Σ(s)⟩_s = P D[⟨R(s)⟩_s] P^T. Because the external input to both populations is statistically identical, ⟨r_ext⟩ = ⟨r′_ext⟩, the average rates of both populations are identical, and the average over population pairs can be evaluated just as in the case of the signal correlations. From ⟨Σ⟩_s = ⟨R⟩ P P^T, it follows that

ρ_N = 2 P_s P_c / (P_s² + P_c²).    (20)
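The relation in Eq. (16) can be illustrated with a short simulation: with illustrative couplings, the signal correlation estimated from a random stimulus ensemble matches the expression 2 P_s P_c / (P_s² + P_c²):

```python
import numpy as np

rng = np.random.default_rng(2)
Gamma_s, Gamma_c = 0.3, 0.2          # illustrative couplings
Gamma = np.array([[Gamma_s, Gamma_c],
                  [Gamma_c, Gamma_s]])
P = np.linalg.inv(np.eye(2) - Gamma)
P_s, P_c = P[0, 0], P[0, 1]

# ensemble of stimuli: i.i.d. Gaussian external inputs to the two populations
mu, sigma = 1.0, 0.1
R_ext = mu + sigma * rng.standard_normal((100_000, 2))
R = R_ext @ P.T        # average responses (factor N omitted: it cancels in correlations)

rho_signal = np.corrcoef(R.T)[0, 1]                  # empirical signal correlation
rho_theory = 2 * P_s * P_c / (P_s**2 + P_c**2)       # Eq. (16)
```

The same expression gives the average noise correlation, Eq. (20), since the stimulus-averaged noise covariance is proportional to P P^T as well.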

4 Condition on circuit parameters for the emergence of beneficial correlations
In the context of the two-population model, we can show that, depending on the cross-coupling, noise correlations can be beneficial for stimulus discrimination. This is meant in comparison to the case of independent neurons, obtained by shuffling trials. As an illustration, consider a situation in which the population responses to two stimuli, s_1 and s_2, are R(s_1) = (R_0 + ∆R, R_0) and R(s_2) = (R_0, R_0 + ∆R), respectively. For the sake of simplicity, we assume that the population covariance matrix is stimulus-independent, with Σ_EE = Σ_E′E′:

Σ = ( Σ_EE   Σ_EE′
      Σ_EE′  Σ_EE ).

Here, the most discriminant direction, w (see Eq. (M21)), is an eigenvector of Σ, (1, −1), and the variance along this direction corresponds to the smaller of the two eigenvalues of the matrix Σ, namely

Σ_EE − Σ_EE′.    (22)

If this quantity is smaller than the corresponding variance in a two-population model with independent neurons, namely σ²_indep = Σ_{i∈E} C_ii, then noise correlations are beneficial to stimulus discrimination. See Fig. 1 B-E for an illustration from a numerical calculation. From Eq. (22), this condition is equivalent to requiring that the summed covariances across the two populations, C_cross = Σ_{i∈E, k∈E′} C_ik, be larger than the summed covariances within a population, C_within = Σ_{i≠j∈E} C_ij.
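As a quick check of Eq. (22): for a symmetric 2×2 covariance matrix with positive cross-covariance (illustrative values below), the variance along the normalized discriminant direction w = (1, −1)/√2 equals the smaller eigenvalue, Σ_EE − Σ_EE′:

```python
import numpy as np

S_EE, S_cross = 4.0, 2.5                 # illustrative covariance entries
Sigma = np.array([[S_EE, S_cross],
                  [S_cross, S_EE]])

w = np.array([1.0, -1.0]) / np.sqrt(2)   # most discriminant direction
var_w = w @ Sigma @ w                    # variance along w
```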
We can now translate this condition on the values of covariances into a condition on the values of circuit couplings in our dynamical model. For the sake of simplicity, we consider the situation in which R_ext = R′_ext ≡ R_ext; then the population rates are identical, R_E = R_E′ ≡ R = N (P_s + P_c) R_ext, and the population covariances read

Σ_EE = Σ_E′E′ = (P_s² + P_c²) R,    Σ_EE′ = 2 P_s P_c R.    (23)

The average pairwise population cross-covariance is calculated as C_cross = Σ_EE′ / N². Approximately, the average single-neuron variance differs from the average within-covariance only by the contribution of the rate to the variance: ⟨C_ii⟩ ≈ R/N + C_within (this can be seen from a series expansion of the equation for the covariances, C = B D[r] B^T). From this we get, for the population variance, Σ_EE = N ⟨C_ii⟩ + N(N − 1) C_within ≈ R + N² C_within. With Eq. (23), the condition on covariances, C_cross > C_within, translates into the following condition on circuit parameters:

2 P_s P_c > P_s² + P_c² − 1.

Since P_s > P_c, we obtain the simple inequality

P_s − P_c < 1.

Using Eq. (14) for the elements of the matrix P, we obtain the condition

f(Γ_s) < f(Γ_c),    (29)

where the function f(x) = x(1 − x) is a parabola that goes through the points (0, 0) and (1, 0), and has its maximum at (1/2, 1/4). Since 1 − Γ_s > Γ_c (see Eq. (14)), the condition in Eq. (29) is fulfilled if Γ_s < 1/2 and the within-coupling is smaller than the cross-coupling, Γ_s < Γ_c.
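The chain of inequalities above can be verified numerically: over a grid of stable couplings (illustrative ranges), P_s − P_c < 1 holds exactly when f(Γ_s) < f(Γ_c), i.e., when Γ_c > Γ_s:

```python
import numpy as np

def f(x):
    """Parabola through (0, 0) and (1, 0) with maximum at (1/2, 1/4)."""
    return x * (1 - x)

def P_elements(Gamma_s, Gamma_c):
    """Diagonal and off-diagonal elements of P = (I - Gamma)^{-1}."""
    P = np.linalg.inv(np.eye(2) - np.array([[Gamma_s, Gamma_c],
                                            [Gamma_c, Gamma_s]]))
    return P[0, 0], P[0, 1]

for Gs in np.linspace(0.05, 0.45, 9):
    for Gc in np.linspace(0.05, 0.45, 9):
        if Gs + Gc < 1 and abs(Gs - Gc) > 1e-9:   # stable, non-degenerate
            P_s, P_c = P_elements(Gs, Gc)
            beneficial = P_s - P_c < 1
            assert beneficial == (f(Gs) < f(Gc)) == (Gc > Gs)
```

This reflects the identity P_s − P_c = 1/(1 − Γ_s + Γ_c), which is below one precisely when Γ_c > Γ_s.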

5 Continuous stimuli changing in arbitrary directions
Above, we used the signal-to-noise ratio, S, to measure how well discrete pairs of stimuli in a given ensemble can be discriminated. In a model network, we have access to all possible input dimensions, and we can use an alternative measure that combines these possible stimulus dimensions. If a high-dimensional stimulus varies continuously, the information in the response distribution about the stimulus can be measured by the Fisher information matrix [3]. An approximation, the 'linear Fisher information', is obtained by assuming Gaussian noise and by neglecting the information pertaining to the stimulus dependence of the covariance matrix; the elements of the linear Fisher information matrix are defined as

I_mn = (∂_m R)^T Σ^{-1} (∂_n R),

for the population covariance matrix, Σ, and population response, R. Here, ∂_m R is the derivative of the response vector with respect to stimulus coordinate m. Our stimulus dimensions are the coordinates of the input vector, R_ext, and the mean output vector, in a network model, is R = N P R_ext. It is easy to see that ∂_m R is given by N times the m-th column of P, and, hence,

I = N² P^T Σ^{-1} P.

As the covariance matrix in a recurrent network is given by Σ = P D[R] P^T, the corresponding linear Fisher information matrix reads

I = N² P^T (P D[R] P^T)^{-1} P = N² D[R]^{-1}.    (32)

According to Eq. (32), the linear Fisher information scales as the inverse of the firing rates and, hence, as the inverse of the strength of connections in the network. This is a special case of the more general result considered in [4]. It obtains because recurrent connections enhance noise, as compared to the firing rates. Generally, if the input depends linearly on a one-dimensional stimulus s, R_ext(s) = R_ext,0 + s ∆R_ext, the information about changes in this direction is ∆R_ext^T I ∆R_ext. As a measure of the information about stimulus changes in all possible directions, we use the trace of I, Tr(I) = Σ_i I_ii.
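A short numerical check of Eq. (32), with illustrative parameters, confirms that the recurrence cancels out of the linear Fisher information except through the rates:

```python
import numpy as np

N = 100                               # illustrative population size
Gamma = np.array([[0.3, 0.2],
                  [0.2, 0.3]])        # illustrative couplings
P = np.linalg.inv(np.eye(2) - Gamma)

R_ext = np.array([1.2, 1.0])          # illustrative external input
R = N * P @ R_ext                     # mean population responses
Sigma = P @ np.diag(R) @ P.T          # population noise covariance

# linear Fisher information: I_mn = (d_m R)^T Sigma^{-1} (d_n R), with d_m R = N P[:, m]
I_lin = N**2 * P.T @ np.linalg.inv(Sigma) @ P
```

Since I_lin = N² D[R]^{-1} and the rates R grow with the coupling strength, Tr(I_lin) decreases as the recurrent connections strengthen.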

6 Impact of circuit properties on stimulus coding: intuition from the two-population model

We now exploit the various observations summarized above to investigate in greater detail the effect of the circuit parameters on the accuracy of population coding, in the context of our simple, two-population model. We start by examining how the cross-coupling between the two populations influences the statistics of the responses to two stimuli. Upon increasing the cross-coupling, Γ_c, the response distributions evoked by the two stimuli change in a specific manner (Fig. 1): the average population responses are enhanced, but also more similar, i.e., the difference in the population responses to the two stimuli, |R(s_1) − R(s_2)|, is suppressed. At the same time, the noise correlation between the two population responses is enhanced, so that the two response distributions become 'more elongated' (Fig. 1 B).
To examine the combined impact of these changes on stimulus coding, we compare this situation to that of a population of independent neurons with matched average responses and variances; the latter can be obtained simply by shuffling the data from the correlated population among trials. As we showed in the sections above, a condition for correlations that are beneficial to stimulus coding, from this angle, is that correlations between neurons in different populations be stronger than pairwise correlations within the same population. This condition is realized for strong coupling between the two populations (Fig. 1 C,D). Considering the dependence of the signal-to-noise ratio, S (the difference in means divided by the variance, see Eq. (M22) in the main text), on the cross-coupling, we observe that increasing the cross-coupling between the two populations decreases the signal-to-noise ratio, in both the 'native' and the shuffled cases (Fig. 1 E); but this suppression is stronger in the shuffled case. The suppression can be traced back to the noise produced in spike generation: under amplification through the matrix P, firing rates, R = N P R_ext, scale linearly with P, while the covariances, Σ = P D[R] P^T, scale with a higher power of P. In our model, this cubic dependence on P comes from the combination of rate fluctuations (resulting from recurrent inputs from Poisson neurons and yielding a quadratic contribution) and the Poisson spike generation. Because of the cubic dependence of the variances, as compared to the linear dependence of the rates, the signal-to-noise ratio decreases as a function of the strength of the recurrent connections, P.
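Both observations can be reproduced in a few lines. In the sketch below (illustrative parameters; 'shuffling' is approximated at the population level by discarding the cross-covariance while keeping the population variances), the signal-to-noise ratio decreases with the cross-coupling, and it is lower still when correlations are removed:

```python
import numpy as np

def snr(Gamma_c, Gamma_s=0.1, N=100, Delta=0.2, shuffle=False):
    """Signal-to-noise ratio for discriminating the two stimuli along the
    most discriminant direction w = (1, -1)."""
    Gamma = np.array([[Gamma_s, Gamma_c],
                      [Gamma_c, Gamma_s]])
    P = np.linalg.inv(np.eye(2) - Gamma)
    R1 = N * P @ np.array([1 + Delta, 1.0])     # response to stimulus 1
    R2 = N * P @ np.array([1.0, 1 + Delta])     # response to stimulus 2
    Sigma = P @ np.diag(R1) @ P.T               # stimulus dependence of Sigma neglected
    if shuffle:
        Sigma = np.diag(np.diag(Sigma))         # drop the cross-covariance
    w = np.array([1.0, -1.0])
    return (w @ (R1 - R2))**2 / (w @ Sigma @ w)
```

For example, `snr(0.4)` is smaller than `snr(0.1)`, and `snr(0.4, shuffle=True)` is smaller than `snr(0.4)`.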
Independent of this effect, larger cross-coupling reshapes both external and internally generated noise, redirecting it along a direction that is irrelevant for discrimination. This mechanism would compensate for the decrease in the difference of the average responses if no internal noise were generated. This gain is reduced when correlations are shuffled, such that more information is lost in that case.
Up to here, we considered the special case of two stimuli. For a larger set of stimuli, noise correlations may be harmful for discriminating some of the stimulus pairs, while beneficial for discriminating others (as compared to the case of independent neurons). In Fig. 1 F, an example is sketched for each of the two cases: a pair of stimuli where both populations receive an identical input in the case of each stimulus (connected by a dashed line), and a pair of stimuli where in each case one population receives stronger input (connected by a dotted line). For an ensemble of stimuli, the overall effect of recurrent amplification depends on the distribution of favorable and unfavorable pairs. In order to quantify the effect of recurrent amplification more generally, rather than averaging across a number of possible stimulus pairs, we consider a measure that takes into account only local changes in the stimulus, but in all possible directions. In the context of the scenario visualized in Fig. 1 F, we consider a stimulus which can result in an average response at the position where the dashed and the dotted lines cross. To measure the combined effects on information for the case in which a change in stimulus results in a change of the response along any line in the plane, we use the trace of the linear Fisher information matrix (see the previous section for details). In Fig. 1 G, we show how the trace of the linear Fisher information matrix decreases with the strength of recurrent amplification, which we measure by the average over the elements of the matrix P .
7 Impact of circuit properties on stimulus coding in an extended model with excitatory and inhibitory neurons

The simple two-population model considered above comes with a limited set of parameters. It is instructive to examine a slightly extended model, with four coupled neural sub-populations: two excitatory sub-populations (E_1 and E_2) and two inhibitory sub-populations (I_1 and I_2). In this model, we consider connections between neurons within each excitatory sub-population (resulting in population couplings E_1 ←→ E_1 and E_2 ←→ E_2 equal in magnitude), cross-connections between the neurons from the two excitatory sub-populations (E_1 ←→ E_2), and connections between pairs of neurons from the excitatory and inhibitory sub-populations (E_1 ←→ I_1 and E_2 ←→ I_2). (In our simulations, each of these sub-populations contains 100 neurons.) As in the two-population model discussed above, this model can be described by effective parameters which combine the size of sub-populations, synaptic strengths, and the probabilities of forming synapses; two important parameters are the self-coupling within E_1 and E_2, Γ_S, and the cross-coupling between E_1 and E_2, Γ_C. By appropriately choosing the other parameters in the model, we can keep the firing rates in E_1 and E_2, as well as the single-neuron variance averaged over E_1, unchanged as we vary Γ_S and Γ_C. (It would be possible to keep the variance in E_2 invariant as well, by extending the model further to include an additional parameter, such as inhibitory self-connections; we do not follow this route here, for the sake of conciseness.) In spite of its simplicity, the model with both excitatory and inhibitory neurons exhibits an intriguing behavior in terms of stimulus coding. We can examine the behavior of this model using a framework similar to the one described above.
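The four-population model fits in the same macroscopic framework. The sketch below (all coupling values and the sign convention for inhibition are illustrative assumptions, not the parameters used for Fig. 2) builds the 4×4 coupling matrix and the corresponding response statistics:

```python
import numpy as np

# illustrative effective couplings, ordered as (E1, E2, I1, I2)
Gs, Gc = 0.3, 0.2       # E self- and cross-coupling
Gei, Gie = 0.4, 0.3     # I -> E and E -> I couplings within each half
Gamma = np.array([
    #  E1    E2    I1    I2
    [  Gs,   Gc, -Gei,  0.0],   # inputs to E1
    [  Gc,   Gs,  0.0, -Gei],   # inputs to E2
    [ Gie,  0.0,  0.0,  0.0],   # inputs to I1
    [ 0.0,  Gie,  0.0,  0.0],   # inputs to I2
])

P = np.linalg.inv(np.eye(4) - Gamma)
R_ext = np.array([1.1, 1.0, 1.0, 1.0])   # stimulus favoring E1 (illustrative)
R = P @ R_ext                             # population rates (factor N omitted)
Sigma = P @ np.diag(R) @ P.T              # population covariance, as in the E-only model
```

From here, the signal-to-noise ratio and entropy comparisons of Fig. 2 follow by scanning Gs and Gc while adjusting the remaining couplings to keep rates and variances fixed.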
Here, we present the corresponding results, in terms of the signal-to-noise ratio for the discrimination between two stimuli (defined as before). The signal-to-noise ratio, S, varies as a function of both Γ_S and Γ_C; it increases for smaller Γ_S and larger Γ_C (Fig. 2 A). Furthermore, fixing the population-averaged variance to a set of values (Fig. 2 A, top) defines a set of curves in the (Γ_S, Γ_C)-plane (Fig. 2 A, bottom). These demonstrate that, even in the presence of fixed variance, circuit parameters can be varied in such a way as to modulate the signal-to-noise ratio appreciably. Interestingly, a higher signal-to-noise ratio does not result from a reduced noise entropy: indeed, the noise entropy of the correlated population in fact grows for larger signal-to-noise ratios, as compared to the noise entropy in a matched independent population (Fig. 2 B).

Fig 2. Effects of circuit parameters on stimulus coding in a simple recurrent network. A: Signal-to-noise ratio as a function of the self-coupling, Γ_S, and the cross-coupling, Γ_C, in two excitatory sub-populations (bottom). For a set of fixed population-averaged variances (top), the allowed values of the parameters Γ_S and Γ_C describe curves along which the signal-to-noise ratio can vary appreciably (bottom). B: Ratio of the noise entropy in a model of independent neurons with matched firing rates and population-averaged variance and the noise entropy in the recurrent model, as a function of the parameters Γ_S and Γ_C.