Eﬀective synaptic interactions in subsampled nonlinear networks with strong coupling


A major obstacle to uncovering structure-function relationships is the fact that most experiments can only directly observe small fractions of an active network. The state-of-the-art method for determining connections between neurons in living networks is to infer them by fitting statistical models to neural spiking data [16-24]. However, the fact that we cannot observe all neurons in a network means that the statistically inferred connections are at best "effective" connections, representing some dynamical relationship between the activity of nodes but not necessarily a true physical connection [20, 25-30]. The exact relationship between the effective and true connections is in general unknown, making it difficult to extrapolate from the statistics of effective connections back to the statistics of the true connections. Establishing this relationship thus has immediate importance for interpreting experimental measurements of synaptic interactions. It also informs a fundamental question in network computation: how hidden units shape the dynamics of a subset of nodes.
To provide a foundation upon which we can begin to understand the relationship between true network connectivity and the effective connections between neurons, we analyze a probabilistic model of network activity in which all properties are known, and derive a novel approximation to the effective model for the network of subsampled observed neurons. This makes explicit how the synaptic interactions between neurons are modified by unobserved neurons in the network, and under what conditions these modifications are, and are not, significant. As an important example, we study a sparse, random (Erdős–Rényi) network with N cells and with synaptic weights that scale as $1/\sqrt{N}$ [31-34], as has been recently observed experimentally [35], and show how unobserved neurons significantly reshape the effective synaptic interactions away from the ground-truth interactions. This is not the case with more "classical" 1/N scaling. Hence, depending on the nature of the statistics of synaptic weights and how they scale with network size, hidden paths can have a major impact on the observed interactions between neurons, impacting both local network computation and the interpretation of experimental data.
Model-We model the full network of N neurons as a nonlinear Hawkes process [36]. This is commonly known as a "generalized linear (point process) model" in neuroscience, and is broadly used to fit neural activity data [16-19, 21-24, 37]. Here we use it as a generative model for network activity, as it can be directly linked to common spiking models such as integrate-and-fire systems driven by noisy inputs [38, 39]. Similar studies of the impact of hidden neurons have been performed for the kinetic Ising model, another non-equilibrium model that has been applied to neural activity [40-42]. However, the versions of the kinetic Ising model studied thus far have a Markovian time-dependence, whereas the nonlinear Hawkes model we consider here may have arbitrary history dependence, important for correctly capturing the statistics of network dynamics.
To derive an approximate model for an observed subset of the network, we partition the network into two sets: recorded neurons (labeled by indices r) and hidden neurons (labeled by indices h).

FIG. 1. The hidden unit problem. A: The functional connection between two neurons can be decomposed into contributions from all paths from one neuron to the other through "hidden" intermediary neurons within the same circuit. In this schematic, neurons 1 and 2 are observed, while 3 and 4 are hidden. The effective connection from 1 to 2 comprises a direct connection plus contributions from paths along which signals can travel through neurons 3 and 4 before arriving at neuron 2. B: Leftmost, the effective connection from neuron 1 to 2. Subsequent plots decompose this connection into contributions from each path in A. Only directed connections from neuron 1 to 2 through hidden units contribute to the effective connection. For instance, although neuron 2 makes a connection to neuron 3, this does not generate an effective connection from 2 to 1, because neither neuron 3 nor neuron 4 makes a connection back to neuron 1, and hence there is no path by which neuron 2 can send a signal to neuron 1.

Each recorded neuron has an
instantaneous firing rate λ_r(t) such that the probability that the neuron fires within a small time window [t, t + dt] is λ_r(t)dt. The instantaneous firing rate in our model is
$$\lambda_r(t) = \lambda_0\,\phi\!\left(\mu_r + \sum_{r'} J_{r,r'} \ast \dot n_{r'}(t)\right), \qquad (1)$$
where λ_0 is a characteristic firing rate, φ(x) is a nonnegative, continuous function, µ_r is a tonic drive that sets the baseline firing rate of the neuron, and $J_{i,j} \ast \dot n_j(t) \equiv \int_{-\infty}^{\infty} dt'\, J_{i,j}(t - t')\,\dot n_j(t')$ is the convolution of the spike filter J_{i,j}(t) with the spikes $\dot n_j(t)$ from pre-synaptic neuron j to post-synaptic neuron i. In this work we take the tonic drive to be constant in time, and focus on the steady-state network activity in response to this drive. We consider spike filters of the form J_{i,j}(t) ≡ J_{i,j} g_j(t), where the temporal waveforms g_j(t) are normalized such that $\int_0^\infty dt\, g_j(t) = 1$ for all neurons j. Because of this normalization, the weight J_{i,j} carries units of time. Self-couplings J_{i,i} need not be interpreted as actual autapses, but rather account for the influence of a neuron's spiking history on its own firing rate (e.g., J_{i,i} < 0 suppresses firing, like a refractory period, while J_{i,i} > 0 promotes bursting). The firing rates for the hidden neurons follow the same expression with the indices h and r interchanged.
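As a concrete illustration, the generative model of Eq. (1) can be simulated in discrete time. The sketch below assumes the exponential nonlinearity φ(x) = e^x used later in the text, together with exponential waveforms g_j(t) of time constant tau_s; the function name and all parameter values are illustrative, not taken from the paper.

```python
import numpy as np

def simulate_hawkes(J, mu, lam0=1.0, dt=1e-3, T=10.0, tau_s=0.01, seed=0):
    """Discrete-time simulation of the nonlinear Hawkes network of Eq. (1),
    with phi(x) = e^x and exponential waveforms g_j(t) of time constant tau_s
    (each spike's filtered contribution integrates to one, so J[i, j]
    carries units of time).  Parameter values are illustrative."""
    rng = np.random.default_rng(seed)
    N = len(mu)
    s = np.zeros(N)                                  # filtered spikes, g * n_dot
    spikes = np.zeros((int(T / dt), N), dtype=int)
    for t in range(spikes.shape[0]):
        lam = lam0 * np.exp(mu + J @ s)              # instantaneous rates
        spikes[t] = rng.random(N) < lam * dt         # fire with prob. lambda*dt
        s += dt * (-s / tau_s) + spikes[t] / tau_s   # exponential g, area one
    return spikes

# Uncoupled sanity check: rates should hover near lam0 * exp(mu) = 5 Hz.
spk = simulate_hawkes(np.zeros((3, 3)), mu=np.log(5.0) * np.ones(3), T=50.0)
rates = spk.mean(axis=0) / 1e-3
```

With couplings J set to zero, the empirical rates simply fluctuate around the baseline λ_0 e^{µ}.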
We seek to describe the dynamics of the recorded neurons entirely in terms of their own set of spiking histories, eliminating the dependence on the activity of the hidden neurons. This demands averaging out the activity of the hidden neurons; in practice this is intractable to perform exactly [43-45]. Here, we use a mean field approximation and assume that the input from the hidden neurons can be approximated by its mean conditioned on the activity of the recorded neurons. This yields the following formula for the instantaneous firing rates of the recorded neurons:
$$\lambda_r(t) \approx \lambda_0\,\phi\!\left(\mu_r^{\rm eff} + \sum_{r'} J^{\rm eff}_{r,r'} \ast \dot n_{r'}(t)\right).$$
Here, the effective baselines $\mu_r^{\rm eff} = \mu_r + \sum_h J_{r,h}\,\nu_h$ are simply modulated by the net input to the neuron, so we do not focus on them here. The effective coupling filters are given in the frequency domain by
$$\hat J^{\rm eff}_{r,r'}(\omega) = \hat J_{r,r'}(\omega) + \sum_{h,h'} \hat J_{r,h}(\omega)\,\hat\Gamma_{h,h'}(\omega)\,\hat J_{h',r'}(\omega). \qquad (2)$$
Here, the ν_h are the steady-state mean firing rates of the hidden neurons and $\hat\Gamma_{h,h'}(\omega)$ is the linear response function of the hidden network to perturbations in the input [46]. Both ν_h and $\hat\Gamma_{h,h'}(\omega)$ are calculated in the absence of the recorded neurons, using a mean field theory approximation. The complete derivation is given in the Supplementary Information (SI).
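At zero frequency, Eq. (2) reduces to a matrix computation: the SI shows the hidden linear response takes the form Γ = (I − G J_hh)⁻¹ G with G = diag(γ_h). A minimal numerical sketch (weights, gains, and network size all hypothetical) is:

```python
import numpy as np

def effective_couplings(J, gamma, rec):
    """Zero-frequency effective couplings from Eq. (2).

    J is the full (N, N) matrix of integrated weights J[i, j] (from j to i),
    gamma the vector of gains, rec a boolean mask of recorded neurons.  At
    omega = 0 the hidden linear response is (I - G @ J_hh)^{-1} @ G with
    G = diag(gamma_h)."""
    hid = ~rec
    Nh = int(hid.sum())
    G = np.diag(gamma[hid])
    Gamma = np.linalg.solve(np.eye(Nh) - G @ J[np.ix_(hid, hid)], G)
    return J[np.ix_(rec, rec)] + J[np.ix_(rec, hid)] @ Gamma @ J[np.ix_(hid, rec)]

# Sanity check: with no hidden-to-hidden connections, only paths through a
# single hidden neuron survive: correction = J_rh @ diag(gamma_h) @ J_hr.
rng = np.random.default_rng(1)
N, Nrec = 20, 3
rec = np.zeros(N, dtype=bool)
rec[:Nrec] = True
J = rng.normal(0.0, 0.1 / np.sqrt(N), (N, N))
J[np.ix_(~rec, ~rec)] = 0.0                    # remove hidden-hidden edges
gamma = 0.5 * np.ones(N)
Jeff = effective_couplings(J, gamma, rec)
```

The sanity check at the bottom isolates the shortest hidden paths, for which the correction can be written down by hand.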
In deriving these results, we have neglected both fluctuations around the mean input from the hidden neurons, as well as higher order filtering of the recorded neuron spikes. These results hold for any choice of network parameters for which the mean field steady state of the hidden network exists. For details on the justification of these approximations, see the SI.
The effective coupling filters are what we would, in principle, measure experimentally if we observe only a subset of a network. In practice, inferring these network properties from data is an extremely nontrivial task [16-24, 40, 41], and details of the fitting procedure could potentially further skew the inferred coupling filters. We will put aside these complications here, and assume we have access to an inference procedure that allows us to measure $J^{\rm eff}_{r,r'}(t)$ without error, so that we may focus on their properties and relationship to the ground-truth coupling filters.
Structure of effective coupling filters-The ground-truth coupling filters $\hat J_{r,r'}(\omega)$ are modified by a correction term $\sum_{h,h'} \hat J_{r,h}(\omega)\,\hat\Gamma_{h,h'}(\omega)\,\hat J_{h',r'}(\omega)$. The linear response function $\hat\Gamma_{h,h'}(\omega)$ admits a series representation in terms of the paths by which neuron r′ can send a signal to neuron r through hidden neurons only. This is shown schematically in Fig. 1. The effect of self-coupling from a neuron back to itself can be absorbed into the contribution of each hidden node in a path.
We may write down a set of Feynmanesque rules for explicitly calculating terms in this series. First, we define the gain, $\gamma_h \equiv \lambda_0\,\phi'\!\left(\mu_h + \sum_{h'} J_{h,h'}\,\nu_{h'}\right)$. The contribution of each term can then be written down using the following rules: i) for the edge connecting recorded neuron r′ to a hidden neuron h_i, assign a factor $\hat J_{h_i,r'}(\omega)$; ii) for each node corresponding to a hidden neuron h_i, assign a factor $\gamma_{h_i}/(1 - \gamma_{h_i}\hat J_{h_i,h_i}(\omega))$; iii) for each edge connecting hidden neurons $h_i \neq h_j$, assign a factor $\hat J_{h_j,h_i}(\omega)$; and iv) for the edge connecting hidden neuron h_j to recorded neuron r, assign a factor $\hat J_{r,h_j}(\omega)$. All factors for each path are multiplied together, and all paths are then summed over.
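These rules can be checked numerically at ω = 0: summing node factors γ_h/(1 − γ_h J_{h,h}) and edge factors J_{h_i,h_j} over paths of increasing length should converge to the exact linear response obtained by matrix inversion. A small sketch with random weights and a uniform gain (both hypothetical):

```python
import numpy as np

def gamma_path_series(J_hh, gamma, max_hops):
    """Truncated path-sum for the hidden linear response at omega = 0.

    Node rule: gamma_h / (1 - gamma_h * J_hh[h, h]) (self-loops resummed);
    edge rule: J_hh[h_i, h_j] for h_i != h_j.  Summing paths with up to
    `max_hops` hidden-to-hidden hops approximates (I - G @ J_hh)^{-1} @ G."""
    node = gamma / (1.0 - gamma * np.diag(J_hh))
    J_off = J_hh - np.diag(np.diag(J_hh))        # edges exclude self-loops
    D = np.diag(node)
    Gamma, term = D.copy(), D.copy()
    for _ in range(max_hops):
        term = D @ J_off @ term                  # one more hop along a path
        Gamma = Gamma + term
    return Gamma

rng = np.random.default_rng(2)
n = 30
J_hh = rng.normal(0.0, 0.05 / np.sqrt(n), (n, n))
gamma = 0.5 * np.ones(n)
exact = np.linalg.solve(np.eye(n) - np.diag(gamma) @ J_hh, np.diag(gamma))
approx = gamma_path_series(J_hh, gamma, max_hops=10)
```

Because the gains and weights here are small, the path series converges within a few hops and matches the direct matrix inverse.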
In practice, the linear response matrix $\hat\Gamma_{h,h'}(\omega)$ can be calculated directly by numerical matrix inversion, multiplied with $\hat J_{r,h}(\omega)$ and $\hat J_{h',r'}(\omega)$ to form the correction to $\hat J^{\rm eff}_{r,r'}(\omega)$, and then inverse Fourier transformed to return to the time domain. However, the utility of the path-length series is the intuitive understanding it provides of the origin of contributions to the effective coupling filters, and our ability to analytically analyze the strength of contributions from each path [47]. This is reminiscent of recent works expanding correlation functions of linear models of network spiking in terms of network "motifs" [48-50]. One immediate insight the path decomposition offers is that unconnected neurons only develop an effective interaction with one another if there is a path by which one neuron can send a signal to the other.
Strongly coupled networks-We can now investigate under what conditions hidden paths significantly skew measured neural interactions. The synaptic weights J_{i,j} (for i ≠ j) generally must scale with the size of the network in order for network activity to be stable. The more inputs a neuron receives, the weaker the incoming synaptic connection weights must be so as to not drive the neuron to fire constantly. While a glance at Eq. (1) suggests we might expect that O(N) synaptic inputs to a neuron demands $J_{i,j} \sim 1/N$ for the inputs to a neuron to be O(1), this will in fact be too weak. A balance of positive and negative independent couplings reduces the typical magnitude of the summed input, so that only $J_{i,j} \sim O(1/\sqrt{N})$ is required for network stability. Hence, we term these two cases "weak" coupling ($J_{i,j} \sim O(1/N)$) and "strong" coupling ($J_{i,j} \sim O(1/\sqrt{N})$). Previous work has studied the hidden-neuron problem in the weak coupling limit [51-54]; here we study the strong coupling limit. The $1/\sqrt{N}$ scaling of the strong coupling limit has been theoretically predicted to be important for network computations in balanced networks [31-34]. Moreover, this scaling has recently been observed experimentally in cultured neural tissue [35], indicating that it may have intrinsic physiological importance.
We consider the case, ubiquitous in neural modeling, of an Erdős–Rényi (ER) network of N neurons with connection sparsity p (only 100p% of possible connections are nonzero). The baselines are taken to be equal for all neurons, µ_i = µ_0, for which there will exist a time-independent steady state. We choose an exponential nonlinearity, φ(x) = e^x. This is the "canonical" choice of nonlinearity used in applications of this model [16-18, 21, 55]. We will further assume exp(µ_0) ≪ 1, so that we may use this as a small control parameter. For i ≠ j, the non-zero synaptic weights J_{i,j} are independently drawn from a normal distribution with zero mean and standard deviation $J_0/(pN)^a$, where J_0 controls the overall strength of the weights and a = 1 or 1/2, corresponding to "weak" and "strong" coupling, respectively, for this choice of synaptic weight statistics. For simplicity we take J_{i,i} = 0 for all neurons i. To simplify the analytic analysis, we do not impose Dale's principle, which requires a neuron's outgoing connections to all be of the same sign. We have checked the case of Dale's principle numerically, and find similar results to the ER network [55]. Numerical values of all parameters are given in Table SI in the SI.
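A generator for this weight ensemble is a one-liner; the sketch below (function name and seed illustrative) makes the scaling explicit and can be used to verify the overall standard deviation of an entry, which, including the sparsity factor, is √p · J_0/(pN)^a = J_0/√N in the strong case.

```python
import numpy as np

def er_weights(N, p, J0, a, seed=0):
    """Erdos-Renyi weight matrix: each directed connection exists with
    probability p; non-zero weights are normal with zero mean and standard
    deviation J0 / (p*N)**a (a = 1: weak coupling, a = 0.5: strong coupling).
    Self-couplings are set to zero, as in the main text."""
    rng = np.random.default_rng(seed)
    mask = rng.random((N, N)) < p
    J = mask * rng.normal(0.0, J0 / (p * N) ** a, (N, N))
    np.fill_diagonal(J, 0.0)
    return J

J = er_weights(N=1000, p=0.2, J0=1.0, a=0.5)
offdiag = J[~np.eye(1000, dtype=bool)]
# Including the sparsity factor, the overall standard deviation of an
# off-diagonal entry is sqrt(p) * J0 / sqrt(p*N) = J0 / sqrt(N).
```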
Consider an example case of a network of N = 1000 neurons and N_rec = 3 recorded neurons. The resulting effective coupling filters (solid blue curves) for these three neurons are plotted against their ground-truth coupling filters in Fig. 2. Although the true network architecture is sparse, the resulting effective couplings are not sparse, a major qualitative difference in the apparent network architectures. While for many neuron pairs corrections to true non-zero filters are slight, for others the effective coupling filters deviate significantly from the true ones, indicating that contributions from paths through the hidden network are comparable to the direct connections.
This result demonstrates that the effective coupling filters can deviate significantly from the direct coupling, but what is the typical deviation? To assess this, we study the statistics of the integrated coupling strengths $J^{\rm eff}_{r,r'} \equiv \int_0^\infty dt\, J^{\rm eff}_{r,r'}(t)$. The expected value of $J^{\rm eff}_{r,r'}$ is zero, so we compute the standard deviation of $J^{\rm eff}_{r,r'} - J_{r,r'}$, labeled $\sigma[J^{\rm eff}_{r,r'} - J_{r,r'}]$. Because the self-coupling weights are J_{r,r} = 0, we consider only the standard deviations for r ≠ r′ pairs. When the hidden network is large, $J^{\rm eff}_{r,r'} - J_{r,r'}$ becomes increasingly Gaussian, and

FIG. 2. Effective coupling filters $J^{\rm eff}_{r,r'}(t)$ (solid blue) versus true coupling filters (dashed red) for N_rec = 3 recorded neurons in a network of N = 1000 total neurons. The simulated network has Erdős–Rényi connectivity with sparsity p = 0.2 and normally distributed non-zero weights with zero mean and standard deviation $J_0/\sqrt{pN}$ (strong coupling). All self-couplings are set to zero (indicated by all true coupling filters being zero in the plots along the grid diagonal).
$\sigma[J^{\rm eff}_{r,r'} - J_{r,r'}]$ represents the width of this distribution. We first estimate these standard deviations by numerically computing $J^{\rm eff}_{r,r'} - J_{r,r'} = \sum_{h,h'} J_{r,h}\,\hat\Gamma_{h,h'}(0)\,J_{h',r'}$ as we scale up the synaptic amplitude J_0 and observe progressively smaller fractions of neurons f. The numerical results are shown as solid curves in Fig. 3, for both strong coupling and weak coupling. There are two striking results. First, deviations are nearly negligible ($O(1/\sqrt{pN})$) for 1/N scaling of connections. Thus, for large Erdős–Rényi networks with synapses that scale with the system size as 1/N, vast numbers of hidden neurons combine to have negligible effect on effective couplings. This is in marked contrast to the case when coupling is strong ($1/\sqrt{N}$ scaling), when hidden neurons have a pronounced impact (O(1)). This is particularly the case when f ≪ 1, as in almost every experiment in neuroscience, where the hidden neurons outnumber observed ones by orders of magnitude, or when J_0 → 1.0, when typical deviations become half the magnitude of the true couplings themselves (upper blue line). For J_0 > 1.0, the network activity is unstable for an exponential nonlinearity.
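The weak-versus-strong contrast can be reproduced at small scale. The sketch below (all parameter values hypothetical, with a uniform gain standing in for γ_h = λ_0 e^{µ_0}) Monte-Carlo estimates the ratio σ[J^eff − J]/σ[J] at ω = 0 for the two scalings:

```python
import numpy as np

def deviation_ratio(N, p, J0, a, f, gain, n_trials=10, seed=0):
    """Monte-Carlo estimate of sigma[J_eff - J] / sigma[J] at omega = 0.

    Correction: J_rh @ Gamma @ J_hr, with the hidden linear response
    Gamma = (I - G @ J_hh)^{-1} @ G; for simplicity G = gain * I, a uniform
    stand-in for the gains gamma_h."""
    rng = np.random.default_rng(seed)
    Nrec = int(f * N)
    Nh = N - Nrec
    sigma_J = np.sqrt(p) * J0 / (p * N) ** a     # overall std of the weights
    devs = []
    for _ in range(n_trials):
        mask = rng.random((N, N)) < p
        J = mask * rng.normal(0.0, J0 / (p * N) ** a, (N, N))
        np.fill_diagonal(J, 0.0)
        G = gain * np.eye(Nh)
        Gamma = np.linalg.solve(np.eye(Nh) - G @ J[Nrec:, Nrec:], G)
        C = J[:Nrec, Nrec:] @ Gamma @ J[Nrec:, :Nrec]
        devs.append(C[~np.eye(Nrec, dtype=bool)])
    return np.std(np.concatenate(devs)) / sigma_J

r_strong = deviation_ratio(N=400, p=0.2, J0=1.0, a=0.5, f=0.1, gain=0.2)
r_weak = deviation_ratio(N=400, p=0.2, J0=1.0, a=1.0, f=0.1, gain=0.2)
```

Even at this modest N, the strong-coupling ratio is an order of magnitude larger than the weak-coupling one, consistent with the O(1) versus O(1/√(pN)) behavior described above.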
To gain insight into these numerical results, we use our path series expansion to calculate the standard deviation $\sigma[J^{\rm eff}_{r,r'} - J_{r,r'}]$, normalized by $\sigma[J_{r,r'}]$, including contributions from paths up to length 3. The result, Eq. (3), corresponds to the black dashed curves in Fig. 3. Eq. (3) is a truncation of a series in powers of $\lambda_0 J_0 e^{\mu_0}\sqrt{1-f}$, where f = N_rec/N is the fraction of recorded neurons. The most important feature of this series is the fact that it depends only on the fraction of recorded neurons f, not the absolute number N. Contributions from long paths remain finite even as N → ∞. Importantly, this is not the case for weak 1/N coupling, for which the series is in powers of $\lambda_0 J_0 e^{\mu_0}\sqrt{(1-f)/(pN)}$, so that contributions from long paths are negligible in large networks, N ≫ 1. (See [55] for the derivation and results for N = 100.) Deviations of Eq. (3) from the numerical solutions in Fig. 3 indicate that contributions from the truncated terms are not negligible when f ≪ 1. As these terms correspond to paths of length 4 or more, this shows that long chains through the network contribute significantly to shaping effective interactions.
Discussion-We have derived a quantitative relationship between "ground-truth" synaptic interactions and the effective interactions that unobserved neurons generate between subsets of observed neurons. This provides the tools to determine the conditions under which unobserved neurons substantially reshape observed neural dynamics. For the commonly assumed Erdős–Rényi case with independent synaptic weights scaling as $N^{-1/2}$, such hidden units yield qualitative changes to the magnitude and dynamics of neural interactions. Together with theoretical and experimental evidence for this scaling in cortex [31-35], this suggests that neural coupling filters inferred from cortical activity data may differ markedly from the true connectivity.
Our work offers a way to probe the architecture of the unobserved network through the measured $J^{\rm eff}_{r,r'}(t)$. While inverting Eq. (2) to obtain a unique set of true couplings is generally an ill-posed problem, future work could use the path length expansion combined with physiological constraints to make inferences about the patterns of true connectivity consistent with the measured effective ones. Intriguingly, this same approach may yield insight into the set of circuits that can perform specific computations that are expressed in terms of effective interactions among a subset of cells. This can be particularly useful when the interactions required for a computation violate Dale's principle or have architectures inconsistent with anatomical measurements [56, 57]. If these are regarded as effective interactions, however, then we may be able to use Eq. (2) to generate patterns of biologically plausible "hidden" circuit architectures consistent with the effective interactions of the designed circuit.

FIG. 3. Standard deviations of the differences between effective coupling weights $J^{\rm eff}_{r,r'}$ and true coupling weights J_{r,r'} between neurons r ≠ r′, normalized by the standard deviation of J_{r,r'}. These estimates account for all paths through hidden neurons. For weak 1/N coupling (red), the ratio of standard deviations is $O(1/\sqrt{N})$. For strong $1/\sqrt{N}$ coupling (blue) the ratio is O(1) and grows in strength as the fraction of recorded neurons N_rec/N decreases or the typical synaptic strength J_0 increases. Dashed black lines show theoretical estimates of the relative standard deviations accounting only for hidden paths of up to length 3 connecting recorded neurons. Deviations from the length-3 prediction at small f and large J_0 indicate that contributions from circuit paths involving many hidden neurons are significant in these regimes. Parameter values are given in Table SI in the Supplementary Information.
Similar strategies may even be employed by the nervous system itself. For instance, why do many principal neurons, those which project from one circuit to another, not make direct reciprocal connections to one another, instead being linked by intermediary neurons? One hypothesis is that direct synaptic interactions J_{r,r'}(t) are limited by physiological constraints, and "hidden" interneurons are necessary to reshape the spatiotemporal dynamics of neural interactions into forms that can perform the required computations, such as by mixing excitatory and inhibitory interactions to generate a richer range of possible synaptic computations (cf. Fig. 1). The explicit link we draw between effective interactions and their origins in biological circuitry is thus a new lens for interpreting emerging data on neural connectivity.
The firing rate of a neuron i in the full network is given by
$$\lambda_i(t) = \lambda_0\,\phi\!\left(\mu_i + \sum_j J_{i,j} \ast \dot n_j(t) + E_i(t)\right), \qquad ({\rm S.1})$$
where λ_0 is a characteristic rate, φ(x) ≥ 0 is a nonlinear function, µ_i (potentially a function of some external stimulus θ) is a time-independent tonic drive that sets the baseline firing rate of the neuron in the absence of input from other neurons, J_{i,j}(t − t′) is a coupling filter that filters spikes $\dot n_j(t')$ fired by presynaptic neuron j at time t′, incident on post-synaptic neuron i, and E_i(t) is an external input current. We will take E_i(t) = 0 for simplicity in this work, focusing on the spontaneous activity of the network. We need not attach a mechanistic interpretation to these filters, but a convenient interpretation is that the GLM model is like a soft-threshold integrate-and-fire network model, such that we can interpret the spike filtering as coming from some synaptic dynamics [1, 2]. As such, we take the actual rate to be
$$\lambda_i(t) = \lambda_0\,\phi\!\left(\mu_i + \sum_j J_{i,j}\, s_j(t) + E_i(t)\right),$$
where s_j(t) is the synaptic activity of pre-synaptic neuron j at time t. In general, $s_j(t) = (g_j \ast \dot n_j)(t)$, where g_j(t) is the temporal waveform of the post-synaptic currents evoked by pre-synaptic neuron j, normalized to integrate to 1. This interpretation of the model is useful for simulating the network activity, and we use such a method for our simulations verifying our analytic calculations (see "Simulations of network activity"). Otherwise, we will simply use the form of the model as written in Eq. (S.1).

B. Model network architectures
Our main result, Eq. (2), is valid for general network architectures with arbitrary weighted synaptic connections, so long as the hidden subset of the network has stable dynamics when the recorded neurons are removed. An example for which our method must be modified would be a network in which all or the majority of the hidden neurons are excitatory, as the hidden network is unlikely to be stable when the recorded neurons are disconnected. Similarly, we find that synaptic weight distributions with undefined moments will generally cause the network activity to be unstable. For example, weights J_{i,j} drawn from a Cauchy distribution generally yield unstable network dynamics unless the weights are scaled inversely with a large power of the network size N.
To specify a concrete network architecture to study, we choose a directed Erdős–Rényi random network with sparsity p; i.e., each directed connection between neurons is assigned independently, with probability p that the weight is nonzero. The weights of the non-zero connections are then drawn from a separate distribution chosen to have mean 0 and variance $J_0^2/(pN)^{2a}$. The choice of exponent a determines whether the coupling is weak (a = 1) or strong (a = 1/2). In most of our analytical results we only need the means and variances of the weights, so we do not need to specify the exact distribution. In simulations, we use a normal distribution, but have also checked (symmetrized) lognormal distributions, which yield similar results (not shown). The reason for scaling the weights as $1/(pN)^a$, as opposed to just $1/N^a$, is that the mean incoming degree is p(N − 1) ≈ pN for large networks; this scaling thus controls for the typical magnitude of the summed incoming input.
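Since only the mean and variance of the weight distribution enter the analytical results, alternative non-zero-weight distributions can be swapped in. One hypothetical reading of the "symmetrized lognormal" mentioned above (the exact construction is not specified in the text) is lognormal magnitudes with random signs, rescaled to the target standard deviation:

```python
import numpy as np

def symmetrized_lognormal(size, target_std, rng):
    """Zero-mean, heavy-tailed weights: lognormal magnitudes with random
    signs, rescaled to a target standard deviation.  A hypothetical reading
    of the 'symmetrized lognormal' distribution mentioned in the text."""
    w = rng.lognormal(mean=0.0, sigma=1.0, size=size)
    w *= rng.choice([-1.0, 1.0], size=size)
    return w * (target_std / w.std())

rng = np.random.default_rng(6)
N, p, J0, a = 400, 0.2, 1.0, 0.5
target = J0 / (p * N) ** a              # std of the non-zero weights
w = symmetrized_lognormal(100_000, target, rng)
```

By construction the sample matches the target standard deviation, and the random signs put the mean near zero, the only two moments the analysis uses.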
For strongly coupled networks, the combined effect of sparsity and the synaptic weight distribution yields an overall standard deviation of $\sqrt{p}\,J_0/\sqrt{pN} = J_0/\sqrt{N}$. Because the sparsity parameter p cancels out, it does not matter whether we consider p to be fixed or k_0 = pN to be fixed; both cases are equivalent. However, this is not the case if we scale J_{i,j} by 1/k_0, as the overall standard deviation would then be $\sqrt{p}\,J_0/k_0$, which only corresponds to the weak-coupling limit if p is fixed. If k_0 is fixed, the standard deviation would instead scale as $1/\sqrt{N}$.
It is worth noting that the determination of "weak" versus "strong" coupling depends not only on the power of N with which the synaptic weights scale, but also on the network architecture and the correlation structure of the weights J_{i,j}. For example, for an all-to-all connected network with symmetric rank-1 synaptic weights of the form J_{i,j} = ζ_i ζ_j, where the ζ_i are independently distributed normal random variates, the standard deviation of each ζ_i must scale as $1/\sqrt{N}$ in order for hidden paths to generate O(1) contributions to effective interactions, such that J_{i,j} scales as 1/N but the coupling is still strong.
Real neurons appear to split into two broad separate classes, "excitatory" and "inhibitory," a dichotomy known as Dale's principle. Neurons in a network that obeys this principle have coupling filters J_{i,j}(t) that are strictly positive for excitatory pre-synaptic neurons and strictly negative for inhibitory pre-synaptic neurons. We do not impose this restriction on the model used to generate the results presented in the main text, but have tested the consequences of imposing Dale's principle on the network. We find similar results, shown in Fig. S1 below (compare to Fig. 3 in the main text). The trends are the same as in networks that do not obey Dale's principle, with the resulting ratios being slightly reduced. Because our analysis requires calculation of the mean field firing rates of the hidden network in the absence of the recorded neurons, random sampling of the network may, by chance, yield hidden networks with an imbalance of excitatory neurons, for which the mean field firing rates of the hidden network may diverge for our choice of exponential nonlinearity. This is the origin of the relatively larger error bars in Fig. S1: fewer random samplings for which the hidden network was stable were available to perform the computation. Choosing a nonlinearity that saturates, such as φ(x) = c/(1 + exp(−x)), prevents the mean-field firing rates from diverging, yielding stable network activity.
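A Dale's-principle weight matrix can be built by fixing the sign of each neuron's outgoing column. The magnitude distribution below (half-normal, strong-coupling scale) is an illustrative choice, not the paper's exact construction:

```python
import numpy as np

def dale_er_weights(N, p, J0, frac_exc=0.5, seed=0):
    """Erdos-Renyi weights obeying Dale's principle: with J[i, j] the weight
    from neuron j to i, every non-zero entry of column j shares the sign of
    neuron j.  Half-normal magnitudes with scale J0 / sqrt(p*N) are an
    illustrative choice, not the paper's exact construction."""
    rng = np.random.default_rng(seed)
    sign = np.where(rng.random(N) < frac_exc, 1.0, -1.0)   # E (+) or I (-)
    mask = rng.random((N, N)) < p
    mags = np.abs(rng.normal(0.0, J0 / np.sqrt(p * N), (N, N)))
    J = mask * mags * sign[np.newaxis, :]                  # sign fixed per column
    np.fill_diagonal(J, 0.0)
    return J, sign

J, sign = dale_er_weights(N=100, p=0.2, J0=1.0, seed=1)
```

Every column then has a single sign, matching the E/I identity of the pre-synaptic neuron.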
Finally, while Erdős–Rényi networks are relatively easy to analyze analytically and are ubiquitous in many influential computational and theoretical studies, real-world networks typically have more structure. As preliminary tests of our results in networks with more realistic features, we have also simulated networks with Watts-Strogatz network architectures. A Watts-Strogatz network is generated by starting with a K-nearest-neighbor network (such that the fraction of non-zero connections each neuron makes is p = K/(N − 1)) and rewiring a fraction β of those connections. The limit β = 0 remains a K-nearest-neighbor network, while β → 1 yields an Erdős–Rényi network. We have simulated networks on Watts-Strogatz architectures, finding similar results as for the Erdős–Rényi network, as seen in Fig. S2. We generated the adjacency matrices of the Watts-Strogatz networks using code available in [3].

FIG. S1. As Fig. 3 in the main text, but for an Erdős–Rényi network architecture with Dale's principle imposed; i.e., all connections neuron j makes are positive if the neuron is excitatory (J_{i,j} > 0) or negative if the neuron is inhibitory (J_{i,j} < 0). The fraction of excitatory to inhibitory neurons is chosen to be 50% on average. The resulting ratios are smaller than the corresponding ratios for Erdős–Rényi networks in which we do not impose Dale's principle, but the trends otherwise hold. The greater uncertainties on the estimates arise because random sampling of the full network may generate hidden networks with an unstable mean field theory (see main text for explanation). Theoretical estimates of the relative standard deviations were not calculated. Parameter values are given in Table SI.
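The Watts-Strogatz construction described above can be sketched directly (this is a simple illustrative implementation; the SI's matrices were generated with the code cited as [3]):

```python
import numpy as np

def watts_strogatz(N, K, beta, seed=0):
    """Directed Watts-Strogatz adjacency matrix: start from a ring in which
    each neuron connects to its K nearest neighbors (K even), then rewire
    each edge's target with probability beta.  beta = 0 leaves the ring
    lattice; beta -> 1 approaches an Erdos-Renyi graph."""
    rng = np.random.default_rng(seed)
    A = np.zeros((N, N), dtype=int)
    offsets = [d for k in range(1, K // 2 + 1) for d in (k, -k)]
    for i in range(N):
        for d in offsets:
            j = (i + d) % N
            if rng.random() < beta:              # rewire to a fresh target
                free = np.flatnonzero((A[i] == 0) & (np.arange(N) != i))
                j = rng.choice(free)
            A[i, j] = 1
    return A

A_ring = watts_strogatz(60, 6, beta=0.0)
A_rand = watts_strogatz(60, 6, beta=1.0, seed=2)
```

At β = 0 each neuron keeps exactly K = 6 outgoing ring connections; at β = 1 every edge is rewired to a random target, avoiding duplicates and self-loops.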
Parameter values used to generate our networks are given in Table SI. The nonlinear function φ(x) sets the instantaneous firing rate for the neurons in our model. The "canonical" choice of nonlinearity for our network model is an exponential, φ(x) = exp(x) [4-7]. The exponential has particularly nice theoretical properties, but is also convenient for fitting this model to data, as the log-likelihood function of the model is convex for the exponential (and some similar families of nonlinearity).
The fact that the exponential is unbounded is necessary to guarantee that a neuron spikes given enough input. A bounded nonlinearity imposes a maximum instantaneous firing rate, such that it is possible that the instantaneous rate saturates but the neuron is not guaranteed to spike. The downside of an unbounded nonlinearity is that it is possible for the average firing rates to diverge, such that the network never reaches a steady state. For example, in a purely excitatory network (all J_{i,j} ≥ 0) with an exponential nonlinearity, neural firing will run away without a sufficiently strong self-refractory coupling to suppress the firing rate. This will not occur with a bounded nonlinearity, as excitation can only drive neurons to fire at some maximum but finite rate. This is a problem in simulations of networks obeying Dale's principle. For the exponential nonlinearity, the mean field theory for the hidden network occasionally does not exist, due to an imbalance of excitatory and inhibitory neurons caused by our random selection of recorded neurons. However, the Dale's law network is stable if the nonlinearity is bounded. We demonstrate this below in Figs. S3 and S4, comparing simulations of Erdős–Rényi networks with and without Dale's principle for a sigmoidal nonlinearity φ(x) = 2/(1 + e^{−x}).
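The runaway-versus-saturation contrast is easy to see at the mean-field level. In the sketch below (all parameter values illustrative), a purely excitatory network's mean-field iteration diverges with φ(x) = e^x but converges with the bounded sigmoid φ(x) = 2/(1 + e^{−x}):

```python
import numpy as np

def mean_field_rates(J, mu, phi, lam0=1.0, n_iter=200, cap=1e3):
    """Fixed-point iteration nu <- lam0 * phi(mu + J @ nu) for the mean field
    firing rates; returns None if the iterates blow up (no steady state)."""
    nu = np.zeros(len(mu))
    for _ in range(n_iter):
        nu = lam0 * phi(mu + J @ nu)
        if np.max(nu) > cap:
            return None                          # runaway excitation
    return nu

rng = np.random.default_rng(3)
N = 50
J_exc = np.abs(rng.normal(0.0, 1.0 / np.sqrt(N), (N, N)))   # purely excitatory
np.fill_diagonal(J_exc, 0.0)
mu = -1.0 * np.ones(N)
runaway = mean_field_rates(J_exc, mu, np.exp)                # diverges
bounded = mean_field_rates(J_exc, mu, lambda x: 2.0 / (1.0 + np.exp(-x)))
```

With the bounded nonlinearity the rates simply saturate near the maximum of φ, exactly as described above.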

D. Derivation of effective baselines and coupling filters
To study how hidden neurons affect the inferred properties of recorded neurons, we partition the network into "recorded" neurons, labeled by indices r (with sub- or superscripts to differentiate different recorded neurons, e.g., r and r′ or r_1 and r_2), and "hidden" neurons, labeled by indices h (with sub- or superscripts). The rates of these two groups are thus
$$\lambda_r(t) = \lambda_0\,\phi\!\Big(\mu_r + \sum_{r'} J_{r,r'} \ast \dot n_{r'}(t) + \sum_h J_{r,h} \ast \dot n_h(t)\Big), \qquad
\lambda_h(t) = \lambda_0\,\phi\!\Big(\mu_h + \sum_{h'} J_{h,h'} \ast \dot n_{h'}(t) + \sum_r J_{h,r} \ast \dot n_r(t)\Big).$$
Because the set of hidden neurons is generally much larger than the set of recorded neurons, we expect that we can approximate the input to the recorded neurons by the mean input, conditioned on the activity of the recorded neurons. That is, we can split the hidden input to the recorded neurons into two terms, the mean plus fluctuations around the mean:
$$\sum_h J_{r,h} \ast \dot n_h(t) = \sum_h J_{r,h} \ast E\!\left[\dot n_h(t)\,\middle|\,\{\dot n_r\}\right] + \xi_r(t),$$
where $E[\dot n_h(t)|\{\dot n_r\}]$ denotes the mean activity of the hidden neurons conditioned on the activity of the recorded units, and ξ_r(t) are the fluctuations around this mean, i.e., $\xi_r(t) \equiv \sum_h J_{r,h} \ast \left(\dot n_h(t) - E[\dot n_h(t)|\{\dot n_r\}]\right)$. Note that ξ_r(t) is also conditional on the activity of the recorded units.
We can calculate the cross-correlation of the fluctuations,
$$\left\langle \xi_r(t_1)\,\xi_{r'}(t_2)\right\rangle = \sum_{h_1,h_2} \int dt\, dt'\; J_{r,h_1}(t_1 - t)\, C_{h_1,h_2}(t, t')\, J_{r',h_2}(t_2 - t'),$$
where C_{h_1,h_2}(t_1, t_2) is the cross-correlation between hidden neurons h_1 and h_2. If the autocorrelation of the fluctuations (r = r′) is small compared to the mean input to the recorded neurons ($\sum_h J_{r,h} \ast E[\dot n_h(t)|\{\dot n_r\}]$), or if J_{r,h} is small, then we may neglect these fluctuations and focus only on the effects that the mean input has on the recorded subnetwork. At the level of the mean field theory approximation we make in this work, the spike-train correlations are zero. One can calculate corrections to mean field theory (see below), but we leave investigation of the properties of these fluctuations as the focus of future studies. This will not affect our overall analysis, as non-negligible noise will not alter the form of the effective couplings between neurons, which are deterministic.
In order to calculate how hidden input shapes the activity of recorded neurons, we need to calculate the mean $E[\dot n_h|\{\dot n_r\}]$. This mean input is difficult to calculate in general, especially when conditioned on the activity of the recorded neurons. In principle, the mean can be calculated by averaging over the conditional distribution of the hidden-network activity, but this is not a tractable calculation. Taylor series expanding the exponential reveals that the mean will depend on all higher cumulants of the hidden unit spike trains, which cannot in general be calculated explicitly. Instead, we again appeal to the fact that in a large, sufficiently connected network, we expect fluctuations to be small, as long as the network is not near a critical point. In this case, we may make a mean field approximation, which amounts to solving the self-consistent equation
$$\nu_h(t|\{\dot n_r\}) = \lambda_0\,\phi\!\Big(\mu_h + \sum_{h'} J_{h,h'} \ast \nu_{h'} + \sum_r J_{h,r} \ast \dot n_r(t)\Big).$$
In general, this equation must be solved numerically. Unfortunately, the conditional dependence on the activity of the recorded neurons presents a problem, as in principle we must solve this equation for all possible patterns of recorded unit activity. Instead, we note that the mean hidden neuron firing rate is a functional of the filtered recorded input $I_h(t) \equiv \sum_r J_{h,r} \ast \dot n_r$, so we can expand it as a functional Taylor series (sometimes known as a Volterra series) around some reference filtered activity $I_h^0(t)$. Within our mean field approximation, the Taylor coefficients are simply the response functions of the network; i.e., the zeroth order coefficient is the mean firing rate of the neurons in the reference state $I_h^0(t)$, the first order coefficient is the linear response function of the network, the second order coefficient is a nonlinear response function, and so on.
There are two natural choices for the reference state I⁰_h(t). The first is simply the state of zero recorded unit activity, while the second is the mean activity of the recorded neurons. The zero-activity case conforms to the choice of GLM models used in practice, as one can view the linear filtering of spikes as the first order term in an expansion of a more general nonlinear filter. Choosing the mean activity as the reference state may be more appropriate if the recorded neurons have high firing rates, but requires adjusting the form of the GLM so that firing rates are modulated by filtering deviations of spikes from the mean firing rate, rather than by filtering the spikes themselves. In the main text we focus on the zero-activity reference state; we present the formulation for the mean field reference state at the end of the SI.

The mean inputs E[ṅ_h|0] are the mean field approximations to the firing rates of the hidden neurons in the absence of the recorded neurons. Defining ν_h ≡ E[ṅ_h|0], these firing rates are given by ν_h = λ_0 φ(µ_h + Σ_{h'} J̄_{h,h'} ν_{h'}); in writing this equation we have used the fact that the steady-state mean field firing rates are time-independent, and hence the convolution J_{h,h'} * ν_{h'} = J̄_{h,h'} ν_{h'}, where J̄_{h,h'} = ∫_0^∞ dt J_{h,h'}(t). The mean field equations for the ν_h form a system of transcendental equations that in general cannot be solved exactly. In practice we solve the equations numerically, but we can also develop a series expansion for the solutions (see below).
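The mean field equations can be solved by simple fixed-point iteration. A minimal sketch, using an exponential nonlinearity and illustrative parameter values (λ_0, µ, and the weight scale below are assumptions for demonstration, not the paper's values):

```python
import numpy as np

# Solve nu_h = lam0 * exp(mu_h + sum_h' Jbar[h,h'] * nu_h') by damped fixed-point
# iteration for a small hypothetical hidden network.
rng = np.random.default_rng(0)
N_hid = 50
lam0, mu = 1.0, -2.0
Jbar = rng.normal(0.0, 0.5 / np.sqrt(N_hid), size=(N_hid, N_hid))
np.fill_diagonal(Jbar, 0.0)  # no self-couplings

nu = lam0 * np.exp(mu) * np.ones(N_hid)  # zeroth-order initial guess
for _ in range(500):
    nu_new = lam0 * np.exp(mu + Jbar @ nu)
    nu = 0.5 * nu + 0.5 * nu_new  # damping aids convergence away from instability

residual = np.max(np.abs(nu - lam0 * np.exp(mu + Jbar @ nu)))
```

Damping (mixing the old and new iterates) is a standard trick for transcendental fixed-point systems like this; near the stability threshold the undamped iteration can oscillate or diverge.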
The next term in the series expansion is the linear response function of the hidden unit network, Γ_{h,h'}(t − t'). The "gain" γ_h is defined by γ_h = λ_0 φ'(µ_h + Σ_{h'} J̄_{h,h'} ν_{h'}), where φ'(x) is the derivative of the nonlinearity with respect to its argument.
We may solve for Γ_{h,h'}(t − t') by first converting to the frequency domain and performing a matrix inverse; the result is the output-linear response function, computed by inverting I − V(ω). If ||V(ω)|| < 1 for some matrix norm ||·||, then the matrix [I − V(ω)]^{-1} admits a convergent series expansion Σ_{ℓ≥0} V(ω)^ℓ, where V(ω)^ℓ is an ℓ-fold matrix product and V(ω)^0 ≡ I. We can write an element of the matrix product out as V̂_{h,h_1}(ω) V̂_{h_1,h_2}(ω) ... V̂_{h_{ℓ-1},h'}(ω); inserting V̂_{h_i,h_j}(ω) = γ_{h_i} Ĵ_{h_i,h_j}(ω) yields the path-sum expression. It is this expression that we interpret in terms of summing over paths through the network of hidden neurons that join two observed neurons: the Ĵ_{h_i,h_j}(ω) are contributed by the edges from neuron h_j to h_i, and the γ_{h_i} are contributed by the nodes. In this expansion we allow edges from one neuron back to itself, meaning we include paths in which signals loop back around to the same neuron arbitrarily many times before being propagated further. Such loops can be easily factored, contributing a geometric factor 1/(1 − γ_h Ĵ_{h,h}(ω)). We can thus remove the need to consider self-loops in our rules for calculating effective coupling filters by assigning a factor γ_h/(1 − γ_h Ĵ_{h,h}(ω)) to each node, rather than just γ_h.

This result can be derived directly. Let us decompose the matrix V(ω) into a diagonal and an off-diagonal piece, V(ω) = V̂_diag(ω) + V̂_off(ω). Then [I − V(ω)]^{-1} = [I − (I − V̂_diag(ω))^{-1} V̂_off(ω)]^{-1} (I − V̂_diag(ω))^{-1}. We assumed that I − V̂_diag(ω) is invertible, which requires that there is no element for which 1 − γ_h Ĵ_{h,h}(ω) = 0 at any ω. Assuming this is the case, the inverse of I − V̂_diag(ω) is trivial to calculate, as it is diagonal. The remaining matrix inverse can be expressed as a series, as before. Hence, inserting the contribution from the factor (I − V̂_diag(ω))^{-1} that we pulled out, along with the prefactor that converts the result into Γ̂_{h,h'}(ω), we recover our previous expression with γ_h → γ_h/(1 − γ_h Ĵ_{h,h}(ω)) and with the sum over hidden units restricted so that self-loops are removed (h_i ≠ h_{i+1}), proving the result described in the main text. We note again that this puts restrictions on the allowed size of the self-interactions: the zeros of 1 − γ_h Ĵ_{h,h}(ω) must lie in the upper half of the complex ω plane in order for the filters to be causal and physically meaningful. The complete expression for the correction term Σ_{h,h'} Ĵ_{r,h}(ω) Γ̂_{h,h'}(ω) Ĵ_{h',r'}(ω) then follows; this is the exact mathematical expression underlying the rules given in the main text.
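The self-loop factoring identity used above is pure linear algebra and can be checked numerically at a single frequency. The matrix below is an arbitrary illustrative example with spectral radius below 1, not a network from the paper:

```python
import numpy as np

# Check: with V = Vdiag + Voff,
#   (I - V)^-1 = [I - (I - Vdiag)^-1 Voff]^-1 (I - Vdiag)^-1,
# i.e. self-loops can be absorbed into renormalized node factors.
rng = np.random.default_rng(1)
n = 8
V = rng.normal(0.0, 0.2, size=(n, n))
Vdiag = np.diag(np.diag(V))
Voff = V - Vdiag

I = np.eye(n)
lhs = np.linalg.inv(I - V)
rhs = np.linalg.inv(I - np.linalg.inv(I - Vdiag) @ Voff) @ np.linalg.inv(I - Vdiag)
err = np.max(np.abs(lhs - rhs))
```

The identity follows from writing I − V = (I − V_diag)[I − (I − V_diag)^{-1} V_off] and inverting both factors.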
We have now determined the zeroth and first order Taylor series coefficients in our expansion of E[ṅ_h(t)|{ṅ_r}]. If these are the dominant terms, i.e., if we may neglect higher order terms in this expansion, we may approximate the instantaneous firing rates of the recorded neurons by λ_r(t) ≈ λ_0 φ(µ^eff_r + Σ_{r'} [J^eff_{r,r'} * ṅ_{r'}](t)), where the µ^eff_r are the effective baselines of the recorded neurons and

Ĵ^eff_{r,r'}(ω) = Ĵ_{r,r'}(ω) + Σ_{h,h'} Ĵ_{r,h}(ω) Γ̂_{h,h'}(ω) Ĵ_{h',r'}(ω)

are the effective coupling filters in the frequency domain, as given in the main text. In addition to neglecting the higher order spike filtering terms here, we have also neglected fluctuations around the mean input from the hidden network. These fluctuations are zero within our mean field approximation, but we could in principle calculate corrections to the mean field predictions using the results of [8]. Below, we use these methods to calculate the tree-level corrections to the correlation functions of the spikes and estimate the size of these fluctuations.
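The zero-frequency effective couplings can be assembled directly from the weight matrix. A sketch under stated assumptions: we take Γ̂(0) = [I − diag(γ) J̄_hh]^{-1} diag(γ), whose lowest order is diag(γ) as in the text, and use illustrative values for N, N_rec, J_0, and the gains:

```python
import numpy as np

# Effective couplings J_eff = J_rr + J_rh @ Gamma0 @ J_hr at zero frequency.
rng = np.random.default_rng(2)
N, N_rec = 200, 50
J0 = 0.5
J = rng.normal(0.0, J0 / np.sqrt(N), size=(N, N))  # strong-coupling 1/sqrt(N) scaling
np.fill_diagonal(J, 0.0)

rec, hid = slice(0, N_rec), slice(N_rec, N)
gamma = 0.1 * np.ones(N - N_rec)  # hidden gains; gamma_h = nu_h for exp nonlinearity

I = np.eye(N - N_rec)
Gamma0 = np.linalg.solve(I - np.diag(gamma) @ J[hid, hid], np.diag(gamma))
J_eff = J[rec, rec] + J[rec, hid] @ Gamma0 @ J[hid, rec]
correction = J_eff - J[rec, rec]  # contribution of hidden paths
```

Using `np.linalg.solve` rather than an explicit inverse is the standard numerically stable choice here.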

E. Specific choices of network properties used to generate Figures 2 and 3
To generate the results in Fig. 2 in the main text, we choose the coupling filters to be J_{i,j}(t) = J_{i,j} α² t e^{−αt}, which has Fourier transform Ĵ_{i,j}(ω) = J_{i,j} α²/(α + iω)², using the Fourier convention f̂(ω) = ∫ dt e^{−iωt} f(t). The weight matrix J is generated as described in "Model network architectures," choosing J_0 = 1.0. We partition this network into recorded and hidden subsets: for a network of N neurons, we choose neurons 1 to N_rec to be recorded, and the remainder to be hidden (using an index notation starting at 1). The population variance J_var across subsets gives an estimate of the error. However, this estimate is conditioned on the full network structure, for which we have only a single sample so far. To average over the effects of global network architecture, we draw a total of 10 network architecture samples, and average a second time over these samples to obtain our final estimates of the population variance of J^eff_{r,r'}. We note that for an Erdős–Rényi network, this second stage of averaging is probabilistically unnecessary: for a large enough network, random subsets of a single large network are statistically identical to random subsets drawn from several samples of full Erdős–Rényi networks. However, this is not true for networks with more structure, such as Watts–Strogatz networks or networks that obey Dale's principle, and this second stage of averaging over the global network architecture is necessary in those cases.
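A minimal sketch of generating such a sparse network and partitioning it, with illustrative values of p, J_0, and N (the sparsity p and seed are assumptions, not the paper's settings):

```python
import numpy as np

# Sparse Erdos-Renyi weight matrix: each connection present with probability p,
# nonzero weights normal with std J0/(p*N)**a; a = 1/2 is strong coupling.
rng = np.random.default_rng(3)
N, p, J0, a = 1000, 0.2, 1.0, 0.5
mask = rng.random((N, N)) < p
J = mask * rng.normal(0.0, J0 / (p * N) ** a, size=(N, N))
np.fill_diagonal(J, 0.0)  # no self-couplings

# Partition: neurons 0..N_rec-1 recorded, the rest hidden.
N_rec = 100
J_rr, J_rh = J[:N_rec, :N_rec], J[:N_rec, N_rec:]

# Population variance should match p * J0**2 / (p*N)**(2*a).
var_emp = J.var()
```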

F. Derivation of a series approximation for the mean field firing rates (exponential nonlinearity)
The mean field firing rates for the hidden neurons are given by ν_h = λ_0 φ(µ_h + Σ_{h'} J̄_{h,h'} ν_{h'}), where we now focus specifically on the case of the exponential nonlinearity φ(x) = exp(x). For this choice of nonlinearity, γ_h = ν_h, so our series for the mean field firing rates is automatically also a series for the gains.
This system of transcendental equations generally cannot be solved analytically. However, for small exp(µ_h) ≪ 1 we can derive, recursively, a series expansion for the firing rates. We first consider the case µ_h = µ_0 for all hidden neurons h. Let ε = exp(µ_0). We may then write ν_h as a power series in ε with coefficients a_h^{(m)}. Plugging this into the mean field equation and matching powers of ε order by order determines the coefficients recursively. At order ε⁰, it follows immediately that a_h^{(0)} = 1. At order ε¹, the sum over m truncates at m = 1 (as δ_{ℓ, ℓ_1+···+ℓ_m+m} is zero for m > ℓ, since all indices are positive), which determines a_h^{(1)}, and the recursion continues similarly at higher orders. With this we have calculated the firing rates to O(ε⁴).
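The low-order series can be checked against the numerical fixed point. A sketch assuming the exponential nonlinearity with λ_0 set to 1, so ν_h = ε exp(Σ_{h'} J̄_{h,h'} ν_{h'}) and the O(ε²) truncation is ν_h ≈ ε + ε² Σ_{h'} J̄_{h,h'} (network size, ε, and weight scale are illustrative):

```python
import numpy as np

# Compare the fixed-point solution of nu = eps * exp(Jbar @ nu) to its
# first-order series truncation nu ~ eps * (1 + eps * row_sum(Jbar)).
rng = np.random.default_rng(4)
n, eps = 40, 0.05
Jbar = rng.normal(0.0, 0.3 / np.sqrt(n), size=(n, n))
np.fill_diagonal(Jbar, 0.0)

nu = eps * np.ones(n)
for _ in range(200):  # fixed-point iteration converges quickly for small eps
    nu = eps * np.exp(Jbar @ nu)

nu_series = eps * (1.0 + eps * Jbar.sum(axis=1))  # truncated at O(eps^2)
max_err = np.max(np.abs(nu - nu_series))
```

The residual should be O(ε³), i.e. much smaller than the O(ε²) correction itself.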
The analysis can be straightforwardly extended to the case of heterogeneous µ_h, though it becomes more tedious to compute terms in the (now multivariate) series. Assuming ε_h ≡ exp(µ_h) ≪ 1 for all h, we find the firing rates to O(ε³).

G. Variance of the effective coupling to second order in N_rec/N and fourth order in λ_0 J_0 e^{µ_0} (exponential nonlinearity)

To estimate the strength of the hidden paths, we would like to calculate the variance of the effective coupling J^eff_{r,r'} and compare it to the variance of the direct couplings J_{r,r'}, where J^eff_{r,r'} ≡ ∫_0^∞ dt J^eff_{r,r'}(t) and J_{r,r'} ≡ ∫_0^∞ dt J_{r,r'}(t), as in the main text.
We assume that the synaptic weights J_{i,j} are independently and identically distributed with zero mean and variance var(J) = p J_0²/(pN)^{2a} for i ≠ j, where a = 1 corresponds to weak coupling and a = 1/2 corresponds to strong coupling. We assume no self-couplings, J_{i,i} = 0 for all neurons i. The overall factor of p in var(J) comes from the sparsity of the network: each connection is nonzero with probability p, and, for example, normally distributed nonzero weights then have variance J_0²/(pN)^{2a}. Because the J_{i,j} are i.i.d., the mean of J^eff_{r,r'} is zero; here we used the fact that Γ_{h,h'} depends only on the hidden neuron couplings J_{h,h'}, which are independent of the couplings to and from the recorded neurons, J_{r,h} and J_{h',r}. The variance of J^eff_{r,r'} is thus equal to the mean of its square for r ≠ r'. In this derivation, we used the fact that ⟨J_{r,h_1} J_{r,h_2}⟩ = ⟨J²_{r,h_1}⟩ δ_{h_1,h_2}, as the synaptic weights are uncorrelated. We now need to compute ⟨Γ²_{h,h'}⟩. This is intractable in general, so we resort to calculating it in a series expansion in powers of ε ≡ exp(µ_0) for the exponential nonlinearity model. Our result will also turn out to be an expansion in powers of J_0 and 1 − f ≡ N_hid/N. Because we are explicitly using the exponential nonlinearity, γ_h = ν_h, so we do not need to derive a separate series for γ_h in powers of ε in order to perform this calculation.
The lowest order approximation is obtained by taking ν_h ≈ λ_0 ε and Γ_{h,h'} ≈ ν_h δ_{h,h'}. This result varies linearly with f, while numerical evaluation of the variance shows obvious curvature for f ≪ 1 and J_0 ≲ 1, so we need to go to higher order. This becomes tedious very quickly, so we work only to O(ε⁴) (it turns out the O(ε³) corrections vanish).
We calculate ⟨Γ²_{h,h'}⟩ using a recursive strategy, rather than the formal series solution. That is, we begin with the implicit expression for Γ_{h,h'} and plug it into itself until we obtain an expression of the desired order in ε. In doing so, we note that ν_h ∼ O(ε), so we first work to fourth order in ν_h, and then plug in the series for ν_h in powers of ε.
We begin with the expansion of ⟨Γ²_{h,h'}⟩ to fourth order in ν_h. The third order term ν³_h J̄_{h,h} δ_{h,h'} vanishes because we assume no self-couplings. Having obtained ⟨Γ²_{h,h'}⟩ to fourth order in ν_h, we now need to plug in the series expression for ν_h to obtain the series in powers of λ_0 ε. We do this order by order in ν_h. The easiest terms to handle are the fourth order terms. The second order term involves a_{h,h_1,h_2} = J̄_{h,h_1} J̄_{h_1,h_2} + ½ J̄_{h,h_1} J̄_{h,h_2}, and we need the average ⟨ν²_h⟩. The third-order term vanishes upon averaging, and ⟨a_{h,h_1,h_2}⟩ + ⟨J̄_{h,h_1} J̄_{h,h_2}⟩ = 0 + (3/2) var(J̄_{r,r'}) δ_{h_1,h_2}; the first term vanishes because matching indices requires h = h_1 and h_1 = h_2, giving ⟨J̄²_{h,h}⟩ = 0, as there are no self-couplings. The factor 1 − δ_{h,h'} on the last term on the second line of the result accounts for the fact that that term does not contribute when h = h', as J_{h,h} = 0. Putting everything together: for weak coupling, the relative variance tends to 1 in the N ≫ 1 limit. For strong coupling, N_hid var(J) = (1 − f) J_0², and the result acquires corrections in powers of (λ_0 J_0)² (1 − f); we use little-o notation to denote that there are higher order terms dominated by (λ_0 J_0)⁴ (1 − f)². With this expression we have improved our approximation of the relative variance of the effective coupling to the true coupling; however, the neglected higher order terms still become significant as f → 0 and J_0 → 1, indicating that hidden paths have a significant impact when synaptic strengths are moderately strong and only a small fraction of the neurons has been observed.
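The lowest-order truncation Γ_{h,h'} ≈ ν δ_{h,h'} can be compared to a direct Monte Carlo estimate of var(J_eff)/var(J). A sketch under that truncation, which gives var(J_eff)/var(J) ≈ 1 + ν² N_hid var(J); all parameters below (N, ν, J_0, p = 1) are illustrative:

```python
import numpy as np

# Monte Carlo check of the lowest-order variance ratio for strong coupling.
rng = np.random.default_rng(5)
N, N_rec, J0, nu = 400, 100, 0.5, 0.2
N_hid = N - N_rec
varJ = J0**2 / N  # strong coupling (a = 1/2), density p = 1 for simplicity
ratios = []
for _ in range(20):
    J = rng.normal(0.0, np.sqrt(varJ), size=(N, N))
    np.fill_diagonal(J, 0.0)
    J_rh, J_hr = J[:N_rec, N_rec:], J[N_rec:, :N_rec]
    J_eff = J[:N_rec, :N_rec] + nu * (J_rh @ J_hr)  # Gamma ~ nu * identity
    off = ~np.eye(N_rec, dtype=bool)                # r != r' entries only
    ratios.append(J_eff[off].var() / varJ)

theory = 1.0 + nu**2 * N_hid * varJ
mc = float(np.mean(ratios))
```

For these weak illustrative gains the hidden-path contribution is a percent-level inflation of the variance, and the Monte Carlo estimate should straddle the analytical value.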
Because the synaptic weights J_{i,j} are independent, we may rewrite Eq. (S.4) in terms of the ratio of standard deviations, where we used the approximation √(1 + x) ≈ 1 + x/2 for small x.
In the main text, we plotted results for N = 1000 total neurons (Fig. 3). For strongly coupled networks, the results should depend only on the fraction of observed neurons, f = N_rec/N, while for weak coupling the results do depend on the absolute number N. To demonstrate this, in Fig. S5 we remake Fig. 3 for N = 100 neurons. We see that the strongly coupled results are unaltered, whereas the weakly coupled results yield stronger deviations (as the deviations are O(1/√N)).
FIG. S5. Same as Fig. 3, but for N = 100 neurons and N_rec = {1, 11, 21, 31, 41, 51, 61, 71, 81, 91, 99} recorded neurons. Because we plot the relative deviations of the coupling strength against the fraction of observed neurons, the curves for the strongly coupled case are the same as for N = 1000, as expected, while the weakly coupled case yields stronger deviations.
H. Estimating the error from neglecting higher order spike filtering (exponential nonlinearity)

In the main text we calculate corrections to the baseline and linear spike filters, neglecting higher-order spike filtering and fluctuations around the mean input to the recorded neurons. We would like to know when these approximations are valid. We will work within mean field theory (meaning the noise fluctuations contribute zero error, as they do not contribute to the mean field approximation). Specifically, we will assume that the quadratic spike filtering term, b Σ_{r_1,r_2} Ā_{r,r_1,r_2} ⟨ṅ_{r_1}⟩⟨ṅ_{r_2}⟩, is small, and calculate the corresponding correction to our mean field approximation of the firing rates when this term is neglected entirely. In the mean field approximation of the firing rates, we have used the fact that the average firing rates are independent of time, and replaced J^eff_{r,r'}(t − t_1) and A_{r,r_1,r_2}(t, t_1, t_2) with their time integrals, denoted J̄^eff_{r,r'} and Ā_{r,r_1,r_2}. The parameter b is just a book-keeping parameter.
To calculate the lowest order correction to the linear filtering approximation (b → 0), we write ⟨ṅ_r⟩ = ν^sub_r + b δν_r, treating b formally as a small parameter. The linear firing rate ν^sub_r is given by the b = 0 equation. For the quadratically-modified firing rates, we keep terms only to linear order in b; collecting powers of b and rearranging yields δν_r in terms of Σ_{r_1,r_2} Ā_{r,r_1,r_2} ν^sub_{r_1} ν^sub_{r_2}.
Because ν^sub_r ∝ exp(µ^eff_r) ∝ exp(µ_r) = ε_r, the expansion parameters we have been using, the lowest order approximation for δν_r is δν_r ≈ Σ_{r_1,r_2} Ā_{r,r_1,r_2} ν^sub_{r_1} ν^sub_{r_2}.
The coefficient A_{r,r_1,r_2} is the amplitude of the quadratic spike filter, which we calculate later in the SI. To lowest order, the error term is δν_r = (λ_0 ε)⁴ Σ_{h,r_1,r_2} J̄_{r,h} J̄_{h,r_1} J̄_{h,r_2}.
For J_{i,j} i.i.d., the population average of δν_r should converge to the expected value, which is zero because the J_{i,j} have mean zero. We can instead characterize the typical size of the error through the root-mean-squared error, i.e., the variance. In principle, we should take care to separate out the r_1 = r_2 and r_1 ≠ r_2 terms in the sum, as the former contribute a factor ⟨J⁴_{h,r_1}⟩, which we have not specified (though one could calculate it for specific choices, such as the normal distribution that we use in practice). However, both ⟨J²_{h,r}⟩² and ⟨J⁴_{h,r_1}⟩ scale as (J_0²/(pN)^{2a})², so we neglect constant factors and simply use this scaling to arrive at the result. If we take N → ∞ with N_rec = fN and N_hid = (1 − f)N for fixed f, the root mean squared error (RMSE) scales as follows: for a = 1 (weak coupling), the error falls off quite quickly, as N^{-3/2}, while it is independent of N for a = 1/2 (strong coupling). However, the error does still scale with the fraction of observed neurons, as f√(1 − f). This demonstrates that the typical error from neglecting the nonlinear filtering is small both when most neurons have been observed (f ≈ 1) and when very few neurons have been observed (f ≈ 0). While it may at first seem surprising that the error is small when very few neurons have been observed, the result makes intuitive sense: when a very small fraction of the network is observed, we can treat the unobserved portion of the network as a "reservoir" or "bath." Feedback from the observed neurons into the reservoir has a comparatively small effect, so we can get away with neglecting feedback between the observed and unobserved partitions of the network. However, when the number of observed neurons is comparable to the number of unobserved neurons, neither partition can be treated as a reservoir, and feedback between the two is substantial. Neglecting the higher order spike filter terms may be inaccurate in this case. We check this numerically below.

I. Validating the mean field approximation and linear conditional rate approximation via direct simulations of network activity (exponential nonlinearity)
The results presented in the main text are based on analytical calculations or numerical analyses using analytically derived formulas. For example, the statistics of J^eff_{r,r'} are calculated from our expression J̄^eff_{r,r'} = J̄_{r,r'} + Σ_{h,h'} J̄_{r,h} Γ̂_{h,h'}(0) J̄_{h',r'}, where Γ̂_{h,h'}(0) can be calculated by solving a matrix equation. Generating these results does not require simulating the full network, so we check here that our approximations indeed agree with the results of full network simulations.
We check the validity of two main results: 1) that mean field theory is an accurate approximation for the parameters we consider, and 2) that our truncation of the conditional hidden firing rates E[ṅ_h(t)|{ṅ_r}] at linear order in ṅ_r(t) is valid.
We first discuss some basic details of the simulation. The simulation code we use is a modification of the code used in [8], written by Gabe Ocker; refer to that paper for full details of the simulation. The main changes we made are using exponential nonlinearities and synaptic weights drawn from normal or lognormal distributions.
As in [8], we choose the coupling filters to follow an alpha function, g(t) = α² t e^{−αt} Θ(t). The Heaviside step function Θ(t) enforces causality of the filter, using the convention Θ(0) = 0. All neurons share the same time constant 1/α.
To efficiently simulate this network, the code computes the synaptic variable s_j(t) = ∫ dt' g(t − t') ṅ_j(t') not by direct convolution but by solving the equivalent inhomogeneous system of differential equations: setting x(t) = s(t) and y(t) = ṡ(t),

ẋ_j(t) = y_j(t),
ẏ_j(t) = −2α_j y_j(t) − α²_j x_j(t) + α²_j ṅ_j(t).

The instantaneous firing rates of the neurons can in this way be computed quickly in time steps of a specified size ∆t. The number of spikes n_i that neuron i fires in the t-th time bin is drawn from a Poisson distribution with probability (λ_i(t)∆t)^{n_i} exp(−λ_i(t)∆t)/n_i!. An initial transient period of spiking, before the network achieves a steady state, is discarded.
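The simulation loop described above can be sketched with a simple Euler integrator. This is a minimal stand-in, not the code of [8]; all parameter values (N, dt, α, λ_0, µ, weight scale) are illustrative assumptions:

```python
import numpy as np

# Euler-step simulation: alpha-function synapses as a 2nd-order ODE,
# Poisson spike counts drawn per time bin.
rng = np.random.default_rng(6)
N, dt, T, alpha, lam0, mu = 100, 1e-3, 2000, 10.0, 1.0, -1.0
J = rng.normal(0.0, 0.3 / np.sqrt(N), size=(N, N))
np.fill_diagonal(J, 0.0)

x = np.zeros(N)  # synaptic variable s_j(t)
y = np.zeros(N)  # its time derivative
counts = np.zeros(N)
for _ in range(T):
    lam = lam0 * np.exp(mu + J @ x)  # instantaneous rates
    n = rng.poisson(lam * dt)        # spikes emitted in this bin
    # x' = y;  y' = -2*alpha*y - alpha^2*x + alpha^2 * (spike train)
    x += dt * y
    y += dt * (-2 * alpha * y - alpha**2 * x) + alpha**2 * n
    counts += n

rates = counts / (T * dt)  # empirical rates over the simulated window
```

In practice one would discard an initial transient before estimating rates, as described above; this short sketch omits that step.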
The parameters we use in our simulations of the full network are given in Table SII. To confirm that the mean field approximation is valid, we compare the spike rates measured empirically from simulations of the network activity to the calculated mean field rates. The empirical rates are measured as ⟨ṅ_i⟩_emp = (number of spikes emitted by neuron i)/(length of spike train), calculated after discarding the initial transient period of firing, for any neuron i (recorded or hidden).
The steady-state mean field firing rates for the full network are the solutions of a transcendental equation whose only difference from the equation for the ν_h is that the neuron indices are not restricted to hidden units; i.e., the ν_h are the mean field rates for the hidden neurons only (recorded neurons removed entirely), whereas the ⟨ṅ_i⟩_full MFT are the mean field rates for the entire network. If the mean field approximation is valid, the empirical rates should be approximately equal to the mean field rates, so a scatter plot of ⟨ṅ_i⟩_MFT versus ⟨ṅ_i⟩_emp should lie roughly along the identity line. We test this for a network in the strong coupling limit (√var(J) = J_0/√N) for four values of J_0: 0.25, 0.5, 0.75, and 1.0. We expect J_0 = 1.0 to be close to the stability threshold of the model based on a linearized analysis [9, 10]; i.e., for J_0 ≳ 1.0 there may not be a steady state, so this is where we might expect the mean field approximation to break down. As seen in Fig. S6, the mean field approximation appears to hold well even up to J_0 = 1.0, though there are some slight deviations for neurons with large rates.

Verifying the linearized conditional mean approximation
Having verified that the mean field approximation is valid, we now seek to check our linearized approximation of the firing rates of the hidden neurons conditioned on the activity of the recorded neurons, E[ṅ_h(t)|{ṅ_r(t)}]. That is, we calculated above that E[ṅ_h(t)|{ṅ_r}] ≈ ν_h + Σ_{h',r} [Γ_{h,h'} * J_{h',r} * ṅ_r](t) + ...; the ... correspond to higher order spike filtering terms that we have neglected in our analyses, assuming them to be small. In an earlier calculation, we estimated that the error incurred by neglecting higher order spike filtering is of order (λ_0 exp(µ_0))⁴ f √(1 − f) J_0³, but we would like to confirm the negligibility of the higher order coupling through simulations.
To do so, we compare the empirical firing rates of the designated "hidden" neurons, obtained from simulations of the full network models, with the approximation of the hidden firing rates conditioned on the recorded neurons using the linear expansion, averaged over recorded neuron activity: ⟨ṅ_h⟩_approx = ν_h + Σ_{h',r} Γ̂_{h,h'}(0) J̄_{h',r} ⟨ṅ_r⟩_emp, where, as usual, the zero-frequency component Γ̂_{h,h'}(0) of the linear response function of the hidden neurons is calculated in the absence of the recorded neurons.
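The averaged first order approximation can be sketched as below. The arrays here are illustrative stand-ins for quantities that would come from the network model and the simulation (sizes, weight scales, and rates are all assumptions):

```python
import numpy as np

# <n_h>_approx = nu_h + sum_{h',r} Gamma0[h,h'] * Jbar[h',r] * <n_r>_emp,
# with Gamma0 the zero-frequency hidden-network linear response.
rng = np.random.default_rng(7)
N_rec, N_hid = 20, 80
J_hr = rng.normal(0.0, 0.1, size=(N_hid, N_rec))   # integrated filters, hidden <- recorded
J_hh = rng.normal(0.0, 0.05, size=(N_hid, N_hid))  # integrated hidden-hidden filters
np.fill_diagonal(J_hh, 0.0)
nu_h = 0.1 * np.ones(N_hid)          # zeroth-order hidden rates
gamma = nu_h.copy()                  # exponential nonlinearity: gamma_h = nu_h
rates_rec = rng.uniform(0.05, 0.5, size=N_rec)  # stand-in empirical recorded rates

I = np.eye(N_hid)
Gamma0 = np.linalg.solve(I - np.diag(gamma) @ J_hh, np.diag(gamma))
rates_hid_approx = nu_h + Gamma0 @ (J_hr @ rates_rec)
```

In the actual comparison, `rates_rec` would be the measured recorded rates and `rates_hid_approx` would be scattered against the measured hidden rates.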
If we make a scatter plot of this approximation against the empirical estimates of the hidden neuron rates, ⟨ṅ_h⟩_emp, the data points will lie along the identity line if our approximation is valid. If the data deviate from the identity line, it indicates that the neglected higher-order filtering terms contribute substantially to the firing rates of the neurons. It is possible that the zeroth order rate approximation, ν_h, would already be sufficient to describe the data, so we compare the empirical rates to both ν_h and ⟨ṅ_h⟩_approx.
As in the mean field approximation test, we focus on a strongly coupled network with J_0 = 0.25, 0.5, 0.75, and 1.0. Because our analytical estimate of the error suggests small error for small fractions of recorded neurons and larger error when N_rec ∼ N_hid, we check both N_rec = 100 neurons out of N = 1000 (f = 0.1) in Fig. S7 and N_rec = 500 neurons out of N = 1000 (f = 0.5) in Fig. S8.
For each value of J_0, we present two plots: the empirical rates versus the mean field rates ν_h in the absence of recorded neurons (the zeroth order approximation; Figs. S7 and S8, top rows), and the empirical rates versus the linear approximation ⟨ṅ_h⟩_approx (the first order approximation; Figs. S7 and S8, bottom rows). We find that in both cases the data are centered around the identity line, but the spread of the data grows with J_0 for the zeroth order approximation, while it remains quite tight for the first order approximation up to J_0 = 1.0, validating our neglect of the higher order spike filtering terms. We also confirm that N_rec = 500 gives worse agreement than N_rec = 100, though the agreement between ⟨ṅ_h⟩_emp and ⟨ṅ_h⟩_approx is still reasonable.

J. Second order nonlinear response function
Higher order terms in the series expansion represent nonlinear response functions. We do not focus on these terms in this work, assuming instead that we can truncate the series expansion at linear order. We will, however, estimate the error incurred by this truncation. To do so we need the next order term, so we now go to second order. Rather than differentiate our formal solution for the linear response, we differentiate the implicit form, yielding an equation for Γ_{h,h_1,h_2}(t', t_1, t_2). Rearranging, and inverting the operator on the left hand side (which, combined with the factor of γ_h on the right hand side, yields the input linear response function), gives the solution. Because Γ_{h,h'}(t − t') is proportional to γ_h, the second order nonlinear response function is also proportional to γ_h.

FIG. S7. Top row: scatter plot comparing ν_h, the mean field firing rates of the hidden neurons in the absence of recorded neurons, to the empirically estimated firing rates from simulations of the full network, for four values of the typical synaptic strength, J_0 = 0.25, 0.5, 0.75, and 1.0. The data lie along the identity line, demonstrating a strong correlation between ν_h and the empirical data. However, the spread of the data around the identity line indicates that deviations of the mean firing rates away from ν_h, caused by coupling to the recorded neurons, are significant. Bottom row: comparison of the first order approximation of the firing rates of hidden neurons, which accounts for the effects of the recorded neurons, to the empirical rates. The data lie tightly along the identity line with very little dispersion, demonstrating that higher order spike filtering is unnecessary even up to J_0 = 1.0, for N_rec = 100.
The effective quadratic spike filtering that enters the instantaneous rate of the recorded neurons is thus given by the sum over h, h_1, h_2 (and integrals over intermediate times) of Γ_{h,h_1,h_2}(t', t_1, t_2) J_{h_1,r'_1}(t_1 − t'_1) J_{h_2,r'_2}(t_2 − t'_2) ṅ_{r'_1}(t'_1) ṅ_{r'_2}(t'_2),

FIG. S8. Same as Fig. S7, but for N_rec = 500 recorded neurons out of a total of N = 1000, demonstrating the validity of the linear approximation (neglecting higher order spike filtering) up to J_0 = 1.0 for N_rec = 500. The zeroth order approximation (top row) is quite poor, indicating the necessity of accounting for feedback from the recorded neurons. The first order approximation (bottom row) lies tightly along the identity line, indicating that even when the recorded and hidden populations are of comparable size, higher order spike filtering may not be significant. However, there appears to be some deviation for J_0 = 1.0, indicating that accounting for higher order spike filtering may be beneficial in this parameter regime.
where we define the quadratic spike filter to be the combination of Γ_{h,h_1,h_2}(t', t_1, t_2) J_{h_1,r'_1}(t_1 − t'_1) J_{h_2,r'_2}(t_2 − t'_2), summed over hidden indices, Eq. (S.5). We would like to estimate the typical size of this term to leading order so that we may estimate the error we make by neglecting it. For the exponential nonlinearity, γ_h = ν_h, and to leading order in ν_h ∼ λ_0 ε, Γ_{h,h_1,h_2}(t, t_1, t_2) ≈ λ_0 ε δ_{h,h_1} δ_{h,h_2} δ(t − t_1) δ(t − t_2). This is the result we used in our earlier calculation estimating the error made in the mean firing rates of the recorded neurons by neglecting higher order spike filtering.

K. Tree-level calculation of the effective noise correlations
In our approximation of the model for the recorded neurons, we also neglected fluctuations of the hidden input around its mean. We should therefore check how strong this noise is. At the level of the mean-field approximation it vanishes, so we need to go to a tree-level approximation to calculate it.
The noise is defined as the deviation of the hidden input from its conditional mean. It has zero mean (by construction) conditioned on the activity of the recorded units; i.e., the "noise" receives feedback from the recorded neurons. We can evaluate the cross-correlation function of this noise, conditioned on the recorded unit activity; within mean field theory this cross-correlation function is zero. We can go beyond mean field theory and calculate the tree-level contribution to the correlation functions using the field-theoretic diagrammatic techniques of [8]. We do so for the reference state of zero recorded unit activity, {ṅ_r} = {0}, as we expect this to be the leading order contribution to the correlation function. As we are interested primarily in the typical magnitude of the noise compared to the interaction terms, we work only to leading order in ε = exp(µ_0) for the exponential nonlinearity network. The result involves ∆̂_{h,h'}(ω) ≈ δ_{h,h'} + O(ε), the linear response to perturbations of a neuron's output rate. It is related to Γ̂_{h,h'}(ω) by Γ̂_{h,h'}(ω) = ∆̂_{h,h'}(ω) γ_h, where γ_h = ν_h for φ(x) = e^x. The overall noise cross-correlation function is then approximately proportional to Σ_h ∫ dt_1 J_{r,h}(t − t_1) J_{r',h}(t' − t_1).
If r ≠ r', the expected noise cross-correlation, averaged over the synaptic weights J_{i,j}, is zero. If r = r', the expected value is non-zero, giving the expected noise auto-correlation function; it can be evaluated explicitly for the specific case of g(t) = α² t e^{−αt} Θ(t). For weak coupling (a = 1), the expected autocorrelation function falls off with network size as 1/N, while for strong coupling (a = 1/2), it scales with the fraction of observed neurons f but is independent of the absolute network size. The overall λ_0 ε scaling puts the magnitude of the autocorrelation function on par with contributions from hidden paths through a single hidden neuron, which contribute a factor of λ_0 ε to the correction to the coupling filters. Based on our results shown in Fig. 3, which suggest that contributions from long paths through hidden neurons are significant when the fraction of observed neurons f is small and J_0 ≲ 1, we expect that network noise will also be significant in these regimes. This does not modify the results presented in the main paper, however; it simply means that this noise should be retained in the rate of our approximate model, λ_r(t) ≈ λ_0 exp(µ^eff_r + Σ_{r'} [J^eff_{r,r'} * ṅ_{r'}](t) + ξ_r(t)).
Lastly, we note that here we only calculated the leading order contribution to the noise correlation functions around the reference state of no recorded unit activity. Much as we expanded E[ṅ_h(t)|{ṅ_r}] in a functional Taylor series around {ṅ_r} = 0, we could do the same for the correlation functions, to study how these effective noise correlations depend on the activity of the recorded neurons. We leave such a calculation for future investigations.
L. Full mean-field reference state

For most of our analyses, we have expanded the conditional firing rates of the hidden neurons around a reference state of zero activity of the recorded neurons. The quantities ν_h, γ_h, Γ̂_{h,h'}(ω), and so on, are thus calculated using a network from which the recorded neurons have been removed. We have demonstrated that this approximation is valid for the networks considered in this paper. However, it may break down in networks in which the recorded neurons spike at high rates. In this case, we may need another reference state around which to expand the conditional rates. A natural choice of reference state ṅ^(0)_r(t) is then the mean firing rates of the neurons. We set up this expansion here.
The mean firing rates of the neurons are intractable to calculate exactly, so we estimate them by the mean field rates, an approximation that we expect to be accurate in the high-firing-rate regime.
The mean field equations for the full network are

ν_i = λ_0 φ( Σ_j Ĵ_{i,j}(0) ν_j ).

We note that we introduced 0 = Σ_r J_{i,r} ∗ ν_r − Σ_r J_{i,r} ∗ ν_r into the argument of the nonlinearity so that we could write the instantaneous firing rate not as a function of the filtered spike trains but as a function of the filtered deviations from the mean firing rates. Importantly, although it appears that only the baseline differs from the zero-activity reference state case while the coupling is the same, the linear response function Γ^full_{h,h'}(τ) is not the same as in the zero-reference-state case, and hence the correction to the coupling is slightly different. The solutions look similar, but the linear response functions now incorporate the effects of the recorded units as well. In particular, Γ^full_{i,j}(t − t') satisfies the equation

Γ^full_{i,j}(t − t') = δ_{i,j} δ(t − t') + γ^full_i Σ_k ∫ dt'' J_{i,k}(t − t'') Γ^full_{k,j}(t'' − t'),

where γ^full_i is the gain of neuron i accounting for the entire network. Thus, in Fourier space,

Γ̂^full_{i,j}(ω) = [I − V̂^full(ω)]^{-1}_{i,j},

where V̂^full_{i,j}(ω) = γ^full_i Ĵ_{i,j}(ω) is an N × N matrix; i.e., it contains the couplings and firing rates of all neurons, recorded and hidden. Hence, while this looks formally similar to the result we obtained in the zero-activity state, the inclusion of recorded neurons modifies our rules for calculating contributions to the effective coupling filters. In particular, Ĵ^eff_{r,r'}(ω) − Ĵ_{r,r'}(ω) involves contributions from paths through both hidden and recorded neurons, unlike the zero-activity reference case, in which only paths through hidden neurons contributed. The reason for this, of course, is that the reference state depends on the entire network, not just the hidden neurons. The difference between the cases matters only at higher orders in our expansion: paths of length ℓ = 4 or greater. We can see this by writing out the first few terms in the path-length expansion of the effective coupling,

Ĵ^eff_{r,r'}(ω) = Ĵ_{r,r'}(ω) + Σ_h Ĵ_{r,h}(ω) γ^full_h Ĵ_{h,r'}(ω)
  + Σ_{h,h'} Ĵ_{r,h}(ω) γ^full_h Ĵ_{h,h'}(ω) γ^full_{h'} Ĵ_{h',r'}(ω)
  + Σ_{h,h',j} Ĵ_{r,h}(ω) γ^full_h Ĵ_{h,j}(ω) γ^full_j Ĵ_{j,h'}(ω) γ^full_{h'} Ĵ_{h',r'}(ω) + …;

for conciseness, we have assumed zero self-coupling (Ĵ_{i,i}(ω) = 0), but it could be restored by setting γ^full_i → γ^full_i /(1 − γ^full_i Ĵ_{i,i}(ω)). The first few terms of the expansion are the same as in the zero-activity reference case, except that the γ^full_h are the gains for the entire network, not just the hidden network absent the recorded neurons. It is only at the ℓ = 4 term that contributions to the linear response functions involving paths through any neuron j, recorded or hidden, start to appear. Because we typically expect the amplitude of these terms to be small, we anticipate that expanding around the mean field reference state will yield results similar to the expansion around the zero-activity reference state presented in the main paper.
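To make the bookkeeping in the path-length expansion concrete, the following sketch (Python) evaluates the ℓ = 2, 3, 4 terms at zero frequency and confirms that restricting the interior node j of the ℓ = 4 paths to hidden neurons changes the answer, i.e., that recorded neurons contribute at ℓ = 4. The couplings, gains, and population sizes are random placeholders, not the simulation parameters used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

N, Nrec = 60, 20                     # total and recorded neurons (illustrative)
rec = np.arange(Nrec)                # recorded indices r
hid = np.arange(Nrec, N)             # hidden indices h
J0 = 0.3
J = J0 * rng.standard_normal((N, N)) / np.sqrt(N)  # zero-frequency couplings
np.fill_diagonal(J, 0.0)             # zero self-coupling, as in the text
gam = rng.uniform(0.1, 0.5, size=N)  # gains gamma_i^full (placeholder values)

# A_ij = J_ij * gamma_j: one step along a path, picking up the gain of the
# intermediate neuron it lands on.
A = J * gam[None, :]

Jrh = J[np.ix_(rec, hid)]
Jhr = J[np.ix_(hid, rec)]
gh = gam[hid]

# Path-length expansion of J^eff_{r,r'}:
l2 = (Jrh * gh) @ Jhr                         # paths r -> h -> r'
l3 = (Jrh * gh) @ A[np.ix_(hid, hid)] @ Jhr   # r -> h -> h' -> r'
l4 = (Jrh * gh) @ A[hid] @ A[:, hid] @ Jhr    # r -> h -> j -> h' -> r', any j

Jeff_trunc = J[np.ix_(rec, rec)] + l2 + l3 + l4

# At l = 4 the interior node j may be a recorded neuron: restricting j to the
# hidden population gives a different result.
l4_hidden_only = (Jrh * gh) @ A[np.ix_(hid, hid)] @ A[np.ix_(hid, hid)] @ Jhr
print(np.allclose(l4, l4_hidden_only))  # False: recorded interiors contribute
```

The ℓ = 2 and ℓ = 3 terms involve hidden neurons only, matching the zero-activity reference case, while the ℓ = 4 comparison isolates exactly the difference discussed above.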

FIG. 3. Solid lines: numerical estimates of the standard deviation of the difference between effective coupling weights J^eff_{r,r'} and the true weights J_{r,r'}.

FIG. S1. Same as Fig. 3 in the main text, but for an Erdős-Rényi network architecture with Dale's principle imposed; i.e., all connections made by neuron j are positive if the neuron is excitatory (J_{i,j} > 0) or negative if it is inhibitory (J_{i,j} < 0). On average, 50% of neurons are excitatory. The resulting ratios are smaller than the corresponding ratios for Erdős-Rényi networks in which we do not impose Dale's principle, but the trends otherwise hold. The larger uncertainties in the estimates arise because random sampling of the full network may generate hidden networks with an unstable mean field theory (see main text for explanation). Theoretical estimates of the relative standard deviations were not calculated. Parameter values are given in Table SI.
Γ̂(ω) = [I − V̂(ω)]^{-1} is the linear response function, computed by performing a matrix inverse. If ||V̂(ω)|| < 1 for some matrix norm || · ||, then the matrix [I − V̂(ω)] is invertible and its inverse is given by the convergent series Σ_{ℓ=0}^∞ V̂(ω)^ℓ.

FIG. S6. Empirical estimates of average neuron firing rates from simulations plotted against the mean firing rates predicted by mean field theory. The fact that the data lie along the identity line demonstrates the validity of the mean field theory up to J0 = 1.0.
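The convergence condition ||V̂(ω)|| < 1 can be illustrated numerically. In the sketch below (Python), an arbitrary random matrix stands in for V̂(ω); the code verifies that the partial sums of I + V̂ + V̂² + ⋯ converge to the direct matrix inverse [I − V̂]^{-1}.

```python
import numpy as np

rng = np.random.default_rng(2)

N = 40
# Random matrix standing in for V(omega), scaled so its spectral norm is < 1
V = 0.4 * rng.standard_normal((N, N)) / np.sqrt(N)
assert np.linalg.norm(V, 2) < 1.0

exact = np.linalg.inv(np.eye(N) - V)  # [I - V]^{-1} by direct inversion

# Partial sums of the series I + V + V^2 + ... (paths of length 0, 1, 2, ...)
partial = np.eye(N)
term = np.eye(N)
for _ in range(200):
    term = term @ V
    partial += term

print(np.allclose(partial, exact))  # True: the series converges to the inverse
```

Each power V̂^ℓ in the series corresponds to the length-ℓ paths in the effective-coupling expansion, which is why the norm condition controls when the path expansion converges.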
FIG. S7. Top row: scatter plots comparing ν_h, the mean field firing rates of the hidden neurons in the absence of recorded neurons, to empirically estimated firing rates in simulations of the full network, for four values of the typical synaptic strength, J0 = 0.25, 0.5, 0.75, and 1.0. The data lie along the identity line, demonstrating a strong correlation between ν_h and the empirical data. However, the spread of the data around the identity line indicates that deviations of the mean firing rates away from ν_h, caused by coupling to the recorded neurons, are significant. Bottom row: comparison of the first-order approximation of the firing rates of the hidden neurons, which accounts for the effects of the recorded neurons, to the empirical rates. The data lie tightly along the identity line with very little dispersion, demonstrating that higher-order spike filtering is unnecessary up to J0 = 1.0 for Nrec = 100.
FIG. S8. Same as Fig. S7 but for Nrec = 500 recorded neurons out of a total of N = 1000, demonstrating the validity of the linear approximation (neglecting higher-order spike filtering) up to J0 = 1.0 for Nrec = 500. The zeroth-order approximation (top row) is quite poor, indicating the necessity of accounting for feedback from the recorded neurons. The first-order approximation (bottom row) lies tightly along the identity line, indicating that even when the recorded and hidden populations are of comparable size, higher-order spike filtering may not be significant. However, there is some deviation for J0 = 1.0, indicating that accounting for higher-order spike filtering may be beneficial in this parameter regime.

TABLE SII. Network activity simulation parameter values.