Discovering sparse control strategies in C. elegans

Biological circuits such as neural or gene regulation networks use internal states to map sensory input to an adaptive repertoire of behavior. Characterizing this mapping is a major challenge for systems biology, and though experiments that probe internal states are developing rapidly, organismal complexity presents a fundamental obstacle given the many possible ways internal states could map to behavior. Using C. elegans as an example, we propose a protocol for systematic perturbation of neural states that limits experimental complexity but still characterizes collective aspects of the neural-behavioral map. We consider experimentally motivated small perturbations -- ones that are most likely to preserve natural dynamics and are closer to internal control mechanisms -- to neural states and their impact on collective neural behavior. Then, we connect such perturbations to the local information geometry of collective statistics, which can be fully characterized using pairwise perturbations. Applying the protocol to a minimal model of C. elegans neural activity, we find that collective neural statistics are most sensitive to a few principal perturbative modes. Dominant eigenvalues decay initially as a power law, unveiling a hierarchy that arises from variation in individual neural activity and pairwise interactions. Highest-ranking modes tend to be dominated by a few,"pivotal"neurons that account for most of the system's sensitivity, suggesting a sparse mechanism for control of collective behavior.


Introduction
Control of complex dynamical networks is a problem of major interest in biology for finding drivers of diseased states [1][2][3] and determinants of behavior [4][5][6][7]. In neural systems, the question of which and how many neurons correspond to controllability is largely open. While there is evidence that single-neuron manipulation is sufficient to induce behavioral change in some cases [8,9], other research shows behavioral information to be encoded amongst many neurons [4,10,11]. Identifying interesting neurons to probe experimentally and choosing how to perturb them is difficult: in principle, a thorough experiment would require a combinatorially large number of procedures to test all the possible ways that neural targets could be modified. Theoretical tools provide a way of winnowing down the number using control-theoretic analysis of dynamical systems models [12,13], structural analysis [5,14,15], and network properties [16], amongst other approaches [17]. Here, we develop a theoretical framework that explicitly considers perturbation experiments to discover candidate control neurons. Our framework suggests a way of leveraging recent developments in optogenetics [18,19] that allow fast, precise control of specific neruons and even the regulation of brain states with the closed-loop optogenetic [20][21][22][23] or optoelectronic systems [24].
The question of control is in essence about how neural states map to behavior. When the mapping is not deterministic, this is a problem of stochastic encoding of the behavioral state Z in the global neural state Y . Whether from noise, experimental limitations, or other sources of uncertainty, there is for a given neural state Y a multiplicity of Z characterized by the conditional probability distribution P (Z|Y ). In practice, we cannot specify a unique neural state Y due to experimental uncertainty or limits to measurement. Instead, we measure some distribution over configurations P (Y ) that we then map to some probability distribution of behavioral outcomes P (Z). By considering the realm of possible distinct neural configurations, we obtain the stochastic mapping formulation in Figure 1, where at each point we specify a distribution of neural activity, of behavior, and the relationship between the two that accounts for imprecision and uncertainty.
A classic formulation of stochastic mapping in sensory encoding is that of individual neurons that encode increasingly complex patterns of sensory stimuli as information is funneled up a hierarchy [25,26]. In this case, the mapping between Y and Z is precise. For example, we take Y to be the firing rate of some neuron and Z to be the similarity of the stimulus to one's grandmother. Since firing rate is a mean of a noisy measurement, it is more appropriate to interpret the average firing rate to characterize a distribution P (Y ), e.g. a Poisson distribution for the number of times a neuron fires within a time frame. Then, varying the mean activity level of a single neuron traces out a curve in the space of possible distributions P (Y ). In the space of possible perceptions P (Z), representing for example a probability distribution over recognizing "grandma" or not and given by P (Z) = Y P (Z|Y )P (Y ), we likewise trace out a curve as a single neural parameter is varied.
Later paradigms generalized the focus on single neurons to a sparse set of neurons that together span the range of possible sensory inputs. In the formulation of sparse coding, where a basis set of neurons are extracted from measuring neural response to a set of visual stimuli, leading to abstract image features such as edges, corners, and contours [27,28]. The idea of sparse coding is that typical visual scenes can be represented with activity in only a few neurons, with each neuron representing a separate high-level feature. Then, it is the joint activity amongst these distributed  Four possibilities for the sparsity of collective modes of sensitivity. When collective activity shows sloppy structure, the local information geometry is elongated, with both stiff sensitive directions and sloppy insensitive ones. With dense collective encoding, multiple components each matter equally. The starred combination, with sloppy, sparse combinations of neurons, aligns with centralized control.
components that captures the whole image, an idea with multiple variations in the literature [29,30].
Despite the generalization away from single neurons in the study of sensory encoding [31], an echo of it still rings in experimental studies of neural control of behavior. Some experiments rely on the notion of individual (or sets of similar) "driver" neurons that -when ablated, silenced, or otherwise modified -substantially change behavioral outcomes [32][33][34]. While some individual cells, like interneurons connecting distal parts of the body, are essential for normal behavior, some higher-order behaviors depend on many neurons in a way that is robust to the function of individual cells [35][36][37]. Indeed, the finding that some neural circuits encode sensory input in a sparse, distributed way raises the question of whether neural control of behavior also follows similar principles [4,26,29,38].
Here, we explore two potential sources of sparsity in neural control of behavior. First, a desired output could be equally well produced by many possible changes, or modes, at the neural level or could be much more easily produced by only one particular change. Second, each mode can itself be sparse or dense in the number of neurons involved [39,40]. These possibilities lead to the four hypotheses we draw in Figure 2, where control is sloppy or non-sloppy with either sparse or dense collective modes of sensitivity.
In the context of distributions of neural activity, variation in sensitivity of neural activity is described by the local dependence of the neural activity distribution P (Y ) on parameters specifying its form. As pictured in the top of Figure 1, "stiff" directions correspond to large changes in P (Y ) being caused by small changes to neural parameters [41,42]. In the "sloppy" directions, changes to neural parameters are ineffectual, and only dramatic perturbations cause a comparable change in P (Y ). The relative elongation of the local information geometry disappears when encoding places no special importance on any neural subgroup such that each plays a commensurate role, as in some forms of population coding [43,44]. In this way, the local information geometry informs us about the first notion of sparsity mentioned above. Additionally, the second notion of sparsity is captured by the degree to which stiff directions involve changes to only a few neurons. Of particular interest is the combination of sparse and sloppy structure, where collective sensitivity would be concentrated in only a few perturbative sets that each include only a few neurons. Such a possibility reflects centralized control, which would seem to be optimal for simplifying the multiplicity and complexity of control nodes. If real world examples display a range of structure depending on organism and function, we need a flexible methodology for inferring such variation from data.
We propose a perturbative, experimental approach to study stochastic encoding between internal states and output behavior. Perturbing the statistics of neural activity parameterizes motion in the space of all possible distributions, the rate of which quantifies sensitivity to neural changes. Our focus on small perturbations allows us to derive a local approximation that is the simplest complete description of local sensitivity.
For concreteness, we focus on the model animal Caenorhabdits elegans. With only 302 neurons and structural connectome mapped, C. elegans presents a particularly interesting example with which to explore neural control. Current experiments measure whole-brain activity in a freely moving worm and soon will allow matching those neurons to the structural connectome [45,46]. Combined with tools for manipulating neural states [47], our proposed experimental protocol could be used to search for neural centers important to collective activity. As an example of the proposed method, we start with minimal models matching the statistics of neural activity in C. elegans, and then compute the local information geometry to extract combinations of neurons that, if perturbed, would strongly affect collective statistics. Though we focus on the neural activity of C. elegans, our procedure could be adapted to other measures of internal states such as hormone concentration, gene expression, or other experimentally accessible quantities, and to output behavior including morphological descriptors such as orientation, velocity, extension, or a stereotyped behavioral sequence [48]. In this sense, our work provides a generalizable theoretical framework with specific measurements and predictions that can be tested experimentally in C. elegans and other biological systems.

A perturbative thought experiment
An exhaustive protocol for probing the C. elegans neural system would be to test simultaneous perturbations to subsets of neurons in ways that are sympathetic, antagonistic, and of varying magnitude. Unfortunately, the combinatorial explosion in the number of possibilities makes such an experiment impossible. In the limit of small perturbations, however, all possible effects can be extracted from observing only the response of the system to perturbations of pairs of neurons. We use this limit in part because pairwise relationships between perturbations becomes a sufficient description of system response. Additionally, it is often the case that collective statistics of neural activity can be captured with pairwise neural statistics without needing to invoke higher-order correlations [39,49,50], so we might anticipate that the pairwise approximation could be a reasonable one even for larger changes. We demonstrate how this simplification could be useful by carrying out a thought experiment that mimics the perturbations in silico, making use of a minimal model of neural activity.
To formalize the set of perturbations, we consider the discretized gradient of calcium concentration s i of neuron i, which is either decreasing s i = −1, flat s i = 0, or increasing s i = 1 with probabilities p(s) of observing the set s ≡ {s i }. Such discretization has been explored in previous work [10]. Then, a perturbation to underlying neural activity is reflected in a modified probability distribution over discrete statesp(s) = p(s) + ∆(s). A unique measure of distinguishability, the information divergence, between the original distribution and the modified one reduces to Then, the complete set of perturbations is given by the Hessian F ij describing the local sensitivity of p(s) to perturbations along vectors dv, which may be constrained by what is accessible in the experiment or model. Because moving from one basis to another is a linear operation, the particular form of the perturbations are not as important as the fact that the set should span the possible set of perturbations. By exploiting this property of analyticity, we can in principle reduce the complexity of the problem by measuring a single set of experimental perturbations. In a hypothetical experiment, we might effect perturbations by inserting electrodes into immobilized worms -or, in a more elegant protocol using optogenetic tools, by clamping membrane potential and effectively coupling neurons to one another via an external circuit to control the strength and timing of perturbation. Here, we use a perturbation that with small probability copies the state of one neuron to another, chosen to resemble increase or decrease in synaptic strength. Specifically, we select pairs, identifying a "target" neuron in state s t and a "matcher" neuron in state s m , and clamp the matcher to the target's state, s m = s t , with small probability (see Appendix G for more details). Compared to more common clamping and ablation techniques [23,51], this proposed closed-loop perturbation is not as drastic, mostly preserves natural dynamics, and is closer to internal control mechanisms such as synaptic modulation that do not require constant input from the outside [52][53][54].

Maximum entropy (maxent) model example
We build a minimal model of the collective statistics of C. elegans on which to run such a protocol in silico, systematically perturbing it and extracting from its response the sensitive modes of collective neural statistics. The two examples we consider are of N = 50 neuron subsets from the anterior neural network of the immobilized worm [4]. With a discretized time series representing the three possible configurations for neuron m, we calculate time-averaged probabilities of being in state k, where the Kronecker delta function δ sm,k indicates when the state of the neuron s m is k at time t over the duration of the experiment T . Similarly, the pairwise probabilities of agreement between any two neurons is T t=1 δ sm,st (t)/T and similar higher-order correlations describe the probabilities of agreement between multiple neurons. These higher-order correlations are not necessarily given by the lower-order statistics, but only accounting for the pairwise correlations is sufficient to capture the higher-order correlations [50] (see Figures A.6 and A.7).
An advantage of the maxent approach is that it captures the statistics of neural activity in a way that reflects latent physical interactions while also obeying a quantitative formulation of Occam's Razor [55,56]. The maxent principle ensures that a model of the probability distribution p(s) only matches specified constraints but otherwise remains as structureless as possible. This can be done by maximizing the model's information entropy, S = − s p(s) log p(s), here a sum over all 3 N possible configurations, with the standard method of Langrangian multipliers [39,[57][58][59]. When the set of average individual neural activity { r k (s m ) } is constrained, the maxent procedure gives the "independent model" where the probability of finding neuron i in state k, is determined by the bias h i,k , with normalization term Z i = 1 k=−1 e h i,k . A larger bias h i,k relative to another state k indicates that it is more likely to find the neuron in state k than k . This independent model fails to capture correlations in neural activity.
A simple variation that does not assume independence involves constraining the pairwise correlations. Given the finite-size of the data set, we constrain the pairwise coincidence probabilities to obtain the pairwise maxent model with distribution given the energy function E(s), The log-probability of configuration s varies with its energy such that lower energy means larger probability. As with the independent model, increasing neural bias {h i,k } pushes neurons to state k and increasing the magnitude of couplings {J ij } magnifies the tendency of pairs to agree or to disagree when negative. Though the couplings are in principle exactly specified by the pairwise correlations in the data, sampling and experimental noise mean that there are many possible sets of couplings that align within the statistical variation of the data sample. We focus on a numerical solution that recovers topological structure in the coupling network and has been shown to faithfully capture collective neural patterns [50] (see Appendix A for solutions from an approximate gradient descent method). The solution then defines a minimal interaction network that is distinct from the pattern of pairwise correlations (see Figure A.6).

Mapping perturbations from experiment to simulation
With the maxent model, we enact our in silico thought experiment. Continuing with the formulation presented above, we connect a pair of target neuron t and matcher neuron m such that s m is clamped to match s t with small probability 1. Then, the modified probabilityr k that the matcher neuron is in state k is the mixturẽ As a result of the perturbation in Eq 5, the statistics describing neuron m becomes more like those of neuron t. This procedure likewise modifies the matcher neuron's coincidence p(s m = s j ) →p(s m = s j ) with all other neurons indexed j as if modifying synaptic weights [60]. If we were to take sufficiently large in experiment, we would expect such intervention not to remain localized to neuron s m but to alter the neighbors and eventually the neighbors of neighbors and so on. If is small, then we expect a localized perturbation to be able to well-approximate the desired change in statistics given in Eq 5. By taking the limit where the perturbation is sufficiently small and assuming that it then remain localized, we specify a perturbation that corresponds to a unique change of parameters. Such uniqueness implicitly depends on constraining ourselves to considering perturbations that move us along the family of pairwise maxent models. This constraint is not reflected in the replacement rule in Eq 5, which compatible with an infinite variety of changes to higher-order correlations. In other words, we do not necessarily need to restrict ourselves to considering only the energy function defined in Eq 4, but we could allow for higher-order interactions to appear under perturbation. This introduces ambiguity that can only be resolved by choosing the form of the perturbations. Once we have chosen to fix the structure of the energy function from the maxent principle, however, we do not allow a perturbation to arbitrarily alter the form. This assumption is consistent with the widespread observation that the pairwise model generally captures well collective features of biological neural networks of modest size [39,[61][62][63][64][65]. Thus, we use the pairwise maxent model not only to specify a minimal, compressed representation of neural statistics, but also to specify how the probability distribution evolves under the replacement rule specified in Eq 5.
This replacement rule maps experimental perturbations to observed statistics in contrast with starting with model parameters as is often the first instinct for theorists. In the latter formulation, the fields and couplings would form the basis for perturbations, also known as "natural" or "canonical" perturbations [66]. This latter perspective adheres to the physical intuition that the parameters from Eq 4 reflect latent physical interactions. When the pairwise maxent model instead serves as an approximation of the underlying physical interactions, a natural perturbation may be mediated by unknown parameters instead of fields and couplings [49,67], and this renders the straightforward model perturbation difficult to translate into a literal experiment. In comparison, defining perturbations in observable terms as in Eq 5 allows for easier experimental interpretation and for consistent comparison of inferred models [60].
If it is collective neural activity that encodes behaviorally relevant information [39,40], then we can use collective properties as a proxy for behavior. As one example measure of collective activity, we consider the probability that the number of neurons in each of the three states, φ fine (n 1 , n 2 , n 3 ), is described when ordered by n 1 ≥ n 2 ≥ n 3 such that n 1 corresponds to the size of the plurality and n 3 the smallest minority (i.e. there is no fixed correspondence to "rise," "fall," and "flat"). This presents a measure of synchrony more fine-grained than φ coarse (n 1 ) that only considers the number of neurons in the plurality n. Crucially, the pairwise maxent model also captures these statistics. While neither statistic differentiates between the orientation of the states, we limit ourselves here to simpler and less computationally expensive statistics of collective neural activity [68,69], noting that other statistics may easily be incorporated in the way we outline.
Under perturbation of a pair of matcher and target neurons, the synchrony distribution φ is mapped to an alteredφ that depends on the strength of perturbation . To measure the distance between the two distributions, we use the Kullback-Leibler (KL) divergence D KL as in Eq 1 [66]. In the limit of infinitesimal perturbation, → 0, KL divergence reduces to the Fisher information where the Jacobian J ij mt,m t transforms the maxent parameters such as fields and couplings {θ i }, here ordered along a single index i, into the vector of changes that correspond to the observable perturbations specified in Eq 5. In contrast with Eq 1, we consider in Eq 6 the impact of perturbation on collective synchrony in Eq 6 that accounts for the multiplicity of microscopic configurations belonging to a single coarse-grained state [70].
The Fisher information matrix (FIM), whose entries are defined in Eq 6, is spanned by eigenvectors that describe orthogonal perturbations of the parameters {θ i }, where modes with large eigenvalues describe perturbations to which collective synchrony is highly sensitive and small eigenvalues represent ones to which the system is insensitive [42,60]. As we show in Eq 6, this basis describes the local curvature of a Riemannian manifold generated by information distance. Importantly, the basis vectors can involve a mix of antagonistic and sympathetic perturbations of neurons of varying magnitude, one that would be nontrivial to extract a priori. Thus, the FIM encodes how quickly coarse-grained configurations φ change as we modify the system on a microscopic level by perturbing pairs of neurons at a time, with perturbations that mimic changes to synaptic connectivity.

Leading FIM eigenvalues show Zipfian decay
In Figure 4A, we show the rank-ordered eigenvalue spectrum of the FIM for collective synchrony φ fine calculated with the pairwise model in comparison with several null models: independent neurons, pairwise maxent model with couplings randomly shuffled between all pairs of neurons (which preserves the distribution of couplings but not the topology of the interaction network), and canonical perturbations directly modifying each coupling at a time by a fixed amount ( Figure C.10). Across all cases, we expect a hard cutoff at the dimensionality of the synchrony distribution φ fine corresponding to rank Z max = 234, which is much smaller than a single dimension of the matrix N 2 − N = 2,450 (Z max = 33 for the coarse-grained collective statistic φ coarse as we show in Figure E.19). In contrast with the others, the independent neuron model has short maximum cutoff well below the theoretical maximum, reflecting the essential role of interactions in mediating pairwise perturbations. The cutoff for the pairwise maxent model reveals the replacement rule in Eq 5 sufficient to very nearly span the full dimensionality of synchrony space. This observation confirms that the pairwise model with the pairwise replacement rule generates a useful basis to explore collective sensitivity.
The rank-ordered spectrum of eigenvalues λ z of rank z initially decays such that each successive level of perturbation returns multiplicatively smaller response, following roughly on a very limited range Zipf's law, λ z ∝ z −1 . In contrast, a simple exponential decay would indicate a sharp cutoff for sensitive modes beyond some rank. We find that exponential decay alone does not describe the eigenvalue spectrum, and instead find a reasonable fit using a power law with an exponential tail: In Eq 7, we have parameters for vertical scaling A, exponent α, tailz, and hard cutoff Z max beyond which there are no additional eigenvalues or numerical precision errors may be important (see Section C for more details about the fitting procedure). For random subsets of N = 50 neurons considered, we find that this model usually presents a compelling fit and that the scale of the cutoffz 30 indicates a power-law-like regime for the dominant eigenvalues (Figures G. 22 and G.23). Such a scaling regime seems to be a feature of the system statistics, consistently appearing across the null models. The shuffled null model agrees closely with our thought experiment, in which eigenvalue decay tends to be slowest, showing a power-law exponent closest to α = 1 (see Figure C.10). Perhaps unsurprisingly, the spectra are also scaled to higher sensitivities than the independent model. Though the latter is roughly commensurate in overall sensitivities with canonical perturbations, we might expect perturbations in the space of observable and model parameters to be scaled differently from the transformation of variables (Appendix E). Thus, this statistical hierarchy reflects the role of component disorder common across the models whose overall collective sensitivity is magnified by interactions.

FIM eigenvectors reveal pivotal neurons
Inspecting the corresponding eigenvectors, we find some modes are dominated by perturbations focused only on a few neurons. To better represent the connection between pairwise perturbations and eigenvectors, we reshape eigenvectors into eigenmatrices V mt such that the elements in a column indicate how matcher neuron m should imitate its target neighbors t in turn. When put into this representation, we often find vertical striations as in the inset of Figure 4A. These striations are visible because they are almost exclusively of the same sign, indicating that the mode describes perturbations localized to a single neuron that tend to increase or decrease its correlation with all neighbors simultaneously. In contrast, horizontal striations that represent uniform perturbations across all the neighbors of a particular neuron, a kind of global perturbation, tend to be sparser and weaker on average. This emergent pattern contained in the block structure of the FIM suggests that localized, uniform enhancement or suppression of synaptic connections leading to a small set of pivotal neurons may serve as effective mechanisms for modulating collective activity.
As a more direct analysis of such key neurons, we limit our analysis to perturbations focused on a single matcher neuron at a time and put aside perturbations combining different matcher neurons. When the FIM is ordered first by matchers and then targets for each matcher, these perturbations correspond to blocks along the diagonal of the FIM of dimension (N − 1) × (N − 1) for fixed m and variable t. From the principal eigenvalues of the diagonal blocks, we find that the single, pivotal neurons with the largest eigenvalues tend to coincide with the ones that manifest in vertical striations of the full FIM (Appendix C). These striations correspond to columns with high uniformity. As a measure of this, we define row and column uniformity, respectively, as for each normalized eigenmatrix v ij . When we consider the subspace of leading pivotal neurons and compare them with the subspace of randomly chosen neurons, the principal eigenvalues of the former set saturate much more quickly to the leading eigenvalue of the full matrix even when we have only considered about half of the available neurons ( Figure 4C), indicating that there is a subset that in combination overwhelmingly determines collective sensitivity. This concentration of sensitive modes in a few neurons is an indication of centralized structure that emerges from neural heterogeneity. The patterns that we note both in the eigenvalue spectrum and eigenvector basis hold generally for random neuron subsets of N = 50 sampled from each experiment, about the maximum possible number that can be sampled reliably [50]. In contrast with the typical notion that the identities of particular neurons are essential, pivotal neurons fluctuate between subsets and random samples as shown in Figure 5. While some neurons are more frequently identified in spite of random fluctuations for the same  subset of N = 50 neurons, consistently identified pivotal neurons are less pronounced once we consider different subsets. This means that the properties we find are not tied to individual neurons, but are general features of the ensemble. Since collective synchrony is a lower-dimensional statistic and in principle permits exchange symmetries between neurons, this is not necessarily surprising. On the other hand, we note that these collective properties are robust to heterogeneity in network connectivity, bias, and interactions that would tend to break such symmetries.

Discussion
Examples of how sensory and behavioral information is encoded in neural activity suggest that, while information is sometimes localized to a few neurons, it is at other times distributed amongst many. This may result from information flow between collective to individual components [71] and because the precise scale at which information is processed may vary with time, function, and organism. We develop a perturbative approach that is sensitive to the potential variety of involved scales. Using an information-geometric perspective, we identify statistical aspects of the distribution of neural behavior that are essential for preserving collective activity. Importantly, we do not have to assume that collective properties will be sensitive to individual neurons or certain combinations but can discover the appropriate ones in a principled way. This becomes feasible to explore comprehensively because the response of the system depends on only pairs of perturbations when they are sufficiently small, a property of analyticity of the mapping from activity to collective state.
As an in silico realization of the protocol, we use perturbations that mimic internal neural coordination and calculate their impact on collective synchrony. In a minimal model, we find that dominant perturbative modes do not tend to be distributed amongst many neurons but are localized into pivotal ones. The concentration of sensitivity in a few neurons is analogous to the presence of driver neurons, modulation of which drives the system from one configuration to another [72][73][74]. Interestingly, we find that pivotal neuron identities fluctuate across subsets and Monte Carlo random samples that introduce finite sample errors into the entries of the FIM. Instead of specific pivotal neurons, it is the collective properties of localized eigenvectors and scaling in sensitivity that are preserved.
These collective properties require neural heterogeneity. Since identical neurons would imply uniform eigenvectors, variation in each neuron's local interaction network and bias is responsible for the emergence of pivotal neurons [60]. By shuffling the inferred interaction network and the data, we verify that these features seem to be a result of the distribution and magnitude of interactions but not the specific topology. In this sense, the neurons do not need to be labeled differently as if given some predetermined role, but the collection of local interactions and biases is responsible for the statistical properties of the FIM, echoing findings in theoretical models of neural networks [75]. Though we do not address this question here, it would be interesting to know how such properties emerge and if such ensemble properties are encoded genetically, arise spontaneously, or generalize to mobile worms.
One of major questions that we approach in our framework is how experimental perturbations should be represented in the model. While the focus is often on model parameters as representations of physical dials that are experimentally accessible, direct correspondence is not obvious for a statistical model inferred from data. As an example, when couplings inferred by a pairwise maxent model were directly compared with physical contact between protein residues, the model recovered only a subset of real interactions even while recovering the statistical ensemble [55,76]. Taking the pairwise maxent model, it is clear exactly what increasing a coupling does to the energy function, enhancing the tendency of a pair of neurons to coincide, but the actual result is nontrivial modification of the entire distribution. For the experimentalist, it seems more natural to consider the problem from the perspective of the observable statistics that can be perturbed in a controlled and precise manner. In the case of Boltzmann-type models, the relationship between observables and model parameters can be made exact and results in a simple form for small perturbations. Thus, our formulation is a theoretically and experimentally tractable framework for predicting the effects of perturbations.
To test these ideas requires innovations on top of previous experiments (Section G). A central feature of our thought experiment is the small perturbation. If the strength of the intervention ranges from perturbative to discrete, the perturbative limit lies close to no intervention at all and discrete interventions correspond to silencing or ablation. Discrete interventions may be appropriate considering the logical structure of a circuit but become combinatorially expensive to explore fully [77,78]. While avoiding these issues, small perturbations require a sufficient number of measurements for the perturbed parameters to be measurable. By using the fact that the Fisher information is proportional to the number of independent samples and the Cramér-Rao bound, a lower bound on the number of experimental observations T required to distinguish a change in the mode for a perturbation strength for Fisher information F is on the order of Taking a relatively large perturbation of = 10 −2 , we have T ∼ 10 independent samples for an eigenvalue λ ∼ 10 3 , the order of magnitude of the largest mode [79]. Linear growth in the number of required samples, however, restricts us from measuring insensitive modes in a reasonable amount of time (the experiments analyzed here last 8 minutes and collect roughly 80 to 120 independent samples [50]). Though mapping the full local information geometry of N = 50 neurons would be difficult because the number of pairwise perturbations exceeds ∼ 10 6 (accounting two distinct pairs of matcher and target neurons), a similar experiment with about N ∼ 10 neurons seems reasonable when coupled with techniques for incomplete matrix estimation [80]. The feasibility of such an experiment stems from our argument that perturbative experiments harness analyticity to vastly simplify the range of possible perturbations. We exploit this property to extract a basis for the local information geometry including multi-component perturbations. One goal of such experimental intervention is then not to be more precise but sufficiently varied to span the local basis, an idea that can be generalized to other experimental systems besides the C. elegans model we consider. Until now, we have assumed that the stochastic encoding relating neural to collective activity does not change, but this is not necessarily the case. To consider this, we characterize the encoding with the fundamental notion of channel capacity, the maximum rate at which information can conveyed by such a mapping given by the mutual information I[Y ; Z] = Y,Z P (Y, Z) log (P (Y, Z)/P (Y )P (Z)) [81]. A targeted perturbation to the set of states Y corresponds to a new distributioñ P (Y ) = P (Y )[1 + a r v r (Y )] giving arbitrary weight a r to perturbation vector v r of index r [82]. The perturbation vector v r may refer to a projection from a single-neuron perturbation or eigenvector considered above. Generally, such a perturbation may change not only the distribution of neural statistics P (Y ) but also the mapping P (Z|Y ) = P (Z|Y )[1 + ∆ Z|Y ], which corresponds to an adaptive code that changes in response to incoming statistics. Under small perturbations a r 1 and small adaptation ∆ Z|Y , the change in the mutual information decomposes intõ where we have defined d(Z|Y ) ≡ P (Z|Y ) log[P (Z|Y )/P (Z)] such that Z d(Z|Y ) is the information gained about Z from observing Y . Eq 10 contrasts the "direct" result of a perturbation in contrast with an "adaptive" response by the system. Whereas the former term decomposes into the product of the local information geometry of P (Y ) and the information content of the mapping Y → Z, the latter depends on system response that modifies the stochastic mapping itself, such as rate limiting or rescaled response [83]. Deviations from the predictions in our perturbative thought experiment might represent the effects of such complications from the properties of the stochastic encoding.
The information geometry of scientific theories more generally suggests that they show sparse structure, where a few parameter dimensions strongly change the qualitative characteristics of phenomena and most dimensions are unimportant. This is a result of the logarithmically spaced eigenvalues of the FIM, or "sloppiness" [42,[84][85][86]. Whether by nature or by design, such quantitative reduction vastly simplifies the level of detail required for approximate theories, allowing for accurate prediction even with large uncertainty in most parameters [41,87]. We find here a variation on this idea in the statistics of neural activity in C. elegans. Neural activity seems non-sloppy, with largest eigenvalues decaying slower like Zipf's law in contrast with regular logarithmic spacing when we consider experimentally realistic perturbations. Thus, sensitivity, while concentrated into a few dominant modes, has non-negligible smaller modes that decay in a self-similar way. Might this be a feature of biological neural networks to reduce the dimensionality of control parameters (if interestingly not as strongly as the sloppy case) or to nest varying levels of control? The state-of-the-art today with single neuron measurement in C. elegans and optical genetics is approaching a point where experiments to test this hypothesis may become feasible.

A Numerical solutions to inverse maxent problem
In principle, the maxent formulation presents a unique mapping from statistical correlations to model parameters such that there is no parameter fitting in the usual sense. The problem of determining the fields h i,k and couplings J ij that closely match mean activity and pairwise correlations is known as the inverse maxent problem [88,89]. In practice, however, limitations to numerical precision and finite-sample noise mean that fitting the parameters for even moderately sized systems is not without ambiguity. With this ambiguity in mind, we present the different approaches we use to solve the inverse problem for the pairwise maxent and independent models.
The first method, which we focus on in the main text, allows us to identify a sparse interaction network between neurons mirroring the sparse structural connectivity of the neural connectome [50]. The parameters of the maxent model are initialized at zero and are updated individually as to maximally increase of the likelihood of the data at each step [90,91]. This learning procedure is stopped once the model reproduces the constrained observables within experimental variability, estimated by bootstrapping random halves of the data. If experimental variability is relatively large, this training procedure can return a sparse interaction matrix J ij . We show the results of such a procedure in Figures A.6 and A.7.
The second method we use for comparison relies on the Monte Carlo Histogram (MCH) approach [91], which is an approximate gradient descent algorithm. At each step, the parameters are simultaneously updated by an amount that is proportional to the difference between the corresponding observable calculated from the model and the data. We obtain a solution that is within a norm error threshold where the cutoff is arbitrarily set to obtain a relatively fast and close fit to the statistics of the data. Unlike the first method, MCH returns a dense network of connections since all couplings are updated at every iteration til convergence. While we find that the resulting model aligns well with the features of the data, including collective synchrony, this procedure does not make realistic assumptions about the topological structure of the underlying physical network. For these solutions, we again find a signal for distinguishing pivotal neurons in the uniformity of columns and rows, but it is not as consistent as evident in the column and row uniformities plotted in Figure C.14.
We also consider an independent neuron model. When solving the corresponding maxent model, we penalize large fields by maximizing the log-likelihood log L i for each spin i along with a penalty such that the total cost function C i is defined as because large fields make it especially costly to calculate the FIM accurately. Indeed, a sparse cost function, one where the cost scales with the absolute value of the fields, does not sufficiently penalize large fields, rendering it infeasible for our current implementation of the FIM calculation. With the constraint given in Eq 12, the goal is then to find the fields such that To determine the weight σ, we compute the cost function C i in Eq 12 for a range of σ. constant. At large σ, we recover the original maximum likelihood problem. We set σ = σ * as is determined by the midpoint between these two extremes for the cost function averaged over all spins i. Typically, this is in the interval σ * ∈ [2, 10]. These steps return an independent model of neurons, where the biases of the most extreme neurons have been tempered.

B Calculation of Fisher information matrix (FIM)
To calculate the entries of the FIM, we rely on a Monte Carlo Markov Chain (MCMC) sample of the distribution of neural states p(s), which we then coarse grain to approximate the distribution over collective synchrony φ. With the distribution, we then calculate its Kullback-Leibler divergence under perturbation, which is straightforward to do by altering the statistical correlation functions calculated on the MCMC sample according to Eq 5. Relating this change to a change in the fields {h i } and couplings {J ij } is a linear matrix problem in the perturbative limit. With the corresponding change in the parameters, we calculate the change in energy ∆E({s i }) for each configuration s sampled, the coarse graining of which maps onto the corresponding effective energy of the system. This algorithm is specified in more detail in the supplementary information of reference [60]. Overall, the calculation of the FIM is an expensive computational task where we must obtain each entry of the (N 2 − N ) × (N 2 − N ) FIM which are averages over K MCMC samples for each of |φ| coarse-grained statistics. To parallelize such a calculation, we combine custom code with the Metropolis algorithm implemented in the ConIII Python package [89]. We relied on a number of computational resources including local workstations and computing clusters at the Santa Fe Institute, University of New Mexico, and Vienna Scientific Computing (VSC).
In the main text, we show results from MCMC samples of size K = 10 5 and compare a few cases with a larger samples of size K = 10 6 to verify that we can obtain a good approximation of FIM properties with the smaller sample having fixed = 10 −4 . 1 We verify that our estimates of the FIM entries converge within a relative tolerance of 0.1% when compared with a larger perturbation of = 2 × 10 −4 . The example of the eigenvalue spectra in Figures 4 and E.18 involve an average over 10 samples of size K = 10 5 for the pairwise maxent model the shuffled null model. See Figure B.8 for a comparison of spectral properties of the FIM between the two sample sizes considered.

C Analyzing eigenmatrices
We compare vertical striations in comparison withrows by measuring comparing the sum of row and column uniformities as defined in Eq 8. In Figures C.11, C.12,C.13, and C.14, we show uniformity for 3 additional neuron subsamples for the main approach discussed in the text, the independent model, the shuffled null model, and the MCH solutions, respectively. For pairwise perturbations, we find matrices strongly biased to large column sum norms compared to rows. Given the nature of the pairwise perturbations that we consider, that means that perturbations localized to a particular neuron (in contrast with perturbations that would impact each of the neighbors in turn) would have a dominant impact on the collective outcomes. This is a feature of localized control because it means that turning off all local synaptic connections, or turning them up, for a particular neuron is effective.
As a more direct measure of this, we can analyze the subspace of the FIM corresponding to perturbation of each neuron at a time, fixing m and iterating over all t. As we show in Figure C.9, the principal eigenvalues extracted from single neuron perturbations are correlated with the fraction of MC samples in which the same neuron has unusually strong column uniformity. Thus, two different measures of pivotal neurons, one based on single neuron subspaces and another based on the structure of the entire system, align and show that the strong collective tendencies that we find in the data do not prevent individual neurons from playing important roles.
To determine the scaling of the rank-ordered eigenvalue spectrum, we fit the function detailed in the main text and reproduced here for eigenvalue λ(z) with rank z by performing least-squares minimization on the logarithmic differences. However, there no a priori guarantee that the spectrum of eigenvalues is full rank and the tail of the spectrum may be rife with numerical precision errors, and so we allow for the possibility of a hard cutoff that should be applied. As a heuristic, we apply a hard cutoff when the logarithmic slope falls below −3 and do not fit any points for rank above the cutoff. While we anticipate that numerical precision makes it difficult to estimate the smallest eigenvalues and thus the exact value of the cutoff, we take the sharp tail to be an effective cutoff for when the eigenvalues fall below  Figure 5). Subspace corresponds to diagonal blocks of FIM for fixed matcher index m, which is used in Figure 4C to order neurons. Average taken over ten MC samples with standard error of the mean shown. This means that neurons often above threshold tend to have large collective sensitivity, linking these two different measure of pivotal strength. Points are randomly offset on x-axis for visibility.
a threshold of λ < 10 −7 , minuscule in comparison with the typical principle eigenvalue in the range of 10 2 to 10 4 . As is visible in our fit in Figure 4, Eq 14 with a hard cutoff hews very well to the averaged FIM spectrum. In contrast, a simple exponential decay, having set α = 0, cannot capture the scaling form that we observe. We show an overview of the truncated power law fit exponents and exponential tails in Figure C.10. We compare our findings for the pairwise maxent model with null models including an independent model for neural activity and one where the couplings are randomly assigned to a pair called "coupling shuffled." While the independent model gives qualitatively different results, we find that shuffling the couplings amongst pairs preserves the qualitative features of the FIM.

D Classes of perturbations
We distinguish between two principal classes of perturbations in the main text denoted "observable" and "natural" or "canonical," taking the nomenclature used in Amari's well-known textbook on information geometry [66]. Essentially, these distinguish between perturbations defined in the space of correlations compared with perturbations defined in the space of parameters. Since perturbations in either representation can be transformed to one other by a linear operation, there is no a priori reason to prefer one basis over another. Theoretical treatments tend to consider the canonical picture because of the underlying intuition that they correspond to physical forces.
In physics experiments, it is possible to access directly quantities such as fields, couplings, and temperature to directly modify the Hamiltonian. With a phenomenological model of the neural network, however, we do not expect that perturbations protocols simply map to canonical perturbations. This means that for the natural perturbations we define in Eq 5, there is a corresponding representation in terms of the fields {h i } and couplings {J ij }, but it will generally be a complicated combination of many changes across the system (see Figures 3 and D.15). In this sense, the pairwise maxent model serves as a representation of our inference process rather than a literal model of the neural network.
Besides the distinction between types of perturbations, we have a choice of which  protocols to consider within each space. In the main text, we discuss individual neuron perturbations relative to neighbors as a way of approaching closed-loop control. However, other experimental protocols might be of interest such as probabilistic clamping to a fixed external reference frame always increasing, decreasing, or staying flat. An infinite variety of such variations are possible, and it may be the case that other representations turn out to be more appropriate for capturing simple experimental interventions. Though a complete basis can always, in principle, be transformed from one set of perturbations to another, such transformations may be difficult to perform accurately from limited and potentially noisy data extracted from experiments.
For the sake of completeness, we additionally compute the FIM for the example of clamping each neuron to an external reference frame-akin to clamping to an imaginary neighbor that is always in the up, down, or flat state. For the case of K > 2 possible states of the neuron, clamping a neuron to a single state involves specifying the relative chance in the probabilities that the neuron is in the remaining K − 1 states. We consider the case where clamping the neuron to a particular state y changes the probabilities of the remaining two configurations in a way that traces out the geodesic towards p y ≡ p(s m = y) = 1 in the two-dimensional simplex ( Figure E.17). Experimental results may suggest more empirically grounded way of accounting for the relative change in probabilities. Taking this formulation, the local perturbation for the matcher neuron changes its bias as cos(π/2 − θ y ) cos(θ y − π/6) , cos(π/6 + θ y ) cos(θ y − π/6) .
In Figure D.16, we show a summary of results from such a protocol on the pairwise maxent model. We find that eigenvalues decay faster with rank when considering observable perturbations compared with natural perturbations for both fine and coarse collective synchrony. Across the maxent models we consider and the fine and coarse measure of synchrony, we consistently find that the decay exponent is faster for natural perturbations.

E Relating observable and canonical perturbations
In the main text, we primarily focus on perturbations to the observed correlations, but we also find consistent differences between observable and natural perturbations. As we show in Figure 4, the spectrum tends to decay faster for canonical perturbations and they are smaller in general. To gain some insight into these differences, we consider the two types of perturbations for a highly simplified case of a binary neuron characterized by a mean activity level.
Consider states s with probabilities p(s) = exp[−E(s)]/Z coarse-grained into a distribution φ(k) = |s|=k p(s), where the notation |s| = k means that there are k spins in the majority. Under the replacement rule, the probability of neuron i being in one configuration p i is modified to becomẽ Note that the derivative that returns the appropriate is of the form −d/d[log(1 − p i )] = (1 − p i )d/dp i . This would correspond to our observable perturbation, whereas the canonical perturbation would be with respect to the Langrangian multipliers, the fields and couplings for the pairwise maxent model. The quantity of interest is the "score function," or the expectation value of the second derivative of the probability distribution that gives us the entries of the FIM, The terms involving log φ are of course the susceptibilities of the kth correlation function with the spin i. Since φ is a coarse-grained version of p(s), we expect that the derivatives will reduces to linear combinations of correlation functions.
To simplify notation, we take the term inside the brackets having defined where we have introduced the field h i for neuron i. By taking the derivative with respect to the field, we have incurred a term proportional to the curvature in the change of variables as well as the square of the jacobian bringing us from y i to h i . Now, we calculate the jacobian terms. Note that we are only considering the domain where the relationship between log(1 − p i ) and h i is analytic, otherwise we would have to deal with branch cuts. Given this caveat, we differentiate Using the fact that p i = ( s i + 1)/2 and that the derivative is the susceptibility, or the variance of spin i, we obtain Then, the second derivative is Thus, the maxent model allows us to explicitly calculate the jacobian in terms of linear response quantities that tell us how the observables change under a small perturbation of the fields. Now, let us go back to the score function in Eq 17. Note that because all the above quantities are already ensemble averages, we can factor them out of the brackets in Eq 17. This means that we have In contrast, if we were to do the same thing but take the derivative with respect to the fields in Eq 18 instead of the natural parameter, we would simply swap the y i 's with the h i 's. This means that we have the reciprocal of the jacobians, In other words, changing variables gives us a different factor that depends on the mean magnetization of the entries of the FIM. This means that we may expect the overall eigenvalue spectrum to be, at the least, scaled differently. From numerical calculations, we find that the spectrum for observable perturbations to almost always be strictly greater than that for natural perturbations, which may reflect a change in variables.

F Data sets and samples
We use the C. elegans worm data from reference [4] and the same inverse maxent procedures detailed in reference [50]. We analyze the two immobilized worms labeled with experiment numbers 170419 and 154139. For each worm, we consider a limited number of different subsets of N = 50 neurons (about the maximum number of neurons that can be reliable measured from the data), where the neurons have been randomly selected amongst those that have visited all three states at least once.

G Experimental implementation
A realization of our thought experiment depends on developments in simultaneous use of recording, perturbation, and computational analysis. The experiment would require tracking time-averaged neural statistics before and during perturbation with the ability to extract simultaneously the discrete state of recorded neurons, a procedure that currently relies on post-experimental analysis to handle noise and changing fluorescence [50]. Furthermore, single-neuron tracking is difficult and may be feasible with immobilized worms, but presents a challenge to perform accurately with freely moving ones. Furthermore, it is essential that the nature of the perturbation on membrane voltage be precisely calibrated in order to replicate as closely as possible theoretical clamping, which may require some characterization of the particular experimental techniques used. This presents an abbreviated list of some of the experimental difficulties that we foresee for implementing such an experiment. Though we assume that the perturbation randomly "flips" the neural configuration of the focus neuron with some small probability at any given moment in time, the perturbation specified in Eq 5 is compatible with variations of the experiment that may be more natural. For example, it may be more natural to force neurons up after a flat state but not directly from a down state. More specifically, it may be important to keep the slope of underlying neural activity from diverging. While such history-dependent clamping is compatible with our formulation that relies only on time averages, a rigorous comparison of techniques might reveal some protocols preferable over others in terms of feasibility or correspondence to theoretical predictions.
Finally, our proposed thought experiment generalizes beyond neural activity or the C. elegans model system, where it may be possible to simultaneously observe system configuration while imposing a perturbation. With optogenetic tools, it may be interesting to consider other neural systems such as in vitro cortical cultures or the neurons involved in Drosophila flight.