The quality and complexity of pairwise maximum entropy models for large cortical populations

We investigate the ability of the pairwise maximum entropy (PME) model to describe the spiking activity of large populations of neurons recorded from the visual, auditory, motor, and somatosensory cortices. To quantify this performance, we use (1) Kullback-Leibler (KL) divergences, (2) the extent to which the pairwise model predicts third-order correlations, and (3) its ability to predict the probability that multiple neurons are simultaneously active. We compare these with the performance of a model with independent neurons and study the relationship between the different performance measures, while varying the population size, mean firing rate of the chosen population, and the bin size used for binarizing the data. We confirm the previously reported excellent performance of the PME model for small population sizes N < 20. But we also find that larger mean firing rates and bin sizes generally decrease performance. Performance for larger populations was generally worse. For large populations, pairwise models may be good in terms of predicting third-order correlations and the probability of multiple neurons being active, but still significantly worse than small populations in terms of their improvement over the independent model in KL-divergence. We show that these results are independent of the cortical area and of whether approximate methods or Boltzmann learning are used for inferring the pairwise couplings. We compare the scaling of the inferred couplings with N and find it to be well explained by the Sherrington-Kirkpatrick (SK) model, whose strong coupling regime shows a complex phase with many metastable states. We find that, up to the maximum population size studied here, the fitted PME model remains outside its complex phase. However, the standard deviation of the couplings compared to their mean increases, and the model gets closer to the boundary of the complex phase as the population size grows.
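As background for the analyses discussed below, here is a minimal sketch of fitting a PME (Ising) model by Boltzmann learning. For tractability the sketch computes model moments by exact enumeration over all 2^N states, which is only feasible for small N; the manuscript's actual inference uses approximate methods or Boltzmann learning with Monte Carlo sampling. All function names here are illustrative.

```python
import itertools
import numpy as np

def fit_pme_exact(data, n_iter=2000, lr=0.1):
    """Fit fields h and couplings J of a pairwise maximum entropy model
    p(s) ~ exp(sum_i h_i s_i + (1/2) sum_{ij} J_ij s_i s_j), s_i in {0,1},
    by gradient ascent on the likelihood (Boltzmann learning), using exact
    enumeration of all 2^N states to compute model moments (small N only)."""
    n_samples, n = data.shape
    # Empirical first and second moments: the constraints of the PME model.
    m_data = data.mean(axis=0)
    c_data = data.T @ data / n_samples
    h = np.zeros(n)
    J = np.zeros((n, n))
    states = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)
    for _ in range(n_iter):
        # Model moments via exact enumeration over all 2^n binary states.
        energies = states @ h + 0.5 * np.einsum('ki,ij,kj->k', states, J, states)
        p = np.exp(energies - energies.max())
        p /= p.sum()
        m_model = p @ states
        c_model = (states * p[:, None]).T @ states
        # Moment-matching gradient steps: push model moments toward the data.
        h += lr * (m_data - m_model)
        dJ = lr * (c_data - c_model)
        np.fill_diagonal(dJ, 0.0)  # keep self-couplings out of J
        J += dJ
    return h, J
```

Because the log-likelihood is concave in (h, J), this gradient ascent converges to the unique moment-matching solution for small populations.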


Reviewer 1
1. Olsen et al fit a large number of Ising models to Neuropixel recordings from multiple cortical areas of freely moving rats from a public dataset. They analyze how the quality of the model degrades with different features of the data such as population size, time binning or firing rates. The paper seems technically sound, and rigorous in varying different aspects of the analysis pipeline, which empirically expand theoretical results from Ref. 10. It also introduces a couple of interesting statistical elements, in particular a new Ising partition function estimator and some lessons on the complex relationship between different empirical metrics of model quality used in practice.
We are very happy that the reviewer finds our results sound and rigorous and appreciates the statistical ideas we have introduced.

2. Experimentally we are not learning anything fundamentally new:
We agree that the focus of our writing in the first version was more on model evaluation and not a new biological result. In the revised manuscript we have addressed this issue in three ways:

- We have employed PME models to assess the effect of sensory stimulation on higher-order correlations by comparing PME models in the auditory cortex during silence and during bursts of white noise. See lines 478-484 and Fig. 11.
- We have also expanded on our results on the effect of light versus dark conditions in the visual cortex so that the effect of stimulation can be compared between auditory and visual cortex. See Fig. 11.
- As per the reviewer's suggestion (8) below, we have analyzed the structure of the inferred couplings within and between the visual and auditory cortices. See the section "Couplings within and between areas", lines 484-499 and Fig. 12 and 13.

We have also made changes to the Introduction and Discussion to reflect these additional points and results.
3. It is a much more precise characterization of the long known observation that a simple Ising model only accounts for data well in small populations. Still, the manuscript does the grunt work of checking how this conclusion could be affected by various data and analysis considerations.
Indeed, as we and others have argued in the past, the pairwise models are likely to perform well in terms of KL-divergence only for small populations. However, as Reviewer 3 also notes, there are still many misunderstandings about maximum entropy models. There are indeed recent claims such as the maximum entropy "[...] approach has been successful for populations of N ∼ 100 neurons" (https://arxiv.org/abs/2310.10860) and "These pairwise maximum entropy models have been strikingly successful in describing collective behavior not only in networks of real neurons […]" (https://arxiv.org/abs/2402.00007v1).
These statements and other similar ones in the literature are not completely incorrect, but imprecise and confusing because of the different ways different people have used and assessed the PME. We thus believe highlighting this issue and systematically evaluating different measures will be of great use to the community.
4. I find dubious the scientific relevance of fitting a single Ising model to data that combines several computationally distinct sensory/behavioral conditions from multiple brain areas, but that concern is somewhat outside the scope of this paper. Still, I wonder if lumping together multiple conditions in this way also affects analysis in ways that DO matter for the goal at hand: naively one would think that combining subsets of data with different statistics (due to different computational regimes) results in mixture distributions with more complex associated pairwise correlation structure. Please discuss.
We agree with the reviewer regarding combining data from different sensory/behavioural conditions and brain regions. For this reason, we had indeed confirmed that our results did not qualitatively change if we only included neurons from one area or experimental condition.
Following this comment, we have now expanded on this issue (see Figures 9, 10, and 11). We have also performed a more quantitative comparison between populations recorded only from the visual cortex and only from the auditory cortex under different conditions. See response to point 2.
5. The explanation of the core metrics needs substantial improvement. In particular I get the moment matching argument for the entropy difference definition of the KL, but it is sloppily done; similarly, the form of the factorized model is not terribly intuitive or properly explained. I imagine that a reader not already quite familiar with MaxEnt models would have a hard time following that. Either provide references together with each equation to make clear where it's coming from, or (ideally and) explain it in an accessible way in the results.
Following this comment we have added further explanations to improve clarity and accessibility. In particular, we also spell out the steps in the moment matching argument.
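For reference, the moment-matching argument can be sketched as follows. Writing out the KL divergence between the data distribution and the pairwise model,

```latex
D_{\mathrm{KL}}(P_{\mathrm{data}} \,\|\, P_{\mathrm{pair}})
  = \sum_{\mathbf{s}} P_{\mathrm{data}}(\mathbf{s})
      \log \frac{P_{\mathrm{data}}(\mathbf{s})}{P_{\mathrm{pair}}(\mathbf{s})}
  = -S_{\mathrm{data}} - \langle \log P_{\mathrm{pair}} \rangle_{\mathrm{data}} .
```

Since $\log P_{\mathrm{pair}}(\mathbf{s}) = \sum_i h_i s_i + \sum_{i<j} J_{ij} s_i s_j - \log Z$ depends on $\mathbf{s}$ only through the first and second moments, which the model matches by construction, we have $\langle \log P_{\mathrm{pair}} \rangle_{\mathrm{data}} = \langle \log P_{\mathrm{pair}} \rangle_{\mathrm{pair}} = -S_{\mathrm{pair}}$, and therefore $D_{\mathrm{KL}} = S_{\mathrm{pair}} - S_{\mathrm{data}}$.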
6. Similarly, while I appreciate the symmetry of having all quality measures ranging between 0 and 1 with zero 'as good as factorized model' and one 'as good as actual data', this involves additional nonstandard steps for some of the metrics and those need a little justification: why sqrt(l2 distance ratio) specifically? Please comment.
We agree that such definitions have a degree of arbitrariness to them. Since previous studies have only relied on visual inspection of the scatter plots of third-order interactions, we attempted to perform quantitative comparisons in a simple way. We apologize for the confusion. We meant the former and we have now clarified this on line 309.
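A sketch of one plausible form of such a normalized quality index for third-order correlation predictions (the manuscript's exact definition may differ; the function name is ours):

```python
import numpy as np

def third_order_quality(c3_data, c3_pair, c3_ind):
    """Normalized quality index for third-order correlation predictions.
    Returns 1 when the pairwise model reproduces the data's third-order
    correlations exactly, 0 when it does no better than the independent
    model, and a negative value if it does worse than independent."""
    err_pair = np.sqrt(np.sum((c3_pair - c3_data) ** 2))  # L2 error, pairwise model
    err_ind = np.sqrt(np.sum((c3_ind - c3_data) ** 2))    # L2 error, independent model
    return 1.0 - err_pair / err_ind
```

The square root turns a ratio of summed squared errors into a ratio of L2 (root-mean-square-like) errors, so the measure lives on the scale of the correlations themselves rather than their squares.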
8. Does the structure of the larger neural populations sub-factorize by area? If looking at the estimated Js in a mixed population including subsets of neurons from different brain regions, then is that structure largely block diagonal, with weak across-area interactions and stronger within-area couplings?
We thank the reviewer for suggesting that we look into this.
We now test this. In addition, we also report results on how stable the stronger couplings are as more neurons are added, either from the same or from a different area.
Our results show that in general connections from within an area consistently have larger mean than those from outside the area. See the new section "Couplings within and between areas" and Fig. 12 and 13.
9. Since the partition function estimator is critical for a good part of the results, its construction needs to be explained more clearly. I also don't understand the logic of not describing its properties in the paper itself in a self-contained way.
We avoided this as we thought the paper would get too long. But since Reviewer 1 and Reviewer 3 both request this, we have now added further explanations and evaluations to the current manuscript. See lines 386-393 in the main text and the new Appendix S7.
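For small populations, any approximate partition function estimator can be validated against the exact value computed by brute force. A sketch of that ground-truth computation (our illustration, not the estimator from the paper):

```python
import itertools
import numpy as np

def log_partition_exact(h, J):
    """Exact log partition function of a pairwise maximum entropy model
    over binary states s in {0,1}^N, by brute-force enumeration of all
    2^N states. Feasible only for small N; useful as ground truth when
    checking an approximate Z estimator."""
    n = len(h)
    states = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)
    energies = states @ h + 0.5 * np.einsum('ki,ij,kj->k', states, J, states)
    # log-sum-exp for numerical stability
    m = energies.max()
    return m + np.log(np.exp(energies - m).sum())
```

For zero fields and couplings the result is N log 2, the entropy of N unbiased independent bits, which makes a convenient sanity check.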
10. Critically, the results text is missing a clear explanation of how the entropy of the real data is being estimated for N>20; also how does one practically go from an estimate of Z to that for S_pair?
We have now clarified this by adding new text explaining this in lines 362-367 and lines 183-187.
11. Overall, across all analyses the critical variable affecting model quality is the product N\bar\nu\delta t.
The fundamental message of all analyses is to argue for a decrease in model performance as a function of this quantity. Given this agenda, one would need a little more initial motivation to justify why this is the critical quantity in the original theory and why one should expect it to be meaningful outside the perturbative regime of the original theory.
Following the reviewer's suggestion, we now provide a more intuitive explanation of where N\bar\nu\delta t appears in the theoretical analysis and why it is a critical parameter in the study of PME models applied to neural data. See lines 234-239 and 245-249.
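For intuition: N\bar\nu\delta t is simply the expected number of spikes per time bin summed over the population, and under an independent-Poisson approximation (used here only for illustration) exp(-N\bar\nu\delta t) approximates the probability of a completely silent bin. When this product is small, silent and single-spike bins dominate, which is the sparse regime where the perturbative theory is controlled.

```python
import math

def expected_spikes_per_bin(n_neurons, mean_rate_hz, bin_size_s):
    """The product N * nu_bar * delta_t: expected number of spikes
    per time bin summed across the whole population."""
    return n_neurons * mean_rate_hz * bin_size_s

def silence_probability(n_neurons, mean_rate_hz, bin_size_s):
    """Probability that no neuron spikes in a bin, assuming independent
    Poisson firing (an illustrative approximation, not the fitted model)."""
    return math.exp(-expected_spikes_per_bin(n_neurons, mean_rate_hz, bin_size_s))

# Example: 100 neurons at 2 Hz with 10 ms bins gives N*nu*dt = 2 expected
# spikes per bin, so silent bins are already rare (~13.5% of bins).
```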
12. I personally don't find the SK subsection of the Results useful or experimentally relevant, despite the precedent of similar analyses in the retina. I also fundamentally question the interpretation of such metrics given that the data distribution was obtained by marginalizing across experimental conditions.
We have indeed tested this analysis on subsets of data from lights-on and lights-off conditions obtaining similar results.
As we emphasize, and perhaps for a similar reason as the reviewer, we do not use an analogy with the SK model to imply something about the neural code per se, as has been done in the retina. We simply employ known results about the SK model to quantify how hard sampling from the fitted model is. We believe that for this purpose such known results are quite suitable. This is now further emphasized in lines 644-654. We have also limited our treatment of these results in the Discussion.

Minor:

13. Section … is mentioned in the 2nd paragraph of Methods/Dataset, probably a legacy of a previous numbered section version of the text.
Thank you, this is now fixed.
14. When introducing the constraints on the first and second moments (eq 1a,b) they are referred to as mean and correlations, which is strictly speaking a little misleading.
Unfortunately we are not quite sure how this can be misleading, as the quantities are clearly defined and commonly used in the literature. We apologize for this and would be happy to change the wording.

We have expanded on the information about the color gradients in the figure text. See Fig. 2 (page 9) and Fig. 3 (page 10) of the current version.
18. DATA and CODE sharing: the data is public but the code for the results is not, at least as far as I can see.
The code is now available here: https://osf.io/dajc6/?view_only=9862613238544ad4b8b85b1b42451c30


Reviewer 2

1. The manuscript describes quantification of maximum entropy models to fit statistics of recorded networks in the brain of a foraging rat. The recorded cells were distributed in several brain regions including visual, auditory, motor and somatosensory cortices. The authors perform an analysis of the success of the PME models in predicting several other statistics of the networks' dynamics. Overall, the analysis looks good and well performed.
We are pleased that the reviewer finds our analysis well-performed.
2. The authors also analyze predictions to larger populations and claim that the PME failed to predict the activity structure of these networks, a well known fact.
There are still numerous papers that consider PMEs to be an excellent model for neural data. Two recent examples are "[…] this approach has been successful for populations of N ∼ 100 neurons" (https://arxiv.org/abs/2310.10860) and "These pairwise maximum entropy models have been strikingly successful in describing collective behavior not only in networks of real neurons […]" (https://arxiv.org/abs/2402.00007v1).
As Reviewer 3 also states, there are still large misunderstandings about the PME models.A part of this emanates from the different ways different people have used and assessed the PME models.
Indeed, in addition to showing the failure at large N when measured via KL-divergence, we also show that pairwise models may perform quite well in predicting third-order correlations or the probability of simultaneous spikes, in the regime where they fail in terms of KL-divergence.
PME models, and indeed their failures, can also be used as a tool to assess the importance of higher-order correlations.
Given the above, and as suggested by Reviewer 1, we have further looked into how pairwise models perform under different stimulation conditions and across brain regions, and have also studied the functional connections that emerge. We would also note that we show how the performance of the PME can be positively or negatively affected by the choice of time bin and the average firing rate of the neurons.
We have now changed the main text to reflect the new results, and also changed the introduction and discussion so that the focus is not primarily on the failures of the PME.

We now provide a figure following Figure 8, but now only looking at the root mean squared error without the normalizations; see Supplementary Figure S5.