
The topology of synergy: Linking topological and information-theoretic approaches to higher-order interactions in complex systems

  • Thomas F. Varley ,

    Roles Conceptualization, Formal analysis, Investigation, Software, Visualization, Writing – original draft

    tfvarley@uvm.edu

    Affiliations Vermont Complex Systems Institute, University of Vermont, Burlington, Vermont, United States of America, Department of Computer Science, University of Vermont, Burlington, Vermont, United States of America

  • Pedro A. M. Mediano,

    Roles Software, Writing – review & editing

    Affiliations Department of Computing, Imperial College London, London, United Kingdom, Division of Psychology and Language Sciences, University College London, London, United Kingdom

  • Alice Patania,

    Roles Supervision, Validation, Writing – review & editing

    Affiliations Vermont Complex Systems Institute, University of Vermont, Burlington, Vermont, United States of America, Department of Mathematics, University of Vermont, Burlington, Vermont, United States of America

  • Josh Bongard

    Roles Supervision

    Affiliations Vermont Complex Systems Institute, University of Vermont, Burlington, Vermont, United States of America, Department of Computer Science, University of Vermont, Burlington, Vermont, United States of America

Abstract

The study of irreducible higher-order interactions has become a core topic of study in complex systems, as they provide a formal scaffold around which to build a quantitative understanding of emergence and emergent properties. Two of the most well-developed frameworks, topological data analysis and multivariate information theory, aim to provide formal tools for identifying higher-order interactions in empirical data. Despite similar aims, however, these two approaches are built on markedly different mathematical foundations and have been developed largely in parallel - with limited interdisciplinary cross-talk between them. In this study, we present a head-to-head comparison of topological data analysis and information-theoretic approaches to describing higher-order interactions in multivariate data; with the goal of assessing the similarities, and differences, between how the frameworks define “higher-order structures.” We begin with toy examples with known topologies (spheres, toroids, planes, and knots), before turning to more complex, naturalistic data: fMRI signals collected from the human brain. We find that intrinsic, higher-order synergistic information is associated with three-dimensional cavities in an embedded point cloud: shapes such as spheres and hollow toroids are synergy-dominated, regardless of how the data is rotated. In fMRI data, we find strong correlations between synergistic information and both the number and size of three-dimensional cavities. Furthermore, we find that dimensionality reduction techniques such as PCA preferentially represent higher-order redundancies, and largely fail to preserve both higher-order information and topological structure, suggesting that common manifold-based approaches to studying high-dimensional data are systematically failing to identify important features of the data. 
These results point towards the possibility of developing a rich theory of higher-order interactions that spans topological and information-theoretic approaches while simultaneously highlighting the profound limitations of more conventional methods.

Author summary

The problem of understanding when a set of interacting components of a complex system produces behavior that is “greater than the sum of their parts" is foundational in many areas of modern science. Two different mathematical approaches have been developed to study higher-order interactions in data: one based on topology, and another based on information theory. These two frameworks are very different, and there has been little study of their overlap or the extent to which they are sensitive to the same “kind" of higher-order interactions. In this study, we compare both types of analyses directly. We find that there is indeed overlap: higher-order structures in the topological sense are correlated with irreducibly synergistic interactions in the information-theoretic sense. These results suggest that these two fields may share as-yet undiscovered mathematical connections, and deepen our understanding of emergent properties in complex systems.

1 Introduction

Complex systems, such as brains, societies, and ecosystems, are defined by emergent coordination between large numbers of distinct elements (e.g., neurons, individuals, or species). Consequently, a core problem in the study of any complex system is understanding the structure of the interactions between elements and the relationship between the “parts" and the “whole" (a philosophical tradition known as mereology [1]). Since the inception of complexity science, one of the most successful models that has been developed is that of the network [2,3]. In a network, the basic objects of study are 1) elements (sometimes referred to as nodes or vertices) and 2) dyadic interactions between elements (typically called edges or links). The dyadic nature of interactions is crucial: in a network, the only directly accessible dependencies are between pairs of elements. Interactions between multiple elements (such as meso-scale communities [4,5], multi-element motifs [6], etc.) must be built from combinations of lower-order (pairwise) dependencies. In many networks, this limitation is not problematic: for example, in an airline network, planes have defined origins and destinations. In those cases, having a structure composed of a dyadic building block is very natural. However, in statistical networks (also called “functional connectivity" networks), the pairwise restriction is not totally natural: it is possible to ask about the dependency that is “irreducibly" intrinsic to sets of three or more elements [7]. A classic example of this is the exclusive-OR (XOR) gate, which forms the basis of modern cryptography. In a trivariate XOR gate, there is no correlation between any pair of variables, but the entire triad contains one bit of information (“bits" and “nats" are units of measure that quantify the amount of information shared by variables: bits are produced when the logarithm is base 2, nats when the logarithm is base e). For a worked example, see SI Text 6.
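The XOR property can be checked numerically. Below is a minimal sketch (illustrative, not from the paper) that computes plug-in entropies over the exact, uniform XOR distribution and confirms that every pairwise mutual information is zero while the pair (X1, X2) jointly carries one full bit about X3:

```python
from itertools import product
from math import log2

# Uniform distribution over the trivariate XOR gate: x3 = x1 XOR x2.
states = [(x1, x2, x1 ^ x2) for x1, x2 in product((0, 1), repeat=2)]
p = {s: 1 / len(states) for s in states}

def H(idx):
    """Shannon entropy (bits) of the marginal over the variable indices in idx."""
    marg = {}
    for s, ps in p.items():
        key = tuple(s[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + ps
    return -sum(q * log2(q) for q in marg.values() if q > 0)

def I(a, b):
    """Mutual information I(X_a; X_b) in bits."""
    return H(a) + H(b) - H(a + b)

print(I((0,), (1,)), I((0,), (2,)), I((1,), (2,)))  # all pairwise MIs: 0.0
print(I((0, 1), (2,)))                              # pair about the third: 1.0 bit
```

Every variable is individually uniform (1 bit of entropy), and every pair is jointly uniform (2 bits), so no pair shares any information; only the full triad reveals the constraint.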

Recent analyses, both formal and empirical, have suggested that in statistical networks, meso-scale structures constructed from pairwise dependencies (e.g., Pearson correlation or mutual information) are systematically biased in ways that fail to represent genuine higher-order interactions that are “greater than the sum of their parts" [8,9]. There remains a need for well-developed mathematical and statistical frameworks to directly assess higher-order relationships in complex systems. Several different approaches to assessing higher-order interactions in data have been proposed in recent years [10], and here we discuss two of the most well-developed: topological data analysis (TDA) [11,12] and multivariate information theory [13–15]. Both formalisms provide frameworks by which higher-order interactions between three or more variables in a dataset can be identified; however, these branches of applied mathematics have developed almost entirely in parallel, with limited cross-fertilization between them. As such, it remains unclear to what extent “higher-order" interactions in the topological sense reflect the same kind of “higher-order interaction" in the information-theoretic sense.

A wrinkle in the problem of comparing topological and information-theoretic approaches to higher-order interaction is that the topological approach only defines one “kind" of higher-order interaction (based on the dimensionality of structures in the data manifold—cycles, voids, connected components, etc.), while the information-theoretic approach often defines two distinct “kinds" of higher-order interactions: redundancy and synergy. Redundancy constitutes information that is duplicated over multiple proper subsets of variables simultaneously: for example, given a set of random variables {X1, X2, X3, …}, the redundant information is that information that could be learned by observing X1 alone or X2 alone or X3 alone and so on. In contrast, synergy constitutes information that is present only in the joint state of multiple variables and cannot be learned by observing any proper subset, i.e., synergy is that information that can only be learned when X1 and X2 and X3 and so on are observed together.

This study is broken into two parts: in the first part, we apply measures of higher-order information to point clouds sampled from manifolds with diverse, known topologies (spheres, toroids, planes, knots, etc.). This lets us build intuition by comparing and contrasting simple, easy-to-understand cases. In the second part, we apply measures from topological data analysis and multivariate information theory to fMRI data recorded from the human brain. The study of higher-order interactions has been particularly popular in neuroscience, as the question of how many interacting “parts" produce a coherent, emergent “whole" is a central question in the field. Both topological and information-theoretic approaches have found empirical links between higher-order interactions and diverse neural phenomena including consciousness [16–19], cognitive performance [20–22], aging [23], neurodegeneration [24–26], structural and genomic data [27], and more. By directly comparing topological and information-theoretic approaches in complex, empirical data, we aim to deepen our understanding of higher-order statistics in naturalistic circumstances.

We now turn to providing brief introductions to the basic machinery of multivariate information and topological data analysis, with a specific focus on how they represent higher-order interactions in data.

1.1 Multivariate information theory and higher-order interactions

For an N-dimensional random variable X = (X1, …, XN), which takes values x from the support set 𝒳 according to the probability distribution P(x), the Shannon entropy of X is defined as:

$$H(X) = -\sum_{x \in \mathcal{X}} P(x) \log P(x) \qquad (1)$$

The entropy can be understood as quantifying an observer’s uncertainty about the state of X. While entropy is commonly presented as a measure of “information", it is better understood as a measure of uncertainty, and information arises from the reduction in uncertainty. We can confirm this intuition by considering the simplest measure of information: the bivariate mutual information:

$$I(X_1; X_2) = H(X_1) - H(X_1 \mid X_2) \qquad (2)$$
$$= H(X_1) + H(X_2) - H(X_1, X_2) \qquad (3)$$

The information that random variable X1 discloses about another random variable X2 (the mutual information) is the difference between our initial uncertainty about the state of X1 (H(X1)) and the uncertainty that remains after learning the state of X2 (the conditional entropy: H(X1|X2)). The mutual information is a symmetric measure that shows how “information" is associated with the reduction in uncertainty.

To go beyond the pairwise case and consider multiple interacting elements simultaneously, several different measures have been introduced, each of which is sensitive to different notions of “structure": the total correlation, the dual total correlation, the O-information, and the S-information. See Rosas et al. [28] for a discussion of the relationship between these measures.

The total correlation [29] (also independently derived as the “integration" [30]) measures the degree to which a multivariate random variable deviates from independence:

$$TC(X) = \left[ \sum_{i=1}^{N} H(X_i) \right] - H(X) \qquad (4)$$

If all variables in X are independent, then TC(X) = 0 bit. Conversely, if all Xi are deterministic functions of each other, then the total correlation achieves its maximum possible value, (N − 1)H(X1). In the case of two variables, the total correlation reduces to the classic, bivariate Shannon mutual information: TC(X1, X2) = I(X1; X2). Rosas et al. described the total correlation as a measure of the “collective constraints" imposed on the whole system: the more constrained the joint distribution, the higher the total correlation [13].
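These two limits can be seen with a naive plug-in estimator on discrete samples (a sketch for intuition; not the paper's KNN estimator), contrasting an XOR triad with three perfect copies of one variable:

```python
from collections import Counter
from math import log2
import random

def entropy(samples):
    """Plug-in Shannon entropy (bits) of a list of hashable outcomes."""
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in Counter(samples).values())

def total_correlation(columns):
    """TC: sum of the marginal entropies minus the joint entropy."""
    joint = list(zip(*columns))
    return sum(entropy(col) for col in columns) - entropy(joint)

random.seed(0)
a = [random.randint(0, 1) for _ in range(4096)]
b = [random.randint(0, 1) for _ in range(4096)]
print(total_correlation([a, b, [x ^ y for x, y in zip(a, b)]]))  # XOR triad: ~1 bit
print(total_correlation([a, a, a]))  # three copies: (N - 1) * H(X1), ~2 bits
```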

The second measure of higher-order information-sharing is the dual total correlation [31]:

$$DTC(X) = H(X) - \sum_{i=1}^{N} H(X_i \mid X_{-i}) \qquad (5)$$

where X−i denotes the joint state of every variable except Xi.

The dual total correlation quantifies all information that is shared by two or more variables, and has become an increasingly popular measure for genuine higher-order interactions in neuroscience [9,24,32]. Like the total correlation, in the case of two variables the dual total correlation reduces to the classic, bivariate Shannon mutual information: DTC(X1, X2) = I(X1; X2), but its limit behavior is more complex than the total correlation's. Like the total correlation, it is zero when all elements are independent, but it is also low (though non-zero) in the case of total synchrony: instead, it is maximized when all elements are integrated by complex, multipartite dependencies [31]. Rosas et al. described the dual total correlation as quantifying the amount of “shared" information present in the system [13], highlighting how different measures of higher-order information quantify different notions of structure.
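The contrasting limit behavior can also be checked numerically. This sketch (plug-in estimator on discrete samples, illustrative only) computes the DTC through the equivalent identity DTC = ∑ H(X−i) − (N − 1)H(X), and shows that the XOR triad, where everything is shared, scores higher than total synchrony:

```python
from collections import Counter
from math import log2
import random

def entropy(samples):
    """Plug-in Shannon entropy (bits) of a list of hashable outcomes."""
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in Counter(samples).values())

def dual_total_correlation(columns):
    """DTC via the identity: sum of leave-one-out joint entropies minus
    (N - 1) times the full joint entropy."""
    n = len(columns)
    joint = entropy(list(zip(*columns)))
    loo = sum(entropy(list(zip(*(columns[:i] + columns[i + 1:]))))
              for i in range(n))
    return loo - (n - 1) * joint

random.seed(0)
a = [random.randint(0, 1) for _ in range(4096)]
b = [random.randint(0, 1) for _ in range(4096)]
print(dual_total_correlation([a, b, [x ^ y for x, y in zip(a, b)]]))  # XOR: ~2 bits
print(dual_total_correlation([a, a, a]))  # total synchrony: ~1 bit, low but non-zero
```

Note the reversal relative to the total correlation: the fully synchronized triad maximizes TC (~2 bits) but not DTC (~1 bit), while the XOR triad does the opposite.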

The third measure is the O-information. First introduced by James and Crutchfield as the “enigmatic information" [33] and later re-examined and renamed by Rosas et al. [13], the O-information is the difference between the total correlation and the dual total correlation:

$$\Omega(X) = TC(X) - DTC(X) \qquad (6)$$

The O-information is a signed measure that quantifies the overall balance of higher-order redundancy and synergy in the system. If Ω(X) > 0, then the system is redundancy-dominated, while if Ω(X) < 0, then the system is synergy-dominated. In the specific case of three variables (which is what is used exclusively in this manuscript), the O-information can be written out as a whole-minus-sum measure:

$$\Omega(X_1, X_2, X_3) = I(X_1; X_2) + I(X_1; X_3) + I(X_2; X_3) - TC(X_1, X_2, X_3) \qquad (7)$$

This definition provides insight into the nature of synergistic information: if there is more deviation from independence at the level of the triad (TC(X1, X2, X3)) than in all pairs of elements (the sum of the pairwise mutual informations), then the “whole" is greater than the “sum of its parts."
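Both routes to the O-information can be verified on the trivariate XOR gate from the introduction; a minimal sketch over the exact distribution (illustrative, not the paper's estimator):

```python
from itertools import product, combinations
from collections import Counter
from math import log2

# Uniform distribution over the trivariate XOR gate: x3 = x1 XOR x2.
states = [(x1, x2, x1 ^ x2) for x1, x2 in product((0, 1), repeat=2)]

def H(idx):
    """Shannon entropy (bits) of the marginal over the given variable indices."""
    counts = Counter(tuple(s[i] for i in idx) for s in states)
    n = len(states)
    return -sum(c / n * log2(c / n) for c in counts.values())

def TC(idx):
    return sum(H((i,)) for i in idx) - H(idx)

def DTC(idx):
    k = len(idx)
    return sum(H(tuple(j for j in idx if j != i)) for i in idx) - (k - 1) * H(idx)

triad = (0, 1, 2)
omega = TC(triad) - DTC(triad)                                 # TC - DTC form
wms = sum(TC(pair) for pair in combinations(triad, 2)) - TC(triad)  # whole-minus-sum
print(omega, wms)  # both -1.0 bit: the XOR triad is synergy-dominated
```

Here every pairwise term vanishes while the triad-level total correlation is one bit, so both expressions agree that the whole exceeds the sum of its parts.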

The final measure is the S-information (first introduced by James et al., using the tongue-in-cheek “very mutual information" [33]) and later explored by Rosas et al. [13]:

$$\Sigma(X) = TC(X) + DTC(X) \qquad (8)$$
$$= \sum_{i=1}^{N} I(X_i; X_{-i}) \qquad (9)$$

The S-information is equal to the sum of the mutual information between each element and the rest of the system. In a sense it quantifies the total amount of information in the structure of the system (although note that the S-information can actually be greater than the entropy, since redundant information gets double-counted).

Comparing different systems with the O-information is non-trivially difficult, and most analyses typically just consider the sign (whether the system is redundancy- or synergy-dominated). The absolute value of the O-information depends on the total deviation from independence, which makes it difficult to compare systems with different values of TC(X) and DTC(X) (see Liardi et al. [34] for a discussion on normalizing information quantities). One possible way to correct for this is by normalizing the O-information:

$$\tilde{\Omega}(X) = \frac{\Omega(X)}{\Sigma(X)} \qquad (10)$$
$$= \frac{TC(X) - DTC(X)}{TC(X) + DTC(X)} \qquad (11)$$

The normalized Ω̃(X) is bounded by the range [–1, 1] (although the bound is not tight), and since the S-information is strictly non-negative, the sign retains the same interpretation. Normalizing by the S-information controls for the variable total amount of information in the higher-order structure, making direct comparisons between different systems more precise.
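For the trivariate XOR gate discussed in the introduction, the S-information and normalized O-information come out as follows (a sketch over the exact distribution, verifying the identity Σ = TC + DTC = ∑ I(Xi; X−i)):

```python
from itertools import product
from collections import Counter
from math import log2

states = [(x1, x2, x1 ^ x2) for x1, x2 in product((0, 1), repeat=2)]
N = 3

def H(idx):
    """Shannon entropy (bits) of the marginal over the given variable indices."""
    counts = Counter(tuple(s[i] for i in idx) for s in states)
    n = len(states)
    return -sum(c / n * log2(c / n) for c in counts.values())

every = tuple(range(N))
tc = sum(H((i,)) for i in every) - H(every)
dtc = sum(H(tuple(j for j in every if j != i)) for i in every) - (N - 1) * H(every)
# S-information as the sum of each element's mutual information with the rest:
sigma = sum(H((i,)) + H(tuple(j for j in every if j != i)) - H(every) for i in every)
print(sigma, tc + dtc)    # both 3.0 bits
print((tc - dtc) / sigma) # normalized O-information: -1/3, synergy-dominated
```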

1.1.1 K-nearest neighbors based information estimators.

Typically, information-theoretic measures are computed on discrete probability distributions, where variables can take on a finite number of mutually-exclusive states. For continuous random variables, Shannon defined the differential entropy:

$$h(X) = -\int_{\mathcal{X}} p(x) \log p(x) \, dx \qquad (12)$$

and from this, continuous analogs of all previously introduced measures can be constructed. Unlike the discrete case, however, the problem of estimating h(X) from data is more involved. If one can assume that the data is multivariate-normal, Gaussian estimators exist and are popular in neuroscience, particularly when using fMRI signals (e.g. [9,37]). However, for continuous, real-valued data that doesn't satisfy parametric assumptions, a comparatively novel class of estimators based on K-nearest-neighbors graphs has been developed.

A detailed formal treatment is beyond the scope of this manuscript, but the fundamental intuitions are reasonably straightforward. Imagine sampling points from some (potentially high-dimensional) probability manifold with non-trivial structure. Sampled points will, necessarily, cluster more in the high-probability regions, while points in low-probability regions will be more isolated. The distance from a given point to its nearest neighbor can then be used to estimate probabilities. The longer the distance, the lower the local probability in that region. This forms the basis of the Kozachenko-Leonenko non-parametric entropy estimator [38,39], and subsequent derivatives. For a visualization, see Fig 1.
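The Kozachenko-Leonenko logic can be sketched from scratch (brute-force neighbor search under the max norm, so only suitable for small clouds; an illustration of the idea, not the JIDT implementation used in the paper):

```python
import heapq
import math
import random

def digamma(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -0.5772156649015329 + sum(1.0 / j for j in range(1, n))

def kl_entropy(points, k=4):
    """Kozachenko-Leonenko differential entropy estimate in nats (max norm)."""
    n, d = len(points), len(points[0])
    log_eps = 0.0
    for i, p in enumerate(points):
        # k-th smallest Chebyshev distance from p to any other point.
        knn = heapq.nsmallest(k, (max(abs(a - b) for a, b in zip(p, q))
                                  for j, q in enumerate(points) if j != i))
        log_eps += math.log(2 * knn[-1])  # eps_i: twice the k-th NN distance
    return -digamma(k) + digamma(n) + d * log_eps / n

random.seed(1)
cloud = [(random.random(),) for _ in range(1000)]  # uniform on [0,1]: true h = 0 nat
h_hat = kl_entropy(cloud)
print(h_hat)  # close to 0 for a large sample
```

Isolated points (long nearest-neighbor distances) push the estimate up, dense regions pull it down, exactly matching the intuition that distance tracks inverse local probability.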

Fig 1. Estimating underlying distributions via K-nearest neighbors measures.

This cartoon demonstrates how discrete K-nearest neighbors analyses can be used to estimate the structure of a continuous, underlying distribution. Consider a bivariate normal distribution (left): if we sample a large number of points from it (center), we see that the density of the point cloud tracks the underlying local probability density around each point. If we then compute the distance to the fourth nearest neighbor, and color the points (right), we see how the distribution of distances roughly recapitulates the underlying bivariate Gaussian.

https://doi.org/10.1371/journal.pcbi.1013649.g001

The original estimators, developed by Kozachenko and Leonenko [38], have since been generalized and refined. The most well-known of these is the Kraskov (or KSG) mutual information estimator [35], which provides a non-parametric estimator of the coupling between two variables. Visualized in Fig 2: for two variables X and Y, the Kraskov estimator counts, for each data point, the number of points along the marginal X and Y axes that fall between the point and its k-th nearest neighbor. This adaptive approach allows estimates of mutual information, and subsequently generalizations such as the total correlation, dual total correlation, O-information, and S-information, directly from point clouds without the need for discretizing the data or imposing parametric assumptions. For formal details of the non-parametric O-information estimator, see Sect 4.1. All measures reported here are implemented in the JIDT package [36]. Distance was defined using the Chebyshev distance for consistency with the TDA analysis using the Ripser package (described below).
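A from-scratch sketch of the first KSG algorithm shows this counting logic (brute force and illustrative only; the paper uses the JIDT implementation):

```python
import heapq
import math
import random

def digamma(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -0.5772156649015329 + sum(1.0 / j for j in range(1, n))

def ksg_mi(xs, ys, k=4):
    """KSG (algorithm 1) estimate of I(X;Y) in nats, Chebyshev distances."""
    n = len(xs)
    acc = 0.0
    for i in range(n):
        # Distance to the k-th nearest neighbour in the joint (x, y) space.
        eps = heapq.nsmallest(k, (max(abs(xs[j] - xs[i]), abs(ys[j] - ys[i]))
                                  for j in range(n) if j != i))[-1]
        # Count marginal neighbours strictly inside that distance on each axis.
        nx = sum(1 for j in range(n) if j != i and abs(xs[j] - xs[i]) < eps)
        ny = sum(1 for j in range(n) if j != i and abs(ys[j] - ys[i]) < eps)
        acc += digamma(nx + 1) + digamma(ny + 1)
    return digamma(k) + digamma(n) - acc / n

random.seed(2)
x = [random.random() for _ in range(800)]
y_ind = [random.random() for _ in range(800)]          # independent: MI ~ 0 nat
y_dep = [xi + 0.05 * random.gauss(0, 1) for xi in x]   # tightly coupled: MI >> 0
mi_ind, mi_dep = ksg_mi(x, y_ind), ksg_mi(x, y_dep)
print(mi_ind, mi_dep)
```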

Fig 2. Kraskov mutual information estimator.

A brief cartoon detailing the basic logic of the Kraskov mutual information estimator [35]. For each point, a diameter is defined by the distance from that point to its fourth nearest neighbor. The estimator then counts the number of points within the diameter projected down to the constituent axes, and from that computes the local (pointwise) mutual information [36]. The expected value over all the pointwise values gives the estimated mutual information.

https://doi.org/10.1371/journal.pcbi.1013649.g002

1.2 Topological data analysis and the Rips filtration

The other data-driven approach to higher-order structures is topological data analysis, which is based on algebraic topology, a field of mathematics that combines techniques from abstract algebra and topology to study the properties of high-dimensional structures and spaces. The core intuition behind TDA is identifying “voids" or “cavities" in the structure of some high-dimensional point cloud. These “forbidden" regions correspond to configurations or states that the system under study is restricted from adopting, suggesting some kind of global integration that governs the structure of the part-whole interactions [11,12].

The basic building block in TDA is a simplex, which is a k-dimensional generalization of a triangle (a 1-simplex is two points connected by a single edge, a 2-simplex is a triangle made of three points and three edges, a 3-simplex is a tetrahedron, and so on). Simplexes can be structured into simplicial complexes, which are collections of intersecting simplexes, with the requirement that any face of the simplicial complex is also a simplex in the complex (i.e., if a tetrahedron is in the complex, then each of its triangular faces is also part of the complex). This recursive structure provides useful mathematical guarantees that can be leveraged to rigorously analyze quantitative data.

There are many tools that have been developed to infer the presence of voids in data. One of the most popular is the Vietoris-Rips filtration [40], which identifies voids by “growing" a simplicial complex from the point cloud. For a visualization, see Fig 3, but briefly: balls are expanded around each point in the point cloud, and if two balls intersect, an edge is drawn between the points (forming a 1-simplex). As the diameter of the balls expands, increasingly distant points are connected and the simplicial complex grows ever denser, until eventually the balls are large enough that the point cloud becomes fully connected.
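The paper tracks three-dimensional cavities with Ripser; a full persistent-homology implementation is long, but the filtration's birth/death bookkeeping can be sketched for the simplest (zero-dimensional, connected-component) features with a union-find pass over edges sorted by length (illustrative only):

```python
import random

def h0_deaths(points):
    """Death diameters of connected components under a Vietoris-Rips filtration.
    Every component is born at diameter 0 and dies when its cluster merges."""
    n = len(points)
    # All pairwise edges, sorted by Chebyshev length: the order of the filtration.
    edges = sorted((max(abs(a - b) for a, b in zip(points[i], points[j])), i, j)
                   for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:            # two components merge: one dies at diameter d
            parent[ri] = rj
            deaths.append(d)
    return deaths               # n - 1 merge events; one component never dies

# Two well-separated clusters: the longest-lived feature dies at the gap scale.
random.seed(3)
cloud = ([(random.gauss(0, 0.1), random.gauss(0, 0.1)) for _ in range(30)] +
         [(random.gauss(5, 0.1), random.gauss(0, 0.1)) for _ in range(30)])
deaths = h0_deaths(cloud)
print(max(deaths))  # roughly 5: the inter-cluster distance
```

The same birth/death logic, lifted to triangles and tetrahedra, is what detects loops and enclosed cavities in higher dimensions.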

Fig 3. Rips filtration.

Consider a two-dimensional point cloud arranged into a rough ring (the same cloud as used in Fig 2). Around each point, balls (blue circles) are expanded, and when two balls intersect, an edge is drawn between the points. When the diameter is low, the simplicial complex is disconnected, composed of small, tree-like structures. As the balls expand, the simplicial complex becomes denser, and large-scale structures in the data are revealed (in this case, the central void). Eventually, the diameters will be so large that the whole complex will be densely connected and the void will close.

https://doi.org/10.1371/journal.pcbi.1013649.g003

Voids in the point cloud can be identified by the diameter of the balls at the moment that they form (birth), and the diameter of the balls at the moment they become “filled in" (death). This defines the persistence lifetime of the void. From this basic framework, a large number of statistics can be calculated, including the maximum persistence (lifetime of the longest-lived void), the number of unique voids, and more complex derivatives such as the persistence entropy or Betti curves [19]. Here we focus on the average persistence and the number of unique, three-dimensional voids, as basic, easy-to-understand summary statistics describing the higher-order topology of the cloud.
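Given a persistence diagram as a list of (birth, death) pairs, these two summary statistics reduce to a few lines (a sketch; with the ripser Python package, the H2 pairs would come from a call like ripser(cloud, maxdim=2)['dgms'][2], an assumed call signature):

```python
def persistence_stats(pairs):
    """Average lifetime and count of the finite features in a persistence diagram."""
    lifetimes = [death - birth for birth, death in pairs
                 if death != float("inf")]
    if not lifetimes:
        return 0.0, 0
    return sum(lifetimes) / len(lifetimes), len(lifetimes)

# Hand-made diagram: two finite voids and one feature that never closes.
avg, count = persistence_stats([(0.1, 0.9), (0.2, 0.3), (0.5, float("inf"))])
print(avg, count)
```

Features with infinite death (those still open when the filtration ends) are excluded from the average, as their lifetime is undefined.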

2 Results

2.1 Manifolds with known structure

We begin by exploring a set of manifolds embedded in a three-dimensional space with known structure and topological properties. These can serve as intuition pumps, helping us ground the abstract mathematics of higher-order information in intuitively easy-to-grok examples; they are visualized in Fig 4.

Fig 4. O-information of point-clouds with known structure.

Top row: Two point clouds that display only “contextual" higher-order information. The one-dimensional line, when embedded in a three-dimensional space, is highly redundant, but after rotation with a PCA, all higher-order information is obliterated, as all the information can be represented by one dimension. Similarly, the two-dimensional plane is synergy-dominated when embedded in a three-dimensional space, but also loses its higher-order structure after rotation with PCA. Middle row: Two shapes that have “intrinsic" synergy associated with three-dimensional cavities. The sphere is perfectly rotationally symmetric, and so no rotation changes the value of the O-information, while the toroid contains a mixture of contextual and higher-order information. Bottom row: Two shapes that contain intrinsic redundancy: a trefoil knot, and its generalization, the p,q-knot (p=5, q=3). These curves are locally line-like, but cannot be losslessly embedded in a lower-dimensional space.

https://doi.org/10.1371/journal.pcbi.1013649.g004

We begin with the simplest possible three-dimensional point cloud: a one-dimensional line embedded in a three-dimensional space. A one-dimensional random variable, X1, was defined by uniformly sampling 10,000 points on the [-1,1] interval. X1 was then copied two times to construct a three-dimensional random variable where X1 = X2 = X3. We hypothesized that this system should have very high redundancy, as knowing the state of any Xi immediately resolves all uncertainty about the state of Xj and Xk. This was borne out: the KNN-based O-information estimator found a strongly redundancy-dominated structure ( nat). Despite this apparent higher-order information, however, the point cloud itself is fundamentally low dimensional: it is possible to rotate it so that all of the variance falls along the first dimension using PCA. After doing so, the O-information drops to 0. The apparent higher-order information was “contextual": it depended on how the point cloud was oriented in three-dimensional space, rather than being “intrinsically" higher-order. This will become a recurring theme.

We can see the same phenomenon occur with another very simple manifold: a two-dimensional plane, rotated to be embedded in a three-dimensional space. To construct such a plane, X1 and X2 were independently sampled from the interval [–1,1] (10,000 points each). Then the whole plane was rotated 45° along each axis, embedding the plane in a three-dimensional space. This embedded plane had a strongly negative O-information ( nat); however, once again, after rotation using PCA, the O-information dropped to 0. Where does this apparent higher-order synergy come from? It emerges from the fact that, since the plane is a two-dimensional shape rotated into a three-dimensional space, knowing the joint state of X1 and X2 simultaneously is enough to uniquely specify the state of X3. However, knowing either X1 alone or X2 alone resolves very little uncertainty about X3, since for any given value of X1 alone or X2 alone, there are many possible values of X3. When the plane is rotated with PCA, though, it becomes re-embedded in its “natural" two-dimensional space, and there is no possibility of higher-order structure in a two-dimensional space. Once again, we see a distinction between “contextual" higher-order information, and “intrinsic" higher-order information.

Can we construct a point cloud that has “intrinsic" higher-order information? Yes. Consider a hollow sphere. The sphere itself strongly deviates from global independence (i.e., TC(X) ≫ 0), as it is hollow: the points are constrained to an infinitely thin shell enclosing an empty void. However, if we project the sphere down onto any two-dimensional plane, the result is (approximately) a filled circle. We can demonstrate this by sampling 10,000 points from a sphere centered on the origin, and with radius of 1. The resulting O-information is strongly negative (Ω = –1.384 nat), and crucially, since the sphere is radially symmetric along all axes, putting it through a PCA has absolutely no effect on the O-information at all. It remains –1.384 nat. To test whether it was the presence of the cavity specifically that drove the higher-order synergy (rather than the rotational symmetry), we sampled 10,000 points from a ball with the same dimensions as the sphere, and computed the O-information before and after rotation with PCA. As expected, the ball had vanishing O-information ( nat, possibly due to bias or variance in the estimator). It did retain the rotational symmetry however: after PCA, the O-information remained the same: ( nat). Based on these results, we argue that it is the topological feature of the hollow sphere (the empty cavity) that drives the higher-order synergy, while the geometric rotational symmetry makes it robust to rotation. For further analysis of the sphere, see SI Text 1.
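The sphere experiment is easy to reproduce in outline. The following sketch (illustrative, not the paper's code) samples uniformly from the unit sphere by normalizing isotropic Gaussian draws, and confirms that the shell is infinitely thin even though its two-dimensional projection fills a disc:

```python
import math
import random

def sphere_point():
    """Uniform sample on the unit 2-sphere: normalize an isotropic Gaussian draw."""
    v = [random.gauss(0, 1) for _ in range(3)]
    r = math.sqrt(sum(c * c for c in v))
    return tuple(c / r for c in v)

random.seed(4)
cloud = [sphere_point() for _ in range(10000)]
radii = [math.sqrt(x * x + y * y + z * z) for x, y, z in cloud]
near_axis = sum(1 for x, y, z in cloud if x * x + y * y < 0.09)
print(min(radii), max(radii))  # every point sits on the shell: both ~1.0
print(near_axis)               # yet hundreds of projected points land near (0, 0)
```

Knowing (x, y) pins down z up to sign, but knowing x alone says almost nothing about z: the synergy lives in the joint constraint of the shell.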

We can further demonstrate this using another shape with a non-trivial topology that includes a three-dimensional cavity: a hollow torus (an inflatable inner-tube). We randomly sampled 10,000 points from the surface of a torus with a minor radius of 0.5 and a major radius of 1, and then rotated the torus radians along every axis. The resulting O-information was strongly negative ( nat). When we rotated it with PCA, there was some decrease in synergy, but it remained significantly negative ( nat). Further rotation around the third principal component (which emerges perpendicular to the central “hole" in the donut) does not change the O-information.
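Sampling the torus surface uniformly requires a small correction, since the outer rim carries more surface area than the inner rim. A rejection-sampling sketch (radii as in the text; illustrative only, not the paper's code):

```python
import math
import random

def torus_point(R=1.0, r=0.5):
    """Area-uniform sample on a torus surface (major radius R, minor radius r)."""
    while True:
        theta = random.uniform(0, 2 * math.pi)  # minor (tube) angle
        # Local area element is proportional to R + r*cos(theta): reject to match.
        if random.uniform(0, R + r) <= R + r * math.cos(theta):
            break
    phi = random.uniform(0, 2 * math.pi)        # major (ring) angle
    w = R + r * math.cos(theta)
    return (w * math.cos(phi), w * math.sin(phi), r * math.sin(theta))

random.seed(5)
cloud = [torus_point() for _ in range(10000)]
ring = [math.sqrt(x * x + y * y) for x, y, z in cloud]
print(min(ring), max(ring))  # all points within the shell: [R - r, R + r]
```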

Like with the hollow sphere, we can assess how the properties of the torus change if we “fill in" the central, three-dimensional cavity (instead of an inner-tube, picture a solid cake doughnut). If we sample 10,000 points from the interior of a torus with the same dimensions and do the same rotation as before, we see nat; almost an order of magnitude less than the hollow torus. If we rotate the filled torus using PCA, the O-information drops further: nat, not significantly different from the ball. We can see then that the torus displays a mixture of contextual and intrinsic synergy: it is not completely rotationally symmetric the way the sphere is, but the structure of the internal cavity provides some intrinsic synergy.

Can we generate a system with intrinsic redundancy? We have seen that the most obvious redundant surface (a straight line) does have significant redundancy, but that it can be rotated away, making it non-intrinsic. We hypothesized that a point cloud with intrinsic redundancy would have to be low-dimensional (locally line-like), but embedded in three-dimensional space in such a way that it could not be losslessly projected down into a lower-dimensional subspace. The natural candidate is a knot: a one-dimensional curve embedded in space in such a way that it never self-intersects. To test this, we generated 10,000 points along a trefoil knot (sometimes also called a triquetra). The trefoil knot is locally one-dimensional, but it requires being embedded in a three-dimensional space to ensure that it never self-intersects. A trefoil knot, rotated by radians, has a strongly redundant structure ( nat). When put through a PCA, the redundancy decreases; however, it remains well above zero ( nat). This shows that the simple knot does have intrinsic redundancy.
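For reference, one standard parametrization traces the trefoil directly (a sketch; the sampling details in the paper may differ):

```python
import math
import random

def trefoil_point():
    """Point on a trefoil knot via one standard parametrization."""
    t = random.uniform(0, 2 * math.pi)
    return (math.sin(t) + 2 * math.sin(2 * t),
            math.cos(t) - 2 * math.cos(2 * t),
            -math.sin(3 * t))

random.seed(6)
cloud = [trefoil_point() for _ in range(10000)]
# Locally one-dimensional (a single parameter t generates each point), yet the
# curve cannot be flattened into a plane without self-intersection.
print(len(cloud))
```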

The trefoil knot is a special case of a larger family of knots, known as p,q-knots (also known as torus knots). These knots provide an interesting counter to the finding that hollow toroids are synergy-dominated, as the p,q-knots are all embedded into the surface of a toroid. Consequently, they have superficially similar global structures, but markedly different mathematical structures. To test whether intrinsic redundancy was unique to the trefoil or not, we generated a 5,3-knot, rotated it in the same way as before, and found that it too was significantly redundancy-dominated ( nat), and that this redundancy persisted after rotation with PCA ( nat).

These results help us build a foundation of intuition for how to think about the relationship between topological structure and higher-order information. Several basic features are worth considering. The first is that, while the topology of a point cloud is invariant to how the cloud is rotated, the higher-order information is not. This leads to a distinction that can be made between “intrinsic" higher-order information, which is that information that is fundamentally tied to the structure of the point cloud regardless of its orientation, and “contextual" higher-order information, which depends on the specific orientation of the data. The second point is that synergistic information is associated with three-dimensional cavities, or voids, in the data. This suggests a natural link between higher-order synergistic information and topological data analysis.

2.2 fMRI data analysis

In addition to exploring constructed manifolds with known structure, we also considered how higher-order information and topological data analysis interact in a more naturalistic case. Specifically, we explored multivariate time series data taken from a set of four concatenated resting-state fMRI scans from a single individual [41]. Neuroscience as a field has been a testbed for many cutting-edge approaches to studying polyadic interactions between multiple elements, including both information-theoretic and topological approaches. Both information theory and topological data analysis have been found to be informative about diverse cognitive processes, including loss of consciousness [17–19] and aging [23,25].

Briefly, we computed the O-information, total correlation, and dual total correlation using a K-nearest neighbors-based generalization of the Kozachenko-Leonenko entropy estimator [38,39], as implemented by the JIDT package [36], for all triads of three brain regions (1,313,400 sets of three) from a single subject (four concatenated fMRI scans, totaling 4,400 frames). These three-dimensional time series form a point cloud in a three-dimensional space, analogous to the point clouds described above (see Fig 5). From this large sample, we selected those triads that showed an O-information significantly greater than, or less than, the expected null O-information computed from an ensemble of autocorrelation-preserving null models (for details, see Materials and Methods). The result was a set of 30,100 significantly redundancy-dominated triads and 6,200 significantly synergy-dominated triads. For each significant triad, we also computed the information-theoretic metrics on the point cloud after it was rotated with principal component analysis. For the topological data analysis, we computed the average persistence time for three-dimensional cavities and the total number of three-dimensional cavities using the Ripser package for persistent homology [42]. For both information-theoretic and topological data analyses, distance between points was defined with the Chebyshev metric. The various features (information metrics and topological properties) were then correlated against each other using a Spearman correlation.

thumbnail
Fig 5. Sampling triads and computing measures of higher-order structure.

A: Sets of three brain regions are sampled from the cerebral cortex and their associated BOLD time series are extracted. B: Those time series can be represented as a point cloud embedded in a three-dimensional space (as in [19]), with each point encoding the joint state of the three cortical parcels at time t. C: That point cloud can then be analyzed using persistent homology or multivariate information-theoretic measures.

https://doi.org/10.1371/journal.pcbi.1013649.g005

To ensure consistency, we replicated all analyses with a second single subject (see SI Text 3). All the patterns described here for the first subject are strongly replicated in the second subject, suggesting that the relationships described below are robust. We also replicated all of the analyses done using the average persistence with the maximum persistence (see SI Text 2). While the average persistence summarizes the distribution of lifetimes of all cavities, the maximum persistence considers only the largest (longest-lived) cavity. As with the additional-subject replication, we found that the same relationships appeared and all were significant (albeit slightly weaker).

2.2.1 Synergy is associated with three-dimensional cavities.

To assess how higher-order information was correlated with higher-order topological features, we sampled triads (groups of three brain regions) and computed both the non-parametric information-theoretic measures (total correlation, dual total correlation, O-information) and the persistent homology features of each three-dimensional point cloud. To disambiguate between higher-order redundancies and synergies, we separated the triads into those that had statistically significant positive O-information (N = 30,100 triads) and statistically significant negative O-information (N = 6,200 triads). We then correlated the information-theoretic features against the topological features using the non-parametric Spearman’s correlation coefficient.

We found that for both redundant and synergistic triads, there was a significant negative correlation between average persistence time of three-dimensional voids and both total correlation (redundancy-dominated: Spearman’s , p < 10−100, synergy-dominated: Spearman’s , p < 10−100) and dual total correlation (redundancy-dominated: Spearman’s , p < 10−100, synergy-dominated: Spearman’s , p < 10−100). For visualization see Fig 6. Curiously, this effect reverses if the point cloud is rotated using a principal component analysis: the correlations between the information measures and the average persistence become significantly positive (albeit much weaker) for both total correlation (redundancy-dominated: Spearman’s , p < 10−100, synergy-dominated: Spearman’s , p < 10−100) and dual total correlation (redundancy-dominated: Spearman’s , p < 10−100, synergy-dominated: Spearman’s , p < 10−100).

thumbnail
Fig 6. Total correlation and dual total correlation are inversely associated with higher-order topological features.

Left column: For redundancy-dominated triads, for both total correlation and dual total correlation, there was a negative relationship between higher-order information and both features of higher-order topology (average persistence and number of voids). Right column: The same pattern (albeit weaker) was also seen in the synergy-dominated triads. This effect was reversed after rotating the point clouds with principal component analysis (for visualization, see SI Text 5).

https://doi.org/10.1371/journal.pcbi.1013649.g006

The same pattern was true when considering the raw number of three-dimensional voids. For both redundant and synergistic triads, there was a significant negative correlation between total number of three-dimensional voids and both total correlation (redundancy-dominated: , p < 10−100, synergy-dominated: , p < 10−100) and dual total correlation (redundancy-dominated: Spearman’s , p < 10−100, synergy-dominated: Spearman’s , p < 10−100). Once again this relationship reversed after rotating the point cloud with a PCA transformation: the correlations between the information measures and the number of voids become significantly positive for both total correlation (redundancy-dominated: Spearman’s , p < 10−100, synergy-dominated: Spearman’s , p < 10−100) and dual total correlation (redundancy-dominated: Spearman’s , p < 10−100, synergy-dominated: Spearman’s , p < 10−100).

Continuing with the distinction introduced above between “contextual" higher-order information and “intrinsic" higher-order information, these results show that the relationship between different measures of higher-order structure in data (topological and information-theoretic) is more complex than it might seem at first blush. The contextual total correlation and dual total correlation have a different relationship with the three-dimensional topology of the point cloud than the intrinsic total correlation and dual total correlation.

When we consider the normalized O-information (see Fig 7), we find a strong, negative correlation with the average persistence of a three-dimensional void through the Rips filtration (redundancy-dominated: Spearman’s , p < 10−100, synergy-dominated: Spearman’s , p < 10−100). Importantly, unlike with the total correlation and dual total correlation, the direction of the relationship did not change after rotating the point cloud using PCA, although the relationships became considerably weaker (redundancy-dominated: Spearman’s , p < 10−100, synergy-dominated: Spearman’s , p < 10−100). Similarly, there is a robust negative correlation between normalized O-information and the number of voids (redundancy-dominated: Spearman’s , p < 10−100, synergy-dominated: Spearman’s , p < 10−100). As with the average persistence, the direction of the correlation did not change after rotating the point cloud with PCA, although the relationships became considerably weaker (redundancy-dominated: Spearman’s , p < 10−100, synergy-dominated: Spearman’s , p < 10−7).

thumbnail
Fig 7. Higher-order topological features are inversely correlated with normalized O-information.

Right column: For redundant triads, there were significant negative correlations between the normalized O-information and the topological features, both with (bottom two plots) and without (top two plots) rotation with PCA. Left column: The same pattern was present in the synergistic triads, although after PCA rotation the negative correlations became much weaker (though still highly significant). Collectively, these results show that the amount of higher-order information in a point cloud is correlated with the presence of higher-order topological features, regardless of how the data is rotated (intrinsic synergy and redundancy).

https://doi.org/10.1371/journal.pcbi.1013649.g007

These results show that more negative normalized O-information (i.e. greater synergy dominance) is associated with larger numbers of three-dimensional voids, and longer-lived (i.e. larger) voids as well. Similarly, more positive normalized O-information (i.e. greater redundancy-dominance) is associated with smaller, less numerous cavities. When considered in the context of the results of the analysis of spheres and toroids, these findings suggest that there is a link between the presence of higher-dimensional topological features (specifically cavities) and higher-order information: the presence of one is correlated with greater incidence of the other.

2.2.2 Low-dimensional manifold analysis fails to represent higher-order structures.

A very common approach in modern neuroscience is to use manifold learning to extract a low-dimensional representation of highly multivariate data [43–46]. It is generally the case that high-dimensional data can be projected down onto a much lower-dimensional manifold that preserves a significant fraction of the total variance. However, it is unclear how higher-order information (synergies and redundancies) is represented by these transformations. To test this in the fMRI data, we correlated the proportion of total variance explained by the first principal component of each point cloud against the O-information and topological features. For visualization of the results, see Fig 8.
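The compressibility measure used here (the proportion of variance explained by the first principal component) can be sketched for a single triad. This is a minimal illustration on synthetic data; the function name and the toy "triads" are ours:

```python
import numpy as np

def first_pc_variance_fraction(cloud):
    """Fraction of total variance captured by the first principal
    component of a (T x 3) point cloud."""
    centered = cloud - cloud.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(centered.T))
    return eigvals.max() / eigvals.sum()

rng = np.random.default_rng(0)
z = rng.normal(size=(2000, 1))
# three near-copies of a shared signal: highly compressible
redundant = z + 0.1 * rng.normal(size=(2000, 3))
# three independent channels: first PC should explain roughly 1/3
independent = rng.normal(size=(2000, 3))
```

For the redundant triad the fraction approaches 1, while for the independent triad it hovers near 1/3, mirroring the observation in Fig 8 that the most synergistic triads look "random" to PCA.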

thumbnail
Fig 8. Synergy is not captured by low-dimensional manifolds.

Top left: For significantly redundancy-dominated triads, the normalized O-information is strongly correlated with the variance explained by the first principal component, indicating that as the overall amount of redundancy increases, the degree of compressibility does as well. Top right: The same pattern can be seen in synergy-dominated triads as well: a more strongly negative value is associated with decreasing variance explained by the first principal component. Synergistic information is, in a sense, incompressible. Note that, in a totally unstructured system, the variance explained by the first principal component would be , which is within the range of observed values for the most synergistic triads, suggesting that the PCA “sees" these triads as random, despite a strong (higher-order) deviation from independence. Bottom: The distributions of the amount of variance explained by the first principal component are also significantly different between redundancy- and synergy-dominated triads, with redundant triads collectively having greater average compressibility than synergistic ones (KS = 0.97, p < 10−100).

https://doi.org/10.1371/journal.pcbi.1013649.g008

For redundant and synergistic triads, there was a significant, positive correlation between the variance explained by the first principal component and both the total correlation (redundancy-dominated: Spearman’s , p < 10−100, synergy-dominated: Spearman’s , p < 10−100) and dual total correlation (redundancy-dominated: Spearman’s , p < 10−100, synergy-dominated: Spearman’s , p < 10−100). This is as expected, since both total correlation and dual total correlation track different notions of deviation from independence (both measures are zero when every element is independent of every other).

We found that for both redundancy-dominated and synergy-dominated triads, there was a strong, positive correlation between normalized O-information and the variance explained by the first principal component (redundancy-dominated: , p < 10−100, synergy-dominated: Spearman’s , p < 10−100). This result shows that the more negative the normalized O-information is (corresponding to comparatively more synergy-dominated structure), the less variance is explained by the first principal component. Conversely, triads with strongly positive normalized O-information (corresponding to more redundancy-dominated structure) were more amenable to lossless dimensionality reduction.

Similar patterns were observed in the TDA measures. For both redundant and synergistic triads, there was a strong negative correlation between the average persistence time of three-dimensional voids and the variance explained by the first PC (redundancy-dominated: Spearman’s , p < 10−100, synergy-dominated: Spearman’s , p < 10−100). The same pattern was true, albeit weaker, when considering the relationship between the total number of three-dimensional voids and the variance explained by the first PC (redundancy-dominated: , p < 10−100, synergy-dominated: Spearman’s , p < 10−100).

Collectively, these results show that PCA-based approaches to manifold learning fail to capture higher-order, synergistic, and topologically rich features of the data. Low-dimensional manifolds seem to preferentially represent redundant dependencies (typically in the form of synchronized components) and are blind to synergies. This is consistent with prior work from functional connectivity (FC) analysis, which found that FC networks also preferentially represent redundancies and miss synergies [8,9]. To reinforce this, we also replicated the functional connectivity results first reported in [9], this time using the nearest-neighbor estimators rather than the Gaussian estimators (see SI Text 4), suggesting that these results are general and not specific to a given class of estimators.

3 Discussion

In this paper, we explored the relationships between two different approaches to characterizing higher-order structures in data: multivariate information [15] and topological data analysis [11,12]. Despite hailing from very different mathematical lineages, we found that topological and information-theoretic approaches to higher-order structures are related. By first analyzing point clouds with known topologies (spheres, planes, hollow toroids, and knots), and then naturalistic data collected from human brain activity, we find evidence of key similarities (and differences) between the two approaches to higher-order structure in multivariate data.

The most significant finding is that statistical synergy (information in the “whole" but none of the “parts") is associated with three-dimensional cavities in the point clouds. In both spheres and toroids, hollow cavities are associated with significantly greater synergy than their solid counterparts. In the fMRI data, both the number of voids and the average persistence were negatively correlated with normalized O-information, suggesting that both more cavities, and longer-lived cavities, appeared in data with more synergy. This suggests that, in a fundamental way, topological data analysis and synergistic information theory are looking at the same kind of underlying structure. Intuitively, one might understand this link through Gauss’s famous Theorema Egregium [47], which implies that one cannot project a three-dimensional globe onto a two-dimensional surface without deformation: a sphere and a plane are not isometric. This notion of synergistic information present in three dimensions that is invariably lost when projecting down onto a lower-dimensional space may form the foundation for future methods of estimating synergistic information based on projection distortions. Most information-theoretic treatments of synergy are either coarse (giving a redundancy-synergy balance), like the O-information, or take a redundancy-first approach that implicitly defines synergy as “that information that is left over when all the simpler redundancies are partialed out" [14]. Truly synergy-first approaches are less well-developed and remain an outstanding problem for the field (for examples, see [48,49]).

The finding that redundancy was associated with knots was an unexpected and intriguing one. Recall that the O-information is the difference between the total correlation (TC) and the dual total correlation (DTC): Ω = TC − DTC. Following the interpretation detailed by Rosas et al., [13], we suggest that the structure of the knot can be understood as being dominated by “collective constraints" (the total correlation) versus “shared information" (the dual total correlation). Since the knot must be locally one-dimensional at any point, for a given value of X1 in the knot, the set of possible values of X2 and X3 is profoundly constrained by the requirement of local linearity. The collective constraints on the knot are greater than any information shared between the individual Xi, leading to a strongly positive O-information.

We also introduce the distinction between contextual and intrinsic higher-order information. Intrinsic higher-order information is that information which is “built into" the structure of the point cloud and persists regardless of how the point cloud is rotated. In contrast, contextual higher-order information depends on the specific ways the point cloud loads onto the axes that define the embedding space. Different structures can have either intrinsic synergy or intrinsic redundancy, with intrinsic synergy being associated with rotationally-symmetric three-dimensional cavities and intrinsic redundancy being associated with knots. In the fMRI data, we observed a mixture of intrinsic and contextual higher-order information: associations between measures generally remained significant after rotation with PCA, although their strength became much weaker. The significance of these two flavors of higher-order interaction in neural data remains mysterious.

The finding that manifold learning algorithms like PCA represent redundancies and penalize synergies is consistent with a developing literature exploring how many analyses popular in complex systems and computational neuroscience are preferentially biased towards redundancy. Prior work on functional connectivity networks, both theoretical and empirical, has found that statistical networks also reflect higher-order redundancies and are largely blind to synergies [8,9]. Recent work using Ising and complex contagion models has also found that lower-order statistics are incapable of correctly identifying higher-order group interactions in complex systems [50], further motivating work on the development of measures of “genuine higher-order interactions." The fact that this also appears to be the case for manifold-learning approaches suggests that synergistic information represents a largely unexplored “shadow structure" in brain data, missed by currently popular methods like functional connectivity and manifold learning.

This approach has some limitations that are worth discussing. The first is that no formal proofs relating information theory and topology are provided—as such, these relationships are all correlational in nature, making this more of an exercise in “experimental mathematics" than a rigorous treatment of the two fields. We conjecture that formal links may be derivable, although this is beyond the scope of this project. This project is also limited by the computational costs of doing both TDA and non-parametric information-theoretic analyses of large datasets. It was not practical to explore triads with more than ≈ 1,000 samples, which means that the estimates of the underlying manifold may be under-powered. Prior work on synergistic information in fMRI data has been able to leverage hundreds of thousands of samples [8,9] by using discrete or parametric Gaussian estimators. That was not possible here, and so the results should be interpreted in this context, although we feel that the strength of the observed relationships, as well as the consistency between the fMRI results and the shape results, lends credence to our conclusions.

A distinction should also be drawn between higher-order “mechanisms" (the physical processes that instantiate the system) and higher-order “behaviors" (the statistical features of data recorded from the system) [7,50]. Both multivariate information theory and topological data analysis are only sensitive to “behaviors", and cannot assess the causal mechanisms that generated the data (although there have been attempts to modify information-theoretic analyses to more directly explore causal relationships, see: [51,52]). This is a key distinction, as lower-order mechanisms can produce higher-order behaviors [7,53], and the relationship between higher-order mechanisms and behaviors is complex and nuanced. Future work developing approaches that can assess when higher-order behaviors are “spurious" versus generated by truly higher-order mechanisms will be important for understanding the role of integrative processes in biology and neuroscience.

Finally, this paper is the most recent in a string of studies that connect the idea of information-theoretic synergy to other concepts from mathematics and complex systems. Synergistic information has been linked to ideas in causal inference, with synergies being indicative of causal colliders [28,52]. Synergy has also been linked to chaos in dynamical systems: Boolean networks evolved for highly synergistic dynamics are generally highly chaotic, displaying classic features such as sensitivity to perturbation, long transients, and high-entropy dynamics [54]. This study continues in this vein by linking synergistic information to higher-order topological features.

4 Methods

Ethics declaration.

No new data was collected for the purposes of this study. All HCP participants provided written informed consent to data collection protocols approved by the Ethics Committee of the Montreal Neurological Institute and Hospital.

4.1 Nearest-neighbor estimators for the O-information

One of the key methodological requirements for this study is a reliable estimator of high-order interactions in high-dimensional, non-linear distributions. Common methods such as Gaussian [36] or kernel [55] estimators are not suitable for this task (Gaussian methods because they are linear, kernel methods because they are very sensitive to hyperparameter choices)—leaving nearest-neighbor estimators as the natural choice.

Following the seminal work of Kraskov et al. [35], the nearest-neighbor (NN) mutual information estimator consists primarily of two steps: i) a neighbor search, to find the distance from each point to its nearest neighbor; and ii) a range search, to count the number of points within this distance in each dimension (see Fig 2 for visualization). These counts are then used to compute the pointwise mutual information, as outlined in Sect 1.1.1 and described in detail in the original paper [35].
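As a concrete illustration of these two steps, the following is a brute-force sketch of a Kraskov-style "Algorithm 1" mutual information estimator using the Chebyshev metric. The function and variable names are ours, and a production analysis would use the JIDT toolbox; this version trades the efficient tree-based searches for explicit distance matrices for clarity:

```python
import numpy as np
from scipy.special import digamma

def ksg_mutual_information(x, y, k=4):
    """Kraskov-style MI estimate: one neighbor search in the joint
    space, then range searches (counts) in each marginal."""
    T = len(x)
    dx = np.abs(x[:, None] - x[None, :])      # pairwise marginal distances
    dy = np.abs(y[:, None] - y[None, :])
    joint = np.maximum(dx, dy)                # Chebyshev (max-norm) metric
    np.fill_diagonal(joint, np.inf)
    # step (i): distance from each point to its k-th nearest joint neighbor
    eps = np.sort(joint, axis=1)[:, k - 1]
    # step (ii): count points strictly within eps in each marginal
    np.fill_diagonal(dx, np.inf)
    np.fill_diagonal(dy, np.inf)
    kx = (dx < eps[:, None]).sum(axis=1)
    ky = (dy < eps[:, None]).sum(axis=1)
    return digamma(k) + digamma(T) - np.mean(digamma(kx + 1) + digamma(ky + 1))

rng = np.random.default_rng(42)
x = rng.normal(size=1000)
y = 0.8 * x + 0.6 * rng.normal(size=1000)     # analytic MI = -0.5*ln(0.36) ≈ 0.51 nat
mi_dependent = ksg_mutual_information(x, y)
```

The estimate for `mi_dependent` should land near the analytic value, while independent inputs yield an estimate near zero.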

A naive approach to estimating the O-information with NN methods would be, for example, to estimate each mutual information separately (using the Kozachenko-Leonenko [38] or Kraskov [35] algorithms) and then add these to form the O-information. Unfortunately, this would result in a very high bias due to the neighbor searches taking place in spaces of different dimensionality (as already pointed out by Kraskov et al. [35]). Our goal, then, is to formulate a nearest-neighbor O-information estimator that involves only one neighbor search, so that bias is kept to a minimum.

This can be achieved thanks to the extension to Kraskov et al.’s “Algorithm 1” [35] for entropy combinations provided by Gómez-Herrero et al. [56]. To apply this estimator, we begin from the expression of the O-information in terms of marginal entropies [13, Def. 1]:

\Omega(\mathbf{X}) = (N-2)\,H(\mathbf{X}) + \sum_{i=1}^{N}\left[H(X_i) - H(\mathbf{X}_{-i})\right] \quad (13)

Through basic arithmetic, this equation can be mapped to Eq (1) of Ref [56], confirming that the O-information is indeed an entropy combination (scaled by a multiplicative factor). Applying this to Eq (3) of [56], we obtain the following O-information NN estimator:

\hat{\Omega}(\mathbf{X}) = (N-2)\left[\psi(T) - \psi(k)\right] + \left\langle \sum_{i=1}^{N}\left[\psi(k_{-i}(t)+1) - \psi(k_i(t)+1)\right] \right\rangle_t \quad (14)

where T is the number of samples, k is the number of nearest neighbors, ψ is the digamma function, and ⟨·⟩t denotes a sample average across samples indexed by t. Denoting by ε(t) the distance between point t and its k-th nearest neighbor in the joint space X, ki(t) denotes the number of points within distance ε(t) of point t in the marginal space Xi—and similarly for k−i(t) in the marginal space X−i.

Therefore, the formulation in Eq (14) allows us to estimate O-information using 2N range searches (N for the one-dimensional marginals Xi and N for the (N–1)-dimensional marginals X−i) and, crucially, only one neighbor search—providing a much less biased estimator of the O-information.
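A brute-force sketch of this estimator (one neighbor search in the joint space, followed by range counts in each one-dimensional and each (N–1)-dimensional marginal) might look as follows. This is our own illustrative reconstruction, not the JIDT implementation; the digamma offsets follow the Kraskov "Algorithm 1" substitution discussed above:

```python
import numpy as np
from scipy.special import digamma

def o_information_knn(data, k=4):
    """Sketch of a single-neighbor-search O-information estimator
    (cf. Eq 14): one k-NN search in the joint space, then range
    counts in each 1D marginal X_i and each (N-1)-D marginal X_{-i}."""
    T, N = data.shape
    diffs = np.abs(data[:, None, :] - data[None, :, :])   # per-dimension distances
    joint = diffs.max(axis=2)                             # Chebyshev metric
    np.fill_diagonal(joint, np.inf)
    eps = np.sort(joint, axis=1)[:, k - 1]                # k-th neighbor distance
    omega = (N - 2) * (digamma(T) - digamma(k))
    for i in range(N):
        d_i = diffs[:, :, i].copy()
        np.fill_diagonal(d_i, np.inf)
        k_i = (d_i < eps[:, None]).sum(axis=1)            # counts in X_i
        d_rest = np.delete(diffs, i, axis=2).max(axis=2)
        np.fill_diagonal(d_rest, np.inf)
        k_rest = (d_rest < eps[:, None]).sum(axis=1)      # counts in X_{-i}
        omega += np.mean(digamma(k_rest + 1) - digamma(k_i + 1))
    return omega

rng = np.random.default_rng(0)
z = rng.normal(size=(600, 1))
redundant = z + 0.2 * rng.normal(size=(600, 3))   # three noisy copies: Omega > 0
independent = rng.normal(size=(600, 3))           # Omega ~ 0
```

On the redundant triad the estimate is strongly positive, while on the independent triad it sits near zero, consistent with the sign convention of the O-information.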

The resulting algorithm is published as part of the JIDT toolbox [36] and is publicly available at github.com/jlizier/jidt.

4.2 Significance-testing O-information

A confound of any study of multivariate neural time series is the presence of autocorrelation, which can complicate inference by artificially inflating the apparent dependence between two actually-uncorrelated variables [57]. To ensure that we were only analyzing triads with genuine higher-order interactions, we employed a circular shift-based null hypothesis significance test to filter out triads with an O-information likely attributable to first-order autocorrelation rather than true higher-order dependency. Briefly: we first computed the O-information for each of the 1,313,400 unique triads. We then re-computed the O-information for each triad after circular-shifting each time series so that it came from a different scan (since four scans were appended to infer the joint distribution). This means that each of the three channels was recorded at a different point in the session, sometimes hours apart, and any remaining O-information is attributable to the autocorrelation alone (which is preserved by the circular shift).

From this set of 1,313,400 nulls, we built a distribution against which to test the empirical, unperturbed O-informations. If a triad had an empirical O-information more than three standard deviations below the mean of the null distribution, it was considered significantly synergistic; if more than three standard deviations above the mean, it was considered significantly redundant. The result was 30,100 significantly redundancy-dominated triads and 6,200 significantly synergy-dominated triads.
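The thresholding step can be sketched as follows. This is a minimal illustration with toy inputs; the actual null-building script is provided as S3 Data:

```python
import numpy as np

def classify_triads(empirical, nulls, n_sd=3.0):
    """Flag triads whose empirical O-information lies more than n_sd
    standard deviations from the mean of the null distribution."""
    mu, sd = nulls.mean(), nulls.std()
    synergistic = empirical < mu - n_sd * sd   # strongly negative O-information
    redundant = empirical > mu + n_sd * sd     # strongly positive O-information
    return redundant, synergistic

rng = np.random.default_rng(7)
nulls = rng.normal(0.0, 0.05, size=100_000)    # toy circular-shift null O-informations
empirical = np.array([0.5, -0.5, 0.01])        # toy empirical O-informations
red_mask, syn_mask = classify_triads(empirical, nulls)
```

Here the first triad would be flagged as significantly redundant, the second as significantly synergistic, and the third as neither.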

4.3 Topological data analysis

All topological data analysis was done using the Ripser package [42] on a pre-computed Chebyshev distance matrix, with the maximum cohomology dimension set to 2. Since the cost of the persistent homology computation grows rapidly with the number of points, we randomly sampled 1,100 frames from the 4,400 concatenated BOLD time series (equivalent to a single scan) to avoid excessive computation while still ensuring that each of the four constituent scans contributed. All persistence statistics were computed from the resulting persistence diagram.
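A minimal sketch of the sampling and distance-matrix step is shown below. The per-scan stratification is our assumption about how "each of the four constituent scans contributed", and the commented-out call shows how ripser.py accepts a pre-computed matrix (`distance_matrix=True`, `maxdim=2`):

```python
import numpy as np

def chebyshev_distance_matrix(cloud):
    """Pairwise Chebyshev (max-norm) distances for a (T x 3) point cloud."""
    return np.max(np.abs(cloud[:, None, :] - cloud[None, :, :]), axis=2)

def stratified_subsample(triad, n_scans=4, per_scan=275, seed=0):
    """Draw an equal number of frames from each concatenated scan
    (4 x 275 = 1,100 frames out of 4,400)."""
    rng = np.random.default_rng(seed)
    scan_len = triad.shape[0] // n_scans
    idx = np.concatenate([
        rng.choice(np.arange(s * scan_len, (s + 1) * scan_len),
                   size=per_scan, replace=False)
        for s in range(n_scans)
    ])
    return triad[idx]

# toy triad: 4,400 frames of 3 parcels
triad = np.random.default_rng(3).normal(size=(4400, 3))
sample = stratified_subsample(triad)
D = chebyshev_distance_matrix(sample)
# from ripser import ripser                              # requires ripser.py
# diagrams = ripser(D, distance_matrix=True, maxdim=2)['dgms']
```

The resulting persistence diagrams (`diagrams[2]` for three-dimensional cavities) would then yield the average and maximum persistence statistics used in the main text.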

4.4 Data collection and preprocessing

The data used here has been previously described in a number of prior papers [8,9,58]. We will briefly reproduce a high-level overview from [8]. The data used in this study was taken from a set of 100 unrelated subjects included in the Human Connectome Project (HCP) [41]. All subjects gave informed consent to protocols approved by the Washington University Institutional Review Board. Data was collected with a Siemens 3T Connectom Skyra using a 32-channel head coil. Functional data analysed here was acquired during resting state with a gradient-echo echo-planar imaging (EPI) sequence. Collection occurred over four scans on two separate days (scan duration: 14:33 min; eyes open). The main acquisition parameters included TR = 720 ms, TE = 33.1 ms, a flip angle of 52°, 2 mm isotropic voxel resolution, and a multiband factor of 8. Resting state data was mapped to a 200-node parcellation scheme [59] covering the entire cerebral cortex. Criteria for subject inclusion were established before the study and are as follows. The mean and mean absolute deviation of the relative root mean square (RMS) motion throughout any of the four resting scans were calculated. Subjects that exceeded 1.5 times the interquartile range in the adverse direction for two or more measures were excluded. This resulted in the exclusion of four subjects, plus one additional subject excluded due to a software error during diffusion MRI processing.

The minimal preprocessing of the HCP rs-fMRI data is described in detail in [60]. Five main steps were followed: 1) susceptibility, distortion, and motion correction; 2) registration to subject-specific T1-weighted data; 3) bias and intensity normalization; 4) projection onto the 32k fs_LR mesh; and 5) alignment to common space with a multimodal surface registration. This pipeline produced an ICA+FIX time series in the CIFTI grayordinate coordinate system. We included two additional preprocessing steps: 6) global signal regression and 7) detrending and band-pass filtering (0.008 to 0.08 Hz) [61]. We discarded the first and last 50 frames of each time series after confound regression and filtering, producing final scans of length 13.2 min (1,100 frames).

Supporting information

S1 File. Additional analyses and replications using a second subject, as well as the maximum persistence.

https://doi.org/10.1371/journal.pcbi.1013649.s001

(PDF)

S1 Data. A python script for generating and analyzing the three-dimensional shapes seen in Fig 4.

https://doi.org/10.1371/journal.pcbi.1013649.s002

(PY)

S2 Data. A python script that computes the initial O-information for all triads.

https://doi.org/10.1371/journal.pcbi.1013649.s003

(PY)

S3 Data. A python script that builds the distribution of circularly-shifted nulls.

https://doi.org/10.1371/journal.pcbi.1013649.s004

(PY)

S4 Data. Computes the TDA of the significantly redundant triads.

https://doi.org/10.1371/journal.pcbi.1013649.s005

(PY)

S5 Data. Computes the TDA of the significantly synergistic triads.

https://doi.org/10.1371/journal.pcbi.1013649.s006

(PY)

S6 Data. A csv file with the BOLD data from one additional subject for replication.

https://doi.org/10.1371/journal.pcbi.1013649.s007

(CSV)

Acknowledgments

TFV would like to thank Joe Lizier for assistance with the JIDT package.

References

  1. Cotnoir AJ, Varzi AC. What is mereology? Mereology. Oxford University Press; 2021. p. xvi–20. https://doi.org/10.1093/oso/9780198749004.003.0001
  2. Barabási AL, Pósfai M. Network science. Cambridge University Press; 2016.
  3. Sporns O. Networks of the brain. MIT Press; 2010.
  4. Betzel RF. Community detection in network neuroscience. arXiv preprint 2020. https://arxiv.org/abs/2011.06723
  5. Traag VA, Bruggeman J. Community detection in networks with positive and negative links. Phys Rev E Stat Nonlin Soft Matter Phys. 2009;80(3 Pt 2):036115. pmid:19905188
  6. Sporns O, Kötter R. Motifs in brain networks. PLoS Biol. 2004;2(11):e369. pmid:15510229
  7. Rosas FE, Mediano PAM, Luppi AI, Varley TF, Lizier JT, Stramaglia S, et al. Disentangling high-order mechanisms and high-order behaviours in complex systems. Nat Phys. 2022;18(5):476–7.
  8. Varley TF, Pope M, Puxeddu MG, Faskowitz J, Sporns O. Partial entropy decomposition reveals higher-order information structures in human brain activity. Proc Natl Acad Sci U S A. 2023;120(30):e2300888120. pmid:37467265
  9. Varley TF, Pope M, Faskowitz J, Sporns O. Multivariate information theory uncovers synergistic subsystems of the human cerebral cortex. Commun Biol. 2023;6(1):451. pmid:37095282
  10. Battiston F, Cencetti G, Iacopini I, Latora V, Lucas M, Patania A, et al. Networks beyond pairwise interactions: structure and dynamics. Phys Rep. 2020;874:1–92.
  11. Chazal F, Michel B. An introduction to topological data analysis: fundamental and practical aspects for data scientists. arXiv preprint 2017. https://arxiv.org/abs/1710.04019
  12. Gholizadeh S, Zadrozny W. A short survey of topological data analysis in time series and systems analysis. arXiv preprint 2018. https://arxiv.org/abs/1809.10745
  13. Rosas FE, Mediano PAM, Gastpar M, Jensen HJ. Quantifying high-order interdependencies via multivariate extensions of the mutual information. Phys Rev E. 2019;100(3–1):032305. pmid:31640038
  14. Williams PL, Beer RD. Nonnegative decomposition of multivariate information. arXiv preprint 2010. https://arxiv.org/abs/1004.2515
  15. Varley TF. Information theory for complex systems scientists: what, why, and how. Phys Rep. 2025;1148:1–55. https://doi.org/10.1016/j.physrep.2025.09.007
  16. Petri G, Expert P, Turkheimer F, Carhart-Harris R, Nutt D, Hellyer PJ, et al. Homological scaffolds of brain functional networks. J R Soc Interface. 2014;11(101):20140873. pmid:25401177
  17. Luppi AI, Mediano PAM, Rosas FE, Allanson J, Pickard JD, Carhart-Harris RL, et al. A synergistic workspace for human consciousness revealed by integrated information decomposition. eLife. 2024. https://doi.org/10.7554/elife.88173.2
  18. Luppi AI, Mediano PAM, Rosas FE, Allanson J, Pickard JD, Williams GB, et al. Reduced emergent character of neural dynamics in patients with a disrupted connectome. Neuroimage. 2023;269:119926. pmid:36740030
  19. Varley TF, Denny V, Sporns O, Patania A. Topological analysis of differential effects of ketamine and propofol anaesthesia on brain dynamics. R Soc Open Sci. 2021;8(6):201971. pmid:34168888
  20. 20. Santoro A, Battiston F, Lucas M, Petri G, Amico E. Higher-order connectomics of human brain function reveals local topological signatures of task decoding, individual identification, and behavior. Nat Commun. 2024;15(1):10244. pmid:39592571
  21. 21. Anderson KL, Anderson JS, Palande S, Wang B. Topological data analysis of functional MRI connectivity in time and space domains. In: Wu G, Rekik I, Schirmer MD, Chung AW, Munsell B, editors. Connectomics in neuroImaging. Cham: Springer; 2018. p. 67–77.
  22. 22. Varley TF, Sporns O, Schaffelhofer S, Scherberger H, Dann B. Information-processing dynamics in neural networks of macaque cerebral cortex reflect cognitive state and behavior. Proc Natl Acad Sci U S A. 2023;120(2):e2207677120. pmid:36603032
  23. 23. Gatica M, Cofré R, Mediano PAM, Rosas FE, Orio P, Diez I, et al. High-order interdependencies in the aging brain. Brain Connect. 2021;11(9):734–44. pmid:33858199
  24. 24. Herzog R, Rosas FE, Whelan R, Fittipaldi S, Santamaria-Garcia H, Cruzat J, et al. Genuine high-order interactions in brain networks and neurodegeneration. Neurobiol Dis. 2022;175:105918. pmid:36375407
  25. 25. Rutkowski TM, Komendziński T, Otake-Matsuura M. Mild cognitive impairment prediction and cognitive score regression in the elderly using EEG topological data analysis and machine learning with awareness assessed in affective reminiscent paradigm. Front Aging Neurosci. 2024;15:1294139. pmid:38239487
  26. 26. Xu FH, Gao M, Chen J, Garai S, Duong-Tran DA, Zhao Y, et al. Topology-based clustering of functional brain networks in an Alzheimer’s Disease cohort. AMIA Jt Summits Transl Sci Proc. 2024;2024:449–58. pmid:38827100
  27. 27. Sizemore AE, Phillips-Cremins JE, Ghrist R, Bassett DS. The importance of the whole: topological data analysis for the network neuroscientist. Netw Neurosci. 2019;3(3):656–73. pmid:31410372
  28. 28. Rosas FE, Gutknecht A, Mediano PAM, Gastpar M. Characterising high-order interdependence via entropic conjugation. arXiv prerpint 2024. http://arxiv.org/abs/2410.10485
  29. 29. Watanabe S. Information theoretical analysis of multivariate correlation. IBM J Res & Dev. 1960;4(1):66–82.
  30. 30. Tononi G, Sporns O, Edelman GM. A measure for brain complexity: relating functional segregation and integration in the nervous system. Proc Natl Acad Sci U S A. 1994;91(11):5033–7. pmid:8197179
  31. 31. Abdallah SA, Plumbley MD. A measure of statistical complexity based on predictive information with application to finite spin systems. Physics Letters A. 2012;376(4):275–81.
  32. 32. Li Q, Yu S, Madsen KH, Calhoun VD, Iraji A. Higher-order organization in the human brain from matrix-based Rényi’s entropy. In: 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW); 2023. p. 1–5. https://ieeexplore.ieee.org/abstract/document/10193346
  33. 33. James RG, Ellison CJ, Crutchfield JP. Anatomy of a bit: information in a time series observation. Chaos. 2011;21(3):037109. pmid:21974672
  34. 34. Liardi A, Rosas FE, Carhart-Harris RL, Blackburne G, Bor D, Mediano PA. Null models for comparing information decomposition across complex systems. arXiv preprint 2024. https://arxiv.org/abs/24101158
  35. 35. Kraskov A, Stögbauer H, Grassberger P. Estimating mutual information. Phys Rev E Stat Nonlin Soft Matter Phys. 2004;69(6 Pt 2):066138. pmid:15244698
  36. 36. Lizier JT. JIDT: an information-theoretic toolkit for studying the dynamics of complex systems. Front Robot AI. 2014;1.
  37. 37. Luppi AI, Mediano PAM, Rosas FE, Holland N, Fryer TD, O’Brien JT, et al. A synergistic core for human brain evolution and cognition. Nat Neurosci. 2022;25(6):771–82. pmid:35618951
  38. 38. Kozachenko LF, Leonenko NN. Sample estimate of the entropy of a random vector. Problems of Information Transmission. 1987;23(2):9.
  39. 39. Delattre S, Fournier N. On the Kozachenko–Leonenko entropy estimator. Journal of Statistical Planning and Inference. 2017;185:69–93.
  40. 40. Edelsbrunner H, Harer J. Computational topology: an introduction. Providence, R.I.: American Mathematical Society; 2010.
  41. 41. Van Essen DC, Smith SM, Barch DM, Behrens TEJ, Yacoub E, Ugurbil K, et al. The WU-Minn human connectome project: an overview. Neuroimage. 2013;80:62–79. pmid:23684880
  42. 42. Tralie C, Saul N, Bar-On R. Ripser.py: a lean persistent homology library for python. JOSS. 2018;3(29):925.
  43. 43. Gallego JA, Perich MG, Miller LE, Solla SA. Neural manifolds for the control of movement. Neuron. 2017;94(5):978–84. pmid:28595054
  44. 44. Gallego JA, Perich MG, Naufel SN, Ethier C, Solla SA, Miller LE. Cortical population activity within a preserved neural manifold underlies multiple motor behaviors. Nat Commun. 2018;9(1):4233. pmid:30315158
  45. 45. Shine JM, Hearne LJ, Breakspear M, Hwang K, Müller EJ, Sporns O, et al. The low-dimensional neural architecture of cognitive complexity is related to activity in medial thalamic nuclei. Neuron. 2019;104(5):849-855.e3. pmid:31653463
  46. 46. Langdon C, Genkin M, Engel TA. A unifying perspective on neural manifolds and circuits for cognition. Nat Rev Neurosci. 2023;24(6):363–77. pmid:37055616
  47. 47. Gauss KF, Pesic P. General Investigations of Curved Surfaces. Courier Corporation; 2005.
  48. 48. Rosas FE, Mediano PAM, Rassouli B, Barrett AB. An operational information decomposition via synergistic disclosure. J Phys A: Math Theor. 2020;53(48):485001.
  49. 49. Varley TF. A scalable synergy-first backbone decomposition of higher-order structures in complex systems. NPJ Complex. 2024;1(1).
  50. 50. Robiglio T, Neri M, Coppes D, Agostinelli C, Battiston F, Lucas M, et al. Synergistic signatures of group mechanisms in higher-order systems. Phys Rev Lett. 2025;134(13):137401. pmid:40250379
  51. 51. Hoel EP, Albantakis L, Tononi G. Quantifying causal emergence shows that macro can beat micro. Proc Natl Acad Sci U S A. 2013;110(49):19790–5. pmid:24248356
  52. 52. Varley TF. A synergistic perspective on multivariate computation and causality in complex systems. Entropy (Basel). 2024;26(10):883. pmid:39451959
  53. 53. Matsuda H. Physical nature of higher-order mutual information: intrinsic correlations and frustration. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics. 2000;62(3 Pt A):3096–102. pmid:11088803
  54. 54. Varley TF, Bongard J. Evolving higher-order synergies reveals a trade-off between stability and information-integration capacity in complex systems. Chaos. 2024;34(6):063127. pmid:38865092
  55. 55. Marinazzo D, Pellicoro M, Stramaglia S. Kernel method for nonlinear granger causality. Phys Rev Lett. 2008;100(14):144103. pmid:18518037
  56. 56. Gómez-Herrero G, Wu W, Rutanen K, Soriano M, Pipa G, Vicente R. Assessing coupling dynamics from an ensemble of time series. Entropy. 2015;17(4):1958–70.
  57. 57. Cliff OM, Novelli LE, Fulcher BD, Shine JM, Lizier JT. Exact inference of linear dependence between multiple autocorrelated time series.
  58. 58. Pope M, Fukushima M, Betzel RF, Sporns O. Modular origins of high-amplitude cofluctuations in fine-scale functional connectivity dynamics. Proc Natl Acad Sci U S A. 2021;118(46):e2109380118. pmid:34750261
  59. 59. Schaefer A, Kong R, Gordon EM, Laumann TO, Zuo X-N, Holmes AJ, et al. Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity MRI. Cereb Cortex. 2018;28(9):3095–114. pmid:28981612
  60. 60. Glasser MF, Sotiropoulos SN, Wilson JA, Coalson TS, Fischl B, Andersson JL, et al. The minimal preprocessing pipelines for the human connectome project. Neuroimage. 2013;80:105–24. pmid:23668970
  61. 61. Parkes L, Fulcher B, Yücel M, Fornito A. An evaluation of the efficacy, reliability, and sensitivity of motion correction strategies for resting-state functional MRI. Neuroimage. 2018;171:415–36. pmid:29278773