
The authors have declared that no competing interests exist.

Conceived and designed the experiments: GP FV. Performed the experiments: GP MS ID FV. Analyzed the data: GP MS ID FV. Contributed reagents/materials/analysis tools: GP MS ID FV. Wrote the paper: GP MS ID FV.

The statistical mechanical approach to complex networks is the dominant paradigm in describing natural and societal complex systems. The study of network properties, and their implications for dynamical processes, mostly focuses on locally defined quantities of nodes and edges, such as node degrees, edge weights and, more recently, correlations between neighboring nodes. However, statistical methods quickly become cumbersome when dealing with many-body properties and do not capture the precise mesoscopic structure of complex networks. Here we introduce a novel method, based on persistent homology, to detect particular non-local structures, akin to

Complex networks have become one of the prominent tools in the study of social, technological and biological systems

To avoid this pitfall, given a weighted network

Persistent homology is a recent development in computational topology designed for robust shape recognition and data-discovery from high dimensional datasets

Rank the weights of links from

At each step

A weighted network hole of weight

By unearthing their properties, we obtain the main contribution of this paper: the statistical features of weighted network holes yield a classification of real-world networks in two classes, depending on whether or not they are compatible with null models generated by graph randomisations. Furthermore, this classification is defined by mesoscopic homological structures that cannot be reduced to local properties alone.

The method used for the classification itself, which we call

Each weighted hole

Similarly to stratigraphy, each step of the filtration is a topological stratum of the network, where the edge weight rank plays the role of depth. Intuitively,

We applied this analysis to various social, infrastructural and biological networks (see SI for a detailed list). In order to compare datasets, indices are normalized by the corresponding filtration length (maximal rank)
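The normalisation by filtration length can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `normalise_intervals` is hypothetical, and the treatment of cycles that never die (counted as persisting until the last step) is an assumption made here for the example.

```python
def normalise_intervals(intervals, max_rank):
    """Rescale (birth, death) ranks to fractions of the filtration length.

    intervals: list of (birth, death) pairs in rank units; death is None
               for a cycle that is still alive at the end of the filtration.
    max_rank:  filtration length (maximal rank).
    Returns a list of (normalised birth, normalised persistence) pairs.
    """
    normalised = []
    for birth, death in intervals:
        # Assumption: an unkilled cycle is treated as dying at the last step.
        end = max_rank if death is None else death
        normalised.append((birth / max_rank, (end - birth) / max_rank))
    return normalised


# Two cycles in a filtration with four steps.
result = normalise_intervals([(1, 2), (3, None)], max_rank=4)
print(result)
```

After rescaling, births and persistences from networks with very different numbers of distinct weights lie in the same unit interval and can be compared directly.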

The statistical distributions obtained for the

Box plots of the distributions of persistences

cycle distributions are markedly different from the randomized versions (cycles display shorter persistence times, earlier and broader birth distributions and very short lengths as compared to their randomized versions);

cycle distributions are very close to their random versions (late appearance, short persistences, long cycles).

The short cycles of Class I networks nest hierarchically and appear and die over all scales while those in the randomized counterparts are born uniformly along the filtration but are more persistent, producing largely hollow network instances. The implications are twofold. Since cycles represent weaker connectivity regions, this results in class I networks being more

This can be seen easily by compressing the whole information within two scalar metrics which do not depend on the number of generators in a given network filtration. We define the

Dataset (class) | |||

Genes(I) | 1.14 | 14.6 | 873 |

Online forums(I) | 0.5 | ||

US Air 2000(I) | 0.868 | ||

US Air 2002(I) | 0.872 | ||

US Air 2006 (I) | 0.958 | ||

US Air 2011 (I) | 0.941 | ||

Online messages(I) | 0.14 | ||

School day 1 (II) | 0.11 | 56 | |

School day 2 (II) | 0.08 | 110 | |

C. elegans (II) | 0.25 | 76 | |

Twitter (II) | 0.11 | 370 | |

Hep-th (II) | 0.11 | 7.4 | |

Cond-mat (II) | 0.005 | 0.24 | |

Lin. RGG | 0.0034 | 34 | 836 |

Ran. RGG | 0.018 | 54 | 255 |

Interestingly, the hollowness values for the

Dataset (class) | ||||||||

Genes(I) | 0.515 | 0.003 | 0.35 | 0.006 | ||||

Online forums(I) | 0.02 | 0.0003 | ||||||

US Air 2000(I) | 0.160 | 0.001 | 0.02 | 0.0003 | ||||

US Air 2002(I) | 0.186 | 0.0008 | 0.23 | 0.002 | ||||

US Air 2006 (I) | 0.167 | 0.0005 | 0.165 | 0.001 | ||||

US Air 2011 (I) | 0.181 | 0.0006 | 0.076 | 0.0007 | ||||

Online messages(I) | 0.21 | 0.0014 | 0.02 | 0.0003 | ||||

School day 1 (II) | 0.088 | 0.0034 | 0.015 | 0.0012 | ||||

School day 2 (II) | 0.090 | 0.0033 | 0.01412 | 0.00095 | ||||

C. elegans (II) | 0.0784 | 0.002 | 0.058 | 0.002 | ||||

Twitter (II) | 0.03 | 0.0001 | 0.01 | 0.0001 | ||||

Hep-th (II) | 0.08 | 0.0002 | – | – | ||||

Cond-mat (II) | 0.26 | 0.0004 | – | – | ||||

Lin. RGG | 0.227 | 0.003 | 0.28 | 0.006 | ||||

Ran. RGG | 0.3 | 0.0041 | 0.115 | 0.003 |

Naturally, the two network classes do not represent a binary taxonomy and should be considered as two extremes of a range over which networks are distributed. For example, we find networks that interpolate between these classes, e.g. the online messages network has short persistence intervals, but also late cycle appearances and short length cycles. However, classes do not appear to display uniform behavior for local and two-body quantities: degree- and weight-distributions and correlations are mixed within the same group and do not provide a direct answer for the nature of the two classes. Similarly, a recently proposed measure of structural organisation,

Finally, the classes do not show a consistent pattern in

Because homology is essentially a non-local property, it was to be expected that the local measures mentioned above would not be able to explain the observed homological patterns. Network homology can in fact be seen as the weighted complement to the

A simple artificial network helps to illustrate this point: Random Geometric Graphs (RGG) have been recently shown to display long-range many-body correlations

For the latter and the airports, this organisation can be thought of as the result of the non-local constraint imposed by the metric of the underlying space

Further evidence of this behavior can be found by zooming in on specific cycles, which convey information about underlying constraints hidden in the network weight-link connectivity patterns. For example, the cycle structure of the air passenger network detects the expected reduced connectivity over oceans, in the form of strong persistent cycles, and the strong backbone of US airport hubs, which is then filled by the local (intra-community) links (

At the opposite extreme of local quantities lie the spectral properties of networks. It is very important therefore to investigate whether it is possible to highlight peculiar spectral signatures of the two classes. Network eigenvalues, especially those of the Laplacian matrix, figure prominently in a number of applications, ranging from spectral clustering

Interestingly, we find that class I networks have significantly larger spectral gaps (

Our results show therefore a deep connection between the homological network structure, the network spectral properties and their implications on network dynamics. Indeed, the role of mesoscopic structures in the stability and evolution of dynamical systems on networks is gradually emerging, as shown for example by recent work based on the concepts of basic symmetric subgraphs and their legacy eigenvalues in the global network spectrum

Hitherto, the homological structure of weighted networks could not be systematically studied. Our method, grounded in computational topology, allows one to probe multiple layers of organized structure. It highlighted two classes of networks distinguished by their homological features, which we interpreted as caused by differences in the higher-order network organisation that are not captured by (quasi)local approaches.

Among the many possible applications, two very relevant ones for social and infrastructural networks are the study of the weighted rich club’s geometry beyond the aggregate measure

This work therefore provides a stepping stone towards understanding the coupling between network dynamical processes and the network’s homology.

Finally, the filtration’s construction rule is flexible and can be readily adapted to other problems. Similarly to changing goggles, different edge metrics can be used (e.g. betweenness or salience

The datasets analysed in this paper cover a broad range of fields, spanning social, infrastructural and biological networks. Figures S1–S15 in the

In detail, they are:

The networks refer to the years 2000, 2002, 2006 and 2011. The years were chosen to provide snapshots of the air traffic situation at 4–5 year intervals, plus one extra (year 2000) just before the events of 9/11, which significantly affected the air transportation industry. The data used are publicly available from the website of the Bureau of Transportation Statistics (

The network is available at

The online messages network consists of messages in a student online community at University of California

The gene interaction network used in the paper is a sampling of the complete human genome dataset available from the University of Florida Sparse Matrix Collection. Each node is an individual gene, while the edges correlate the expression level of a gene with that of other genes (using a NIR score

The dataset consists of a network of mentions and retweets between Twitter users and is available online on the Gephi dataset page (

The dataset contains two days of recorded face-to-face interactions in a primary school. Each node represents a child, with the edge weight between two nodes being proportional to the amount of time the two children spent face to face. We analysed the two days separately, yielding two networks. The dataset was collected by the SocioPatterns project (

The networks analysed are the weighted co-authorship networks of the Condensed Matter E-print Archive between 1995 and 1999 (cond-mat) and the High-Energy Theory E-print Archive between 1995 and 1999 (hep-th)

The graph edgelists used in the paper are available online as part of the code package we developed

Finally, for comparison we use Random Geometric Graphs (RGG)

The networks analysed in this article are undirected and weighted, because the weight rank clique filtration finds a natural application in such cases. However, schemes for directed networks can be easily devised and tailored to specific case studies, e.g. one could adopt the definition used in the directed clique percolation method

The method we use to uncover weighted holes is persistent homology of the weight rank clique filtration. In this section we briefly explain persistent homology and its realization through the weight rank clique filtration.

Persistent homology is a technique from computational algebraic topology that can be viewed as a parametrized version of simplicial homology

a subset of a face in

the intersection of any two faces in

We assume that the vertex set is finite and totally ordered. A face of

Morphisms between simplicial complexes are called simplicial maps. A simplicial map is a map between simplicial complexes with the property that the image of a vertex is a vertex and the image of a

The construction that leads to the vector space

The subspace

The

It assigns to a filtration the homology groups of the simplicial complexes
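For reference, the standard textbook definitions behind this construction can be written compactly. The notation below is generic and may differ from the symbols used in the original text; taking coefficients in $\mathbb{Z}/2$ is an assumption made here to avoid orientation signs.

```latex
% Chain spaces: C_k(K) is the vector space over Z/2 spanned by the k-simplices of K.
\partial_k : C_k(K) \to C_{k-1}(K), \qquad
\partial_k [v_0, \dots, v_k] = \sum_{i=0}^{k} [v_0, \dots, \widehat{v_i}, \dots, v_k],
\qquad \partial_{k-1} \circ \partial_k = 0 .

% Cycles, boundaries and homology.
Z_k(K) = \ker \partial_k, \qquad
B_k(K) = \operatorname{im} \partial_{k+1} \subseteq Z_k(K), \qquad
H_k(K) = Z_k(K) / B_k(K) .

% A filtration K_0 \subseteq K_1 \subseteq \dots \subseteq K_T induces linear maps
H_k(K_s) \to H_k(K_t), \qquad s \le t ,
% and a class that first appears in K_s and becomes trivial (or merges with an
% older class) in K_t is recorded as the interval [s, t).
```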

For each homology group, the information about the filtration is collected in a barcode: the set of intervals

In classical applications, the filtration is obtained from a point cloud using the Rips-Vietoris complex, and persistent homology is used to uncover robust topological features of the point cloud. We instead use the weight rank clique filtration to uncover properties deriving from the topology and weighted structure of weighted networks.
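To make the notion of a barcode concrete, the sketch below computes persistence intervals from a filtered simplicial complex using the standard column reduction of the boundary matrix over Z/2. It is an illustration under stated assumptions, not the pipeline used in the paper (which relies on javaPlex): the function name `persistence_intervals`, the input format (a dictionary from sorted vertex tuples to the step at which each simplex appears) and the choice of Z/2 coefficients are all choices made here for the example.

```python
def persistence_intervals(filtration):
    """Barcode of a filtered simplicial complex, with Z/2 coefficients.

    filtration: dict mapping each simplex (sorted tuple of vertices) to the
                step at which it appears; every face of a simplex must be
                present with a step no later than the simplex itself.
    Returns a list of (dimension, birth, death); death is None for classes
    that never die. Zero-length intervals are dropped.
    """
    # Order simplices by step, then by size, so faces precede cofaces.
    order = sorted(filtration, key=lambda s: (filtration[s], len(s), s))
    index = {s: i for i, s in enumerate(order)}

    # Column j holds the indices of the codimension-one faces of simplex j.
    columns = []
    for s in order:
        if len(s) == 1:
            columns.append(set())
        else:
            columns.append({index[s[:i] + s[i + 1:]] for i in range(len(s))})

    pivot_owner = {}  # lowest row index of a reduced column -> that column
    intervals = []
    for j, col in enumerate(columns):
        # Add earlier columns (mod 2) until the lowest entry is unclaimed.
        while col and max(col) in pivot_owner:
            col ^= columns[pivot_owner[max(col)]]
        if col:
            i = max(col)
            pivot_owner[i] = j  # simplex j kills the class created by simplex i
            birth, death = filtration[order[i]], filtration[order[j]]
            if death > birth:
                intervals.append((len(order[i]) - 1, birth, death))

    # Columns reduced to zero and never used as a pivot create essential classes.
    for j, col in enumerate(columns):
        if not col and j not in pivot_owner:
            intervals.append((len(order[j]) - 1, filtration[order[j]], None))
    return intervals


# Toy filtration: a square a-b-c-d whose diagonal a-c enters last.
toy = {
    ("a",): 0, ("b",): 0, ("c",): 0, ("d",): 1,
    ("a", "b"): 0, ("b", "c"): 0, ("a", "d"): 1, ("c", "d"): 1,
    ("a", "c"): 2, ("a", "b", "c"): 2, ("a", "c", "d"): 2,
}
intervals = persistence_intervals(toy)
print(intervals)
```

In this toy case the only one-dimensional interval is born at step 1, when the square closes, and dies at step 2, when the diagonal and the two triangles it completes fill the hole; the single connected component survives as a zero-dimensional class that never dies.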

Recalling that an

The

Rank the weights of links from

At each step

For each graph

The clique complexes are nested along the growth of

In particular, persistent one dimensional cycles in the weight rank clique filtration represent weighted loops with much weaker internal links.
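The construction can be illustrated with a short, brute-force sketch. It is a minimal example rather than the authors' implementation, and it rests on assumptions that are labelled in the code: links are ranked by distinct weight from strongest to weakest, step t keeps every link whose weight is at or above the t-th level, a vertex enters with its strongest link, and cliques are enumerated exhaustively up to a small dimension. The function name `weight_rank_clique_filtration` and the parameter `max_dim` are hypothetical.

```python
from itertools import combinations


def weight_rank_clique_filtration(weighted_edges, max_dim=2):
    """Map each clique (sorted vertex tuple) to the rank at which it appears.

    weighted_edges: dict mapping (u, v) tuples to link weights.
    max_dim:        largest simplex dimension to enumerate (2 = triangles).
    """
    # Assumption: ranks follow distinct weights, strongest link first (rank 0).
    levels = sorted(set(weighted_edges.values()), reverse=True)
    rank_of = {w: t for t, w in enumerate(levels)}

    edge_rank = {}
    vertex_rank = {}
    for (u, v), w in weighted_edges.items():
        r = rank_of[w]
        edge_rank[frozenset((u, v))] = r
        for node in (u, v):
            # Assumption: a vertex enters together with its strongest link.
            vertex_rank[node] = min(r, vertex_rank.get(node, r))

    nodes = sorted(vertex_rank)
    filtration = {(n,): vertex_rank[n] for n in nodes}
    # Brute-force enumeration; only suitable for small illustrative networks.
    for size in range(2, max_dim + 2):
        for simplex in combinations(nodes, size):
            ranks = [edge_rank.get(frozenset(p)) for p in combinations(simplex, 2)]
            if None not in ranks:
                # A clique is complete only once its weakest link has entered.
                filtration[simplex] = max(ranks)
    return filtration


# Toy network: a square of strong links with one much weaker diagonal.
edges = {("a", "b"): 5, ("b", "c"): 5, ("c", "d"): 4, ("a", "d"): 4, ("a", "c"): 1}
filtration = weight_rank_clique_filtration(edges)
for simplex in sorted(filtration, key=lambda s: (filtration[s], len(s), s)):
    print(filtration[simplex], simplex)
```

Because each clique is assigned the rank of its weakest link, the complexes obtained at successive ranks are automatically nested. In the toy network the square closes at rank 1 and is filled only at rank 2, when the weak diagonal arrives, which is the situation of a loop of strong links with much weaker internal links.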

There is a conceptual difference in interpreting

Computing the filtration of a large dataset can be extremely demanding computationally. The identification of the maximal cliques requires in general exponential time, although algorithms exist for special cases that allow solutions to be obtained in polynomial time. In addition, the javaPlex library

In the case of non-metrical discrete spaces, for example networks, one cannot easily construct a witness complex through a controlled sub-sampling of the network. Luckily, it is still possible to reduce the computational complexity in different ways: first, one can limit the analysis to the first


The authors acknowledge M. Rasetti for stimulating discussions.