Nonparametric Sparsification of Complex Multiscale Networks

Many real-world networks tend to be very dense. Particular examples of interest arise in the construction of networks that represent pairwise similarities between objects. In these cases, the networks under consideration are weighted, generally with positive weights between any two nodes. Visualization and analysis of such networks, especially when the number of nodes is large, can pose significant challenges, which are often met by reducing the edge set. Any effective "sparsification" must retain and reflect the important structure in the network. A common method is simply to apply a hard threshold, keeping only those edges whose weight exceeds some predetermined value. A more principled approach is to extract the multiscale "backbone" of a network, either by retaining statistically significant edges through hypothesis testing on a specific null model or by appropriately transforming the original weight matrix before applying some sort of threshold. Unfortunately, such approaches can fail to capture multiscale structure in which there can be small but locally statistically significant similarity between nodes. In this paper, we introduce a new method for backbone extraction that does not rely on any particular null model, but instead uses the empirical distribution of similarity weights to determine and then retain statistically significant edges. We show that our method adapts to the heterogeneity of local edge weight distributions in several paradigmatic real-world networks, and in doing so retains their multiscale structure at little additional computational cost. We anticipate that this simple approach will be of great use in the analysis of massive, highly connected weighted networks.


Supporting Information
Supporting Information Figure S1 depicts a number of the distributions used by the disparity filter [1] for various node degrees. The family of distributions is quite rich and can capture many different shapes of local fractional edge weight distributions. However, Supporting Information Figure S2 shows the local empirical cdfs for a random sample of nodes from the equities network (left panel) and the airline network (right panel). In the equities network every node has 873 neighbors (the network is fully connected), so the disparity filter uses a single distribution (the thicker, pink curve in the figure) to determine significance for all nodes. Clearly, this parametric distribution does not approximate the observed distributions well. For the airline network we show the empirical cdfs of all nodes with degree 51, along with the distribution corresponding to the model of [1]. Here the parametric distribution has a shape similar to the empirical distributions; however, the shapes and scales of the empirical distributions vary from node to node, so a single degree-dependent distribution ignores local information.
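To make the comparison in Supporting Information Figure S2 concrete, the sketch below evaluates both kinds of curve for individual nodes. It assumes that the disparity filter's null cdf for a node of degree k is F_k(x) = 1 - (1 - x)^(k-1), which depends only on k, and that a node's empirical cdf is built from its fractional edge weights w_ij / Σ_j w_ij. The helper names and the two example nodes are hypothetical.

```python
import bisect


def fractional_weights(weights):
    """Normalize a node's positive edge weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def empirical_cdf(sample):
    """Return x -> fraction of sample values that are <= x."""
    data = sorted(sample)
    n = len(data)

    def cdf(x):
        return bisect.bisect_right(data, x) / n

    return cdf


def disparity_null_cdf(k):
    """Assumed disparity-filter null cdf for a node of degree k (k >= 2)."""

    def cdf(x):
        return 1.0 - (1.0 - x) ** (k - 1)

    return cdf


# Two hypothetical nodes with the same degree (k = 4) but different weight profiles.
node_a = fractional_weights([1.0, 1.0, 1.0, 1.0])   # evenly spread weight
node_b = fractional_weights([10.0, 1.0, 1.0, 1.0])  # one dominant edge
null = disparity_null_cdf(4)                         # identical for both nodes
F_a, F_b = empirical_cdf(node_a), empirical_cdf(node_b)
for x in (0.1, 0.25, 0.5):
    print(f"x={x:.2f}  null={null(x):.3f}  node_a={F_a(x):.2f}  node_b={F_b(x):.2f}")
```

Both nodes receive the same null curve because they share a degree, yet their empirical cdfs differ sharply; this is the local variation that a degree-only model cannot represent.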
As a quantitative measure of the statistical significance of local heterogeneity, we performed a Kolmogorov-Smirnov test [2] for every node in the equities, airline, and art networks with 40 or more neighbors. The Kolmogorov-Smirnov test assesses whether a sample (here, a node's fractional edge weights, summarized by their empirical cdf) is consistent with a reference distribution (here, the parametric distribution that the disparity filter [1] would use for that node). Supporting Information Table S1 summarizes the results. In the equities and art networks, every test rejected the null hypothesis at both the 5% and 1% significance levels, indicating that the parametric distribution does not capture the true distributions of fractional edge weights. For the airline network, 99.4% and 98.3% of the tests rejected the null hypothesis at the 5% and 1% significance levels, respectively. Again, there is statistically significant heterogeneity in the distribution of fractional edge weights among nodes of the same degree. The use of the empirical cdf in LANS is therefore justified, as it captures this information that was previously ignored.
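A per-node test of this kind can be sketched in a few lines. The version below computes the one-sample, two-sided Kolmogorov-Smirnov statistic directly and uses the standard large-sample (Kolmogorov) approximation for the p-value; in practice `scipy.stats.kstest` would normally be used instead. The null cdf is assumed to be the disparity-filter form 1 - (1 - x)^(k-1), and the equal-weight node of degree 50 is a hypothetical example, not data from the paper.

```python
import math


def ks_statistic(sample, cdf):
    """One-sample, two-sided Kolmogorov-Smirnov statistic D = sup_x |F_n(x) - F(x)|."""
    data = sorted(sample)
    n = len(data)
    d = 0.0
    for i, x in enumerate(data):
        f = cdf(x)
        # F_n jumps from i/n to (i+1)/n at x, so compare F(x) with both levels.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_pvalue_asymptotic(d, n, terms=100):
    """Large-n approximation P(D > d) ~ Q(sqrt(n) * d), where
    Q(lam) = 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 * j^2 * lam^2)."""
    lam = math.sqrt(n) * d
    if lam < 0.2:  # Q(lam) > 1 - 1e-12 here, and the series converges slowly
        return 1.0
    s = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
            for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def disparity_null_cdf(k):
    """Assumed disparity-filter null cdf for a node of degree k."""
    def cdf(x):
        return 1.0 - (1.0 - x) ** (k - 1)
    return cdf


# Hypothetical node of degree k = 50 whose edge weights are all equal.
k = 50
sample = [1.0 / k] * k                  # fractional edge weights
d = ks_statistic(sample, disparity_null_cdf(k))
p = ks_pvalue_asymptotic(d, k)
print(f"D = {d:.3f}, p = {p:.2e}, reject at 1%: {p < 0.01}")
```

Running the same test over every node with at least 40 neighbors and counting rejections at the 5% and 1% levels gives the kind of summary reported in Supporting Information Table S1.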

Clustering equities with sparsified network
We evaluate network sparsification using LANS as a preprocessing step for a common task in network analysis, namely clustering nodes. The goal is to assign each node to a single cluster such that nodes within the same cluster are "more similar" to each other than to nodes in other clusters. A common method for clustering the nodes of a network is spectral clustering [3].
We cluster the equities network after sparsifying it with LANS at α = 0.003. In Figure 4 of the main text, the nodes of the equities network are colored according to the cluster assignments found by spectral clustering with 22 clusters. The number of clusters, 22, was chosen by maximizing the likelihood of a Gaussian mixture model fit to the same data. Most clusters correspond to known economic sectors, in agreement with previous work on clustering equities using correlations [4].
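For readers who want to reproduce this pipeline on their own data, the sketch below shows one common form of spectral clustering applied to a (sparsified) adjacency matrix. Reference [3] may describe a different variant; this sketch assumes the normalized algorithm that embeds nodes with the top eigenvectors of D^(-1/2) A D^(-1/2), row-normalizes the embedding, and runs k-means with a deterministic farthest-point initialization. It also assumes no isolated nodes. The six-node graph is a hypothetical toy example, not the equities data.

```python
import numpy as np


def spectral_clustering(A, k, n_iter=100):
    """Normalized spectral clustering of a symmetric, nonnegative adjacency
    matrix A into k clusters. Assumes every node has positive degree."""
    A = np.asarray(A, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    # The top-k eigenvectors of D^{-1/2} A D^{-1/2} are the bottom-k
    # eigenvectors of the normalized Laplacian I - D^{-1/2} A D^{-1/2}.
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)                          # ascending eigenvalues
    U = vecs[:, -k:]                                     # k largest eigenvalues
    U = U / np.linalg.norm(U, axis=1, keepdims=True)     # row-normalize

    # k-means on the rows of U, seeded by farthest-point initialization.
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq_dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq_dist, axis=1)
        new_centers = np.array([U[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Hypothetical toy graph: two triangles joined by one weak edge (2-3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0
A[2, 3] = A[3, 2] = 0.01
print(spectral_clustering(A, 2))
```

In the setting described above, A would be the LANS-sparsified equities network and k would be 22; selecting k itself (here via a Gaussian mixture model) is a separate step not shown.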

Airline network
Supporting Information Figure S3 depicts the sparsified airline network found with LANS (Box a), the disparity filter (Box b), and the bistochastic filter (Box c). LANS sparsifies the network so that more of it remains connected, with far fewer very small components. Additionally, the resulting network is more tree-like. The backbone extracted using the disparity filter, on the other hand, has many more small components, and its large component is much denser than that of LANS. This indicates that, in this network, LANS tends to add edges between connected components, while the disparity filter tends to add edges within a connected component. The backbone network created using the bistochastic filter has a tree-like structure but does not retain any multiscale information.
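The qualitative comparison above (number and size of components, how tree-like a backbone is) can be quantified with a small helper. As an illustrative choice not taken from the paper, the sketch below measures tree-likeness with the cyclomatic number |E| - |V| + c (c = number of connected components), which is 0 exactly when the graph is a forest. It assumes edges are given as unique undirected pairs without self-loops; the function name and the example graph are hypothetical.

```python
from collections import Counter


def component_summary(n, edges):
    """Return (component sizes in decreasing order, cyclomatic number) for an
    undirected graph on nodes 0..n-1 with unique, loop-free edges."""
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    sizes = sorted(Counter(find(x) for x in range(n)).values(), reverse=True)
    # |E| - |V| + c counts independent cycles; 0 means the graph is a forest.
    return sizes, len(edges) - n + len(sizes)


# Hypothetical backbone: a 4-node path plus a separate triangle.
edges = [(0, 1), (1, 2), (2, 3), (4, 5), (5, 6), (6, 4)]
print(component_summary(7, edges))
```

Applied to each of the three backbones, the component sizes capture how fragmented a backbone is, and a smaller cyclomatic number relative to |E| indicates a more tree-like structure.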

LANS pseudocode
Supporting Information Figure S4 provides pseudocode for constructing a backbone network with LANS: given a matrix S of similarity values and a significance level α, it returns the sparsified network A.
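A minimal Python sketch of such a procedure follows, under stated assumptions. For each node i, it assumes the fractional edge weights are p_ij = S_ij / Σ_k S_ik over i's neighbors, F̂_i is the empirical cdf of those values, and the edge (i, j) counts as significant for node i when 1 - F̂_i(p_ij) < α. It also assumes that an edge is kept if it is significant for either endpoint. The function name and the toy matrix are hypothetical.

```python
import bisect


def lans_backbone(S, alpha):
    """Locally adaptive backbone sketch.

    S: symmetric list of lists of nonnegative similarities (diagonal ignored).
    alpha: significance level.
    Returns a symmetric 0/1 adjacency matrix A (list of lists).
    """
    n = len(S)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        nbrs = [j for j in range(n) if j != i and S[i][j] > 0]
        if not nbrs:
            continue
        total = sum(S[i][j] for j in nbrs)
        p = {j: S[i][j] / total for j in nbrs}   # fractional edge weights of node i
        sorted_p = sorted(p.values())
        k = len(sorted_p)
        for j in nbrs:
            F = bisect.bisect_right(sorted_p, p[j]) / k  # empirical cdf of node i at p_ij
            if 1.0 - F < alpha:                          # locally significant for node i
                A[i][j] = A[j][i] = 1                    # keep if significant for either endpoint
    return A


# Hypothetical 4-node similarity matrix with two strongly linked pairs.
S = [[0, 5, 1, 1],
     [5, 0, 1, 1],
     [1, 1, 0, 2],
     [1, 1, 2, 0]]
for row in lans_backbone(S, 0.3):
    print(row)
```

Because F̂_i equals 1 at a node's largest fractional weight, this sketch always keeps each node's strongest edge for any α > 0, and the per-node cost is dominated by one sort of that node's weights.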

Additional tests using bistochastic filter
We performed two additional experiments with the bistochastic filter for comparison with LANS. First, we measured the fraction of edges retained as a function of the edge weight threshold in the equities and airline networks. Most weights are quite small, so they are excluded exponentially quickly once the threshold rises even slightly above zero (see Supporting Information Figure S5). It is therefore difficult to observe fine gradations between backbone networks generated with the bistochastic filter without using exponentially small increments in the weight threshold, which can be computationally very intensive.
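The measurement described above can be sketched as follows. The sketch assumes that the bistochastic transformation is obtained by Sinkhorn-Knopp balancing (alternately normalizing rows and columns) of a symmetric, entrywise positive weight matrix, and it counts each undirected edge once (i < j). Function names, the tolerance, and the 3 × 3 matrix are illustrative only.

```python
def sinkhorn_knopp(W, tol=1e-10, max_iter=10000):
    """Alternately rescale rows and columns of a positive square matrix until it
    is approximately doubly stochastic (all row and column sums equal 1)."""
    n = len(W)
    M = [list(map(float, row)) for row in W]
    for _ in range(max_iter):
        for i in range(n):                              # normalize rows
            s = sum(M[i])
            M[i] = [x / s for x in M[i]]
        col = [sum(M[i][j] for i in range(n)) for j in range(n)]
        for i in range(n):                              # normalize columns
            M[i] = [M[i][j] / col[j] for j in range(n)]
        # Columns sum to 1 after the column step, so only rows need checking.
        if max(abs(sum(row) - 1.0) for row in M) < tol:
            break
    return M


def fraction_retained(M, thresholds):
    """Fraction of undirected edges (i < j) whose weight exceeds each threshold."""
    n = len(M)
    weights = [M[i][j] for i in range(n) for j in range(i + 1, n)]
    return [sum(w > t for w in weights) / len(weights) for t in thresholds]


# Hypothetical symmetric weight matrix.
W = [[1.0, 2.0, 3.0],
     [2.0, 1.0, 4.0],
     [3.0, 4.0, 1.0]]
M = sinkhorn_knopp(W)
print(fraction_retained(M, [0.0, 0.2, 0.3, 0.4, 0.5]))
```

Sweeping a fine grid of thresholds over a real network's transformed weights would trace out the retained-edge curve; when most weights sit very close to zero, the grid near zero must be extremely fine to resolve intermediate backbones.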
We also compared our method to the bistochastic filter using a different stopping criterion for adding edges. Specifically, after transforming the equities weight matrix into a bistochastic matrix, we kept all edges whose transformed weight exceeded the average of all the weights. This produced a very densely connected network with a single connected component and no obvious structure.
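The stopping rule above is simple to state in code. Assuming "all the weights" means all n² entries of the n × n bistochastic matrix, the average is exactly 1/n, because the entries sum to n (each of the n rows sums to 1). The sketch below takes an already bistochastic, symmetric matrix (a hypothetical 3 × 3 example), keeps the edges above the mean, and counts connected components; the function names are illustrative.

```python
def mean_threshold_edges(M):
    """Edges (i < j) of a symmetric bistochastic matrix whose weight exceeds the
    mean of all n*n entries (which equals 1/n when every row sums to 1)."""
    n = len(M)
    mean_w = sum(sum(row) for row in M) / (n * n)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if M[i][j] > mean_w]


def n_components(n, edges):
    """Number of connected components, found by iterative depth-first search."""
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, count = set(), 0
    for s in range(n):
        if s in seen:
            continue
        count += 1
        seen.add(s)
        stack = [s]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return count


# Hypothetical symmetric, doubly stochastic matrix (rows and columns sum to 1).
M = [[0.1, 0.6, 0.3],
     [0.6, 0.2, 0.2],
     [0.3, 0.2, 0.5]]
edges = mean_threshold_edges(M)
print(edges, n_components(3, edges))
```

Because the cutoff is fixed at 1/n, this rule leaves no tuning parameter; whether the result is dense or sparse depends entirely on how many transformed weights exceed 1/n.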