Fig 1.
Decision boundaries for positive, negative or random mixing in the network.
If the average similarity of connected nodes in the network falls in the top 2.5% of its distribution under random mixing (e.g., green line), we can conclude, at the significance level of α = 0.05, that the network is positively mixed. Similarly, if it falls in the bottom 2.5% of that distribution (e.g., red line), the network is negatively mixed. Otherwise (e.g., orange line), we cannot reject the hypothesis that the network is randomly mixed with respect to x.
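The decision rule in this caption can be illustrated with a minimal sketch. The function name `classify_mixing` and its arguments are hypothetical and not taken from the paper. The sketch assumes the reference distribution is available as a list of average-similarity values obtained from randomized networks.

```python
def classify_mixing(observed, null_values, alpha=0.05):
    """Two-sided empirical test of the observed average similarity.

    observed    : average similarity of connected nodes in the real network
    null_values : average similarities from randomized networks (assumed given)
    alpha       : significance level, split equally between the two tails
    """
    n = len(null_values)
    # Share of randomized values at least as large / at least as small
    # as the observed value.
    frac_ge = sum(v >= observed for v in null_values) / n
    frac_le = sum(v <= observed for v in null_values) / n

    if frac_ge <= alpha / 2:   # observed lies in the top alpha/2 tail
        return "positive"
    if frac_le <= alpha / 2:   # observed lies in the bottom alpha/2 tail
        return "negative"
    return "random"            # cannot reject random mixing


# Toy reference distribution: 1000 evenly spaced values in [0, 1).
null_values = [i / 1000 for i in range(1000)]
print(classify_mixing(0.999, null_values))
print(classify_mixing(0.001, null_values))
print(classify_mixing(0.5, null_values))
```

With α = 0.05 each tail receives 2.5%, matching the thresholds described in the caption.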
Fig 2.
The computation of VA-index in a nutshell.
The VA-index involves network randomization and empirical hypothesis testing to quantify the assortativity of a network with respect to a multi-dimensional nodal attribute.
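As a rough illustration of the two ingredients named in this caption, the sketch below builds a reference distribution by randomization and compares the observed value against it. It is not the authors' implementation. It assumes cosine similarity as the similarity metric and assumes that randomization is done by shuffling attribute vectors across nodes. The paper's actual randomization scheme may differ, and all function names are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two attribute vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_edge_similarity(edges, attrs):
    """Average similarity of the attribute vectors of connected nodes."""
    return sum(cosine(attrs[i], attrs[j]) for i, j in edges) / len(edges)


def randomization_test(edges, attrs, n_rand=500, seed=0):
    """Compare the observed average similarity with randomized networks.

    Randomization here permutes attribute vectors across nodes while keeping
    the edges fixed (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    observed = mean_edge_similarity(edges, attrs)
    nodes = list(attrs)
    null = []
    for _ in range(n_rand):
        shuffled = nodes[:]
        rng.shuffle(shuffled)
        permuted = {node: attrs[src] for node, src in zip(nodes, shuffled)}
        null.append(mean_edge_similarity(edges, permuted))
    # Share of randomized networks at least as similar as the real one.
    frac_ge = sum(v >= observed for v in null) / n_rand
    return observed, null, frac_ge


# Toy network: two 4-node cliques whose members share the same attribute vector.
attrs = {n: (1.0, 0.0) for n in range(4)}
attrs.update({n: (0.0, 1.0) for n in range(4, 8)})
edges = [(i, j) for i in range(4) for j in range(i + 1, 4)]
edges += [(i, j) for i in range(4, 8) for j in range(i + 1, 8)]

observed, null, frac_ge = randomization_test(edges, attrs)
print(observed, frac_ge)
```

A small `frac_ge` indicates that few randomized networks reach the observed similarity, which points towards positive mixing.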
Fig 3.
Sensitivity of our metric with respect to ξ and ϵ.
The proposed VA-index outperforms the baseline extension of the assortativity coefficient. Furthermore, it does not appear sensitive to the choice of ϵ (Eq (5)) or of the similarity metric.
Fig 4.
Comparison of the VA-index with the baseline extension of the assortativity coefficient.
The VA-index outperforms the baseline metric in all cases, irrespective of the variance and correlation of x’s elements and the density δ of Σ. Nevertheless, for low variance the baseline performs almost as well with respect to the RMSE.
Table 1.
Mean difference Δeν between the absolute error of our method and the baseline.
The significance codes correspond to the two-sample t-test: 0 ‘***’ 0.01 ‘**’ 0.05 ‘*’ 0.1 ‘.’ 1 ‘’. Low, medium and high density correspond to δ ∈ [0, 0.2], δ ∈ [0.4, 0.6] and δ ∈ [0.8, 1] respectively.
Fig 5.
The bias and the variance of the VA-index.
Both the bias and the variance of the VA-index have small absolute values. However, values around ϵ = 1 appear to provide the best performance with regard to minimizing the mean squared error of the estimator.
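The link between bias, variance and mean squared error mentioned in this caption is the standard decomposition MSE = bias² + variance. The sketch below checks it numerically on made-up estimates. The function name and the numbers are illustrative assumptions, not results from the paper.

```python
def bias_variance_mse(estimates, true_value):
    """Empirical bias, variance and mean squared error of repeated estimates."""
    n = len(estimates)
    mean_est = sum(estimates) / n
    bias = mean_est - true_value
    # Population-style variance (divide by n) so the decomposition holds exactly.
    variance = sum((e - mean_est) ** 2 for e in estimates) / n
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return bias, variance, mse


# Hypothetical estimates of a quantity whose true value is 0.5.
bias, variance, mse = bias_variance_mse([0.42, 0.55, 0.61, 0.48], 0.5)
print(abs(mse - (bias ** 2 + variance)) < 1e-12)
```

Because the error splits this way, a parameter value that slightly raises the bias can still minimize the MSE if it lowers the variance by more.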
Table 2.
There is clear positive assortative mixing with regard to the mobility trails of Gowalla users.
Even when controlling for the home-distance distribution, the average pairwise similarity in the real network is significantly higher than that of a randomized network.