
The authors have declared that no competing interests exist.

Conceived and designed the experiments: AB DG. Performed the experiments: DG MK YT. Analyzed the data: DG MK. Contributed to the writing of the manuscript: DG AB PP SY PP. Contributed theory: DM PP SY.

We consider the dimensionality of social networks, and develop experiments aimed at predicting that dimension. We find that a social network model with nodes and links sampled from an

Empirical studies of on-line social networks as undirected graphs suggest these graphs have several intrinsic properties: highly skewed or even power-law degree distributions

In order to accurately capture the observed properties of social networks—in particular, constant or shrinking diameters—the dimension of the underlying metric space in the GEO-P model must grow logarithmically with the number of nodes. The logarithmically scaled dimension is a property that occurs frequently with network models that incorporate geometry, such as in multiplicative attribute graphs

To our knowledge, the present paper is the first study that attempts to quantify the dimensionality of social networks and Blau space. While we do not claim to prove conclusively the logarithmic dimension hypothesis for such networks, our experiments, such as those of

Our findings provide evidence for dimensional properties underlying social networks that have a number of potential applications in future studies. First, the dimensional properties could be used for further classification and characterization of different types of networks. Second, many NP-hard optimization problems related to graph properties and community detection are polynomial-time solvable in a low-dimensional metric space; our findings thus suggest new techniques for understanding why we may expect to solve these problems in social networks. Finally, if techniques to find these dimensions emerge, we should be able to create powerful new methods that harness the insight they offer into network structure.

The particular network model we study is a simple variation on the GEO-P model that we name the memoryless geometric protean model (MGEO-P), since it enables us to approximate a GEO-P network without using a costly sampling procedure. Both the GEO-P and MGEO-P models depend on five parameters described in

the total number of nodes

the dimension of the metric space

the attachment strength parameter

the density parameter

the connection probability

The nodes and edges of the network arise from the following process. Initially the network is empty. At each of

With probability

and where

Each figure shows the graph “replicated” in grey on all sides in order to illustrate the torus metric. Links are drawn to the closest replicated neighbor. The blue square indicates the region
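The wrap-around (torus) metric and the node-arrival process can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes points are drawn uniformly from the unit hypercube, that distance is the max-norm on the torus, that each node receives a distinct rank from 1 to the number of nodes, and that a node of rank r influences a region of volume r^(-alpha) * n^(-beta). The function names `torus_distance` and `sample_mgeo_p` and the radius formula are hypothetical, since the exact expressions are not reproduced in this text.

```python
import random


def torus_distance(x, y):
    """Max-norm distance between points x and y on the unit torus [0, 1)^m."""
    # Along each axis, take the shorter of the direct and wrap-around gaps.
    return max(min(abs(a - b), 1.0 - abs(a - b)) for a, b in zip(x, y))


def sample_mgeo_p(n, m, alpha, beta, p, seed=None):
    """Sketch of an MGEO-P-style sampler (the radius formula is an assumption)."""
    rng = random.Random(seed)
    ranks = list(range(1, n + 1))
    rng.shuffle(ranks)  # each node receives a distinct rank in 1..n
    positions, edges = [], set()
    for t in range(n):
        # The new node gets a uniform random position in [0, 1)^m.
        q = tuple(rng.random() for _ in range(m))
        for u in range(t):
            # Assumed: a node of rank r influences a cube of volume
            # r**(-alpha) * n**(-beta), so its half-side is volume**(1/m) / 2.
            radius = 0.5 * (ranks[u] ** (-alpha) * n ** (-beta)) ** (1.0 / m)
            # Link to each earlier node whose region contains q, with prob. p.
            if torus_distance(q, positions[u]) <= radius and rng.random() < p:
                edges.add((u, t))
        positions.append(q)
    return positions, ranks, edges
```

For example, `sample_mgeo_p(250, 2, 0.9, 0.04, 0.9, seed=0)` returns node positions, ranks, and an edge set for a 250-node network in two dimensions; the parameter values in this call are arbitrary. The quadratic loop over earlier nodes keeps the sketch short and is only practical for small networks.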

We formally prove that the MGEO-P model has the following properties. Let

1. Let

This result implies that the degree distribution follows a power law with exponent

2. The average degree of a node of

3. The diameter of

This last property suggests that, ignoring constants, for a network with

Thus, like some network models that incorporate geometry

Both graph motifs and spectral densities are numeric summaries of a graph that abstract the details of a network into a small set of values that are independent of the particular nodes of a network. These summaries have the property that isomorphic graphs have the same values, and we will use these summaries to determine the dimension of the metric space that best matches Facebook and LinkedIn networks as illustrated in

Throughout, red lines denote the flow of features for the MGEO-P networks whereas blue lines denote flow of features for the original networks. At the bottom, we show an enlarged representation of the 8 graphlets we use.

We study dimensional scaling in social networks by comparing samples of the MGEO-P networks of varying dimensions with samples of social network data from Facebook and LinkedIn. We pay particular attention to the relationship between the number of nodes

Facebook distributed 100 samples of social networks from universities within the United States measured as of September 2005

We see similar scaling for both types of networks, but with slightly different offsets. For Facebook,

The results of our dimensional fitting for graphlets are shown in

Each red dot (SVM) is the predicted dimension computed via graphlet features and a support vector machine classifier. For the Facebook data, we find that

Each blue point (Eigen) is the dimension of the MGEO-P sample with the minimum KL-divergence between the graph and the MGEO-P sample. We also show any other dimensions within 5% of this divergence value. The dimensions shift modestly higher for Facebook and remain almost unchanged for LinkedIn. Both are still closely correlated with the theoretical prediction of the model based on

Data | Dimension fit | First coefficient | Second coefficient | Interval (first coefficient) | Interval (second coefficient)

Facebook | Graphlet | 2.06 | −3.00 | (1.851, 2.264) | (−3.821, −2.182)

Facebook | Spectral density | 1.21 | 1.65 | (0.9782, 1.446) | (0.7272, 2.578)

LinkedIn | Graphlet | 0.98 | 1.01 | (0.786, 1.178) | (0.1591, 1.87)

LinkedIn | Spectral density | 0.77 | 1.1 | (0.56, 0.99) | (0.23, 1.95)

The most important feature of these results is that both methodologies show that dimensionality scales similarly with network size. There are minor differences between the precise predicted dimensions (for instance, the spectral density approach predicts slightly higher dimensions for Facebook than does the graphlet approach), but the results agree to a reasonable degree with the dimension predicted by the model:

We investigate the sensitivity of the graphlet results in two settings. If we reduce the training set size of the SVM classifier by using a random subset of 20% of the input training data and then rerun the training and classification procedure 50 times, then we find a distribution over dimensions that we report as a box-plot, shown in

There is a growing body of evidence that argues for some type of geometric structure in social and information networks. An important study in this direction views networks as samples of geometric graphs within a hyperbolic space

Note that these results do not conclusively argue that MGEO-P is a

The MGEO-P model correctly captures the peak of the distribution around 1, but fails to completely capture the tail between 1 and 2. Thus, we see a meaningful difference between these profiles and, hence, do not suggest that MGEO-P captures all of the properties of real-world social networks.

This finding suggests a number of opportunities for designing social network models with metric spaces that evolve in time. We believe that such models offer the opportunity to identify new properties of social networks based on emergent properties of the models. One question to address is how the metric space and connection radius change, if at all, as the network grows. Answering this question would provide insight into the value of additional users of a network. Additionally, our results suggest that many network models that assume a fixed dimension should be reevaluated.

To determine the power-law exponent

The MGEO-P model of a network predicts that the dimension

To compute graphlets, we employ the rand-esu sampling algorithm
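To make the notion of a graphlet count concrete, the sketch below enumerates every 3- and 4-node subset of a small graph and classifies its induced subgraph. It is a brute-force stand-in for illustration only and is not the rand-esu sampling algorithm used in the study. It assumes that the 8 graphlets are the connected induced subgraphs on 3 and 4 nodes; the graphlet names and the function `count_graphlets` are hypothetical labels. On 3 and 4 nodes, the sorted degree sequence of the induced subgraph identifies each connected graphlet uniquely, which keeps the classification to a dictionary lookup.

```python
from itertools import combinations

# Sorted degree sequence of the induced subgraph -> graphlet label.
# Degree sequences of disconnected subgraphs are absent, so they are skipped.
GRAPHLETS = {
    (1, 1, 2): "path3",
    (2, 2, 2): "triangle",
    (1, 1, 2, 2): "path4",
    (1, 1, 1, 3): "star4",
    (2, 2, 2, 2): "cycle4",
    (1, 2, 2, 3): "tailed_triangle",
    (2, 2, 3, 3): "diamond",
    (3, 3, 3, 3): "clique4",
}


def count_graphlets(nodes, edges):
    """Exact induced-graphlet counts by brute force (small graphs only)."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    counts = {name: 0 for name in GRAPHLETS.values()}
    for k in (3, 4):
        for subset in combinations(nodes, k):
            chosen = set(subset)
            # Degree of each node counted inside the chosen subset only.
            degrees = tuple(sorted(len(adj[v] & chosen) for v in subset))
            name = GRAPHLETS.get(degrees)
            if name is not None:
                counts[name] += 1
    return counts
```

Exhaustive enumeration grows as the fourth power of the number of nodes, which is why a sampling method is needed for networks of realistic size.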

We approximate the spectral density via a 201-bin histogram of the eigenvalues of the normalized Laplacian, which all fall between 0 and 2. (The choice of 201 was based on prior experience with the spectral histograms of networks.) To compute eigenvalues of a network, we employ the recently developed ScaLAPACK routine using the MRRR algorithm
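The histogram step and the KL-divergence comparison of two spectral densities can be sketched as follows. This is a minimal illustration that takes the eigenvalues as given input rather than computing them with ScaLAPACK. The additive smoothing constant `eps` and the direction of the divergence are assumptions, and the helper names are hypothetical. For a self-contained demonstration, the sketch uses the n-cycle, whose normalized Laplacian eigenvalues are known in closed form as 1 - cos(2*pi*k/n).

```python
import math


def spectral_histogram(eigenvalues, bins=201):
    """Normalized histogram of eigenvalues lying in [0, 2]."""
    counts = [0] * bins
    for lam in eigenvalues:
        # Clamp round-off excursions, then map [0, 2] onto bin indices.
        lam = min(max(lam, 0.0), 2.0)
        counts[min(int(lam / 2.0 * bins), bins - 1)] += 1
    total = len(eigenvalues)
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) with additive smoothing eps (the smoothing is an assumption)."""
    ps = [x + eps for x in p]
    qs = [x + eps for x in q]
    zp, zq = sum(ps), sum(qs)
    # Renormalize after smoothing so both vectors sum to one.
    return sum((a / zp) * math.log((a / zp) / (b / zq)) for a, b in zip(ps, qs))


def cycle_spectrum(n):
    """Normalized-Laplacian eigenvalues of the n-cycle: 1 - cos(2*pi*k/n)."""
    return [1.0 - math.cos(2.0 * math.pi * k / n) for k in range(n)]
```

Given histograms for a network and for MGEO-P samples of several dimensions, the sample with the smallest divergence would indicate the best-matching dimension.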

We used a multi-class support-vector machine (SVM) based classification tool from Weka

Consider a graph

In order to derive this simple expression, we make the simplifying assumption that

We acknowledge Jure Leskovec for allowing us to access the LinkedIn dataset.

This supporting document contains the following components of our analysis. (i) Formal proofs of the MGEO-P properties. (ii) Full statistical information about each of the Facebook and LinkedIn networks including the graphlet counts. (iii) Figure S1: Predicted dimensions of random graphs with the same degree distribution. (iv) Figure S2: The change in predicted dimension found by perturbing the graph structure. (v) A discussion of the sensitivity results about the predicted dimension.

