Dimensionality of Social Networks Using Motifs and Eigenvalues

doi:10.1371/journal.pone.0106052

Table 1.

The parameters of the MGEO-P model.

Figure 1.

An example describing the MGEO-P process on a graph with nodes in the unit square with torus metric, where and and .

Each panel shows the graph “replicated” in grey on all sides to illustrate the torus metric; links are drawn to the closest replicated neighbor. The blue square indicates the region . Top row (left to right): the MGEO-P process begins with relatively few nodes, so nodes must have large influence radii (red squares) to link to anything. As more nodes arrive, large radii produce many connections, modeling influential users, while small radii produce few connections, modeling standard users. Bottom row: the final constructed graph.
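As a rough illustration of the process in Figure 1, the Python sketch below generates an MGEO-P-style graph on the unit torus. The precise rules and parameter values are those of Table 1, which this text does not reproduce, so the following are assumptions: each node keeps a fixed rank drawn from a random permutation of 1..n, its influence region has volume rank^(-alpha) * n^(-beta), distances use the L-infinity torus metric (whose balls are squares, like the red squares in the figure), and an arriving node links to each earlier node whose region contains it with probability p. The function name `mgeo_p_sketch` is hypothetical.

```python
import random


def torus_linf(x, y):
    """L-infinity distance between two points on the unit torus [0, 1)^m."""
    return max(min(abs(a - b), 1.0 - abs(a - b)) for a, b in zip(x, y))


def mgeo_p_sketch(n, m, alpha, beta, p=1.0, seed=None):
    """Hypothetical MGEO-P-style generator built on the assumed rules above.

    Returns the node positions and an edge set of pairs (i, t) with i < t.
    """
    rng = random.Random(seed)
    # Assumption: each node has a fixed rank from a random permutation of 1..n.
    ranks = list(range(1, n + 1))
    rng.shuffle(ranks)
    positions = [[rng.random() for _ in range(m)] for _ in range(n)]
    edges = set()
    for t in range(n):  # node t arrives
        for i in range(t):  # compare it with every earlier node
            # Assumed influence volume rank^-alpha * n^-beta, capped at the torus volume 1.
            volume = min(1.0, ranks[i] ** (-alpha) * n ** (-beta))
            # An L-infinity ball of this volume is a cube of side volume**(1/m).
            radius = volume ** (1.0 / m) / 2.0
            if torus_linf(positions[i], positions[t]) <= radius and rng.random() < p:
                edges.add((i, t))
    return positions, edges
```

With alpha = beta = 0 every influence region covers the whole torus, so the sketch returns the complete graph; larger exponents shrink the regions so that only low-rank (influential) nodes keep large neighborhoods.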

Figure 2.

Left and center: the steps involved in fitting via graphlets. Right and center: the steps involved in fitting via the spectral histogram.

Throughout, red lines denote the flow of features for the MGEO-P networks, whereas blue lines denote the flow of features for the original networks. At the bottom, we show an enlarged representation of the 8 graphlets we use.
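The 8 graphlets used as features are drawn in the figure but not listed in this text. As a minimal illustration of graphlet counting, the sketch below counts the two connected three-node graphlets (the open two-path and the triangle) in an undirected edge list with integer node labels; the function name is hypothetical and the actual feature set is larger.

```python
from itertools import combinations


def three_node_graphlets(edges):
    """Count induced connected 3-node graphlets in an undirected simple graph.

    Returns (open_wedges, triangles): open wedges are induced 2-paths,
    triangles are 3-cliques.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # A node of degree d is the center of d*(d-1)/2 wedges (open or closed).
    wedges = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    # Count each triangle once by requiring u < v < w.
    triangles = 0
    for u in adj:
        for v, w in combinations(sorted(adj[u]), 2):
            if u < v and w in adj[v]:
                triangles += 1
    # Every triangle closes exactly three wedges.
    return wedges - 3 * triangles, triangles
```

Counts like these are usually normalized (for example, divided by their sum) before graphs of different sizes are compared; whether the original features are normalized this way is not stated here.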

Figure 3.

The scale of the network data involved in our study varies over three orders of magnitude.

We see similar scaling for both types of networks, but with slightly different offsets. For Facebook, with ; for LinkedIn with . The regularity in the LinkedIn sizes is due to our construction of those networks.
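The scaling formulas in this caption did not survive extraction. Assuming the figure relates edge count to node count through a power law, edges ≈ c · nodes^k, the exponent k and prefactor c can be estimated with an ordinary least-squares line through the log-log points; a shared slope with different intercepts would correspond to the "slightly different offsets" described. This is a sketch under that assumption, not the authors' fitting procedure, and `fit_power_law` is a hypothetical name.

```python
import math


def fit_power_law(nodes, edges):
    """Least-squares fit of log(edges) = k * log(nodes) + log(c).

    Returns (c, k) so that edges is approximately c * nodes**k.
    """
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edges]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    k = sxy / sxx            # slope in log-log space = exponent
    c = math.exp(my - k * mx)  # intercept in log-log space = log(prefactor)
    return c, k
```

For example, `fit_power_law([100, 1000, 10000], [3 * n ** 1.2 for n in [100, 1000, 10000]])` recovers c ≈ 3 and k ≈ 1.2 from these synthetic points.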

Figure 4.

Facebook dimension at left, LinkedIn dimension at right.

Each red dot (SVM) is the dimension predicted from graphlet features by a support vector machine classifier. For the Facebook data, we find that . For the LinkedIn data, we find that . These fits are plotted as the red line. Our theoretical model predicts a dimension of , plotted as the dashed line. In each panel, the variance in the fitted dimension is shown as a box plot; we estimate it by training on only 20% of the original training data and repeating over 50 trials. There are only a few outliers for small dimensions.
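The variance estimate described here (train on a random 20% of the training data, repeat over 50 trials, summarize as a box plot) can be sketched as follows. The SVM itself is not reproduced: as a plainly labeled stand-in, a nearest-centroid classifier maps a graphlet feature vector to the MGEO-P dimension whose training examples it is closest to on average. The training data are assumed to be (feature_vector, dimension) pairs from MGEO-P samples, and all function names are hypothetical.

```python
import random
import statistics


def nearest_centroid_predict(train, query):
    """Stand-in for the SVM: return the label whose mean feature vector is closest to `query`."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [a / counts[label] for a in acc]
        dist = sum((c - q) ** 2 for c, q in zip(centroid, query))  # squared Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def subsampled_predictions(train, query, fraction=0.2, trials=50, seed=0):
    """Predict the dimension of `query` `trials` times, each time training on a random `fraction` of `train`."""
    rng = random.Random(seed)
    k = max(1, int(fraction * len(train)))
    return [nearest_centroid_predict(rng.sample(train, k), query) for _ in range(trials)]


def box_summary(values):
    """Five-number summary (min, Q1, median, Q3, max) of the repeated predictions."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)
```

The spread of `subsampled_predictions` for one network is what a single box in the figure would summarize; swapping the stand-in for a real SVM changes only `nearest_centroid_predict`.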

Figure 5.

Facebook data at left, LinkedIn data at right.

Each blue point (Eigen) is the dimension of the MGEO-P sample with the minimum KL-divergence between the graph and the MGEO-P sample. We also show any other dimensions whose divergence is within 5% of this minimum value. The dimensions shift modestly higher for Facebook and remain almost unchanged for LinkedIn. Both remain closely correlated with the theoretical prediction of the model based on (dashed line). The linear fits to the predicted dimensions are plotted as the red line.
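The selection rule in this caption (take the dimension with minimum KL divergence, and also report dimensions within 5% of that minimum) can be sketched in Python as below. The bin count, the [0, 2] histogram range, the small smoothing constant that keeps empty bins finite, and the direction KL(graph || sample) are all assumptions of this sketch, as are the function names.

```python
import math


def histogram(values, bins=20, lo=0.0, hi=2.0):
    """Normalized histogram of eigenvalues on [lo, hi]; out-of-range values are clipped into the end bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-10):
    """KL(p || q), with eps added to every bin (assumed smoothing) so empty bins stay finite."""
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    sp, sq = sum(p), sum(q)
    return sum((a / sp) * math.log((a / sp) / (b / sq)) for a, b in zip(p, q))


def best_dimensions(target_eigs, samples_by_dim, tolerance=0.05, bins=20):
    """Return (best_dim, near_dims): the dimension with minimum divergence, plus every
    dimension whose divergence is within `tolerance` (relative) of that minimum."""
    target = histogram(target_eigs, bins)
    scores = {d: kl_divergence(target, histogram(eigs, bins)) for d, eigs in samples_by_dim.items()}
    best = min(scores, key=scores.get)
    near = sorted(d for d, s in scores.items() if s <= scores[best] * (1 + tolerance))
    return best, near
```

Here `samples_by_dim` maps each candidate MGEO-P dimension to the eigenvalues of one sample; `near_dims` always includes `best_dim` itself.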

Table 2.

Dimension scaling for Facebook and LinkedIn.

Figure 6.

For three of the Facebook networks, we show the eigenvalue histogram in red, the eigenvalue histogram from the best fit MGEO-P network in blue, and the eigenvalue histograms for samples from the other dimensions in grey.

The MGEO-P model correctly captures the peak of the distribution around 1 but does not fully capture the tail between 1 and 2. These profiles therefore differ meaningfully, and we do not suggest that MGEO-P captures all of the properties of real-world social networks.
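The histograms in this figure span eigenvalues between 0 and 2, which is consistent with (though not confirmed here to be) the spectrum of the normalized Laplacian L = I - D^{-1/2} A D^{-1/2}, whose eigenvalues always lie in [0, 2]. Assuming that matrix, the sketch below builds it for a small graph and computes its eigenvalues with cyclic Jacobi rotations using only the standard library. It is meant for toy graphs; a real pipeline would call a dense or sparse eigensolver from a numerical library. Function names are hypothetical.

```python
import math


def normalized_laplacian(n, edges):
    """Dense L = I - D^{-1/2} A D^{-1/2} for an undirected graph on nodes 0..n-1.
    Isolated nodes get an all-zero row and column (an assumed convention)."""
    deg = [0] * n
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        if u != v and adj[u][v] == 0.0:
            adj[u][v] = adj[v][u] = 1.0
            deg[u] += 1
            deg[v] += 1
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if deg[i] > 0:
            L[i][i] = 1.0
        for j in range(n):
            if adj[i][j]:
                L[i][j] = -1.0 / math.sqrt(deg[i] * deg[j])
    return L


def jacobi_eigenvalues(M, tol=1e-18, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations A <- P^T A P."""
    A = [row[:] for row in M]
    n = len(A)
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the new A[p][q] is zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p] = c * akp - s * akq
                    A[k][q] = s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k] = c * apk - s * aqk
                    A[q][k] = s * apk + c * aqk
    return sorted(A[i][i] for i in range(n))
```

For the star graph with one center and three leaves, the sketch returns 0, 1, 1, 2 (up to rounding); binning such eigenvalue lists over [0, 2] produces profiles of the kind compared in the figure.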
