Evolution Characteristics of the Network Core in the Facebook

Statistical properties of static networks have been extensively studied. However, online social networks evolve dynamically, and understanding the evolving characteristics of their core is one of the major concerns in online social network analysis. In this paper, we empirically investigate the evolving characteristics of the Facebook core. First, we separate the Facebook-link (FL) and Facebook-wall (FW) datasets into 28 snapshots according to their timestamps. Employing the k-core decomposition method to identify the core of each snapshot, we find that the cores of the FL and FW networks contain approximately 672 and 373 nodes, respectively, regardless of the exponential growth of the network sizes. Second, we analyze the evolving topological properties of the core, including the k-core value, assortative coefficient, clustering coefficient and average shortest path length. Empirical results show that nodes in the core become more interconnected as the network evolves. Third, we investigate the life span of nodes belonging to the core. More than 50% of the nodes stay in the core for more than one year, and 19% of the nodes stay in the core from the first snapshot onward. Finally, we analyze the connections between the core and the whole network, and find that nodes belonging to the core prefer to connect to nodes with high k-core values rather than to nodes with high degrees. This work could provide new insights into online social network analysis.


Introduction
Online social networks are organized around users who create interactions with the people they associate with [1][2][3]. As online social networks attract growing attention, more than a billion people use them to make friends, communicate with friends, share interests, spread ideas and so on [4,5]. An in-depth investigation of the evolving network core is essential for understanding the evolving characteristics of online social networks [6,7], where the core can be identified by the k-core decomposition method [8]. Carmi et al. [9], Zhang et al. [10] and Orsini et al. [11] investigated the topological properties of the Internet at the autonomous system (AS) level and found that the Internet core is a small, well-connected subgroup whose size is approximately stable over time. Kitsak et al. [12] employed the k-core decomposition method to identify the most influential spreaders, defined as the nodes with the highest k-core value (i.e., the core). Miorandi et al. [13] and Ren et al. [14] extended the k-core decomposition method to measure node spreading influence in networks, and Garas et al. [15] presented a generalized method for calculating the k-core structure of weighted networks. These works reach a similar conclusion: nodes belonging to the core are the most influential spreaders. Compared with the Internet, however, little attention has been paid to the core properties of online social networks. In this paper, we empirically analyze the evolution characteristics of the Facebook core, and the statistical results indicate that:
(1) The core sizes of the Facebook-link (FL) and Facebook-wall (FW) networks are approximately stable, at around 672 and 373 nodes, respectively.
(2) Nodes belonging to the core become more interconnected, and their k-core values increase correspondingly.
(3) The life span analysis of the nodes belonging to the core reveals that more than 50% of the nodes stay in the core for more than one year, and 19% of the nodes stay in the core from the first snapshot onward.
(4) Nodes in the core prefer to connect to nodes with high k-core values rather than to nodes with high degrees.

Datasets
The Facebook datasets [16] investigated in this paper consist of two components. The first is the Facebook-link (FL) dataset, which spans from September 5, 2006 to January 22, 2009; the timestamp of each link indicates when the corresponding pair of users became friends. The second is the Facebook-wall (FW) dataset, which spans from September 14, 2004 to January 22, 2009. In Facebook, a user can post comments on friends' walls, and these comments can be seen by visitors. In this paper, we treat these interactions as undirected links. Each link in the FW network consists of three parts: the wall owner, the user who posted, and the posting time. To compare the evolution characteristics of the network core between the FL and FW networks, the period investigated in this paper is set from September 2006 to December 2008.
First, we separate the FL network into monthly pieces. Since the timestamps of approximately 41% of the links are unknown, we place these links in the initial network $S_0$. The first piece $S_1$ covers September 1, 2006 to September 30, 2006, the second piece $S_2$ covers October 1, 2006 to October 31, 2006, and so on up to the last piece $S_{28}$, which covers December 1, 2008 to December 31, 2008. Based on the initial network $S_0$ and the 28 pieces, we construct 28 cumulative snapshots: the first snapshot merges $S_0$ and $S_1$, the second consists of $S_0$, $S_1$ and $S_2$, and the last consists of $S_0$ and all pieces $S_1, S_2, \ldots, S_{28}$. The FW network is separated into 28 snapshots in the same way, except that its initial network $S_0$ corresponds to the period from September 14, 2004 to August 31, 2006.
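The cumulative snapshot construction above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' code: each link is assumed to arrive as a hypothetical `(u, v, timestamp)` triple with `timestamp = None` when unknown, links with unknown or pre-September-2006 timestamps are placed in $S_0$, and links after December 2008 are dropped. The names `month_index` and `build_snapshots` are illustrative.

```python
from datetime import date
from typing import Iterable, List, Optional, Set, Tuple

Edge = Tuple[int, int]


def month_index(d: date, start: date = date(2006, 9, 1)) -> int:
    """Return 1 for September 2006, 2 for October 2006, ..., 28 for December 2008."""
    return (d.year - start.year) * 12 + (d.month - start.month) + 1


def build_snapshots(
    edges: Iterable[Tuple[int, int, Optional[date]]],
    n_pieces: int = 28,
    start: date = date(2006, 9, 1),
) -> List[Set[Edge]]:
    """Split timestamped links into S_0, S_1..S_n and return cumulative snapshots."""
    pieces: List[Set[Edge]] = [set() for _ in range(n_pieces + 1)]  # pieces[0] is S_0
    for u, v, ts in edges:
        if u == v:
            continue  # assumption: self-loops are ignored
        e = (min(u, v), max(u, v))  # store each undirected link once
        if ts is None or ts < start:
            pieces[0].add(e)  # unknown or pre-period timestamps go into S_0
        else:
            m = month_index(ts, start)
            if 1 <= m <= n_pieces:
                pieces[m].add(e)  # links after the last month are outside the period
    snapshots: List[Set[Edge]] = []
    current: Set[Edge] = set(pieces[0])
    for m in range(1, n_pieces + 1):
        current = current | pieces[m]  # snapshot m = S_0 + S_1 + ... + S_m
        snapshots.append(current)
    return snapshots
```

Each snapshot is a fresh set, so later pieces never modify earlier snapshots.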

Methods
Identifying the network core has been extensively investigated [9-13, 15, 17-19]. For example, the core might be defined as the set of all nodes with degree higher than some threshold, but this method requires setting a free parameter, the degree threshold. Other methods [18], such as k-clique and k-clan, and improved methods based on the k-core, such as k-dense [11] and the Medusa model [9], have been used to identify the core of the AS network. In this paper, we focus on the set of most influential nodes in online social networks, defined as the nodes with the highest k-core value [12]. We employ the k-core decomposition method to obtain the core of each snapshot. The k-core decomposition method can be implemented as shown in Fig. 1. First, remove all nodes of degree 1; since this removal can reduce the degrees of other nodes to 1, keep pruning until no such node remains. The removed nodes are assigned $k_s = 1$, and the remaining nodes form the 2-core. The pruning is then repeated in the same way for increasing k, so that every node in the network is assigned to its corresponding shell, denoted by its $k_s$ value. The nodes with the largest $k_s$ value form the network core.
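The iterative pruning described above can be sketched as follows. This is an illustrative implementation of the standard pruning procedure, not the authors' code; `core_numbers` and `network_core` are hypothetical helper names, and the network is assumed to be undirected and built from an edge list (so it has no isolated nodes).

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def core_numbers(edges: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Assign each node its k_s value by repeatedly pruning low-degree nodes."""
    adj: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    remaining = set(adj)
    ks: Dict[int, int] = {}
    k = 1
    while remaining:
        # Remove every node whose current degree is <= k; each removal may push
        # neighbors down to <= k, so they are queued and pruned in the same round.
        stack = [n for n in remaining if degree[n] <= k]
        while stack:
            n = stack.pop()
            if n not in remaining:
                continue  # already pruned via another path
            remaining.discard(n)
            ks[n] = k  # n lies in the k-core but not in the (k+1)-core
            for m in adj[n]:
                if m in remaining:
                    degree[m] -= 1
                    if degree[m] <= k:
                        stack.append(m)
        k += 1  # all remaining nodes now have degree > k, i.e. form the (k+1)-core
    return ks


def network_core(ks: Dict[int, int]) -> Set[int]:
    """The network core: nodes with the largest k_s value."""
    kmax = max(ks.values())
    return {n for n, k in ks.items() if k == kmax}
```

For large snapshots, NetworkX's `core_number` function computes the same $k_s$ values more efficiently and can serve as a cross-check.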
The following definitions are used to analyze the evolution characteristics. The relative growth rate r(t) measures the growth of the core relative to the growth of the network.
where $\delta(i,t) = 1$ if node i belongs to the core of the t-th snapshot and $\delta(i,t) = 0$ otherwise, N(t) is the number of nodes in the t-th snapshot, and $t \in [1, 28]$. If $|r(t)| = 1$, the core size changes at the same rate as the network size. If $|r(t)| < 1$, the core size changes less than the network size, and if $|r(t)| > 1$, it changes more.
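The explicit formula for r(t) did not survive in this text. The sketch below assumes, consistently with the interpretation of $|r(t)| = 1$, $< 1$ and $> 1$ given above, that r(t) is the ratio of the relative change of the core size $n(t) = \sum_i \delta(i,t)$ to the relative change of the network size N(t) between consecutive snapshots, so it is defined from the second snapshot on. This is an assumed reading, not a reproduction of the paper's equation; the helper names are hypothetical.

```python
from typing import List, Sequence, Set


def core_sizes(cores: Sequence[Set[int]]) -> List[int]:
    """n(t) = sum_i delta(i, t): the number of nodes in the core of snapshot t."""
    return [len(core) for core in cores]


def relative_growth_rate(n: Sequence[int], N: Sequence[int]) -> List[float]:
    """Assumed r(t): relative core growth divided by relative network growth."""
    rates: List[float] = []
    for t in range(1, len(n)):
        core_growth = (n[t] - n[t - 1]) / n[t - 1]
        net_growth = (N[t] - N[t - 1]) / N[t - 1]
        # Undefined when the network does not grow between two snapshots.
        rates.append(core_growth / net_growth if net_growth != 0 else float("nan"))
    return rates
```

A stable core in a growing network gives values near zero, which is the behavior reported for Fig. 2(c).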
To define the life span, we measure the existence time L(i) and the continuous lifetime M(j, s). The existence time L(i) is the number of the 28 snapshot cores that contain node i. The continuous lifetime M(j, s) is the number of nodes that stay in the core from the j-th snapshot to the s-th one, defined as follows.
If L(i) = 0, node i never appears in any core, and L(i) = 28 means that node i belongs to all 28 cores. M(j, s) = 1 means that exactly one node stays in the core from the j-th snapshot to the s-th one, and $M(j,s) \in [0, N(t_n)]$. According to the above definitions, we obtain the distribution P(L) of the existence times L(i) and the distribution P(M) of the nodes that remain in the cores from the t-th snapshot to the last one.
where $n(t_n)$ is the core size of the last snapshot. To investigate the connection patterns from the viewpoints of the k-core value ($k_s$) and the degree (k), the correlation between the $k_s$ value of a core node and the $k_s$ values of its neighbors, and the corresponding P(k,t), are defined as follows.
where $k_s(j,t)$ is the $k_s$ value of node j in the t-th snapshot, k(j,t) is the degree of node j in the t-th snapshot, and node j is a neighbor of a core node. k(i,t) is the degree of core node i in the t-th snapshot, and n(t) is the core size of the t-th snapshot.
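The life-span quantities L(i), M(j, s), P(L) and P(M) defined above can be computed from the sequence of snapshot cores. Because the corresponding equations are not shown in this text, the sketch assumes $L(i) = \sum_t \delta(i,t)$, that M(j, s) counts nodes present in every core from snapshot j to snapshot s, that P(L) is normalized over nodes appearing in at least one core, and that P(M) at snapshot t equals $M(t, 28)/n(t_n)$. These normalizations and the helper names are assumptions.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set


def existence_times(cores: Sequence[Set[int]]) -> Counter:
    """L(i): number of snapshot cores containing node i (nodes with L(i) = 0 are absent)."""
    return Counter(node for core in cores for node in core)


def continuous_lifetime(cores: Sequence[Set[int]], j: int, s: int) -> int:
    """M(j, s): nodes present in every core from snapshot j to s (1-indexed, inclusive)."""
    common = set(cores[j - 1])
    for core in cores[j:s]:  # 0-based slice covers snapshots j+1 .. s
        common &= core
    return len(common)


def p_existence(cores: Sequence[Set[int]]) -> Dict[int, float]:
    """P(L), normalized over nodes that appear in at least one core (assumption)."""
    counts = Counter(existence_times(cores).values())
    total = sum(counts.values())
    return {l: c / total for l, c in sorted(counts.items())}


def p_stay_to_last(cores: Sequence[Set[int]]) -> List[float]:
    """P(M) for t = 1..T, taken as M(t, T) / n(t_n), n(t_n) = last core size (assumption)."""
    last = len(cores)
    n_last = len(cores[-1])
    return [continuous_lifetime(cores, t, last) / n_last for t in range(1, last + 1)]
```

Under these assumptions, the first entry of `p_stay_to_last` corresponds to the share of the final core that has been present since the first snapshot.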

Additional methods
The properties of the core, including the assortative coefficient r(t) [23], clustering coefficient c(t) [24] and average shortest path length l(t), are detailed as follows. The assortative coefficient measures the tendency of nodes to connect to other nodes with similar degrees. A general measure of the assortative coefficient is given by [23].
where j(i,t) and h(i,t) are the degrees of the two end nodes of the i-th link in the core of the t-th snapshot, for $i = 1, 2, \ldots, E(t)$. The assortative coefficient ranges between -1 and 1. By construction, this formula yields r = 0 when the amount of assortative mixing is the same as expected at random. A positive assortative coefficient means that nodes tend to connect to nodes with similar degrees, while a negative value means that nodes tend to connect to nodes with very different degrees from their own.
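Newman's assortative coefficient [23] is commonly written, in the notation above with the snapshot index t dropped, as

$$r = \frac{E^{-1}\sum_i j_i h_i - \left[E^{-1}\sum_i \tfrac{1}{2}(j_i + h_i)\right]^2}{E^{-1}\sum_i \tfrac{1}{2}(j_i^2 + h_i^2) - \left[E^{-1}\sum_i \tfrac{1}{2}(j_i + h_i)\right]^2}.$$

The sketch below evaluates this formula for a core edge list. It assumes that degrees are measured within the core subgraph itself, which the text leaves open; the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def assortativity(edges: Iterable[Tuple[int, int]]) -> float:
    """Newman's degree assortativity for an undirected edge list."""
    # Deduplicate undirected links and drop self-loops.
    links = list({(min(u, v), max(u, v)) for u, v in edges if u != v})
    deg: Counter = Counter()
    for u, v in links:
        deg[u] += 1
        deg[v] += 1
    E = len(links)
    s_prod = sum(deg[u] * deg[v] for u, v in links) / E               # E^-1 sum j_i h_i
    s_mean = sum(0.5 * (deg[u] + deg[v]) for u, v in links) / E       # E^-1 sum (j_i + h_i)/2
    s_sq = sum(0.5 * (deg[u] ** 2 + deg[v] ** 2) for u, v in links) / E
    denom = s_sq - s_mean ** 2
    # A regular graph (all degrees equal) makes the denominator zero.
    return (s_prod - s_mean ** 2) / denom if denom else math.nan
```

A star graph gives r = -1 (every link joins the hub to a leaf), matching the disassortative end of the range described above.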
The clustering coefficient of node i is calculated as follows [24]:
$$c_i = \frac{\text{number of pairs of neighbors of } i \text{ that are connected}}{\text{number of pairs of neighbors of } i}.$$
To understand how the shortest path lengths of the network core change during the evolution, the average shortest path length l(t) is defined as follows.
where d(i,j,t) is the shortest path distance between nodes i and j in the core of the t-th snapshot.
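The clustering coefficient and average shortest path length of a core subgraph can be sketched with plain breadth-first search. Two aggregation choices here are assumptions rather than statements from the text: c(t) is taken as the mean of $c_i$ over core nodes (nodes with fewer than two neighbors contribute 0), and l(t) averages d(i,j,t) over all connected pairs, skipping pairs in different components. The helper names are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def adjacency(edges: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    adj: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def local_clustering(adj: Dict[int, Set[int]], i: int) -> float:
    """c_i: connected neighbor pairs divided by all neighbor pairs of i."""
    nbrs = adj[i]
    pairs = len(nbrs) * (len(nbrs) - 1) // 2
    if pairs == 0:
        return 0.0  # assumption: nodes with degree < 2 contribute 0
    linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return linked / pairs


def average_clustering(adj: Dict[int, Set[int]]) -> float:
    return sum(local_clustering(adj, i) for i in adj) / len(adj)


def average_shortest_path(adj: Dict[int, Set[int]]) -> float:
    """Mean BFS distance over all ordered pairs of mutually reachable nodes."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs
```

Averaging over ordered pairs gives the same value as averaging over unordered pairs, since every distance is counted twice.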

Results
The size stability of the core
As shown in Fig. 2(a), the sizes of the FL and FW networks grow exponentially as $N(t) \sim 10^{\lambda t}$, with $\lambda = 0.078$ and $0.028$, respectively. However, as shown in Fig. 2(b), the core sizes of the FL and FW networks are approximately stable over time. As shown in Fig. 2(c), the core relative growth rate r(t) of both the FW network and the FL network (after March 2007) fluctuates around zero. In addition, as shown in Tab. 1, the average core sizes $\bar{n}$ are 672 and 373, and the average core relative growth rates $\bar{r}$ are 0.040 and 0.009, respectively. That is, the size of the core remains stable compared with the rapid growth of the whole network. Our results suggest that as Facebook becomes increasingly popular and attracts more and more users, the network grows fast while the core size remains at a stable level.
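One way to estimate λ in $N(t) \sim 10^{\lambda t}$ is an ordinary least-squares fit of $\log_{10} N(t)$ against t. The text does not state which fitting procedure was used, so this is an assumption. The averages $\bar{n}$ and $\bar{r}$ reported in Tab. 1 are taken here as plain means over the snapshots. Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_exponential_growth(sizes: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log10 N(t) = a + lambda * t for t = 1, 2, ...; returns (lambda, a)."""
    ts = list(range(1, len(sizes) + 1))
    ys = [math.log10(n) for n in sizes]
    t_mean = sum(ts) / len(ts)
    y_mean = sum(ys) / len(ys)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    sxx = sum((t - t_mean) ** 2 for t in ts)
    lam = sxy / sxx  # slope on the log10 scale
    return lam, y_mean - lam * t_mean


def average_over_snapshots(values: Sequence[float]) -> float:
    """Plain mean over snapshots, e.g. the average core size or relative growth rate."""
    return sum(values) / len(values)
```

Fitting on the logarithmic scale weights every snapshot equally, whereas a direct nonlinear fit of N(t) would be dominated by the largest, latest snapshots.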
The evolving topological properties of the core
From Fig. 3, we find that the $k_s$ values of the cores increase quickly, which indicates that the nodes in the cores connect to each other more closely. Figure 4 presents the evolving topological properties of the core. In Fig. 4(a), the assortative coefficient r(t) of the FL core is always lower than 0.05, while r(t) of the FW core keeps decreasing from 0.25 to 0. These results indicate that users in the Facebook core choose friends, and the friends on whose walls they post comments, independently of those friends' degrees; that is, they do not care whether their friends are the most popular or influential. From Fig. 4(b), one can see that the clustering coefficient c(t) increases with time, which indicates that the core becomes more interconnected. As shown in Fig. 4(c), the average shortest path length of the core gently decreases to 3 in the FL network and to 4.5 in the FW network over time. This decreasing trend also shows that the core becomes more interconnected over time.
The life span of nodes in the core
Figure 5(a) shows the distribution of the existence times P(L), which has a 'U' shape: many nodes have existence times less than 6 or more than 24. Meanwhile, we analyze the number of nodes that stay in the core from a given snapshot to the last one. Figure 5(b) indicates that over 50% of the nodes stay in the core for more than one year, and 19% of the nodes stay in the core from the first snapshot onward. We suggest that as Facebook quickly becomes popular and attracts large numbers of users, the most influential and active users remain in the Facebook core for a long time.

The connections between the nodes of the core and their neighbors
Online social interactions have provided plentiful evidence of their influence on information diffusion. However, it is difficult to understand whether individuals tend to connect to friends with similar tastes or to popular ones [20]. A well-known tendency is that new connections are made preferentially to more popular nodes [21]. Nonetheless, Papadopoulos et al. [22] pointed out that connections are formed by a trade-off optimization between popularity and similarity. Here we analyze the core connections from the viewpoints of the $k_s$ values and the degrees, respectively. As shown in Fig. 6(a) and (b), the correlation between the $k_s$ value of the core nodes and the $k_s$ values of their neighbors increases with $k_s$ in both the FL and FW networks, i.e., nodes with larger $k_s$ values are more likely to connect to the core. However, as time goes on, this correlation with the $k_s$ values weakens. Figure 6(c) and (d) show the correlation between the degree k of the core nodes and the degrees of their neighbors, from which we see that nodes in the network have a high probability of connecting to the core even if they do not have the largest degrees. We conclude that nodes in the core prefer to connect to nodes with higher $k_s$ values rather than to nodes with higher degrees.
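The exact correlation measures behind Fig. 6 are not reproduced in this text. As a hypothetical, simpler proxy for the question "are nodes with a given $k_s$ (or degree) likely to connect to the core?", the sketch below computes, for each value x, the fraction of non-core nodes with value x that have at least one neighbor in the core. Passing $k_s$ values or degrees as `value` gives the two viewpoints compared above; this measure and the function name are illustrative, not the paper's definition of P(k,t).

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def connection_probability(
    edges: Iterable[Tuple[int, int]],
    core: Set[int],
    value: Dict[int, int],
) -> Dict[int, float]:
    """For each value x (e.g. k_s or degree), the share of non-core nodes with
    value x that link to at least one core node (hypothetical proxy measure)."""
    touches_core: Set[int] = set()
    for u, v in edges:
        if u in core and v not in core:
            touches_core.add(v)
        elif v in core and u not in core:
            touches_core.add(u)
    totals: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    for node, x in value.items():
        if node in core:
            continue  # only nodes outside the core are scored
        totals[x] += 1
        if node in touches_core:
            hits[x] += 1
    return {x: hits[x] / totals[x] for x in sorted(totals)}
```

Comparing the curve obtained with $k_s$ values against the one obtained with degrees is one way to probe the preference for high-$k_s$ neighbors described above.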

Conclusions and Discussions
In this article, we empirically investigate the evolving characteristics of the Facebook core. We separate the Facebook-link (FL) and Facebook-wall (FW) networks into 28 snapshots according to their timestamps, and employ the k-core decomposition method to identify the core of each snapshot. The empirical results show that the number of users grows exponentially during the evolution, while the core sizes remain approximately stable at about 672 and 373 nodes for the FL and FW networks, respectively. We also analyze the topological properties of the core, including the $k_s$ values, assortative coefficient r(t), clustering coefficient c(t) and average shortest path length l(t), versus time t. The $k_s$ values of the cores increase quickly. The assortative coefficient r(t) of the FL core is always lower than 0.05, while r(t) of the FW core keeps decreasing from 0.25 to 0. The clustering coefficient c(t) increases with time, which indicates that the core becomes more interconnected. The average shortest path length of the core gently decreases from 3.5 to 3 in the FL network and from 5.7 to 4.5 in the FW network. From these topological properties, we conclude that the users in the core become more interconnected. Furthermore, we analyze the life span of nodes belonging to the core. The distribution of the existence times P(L) indicates that many nodes have existence times less than 6 or more than 24. In particular, the distribution P(M) of the continuous lifetime indicates that more than 50% of the nodes stay in the core for more than one year, and 19% of the nodes belong to the core from the first snapshot onward. This suggests that the most influential users stay in the Facebook core for a long time. Finally, we analyze the connections of individuals in the core. The correlations between the $k_s$ value (degree k) of a core node and the $k_s$ values (degrees k) of its neighbors indicate that users in the core prefer to interact with users with higher k-core values, rather than with users of high degree.
Our analysis only focuses on the evolution characteristics of the network core in Facebook, and additional research is needed to complement our findings. First, in this paper, we investigate the evolution properties of the Facebook core with a time interval of one month. However, the identification of the network core is affected by the time interval; therefore, we also investigated the corresponding results with a time interval of two months, as shown in Fig. 7 and Tab. 1, and found that the core size, relative growth rate and other statistical characteristics are robust to the time interval, which suggests that the results obtained in this paper are not sensitive to the choice of time interval. It should also be emphasized that interactions in online social networks evolve rapidly; therefore, how to model the temporal relationship between each pair of users and identify the corresponding network core remains an open question for online social network analysis. Second, although the k-core definition of the core of an undirected network is parameter-free and easy to implement, many online social networks are directed, weighted networks to which the k-core decomposition method does not directly apply. To further validate the work presented here, we plan to develop a reliable method for identifying the core of directed, weighted networks.

Table 1 note: The average value $\bar{n}$ is defined as $\bar{n} = \sum_{t=1}^{28} n(t)/28$, and the average value $\bar{r}$ is defined as $\bar{r} = \sum_{t=1}^{28} r(t)/28$. doi:10.1371/journal.pone.0104028.t001
In addition, our research could supply some important criteria for modeling the core of online social networks. In a broader context, our work may be relevant to constructing a dynamical core model for understanding the evolution of the online social network core more deeply. User interactions on online social networks also affect user behavior; thus, models of user behavior should consider not only the influence of online societies but also that of offline societies. In particular, offline social influence could change user behavior and cause users to leave, which may trigger further departures of others who lose connections to their friends. This may lead to cascades of users leaving and dramatically change the topological structure of online social networks. Hence, how to quantify the influence of offline societies on these online systems is also an interesting and important open problem.

Figure 5 (caption excerpt): There are many nodes whose existence times are less than 6 or more than 24, and fewer nodes whose existence times are between 6 and 24. (b) The distribution P(M) of the nodes that exist in the cores from the t-th snapshot to the last one. More than 50% of the nodes belong to the cores for more than one year, and 19% of the nodes belong to the cores from the first snapshot onward. doi:10.1371/journal.pone.0104028.g005

Figure 6 (caption excerpt): The correlation between the degree k of a core node and the degrees of its neighbors, from which we see that nodes in the network have a high probability of connecting to the core even if they do not have the largest degrees. doi:10.1371/journal.pone.0104028.g006