Emergence of Scale-Free Close-Knit Friendship Structure in Online Social Networks

Although the structural properties of online social networks have attracted much attention, the properties of the close-knit friendship structures remain an important open question. Here, we mainly focus on how these mesoscale structures are affected by the local and global structural properties. Analyzing the data of four large-scale online social networks reveals several common structural properties. It is found that not only do the local structures given by the indegree, outdegree, and reciprocal degree distributions follow a similar scaling behavior, but the mesoscale structures represented by the distributions of close-knit friendship structures also exhibit a similar scaling law. The degree correlation is very weak over a wide range of the degrees. We propose a simple directed network model that captures the observed properties. The model incorporates two mechanisms: reciprocation and preferential attachment. Through rate equation analysis of our model, the local-scale and mesoscale structural properties are derived. On the local scale, the same scaling behavior of the indegree and outdegree distributions stems from the indegree and outdegree of nodes both growing as the same function of the introduction time, and the reciprocal degree distribution also shows the same power law due to the linear relationship between the reciprocal degree and the in/outdegree of nodes. On the mesoscale, the distributions of four closed triples representing close-knit friendship structures are found to exhibit identical power laws, a behavior attributed to the negligible degree correlations. Intriguingly, all the power-law exponents of the distributions on the local scale and mesoscale depend only on one global parameter, the mean in/outdegree, while both the mean in/outdegree and the reciprocity together determine the ratio of the reciprocal degree of a node to its in/outdegree. Structural properties of numerically simulated networks are analyzed and compared with each of the four real networks.
This work helps understand the interplay between structures on different scales in online social networks.


Introduction
In recent years, an increasing number of online social systems (e.g., YouTube and Facebook) have been attracting wide attention from different fields [1][2][3]. Online social networks provide a platform for web surfers to make acquaintance with congenial friends [4], exchange photos and personal news [5], share videos [6], establish communities or forums on focused issues [7], etc. These online interactive behaviors, which partly reflect real-life social relationships among people, provide an unprecedented opportunity to study and understand the dazzling characteristics of real-life social systems [8,9].
Complex network theory has been proven to be a powerful framework to understand the structure and dynamics of complex systems [10][11][12][13][14][15][16]. Online social systems have been treated as undirected networks [17,18], which have been applied successfully in exploring various systems [10]. This simplification, however, cannot describe the asymmetric interactions among users. Taking Flickr as an example, if a user A designates another user B as a friend, user A can see the photos of user B, but not the other way round unless user B also designates user A as a friend. Technically, an asymmetric interaction represents one directed link, and many online social systems are thus directed networks in nature. The directionality of links is important in characterizing the functioning of many systems, e.g., leadership structure of social reputation [19,20], reciprocal behavior in evolutionary games [21], information hierarchy of the World Wide Web [22,23], citation relationship of scientific publications [24,25], etc. Much effort has been devoted to understanding the structural properties of these directed networks, including the indegree and outdegree distributions [26], average shortest distance [26], degree correlation [27], and community structure [28][29][30]. Correspondingly, many models have been proposed for the mechanisms underlying these statistical properties. Dorogovtsev et al. [31] generalized the Barabási-Albert (BA) model [32] and obtained the exact form of the indegree distribution of growing networks in the thermodynamic limit. Krapivsky et al. [33] introduced a directed network model that generates correlated indegree and outdegree distributions. Zhou et al. [20] argued that the "good get richer" mechanism would facilitate the emergence of scale-free leadership structure in online social networks.
Up to now, most of the work on complex networks can be classified into studies on three scales: the local scale based on the single node properties (through statistical distributions), the macroscale based on the global properties of networks (with global parameters), and the mesoscale based on properties due to a group of nodes (via modular properties) [34][35][36]. However, a majority of studies focused on the first two scales. In view of the significant role of modularity in the functionality of real networks, it has become increasingly important to study the mesoscale structures. Communities and motifs are two key mesoscale structures of real complex networks. Community structures at the mesoscale level are ubiquitous in a variety of real complex systems [37,38], such as Facebook, YouTube, and Xiaonei. There are more connections among members of the same community than among members of different communities. Lancichinetti et al. analyzed the statistical properties of communities in five categories of real complex networks, and found that communities detected in networks of the same category display similar structural characteristics [39]. Motifs, which are defined as subgraphs that occur much more often than expected in a random network, play a significant role in our understanding of the interplay between the structures and dynamics of real complex networks [40][41][42][43][44][45].
In spite of the structural features revealed at the three scales, understanding the interplay between the different scales has remained a major challenge [34][35][36]. In the present work, we study how the close-knit friendship structures of online social networks at the mesoscale level and the structural properties at the two other scales affect each other. In social networks, the close-knit friendship structure describes the closest unit, which is usually represented by the closed triples. In a directed network, there are 13 different possible three-node subgraphs [41]. For situations without reciprocal links, a focal node has three possible unclosed triples. Each unclosed triple can be closed by adding a directed link between the two unconnected nodes, giving rise to four types of closed triples as shown in Figure 1 [44,45]. The four closed triples fall into two groups: one is a feedback (FB) loop and the three others are feedforward (i.e., $FF_a$, $FF_b$, and $FF_c$) loops. Structurally, the roles of the three nodes in the FB loop are equivalent, but this is not the case in the FF loops. Any $FF_a$ loop (from the perspective of the focal node) becomes an $FF_b$ loop for another node and an $FF_c$ loop for the third node, and thus the numbers of the three feedforward loops are equal in directed networks. Compared to the unclosed triples, the closed triples play a more important role in dynamical processes on online social networks [46,47], such as opinion formation [48], game dynamics [49], and cooperation evolution [50].
In online social networks, the closed triples are a good indicator of close-knit friendships among people. To understand the mesoscale structural properties of online social networks, we analyze data of popular online social networks, establish the empirical facts, and introduce a directed network model. We analyze four large-scale online social networks, namely Epinions, Slashdot, Flickr, and YouTube, and establish that the distributions on each scale follow a similar power law. We propose a simple directed network model incorporating two processes: external reciprocation and internal evolution. Theoretical analysis shows that the distributions of four closed triples display almost identical scaling laws due to the negligible degree correlations, and the distribution exponents depend only on one global parameter, the mean in/outdegree. Simulation results based on the model are basically consistent with both the empirical results and the theoretical analysis.

Empirical Results
We first analyze four representative directed online social networks and establish the empirical features. As listed in Table 1, these four datasets are: (i) Epinions Social Network (ESN, http://snap.stanford.edu/data/soc-Epinions1.html) [51]: a who-trusts-whom online social network of the general consumer review site Epinions.com, in which members can decide whether to "trust" each other or not, and all the trust relationships together form a so-called social trust network. (ii) Slashdot Social Network (SSN, http://snap.stanford.edu/data/soc-Slashdot0902.html) [51]: a friendship network of the technology-related news website Slashdot.com. Nodes are the users and links represent the friendships among the users. (iii) Flickr Social Network (FSN, http://socialnetworks.mpi-sws.org/data-imc2007.html) [52]: a friendship network of the photo-sharing site Flickr.com, which allows users to designate others as "contacts" or "friends" and track their activities in real time. This network contains all the friendship links among the users of Flickr. (iv) YouTube Social Network (YSN, http://socialnetworks.mpi-sws.org/data-imc2007.html) [52]: a friendship network of the popular video-sharing website YouTube.com, on which users can upload, share and view videos. The nodes in the network are the users of YouTube, and a directed link is established from a user A to a user B when user A declares user B as a friend. Table 1 summarizes the basic global features of the four online social networks. These networks all show a large reciprocity $r$, defined by $r = E_r/(E - E_r)$ [53], with $E_r$ and $E$ being the numbers of reciprocal links and single directed links, respectively. Note that a reciprocal link contributes two single directed links. Specifically, $r \approx 0.25$ for ESN, $r \approx 0.73$ for SSN, $r \approx 0.45$ for FSN, and $r \approx 0.65$ for YSN.
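As an illustration of the reciprocity definition above, the following minimal sketch computes $r = E_r/(E - E_r)$ from a directed edge list. The function name `reciprocity` and the toy edge list are hypothetical and are not taken from the datasets; duplicate links and self-loops are dropped, which is an assumption about how the raw data would be cleaned.

```python
def reciprocity(edges):
    """Return r = E_r / (E - E_r) for a directed edge list.

    E is the number of single directed links and E_r the number of
    reciprocal links; a reciprocal pair u<->v counts once in E_r and
    twice in E, following the definition in the text.
    """
    # Assumption: duplicates and self-loops are discarded before counting.
    links = {(u, v) for u, v in edges if u != v}
    e_total = len(links)
    # Each reciprocal pair shows up as (u, v) and (v, u); count it once.
    e_recip = sum(1 for u, v in links if (v, u) in links) // 2
    return e_recip / (e_total - e_recip)


# Toy example: one reciprocal pair (1<->2) among four directed links.
toy_edges = [(1, 2), (2, 1), (2, 3), (3, 1)]
toy_r = reciprocity(toy_edges)
print(toy_r)
```

With this definition a fully reciprocated network gives $r = 1$ and a network with no reciprocal links gives $r = 0$.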
Figure 1. Three possible unclosed triples and four basic closed triples for a focal node (red). The basic closed triples correspond to one feedback (FB) loop and three feedforward (FF) loops. The three feedforward loops differ in the indegree $k_{in}$ of the focal node: $k_{in} = 0$ for $FF_a$, $k_{in} = 1$ for $FF_b$ and $k_{in} = 2$ for $FF_c$. The numbers of the three feedforward loops are equal because every $FF_a$ loop from the perspective of the focal node constitutes an $FF_b$ loop and an $FF_c$ loop from the perspective of the other two nodes, but the loops may arise from different growth histories. doi:10.1371/journal.pone.0050702.g001
We also studied the local-scale structural properties of these social networks via statistical distributions. The results of ESN are presented as an example. Figure 2 shows the indegree and outdegree distributions (black squares) on a log-log plot. The data span more than two decades. The distributions follow a power law with approximately the same exponent, i.e., $P(k_{in}) \sim k_{in}^{-\gamma_{in}}$ and $P(k_{out}) \sim k_{out}^{-\gamma_{out}}$, with $\gamma_{in} \approx 1.73$ and $\gamma_{out} \approx 1.71$ obtained by maximum likelihood estimation [54,55]. More details about the power-law fits are given in Table S1 of Appendix S1. Figure 3 shows that the indegree $k_{in}$ of each node is nearly proportional to its outdegree $k_{out}$ (also see Figures S4, S5, S6 of Appendix S1), which is consistent with the similar scaling law of their distributions. In growing networks, the fat-tailed power-law behavior of the degree distribution suggests that directed links are not drawn toward and from existing users uniformly. Mislove et al. showed that there is a positive correlation between the number of links a user has and its probability of creating or receiving new links in online social networks [5]. This phenomenon is called "preferential attachment" [5,32,33,56].
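To make the exponent estimation concrete, here is a minimal sketch of a maximum likelihood fit for $P(k) \sim k^{-\gamma}$. It uses a widely used closed-form approximation for discrete data, $\gamma \approx 1 + n / \sum_i \ln[k_i/(k_{min} - 1/2)]$; whether the fits reported here use exactly this estimator is not stated in the text, so the function `powerlaw_mle`, the cutoff `k_min`, and the sample degrees are all assumptions for illustration.

```python
import math


def powerlaw_mle(degrees, k_min=1):
    """Approximate MLE of gamma for P(k) ~ k^(-gamma), using k >= k_min.

    Assumption: discrete degrees are handled with the continuous
    approximation that shifts the lower cutoff to k_min - 1/2.
    """
    tail = [k for k in degrees if k >= k_min]
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum


# Hypothetical degree sample, only to show the call.
sample_degrees = [1, 2, 4]
gamma_hat = powerlaw_mle(sample_degrees, k_min=1)
print(gamma_hat)
```

In practice the choice of `k_min` matters a great deal for heavy-tailed data, and the approximation is known to be less accurate for small `k_min`.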
The behavior $k_{in} \approx k_{out}$ for any node implies that a node with large $k_{in}$ has a strong ability to attract links from other nodes and also a strong tendency to link to other nodes. This is reminiscent of the product $k^i_{out}k^j_{in}$ used in the prediction of a link between the nodes $i$ and $j$ [57], i.e., a larger product gives a larger probability of having a directed link from $i$ to $j$. These results lead us to incorporate a preferential attachment mechanism related to $k^i_{out}k^j_{in}$ into the mechanism of how the links grow in a network.
The reciprocal degree is the number of reciprocal links that a node possesses. Figure 4 shows that the reciprocal degree distribution also follows a power law $P(k_r) \sim k_r^{-\gamma_r}$ with an exponent $\gamma_r \approx 1.69$ as determined by maximum likelihood estimation [54,55], similar to that of the indegree and outdegree distributions. Figure 5 shows that the mean reciprocal degree of the nodes with the same indegree, $\langle k_r(k_{in}) \rangle$, is approximately proportional to the indegree $k_{in}$ (also see Figures S10, S11, S12 of Appendix S1), i.e., $\langle k_r(k_{in}) \rangle \sim k_{in}$, and in a similar fashion $\langle k_r(k_{out}) \rangle \sim k_{out}$, implying that the probability that a randomly chosen directed link happens to be a reciprocal link is roughly a constant. All these features are consistent with the observation that the indegree, outdegree, and reciprocal degree distributions all follow a similar exponent.
For mesoscale structures, we focus on the four closed triples, i.e., FB, $FF_a$, $FF_b$ and $FF_c$. As the numbers of the three feedforward loops are equal, i.e., $N_{FF_a} = N_{FF_b} = N_{FF_c}$, we only look at the total numbers of FB and $FF_a$ closed triples. For ESN, $N_{FB}$ = 740,310 and $N_{FF_a}$ = 3,586,403 as shown in Table 1. Considering the feedforward loops as the same up to the permutation of the focal node, it is interesting to see that $N_{FB} : N_{FF_a} \approx 1 : 5$. This implies the existence of some underlying mechanism. Since the indegree and outdegree distributions are heterogeneous, we study the numbers of the four closed triples (i.e., $n_{FB}$, $n_{FF_a}$, $n_{FF_b}$, and $n_{FF_c}$) at different nodes and their distributions. Figure 6 shows that, although the numbers of feedback and feedforward loops are different, their distributions follow similar scaling laws, i.e., $P(n_{FB}) \sim n_{FB}^{-\gamma_{FB}}$ and likewise for the three feedforward loops, with the exponents determined by maximum likelihood estimation [54,55]. More details on the exponents are given in Table S1 of Appendix S1. Moreover, although the numbers of the three feedforward loops are equal, their distributions look slightly different in detail. This is a phenomenon worthy of further research.
To understand this phenomenon, we consider the three unclosed triples in Figure 1. For a node with indegree $k_{in}$ and outdegree $k_{out}$, there are $C^2_{k_{out}}$ unclosed triples A, $C^1_{k_{in}}C^1_{k_{out}}$ unclosed triples B, and $C^2_{k_{in}}$ unclosed triples C when reciprocal links are forbidden, where $C^m_n = n!/[m!(n-m)!]$ denotes the binomial coefficient.
Accounting for all the nodes, we can obtain the total numbers of optional closed triples, such as $N'_{FB}$, as sums over the nodes. If there is no degree correlation, then making use of $k_{in} \approx k_{out}$, we have $N_{FB} : N_{FF_a} \approx 1 : 5$, which is basically consistent with the ratio found in ESN. The assumption of no degree correlation is supported by the results in Figure 7(a), in which the network shows a very weak degree correlation over two decades that can be treated almost as no degree correlation (further quantitative evidence is given by the Pearson correlation coefficient in Table S2 of Appendix S1) [58]. In this case, the number of closed triples at a node depends only on its indegree $k_{in}$ and outdegree $k_{out}$, i.e., $n_{FB} \sim k_{in}^2$ and $n_{FF_a} \sim k_{in}^2$ for large $k_{in}$, and $n_{FB} \sim k_{out}^2$ and $n_{FF_a} \sim k_{out}^2$ for large $k_{out}$. This behavior is confirmed in Figure 8 and Figure 9 (also see Figures S19-S24 of Appendix S1). This also explains why the distributions of the four closed triples follow similar scaling laws. Results of analyzing the other three networks (i.e., Slashdot, Flickr and YouTube) exhibit similar phenomena (see Figures S1-S24 of Appendix S1).
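The per-node counts of the four closed triples can be sketched with a brute-force enumeration that follows the Figure 1 definitions: within the triple, the focal node has indegree 0 in $FF_a$, 1 in $FF_b$ and 2 in $FF_c$. The function `closed_triples` and the toy graph are hypothetical; the cubic-time loop is meant only for small graphs, and reciprocal links are not treated specially here, which is an assumption.

```python
from itertools import permutations


def closed_triples(edges):
    """Count FB, FF_a, FF_b, FF_c loops at every focal node (brute force)."""
    links = set(edges)
    nodes = {u for link in links for u in link}
    counts = {v: {"FB": 0, "FFa": 0, "FFb": 0, "FFc": 0} for v in nodes}
    for i in nodes:
        for j, l in permutations(nodes - {i}, 2):
            # Feedback loop: i -> j -> l -> i.
            if (i, j) in links and (j, l) in links and (l, i) in links:
                counts[i]["FB"] += 1
            if (j, l) in links:
                # FF_a: focal node is the source (i -> j, i -> l, j -> l).
                if (i, j) in links and (i, l) in links:
                    counts[i]["FFa"] += 1
                # FF_b: focal node is the intermediate (j -> i, i -> l, j -> l).
                if (j, i) in links and (i, l) in links:
                    counts[i]["FFb"] += 1
                # FF_c: focal node is the sink (j -> i, l -> i, j -> l).
                if (j, i) in links and (l, i) in links:
                    counts[i]["FFc"] += 1
    return counts


# Toy graph: one feedback loop 1->2->3->1 plus the shortcut 1->3.
toy_counts = closed_triples([(1, 2), (2, 3), (3, 1), (1, 3)])
totals = {key: sum(c[key] for c in toy_counts.values())
          for key in ("FB", "FFa", "FFb", "FFc")}
print(totals)
```

Summing over focal nodes, every feedforward loop is seen once as $FF_a$, once as $FF_b$ and once as $FF_c$, so the three totals coincide, while each feedback loop is counted once at each of its three nodes.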

Directed Network Model
We propose a growing network model with node and link creation processes incorporating link directionality that reproduces the empirical features. In the model, we consider two evolutionary ingredients: reciprocation and preferential attachment. On one hand, many empirical results show that the reciprocity $r$ of online social networks is much greater than in sparse random directed networks, for which $r \to 0$ [5,53]. Our results of $r \approx 0.45$ for FSN and $r \approx 0.65$ for YSN provide further evidence. The high reciprocity implies that there is a good chance that the creation of a directed link prompts the establishment of a reversed link. For example, users of Flickr often respond to an incoming link by quickly establishing a reversed link as a matter of courtesy [5]. Thus, reciprocation is believed to be an independent growth mechanism in large-scale online social networks. On the other hand, preferential attachment has been proven to be an important and basic growing mechanism in online social networks [5,32,33,56]. Users with large indegrees and outdegrees are more likely to receive incoming links and create outgoing links, respectively. This motivated us to incorporate a preferential attachment mechanism depending on the product $k^i_{out}k^j_{in}$ in creating new links. The model starts with an initial seed consisting of $m_0$ nodes. At each time step, a new node is added and $2+m+mp$ new directed links are introduced according to two processes: external reciprocation and internal evolution.
(1) External reciprocation. The new node in every time step establishes a new directed link to an existing node $i$ in the network with a probability proportional to the indegree $k^i_{in}$ of node $i$. To incorporate the reciprocation mechanism, the node $i$ that receives the link creates a reversed link to the new node. Consequently, a reciprocal link is created between these two nodes. This mechanism is reasonable in that a strong motivation of a new user joining a social network is to get connected to and interact with someone already in the network. As we shall see, this process can be treated conveniently in the mathematical analysis of the model.
(2) Internal evolution. In every time step, $m$ new directed links, representing the activity of the network, are created among the existing nodes according to the preferential attachment mechanism. Consider two nodes $i$ and $j$ that are unconnected up to that time step. A new directed link from node $i$ to node $j$ is created with the probability
$$\Pi(k^i_{out},k^j_{in}) = \frac{k^i_{out}k^j_{in}}{\sum_{x,y,\,x\notin\Gamma_{in}(y)} k^x_{out}k^y_{in}}, \tag{2}$$
where $k^i_{out}$ and $k^j_{in}$ are the outdegree of node $i$ and the indegree of the target node $j$, respectively, and $\Gamma_{in}(y)$ in the normalization factor is the set of incoming neighbors of node $y$ at that time step. This attachment probability is proportional to the product $k^i_{out}k^j_{in}$. The larger the product is, the greater the probability that a new directed link is created between them. For each of the new directed links created, a reversed link will be established with the reciprocation probability $p$. Therefore, $m+mp$ directed links are introduced into the network through internal evolution in each time step. It should be noted that multiple links between two nodes and self-connections are prohibited in the model.
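The two growth processes can be sketched as a short simulation. This is an illustrative reading of the rules above under stated assumptions, not the authors' code: the seed is taken to be two nodes joined by a reciprocal link, the newly added node may take part in internal links in the same step, a non-integer $m$ is handled by adding one extra link with probability equal to its fractional part, and pairs are drawn by rejection sampling with a hypothetical retry cap `max_tries`. The function name `simulate_model` is likewise hypothetical.

```python
import random


def simulate_model(n_nodes, m, p, seed=0, max_tries=100):
    """Grow a directed network by external reciprocation and internal evolution."""
    rng = random.Random(seed)
    # Assumed seed: nodes 0 and 1 joined by a reciprocal link.
    out_nbrs = {0: {1}, 1: {0}}
    in_nbrs = {0: {1}, 1: {0}}
    out_stubs = [0, 1]  # each node appears once per outgoing link
    in_stubs = [0, 1]   # each node appears once per incoming link

    def add_link(u, v):
        out_nbrs[u].add(v)
        in_nbrs[v].add(u)
        out_stubs.append(u)
        in_stubs.append(v)

    for new in range(2, n_nodes):
        # External reciprocation: target chosen with probability ~ k_in,
        # then the reversed link is always created.
        target = rng.choice(in_stubs)
        out_nbrs[new], in_nbrs[new] = set(), set()
        add_link(new, target)
        add_link(target, new)
        # Internal evolution: int(m) links plus one more with prob. frac(m).
        n_links = int(m) + (rng.random() < m - int(m))
        for _link in range(n_links):
            for _attempt in range(max_tries):
                i = rng.choice(out_stubs)  # source with probability ~ k_out
                j = rng.choice(in_stubs)   # target with probability ~ k_in
                if i == j or j in out_nbrs[i]:
                    continue  # reject self-connections and existing links
                add_link(i, j)
                # Reversed link with reciprocation probability p.
                if rng.random() < p and i not in out_nbrs[j]:
                    add_link(j, i)
                break
    return out_nbrs, in_nbrs


out_nbrs, in_nbrs = simulate_model(300, m=2.5, p=0.3, seed=1)
n_links_total = sum(len(nbrs) for nbrs in out_nbrs.values())
print(len(out_nbrs), n_links_total)
```

Drawing the source from a list of outgoing stubs and the target from a list of incoming stubs, then rejecting pairs that are already linked, selects a pair with probability proportional to $k^i_{out}k^j_{in}$ among the allowed pairs, which is the form of Eq. (2).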

Rate Equation Analysis
We first analyze the indegree and outdegree distributions of the model. After $t$ steps, the growing directed network has $N = m_0 + t$ nodes and $(2+m+mp)t$ directed links, where the tiny number of initial links in the seed is ignored. Meanwhile, the sum of indegrees and the sum of outdegrees are equal, i.e., $\sum_j k^j_{in} = \sum_j k^j_{out} = (2+m+mp)t$. For a sparse network with mean indegree $\langle k \rangle = 2+m+mp \ll N$, we have $\sum_{x,y,\,x\notin\Gamma_{in}(y)} k^x_{out}k^y_{in} \approx \sum_x k^x_{out} \times \sum_y k^y_{in} = [(2+m+mp)t]^2$, and Eq. (2) can be approximated by
$$\Pi(k^i_{out},k^j_{in}) \approx \frac{k^i_{out}k^j_{in}}{[(2+m+mp)t]^2}. \tag{3}$$
Consider the creation of one new directed link via the internal evolution at step $t$. The probability $p^+_{k^i_{in}}$ that the indegree $k^i_{in}$ of node $i$ increases by one due to the creation of one link is
$$p^+_{k^i_{in}} = \sum_j \Pi(k^j_{out},k^i_{in}) + p\sum_j \Pi(k^i_{out},k^j_{in}), \tag{4}$$
where the first term gives the probability that the node $i$ receives a new incoming link from one of the other nodes and the second term gives the probability that a reversed link is created back to node $i$ when a new directed link was created from node $i$ to some node $j$. According to Eq. (3),
$$p^+_{k^i_{in}} \approx \frac{k^i_{in} + p\,k^i_{out}}{(2+m+mp)t}. \tag{5}$$
Similarly, the probability $p^+_{k^i_{out}}$ that the outdegree $k^i_{out}$ of node $i$ increases by one due to the creation of one link is
$$p^+_{k^i_{out}} \approx \frac{k^i_{out} + p\,k^i_{in}}{(2+m+mp)t}. \tag{6}$$
Equations for the rate of change of the expected indegree $k^i_{in}$ and outdegree $k^i_{out}$ can then be written down. Taking $k^i_{in}$ and $k^i_{out}$ as continuous variables, the dynamical equations are
$$\frac{dk^i_{in}}{dt} = \frac{k^i_{in}}{\sum_j k^j_{in}} + m\,p^+_{k^i_{in}}, \qquad \frac{dk^i_{out}}{dt} = \frac{k^i_{in}}{\sum_j k^j_{in}} + m\,p^+_{k^i_{out}}, \tag{7}$$
where the first term in the equations comes from the newly added node in a time step. The difference of the two equations gives
$$\frac{d(k^i_{in} - k^i_{out})}{dt} = \frac{m(1-p)\,(k^i_{in} - k^i_{out})}{(2+m+mp)t}, \tag{8}$$
where Eqs. (5) and (6) have been used. Let $t_i$ be the time that the node $i$ is introduced, i.e., $k^i_{in}(t_i) = k^i_{out}(t_i) = 1$. It follows from Eq. (8) that $k^i_{in}(t) = k^i_{out}(t)$ at any time $t$. Although the expected value of the difference between the indegree and outdegree of a node does not grow over time mathematically, the difference does exist in a particular realization of the model in simulations. Eq. (7) and the initial condition give
$$k^i_{in}(t) = k^i_{out}(t) = \left(\frac{t}{t_i}\right)^{\beta}, \tag{9}$$
where $\beta = (1+m+mp)/(2+m+mp)$. The indegree and outdegree of the nodes both grow over time in the same functional form, with older nodes having higher indegrees and outdegrees.
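Let $N_{k_{in}}(t)$ and $N_{k_{out}}(t)$ be the number of nodes with expected indegree $k_{in}$ and outdegree $k_{out}$ at the time step $t$, respectively. The rate equation of $N_{k_{in}}(t)$ is then given by
$$\frac{dN_{k_{in}}}{dt} = \frac{(k_{in}-1)N_{k_{in}-1}}{\sum_j k^j_{in}} - \frac{k_{in}N_{k_{in}}}{\sum_j k^j_{in}} + \frac{m(1+p)(k_{in}-1)N_{k_{in}-1}}{\sum_j k^j_{in}} - \frac{m(1+p)k_{in}N_{k_{in}}}{\sum_j k^j_{in}} + \delta_{k_{in},1}. \tag{10}$$
The first and third terms on the right-hand side account for the increase of $N_{k_{in}}(t)$ due to the external reciprocation and internal evolution, respectively; and the second and fourth terms account for the decrease due to the same processes. The last term accounts for the introduction of a new node with indegree $k_{in} = 1$ at time $t$. Eq. (10) is valid for all $k_{in} \ge 1$. After many steps $t$, there are $N = m_0 + t \approx t$ nodes in the network. In the asymptotic limit, we substitute $N_{k_{in}}(t) = tP(k_{in})$, where $P(k_{in})$ is the indegree distribution [59], and $\sum_j k^j_{in} = (2+m+mp)t$ into Eq. (10) to obtain the simple recursive relation
$$P(k_{in}) = \frac{k_{in}-1}{k_{in} + 1 + \frac{1}{1+m+mp}}\,P(k_{in}-1), \qquad k_{in} \ge 2. \tag{11}$$
Using the initial condition that $k_{in} = 1$ at the time that a node was introduced, which gives $P(1) = (2+m+mp)/(3+2m+2mp)$, the solution of Eq. (11) is
$$P(k_{in}) = A\,\frac{\Gamma(k_{in})}{\Gamma\!\left(k_{in} + 2 + \frac{1}{1+m+mp}\right)}, \tag{12}$$
where $A = \frac{2+m+mp}{3+2m+2mp}\,\Gamma\!\left(\frac{1}{1+m+mp} + 3\right)$ and $\Gamma$ is the Euler gamma function. Using the asymptotic form $\Gamma(x+l)/\Gamma(x) \to x^l$ as $x \to \infty$, we can extract the scaling form
$$P(k_{in}) \sim k_{in}^{-\left(2 + \frac{1}{1+m+mp}\right)}. \tag{13}$$
Similarly, the rate equation of $N_{k_{out}}(t)$ is given by
$$\frac{dN_{k_{out}}}{dt} = \frac{(k_{out}-1)N_{k_{out}-1}}{\sum_j k^j_{in}} - \frac{k_{out}N_{k_{out}}}{\sum_j k^j_{in}} + \frac{m(1+p)(k_{out}-1)N_{k_{out}-1}}{\sum_j k^j_{in}} - \frac{m(1+p)k_{out}N_{k_{out}}}{\sum_j k^j_{in}} + \delta_{k_{out},1}. \tag{14}$$
The first (second) and third (fourth) terms on the right-hand side account for the increase (decrease) in $N_{k_{out}}$ due to the external reciprocation and internal evolution, respectively; and the last term accounts for the introduction of a new node with $k_{out} = 1$ at time $t$. Substituting $N_{k_{out}}(t) = tP(k_{out})$, where $P(k_{out})$ is the outdegree distribution, and $\sum_j k^j_{in} = (2+m+mp)t$ into Eq. (14), the recursive relation for $P(k_{out})$ is
$$P(k_{out}) = \frac{k_{out}-1}{k_{out} + 1 + \frac{1}{1+m+mp}}\,P(k_{out}-1), \qquad k_{out} \ge 2, \tag{15}$$
which is identical to Eq. (11) for $P(k_{in})$. It follows that
$$P(k_{out}) \sim k_{out}^{-\left(2 + \frac{1}{1+m+mp}\right)}. \tag{16}$$
The results show that the expected indegree and outdegree grow over time following the same functional form of Eq. (9), and the indegree and outdegree distributions follow the same scaling law with an exponent
$$\gamma = 2 + \frac{1}{1+m+mp}. \tag{17}$$
Next, we consider the reciprocal degree distribution $P(k_r)$.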
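As a numerical sanity check of the degree distribution derived above, the sketch below iterates a recursion of the form $P(k) = \frac{k-1}{k+1+1/(1+m+mp)}P(k-1)$ starting from $P(1) = (2+m+mp)/(3+2m+2mp)$, and compares it with the gamma-function expression $A\,\Gamma(k)/\Gamma(k+2+\frac{1}{1+m+mp})$. The function name, the cutoff `k_max` and the parameter values (taken from the ESN example quoted later in the text) are illustrative assumptions.

```python
import math


def indegree_distribution(k_max, m, p):
    """Return P(k) for k = 1..k_max from the recursion and from the closed form."""
    c = m + m * p
    shift = 1.0 + 1.0 / (1.0 + c)        # equals 1/beta
    p1 = (2.0 + c) / (3.0 + 2.0 * c)     # P(1)
    recursive = [p1]
    for k in range(2, k_max + 1):
        recursive.append(recursive[-1] * (k - 1) / (k + shift))
    # log A = log P(1) + log Gamma(3 + 1/(1+c)); logs avoid overflow.
    log_a = math.log(p1) + math.lgamma(2.0 + shift)
    closed = [math.exp(log_a + math.lgamma(k) - math.lgamma(k + 1.0 + shift))
              for k in range(1, k_max + 1)]
    return recursive, closed


recursive, closed = indegree_distribution(20000, m=4.34, p=0.08)
gamma_theory = 2.0 + 1.0 / (1.0 + 4.34 + 4.34 * 0.08)
# Local log-log slope between k = 1000 and k = 2000.
slope = math.log(closed[999] / closed[1999]) / math.log(2.0)
print(gamma_theory, slope, sum(closed))
```

The local slope on a log-log scale approaches $\gamma = 2 + 1/(1+m+mp)$ only at large $k$, which is one reason finite simulated networks can yield fitted exponents that differ from the asymptotic value.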
For a node $i$ with $k^i_{in} = k^i_{out}$, $k^i_r$ satisfies the dynamical equation
$$\frac{dk^i_r}{dt} = \frac{k^i_{in}}{\sum_j k^j_{in}} + \frac{mp\,(k^i_{in} + k^i_{out})}{(2+m+mp)t} = \frac{(1+2mp)\,k^i_{in}}{(2+m+mp)t}. \tag{18}$$
Substituting Eq. (9) into Eq. (18) and using the initial condition that $k^i_r(t_i) = 1$ at the time that node $i$ was introduced, the solution to Eq. (18) is
$$k^i_r(t) = \frac{1+2mp}{1+m+mp}\left[\left(\frac{t}{t_i}\right)^{\beta} - 1\right] + 1. \tag{19}$$
For large $k^i_{in}$, we have
$$k^i_r \approx \frac{1+2mp}{1+m+mp}\,k^i_{in}. \tag{20}$$
Using $P(k_r)dk_r = P(k_{in})dk_{in}$, the distribution $P(k_r)$ follows
$$P(k_r) \sim k_r^{-\gamma}, \tag{21}$$
where $\gamma$ is given by Eq. (17) as for the indegree and outdegree distributions. Furthermore, we analyze the degree correlations between connected nodes by the rate equation approach. Let $N_{l_{out}k_{in}}$ be the number of links that originate from a node with an expected outdegree $l_{out}$ and end at a node with an expected indegree $k_{in}$ [60]. Generally, $P_{l_{out}k_{in}}$ is defined for $k_{in} \ge 1$ and $l_{out} \ge 2$. The quantity $N_{l_{out}k_{in}}(t)$ evolves according to the rate equation (22), in which the first two terms on the right-hand side account for the changes due to the introduction of a new node, including the gains when the new node is connected to a node with indegree $(k_{in}-1)$ (outdegree $(l_{out}-1)$) which is already connected to a node with outdegree $l_{out}$ (indegree $k_{in}$), and the losses when the new node is connected to either end of a link that connects a node with outdegree $l_{out}$ and another node with indegree $k_{in}$. The third term accounts for the gain in $N_{l_{out}1}$ due to the addition of the new node. The remaining terms take into account the changes due to the internal evolution process with the introduction of $m+mp$ directed links.
Asymptotically, $N_{l_{out}k_{in}} \to (2+m+mp)tP_{l_{out}k_{in}}$, $N_{k_{in}} \to tP(k_{in})$ and $N_{l_{out}} \to tP(l_{out})$. Considering $\sum_{k_{in}} k_{in}N_{k_{in}} = \sum_j k^j_{in} = (2+m+mp)t$ and $\sum_{x,y,\,x\notin\Gamma_{in}(y)} k^x_{out}k^y_{in} \approx \sum_x k^x_{out} \times \sum_y k^y_{in} = [(2+m+mp)t]^2$, Eq. (22) gives a recursive relation for $P_{l_{out}k_{in}}$, Eq. (23), which involves the product $l_{out}k_{in}P(l_{out})P(k_{in})$. Eq. (24) implies that there is no degree correlation, a feature that is supported by the empirical results in Figure 7 for ESN over a wide range of degrees (also see Figures S16, S17, S18 of Appendix S1). It also follows from $k^i_{in} = k^i_{out}$ and Eq. (24) that $P_{l_{out}k_{in}} = P_{l_{out}k_{out}} = P_{l_{in}k_{in}} = P_{l_{in}k_{out}}$. Interpreting $P_{l_{out}k_{in}}$ as a joint probability, the lack of degree correlation as expressed in Eq. (24) implies that the conditional probability of Eq. (25) is independent of $l_{out}$. For a node $i$ with large $k^i_{in} = k^i_{out}$, the average nearest neighbor function $k^{in}_{nn}(k_{out})$ can be calculated, Eq. (26), and is also independent of $k_{out}$. This is consistent with the behavior of $k^{in}_{nn}(k_{out})$ in ESN, as shown in Figure 7. The number of FB loops $n_{FB}$ can be formally written as in Eq. (27) [61], where $P_{k''_{out}k'_{in}}$ is the probability that a link connects a node with outdegree $k''_{out}$ to a node with indegree $k'_{in}$. The lack of degree correlations makes the summations independent of $k_{in}$ and $k_{out}$, and thus $n_{FB}$ scales as $n_{FB} \sim k_{in}^2$ for large $k_{in} = k_{out}$ (Eq. (28)). Similarly, the numbers of the four closed triples $n_\Delta$ at a node with large indegree and outdegree follow the scaling behavior $n_\Delta \sim k_{in}^2$ or $n_\Delta \sim k_{out}^2$. Combining $n_\Delta \sim k_{in}^2$ with Eq. (13) ($P(k_{in}) \sim k_{in}^{-\gamma}$), the distributions of the four closed triples have the same scaling behavior as follows:
$$P(n_\Delta) \sim n_\Delta^{-\gamma_\Delta}, \tag{29}$$
where the exponent $\gamma_\Delta$ can be readily found by using $P(n_\Delta)dn_\Delta = P(k_{in})dk_{in}$ to be
$$\gamma_\Delta = \frac{3}{2} + \frac{1}{2(1+m+mp)}. \tag{30}$$
The exponent $\gamma_\Delta$ is determined by the parameters $m$ and $p$ and it falls into the range $(1.5, 2]$.
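For readers who want the intermediate steps behind the closed-triple exponent, the change of variables can be written out as follows. This is a sketch under the assumptions stated in the text, namely $n_\Delta \sim k_{in}^2$ at large degree and $P(k_{in}) \sim k_{in}^{-\gamma}$ with $\gamma = 2 + 1/(1+m+mp)$.

```latex
\begin{align}
n_\Delta \sim k_{in}^{2}
  \;&\Rightarrow\; k_{in} \sim n_\Delta^{1/2},
  \qquad \frac{dk_{in}}{dn_\Delta} \sim n_\Delta^{-1/2}, \\
P(n_\Delta) = P(k_{in})\,\frac{dk_{in}}{dn_\Delta}
  \;&\sim\; n_\Delta^{-\gamma/2}\, n_\Delta^{-1/2}
  = n_\Delta^{-(\gamma+1)/2}, \\
\gamma_\Delta = \frac{\gamma+1}{2}
  \;&=\; \frac{3}{2} + \frac{1}{2(1+m+mp)} .
\end{align}
```

Since $1+m+mp \ge 1$, the second term lies in $(0, 1/2]$, which gives the stated range $(1.5, 2]$, with the upper end reached only when $m = 0$.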

Simulation Results
We also carried out numerical simulations to study the structural properties of the model and compared the results with data of real online social networks. The activity $m$ and the reciprocation probability $p$ are two important parameters of the model. They determine the reciprocity $r = (1+mp)/(1+m)$ and the mean indegree $\langle k \rangle = 2+m+mp$ of simulated networks. In order to compare results with real online social networks, we take three parameters from real data, namely the number of nodes $N$, the reciprocity $r$ and the mean indegree (outdegree) $\langle k \rangle$, and determine the parameters $m$ and $p$ in the model through
$$m = \frac{\langle k \rangle - r - 1}{1+r}, \qquad p = \frac{r\langle k \rangle - r - 1}{\langle k \rangle - r - 1}. \tag{31}$$
Taking ESN as an example, we have $\langle k \rangle \approx 6.7$, $r \approx 0.25$, and $N = 75879$. The model parameters are then fixed at $m \approx 4.34$ and $p \approx 0.08$ according to Eq. (31). With these values of $m$ and $p$, a network of $N = 75879$ nodes is simulated. A non-integer value of $m$ is implemented in a probabilistic way. For ESN with $m = 4.34$, for example, the fifth new directed link through the internal evolution process is initiated with a probability 0.34 after four new directed links have been established in every time step. The structural properties of the simulated network are analyzed for each of the quantities studied for the real data. Results are shown in Figures 2-9 as red circles for comparison (also see Figures S1-S24 of Appendix S1). The model basically reproduces the key properties of ESN.
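The parameter inference can be illustrated by inverting the two relations $r = (1+mp)/(1+m)$ and $\langle k \rangle = 2+m+mp$ stated above. The sketch below does this algebraically; the function name `model_parameters` is hypothetical, and the inputs are the rounded ESN values quoted in the text.

```python
def model_parameters(mean_degree, reciprocity):
    """Solve r = (1 + m*p)/(1 + m) and <k> = 2 + m + m*p for (m, p)."""
    m = (mean_degree - reciprocity - 1.0) / (1.0 + reciprocity)
    p = (reciprocity * mean_degree - reciprocity - 1.0) / (
        mean_degree - reciprocity - 1.0)
    return m, p


# Rounded ESN inputs from the text: <k> ~ 6.7, r ~ 0.25.
m_esn, p_esn = model_parameters(6.7, 0.25)
print(round(m_esn, 2), round(p_esn, 3))
```

With the rounded inputs this gives $m$ close to 4.36 and $p$ close to 0.078, near the quoted $m \approx 4.34$ and $p \approx 0.08$; the small gap in $m$ is presumably due to rounding of $\langle k \rangle$ and $r$.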
For the indegree and outdegree distributions (see Figure 2) and the reciprocal degree distribution (see Figure 4), the simulation results also show a similar scaling law, with the exponents $\gamma_{in} \approx 1.95$, $\gamma_{out} \approx 1.96$ and $\gamma_r \approx 2.1$ determined by maximum likelihood estimation [54,55] (see Table S1 of Appendix S1 for more detail). These values are slightly larger than the corresponding values of the exponents in ESN. According to Eqs. (17), (21) and (31), these exponents are equal and the theoretical value is $\gamma = 2 + 1/(\langle k \rangle - 1) \approx 2.17$. Note that the rate equation analysis assumes an infinite system. The difference between the simulated results and the theoretical value comes from the finite size of the simulated network, as well as the approximations made in arriving at the values of the exponent. The indegree and outdegree distributions of the simulated network are in reasonable agreement with the empirical results of ESN. The model, however, gives a reciprocal degree distribution smaller than the ESN empirical results over a wide range of $k_r$. This discrepancy implies that there are some network growing mechanisms in ESN that are not included in the model, e.g., different reciprocation probabilities for different nodes [62]. This, together with a possibly very weak degree correlation in Figure 7 that we ignored, may be the reason for the simulation results in Figures 3 and 5 being larger than the empirical values for large in/outdegrees, and for the small differences in the tails in Figures 2 and 4 [63,64].
For the distributions of the four closed triples, the distributions from simulations follow a power-law behavior with almost the same exponent (see Figure 6), where $\gamma_{FB} \approx 1.47$, $\gamma_{FF_a} \approx 1.46$, $\gamma_{FF_b} \approx 1.46$ and $\gamma_{FF_c} \approx 1.46$ as determined by maximum likelihood estimation. These values are slightly larger than the exponents found in ESN. Theoretically, $\gamma_\Delta = 3/2 + 1/[2(\langle k \rangle - 1)] \approx 1.58$ according to Eqs. (30) and (31). We note that the theoretical values of both $\gamma$ and $\gamma_\Delta$ depend only on the mean indegree $\langle k \rangle$, which in turn is determined by the two model parameters $m$ and $p$. Figure 10 shows the values of all the $\gamma$ exponents of the distributions for the four online social networks and the corresponding simulated networks, which are determined by maximum likelihood estimation.
The two parameters $m$ and $p$ affect the reciprocal degree of nodes $k_r$ through Eq. (20). Substituting Eq. (31) into Eq. (20), we have $k_r \sim (2\langle k \rangle r - r - 1)k_{in}/[(1+r)(\langle k \rangle - 1)] \approx 0.3k_{in}$ for ESN. The reciprocal degree $k_r$ of a node and its $k_{in}$ are related by a factor depending on the two global parameters $\langle k \rangle$ and $r$. This linear relationship between $k_r$ and $k_{in}$ ($k_{out}$) with a slope 0.3 is observed in simulation results, as shown in Figure 5, but the ESN data show a faster increase of $k_r$ with $k_{in}$ and $k_{out}$. When the network has a larger reciprocity, such as $r \approx 0.73$ for Slashdot, $r \approx 0.45$ for Flickr, and $r \approx 0.65$ for YouTube, a better agreement is observed (see Figures S10, S11, S12 of Appendix S1). Despite some small differences in the tail in Figures 8 and 9, which may be caused by local proximity bias in link creation [5], simulation results for the dependence of the number of closed triples on $k_{in}$ and $k_{out}$ are basically in accordance with empirical results.
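The theoretical predictions quoted in this section depend only on the mean in/outdegree and the reciprocity, so they can be collected in one small helper. This is a sketch of the formulas as they appear in the text; the function name `predictions` and the use of the rounded ESN inputs are assumptions for illustration.

```python
def predictions(mean_degree, reciprocity):
    """Return (gamma, gamma_triple, slope) from <k> and r.

    gamma        : exponent of the in/out/reciprocal degree distributions
    gamma_triple : exponent of the closed-triple distributions
    slope        : ratio k_r / k_in for nodes with large degree
    """
    gamma = 2.0 + 1.0 / (mean_degree - 1.0)
    gamma_triple = 1.5 + 1.0 / (2.0 * (mean_degree - 1.0))
    slope = (2.0 * mean_degree * reciprocity - reciprocity - 1.0) / (
        (1.0 + reciprocity) * (mean_degree - 1.0))
    return gamma, gamma_triple, slope


# Rounded ESN inputs from the text: <k> ~ 6.7, r ~ 0.25.
gamma_esn, gamma_triple_esn, slope_esn = predictions(6.7, 0.25)
print(gamma_esn, gamma_triple_esn, slope_esn)
```

Note that the slope grows with the reciprocity $r$ at fixed $\langle k \rangle$, while the two exponents do not depend on $r$ at all.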

Discussion
With the advancement in information technology, online social systems have become an increasingly important part of modern life. It is, therefore, of great significance to study the structures and dynamics of these systems. In this study, we focused on the local scale, mesoscale and macroscale structural properties of online social networks, especially the influence of properties on the local scale and macroscale on the mesoscale structures. We analyzed the data and extracted the local scale and macroscale structural properties of four large-scale online social networks. It was found that the indegree and outdegree distributions follow a similar scaling law, which follows from the fact that $k_{in} \approx k_{out}$ for most of the nodes. It implies that there is a preferential attachment mechanism in which the product $k^i_{out}k^j_{in}$ is important in the establishment of links during the evolution of online social networks. In addition, the very large reciprocity $r$ observed in these networks suggests the existence of a reciprocation mechanism in online social networks. The reciprocal degree distribution also shows a similar exponent to that of the indegree distribution due to the roughly linear relationship between the reciprocal degree $k_r$ and the indegree $k_{in}$ of nodes (i.e., $k_r \sim k_{in}$), which in turn implies a fixed probability of reciprocal links between connected nodes. On the mesoscale, the close-knit friendship structures are determined by both local scale (i.e., indegree and outdegree $k_{in} \approx k_{out}$) and macroscale (i.e., mean in/outdegree $\langle k \rangle$) structural properties. For a node with large $k_{in} \approx k_{out}$, the numbers of the four closed triples show the same scaling behavior: $n_{FB} \sim k_{in}^2$ and $n_{FF} \sim k_{in}^2$, as a result of the negligible degree correlations in these networks. For all nodes, the distributions of these closed triples also follow a similar scaling law. Although the numbers of the three feedforward loops are equal, their distributions look somewhat different in detail.
To reproduce the empirical features, we proposed and studied a simple directed network model incorporating an external reciprocation process and an internal evolution process. The two parameters in the model are the activity $m$ and the reciprocation probability $p$. They can be inferred from the reciprocity $r$ and mean indegree $\langle k \rangle$ of real online social networks according to Eq. (31), so as to ensure that the simulated network and the real network have the same reciprocity and mean indegree. Analytically, we derived the structural properties in the local scale and mesoscale. The results show that the exponents characterizing the distributions of indegree, outdegree, reciprocal degree and four closed triples depend only on the mean indegree $\langle k \rangle$, i.e., $\gamma = 2 + 1/(\langle k \rangle - 1)$ and $\gamma_{\Delta} = 3/2 + 1/[2(\langle k \rangle - 1)]$. In addition, the mean indegree $\langle k \rangle$ and the reciprocity $r$ together determine the ratio of the reciprocal degree to the directed in/outdegree, i.e., $k_r \sim (2\langle k \rangle r - r - 1)\,k_{\mathrm{in}} / [(\langle k \rangle - 1)(1 + r)]$. The expected indegree and outdegree of nodes in the model grow as the same function of the time that the nodes are introduced, with very old nodes having very high indegrees and outdegrees. This phenomenon, coupled with an essentially fixed rate of reciprocation, reproduces almost all the properties of the online social networks studied here.
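The dependence of the exponents on the global parameters can be made concrete with a short numerical sketch. The functions below simply evaluate the closed-form expressions stated above for the degree exponent, the closed-triple exponent and the ratio of the reciprocal degree to the indegree; the function names are hypothetical, and any input values used with them are illustrative rather than measured.

```python
def degree_exponent(mean_k):
    """Exponent of the in/out/reciprocal degree distributions:
    gamma = 2 + 1 / (<k> - 1)."""
    if mean_k <= 1:
        raise ValueError("the expression requires <k> > 1")
    return 2.0 + 1.0 / (mean_k - 1.0)


def triple_exponent(mean_k):
    """Exponent of the closed-triple distributions:
    gamma_Delta = 3/2 + 1 / [2 (<k> - 1)]."""
    if mean_k <= 1:
        raise ValueError("the expression requires <k> > 1")
    return 1.5 + 1.0 / (2.0 * (mean_k - 1.0))


def reciprocal_ratio(mean_k, r):
    """Ratio k_r / k_in = (2 <k> r - r - 1) / [(<k> - 1) (1 + r)]."""
    if mean_k <= 1:
        raise ValueError("the expression requires <k> > 1")
    return (2.0 * mean_k * r - r - 1.0) / ((mean_k - 1.0) * (1.0 + r))
```

Two consistency checks follow directly from the expressions: `triple_exponent(k)` equals `(degree_exponent(k) + 1) / 2`, as expected when the number of closed triples scales as $k_{\mathrm{in}}^{2}$, and `reciprocal_ratio(k, 1.0)` equals 1, i.e., $k_r = k_{\mathrm{in}}$ in a fully reciprocal network.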
The mesoscale structural properties reported in our work help us understand the interplay between structural properties on different scales in online social networks. More specifically, the mesoscale structures in these online social networks are determined by global parameters as well as by local distributions. This provides a useful perspective for future studies in social network analysis. Our work also provides a better understanding of the evolution of online social networks, especially the emergence of close-knit friendship structures with a scaling behavior in their distributions. The two processes (reciprocation and preferential attachment) provide a possible explanation of the mechanisms underlying the local scale and mesoscale structural properties of online social networks. The former reflects that users often respond to a new incoming link by quickly establishing a reverse link. The latter means that a well-known user with a large $k_{\mathrm{in}}$ is more likely to attract new connections and an active user with a large $k_{\mathrm{out}}$ is more likely to create new connections. Our model may also be applied to other growing directed networks in which the indegree and outdegree distributions show a similar scaling behavior and the reciprocation mechanism is valid. However, the model is not applicable to symmetric online social networks that lack power-law degree distributions [1][2][3] (e.g., Facebook), nor to the WWW [33] and Wikipedia [56], as the indegree and outdegree distributions in these systems carry different exponents and the reciprocation mechanism is absent. Similarly, it does not apply to citation networks, as a paper can only cite previously published papers and cannot be cited back by them.
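The interplay of the two mechanisms can be illustrated with a toy link-creation event. The sketch below is not the model defined in this paper: it only combines degree-weighted selection of a source and a target with reciprocation at a fixed probability $p$. The function name `add_link`, the `+1` offset in the weights (which allows nodes without links to be chosen) and the rejection of self-loops and duplicate links are all assumptions made for the illustration.

```python
import random


def add_link(k_in, k_out, links, p, rng):
    """One toy link-creation event among existing nodes.

    The source i is drawn with weight k_out[i] + 1 and the target j with
    weight k_in[j] + 1, so the pair weight mimics the product
    k_i^out * k_j^in. The +1 offset is an assumption of this sketch.
    With probability p the reverse link j -> i is created as well.
    Returns True if a new link i -> j was added, False if rejected.
    """
    nodes = list(k_in)
    i = rng.choices(nodes, weights=[k_out[n] + 1 for n in nodes])[0]
    j = rng.choices(nodes, weights=[k_in[n] + 1 for n in nodes])[0]
    if i == j or (i, j) in links:
        return False  # no self-loops or duplicate links
    links.add((i, j))
    k_out[i] += 1
    k_in[j] += 1
    # Reciprocation: j responds to the new incoming link with probability p.
    if (j, i) not in links and rng.random() < p:
        links.add((j, i))
        k_out[j] += 1
        k_in[i] += 1
    return True
```

Calling `add_link` repeatedly with `rng = random.Random(seed)` on dictionaries of initially zero degrees gives a reproducible toy network in which active and well-known nodes accumulate links faster, while the fraction of reciprocated links is controlled by `p`.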
Although the simulated results of our model largely reproduced the structural properties of the online social networks at different scales, the differences in the exponents characterizing the distributions and in the tails of the distributions in real online social networks (e.g., Figures 4, 5, 8, 9) imply that there exist other factors that are ignored in the model, such as users having different individual reciprocation probabilities and a local proximity bias. These factors are good ingredients for future work. It is also important to study the emergence of communities in online social networks. The present work also forms a basis for understanding the impact of mesoscale structural properties on dynamical processes on online social networks, such as information diffusion, opinion formation, and the evolution of cooperation. An interesting problem for future work is to investigate whether the model can be applied to offline social networks. Such work would help reveal the differences between online and offline social networks.

Supporting Information
Appendix S1 Appendix to the manuscript. (PDF)
Table S1 The exponents of various distributions obtained by power-law fits of real online social networks and the simulated network based on the model using the maximum likelihood estimation. $x_{\min}$ is the lower bound of the range for fitting a power-law distribution, is the corresponding exponent and KS is the goodness-of-fit value based on the Kolmogorov-Smirnov statistic. (PDF)
Table S2 Pearson correlation coefficient. r(in; in) quantifies the tendency of nodes with a high indegree to be connected to another node with a high indegree. The other quantities carry a similar interpretation. (PDF)
(17) and (31)) suggests a scaling behavior with an exponent $-2.08$, as shown by the dashed line. Data points are averages over the logarithmic bins of the reciprocal degree $k_r$. (PDF)
Figure S9 Reciprocal degree distributions of the YouTube social network and the model. Results of the YouTube social network (black squares) and simulation results (red circles) based on the model are shown. Analytic treatment (see Eqs. (17) and (31)) suggests a scaling behavior with an exponent $-2.3$, as shown by the dashed line. Data points are averages over the logarithmic bins of the reciprocal degree $k_r$. (PDF)
Figure S10 Mean reciprocal degree of nodes with (a) the same indegree and (b) the same outdegree in the Slashdot social network and in the model. Results of the Slashdot social network (black squares) and simulation results (red circles) based on the model are shown in a log-log scale in the main panels. Analytic treatment suggests that $\langle k_r \rangle$ is linearly dependent on $k_{\mathrm{in}}$ and $k_{\mathrm{out}}$, and the blue dashed lines of slope 1 show its dependence. The inset in each panel shows the results in a linear scale and the dashed line has a slope of 0.82, as given by Eqs. (20) and (31). Data points are averages over the logarithmic bins of the indegree $k_{\mathrm{in}}$ and outdegree $k_{\mathrm{out}}$, respectively. (PDF)
Figure S11 Mean reciprocal degree of nodes with (a) the same indegree and (b) the same outdegree in the Flickr social network and in the model. Results of the Flickr social network (black squares) and simulation results (red circles) based on the model are shown in a log-log scale in the main panels. Analytic treatment suggests that $\langle k_r \rangle$ is linearly dependent on $k_{\mathrm{in}}$ and $k_{\mathrm{out}}$, and the blue dashed lines of slope 1 show its dependence. The inset in each panel shows the results in a linear scale and the dashed line has a slope of 0.59, as given by Eqs. (20) and (31). Data points are averages over the logarithmic bins of the indegree $k_{\mathrm{in}}$ and outdegree $k_{\mathrm{out}}$, respectively. (PDF)
Figure S12 Mean reciprocal degree of nodes with (a) the same indegree and (b) the same outdegree in the YouTube social network and in the model. Results of the YouTube social network (black squares) and simulation results (red circles) based on the model are shown in a log-log scale in the main panels. Analytic treatment suggests that $\langle k_r \rangle$ is linearly dependent on $k_{\mathrm{in}}$ and $k_{\mathrm{out}}$, and the blue dashed lines of slope 1 show its dependence. The inset in each panel shows the results in a linear scale and the dashed line has a slope of 0.73, as given by Eqs. (20) and (31). Data points are averages over the logarithmic bins of the indegree $k_{\mathrm{in}}$ and outdegree $k_{\mathrm{out}}$, respectively. (PDF)