Empirical Studies on the Network of Social Groups: The Case of Tencent QQ

  • Zhi-Qiang You,

    Affiliation Alibaba Research Center for Complexity Sciences, Hangzhou Normal University, Hangzhou 311121, China

  • Xiao-Pu Han,

    Affiliation Alibaba Research Center for Complexity Sciences, Hangzhou Normal University, Hangzhou 311121, China

  • Linyuan Lü,

    Affiliation Alibaba Research Center for Complexity Sciences, Hangzhou Normal University, Hangzhou 311121, China

  • Chi Ho Yeung

    Affiliation Department of Science and Environmental Studies, The Hong Kong Institute of Education, Hong Kong



Participation in social groups is important, but the collective behavior of humans as a group is difficult to analyze because ordinary social relations and group membership are hard to quantify and comprehensive datasets are hard to collect. Such difficulties can be circumvented by analyzing online social networks.

Methodology/Principal Findings

In this paper, we analyze a comprehensive dataset obtained from Tencent QQ, an instant messenger with the highest market share in China. Specifically, we analyze three derivative networks involving groups and their members (the hypergraph of groups, the network of groups, and the user network) to reveal social interactions at the microscopic and mesoscopic levels.


Our results uncover interesting behaviors in the growth of user groups, the interactions between groups, and their relationship with member age and gender. These findings lead to insights that are difficult to obtain in social networks based on personal contacts.


Social interactions are essential to our daily life, yet our understanding of the organization of social contacts is limited. Major reasons include the difficulty of quantifying individual social relationships and of collecting a comprehensive dataset. Nevertheless, the rapid development of the Internet has transformed the form of social interactions from postal mail, telephone calls, and physical meetings and gatherings to emails, instant messaging, online forums and online social networks. Through the Internet, interactions are quantified into data, which greatly facilitates the study of social networks, and many exciting findings have been revealed. As an example, the hypothesis of six degrees of separation, which states that any two persons can be connected through a small number of acquaintances, was initially considered in the 1930s [1] but was only recently tested on the Facebook network, which gives an average degree of separation of roughly 4 [2]. Other features revealed on online social networks include power-law degree distributions [3–7], community structure [8, 9] and special communication patterns (e.g. the non-Poisson properties of contact activities) [10–16].

So far, studies of online social networks have focused mainly on individual social relationships, leaving another important aspect, participation in social groups, less understood. This is because the collective behavior of humans as a group is difficult to study in traditional social networks, owing to the ambiguity in quantitatively affiliating individuals to specific groups. This problem does not occur on the Internet, since group-based applications have a definite membership identity for individuals. For instance, prototype online applications such as chatrooms and bulletin board systems (BBS) involve individual users joining and posting messages where membership identity is well defined [17, 18]. These applications set the basis for existing social applications and instant messengers, including Windows Live and Google messengers, Whatsapp, Skype, Fetion and Tencent QQ. In these applications, users create social groups on demand, which drives social networks to a state that is more extensive and complicated than their physical counterparts.

Two different types of online social groups can be formed on the Internet. The first is similar to ordinary social networks and is joined by friends with real personal relationships. Circles in Google plus, and Skype and Whatsapp groups, belong to this type [19]. The second is more specific to online social networks and consists of groups of individuals with common interests but without prior personal relationships, for instance, membership in forums and student bulletins. Such groups connect individuals beyond ordinary social networks and extend the social scope of individuals. Despite the difference in their nature, the two types of networks are interdependent [20–22]. For instance, two users in the same forum may become intimate friends and participate in each other's personal social networks. By making new friends, an individual may find new interests and join new forums. This new form of social organization is unique to online social networks and has greatly supplemented or even replaced its physical counterpart.

In this paper, we analyze a comprehensive dataset obtained from Tencent QQ, an instant messenger with the highest market share in China. Both types of social networks are established on QQ, and interactions between the two networks are expected. Specifically, we analyze three different networks involving groups and their members (the hypergraph of groups, the network of groups, and the user network) to reveal social interactions at the microscopic and mesoscopic levels. Our results uncover interesting behaviors in the growth of user groups, the interactions between groups, and their relationship with member age and gender. These findings reveal unique phenomena in online social networks, as well as insights that are otherwise inaccessible in ordinary social networks. Here “ordinary social networks” refers to social networks based directly on personal contacts.


Data description

Tencent QQ (commonly abbreviated as QQ) is an instant communication tool developed by Tencent Holdings Limited in 1999. To date, it has over 700 million active users and has become the largest online application in China. QQ users can send messages, share photos and files, post microblogs, and hold voice or video chats with friends using computers or smartphones.

Social groups are one of the main features of QQ, allowing multiple users to communicate instantly. A message posted by a member is immediately received by all the other group members. When necessary, any two members can communicate via an individual channel. Depending on their activeness, a user can create no more than six groups. Groups can be searched by their IDs or names, and other users can join a group upon approval by the administrator, i.e. the group creator. QQ limits the group size to 100, 200, 500, or 1000, also depending on the activeness of the group creator. For example, according to the latest rule of Tencent, a user with level 0 (activeness less than 5) can create only one group (with a size limit of 200), and a user with level 48 (activeness higher than 2496) can create 5 groups (the limits are 200 for one of the groups and 500 for the other 4 groups). Other than personal relationships, some groups are formed by members who have common interests, e.g. movies, or who belong to the same organizations, e.g. universities or companies. The latter are usually exclusive social circles based on physical organizations.

The QQ dataset we examine (released through an online open database [23] and obtainable with a web crawler) covers more than 58,523,079 groups and 274,335,183 users, of which 48,676,355 groups have complete information on ID, member list, and date. Because an ordinary user is allowed to join at most 2000 groups, the 34 users who joined more than 2000 groups must have been given superior permission by Tencent; they are therefore considered to be robots or customer services set up by Tencent and are excluded from our analyses. Since some users do not indicate their gender or age, or provide seemingly false information, e.g. 0 years old, we exclude users without gender information and users younger than 10 or older than 70. Overall, there are 273,204,518 users with gender information, of which 42.5% (116,135,972) are female, and 244,521,321 users with age between 10 and 70. For most of the QQ groups, the ID, the member list with gender and age, and the date of establishment are known. The oldest and the youngest groups in our dataset were formed on 22nd September, 2005 and 25th March, 2011, respectively. We thus only use data up to 25th March, 2011.
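The exclusion rules described above can be sketched as simple predicates. This is only an illustration: the `User` record layout, the gender codes and the toy records are hypothetical, and only the thresholds (2000 groups, ages 10 to 70) come from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the text; the record layout itself is hypothetical.
MAX_GROUPS_PER_USER = 2000
MIN_AGE, MAX_AGE = 10, 70


@dataclass
class User:
    user_id: int
    gender: Optional[str]  # "M", "F", or None when not declared (assumed coding)
    age: Optional[int]     # None when not declared
    n_groups: int          # number of groups the user has joined


def is_ordinary(user: User) -> bool:
    """Accounts above the 2000-group limit are treated as robots or customer services."""
    return user.n_groups <= MAX_GROUPS_PER_USER


def has_valid_gender(user: User) -> bool:
    return user.gender in ("M", "F")


def has_valid_age(user: User) -> bool:
    return user.age is not None and MIN_AGE <= user.age <= MAX_AGE


# Hypothetical toy records.
users = [
    User(1, "F", 19, 12),
    User(2, "M", 0, 3),      # implausible age
    User(3, None, 25, 7),    # gender not declared
    User(4, "M", 30, 2500),  # exceeds the group limit
]

ordinary = [u for u in users if is_ordinary(u)]
with_gender = [u for u in ordinary if has_valid_gender(u)]
with_age = [u for u in ordinary if has_valid_age(u)]
```

Keeping the gender and age filters separate mirrors the text, which reports the number of users with gender information and the number with a plausible age as two different counts.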

Networks construction

We examine the following types of networks embedded in the datasets:

  1. User-group hypergraph and bipartite network: A hypergraph [24, 25] is a graph of nodes and hyperedges, each of which connects two or more nodes. As shown in Fig 1(a), the hypergraph in our dataset describes the user-group relationship, with nodes representing individual users and hyperedges representing groups. For instance, user B is a member of groups G1 and G2, and is thus connected to A via hyperedge G1 as well as to C and D via hyperedge G2. In this paper, we label the results obtained on the user-group hypergraph by superscript H. We also show the corresponding bipartite network [26, 27] in Fig 1(b), which is an equivalent representation of the hypergraph H. The nodes on the upper side and the bottom side are users and groups, respectively. The results on the bipartite network are labeled by superscript B.
  2. Group network: As shown in Fig 1(c), group networks in our context refer to weighted networks where nodes represent individual groups, and two groups are connected if they have at least one common member. The weight on the edge is defined as the number of common users between the two groups. For instance, groups G3 and G4 in Fig 1(a) have 3 common users, so the edge connecting G3 and G4 in Fig 1(c) has a weight of 3. In this paper, we label the results obtained on the group network by superscript G.
  3. User network: To focus on the behaviors of social groups, the user network in our context is not the ordinary friendship network in QQ, but instead a weighted network which connects two users only if they are members of at least one common group. Hence, all members in a group are fully connected to each other. The weight of an edge connecting a pair of users is equal to the number of groups they both join. As shown in Fig 1(a), both users C and D are members of G2, G3 and G4, and hence the weight on the edge connecting users C and D in Fig 1(d) is 3. In this paper, we label the results obtained on the user network by superscript U.
The notations used throughout the paper are summarized in Table 1.
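As a minimal sketch of the three constructions listed above, the following builds the bipartite view, the group network and the user network from a group-to-members mapping. The toy membership is hypothetical; it only reproduces the relations quoted in the text (B belongs to G1 and G2, G3 and G4 share 3 users, C and D share 3 groups) and is not the actual content of Fig 1.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy membership: each hyperedge (group) is the set of its users.
hypergraph = {
    "G1": {"A", "B"},
    "G2": {"B", "C", "D"},
    "G3": {"C", "D", "E", "F"},
    "G4": {"C", "D", "E", "G"},
    "G5": {"H", "I", "J", "K"},
}

# Bipartite view: user -> set of joined groups.
user_groups = defaultdict(set)
for group, members in hypergraph.items():
    for user in members:
        user_groups[user].add(group)


def project(sets_by_key):
    """Weighted one-mode projection: edge weight = number of shared elements."""
    weights = {}
    for a, b in combinations(sorted(sets_by_key), 2):
        shared = len(sets_by_key[a] & sets_by_key[b])
        if shared > 0:
            weights[(a, b)] = shared
    return weights


group_network = project(hypergraph)   # nodes are groups, weight = common users
user_network = project(user_groups)   # nodes are users, weight = common groups

group_size = {g: len(m) for g, m in hypergraph.items()}    # s^H
hyperdegree = {u: len(g) for u, g in user_groups.items()}  # k^H
```

The all-pairs loop in `project` is quadratic in the number of nodes and is meant for toy inputs only; for data at the scale of the QQ dataset one would instead enumerate co-members group by group.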

Fig 1. Schematic diagram showing (a) the user-group hypergraph H, (b) the bipartite network B, (c) the group network G, and (d) the user network U.

The data is composed of five groups denoted by the colored ellipses in (a) and eleven users. The thickness of edges in (c) and (d) is proportional to the weight on the edges.


The Structural Properties of the User-Group Hypergraph H

The distribution of social group size is one of the most interesting features of a social network. As shown in Fig 1(a), the group size sH is the number of nodes covered by a hyperedge. As we can see in Fig 2(a), the distribution P(sH) shows a slow and smooth decay in the range 0 ≤ sH ≤ 50. The decay becomes faster for sH > 50, and the curve becomes discontinuous at sH = 100, 200, 500 and 1000 due to the limits on group size imposed by QQ. We find that the broken parts of the curve can be enclosed by two power laws with exponents −3.5 and −5.0, i.e. the two dashed lines in Fig 2(a). These exponents are more negative than similar exponents observed in other social networks, suggesting that it is more difficult for a group to maintain a large member community than for an individual to maintain a large number of friends. The results indicate a more homogeneous distribution of group size, probably because maintaining close relationships in a large group, e.g. a club or organization, is not easy, which limits the growth of groups. On the other hand, we show in Fig 2(b) a data collapse of the different broken parts after re-scaling, implying that the formation mechanisms of groups are similar regardless of their size. In addition, the number of groups scales with the number of users as a power function with exponent 1 (the inset of Fig 2(c)).

Fig 2. Statistics for the hypergraph H.

(a) P(sH), the distribution of group size sH, with the distribution in semi-log scale shown in the inset. The two dashed lines show the range of the tail exponent of P(sH), namely −3.5 (orange) and −5.0 (magenta). (b) The data collapse of the different broken parts of P(sH) after re-scaling, where the re-scaling uses the minimum and maximum values of sH in each section and Pc(sH) is the corresponding re-scaled probability. (c) The average (pink) and the standard deviation (grey) of group size for each date of establishment; the inset shows the scaling relationship between the total number of groups and the total number of users at each date. (d) The distribution P(kH) of the number of groups joined by individual users. P(kH) for male and female users is shown in the inset. The pink lines correspond to power-law fits with exponent −3.82.

Intuitively, we expect older groups to have a larger size since they have had a longer time to accumulate members. To reveal the correlation between the size of a group and the date on which it was formed, we compute the average size of groups established on the same date. As we can see in Fig 2(c), the average size ⟨sH⟩ is almost independent of the date of establishment, which is contrary to our expectation. This result may imply that most groups do not grow significantly after establishment, and that the group size is mainly determined by the number of users who joined the group shortly after it was created. This is because, when a group is created, its information usually spreads rapidly in the creator’s social circle. As a result, most interested users join the group once they hear about it. Occasionally, a small number of users may join existing groups, but on the other hand some existing members may leave, leading to an equilibrium group size. This stability in group membership creates some difficulties for studies of recommendation algorithms for QQ groups.

The above picture is further supported by the standard deviation σs of group size, which again does not increase with the age of a group. Moreover, after excluding the groups with size close to the size limits (i.e. excluding groups with size in the range 90–100, 180–200, 450–500), the average size of the remaining groups shows the same behavior (the violet curve in Fig 2(c)). This constant size differs from observations in many other slow-growing social networks.
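The per-date statistics discussed above amount to grouping the groups by date of establishment and taking the mean and standard deviation of their sizes. A minimal sketch follows; the records are hypothetical, and the use of the population standard deviation is an assumption, since the text does not specify the estimator.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical records: (date of establishment, group size s^H).
groups = [
    ("2005-09-22", 40), ("2005-09-22", 60),
    ("2008-01-01", 45), ("2008-01-01", 55),
    ("2011-03-25", 50), ("2011-03-25", 50),
]

sizes_by_date = defaultdict(list)
for date, size in groups:
    sizes_by_date[date].append(size)

# Mean and population standard deviation of group size per establishment date.
stats_by_date = {
    date: (mean(sizes), pstdev(sizes))
    for date, sizes in sorted(sizes_by_date.items())
}
```

In this toy input the mean is the same for every date even though the spread differs, which is the kind of flat curve the text describes for the average size.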

Other than the group size, the number of groups joined by an individual user is also an important characteristic of a social network. In the context of the hypergraph, one can represent the number of groups joined by a user by the hyperdegree kH of the user. Fig 2(d) shows the distribution P(kH) with a tail well fitted by a power law with exponent −3.82. Although the exponent is more negative than in most other social networks, a power-law decay does imply that users who joined a large number of groups are present. Unlike previous studies which revealed differences based on gender, we observe similar P(kH) for both male and female users. In addition, we find an obvious positive correlation between the average value of kH among group members and the corresponding group size sH. Further analysis (see Section 1 of Materials and Methods) indicates that this positive correlation reflects the preference of active users for joining large groups.
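The text does not state how the tail exponent was fitted. As one possible approach, the sketch below uses the continuous maximum-likelihood estimator for a power-law tail, a swapped-in technique rather than the authors' stated method, and checks it on synthetic data drawn with the exponent magnitude 3.82 quoted above. The function name and the choice of `x_min` are illustrative assumptions.

```python
import math
import random


def powerlaw_tail_exponent(values, x_min):
    """Continuous MLE of gamma in P(x) ~ x^(-gamma) for x >= x_min."""
    tail = [x for x in values if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / log_sum


# Synthetic check: inverse-transform sampling from a pure power law.
rng = random.Random(0)
gamma_true, x_min = 3.82, 1.0
samples = [
    x_min * (1.0 - rng.random()) ** (-1.0 / (gamma_true - 1.0))
    for _ in range(20000)
]
gamma_hat = powerlaw_tail_exponent(samples, x_min)
```

For integer-valued quantities such as kH the continuous estimator is only an approximation, and the result depends on where the tail is assumed to start.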

The Structural Properties of Group Network G

After examining the macroscopic characteristics of groups, we move on to reveal their microscopic interactions. In this respect, the weighted group network characterizes an indirect interaction between groups that share common members. As a reminder, two groups are considered connected if they share at least one common member, and the weight of the edge is the number of users who joined both groups.

As shown in Fig 3(a), the distribution P(kG) of the group degree kG shows a power law with exponent −0.8 when kG < 120 and another power law with exponent −2.23 when kG > 120. Similarly, as shown in Fig 3(b), the weighted degree distribution P(KG) shows a power law with exponent −0.81 when KG < 160 and another power law with exponent −2.33 when KG > 160. The results imply that a group only shares members with a small number of groups, usually at most of the order O(10^2) among the 58 million groups in the QQ network. On the other hand, the number of common members between a pair of groups, i.e. the weight of an edge, also obeys a two-regime power law as shown in Fig 3(c), with an exponent −5.94 at the tail. This implies that the number of users who have interests in a common pair of groups is limited to the order of O(10^2). Furthermore, using bipartite network projection, we calculate the effective edge weight that reflects the influence of one group on another [28], and find that the fitted power-law exponent of its distribution is smaller than that of wG, indicating that the influence between groups is more heterogeneous (see Section 2 of Materials and Methods).

Fig 3. Properties of the group network G.

The figures show (a) the distribution P(kG) of group degree, (b) the distribution P(KG) of weighted group degree KG of G, and (c) the distribution P(wG) of edge’s weight. The insets show the same curves in semi-log scale.

The degree of a group depends on two factors, namely (i) the number of users in the group, and (ii) the total number of other groups joined by its members. Fig 4(a) shows the relation between the group degree kG and the group size sH, with the relation between sH and the corresponding ⟨kG⟩ given by the pink curve. The results show that group degree increases with group size, which is expected since the number of different groups joined by the members of a larger group should be proportionately higher. In Fig 4(b), similar statistics show the relation between the group degree kG and the largest number of groups joined by an individual member of the group. The reason is similar to that in Fig 4(a): since a larger group has proportionately more active members, the largest number of groups joined by an individual member is higher. The average of kG has an obvious transition from faster growth to slower growth (see Fig 4(b)), indicating that kG depends more strongly on this maximum when kG is smaller than 100. These results imply that when the group degree kG is small, active users play a significant role in increasing kG.

Fig 4. Heat maps showing the correlation (a) between kG and sH, and (b) between kG and the largest number of groups joined by an individual member of the group.

The color scale corresponds to the log-frequency of occurrence. The Pearson correlation coefficients for the relation in (a), log(kG) vs. log(sH), and for the relation in (b) are 0.92 and 0.91, respectively. The pink lines show the means along the vertical direction, and the blue dashed line in (a) shows the fitted power function with slope 1.14.
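A correlation on logarithmic axes and a power-function slope of this kind can be computed as below. This is a sketch under stated assumptions: the data are hypothetical and noise-free, constructed to follow a power function with the slope 1.14 quoted above, and ordinary least squares on the logarithms is assumed as the fitting procedure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def loglog_slope(xs, ys):
    """Least-squares slope of log10(y) against log10(x)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    den = sum((a - mean_x) ** 2 for a in lx)
    return num / den


# Hypothetical noise-free data following k^G = 2 * (s^H)^1.14 exactly.
sizes = [10, 20, 50, 100, 200]
degrees = [2.0 * s ** 1.14 for s in sizes]

r = pearson([math.log10(s) for s in sizes], [math.log10(k) for k in degrees])
slope = loglog_slope(sizes, degrees)
```

With real data the points scatter around the fitted line, so the coefficient falls below 1, as in the values 0.92 and 0.91 reported in the caption.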

Finally, we show that the QQ group network is sparse but shows the “small-world” phenomenon, similar to the friendship network of Facebook [2, 29]. Compared to Facebook, the average degree and average weighted degree are slightly smaller in the QQ group network, with values 108.8 and 133.6, respectively. These degrees are small given the large size of the network, indicating that the network is sparse. To show the “small-world” phenomenon, we randomly sample 2 × 10^4 pairs of groups and remarkably find that their average distance is only 3.70 ± 0.004, indicating that the upper limit of the average distance between any two users is only 4.70, similar to the four degrees of separation observed in Facebook (where the average distance between users is 4.74) [2]. We also compute the local clustering coefficient

$$C^G = \frac{2 n_T}{k^G (k^G - 1)} \tag{1}$$

for 10^4 randomly chosen groups, where nT is the number of connections among the neighbors of the group. We show the frequency of the values (kG, CG) for individual groups in Fig 5. As we can see, CG is negatively related to kG in a rough power-law relation with exponent −0.62, which is similar to the Facebook case [2]. The average value of CG is 0.35, which is high compared to other social networks.
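Both measurements in this paragraph can be sketched on an unweighted adjacency structure. The sketch assumes the standard definition of the local clustering coefficient, C = 2 n_T / (k (k − 1)), with n_T the number of links among a node's neighbors, and uses breadth-first search for the distance between a pair of nodes. The toy adjacency below is hypothetical.

```python
from collections import deque
from itertools import combinations


def local_clustering(adj, node):
    """C = 2 * n_T / (k * (k - 1)); defined here as 0.0 when k < 2."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    n_t = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * n_t / (k * (k - 1))


def shortest_distance(adj, source, target):
    """Number of hops on a shortest path, or None if the nodes are disconnected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj[node]:
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical toy group network (undirected, weights ignored).
adj = {
    "G1": {"G2"},
    "G2": {"G1", "G3", "G4"},
    "G3": {"G2", "G4"},
    "G4": {"G2", "G3"},
    "G5": set(),
}

c_g2 = local_clustering(adj, "G2")
d_g1_g4 = shortest_distance(adj, "G1", "G4")
d_g1_g5 = shortest_distance(adj, "G1", "G5")
```

Averaging `shortest_distance` over randomly sampled pairs, and `local_clustering` over randomly sampled nodes, gives sample estimates in the spirit of those reported; how disconnected pairs are handled is a choice the text does not specify.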

Fig 5. The heat map which shows the correlation between local clustering coefficient CG and the degree kG in group network G.

The color scale corresponds to the log-frequency of occurrence over 10^4 randomly sampled groups. The pink dashed line shows the fitted curve with slope −0.62 on the means along the vertical direction.

The Structural Properties of User Network U

A similar analysis is conducted for the weighted user network. We show in Fig 6 the degree distribution P(kU), which has a power-law tail with exponent −3.22 and an average value of 135.3. The degree distributions for male and female users do not show an obvious difference and are shown in the bottom inset of Fig 6. By averaging over 1600 random pairs of users, we obtain an average distance between a pair of users of 4.36 ± 0.015, which is smaller than the average distance of 4.74 observed in the Facebook friendship network [2]. These results show that the user network is sparse and exhibits a small-world phenomenon. By comparing the degree distribution P(kU) and the edge-weight distribution P(wU) shown in the top inset of Fig 6, we observe that the latter is well fitted by the decay function P(wU) ≈ 10^(−5.45) [log(wU)]^(−7.96), which decays more slowly than a power law. This implies that users with large degree are more likely to share groups with other users, resulting in a large edge weight and thus a shift of the tail to the right. Nevertheless, the effective edge weight calculated by the projection of the bipartite network B obeys a rapidly decaying distribution, indicating that the difference in influence between users across interactions in groups is not so large. A detailed discussion can be found in Section 2 of Materials and Methods.

Fig 6. Degree distribution P(kU) in the user network U.

The bottom inset shows the same distribution over male and female users respectively. The top inset shows the distribution P(wU) of edge’s weight wU.

Grouping behaviors and user age

The most active age group in QQ group participation.

To make the best use of the available data, we go on to reveal the relation between user age and their joined groups. Similar studies have shown that the preference for gender in social contacts changes with age [30]. Here we will reveal similar changes in grouping preference with age.

Fig 7(a) shows the distribution of individual user age, namely P(a), and the distribution of average member age of groups, namely PG(⟨a⟩). As we can see, members of QQ groups are mainly young users around 20 years old. As shown in Fig 7(b), the distribution P(kH) of the number of groups joined by an individual user, i.e. the hyperdegree kH in the user-group hypergraph, depends slightly on age: comparing different ages, the decay of P(kH) for users in the range of 40 to 44 is faster in the small kH regime and slightly slower in the large kH regime. We further show that the average number of joined groups ⟨kH⟩ is highest at two distinct ages, showing a bimodal form: the first peak is located at a ≈ 15 and corresponds to teenagers, and the second peak appears to lie at a > 65 and corresponds to the elderly. The number of joined groups is lowest when users are in their 40s.

Fig 7. The relation between member age and group characteristics.

(a) The distribution of age over individual users and of average member age over individual groups. (b) The distribution of the number of joined groups for different age groups. Inset: the average number of joined groups over users at different ages. (c) The distribution P(cva) of the coefficient of variation cva of users’ age in each group; the average value ⟨cva⟩ as a function of the average age of groups is shown in the inset. (d) The dependence of average group size ⟨sH⟩ on average group member age ⟨a⟩. Inset: The dependence of average group degree ⟨kG⟩ on average group member age ⟨a⟩.

These results indicate that both teenagers and the elderly are active in group-based social interactions, in contrast to the less active middle-aged users in their 40s. We examined the groups joined by several elderly users and find that the majority of them are groups for entertainment, indicating their need for leisure activities and social interactions. Users in their 40s are likely to be occupied with family or work and are less active in joining QQ groups.

The distribution of member age in groups.

We compute the coefficient of variation cva = σa/⟨a⟩ of each group, where σa and ⟨a⟩ are the standard deviation and the average of the member age in the group respectively. The distribution of cva is shown in Fig 7(c), which shows that the values of cva in more than half of the groups are smaller than 0.1, indicating group members are usually of similar age. This result is expected since users with similar age usually have similar interests or are engaged in similar institutes and thus are more likely to meet each other.
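The coefficient of variation defined above is straightforward to compute per group. In this sketch the member ages are hypothetical, and the population standard deviation is assumed because the text does not say which estimator is used.

```python
from statistics import mean, pstdev


def coefficient_of_variation(ages):
    """cv_a = sigma_a / <a>, using the population standard deviation (an assumption)."""
    return pstdev(ages) / mean(ages)


# Hypothetical member ages for two groups.
group_ages = {
    "G1": [19, 20, 21],  # members of similar age
    "G2": [14, 30, 46],  # members of mixed age
}

cv_a = {group: coefficient_of_variation(ages) for group, ages in group_ages.items()}
```

The first toy group falls below the 0.1 level mentioned in the text and the second lies well above it, which illustrates how cva separates age-homogeneous groups from mixed ones.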

On the other hand, the behavior of cva differs for groups with different average ages. As shown in Fig 7(c), cva first peaks for groups with average age ⟨a⟩ ≈ 14, corresponding to groups of teenagers, and peaks again for groups with average age ⟨a⟩ ≈ 33, corresponding to groups of adults who are likely to be at an intermediate stage of their careers. The average value ⟨cva⟩ for groups with elderly users is low. In general, ⟨cva⟩ can be considered a measure of user diversity within the group, and the above findings may imply that teenagers and users in their 30s are more open to making friends with others who are not in the same age group. On the other hand, the smaller cva for groups with ⟨a⟩ ≈ 20 may imply that users in their 20s are looking for friends of similar age, e.g. lovers or fellow university students.

Similar bimodal characteristics are observed in the average group size ⟨sH⟩ and the average degree ⟨kG⟩ of groups in the group network. As shown in Fig 7(d) and its inset, the first peaks of sH and kG appear at an age of around 19, while the second peaks appear at the age of 28. These results reveal a non-monotonic change of group preference with age.

Change of group preference with age and gender differences.

To get a clearer picture of the change of group characteristics with age, we (i) compute the average value of variables over groups with a particular average member age, and (ii) show simultaneously the change of a pair of variables in a 2D space, which constitutes a path of the group characteristics as average member age increases. In Fig 8(a), we show the average user degree ⟨kU⟩ in the user network and the coefficient of variation cvu along a path as member age increases. The results imply that teenagers usually have a smaller but more diverse friendship community until 20 years old, when their friendship community increases in size but decreases in diversity, probably because they are studying in universities. Afterwards, when users start their careers, the diversity of friends increases but the friendship community shrinks slightly. These observations are consistent with our previous analyses, which show a transition from a pre-mature regime to a mature regime.

Fig 8. The paths of changes of two averaged variables along with age.

(a) X-axis: the averaged value ⟨kU⟩ of the degree kU of user network U for users of each age; Y-axis: the coefficient of variation cvu of kU for each age. (b) Gender differences on the age trail in panel (a). (c) Horizontal axis: the average number of joined groups of individual users; Vertical axis: the average size of the joined groups. (d) The same path as in (c), averaged over only male and only female users, respectively. (e) Horizontal axis: the average coefficient of variation ⟨cvu⟩f of the degree of neighbors of a user in the user network U; Vertical axis: the average coefficient of variation ⟨cva⟩f of the age of neighbors of a user in the user network U. (f) The same path as in (e), averaged over only male and only female users, respectively. The labels close to each data point correspond to the value of age, and the different colors in (a), (c) and (d) show the data points in three different age stages.

Other than an average over all users, we show a similar path in Fig 8(b) by averaging over male or female members in a group. As we can see, the path of the female users shows a faster increase in kU, i.e. a faster increase in the size of their friendship network, and an earlier transition into the mature regime at the age a ≈ 16, compared to the age a ≈ 20 for male users. A possible reason is that females in general mature at an earlier age. In addition, female users show a smaller kU in the mature regime compared to male users, reflecting the lower diversity of the social contacts of female users.

We then analyze the change of group size sH and the number of joined groups kH for users at various ages. As we can see in Fig 8(c), a pre-mature regime and a mature regime can be roughly identified in the path, separated at an age of around 15. In the pre-mature regime, users tend to join more groups, each with a smaller size, while in the mature regime, users usually join a smaller number of groups, each with a larger size. Fig 8(d) shows the corresponding paths averaged over only male and only female users. As we can see, female users show an earlier transition into the mature regime, similar to that observed in Fig 8(b). Female users are also observed to join fewer groups after the transition.

Finally, we examine the change with age in the diversity of the neighboring users’ ages and of the degree of the neighbors of a user, denoted by ⟨cva⟩f and ⟨cvu⟩f, respectively. As shown in Fig 8(e), users of age within the range 15–23, i.e. users lying in the transition from the pre-mature to the mature regime, usually have a friendship network composed mainly of users of similar age, probably corresponding to a stage of studying in colleges. However, the diversity of the neighbors’ degree becomes higher with increasing age within the range 15–23. After the transition, users of age within the range 25–35 usually have a friendship community with a wider range of age but similar degree. The elderly have moderate diversity in terms of both the neighbors’ age and degree. Similar paths obtained by averaging over only male or only female members are shown in Fig 8(f).

The above results show that teenagers are generally active in different social communities until 20 years old, when they start their college study, reduce their activities and make friends with fellow university students. We observe that after the age of 25, the paths characterizing different pairs of variables enter a mature regime and become stable in a small region of the 2D plane. This may correspond to a transition from the stage of studying to working. In this stage, users tend to join groups with greater member diversity.

We find that female users are in general less active than male users after the transition to the mature stage. This gender difference may have deep social and psychological reasons. In recent decades, several types of gender differences in social behaviors and Internet use have been observed [31]. Previous studies have found that adult male users usually make new friends using social networks, while adult female users usually use them for keeping in touch with old friends [32, 33], which would partially explain our observation. However, other factors may also be relevant and should be noted, because they relate to the disadvantages of female users in education and social status: for example, the gender gap in the use of computer and Internet technologies [34–36], the negative attitudes of female users toward computers, the Internet and online social media [34, 37], and the role of women in the family.


Participation in social groups is essential, yet our understanding of it is limited by the difficulty of collecting data in ordinary social networks. Online social networks do not have this problem. Using a comprehensive dataset obtained from Tencent QQ, we analyzed three derivative networks involving groups and their members. We showed that the distribution of the number of groups joined by an individual follows a power law, similar to other social networks except that a larger decay exponent is observed in the present case. The size of a QQ group is capped at certain specific values; nevertheless, we showed a data collapse of the statistics of groups with different maximum sizes, implying a similar group formation mechanism regardless of size. Beyond these distributions, the network at the group level shows a small-world phenomenon with an average distance of 3.7. This finding is remarkable: there are 58 million groups and the group network is extremely sparse, yet on average only 3 to 4 steps are required to connect any pair of groups. All these findings on online social groups are otherwise inaccessible in studies of their physical counterparts, and they would affect social dynamics [38, 39].

To make the best use of the available data, we went one step further to study the interdependence between a group and the age of its members. The results showed that user preference for groups changes with age. A pre-mature and a mature stage can be identified. Youngsters who are still in school are more active in social group participation in QQ and have friends of more diverse ages. The situation changes when users are in their 20s: they reduce their activities and make friends mostly with fellow college students. Afterwards, when users start working, they enter a mature stage in which the diversity of their friends and groups increases again. These changes with age are revealed in various characteristics of their grouping preference.

As we have seen, data collected from online social networks reveal the interaction and participation of users in social groups. The results lead to a better understanding of social interaction via information technology. Nevertheless, ordinary social interaction is still essential, and a comprehensive understanding of the connection between online and ordinary social networks is missing. In this respect, the present study provides useful insights for the study of ordinary social networks, for instance as a guide to the design of surveys and the collection of data. We believe that the insights obtained from the present study are not limited to online social networks, but would help to fill in the missing connection to their physical counterparts.

Materials and Methods

The correlation between group size and the number of joined groups of members

We observe a strong positive correlation between the largest number of groups joined by an individual member of a group and the group size sH, as shown in Fig 9(a). The dependence of this maximum on sH can be fitted by a power law with exponent 0.54. This increase with group size may not seem surprising, since active users are proportionately more likely to be present in larger groups, which include more users. This is true even if active users do not have a preference for joining large groups. However, does this preference really exist?

Fig 9. The heat maps showing the correlation between group size and the number of joined groups of members.

The color scale corresponds to the log-frequency of occurrence of each combination of group size and (a) the largest number of groups joined by an individual member of the group, and (b) the average number of groups joined by the members of the group. The Pearson correlation coefficients for the quantities in (a) and for log(⟨kH⟩) vs. log(sH) in (b) are 0.77 and 0.59 respectively. The pink lines show the mean along the vertical axis, and the blue dashed line in (a) shows the fitted power function with exponent 0.54.

To answer this question, we also plot the relationship between the average kH among group members, ⟨kH⟩, and the corresponding group size sH. As shown in Fig 9(b), the averaged curve of this relationship generally obeys a power function with exponent 0.14. To understand this result, we assume a null model in which the members of each group are drawn completely at random from a strict power-law distribution with exponent β. The average value of the power-law distribution p(x) = Cx^(−β) is

$$\langle x \rangle = \int_{x_{\min}}^{\infty} x\,p(x)\,dx = C\int_{x_{\min}}^{\infty} x^{1-\beta}\,dx, \qquad (2)$$

which converges only when β > 2. Since the fitted power-law exponent β of the distribution of kH is 3.82 (Fig 2(d)), larger than this threshold, the average value of kH must converge as sH → ∞. The expected exponent of the relationship between ⟨kH⟩ and sH in the null model is therefore zero, lower than the observed value (0.14). This positive exponent is thus evidence that the preference exists, namely that active users tend to join larger groups.
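The null-model argument can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a continuous power law with lower cutoff x_min = 1, draws group members independently by inverse-transform sampling, and uses illustrative group sizes and sample counts. Only the exponent 3.82 is taken from the text; all function names are hypothetical.

```python
import math
import random


def sample_power_law(beta, x_min, rng):
    """Draw one value from the continuous density p(x) ~ x**(-beta), x >= x_min."""
    u = rng.random()  # uniform on [0, 1), so 1 - u is never zero
    return x_min * (1.0 - u) ** (-1.0 / (beta - 1.0))


def mean_k_by_group_size(beta, sizes, groups_per_size, x_min=1.0, seed=0):
    """Average member activity per group size when members are drawn at random."""
    rng = random.Random(seed)
    result = {}
    for size in sizes:
        total = 0.0
        for _ in range(groups_per_size):
            members = [sample_power_law(beta, x_min, rng) for _ in range(size)]
            total += sum(members) / size
        result[size] = total / groups_per_size
    return result


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


BETA = 3.82              # fitted exponent reported for the distribution of kH
SIZES = [10, 100, 1000]  # illustrative group sizes (assumption)
means = mean_k_by_group_size(BETA, SIZES, groups_per_size=500)
slope = loglog_slope(SIZES, [means[s] for s in SIZES])
expected_mean = (BETA - 1.0) / (BETA - 2.0)  # analytic mean for x_min = 1
print(means, slope, expected_mean)
```

Under these assumptions the average stays close to the analytic mean for every group size, so the fitted log-log slope is close to zero, in contrast to the exponent 0.14 observed in the data.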

The bipartite network and effective edge’s weight

An equivalent representation of the hypergraph H is the bipartite network B (Fig 1(b)). In the bipartite network, the nodes on the two sides are users and groups respectively, and the links can be represented by an NG × NU adjacency matrix, where NG and NU are the total number of groups and the total number of users respectively. The group network G and the user network U are the two projections of the bipartite network B.
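As an illustration of the two projections, the sketch below builds them from a group membership table and weights each edge by the number of common users (for G) or common groups (for U). The toy data and all names are hypothetical, and the dictionary-of-sets representation is an assumption made for brevity in place of the NG × NU matrix.

```python
from collections import Counter, defaultdict
from itertools import combinations


def project_bipartite(membership):
    """Project a group -> set-of-users mapping onto groups and onto users.

    Returns (group_weights, user_weights): each maps a sorted pair of nodes
    to the number of users (or groups) that the two nodes have in common.
    """
    user_groups = defaultdict(set)
    user_weights = Counter()
    for group, users in membership.items():
        for user in users:
            user_groups[user].add(group)
        # every pair of users in the same group shares this group
        for a, b in combinations(sorted(users), 2):
            user_weights[(a, b)] += 1

    group_weights = Counter()
    for groups in user_groups.values():
        # every pair of groups joined by the same user shares this user
        for a, b in combinations(sorted(groups), 2):
            group_weights[(a, b)] += 1
    return group_weights, user_weights


# Toy example (hypothetical data): two groups, three users.
toy_membership = {"G1": {"a", "b"}, "G2": {"a", "b", "c"}}
toy_group_w, toy_user_w = project_bipartite(toy_membership)
print(dict(toy_group_w), dict(toy_user_w))
```

Here the two groups share users a and b, and users a and b share both groups, while c shares only G2 with each of them.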

Unlike the direct definitions based on the number of common users (groups), the effective edge’s weight of the projection network G (U) represents the proportion of resource that one group (user) would distribute to another group (user) [40]. Ref. [28] proposed a typical way to calculate the effective edge’s weight, based on a resource-allocation process between the two sides of nodes. To calculate the effective edge’s weight between two groups (Gi and Gj, say), first place a certain amount of resource at each group (G1, G2, ⋯, Gi, ⋯), and let each group distribute its resource equally among its users. Then each user distributes the resource it has received equally among the groups it has joined, and the fraction of the resource of group Gi that is transferred to Gj is the effective edge’s weight from group Gi to Gj. This calculation can be represented by the following equation:

$$w^{e}_{G,ij} = \frac{1}{\sum_{l} x_{il}} \sum_{l} \frac{x_{il}\,x_{jl}}{\sum_{m} x_{ml}}, \qquad (3)$$

where x_il is the element of the adjacency matrix of the bipartite network B, the sums over l run over users, and the sum over m runs over groups. Notice that $w^{e}_{G,ij}$ is in general not equal to $w^{e}_{G,ji}$. Similarly, the effective edge’s weight of the user network U, from user l to user m, can be calculated by:

$$w^{e}_{U,lm} = \frac{1}{\sum_{i} x_{il}} \sum_{i} \frac{x_{il}\,x_{im}}{\sum_{n} x_{in}}, \qquad (4)$$

where the sums over i run over groups and the sum over n runs over users.
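The two-step resource-allocation procedure for groups can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: each group starts with one unit of resource, membership is given as a hypothetical group -> set-of-users dictionary, and self-weights (resource returned to the originating group) are kept in the output.

```python
from collections import defaultdict


def effective_group_weights(membership):
    """Effective edge's weight from each group Gi to each group Gj.

    Step 1: each group splits one unit of resource equally among its users.
    Step 2: each user splits what it received equally among its groups.
    The returned dict maps (Gi, Gj) to the fraction of Gi's resource
    that ends up at Gj.
    """
    user_groups = defaultdict(set)
    for group, users in membership.items():
        for user in users:
            user_groups[user].add(group)

    weights = defaultdict(float)
    for gi, users in membership.items():
        if not users:
            continue  # an empty group has nothing to distribute
        share_per_user = 1.0 / len(users)
        for user in users:
            share_per_group = share_per_user / len(user_groups[user])
            for gj in user_groups[user]:
                weights[(gi, gj)] += share_per_group
    return dict(weights)


# Toy example (hypothetical data): the groups overlap in user u1 only.
toy_groups = {"G0": {"u0", "u1"}, "G1": {"u1", "u2", "u3"}}
toy_weights = effective_group_weights(toy_groups)
print(toy_weights)
```

In this toy case G0 passes half of its resource to u1, who forwards half of that to G1, whereas G1 passes only a third of its resource to u1, so the weight from G0 to G1 differs from the weight from G1 to G0. The weights leaving each group sum to one, because every unit of resource is eventually assigned to some group.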

Using this method, we calculate the effective edge’s weight of networks G and U for a sample of the QQ dataset. This sample includes the neighboring groups, up to the third order, of the groups joined by a given user (the QQ ID of this user is 4172705; for the third-order neighboring groups, only the users that joined more than one group in the sample are included), giving a total of 183,762 groups and 6,829,611 users. The distribution of the effective edge’s weight of network G is shown in Fig 10(a). It generally retains a power-law-like form, with a fitted power-law exponent smaller than that of P(wG) (the inset of Fig 10(a)), indicating a more heterogeneous structural influence between groups. Due to the discontinuous group size distribution (Fig 2(a)), the curve of this distribution is broken at we = 10^−2. In contrast, the distribution of the effective edge’s weight of network U is more homogeneous and more fragmented (Fig 10(b)), which is partly because the sample does not include all the groups joined by some users in the less popular groups.

Fig 10. The effective edge’s weight distributions of the sampled projection networks of the bipartite network B.

Panels (a) and (b) show the projection network of groups and of users respectively; the corresponding distributions of the edge’s weight based on common users/groups are shown in the insets.

Author Contributions

Conceived and designed the experiments: XPH. Performed the experiments: XPH. Analyzed the data: ZQY XPH LL CHY. Contributed reagents/materials/analysis tools: ZQY XPH LL. Wrote the paper: XPH LL CHY.


  1. Karinthy F (1929) Chain-links. In Newman M, Barabási A-L, Watts DJ (eds.). The Structure and Dynamics of Networks. Princeton University Press (2006).
  2. Backstrom L, Boldi P, Rosa M, Ugander J, Vigna S (2012) Four degrees of separation. WebSci 2012, June 22–24, Evanston, Illinois, USA.
  3. Albert R, Barabási A-L (2002) Statistical Mechanics of Complex Networks. Rev Mod Phys 74: 47–97.
  4. Anghel M, Toroczkai Z, Bassler KE, Korniss G (2004) Competition-driven network dynamics: Emergence of a scale-free leadership structure and collective efficiency. Phys Rev Lett 92: 058701. pmid:14995348
  5. Borgatti SP, Mehra A, Brass DJ, Labianca G (2009) Network analysis in the social sciences. Science 323: 892–895. pmid:19213908
  6. Zhou T, Medo M, Cimini G, Zhang Z-K, Zhang Y-C (2011) Emergence of scale-free leadership structure in social recommender systems. PLOS ONE 6: e20648. pmid:21857891
  7. Cui A-X, Zhang Z-K, Tang M, Hui P-M, Fu Y (2012) Emergence of scale-free close-knit friendship structure in online social networks. PLOS ONE 7: e50702. pmid:23272067
  8. Girvan M, Newman MEJ (2002) Community structure in social and biological networks. Proc Natl Acad Sci USA 99: 7821–7826. pmid:12060727
  9. Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69: 026113.
  10. Rybski D, Buldyrev SV, Havlin S, Liljeros F, Makse HA (2009) Scaling laws of human interaction activity. Proc Natl Acad Sci USA 106: 12640–12645. pmid:19617555
  11. Rybski D, Buldyrev SV, Havlin S, Liljeros F, Makse HA (2012) Communication activity in a social network: relation between long-term correlations and interevent clustering. Sci Rep 2: 560. pmid:22876339
  12. Barabási A-L (2005) The origin of bursts and heavy tails in human dynamics. Nature 435: 207–211. pmid:15889093
  13. Wu Y, Zhou C, Xiao J, Kurths J, Schellnhuber HJ (2010) Evidence for a bimodal distribution in human communication. Proc Natl Acad Sci USA 107: 18803–18808. pmid:20959414
  14. Hong W, Han X-P, Zhou T, Wang B-H (2009) Heavy-tailed statistics in short-message communication. Chin Phys Lett 26: 028902.
  15. Zhao Z-D, Xia H, Shang M-S, Zhou T (2011) Empirical analysis on the human dynamics of a large-scale short message communication system. Chin Phys Lett 28: 068901.
  16. Jiang Z-Q, Xie W-J, Li M-X, Podobnik B, Zhou W-X, Stanley HE (2013) Calling patterns in human communication dynamics. Proc Natl Acad Sci USA 110: 1600–1605. pmid:23319645
  17. Goh K-I, Eom Y-H, Jeong H, Kahng B, Kim D (2006) Structure and evolution of online social relationships: Heterogeneity in unrestricted discussions. Phys Rev E 73: 066123.
  18. Wang P, Zhou T, Han X-P, Wang B-H (2014) Modeling correlated human dynamics with temporal preference. Physica A 398: 145–151.
  19. Kairam S, Brzozowski M, Huffaker D, Chi EH (2012) Talking in circles: selective sharing in Google+. CHI ’12 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1065–1074, ACM New York, NY, USA.
  20. Buldyrev SV, Parshani R, Paul G, Stanley HE, Havlin S (2010) Catastrophic cascade of failures in interdependent networks. Nature 464: 1025–1028. pmid:20393559
  21. Gao J, Buldyrev SV, Stanley HE, Havlin S (2012) Networks formed from interdependent networks. Nat Phys 8: 40–48.
  22. Bashan A, Berezin Y, Buldyrev SV, Havlin S (2013) The extreme vulnerability of interdependent spatially embedded networks. Nat Phys 9: 667–672.
  23. The full dataset is available from the online open database “” using a web crawler.
  24. Berge C (1973) Graphs and Hypergraphs (Vol. 6). Elsevier, New York.
  25. Berge C (1989) Hypergraphs: Combinatorics of Finite Sets (Vol. 45). Amsterdam: North-Holland.
  26. Holme P, Liljeros F, Edling CR, Kim BJ (2003) Network bipartivity. Phys Rev E 68: 056107.
  27. Shang MS, Lü L, Zhang YC, Zhou T (2010) Empirical analysis of web-based user-object bipartite networks. Europhys Lett 90: 48006.
  28. Zhou T, Ren J, Medo M, Zhang Y-C (2007) Bipartite network projection and personal recommendation. Phys Rev E 76: 046115.
  29. Ugander J, Karrer B, Backstrom L, Marlow C (2011) The anatomy of the Facebook social graph. arXiv:1111.4503.
  30. Palchykov V, Kaski K, Kertész J, Barabási A-L, Dunbar RIM (2012) Sex differences in intimate relationships. Sci Rep 2: 370. pmid:22518274
  31. Szell M, Thurner S (2013) How women organize social networks different from men. Sci Rep 3: 1214. pmid:23393616
  32. Mazman SG, Usluel YK (2011) Gender differences in using social networks. Turkish Online J Educat Technol 10: 133–139.
  33. Muscanell NL, Guadagno RE (2012) Make new friends or keep the old: Gender and personality differences in social networking use. Computers in Human Behavior 28: 107–112.
  34. Fuller J (2004) Equality in cyberdemocracy? Gauging gender gaps in on-line civic participation. Soc Sci Q 85(4): 938–957.
  35. Broos A (2005) Gender and information and communication technologies (ICT) anxiety: Male self-assurance and female hesitation. Cyberpsychol Behav 8: 21–31. pmid:15738690
  36. Hargittai E, Shafer S (2006) Differences in Actual and Perceived Online Skills: The Role of Gender. Social Science Quarterly 87: 432–448.
  37. Phillip MV, Suri R (2004) Impact of gender differences on the evaluation of promotional emails. J Advert Res 44: 360–368.
  38. Christakis NA, Fowler JH (2010) Social Network Sensors for Early Detection of Contagious Outbreaks. PLOS ONE 5(9): e12948. pmid:20856792
  39. Hadzibeganovic T, Stauffer D, Han X-P (2015) Randomness in the evolution of cooperation. Behav Proc 113: 86–93.
  40. Newman MEJ (2001) Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Phys Rev E 64: 016132.