Characterizing and Modeling the Dynamics of Activity and Popularity

Social media, regarded as two-layer networks consisting of users and items, have become the most important channels for access to massive information in the era of Web 2.0. The dynamics of human activity and item popularity is a crucial issue in social media networks. In this paper, by analyzing the growth of user activity and item popularity in four empirical social media networks, i.e., Amazon, Flickr, Delicious and Wikipedia, we find that cross links between users and items are more likely to be created by active users and to be acquired by popular items, where user activity and item popularity are measured by the number of cross links associated with users and items, respectively. This indicates that, overall, users tend to trace popular items. However, inactive users trace popular items more strongly than active users do. Inspired by the empirical analysis, we propose an evolving model for such networks, in which the evolution is driven only by a two-step random walk. Numerical experiments verify that the model can qualitatively reproduce the distributions of user activity and item popularity observed in the empirical networks. These results might shed light on the micro dynamics of activity and popularity in social media networks.


Introduction
In recent years, social media networks such as YouTube, Facebook, Delicious, Amazon, Flickr and Wikipedia, to name just a few, have become vital platforms for sharing content in the era of Web 2.0 and have experienced explosive growth [1,2]. These systems record the fingerprints of every user's activity and every item's popularity, providing "a wealth of data" for studying the dynamics of human activity and item popularity at the scale of the whole system. In particular, it has been found that the probability distributions of the activity degree of users, e.g., editing in Wikipedia [3], voting in News2 [3] and favorite-marking in Flickr [4], and of the popularity degree of items, e.g., the number of fans a photo has in Flickr [4], follow a power law. These power-law distributions are explained by the rich-get-richer mechanism [5,6], which is also called preferential attachment in the field of complex networks [7][8][9]. However, how these two distributions arise simultaneously from human activity has yet to be determined.
Activity dynamics [3][10][11][12][13][14] and popularity dynamics [15][16][17][18][19][20][21][22][23] have each been investigated separately in the literature. However, human activity and item popularity are two perspectives on the same cross links between users and items and are therefore interdependent; the dynamics of one cannot be studied in isolation from the other. In addition, individuals are always embedded in a social network. It is widely believed that information can spread quickly along social links through user-to-user exchanges, also known as "word-of-mouth" exchanges; moreover, users' behaviors are strongly influenced by their neighbors [23][24][25][26][27]. In particular, the social degree and the activity degree depend on each other [3]. Hence, it is worthwhile to take social networks into account to obtain deeper insights into the dynamics of human activity and item popularity. Until now, there has been no clear picture of how online human activity and item popularity coevolve. It is therefore crucial to investigate both the empirical evolution of human activity and item popularity and a theoretical model of it, in order to better understand the possible generic laws governing the formation of the activity and popularity distributions.
In this paper, we first characterize the evolution of human activity and item popularity in the Amazon, Flickr, Delicious and Wikipedia networks. We find that in such social media networks, the relative probability of a user creating cross links and the relative probability of an item acquiring cross links are proportional to the degree of activity and the degree of popularity, respectively. In particular, inactive users are more likely to trace popular items than active users. Based on these empirical observations, we then propose an evolving model based on a two-step random walk. Finally, we justify the validity of our model by comparing the results of the model with those of the empirical networks. This work could shed light on the evolution of user activity and item popularity in social media networks, and it could also be helpful in certain applications, such as designing efficient strategies for virtual marketing and network marketing.

Data description and notations
The Delicious data set was downloaded from http://data.dailabor.de/corpus/delicious/ and consists of 132,500,391 bookmarks, 50,221,626 URLs (books) and 947,835 users, collected between September 2003 and December 31, 2007 [28]. The Amazon user-movie rating data set was obtained from the Stanford Large Network Dataset Collection (http://snap.stanford.edu/data/web-Amazon.html) [29]. It consists of 7,911,684 ratings, 267,320 movies and 759,899 users between August 1997 and October 2012. The Flickr data set was collected by crawling over 2.5 million Flickr users daily from Nov 2, 2006 to Dec 3, 2006, and again daily from February 3, 2007 to May 18, 2007 (http://socialnetworks.mpi-sws.org/datasets.html) [4]. Here, we only consider the users who had at least one favorite photo. With this constraint, the data contain 497,937 users, 11,232,836 photos and 34,734,221 favorite-markings. The Wikipedia (English) data set was downloaded from http://konect.uni-koblenz.de/networks/editenwiki. It consists of 21,416,395 articles written collaboratively by 3,819,691 volunteers around the world before September 2010.
The four data sets consist of individuals and items, such as movies in Amazon, URLs (books) in Delicious, photos in Flickr, and articles in Wikipedia. Moreover, users are able to show interest in these items through the network feature of rating in Amazon, bookmarking in Delicious, favorite-marking in Flickr, and editing in Wikipedia. Therefore, these systems are topologically equivalent. For analysis purposes, the user-item data can be mapped onto a two-layer network, as shown schematically in Fig. 1. This network has two types of nodes: M users and N items in total. In principle, the individuals are embedded in a social network. For example, Flickr and Delicious allow users to make friends. Therefore, there should be two types of links: the cross links between users and items, and the social links among users.
Mathematically, the topology shown in Fig. 1 can be characterized by two matrices. $S$, an $M \times M$ adjacency matrix, represents the social links among users, with element $S_{ij} = 1$ if user $i$ declares user $j$ as a friend and $S_{ij} = 0$ otherwise. Similarly, $C$, an $M \times N$ adjacency matrix, characterizes the cross links, with element $C_{il} = 1$ if user $i$ is interested in item $l$ and $C_{il} = 0$ otherwise. To be specific, we define the following degrees to characterize the multi-relational connections. Two degrees are related to the cross links: (1) the activity degree $k_a(i) = \sum_l C_{il}$, i.e., the number of items that user $i$ is interested in; and (2) the popularity degree $k_p(l) = \sum_i C_{il}$, i.e., the number of users who are interested in item $l$, which reasonably represents how popular the item is in the network. A third degree is related to the social links: (3) the social degree $k_s(i) = \sum_j S_{ij}$, i.e., the number of friends of user $i$. Note that $k_a$ and $k_p$ are two different perspectives on the cross links connecting users and items.
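As a minimal illustration of these definitions, the three degrees are simply row and column sums of the two matrices. The tiny matrices below are made-up toy values, not taken from any of the four data sets.

```python
# Toy network with 3 users and 4 items (illustrative values only).
# S[i][j] = 1 if user i declares user j as a friend.
S = [[0, 1, 1],
     [1, 0, 0],
     [0, 0, 0]]
# C[i][l] = 1 if user i is interested in item l.
C = [[1, 1, 0, 0],
     [0, 1, 0, 0],
     [0, 1, 1, 1]]

k_a = [sum(row) for row in C]        # activity degree: row sums of C
k_p = [sum(col) for col in zip(*C)]  # popularity degree: column sums of C
k_s = [sum(row) for row in S]        # social degree: row sums of S

print(k_a, k_p, k_s)
```

Because every cross link has exactly one user end and one item end, the activity degrees and the popularity degrees always add up to the same total, which is one way to see that they are two views of the same set of links.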

Measuring preferential attachment
Here, we explain the method for measuring preferential attachment on temporal data [30,31]. The basic idea is to investigate whether new links are more likely to attach to nodes with larger degree (size). We calculate the empirical value of the relative probability $\Pi(k_T)$ that a new cross link formed within a short period $\Delta t$ connects to a user (item) that has degree $k_T$ at time $t_0$, as follows:

$$\Pi(k_T) = \frac{A(k_T)/C(k_T)}{\sum_{k'_T} A(k'_T)/C(k'_T)}. \qquad (1)$$

Here, $k_T$ is the degree at time $t_0$. $A(k_T) = \sum_{i,l} C_{il}$, with the sum restricted to the cross links formed within the next short interval $\Delta t$ (e.g., one day in this article) whose user (item) has degree exactly $k_T$ at $t_0$, counts the new cross links created (acquired) by nodes of degree $k_T$. $C(k_T)$ is the number of users (items) with degree $k_T$ at $t_0$. The preferential attachment hypothesis states that the rate $\Pi(k_T)$ with which a node with $k_T$ links acquires new links is a monotonically increasing function of $k_T$ [32], namely $\Pi(k_T) \sim k_T^{\alpha}$. To obtain a smooth curve from noisy data, we use the cumulative function instead of $\Pi(k_T)$:

$$\kappa(k_T) = \int_0^{k_T} \Pi(k'_T)\, dk'_T. \qquad (2)$$

In our measurement, $k_T$ can be either the degree of activity $k_a$ or the degree of popularity $k_p$. This method has been successfully used to verify the preferential attachment mechanism of the BA model [32] in empirical evolving networks [30,31,33,34] and in theoretical models [35].
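The measurement can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the relative probability is taken to be the per-node attachment rate $A(k)/C(k)$ normalized over all degrees, the cumulative function is approximated by a running sum over the observed degrees, and the function name and input layout (`degree_at_t0`, `new_links`) are hypothetical.

```python
from collections import Counter

def relative_probability(degree_at_t0, new_links):
    """Estimate Pi(k) and the cumulative kappa(k) from one snapshot.

    degree_at_t0: dict mapping each existing node to its degree at time t0.
    new_links: list of existing nodes, one entry per new cross link that the
               node created (or acquired) within (t0, t0 + dt].
    Assumes at least one new link was observed.
    """
    C = Counter(degree_at_t0.values())               # C(k): nodes with degree k at t0
    A = Counter(degree_at_t0[v] for v in new_links)  # A(k): new links at degree-k nodes
    rate = {k: A[k] / C[k] for k in sorted(C)}       # new links per node of degree k
    total = sum(rate.values())
    Pi = {k: r / total for k, r in rate.items()}     # normalized relative probability
    kappa, running = {}, 0.0
    for k in sorted(Pi):                             # discrete stand-in for the integral
        running += Pi[k]
        kappa[k] = running
    return Pi, kappa

# Toy snapshot: four nodes and seven new links (made-up data).
degrees = {"a": 1, "b": 1, "c": 2, "d": 4}
links = ["a", "c", "c", "d", "d", "d", "d"]
Pi, kappa = relative_probability(degrees, links)
print(Pi)
print(kappa)
```

Dividing by $C(k)$ is what separates preference from abundance: low-degree nodes are numerous and collect many links in total, but the per-node rate still reveals whether high-degree nodes are favored.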

Measuring relative contributions ratio
To measure the relative contribution ratio within a small interval $\Delta t$ (e.g., one day in this article), we extend the method proposed in reference [33] as follows. The absolute contribution of the users with degree $k_a$ is measured simply as the fraction of new cross links created by these users within a short period $[t_0, t_0 + \Delta t]$, out of the total number of cross links at time $t_0$ (Eq. (3)). Here, $H_{k_a k_p} = \sum_{k_i = k_a,\, k_l = k_p} C_{il} = A_{k_a}(k_p)$ is the number of new links that are created by the users with degree $k_a$ and attached to the items with degree $k_p$ within the period $\Delta t$ (e.g., one day in this paper), and $H_{k_p} = \sum_{k_l = k_p} k_l = C(k_p)\, k_p$ is the number of cross links attached to the items with degree $k_p$ at time $t_0$. To observe how the activity of users differs with degree $k_a$, we present a more detailed breakdown of this absolute contribution: we calculate the fraction of new cross links that come from the users with degree $k_a$ and attach to the items with degree $k_p$, out of all links attached to these items at time $t_0$, and then normalize it by the absolute contribution of these users. The result is the relative contribution ratio $R_{k_a}(k_p)$ (Eq. (4)). In a sense, it describes how often the users with degree $k_a$ pursue popular items. In principle, Eq. (1), restricted to the users with degree $k_a$, is related to Eq. (4) through Eq. (5).

Figure 1. Schematic plot of the social media networks. For these two types of links, we define three types of degrees. For example, U4 in the network has social degree ($k_s = 5$) and activity degree ($k_a = 2$); I4 has popularity degree ($k_p = 3$). Please note that there are no social links in some cases, such as Wikipedia.

Results

Empirical analysis of temporal data
As shown in Fig. 1, Amazon, Flickr, Delicious and Wikipedia are typical social media networks consisting of users and items such as movies, photos, books and articles (see Materials and Methods for data description and notations). Because of their multiple node types and multiple relations, social media networks are more complicated than the networks with one type of link considered in previous studies [7][8][9], including networks with a single type of node and bipartite networks. Basic statistical properties of each data set are shown in Table 1. The degree of activity and the degree of popularity approximately follow a power-law distribution [4,28,29]. In particular, the social degree follows a power-law distribution in Flickr [4]. In the following, we report the main findings of our empirical analysis of the Amazon, Flickr, Delicious and Wikipedia networks, with particular attention to the evolution of the activity degree and the popularity degree. Like many other complex networks, these four networks grow through two major processes: adding new nodes and generating new links. Here we focus on the formation of new links during the evolution of the networks, because this is the central process through which users exchange information with each other. Specifically, we examine how the existing states of users and items affect the formation of new links, and how the interests of different users differ.
First, we examine preferential creation by the existing users and preferential attachment to the existing items in these four data sets. To this end, we employ a numerical method proposed to test preferential attachment (see Materials and Methods for more details) to investigate how the generation of new cross links depends on the existing degrees in the temporal data sets. Figure 2 shows the cumulative function $\kappa(k)$ with respect to the degree of activity and the degree of popularity. We see that the relative cumulative probability $\kappa(k_a)$ ($\kappa(k_p)$) for users (items) to create (acquire) cross links increases with the existing degree of activity (popularity). In particular, the cumulative functions $\kappa$ approximately follow a straight line on the log-log scale, indicating that the relative cumulative probability of generating new links follows a power law in the existing degree, characterized by a positive exponent $\alpha$, where $\kappa(x) \sim x^{\alpha+1}$ with $x$ denoting the degree. In Table 2, we list the characteristic exponents $\alpha_a$ and $\alpha_p$, determined by least-squares fitting of the $\kappa$ functions at small $k$, because the curves deviate from the straight line at large $k$ due to low statistics. The positive exponents $\alpha_a$ and $\alpha_p$ indicate that the active users (with a higher degree of activity) have a greater chance to create new cross links than the inactive users (with a lower degree of activity), while the popular items (with a higher degree of popularity) have a greater chance to attract new cross links. As these four systems expand rapidly, we then investigate the formation of cross links between new users (items) and existing items (users). In the insets of Fig. 2, $\kappa(k_a)$ characterizes the relative probability that the existing users are interested in new items with respect to the users' degree of activity, whereas $\kappa(k_p)$ characterizes the relative probability that the existing items attract the attention of new users with respect to the items' degree of popularity. Interestingly, as seen in the insets of Fig. 2, these cumulative functions also follow a power law. The positive exponents $\alpha_a$ and $\alpha_p$ indicate that newly created items are more likely to attract the attention of active users, while new users are more likely to be interested in popular items. The above results suggest that users tend to trace popular items overall, and that active users are more likely to create new cross links than inactive users.

Table 2. Exponents $\alpha_a$ and $\alpha_p$ as in $\kappa(x) \sim x^{\alpha+1}$, which characterize the influence of the current degree of activity and degree of popularity on the formation of cross links between existing users and existing items, as well as on the formation of cross links between existing users (items) and new items (users) [in brackets]. The results are averaged over 10 randomly selected snapshots, where the exponents are determined by least-squares fitting, and $R^2 > 0.99$ generally. doi:10.1371/journal.pone.0089192.t002
What is the influence of the activity (popularity) degree on the intensity with which users trace popular items (items attract the attention of users)? To address this question, we classify the users and items into different groups according to their activity degree and popularity degree. Then, we investigate the cumulative functions of relative probability $\kappa(k_p)$ and $\kappa(k_a)$ for the different groups of users and items, respectively. As seen in Fig. 3, the slopes for inactive users (with smaller $k_a$) tracing items are qualitatively larger than those for active users. For instance, in Wikipedia, the slope is 2.2 for $k_a \le 10$, while it is 1.75 for $k_a > 10000$. This indicates that inactive users trace popular items more strongly than active users do. Moreover, as seen in the insets of Fig. 3, the slopes for unpopular items (with smaller $k_p$) attracting users are slightly larger than those for popular items, indicating that unpopular items attract greater interest among active users. Please note that the differences between groups of users or items are smaller for Amazon than for the other three networks. This may be due to the different spreading modes of items such as movies. For instance, a popular movie is akin to well-known global information in the Amazon user-movie network, but there is no such counterpart in the Flickr user-photo, Delicious user-book and Wikipedia author-article networks.
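The exponents and slopes quoted above come from straight-line fits on the log-log scale. A minimal sketch of such a fit is given below; it assumes an ordinary least-squares regression of $\log \kappa$ on $\log k$ and uses synthetic data, and the function name `fit_exponent` is hypothetical.

```python
import math

def fit_exponent(ks, kappas):
    """Least-squares slope of log(kappa) against log(k).

    For kappa(k) ~ k^(alpha + 1), the slope is alpha + 1, so alpha = slope - 1.
    Returns (slope, alpha).
    """
    xs = [math.log(k) for k in ks]
    ys = [math.log(v) for v in kappas]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, slope - 1.0

# Synthetic check: kappa(k) = k^2 corresponds to linear preference (alpha = 1).
ks = [1, 2, 4, 8, 16]
kappas = [k ** 2 for k in ks]
slope, alpha = fit_exponent(ks, kappas)
print(slope, alpha)
```

In practice the fit would be restricted to small $k$, as the text notes, because the sparse large-$k$ tail bends the curve away from the straight line.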
To provide additional evidence for the different intensities with which users trace popular items, we also calculate the relative contribution ratio $R_{k_a}(k_p)$ of users with activity degree $k_a$ who create new cross links to items with popularity degree $k_p$ within one day (see Materials and Methods for details). Ideally, if all users traced popular items with the same intensity, the relative contribution ratio $R(k_p)$ would equal 1 for every group of users. As seen in Fig. 4, the relative contribution of active users (with larger $k_a$) is larger than 1 for unpopular items (with smaller $k_p$) but smaller than 1 for popular items (with larger $k_p$), indicating that active users contribute more than average to unpopular items but less than average to popular items. Meanwhile, inactive users exhibit the opposite behavior, with the exception of Amazon. In Flickr and Wikipedia especially, $R(k_p)$ clearly increases with the popularity degree for the most inactive users, while it decreases for the most active users. Based on Eq. (5), the cumulative function $\kappa(k_p)$ for inactive users therefore increases faster than that for active users, so that the slope for inactive users is larger than that for active users (as shown in Fig. 3). Furthermore, moderately active users in Wikipedia, e.g., those with $1000 < k_a \le 10000$, make almost equal contributions to articles with different degrees of popularity (as shown in Fig. 4 (d)), indicating that they may not care about an article's popularity when they edit it. These results also indicate that inactive users are more likely to trace popular items than active users, in agreement with the previous observations.
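The link between the relative contribution ratio and the slopes can be made explicit. The display equations are not reproduced in this text, so the following is only a sketch rebuilt from the verbal definitions in Materials and Methods. It assumes that the relative probability is the per-node rate $A/C$ normalized over degrees, and the symbols $P_{k_a}$ (absolute contribution) and $\Pi_{k_a}(k_p)$ (relative probability restricted to users of degree $k_a$) are labels introduced here for illustration, not necessarily the authors' notation.

```latex
% Absolute contribution of users with activity degree k_a
% (new links they create, out of all cross links at t_0):
P_{k_a} = \frac{\sum_{k_p} H_{k_a k_p}}{\sum_{k_p} H_{k_p}}

% Relative contribution ratio (share of the links at degree-k_p items,
% normalized by the absolute contribution):
R_{k_a}(k_p) = \frac{H_{k_a k_p} / H_{k_p}}{P_{k_a}}

% Substituting H_{k_a k_p} = A_{k_a}(k_p) and H_{k_p} = C(k_p)\, k_p:
\frac{A_{k_a}(k_p)}{C(k_p)} = P_{k_a}\, k_p\, R_{k_a}(k_p)

% Normalizing over k_p, the constant P_{k_a} cancels:
\Pi_{k_a}(k_p) = \frac{k_p\, R_{k_a}(k_p)}{\sum_{k'_p} k'_p\, R_{k_a}(k'_p)}
```

Under this reading, a group with $R_{k_a}(k_p) = 1$ for all $k_p$ attaches linearly in $k_p$. A ratio that increases with $k_p$, as observed for the most inactive users, makes $\Pi_{k_a}(k_p)$ grow faster than linearly and steepens the cumulative function, while a decreasing ratio, as observed for the most active users, flattens it.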

Modeling
To further understand the mechanisms governing the evolution of real networks, we attempt to set up a theoretical model for user-item networks. Our primary goal is to qualitatively reproduce the human activity and the item popularity observed in the four empirical networks discussed above. In the above numerical analysis, the rich-get-richer phenomenon has been observed in the growth of the user's degree of activity and the item's degree of popularity, i.e., the active users (the popular items) have a higher probability of creating (acquiring) new cross links. The mechanism of preferential attachment has successfully explained the rich-get-richer phenomenon in previous works [7][8][9], but it implicitly requires global information, e.g., the degrees of all nodes. However, it is impossible for individuals to collect global information in real social systems. Therefore, preferential attachment only gives a macroscopic explanation of how a user's degree of activity and an item's degree of popularity evolve. Moreover, the formation of a cross link simultaneously affects the activity degree and the popularity degree. This poses an interesting question: What is the microscopic mechanism that governs the growth of a user's activity degree and an item's popularity degree and gives rise to the various distributions observed?
There are two crucial questions to be considered. First, how are the users activated to create new cross links? It is very difficult to formulate the users' behaviors because human dynamics are very complicated due to the inherent diversity of real world circumstances. In the empirical analysis, the users with a larger degree of activity are more likely to create cross links. Moreover, it has been found that the degree of activity is positively correlated with the social degree [3]. In addition, the users' activities are influenced by their neighbors' activities, as information can spread along social links through user-to-user exchanges [23][24][25][26][27]. Therefore, we believe that the users with more friends are activated more frequently because they receive more information from their neighbors, and that the users interested in more items are activated more easily because they are more sensitive to stimuli. Hence, we employ a random walk, starting from one user and proceeding via either social or cross links, to select the users who will actively create cross links. The users with more friends and more items have a greater chance of being reached by the random walk. It has been found that the random walk might be one possible micro mechanism governing the evolution of social networks [34,35], and that it is equivalent to preferential attachment from a macro perspective [35,36].
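The claim that well-connected users are reached more often can be checked on a toy graph. The sketch below relies on a standard property of the simple random walk on a connected undirected graph: the long-run probability of being at a node is proportional to its degree. Treating social links as undirected and mixing users and items in one graph are simplifying assumptions made here, and the node names are made up.

```python
from fractions import Fraction

# Toy undirected graph mixing users (u*) and items (i*), as adjacency lists.
graph = {
    "u1": ["u2", "i1", "i2"],
    "u2": ["u1", "i1"],
    "i1": ["u1", "u2"],
    "i2": ["u1"],
}

total_degree = sum(len(nbrs) for nbrs in graph.values())
# Candidate long-run distribution: probability proportional to degree.
pi = {v: Fraction(len(nbrs), total_degree) for v, nbrs in graph.items()}

def step(dist):
    """Apply one random-walk step: each node spreads its probability
    equally over its neighbors."""
    out = {v: Fraction(0) for v in graph}
    for v, p in dist.items():
        for w in graph[v]:
            out[w] += p / len(graph[v])
    return out

# The degree-proportional distribution is left unchanged by a step.
print(step(pi) == pi)
```

Because the distribution is unchanged by a step, a walker that has wandered long enough sits on a node with probability proportional to its degree, which is how a purely local rule can mimic preferential attachment without any node knowing the global degree sequence.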
The second question to consider is: How do the activated users access the items? In Flickr, it has been found that over 80% of new social links are formed between friends' friends and over 50% of new cross links are formed between a user and his friends' favorite photos [4]. Moreover, the probability of a user favorite-marking a photo increases with the number of his friends who have favorite-marked the photo [4], indicating that the user is influenced by his friends and reaches the photo via his friends. We therefore assume that the users access the items via their friends through a two-step random walk process. It is also worth noting that popular items are exposed to more users, so they have a higher probability of being reached by the random walk process than unpopular items.
For simplicity, we make the following assumptions in our model: (1) the users can befriend other users (see the example in Fig. 1), (2) the activated users are selected by a random walk via either social or cross links, and (3) new links (except the first links attached to newly added users and items) are formed between one user and one of his second neighbors (either users or items). In this way, the link growth process can be understood as two-step random walks via the social links or via the cross links. One way to model this network is to select the activated users who will then actively create new links. Another is to select the target nodes (either users or items) that will passively acquire new links. In our numerical simulation, we employ a two-step random walk for simplicity.
Numerically, the topology evolves according to the following rules:
(1) The initial network consists of a few users ($M_0$) and items ($N_0$). The users form a small random social network, while the items are randomly rated by the users.
(2) At each time step, one new user is added to the system and is connected to one randomly chosen user and to one of the items rated by that user. Meanwhile, $q$ new items are added to the system, each of which is rated by one activated user selected by a two-step random walk.
(3) At each time step, $m$ users are activated by a two-step random walk, and each of them connects to one of his second neighbors, reached by a two-step random walk via common friends or via common items, if they are not already connected.
(4) At each time step, $n$ users are activated by a two-step random walk, and each of them connects to one of the items rated by his friends, reached by a two-step random walk, if he has not already rated it.
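The four rules can be turned into a compact simulation. The sketch below is one possible reading, not the authors' code. Assumptions made here: social links are stored as undirected, the initial social network is a ring plus one random link per user, a walk starts from a uniformly random user, and the default parameter values are illustrative and much smaller than the runs reported in the text. All function and variable names are hypothetical.

```python
import random

def simulate(M=500, M0=10, N0=10, q=1, m=1, n=3, seed=0):
    """Grow a user-item network by two-step random walks (requires M0 >= 2)."""
    rng = random.Random(seed)
    friends = [[] for _ in range(M0)]  # friends[u]: users linked to user u
    rated = [[] for _ in range(M0)]    # rated[u]: items rated by user u
    raters = [[] for _ in range(N0)]   # raters[i]: users who rated item i

    def add_social(u, v):
        if u != v and v not in friends[u]:  # skip self-links and duplicates
            friends[u].append(v)
            friends[v].append(u)

    def add_cross(u, i):
        if i not in rated[u]:               # skip items already rated
            rated[u].append(i)
            raters[i].append(u)

    # Rule (1): small random social network, items rated at random.
    for u in range(M0):
        add_social(u, (u + 1) % M0)         # ring keeps every user connected
        add_social(u, rng.randrange(M0))
        add_cross(u, rng.randrange(N0))
    for i in range(N0):
        if not raters[i]:                   # make sure every item has a rater
            add_cross(rng.randrange(M0), i)

    def two_step_user(start):
        """Walk start -> (friend or item) -> user and return the end user."""
        mids = [("u", f) for f in friends[start]] + [("i", i) for i in rated[start]]
        kind, mid = rng.choice(mids)
        return rng.choice(friends[mid] if kind == "u" else raters[mid])

    def activate():
        return two_step_user(rng.randrange(len(friends)))

    while len(friends) < M:
        # Rule (2): a new user joins via a random user and one of that user's items.
        host = rng.randrange(len(friends))
        friends.append([])
        rated.append([])
        new_user = len(friends) - 1
        add_social(new_user, host)
        add_cross(new_user, rng.choice(rated[host]))
        for _ in range(q):                  # q new items, each rated by an activated user
            raters.append([])
            add_cross(activate(), len(raters) - 1)
        # Rule (3): m activated users befriend a second neighbor.
        for _ in range(m):
            u = activate()
            add_social(u, two_step_user(u))
        # Rule (4): n activated users rate an item rated by one of their friends.
        for _ in range(n):
            u = activate()
            add_cross(u, rng.choice(rated[rng.choice(friends[u])]))
    return friends, rated, raters

friends, rated, raters = simulate()
print(len(friends), len(raters), max(len(r) for r in rated), max(len(r) for r in raters))
```

The lengths of `rated[u]`, `raters[i]` and `friends[u]` are the activity, popularity and social degrees, so their histograms can be compared with the empirical distributions. Note that no step ever inspects the global degree sequence; every new link is found by local moves only.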
We carried out numerical simulations to validate the model. We set the parameters as follows: $M_0 = 100$ and $N_0 = 100$. We then ran the simulation up to $M = 100,000$. Based on the simple assumptions of a random walk, the model can reproduce the distributions of activity degree $p(k_a)$ and popularity degree $p(k_p)$ observed in the above four empirical networks. In Fig. 5, we compare the degree distributions of the model with those of the empirical networks. The distributions of the model networks are qualitatively consistent with their counterparts in the empirical networks. Although the distributions of popularity degree in the model do not quantitatively match those of the empirical networks, as shown in Figs. 5 (c) and (d), the slopes are consistent with each other. In the insets of Figs. 5 (c) and (d), we compare the distributions at larger degrees of popularity in the model with those in the empirical networks, and they are approximately consistent with each other. These results indicate that it is reasonable to assume that the users are activated by a two-step random walk and subsequently find items of interest to them in the same manner. Figure 6 displays the influence of the parameters on the distributions. From Fig. 6 (a), we can see that the ratio between the number of new cross links $n$ and the number of new items $q$ has an obvious influence on the distribution of popularity degree, while the distributions of activity degree are greatly affected by the parameter $n$, as seen in Fig. 6 (b). Furthermore, the distributions of activity degree depend to some extent on the number of new social links $m$, while the distributions of social degree $p(k_s)$ are almost independent of the number of new cross links $n$ and the number of new items $q$. In particular, the distributions of social degree follow a power law, which is qualitatively consistent with that of the Flickr network.
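The distributions compared above are plain normalized histograms of the degrees. A minimal sketch is given below, with a hypothetical helper name and made-up input; the same function applies to activity, popularity or social degrees from either a model run or a data set.

```python
from collections import Counter

def degree_distribution(degrees):
    """Return {k: fraction of nodes with degree k}, ordered by k."""
    counts = Counter(degrees)
    total = len(degrees)
    return {k: counts[k] / total for k in sorted(counts)}

# Toy list of degrees (illustrative values only).
p = degree_distribution([1, 1, 1, 2, 2, 4])
print(p)
```

For heavy-tailed data, plotting the result on log-log axes, possibly after binning the sparse tail, is what makes a visual comparison of slopes meaningful.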

Discussion
In this study, we first carried out an empirical analysis of four empirical networks: Amazon, Flickr, Delicious and Wikipedia. Our study revealed the growth patterns of the users' degrees of activity and the items' degrees of popularity within these networks, both of which follow the rich-get-richer law. We found that users tend to trace popular items, but with intensities that differ according to their activity degree. For example, active users make a greater than average contribution to unpopular items, whereas inactive users make a greater than average contribution to popular items. Motivated by the empirical findings in these four networks, we proposed an evolving model based on a two-step random walk, which is able to qualitatively reproduce the activity and popularity distributions observed in the empirical networks. Based on both the empirical analysis and the theoretical model, we believe that the spreading of information among individuals, which is simplified as a two-step random walk in the model, could represent one possible micro mechanism governing the dynamic evolution of human activity and item popularity. Of course, the dynamics of human activity and item popularity are very complicated due to the inherent diversity of human behaviors and the varying nature of items. Hence, there may be other microscopic mechanisms governing the dynamics of human activity and item popularity.
It should be noted that the results of our model are only qualitatively consistent with the empirical results. The quantitative mismatch is due to the simplifications in our model. For example, the users are only activated by a two-step random walk, and then reach the items by another two-step random walk via friends. In reality, the situation could be much more complicated. In the empirical networks, for example, besides the stimulus of neighbors, occasional events can inspire users to participate in network-related activities. For instance, users can access photos through various other channels in Flickr, such as the list of interesting photos provided by the web site, the search engine, the links between similar photos and so on. Furthermore, an item's particular attributes, such as being an award-winning picture, may disproportionately affect how quickly the item's popularity changes. These factors have a remarkable influence on the growth of human activity and item popularity. We believe that if we consider more realistic factors in the model, we can improve its performance and obtain more helpful insights into the dynamics of human activity and item popularity. These problems deserve further investigation in the future.