Collaboration and followership: A stochastic model for activities in social networks

In this work we investigate how future actions are influenced by the previous ones, in the specific contexts of scientific collaborations and friendships on social networks. We describe the activity of the agents, providing a model for the formation of the bipartite network of actions and their features. Therefore we only require to know the chronological order in which the actions are performed, and not the order in which the agents are observed. Moreover, the total number of possible features is not specified a priori but is allowed to increase along time, and new actions can independently show some new-entry features or exhibit some of the old ones. The choice of the old features is driven by a degree-fitness method: indeed, the probability that a new action shows one of the old features does not solely depend on the popularity of that feature (i.e. the number of previous actions showing it), but it is also affected by some individual traits of the agents or the features themselves, synthesized in certain quantities, called fitnesses or weights, that can have different forms and different meaning according to the specific setting considered. We show some theoretical properties of the model and provide statistical tools for the parameters’ estimation. The model has been tested on three different datasets and the numerical results are provided and discussed.


Introduction
In the last years complex networks established as a proper tool for the description of the interactions within large systems [1][2][3][4]. The renewed attention to this field can be dated back to the well known Barabási-Albert model [5], in which the authors provide an explanation of the power-law distribution of node degrees in the World Wide Web (WWW) via a dynamic generative network model. At every step a new vertex is added and the probability to observe a new link is proportional to the number of connections (i.e. the degree) of the target node. The success of this proposal resides in the fact that only this simple rule, called preferential attachment, is able to reproduce with good accuracy the degree distribution of many real networks, such as the WWW. Even if the original mechanism was already present in the literature in a slightly different form [6,7], the paper of Barábasi-Albert boosted the attractiveness of complex networks and other scholars delved into the investigation of the properties of generative PLOS ONE | https://doi.org/10.1371/journal.pone.0223768 October 30, 2019 1 / 24 a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 models based on different types of dependence of the connection probabilities and the degrees [8,9]. The preferential attachment was enriched with another ingredient, such as the fitness [10,11]: a quantity defined per node that measures the intrinsic ability of the vertex to collect links. Then, the probability of targeting a certain node becomes the product of its fitness and degree. The effect of this new variable is to amplify or dampen the preferential attachment effect. Indeed, the presence of the fitness permits to overcome the "first move advantage" (i.e. the fact that older nodes have greater degrees by construction), thus permitting to young nodes to grow easily. Furthermore time dependence has been included by considering the possibility of node aging, i.e. multiplying the probability of link by a time dependent damping function [2,[12][13][14]. The importance of the previous proposals was not in the definition of the model per se, but in providing an explanation for the structure of the networks examined. For instance, the preferential attachment in [5] explains the power-law degree distribution in the World Wide Web and describes a "rich get richer" competition for links. Instead, in the fitness methods, some attributes of the nodes, not directly observed in the network, define the structure of the network (as in the case of e-mails networks, in which senders do not have access to information about the number of connection of the receivers [15]). In the same way, fitness aging [13] gives an explanation to the limited (in time) growth in citation of most of the papers.
All previous efforts were devoted to monopartite, directed or undirected, networks. A much smaller number of contributions is available for the description of the evolution of bipartite networks. In bipartite networks, nodes are divided into two different classes and only links connecting nodes belonging to different classes are allowed [1,4]. Guillame and Latapy [16] proposed a simple model that produce a power-law degree distribution for both classes (for instance, this is the case of reviews and reviewers in the Netflix dataset). Some other dynamical models for bipartite networks were proposed for the description of specific systems. For instance, in [17] the authors propose a generative model to study the bipartite networks of lawyers and clients that develops according to a recommendation process: more popular lawyers are also more likely to be hired by new clients. Furthermore, the authors in [18] provide a framework in which the simultaneous evolution of two systems has been studied. Indeed, they analyse communities of scientists considering both the monopartite network describing the interactions among agents themselves and the bipartite semantic network in which the agents are associated to the concepts they use. Another example is [19], in which the structure of the (growing) bipartite trade network (one class includes the countries and the other one includes the exported products) was reproduced by assigning links with sequential preferential attachment, considering the degree of both nodes in the process. In order to describe the generation of an innovative product, following the idea of the "adjacent possibles" [20], new nodes (i.e. new products) are derived by the structure of an unobserved monopartite network of products describing the hierarchical productive process relations. Therefore, the evolution of the bipartite system is due to the simultaneous dynamics of an unobserved evolving network.
The present work aims at providing a generative model for the bipartite networks, where one class is formed by agents and the other one includes their actions. The starting point is the model for monopartite networks studied in [21] and its variant introduced in [22]. In [21], a set of nodes sequentially join the network, each of them showing a set of features. Each node can either exhibit new features or adopt some of the features already present in the network. This choice is regulated by a preferential attachment rule: the larger the number of nodes showing a certain feature, the greater the probability that future nodes will adopt it too. The total number of possible features is not specified a priori, but is allowed to increase along time. Differently from [16,23], each node has been weighted with a fitness variable, that accounts for nodes' personal ability to transmit its own features to future nodes. The model in [22] introduces some novelties in the previous context: the probability to exhibit one of the features already present in the network is defined as a mixture, i.e. a convex combination, of random choice and preferential attachment. However, neither fitnesses nor weights are introduced in the model, so that all nodes are assumed to have equal capabilities in transmitting their personal features to the newcomers. The present work moves along the same research line of the previously mentioned papers [21,22], but with a different spirit. Indeed, the previous papers provide two different models of network formation, in which the nodes sequentially join the network and the number of common features affects the probability of connections among them. The main drawback of these two models resides in the assumed chronological order of nodes' arrivals, which may typically be unknown (or non-relevant) in many real-world systems. In the present paper, given a system of n agents, we provide a model for the formation of the bipartite network of agents' actions and their features. Therefore, this model can be applied to all settings in which agents of interest are not observed in a specific chronological order, because the assumption on the chronological order is specified on the agents' actions only. Moreover, the probability to exhibit one of the features already observed is defined as a mixture of random choice and preferential attachment with weights, i.e. the probability of connection depends both on the features' degrees and the fitness of the agents involved and/or of the features themselves. These weights W t,j,k can have different forms and meanings according to the specific setting considered: the weight at time-step t of the observed feature k can depend on some characteristics of k itself, or it can be directly established by the agent performing action t; it may also represent the inclination of the agent performing action t in adopting the previous observed features, or some properties of the agent performing the previous action j with k among its features (for instance, her/his ability to transmit her/his own features).
We analyse two datasets of scientific publications (respectively IEEE for Automatic Driving, and arXiv for Theoretical High Energy Physics, or more briefly Hep-Th) and a dataset of posts of Instagram. We not only obtain a good fit of our model to the data, but our analysis also results useful in order to highlight interesting aspects of the activity of the three considered networks. Indeed, we find different variables playing a role in their evolution. In the three systems studied, we consider the degrees of the features (i.e. the popularity of, respectively, keywords in a scientific paper or hashtags on Instagram) and some fitness variables associated to the agents as drivers for the dynamics. For the scientific publications, we show a good agreement of the model to the IEEE dataset for Automatic Driving and to the arXiv dataset for Hep-Th with weights based on the number of publications or the number of co-authors of an author, the former performing better in the case of Automatic Driving. Otherwise stated, in the case of Automatic Driving the ability of an author to transmit the keywords of her/his papers, that essentially describe her/his research topics, is better reproduced by her/his number of publications, while in Hep-Th this ability is related both to the activity of the author, i.e. to the number of her/his publications, and to the number of collaborations established in her/his career. This difference can be due to the nature of the two research fields in the considered temporal window. Automatic Driving is more recent and limited, and new results drive the evolution of the research. Thus, an author transmits more keywords the more its activity in the research. Hep-Th research area, instead, is an older and structured research field, evolved in different specialized branches. In the case of Instagram, we find that the dynamics is well reproduced using the popularity of the users in a tricky sense: a standard user tends to employ many already existing hashtags, in order to acquire more visibility, while popular users mention just few already existing hashtags. Moreover, as for the previous two collaboration networks, also for the on-line social network Instagram, the relevance of an agent (with respect to the probability of transmitting her/his features) is well measured by her/his activity, that is the number of her/his actions.
The present paper is so organized. We first illustrate in detail the proposed model for the formation of the actions-features bipartite network. Then, we explain the meaning of the model parameters and the role of the weights introduced into the preferential attachment term. Some asymptotic results regarding the behavior of the total number of features and the mean number of edges in the actions-features bipartite network are collected in Section A.1 in S1 Text. The same file also contains a description of the statistical tools for the estimation of the model parameters (see Section A.2 in S1 Text). In the subsequent section we provide the general methodology used to analyse the data (the data cleaning procedure is explained in Section A.3 in S1 Text), and then we show the application of the model to the above mentioned real-world cases (IEEE, arXiv, Instagram datasets). We summarize the overall contents of the paper and recap the main findings in the last section.

Model
Suppose to have a system of n agents that sequentially perform actions along time. Each agent can perform more than one action. The running of the time-steps coincide with the flow of the actions and so sometimes we use the expression "time-step t" in order to indicate the time of action t. Each action is characterized by a finite number of features and different actions can share one or more features. It is important to point out that we do not specify a priori the total number of possible features in the system, but we allow this number to increase along time. In what follows, we describe the model for the dynamical evolution of the bipartite network that collects actors' actions on one side and the corresponding features of interest on the other side. We denote by F the adjacency matrix related to this network. The dynamics starts with the observation of action 1, the first action done by an agent of the considered system, that shows N 1 features, where N 1 is assumed Poisson distributed with parameter α > 0. (This distribution will be denoted from now on by the symbol Poi(α)). Moreover, we number the observed features with k from 1 to N 1 and we set F 1,k = 1 for k = 1, . . ., N 1 . Then, for each consecutive action t � 2, we have: 1. Action t exhibits some old features, where "old" means already shown by some of the previous actions 1, . . ., t − 1. More precisely, if N j denotes the number of new features exhibited by action j and we set the new action t can independently display each old feature k 2 {1, . . ., L t − 1 } with probability where δ 2 [0, 1] is a parameter, F j,k = 1 if action j shows feature k and F j,k = 0 otherwise, W t,j, k � 0 is a random weight associated to feature k, measured at the time of action t and related to the previous action j. Finally B t is a suitable normalizing factor so that P tÀ 1 j¼1 F j;k W t;j;k =B t belongs to [0, 1]. We will refer to quantity (2) as the "inclusion probability" of feature k at time-step t.

Action t can also exhibit a number of new features
where β 2 [0, 1] is a parameter. The variable N t is supposed independent of N 1 , . . ., N t−1 and of all the appeared old features and their weights (including those of action t).
With the observation of the t th action, all the matrix elements F t,k with k 2 {1, . . ., L t } are set equal to 1 if action t shows feature k and equal to 0 otherwise. Here is an example of a F matrix with t = 3 actions: In boldface we highlight the new features for each action: we have N 1 = 4, N 2 = 2, N 3 = 3 and so L 1 = 4, L 2 = 6, L 3 = 9 and, for each action t, we have F t,k = 1 for each k 2 {L t−1 + 1, . . ., L t }. Moreover, some elements F t,k , with k 2 {1, . . ., L t−1 }, are equal to 1 and they represent the features brought by previous actions exhibited also by action t.
It may be worth to note that our model resembles the one known as the "Indian buffet process" in Bayesian Statistics [24][25][26], but indeed there are significant differences in the definition of the inclusion probabilities: in particular, the parameter δ and the weights W t,j,k . Moreover, Bayesian Statistics deals with exchangeable sequences, while here we do not require this property. As a consequence, the role played by each parameter in (2) and (3) results more straightforward and easy to be implemented.

Discussion of the model
We now discuss the meaning of the model parameters α, β and δ and the role of the weights W t,j,k . Some asymptotic results for the model and the statistical tools employed to estimate the model parameters are collected in Sections A.1 and A.2 in S1 Text, respectively.

The parameters α and β
In the above model dynamics, the probability distribution of the random number N t of new features brought by action t is regulated by the pair of parameters (α, β) (see (3)). Specifically, the larger α, the higher the total number of new features brought by an action, while β controls the asymptotic behavior of the random variable L t ¼ P t j¼1 N j , i.e. the total number of features observed for the first t actions, as a function of t. In particular, it has been shown in [22] that the parameter β > 0 corresponds to the power-law exponent of L t : precisely, if β = 0 then the asymptotic behavior of L t is logarithmic, while for β 2 (0, 1] we obtain a power-law behavior with exponent β (see Section A.1 in S1 Text).

The parameter δ and the random weights W t,j,k
Looking at Eq (2) of the above model dynamics, we can see that, for a generic action t, both the parameter δ and the random weights W t,j,k affect the number of old features (k = 1, . . ., L t−1 ) also shown by action t. Specifically, the value δ = 1 corresponds to the pure i.i.d. case with inclusion probability equal to 1/2: an action can exhibit each feature with probability 1/2 independently of the other actions and features. The value δ = 0 corresponds to the case in which the inclusion probability P t (k) entirely depends on the (normalized) total weight associated to feature k at the time of action t, i.e. to the quantity In Eq (4), the term W t,j,k � 0 is the random weight at time-step t associated to feature k that can be related to the course of previous actions j. We denote this case as the "pure weighted preferential attachment case" since the larger the total weight of feature k, the greater the probability that also the new action will show feature k. When δ 2 (0, 1), we have a mixture of the two cases above: the smaller δ, the more significant is the role played by the weighted preferential attachment in the spreading of the observed features to the new actions. In the sequel we will refer to (4) as the "weighted preferential attachment term".
Regarding the weights, the possible ways in which they can be defined benefit of a great flexibility. Of course their meaning has to be discussed in relation to the particular application considered. For instance, the weight W t,j,k can be directly assigned by the agent performing action t to the feature k in connection with the previous action j, or it may represent the inclination of the agent performing action t of adopting the previous observed features, or it may implicitly due to some properties of the agent performing the previous action j (for instance, her/his ability to transmit her/his own features), or even more. We here describe some general interesting frameworks: 1. If we set W t,j,k = 1 for all t, j, k with normalizing factor B t = t, then all the observed features have the same weight. Then the sum in the numerator of (4) becomes the popularity of feature k, that is the total number of previous actions that have already exhibited feature k, while the quantity (4) is essentially the average popularity of feature k (we divide by t instead of t − 1 in order to avoid the quantity (4) to be exactly equal to 1 for all the first N 1 features). In this case the actions-features dynamics coincides with the nodes-features dynamics considered in [22].
2. We can assume that a positive random variable G i (with i = 1, . . ., n) is associated to each agent in order to describe her/his ability to transmit the features of her/his actions to the others. This random variable can be seen as a static fitness as defined in [10,11,15]. In this case the weight W t,j,k can be defined as G i(j) (or a function of this quantity), where i(j) denotes the agent performing action j. In particular, we have W t,j,k = W j , that is the weights only depend on j. Hence, the weight of a feature k is only due to the fitness of the agent that performs an action with k among its features and the sum in the numerator of (4) becomes the total weight of the feature k due to the agents that have previously exhibited it in their actions. The quantity B t ¼ c þ P tÀ 1 h¼1 W h can be chosen as normalizing factor, i.e. we basically normalize by the total fitness of the agents that have performed actions 1, . . ., t − 1. Note that case 1) can be seen as a special case of the present, taking G i = 1 and c = 1. Moreover, another interesting element to observe is that the weighted preferential attachment term (4) can be explained with an urn process. Indeed, for each feature k, let t(k) be the first action that has k as one of its features and image to have an urn with balls of two colors, say red and black, and associate an extraction from the urn to each action t � t(k) + 1. The initial total number of balls in the urn is c þ P tðkÞ h¼1 W h , of which W t(k) red. At each time-step t � t(k) + 1, if the extracted ball is red then action t exhibits feature k and the composition of the urn is updated with W t red balls; otherwise, action t does not exhibit feature k and the composition of the urn is updated with W t black balls. Therefore quantity (4) gives the probability of extracting a red ball at time-step t. This is essentially the nodes-features dynamics considered in [22] with δ = 0 only. If we have G i � 1, an alternative normalizing factor is B t = t. In this case the quantity (4) is the empirical mean of the random variables F j,k W j , with j = 1, . . ., t − 1 (again we divide by t instead of t − 1 for the same reason explained above).
3. We can extend case 2) to the case in which the fitness variables change along time and so we have W t,j,k = W t,j defined in terms of G t,i(j) , where i(j) denotes the agent that performs action j and G t,i(j) is her/his fitness at the time-step of action t, thus following prescription similar to those of [13,14]. We can also extend to the case in which the actions can be performed in collaboration by more than one agent. In this case the weight W t,j can be defined as a function of the fitness at time-step t of all the agents performing action j.
4. We can set W t,j,k = W t,k for all t, j, k with B t = t so that the term (4) becomes the average popularity of feature k adjusted by the quantity W t,k . For instance, we can take W t,k as a decreasing function of t � (k) = max{j: 1 � j � t − 1 and F j,k = 1}, which is the last action, before action t, that has k among its features. By doing so, in (4) the average popularity of k is discounted by the length of time between the last appearance of feature k and t. Another possibility is to use a weight W t,k in order to give more relevance to the features already shown by the same agent performing action t in the previous actions. More precisely, we can denote by i(j) the agent that performs action j and, for each action t, we can define W t,k as an increasing function of the sum ∑ j=1,. . .,t−1,i(j)=i(t) F j,k so that the more an agent has exhibited feature k in her/his own previous actions, the greater the probability that also her/ his new action will show feature k. An additional possibility is to eliminate the dependence on t and consider weights W t,j,k = W k , where W k can be seen as a fitness random variable associated to feature k.

5.
We can modify case 2) by giving a different meaning to G i . Indeed, we can associate to each agent i a positive random variable G i in order to describe her/his inclination of adopting the already appeared features. Then we can define the weight W t,j,k as G i(t) (or as a function of it), where i(t) denotes the agent performing action t. In this way, we have W t,j,k = W t for all t, j, k, that is the weights only depend on the inclination of the agent performing the action and, if we set B t = t as in case 4), the term (4) becomes the average popularity of feature k adjusted by the quantity W t .
6. Finally, we can take W t,j,k = W j,k (i.e. depending on j and k, but not on t) in order to represent the weight given by the agent performing action j to feature k exhibited in this action. Therefore the total weight of feature k at time-step t is the total weight given to feature k by the agents who performed the previous actions.
These are just general examples of possible weights. We refer to the following applications to real datasets for special cases of the above examples. It is worth to note that the weights W t,j,k may be not independent. For example, in case 5) we have exactly the same weight for all the actions performed by the same agent.

Results
In this section we present some applications of the model to different real-world bipartite networks. In the first subsection we illustrate the general methodology used to analyse the datasets (we refer to S1 Text for the data cleaning procedure). The other subsections contain instead three examples: we first consider two different collaboration networks, the first one in the area of Automatic Driving and downloaded from the IEEE database, the second one in the research field of High Energy Physics and downloaded from the arXiv repository. In both cases, the agents are the authors, the agents' actions are the published papers and the features are all 1-grams (nouns and adjectives) included in the title or abstract of each paper. Thus, the considered features identify the main research subjects treated in the papers. For these applications we make use of weights of the form W t,j , that are defined in terms of a fitness variable associated to the agents who performed previous action j, but measured at the timestep of the current action t. Finally, we present the last example: we study the quite popular on-line social network of Instagram, in which the users are the agents, the agents' actions are the posted photos and, for each media, the features are the hashtags included in its description. Thus, the considered features identify the topics the considered posts refer to. For this example, we investigate two kinds of weights: weights of the form W t , that solely depend on some quantity related to the agent performing the current action t, in order to adjust the average popularity of each feature in (4), and, as in the previous two applications, weights of the form W t,j , that are defined in terms of a fitness variable associated to the agents who performed previous action j. In all the three applications, the weights are observable random variables. A more detailed interpretation of the considered weights is provided in each subsection.

General methodology
For each considered applications, the analysis develops according to the same outline that we describe in the following subsections. Estimation of the model parameters. We provide the estimated value of the parameters α, β and δ of the model by means of the tools illustrated in Section A.2 in S1 Text. For each parameter p 2 {α, β, δ}, we also give the averaged value � p of the estimates on a set of R realizations and the related mean squared error MSE(p). More precisely, starting from the estimated valuesâ,b andd (and the observed chosen weights), we generate a sample of R simulated actions-features matrices and we estimate again the parameters on each realization, obtaining the valuesâ r ,b r andd r , for r = 1, . . ., R. We then compute, for each parameter p 2 {α, β, δ}, the average estimate � p over all the simulations and the MSE(p), as follows Check of the asymptotic behaviors. We consider the behavior of the total number L t of observed features along the time-steps t and we compare it with the theoretical one of the model (see Section A.1.1 in S1 Text). In particular, for each application, we verify that the power-law exponent matches the estimated parameter β. Moreover, we consider the behavior of the total number e(t) of edges in the real actions-features network and we compare it with the mean number μ e (t) of edges obtained averaging over R simulated actions-features networks, obtained by the model with the selected weights.
Comparison between real and simulated matrices and selection of the weights. We compare the real and simulated actions-features matrices on the basis of two groups of indicators: one regarding the spreading of the old features in the new actions, which depends on the weights, and the other one regarding the arrival process of the features, which does not depend on the weights. The first indicators allow us to select the most appropriate weights among those taken into consideration.
Indicators for the spreading process of the old features: We take into account the indicator For each action t, with 2 � t � T, the quantity O t is the number of old features shown by action t and � O T provides the averaged value overall the set of observed actions. This indicator is computed for the real matrix, for the simulated matrix by the model with different kinds of weights, including also the constant weights equal to 1 in order to evaluate the relevance of the weights inside the dynamics. In particular, for the simulated matrices, the provided values are an average on R realizations, together with their sample standard deviation s O T . Furthermore, in order to take into account also the not-exhibited old features (i.e. the zeros in the matrix F), we check also the number of correspondences, that is we compute the indicator where we use the apex abbreviation re or sim to indicate whether the considered quantity is related to the real matrix or the simulated matrix, respectively. The meaning of the above indicator is the following. Given the simulated matrix, for a certain action t, Indicators for the arrival process of the features: As said before, this process is not affected by the weights. We take into account the indicator where N t = L t − L t−1 (with L 0 = 0) is the number of new features brought by action t and � N T provides the averaged value overall the set of observed actions. This indicator is computed for the real matrix and for the simulated matrix. In particular, for the simulated matrix, the provided value is an average on R realizations, together with its sample standard deviation s N T . Moreover, we consider the indicator where, as above, we use the apex abbreviation re or sim to indicate whether the considered quantity is related to the real matrix or the simulated matrix, respectively. The meaning of the above indicator is the following. Given the simulated matrix, for a certain t, the quantity m L (t) computes the relative error committed in the total number of observed features and m L is the corresponding averaged values overall the observations. A value of m L close to 0 indicates that the relative error in the total number of observed features is very low. Again, the provided value is an average on R simulations, together with its sample standard deviation s m L .
Predictive power of the model. Once the weights are selected, we perform a prediction analysis on the actions-features matrix: we estimate the model parameters only on a subset of the observed actions, we simulate the rest by means of the model and compare the real and simulated matrices. More precisely, fixed a time-step T � < T, we estimate the model parameters on the "training set" corresponding to the set of actions observed at t = 1, . . ., T � . We then employ those estimates to simulate the dynamics of the actions-features network related to the remaining set of actions at times t = T � + 1, . . ., T. Finally, taking the features really observed for these last actions as "test set", we evaluate the goodness of our predictions by computing the following indicators: where, as before, we use the apex abbreviation re or sim to indicate whether the considered quantity is related to the real matrix or the simulated matrix, respectively. The meaning of the above indicators is the same of m O and m L : given a simulated matrix, for a certain action t,

IEEE dataset for Automatic Driving
For the first application we have downloaded (on June 26, 2018) all papers recorded between 2000 and 2003 present in the IEEE database in the scientific research field of Automatic Driving. As in [22], we selected all papers containing at least one of the keywords: Lane Departure Warning, Lane Keeping Assist, Blindspot Detection, Rear Collision Warning, Front Distance Warning, Autonomous Emergency Braking, Pedestrian Detection, Traffic Jam Assist, Adaptive Cruise Control, Automatic Lane Change, Traffic Sign Recognition, Semi-Autonomous Parking, Remote Parking, Driver Distraction Monitor, V2V or V2I or V2X, Co-Operative Driving, Telematics & Vehicles, and Night vision. The download has yielded 492 distinct publications belonging to the required scientific field and period. For each paper we have at our disposal all the bibliographic records, such as title, full abstract, authors' names, keywords, year of publication, date in which the paper was added to the IEEE database, and many others. The papers have been sorted chronologically according to the date in which they were added to the database. We have considered all nouns and adjectives (from now on "key-words") included in the title or abstract as the features of the model and sorted them according to their arrival time. (See Section A.3 in S1 Text for a more detailed description of the data preparation procedure.) The features matrix obtained at the end of the cleaning procedure collects T = 492 papers (actions) recorded in the period 2000 − 2003 and involving n = 1251 distinct authors (agents) and containing L T = 4553 key-words (features). The binary matrix entry F t,k indicates whether feature k is present or not into the title or the abstract of the paper recorded at timestep t. A pictorial representation of the matrix is provided in Fig 1. For this application, we use weights of the type 3): indeed, at each time-step t, we associate to each author i a fitness variable G t,i that quantifies the influence of author i in the considered research field, and we define the weights as IðjÞ ¼ set of the agents performing action j : ð11Þ Therefore the inclusion probability in Eq (2) reads as The term M t,j is the maximum among the fitness variables G t,i at time-step t of all the authors i 2 IðjÞ, i.e. the authors who published the paper appeared at time-step j. A high value of G t,i should identify a person who is relevant in the considered research field so that it is likely that other scholars use the same features of her/his actions, that are the keywords related to her/his research. As a consequence, in the preferential attachment term, we give to each old feature k a weight that is increasing with respect to the fitness variables of the authors who included k in their papers. We analyse two different fitness variables:  (Note that 1 is added in order to avoid division by zero in the previous formula (11)). We perform the analysis following the general methodology explained in the previous section (with R = 500), taking into both definition of fitness. We first estimate the model's parameters, obtaining the results in Table 1. The estimated value for the parameter δ, which is zero, points out that the weighted preferential attachment term (4) plays a leading role in the inclusion probabilities. Fig 2 provides in the left panel a log-log plot of the cumulative count of new features (key-words) as a function of time (see the red dots), that clearly shows a power-law behavior. Moreover, this agrees with the theoretical property of the model stated in Section A.1.1 in S1 Text, according to which the power-law exponent has to be equal to the parameter β (in the figure the black line has slope equal to the estimated value for β, that isb ¼ 0:5962). The goodness of fit of the model to the dataset has been evaluated through the computation of   Table 3. Table 4 also indicates that the model with G pub t;i is the best performing. More precisely, for the model with the fitness G pub t;i , the computed value of � m O ranges from 88% to 97%, pointing out that a high percentage of the entries in the actions-features matrix have been correctly inferred by the model. The same value for the model with the fitness G col t;i ranges from 83% to 96%, and, for the model with all the weights equal to 1, it ranges from 55% to 93%. The differences are more evident when we select the first k � features: indeed, with G pub t;i we succeed to infer the value of at least 88% of the entries; while with G col t;i and with all the weights equal to 1 the percentage remains under 88% and 70%, respectively. This means that the major difference in the performance of the different considered weights is in the first features, that are those for which the preferential attachment term (that depends on the weights) is more relevant. Note also that the actions-features matrix is more dense in the part corresponding to the first k � features. At this point, we select the model with the weights that take into account the authors' number of publications as the best performing one for the considered dataset and in the following we focus on it. In Table 5 we evaluate the predictive power of the model: we estimate the parameters of the model only on a subset of the observed actions, respectively the 75%, 50% and 25% of the total observations; we then predict the features for the future actions {T � + 1, . . ., T} and compare the predicted and observed results by means of the indicators in (10) over the whole set of features and only on a portion of it. Finally, in the right panel of Fig 2, we provide the  It is worth to note that the difference in the performance between the two definitions of fitness variables has a straightforward interpretation: in the considered case, i.e. for the publications in the area of Automatic Driving in the considered period, the relevance of an author (with respect to the probability of transmitting her/his features) is better measured by considering the number of her/his publications rather than the number of her/his co-authors. As we will see later on, we get a different result for the second application.

ArXiv dataset for Theoretical High Energy Physics
The second application has been performed with the arXiv dataset of publications in the scientific area of Theoretical High Energy Physics (Hep-Th), recorded in the period 2000−2003 (the same period used for the first application), freely available from [27]. The dataset collects a sample of text files reporting the full frontispiece of each paper, so we have information on: arXiv id number, date of submission, name and email of the author who made the submission, title, authors' names and the entire text of the abstract. From the original format we isolate the submission date and the identity number of the paper, in order to sort all papers (actions) chronologically. Then, with the final purpose of constructing the features matrix, we consider all key-words included either in the main title or in the abstract as the features of the papers and we sort them according to their time of appearance. (The complete data preparation phase is described in Section A.3 in S1 Text). We constructed the features matrix F, whose elements are equal to F t,k = 1 if paper t includes word k either in the title or in the abstract and F t,k = 0 otherwise. The result is shown in Fig 3, where the observed actions-features matrix collects T = 10603 papers (actions) registered between 2000 and 2003 and L T = 22304 key-words appeared in the title or in the abstract (features), while the total number of involved authors (agents) is n = 5633.
The weights for this application are defined as in the previous one, described in Eq (11). We consider again the two different definitions for the fitness term G t,i (see (13) and (14)). The performed analysis follows the general methodology (with R = 500) previously explained. We first estimate the model's parameters, obtaining the results in Table 6. Again, the estimated value for the parameter δ points out that the weighted preferential attachment term (4) plays a leading role in the inclusion probabilities.   (10) are computed for different levels of information used as "training set": more precisely, the different values of T � correspond to 75%, 50% and 25% of the set of the actions, respectively. Moreover, the indicator � m � O is computed on the whole matrix (k � = 4553) and also taking into account only the first k � = 200 features. In the brackets, there are the sample standard deviations.  has been evaluated through the computation of the quantities (6), (7), (8) and (9). These results are shown in Tables 7, 8 and 9. We can see that the model is able to perfectly reproduce the average number of new features � N T . Instead, the average number of old features (i.e. the quantity � O T ) is under-estimated by the model with the weights based on G pub t;i and G col t;i , while it is widely over-estimated in the case with all the weights equal to 1. The discrepancy in the values is a little smaller for the case with G col t;i (that is the case with the fitness based on the number of collaborators). Table 8 shows that the performance of the model in reproducing the data are comparable with both the considered definitions of fitness and they are slightly better than in the case with all weights equal to one. At this point, since the best performance in reproducing � O T , we select the weights that take into account the authors' number of collaborations and the last analysis focuses on it. In Table 10 we evaluate the predictive power of the model: we estimate the parameters of the model only on a subset of the observed actions, respectively the 75%, 50% and 25% of the total observations; we then predict the features for the future actions {T � + 1, . . ., T} and compare the predicted and observed results by means of the indicators in (10) over the whole set of features and only on a portion of it. Finally, in the right panel of  Table 8. arXiv High Energy Physics dataset. Comparison between real and simulated actions-features matrices by means of the indicators (7), computed on the whole matrix (k � = 22304) and also taking into account only the first k � = 16728, 11152, 5576 features. https://doi.org/10.1371/journal.pone.0223768.t008 Table 9. arXiv High Energy Physics dataset. Comparison between real and simulated actions-features matrices by means of the indicators (8) and (9).  Table 10. arXiv High Energy Physics dataset. Predictions on the actions-features matrix. The indicators (10) are computed for different levels of information used as "training set": more precisely, the different values of T � correspond to 75%, 50% and 25% of the set of the actions, respectively. Moreover, the indicator � m � O is computed on the whole matrix (k � = 22304) and also taking into account only the first k � = 11152 features. In the brackets, there are the sample standard deviations. Contrarily to the previous case, in this application we observe a comparable performance of the model with both the considered definitions of fitness. This means that, for the publications in High Energy Physics in the considered period, both the number of co-authors and the number of publications of an author can be considered as reasonable measures in order to evaluate her/his relevance in the research field.

Instagram dataset
The dataset has been crawled through the Instagram API between January 20 and February 17, 2014 and collects public media (with their author, time-stamp and set of hashtags) as well as users information (with their list of followers and followees) of a set of 2100 anonymized participants to 72 popular photographic contests that took place between October 2010 and February 2014. A detailed description of the dataset used for this application can be found in [28]. The overall media dataset records more than one million posts but, with the purpose of maximizing the density of our actions-features matrix, we considered only those posts posted during the weekends in the crawling period (Jan 20−Feb 17, 2014) in which at least 5 hashtags are used. This procedure yields a sample of T = 2151 posts (actions) and L T = 5890 hashtags (features). The available posts were ordered chronologically according to the associate time-stamp of publication and the hashtags (features) were sorted in terms of their first appearance in a post. After this first phase of data arrangement, we constructed the actions-features matrix F, with F t,k = 1 if post t contains hashtag k and F t,k = 0 otherwise. The resulting matrix is shown in Fig 5, with non-zero values indicated by black points.
For this application, we first consider weights of the type 5), that depend on an indicator related to the underlying Instagram network. Precisely, we associate to each agent i the variable G fol i defined as the number of user i's followers, among those who were active during the crawling period and we set where i(t) denotes the author of post t. Therefore the inclusion probability for hashtag k becomes where the average popularity of hashtag k is exponentially discounted by the factor G fol iðtÞ . The decision to introduce such kind of weights was driven by the following consideration. A user with a very high number of followers identifies a person who is very popular on the social networks, an "influencer" in the extreme case. As a consequence, it may be reasonable to think that she/he is less affected by other people's posts and, consequently, less prone to use old hashtags. For this user, the average popularity of k in the inclusion probability P t (k) should be less relevant. On the contrary, a user with a low number of followers may be more incline to follow the current trends and the others' preferences and choices. It is worthwhile to point out that in the definition of the weights, we considered the number of followers of an user as fixed to the value we observed at the end of the period of observation (the crawling period). In general, it may change in time, depending on the changes in her/his network of virtual friendships. However, we assume it to be constant because of the short time span considered.
Moreover, we also use weights of the type 3), similar to those used in the previous two applications: indeed, at each time-step t, we associate to each user i the fitness variable and we define the weights as (As before, we add 1 in (17) in order to avoid division by zero in (18).) A high value of G posts t;i should identify a user who is very active in the considered context so that it is likely that other users employ the same hashtags of her/his posts. Accordingly, the inclusion probability in Eq (2) reads as The performed analysis follows the general methodology (with R = 500) explained above. We first estimate the model's parameters, obtaining the results in Table 11. Again, the estimated values for δ reveal that the weighted preferential attachment term (4) plays an important role in the inclusion probabilities. Fig 6 provides in the left panel a log-log plot of the cumulative count of new features (key-words) as a function of time (see the red dots), that clearly shows a power-law behavior. Moreover, this agrees with the theoretical property of the model stated in Section A.1.1 in S1 Text, according to which the power-law exponent has to be equal to the parameter β (in the figure the black line has slope equal to the estimated value for β, that isb ¼ 0:5897). The goodness of fit of the model to the dataset has been evaluated through the computation of the quantities (6), (7), (8) and (9). These results are shown in Tables 12, 13 and 14. We can see that the average number of old features, i.e. the quantity Collaboration and followership: A stochastic model for activities in social networks � O T , shows a good agreement with the observed quantity in the case of the model with both the considered weights, contrarily to the model with all the weights equal to one for which we obtain a much higher value. We note that the difference in the values is smaller for the case with G fol i (i.e. the case with the fitness based on the number of followers). Moreover, the model is perfectly able to reproduce the average number of new features � N T . Table 13 also indicates that the model with weights (15) (i.e. those with fitness expressed as the number of followers) shows a better performance than the one with the weights depending on the number of posts or the one with all the weights equal to one. More precisely, for the model with weights (15), the computed values of � m O ranges from 97% to 99%, pointing out that a high percentage of the entries in the actions-features matrix have been correctly inferred by the model. The differences are more evident when we select the first k � features: indeed, with with weights (15) we succeed to infer the values of at least 97% of the entries; while with weights (18) and with all the weights equal to 1 the percentage remains under 96% and 86%, respectively. This means that the major difference in the performance of the different  considered weights is in the first features, that are those for which the preferential attachment term (that depend on the weights) is more relevant. Note also that the actions-features matrix is more dense in the part corresponding to the first k � features. At this point, we select the weights (15) (i.e. those taking into account the users' number of followers) and the last analysis focuses on it. In Table 15 we evaluate the predictive power of the model with the chosen weights: we estimate the parameters of the model only on a subset of the observed actions, respectively the 75%, 50% and 25% of the total observations; we then predict the features for the future actions {T � + 1, . . ., T} and compare the predicted and observed results by means of the indicators in (10) over the whole set of features and only on a portion of it. Finally, in the right panel of Fig 6, we provide the asymptotic behavior of the number of edges in the actions-features network: more precisely, the red dots represent the total number e(t) of edges observed in the real actions-features matrix at each time-step; while the continuous black line shows the mean number μ e (t) of edges obtained averaging over R = 500 simulations of the model with the chosen weights.  (7), computed on the whole matrix (k � = 5890) and also taking into account only the first k � = 500, 250, 100 features.   (10) are computed for different levels of information used as "training set": more precisely, the different values of T � correspond to 75%, 50% and 25% of the set of the actions, respectively. Moreover, the indicator � m � O is computed on the whole matrix (k � = 5890) and also taking into account only the first k � = 250 features. In the brackets, there are the sample standard deviations. We can conclude that for the considered dataset of Instagram, the number of followers of a user is a good measure in order to evaluate her/his inclination to employ old hashtags, but in the inverse sense: the bigger the popularity, the smaller the tendency of re-use already existing hashtags. The difference in the performance of the model with weights based on the number of followers and with weights based on the number of posts is not so significant. Therefore, as in the previous two applications, we can affirm that the relevance of an agent (with respect to the probability of transmitting her/his features) is well measured by the number of her/his actions.

Summary of the results
We here summarize the major findings of the three considered applications.
In all the three cases we chose the weights depending on a fitness variable. In the first two applications (IEEE and arXiv), the fitness variable measures the ability of the agents (authors) to transmit the features (keywords) of their actions (publications). In the third application (Instragram) we considered two kinds of fitness: one quantifies the inclination of the agents (users) to follow the features (hashtags) of the previous actions (posts) and the other, as before, measures the ability of the agents to transmit the features of their actions. From the performed analyses of the actions-features bipartite networks, we get the following main common issues for the three applications: • In the inclusion probabilities defined by (2), the preferential attachment term plays a relevant role, because of the small estimated values obtained for the parameter δ.
• The considered indicators and the plots regarding the behavior along time of the total number of observed features L t show a good fit between the model with the selected weights and the real datasets. In particular, the power-law behavior of L t perfectly matches the theoretical one with the estimated parameterb as the power-law exponent, and a high percentage of the entries of the actions-features matrix is successfully inferred with the model. Moreover, a good performance is also obtained when making a prediction analysis, i.e. testing the percentage of the entries that are successfully recovered by the model providing it with different levels of information. Regarding the plot of μ e (t), we note that the observed total number of edges is well replicated in all applications. Nevertheless, the plots of its dynamics show that the slope of the simulated curve and the real one asymptotically match only in the case of IEEE dataset. In the other applications the two curve intersect, but their slope are not the same. The different performance in replicating the dynamics of L t and μ e (t) reveal that the arrival process of the features is simpler and well captured by the proposed model (recall that in the model the dynamics of L t does not depend on the weights, but only on the parameters α and β) than the selection mechanism of the old features. Therefore, it is hard to recover the whole dynamics of μ e (t) along time. However, as said before, in all the considered applications, the proposed dynamics for the selection of old features (that is the inclusion probabilities endowed with suitable observed random weights and a single estimated parameter δ) show a good performance with respect to the employed indicators.
• With respect to the "flat weights", i.e. all weights equal to 1, the selected weights guarantee a better agreement with the real actions-features matrices. The difference in the performance of the model with different weights is put in evidence by the indicator � O T and it is also evident when we consider a subset of the overall set of the observed features for the computation of the indicator � m O . Indeed, the actions-features matrices appear more dense in the part corresponding to the first features and these features are those for which the preferential attachment term, that depends on the weights, is more relevant.

Conclusions
In this work we have presented our contribution to the stream of literature regarding stochastic models for networks formation. With respect to the previous publications, the present paper introduces some novelties. First of all, our focus is to define a model for the bipartite network that describes the activity of some agents, studying the behavior in time of agents' actions and the features shown by these actions. Therefore, we only assume to know the chronological order in which we observe the agents' actions, and not the order in which the agents arrive. Second, we extend the concept of "preferential attachment with weights" [10,11] to this framework. The weights can have different forms and meanings according to the specific setting considered and play an important role since the probability that a future action shows a certain feature depends, not only on its popularity (i.e. the number of previous actions showing the feature) as stated by the preferential attachment rule, but also on some characteristics of the agents and/or the features themselves. For instance, the weights may give information regarding the ability of an agent to transmit the features of her/his actions to the future actions, or the inclination of an agent to adopt the features shown in the past.
Summarizing, we first provide a full description of the model dynamics and interpretation of the included parameters and variables, also showing some theoretical results regarding the asymptotic properties of some important quantities. Moreover, we illustrate the necessary tools in order to estimate the parameters of the model and we consider three different applications. For each of them, we evaluate the goodness of fit of the model to the data by checking the theoretical asymptotic properties of the model in the real data, by comparing several indicators computed both on the real and simulated matrices, as well as testing the ability of the model as a predictive instrument in order to forecast which features will be shown by future actions. All in all, the analyses point out a good fit of the model and a good performance of the adopted tools in all the three considered cases.
The model and the related analysis have been able to detect some interesting aspects that characterize the different examined contexts. In the first two applications (IEEE and arXiv) we examined the publications in the scientific areas of Automatic Driving and of High Energy Physics (briefly Hep-Th) and we took into account two kinds of fitness variables for the authors: one based on the number of publications and the other based on the number of collaborators. This study reveals that, for Hep-Th, both the number of publications of an author and the number of her/his collaborators are able to provide a good agreement with real data, while, for Automatic Driving, we found a better performance of the model with the weights based on the number of publications. Probably this difference is due to the fact that, while, in the considered temporal window, the Physics of High Energies is quite an old subject in which different branches developed, Automatic Driving is a much younger research area. (Indeed, the observed values of T and L T , that is the number of publications and the number of keywords in the considered period, for the Automatic Driving are much smaller than the ones observed for Hep-Th in the same period. The indicator � N T also suggests that Automatic Driving is a younger research field than Hep-Th, since the observed value for the former is greater than the one for the latter.) The behavior of the considered on-line social network results well described with a different kind of weights. We examined the dataset of Instagram, with posts considered as actions and hashtags as features, and we observed that the less followers a user has the higher the number of old hashtags used. This could be related to the fact that less popular users tend to re-use many old hashtags in order to increase their visibility, while highly famous users do not feel the need of improving their popularity in this way and focus on few old hashtags. Indeed, this behavior seems to show a different role of the "on-line followership" relations respect to coauthorships: while collaborations incentive the usage of a high number of existing features, the number of followers takes to a limited usage of existing hashtags. Regarding this application, we also observed that, as in the considered collaboration networks, the relevance of an agent (with respect to the probability of transmitting her/his features) is well measured by her/his activity, that is the number of her/his actions.
Supporting information S1 Text. Some asymptotic results for the model, estimation of the model parameters and data cleaning procedure. (PDF)