Characterizing and Modeling Citation Dynamics

Citation distributions are crucial for the analysis and modeling of the activity of scientists. We investigated bibliometric data of papers published in journals of the American Physical Society, searching for the type of function that best describes the observed citation distributions. We used goodness-of-fit tests based on Kolmogorov-Smirnov statistics for three classes of functions: log-normal, simple power law and shifted power law. The shifted power law turns out to be the most reliable hypothesis for all citation networks we derived, which correspond to different time spans. We find that citation dynamics is characterized by bursts, usually occurring within a few years of a paper's publication, and that the burst size spans several orders of magnitude. We also investigated the microscopic mechanisms for the evolution of citation networks, by proposing a linear preferential attachment with time-dependent initial attractiveness. The model successfully reproduces the empirical citation distributions and accounts for the presence of citation bursts as well.


Introduction
Citation networks are compact representations of the relationships between research products, both in the sciences and the humanities [1,2]. As such they are a valuable tool to uncover the dynamics of scientific productivity and have been studied for a long time, since the seminal paper by De Solla Price [3]. In recent years, in particular, the increasing availability of large bibliographic datasets and computational resources has made it possible to build large networks and analyze them with unprecedented accuracy.
In a citation network, each vertex represents a paper and there is a directed edge from paper A to paper B if A includes B in its list of references. Citation networks are then directed, by construction, and acyclic, as papers can only point to older papers, so directed loops cannot be obtained. A large part of the literature on citation networks has focused on the characterization of the probability distribution of the number of citations received by a paper, and on the design of simple microscopic models able to reproduce the distribution. The number of citations of a paper is the number of incoming edges (indegree) $k_{in}$ of the vertex representing the paper in the citation network. So the probability distribution of citations is just the indegree distribution $P(k_{in})$. There is no doubt that citation distributions are broad, as there are papers with many citations together with many poorly cited (including many uncited) papers. However, as of today, the functional shape of citation distributions is still elusive. This is because the question is ill-defined. In fact, one may formulate it in a variety of different contexts, which generally yield different answers. For instance, one may wish to uncover the distribution from the global citation network including all papers published in all journals at all times. Otherwise, one may wish to specialize the query to specific disciplines or years. The role of the discipline considered is important and is liable to affect the final result. For instance, it is well known that papers in Biology are, on average, much more cited than papers in Mathematics. One may argue that this evidence may still be consistent with having similar functional distributions for the two disciplines, defined on ranges of different sizes. Also, the role of time is important. It is unlikely that citation distributions maintain the exact same shape regardless of the specific time window considered.
The dynamics of scientific production has changed considerably in recent years. It is well known, for instance, that the number of published papers per year has been increasing exponentially until now [4]. This, together with the much quicker publication times of modern journals, has deeply affected the dynamics of citation accumulation of papers. Moreover, if the dataset under study includes papers published in different years, older papers tend to have more citations than recent ones simply because they have been exposed for a longer time, not necessarily because they are better works: the age of a paper is an important factor.
So, the question of which function best describes the citation distributions is meaningless if one does not define precisely the set of publications examined. Redner [5] considered all papers published in Physical Review D up to 1997, along with all articles indexed by Thomson Scientific in the period 1981-1997, and found that the right tail of the distribution, corresponding to highly cited papers, follows a power law with exponent $\gamma = 3$, in accord with the conclusions of Price [3]. Laherrère and Sornette [6] studied the top 1120 most cited physicists during the period 1981-1997, whose citation distribution is more compatible with a stretched exponential $P(k_{in}) \sim \exp[-(k_{in})^{\beta}]$, with $\beta \simeq 0.3$. Tsallis and de Albuquerque [7] analyzed the same datasets used by Redner with an additional one including all papers published up to 1999 in Physical Review E, and found that the Tsallis distribution $P(k_{in}) = P(0)/[1 + (\beta - 1)\lambda k_{in}]^{\beta/(\beta - 1)}$, with $\lambda \simeq 0.1$ and $\beta \simeq 1.5$, consistently fits the whole distribution of citations (not just the tail). More recently Redner performed an analysis over all papers published in the 110-year history of the journals of the American Physical Society (APS) [8], concluding that the log-normal distribution is more adequate than a power law. In other studies distributions of citations have been fitted with various functional forms: power law [9-14], log-normal [12,15,16], Tsallis distribution [17,18], modified Bessel function [19,20] or more complicated distributions [21].
In this paper we examine citation networks in more depth. We considered networks including all papers and their mutual citations within several time windows. We have performed a detailed analysis of the shape of the distributions, by computing the goodness of fit with Kolmogorov-Smirnov statistics of three model functions: simple power law, shifted power law and log-normal. Moreover, we have also examined dynamic aspects of the process of citation accumulation, revealing the existence of "bursts", i.e. of rapid accretions of the number of citations received by papers. Citation bursts are not compatible with standard models of citation accumulation based on preferential attachment [22], in which the accumulation is smooth and papers may attract many cites long after publication. Therefore, we propose a model in which the citation attractiveness of a paper depends both on the number of cites already collected by the paper and on some intrinsic attractiveness that decays in time. The resulting picture delivers both the citation distribution and the presence of bursts.

The distribution of cites
For our analysis we use the citation database of the American Physical Society (APS), described in Materials and Methods. We get the best fit for the empirical citation distributions from the goodness of fit test with Kolmogorov-Smirnov (KS) statistics [23]. The KS statistic $D$ is the maximum distance between the cumulative distribution function (CDF) of the empirical data and the CDF of the fitted model:

$$D = \max_{k_{in} \geq k_{in}^{min}} \left| S(k_{in}) - P(k_{in}) \right| \tag{1}$$

Here $S(k_{in})$ is the CDF of the empirical indegree $k_{in}$ and $P(k_{in})$ is the CDF of the model that best fits the empirical data in the region $k_{in} \geq k_{in}^{min}$. By searching the parameter space, the best hypothetical model is the one with the least value of $D$ from the empirical data. To test the statistical significance of the hypothetical model, we cannot use the values of the KS statistics directly though, as the model has been derived from a best fit on the empirical data, rather than being an independent hypothesis. So, following Ref. [23] we generate synthetic datasets from the model corresponding to the best fit curve. For instance, if the best fit is the power law $a x^{-b}$, the datasets are generated from this distribution. Each synthetic dataset will give a value $D_{synth}$ for the KS statistics between the dataset and the best fit curve. These $D_{synth}$-values are compared with $D_{emp}$, i.e. the $D$-value between the original empirical data and the best fit curve, in order to define a p-value. The p-value is the fraction of $D_{synth}$-values larger than $D_{emp}$. If $p$ is large (close to 1), the model is a plausible fit to the empirical data; if it is close to 0, the hypothetical model is not a plausible fit.

We applied this goodness of fit test to three hypothetical model distributions: log-normal, simple power law and shifted power law. The log-normal distribution for the indegree $k_{in}$ is given by

$$P(k_{in}) = \frac{1}{k_{in}\,\sigma\sqrt{2\pi}} \exp\left[-\frac{(\ln k_{in} - \mu)^2}{2\sigma^2}\right], \tag{2}$$

the simple power law distribution by

$$P(k_{in}) \sim k_{in}^{-\gamma}, \tag{3}$$

and the shifted power law by

$$P(k_{in}) \sim (k_{in} + k_0)^{-\gamma}. \tag{4}$$

We used 1000 synthetic distributions to calculate the p-value for each empirical distribution. Fig. 1 shows some fits for datasets corresponding to several time windows (see Materials and Methods). The detailed summary of the goodness of fit results is shown in Table 1. The simple power law gives a high p-value only when one considers the right tail of the distribution (usually $k_{in} > 20$). The log-normal distribution gives a high p-value for early years (before 1970), but after 1970 the p-value is smaller than 0.2. As shown in Figs. 1a and 1b, there is a clear discrepancy in the tail between the best fit log-normal distribution and the empirical distribution. The shifted power law distribution gives significant p-values (higher than 0.2) for all observation periods. The values of the exponent $\gamma$ of the shifted power law decrease in time, from 5.6 (1950) to 3.1 (2008).
We conclude that the shifted power law is the best distribution to fit the data.
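
The goodness of fit procedure described above can be illustrated with a short Python sketch. It is not the code used for the analysis: the finite integer support, the grid of candidate parameters, the function names and the toy data (drawn from a known shifted power law rather than from the APS database) are all assumptions made for illustration. As in the text, the synthetic datasets are compared with the best fit curve without refitting.

```python
import numpy as np


def shifted_power_law_cdf(support, gamma, k0):
    """Model CDF for P(k) ~ (k + k0)^(-gamma), normalized on a finite integer support."""
    weights = (support.astype(float) + k0) ** (-gamma)
    return np.cumsum(weights) / weights.sum()


def ks_distance(data, support, model_cdf):
    """KS statistic D: largest gap between the empirical CDF and the model CDF.

    Assumes every value in `data` is an integer lying inside `support`.
    """
    counts = np.bincount(data - support[0], minlength=len(support))
    empirical_cdf = np.cumsum(counts) / counts.sum()
    return float(np.max(np.abs(empirical_cdf - model_cdf)))


def fit_by_min_ks(data, support, gammas, k0s):
    """Grid search over (gamma, k0): return (D, gamma, k0) with the smallest D."""
    best = None
    for gamma in gammas:
        for k0 in k0s:
            d = ks_distance(data, support, shifted_power_law_cdf(support, gamma, k0))
            if best is None or d < best[0]:
                best = (d, float(gamma), float(k0))
    return best


def sample_model(rng, support, gamma, k0, size):
    """Draw synthetic indegrees from the model by inverse-transform sampling."""
    cdf = shifted_power_law_cdf(support, gamma, k0)
    idx = np.searchsorted(cdf, rng.random(size), side="right")
    return support[np.minimum(idx, len(support) - 1)]


def ks_p_value(support, gamma, k0, d_emp, n_data, n_synth=1000, seed=0):
    """Fraction of synthetic datasets whose D_synth is larger than D_emp."""
    rng = np.random.default_rng(seed)
    cdf = shifted_power_law_cdf(support, gamma, k0)
    d_synth = np.array([
        ks_distance(sample_model(rng, support, gamma, k0, n_data), support, cdf)
        for _ in range(n_synth)
    ])
    return float(np.mean(d_synth > d_emp))


# Toy "empirical" data drawn from a known shifted power law (not APS data).
support = np.arange(1, 1001)
rng = np.random.default_rng(42)
data = sample_model(rng, support, gamma=3.0, k0=5.0, size=5000)

d_emp, gamma_fit, k0_fit = fit_by_min_ks(
    data, support, gammas=np.arange(2.0, 4.01, 0.25), k0s=np.arange(0.0, 11.0)
)
p = ks_p_value(support, gamma_fit, k0_fit, d_emp, n_data=len(data), n_synth=200)
print(f"gamma={gamma_fit}, k0={k0_fit}, D={d_emp:.4f}, p={p:.2f}")
```

The same functions can be reused for the other two hypotheses by swapping in a different model CDF; setting `k0 = 0` in `shifted_power_law_cdf` gives the simple power law.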

The distribution of citation bursts
We now turn our attention to citation "bursts". While there has been a sizeable activity in the analysis of bursty behavior in human dynamics [24-26], we are not aware of similar investigations for citation dynamics. We compute the relative rate

$$\frac{\Delta k}{k} = \frac{k_{in}(t + \delta t) - k_{in}(t)}{k_{in}(t)}, \tag{5}$$

where $k_{in}(t)$ is the number of citations of a paper at time $t$. The distributions of $\Delta k/k$ for $t = 1969, 1989, 2007$ and $\delta t = 1$ year are shown in Fig. 2a. They are visibly broad, spanning several orders of magnitude. Similar heavy tails of burst size distributions were observed in the dynamics of popularity in Wikipedia and the Web [27]. It is notable that the largest bursts take place in the first years after publication of a paper. This is manifest in Fig. 2b, where we show distributions derived from the same dataset as in Fig. 2a, but including only papers older than 5 (squares) and 10 years (triangles): the tail disappears. In general, more than 90% of large bursts ($\Delta k/k > 3.0$) occur within the first 4 years since publication.
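
As an illustration, the relative rate and the share of large bursts that come from young papers can be computed as in the following Python sketch. The function names and the toy citation counts are hypothetical, and excluding papers that are still uncited at time $t$ (for which the relative rate is undefined) is an assumption, since the text does not say how such papers are treated.

```python
import numpy as np


def relative_burst_rates(k_t, k_t_plus_dt):
    """Return Delta k / k = [k(t + dt) - k(t)] / k(t) for papers with k(t) > 0.

    Also returns the boolean mask of the papers that were kept.
    """
    k_t = np.asarray(k_t, dtype=float)
    k_next = np.asarray(k_t_plus_dt, dtype=float)
    cited = k_t > 0  # assumption: the rate is left undefined for uncited papers
    return (k_next[cited] - k_t[cited]) / k_t[cited], cited


def early_fraction_of_large_bursts(rates, ages, threshold=3.0, max_age=4):
    """Fraction of bursts above `threshold` from papers no older than `max_age` years."""
    rates = np.asarray(rates, dtype=float)
    ages = np.asarray(ages)
    large = rates > threshold
    if not large.any():
        return float("nan")
    return float(np.mean(ages[large] <= max_age))


# Toy cumulative citation counts at t and t + 1 year (made-up numbers).
k_now = [1, 10, 0, 4, 200, 2]
k_next_year = [5, 11, 3, 4, 220, 9]
ages = np.array([1, 12, 1, 7, 20, 6])  # years since publication at time t

rates, cited = relative_burst_rates(k_now, k_next_year)
frac = early_fraction_of_large_bursts(rates, ages[cited])
print(rates, frac)
```

In this toy example two papers exceed the threshold $\Delta k/k > 3.0$, one of age 1 and one of age 6, so half of the large bursts fall within the first 4 years.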

Preferential attachment and age-dependent attractiveness
For many growing networks, cumulative advantage [28,29], or preferential attachment [22], has proven to be a reliable mechanism to explain the fat-tailed distributions observed. In the context of citation dynamics, it is reasonable to assume that a highly cited paper has a better chance of receiving citations in the future than a poorly cited one. This can be formulated by stating that the probability that a paper gets cited is proportional to the number of citations it already received. That was the original idea of Price [30] and led to the development of the first dynamic mechanism for the generation of power law distributions in citation networks. Later refinements of the model introduced an attractiveness for the vertices, indicating their own appeal to attract edges, regardless of degree. In particular, in the so-called linear preferential attachment [31,32], the probability for a vertex to receive a new edge is proportional to the sum of the attractiveness of the vertex and its degree. In this Section we want to check whether this hypothesis holds for our datasets. This issue has been addressed in other works on citation analysis, like Refs. [13,33].
We investigated the dependence of the kernel function $\Pi(k_{in})$ on indegree $k_{in}$ [34,35]. The kernel is the rate with which a vertex $i$ with indegree $k_{in}^i$ acquires new incoming edges. For linear preferential attachment the kernel is

$$\Pi(k_{in}^i) \sim k_{in}^i + A_i. \tag{6}$$

In Eq. 6 the constant $A_i$ indicates the attractiveness of vertex $i$. Computing the kernel directly for each indegree class (i.e. for all vertices with equal indegree $k_{in}$) is not ideal, as the result may heavily fluctuate for large values of the indegree, due to poor statistics. We therefore consider the cumulative kernel, obtained by summing Eq. 6 over indegrees:

$$\Pi_{>}(k_{in}) \sim \frac{1}{2} k_{in}^2 + \langle A \rangle k_{in}. \tag{7}$$

In Eq. 7 $\langle A \rangle$ is the average attractiveness of the vertices. In order to estimate $\Pi_{>}(k_{in})$, we need to compute the probability that vertices with equal indegree have gotten edges over a given time window, and sum the results over all indegree values from the smallest one to a given $k_{in}$. The time window has to be small enough to preserve the structure of the network, but not so small that the citation statistics become insufficient. In Fig. 3 we show the cumulative kernel function $\Pi_{>}(k_{in})$ as a function of indegree for a time window from 2007 to 2008. The profile of the curve (empty circles) is compatible with linear preferential attachment with an average attractiveness $\langle A \rangle = 7.0$ over a large range, although the final part of the tail is missed. Still, the slope of the tail, apart from the final plateau, is close to 2, like in Eq. 7. Our result is consistent with that of Jeong et al. [34], who considered a citation network of papers published in Physical Review Letters in 1988, which are part of our dataset as well. We have repeated this analysis for several datasets, from 1950 until 2008, by keeping a time window of one year in each case. The resulting values of $\langle A \rangle$ are reported in Table 2, along with the number of vertices and mean degree of the networks. The average value of the attractiveness across all datasets is 7.1. This value is much larger than the average indegree in the early years of the network, for example from 1950 to 1960. Hence, in the tradeoff between indegree and attractiveness of Eq. 6, the latter is quite important for old papers. In general, for low indegrees, attractiveness dominates over preferential attachment. As we see in Fig. 3, in fact, for low indegrees there is no power law dependence of the kernel on indegree.

Finally we investigated the time dependence of the kernel. As shown in Fig. 3, when we limit the analysis to papers older than 5 years (squares) or 10 years (triangles), the cumulative kernel has a pure quadratic dependence on indegree in the initial part, without linear terms, so the attractiveness does not affect the citation dynamics. This means that the attractiveness has a significant influence on the evolution of the citation network only within the first few years after publication of the papers. The presence of vertex attractiveness had been considered by Jeong et al. as well [34].

[Table caption fragment: "The fits are done for indegree larger than $k_{min}$, whose values are also reported in the ..."]
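
The estimate of the cumulative kernel can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure used to produce Fig. 3 or Table 2: the kernel for an indegree class is taken as the mean number of citations gained per vertex of that class during the window, no normalization is applied, and the average attractiveness is read off a quadratic fit to the exact discrete sum of a linear kernel. The function names and the toy data are hypothetical.

```python
import numpy as np


def cumulative_kernel(k_start, new_citations):
    """Cumulative kernel from one observation window.

    k_start:       indegree of each paper at the start of the window
    new_citations: citations gained by each paper during the window
    Returns the indegree values and the running sum of the per-class kernel.
    """
    k_start = np.asarray(k_start, dtype=int)
    n_k = np.bincount(k_start)                              # vertices per indegree class
    gained_k = np.bincount(k_start, weights=new_citations)  # citations gained per class
    # Mean gain per vertex; classes with no vertices contribute zero.
    kernel = np.divide(gained_k, n_k, out=np.zeros_like(gained_k), where=n_k > 0)
    return np.arange(len(kernel)), np.cumsum(kernel)


def estimate_mean_attractiveness(k, cum_kernel):
    """Estimate <A> from a quadratic fit of the cumulative kernel.

    For a kernel c * (k' + <A>) summed over k' = 0..k, the exact result is
    c * [k^2 / 2 + (<A> + 1/2) * k + <A>], so <A> = lin / (2 * quad) - 1/2.
    Assumes all indegree classes in the fitted range are populated.
    """
    quad, lin, _ = np.polyfit(k, cum_kernel, 2)
    return float(lin / (2.0 * quad) - 0.5)


# Toy data: three papers per indegree class, each gaining 0.1 * (k + 7) citations.
k_start = np.repeat(np.arange(0, 51), 3)
new_citations = 0.1 * (k_start + 7.0)

k, cum = cumulative_kernel(k_start, new_citations)
a_mean = estimate_mean_attractiveness(k, cum)
print(a_mean)
```

For large indegree the exact sum reduces to the leading quadratic and linear terms, which is why a quadratic fit is a natural, though not unique, way to extract the average attractiveness.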

The model
We would like to design a microscopic model that reflects the observed properties of our citation networks. Preferential attachment does not account for the fact that the probability of receiving citations may depend on time. In the Price model, for instance, papers keep collecting citations independently of their age, while it is empirically observed [33,36,37] that the probability for an article to get cited decreases as the age of the article increases. In addition, we have seen that citation bursts typically occur in the early life of a paper. Some sophisticated growing network models include the aging of vertices as well [33,37-40]. We propose a mechanism based on linear preferential attachment, where papers have individual values of the attractiveness, and the latter decays in time.
The model works as follows. At each time step $t$, a new vertex joins the network (i.e., a new paper is published). The new vertex/paper has $m$ references to existing vertices/papers. The probability $\Pi(i \to j, t)$ that the new vertex $i$ points to a target vertex $j$ with indegree $k_{in}^j$ reads

$$\Pi(i \to j, t) \sim k_{in}^j + A_j(t), \tag{8}$$

where $A_j(t)$ is the attractiveness of $j$ at time $t$. If $A_j(t)$ were constant and equal for all vertices we would recover the standard linear preferential attachment [31,32]. We instead assume that it decays exponentially in time:

$$A_j(t) = A_0 \exp\left[-(t - t_0)/\tau\right]. \tag{9}$$

In Eq. 9 $A_0$ is the initial attractiveness of the vertex, and $t_0$ is the time at which the vertex first appears in the network; $\tau$ is the time scale of the decay, after which the attractiveness lowers considerably and loses importance for citation dynamics. Since citation bursts occur in the initial phase of a paper's life (Fig. 2b), when vertex attractiveness is most relevant, we expect that the values of the initial attractiveness are heterogeneously distributed, to account for the broad distribution of burst sizes (Fig. 2a). We assume that the initial attractiveness $A_0$ follows a power law distribution.
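
A minimal simulation of the model described above can be sketched in Python. It is an illustration under stated assumptions rather than the implementation behind the reported results: time is measured in steps of one paper each, the power law exponent `alpha` and lower cutoff `a_min` of the initial attractiveness, the decay time `tau` and the number of references `m` are placeholder values, and a new paper is not allowed to cite the same target twice.

```python
import numpy as np


def simulate_citation_network(n_papers, m=5, tau=3.0, alpha=2.5, a_min=1.0, seed=0):
    """Grow a citation network by linear preferential attachment with decaying attractiveness.

    Returns the final indegree of every paper and the initial attractiveness values.
    """
    rng = np.random.default_rng(seed)
    # Assumed distribution: P(A0) ~ A0^(-alpha) for A0 >= a_min (inverse-transform sampling).
    u = 1.0 - rng.random(n_papers)  # uniform on (0, 1], avoids division by zero
    a0 = a_min * u ** (-1.0 / (alpha - 1.0))

    indegree = np.zeros(n_papers, dtype=int)
    for t in range(1, n_papers):
        # Paper j was published at t0 = j, so its age at step t is t - j.
        age = t - np.arange(t)
        attractiveness = a0[:t] * np.exp(-age / tau)
        weights = indegree[:t] + attractiveness
        n_refs = min(m, t)  # early papers cannot cite more papers than exist
        targets = rng.choice(t, size=n_refs, replace=False, p=weights / weights.sum())
        indegree[targets] += 1
    return indegree, a0


indegree, a0 = simulate_citation_network(n_papers=500)
print(indegree.max(), indegree.mean())
```

Recording the indegrees at two successive times within the loop would give the simulated counterpart of the relative rate $\Delta k/k$, so the same run can be used to compare both the citation distribution and the burst sizes with the data.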

Discussion
We investigated citation dynamics for networks of papers published in journals of the American Physical Society. Kolmogorov-Smirnov statistics along with goodness of fit tests lead us to conclude that the best ansatz for the distribution of citations (from old times up to any given year) is a shifted power law. The latter beats both simple power laws, which are acceptable only on the right tails of the distributions, and log-normals, which are better than simple power laws on the left part of the curve, but are not accurate in the description of the right tails. We have also studied dynamic properties of citation flows, and found that the early life of papers is characterized by citation bursts, as already found for popularity dynamics in Wikipedia and the Web.
The existence of bursts is not compatible with traditional models based on preferential attachment, which are capable of accounting for the skewed citation distributions observed, but in which citation accumulation is smooth. Therefore we have introduced a variant of linear preferential attachment, with two new features: 1) the attractiveness decays exponentially in time, so it plays a role only in the early life of papers, after which it is dominated by the number of citations accumulated; 2) the attractiveness is not the same for all vertices but follows a heterogeneous (power-law) distribution. We have found that this simple model is accurate in the description of the distributions of citations and burst sizes, across very different scientific ages. Moreover, the model is fairly robust with respect to the choice of the observation window for the bursts.