The Distribution of the Asymptotic Number of Citations to Sets of Publications by a Researcher or from an Academic Department Are Consistent with a Discrete Lognormal Model

How to quantify the impact of a researcher’s or an institution’s body of work is a matter of increasing importance to scientists, funding agencies, and hiring committees. The use of bibliometric indicators, such as the h-index or the Journal Impact Factor, has become widespread despite their known limitations. We argue that most existing bibliometric indicators are inconsistent, biased, and, worst of all, susceptible to manipulation. Here, we pursue a principled approach to the development of an indicator to quantify the scientific impact of both individual researchers and research institutions, grounded on the functional form of the distribution of the asymptotic number of citations. We validate our approach using the publication records of 1,283 researchers from seven scientific and engineering disciplines and the chemistry departments at the 106 U.S. research institutions classified as “very high research activity”. Our approach has three distinct advantages. First, it accurately captures the overall scientific impact of researchers at all career stages, as measured by asymptotic citation counts. Second, unlike other measures, our indicator is resistant to manipulation and rewards publication quality over quantity. Third, our approach captures the time-evolution of the scientific impact of research institutions.


Introduction
The explosive growth in the number of scientific journals and publications has outstripped researchers' ability to evaluate them [1]. To choose what to browse, read, or cite from a huge and growing collection of scientific literature is a challenging task for researchers in nearly all areas of Science and Technology. In order to search for worthwhile publications, researchers are thus relying more and more on heuristic proxies, such as author and journal reputations, that signal publication quality.
The introduction of the Science Citation Index (SCI) in 1963 [2] and the establishment of bibliographic databases spurred the development of bibliometric measures for quantifying the impact of individual researchers, journals, and institutions. Various bibliometric indicators have been proposed as measures of impact, including such notorious examples as the Journal Impact Factor and the h-index [3,4]. However, several studies revealed that these measures can be inconsistent, biased, and, worst of all, susceptible to manipulation [5][6][7][8][9][10][11][12][13][14][15]. For example, the limitations of the popular h-index include its dependence on discipline and on career length [16].
In recent years, researchers have proposed a veritable alphabet soup of "new" metrics (the g-index [17], the R-index [18], the ch-index [19], among others), most of which are ad-hoc heuristics, lacking insight about why or how scientific publications accumulate citations.
The onslaught of dubious indicators based on citation counts has spurred a backlash and the introduction of so-called "altmetric" indicators of scientific performance. These new indicators completely disregard citations, considering instead such quantities as number of article downloads or article views, and number of "shares" on diverse social platforms [20][21][22]. Unfortunately, new research is showing that altmetrics are likely to reflect popularity rather than impact, that they have incomplete coverage of the scientific disciplines [23,24], and that they are extremely susceptible to manipulation. For example, inflating the findings of a publication in the abstract can lead to misleading press reports [25], and journals' electronic interfaces can be designed to inflate article views and/or downloads [26].
Citations are the currency of scientific research. In theory, they are used by researchers to recognize prior work that was crucial to the study being reported. However, citations are also used to make the research message more persuasive, to refute previous work, or to align with a given field [27]. To complicate matters further, the various scientific disciplines differ in their citation practices [28]. Yet, despite their limitations, citations from articles published in reputable journals remain the most significant quantity with which to build indicators of scientific impact [12].
It behooves us to develop a measure that is based on a thorough understanding of the citation accumulation process and also grounded on a rigorous statistical validation. Some researchers have taken some steps in this direction. Examples include the ranking of researchers using PageRank [29] or the beta distribution [30], and the re-scaling of citation distributions from different disciplines under a universal curve using the lognormal distribution [31].
One crucial aspect of the process of citation accumulation is that it takes a long time to reach a steady state [32]. This reality is often ignored in many analyses and thus confounds the interpretation of most measured values. Indeed, the lag between time of publication and perception of impact is becoming increasingly relevant. For example, faced with increasingly large pools of applicants, hiring committees need to be able to find the most qualified researchers for the position in an efficient and timely manner [33,34]. To our knowledge, only a few attempts have been made to develop indicators that can predict future impact using citation measures [35,36], and those have had limited success [37].
Here, we depart from previous efforts by developing a principled approach to the quantification of scientific impact. Specifically, we demonstrate that the distribution of the asymptotic number of accumulated citations to publications by a researcher or from a research institution is consistent with a discrete lognormal model [32,38]. We validate our approach with two datasets acquired from Thomson Reuters' Web of Science (WoS):
• Manually disambiguated citation data pertaining to researchers at the top United States (U.S.) research institutions across seven disciplines [39]: chemical engineering, chemistry, ecology, industrial engineering, material science, molecular biology, and psychology;
• Citation data from the chemistry departments of 106 U.S. institutions classified as "very high research activity".
Significantly, our findings enable us to develop a measure of scientific impact with desirable properties.

The Data
We perform our first set of analyses on the dataset described by Duch et al. [39]. This dataset contains the disambiguated publication records of 4,204 faculty members at some of the top U.S. research universities in seven scientific disciplines: chemical engineering, chemistry, ecology, industrial engineering, material science, molecular biology, and psychology (see [39] for details about data acquisition and validation). We consider here only 230,964 publications that were in press by the end of 2000. We do this so that every publication considered has had a time span of at least 10 years for accruing citations [38] (the researcher's publication dataset was gathered in 2010). We perform our second set of analyses on the publication records of the chemistry departments at the top U.S. research institutions according to [40]. Using the publications' address fields, we identified 382,935 total publications from 106 chemistry departments that were in press by the end of 2009 (the department's publication dataset was gathered in 2014).
In our analyses we distinguish between "primary" publications, which report original research findings, and "secondary" publications, which analyze, promote, or compile research published elsewhere. We identify as primary publications those classified by WoS as "Article", "Letter", or "Note" and identify all other publication types as secondary publications.
Moreover, to ensure that we have enough statistical power to determine the significance of the model fits, we restrict our analysis to researchers with at least 50 primary research publications. These restrictions reduce the size of the researchers dataset to 1,283 researchers and 148,878 publications. All 106 departments in our dataset have a total of more than 50 primary research publications.

The Distribution of the Asymptotic Number of Citations
Prior research suggests that a lognormal distribution can be used to approximate the steady-state citation profile of a researcher's aggregated publications [31,41]. Stringer et al. demonstrated that the distribution of the number n(t) of citations to publications published in a given journal in a given year converges to a stationary functional form after about ten years [32]. This result was interpreted as an indication that the publications published in a single journal have a characteristic citation propensity [42] which is captured by the distribution of the "ultimate" number of citations. Here, we investigate the asymptotic number of citations n_a to the publications of an individual researcher as well as the set of all researchers in a department at a research institution.
We hypothesize that n_a is a function of a latent variable ψ representing a publication's "citability" [43]. The citability ψ results from the interplay of several, possibly independent, variables such as timeliness of the work, originality of approach, strength of conclusion, reputation of authors and journals, and potential for generalization to other disciplines, just to name a few [44,45]. In the simplest case, citability will be additive in all these variables, in which case the applicability of the central limit theorem implies that ψ will be a Gaussian variable, ψ ∈ N(µ_a, σ_a), where µ_a and σ_a are respectively the mean and standard deviation of the citability of the publications by researcher a. Therefore, the impact of a researcher's body of work is described by a distribution characterized by just two parameters, µ and σ. Similarly, because in the U.S. departments hire faculty based on their estimated quality, the researchers associated with a department will presumably be similar in stature or potential.
Unlike citations, which are observable and quantifiable, the variables contributing to ψ are neither easily observable nor easy to quantify. Moreover, mapping ψ into citations is not a trivial matter. Citation counts span many orders of magnitude, with the most highly cited publications having tens of thousands of citations [46]. Large-scale experiments on cultural markets indicate that social interactions often create "rich get richer" dynamics, far distancing the quality of an underlying item from its impact [47]. Citation dynamics are no different. For example, Duch et al. recently showed that the h-index has a power-law dependence on the number of publications N_p of a researcher [39]. Here, we reduce the potential distortion of citation-accruing dynamics by focusing on the logarithm of n_a. In effect, we take n_a to be the result of a multiplicative process of the same variables determining ψ. Thus, we can calculate the probability p_dln(n_a) that a researcher or department will have a primary research publication with n_a citations as an integral over ψ, Eq. (1).
Most researchers also communicate their ideas to their peers via secondary publications such as conference proceedings which, in many disciplines, are mainly intended to promote related work published elsewhere. Some secondary publications will have significant timeliness, in particular review papers and editorial materials, and therefore will likely be cited too. Most of them, however, will not be cited at all. When accounting for secondary publications, Eq. (1) has to be generalized, Eq. (2), to include f_s, the fraction of secondary publications in a body of work, and p_s(n_a|θ), the probability distribution, characterized by parameters θ and not necessarily lognormal, of n_a for secondary research publications. We found that in practice Eq. (2) can be well approximated by a discrete lognormal with parameters µ′ and σ′ together with a Kronecker delta, δ.
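One way to write out Eq. (1), Eq. (2), and the approximation to Eq. (2), consistent with the definitions above, is sketched below. This is a hedged reconstruction rather than the authors' typeset equations. It assumes base-10 logarithms, that a publication with citability ψ accrues ⌊10^ψ⌋ citations, and that the Kronecker delta places the uncited secondary publications at n_a = 0. For n_a = 0 the lower integration limit is taken as −∞.

```latex
% Assumed form of Eq. (1): Gaussian citability integrated over the
% interval of psi that maps onto exactly n_a citations.
p_{\mathrm{dln}}(n_a \mid \mu, \sigma)
  = \int_{\log_{10}(n_a)}^{\log_{10}(n_a + 1)}
    \frac{d\psi}{\sqrt{2\pi\sigma^{2}}}
    \exp\!\left[-\frac{(\psi - \mu)^{2}}{2\sigma^{2}}\right]

% Assumed form of Eq. (2): primary and secondary publications mixed
% with weights (1 - f_s) and f_s.
p(n_a \mid \mu, \sigma, f_s, \theta)
  = (1 - f_s)\, p_{\mathrm{dln}}(n_a \mid \mu, \sigma)
  + f_s\, p_s(n_a \mid \theta)

% Assumed form of the approximation to Eq. (2).
p(n_a \mid \mu', \sigma', f_s)
  \approx (1 - f_s)\, p_{\mathrm{dln}}(n_a \mid \mu', \sigma')
  + f_s\, \delta_{n_a, 0}
```

Under these assumptions the integral in the first expression reduces to a difference of two Gaussian cumulative distribution functions, which is convenient for numerical fitting.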
Surprisingly, we found that µ′ ≈ µ and σ′ ≈ σ, suggesting that secondary publications have citation characteristics that are significantly different from those of primary publications.
Figure 1 shows the cumulative distribution of citations to primary research publications of two researchers in our database and two chemistry departments. Using a χ² goodness-of-fit test with re-sampling [48], we find that we can reject the discrete lognormal model, Eq. (1), for only 2.88% of researchers and 1.13% of departments in our database. The results of our statistical analysis demonstrate that a discrete lognormal distribution with parameters µ and σ provides an accurate description of the distribution of the asymptotic number of citations for a researcher's body of work and for the publications from an academic department.
Figure 2 displays the sample characteristics of the fitted parameters. The median value of µ̂ obtained for the different disciplines lies between 1.0 and 1.6. Using data reported in [28] we find a significant correlation (τ_Kendall = 0.62, p = 0.069) between the median value of µ̂ for a discipline and the total number of citations to journals in that discipline (Fig. 3). This correlation suggests that µ̂ depends on the typical number of citations to publications within a discipline. This dependence on discipline size can in principle be corrected by a normalization factor [14,31,49].

Results
We also plot the fraction of secondary publications, f s , for all the researchers. We find that nearly a fourth of the publications of half of all researchers are secondary, but intra-discipline variation is high. Interdiscipline variability is also high: 17% of the publications of a typical researcher in chemistry are secondary, whereas 60% of the publications of a typical researcher in industrial engineering are secondary.

Reliability of Estimation
We next investigate the dependence of the parameter estimates on number of publications, N_p, both at the individual level (testing the effect of sample size) and at the discipline level (testing overall dependence on N_p). To test for sample size dependence, we fit the model to subsets of a researcher's publication list. We find that estimates of σ are more sensitive to sample size than estimates of µ (Figs. S1 and S2). However, this dependence becomes rapidly negligible as the sample size approaches the minimum number of publications we required in creating our sample (N_p ≥ 50). Next, we test whether, at the discipline level, there is any dependence of µ̂ on N_p. We find no statistically significant correlation, except for a very weak dependence (R² ∼ 0.035, p = 0.0052) of σ̂ on N_p for chemical engineering (Table S1). This is in stark contrast with the h-index, which exhibits a marked dependence on number of publications [16].
Figure 3. Correlation between median µ̂ for a discipline and the discipline's relative size. We use Rosvall et al. [28] reported values of the relative number of citations to publications in journals of several disciplines as a proxy for relative field size and compare them with the median value of µ̂ in each discipline. A Kendall rank-correlation test yields τ_K = 0.62 with p = 0.069. This correlation suggests that µ̂ depends on the typical number of citations of a discipline.
Then, we test for variation of the estimated parameter values along a researcher's career. To this end, we order each researcher's publication records chronologically, divide them into three sets with equal number of publications, and fit the model to each set of publications. Each set represents the citability of the publications authored at a particular career stage of a researcher. Time trends in the estimated values of µ would indicate that the citability of a researcher's work changes over time. We find such a change for 25% of all researchers. For over 64% of those researchers whose citability changes over time we find that µ increases (Table 1).
In general, a department has many more publications than any single researcher. Thus, we are able to apply the model from Eq. (1) to each year's worth of departmental publications. This fine temporal resolution enables us to investigate whether there is any time-dependence in the citability of the publications from a department. Figure 4 shows the time-evolution of µ̂ for the chemistry departments at four typical research institutions. We see that both µ̂ (circles) and σ̂ (vertical bars) remain remarkably stable over the period considered.

Development of an Indicator
In the following, we compare the effectiveness of µ as an impact indicator with that of other indicators. First, we test the extent to which the value of µ_i for a given researcher is correlated with the values of other indicators for the same researcher. In order to provide an understanding of how the number of publications N_p influences the values of other metrics, we generate thousands of synthetic samples of n_a for different values of N_p and µ_i, and a fixed value of σ for each discipline. We find that µ is tightly correlated with several other measures, especially with the median number of citations (Fig. 5). Indeed, µ̂ can be estimated from the logarithm of the median number of citations (Eq. (3)). This close relation between mean and logarithm of the median further supports our hypothesis of a lognormal distribution for the asymptotic number of citations to primary publications by a researcher.
Table 1. Trends of µ̂ on career stage for the seven disciplines considered. We divide each researcher's chronologically-ordered publication records into three sets with equal number of publications (start, middle, and end) and fit the model to each set of publications to obtain µ̂_s, µ̂_m, and µ̂_e. We then use ordinary least squares to perform a linear regression on the time dependence of (µ̂_s, µ̂_m, µ̂_e). We then calculate the fraction of researchers whose µ exhibits a statistically significant dependence on career length, by performing a two-tailed significance test on the slope of the regression. We use a randomization test (1,000 samples), combined with a multiple hypothesis correction [50] (false discovery rate of 0.05), to calculate a p-value: for each researcher, we randomly re-order his or her publications, divide them into three sets with equal number of publications, fit the model to each set of publications, and calculate the new slope; we obtain a p-value by comparing the original slope of the fit with the distribution of the randomized slopes.
An important factor to consider when designing a bibliometric indicator is its susceptibility to manipulation. Both the number of publications and total or average number of citations are easily manipulated, especially with the ongoing proliferation of journals of dubious reputation [51,52]. Indeed, the h-index was introduced as a metric that resists manipulation. However, it is a straightforward exercise to show that one could achieve h ∝ N p exclusively through self-citations. Indeed, because the h-index does not account for the effect of self-citations, it is rather susceptible to manipulation, especially by researchers with low values of h [53,54].
In order to determine the true susceptibility of the h-index to manipulation, we devise a method to raise a researcher's h-index using the least possible number of self-citations (see Materials and Methods for details). Our results suggest that increasing the h-index by a small amount is no hard feat for researchers with the ability to quickly produce new articles (Fig. 6A).
Our proposed indicator, µ, is far more difficult to manipulate. Because it has a more complex dependence on the number of citations than the h-index, to increase µ in an efficient manner we use a process whereby we attempt to increase the median number of citations of a researcher's work (see Materials and Methods for details). Specifically, we manipulated µ for all the researchers by increasing their median number of citations. Remarkably, to increase µ by a certain factor one needs at least 10 times more self-citations than one would need in order to increase the h-index by the same factor (Fig. 6B).
While a difference of 2 to 3 orders of magnitude in number of required self-citations may seem surprising for a measure so correlated with citation numbers (Fig. 5), the fact that µ̂ is actually dependent on the citations to half of all primary publications by a researcher (Eq. (3)) makes µ̂ less susceptible than the h-index to manipulation of citation counts from a small number of publications. This view is also supported by the fact that increasing citations may actually decrease µ̂, as we may be adding them to a publication that would not be expected to receive that number of citations given the lognormal model. As a result, manipulation of scientific performance would be very difficult if using a µ-based index.

Comparison of Parameter Statistics
Finally, we estimate the parameters in Eq. (1) for chemistry journals and compare the µ̂ of chemistry departments and journals in selected years, and of all chemistry researchers in our database (Fig. 7; see Fig. S4 for the σ̂ and f_s comparison). In order to make sense of this comparison, we must note a few aspects about the data. The researchers in the database were affiliated with the top 30 chemistry departments in the U.S., whereas the set of chemistry departments covers all the chemistry departments from very high research activity universities. Thus, it is natural that the typical µ̂ of researchers is higher than that of departments. Not surprisingly, we find that µ̂ is typically the lowest for journals.
Figure 4. Time-evolution of departments' µ̂. Each circle and bar represent, respectively, the µ̂ and σ̂ for a given year of publications. We estimate the parameters in Eq. (1) for sets of departmental publications using a "sliding window" of 3 years. Fits for which we cannot reject the hypothesis that the data are consistent with a discrete lognormal distribution are colored green. We also show each department's average value of µ over the period considered (orange dashed lines).

Discussion
The ever-growing size of the scientific literature precludes researchers from following all developments from even a single sub-field. Therefore researchers need proxies of quality in order to identify which publications to browse, read, and cite. Three main heuristics are familiar to most researchers: institutional reputation, journal reputation, and author reputation.
Author reputation has the greatest limitations. Researchers are not likely to be known outside their (sub-)field and young researchers will not even be known outside their labs. Similarly, if we exclude a few journals with multidisciplinary reputations (Nature, Science, PNAS, NEJM), the reputation of a scientific journal is unlikely to extend outside its field. Institutional reputations are the most likely to be known broadly. Cambridge, Harvard, Oxford, and Stanford are widely recognized. However, one could argue that institutional reputation is not a particularly useful heuristic for finding quality publications within a specific research field.
Our results show that the expected citability of scientific publications published by (i) the researchers in a department, (ii) a given scientific journal, or (iii) a single researcher can be set on the single scale defined by µ. Thus, for a researcher whose publications are characterized by a very high µ, authorship of a publication may give a stronger quality signal about the publication than the journal in which the study is being published. Conversely, for an unknown researcher the strongest quality signal is likely to be the journal where the research is being published or the institution the researcher is affiliated with. Our results thus provide strong evidence for the validity of the heuristics used by most researchers and clarify the conditions under which they are appropriate.

Model Fitting and Hypothesis Testing
We estimate the discrete lognormal model parameters of Eq. (1) for all 1,283 researchers in our database using a maximum likelihood estimator [38]. We then test the goodness of the fit at an individual level using the χ² statistical test. We bin the empirical data in such a way that there are at least 5 expected observations per bin. To assess significance we calculate the χ²_o statistic for each researcher and then, for each of them, re-sample their citation records using bootstrap (1,000 samples) and calculate a new value of the statistic χ²_i (i = 1, ..., 1,000). We then extract a p-value by comparing the observed statistic χ²_o with the re-sampled χ² distribution. Finally, we use a multiple hypothesis correction [50], with a false discovery rate of 0.05, when comparing the model fits with the null hypothesis.
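The maximum likelihood step described above can be illustrated with a minimal sketch. It is not the estimator of [38]: it assumes that Eq. (1) is the probability that ⌊10^ψ⌋ equals n_a for Gaussian ψ (base-10 logarithms), and it swaps in a plain coarse-to-fine grid search for whatever optimizer the authors used. The function names (`dln_pmf`, `fit_dln`, `sample_dln`) and the search ranges are hypothetical.

```python
import math
import random
from collections import Counter


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def dln_pmf(n, mu, sigma):
    """Assumed discrete lognormal: P(floor(10**psi) == n) with psi ~ N(mu, sigma)."""
    upper = normal_cdf((math.log10(n + 1) - mu) / sigma)
    # For n == 0 the lower limit is log10(0) = -infinity, whose CDF is 0.
    lower = normal_cdf((math.log10(n) - mu) / sigma) if n > 0 else 0.0
    return upper - lower


def log_likelihood(histogram, mu, sigma):
    """Log-likelihood of a {citation count: frequency} histogram."""
    # Probabilities are clamped to avoid log(0) far from the optimum.
    return sum(freq * math.log(max(dln_pmf(n, mu, sigma), 1e-300))
               for n, freq in histogram.items())


def fit_dln(counts, mu_range=(-1.0, 4.0), sigma_range=(0.05, 3.0),
            steps=20, rounds=4):
    """Grid-search maximum likelihood estimate of (mu, sigma).

    Each round evaluates a (steps + 1) x (steps + 1) grid and then zooms in
    on the cell around the best point found so far.
    """
    histogram = Counter(counts)
    mu_lo, mu_hi = mu_range
    s_lo, s_hi = sigma_range
    best = (-math.inf, None, None)
    for _ in range(rounds):
        for i in range(steps + 1):
            mu = mu_lo + (mu_hi - mu_lo) * i / steps
            for j in range(steps + 1):
                sigma = s_lo + (s_hi - s_lo) * j / steps
                ll = log_likelihood(histogram, mu, sigma)
                if ll > best[0]:
                    best = (ll, mu, sigma)
        mu_step = (mu_hi - mu_lo) / steps
        s_step = (s_hi - s_lo) / steps
        mu_lo, mu_hi = best[1] - mu_step, best[1] + mu_step
        s_lo, s_hi = max(1e-3, best[2] - s_step), best[2] + s_step
    return best[1], best[2]


def sample_dln(n_pubs, mu, sigma, rng):
    """Draw synthetic citation counts under the same assumed model."""
    return [int(10 ** rng.gauss(mu, sigma)) for _ in range(n_pubs)]
```

For instance, fitting `sample_dln(2000, 1.3, 0.5, random.Random(0))` with `fit_dln` should recover values close to the generating parameters. The bootstrap χ² test would then be layered on top of the fitted probabilities.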

Generation of Theoretical Performance Indicators
For each discipline we take the average value of σ̂ and 20 equally spaced values of µ between 0.5 and 2.0. We then generate 1,000 datasets of 50 and 200 publications by random sampling from Eq. (1). We then fit the model individually to these 2,000 synthetic datasets and extract the h-index, average number of citations, total number of citations, and median number of citations to publications with at least one citation. Finally, for each value of µ, we calculate the average and the 95% confidence interval of all the indicators.
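The procedure above can be sketched as follows. This is a minimal illustration under stated assumptions: citation counts are drawn as ⌊10^ψ⌋ with Gaussian ψ (our assumed reading of Eq. (1)), the 95% interval is taken as the empirical 2.5th to 97.5th percentile range across datasets, and the model re-fitting step is omitted. All function names are hypothetical.

```python
import random
import statistics


def sample_dln(n_pubs, mu, sigma, rng):
    """Draw citation counts as floor(10**psi), psi ~ N(mu, sigma) (assumed model)."""
    return [int(10 ** rng.gauss(mu, sigma)) for _ in range(n_pubs)]


def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def indicators(citations):
    """The four indicators named in the text for one synthetic dataset."""
    cited = [c for c in citations if c >= 1]
    return {
        "h_index": h_index(citations),
        "mean": statistics.mean(citations),
        "total": sum(citations),
        "median_cited": statistics.median(cited) if cited else 0,
    }


def summarize(values):
    """Mean and empirical 95% interval (2.5th and 97.5th percentiles)."""
    ordered = sorted(values)
    lo = ordered[int(0.025 * (len(ordered) - 1))]
    hi = ordered[int(0.975 * (len(ordered) - 1))]
    return statistics.mean(ordered), lo, hi


def theoretical_indicators(mu, sigma, n_pubs, n_datasets=1000, seed=0):
    """Average each indicator over many synthetic datasets of n_pubs publications."""
    rng = random.Random(seed)
    rows = [indicators(sample_dln(n_pubs, mu, sigma, rng))
            for _ in range(n_datasets)]
    return {key: summarize([row[key] for row in rows]) for key in rows[0]}


# 20 equally spaced values of mu between 0.5 and 2.0, as in the text.
MU_GRID = [0.5 + 1.5 * k / 19 for k in range(20)]
```

Calling `theoretical_indicators(mu, sigma, 50)` and `theoretical_indicators(mu, sigma, 200)` for each `mu` in `MU_GRID`, with `sigma` set to a discipline's average σ̂, would give curves of the kind compared in Fig. 5.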

Manipulation Procedure for h-index
We try to increase the h-index of a researcher by self-citations alone, i.e., we assume the researcher does not receive citations from other sources during this procedure. The procedure works by adding only the minimum required citations to those publications that would cause the h-index to increase. Consider researcher John Doe, who has 3 publications with {n_a} = (2, 2, 5). Doe's h is 2. Assuming those publications do not get cited by other researchers during this time period, to increase h by 1, Doe needs to publish only one additional publication with two self-citations; to increase h by 2 he must instead produce five publications with a total of eight self-citations, four of which go to one of the additional five publications. We execute this procedure for all researchers in the database until they reach an h-index of 100.
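The John Doe example can be checked with a short sketch. It is our own greedy reading of the procedure, not necessarily the authors' implementation. It assumes that each new publication cites any given publication at most once and never cites itself, and that no outside citations arrive. The function name `min_self_citations` is hypothetical.

```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def min_self_citations(citations, target_h):
    """Return (self_citations, new_publications) needed to reach target_h.

    The cheapest route tops up the target_h most-cited publications to
    target_h citations each; if the researcher has fewer than target_h
    publications, the missing ones are new publications starting at zero.
    """
    ranked = sorted(citations, reverse=True)
    core = ranked[:target_h]
    n_new_core = target_h - len(core)
    deficits = [max(0, target_h - c) for c in core]
    total = sum(deficits) + n_new_core * target_h
    if total == 0:
        return 0, 0
    # An existing publication short by d citations needs d distinct citing papers.
    new_pubs = max(deficits, default=0)
    # A new publication in the core needs target_h citing papers besides itself.
    if n_new_core > 0:
        new_pubs = max(new_pubs, target_h + 1)
    return total, new_pubs


doe = [2, 2, 5]
print(h_index(doe))                # 2
print(min_self_citations(doe, 3))  # (2, 1)
print(min_self_citations(doe, 4))  # (8, 5)
```

The two results match the counts quoted in the text: one publication carrying two self-citations for h = 3, and five publications carrying eight self-citations for h = 4.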

Manipulation Procedure for µ
The manipulation of µ is based on Eq. (3). We try to change a researcher's µ by increasing the median number of citations to publications which have at least one citation already. We consider only self-citations originating from secondary publications, i.e., publications that will not get cited. For a given corpus of publications we first define a target increase in median, x, and then calculate the number of self-citations needed to increase the current median by x citations and the corresponding number of secondary publications. We then take the initial corpus of publications and attempt to increase the median citation by x + 1. We repeat this procedure until we reach an increase in median citation of 2000.
Figure S1. Dependence of µ̂ on number of publications at the individual level. We fit the model to 1,000 randomized subsets of each researcher's publication list and compare the µ̂ obtained from fitting each subset of 10, 50, and 100 publications with the µ̂ associated with the complete publication list. Then, for each researcher and subset size, we calculate a z-score using the mean and standard deviation of the "sub-µ̂". For N_p ≥ 50, the dependence on sample size is negligible for most researchers. Researchers with N_p < 100 are omitted from the calculation on the subset of size 100.
Figure S2. Dependence of σ̂ estimates on number of publications at the individual level. We use the same procedure as in Fig. S1, except here we show the results for the dependence of σ̂ on sample size. Estimates of σ̂ are more dependent on sample size than µ̂. However, as in the case of µ̂, the dependence of σ̂ on sample size decays rapidly with increasing sample size. Researchers with N_p < 100 are omitted from the calculation on the subset of size 100.
Figure S3. Susceptibility of impact measures to manipulation. We used the same procedure as in Fig. 6, except here we show the required number of publications with self-citations that researchers need to publish in order to increase their indicators. Other details are the same as in Fig. 6.
Figure S4. Comparison of σ̂ and f_s across departments, journals, and researchers. We show the maximum likelihood fitted σ̂ (top) and the fraction of secondary publications (bottom) for chemistry departments and chemistry journals in select years, and for all chemistry researchers in our database. The black horizontal dashed lines mark the value of the corresponding parameter for the Journal of the American Chemical Society in 1995. For clarity, we do not show σ̂ for 19 journals and 9 researchers that are outliers.
Table S1. Individual lognormal parameters show no dependence on N_p. For each researcher within each of the seven disciplines we perform least-squares linear regression between the lognormal parameters µ̂ and σ̂, and log10(N_p). We used a permutation test to calculate the p-values: for each set of pairs, (µ̂, N_p) and (σ̂, N_p), we performed 10,000 random swaps of all N_p and subsequent regression; we obtained a p-value by comparing the original slope of the fit with the distribution of the permuted slopes. * p < 0.05/7 ∼ 0.0074.