## Abstract

How to quantify the impact of a researcher’s or an institution’s body of work is a matter of increasing importance to scientists, funding agencies, and hiring committees. The use of bibliometric indicators, such as the *h*-index or the Journal Impact Factor, has become widespread despite their known limitations. We argue that most existing bibliometric indicators are inconsistent, biased, and, worst of all, susceptible to manipulation. Here, we pursue a principled approach to the development of an indicator to quantify the scientific impact of both individual researchers and research institutions grounded in the functional form of the distribution of the asymptotic number of citations. We validate our approach using the publication records of 1,283 researchers from seven scientific and engineering disciplines and the chemistry departments at the 106 U.S. research institutions classified as “very high research activity”. Our approach has three distinct advantages. First, it accurately captures the overall scientific impact of researchers at all career stages, as measured by asymptotic citation counts. Second, unlike other measures, our indicator is resistant to manipulation and rewards publication quality over quantity. Third, our approach captures the time-evolution of the scientific impact of research institutions.

**Citation: **Moreira JAG, Zeng XHT, Amaral LAN (2015) The Distribution of the Asymptotic Number of Citations to Sets of Publications by a Researcher or from an Academic Department Are Consistent with a Discrete Lognormal Model. PLoS ONE 10(11):
e0143108.
https://doi.org/10.1371/journal.pone.0143108

**Editor: **Tobias Preis,
University of Warwick, UNITED KINGDOM

**Received: **August 12, 2015; **Accepted: **October 30, 2015; **Published: ** November 16, 2015

**Copyright: ** © 2015 Moreira et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability: **All non-copyrighted data will be available at Figshare: http://dx.doi.org/10.6084/m9.figshare.1591864.

**Funding: **JAGM was funded by Fundação para a Ciência e Tecnologia Grant No. SFRH-BD-76115-2011 (http://www.fct.pt/). LANA was funded by the Department of Defense’s Army Research Office Grant No. W911NF-14-1-0259 (www.arl.army.mil) and by the John Templeton Foundation Award No. FP053369-A//39147 (https://www.templeton.org/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests: ** The authors have declared that no competing interests exist.

## Introduction

The explosive growth in the number of scientific journals and publications has outstripped researchers’ ability to evaluate them [1]. To choose what to browse, read, or cite from a huge and growing collection of scientific literature is a challenging task for researchers in nearly all areas of Science and Technology. In order to search for worthwhile publications, researchers are thus relying more and more on heuristic proxies—such as author and journal reputations—that signal publication quality.

The introduction of the *Science Citation Index* (SCI) in 1963 [2] and the establishment of bibliographic databases spurred the development of bibliometric measures for quantifying the impact of individual researchers, journals, and institutions. Various bibliometric indicators have been proposed as measures of impact, including such notorious examples as the Journal Impact Factor and the *h*-index [3, 4]. However, several studies revealed that these measures can be inconsistent, biased, and, worst of all, susceptible to manipulation [5–15]. For example, the limitations of the popular *h*-index include its dependence on discipline and on career length [16].

In recent years, researchers have proposed a veritable alphabet soup of “new” metrics—the *g*-index [17], the *R*-index [18], the *ch*-index [19], among others—most of which are *ad-hoc* heuristics, lacking insight about why or how scientific publications accumulate citations.

The onslaught of dubious indicators based on citation counts has spurred a backlash and the introduction of so-called “altmetric” indicators of scientific performance. These new indicators completely disregard citations, considering instead such quantities as number of article downloads or article views, and number of “shares” on diverse social platforms [20–22]. Unfortunately, new research is showing that altmetrics are likely to reflect popularity rather than impact, that they have incomplete coverage of the scientific disciplines [23, 24], and that they are *extremely susceptible to manipulation*. For example, inflating the findings of a publication in the abstract can lead to misleading press reports [25], and journals’ electronic interfaces can be designed to inflate article views and/or downloads [26].

Citations are the currency of scientific research. In theory, they are used by researchers to recognize prior work that was crucial to the study being reported. However, citations are also used to make the research message more persuasive, to refute previous work, or to align with a given field [27]. To complicate matters further, the various scientific disciplines differ in their citation practices [28]. Yet, despite their limitations, citations from articles published in reputable journals remain the most significant quantity with which to build indicators of scientific impact [12].

It behooves us to develop a measure that is based on a thorough understanding of the citation accumulation process and grounded in rigorous statistical validation. Some researchers have taken steps in this direction. Examples include the ranking of researchers using PageRank [29] or the beta distribution [30], and the re-scaling of citation distributions from different disciplines onto a universal curve using the lognormal distribution [31].

One crucial aspect of the process of citation accumulation is that it takes a long time to reach a steady state [32]. This reality is often ignored in many analyses and thus confounds the interpretation of most measured values. Indeed, the lag between time of publication and perception of impact is becoming increasingly relevant. For example, faced with increasingly large pools of applicants, hiring committees need to be able to find the most qualified researchers for the position in an efficient and timely manner [33, 34]. To our knowledge, only a few attempts have been made in developing indicators that can predict future impact using citation measures [35, 36] and those have had limited success [37].

Here, we depart from previous efforts by developing a principled approach to the quantification of scientific impact. Specifically, we demonstrate that the distribution of the asymptotic number of accumulated citations to publications by a researcher or from a research institution is consistent with a discrete lognormal model [32, 38]. We validate our approach with two datasets acquired from Thomson Reuters’ Web of Science (WoS):

- Manually disambiguated citation data pertaining to researchers at the top United States (U.S.) research institutions across seven disciplines [39]: chemical engineering, chemistry, ecology, industrial engineering, material science, molecular biology, and psychology;
- Citation data from the chemistry departments of 106 U.S. institutions classified as “very high research activity”.

Significantly, our findings enable us to develop a measure of scientific impact with desirable properties.

## The Data

We perform our first set of analyses on the dataset described by Duch et al. [39]. This dataset contains the disambiguated publication records of 4,204 faculty members at some of the top U.S. research universities in seven scientific disciplines: chemical engineering, chemistry, ecology, industrial engineering, material science, molecular biology, and psychology (see [39] for details about data acquisition and validation). We consider here only 230,964 publications that were in press by the end of 2000. We do this so that every publication considered has had a time span of at least 10 years for accruing citations [38] (the researcher’s publication dataset was gathered in 2010).

We perform our second set of analyses on the publication records of the chemistry departments at the top U.S. research institutions according to [40]. Using the publications’ address fields, we identified 382,935 total publications from 106 chemistry departments that were in press by the end of 2009 (the department’s publication dataset was gathered in 2014).

In our analyses we distinguish between “primary” publications, which report original research findings, and “secondary” publications, which analyze, promote or compile research published elsewhere. We identify as primary publications those classified by WoS as “Article”, “Letter”, or “Note” and identify all other publications types as secondary publications.

Moreover, to ensure that we have enough statistical power to determine the significance of the model fits, we restrict our analysis to researchers with at least 50 primary research publications. These restrictions reduce the size of the researchers dataset to 1,283 researchers and 148,878 publications. All 106 departments in our dataset have a total of more than 50 primary research publications.

## The Distribution of the Asymptotic Number of Citations

Prior research suggests that a lognormal distribution can be used to approximate the steady-state citation profile of a researcher’s aggregated publications [31, 41]. Stringer et al. demonstrated that the distribution of the number *n*(*t*) of citations to publications published in a given journal in a given year converges to a stationary functional form after about ten years [32]. This result was interpreted as an indication that the publications published in a single journal have a characteristic citation propensity [42] which is captured by the distribution of the “ultimate” number of citations. Here, we investigate the asymptotic number of citations *n*_{a} to the publications of an individual researcher as well as the set of all researchers in a department at a research institution.

We hypothesize that *n*_{a} is a function of a latent variable *ψ* representing a publication’s “citability” [43]. The citability *ψ* results from the interplay of several, possibly independent, variables such as timeliness of the work, originality of approach, strength of conclusion, reputation of authors and journals, and potential for generalization to other disciplines, just to name a few [44, 45]. In the simplest case, citability will be additive in all these variables, in which case the applicability of the central limit theorem implies that *ψ* will be a Gaussian variable, *ψ* ∈ *N*(*μ*_{a}, *σ*_{a}), where *μ*_{a} and *σ*_{a} are respectively the mean and standard deviation of the citability of the publications by researcher *a*. Therefore, the impact of a researcher’s body of work is described by a distribution characterized by just two parameters, *μ* and *σ*. Similarly, because in the U.S. departments hire faculty based on their estimated quality, the researchers associated with a department will presumably be similar in stature or potential.

Unlike citations, which are observable and quantifiable, the variables contributing to *ψ* are neither easily observable nor easy to quantify. Moreover, mapping *ψ* into citations is not a trivial matter. Citation counts span many orders of magnitude, with the most highly cited publications having tens of thousands of citations [46]. Large-scale experiments on cultural markets indicate that social interactions often create a “rich get richer” dynamics, far distancing the quality of an underlying item from its impact [47]. Citation dynamics are no different. For example, Duch et al. recently showed that the *h*-index has a power-law dependence on the number of publications *N*_{p} of a researcher [39]. Here, we reduce the potential distortion of citation-accruing dynamics by focusing on the logarithm of *n*_{a}. In effect, we take *n*_{a} to be the result of a multiplicative process of the same variables determining *ψ*. Thus, we can calculate the probability *p*_{dln}(*n*_{a}) that a researcher or department will have a primary research publication with *n*_{a} citations, as an integral over *ψ*:
*p*_{dln}(*n*_{a}|*μ*, *σ*) = ∫_{ln *n*_{a}}^{ln(*n*_{a}+1)} (1/(*σ*√(2*π*))) exp[−(*ψ* − *μ*)²/(2*σ*²)] d*ψ* (1)
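For concreteness, the discrete lognormal probability can be sketched numerically. The following is a minimal illustration, assuming the discretization integrates the Gaussian citability over [ln *n*, ln(*n* + 1)); it is our reading of the model, not code from the paper:

```python
import math

def p_dln(n, mu, sigma):
    """Probability of n citations under a discrete lognormal model:
    Gaussian citability psi ~ N(mu, sigma) integrated over
    [ln(n), ln(n + 1)).  Sketch only; the paper's exact
    discretization and normalization may differ."""
    if n < 1:
        return 0.0
    cdf = lambda x: 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return cdf(math.log(n + 1)) - cdf(math.log(n))

# The probabilities for n >= 1 sum to the Gaussian mass above ln(1) = 0.
mass = sum(p_dln(n, 1.2, 1.0) for n in range(1, 200_000))
```

Under this reading, the two parameters *μ* and *σ* fully determine the citation distribution of a body of work, which is what makes the model fit to individual researchers tractable.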

Most researchers also communicate their ideas to their peers via secondary publications such as conference proceedings which, in many disciplines, are mainly intended to promote related work published elsewhere. Some secondary publications will have significant timeliness, in particular review papers and editorial materials, and therefore will likely be cited too. Most of them, however, will not be cited at all. When accounting for secondary publications, Eq (1) has to be generalized as:

*p*(*n*_{a}) = *f*_{s} *p*_{s}(*n*_{a}|**θ**) + (1 − *f*_{s}) *p*_{dln}(*n*_{a}|*μ*, *σ*) (2)

where *f*_{s} is the fraction of secondary publications in a body of work and *p*_{s}(*n*_{a}|**θ**) represents the probability distribution, characterized by parameters **θ** and not necessarily lognormal, of *n*_{a} for secondary research publications. We found that in practice Eq (2) can be well approximated by:

*p*(*n*_{a}) ≈ *f*_{s} *δ*_{*n*_{a},0} + (1 − *f*_{s}) *p*_{dln}(*n*_{a}|*μ*′, *σ*′)

where *δ* is the Kronecker delta. Surprisingly, we found that *μ*′ ≈ *μ* and *σ*′ ≈ *σ*, suggesting that the secondary publications that do accrue citations have citation characteristics that are not significantly different from those of primary publications.
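The zero-inflated approximation can be sketched as a two-component mixture. This is an illustrative sketch under the same assumed discretization as Eq (1); `p_dln` is a hypothetical helper, not the paper's code:

```python
import math

def p_dln(n, mu, sigma):
    # Discrete lognormal: Gaussian citability integrated over [ln n, ln(n+1)).
    if n < 1:
        return 0.0
    cdf = lambda x: 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return cdf(math.log(n + 1)) - cdf(math.log(n))

def p_mixture(n, mu, sigma, f_s):
    """Approximation to Eq (2): a fraction f_s of publications
    (uncited secondary work) contributes a point mass at n = 0;
    the remainder follows the discrete lognormal."""
    point_mass = f_s if n == 0 else 0.0
    return point_mass + (1.0 - f_s) * p_dln(n, mu, sigma)
```

Setting *f*_{s} = 0 recovers Eq (1) exactly, which is why fitting the mixture to primary publications alone yields the same *μ* and *σ*.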

## Results

Fig 1 shows the cumulative distribution of citations to primary research publications of two researchers in our database (see S1 File for the results for all 1,283 researchers) and two chemistry departments. Using a *χ*^{2} goodness-of-fit test with re-sampling [48], we find that we can reject the discrete lognormal model, Eq (1), for only 2.88% of researchers and 1.13% of departments in our database. The results of our statistical analysis demonstrate that a discrete lognormal distribution with parameters *μ* and *σ* provides an accurate description of the distribution of the asymptotic number of citations for a researcher’s body of work and for the publications from an academic department.

We fit Eq (1) to all citations accrued by 2010 to publications published by 2000 for two researchers (**top row**), and to all citations accrued by 2013 to publications published in 2000 for two chemistry departments (**bottom row**). The red line shows the maximum likelihood fit of Eq (1) to the data (blue circles). The light red region represents the 95% confidence interval estimated using bootstrap (1000 generated samples per empirical data point). We also show the number of publications *N*_{p} in each set and the parameter values of the individual fits.

Fig 2 displays the sample characteristics of the fitted parameters. The median value of *μ* obtained for the different disciplines lies between 1.0 and 1.6. Using data reported in [28] we find a significant correlation (*τ*_{Kendall} = 0.62, *p* = 0.069) between the median value of *μ* for a discipline and the total number of citations to journals in that discipline (Fig 3). This correlation suggests that *μ* depends on the typical number of citations to publications within a discipline. This dependence on discipline size can in principle be corrected by a normalization factor [14, 31, 49].

We show the maximum likelihood fitted model parameters (**top** and **center**) and the fraction of secondary publications (**bottom**). The black horizontal dashed line indicates the median over all researchers. For clarity, we do not show the parameter values for 9 researchers that are outliers.

We use Rosvall et al.’s [28] reported values of the relative number of citations to publications in journals of several disciplines as a proxy for relative field size and compare them with the median value of *μ* in each discipline. A Kendall rank-correlation test yields *τ*_{K} = 0.62 with *p* = 0.069. This correlation suggests that *μ* depends on the typical number of citations of a discipline.

We also plot the fraction of secondary publications, *f*_{s}, for all the researchers. We find that for the median researcher nearly a fourth of all publications are secondary, but intra-discipline variation is high. Inter-discipline variability is also high: 17% of the publications of a typical researcher in chemistry are secondary, whereas 60% of the publications of a typical researcher in industrial engineering are secondary.

### Reliability of Estimation

We next investigate the dependence of the parameter estimates on number of publications, *N*_{p}, both at the individual level—testing the effect of sample size—and at the discipline level—testing overall dependence on *N*_{p}. To test for sample size dependence, we fit the model to subsets of a researcher’s publication list. We find that estimates of *σ* are more sensitive to sample size than estimates of *μ* (S1 and S2 Figs). However, this dependence becomes rapidly negligible as the sample size approaches the minimum number of publications we required in creating our sample (*N*_{p} ≥ 50).

Next, we test whether, at the discipline level, there is any dependence of the estimated parameters on *N*_{p}. We find no statistically significant correlation, except for a very weak dependence (R^{2} ∼ 0.035, *p* = 0.0052) for chemical engineering (S1 Table). This is in stark contrast with the *h*-index, which exhibits a marked dependence on the number of publications [16].

Then, we test for variation of the estimated parameter values along a researcher’s career. To this end, we order each researcher’s publication record chronologically, divide it into three sets with equal numbers of publications, and fit the model to each set. Each set represents the citability of the publications authored at a particular career stage. Time trends in the estimated values of *μ* would indicate that the citability of a researcher’s work changes over time. We find such a change for 25% of all researchers. For over 64% of the researchers whose citability changes over time, we find that *μ* increases (Table 1).

In general, a department has many more publications than any single researcher. Thus, we are able to apply the model from Eq (1) to each year’s worth of departmental publications. This fine temporal resolution enables us to investigate whether there is any time-dependence in the citability of the publications from a department. Fig 4 shows the time-evolution of the fitted parameters for the chemistry departments at four typical research institutions. We see that both *μ* (circles) and *σ* (vertical bars) remain remarkably stable over the period considered.

Each circle and bar represents, respectively, *μ* and *σ* for a given year of publications. We estimate the parameters in Eq (1) for sets of departmental publications using a “sliding window” of 3 years. Fits for which we cannot reject the hypothesis that the data are consistent with a discrete lognormal distribution are colored green. We also show each department’s average value of *μ* over the period considered (orange dashed lines).

### Development of an Indicator

In the following, we compare the effectiveness of *μ* as an impact indicator with that of other indicators. First, we test the extent to which the value of *μ*_{i} for a given researcher is correlated with the values of other indicators for the same researcher. In order to understand how the number of publications *N*_{p} influences the values of other metrics, we generate thousands of synthetic samples of *n*_{a} for different values of *N*_{p} and *μ*_{i}, and a fixed value of *σ* for each discipline. We find that *μ* is tightly correlated with several other measures, especially with the median number of citations (Fig 5). Indeed, *μ* can be estimated from the median number of citations:

*μ* ≈ ln(median{*n*_{a}}) (3)

This close relation between the mean parameter and the logarithm of the median further supports our hypothesis of a lognormal distribution for the asymptotic number of citations to primary publications by a researcher.
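The relation in Eq (3) can be checked with a quick simulation. This is an illustrative sketch: integer flooring and the restriction to cited papers introduce a small bias, so the recovery is only approximate:

```python
import math
import random

random.seed(42)
mu_true, sigma_true = 1.5, 1.0

# Citations as the floored exponential of a Gaussian "citability".
cites = [int(math.exp(random.gauss(mu_true, sigma_true))) for _ in range(20_000)]

# Median over publications with at least one citation, as in the text.
cited = sorted(c for c in cites if c >= 1)
median_cites = cited[len(cited) // 2]
mu_est = math.log(median_cites)  # Eq (3): mu is roughly ln(median)
```

For a continuous lognormal the median is exactly e^{*μ*}; the discreteness of citation counts is what makes the estimate approximate rather than exact.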

We generate 1000 synthetic datasets for each of 20 values of *μ* from 0.5 to 2.0, inclusive, and for *N*_{p} = 50 (blue) and *N*_{p} = 200 (red). We use the average *σ* of all researchers in chemistry. For each pair of values of *μ* and *N*_{p} we calculate the average value and 95% confidence interval. The colored circles indicate the observed values of the corresponding metrics for chemistry, which have been grouped according to their number of publications *N*_{p}. Values for 22 researchers fall outside of the figures’ limits: 3 in A, 7 in B, 4 in C, 3 in D. (A) The total number of citations depends dramatically on *N*_{p}, which in turn depends strongly on career length, and can be influenced by just a few highly cited publications. (B) The average number of citations is less susceptible to changes in *N*_{p} but can still be influenced by a small number of highly cited publications. (C) The *h*-index, like the total number of citations, is strongly dependent on *N*_{p}. (D) The median number of citations to publications, like the average, is not very dependent on *N*_{p}, and can capture most of the observed behavior.

An important factor to consider when designing a bibliometric indicator is its susceptibility to manipulation. Both the number of publications and the total or average number of citations are easily manipulated, especially with the ongoing proliferation of journals of dubious reputation [51, 52]. Indeed, the *h*-index was introduced as a metric that resists manipulation. However, it is a straightforward exercise to show that one could achieve any desired value of *h* exclusively through self-citations. Indeed, because the *h*-index does not account for the effect of self-citations, it is rather susceptible to manipulation, especially by researchers with low values of *h* [53, 54].

In order to determine the true susceptibility of the *h*-index to manipulation, we devise a method to raise a researcher’s *h*-index using the least possible number of self-citations (see Materials and Methods for details). Our results suggest that increasing the *h*-index by a small amount is no hard feat for researchers with the ability to quickly produce new articles (Fig 6A).

**Bottom panel**: For each researcher in the database, we add publications with self-citations until we reach the desired value of the index (see main text for details). The dashed black, dotted-dashed black and dotted white lines indicate the number of publications required to increase the index value by 10%, 50% and 100%, respectively. The solid diagonal black line indicates where the current value of the index is equal to the manipulated value. The dark blue vertical line represents the average value of the indicator amongst all researchers in our database. **Top panel**: Distributions of the current *h*-index (left) and *μ* (right) for all researchers in the database.

Our proposed indicator, *μ*, is far more difficult to manipulate. Because it has a more complex dependence on the number of citations than the *h*-index, to increase *μ* in an efficient manner we use a process whereby we attempt to increase the median number of citations of a researcher’s work (see Materials and Methods for details). Specifically, we manipulated *μ* for all the researchers by increasing their median number of citations. Remarkably, to increase *μ* by a certain factor one needs at least 10 times more self-citations than one would need in order to increase the *h*-index by the same factor (Fig 6B).

While a difference of 2 to 3 orders of magnitude in the number of required self-citations may seem surprising for a measure so correlated with citation numbers (Fig 5), the fact that *μ* actually depends on the citations to half of all primary publications by a researcher (Eq (3)) makes *μ* less susceptible than the *h*-index to manipulation of citation counts from a small number of publications. This view is also supported by the fact that adding citations may actually decrease *μ*, as we may be adding them to a publication that would not be expected to receive that number of citations under the lognormal model. As a result, manipulation of apparent scientific performance would be very difficult under a *μ*-based index.

### Comparison of Parameter Statistics

Finally, we estimate the parameters in Eq (1) for chemistry journals and compare the *μ* of chemistry departments and journals in selected years with that of all chemistry researchers in our database (Fig 7; see S4 Fig for the *σ* and *f*_{s} comparison). In order to make sense of this comparison, we must note a few aspects of the data. The researchers in the database were affiliated with the top 30 chemistry departments in the U.S., whereas the set of chemistry departments covers all the chemistry departments of very high research activity universities. Thus, it is natural that the typical *μ* of researchers is higher than that of departments. Not surprisingly, we find that *μ* is typically lowest for journals.

We show the maximum likelihood fitted *μ* for chemistry departments and chemistry journals in select years, and for all chemistry researchers in our database. The black horizontal dashed lines mark the value of the corresponding parameter for the *Journal of the American Chemical Society* in 1995. For clarity, we do not show *μ* for 23 journals that are outliers.

## Discussion

The ever-growing size of the scientific literature precludes researchers from following all developments from even a single sub-field. Therefore researchers need proxies of quality in order to identify which publications to browse, read, and cite. Three main heuristics are familiar to most researchers: institutional reputation, journal reputation, and author reputation.

Author reputation has the greatest limitations. Researchers are not likely to be known outside their (sub-)field and young researchers will not even be known outside their labs. Similarly, if we exclude a few journals with multidisciplinary reputations (Nature, Science, PNAS, NEJM), the reputation of a scientific journal is unlikely to extend outside its field. Institutional reputations are the most likely to be known broadly. Cambridge, Harvard, Oxford, and Stanford are widely recognized. However, one could argue that institutional reputation is not a particularly useful heuristic for finding quality publications within a specific research field.

Our results show that the expected citability of scientific publications published by (i) the researchers in a department, (ii) a given scientific journal, or (iii) a single researcher can be set on the single scale defined by *μ*. Thus, for a researcher whose publications are characterized by a very high *μ*, authorship of a publication may give a stronger quality signal about the publication than the journal in which the study is being published. Conversely, for an unknown researcher the strongest quality signal is likely to be the journal where the research is being published or the institution the researcher is affiliated with. Our results thus provide strong evidence for the validity of the heuristics used by most researchers and clarify the conditions under which they are appropriate.

## Materials and Methods

### Model Fitting and Hypothesis Testing

We estimate the discrete lognormal model parameters of Eq (1) for all 1,283 researchers in our database using a maximum likelihood estimator [38]. We then test the goodness of fit at the individual level using the *χ*^{2} statistical test. We bin the empirical data so that there are at least 5 expected observations per bin. To assess significance, we calculate the *χ*^{2} statistic for each researcher and then, for each of them, re-sample their citation records using bootstrap (1,000 samples) and calculate new values of the statistic (*i* = 1, ⋯, 1,000). We then extract a p-value by comparing the observed statistic with the re-sampled *χ*^{2} distribution. Finally, we use a multiple hypothesis correction [50], with a *false discovery rate* of 0.05, when comparing the model fits with the null hypothesis.
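A toy version of this re-sampled goodness-of-fit test might look as follows. The three-outcome model and the coarse binning are hypothetical stand-ins for the fitted citation model; only the bootstrap logic mirrors the procedure described above:

```python
import random

random.seed(0)

model = {0: 0.5, 1: 0.3, 2: 0.2}  # hypothetical fitted pmf (toy stand-in)
values, probs = list(model), list(model.values())

def chi2(observed, expected):
    # Pearson chi-square statistic over pre-defined bins.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def counts(sample):
    return [sum(1 for x in sample if x == v) for v in values]

n = 500
data = random.choices(values, weights=probs, k=n)
expected = [p * n for p in probs]
observed_stat = chi2(counts(data), expected)

# Bootstrap: re-sample from the fitted model to build the reference
# distribution of the statistic, then locate the observed value in it.
boot = [chi2(counts(random.choices(values, weights=probs, k=n)), expected)
        for _ in range(1000)]
p_value = sum(b >= observed_stat for b in boot) / len(boot)
```

The model is rejected when `p_value` falls below the threshold implied by the false-discovery-rate correction.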

### Generation of Theoretical Performance Indicators

For each discipline we take the average value of *σ* and 20 equally spaced values of *μ* between 0.5 and 2.0. We then generate 1,000 datasets of 50 and 200 publications each by random sampling from Eq (1). We then fit the model individually to these synthetic datasets and extract the *h*-index, average number of citations, total number of citations, and median number of citations to publications with at least one citation. Finally, for each value of *μ*, we calculate the average and the 95% confidence interval of all the indicators.
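The generation of synthetic indicators can be sketched as follows. As an assumption for illustration, we sample the continuous lognormal and floor to integers as a stand-in for sampling Eq (1) directly; the indicator definitions are the standard ones, not code from the paper:

```python
import math
import random

random.seed(1)

def h_index(cites):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(cites, reverse=True)
    return sum(1 for i, c in enumerate(ranked, start=1) if c >= i)

def synthetic_citations(mu, sigma, n_pubs):
    # Floored exponential of Gaussian citability (illustrative sampler).
    return [int(math.exp(random.gauss(mu, sigma))) for _ in range(n_pubs)]

cites = synthetic_citations(1.2, 1.0, 200)
cited = sorted(c for c in cites if c >= 1)
indicators = {
    "total": sum(cites),
    "average": sum(cites) / len(cites),
    "h": h_index(cites),
    "median_cited": cited[len(cited) // 2],
}
```

Repeating this for many (*μ*, *N*_{p}) pairs and averaging yields the theoretical curves of Fig 5.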

### Manipulation Procedure for *h*-index

We try to increase the *h*-index of a researcher by self-citations alone, i.e., we assume the researcher does not receive citations from other sources during this procedure. The procedure works by adding only the minimum required citations to those publications that would cause the *h*-index to increase. Consider researcher John Doe, who has 3 publications with {*n*_{a}} = (2, 2, 5). Doe’s *h* is 2. Assuming those publications do not get cited by other researchers during this time period, to increase *h* by 1 Doe needs to publish only one additional publication with two self-citations; to increase *h* by 2 he must instead produce five publications with a total of eight self-citations, four of which go to one of the five additional publications. We execute this procedure for all researchers in the database until they reach an *h*-index of 100.
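The greedy scheme described above can be sketched as follows. This is a minimal illustration of the idea, not the paper's implementation; each new publication self-cites every paper still short of the target:

```python
def h_index(cites):
    ranked = sorted(cites, reverse=True)
    return sum(1 for i, c in enumerate(ranked, start=1) if c >= i)

def manipulate_h(cites, target_h):
    """Add publications whose only role is to self-cite papers below the
    target h; return (new publications, self-citations) used."""
    cites = list(cites)
    new_pubs = self_cites = 0
    while h_index(cites) < target_h:
        # The target_h most-cited papers must all reach target_h citations.
        top = sorted(range(len(cites)), key=lambda i: -cites[i])[:target_h]
        deficient = [i for i in top if cites[i] < target_h]
        new_pubs += 1
        cites.append(0)          # the new publication starts uncited
        for i in deficient:      # one self-citation per deficient paper
            cites[i] += 1
            self_cites += 1
    return new_pubs, self_cites
```

Running it on the John Doe example, `manipulate_h([2, 2, 5], 3)` reproduces the one-publication, two-self-citation case, and `manipulate_h([2, 2, 5], 4)` the five-publication, eight-self-citation case.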

### Manipulation Procedure for *μ*

The manipulation of *μ* is based on Eq (3). We try to change a researcher’s *μ* by increasing the median number of citations to publications which already have at least one citation. We consider only self-citations originating from secondary publications, i.e., publications that will not themselves get cited. For a given corpus of publications we first define a target increase in the median, *x*, and then calculate the number of self-citations needed to increase the current median by *x* citations, together with the corresponding number of secondary publications. We then take the initial corpus of publications and attempt to increase the median citation count by *x* + 1. We repeat this procedure until we reach an increase in the median citation count of 2000.
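The cost of this manipulation can be illustrated with a small hypothetical helper, assuming an odd number of cited publications and the standard middle-element median:

```python
def self_cites_to_raise_median(cites, x):
    """Minimum self-citations needed so that the median of an odd-length
    list of citation counts rises by x: the top half of the papers
    (including the median paper) must all reach the new median value."""
    ranked = sorted(cites, reverse=True)
    k = len(ranked) // 2 + 1            # papers that must reach the target
    target = ranked[len(ranked) // 2] + x   # current median plus increase
    return sum(max(0, target - c) for c in ranked[:k])
```

Because roughly half of all cited publications must be topped up, the number of required self-citations grows with both the size of the corpus and the desired increase, unlike the *h*-index procedure, which only touches the few papers near the threshold.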

## Supporting Information

### S1 File. Distribution of the asymptotic number of citations for all 1,283 researchers.

For a detailed description of the plots see the caption in Fig 1.

https://doi.org/10.1371/journal.pone.0143108.s001

(PDF)

### S1 Fig. Dependence of *μ* estimates on number of publications at the individual level.

We fit the model to 1,000 randomized subsets of each researcher’s publication list and compare the *μ* obtained from fitting each subset of 10, 50, and 100 publications with the *μ* associated with the complete publication list. Then, for each researcher and subset size, we calculate a z-score using the mean and standard deviation of the subset estimates. For *N*_{p} ≥ 50, the dependence on sample size is negligible for most researchers. Researchers with *N*_{p} < 100 are omitted from the calculation on the subset of size 100.

https://doi.org/10.1371/journal.pone.0143108.s002

(TIFF)

### S2 Fig. Dependence of *σ* estimates on number of publications at the individual level.

We use the same procedure as in S1 Fig, except here we show the results for the dependence of *σ* on sample size. Estimates of *σ* are more dependent on sample size than estimates of *μ*. However, as in the case of *μ*, the dependence of *σ* on sample size decays rapidly with increasing sample size. Researchers with *N*_{p} < 100 are omitted from the calculation on the subset of size 100.

https://doi.org/10.1371/journal.pone.0143108.s003

(TIFF)

### S3 Fig. Susceptibility of impact measures to manipulation.

We used the same procedure as in Fig 6, except here we show the required number of publications with self-citations that researchers need to publish in order to increase their indicators. Other details are the same as in Fig 6.

https://doi.org/10.1371/journal.pone.0143108.s004

(TIFF)

### S4 Fig. Comparison of *μ* and *f*_{s} across departments, journals, and researchers.

We show the maximum-likelihood estimate of *μ* (**top**) and the fraction of secondary publications, *f*_{s} (**bottom**), for chemistry departments and chemistry journals in selected years, and for all chemistry researchers in our database. The black horizontal dashed lines mark the value of the corresponding parameter for the *Journal of the American Chemical Society* in 1995. For clarity, we omit values for 19 journals and 9 researchers that are outliers.

https://doi.org/10.1371/journal.pone.0143108.s005

(TIFF)

### S1 Table. Individual lognormal parameters show no dependence on *N*_{p}.

https://doi.org/10.1371/journal.pone.0143108.s006

(PDF)

### S2 Table. Individual discipline statistics of the lognormal model parameters.

https://doi.org/10.1371/journal.pone.0143108.s007

(PDF)
