A novel bibliometric index with a simple geometric interpretation

Trevor Fenner; Martyn Harris; Mark Levene; Judit Bar-Ilan

doi:10.1371/journal.pone.0200098

Abstract

We propose the χ-index as a bibliometric indicator that generalises the h-index. While the h-index is determined by the maximum square that fits under the citation curve of an author when plotting the number of citations in decreasing order, the χ-index is determined by the maximum area rectangle that fits under the curve. The height of the maximum rectangle is the number of citations c_k to the kth most-cited publication, where k is the width of the rectangle. The χ-index is then defined as $\chi = \sqrt{k\, c_k}$, for convenience of comparison with the h-index and other similar indices. We present a comprehensive empirical comparison between the χ-index and other bibliometric indices, focusing on a comparison with the h-index, by analysing two datasets: a large set of Google Scholar profiles and a small set of Nobel prize winners. Our results show that, although the χ and h indices are strongly correlated, they do exhibit significant differences. In particular, we show that, for these data sets, there is a substantial number of profiles for which χ is significantly larger than h. Furthermore, restricting these profiles to the cases when c_k > k or c_k < k corresponds to, respectively, classifying researchers as either tending to be influential, i.e. having many more than h citations, or tending to be prolific, i.e. having many more than h publications.

Citation: Fenner T, Harris M, Levene M, Bar-Ilan J (2018) A novel bibliometric index with a simple geometric interpretation. PLoS ONE 13(7): e0200098. https://doi.org/10.1371/journal.pone.0200098

Editor: Christos A. Ouzounis, CPERI, GREECE

Received: April 4, 2018; Accepted: June 19, 2018; Published: July 10, 2018

Copyright: © 2018 Fenner et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The Google Scholar Citation Data are from Radicchi and Castellano’s study “Analysis of bibliometric indicators for individual scholars in a large data set.” Scientometrics. 2013;97:627-637. The dataset is available from http://homes.soic.indiana.edu/filiradi/Data/gsc_data.tar.bz2. The second dataset (Nobel prize laureates) is available from figshare: https://figshare.com/s/e07c07c932ce36ab9343 (DOI: 10.6084/m9.figshare.6668174).

Funding: The authors received no specific funding for this work.

Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: Judit Bar-Ilan is an academic editor in PLOS ONE. This does not alter our adherence to PLOS ONE policies on sharing data and materials.

1 Introduction

The debate in bibliometrics on quality versus quantity in evaluating academic research performance is still an ongoing concern [1]. One perspective is to view the number of publications of a researcher (P) as a measure of quantity and the total number of citations to these publications (C) as a perceived measure of quality; several variants of these, such as the average number of citations per publication, the number of citations to the top or the 10th most cited publication, and the number of publications with at least 10 citations, have also been suggested [2]. Although these simple metrics tend to take into account only one facet of a researcher’s impact, several other bibliometric indices, such as the h-index [3], the g-index [4] and generalisations of these [5], combine both citation and publication counts.

An extensive review of the h-index and some of its variants was provided by Egghe in [6], and a comparison of 37 variants of the h-index was given by Bornmann et al. in [7]. In addition, Waltman and van Eck [8] discussed a number of inconsistencies of the h-index and its variants, and proposed a family of bibliometric indicators that do not suffer from these inconsistency problems. Of particular interest are extensions of the h-index that take into account the full publication list of a researcher, such as the tapered h-index [9]. Proposals for new variants of the h-index continue to appear, for example [10–13], as do comparisons and evaluations, for example [14, 15].

Nevertheless, the h-index and its variants do not normally take into account the full citation list of a researcher. This could be perceived as a drawback; however, the total citation count has the disadvantage of biasing the index in favour of researchers with very highly cited top publications or very many publications with a relatively small number of citations. We now review the h-index and some of its variants, and then introduce the χ-index, a new index that addresses some of the drawbacks mentioned.

The h-index of a researcher is the maximum number h of the researcher’s publications such that each has at least h citations [3]. Equivalently, consider the citation vector 〈c₁, c₂, …, c_n〉 of a researcher, where the c_i, the number of citations to publication i, are sorted in descending order, i.e. c_i ≥ c_j if i < j. Here we assume that c_i > 0 for all i, and that h is zero in the absence of any citations; this is consistent with defining the value of a bibliometric index of a researcher to be zero if none of the researcher’s publications have been cited [16]. The h-index is thus the largest rank h for which c_h ≥ h. The h-index is completely insensitive to the fact that a researcher’s top few publications may be very highly cited, and conversely also to a researcher having a fair number of publications whose number of citations is less than but close to h [17]. A suggested improvement over the h-index, which gives extra weight to highly cited publications, is the g-index. The g-index of a researcher is the largest rank g for which $\sum_{i=1}^{g} c_i \ge g^2$ [4]; it is easily shown that g ≥ h. A problem with the g-index is that it may still be biased since, if a researcher has a few publications that are very highly cited and the rest have very few citations, the g-index will still be high. This is because the g-index is equal to the largest rank g such that the average number of citations up to that rank is at least g. Suppose the h-index of a researcher is h; then the h-core is the set of the h most highly cited publications for this researcher. The A-index, which is the average number of citations to the publications in the h-core, i.e. $A = \frac{1}{h} \sum_{i=1}^{h} c_i$, was defined as an attempt to address the fact that the h-index does not take into account the total number of citations to publications in the h-core [18]. However, the A-index suffers from the fact that taking an average will, all other things being equal, often favour authors with fewer publications when they are highly cited.
To remedy this issue, the R-index has been proposed, where $R = \sqrt{\sum_{i=1}^{h} c_i}$ [18]. It is easy to see that h ≤ R ≤ A, since $R = \sqrt{h A}$ and A ≥ h. Nevertheless, the A and R indices, and to a lesser extent the g-index, ignore the effect of publications outside the h-core, which are also part of a researcher’s output. A recent proposal is the Euclidean-index [19] (which we call the E-index), designed to take account of the full list of an author’s cited publications; it is defined as the Euclidean norm of the citation vector, i.e. $E = \sqrt{\sum_{i=1}^{n} c_i^2}$.
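As a concrete reference for the definitions above, the following Python sketch computes h, g, A, R and E from a list of citation counts. It is a minimal illustration, not the authors' code: the function names are hypothetical, uncited publications are dropped (matching the convention c_i > 0), and g is capped at the number of cited publications n, since the text defines g as a rank.

```python
import math
from typing import List, Sequence


def citation_vector(citations: Sequence[int]) -> List[int]:
    """Cited publications only, sorted so that c_1 >= c_2 >= ... >= c_n."""
    return sorted((c for c in citations if c > 0), reverse=True)


def h_index(citations: Sequence[int]) -> int:
    """Largest rank h with c_h >= h; the ranks satisfying c_i >= i form a prefix."""
    c = citation_vector(citations)
    return sum(1 for i, ci in enumerate(c, start=1) if ci >= i)


def g_index(citations: Sequence[int]) -> int:
    """Largest rank g (capped at n in this sketch) with c_1 + ... + c_g >= g^2."""
    g, running_total = 0, 0
    for i, ci in enumerate(citation_vector(citations), start=1):
        running_total += ci
        if running_total >= i * i:
            g = i
    return g


def a_index(citations: Sequence[int]) -> float:
    """Average number of citations to the h-core (0 if h = 0)."""
    c, h = citation_vector(citations), h_index(citations)
    return sum(c[:h]) / h if h else 0.0


def r_index(citations: Sequence[int]) -> float:
    """Square root of the total number of citations to the h-core."""
    c, h = citation_vector(citations), h_index(citations)
    return math.sqrt(sum(c[:h]))


def e_index(citations: Sequence[int]) -> float:
    """Euclidean norm of the full citation vector."""
    return math.sqrt(sum(ci * ci for ci in citation_vector(citations)))
```

For the vector 〈10, 8, 5, 4, 3〉, for instance, this gives h = 4, g = 5, A = 27/4 = 6.75 and R = √27 ≈ 5.20, consistent with h ≤ R ≤ A.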

In order to motivate the χ-index, we first observe that, given a citation vector for a researcher, for any k ≤ n, the researcher has at least k publications with c_k or more citations. It follows that the h-index h satisfies c_h ≥ h and c_{h+1} ≤ h, i.e. c_{h′} ≤ h for all h′ > h. So, for example, if one author has a single publication with 100 citations and another has 10 publications each with 10 citations, then the h-index of the former is 1 while the h-index of the latter is 10. At the other extreme, an author with 100 publications, each with a single citation, has an h-index of 1. The argument for favouring publications with a higher number of citations is normally that of quality versus quantity. However, such an approach, on the one hand, disadvantages a researcher with a few very highly cited publications, who may have carried out some very influential seminal research, whilst, on the other hand, it also disadvantages a prolific researcher who may have many collaborators but fewer citations per publication. Avoiding the debate of number of citations versus number of publications, we propose an index for which all three aforementioned scenarios, (i) 1 publication with 100 citations, (ii) 10 publications with 10 citations each, and (iii) 100 publications with 1 citation each, are considered equally desirable. So the χ-index is essentially the largest product i c_i where 1 ≤ i ≤ n; however, for comparison purposes with the h-index, we will actually define the χ-index to be the square root of this, i.e. $\chi = \sqrt{\max_{1 \le i \le n} i\, c_i}$. Thus, in all three scenarios the χ-index of the researcher is 10; see Fig 1, which illustrates the three scenarios in a geometrical context. If we let k denote the value of i that maximises i c_i, we see that in all three cases the researcher has exactly k publications with c_k or more citations. It is clear that the h-index cannot be larger than the χ-index, since c_h ≥ h gives $h = \sqrt{h \cdot h} \le \sqrt{h\, c_h} \le \chi$.
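The three scenarios can be checked with a short Python sketch that computes h and the rectangle-based χ directly from the definition, returning the maximising rank k as well. The function names are hypothetical; the sketch assumes citation counts are given as a list of integers, drops uncited publications, and breaks ties between ranks of equal area towards the larger rank, which is an arbitrary choice not specified in the text.

```python
import math
from typing import Sequence, Tuple


def h_index(citations: Sequence[int]) -> int:
    """Largest rank h such that c_h >= h."""
    c = sorted((x for x in citations if x > 0), reverse=True)
    return sum(1 for i, ci in enumerate(c, start=1) if ci >= i)


def chi_index(citations: Sequence[int]) -> Tuple[float, int]:
    """Return (chi, k): chi = sqrt(max_i i * c_i) and a rank k attaining the maximum."""
    c = sorted((x for x in citations if x > 0), reverse=True)
    if not c:
        return 0.0, 0
    # i * c_i is the area of the rectangle of width i and height c_i under the curve;
    # comparing (area, i) tuples breaks ties towards the larger rank.
    best_area, k = max((i * ci, i) for i, ci in enumerate(c, start=1))
    return math.sqrt(best_area), k


if __name__ == "__main__":
    scenarios = {"(i)": [100], "(ii)": [10] * 10, "(iii)": [1] * 100}
    for label, vector in scenarios.items():
        chi, k = chi_index(vector)
        print(label, "h =", h_index(vector), "chi =", chi, "k =", k)
```

Running it reproduces the values stated above: χ = 10 in each scenario, with h equal to 1, 10 and 1, and k equal to 1, 10 and 100, respectively.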

Fig 1. Example of the geometric interpretation of the h and χ indices.

https://doi.org/10.1371/journal.pone.0200098.g001

A possible future line of research would be to investigate pairwise combinations of the χ-index with other indices, along the lines of the two-variable metrics examined in [2].

The χ-index is formally introduced in Section 2, generalising the h-index by allowing an interplay between k (the number of publications, representing quantity) and c_k (the number of citations, representing quality). We also list some properties of the χ-index, which could form the basis of its axiomatisation (cf. [16, 20]), and explain the computational methods we use for the empirical analysis in the following sections. In Section 3 we introduce the two data sets analysed, a large Google Scholar data set, described in Subsection 3.1, and a small data set of Nobel prize winners, described in Subsection 3.2. In Section 4 we present the main analysis of the data sets and the results obtained. In Subsection 4.1 we analyse the Google Scholar data set, and in Subsection 4.2 we turn our attention to the Nobel prize winners data set. Our main tool here is to partition the researchers into three classes: (i) when k is approximately equal to h, (ii) when k is significantly greater than h and (iii) when k is significantly less than h. We further partition the data according to whether χ is approximately equal to h or significantly larger than h, to get a sense of when these two indices differ. Membership of the classes is determined by a basic bootstrap percentile method [21, Section 5.3.1], described in Section 2. In Section 5 we give our concluding remarks. (We note that we use the terms author and researcher interchangeably).

2 Methods

The citation curve is the curve resulting from plotting the number of citations against the ranking of the publications, as specified by the citation vector. The χ-index is the square root of the maximum area rectangle that can fit under the citation curve (see Fig 1). Formally,

$$\chi = \sqrt{\max_{1 \le i \le n} i\, c_i}, \qquad (1)$$

where c_i is the number of citations to publication i in the citation vector 〈c₁, c₂, …, c_n〉, which represents all cited publications in decreasing order of the number of citations. In the following we let k denote the value of i that maximises i c_i.

We note that, since the square root is monotonic, it does not affect the ranking of researchers implied by (1). It is, however, convenient for comparison with the h-index and its derivatives. This can be viewed as the requirement from physics, known as dimensional homogeneity, that we only compare quantities that have the same units [22]. The square root accords with the geometrical interpretations of the h and χ indices: the h-index is the square root of the area of the maximal square that fits under the citation curve [23], and the χ-index is the square root of the area of the maximal rectangle. It could also be interesting to consider aggregate functions other than the maximum in (1), for example the minimum, the average, or the average of the minimum and maximum, although these seem to be rather less intuitive in the context of bibliometrics.

Several researchers have studied various properties of citation indices [16, 20] in an attempt to provide an objective justification for comparisons between indices and, where possible, to obtain an axiomatisation of the indices. We list some properties of the χ-index, including desirable properties that it possesses and one that it does not; we leave a complete axiomatisation of the χ-index to future work.

$\chi \ge \sqrt{n}$ and $\chi \ge \sqrt{c_1}$, where n is the number of cited publications and c₁ is the number of citations to the most highly cited publication.
For all i, $\chi \ge \sqrt{i\, c_i}$.
h ≤ χ.
The χ-index is monotonic [16, 19], in the sense that adding citations to an existing publication or adding a new publication to the list does not lower the index. (Note that the h-index is also monotonic).
The χ-index is scale-invariant [19], in the sense that multiplying the number of citations to each publication by a constant does not change the relative ranking of two citation vectors. (Note that the h-index is not scale-invariant).
The χ-index is not independent [19], since adding a new paper with the same number of citations to two citation vectors may change their relative ranking. For example, the χ indices of both 〈2, 2〉 and 〈1, 1, 1, 1〉 are 2; however, the χ-index of 〈2, 2, 1〉 is still 2 but the χ-index of 〈1, 1, 1, 1, 1〉 is $\sqrt{5}$. (Note that the h-index is also not independent).
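The bounds, the inequality h ≤ χ and monotonicity listed above can be spot-checked numerically. The Python sketch below is a randomised sanity check under our own assumptions (random vectors of 1 to 30 papers with 1 to 50 citations each), not a proof; the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def chi(citations: Sequence[int]) -> float:
    """chi = sqrt(max_i i * c_i) over the cited publications sorted in decreasing order."""
    c = sorted((x for x in citations if x > 0), reverse=True)
    return math.sqrt(max((i * ci for i, ci in enumerate(c, start=1)), default=0))


def h(citations: Sequence[int]) -> int:
    """h-index: number of ranks i with c_i >= i."""
    c = sorted((x for x in citations if x > 0), reverse=True)
    return sum(1 for i, ci in enumerate(c, start=1) if ci >= i)


def check_properties(trials: int = 1000, seed: int = 0) -> bool:
    """Randomised check of the bounds, h <= chi and monotonicity on small vectors."""
    rng = random.Random(seed)
    for _ in range(trials):
        c: List[int] = sorted((rng.randint(1, 50) for _ in range(rng.randint(1, 30))),
                              reverse=True)
        x = chi(c)
        # sqrt(n) <= chi, sqrt(c_1) <= chi and h <= chi.
        if x < math.sqrt(len(c)) or x < math.sqrt(c[0]) or h(c) > x:
            return False
        # chi >= sqrt(i * c_i) for every rank i.
        if any(x < math.sqrt(i * ci) for i, ci in enumerate(c, start=1)):
            return False
        # Monotonicity: one extra citation, or one extra cited paper, never lowers chi.
        bumped = list(c)
        bumped[rng.randrange(len(c))] += 1
        if chi(bumped) < x or chi(c + [rng.randint(1, 50)]) < x:
            return False
    return True
```

The independence counterexample can be reproduced with the same helper: chi([2, 2]) and chi([1, 1, 1, 1]) both return 2.0, chi([2, 2, 1]) still returns 2.0, while chi([1, 1, 1, 1, 1]) returns √5 ≈ 2.236.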

In the following sections we carry out an empirical analysis of the χ-index, comparing it to the citation indices mentioned in the introduction, but focusing our attention on the comparison of the χ-index and the h-index. We make use of a large data set compiled by Radicchi and Castellano from Google Scholar [24], and also analyse a small data set of 99 Nobel prize winners; both are described in Section 3.

Our initial comparison between the indices is carried out using the Spearman rank-correlation coefficient [25], which demonstrates that the indices we are comparing are all highly correlated, except for P, the number of cited publications. We carry out a more in-depth comparison of the χ and h indices in Section 4, by separating authors whose χ and h indices are approximately the same from those for which they are significantly different.

We make use of the bootstrap method [21], a technique for estimating the sampling distribution of a statistic by random resampling with replacement from a given sample data set. The bootstrap method is usually nonparametric, making no distributional assumptions about the data set employed. In its basic form, for example, it can be used to estimate the sampling distribution of the mean by computing sample means over a large number of bootstrap resamples taken from the original data set. The specific method we use to classify the authors is the basic bootstrap percentile method [21, Section 5.3.1]; see also [26], which also uses the bootstrap method in the context of bibliometrics. In particular, we resample each author’s citation vector 1000 times, with replacement, compute the h-index for each resample, and then compute a 99% one-sided confidence interval for the h-index values, starting from the lowest h-index value among the 1000 resamples. This allows us to determine for a given author whether k is approximately equal to h and, additionally, whether χ is approximately equal to h, by checking whether k or χ lies in the confidence interval.
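A minimal sketch of this step is given below. It assumes that each resample draws n papers with replacement from the author's own citation vector, and that the 99% one-sided interval runs from the smallest resampled h up to the 99th percentile of the resampled values; the quantile convention and the function names are our assumptions, not details given in the text.

```python
import math
import random
from typing import Optional, Sequence, Tuple


def h_index(citations: Sequence[int]) -> int:
    """Number of ranks i with c_i >= i after sorting in decreasing order."""
    c = sorted(citations, reverse=True)
    return sum(1 for i, ci in enumerate(c, start=1) if ci >= i)


def bootstrap_h_interval(citations: Sequence[int], resamples: int = 1000,
                         level: float = 0.99,
                         seed: Optional[int] = None) -> Tuple[int, int]:
    """One-sided percentile interval for h: [lowest resampled h, `level` quantile].

    Each resample draws len(citations) papers with replacement from the
    author's citation vector and recomputes the h-index.
    """
    rng = random.Random(seed)
    n = len(citations)
    hs = sorted(h_index(rng.choices(citations, k=n)) for _ in range(resamples))
    upper = hs[min(resamples - 1, max(0, math.ceil(level * resamples) - 1))]
    return hs[0], upper


def approx_h(value: float, interval: Tuple[int, int]) -> bool:
    """True when k (or chi) falls inside the bootstrap interval, i.e. 'approximately h'."""
    low, high = interval
    return low <= value <= high
```

Passing a fixed `seed` makes the classification reproducible; without it, authors near the interval boundary may switch class between runs.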

We thus first partition the authors into three classes, according to whether (i) k ≈ h, (ii) k > h, or (iii) k < h, where ≈ means approximately equals. The second and third classes capture a tendency of an author towards being prolific when k > h, or influential when k < h. (This does not imply that when k ≈ h the researcher is not prolific or influential; rather, the distinction is meant to highlight the two opposing cases). We further partition each class according to whether χ ≈ h or χ > h to see when the indices differ, and to get a sense of the proportion of researchers for which χ ≈ h. Finally, we also consider the subclasses of χ > h, depending on whether c_k > k or c_k < k.
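The partition described above can be expressed as a small Python function. This is a sketch under our own assumptions: the `interval` argument is the author's bootstrap interval for h (lowest resampled value, upper percentile), "k ≈ h" means k lies inside that interval, "χ > h" means χ exceeds its upper end, ties in the maximising rank are broken towards the larger rank, and the class labels such as "k~h" are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def classify(citations: Sequence[int], interval: Tuple[int, int]) -> dict:
    """Assign an author to a k class, a chi class and (when chi > h) a c_k subclass.

    Assumes at least one cited publication; `interval` is a bootstrap interval for h.
    """
    c = sorted((x for x in citations if x > 0), reverse=True)
    h = sum(1 for i, ci in enumerate(c, start=1) if ci >= i)
    area, k = max((i * ci, i) for i, ci in enumerate(c, start=1))
    chi = math.sqrt(area)
    low, high = interval

    if low <= k <= high:
        k_class = "k~h"
    elif k > h:
        k_class = "k>h"  # tending to prolific
    else:
        k_class = "k<h"  # tending to influential

    chi_class = "chi>h" if chi > high else "chi~h"
    subclass: Optional[str] = None
    if chi_class == "chi>h":
        c_k = c[k - 1]
        subclass = "c_k>k" if c_k > k else ("c_k<k" if c_k < k else "c_k=k")
    return {"h": h, "k": k, "chi": chi, "k_class": k_class,
            "chi_class": chi_class, "subclass": subclass}
```

For example, with a hypothetical interval (3, 5), the vector 〈100, 10, 5, 4, 3, 2, 1〉 has h = 4, k = 1 and χ = 10, so it falls in the k < h, χ > h, c_k > k subclass.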

3 Data sets and preliminary analysis

We now introduce the two data sets, provide some basic statistics of these data sets, and compute the correlations between various indices for the researchers concerned. In Subsection 3.1 we consider the Google Scholar data set and in Subsection 3.2 we consider a data set of Nobel prize winners.

3.1 Google Scholar data set

For our main analysis, we made use of a large data set of Google Scholar profiles compiled and made available by Radicchi and Castellano [24]. The full data set contains approximately 90,000 citation vectors of authors across all disciplines, collected between June 29 and July 4, 2012. As in [24], we only included authors who had validated their Google Scholar account, and we removed authors with fewer than twenty publications, publications with no citations and publications dated before 1945. We then filtered the data further to include only authors having a career of five years or more, where the career is deemed to begin from the year of the first published paper within the window of years considered. After this preprocessing step, the final data set we used was reduced to 34,393 citation profiles.

We start by presenting, in Table 1, the basic statistics for the various indices introduced in Section 1; h, g, A, R, $\sqrt{E}$, χ, $\sqrt{C}$ and P stand for the h-index, the g-index, the A-index, the R-index, the square root of the Euclidean-index, the χ-index, the square root of the total number of citations and the number of publications, respectively. (We note that we have chosen to use $\sqrt{E}$ and $\sqrt{C}$ for comparison purposes). It can be seen that the number of cited publications P stands out as a clear outlier, and also A, to a lesser extent. Moreover, apart from min, the statistics for h are the lowest, closely followed by .

Table 1. Basic statistics for various indices for the Google Scholar data set.

https://doi.org/10.1371/journal.pone.0200098.t001

In Table 2, we present the Spearman rank-correlation coefficient r [25] between the various indices, noting that when computing the Pearson correlation [25] the results were similar; due to symmetry we only present the upper triangle of the correlation matrix. (We note that while the Pearson correlation measures the strength of a linear association between two random variables, the Spearman rank-correlation measures the strength of a monotonic association between the two, which may be nonlinear [27]). We observe that P has the lowest correlation with any of the other indices, and that all the other indices are highly correlated with each other. We further note that, although $\sqrt{C}$ is indeed highly correlated with all the other indices apart from P, it has a possible perceived disadvantage, as do P and $\sqrt{E}$, in that it takes into account the complete list of publications.
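The Spearman coefficient is the Pearson correlation computed on ranks, with tied values given the average of the ranks they span. The dependency-free Python sketch below illustrates this; it is not necessarily the implementation used for Table 2, and in practice a library routine such as `scipy.stats.spearmanr`, which also assigns average ranks to ties, would normally be used. The function names are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks 1..n in increasing order of value; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # sorted positions start..end hold ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Given per-author lists of, say, h and χ values, `spearman(h_values, chi_values)` would give one entry of a correlation matrix like that in Table 2; because it works on ranks, a perfectly monotonic but nonlinear relationship still yields r = 1.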

Table 2. Spearman rank-correlation between the various indices computed from the Google Scholar data set.

https://doi.org/10.1371/journal.pone.0200098.t002

From now on, we will concentrate on comparing the h and χ indices, h being the most commonly employed index, and leave a detailed comparison with other indices for future work.

We start by showing, as was done in [24], that the probability density functions of the h and χ indices both follow log-normal distributions [28, 29]. To this end we introduce the Jensen-Shannon divergence (JSD) [30], which is a nonparametric measure of the distance between two empirical distributions p = (p_i) and q = (q_i), where i = 1, 2, …, n.

The formal definition of the JSD, which is a symmetric version of the Kullback-Leibler divergence and is based on Shannon’s entropy [31], is given by

$$\mathrm{JSD}(p, q) = \frac{1}{2 \ln 2} \sum_{i=1}^{n} \left( p_i \ln \frac{2 p_i}{p_i + q_i} + q_i \ln \frac{2 q_i}{p_i + q_i} \right), \qquad (2)$$

where we use the convention that if p_i = 0 or q_i = 0, or both, 0 ln 0 and 0 ln (0/0) are both defined to be 0. (The factor 2 ln 2 is included to normalise the JSD to be between 0 and 1). We observe that the JSD is equal to 0 when p = q.
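Eq (2) translates directly into code. The Python sketch below assumes that p and q are probability vectors over the same bins (equal length, each summing to 1); zero entries are skipped, which implements the 0 ln 0 = 0 convention. The function name is hypothetical.

```python
import math
from typing import Sequence


def jsd(p: Sequence[float], q: Sequence[float]) -> float:
    """Normalised Jensen-Shannon divergence in [0, 1], with natural logarithms."""
    total = 0.0
    for pi, qi in zip(p, q):
        m = pi + qi
        # Terms with p_i = 0 (or q_i = 0) contribute 0 by convention.
        if pi > 0:
            total += pi * math.log(2 * pi / m)
        if qi > 0:
            total += qi * math.log(2 * qi / m)
    return total / (2 * math.log(2))
```

Identical distributions give 0, and distributions with disjoint support give the maximum value 1, which is what the 2 ln 2 normalisation is for.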

In Table 3 we give the mean μ, and standard deviation σ of the log-normal distributions fitted by the maximum likelihood method, and the JSD between the empirical distributions of the h and χ indices and the fitted log-normal distributions. The low JSD values indicate good fits for both indices. We also note that the means and standard deviations are quite close.
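A sketch of the fitting step follows: for a log-normal distribution, the maximum likelihood estimates of μ and σ are the mean and the (1/n) standard deviation of the log-values. To compare a fit with the empirical distribution using the JSD in (2), both must be discretised; the unit-width bins centred on the integers 1, 2, …, max used below are our assumption, since the binning behind Tables 3 and 6 is not stated, and all values are assumed to be at least 1 (as h and χ are for any cited author). The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def fit_lognormal(values: Sequence[float]) -> Tuple[float, float]:
    """Maximum likelihood (mu, sigma) of a log-normal; all values must be positive."""
    logs = [math.log(v) for v in values]
    mu = sum(logs) / len(logs)
    # The MLE of sigma divides by n, not n - 1.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / len(logs))
    return mu, sigma


def lognormal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of the log-normal distribution (requires sigma > 0)."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def binned_distributions(values: Sequence[float], mu: float,
                         sigma: float) -> Tuple[List[float], List[float]]:
    """Empirical (p) and fitted (q) probabilities on unit bins centred on 1, 2, ..., max."""
    counts = Counter(round(v) for v in values)
    support = range(1, round(max(values)) + 1)
    p = [counts[v] / len(values) for v in support]
    q = [lognormal_cdf(v + 0.5, mu, sigma) - lognormal_cdf(v - 0.5, mu, sigma)
         for v in support]
    total = sum(q)  # renormalise the fitted mass that falls inside the support
    return p, [x / total for x in q]
```

The resulting p and q can then be passed to any implementation of the JSD in (2); a different binning choice would change the reported divergences somewhat.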

Table 3. Maximum likelihood fitting of log-normal distributions to the h and χ indices of the Google Scholar data set.

https://doi.org/10.1371/journal.pone.0200098.t003

3.2 Nobel prize winners data set

For our second data set, we collected the citation vectors of 99 Nobel prize winners across a variety of disciplines from the Web of Science platform [32]. We included only authors having twenty or more publications, and only those publications with citations. However, for this data set we considered their full careers without a cutoff date. In Table 4, we present the basic statistics for the Nobel laureates, while in Table 5 we present the Spearman rank-correlation coefficients. As one would expect, the statistics are, overall, much higher than for the Google Scholar data set, although for this data set A is more of an outlier than P. On the other hand, the correlations are comparable to those for the Google Scholar data set, although on average lower.

Table 4. Basic statistics for various indices for the Nobel prize winners data set.

https://doi.org/10.1371/journal.pone.0200098.t004

Table 5. Spearman rank-correlation between the various indices computed from the Nobel prize winners data set.

https://doi.org/10.1371/journal.pone.0200098.t005

In Table 6 we show the parameters of the log-normal distribution fitted by the maximum likelihood method, and the JSD between the empirical distributions of the h and χ indices and the fitted log-normal distributions. As for the Google Scholar data set, the low JSD values indicate good fits for both indices. We again note that the means and standard deviations are quite close.

Table 6. Maximum likelihood fitting of log-normal distributions to the h and χ indices of the Nobel prize winners data set.

https://doi.org/10.1371/journal.pone.0200098.t006

4 Analysis and results

We now analyse the data sets introduced in Section 3, with the aim of revealing how authors are separated into classes depending on whether k ≈ h or not, or whether χ ≈ h or not. In Subsection 4.1 we analyse the Google Scholar data set, and in Subsection 4.2 we analyse the Nobel prize winners data set.

4.1 Results for Google Scholar data set

In Fig 2, we see three examples of authors according to whether (i) k ≈ h, (ii) k > h, or (iii) k < h, exhibiting the geometry of the h and χ indices. When k > h there are many publications, each with fewer than h citations (tending towards prolific), and when k < h there are fewer publications, each with more than h citations (tending towards influential).

Fig 2. Examples of authors for the Google Scholar data set: k ≈ h (left) k > h (middle) k < h (right).

https://doi.org/10.1371/journal.pone.0200098.g002

In Table 7, we exhibit the breakdown of the three classes for the Google Scholar data set, noting that k < h is the largest class, the other two comprising just over 53.50% of the data set. It is also apparent that, within the class k < h, there are, by some margin, more authors for whom χ > h. What this means is that, when χ is significantly larger than h, we expect that k will be significantly smaller than h, i.e. we expect the author to have several publications with more than h citations, contributing to χ being larger than h; this can be justified from the data in Table 7 using Bayes’ theorem. This confirms that the χ-index addresses a problem of the h-index, namely that it does not sufficiently take into account highly cited publications. The statistics in Table 8 for the three classes further confirm this property of the χ-index, showing higher average values for the χ-index when k < h.

Table 7. Breakdown of the three k classes for the Google Scholar data set.

https://doi.org/10.1371/journal.pone.0200098.t007

Table 8. Basic statistics for k ≈ h (left) k > h (centre) k < h (right) for the Google Scholar data set.

https://doi.org/10.1371/journal.pone.0200098.t008

Moreover, it can be seen in Table 9 that out of all authors, there are 28.60% for which χ is significantly larger than h, clearly demonstrating the potential of the χ-index to separate authors that may have similar h indices. In addition, the statistics shown in Table 10 indicate higher average values when χ > h. The breakdown of the χ > h class, when c_k > k and c_k < k, can be seen in Table 11, while the basic statistics pertaining to these classes are shown in Table 12. It can be seen that the average values for the larger subclass, c_k > k, are much higher than those for the smaller subclass, c_k < k.

Table 9. Breakdown of the two χ classes for the Google Scholar data set.

https://doi.org/10.1371/journal.pone.0200098.t009

Table 10. Basic statistics for χ ≈ h (left) and χ > h (right) for the Google Scholar data set.

https://doi.org/10.1371/journal.pone.0200098.t010

Table 11. Further breakdown of the χ > h class for the Google Scholar data set.

https://doi.org/10.1371/journal.pone.0200098.t011

Table 12. Basic statistics for the χ > h class, when c_k > k (left) and c_k < k (right) for the Google Scholar data set.

https://doi.org/10.1371/journal.pone.0200098.t012

4.2 Results for Nobel prize winners data set

The Nobel prize winners data set looks at the extreme case of researchers having, on average, very high h values and therefore also very high χ values. In Fig 3 we see three examples of authors according to the three classes as in Fig 2, exhibiting the geometry of these classes for the χ-index for this data set. These examples can be contrasted to the ones shown in Fig 2 for the Google Scholar data set, demonstrating more extreme cases of the χ-index when k > h or k < h.

Fig 3. Examples of authors for the Nobel prize winners data set: k ≈ h (left) k > h (middle) k < h (right).

https://doi.org/10.1371/journal.pone.0200098.g003

In Table 13, we see a significant difference from the Google Scholar data set, since for about 80% of the laureates we have k < h and, of those, for over 75% of the authors χ > h. As expected, this suggests that, overall, Nobel prize winners tend to be influential. Looking at the statistics in Table 14, we see that when k < h, on average, the χ values of researchers are much larger than the h values. This is due to publications with a large number of citations, significantly more than h. An interesting observation is that, unlike Table 8, where the values of the χ-index are highest when k < h, in Table 14 χ is highest for the smaller class k > h. This is most likely due to a long tail of highly cited publications for these few laureates.

Table 13. Breakdown of the three k classes for the Nobel prize winners data set.

https://doi.org/10.1371/journal.pone.0200098.t013

Table 14. Basic statistics for k ≈ h (left) k > h (centre) k < h (right) for the Nobel prize winners data set.

https://doi.org/10.1371/journal.pone.0200098.t014

In contrast to Table 9, it can be seen from Table 15 that χ > h for over 60% of laureates. However, as the statistics in Table 16 reveal, in contrast to Table 10, the h-index for those Nobel prize winners with χ ≈ h is actually, on average, higher than both the h and χ indices of the laureates with χ > h. This may indicate that for very influential researchers, such as Nobel laureates, when χ > h the h-index undervalues their contribution. The breakdown of the χ > h class, when c_k > k and c_k < k, can be seen in Table 17, while the basic statistics pertaining to these classes are shown in Table 18. It is interesting to note that, as opposed to the Google Scholar statistics shown in Table 12, the average values for the Nobel laureates subclass c_k > k are, in fact, much lower than those for the subclass c_k < k. This latter class is quite small, as there are only three such Nobel prize winners; see Table 18. As noted above, this is most likely due to a long tail of relatively highly cited publications for these few laureates.

Table 15. Breakdown of the two χ classes for the Nobel prize winners data set.

https://doi.org/10.1371/journal.pone.0200098.t015

Table 16. Basic statistics for χ ≈ h (left) and χ > h (right) for the Nobel prize winners data set.

https://doi.org/10.1371/journal.pone.0200098.t016

Table 17. Further breakdown of the χ > h class for the Nobel prize winners data set.

https://doi.org/10.1371/journal.pone.0200098.t017

Table 18. Nobel prize winners basic statistics for the χ > h class, when c_k > k (left) and c_k < k (right).

https://doi.org/10.1371/journal.pone.0200098.t018

5 Concluding remarks

We have presented a new citation index, the χ-index, which addresses some shortcomings of the h-index in terms of the balance between number of citations and number of publications. The χ-index has a simple geometric characterisation in terms of the largest area rectangle that fits under the citation curve; this generalises the h-index for which the rectangle is constrained to be a square.

We have analysed two data sets, a large one from Google Scholar and a small one of Nobel prize winners. Studying these data sets clearly shows the utility of the χ-index. First, as with many of the citation indices that combine number of citations (proxy for quality) with number of publications (quantity), the χ-index correlates strongly with the square root of the total number of citations, yet it is selective in its choice of publications to include in the index. Second, as we have seen from our analysis, there are many researchers whose χ-index is significantly larger than their h-index due to their tendency to be influential, in the case k < h, or prolific in the case k > h. We believe that this property of the χ-index is beneficial and could lead to a more satisfactory ranking of researchers than that obtained using the h-index.

Acknowledgments

The authors would like to thank the reviewers for their constructive comments, which helped us to improve the paper.

References

1. Sahel JA. Quality versus quantity: Assessing individual research performance. Science Translational Medicine. 2011;3:84cm13. pmid:21613620
2. Hausken K. The ranking of researchers by publications and citations: Using RePEc data. Journal of Economics Bibliography. 2016;3:530–558.
3. Hirsch JE. An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the United States of America. 2005;102:16569–16572.
4. Egghe L. Theory and practise of the g-index. Scientometrics. 2006;69:131–152.
5. van Eck NJ, Waltman L. Generalizing the h- and g-indices. Journal of Informetrics. 2008;2:263–271.
6. Egghe L. The Hirsch index and related impact measures. Annual Review of Information Science & Technology (ARIST). 2010;44:65–114.
7. Bornmann L, Mutz R, Hug SE, Daniel HD. A multilevel meta-analysis of studies reporting correlations between the h index and 37 different h index variants. Journal of Informetrics. 2011;5:346–359.
8. Waltman L, van Eck NJ. The Inconsistency of the h-index. Journal of the American Society for Information Science and Technology. 2012;63:406–415.
- View Article
- Google Scholar
9. Anderson TR, Hankin RKS, Killworth PD. Beyond the Durfee square: Enhancing the h-index to score total publication output. Scientometrics. 2008;76:577–588.
- View Article
- Google Scholar
10. Mahbuba D, Rousseau R. Year-based h-type indicators. Scientometrics. 2013;96:785–797.
- View Article
- Google Scholar
11. Crispo E. A new index to use in conjunction with the h-index to account for an author’s relative contribution to publications with high impact. Journal of the American Society for Information Science and Technology. 2015;66:2381–2383.
- View Article
- Google Scholar
12. Schreiber M. A variant of the h-index to measure recent performance. Journal of the American Society for Information Science and Technology. 2015;66:2373–2380.
- View Article
- Google Scholar
13. Gao C, Wang Z, Li X, Zhang Z, Zeng W. PR-index: Using the h-Index and PageRank for determining true impact. PLoS ONE. 2016;e0161755:13 pages. pmid:27627767
- View Article
- PubMed/NCBI
- Google Scholar
14. Wildgaard L, Schneider JW, Larsen B. A review of the characteristics of 108 author-level bibliometric indicators. Scientometrics. 2014;101:125–158.
- View Article
- Google Scholar
15. Raheel M, Ayaz S, Afzal MT. Evaluation of h-index, its variants and extensions based on publication age & citation intensity in civil engineering. Scientometrics. 2018;114:1107–1127.
- View Article
- Google Scholar
16. Woeginger GJ. An axiomatic characterization of the Hirsch-index. Mathematical Social Sciences. 2008;56:224–232.
- View Article
- Google Scholar
17. Bornmann L, Daniel H. What do we know about the h index? Journal of the American Society for Information Science and Technology. 2007;58:1381–1385.
- View Article
- Google Scholar
18. BiHui J, LiMing L, Rousseau R, Egghe L. The R- and AR-indices: Complementing the h-index. Chinese Science Bulletin. 2007;52:855–863.
- View Article
- Google Scholar
19. Perry M, Reny PJ. How to count citations if you must. The American Economic Review. 2016;106:2722–2741.
- View Article
- Google Scholar
20. Marchant T. An axiomatic characterization of the ranking based on the h-index and some other bibliometric rankings of authors. Scientometrics. 2009;80:325–342.
- View Article
- Google Scholar
21. Davison AC, Hinkley DV. Bootstrap Methods and their Applications. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge, UK: Cambridge University Press; 1997.
22. Prathap G. Citation indices and dimensional homogeneity. Current Science. 2017;113:853–855.
- View Article
- Google Scholar
23. Liu Y, Zuo W, Gao Y, Qiao Y. Comprehensive geometrical interpretation of h-type indices. Scientometrics. 2013;90:605–615.
- View Article
- Google Scholar
24. Radicchi F, Castellano C. Analysis of bibliometric indicators for individual scholars in a large data set. Scientometrics. 2013;97:627–637.
- View Article
- Google Scholar
25. Rosner B. Fundamentals of Biostatistics. 7th ed. Boston, MA: Brooks/Cole, Cengage Learning; 2011.
26. Andersen JP, Haustein S. Bootstrapping to evaluate accuracy of citation-based journal indicators. In: Proceedings of the 15th International Society of Scientometrics and Informetrics Conference. Istanbul, Turkey; 2015. p. 413–414.
27. Hauke J, Kossowski T. Comparison of values of Pearson’s and Spearman’s correlation coefficients on the same sets of data. Quaestiones Geographicae. 2011;30:87–93.
- View Article
- Google Scholar
28. Johnson NL, Kotz S, Balkrishnan N. 14 Lognormal distributions. In: Continuous Univariate Distributions, Volume 1. 2nd ed. Wiley Series in Probability and Mathematical Statistics. New York, NY: John Wiley & Sons; 1994. p. 207–258.
29. Limpert E, Stahel WA. The log-normal distribution. Significance. 2017;14(1):8–9.
- View Article
- Google Scholar
30. Endres D, Schindelin J. A new metric for probability distributions. IEEE Transactions on Information Theory. 2003;49:1858–1860.
- View Article
- Google Scholar
31. Cover TM, Thomas JA. Elements of Information Theory. Wiley Series in Telecommunications. Chichester: John Wiley & Sons; 1991.
32. Clarivate Analytics. Web of Science; 2018. See www.webofknowledge.com.

