The h’-Index, Effectively Improving the h-Index Based on the Citation Distribution

Background. Although the h-index is a simple and effective index that has been widely used to evaluate the academic output of scientists, it suffers from drawbacks. One critical disadvantage is that only the h-squared citations can be inferred from the h-index, which completely ignores the excess and h-tail citations, leading to unfair and inaccurate evaluations in many cases.
Methodology/Principal Findings. To solve this problem, I propose the h’-index, in which the h-squared, excess and h-tail citations are all considered. Based on the citation data of the 100 most prolific economists, the h’-index shows a better correlation than the h-index with the total-citation number and the citations per publication, which, although relatively reliable and widely used, do not carry information about the citation distribution. In contrast, the h’-index can discriminate the shapes of citation distributions, thus leading to more accurate evaluation.
Conclusions/Significance. The h’-index improves on the h-index, as well as on the total-citation number and the citations per publication, by being able to discriminate the shapes of citation distributions, thus making it a better single-number index for evaluating scientific output in a way that is fairer and more reasonable.


Introduction
The h-index, proposed by Hirsch [1], has received wide attention in recent years. For example, as of January 1, 2013, the original paper [1] putting forward the h-index had been cited 1,232 times in the Science Citation Index (SCI) database. Owing to its importance, the study of the h-index has become one of the hottest topics in recent years [2]. For recent reviews, refer to [3], [4], [5], [6], [7]. Although simple and effective, the h-index suffers from drawbacks. To improve or complement the h-index, many h-type indices have been proposed. We list a few of them here: the g-index [8], A-index [9], R-index and AR-index [10], [11], h(2)-index [12] and e-index [13], [14]. For more details about the h-type indices, refer to [3], [4], [6]. The relation between the h-index and some h-type indices was studied in [15]. One of the disadvantages of the h-index is the so-called isohindex problem, a phenomenon in which many scientists share an identical h-index. To overcome this disadvantage, the real-valued h-index [16] and the rational (successive) h-index [17] were proposed. The time dependence of the h-index and h-type indices, and the relations among them, were studied in [18], [19], [20], [21]. At the same time, the h-index has been used in various areas. For example, the h-index or h-type indices were used for evaluating physicists [22], studying journals [23], evaluating chemical research groups in correlation with peer judgment [24] and evaluating the 100 most prolific economists [25]. Interestingly, it was recently reported that references [6], [8], [11], [22], [23] and [24] are the most-cited articles in their respective journals [2]. For example, reference [11] is the most-cited article ever published in the Chinese Science Bulletin [2].
One important advantage of the h-index is its simplicity. The h-index uses only an integer to measure the academic output of a scientist. Therefore, the Web of Science provides the h-index for every scientist whose papers are indexed by the SCI database. As with any single-number indicator, one of the disadvantages of the h-index is the loss of citation information. The area under the citation distribution is divided by the h-index into three parts: the $h^2$, excess and h-tail citations (Fig. 1). The h-index by itself does not carry information about the excess and h-tail citations, which can play an even more dominant role than the h-index in determining the shape of the citation distribution curve. Ignoring the contributions from the excess and h-tail citations usually either under-estimates or over-estimates the academic output of the scientist under study.
The current study aims to solve the above problem by proposing the h'-index, a new h-type index that satisfies the following requirements. First, we hope to keep the most important advantage of the h-index, i.e., the use of a single number to measure the academic output of scientists. Second, the new index should carry the main information of the citation distribution. In other words, in addition to $h^2$, the new index should reflect the information from the excess citations [13] and h-tail citations [26], [27]. Third, the total citation number and the citations per publication are single-number indices widely used to evaluate the academic performance of scientists; e.g., the latter was used by Thomson-Reuters to rank the world's top 100 materials scientists in 2011 [28]. Therefore, we hope that the new index is highly and linearly correlated with each of these two indices, while overcoming their main disadvantage of carrying no information about the citation distribution.

Definition of the h'-index
As pointed out previously, the area under the citation distribution function is divided by the h-index into three parts, representing the $h^2$, excess and h-tail citations. However, the shapes of the citation distributions differ from scientist to scientist. The shapes may be roughly divided into three types, represented by a simple straight-line model (Fig. 1).
The distribution functions shown in Fig. 1A, B and C represent three types of scientists.
Scientists are roughly divided into three types [29], [30]. Scientist A is called a perfectionist, who has few publications, which, however, are highly cited. Scientist B is called a prolific-type scientist, who publishes a large number of papers that also tend to be highly cited. Scientist C is called a mass producer, who publishes a large number of papers that receive few citations. Scientist A corresponds to Fig. 1A, where $e > t$ or $e/t > 1$, with $e^2$ and $t^2$ corresponding to the excess and h-tail citations [13], [26], [27]. Here the e-index and t-index need to be explained. The e-index is the square root of the excess citations over $h^2$ in the h-core [13]. The t-index is the square root of the h-tail citations [26], [27]. Scientist B corresponds to Fig. 1B, where $e = t$ or $e/t = 1$, and Scientist C corresponds to Fig. 1C, where $e < t$ or $e/t < 1$. Based on the above analysis, it can be seen that the real number $e/t$ is an important parameter to characterize the shapes of the citation distributions. Letting

$$ r = e/t. \tag{1} $$
We call r the e-t ratio, or the head-tail ratio. The three cases of $r > 1$, $r = 1$ and $r < 1$ correspond to three types of citation distribution functions. The e-t ratio, r, is an important index to capture the overall shapes of citation distribution functions. For $r > 1$ the shapes of the citation distribution functions are peaked, and for $r < 1$ they are flat with a long tail, whereas for $r = 1$ the citation functions are roughly symmetrical with respect to the diagonal line of the coordinate system. When $r > 1$, especially $r \gg 1$, the h-index under-estimates the academic output of the scientist being studied, whereas when $r < 1$, especially $r \ll 1$, the h-index over-estimates it. When $r = 1$, the h-index properly reflects the academic output of the scientist under study. To have a fair evaluation of the academic output of scientists, we propose a novel h-type index, the h'-index, which is defined by

$$ h' = r h = \frac{e}{t}\, h, \tag{2} $$

where e, h and t are the e-index, h-index and t-index, respectively. The citations received by all papers in the h-core, denoted by $C_{h\text{-core}}$, are

$$ C_{h\text{-core}} = \sum_{j=1}^{h} cit_j, \tag{3} $$

where $cit_j$ are the citations received by the $j$th paper. Letting $e^2$ denote the excess citations within the h-core, we find [13]

$$ e^2 = \sum_{j=1}^{h} (cit_j - h) = C_{h\text{-core}} - h^2 = R^2 - h^2, \tag{4} $$

where R is the R-index [11]. Thus,

$$ e = \sqrt{R^2 - h^2}. \tag{5} $$

Meanwhile, the t-index was defined by [27]

$$ t = \sqrt{C_{\text{total}} - C_{h\text{-core}}}, \tag{6} $$

where $C_{\text{total}}$ is the number of total citations received by all papers published by the scientist under study. Finally, we have

$$ h' = h \sqrt{\frac{C_{h\text{-core}} - h^2}{C_{\text{total}} - C_{h\text{-core}}}}. \tag{7} $$

Figure 1. The area under the citation distribution curve is divided by the h-index into three parts: $h^2$, the excess citations ($e^2$) and the h-tail citations ($t^2$). Here the citation distribution curve is simplified as a straight line. The cases shown in A, B and C all belong to one isohindex group and represent three types of scientists: A, perfectionist-type; B, prolific-type; and C, mass producer-type [29], [30]. doi:10.1371/journal.pone.0059912.g001

Comparisons between the h'-index and h-index
First, two concrete examples are considered.
According to the citation information provided by Dodson [14], $h = 25$, $e = 21.84$ and $t = 24.45$, from which we find $r = 0.89$ and $h' = 22.25$. Therefore, the h-index is properly applicable to Dodson. Another example is the chemist Berni Alder, for whom $h = 50$, $e = 114$ and $t = 53.89$ [13]; we find $r = 2.12$ and $h' = 105.77$. This example shows that Alder's h-index severely under-estimates his academic output, whereas the h'-index gives him a relatively fair evaluation.
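The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function names `h_type_indices` and `h_prime_from_indices` and the sample citation list are hypothetical, and raising an error when there are no h-tail citations is an assumption of the sketch, since the text does not discuss that case.

```python
import math


def h_type_indices(citations):
    """Return h, e, t, r and h' for a list of per-paper citation counts."""
    cits = sorted(citations, reverse=True)
    # h-index: number of ranks whose paper has at least `rank` citations.
    h = sum(1 for rank, c in enumerate(cits, start=1) if c >= rank)
    core = sum(cits[:h])             # citations received by the h-core
    total = sum(cits)                # all citations
    e = math.sqrt(core - h * h)      # excess citations: e^2 = C_core - h^2
    t = math.sqrt(total - core)      # h-tail citations: t^2 = C_total - C_core
    if t == 0:
        # Assumption of this sketch: r = e/t is undefined without a tail.
        raise ValueError("no h-tail citations; r = e/t is undefined")
    r = e / t
    return {"h": h, "e": e, "t": t, "r": r, "h_prime": r * h}


def h_prime_from_indices(h, e, t):
    """h' = (e / t) * h, computed from already known index values."""
    return e / t * h


# Hypothetical citation record, for illustration only (h = 4 here).
example = h_type_indices([10, 8, 5, 4, 3, 1, 1])
```

With the values quoted for Alder, `h_prime_from_indices(50, 114, 53.89)` gives about 105.77, in agreement with the text. For Dodson the unrounded ratio gives about 22.33; the quoted 22.25 corresponds to multiplying h = 25 by the rounded ratio r = 0.89.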
Second, we turn to the research output of three types of scientists, A, B and C, within the same isohindex group. The three scientists are real applicants to the Young Investigator Programme of the European Molecular Biology Organization (EMBO) in Heidelberg, Germany [29]. Scientists A, B and C belong to the perfectionist, prolific and mass-producer types, respectively [29], [30].
The common feature of the three is that they all have the same h-index, $h = 14$. Based on the data provided by Fig. 1 in [29], the number of papers, the total number of citations, the citations per publication, the e-index, the t-index, the e-t ratio r and the h'-index are listed in Table 1. As we can see, the academic performance of the three types of scientists A, B and C is quite different. This example shows that the h-index does not possess the ability to discriminate different shapes of citation distributions. In contrast, the citations per publication and the h'-index appropriately reflect the academic performance of the three types of scientists.
Third, in what follows, we compare the h-index and h'-index for the 100 most prolific economists [25]. The correlation between the h-index and the total-citation number for the 100 economists is shown in Fig. 2A, whereas that between the h'-index and the total-citation number is shown in Fig. 2B. It is seen that the h'-index has a better linear correlation with the total-citation number than the h-index does. Likewise, the linear correlation between the h- or h'-index and the citations per publication for the 100 economists is shown in Fig. 3A and B, respectively. We can see that the h'-index has a higher linear correlation with the citations per publication than the h-index. Of note, the correlation coefficient between the h'-index and the citations per publication is as high as 0.969. However, this does not imply that the two are nearly equal to each other. In fact, the average values and standard deviations of the h'-index and the citations per publication over the 100 most prolific economists are $h' = 32.96 \pm 28.44$ and $cit/pub = 21.51 \pm 21.12$, respectively.
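The point that a high correlation coefficient does not imply near equality can be illustrated with a short sketch. The numbers below are hypothetical and are not taken from the data of [25]; the `pearson` helper is written out by hand only to keep the example self-contained.

```python
import math


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Hypothetical values, for illustration only.
cit_per_pub = [5.0, 12.0, 20.0, 33.0, 60.0]
# A second index that is an exact linear function of the first one.
other_index = [1.5 * c + 3.0 for c in cit_per_pub]

correlation = pearson(cit_per_pub, other_index)
mean_cpp = sum(cit_per_pub) / len(cit_per_pub)
mean_other = sum(other_index) / len(other_index)
```

Here the correlation is 1 up to rounding error, yet the two sequences have different means and spreads, because the correlation coefficient is unchanged by a linear rescaling with positive slope.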
Theoretically, the h'-index carries more citation information than the h-index, because the h-index captures only the information of the $h^2$ citations, whereas the h'-index captures not only the $h^2$ citations but also the excess and h-tail citation information. The examples shown in Fig. 1 and Table 1 indicate that the h-index does not possess the ability to discriminate the shapes of the citation distribution functions. The three cases shown in Fig. 1 belong to the same isohindex group, i.e., they have an identical h-index. The h'-index properly discriminates the three cases, with $h' = 1.73h$, $h' = h$ and $h' = 0.58h$ for the cases shown in Fig. 1A, B and C, respectively.
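The factors 1.73 and 0.58 are consistent with the following short calculation. It is a sketch under an assumption that the text does not state explicitly: the straight-line model of Fig. 1 is taken to be a line of slope $-k$ passing through the point $(h, h)$, with $x$ denoting the rank of a paper. The slope $k$ and the particular values $k = \sqrt{3}$ and $k = 1/\sqrt{3}$ are inferred here from the quoted factors.

```latex
\begin{align*}
C(x) &= h - k\,(x - h), \qquad k > 0, \\
e^2  &= \tfrac{1}{2}\cdot h \cdot k h = \tfrac{k}{2}\,h^2
        && \text{(triangle above the $h \times h$ square)}, \\
t^2  &= \tfrac{1}{2}\cdot \tfrac{h}{k} \cdot h = \tfrac{1}{2k}\,h^2
        && \text{(triangle to the right of the square)}, \\
r    &= \frac{e}{t} = k, \qquad h' = r h = k h .
\end{align*}
```

Under this assumption, $k = \sqrt{3} \approx 1.73$, $k = 1$ and $k = 1/\sqrt{3} \approx 0.58$ reproduce the three factors, and the total area $h^2 + e^2 + t^2$ is the same for $k = \sqrt{3}$ and $k = 1/\sqrt{3}$. For $k = \sqrt{3}$ the tail gives $t = h/\sqrt{2\sqrt{3}} \approx 0.54h$.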
As mentioned above, the h-index has the isohindex problem. Since the h'-index is a real number, it provides an alternative solution in addition to those suggested in [16] and [17]. However, in some special cases, the h'-index can have a similar problem. For instance, if the citation curve in Fig. 1A is modified so that it is no longer linear, with the upper part (excess citations) decreasing so that $e = 0.54h$ while the lower part (h-tail) remains the same with $t = 0.54h$, then $e = t$ and $h' = h$, as in Fig. 1B. This example shows that different citation distributions may lead to the same h'-index. However, this problem seems to occur so rarely that it is highly unlikely in reality. In addition, it is also possible that different h-values lead to the same h'-value, but this has not yet been observed.
Hirsch suggested that the h-index is preferable to the single-number criteria commonly used to evaluate the scientific output of a researcher, such as the number of total citations and the citations per publication [1]. Here we show that, as a single-number evaluation index, the h'-index is also superior to the number of total citations and the citations per publication. First, we compare the h'-index with the number of total citations. The merit of the number of total citations is that it measures the total output of the scientist under study using only a simple integer. However, its notable disadvantage is that it cannot discriminate the shapes of the citation distributions. Refer to Fig. 1A and C, where the numbers of total citations are identical, being equal to the total area under the citation distribution functions. However, scientific common sense suggests that the scientist shown in Fig. 1A (a perfectionist with few but influential papers) has a better academic performance than the one in Fig. 1C (a mass producer with many low-impact papers). This fact is correctly reflected by the h'-index, but not by the number of total citations. Second, although the citations per publication, a widely used index, seems better than the number of total citations, it also does not possess the ability to discriminate the shapes of the citation distribution functions. Consequently, as correctly pointed out by Hirsch [1], this index (cit/pub) usually rewards low productivity and penalizes high productivity.

Ranking of the Performance of Scientists Based on Various Evaluation Indices
Here we show, as an example, rankings of the academic performance of the 100 most prolific economists. For convenience, we list only the top ten of each ranking (Table 2). The rankings are based on the h'-index, the citations per publication and the h-index, respectively. As we can see from Table 2, the top three economists based on the h'-index were A Shleifer, RJ Barro and RF Engle, whereas the top three based on the citations per publication were RF Engle, RJ Barro and A Shleifer. That is to say, although the order differs slightly, the names of the top three were identical. However, the top three economists based on the h-index were different from those based on the h'-index and cit/pub. Indeed, except for A Shleifer, the names of the remaining two were different.
Among the top ten of the 100 most prolific economists based on the h'-index, nine were identical with those based on the citations per publication, indicating that the two rankings were considerably consistent. In contrast, of the top ten economists ranked by the h'-index, only seven were identical with those based on the h-index. Furthermore, RF Engle, who ranked No. 1 based on cit/pub and No. 3 based on the h'-index, did not appear in the list of the top ten economists based on the h-index. Of the evaluation indices currently available, the citations per publication is believed to be a relatively reliable index for ranking the academic performance of scientists. Recently, it was used by Thomson-Reuters to rank the world's top 100 materials scientists [28]. In the analysis above, we show that the h'-index is preferable to the citations per publication, because the latter does not possess the ability to discriminate the shapes of the citation distribution functions. As a consequence, the citations per publication usually rewards low productivity and penalizes high productivity [1]. Our overall opinion is that the h'-index is one of the best single-number evaluation indices, which can be used to rank the academic performance of scientists in a way that is fairer and more reasonable.
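The comparison of top-ten lists amounts to counting the names shared by two rankings. A minimal sketch of that count follows; the helper names `top_k` and `overlap`, the labels S1 to S5 and all scores are hypothetical, and the data of Table 2 are not reproduced.

```python
def top_k(scores, k):
    """Names of the k highest-scoring entries (ties broken by name)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


def overlap(scores_a, scores_b, k):
    """Number of names that appear in the top k of both rankings."""
    return len(set(top_k(scores_a, k)) & set(top_k(scores_b, k)))


# Hypothetical scientists and scores, for illustration only.
h_prime_scores = {"S1": 90.0, "S2": 75.0, "S3": 60.0, "S4": 20.0, "S5": 10.0}
h_scores = {"S1": 40, "S2": 22, "S3": 18, "S4": 35, "S5": 30}

shared_top3 = overlap(h_prime_scores, h_scores, 3)
```

In this toy example only S1 appears in both top-three lists, so the overlap is 1. Applied to the rankings described in the text with k = 10, the same count would be nine for the h'-index against cit/pub and seven for the h'-index against the h-index.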

A Study Based on a Power Law Model
Based on the citation distribution function $C(t)$, where the argument t is the rank of a paper (not the t-index), the e-t ratio can be calculated. Here we study only a simple power law model known as Lotka's model. We assume that

$$ C(t) = \frac{C_1}{t^{\lambda}}, \tag{8} $$

where $C_1 = C(1)$ is the maximum number of citations received by the papers in the h-core. This type of function is in fact the Zipf-type formulation rather than the Lotka one [31]. It was shown that [31]

$$ h = C_1^{1/(\lambda+1)}. \tag{9} $$

Please refer to equations (8) and (9) of Reference [31]. We further assume $\lambda \neq 1$. According to the definition of the h-index [1], we have $C(h) = h$, leading to the above result, eq. (9) [13]. Based on eqs. (4) and (9), we find [13]

$$ e^2 = \frac{1}{\lambda-1}\,(\ldots). \tag{10} $$

The t-index can be calculated by

$$ t^2 = \int_h^N C(x)\,dx = \frac{1}{\lambda-1}\left(h^2 - \frac{C_1}{N^{\lambda-1}}\right), \tag{11} $$

where N is the number of all papers published by the scientist under study. Consequently, we have

$$ r = \frac{e}{t}, \tag{12} $$

with e and t given by eqs. (10) and (11). It can be seen that the parameter $\lambda$ is an important factor in determining the e-t ratio r. The parameter $\lambda$ cannot be too large. For example, letting $\lambda = 2$ and assuming $C_1 = 1000$ and $N = 100$, we find $h = 10$, $e = 89.44$ and $t = 9.49$. Consequently, $r = e/t = 89.44/9.49 = 9.43$. This result shows that even when $\lambda = 2$, the excess citations ($e^2 = 8000$) and h-tail citations ($t^2 = 90$) cannot be ignored. As a result, the h'-index ($h' = 94.3$) results in a more reasonable evaluation.
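The numerical example can be partly checked with a short sketch. It assumes that the h-tail citations are the integral of the power law $C_1/x^{\lambda}$ from h to N, an assumption that reproduces the quoted $t^2 = 90$. The value $e^2 = 8000$ is taken as reported in the text and is not recomputed, and the function names are hypothetical.

```python
import math


def power_law_h(c1, lam):
    """h-index of the model C(x) = c1 / x**lam, from C(h) = h."""
    return c1 ** (1.0 / (lam + 1.0))


def power_law_t_squared(c1, lam, n_papers):
    """h-tail citations: integral of c1 / x**lam from h to N (lam != 1)."""
    h = power_law_h(c1, lam)
    return c1 * (h ** (1.0 - lam) - n_papers ** (1.0 - lam)) / (lam - 1.0)


C1, LAM, N = 1000.0, 2.0, 100.0

h = power_law_h(C1, LAM)                  # approximately 10
t2 = power_law_t_squared(C1, LAM, N)      # approximately 90
e2_reported = 8000.0                      # as quoted in the text, not derived here
r = math.sqrt(e2_reported / t2)           # e-t ratio
h_prime = r * h                           # h' = r * h
```

With these inputs the sketch gives r of about 9.43 and h' of about 94.3, matching the values quoted above.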

Concluding Remarks
The three commonly used single-number indices, the h-index, the total-citation number and the citations per publication, all suffer from critical drawbacks. (i) Because of the loss of the information of the citation distribution, evaluations based on the h-index alone can be misleading, as exemplified by the data shown in Table 1. (ii) The total-citation number does not carry the information of the citation distribution, as reflected by the typical examples shown in Fig. 1A and C. Although both cases correspond to the same number of total citations, the shapes of the citation distributions are quite different. (iii) Likewise, the citations per publication also does not possess the ability to discriminate the shapes of the citation distribution functions. As a consequence, this index (cit/pub) usually rewards low productivity and penalizes high productivity [1].
The h'-index appears to overcome the above drawbacks by carrying additional information derived from the citation distribution. In summary, the h'-index has the following features. (i) It is highly consistent with the total-citation number and the citations per publication, which are relatively reliable and thus widely used in the current evaluation of the academic output of scientists. (ii) Compared with the total-citation number and the citations per publication, the h'-index possesses the ability to discriminate the shapes of the citation distributions, thus leading to more reasonable evaluation. (iii) Compared with the h-index, the h'-index appropriately carries the information of the excess and h-tail citations. (iv) The h'-index is a real number, thus largely solving the problem of isohindex groups of the h-index. In conclusion, these features enable the h'-index to be a better single-number index for evaluating scientific output in a way that is fairer and more reasonable.

Table 1 note. The symbols Cit, P and Cit/P denote the total number of citations, the number of papers and the citations per publication. The indices e, t, r and h' are defined in eqs. (5), (6), (1) and (2), respectively. The values of Cit were provided by Fig. 1 of [29]. Note that the values of P were estimated from Fig. 1 in [29]. The indices e and t were calculated from the values of $h^2_{\text{upper}}$ and $h^2_{\text{lower}}$, respectively, provided by Fig. 1 of [29].

Materials and Methods
The data used were from [25]; please refer to Table A1 on pp. 323-324 of [25]. The calculations performed in this paper were elementary.