The Pagerank-Index: Going beyond Citation Counts in Quantifying Scientific Impact of Researchers

Quantifying and comparing the scientific output of researchers has become critical for governments, funding agencies and universities. Comparison by reputation and direct assessment of contributions to the field is no longer possible, as the number of scientists increases and traditional definitions of scientific fields become blurred. The h-index is often used for comparing scientists, but has several well-documented shortcomings. In this paper, we introduce a new index for measuring and comparing the publication records of scientists: the pagerank-index (symbolised as π). The index uses a version of the pagerank algorithm and the citation networks of papers in its computation, and is fundamentally different from the existing variants of the h-index because it considers not only the number of citations but also the actual impact of each citation. We adopt two approaches to demonstrate the utility of the new index. Firstly, we use a simulation model of a community of authors, in which we create various 'groups' of authors that differ from each other in inherent publication habits, to show that the pagerank-index is fairer than the existing indices in three distinct scenarios: (i) when authors try to 'massage' their index by publishing papers in low-quality outlets primarily to self-cite other papers; (ii) when authors collaborate in large groups in order to obtain more authorships; (iii) when authors spend most of their time producing genuine but low-quality publications that would massage their index. Secondly, we undertake two real-world case studies: (i) the evolving author community of quantum game theory, as defined by Google Scholar; (ii) a snapshot of the high energy physics (HEP) theory research community in arXiv. In both case studies, we find that the lists of top authors differ significantly depending on whether the h-index or the pagerank-index is used for comparison.
We show that in both cases, authors who have collaborated in large groups and/or published less impactful papers tend to be comparatively favoured by the h-index, whereas the pagerank-index highlights authors who have made a relatively small number of definitive contributions, or written papers which served to highlight the link between diverse disciplines, or typically worked in smaller groups. Thus, we argue that the pagerank-index is an inherently fairer and more nuanced metric to quantify the publication records of scientists compared to existing measures.

In this appendix, we show some additional information about the simulation and other experiments we have undertaken.

Attributes of Paper and Author nodes in the paper-generation simulation
Paper Attribute Description

Paper ID
This attribute is a unique identifier for each paper, which acts as an index.

Member
Member is a 'generic' parameter which will take different names for different simulation scenarios. It essentially indicates whether a paper belongs to a particular 'group' or not, based on an inherent publication habit of its authors which varies in each simulation scenario.

Number of authors
This attribute dictates how many authors are assigned to each paper object.

Author list
The author list will contain the list of authors of a paper.

Impact Factor
This represents the impact factor of the paper. Note that this attribute is used in simulating the author behaviour only. The pagerank-index does not use impact factors in its calculation and when it is applied to real data sets, the impact factors of corresponding journals are not needed. In the simulated system, though, we use the impact factor in order to come up with a weighted preferential attachment process which models citation behaviours.

Number of references
This is the total number of references a paper makes (the length of the reference list).

Number of internal references
The number of citations a paper makes to other papers within the particular field studied. Thus, this is always less than the total number of references a paper makes.

Page rank value
This is the page rank value of a paper node after the page rank algorithm has been executed and a steady-state value has been reached.

Citation count
This is the number of citations a paper receives.

Author Attribute Description

Member
Member is a 'generic' parameter which will take different names for different simulation scenarios. It essentially indicates whether an author belongs to a particular 'group' or not, based on an inherent publication habit which varies in each simulation scenario.

Paper list
This is a list of papers authored or co-authored by a specific author.

Citation count
This attribute represents the total number of citations received by all of an author's papers.

h-index
This is the h-index of each author.

Page rank value
This attribute stores the page rank value Ω of each author in decimal form.

pagerank-index
This attribute stores the pagerank-index of authors (in percentiles).
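
The attributes listed above map naturally onto two record types. The following is a minimal sketch only, not the simulation code used in the paper: all field and function names are hypothetical, `author_id` is an assumed identifier, and the percentile rule in `assign_pagerank_index` (share of authors whose Ω does not exceed the author's own) is one plausible reading of 'in percentiles' rather than a definition taken from the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Paper:
    paper_id: int
    member: bool                    # belongs to the scenario's 'group'?
    impact_factor: float            # used only to simulate citing behaviour
    num_references: int             # length of the full reference list
    num_internal_references: int    # references to papers inside the field
    author_ids: List[int] = field(default_factory=list)
    pagerank: float = 0.0           # steady-state page rank value
    citation_count: int = 0         # citations received

    @property
    def num_authors(self) -> int:
        return len(self.author_ids)


@dataclass
class Author:
    author_id: int                  # hypothetical identifier (assumption)
    member: bool
    papers: List[Paper] = field(default_factory=list)
    pagerank: float = 0.0           # page rank value (Omega), decimal form
    pagerank_index: float = 0.0     # percentile, filled in later

    @property
    def citation_count(self) -> int:
        # Total citations over all of the author's papers.
        return sum(p.citation_count for p in self.papers)

    @property
    def h_index(self) -> int:
        # Largest h such that h papers have at least h citations each.
        counts = sorted((p.citation_count for p in self.papers), reverse=True)
        return sum(1 for rank, c in enumerate(counts, start=1) if c >= rank)


def assign_pagerank_index(authors: List[Author]) -> None:
    # Assumed percentile rule: share of authors with Omega <= this author's.
    n = len(authors)
    for a in authors:
        not_above = sum(1 for b in authors if b.pagerank <= a.pagerank)
        a.pagerank_index = 100.0 * not_above / n


def make_paper(pid: int, cites: int) -> Paper:
    # Helper for the usage example; the remaining values are placeholders.
    return Paper(pid, False, 1.0, 20, 5, citation_count=cites)
```

For instance, an author whose five papers have 10, 8, 5, 4 and 3 citations would have a citation count of 30 and an h-index of 4 under this sketch.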

Ranks of authors in quantum game theory field using h-index and pagerank-index
We ranked the authors in the quantum game theory field using both the h-index and the pagerank-index. The ranks are shown in Fig. 3. As can be clearly seen from this figure, the ranks of some scientists go down when the pagerank-index is used instead of the h-index, while the ranks of another group of scientists go up. The points that coincide with the y = x line represent scientists whose rank is the same under both indices.

The role of the reset parameter α
The reset parameter α in Eq. 1 was set at α = 0.9 in our experiments. This indicates that the 'random' component is minimal (0.1 · (1/N)) compared to the 'endorsement' component (0.9 · cumulative endorsement). Therefore, the 'random surfer' behaviour is minimised. Moreover, bearing in mind that pagerank is run multiple times until a steady state is reached, it is easy to see that the influence of 'endorsements' will increase in each iteration and the random component will have very little influence on the final steady-state value. Yet, we consider that it makes sense to have α < 1 (a non-zero (1 − α)), because there would be a small element of randomness even in citation networks. The equivalent of 'surfing' the World Wide Web in citation networks is a person (say, an academic) browsing papers of interest by following the citation links. The value α = 0.9 signifies that, on average, after following ten citation links the academic would restart her search from another random paper in the citation network (1/(1 − α) = 10). It is plausible that even the most tenacious academic would lose interest after following citations through ten links, and would start her search/review from another paper in the literature (another 'random' point in the citation network). From a practical point of view, α < 1 is necessary to simulate the pagerank process, as α = 1 would result in all nodes having zero pagerank value at the beginning and hence all 'endorsements' would be zero. For these reasons, we have set a fairly high value for α but kept it less than unity.
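
To make the two components concrete, the following is a minimal sketch of a standard pagerank iteration on a paper citation network with α = 0.9. It is an illustration under stated assumptions, not necessarily identical to Eq. 1: the function name `pagerank`, the uniform starting values of 1/N, and the uniform redistribution of rank from papers that cite nothing inside the network are all choices made here for the sake of a runnable example.

```python
def pagerank(citations, alpha=0.9, tol=1e-10, max_iter=1000):
    """Iterate pagerank to a steady state.

    `citations` maps each paper to the list of papers it cites.
    """
    nodes = sorted(set(citations) | {q for refs in citations.values() for q in refs})
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}  # assumed uniform starting values

    for _ in range(max_iter):
        # 'Random' component: (1 - alpha) * (1 / N) for every paper.
        new = {p: (1.0 - alpha) / n for p in nodes}

        # 'Endorsement' component: each paper passes alpha * rank to its references.
        for p in nodes:
            refs = citations.get(p, [])
            if refs:
                share = alpha * rank[p] / len(refs)
                for q in refs:
                    new[q] += share
            else:
                # Assumption: a paper with no internal references spreads
                # its rank uniformly over all papers.
                share = alpha * rank[p] / n
                for q in nodes:
                    new[q] += share

        delta = sum(abs(new[p] - rank[p]) for p in nodes)
        rank = new
        if delta < tol:  # steady state reached
            break
    return rank


# Toy network: C cites A and B, B cites A, A cites nothing internal.
toy = {"A": [], "B": ["A"], "C": ["A", "B"]}
ranks = pagerank(toy, alpha=0.9)
```

In this toy network the most-endorsed paper A ends up with the highest value and the uncited paper C with the lowest, while the values continue to sum to one in every iteration.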