A Reverse Engineering Approach to the Suppression of Citation Biases Reveals Universal Properties of Citation Distributions

doi:10.1371/journal.pone.0033833

Figure 1.

Cumulative distribution of raw citation counts for papers published in

The blue curve is calculated by aggregating all papers of all subject-categories (average number of citations ). The red, orange and green curves are calculated by considering only papers within the subject-categories “Agronomy” (), “Computer science, software engineering” () and “Genetics & heredity” (), respectively. The figure illustrates the mapping of into . Citation counts of single subject-categories are matched with the value of which corresponds to the same value of the cumulative distributions.
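The matching described in this caption can be illustrated with a minimal sketch. It assumes the cumulative distribution is the fraction of papers with at least a given number of citations, and that "matching" means picking the aggregated count whose cumulative value is closest. Both choices, as well as the function names `survival` and `match_to_reference`, are illustrative assumptions and not taken from the paper.

```python
from bisect import bisect_left


def survival(counts):
    """Return P(c): the fraction of papers with at least c citations (assumed convention)."""
    ordered = sorted(counts)
    n = len(ordered)

    def p(c):
        # bisect_left gives the number of papers with fewer than c citations
        return (n - bisect_left(ordered, c)) / n

    return p


def match_to_reference(c, category_counts, reference_counts):
    """Map a count c from one category to the reference count with the closest cumulative value."""
    target = survival(category_counts)(c)
    p_ref = survival(reference_counts)
    # Only counts that actually occur in the reference data are candidates
    candidates = sorted(set(reference_counts))
    return min(candidates, key=lambda r: abs(p_ref(r) - target))


# Toy data: half of the category papers have at least 2 citations,
# and half of the reference papers have at least 4.
print(match_to_reference(2, [0, 1, 2, 3], [0, 2, 4, 6]))
```

With real data the closest-value rule would probably be replaced by interpolation between neighbouring counts; the nearest match is used here only to keep the sketch short.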

Figure 2.

Transformation of citation counts.

Citations within single subject-categories are plotted against citation counts of the aggregated data . The quantities and are related by a power-law relation (Eq. 1). Different subject-categories have different values of the transformation factor and the transformation exponent . The best estimates of and for the subject-categories considered in this figure (the same subject-categories as those appearing in Fig. 1) are: and for “Agronomy”, and for “Computer science, software engineering”, and for “Genetics & heredity”. The results of the complete analysis for all subject-categories and years of publication are reported in the Supporting Information S2, S3, S4, S5, S6, and S7.
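As an illustration of how a factor and an exponent can be estimated from paired citation counts, the sketch below fits a generic power law y = a·x^α by ordinary least squares in log-log space. The exact form of Eq. 1, which quantity plays the role of x or y, and the fitting procedure used in the paper are not recoverable from this text. The function `fit_power_law` and the toy numbers are assumptions made for the example.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**alpha in log-log space; returns (a, alpha)."""
    # Zero counts have no logarithm, so those pairs are dropped (a simplification)
    pairs = [(math.log(x), math.log(y)) for x, y in zip(xs, ys) if x > 0 and y > 0]
    n = len(pairs)
    mean_x = sum(lx for lx, _ in pairs) / n
    mean_y = sum(ly for _, ly in pairs) / n
    sxx = sum((lx - mean_x) ** 2 for lx, _ in pairs)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in pairs)
    alpha = sxy / sxx                        # slope of the log-log line
    a = math.exp(mean_y - alpha * mean_x)    # intercept mapped back to linear scale
    return a, alpha


# Synthetic points generated from y = 3 * x**0.5, so the fit should recover a = 3, alpha = 0.5
xs = [1, 2, 4, 8, 16]
ys = [3 * x ** 0.5 for x in xs]
a, alpha = fit_power_law(xs, ys)
print(round(a, 6), round(alpha, 6))
```

A fit in log-log space weights small and large counts differently from a fit on the raw values, so the estimates can depend on this choice.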

Figure 3.

Cumulative distribution of the transformed citation counts.

When raw citation numbers are transformed according to Eq. 2, the cumulative distributions of different subject-categories become very similar. All citation distributions are mapped on top of the cumulative distribution obtained by aggregating all subject-categories together (the common reference curve in the transformation). We consider here the same subject-categories as those considered in Figs. 1 and 2. The complete analysis of all subject-categories and years of publication is reported in the Supporting Information S2, S3, S4, S5, S6, and S7.

Figure 4.

Comparison between expected and observed proportions of top cited papers.

Probability density function of the proportion of papers belonging to a particular subject-category and that are part of the top of papers in the aggregated dataset. Red boxes are computed on real data, while blue curves represent the density distributions valid for unbiased selection processes. We consider different values of : , , and . These results refer to papers published in .
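The idea of an unbiased selection process can be sketched by simulation: if the top papers were drawn uniformly at random from the aggregated dataset, the share of a category's papers that land among them would fluctuate around the selected fraction. The code below is a minimal sketch under that assumption. The definition of the proportion (hits divided by category size), the function names, and the toy sizes are illustrative and not taken from the paper.

```python
import random


def null_proportions(n_total, n_category, top_fraction, trials=2000, seed=0):
    """Simulate the share of a category's papers that fall in a randomly chosen top set."""
    rng = random.Random(seed)
    n_top = round(n_total * top_fraction)
    results = []
    for _ in range(trials):
        # Papers 0..n_category-1 stand for the category; the top set is drawn without replacement
        chosen = rng.sample(range(n_total), n_top)
        hits = sum(1 for paper in chosen if paper < n_category)
        results.append(hits / n_category)
    return results


def central_interval(values, level=0.95):
    """Empirical central interval containing roughly `level` of the simulated values."""
    ordered = sorted(values)
    last = len(ordered) - 1
    low = ordered[int((1 - level) / 2 * last)]
    high = ordered[int((1 + level) / 2 * last)]
    return low, high


# Toy setting: 1000 papers in total, 100 in the category, top 10% selected
props = null_proportions(1000, 100, 0.10, trials=500)
low, high = central_interval(props)
print(sum(props) / len(props), low, high)
```

An observed proportion outside such an interval would suggest that the ranking favours or penalises the category. The same counts follow a hypergeometric distribution, which could be used in place of the simulation.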

Figure 5.

Effectiveness of the proposed normalization technique.

Percentage of subject-categories whose proportion values, after normalization, fall into the 95% confidence interval of values predicted in our null model. Percentage values are plotted as functions of the percentage of top papers considered in the analysis. We plot separate curves for different publication years.

Figure 6.

Properties of the transformation parameters.

In the inset, we report the density distribution of the transformation exponents calculated for all subject-categories. In the main plot, we show the relation between the transformation exponent , the transformation factor , and the parameters and for the same data points as those appearing in the inset. The relation between the various quantities is fitted by the function , with and (blue line). Both plots have been obtained by analyzing papers published in , but the same results also hold for other publication years, as shown in Figs. S115 and S116.
