Figure 1.
Cumulative distribution of raw citation counts for papers published in
. The blue curve is calculated by aggregating all papers of all subject-categories (average number of citations
). The red curve, the orange curve and the green curve are calculated by considering only papers within the subject-categories “Agronomy” (
), “Computer science, software engineering” (
) and “Genetics & heredity” (
), respectively. The figure illustrates the mapping of
into
. Citation counts
of single subject-categories are matched with the value of
that corresponds to the same value of the cumulative distribution.
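The matching described in this caption can be sketched as a quantile lookup between two empirical distributions. This is only an illustrative sketch: the function name `match_by_cumulative` and its arguments are hypothetical, and the cumulative value is assumed to be the fraction of papers with at most a given number of citations (the caption does not state which convention the figure uses).

```python
import numpy as np

def match_by_cumulative(category_counts, aggregate_counts):
    """Map each distinct category count to the aggregate count that sits at
    the same empirical cumulative value (assumed convention: fraction of
    papers with at most that many citations)."""
    cat = np.sort(np.asarray(category_counts))
    agg = np.sort(np.asarray(aggregate_counts))
    values = np.unique(cat)
    # Number of category papers with at most `value` citations.
    rank = np.searchsorted(cat, values, side="right")
    # Smallest aggregate index whose cumulative fraction reaches
    # rank / len(cat); integer arithmetic avoids floating-point rounding.
    idx = (rank * agg.size + cat.size - 1) // cat.size - 1
    return values, agg[idx]

# Toy data (not taken from the paper).
values, mapped = match_by_cumulative([0, 1, 2, 3], [0, 2, 4, 6, 8, 10, 12, 14])
```

Each entry of `mapped` is the aggregate citation count paired with the corresponding entry of `values`.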
Figure 2.
Transformation of citation counts.
Citation counts within single subject-categories are plotted against citation counts of the aggregated data
. The quantities
and
are related by a power-law relation (Eq. 1). Different subject-categories have different values of the transformation factor
and the transformation exponent
. The best estimates of
and
for the subject-categories considered in this figure (the same subject-categories as those appearing in Fig. 1) are:
and
for “Agronomy”,
and
for “Computer science, software engineering”,
and
for “Genetics & heredity”. The results of the complete analysis for all subject-categories and years of publication are reported in the Supporting Information S2, S3, S4, S5, S6, and S7.
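As an illustration of how a transformation factor and exponent could be estimated from matched pairs of citation counts, the sketch below fits a straight line in log-log space. It assumes a power law of the generic form y = factor * x^exponent and uses ordinary least squares on the logarithms. The caption does not say how the best estimates were obtained, so this is a stand-in technique, and the names `fit_power_law`, `factor`, and `exponent` are hypothetical.

```python
import numpy as np

def fit_power_law(x, y):
    """Estimate (factor, exponent) of y = factor * x**exponent by linear
    least squares on log(x) and log(y). Pairs with a non-positive value are
    dropped because their logarithm is undefined."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = (x > 0) & (y > 0)
    # polyfit returns the slope first, then the intercept.
    exponent, log_factor = np.polyfit(np.log(x[keep]), np.log(y[keep]), 1)
    return float(np.exp(log_factor)), float(exponent)

# Synthetic data generated from a known power law (not from the paper).
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
factor, exponent = fit_power_law(x, 3.0 * x ** 0.5)
```

On noise-free synthetic data the fit recovers the generating parameters. On real citation data, a log-space fit weights low-citation points heavily, so other estimators may be preferable.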
Figure 3.
Cumulative distribution of the transformed citation counts.
When raw citation numbers are transformed according to Eq. 2, the cumulative distributions of different subject-categories become very similar. All citation distributions are mapped on top of the cumulative distribution obtained by aggregating all subject-categories together (the common reference curve in the transformation). We consider here the same subject-categories as those considered in Figs. 1 and 2. The complete analysis of all subject-categories and years of publication is reported in the Supporting Information S2, S3, S4, S5, S6, and S7.
Figure 4.
Comparison between expected and observed proportions of top cited papers.
Probability density function of the proportion of papers that belong to a particular subject-category and are part of the top of papers in the aggregated dataset. Red boxes are computed on real data, while blue curves represent the density distributions expected for unbiased selection processes. We consider different values of
:
,
,
and
. These results refer to papers published in
.
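One way to picture an unbiased selection process is random drawing without replacement: if the top papers were picked at random from the aggregated dataset, the number of them falling in a given subject-category would follow a hypergeometric distribution. The sketch below rests on that assumption, which this caption does not spell out. The function names and the toy numbers are hypothetical.

```python
from math import comb

def hypergeom_pmf(k, total, in_category, drawn):
    """Probability that exactly k of the `drawn` top papers belong to a
    category of size `in_category`, out of `total` papers overall."""
    return (comb(in_category, k) * comb(total - in_category, drawn - k)
            / comb(total, drawn))

def central_interval(total, in_category, drawn, level=0.95):
    """Central interval, at the given level, for the proportion of the
    category's papers that land among the top `drawn` papers."""
    tail = (1.0 - level) / 2.0
    k_min = max(0, drawn - (total - in_category))
    k_max = min(drawn, in_category)
    cumulative, lower, upper = 0.0, None, k_max
    for k in range(k_min, k_max + 1):
        cumulative += hypergeom_pmf(k, total, in_category, drawn)
        if lower is None and cumulative >= tail:
            lower = k
        if cumulative >= 1.0 - tail:
            upper = k
            break
    return lower / in_category, upper / in_category

# Toy setting (not from the paper): 1000 papers, a category of 100 papers,
# and the top 10% (100 papers) selected.
low, high = central_interval(total=1000, in_category=100, drawn=100)
```

An observed proportion outside `(low, high)` would suggest that the category is over- or under-represented among the top papers relative to this null model.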
Figure 5.
Effectiveness of the proposed normalization technique.
Percentage of subject-categories whose proportion values, after normalization, fall into the 95% confidence interval of values predicted in our null model. Percentage values are plotted as functions of the percentage of top papers considered in the analysis. We plot separate curves for different publication years.
Figure 6.
Properties of the transformation parameters.
In the inset, we report the density distribution of the transformation exponents calculated for all subject-categories. In the main plot, we show the relation between the transformation exponent
, the transformation factor
, and the parameters
and
for the same data points as those appearing in the inset. The relation between the various quantities is fitted by the function
, with
and
(blue line). Both plots have been obtained by analyzing papers published in
, but the same results also hold for other publication years, as shown in Figs. S115 and S116.