The authors have declared that no competing interests exist.

The distributions of scientific citations for publications selected by different rules (author, topic, institution, country, journal, etc.) collapse onto a single curve if one plots the citations relative to their mean value. We find that the distribution of “shares” for Facebook posts rescales in the same manner onto the very same curve as scientific citations. This finding suggests that citations are subject to the same growth mechanism as Facebook popularity measures, being influenced by a statistically similar social environment and selection mechanism. In a simple master-equation approach, the exponential growth of the number of publications and a preferential selection mechanism lead to a Tsallis-Pareto distribution, which offers an excellent description of the observed statistics. Based on our model and on data derived from PubMed, we predict that, if the present trend continues, the average number of citations per scientific publication relaxes exponentially to about 4.

The number of citations of a publication is essentially a measure of its social popularity, although it is generally considered to reflect the quality and impact of the research.

Citations are thus in our focus when evaluating researchers, groups and institutes [

It has been reported [

Biology, physics and socio-economic phenomena offer many intriguing examples of scale-free distributions in complex systems [

A simple exercise on citation data collected from more than 600 000 ISI Web of Science (WOS) publications (mapping a part of the WOS citation network by using an Internet robot, please see the

A similar study can be performed on different Facebook pages for their posts (for details please consult again the

The appropriateness of the TP-type fit can be assessed by computing the commonly used coefficient of determination $R^{2}$. The $R^{2} > 0.9$ values presented in

| | WOS | JCR | FB |
|---|---|---|---|
| Number of elements | 629 575 | 12 026 | 160 889 |
| $R^{2}$ | 0.982 | 0.967 | 0.940 |

Goodness of fit for the large datasets (WOS, JCR and FB). $R^{2}$ is the coefficient of determination. The fit is given by the TP distribution,
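As an illustration of the goodness-of-fit measure, the following minimal sketch computes the coefficient of determination between a set of data points and a Tsallis-Pareto curve. The parametrisation used in `tp_pdf`, the function names and the sample numbers are assumptions made for this example only; they are not the fitted values or the code behind the table.

```python
def tp_pdf(x, c, b):
    """Tsallis-Pareto density on x >= 0 (assumed form): (c/b) * (1 + x/b)**(-1 - c)."""
    return (c / b) * (1.0 + x / b) ** (-1.0 - c)


def r_squared(observed, predicted):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical rescaled abscissa values (x = value / mean) and noisy "measurements".
xs = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0]
model = [tp_pdf(x, c=1.5, b=0.5) for x in xs]
data = [m * f for m, f in zip(model, [1.05, 0.95, 1.02, 0.98, 1.04, 0.97])]
print(round(r_squared(data, model), 4))
```

With `c = 1.5` and `b = 0.5` the assumed density has unit mean, matching the convention of plotting values relative to their mean. Whether $R^{2}$ is evaluated on the raw or on the logarithmic values is a separate choice that this sketch does not settle.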

Many models have been already considered for explaining the dynamics of citations [

The approach considered here is the simplest mean-field type approximation, in which only the stochastic nature of the growth process is taken into account and the specific quality of individual posts is coarse-grained. The exponential growth of the number of publications, which are the carriers of the citations, is known (see for example [

On the other hand, the linear preferential growth rate hypothesis, commonly known as the Matthew effect (“For to all those who have, more will be given”), has been highlighted in various social systems [

We consider a classical master equation approach for the growth phenomenon. This approach is the simplest possible mean-field like description where the properties of different elements (posts, publications) are coarse-grained and only the stochastic character of the process is kept. In this framework, the stochastic growth process is quantified by a mean growth rate _{n} describing the transition rate from state with _{n}(_{n}(_{n}(_{n}(

The panel on the left side indicates the growth process in the number of elements with _{n}. Due to the fact that the total number of elements is exponentially increasing, the probability _{n} that an element will have

The number of elements in the considered systems is exponentially increasing. Assuming thus an exponential growth in _{n}(_{n} _{n}(
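The combined effect of exponential growth and preferential selection can be explored numerically. The sketch below is written in an assumed notation, since the symbols are not fixed here: $P_n$ is the fraction of elements with $n$ quanta, $\mu_n = \sigma (n + b)$ is a linear preferential growth rate, $\gamma$ is the exponential growth rate of the number of elements, and every new element is assumed to start at $n = 0$. Under these assumptions the master equation reads $dP_n/dt = \mu_{n-1} P_{n-1} - (\mu_n + \gamma) P_n$, and its stationary solution follows from a simple recursion.

```python
def stationary_distribution(sigma, gamma, b, n_max):
    """Stationary P_n for n = 0..n_max under the assumptions stated above.

    Assumed rates: growth mu_n = sigma * (n + b); the number of elements grows
    as exp(gamma * t) and every new element starts with n = 0.
    Setting dP_n/dt = 0 gives P_0 = gamma / (mu_0 + gamma) and
    P_n = P_{n-1} * mu_{n-1} / (mu_n + gamma).
    """
    def mu(n):
        return sigma * (n + b)

    p = [gamma / (mu(0) + gamma)]
    for n in range(1, n_max + 1):
        p.append(p[-1] * mu(n - 1) / (mu(n) + gamma))
    return p


# Hypothetical parameters, chosen with gamma > sigma so that the mean is finite.
p = stationary_distribution(sigma=1.0, gamma=3.0, b=2.0, n_max=20000)
total = sum(p)
mean = sum(n * pn for n, pn in enumerate(p))
print(total, mean, 1.0 * 2.0 / (3.0 - 1.0))  # last value: sigma * b / (gamma - sigma)
```

In this toy setting the tail of $P_n$ decays as a power law with exponent $1 + \gamma/\sigma$, and the mean approaches $\sigma b / (\gamma - \sigma)$, which is the behaviour expected from a Tsallis-Pareto form.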

We consider now the continuous limit of _{s}(

The solution of this equation writes as

From this simple mean-field type model we learn that the popularity measures for both scientific publications and Facebook posts result from exponential growth combined with a preferential retransmission of the received information. The collapse of the Facebook popularity measures and the scientific citations onto one curve indicates that for their coarse-grained dynamics the ratio
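For orientation, the continuous-limit calculation can be sketched as follows. The notation is an assumption made for this sketch: $P_s(x)$ is the stationary density, $\mu(x) = \sigma (x + b)$ a linear preferential growth rate, and $\gamma$ the exponential growth rate of the number of elements, with $\gamma > \sigma$ so that the mean is finite.

```latex
% Stationary continuity equation with a dilution term from exponential growth
\frac{d}{dx}\left[ \mu(x)\, P_s(x) \right] = -\gamma\, P_s(x),
\qquad \mu(x) = \sigma\,(x + b)

% Separating variables: P_s'/P_s = -(\gamma + \sigma) / [\sigma (x + b)]
P_s(x) = \frac{\gamma}{\sigma b} \left( 1 + \frac{x}{b} \right)^{-1 - \gamma/\sigma}

% Normalised on x >= 0; the mean value is
\langle x \rangle = \int_0^{\infty} x\, P_s(x)\, dx = \frac{\sigma b}{\gamma - \sigma}
```

In this form the shape of the distribution, once plotted against $x / \langle x \rangle$, depends only on $\gamma / \sigma$, which is consistent with a collapse of different datasets onto one curve when they share that ratio.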

From the promising fit indicated in _{n} _{n}(

Combining this with the exponential growth of _{ech}, the equilibrium value for the average citation per paper in the considered ensemble.

We can now determine the time-evolution of the total citations number per year. Let us assume now that we measure the time in years, and introduce the yearly published article number _{0} we have _{0}) = _{0} and _{0}) = _{0} we get that

For the case of scientific articles indexed in MEDLINE/PubMed (see the

_{0} = 2005 and _{0} = 699 915. For _{0} = 2005, _{0} = 699 915, _{0} = 14 792 864,

Our

The data plotted in

For the WOS dataset (scientific citations from ISI Web of Science) we used an Internet robot that started from a given article and reached all the papers cited by it. We extracted only each article’s identification code and repeated this to a depth of four levels, recording the total number of citations for all ISI-indexed articles accessed by this procedure. In total, more than 600 000 articles were reached. The use of such robots for accessing an incomplete part (only accession numbers) of the database is not prohibited by the terms of use for the Web of Science [

For the JCR dataset we have downloaded the table from InCites, Journal Citation Reports [

In collecting the statistics for MEDLINE/PubMed articles we have used the trend for the total number of publications from [

For Facebook we used only public pages and information that is publicly visible. To do this automatically, we registered as a developer and used a publicly available page scraper [

All data used to plot the figures are available for download [

Probability distribution functions were constructed using a logarithmic binning method, considering bins of sizes 2^{n}. In order not to overload
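The logarithmic binning described above can be sketched as follows. This is a minimal illustration rather than the script used for the figures: the function name, the choice of the geometric bin centre and the handling of zero counts are assumptions.

```python
from collections import Counter


def log_binned_pdf(values):
    """Estimate a probability density with bins [2^n, 2^(n+1)) of width 2^n.

    `values` are positive integers (e.g. citation or share counts); zeros fall
    outside every bin and are skipped here, which is a simplification.
    Returns a sorted list of (geometric bin centre, density) pairs.
    """
    positive = [v for v in values if v >= 1]
    # For an integer v >= 1, v.bit_length() - 1 equals floor(log2(v)).
    counts = Counter(v.bit_length() - 1 for v in positive)
    total = len(positive)
    pdf = []
    for n in sorted(counts):
        width = 2 ** n
        centre = width * 2 ** 0.5  # geometric mean of the bin edges
        pdf.append((centre, counts[n] / (total * width)))
    return pdf


print(log_binned_pdf([1, 2, 3, 4, 5, 6, 7]))
```

Dividing each count by the bin width is what turns the histogram into a density, so that sparsely populated wide bins in the tail remain comparable with the narrow bins at small values.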

All data collected on Facebook are publicly available, and no other personal data were collected. No privacy rules were violated.

The work of Tamás Bíro was supported by the NKFIH/OTKA project Nr. 104260 and a STAR-UBB fellowship.