Fig 1.

Decision tree of questions that should be clarified before estimating power-law exponents from data.

The tree shows the conditions under which the fitting algorithms developed in this paper, r_plfit and r_plhistfit, can be used.

Fig 2.

The four types of distribution functions.

Data are sampled from a power-law distribution p(x) ∝ x^{−λ} with exponent λ = 0.7 (red line). The relative frequencies f_i are shown for N = 10,000 sampled data points in their natural (prior) ordering, the one associated with p (blue). The rank-ordered distribution (posterior), in which states i are ordered by their observed relative frequencies f_i, is shown in yellow. It follows a power law except for an exponential decay that starts at rank ∼ 500; a low-frequency cut-off should be used to remove this part before estimating exponents. The inset shows the frequency distribution ϕ(n), which describes how many states x appear n times (green). The frequency distribution has a maximum and a power-law tail with exponent α = 1 + 1/λ ∼ 2.43. To estimate α, one should consider only the tail of the frequency distribution.
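The quantities named in this caption can be reproduced with a short sketch. The code below is only an illustration, not the authors' code: the number of states (1,000), the seed, and the function names `sample_power_law` and `empirical_views` are assumptions made here, since the caption does not state them.

```python
import random
from collections import Counter


def sample_power_law(lam, n_states, n_samples, seed=0):
    """Draw states from {1, ..., n_states} with p(x) proportional to x**(-lam)."""
    rng = random.Random(seed)
    states = range(1, n_states + 1)
    weights = [x ** (-lam) for x in states]
    return rng.choices(states, weights=weights, k=n_samples)


def empirical_views(samples, n_states):
    """Return the three empirical distributions described in the caption."""
    n = len(samples)
    counts = Counter(samples)
    # Relative frequencies f_i in the natural (prior) ordering of the states.
    rel_freq = [counts.get(x, 0) / n for x in range(1, n_states + 1)]
    # Rank-ordered (posterior) distribution: same values, sorted descending.
    rank_ordered = sorted(rel_freq, reverse=True)
    # Frequency distribution phi(n): how many observed states appear n times.
    phi = Counter(counts.values())
    return rel_freq, rank_ordered, phi


lam = 0.7                      # exponent used in the caption
samples = sample_power_law(lam, n_states=1000, n_samples=10_000)
rel_freq, rank_ordered, phi = empirical_views(samples, n_states=1000)
alpha = 1 + 1 / lam            # tail exponent of phi(n) given in the caption
```

Note that `phi` counts only states that were observed at least once, so states with zero occurrences do not appear in it.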

Fig 3.

Comparison of the three power-law exponent estimators, LS, ML_CSN, and ML*.

For 400 values of λ in the range between 0 and 4, we sample N = 10,000 events from Ω = {1, ⋯, 1,000} according to a power-law probability distribution p(x|λ, Ω) ∝ x^{−λ}. The estimated exponents λ_est for the estimators LS (red), ML_CSN (green), and the new ML* (black, λ_est = λ*) are plotted against the true exponent λ of the probability distribution from which the samples are drawn. Below λ ∼ 1.5 the ML_CSN estimator no longer works reliably. ML_CSN and ML* work equally well in the range 1.5 < λ < 3.5. Outside this range ML* performs consistently better than the other methods. The inset shows the mean-square error σ^2 of the estimated exponents. The LS estimator has a much higher σ^2 than the ML* estimator over the entire region. The blue dot represents the ML* estimate of the Zipf exponent of C. Dickens’ “A Tale of Two Cities”. This exponent could never be obtained reliably from the rank-ordered distribution using ML_CSN, whereas ML* works well even for values of λ ∼ 0.
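To make the experiment concrete, here is a minimal sketch of a maximum-likelihood estimate on a bounded sample space. It is a generic bounded-support ML fit written for illustration and is not claimed to be the paper's r_plfit or ML* implementation; the function names, the search interval [0, 4], and the golden-section search are assumptions made here. The log-likelihood is log L(λ) = −λ Σ log x_i − N log Z(λ) with Z(λ) = Σ_{x∈Ω} x^{−λ}, which is concave in λ, so a one-dimensional bracketing search is sufficient.

```python
import math
import random


def sample_power_law(lam, n_states, n_samples, seed=0):
    """Draw states from {1, ..., n_states} with p(x) proportional to x**(-lam)."""
    rng = random.Random(seed)
    states = range(1, n_states + 1)
    weights = [x ** (-lam) for x in states]
    return rng.choices(states, weights=weights, k=n_samples)


def log_likelihood(lam, log_sum, n, log_states):
    """log L = -lam * sum(log x_i) - n * log Z(lam), Z summed over the states."""
    z = sum(math.exp(-lam * lx) for lx in log_states)
    return -lam * log_sum - n * math.log(z)


def ml_exponent(samples, n_states, lo=0.0, hi=4.0, tol=1e-6):
    """Maximize the log-likelihood over [lo, hi] by golden-section search."""
    log_states = [math.log(x) for x in range(1, n_states + 1)]
    log_sum = sum(math.log(x) for x in samples)
    n = len(samples)
    g = (math.sqrt(5) - 1) / 2  # inverse golden ratio
    a, b = lo, hi
    while b - a > tol:
        c = b - g * (b - a)
        d = a + g * (b - a)
        if log_likelihood(c, log_sum, n, log_states) > log_likelihood(d, log_sum, n, log_states):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2


# Same setting as the caption: N = 10,000 events from {1, ..., 1,000}.
data = sample_power_law(0.7, n_states=1000, n_samples=10_000, seed=1)
lam_est = ml_exponent(data, n_states=1000)
```

Because the normalization Z(λ) is a finite sum over Ω, the estimate is well defined for any λ in the search interval, including values close to 0.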

Table 1.

Comparison of the estimators ML* and ML_CSN on empirical data sets that were used in [23].

These include the frequency of surnames, intensity of wars, populations of cities, earthquake intensity, numbers of religious followers, citations of scientific papers, counts of words, wealth of the Forbes 500 firms, numbers of papers authored, solar flare intensity, terrorist attack severity, numbers of links to websites, and forest fire sizes. We added the word frequencies in the novel “A Tale of Two Cities” (C. Dickens). The second column states whether α or λ was estimated. The exponents reported in [23] are found in column CSN1; those that we reproduced by applying their algorithm to the data [23, 34–37] are shown in column CSN2. The latter correspond well with the results of the new ML* algorithm. For values λ < 1.5, CSN cannot be used. We list the corresponding values of the Kolmogorov-Smirnov test for the two estimators, KS_CSN and KS*.
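As an illustration of the Kolmogorov-Smirnov values listed in the table, the sketch below computes the KS distance, the largest absolute difference between the empirical and the model cumulative distributions, for a power law on a bounded set of states. The exact KS procedure used in the paper is not given in this caption, so the function name `ks_distance` and the bounded-support model are assumptions.

```python
from collections import Counter


def ks_distance(samples, lam, n_states):
    """Max |F_empirical(x) - F_model(x)| for p(x) proportional to x**(-lam) on {1, ..., n_states}."""
    n = len(samples)
    counts = Counter(samples)
    weights = [x ** (-lam) for x in range(1, n_states + 1)]
    z = sum(weights)  # normalization constant of the model
    emp_cdf = 0.0
    model_cdf = 0.0
    distance = 0.0
    for x, w in zip(range(1, n_states + 1), weights):
        emp_cdf += counts.get(x, 0) / n
        model_cdf += w / z
        distance = max(distance, abs(emp_cdf - model_cdf))
    return distance


# Toy data: state 1 is over-represented relative to a flat model (lam = 0).
d_flat = ks_distance([1, 1, 2, 3], lam=0.0, n_states=4)
```

A smaller distance indicates a closer agreement between the fitted distribution and the data.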
