Figure 1.
Frequency-rank in genetic sequences.
(A) Log-log plot of the frequency, in descending order, with which the codons appear in the genome of E. coli. The bold line is the discrete generalized beta distribution (DGBD) fit with exponents and squared correlation coefficient (a,b,R2) = (0.25, 0.50, 0.99). The straight line is included as a guide to the eye, indicating power-law behavior within a restricted range. (B) Semi-log plot of the frequency-ordered codons of the genomes of C. elegans, N. gonorrhoeae and E. coli. Solid lines are the fits with (a,b,R2) = (0.28, 0.38, 0.98), (0.31, 0.65, 0.99) for the first two, respectively; the values for E. coli are given in (A). Frequencies for N. gonorrhoeae have been multiplied by a factor of 5 and those of C. elegans by 10 in order to avoid overlaps.
Figure 2.
Frequency-rank distributions for musical scores.
Plot of the occurrence of musical notes, in decreasing order, in the scores of Holst's “The Planets”, the first movement of Beethoven's “Fifth Symphony” and Alice Cooper's “Billion Dollar Babies”. Solid lines are DGBD fits with (a,b,R2) = (0.23, 1.54, 0.988), (0.42, 1.25, 0.987), (0.71, 1.06, 0.978), respectively.
Figure 3.
Size-ordered distributions in abstract paintings.
(A) Plot of the relative sizes of the rectangles, in arbitrary units and in decreasing order, appearing in Klee's painting “Flora in the sand”. The bold line is the DGBD fit with (a,b,R2) = (0.70, 0.14, 0.999). (B) Plot of the relative areas of the circles, in arbitrary units and in decreasing order, present in Kandinsky's “Several Circles”. Here the bold line is the fit with (a,b,R2) = (0.62, 0.32, 0.978).
Figure 4.
Rank-ordered distributions in biological systems.
(A) Plot of the relative area occupied by different species in abandoned fields of Illinois over a span of 40 years [17]. For this case (a,b,R2) = (0.88, 0.76, 0.98). (B) Local field potential measurements of cat cerebral cortex taken every 4 ms in an awake state; a total of 8192 data points are plotted in decreasing order [18], with (a,b,R2) = (0.08, 0.25, 0.98).
Figure 5.
Rank-ordered distributions in social phenomena.
(A) Academic ranking of world universities [19] based on the number of publications in Nature and Science, (a,b,R2) = (0.37, 0.43, 0.99). (B) Bioscience and material science journals ordered by impact factor [20], (a,b,R2) = (0.59, 0.83, 0.99), (0.51, 0.75, 0.99) respectively. (C) Population of the municipalities of the Spanish provinces of Zaragoza and Valladolid [21], (a,b,R2) = (0.95, 0.54, 0.99), (0.98, 0.42, 0.99) respectively.
Figure 6.
Rank-ordered distributions in networks.
(A) Movie actor network based on the Internet Movie Database (cf. http://www.nd.edu/~networks) containing 372,794 actors linked by movie collaborations, (a,b,R2) = (0.71, 0.61, 0.99). (B) E. coli regulatory network nodes ordered by the number of output links, based on the data of reference [22].
Table 1.
Fitting parameters a, b and correlation coefficient R2 for diverse systems.
Figure 7.
Frequency-rank distributions of sextuples generated by an expansion-modification algorithm.
(A) Data are generated by the algorithm described in the text. Circles are obtained with a modification probability p = 0.35; the corresponding solid line is the DGBD fit with (a,b,R2) = (0.36, 1.55, 0.96). For the rhomboids, p = 0.5 and (a,b,R2) = (0.11, 1.28, 0.96). (B) Variation of the parameters (a,b) with the probability p.
Figure 8.
Two-parameter fits for rank-ordered data.
The figure shows three fits for the population of the municipalities of the Mexican state of Chiapas [26] plotted in decreasing order. The bottom set of points corresponds to the original data; the other two sets have been obtained by successively multiplying by 10 in order to distinguish the behavior of each fit. The top fit is the DGBD, the middle one corresponds to a power law multiplied by a Gaussian factor, and the bottom one is a power law multiplied by an exponential factor. All fits have two adjustable parameters and produce good values for R2, in the neighbourhood of 0.97. Notice, however, that the DGBD curve reproduces the overall form of the data more successfully, particularly at the two extremes.
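The comparison in Figure 8 can be illustrated with a minimal sketch. The captions do not state the functional forms, so the following are assumptions: the DGBD is taken as f(r) = A (N + 1 - r)^b / r^a for rank r = 1, ..., N, the Gaussian variant as A r^(-a) exp(-b r^2), and the exponential variant as A r^(-a) exp(-b r). Each form is linear in (ln A, a, b) after taking logarithms, so the sketch uses ordinary least squares in log space; the fitting procedure and the definition of R2 used for the figures may differ. The names `rank_fit` and `TERMS` are hypothetical.

```python
import numpy as np

# Second factor of each assumed model, written as the term multiplying b in ln f(r).
TERMS = {
    "dgbd": lambda r, n: np.log(n + 1 - r),   # f = A (N+1-r)^b / r^a
    "gaussian": lambda r, n: -(r ** 2),       # f = A r^-a exp(-b r^2)
    "exponential": lambda r, n: -r,           # f = A r^-a exp(-b r)
}

def rank_fit(values, second_term):
    """Least-squares fit of ln f = ln A - a ln r + b * second_term(r, N).

    `values` must be strictly positive. Returns (a, b, R2), where R2 is
    computed on the logarithms of the rank-ordered data.
    """
    f = np.sort(np.asarray(values, dtype=float))[::-1]  # rank-order, largest first
    n = f.size
    r = np.arange(1, n + 1, dtype=float)
    X = np.column_stack([np.ones(n), -np.log(r), second_term(r, n)])
    y = np.log(f)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef[1], coef[2], r2

# Synthetic check: data drawn exactly from the assumed DGBD form.
ranks = np.arange(1, 201, dtype=float)
synthetic = 3.0 * (201 - ranks) ** 0.5 / ranks ** 0.25
a_fit, b_fit, r2_fit = rank_fit(synthetic, TERMS["dgbd"])
```

Calling `rank_fit` on the same data with each entry of `TERMS` gives three (a, b, R2) triples that can be compared in the manner of Figure 8. Because all three models share similar R2 values there, inspecting the residuals at the lowest and highest ranks is more informative than R2 alone.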