The “handedness” of language: Directional symmetry breaking of sign usage in words
Probability of occurrence of the 26 letters of the English alphabet in the Mieliestronk corpus, which comprises about 58,000 unique English words (see Methods for details), at (a) any position, (b) the left terminal position (i.e., the beginning) and (c) the right terminal position (i.e., the end) of a word. The letter occurrence probabilities are more heterogeneous in (c) than in (b): only a few letters occur with high frequency at the right terminal position of a word, whereas the frequencies of occurrence at the left terminal position are relatively more egalitarian. This difference is illustrated by the Lorenz curves in (d), which compare the cumulative distribution functions of the occurrence probabilities of the different letters at any (solid curve), left terminal (dash-dotted curve) and right terminal (dashed curve) position of a word. The thin broken diagonal line (the line of perfect equality) corresponds to a perfectly uniform distribution; deviation from it indicates the extent of heterogeneity of the letter occurrence probability distribution. This heterogeneity is measured by the Gini index, which is the ratio of the area between the line of perfect equality and the observed Lorenz curve to the area between the line of perfect equality and the line of perfect inequality (viz., the horizontal line).
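The quantities described in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' actual analysis code: the function names (`letter_probabilities`, `lorenz_curve`, `gini_index`), the toy word list and the trapezoidal estimate of the area under the Lorenz curve are all assumptions introduced here. With the area between the lines of perfect equality and perfect inequality equal to 1/2, the Gini index reduces to G = 1 - 2A, where A is the area under the observed Lorenz curve.

```python
from collections import Counter
import string

ALPHABET = string.ascii_lowercase  # the 26 letters of the English alphabet


def letter_probabilities(words, position="any"):
    """Occurrence probability of each letter at a given position.

    position: "any", "left" (first letter) or "right" (last letter).
    Returns a list of 26 probabilities ordered a..z.
    """
    counts = Counter()
    for word in words:
        # Keep only alphabetic characters (assumption: case is ignored).
        letters = [c for c in word.lower() if c in ALPHABET]
        if not letters:
            continue
        if position == "left":
            counts[letters[0]] += 1
        elif position == "right":
            counts[letters[-1]] += 1
        else:
            counts.update(letters)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no letters found in the word list")
    return [counts[c] / total for c in ALPHABET]


def lorenz_curve(probs):
    """Points (x, y) of the Lorenz curve for a list of non-negative values."""
    values = sorted(probs)  # ascending, so the curve lies below the diagonal
    total = sum(values)
    n = len(values)
    xs, ys = [0.0], [0.0]
    running = 0.0
    for k, v in enumerate(values, start=1):
        running += v
        xs.append(k / n)             # cumulative fraction of letters
        ys.append(running / total)   # cumulative fraction of probability
    return xs, ys


def gini_index(probs):
    """Gini index G = 1 - 2A, with A the trapezoidal area under the Lorenz curve."""
    xs, ys = lorenz_curve(probs)
    area = sum(
        (xs[i + 1] - xs[i]) * (ys[i + 1] + ys[i]) / 2.0
        for i in range(len(xs) - 1)
    )
    return 1.0 - 2.0 * area


if __name__ == "__main__":
    # Hypothetical toy word list, standing in for the actual corpus.
    toy_words = ["language", "symmetry", "letters", "words", "usage", "sign"]
    for pos in ("any", "left", "right"):
        print(pos, round(gini_index(letter_probabilities(toy_words, pos)), 3))
```

Under this discrete estimate a perfectly uniform distribution gives G = 0, while a distribution concentrated on a single one of n letters gives G = 1 - 1/n rather than exactly 1, so values for a 26-letter alphabet are bounded above by 25/26.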