The “handedness” of language: Directional symmetry breaking of sign usage in words

The observed asymmetry in the heterogeneity of letter occurrence probabilities between the left and right terminal positions is significant when the database is sufficiently large.

Gini index differential ΔG between the left and right terminal letter (1-gram) distributions, calculated from a set of N words and shown as a function of N. Empirical results are for random samples (drawn without replacement) from the Mieliestronk corpus of about 58,000 unique English words; each data point (circles) is the average over 10^3 samples of size N. For each empirical sample, a corresponding randomized sample is created by randomly permuting the letters within each of the N words; each data point for the randomized set (squares) is the average over the randomizations of the 10^3 samples of size N. With increasing N, the empirical distribution becomes distinguishable from the randomized set (which, by construction, should show no left-right asymmetry). Error bars indicate the standard deviation over the different samples.
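
The sampling procedure described in the caption can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sign convention ΔG = G_left − G_right, the 26-letter lowercase alphabet (unobserved letters counted as zeros), the standard sorted-value form of the Gini index, and all function names (`gini`, `terminal_gini`, `delta_g`, `shuffled`, `mean_delta_g`) are choices made here for illustration.

```python
import random
import string
from collections import Counter


def gini(values):
    """Gini index of non-negative values (0 = uniform, (n-1)/n = one nonzero value)."""
    v = sorted(values)
    n = len(v)
    total = sum(v)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula for values sorted in ascending order, ranks 1..n.
    weighted = sum((i + 1) * x for i, x in enumerate(v))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def terminal_gini(words, position, alphabet=string.ascii_lowercase):
    """Gini index of letter counts at one terminal position (0 = left, -1 = right).

    Assumption: characters outside `alphabet` are ignored.
    """
    counts = Counter(w[position] for w in words if w)
    return gini([counts.get(ch, 0) for ch in alphabet])


def delta_g(words):
    """Assumed convention: Gini(left terminal) minus Gini(right terminal)."""
    return terminal_gini(words, 0) - terminal_gini(words, -1)


def shuffled(words, rng):
    """Randomized control: permute the letters within each word."""
    result = []
    for w in words:
        letters = list(w)
        rng.shuffle(letters)
        result.append("".join(letters))
    return result


def mean_delta_g(corpus, n, trials, rng, randomize=False):
    """Average delta_g over `trials` samples of size n drawn without replacement."""
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(corpus, n)  # corpus must be a list of unique words
        if randomize:
            sample = shuffled(sample, rng)
        total += delta_g(sample)
    return total / trials
```

Given a list `corpus` of unique lowercase words, calling `mean_delta_g(corpus, n, 1000, random.Random(0))` and the same call with `randomize=True` would give one empirical and one randomized data point for sample size `n`. Collecting the per-sample values instead of summing them would also allow the standard deviation used for the error bars to be computed.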

