Can Menzerath’s law be a criterion of complexity in communication?

doi:10.1371/journal.pone.0256133

Fig 1.

Memoryless source model.

The memoryless source has a single state and emits, in an infinite loop, the symbols C, V and S with probabilities p_c, p_v and 1 − p_c − p_v, respectively.
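
The source is easy to simulate. The sketch below is illustrative and rests on two assumptions that this caption does not spell out: a word is the run of symbols between two S symbols, and every V counts as one syllable, so a word's syllable count equals its vowel count. The helper names and the value p_v = 0.35 in the demo are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical helper names for illustration; they are not taken from the paper.


def simulate_memoryless_source(n_symbols, p_c, p_v, seed=0):
    """Emit n_symbols i.i.d. symbols: 'C' w.p. p_c, 'V' w.p. p_v, 'S' otherwise."""
    if p_c < 0 or p_v < 0 or p_c + p_v >= 1:
        raise ValueError("need p_c, p_v >= 0 and p_c + p_v < 1")
    rng = random.Random(seed)
    weights = [p_c, p_v, 1.0 - p_c - p_v]
    return "".join(rng.choices("CVS", weights=weights, k=n_symbols))


def mean_syllable_size_by_word_size(stream):
    """Mean syllable size (characters per syllable) for each word size in syllables.

    Assumptions: words are the runs of symbols between 'S' symbols, and each 'V'
    is one syllable. Words without any vowel (including empty runs between two
    consecutive 'S') have no syllable and are skipped.
    """
    totals = defaultdict(lambda: [0, 0])  # syllables -> [total chars, total syllables]
    for word in stream.split("S"):
        n_syll = word.count("V")
        if n_syll == 0:
            continue
        totals[n_syll][0] += len(word)
        totals[n_syll][1] += n_syll
    return {n: chars / sylls for n, (chars, sylls) in sorted(totals.items())}


if __name__ == "__main__":
    # p_v = 0.35 is a hypothetical value chosen only for this demo.
    stream = simulate_memoryless_source(1_000_000, p_c=0.48, p_v=0.35)
    for n, size in list(mean_syllable_size_by_word_size(stream).items())[:5]:
        print(n, round(size, 3))
```

Because words are split on S, each word is a maximal run of C and V symbols; under these assumptions the printed means decrease as word size grows.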

Fig 2.

Menzerath-Altmann’s law for the memoryless source.

The parameter p_c = 0.48 is chosen to be consistent with the consonant frequencies observed in real languages. The solid line is the theoretical model given by Eqs (4) and (5), whereas the squares are simulations of the memoryless source.
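
For orientation, the shape of a memoryless baseline can be worked out by hand. The derivation below is a sketch under two assumptions that this caption does not state: each vowel V is one syllable, and a word is the run of symbols between two S symbols. It is not a reproduction of Eqs (4) and (5). A word with n syllables then contains n + 1 runs of consonants (before the first vowel, between vowels, and after the last one), and each run has length k with probability p_c^k (1 − p_c), whose mean is p_c/(1 − p_c). Writing ℓ for the word length in characters:

```latex
% Assumptions: one syllable per vowel; words delimited by S.
% n + 1 independent consonant runs, each with mean length p_c / (1 - p_c).
\mathbb{E}[\ell \mid n] = n + (n+1)\,\frac{p_c}{1-p_c},
\qquad
\mathbb{E}\!\left[\frac{\ell}{n} \,\middle|\, n\right]
  = 1 + \frac{p_c}{1-p_c}\left(1 + \frac{1}{n}\right).
```

Under these assumptions p_v drops out, the curve decreases monotonically in n, and it tends to 1/(1 − p_c) as n grows; for p_c = 0.48 it falls from about 2.85 characters per syllable at n = 1 towards about 1.92.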

Table 1.

The adopted grapheme classification into seven sonority classes.

Table 2.

Summary of characteristics for the 21 languages studied, including the total number of books, word tokens, syllables, consonants (C) and vowels (V), as well as the probabilities of vowels (p_v), consonants (p_c) and spaces (p_s).

The last column (Syll/V) reports the syllable-vowel ratio.
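
The probabilities in this table are relative frequencies of symbol classes. A minimal sketch of that computation follows, under loudly stated assumptions: the vowel inventory VOWELS is a hypothetical stand-in for the paper's grapheme classification (Table 1), every whitespace-separated word contributes exactly one space symbol, and digits and punctuation are ignored. The Syll/V column would then be the syllable count divided by the vowel count, with syllables coming from a syllabifier not shown here.

```python
from collections import Counter

# Hypothetical vowel inventory for illustration only; the paper's own grapheme
# classification (seven sonority classes, Table 1) is not reproduced here.
VOWELS = set("aeiouáéíóúàèìòùâêîôûäëïöü")


def symbol_probabilities(text, vowels=VOWELS):
    """Return (p_c, p_v, p_s) for a text (hypothetical helper).

    Assumptions: each whitespace-separated token is one word followed by one
    space symbol; alphabetic characters are vowels if in `vowels`, otherwise
    consonants; digits and punctuation are ignored.
    """
    counts = Counter()
    for token in text.lower().split():
        letters = [ch for ch in token if ch.isalpha()]
        if not letters:
            continue  # skip tokens made only of punctuation or digits
        counts["V"] += sum(ch in vowels for ch in letters)
        counts["C"] += sum(ch not in vowels for ch in letters)
        counts["S"] += 1
    total = counts["C"] + counts["V"] + counts["S"]
    if total == 0:
        raise ValueError("text contains no letters")
    return counts["C"] / total, counts["V"] / total, counts["S"] / total


if __name__ == "__main__":
    print(symbol_probabilities("The cat sat."))  # (0.5, 0.25, 0.25)
```

In the toy sentence there are 6 consonants, 3 vowels and 3 word separators, hence the 0.5/0.25/0.25 split.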

Fig 3.

Different methods of computing MAL yield similar results.

Relation between word size (measured in number of syllables) and the mean size of its syllables; each panel corresponds to a different language of the Gutenberg corpus. Each thin grey line represents one book and black circles are the mean over books, whereas blue squares are the result of computing MAL from the full corpus. Both methods show similar results; solid lines are drawn only for visual comparison. Results for 17 additional languages are provided in S1 Fig.
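
The two aggregation methods can be contrasted in a short sketch. It assumes each word token has already been reduced to a pair (number of syllables, number of characters); the helper names are hypothetical. Averaging per-book curves weights every book equally, while pooling the corpus weights every token equally, which is why the two can differ slightly.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical helper names; each word token is a pair (n_syllables, n_characters).


def mal_curve(words):
    """Mean syllable size (characters per syllable) for each word size in syllables."""
    chars = defaultdict(int)
    sylls = defaultdict(int)
    for n_syll, n_chars in words:
        if n_syll > 0:  # tokens without syllables cannot be placed on the curve
            chars[n_syll] += n_chars
            sylls[n_syll] += n_syll
    return {n: chars[n] / sylls[n] for n in sorted(sylls)}


def mean_over_books(books):
    """One curve per book, then the unweighted mean across books at each word size."""
    per_size = defaultdict(list)
    for book in books:
        for n, size in mal_curve(book).items():
            per_size[n].append(size)
    return {n: mean(sizes) for n, sizes in sorted(per_size.items())}


def pooled_corpus(books):
    """A single curve computed from all tokens of all books at once."""
    return mal_curve(word for book in books for word in book)


if __name__ == "__main__":
    # Toy books, invented for illustration only.
    book_a = [(1, 3), (2, 5)]
    book_b = [(1, 2), (1, 2), (2, 4)]
    print(mean_over_books([book_a, book_b]))  # {1: 2.5, 2: 2.25}
    print(pooled_corpus([book_a, book_b]))    # {1: 2.3333333333333335, 2: 2.25}
```

In the toy data, book_b has more monosyllables, so it dominates the pooled value at word size 1 but counts only once in the mean over books.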

Table 3.

Menzerath’s law tested on the 21 languages under study.

Spearman’s rank correlation coefficients and p-values between word size and mean syllable size. The regression function (4) for the memoryless source is monotonically decreasing, so its Spearman’s rank correlation coefficient equals −1.
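
Spearman's coefficient is the Pearson correlation of ranks, so any strictly decreasing relation between word size and mean syllable size gives exactly −1. A dependency-free sketch follows, with ties receiving average ranks; the helper names are hypothetical, and p-values are not computed here (scipy.stats.spearmanr returns both the coefficient and a p-value).

```python
import math

# Hypothetical helper names for illustration.


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(sxx * syy)  # ZeroDivisionError if either input is constant


if __name__ == "__main__":
    sizes = [1, 2, 3, 4]
    decreasing = [1 + 1 / n for n in sizes]  # any strictly decreasing curve
    print(spearman_rho(sizes, decreasing))  # -1.0
```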

Fig 4.

Menzerath-Altmann’s law and the memoryless source baseline.

Relation between word size and mean syllable size for Italian, Dutch, Spanish and Portuguese. Experimental results are shown as blue circles, the red dotted line is a fit to Menzerath-Altmann’s law (2), and the gray solid line corresponds to Eq (4) with p_c given by the relative frequency of consonants. Results for 17 additional languages are provided in S2 Fig.
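
The gray baseline depends on a single parameter, p_c. The sketch below assumes, for illustration, that each vowel is one syllable and that words are runs of symbols between spaces. Under those assumptions the expected mean syllable size of an n-syllable word works out to 1 + p_c/(1 − p_c) · (1 + 1/n). This closed form is derived here for the sketch and may not match Eq (4) term by term. It also assumes that p_c is the relative frequency of consonants among all consonant, vowel and space symbols. The helper names are hypothetical.

```python
# Hypothetical helper names; the closed form is derived for this sketch, not copied from Eq (4).


def consonant_probability(n_consonants, n_vowels, n_spaces):
    """Relative frequency of consonants among all C, V and S symbols (assumed definition)."""
    return n_consonants / (n_consonants + n_vowels + n_spaces)


def memoryless_baseline(word_size, p_c):
    """Expected mean syllable size of words with `word_size` syllables.

    Assumes one syllable per vowel and words delimited by spaces: the n vowels
    leave n + 1 consonant runs, each of mean length p_c / (1 - p_c).
    """
    if word_size < 1 or not 0 <= p_c < 1:
        raise ValueError("need word_size >= 1 and 0 <= p_c < 1")
    return 1 + p_c / (1 - p_c) * (1 + 1 / word_size)


if __name__ == "__main__":
    p_c = consonant_probability(48, 35, 17)  # toy counts, for illustration only
    print([round(memoryless_baseline(n, p_c), 3) for n in range(1, 5)])
    # [2.846, 2.385, 2.231, 2.154]
```

Comparing observed points against such a baseline shows how much of the decreasing trend a structureless source already produces.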

Table 4.

The estimated parameters of MAL for the Gutenberg Corpus.

MAL was fitted to the experimental data with the Levenberg–Marquardt algorithm, excluding words with only one syllable. The coefficient of determination R² measures the goodness of fit. Column β/γ gives the position of the observable extremum of MAL when β ⋅ γ > 0 and is left blank otherwise.
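
The table's fits use the Levenberg–Marquardt algorithm. The sketch below swaps in a simpler technique: ordinary least squares on the log-linearized model ln y = ln α + β ln x − γ x, assuming MAL takes the form y = α x^β e^{−γx}, the form whose extremum lies at x = β/γ as in the β/γ column. Because it minimizes errors in ln y rather than in y, its estimates can differ from a Levenberg–Marquardt fit. They are, however, reasonable starting values for one, for example with scipy.optimize.curve_fit(..., method="lm"). R² is computed on the original scale of y, and the helper names are hypothetical.

```python
import math

# Hypothetical helper names; log-linear OLS is a stand-in for Levenberg-Marquardt.


def solve3(a, b):
    """Solve a 3x3 linear system a @ z = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    z = [0.0] * 3
    for r in (2, 1, 0):  # back substitution
        z[r] = (m[r][3] - sum(m[r][c] * z[c] for c in range(r + 1, 3))) / m[r][r]
    return z


def fit_mal_loglinear(xs, ys):
    """Fit y = alpha * x**beta * exp(-gamma * x) by least squares on ln y.

    Monosyllabic words (x = 1) are excluded, as in the table's fits.
    Returns (alpha, beta, gamma, r2, extremum), extremum = beta/gamma if beta*gamma > 0.
    """
    pts = [(x, y) for x, y in zip(xs, ys) if x > 1 and y > 0]
    if len(pts) < 3:
        raise ValueError("need at least three word sizes with x > 1")
    feats = [(1.0, math.log(x), -x) for x, _ in pts]  # coefficients: ln alpha, beta, gamma
    targets = [math.log(y) for _, y in pts]
    # Normal equations (F^T F) theta = F^T t
    ftf = [[sum(f[i] * f[j] for f in feats) for j in range(3)] for i in range(3)]
    ftt = [sum(f[i] * t for f, t in zip(feats, targets)) for i in range(3)]
    log_alpha, beta, gamma = solve3(ftf, ftt)
    alpha = math.exp(log_alpha)
    # Goodness of fit on the original scale of y
    preds = [alpha * x ** beta * math.exp(-gamma * x) for x, _ in pts]
    y_mean = sum(y for _, y in pts) / len(pts)
    ss_res = sum((y - p) ** 2 for (_, y), p in zip(pts, preds))
    ss_tot = sum((y - y_mean) ** 2 for _, y in pts)
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    extremum = beta / gamma if beta * gamma > 0 else None
    return alpha, beta, gamma, r2, extremum


if __name__ == "__main__":
    # Synthetic curve with known parameters, for illustration only.
    xs = list(range(1, 9))
    ys = [3.0 * x ** 0.2 * math.exp(-0.1 * x) for x in xs]
    print(fit_mal_loglinear(xs, ys))
```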

Fig 5.

Syllable sizes depending on word length and position.

Mean syllable size, in characters, for different word sizes and syllable positions in English, Hungarian, Esperanto and Tagalog. The position in the word has been standardized according to Eq (6), with 0 for the first syllable and 1 for the last. The mean size for monosyllabic words is shown as a black circle at x = 0.5. Results for 17 additional languages are provided in S4 Fig.
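
Eq (6) is not reproduced in this caption. The sketch below assumes the simplest standardization consistent with it, a linear rescaling i/(n − 1) of the 0-based syllable index i in an n-syllable word, with monosyllables placed at 0.5 as in the figure. It then averages syllable lengths per (word size, position) pair. Syllabified input is assumed, and the helper names and toy syllabifications are hypothetical.

```python
from collections import defaultdict

# Hypothetical helper names; the linear position map is an assumption, not Eq (6) itself.


def standardized_position(i, n):
    """Map 0-based syllable index i in an n-syllable word to [0, 1].

    First syllable -> 0, last -> 1; monosyllables -> 0.5, as in the figure.
    """
    if not 0 <= i < n:
        raise ValueError("need 0 <= i < n")
    return 0.5 if n == 1 else i / (n - 1)


def mean_size_by_position(words):
    """Average syllable length (characters) per (word size, standardized position).

    `words` is an iterable of syllabified words, each a list of syllable strings.
    """
    acc = defaultdict(lambda: [0, 0])  # (n, position) -> [sum of lengths, count]
    for sylls in words:
        n = len(sylls)
        for i, syl in enumerate(sylls):
            key = (n, standardized_position(i, n))
            acc[key][0] += len(syl)
            acc[key][1] += 1
    return {key: total / count for key, (total, count) in sorted(acc.items())}


if __name__ == "__main__":
    toy = [["cat"], ["ta", "ble"], ["wa", "ter"], ["ba", "na", "na"]]
    print(mean_size_by_position(toy))
    # {(1, 0.5): 3.0, (2, 0.0): 2.0, (2, 1.0): 3.0, (3, 0.0): 2.0, (3, 0.5): 2.0, (3, 1.0): 2.0}
```

Keying on both word size and position keeps a middle syllable of a three-syllable word (position 0.5) separate from a monosyllable, which the figure also places at 0.5.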
