Comparison of word selection, letter occurrence & possible acronyms

Posted by Browser on 20 Jan 2012 at 16:48 GMT

One question that arises is that of double articulation in Tweets, and the possible use of the first letters of words to form acronyms and anagrams.

A comparison of letter occurrence in the word list presented in this article shows a rough linear relationship between how often each letter occurs overall in the chosen words and how often it occurs as the first letter of those words:

BLF = Beginning Letter Frequency
TLF = Total Letter Frequency

BLF = 0.0661(TLF) + 0.0222

R-Squared = 0.2294
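
For anyone wishing to repeat this kind of comparison, a minimal sketch in Python follows. It is not the exact procedure used for the figures above: it assumes BLF and TLF are relative frequencies (proportions) over the letters a to z, it uses an ordinary least-squares fit, and the function names and the sample word list are hypothetical.

```python
from collections import Counter
import string

LETTERS = string.ascii_lowercase


def letter_frequencies(words):
    """Return (blf, tlf): relative frequency of each letter a-z as a
    first letter (BLF) and across all letters of all words (TLF).
    Assumes the list contains at least one word starting with a-z."""
    words = [w.lower() for w in words]
    first = Counter(w[0] for w in words if w and w[0] in LETTERS)
    total = Counter(ch for w in words for ch in w if ch in LETTERS)
    n_first = sum(first.values())
    n_total = sum(total.values())
    blf = {c: first[c] / n_first for c in LETTERS}
    tlf = {c: total[c] / n_total for c in LETTERS}
    return blf, tlf


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept.
    Returns (slope, intercept, r_squared). Assumes xs and ys each vary."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical sample; replace with the words from the article's data file.
sample_words = ["storm", "cat", "park", "apple", "dog", "sun", "coffee"]
blf, tlf = letter_frequencies(sample_words)
slope, intercept, r2 = fit_line([tlf[c] for c in LETTERS],
                                [blf[c] for c in LETTERS])
print(f"BLF = {slope:.4f}(TLF) + {intercept:.4f}, R-Squared = {r2:.4f}")
```

If the frequencies are expressed as counts or percentages rather than proportions, the slope and intercept will change, so the coefficients from this sketch need not match those quoted above.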

For the total list of words included in the data file, rates of co-occurrence are relatively high (and increasing) for:

w, f, b, p, m, c, d, s

and relatively low (but increasing) for:

u, o, n, i, e

The 15 most frequently used first letters, in order of (rapidly) decreasing frequency, are:

s, c, p, a, d, b, t, r, m, f, e, h, l, w, i
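
A ranking of this kind can be produced with a short Python sketch such as the one below. The function name and the sample words are hypothetical, and ties are left in the order in which the letters are first encountered.

```python
from collections import Counter
import string


def top_first_letters(words, n=15):
    """Return the n most frequent first letters (a-z), most frequent first."""
    counts = Counter(
        w[0].lower() for w in words
        if w and w[0].lower() in string.ascii_lowercase
    )
    return [letter for letter, _ in counts.most_common(n)]


# Hypothetical sample; replace with the words from the article's data file.
sample_words = ["sun", "sea", "storm", "cat", "coffee", "park"]
print(", ".join(top_first_letters(sample_words, n=3)))
```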

One wonders how these results would compare with letter usage in other short texts, such as newspaper and web site headlines for news articles.

