what is random and reasonable?

Posted by beckon on 18 Mar 2010 at 17:23 GMT

It may be of interest to the authors that a couple of decades ago biogeographers when through an extended discussion over how to populate islands randomly as a baseline for comparison with the species composition on real islands. How much should the selection of candidate species for random populations be guided by the characteristics of real populations? The analogy seems to be close with choosing the constraints on random text. It might be useful to go back over that literature. Linguists don't seem to have thought about this as much as biogeographers did.

RE: what is random and reasonable?

beckon replied to beckon on 18 Mar 2010 at 17:27 GMT

Sorry... I meant to say "biogeographers WENT through..."

RE: what is random and reasonable?

rferrericancho replied to beckon on 22 Mar 2010 at 18:25 GMT

In my opinion, the discussion about the relevance or meaningfulness of Zipf’s law lacks a proper null hypothesis. “Random typing” could be considered a null hypothesis, but no cognitive scientist would agree that this is the way words are produced when we speak or write. This connects with the discussion at the end of our article.
If we assumed, more realistically, that there is a mental lexicon, a possible null hypothesis would be “words are chosen uniformly at random from a mental lexicon”. But this would not give Zipf’s law (with a typical exponent): the rank histogram would be flat (if the sample were large enough).
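To make the contrast between the two candidate null hypotheses concrete, here is a minimal simulation sketch. It is only an illustration under assumed settings: the lexicon size (100 words), the four-letter alphabet, the space probability (0.2) and the sample sizes are arbitrary choices, not values from the article, and the function names are hypothetical.

```python
import random
from collections import Counter


def rank_frequencies(tokens):
    """Return token counts sorted from most to least frequent."""
    return sorted(Counter(tokens).values(), reverse=True)


def uniform_lexicon_sample(lexicon_size, n_tokens, rng):
    """Draw word ids uniformly at random from a hypothetical mental lexicon."""
    return [rng.randrange(lexicon_size) for _ in range(n_tokens)]


def random_typing_sample(alphabet, space_prob, n_chars, rng):
    """Type characters at random; a space ends the current (non-empty) word."""
    words, current = [], []
    for _ in range(n_chars):
        if rng.random() < space_prob:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(rng.choice(alphabet))
    return words


# All parameters below are illustrative assumptions.
rng = random.Random(0)
uniform_freqs = rank_frequencies(uniform_lexicon_sample(100, 200_000, rng))
typing_freqs = rank_frequencies(random_typing_sample("abcd", 0.2, 200_000, rng))

# Ratio of the most frequent to the least frequent word under each model.
uniform_ratio = uniform_freqs[0] / uniform_freqs[-1]
typing_ratio = typing_freqs[0] / typing_freqs[-1]

print("uniform lexicon, max/min frequency:", round(uniform_ratio, 2))
print("random typing, max/min frequency:", round(typing_ratio, 2))
```

With a large sample, the uniform-lexicon ratio should stay close to 1 (a flat rank histogram), whereas random typing should give a steeply skewed one, because short words are produced far more often than long ones.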
Another problem is that many researchers look at Zipf’s law as a null hypothesis; see, for instance:

Miller, G. A. & Chomsky, N. 1963. Finitary models of language users. In: Handbook of Mathematical Psychology (Ed. by R. D. Luce, R. R. Bush & E. Galanter), pp. 419–492. New York: J. Wiley.

(or, more recently, Nowak, M. A. 2000. The basic reproductive ratio of a word, the maximum size of a lexicon. Journal of Theoretical Biology, 204 (2), 179–189.)

The point is that a null hypothesis is a possible explanation for a phenomenon but not the phenomenon itself.

