Statistics of Language Morphology Change: From Biconsonantal Hunters to Triconsonantal Farmers


  • Noam Agmon, 
  • Yigal Bloch


Linguistic evolution mirrors cultural evolution, of which one of the most decisive steps was the "agricultural revolution" that occurred 11,000 years ago in W. Asia. Traditional comparative historical linguistics becomes inaccurate for time depths greater than, say, 10 kyr. Therefore it is difficult to determine whether decisive events in human prehistory have had an observable impact on human language. Here we supplement the traditional methodology with independent statistical measures showing that following the transition to agriculture, languages of W. Asia underwent a transition from biconsonantal (2c) to triconsonantal (3c) morphology. Two independent proofs for this are provided. Firstly, the reconstructed Proto-Semitic fire and hunting lexicons are predominantly 2c, whereas the farming lexicon is almost exclusively 3c in structure. Secondly, while Biblical verbs show the usual Zipf exponent of about 1, their 2c subset exhibits a larger exponent. After the 2c > 3c transition, this could arise from a faster decay in the frequency of use of the less common 2c verbs. Using an established frequency-dependent word replacement rate, we calculate that the observed increase in the Zipf exponent has occurred over the 7,500 years predating Biblical Hebrew, namely, starting with the transition to agriculture.


For most of its history, Homo sapiens sapiens followed the hunter-gatherer way of life. Between 15,000 and 10,000 years ago, a major transition in human sustenance was instigated in W. Asia, which set the stage for modern human society: the transition to agriculture [1]. This included domestication of plants and mammals [2], sedentism and the establishment of the large Neolithic villages [3]. An important factor allowing the concomitant enhancement in social complexity was the ability to communicate. Was language influenced by this decisive step in human prehistory? Most of the historical linguistic literature does not explicitly relate to this intriguing question.

In Semitic languages [4], a hypothetical transition from biconsonantal (2c) to triconsonantal (3c) language morphology has been debated for quite some time [5]. Semitic lexemes are derived from roots consisting of predominantly three radicals (i.e., root consonants), termed 3c. However, there is a small corpus of 2c roots (defined in Methods), responsible for most of the irregular Semitic verbs. Are these remnants from a more archaic linguistic phase? One observation favoring this is the relative abundance of 2c terms for body parts and, particularly, facial features (“eye”, “tooth”, etc.). If this semantic field originated early in language development then so did the 2c morphology. But how can we know this?

Further progress can be made by correlating linguistic and archeological innovations. Selecting an archeologically dateable semantic field (e.g., materials), we have shown [6] that, in the reconstructed Proto-Semitic (PS) language [7,8], names of materials known to and utilized by early hunter-gatherers (wood, reed, stone, flint, lime, gravel, sand, mud, clay, cloth, skin and water) are overwhelmingly (85%) of 2c morphology, while materials introduced as of the Neolithic period in W. Asia (bitumen, sulfur, salt, charcoal, pottery, brick, wool, lead, antimony, copper, silver and gold) were all given 3c names. This non-uniform distribution of 2c vs. 3c lexemes in these two semantic fields suggests that a 2c > 3c language morphology change accompanied the transition to agriculture in the Early Neolithic, ca. 11,000 years Before Present (BP).

Such a dramatic event in the prehistory of pre-Semitic languages, if it occurred, must have impacted the statistics of 2c vs. 3c lexemes in Semitic languages. The present work explores two independent consequences of such an irreversible language replacement process that together provide rather compelling evidence for its occurrence. The first makes use of comparative linguistics and archeology, whereas the second uses lexical statistics.

Firstly, there should be a rift between the lexicon of farmers and their predecessor hunter-gatherers. One can nowadays reconstruct PS rather reliably [8] thanks to the extensive Akkadian (Akk.) texts [9], which go back 2.5–4.5 kyr. PS was supposedly spoken during the Chalcolithic period, sometime between 5,750 BP [10] and 6,300 BP [11]. The society then was already composed of well-established agricultural communities, whose language must have contained the linguistic innovations of the agricultural era side-by-side with relics from the hunter-gatherer lexicon, prevailing just 5 kyr earlier. We therefore reconstruct the hunter-gatherer and farmer lexicons at the PS level, focusing predominantly on archeologically dateable human innovations. The reconstructions are justified in the Etymological Appendix (EA) that includes Text S1, Tables S1-S4, and Text S2 within the Supporting Information (the complete EA is linked as Text S3).

Just as in the case of materials, we expect the hunter-gatherer lexicon to be enriched in 2c lexemes (bearing in mind that some of these have already been replaced by new 3c terms), whereas the farmer's lexicon should have 3c morphology. Additionally, a sizeable percentage of the PS hunter's lexicon should appear in Proto Afroasiatic (PAA) [11-18], the predecessor of PS, or even in the reconstructed lexicon of the Nostratic macrofamily [19-22].

The second approach utilizes word frequency analysis [23], starting with Zipf, who showed that when words in a given text are ranked (r) by their frequency (f) of utilization, a power-law is observed [24]:

$$f(r) = A\,r^{-\alpha} \qquad (1)$$

The Zipf exponent α is about 1 in natural languages, while A is a normalizing factor that depends on the total size of the textual corpus [23]. A similar correlation holds just for the verbs extracted from a given text [25]. Eq. (1) fits most word frequency data, except for the highest ranks (which may be text-specific), and the low-frequency (large r) part that deviates downward from this correlation. This may represent a switch-over from α0 ≈ 1 at small r to α1 ≈ 2 at large r [26,27]. The high frequency words obeying the original Zipf law constitute a "kernel lexicon", whereas the vast low frequency part consists of more specific terminology.

Word frequency is intimately connected to language history. It was already noted e.g., in Chap. 3 of Zipf's book [24], that high-frequency words tend to be older. This was recently quantified via a decay rate coefficient, k(f), that is larger for less frequently used words [28,29]. If words only disappear, with a monotonically decreasing k(f), their Zipf exponent would increase with time. The fact that α is always about 1 suggests that new words are formed with a similar k(f), and these balance the death of the old ones.

It follows that for any non-productive, morphologically distinguishable lexical subset (consisting of word types that stopped being created), α should increase with time. Here we analyze the frequency of verbal roots in an ancient Semitic text, the Hebrew Bible. We find that while the total verbal corpus shows the expected Zipf behavior, the 2c verbal roots (Table S5) exhibit a noticeably larger Zipf exponent. Using the power-law form of k(f) determined by Pagel et al. [29], the increase in α is uniquely converted to a lifetime for the 2c corpus. We find that the 2c era ended ca. 7.8 kyr before Biblical Hebrew (BH) [30-32], and this corresponds rather nicely to the onset of agriculture.


Reconstruction of PS is rather straightforward [8] and less controversial than reconstructions on deeper levels, such as PAA [12-18] and Nostratic [19-21]. PS is believed to be based on the 29 consonant phonemes in the transliteration table S6 (one more than in Arabic). Every reconstructed PS word must normally have reflexes ("cognates") in Akk. [9] and at least one W. Semitic language. Relaxing the demand for an Akk. cognate will inevitably lead to additional (more questionable) reconstructions (see DAE). Borrowing is excluded based on expert opinions from the linguistic literature. Details and discussions of all reconstructions are given in the EA compiled by Yigal Bloch (Text S3), which is in general agreement with earlier work [8].

Next, we suggest two definitions of 2c roots. The narrow definition includes only strictly 2c nouns, such as *dam “blood” (note the emboldened radicals), the "hollow" II-w roots (that in the traditional 3c grammar have w or y as the second radical), and those with a reduplicated last consonant. By direct counting in a Biblical Concordance we find that this group corresponds to about 12% of BH nouns. These possibly originate from an early phase of the 2c language.

The broad definition of 2c roots includes, in addition, all I-n and I-w plus most I-y and III-w/y roots [4,6,32]. These added consonants may represent early affixes, later perceived as radicals. We estimate that roughly 35% of BH nouns are 2c (and 60% 3c) according to this criterion. This broader 2c lexicon should correspond to the latest phase of the 2c language, just before the transition to 3c morphology occurred. In the statistical analysis of 2c BH verbs below we include, with little loss of accuracy, all I-y and III-w/y roots (Table S5), and this allows for a nearly automated procedure for 2c verb identification.
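The two definitions above can be read as a simple rule set. The following is a minimal sketch only, not the procedure actually used to build Table S5: the function name `classify_root` and the representation of a root as a tuple of transliterated radicals are assumptions made for illustration, and the rule treats all I-y and III-w/y roots as 2c, as the text does for verbs.

```python
WEAK = {"w", "y"}  # radicals that may reflect early affixes or "hollow" roots

def classify_root(radicals, broad=True):
    """Return '2c' or '3c' for a root given as a sequence of radicals.

    Hypothetical helper: it encodes the narrow and broad definitions
    described in the text and nothing more.
    """
    r = tuple(radicals)
    if len(r) == 2:
        return "2c"  # strictly biconsonantal, e.g. d-m "blood"
    if len(r) != 3:
        raise ValueError("expected two or three radicals")
    c1, c2, c3 = r
    # Narrow definition: hollow (II-w/y) roots and roots whose
    # last consonant is reduplicated.
    if c2 in WEAK or c2 == c3:
        return "2c"
    # Broad definition: additionally I-n, I-w, I-y and III-w/y roots.
    if broad and (c1 in {"n", "w", "y"} or c3 in WEAK):
        return "2c"
    return "3c"

print(classify_root(("n", "t", "n")))               # I-n root, broad definition
print(classify_root(("n", "t", "n"), broad=False))  # same root, narrow definition
print(classify_root(("d", "b", "r")))               # regular triconsonantal root
```

Such a rule cannot capture every case listed by the authors. For instance, hlk “go” is marked 2c in Table 4 although none of the conditions above applies to it, so any automated pass would still need a list of exceptions.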


I: The hunter-gatherer lexicon

Our hunter-gatherer lexicon avoids terms characteristic of both hunter and farmer cultures (e.g., “to collect” may refer to collecting produce from the wild or an agricultural field). We avoid most plant and animal names that could have been introduced either before or after their domestication. This leaves mainly terms related to fire and hunting, whose inclusion in the lexicon is justified by archeological data discussed below. A priori one might expect a similar 3c/2c ratio in all semantic fields. But, as we show below, this is not the case.

(a) Fire.

"The manufacture of stone tools and the manipulation of fire are the most important extrasomatic milestones in our early evolutionary trajectory" [33]. "Fire played a multifunctional role in human history: a source of warmth, light, and a means for cooking; it could also serve to discourage carnivores, clear areas of vegetation, and be used for the smoking and drying of meat, among others" [34]. Evidence for hominin use of fire may go back 790,000 years [35].

The multifunctional role of fire is manifested by several PS synonyms for the noun “fire” (Table 1). We can use this abundance as a statistical test for archaic language morphology. Notably, four out of five (80%) are of 2c morphology (in its narrow definition, see Methods). This value is strikingly larger than the a priori probability for such 2c nouns in Semitic languages, say 12% in BH. It could be explained if most of the fire synonyms originate from an older proto-language that had an abundance of 2c lexemes. Normally, words are replaced approximately every 3,000 years, but some survive considerably longer [22,29]. These are typically the more frequently used ones. Since fire was so vital for existence it had to be manipulated daily. Consequently, prehistoric people must have used the word “fire” daily, and this explains its longevity.
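To give a feel for how unlikely "four out of five" is against a 12% baseline, one can compute a binomial tail probability. This is a sketch under assumptions the text does not make explicitly: that the five synonyms can be treated as independent draws and that the 12% share of narrow-definition 2c nouns in BH is the appropriate baseline. The helper name `binomial_tail` is hypothetical.

```python
from math import comb

def binomial_tail(k, n, p):
    """Probability of at least k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# At least 4 of the 5 PS "fire" nouns being 2c, if each had a 12% chance
p_fire = binomial_tail(4, 5, 0.12)
print(f"P(>=4 of 5 are 2c | p = 0.12) = {p_fire:.5f}")
```

Under these assumptions the probability comes out slightly below one in a thousand, which is consistent with the qualitative statement that the observed fraction is strikingly larger than expected.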

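# | Meaning | Akk. [9] | PS (EA) | DAE # (note 1) | ND/RPN #
1.1 | fire (2c) | išātu | *ˀiš | 1154 | ND 86
1.2 | " | girru | *gir(r) | 1178 | ND 688, RPN 443
1.3 | " | urru | *ˀūr | 1152 | ND 73
1.4 | " | nūru | *nār | 1663 | ND 1617
1.5 | flame (3c) (note 2) | nablu | *nabl | |
1.6 | to burn (2c) | aggu | *ˀ/hg | 1155 | RPN 596
1.8 | " | kabābu | *kb | 1192 | ND 592?
1.9 | " | kawû (?) | *kwy | 1146 | ND 1238
1.11 | " | qalû | *ql | 1144 | ND 1041
1.12 | " | qamû | *qm | 2193 | RPN 466, ND 1068a
1.14 | to burn (3c) | laˀbu (note 3) | *lhb | 1799 |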

Table 1. PS synonyms for “fire” and “to burn”.

Reconstructions are denoted by an asterisk, and root consonants (radicals) are in bold. DAE [18] entry numbers indicate suggested PAA origin, whereas a Nostratic origin is suggested by the cited entries from ND [20] and RPN [21]. See Table S1 of the Supporting Information for further detail.
1 There are PAA synonyms for “fire” that are unattested in PS. The full DAE list includes entries #1147, 1152, 1154, 1178, 1183-6, 1188, 1190, 1663, 2134 and 2599 there: all except #1185 are 2c.
2 I-n *nabl would be 2c in the broad definition of 2c roots (see Methods).
3 *lhb is PS only if Akk. laˀbu “skin disease” [9] is really a cognate. Otherwise there is only a single 3c/PS verb “to burn”.

Corroboration of our conclusions can indeed be obtained from older proto-languages. Preceding PS on the linguistic genealogical tree is PAA (previously called Hamito-Semitic), from which the Afroasiatic (AA) language families (Semitic, Egyptian, Berber, Cushitic, Omotic, and Chadic) have evolved. Unlike Semitic and Old Egyptian, the other African languages have only recently been documented. Consequently, there is as yet no consensus over the PAA lexicon, and whether it originated before or immediately after the transition to agriculture, i.e. between 9,000 [16] and 12,000 BP [17]. A Hamito-Semitic Etymological Dictionary (HSED) was published by Orel and Stolbova [12], and criticized by several authors [7,13,14]. It has since been updated online as the Database of Afroasiatic Etymology (DAE) [18], of which we make use in the tables below. An even older (and more controversial) conjectured macrofamily of protolanguages is Nostratic, for which two major dictionaries were compiled, abbreviated herein ND [20] and RPN [21]. It encompasses AA, Indo-European (IE), Kartvelian and other Euro-Asian language families, and is estimated to have originated in the Levant some 15,000 years ago [19].

Because our starting point is PS, we do not require that every item in these dictionaries be correct, only that they are sufficiently comprehensive to include the predecessors of most PS lexemes. The last two columns in Table 1 list entry numbers for PAA and Nostratic compilations, where these exist. Of the five “fire” synonyms, all four 2c terms appear on both PAA and Nostratic levels, whereas the 3c term (*nabl) does not. Indeed, “fire” was found to be one of the "ultraconserved words" in the Nostratic macrofamily [22]. Thus all the pre-agricultural names for “fire” that survived in PS are ancient 2c terms.

(b) Burning statistics.

A sample space of 5 items might be too small for statistical inference. Hence we add the 10 synonyms for the verb “set afire, burn” in Table 1. Of these, eight are 2c and only two are 3c. Again, all of the 2c terms are classified as PAA. The behavior in this semantic field confirms that most nouns and verbs connected with fire are 2c, contrasting with the low abundance of 2c lexemes in Semitic languages.

(c) Water.

Water, like fire, is one of the "bare essentials" required to sustain life. Thus water vocabulary should also be immune to replacement. The PS noun for “water” is 2c *māy, whereas drinking (water) is depicted by the two PS/2c verbal roots *šty and *šqy [8]. All three are also PAA (DAE entries 999, 1878 and 1209), hence of pre-agricultural origin.

(d) Hunting.

Table 2 summarizes PS hunting terminology. The hunter had few possessions, which he carried along: a bow (#2.3) with which to shoot (#2.4) an arrow (#2.1), a small bag (#2.2) for collected items, perhaps a water bottle. All these PS terms are 2c and all are attested in PAA. The prehistory of the bow is difficult to determine because most components (except arrowheads) are perishable. Ballistic arrowhead analysis concluded [36] that lithic projectiles emerged with the onset of the Upper Paleolithic (ca. 45,000 BP). Nevertheless, the transition from atlatl to bow and arrow in W. Asia is believed to have occurred in Natufian times (15,000–11,700 BP), when both weapons may have been in use [37,38]. The fact that PS *qaš-t, “bow”, is agreeably PAA, and there is no obvious linguistic trace for “atlatl”, suggests that Natufians utilized predominantly bows and arrows and/or that the term for atlatl has undergone a semantic shift to indicate the bow.

# | Meaning | Akk. [9] | PS (EA) | PAA (ref. #)
2.1 | arrow | ūṣu, uṣṣu | *ḥiẓ | T 11
2.2 | small bag | kīsu | *kīs | T 13
2.3 | bow | qaštu | *qaš-t (note 1) | HSED 1560, DAE 524
2.4 | throw, shoot | ramû | *rmy | DAE 1499
2.5 | hunt, prowl | ṣâdu | *ṣd | DAE 1230
2.6 | provisions | ṣidītu | *ṣīd |

Table 2. Hunting terms in the PS lexicon (all are 2c).

References to PAA origins include entry numbers from the treatises denoted herein HSED [12], DAE [18], and T [15].
See Table S2 of the Supporting Information.
1 In Semitic, “bow” ends with a feminine suffix, -t, that is missing in AA.

In addition to the overwhelming 2c vs. 3c statistics, we note an interesting polysemy (multiple meanings) of the PS root *ṣd (#2.5). While in BH it means “to hunt”, in Akk. ṣâdu means “to prowl, turn about” [9]. Prowling characterizes a hunter-gatherer in search of food rather than a farmer. Farmers that occasionally went hunting would return to their permanent abode in a village [39]. Hunter-gatherer tribes in the Levant would spend the winter in the coastal plains, follow the deer in the spring to the mountains, and then turn around, completing an elliptic annual trajectory.

The pre-agricultural origin of this verb is supported by a possible pre-agricultural connection between hunting (#2.5) and provisions (#2.6). In PS, these must have been considered as homonyms (see EA), because for the farmer there was no connection between “hunting” and “provisions” (the latter coming mainly from his domesticated fauna and flora). For his hunter-gatherer predecessor, however, these must have been strongly associated, because provisions carried on hunting journeys might have included dried/smoked meat of hunted animals. Such a connection between hunting, prowling and provisions is thus indicative of a nomadic hunter-gatherer society.

II: The farmer's lexicon

Evidently, more farming than hunting terms survived in PS, and nearly all have 3c morphology. Table 3 lists 27 agricultural terms that have been archeologically dated. Verbs like “collect”, “grind” and “bake”, characterizing both agricultural and pre-agricultural communities, and animal or plant names that could have originated either before or after domestication, are not listed. The discussion below provides archeological evidence that the entries in Table 3 originate within the Neolithic or Chalcolithic societies (ca. 11,000–6,000 BP).

# | Meaning | Akk. [9] | PS (EA) | DAE #
3.3 | grape, fruit | inbu | *ˁinb |
3.4 | well, pit (note 1) | būru | *biˀr | 916, 2536
3.5 | ripe, cook | bašālu | *bšl |
3.9 | storage/threshing place | garānu | *gurn |
3.10 | arable land | ugāru | *hugār | 2327
3.12 | to plow | erēšu | *ḥr |
3.13 | fermenting wine | ḫammurtu | *ḫamr |
3.16 | vine(yard) (note 2) | karānu | *karm | 1050
3.17 | mud brick | libittu | *labinat |
3.22 | to draw water | sâbu | ˀb | 984
3.23b | be drunk | šakâru | *škr |
3.24 | boil, cook | salāqu | *šlq |
3.25 | to plant | satālu | *štl |
3.26 | straw, chaff | tibnu | *tibn |
3.27 | to sow | za | *zrˁ | 2338

Table 3. Agricultural terms in PS are of 3c morphology.

Only 5 are possibly PAA, and even some of these assignments are questionable.
See Table S3 of the Supporting Information.
1 #3.4 is not PAA if the Chadic and Cushitic cognates are Arabic loans (DAE #916).
2 The PAA status of #3.16 “vineyard” relies on an Egyptian cognate which is likely a W. Semitic loan (see EA).

As of the Pre-Pottery Neolithic B (PPNB), ca. 10,500 BP, the farmer (#3.1) lived in a large village (#3.15), constructed of square houses [3], often made of sun-baked mud bricks (#3.17) reinforced with straw (#3.26) [40]. Indeed, straw became readily available after the Pre-Pottery Neolithic A (PPNA) wheat domestication [1], and hence its identification as an agricultural commodity.

The farmer would work in an agricultural field (#3.10, 3.11), which he would plow (#3.12), sow (#3.27) or plant (#3.25). Furrow tracts from W. Europe date to 5,500 BP [41] and must have appeared earlier in W. Asia. Cattle were domesticated in the upper Euphrates valley by 10,000 BP, spreading to Central Anatolia, Mesopotamia and the S. Levant around 8,500 BP [42]. This may mark the onset of ox-traction and hence the use of the scratch-plow (ard) for plowing. The ard might have also been instrumental in installing the first irrigation systems. An early irrigation canal (#3.19), over 7 kyr old, was discovered in Choga Mami, 110 km E. of Baghdad: "It is conceivable, indeed probable, that plough cultivation accompanies irrigation agriculture in the earlier Samarra period" [48].

Tilled fields can be sown only if grain from the previous year is stored under adequate conditions. PPNA granaries (#3.2 and 3.9), about 11,300 years old, were unearthed in the Dead-Sea region near Dhraʽ, Jordan [43]. These round structures, with suspended floors for air circulation and protection from rodents, were located between residential structures that contain plant-processing installations.

The first attested wells (#3.4) were dug by Neolithic farmers on the coast of Cyprus ca. 9,200 BP [44]. The oldest well found in Israel (8,700–8,400 BP) is in the undersea site of Atlit-Yam [45,46]. A Pottery Neolithic (PN) well, dated to ca. 8,300 BP, was found at Sha‘ar Hagolan in the Jordan Valley [47]. Thus wells were yet another important innovation of the Neolithic. The II-ˀ morphology of *biˀr “well” (#3.4) is also attested in the PS/3c verbal root for drawing water (ˀb, #3.22), possibly because they originated in the same period.

The earliest mineral-tempered ceramics from Tell Sabi Abyad (N. Syria) were likely introduced for cooking (#3.5 and 3.24), leading to a "culinary revolution" nearly 9,000 years ago [49], when (PS/2c) “baking” [8] and “roasting” (#1.11) were supplemented by cooking. Only later was pottery utilized for storing liquids.

Herding began after goat and sheep domestication, either in the Neolithic or as late as the Chalcolithic [50]. Livestock (#3.8) was often composed of mixed sheep and goat herds that optimize vegetation exploitation. This contrasts with their non-overlapping habitats in Nature [2], suggesting that *ḍaˀn “livestock” (#3.8) is a post-agricultural innovation. The herd was led by a stockbreeder (#3.18) to a drinking trough (#3.21). Due to lactose intolerance, milk utilization is thought to have begun rather late, at the end of the Chalcolithic or the Early Bronze [41]. However, recent fatty acid analysis of pottery sherds suggests that processed milk was used as early as 8,500 BP [51]. In agreement with this, there is no PS name for milk but there is one for butter (#3.14), a low lactose milk product.

Fermenting wine (#3.13) was made from grapes (#3.3) already in the Neolithic: jars from Georgia (in the Caucasus), dating to ca. 8,000 BP, were shown to contain resinated wine deposits, as have 7,300 BP sherds from the Zagros Mountains in Iran ([52], Chap. 4). The popular resin was from the terebinth tree (#3.6), Pistacia atlantica [52]. The earliest known winery (6,100 BP) was recently found in an Armenian cave site [53]. The prominence of viticulture in the Fertile Crescent is echoed in toponyms derived from *karm, *karān “vineyard” (#3.16): Mt. Karmel in N. Israel and Karānā in Upper Mesopotamia (perhaps Tell ar-Rimāh, 60 km W. of Nineveh). Although the latter area is a dry wasteland today, its high concentration of archeological mounds suggests that it was once fertile land ([52] p. 173).

Beer (#3.23a) was the most popular intoxicating (#3.23b) drink in Mesopotamia. Until recently, the earliest evidence for beer (from ca. 5,500 BP) was found in the Sumerian trading post of Godin Tepe in Iran [54]. But recent evidence from Göbekli Tepe (S.E. Turkey) suggests that beer was brewed already in the PPNB [55].

Millet (#3.7) was domesticated in N.E. China about 10,000 years ago [56]. It made its way to the Black-Sea region around 7,000 BP [57], just in time to be included in the PS lexicon. Because it came from outside W. Asia, its PS name depicts the domesticated plant and not its wild progenitor.

Exceptions to the regularity demonstrated in Table 3, namely PS agricultural terms with 2c morphology, are hard to find. We have found two such examples (as compared with 27 entries in Table 3), and even these are not clearly exceptions. (i) It is suggested that PS/PAA/2c *marr “a hoe”, derived from the 2c root *mrr “to hoe”, originates within a PAA farming lexicon [16]. The noun is either Nostratic, ND #1482 [20], or a "wandering-word" borrowed into many languages from Sumerian [58]. If the verb *mrr has itself been borrowed by Sumerian from PAA [16], then its original meaning was “to dig” [59], an activity practiced by hunter-gatherers long before the agricultural era. (ii) The PS verb for herding, *rˁy, is 2c although herding postdates ungulate domestication that occurred after the transition to agriculture. However, in some Chadic dialects it means “to chase, follow”, DAE #663 [18]. This may go back to gazelle chases, involving gathering herds by "effective utilization of drives and surrounds" [60], including the utilization of huge traps known as "desert kites" [61]. Thus if gathering domesticated herds is the behavioral continuation of gathering herds of gazelles, the continued use of the same verb for depicting it would be understandable.

III: Word Frequency Analysis

The study thus far focused on statistics of culturally specific terms (hunting vs. farming) that could be correlated with archeology. These are mostly low frequency lexemes, hence not belonging to the "kernel lexicon". We now analyze the kernel lexicon of an Ancient Semitic text, the Hebrew Bible, bisecting it into its 2c vs. 3c components. We consider verbs, because their 2c vs. 3c origin can be determined rather mechanically (see Methods), allowing a large number of verbs to be processed. Moreover, they constitute the only part of speech whose Zipf plot is similar to that of the whole corpus [25]. The black circles in Figure 1 depict the frequency-rank dependence, f(r), for the BH (non-Aramaic) verbal roots with f ≥ 10 [31]. It indeed appears that there are two regimes here [26], with α0 = 1.07 and α1 ≈ 2 (dashed lines). The switchover occurs around f = 20, so that the kernel of BH is characterized by f ≥ 20.

Figure 1. Frequency-rank plot for Hebrew verbal roots appearing more than 10 times in the Bible (black circles) [31].

Black dashed lines: fits of the total frequency, f0(r), to Eq. (1) with A0 = 16,000 and α0 = 1.07, or A1 = 1.3×10^6 and α1 = 2. Blue triangles represent 2c/BH verbal roots in their broadest definition (see Methods). They were extracted from Lester's list [31] and collected in Table S5 of the Supporting Information. The non-2c verbs there were defined as 3c, and their frequencies are depicted by the red triangles. Blue line is a fit to Eq. (2) with t = 7.8 kyr and A2c = 3. The rate function k(f0) from Eq. (3b) has B = 0.55 kyr^-1 and β = 0.13, as deduced from Figure 3a of Ref. [29]. Red line is a fit to Eq. (4) with the same parameters, except for A3c = 0.09.

Table 4 lists the 20 most frequent BH verbs. These are indeed very generic, not related to any specific culture or occupation, and likely used with high frequency in any natural language. Of these, 13 are 2c, far exceeding the fraction of 2c verbs in the Bible. This agrees with the observation that the most frequently used words in English tend to be short [24] (and also of Old English origin). According to Zipf's "principle of least effort" long words got shortened for ease of use. We have no evidence that 2c/PS verbs were shortened from 3c verbs, and thus suggest another mechanism leading to the prevalence of 2c verbs in Table 4.

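Rank | Meaning | BH (freq.) | Akk. [9] | DAE # | 2c/3c
1 | say, see | ˀmr (5317) | amāru | | 3c
2 | become | hyh (3576) | ewû | 2056 | 2c
3 | do | ˁśy (2632) | | 532 | 2c
4 | come | bwˀ (2579) | bâˀu | 599 | 2c
5 | give | ntn (2014) | nadānu | 1237 | 2c
6 | go | hlk (1554) | alāku | 615 | 2c
7 | see | rˀy (1310) | | 887 | 2c
8 | hear (note 1) | šmˁ (1165) | še | 242 | 3c
9 | speak | dbr (1135) | | 874 | 3c
10 | sit | yšb (1087) | wašābu | 3072? | 2c
11 | go out | yṣˀ (1075) | waṣû | | 2c
12 | return | šwb (1075) | | | 2c
13 | take | lq (966) | le | | 3c
14 | know | ydˁ (952) | e | | 2c
15 | ascend | ˁly (894) | e | | 2c
16 | stretch out | šl (847) | ša | | 3c
17 | die | mwt (845) | mâtu | 2466 | 2c
18 | eat | ˀkl (814) | akālu | 1197? | 3c
19 | call (note 2) | qrˀ (736) | qe | 879 | 3c
20 | lift | ˀ (658) | našû | 1627 | 2c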

Table 4. The 20 most frequently used verbal roots in BH with their Biblical frequencies [31].

Those with Akk. cognates are PS, whereas DAE entry numbers [18] indicate possible PAA origin. See Table S4 of the Supporting Information.
1The listed AA cognates mean “ear” and they are 2c (DAE #242).
2The AA reconstruction means “shout” and it is 2c (DAE #879).

As recently shown [29], frequently used words (actually, meanings) are replaced (by other words of the same meaning) less often than the less frequent ones. Thus if the 2c stratum indeed predated the 3c one, the frequently used 2c lexemes may have simply survived replacement during the subsequent 3c era. This is supported by their frequency-rank dependence in Figure 1. As opposed to the total BH verbs with α ≈ 1, the 2c/BH verbs (collected in Table S5) exhibit an observably larger Zipf exponent (α2c = 1.28), whereas the high frequency 3c verbs have a smaller α3c = 0.82 (linear fits not shown). This might be explainable by the 2c > 3c transition: while the 2c language was alive, 2c words of a given meaning were depleted at the same rate as alternate 2c lexemes were generated, and the language maintained its steady-state with the usual exponent α ≈ 1. After the 2c era ended, 2c roots were no longer created, only eliminated. Because less frequently used words decay faster, α2c increased with time.
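A Zipf exponent of this kind is commonly estimated as the slope of a straight line fitted to log f versus log r. The sketch below shows one such least-squares estimate. It is an illustration rather than the authors' fitting procedure, the function name `zipf_exponent` is hypothetical, and the input is a noiseless synthetic curve generated from the kernel parameters quoted in the caption of Figure 1 (A0 = 16,000, α0 = 1.07), not the Biblical data.

```python
import math

def zipf_exponent(frequencies):
    """Least-squares fit of log f = log A - alpha * log r.

    Frequencies are ranked in descending order (rank 1 = most frequent).
    Returns the pair (alpha, A).
    """
    freqs = sorted(frequencies, reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope, math.exp(mean_y - slope * mean_x)

# Synthetic check: an exact power law should return its own parameters
synthetic = [16_000 * r ** -1.07 for r in range(1, 201)]
alpha, amplitude = zipf_exponent(synthetic)
print(f"alpha = {alpha:.3f}, A = {amplitude:.0f}")
```

On real rank-frequency data the result depends on the range of ranks included in the fit, which is why the text distinguishes the kernel regime from the low-frequency tail.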

One may turn this into a quantitative method for dating the 2c > 3c transition. Suppose that once there were only 2c words, and at some time (t = 0) they started to be replaced with new 3c words. Assume that (up to a constant) the frequency of use of a certain verbal meaning (at least in the kernel lexicon), f0(r), is an inherent property of human language and hence not strongly time-dependent. We thus equate it with the frequency of the total verb distribution (black circles in Figure 1). Therefore, at t = 0 the 2c frequency-rank relation was f2c(r,0) = A2c f0(r), where A2c is some constant. We expect A2c > 1 if the 2c corpus was once used more frequently than today (or: with a smaller vocabulary each word is used more frequently).

Subsequently, the frequency of 2c verb utilization decayed exponentially with time:

$$f_{2c}(r,t) = A_{2c}\,f_0(r)\,\exp[-k(f_0)\,t] \qquad (2)$$

The rate coefficient, k(f0), is a unique function of the (time-independent) verb meaning frequency, f0(r). Eventually, after some time t that we seek to determine, f2c(r,t) reached the values observed in the Biblical lexicon (blue triangles in Figure 1).

A similar equation was suggested by Lieberman et al. [28], see their supporting Eq. (3). It can be interpreted in two ways. Firstly, as in radioactive decay: the decay of any particle is instantaneous, and one counts the number of particles surviving by time t. This is useful when texts from different epochs are available, as in [28], but not for the analysis of a single text. However, words need not disappear instantaneously from the lexicon. Their use may gradually decrease over time until they eventually become obsolete, and this allows applying the above equation even when text(s) from just a single period are available.

To proceed, a functional form for k(f) is required. We adopt the power-law rate coefficient of Pagel et al. [29] for lexical replacement. It depends on the part of speech, but otherwise is rather universal for the IE family, and possibly for all languages [22]:

$$k(f) = B\,f^{-\beta} \qquad (3a)$$

From the correlation line for English verbs in their Figure 3a, one estimates B = 0.55 kyr^-1 and β = 0.13. We do not vary these parameters in fitting our data. However, in Ref. [22] the frequencies are per million words of text, whereas in the Bible there are about 305,500 Hebrew words (a ratio of 3.27), hence what we insert into Eq. (2) is:

$$k(f_0) = B\,(3.27\,f_0)^{-\beta} \qquad (3b)$$

Adjusting t and A2c to fit the 2c data (blue triangles), we obtain the blue line agreeing with the data over the whole frequency range, even where it deviates from Zipf's law, Eq. (1). This gives t = 7.8 kyr. Adding the presumed age of BH, ca. 3 kyr, gives 10.8 kyr for the 2c > 3c transition, agreeing nicely with the onset of agriculture.
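The link between the decay time and the steeper 2c exponent can be made explicit with a short derivation. It is a sketch that assumes Eq. (2) has the exponential form f2c = A2c f0 exp[-k(f0) t], that the rate follows the power law k(f0) = B (3.27 f0)^(-β) of Eq. (3b), and that f0(r) obeys Eq. (1) with exponent α0. The local exponent α2c(r,t) is a quantity introduced here for illustration only.

```latex
\begin{align*}
\ln f_{2c}(r,t) &= \ln A_{2c} + \ln f_0(r) - k(f_0)\,t, \\
\frac{d\ln f_0}{d\ln r} &= -\alpha_0, \qquad
\frac{dk}{d\ln f_0} = -\beta\,k(f_0), \\
\alpha_{2c}(r,t) \equiv -\frac{d\ln f_{2c}}{d\ln r}
  &= \alpha_0\left[\,1 + \beta\,k(f_0)\,t\,\right].
\end{align*}
```

With α0 = 1.07, β = 0.13, B = 0.55 kyr^-1 and t = 7.8 kyr, the rate k(f0) lies between about 0.15 kyr^-1 (for f0 near 5,000) and 0.32 kyr^-1 (for f0 = 20), so the local exponent ranges from roughly 1.2 to 1.4 across the kernel lexicon. Under these assumptions the range brackets the fitted value α2c = 1.28 quoted above.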

An analogous model may describe the 3c verbs, which experience exponential growth rather than decay:

$$ f_{3c}(r,t) = A_{3c}\, f_0(r)\, e^{k(f_0)\,t} \qquad (4) $$

Of course, such growth cannot go on indefinitely, but we assume the time-depth is not large enough to observe saturation. With exactly the same parameters as above (excepting A3c) we obtain the red line in Figure 1, which fits the 3c data at high frequencies. Thus 10.8 kyr BP marks both the end of the 2c era and the onset of 3c morphology.
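
To illustrate how the frequency-dependent rate acts on verbs of different frequencies, the following minimal Python sketch evaluates the decay and growth factors. It assumes the exponential forms f2c = A2c f0 exp(-k t) and f3c = A3c f0 exp(+k t) with k(f0) = B (3.27 f0)^-β; the function names and the default amplitudes of 1.0 are hypothetical and serve only the illustration. It is not the fitting procedure used for Figure 1.

```python
import math

# Parameters quoted in the text (not fitted here).
B = 0.55        # kyr^-1, prefactor of the power-law rate coefficient
BETA = 0.13     # dimensionless exponent of the power-law rate coefficient
SCALE = 3.27    # 10^6 / 305,500: converts Bible counts to per-million frequencies
T_KYR = 7.8     # kyr, elapsed time obtained from the 2c fit


def rate_coefficient(f0):
    """Assumed form of Eq. (3b): replacement rate (kyr^-1) for Bible frequency f0."""
    return B * (SCALE * f0) ** (-BETA)


def decay_factor(f0, t=T_KYR):
    """Fraction of the initial 2c usage remaining after t kyr (assumed Eq. (2))."""
    return math.exp(-rate_coefficient(f0) * t)


def growth_factor(f0, t=T_KYR):
    """Multiplicative growth of 3c usage after t kyr (assumed growth analogue)."""
    return math.exp(rate_coefficient(f0) * t)


def f_2c(f0, a_2c=1.0, t=T_KYR):
    """2c frequency at time t; a_2c is a hypothetical amplitude, not the fitted one."""
    return a_2c * f0 * decay_factor(f0, t)


def f_3c(f0, a_3c=1.0, t=T_KYR):
    """3c frequency at time t; a_3c is a hypothetical amplitude, not the fitted one."""
    return a_3c * f0 * growth_factor(f0, t)


if __name__ == "__main__":
    for f0 in (10, 100, 1000):
        print(f0, rate_coefficient(f0), decay_factor(f0), growth_factor(f0))
```

Because k decreases with f0, rare verbs lose a larger fraction of their use over the same time span than common ones, which is what bends the 2c curve below the Zipf line at high ranks.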

As a check for the robustness of this analysis, we return to the "burning verbs" discussed in Subsec. I(b) above. We find 10 such verbs in BH (some of these are PS, and thus appear in Table 1). Their frequency-rank relation is shown in Figure 2 (circles). The deviation from the usual Zipf's law is even larger here: the dashed line has exponent α = 2.5. Of these verbs, 6 are 2c (triangles). Although this is a rather small collection, we can repeat our analysis. Remarkably, when Eqs. (2) and (3b) are applied to the data, with exactly the same parameters as in Figure 1, we obtain either the dashed-dotted line (when the dashed line is used as f0), or the full line (when the circles are used as f0). Thus the "burning verbs" behave like the entire BH verb population, both yielding the same date for the 2c > 3c transition.

Figure 2. Frequency-rank plot for BH verbal roots that are near synonyms of “to burn” (circles), and their 2c subgroup (triangles).

Dashed line represents Eq. (1) with A = 900 and α = 2.5. The application of Eqs. (2) and (3b) to it gives the dash-dotted line, whereas their application to the data itself (circles) gives the full line. Parameters are identical to those in Figure 1. The frequencies of the 10 roots were taken from a Biblical Concordance, and are as follows: śrp 117 (3c), ḥry 94 (2c), bˁr 61 (3c), yṣt 30 (2c), kby 24 (2c), lh 11 (3c), yqd 9 (2c), dlq 4 (3c), qly 3 (2c), kwy 2 (2c).
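
The frequencies listed in the caption can be tabulated against the dashed line as follows. This is a minimal sketch, assuming Eq. (1) has the Zipf form f(r) = A r^-α; the roots are copied as they appear in the caption, ranks are taken within the full list of 10 roots, and the helper names are hypothetical.

```python
# Dashed-line parameters from the caption; the Zipf form itself is assumed.
ZIPF_A = 900.0
ZIPF_ALPHA = 2.5

# (root, occurrences in the Bible, morphology), as listed in the caption.
BURNING_VERBS = [
    ("śrp", 117, "3c"), ("ḥry", 94, "2c"), ("bˁr", 61, "3c"),
    ("yṣt", 30, "2c"), ("kby", 24, "2c"), ("lh", 11, "3c"),
    ("yqd", 9, "2c"), ("dlq", 4, "3c"), ("qly", 3, "2c"),
    ("kwy", 2, "2c"),
]


def zipf(rank, a=ZIPF_A, alpha=ZIPF_ALPHA):
    """Assumed form of Eq. (1): predicted frequency at a given rank."""
    return a * rank ** (-alpha)


def ranked(verbs):
    """Return (rank, root, frequency, morphology), rank 1 being the most frequent."""
    ordered = sorted(verbs, key=lambda v: v[1], reverse=True)
    return [(rank, root, freq, kind)
            for rank, (root, freq, kind) in enumerate(ordered, start=1)]


def subset(verbs, kind):
    """Select the roots of one morphology ('2c' or '3c')."""
    return [v for v in verbs if v[2] == kind]


if __name__ == "__main__":
    for rank, root, freq, kind in ranked(BURNING_VERBS):
        print(rank, root, kind, freq, round(zipf(rank), 1))
```

Printing observed and predicted values side by side shows where the dashed line tracks the data and where it does not; the 2c rows are the ones to which the decay analysis is applied.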


In this work, PS hunting and farming terms were collected based on the significance accorded to them in the archeological literature. Material innovations are paralleled by linguistic innovations, namely new names for new material objects and new verbs depicting their utilization. This allows one to date these words tentatively, independently of the comparative linguistic evidence.

From the hunter-gatherer period mostly the frequently used words have survived change. “Fire” and “water” must have been such words, because they were essential for daily survival. The associated verbs are “to burn” and “to drink”, respectively. We have collected all the PS synonyms of these four lexemes finding remarkable correlations: (a) Most of them are also PAA and/or Nostratic (corroborating their pre-agricultural origins) and (b) of 2c morphology. A similar trend is observed for PS hunting terms, which are all 2c.

The farming terms collected in Table 3 are those attributed by archeological studies to innovations of the Neolithic and Chalcolithic periods in W. Asia. These all have 3c morphology, and only rarely possess PAA cognates. We were able to find very few exceptions to this rule, and these represent secondary use of existing 2c roots. Hence PS hunting vs. farming lexicons have, on average, different time-depths and morphologies. Likely, then, a 2c-enriched hunter-gatherer language has evolved into a 3c-dominated farmer language with the transition to agriculture in W. Asia.

This suggestion is corroborated by a frequency analysis of BH verbs. While the total verbal corpus exhibits a Zipf plot with the expected exponent of about unity, its 2c subset has an observably larger exponent. This can be understood if the creation of new 2c roots ceased sometime in prehistory, and thereafter the use of the low-frequency 2c verbs decreased faster than that of the higher-frequency ones. This was turned into a novel quantitative method for dating the 2c > 3c transition. The date obtained, nearly 11 kyr BP, indeed marks the transition from hunting to farming. Thus two independent methods, applied to different parts of the Semitic lexicon ("specific" vs. "kernel"), lead to the same conclusion, namely that a major change in human lifestyle (the transition to agriculture) correlates, in W. Asia, with a major linguistic change.

Supporting Information

Text S1.

Explains how the Proto-Semitic word reconstruction was achieved.


Text S2.

List of references for the Etymological Appendix.


Text S3.

The complete Etymological Appendix composed of the above six supporting files.


Table S5.

Ranking of “weak” (originally 2c) verbal roots in the Hebrew Bible by frequency. The 2c roots were extracted from the list in Ref. 31 according to the "broad definition" in the Methods section.


Table S6.

Transliteration: Proto-Semitic consonant phonemes with their Hebrew and Arabic equivalents.



I thank Yigal Bloch for commenting on the manuscript. With an Etymological Appendix by Yigal Bloch

Author Contributions

Wrote the manuscript: NA. Supporting Information: YB.


  1. Smith BD (1998) The Emergence of Agriculture. New York: Scientific American.
  2. Clutton-Brock J (1999) A Natural History of Domesticated Mammals, 2nd ed. Cambridge: Cambridge University Press.
  3. Kuijt I, Goring-Morris N (2002) Foraging, farming, and social complexity in the Pre-Pottery Neolithic and Southern Levant: A review and synthesis. J World Prehistory 16: 361–440.
  4. Lipiński E (2001) Semitic Languages: Outline of a Comparative Grammar (Orientalia Lovaniensia Analecta 80). 2nd ed. Leuven: Peeters.
  5. del Olmo Lete G (2008) Questions of Semitic Linguistics: Roots and Lexeme. The History of Research. Bethesda: CDL Press. pp 53–86.
  6. Agmon N (2010) Materials and language: Pre-Semitic structural change concomitant with transition to agriculture (Etymological Appendix by Y. Bloch). Brill’s Annu Afroasiatic Lang Linguistics 2: 23–79, with an introductory note by Lowenstamm J, ibid. 2: 1–22.
  7. Diakonoff I (1998) Earliest Semitic society linguistic data. J Semitic Studies 43: 209–219.
  8. Kogan L (2011) Proto Semitic Lexicon. In: The Semitic Languages: An International Handbook, ed Weninger S (in collab with Khan G, Streck MP, Watson JCE), chap 8. Berlin: De Gruyter Mouton.
  9. Biggs RD, Brinkman JA, Civil M, Farber W, Gelb IJ, et al. (eds.) (1956–2010) The Assyrian Dictionary of the Oriental Institute of the University of Chicago. Vols. 1–20. Chicago: Oriental Inst. (abbreviated herein CAD).
  10. Kitchen A, Ehret C, Assefa S, Mulligan CJ (2009) Bayesian phylogenetic analysis of Semitic languages identifies an Early Bronze Age origin of Semitic in the Near East. Proc Biol Sci 276: 2703–2710. PubMed: 19403539.
  11. Militarev A (2000) Towards the chronology of Afrasian (Afroasiatic) and its daughter families. In: Time Depth in Historical Linguistics, Renfrew C, McMahon A, Trask L, editors. Cambridge: McDonald Inst Arch Res. pp 267–307.
  12. Orel VE, Stolbova OV (1995) Hamito-Semitic Etymological Dictionary: Materials for Reconstruction. Leiden: Brill (abbreviated herein HSED).
  13. Takács G (1997) Hamito-Semitic Etymological Dictionary: Materials for a Reconstruction by Vladimir É Orel; Olga V Stolbova. J Cuneiform Studies 49: 108–117.
  14. Kogan L (2002) Addenda et corrigenda to the Hamito-Semitic Etymological Dictionary (HSED) By V Orel and O Stolbova (II). J Semitic Studies 47: 183–202.
  15. Takács G (1998) Afro-Asiatic (Semito-Hamitic) substratum in the Proto-Indo-European cultural lexicon. Lingua Posnaniensis 40: 141–172.
  16. Militarev A (2002) The prehistory of a dispersal: The Proto-Afrasian (Afroasiatic) farming lexicon. In: Examining the Farming/Language Dispersal Hypothesis, Bellwood P, Renfrew C, editors. Cambridge: McDonald Inst Arch Res. pp 135–150.
  17. Ehret C, Keita SOY, Newman P (2004) The origins of Afroasiatic. Science 306: 1680–1681. PubMed: 15576591.
  18. Militarev A, Stolbova OV (n.d.) Database of Afroasiatic Etymology. Online edition: Available online at: (abbreviated herein DAE). Accessed 2013 Dec 1.
  19. Dolgopolsky A (1998) The Nostratic Macrofamily and Linguistic Palaeontology. Cambridge: McDonald Inst Arch Res.
  20. Dolgopolsky A (2008) Nostratic Dictionary. Cambridge: McDonald Inst Arch Res (abbreviated herein ND).
  21. Bomhard AR (2008) Reconstructing Proto-Nostratic. Comparative Phonology, Morphology and Vocabulary. Leiden: Brill (abbreviated herein RPN).
  22. Pagel M, Atkinson QD, Calude AS, Meade A (2013) Ultraconserved words point to deep language ancestry across Eurasia. Proc Natl Acad Sci U S A 110: 8471–8476. PubMed: 23650390.
  23. Baayen RH (2001) Word Frequency Distributions. Dordrecht: Kluwer.
  24. Zipf GK (1949) Human Behavior and the Principle of Least Effort. Cambridge, MA: Addison-Wesley.
  25. Kwapień J, Drożdż S, Orczyk A (2010) Linguistic complexity: English vs. Polish, text vs. corpus. Acta Physica Polonica A 117: 716–720.
  26. Ferrer i Cancho R, Solé RV (2010) Two regimes in the frequency of words and the origins of complex lexicons: Zipf’s law revisited. J Quant Linguistics 8: 165–173.
  27. Petersen AM, Tenenbaum JN, Havlin S, Stanley HE, Perc M (2012) Languages cool as they expand: Allometric scaling and the decreasing need for new words. Sci Rep 2: 943. PubMed: 23230508.
  28. Lieberman E, Michel J-B, Jackson J, Tang T, Nowak MA (2007) Quantifying the evolutionary dynamics of language. Nature 449: 713–716. PubMed: 17928859.
  29. Pagel M, Atkinson QD, Meade A (2007) Frequency of word-use predicts rates of lexical evolution throughout Indo-European history. Nature 449: 717–721. PubMed: 17928860.
  30. Koehler L, Baumgartner W (1994–2000) The Hebrew and Aramaic Lexicon of the Old Testament (trans. under the supervision of Richardson MEJ). Leiden: Brill.
  31. Lester GB (2011) Frequency list for Biblical Hebrew. Available: . Accessed 2013 December 1.
  32. Vernet i Pons E (2008) Formació i estructura de les arrels verbals en semític: Commentari etimològic dels determinatius radicals presents en els verba tertiae infirmae de lʼhebreu bíblic masorètic. Ph.D. Thesis. Univ. Barcelona. Accessed 2013 Dec 1.
  33. James SR (1989) Hominid use of fire in the Lower and Middle Pleistocene. Curr Anthropol 30: 1–26, p 1.
  34. Goldberg P, Dibble H, Berna F, Sandgathe D, McPherron SJP, et al. (2012) New evidence on Neandertal use of fire: Examples from Roc de Marsal and Pech de l’Azé IV. Quat Internat 247: 325–340, p 325.
  35. Goren-Inbar N, Alperson N, Kislev ME, Simchoni O, Melamed Y, et al. (2004) Evidence of Hominin control of fire at Gesher Benot Ya‘aqov, Israel. Science 304: 725–727. PubMed: 15118160.
  36. Shea JJ (2006) The origins of lithic projectile point technology: Evidence from Africa, the Levant, and Europe. Arch Sci 33: 823–846.
  37. Peterson J (1998) The Natufian hunting conundrum: Spears, atlatls, or bows? Musculoskeletal and armature evidence. Int J Osteoarch 8: 378–389.
  38. Bocquentin A, Bar-Yosef O (2004) Early Natufian remains: Evidence for physical conflict from Mt. Carmel, Israel. J Human Evol 47: 19–23.
  39. Zeder MA (1994) After the revolution: Post-Neolithic subsistence in Northern Mesopotamia. American Anthropol 96: 97–126.
  40. Oates D (1990) Innovations in mud-brick: Decorative and structural techniques in Ancient Mesopotamia. World Arch 21: 388–406, p 390.
  41. Sherratt A (1983) The secondary exploitation of animals in the Old World. World Arch 15: 90–104.
  42. Zeder MA (2008) Domestication and early agriculture in the Mediterranean basin: Origins, diffusion, and impact. Proc Natl Acad Sci U S A 105: 11597–11604, Fig. 1.
  43. Kuijt I, Finlayson B (2009) Evidence for food storage and predomestication granaries 11,000 years ago in the Jordan Valley. Proc Natl Acad Sci U S A 106: 10966–10970. PubMed: 19549877.
  44. Peltenburg E, Colledge S, Croft P, Jackson A, McCartney C, et al. (2000) Agro-pastoralist colonization of Cyprus in the 10th millennium BP: Initial assessments. Antiquity 74: 844–853.
  45. Galili E, Weinstein-Evron M, Hershkovitz I, Gopher A, Kislev M, et al. (1993) Atlit-Yam: A prehistoric site on the sea floor off the Israeli coast. J Field Arch 20: 133–157.
  46. Kislev ME, Hartmann A, Galili E (2004) Archaeobotanical and archaeoentomological evidence from a well at Atlit-Yam indicates colder, more humid climate on the Israeli coast during the PPNC period. Arch Sci 31: 1301–1310.
  47. Garfinkel Y, Vered A, Bar-Yosef O (2006) The domestication of water: The Neolithic well of Sha‘ar Hagolan, Jordan Valley, Israel. Antiquity 80: 686–696.
  48. Oates D, Oates J (1976) Early irrigation agriculture in Mesopotamia. In: Problems in Economic and Social Archaeology, Sieveking Gde G, Longworth IH, Wilson KE, editors. London: Duckworth. pp 109–135. p 119a.
  49. Nieuwenhuyse OP (2006) The earliest ceramics from Tell Sabi Abyad, Syria. Leiden J Pottery Studies 22: 111–128.
  50. Levy TE (1983) The emergence of specialized pastoralism in the southern Levant. World Arch 15: 15–36.
  51. Evershed RP, Payne S, Sherratt AG, Copley MS, Coolidge J, et al. (2008) Earliest date for milk use in the Near East and southeastern Europe linked to cattle herding. Nature 455: 528–531. PubMed: 18690215.
  52. McGovern PE (2003) Ancient Wine: The Search for the Origins of Viniculture. Princeton: Princeton University Press.
  53. Barnard H, Dooley AN, Areshian G, Gasparyan B, Faull KF (2011) Chemical evidence for wine production around 4000 BCE in the Late Chalcolithic Near Eastern highlands. Arch Sci 38: 977–984.
  54. Michel RH, McGovern PE, Badler VR (1992) Chemical evidence for ancient beer. Nature 360: 24–24.
  55. Dietrich O, Heun M, Notroff J, Schmidt K, Zarnkow M (2012) The role of cult and feasting in the emergence of Neolithic communities. New evidence from Göbekli Tepe, south-eastern Turkey. Antiquity 86: 674–695.
  56. Lu H, Zhang J, Liu KB, Wu N, Li Y, et al. (2009) Earliest domestication of common millet (Panicum miliaceum) in East Asia extended to 10,000 years ago. Proc Natl Acad Sci U S A 106: 7367–7372. PubMed: 19383791.
  57. Lawler A (2009) Bridging East and West. Science 325: 940–943, p 942.
  58. Blažek V, Boisson C (1993) The diffusion of agricultural terms from Mesopotamia. Archiv Orientalni 60: 16–37.
  59. Ehret C (2010) History and the Testimony of Language. Berkeley: University of California Press. p 148.
  60. Henry D (1975) Fauna in the Near Eastern archaeological deposits. In: Problems in Prehistory: North Africa and the Levant, eds Wendorf F, Marks AE, pp 379–385. Dallas: S.M.U. Press.
  61. Legge AJ, Rowley-Conwy PA (1987) Gazelle killing in Stone-Age Syria. Scientific American 257: 76–83.