Understanding the patterns and causes of differential structural stability is an area of major interest for the study of language change and evolution. It is still debated whether structural features have intrinsic stabilities across language families and geographic areas, or if the processes governing their rate of change are completely dependent upon the specific context of a given language or language family. We conducted an extensive literature review and selected seven different approaches to conceptualising and estimating the stability of structural linguistic features, aiming at comparing them using the same dataset, the World Atlas of Language Structures. We found that, despite profound conceptual and empirical differences between these methods, they tend to agree in classifying some structural linguistic features as being more stable than others. This suggests that there are intrinsic properties of such structural features influencing their stability across methods, language families and geographic areas. This finding is a major step towards understanding the nature of structural linguistic features and their interaction with idiosyncratic, lineage- and area-specific factors during language change and evolution.
Citation: Dediu D, Cysouw M (2013) Some Structural Aspects of Language Are More Stable than Others: A Comparison of Seven Methods. PLoS ONE 8(1): e55009. https://doi.org/10.1371/journal.pone.0055009
Editor: John P. Hart, New York State Museum, United States of America
Received: September 13, 2012; Accepted: December 18, 2012; Published: January 28, 2013
Copyright: © 2013 Dediu, Cysouw. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: Michael Cysouw was funded by European Research Council Starting Grant 240816 "Quantitative modeling of historical-comparative linguistics: Unraveling the phylogeny of native South American languages (QuantHistLing)." http://erc.europa.eu. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Languages always change no matter how vocal prescriptivists are and how strongly the rules of “good” language are enforced. However, different languages – and even different aspects of a single language – change in different manners and at different rates. For example, Icelandic is notoriously conservative among the Germanic languages, while English has relatively suddenly lost most of its morphological case marking system inherited from Old English. Understanding the patterns and causes of differential structural stability is an area of major interest for the study of language change and evolution. We will first briefly discuss the notion of stability in the context of molecular biology, before turning to previous research on stability in linguistics.
Stability in Biology
This situation is similar to evolutionary biology, where stability (and its complement, the rate of evolution) are complex outcomes of multiple factors, including universal and lineage-specific components. Neutral genetic markers evolve at a constant rate dictated by mutation rate, resulting in a molecular clock, while nearly neutral markers evolve at a rate determined by mutation and population size, reflecting the balance between mutation (production of novelty) and genetic drift (purging variation from the population). However, this is complicated by the non-constancy of mutation rates across the genome, across species and time, being influenced by, among others, the local DNA context, metabolism, life history parameters, age, gender and environmental stress. The various types of natural selection add a supplementary level of complexity. For example, purifying selection will tend to resist change, while positive selection will increase the rates of evolution.
Thus, there are highly conserved genes, such as those coding for ribosomal RNA present across the whole of cellular life and covering at least 3.5 billion years, or the Pax6 gene (a master gene controlling the cascade leading to eye development), so well conserved that the mouse gene induces eye formation in the fruit fly, while the fruit fly homologue genes eyeless and twin of eyeless induce the formation of several eye structures in the frog embryos. At the other end of the spectrum, there are genes which evolve extremely fast, such as some involved in the immune system or male reproductive biology, where strong and dynamic natural or sexual selective pressures are acting. Interestingly, there are also stretches of DNA which, despite being very stable in general, have changed a lot in a given lineage, such as the so-called human accelerated regions (HARs), which have changed dramatically in the lineage leading to us. Several genes involved in microcephaly, such as ASPM, Microcephalin and SHH, show faster evolution in primates and especially in the lineage leading to humans, suggesting that they might have been partly responsible for the evolution of increased human brain size. FOXP2, a gene involved in developmental verbal dyspraxia, is one of the most conserved genes within vertebrates, but modern humans and Neandertals carry a specific variant which differs at 2 positions from the chimpanzee. A proposed explanation for these differences in patterns of stability among genes is represented by the extended complexity hypothesis, suggesting that genes that are involved in complex and extensive interactions, and whose products participate in informational processes (transcription, translation and related aspects) and other complex functions, tend to change more slowly.
Stability of Vocabulary
In the case of language, recent work shows that not all concepts in a list of basic vocabulary – i.e. a standardised list of concepts selected for their universality, the best-known being the Swadesh 100 and 200-words lists – are equally stable. For example, concepts such as “two”, “who”, “tongue”, “night”, “one” and “to die” seem to be extremely stable in the Indo-European language family, showing at most 1 cognate replacement per 10,000 years of language change (depending on the assumed age of the family), while the most unstable meanings, such as “dirty”, “to turn”, “to stab” and “guts”, show up to 9 such replacements during the same period. Moreover, the stability of these concepts has a relatively strong universal component, in the sense that their relative stabilities tend to be conserved across several different language families. An important explanatory factor seems to be the frequency of use of these concepts, with the more frequently used tending to be more stable. Thus, it seems that certain concepts have a set of properties, including their frequency of use, which tend to make them resilient against lexical replacement across language families, time and space, the most stable showing a fidelity comparable to that of genetic systems.
Stability of Structural Linguistic Features
The properties and patterns of stability of structural aspects of language, such as the order of subject and verb or the number of consonants in a language, are less well understood. Some authors suggest that the distributional properties of structural features might inform us about deeper historical relationships than are accessible through the standard comparative method of historical linguistics, and they seem, at least in some cases, more resistant to admixture than human genes.
In contrast (and in agreement with widespread assumptions in historical linguistics), recent work compared the historical signal and phylogenetic stability of the basic vocabulary to that of structural features in the Indo-European and the Austronesian language families and found that in both families the vocabulary data fitted the comparative-method established family trees much better than the structural data. This suggests that structural features evolve much faster and/or are more influenced by contact phenomena than basic vocabulary. Moreover, the rates of evolution were roughly similar for vocabulary and structural data in both families, but the structural features’ stabilities (in contrast to the vocabulary) show very weak correlations across these language families, leading the authors to conclude that they “do not support the existence of a set of universally stable typological features” (p.6). Likewise, other recent work suggests that structural properties are language family-specific, although this work was not directly aimed at studying the stability of structural features but at understanding the regularities governing the temporal dynamics (i.e. correlated evolution) of various aspects of word order. Both these studies suffer from a limited coverage of different language families, potentially undermining the generality of their findings. By considering more language families and structural features, recent work involving the first author seems to reconcile these two views by finding that there is an important universal, cross-language family component to the stability of structural features, but that there is also a non-negligible amount of variation among families.
The view that multiple factors, such as universal tendencies and vertical and horizontal processes, are required to explain linguistic diversity has been suggested before, but more empirical work using large databases and modern quantitative methods is required for a thorough understanding of this complex interplay.
There is currently a vigorous debate concerning the stability of the structural properties of languages, discussing whether (i) there are universal, cross-language family tendencies in that some features are more stable than others, or (ii) the stability of a feature is entirely a language family-idiosyncratic property. The first view points to possible universal biases acting on structural features which influence their stability. These biases can be due to communicative pressures, or to extra-linguistic factors (such as neuro-physiologic, cognitive, articulatory and perceptive constraints), or to factors related to the linguistic system itself. As a result, some features might play a more central role in shaping the structural system of a language (akin to the model of the extended complexity hypothesis in biology). The second view instead suggests that “historical accidents” (or “driving factors”) specific to individual language families are the major determinants of structural change. Of course, there is the third possibility, namely that these two views are not mutually exclusive but complementary.
Summary of Paper
The present paper represents an empirical approach to the issues surrounding the stability of structural features. Here, we compare seven different published methods of estimating the stability of structural linguistic features, in order to quantify their overlap and differences. These seven methods propose different definitions of the concept of structural feature stability and different estimation techniques of these stabilities, while using the same large database of language families and features, namely the World Atlas of Language Structures (WALS). Reviewing the various methods, we found that the seemingly simple concept of structural stability hides an irreducible complexity, mainly due to the prevalence and importance of horizontal processes in language change and the manner in which various proposals acknowledge and quantify them. We believe that conceptual clarity about structural stability is a necessary step in any discussion concerning language change and evolution, and our empirical approach complements theoretical frameworks such as that of Nichols.
Most importantly, we found that, despite this variety of conceptualizations and methodological approaches, there is an important agreement between these different stability estimates. This strongly suggests the existence of a universal, cross-language family component, probably due to intrinsic properties of the structural features above and beyond the particular constraints of the specific language families and areas. However, this universal component does not explain the whole range of variation in structural stability, showing that there are also language family-specific factors at work. These findings might help better understand the interplay between the various “competing forces” and the relationship between different types of stability in particular language families and areas and for particular structural features. We hope that these findings will open the door to a research program aiming at understanding the nature and exact mixture of universal and idiosyncratic components governing structural language change and evolution.
Materials and Methods
We conducted an extensive literature survey in order to identify and compare different proposals about the concept of stability as applied to structural features of language. Given the differences between proposals that our survey revealed, only a very general gist of this concept (or rather of its opposite, instability) can be formulated, namely as the ease with which features change value across time, under the influence of various processes. To ensure comparability and objectivity, we defined several criteria the proposals must meet in order to be considered:
- they must be described in published form or in publicly available drafts designed for publication;
- they must use a concept of stability fitting our general gist;
- they must be quantifiable, objective and repeatable;
- they must deal with many structural features, preferably using the WALS or equivalent datasets, thus allowing comparison with other methods;
- they must produce estimates across many language families, preferably using the WALS, the Ethnologue or equivalent classifications, allowing broad comparability and cross-checking; and
- they must produce at least a rank of feature stabilities from the most stable to the most unstable.
We found seven methods that meet our criteria and we briefly describe them in the alphabetic order of the first author. All these methods used the structural features, their values and the language families as given by WALS, except for the method of Dediu (2011), which also used the classification given in the Ethnologue.
Cysouw, Albu & Dress (2008): the Consistency with Overall Patterns
These authors develop a very original take on the issue of stability, in that their primary interest is in identifying consistent structural features. Such features “are most indicative of the overall structure of a language […] of the typological profile, or ‘genius’ of a language” (p.263) and are identified by comparing their distributional properties with the “averaged” distribution of many features. The fundamental insight is to compute the typological (structural) distances between languages relative to each structural feature and to quantify how accurately the typological distance given by any single feature reflects the overall typological distance given by all features considered simultaneously.
More specifically, the authors start by defining the typological distance $d_f(l_1, l_2)$ between a pair of languages $l_1$ and $l_2$ relative to a feature $f$ as being 0 if both languages share the same attested feature value, 2 if they have attested but different values, and 1 if the feature value for at least one language is unattested (missing data). This is extended to a set of features $F$ by taking the average of $d_f(l_1, l_2)$ for all features $f \in F$ which have attested values in both languages. This leads naturally to a set of distance matrices between all pairs of languages: first, one matrix per feature, $D_f$, and, second, an overall distance matrix $D_F$ computed for all considered features. The features $f$ for which $D_f$ is more similar to $D_F$ are defined as more consistent. The authors propose three ways to quantify this fit between $D_f$ and $D_F$:
- Mantel’s congruence test, denoted in the following by CM, is based on Mantel’s proposal to compute the similarity between the two matrices $D_f$ and $D_F$ as the proportion of matrices (obtained by randomly permuting the rows of one of them) which have a higher correlation with the other than the unpermuted matrix does. In effect, this method uses the inverse of the p-value derived from the permutation test as the measure of consistency;
- the coherence method (CC) is based on the “excess” of two languages $l_1$ and $l_2$ relative to a third language $l_3$ given a distance matrix $D$, denoted $e_D(l_1, l_2 \mid l_3)$ and computed both for the feature-specific distances $D_f$ and for the overall distances $D_F$. The excess measures the extra distance between languages $l_1$ and $l_2$ when taking a detour through language $l_3$. With these, the coherence of feature $f$ is the ratio of the excess given $D_f$ to the excess given $D_F$, averaged across all possible triplets of languages;
- the rank method (CR) is based on the rank of a language $l_2$ relative to another language $l_1$, defined as the number of languages whose distance to $l_1$ is smaller than the distance of $l_2$ to $l_1$. With these, a rank matrix between languages is defined, and the coherence of a feature $f$ is computed based on the average ranks of languages sharing the same feature values, relative to the smallest possible value of the sum of these ranks. Thus, the method quantifies the ranks of the languages that share the feature value with a language $l_1$, averaged across all languages $l_1$.
Thus, there are three different ways of measuring the consistency of a structural feature with the overall pattern of all features, each quantifying it in a different manner. Given the complexity of these methods, we urge the interested readers to consult the original paper for a better understanding of their details.
The authors report that (i) these quantifications tend to give comparable results across different datasets (with CC being the most resilient), that (ii) their sensitivity to the amount of missing data varies dramatically between methods (with CR being the least sensitive), that (iii) they seem relatively unaffected by the distribution of feature values and, surprisingly, that (iv) they do not inter-correlate very well (except for CC and CR).
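To make the distance definition and the permutation idea behind CM more concrete, the following is a minimal sketch in Python (not the code of the original authors). It assumes hypothetical toy data, uses a plain Pearson correlation between the off-diagonal matrix entries, and permutes language labels (rows and columns together), which is the usual form of Mantel's test; all function and variable names are illustrative.

```python
import random
from itertools import combinations

def feature_distance(a, b):
    """0: same attested value; 2: different attested values; 1: missing (None)."""
    if a is None or b is None:
        return 1
    return 0 if a == b else 2

def overall_distance(lang_a, lang_b):
    """Mean per-feature distance over features attested in both languages."""
    ds = [feature_distance(a, b) for a, b in zip(lang_a, lang_b)
          if a is not None and b is not None]
    # Assumption: fall back to the "missing" distance when nothing is shared.
    return sum(ds) / len(ds) if ds else 1.0

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def mantel_proportion(d_feat, d_all, n_perm=999, seed=1):
    """Share of label permutations of d_feat that correlate more strongly
    with d_all than the unpermuted d_feat does."""
    n = len(d_all)
    pairs = list(combinations(range(n), 2))
    target = [d_all[i][j] for i, j in pairs]
    observed = pearson([d_feat[i][j] for i, j in pairs], target)
    rng = random.Random(seed)
    higher = 0
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)
        shuffled = [d_feat[perm[i]][perm[j]] for i, j in pairs]
        higher += pearson(shuffled, target) > observed
    return higher / n_perm

# Hypothetical toy data: 5 languages x 3 features (None = unattested).
langs = [["SV", "OV", "few"], ["SV", "OV", None], ["SV", "VO", "few"],
         ["VS", "VO", "many"], ["VS", "VO", "many"]]
n = len(langs)
D_all = [[overall_distance(langs[i], langs[j]) for j in range(n)] for i in range(n)]
D_f0 = [[feature_distance(langs[i][0], langs[j][0]) for j in range(n)] for i in range(n)]
proportion = mantel_proportion(D_f0, D_all)
```

A small proportion indicates that the single-feature distances track the overall distances more closely than most relabelled versions do, that is, a more consistent feature.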
Finally, and importantly for the present paper, the authors tested the relationship between consistency and genealogical stability by comparing the distances between related and unrelated languages using sets of the most consistent 25%, 50% and 75% features, and found that related languages had significantly lower distances than unrelated languages. This suggests that the most consistent features might be genealogically stable as well, in the sense that they do distinguish between related and unrelated languages, their values tending to be inherited. If such consistent features do indeed reflect the “typological profile” of the languages and if we assume that such profiles tend to be vertically transmitted, then consistent features will also be stable in the genealogical sense, being inherited from ancestor to daughter languages.
Dediu (2011): the Phylogenetic Rates of Evolution
The approach from Dediu (2011) (denoted in the following as D) estimates the stability of structural features from a Bayesian phylogenetic perspective. More precisely, for a given language family, the observed values of the structural features within this family’s languages together with the tree representing the genealogical relationships between these languages are used to infer the rates at which the structural features have changed in this family. As is specific to Bayesian methods in general, this results in a posterior distribution of such rates giving the posterior probability that a particular structural feature changes at a particular rate in this family.
The author uses two methods for estimating the rates of structural change: (i) a method estimating the probability of a transition between two feature values during an infinitesimally short time, implemented in MrBayes 3, and (ii) another method related to maximum parsimony, estimating the minimum number of changes required to produce the observed feature values starting from the inferred root value, implemented in BayesLang. It must be highlighted that these phylogenetic methods are agnostic in what concerns the cause for these structural changes: they could include “spontaneous mutation”, borrowing, language shift and various forms of selective pressures acting on language (“driving factors”) such as cognitive, perceptive or articulatory biases. Therefore, non-vertical processes are treated as just another cause of structural change and reflected in the posterior distribution of rates.
In order to control for the influence of particular historical classifications, the author used the classifications given by WALS and Ethnologue, later extending this to a collection of more accepted classifications. Given that absolute rates of change cannot be directly compared across language families (as it would require the absolute dating of the root proto-languages), they were converted to ranks with features ordered from the most to the least stable. This reduction in measurement level to ordinal ensures comparability not only across family trees but also across different methods. Most importantly, all combinations of the two methods of stability estimation and the three classifications produced highly similar estimates for the rates of structural change.
Maslova (2002, 2004): Estimating Transition Probabilities
Elena Maslova proposed a method to estimate the transition probabilities between the values of a given structural feature using pairs of closely related languages. Basically (for a complete exposition of Maslova’s method see especially Methods S1 here, where we also give the R code [Script S1] used to implement the method using the WALS data [Dataset S1]), for a binary structural feature $f$ which can take values $A$ and $B$, there are four possible transitions in a given time period for a given language: $A \to B$ with probability $p_{AB}$, $B \to A$ with probability $p_{BA}$, $A$ does not change with probability $1 - p_{AB}$, and $B$ does not change with probability $1 - p_{BA}$. With these, the stability of feature $f$ is defined as a function of $p_{AB}$ and $p_{BA}$.
To estimate $p_{AB}$ and $p_{BA}$, we need to sample pairs of related languages and compute the divergence rate, $d$, defined as the proportion of such pairs differing for feature $f$. If we denote the frequency of languages with value $A$ for feature $f$ as $q_A$, Maslova derives an equation relating $d$, $q_A$, $p_{AB}$ and $p_{BA}$ (see the Supporting Online Information), allowing the estimation of $p_{AB}$ and $p_{BA}$ from at least two such samples.
This method is based on the same fundamental insights as the fully phylogenetic methods discussed above, but it uses a much simpler statistical approach and requires stronger assumptions concerning the relationships between the pairs of closely related languages. We have implemented this method in R and estimated the stability of the structural features in WALS using WALS’ genera (the intermediate level between individual languages and language families, such as Germanic and Romance within Indo-European) to provide the sets of closely related languages (these estimates are denoted in the following as M).
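The two empirical quantities that feed into this estimation, the divergence rate among related pairs and the frequency of one feature value, can be sketched as follows. This is an illustrative Python sketch rather than the R implementation in Script S1; it assumes that every within-genus pair of languages with attested values counts as one related pair, and the genus data are hypothetical.

```python
from itertools import combinations

def divergence_rate(genera):
    """Proportion of within-genus language pairs whose attested values differ.

    genera: dict mapping a genus name to a list of feature values
    (None = unattested). Assumption: all within-genus pairs are used.
    """
    differing = total = 0
    for values in genera.values():
        attested = [v for v in values if v is not None]
        for a, b in combinations(attested, 2):
            total += 1
            differing += (a != b)
    return differing / total if total else float("nan")

def value_frequency(genera, value):
    """Frequency of one feature value among all languages with attested values."""
    attested = [v for values in genera.values() for v in values if v is not None]
    return attested.count(value) / len(attested)

# Hypothetical binary feature with values "A" and "B" in three genera.
genera = {
    "Germanic": ["A", "A", "B"],
    "Romance": ["A", "A"],
    "Slavic": ["B", None, "B"],
}
d = divergence_rate(genera)         # 2 differing pairs out of 5
q_a = value_frequency(genera, "A")  # 4 of the 7 attested values
```

Solving for the two transition probabilities then requires at least two such samples, as noted above; that step is not reproduced here.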
Parkvall (2008): Borrowability versus Genealogical Stability
Mikael Parkvall proposes to distinguish between features which have a strong genealogical signal (‘a language needs to be “born with them” in order to have them’) from those that ‘may “come and go as they please”’ (p.234). More precisely, he contrasts genealogically stable features defined as “a language either has it or lacks it, but whatever the case, contact or internal development is not going to change much” to unstable features, defined as “an easily borrowable or transferable characteristic, or for that matter, a feature easily gained or lost in contact” (p.235). Thus, it seems that his real focus is not on resistance to change whatever the cause of the change, be it internal (“mutation”-like processes) or external (language contact, selective pressures, random sampling or various types of constraints), but specifically on resistance to borrowing.
This is reinforced by the actual operationalisation of his definition (pp.235–238), which can be summarised as follows. First, he contrasts genealogical units (families and subfamilies from WALS) to areal units (shown in his Map 1 on p.236). Second, for a given feature $f$ and a unit $G$ he computes the Herfindahl-Hirschman index (or Gini coefficient), defined in terms of $p_i$, the proportion of languages in $G$ which have value $i$ for the considered feature; in fact, this index is widespread in economics and is closely related to entropy. The group’s homogeneity, $H_G(f)$, is obtained by taking the reciprocal of the Gini coefficient.
These homogeneities are then averaged over all the considered groups, resulting in the average homogeneity of feature $f$ over families, $\bar{H}_{\text{fam}}(f)$ (when the groups are genealogical), and over areas, $\bar{H}_{\text{area}}(f)$ (when the groups are areal). The actual measure of stability is the ratio of the two, $\bar{H}_{\text{fam}}(f) / \bar{H}_{\text{area}}(f)$.
When using all the language families available in the WALS, the author obtains a stability estimate which we will denote here as P1, but he also considered only a subset composed of “only the most widely accepted families” (Algonquian, Austronesian, Bantu, Dravidian, Indo-European, Iroquoian, Mayan, Mongolic, Semitic, Sino-Tibetan, Turkic and Uralic; p. 240), resulting in an estimate denoted here as P2. Those features scoring high on P1 (and P2) are those for which $\bar{H}_{\text{fam}}(f) > \bar{H}_{\text{area}}(f)$, and thus those for which related languages tend to share the same value as opposed to those for which languages in contact share the same value.
Therefore, we propose that this method should be more appropriately seen as estimating borrowability (and its opposite, resistance to borrowing) and not as stability in the sense of resistance to change. The difference between them is readily seen if we think about a feature which is hard to borrow and yet not very stable, because it readily changes through internal processes.
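The logic of contrasting homogeneity within families with homogeneity within areas can be illustrated with a short Python sketch. It is not Parkvall's implementation: as a loudly stated assumption, group homogeneity is taken here to be the sum of squared value proportions (a simple concentration measure), which may differ from the exact index and transformation used in the original paper. The groups are hypothetical.

```python
from collections import Counter

def homogeneity(values):
    """Assumed homogeneity: sum of squared proportions of attested values.
    Equals 1.0 when all languages in the group share one value."""
    counts = Counter(v for v in values if v is not None)
    n = sum(counts.values())
    return sum((c / n) ** 2 for c in counts.values())

def mean_homogeneity(groups):
    """Average homogeneity over groups with at least one attested value."""
    scores = [homogeneity(g) for g in groups if any(v is not None for v in g)]
    return sum(scores) / len(scores)

def family_area_ratio(families, areas):
    """Ratio of mean family homogeneity to mean areal homogeneity.
    Values above 1 mean related languages agree more than neighbours do."""
    return mean_homogeneity(families) / mean_homogeneity(areas)

# Hypothetical feature: the same five languages grouped in two ways.
families = [["A", "A", "A"], ["B", "B"]]
areas = [["A", "B"], ["A", "A", "B"]]
ratio = family_area_ratio(families, areas)  # 1 / (19/36) = 36/19
```

In this toy case the feature is perfectly homogeneous within families but mixed within areas, so the ratio exceeds 1, the pattern expected of a feature that resists borrowing.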
Wichmann & Holman (2009): Stable Features Tend to “Stay in the Family”
These authors define the stability of a feature as “the probability that a given language remains unchanged with respect to the feature during [a fixed and arbitrary number of years; our note], that is, the feature undergoes neither internal change nor diffusion during the interval” (p.12), being thus explicitly an estimate of the feature’s resistance to change irrespective of the causes of change. They propose three slightly different methods for estimating the relative stabilities of WALS features. However, since they conclude (using computer simulations) that “metric C” performs best, this is the only one we will describe and use here (denoted in the following as W).
The idea behind “metric C” is “that if one given feature more often tends to have the same value for languages that are related than does another given feature, then the first of the two may be considered to be more stable” (p.16), but the authors also correct for overall tendencies. For a feature $f$ and a genealogical group $G$ consisting of the languages for which the feature is attested, they compute the proportion of pairs of languages sharing the same feature value. These proportions are then combined in a weighted average across groups, resulting in $R$, the overall proportion of related language pairs sharing the same value. The corresponding proportion for pairs of unrelated languages, $U$, captures the overall tendencies, and the stability estimate is $S = (R - U) / (1 - U)$.
Thus, given that both $R$ and $U$ are bounded by 0 and 1, $S$ quantifies how much more similar related languages are than unrelated languages (on average), weighted by the maximum possible such difference. Therefore, $S$ is a measure of genealogical stability irrespective of the actual causes of change.
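As an illustration of this kind of estimate, the sketch below computes the agreement among related pairs, the agreement among unrelated pairs, and the difference between them scaled by its maximum possible value. It is a hedged Python sketch, not Wichmann & Holman's code: the names `r`, `u` and `s` are illustrative, groups are assumed to be weighted by their number of language pairs (the original weighting may differ), and the data are hypothetical.

```python
from itertools import combinations

def metric_c_sketch(groups):
    """groups: list of lists of attested feature values, one list per
    genealogical group. Returns (r, u, s)."""
    # r: agreement among related pairs; pooling pairs across groups is
    # equivalent to weighting each group by its number of pairs (assumed).
    same_rel = n_rel = 0
    for group in groups:
        for a, b in combinations(group, 2):
            n_rel += 1
            same_rel += (a == b)
    # u: agreement among unrelated pairs (languages from different groups).
    same_unrel = n_unrel = 0
    for g1, g2 in combinations(groups, 2):
        for a in g1:
            for b in g2:
                n_unrel += 1
                same_unrel += (a == b)
    r = same_rel / n_rel
    u = same_unrel / n_unrel
    # Excess similarity of related languages, scaled by its maximum (1 - u);
    # undefined when all unrelated pairs agree (u == 1).
    s = (r - u) / (1 - u)
    return r, u, s

# Hypothetical feature values in two families.
groups = [["A", "A", "A"], ["B", "B", "A"]]
r, u, s = metric_c_sketch(groups)  # r = 4/6, u = 3/9, s = 0.5
```

A value near 1 means that related languages almost always agree where unrelated ones do not, while a value near 0 means that relatedness adds nothing beyond the overall frequency of the feature values.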
Comparing the Methods
For each of these methods we have extracted the estimated stabilities for each of the 142 structural features in WALS, as follows:
- Cysouw, Albu & Dress (2008): for each of their three methods (CM, CC and CR) we have extracted the estimated stabilities from their paper’s Appendix D. We use here the negative of CR to align it with the other methods;
- Dediu (2011): we have extracted the agreed ranks (the scores on the first principal component) for the polymorphic features from Table S7 in the paper’s Electronic Supplementary Material. We use here the negative of this estimate, D, to align it with the other methods;
- Maslova (2002, 2004): we computed the stabilities, M, as described above and as detailed in the Supplementary Material Online;
- Parkvall (2008): to allow comparability with the other methods, we have retained only the estimates for polymorphic features as computed using “all families” (denoted in the following by P1) and using only the “most widely accepted families” (P2), both extracted from his paper’s Appendix (pp.245–250);
- Wichmann & Holman (2009): we extracted the estimates produced by their “metric C” (denoted in the following by W) from their paper’s Appendix 1 (pp.43–46).
These estimates are reported in Table 1 and each method’s coverage of the 142 WALS features is given in Table 2. P1 and M cover the most features (136; 97.18%) while D covers only 68 (47.89%): these differences are explained by the minimal requirements of the methods and the threshold of maximally acceptable proportion of missing data used by the different authors. There are 62 (43.66%) shared features covered by all methods, namely (please note that WALS uses unique numeric identifiers for its features, given here in parentheses): Consonant inventories (1), Vowel quality inventories (2), Voicing in plosives and fricatives (4), Uvular consonants (6), Glottalized consonants (7), Lateral consonants (8), Velar nasals (9), Vowel nasalization (10), Front rounded vowels (11), Syllable structure (12), Tone (13), Absence of common consonants (18), Locus of marking in the clause (23), Locus of marking in possessive noun phrases (24), Reduplication (27), Number of genders (30), Definite articles (37), Indefinite articles (38), Distance contrasts in demonstratives (41), Pronominal and adnominal demonstratives (42), Third-person pronouns and demonstratives (43), Gender distinctions in independent personal pronouns (44), Politeness distinctions in pronouns (45), Number of cases (49), Asymmetrical case marking (50), Ordinal numerals (53), Numeral classifiers (55), Position of pronominal possessive affixes (57), Obligatory possessive inflection (58), Possessive classification (59), Nominal and verbal conjunction (64), Perfective/imperfective aspect (65), Past tense (66), Future tense (67), Perfect (68), Morphological imperative (70), Optative (73), Overlap between situational and epistemic modal marking (76), Semantic distinctions of evidentiality (77), Suppletion according to tense and aspect (79), Verbal number and suppletion (80), Order of subject and verb (82), Order of object and verb (83), Order of adposition and noun phrase (85), Order of genitive and noun (86), Order of adjective and noun (87), Order of numeral and noun (89), Order of degree word and adjective (91), Position of polar question particles (92), Position of interrogative phrases in content questions (93), Verbal person marking (102), Order of person markers on the verb (104), Passive constructions (107), Antipassive constructions (108), Applicative constructions (109), Symmetric and asymmetric standard negation (113), Predicative adjectives (118), Nominal and locational predication (119), Zero copula for predicate nominals (120), ‘When’ clauses (126), ‘Hand’ and ‘arm’ (129), M-T pronouns (136). N-M pronouns would also belong to this list if not for P2.
Conceptually, these methods propose quite different approaches to the structural stability of language. Dediu (2011) uses a standard concept from evolutionary biology, in which stability is equated with resistance to change (irrespective of the causes of change) while languages evolve following an assumed tree-like history. Stable features are those with a low rate of change. A related idea, but much simpler and ignoring many possible problematic issues, is proposed by Wichmann & Holman (2009), who take stable features to be those that tend to share values within families rather than across them. Maslova’s method also shares fundamental insights with Dediu (2011) and Wichmann & Holman (2009) in the sense that stability is understood in a genealogical context. Parkvall (2008) estimates something which would probably be better called non-borrowability rather than stability, in the sense that features high on this scale are those shared more within genealogical units than within linguistic areas. Finally, Cysouw, Albu & Dress (2008) describe a method which apparently has no genealogical component, whereby they estimate the consistency of a feature with the overall pattern given by many features.
All analyses and graphs were realised using R.
Pairwise Relationships between Methods
Using all 62 features shared across all methods, the relationships between all pairs of methods are represented in the scatterplots in Figure 1. Table 3 shows the pairwise correlations (Pearson’s r and Spearman’s ρ) between the stability estimates. For each pair of methods, we inspected the scatterplots and regression diagnostic plots (using R’s lm() function; Residuals vs Fitted, QQ-plot and Leverage) in order to identify outliers. Given the small number of shared features across all methods, we applied a conservative approach by selecting only those features that were strong outliers for several pairs of methods, identifying the following features: Verbal Number and Suppletion (80), Obligatory Possessive Inflection (58), M-T Pronouns (136), and Front Rounded Vowels (11); see also Figure 1. Without these outliers the correlations do not change much (see Table 4), except for one method, which takes an extreme position for almost all outlier features.
Each panel shows the scatterplot of the stability estimates for the shared features produced by a pair of methods (grey dots) and the identified outliers (red crosses; see text for details). The regression lines with the outliers (red) and without (blue) have been drawn for convenience.
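The two correlation coefficients reported in Tables 3 and 4 can be illustrated with a minimal sketch. The paper’s analyses were run in R; the Python below is only a stand-in, and the function names and the toy estimates for two hypothetical methods are assumptions made for illustration, not values from the paper.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson's product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical stability estimates for five features from two methods
# that use very different scales (illustrative numbers only).
method_a = [0.10, 0.40, 0.35, 0.80, 0.95]
method_b = [1.0, 2.5, 2.0, 9.0, 30.0]

print("Pearson:", pearson(method_a, method_b))
print("Spearman:", spearman(method_a, method_b))
```

Because the methods report stability on incommensurable scales, the rank-based coefficient is insensitive to monotone rescaling, which is one reason to report both.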
Figure 2 shows a different view of the relationship between methods: in the 62-dimensional space determined by the shared features, each method represents a single point with coordinates given by the relative ranks of the features as computed by the method. In order to meaningfully compare the feature stabilities across methods, we converted them to relative ranks between 0.0 (most unstable feature) and 1.0 (most stable feature). We did this for each method separately, by first computing the ranks of the method’s actual stability estimates for all the features that the method provided estimates for, and then by normalizing these ranks between 0 and 1 using the formula $(r - r_{\min}) / (r_{\max} - r_{\min})$, where $r$ is a given feature’s rank and $r_{\min}$ and $r_{\max}$ are the smallest and largest ranks respectively. In this space we computed all pairwise Euclidean distances between the methods and projected these distances on two dimensions using classical multidimensional scaling (MDS), resulting in Figure 2. A small distance between two methods means that they tend to estimate the same relative stabilities for all features, while a large distance signals disagreements between methods. The maximum possible distance in this space is $\sqrt{62} \approx 7.87$, but the distances between methods are between 1.34 and 3.43 with a mean of 2.43, suggesting again that the methods agree better than expected by chance. This was confirmed by randomly generating 10,000 sets of seven points in this 62-dimensional space and comparing the distribution of these generated distances to the observed distances between methods: both the minimum and the mean observed distances are much smaller than expected, while the maximum distance is also smaller but within the distribution of maximum random distances. It can be seen that CC and CR form a tight cluster, as do P2 and W. Further, D and M are relatively close together, while CM is a clear outlier.
The distances between methods computed in the 62-dimensional space defined by the relative ranks of all shared features projected using classic Multidimensional Scaling (MDS). The results excluding the outlier features are extremely similar.
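The rank normalization, the pairwise distances and the random baseline described above can be sketched as follows. This is an illustrative Python stand-in for the R analysis, not the authors’ script: the function names are hypothetical, ties are ignored, and drawing random points uniformly from the unit hypercube is an assumption about how the random sets were generated.

```python
import random
from itertools import combinations
from math import sqrt


def relative_ranks(estimates):
    """Map estimates to ranks rescaled to [0, 1] (0 = most unstable).

    Ties are ignored in this sketch: tied estimates get consecutive ranks.
    """
    order = sorted(range(len(estimates)), key=lambda i: estimates[i])
    ranks = [0] * len(estimates)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    r_min, r_max = min(ranks), max(ranks)
    return [(r - r_min) / (r_max - r_min) for r in ranks]


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def pairwise_distances(points):
    """All distances between unordered pairs of points."""
    return [euclidean(p, q) for p, q in combinations(points, 2)]


def random_mean_distances(n_points, n_dims, n_sets, seed=0):
    """Mean pairwise distance for each of n_sets random point sets.

    Assumption: points are uniform in the unit hypercube.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_sets):
        points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
        distances = pairwise_distances(points)
        means.append(sum(distances) / len(distances))
    return means


# The largest possible distance joins opposite corners of the hypercube.
print("max distance in 62 dimensions:", euclidean([0.0] * 62, [1.0] * 62))

# A small baseline (200 sets rather than 10,000, to keep the sketch fast).
baseline = random_mean_distances(n_points=7, n_dims=62, n_sets=200)
print("random mean distance, range:", min(baseline), max(baseline))
```

An observed mean distance can then be compared with the `baseline` list, for instance by counting how many random sets have a mean distance at or below the observed one.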
Principal Components Analysis
To better understand the relationships between the stability estimates produced by these methods we have conducted a Principal Components Analysis both on the full set of shared features and on the set excluding the outliers, using the actual stability estimates provided by each method.
On the full set of 62 shared features, the first four Principal Components explain in total 89.2% of the variation (Table 5). PC1 explains 48.3% and represents the agreement between all methods (their loadings have the same sign). PC2 explains 17.2% and contrasts CC, CR and D on the one hand with CM, P1, P2 and W on the other (excluding M, which has a loading close to zero on this component). PC3 explains 13.5% and makes a further distinction between the three methods in Cysouw, Albu & Dress (2008) (CM, CC and CR) and the two methods in Parkvall (2008) (P1 and P2). Finally, PC4 explains 10.2% and contrasts D, W and M with the other methods.
The first principal component, explaining by far most of the variance (48.3%), represents the agreement between all these highly different methods. The following components also make interesting distinctions, such as the grouping together of the strongly phylogenetic method D with two of the strongly non-genealogical methods CC and CR (component 2), the identification of Parkvall’s special concept of “borrowability” (P1 and P2) versus Cysouw, Albu & Dress’s “consistency” (CM, CC and CR) (component 3), and the recognition that the D, W and M methods are fundamentally similar, even if differing widely in details (component 4).
When excluding the outliers, the first four principal components explain 90.5% of the variation (Table 6). The first principal component, PC1, explains 55.8% of the variation and represents the agreement between all methods, with loadings similar to the previous case. PC2 explains 16.1% and distinguishes CC, CR, D and M on the one hand from CM, P1, P2 and W on the other, in a pattern similar to the previous case. However, PC3 (12.2% of the variance) and PC4 (6.8%) differ from the ones found using all shared features; PC3 still distinguishes Cysouw, Albu & Dress’s “consistency” (but not Parkvall’s “borrowability”).
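To make the reading of the first component concrete, here is a minimal sketch of how a first principal component and its share of the variance can be obtained. It is a Python illustration only: it uses power iteration on the correlation matrix, whereas the paper’s analysis was run in R and does not state whether the estimates were standardized, so the scaling is an assumption. The three methods and their estimates are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def standardize(column):
    """Centre a column and scale it to unit (population) variance."""
    m, s = mean(column), pstdev(column)
    return [(v - m) / s for v in column]


def correlation_matrix(columns):
    """Correlation matrix of a list of equal-length columns (one per method)."""
    z = [standardize(c) for c in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def first_component(matrix, iterations=500):
    """Leading eigenvector and eigenvalue of a symmetric matrix.

    Plain power iteration from a uniform start vector; adequate when the
    methods are positively correlated, as in the toy data below.
    """
    k = len(matrix)
    v = [1.0 / sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
    eigenvalue = sum(a * b for a, b in zip(v, mv))
    return v, eigenvalue


# Hypothetical estimates from three methods for six features.
methods = [
    [0.10, 0.30, 0.40, 0.60, 0.80, 0.90],
    [0.20, 0.25, 0.50, 0.55, 0.70, 0.95],
    [0.30, 0.10, 0.60, 0.40, 0.90, 0.70],
]

corr = correlation_matrix(methods)
loadings, eigenvalue = first_component(corr)
# The trace of a correlation matrix equals the number of variables,
# so the share of variance explained is eigenvalue / number of methods.
explained = eigenvalue / len(methods)

print("PC1 loadings:", loadings)
print("PC1 share of variance:", explained)
```

Loadings that all share one sign, as in this toy case, are what the text above interprets as agreement between methods.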
Figure 3 (left panel) shows all shared features in the PC1–PC2 space, capturing together 65.5% of the variation between methods. Features that cluster together are features that show similar stability estimates across methods. The right panel compares the stability of the various WALS areas and shows that Word Order features tend to be the most stable, with Phonology covering the whole spectrum. A one-way ANOVA shows that the areas differ in their average stability, but a post-hoc pairwise comparison using Tukey’s HSD shows that only the Word Order – Nominal Categories contrast survives the multiple testing correction, most probably due to the small numbers of features included.
Left panel: all shared features (given by their numeric WALS unique ID; see Table 1) plotted on PC1 and PC2. Right panel: the distribution of the stability across the WALS areas, with the number of features of each type shown on the right. PC1 represents the strong inter-method agreement and varies from unstable (left) to stable (right); the actual scales of the axes are arbitrary. The colours and symbols represent the WALS areas (see Table 1 for details), with the open diamond representing all shared features together. The results excluding the outlier features are extremely similar.
The Agreement and Differences between Methods
The pairwise correlations and the Principal Components Analysis strongly suggest that there is substantial agreement between these different methods regarding the stability of the shared features.
Table 7 shows the shared features sorted from the most stable to the most unstable by their scores on the first principal component when using all shared features, and then by the first principal component when excluding the outlier features. The correlation between these two components is , .
Figure 4 shows the shared features ordered by their median relative rank across all methods, an indication of the agreement between methods for each feature (the interquartile range, IQR), as well as the actual estimate given by each individual method. Table 8 shows the features ordered by the disagreement between methods (IQR). It can be seen that despite a clear overall concordance between methods (as shown by the first principal component), the agreement is far from perfect, and there are clear differences between methods both overall (reflected in the second, third and fourth Principal Components, and in the patterns of inter-method correlations and distances) and regarding the estimated stability of individual features.
The stabilities (as relative ranks from 0.0 = most unstable to 1.0 = most stable) of the shared features as estimated by all methods. Shown are the median stability (black thick lines), the interquartile range (IQR; light gray) and the individual method estimates (D, 1, 2, W, A, C, R, and M – see legend for details). The features with a significantly smaller or larger IQR than expected by chance are marked with red “<” and blue “>” symbols respectively on the right-hand side of the figure, with the number of symbols (one to three) indicating increasing levels of significance; please note that this is before the multiple testing correction, after which only features 11 and 87 survive (see text for details). The features are represented by transparently abbreviated names derived from their full WALS names (see Table 8) and their WALS unique IDs.
The IQR varies between 0.08 and 0.89 with a mean of 0.30, and while some features show very little disagreement between methods (such as 87: Order of Adjective and Noun, 86: Order of Genitive and Noun and 18: Absence of Common Consonants), for some (such as 11: Front Rounded Vowels, 129: ‘Hand’ and ‘Arm’ and 136: M-T Pronouns) the methods disagree almost completely. In order to better understand these IQR values, we computed their expected distribution by randomizing 10,000 times the stability estimates across features and computing the IQR between the methods; the distribution is relatively normal (as judged from the QQ-plot) with a mean of 0.45 and standard deviation of 0.14. For each feature we then compared its IQR to this expected distribution and we found that, after Holm’s correction for multiple testing, only feature 87: Order of Adjective and Noun has an IQR smaller than expected, and feature 11: Front Rounded Vowels has an IQR much larger than expected. Without multiple testing correction, 24 features have a smaller IQR than expected at an α-level of 0.05, but only 11: Front Rounded Vowels has a larger IQR (see Figure 4 for details). For future research it will be interesting to clarify for individual features why the methods disagree and what these disagreements mean from a theoretical perspective.
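The IQR comparison described above can be sketched as a permutation test followed by Holm’s step-down correction. The Python below is an illustration under stated assumptions, not the authors’ R code: each method’s relative ranks are shuffled independently across features, the p-value is taken empirically from the permuted IQRs (the paper does not specify how the comparison with the expected distribution was made), and all function names and toy ranks are hypothetical.

```python
import random
from statistics import quantiles


def iqr(values):
    """Interquartile range, using linear interpolation between data points."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def null_iqrs(ranks_by_method, n_permutations, seed=0):
    """IQRs obtained after shuffling each method's ranks across features."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_permutations):
        shuffled = [rng.sample(m, len(m)) for m in ranks_by_method]
        # zip(*shuffled) yields, per feature, the ranks from all methods.
        null.extend(iqr(per_feature) for per_feature in zip(*shuffled))
    return null


def lower_tail_p(observed, null):
    """Empirical probability of an IQR at least as small as the observed one."""
    return (1 + sum(v <= observed for v in null)) / (1 + len(null))


def holm_adjust(p_values):
    """Holm's step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[index])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[index] = running_max
    return adjusted


# Hypothetical relative ranks: four methods (rows) by five features (columns).
ranks_by_method = [
    [0.00, 0.25, 0.50, 0.75, 1.00],
    [0.00, 0.50, 0.25, 0.75, 1.00],
    [0.25, 0.00, 0.50, 1.00, 0.75],
    [0.00, 0.25, 0.75, 0.50, 1.00],
]

observed = [iqr(per_feature) for per_feature in zip(*ranks_by_method)]
null = null_iqrs(ranks_by_method, n_permutations=500)
raw_p = [lower_tail_p(o, null) for o in observed]
print("observed IQRs:", observed)
print("Holm-adjusted p-values:", holm_adjust(raw_p))
```

A test for an IQR larger than expected would use the upper tail instead, counting permuted IQRs at or above the observed value.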
Based on our literature survey, we selected seven methods that propose different approaches to defining and quantifying the stability of the structural features of language, generally understood as the inverse of the ease with which features change value across time, under the influence of various processes. These methods are: CM, CC and CR (Cysouw, Albu & Dress 2008), three related methods which estimate the consistency between the distributional pattern of a feature and the overall distribution of all features using various measures of such consistency; D (Dediu 2011), which proposes a fully phylogenetic approach inspired by evolutionary biology; M (Maslova), which estimates the transition probabilities using pairs of closely related languages; P1 and P2 (Parkvall 2008), which conceptualize and estimate the borrowability of structural features in contrast to their genealogical stability; and W (Wichmann & Holman 2009), which defines stability as the tendency for sharing between related languages.
The methods D, M and W are all based on the same fundamental genealogical insight, namely that related languages will tend to share the values of stable features (through inheritance from their common ancestor), but these methods still vary widely in their assumptions and implementation. In contrast, P1 and P2 look at those features that resist borrowing across genealogical units, while CM, CC and CR focus on those features whose distributional pattern matches the overall pattern obtained when all features are taken together.
If we were to assume a “competing forces” framework such as suggested by Nichols, distinguishing between Inheritance (vertical transmission from ancestor to descendant language), Borrowing (propensity to be acquired by horizontal processes), Substratum (persistence from substratum languages), and Selection (a bias favouring certain feature values), then each of these methods can be seen as estimating a particular weighted combination of “forces”. It is beyond the scope of this paper to analyze these combinations (and it is even unclear how these weights could be empirically determined in the absence of pure estimators of the “forces”), but see Table 9 for a subjective attempt based on the description of the methods and the stability estimates they produce (Table 1). Future computational work could simulate different types of features evolving under the influence of known combinations of forces and analyze the stability estimates produced by different methods.
Probably the most important result of our analysis is that, despite the large conceptual differences between the reviewed methods, they do tend to agree to a large extent. The simplest proposal to explain this agreement is that it is related to an intrinsic tendency of some structural features to be more stable than others across language families and geographic areas. Such tendencies are probably due to multiple factors affecting how features change, including cognitive, articulatory and perceptual biases, and constraints deriving from language use. One important mechanism might be represented by the iterated cultural transmission of language across generations in populations of biased language acquirers and users, and some of these biases might even have a genetic component. From the loadings on the first Principal Component (Tables 5 and 6), the pairwise correlations (Tables 3 and 4) and the distances (Figure 2) between methods, it is clear that while CM is discordant (but still agreeing), the other methods contribute relatively equally.
This suggests (i) that this agreement reflects mostly a vertical/genealogical component whereby the stable features’ values tend to be transmitted faithfully to daughter languages, and (ii) that genealogically (vertically) stable features also tend to be stable against contact (horizontal) processes. However, we would first need to rule out other possible explanations, such as the pattern of missing data or hidden sampling biases in the considered features, languages, language families and areas. To effectively address these issues we will need to use realistic computer simulations and randomization of the WALS data, as well as, if possible, replications using other typological databases. We will leave this task for future research. Nevertheless, we would argue that, given the differences between the methods included and the fact that each individual publication introducing these methods performed various sanity checks and tests, this agreement does, with a very high probability, tell us something about the stability of structural features and not about contingent sampling and coding biases or dataset coverage.
Second, a clear and reliable distinction seems to cut across conceptual differences between methods. Based on an a priori analysis of the methods, we would not have expected that the strongly phylogenetic method D and the strongly non-genealogical methods CC and CR would agree so well. In turn, we would have expected D to agree with W and M, which all include a strong genealogical component. Surprisingly, W does not agree very well with D and M. In effect, the strongest agreement seems to be among the methods D, M, CC, and CR. The special status of the borrowability encapsulated by P1 and P2 and the consistency captured by CM, CC and CR also appears in the lower-ranked components of the PCA. The pattern of differences and similarities between methods that our investigation has uncovered will not only help clarify the various aspects behind the apparently simple concept of structural stability, but will also allow the choice of the most appropriate concept of stability and associated estimation method for the problem at hand.
Third, the identification of globally stable and unstable features (Table 7), as well as of those features for which the methods agree or disagree most, will allow a better understanding of language change and evolution and the multifaceted constraints acting on it. However, an important future direction will be the study of structural stability at the level of language family and geographic area. An important first step has recently been made in this direction, showing that besides the universal tendencies and idiosyncratic differences between language families, there might be large-scale cross-family patterns regarding the stability of structural features.
In this context, it is interesting to note the recent work of Balthasar Bickel, who tests the Family Bias Theory, whereby “directional biases” attested across multiple families are due to the action of one or more “driving factors” (or “universal pressures”) as opposed to the action of “faithful inheritance”. A directional bias is defined in this context as the skewing of the distribution of a given structural feature’s values in the family’s (or other historical unit’s) languages as detected by a significant test at α-level 0.05 when the p-values are computed using permutations (p.2–4). He then adduces several arguments based on the analysis of the WALS database and computer simulations in support of the Family Bias Theory and concludes that “typological distributions are systematically driven by the interaction of faithful inheritance (genealogical stability) with various kinds of external pressure, such as universal principles and areal diffusion trends” (p.17).
Given the results of our analysis, and our discussion of stability in a biological evolutionary context, it seems to us that Bickel’s apparent criticism in fact agrees with our findings on a deeper level. More precisely, the “copying fidelity” of a structural feature across generations and language splits is certainly a component of the feature’s stability but not the only one, as external pressures (“driving factors”), generated by language-internal factors or by the larger cultural, biological and ecological context, also play an important role in shaping the distribution of linguistic diversity. Linguists have been aware for a long time that languages have historical and evolutionary inertia and that different languages and language groups present different affordances and opportunities for language change, all these interacting in a complex manner with feature-specific properties in shaping their temporal stability.
In conclusion, this analysis represents an important step towards a better understanding of language as a complex evolutionary system, and it strongly suggests that structural stability shows a clear universal tendency for some features to be more stable than others, but that this apparently simple concept of stability is in fact very complex.
Elena Maslova’s estimation of transition probabilities. Here we present the detailed derivation of Elena Maslova’s (denoted in this paper as method M) estimation of transition probabilities starting from first principles and its application to the WALS data.
The R implementation of Maslova’s method. This is the R script implementing our derivation of Maslova’s method (detailed in Methods S1) for the WALS database, released under a GPL v3 license.
The version of the WALS dataset used in the paper. This dataset (released under an Attribution-NonCommercial-NoDerivs 2.0 Germany (CC BY-NC-ND 2.0) license) is the actual version of WALS used in our paper, included here for maximum reproducibility of the reported results.
The authors wish to thank A. Dima for discussions concerning data analysis and L. O’Connor for comments on an earlier version.
Conceived and designed the experiments: DD MC. Performed the experiments: DD MC. Analyzed the data: DD MC. Wrote the paper: DD MC.
- 1. Crystal D (2006) The Fight for English: How language pundits ate, shot, and left. Oxford University Press: Oxford, UK.
- 2. Friðriksson F (2008) Language change vs. stability in conservative language communities: A case study of Icelandic. Ph.D. thesis, Faculty of Arts at University of Gothenburg, Sweden.
- 3. Kimura M (1968) Evolutionary rate at the molecular level. Nature 217: 624–626.
- 4. Ohta T (1973) Slightly deleterious mutant substitutions in evolution. Nature 246: 96–98.
- 5. Johnston MO (2008) Mutations and new variation: Overview. In: Cooper DN, Kehrer-Sawatzki H, editors, Handbook of Human Molecular Evolution, John Wiley & Sons Ltd:UK, volume 1 of Encyclopedia of Life Sciences. 108–117.
- 6. Metzgar D (2008) Mutation rates: Evolution. In: Cooper DN, Kehrer-Sawatzki H, editors, Handbook of Human Molecular Evolution, John Wiley & Sons Ltd:UK, volume 1 of Encyclopedia of Life Sciences. 123–125.
- 7. Nei M, Kumar S (2000) Molecular evolution and phylogenetics. Oxford University Press:NY.
- 8. Glansdorff N, Xu Y, Labedan B (2008) The last universal common ancestor: emergence, constitution and genetic legacy of an elusive forerunner. Biology Direct 3: 29.
- 9. Halder G, Callaerts P, Gehring W (1995) Induction of ectopic eyes by targeted expression of the eyeless gene in drosophila. Science 267: 1788–1792.
- 10. Onuma Y, Takahashi S, Asashima M, Kurata S, Gehring W (2002) Conservation of pax-6 function and upstream activation by notch signaling in eye development of frogs and flies. Proc Natl Acad Sci USA 99: 2020–2025.
- 11. Hughes AL (2008) Vertebrate immune system: Evolution. In: Cooper DN, Kehrer-Sawatzki H, editors, Handbook of Human Molecular Evolution, John Wiley & Sons Ltd:UK, volume 2 of Encyclopedia of Life Sciences. 1063–1067.
- 12. Clark NL (2008) Adaptive evolution of primate sperm proteins. In: Cooper DN, Kehrer-Sawatzki H, editors, Handbook of Human Molecular Evolution, John Wiley & Sons Ltd:UK, volume 2 of Encyclopedia of Life Sciences. 1158–1166.
- 13. Wyckoff GJ, Wang W, Wu CI (2000) Rapid evolution of male reproductive genes in the descent of man. Nature 403: 304–309.
- 14. Pollard KS, Salama SR, Lambert N, Lambot MA, Coppens S, et al. (2006) An rna gene expressed during cortical development evolved rapidly in humans. Nature 443: 167–172.
- 15. Pollard KS, Salama SR, King B, Kern AD, Dreszer T, et al. (2006) Forces shaping the fastest evolving regions in the human genome. PLoS Genet 2: e168.
- 16. Woods CG, Bond J, Enard W (2005) Autosomal recessive primary microcephaly (mcph): A review of clinical, molecular, and evolutionary findings. Am J Hum Genet 76: 717–728.
- 17. Lai CSL, Fisher SE, Hurst JA, Vargha-Khadem F, Monaco AP (2001) A forkhead-domain gene is mutated in a severe speech and language disorder. Nature 413: 519–523.
- 18. Enard W, Przeworski M, Fisher SE, Lai CSL, Wiebe V, et al. (2002) Molecular evolution of foxp2, a gene involved in speech and language. Nature 418: 869–872.
- 19. Krause J, Lalueza-Fox C, Orlando L, Enard W, Green RE, et al. (2007) The derived FOXP2 variant of modern humans was shared with Neandertals. Curr Biol 17: 1908–1912.
- 20. Aris-Brosou S (2005) Determinants of adaptive evolution at the molecular level: The extended complexity hypothesis. Molecular Biology and Evolution 22: 200–209.
- 21. Tadmor U, Haspelmath M, Taylor B (2010) Borrowability and the notion of basic vocabulary. Diachronica 27: 226–246.
- 22. Pagel M, Atkinson QD, Meade A (2007) Frequency of word-use predicts rates of lexical evolution throughout indo-european history. Nature 449: 717–721.
- 23. Pagel M (2009) Human language as a culturally transmitted replicator. Nat Rev Genet 10: 405–415.
- 24. Swadesh M (1952) Lexicostatistic dating of prehistoric ethnic contacts. Proc Am Phil Soc 96: 452–463.
- 25. Pagel M, Meade A (2006) Estimating rates of lexical replacement on phylogenetic trees of languages. In: Forster P, Renfrew C, editors, Phylogenetic methods and the prehistory of languages, McDonald Institute for Archaeological Research: UK, McDonald Institute Monographs. 173–182.
- 26. Greenhill SJ, Atkinson QD, Meade A, Gray RD (2010) The shape and tempo of language evolution. Proc R Soc B.
- 27. Nichols J (1999) Linguistic Diversity in Space and Time. University of Chicago Press.
- 28. Dunn M, Terrill A, Reesink G, Foley RA, Levinson SC (2005) Structural phylogenetics and the reconstruction of ancient language history. Science 309: 2072–2075.
- 29. Dunn M, Levinson SC, Lindström E, Reesink G, Terrill A (2008) Structural phylogeny in historical linguistics: Methodological explorations applied in island melanesia. Language 84: 710–759.
- 30. Campbell L, Poser WJ (2008) Language Classification: History and Method. Cambridge University Press.
- 31. Campbell L (2004) Historical linguistics : an introduction. Edinburgh: Edinburgh University Press.
- 32. Hunley K, Dunn M, Lindström E, Reesink G, Terrill A, et al. (2008) Genetic and linguistic coevolution in northern island melanesia. PLoS Genet 4: e1000239.
- 33. Nunn CL, Arnold C, Matthews L, Mulder MB (2010) Simulating trait evolution for cross-cultural comparison. Philos Trans R Soc Lond B Biol Sci 365: 3807–3819.
- 34. Dunn M, Greenhill SJ, Levinson SC, Gray RD (2011) Evolved structure of language shows lineage-specific trends in word-order universals. Nature 473: 79–82.
- 35. Dediu D (2011) A bayesian phylogenetic approach to estimating the stability of linguistic features and the genetic biasing of tone. Proc R Soc B 278: 474–479.
- 36. Dediu D, Levinson SC (2012) Abstract profiles of structural stability point to universal tendencies, family-specific factors, and ancient connections between languages. PLoS ONE 7: e45198.
- 37. Dryer M (1997) Why statistical universals are better than absolute universals. In: Chicago Linguistic Society. volume 33, 123–145.
- 38. Bickel B (2010) Distributional biases in language families. Unpublished draft: University of Leipzig.
- 39. Haspelmath M, Dryer MS, Gil D, Comrie B, editors (2005) The World Atlas of Language Structures. Oxford University Press:UK.
- 40. Thomason SG, Kaufman T (1988) Language contact, creolization, and genetic linguistics. University of California Press: Berkeley.
- 41. Nichols J (2008) Diversity and stability in language. In: Joseph BD, Janda RD, editors, The Handbook of Historical Linguistics, Blackwell Publishing Ltd. 283–310.
- 42. Lewis M (2009) Ethnologue: languages of the world. SIL International: Dallas, TX, 16 edition.
- 43. Cysouw M, Albu M, Dress A (2008) Analyzing feature consistency using dissimilarity matrices. STUF 61: 263–279.
- 44. Mantel N (1967) The detection of disease clustering and a generalized regression approach. Cancer Research 27: 209–220.
- 45. Ronquist F (2004) Bayesian inference of character evolution. Trends Ecol Evol 19: 475–481.
- 46. Huelsenbeck JP, Ronquist F, Nielsen R, Bollback JP (2001) Bayesian inference of phylogeny and its impact on evolutionary biology. Science 294: 2310–2314.
- 47. Gill J (2002) Bayesian methods: A social and behavioral sciences approach. Chapman & Hall/CRC:Boca Raton, FL.
- 48. Press SJ (2003) Subjective and objective Bayesian statistics: Principles, models, and applications. John Wiley & Sons, Inc.:NJ, 2 edition.
- 49. Ronquist F, Huelsenbeck JP (2003) Mrbayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19: 1572–1574.
- 50. Felsenstein J (2004) Inferring phylogenies. Sinauer Associates Inc.:Sunderland, Mass.
- 51. Hammarström H (2010) A full-scale test of the language farming dispersal hypothesis. Diachronica 27: 197–213.
- 52. Maslova E (2000) A dynamic approach to the verification of distributional universals. Linguistic Typology 4: 307–333.
- 53. Maslova E (2002) Distributional universals and the rate of type shifts: towards a dynamic approach to “probability sampling”. URL http://anothersumma.net/Publications/Sampling.pdf. Lecture given at the 3rd Winter Typological School, Moscow. Accessed 2012 Dec 26.
- 54. Maslova E (2004) [Dynamics of typological distributions and stability of language types]. Voprosy Jazykoznanija 5: 3–16.
- 55. Maslova E, Nikitina T (2008) Stochastic universals and dynamics of cross-linguistic distributions: the case of alignment types. URL http://www.anothersumma.net/Publications/ProbabilityPubl.html. Accessed 2012 Dec 26.
- 56. Cysouw M (2011) Understanding transition probabilities. Linguistic Typology 15: 415–431.
- 57. R Development Core Team (2011) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/. ISBN 3-900051-07-0.
- 58. Parkvall M (2008) Which parts of language are the most stable? STUF 61: 234–250.
- 59. Wichmann S, Holman EW (2009) Assessing Temporal Stability for Linguistic Typological Features. LINCOM Europa:München. URL http://email.eva.mpg.de/wichmann/WichmannHolmanIniSubmit.pdf. Accessed 2012 Dec 26.
- 60. Cox T, Cox M (1994) Multidimensional scaling. Chapman & Hall, London.
- 61. Tabachnick BG, Fidell LS (2001) Using multivariate statistics. Allyn & Bacon:Mass., 4 edition.
- 62. Holm S (1979) A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6: 65–70.
- 63. Tily H, Jaeger TF (2011) Complementing quantitative typology with behavioral approaches: Evidence for typological universals. Linguistic Typology 15: 479–490.
- 64. Culbertson J, Smolensky P, Legendre G (2012) Learning biases predict a word order universal. Cognition 122: 306–329.
- 65. Fedzechkina M, Jaeger TF, Newport EL (2012) Language learners restructure their input to facilitate efficient communication. Proceedings of the National Academy of Sciences 109: 17897–17902.
- 66. Dediu D (2011) Are languages really independent from genes? if not, what would a genetic bias affecting language diversity look like? Hum Biol 83: 279–296.
- 67. Dediu D, Ladd DR (2007) Linguistic tone is related to the population frequency of the adaptive haplogroups of two brain size genes, ASPM and microcephalin. Proc Natl Acad Sci U S A 104: 10944–9.