Universal Principles in the Repair of Communication Problems

There would be little adaptive value in a complex communication system like human language if there were no ways to detect and correct problems. A systematic comparison of conversation in a broad sample of the world’s languages reveals a universal system for the real-time resolution of frequent breakdowns in communication. In a sample of 12 languages of 8 language families of varied typological profiles we find a system of ‘other-initiated repair’, where the recipient of an unclear message can signal trouble and the sender can repair the original message. We find that this system is frequently used (on average about once per 1.4 minutes in any language), and that it has detailed common properties, contrary to assumptions of radical cultural variation. Unrelated languages share the same three functionally distinct types of repair initiator for signalling problems and use them in the same kinds of contexts. People prefer to choose the type that is the most specific possible, a principle that minimizes cost both for the sender being asked to fix the problem and for the dyad as a social unit. Disruption to the conversation is kept to a minimum, with the two-utterance repair sequence being on average no longer than the single utterance which is being fixed. The findings, controlled for historical relationships, situation types and other dependencies, reveal the fundamentally cooperative nature of human communication and offer support for the pragmatic universals hypothesis: while languages may vary in the organization of grammar and meaning, key systems of language use may be largely similar across cultural groups. They also provide a fresh perspective on controversies about the core properties of language, by revealing a common infrastructure for social interaction which may be the universal bedrock upon which linguistic diversity rests.

This model investigates the cost of repair sequences (repair initiation and repair solution together) relative to the trouble source turn. For a description of the variables used, see the supplementary information 'Main model: Open and Restricted repair initiators'.
Conservation (named p4 in the R file) is measured as the relative length of the trouble source compared to the insert sequence:
$$p4 = \log\left(\frac{\mathrm{TS}_{\mathrm{Clength}}}{\mathrm{RS}_{\mathrm{Clength}} + \mathrm{RI}_{\mathrm{Clength}}}\right) \qquad (4.1)$$
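As an illustration of the conservation measure, the following minimal Python sketch computes p4 as the log of the trouble source length divided by the combined length of the repair initiator and repair solution. The function name `conservation` and its argument names are hypothetical and are not taken from the R file. The natural logarithm is assumed, which agrees with the correspondence reported later between d = 0.556 and a ratio of 1:1.74. The unit of length is left to the caller.

```python
import math

def conservation(ts_length, ri_length, rs_length):
    """Return p4 = log(TS / (RI + RS)) for one repair sequence.

    All names here are illustrative assumptions. Lengths must be positive
    and in the same unit; the natural log is assumed.
    """
    insert_length = ri_length + rs_length
    if ts_length <= 0 or insert_length <= 0:
        raise ValueError("lengths must be positive")
    return math.log(ts_length / insert_length)

# A trouble source as long as the whole insert sequence gives p4 = 0 (1:1).
print(conservation(10, 4, 6))
# Three turns of equal length give p4 = log(1/2), i.e. a ratio of 1:2.
print(conservation(5, 5, 5))
```

On this scale, negative values mean the insert sequence is longer than the trouble source, and positive values mean it is shorter.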

Methods
Mixed-effects logit modelling was used to assess the data in R [12], using the packages lme4 [13] and languageR [14]. The model predicts the type of repair initiator that is used given factors relating to the previous turn. The fixed-effect factors were chosen based on a priori predictions about what would affect the likelihood of using open versus restricted repair initiators.
The intercept of the model was set to reflect the least marked situation (determined by frequency, which matches intuition well). The least marked situation is an OIR from a 1PP, 'first' sequence from a dyadic conversation in an audible language with no visual or audio trouble, no intervening material, no parallel activity, not recorded in a soundproof booth and where B gazes to A and A gazes to B.
Model without random intercept by language:
p4 ~ RI_identity + TS_vis + seq_intervene + TS_aud.bin + TS_par + soundproof + (0 + RI_identity + TS_vis + TS_aud.bin + TS_par + seq_intervene | language) + (1 + TS_vis + TS_aud.bin + TS_par + seq_intervene | recording) + (1 | language.family)

Model Results
The model converged with the following fit: AIC BIC. More details on the model can be found by loading the R data file p4 modelP4.rd.

Permutation tests
The models above suggested that the average conservation was very close to 1:1. That is, the length of the trouble source turn was equal to the combined length of the repair initiator turn and the response turn. This is striking, given that the expected conservation for three randomly chosen turns would be 1:2. In order to assess the statistical significance of this finding, permutation tests were used. In all permutation tests below, data were permuted only within languages. The first permutation test (reported in the main paper) estimated the average conservation in normal sequences by looking only at trouble source turn lengths. The mean conservation was calculated using one turn to represent the 'trouble source' and two turns to represent the 'insert sequence'. In each permutation, one third of the data was randomly assigned to be 'trouble source' turns, and the remaining two thirds were assigned to be 'insert sequences'. The trouble source and insert sequences were randomly paired, and the p4 measure was calculated for each pairing. A measure of distance from 1:1 was calculated for N paired sequences: $d = \left| \frac{1}{N} \sum_{i=1}^{N} p4_i \right|$. This calculates the mean value of p4 for the sample, and then the distance of that mean from zero (p4 is on a log scale, so zero is equal to a ratio of 1:1). That is, d is a measure of the distance from a ratio of 1:1. d was calculated for many permutations to give a distribution of mean p4 values. This was then compared to d calculated for the actual data.
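The first permutation test can be sketched roughly as follows, taking d to be the absolute value of the mean p4 over all pairings. This is only an illustration under stated assumptions: the function names, the toy turn lengths and the decision to drop leftover turns when a language's count is not divisible by three are all hypothetical, and the original analysis was carried out in R.

```python
import math
import random

def distance_from_parity(p4_values):
    """d = |mean(p4)|; zero corresponds to a 1:1 ratio on the log scale."""
    return abs(sum(p4_values) / len(p4_values))

def permuted_distance(ts_lengths_by_language, rng):
    """Run one permutation, shuffling trouble source lengths within languages."""
    p4_values = []
    for lengths in ts_lengths_by_language.values():
        shuffled = list(lengths)
        rng.shuffle(shuffled)
        n = len(shuffled) // 3  # one third act as 'trouble source' turns
        for i in range(n):
            trouble_source = shuffled[i]
            # Two of the remaining turns stand in for the 'insert sequence'.
            insert = shuffled[n + 2 * i] + shuffled[n + 2 * i + 1]
            p4_values.append(math.log(trouble_source / insert))
    return distance_from_parity(p4_values)

# Hypothetical toy turn lengths, not data from the study.
toy = {"lang_a": [3, 5, 8, 2, 7, 4, 6, 9, 1], "lang_b": [4, 4, 10, 2, 6, 12]}
rng = random.Random(0)
permuted = [permuted_distance(toy, rng) for _ in range(1000)]
print(sum(permuted) / len(permuted))  # mean d across permutations
```

Shuffling inside each language's own list is what keeps the permutation within languages, so differences in typical turn length between languages cannot drive the result.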
For the actual, unpermuted data, d = 0.034. 100,000 permutations were carried out and the mean d was 0.556 (mean conservation of 1:1.74, closer to the length of the insert sequence being 2 times the length of the trouble source). See the section below on why this isn't 1:2. No permutations produced a d smaller than the actual d, so the probability of the real conservation measure being close to 1:1 by chance is less than 1/100,000 = 0.00001.
The second permutation test permuted all turn lengths (including repair initiator and response turns) so that turn lengths were randomly assigned as being the trouble source, repair initiator and response. 100,000 permutations were carried out and the mean d for permutations was 0.105. No permutations produced a d smaller than the actual d, so the probability of the real conservation measure being close to 1:1 by chance is less than 1/100,000 = 0.00001.
A third permutation test was done, permuting only the repair initiator turns. The mean d for permutations was 0.098. 2 out of 100,000 permutations resulted in a d smaller than the actual d, so the estimated probability of the conservation measure being close to 1:1 by this more conservative measure of chance is 0.00002.
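The step from permutation counts to the probabilities reported above, and the third test's shuffling of repair initiator lengths only, could be sketched as below. All function names and the toy triples are hypothetical. Each sequence is assumed to be a (trouble source, repair initiator, response) triple of positive lengths, grouped by language.

```python
import math
import random

def mean_distance(sequences):
    """d = |mean p4| over (ts, ri, rs) length triples."""
    p4 = [math.log(ts / (ri + rs)) for ts, ri, rs in sequences]
    return abs(sum(p4) / len(p4))

def shuffle_initiators(sequences_by_language, rng):
    """Reassign repair initiator lengths at random within each language."""
    permuted = []
    for sequences in sequences_by_language.values():
        ri_lengths = [ri for _, ri, _ in sequences]
        rng.shuffle(ri_lengths)
        for (ts, _, rs), ri in zip(sequences, ri_lengths):
            permuted.append((ts, ri, rs))
    return permuted

def estimate_probability(actual_d, permuted_ds):
    """Return (number of permuted d values below actual_d, that share of all permutations)."""
    below = sum(1 for d in permuted_ds if d < actual_d)
    return below, below / len(permuted_ds)

# Hypothetical toy sequences, not data from the study.
toy = {
    "lang_a": [(10, 2, 8), (6, 1, 5), (12, 3, 9), (7, 2, 6)],
    "lang_b": [(9, 1, 7), (5, 2, 4), (8, 1, 8)],
}
actual = mean_distance([seq for seqs in toy.values() for seq in seqs])
rng = random.Random(0)
permuted_ds = [mean_distance(shuffle_initiators(toy, rng)) for _ in range(1000)]
print(actual, estimate_probability(actual, permuted_ds))
```

When no permutation falls below the actual d, the share is zero and the result is better reported as a bound of less than one over the number of permutations, as is done for the first two tests.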
Note that permuting only the trouble source length is not meaningful, since the mean of the permuted numerator is mathematically invariant, and would be equal to the actual mean of the unpermuted data.

4.6 Why isn't the permuted conservation equal to 1:2?
If all trouble source turns had equal length, then the permutation test above should have resulted in a mean conservation of 1:2. However, the mean value for the permuted data is not exactly 1:2 because the distribution of turn lengths is skewed towards short turns (see figure 4.3). This means that a short turn is more likely to be selected as one of the two insert sequence turns than the trouble source. Figure 4.3 shows the results of simulating the permutation test. A random normal distribution is generated (30,000 points), then skewed by raising the values to a power p. When p = 1 and the distribution is normal, the ratio of one 'turn' compared to another two is 1:2. As the skewness increases, the ratio drops. The red triangle in the figure shows the value for the actual data. While the direction of the relationship is constant, the exact relationship between the skewness and the expected conservation is affected by the sample size and the standard deviation of the initial normal distribution. This means that the permutation test carried out above is the most appropriate method for estimating significance.
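A simulation of this kind might look like the sketch below. It does not attempt to reproduce Figure 4.3: the mean and standard deviation of the starting normal distribution, the seed, the redrawing of non-positive values and the function names are all assumptions made for illustration. The ratio returned is the exponential of the mean p4, that is, the typical length of one 'turn' relative to two others.

```python
import math
import random

def positive_gauss(rng, mean, sd):
    """Draw from a normal distribution, redrawing any non-positive value."""
    value = rng.gauss(mean, sd)
    while value <= 0:
        value = rng.gauss(mean, sd)
    return value

def simulated_ratio(power, n_points=30000, mean=10.0, sd=2.0, seed=1):
    """Return exp(mean p4) for random triples drawn from a skewed distribution.

    Values are drawn from a normal distribution and raised to `power`;
    power = 1 leaves the distribution unskewed.
    """
    rng = random.Random(seed)
    values = [positive_gauss(rng, mean, sd) ** power for _ in range(n_points)]
    n = n_points // 3
    p4_values = []
    for i in range(n):
        one_turn = values[i]
        two_turns = values[n + 2 * i] + values[n + 2 * i + 1]
        p4_values.append(math.log(one_turn / two_turns))
    return math.exp(sum(p4_values) / len(p4_values))

for power in (1, 2, 3):
    print(power, round(simulated_ratio(power), 3))
```

Because the exact figures depend on the assumed mean, standard deviation and sample size, a sketch like this is only useful for seeing the direction of the effect, not for estimating significance.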