Co-Variation of Tonality in the Music and Speech of Different Cultures

Whereas the use of discrete pitch intervals is characteristic of most musical traditions, the size of the intervals and the way in which they are used is culturally specific. Here we examine the hypothesis that these differences arise because of a link between the tonal characteristics of a culture's music and its speech. We tested this idea by comparing pitch intervals in the traditional music of three tone language cultures (Chinese, Thai and Vietnamese) and three non-tone language cultures (American, French and German) with pitch intervals between voiced speech segments. Changes in pitch direction occur more frequently and pitch intervals are larger in the music of tone compared to non-tone language cultures. More frequent changes in pitch direction and larger pitch intervals are also apparent in the speech of tone compared to non-tone language cultures. These observations suggest that the different tonal preferences apparent in music across cultures are closely related to the differences in the tonal characteristics of voiced speech.


Introduction
Tonal differences between traditional Eastern and Western music are readily heard. Explanations often refer to the use of different scales [1][2][3], but this begs the question of why different sets of pitch intervals are preferred in the first place. The alternative we examine here is that the tonal characteristics of a culture's music are related to the prosodic characteristics of its speech. There are several reasons for entertaining this idea. First, both speech prosody and the melodic contour are used to convey emotion [4]. Second, speech is the principal source of pitch and pitch relationships in the human auditory environment [5]. Third, several aspects of musical tonality including interval preference, scale preference, and the affective impact of major and minor modes are closely tied to voiced speech [5][6][7]. Finally, rhythm and pitch patterns in Western instrumental music and speech are similar [8][9].
The use of pitch in speech varies greatly among languages. The most obvious example is the broad division of languages into "tone" and "non-tone" groups [10]. In tone languages, the lexical meaning of each syllable is conveyed by the use of pitch contours, relative pitch levels, or both. For example, Standard Mandarin uses four tones, referred to as "high", "rising", "falling then rising" and "falling"; the syllable 'ma' uttered as a high tone means 'mother', as a rising tone 'hemp', as a falling then rising tone 'horse', and as a falling tone 'scold'. Other tone languages, such as Thai and Vietnamese, are similar by definition but vary in detail, using five and six tones respectively to convey the lexical meaning of syllables [11][12][13]. In contrast, pitch contours and relative levels are not typically used in non-tone languages (e.g., English, French and German) to convey lexical meaning (although stress, which can influence pitch, determines different meanings in some instances, e.g., CONtent or conTENT [12][13][14]). Imbuing each syllable with a different pitch contour gives tone language speech a "sing-song" quality. Accordingly, Standard Mandarin speech has more frequent changes in pitch direction and greater rates of pitch change than American English speech [15].
Based on these differences, we asked how, if at all, differences in the use of pitch in the traditional music of tone and non-tone language speaking cultures compare with the use of pitch in speech. To address this question, we compiled databases of speech and music from several tone and non-tone language cultures. The analysis focused on two aspects of pitch dynamics: the frequency of changes in pitch direction (slope reversals), and the size of the pitch intervals used. These aspects were chosen because they differentiate tone and non-tone language speech and play a central role in the structure of musical melodies [15,16].

Results
Figure 1 shows the number of melodic slope reversals in the music of tone and non-tone language cultures compared to the number of prosodic slope reversals in speech. The median number of melodic slope reversals per 100 notes is greater in the music of tone compared to non-tone language speaking cultures (43.3 vs. 36.0 respectively, U = 9843.5, P<0.001; Figure 1A). The median number of prosodic slope reversals per 100 syllables is also greater in the speech of tone compared to non-tone language speaking cultures (79 vs. 63.5 respectively, U = 2317, P<0.001; Figure 1B). (See Figure S5 for a breakdown by individual culture.)
Figure 2 shows the size of melodic intervals in the music of tone and non-tone language cultures compared to the size of prosodic intervals in speech. In accord with previous studies [17,18], the majority of melodic intervals were relatively small (0-500 cents); melodic intervals larger than a perfect fourth (500 cents) were infrequent, presumably because they are harder to sing [19] (Figure 2A). This overall tendency notwithstanding, the average distribution of melodic intervals in the music of tone and non-tone language cultures is different (Table 1). In tone language cultures, intervals smaller than a major second (200 cents) occur less often (15.8% vs. 36.2%; t = -11.4, P<0.001), whereas intervals equal to or larger than a major second occur more often (84.2% vs. 63.8%; t = 11.4, P<0.001). The only exception to this overall pattern is major thirds (400 cents), which are more frequent in the music of non-tone language cultures (7.7% vs. 2.3%). Figure 2B shows the average distribution of prosodic interval size in tone and non-tone language speech. As in music, intervals smaller than 200 cents occur less often in tone language cultures (48.7% vs. 60.3%; t = -5.6, P<0.001), whereas larger intervals occur more often (51.3% vs. 39.7%; t = 5.6, P<0.001). (See Figure S6 for a breakdown by individual language.)

Discussion
The music of tone and non-tone language cultures is tonally distinct, as are the languages spoken by their members. To explore the possible relationship between music and speech across cultures, we assessed the pitch dynamics of these modes of expression in three cultural groups that use tone languages (Chinese, Thai and Vietnamese) and three that use non-tone languages (American, French and German). The patterns apparent in music parallel those in speech. Thus the music of tone language cultures changes pitch direction more frequently and employs larger melodic intervals. Similarly, the speech of tone language cultures changes pitch direction more frequently and employs larger prosodic intervals. Presumably tone language speech exhibits these characteristics because the lexical meaning of each syllable in a tone language is tonally determined [15]. Consequently, adjacent syllables often have different pitch contours and levels, resulting in more frequent changes in pitch direction and larger pitch changes between syllables. In contrast, given that very few syllables in non-tone languages are distinguished in this way, changes in pitch direction should be less frequent and pitch changes between syllables should be smaller.
The only exception to this pattern is the use of major thirds. Despite the greater frequency of larger prosodic intervals (≥200 cents) in the speech of tone compared to non-tone languages, there are fewer melodic major thirds (400 cents) in the music of tone language cultures (see Figure 2). A possible reason for this anomaly is scale usage. The traditional music of the tone language speaking cultures examined tends to use pentatonic scales [20][21][22], whereas the traditional music of the non-tone language speaking cultures examined tends to use heptatonic scales [2,23]. Any particular scale affords different opportunities for particular melodic intervals to arise. Thus, in comparison with the major heptatonic scale, the major pentatonic scale offers approximately 6% fewer opportunities for major thirds despite an 11% increase in opportunities for larger melodic intervals (200-500 cents) overall (see Figure S7). This difference in opportunity for larger intervals may also be related to the larger number of melodic slope reversals in tone language music, since large melodic intervals tend to be followed by changes in the direction of pitch contour [16]. However, the use of different scales (and its implications for melodic structure) in different cultures raises the further question of why a given culture might favor a particular scale. Scale preferences appear to be based in part on the similarity of a set of tones to a harmonic series [6]. Consequently, the preference for pentatonic major scales in the music of the tone language speaking cultures examined could reflect the desire for a harmonically coherent set of notes that also uses relatively large melodic intervals to endorse speech similarity more specifically (see Text S1).
In sum, co-variation of tonal characteristics in the music and speech of the tone and non-tone language speaking cultures we examined indicates an intimate relationship between these two modes of social communication, providing a way of explaining at least some aesthetic preferences in biological terms.

Music Databases
Monophonic folk melodies from tone and non-tone language speaking cultures, all of which could be either played on an instrument or sung, were taken from scores and MIDI format files obtained principally from the National University of Singapore and the Singapore National Library Board. The tone language database comprised 50 traditional Mandarin, 20 traditional Thai, and 20 traditional Vietnamese melodies. The non-tone language database comprised 50 traditional American, 20 traditional French, and 20 traditional German melodies. To mitigate cross-cultural contamination by modern media, all compositions predated 1900, often by hundreds of years.

Speech Databases
Tone and non-tone speech samples were acquired by recording monologues read by native speakers of each of the 6 relevant languages (i.e., Standard Mandarin, Thai, and Vietnamese; American English, French, and German). Each speaker read 5 emotionally neutral monologues translated into the appropriate language (Figure S1). Prior to recording, all participants practiced reading the monologues out loud under supervision, the only instruction being to speak as if in normal conversation. The tone language database comprised recordings of 20 Standard Mandarin speakers (10 females), 10 Thai speakers (6 females), and 10 Vietnamese speakers (6 females); the non-tone language database comprised 20 American English speakers (10 females), 10 French speakers (5 females), and 10 German speakers (4 females).

Data Analysis
The analysis of music focused on melodic slope reversals and melodic interval size. Melodic slope reversals were defined as any change in the direction of the pitch contour of a melody. For each melody, the number of local minima and maxima was tabulated and divided by the total number of notes; this value was multiplied by 100 to give the incidence of melodic slope reversals per 100 notes. Melodic interval size was defined as the pitch difference (in cents) between adjacent notes (see Text S1 and Figure S2 for details). For each music database, the distribution of interval sizes was determined separately for each melody and then averaged. These results are reported in terms of absolute interval size because the distributions of interval sizes for descending and ascending intervals were broadly similar in both music databases (Figure S4A).

Table 1. Statistics for the comparisons of the most commonly occurring melodic interval sizes in tone and non-tone language music databases; n1 and n2 refer to the sample sizes of the tone and non-tone language music databases. (All comparisons were made with the two-tailed independent samples t-test, α-level adjusted using the Bonferroni method.) doi:10.1371/journal.pone.0020160.t001

The analysis of speech focused on two aspects of pitch dynamics analogous to those examined in music: prosodic slope reversals and prosodic interval size. Prosodic slope reversals were defined as any change in the direction of the simplified pitch contour of a speech recording (as given by the Prosogram algorithm; see Text S1). For each speaker, the number of local minima and maxima was tabulated and divided by the total number of syllables; this value was multiplied by 100 to give the incidence of prosodic slope reversals per 100 syllables. Prosodic interval size was defined as the pitch difference (in cents) between the final and beginning pitch levels of adjacent syllables (see Text S1 and Figure S3). For each speech database, the distribution of interval sizes was determined separately for each speaker and then averaged. As with music, the results of this analysis are reported in terms of absolute interval size because the distributions of interval sizes for descending and ascending intervals were broadly similar in both speech databases (Figure S4B).
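
The measures defined above can be illustrated with a minimal Python sketch. It is not the code used in the study; all function names are hypothetical, and several details are assumptions made for illustration: pitches are given as frequencies in Hz (or as MIDI note numbers converted with the standard A4 = 440 Hz equal-tempered mapping), repeated pitches are skipped when counting reversals, and absolute intervals are rounded to the nearest 100 cents before tabulation. The procedures actually used are described in Text S1.

```python
import math
from collections import Counter


def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (assumes A4 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


def cents(f1_hz, f2_hz):
    """Signed interval from f1 to f2 in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f2_hz / f1_hz)


def slope_reversals_per_100(pitches):
    """Local minima and maxima of a pitch contour per 100 notes or syllables.

    Assumption: steps between equal pitches are ignored, so a plateau
    contributes at most one reversal.
    """
    directions = [1 if b > a else -1
                  for a, b in zip(pitches, pitches[1:]) if b != a]
    reversals = sum(1 for d1, d2 in zip(directions, directions[1:]) if d1 != d2)
    return 100.0 * reversals / len(pitches)


def prosodic_intervals(syllables):
    """Signed intervals (cents) from the final pitch of each syllable to the
    beginning pitch of the next; syllables are (start_hz, end_hz) pairs."""
    return [cents(prev_end, next_start)
            for (_, prev_end), (next_start, _) in zip(syllables, syllables[1:])]


def interval_distribution(pitches_hz, bin_cents=100):
    """Proportion of absolute adjacent-note intervals in each size bin.

    Assumption: sizes are rounded to the nearest multiple of bin_cents.
    """
    sizes = [abs(cents(a, b)) for a, b in zip(pitches_hz, pitches_hz[1:])]
    counts = Counter(int(round(s / bin_cents)) * bin_cents for s in sizes)
    return {size: n / len(sizes) for size, n in sorted(counts.items())}


def average_distributions(distributions):
    """Average per-melody (or per-speaker) distributions, bin by bin."""
    bins = sorted(set().union(*distributions))
    return {b: sum(d.get(b, 0.0) for d in distributions) / len(distributions)
            for b in bins}


# Toy melody as MIDI note numbers (illustrative data, not from the databases).
melody = [60, 62, 64, 62, 60, 64, 67, 65]
reversal_rate = slope_reversals_per_100(melody)
melody_dist = interval_distribution([midi_to_hz(n) for n in melody])
```

Computing each distribution per melody or per speaker before averaging, as in the analysis described above, gives every melody or speaker equal weight regardless of its length.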