Algorithmic voice transformations reveal the phonological basis of language-familiarity effects in cross-cultural emotion judgments

doi:10.1371/journal.pone.0285028

Fig 1.

Illustration of the algorithmic voice transformations used in the study.

A single recording of a French female speaker saying “J’ai oublié mon pardessus” (I forgot my jacket) is manipulated with the DAVID voice transformation platform to make it sound more happy, sad, or afraid (Experiment 1) or to insert a sudden pitch shift in the middle of the sentence (Experiment 2). (A) Solid black: time series of pitch values in the original recording, estimated with the SWIPE algorithm [25]. The speech waveform of the unmodified recording is shown on the x-axis. Pitch values on the y-axis are expressed in cents relative to a mean frequency of 200 Hz. (B) Red, blue, and green lines: pitch of the manipulated audio output in the Happy, Sad, and Afraid transformations, respectively, as used in Experiment 1. (C) Dashed line: +150 cents pitch shift occurring at t = 700 ms, as used in Experiment 2.
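The cents scale used on the y-axis can be illustrated with a short sketch. It assumes the standard definition of cents (1200 times the base-2 logarithm of a frequency ratio) and the 200 Hz reference given in the caption. The function names are hypothetical and are not part of DAVID or SWIPE.

```python
import math

REF_HZ = 200.0  # reference frequency stated in the caption


def hz_to_cents(f_hz, ref_hz=REF_HZ):
    """Convert a frequency in Hz to cents relative to ref_hz (standard definition)."""
    return 1200.0 * math.log2(f_hz / ref_hz)


def shift_by_cents(f_hz, cents):
    """Return the frequency obtained by shifting f_hz by the given number of cents."""
    return f_hz * 2.0 ** (cents / 1200.0)


# One octave above the reference is 1200 cents.
octave_cents = hz_to_cents(400.0)

# A +150 cents shift (as in panel C) applied to a 200 Hz tone.
shifted_hz = shift_by_cents(200.0, 150.0)
```

Under this definition, a +150 cents shift raises a 200 Hz pitch to roughly 218 Hz, that is, one and a half semitones.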

Table 1.

French and Japanese normal, jabberwocky, and shuffled sentences used in this study.

English translations are added for clarification but were not included in this study.

Table 2.

Parameter values of the three emotional transformations used in Experiment 1 (for details, refer to main text and [22]).

Table 3.

FR = French; JP = Japanese; SE = Swedish; H_b = biased hit rate (%); pi = proportion index; H_u = unbiased hit rate (%); p_c = chance proportion (%); t = t-score (degrees of freedom: 19 for the FR group and 20 for the JP group).
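The measures abbreviated above can be sketched as follows. The caption does not give formulas, so this assumes the commonly used definitions: Wagner's unbiased hit rate (squared hits divided by the product of the stimulus and response totals), the corresponding chance proportion, and Rosenthal and Rubin's proportion index. The confusion matrix is invented for illustration and is not data from the study; the function names are hypothetical.

```python
def hit_rates(confusion, i):
    """Return (H_b, H_u, p_c) as proportions for category i.

    confusion[s][r] = number of trials with stimulus category s and response r.
    Assumes Wagner-style definitions; multiply by 100 to get percentages.
    """
    n_total = sum(sum(row) for row in confusion)
    hits = confusion[i][i]
    stim_total = sum(confusion[i])                 # times category i was presented
    resp_total = sum(row[i] for row in confusion)  # times response i was given
    h_b = hits / stim_total
    h_u = hits ** 2 / (stim_total * resp_total) if resp_total else 0.0
    p_c = (stim_total / n_total) * (resp_total / n_total)
    return h_b, h_u, p_c


def proportion_index(p, k):
    """Rescale a raw hit proportion p from a k-alternative task so chance is 0.5."""
    return p * (k - 1) / (1 + p * (k - 2))


# Illustrative 3-category confusion matrix (NOT data from the study).
example = [
    [6, 2, 2],
    [1, 8, 1],
    [3, 0, 7],
]
h_b, h_u, p_c = hit_rates(example, 0)
pi_value = proportion_index(h_b, k=3)
```

With these definitions, a raw hit rate at chance level in a four-alternative task (0.25) maps to a proportion index of 0.5, which is what makes the index comparable across tasks with different numbers of response options.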

Fig 2.

Emotion categorization: Unbiased hit rates averaged over the three non-neutral emotion categories, grouped by normal (French/FR, Japanese/JP and Swedish/SE stimuli), jabberwocky (FR, JP), shuffled (FR, JP), and reversed speech (FR, JP) conditions, for two groups of FR (N = 20) and JP (N = 21) native speakers.

*p < .05, **p < .01, ***p < .001, Bonferroni adjusted. Error bars, 95% CI.

Fig 3.

Unbiased hit rates for each emotion category for both FR (left) and JP (right) participants, averaged across normal, reversed, jabberwocky, and shuffled conditions. Solid lines show unbiased hit rates for FR sentences; dashed lines show those for JP sentences. Error bars, 95% CI.

Fig 4.

Pitch shift detection: Accuracy in two-interval forced-choice (2IFC) detection of individually calibrated pitch shifts, grouped by normal, jabberwocky, shuffled, and reversed conditions (FR and JP sentences), for two groups of FR (N = 24) and JP (N = 20) speakers.

*p < .05, ***p < .001, Bonferroni adjusted. Error bars, 95% CI.
