Fig 1.
Illustration of the algorithmic voice transformations used in the study.
A single recording of a French female speaker saying “J’ai oublié mon pardessus” (I forgot my jacket) is manipulated with the DAVID voice transformation platform to make it sound more happy, sad, or afraid (Experiment 1) or to insert a sudden pitch shift in the middle of the sentence (Experiment 2). (A) Solid black: time series of pitch values in the original recording, estimated with the SWIPE algorithm [25]. The speech waveform of the unmodified recording is shown on the x-axis. Pitch values on the y-axis are expressed in cents relative to a mean frequency of 200 Hz. (B) Red, blue, and green lines: pitch of the manipulated audio output in the Happy, Sad, and Afraid transformations, respectively, as used in Experiment 1. (C) Dashed line: +150 cents pitch shift occurring at t = 700 ms, as used in Experiment 2.
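For readers unfamiliar with the cent scale on the y-axis, the conversion can be sketched as follows. This is only an illustration of the standard definition (1200 cents per octave) with the 200 Hz reference stated in the caption; the function names are hypothetical and are not part of DAVID or SWIPE.

```python
import math

REF_HZ = 200.0  # reference frequency given in the caption


def hz_to_cents(freq_hz, ref_hz=REF_HZ):
    """Express a frequency in cents relative to ref_hz (1200 cents = 1 octave)."""
    return 1200.0 * math.log2(freq_hz / ref_hz)


def shift_by_cents(freq_hz, cents):
    """Return the frequency obtained by shifting freq_hz by the given number of cents."""
    return freq_hz * 2.0 ** (cents / 1200.0)
```

Under this definition, a +150 cents shift such as the one in panel (C) multiplies the frequency by 2^(150/1200), roughly 1.09, so a 200 Hz pitch would move to about 218 Hz.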
Table 1.
French and Japanese normal, jabberwocky, and shuffled sentences used in this study.
English translations are added for clarification but were not included in this study.
Table 2.
Parameter values of the three emotional transformations used in Experiment 1 (for details, refer to main text and [22]).
Table 3.
FR = French; JP = Japanese; SE = Swedish; Hb = biased hit rate (%); pi = proportion index; Hu = unbiased hit rate (%); pc = chance proportion (%); t = t-score; degrees of freedom are 19 and 20 for the FR and JP groups, respectively.
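The measures abbreviated above can be sketched in code. The caption does not give the formulas, so the block below assumes the commonly used definitions: Wagner's unbiased hit rate and its chance proportion, and the Rosenthal and Rubin proportion index. The function names and the confusion-matrix layout (rows are presented categories, columns are chosen categories) are illustrative assumptions, not the authors' analysis code.

```python
def biased_hit_rate(conf, i):
    """Hb: hits for category i divided by the number of presentations of i."""
    row_total = sum(conf[i])
    return conf[i][i] / row_total if row_total else 0.0


def unbiased_hit_rate(conf, i):
    """Hu (assumed Wagner definition): hits squared, divided by
    (presentations of category i) * (times response i was chosen)."""
    row_total = sum(conf[i])
    col_total = sum(row[i] for row in conf)
    if row_total == 0 or col_total == 0:
        return 0.0
    return conf[i][i] ** 2 / (row_total * col_total)


def chance_proportion(conf, i):
    """pc: probability that stimulus i and response i co-occur by chance."""
    n = sum(sum(row) for row in conf)
    row_total = sum(conf[i])
    col_total = sum(row[i] for row in conf)
    return (row_total / n) * (col_total / n)


def proportion_index(hit_rate, k):
    """pi (assumed Rosenthal-Rubin definition): raw hit rate rescaled so that
    chance performance among k response options maps to 0.5."""
    return hit_rate * (k - 1) / (1 + hit_rate * (k - 2))
```

Values are returned as proportions; multiplying by 100 gives the percentages reported in the table. Comparing Hu against pc, rather than against a fixed 1/k level, accounts for participants who overuse a particular response category.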
Fig 2.
Emotion categorization: Unbiased hit rates averaged over the three non-neutral emotion categories, grouped by normal (French/FR, Japanese/JP and Swedish/SE stimuli), jabberwocky (FR, JP), shuffled (FR, JP), and reversed speech (FR, JP) conditions, for two groups of FR (N = 20) and JP (N = 21) native speakers.
*p < .05, **p < .01, ***p < .001, Bonferroni adjusted. Error bars, 95% CI.
Fig 3.
Unbiased hit rates for each emotion category for both FR (Left) and JP participants (Right), averaged across normal, reversed, jabberwocky, and shuffled conditions. Solid lines show unbiased hit rates for FR sentences; dashed lines show those for JP sentences. Error bars, 95% CI.
Fig 4.
Pitch shift detection: Accuracy in 2IFC detection of individually-calibrated pitch shifts, grouped by normal, jabberwocky, shuffled and reversed conditions (FR and JP sentences), for two groups of FR (N = 24) and JP (N = 20) speakers.
*p < .05, ***p < .001, Bonferroni adjusted. Error bars, 95% CI.