Several studies have investigated the acoustic effects of diagnosed anxiety and depression. Anxiety and depression are not characteristics of the typical aging process, but minimal or mild symptoms can appear and evolve with age. However, knowledge about the association between speech and anxiety or depression is scarce for the minimal/mild symptoms typical of healthy aging. As increased longevity is still a recent phenomenon worldwide and poses several clinical challenges, it is important to improve our understanding of the impact of non-severe mood symptoms on acoustic features across the lifespan. The purpose of this study was to determine whether variations in acoustic measures of voice are associated with non-severe anxiety or depression symptoms in an adult population across the lifespan.
Two different speech tasks (reading vowels in disyllabic words and describing a picture) were performed by 112 individuals aged 35-97. To assess anxiety and depression symptoms, the Hospital Anxiety Depression Scale (HADS) was used. The association between the segmental and suprasegmental acoustic parameters and HADS scores was analyzed using multiple linear regression.
The proportion of participants with anxiety or depression symptoms was low (>7: 26.8% and 10.7%, respectively) and the symptoms were non-severe (HADS-A: 5.4 ± 2.9 and HADS-D: 4.2 ± 2.7, respectively). Higher anxiety symptoms were not significantly associated with any of the acoustic parameters studied. Adults with increased depressive symptoms presented longer vowel duration, longer total pause duration and shorter total speech duration. Finally, age had a positive and significant effect only for depressive symptoms, showing that older participants tend to have more depressive symptoms.
Citation: Albuquerque L, Valente ARS, Teixeira A, Figueiredo D, Sa-Couto P, Oliveira C (2021) Association between acoustic speech features and non-severe levels of anxiety and depression symptoms across lifespan. PLoS ONE 16(4): e0248842. https://doi.org/10.1371/journal.pone.0248842
Editor: Eric van Exel, VU medisch centrum School of Medical Sciences, NETHERLANDS
Received: July 9, 2020; Accepted: March 7, 2021; Published: April 8, 2021
Copyright: © 2021 Albuquerque et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting information files. Note that: Considering the Portuguese General Data Protection Regulation (58/2019), the raw audio files cannot be made available. The informed consent does not contemplate sharing the raw data.
Funding: This research was funded by the project Vox Senes POCI-01-0145-FEDER-03082 (funded by FEDER, through COMPETE2020 - Programa Operacional Competitividade e Internacionalização (POCI), and by national funds (OE), through FCT/MCTES), by IEETA Research Unit funding (UIDB/00127/2020), and by CIDMA (UID/MAT/04106/2019). The research of LA is supported by the FCT (Fundação para a Ciência e a Tecnologia) through grant SFRH/BD/115381/2016 (funded by FSE and by CENTRO2020). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
The World Health Organization (WHO) recognizes that psychological disorders, such as depression and anxiety, are major public health concerns defined by a combination of atypical perceptions, thoughts, behaviors, emotions and relationships with others. Depression is the world’s fourth leading cause of disability, leading to high costs for governments worldwide. Psychological conditions have a global impact on individuals and on their quality of life.
The diagnostic process of depression and anxiety is based on assessment tools that rely on the patients’ perception of their symptoms and/or on the clinicians’ opinion about the interview style. Consequently, the diagnostic process is subjective and time-consuming, requiring training and practice to produce a reliable result. Measurable biomarkers, such as speech, could assist specialists in a more accurate and objective detection of symptoms and, consequently, in the selection of a more effective treatment [5, 6]. Because its production is highly complex, speech has been shown to change along with the cognitive and physiological changes that result from mental health symptoms [7, 8]. Speech can be studied in its full extent, comprising both segmental and suprasegmental features. Segmental features concern the characteristics of individual phonemes; suprasegmental or prosodic features are conveyed over syllables, utterances, or sentences and consist of, e.g., acoustic emphasis, rhythm, stress or intonation.
The analysis of the influence of diagnosed anxiety/depression disorders in acoustic parameters allows the collection of information that can contribute to the development of automatic detection systems of mood disorders to support the diagnosis based on measurable biomarkers (behavioral, biological and physiological features). Also important, but less studied, is how acoustic features are associated with minimal (i.e., subclinical) mood symptoms.
Research studies that focus on that subject will contribute to an early and more reliable recognition of mood disorders. Additionally, speech and language pathologists, the professionals responsible for the intervention on voice alterations, could increase their knowledge of the variation of acoustic features in people with anxiety and/or depression symptoms, contributing to the differentiation between alterations derived from voice disorders and variations derived from minimal-to-mild mood symptoms.
Anxiety symptoms and acoustic features
Fear, tension and distress are common symptoms associated with anxiety, usually assessed by subjective methods [10, 11]. As anxiety disorders are reflected in people’s voice due to the somatic symptoms associated with the respiratory system, acoustic parameters could be used as an objective method to assist in the assessment of anxiety symptoms [11, 12]. Several research studies have evidenced the influence of anxiety symptoms on acoustic parameters. According to Banse and Scherer, Hagenaars and Minnen, Diamond et al., Weeks et al., and Low et al., the mean fundamental frequency (F0) increases in individuals with anxiety. The variability of F0 was also shown to be a good indicator of anxiety symptoms by Hagenaars and Minnen and by Goberman et al., who reported higher pitch variability with increasing anxiety. Other researchers, in contrast, found different trends in this acoustic variable [18–20].
Suprasegmental measures, such as percent pause time and number of pauses, were found to correlate positively with anxiety [10, 17, 21]. Nevertheless, speech rate tends to increase as anxiety increases [10, 14, 22].
Increased anxiety also leads to higher jitter and shimmer values [3, 23]. Loudness and harmonic-to-noise ratio (HNR), on the other hand, behave inconsistently, with different research studies reporting no change, a decrease or an increase [21, 22, 24].
Ozseven et al.  analyzed a broader set of acoustic parameters (122 acoustic measures) in patients diagnosed with anxiety and in healthy individuals and observed that 42 acoustic parameters (e.g., F0, F1, jitter, shimmer, mel-frequency cepstrum coefficients (MFCCs), and wavelet coefficient) change, in different directions and intensities, with anxiety. For example, F0 mean, F1 mean, jitter, shimmer and wavelet coefficients increase in anxious patients and, in general, MFCCs decrease with anxiety.
Depression symptoms and acoustic features
Depression causes changes in the somatic and autonomic nervous systems that are reflected in muscle tension and respiratory rate [25, 26]. Those changes have an impact on prosody and speech quality [27–29]. The increased muscle tension and the changes in salivation and mucus secretion affect the vocal tract and limit articulatory movements, leading to articulation errors, reduced pitch range, decreased speech rate and increased hesitations [25, 30]. In a large number of research studies, reductions in F0 range and in average F0 were found to be linked with depression severity [3, 4, 31–33]. F0 range was also shown to be a biomarker in treatment responders, as pitch variability increases significantly in patients whose depressive symptoms decrease [4, 34].
The slowing of thoughts and reduction of physical movements that occur in depression, known as psychomotor retardation (PMR), could explain the reduction in F0 parameters, as the complex neuromuscular system of the larynx is affected by disturbances in muscle tension due to PMR [30, 35–38]. The increase of muscle tension in the vocal tract could also explain the tightening of the vocal folds and, consequently, less variable speech [25, 30, 38–41]. However, other studies did not find significant differences in F0 parameters between depressed and non-depressed patients, possibly due to methodological aspects or the intrinsic characteristics of F0 (i.e., simultaneously an indicator of the affective status and a marker of the physical state of vocal folds) [4, 30, 41].
As with F0, the literature reports contradictory results concerning variation in loudness, and only some research studies showed statistically significant improvements in energy parameters after depression treatment [27, 33].
More consistent results were found for another prosodic feature: speech rate. Cannizzaro et al. found evidence of a strong negative correlation between speech rate and a clinical subjective rating of depression. Investigations using different sample sizes conclude, in general, that speech rate is reduced in individuals with depression [4, 30, 42–44]. One study, considering phonologically-based measures of speech rate, observed that these measures correlated more strongly with depression status and subjective measures of depression than a global speech rate value did. Despite the value of speech rate as a potential biomarker of depression severity, it remains unclear whether the reduction in speech rate is an indicator of motor retardation or lower cognitive functioning [5, 27, 30, 45]; additionally, speech rate may not have sufficient discriminatory power to serve as a single biomarker of depression.
Formant measures represent acoustic resonances of the vocal tract. Considering that depression could affect vocal tract properties, formant features are also suitable as markers of these changes [4, 5, 46]. Several studies [4, 46–49] have found a decrease in formant frequencies in depressed individuals in comparison with healthy individuals. This finding could be explained by PMR, which causes either tightening of the vocal tract or lack of motor coordination [41, 45–47, 50].
Further voice quality measures, such as jitter, shimmer and HNR, are positively correlated with depression [3, 41, 49, 51]. Indirectly-relevant features of voice properties (e.g., MFCCs or power spectral density) are also correlated with individuals’ mood [6, 47, 50–52]. Taguchi et al. investigated the differences in MFCCs between individuals with and without depression and found evidence of higher levels of sensitivity and specificity for the second MFCC dimension, concluding that this dimension could be a discriminatory factor between depressed and healthy individuals and, consequently, a depression biomarker.
Suprasegmental speech measures were also found to have significant correlation with subjective measures of depression [4, 30]. Total recording duration increased with depression severity due to more variable and longer pauses, which resulted in a decrease in speech to pause ratio [4, 27]. Percent of pause time is higher in the depressed group . The studies of Mundt et al.  and Mundt et al.  also revealed that total recording duration, total pause time and number of pauses showed a significant decrease in patients that respond positively to depression treatment, so these measures could be considered as biomarkers to monitor treatment progress. By contrast, patients that do not respond to treatment presented smaller vocal acoustic changes or even no changes.
The acknowledgment that different acoustic features could be associated with depressive and/or anxiety symptoms led to the exploration of this relationship in a sample composed of adult participants of different ages. Therefore, the present study intends to 1) analyze the association between the acoustic parameters of vowels in stressed position and depressive and anxiety symptoms, and 2) analyze the association between suprasegmental characteristics of spontaneous speech (e.g., rhythmic measures, speaking F0 and HNR) and anxiety and depressive symptoms. Thus, the aim of this study is to determine whether variations in segmental and suprasegmental acoustic features are associated with anxiety or depression symptoms in an adult population across the lifespan.
All ethical procedures were ensured prior to any data collection for this cross-sectional study. The project was submitted to and approved by the Ethics Committee Centro Hospitalar São João/ Faculty of Medicine, University of Porto, Portugal (number N38/18). All participants agreed to and signed the written consent form before participating in the study.
A convenience sample of 112 adult Portuguese speakers (aged 35-97) participated in this study and was divided into 4 age groups: [35-49] (15 men, 15 women), [50-64] (15 men, 15 women), [65-79] (15 men, 16 women), and ≥80 (11 men, 10 women). To be included, participants had to meet the following criteria: be a native speaker of Portuguese; have no history of speech-language impairment, severe hearing problems, neurological conditions or head/neck cancer; be able to follow instructions; have had no upper respiratory tract infection in the 3 weeks before the speech collection; not be a current smoker or have smoked in the previous 5 years; report good general health by self-assessment; and not use hearing aids.
The data used in the current study were originally collected in a large ongoing project concerning the analysis of the effects of age and gender on acoustic variables (i.e., F0, F1, F2 and duration of European Portuguese language (EP) oral vowels)  and suprasegmental measures derived from spontaneous speech. For more details see Albuquerque et al. .
The participants also completed a questionnaire and an instrument concerning anxiety and/or depressive symptomatology (described below), whose data were analyzed in the present research.
Each participant completed a background questionnaire designed to collect information concerning age, gender, educational level and habits. The Hospital Anxiety Depression Scale (HADS), a self-report questionnaire, was used to evaluate anxiety and depression symptoms. HADS is quick to administer and has been widely used in research studies and in clinical settings with non-psychiatric populations. It presents good internal consistency, sensitivity and specificity, and concurrent validity with questionnaires commonly used to assess anxiety and depression. HADS is divided into an Anxiety subscale (HADS-A) and a Depression subscale (HADS-D) with seven items each. Each item is scored on a 4-point Likert scale with a minimum value of 0 and a maximum value of 3. Higher scores represent higher levels of anxiety and depressive symptoms. The HADS manual provides cut-off scores indicating mild (8–10), moderate (11–14), or severe (15–21) anxiety or depression [54, 56, 57]. Following these cut-offs, a score of 0–7 on each subscale can be regarded as indicating no anxiety or depression symptoms. Thus, 7 is the maximum value of the normal range.
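The scoring rule described above can be illustrated with a minimal sketch. It assumes that the seven item responses of a subscale have already been converted to their 0 to 3 scores; the function names are hypothetical and are not part of the HADS manual or of the analysis software used in this study.

```python
from typing import Sequence


def hads_subscale_score(items: Sequence[int]) -> int:
    """Sum seven item scores (each 0-3) into a 0-21 subscale score."""
    if len(items) != 7:
        raise ValueError("a HADS subscale has exactly seven items")
    if any(not 0 <= item <= 3 for item in items):
        raise ValueError("each item score must be between 0 and 3")
    return sum(items)


def hads_band(score: int) -> str:
    """Map a subscale score to the cut-off bands quoted in the text."""
    if not 0 <= score <= 21:
        raise ValueError("a subscale score must be between 0 and 21")
    if score <= 7:
        return "normal"      # 0-7: regarded as without symptoms
    if score <= 10:
        return "mild"        # 8-10
    if score <= 14:
        return "moderate"    # 11-14
    return "severe"          # 15-21
```

In the analyses reported below, only the split between scores of 7 or less and scores above 7 is used for the descriptive comparison; the regression models use the subscale score itself.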
Corpus and recording protocol
The corpus covers two types of parameters: the first refers to segmental and the second to suprasegmental acoustic measures.
The speech corpus for segmental analysis consisted of 28 disyllabic words, with the EP vowels [i], [e], [ε], [a], [o], [ɔ] and [u] in stressed position, mostly composed of a CV.CV sequence.
The consonants used in the sequence were voiced/voiceless stop consonants or voiced/voiceless fricatives. The stimuli were embedded in a carrier sentence “Diga…por favor” (“Say…please”). Four different words were selected for each vowel. The words were chosen based on familiarity and ease of graphical representation to limit interference from reading difficulties.
The randomized sentences were presented individually on a computer screen using the software system SpeechRecorder, where the picture and the orthographic word could be viewed simultaneously. After the participant became acquainted with the sentence structure, the researcher asked the participant to read the sentence at a comfortable loudness and pitch level. Each sentence was repeated three times, for a total of 12 repetitions of each vowel and 84 productions per participant (112 participants x 28 words x 3 repetitions = 9408 recordings).
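The size of the reading corpus follows directly from the design and can be checked with a short sketch. The word indices below are placeholders (the actual word list is not reproduced here), and shuffling all 84 prompts together is an assumption; the source only states that the sentences were randomized.

```python
import random
from typing import List, Tuple

VOWELS = ["i", "e", "ɛ", "a", "o", "ɔ", "u"]  # seven EP oral vowels
WORDS_PER_VOWEL = 4                           # 7 x 4 = 28 words
REPETITIONS = 3
PARTICIPANTS = 112


def build_prompt_list(seed: int = 0) -> List[Tuple[str, int, int]]:
    """Return shuffled (vowel, word_index, repetition) prompts for one speaker."""
    prompts = [
        (vowel, word, rep)
        for vowel in VOWELS
        for word in range(WORDS_PER_VOWEL)   # placeholder word indices
        for rep in range(REPETITIONS)
    ]
    random.Random(seed).shuffle(prompts)     # assumed: one global shuffle
    return prompts


prompts = build_prompt_list()
per_speaker = len(prompts)                   # productions per participant
total_recordings = PARTICIPANTS * per_speaker
```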
The participants were also instructed to describe the standardized “Cookie Theft picture” in order to elicit spontaneous speech for analysis.
All recordings took place in quiet rooms, where participants were seated at a table and their speech productions were recorded using an AKG C535 EB cardioid condenser microphone connected to an external 16-bit sound system (PreSonus Audio-Box™ USB) at a sampling rate of 44100 Hz.
Concerning data obtained from the production of disyllabic words, WebMAUS general [61, 62] was used to automatically segment the recordings at word and phoneme level. Data were then imported into the Praat speech analysis software and manually analyzed by four trained raters who checked the accuracy of vowel boundaries. Data with clipping, recording artifacts (e.g., noise or cough), unusual hoarseness/vocal fry or misread words were excluded from the analysis, amounting to 6% of the total data.
For spontaneous speech, a Praat script was used to automatically detect silent pauses longer than 250 ms and create TextGrid files. The automated alignments were manually checked by two trained analysts, who verified the accuracy of pause and speech intervals. Speech intervals with speaker and/or environmental noise were not considered for further analysis; the beginning and end of all recordings were also excluded because of sentence-initial and sentence-final acoustic variability (in total, 7% of the speech intervals were excluded).
A set of 18 parameters was extracted from the recording data. As the recordings were not conducted with this primary aim, it was not possible to measure all voice cues that are susceptible to change due to mood symptoms. The chosen parameters represent the acoustic features most used in this research field and also those that reflect alterations in the dynamics of speech production with a change in motor control related to depressive and/or anxiety symptoms [10, 39]. Parameters are defined in Table 1. The following procedures were adopted in the extraction of data for the segmental and suprasegmental domains.
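The idea behind the 250 ms pause criterion can be sketched as follows. This is not the Praat script used in the study: it assumes a precomputed intensity contour in dB sampled at a fixed frame step, and the silence threshold is a free parameter chosen by the user. Only the minimum pause length of 250 ms comes from the text.

```python
from typing import List, Sequence, Tuple


def find_silent_pauses(
    frame_db: Sequence[float],
    frame_step_s: float,
    silence_threshold_db: float,
    min_pause_s: float = 0.250,
) -> List[Tuple[float, float]]:
    """Return (start, end) times of silent runs longer than min_pause_s."""
    pauses = []
    run_start = None
    # A sentinel frame above any threshold closes a trailing silent run.
    for i, level in enumerate(list(frame_db) + [float("inf")]):
        if level < silence_threshold_db:
            if run_start is None:
                run_start = i            # a silent run begins here
        elif run_start is not None:
            duration = (i - run_start) * frame_step_s
            if duration > min_pause_s:   # keep only pauses over 250 ms
                pauses.append((run_start * frame_step_s, i * frame_step_s))
            run_start = None
    return pauses


# Synthetic contour: 50 ms speech, 300 ms silence, 50 ms speech,
# 100 ms silence, 50 ms speech (10 ms frames).
contour = [60.0] * 5 + [20.0] * 30 + [60.0] * 5 + [20.0] * 10 + [60.0] * 5
detected = find_silent_pauses(contour, frame_step_s=0.01, silence_threshold_db=35.0)
```

In this synthetic example only the 300 ms silent stretch is retained; the 100 ms stretch falls below the criterion and is treated as part of the surrounding speech interval.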
F0, formant frequencies (F1 and F2) and vowel duration were automatically extracted from segmented data using Praat scripts. The cross-correlation algorithm, which is suitable for short vowels, was used to estimate the F0 of the vowels. The median F0 value was obtained from the central 40% of each target vowel, thus minimizing the impact of flanking consonants on F0. The median was used instead of the mean to decrease the impact of F0 measurement errors. The pitch range used for F0 analysis was 60-400 Hz for males and 120-400 Hz for females. The Burg LPC algorithm provided by Praat was used to compute values for F1 and F2 at the central 40% of the vowel. A procedure adapted from earlier work and previously used in Albuquerque et al. and Oliveira et al. was applied to optimize the formant ceiling for each vowel of each speaker. F1 and F2 were calculated 201 times for each vowel, for all ceilings between 4500 and 6500 Hz in 10 Hz steps (for females) and between 4000 and 6000 Hz in 10 Hz steps (for males). The ceiling chosen was the one that produced the lowest variation.
Vowel duration was obtained from the annotation files as the interval between the beginning and ending points of each vowel; vowels shorter than 20 ms were excluded.
For syllable count, an adapted Praat script of the BeatExtractor [69, 70] was used to detect vowel onsets using a beat wave (a normalized and band-specific amplitude). The cut-off frequencies were defined automatically, the thresholds were 0.1 (threshold 1) and 0.06 (threshold 2), the filter was defined as Butterworth and the technique was Amplitude.
To obtain speaking F0 automatically from the picture description task, a Praat script (Prosody Descriptor) was used to measure mean F0 in valid speech intervals, with a pitch range of 75-400 Hz for males and 120-600 Hz for females. The values obtained for the individual intervals were averaged to obtain the speaking F0 of each participant.
All acoustic and mood data were compiled in an SPSS file (IBM SPSS software package version 25.0; SPSS Inc., Chicago, IL, USA). The segmental measures (F0, F1, F2 and duration) were obtained for each vowel and, afterwards, the median of the repetitions was obtained for each vowel type and speaker. Mean F0, F1, F2 and duration across stressed vowels were also calculated. The suprasegmental measures (presented in Table 1) were also incorporated.
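Two of the steps above lend themselves to a small illustration: restricting the measurement to the central 40% of the vowel, and enumerating the 201 candidate formant ceilings. The sketch below is not the Praat script used in the study; it assumes an F0 track given as parallel lists of times and values, with `None` marking unvoiced frames, and all function names are hypothetical. The rule for picking the ceiling with the lowest variation is not implemented, since its exact definition is not given in this section.

```python
from statistics import median
from typing import List, Optional, Sequence, Tuple


def central_window(start_s: float, end_s: float, fraction: float = 0.40) -> Tuple[float, float]:
    """Return the time window covering the central `fraction` of a vowel."""
    margin = (end_s - start_s) * (1.0 - fraction) / 2.0
    return start_s + margin, end_s - margin


def median_f0_in_window(
    times_s: Sequence[float],
    f0_hz: Sequence[Optional[float]],
    window: Tuple[float, float],
) -> Optional[float]:
    """Median F0 over voiced frames inside the window (None if there are none)."""
    lo, hi = window
    voiced = [f for t, f in zip(times_s, f0_hz) if f is not None and lo <= t <= hi]
    return median(voiced) if voiced else None


def formant_ceilings(sex: str) -> List[int]:
    """Candidate formant ceilings in Hz, in 10 Hz steps, as described in the text."""
    low, high = (4500, 6500) if sex == "female" else (4000, 6000)
    return list(range(low, high + 1, 10))
```

For a vowel running from 1.00 s to 1.20 s, `central_window` returns approximately 1.06 s to 1.14 s, so the first and last 30% of the vowel, where flanking consonants have the most influence, are ignored.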
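The two-stage aggregation of the segmental measures can be sketched as follows. The study used SPSS; this pure-Python version is only an illustration, and it assumes that the speaker-level mean is taken over the per-vowel medians, which is one plausible reading of the description above.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, Iterable, Tuple


def aggregate_segmental(
    records: Iterable[Tuple[str, str, float]],
) -> Tuple[Dict[Tuple[str, str], float], Dict[str, float]]:
    """Aggregate (speaker, vowel, value) records.

    Returns the median over repetitions per (speaker, vowel), and the mean
    of those medians per speaker (assumed definition of the overall mean).
    """
    by_cell = defaultdict(list)
    for speaker, vowel, value in records:
        by_cell[(speaker, vowel)].append(value)
    per_vowel = {cell: median(values) for cell, values in by_cell.items()}

    by_speaker = defaultdict(list)
    for (speaker, _vowel), med in per_vowel.items():
        by_speaker[speaker].append(med)
    per_speaker = {speaker: mean(meds) for speaker, meds in by_speaker.items()}
    return per_vowel, per_speaker


# Hypothetical F0 values (Hz) for one speaker and two vowels.
example = [
    ("s1", "a", 100.0), ("s1", "a", 110.0), ("s1", "a", 300.0),
    ("s1", "i", 200.0), ("s1", "i", 220.0), ("s1", "i", 210.0),
]
per_vowel, per_speaker = aggregate_segmental(example)
```

The outlying value of 300 Hz for [a] does not affect the per-vowel result, which illustrates why a median over repetitions is less sensitive to measurement errors than a mean.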
Descriptive data for HADS-A and HADS-D were obtained through the calculation of mean and standard deviation by age (both in a categorical and continuous format) and gender. A two-way ANOVA was performed, including the interaction term between age group and gender. The homogeneity of variance (Levene test) and the normality of residuals (by inspection of the QQ plot) were verified. Additionally, descriptive data for segmental and suprasegmental acoustic parameters were reported as mean and standard deviation by gender and by HADS-A or HADS-D mood symptoms classification (≤7 versus >7). Following the intensity-of-change criterion previously used in the comparison of neutral reading and anxious reading/spontaneous speech, which treats an increase above 10% as a high increase and a decrease beyond -10% as a high decrease, the differences between speakers without anxiety/depression symptoms and speakers with mood symptomatology were analyzed by gender.
To explore and model the relationship between the acoustic variables and the scores of mood symptoms (either HADS-A or HADS-D), a multiple linear regression model was developed with the acoustic variables that were not highly correlated as independent variables (the multivariable model). The regression models were then adjusted for age (continuous) and gender (the adjusted model). The assumptions of normality of residuals (QQ plot inspection) and homoscedasticity (scatterplot of residuals versus predicted values) were verified. Multicollinearity between independent variables was evaluated by Pearson correlation. Correlation values greater than 0.70 in absolute value were considered high. Acoustic variables that presented such a very large (> 0.7) magnitude of correlation, meaning that they measure the same behaviour and make a similar contribution to the model, were excluded from the analysis. Thus, only the acoustic variables vowel F0, vowel duration, vowel F2, total speech duration, total pause duration, speech rate, percent pause time and HNR were included. Because of multiple testing across the regression models (four models in total), the significance level used was 0.0125.
First, this section presents the sample characterization in terms of HADS-A and HADS-D scores by gender and age group. Secondly, the association of HADS-A and HADS-D with acoustic parameters is presented.
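The 10% intensity-of-change criterion amounts to a relative difference between two group means. The sketch below assumes that the group without symptoms (score of 7 or less) is the reference; the source does not state the direction of the comparison explicitly, and the numbers in the example are hypothetical rather than taken from Table 3.

```python
def percent_change(reference_mean: float, comparison_mean: float) -> float:
    """Relative change of the comparison group with respect to the reference, in %."""
    if reference_mean == 0:
        raise ValueError("reference mean must be non-zero")
    return 100.0 * (comparison_mean - reference_mean) / reference_mean


def classify_change(pct: float, threshold: float = 10.0) -> str:
    """Label a change using the +/-10% criterion described in the text."""
    if pct > threshold:
        return "high increase"
    if pct < -threshold:
        return "high decrease"
    return "below threshold"


# Hypothetical group means: 100.0 without symptoms, 76.7 with symptoms.
change = percent_change(100.0, 76.7)
label = classify_change(change)
```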
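The variable screening and the significance level can be illustrated with a minimal sketch. It flags pairs of variables whose Pearson correlation exceeds 0.70 in absolute value; which member of a flagged pair is dropped is left to the analyst, as the source does not specify the rule. The level of 0.0125 is consistent with dividing a conventional 0.05 level by the four models (a Bonferroni-style adjustment), although the source does not name the correction. The data and variable names are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def highly_correlated_pairs(
    columns: Dict[str, Sequence[float]], cutoff: float = 0.70
) -> List[Tuple[str, str, float]]:
    """List variable pairs with |r| above the cutoff."""
    names = sorted(columns)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson_r(columns[first], columns[second])
            if abs(r) > cutoff:
                pairs.append((first, second, r))
    return pairs


def adjusted_alpha(alpha: float = 0.05, n_models: int = 4) -> float:
    """Significance level after dividing alpha by the number of models."""
    return alpha / n_models


# Hypothetical data: "a" and "b" are nearly collinear, "c" is not.
data = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.5], "c": [1, -1, 1, -1]}
flagged = highly_correlated_pairs(data)
```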
Sample characterization concerning mood measures
Table 2 presents the sample characterization concerning demographic variables and mood measures. Concerning age and gender, the sample is almost balanced. Regarding mood measures, the numbers of participants with and without anxiety or depression symptoms are unbalanced (26.8% and 10.7% of participants scored >7 on HADS-A and HADS-D, respectively) and the symptoms are non-severe (HADS-A: 5.4 ± 2.9 and HADS-D: 4.2 ± 2.7, respectively). HADS-A and HADS-D mean scores by age group and gender are also presented in Table 2. Figs 1 and 2 represent the age effect on HADS-A and HADS-D, respectively.
Concerning HADS-A (Fig 1), there was a tendency for the median values to decrease after middle age in female participants; in male participants, the age group [65-79] presented the lowest median value of HADS-A. Only females of the age groups [35-49] and [50-64] presented part of the boxplot whiskers above the cut-off of 10 (moderate symptoms of anxiety). However, two-way ANOVA showed no statistically significant effect of age (F(3, 104) = 1.618; p = 0.190) or gender (F(1, 104) = 3.039; p = 0.084) on HADS-A scores. Additionally, no significant interaction between age group and gender was detected for HADS-A (F(3, 104) = 0.692; p = 0.559).
HADS-D (Fig 2) increases continuously with age in male participants, and in female participants there is a sharper increase in the older age groups. Overall, HADS-D tends to increase with age. In the oldest age group, for both genders, the boxplot whiskers reached the range of moderate symptoms of depression, but all median values are within the normal range. The ANOVA results showed a significant effect of age on HADS-D (F(3, 104) = 6.065; p = 0.001), with significant differences between the age group ≥80 and all the younger groups, but no statistically significant effect of gender (F(1, 104) = 0.275; p = 0.601). Additionally, the interaction of age with gender was non-significant (F(3, 104) = 0.470; p = 0.704).
Association of HADS-A and HADS-D with acoustic parameters
Considering the division of HADS scores in absence ([0-7]) and presence of symptoms (>7), the mean and SD values of all acoustic parameters by gender and mood symptoms are presented in Table 3.
Table 3 was analyzed based on the intensity and direction of change of each acoustic parameter, and changes greater than 10% are reported next. Considering HADS-A, in the group of female speakers with anxiety symptoms, an increase occurs in total pause duration, percent pause time and mean pause duration; a decrease occurs in total speech duration, speech pause ratio, mean speech duration and number of syllables. In male speakers, no acoustic variable changes by more than 10% in the group of speakers with anxiety symptomatology. However, speaking F0 and number of syllables were the acoustic variables that presented the highest increase (7.5%) and the largest decrease (-5.0%), respectively, in the group of speakers with anxiety symptoms.
Regarding depressive symptoms, for females, an increase is observed in total pause duration, percent pause time and pause variability. A decrease occurs in total speech duration, mean speech duration, speech variability and number of syllables. For males, the acoustic variables that present an increase greater than 10% are vowel F0, mean pause duration and speaking F0. A decrease greater than 10% occurs in the acoustic variables total speech duration, total pause duration, total recording duration, speech pause ratio, number of pauses and number of syllables.
For a more in-depth analysis of the association between the independent variables (i.e., acoustic variables: vowel F0, vowel duration, vowel F2, total speech duration, total pause duration, speech rate, percent pause time and HNR) and the dependent variable (HADS-A or HADS-D scores), a multiple linear regression model was applied and adjusted for age and gender. Table 4 presents the multiple regression model results for HADS-A and HADS-D. Although no significant gender differences were observed for either HADS subscale (see Figs 1 and 2), and only HADS-D presented significant age differences, this approach is justified by the influence of these demographic variables on anxiety/depressive symptoms in other studies in this field [75–81]. For HADS-A, none of the acoustic variables considered presented a significant effect in either model.
For depression symptoms, expressed by the HADS-D scores, vowel duration, total speech duration and total pause duration presented significant effects in the multivariable model. However, in the adjusted model only total speech duration maintained a significant effect; age was also significantly associated with HADS-D.
Fig 3 shows the association of the depression symptom scores with total speech duration and age. An increase in depressive symptoms is related to a decrease in total speech duration and to an increase in age.
The present study aimed to analyze the relationship between the scores of the HADS questionnaire and the segmental acoustic parameters (e.g., F0, F1, F2 and duration of stressed vowels) as well as the suprasegmental measures obtained in a sample of 112 individuals (aged 35 to 97) with non-severe mood symptoms. The aim of the study was achieved, considering the general alignment of our results with previously reported research on diagnosed mood disorders.
Regarding anxiety symptoms, no acoustic variable presented a significant association with HADS-A scores. The independent variables used to develop the multiple linear regression do not show a large increase or decrease between participants without anxiety symptoms and participants with anxiety symptoms (see Table 3). In fact, the majority of those differences were below 5%, which may explain the non-significance observed in the multiple linear regression. It can be argued that such minor differences are not sufficient to make the acoustic variables sensitive to sub-clinical anxiety symptoms. Additionally, the observed tendency towards higher HADS-A values in younger females has been reported in other studies [76, 79–81] and attributed to interactions between behaviors, internal gender characteristics and stressors.
For depression symptomatology, this study presents significant results at both the segmental and suprasegmental levels. At the segmental level, vowel duration is significantly associated with depressive symptoms, meaning that vowel duration increases as depressive symptoms increase. This significant association, found in a sample mostly constituted by speakers with non-severe depressive symptoms, also highlights the importance of segment duration in the identification of mood signs. The current results are in line with Trevino et al., who, using phone-duration measures instead of global measures of speech rate, found significant positive correlations between the duration of some vowels and the worsening of depression. The results obtained in the present study are further reinforced by the findings of Alghowinem et al. and Honig et al., who concluded that average syllable duration was significantly higher in the group of depressed individuals.
At suprasegmental level, first, the total pause duration increases with more depressive symptoms, and the total speech duration present an inverse trend. Mundt et al.  and Mundt et al.  revealed that great depression symptoms result in more and longer pauses, which was also reflected in a higher total pause time, as occurred in the present research. Conversely, in both studies [4, 34] more time was needed to deliver the message (i.e., more total speech duration). However, other studies have reported that speakers with depressive symptoms exhibit a decrease in speech time or in verbal productivity [84–86]. That is, these speakers tended to produce fewer words  and to decrease the phonation time (i.e., utterances are shorter in duration and are less numerous) . Results concerning the increase in total pause duration and the decrease in total speech duration (the one that maintain the significant effect on the adjusted model) could be considered an index of psychomotor retardation or lower cognitive function, and affect the amount of information content to be communicated . In the present study, the total speech duration decreases in speakers with more depressive symptoms (a difference of -23.3% for participants with depressive symptoms), due to the fact that spontaneous speech production is more cognitively demanding in comparison with automatic speech/reading tasks, requiring preparation, word selection and higher motor articulatory control [4, 84]. The increase in total pause duration in participants with more depressive symptoms could suggest more efforts in communication planning and higher cognitive elaboration time . The current results (i.e., significant effect of HADS-D on total pause duration and total speech duration) also highlight the importance of rhythmic measures assessed in spontaneous speech for depression symptoms recognition.
Additionally, although the number of syllables did not enter the regression model because of its high correlation with total speech duration, the descriptive data show a decrease of more than 10% in the number of syllables for both genders with a score >7 on HADS-D (see Table 3). This decrease in the number of syllables may be related to the decrease in total speech duration and the increase in total pause duration as depression worsens.
Although speech rate in spontaneous speech is not significantly associated with depressive symptoms, vowels constitute the syllable nucleus, so an increase in the time needed to produce a vowel could contribute to a decrease in the number of syllables produced per time unit and, consequently, to a decrease in speech rate in the reading task. Speech rate is referred to in the literature as one of the acoustic features most strongly associated with depression status and also as one of the first symptoms of depressive disorders observable by interlocutors. The literature indicates that individuals with more depressive symptoms present lower speech rates [25, 30, 34, 38, 88], even after brief sadness induction. Speech rate has also been shown to be sensitive to recovery from depressive symptoms, as an improvement in symptomatology has a positive influence on speech rate [27, 33].
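The link between vowel duration and speech rate can be illustrated with a small numerical sketch. It assumes, purely for illustration, that each syllable consists of one vowel plus a fixed non-vowel remainder and that there are no pauses. The function name and the durations below are hypothetical and are not values measured in this study.

```python
def syllables_per_second(vowel_duration_s, non_vowel_duration_s):
    """Speech rate under the simplifying assumption that every syllable
    is one vowel plus a fixed non-vowel remainder, with no pauses."""
    return 1.0 / (vowel_duration_s + non_vowel_duration_s)


# Hypothetical durations in seconds (not taken from the study data).
baseline_rate = syllables_per_second(0.10, 0.10)
# Same non-vowel material, but the vowel is 20 ms longer.
lengthened_rate = syllables_per_second(0.12, 0.10)

print(f"baseline:   {baseline_rate:.2f} syllables/s")
print(f"lengthened: {lengthened_rate:.2f} syllables/s")
```

Under these assumptions a longer vowel always lowers the number of syllables per second. In real speech, pauses and consonant durations also vary, so this shows only the direction of the expected effect.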
The significant findings mentioned above concerning the rhythmic measures (i.e., total pause duration) and vowel duration do not maintain their effects in the model adjusted for age and gender, which provides evidence of a greater influence of age on mood symptoms. Age is the demographic variable that presents a significant effect on the depressive symptoms assessed by HADS-D. Depression symptoms were significantly higher in older adults, which is in accordance with studies developed in low-income countries [75, 78]. In epidemiologic studies in Western countries, the rate of depression decreases with age, which is the opposite of the pattern of mean depression values across age in the present study. Balabanova and McKee and Bobak et al. suggest that high numbers of depression symptoms in older ages could mirror high levels of poverty or poor physical health. Bromet et al. also suggest that an increase of depression in the elderly could reflect negative changes in social support and in subjective health. The largest European research study on aging (DO-HEALTH) concludes that elderly individuals in Portugal present low levels of cognitive and physical health compared to other European countries. In fact, only 9% of the Portuguese sample is considered healthy, much lower than the 58% from Austria, 51% from Switzerland or 38% from Germany. Portuguese researchers from the DO-HEALTH study revealed that different social resources could explain poor health levels, including level of education, pension values or ease of access to health care.
Several differences found between the present study and previous research could be due to the present sample consisting of participants with absent-to-mild symptoms, whereas most studies also include individuals with severe mood symptoms.
This study presented some limitations. The first limitation is related to the nature of the task used to extract suprasegmental features, since the older speakers produced shorter descriptions than the younger adults. This may be related to the task or indicate differences in the linguistic domain. Second, some results should be considered with caution due to the recording environment and the automatic extraction procedures: the labelled syllables were not manually verified, although they were obtained in a standardized way for all speakers. Additionally, while certain acoustic features were found to be associated with depression and anxiety symptoms and important in explaining them, machine learning models would be needed to determine how important they are in predicting these mental health states.
The results of this study lead to different conclusions concerning the impact of anxiety and depression symptoms, measured through a self-assessment of mood, on acoustic features in a sample of adult individuals aged 35-97.
For the individuals of the present study, mainly adults with non-severe mood symptoms, an increase in depressive symptoms is associated with higher vowel duration, higher total pause duration and lower total speech duration in the univariable model. Adjusting the linear model for age and gender revealed that age affects the depressive symptoms. In the adjusted model, only the decrease in total speech duration, along with age, maintains a significant relationship with depression symptoms. In contrast, an increase in anxiety symptoms did not present significant relationships with the acoustic parameters studied.
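The difference between a univariable and an adjusted model can be sketched on synthetic data. The example below is not the analysis performed in this study and does not use its data. It assumes a symptom score as the outcome and constructs, by design, an acoustic feature and a score that both depend only on age. All variable names, coefficients and the sample size are hypothetical.

```python
import numpy as np


def ols_coefficients(predictors, outcome):
    """Ordinary least squares fit; returns [intercept, slope_1, slope_2, ...]."""
    design = np.column_stack([np.ones(len(outcome))] + list(predictors))
    coefficients, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefficients


rng = np.random.default_rng(0)
n = 500  # hypothetical sample size, larger than the study sample

# Synthetic demographic variables.
age = rng.uniform(35, 97, n)
gender = rng.integers(0, 2, n).astype(float)

# Assumption for illustration: both variables depend on age only,
# so they are correlated without any direct link between them.
pause_duration = 0.05 * age + rng.normal(0, 1, n)
symptom_score = 0.08 * age + rng.normal(0, 1, n)

# Univariable model: symptom score explained by the acoustic feature alone.
univariable = ols_coefficients([pause_duration], symptom_score)

# Adjusted model: the same feature, plus age and gender as covariates.
adjusted = ols_coefficients([pause_duration, age, gender], symptom_score)

print("univariable slope for pause duration:", univariable[1])
print("adjusted slope for pause duration:   ", adjusted[1])
print("adjusted slope for age:              ", adjusted[2])
```

In this constructed case, the slope for the acoustic feature is clearly positive in the univariable model and shrinks towards zero once age is included, while age retains its effect. This is the kind of pattern described above for total pause duration and vowel duration. Significance testing is omitted from the sketch.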
The present study reports the association between non-severe symptoms of anxiety/depression and segmental and suprasegmental acoustic features, constituting an advance in this research field. However, considering the study limitations, future studies will aim to analyze acoustic features extracted from other speech samples (e.g., text reading) in a group of individuals with a diagnosis of anxiety and/or depression compared with a control group across the lifespan.
S1 File. Database.
Speaker information concerning HADS scores, acoustic measures and demographic variables.
We would like to thank the institutions for making the data collection possible, and also all the adults who contributed as speakers. We are also grateful to Professor Plínio Barbosa for his collaboration in the adaptation of the scripts used.
- 1. WHO. Mental disorders; 2018. Available from: https://www.who.int/en/news-room/fact-sheets/detail/mental-disorders.
- 2. Olesen J, Gustavsson A, Svensson M, Wittchen HU, Jönsson B. The economic cost of brain disorders in Europe. European Journal of Neurology. 2012;19:155–162. pmid:22175760
- 3. Low DM, Bentley KH, Ghosh SS. Automated assessment of psychiatric disorders using speech: A systematic review. Laryngoscope Investigative Otolaryngology. 2020;5(1):96–116. pmid:32128436
- 4. Mundt JC, Snyder PJ, Cannizzaro MS, Chappie K, Geralts DS. Voice acoustic measures of depression severity and treatment response collected via interactive voice response (IVR) technology. J of neurolinguistics. 2007;20(1):50–64. pmid:21253440
- 5. Cummins N, Scherer S, Krajewski J, Schnieder S, Epps J, Quatieri TF. A review of depression and suicide risk assessment using speech analysis. Speech Communication. 2015;71:10–49.
- 6. Taguchi T, Tachikawa H, Nemoto K, Suzuki M, Nagano T, Tachibana R, et al. Major depressive disorder discrimination using vocal acoustic features. Journal of affective disorders. 2018;225:214–220. pmid:28841483
- 7. Scherer KR. Vocal Affect Expression. A Review and a Model for Future Research. Psychological Bulletin. 1986;99(2):143–165. pmid:3515381
- 8. Scherer S, Morency LP, Gratch J, Pestian J. Reduced vowel space is a robust indicator of psychological distress: a cross-corpus analysis. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE; 2015. p. 4789–4793.
- 9. Wang X. Segmental versus Suprasegmental: Which One is More Important to Teach? RELC Journal. 2020; p. 1–9.
- 10. Laukka P, Linnman C, Åhs F, Pissiota A, Frans Ö, Faria V, et al. In a nervous voice: Acoustic analysis and perception of anxiety in social phobics’ speech. Journal of Nonverbal Behavior. 2008;32(4):195.
- 11. Özseven T, Dügenci M, Doruk A, Kahraman HI. Voice Traces of Anxiety: Acoustic Parameters Affected by Anxiety Disorder. Archives of Acoustics. 2018;43(4):625–636.
- 12. Sataloff RT. Treatment of Voice Disorders. San Diego: Plural Publishing; 2005.
- 13. Banse R, Scherer KR. Acoustic profiles in vocal emotion expression; 1996.
- 14. Hagenaars MA, van Minnen A. The effect of fear on paralinguistic aspects of speech in patients with panic disorder with agoraphobia. Journal of Anxiety Disorders. 2005;19(5):521–537. pmid:15749571
- 15. Diamond GM, Rochman D, Amir O. Arousing primary vulnerable emotions in the context of unresolved anger: “Speaking about” versus “speaking to”. Journal of Counseling Psychology. 2010;57(4):402–410.
- 16. Weeks JW, Lee CY, Reilly AR, Howell AN, France C, Kowalsky JM, et al. “The Sound of Fear”: Assessing vocal fundamental frequency as a physiological indicator of social anxiety disorder. Journal of anxiety disorders. 2012;26(8):811–822. pmid:23070030
- 17. Goberman AM, Hughes S, Haydock T. Acoustic characteristics of public speaking: Anxiety and practice effects. Speech communication (Print). 2011;53(6):10–867.
- 18. Drioli C, Tisato G, Cosi P, Tesser F. Emotions and voice quality: experiments with sinusoidal modeling; 2003.
- 19. Protopapas A, Lieberman P. Fundamental frequency of phonation and perceived emotional stress. J Acoust Soc Am. 1997;101(4):2267–2277. pmid:9104028
- 20. Ververidis D, Kotropoulos C. Emotional Speech Recognition: Resources, Features, and Methods. Speech Communication. 2006;48:1162–1181.
- 21. Wörtwein T, Morency L, Scherer S. Automatic assessment and analysis of public speaking anxiety: A virtual audience case study. In: International Conference on Affective Computing and Intelligent Interaction (ACII); 2015. p. 187–193.
- 22. Murray IR, Arnott JL. Toward the simulation of emotion in synthetic speech: A review of the literature on human vocal emotion. J Acoust Soc Am. 1993;93(2):1097–1108. pmid:8445120
- 23. Fuller BF, Horii Y, Conner DA. Validity and reliability of nonverbal voice measures as indicators of stressor-provoked anxiety; 1992.
- 24. Siegman AW, Boyle S. Voices of fear and anxiety and sadness and depression: The effects of speech rate and loudness on fear and anxiety and sadness and depression. Journal of Abnormal Psychology. 1993;102(3):430–437. pmid:8408955
- 25. Ellgring H, Scherer KR. Vocal indicators of mood change in depression. Journal of Nonverbal Behavior. 1996;20(2):83–110.
- 26. Won E, Kim YK. Stress, the Autonomic Nervous System, and the Immune-kynurenine Pathway in the Etiology of Depression. Current neuropharmacology. 2016;14(7):665–673. pmid:27640517
- 27. Alpert M, Pouget ER, Silva RR. Reflections of depression in acoustic measures of the patient’s speech. Journal of affective disorders. 2001;66(1):59–69. pmid:11532533
- 28. Scherer S, Stratou G, Mahmoud M, Boberg J, Gratch J, Rizzo A, et al. Automatic behavior descriptors for psychological disorder analysis. In: International Conference and Workshops on Automatic Face and Gesture Recognition (FG). IEEE; 2013. p. 1–8.
- 29. Yang Y, Fairbairn C, Cohn JF. Detecting depression severity from vocal prosody. IEEE Transactions on Affective Computing. 2013;4(2):142–150. pmid:26985326
- 30. Cannizzaro M, Harel B, Reilly N, Chappell P, Snyder PJ. Voice acoustical measurement of the severity of major depression. Brain and cognition. 2004;56(1):30–35. pmid:15380873
- 31. Breznitz Z. Verbal Indicators of Depression. The Journal of General Psychology. 1992;119(4):351–363. pmid:1491239
- 32. Hönig F, Batliner A, Nöth E, Schnieder S, Krajewski J. Automatic Modelling of Depressed Speech: Relevant Features and Relevance of Gender. In: INTERSPEECH. Singapore: ISCA; 2014. p. 1248–1252.
- 33. Kuny S, Stassen HH. Speaking behavior and voice sound characteristics in depressive patients during recovery. Journal of Psychiatric Research. 1993;27(3):289–307. pmid:8295161
- 34. Mundt JC, Vogel AP, Feltner DE, Lenderking WR. Vocal acoustic biomarkers of depression severity and treatment response. Biological Psychiatry. 2012;72(7):580–587. pmid:22541039
- 35. Bennabi D, Vandel P, Papaxanthis C, Pozzo T, Haffen E. Psychomotor retardation in depression: a systematic review of diagnostic, pathophysiologic, and therapeutic implications. BioMed Research International. 2013; p. 1–18. pmid:24286073
- 36. Greden JF. Psychomotor monitoring: A promise being fulfilled? Journal of Psychiatric Research. 1993;27(3):285–287. pmid:8295160
- 37. Roy N, Nissen SL, Dromey C, Sapir S. Articulatory changes in muscle tension dysphonia: Evidence of vowel space expansion following manual circumlaryngeal therapy. Journal of Communication Disorders. 2009;42(2):124–135. pmid:19054525
- 38. Sobin C, Sackeim HA. Psychomotor symptoms of depression. Am J Psychiatry. 1997;154(1):4–17. pmid:8988952
- 39. Horwitz R, Quatieri TF, Helfer BS, Yu B, Williamson JR, Mundt J. On the relative importance of vocal source, system, and prosody in human depression. In: 2013 IEEE International Conference on Body Sensor Networks. IEEE; 2013. p. 1–6.
- 40. Nilsonne Å. Acoustic analysis of speech variables during depression and after improvement. Acta psychiatrica scandinavica. 1987;76(3):235–245. pmid:3673650
- 41. Quatieri TF, Malyska N. Vocal-source biomarkers for depression: a link to psychomotor activity. In: INTERSPEECH; 2012. p. 1059–1062.
- 42. Godfrey HPD, Knight RG. The validity of actometer and speech activity measures in the assessment of depressed patients. The British Journal of Psychiatry. 1984;145(2):159–163. pmid:6466912
- 43. Greden JF, Albala AA, Smokler IA, Gardner R, Carroll BJ. Speech pause time: a marker of psychomotor retardation among endogenous depressives. Biological Psychiatry. 1981;16(9):851–859. pmid:7295844
- 44. Hardy P, Jouvent R, Widlöcher D. Speech pause time and the retardation rating scale for depression (ERD): Towards a reciprocal validation. Journal of Affective Disorders. 1984;6(1):123–127. pmid:6231326
- 45. Trevino AC, Quatieri TF, Malyska N. Phonologically-based biomarkers for major depressive disorder. EURASIP Journal on Advances in Signal Processing. 2011;2011(1):42.
- 46. Flint AJ, Black SE, Campbell-Taylor I, Gailey GF, Levinton C. Abnormal speech articulation, psychomotor retardation, and subcortical dysfunction in major depression. Journal of Psychiatric Research. 1993;27(3):309–319. pmid:8295162
- 47. France DJ, Shiavi RG, Silverman S, Silverman M, Wilkes M. Acoustical properties of speech as indicators of depression and suicidal risk. IEEE Transactions on Biomedical Engineering. 2000;47(7):829–837. pmid:10916253
- 48. Tolkmitt F, Helfrich H, Standke R, Scherer KR. Vocal indicators of psychiatric treatment effects in depressives and schizophrenics. Journal of Communication Disorders. 1982;15(3):209–222. pmid:7096618
- 49. Vicsi K, Sztahó D, Kiss G. Examination of the sensitivity of acoustic-phonetic parameters of speech to depression. In: 2012 IEEE 3rd International Conference on Cognitive Infocommunications (CogInfoCom). IEEE; 2012. p. 511–515.
- 50. Cummins N, Epps J, Ambikairajah E. Spectro-temporal analysis of speech affected by depression and psychomotor retardation. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE; 2013. p. 7542–7546.
- 51. Ozdas A, Shiavi RG, Silverman SE, Silverman MK, Wilkes DM. Investigation of vocal jitter and glottal flow spectrum as possible cues for depression and near-term suicidal risk. IEEE Transactions on Biomedical Engineering. 2004;51(9):1530–1540. pmid:15376501
- 52. Cummins N, Epps J, Breakspear M, Goecke R. An investigation of depressed speech detection: features and normalization. In: INTERSPEECH; 2011. p. 2997–3000.
- 53. Albuquerque L, Oliveira C, Teixeira A, Sa-Couto P, Figueiredo D. Age-related changes in European Portuguese vowel acoustics. In: INTERSPEECH. Graz, Austria; 2019. p. 3965–3969.
- 54. Pais-Ribeiro J, Silva I, Ferreira T, Martins A, Meneses R, Baltar M. Validation study of a Portuguese version of the Hospital Anxiety and Depression Scale. Psychology, health & medicine. 2007;12(2):225–237. pmid:17365902
- 55. Bjelland I, Dahl AA, Haug TT, Neckelmann D. The validity of the Hospital Anxiety and Depression Scale: an updated literature review. Journal of psychosomatic research. 2002;52(2):69–77. pmid:11832252
- 56. Zigmond AS, Snaith RP. The hospital anxiety and depression scale. Acta psychiatrica scandinavica. 1983;67(6):361–370. pmid:6880820
- 57. Dietrich M, Abbott KV, Gartner-Schmidt J, Rosen CA. The frequency of perceived stress, anxiety, and depression in patients with common pathologies affecting voice. Journal of Voice. 2008;22(4):472–488. pmid:18395419
- 58. Eichhorn JT, Kent RD, Austin D, Vorperian HK. Effects of aging on vocal fundamental frequency and vowel formants in men and women. Journal of Voice. 2018;32(5):644.e1–644.e9. pmid:28864082
- 59. Draxler C, Jänsch K. SpeechRecorder (3.12.0); 2017.
- 60. Goodglass H, Kaplan E. The Assessment of Aphasia and Related Disorders. 2nd ed. Philadelphia, PA: Lea and Febiger; 1983.
- 61. Kisler T, Reichel U, Schiel F. Multilingual processing of speech via web services. Computer Speech and Language. 2017;45:326–347.
- 62. Schiel F. Automatic phonetic transcription of non prompted speech. In: 14th ICPhS. San Francisco; 1999. p. 607–610.
- 63. Boersma P, Weenink D. Praat: doing phonetics by computer; 2012. Available from: http://www.praat.org/.
- 64. Albuquerque L, Oliveira C, Teixeira A, Sa-Couto P, Figueiredo D. A comprehensive analysis of age and gender effects in European Portuguese oral vowels. Journal of Voice. 2020;(In press).
- 65. de Jong NH, Wempe T. Praat script to detect syllable nuclei and measure speech rate automatically. Behavior Research Methods. 2009;41(2):385–390. pmid:19363178
- 66. Escudero P, Boersma P, Rauber AS, Bion RAH. A cross-dialect acoustic description of vowels: Brazilian and European Portuguese. J Acoust Soc Am. 2009;126(3):1379–1393. pmid:19739752
- 67. Albuquerque L, Oliveira C, Teixeira A, Sa-Couto P, Freitas J, Dias MS. Impact of age in the production of European Portuguese vowels. In: INTERSPEECH. Singapore; 2014. p. 940–944.
- 68. Oliveira C, Cunha MM, Silva S, Teixeira A, Sa-Couto P, Sá-Couto P. Acoustic analysis of European Portuguese oral vowels produced by children. In: IberSPEECH. vol. 328. Madrid, Spain; 2012. p. 129–138.
- 69. Barbosa PA. Incursões em torno do ritmo da fala. Campinas: FAPESP/Pontes Editores; 2006.
- 70. Barbosa PA. Automatic duration-related salience detection in Brazilian Portuguese read and spontaneous speech. In: Speech Prosody. Chicago; 2010. p. 100067:1–4.
- 71. Barbosa PA. Semi-automatic and automatic tools for generating prosodic descriptors for prosody research. In: TRASP. vol. 13. Aix-en-Provence; 2013. p. 86–89.
- 72. IBM Corp. SPSS Statistics for Windows; 2017.
- 73. Hopkins WG. A scale of magnitudes for effect statistics. A new view of statistics; 2002. Available from: www.sportsci.org/resource/stats/effectmag.html.
- 74. Gillam R, Logan K, Pearson N. Test of childhood stuttering. Austin, Texas: Pro-Ed Inc; 2009.
- 75. Bromet EJ, Gluzman SF, Paniotto VI, Webb CPM, Tintle NL, Zakhozha V, et al. Epidemiology of psychiatric and alcohol disorders in Ukraine. Soc Psychiatry Psychiatr Epidemiol. 2005;40(9):681–690. pmid:16160752
- 76. Girgus J, Yang K, Ferri C. The Gender Difference in Depression: Are Elderly Women at Greater Risk for Depression Than Elderly Men? Geriatrics. 2017;2(35):1–21. pmid:31011045
- 77. Jorm AF. Does old age reduce the risk of anxiety and depression? A review of epidemiological studies across the adult life span. Psychological Medicine. 2000;30:11–22. pmid:10722172
- 78. Kessler RC, Birnbaum HG, Shahly V, Bromet E, Hwang I, McLaughlin KA, et al. Age differences in the prevalence and co-morbidity of DSM-IV major depressive episodes: results from the WHO World Mental Health Survey Initiative. Depression and Anxiety. 2010;27(4):351–364. pmid:20037917
- 79. Kuehner C. Why is depression more common among women than among men? The Lancet Psychiatry. 2017;4(2):146–158. pmid:27856392
- 80. Salk RH, Hyde JS, Abramson LY. Gender differences in depression in representative national samples: Meta-analyses of diagnoses and symptoms; 2017.
- 81. Van de Velde S, Bracke P, Levecque K. Gender differences in depression in 23 European countries. Cross-national variation in the gender gap in depression. Social Science & Medicine. 2010;71(2):305–313. pmid:20483518
- 82. Girgus JS, Yang K. Gender and depression. Current Opinion in Psychology. 2015;4:53–60.
- 83. Alghowinem S, Goecke R, Wagner M, Epps J, Breakspear M, Parker G. From Joyous to Clinically Depressed: Mood Detection Using Spontaneous Speech. In: 25th International Florida Artificial Intelligence Research Society Conference. Association for the Advancement of Artificial Intelligence (AAAI); 2012.
- 84. Esposito A, Esposito AM, Likforman-Sulem L, Maldonato MN, Vinciarelli A. On the Significance of Speech Pauses in Depressive Disorders: Results on Read and Spontaneous Narratives. In: Esposito A, Faundez-Zanuy M, Esposito AM, Cordasco G, Drugman T, Solé-Casals J, et al., editors. Recent advances in nonlinear speech processing. vol. 48 ed. Springer; 2016. p. 73–82.
- 85. Hall JA, Harrigan JA, Rosenthal R. Nonverbal behavior in clinician-patient interaction. Applied & Preventive Psychology. 1995;4:21–37.
- 86. Klumpp H, Deldin P. Review of brain functioning in depression for semantic processing and verbal fluency; 2010.
- 87. Kraepelin E. Manic Depressive Insanity and Paranoia. The Journal of Nervous and Mental Disease. 1921;53(4).
- 88. Stassen HH, Kuny S, Hell D. The speech analysis approach to determining onset of improvement under antidepressants. European Neuropsychopharmacology. 1998;8(4):303–310. pmid:9928921
- 89. Sobin C, Alpert M. Emotion in speech: the acoustic attributes of fear, anger, sadness, and joy. Journal of Psycholinguistic Research. 1999;28(4):347–365. pmid:10380660
- 90. Balabanova D, McKee M. Access to health care in a system transition: the case of Bulgaria. Int J Health Plann Mgmt. 2002;17(4):377–395.
- 91. Bobak M, Room R, Pikhart H, Kubinova R, Malyutina S, Pajak A, et al. Contribution of drinking patterns to differences in rates of alcohol related problems between three urban populations. Journal of Epidemiology and Community Health. 2004;58(3):238–242. pmid:14966239
- 92. Bischoff-Ferrari HAc. DO-HEALTH; 2019. Available from: http://do-health.eu/wordpress/.
- 93. Mortensen L, Meyer AS, Humphreys GW. Age-related effects on speech production: A review. Lang Cognitive Proc. 2006;21(1-3):238–290.