Fig 1.
Bar plot (with standard errors) of the fundamental frequency (F0, in semitones relative to a 100 Hz reference) of Mandarin Chinese and English speech produced with and without a face mask, by gender (female and male).
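The semitone scale in Fig 1 expresses each F0 value relative to the 100 Hz reference as st = 12 · log2(F0 / 100), so every doubling of F0 adds 12 semitones. A minimal Python sketch of that conversion; the function name `hz_to_semitones` is hypothetical and not taken from the study:

```python
import math

def hz_to_semitones(f0_hz: float, ref_hz: float = 100.0) -> float:
    """Convert a fundamental frequency in Hz to semitones relative to ref_hz."""
    if f0_hz <= 0 or ref_hz <= 0:
        raise ValueError("frequencies must be positive")
    # 12 semitones per octave; an octave is a doubling of frequency.
    return 12.0 * math.log2(f0_hz / ref_hz)

# One octave above the 100 Hz reference is 12 semitones; the reference itself is 0.
print(hz_to_semitones(200.0))  # 12.0
print(hz_to_semitones(100.0))  # 0.0
```

Because the scale is logarithmic, equal semitone differences correspond to equal perceived pitch intervals for female and male voices alike, which makes the gender comparison in the figure easier to read than raw Hz.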
Fig 2.
Bar plot (with standard errors) of speech rate (in syllables per second) of Mandarin Chinese and English speech produced with and without a face mask, by gender (female and male).
Fig 3.
Bar plot (with standard errors) of intensity (in dB) of Mandarin Chinese and English speech produced with and without a face mask, by gender (female and male).
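Intensity in dB is a logarithmic measure of mean signal power relative to a reference. The sketch below assumes the samples are calibrated sound pressure in pascals and uses the conventional 2 × 10⁻⁵ Pa reference (Praat's default); the caption states neither the calibration nor the software, and Praat additionally smooths with an analysis window, so this plain mean-square version is only an approximation of that procedure:

```python
import numpy as np

P_REF = 2e-5  # assumed reference sound pressure in Pa (auditory threshold convention)

def intensity_db(samples, p_ref: float = P_REF) -> float:
    """Mean intensity in dB of a pressure waveform assumed to be in Pa."""
    x = np.asarray(samples, dtype=float)
    mean_square = float(np.mean(x * x))
    if mean_square == 0.0:
        return float("-inf")  # digital silence has no finite level
    return 10.0 * float(np.log10(mean_square / p_ref**2))

# A constant 0.02 Pa signal has mean square 4e-4 Pa^2, i.e. 1e6 times p_ref^2 -> 60 dB.
print(round(intensity_db(np.full(1000, 0.02)), 6))  # 60.0
```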
Fig 4.
Bar plot (with standard errors) of the harmonics-to-noise ratio (HNR, in dB) of Mandarin Chinese and English speech produced with and without a face mask, by gender (female and male).
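HNR compares periodic (harmonic) energy with aperiodic (noise) energy on a dB scale. One widely used estimator, Boersma's autocorrelation method as implemented in Praat, takes the normalized autocorrelation r at one pitch period as the periodic fraction of the signal power and computes HNR = 10 · log10(r / (1 - r)). The sketch applies only that final step and assumes r has already been measured; the caption does not say which HNR estimator the study used, and the function name is hypothetical:

```python
import math

def hnr_db_from_autocorrelation(r: float) -> float:
    """HNR in dB from the normalized autocorrelation r (0 < r < 1) at one pitch period.

    r is treated as the periodic fraction of signal power, so 1 - r is the noise fraction.
    """
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    return 10.0 * math.log10(r / (1.0 - r))

# r = 0.9: 90% periodic vs. 10% noise energy, a ratio of 9.
print(round(hnr_db_from_autocorrelation(0.9), 2))  # 9.54
```

Equal harmonic and noise energy (r = 0.5) gives 0 dB, so higher HNR values indicate a less noisy, more periodic voice.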
Fig 5.
Bar plots (with standard errors) of jitter (in %) and shimmer (in dB) of Mandarin Chinese and English speech produced with and without a face mask, by gender (female and male).
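Jitter and shimmer quantify cycle-to-cycle perturbation of period and amplitude, respectively. A sketch of the local variants, matching the definitions Praat reports as "jitter (local)" and "shimmer (local, dB)", assuming the glottal periods and per-cycle peak amplitudes have already been extracted; the caption does not specify which jitter or shimmer variant the study used:

```python
from typing import Sequence

import numpy as np

def jitter_local_percent(periods: Sequence[float]) -> float:
    """Mean absolute difference of consecutive periods divided by the mean period, in %."""
    p = np.asarray(periods, dtype=float)
    if p.size < 2:
        raise ValueError("need at least two periods")
    return 100.0 * float(np.mean(np.abs(np.diff(p))) / np.mean(p))

def shimmer_local_db(amplitudes: Sequence[float]) -> float:
    """Mean absolute dB change between the peak amplitudes of consecutive cycles."""
    a = np.asarray(amplitudes, dtype=float)
    if a.size < 2 or np.any(a <= 0):
        raise ValueError("need at least two positive amplitudes")
    return float(np.mean(np.abs(20.0 * np.log10(a[1:] / a[:-1]))))

# Periods in seconds (roughly 100 Hz) and relative peak amplitudes per cycle.
print(jitter_local_percent([0.010, 0.011, 0.010]))  # about 9.68 (%)
print(shimmer_local_db([1.0, 2.0, 1.0]))            # about 6.02 (dB)
```

Both measures are scale-free (a ratio and a dB difference), which is why they can be compared across speakers with different mean F0 and loudness.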
Fig 6.
Bar plot (with standard errors) of H1-H2 (the amplitude difference between the first and second harmonics, in dB) of Mandarin Chinese and English speech produced with and without a face mask, by gender (female and male).
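H1-H2 subtracts the level of the second harmonic from that of the first. A minimal numpy sketch, assuming F0 is already known and taking the strongest spectral peak within ±10% of F0 around each harmonic (the search width and Hann window are assumptions). It applies no formant correction, and the caption does not say whether corrected values were reported; all function names are hypothetical:

```python
import numpy as np

def harmonic_level_db(signal, fs: float, f0: float, harmonic: int, search: float = 0.1) -> float:
    """Level (dB) of the strongest spectral peak within +/- search*f0 of harmonic*f0."""
    x = np.asarray(signal, dtype=float)
    spectrum = np.abs(np.fft.rfft(x * np.hanning(x.size)))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    target = harmonic * f0
    band = (freqs >= target - search * f0) & (freqs <= target + search * f0)
    if not band.any():
        raise ValueError("no FFT bins near the requested harmonic")
    return 20.0 * float(np.log10(spectrum[band].max()))

def h1_minus_h2(signal, fs: float, f0: float) -> float:
    """H1-H2 in dB: first-harmonic level minus second-harmonic level."""
    return harmonic_level_db(signal, fs, f0, 1) - harmonic_level_db(signal, fs, f0, 2)

# Synthetic check: H2 has half the amplitude of H1, so H1-H2 should be about 6.02 dB.
fs = 16000
t = np.arange(fs) / fs  # one second gives 1 Hz FFT bins
x = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
print(round(h1_minus_h2(x, fs, 200.0), 2))
```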
Table 1.
Mean values (standard deviations) of the seven acoustic parameters for speech with and without a face mask in Mandarin Chinese and English.
Table 2.
Results of the linear mixed-effects models on the seven acoustic parameters, with mask (with vs. without), gender (female vs. male), and language (Mandarin Chinese vs. English) as independent variables.
The asterisks indicate the levels of statistical significance: * p < 0.05; ** p < 0.01; *** p < 0.001.
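For readers unfamiliar with the model behind Table 2, one plausible specification for each acoustic parameter y is sketched below. It assumes a by-speaker random intercept and main effects only; the caption states neither the random-effects structure nor whether interaction terms were fitted, so both are assumptions. Here i indexes observations, j indexes speakers, and each predictor is a 0/1 dummy code:

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{Mask}_{ij} + \beta_2\,\mathrm{Gender}_{j} + \beta_3\,\mathrm{Language}_{ij} + u_{j} + \varepsilon_{ij},
\qquad u_{j} \sim \mathcal{N}(0, \sigma_u^{2}), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2})
```

Under this reading, the significance levels in the table would test whether each fixed-effect coefficient (for example, the mask effect \beta_1) differs from zero, while u_j absorbs stable differences between speakers.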
Table 3.
Optimized parameters and accuracies of the automatic classification of speech with and without a face mask across languages, using four supervised learning algorithms [i.e., linear discriminant analysis (LDA), naïve Bayes classifier (NBC), random forest (RF), and support vector machine (SVM)].
All seven acoustic parameters were included as the predictor variables.
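Table 3 compares four classifiers trained on the seven parameters. A minimal scikit-learn sketch of such a comparison, with synthetic placeholder data from `make_classification` standing in for the study's measurements. The Gaussian variant of naïve Bayes, the RBF-kernel SVM, 10-fold stratified cross-validation, and every hyperparameter value are assumptions, not the paper's reported settings:

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: 7 features per token (F0, speech rate, intensity, HNR,
# jitter, shimmer, H1-H2) and a binary label (mask vs. no mask).
X, y = make_classification(n_samples=400, n_features=7, n_informative=4, random_state=0)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "NBC": GaussianNB(),
    "RF": RandomForestClassifier(n_estimators=300, random_state=0),
    # SVMs are scale-sensitive, so standardize inside the pipeline (fit per fold).
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale")),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
accuracies = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}
for name, acc in accuracies.items():
    print(f"{name}: {acc:.3f}")
```

Using the same folds for every model keeps the accuracy comparison paired, so differences between rows reflect the algorithms rather than the data split.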
Fig 7.
The importance of the seven acoustic parameters in the random forest models for mask-speech identification (data: all speech, Mandarin speech, and English speech) and for individual speaker identification (data: all speech, pooling mask and no-mask speech).
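Random forests provide a per-feature importance score, which is presumably what Fig 7 plots; the caption does not name the measure. The sketch uses scikit-learn's impurity-based `feature_importances_` (mean decrease in Gini impurity) on synthetic placeholder data; the mapping of feature names to columns is purely illustrative, and permutation importance would be an equally plausible alternative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["F0", "speech_rate", "intensity", "HNR", "jitter", "shimmer", "H1_H2"]

# Synthetic placeholder data; shuffle=False keeps the 3 informative columns first (0-2),
# so the illustrative labels F0, speech_rate, and intensity carry the real signal here.
X, y = make_classification(n_samples=500, n_features=7, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=1)

forest = RandomForestClassifier(n_estimators=300, random_state=1).fit(X, y)

# Impurity-based importances are normalized to sum to 1 across features.
ranking = sorted(zip(FEATURES, forest.feature_importances_),
                 key=lambda kv: kv[1], reverse=True)
for name, score in ranking:
    print(f"{name:12s} {score:.3f}")
```

Impurity-based scores can favor continuous, high-variance features, so ranks near the bottom of such a plot are best read as "little added information" rather than "no effect".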
Table 4.
Optimized parameters and accuracies of automatic speaker identification on all speech data, using four supervised learning algorithms [i.e., linear discriminant analysis (LDA), naïve Bayes classifier (NBC), random forest (RF), and support vector machine (SVM)].
All seven acoustic parameters were included as the predictor variables.
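The "optimized parameters" in Table 4 imply a hyperparameter search for the multi-class speaker-identification task. A sketch with scikit-learn's `GridSearchCV` tuning an RBF SVM, using synthetic placeholder data for ten hypothetical speakers; the grid values, fold count, and speaker count are assumptions, not the study's settings:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder multi-class problem: 10 hypothetical speakers, 7 acoustic features per token.
X, speaker = make_classification(n_samples=600, n_features=7, n_informative=6,
                                 n_redundant=0, n_classes=10, n_clusters_per_class=1,
                                 class_sep=2.0, random_state=2)

pipe = Pipeline([("scale", StandardScaler()), ("svm", SVC(kernel="rbf"))])
param_grid = {"svm__C": [0.1, 1, 10, 100], "svm__gamma": ["scale", 0.01, 0.1, 1]}

# Stratified folds keep every speaker represented in each training and test split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy")
search.fit(X, speaker)

print("best parameters:", search.best_params_)
print(f"cross-validated accuracy: {search.best_score_:.3f}")
```

With ten speakers, chance accuracy is 10%, which is the baseline any reported speaker-identification accuracy should be compared against.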
Fig 8.
Example spectrograms of the same segment (“one day” in English) produced without a mask (top) and with a mask (bottom).
The speech was annotated in two layers: words (first layer) and phonemes (second layer, in the ARPABET phone set). The red dots mark the first four formants.