The effects of face mask on speech production and its implication for forensic speaker identification-A cross-linguistic study

doi:10.1371/journal.pone.0283724

Fig 1.

The bar plot (with standard error) of the fundamental frequency (in semitones with a reference of 100 Hz) of Mandarin Chinese and English with and without mask across gender (female and male).
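
The semitone scale used in this caption can be illustrated with the standard conversion st = 12 · log2(f / f_ref), where f_ref = 100 Hz as stated above. The following is a minimal sketch, not the authors' analysis script; the function names `hz_to_semitones` and `semitones_to_hz` are hypothetical and chosen only for illustration.

```python
import math

REF_HZ = 100.0  # reference frequency stated in the caption


def hz_to_semitones(f_hz: float, ref_hz: float = REF_HZ) -> float:
    """Convert a frequency in Hz to semitones relative to ref_hz.

    Uses the standard definition: 12 * log2(f / ref).
    """
    if f_hz <= 0 or ref_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f_hz / ref_hz)


def semitones_to_hz(st: float, ref_hz: float = REF_HZ) -> float:
    """Inverse conversion: semitones relative to ref_hz back to Hz."""
    return ref_hz * 2.0 ** (st / 12.0)


if __name__ == "__main__":
    # One octave above the 100 Hz reference is 12 semitones.
    print(hz_to_semitones(200.0))  # 12.0
    # The reference itself maps to 0 semitones.
    print(hz_to_semitones(100.0))  # 0.0
```

On this scale, equal ratios in Hz become equal distances, so a doubling of fundamental frequency counts as 12 semitones whether the starting point is 100 Hz or 200 Hz. That makes values from lower-pitched and higher-pitched voices easier to compare on one axis.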

Fig 2.

The bar plot (with standard error) of the speech rate (syllables/second) of Mandarin Chinese and English with and without mask across gender (female and male).

Fig 3.

The bar plot (with standard error) of the intensity (in dB) of Mandarin Chinese and English with and without mask across gender (female and male).

Fig 4.

The bar plot (with standard error) of the harmonics-to-noise ratio (HNR in dB) of Mandarin Chinese and English with and without mask across gender (female and male).

Fig 5.

The bar plot (with standard error) of the jitter (in %) and shimmer (in dB) of Mandarin Chinese and English with and without mask across gender (female and male).

Fig 6.

The bar plot (with standard error) of the H1-H2 (in dB) of Mandarin Chinese and English with and without mask across gender (female and male).

Table 1.

Average values (standard deviation) of the seven acoustic parameters with and without mask in Mandarin Chinese and English.

Table 2.

Results of the linear mixed-effects models on the seven acoustic parameters with mask (with vs. without), gender (female vs. male), and language (Mandarin Chinese vs. English) as independent variables.

The asterisks indicate the levels of statistical significance: * p < 0.05; ** p < 0.01; *** p < 0.001.

Table 3.

Optimized parameters and accuracies of the automatic classification analyses on speech with and without face mask across languages based on the four supervised learning algorithms [i.e., linear discriminant analysis (LDA), naïve Bayes classifier (NBC), random forest (RF), and support vector machine (SVM)].

All seven acoustic parameters were included as the predictor variables.

Fig 7.

The significance of the seven acoustic parameters in the random forest models for mask speech identification (data: All speech, Mandarin speech, and English speech) and individual speaker identification (data: All speech mixed by mask and no mask speech).

Table 4.

Optimized parameters and accuracies of the automatic speaker identification on all speech data based on the four supervised learning algorithms [i.e., linear discriminant analysis (LDA), naïve Bayes classifier (NBC), random forest (RF), and support vector machine (SVM)].

All seven acoustic parameters were included as the predictor variables.

Fig 8.

Example spectrograms of the same segment (“one day” in English) in no-mask speech (above) and mask speech (below).

The speech was annotated in two layers, viz., word (i.e., first layer) and phoneme (i.e., second layer; in the ARPABET phone set). The red dots indicate the first four formants of speech.
