Detecting Nasal Vowels in Speech Interfaces Based on Surface Electromyography

doi:10.1371/journal.pone.0127040

Fig 1.

Muscles of the soft palate in posterior (left) and lateral (right) views.

Fig 2.

Mid-sagittal RT-MRI images of the vocal tract for several velum positions, over time, showing evolution from a raised velum, to a lowered velum and back to initial conditions.

The presented curve, used for analysis, was derived from the images.

Fig 3.

Exemplification of the warped signal representing the nasal information extracted from RT-MRI (dashed line) superimposed on the speech recorded during the corresponding RT-MRI and EMG acquisition, for the sentence [6~p6, p6~p6, p6~].

Fig 4.

Coronal-oblique RT-MRI images depicting the nasal cavity (in white), over time, and the curve derived for analysis purposes.

Fig 5.

EMG electrodes positioning and the respective channels (1 to 5) plus the reference electrode (R).

EMG 1 and 2 use unipolar configurations and EMG 3, 4 and 5 use bipolar configurations.

Fig 6.

Example of the EMG signal segmentation into nasal and non-nasal zones based on the information extracted from the RT-MRI (dashed red line).

The square wave depicted with a black line represents the velum information split into two classes where 0 stands for non-nasal and 1 for nasal. The blue line is the average of the RT-MRI information (after normalization) and the green line is the average plus half of the standard deviation.
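
The two-class split described in this caption can be sketched as a simple threshold on the normalized velum curve. This is a minimal illustration, not the authors' code: it assumes that the green line (mean plus half a standard deviation) acts as the decision threshold, that the normalization is min-max scaling, and the function names `segment_nasal` and `zones` are hypothetical.

```python
from statistics import mean, pstdev


def segment_nasal(velum, k=0.5):
    """Label each sample 1 (nasal) or 0 (non-nasal).

    Assumption: the threshold is mean + k * std of the normalized curve,
    with k = 0.5 matching the green line described in the caption.
    """
    lo, hi = min(velum), max(velum)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat curve
    norm = [(v - lo) / span for v in velum]  # min-max scaling (assumed)
    threshold = mean(norm) + k * pstdev(norm)
    labels = [1 if v > threshold else 0 for v in norm]
    return labels, threshold


def zones(labels):
    """Collapse per-sample labels into (start, end_exclusive, label) runs."""
    runs = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            runs.append((start, i, labels[start]))
            start = i
    return runs
```

Applied to an aligned RT-MRI curve, `zones(labels)` would give the sample ranges that delimit nasal and non-nasal segments of the EMG signal.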

Fig 7.

Raw EMG signal and pre-processed EMG signal of channel 1 (top) and 3 (bottom) for the sentence [6~p6, p6~p6, p6~] from speaker 1.

The pre-processed signal has been normalized and filtered using a 12-point moving average filter.
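
As a rough sketch of this pre-processing step, the snippet below normalizes a signal and smooths it with a 12-point moving average. The caption does not specify the normalization scheme, the order of the two operations, or how the filter edges are handled, so the mean removal, unit-peak scaling, and causal window used here are assumptions, and the function names are hypothetical.

```python
def normalize(signal):
    """Remove the mean and scale to unit peak amplitude (assumed scheme)."""
    m = sum(signal) / len(signal)
    centered = [s - m for s in signal]
    peak = max(abs(c) for c in centered) or 1.0  # guard against a flat signal
    return [c / peak for c in centered]


def moving_average(signal, n=12):
    """Causal n-point moving average.

    The first n - 1 outputs average only the samples seen so far,
    so the output has the same length as the input.
    """
    out = []
    acc = 0.0
    for i, s in enumerate(signal):
        acc += s
        if i >= n:
            acc -= signal[i - n]  # drop the sample leaving the window
        out.append(acc / min(i + 1, n))
    return out
```

A call such as `moving_average(normalize(raw_emg), n=12)` would then produce a smoothed trace comparable in spirit to the pre-processed signal shown in the figure.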

Fig 8.

Filtered EMG signal (12-point moving average filter) for the several channels (pink), the aligned RT-MRI information (blue) and the respective audio signal for the sentence [6~p6, p6~p6, p6~] from speaker 1.

An amplitude gain was applied to the RT-MRI information and to the EMG for better visualization of the superimposed signals.

Fig 9.

Filtered EMG signal (12-point moving average filter) for the several channels (pink), the aligned RT-MRI information (blue) and the respective audio signal for the sentence [i~p6, i~p6, pi~] from speaker 1.

An amplitude gain was applied to the RT-MRI information and to the EMG for better visualization of the superimposed signals.

Fig 10.

Portuguese vowels in an isolated context: pre-processed EMG signal for all EMG channels (pink), the aligned RT-MRI information (blue) and the respective audio signal (red) for [6~, e~, i~, o~, u~].

An amplitude gain was applied to the RT-MRI information and to the EMG for better visualization of the superimposed signals.

Fig 11.

Boxplot of the mutual information in the nasal zones between the RT-MRI information and the EMG signal of all speakers and for a single speaker.
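
For readers unfamiliar with the measure, mutual information between two aligned signals can be estimated from a joint histogram. The sketch below is one common estimator and is not necessarily the one used for this figure: the equal-width binning, the bin count of 8, and the function names are all assumptions.

```python
from collections import Counter
from math import log2


def quantize(x, bins):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(x), max(x)
    span = (hi - lo) or 1.0  # a constant signal falls into a single bin
    return [min(int((v - lo) / span * bins), bins - 1) for v in x]


def mutual_information(x, y, bins=8):
    """Histogram estimate of I(X; Y) in bits for two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("signals must have the same length")
    qx, qy = quantize(x, bins), quantize(y, bins)
    n = len(qx)
    cx, cy, cxy = Counter(qx), Counter(qy), Counter(zip(qx, qy))
    # Sum p(a, b) * log2(p(a, b) / (p(a) * p(b))) over observed bin pairs.
    return sum(
        (c / n) * log2(c * n / (cx[a] * cy[b]))
        for (a, b), c in cxy.items()
    )
```

Restricting `x` and `y` to the samples inside nasal zones would correspond to the "nasal zones" condition named in the caption.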

Table 1.

Class distribution for all speakers for a single EMG channel by zones and frames (nasal and non-nasal).

Fig 12.

Classification results (mean over the 10 folds for error rate, sensitivity and specificity) for all channels and all speakers.

Error bars show a 95% confidence interval.
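
The three reported measures and a 95% confidence interval over folds can be computed as in the sketch below. It assumes that nasal is the positive class (label 1) and that the interval is a Student t interval over the per-fold values; the caption does not state how the interval was obtained, so `mean_ci95` and its default critical value (2.262, the two-sided 95% value for 9 degrees of freedom, that is, 10 folds) are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def fold_metrics(y_true, y_pred):
    """Error rate, sensitivity and specificity for one fold (1 = nasal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "error_rate": (fp + fn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def mean_ci95(values, t_crit=2.262):
    """Mean and half-width of a t-based 95% interval (assumed method)."""
    half_width = t_crit * stdev(values) / sqrt(len(values))
    return mean(values), half_width
```

Collecting `fold_metrics(...)["error_rate"]` for each fold and passing the list to `mean_ci95` gives a mean and an error-bar half-width of the kind plotted in the figure.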

Table 2.

Mean sensitivity and specificity measures (%) for each EMG channel with a 95% confidence interval.

Fig 13.

The graph on the left shows the mean error rate for each speaker clustered by EMG channel.

The graph on the right shows the mean of the error rates from each speaker also clustered by EMG channel. Error bars show a 95% confidence interval.

Fig 14.

Difference between the mean error rate of all channels and the respective result of each channel, for all speakers (left) and for each speaker (right).

Error bars show a 95% confidence interval.

Table 3.

Mean error rate grouped by nasal vowel.

Table 4.

Mean error rate using multiple channel combinations.

Table 5.

Results of the repeated-measures ANOVA for the EMG channel pairs that reached statistical significance.

Table 6.

Mean error rates using a classification technique based on the majority of nasal/non-nasal frames for each zone.
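
The zone-level decision described here can be illustrated with a majority vote over frame-level predictions. This is a minimal sketch under stated assumptions: zones are given as hypothetical `(start, end_exclusive)` frame ranges, frame predictions use 1 for nasal and 0 for non-nasal, and ties are resolved as non-nasal, which the caption does not specify.

```python
def classify_zones(frame_preds, zone_ranges):
    """Return one label per zone by majority vote over its frames.

    frame_preds: list of 0/1 frame-level predictions (1 = nasal).
    zone_ranges: list of (start, end_exclusive) frame index pairs.
    """
    labels = []
    for start, end in zone_ranges:
        frames = frame_preds[start:end]
        nasal_votes = sum(frames)
        # Strict majority required; ties fall back to non-nasal (assumed).
        labels.append(1 if 2 * nasal_votes > len(frames) else 0)
    return labels
```

For instance, a zone with two nasal frames out of three would be labeled nasal as a whole, which smooths over isolated frame-level errors.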

Table 7.

Mean error rates using a classification technique based on the majority of nasal/non-nasal frames for each nasal zone.

Fig 15.

Classification results (mean over the 10 folds for error rate, sensitivity and specificity) for all channels of speaker 1.

These results are based on four additional sessions from this speaker recorded a posteriori. Error bars show a 95% confidence interval.

Table 8.

Mean sensitivity and specificity measures (%) with a 95% confidence interval for each EMG channel of speaker 1.
