Fig 1.
Categorization of the disturbances associated with the hypokinetic dysarthria of PD patients.
Fig 2.
Histogram of the UPDRS-III labels of the corpus of speakers.
Fig 3.
Speech waveform and spectrogram of a 35-year-old normophonic speaker uttering the /pa/-/ta/-/ka/ test.
Fig 4.
Detail of the speech corresponding to the /ka/ syllable of a 35-year-old normophonic speaker.
The syllable starts with a stop gap (silence), followed by a burst that precedes the periodic sound of the vowel. This structure is typical of the plosive consonant-vowel combinations used in the DDK test.
Table 1.
Segments in the consonant-vowel combinations of the DDK test with their corresponding time and frequency characteristics.
Fig 5.
Speech trace and spectrograms of voiceless bilabial/alveolar/velar (left/center/right) stops uttered by five PD patients with different degrees of severity according to the H&Y and UPDRS scales.
Fig 6.
Analogy for the movements of the articulators in PD patients.
PD speakers do not move the articulators to their full extent, with the required acceleration, or for the required time.
Fig 7.
Recognition rate vs. kernel length used to calculate the velocity of the envelope.
Fig 8.
Recognition rate vs. kernel lengths used to calculate the velocity and acceleration of the envelope.
A 50 ms long kernel for the velocity corresponds to a 40 ms long kernel for the acceleration.
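The captions above describe the velocity and acceleration of the envelope as derivatives computed with smoothing kernels of a given length (50 ms and 40 ms in the configuration used in Fig 9). The kernel shape and the envelope extraction are not stated in these captions, so the following Python sketch is only an illustration under assumptions: it smooths with a unit-gain Hann window and then applies a central finite difference. The function names `smoothed_derivative` and `envelope_kinematics` are hypothetical.

```python
import numpy as np

def smoothed_derivative(signal, fs, kernel_ms):
    """Smooth `signal` with a unit-gain Hann kernel of `kernel_ms` milliseconds,
    then take the first time derivative (units per second).
    ASSUMPTION: the Hann shape is illustrative; the captions only give kernel lengths."""
    n = max(3, int(round(kernel_ms * 1e-3 * fs)))
    kernel = np.hanning(n)
    kernel /= kernel.sum()                        # unit gain: smoothing keeps the signal level
    smooth = np.convolve(signal, kernel, mode="same")
    return np.gradient(smooth) * fs               # central differences, sample spacing 1/fs

def envelope_kinematics(envelope, fs, vel_ms=50.0, acc_ms=40.0):
    """Velocity of the envelope, then acceleration as the derivative of that velocity."""
    velocity = smoothed_derivative(envelope, fs, vel_ms)
    acceleration = smoothed_derivative(velocity, fs, acc_ms)
    return velocity, acceleration
```

For a linearly rising envelope, the velocity away from the edges is constant and the acceleration is zero; samples within roughly one kernel length of the edges are distorted by the zero padding of `mode="same"`.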
Fig 9.
Speech trace with its envelope and estimates of the velocity and acceleration of the envelope for a young 35-year-old normophonic speaker (a), a control speaker (b), a parkinsonian patient with H&Y = 2 (c), and a parkinsonian patient with H&Y = 3 (d), all calculated using 50 ms and 40 ms long smoothing kernels for the velocity and acceleration, respectively.
The speech traces correspond to a single utterance of the /pa/-/ta/-/ka/ test. The amplitudes are normalized to the range [–1, 1] within each 1.37 s long analysis frame. Note that the time scales differ between plots because the speech rate differs between speakers.
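The per-frame normalization mentioned in the caption can be sketched as follows. This is a minimal illustration, assuming consecutive non-overlapping 1.37 s frames and peak-absolute-value scaling (the captions do not say whether frames overlap or whether peak or min-max scaling is used); `normalize_frames` is a hypothetical name.

```python
import numpy as np

def normalize_frames(x, fs, frame_s=1.37):
    """Split x into consecutive non-overlapping frames of frame_s seconds and
    divide each frame by its peak absolute value so it lies in [-1, 1].
    ASSUMPTIONS: no overlap, peak scaling, incomplete tail frame discarded."""
    frame_len = int(round(frame_s * fs))
    frames = []
    for start in range(0, len(x) - frame_len + 1, frame_len):
        frame = np.asarray(x[start:start + frame_len], dtype=float)
        peak = np.max(np.abs(frame))
        frames.append(frame / peak if peak > 0 else frame)   # leave silent frames untouched
    return frames
```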
Fig 10.
3D attractors of the envelope speed for a young 35-year-old normophonic speaker (a), a control speaker (b), a parkinsonian patient with H&Y = 2 (c), and a parkinsonian patient with H&Y = 3 (d), all calculated using a 50 ms long smoothing kernel for the speed and a time delay of 70 samples.
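A 3D attractor such as those in Fig 10 is obtained by time-delay embedding: each point stacks the signal at times i, i + τ and i + 2τ. The sketch below shows the standard construction with τ = 70 samples and dimension 3 as defaults; `delay_embed` is a hypothetical helper name.

```python
import numpy as np

def delay_embed(x, dim=3, tau=70):
    """Time-delay embedding: row i is [x[i], x[i + tau], ..., x[i + (dim - 1) * tau]]."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * tau          # number of complete delay vectors
    if n <= 0:
        raise ValueError("series too short for this dim and tau")
    return np.stack([x[k * tau:k * tau + n] for k in range(dim)], axis=1)
```

Plotting the three columns against each other (for example with a 3D scatter plot) gives the attractor trajectory.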
Fig 11.
Accuracy vs. window size for a GMM-UBM system trained with 128 Gaussians for two different parameterization approaches.
The best results are obtained with 10 ms windows.
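For readers unfamiliar with the GMM-UBM baseline, the following NumPy sketch shows the common recipe of mean-only MAP adaptation of a universal background model (UBM) followed by average log-likelihood-ratio scoring. It is not the authors' implementation: it assumes diagonal covariances, a relevance factor of 16, a UBM already trained by EM elsewhere (with K = 128 components in the system of Fig 11), and a PD model scored against a control model. All function names are hypothetical.

```python
import numpy as np

def gmm_frame_loglik(X, weights, means, variances):
    """log(w_k * N(x_t | mu_k, diag(var_k))) for every frame t and component k, shape (T, K)."""
    diff = X[:, None, :] - means[None, :, :]                    # (T, K, D)
    log_det = np.sum(np.log(variances), axis=1)                 # (K,)
    maha = np.sum(diff ** 2 / variances[None, :, :], axis=2)    # (T, K)
    d = X.shape[1]
    return np.log(weights)[None, :] - 0.5 * (d * np.log(2 * np.pi) + log_det[None, :] + maha)

def logsumexp_rows(A):
    """Numerically stable log(sum(exp(A), axis=1))."""
    m = A.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(A - m).sum(axis=1, keepdims=True)))[:, 0]

def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of the UBM to the feature frames X (T, D).
    ASSUMPTION: weights and variances are kept from the UBM."""
    L = gmm_frame_loglik(X, weights, means, variances)
    post = np.exp(L - logsumexp_rows(L)[:, None])               # responsibilities (T, K)
    n_k = post.sum(axis=0)                                       # soft counts (K,)
    ex_k = post.T @ X / np.maximum(n_k, 1e-10)[:, None]          # posterior means of the data (K, D)
    alpha = (n_k / (n_k + relevance))[:, None]                   # data-dependent adaptation weight
    return alpha * ex_k + (1 - alpha) * means

def avg_loglik(X, weights, means, variances):
    """Average per-frame log-likelihood of X under a diagonal GMM."""
    return logsumexp_rows(gmm_frame_loglik(X, weights, means, variances)).mean()

def llr_score(X, ubm, pd_means, hc_means):
    """Average log-likelihood ratio of PD vs. control models that share the UBM weights and variances."""
    w, _, v = ubm
    return avg_loglik(X, w, pd_means, v) - avg_loglik(X, w, hc_means, v)
```

A positive score favors the PD model; the decision threshold would in practice be tuned on development data.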
Table 2.
Best results in terms of accuracy, area under the ROC curve, sensitivity and specificity for both configurations (GMM-UBM and iVectors) and parameterization approaches (MFCC and RASTA-PLP).
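The four figures of merit in Table 2 can be computed from per-speaker scores as sketched below. This is an illustrative helper, assuming label 1 for PD and 0 for control, a higher score meaning "more likely PD", and a fixed decision threshold; the AUC uses the Mann-Whitney form (ties count one half). The name `binary_metrics` is hypothetical.

```python
import numpy as np

def binary_metrics(y_true, scores, threshold=0.0):
    """Accuracy, sensitivity and specificity at `threshold`, plus the ROC AUC.
    ASSUMPTIONS: y_true uses 1 = PD, 0 = control; higher scores indicate PD."""
    y = np.asarray(y_true, dtype=int)
    s = np.asarray(scores, dtype=float)
    pred = (s >= threshold).astype(int)
    tp = np.sum((pred == 1) & (y == 1))
    tn = np.sum((pred == 0) & (y == 0))
    fp = np.sum((pred == 1) & (y == 0))
    fn = np.sum((pred == 0) & (y == 1))
    pos, neg = s[y == 1], s[y == 0]
    # AUC = P(score_PD > score_control) + 0.5 * P(tie), over all PD/control pairs
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    auc = (greater + 0.5 * ties) / (len(pos) * len(neg))
    return {
        "accuracy": (tp + tn) / len(y),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "auc": auc,
    }
```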
Fig 12.
DET curves for the GMM-UBM and iVectors approaches with the MFCC and RASTA-PLP parameterizations.
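A DET curve plots the miss rate against the false-alarm rate as the decision threshold sweeps over the scores, with both axes warped by the inverse standard normal CDF. The sketch below computes those points under the same assumptions as a generic detector (label 1 for PD, higher score means PD); `det_points` and `to_normal_deviate` are hypothetical names, and `statistics.NormalDist` is from the Python standard library.

```python
import numpy as np
from statistics import NormalDist

def det_points(y_true, scores):
    """False-alarm and miss rates for every observed score used as a threshold."""
    y = np.asarray(y_true, dtype=int)
    s = np.asarray(scores, dtype=float)
    thresholds = np.unique(s)                                         # sorted ascending
    p_miss = np.array([np.mean(s[y == 1] < t) for t in thresholds])   # PD rejected
    p_fa = np.array([np.mean(s[y == 0] >= t) for t in thresholds])    # control accepted
    return p_fa, p_miss

def to_normal_deviate(p, eps=1e-6):
    """Probit warping used on DET axes; rates are clipped away from 0 and 1."""
    nd = NormalDist()
    return np.array([nd.inv_cdf(min(max(v, eps), 1 - eps)) for v in p])
```

Plotting `to_normal_deviate(p_fa)` against `to_normal_deviate(p_miss)` yields the usual straight-line appearance for Gaussian-like score distributions.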
Fig 13.
Normalized histograms of the UPDRS-III labels corresponding to the speakers wrongly categorized.
a) using GMM-UBM and RASTA-PLP; b) using GMM-UBM and MFCC; c) using iVectors and RASTA-PLP; d) using iVectors and MFCC.
Fig 14.
DET plot of the best baseline system and of the proposed method.
Table 3.
Best results in terms of accuracy ± confidence interval, area under the ROC curve, sensitivity and specificity for both configurations.
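The captions do not state how the confidence interval on the accuracy is obtained. One common choice, shown here purely as an assumption, is the normal-approximation (Wald) interval for a proportion, whose half-width is z·sqrt(acc·(1 − acc)/n); `accuracy_ci` is a hypothetical name.

```python
import math
from statistics import NormalDist

def accuracy_ci(n_correct, n_total, level=0.95):
    """Accuracy and the half-width of a normal-approximation (Wald) confidence interval.
    ASSUMPTION: this interval type is illustrative, not necessarily the one used in Table 3."""
    acc = n_correct / n_total
    z = NormalDist().inv_cdf(0.5 + level / 2)      # about 1.96 for a 95 % interval
    half = z * math.sqrt(acc * (1 - acc) / n_total)
    return acc, half
```

For example, 80 correct decisions out of 100 give roughly 0.80 ± 0.078 at the 95 % level.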
Fig 15.
Boxplots corresponding to the complexity measures extracted from the acceleration (top row) and velocity (bottom row) sequences.
Fig 16.
a) Normalized histogram of the UPDRS-III labels corresponding to the speakers wrongly categorized with the proposed method; b) UPDRS-III level vs. score given by the proposed method.
Fig 17.
Example of the estimation of the time lag (a) and embedding dimension (b) for a 1.37 s long frame corresponding to the velocity of variation of the envelope of a normophonic speaker during the /pa/-/ta/-/ka/ test. In this example, the first minimum of the auto mutual information is found at a lag of 70 samples. Regarding the embedding dimension, the E1 curve used in Cao's method shows a kink at 6. The histograms in (c) and (d) show the time delays and embedding dimensions, respectively, obtained for all the frames extracted from the database.
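The two estimation steps in Fig 17 can be sketched as follows: the time lag is the first local minimum of the auto mutual information (estimated here with a 2D histogram, the bin count being an assumption), and the embedding dimension is read from Cao's E1(d) = E(d+1)/E(d), where E(d) averages the ratio of nearest-neighbor distances (max-norm) in d+1 and d dimensions. This brute-force version is a sketch, not the authors' code, and is only practical for short frames; all function names are hypothetical.

```python
import numpy as np

def auto_mutual_information(x, max_lag, bins=16):
    """Histogram estimate of I(x_t; x_{t+lag}) in nats for lag = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    ami = []
    for lag in range(max_lag + 1):
        a, b = x[:len(x) - lag], x[lag:]
        pxy, _, _ = np.histogram2d(a, b, bins=bins)
        pxy /= pxy.sum()                              # joint probabilities
        px = pxy.sum(axis=1, keepdims=True)           # marginal of x_t
        py = pxy.sum(axis=0, keepdims=True)           # marginal of x_{t+lag}
        nz = pxy > 0
        ami.append(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))
    return np.array(ami)

def first_local_minimum(values):
    """Index of the first local minimum, or None if no interior minimum exists."""
    for i in range(1, len(values) - 1):
        if values[i] < values[i - 1] and values[i] <= values[i + 1]:
            return i
    return None

def cao_e1(x, tau, max_dim):
    """Cao's E1(d) for d = 1..max_dim, using brute-force nearest neighbors and the max-norm."""
    x = np.asarray(x, dtype=float)

    def embed(d):
        n = len(x) - d * tau                          # leave room for a (d+1)-th coordinate
        return np.stack([x[k * tau:k * tau + n] for k in range(d)], axis=1)

    E = []
    for d in range(1, max_dim + 2):
        Yd1 = embed(d + 1)
        n = len(Yd1)
        Yd = embed(d)[:n]                             # same time indices as Yd1
        dist = np.max(np.abs(Yd[:, None, :] - Yd[None, :, :]), axis=2)
        np.fill_diagonal(dist, np.inf)                # a point is not its own neighbor
        nn = np.argmin(dist, axis=1)                  # nearest neighbor in dimension d
        num = np.max(np.abs(Yd1 - Yd1[nn]), axis=1)   # same pair measured in dimension d+1
        den = dist[np.arange(n), nn]
        ok = den > 0
        E.append(np.mean(num[ok] / den[ok]))
    E = np.array(E)
    return E[1:] / E[:-1]                             # E1(d) = E(d+1) / E(d)
```

In use, the lag would be `first_local_minimum(auto_mutual_information(frame, max_lag))`, and the dimension the value of d beyond which `cao_e1(frame, tau, max_dim)` stops changing appreciably.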