Relating dynamic brain states to dynamic machine states: Human and machine solutions to the speech recognition problem
Fig 2
Mapping from GMM–HMM triphone log likelihoods to phone model RDMs.
(a) Each 10 ms frame of audio is transformed into a vector of MFCCs. From these vectors, a GMM estimates triphone log likelihoods, which are used in the phonetic HMMs. (b) To model dissimilarities between input words, we used the log likelihood estimates for each triphone variation of each phone, concatenated over a 60 ms sliding window. The dissimilarities, computed as correlation distances between triphone likelihood vectors, were collected as entries in phone-specific model RDMs. (c) These phone-specific model RDMs were computed at each position of the sliding window through time, yielding 40 time-varying model RDMs.
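The computation in panels (b) and (c) can be illustrated with a minimal sketch for a single phone. It is not the authors' code. The function name `phone_model_rdms`, the array layout (words × frames × triphone variants), the one-frame step between window positions, and the random demo data are all assumptions made here for illustration. With 10 ms frames, a 60 ms window spans 6 frames.

```python
import numpy as np

FRAME_MS = 10                               # frame length given in the caption
WINDOW_MS = 60                              # sliding-window length given in the caption
FRAMES_PER_WINDOW = WINDOW_MS // FRAME_MS   # 6 frames per window


def phone_model_rdms(loglik):
    """Time-varying model RDMs for ONE phone (hypothetical helper).

    loglik: array of shape (n_words, n_frames, n_triphones) holding the
            log likelihoods of that phone's triphone variants per frame.
            This layout is an assumption, not taken from the source.
    Returns an array of shape (n_windows, n_words, n_words).
    """
    n_words, n_frames, _ = loglik.shape
    n_windows = n_frames - FRAMES_PER_WINDOW + 1  # assumed step of one frame
    rdms = np.empty((n_windows, n_words, n_words))
    for t in range(n_windows):
        # Concatenate the triphone log likelihoods over the window,
        # giving one feature vector per word.
        vectors = loglik[:, t:t + FRAMES_PER_WINDOW, :].reshape(n_words, -1)
        # Correlation distance = 1 - Pearson correlation between word vectors.
        rdms[t] = 1.0 - np.corrcoef(vectors)
    return rdms


# Demo on random numbers (not real data): 5 words, 20 frames, 4 triphone variants.
rng = np.random.default_rng(0)
demo_loglik = rng.normal(size=(5, 20, 4))
demo_rdms = phone_model_rdms(demo_loglik)
print(demo_rdms.shape)
```

Repeating this for every phone would give one time-varying RDM per phone, which corresponds to the 40 model RDMs described in panel (c).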