Finding, visualizing, and quantifying latent structure across diverse animal vocal repertoires
Fig 18
Speech trajectories showing coarticulation in minimal pairs.
(A) Utterances of the words ‘day’, ‘say’, and ‘way’ are projected into a continuous UMAP latent space with a window size of 4 ms. Color represents time within the word, with darker points occurring earlier. (B) The same projections as in (A), color-coded by the corresponding word. (C) The same projections, color-coded by the corresponding phoneme. (D) The average latent trajectory for each word. (E) The average latent trajectory for each phoneme. (F) Example spectrograms of words, with latent trajectories above the spectrograms and phoneme labels below them. (G) Average trajectories and corresponding spectrograms for the words ‘take’ and ‘talk’, showing the different trajectories for ‘t’ in each word. (H) Average trajectories and corresponding spectrograms for the words ‘then’ and ‘them’, showing the different trajectories for ‘eh’ in each word.
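The average trajectories in panels (D), (E), (G), and (H) can be illustrated with a minimal sketch. The caption does not state how utterances of different durations were aligned before averaging, so the approach here (linear resampling onto a shared normalized time axis, then pointwise averaging) is an assumption for illustration only. The names `resample`, `average_trajectory`, and `n_steps` are hypothetical and do not come from the original work.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # one 2D latent coordinate, e.g. a UMAP projection of one window


def resample(traj: Sequence[Point], n_steps: int) -> List[Point]:
    """Linearly interpolate a trajectory onto n_steps evenly spaced points
    in normalized time (0 = start of utterance, 1 = end)."""
    if not traj:
        raise ValueError("trajectory must contain at least one point")
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    if len(traj) == 1:
        return [traj[0]] * n_steps
    last = len(traj) - 1
    out: List[Point] = []
    for k in range(n_steps):
        pos = k * last / (n_steps - 1)  # fractional index into traj
        i = min(int(pos), last - 1)     # left neighbor, clamped so i + 1 is valid
        frac = pos - i
        (x0, y0), (x1, y1) = traj[i], traj[i + 1]
        out.append((x0 + frac * (x1 - x0), y0 + frac * (y1 - y0)))
    return out


def average_trajectory(trajs: Sequence[Sequence[Point]], n_steps: int = 50) -> List[Point]:
    """Assumed averaging scheme: resample every trajectory to n_steps points,
    then take the mean latent position at each normalized time step."""
    if not trajs:
        raise ValueError("need at least one trajectory")
    resampled = [resample(t, n_steps) for t in trajs]
    n = len(resampled)
    return [
        (sum(r[k][0] for r in resampled) / n, sum(r[k][1] for r in resampled) / n)
        for k in range(n_steps)
    ]
```

Under this scheme, comparing the averaged trajectories of the same phoneme taken from different words (for example ‘t’ in ‘take’ versus ‘talk’) amounts to comparing two short lists of mean latent positions.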