Brain-inspired model for early vocal learning and correspondence matching using free-energy optimization

doi:10.1371/journal.pcbi.1008566

Fig 10

Analysis of STR reconstruction and MFCC mapping during acoustic matching with different speakers.

In A, the correspondence matrix between the STR units X and the MFCC vectors A within the audio database of unheard voices. In B, the Euclidean distance between the MFCC vectors of the predicted STR units X and the ground-truth MFCC vectors A within the audio database. In C, the correspondence matrix between the ground-truth MFCC vectors A and the nearest vectors B among the reconstructed vectors X selected in STR, based on the correspondence matrix in A; plotted for the first 10,000 MFCC vectors. In D, a zoom on the correspondence matrix for 5,000 units within the interval [120,000; 125,000]. The diagonal indicates a good match between what the Inferno network perceives and what it can pronounce, even for MFCC samples that were not heard during the learning stage. In E, the ABX distance histogram proposed by [76, 77], computed from the Euclidean distance between the A and B vectors retrieved previously. In F, an example of a waveform retrieved from an unheard sound sequence after the learning stage.

doi: https://doi.org/10.1371/journal.pcbi.1008566.g010
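
The matching described in panels B, C and E can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each STR unit is associated with one stored MFCC vector (the hypothetical `codebook` below), that each ground-truth vector A is matched to its nearest stored vector B by plain Euclidean distance, and that the resulting distances are binned into a simple histogram. The function names, the bin width, and the two-dimensional toy data are all assumptions for illustration, and the ABX measure of [76, 77] is not reproduced here.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_unit(vec, codebook):
    """Return (index, distance) of the codebook vector closest to vec."""
    best = min(range(len(codebook)), key=lambda i: euclidean(vec, codebook[i]))
    return best, euclidean(vec, codebook[best])


def match_all(ground_truth, codebook):
    """Match every ground-truth vector A to its nearest stored vector B."""
    return [nearest_unit(v, codebook) for v in ground_truth]


def histogram(distances, bin_width):
    """Count distances per bin; keys are integer bin indices."""
    counts = {}
    for d in distances:
        b = int(d // bin_width)
        counts[b] = counts.get(b, 0) + 1
    return counts


# Toy data (hypothetical, not taken from the paper): three stored
# vectors standing in for STR units, and three "unheard" inputs.
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
ground_truth = [[0.1, 0.0], [0.9, 1.2], [2.0, 0.3]]

matches = match_all(ground_truth, codebook)
indices = [i for i, _ in matches]      # unit selected for each input
distances = [d for _, d in matches]    # distance between A and B
dist_hist = histogram(distances, bin_width=0.25)
```

In this toy case every input selects the unit with the same index, which corresponds to the diagonal pattern the caption describes for panels C and D; with real data the indices would come from the learned correspondence matrix rather than from a brute-force search.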