The detection of algebraic auditory structures emerges with self-supervised learning
Fig 2
A–C. Zero-shot emergence of word chunking.
A. Example of a tested syllable stream. B. The model's contrastive loss on the first, second and third syllables of each syllable triplet (word) in the stream. The loss is averaged over 30 trials of the task; the shaded area indicates its standard deviation across trials (see Fig D in S1 Material for zoomed views). C. Evolution of the model's ability to detect regular words as a function of pretraining. This ability is measured as the difference between the mean contrastive loss on the second random sequence and that on the last repetition of the regular sequence. D-E-F. Same as A-B-C but with repeated tone sequences with a cycle size of 5 or 20 tones. G-H-I. Same as A-B-C but with a nested algebraic pattern from the Local-Global paradigm. J-K-L. Same as A-B-C but with a set of 10 algebraic patterns of increasing complexity. In panel K, the “alternate” and “center-mirror” sequences are plotted, with complexities of 6 and 21, respectively. Modeled studies: [8,14,21,23].
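The two summary quantities described in this caption (the per-position loss averaged over trials with its standard deviation, and the detection measure of panel C) can be illustrated with a minimal sketch. The array layout, the function names `summarize_trials` and `detection_index`, the sign convention (random minus regular) and the toy values are all assumptions made for illustration; they are not the authors' code or data.

```python
import numpy as np

def summarize_trials(loss):
    """Mean and standard deviation across trials.

    loss: array of shape (n_trials, n_positions), one contrastive-loss
    value per stream position and trial (assumed layout).
    """
    loss = np.asarray(loss, dtype=float)
    return loss.mean(axis=0), loss.std(axis=0)

def detection_index(mean_loss, random_idx, regular_idx):
    """Mean loss on the second random sequence minus mean loss on the
    last repetition of the regular sequence (assumed sign convention:
    positive when the regular sequence is better predicted)."""
    mean_loss = np.asarray(mean_loss, dtype=float)
    return float(mean_loss[random_idx].mean() - mean_loss[regular_idx].mean())

# Toy, purely synthetic example: 30 trials, 6 positions.
# Positions 0-2: last regular repetition; positions 3-5: second random sequence.
base = np.array([0.5, 0.5, 0.5, 2.0, 2.0, 2.0])
offsets = np.linspace(-0.1, 0.1, 30)[:, None]  # hypothetical trial-to-trial variation
loss = base + offsets

mean_loss, std_loss = summarize_trials(loss)
index = detection_index(mean_loss, random_idx=slice(3, 6), regular_idx=slice(0, 3))
```

In a plot such as panel B, `mean_loss` would be the curve and `mean_loss ± std_loss` the shaded area; a larger `index` corresponds to a stronger loss drop for the regular sequence relative to the random one.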