Segregating Complex Sound Sources through Temporal Coherence

doi:10.1371/journal.pcbi.1003985

Figure 1

The temporal coherence model consists of two stages.

(A) Transformation of sound into a cortical representation [34]: The auditory spectrogram is computed first (left panel), and its spectral and temporal modulations are then analyzed in two steps (middle and right panels, respectively). A multi-scale (or multi-bandwidth) wavelet analysis along the spectral dimension creates the frequency-scale responses, and a wavelet analysis of the modulus of these outputs creates the final cortical outputs (right panel).

(B) Coincidence and clustering: At each time-step, the cortical outputs are used to compute a family of coincidence matrices, or C-matrices (left panel). Each matrix is the outer product of the cortical outputs, computed separately for each modulation rate. The C-matrices are then stacked (middle panel) and simultaneously decomposed by a nonlinear auto-encoder network (right panel) into two principal components. These components correspond to the foreground and background masks, which are used to segregate the cortical response.

doi: https://doi.org/10.1371/journal.pcbi.1003985.g001
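
The coincidence step in panel (B) can be illustrated with a minimal sketch. It assumes real-valued cortical outputs held in plain Python lists, one list of channel responses per modulation rate at a single time-step. The function names and the toy numbers are hypothetical and are not taken from the paper. The auto-encoder decomposition is not shown.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def coincidence_matrix(responses: Sequence[float]) -> Matrix:
    """Outer product of one modulation rate's channel responses with themselves."""
    return [[a * b for b in responses] for a in responses]


def stack_coincidence_matrices(
    outputs_by_rate: Sequence[Sequence[float]],
) -> List[Matrix]:
    """Build one C-matrix per modulation rate and keep them in rate order."""
    return [coincidence_matrix(responses) for responses in outputs_by_rate]


# Toy data (made up): 2 modulation rates x 3 channels at one time-step.
outputs_by_rate = [
    [1.0, 1.0, 0.0],  # rate 0: channels 0 and 1 are active together
    [0.0, 0.5, 2.0],  # rate 1: channels 1 and 2 are active together
]

stack = stack_coincidence_matrices(outputs_by_rate)

for rate_index, c_matrix in enumerate(stack):
    print(f"rate {rate_index}:")
    for row in c_matrix:
        print("  ", row)
```

In this toy case a large off-diagonal entry marks a pair of channels that respond together at that rate, and entries involving a silent channel are zero. This pairwise structure is what the decomposition stage in the caption operates on.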