A New Approach to Model Pitch Perception Using Sparse Coding

Fig 9

Resolved vs. unresolved representation of harmonic cues.

(A) The solutions hk, k ∈ [1, 5], for the stimuli of Eq 7. We compare the SCs of the two dictionaries, Dsine (lines) and Dstack (dashed lines). Dsine consists of tone-atoms and Dstack consists of complex tones that contain six harmonics with decreasing amplitudes (1 to 1/6). All stimuli contain four harmonics of the same fundamental frequency, f0 = 433 Hz, but at different spectral locations (r ∈ {1, 6, 10, 17, 22}). The x-axis is normalized by f0 for convenience. The correlation between the SC solutions and the stimuli's spectral components (Eq 7) is apparent. Note that signals with low-frequency components (such as h1) have more prominent nonzero coefficients than those with higher harmonics (e.g., h5). A closer look at h5 (the inset) shows that only two of the four harmonics are successfully reconstructed (the 23rd and 24th of harmonics 22–25). (B) Pitch probabilities (pdfs) for the five complex tones for Dsine (see text). The right panel shows all fp frequencies and the left panel shows fewer octaves around f0. The numbers above the curves mark the four most prominent peaks of each pdf, from the highest (1) to the fourth highest. Observe that all five solutions peak at the first harmonic; that is, the model predicts the same 433 Hz pitch for all stimuli. Additionally, most of the other plausible pitches (the other peaks) are located at harmonic ratios of f0; that is, they represent octave-equivalence options. It is also instructive to note the fLOCUS frequencies in the right panel of (B). These peaks indicate the additional possibility of perceiving the pitch at the locus of the stimulus's spectral energy rather than at f0 [1]. All simulations were performed with Slaney's model and with a sound level of 45 dB SPL.
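
To make the stimulus layout in (A) concrete, the following minimal sketch lists the harmonic numbers and frequencies of the five complex tones. It rests on assumptions that are not spelled out in this caption: each stimulus is taken to consist of four consecutive harmonics starting at r (consistent with h5 spanning harmonics 22–25), and the Dstack amplitudes are taken to follow a 1/m profile. Eq 7 itself is not reproduced here, and all function names are hypothetical.

```python
F0 = 433.0                             # fundamental frequency in Hz (from the caption)
LOWEST_HARMONICS = [1, 6, 10, 17, 22]  # r values (from the caption)
N_COMPONENTS = 4                       # four harmonics per stimulus (from the caption)


def stimulus_harmonics(r, n=N_COMPONENTS):
    """Harmonic numbers r, r+1, ..., r+n-1 (assumed to be consecutive)."""
    return list(range(r, r + n))


def stimulus_frequencies(r, f0=F0, n=N_COMPONENTS):
    """Component frequencies in Hz for the stimulus whose lowest harmonic is r."""
    return [m * f0 for m in stimulus_harmonics(r, n)]


def stack_atom_amplitudes(n_harmonics=6):
    """Assumed 1/m amplitude profile for a six-harmonic Dstack atom (1 to 1/6)."""
    return [1.0 / m for m in range(1, n_harmonics + 1)]


if __name__ == "__main__":
    for k, r in enumerate(LOWEST_HARMONICS, start=1):
        harmonics = stimulus_harmonics(r)
        freqs = stimulus_frequencies(r)
        # Harmonic numbers double as the x-axis values normalized by f0.
        print(f"h{k}: harmonics {harmonics}, frequencies (Hz) {freqs}")
```

Because the x-axis in (A) is normalized by f0, the harmonic numbers printed above are also the positions at which nonzero coefficients would be expected for Dsine.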

doi: https://doi.org/10.1371/journal.pcbi.1005338.g009