A New Approach to Model Pitch Perception Using Sparse Coding

doi:10.1371/journal.pcbi.1005338

Fig 7

Comparing the performance of different dictionaries over moderate and high amplitude stimulus levels.

All simulated stimuli share the same spectral structure (Eq 6). This structure is simulated for various fundamental frequencies, f₀, and the panels show the estimated pitch in each case (i.e., the maximum peak in each pdf). The estimates are taken from an interval of ±0.5 octaves around f₀. The rows, panels A-B and panels C-D, show the estimates of the SC model for the dictionaries D_sine and D_stack, respectively (see text). The columns correspond to different stimulus levels: moderate (45 dB SPL) and high (90 dB SPL). The x-axis gives the frequency of the lowest harmonic present in the stimulus (i.e., the 3rd harmonic); the thick black dashed lines mark the main octave (f₀), and the thin black dashed lines mark the lower and upper octaves, i.e., 0.5f₀ and 2f₀, respectively. (A-B) At low frequencies, up to about 4 kHz for the lowest harmonic of the complex stimulus, the estimates of the D_sine dictionary converge to the expected frequencies at both moderate and high stimulus levels. However, from 4 kHz upward, the pitch estimates at the high stimulus level diverge from the main octave to other ratios of f₀. (C-D) The pitch estimates of the D_stack dictionary converge to the main octave more reliably, at both low and high frequencies and at both amplitudes.
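The estimation rule described in the caption (take the maximum peak of the pdf within ±0.5 octaves of f₀) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `estimate_pitch`, the frequency grid, and the synthetic two-peak pdf are all assumptions made for the example.

```python
import math


def estimate_pitch(freqs, pdf, f0, half_width_octaves=0.5):
    """Return the frequency of the largest pdf value within
    +/- half_width_octaves of f0 (hypothetical helper)."""
    lo = f0 * 2.0 ** (-half_width_octaves)  # lower edge of the search window
    hi = f0 * 2.0 ** half_width_octaves     # upper edge of the search window
    # Keep only the candidate frequencies that fall inside the window.
    candidates = [(f, p) for f, p in zip(freqs, pdf) if lo <= f <= hi]
    if not candidates:
        raise ValueError("no candidate frequencies inside the search window")
    # The estimate is the frequency at which the pdf is largest.
    best_f, _ = max(candidates, key=lambda fp: fp[1])
    return best_f


# Synthetic example (assumed data): a 1 Hz grid from 100 Hz to 400 Hz with
# a peak at 200 Hz and a larger peak one octave higher, at 400 Hz.
freqs = [100.0 + i for i in range(301)]
pdf = [
    math.exp(-0.5 * ((f - 200.0) / 5.0) ** 2)
    + 2.0 * math.exp(-0.5 * ((f - 400.0) / 5.0) ** 2)
    for f in freqs
]

estimate = estimate_pitch(freqs, pdf, f0=200.0)
print(estimate)
```

In this sketch the larger peak at 400 Hz lies outside the window around f₀ = 200 Hz, so it does not affect the estimate; widening `half_width_octaves` to 1.0 would let the octave peak win instead.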

doi: https://doi.org/10.1371/journal.pcbi.1005338.g007