Objects and events in the sensory environment are generally predictable, making most of the energy impinging upon sensory transducers redundant. Given this fact, efficient sensory systems should detect, extract, and exploit predictability in order to optimize sensitivity to less predictable inputs that are, by definition, more informative. Not only are perceptual systems sensitive to changes in physical stimulus properties, but growing evidence reveals sensitivity both to relative predictability of stimuli and to co-occurrence of stimulus attributes within stimuli. Recent results revealed that auditory perception rapidly reorganizes to efficiently capture covariance among stimulus attributes. Acoustic properties per se were perceptually abandoned, and sounds were instead processed relative to patterns of co-occurrence. Here, we show that listeners’ ability to distinguish sounds from one another is driven primarily by the extent to which they are consistent or inconsistent with patterns of covariation among stimulus attributes and, to a lesser extent, whether they are heard frequently or infrequently. When sounds were heard frequently and deviated minimally from the prevailing pattern of covariance among attributes, they were poorly discriminated from one another. In stark contrast, when sounds were heard rarely and markedly violated the pattern of covariance, they became hyperdiscriminable with discrimination performance beyond apparent limits of the auditory system. Plausible cortical candidates underlying these dramatic changes in perceptual organization are discussed. These findings support efficient coding of stimulus statistical structure as a model for both perceptual and neural organization.
Citation: Stilp CE, Kluender KR (2016) Stimulus Statistics Change Sounds from Near-Indiscriminable to Hyperdiscriminable. PLoS ONE 11(8): e0161001. https://doi.org/10.1371/journal.pone.0161001
Editor: Maurice J. Chacron, McGill University Department of Physiology, CANADA
Received: April 7, 2016; Accepted: July 28, 2016; Published: August 10, 2016
Copyright: © 2016 Stilp, Kluender. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: This research was partially supported by grants from the National Institute on Deafness and Other Communication Disorders to the first (F31 DC009532) and second (RC1 DC010601) authors. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Objects and events in the sensory environment are highly predictable, making most of the energy impinging upon sensory transducers redundant. According to the Efficient Coding Hypothesis [1–2], the role of early sensory processing is to detect, extract, and exploit predictability in the input. An efficient sensory system should not only weaken its response to frequent or expected stimuli, but also produce a stronger response to infrequent or novel stimuli. Seizing upon predictability in the environment optimizes sensitivity to unpredictability, that is, to informative change that facilitates adaptive behavior.
Animal and human studies alike reveal heightened sensitivity to infrequent (less predictable) stimuli. Single-unit physiological studies demonstrate increased neural firing in response to a low-probability change in the stimulus, known as stimulus-specific adaptation (SSA; inferior colliculus: [5–6]; thalamus: [7–8]; cortex: [9–11]). Similar (but not identical) mechanisms are reported at the neural population level in the event-related cortical potential termed the mismatch negativity response (MMN; [12–15]). In both cases, unpredictable (‘deviant’) inputs elicit higher firing rates or larger evoked responses than predictable (‘standard’) inputs. Sensitivity to stimulus novelty extends to behavior as well, as discrimination is superior for rarely presented sounds .
While widely studied, probability of occurrence is only one form of predictability in the environment; others include covariance among stimulus features and conditional and transitional probabilities across time. Additionally, while natural sounds are typically complex and vary along a multitude of physical dimensions, stimuli used in the foregoing studies were relatively simple sounds that varied along a single physical dimension. Most natural signals are composed of multiple attributes that covary in ways that reflect a structured world. For example, many acoustic attributes of speech sounds covary with one another in ways that reflect constraints on vocal tracts, and this redundancy provides impressive perceptual resilience to signal distortion [17–22].
Covariance among stimulus properties has dramatic consequences for perceptual organization. For example, a lifetime of experience with robust covariance between binocular disparity and texture leads to these cues functioning as the single dimension of perceived slant. Perceptual reorganization to efficiently capture covariance among attributes of novel sounds is sufficiently robust to develop within minutes of hearing them [24–26]. When presented with a range of novel complex sounds with near-perfectly redundant acoustic properties, discrimination performance was best predicted by whether stimulus differences adhered to or violated the main pattern of covariance among stimulus attributes (i.e., according to shared versus unshared covariance). As evidence of perceptual reorganization, sounds that were consistent with the main pattern of covariance remained discriminable, but sounds that modestly violated this pattern were poorly discriminated despite all stimuli being matched for equivalent psychoacoustic differences. Values for individual stimulus dimensions were not atypical; only their combinations varied in probability.
To the extent that enhancing transmission of information increases efficiency of sensorineural systems, heightened detection is expected both for changes from predictable occurrences of stimuli and for changes from predictable co-occurrences of stimulus attributes. However, while large unidimensional physical deviations perceptually ‘pop out’, nothing is known about perception of large deviations from statistical context defined by covariance among attributes. Here, we investigate whether increasingly large deviations from experienced patterns of covariance receive privileged perceptual processing like that demonstrated for deviations (i.e., novelty) from predictable presentations of simple sounds. Magnitudes of novelty responses increase with increasing unidimensional dissimilarity between ‘standard’ and ‘deviant’ sounds [9,15]. Do complex sounds with properties that are increasingly statistically dissimilar become better discriminated?
The present experiments employed novel complex sounds (Fig 1) to explore perceptual organization based upon both lower-order (probability of occurrence) and higher-order statistical properties (covariance among stimulus attributes). We hypothesized that by making stimuli increasingly unpredictable, both by decreased probability of occurrence and larger violations of covariance among acoustic attributes, they would become more discriminable. Discriminability improved with larger violations of the principal pattern of covariance among attributes, demonstrating a close relation between perceptual organization and experienced statistics of the sensory environment. When sounds were infrequent and were extreme violations of predictable patterns of covariance, they became hyperdiscriminable with perceptual performance beyond apparent limits of the auditory system (i.e., discrimination performance based on acoustic differences alone).
Each circle represents one stimulus; different subsets from this matrix were presented in each experiment. Corner stimuli are replaced by spectrograms (500-ms abscissa, 10 kHz ordinate) to illustrate variation in Spectral Shape and Attack/Decay. Covariance between these properties occurs along either the Consistent statistical dimension (blue line) or the Orthogonal dimension (red line). Each experiment was counterbalanced such that half of listeners heard Consistent stimuli along the blue vector and Orthogonal stimuli along the red vector, while the other half heard Consistent stimuli along the red vector and Orthogonal stimuli along the blue vector.
The first question at test is how perception organizes in response to deviations of increasing magnitude from the principal pattern of covariance among stimulus attributes. In the previously reported reference experiment, when deviations were very small (i.e., minimal violations of the pattern of covariance supported by Consistent sounds), listeners were nearly unable to discriminate Orthogonal sounds, with performance falling to near-chance levels (mean proportion of pairs correctly discriminated = 0.600, standard error of the mean [s.e.] = .033; compared to mean accuracy for Consistent pairs = 0.670, s.e. = .014; Z = 2.527, P = .011; Fig 2A). This difference disappeared with further testing (Block 2: Consistent mean = 0.681, s.e. = .016, Orthogonal mean = 0.634, s.e. = .030; Block 3: Consistent mean = 0.687, s.e. = .016, Orthogonal mean = 0.647, s.e. = .033). Here, we manipulated shared and unshared covariance by positioning Orthogonal sound pairs at increasing distances away from Consistent stimuli on the diagonal of the stimulus matrix. This systematically increased the amount of unshared covariance in the stimuli, making pairs increasingly statistically deviant.
Figures plot mean accuracy for discriminating pairs of Consistent (blue) or Orthogonal sounds (red) as a function of testing block for each experiment. Insets depict stimulus matrices to indicate which stimuli were tested in each block of each experiment. Half of the participants in each experiment heard stimuli as depicted while the other half heard counterbalanced stimuli rotated 90°. Rows are arranged according to statistical properties of Orthogonal sounds (red text) indicating the extent to which they violated the prevailing pattern of covariance supported by the Consistent sounds, increasing progressively from Minimal Dissimilarity (top row; inferior discrimination) to Extreme Dissimilarity (bottom row; superior discrimination). Major columns indicate frequency of presentation for Consistent and Orthogonal sound pairs: equally often (left column) or Orthogonal sounds withheld until the third testing block (right column). Dashed lines represent baseline performance when acoustic dimensions shared zero redundancy (mean accuracy = 0.690 ); significant improvement beyond baseline performance in Experiment 5 indicates hyperdiscriminability. Asterisks indicate statistically significant differences; *P < .05, **P < .01, ***P < .001. Error bars indicate standard error of the mean.
As the magnitudes of statistical deviations increased, discrimination of those sounds improved from being comparable (Experiment 1 Block 1: Consistent mean = 0.649, s.e. = .021, Orthogonal mean = 0.653, s.e. = .036; Block 2: Consistent mean = 0.648, s.e. = .021, Orthogonal mean = 0.653, s.e. = .027; Block 3: Consistent mean = 0.656, s.e. = .022, Orthogonal mean = 0.666, s.e. = .036; Fig 2B) to better than that for Consistent sounds (Experiment 2 Block 1: Consistent mean = 0.617, s.e. = .017, Orthogonal mean = 0.656, s.e. = .035; Block 2: Consistent mean = 0.631, s.e. = .018, Orthogonal mean = 0.678, s.e. = .038; Block 3: Consistent mean = 0.632, s.e. = .019, Orthogonal mean = 0.694, s.e. = .032; Z = 2.292, P = .022; Fig 2C; S1 Table). Superior discrimination of maximally statistically deviant Orthogonal sounds persisted throughout Experiment 3 (Block 1: Consistent mean = 0.628, s.e. = .016, Orthogonal mean = 0.703, s.e. = .029 [Z = 2.945, P = .003]; Block 2: Consistent mean = 0.652, s.e. = .021, Orthogonal mean = 0.725, s.e. = .029 [Z = 2.972, P = .003]; Block 3: Consistent mean = 0.665, s.e. = .021, Orthogonal mean = 0.734, s.e. = .035 [Z = 2.622, P = .009]; Fig 2D).
The second question at test is whether enhanced processing of unexpected (infrequent) occurrences extends beyond single acoustic dimensions to derived perceptual dimensions capturing patterns of covariance between stimulus attributes. Two experiments introduced manipulation of surprisal [27–28] by withholding presentation of Orthogonal sound pairs until the third and final testing block. These unexpected Orthogonal sound pairs deviated from the main pattern of covariance by either minimal (Experiment 4) or maximal amounts (Experiment 5). When sounds were unexpected but minimally deviant in terms of covariance, they were discriminated modestly worse than Consistent sound pairs (Consistent mean = 0.663, s.e. = .017, Orthogonal mean = 0.628, s.e. = .030; related-samples Wilcoxon signed-rank test: Z = 1.371, P = .170; Fig 2E), similar to when these sounds were presented as frequently as other sounds throughout the experiment (Fig 2A).
Conversely, highly statistically deviant sounds that were both unexpected and extreme violations of feature covariance were discriminated extremely well (mean = 0.795, s.e. = .028). Performance was significantly better than: Consistent sounds (mean = 0.690, s.e. = .018; related-samples Wilcoxon signed-rank test: Z = 3.650, P = .0003; Fig 2F); the same Orthogonal sounds with exposure equal to that for other sounds (Experiment 3 Block 1: mean = 0.703, s.e. = .028; one-tailed Mann-Whitney U test: U = 2.200, P = .014; Fig 2D); and most importantly, baseline performance when stimuli do not share redundant attributes (one-sample one-tailed Wilcoxon signed-rank test against mean discrimination accuracy of 0.690: Z = 2.590, P = .005; dashed line in Fig 2F). Highly statistically deviant sounds were hyperdiscriminable with performance beyond apparent limits of auditory perception. Deferred presentation alone did not contribute to the hyperdiscriminability observed in Experiment 5, as Orthogonal trials in the final testing block of Experiment 4 were discriminated less accurately than Orthogonal trials in the final block of Experiment 5 (two-tailed Mann-Whitney U test: U = 3.724, P = .001). Relative predictability, by simple probability of occurrence and probability of co-occurrence between stimulus attributes, has dramatic consequences for perceptual organization, rendering sounds from near-indiscriminable to hyperdiscriminable.
Principal components analysis (PCA) has reliably predicted discriminability on the basis of patterns of covariance between stimulus attributes [24, 26]. This same approach was used to predict behavioral performance in the present experiments. Values of Spectral Shape (SS) and Attack/Decay (AD) were coded as ordered pairs from 1 to 18 to indicate their positions along each axis of the stimulus matrix. These ordered pairs were arranged into matrices to represent the stimuli presented in each experiment. For example, stimuli in Experiment 1 were coded as follows: (1,1) to (18,18) for the Consistent stimuli, and (5,14) and (8,11) for the Orthogonal stimuli (see Fig 2B). This coding was repeated three times to represent stimuli being tested in three consecutive experimental blocks. A covariance matrix was computed on this stimulus list using the cov command in MATLAB (see Table 1 for covariance matrices for Experiments 1–5). Eigenvalues from PCA were calculated on these covariance matrices using the eig command in MATLAB (S2 Table). Experiment 1 from  (Fig 2A) served as a reference point, with substantial covariance along the Consistent dimension (λ1 = 49.27) and minimal covariance along the Orthogonal dimension (λ2 = 0.46). Increasingly eccentric Orthogonal stimuli in Experiments 1–3 progressively increased the second Eigenvalue (Experiment 1: λ2 = 2.11, Experiment 2: λ2 = 7.05, Experiment 3: λ2 = 9.43), but presentation of the same Consistent stimuli resulted in an unchanged first Eigenvalue. Experiments 4 and 5 required a slightly modified approach as stimuli were no longer tested equally often. Therefore, ordered pairs representing the 18 Consistent stimuli were repeated three times (again to represent testing in all three experimental blocks) while ordered pairs representing the two Orthogonal stimuli were included only once (to represent testing in the third block alone). 
This marginally increased the first Eigenvalue (λ1 = 52.85) and decreased the second Eigenvalue relative to repeated presentations of the same stimuli (Experiment 4: λ2 = 0.16, compared to λ2 = 0.46 in ; Experiment 5: λ2 = 3.60, compared to λ2 = 9.43 in Experiment 3).
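The coding scheme described above can be sketched in a few lines. This is a minimal Python illustration, not the authors' MATLAB code. It assumes that the sample covariance uses the n − 1 normalization (the default behavior of MATLAB's cov) and substitutes the closed-form eigenvalues of a symmetric 2 × 2 matrix for the eig command. All function and variable names are illustrative, and only the Experiment 1 coordinates stated in the text are used.

```python
import math

def covariance_matrix(points):
    """Sample covariance (n - 1 normalization) of a list of (SS, AD) pairs.

    Returns the three distinct entries of the symmetric 2 x 2 matrix.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    return sxx, sxy, syy

def eigenvalues_2x2(sxx, sxy, syy):
    """Closed-form eigenvalues (largest first) of [[sxx, sxy], [sxy, syy]]."""
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    return half_trace + radius, half_trace - radius

# Experiment 1 coding as described in the text.
consistent = [(i, i) for i in range(1, 19)]   # (1,1) ... (18,18)
orthogonal = [(5, 14), (8, 11)]

# Every stimulus is tested in all three blocks, so the list is repeated 3 times.
stimuli = (consistent + orthogonal) * 3

lambda1, lambda2 = eigenvalues_2x2(*covariance_matrix(stimuli))
print(round(lambda1, 2), round(lambda2, 2))
```

Under these assumptions the sketch yields values that round to the first and second Eigenvalues reported for Experiment 1 (49.27 and 2.11). For the unequal presentation scheme of Experiments 4 and 5, the list would instead be built as `consistent * 3 + orthogonal`, with that experiment's Orthogonal coordinates substituted.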
Previous experiments tested discriminability of Orthogonal sounds that deviated only modestly from the Consistent sounds, reflected by very small second Eigenvalues (length of second Eigenvector) of the covariance matrix. With relatively short Eigenvectors, larger second Eigenvalues corresponded to a decrease in the advantage in discriminability for Consistent versus Orthogonal sound pairs. Across wide differences in stimulus selection , as relatively more covariance lay along the Orthogonal dimension, Orthogonal sound pairs were discriminated increasingly well relative to Consistent pairs, approaching parity.
The same PCA model predicts that, beyond the range tested, discriminability of Orthogonal stimuli should improve as the length of the second Eigenvector is further increased (larger Eigenvalue). For larger second Eigenvalues, PCA predicts that discriminability of Orthogonal pairs should exceed that for Consistent pairs even approaching hyperdiscriminability, and that prediction is tested here.
The relationship between stimulus statistics and behavioral performance was assessed via linear regression (S3 Table). The second Eigenvalue of the covariance matrix of experimental stimuli (λ2) served as the predictor variable, and effect size (Cohen’s d, comparing mean discriminability of Consistent versus Orthogonal sound pairs; averaged across testing blocks) was the outcome variable. Fig 3 shows the regression across the present experiments (squares) as well as related experiments using the same stimuli (, triangles; , circles). Across all experiments in which all stimuli were tested equally often ([24,26], Experiments 1–3 here), stimulus statistics were highly correlated with behavioral performance (R = –0.871, P = .001).
Covariance along the Orthogonal dimension in each experiment (as measured by the second Eigenvalue of the covariance matrix of tested stimuli, λ2) is along the abscissa, and effect size (Cohen’s d, calculated as the difference in mean discriminability between Consistent and Orthogonal stimuli, each averaged across experimental blocks) is along the ordinate. Positive values along the ordinate indicate Consistent stimuli were better discriminated than Orthogonal stimuli, while negative values indicate Orthogonal stimuli were better discriminated. Results from the present report are plotted as squares with each experiment labeled individually. Results from  are plotted as triangles, and results from  are plotted as circles. Experiment 1 from , which is included in Fig 2A as a point of reference, is the upper-leftmost circle, which is also labeled. The solid line is the linear regression fit. Increasing covariance along the Orthogonal dimension clearly results in those stimuli being better-discriminated, but results from Experiment 5 are an outlier such that rare, extreme deviations from stimulus statistics are discriminated far better than predicted by covariance alone.
Discrimination of Orthogonal pairs was relatively poor when acoustic attributes shared relatively little covariance (smaller λ2, positive effect sizes indicating Consistent stimuli were discriminated more accurately), but discriminability improved as Orthogonal stimuli conveyed greater covariance (larger λ2, negative effect sizes indicating Orthogonal stimuli were discriminated more accurately).
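The sign convention of the effect size can be illustrated from the summary statistics reported above. The sketch below is hedged in several ways: it uses one common pooled-standard-deviation form of Cohen's d, recovers standard deviations from the reported standard errors by assuming n = 40 listeners, and uses a single block (Experiment 3, Block 1) rather than an average across blocks. The values in S3 Table may therefore differ, and the function name is illustrative.

```python
import math

def cohens_d_from_summary(mean_a, se_a, mean_b, se_b, n):
    """Cohen's d for condition A minus condition B from means and standard errors.

    Assumes both conditions have n observations and pools the two
    standard deviations with equal weight (an assumption, not the
    paper's documented procedure).
    """
    sd_a = se_a * math.sqrt(n)   # standard error -> standard deviation
    sd_b = se_b * math.sqrt(n)
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd

# Experiment 3, Block 1 as reported in the text:
# Consistent mean = 0.628 (s.e. = .016), Orthogonal mean = 0.703 (s.e. = .029).
d_exp3_block1 = cohens_d_from_summary(0.628, 0.016, 0.703, 0.029, n=40)

# Negative d means Orthogonal pairs were discriminated more accurately.
print(d_exp3_block1 < 0)
```

Computing Consistent minus Orthogonal keeps the convention used in Fig 3: positive values favor Consistent pairs and negative values favor Orthogonal pairs.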
A second regression was conducted across all experiments, irrespective of whether all stimuli were tested equally often in an experiment or not. Inclusion of Experiments 4 and 5 in the regression markedly decreased correlation strength (R = –0.663, P = .019). While the regression is still statistically significant with performance in Experiment 4 adhering to the trend, results from Experiment 5 are a distinct outlier. The prediction error (squared residual) for this result is more than six times larger than any other prediction error in the analysis. While PCA predictions are consistent with trends across equivalent presentation formats, hyperdiscriminability discovered with late-appearing stimuli cannot be predicted by covariance alone, and instead requires inclusion of other stimulus statistics (frequency of occurrence; i.e., rarity).
Perception warped to capture stimulus statistical structure to an extreme not previously observed. Violating covariance between acoustic dimensions in complex sounds had profound effects on stimulus discriminability, ultimately resulting in hyperdiscriminability when presentations were deferred until the last block of presentations. Only one prior study reported very modest effects of violating a learned relationship between simple acoustic dimensions (frequency, intensity) in tone stimuli. Simpson and colleagues reported improved discrimination of noise bursts with rarely presented amplitudes or interstimulus intervals, and this improvement required sufficient acoustic dissimilarity to frequent sounds. Unlike previous work, individual values of physical dimensions AD and SS in the present study were never exceptional, as stimuli were distinct only with respect to co-occurrences of values of AD and SS. Discrimination of extremely deviant Orthogonal sounds improved when they were rare (Experiment 5), but rarity alone was not sufficient: rare Orthogonal sounds that were only minimally statistically dissimilar to frequently heard Consistent sounds showed no such improvement (Experiment 4).
Neural novelty response magnitudes increase with increasing acoustic dissimilarity between ‘standards’ and ‘deviants’ [9,15]. Here, ‘deviant’ Orthogonal sounds were better discriminated with increasing statistical dissimilarity relative to the main pattern of covariance (‘standard’ Consistent sounds). Experimental methods vary widely across physiological, electrophysiological, and behavioral studies, but all results highlight general principles of novelty detection in response to changes from physical contexts and particularly statistical contexts in the present studies.
Past [24–26] and present results are consistent with the principle of non-isomorphism  whereby neural representations of sensory input along ascending neural pathways decreasingly resemble the input and better correspond to functionally significant stimulus properties. Neural coding becomes more statistically independent  and better captures emergent properties at higher levels . Examples of non-isomorphic representations in auditory cortex include encoding spectral shape across varying absolute frequencies , relative changes in faster versus slower click trains [34–35], and relationships across frequency components instead of individual components [36–37]. Here, perceptual performance is predicted by statistical relationships between stimulus attributes while physical acoustic dimensions appear to be abandoned. While non-isomorphic transformations do not exclude parallel representations that more closely resemble physical stimulus properties (isomorphism ), present results reveal that relationships between acoustic dimensions are primary determinants of perceptual performance–not the acoustic dimensions themselves.
The present findings have special relevance for speech perception. Speech sounds are famously rich with statistical structure , and extracting stable relationships from highly variable inputs is critical to high-level perceptual processing including speech perception . Multiple acoustic dimensions covary in adherence with lawful constraints upon vocal tracts [21,38]. For example, vowel sounds are well-characterized by peaks in the frequency spectrum (formants) which correspond to resonances in the vocal tract. As vocal tract length decreases systematically across adult men, adult women, and child talkers, laws of physical acoustics compel formant frequencies to increase proportionately. This relationship captures over 75% of variability in vowel productions across men, women, and children . Reliable covariance between stimulus attributes has been proposed to underlie categorization in general  and contribute to categorical perception of complex sounds including speech [18,40].
Many have argued that probability of presentation is fundamental to perception and to categorization of complex sounds such as speech [40–43], even suggesting that, at best, other statistical regularities play secondary roles. Here, performance was far better (but not exclusively) explained by covariance among stimulus properties. Discrimination of Orthogonal sounds improved as their statistical dissimilarity increased when probability of presentation was held constant (Experiments 1–3). Conversely, discriminability of minimally deviant Orthogonal sounds was similar whether they were tested one-third as often (Experiment 4) or one, three, or ten times as often as each Consistent sound pair. Finally, discriminability of maximally deviant Orthogonal sounds was enhanced when they were tested less frequently (Experiment 5). Results require integration of probability of occurrence and patterns of covariance for perception, but with far greater importance attributed to covariance.
Stilp and colleagues  tested three simple connectionist models of neural organization to better understand effects of covariance among stimulus attributes when digression from the principal covariance was modest. A Hebbian  neural network model captured early aspects of listener performance, but predictably failed to adjust over time due to lack of inhibitory connections. An anti-Hebbian model  failed because it predicted enhanced discrimination of all violations of covariance, even modest violations for which decreased discriminability was observed. Closed-form PCA successfully predicted results from a wide range of experiments including Experiments 1–4 here. However, neither the closed-form nor connectionist implementation of PCA predicted the hyperdiscriminability observed in Experiment 5. This effect required that stimuli be unexpected due to lack of prior occurrence. As in everyday perception, perceptual organization reflects contributions of multiple concurrent statistical properties, and cannot be fully described by a single property.
Escera and Malmierca  proposed that the auditory system is hierarchically organized for novelty detection, with more complex levels of regularity encoded at higher levels of processing. Similarly, Kluender and Alexander  argued that processing of complex sounds is a progression of increasingly sophisticated processes for extracting predictable patterns, with hierarchical processing being a necessary consequence of successive relatively independent (efficient) representations. The neural locus or loci responsible for the present results remains an open question, but some neural observations are suggestive. Previous successes of a connectionist implementation of PCA  to predict results depend on inhibitory circuits from the output layer to input layers. In the microcircuitry across layers within cortical columns, such inhibitory signals may be provided in a fashion similar to that proposed to support predictive coding . Less locally, required inhibitory circuitry may be provided within hierarchical auditory cortical regions, which extend from primary auditory cortex (AI) to belt areas to more lateral parabelt regions in a third stage of cortical processing . While AI is responsive to most sounds, responses later in the auditory hierarchy are selective for more complex stimuli, such as band-limited noise and frequency-modulated sweeps in belt areas [51–53] and species-specific vocalizations such as human speech in parabelt areas .
Three important characteristics of cortical novelty responses make cortex an attractive neural locus for the observed behavioral results. First, acoustic similarity plays a larger role in cortical SSA than does simple probability. High acoustic similarity between standard and deviant stimuli extinguishes SSA despite extreme differences in probability of presentation (9:1 standard:deviant ratio). Here, statistical similarity (as defined by patterns of covariance) influenced stimulus discriminability far more than probability of presentation. Second, SSA in primary auditory cortex has been reported for complex sounds such as frozen noise and speech, offering some potential for SSA extending to more complex stimuli that are defined by predictable statistical structure. Third, the amplitude of the MMN response (generated in auditory cortex) increases with more repetitions of the standard stimulus before presenting the deviant. Discriminability of maximally deviant Orthogonal sounds in Experiment 5 was enhanced following two blocks of Consistent-only testing, resulting in superior performance compared to the beginning of Experiment 3 when presentation of these sounds was not delayed. These promising parallels raise the possibility of “statistic-specific adaptation”, where stimulus discriminability is modulated by statistical relations among acoustic properties and not the properties (or specific stimuli) themselves. However, physiological investigations are needed in order to substantiate generalization from behavioral data.
Contemporary investigations of efficient coding [1–2] explore the statistics of natural stimuli and ways through which sensory systems capture this structure [56–58]. Here, principles of efficient coding captured dramatic changes in perceptual organization that reflected statistical properties of acoustic inputs, ultimately resulting in hyperdiscriminability. Results suggest efficient coding to be an underlying principle for both neural and perceptual organization.
Materials and Methods
All listeners provided written informed consent under protocols approved by the Institutional Review Board of the University of Wisconsin.
One hundred ninety-nine undergraduate students participated in exchange for course credit (40 each in Experiments 1–4, 39 in Experiment 5). All self-reported normal hearing, and no one participated in more than one experiment.
One waveform period (3.78 ms duration = 264 Hz fundamental frequency) was excised from recordings of a French horn and a tenor saxophone in the McGill University Music Database . Pitch periods were iterated to 500-ms duration and matched in RMS energy. Attack/Decay (AD) was defined as the linear amplitude increase from zero at onset to peak amplitude (attack) before linear decrease to zero at offset (decay) without steady state. Attack durations were varied in eight 10-ms steps from 20 to 100 ms, and from 100 to 390 ms in nine equal logarithmic steps. Decays were 500 ms (total duration) minus attack duration. Spectral Shape (SS), defined as relative levels of energy across frequencies, varied via 18 summations of the two instrument endpoints in different proportions, ranging from 0.2 to 0.8 and summing to 1 across instruments. Mixture proportions were derived according to Euclidean distances between equivalent-rectangular-bandwidth-scaled  spectra processed by simulated auditory filters . All stimulus processing was conducted in MATLAB. Human speech and musical instruments naturally vary in AD and SS, which are relatively independent both perceptually and in early neural encoding .
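The attack and decay schedule described above can be sketched as follows. This Python illustration is not the authors' MATLAB processing. It assumes that "equal logarithmic steps" means a constant ratio between successive attack durations and that the two ranges share the 100-ms value, which gives 18 attack durations in total, matching the 18 positions along each axis of the stimulus matrix. Variable names are illustrative.

```python
TOTAL_DURATION_MS = 500.0

# Eight 10-ms steps from 20 to 100 ms give nine values: 20, 30, ..., 100.
linear_attacks = [20.0 + 10.0 * k for k in range(9)]

# Nine equal logarithmic steps from 100 to 390 ms: a constant ratio r
# with 100 * r**9 == 390 (assumed interpretation of "logarithmic steps").
ratio = (390.0 / 100.0) ** (1.0 / 9.0)
log_attacks = [100.0 * ratio ** k for k in range(1, 10)]

attacks_ms = linear_attacks + log_attacks

# Decay is the total duration minus the attack, with no steady state.
decays_ms = [TOTAL_DURATION_MS - attack for attack in attacks_ms]

for attack, decay in zip(attacks_ms, decays_ms):
    print(f"attack {attack:6.1f} ms, decay {decay:6.1f} ms")
```

Sharing the 100-ms endpoint between the linear and logarithmic ranges is a design choice of this sketch; it avoids duplicating that value while keeping the series monotonic.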
AD and SS were each exhaustively normed in two-alternative forced-choice (AXB) discrimination tasks until every pair of sounds separated by three stimulus steps was approximately equally discriminable for normal-hearing listeners. Dimensions were then fully crossed to create the stimulus matrix. A separate control study measured the discriminability of all stimulus pairs (separated by three stimulus steps along both AD and SS) along each main diagonal (red and blue lines in Fig 1). The result of this AXB discrimination control task was approximately equal discriminability across every pair of stimuli separated by a fixed distance (mean proportion correct = 0.690; ), thereby creating a perceptually linearized space. Experimental stimuli lay along either one main diagonal of the stimulus matrix, conforming to robust covariance between AD and SS (Consistent condition), or the perpendicular main diagonal (Orthogonal condition; see Fig 1).
Listeners discriminated sounds that were either Consistent with the main pattern of covariance between AD and SS or Orthogonal to this covariance. In each experiment, the vast majority of stimuli belonged to the Consistent condition (18 sounds, or 15 unique pairs of sounds) while a small number of stimuli formed the Orthogonal condition (two sounds, or one sound pair). In each case, sound pairs were separated by three stimulus steps along both AD and SS dimensions. Each trial presented one sound pair (either Consistent or Orthogonal) in a two-alternative forced-choice AXB triad with 250-ms ISIs. No feedback was provided regarding accuracy or whether Consistent or Orthogonal sounds were being presented. Within an experiment, each testing block consisted of either 128 trials (8 repetitions of each of the 15 Consistent sound pairs plus 8 repetitions of the one Orthogonal sound pair; Experiments 1–3 and final testing block of Experiments 4–5) or 120 trials (8 repetitions of the Consistent sound pairs only; first and second blocks in Experiments 4–5). Trials were tested in different random orders for each participant in each block.
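The pair and trial counts above follow directly from the design, as the short Python sketch below illustrates. The text does not specify how presentation order or the identity of X was assigned on each trial, so the random assignment here is an assumption, as are the function names and the placeholder labels `"O1"` and `"O2"` for the two Orthogonal sounds.

```python
import random

N_STEPS = 18     # sounds along the Consistent diagonal
SEPARATION = 3   # stimulus steps between the two members of a pair
REPS = 8         # repetitions of each pair per block

def consistent_pairs():
    """All pairs of diagonal positions separated by three stimulus steps."""
    return [(i, i + SEPARATION) for i in range(1, N_STEPS - SEPARATION + 1)]

def build_block(include_orthogonal, rng):
    """Return a shuffled list of AXB trials for one testing block."""
    pairs = [("Consistent", p) for p in consistent_pairs()]
    if include_orthogonal:
        pairs.append(("Orthogonal", ("O1", "O2")))  # placeholder labels
    trials = []
    for condition, (a, b) in pairs:
        for _ in range(REPS):
            # Assumption: order of the pair and identity of X are randomized.
            first, second = (a, b) if rng.random() < 0.5 else (b, a)
            x = rng.choice([first, second])
            trials.append({"condition": condition, "A": first, "X": x, "B": second})
    rng.shuffle(trials)
    return trials

block_with_orthogonal = build_block(True, random.Random(0))
block_consistent_only = build_block(False, random.Random(0))
```

With 15 Consistent pairs and 8 repetitions, a Consistent-only block contains 120 trials, and adding the single Orthogonal pair brings the block to 128 trials.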
Different subsets of this matrix were selected to define different degrees of shared versus unshared covariance between AD and SS. This was achieved by holding the Consistent dimension constant and selecting different pairs of Orthogonal sounds. In Stilp and Kluender  and Experiment 4, Orthogonal sounds were highly similar to the Consistent sounds by virtue of being positioned very close in the stimulus matrix (ordered pairs in Fig 2A and 2E: [8,11] and [11,8]). In Experiment 1, Orthogonal sounds were positioned slightly further away from Consistent stimuli (ordered pairs in Fig 2B: [5,14] and [8,11]). In Experiment 2, Orthogonal stimuli were positioned even further away (ordered pairs in Fig 2C: [2,17] and [5,14]). In Experiments 3 and 5, Orthogonal stimuli were positioned at the furthest distance possible from the Consistent stimuli in the stimulus matrix (ordered pairs in Fig 2D and 2F: [1,18] and [4,15]). Experiments were counterbalanced so half of listeners heard stimuli forming a positive correlation between AD and SS (as in Figs 1 and 2) while the other half heard stimuli forming a negative correlation (90° rotation of Figs 1 and 2). One group’s Orthogonal dimension was the other group’s Consistent dimension and vice versa, thus serving as its control and replication.
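One way to quantify how far each Orthogonal pair departs from the prevailing covariance is through the eigenvalues of the stimulus covariance matrix, the quantity tabulated in S2 Table. The Python sketch below is illustrative only. It assumes that Consistent stimuli occupy the diagonal positions (1,1) through (18,18), that each unique stimulus is counted once, and that the sample (n − 1) covariance is used. The authors' exact computation may differ, so the resulting values should not be read as those in S2 Table.

```python
import math

def covariance_2d(points):
    """Sample covariance terms (sxx, sxy, syy) for a list of (x, y) positions."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return sxx, sxy, syy

def eigenvalues_2x2(sxx, sxy, syy):
    """Closed-form eigenvalues (larger first) of a symmetric 2x2 matrix."""
    mean = (sxx + syy) / 2.0
    radius = math.hypot((sxx - syy) / 2.0, sxy)
    return mean + radius, mean - radius

# Assumption: Consistent stimuli lie on the main diagonal of the 18 x 18 matrix.
CONSISTENT = [(i, i) for i in range(1, 19)]

# Orthogonal pairs as listed in the text, from closest to farthest.
ORTHOGONAL = {
    "Fig 2A/2E": [(8, 11), (11, 8)],
    "Fig 2B": [(5, 14), (8, 11)],
    "Fig 2C": [(2, 17), (5, 14)],
    "Fig 2D/2F": [(1, 18), (4, 15)],
}

lambda2 = {}
for label, pair in ORTHOGONAL.items():
    lam1, lam2 = eigenvalues_2x2(*covariance_2d(CONSISTENT + pair))
    lambda2[label] = lam2  # variance along the Orthogonal dimension
```

Under these assumptions the second eigenvalue grows as the Orthogonal pair moves farther from the Consistent diagonal, which is the sense in which the experiments sample increasing amounts of unshared covariance.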
Listeners participated in single-subject soundproof booths. Stimuli were upsampled to 48,828 Hz, D/A converted (Tucker-Davis Technologies RP2), amplified (TDT HB4), and played diotically at 72 dB SPL over circumaural headphones (Beyer-Dynamic DT-150). Participants heard trials in different randomized orders and responded by pushing labeled buttons on response boxes. Stimulus pairs were tested eight times in each of three testing blocks. Experiments 1–3 consisted of 384 trials (3 blocks of 128), lasting approximately 30 minutes. Experiments 4–5 consisted of 368 trials (first two blocks had 120 trials [Consistent pairs only], third block had 128 trials [Consistent and Orthogonal pairs]), lasting approximately 28 minutes. Participants were provided brief breaks between blocks.
Listeners discriminated pairs of sounds that were either Consistent with or Orthogonal to the dominant pattern of covariance among acoustic attributes. Omnibus analyses (ANOVA, Friedman test) are likely to result in Type II error when Orthogonal discriminability returns to (Experiment 1 in , Fig 2A) or begins at (Experiment 1 here, Fig 2B) a level matching Consistent discriminability. Instead, planned contrasts were employed to retain sensitivity to differences within a given experimental block. The difference between Consistent and Orthogonal discrimination within a given block was required to exceed a threshold of 5% before conducting statistical analyses, because this threshold reliably indicates significant differences between conditions in a given block [24,26].
Shapiro-Wilk tests were conducted to assess the normality of distributions of mean discrimination scores for Consistent and Orthogonal conditions. Distributions of mean Orthogonal scores were not normal (i.e., produced statistically significant Shapiro-Wilk tests), indicating that nonparametric analyses were appropriate. Nonparametric tests were conducted on paired samples (two-tailed Wilcoxon signed-rank test [W] comparing Consistent and Orthogonal performance in an experiment), independent samples (one- or two-tailed Mann-Whitney U tests [U] comparing Orthogonal performance across experiments and thus across listener groups), or one sample (one-tailed Wilcoxon signed-rank test [W] comparing discriminability against baseline performance when acoustic dimensions share zero redundancy, where mean proportion of trials correct = 0.690; ). Corrections for multiple comparisons on a single data set were made using Holm’s [63] method.
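For readers unfamiliar with Holm's sequentially rejective procedure, a minimal Python sketch follows. It implements the standard step-down rule (the smallest p-value is compared against α/m, the next against α/(m − 1), and so on, stopping at the first failure). The function name and the example p-values are hypothetical and are not taken from the present data.

```python
def holm_reject(p_values, alpha=0.05):
    """Return a list of booleans: True where Holm's procedure rejects the null."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is tested against alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject

# Hypothetical p-values for three comparisons on one data set.
example = holm_reject([0.01, 0.04, 0.03], alpha=0.05)
```

In this hypothetical example only the first comparison survives correction: 0.01 falls below 0.05/3, but 0.03 exceeds 0.05/2, so testing stops there.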
S1 Table. Behavioral Results.
Mean discrimination accuracy for every listener in each experiment depicted in Fig 2. Within a given experiment, each row depicts performance for a given listener. Means are calculated for Consistent and Orthogonal trials in each testing block. Group means and standard errors of the mean (SE) appear at the top of each section.
S2 Table. Covariance matrices and Eigenvalues for experimental stimuli.
The leftmost column indicates the testing block (out of 3) in the experiment. For each experiment depicted in Fig 2, each stimulus is represented by the ordered pair indicating its position in the stimulus matrix, from (1,1) to (18,18). Within each experiment, the first column indicates position along the abscissa (Spectral Shape, SS) and the second column indicates position along the ordinate (Attack/Decay, AD). Within each testing block, Consistent stimuli are listed first and Orthogonal stimuli (when included) are listed second. Below these stimulus representations, the covariance matrix calculated on these stimuli is listed, followed by Eigenvalues of that covariance matrix. λ1 indicates the first Eigenvalue (corresponding to the Consistent dimension), and λ2 indicates the second Eigenvalue (corresponding to the Orthogonal dimension).
S3 Table. Predicting relative discriminability as a function of stimulus covariance.
For 12 experiments including those in the present report, stimulus Eigenvalue, block means, overall means, and overall standard deviations are provided for Consistent and Orthogonal conditions. The second-to-last column lists pooled standard deviations across Consistent and Orthogonal conditions. The final column lists Cohen’s effect size (d) for the difference in discriminating Consistent and Orthogonal stimuli (calculated as Consistent minus Orthogonal). Positive values indicate better performance when discriminating Consistent stimuli, and negative values indicate better performance when discriminating Orthogonal stimuli. The bottom of the table displays correlation coefficients between λ2 and effect size for the first 10 experiments listed (where Consistent and Orthogonal stimuli are tested in every block) and across all 12 experiments (including Experiments 4 and 5, where Orthogonal stimuli were not presented in the first two testing blocks).
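The effect size described for S3 Table can be illustrated with a brief Python sketch. The pooled standard deviation below uses the common degrees-of-freedom-weighted formula. Whether the authors pooled in exactly this way is not stated, so this is an assumption, and the function names and numerical inputs are hypothetical.

```python
import math

def pooled_sd(sd_a, n_a, sd_b, n_b):
    """Pooled standard deviation, weighting each variance by its degrees of freedom."""
    return math.sqrt(((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2))

def cohens_d(mean_consistent, mean_orthogonal, sd_pooled):
    """Cohen's d computed as Consistent minus Orthogonal, per the table's convention."""
    return (mean_consistent - mean_orthogonal) / sd_pooled

# Hypothetical values: Orthogonal pairs discriminated better than Consistent pairs.
sd = pooled_sd(0.10, 40, 0.10, 40)
d = cohens_d(0.60, 0.80, sd)  # negative d: better Orthogonal discrimination
```

With this sign convention, a negative d corresponds to the case in which Orthogonal stimuli are discriminated more accurately than Consistent stimuli.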
The authors are grateful to Edward Bartlett for insights concerning cortical and corticothalamic circuitry.
- Conceived and designed the experiments: CES KRK.
- Performed the experiments: CES.
- Analyzed the data: CES.
- Wrote the paper: CES KRK.
- Computational modeling: CES.
- 1. Attneave F. Some informational aspects of visual perception. Psych Rev 1954;61: 183–193.
- 2. Barlow HB. Possible principles underlying the transformations of sensory messages. In: Rosenblith WA, editor. Sensory communication. Cambridge: MIT Press, New York: Wiley; 1961. pp. 53–85.
- 3. Maravall M. Adaptation and sensory coding. In: Quiroga RQ, Panzeri S, editors. Principles of neural coding. Boca Raton (FL): CRC Press; 2013. pp. 357–373.
- 4. Kluender KR, Coady JA, Kiefte M. Sensitivity to change in perception of speech. Sp Comm 2003;41(1): 59–69.
- 5. Pérez-González D, Malmierca MS, Covey E. Novelty detector neurons in the mammalian auditory midbrain. Eur J Neurosci 2005;22(11): 2879–2885. pmid:16324123
- 6. Malmierca MS, Cristaudo S, Perez-Gonzalez D, Covey E. Stimulus-specific adaptation in the inferior colliculus of the anesthetized rat. J Neurosci 2009;29(17): 5483–5493. pmid:19403816
- 7. Antunes FM, Nelken I, Covey E, Malmierca MS. Stimulus-specific adaptation in the auditory thalamus of the anesthetized rat. PLoS One 2010;5(11): e14071. pmid:21124913
- 8. Anderson LA, Christianson GB, Linden JF. Stimulus-specific adaptation occurs in the auditory thalamus. J Neurosci 2009;29(22): 7359–7363. pmid:19494157
- 9. Ulanovsky N, Las L, Nelken I. Processing of low-probability sounds by cortical neurons. Nat Neurosci 2003;6: 391–398. pmid:12652303
- 10. Ulanovsky N, Las L, Farkas D, Nelken I. Multiple time scales of adaptation in auditory cortex neurons. J Neurosci 2004;24(46): 10440–10453. pmid:15548659
- 11. Nelken I, Yaron A, Polterovich A, Hershenhoren I. Stimulus-specific adaptation beyond pure tones. In: Moore BCJ, Patterson RD, Winter IM, Carlyon RP, Gockel HE, editors. Basic aspects of hearing. Berlin: Springer; 2013. pp. 411–418.
- 12. Näätänen R, Gaillard AWK, Mäntysalo S. Early selective-attention effect on evoked potential reinterpreted. Acta Psych 1978;42: 313–329.
- 13. Näätänen R, Paavilainen P, Rinne T, Alho K. The mismatch negativity (MMN) in basic research of central auditory processing: A review. Clin Neurophysiol 2007;118: 2544–2590.
- 14. Aaltonen O, Niemi P, Nyrke T, Tuhkanen M. Event-related brain potentials and the perception of a phonetic continuum. Biol Psych 1987;24(3): 197–207.
- 15. Tiitinen H, May P, Reinikainen K, Näätänen R. Attentive novelty detection in humans is governed by pre-attentive sensory memory. Nature 1994;372: 90–92. pmid:7969425
- 16. Simpson AJ, Harper NS, Reiss JD, McAlpine D. Selective adaptation to “oddball” sounds by the human auditory system. J Neurosci 2014;34(5): 1963–1969. pmid:24478375
- 17. Assmann PF, Summerfield Q. The perception of speech under adverse conditions. In: Ainsworth WA, Greenberg S, editors. Speech processing in the auditory system. New York: Springer; 2004. pp. 231–308.
- 18. Kluender KR, Kiefte M. Speech perception within a biologically-realistic information-theoretic framework. In: Gernsbacher MA, Traxler M, editors. Handbook of psycholinguistics. London: Elsevier; 2006. pp. 153–199.
- 19. Kluender KR, Alexander JM. Perception of speech sounds. In: Dallos P, Oertel D, editors. The senses: A comprehensive reference, Volume 3, Audition. San Diego: Academic Press; 2008. pp. 829–860.
- 20. Kluender KR, Lotto AJ. Virtues and perils of an empiricist approach to speech perception. J Acoust Soc Am 1999;105(1): 503–511.
- 21. Kluender KR, Stilp CE, Kiefte M. Perception of vowel sounds within a biologically realistic model of efficient coding. In: Morrison G, Assmann PF, editors. Vowel inherent spectral change. Berlin: Springer; 2013. pp. 117–151.
- 22. Repp BH. Phonetic trading relations and context effects: New experimental evidence for a speech mode of perception. Psych Bull 1982;92(1): 81–110.
- 23. Hillis JM, Ernst MO, Banks MS, Landy MS. Combining sensory information: mandatory fusion within, but not between, senses. Science 2002;298(5598): 1627–1630. pmid:12446912
- 24. Stilp CE, Rogers TT, Kluender KR. Rapid efficient coding of correlated complex acoustic properties. Proc Natl Acad Sci USA 2010;107(50): 21914–21919. pmid:21098293
- 25. Stilp CE, Kluender KR. Non-isomorphism in efficient coding of complex sound properties. J Acoust Soc Am 2011;130(5): EL352–EL357. pmid:22088040
- 26. Stilp CE, Kluender KR. Efficient coding and statistically optimal weighting of covariance among acoustic attributes in novel sounds. PLoS ONE 2012;7(1): e30845. pmid:22292057
- 27. Attneave F. Applications of information theory to psychology: A summary of basic concepts, methods, and results. 1st ed. New York: Holt, Rinehart & Winston; 1959.
- 28. Samson EW. Fundamental natural concepts of information theory. Air Force Cambridge Research Station. 1951;E5079.
- 29. Paavilainen P, Simola J, Jaramillo M, Näätänen R, Winkler I. Preattentive extraction of abstract feature conjunctions from auditory stimulation as reflected by mismatch negativity (MMN). Psychophysiol 2001;38: 359–365.
- 30. Wang X. Neural coding strategies in auditory cortex. Hear Res 2007;229: 81–93. pmid:17346911
- 31. Chechik G, Anderson MJ, Bar-Yosef O, Young ED, Tishby N, Nelken I. Reduction of information redundancy in the ascending auditory pathway. Neuron 2006;51: 359–368. pmid:16880130
- 32. Chechik G, Nelken I. Auditory abstraction from spectro-temporal features to coding auditory entities. Proc Natl Acad Sci USA 2012;109(46): 18968–18973. pmid:23112145
- 33. Barbour DL, Wang X. Contrast tuning in auditory cortex. Science 2003;299(5609): 1073–1075. pmid:12586943
- 34. Lu T, Wang X. Temporal discharge patterns evoked by rapid sequences of wide- and narrow-band clicks in the primary auditory cortex of cat. J Neurophys 2000;84(1): 236–246.
- 35. Lu T, Liang L, Wang X. Temporal and rate representations of time-varying signals in the auditory cortex of awake primates. Nat Neurosci 2001;4: 1131–1138. pmid:11593234
- 36. Bendor D, Wang X. The neuronal representation of pitch in primary auditory cortex. Nature 2005;436(7054): 1161–1165. pmid:16121182
- 37. Bendor D, Wang X. Neural coding of periodicity in marmoset auditory cortex. J Neurophysiol 2010;103: 1809–1822. pmid:20147419
- 38. Lisker L. Rapid versus rabid: A catalogue of acoustical features that may cue the distinction. Haskins Laboratories Status Report on Speech Research. 1978;SR-54: 127–132.
- 39. Rosch E. Principles of categorization. In: Rosch E, Lloyd BB, editors. Cognition and categorization. Hillsdale (NJ): Erlbaum; 1978. pp. 27–48.
- 40. Kluender KR, Lotto AJ, Holt LL, Bloedel SL. Role of experience for language-specific functional mapping of vowel sounds. J Acoust Soc Am 1998;104(6): 3568–3582. pmid:9857515
- 41. Maye J, Werker JF, Gerken L. Infant sensitivity to distributional information can affect phonetic discrimination. Cognition 2002;82(3): B101–B111. pmid:11747867
- 42. Clayards M, Tanenhaus MK, Aslin RN, Jacobs RA. Perception of speech reflects optimal use of probabilistic speech cues. Cognition 2008;108(3): 804–809. pmid:18582855
- 43. Anderson JL, Morgan JL, White KS. A statistical basis for speech sound discrimination. Lang Speech 2003;46(2–3): 155–182.
- 44. Werker JF, Yeung HH, Yoshida KA. How do infants become experts at native-speech perception? Curr Dir Psych Sci 2012;21(4): 221–226.
- 45. Hebb DO. The organization of behavior. 1st ed. New York: Wiley; 1949.
- 46. Barlow HB, Földiák P. Adaptation and decorrelation in the cortex. In: Durbin R, Miall C, Mitchison G, editors. The computing neuron. Wokingham, England: Addison-Wesley; 1989. pp. 54–72.
- 47. Escera C, Malmierca MS. The auditory novelty system: An attempt to integrate human and animal research. Psychophysiol 2014;51: 111–123.
- 48. Sanger TD. Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Netw 1989;2(6): 459–473.
- 49. Bastos AM, Usrey WM, Adams RA, Mangun GR, Fries P, Friston KJ. Canonical microcircuits for predictive coding. Neuron 2012;76(4): 695–711. pmid:23177956
- 50. Kaas JH, Hackett TA. Subdivisions of auditory cortex and processing streams in primates. Proc Natl Acad Sci USA 2000;97(22): 11793–11799.
- 51. Rauschecker JP, Tian B, Hauser M. Processing of complex sounds in the macaque nonprimary auditory cortex. Science 1995;268(5207): 111–114. pmid:7701330
- 52. Wessinger CM, VanMeter J, Tian B, Van Lare J, Pekar J, Rauschecker JP. Hierarchical organization of the human auditory cortex revealed by functional magnetic resonance imaging. J Cog Neurosci 2001;13(1): 1–7.
- 53. Tian B, Rauschecker JP. Processing of frequency-modulated sounds in the lateral auditory belt cortex of the rhesus monkey. J Neurophys 2004;92(5): 2993–3013.
- 54. Chevillet M, Riesenhuber M, Rauschecker JP. Functional correlates of the anterolateral processing hierarchy in human auditory cortex. J Neurosci 2011;31(25): 9345–9352. pmid:21697384
- 55. Sams M, Alho K, Näätänen R. Sequential effects on the ERP in discriminating two stimuli. Bio Psych 1983;17(1): 41–58.
- 56. Simoncelli EP. Vision and the statistics of the visual environment. Curr Op Neurobio 2003;13: 144–149.
- 57. Simoncelli EP, Olshausen BA. Natural image statistics and neural representation. Ann Rev Neurosci 2001;24: 1193–1216. pmid:11520932
- 58. Geisler WS. Visual perception and the statistical properties of natural scenes. Ann Rev Psych 2008;59: 167–192.
- 59. Opolko F, Wapnick J. McGill University Master Samples. Montreal, Canada: McGill University Faculty of Music; 1989.
- 60. Glasberg BR, Moore BCJ. Derivation of auditory filter shapes from notched-noise data. Hear Res 1990;47: 103–118. pmid:2228789
- 61. Patterson RD, Nimmo-Smith I, Weber DL, Milroy R. The deterioration of hearing with age: Frequency selectivity, the critical ratio, the audiogram, and speech threshold. J Acoust Soc Am 1982;72: 1788–1803. pmid:7153426
- 62. Caclin A, Brattico E, Tervaniemi M, Näätänen R, Morlet D, Giard MH, McAdams S. Separate neural processing of timbre dimensions in auditory sensory memory. J Cog Neurosci 2006;18(12): 1959–1972.
- 63. Holm SA. A simple sequentially rejective multiple test procedure. Scan J Stat 1979;6(2): 65–70.