Spectrotemporal Response Properties of Core Auditory Cortex Neurons in Awake Monkey

So far, most studies of core auditory cortex (AC) have characterized the spectral and temporal tuning properties of cells in non-awake, anesthetized preparations. As experiments in awake animals are scarce, we here used dynamic spectral-temporal broadband ripples to study the properties of the spectrotemporal receptive fields (STRFs) of AC cells in awake monkeys. We show that AC neurons were typically most sensitive to low ripple densities (spectral) and low velocities (temporal), and that most cells were not selective for a particular spectrotemporal sweep direction. A substantial proportion of neurons preferred amplitude-modulated sounds (at zero ripple density) to dynamic ripples (at non-zero densities). The vast majority (>93%) of modulation transfer functions were separable with respect to spectral and temporal modulations, indicating that time and spectrum are independently processed in AC neurons. We also analyzed the linear predictability of AC responses to natural vocalizations on the basis of the STRF. We discuss our findings in the light of results obtained from the monkey midbrain inferior colliculus by comparing the spectrotemporal tuning properties and linear predictability of these two important auditory stages.


Introduction
Many acoustic properties, such as sound spectrum, sound duration, and sound level, but also perceptual qualities, like pitch or binaural disparities, are processed at subcortical levels [1]. The core auditory cortex (AC) is the first stage receiving this preprocessed acoustic information. To study the presumed role of AC in higher-order auditory processing thus requires the use of sounds that contain other, more complex, properties that vary in both the spectral and temporal domains. Useful naturalistic stimuli that meet these requirements are so-called dynamic ripples [2][3][4][5][6]. Although ripples lack the higher-order statistical properties of natural sounds [6] and the 1/f dynamics [7] that cortical neurons might be sensitive to, they are particularly useful to quantify and compare neural tuning characteristics, because they can be described and varied in a straightforward, parametric way.
Ripples share many properties with natural sounds, and they provide adequate information to extract a neuron's spectrotemporal receptive field (STRF). The STRF is a linear representation of the joint temporal and spectral sensitivity of an auditory neuron [8,9]. An interesting question is whether a neuron treats time and spectrum as independent variables. If so, the STRF is separable, because it can be decomposed into the product of a spectral and a temporal sensitivity function. In contrast, when a neuron's STRF is inseparable, it is most sensitive to a particular combined change in time and frequency (like in an upward or downward FM sweep). While recordings in cat and ferret AC have indicated inseparable STRFs in a significant proportion of neurons, most AC cells have separable STRFs [4,6]. Also in the midbrain inferior colliculus (IC) of the monkey, the majority of cells have separable STRFs, with about 30% of the neurons being inseparable [10].
As the STRF is a linear response kernel of the cell under study, it can also be used to make a linear prediction of the cell's responses to arbitrary sounds. Versnel et al. [10] reported a considerable linear predictability of the neural responses from a large fraction of IC neurons recorded in alert monkeys. Although studies have also reported linear properties for AC neurons [3,11,12], other reports have demonstrated non-linear characteristics [6,13,14,15]. For example, Atencio and Schreiner [6] demonstrated that so-called fast-spiking units are more separable and more linear than regular units. On the other hand, they did not obtain significant differences in separability or linearity between neurons with either broad or narrow frequency tunings [15]. Note that most studies have been performed in anesthetized animal models. Anesthesia, however, may affect crucial response properties, such as inhibitory mechanisms [16], which could in turn affect spectral-temporal separability and linearity of the cells [17], and their sustained responsiveness [18].
Studies of AC cells in awake monkey have so far focused on different aspects of neural responses, often in the context of complex sounds, or perception and overt behavior, rather than on basic acoustic spectral-temporal response properties (see, however, [3] which addressed linearity using STRFs, and [19] which distinguished cells based on basic onset and offset responses). For example, AC cells can be modulated by non-acoustic signals [20][21][22][23][24][25][26], by complex sounds like conspecific vocalizations [27][28][29], or by pitch [30,31]. However, the properties of STRFs and linearity of AC cells in awake monkey have so far not been studied in detail.
We therefore examined the STRFs of AC cells from two awake monkeys using dynamic ripples, and quantified selectivity to ripple density, ripple velocity and direction, as well as their spectral-temporal separability. By comparing the best frequencies for tone responses to those of ripple responses, and the predictability of responses to a set of natural stimuli, we also assessed a neuron's response linearity. By comparing our earlier results from monkey IC [10] with the current AC recordings, we aimed to clarify the IC-to-AC spectrotemporal processing of sound.

Subjects
We performed single-unit recordings in the left auditory cortex of two adult male rhesus monkeys (Macaca mulatta; monkey J, 7-9 kg, and monkey T, 8-10 kg). Part of the data acquired from these recordings was presented in other papers [25,26]. Experiments were conducted in accordance with the European Communities Parliament and Council Directive (September 22, 2010, 2010/63/EU). All experimental protocols were approved by the local Ethics Committee on Animal Research of the Radboud University Nijmegen (RU-DEC, 'Radboud University Dier Experimenten Commissie').
Monkeys were pair-housed to facilitate normal interactive behavior, including grooming. Their joined cages measured 1.6 × 2.4 × 2.0 m (height × width × depth), and cage enrichment was provided in the form of a swing, plastic 3D puzzles, a mirror, and tools. The room, in which four such paired cages were placed, was further enriched with soft background pop music from a Dutch radio station (Sky Radio; 9-16 hours on almost every week day, provided by the Animal Care Facility). To promote foraging behavior, small seeds were dispersed across the floor bed on a daily basis. All animals in the room received a fixed amount (300 g) of dry food daily, and outside the experimental sessions they each had daily access to a bottle containing 400 ml of water.
About 24 hours before the start of an experimental session, water intake of the monkey was limited to 20 ml/kg. In the experiment, the animal earned a small water reward of 0.2 ml per successful trial. We ensured that the animals received at least the minimum of 20 ml/kg of water on an experimental day by supplementing after an experimental session, if needed. Additionally, the animals received fruit. On weekends, the animals' fluid intake was increased to 400 ml daily.
To monitor the animal's health status, we kept records of body weight, and water and food intake. Expert veterinarian assistance was available on site. The facility's expert animal technician performed the surgeries on the animals, described below. Quarterly testing of hematocrit values ensured that the animal's kidney function remained within the normal physiological range. Our procedures followed the water-restriction protocol of the Animal Use and Care Administrative Advisory Committee of the University of California at Davis (UC Davis, AUCAAC, 2001). Whenever an animal showed signs of discomfort, or illness, experiments were stopped and the animal was treated until the problem was solved.
The animals were sacrificed at the end of the study by an intravenous injection of 1 ml of heparin, followed by an overdose of pentobarbital. The animals were then perfused, and their brains were removed.

Surgical procedures
After completing training in order to respond to spectrotemporal ripple stimuli at a performance level of at least 80% (see details in [25,26]), the animal underwent surgery under full anesthesia and sterile conditions. Anesthesia was maintained by artificial respiration (0.5% isoflurane and N2O), and additional pentobarbital (IV; 3 mg/kg/hour), ketamine (IM; 0.1 ml/kg), and fentanyl (IV; 20 mg/kg/hour) were administered. A stainless steel recording chamber (12 mm diameter) was placed over a trepanned hole in the skull (10 mm diameter). The orientation and coordinates of the chamber were directed to the auditory cortex, as determined on the basis of MRI images. The chamber allowed a vertical approach of the left AC. A stainless-steel bolt, embedded in dental cement on the skull, allowed firm fixation of the head during recording sessions.
in the skull. The analog electrode signal was amplified (BAK Electronics; model A-1), bandpass filtered between 0.1 and 12.5 kHz (custom-built 8th-order low-pass Butterworth filter in series with a Krohn-Hite model 3343 filter with a 100 Hz high-pass cut-off), and monitored through a speaker and oscilloscope. The raw signal was then digitized (at 25 kHz; A/D converter, TDT2 system, module AD-1; Tucker-Davis Technologies). An automated algorithm controlled by the BrainWare program (V 9.07 for TDT, under Windows 98; DELL PC) detected individual action potentials. Data analysis and spike sorting were performed offline in MATLAB (version 7.14.0.739, R2012a, Natick, MA, USA).

Sound stimuli
Sound stimuli were digitally generated at a sampling rate of 100 kHz and delivered via the BrainWare software package and TDT2 hardware. A trigger, provided by a TG6 module, started sound presentation (DA1, low-pass filtered at 20 kHz through a TDT2-FT6 module) and spike data acquisition. A speaker (Blaupunkt PCxg352, flat frequency characteristics within 5 dB between 0.2 and 20 kHz), positioned at the frontal central position at a distance of 1.0 m from the monkey, presented the stimuli; sound levels were set by two programmable attenuators (PA4). The ambient background acoustic noise level was about 30 dBA. Acoustic foam mounted on the walls, floor, ceiling, and every large object in the room effectively absorbed reflections above 500 Hz. In this study, we presented three types of acoustic stimuli: (1) pure tones, (2) frozen static noise followed by a spectrotemporal ripple, and (3) vocalizations.
Tones. Pure tones lasted for 150 ms and were presented over a frequency range from 250 to 16000 Hz at 4 different sound levels (10, 30, 50, and 70 dB sound pressure level, SPL). Trials were presented at least four times in a randomized manner. The frequency-tuning curve of a neuron was determined from the average firing rate over 50 ms of onset response for each tone across all sound level presentations. The best frequency (BF_tone) of each neuron was taken at the maximum of the tuning curve. The cell's response onset latency was defined as the first moment after pure-tone onset at which the mean firing rate across all tones exceeded the mean baseline activity plus twice its standard deviation (SD) for at least 10 milliseconds.
For the purpose of reconstructing a tonotopic map (Fig. 1), we also determined the characteristic frequency (CF), as follows. First, the spontaneous firing rate was determined during a 300 ms pre-stimulus period across all (≥ 208) tone stimulus presentations. Second, driven responses were defined as the average firing rate of all (≥ 4) presentations of each tone stimulus that was greater than the mean plus two standard deviations of the spontaneous activity. Finally, the CF of each neuron was defined as the frequency that produced a driven response at the lowest intensity (threshold).
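The onset-latency criterion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code of the study: the function name `onset_latency`, the 1-ms bin width, and the synthetic firing-rate trace are all hypothetical.

```python
import numpy as np

def onset_latency(rate, baseline_mean, baseline_sd, bin_ms=1.0, min_dur_ms=10.0):
    """Return the first time (ms after tone onset) at which `rate` exceeds
    baseline_mean + 2 * baseline_sd and stays above it for at least
    `min_dur_ms`. Returns None if the criterion is never met."""
    threshold = baseline_mean + 2.0 * baseline_sd
    above = np.asarray(rate) > threshold
    n_needed = int(np.ceil(min_dur_ms / bin_ms))
    run = 0  # length of the current run of supra-threshold bins
    for i, flag in enumerate(above):
        run = run + 1 if flag else 0
        if run == n_needed:
            # The run started n_needed - 1 bins before the current bin.
            return (i - n_needed + 1) * bin_ms
    return None

# Synthetic mean firing rate (spikes/s) in 1-ms bins after tone onset.
rate = np.full(100, 5.0)
rate[12:15] = 40.0  # 3-ms blip: too short to count as a response
rate[22:60] = 40.0  # sustained response starting at 22 ms
latency = onset_latency(rate, baseline_mean=5.0, baseline_sd=2.0)
```

In this synthetic trace the brief 3-ms excursion is ignored because it does not satisfy the 10-ms duration requirement, so the latency is taken from the start of the sustained response.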
Ripples. The ripple stimuli consisted of a broadband complex of 126 spectral components, equally distributed (20/octave) from 250 Hz to 19.7 kHz [4,10]. All components had random phase. The ripple envelopes were sinusoidally modulated in the spectrotemporal domain. The amplitude of each component was described by Eq. 1.
Vocalizations. Three different macaque calls and three different bird calls were presented to both monkeys. The bird calls and macaque vocalizations are described in detail in Versnel et al. [10]. Briefly, all vocalizations had been resampled to 50 kHz. Artifacts and background noises were removed by cutting and high-pass filtering. The calls were presented at three sound levels (40, 50, and 60 dB SPL); their durations varied from 300 to 1600 ms, and the number of repetitions in each recording was at least 10.
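To make the ripple stimuli described above concrete, the sketch below synthesizes a dynamic ripple. It assumes the envelope form commonly used in the dynamic-ripple literature, S(t, x) = 1 + ΔM·sin(2π(ωt + Ωx)), with t time (s), x the position of a component in octaves above 250 Hz, ω the ripple velocity (Hz), Ω the ripple density (cycles/octave), and ΔM the modulation depth. That formula, the component grid x = k/20 octaves, the default depth of 1, and all function names are assumptions for illustration, not a transcription of the study's Eq. 1.

```python
import numpy as np

def ripple_envelope(t, x, velocity_hz, density_cyc_oct, depth=1.0):
    """Assumed envelope 1 + depth * sin(2*pi*(velocity*t + density*x)),
    returned on an (n_components, n_samples) grid."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    phase = 2.0 * np.pi * (velocity_hz * t[None, :] + density_cyc_oct * x[:, None])
    return 1.0 + depth * np.sin(phase)

def ripple_waveform(duration_s, fs, velocity_hz, density_cyc_oct, depth=1.0,
                    f0=250.0, n_comp=126, per_octave=20, seed=0):
    """Sum of random-phase tones, each amplitude-modulated by the envelope."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(round(duration_s * fs))) / fs
    x = np.arange(n_comp) / per_octave           # octaves above f0 (assumed grid)
    phases = rng.uniform(0.0, 2.0 * np.pi, n_comp)
    wave = np.zeros_like(t)
    for xi, ph in zip(x, phases):                # loop keeps memory use small
        env = 1.0 + depth * np.sin(2.0 * np.pi * (velocity_hz * t + density_cyc_oct * xi))
        wave += env * np.sin(2.0 * np.pi * f0 * 2.0 ** xi * t + ph)
    return wave

# Short demo: a static epoch (velocity 0, density 0) has a flat envelope.
t_demo = np.arange(5000) / 100_000.0
x_demo = np.arange(126) / 20.0
static_env = ripple_envelope(t_demo, x_demo, 0.0, 0.0)
moving_env = ripple_envelope(t_demo, x_demo, 8.0, 0.4)
wave = ripple_waveform(duration_s=0.05, fs=100_000, velocity_hz=8.0, density_cyc_oct=0.4)
```

With this sign convention, lines of constant phase satisfy ωt + Ωx = constant, so for ω > 0 a positive Ω moves the envelope toward lower frequencies over time, which matches the use of Ω > 0 for downward ripples in the text.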

Experimental paradigms
Neural responses were measured while monkeys were exposed to the spectral-temporal ripples. The data presented here were collected during recording sessions in which the AC responses were also recorded during a behavioral task (see [25,26]). Each stimulus started with a static epoch (ω = 0 Hz, Ω = 0 c/o, 500 ms duration, presented at 60 dB SPL), which changed into a pseudo-randomly selected ripple that lasted for 1000 ms (Eq. 1). The static noise was frozen within each block of trials.
Each trial started automatically with the static noise. Data collection started 300 ms before sound onset, and ended 700 ms after sound offset, yielding a total recording duration of 2500 ms. The number of trial repetitions was between 4 and 10.

Characterization of recording sites
Although we cannot with certainty identify the exact AC subdivision(s) in which we encountered individual neurons, we are confident that we recorded from the AC core (primary auditory cortex A1 and its immediate rostral part, area R) for the following reasons: 1) MRI scans were used for stereotaxic placement of the recording chamber; the subsequent coordinates of the successful recording sites within the chamber corresponded closely to the stereotaxic coordinates of A1 as provided by the atlas of the Rhesus monkey brain [32]; 2) before reaching an AC recording site there was a physiologically silent period, corresponding to the gap between upper and lower parts of the lateral sulcus [33]; 3) tone-onset latency of the recorded sites was 22.6 ± 5.9 and 23.6 ± 5.6 ms for monkey J and T, respectively; 4) almost all neurons (96%) responded well to pure tones (BF: 250-16000 Hz); 5) the pure-tone tuning bandwidths for monkey J and T were 1.5 ± 1.2 and 1.5 ± 1.3 octaves, respectively; 6) the pure-tone threshold was 21 ± 13 dB SPL for monkey J and 23 ± 12 dB SPL for monkey T. These tuning characteristics all fall in the same ranges as reported by Recanzone et al. [34] for behaving monkeys in AC areas A1 and R [25,26]. 7) Finally, we reconstructed tonotopic maps of the recording sites in both animals (Fig. 1). The maps demonstrated a systematic increase in CF from anterior to posterior locations over a spatial extent that corresponded well to earlier studies (e.g., [24]). The tonotopic map obtained for monkey T (Fig. 1A) shows first a decrease in CF, followed by an increase along the anterior-posterior axis, indicative of an area between R and A1. The tonotopic map for monkey J (Fig. 1B) showed a general increase in CF along the anterior-posterior axis, indicating that recordings in this animal were most likely taken from A1.
Our database consisted of 426 cells from which we successfully recorded during presentation of the entire set of 55 ripples (monkey J: n = 178; monkey T: n = 248).

Data analysis
The spectrotemporal receptive field (STRF). We estimated a cell's STRF from the neural responses to ripple stimuli by using the same off-line method as described before [4,10]. First, detected spikes were sorted and binned into peri-stimulus time histograms. As stated above, the ripple stimuli followed the static epoch. This caused the ripple onset response to be strongly suppressed, or fully eliminated [25,26]. However, to avoid any conflict with potential transient onset responses, we excluded the first 100 ms of the ripple-response window [2,4,11]. We then wrapped the 900-ms response window into 32-bin period histograms, in which the ripple velocity determined the period as 1/ω. We subsequently performed a fast
The encoding of ripple amplitude modulations. To quantify how well a cell followed the modulation period of a particular ripple stimulus, a measure, q, was computed [10] (Eq. 4) from the amplitudes A_i of the harmonics of the ripple's period (1/ω), with A_i the amplitude of the i-th harmonic. The parameter q reflects the quality of the best sinusoidal fit to the period histogram. If q = 1, the period histogram corresponds to a single period of a pure sinusoid, which suggests a linear response of the neuron. If q = 0, the period histogram does not resemble a single-period sinusoid at all. We applied q as an index to judge the sensitivity of the neuron to the ripple stimulus. Specifically, for every neuron we determined the 25th percentile of the q-values, q_25, for upward (Ω < 0) and downward (Ω > 0) moving ripples (25 ripples each), and the 50th percentile (median), q_50, for AM complexes (Ω = 0, 5 ripples). Then, through simulation of random period histograms, we determined that 99% of the time a q_25 value less than 0.387 is found by chance. Similarly, for the q_50 value a 99% level of 0.376 is observed. Thus, for instance, it is unlikely (p < 0.01) that q_25 > 0.387 if a neuron responds in a random way to moving ripples.
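The period-histogram step and the idea of a sinusoid-quality index can be sketched as follows. The wrapping into 32 bins over the 100-1000 ms window follows the description above; the index `sinusoid_index` (first-harmonic amplitude divided by the root of the summed squared harmonic amplitudes) is only an assumed stand-in that behaves like the q described in the text (1 for a pure single-period sinusoid), not the study's actual definition. All function names are hypothetical.

```python
import numpy as np

def period_histogram(spike_times_s, velocity_hz, t_start=0.1, t_stop=1.0, n_bins=32):
    """Fold spike times in [t_start, t_stop) onto one ripple period (1/velocity)
    and count spikes in n_bins phase bins."""
    spikes = np.asarray(spike_times_s, dtype=float)
    spikes = spikes[(spikes >= t_start) & (spikes < t_stop)]
    phase = ((spikes - t_start) * velocity_hz) % 1.0   # phase in cycles, [0, 1)
    counts, _ = np.histogram(phase, bins=n_bins, range=(0.0, 1.0))
    return counts

def harmonic_amplitudes(counts):
    """Magnitudes of harmonics 1..n_bins/2 of the period histogram (DC dropped)."""
    return np.abs(np.fft.rfft(counts)[1:])

def sinusoid_index(counts):
    """Assumed index: A_1 / sqrt(sum_i A_i^2); equals 1 for a pure sinusoid."""
    amps = harmonic_amplitudes(counts)
    total = np.sqrt(np.sum(amps ** 2))
    return float(amps[0] / total) if total > 0 else 0.0

# Spikes locked to phase 0.26 of an 8-Hz ripple all land in phase bin 8.
locked_spikes = 0.1 + (np.arange(7) + 0.26) / 8.0
locked_counts = period_histogram(locked_spikes, velocity_hz=8.0)

# A histogram shaped like one sinusoidal period gives an index of 1.
k = np.arange(32)
sine_counts = 10.0 + 5.0 * np.sin(2.0 * np.pi * k / 32.0)
q_sine = sinusoid_index(sine_counts)
```

The complex first-harmonic coefficient of each period histogram is also what a transfer-function estimate at that ripple velocity and density would typically be built from.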
Best ripple density and velocity and direction selectivity. Three response parameters were derived from the transfer function T (Eq. 2): the best ripple velocity ω_B of a cell, its best ripple density Ω_B, and its direction selectivity D. The parameters ω_B and Ω_B were determined as the ω and Ω at which the magnitude transfer function was the largest. We computed the direction selectivity, D [35], from R_up and R_down, the sums of the response magnitudes to all upward (Ω < 0) and downward (Ω > 0) ripples, respectively. We also determined the moving ripple/AM response magnitude ratio to indicate the preference of a neuron for moving ripples versus AM complexes. To that end, the magnitude values at the best Ω ≠ 0 were summed for all presented velocities and divided by the sum of the responses to Ω = 0 at all velocities:
\frac{\text{best moving ripple}}{\text{AM}} = \frac{\sum_{\omega} |T(\omega, \Omega_{B \neq 0})|}{\sum_{\omega} |T(\omega, 0)|}
Finally, a best frequency (BF_strf) and latency were derived as the frequency and latency at which the STRF had its maximum response.
Separability. We examined whether the two-dimensional modulation transfer function T(ω, Ω) can be obtained as the product of two separate, one-dimensional, transfer functions, namely a temporal function F(ω) and a spectral function G(Ω). We applied two different methods to examine the extent of this so-called separability.
First, we performed singular value decomposition (SVD) of the modulation transfer function [4]. For a fully separable function, the following equation applies:
The two separability analyses were performed for the complete transfer function (−2 ≤ Ω ≤ 2 c/o), as well as for each of the two quadrants of the transfer function (Ω < 0 and Ω > 0). The former measure quantifies full separability, whereas the latter quantifies the so-called quadrant separability [4,10].
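The SVD-based separability test can be illustrated with a short sketch. It assumes the inseparability index often used in the ripple literature, α = 1 − λ_1² / Σ_i λ_i², where λ_i are the singular values of the transfer-function matrix; this definition, the 5×5 example matrices, and the function names are assumptions for illustration and may differ from the exact equations of the study.

```python
import numpy as np

def inseparability_index(T):
    """Assumed index alpha = 1 - s1^2 / sum(s_i^2) from the singular values of T.
    alpha = 0 for a rank-1 (fully separable) matrix."""
    s = np.linalg.svd(np.asarray(T), compute_uv=False)
    energy = np.sum(s ** 2)
    return float(1.0 - s[0] ** 2 / energy) if energy > 0 else 0.0

def rank1_approximation(T):
    """Best separable (rank-1) approximation of T: s1 * u1 * v1^H."""
    U, s, Vh = np.linalg.svd(np.asarray(T), full_matrices=False)
    return s[0] * np.outer(U[:, 0], Vh[0, :])

# Separable example: outer product of a spectral and a temporal function.
G = np.array([0.2, 0.6, 1.0, 0.5, 0.1])   # spectral tuning over 5 densities
F = np.array([1.0, 0.8, 0.5, 0.3, 0.1])   # temporal tuning over 5 velocities
T_sep = np.outer(G, F)
alpha_sep = inseparability_index(T_sep)

# Maximally inseparable 5x5 example: all singular values equal.
alpha_insep = inseparability_index(np.eye(5))

# For a separable matrix the rank-1 reconstruction reproduces the original.
recon_error = np.max(np.abs(rank1_approximation(T_sep) - T_sep))
```

The same functions accept complex-valued matrices, so they can be applied to a full transfer function (magnitude and phase) or to a single quadrant to probe quadrant separability.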
Phase functions. For a majority of transfer functions, the phase functions F(ω, Ω) appeared to be linear, and could thus be described for each ripple direction by Eq. 9 [4,10]. The phase intercepts for downward and upward sweep directions, w_down and w_up, contain temporal and spectral components, denoted by y and f, respectively (Eq. 10) [4,10]. The separated phase functions from the SVD analysis were used to derive the parameters t, x, y, and f. The slope t corresponds to the temporal position (group delay), and the slope x corresponds to the spectral position (reflecting BF); these parameters can be determined for both ripple directions. The phase constants y and f are related to asymmetry of the STRF around BF. The parameter y reflects inhibition before and/or after excitation at BF: y < 0 reflects excitation at onset followed by inhibition, and y > 0 reflects inhibition at onset followed by excitation. The parameter f reflects the extent of sideband inhibition above and/or below BF: f < 0 indicates a dominant inhibition below BF, and f > 0 indicates dominant inhibition above BF (see [4], for details).
Response predictions. From the STRF, a linear estimate of the temporal response pattern of a cell, R(t), to any sound stimulus can be predicted by time convolution and spectral integration of the STRF with the stimulus spectrogram, S(x, t), with F_MIN and F_MAX the minimum and maximum frequencies of the spectral range of the STRF, respectively. The spectrogram was generated using the MATLAB function "specgram" with logarithmic frequency bins. The bin size of the spectrogram S(x, t) in both the spectral and temporal dimension was set according to the bin size of the STRF (dx = 0.25 octaves and dt = 12.5 ms). Because of the limited 2.5-octave range of the STRFs, we applied a spectral shift to align the STRF maximum with the center of the spectral range, and aligned the stimulus spectrograms accordingly. Convolving the STRF and stimulus spectrogram along the temporal dimension for each spectral bin resulted in a prediction spectrogram
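The linear prediction described in this paragraph (temporal convolution per spectral bin, followed by summation across frequency) can be sketched as below. This is a minimal illustration, assuming the STRF and spectrogram are already sampled on the same grid (for example dx = 0.25 octaves, dt = 12.5 ms); the function name and the toy arrays are hypothetical, and no output nonlinearity or normalization is applied.

```python
import numpy as np

def predict_response(strf, spectrogram):
    """Linear prediction R(t) = sum_x sum_lag STRF(x, lag) * S(x, t - lag).

    strf:        array (n_freq, n_lag), lag 0 in the first column
    spectrogram: array (n_freq, n_time), same frequency bins as the STRF
    returns:     array (n_time,)
    """
    strf = np.asarray(strf, dtype=float)
    spectrogram = np.asarray(spectrogram, dtype=float)
    if strf.shape[0] != spectrogram.shape[0]:
        raise ValueError("STRF and spectrogram need the same number of frequency bins")
    n_time = spectrogram.shape[1]
    prediction = np.zeros(n_time)
    for k in range(strf.shape[0]):
        # Causal convolution along time for spectral bin k, truncated to n_time.
        prediction += np.convolve(spectrogram[k], strf[k], mode="full")[:n_time]
    return prediction

# Toy check: an STRF with one excitatory point at lag 2 (25 ms at dt = 12.5 ms)
# driven by a single spectrogram impulse at time bin 3 responds at time bin 5.
toy_strf = np.zeros((10, 8))
toy_strf[1, 2] = 1.0
toy_spec = np.zeros((10, 20))
toy_spec[1, 3] = 1.0
toy_prediction = predict_response(toy_strf, toy_spec)
```

Predicted and measured responses could then be compared, for instance with a correlation coefficient, to quantify how much of a response is captured by the linear kernel.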

General properties
We recorded neural activity during ripple presentation from 426 AC neurons (248 in monkey T, 178 in monkey J), which showed at least an acoustically elicited onset or offset response. Further systematic responsiveness of neurons to moving ripples (Ω ≠ 0) and AM complexes (Ω = 0) was quantified by the parameter q (Eq. 4; see Materials and Methods; summarized in Table 1). Forty-one cells (~10%) responded to both moving ripples and AM complexes. Three cells responded only to moving ripples, while 115 cells (27%) responded only to AM complexes. More than half of the neurons (63%, N = 267) demonstrated no significant response to the set of moving ripples, nor to the set of AM complexes, and were therefore excluded from further analysis. Tone responses were also recorded for 246 cells (Table 1; using the same response criteria as in [34]; see Materials and Methods). Fig. 2 shows the pure-tone responses and corresponding tuning curves of four example cells. The raster plots (left-hand columns) show each neuron's spiking activity, which significantly increased above background during tone presentation (from 300-450 ms; blue lines). The four cells showed a clear short-latency onset peak, whereas cell T77 also had a clear offset response (Fig. 2D). The tuning curves (right-hand columns) represent the mean response (averaged over the tone duration) as a function of sound level and frequency. Each of the cells was tuned to a narrow range of frequencies, and their tuning widths typically broadened at higher sound levels. Two cells responded to their preferred frequency at all presented sound levels (10-70 dB SPL; Figs. 2A and D), while the other two only responded at the higher
Table 1 note. The numbers in parentheses indicate the number of the cells that produced the driven response to pure tone stimuli (see Materials and Methods). Bold numbers indicate the cells that were selected for further analysis (N = 159 for ripples, and N = 120 to both tones and ripples).
doi:10.1371/journal.pone.0116118.t001
In general, the AC cells in our study showed high sensitivity to pure tones in terms of a low sound level threshold, short response latency, and/or narrow frequency tuning. Best frequencies (BFs) ranged from 0.25 to 16 kHz, with the majority of encountered BFs below 2 kHz (76%; Fig. 3).
In Fig. 4 we present the magnitude transfer functions and corresponding STRFs for the four example cells shown in Fig. 2. As the spectral range of the STRFs extended to 2.5 octaves (see Materials and Methods), the spectral position of the STRFs is ambiguous. To resolve this ambiguity, we used the tone-evoked tuning curves (right-hand column) to determine the appropriate spectral range for each cell (see Materials and Methods).
An important feature revealed by the magnitude transfer functions is each cell's selectivity to a limited range of ripple densities, Ω, and velocities, ω, which strongly varied among the neurons. We encountered neurons that preferred low (Figs. 4B-C) or high ripple densities (Fig. 4A), and neurons that preferred low (Fig. 4A) or high ripple velocities (Fig. 4B). Direction selectivity also varied, as neurons could strongly prefer either downward or upward moving ripples (Fig. 4C-D, in these examples upward), or respond equally strongly to both directions (Fig. 4A-B). Two of the neurons responded better to moving ripples than to AM complexes (Fig. 4A, D), while the two other neurons responded to both moving ripples and AM complexes (Fig. 4B, C).
The STRFs (Fig. 4, center column) generally showed excitation for a narrow range of frequencies with latencies between 10 and 30 ms, so that a BF_strf could easily be obtained (Methods, and see below). Inhibition occurred around the time of or after excitation (Fig. 4A-D). Interestingly, the frequency tuning reflected by the STRF did not agree well with the neuron's tuning to pure tones. The difference between best frequencies could be as large as 1.25 octaves for cell J112 (Fig. 4A), one octave for cell J113 (Fig. 4B), 0.5 octaves for cell T81
Figure 3. Best frequency (BF) range. From 255 cells responsive to either moving ripples or AM complexes, 246 cells were also responsive to pure tones. The BFs spread across the presented tone stimuli; however, the majority preferred the lower frequencies.
To quantify this point in more detail for the population of cells, we plotted the difference between BF_tone and BF_strf (ΔBF) for all 120 neurons that responded well to both tones and ripples, against BF_tone values (Fig. 5A). One should note that the STRF spectral resolution was 0.25 octaves, while the frequency resolution of tone stimuli was 0.5 octaves. We therefore considered the full range of ΔBF between [−0.25, +0.25] octaves as indistinguishable BF values for tones and STRFs. Only 26 of the 120 AC neurons (22%) belonged to this category (grey circles, Fig. 5A).
For a direct comparison between AC and IC neurons, we also performed the same analysis for the 68 IC neurons reported in [10]. Here (Fig. 5B), ΔBF of 62% of IC cells (42) fell within the ±0.25 range, indicating a much stronger resemblance between BF_tone and BF_strf and therefore better response linearity for IC neurons.
Fig. 6A). Because auditory neurons with a low BF_tone generally tend to have broader tuning curves, i.e. low preferred densities (e.g., [36,37]), this low-density preference may simply arise from the low-frequency preference of our recorded AC neurons (Fig. 3). To test for such a potential relation between BF_tone and preferred density in our data, we plotted BF_tone versus Ω_B for all 120 AC cells that were driven by tones and ripples. There was no significant correlation (r = −0.07, p = 0.43; Fig. 6D), implying that broad tuning of AC neurons (low Ω_B) does not necessarily follow from a low BF_tone preference.

Distributions of ripple responses
The distribution of best ripple velocities, ω_B, showed a slight preference for low velocities, with 34% of AC cells preferring the lowest applied velocity in this study (8 Hz; Fig. 6B). Still, a substantial minority (12%) of the neurons preferred the highest velocity tested (40 Hz; Fig. 6B). We observed no correlation between ω_B and BF_tone (r = 0.08, p = 0.37; Fig. 6E).
Direction selectivity D of the neurons was symmetrically distributed around zero (median D = 0.002), and a majority of neurons (62%) had no preference for either upward or downward ripples (Fig. 6C). We obtained a strong direction preference (D < −0.2 or D > 0.2) for 13 neurons (8%). The joint distribution of ω_B and Ω_B indicated that ripple selectivity did not cover a wide range of spectral-temporal combinations, as few cells were responsive to high ω_B and Ω_B; furthermore, selectivity to Ω did not correlate with selectivity to ω (Fig. 6F). Moreover, direction selectivity was uncorrelated with either ω_B (t(158) = 23, p < 0.0001) or Ω_B (t(158) = 9, p < 0.0001; not shown).
The scatter plot in Fig. 7 (left) shows the ripple/AM ratio, which quantifies the selectivity of the neurons to moving ripples, as a function of best ripple density. As expected, neurons with Ω_B = 0 tended to have a ratio below one, while neurons with Ω_B > 0 had a ratio above one. Note that exceptions also occurred, as Ω_B was obtained at a single velocity, while the ripple/AM ratio was computed across all velocities. The ratio did not depend on Ω_B for Ω_B > 0, which means that, on average, neurons with a low Ω_B responded equally strongly to AM complexes as neurons with a high Ω_B. The distribution of the ripple/AM ratio (right-hand side of Fig. 7) was unimodal and skewed towards low ratios, with the peak at a ratio of 0.5, meaning that most neurons (60%) preferred AM complexes rather than moving ripples. Most of the cells preferring moving ripples had their ratio close to one (median = 1.5), implying that they responded nearly as well to spectrally flat AM complexes as to moving ripples (cf. Table 1).
(Fig. 8A). The same property holds for upward ripple directions (not shown). These similarities imply a high degree of separability, which is indeed reflected by a low inseparability index (Eq. 8, Materials and Methods) for this neuron (α = 0.05). The responses can therefore be well described by the mean separated spectral and temporal tuning curves (corresponding to λ_1), obtained from SVD (Fig. 8B). Indeed, the STRF reconstructed by multiplying the temporal and spectral functions for the magnitude and phase characteristics of both directions and AM complexes (Eq. 7) was very similar to the original STRF (r = 0.98) (Fig. 8C). Fig. 9 shows the separated transfer functions (same format as Fig. 8B) of the four neurons of Fig. 4. Three of these neurons responded to a sufficiently wide range of ripples to allow for a meaningful SVD analysis (median q over total transfer function > 0.5; Figs. 9A, C, and D).
Neuron J113 responded to only a limited range of ripples, mostly at zero density (q = 0.36; Figs. 4B and 9B). In the quadrant-separated representation of the data, the shape of the transfer functions for the two ripple directions (solid and dashed curves, respectively) appeared quite similar. Note also that the phase functions could typically be well approximated by straight lines (in the range with substantial responses), as predicted by Eq. 8, where only the intercepts of the phase curves could differ for the two directions. The slopes of the phase curves, which reflect the spectral position and group delay (see Eq. 9), were similar (note that slopes for Ω < 0 are shown inverted, for clarity). As all four neurons responded well to AM complexes, their AM transfer functions are plotted as dotted lines in the temporal graphs for comparison. Note that the AM curves were also similar to the separated ripple transfer functions, differing only in magnitude. We performed the SVD analysis (Eq. 8) for all cells in our population that had good temporal following over a wide range of ripples (median q > 0.4; N = 75 cells). Fig. 10A shows that for a large majority of selected neurons the transfer functions in both directions were separable (α_up and α_down < 0.2 for 72% of the neurons). For 12 neurons (16%) good separability was obtained for one direction only (α_up or α_down < 0.2). The direction for which the lowest α was obtained typically coincided with the preferred direction, D. A statistical simulation indicated that the probability of finding a value α < 0.3 for a random 5×5 transfer function is < 0.01. Using α = 0.3 as a selection criterion for (in)separability, we found that six of the single-direction transfer functions were inseparable, and that only two neurons had inseparable transfer functions for both directions. The distributions per animal were very similar (gray and black dots and bars in Fig. 10 represent each monkey).
In conclusion, AC neurons show a significant degree of spectral-temporal separability for a single direction, a feature that has become known as quadrant separability [4,10].
The value of α for the complete transfer function normally exceeds the lowest index obtained separately for the two directions. Assuming quadrant separability, we also performed a statistical simulation on complete 11×5 transfer matrices that consisted of three components: two different quadrant-separable 5×5 matrices representing the transfer functions for the two directions, and a 1×5 matrix representing the AM noise transfer function. When the three components differed randomly, we obtained a probability <0.01 of finding a value α < 0.3. Using α = 0.3 as a criterion, we found that only 7% (5 cells) of the complete transfer functions were inseparable (Fig. 9B). The inseparability indices for these transfer functions were confined to an intermediate range of values (0.3 < α < 0.5).
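As a rough illustration of the SVD-based separability test described above, the following Python sketch computes an inseparability index for a complex transfer matrix and estimates, by Monte Carlo simulation, how often a random matrix falls below a given criterion. It assumes the commonly used definition α = 1 − λ_1²/Σλ_i², with λ_i the singular values; Eq. 8 is not reproduced in this section, so the exact form used in the paper may differ. The random matrices are assumed to have independent complex Gaussian entries, which need not match the simulation performed here, and all function names are hypothetical.

```python
import numpy as np

def inseparability_index(transfer):
    """Assumed definition: alpha = 1 - s1^2 / sum(s_i^2), with s_i the singular values.

    alpha is 0 for a perfectly separable (rank-1) matrix and grows towards
    1 - 1/n when the energy is spread evenly over n singular values.
    """
    s = np.linalg.svd(np.asarray(transfer), compute_uv=False)
    total = np.sum(s ** 2)
    if total == 0:
        raise ValueError("transfer matrix is all zeros")
    return float(1.0 - s[0] ** 2 / total)

def fraction_below(criterion, shape=(5, 5), n_sim=10000, seed=0):
    """Fraction of random complex matrices with alpha below the criterion.

    Assumption: entries are independent complex Gaussian numbers.
    """
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_sim):
        m = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
        if inseparability_index(m) < criterion:
            count += 1
    return count / n_sim
```

Calling `fraction_below(0.3)` gives an estimate of the chance level for a 5×5 matrix under these assumptions; a transfer matrix built as the outer product of one spectral and one temporal function yields an index of zero up to rounding error.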
Because the data indicated quadrant separability for the AC neurons that responded well to moving ripples (see above), the full transfer function [...] = 0). We verified that r was indeed significantly related to the inseparability index α (r² = 0.95; Fig. 11A). The median r was 0.94, confirming the high degree of quadrant separability and the validity of this approach to estimate the STRF of a separable cell (Fig. 11A). The correlation coefficients r were also estimated separately for the amplitude and phase of the transfer function, resulting in a median r of 0.88 for amplitude and a median r of 0.67 for phase. This indicated that the separability of amplitude was higher than that of phase.
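The comparison between a measured transfer function and its separable estimate can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the procedure of Eq. 7, which is not reproduced in this section: the separable estimate is taken to be the first SVD component, and similarity is measured with a Pearson correlation computed separately on amplitude and on (wrapped) phase. The function names are hypothetical.

```python
import numpy as np

def rank1_approximation(transfer):
    """Separable estimate: the first SVD component (an assumed form)."""
    m = np.asarray(transfer, dtype=complex)
    u, s, vh = np.linalg.svd(m, full_matrices=False)
    # Outer product of the first left and right singular vectors,
    # scaled by the largest singular value.
    return s[0] * np.outer(u[:, 0], vh[0, :])

def amplitude_phase_correlation(original, approx):
    """Pearson correlations for amplitude and for phase.

    Note: phase is treated as a linear variable here, which is only a
    reasonable simplification when the phases stay away from +/-pi.
    """
    a = np.asarray(original, dtype=complex).ravel()
    b = np.asarray(approx, dtype=complex).ravel()
    r_amp = np.corrcoef(np.abs(a), np.abs(b))[0, 1]
    r_phase = np.corrcoef(np.angle(a), np.angle(b))[0, 1]
    return float(r_amp), float(r_phase)
```

For a perfectly separable matrix the rank-1 approximation reproduces the original, so both correlations equal one up to rounding error; lower values indicate how much of the response is not captured by a single spectral-temporal product.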

Phase functions
The phase functions, such as shown in Figs. 8 and 9, could be approximated by straight lines. This feature, which results from phase locking to the ripple envelope, is further quantified in Fig. 12A by fitting linear regression lines through the temporal and spectral phase data for upward (left) and downward (right) ripples (same neuron as in Fig. 8). The slopes and intercepts of the regression lines yielded meaningful parameters (see Eq. 9): the slopes can be associated with group delay and spectral position, whereas the spectral-temporal parameters derived from the intercepts (see Eq. 10) correspond to STRF asymmetries in excitatory and inhibitory sidebands in the spectral and temporal domains. Fig. 12B shows that the group delays obtained for the population of 75 cells fell in a range of 15-40 ms. The group delays for upward and downward ripple directions agreed within 3 ms for a majority of neurons (72%). The group delays were longer than the onset latencies to tone stimuli by approximately 5 ms, which was significant (t(67) = −4.7, p < 0.001). The group delays did not correlate with the tone-evoked latencies (r² = 0.02; p = 0.23), and the latency differences did not correlate with direction selectivity D either (r² < 0.001; p > 0.9). The spectral positions found for the two directions had almost identical values for a large majority of neurons (Fig. 12C). Spectral position differences also did not correlate with preferred ripple direction (r² = 0.004; p = 0.56). Fig. 12D shows that the temporal phase constant θ varied between −180° and 0° for most neurons (64%), which implies an onset excitation at BF_strf (see Materials and Methods). We obtained a positive θ for a substantial minority of neurons (36%), which hints at an inhibitory onset response in the STRF. The spectral phase constant φ (Fig. 12E) was almost normally distributed around 0, which means that dominant inhibition above (φ > 0) and below (φ < 0) a cell's BF was found in a similar number of neurons. A large majority of neurons had a φ near 0, indicating symmetry of sideband inhibition above and below BF.
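To illustrate how the slope and intercept of such a phase function translate into a group delay and a phase constant, the sketch below fits a straight line to the temporal phase as a function of ripple velocity. It assumes the convention phase(ω) = −2π·ω·τ + θ, with phase in radians, ω in Hz and τ the group delay in seconds; the sign conventions of Eqs. 9 and 10 are not shown in this section and may differ. The function name is hypothetical.

```python
import numpy as np

def fit_temporal_phase(velocities_hz, phases_rad):
    """Fit a line to phase vs. ripple velocity.

    Assumed convention: phase = -2*pi*velocity*delay + theta.
    Returns (group_delay_ms, theta_deg), with theta wrapped to (-180, 180].
    """
    w = np.asarray(velocities_hz, dtype=float)
    # Remove 2*pi jumps; this only works if the true phase step between
    # successive velocities is smaller than pi (delay < 62.5 ms for 8-Hz steps).
    ph = np.unwrap(np.asarray(phases_rad, dtype=float))
    slope, intercept = np.polyfit(w, ph, 1)
    group_delay_ms = -slope / (2.0 * np.pi) * 1000.0
    theta_deg = np.degrees(np.angle(np.exp(1j * intercept)))
    return float(group_delay_ms), float(theta_deg)
```

With velocities of 8 to 40 Hz in 8-Hz steps, a synthetic phase function generated with a 25-ms delay and a phase constant of −90° is recovered by this fit; under the assumed convention, a negative constant corresponds to the onset-excitation case described in the text.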
To illustrate how the spectral and temporal phase values influence the STRF, Fig. 13 shows four example AC cells with different phase values. Δt represents the difference in group delay (ms) between the responses to downward and upward ripples (see also Fig. 12B). Δχ denotes the difference in spectral position (BF_strf) between the responses to downward and upward ripples, in octaves (see also Fig. 12C).
The cells in Figs. 13A, B, and D are examples of a negative temporal phase (θ < 0), which indicates excitation that is followed by an inhibitory response. If θ > 0, the neuron exhibits an inhibition that is followed by an excitatory response (Fig. 13C, see also Fig. 12D). Nonzero values of φ imply a spectral sideband asymmetry in the STRF. For an onset cell, as in Figs. 13A and B, the inhibitory sideband is asymmetric, while for an offset cell, such as in Fig. 13C, the excitatory sideband is asymmetric. Negative values of φ indicate an asymmetry only above the BF_strf (e.g., Fig. 13A), whereas positive values correspond to an asymmetry only below the BF_strf (Figs. 13B and C). Fig. 13D shows an example cell (J222) without asymmetric inhibitory sidebands (φ = 0).
Cell J160 (Fig. 13B) has a relatively large Δχ value. As a result, the spectral positions for downward and upward directions (see Fig. 12C) are 0.7 octaves apart. This is reflected by a spectrally broad excitation, which ranges from about 2 kHz up to 5.6 kHz. Note that this cell also had a large difference between the response latencies for the two ripple directions (Δt = 21 ms). In contrast, cell J222 (Fig. 13D) responded virtually identically to the downward and upward ripple directions, as indicated by the small values for Δt and Δχ.

Linear predictions
Figure 13. The effect of spectral and temporal phase values on STRF. A-D. For each cell, the phase values are given above its corresponding STRF. Δt represents the difference in group delay (ms) between the responses to downward and upward ripples (see Fig. 12B). Δχ denotes the difference in spectral position between the responses to downward and upward ripples, in octaves (see also Fig. 12C). The θ value indicates excitation (θ < 0) or inhibition (θ > 0) at response onset (see Fig. 12D), and the φ value indicates spectral asymmetry around BF_strf (see also Fig. 12E). A. An onset cell (θ < 0) with inhibition merely above the BF_strf (φ < 0). B. An onset cell (θ < 0) with inhibition below the BF_strf (φ > 0). The Δχ value implies a 0.7-octave difference between the BF positions for downward and upward ripple responses, which is obvious from the broad spectral excitation pattern. C. An offset cell (θ > 0) with excitation only below the inhibition spot (φ > 0). D. An onset AC cell responding equally to both upward and downward ripples (small Δt and Δχ values), which also has small, symmetrical inhibition above and below the BF_strf.

For 75 cells with good temporal following to a wide range of ripples (median q > 0.4), we also analyzed the responses to six different natural vocalizations. We predicted the responses to the vocalizations by applying the linear STRF model of Eq. 11 (details in Materials and Methods). The result of this analysis for one of the neurons (J114) is illustrated by the black curve in Fig. 14C for the monkey grunt sound. The actual average response (gray curve) is shown for comparison. Note that the predicted response was not rectified; negative values thus indicate a predicted response inhibition, while positive values can be interpreted as the firing rate of the cell.
Although the largest peak of the actual response was successfully predicted, the linear model failed to make a decent prediction for the rest of the trial, as evidenced by the low correlation coefficient between the two curves (r = 0.12). Fig. 15 shows the prediction for four of the other vocalizations presented to different neurons of the two monkeys. Although most acoustic energy of these vocalizations overlapped considerably with each neuron's STRF, the linear predictions were still poor.
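The prediction procedure illustrated in Fig. 14 (convolution of the STRF with the stimulus spectrogram along the temporal dimension, followed by summation along the spectral dimension) can be sketched as follows. This is a minimal Python illustration, not the implementation used in this study: Eq. 11 is not reproduced in this section, and the array layout (frequency × time), the causal alignment, and the function name are assumptions.

```python
import numpy as np

def predict_response(strf, spectrogram):
    """Linear STRF prediction of a response time course.

    strf:        array (n_freq, n_lag), STRF weights per frequency and time lag
    spectrogram: array (n_freq, n_time), stimulus energy on the same frequency
                 axis and with the same time resolution as the STRF
    Returns an array of length n_time (not rectified, so it can be negative).
    """
    strf = np.asarray(strf, dtype=float)
    spec = np.asarray(spectrogram, dtype=float)
    if strf.shape[0] != spec.shape[0]:
        raise ValueError("STRF and spectrogram need the same number of frequency channels")
    n_time = spec.shape[1]
    predicted = np.zeros(n_time)
    for k in range(strf.shape[0]):
        # Causal temporal convolution for frequency channel k, truncated
        # to the stimulus duration, then summed across channels.
        predicted += np.convolve(spec[k], strf[k], mode="full")[:n_time]
    return predicted
```

In this form, a stimulus impulse in one frequency channel produces a copy of that channel's STRF time course in the prediction, delayed by the time of the impulse. Scaling the prediction to the measured response and subtracting the spontaneous rate, as described in the caption of Fig. 14, would be separate steps.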
Figure 14. Predicted response of an example cell (J114). A. STRF and spectrogram of the stimulus (monkey vocalization, grunt). The STRF is spectrally centered around its BF. The spectrogram was taken over the same frequency range and at the same spectral-temporal resolution. B. Prediction spectrogram obtained by convolving the STRF and the stimulus spectrogram along the temporal dimension. C. Predicted response (black) obtained by summation along the spectral dimension of the predicted spectrogram. The gray curve represents the actual response for comparison. The average spontaneous spike rate (here, 58 spikes/s) is subtracted from the actual response. The amplitude of the predicted response is scaled to the actual response. doi:10.1371/journal.pone.0116118.g014

Figure 15. Prediction of response to four different vocalizations compared to actual response for four AC cells. A. Cell T79 with high BF (6.7 kHz) responding to a bird call (Meadowlark). B. Cell T81 with low BF (0.8 kHz) and a broad excitation, which is followed by inhibition, responding to a Blackbird call. C. T84, an inhibitory onset cell with high BF (10 kHz) responding to an undulating vocalization. D. J145, another inhibitory onset cell with broad inhibition-excitation and medium BF (2 kHz). The predicted (black) and actual (gray) responses are shown for each cell. In general the prediction was poor.

To summarize the overall performance of the linear predictive power of the STRF for all vocalizations across the 75 neurons, Fig. 16 shows the response correlations as a function of each neuron's prediction strength. We defined the latter as the root-mean-square of the predicted response amplitude for each vocalization, to avoid trivially low correlations for stimuli for which the expected response would be close to zero. In our previous study of monkey IC neurons, the highest linear response correlations were typically obtained for the strongest responses [10]. In contrast, however, we found no such relationship for the population of AC cells. In general, the correlations between actual and predicted responses were low for each of the six vocalizations, as shown in the table. The best predictions were obtained for the monkey grunt sound (r = 0.14, on average), although they were still very low. Rectification of the predicted responses, by setting the negative values to zero, had no significant influence on these results.
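The two summary measures used for Fig. 16 can be written compactly. The sketch below assumes that the prediction strength is the root-mean-square of the predicted response and that the response correlation is a Pearson correlation coefficient between predicted and actual responses; the optional rectification sets negative predicted values to zero, as described above. The function names are hypothetical.

```python
import numpy as np

def prediction_strength(predicted):
    """Root-mean-square amplitude of the predicted response."""
    p = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(p ** 2)))

def response_correlation(predicted, actual, rectify=False):
    """Pearson correlation between predicted and actual response.

    With rectify=True, negative predicted values are set to zero first.
    Returns NaN if either signal is constant (correlation undefined).
    """
    p = np.asarray(predicted, dtype=float)
    a = np.asarray(actual, dtype=float)
    if rectify:
        p = np.maximum(p, 0.0)
    if np.std(p) == 0 or np.std(a) == 0:
        return float("nan")
    return float(np.corrcoef(p, a)[0, 1])
```

Plotting `response_correlation` against `prediction_strength` per neuron and vocalization would give a scatter of the kind summarized in Fig. 16, under the stated assumptions.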

Discussion
Our study allows for a direct comparison of the spectral-temporal tuning properties of AC cells with those obtained from IC cells in awake macaque monkeys, as they were tested under the same experimental protocols and for the same sound stimuli [10]. Our results indicate that both structures share a wide diversity of selective responses to ripple densities and velocities (Figs. 4 and 6), and that both demonstrate quadrant and full spectral-temporal separability in the majority of recorded neurons (Fig. 10). However, we also observed marked differences between the IC and AC cells. First, while the correspondence between best frequencies for tones and ripples remained within 0.25 octaves for most IC cells, it was quite poor, with often more than half an octave difference, for the population of AC cells (Fig. 5). Second, the linear prediction for AC responses to the set of natural stimuli on the basis of the neuron's STRF was typically poor (Figs. 14-16), whereas for IC neurons the prediction was fair (correlations 0.4-0.6) for stimuli yielding high firing rates.

General characteristics
Spectral tuning. The AC population showed a predominant preference for low ripple densities (Ω_B ≤ 0.4 c/o; Fig. 6A). This result is similar to monkey IC [10], and to data obtained from thalamus and A1 in anesthetized animals [2,14,[38][39][40][41]. This suggests that ripple-density selectivity might be faithfully transmitted from IC to AC, and that anesthesia may have little influence on spectral tuning characteristics. A similar preference for low ripple densities was also reported for human auditory cortex [42], and for human psychophysics [43]. This prominent preference for low ripple densities might be confounded by the high percentage (76%) of recorded AC cells with a low BF_tone (<2 kHz), considering that cells with a low BF_tone typically have broader tuning characteristics (e.g., Kowalski et al., 1995; Versnel et al., 1995 [36,37]). However, the absence of a correlation between BF_tone and Ω_B, and the low Ω_B values for neurons with a high BF_tone observed in this non-human primate study, indicate that neurons across monkey AC have low Ω_B (Fig. 6D).
Temporal tuning. Previous studies have suggested that the range of preferred ripple velocities decreases from IC to cortex [40,41]. In line with these studies, the peak at low ω_B in our study was similar to that of monkey IC (ω_B = 8 Hz for 34% of AC cells vs. 31% of IC cells [10]), but the preference for the two highest measured velocities decreased substantially from IC to AC (ω_B = 32-40 Hz in 36% of IC cells vs. 24% of AC cells; Fig. 6A; [2,10,14,40]). Also, the joint distribution of optimal velocities and densities for the AC population did not cover the entire 2D spectral-temporal space, as it did for IC neurons [10]. For example, cells tuned to higher ω_B (> 24 Hz) tended to be associated with a smaller range of best densities than cells tuned to low ω_B (Fig. 6F). This suggests that AC cells can process sound features over a range that is spectrally narrower than that of IC cells [10].
Phase functions. The phase functions of the spectral-temporal transfer function were approximately linear in both quadrants (Fig. 12A), which was also reported for ferret A1 neurons [2,4] and for monkey IC [10]. These straight-line relationships indicate good phase following of the ripple envelopes. The regression coefficients derived from these straight-line phase relations are informative descriptors of the STRF: the slope t of the temporal phase function determines the group delay (BF_strf latency), whereas the slope χ of the spectral phase function determines its spectral position (BF_strf). As reported for IC neurons, these parameters were symmetrically distributed for upward and downward ripple directions for AC cells as well (Fig. 12B, C). However, unlike for IC neurons, there was no significant relation between the up-down differences of direction selectivities (D) and the up-down group-delay differences [10].
The spectral (φ) and temporal (θ) components of the intercepts of the straight-line phase functions refer to STRF asymmetries in excitatory and inhibitory bands around the cell's BF. As for ferret A1 [2,4] and monkey IC [10], we found similar distributions for these intercepts. The majority of AC neurons have an onset excitation at their BFs (θ < 0; Fig. 12D). Inhibitory side bands above (φ > 0) and below (φ < 0) the BF_strf were found for the same fractions of AC neurons (Fig. 12E).

Separability and direction selectivity
In anesthetized preparations a substantial proportion of A1 cells were characterized by inseparable STRFs (ferret, [4,44]; cat, [40]; mouse, [5]). Most often, however, inseparability was due to an asymmetry between ripple directions (Ω > 0 vs. Ω < 0), as the transfer functions for inseparable STRFs were typically quadrant separable. Versnel et al. [10] reported that the majority of monkey IC neurons (>70%) had fully separable STRFs, and IC cells with inseparable STRFs were typically quadrant separable. We here found that STRFs of AC cells were even more separable than those of IC neurons, since 93% of the cells were both fully separable and quadrant separable. Our results may be quite surprising, since direction selectivity is typically found to increase from IC to cortex (rat, [45]) and separability is found less often in awake than in anesthetized conditions (ferret, [44]). We suggest that species differences underlie the apparent differences in (in)separability between previous studies and our results from awake monkey. The extent of separability may arise from the species-specific relevance of environmental sounds (e.g., vocalizations in bats have a strong spectrotemporal direction dominance, leading to high inseparability of STRFs [17]).

Linear response predictions
As a simple measure for a potential nonlinearity in AC responsiveness we took the difference between the best frequencies found for pure tones (BF_tone) and for the STRF derived from broadband ripples (BF_strf). Our results revealed poor similarity between the two BFs (Fig. 5). Previous studies demonstrated a strong correspondence between the BFs in A1 cells of anesthetized ferrets [2,38,39], which could suggest that anesthesia might linearize potentially nonlinear response behavior of AC neurons. Recordings taken from IC showed a much better agreement between BF_tone and BF_strf in awake bats [17], as well as in awake monkeys (Fig. 5B, [10]). This might indicate that the contribution of nonlinear processing to the neural responses increases along the ascending auditory pathway.
Note that BF_tone is based on a spike-rate average response (see Materials and Methods). Conversely, the STRF relates to the synchronization with the ripple envelope modulations. This could lead to different BFs for pure tone and modulated sound stimuli. Previous studies have demonstrated a gradual transformation from temporal (synchronized) encoding at lower levels of the auditory system to a rate-coding mechanism at higher levels [46]. On that basis, one would expect an increased resemblance, rather than an increased difference, between BF_tone and BF_strf for AC cells relative to IC cells, as both BFs in AC would be based on non-synchronized firing rates. However, this is not the case, as other studies have reported that both AC and IC neurons of awake monkeys are well capable of phase locking at modulation frequencies above the range (8-40 Hz, in steps of 8 Hz) used for the AC and IC neurons [47]. Thus, although the STRF was based on synchronized responses for both AC and IC neurons, the two measures of BF differed much more for AC cells than for IC cells [10].
In addition, we observed a general failure to faithfully predict responses to the set of vocalizations on the basis of the STRF (Figs. 14-16). Previous studies, applying the STRF method, demonstrated that most AC neurons responded linearly to broadband sounds in rats [13], to ripples in anesthetized ferrets [11,39,48], and to virtual acoustic space stimuli in anesthetized ferrets and cats [12,49]. This could suggest that cortical responses may be more linear in anesthetized preparations than in awake animals. If so, the apparent discrepancy between results may be attributed to the state of alertness of the animal [16,50]. Note that other studies have shown that differences in an animal's alertness do not systematically change the shape of the STRF [25,51], but rather seem to affect the response strength [9]. When the neural response behavior is nonlinear, the shape of a neuron's STRF, and hence its potential usefulness to predict responses to other stimuli, may critically depend on the stimuli used to extract the STRF, as well as on their resemblance to the test stimuli used to compute the linear prediction, or response approximation [52][53][54][55][56]. Indeed, broadband sustained stimuli, such as the spectral-temporal ripples used in this study, led to a different estimate of the STRF for cortical neurons than narrow-band stimuli, like natural vocalizations [48]. In line with this, some studies have reported that the linear STRF prediction of AC cells improves with the similarity between test stimuli and the sounds used to derive the STRFs (in anesthetized ferrets [39]; in anesthetized zebra finches [52]). We therefore suspect that part of the failure to predict AC responses to the set of vocalizations in Figs. 14-16 may be explained by such nonlinear aspects, as the STRFs were derived from responses to ripples, which differed substantially from the natural vocalizations.
Although it is feasible that better predictive power to natural sounds could be obtained if STRFs are derived from similar natural sounds, a quantitative comparison with earlier IC data requires the use of the same data analysis procedures and stimulus sets [10].
Single-unit recordings from the IC of awake bats [17] and monkeys [10] showed a fair linear predictability of the responses. Those results, however, also indicated that IC responses could not be fully explained by a linear kernel such as the STRF. This could imply an increased contribution from nonlinear processing along the ascending auditory pathway from IC to AC, which is in line with the observations of Atencio et al. [57]. The data of Yeshurun et al. [58,59], showing that responses to natural sounds were better predicted for medial geniculate body (MGB) neurons than for cortical neurons, indicate that nonlinearities could arise at the level of the AC. This is further corroborated by studies reporting multiplicative encoding of different sound features in ferret AC [60], and multiplicative interactions between bottom-up (acoustic) and top-down (task-related) signals in single units of monkey AC [25,26].
Our findings add to a growing body of evidence for a hierarchical increase of auditory processing complexity along the ascending auditory pathway. Through a detailed comparison with data obtained earlier from the monkey IC [10], we found that the spectral-temporal acoustics across the IC and AC populations, as described by the STRF and responses to pure tones, are very similar (Table 2). Yet, prominent differences between the two structures have become apparent as well: the tuning characteristics for tones, ripples and natural stimuli are strongly related at the level of single IC neurons, where the STRF provides a reasonable model to predict the responses to the different classes of stimuli, while this correspondence is nearly lost for AC neurons. Whether these differences are due to intrinsic neural nonlinearities such as short-term plasticity or rapid synaptic depression [61], increased trial-to-trial response variability [62], additional top-down cognitive signals [25,26], or to a combination of these factors, needs to be explored in future studies.