Representation of Sound Objects within Early-Stage Auditory Areas: A Repetition Effect Study Using 7T fMRI

Environmental sounds are highly complex stimuli whose recognition depends on the interaction of top-down and bottom-up processes in the brain. Their semantic representations were shown to yield repetition suppression effects, i.e. a decrease in activity during exposure to a sound that is perceived as belonging to the same source as a preceding sound. Making use of the high spatial resolution of 7T fMRI, we have investigated the representations of sound objects within early-stage auditory areas on the supratemporal plane. The primary auditory cortex was identified by means of tonotopic mapping and the non-primary areas by comparison with previous histological studies. Repeated presentations of different exemplars of the same sound source, as compared to the presentation of different sound sources, yielded significant repetition suppression effects within a subset of early-stage areas. This effect was found within the right hemisphere in primary areas A1 and R as well as two non-primary areas on the antero-medial part of the planum temporale, and within the left hemisphere in A1 and a non-primary area on the medial part of Heschl’s gyrus. Thus, several, but not all, early-stage auditory areas encode the meaning of environmental sounds.


Introduction
The human primary auditory cortex (PAC) is currently defined on the basis of cytoarchitectonic and tonotopic criteria. It is co-extensive with the koniocortex on the medial two-thirds of Heschl's gyrus (HG) [1][2][3][4][5][6]. Its precise localization in activation studies relies, especially in cases of partial or complete HG duplication (in 40-60% of hemispheres [4,7]), on tonotopic mapping. The presence of orderly tonotopic representations is a key feature of the three core areas in non-human primates [8][9][10][11][12][13][14][15][16], where primary subfields are organised in anterior-posterior frequency gradients from high-to-low (caudal primary auditory subfield A1), low-to-high (rostral primary auditory subfield R), and high-to-low (rostrotemporal primary auditory subfield RT) frequencies, with a low frequency cluster at the boundary between A1 and R and a high frequency cluster between R and RT. In humans, fMRI studies have consistently revealed a double frequency representation with anterior-posterior frequency gradients on HG, which are the human homologues of monkey A1 and R subfields [17][18][19][20][21][22][23], but less often the third reversal equivalent to the RT subfield.
The cortex adjacent to PAC contains several non-primary areas (the auditory belt areas), which are architectonically inhomogeneous [12,15,16,24,25]. The use of function-related chemical stains on post-mortem human brains has led to the identification of several early-stage areas which are the likely homologues of primate belt areas: the anterior area (AA), the medial area (MA), the anterior-lateral area (ALA), the lateral area (LA) and the posterior area (PA) [5,6]. The non-primary areas are partly frequency-selective, but without a clear-cut tonotopic organisation [17]. They tend to respond to more complex auditory stimuli, which characterises them as putative belt areas [26].
Recognition of environmental sounds involves the early-stage auditory areas as well as parts of the temporal and parietal convexities [27][28][29][30][31]. The discrimination between broad categories, such as living vs. man-made sound sources, occurs very rapidly (as early as 70 ms after stimulus onset) due to the interactions of top-down and bottom-up processes, which characterize the ventral and dorsal auditory streams [32][33][34][35]. Several lines of evidence indicate that early-stage areas on the supratemporal plane (STP) analyze spectrotemporal features of sounds, whereas higher-order areas on the temporal convexity are dedicated to semantic processing [36,37], suggesting a hierarchy within the ventral stream. Thus, our ability to distinguish between different samples of the same sound object (broad category) relies on the fundamental differences in sound amplitude and spectral power. We distinguish the mewing of a kitten from that of an old cat because of these differences, even though we have already categorized the sound as coming from a cat. Additional evidence points to a key role of the planum temporale (PT) in the analysis of complex auditory stimuli, including auditory scene analysis [32,38].
Here, we have made use of the high spatial resolution of ultra-high field fMRI to investigate the representations of environmental sounds within individual early-stage auditory areas by means of a repetition effect paradigm. The increased signal-to-noise ratio and BOLD signal, the decrease of the signal strength of venous blood (due to the short relaxation times) and the restriction of activation signals to the cortical gray matter were shown to improve spatial specificity [69,70]. Altogether, these technical advances are beneficial for the tonotopic mapping of the (small) human areas A1 and R and for repetition effect paradigms within individual early-stage areas, which require high sensitivity [17,18,20,71,72,73,74].
We addressed two issues. First, as suggested by previous low-spatial resolution repetition effect studies, the planum polare (PP), which belongs to the hierarchically organized ventral pathway, may encode the meaning, and not solely the acoustic features, of environmental sounds. It is currently not clear whether this is already the case in PP belt areas, i.e., areas immediately adjacent to PAC, and possibly also in PAC. If so, this would challenge the key tenet of a strictly hierarchical model of sound recognition. Second, PT, which serves as a hub for complex auditory processing, may encode the meaning of environmental sounds. The possible role of PT belt and/or parabelt areas has not yet been investigated.

Materials and Methods

Subjects
Ten subjects (6 female, mean age 23.9 ± 3.7) with normal hearing and no history of neurological or psychiatric illness participated in the study. Written, informed consent forms were signed by all subjects after a brief oral description of the protocol. The Ethics Committee of the Faculty of Biology and Medicine of the University of Lausanne approved all experimental procedures. Eight subjects were right-handed, one left-handed and one ambidextrous. One subject's data were discarded due to large motion artefacts. Data from the remaining nine subjects were used in the current analysis.

Auditory stimuli
Sound stimuli were generated using MATLAB and the Psychophysics Toolbox (www.psychtoolbox.org). Stimuli were delivered binaurally via MRI-compatible headphones (Insert Earphones, SensiMetrics, MA, USA) featuring flat frequency transmission from 100 Hz to 8 kHz. Sound intensities were adjusted to match standard equal-loudness curves (ISO 226) at 95 phon: the sound intensity of each pure tone stimulus (ranging from 88 to 8000 Hz) was adjusted to approximately equal the perceived loudness of a 1000 Hz reference tone at 95 dB SPL (range of sound intensities: 87-101 dB SPL). Sound levels were further attenuated (~35 dB) by silicone ear plugs (Etymotic Research Inc., ER38-15SM). All subjects were debriefed after the session and all reported hearing sounds clearly over the background of scanner noise.

Tonotopic mapping
Pure tones (88, 125, 177, 250, 354, 500, 707, 1000, 1414, 2000, 2828, 4000, 5657, and 8000 Hz; half-octave steps with a sampling rate of 44.1 kHz) were presented in ordered progressions, following our previously described protocols [17,75]. Each subject performed two tonotopic runs with ascending and descending progressions (low to high and high to low frequencies, respectively). Pure tone bursts were presented during a 2 s block in consecutive steps until all 14 frequencies had been presented. The 28 s progression was followed by a 4 s silent pause, and this 32 s cycle was repeated 15 times per 8 min run. Resulting maps of the two runs were averaged. This paradigm is designed to induce travelling waves of response across cortical tonotopic maps [76]. Linear cross-correlation was used to determine the time-to-peak of the fMRI response wave on a per-voxel basis, and thus to assign a corresponding best frequency value to each voxel. Analyses were performed in individual-subject volumetric space and results were then projected onto same-subject cortical surface meshes.
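The timing and the lag-to-frequency logic of this paradigm can be illustrated with a short Python sketch. It regenerates the 14 half-octave tones, checks the cycle and run durations, and assigns a best frequency to a voxel from the lag that maximises a circular cross-correlation. The reference response shape, the function names and the neglect of the hemodynamic delay are assumptions made for illustration only; this is not the analysis code used in the study.

```python
import math

TR_S = 2.0          # repetition time: one volume per 2 s tone block
BLOCK_S = 2.0       # duration of each tone block
PAUSE_S = 4.0       # silent pause closing each cycle
N_CYCLES = 15

# Half-octave series centred on 1000 Hz; rounding reproduces the listed tones.
FREQS_HZ = [round(1000 * 2 ** (k / 2)) for k in range(-7, 7)]

CYCLE_S = len(FREQS_HZ) * BLOCK_S + PAUSE_S    # 28 s progression + 4 s pause
RUN_S = CYCLE_S * N_CYCLES                     # 480 s = 8 min
SAMPLES_PER_CYCLE = int(CYCLE_S / TR_S)        # 16 volumes per cycle


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_frequency(cycle_avg, reference):
    """Return (best frequency in Hz or None, peak correlation).

    cycle_avg: voxel response averaged over cycles (one value per volume),
               for an ascending run.
    reference: assumed response shape for a tone presented in the first block.
    The lag maximising the circular cross-correlation indexes the tone block;
    lags that fall in the silent pause return None.
    """
    n = len(cycle_avg)
    scores = [pearson(cycle_avg, reference[-lag:] + reference[:-lag])
              for lag in range(n)]
    lag = max(range(n), key=scores.__getitem__)
    freq = FREQS_HZ[lag] if lag < len(FREQS_HZ) else None
    return freq, scores[lag]
```

For a descending run the index into FREQS_HZ would be reversed. In this sketch any constant hemodynamic delay simply shifts the estimated lag, which is one reason why combining ascending and descending runs is useful.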
Similar to the example shown in Fig 1, two tonotopic gradients with mirror symmetry ("high-low-low-high") were clearly observed in both hemispheres of all subjects [17][18][19][20][21][75][77][78][79]. A1 was defined by the more posterior "high-to-low" gradient and R by the more anterior "low-to-high" gradient. In macaque auditory cortex, fields A1 and R receive parallel thalamic input and are both considered part of the primary auditory core.

fMRI repetition suppression experiment
The selection of the stimuli proceeded as follows. To start with, we chose 583 extracts of easily identifiable sound objects, often heard in everyday life, from the BBC sound effects database (following [46,47]) using Adobe Audition (Adobe Systems Software Ireland Ltd.). The duration of each sound was 500 ms. Amplitudes, sampling frequencies and linear rise/fall times were normalized with the same routine for all sounds (16 bits, 44.1 kHz, 50 ms rise/fall times, without high-pass/low-pass filtering). Monophonic sounds were duplicated into stereophonic sounds. Five normal subjects, who did not participate in the fMRI study, were asked whether the sound was clear, whether they recognized it, and then to name it and to rate the degree of confidence of their recognition. We then selected sounds which were correctly named by all five subjects and which reached a high confidence level (4 or 5 on a scale of 0-5). Sounds were then sorted into two groups: the repetition group (REP group, i.e. eight different individuals/exemplars of the same sound) and the control group (CTRL group, i.e. eight different sound objects). The two groups were compared for familiarity and for acoustic properties (amplitude and spectral power). The degree of familiarity, i.e. the level of confidence with which the subjects judged their recognition, was equivalent in both groups. The acoustic characteristics were controlled with the same approach as described previously [80]. Randomly selected sounds from either group were compared for their acoustic characteristics (amplitude and spectral power) at each time point with unpaired t-tests. This iteration was repeated until two lists of sounds were identified with less than 1% of significantly different time points (p<0.05; ~1000 iterations were performed). To limit false negatives, i.e. to avoid underestimating putative differences, we did not apply any correction for multiple comparisons.
To minimize the differences between REP and CTRL sets, we set the threshold to 1% of the time points. This procedure yielded a total of 323 environmental sounds (64 REP sounds and 259 CTRL sounds). As an additional control measure, we calculated the mean power spectrum (amplitude spectrum) of each condition (see S4 Fig) and performed an unpaired t-test on those, which revealed no significant differences between the two conditions. Although the overall sound acoustics were controlled between REP and CTRL groups, within each REP and CTRL block the amplitude and the spectral power differed between sound repeats (see S4 Fig). Semantic categories (animal vocalizations, human-made sounds, tools, musical instruments, and natural scene-like sounds; see S1 Table for the list of sounds used in the experiment) were equally distributed in both groups. Sounds from the REP group were never repeated in the CTRL group, and sounds from the CTRL group were randomised between blocks, subjects and runs.
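The acoustic matching criterion described above (uncorrected unpaired t-tests at each time point, with fewer than 1% of time points allowed to differ) can be sketched in Python as follows. This is a simplified illustration under several assumptions: each sound is represented as a list of per-time-point values, only the CTRL candidates are redrawn on each iteration, and significance is decided by comparing the pooled-variance t statistic with a critical value supplied by the caller, because the Python standard library provides no t distribution. All function and parameter names are hypothetical.

```python
import math
import random
from statistics import mean, variance


def unpaired_t(a, b):
    """Student's unpaired t statistic (pooled variance) for two samples."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    if pooled == 0:
        return 0.0 if mean(a) == mean(b) else math.inf
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def fraction_different(rep, ctrl, t_crit):
    """Fraction of time points where |t| exceeds t_crit (no correction).

    rep, ctrl: lists of sounds; each sound is a list of per-time-point
    values (e.g. amplitude samples), all of equal length.
    """
    n_points = len(rep[0])
    hits = 0
    for i in range(n_points):
        t = unpaired_t([s[i] for s in rep], [s[i] for s in ctrl])
        hits += abs(t) > t_crit
    return hits / n_points


def match_lists(rep, pool, n_ctrl, t_crit, max_iter=1000, limit=0.01, seed=0):
    """Redraw CTRL candidates from `pool` until fewer than `limit` of the
    time points differ; return the accepted list, or None if none is found."""
    rng = random.Random(seed)
    for _ in range(max_iter):
        ctrl = rng.sample(pool, n_ctrl)
        if fraction_different(rep, ctrl, t_crit) < limit:
            return ctrl
    return None
```

The critical value depends on the group sizes; for example, two groups of three sounds give 4 degrees of freedom and a two-tailed critical value of 2.776 at p = 0.05.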
Subjects listened passively to sounds during fMRI acquisitions, with their eyes closed. A block design with alternating blocks of sounds of the same semantic category (REP) and sounds of different semantic categories (CTRL) was used. REP blocks consisted of eight different exemplars of the same object (e.g. eight cries of different babies), giving in total 8 REP blocks or 64 REP sounds per run. CTRL blocks contained 8 exemplars of different categories, randomly selected at the beginning of each run (8 different objects x 8 blocks = 64 out of the 259 CTRL sounds). Sounds were presented bilaterally for 500 ms with an inter-stimulus interval of 1500 ms over a period of 16 s, followed by a 14 s silent pause. Each fMRI run consisted of 16 blocks of 30 s (8 REP and 8 CTRL, 8 minutes in total). Two runs, with the same sequence of sounds, were acquired both before and after tonotopic mapping runs. Sound presentations were synchronized with the scanner trigger. All subjects reported clear perception and recognition of the stimuli of both groups.
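The block timing stated above can be checked with a few lines of Python. The sketch assumes that the 1500 ms inter-stimulus interval runs from sound offset to the next onset (the reading that reproduces the 16 s stimulation period) and that volumes are acquired every 2 s, as given in the acquisition parameters; the constant and function names are illustrative.

```python
SOUND_S = 0.5            # duration of each sound
ISI_S = 1.5              # assumed offset-to-onset interval
SOUNDS_PER_BLOCK = 8
SILENCE_S = 14.0         # silent pause closing each block
BLOCKS_PER_RUN = 16      # 8 REP + 8 CTRL
TR_S = 2.0               # repetition time of the EPI sequence


def sound_onsets():
    """Onset time (s) of each sound relative to block onset."""
    return [i * (SOUND_S + ISI_S) for i in range(SOUNDS_PER_BLOCK)]


STIM_S = SOUNDS_PER_BLOCK * (SOUND_S + ISI_S)   # 16 s of stimulation
BLOCK_S = STIM_S + SILENCE_S                    # 30 s per block
RUN_S = BLOCK_S * BLOCKS_PER_RUN                # 480 s = 8 min
VOLUMES_PER_BLOCK = int(BLOCK_S / TR_S)         # 15 time points per block
```

The 15 volumes per block correspond to the 15 time points of the block-averaged time-courses used in the repetition effect analysis.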

MRI data acquisition and analysis
Imaging was performed with an actively shielded 7 Tesla Siemens MAGNETOM scanner (Siemens Medical Solutions, Erlangen, Germany) located at the Centre d'Imagerie BioMédicale (CIBM) in Lausanne, Switzerland. Functional data were acquired using a 32-channel head volume rf-coil (Nova Medical, USA [81]) and an EPI pulse sequence with sinusoidal read-out (1.5 x 1.5 mm in-plane resolution, slice thickness = 1.5 mm, TR = 2000 ms, TE = 25 ms, flip angle = 47°, slice gap = 1.57 mm, matrix size = 148 x 148, field of view 222 x 222, 30 oblique slices covering the superior temporal plane, first three EPI volumes discarded). The sinusoidal shape of the readout gradients reduces the acoustic noise produced by the scanner. A T1-weighted high-resolution 3-D anatomical image was acquired for each subject using the MP2RAGE pulse sequence optimized for 7T (resolution = 1 x 1 x 1 mm, TR = 5500 ms, TE = 2.84 ms, TI1 = 2350 ms, TI2 = 0 ms, slice gap = 1 mm, matrix size = 256 x 240, field of view = 256 x 240; [82]).
Preprocessing steps were performed with BrainVoyager QX v2.3 software and included standard linear trend removal, temporal high-pass filtering and motion correction, but no spatial smoothing. Functional time-courses were interpolated into a 1 x 1 x 1 mm volumetric space and registered to each subject's 3D Talairach-normalized anatomical dataset. Cortical surface meshes were generated from each subject's anatomical scan using automated segmentation tools of the program. Alignment of anatomical data across subjects was performed with the cortex-based alignment [83]. This is a non-rigid alignment of cortical surface meshes across individuals based on the gyral and sulcal folding patterns. Each subject's cortical surface meshes were aligned to a target mesh (separately for left and right hemispheres) which had an intermediate HG anatomy (partial HG duplication in the left hemisphere and a large single gyrus in the right hemisphere). All alignments were visually inspected. A group-averaged activation map for the contrast environmental sound vs. rest was generated in this cortex-based aligned mean space during the data analysis of the repetition suppression experiment (for regions outside the auditory cortex, see S3 Fig).

Identification of non-primary early-stage auditory areas
Individual tonotopic mappings were used to identify in each subject non-primary early-stage areas, defined as subject-specific regions of interest (ROIs). This approach has been previously used in several studies across different modalities (visual localizer: [84]; auditory localizer: [36,85]). Maps were set to an intermediate threshold (r>0.13, equivalent to p<0.05, uncorrected) in order to cover a region including most of STP in all subjects and hemispheres. We then manually outlined a contiguous patch of interest (mean volume LH: 1400.87 mm² ± 321.35, and mean volume RH: 1364.58 mm² ± 189.15) of cortical surface including the two tonotopic gradients within PAC, the remaining medial and lateral parts of HG, the posterior part of the PP and the anterior part of the PT using the drawing tools in BrainVoyager QX (external outlines in Fig 1). This patch of interest was subdivided into 10 regions with the following steps. First, the primary areas A1 and R were localized based on mirror-symmetric preferred-frequency reversals. The anterior and posterior borders thereof were drawn along the outer high-frequency representations, while lateral and medial borders were set so as to cover only the medial two-thirds of HG (in accordance with human architectonics [5,86]). The border between A1 and R was then drawn along the long axis of the low-frequency cluster. The location of the border of A1 and R was not dependent on the correlation threshold. Second, we divided the non-primary region surrounding A1 and R into eight ROIs. The common border between A1 and R was extended until the outlines of the main patch, dividing the main patch into anterior and posterior parts. The resulting 10 regions (Table 1 and S2 Fig) thus included primary and non-primary auditory areas, in agreement with the monkey model [8].
Several of these areas have been identified in previous architectonic studies [5,6] (Table 1); M1, L1, L2, L3, L4 and M4 corresponded, respectively, to PA (posterior auditory area), LA (lateral auditory area), ALA (anterior lateral auditory area), ALA-AA (junction between the anterior lateral and anterior auditory areas), AA (anterior auditory area), and MA (medial auditory area).
These regions of cortical surfaces were projected into the same-subject's 1 x 1 x 1 mm interpolated volumetric space to generate 3D ROIs with a width of 2 mm (-1 mm to 1 mm from the vertex centre). Individual time-courses from the 3D-ROIs were subsequently analyzed in the repetition effect experiment.

Time-course analysis and plateau definition
Individual functional time-courses were also extracted for all individual voxels within the main region of interest. Using in-house MATLAB scripts, they were baseline corrected and averaged in space (within ROIs) and in time (across runs and block repetitions), separately for each condition, in order to obtain two final time-courses, one for REP and one for CTRL, with 15 time points each per ROI, for each hemisphere and subject. These time-courses were then averaged across subjects and normalised to the first time-point.
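A minimal Python sketch of this averaging step is given below for one ROI. The nested-list data layout, the function names and the choice of subtraction for the normalisation to the first time point are assumptions made for illustration; the original analysis was carried out with MATLAB scripts that are not reproduced here.

```python
def mean_timecourse(timecourses):
    """Average a list of equal-length time-courses point by point."""
    n = len(timecourses)
    return [sum(column) / n for column in zip(*timecourses)]


def normalise_to_first(timecourse):
    """Express a time-course relative to its first time point.
    Subtraction is an assumption; a ratio would be another option."""
    return [value - timecourse[0] for value in timecourse]


def condition_timecourses(roi_data):
    """Average within an ROI, separately for each condition.

    roi_data: {condition: [voxel][block][time point]} (hypothetical layout).
    Returns {condition: time-course averaged over blocks and voxels,
    normalised to the first time point}.
    """
    result = {}
    for condition, voxels in roi_data.items():
        per_voxel = [mean_timecourse(blocks) for blocks in voxels]
        result[condition] = normalise_to_first(mean_timecourse(per_voxel))
    return result
```

With the block timing of the experiment each block would contribute 15 time points; the same `mean_timecourse` helper could then be reused to average the per-subject results.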
Repetition suppression effects measured with EEG are related to amplitude differences between the first presentation and the repeat of a brief, single event, which can be picked up due to the high temporal resolution of the technique. Repetition-induced changes of neural activity are more difficult to assess with fMRI, due to its poor temporal resolution. In order to overcome this limitation in our study, we used a block design approach. We assumed that whether the sound was followed by a repetition or not, the hemodynamic response of the first sound had the same behaviour at onset, and only the plateau (or the saturation period) differed between CTRL and REP conditions. We hypothesized that in case of repetition effects, i.e. in REP blocks, the slope of the BOLD response would be steeper than during the CTRL condition. BOLD signal intensities of consecutive time frames were subtracted pairwise to calculate their relative slopes (t_{n+1} - t_n). We tested our hypothesis on the slope values using paired t-tests against 0. Positive slope values indicate a rise period, negative values a decay and null values a plateau. We restricted our results in time to a minimum of two consecutive time frames. Time frame by time frame paired t-tests revealed significant differences (p<0.05, uncorrected) in slopes during the same time periods for all conditions and hemispheres: a rise between 2-6 s, a plateau between 6-18 s, and a decay between 18-22 s (S1 Fig).
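The slope-based definition of rise, plateau and decay can be sketched in Python as follows. The sketch computes the differences between consecutive time frames for each subject and tests them against 0 across subjects with a t statistic; the critical value is passed in by the caller (for nine subjects, 8 degrees of freedom, the two-tailed value at p = 0.05 is 2.306). The function names are hypothetical and the requirement of at least two consecutive significant time frames is not implemented here.

```python
import math
from statistics import mean, stdev


def slopes(timecourse):
    """Differences between consecutive time frames, t[n+1] - t[n]."""
    return [b - a for a, b in zip(timecourse, timecourse[1:])]


def t_against_zero(values):
    """t statistic for the mean of `values` tested against 0."""
    m, sd = mean(values), stdev(values)
    if sd == 0:
        return 0.0 if m == 0 else math.copysign(math.inf, m)
    return m / (sd / math.sqrt(len(values)))


def classify_phases(subject_timecourses, t_crit):
    """Label each interval between frames as 'rise', 'decay' or 'plateau'.

    subject_timecourses: one time-course per subject, all of equal length.
    t_crit: two-tailed critical t for df = number of subjects - 1.
    """
    per_subject = [slopes(tc) for tc in subject_timecourses]
    labels = []
    for interval in zip(*per_subject):
        t = t_against_zero(interval)
        if t > t_crit:
            labels.append("rise")
        elif t < -t_crit:
            labels.append("decay")
        else:
            labels.append("plateau")
    return labels
```

Applied to block-averaged time-courses sampled every 2 s, the labels would map onto time windows such as the reported rise, plateau and decay periods.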

Results
Identification of early-stage auditory-related areas
Individual phase-encoding analysis of the time-courses of the tonotopy runs reproduced the mirror-symmetric tonotopic gradients as previously reported in other studies [17,75]. The location and the extent of frequency-selective regions were determined individually in each hemisphere and subject. When calculated at the same statistical threshold (p<0.05, uncorrected), this region covered in each subject a large part of the STP, including PP and PT, and it was co-extensive with the region activated by environmental sounds (main effect of environmental sound presentation; see S2 Fig).
Here, tonotopic mapping was used as a localizer for primary and non-primary auditory areas, for which repetition effects to environmental sounds were then investigated. The primary areas A1 and R were identified by their characteristic "high-low-low-high" reversal. The surrounding frequency-selective region was parcelled into 8 ROIs, designated L1-L4 on the lateral and M1-M4 on the medial part (Fig 1; see Materials and Methods). Several ROIs corresponded to areas identified in previous architectonic studies [5,6] (Table 1).

Repetition effects within auditory cortex on the supratemporal plane
Irrespective of the condition (REP, CTRL), the BOLD time course within the auditory cortex had a similar evolution, consisting of rise, plateau and decay. These three time windows were defined by means of temporal derivatives of the average time course (Fig 2 and S1 Fig). The rise comprised the period of 2-6 s after block onset and was very likely shaped by the hemodynamic response to the first sound. The plateau stretched over the period of 6-18 s and was shaped by the hemodynamic response to the 7 following stimuli. A significant difference between REP and CTRL during the plateau was interpreted as a repetition suppression effect and hence an indication that the neural population encoded the meaning of the stimuli.
In a first analysis, the BOLD response was averaged over the whole STP region with a significant main effect of environmental sounds (which was co-extensive with the 10 early-stage areas). The auditory cortex time-courses (averaged across blocks and subjects for each condition) were significantly different between conditions near the peak of the BOLD response (which also corresponds to the beginning of the plateau period) in both hemispheres (p<0.05, uncorrected; green line in Fig 2). Bilateral REP time-courses peaked 2 s earlier (6 and 8 s after stimulus onset for REP and CTRL time-courses, respectively) and had a different plateau decrease than the CTRL time-courses. This could possibly reflect different saturation of the BOLD response when different individuals/exemplars of the environmental sounds are presented. Left hemisphere CTRL time-courses showed a sustained plateau between 12 and 18 s after stimulus onset, whereas right CTRL time-courses showed a slow decay. REP time-courses showed a faster return to baseline or a lower plateau than CTRL time-courses, comparable to previous results with other sensory modalities showing repetition suppression effects [40]. The REP and CTRL conditions differed significantly at 8-10 s in the left hemisphere and at 6-8 s in the right hemisphere (paired t-test, uncorrected). No significant difference was found between the hemispheres for either condition.

Repetition effects within individual early-stage auditory areas
Time-courses of the BOLD response were analyzed separately in each area (Figs 3 and 4). Independent of the condition, responses looked larger in posterior than in anterior areas (for CTRL in RH: Table). The REP and CTRL time-courses were almost identical in areas L2 and L3 in either hemisphere (paired t-test, p<0.05, uncorrected; see S3 Table), suggesting that [...] Table). This could possibly reflect faster saturation of the BOLD response in these regions when different individuals/exemplars of the environmental sounds are presented.
The repetition effect, i.e., a significant difference between the REP and CTRL conditions during the plateau phase, was present in areas A1 and M3 of the left hemisphere (Fig 3) and in A1, R, M1 and M2 of the right hemisphere (Fig 4).
A time-point-per-time-point 2 x 2 ANOVA (Hemisphere x Condition) on the BOLD time-courses revealed a main effect of condition during the plateau phase in 6 areas, A1, R, M1, M2, M3, M4 and L4 (Fig 5, left panel, p<0.05, uncorrected). A main effect of hemisphere was present during the plateau phase in 2 areas, M1 and M2 (Fig 5, middle panel, p<0.05, uncorrected). No significant interaction Hemisphere x Condition was observed during the plateau phase (Fig 5, right panel).

Environmental sounds representations on the posterior temporal convexity
The main effect of environmental sounds, irrespective of condition, revealed bilateral activation clusters in the superior temporal gyrus (STG), near the HG, and two clusters in the right middle temporal gyrus (ES3 and ES4, S2 Fig and S2 Table). BOLD responses tended to be larger in ES3 than in ES4. However, neither of these regions showed a significant difference between REP and CTRL conditions.

Discussion
Our results indicate that the representations of the meaning of environmental sounds are already present at the level of early-stage auditory areas. The repeated presentation of eight acoustically different sounds of the same semantic entity yielded repetition suppression effects in areas A1, R, M1 and M2 in the right hemisphere and in A1 and M3 in the left hemisphere (Fig 6). No repetition effects were observed in the other 6 areas on the right and 8 areas on the left side. Interestingly, the putative belt areas on the PP, often associated with the ventral auditory pathway, do not appear to encode the meaning of environmental sounds, whereas the primary cortex and the belt areas on the medial part of the PT do so.

Semantic coding in the ventral auditory stream occurs outside PP belt areas
Our results showed that the meaning of environmental sounds is encoded at the level of early-stage auditory areas within the PT, but not within the belt areas on the PP. This supports a model of hierarchical processing within the antero-ventral recognition pathway. Seminal studies have shown that the ventral auditory stream is dedicated to the identification of sound sources and that it processes sound-to-meaning in a hierarchically organized fashion. Regions on the STP, close to PAC, were found to be selective for acoustic features of stimuli such as spectral structure and temporal variability, but not for stimulus category, whereas more anterior regions on STG presented category-selective responses [37]. The role of regions outside STP in category-specific coding has also been reported in other studies using comparisons between sounds of living vs man-made sources [34], animals vs tools [87] or several different categories [59,63,88,89]. The semantic involvement of the temporal convexity, but not STP, has been further demonstrated by means of repetition effects for specific sound categories (vocalizations: [50][51][52][90][91][92]) or specific sound objects [46,47,49]. One seminal study reported category-specific adaptation effects on the STP, but not specifically within belt areas [55]; neural responses to animal and tool sounds were acquired with 3T fMRI (spatial resolution of 3 mm, spatial smoothing of 8 mm) and adaptation effects were averaged in anatomically pre-defined regions HG, PP, PT, anterior STG and posterior STG in the right and left hemisphere. The PP as delimited in this analysis stretched up to the temporal pole, reaching far beyond the belt areas. PP, HG and PT yielded adaptation effects to tool sounds on the left side and to tool and animal sounds on the right side.
We would like to stress that the absence of repetition effects in PP belt areas is not due to a lack of sensitivity of our paradigm. Repetition effects were clearly present in distinct PAC and PT areas (Fig 6) and also when the whole PAC-PP-PT region was averaged (Fig 2).

The role of the planum temporale in the representation of sound objects
The presence of repetition effects within PAC and within belt areas on the medial PT suggests that the meaning of environmental sounds is encoded at very early stages of cortical processing. The PT as a whole was shown to encode category-specific information of sounds as fine-grained patterns of distributed activity [68,93]. More generally, PT was shown to encode both pitch and spatial information [45,94,95,96,97] or recognition and spatial information [28,30]; it has been referred to as a hub for the processing of different sound attributes [98]. Several studies have highlighted its role in auditory scene analysis, i.e., in the segregation of concurrent auditory streams by means of pitch or spatial differences [38,99,100]. The separation of meaningful sounds in an acoustically complex environment, as assessed by the task to localize a target sound among four simultaneous distracters vs alone, was shown to involve PT, together with the left inferior frontal gyrus, the precuneus and the right STG [101].
Taken together, the above evidence indicates that the PT plays an important role when sound recognition occurs in a complex acoustic environment. Surprisingly, the meaning of environmental sounds is already represented at the level of belt areas on the PT but not on the PP, which belongs to the classical recognition pathway.
The representation of sound objects on PT may constitute a processing stream for sound recognition which is, at least partially, independent from the antero-ventral pathway. The existence of a dual semantic pathway is supported by three observations. First, functional and anatomical studies speak in favour of an early segregation between the two semantic pathways. Semantic information is already encoded in PAC ([55] and here), which has strong connections to belt areas [102]. The most parsimonious explanation is that PAC shares semantic information with the postero-medial belt areas on the PT semantic and acoustical information with the anterior belt areas on PP. The alternative explanation that the belt areas on PT receive input from the anterior belt areas and constitute thus the next step in semantic processing cannot be excluded, but it is not supported by the connectivity patterns between the human core, belt and parabelt areas [102]. Second, the nature of the semantic representation differs between the two pathways. The PT has been shown to play an important role in combining semantic and spatial information [45,94,95,96,97], whereas the antero-ventral pathway mediates a truly position-independent coding [46,64]. Third, a relative functional independence of the two pathways is suggested by patient studies which reported a double dissociation between deficits in auditory scene analysis such as supported by the PT and semantic identification of sound objects in cases of focal brain lesions [103].

Methodological considerations
The subdivision of the supratemporal plane which we used in this study was not based on anatomical landmarks, but on the identification of PAC by the presence of mirror-symmetric frequency reversals [17,75,77]. Recent improvements in T1 mapping at ultra-high field allowed a definition of PAC in each individual hemisphere according to the underlying myelin layout of the cortex [73,74]. These seminal studies demonstrate a very good overlap of tonotopic maps and highly myelinated core on HG in selected cases and propose a potentially unique method for the localization of PAC. Before using a combination of these methods systematically as PAC localizer, it will be necessary to assess systematically the combination of tonotopy and myelin contrast in cases of HG variants of partial or complete duplication. Previous studies have shown that PAC, as defined by the dual tonotopic maps, is not restricted by the sulcal pattern of HG [17]. Instead, there is a continuum between the different variants and the tonotopic maps, where PAC extends over both parts of the gyrus in case of duplications.

Conclusions
Repetition effects revealed the encoding of the meaning of environmental sounds within primary areas A1 and R as well two belt areas on the antero-medial part of the PT in the right hemisphere and within A1 and a belt area on the medial part of HG in the left hemisphere, but not within belt areas on the PP. These results speak in favour of a dual auditory semantic pathway, one within the hierarchically organized antero-ventral stream and the other within the PT. The latter, but not the former, encodes the meaning of environmental sounds already at the level of belt areas.  Table; ES3 area: 362.59 mm 2 and ES4 area: 308.11 mm 2 ; p<0.05, Bonferroni correction). As for individual ROIs, the group ROIs were labelled with their region name and projected into the reference brain 1 x 1 x 1 mm interpolated volumetric space. Individual time courses of these regions were subsequently analyzed in the repetition suppression experiment. Time-courses of ES3 and ES4 are plotted for each condition in the graph. ES3 and ES4 ROIs showed both the same tendency, with higher BOLD response during control blocks, but none showed significant differences. It is to be noted that the group average fixed-effect multi-subject GLM constrast REP vs CTRL did not show any significant difference (p>0.05, Bonferroni correction). Upper panel: significant activation clusters (p<0.05, Bonferroni corrected). Lower panels: enlargement of the activated regions on a partially inflated brain. Environmental sounds activated two large clusters within the STG (ES1 and ES2), but also two smaller clusters in the right posterior MTG (ES3 and ES4). Mean time courses for these latter clusters are plotted in red and green in the graph between the two enlargements. Time frame by time frame analysis revealed no significant differences between the two conditions surviving the inclusion criteria. (TIF) S4 Fig. A. Mean amplitude spectrum of the environmental sounds used in the paradigm. 
The amplitude spectrum of each sound in the two conditions was computed using a fast Fourier transform and plotted across the frequency range from 0 to 25000 Hz. Blue line: mean amplitude spectrum for the repetition group sounds; red line: mean amplitude spectrum for the control group sounds. Unpaired t-tests between the amplitude spectra of the two conditions at each frequency revealed that 110 non-consecutive frequencies differed significantly between conditions after Bonferroni correction, which corresponded to about 1% of the tested frequencies (110/11025 ≈ 0.01). B. Amplitude spectrum of each sound in a REP block where eight different bell sounds were presented. Within a block, the frequency distribution of each exemplar differs from the mean amplitude spectrum of the REP condition (bottom right graph). (TIF)

S1 Table. Environmental sounds used in the repetition suppression paradigm. Only sounds correctly recognized during the sound recognition pilot by five subjects were used in the fMRI experiment. All sounds of the REP group (8 sound objects) were used in the fMRI runs, whereas only one exemplar of each sound object was randomly selected in the CTRL group (64 sound objects). The REP group was the same in all subjects, whereas the CTRL group varied across subjects. human voc.: human vocalizations; human non-voc.: human non-vocalizations; env. sound: environmental sound. (DOCX)

S2 Table. Main effect of the environmental sound presentation (REP + CTRL > silence). Centre coordinates of the activation clusters shown in S3 Fig, t scores, and p values. Only regions that remained significant at p<0.05 after application of the Bonferroni correction were considered. (DOCX)

S3 Table. Maxima, minima and amplitudes of the BOLD response during REP and CTRL in both hemispheres. Paired t-tests comparing REP and CTRL maxima ([max]) revealed significant differences in right A1 (1), right M1 (2), right M2 (3), right M4 (4), left A1 (5), and left M3 (6) (p<0.05, uncorrected). Paired t-tests comparing REP and CTRL minima ([min]) revealed significant differences in right R (7), left A1 (8) and left M3 (9) (p<0.05, uncorrected). Paired t-tests comparing RH and LH maxima during REP and RH and LH minima during CTRL revealed significant differences in M2 (10) and L2 (11), respectively (p<0.05, uncorrected). No significant differences were found for the amplitudes. (DOCX)
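
The paired comparisons summarised in S3 Table can be illustrated with a minimal sketch of a paired t-test on per-subject response maxima. This is not the analysis code of the study: the function name `paired_t`, the number of subjects (five), and the BOLD values are hypothetical. The critical value 2.776 is the standard two-tailed threshold for 4 degrees of freedom at alpha = 0.05.

```python
import math
import statistics


def paired_t(a, b):
    """Return (t, df) for a paired t-test on two equal-length samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two samples of equal length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    # Raises ZeroDivisionError if all differences are identical (sd_d == 0).
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1


# Hypothetical per-subject BOLD maxima (% signal change) in one ROI.
rep_max = [0.8, 1.0, 0.9, 1.1, 0.7]
ctrl_max = [1.0, 1.3, 1.0, 1.5, 0.9]

t_stat, df = paired_t(rep_max, ctrl_max)

# Two-tailed critical t for df = 4 at alpha = 0.05 (standard t table).
T_CRIT_DF4 = 2.776
print(f"t({df}) = {t_stat:.2f}, significant: {abs(t_stat) > T_CRIT_DF4}")
```

A negative t value here means a lower maximum during REP than during CTRL, which is the direction expected for repetition suppression. The threshold is uncorrected, matching the "p<0.05, uncorrected" criterion stated for S3 Table; a corrected analysis would need a stricter threshold.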