Individual Differences in the Frequency-Following Response: Relation to Pitch Perception

Emily B. J. Coffey; Emilia M. G. Colagrosso; Alexandre Lehmann; Marc Schönwiesner; Robert J. Zatorre

doi:10.1371/journal.pone.0152374

Abstract

The scalp-recorded frequency-following response (FFR) is a measure of the auditory nervous system’s representation of periodic sound, and may serve as a marker of training-related enhancements, behavioural deficits, and clinical conditions. However, FFRs of healthy normal subjects show considerable variability that remains unexplained. We investigated whether the FFR representation of the frequency content of a complex tone is related to the perception of the pitch of the fundamental frequency. The strength of the fundamental frequency in the FFR of 39 people with normal hearing was assessed when they listened to complex tones that either included or lacked energy at the fundamental frequency. We found that the strength of the fundamental representation of the missing fundamental tone complex correlated significantly with people's general tendency to perceive the pitch of the tone as either matching the frequency of the spectral components that were present, or that of the missing fundamental. Although at a group level the fundamental representation in the FFR did not appear to be affected by the presence or absence of energy at the same frequency in the stimulus, the two conditions were statistically distinguishable for some subjects individually, indicating that the neural representation is not linearly dependent on the stimulus content. In a second experiment using a within-subjects paradigm, we showed that subjects can learn to reversibly select between either fundamental or spectral perception, and that this is accompanied both by changes to the fundamental representation in the FFR and to cortical-based gamma activity. These results suggest that both fundamental and spectral representations coexist, and are available for later auditory processing stages, the requirements of which may also influence their relative strength and thus modulate FFR variability. 
The data also highlight voluntary mode perception as a new paradigm with which to study top-down vs bottom-up mechanisms that support the emerging view of the FFR as the outcome of integrated processing in the entire auditory system.

Citation: Coffey EBJ, Colagrosso EMG, Lehmann A, Schönwiesner M, Zatorre RJ (2016) Individual Differences in the Frequency-Following Response: Relation to Pitch Perception. PLoS ONE 11(3): e0152374. https://doi.org/10.1371/journal.pone.0152374

Editor: Frederic Dick, Birkbeck College, UNITED KINGDOM

Received: December 4, 2015; Accepted: March 14, 2016; Published: March 25, 2016

Copyright: © 2016 Coffey et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The authors have publicly published the data from both experiments on Figshare. https://figshare.com/articles/Coffeyetal2016_EEGdata_zip/3117205 (DOI:10.6084/m9.figshare.3117205).

Funding: Research was supported by operating grants to RJZ from the Canadian Institutes of Health Research and from the Canada Foundation for Innovation, by a Vanier Canada Graduate Scholarship to EBJC, and by an NSERC-CREATE award to EMGC. The center is supported by funding from the Fonds de Recherche Québec Nature Technologie/Société Culture. MS was supported by a Quebec Research Scholar career award. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

The scalp-recorded frequency-following response (FFR) to complex sounds [1] may present a paradox: whereas it is thought to capture how the auditory system represents basic features of sound with high fidelity [2–4] and reliability [5,6], several of its features vary considerably between listeners, even amongst a homogenous sample of young, healthy adults [7]. This inconsistency is surprising, because subtle variations in the frequency content, temporal precision, and inter-trial consistency of FFRs have been linked to enhanced processing in expert groups like musicians (e.g. [8–10]), and are sufficiently sensitive to be useful as biomarkers of deficient sound encoding in auditory processing and learning disorders [3,7,11–14].

This individual variation raises questions about what the FFR means; that is, which auditory information and cognitive processes are represented in the FFR, and by extension, which of those processes are being modulated by behavioural tasks and contribute to the differences observed in health and pathology. We address these questions by exploring how variability in the FFR is related to the perceptual phenomenon known as the 'missing fundamental' [15–17], to explore their mutual connections to pitch perception: because pitch is a perceptual phenomenon rather than a direct reflection of the physics of sound in the external environment, its neural representation should reflect the subjects' experience rather than the sound itself.

The FFR reflects the brain's response to sustained periodic sound. It is thought to originate in a combination of subcortical auditory nuclei (supporting work from animal studies is summarized in [2]), although recent evidence shows that there is also a cortical source [18]. The FFR closely resembles a low-pass filtered version of the eliciting sound, although the FFR's frequency composition does not always match that of the sound to which it responds: energy is present at the fundamental frequency (f0) when none is present in the physical stimulus [19]. The amplitudes of f0 and harmonic representations covary with behaviour independently and have been hypothesized to represent an early distinction between two streams of auditory information [20]. Our goal is to determine if inter-individual variability in f0 strength is related to inter-individual variability in perception.

Relatively straightforward relationships between f0 amplitude and expert behaviour have been reported. For example, stronger f0 representations and f0-tracking are positively related to measures of musicianship, and are thought to index training-induced enhancements [8,21,22] (although it should be noted that between-group f0 peak amplitude differences have not been found in every study, e.g. [23,24]). Conversely, f0 amplitude is negatively related to autism [25]. Furthermore, the extent of FFR suppression elicited by a competing task was related to behavioural performance in a unidirectional fashion [26]. However, the f0 also frequently demonstrates more complex relationships to behaviour. When people learned to segment a sequence of sounds based on its statistical characteristics, some showed an increase in f0 strength while others showed a decrease; those with an increase demonstrated better behavioural performance [27]. When people attended to one of two streams of auditory information, they again showed a range of neural responses from f0 enhancement to suppression; but rather than the direction, it was the amplitude of the modulation that best explained performance on a related behavioural task, such that greater neural modulation was related to poorer scores [28]. In a study using fMRI as well as EEG, neural responses to repeated sound ranged from repetition suppression to repetition enhancement, and BOLD fMRI suppression in the inferior colliculus was related to lower-amplitude but higher-fidelity FFRs (as measured by stimulus-to-response correlation); this was in turn related to more successful learning of non-native speech patterns [29]. The observation that increases in FFR f0 strength can be linked to both better and worse behavioural results further illustrates our incomplete understanding of the f0's role in sound processing.

Pitch encoding varies between people. When a neurologically healthy population is presented with complex harmonic sounds that lack energy at their fundamental frequencies (i.e. missing fundamental tone; MF), some people tend to report the higher frequency spectra that are physically present ('spectral' or 'analytic' listeners; SP listeners), whereas others perceive the missing fundamental ('fundamental' or 'synthetic' listeners; f0 listeners) [15–17,30,31]. This perceptual bias has neural correlates that support a role for the auditory cortex in missing fundamental perception [30]: patients with right temporal-lobe excisions encroaching onto Heschl's gyrus have difficulty perceiving the missing fundamental [32], asymmetry in grey matter volume in lateral Heschl's gyrus is related to f0 vs SP bias [33], and electrophysiological responses of the primary auditory cortex during the first 100 ms after sound onset differ according to perceptual bias [33,34]. Perceptual bias also appears to have a more flexible aspect. Some studies have reported a fundamental bias in musicians, which may have been caused by musical practice shifting the perceptual focus from SP to f0 perception [35], though this may depend on the nature of the musical experience including the type of instrument [34]. It has recently been shown that the number, order, and frequencies of the harmonics can influence f0/SP perception [31,33], with idiosyncratic yet relatively stable differences between individuals for specific stimuli [31]. Perception can also be influenced by the listening context, such as the harmonic relationship of a tone to preceding tones [31,36]. Furthermore, f0 perception in spectrally-biased listeners can be induced through repeated exposure and training on a specific MF stimulus [37,38]. This effect was observed only in the spectral-to-fundamental direction [38,39], and was accompanied by increased low gamma band (24–48 Hz) activity, bilaterally [38]. 
Gamma band synchronization is implicated in a wide range of cognitive tasks, and is thought to be a fundamental aspect of inter-region communication and function within the brain (for a recent review, see [40]). In this context, it may represent the neural correlate of the spectral information being combined into a coherent Gestalt-like percept of the missing fundamental [38].

The fundamental frequency of a sound is associated with, but not identical to, the perceptual experience of pitch [41–43]. The amplitude of the FFR's f0 appears to reflect how well individuals represent this important sound property and their ability to perceive pitch [44–48]. However, it has been demonstrated using frequency-shifted complex tones and dichotic presentation of alternating harmonics that the FFR's f0 represents information relevant to pitch processing but is not a direct representation of pitch [49]. The pitch computation itself may take place in the auditory cortex, where regions that represent pitch in an invariant way have been observed [50–52]. A rostral brainstem computation, observable using a horizontal rather than a vertical electrode montage, has also been proposed [53]. Peripheral explanations such as cochlear non-linearities have been ruled out because the missing fundamental percept is heard even when the frequency region is masked with noise [54].

Because both the FFR f0 strength and perceptual mode bias vary considerably between subjects [7,31] and pertain to the representation of pitch, we wanted to test the hypothesis that there is a relationship between the two. Although work on the locus of the pitch computation is incomplete [55], pitch encoding offers a way of studying the function of densely interconnected higher and lower level components of the human auditory system as a whole [56]; we therefore also wished to compare the FFR f0 with known cortical functional correlates of MF perception (i.e. gamma range activity). With the aim of gaining insight into both the brain's pitch encoding and extracting mechanisms and the meaning of the FFR f0 amplitude variability with respect to perception and task demands, we conducted two experiments. In the first experiment, we presented complex tones with and without f0 energy and evaluated subjects' perceptual bias in order to test the hypotheses that i) f0 strength in the MF condition is positively correlated with the subject's propensity to hear in the fundamental mode; and that ii) the presence of f0 energy in the stimulus affects the FFR's f0 strength. We also tested several secondary questions: whether the f0 strength was related to musicianship, whether the mastoid-to-mastoid electrode montage can be used to observe pitch processing as suggested in early work [53], and how comparing the two electrode montages in common use might contribute to our understanding of inter-individual variability.

In the second experiment, we addressed whether differences in f0 strength might be caused by top-down influences of directed attention in real time, using a sensitive within-subjects design. Using short MF melodies, we tested the hypotheses that iii) subjects can learn to reversibly switch between perceptual modes, thus voluntarily selecting between low-level sound representations; that iv) f0 representation is related to accuracy scores on a behavioural measure of perceptual mode switching and differs when subjects are perceiving one or the other mode, indicating behaviourally-relevant top-down modulation of these representations; and that v) cortical gamma band activity is affected reversibly by condition in parallel. We evaluate both group and intraindividual differences between conditions for both experiments (hypotheses ii) and iv)) due to the complexity of relationships to behaviour that have been reported in the literature. Because relationships between experience and both MF perception bias and FFR f0 amplitude have been reported (e.g. [8,34]), we further test how measures of musical experience are related to each of the physiological and behavioural measures.

Experiment 1

Materials and Methods

Participants.

We recruited 39 healthy adults (mean age = 28.8 years, SD = 9.1, maximum age = 58); 22 were female, and 33 were right-handed. Written informed consent was obtained and experimental procedures were approved by the Montreal Neurological Institute Research Ethics Board and the Research Ethics Committee of the Faculty for Arts and Sciences of the University of Montreal.

All subjects reported having normal hearing and no neurological conditions. Data about musical and linguistic history were collected via an online survey (Montreal Music History Questionnaire; MMHQ; [57]). The mean total instrumental and vocal practice and lesson hours was 4365 (SD = 7491), and subjects with musical experience started training between 3 and 18 years of age (mean = 8.1, SD = 3.9). Subjects reported a variety of current main musical activities (Keyboard: 3, Voice: 5, Percussion: 2, Strings: 7, Woodwinds: 1, Electronic/computer-based production: 2, Other: 1, No current activity: 18).

Three subjects were native tonal language speakers. The first 30 subjects recruited demonstrated a fundamental bias; we therefore used the behavioural task to screen an additional 30 subjects and selected from among them 9 additional subjects who were not 'pure fundamental' listeners. We thereby obtained a range of behavioural scores for correlational analysis.

Study design.

Subjects completed the MMHQ before the experiment. Prior to the EEG session, they performed the pitch mode perception task, which was a computerized listening task designed to measure a general propensity to hear pitch either in the fundamental or spectral mode (~12 mins). Subjects were then prepared for EEG and instructed to relax, listen to the stimuli, and remain still. Prior to recording the FFR to missing fundamental and fundamental-present sounds (~20 mins), we recorded auditory brainstem responses to 4000 clicks (~3 mins) to screen out retrocochlear abnormalities in the auditory system (the presence and approximately similar latency of waveforms I, III and V were observed for all subjects).

Pitch mode perception task.

To obtain a global index of subjects' perceptual preference, we adapted a commonly used task in which subjects listen to two consecutive harmonic complex tones that lack energy at the fundamental frequency and judge whether the second was higher or lower in pitch as compared to the first [31,33,35]. The tones are devised such that spectral and fundamental mode perceptions yield opposite answers (Fig 1(A)); to accomplish this, one of the tones contains a series of partials that is one harmonic number greater than the other (e.g. 7^th to 9^th vs. 8^th to 10^th), and the frequency of the highest partial is kept constant to minimize the perception of edge pitch. The fundamental frequencies of the 20 tones were: 90, 98, 107, 111, 117, 132, 137, 141, 147, 153, 155, 159, 161, 163, 168, 170, 172, 176, 178, and 180 Hz. Subjects received the written instruction, 'You will hear two tones, which will then be repeated. Please indicate if the second tone appears to be lower (“L”) or higher (“H”) than the first. If you aren’t sure, please guess.' Subjects were also told that there is no right or wrong answer and that they should go with their stronger impression.

Fig 1. Experiment 1 experimental paradigm.

(A) Behavioural testing of each listener’s perceptual bias. A sample tone pair is shown schematically; two complex missing fundamental tones comprised of the 8th-10th and 7th-9th harmonics were played sequentially after a short silent pause. This was repeated once, then subjects were asked to record whether they perceived the second tone to be higher or lower than the first. Tones were constructed such that spectral and fundamental perceptions lead to opposite responses, and a measure of overall perceptual bias was calculated from responses on 20 tone pairs. (B) Stimuli used in ABR testing; MF and FP stimuli differed only in the absence (MF: missing fundamental) or presence (FP: fundamental present) of energy at the fundamental frequency (f0).

https://doi.org/10.1371/journal.pone.0152374.g001

Complex relationships exist between perceptual mode preference and stimulus properties [58]: raising the lowest harmonic tends to produce more SP responses, whereas increasing the number of harmonics produces more f0 responses. Changing the average spectral frequency appears to influence the results, but in a non-linear fashion [31,33], and the harmonic relationship between the presented tones can affect how they are perceived [34]. Fundamental frequency, harmonic number and the number of harmonics are interrelated such that if the top frequency is kept steady it is not possible to vary the other parameters independently. We used a subset of stimuli similar to those used previously to reduce the dimensions (i.e. number of partials, harmonic number, and frequencies), reduce task length, and maximize similarity to the stimulus used to evoke FFRs. All stimuli had three partials; this is the midpoint of those characterized in Schneider et al. and has also been used in subsequent work [31]. We used the 7^th-9^th and 8^th-10^th harmonics of stimuli with fundamental frequencies in the range that produces strong FFRs (f0: 80-500 Hz; [59]), and top harmonic frequencies ranging from 900 to 1800 Hz (a sample tone pair is illustrated in Fig 1(A)).

The task comprised 20 stimulus pair trials presented in counterbalanced order. The set of trials was presented twice in pseudo-randomized sequences. Each trial consisted of a 250 ms tone, a 500 ms gap, and a second 250 ms tone; this was then repeated once after a 1000 ms delay to allow subjects to confirm their initial judgement (see Fig 1(A)). Stimuli were constructed by summing equal amplitude cosine-phase tones (fade-in/fade-out of 10ms raised cosine ramp), using custom Matlab scripts (The Mathworks Inc., MA, USA) from which sound files were produced (WAV, 44.1kHz sampling rate). Subjects listened to stimuli through headphones (Sony MDR-V500 headphones), and recorded their responses ('H' or 'L'). Before the main experiment, subjects completed 10 practice trials in which both the f0 and SP percepts occurred in the same direction (i.e. different f0 frequencies, but the same harmonic numbers). This served to familiarize subjects with the procedure and to ensure they had normal pitch processing abilities (100% accuracy) in the frequency range of interest; all subjects passed.
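The stimulus construction described above can be sketched in a few lines. This is a minimal illustration, assuming NumPy in place of the authors' custom Matlab scripts; the function names, the 980 Hz top partial, and the peak normalisation are hypothetical choices made for the example, not details taken from the study.

```python
import numpy as np

def partial_frequencies(top_freq, harmonics):
    """Return (f0, partial frequencies) for a tone whose highest partial is top_freq."""
    f0 = top_freq / max(harmonics)
    return f0, [h * f0 for h in harmonics]

def complex_tone(f0, harmonics, dur=0.250, fs=44100, ramp=0.010):
    """Sum equal-amplitude cosine-phase partials, then apply raised-cosine ramps."""
    t = np.arange(int(round(dur * fs))) / fs
    tone = np.zeros_like(t)
    for h in harmonics:
        tone += np.cos(2 * np.pi * h * f0 * t)
    n = int(round(ramp * fs))
    rise = 0.5 * (1 - np.cos(np.pi * np.arange(n) / n))  # 0 -> ~1 over 10 ms
    envelope = np.ones_like(t)
    envelope[:n] = rise
    envelope[-n:] = rise[::-1]
    shaped = tone * envelope
    return shaped / np.max(np.abs(shaped))  # peak normalisation (assumed)

# Hypothetical pair sharing a 980 Hz top partial (value chosen for illustration)
f0_a, partials_a = partial_frequencies(980.0, (7, 8, 9))
f0_b, partials_b = partial_frequencies(980.0, (8, 9, 10))
tone_a = complex_tone(f0_a, (7, 8, 9))
tone_b = complex_tone(f0_b, (8, 9, 10))
```

With the top partial held constant, the second tone has a lower f0 but higher lower partials than the first, so a fundamental listener and a spectral listener would be expected to give opposite 'H'/'L' answers.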

The overall perceptual bias of each listener can be quantified from the relative numbers of f0 and SP responses on a scale from -1 to 1: (SP - f0) / (SP + f0), where f0 refers to the number of fundamental responses, and SP refers to the number of spectral responses. The valence of this measure has been used inconsistently in the literature; here, negative values represent more fundamental answers [30,31,34] rather than the reverse [58]. We evaluated the relationship between perceptual bias and neural correlates and with musical experience using non-parametric statistics due to the non-normal distributions.
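As a worked example of the index, the following sketch computes the bias from a list of trial classifications. The labels "f0" and "SP" and the function name are illustrative assumptions; the sign convention follows the text (negative values indicate more fundamental responses).

```python
def perceptual_bias(responses):
    """Return (SP - f0) / (SP + f0) for a sequence of 'f0' / 'SP' labels."""
    n_sp = sum(1 for r in responses if r == "SP")
    n_f0 = sum(1 for r in responses if r == "f0")
    if n_sp + n_f0 == 0:
        raise ValueError("no classifiable responses")
    return (n_sp - n_f0) / (n_sp + n_f0)

# Hypothetical listener: 30 fundamental and 10 spectral responses over 40 trials
example_bias = perceptual_bias(["f0"] * 30 + ["SP"] * 10)
```

For this hypothetical listener the index is (10 - 30) / 40 = -0.5, i.e. a moderate fundamental bias.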

Data acquisition.

EEG data were recorded in a magnetically shielded audiometric booth from monopolar active Ag/AgCl electrodes placed at Cz (10–20 International System), Fz (approximate; hairline), C7 (7^th cervical vertebra), and both mastoids, using an averaged reference (BioSemi, www.biosemi.com). Two ground electrodes were placed above the right eyebrow. Because active electrodes reduce nuisance voltages caused by interference currents by performing impedance transformation on the electrode, we confirmed that direct-current offset was close to zero during electrode placement instead of measuring impedance. Electrode signals were amplified with a BioSemi ActiveTwo amplifier, recorded using open filters with a sampling frequency of 16384 Hz, and stored for offline analysis using BioSemi ActiView software.

The main analyses were conducted using electrode Fz re-referenced to C7, because recent work suggests that the FFR measured with electrodes placed close to the mastoids (which are used in the Cz-averaged mastoid configuration) may pick up neural activity generated peripherally in the auditory nerve [60]. Data were also re-referenced to the right mastoid to replicate the horizontal montage used by Galbraith [53]—for this analysis, thirty subjects with good quality data and mastoids with similar noise levels on each side were included.

Stimulus presentation.

A single missing fundamental complex tone was selected from the behavioural task for the EEG recording. A 120 ms long (48.8 kHz sampling frequency) version with an absent fundamental frequency of 98 Hz and energy only in the 8^th, 9^th, and 10^th harmonics (784 Hz, 882 Hz, 980 Hz) was prepared using custom Matlab scripts, as was a 'fundamental present' (FP) equivalent that included power at the fundamental frequency (see Fig 1(B)). The MF version has the same pitch (to f0 listeners) but is perceptually distinct [53].

Auditory stimuli were delivered binaurally via insert earphones (ER3, Etymotic Research, www.etymotic.com) using Matlab software interfaced with a signal processing system (RX6, Tucker-Davis Technologies, www.tdt.com). All subjects listened to a block of MF stimuli before the FP stimuli to increase the likelihood that SP listeners maintained their SP perception [42]. In both the MF and FP blocks, 2500 stimuli in alternating polarity were presented at 80 dB SPL. Blocks were 8.5 minutes long and subject comfort and alertness were checked informally between blocks.

EEG preprocessing.

Data analysis was performed using EEGLAB [61] and custom Matlab scripts. Data were band pass filtered (80–2000 Hz for f0 analysis and 2–2000 Hz for gamma analysis, using a 4^th order zero-phase Butterworth filter as implemented in EEGLAB's 'pop_basicfilter' function), epoched (-60 to 140 ms relative to stimulus onset), and DC correction was applied to the baseline period. Epochs containing myogenic or other physiological artifacts were removed by calculating the maximum absolute amplitude for each epoch and excluding the top 15% from each condition and polarity. This method serves to retain equal numbers of epochs for each subject and condition despite differences in baseline EEG amplitude as well as remove movement artifact, as confirmed by visual inspection. The mean minimum number of remaining epochs for each subject and condition was 2110 (SD = 3.9). The spectra of the FFR portion of the averaged summation waves were obtained by phase-locking value (PLV) analysis using custom scripts, which provides comparable results to and is highly correlated with spectral amplitude methods but is more statistically sensitive [62]. For each subject and condition, a set of 400 epochs from the total pool of epochs was selected randomly with replacement. Each was trimmed to the FFR period, windowed (5 ms raised cosine) to limit spectral splatter, zero-padded to 1 s to allow for a 1 Hz frequency resolution, and the phase of each epoch was calculated by discrete Fourier transform. The phase locking value for each epoch was computed by normalizing the complex discrete Fourier transform by its own magnitude, and averaging across 1000 iterations. From each subject and condition average, the mean f0 strength was taken (see 'Appendix: analysis methods' item 5, in [62] for formulae).
The PLVs of the f0 and harmonics were obtained automatically using a script that selects the peak within a +/-7 Hz window around the corresponding peak in the stimulus and calculates the average PLV from a 5-Hz window centred on the peak; all subjects had a clear peak close to f0. PLVs have been shown to be highly correlated with measures of spectral amplitude for fundamental present stimuli [62]. To ensure that this relationship holds for missing fundamental stimuli and that PLV is an appropriate method for our research questions, and for comparison with previous work, we calculated the spectral amplitude of the time-domain summation wave by FFT (10 ms raised cosine ramp, zero-padded to 1s) for each subject and condition and assessed the degree of correlation between PLV and spectral amplitude at f0.
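The PLV procedure and peak selection described above can be sketched as follows. This is one plausible reading of the text, written with NumPy rather than the authors' custom Matlab scripts; the function names, the `f_max` cropping, and the interpretation of the 5-Hz window as the peak bin plus two bins on either side are assumptions, and the formulae in [62] remain the authoritative definition.

```python
import numpy as np

def plv_spectrum(epochs, fs, n_draw=400, n_iter=1000, ramp_s=0.005,
                 f_max=2000.0, rng=None):
    """Bootstrap-averaged phase-locking value per frequency bin.

    epochs: 2-D array (n_epochs, n_samples), already trimmed to the FFR period.
    """
    rng = np.random.default_rng(rng)
    n_epochs, n_samples = epochs.shape
    n = int(round(ramp_s * fs))
    rise = 0.5 * (1 - np.cos(np.pi * np.arange(n) / n))
    window = np.ones(n_samples)
    window[:n] = rise
    window[-n:] = rise[::-1]
    nfft = int(round(fs))  # zero-pad to 1 s, giving ~1 Hz bin spacing
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    keep = freqs <= f_max  # crop to the band of interest to limit memory use
    spectra = np.fft.rfft(epochs * window, n=nfft, axis=1)[:, keep]
    # Normalise each coefficient by its own magnitude, keeping phase only
    phasors = spectra / np.maximum(np.abs(spectra), 1e-20)
    plv = np.zeros(phasors.shape[1])
    for _ in range(n_iter):
        idx = rng.integers(0, n_epochs, size=n_draw)  # draw with replacement
        plv += np.abs(phasors[idx].mean(axis=0))
    return freqs[keep], plv / n_iter

def peak_plv(freqs, plv, f_target, search_hz=7.0, avg_hz=5.0):
    """Find the PLV peak within +/- search_hz of f_target; average around it."""
    in_search = np.abs(freqs - f_target) <= search_hz
    peak_f = freqs[in_search][np.argmax(plv[in_search])]
    in_avg = np.abs(freqs - peak_f) <= avg_hz / 2
    return peak_f, plv[in_avg].mean()
```

A call such as `peak_plv(*plv_spectrum(epochs, fs=16384), f_target=98)` would then return the peak frequency and the f0 strength for one subject and condition.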

We used bootstrapping to statistically evaluate intraindividual differences between conditions (hypotheses ii) and iv)) [63]. We pooled epochs from both conditions and drew two samples randomly with replacement from them. Each sample matched the total number of epochs collected for a single condition. We calculated the difference in PLV at the fundamental frequency for each sample. We then repeated this procedure 1000 times per subject, to form a null distribution, and tested whether the difference in mean PLV from the correctly assigned epoch sets, resampled to the full number of epochs, lay within the 2.5^th to 97.5^th percentile range (i.e. two-tailed distribution with alpha = 0.05). Difference waves were also made for each subject, by subtracting single polarity time domain averages (positive–negative polarity averages). Spectral amplitude was calculated as described above to evaluate whether spectral energy at f0 might be observable and might differentiate conditions (see Results—Effect of presence of f0 energy in the stimulus).
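The resampling test for a single subject can be illustrated with the sketch below. It assumes the input for each condition is an array of per-epoch unit phasors at the f0 bin (the DFT coefficient divided by its magnitude); the function name and the use of the full, un-resampled epoch sets for the observed difference are simplifying assumptions rather than the authors' exact procedure.

```python
import numpy as np

def resampling_test(phasors_a, phasors_b, n_iter=1000, alpha=0.05, rng=None):
    """Test a between-condition PLV difference against a pooled-resampling null.

    Returns (observed difference, (lower, upper) null bounds, significant?).
    """
    rng = np.random.default_rng(rng)

    def plv(p):
        return np.abs(np.mean(p))

    observed = plv(phasors_a) - plv(phasors_b)
    pooled = np.concatenate([phasors_a, phasors_b])
    null = np.empty(n_iter)
    for i in range(n_iter):
        # Condition labels are ignored: both samples come from the pooled epochs
        sample_a = rng.choice(pooled, size=len(phasors_a), replace=True)
        sample_b = rng.choice(pooled, size=len(phasors_b), replace=True)
        null[i] = plv(sample_a) - plv(sample_b)
    lower, upper = np.percentile(null, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    significant = bool(observed < lower or observed > upper)
    return observed, (lower, upper), significant
```

Because both null samples are drawn from the same pooled set, the null distribution is centred near zero, and an observed difference outside its 2.5^th to 97.5^th percentile range is treated as significant at alpha = 0.05 (two-tailed).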

Data analysis.

We calculated Spearman's rho (r_s) to assess rank correlations between perceptual bias and measures of musicianship, and current age. To test hypothesis i), that f0 strength is positively correlated with the subject's propensity to hear in the fundamental mode, we calculated Spearman's rho between f0 strength in the MF condition and perceptual bias, controlling for age, which is known to decrease FFR strength [64]. We also investigated the signed difference between f0 in the MF and FP conditions (e.g. MF–FP) vs. perceptual bias. Note that we could not evaluate what might appear to be the opposite relationship, that spectral listeners would have stronger harmonic representation, because the frequencies of the harmonics used in the study are too high to be strongly represented in the FFR (>700 Hz). This means that the energy we observe at the second through 5^th harmonics cannot be assumed to be related to the strength with which the auditory system represents the physical properties of the stimulus.

To evaluate hypothesis ii), that the presence of f0 energy in the stimulus affects the FFR's f0 strength, we first compared the mean f0 PLV across stimulus conditions using a two-tailed Wilcoxon signed rank test, then evaluated within-subjects differences between f0 distributions generated by PLV in each condition, via resampling. The spectral amplitudes of difference waves were visually inspected and f0 amplitudes statistically compared at the group level to rule out the possibility that interesting between-condition differences might have been removed by averaging over opposite polarities.

To test the secondary question that the mastoid-to-mastoid electrode montage reveals differences between MF and FP conditions, we calculated PLV as above, compared conditions using a Wilcoxon signed-rank test, and visually inspected each subject's time and frequency domain data. We also calculated the amplitude spectra (i.e. by FFT rather than PLV) of single polarity averages, which is most similar to the analysis previously reported [53], to ensure that this methodological difference was not the cause of the observed results. To test the consequence of selecting one of the two electrode montages in current use, we calculated the same measures using the Cz to averaged mastoids montage. For comparison with cortical gamma-band results in Experiment 2, we repeated the PLV analysis using data that had been filtered to include lower frequencies (band-pass filter: 2–2000 Hz), and calculated the mean PLV in the low gamma range (24–48 Hz) of the averaged spectrum in the MF condition. We assessed correlations between mean low gamma activity and perceptual bias, musicianship, and f0 strength.

Results

Behavioural results.

The mean score on the perceptual bias task was -0.56 (range: -1.0 to 0.9); see Fig 2(A). Total hours of vocal and instrumental music practice was negatively correlated with perceptual bias (i.e. as previously reported, musicians were more likely to have fundamental responses; one-tailed r_s = -0.31, p = 0.049). Age of start of musical training among people reporting musical experience (N = 30) was also correlated with perceptual bias (one-tailed r_s = 0.51, p = 0.002; an earlier training start increased the likelihood of fundamental bias). We did not find a significant relationship between perceptual bias and age (two-tailed r_s = -0.20, p > 0.2).

Fig 2. Perceptual bias and relationships to musical experience and age.

(A) Distribution of perceptual bias in our sample (N = 39); this is weighted towards fundamental listeners but nonetheless includes people with a range of responses. (B) Perceptual bias was negatively correlated with cumulative hours of musical training, and was positively correlated with (C) the age of training onset in subjects who reported musical experience (N = 30); that is, more practice and an earlier start were each associated with a stronger fundamental bias. (D) Perceptual bias was not significantly correlated with current age.

https://doi.org/10.1371/journal.pone.0152374.g002

Comparison of PLV and spectral amplitude at f0.

The PLVs and spectral amplitudes at f0 were highly correlated in both conditions (FP: r_s = 0.80, p = 2.7e-08; MF: r_s = 0.82, p = 7.5e-09), confirming that the two measures contain shared information both in responses to fundamental-present stimuli, as has already been shown [62], and in the missing fundamental responses used here.

Variability in FFR waveform.

Five subjects were selected to illustrate FFR variability; these are presented in Fig 3A and 3B. Considerable variability is present in the amplitude of the fundamental; for example, subjects 1, 3 and 4 have strong f0 representations, whereas 2 and 5 do not. Variability is also observed in the amplitude of the harmonics, as well as the relationship between the f0 and harmonic amplitude; for example, subject 4 has both a strong f0 and harmonic representation and subject 5 has both a weak f0 and harmonic amplitude, whereas subjects 1 and 3 are dominated by f0 and variable harmonics. Subject 2 instead has harmonics that are notably stronger than the f0. The variability in f0 PLV and mean harmonic PLVs (2^nd–5^th harmonics) in the entire sample is presented in Fig 3(C) and 3(D). Subjects with stronger f0s tend to have stronger harmonics (FP: r_s = 0.72, p < 0.001; MF: r_s = 0.59, p < 0.001), though variability is evident in both the strength of the fundamental and the harmonics, and their relationship.

Fig 3. Inter-individual variability in frequency following response to complex tones that contain (black), or lack (red) energy at the fundamental frequency (f0).

(A) Time domain representation of five example subjects (band-pass filtered between 80 and 2000Hz), demonstrating variation in ABR waveform. Grey shading indicates the FFR portion of the signal. (B) Frequency domain representation of the example subjects’ FFR. Variability is evident in the strength of the encoding of the f0 (98Hz) and its harmonics (integer multiples), and in their pattern of relative strength. The relationship between the f0 and mean harmonic PLVs (2nd-5th harmonics), in the (C) missing fundamental and (D) fundamental present conditions demonstrates considerable variability in the strength of each and their relationship, although in general people with a strong representation of the f0 have a strong representation of the harmonics. Histograms are associated with each axis to illustrate the distributions (above: f0 PLVs; right: mean harmonic PLVs).

https://doi.org/10.1371/journal.pone.0152374.g003

Relationship between perceptual bias and FFR f0.

The main hypothesis, concerning a relationship between perceptual bias and f0 strength in the MF condition, was supported (partial rank correlation, r_s = -0.32, p = 0.027; Fig 4(A)). A similar pattern was observed in the FP condition (partial rank correlation, r_s = -0.28, p = 0.04; Fig 4(B)). No trend was observed in the signed difference in f0 strength between conditions vs. perceptual bias (r_s = 0.02, p = 0.91).
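For readers who want to reproduce this kind of analysis, the following is a minimal sketch of a partial rank correlation with one covariate (here, age) partialled out. It is not the authors' code: the function name `partial_spearman`, the variable names, and the synthetic data are all hypothetical. The sketch rank-transforms each variable, regresses the covariate ranks out of the other two, and correlates the residuals. It returns a two-tailed p-value; whether the values reported above are one- or two-tailed is not restated in this section.

```python
import numpy as np
from scipy import stats


def partial_spearman(x, y, covariate):
    """Spearman correlation between x and y with one covariate partialled out.

    Hypothetical helper for illustration; returns (r, two-tailed p).
    """
    # Rank-transform all three variables (ties receive average ranks).
    rx, ry, rz = (stats.rankdata(v) for v in (x, y, covariate))
    # Regress the covariate ranks (plus an intercept) out of the x and y ranks.
    design = np.column_stack([np.ones_like(rz), rz])
    res_x = rx - design @ np.linalg.lstsq(design, rx, rcond=None)[0]
    res_y = ry - design @ np.linalg.lstsq(design, ry, rcond=None)[0]
    # The Pearson correlation of the residuals is the partial rank correlation.
    r = np.corrcoef(res_x, res_y)[0, 1]
    # Two-tailed p-value from a t distribution with n - 3 degrees of freedom.
    n = len(rx)
    t = r * np.sqrt((n - 3) / (1.0 - r**2))
    p = 2.0 * stats.t.sf(abs(t), df=n - 3)
    return r, p


# Synthetic data only (assumed values, not the study's data).
rng = np.random.default_rng(0)
age = rng.uniform(18, 40, size=39)
bias = rng.normal(size=39) + 0.05 * age
f0_plv = -0.5 * bias + rng.normal(size=39)

r_s, p = partial_spearman(bias, f0_plv, age)
print(f"partial r_s = {r_s:.2f}, two-tailed p = {p:.3f}")
```

For a directional hypothesis, the one-tailed p-value is half the two-tailed value when the observed sign matches the predicted direction.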

Fig 4. Experiment 1 results.

(A, B) Fundamental perception bias was correlated with f0 peak magnitude in response to missing fundamental and fundamental present tones in each condition (the effect of age is controlled). (C) Histogram of PLV differences at f0 between conditions. While many subjects showed a statistically significant distinction between conditions as determined by resampling (14/39 subjects), subjects did not show consistency in the condition with greater amplitude; 9 had a greater amplitude in the FP condition and 5 had a greater amplitude in the MF condition. (D, E) The PLV of the f0 in each condition was positively correlated with the mean PLV in the low gamma frequency range in both conditions. Both the relationships between f0 PLV and perceptual bias and between f0 PLV and gamma PLV appear more clearly in the MF condition (red), when the fundamental frequency is not present in the stimulus and must be computed. FP: fundamental present condition; MF: missing fundamental condition.

https://doi.org/10.1371/journal.pone.0152374.g004

Effect of presence of f0 energy in the stimulus.

There was no effect at the group level of stimulus condition on the strength of the f0 representation (FP mean = 0.20, SD = 0.08; MF mean = 0.19, SD = 0.08; Z = 1.41, p = 0.16). At the individual level, 14 out of 39 subjects showed significant differences (in either direction) between conditions using a resampling approach (p < 0.05; Fig 4(C)); of those, 9 had a greater PLV in the FP condition and 5 had a greater PLV in the MF condition.
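The resampling procedure itself is specified in the Methods rather than in this section. As an illustration only, one common way to test whether a single subject's f0 PLV differs between two conditions is a label-shuffling permutation test on per-trial phases at f0. The sketch below assumes that approach; the function names `plv` and `plv_permutation_test`, the trial counts, and the simulated phases are hypothetical and may differ from what the authors did.

```python
import numpy as np


def plv(phases):
    """Phase-locking value: length of the mean unit phasor across trials."""
    return np.abs(np.mean(np.exp(1j * np.asarray(phases))))


def plv_permutation_test(phases_a, phases_b, n_perm=2000, seed=0):
    """Two-sided permutation test for a PLV difference between two conditions.

    Assumed procedure for illustration, not necessarily the authors' method.
    """
    rng = np.random.default_rng(seed)
    phases_a = np.asarray(phases_a)
    phases_b = np.asarray(phases_b)
    observed = plv(phases_a) - plv(phases_b)
    pooled = np.concatenate([phases_a, phases_b])
    n_a = len(phases_a)
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffle condition labels by permuting the pooled trials.
        shuffled = rng.permutation(pooled)
        null[i] = plv(shuffled[:n_a]) - plv(shuffled[n_a:])
    # Proportion of shuffles at least as extreme as the observed difference.
    p = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
    return observed, p


# Simulated per-trial phases at f0 for one subject (assumed values).
rng = np.random.default_rng(1)
phases_fp = rng.vonmises(0.0, 1.0, size=500)  # tighter phase clustering
phases_mf = rng.vonmises(0.0, 0.2, size=500)  # weaker phase clustering

diff, p_value = plv_permutation_test(phases_fp, phases_mf)
print(f"PLV difference (FP - MF) = {diff:.3f}, p = {p_value:.4f}")
```

Because the test is two-sided, it flags differences in either direction, matching the way the individual-level results are reported here (some subjects with FP > MF, others with MF > FP).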

The spectral amplitude of the difference wave (positive—negative polarity) averages showed no distinct peaks at f0, and mean amplitudes at f0 were very small in both conditions (< 0.1). There was no significant difference between conditions, across subjects (Z = 0.49, p = 0.63).

Other sources of inter-individual variability.

Hours of musical training demonstrated non-significant trends in the expected direction, with more hours of musical experience predicting larger f0 PLVs (FP: one-tailed r_s = 0.22, p = 0.09; MF: r_s = 0.14, p = 0.20). Start ages amongst musicians also showed non-significant trends in the expected direction (i.e. larger f0 PLVs for earlier training initiation; FP: r_s = -0.20, p = 0.14; MF: r_s = -0.25, p = 0.09). The signed difference between conditions was not clearly related to hours of musical training (r_s = -0.22, p = 0.17) nor start age (r_s = -0.15, p = 0.43). Age was not significantly related to f0 PLV in either condition, though the trend was in the expected direction in both conditions (FP: r_s = -0.17, p = 0.15; MF: r_s = -0.19, p = 0.11).

Effect of montage selection.

f0 PLV measured with the Fz-C7 montage was significantly, though not perfectly, correlated with f0 PLV measured using the Cz-average mastoid montage (FP: r_s = 0.69, p < 0.001; MF: r_s = 0.57, p < 0.001). As with the Fz-C7 montage, no effects of condition were found at the group level (Z = 1.02, p = 0.31). Only 21 of 39 subjects showed the same direction of difference across montages (i.e. either FP > MF or MF > FP). The relationship between f0 PLV and perceptual bias observed using the Fz-C7 montage was not evident in the Cz-average mastoid montage in either the MF condition (r_s = 0.05, p = 0.62) or the FP condition (r_s = 0.01, p = 0.52).

There was very little signal in the data using the horizontal montage as compared with the other montages (FP: mean f0 = 0.11, SD = 0.13; MF: mean f0 = 0.11, SD = 0.12), and the two conditions were not statistically distinguishable (Wilcoxon signed rank test: Z = -1.21, p = 0.89). We also calculated the spectral amplitudes of only a single polarity average, as was used in previous work [53]; f0 amplitude was very small and was not greater in the FP condition than in the MF condition (FP: mean = 0.02 µV, SD = 0.001; MF: mean = 0.04 µV, SD = 0.001).

Low gamma band activity.

Mean gamma band activity in the MF condition was not correlated with perceptual bias (r_s = -0.02, p = 0.91), nor hours of musicianship (r_s = -0.04, p = 0.81), but it was significantly correlated with f0 PLV, more clearly in the MF condition (MF: r_s = 0.43, p = 0.003, FP: r_s = 0.27, p = 0.049).

Discussion

Perceptual bias is related to musicianship.

Our results demonstrate a relationship between perceptual bias and measures of musicianship, such that musicians with more training and an earlier start age were more likely to have a fundamental bias. This finding supports previous work [35], and indicates that despite a fundamental-skew in the perceptual mode bias measure, the range of stimuli used was sufficient to capture individual differences in auditory perception. We confirmed and quantified the variability in FFR f0 and harmonic representation in the frequency domain.

The distribution of perceptual bias scores differs from that of a previous large-scale study, which found a more symmetrical bimodal distribution [33]; this discrepancy is most likely due to differences of approach. That study included an additional 'fundamental present' condition in behavioural testing that allows octave-shifted responses (i.e. perceiving the 2^nd harmonic as the pitch of the tone) to be accounted for and excluded from the overall perceptual bias score. Studies that have not used this approach yield distributions of perceptual bias similar to that of the present study (e.g. [31,35]). Including octave-shifted perceptual responses did not qualitatively change the main findings of a relative hemispheric lateralization in the previous work [33], and similarly we are able to observe the effect of interest here. However, future work could include an octave-shift control condition to allow for assessment of this effect.

Individual differences in FFR f0 strength are correlated with perceptual bias.

As hypothesized, the strength of the f0 in the MF condition was correlated with perceptual bias, with fundamental listeners more likely to have stronger f0 representations. There was no difference at the group level in the FFR f0 PLV in response to stimuli with the fundamental present or absent. This could indicate that however the energy physically present at the stimulus' f0 is encoded and perceived, it is not measured as part of the FFR; however, at the individual level, distinctions could in fact be made between the conditions for many subjects (14 of 39, or 36%), suggesting that the specifics of the stimulus are captured in the FFR, but not in a consistent fashion across participants.

Electrode montage affects FFR f0 strength.

We attempted to replicate the finding that an FFR recorded using a horizontal electrode montage would show f0 energy in the FP but not MF conditions [53], while f0 energy was found in both conditions using a vertical (Cz to mastoids or earlobes) montage, as we have been observing. This work has been interpreted as demonstrating that the f0 representation of MF stimuli appears in the rostral brainstem [2,53]. We did not find evidence that the f0 is stronger in the FP condition than the MF condition; in fact the signal amplitude in both conditions is very small and inconsistently observed between subjects. One possible explanation is that the previous work used monaural stimulation, and it has been shown that electrodes placed at the mastoid or earlobe pick up activity from the auditory nerve, which does show activity related to amplitude modulation in the signal [60,65]; this auditory nerve activity might be cancelled out in our binaural configuration when the horizontal montage is used. We do however find support for the suggestion that different electrode montages measure different events [53]. Comparing the two main electrode montages in current use, Cz-average mastoids and Fz-C7, we found that while the f0 amplitudes are significantly related, variability is also demonstrated, and behavioural relationships that we identified using the Fz-C7 montage were not all replicated in the second montage. Only 54% of subjects showed consistency in the direction of the difference between f0 in each condition. This suggests differences in the relative contribution of sources, with two implications: the two montages may not be fully interchangeable, and differences in individual anatomy such as head shape might contribute to inter-individual variability in the strength of the recorded signal, and possibly to some of the complex relationships to behaviour that have been observed. This finding underscores the importance of methods which allow source separation [60], such as MEG [18].

Conclusion, limitations, and further questions.

We can conclude that a person's general perceptual bias, and therefore the cognitive machinery that supports pitch perception, contributes to variability in the FFR's f0 strength and is measurable in it. However, we are not able to confirm that subjects were hearing in the spectral or fundamental mode during the FFR recording using this design. Because the auditory context [36] as well as repetition [38] can influence perception, and because responses to specific stimuli are variable [31], it is possible that at least some spectral listeners were hearing in the fundamental mode during the FFR recording. The observed relationship between f0 PLV in the MF condition and general perceptual bias might be attributed to relatively permanent differences between people due to biology or long-term training. It is also possible that the effect of fundamental vs. spectral perception acts more flexibly, in an online top-down manner, which provides the motivation for the second experiment.

Experiment 2

To be able to specifically examine the neural correlates of perceptual mode in a highly controlled fashion and thereby further support the hypothesis that f0 variability in the FFR is related to pitch perception, we developed a sensitive within-subject paradigm that allowed perceptual mode to be brought under voluntary control. This was made possible by first selecting harmonic numbers for each subject within an ambiguous zone between their thresholds for fundamental and spectral perception, creating custom MF melodies based on these harmonic numbers, and then using pure-tone primes to help the subject select and attend to each perceptual mode within the melody. We first assessed if subjects can reversibly switch between perceptual modes using behavioral data. We then recorded FFRs while subjects listened to identical stimuli in either the fundamental or spectral modes, and evaluated whether the strength of the f0 representation when subjects were trying to hear the f0 was related to their behavioural performance. Our goal was to extend the finding in Experiment 1, in which we found a correlation between f0 representation in the MF condition and perceptual bias, by actively causing f0 modulation with an experimental manipulation. Because an increase in cortical low-gamma band activity has been reported following a spectral-to-fundamental perceptual shift [38], we also assessed whether low-gamma band activity was greater in the f0-listening condition. Finally, we looked at the relationship between the neural correlates of mode perception and musicianship. This design served to control for possible effects of diminishing or fluctuating attention [28] in the first experiment, as conditions could be interleaved and active responses to periodic mode-specific mismatched stimuli were tracked.