Sustained Cortical and Subcortical Measures of Auditory and Visual Plasticity following Short-Term Perceptual Learning

  • Bonnie K. Lau ,

    Contributed equally to this work with: Bonnie K. Lau, Dorea R. Ruggles

    Current address: Institute for Learning and Brain Sciences, University of Washington, Seattle, Washington, United States of America

    Affiliation Department of Psychology, University of Minnesota, Minneapolis, Minnesota, United States of America

  • Dorea R. Ruggles ,

    Contributed equally to this work with: Bonnie K. Lau, Dorea R. Ruggles

    Affiliation Department of Psychology, University of Minnesota, Minneapolis, Minnesota, United States of America

  • Sucharit Katyal,

    Affiliation Department of Psychology, University of Minnesota, Minneapolis, Minnesota, United States of America

  • Stephen A. Engel,

    Affiliation Department of Psychology, University of Minnesota, Minneapolis, Minnesota, United States of America

  • Andrew J. Oxenham

    Affiliation Department of Psychology, University of Minnesota, Minneapolis, Minnesota, United States of America


Short-term training can lead to improvements in behavioral discrimination of auditory and visual stimuli, as well as enhanced EEG responses to those stimuli. In the auditory domain, fluency with tonal languages and musical training has been associated with long-term cortical and subcortical plasticity, but less is known about the effects of shorter-term training. This study combined electroencephalography (EEG) and behavioral measures to investigate short-term learning and neural plasticity in both auditory and visual domains. Forty adult participants were divided into four groups. Three groups trained on one of three tasks, involving discrimination of auditory fundamental frequency (F0), auditory amplitude modulation rate (AM), or visual orientation (VIS). The fourth (control) group received no training. Pre- and post-training tests, as well as retention tests 30 days after training, involved behavioral discrimination thresholds, steady-state visually evoked potentials (SSVEP) to the flicker frequencies of visual stimuli, and auditory envelope-following responses simultaneously evoked and measured in response to rapid stimulus F0 (EFR), thought to reflect subcortical generators, and slow amplitude modulation (ASSR), thought to reflect cortical generators. Enhancement of the ASSR was observed in both auditory-trained groups, not specific to the AM-trained group, whereas enhancement of the SSVEP was found only in the visually-trained group. No evidence was found for changes in the EFR. The results suggest that some aspects of neural plasticity can develop rapidly and may generalize across tasks but not across modalities. Behaviorally, the pattern of learning was complex, with significant cross-task and cross-modal learning effects.


One of the remarkable feats of perceptual neural processing is the ability to learn, change, and adapt to particular circumstances and tasks. Auditory plasticity has been demonstrated on multiple time scales and in a wide variety of contexts. Directed attention very quickly enhances the cortical representation of relevant sounds [1], and even very brief (1–5 s) listening experience in reverberant [2,3] or spectrally colored [4,5] environments can improve speech intelligibility. Long-term experience, such as musical training and tonal language fluency, has been associated with neural plasticity that may develop over the course of years [6–8]. The physiological mechanisms that underlie long- and short-term plasticity in auditory perceptual processing continue to be elusive but have significant implications for understanding the neurophysiology of the auditory pathways and for developing interventions for clinical populations. Similar questions of mechanism and time scale are of interest in the visual domain [9].

Long-term musical training has been extensively studied in terms of its impact on auditory perception as well as the encoding of sound along the auditory pathway. Although it is difficult to rule out genetic and social factors that may influence neural development as well as the pursuit of musical training, both cortical and subcortical neurophysiological enhancements have been associated with musical experience. Structural neuroimaging studies report increased gray matter volume in auditory, motor, and visuo-spatial cortical regions in professional musicians [8,10,11]. Enhancement of cortical evoked potentials including N1 and P2 [12,13] and the earlier N19m-P30m complex [10] has been observed in response to tonal stimuli. Sustained responses to periodic stimuli of 80–1000 Hz (EFR) are thought to be generated primarily in the subcortical auditory structures [14,15], and investigations of these responses have shown greater spectral magnitude of the fundamental frequency (F0; the acoustic correlate of pitch) of complex tones in musicians compared to non-musicians [16]. Behavioral results are consistent with these physiological findings, with superior frequency-discrimination abilities reliably reported in musicians compared to non-musicians [17,18]. However, whether such auditory perceptual enhancements generalize beyond music-related tasks, such as frequency discrimination, is unclear [19]. For instance, improvements in speech perception in noise or with interfering talkers have been reported by some groups [20–22] but not by others [23–26]. Improved understanding of plasticity in EFR after shorter, more controlled learning may help to clarify details of the subcortical plasticity attributed to musical training.

Studies of learning-induced plasticity in non-human primates and rats have shown expanded representation of trained frequencies in primary auditory cortex following frequency-discrimination training [27–29]. Much less is known about the cortical correlates of short-term perceptual learning in humans, but several studies report enhancement of the N1 and P2 cortical event-related potentials (ERP) following auditory training [30,31]. In this study, we investigate whether plasticity can also be observed in sustained steady-state EEG responses, which reflect the entrainment of neural responses to the periodicity of external stimuli, as opposed to previous time-domain ERP measures. We paired three tasks involving auditory or visual discrimination with corresponding steady-state responses. All behavioral tasks required discrimination as opposed to detection, in order to target high-level perceptual judgments [32].

Pitch discrimination can be quickly learned, to the extent that initially untrained listeners can achieve the same level of performance as professional musicians with 4–6 hours of training [33]. Pitch discrimination also offers the opportunity to study short-term perceptual learning using a task that is also often associated with long-term plasticity. We trained listeners on pitch discrimination with harmonic complex tones, which elicit an EFR in response to the F0. The EFR is a broadly generated response which has been localized primarily to subcortical neural populations, and which reflects phase-locked responses to amplitude modulation in the range of 80–1000 Hz [14,15,34]. Although reports of plasticity following short-term perceptual learning are mainly limited to cortical measures, at least one study investigated the possibility of subcortical plasticity in sustained responses [35]. In their study, Carcagno and Plack [35] found significant enhancement of EFR amplitude in response to dynamic and static pitch stimuli, along with improvements in behavioral discrimination. However, their results were inconsistent for the dynamic conditions and the effects were sufficiently small to warrant replication.

The processing of AM is central to speech perception in quiet and noise [36–38], with modulation rates below about 16 Hz most important for speech understanding in general [39,40]. AM rate discrimination also improves with training and shows minimal generalization to other tasks, making it ideal for combining with frequency discrimination to study specificity of learning and plasticity along the auditory pathway [41]. We tested AM rate discrimination training with amplitude-modulated complex tones, which also elicit an auditory steady-state response (ASSR). The ASSR in response to AM between 0–40 Hz is thought to be generated in the auditory cortex [15].

The third set of measures included here involves a visual perceptual training task combined with steady-state visual evoked potentials (SSVEP). Visual and auditory steady-state responses are similar but independently generated, allowing us to investigate potential cross-modal transfer effects. Discrimination of Gabor pattern orientation improves with training [42] and some evidence suggests that the SSVEP is plastic in response to both perceptual training and aversive conditioning [43,44].

The aim of this study was to evaluate the influence of perceptual learning on sustained neurophysiological responses within and across sensory modalities. Each auditory stimulus had both an F0 and an AM rate, allowing us to test whether training on one feature (e.g., F0) affected the neural representation and perception of another feature (e.g., AM). Concurrent presentation of the AM and F0 also enables simultaneous measurements from potentially different auditory neural generators. Our hypothesis was that F0-discrimination training would selectively enhance the EFR to the F0, whereas AM-discrimination training would selectively enhance the ASSR to the AM rate, and that the visual orientation discrimination training would selectively enhance the SSVEP responses.

Materials & Methods


The participants were forty (25 females and 15 males) normal-hearing listeners between 18 and 35 years of age who had less than 5 years of musical training, no prior experience in psychophysical experiments, and who did not speak any tonal languages. All participants passed an audiometric screening with pure-tone thresholds below 20 dB hearing level (HL) for octave frequencies between 250 and 8000 Hz and wore visual corrective lenses as prescribed by an optometrist during all sessions if required. Written informed consent was obtained from all participants in accordance with protocols reviewed and approved by the Institutional Review Board at the University of Minnesota. The participants were paid for their participation.


The auditory stimuli were amplitude-modulated harmonic complexes (Fig 1A) with all components added in sine phase, bandpass filtered between 1700 and 3600 Hz (Butterworth filter with 24 dB/oct slopes). This filtering ensured that only harmonics 17 to 33 were presented, which in turn meant that the participants had to rely on the periodicity in the temporal envelope to extract the pitch [45,46]. In this way, both the perception and the EFR relied on the same acoustic cues. The nominal F0 of the complex was 137 Hz, and the complex was sinusoidally amplitude modulated at a rate of 13 Hz with 100% depth. The duration of the complexes was 400 ms for behavioral tasks and 1 s for the EEG measurements, and all complexes had 10-ms raised-cosine onset and offset ramps. Modulated tones were embedded in a threshold equalizing noise (TEN) [47] extending from 50 to 8000 Hz, with a spectral notch from 1500 to 4800 Hz to limit the contributions from neurons tuning to frequencies outside the stimulus passband, and to reduce the audibility of any distortion products.
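The stimulus construction described above can be sketched in code. The following Python is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a 48-kHz sampling rate, approximates the bandpass filter by summing only the harmonics whose frequencies fall inside the 1700–3600 Hz passband (the Butterworth slopes are not modeled, so which harmonics remain audible may differ from the filtered stimulus), and omits the TEN masker. The function name `am_harmonic_complex` and its parameters are hypothetical.

```python
import math

def am_harmonic_complex(f0=137.0, am_rate=13.0, dur=0.4, fs=48000,
                        band=(1700.0, 3600.0), ramp=0.010):
    """Sine-phase harmonic complex with 100% sinusoidal AM and raised-cosine ramps."""
    n = int(round(dur * fs))
    n_ramp = int(round(ramp * fs))
    # Assumption: keep harmonics inside the passband instead of filtering.
    harmonics = [k for k in range(1, int(band[1] // f0) + 1)
                 if band[0] <= k * f0 <= band[1]]
    samples = []
    for i in range(n):
        t = i / fs
        # All components start in sine phase.
        carrier = sum(math.sin(2 * math.pi * k * f0 * t) for k in harmonics)
        # 100% modulation depth: the envelope swings between 0 and 2.
        envelope = 1.0 + math.sin(2 * math.pi * am_rate * t)
        # Raised-cosine onset and offset gates.
        if i < n_ramp:
            gate = 0.5 * (1.0 - math.cos(math.pi * i / n_ramp))
        elif i >= n - n_ramp:
            gate = 0.5 * (1.0 - math.cos(math.pi * (n - 1 - i) / n_ramp))
        else:
            gate = 1.0
        samples.append(gate * envelope * carrier)
    peak = max(abs(s) for s in samples)
    return [s / peak for s in samples]  # normalized to unit peak amplitude

tone = am_harmonic_complex()            # 400-ms behavioral stimulus
```

Passing `dur=1.0` would give the 1-s version used for the EEG measurements; presentation level would be set separately at playback.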

Fig 1. Auditory and visual stimuli.

(A) Unresolved harmonic complex with an F0 of 137 Hz, amplitude modulated at 13 Hz. (B) Gabor pattern with a standard deviation of 3° of visual angle and a spatial frequency of 1 cycle per degree, embedded in a 9° background of noise.

During behavioral testing, the tones were presented at an overall level of 53 dB sound pressure level (SPL), and the TEN level was 43 dB SPL per equivalent rectangular bandwidth (ERB) at 1 kHz between 50 and 1500 Hz and 33 dB SPL per ERB between 4800 and 8000 Hz. During EEG recording, the tones were presented at an overall level of 65 dB SPL embedded in TEN at 55 dB SPL in the lower region and 45 dB SPL in the higher region. The tones were increased by 10 dB during EEG recording because higher presentation levels (65–80 dB SPL) recruit larger neural populations and maximize the signal-to-noise ratio in recorded potentials [34,35,48].

The visual stimuli were generated using Psychophysics toolbox [49–51] in MATLAB (The MathWorks Inc., Natick, MA). The visual stimuli consisted of a Gabor pattern with a standard deviation of 3° of visual angle and a spatial frequency of 1 cycle per degree, embedded in a 9° background of filtered noise (Fig 1B). The noise was created by convolving white noise with an isotropic 2D Gaussian filter centered on 1 cycle per degree with a standard deviation of 0.33 cycles per degree. In order to record the SSVEP to both the stimuli and the background noise, the Gabor pattern was sawtooth-flickered at 13 Hz while the noise was sawtooth-flickered at 17 Hz.
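As a rough illustration of the Gabor pattern described above, the sketch below multiplies a sinusoidal grating by an isotropic Gaussian envelope. It is written in Python rather than the MATLAB toolbox the study used, and several details are assumptions: the pixel density (`pixels_per_deg`), the orientation convention, and unit contrast. The filtered background noise and the flicker are not modeled, and the function name `gabor` is hypothetical.

```python
import math

def gabor(size_deg=9.0, pixels_per_deg=8, sigma_deg=3.0,
          cycles_per_deg=1.0, orientation_deg=45.0):
    """Return a square image (list of rows) with values in [-1, 1]."""
    n = int(size_deg * pixels_per_deg) + 1   # odd size, so a center pixel exists
    half = (n - 1) / 2
    theta = math.radians(orientation_deg)
    image = []
    for row in range(n):
        y = (row - half) / pixels_per_deg    # vertical position in degrees
        line = []
        for col in range(n):
            x = (col - half) / pixels_per_deg
            # Gaussian envelope with the stated standard deviation.
            envelope = math.exp(-(x * x + y * y) / (2 * sigma_deg ** 2))
            # Cosine grating whose luminance varies along the direction theta.
            phase = 2 * math.pi * cycles_per_deg * (x * math.cos(theta) + y * math.sin(theta))
            line.append(envelope * math.cos(phase))
        image.append(line)
    return image

patch_45 = gabor(orientation_deg=45.0)
patch_135 = gabor(orientation_deg=135.0)
```

Scaling the returned values by a contrast factor before adding them to the noise background would be one way to vary the contrast-to-noise ratio.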


The perceptual training paradigm was conducted with four groups of ten participants, assigned randomly to each group. In the three experimental groups, participants were trained for about 30 minutes a day for 6 days on one of three tasks: F0 discrimination (F0 Group), AM rate discrimination (AM Group), or visual orientation discrimination (VIS Group). Participants were encouraged to complete the sessions on consecutive days but were allowed a maximum of two days between training sessions if they were unable to attend a weekend session. The final training session and the post-test were always completed on consecutive days. The EEG recording and behavioral measures were repeated again 30 days post-training to investigate whether training effects were maintained. A fourth, no-training control group (CON Group) was also included, where participants received EEG recording and behavioral pre- and post-tests at comparable time intervals to participants in the other groups but were not trained on any discrimination task. Control group participants received the initial EEG and behavioral pre-test and returned one week later to complete the post-test but did not participate in the maintenance test.

EEG test procedure

Pre-, post- and maintenance tests consisted of an EEG recording session followed by behavioral threshold measurements for each of the three discrimination tasks. EEG measurements were acquired using a Biosemi active electrode system with a sampling rate of 4096 Hz and 32 channels referenced to averaged mastoid electrodes. The recordings were made in a sound-attenuating booth.

The SSVEPs were recorded in response to a Gabor pattern embedded in background noise (Fig 1B), partitioned into twenty 1-minute blocks. Participants were seated two feet in front of an HP LP2065 LCD monitor with a refresh rate of 75 Hz. The monitor’s luminance gamma curves were measured using a Photoresearch PR-655 and corrected in software to ensure correct display of stimulus intensity. During recording, participants were presented with a luminance-change discrimination task to ensure that they were attending to the stimuli. Within each 1-minute block, the intensity of the Gabor pattern increased ten times at random intervals. Participants were instructed to press a key on a number pad as quickly as possible whenever they detected the luminance change. All participants achieved 80% correct or better on the task. The blocks were self-paced and completed in less than 30 minutes for all participants.

Following the SSVEP, simultaneous cortical and subcortical auditory sustained responses were recorded to 1000 repetitions of the 1-s AM harmonic complex (Fig 1A). The (cortical) ASSR was measured to the 13-Hz AM rate while the (subcortical) EFR was measured to the 137-Hz F0 in the same tones. The complexes were generated using Matlab and played to participants via a Tucker Davis Technologies (TDT) real time processor with headphone buffer and Etymotic ER1 insert earphones. Interstimulus intervals were jittered randomly between 700 and 800 ms and stimulus polarity was randomly alternated, resulting in 500 presentations of each polarity. Participants watched a silent close-captioned movie while listening to the stimuli during the auditory EEG recordings.

Behavioral test procedure

All behavioral sessions took place in a double-walled sound-attenuating booth. The auditory stimuli were presented diotically over Sennheiser HD 650 headphones, which have an approximately diffuse-field response; specified sound pressure levels are approximate equivalent diffuse-field levels. The tones were generated digitally and presented through a soundcard with 24-bit resolution at a sampling rate of 48 kHz. The visual stimuli were presented via a Dell 1707FPc 17” LCD monitor with a vertical refresh rate of 76 Hz placed two feet from participants’ eyes. The monitor’s luminance gamma curves were measured using a Photoresearch PR-655 and corrected in software to ensure correct display of stimulus intensity.

Participants’ thresholds were estimated using a standard two-alternative forced-choice procedure with a two-down one-up adaptive tracking rule for all three tasks [52]. For the auditory thresholds, the stimuli were always AM complex tones but either the AM or the F0 was varied depending on the task. To obtain an AM rate discrimination threshold, the F0 remained at 137 Hz but the AM rate was varied adaptively. For the F0 threshold, the AM rate remained at 13 Hz while the F0 was varied adaptively. An important point to note is that with this stimulus design, both auditory groups are exposed to the AM and the F0 of the tone but trained to discriminate only one of the two attributes. Each trial began with a 400-ms tone followed by a 200-ms gap, and then a second 400-ms tone. The background noise was gated on 500 ms before the first interval and off 200 ms after the second interval. Participants were asked to indicate via the computer keyboard which of the two tones had the higher pitch (F0) or AM modulation rate, and immediate feedback was provided after each trial.

For the visual orientation discrimination task the adaptive threshold tracked the contrast-to-noise ratio (CNR) required to discriminate the orientation of the Gabor pattern. On each trial, the Gabor pattern was presented for 200 ms followed by a 100-ms gap, and then for another 200 ms. One of the patterns was oriented at 45° and the other at 135°, randomly determined on each trial. Subjects had to indicate via button press if the second grating was rotated clockwise or counterclockwise in orientation relative to the first one.

Pre-test behavioral thresholds were obtained prior to the start of the first training session. Post-test and maintenance behavioral thresholds were obtained during the same session as their respective EEG measurements. The same adaptive procedure was implemented across the three tasks. For the auditory thresholds, the starting value of ΔAM and ΔF0 was 20%. Initially the value increased or decreased by a factor of 3. The step size was decreased to a factor of 1.41 after the first two reversals and to a factor of 1.2 after the first four reversals. For the orientation discrimination threshold, the starting value of ΔCNR was 50%. Initially the value increased or decreased by a factor of 3. The step size was decreased to a factor of 2 after the first two reversals and to a factor of 1 after the first four reversals. For all threshold measures, six reversals occurred at the smallest step size, and threshold was calculated as the geometric mean of the ΔAM, ΔF0, or ΔCNR rate value at those last six reversal points. For pre-test, post-test, and maintenance testing, each participant repeated the measures four times, and the geometric mean of the four repetitions was defined as the individual’s threshold.
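The adaptive procedure for the auditory tasks can be sketched as follows. This Python simulation is illustrative only: the observer model (`simulated_observer`), the random seed, and the exact moment at which the step size changes relative to a reversal are assumptions, and the names `run_track` and `simulated_observer` are hypothetical. The starting value (20%), the step factors (3, 1.41, 1.2), the two-down one-up rule, and the geometric mean of the last six reversals follow the description above.

```python
import math
import random

def run_track(p_correct, start=20.0, factors=(3.0, 1.41, 1.2),
              reversals_at_final=6, rng=None):
    """Two-down one-up track; returns (threshold, reversal values)."""
    rng = rng or random.Random(0)
    delta = start
    n_correct = 0      # consecutive correct responses at the current value
    direction = 0      # -1 after a decrease, +1 after an increase, 0 at start
    reversals = []     # value of delta at each change of direction
    while len(reversals) < 4 + reversals_at_final:
        if rng.random() < p_correct(delta):
            n_correct += 1
            if n_correct < 2:
                continue           # need two correct in a row to go down
            n_correct = 0
            step = -1
        else:
            n_correct = 0
            step = 1               # a single error makes the task easier
        if direction != 0 and step != direction:
            reversals.append(delta)
        direction = step
        # Step factor shrinks after the first two and first four reversals.
        if len(reversals) < 2:
            factor = factors[0]
        elif len(reversals) < 4:
            factor = factors[1]
        else:
            factor = factors[2]
        delta = delta * factor if step > 0 else delta / factor
    last = reversals[-reversals_at_final:]
    threshold = math.exp(sum(math.log(v) for v in last) / len(last))
    return threshold, reversals

def simulated_observer(delta, scale=5.0):
    """Assumed psychometric function rising from chance (0.5) to 1."""
    return 0.5 + 0.5 * (1.0 - math.exp(-(delta / scale) ** 2))

threshold, reversals = run_track(simulated_observer, rng=random.Random(0))
```

A two-down one-up rule of this kind converges on the stimulus difference that yields roughly 70.7% correct responses.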

Behavioral training procedure

During each training session, participants completed 15 adaptive threshold tracks, which equates to approximately 900 trials of their designated training task. After training began, the participants were no longer exposed to the other two tasks. The adaptive tracks used for training were the same as described above in the test procedure.

Auditory EEG analysis

EEG data were analyzed using frequency-domain principal component analysis (cPCA), as described in Bharadwaj and Shinn-Cunningham [53]. This multi-channel analysis technique allows the reduction of data acquisition time and provides a significant SNR improvement over traditional single-channel steady-state response analysis methods. The cPCA combines multichannel recordings using complex-valued weights that consider channel-specific magnitudes and phases in each frequency bin (see [53] for further details). The cPCA is also better suited to the analysis of steady-state responses than time-domain analyses that combine multichannel recordings, such as principal component analysis or averaging across electrodes, because these methods assume that the signal has the same phase across sensors.

The data were first filtered into high (70–1000 Hz) and low (5–20 Hz) frequency ranges, reflecting the putative sub-cortical and cortical responses, respectively. For each filtered dataset, individual epochs were extracted beginning 50 ms before stimulus onset and extending 200 ms beyond the stimulus offset. Epochs exceeding 100 μV peak-to-peak were rejected. Visual inspection of channels with a large proportion of rejected epochs resulted in exclusion of 1–5 channels in about 1/3 of the recordings. One subject in the AM group had an unusably noisy EEG dataset from their maintenance session (i.e., a large proportion of epochs exceeding 100 μV peak-to-peak) and their maintenance data was excluded. The multi-taper complex PCA computation was completed using a single taper in both frequency ranges. Resulting phase locking values (PLV) reflect the consistency of the sustained response phase over all epochs and across all included electrodes. PLV magnitude was extracted at the experimental frequencies of the 13 Hz AM rate and the 137 Hz harmonic complex F0 in the cortical and subcortical filtered regions, respectively.
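To make the phase locking value concrete, the sketch below computes a single-channel PLV at one frequency: each epoch contributes a unit-length phasor, and the PLV is the magnitude of their mean. This is a simplification for illustration and is not the multichannel cPCA pipeline used in the study; the sampling rate, epoch length, noise level, and function names are all assumptions.

```python
import cmath
import math
import random

def fourier_coefficient(epoch, freq, fs):
    """Complex Fourier coefficient of one epoch at a single frequency."""
    return sum(x * cmath.exp(-2j * math.pi * freq * i / fs)
               for i, x in enumerate(epoch))

def phase_locking_value(epochs, freq, fs):
    """Magnitude of the mean unit phasor across epochs (near 0: random phase, 1: locked)."""
    phasors = []
    for epoch in epochs:
        c = fourier_coefficient(epoch, freq, fs)
        phasors.append(c / abs(c))     # discard magnitude, keep phase only
    return abs(sum(phasors) / len(phasors))

# Simulated data (not from the study): a 13-Hz response buried in noise.
rng = random.Random(1)
fs, freq, n = 1024, 13.0, 1024

def make_epoch(phase):
    return [math.sin(2 * math.pi * freq * i / fs + phase) + rng.gauss(0.0, 1.0)
            for i in range(n)]

locked = [make_epoch(0.0) for _ in range(50)]
jittered = [make_epoch(rng.uniform(0.0, 2 * math.pi)) for _ in range(50)]
plv_locked = phase_locking_value(locked, freq, fs)
plv_jittered = phase_locking_value(jittered, freq, fs)
```

Because magnitude is discarded before averaging, the measure reflects the consistency of response phase across epochs rather than response amplitude.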

Visual EEG analysis

Raw SSVEP data were band-pass filtered at 1–59 Hz in EEGLAB [54] using a Hamming windowed sinc FIR filter, and 60-s epochs were extracted beginning at each event trigger. Epochs were transformed into the frequency domain, and signal-to-noise ratios (SNRs) were computed at expected SSVEP peaks (Gabor and noise flicker rates) compared to the average noise floor 0.05–0.2 Hz above and below those peaks. Occipital, parieto-occipital, and parietal electrodes (CP1, CP2, CP5, CP6, P8, P7, Pz, P3, P4, PO3, PO4, O1, Oz, and O2) were considered, and SNRs were averaged for electrodes within this set that were greater than an SNR threshold of 1.5. Five subjects had no electrodes with peaks reaching the SNR threshold in one of the three sessions (Pre-, Post- or Maintenance). For those sessions, an average of the 3 electrodes with the best SNR was used. One subject in the F0 group and one subject in the control group had unusable EEG data due to recording error and were excluded. One subject in the AM group had noisy data (i.e., a large proportion of data did not meet SNR cutoffs) only from their maintenance session so that session was excluded.
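The SNR computation described above can be sketched for a single electrode and a single 60-s epoch. The Python below is a hedged illustration: it assumes the SNR is a ratio of spectral amplitudes (a power ratio is equally plausible from the description), uses a low 128-Hz sampling rate to keep the example fast, and evaluates the spectrum only at the frequencies it needs. The function names and the simulated data are hypothetical.

```python
import cmath
import math
import random

def amplitude_at(signal, freq, fs):
    """Single-frequency Fourier amplitude (a unit-amplitude sinusoid gives about 1)."""
    n = len(signal)
    c = sum(x * cmath.exp(-2j * math.pi * freq * i / fs)
            for i, x in enumerate(signal))
    return 2.0 * abs(c) / n

def ssvep_snr(signal, peak_freq, fs, lo=0.05, hi=0.2):
    """Peak amplitude divided by the mean amplitude lo..hi Hz above and below it."""
    df = fs / len(signal)                      # bin spacing: 1/60 Hz for a 60-s epoch
    k_lo = math.ceil(lo / df - 1e-9)           # tolerance guards against rounding
    k_hi = math.floor(hi / df + 1e-9)
    neighbors = []
    for k in range(k_lo, k_hi + 1):
        neighbors.append(amplitude_at(signal, peak_freq - k * df, fs))
        neighbors.append(amplitude_at(signal, peak_freq + k * df, fs))
    noise_floor = sum(neighbors) / len(neighbors)
    return amplitude_at(signal, peak_freq, fs) / noise_floor

# Simulated epoch (not study data): a 13-Hz response plus Gaussian noise.
rng = random.Random(2)
fs, dur = 128, 60
epoch = [math.sin(2 * math.pi * 13.0 * i / fs) + rng.gauss(0.0, 1.0)
         for i in range(fs * dur)]
snr_13 = ssvep_snr(epoch, 13.0, fs)
```

An electrode would then count toward the average if its SNR exceeded the threshold of 1.5.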


Perceptual learning on trained and untrained tasks

We first assessed whether training improved behavioral discrimination thresholds. The participants trained on each task (Fig 2, filled symbols) had lower thresholds at post-test than pre-test suggesting that the training led to perceptual learning. However, improved thresholds were also seen in participants not trained on the task, including participants in the no-training CON group (Fig 2, open symbols). The thresholds were log-transformed prior to statistical analysis to maintain roughly equal variance across conditions. A Session (Pre vs. Post) by Group mixed-model ANOVA conducted for each task confirmed threshold improvements with a significant main effect of session for the AM task (F1,36 = 132, p < .001), F0 task (F1,36 = 31.7, p < .001), and the VIS task (F1,36 = 15.9, p < .0001). For both the F0 and VIS task, no significant effect of Group (F0: F3,36 = .793, p = .506, VIS: F3,36 = 1.762, p = .172) or the Session by Group interaction (F0: F3,36 = 2.255, p = .099, VIS: F3,36 = 1.502, p = .230) was observed, indicating that all groups demonstrated comparable threshold improvements.

Fig 2. Pre-test, post-test, maintenance, and training session thresholds for the discrimination of AM rate (left), F0 (middle) and orientation (right) for the four participant groups.

Participants who trained on each task are shown with filled symbols. Error bars represent ± 1 standard error of the mean.

For the AM task, there was a significant Session by Group interaction (F3,36 = 7.17, p = .001). Further analysis using an ANOVA with the difference between pre- and post-training threshold (Fig 3A) as the dependent variable also revealed a significant main effect of Group (F3,36 = 7.17, p = .001). Posthoc pairwise comparisons showed that the two auditory groups each had larger threshold improvements than the CON group (AM: p < .001; F0: p = .004), but there was no difference between the VIS and CON groups (p = .183). There was a significant difference between the VIS and AM group (p = .033) but no other difference between the three trained groups (p>.584 in all cases).

Fig 3. Changes in the behavioral thresholds and EEG responses from pre-test to post-test.

(A) Pre—Post difference of log transformed behavioral thresholds as a function of task and training group. (B) Pre—Post difference in the PLV of subcortical EFRs as a function of training group. (C) Pre—Post difference in the PLV of cortical ASSRs as a function of training group. (D) Pre—Post difference in the SSVEP to the signal and noise as a function of training group.

To account for the potential effect of initial threshold variability seen in participants both within and between groups, performance on the F0 and VIS task was also converted into Pre—Post threshold difference scores (Fig 3A). However, an ANOVA with the Pre—Post threshold difference did not show a significant effect of group for either task (F0: F3,36 = 2.255, p = .099; VIS: F3,36 = 1.502, p = .230). Surprisingly, therefore, perceptual learning was observed across all training groups with relatively little evidence of task- or even modality-specific differences.
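The difference scores used in these analyses can be illustrated with a small example. The sketch below assumes a base-10 logarithm (the base of the transform is not stated) and defines the score as pre minus post, so that a positive value means a lower threshold after training. The thresholds are invented for one hypothetical participant, and the function names are hypothetical.

```python
import math

def geometric_mean(values):
    """Geometric mean, as used to combine the four repeated threshold estimates."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def pre_post_difference(pre_runs, post_runs):
    """Pre minus post difference of log10 thresholds (positive = improvement)."""
    pre = geometric_mean(pre_runs)
    post = geometric_mean(post_runs)
    return math.log10(pre) - math.log10(post)

# Made-up thresholds (in percent) for four runs at each session.
pre_runs = [20.0, 10.0, 20.0, 10.0]
post_runs = [5.0, 10.0, 5.0, 10.0]
diff = pre_post_difference(pre_runs, post_runs)
```

In this example the threshold halves, so the score equals log10(2), about 0.30. On a log scale, halving gives the same score whatever the starting value, which is why difference scores of log thresholds help when initial thresholds vary widely across participants.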

Neurophysiological responses

To assess subcortical physiological changes in response to the training, pre- and post-test EFRs were compared (Fig 4B). A Session by Group mixed-model ANOVA on the EFR PLVs showed no significant main effect of Session (F1,36 = .637, p = .430), Group (F3,36 = .402, p = .752), or the Group by Session interaction (F3,36 = 1.591, p = .209). A second analysis was conducted to investigate the effect of group on Pre—Post EFR PLV difference (Fig 3B). An ANOVA also revealed no significant effect of group on the Pre—Post PLVs (F3,36 = 1.591, p = .209), indicating that no significant changes in the EFRs were observed after the training.

Fig 4. Auditory EEG results showing (A) subcortical EFR phase locking by frequency for a single, representative subject (pre-test), (B) average phase locking values at 137 Hz by group and test session, (C) cortical ASSR phase locking by frequency for the same subject as panel A, and (D) average phase locking values at 13 Hz by training group and test session.

Error bars are ± 1 standard error of the mean.

Changes in cortical responses to auditory and visual stimuli after training were assessed by comparing ASSRs (Fig 4D), SSVEP-Signal (Fig 5B, left), and SSVEP-Noise (Fig 5B, right) across pre- and post-test sessions. For the ASSRs, all three training groups showed an increase in PLV at post-test, in contrast to the CON group which showed a decrease. This pattern is captured by a significant Session by Group interaction (F3,36 = 3.23, p = .034) on a Session by Group mixed-model ANOVA. Group differences were confirmed by a significant main effect of Group (F3,36 = 3.23, p = .034) in an ANOVA with the ASSR Pre—Post Difference as the dependent variable (Fig 3C). Posthoc pairwise comparisons showed a significant difference between auditory groups and the CON group (AM: p = .022; F0: p = .006) but no significant difference between the VIS and CON groups (p = .061) or the VIS and auditory groups (AM: p = .649; F0: p = .339). This indicates that only the two auditory groups showed enhanced cortical ASSRs post-training, consistent with the pattern of AM behavioral threshold improvement.

Fig 5. Visual EEG results showing (A) the SSVEP frequency magnitude spectrum for the subject shown in Fig 4A and 4C, and (B) group averages by test session for the 13 Hz signal (left) and 17 Hz noise (right) flicker rates.

Error bars are ± 1 standard error of the mean.

The SSVEPs show a similar trend as the ASSRs with enhanced responses post training for the VIS group and decreased responses for the auditory and CON groups. SSVEPs to the stimuli and the noise were collapsed into a single Session (Pre vs. Post) by Group and Stimuli (Stimuli vs. Noise) mixed-model ANOVA, which revealed a significant Session by Group interaction (F3,66 = 5.50, p = .002). Further analysis with a SSVEP Pre—Post difference (Fig 3D) by Group and Stimuli ANOVA revealed group differences with a significant main effect of Group (F3,66 = 5.50, p = .002). Posthoc pairwise comparisons showed a significant difference between the VIS group and all other groups (AM: p = .022; F0: p < .001; CON: p = .028) with no other significant differences between the other groups (p>.09 in all cases), suggesting that the SSVEP was enhanced only in the VIS group post training.

The correlations between behavioral threshold and EEG response difference scores were computed to assess the relationship between changes in behavioral performance and changes in the neurophysiological measures. The AM Pre—Post threshold difference was not significantly correlated with the cortical ASSR Pre—Post difference (r = -.245, p = .128). The VIS threshold difference was also not significantly correlated with the SSVEP-Signal (r = .128, p = .449) or the SSVEP-Noise Pre—Post difference (r = -.032, p = .850). Furthermore, there were no correlations between the behavioral pre-test thresholds and the EEG pre-test responses (AM Pre & ASSR Pre: r = -.257, p = .109; F0 Pre & EFR Pre: r = -.043, p = .793; VIS Pre & SSVEP-Signal: r = .252, p = .127; VIS Pre & SSVEP-Noise: r = .322, p = .05). Although enhancement of the cortical ASSR was seen only in the auditory trained groups and enhancement of the SSVEP was seen only in the visual trained group, there was no overall correlation between the amount of behavioral threshold improvement and change in neural response at the level of individual participants.


To assess the maintenance of the enhanced cortical responses 30 days after the training, a Session (Post vs. Maintenance) by Group mixed-model ANOVA was conducted for the cortical ASSRs and the SSVEPs. No significant effect of Session (F1,26 = .108, p = .745), Group (F2,26 = .265, p = .770), or Session by Group interaction (F2,26 = 2.133, p = .139) was observed for the cortical ASSRs. Likewise, an SSVEP Session (Post vs. Maintenance) by Group and Stimuli mixed-model ANOVA revealed no significant main effect of Session (F1,50 = .086, p = .770), Group (F2,50 = .769, p = .469), or Session by Group interaction (F2,50 = .872, p = .424). Thus, no change was observed in either response one month after the training. However, another important consideration is how maintenance thresholds compare to pre-test performance. A Pre vs. Maintenance by Group mixed-model ANOVA revealed that the Session by Group interaction was no longer significant for the ASSR (p = .695) but remained significant for the SSVEP (p = .002). This outcome suggests that group differences were retained after 30 days for the SSVEP but not the ASSR.

To assess the maintenance of behavioral learning 30 days after the training, a Session (Post vs. Maintenance) by Group mixed-model ANOVA was conducted for each task. No significant effect of Session (AM: F1,27 = .239, p = .629; F0: F1,27 = 1.60, p = .216), Group (AM: F2,27 = 2.48, p = .103; F0: F2,27 = .399, p = .675), or Session by Group interaction (AM: F2,27 = 2.85, p = .076; F0: F2,27 = 3.04, p = .065) was observed for the auditory trained groups. For the VIS task, there was no significant effect of Session (F1,27 = 1.28, p = .267) or of the Session by Group interaction (F2,27 = .281, p = .757), but there was a significant effect of Group (F2,27 = 4.60, p = .019), presumably reflecting the fact that the CON group appeared to have lower thresholds on average in both sessions.


The data presented here show enhancement of both auditory and visual cortical steady-state responses following 3 hours of training spread over 6 days. Perceptual training led to behavioral improvements in the discrimination of the F0 and AM rates of complex tones as well as the visual orientation of Gabor patterns for all training groups. Although the behavioral threshold improvements showed minimal task specificity, the neurophysiological measures more closely matched the training: auditory-trained groups demonstrated an enhancement of ASSR PLV and the visual group demonstrated an enhancement of SSVEP amplitude. Despite the fact that the modality of training and neurophysiological enhancement aligned, there was no correlation between the change in behavioral threshold and the change in physiological response. Furthermore, the VIS group showed enhancement of SSVEP magnitude to both the stimuli and the noise although they were trained only on stimuli discrimination. Similarly, enhancement of the ASSR PLV was seen in the F0 group although they were exposed to AM but not trained on AM discrimination. Finally, even though all participant groups showed improvements in the discrimination of F0, no evidence of training-induced subcortical plasticity was observed in the auditory EFR.

The results presented here demonstrate that scalp-recorded auditory and visual steady-state responses are sensitive to cortical plasticity even after very short-term perceptual learning. We have extended previous time-domain ERP results by demonstrating plasticity of steady-state responses using an analysis of PLV. Although our techniques are novel, our findings are consistent with the outcomes of other studies using different designs and measurement methods. Auditory training studies have found cortical enhancements after speech-sound training in both N1-P2 responses [30] and mismatch negativity (MMN) responses [55]. Enhancement of C1 amplitude has been documented after visual perceptual training [56], and the SSVEP has been shown to be sensitive to aversive conditioning, arguably closely related to perceptual training [44].
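For readers unfamiliar with the phase-locking value (PLV), the sketch below shows the standard definition: the magnitude of the mean unit phasor across trials at the frequency of interest. It is a simplified illustration under stated assumptions, not the processing pipeline used in this study. The function names, the sampling rate, and the 8-Hz modulation rate are all hypothetical, and the trials are synthetic sinusoids.

```python
import cmath
import math


def phase_at_frequency(trial, sample_rate, freq):
    """Phase (radians) of the Fourier component of one trial at `freq` Hz."""
    coeff = sum(
        x * cmath.exp(-2j * math.pi * freq * n / sample_rate)
        for n, x in enumerate(trial)
    )
    return cmath.phase(coeff)


def phase_locking_value(phases):
    """PLV: length of the average unit vector built from per-trial phases.

    Returns 1.0 when every trial has the same phase and approaches 0.0
    when phases are spread uniformly around the circle.
    """
    if not phases:
        raise ValueError("at least one phase is required")
    mean_vector = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(mean_vector)


# Hypothetical recording parameters (assumptions, not the study's values).
sample_rate = 1000.0  # samples per second
rate_hz = 8.0         # modulation rate of interest
n_samples = 1000      # one second per trial, i.e. a whole number of cycles


def make_trial(phase):
    """Synthetic noiseless trial: a cosine at `rate_hz` with a given phase."""
    return [
        math.cos(2 * math.pi * rate_hz * n / sample_rate + phase)
        for n in range(n_samples)
    ]


# Eight trials with identical phase versus eight trials with evenly spread phases.
locked_trials = [make_trial(0.5) for _ in range(8)]
scattered_trials = [make_trial(2 * math.pi * k / 8) for k in range(8)]

plv_locked = phase_locking_value(
    [phase_at_frequency(t, sample_rate, rate_hz) for t in locked_trials]
)
plv_scattered = phase_locking_value(
    [phase_at_frequency(t, sample_rate, rate_hz) for t in scattered_trials]
)
print(f"locked PLV = {plv_locked:.3f}, scattered PLV = {plv_scattered:.3f}")
```

Because the PLV discards the amplitude of each trial and keeps only its phase, it indexes the consistency of the response timing across trials rather than the response magnitude.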

Our results are also consistent with a large body of animal physiological studies showing cortical plasticity following perceptual training. Recanzone et al. [57] trained monkeys on frequency discrimination and showed enhanced representation of trained frequency in A1. This finding has been replicated in other species [28,29], supporting the idea that the auditory cortex is quite malleable to experience and highlighting the importance of determining effective ways of studying human neural plasticity.

Inconsistent with our findings are previous studies that have identified enhancement of the EFR in response to short- or long-term training. Most notable is the study by Carcagno and Plack [35], who found enhanced EFR responses to both dynamic and steady pitch tokens after short-term training. One important difference between the studies may be the amount of training received by subjects. We trained subjects for a total of about 3 hours over 6 days, whereas Carcagno and Plack’s subjects underwent substantially more training: 10 hours over 10 days.

One surprising aspect of the visual results is the enhancement of SSVEP amplitude to both the signal and the noise. This outcome may suggest that repeated exposure to the background noise without discrimination training was sufficient to also enhance neural coding of the noise. A similar result was observed in the auditory domain, with the enhancement of the ASSR PLV in the F0 group, which was exposed to AM but not trained in AM discrimination. This pattern of results further suggests that stimulus exposure and procedural learning (e.g., how to direct auditory attention in the laboratory) without specific discrimination training can result in plasticity. Additional investigations are required to determine the robustness of these findings. In a study where participants were exposed to 40-Hz AM but trained only on pitch discrimination, Bosnyak et al. [58] reported the opposite finding: training altered N1c and P2 evoked potentials but had no effect on the 40-Hz ASSR. One potential explanation for these inconsistent findings is that sustained cortical potentials in response to slower modulation rates may be more plastic [48].

Directed attention has been shown to modulate the ASSR amplitude both for the lower speech-related AM rates we used [48] and for more rapid rates [59,60]. It may be that the mechanisms of short-term learning observed here are more closely related to the mechanisms of directed sensory attention than to the longer-term mechanisms underlying the enhancement of musicians’ and tonal language speakers’ EFRs [61]. Such long-term training is complicated by personal, social, and emotional factors that are hard to replicate in the laboratory but may heighten the importance of certain sounds and the responses they elicit. Polley et al. [62] provide evidence suggesting that top-down, task-dependent factors arising from multiple cortical areas play a role in the nature of observed physiological changes. While past studies [48] have reported the effect of attention on the amplitude of the ASSR, there is much less evidence for the effect of attention on measures of the ASSR’s PLV. If the ASSR change observed here is indeed related to the direction of attention, our findings suggest that PLV holds potential as a tool for studying the effects of attention on neurophysiological responses.

Although the results of human EEG studies show enhancement of neurophysiological responses to trained stimuli, we cannot determine the exact neural mechanisms underlying the modulation of measured responses. It may be that training increases the synchrony (phase locking) of neural fibers to the AM of the visual or auditory stimulus or that an increased number of fibers are recruited to the processing of a trained stimulus. Combining behavioral and human results like these with modeling of the auditory and visual pathways and ongoing animal work may help to unravel the question of what neuronal response properties are causally related to perceptual improvements. An important limitation to note is that the actual sources of the EEG signals were not localized in this study. There is a general consensus that the neural sources of the fast-rate EFR are primarily subcortical and the sources of the slow-rate ASSR are primarily cortical [14,15,53]. However, a recent study that used magnetoencephalography for source localization showed a right-hemisphere cortical contribution to the EFR in addition to the subcortical sources [63]. Our use of cortical versus subcortical with reference to the primary generators of the ASSR and EFR, though consistent with past literature, is certainly an oversimplified consideration of their neural sources. Furthermore, it is also possible that a common neural rate-discrimination mechanism could account for both high-rate (F0) and low-rate (AM) discrimination, regardless of where the responses are generated. This could serve as an alternative explanation for the increase in ASSR PLV observed in the F0 group, which was trained on high-rate discrimination but showed low-rate physiological enhancement. Additional investigation into the generalization of different high and low training rates is required to confirm this possibility.

Although the main question addressed in this study was whether there was evidence of plasticity in the steady-state EEG responses, the perceptual training paradigm produced an interesting and complex pattern of behavioral results. We found threshold improvements on all three tasks at post-test for all groups, including the control group. Improvement in a no-training control group is a common phenomenon (for example, see [64]). However, this finding also speaks to the effect of stimulus exposure on behavioral measures of learning. Participants in our study were exposed to the stimuli significantly more than is typical in a behavior-only training paradigm because of the one thousand stimulus repetitions presented during the EEG recording. Procedural learning, or the impact of subjects acclimating to the environment and nature of psychophysical tasks, is also likely to have affected our behavioral results, as all of our subjects were entirely naïve to psychophysical experiments before participating in this study. Although these effects have been demonstrated before, our data show a potentially large effect of these factors on the differences between pre-test and post-test thresholds.

Our behavioral findings also provide evidence for cross-modal transfer of learning. Participants trained on the auditory tasks showed improvement on the visual threshold, while those trained on the visual task showed improvement on the auditory thresholds. This type of cross-modal effect was not seen in the EEG data (i.e., subjects learned behaviorally across modalities but did not exhibit any changes in cross-modal physiological responses). Neurophysiological enhancement matched the modality of training. One way to interpret this pattern of results is that the cross-modal transfer of behavioral learning may have occurred at a later stage, such as decision formation [65,66], and therefore did not affect stimulus coding as measured by EEG. It is important to note, however, that one complication in the interpretation of our behavioral results is the large difference in mean thresholds between the groups at pre-test. Nevertheless, an additional analysis of the Pre-Post threshold difference scores, conducted to account for initial baseline variability, did not change the results.

In conclusion, this study provides evidence that short-term, learning-related physiological changes may be measured in the adult auditory and visual cortex using EEG. Simultaneous subcortical and cortical sustained responses provide a unique insight into how different levels of the auditory pathway respond to amplitude modulation and how those responses might be sensitive to perceptual learning. Our findings suggest that cortical responses may be more reflective of training-induced plasticity than subcortical responses and that modality-based specificity is more apparent than task-based specificity. In contrast, our behavioral results revealed apparent cross-task and cross-modality transfer of learning.


We thank Beverly Wright and Christopher Plack for helpful discussions relating to this project.

Author Contributions

  1. Conceptualization: BKL DRR AJO.
  2. Funding acquisition: AJO.
  3. Methodology: BKL DRR SK SAE AJO.
  4. Software: BKL DRR SK SAE AJO.
  5. Writing – original draft: BKL DRR.
  6. Writing – review & editing: BKL DRR SK SAE AJO.


  1. Woldorff MG, Gallen CC, Hampson SA, Hillyard SA, Pantev C, Sobel D, et al. Modulation of early sensory processing in human auditory cortex during auditory selective attention. Proc Natl Acad Sci U S A. 1993;90: 8722–6. pmid:8378354
  2. Brandewie E, Zahorik P. Prior listening in rooms improves speech intelligibility. J Acoust Soc Am. 2010;128: 291–299. pmid:20649224
  3. Brandewie E, Zahorik P. Time course of a perceptual enhancement effect for noise-masked speech in reverberant environments. J Acoust Soc Am. 2013;134: EL265–EL270. pmid:23927235
  4. Holt LL, Lotto AJ. Behavioral examinations of the level of auditory processing of speech context effects. Hear Res. 2002;167: 156–169. pmid:12117538
  5. Lotto AJ, Kluender KR, Holt LL. Perceptual compensation for coarticulation by Japanese quail (Coturnix coturnix japonica). J Acoust Soc Am. 1997;102: 1134–40. pmid:9265760
  6. Deutsch D, Henthorn T, Marvin E, Xu H. Absolute pitch among American and Chinese conservatory students: prevalence differences, and evidence for a speech-related critical period. J Acoust Soc Am. 2006;119: 719–722. pmid:16521731
  7. Krishnan A, Gandour JT. The role of the auditory brainstem in processing linguistically-relevant pitch patterns. Brain Lang. 2009;110: 135–48. pmid:19366639
  8. Gaser C, Schlaug G. Brain structures differ between musicians and non-musicians. J Neurosci. 2003;23: 9240–9245. pmid:14534258
  9. Sasaki Y, Nanez JE, Watanabe T. Advances in visual perceptual learning and plasticity. Nat Rev Neurosci. Nature Publishing Group; 2009;11: 53–60.
  10. Schneider P, Scherg M, Dosch HG, Specht HJ, Gutschalk A, Rupp A. Morphology of Heschl’s gyrus reflects enhanced activation in the auditory cortex of musicians. Nat Neurosci. 2002;5: 688–94. pmid:12068300
  11. Bermudez P, Zatorre RJ. Differences in gray matter between musicians and nonmusicians. Ann N Y Acad Sci. 2005;1060: 395–399. pmid:16597791
  12. Pantev C, Oostenveld R, Engelien A, Ross B, Roberts LE, Hoke M. Increased auditory cortical representation in musicians. Nature. 1998;392: 811–814. pmid:9572139
  13. Shahin A, Bosnyak DJ, Trainor LJ, Roberts LE. Enhancement of Neuroplastic P2 and N1c Auditory Evoked Potentials in Musicians. J Neurosci. 2003;23: 5545–5552. pmid:12843255
  14. Herdman AT, Lins O, Roon PV, Stapells DR, Scherg M, Picton TW. Intracerebral Sources of Human Auditory Steady-State Responses. Brain Topogr. 2002;15: 69–86. pmid:12537303
  15. Krishnan A, Bidelman GM, Smalt CJ, Ananthakrishnan S, Gandour JT. Relationship between brainstem, cortical and behavioral measures relevant to pitch salience in humans. Neuropsychologia. Elsevier; 2012;50: 2849–59.
  16. Musacchia G, Sams M, Skoe E, Kraus N. Musicians have enhanced subcortical auditory and audiovisual processing of speech and music. Proc Natl Acad Sci U S A. 2007;104: 15894–8. pmid:17898180
  17. Spiegel MF, Watson CS. Performance on frequency-discrimination tasks by musicians and nonmusicians. J Acoust Soc Am. 1984;76: 1690–1695.
  18. Kishon-Rabin L, Amir O, Vexler Y, Zaltz Y. Pitch discrimination: Are professional musicians better than non-musicians? J Basic Clin Physiol Pharmacol. 2001;12: 125–144. pmid:11605682
  19. Carey D, Rosen S, Krishnan S, Pearce MT, Shepherd A, Aydelott J, et al. Generality and specificity in the effects of musical expertise on perception and cognition. Cognition. Elsevier B.V.; 2015;137: 81–105.
  20. Parbery-Clark A, Skoe E, Lam C, Kraus N. Musician enhancement for speech-in-noise. Ear Hear. 2009;30: 653–61. pmid:19734788
  21. Zendel BR, Alain C. Musicians experience less age-related decline in central auditory processing. Psychol Aging. 2012;27: 410–7. pmid:21910546
  22. Swaminathan J, Mason CR, Streeter TM, Best V, Kidd G, Patel AD. Musical training, individual differences and the cocktail party problem. Sci Rep. Nature Publishing Group; 2015;5: 11628.
  23. Fuller CD, Galvin JJ, Maat B, Free RH, Başkent D. The musician effect: Does it persist under degraded pitch conditions of cochlear implant simulations? Front Neurosci. 2014;8: 1–16.
  24. Ruggles DR, Freyman RL, Oxenham AJ. Influence of Musical Training on Understanding Voiced and Whispered Speech in Noise. Malmierca MS, editor. PLoS ONE. 2014;9: e86980. pmid:24489819
  25. Boebinger D, Evans S, Rosen S, Lima CF, Manly T, Scott SK. Musicians and non-musicians are equally adept at perceiving masked speech. J Acoust Soc Am. 2015;137: 378–387. pmid:25618067
  26. Başkent D, Gaudrain E. Musician advantage for speech-on-speech perception. J Acoust Soc Am. 2016;139: EL51–EL56. pmid:27036287
  27. Recanzone GH, Schreiner CE, Merzenich MM. Plasticity in the frequency representation of primary auditory cortex following discrimination training in adult owl monkeys. J Neurosci Off J Soc Neurosci. 1993;13: 87–103.
  28. Rutkowski RG, Weinberger NM. Encoding of learned importance of sound by magnitude of representational area in primary auditory cortex. Proc Natl Acad Sci U S A. 2005;102: 13664–9. pmid:16174754
  29. Polley DB, Steinberg EE, Merzenich MM. Perceptual learning directs auditory cortical map reorganization through top-down influences. J Neurosci Off J Soc Neurosci. 2006;26: 4970–4982.
  30. Tremblay K, Kraus N, McGee T, Ponton C, Otis B. Central auditory plasticity: changes in the N1-P2 complex after speech-sound training. Ear Hear. 2001;22: 79–90. pmid:11324846
  31. Atienza M, Cantero JL, Dominguez-Marin E. The time course of neural changes underlying auditory perceptual learning. Learn Mem. 2002;9: 138–50. pmid:12075002
  32. Lalonde K, Holt RF. Audiovisual speech perception development at varying levels of perceptual processing. J Acoust Soc Am. 2016;139: 1713–1723. pmid:27106318
  33. Micheyl C, Delhommeau K, Perrot X, Oxenham AJ. Influence of musical and psychoacoustical training on pitch discrimination. Hear Res. 2006;219: 36–47. pmid:16839723
  34. Bharadwaj HM, Masud S, Mehraei G, Verhulst S, Shinn-Cunningham BG. Individual Differences Reveal Correlates of Hidden Hearing Deficits. J Neurosci. 2015;35: 2161–2172. pmid:25653371
  35. Carcagno S, Plack CJ. Subcortical plasticity following perceptual learning in a pitch discrimination task. J Assoc Res Otolaryngol. 2011;12: 89–100. pmid:20878201
  36. Stone MA, Füllgrabe C, Moore BCJ. Notionally steady background noise acts primarily as a modulation masker of speech. J Acoust Soc Am. 2012;132: 317. pmid:22779480
  37. Yost WA, Sheft S, Opie J. Interference and Discrimination Amplitude Modulation. J Acoust Soc Am. 1989;86: 2138–2147.
  38. Bacon SP, Grantham DW. Modulation masking: effects of modulation frequency, depth, and phase. J Acoust Soc Am. 1989;85: 2575–2580. pmid:2745880
  39. Kingsbury BE, Morgan N, Greenberg S. Robust speech recognition using the modulation spectrogram. Speech Commun. 1998;25: 117–132.
  40. Drullman R, Festen JM, Plomp R. Effect of temporal envelope smearing on speech reception. J Acoust Soc Am. 1994;95: 1053–64. pmid:8132899
  41. Fitzgerald MB, Wright BA. Perceptual learning and generalization resulting from training on an auditory amplitude-modulation detection task. J Acoust Soc Am. 2011;129: 898–906. pmid:21361447
  42. Dosher BA, Lu Z-L. Perceptual learning reflects external noise filtering and internal noise reduction through channel reweighting. Proc Natl Acad Sci. 1998;95: 13988–13993. pmid:9811913
  43. Liebe S, Gold JM, Busey TA, O’Donnell B. Electrophysiological correlates of the effects of perceptual learning on signal and noise in the human visual system. Vision Sciences Society Annual Meeting Abstracts. 2004.
  44. McTeague LM, Gruss LF, Keil A. Aversive learning shapes neuronal orientation tuning in human visual cortex. Nat Commun. 2015;6: 7823. pmid:26215466
  45. Houtsma AJM, Smurzynski J. Pitch identification and discrimination for complex tones with many harmonics. J Acoust Soc Am. 1990;87: 304–310.
  46. Bernstein JG, Oxenham AJ. Pitch discrimination of diotic and dichotic tone complexes: Harmonic resolvability or harmonic number? J Acoust Soc Am. 2003;113: 3323–3334. pmid:12822804
  47. Moore BCJ, Huss M, Vickers DA, Glasberg BR, Alcantara JI. A test for the diagnosis of dead regions in the cochlea. Br J Audiol. 2000;34: 205–24. pmid:10997450
  48. Riecke L, Scharke W, Valente G, Gutschalk A. Sustained Selective Attention to Competing Amplitude-Modulations in Human Auditory Cortex. PLoS ONE. 2014;9: e108045. pmid:25259525
  49. Brainard DH. The Psychophysics Toolbox. 1997;10: 433–436.
  50. Pelli DG. The VideoToolbox software for visual psychophysics: transforming numbers into movies. 1997;10: 437–442.
  51. Kleiner M, Brainard D, Pelli D. What’s new in Psychtoolbox-3? Perception. 2007;36: 14–14.
  52. Levitt H. Transformed Up-Down Methods in Psychoacoustics. J Acoust Soc Am. 1971;49: 467–477.
  53. Bharadwaj HM, Shinn-Cunningham BG. Rapid acquisition of auditory subcortical steady state responses using multichannel recordings. Clin Neurophysiol. International Federation of Clinical Neurophysiology; 2014;125: 1878–1888.
  54. Delorme A, Makeig S. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J Neurosci Methods. 2004;134: 9–21. pmid:15102499
  55. Kraus N, McGee T, Carrell TD, King C, Tremblay KL, Nicol T. Central auditory system plasticity associated with speech discrimination training. J Cogn Neurosci. 1995;7: 25–32. pmid:23961751
  56. Bao M, Yang L, Rios C, He B, Engel SA. Perceptual learning increases the strength of the earliest signals in visual cortex. J Neurosci. 2010;30: 15080–15084. pmid:21068313
  57. Recanzone GH, Schreiner CE, Merzenich MM. Plasticity in the frequency representation of primary auditory cortex following discrimination training in adult owl monkeys. J Neurosci. 1993;13: 87–103. pmid:8423485
  58. Bosnyak DJ, Eaton RA, Roberts LE. Distributed auditory cortical representations are modified when non-musicians are trained at pitch discrimination with 40 Hz amplitude modulated tones. Cereb Cortex. 2004;14: 1088–1099. pmid:15115745
  59. Bidet-Caulet A, Fischer C, Besle J, Aguera P-E, Giard M-H, Bertrand O. Effects of selective attention on the electrophysiological representation of concurrent sounds in the human auditory cortex. J Neurosci. 2007;27: 9252–61. pmid:17728439
  60. Müller N, Schlee W, Hartmann T, Lorenz I, Weisz N. Top-down modulation of the auditory steady-state response in a task-switch paradigm. Front Hum Neurosci. 2009;3: 1. pmid:19255629
  61. Shiffrin RM, Schneider W. Controlled and automatic human information processing: II. Perceptual learning, automatic attending and a general theory. Psychol Rev. 1977;84: 127–190.
  62. Polley DB. Perceptual Learning Directs Auditory Cortical Map Reorganization through Top-Down Influences. J Neurosci. 2006;26: 4970–4982. pmid:16672673
  63. Coffey EBJ, Herholz SC, Chepesiuk AMP, Baillet S, Zatorre RJ. Cortical contributions to the auditory frequency-following response revealed by MEG. Nat Commun. 2016;7: 11070. pmid:27009409
  64. Wright BA, Fitzgerald MB. Different patterns of human discrimination learning for two interaural cues to sound-source location. Proc Natl Acad Sci U S A. 2001;98: 12307–12312. pmid:11593048
  65. Petrov AA, Dosher BA, Lu Z-L. The Dynamics of Perceptual Learning: An Incremental Reweighting Model. Psychol Rev. 2005;112: 715. pmid:16262466
  66. Law C-T, Gold JI. Neural correlates of perceptual learning in a sensory-motor, but not a sensory, cortical area. Nat Neurosci. 2008;11: 505–513. pmid:18327253