Atypical Mismatch Negativity in Response to Emotional Voices in People with Autism Spectrum Conditions

Autism Spectrum Conditions (ASC) are characterized by heterogeneous impairments of social reciprocity and sensory processing. Voices, similar to faces, convey socially relevant information. Whether voice processing is selectively impaired remains undetermined. This study involved recording mismatch negativity (MMN) while presenting emotionally spoken syllables dada and acoustically matched nonvocal sounds to 20 subjects with ASC and 20 healthy matched controls. The people with ASC exhibited no MMN response to emotional syllables and reduced MMN to nonvocal sounds, indicating general impairments of affective voice and acoustic discrimination. Weaker angry MMN amplitudes were associated with more autistic traits. Receiver operating characteristic analysis revealed that angry MMN amplitudes yielded an area under the curve of 0.88 (p < .001). The results suggest that people with ASC may process emotional voices in an atypical fashion already at the automatic stage. This processing abnormality can facilitate diagnosing ASC and enable social deficits in people with ASC to be predicted.


Introduction
In Autism Spectrum Conditions (ASC), abnormalities in social skills usually coexist with atypical sensory processing and aberrant attention. Social deficits are characterized by difficulty in understanding others' mental states, including the recognition of emotional expressions through voices [1,2]. Sensory dysfunction includes abnormalities in auditory processing, indicative of hyposensitivity or hypersensitivity to sounds [3,4]. Aberrant attention typically shifts orientation from social to nonsocial stimuli [5]. A comprehensive understanding of the pathophysiology of autism requires determining whether voice processing is selectively impaired in people diagnosed with ASC and whether this impairment is associated with sensory dysfunction and attention abnormalities.
Previous studies have suggested that ASC causes difficulty in encoding and representing the sensory features of physically complex stimuli [6]. Such a deficit places people with autism at a disadvantage when processing social information, because affective facial and vocal expressions are multifaceted. However, ASC does not cause certain types of complex auditory inputs, such as music, loudness, and pitch discrimination, to be misperceived [7,8,9]. Furthermore, people with ASC are considered to exhibit a fragmented mental representation and lack causative association because of slow voluntary attention shifting [10,11]. A highly dynamic and interactive social realm should be highly susceptible to such impairments. However, studies on social-stimulus-specific deficits resulting from ASC have neither distinguished sensory from attention processes nor evaluated the effects of physical stimulus complexity on brain responses [5,12].
Voice communication, a part of social interaction, is critical for survival [13,14]. During the first few weeks following birth, infants can recognize the intonational characteristics of the languages spoken by their mothers [15,16]. Typically developing infants can discriminate affective prosodies at 5 months of age [17] and react to affective components in vocal tones by 6 months of age [18]. However, young children with ASC do not show a preference for their mother's voice over other auditory stimuli [12,19]. Adults with ASC exhibit difficulty in extracting mental state inferences from voices [1] and prosodies [20]. In a study of adults with ASC, the superior temporal sulcus, a voice-selective region, failed to activate in response to vocal sounds; however, the adults exhibited a normal activation pattern in response to nonvocal sounds [21]. Neurophysiological processing of emotional voices is atypical among people with ASC [22,23].
Owing to their superior temporal resolution, electroencephalographic event-related brain potentials (ERPs) enable the distinct stages of sensory and attentional processing to be examined. Mismatch negativity (MMN), which is elicited by perceptibly distinct sounds (deviants) in a sequence of repetitive sounds (standards), can be used to investigate the neural representation underlying automatic central auditory perception [24,25]. Compared with standard stimuli, deviant stimuli evoke a more pronounced response at 100 to 250 ms, with maximal amplitudes over frontocentral regions [24]. The amplitude and latency of MMN indicate how effectively sound changes are discriminated from auditory background [26,27,28]. Recent studies have reported that MMN can be used as an index of the salience of emotional voice processing [29,30,31,32].
Previous MMN findings regarding ASC are mixed [33]. When children with ASC were exposed to pitch changes in previous studies, the MMN responses showed earlier peak latencies [34], stronger amplitudes [35], weaker amplitudes [36], or no abnormality [11,37,38]. MMN was preserved when children with ASC attended to stimuli, but decreased in unattending conditions [39]. When presented with frequency deviants in streams of synthesized vowels, children with high-functioning ASC yielded MMN amplitudes comparable with those of controls [10]. MMN was preserved in response to nonspeech sounds, but diminished in response to speech syllables [19]. When elicited by one-word utterances, MMN in response to commanding, sad, and scornful deviants against a neutral standard was diminished in adults with Asperger's syndrome [23], whereas MMN elicited by commanding relative to tender voices in boys with Asperger's syndrome yielded the opposite result [22]. These discrepant findings may be related to population characteristics, stimulus features, and task designs. In particular, the corresponding acoustic parameters have not been controlled to a sufficient degree.
P3a, which follows MMN, is an ERP index of attentional orienting [40]. If deviants are perceptually salient, then an involuntary attention switch is generated to elicit P3a responses [10]. In a previous study, people with ASC exhibited P3a amplitudes similar to those of people with mental retardation and controls when inattentively listening to pure tones [34,35]. Children with ASC exhibited P3a to nonspeech sounds comparable to that of controls [41], but diminished responses to speech sounds [10,11,42]. Impaired attention orienting to speech-sound changes might affect social communication [10]. Thus, ASC appears to involve speech-specific deficits in involuntary attention switching alongside normal orienting to nonspeech sounds.
To quantitatively control physical stimulus complexity, we presented meaningless emotionally spoken syllables, dada, and acoustically matched nonvocal sounds, representing the most and least complex stimuli, respectively, in a passive oddball paradigm, to people with ASC and matched controls. We hypothesized that, if general deficits in auditory processing are present, people with ASC would produce impaired MMN responses to both emotional syllables and nonvocal sounds. If the deficits are selective for voices, emotional syllables rather than nonvocal sounds would diminish MMN responses among people with ASC. If involuntary attention orienting among people with ASC is speech-sound specific, P3a relevant to emotional syllables rather than nonvocal sounds would become atypical. In addition, to examine the relationship between electrophysiological responses and autistic traits, we conducted correlation analyses to determine the extent to which emotional MMN covaried with the Autism Spectrum Quotient (AQ) and receiver operating characteristic (ROC) analyses to evaluate the diagnostic utility of emotional MMN.

Materials and Methods
Participants
Twenty-two people with ASC and 21 matched controls participated in this study. Because of poor electroencephalogram (EEG) qualities, such as excessive eye movements and blink artifacts, 20 people with ASC and 20 controls were included in the data analysis. The participants with ASC, aged between 18 and 29 years (21.5 ± 3.8 y, one female participant), were recruited from a community autism program. We reconfirmed the diagnosis of Asperger's syndrome and high-functioning autism by using Diagnostic and Statistical Manual of Mental Disorders (DSM)-IV diagnostic criteria as well as the Autism Diagnostic Interview-Revised (ADI-R) [43]. The participants in the age-, gender-, intelligence quotient (IQ)-, and handedness-matched control group, aged between 18 and 29 years (22.0 ± 3.7 y, one female participant), were recruited from the local community and screened for major psychiatric illness by conducting structured interviews. The participants did not participate in any intervention or drug programs during the experimental period. Participants with a comorbid psychiatric or medical condition, history of head injury, or genetic disorder associated with autism were excluded. All of the participants exhibited normal peripheral hearing bilaterally (pure tone average thresholds < 15 dB HL) at the time of testing. All of the participants or parents of the participants provided written informed consent for this study, which was approved by the Ethics Committee of Yang-Ming University Hospital and conducted in accordance with the Declaration of Helsinki.

Auditory Stimuli
The stimulus materials were divided into two categories: emotional syllables and acoustically matched nonvocal sounds (Table S1 and Figure S1 in File S1). For emotional syllables, a female speaker from a performing arts school produced the meaningless syllables dada with three sets of emotional (neutral, angry, happy) prosodies. Within each set of emotional syllables, the speaker produced the syllables dada more than ten times (see [29,30,31,32] for validation). Emotional syllables were edited to become equally long (550 ms) and loud (min: 57 dB; max: 62 dB; mean 59 dB) using Sound Forge 9.0 and Cool Edit Pro 2.0. Each set was rated for emotionality on a 5-point Likert-scale. Two emotional syllables that were consistently identified as 'extremely angry' and 'extremely happy' and one neutral syllable rated as the most emotionless were selected as the stimuli. The Likert-scale ratings (mean ± SD) of the angry, happy, and neutral syllables were 4.26 ± 0.85, 4.04 ± 0.91, and 2.47 ± 0.87, respectively.
To create a set of control stimuli that retained acoustic correspondence, we synthesized nonvocal sounds by using Praat [44] and MATLAB (The MathWorks, Inc., Natick, MA, USA). The fundamental frequencies (f0) of the emotional (angry, happy, neutral) syllables were extracted and used to generate sine waveforms, which were then multiplied by the syllable envelope. In this way, nonvocal sounds retained the temporal and spectral features of emotional syllables. All of the stimuli were controlled with respect to their length (550 ms) and loudness (min: 57 dB; max: 62 dB; mean 59 dB).
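The synthesis step described above can be illustrated with a minimal Python sketch. It is not the Praat/MATLAB procedure used in the study; the function name synthesize_nonvocal, the per-sample f0 contour, the envelope, and the 8-kHz sampling rate are all hypothetical choices made for illustration. The sketch only shows the general idea of a sine carrier that follows an f0 contour and is scaled by an amplitude envelope.

```python
import math

def synthesize_nonvocal(f0_hz, envelope, sample_rate):
    """Sine carrier that follows an f0 contour, scaled by an amplitude envelope.

    f0_hz and envelope are per-sample sequences of equal length
    (hypothetical inputs; the study extracted them from recorded syllables).
    """
    if len(f0_hz) != len(envelope):
        raise ValueError("f0 contour and envelope must have the same length")
    samples = []
    phase = 0.0
    for f0, amp in zip(f0_hz, envelope):
        samples.append(amp * math.sin(phase))
        # Accumulate phase so that the instantaneous frequency equals f0.
        phase += 2.0 * math.pi * f0 / sample_rate
    return samples

# Toy input (made up): a 550-ms sound whose pitch rises from 200 to 250 Hz
# under a smooth rise-and-fall envelope.
sample_rate = 8000
n = round(0.550 * sample_rate)
f0 = [200.0 + 50.0 * i / (n - 1) for i in range(n)]
env = [math.sin(math.pi * i / (n - 1)) for i in range(n)]
tone = synthesize_nonvocal(f0, env, sample_rate)
```

Accumulating the phase sample by sample, rather than computing sin(2πf0t) directly, keeps the waveform continuous when f0 changes over time.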

Procedures
Before the EEG recordings were performed, each participant completed a self-administered questionnaire, the AQ, used for assessing autistic traits [45]. During the EEG recordings, participants were required to watch a silent movie with Chinese subtitles while task-irrelevant emotional syllables or nonvocal sounds in oddball sequences were presented. The passive oddball paradigm for emotional syllables involved employing happy and angry syllables as deviants and neutral syllables as standards. The corresponding nonvocal sounds were applied in the same paradigm but were presented as separate blocks. Each stimulus category comprised two blocks, the order of which was counter-balanced and randomized among the participants. Each block consisted of 600 trials, of which 80% were neutral syllables or tones, 10% were angry syllables or tones, and the remaining 10% were happy syllables or tones. The sequences of blocks and stimuli were quasirandomized such that the blocks of an identical stimulus category and the deviant stimuli were not presented successively. The stimulus-onset asynchrony was 1200 ms, including a stimulus length of 550 ms and an interstimulus interval of 650 ms.
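As an illustration of the block structure, the following Python sketch builds one 600-trial sequence with 80% standards and 10% of each deviant. It assumes that "not presented successively" means that no two deviants are adjacent, and it additionally assumes that each block starts with a standard; the function name make_oddball_block and its parameters are hypothetical and do not reflect the MATLAB script used in the study.

```python
import random

def make_oddball_block(n_trials=600, p_deviant_each=0.10, seed=None):
    """Return a list of trial labels in which no two deviants are adjacent."""
    rng = random.Random(seed)
    n_each = round(n_trials * p_deviant_each)
    n_std = n_trials - 2 * n_each
    deviants = ["angry"] * n_each + ["happy"] * n_each
    rng.shuffle(deviants)
    # Each deviant is placed directly after a distinct standard, so every
    # deviant is preceded by a standard and deviants never touch.
    slots = set(rng.sample(range(n_std), len(deviants)))
    remaining = iter(deviants)
    sequence = []
    for i in range(n_std):
        sequence.append("neutral")
        if i in slots:
            sequence.append(next(remaining))
    return sequence

block = make_oddball_block(seed=0)
```

Placing deviants only after standards also guarantees that every deviant has a preceding standard available for the deviant-minus-standard comparison.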

Electroencephalography Apparatus and Recordings
The EEG was continually recorded at 32 scalp sites. Please refer to Supplementary Materials (File S1) for details. The number of accepted standard and deviant trials between groups did not differ significantly irrespective of emotional syllables (ASC - Neutral: 750 ± 149, Happy: 81 ± 15, Angry: 83 ± 11; Controls - Neutral: 746 ± 112, Happy: 85 ± 11, Angry: 83 ± 13) or nonvocal sounds (745 ± 189, 78 ± 15, 76 ± 17; 781 ± 170, 78 ± 11, 80 ± 10). The paradigm was edited using MATLAB. Each event in the paradigm was associated with a digital code that was transmitted to the continual EEG, enabling offline segmentation and averages of selected EEG periods to be obtained for analysis. The ERPs were processed and analyzed using Neuroscan 4.3 (Compumedics Ltd., Australia).
MMN source distributions were qualitatively explored using current source density (CSD) mapping (http://psychophysiology.cpmc.columbia.edu/software/CSDtoolbox/index.html). The CSD method, as a measure of the strength of extracellular current generators underlying the recorded EEG potentials [46], computes the surface Laplacian over the surface potentials implying the dipole sources oriented normal to local skull [31,47].

Statistical Analysis
The MMN and P3a amplitudes were analyzed as an average within a 100-ms time window surrounding the peak latency at the electrode sites Fz, Cz, and Pz, in line with previous studies [31,32,48]. The MMN peak was defined as the most negative deflection in the deviant-minus-standard difference wave during a period of 150 to 250 ms after sound onset. Only the standards before the deviants were included in the analysis. The P3a peak was defined as the most positive deflection during a period of 300 to 450 ms.
Statistical analyses were conducted separately for each category (emotional syllables or nonvocal sounds), using a mixed ANOVA with deviant type (angry, happy) and electrode (Fz, Cz, or Pz) as the within-subject factors and group (ASC vs. control) as the between-subject factor, with additional a priori group by deviant type ANOVA contrasts calculated within each electrode site [49]. The dependent variables were the mean amplitudes and peak latencies of the MMN and P3a components. Cohen's d was calculated to estimate the effect size (i.e., the standardized difference between means). Degrees of freedom were corrected using the Greenhouse-Geisser method. Bonferroni-corrected post hoc tests were conducted only when preceded by significant main effects.
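The peak and mean-amplitude measures can be sketched as follows. This is an illustrative Python example rather than the Neuroscan procedure used in the study; it assumes that the 100-ms window is centered on the peak (50 ms on either side), and the function name mmn_peak_and_mean and the toy waveforms are hypothetical.

```python
import math

def mmn_peak_and_mean(standard, deviant, times_ms,
                      search=(150, 250), half_window=50):
    """Return (peak latency, mean amplitude around the peak) of the MMN.

    The difference wave is deviant minus standard; the peak is the most
    negative sample inside the search window.
    """
    diff = [d - s for d, s in zip(deviant, standard)]
    candidates = [i for i, t in enumerate(times_ms)
                  if search[0] <= t <= search[1]]
    peak_idx = min(candidates, key=lambda i: diff[i])
    peak_t = times_ms[peak_idx]
    # Average the difference wave over the window centered on the peak.
    window = [diff[i] for i, t in enumerate(times_ms)
              if abs(t - peak_t) <= half_window]
    return peak_t, sum(window) / len(window)

# Toy ERPs (made up): a flat standard and a deviant with a negative
# deflection centered at 200 ms, sampled every 10 ms.
times = list(range(0, 500, 10))
standard = [0.0] * len(times)
deviant = [-2.0 * math.exp(-((t - 200) / 40.0) ** 2) for t in times]
peak_latency, mean_amp = mmn_peak_and_mean(standard, deviant, times)
```

A P3a measure would follow the same pattern with the search window set to 300 to 450 ms and the maximum taken in place of the minimum.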
To determine whether electrophysiological responses were associated with the severity of autistic traits, we conducted Pearson correlation analyses between MMN amplitudes and AQ scores. To examine the degree to which the MMN and P3a amplitudes could be used to differentiate between the participants with ASC and the controls, we conducted ROC analyses, which can identify optimal thresholds in diagnostic decision making.
[...] (Figure 1A). A post hoc analysis revealed that angry MMN was stronger than happy MMN among the controls (p < .001), whereas no such difference was observed among the participants with ASC (p = .67).

Demographics and Dispositional Measures
To determine whether the MMN amplitude effects elicited by angry versus happy deviants between subject groups stemmed from differences in acoustic features, an additional MMN analysis was conducted by subtracting the neutral-derived ERP from the angry- and happy-derived ERPs. The participants with ASC exhibited weaker MMN responses to nonvocal sounds than did the controls. Regardless of the group, MMN induced by angry-derived sounds (angry-derived MMN) was stronger than that elicited by happy-derived sounds (happy-derived MMN). Fz and Cz exhibited more negative deflections than did Pz. In addition, an interaction was observed between the deviant type and the electrode site [F(2, 76) = 11.08, p < .001, d = 1.08] (Figure 1B). A post hoc analysis indicated that the topographical distribution of angry-derived MMN yielded the most negative deflections at Fz and the least negative deflections at Pz. The happy-derived MMN exhibited no differential topography. Unlike emotional syllables, no interaction between the deviant type and the group was observed among nonvocal sounds (p = .65).
The ANOVA on the peak latency of MMN revealed that, regardless of the group, MMN in response to angry deviants peaked significantly later than did MMN in response to happy [...] P3a in response to angry syllables (angry P3a) yielded stronger amplitudes than did P3a in response to happy syllables (happy P3a). Fz exhibited more positive deflections than did Cz and Pz. In addition, an interaction was observed among the group, deviant type, and electrode site [F(2, 76) = 3.66, p = .029, d = 0.62]. A post hoc analysis revealed that angry P3a produced an interaction between the group and the electrode site [F(2, 76) = 3.89, p = .025, d = 0.64], but happy P3a did not (p = .96). People with ASC exhibited weaker angry P3a amplitudes than did the controls at Fz (p = .009). Figure 2 illustrates the ERP waveforms for standard and deviant responses.
Current Source Density Analyses. The scalp topographies for absolute voltages of MMN for emotional syllables and nonvocal sounds in both groups were consistent with the MMN amplitude results (Figure 3A). The exploratory source distribution analyses based on CSDs indicated that MMN received a major contribution from the auditory cortex (Figure 3B). In the ASC group, there was a trend toward an additional posterior temporal source.

Correlation Among Mismatch Negativity and Autistic Traits. When the two groups were combined, lower amplitudes of angry MMN at Fz were coupled with higher total scores on the AQ [r(36) = 0.36, p = .03, d = 0.77] (Figure 4). However, such a correlation was not observed in either the ASC group or the control group. MMN induced by nonvocal sounds did not exhibit any correlation. Also, there was no age-related correlation.

Relationship Between Sensitivity and Specificity for Angry Mismatch Negativity. The area under the ROC curve (AUC) is indicative of the overall accuracy of the measurement, representing the probability that a randomly selected "true-positive" person scores higher according to the measure than a randomly selected "true-negative" person does. Separate ROC analyses for comparing the ASC participants with the controls were conducted for angry MMN, happy MMN, angry-derived MMN, and happy-derived MMN. When determining optimal thresholds, we used Youden's index. This value corresponds with the point on the ROC curve farthest from the diagonal line. The diagonal line (along which sensitivity equals 1 - specificity, giving an AUC of 0.5) represents performance no better than chance. The ROC analysis of angry MMN yielded an AUC value of 0.88 (p < .001) (Figure 5). According to Youden's index, the most appropriate cutoff point for angry MMN amplitudes, exhibiting a sensitivity of 95% and a specificity of 50%, was −2.34 µV. By contrast, the AUC values of happy MMN, angry-derived MMN, and happy-derived MMN were not significant (p = .63, p = .14, and p = .17, respectively).
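The two quantities used here can be made concrete with a short Python sketch. It computes the AUC directly from its probabilistic definition (the proportion of positive-negative pairs in which the positive case scores higher, counting ties as one half) and selects the cutoff that maximizes Youden's index, J = sensitivity + specificity - 1. The function names and the amplitude values are made up for illustration and are not the study data; the sketch assumes that less negative MMN amplitudes indicate ASC.

```python
def auc(pos, neg):
    """Probability that a random positive case scores higher than a random
    negative case (ties count as one half)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def youden_cutoff(pos, neg):
    """Return (J, cutoff, sensitivity, specificity) for the cutoff that
    maximizes Youden's index; a case is called positive if score >= cutoff."""
    best = None
    for c in sorted(set(pos) | set(neg)):
        sens = sum(p >= c for p in pos) / len(pos)
        spec = sum(n < c for n in neg) / len(neg)
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best

# Hypothetical MMN amplitudes in microvolts (not the study data).
asc_amp = [-0.5, -1.0, -1.5, -2.0, -0.8]
ctrl_amp = [-2.5, -3.0, -1.2, -3.5, -2.8]
area = auc(asc_amp, ctrl_amp)
j, cutoff, sens, spec = youden_cutoff(asc_amp, ctrl_amp)
```

Because Youden's index weights sensitivity and specificity equally, a different cutoff may be preferable when false positives and false negatives carry unequal costs.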

Discussion
This study investigated whether people with ASC exhibit selective deficits during emotional voice processing. The results indicated that people with ASC failed to exhibit differentiation between angry MMN and happy MMN. By contrast, in response to acoustically matched nonvocal sounds, people with ASC differentiated angry-derived MMN from happy-derived MMN to a low degree. P3a specific to emotional voices was reduced in people with ASC, indicating atypical involuntary attention switching. The significant correlation between the MMN amplitudes elicited by angry syllables and the total scores on the AQ indicated that angry MMN amplitudes were associated with autistic traits. ROC analyses revealed that angry MMN amplitudes yielded an AUC value of 0.88 (p < .001) for diagnosing ASC.
People with ASC failed to exhibit negativity bias in response to emotional voices. In a previous study involving the same paradigm, we determined that negativity bias to affective voice emerges early in life [30]. Angry prosodies elicited a more negative-going ERP and stronger activation in the temporal voice area than did happy or neutral prosodies among infants [50]. Angry and fearful syllables evoked greater MMN than did happy or neutral syllables among adults and infants [30,51]. A recent visual MMN study determined that an early difference occurred during 70 ms to 120 ms after stimulus onset for only fearful deviants under unattended conditions [52]. From an evolutionary perspective, threat-related emotion processing (e.g., anger and fear) is particularly strong and indicates independence of attention [53]. Negativity bias in affective processing occurs as early as evaluative categorization into valence classes does [54]. In this study, the stronger amplitudes observed in angry MMN compared with happy MMN among the controls were obscured among the people with ASC.
The human voice not only contains speech information but can also carry a speaker's identity and emotional state [55]. One MMN study determined that the MMN amplitudes were higher in response to intensity change in vocal sounds than in response to intensity change in corresponding nonvocal sounds. Although vocal intensity deviants may call for sensory and attentional resources regardless of whether they are loud or soft, comparable resources are recruited for nonvocal intensity deviants only when they are loud [56]. Thus, emotional syllables are considered to be more complex than nonvocal sounds and beyond low-level acoustic features [29,30,31,32]. Because emotional MMN, instead of MMN to the corresponding nonvocal sounds, exhibited a correlation with autistic traits and a positive predictive value for ASC, we speculated that social impairments in people with ASC cannot be ascribed completely to low-level sensory deficits. In addition to lacking differentiation between angry and happy MMN, people with ASC exhibited reduced MMN in response to nonvocal sounds. The discrepancy between the results of this study and those of previous reports may be reflective of the heterogeneous characteristics of clinical participants, auditory stimuli, and task design [11,34,35]. For example, people with low-functioning autism might exhibit different MMN from those with high-functioning autism [35]. In one MMN study, basic acoustic features in the stimuli, specifically, emotionally neutral standards and emotionally laden deviants, were not controlled [23]. Furthermore, using one-word utterances or vowels as the auditory stimuli might cause variable familiarity or meaning, thus exerting potentially confounding effects on MMN responses [10,22].
Involuntary attention orienting to emotional voices was atypical in people with ASC, as indicated by diminished P3a amplitudes to angry syllables. P3a is reflective of the involuntary capture of attention to salient environmental events [57]. In a previous study, vowels, compared with corresponding nonvocal sounds, produced stronger P3a [10]. The attention-eliciting effect may be particularly pronounced when threat-related social information is involved [58]. We detected P3a for only emotional syllables, not for acoustically matched nonvocal sounds. Consistent with the results of previous studies [10,59,60,61], our results indicated weaker P3a to emotional syllables among people with ASC compared with controls, suggesting that attention orienting in people with ASC is more selectively impaired to social stimuli than to physical stimuli.
Consistent with previous MMN studies [31,62], our exploratory CSD analyses suggested that the major contribution to deviance-standard difference responses comes from the bilateral auditory cortex. Furthermore, a slight trend toward posterior enhancement observed in ASC for angry and angry-derived deviants could possibly reflect an additional posterior temporal source. The posterior lateral non-primary auditory cortex could be sensitive to emotional voices, as indicated by functional neuroimaging [63]. However, given the known inaccuracies of EEG source localization, these CSD findings need to be confirmed with more accurate source approaches. ROC analyses revealed that the amplitudes of angry MMN yielded a sensitivity of 95% and a specificity of 50% for diagnosing ASC. Strong amplitudes of angry MMN were coupled with low total scores on the AQ when the ASC and control groups were combined. MMN changes can be reliably observed in people with autism [34,64]. The AQ is a valuable instrument for rapidly determining where any given person is situated on the continuum from autism to normality [44]. AQ scores were determined to be associated with the ability to recognize the mental states of others according to voices and eyes [65]. Thus, emotional MMN, particularly in response to angry syllables, is potentially useful as a neural marker for diagnosing autism.
Two limitations of this study must be acknowledged. First, regarding sample homogeneity, the generalizability of the results may be limited because people with low-functioning autism were not included. Second, the absence of stimuli with a quantitatively controlled function related to physical stimulus complexity, for instance, pure tones spectrally matching the fundamental frequency envelope of emotional syllables [29,30,31,32], may limit the selectivity of emotional MMN. The present design may therefore not be optimal. Future studies that recruit people with severe autism, use a larger sample size, and include stimuli with greater acoustic correspondence are warranted.

Conclusions
This study revealed that ASC involves general impairments in affective voice discrimination as well as low-level acoustic distinction. In addition to reduced amplitudes of MMN in response to acoustically matched nonvocal sounds, people with ASC failed to differentiate between angry and happy syllables. Weak amplitudes of angry MMN were coupled with severe autistic traits. The ROC analysis revealed that the amplitude of angry MMN is suitable for predicting whether a person has a clinical diagnosis of ASC. The ability to determine the likelihood of an infant developing autism by using simple neurobiological measures would constitute a critical scientific breakthrough [66]. Considering the advantages of clinical population assessment [67] and the presence of emotional mismatch response in the human neonatal brain [30], future studies must examine the ability of emotional MMN to facilitate the early diagnosis of infants at risk for ASC.

Supporting Information
File S1 Electroencephalography apparatus and recordings, Figure S1, and Tables S1-S3. Figure S1. Acoustic properties of stimulus materials. Table S1. Physical and acoustic properties for the stimuli. Table S2. Mean amplitudes and peak latencies of MMN to emotional syllables and nonvocal sounds within a time window of 150 to 250 ms at predefined electrodes in each group (Mean ± SEM). Table S3. Mean amplitudes of P3a to emotional syllables within a time window of 300 to 450 ms at predefined electrodes in each group (Mean ± SEM). (DOC)