
What Do You Mean by That?! An Electrophysiological Study of Emotional and Attitudinal Prosody

Abstract

The use of prosody during verbal communication is pervasive in everyday language and whilst there is a wealth of research examining the prosodic processing of emotional information, much less is known about the prosodic processing of attitudinal information. The current study investigated the online neural processes underlying the prosodic processing of non-verbal emotional and attitudinal components of speech via the analysis of event-related brain potentials related to the processing of anger and sarcasm. To examine these, sentences with prosodic expectancy violations created by cross-splicing a prosodically neutral head (‘he has’) and a prosodically neutral, angry, or sarcastic ending (e.g., ‘a serious face’) were used. Task demands were also manipulated, with participants in one experiment performing prosodic classification and participants in another performing probe-verification. Overall, whilst minor differences were found across the tasks, the results suggest that angry and sarcastic prosodic expectancy violations follow a similar processing time-course underpinned by similar neural resources.

Introduction

The ability to evaluate and integrate verbal and non-verbal speech cues is integral to successful human communication. Moreover, non-verbal speech cues serve many functions, including communicating the intention, behaviour, and affective state of a speaker, as well as carrying linguistic and syntactic information (e.g., [1]). While these roles overlap in particular instances, different types of prosody can be conceptualised as functionally distinct and have been more formally classified in relation to emotional, attitudinal, linguistic, and inarticulate characteristics [2].

Affective prosody is comprised of both emotional and attitudinal components, although research has tended to focus on the former [2]. Emotional prosody relates to the expression of emotion by tone of voice and typically conveys a speaker’s emotional state (e.g., anger, happiness–‘I am happy’). In contrast, attitudinal prosody (e.g., [3]) relates to the communication of information associated with a speaker’s attitude that may alter the meaning of an utterance. In general, it has been conceptualized with respect to a speaker’s behavior (e.g., ‘I am being sarcastic’).

Given the limited number of prosodic elements (i.e., pitch, duration, rhythm etc.), emotional and attitudinal prosody rely on overlapping acoustic cues [4]. Although modulation of pitch is important for the expression of both types of prosody, emotional expressions rely more heavily on particular parameters such as voice quality (i.e., shrillness, harshness, softness etc.), whereas prosodic contour (i.e., pitch and rhythm modulation) is vital to attitudinal expressions [5]. Importantly, people can distinguish between different types of attitudes and emotions based on these prosodic characteristics [6–8]. In addition, the extent to which different cues contribute to the recognition of different types of expressions is likely to differ, and the level of consensus as to which cues are vital for the identification of expressions is dependent on the attitudes and/or emotions examined. For example, in relation to the expression of anger in English, there is consistency across studies in terms of the pitch speakers use, with almost all suggesting that angry sentences are conveyed with a higher pitch than neutral ones (e.g., [9–12]). Alternatively, there is less agreement about which prosodic cues of verbal sarcasm consistently differ from those of neutral sentences. For example, Rockwell [13] provided evidence to suggest that sarcastic speech uses a slower tempo, lower pitch, and greater intensity than speech with neutral valence. Bryant and Fox Tree [14], alternatively, found that sarcasm uses a higher pitch and that the duration of their neutral and sarcastic utterances did not significantly differ. Moreover, the authors suggested that amplitude variability and global processing might be important. Apart from sentences with some form of non-neutral valence, speech of a neutral valence may also differ across language dialects, thus making comparisons between neutral and non-neutral speech even more complicated. For example, younger speakers of New Zealand and Australian English often significantly raise the pitch of neutral sentences towards their end [15], and even then display variability. As such, the overall pitch and pitch variability of neutral sentences would increase compared to other English dialects and thus dilute any differences with other types of prosody on these measures.

The Neural Basis of Affective and Attitudinal Processing

In general, research has supported a right hemisphere dominant process for prosodic processing, in contrast to the verbal-linguistic dominant functionality of the left hemisphere (e.g., [2]). More specifically, the right hemisphere specialization hypothesis is typically in reference to emotional prosody, whereby converging evidence from lesion studies has revealed degradation of emotional prosody comprehension in patients with right hemisphere damage [16]. Alternatively, neuroimaging data suggests that both the left and right hemispheres are important for processing emotional prosody, and that the left hemisphere may be more involved with increased task demands [17,18].

One explanation of these findings is via the cue-dependent hypothesis, which suggests that the lateralization of prosodic processing is dependent on the acoustic cues that are critical for the extraction of meaning and that these cues are processed in different anatomical areas of the brain [19]. More precisely, it has been suggested that the two hemispheres process information on different temporal scales, with the left hemisphere using a finer temporal scale than the right hemisphere [20]. Accordingly, the left hemisphere should be more efficient in the analysis of rapidly changing information in speech, such as durational cues. Alternatively, the right hemisphere, which functions with a lower temporal resolution, should be more efficient at processing information related to pitch [21]. Based on these differences, emotional expressions, which tend to involve the modulation of pitch [22], are more typically associated with acoustic cues processed in the right hemisphere.

Consistent with the cue-dependent hypothesis, Schirmer and Kotz [23] propose a three-stage model of vocal emotional processing. In the model, the role of the auditory cortex is initially implicated in the modulation of early sensory acoustic information for which the right hemisphere is dominant. Subsequently, the emotional significance of vocalizations is processed with respect to the different patterns of emotionally salient acoustic cues, proposed to occur within the anterior superior temporal sulcus (STS) for which laterality is congruent with the cue-dependent hypothesis. Emotional information is then available for higher order cognitive processes, such as evaluative judgments, which are subsequently mediated by the right inferior and orbitofrontal cortex.

The first two stages of the Schirmer and Kotz model have also been suggested to be those used in attitudinal processing by Mitchell and Ross [2]. However, after a comprehensive review of neuroimaging and lesion data, Mitchell and Ross suggested that attitudinal processing deviates from simple emotional processing (e.g., anger) in the third-stage activation of frontal regions. In particular, they suggested that the first two stages serve the simple decoding of auditory cues and use the same neural processes, irrespective of the actual type of prosody being processed. Alternatively, with the third stage, they suggested that attitudes are likely to be similar to complex emotions and require significantly higher levels of processing than simple emotions. They also noted that, empirically, the prosody of complex emotions had already been shown to activate medial prefrontal areas more than simple emotions, suggesting that a number of additional types of processing are required. Examples given by the authors of instances where attitudes would demand a higher level of processing than simple emotions included judging the authenticity of an utterance, processing to do with theory of mind, and interpreting social intentions.

The Present Study

Event related potentials (ERPs) have been extensively used to examine the online temporal patterns and neural correlates associated with various types of sentence level prosodic processing (e.g., [24–31]). A number of different paradigms have been used and a large number of components have been found to be associated with emotional processing and, to a lesser extent, attitudinal processing. A summary of the paradigms and results found from some of these studies appears in Table 1. As can be seen, some of the results appear to be quite different from each other. In particular, the results of Liu et al. [32] and Paulmann and Kotz [33] are entirely opposite. Similarly, the cortical distribution of effects found in Kotz and Paulmann [27] and Chen et al. [25] differ despite the use of the same paradigm. One explanation of such differences may be that some components are relatively sensitive to task/stimuli differences. For example, in the study of Paulmann and Kotz, the prosody of words in sentences was examined, whereas Liu et al. used affectively valenced vocalizations (e.g., sighing). Likewise, in the study of Kotz and Paulmann, German stimuli were used whereas Chen et al. used Mandarin ones, and because Mandarin is a tonal language and German is not, the way different areas of the brain are recruited due to tone/intonation interactions may differ (e.g., [34]). Additionally, Chen et al. noted that there are important differences in the acoustic characteristics Chinese speakers use to recognize emotion compared to English speakers.

Table 1. Overview of ERP studies on Emotional and Attitudinal Prosody.

https://doi.org/10.1371/journal.pone.0132947.t001

Alternatively, another possibility for the conflicting results is that due to the small number of studies examining these effects, more minor differences may simply be due to unfortunate statistical anomalies. These may have occurred in part because many of the studies have used relatively low numbers of items per cell for ERP studies (e.g., [35]), which is likely a consequence of the time it takes to present stimuli. For example, some studies have used 30 items per cell [33] and around 50 is common (e.g., [29]). Given this, the extent to which different effects can be replicated and under what conditions are important considerations.

The present study investigates the time-course underlying the on-line processing of emotional and attitudinal prosody using ERPs to expand the understanding of how prosodic information is processed. As such, a primary aim of the study is to bridge the gap in the literature pertaining to processing differences between emotional and attitudinal prosody. In terms of emotional prosody, ERP effects have demonstrated consistency across a range of emotions [28], with anger being the emotion most commonly used [25,27,29]. Anger also has well known acoustic properties [11,25,36]. By adopting anger as the representative of emotional prosody, the current study will allow comparison to previous research. In terms of attitudinal prosody, sarcasm will be employed even though other types of attitudinal prosody have been previously investigated (general irony [37]; sincerity [38]). Sarcasm was chosen primarily because, like anger, it contains a negative aspect [39], meaning that both types of stimuli evoke negative valence. Importantly, however, the perceived impact and valence of sarcasm can differ from anger in certain contexts (e.g., [40,41]), such as when it is used in humor. This may either increase the negative aspect to the victim (e.g., further humiliation which may be amusing to others who are not the victim), or lessen the overall negative aspect when it is done as a form of politeness (e.g., making fun of an accidental faux pas). Thus, whilst anger and sarcasm may share some negative aspects, this similarity can often be context dependent. As such, matching on negative valence has its limits. In addition, as the current study employs no situational context, the interpretations of the expressions are likely to be more dependent on the individual’s perception of the acoustic parameters. As a result, there is a trade-off in terms of external validity. Another key problem with examining sarcasm is that it is less clear what its acoustic correlates are (see e.g., [42–44] and further discussion above) and that some forms may require the combination of prosodic and pragmatic/contextual features [3,45]. Neither of these problems is insurmountable since stimuli can simply be generated a priori so that they are clearly recognised as sarcastic, even if this translates to a limited range of sarcastic expression.

The paradigm adopted will be the prosodic expectancy violation paradigm [25,27,29]. Stimuli consist of cross-spliced/merged syntactically matching and semantically neutral sentences with a prosodically neutral head (‘he has’/’she has’) and a neutral, angry, or sarcastic prosodic ending (e.g., ‘a serious face’). The basic premise is that upon participants hearing the prosodically neutral head, an expectation is generated that the same prosody will continue as the sentence unfolds [27]. A violation of this expectancy will subsequently occur when the cross-spliced emotional/attitudinal prosodic endings are presented. Alternatively, sentences with a neutral prosodic ending do not cause a violation of expectancy and therefore constitute the control condition. Similar to Kotz and Paulmann [27] and Chen et al. [25], the current study also manipulates task demands by introducing two variations of the same experimental design: a prosodic-categorization task and a probe-verification task (Experiments one and two, respectively). The purpose of this manipulation is to influence the extent to which attention is directed towards prosodic processing, thereby permitting a means to assess and compare the automaticity of processing and the effects of task demands on the two types of prosody.
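To make the cross-splicing logic concrete, the following is a minimal illustrative sketch in Python; the actual stimuli were recorded and edited with audio-production software (see the Method section), and the file names below are hypothetical, serving only to show how a neutral head and a separately recorded prosodic ending could be joined.

```python
# Illustrative sketch only: joins a prosodically neutral head recording with a
# separately recorded body. File names are hypothetical; the study's stimuli
# were edited in Pro Tools (see the Method section).
import numpy as np
from scipy.io import wavfile

def cross_splice(head_path, body_path, out_path):
    fs_head, head = wavfile.read(head_path)
    fs_body, body = wavfile.read(body_path)
    assert fs_head == fs_body, "head and body must share a sampling rate"
    wavfile.write(out_path, fs_head, np.concatenate([head, body]))

# A neutral head spliced onto an angry body yields a prosodic expectancy violation.
cross_splice("head_neutral_she_has.wav", "body_angry_a_serious_face.wav",
             "violation_angry_001.wav")
```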

Cross-spliced sentences were used in this study rather than simple sentences as we were interested in examining the extent to which angry and sarcastic prosody differ in terms of their temporal processing. In particular, one manner that emotional stimuli might be prioritized for more immediate processing is via their acoustic differences compared to normal speech. In turn, this could lead such utterances to differ from prosodically attitudinal ones in two important ways. First, as noted above, the acoustic correlates of sarcasm may be more variable than anger, and this may result in a slower detection of prosodic changes than with anger. Second, the cues may be potentially less emotionally relevant than anger–that is, their functional context may differ [27] because the reception of anger may signal the need for an immediate response due to it being more commonly associated with adaptive factors such as violence compared to sarcasm; the latter of which may reflect more subtle higher order social functionality. Thus, it may be more adaptive for people to be attuned to prosodic cues associated with anger than sarcasm. Obviously, cross-spliced sentences are more artificial than naturally occurring sentences, and important prosodic information from the head of sentences [46] may be lost. However, the sentences used in the current study had a neutral head and then a potentially sarcastic or angry body which is a type of change that is relatively common in written form where the speech of a character is in quotes or implied, albeit in a different voice.

There were three main aims of this research. First, to replicate previous findings relating to the temporal processing of emotional prosody in the context of the prosodic expectancy paradigm [25,2729]. Based on the results of Chen et al. [25], it is expected that differences between angry and neutral stimuli may modulate a very early negative component in the 100-200ms time window. It is also expected that emotional violations will elicit a significant positive deflection in the 200-350ms time window resembling a prosodic expectancy positivity (PEP effect) [25,2729]. Finally, when attention is directed towards prosodic processing, a late positive complex may be seen for emotional prosody that is indicative of online integration processes [25].

Second, it is less clear what components are likely to be associated with sarcastic processing. Given that the time-course of sarcastic prosody has not yet been examined within this paradigm, only tentative suggestions can be made. The most likely finding is that prosodic violations may deviate from matching neutral sentences due to early sensory change-detection processes (i.e., an early negative effect). In terms of later components, as attitudes differ from emotions [2], possessing less salient acoustic cue configurations and reflecting higher socially derived affectively charged dispositions, attitudinal violations may be processed later in time compared to emotional violations and hence cause a later PEP effect compared to emotional prosody. This expectation is derived in part from the finding that emotional prosodic violations were found to generate earlier PEP effects compared to linguistic prosodic violations [29].

Finally, the extent to which automaticity and task demands affect processing will be examined. For emotional prosody, previous research has shown that the presence of prosodic processing and the temporal PEP effect pattern is relatively independent of task demands [27]. Alternatively, attitudinal prosody is argued to demand a level of selective attention [3], and thus if automaticity effects emerge across the two experiments, the difference in the amplitude of the PEP effect is likely to be greater for attitudinal compared to emotional prosody.

Experiment One

In this experiment, the prosodic expectancy paradigm was used to examine the processing of anger and sarcasm when the task forced participants to process the prosody of the sentences.

Method

Participants.

Fifteen healthy individuals (9 males, 6 females), aged 18–25 years (M = 22.13; SD = 1.77), participated in Experiment one. All were native Australian English speakers with normal or corrected-to-normal vision, without any known psychiatric or neuropsychological disorder, and were not currently taking any psychiatric drugs or substances, as provided by self-report. All participants provided informed written consent and were compensated for their participation. This experiment was approved by the ethics committee of Swinburne University of Technology.

Stimulus Material.

In accordance with the prosodic expectancy paradigm used in Kotz and Paulmann [27], sentences were created comprising a common head structure (‘he/she has’) and a body structure (e.g., ‘a serious face’). The body structures were generated from 150 syntactically matching and semantically neutral sentences comprised of three words beginning with either an ‘a’ or ‘an’ determiner. To create the critical experimental manipulations, each sentence body was recorded in three different intonations: i) emotional (angry tone), ii) attitudinal (sarcastic tone), and iii) neutral. In addition, 100 filler and 10 practice sentence bodies of a similar syntax and length were created and recorded. The filler sentences all had a neutral tone.

All stimuli were produced by the first author (a native Australian English speaker who spoke with an accent from the same region as the participants) and recorded in a recording studio using a Rode NT2-A microphone with a pop filter at a sampling rate of 48 kHz. The audio recordings were recorded and edited using Pro Tools 9 software. Importantly, given that the stimuli were not accompanied by contextual cues to facilitate interpretation, only those stimuli that were determined a priori by the authors and a number of collaborators to be easy to prosodically categorize were used. This included 9 observers who categorized the final set of stimuli. They showed accuracy rates of 99.1%, 98.4%, and 98.0% on average when classifying the angry, sarcastic, and neutral stimuli respectively, and 98.7% on the fillers. Unfortunately, due to a computer problem, we did not collect this data from the actual participants, although given all of the observers were at ceiling it would be very unlikely that the participants in the EEG study would have had high error rates.

Final cross-spliced versions of the sentences were created by pseudo-randomly assigning a head structure to each sentence body. This resulted in three types of stimuli: i) prosodically neutral matching target sentences, ii) prosodically mismatching target sentences (angry or sarcastic prosodic violations), and iii) prosodically neutral filler sentences. In total, there were 450 critical stimuli sentences (150 for each of the three prosodic categories used), which were evenly split into three counterbalanced groups, with each group also having the 100 fillers and 10 practice sentences. The fillers and practice sentences were structured identically to the critical sentences, that is, a head was attached to a body structure. Thus, the most common type of sentence in each group had a prosodically neutral body (150 sentences vs. 50 with sarcasm and 50 with anger). To examine the acoustic characteristics of the sentences that were generated, the materials were acoustically analysed using the Praat [47] and MATLAB 2014a software for fundamental frequency (F0), formants F1 and F2, and intensity (dB). Apart from overall duration, this was done by dividing the sentence bodies into 50ms bins and taking the average of these values for the first 950ms. This latter value was used because all of the sentence bodies were over this duration. The results appear in Fig 1.
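As a rough illustration of the binning step (the original analysis was carried out in Praat and MATLAB and also covered intensity and formants), the sketch below extracts the F0 track of a sentence body with the praat-parselmouth package and averages it within 50ms bins over the first 950ms; the approach, package, and file handling here are an assumption rather than the authors' actual scripts.

```python
# Sketch of the 50ms-bin analysis for F0 only, assuming the praat-parselmouth
# package. The study's analysis used Praat and MATLAB and also covered
# intensity and formants F1/F2.
import numpy as np
import parselmouth

BIN_MS, MAX_MS = 50, 950          # 50ms bins over the first 950ms of each body

def binned_f0(wav_path):
    snd = parselmouth.Sound(wav_path)
    pitch = snd.to_pitch(time_step=0.005)        # 5ms pitch frames
    times_ms = pitch.xs() * 1000.0               # frame centres in ms
    f0 = pitch.selected_array['frequency']       # Hz; 0 where unvoiced
    means = []
    for start in range(0, MAX_MS, BIN_MS):
        in_bin = (times_ms >= start) & (times_ms < start + BIN_MS) & (f0 > 0)
        means.append(f0[in_bin].mean() if in_bin.any() else np.nan)
    return np.array(means)                       # 19 bin means per sentence body
```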

Fig 1. Mean fundamental frequency F0 (Hz), Loudness (dB), Formant F1 (Hz), and Formant F2 (Hz) values for the first 950ms of the angry, sarcastic, and neutral sentences in 50ms bins, starting from 0-50ms.

Error bars are +/- 2 Standard Errors. The asterisks under the graphs represent whether there was a significant difference between the conditions at each bin at an uncorrected p < .05 level.

https://doi.org/10.1371/journal.pone.0132947.g001

In terms of the overall duration, although a significant difference was found (F(1.81, 283.63) = 60.71, p < .001, partial η2 = .28), the absolute mean durations were relatively similar (in seconds, Angry: 1.53; Sarcastic: 1.60; Neutral: 1.53). The results from the analysis of the bins, alternatively, showed that there were large differences on the measures examined, and that these occurred from the beginning of the sentence bodies. Most notably, anger had a higher pitch than the neutral and sarcastic stimuli, apart from at late time intervals where the pitch of the neutral utterances increased, and this result was especially strong (for example, the first 10 time bins showed p values less than 10^-8 in both comparisons). This rise towards the end of neutral sentences is a common pattern in the speech of younger Australians in the geographical region where the study was run. Sarcasm also displayed a higher pitch than the neutral utterances, although the difference was not as large as for anger. Apart from differences in pitch, the angry speech also displayed heightened intensity, which is consistent with other English studies [42,44], although the differences were relatively small (albeit significant), at least in part because a deliberate effort was made during recording to keep this measure similar across stimulus types. There were also clear differences between the different sentences at early time windows on formants F1 and F2. Taken together, all of the measures suggest that, with our stimuli, anger, sarcasm, and neutral intonations are clearly prosodically different from each other on easily quantifiable measures from early time points.
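The per-bin comparisons marked in Fig 1 could be sketched as follows, assuming arrays of bin means (items × bins) for two conditions such as those returned by a binning function like the one above; items are treated as paired on the assumption that each sentence body was recorded in all three intonations, and the tests are left uncorrected to match the p < .05 criterion used in the figure.

```python
# Sketch of uncorrected per-bin comparisons between two prosodic conditions,
# assuming cond_a and cond_b are (items x bins) arrays of bin means and that
# items are paired across conditions (each body was recorded in all intonations).
import numpy as np
from scipy import stats

def per_bin_flags(cond_a, cond_b, alpha=0.05):
    flags = []
    for b in range(cond_a.shape[1]):
        t, p = stats.ttest_rel(cond_a[:, b], cond_b[:, b], nan_policy='omit')
        flags.append(p < alpha)
    return np.array(flags)        # True where a bin differs at uncorrected p < .05

# e.g., significant_bins = per_bin_flags(f0_angry, f0_neutral)
```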

In addition to the critical stimuli, the filler sentences were compared with the neutral critical sentences using the same four measures. The results showed that there were only minor differences in the values at the different time bins across the four measures, with 17 values under a p < .05 criterion and 8 under a p < .01 criterion. The pattern across the measures also appeared very similar and the classification of the stimuli as neutral in our validation phase was at ceiling. It therefore seems reasonable to suggest that, for all intents and purposes, the filler sentences worked as sentences with neutral valence.

Stimulus Presentation.

E-Prime software version 2.0 (Psychology Software Tools, Inc., Pittsburgh, PA) was used for stimulus presentation via a Compumedics NeuroScan Stim AUDIO system using Verbatim headphones. Participants were systematically allocated to one of the three counterbalanced groups. The stimuli were presented in a pseudorandom order. Stimulus triggers were programmed such that both head and body structure were captured for later processing, although only results from the triggers aligned with the body structure are reported in subsequent sections (i.e., the onset of each epoch was aligned with the start of the body and not the head structure). In terms of actual stimulus presentation (see Fig 2), a fixation cross was located centrally before and during sentence presentation. At the termination of each sentence, the fixation cross was replaced with a question mark and only then were participants able to make a response. Following the button-triggered response, the fixation cross was displayed again. An inter-stimulus delay of 1000ms with a blank screen was present between trials.

Procedure.

Participants were seated comfortably, given the study information, and asked to read and sign the informed consent form. Before the study commenced, participants were asked to read an initial instruction screen carefully before continuing. Participants were then instructed to pay attention to the speaker’s tone of voice and indicate whether the tone was angry, sarcastic, or neutral by pressing one of three possible keys. Participants were also told that breaks could be taken, if necessary, at any point where a response was required. In addition, participants were informed about the sensitivity of the EEG to artifacts such as eye movement and muscle activity and were therefore asked to remain as still and relaxed as possible, to keep their gaze on the fixation point, and to refrain from excessive blinking and eye and muscular movements.

EEG Recording and Pre-processing.

The electroencephalogram (EEG) was obtained with a NeuroScan System SynAmps RT amplifier on a Dell Optiplex 780 computer system, with recording taking place in an electrically shielded room. The EEG was recorded with 64 Ag–AgCl electrodes mounted in an elastic cap (International 10/10 System, NeuroCap) sampling at 500Hz. EEG data was referenced online to the FCz electrode while AFz served as the ground. The EEG signal was recorded online with NeuroScan 4.3 software. The impedance for all electrode sites was kept below 10kΩ.

The data was pre-processed and analyzed offline using EEGLab software [48] in MATLAB (R2010b, 7.11). Initially, epochs of the data for target body structure stimuli were extracted and a band pass filter was set offline between 1 and 35Hz. The data was then decomposed by Independent Component Analysis (ICA, runica algorithm). Subsequently, components identified by the ICA were studied to remove eye movement artifacts, blinks, cardiac rhythm, and any notable muscular or movement artifacts from the signal. The trace was then visually inspected to reject noisy trials and any trials containing notable artifacts. Across conditions, an average of 9.3% of trials from each participant were rejected by this procedure. EEG epochs (-200–1000ms) were time-locked to the onset of the body structure, baseline corrected (-200–0ms) and re-referenced offline to a common average reference.
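For readers who want a concrete picture of this pipeline, the sketch below reproduces the same steps (1-35Hz band-pass filtering, ICA-based artifact removal, epoching from -200 to 1000ms around body onset, baseline correction, and re-referencing to the common average) using MNE-Python rather than the EEGLAB/MATLAB tools actually used; the file name, trigger code, and excluded components are hypothetical.

```python
# Sketch of the reported pre-processing steps in MNE-Python. The original
# analysis used EEGLAB/MATLAB; the file name, trigger code, and the ICA
# components marked for exclusion below are hypothetical.
import mne

raw = mne.io.read_raw_cnt("participant01.cnt", preload=True)   # NeuroScan recording
raw.filter(l_freq=1.0, h_freq=35.0)                             # 1-35Hz band-pass

ica = mne.preprocessing.ICA(n_components=20, random_state=0)
ica.fit(raw)
ica.exclude = [0, 1]            # components judged to reflect blinks/eye movements
ica.apply(raw)

events, _ = mne.events_from_annotations(raw)
BODY_ONSET = 2                  # hypothetical trigger code for body-structure onset
epochs = mne.Epochs(raw, events, event_id={"body": BODY_ONSET},
                    tmin=-0.2, tmax=1.0, baseline=(-0.2, 0.0), preload=True)
epochs.set_eeg_reference("average")                             # common average reference
```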

Data Analysis.

Based on previous research [25,27,28,49], the following time ranges were chosen to examine the various ERP components: 100–200ms, 200–350ms, and 450–700ms. Previous research also demonstrates anterior dominant effects for the early time windows (100-200ms and 200-350ms) [25,27,28,31,32,49], and this was corroborated by visual inspection of the effects in the current data. Consequently, the right and left anterior quadrants were statistically examined, as were the fronto-central electrodes above the midline (see Fig 3). Alternatively, late positive effects (>450ms) typically reveal a centro-parietal maximal distribution [25,38,49,50] and hence the left and right centro-parietal quadrants were examined in the 450-700ms time-window (see Fig 3). To examine these, mean amplitudes were calculated across prosodic condition (neutral, attitudinal, emotional) and region (left, centre, right in the two earlier time windows, and left and right hemisphere in the latter one), and these were treated as repeated factors in the analysis.
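A minimal sketch of how the dependent measure could be computed is given below, assuming an epochs object of the kind produced in the pre-processing sketch above with condition labels in its event dictionary; the electrode groupings follow the regions shown in Fig 3 and Fig 5, and the channel names must match the labels of the montage used.

```python
# Sketch of extracting mean amplitude per condition, region, and time window,
# assuming an MNE Epochs object with condition labels ('neutral', 'angry',
# 'sarcastic') in its event_id. Channel groupings follow Fig 3/Fig 5.
ANTERIOR_REGIONS = {
    "left":   ["F7", "F5", "F3", "FT7", "FC5", "FC3"],
    "centre": ["F1", "FZ", "F2", "FC1", "FCZ", "FC2"],
    "right":  ["F8", "F6", "F4", "FT8", "FC6", "FC4"],
}
WINDOWS = {"early_neg": (0.100, 0.200), "early_pos": (0.200, 0.350)}

def mean_amplitude(epochs, condition, channels, window):
    tmin, tmax = WINDOWS[window]
    data = (epochs[condition].copy()
            .pick(channels)
            .crop(tmin=tmin, tmax=tmax)
            .get_data())                  # (trials, channels, samples), in volts
    return data.mean() * 1e6              # mean amplitude in microvolts

# e.g., mean_amplitude(epochs, "angry", ANTERIOR_REGIONS["centre"], "early_pos")
```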

Fig 3. Head-map of electrode positions displaying regions of interest (anterior: left, central, right; centro-parietal: left, right).

https://doi.org/10.1371/journal.pone.0132947.g003

Results

The data was first screened for outliers with respect to grand average amplitude for hemisphere, prosodic condition, and time-window. No outliers were identified with values greater than 3 standard deviations from the mean. Sphericity of the data was also assessed using Mauchly’s test. When this test was violated, a Greenhouse-Geisser correction was used, as is standard with EEG data [51]. A number of representative electrodes are shown in Fig 4. As can be seen, the dominating response and largest difference between the prosodic conditions occurred in the anterior electrodes, with the components in the posterior region showing a mirrored yet markedly reduced effect.
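The ±3 standard deviation screen could be sketched as follows, assuming a one-dimensional array holding one mean amplitude per participant for a given cell of the design.

```python
# Sketch of the +/-3 SD outlier screen, assuming `amps` holds one mean amplitude
# per participant for a given condition/region/time-window cell.
import numpy as np

def outlier_mask(amps, criterion=3.0):
    z = (amps - amps.mean()) / amps.std(ddof=1)
    return np.abs(z) > criterion          # True marks a value beyond 3 SD of the mean
```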

Fig 4. ERPs elicited by angry, sarcastic, and neutral sentences at selected representative electrode-sites in Experiment one.

The two areas between the first three dotted lines represent the two earlier time windows examined (100-200ms and 200-350ms). The two latter dotted lines represent the later (450-700ms) time window.

https://doi.org/10.1371/journal.pone.0132947.g004

Early Anterior Negativity (100-200ms)

A negative peak was observed for all conditions in the 100-200ms time-window and was examined using a 3 (Prosodic Condition) × 3 (Region) ANOVA. Contrary to expectations, there was no significant effect of Prosodic Condition on mean amplitude, nor was there a significant effect of Region or an interaction between the two, all Fs < 1 (see Fig 5).
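The 3 × 3 repeated measures ANOVAs reported here and in the following sections could be sketched as below, assuming a long-format table with one mean amplitude per participant and cell (the column names and file are hypothetical); note that, unlike the analyses in the paper, this sketch applies no Greenhouse-Geisser correction, which would need to be added when Mauchly's test is violated.

```python
# Sketch of the 3 (Prosodic Condition) x 3 (Region) repeated measures ANOVA,
# assuming a long-format DataFrame with one mean amplitude per participant per
# cell. Column names and the CSV file are hypothetical, and no Greenhouse-Geisser
# correction is applied here.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Columns: participant, prosody (neutral/angry/sarcastic),
#          region (left/centre/right), amplitude (mean microvolts)
df = pd.read_csv("exp1_100_200ms_mean_amplitudes.csv")
res = AnovaRM(df, depvar="amplitude", subject="participant",
              within=["prosody", "region"]).fit()
print(res.anova_table)
```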

Fig 5. Grand average ERPs for the three conditions averaged over the anterior left (F7, F5, F3, FT7, FC5, FC3), central (F1, FZ, F2, FC1, FCZ, FC2), and right (F8, F6, F4, FT8, FC6, FC4) electrode regions in Experiment one.

The space between the dotted vertical lines represents the time frames that the average activation was derived from for the statistical comparisons. The whole-head activation maps are for Angry-Neutral and Sarcastic-Neutral comparisons at 250ms.

https://doi.org/10.1371/journal.pone.0132947.g005

Early Anterior Positivity (200-350ms)

Visual inspection of the data revealed an anterior positive deflection at 200-350ms. Contrary to expectations, this potential was relatively similar for both angry and sarcastic violations, therefore measures of peak onset latency were not performed. A 3 (Prosodic Condition) × 3 (Region) ANOVA revealed that there was a significant effect of Prosodic Condition (F(2, 28) = 9.85, p < .005, partial η2 = .41) and Region (F(2, 28) = 13.01, p < .001, partial η2 = .48), as well as an interaction between the two (F(4, 56) = 3.02, p < .05, partial η2 = .18) (see Fig 5).

Further planned comparisons were conducted to examine the difference in amplitude deflection in neutral-angry, neutral-sarcastic, and angry-sarcastic comparisons. The results showed that there was a significant difference between the angry and neutral (F(1, 14) = 14.61, p < .005, partial η2 = .51) and the sarcastic and neutral sentences (F(1, 14) = 8.48, p < .05, partial η2 = .38), but only a marginal difference between the sarcastic and angry sentences (F(1, 14) = 4.37, p = .066, partial η2 = .22). There was also a marginal interaction between Region and Prosodic Condition for the angry and neutral sentences (F(1, 14) = 3.64, p = .077, partial η2 = .21), and a significant interaction for the sarcastic and neutral ones (F(1, 14) = 6.73, p < .05, partial η2 = .31). These interactions appeared to be driven by more positive potentials over fronto-central than lateral regions in the angry and sarcastic conditions relative to the neutral one.

Late Posterior Positivity (450-700ms)

Contrary to predictions, visual inspection of the data revealed only a slight late occurring positive deflection in the centro-parietal regions in the 450-700ms time-window (see Fig 6). A 3 (Prosodic Condition) × 2 (Hemisphere) ANOVA showed that this difference was not significant across the prosodic conditions (F(2, 28) = 1.23, p = .31, partial η2 = .081), and nor was there an effect of hemisphere (F(1, 14) = 1.08, p = .32, partial η2 = .07) or an interaction between the two (F(2, 28) = 1.94, p = .16, partial η2 = .12).

Fig 6. Top Panel: Grand average ERPs for the three conditions averaged over the left (CP5, CP3, CP1, P5, P3, P1, PO5, PO3) and right (CP6, CP4, CP2, P6, P4, P2, PO6, PO4) electrode regions in Experiment one.

The space between the dotted vertical lines represents the time frame that the average activation was derived from for the statistical comparisons.

https://doi.org/10.1371/journal.pone.0132947.g006

Overall, the results from Experiment one demonstrate a divergence of mismatching angry and sarcastic violations from neutral sentences at 200-350ms post violation onset. Importantly, the significant positive deflection occurred at a similar time-course for both emotional and attitudinal violations. Alternatively, the results did not reveal any significant differences between the conditions with respect to early negative or late positive potentials.

Experiment Two

Experiment two was run to examine the effect of task demands on prosodic processing. It was identical to Experiment one except that the task was altered so that the main focus was not on prosody. The intention was to examine the extent to which prosodic processing occurs automatically and how processing changes with altered task demands. To do this, participants were simply presented with the sentences and asked to determine whether a probe word had occurred in them.

Method

Participants.

Fifteen healthy individuals (9 males, 6 females), aged 19–34 years (M = 23.67; SD = 3.66), participated. None had participated in Experiment one. All participants were English speakers with normal or corrected-to-normal vision, without any psychiatric or neuropsychological disorder and were not currently taking any psychiatric drugs or substances, as provided by self-report. All participants provided informed written consent and were compensated for their participation. This experiment was approved by the ethics committee of Swinburne University of Technology.

Stimuli and Procedure.

Stimuli and procedure were identical to Experiment one except that in this experiment participants were told to “pay attention to the words spoken and when a word is visually presented in the center of the screen, indicate whether the word was present or not in the previous sentence”. In terms of the stimulus presentation, the fixation cross was either followed by an inter-stimulus delay of 1000ms or by the presentation of a single probe word. The probe word was presented, on average, once every six sentences and had a 50% chance of having been in the sentence. Probe words were presented in an 18-point font in the middle of the screen and remained until a button-triggered response was made by the participant.

EEG Recording, Pre-processing and Data Analysis.

The equipment used and the steps taken during EEG recording, pre-processing, and data analysis were identical to Experiment one. Across conditions, 7.3% of trials were rejected per participant, on average, in pre-processing.

Results

As with Experiment one, the data was first examined for outliers in the three different time windows, and no case revealed a value greater than 3 standard deviations from the mean. A number of representative electrodes appear in Fig 7. As can be seen in that figure, like Experiment one, the largest differences between the conditions occurred in the anterior electrodes, with the components in the posterior region showing a reduced effect.

Fig 7. ERPs elicited by angry, sarcastic, and neutral sentences at selected representative electrode-sites in Experiment two.

The two areas between the first three dotted lines represent the two earlier time windows examined (100-200ms and 200-350ms). The two latter dotted lines represent the later (450-700ms) time window.

https://doi.org/10.1371/journal.pone.0132947.g007

Early Negativity (100-200ms)

Unlike Experiment one, within the early negativity time window (100-200ms), a 3 (Prosodic Condition) × 3 (Region) ANOVA showed that there was a main effect of Prosodic Condition (F(2, 28) = 4.41, p < .05, partial η2 = .24), although no effect of Region (F(1.28, 28) = 2.53, p = .12, partial η2 = .15) nor an interaction between the two (F < 1). Further contrasts showed that the angry sentences displayed a more negative going wave compared to the neutral (F(1, 14) = 7.23, p < .05, partial η2 = .34) and sarcastic ones (F(1, 14) = 9.76, p < .01, partial η2 = .41), whereas the sarcastic and neutral sentences did not differ significantly (F < 1). The results appear in Fig 8.

Fig 8. Grand average ERPs for the three conditions averaged over the anterior left (F7, F5, F3, FT7, FC5, FC3), central (F1, FZ, F2, FC1, FCZ, FC2), and right (F8, F6, F4, FT8, FC6, FC4) electrode regions in Experiment two.

The space between the dotted vertical lines represents the time frames that the average activation was derived from for the statistical comparisons. The whole-head activation maps are for Angry-Neutral and Sarcastic-Neutral comparisons at 250ms.

https://doi.org/10.1371/journal.pone.0132947.g008

Early Positivity (200-350ms)

Similar to Experiment one, mismatching conditions revealed a positive deflection in the 200-350ms time-window (see Fig 8). In particular, a 3 (Prosodic Condition) × 3 (Region) ANOVA revealed that there was a significant effect of Prosodic Condition (F(2, 28) = 6.53, p < .005, partial η2 = .32), a significant effect of Region (F(2, 28) = 8.91, p < .005, partial η2 = .39), but no significant interaction (F(4, 56) = 1.11, p = .36, partial η2 = .074).

Further planned comparisons were conducted to examine the difference in amplitude deflection in neutral-angry, neutral-sarcastic, and angry-sarcastic comparisons. The results showed that there were differences between the angry and neutral (F(1, 14) = 13.39, p < .005, partial η2 = .48) and the sarcastic and neutral sentences (F(1, 14) = 11.32, p < .01, partial η2 = .45), but not the sarcastic and angry ones (F < 1).

Late Posterior Positivity (450-700ms)

Like the first experiment, a 3 (Prosodic Condition) × 2 (Hemisphere) ANOVA showed no significant effect of Prosodic Condition (F < 1), although there was a significant effect of hemisphere (F(1, 14) = 5.98, p < .05, partial η2 = .30), with the right hemisphere having a higher mean amplitude than the left. The results appear in Fig 9.

Fig 9. Top Panel: Grand average ERPs for the three conditions averaged over the left (CP5, CP3, CP1, P5, P3, P1, PO5, PO3) and right (CP6, CP4, CP2, P6, P4, P2, PO6, PO4) electrode regions in Experiment two.

The space between the dotted vertical lines represents the time frame that the average activation was derived from for the statistical comparisons.

https://doi.org/10.1371/journal.pone.0132947.g009

Summary of Experiment One and Two

Overall, the results of Experiment one and Experiment two principally diverge in relation to the early time-window, where the angry utterances displayed a more negative going wave than either the sarcastic or neutral ones in Experiment two, and as can be seen, this occurred mainly in the left and fronto-central regions (see Fig 10). In terms of the results 200-350ms post violation, both experiments showed a pattern where there was a divergence between the mismatching angry and sarcastic conditions compared to the neutral controls, with larger differences occurring in the fronto-central compared to lateral regions. This caused a significant interaction in Experiment one but not Experiment two, although as can be seen from the figure, despite this difference, the pattern of results was very similar. To examine this further, an ANOVA was conducted to investigate whether there was a 3-way interaction between Experiment (one vs. two), Region, and Prosodic Condition. The result was not significant (F < 1). This suggests that there were no obvious meaningful differences between the experiments at this time window.

Fig 10. Mean amplitude (μV) of the angry, sarcastic, and neutral sentences as a function of anterior electrode region (Left, Central, and Right) in the two earlier time windows.

https://doi.org/10.1371/journal.pone.0132947.g010

General Discussion

Overview of Aims and Findings

The main aim of the present study was to investigate the neural processes underlying non-verbal emotional and attitudinal speech. More specifically, the study examined ERP components in response to violations of prosodic expectancy and the extent to which the neural processes for angry prosodic violations differed from sarcastic prosodic violations. Overall, the results revealed that mismatching emotional and attitudinal prosodic expectancy violations diverge from matching neutral sentences at 200-350ms post-violation onset, eliciting an anteriorly distributed positivity. Contrary to expectations, the positivity reported for attitudinal prosodic violations did not appear to occur later in time compared to emotional violations. Also inconsistent with expectations, no significant late positive complex effects were found. There were, however, minor distributional differences between the two types of prosody across the two experiments. In particular, in Experiment one, where prosody was in task focus, no significant differences between prosodic conditions were found with the early negative effect, whereas in Experiment two they were.

The Early Negative Effect (100-200ms)

Whilst a negative component was revealed for all conditions at 100-200ms, a significant difference between prosodic conditions was only found in the second experiment, where angry utterances displayed a more negative-going wave than the neutral or sarcastic ones. This result is similar to other studies showing that emotional deviations elicit early anterior dominant negative potentials in both oddball paradigms [31,52] and PEPs [25], with the results being more left lateralized in our study. Typically, these potentials are argued to reflect early mechanisms of physical change-detection, including differences in sound (i.e., mismatch negativity [53]; MMN). As the paradigm adopted in the current study was not a typical oddball paradigm, it is somewhat surprising that these results were observed. However, Chen et al. [25] reported a significant anterior negative component for emotional prosodic violations that occurred independent of task demands and over and above spectrally rotated counterparts (i.e., it occurred after controlling for the basic acoustic features of the words while subtracting the emotional significance), which was, in turn, likened to mismatch negativity. One reason that we may have failed to find the effect in Experiment one could be the different processing demands of the tasks. Alternatively, it could simply be a power issue. In particular, the effect may be relatively small, considering that the difference between the neutral and angry stimuli used by Chen et al. was only significant at the p < .05 level in their first experiment using prosodic classification and the effect was not found by Kotz and Paulmann [27] at all. In addition, Schirmer et al. [31] revealed significant MMN only for women. Given that the current study had a relatively small sample with a greater male to female ratio, it is thus possible that early negative effects were not elucidated consistently through grand averages. Finally, the optimum number of stimuli needed per cell to consistently find such an early effect is likely to be much greater than the numbers used in this study and many of the other studies [35].

The Early Positive Effect (200-350ms)

Emotional Prosody.

As anticipated, emotional prosodic deviations resulted in a significant positivity, which occurred between 200-350ms post violation onset. This finding indicates that mismatching emotional prosody is detected at an early stage of processing. Consistent with predictions, this finding is comparable to the PEP effects reported for angry emotional deviations occurring in a similar time window [25,27]. It is similarly consistent with the salience hypothesis [29], where prosodic deviations related to emotional content are identified due to the saliency of specific acoustic cue patterns in emotionally laden stimuli.

An alternative view of the positivity can be made with respect to a P200 component, which is also consistent in terms of the distribution, morphology, and latency. In particular, the PEP effects reported by Kotz and Paulmann [27], Paulmann and Kotz [28] and Paulmann et al. [29] exhibited a slightly later onset (~350ms) and a more sustained potential. As such, it is possible that the PEP effect reported by Chen et al. [25] is in fact a P200 potential. The P200 has also been found in studies employing emotional non-verbal vocalizations [32], simple sentences [33], and pseudo-sentences [49]. Furthermore, the P200 is argued to reflect attentional mechanisms of rapid salience detection [49]. While the paradigm used in the current study supports a PEP interpretation, both the PEP and P200 appear to reflect similar processes, that being a detection mechanism which is sensitive to the salience of acoustic parameters. Accordingly, either interpretation provides insight into when and where the type of prosody is detected and processed and thus allocated neural resources. The precise stage or level at which this information is processed, however, remains to be elucidated.

Apart from when the deflection occurred, the topographic distribution of the results was also of interest, with the positivity maximal in anterior regions; consistent with previous literature [25,27,28,32,33,49]. In accordance with Chen et al. [25], the positivity displayed a bilateral anterior distribution, which is also found in neuroimaging data [17,18]. Alternatively, there have been previous reports of right lateralized [27,28] and centro-parietal [29] emotional PEP effects. One possible reason for some of the differences is that, like Chen et al. [25], the stimuli used in this study differed on numerous acoustic cues (i.e., pitch, pitch variability, intensity, intensity range and duration), as they do in normal speech. Alternatively, the stimuli in Kotz and Paulmann [27], and Paulmann et al. [29] differed only in relation to pitch.

The results are also explicable within the cue-dependent hypothesis where the distribution and lateralization of prosodic processing is dependent on the constituent acoustic cues [19]. Within this framework, the right hemisphere processes acoustic information with a low temporal resolution whilst the left hemisphere operates on a higher temporal resolution [20]. Given that research indicates that the right hemisphere may be more specialized for pitch processing [21,29], this may account for the distributional differences for emotional violations between the studies since studies manipulating only pitch would be more likely to elicit mainly right hemisphere differences. The cue dependent hypothesis account does not, however, offer an explanation for centro-parietal emotional deviancy effects reported by Paulmann et al. [29] that were not found in this study.

Attitudinal Prosody.

Despite the fact that attitudinal prosody had not been previously examined in an expectancy paradigm, it was speculated that prosodic re-analysis processes would occur later in time compared to emotional prosodic violations. Foremost, this hypothesis was proposed in light of the temporal dynamics of positive going prosodic ERP potentials and the salience hypothesis of Paulmann et al. [29], such that attitudinal prosody may possess less salient acoustic cue patterns due to conceptual differences between attitudinal and emotional functions. As anticipated, attitudinal/sarcastic prosodic violations elicited a significant positive going deflection. However, contrary to expectations, it occurred in the same time-window as emotional violations (i.e., 200-350ms). Assuming that the PEP effect reflects the salience of the stimuli, these results suggest that the angry and sarcastic stimuli used in the current study did not differ significantly in terms of their perceived salience. This explanation also holds for a P200 interpretation of the data, in that if the stimuli had differed in terms of salience, the P200 would have shown corresponding differences, as has been reported when people distinguish between ironic compared to literal sentences [37,50]. This finding thus provides evidence supporting processing similarities for emotional and attitudinal prosody.

On the Absence of Late Positive Effects

As noted in the introduction, we expected to find late positive potentials during prosodic evaluations (i.e., prosodic classification) due to online integration processes. Such components around and over 600ms have been found in studies of emotional prosody [25,49], linguistic prosody [24,26,29] and attitudinal prosody [37,38]. These were not, however, found in this study. Given that late positive effects are most commonly argued to reflect deeper semantic and linguistic re-integration processes, one explanation for the lack of late positivity effects may be that the stimuli used in the current study were not processed in enough depth. In turn, the departures from neutral to emotional or attitudinal prosody may not have required contextual re-integration. This is plausible considering that the previous studies demonstrating late positive effects provided greater contextual cues. For instance, the head structures used in Chen et al. [25] were more complex and the paradigms used in Regel et al. [37] and Rigoulot et al. [38] were inherently contextual (i.e., discourse contexts and question-answer dialogue, respectively).

Effect of Task Demands

Primarily, the results suggest that very early effects might be modulated by task demands, with angry sentences producing a more negative going wave between 100-200ms in the second but not the first experiment. Interestingly, the same result was not found with the sarcastic sentences. This may have been because early acoustic differences between the neutral and angry sentences were greater than those between the sarcastic and neutral sentences on the measures examined, and is thus still consistent with previous findings showing that the N100 can be modulated by selective attention mechanisms [54]. Alternatively, as noted above, this difference may also simply be a power issue.

In terms of the results from the second time window (200-350ms), there was very weak evidence that the prosodic processing of sarcasm and anger might be dependent on task demands. In particular, in the first experiment, there was a marginal difference between the angry and sarcastic sentences, but in the second experiment, there was no significant difference. Given the weakness of this result and the very similar pattern of results across the two experiments, it seems too early to attribute anything meaningful to this finding. In terms of the late time window (450-700ms), we had expected to find differences where sarcasm was more affected by task demands than anger, as would be predicted given the processes hypothesized to occur in the third stage of prosodic processing by Mitchell and Ross [2]. However, since neither the angry nor sarcastic condition showed significant positive effects in that time window, it is not surprising that we did not find any differences.

Despite not finding important differences in the PEP effect between tasks in our study, there were some differences between the current and previous results. In particular, Chen et al. [25] and Kotz and Paulmann [27] performed similar task demand manipulations where classification and probe-verification tasks were used, but their experiments produced different distributional effects. Notably, however, each study also demonstrated altered distributional effects across tasks. Chen et al. [25] showed an anterior bilateral PEP effect during the prosodic classification task, whereas a mid-left lateralized effect was found for the probe verification task. Alternatively, Kotz and Paulmann [27] revealed a whole-head bilateral PEP effect for a probe-verification task and a right lateralized fronto-central effect for a prosodic classification task. These results suggest that the distribution of the positivity is sensitive to task demands. This may suggest that the neural resources responsive to the saliency of acoustic cue patterns are not fixed, but rather dependent on where attention is allocated [18]. Moreover, which aspects of the tasks cause the distributional differences is currently unclear, partly because the different types of tasks may load quite differently on different cognitive processes. For example, with prosodic classification, it is plausible that one could stop focusing on the task as soon as a change is detected, whereas with probe verification, one is obliged to commit the entire sentence to memory because the probe word appears after the sentence has completed. Thus the way memory is used, which can be affected quantitatively and qualitatively by emotional stimuli (e.g., [55]), differs in the two tasks, and it is therefore certainly not inconceivable that this could lead to different distributional effects, including interactions with different prosodic conditions. Further differences could emerge from language differences if, for example, the automaticity with which cues are processed differs across languages. Speakers of tonal languages, like Chinese, might, for example, process pitch information differently compared to speakers of non-tonal languages like English [56,57].

Limitations

Foremost, a principal shortcoming of the current experiments was the comparatively small number of participants. This means that the study may have lacked power, especially in the early time window, where either null or relatively weak effects have been reported. However, despite this consideration, there are important results which would be unlikely to change with additional participants. In particular, with our angry utterances, there was almost no lateralization of the results, as was also found by Chen et al. [25], and this was unlike Kotz and Paulmann [27], who found right lateralization using the same task. Unless our participants were especially atypical, it would be very surprising if testing additional participants not only diluted the original bilateral pattern but also produced a right lateralized result. This suggests that there are likely to be differences between tasks, stimuli, or sample populations that cause important differences in the results, regardless of the number of participants.

Another limitation is that the sarcastic stimuli used were constrained by their selection, where only stimuli that were obviously of a sarcastic intonation were used. This was necessary because they were used out of context, all were sarcastic based purely on intonation, and we wanted our participants to be able to classify them with high accuracy. As a result, it is likely that they were relatively homogenous compared to more natural sarcastic prose, and that they were also easy to recognize based on their phonetic characteristics. These factors may have biased participants towards processing mainly the phonetic characteristics of the sentences rather than the meaning and may be one reason for the lack of late positive effects. A related limitation was that the current study did not systematically control the extent to which the sarcastic and angry stimuli differed on phonetic characteristics. In terms of the phonetic variables examined, the angry stimuli departed more from the neutral stimuli than the sarcastic stimuli did. This included a very large deviation in fundamental frequency for the angry compared to the sarcastic stimuli. As a result, it is possible that participants could have differentiated all three types of stimuli based on simple phonetic characteristics. Such differences could have led to task-specific effects where participants were biased towards using a simple phonetic classification strategy when a violation occurred rather than processing the actual emotional and attitudinal meanings of the sentences. Thus, the minor differences elucidated between emotional and attitudinal prosody could be related to strategy choice, and conversely, differences related to the deeper processing of sarcastic attitudes could potentially have been concealed. Future research manipulating characteristics of the stimuli or using different types of task (e.g., a probe question concerning the meaning of the sentence) may help elucidate the extent to which strategy affects the results.

Conclusion

The present study aimed to examine the neural processes underlying emotional and attitudinal prosody, operationalizing emotional prosody as an angry intonation and attitudinal prosody as a sarcastic intonation. Overall, the results indicate that angry and sarcastic violations are processed over a similar temporal course and that their cortical distributions are also quite similar, with meaningful task-related differences found only very early in processing. These results add to the literature on emotional prosodic processing, replicating early anterior positive processing effects. The study is also novel in attempting to disentangle the processing of affective prosody into its constituent emotional and attitudinal components, in addition to providing insight into the neural underpinnings of acoustically driven attitudinal prosody.

Acknowledgments

This research was supported in part by a grant from the Australian Research Council (DP120100883) to C.P. Correspondence concerning this article can be sent to ConradPerry@gmail.com.

Author Contributions

Conceived and designed the experiments: SW CP. Performed the experiments: SW. Analyzed the data: SW CP. Contributed reagents/materials/analysis tools: CP. Wrote the paper: SW CP.

References

  1. Hirst D, Di Cristo A, editors (1998) Intonation Systems: A Survey of Twenty Languages. Cambridge, United Kingdom: Cambridge University Press.
  2. Mitchell RL, Ross ED (2013) Attitudinal prosody: What we know and directions for future study. Neuroscience & Biobehavioral Reviews 37: 471–479.
  3. Wichmann A (2000) The attitudinal effects of prosody, and how they relate to emotion. ISCA Tutorial and Research Workshop (ITRW) on Speech and Emotion. Newcastle, Northern Ireland, UK. pp. 143–148.
  4. Pell MD (2006) Judging emotion and attitudes from prosody following brain damage. Progress in Brain Research 156: 303–317. pmid:17015088
  5. Grichkovtsova I, Morel M, Lacheret A (2012) The role of voice quality and prosodic contour in affective speech perception. Speech Communication 54: 414–429.
  6. Blanc JM, Dominey PF (2003) Identification of prosodic attitudes by a temporal recurrent network. Cognitive Brain Research 17: 693–699. pmid:14561456
  7. Morlec Y, Bailly G, Aubergé V (2001) Generating prosodic attitudes in French: data, model and evaluation. Speech Communication 33: 357–371.
  8. Mozziconacci SJ (2011) Modeling emotion and attitude in speech by means of perceptually based parameter values. User Modeling and User-Adapted Interaction 11: 297–326.
  9. Yildirim S, Bulut M, Lee CM, Kazemzadeh A, Busso C, Deng Z, et al. (2004) An acoustic study of emotions expressed in speech. Proceedings of the International Conference on Spoken Language Processing (ICSLP '04) 1: 2193–2196.
  10. Breitenstein C, Van Lancker D, Daum I (2001) The contribution of speech rate and pitch variation to the perception of vocal emotions in a German and an American sample. Cognition and Emotion 15: 57–79.
  11. Sobin C, Alpert M (1999) Emotion in speech: The acoustic attributes of fear, anger, sadness, and joy. Journal of Psycholinguistic Research 28: 347–365. pmid:10380660
  12. Pell MD, Paulmann S, Dara C, Alasseri A, Kotz SA (2009) Factors in the recognition of vocally expressed emotions: A comparison of four languages. Journal of Phonetics 37: 417–435.
  13. Rockwell P (2000) Lower, slower, louder: Vocal cues of sarcasm. Journal of Psycholinguistic Research 29: 483–495.
  14. Bryant GA, Fox Tree JE (2005) Is there an ironic tone of voice? Language and Speech 48: 257–277. pmid:16416937
  15. Warren P (2005) Patterns of late rising in New Zealand English: Intonational variation or intonational change? Language Variation and Change 17: 209–230.
  16. Witteman J, van IJzendoorn MH, van de Velde D, van Heuven VJ, Schiller NO (2011) The nature of hemispheric specialization for linguistic and emotional prosodic perception: A meta-analysis of the lesion literature. Neuropsychologia 49: 3722–3738. pmid:21964199
  17. Kotz SA, Meyer M, Alter K, Besson M, von Cramon DY, Friederici AD (2003) On the lateralization of emotional prosody: an event-related functional MR investigation. Brain and Language 86: 366–376. pmid:12972367
  18. Wildgruber D, Hertrich I, Riecker A, Erb M, Anders S, Grodd W, et al. (2004) Linguistic and emotional aspects of speech intonation. Cerebral Cortex 14: 1384–1389.
  19. Sidtis JJ, Van Lancker Sidtis D (2003) A neurobehavioral approach to dysprosody. Seminars in Speech and Language 24: 93–106. pmid:12709883
  20. Van Lancker D, Sidtis JJ (1992) The identification of affective-prosodic stimuli by left- and right-hemisphere-damaged subjects: All errors are not created equal. Journal of Speech and Hearing Research 35: 963–970. pmid:1447930
  21. Meyer M, Steinhauer K, Alter K, Friederici AD, von Cramon DY (2004) Brain activity varies with modulation of dynamic pitch variance in sentence melody. Brain and Language 89: 277–289. pmid:15068910
  22. Scherer KR (2003) Vocal communication of emotion: A review of research paradigms. Speech Communication 40: 227–256.
  23. Schirmer A, Kotz SA (2006) Beyond the right hemisphere: brain mechanisms mediating vocal emotional processing. Trends in Cognitive Sciences 10: 24–30. pmid:16321562
  24. Astésano C, Besson M, Alter K (2003) Brain potentials during semantic and prosodic processing in French. Cognitive Brain Research 18: 172–184.
  25. Chen X, Zhao L, Jiang A, Yang Y (2011) Event-related potential correlates of the expectancy violation during emotional prosody processing. Biological Psychology 86: 158–167. pmid:21093531
  26. Eckstein K, Friederici AD (2006) Late interaction of syntactic and prosodic processes in sentence comprehension as revealed by ERPs. Cognitive Brain Research 52: 130–143.
  27. Kotz SA, Paulmann S (2007) When emotional prosody and semantics dance cheek to cheek: ERP evidence. Brain Research 1151: 107–118. pmid:17445783
  28. Paulmann S, Kotz SA (2008) An ERP investigation on the temporal dynamics of emotional prosody and emotional semantics in pseudo- and lexical-sentence context. Brain and Language 105: 59–69. pmid:18177699
  29. Paulmann S, Jessen S, Kotz SA (2012) It's special the way you say it: An ERP investigation on the temporal dynamics of two types of prosody. Neuropsychologia 50: 1609–1620. pmid:22465251
  30. Sauter DA, Eimer M (2010) Rapid detection of emotion from human vocalizations. Journal of Cognitive Neuroscience 22: 474–481. pmid:19302002
  31. Schirmer A, Striano T, Friederici AD (2005) Sex differences in the pre-attentive processing of vocal emotional expressions. NeuroReport 16: 635–639. pmid:15812323
  32. Liu T, Pinheiro AP, Deng G, Nestor PG, McCarley RW, Niznikiewicz MA (2012) Electrophysiological insights into processing nonverbal emotional vocalizations. NeuroReport 23: 108–112. pmid:22134115
  33. Paulmann S, Kotz SA (2008) Early emotional prosody perception based on different speaker voices. NeuroReport 19: 209–213. pmid:18185110
  34. Gandour JT (2006) Tone: Neurophonetics. In: Brown K, editor. Encyclopedia of Language & Linguistics, 2nd edition. Oxford, UK: Elsevier Ltd. pp. 751–761.
  35. Woodman GF (2010) A brief introduction to the use of event-related potentials (ERP) in studies of perception and attention. Attention, Perception, & Psychophysics 72: 2031–2046.
  36. Banse R, Scherer KR (1996) Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology 70: 614–636. pmid:8851745
  37. Regel S, Gunter TC, Friederici AD (2011) Isn't it ironic? An electrophysiological exploration of figurative language processing. Journal of Cognitive Neuroscience 23: 277–293. pmid:20044894
  38. Rigoulot S, Fish K, Pell MD (2014) Neural correlates of inferring speaker sincerity from white lies: An event-related potential source localization study. Brain Research 1565: 48–62. pmid:24751571
  39. Culpeper J (1996) Towards an anatomy of impoliteness. Journal of Pragmatics 25: 349–367.
  40. Toplak M, Katz AN (2000) On the uses of sarcastic irony. Journal of Pragmatics 32: 1467–1488.
  41. Jorgensen J (1996) The functions of sarcastic irony in speech. Journal of Pragmatics 26: 613–634.
  42. Cheang HS, Pell MD (2008) The sound of sarcasm. Speech Communication 50: 366–381.
  43. Lœvenbruck H, Jannet MAB, D'Imperio M, Spini M, Champagne-Lavau M (2013) Prosodic cues of sarcastic speech in French: slower, higher, wider. In: Bimbot F, editor. Proceedings of Interspeech 2013. Lyon, France: Curran Associates, Inc.
  44. Rockwell P (2000) Lower, slower, louder: Vocal cues of sarcasm. Journal of Psycholinguistic Research 29: 483–495.
  45. Bryant GA, Fox Tree JE (2005) Is there an ironic tone of voice? Language and Speech 48: 257–277. pmid:16416937
  46. Pell MD, Kotz SA (2011) On the time course of vocal emotion recognition. PLoS ONE 6: e27256. pmid:22087275
  47. Boersma P, Weenink D (2012) Praat: doing phonetics by computer.
  48. Delorme A, Makeig S (2004) EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics. Journal of Neuroscience Methods 134: 9–21.
  49. Paulmann S, Bleichner M, Kotz SA (2013) Valence, arousal, and task effects in emotional prosody processing. Frontiers in Psychology 4: 345. pmid:23801973
  50. Regel S, Meyer L, Gunter TC (2014) Distinguishing neurocognitive processes reflected by P600 effects: Evidence from ERPs and neural oscillations. PLoS ONE 9: e96840. pmid:24844290
  51. Picton TW, Bentin S, Berg P, Donchin E, Hillyard SA, Johnson R Jr, et al. (2000) Guidelines for using human event-related potentials to study cognition: recording standards and publication criteria. Psychophysiology 37: 127–152. pmid:10731765
  52. Thönnessen H, Boers F, Dammers J, Chen YH, Norra C, Mathiak K (2010) Early sensory encoding of affective prosody: Neuromagnetic tomography of emotional category changes. NeuroImage 50: 250–259. pmid:19969096
  53. Näätänen R, Alho K (1995) Mismatch negativity: a unique measure of sensory processing in audition. International Journal of Neuroscience 80: 317–337. pmid:7775056
  54. Hillyard SA, Hink RF, Schwent VL, Picton TW (1973) Electrical signs of selective attention in the human brain. Science 182: 177–180. pmid:4730062
  55. Spachtholz P, Kuhbandner C, Pekrun R (2014) Negative affect improves the quality of memories: Trading capacity for precision in sensory and working memory. Journal of Experimental Psychology: General 143: 1450–1456.
  56. Bidelman GM, Hutka S, Moreno S (2013) Tone language speakers and musicians share enhanced perceptual and cognitive abilities for musical pitch: Evidence for bidirectionality between the domains of language and music. PLoS ONE 8: e60676. pmid:23565267
  57. Kochunov P, Fox P, Lancaster J, Tan LH, Amunts K, Zilles K, et al. (2003) Localized morphological brain differences between English-speaking Caucasians and Chinese-speaking Asians: new evidence of anatomical plasticity. NeuroReport 14: 961–964. pmid:12802183