Rhesus Monkeys (Macaca mulatta) Detect Rhythmic Groups in Music, but Not the Beat

It was recently shown that rhythmic entrainment, long considered a human-specific mechanism, can be demonstrated in a selected group of bird species, and, somewhat surprisingly, not in more closely related species such as nonhuman primates. This observation supports the vocal learning hypothesis that suggests rhythmic entrainment to be a by-product of the vocal learning mechanisms that are shared by several bird and mammal species, including humans, but that are only weakly developed, or missing entirely, in nonhuman primates. To test this hypothesis we measured auditory event-related potentials (ERPs) in two rhesus monkeys (Macaca mulatta), probing a well-documented component in humans, the mismatch negativity (MMN) to study rhythmic expectation. We demonstrate for the first time in rhesus monkeys that, in response to infrequent deviants in pitch that were presented in a continuous sound stream using an oddball paradigm, a comparable ERP component can be detected with negative deflections in early latencies (Experiment 1). Subsequently we tested whether rhesus monkeys can detect gaps (omissions at random positions in the sound stream; Experiment 2) and, using more complex stimuli, also the beat (omissions at the first position of a musical unit, i.e. the ‘downbeat’; Experiment 3). In contrast to what has been shown in human adults and newborns (using identical stimuli and experimental paradigm), the results suggest that rhesus monkeys are not able to detect the beat in music. These findings are in support of the hypothesis that beat induction (the cognitive mechanism that supports the perception of a regular pulse from a varying rhythm) is species-specific and absent in nonhuman primates. In addition, the findings support the auditory timing dissociation hypothesis, with rhesus monkeys being sensitive to rhythmic grouping (detecting the start of a rhythmic group), but not to the induced beat (detecting a regularity from a varying rhythm).


Introduction
The ability to perceive a regular beat in music and synchronize to it (e.g., by foot tapping or dancing) is a common and widespread human skill [1]. It is also a skill that has been suggested to be domain-specific [2] and, arguably, conditional to the origins of music [3]. Nevertheless, it is still unclear whether this ability should be considered species-specific [4]. It was recently shown that rhythmic entrainment, long considered a human-specific mechanism, can be demonstrated in a select group of bird species [5,6], and, somewhat surprisingly, not in more closely related species such as nonhuman primates [7]. This observation supports the vocal learning hypothesis [8], which suggests that rhythmic entrainment is a by-product of the vocal learning mechanisms that are shared by several bird and mammal species, including humans, but that are only weakly developed, or missing entirely, in nonhuman primates [4]. However, since no evidence of rhythmic entrainment was found in many vocal learners (including dolphins, seals, and songbirds [9]), vocal learning may be necessary, but not sufficient [4], for beat induction: the cognitive mechanism that supports the perception of a regular pulse from a varying rhythm [3].
In addition, there might be a dissociation between rhythm perception and beat induction, as was shown in a lesion study with humans [10]. This study suggests that different cognitive mechanisms are active for duration-based timing versus beat-based timing, with beat induction being dependent on distinct parts of the timing network in the brain [11,12]. We hypothesize that humans share rhythm perception (or duration-based timing) with other primates, while beat induction (or beat-based timing) is only present in specific species (including humans and a select group of bird species [6]), arguably as a result of convergent evolution [13]. We will refer to this as the auditory timing dissociation hypothesis.
Most existing animal studies on rhythmic entrainment have used behavioral methods to probe the presence of beat perception, such as tapping tasks [7] or measuring head bobs [5]. However, if the production of synchronized movement to sound or music is not observed in certain species (such as in nonhuman primates, seals or dolphins [9]), this is no evidence for the absence of beat perception. It could well be that while certain species are not able to synchronize movements to a rhythm, they do have beat induction and, as such, can perceive a beat. With behavioral methods that rely on overt motor responses it is difficult to separate the contributions of perception and action; more direct electrophysiological measures, such as event-related brain potentials, allow testing for neural correlates of beat perception.
In the current study, we measure auditory event-related brain potentials (ERP) in two rhesus monkeys (Macaca mulatta) using the mismatch negativity component (MMN) as an index of (the violation of) rhythmic expectation using an oddball paradigm [3,14].
In animals, MMN has been investigated mainly in rodents such as mice and rats (with primarily negative results), and in carnivores (cat) and primates (macaque), for which positive results have been reported. Most studies, however, use intracranial and single-cell recording techniques and measure stimulus-specific adaptation (SSA), an index that is similar but not identical to MMN (see [15] for a discussion). Just a few studies measured non-invasive scalp-recorded auditory event-related potentials (ERPs) in nonhuman primates, with Ueno et al. [16] being the first study, to our knowledge, to show it is possible, in principle, to measure an MMN-like response in an awake, non-sedated chimpanzee (Pan troglodytes).
In the current study, using oddball paradigms [3,14], we record auditory ERPs from two rhesus monkeys (Macaca mulatta) utilizing the MMN as an index of the violation of (rhythmic) expectation. First we tested whether an MMN can be elicited in rhesus monkeys (using deviant tones at random positions in the sound stream; Experiment 1). Second, we investigated whether an MMN can be elicited by infrequent omissions of regular tones (inserting gaps at random positions in the sound stream; Experiment 2). Subsequently, we probed the presence of beat induction by selectively omitting parts of a musical rhythm (randomly inserting gaps at the first position of a musical unit, i.e., the 'downbeat'; Experiment 3).
The latter paradigm has been used previously to show sensitivity to the beat in human adults and newborns [17,18,19,20]. In these studies sound sequences were used that are based on a typical 2-measure rock drum accompaniment pattern composed of snare, bass and hi-hat spanning 8 equally spaced (isochronous) positions (see Figure 1). Because the MMN is known to be elicited by deviations from temporal expectations [3], it is especially appropriate for testing beat induction. One of the most salient perceptual effects of beat induction is a strong expectation of an event at the first position of a musical unit, i.e., the 'downbeat'. Therefore, occasionally omitting the downbeat in a sound sequence composed predominantly of strictly metrical (regular or 'nonsyncopated') variants of the same rhythm should elicit discriminative ERP responses, provided that the subject extracted the beat of the sequence.

Ethics Statement
All animal care, housing, and experimental procedures were approved by the National University of Mexico Institutional Animal Care and Use Committee and conformed to the principles outlined in the Guide for Care and Use of Laboratory Animals (NIH, publication number 85-23, revised 1985). Both monkeys were monitored daily by the researchers and the animal care staff, and every second day by the veterinarian, to check their health and welfare. To improve their living conditions we routinely introduced toys (often containing food items that they liked) into the home cage environment (1.3 m³) to promote exploratory behavior. The researcher who tested the animals spent half an hour interacting with the monkeys directly, for example by giving them new objects to manipulate. We think that this interaction with humans, in addition to the interaction that was part of the task performed, can help to reduce potential stress related to the experiment. Food and water were given ad libitum.

Participants
Two rhesus monkeys participated in the ERP measurements: Aji, a 2-year-old male (referred to as monkey A), and Yko, a 5-year-old male (referred to as monkey Y). Both monkeys have normal hearing. They were awake (i.e., not sedated) during the measurements, sitting in a quiet room [3 (l) × 2 (d) × 2.5 (h) m] with dimmed lighting and two loudspeakers in front of them. The ERP measurements were performed after a morning session of unrelated behavioral experiments. The animals were seated comfortably in a monkey chair where they could freely move their hands and feet. No head fixation was used and the EEG electrodes were attached to the monkey's scalp using tape. To ease the fixation of the electrodes, the monkey's hair on the scalp and reference ear was shaved.

Stimuli
In Experiment 1 pure sine-wave tones were used for the two-stimulus oddball paradigm. Their frequencies were 500 Hz and 1500 Hz, with a duration of 50 ms, and a rise and fall of 5 ms. The frequencies of these tones were within the audible range of both monkeys.
In Experiment 2 a sine-wave with a frequency of 1000 Hz was used, with a duration of 50 ms and a rise and fall of 5 ms.
In Experiment 3 sound sequences based on a typical 2-measure rock drum accompaniment pattern (S1) were used, composed of snare, bass and hi-hat, spanning equally spaced positions (see Figure 1). Four further variants of the S1 pattern (S2-S4 and D) were created by omitting sounds in different positions. Within the patterns the onset-to-onset interval between successive sounds was 150 ms with 75 ms onset-to-offset interval (75 ms sound duration). Patterns in the sequence were delivered as a continuous sound stream. Loudness of the sounds was normalized so that all stimuli had the same loudness.
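As an illustration of the tone parameters given above (50 ms duration, 5 ms rise and fall), the following minimal sketch synthesizes such tones. The audio sample rate of 44100 Hz, the linear shape of the ramps, and the function name make_tone are assumptions made for this example; the text does not specify them.

```python
import math

def make_tone(freq_hz, dur_ms=50.0, ramp_ms=5.0, sample_rate=44100):
    """Return samples in [-1, 1] for a sine tone with linear on/off ramps.

    sample_rate and the linear ramp shape are assumptions, not taken
    from the study.
    """
    n = int(round(sample_rate * dur_ms / 1000.0))
    n_ramp = int(round(sample_rate * ramp_ms / 1000.0))
    samples = []
    for i in range(n):
        # Gain rises linearly over the first n_ramp samples and falls
        # linearly over the last n_ramp samples; it is 1.0 in between.
        gain = min(1.0, i / n_ramp, (n - 1 - i) / n_ramp)
        samples.append(gain * math.sin(2.0 * math.pi * freq_hz * i / sample_rate))
    return samples

# Tones of Experiment 1 (500 Hz, 1500 Hz) and Experiment 2 (1000 Hz).
tones = {f: make_tone(f) for f in (500, 1000, 1500)}
```

The ramps avoid abrupt onsets and offsets, which would otherwise add broadband clicks to a short pure tone.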
Sound stimuli were presented through 2 loudspeakers placed 1.1 meters away from the subject (and 1 meter apart from each other).The sound intensity measured at the subject position was approximately 60 dB SPL.
In Experiments 1 and 2 the sound inter-onset intervals were 600 ms and 150 ms, respectively. In both experiments standards (0.9 probability) were randomly replaced (0.1 probability) with deviants: deviant tones in Experiment 1 and omissions (i.e., silence) in Experiment 2. In Experiment 1, for half of the blocks one frequency was used as deviant and the other as standard (i.e., S500, D1500), switching roles for the other half of the blocks (i.e., S1500, D500). The 150 ms inter-onset interval of Experiment 2 was motivated by human studies [21] and is within the 'preferred tempo' range of rhesus monkeys [22].
In Experiment 3 the 4 strictly metrical sound patterns (S1-S4; standards) made up the majority of the patterns in the sequences (0.225 probability each). In the standard patterns regular omissions occurred in metrically weak positions, leaving these patterns metrically intact. Occasionally, the D pattern was delivered (0.1 probability), in which the downbeat was omitted, which interrupted the metricality of the pattern. The order of the five patterns was pseudo-randomized, enforcing at least three standard patterns between successive D patterns and no D after S4 to avoid two consecutive omissions. A control sequence (deviant-control) repeating the D pattern 100% of the time was also delivered (see [20] for more details).
The ERP measurements were conducted in repeated sessions, each containing all three experiments in random order. The monkeys participated in one recording session per day, to a total of 11 sessions for monkey A and 23 sessions for monkey Y (monkey Y moved considerably more than monkey A). All measurements were completed in about one month per monkey. Each experiment consisted of 10 blocks with 306 repetitions for each block.
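The oddball design of Experiments 1 and 2 can be sketched as follows. This is a hedged illustration only: drawing each trial independently with probability 0.1, the block length of 306 stimuli, and the helper names oddball_block and onsets_ms are assumptions for this example, since the text does not state how the random replacement was implemented.

```python
import random

def oddball_block(n_trials=306, p_deviant=0.1, seed=None):
    """Return a list of 'S' (standard) and 'D' (deviant) labels.

    Each trial is drawn independently (an assumption); a deviant stands
    for a deviant tone in Experiment 1 and an omission in Experiment 2.
    """
    rng = random.Random(seed)
    return ["D" if rng.random() < p_deviant else "S" for _ in range(n_trials)]

def onsets_ms(n_trials, ioi_ms):
    """Stimulus onset times for a fixed inter-onset interval (in ms)."""
    return [i * ioi_ms for i in range(n_trials)]

block = oddball_block(seed=1)
exp1_onsets = onsets_ms(len(block), 600)  # Experiment 1: 600 ms
exp2_onsets = onsets_ms(len(block), 150)  # Experiment 2: 150 ms
```

With independent draws the realized deviant proportion varies from block to block around 0.1; a fixed-count shuffle would be an alternative if exact proportions per block were required.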
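The ordering constraints described above (at least three standards between successive D patterns, no D directly after S4) can be sketched as a constrained sequential draw. The procedure below is an assumption made for illustration, not the authors' randomization code (see [20] for the original details); the name pattern_sequence and the default sequence length are hypothetical.

```python
import random

STANDARDS = ["S1", "S2", "S3", "S4"]

def pattern_sequence(n_patterns=306, p_deviant=0.1, min_gap=3, seed=None):
    """Draw a pattern order that respects the two ordering constraints.

    A D is only allowed when at least min_gap standards have occurred
    since the previous D (or since the start) and the previous pattern
    is not S4. Otherwise one of S1-S4 is drawn with equal probability.
    """
    rng = random.Random(seed)
    seq = []
    since_last_d = 0
    for _ in range(n_patterns):
        prev = seq[-1] if seq else None
        d_allowed = since_last_d >= min_gap and prev != "S4"
        if d_allowed and rng.random() < p_deviant:
            seq.append("D")
            since_last_d = 0
        else:
            seq.append(rng.choice(STANDARDS))
            since_last_d += 1
    return seq

seq = pattern_sequence(n_patterns=2000, seed=0)
```

Note that blocking D at disallowed positions lowers the realized proportion of D patterns somewhat below the nominal 0.1; a sketch that needs the exact proportion would have to compensate for this.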
The electrodes were connected to a Tucker-Davis Technologies (TDT) headstage (#RA16LI) for low impedance electrodes.This headstage was connected to a TDT RA16PA preamplifier, which in turn was connected to a TDT RZ2 processor.RZ2 was programmed to acquire the EEG signals with a sampling rate of 498.25 Hz and the bandpass filters were set at 0.01-100 Hz.
All electrodes were attached using Ten20 Conductive EEG Paste and medical tape, and were referenced to the right ear (fleshy part of the pinna). In the offline analysis, a 0.1-30 Hz bandpass FIR filter (Kaiser-window) was applied. With zero latency set to the onset of the stimuli, epochs of -100 to 500 ms (Experiment 1), 0-450 ms (Experiment 2), and 0-600 ms (Experiment 3) were extracted. All epochs were baseline corrected to zero using a 100 ms pre-stimulus interval in Experiment 1 and the whole epoch in Experiments 2 and 3. Epochs that exceeded ±150 µV amplitude were excluded from the statistical analysis. EMG recordings were obtained from the temporalis muscles. No event-locked activity was found in these recordings. The number of epochs accepted for analysis for the three experiments are given in Tables 1-4.
Statistical analysis was performed on the mean amplitudes in a 50 ms wide time window centered on the absolute maximum peak of the difference waveforms (i.e., the difference between the standard and deviant waves). The resulting windows are stated underneath Tables 1 to 4 and marked with gray-shaded rectangles in Figures 3, 4, and 5.
Table 1. Mean amplitudes of standard- and deviant-waves for each condition and scalp position (Experiment 1).
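The epoching steps described above can be sketched for a single channel as follows. This is a minimal illustration, assuming the signal is a list of amplitudes in microvolts sampled at the stated 498.25 Hz, and assuming that the amplitude criterion is applied after baseline correction; the function names are hypothetical.

```python
FS_HZ = 498.25  # sampling rate stated for the recording

def ms_to_samples(ms, fs_hz=FS_HZ):
    """Convert a latency in ms to the nearest whole number of samples."""
    return int(round(ms * fs_hz / 1000.0))

def extract_epoch(signal, onset_idx, start_ms, end_ms, baseline_ms=None,
                  fs_hz=FS_HZ):
    """Cut one epoch around a stimulus onset and subtract the baseline mean.

    baseline_ms=None uses the mean of the whole epoch (Experiments 2, 3);
    otherwise the pre-stimulus interval of that length (Experiment 1).
    Returns None if the epoch does not fit inside the recording.
    """
    lo = onset_idx + ms_to_samples(start_ms, fs_hz)
    hi = onset_idx + ms_to_samples(end_ms, fs_hz)
    if lo < 0 or hi > len(signal):
        return None
    epoch = signal[lo:hi]
    if baseline_ms is None:
        base = epoch
    else:
        b0 = onset_idx - ms_to_samples(baseline_ms, fs_hz)
        if b0 < 0:
            return None
        base = signal[b0:onset_idx]
    mean = sum(base) / len(base)
    return [v - mean for v in epoch]

def accept(epoch, limit_uv=150.0):
    """Keep an epoch only if no sample exceeds +/- limit_uv."""
    return all(abs(v) <= limit_uv for v in epoch)

# Usage on a flat dummy signal: Experiment 1 style epoch, -100 to 500 ms.
signal = [5.0] * 1000
ep = extract_epoch(signal, 500, -100, 500, baseline_ms=100)
```

Because 498.25 Hz does not divide evenly into milliseconds, epoch boundaries are rounded to the nearest sample, so latencies are only accurate to about 2 ms.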
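The window selection used for the statistical analysis can be sketched as follows: find the absolute maximum of the difference wave, center a 50 ms window on it, and average the amplitude inside that window. The function names and the inclusive window bounds are assumptions for this illustration. As a check against the reported values, a peak at 149 ms yields the 124-174 ms window listed for monkey A in Experiment 2.

```python
def peak_window(standard, deviant, times_ms, width_ms=50.0):
    """Return (start_ms, end_ms) centered on the absolute maximum of the
    deviant-minus-standard difference wave."""
    diff = [d - s for d, s in zip(deviant, standard)]
    peak_idx = max(range(len(diff)), key=lambda i: abs(diff[i]))
    center = times_ms[peak_idx]
    return center - width_ms / 2.0, center + width_ms / 2.0

def mean_amplitude(wave, times_ms, window):
    """Mean of the samples whose latency lies inside the window (inclusive)."""
    lo, hi = window
    vals = [v for v, t in zip(wave, times_ms) if lo <= t <= hi]
    return sum(vals) / len(vals)

# Toy waves sampled every 1 ms, with a single negative deflection at 149 ms.
times = list(range(300))
standard = [0.0] * 300
deviant = [0.0] * 300
deviant[149] = -8.0
window = peak_window(standard, deviant, times)
amp = mean_amplitude(deviant, times, window)
```

Using the absolute maximum means the window follows the largest deflection regardless of its polarity, so the sign of the subtraction does not affect where the window is placed.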

Pitch Deviants Evoke an MMN-like Response
In Experiment 1 we presented two rhesus monkeys with a sequence of sounds using a two-tone oddball paradigm (see Methods) to see whether an MMN-like response can be elicited.
Figure 3 shows that the electrical brain responses elicited by the standard and deviant stimulus are different for both monkeys, with a morphology comparable to a human MMN, though with a shorter latency (peaks around 90 ms, instead of 150 ms) and slightly larger amplitude as compared to humans (around 10 µV, instead of 5 µV) [23]. These differences in latency and amplitude can be attributed to the anatomical differences between human and monkey brains (e.g., skull size, thickness, and the distribution of musculature [24]).
An MMN-like response was found for the deviant responses as compared to physically identical standards in a time window centered on the absolute maximum of the difference waves (D500 minus S500, D1500 minus S1500; see Table 1 and gray-shaded windows in Figure 3).
The results show that physically identical deviant and standard stimuli elicited different responses. The average amplitude of the responses for both monkeys tended to be large in the frontal and central areas, similar to a human MMN [23]. Table 1 shows the mean amplitudes for monkey A and monkey Y, for each condition, stimulus type and electrode position. There was no indication of hemispheric differences.
These results are in line with another study showing an MMN-like response in a single chimpanzee (Pan troglodytes) [16] using the same two-tone oddball paradigm with scalp-recorded EEG. Together with the current experiment these studies provide evidence that ERP and MMN can be measured in both monkeys and apes.

Omissions Evoke an MMN-like Response
To study whether an MMN can be elicited in response to omissions as well, the same rhesus monkeys were presented with a tone sequence in which tones were omitted (i.e., replaced by silence, see Methods).
Figure 4 shows the electrical brain responses elicited by the standard (S) and the deviant (D; an omission). (Note that Figure 4 shows a time window with three repetitions of the standard tone, marked by rectangles at either side of the time line.) This allows for a comparison of the responses to the first and second tone after the omission. To test the effects of the omission we concentrate on the time range closest to the occurrence of the omission (see Methods; Table 2). In both monkeys the standard stimuli elicit a steady-state response with increased amplitude, phase-aligned to the stimuli. For the first tone after the omission (see Figure 4), neural activity increased following the short period of silence, most notably in monkey Y, but returned to near previous levels by the second tone. This could also be interpreted as a response marking the beginning of a rhythmic group [25].
Mean amplitudes of responses elicited by standard and deviant stimuli were measured within a time window centered on the absolute maximum of the D minus S difference waves (see Table 2 and gray-shaded windows in Figure 4).
Again for both monkeys the average amplitude tended to be large in the frontal and central areas, without any laterality effects.
The ERP responses to the omission (red lines in Figure 4) have a morphology comparable to human MMN (i.e., negative in early latencies). However, the polarity of the responses, probably due to inter-individual differences, was different in the two monkeys. Nevertheless, there is a small but significant amplitude difference between the standard tone and the omission in a time range comparable to human MMN [21,26], suggesting that the omission was indeed detected.

Rhesus Monkeys do not Detect 'Loud Rests', but are Sensitive to Rhythmic Grouping
In Experiment 3 we presented the same two rhesus monkeys with complex stimuli consisting of sound sequences based on a typical rock drum accompaniment pattern (see Figure 1).
The standard stimuli are four randomly presented and strictly metrical sound patterns (S1-S4), with a deviant pattern (D) in which the 'downbeat' is omitted. Human adults perceive the D pattern within the context of standards as if the rhythm was broken, stumbled, or became strongly syncopated for a moment [20]. We refer to the omission at the start of D as a 'loud rest' and the omissions in S2-S4 as 'silent rests'; music theory suggests the former to sound 'syncopated' (a violation of a metric expectation) and the latter not [3].
A sequence repeating the D pattern 100% of the time was also presented ('deviant-control' or D control) to allow controlling for acoustic effects on the ERP.
On the basis of the dissociation hypothesis, and the observation that monkeys apparently cannot synchronize to a beat [7] but are sensitive to auditory timing [12], one might expect that monkeys are sensitive to rhythmic structure (interval-based timing) but not to metric structure (beat-based timing). This hypothesis predicts that omissions that play a role in rhythmic grouping [27] can be detected, as they mark the structure of a rhythmic pattern (as is the case in D control), consequently not eliciting an MMN as they are part of the regularity. In contrast, the omissions that do not affect the rhythmic grouping will not be detected as part of a regularity, since they occur irregularly (as is the case in S2-S4 and D), and hence may elicit an MMN.
In humans these differences in salience appear to be related to the coding of an internal representation of the rhythmic structure of a sound pattern [27], with the first sound after a relatively long inter-onset interval determining the rhythmic group structure [25]. If this is the case we expect the first sound of a repeated rhythmic pattern (D control), but not of a randomly inserted pattern (D), to elicit a response marking the beginning of a rhythmic group [25].
An alternative hypothesis is based on the observations made in human adults and newborns using the same stimuli and experimental paradigm [17,18,19,20]. This hypothesis predicts that primates are not only able to sense rhythmic grouping, but are also able to detect the regular beat that is induced by a varying rhythmic stimulus. The perception of a 'loud rest' (a violation of a temporal expectation reflected by an MMN-like signal) can serve as evidence for the presence of a strong metric expectation [3]. This hypothesis predicts a large and early MMN for the omission in the deviant (D, containing a 'loud rest'), but no or a considerably smaller MMN for the omissions in the standards (S2-S4, containing 'silent rests'). And since the omission in the deviant-control (D control) is expected, because the pattern is presented repeatedly, no MMN is predicted there either. If these three aspects are observed (as they were found in human adults and newborns [20]), they suggest that a regular beat is extracted from the auditory stimulus. This could be interpreted as evidence against the vocal learning hypothesis.
Figure 5 shows that the electrical brain responses elicited by omissions in the standard (S2-S4) and deviant-control (D control) are relatively flat, and different from the deviant (D), with the latter eliciting a more pronounced negative peak, most notably in monkey Y. This suggests a result similar to that found in human adults and newborns. However, the ERP response to S1 (dotted black line in Figure 5) is not different from that in response to D (solid red line in Figure 5), while D contains an omission and S1 does not. This seriously weakens the interpretation that the monkeys are able to extract the beat from the stimulus.
Mean amplitudes of responses elicited by standard and deviant stimuli were measured within a time window centered on the absolute maximum of the D minus S2-4 difference waves (see Table 3 and the early gray-shaded windows in Figure 5).
So in short, while there is a difference between D (containing a 'loud rest') and S2-S4 (containing 'silent rests'), and as such evidence in support of beat perception, there is no difference between D and S1: a pattern with and without an omission. This makes the interpretation that the monkeys are detecting the beat (by distinguishing 'loud rests' from 'silent rests') less likely and leads to the alternative hypothesis that the monkeys are solely detecting rhythmic groups [21][22]: the first note of a rhythmic group (separated by an omission) eliciting an MMN-like response in D control (but not in D).
Mean amplitudes were measured in a late time window just after the first tone (after 200 ms), centered on the absolute maximum of the D minus D control difference waves (see Table 4 and the late gray-shaded windows in Figure 5).
These results suggest that the monkeys are actually sensing surface-level rhythmic grouping (i.e., detecting the start of a repeating rhythmic group) instead of the induced beat (i.e., detecting a regular pulse in a varying rhythmic pattern). As such, we have to conclude that rhesus monkeys, contrary to what has been shown for human adults and newborns, show no sign of representing the beat in music, but apparently do represent rhythmic groups.

Discussion and Conclusion
Electrophysiological measures such as event-related brain potentials (ERP) are a useful tool in the study of beat induction and the metrical encoding of rhythm, especially in examining its predictive nature [3]. An informative component of ERP is the mismatch negativity (MMN): a negative deflection in the brain signal that occurs if something unexpected happens while listening (even during passive listening) [23]. This MMN is generally thought to reflect an error signal that is elicited when incoming sensory information does not match the expectations created by previous information. Abstract information (i.e., one auditory feature predicting another) and omissions [21,26] can also cause an MMN, resulting in an interpretation of the MMN as reflecting the detection of regularity-violations as part of a predictive process, rather than just sample matching to sensory memory [28].
In the current study we demonstrate for the first time that an MMN-like ERP component can be measured in rhesus monkeys (Macaca mulatta), both for pitch deviants (Experiment 1) and omissions (Experiment 2). Together these results provide support for the idea that ERP and MMN can be used as an index of the detection of regularity-violations in an auditory signal in rhesus monkeys.
In addition, we showed that rhesus monkeys are not able to detect the regularity induced by a varying rhythm, while being sensitive to the rhythmic grouping structure. These findings are in support of the hypothesis that beat induction (the cognitive mechanism that supports the perception of a regular pulse from a varying rhythm) is species-specific, and it is likely restricted to vocal learners such as a selected group of bird species [4], while absent in nonhuman primates such as rhesus monkeys [7]. This is evidence in support of the vocal learning hypothesis.
Furthermore, the results are in line with the auditory timing dissociation hypothesis, suggesting rhythm perception to be distinct from beat perception [10,11,12]. However, the current paradigm, with just a few electrodes measuring EEG, does not allow us to say anything about the brain networks that might be involved. For this, fMRI and other brain imaging techniques with a high spatial resolution are needed [29].
And finally, the current study suggests, together with the few existing studies on auditory [16] and visual [30] processing in monkeys, EEG to be a worthwhile, non-invasive alternative in the study of cognitive and neural processing in primates.

Figure 1. Schematic diagram of the rhythmic stimulus patterns used in Experiment 3 (Adapted from Honing et al., 2009). doi:10.1371/journal.pone.0051369.g001

In all three experiments channel Cz was used for the latency measurements. The resulting values were fed into an analysis of variance (ANOVA), where Electrode sites were treated as a within-subject variable and all other variables as grouping variables. For Experiment 1 the factors Stimulus (500 Hz vs. 1500 Hz) × Type (Deviant vs. Standard) × Electrode (Fz vs. Cz vs. Pz vs. F3 vs. F4) were used, for Experiment 2 Type (Omission vs. Sound) × Electrode (Fz vs. Cz vs. Pz vs. F3 vs. F4), and for Experiment 3 Type (Deviant vs. Deviant control vs. S1-4) × Electrode (Fz vs. Cz vs. Pz vs. F3 vs. F4). Greenhouse-Geisser correction was used where necessary (corrected p, df and epsilon values reported).

Figure 3. Event-related potentials at Cz for Experiment 1. Zero-aligned ERP responses for standard (S500, S1500) and deviant (D500, D1500) tones for monkey A and monkey Y. Stimulus positions are marked with rectangles; the gray-shaded areas indicate the time windows used in the statistical analysis (see Table 1 for details on the time ranges used). doi:10.1371/journal.pone.0051369.g003

Figure 4. Event-related potentials at Cz for Experiment 2. Zero-aligned ERP responses for standard (tone) and deviant (omission) for monkey A and monkey Y. Stimulus positions are marked with rectangles; the gray-shaded areas indicate the time windows used in the statistical analysis (see Table 2 for details on the time ranges used). doi:10.1371/journal.pone.0051369.g004

Figure 5. Event-related potentials at Cz for Experiment 3. Omission-aligned ERP responses for the standard (S2-S4; solid blue line), deviant (D; solid red line), and deviant-control (D control; dashed red line). The standard without omission (S1; dotted black line) is shown zero-aligned with both deviants (D and D control) for comparison. The gray-shaded areas indicate the time windows used in the statistical analysis (see Tables 3 and 4 for details on the time ranges used). doi:10.1371/journal.pone.0051369.g005

Table 2. Mean amplitudes of standard- and deviant-waves for each scalp position (Experiment 2).
Note. Mean amplitudes (µV) are indicated with SE values in parentheses. S: values for standard stimuli; D: values for deviant stimuli (omissions); n: number of epochs. The time windows adopted are 124-174 ms for monkey A and 77-127 ms for monkey Y (see Methods). doi:10.1371/journal.pone.0051369.t002

Table 3.
Note. Mean amplitudes (µV) are indicated with SE values in parentheses. S1-4: values for standard stimuli; D: values for deviant stimuli; D control: values for deviant-control stimuli; n: number of epochs. The time windows adopted are 105-155 ms for monkey A and 73-123 ms for monkey Y (see Methods). doi:10.1371/journal.pone.0051369.t003

Table 4. Mean amplitudes of standard- (S1-4), deviant- (D), and 'deviant-control'-waves (D control) in the late window (just after the first sound) for each stimulus type and scalp position (Experiment 3).
Note. Mean amplitudes (µV) are indicated with SE values in parentheses. S1-4: values for standard stimuli; D: values for deviant stimuli; D control: values for deviant-control stimuli; n: number of epochs. The time windows adopted are 214-264 ms for monkey A and 220-270 ms for monkey Y (see Methods). doi:10.1371/journal.pone.0051369.t004