
Behavioral Quantification of Audiomotor Transformations in Improvising and Score-Dependent Musicians

  • Robert Harris,

    Affiliations Department of Neurology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands, BCN Neuroimaging Center, University of Groningen, Groningen, The Netherlands, Prince Claus Conservatoire, Hanze University of Applied Sciences, Groningen, The Netherlands

  • Peter van Kranenburg,

    Affiliation Meertens Institute, Amsterdam, The Netherlands

  • Bauke M. de Jong

    Affiliations Department of Neurology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands, BCN Neuroimaging Center, University of Groningen, Groningen, The Netherlands


The historically developed practice of learning to play a music instrument from notes instead of by imitation or improvisation makes it possible to contrast two types of skilled musicians characterized not only by dissimilar performance practices, but also by disparate methods of audiomotor learning. In a recent fMRI study comparing these two groups of musicians while they either imagined playing along with a recording or covertly assessed the quality of the performance, we observed activation of a right-hemisphere network of posterior superior parietal and dorsal premotor cortices in improvising musicians, indicating more efficient audiomotor transformation. In the present study, we investigated the detailed performance characteristics underlying the ability of both groups of musicians to replicate music on the basis of aural perception alone. Twenty-two classically trained improvising and score-dependent musicians listened to short, unfamiliar two-part excerpts presented through headphones. They played along with or replicated the excerpts by ear on a digital piano, either with or without aural feedback. In addition, they were asked to harmonize some of the excerpts or to transpose them, either to a different key or to the relative minor. MIDI recordings of their performances were compared with recordings of the aural model. Concordance was expressed in an audiomotor alignment score computed with the help of music information retrieval algorithms. Comparisons of groups, voices, and tasks revealed significant differences in alignment scores. The present study demonstrates the superior ability of improvising musicians to replicate both the pitch and rhythm of aurally perceived music at the keyboard, not only in the original key, but also in other tonalities. 
Taken together with the enhanced activation of the right dorsal frontoparietal network found in our previous fMRI study, these results underscore the conclusion that the practice of improvising music can be associated with enhanced audiomotor transformation in response to aurally perceived music.


Classical music offers an interesting window on motor learning, not only because of the high level of motor control exhibited in performance [1], but especially because of the historically developed practice in Western culture of using sheet music not only to learn specific pieces, but also to learn how to play the instrument itself [2]. While classical music education places great emphasis on aural skills such as the identification of intervals, triads, and their inversions, the skill of playing music ‘by ear’ is rarely taught or assessed. Classical musicians are de facto ‘score-dependent’, a term which refers not only to the fact that the music performed is an artistic representation of the music score, but also to the fact that it is learned from the printed score and not by aural imitation. From a global perspective, however, score-dependence may be considered the exception. All over the world, both in the past and in the present, instrumental music was and is generally learned by imitation and improvisation [3], a practice which intuitively seems more compatible with the learning of an audiomotor skill. Surprisingly, however, little research has addressed the relationship between the practice of improvisation and the development of audiomotor integration.

With few exceptions, neuroscientific studies to date have recruited mainly classically trained musicians [4–13]. Studies contrasting improvising with score-dependent musicians are scarce. Tervaniemi and colleagues [14] noted an advantage for improvising musicians not only in the conscious detection of changes in melodic patterns, but also in subsequent brain responses to these changes during non-attentive listening. These results were corroborated and extended in a study by Vuust and colleagues [15] contrasting non-musicians not only with classical, but also with jazz and rock musicians. In contrast with all other groups, including classical musicians, jazz musicians exhibited significantly higher mean mismatch negativity (MMN) amplitudes to deviations in pitch, timbre, intensity, and rhythm. Behavioral scores as measured by AMMA, the Advanced Measures of Musical Audition [16], were not higher for jazz musicians, with the exception of the rhythm subtest. In a behavioral study, however, Woody and Lehmann [17] demonstrated that ‘vernacular’ musicians outperformed ‘formal’ musicians in aural learning, the latter requiring twice the number of trials to achieve accuracy in vocal reproduction of a melody and almost three times as many trials to achieve accuracy in instrumental reproduction of a melody (by ear).

The observed differences may be understood in the context of the procedural-declarative model of learning and memory [18] and the associated dual-stream model of action and perception [19] which propose that online recruitment of implicit, procedural knowledge via the dorsal stream enhances performance without the prerequisite of declarative knowledge [20], a phenomenon which can be observed in children who have clearly mastered the language, but know little about grammar. Musicians who play ‘by ear’ would therefore be able to employ procedural knowledge of music syntax to enhance audiomotor performance without knowing much about music theory or harmony. At the same time, score-dependent musicians who have acquired extensive declarative knowledge of music theory and harmony might not necessarily have acquired comparable procedural knowledge of music syntax.

The transformation of imagined or perceived pitch into goal-directed movement while playing a music instrument is a function of parietal cortex, as is the transformation of visually perceived music notation. The involvement of parietal cortex in audiomotor transformations has been demonstrated by imaging studies implicating the superior parietal cortex, in particular the intraparietal sulcus (IPS), not only in music transposition [21] and retrograde musical transformations [22], but also in pitch-to-space transformations [23]. Similar parietal activations have also been observed in pianists while sight-reading music and have been interpreted as reflecting the visuomotor transformation of music notation into spatial keyboard coordinates [24, 25, 2]. It is conceivable that score-dependent performance practice might bias sensorimotor learning in the direction of visuomotor rather than audiomotor learning. The inability to play ‘by ear’ would then be a logical consequence.

In a recent fMRI experiment, we assessed cerebral activations in improvising and score-dependent musicians while they imagined playing along with both familiar and unfamiliar excerpts composed in the two-part, tonal style. A crucial difference between the two groups was the significantly larger activation of a right hemisphere network of posterior superior parietal and dorsal premotor areas observed in improvising musicians. This was interpreted as evidence of enhanced pitch-to-keyboard space transformation, pointing towards the superior ability of improvising musicians to perform audiomotor transformations while listening to music [26].

In the present study, we investigated the instrumental performance of both groups of musicians, quantifying their ability to organize playing movements cued by aurally perceived music in an audiomotor alignment score. Our hypothesis was that improvising musicians would exhibit superior ability to replicate and transpose aurally perceived music on their instrument.

Materials and Methods

This study was approved by the Medical Ethics Committee of the University Medical Center Groningen, Groningen, The Netherlands. All subjects gave written informed consent in accordance with the Declaration of Helsinki (2008), prior to participation.


The improvising and score-dependent musicians who participated in this study had all previously participated in an fMRI study of audiomotor integration [26]. The group of improvising musicians consisted of eleven organists and one pianist, while the group of score-dependent musicians consisted of ten pianists. All subjects were male. They were recruited from all over The Netherlands. The distinction between improvising and score-dependent musicians was not based on formal assessment of their ability to improvise, but on the nature of their performance practice, i.e., whether or not their professional keyboard performances involved improvisation.

In The Netherlands, the eighteenth-century practice of keyboard improvisation has persisted among church organists. Organists are accustomed to improvising preludes and postludes before and after the service, as well as introductions, intermezzos, and modulations while harmonizing and accompanying psalms and hymns. Organ concerts feature improvisation, and many organists participate in improvisation competitions. Of the eleven organists, seven had participated in international improvisation competitions and six had won prizes. By contrast, the professional practice of the participating score-dependent pianists involved performance of the repertoire as notated and did not include extemporization. Like most classically trained musicians, these pianists learn the pieces they perform from the printed score, frequently committing them to memory and performing by heart.

With the exception of two conservatory students (third and fourth year, one from each group), subjects had all completed one or more conservatory degrees in classical music performance in organ or piano. During their professional training, organists received the same instruction in music theory and ear training as score-dependent pianists and performed compositions learned from music notation, just as their score-dependent colleagues did. As piano is the required secondary instrument for organists in The Netherlands, all organists were able to play both the piano and the organ, making it possible to compare performance in the two groups using the same instrument. Mean age of the improvising group was 46 years (SD: ±14); one subject was left-handed and one subject had perfect pitch. In the score-dependent group, mean age was 39 (SD: ±13); two subjects were left-handed and three had perfect pitch (see Table 1). There was no significant difference between the two groups in age (T-value: 1.15; p = 0.265) or years of professional experience (T-value: 0.96; p = 0.347). Professional experience was defined as the number of years since completion of the propaedeutic exam which, in the Dutch educational system, marks formal admission to the second year of the conservatory.

Experimental procedure

Subjects performed six different tasks on a digital piano on the basis of aural perception of short (approximately 6 s) excerpts from polyphonic pieces in the two-part, tonal style, consisting of a bass and a treble voice of equal rhythmic and melodic salience. For examples of the music excerpts, see S1 Transcriptions. Excerpts were presented in six blocks, each devoted to a separate task. With the exception of the first excerpt in each block, which was used to rehearse the task, the excerpts could all be considered unfamiliar, having been selected from pieces composed specifically for the fMRI experiment [26] and therefore heard only once, in the scanner.

Twenty-seven different excerpts were presented. Each excerpt was used only once during the experiment, with the exception of six excerpts from blocks 1 and 2 (without aural feedback), which were later presented (with feedback) in a different tonality: four as the first motif of one of the sequences in block 5 and two in a transposition task. Each excerpt comprised a complete motif or phrase.

Tasks and conditions

Six tasks were performed on a digital piano under one of two conditions: with aural feedback or without (silent keyboard mode). Tasks were presented in six blocks, each containing five to eight excerpts (the number of excerpts is indicated in parentheses):

  1. Play along [no aural feedback] (n = 8): subjects were instructed to play together (simultaneously) with two consecutively presented recordings of the excerpt, without aural feedback. The tonality, which was the same for all excerpts, was announced before the task started.
  2. Replicate [no aural feedback] (n = 5): subjects were instructed to listen to the excerpt twice and then play it themselves, without aural feedback. The tonality, which was the same for all five examples, was announced before the task started.
  3. Replicate and then transpose to the relative minor [aural feedback] (n = 5): subjects were instructed to listen to the excerpt twice, a) play it once in the same (major) key and then b) transpose it to the relative minor.
  4. Replicate, adding inner voices [aural feedback] (n = 5): subjects were instructed to listen to the excerpt twice and then play it, adding inner voices belonging to the harmony.
  5. Replicate [aural feedback] (n = 7): subjects were instructed to listen to the excerpt twice and then play it in the same key. Subjects were informed that the excerpts used in this task each contained a sequential repetition. For an example, see excerpt 5 (S1 Transcriptions).
  6. Replicate and then transpose [aural feedback] (n = 5): subjects were instructed to listen to the excerpt twice, a) play it once in the same key (all excerpts were in G minor) and then b) transpose it to E minor.

While all tasks and conditions involved a form of replication of the aural model, they were also designed to promote recruitment of implicit knowledge of music and music syntax. Performance without aural feedback, for example, was designed to elicit top-down recruitment of procedural knowledge of the tonality. The two-part style used in all tasks was expected to induce disambiguation of the harmony based on prior experience, and the obligation to add inner voices in block 4 might enhance this effect. The sequential repetition found in all excerpts in block 5 was designed to recruit knowledge of both harmony and tonality, as were the transposition tasks in blocks 3 and 6.

Data acquisition

The music excerpts were performed by one of the researchers (RH), a professional pianist, on an AKAI MPK88 piano-action MIDI (Musical Instrument Digital Interface) controller, without pedal, using the Yamaha C7 sound of Steinberg ‘The Grand 3’, and recorded as MIDI sequences in Cubase AI5 using a Steinberg CI2 audio interface. Unlike an audio recording, MIDI registers which keys are depressed, when, and with what velocity, allowing direct digital analysis. The choice not to use the pedal was motivated by the concern that it might compromise the independence of the voices in the two-part tonal style and/or confound the analysis of the MIDI sequences. Every effort was made to achieve an ecologically valid concert performance despite the use of an electronic instrument.

Data acquisition made use of the same instrument used to record the aural model. Audio was presented through Stagg SHP-2300 stereo headphones. Subjects familiarized themselves with the keyboard prior to acquisition and adjusted the volume themselves. Before the experiment started, the protocol was explained in detail, block by block, with the help of printed instruction material. Subjects then rehearsed the first excerpt from each block. Subjects were told that each of the six blocks would consist of at least five examples: one previously rehearsed excerpt and four or more unfamiliar excerpts.

During acquisition, the instructions for each block were repeated before it began, and the subject was reminded that the first example had already been rehearsed. A spoken prompt announced that the recording was about to begin. Subjects then heard the pitches of the first beat of the ensuing recording or, if the example began with an upbeat, both the upbeat and the first beat. After a few seconds, a woodblock (tuned to the first beat) indicated the tempo by playing a full bar at the tempo of the recording, one note per beat. Then the presentation of the music began. Subjects were allowed to hear each example twice, with an empty bar between presentations. During the empty bar, the woodblock kept time, playing on every beat. The time allowed for each task was three times the length of the aural model.


Analysis consisted of a comparison between the original MIDI sequences used as the aural model and the MIDI sequences produced by the subjects during acquisition. The original MIDI sequences of the aural model were edited into separate treble and bass MIDI sequences in Cubase. For the transposition tasks, the sequences were transposed to the new key in Cubase rather than re-recorded, in order to preserve timing and expression. The MIDI sequences produced during acquisition were likewise edited into separate treble and bass sequences. Searching for the right keys (the ‘finding of the right key’) was edited out of the MIDI sequences, as were false starts, in which subjects played the first few beats, stopped, and then began again. In a few cases, subjects did not respond to the aural model, and in a few others only one voice, usually the treble, was played. The inner voices from block 4 were edited out of the MIDI sequences, as were all other additional improvised voices.

In block 1, a large number of subjects did not play along with the first presentation of each excerpt. To avoid bias, this first presentation was discarded before analysis, not only for the subjects who did not respond, but also for those who immediately played along with it. The rehearsed excerpt from each block was also discarded before analysis. The total number of MIDI sequences therefore came to a maximum of thirty-seven per subject, per voice. After editing, the average number of MIDI sequences was 35.8 (±1.5) for the treble voice and 35.4 (±1.7) for the bass. If no MIDI sequence was acquired, it was treated as a missing value.

The similarity of the aural model and the performance of the subject was determined by constructing an alignment. This approach has often been used in musicology, especially in folk song research, where it has been used to study the variability of melodies in oral transmission [27]. Algorithmic alignment of melodies was proposed by Mongeau and Sankoff [28]. In this approach, the steps to construct an alignment of two melodies are explicitly formulated such that they can be executed by a computer. The general procedure is to provide the algorithm with two sequences of symbols (notes in our case), after which the algorithm returns the optimal alignment of the two sequences together with a score indicating the extent to which the sequences could be aligned. In the present study, we used this score as a proxy for the similarity between the aural model and the subject’s performance, both of which are represented as sequences of MIDI events.

In recent years, alignment algorithms have often been employed in Computational Musicology and Music Information Retrieval [29–31]. The aim of an alignment algorithm is to find the alignment (or one of the alignments) with the highest score. Since the solution space is quite large, a dynamic programming approach is generally taken to find the optimal alignment efficiently. In the simplest form, the optimal alignment and its score are found by filling a matrix D recursively according to:

$$D(i,j) = \max\begin{cases} D(i-1,j) + \gamma \\ D(i,j-1) + \gamma \\ D(i-1,j-1) + S(x_i, y_j) \end{cases}$$

in which $x: x_1,\ldots,x_i,\ldots,x_n$ and $y: y_1,\ldots,y_j,\ldots,y_m$ are the sequences to be aligned, $S(x_i, y_j)$ is a similarity measure for arbitrary symbols, and $\gamma$ is the (fixed) gap score, i.e., the score assigned whenever a note in one sequence is aligned with a gap because it has no counterpart in the other sequence (for example, a note of the aural model omitted in the replication, or an extra note added to it). The boundary conditions are $D(0,0) = 0$, $D(i,0) = i\gamma$, and $D(0,j) = j\gamma$. $D(i,j)$ contains the score of the optimal alignment up to symbols $x_i$ and $y_j$ of sequences $x$ and $y$, respectively, and therefore $D(n,m)$ contains the score of the optimal alignment of the complete sequences. The algorithm has both time and space complexity $O(nm)$, which is quadratic. This algorithm is known as the Needleman-Wunsch algorithm [32]. For further details, see S1 Appendix.
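
The recursion above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes a linear (fixed) gap score, computes only the score $D(n,m)$ without the traceback, and uses a hypothetical match/mismatch pitch similarity (+1 for identical MIDI note numbers, -1 otherwise) in place of the study's own substitution function.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def needleman_wunsch_score(
    x: Sequence[T],
    y: Sequence[T],
    sim: Callable[[T, T], float],
    gap: float,
) -> float:
    """Return D(n, m), the score of the optimal global alignment of x and y.

    Boundary conditions: D(0,0) = 0, D(i,0) = i*gap, D(0,j) = j*gap.
    """
    n, m = len(x), len(y)
    # (n+1) x (m+1) dynamic-programming matrix.
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = max(
                D[i - 1][j] + gap,                          # x_i aligned with a gap
                D[i][j - 1] + gap,                          # y_j aligned with a gap
                D[i - 1][j - 1] + sim(x[i - 1], y[j - 1]),  # x_i aligned with y_j
            )
    return D[n][m]


def pitch_match(a: int, b: int) -> float:
    # Hypothetical +1/-1 similarity on MIDI note numbers, for illustration only.
    return 1.0 if a == b else -1.0


model = [67, 69, 71, 72]     # hypothetical aural model: G4 A4 B4 C5
performance = [67, 71, 72]   # the same phrase with the A4 omitted
print(needleman_wunsch_score(model, performance, pitch_match, gap=-0.5))  # → 2.5
```

Three notes match (+3) and the omitted note costs one gap (-0.5), which is the best any alignment of these two sequences can achieve.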

To apply this algorithm to melodies, or in this case MIDI sequences, three abstract elements of the algorithm need to be defined: 1) the symbols, 2) the substitution score function, and 3) the gap score $\gamma$. As we were dealing with MIDI, we took each note event in the MIDI sequence (onset, pitch, duration) as a symbol. We subsequently determined a substitution score function $S(x_i, y_j)$ and the gap score $\gamma$. The intuitive meaning of the substitution score function is: the higher the substitution score of two symbols, the more we want them to be aligned. In general, this implies that the substitution score function will be defined as a similarity measure for symbols. Different properties of the notes can be used to define this function; for the present study, we used exact pitch and IOR (interonset interval ratio). Both are available in, or computable from, the MIDI input. To represent pitch, we used the MIDI representation, which corresponds to an index of the keys of the keyboard in which a1 (A440) = 69.
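
Because MIDI pitch is a key index, two notes agree in exact pitch only if their note numbers are identical, and adjacent keys differ by 1 (one semitone). The short sketch below relates note numbers to frequency, assuming twelve-tone equal temperament tuned to A440; the alignment itself only compares the integer note numbers.

```python
def midi_to_hz(note: int, a4_hz: float = 440.0) -> float:
    """Frequency of a MIDI note number in 12-tone equal temperament.

    MIDI note 69 is a1 (A440); each step of 1 is one semitone, i.e. one key.
    """
    return a4_hz * 2.0 ** ((note - 69) / 12)


print(midi_to_hz(69))  # → 440.0
print(midi_to_hz(60))  # c1 (middle C), roughly 261.6 Hz
```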

The IOR of a given note is the ratio between the IOI (interonset interval) of the note and the IOI of the previous note, where the IOI of a note is defined as the difference in time between the onset of one note and the onset of the next. The IOR can be considered to be the relative duration of a given note with respect to the previous note. For the last note in the sequence we defined the IOI as the duration of that note. For the first note in the sequence, we set IOR to 1.
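
The IOI and IOR definitions above translate directly into code. The sketch below is illustrative only: the onset values are hypothetical and expressed in seconds, and the functions assume a single voice with strictly increasing onsets (simultaneous onsets would give an IOI of 0 and a division by zero).

```python
from typing import List, Sequence


def interonset_intervals(onsets: Sequence[float], last_duration: float) -> List[float]:
    """IOI of each note: the onset of the next note minus its own onset.

    The last note has no successor, so its IOI is taken to be its duration.
    """
    if not onsets:
        return []
    iois = [onsets[k + 1] - onsets[k] for k in range(len(onsets) - 1)]
    iois.append(last_duration)
    return iois


def interonset_ratios(iois: Sequence[float]) -> List[float]:
    """IOR of each note: its IOI divided by the IOI of the previous note.

    The first note has no predecessor, so its IOR is set to 1.
    """
    if not iois:
        return []
    return [1.0] + [iois[k] / iois[k - 1] for k in range(1, len(iois))]


# Hypothetical onsets (seconds): two quarters, two eighths, then a half note.
onsets = [0.0, 0.5, 1.0, 1.25, 1.5]
iois = interonset_intervals(onsets, last_duration=1.0)
print(iois)                     # → [0.5, 0.5, 0.25, 0.25, 1.0]
print(interonset_ratios(iois))  # → [1.0, 1.0, 0.5, 1.0, 4.0]
```

Because the IOR is a ratio, playing the same rhythm uniformly faster or slower leaves it unchanged, so a rhythmically correct replication at a slightly different tempo is not penalized.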

We defined two substitution score functions, $S_{\text{pitch}}$ for exact pitch and $S_{\text{ior}}$ for IOR, in which $p(s)$ is the pitch of symbol $s$ in MIDI encoding, and where $\text{ior}(x_i) = \text{ioi}(x_i)/\text{ioi}(x_{i-1})$, in which $\text{ioi}(x_i)$ is the time difference between the onsets of $x_i$ and $x_{i+1}$. We defined the gap score as $\gamma = -0.5$ for exact pitch and $\gamma = 0$ for IOR. When we used $S_{\text{pitch}}$, we obtained a value for the similarity of the aural model and the recorded MIDI sequence with respect to the sequence of pitches; when we used $S_{\text{ior}}$, we obtained a value for their similarity with respect to the sequence of IORs, which reflects rhythmic similarity.

Since the score of an alignment depends on the length of the MIDI sequences, normalization is needed to compare different alignment scores. Otherwise, the alignment of two similar long sequences would result in a much higher score than the alignment of two similar short sequences. We therefore divided the alignment score by the length of the alignment, which is the length of sequence x plus the number of gaps inserted in x (or, equivalently, the length of sequence y plus the number of gaps inserted in y). Thus, an exact match results in a score of 1: the maximum value of our substitution score functions is 1 and no gaps are needed, so the raw alignment score equals the length of the sequences. Anything less than an exact match results in a score lower than 1. The scores reported in this paper are the normalized alignment scores.
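
The normalization step can be sketched as follows, assuming the optimal alignment has already been recovered (for example by tracing back through the matrix D) and is given as a list of columns, with `None` marking a gap. The +1/-1 pitch similarity is again a hypothetical stand-in for the study's substitution function.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

# One alignment column: (model note, performed note); None marks a gap.
Column = Tuple[Optional[Any], Optional[Any]]


def normalized_alignment_score(
    alignment: Sequence[Column],
    sim: Callable[[Any, Any], float],
    gap: float,
) -> float:
    """Alignment score divided by the alignment length (number of columns).

    The length equals len(x) plus the gaps inserted in x, which is the same
    as len(y) plus the gaps inserted in y.
    """
    if not alignment:
        raise ValueError("empty alignment")
    total = 0.0
    for a, b in alignment:
        total += gap if a is None or b is None else sim(a, b)
    return total / len(alignment)


def pitch_match(a: int, b: int) -> float:
    # Hypothetical +1/-1 similarity on MIDI note numbers, for illustration only.
    return 1.0 if a == b else -1.0


omitted_note = [(67, 67), (69, None), (71, 71), (72, 72)]
exact = [(67, 67), (69, 69), (71, 71), (72, 72)]
print(normalized_alignment_score(omitted_note, pitch_match, gap=-0.5))  # → 0.625
print(normalized_alignment_score(exact, pitch_match, gap=-0.5))         # → 1.0
```

An exact four-note match scores 1.0 regardless of length, while one omitted note lowers the score to 2.5 / 4 = 0.625.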

Our main goal was to assess the differing ability of improvising and score-dependent musicians to replicate aurally perceived music at the piano. Accordingly, mean alignment scores for the four variables (treble exact pitch, treble IOR (interonset interval ratio), bass exact pitch, and bass IOR) were subjected to a one-way multivariate analysis of variance (MANOVA) to determine the significance of the difference between groups. Subsequently, differences of means were tested for each of the four variables using one-way ANOVA. Interactions between the factors group (improvising, score-dependent), voice (treble, bass), and parameter (exact pitch, IOR) were studied using a three-way mixed (between-subjects/within-subjects/within-subjects) analysis of variance (ANOVA).
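
To make the one-way group comparison concrete, the sketch below computes the one-way ANOVA F statistic (between-group mean square divided by within-group mean square) in plain Python. It is illustrative only: the study does not name its statistics software, the input values are toy numbers rather than study data, and no p-value is computed (a routine such as SciPy's `scipy.stats.f_oneway` returns both F and p). With two groups of 12 and 10 subjects, the degrees of freedom are (k - 1, N - k) = (1, 20), consistent with the F(1, 20) values reported in the Results.

```python
from statistics import fmean
from typing import Sequence


def one_way_anova_f(*groups: Sequence[float]) -> float:
    """One-way ANOVA F statistic: MS_between / MS_within.

    Degrees of freedom are k - 1 (between) and N - k (within),
    for k groups and N observations in total.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([v for g in groups for v in g])
    group_means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Toy values for two groups of three (not study data).
print(one_way_anova_f([1, 2, 3], [4, 5, 6]))  # → 13.5
```

For two groups, this F equals the square of the pooled-variance t statistic, so the one-way ANOVA and a two-sample t-test lead to the same conclusion.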

For the comparison of replication and transposition, the replication tasks from blocks 3 and 6 (3a and 6a) were contrasted with the transposition tasks from the same blocks (3b and 6b), which were based on identical stimuli. Two-way mixed ANOVA was used to investigate the interaction between group (between-subjects) and task (within-subjects) for each of the four variables. Similarly, for the comparison of performance with and without aural feedback, the replication tasks from blocks 1 and 2 (without feedback) were contrasted with all replication tasks from blocks 3–6 (with aural feedback: 3a, 4, 5, and 6a). Again, two-way mixed ANOVA was used to investigate the interaction between group (between-subjects) and condition (within-subjects) for each of the four variables.


Summarizing tasks and conditions, improvising and score-dependent subjects listened to short music excerpts composed in the two-part tonal style and performed various replication tasks, either 1) playing along with the excerpt, 2) listening and then replicating it in the same key, 3) listening and replicating a major-key excerpt, first in the same key and then in the relative minor, 4) listening and replicating the excerpt while adding inner voices, 5) listening to an excerpt containing a sequential repetition and replicating it in the same key, or 6) listening and replicating the excerpt, first in the same key and then in a different key. Tasks were performed under two contrasting conditions: with aural feedback (blocks 3–6) or without (blocks 1 and 2). Alignment scores were computed for exact pitch and IOR (which reflects rhythmic similarity) for the treble and bass voices separately. Mean alignment scores of the participants are presented in the Supporting Information (S1 Alignment Scores). MIDI sequences of the aural model and the individual participants have been made available in the Supporting Information (S1 Research Data).

Group: improvising vs. score-dependent musicians

One-way multivariate analysis of variance (MANOVA) revealed a significant difference between improvising and score-dependent musicians based on their combined audiomotor alignment scores, F(4, 17) = 3.309, p = 0.035. Subsequent one-way ANOVA indicated that improvising musicians’ mean treble alignment scores were significantly higher than those of score-dependent musicians, both for exact pitch and IOR (Fig 1, see Table 2 for exact values and parameters of significance tests). Mean bass alignment scores were also significantly higher for improvising musicians, but only for IOR. The range of alignment scores was larger for score-dependent musicians, particularly for exact pitch, although the difference in variance was not significant. Score-dependent musicians exhibited not only the lowest scores, but also a few of the highest, both in the treble and the bass.

Fig 1. Treble audiomotor alignment: comparison of groups.

Improvising vs. score-dependent musicians (mean ± SD). The comparison concerns the treble voice, all tasks (both conditions). A: exact pitch: improvising > score-dependent and B: IOR (interonset interval ratio): improvising > score-dependent. See Table 2 for exact values and parameters of significance tests.


A three-way mixed (between-subjects, within-subjects, within-subjects) ANOVA was conducted to investigate the interaction between group, voice, and parameter; this three-way interaction was not observed. A significant two-way interaction was observed between parameter and voice, F(1, 20) = 102.636, p < 0.0001 (see Fig 2).

Fig 2. Interaction between parameter and voice.

Alignment scores were higher for the treble voice (top line) than for the bass (bottom line), both for exact pitch and IOR. The effect of parameter on alignment was larger in the bass (steeper bottom line) than in the treble voice. Significance of the interaction, F(1, 20) = 102.636, p < 0.0001, was determined using three-way mixed ANOVA.

In addition to the two-way interaction, statistically significant main effects were observed for both voice (treble > bass), F(1, 20) = 116.508, p < 0.0001, and parameter (IOR > exact pitch), F(1, 20) = 66.584, p < 0.0001. The latter was not explored further, as we considered a pitch-rhythm comparison to be conceptually non-informative. One-way ANOVA indicated that the effect of voice (treble > bass) pertained to both exact pitch and IOR and was significant for both groups (Table 3).

Task: replication vs. transposition

In blocks 3 and 6, the same stimulus was used for two different tasks, enabling a direct comparison between replication (in the original key) and transposition (to a different key or to the relative minor). Two-way mixed ANOVA revealed a significant interaction between group (between-subjects) and task (within-subjects), but only for treble exact pitch, F(1, 20) = 4.483, p = 0.047 (Fig 3). A significant main effect was found for task (treble exact pitch: replication > transposition), F(1, 20) = 121.364, p < 0.00001. As can be seen in Fig 3, the difference between the two groups in replicating exact pitch in the original key was relatively small (top line). The steeper bottom line, however, shows that score-dependent musicians performed less well when transposing to a different key. Inspection of individual treble alignment scores revealed that only two of the twelve improvising musicians exhibited significantly lower alignment scores for treble exact pitch transposition compared to replication, whereas six of the ten score-dependent musicians did. With the exception of bass exact pitch, alignment scores for transposition were all significantly higher in improvising musicians, similar to the group difference for the replication of treble and bass IOR (Table 4).

Fig 3. Interaction between group and task: treble exact pitch.

Treble exact pitch alignment scores were higher for replication (top line) than for transposition (bottom line). The effect of group (improvising > score-dependent) was larger for transposition (steeper bottom line) than for replication in the original key. Significance of the interaction, F(1, 20) = 4.483, p = 0.047, was determined using two-way mixed ANOVA.

Condition: aural feedback vs. no aural feedback

In blocks 1 and 2, subjects had no access to aural feedback during performance of the tasks. Two-way mixed ANOVA revealed a significant interaction between group (between-subjects) and condition (within-subjects), but only for treble IOR, F(1, 20) = 6.254, p = 0.021 (Fig 4). Subsequent one-way ANOVA indicated that improvising musicians scored higher than score-dependent musicians on treble exact pitch and IOR as well as bass IOR, both with and without feedback (Table 5).

Fig 4. Interaction between group and condition: treble IOR.

Treble IOR scores were higher for performance with feedback (top line) than without feedback (bottom line). The effect of group (improvising > score-dependent) was larger for performance with feedback (steeper top line) than for performance without feedback. Significance of the interaction, F(1, 20) = 6.254, p = 0.021, was determined using two-way mixed ANOVA.


No significant correlations were found between mean alignment scores and either age (treble exact pitch: rs = 0.02; treble IOR: rs = 0.19; bass exact pitch: rs = -0.22; bass IOR: rs = -0.08) or years of professional experience, expressed as the number of years since completion of the propaedeutic exam (treble exact pitch: rs = 0.02; treble IOR: rs = 0.17; bass exact pitch: rs = -0.14; bass IOR: rs = -0.02). In fact, one of the highest-scoring organists was still completing his bachelor's degree in organ performance at the time of the study.
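By the usual convention, the rs values reported here are Spearman rank correlation coefficients. The source does not state which software computed them. To show what rs measures, the Python sketch below ranks each variable, giving tied values the mean of their ranks, and then computes the Pearson correlation of those ranks. The function names are hypothetical. `scipy.stats.spearmanr` provides an equivalent, tested implementation.

```python
import math
from statistics import mean


def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # convert 0-based positions to a 1-based rank
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson r of the average ranks.
    Undefined (division by zero) if either variable is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(sxx * syy)


# Hypothetical toy data: a monotonic but non-linear relation gives rs = 1
print(spearman_rs([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # 1.0
```

Only ranks enter the computation, so rs captures monotonic rather than strictly linear association between, for example, years of experience and alignment scores.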


The results of the present study indicate that improvising musicians can be distinguished from their score-dependent counterparts by their superior ability to replicate both the pitch and the rhythm of aurally perceived music on their instrument. While this ability is particularly evident in the treble voice, it also extends to the bass voice in the temporal domain. Higher treble alignment scores in improvising musicians could be associated with their superior ability to replicate both the pitch and rhythm of the treble voice in other tonalities (aural transposition). With the exception of treble IOR, aural feedback did not contribute significantly to higher alignment scores in improvising musicians; however, a possible effect of aural feedback on transposition was not assessed. The higher audiomotor alignment scores found here can be seen as evidence of enhanced audiomotor transformations. This notion is supported by the significantly larger activation of the right dorsal parietal-premotor network identified with fMRI in improvising musicians while they imagined playing along with a recording or covertly assessed the quality of the performance [26].
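The reference list includes several sequence-alignment methods [28–32]. The study's exact scoring procedure is not reproduced here. As a generic illustration only, the Python sketch below computes a Needleman–Wunsch global alignment [32] between a model note sequence and a performed one, using MIDI pitch numbers as an "exact pitch" representation. The match, mismatch and gap values, the normalization by the score of a perfect copy, and the function names are all assumptions chosen for readability.

```python
def nw_alignment_score(model, performance, match=1.0, mismatch=-1.0, gap=-1.0):
    """Needleman-Wunsch global alignment score between two symbol sequences.
    Scoring values are illustrative assumptions, not the study's parameters."""
    n, m = len(model), len(performance)
    # dp[i][j] = best score for aligning model[:i] with performance[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if model[i - 1] == performance[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,  # pair the two notes
                           dp[i - 1][j] + gap,      # model note omitted by the player
                           dp[i][j - 1] + gap)      # extra note inserted by the player
    return dp[n][m]


def normalized_score(model, performance):
    """Hypothetical normalization: raw score relative to a perfect copy of the model."""
    return nw_alignment_score(model, performance) / nw_alignment_score(model, model)


model = [60, 62, 64, 65, 67]                   # hypothetical model melody: C D E F G
print(normalized_score(model, model))          # 1.0
print(normalized_score(model, [60, 62, 65, 67]))  # 0.6 (one note omitted)
```

Omitted, inserted and wrong notes each lower the score. A performance that reproduces the model note for note scores 1.0. The same routine could in principle be applied to a rhythmic feature sequence instead of pitches.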

The superior ability to replicate aurally perceived music ‘by ear’, particularly in different contexts such as aural transposition, may be associated with enhanced use of procedural knowledge of music syntax during performance. Serial reaction time (SRT) studies have demonstrated that implicit knowledge of low-level action syntax can be acquired non-consciously through the mere repetition of motor sequences [33–35]. SRT studies of hierarchically more complex syntax, however, show that mere repetition is not sufficient for implicit acquisition to take place [36–38].

The apparent distinction between low- and high-level syntax is reflected in the existence of distinct processing networks in the brain. Imaging studies indicate that processing of low-level syntax activates a ventral network comprising the frontal operculum and anterior temporal cortex [39], while complex syntax additionally activates a dorsal network involving the posterior inferior frontal gyrus (caudal Broca's area) and posterior superior temporal cortex [40].

‘String parsing’ has been proposed as the mechanism by which ‘program-level imitation’ of behavior leads to the acquisition of hierarchically complex syntax [41]. It also offers a ‘parsimonious’ explanation for the beneficial effects of improvisation on the acquisition of hierarchically complex music syntax. An important characteristic of the practice methods employed in classical music is the frequent repetition of the notes in the exact order in which they are to be played. Repetition may be expected to lead primarily to segmentation and chunking of the sequence, i.e., to the implicit acquisition of low-level syntax. Syntax-congruent manipulation of the serial order while playing ‘by ear’, by contrast, might lead to implicit parsing of the hierarchical structure.

The large individual differences in alignment scores found in the score-dependent group suggest that practice strategies in classical music might not be as uniform as one would think. It would seem that practice methods fostering implicit, non-conscious audiomotor learning are actually employed by a minority of score-dependent musicians and can be said to have a beneficial effect on the implicit acquisition of hierarchical music syntax. Although improvisation, like immersion in language acquisition [42], is a fertile ground for the type of implicit, non-conscious learning that is involved in audiomotor integration [43], parsing of the hierarchical structure is apparently also achieved during the practice of repertoire, given the right approach.

Treble alignment scores for replication were significantly higher than for transposition in both groups, but only for exact pitch, suggesting that transposition may not be just another form of pitch replication. Recent studies have implicated the right intraparietal sulcus (IPS) in transposition [21], retrograde musical transformations [22], and pitch-to-space transformations [23]. The fact that, during transposition, improvising musicians scored significantly higher than score-dependent musicians on both treble exact pitch and IOR indicates that they are more capable of performing such pitch-to-space transformations. It seems quite likely that improvising musicians’ greater success in replicating aurally perceived music at the original pitch is also based on the same type of audiomotor transformations they are employing during transposition. The smaller difference between replication and transposition exhibited by most improvising musicians in this study supports that view.
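Transposition can be made concrete at the level of keyboard pitches. In the Python sketch below, transposition to a different key is a constant shift of every MIDI pitch by the same number of semitones. Transposition to the relative minor is modelled, as an assumption, by mapping each scale degree of the major-key melody onto the same degree of the natural minor scale whose tonic lies three semitones lower. The function names are hypothetical, and the source does not specify which form of the minor scale participants used.

```python
MAJOR = [0, 2, 4, 5, 7, 9, 11]          # semitones above the tonic, degrees 1-7
NATURAL_MINOR = [0, 2, 3, 5, 7, 8, 10]  # natural minor scale, degrees 1-7


def transpose(pitches, semitones):
    """Chromatic transposition: move every MIDI pitch by the same interval."""
    return [p + semitones for p in pitches]


def to_relative_minor(pitches, major_tonic):
    """Map each degree of a major-key melody onto the same degree of the
    relative natural minor (tonic 3 semitones lower). Diatonic notes only."""
    minor_tonic = major_tonic - 3
    result = []
    for p in pitches:
        octave, step = divmod(p - major_tonic, 12)
        degree = MAJOR.index(step)  # ValueError for chromatic (non-scale) notes
        result.append(minor_tonic + 12 * octave + NATURAL_MINOR[degree])
    return result


c_major_opening = [60, 62, 64, 65, 67]          # hypothetical excerpt: C D E F G
print(transpose(c_major_opening, 2))            # [62, 64, 66, 67, 69] (D major)
print(to_relative_minor(c_major_opening, 60))   # [57, 59, 60, 62, 64] (A B C D E)
```

In the first case every note moves the same distance on the keyboard. In the second, the distance depends on the scale degree, so the required pitch-to-space mapping is no longer a single fixed offset.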

The observation that treble alignment scores were significantly higher than bass alignment scores, despite the use of two-part polyphonic excerpts, corroborates the high-voice superiority effect found in both behavioral and neural studies. A higher-pitch advantage for melody recognition has been found in infants [44] as well as in musically trained and untrained individuals [45]. Auditory brainstem responses to intervals have revealed heightened responses to harmonics of the upper tone [46]. MMN responses to higher-pitched deviants are larger and earlier [47, 48]. Seven-month-old infants show earlier and larger MMN to changes in the higher voice [49]. In even younger (3-month-old) infants, MMN was smaller and later than in 7-month-old infants, but the size of the MMN difference was similar across ages, supporting the hypothesis of an innate origin of the high-voice superiority effect [50].

The high-voice superiority effect has been shown to be subject to neuroplasticity. In double bass players, the MMN elicited by pitch deviants in the bass voice has been found to be equal to (but not larger than) that elicited by pitch deviants in the treble voice [51]. In addition, lower-voice superiority has been found for temporal deviants in players of bass instruments [52]. Improvising musicians showed significantly higher scores for bass IOR, and their bass IOR alignment scores were higher than those of score-dependent musicians both with and without aural feedback. Together these findings suggest that they may also be subject to a lower-voice superiority effect. This is one group difference that could be attributed to the instrument the subjects played rather than to the practice of improvisation. Organists commonly use the pedals to play the bass line, while pianists incorporate the bass line in the left-hand part. In that sense, organists can be said to play a bass instrument and may therefore also be subject to the lower-voice superiority effect.

An advantage of aural feedback was observed for the replication of treble IOR, but only for improvising musicians. At first sight, this might seem to conflict with studies that have demonstrated that musicians are largely independent of aural feedback [53]. Performance without aural feedback is not only as accurate, but also almost as expressive as with feedback [54]. The concept that aural feedback might not be essential is supported by a study using event-related potentials (ERP) during the performance of memorized music, revealing early error signaling, before the actual error, independent of aural feedback [55]. Aural feedback has been shown to be more important during the learning phase than during music performance itself [56].

While musicians are quite able to perform without aural feedback, altered auditory feedback (AAF), for example feedback that is asynchronous with the performer's actions, may compromise performance during both singing and playing. Delaying feedback until the next tone is being sung or played (serial shift), however, compromises performance in singers but not in score-dependent pianists [57]. In the cited study, the singers had learned the melodies aurally, whereas the pianists, being unable to play the melodies ‘by ear’, had learned them from music notation. The authors argue that the disruptive effects of altered feedback are ‘based on abstract, effector-independent, associations between perception and action’, suggesting that action-perception associations are stronger in singers than in score-dependent pianists. Although the experimental paradigm was considerably different, stronger action-perception associations in improvising musicians might also explain the larger benefit they derived from aural feedback in the present study when replicating the rhythm of the melody. Further study is necessary to determine the effect of feedback on aural transposition.
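The serial-shift manipulation described above changes what is heard rather than when it is heard. The Python sketch below assumes a lag of one event and silence for the first event. The function and condition names are hypothetical and are not taken from [57].

```python
def feedback_pitches(produced, condition="normal", lag=1):
    """Pitch sounded at each produced event under a feedback condition.

    normal:       each event sounds its own pitch.
    serial_shift: event i sounds the pitch of event i - lag (None = silence
                  when no earlier event exists).
    """
    if condition == "normal":
        return list(produced)
    if condition == "serial_shift":
        return [produced[i - lag] if i >= lag else None
                for i in range(len(produced))]
    raise ValueError(f"unknown condition: {condition!r}")


melody = [60, 62, 64, 65, 67]  # hypothetical intended melody (MIDI pitches)
print(feedback_pitches(melody, "serial_shift"))  # [None, 60, 62, 64, 65]
```

Under a serial shift the sound still coincides with each keypress, but its pitch belongs to the previous note. The performer therefore hears a melody that lags one position behind the intended one.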


The present study has found behavioral evidence for superior audiomotor transformation in improvising musicians during the replication, and particularly the transposition, of aurally perceived music. These results are consistent with the associated fMRI study [26] and suggest that improvisation supports audiomotor learning in music performance. The present findings underscore the notion that the gradual disappearance of improvisational task requirements in classical music since the middle of the nineteenth century [58] has had a large impact not only on concert practice but, more importantly, on the audiomotor characteristics of the musicians themselves. Nevertheless, the high alignment scores exhibited by a small number of score-dependent musicians indicate that, besides improvisation, specific practice strategies may also have an important impact on audiomotor integration [59].

Author Contributions

  1. Conceptualization: RH BMdJ.
  2. Data curation: RH.
  3. Formal analysis: RH PvK.
  4. Investigation: RH.
  5. Methodology: RH BMdJ PvK.
  6. Project administration: RH.
  7. Resources: RH PvK.
  8. Software: PvK.
  9. Supervision: BMdJ.
  10. Validation: RH.
  11. Visualization: RH.
  12. Writing – original draft: RH.
  13. Writing – review & editing: RH BMdJ.


  1. Shaffer LH. Performances of Chopin, Bach, and Bartok: Studies in motor programming. Cognitive Psychology. 1981 Jul 1;13(3):326–76.
  2. Stewart L, Henson R, Kampe K, Walsh V, Turner R, Frith U. Brain changes after learning to read and play music. Neuroimage. 2003 Sep 30;20(1):71–83. pmid:14527571
  3. Nettl B, Russell M. In the Course of Performance: Studies in the World of Musical Improvisation. University of Chicago Press; 1998.
  4. Ohnishi T, Matsuda H, Asada T, Aruga M, Hirakata M, Nishikawa M, et al. Functional anatomy of musical perception in musicians. Cerebral Cortex. 2001 Aug 1;11(8):754–60. pmid:11459765
  5. Baumann S, Koeneke S, Schmidt CF, Meyer M, Lutz K, Jancke L. A network for audio–motor coordination in skilled pianists and non-musicians. Brain Research. 2007 Aug 3;1161:65–78. pmid:17603027
  6. Haslinger B, Erhard P, Altenmüller E, Schroeder U, Boecker H, Ceballos-Baumann AO. Transmodal sensorimotor networks during action observation in professional pianists. Journal of Cognitive Neuroscience. 2005 Feb;17(2):282–93. pmid:15811240
  7. Drost UC, Rieger M, Brass M, Gunter TC, Prinz W. When hearing turns into playing: Movement induction by auditory stimuli in pianists. The Quarterly Journal of Experimental Psychology Section A. 2005 Nov 1;58(8):1376–89.
  8. Bangert M, Peschel T, Schlaug G, Rotte M, Drescher D, Hinrichs H, et al. Shared networks for auditory and motor processing in professional pianists: evidence from fMRI conjunction. Neuroimage. 2006 Apr 15;30(3):917–26. pmid:16380270
  9. Mutschler I, Schulze-Bonhage A, Glauche V, Demandt E, Speck O, Ball T. A rapid sound-action association effect in human insular cortex. PLoS ONE. 2007 Feb 28;2(2):e259. pmid:17327919
  10. Herholz SC, Lappe C, Knief A, Pantev C. Neural basis of music imagery and the effect of musical expertise. European Journal of Neuroscience. 2008 Dec 1;28(11):2352–60. pmid:19046375
  11. Trimarchi PD, Luzzatti C. Implicit chord processing and motor representation in pianists. Psychological Research. 2011 Mar 1;75(2):122–28. pmid:20556421
  12. Novembre G, Keller PE. A grammar of action generates predictions in skilled musicians. Consciousness and Cognition. 2011 Dec 31;20(4):1232–43. pmid:21458298
  13. Stewart L, Verdonschot RG, Nasralla P, Lanipekun J. Action–perception coupling in pianists: Learned mappings or spatial musical association of response codes (SMARC) effect? The Quarterly Journal of Experimental Psychology. 2013 Jan 1;66(1):37–50. pmid:22712516
  14. Tervaniemi M, Rytkönen M, Schröger E, Ilmoniemi RJ, Näätänen R. Superior formation of cortical memory traces for melodic patterns in musicians. Learning & Memory. 2001 Sep 1;8(5):295–300.
  15. Vuust P, Brattico E, Seppänen M, Näätänen R, Tervaniemi M. The sound of music: differentiating musicians using a fast, musical multi-feature mismatch negativity paradigm. Neuropsychologia. 2012 Jun 30;50(7):1432–43. pmid:22414595
  16. Gordon E. Manual for the Advanced Measures of Music Audiation. GIA Publications; 1989.
  17. Woody RH, Lehmann AC. Student musicians’ ear-playing ability as a function of vernacular music experiences. Journal of Research in Music Education. 2010 Jul 1;58(2):101–15.
  18. Ullman MT. Contributions of memory circuits to language: The declarative/procedural model. Cognition. 2004 Jun 30;92(1):231–70.
  19. Milner AD, Goodale MA. The Visual Brain in Action. Oxford University Press; 1995.
  20. Ullman MT. A cognitive neuroscience perspective on second language acquisition: The declarative/procedural model. Mind and context in adult second language acquisition. 2005:141–78.
  21. Foster NE, Zatorre RJ. A role for the intraparietal sulcus in transforming musical pitch information. Cerebral Cortex. 2010 Jun 1;20(6):1350–59. pmid:19789184
  22. Zatorre RJ, Halpern AR, Bouffard M. Mental reversal of imagined melodies: a role for the posterior parietal cortex. Journal of Cognitive Neuroscience. 2010 Apr;22(4):775–89. pmid:19366283
  23. Brown RM, Chen JL, Hollinger A, Penhune VB, Palmer C, Zatorre RJ. Repetition suppression in auditory–motor regions to pitch and temporal structure in music. Journal of Cognitive Neuroscience. 2013 Feb;25(2):313–28. pmid:23163413
  24. Sergent J, Zuck E, Terriah S, MacDonald B. Distributed neural network underlying musical sight-reading and keyboard performance. Science. 1992 Jul 3;257(5066):106–9. pmid:1621084
  25. Schön D, Anton JL, Roth M, Besson M. An fMRI study of music sight-reading. Neuroreport. 2002 Dec 3;13(17):2285–89. pmid:12488812
  26. Harris R, de Jong BM. Differential parietal and temporal contributions to music perception in improvising and score-dependent musicians, an fMRI study. Brain Research. 2015 Oct 22;1624:253–64. pmid:26206300
  27. Wiora W. Systematik der Musikalischen Erscheinungen des Umsingens. Jahrbuch für Volksliedforschung. 1941 Jan 1;7:128–95.
  28. Mongeau M, Sankoff D. Comparison of musical sequences. Computers and the Humanities. 1990 Jun 1;24(3):161–75.
  29. Lemström K. String matching techniques for music retrieval. PhD thesis, University of Helsinki; 2000.
  30. Grachten M, Arcos JL, Lopez de Mantaras R. Melody retrieval using the implication/realization model. In: Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR 2005).
  31. van Kranenburg P, Volk A, Wiering F, Veltkamp RC. Musical Models for Folk-Song Alignment. In: Proceedings of the International Conference on Music Information Retrieval (ISMIR 2009).
  32. Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology. 1970 Mar 28;48(3):443–53. pmid:5420325
  33. Reber PJ, Squire LR. Parallel brain systems for learning with and without awareness. Learning & Memory. 1994 Nov 1;1(4):217–29.
  34. Willingham DB, Goedert-Eschmann K. The relation between implicit and explicit learning: Evidence for parallel development. Psychological Science. 1999 Nov 1;10(6):531–34.
  35. Willingham DB, Salidis J, Gabrieli JD. Direct comparison of neural systems mediating conscious and unconscious skill learning. Journal of Neurophysiology. 2002 Sep 1;88(3):1451–60. pmid:12205165
  36. Cleeremans A. Mechanisms of Implicit Learning: Connectionist Models of Sequence Processing. MIT Press; 1993.
  37. Cleeremans A, McClelland JL. Learning the structure of event sequences. Journal of Experimental Psychology: General. 1991 Sep;120(3):235.
  38. Newport EL, Aslin RN. Learning at a distance I. Statistical learning of non-adjacent dependencies. Cognitive Psychology. 2004 Mar 31;48(2):127–62. pmid:14732409
  39. Friederici AD, Bahlmann J, Heim S, Schubotz RI, Anwander A. The brain differentiates human and non-human grammars: functional localization and structural connectivity. Proceedings of the National Academy of Sciences of the United States of America. 2006 Feb 14;103(7):2458–63. pmid:16461904
  40. Friederici AD. The cortical language circuit: from auditory perception to sentence comprehension. Trends in Cognitive Sciences. 2012 May 31;16(5):262–68. pmid:22516238
  41. Byrne RW. Imitation of novel complex actions: what does the evidence from animals mean? Advances in the Study of Behavior. 2002 Dec 31;31:77–105.
  42. Morgan-Short K, Steinhauer K, Sanz C, Ullman MT. Explicit and implicit second language training differentially affect the achievement of native-like brain activation patterns. Journal of Cognitive Neuroscience. 2012 Apr;24(4):933–47. pmid:21861686
  43. Wolpert DM, Diedrichsen J, Flanagan JR. Principles of sensorimotor learning. Nature Reviews Neuroscience. 2011 Dec 1;12(12):739–51. pmid:22033537
  44. Trehub SE, Trainor L. Singing to infants: Lullabies and play songs. Advances in Infancy Research. 1998;12:43–78.
  45. Palmer C, Holleran S. Harmonic, melodic, and frequency height influences in the perception of multivoiced music. Perception & Psychophysics. 1994;56(3):301–12.
  46. Lee KM, Skoe E, Kraus N, Ashley R. Selective subcortical enhancement of musical intervals in musicians. The Journal of Neuroscience. 2009;29(18):5832–40. pmid:19420250
  47. Fujioka T, Trainor LJ, Ross B, Kakigi R, Pantev C. Automatic encoding of polyphonic melodies in musicians and non-musicians. Journal of Cognitive Neuroscience. 2005;17(10):1578–92. pmid:16269098
  48. Fujioka T, Trainor LJ, Ross B. Simultaneous pitches are encoded separately in auditory cortex: An MMN study. Neuroreport. 2008;19(3):361–66. pmid:18303582
  49. Marie C, Trainor LJ. Development of simultaneous pitch encoding: Infants show a high voice superiority effect. Cerebral Cortex. 2013;23(3):660–69. pmid:22419678
  50. Marie C, Trainor LJ. Early development of polyphonic sound encoding and the high voice superiority effect. Neuropsychologia. 2014;57:50–58. pmid:24613759
  51. Marie C, Fujioka T, Herrington L, Trainor LJ. The high-voice superiority effect in polyphonic music is influenced by experience: A comparison of musicians who play soprano-range compared with bass-range instruments. Psychomusicology: Music, Mind, and Brain. 2012;22(2):97–104.
  52. Hove MJ, Marie C, Bruce IC, Trainor LJ. Superior time perception for lower musical pitch explains why bass-ranged instruments lay down musical rhythms. Proceedings of the National Academy of Sciences. 2014;111(28):10383–388.
  53. Finney SA. Auditory feedback and musical keyboard performance. Music Perception: An Interdisciplinary Journal. 1997;15(2):153–74.
  54. Repp BH. Effects of auditory feedback deprivation on expressive piano performance. Music Perception. 1999;16(4):409–38.
  55. Ruiz MH, Jabusch HC, Altenmüller E. Fast feedforward error-detection mechanisms in highly skilled music performance. In: Proceedings of the International Symposium on Performance Science. 2009;187–97.
  56. Finney S, Palmer C. Auditory feedback and memory for music performance: Sound evidence for an encoding effect. Memory & Cognition. 2003;31(1):51–64.
  57. Pfordresher PQ, Mantell JT. Effects of altered auditory feedback across effector systems: Production of melodies by keyboard and singing. Acta Psychologica. 2012;139(1):166–77. pmid:22100135
  58. Moore R. The decline of improvisation in Western art music: An interpretation of change. International Review of the Aesthetics and Sociology of Music. 1992;23(1):61–84.
  59. Seppänen M, Brattico E, Tervaniemi M. Practice strategies of musicians modulate neural processing and the learning of sound-patterns. Neurobiology of Learning and Memory. 2007 Feb 28;87(2):236–47. pmid:17046293