Absolute Memory for Tempo in Musicians and Non-Musicians

The ability to remember tempo (the perceived frequency of musical pulse) without external references may be defined, by analogy with the notion of absolute pitch, as absolute tempo (AT). Anecdotal reports and sparse empirical evidence suggest that at least some individuals possess AT. However, to our knowledge, no systematic assessments of AT have been performed using laboratory tasks comparable to those assessing absolute pitch. In the present study, we operationalize AT as the ability to identify and reproduce tempo in the absence of rhythmic or melodic frames of reference and assess these abilities in musically trained and untrained participants. We asked 15 musicians and 15 non-musicians to listen to a seven-step 'tempo scale' of metronome beats, each associated with a numerical label, and then to perform two memory tasks. In the first task, participants heard one of the tempi and attempted to report the correct label (identification task); in the second, they saw one label and attempted to tap the correct tempo (production task). A musical and visual excerpt was presented between successive trials as a distractor to prevent participants from using previous tempi as anchors. Thus, participants needed to encode tempo information with the corresponding label, store the information, and recall it to give the response. We found that more than half of the participants were able to perform above chance in at least one of the tasks, and that musical training differentiated between participants in identification, but not in production. These results suggest that AT is relatively widespread, relatively independent of musical training in tempo production, but further refined by training in tempo identification. We propose that, at least in production, the underlying motor representations are related to tactus, a basic internal rhythmic period that may provide a body-based reference for encoding tempo.


Introduction
The Italian word tempo (literally, 'time'; plural: tempi) indicates the perceived frequency of the rhythmic pulse of music. Tempo reflects the frequency of beats, the "regularly recurring articulations in the flow of musical time" [1], which is measured as the number of beats per unit of time (beats per minute or bpm, e.g. 120 bpm = 120 beats / 60 s = 2 Hz). Tempo is also identifiable by the time interval between beats (inter-onset interval or IOI), which is the reciprocal of frequency [...] identify or to produce a specific tempo without an external reference, that is to say, absolute tempo (AT).
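The relations between bpm, beat frequency, and IOI used throughout the paper can be checked with a minimal Python sketch. The function names are illustrative assumptions and are not part of the original study.

```python
def bpm_to_hz(bpm: float) -> float:
    """Convert beats per minute to beat frequency in Hz (beats per second)."""
    return bpm / 60.0


def bpm_to_ioi_ms(bpm: float) -> float:
    """Inter-onset interval in milliseconds: the reciprocal of beat frequency."""
    return 60_000.0 / bpm


def ioi_ms_to_bpm(ioi_ms: float) -> float:
    """Inverse conversion, from an IOI in milliseconds back to bpm."""
    return 60_000.0 / ioi_ms


if __name__ == "__main__":
    print(bpm_to_hz(120))      # the example given in the text: 120 bpm = 2 Hz
    print(bpm_to_ioi_ms(120))  # corresponding IOI in ms
```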

Spontaneous tempo (tactus)
An extra-musical candidate for an absolute tempo reference might be the so-called tactus, the body-based reference rhythm used to establish the beat before the advent of metronomes [21,3]. The Renaissance musical theorist Gaffurius (1496), for instance, equated tactus with the pulse rate of a man breathing normally [22]. This idea resonates with more modern conceptions. Tactus as a hand movement to keep time was first described in 1490 by Adam von Fulda [21], and recent studies emphasize kinaesthetic sensations in the connection between hearing rhythm and perceiving movement [9]. Tempo, as an expression of musical movement, recalls motion in physical space and alludes to the physical motion of a body or limb. There is evidence that the final retard, the expressive musical slowing at the end of a piece or between sections of a piece, is interpreted relative to physical movement [23,24,25] and tends to deviate from the preceding tempo according to specific rules [26]. Kronman and Sundberg [27] modelled the final retard as motion under constant negative acceleration, similar to a runner slowing down. Thus, a framework for encoding tempo may be provided by constraints on actual human movement [28], not just by rhythmic physiological phenomena [29]. The basis for an internal beat reference may be constituted by typical rhythmic behaviours such as walking and running, which are by definition periodic. Interestingly, the mean stride rate for both adult men and women is about 117 steps per minute, men's strides being longer than women's, but not faster [28]. Although this measure varies greatly, the observed range (about 81 to 150 steps per minute) is very similar to the distribution of preferred tempi in a finger-tapping task [30,28].
Tempo perception occurs in a specific range. When frequency is too high, individual beats merge into a continuous flow; when it is too low, they lose their temporal structure and are perceived as isolated events. This range defines the existence region of tempo perception, but it cannot be defined exactly because transitions are gradual and individual differences are large. Parncutt [31] proposed 33 bpm as the lower limit and 300 bpm as the upper limit. London [32] set this range from about 30 bpm to 240 bpm. Other works report 24 bpm and 600 bpm [33]. We find similar limitations in tempo production. We cannot produce repetitive movements in a controlled manner when they are too fast or too slow; in the latter case we lose the sense of continuity and feel a series of individual movements. The upper rate limit for finger tapping is constrained biomechanically by the maximum frequency at which the effector can move. According to some estimates, the upper limit is about 400 bpm [34] (see also [33]) and the lower limit is about 30 bpm [33]. These limits bear a certain degree of ambiguity, as continuation tapping is not strictly periodic but exhibits longer-term fluctuations (for a review, see Large [35]). The production limits are therefore more precisely expressed as limit IOIs (in this case, about 150 ms to 2 s). Tempi near the limits of the existence region are not easily perceived or produced. In contrast, an optimal range for tempo production and perception exists in the middle of this region. This preferred tempo region varies somewhat between individuals. On average, the range has been estimated to be between 67 bpm and 150 bpm (see Moelants [36]) or approximately from 75 bpm to 200 bpm (see [33]). In this range, there is a peak of maximal salience, the so-called spontaneous tempo. Spontaneous tempo corresponds to a moderate frequency and has a special significance because we tend to gravitate towards it [37].
According to Parncutt [31], spontaneous tempo is around 100 bpm. Other authors have reported different values, but all the reported frequencies are under 120 bpm [36]. McAuley [33] distinguished between spontaneous motor tempo (SMT), the natural or preferred rate of rhythmic motor activity (e.g., tapping), and preferred perceptual tempo (PPT), the rate of a series of sounds or lights that is judged to be neither too fast nor too slow, but appears to be 'just right' [37,38]. A representative value of SMT is 100 bpm (600 ms), but there are also large individual differences. SMT can vary from 300 bpm (200 ms) to 37.5 bpm (1,600 ms) [39,38]. There is some evidence that young children prefer faster rates than older children and adults [39,38], and musicians and non-musicians often differ in their spontaneous rates [40]. The most commonly reported value for PPT is around 100 bpm, like SMT, but a wide range of values has also been reported over the years. Notably, SMT and PPT have comparable frequencies. Such a correspondence supports the view that motor and perceptual tempo preferences have a common psychological basis [38].

Absolute tempo and the analogy with absolute pitch
We perceive pitch if a waveform frequency is between 16 and 20,000 Hz [41]. Sounds of frequency less than 16 Hz are not 'normally' heard but may be felt bodily as vibrations [41]. Thus, both tempo and pitch are related to frequency. However, the analogy between pitch and tempo does not imply a spatial isomorphism [42]. A pitch relation (a melodic interval) refers to the distance between two pitches, measured on the degrees of the scale. In Western tonal music, pitches are organized such that a fixed pattern of inter-tone intervals, the diatonic scale, repeats at every octave in a cyclic structure [43]. In contrast, a tempo relation is not only a temporal distance, but is also concerned with the velocity of motion between two onsets with respect to a metrical framework. Strong and weak beats organize in larger units over multiple time scales. These time scales constitute a hierarchy such that specific beats at each level periodically coincide [33]. A crucial aspect of this organization is again cyclicity: metre is a recurring pattern of time [44]. For this reason, the pitch-tempo analogy is better cast as a kind of cognitive isomorphism, based on a common cyclic structure that can be understood in terms of mathematical group theory [45] and described cross-culturally [46,47].
Absolute pitch (AP) is the ability to recall pitch from long-term memory either to identify the pitch or the chroma (pitch class) of a tone presented in isolation, or to produce a specified pitch without an external reference [48,49,50]. AP does not involve supernormal perceptual mechanisms but is instead related to extremely well developed pitch memory and verbal labelling [51,52,53]. It is a rare ability that occurs in a small percentage of the general population, estimated to be no more than 0.01% (1 out of 10,000 [54,48]), and it is strongly related to musical training [51,49,50]. AP is typically assessed by three kinds of tasks: identification, production, and memory decay. Possessors score well above chance on tests of these abilities [51]. Production and identification are highly correlated, although large individual differences exist. For example, not all individuals capable of absolute pitch identification are equally capable of absolute pitch production [55]. Thus, these two abilities should be tested separately [52]. The phenomenon of AP provides strong evidence that at least some of us are capable of processing musically relevant representations without an external reference. While this is well established for pitch, whether a similar ability exists for tempo is much less clear.
It is well known that several great musicians, such as Mozart, Scrjabin, Messiaen, and Boulez, were AP possessors while others, such as Wagner, Čajkovskij, Ravel, or Stravinskij, were not [49]. Thus, absolute pitch is not necessary to become a musician; the basic skill exercised during musical training is relative pitch, the ability to recognize and produce pitch relations. In contrast, we have only anecdotal information on potential AT possessors. Bartók has been described as having an uncanny sense of tempo [56] and Toscanini was criticized for his 'inexorable beat' [57]. Reportedly, Ormandy was always able to produce an exact tempo without a metronome. The Italian pianist Vidusso was especially famous among his pupils for his tempo ability (personal communication). However, these anecdotal reports tell us little about musicians who do not have this ability or about absolute tempo in non-musicians. As with pitch, musical training stresses the role of tempo relations, such as doubling or halving a tempo, and absolute tempo memory is typically not addressed [58]. Does AT exist? And if it does, is it relatively rare, like AP, or more common?

Previous studies
In a seminal paper, Levitin and Cook [18] asked participants to name some of their favourite songs, checked that they knew them only in one canonical version, and recorded how they sang them. They found that participants reproduced tempo accurately: 72% of the productions were within ± 8% of the tempo of the known canonical version (r = .95). Productions showed minimal overestimation errors that could be explained by performance stress, which is known to induce speeding [59], by motor factors such as the tendency to perform faster rather than slower [60], or by perceptual factors such as the better perception of slowed-down in comparison to speeded-up performance [61]. These results suggest that tempo was encoded in absolute terms and could be retrieved when singing the songs, even by musically untrained participants. In a later study Pauws [62] requested trained and untrained singers to sing from memory melodies of familiar and less familiar Beatles songs, after listening to the original CD. Results supported the existence of absolute memory for tempo, irrespective of singing ability. Almost two thirds of the participants came reasonably close to the actual tempo on the CD, without differences between trained and untrained singers.
Lapidaki [63] investigated the consistency of tempo judgements, more specifically the consistency of 'correct' subjective tempo, over a period of time, during the listening process. Participants were asked, across four separate trials, to listen to the same six musical examples, from various musical styles, and to indicate whether the experimenter should set the tempo 'faster' or 'slower' until it sounded right to them. For a relatively small number of participants, the judgements were remarkably consistent across trials and relatively unaffected by other factors such as fatigue, mood, or time of day. Given that participants were not allowed to have external references, such as a musical score or body movements, Lapidaki labelled this ability 'absolute tempo', by analogy with absolute pitch (see also [64]). However, we must consider that good performance may be biased by a strong memory for a small range of tempi, or by a subjectively preferred tempo that may vary in different contexts but remains mostly centred on 100 bpm (see above).
Collier and Collier [65,56] studied jazz recordings in relation to the ability to double the tempo. They observed that when jazz musicians attempted to return to the original tempo after doubling, they did so with considerable accuracy [56]. The conclusion was that, given that the musicians were consistent across takes on different days, they had good tempo memory. These authors also stressed that jazz musicians seldom, if ever, use metronomes, and that the possible use of metronomes to set initial tempi cannot account for the return to the original tempo. In line with this memory hypothesis, the authors suggested that musicians were relying on a sense of absolute tempo, analogous to absolute pitch [56]. Absolute tempo was displayed both in short-term memory, within each take, and in long-term memory, between takes. Finally, Fine and Bull [66] asked musicians and non-musicians to reproduce three tempi (35, 110, and 185 bpm) from memory by clapping. Results indicated that the slower and faster tempi were recalled better than the medium tempo, in accordance with well-known serial position effects on free recall [67]. They did not find musical experience to affect tempo recall, but their non-musician group included three participants with some musical experience, and this could have diluted the difference between groups.

The present study
Empirical studies indicate that the ability to remember tempo absolutely might exist. However, to our knowledge, no systematic assessments of absolute memory for tempo have been performed using laboratory tasks that could be compared to those used for assessing absolute pitch. In the present study, we sought to quantify the ability to identify or reproduce tempo in the absence of rhythmic or melodic frames of reference or external temporal anchors, in musically trained and untrained participants. We asked participants to perform simplified identification and production tasks, which did not require musical training, and analysed accuracy and pattern of errors. To this aim, we developed a simple 'tempo scale' of metronome beats with artificial labels that were learned at the beginning of each testing block. To perform accurately on these tasks, participants needed to encode tempo information with the corresponding label, store the information, and recall it to give the responses. Our purpose was to test whether participants could memorize tempo without the musical cues provided by the familiar songs or pieces used in previous studies. By using a simple sequence of beats, we completely eliminated melody and harmony cues, as well as some metric and rhythmic information (all the durations being the same), and focused on the specific and absolute components of tempo as beat frequency. Rhythmic information was not completely eliminated, as an isochronous series of beats remains a rhythmic frame of reference, albeit a very minimal one.

Ethics statement
The research was conducted in compliance with the ethical standards of the Italian Board of Psychologists (see http://www.psy.it/codice_deontologico.html), the Ethical Code for Psychological Research of the Italian Psychological Society (see http://www.aipass.org/node/26) and the Code of Ethical Principles for Medical Research Involving Human Subjects of the World Medical Association (Declaration of Helsinki). The experiment did not involve clinical tests or the use of pharmaceuticals or medical equipment, did not require collecting health information from participants, and did not involve the use of deception or cause participant discomfort in any other way. For these reasons, and in accordance with its regulations, the approval of the Ethics Committee for Clinical Research of the University of Trieste was deemed unnecessary.
All participants were 18 years or older at the time of the study. The study was conducted in established educational settings (the University of Trieste and the Trieste Music Conservatory) where students and colleagues are routinely involved in research activities as participants. All participants gave verbal consent after being adequately informed of the aims, methods, and procedure of the study. Potential participants were informed that their anonymity would be preserved at all stages. Verbal consent was a prerequisite for participating. The only information collected specifically for the purposes of this study was age and years of musical training. The names of those who gave verbal consent, namely the participants, were immediately transformed into coded identifiers (Subject number) and remained available to the first author only, who saved them in an encrypted file. Participants' names never entered into any analyses of the data.

Participants
Thirty volunteers participated in the study. Fifteen (nine women and six men) were undergraduate or graduate students of the University of Trieste (age range: 19-45 years, M = 26.9, SD = 7.2 years) with no specific musical training ('non-musicians'). Fifteen (nine women and six men) were undergraduate or graduate piano students of the Trieste Music Conservatory

Stimuli
The acoustic stimuli consisted of an ordered series of seven short sequences of metronomic clicks. We generated this series based on two criteria.
The first was that it could be reasonably assumed that the seven tempi were equally spaced perceptually. Based on well-established psychophysical principles, to achieve this we chose the target tempi to be at equal distances on a logarithmic scale and evaluated perceived differences based on assessments of tempo just-noticeable differences (JND) in the literature. Estimates of the JND in two-alternative forced-choice tempo perception tasks yield deviations from the actual tempo between 6.2% and 8.8% [68]. In continuation-tapping tasks, typical JNDs are between 7% and 11% from the correct tempo [69]. Listeners' thresholds for detecting tempo differences between 40 and 600 bpm are on the order of 6% for single-interval sequences. For multiple isochronous interval sequences, thresholds improve, on average, to 3%. Best performance, slightly below 2%, is found for sequences of 6 intervals of 400 ms, a 150 bpm tempo [68,33].
The second criterion was that the ordering had to make sense from a musical point of view. Supporting this, we note that our tempo series can be considered a sort of tempo 'scale'. Although we acknowledge that the similarity should not be pushed too far, the tonal scale in the equal temperament system is precisely a series of equal logarithmic steps in frequency, with one octave (1:2 frequency ratio) divided into 12 equal semitones [70]. We note further that the concept of a twelve-step logarithmic tempo series was employed by Karlheinz Stockhausen in his celebrated masterpiece Gruppen for three orchestras as a guide for the serial organization of the parts of the piece.
Based on these two criteria, we generated a temporal series of 'semitempi', starting at 40 bpm, by repeatedly multiplying by $\sqrt[12]{2}$, which corresponds to increasing the frequency by about 6% at each step. We obtained three 'octaves' of semitempi, that is, the series (in bpm, rounded to integer)

$$t_n = 40 \cdot 2^{n/12}, \qquad n = 0, 1, \dots, 36 \qquad (2)$$

We then chose seven bpm values, one every two steps (semitempo units), spanning one octave. This octave is roughly centred on 100 bpm and spans approximately the preferred tempo region as defined above. The seven bpm values (rounded to integer) were:

71 - 80 - 90 - 101 - 113 - 127 - 143 bpm (3)

and correspond to the IOIs (defined above):

845.1 - 750.0 - 666.7 - 594.1 - 531.0 - 472.4 - 419.6 ms

These bpm values are equally spaced on a logarithmic scale. We therefore assume that they are approximately equally spaced in psychological space (see for instance [71]). Furthermore, we can be reasonably sure from the above-mentioned estimates of tempo JNDs that the tempi in (3), increasing in frequency by about 12% at each step, are perceptually distinguishable from one another.
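The construction of the tempo scale can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the starting index of the selected octave (the tenth semitempo above 40 bpm) is inferred from the IOIs listed above, since their reciprocals determine the seven bpm values.

```python
BASE_BPM = 40          # first value of the semitempo series, as stated in the text
STEPS_PER_OCTAVE = 12  # one 'octave' is a doubling of tempo in 12 equal log steps


def semitempo_series(base_bpm: float = BASE_BPM, octaves: int = 3) -> list[int]:
    """Series (2): base_bpm * 2**(n/12) for n = 0..12*octaves, rounded to integer."""
    n_steps = STEPS_PER_OCTAVE * octaves
    return [round(base_bpm * 2 ** (n / STEPS_PER_OCTAVE)) for n in range(n_steps + 1)]


def tempo_scale(series: list[int], start: int = 10, step: int = 2, count: int = 7) -> list[int]:
    """Series (3): every second semitempo; start index 10 is an inferred assumption."""
    return [series[start + step * i] for i in range(count)]


def iois_ms(bpm_values: list[int]) -> list[float]:
    """Inter-onset intervals in ms (one decimal) for the rounded bpm values."""
    return [round(60_000 / bpm, 1) for bpm in bpm_values]


if __name__ == "__main__":
    series = semitempo_series()
    scale = tempo_scale(series)
    print(scale)           # seven target tempi in bpm
    print(iois_ms(scale))  # corresponding IOIs in ms
```

The IOIs computed from the rounded bpm values can be compared directly against the list given in the text.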
For each bpm value in (3) we produced an MP3 audio clip with WireTape Studio, from an open source digital metronome [72] providing a clearly audible click. The timbre of the click closely resembled that of standard, commercially available metronomes. Each audio clip of metronomic clicks, henceforth simply 'tempo', lasted 10 s. The number of beats in each tempo, rounded to integer, was 12, 13, 15, 17, 19, 21, and 24. Participants were not told in advance that all tempi had the same duration. Thus, they had no reason to attempt to count the number of beats during the learning phase, a very hard task to accomplish accurately given the relatively small differences between these numbers and the difficulty of memorizing seven similar numbers. Finally, to provide verbal labels instead of the hard-to-master metronomic designations in (3), we chose the numbers one to seven, one indicating the slowest tempo of the series and seven the fastest. To prevent participants from comparing tempi between trials and thereby using a relative rather than an absolute code, between successive trials we presented, in random order, a series of six 12 s distractors consisting of musical and visual excerpts. These clips were extracted from the beginning of an abstract animated movie of the first movement, Allegro, of Bach's Harpsichord Concerto in F minor, BWV 1056. The full video and soundtrack are freely available online [73]. The mean tempo in all the excerpts was quarter note = 82 bpm.

Procedure
The whole experiment was run on a MacBook Pro laptop computer using a PowerPoint slideshow. The experiment consisted of two tasks, identification and production. The completion of each task required about 10 minutes. Participants were tested individually in a silent room. Each participant completed the two tasks in two sessions separated by one to three days, depending on participants' availability. At the beginning of each session, participants sat at the table in front of the laptop and read the instructions for the specific task on the screen. The instructions were as follows (translated into English): "We will listen to seven sequences of metronome beats. They will be called 'tempi' and they will be ordered from slowest to fastest. Tempi will be identified by numbers from 1 (slowest) to 7 (fastest). In the test, you will be presented with a random sequence of these tempi (identification task version) / numbers (production task version). Your task will be, after each presentation, to report the number that in your opinion corresponds to the tempo you just heard (identification task version) / to tap on the table the tempo that corresponds to the presented number (production task version). In between presentations of tempi / numbers you will be presented with a brief audiovisual excerpt." After reading the instructions, participants responded to two training items with tempi not included in the seven-tempo experimental scale. Afterwards, the ordered series of seven tempi, each lasting 10 s (learning set), was presented once, together with the image of the corresponding numerical label on the screen and with 4 s between successive tempi (a black slide). We presented the tempi from slowest to fastest, in accordance with the order of the metronome series. Finally, a slide with the sentence "Be ready as the test is about to start" was presented for 3 s and the test began.
In the identification task, participants heard each randomly presented tempo (10 s) and were required to identify it promptly, with a unique label, and to report the corresponding number verbally. In the production task, participants saw each numerical label randomly presented on the screen and were required to tap promptly, for 10 s, with one finger on the table top to produce the corresponding tempo. After 10 s, the end of the trial was signalled by the word 'stop' presented on the screen. After each response, the experimenter pressed the spacebar to continue. Participants heard the clicks through the computer's internal speakers (they did not wear headphones). In each condition, participants performed seven trials; while listening and responding, they were not allowed to move any other part of their body. All responses were recorded in MP3 format with a Yamaha POCKETRAK Recorder for later analyses.

Design
We used a 2 × 2 mixed factorial design, consisting of two variables with two levels each: training (musicians vs. non-musicians) as a between-participants variable and task (identification vs. production) as a within-participants variable. The order of tasks was counterbalanced between participants. The independent variables were the level of expertise of participants and the experimental task. The dependent variables were the accuracy in retrieving the seven tempi, as measured by the proportion of correct identifications and correct productions, and the errors, as assessed by the distance between response and target tempo, expressed in number of semitempi, in the two tasks.

Measures
Each participant's productions, recorded in MP3 format, were imported into the open source software Audacity [74] to display sound amplitude vs. time, allowing us to clearly visualize the beat onsets. The produced tempo was computed by counting the number of beats in the time window defined by the onsets of the second and second-last beats. The first and last beats in each series were excluded. The mean produced tempo, expressed in bpm, was obtained from this count and the duration of the window. In the identification task, the error was defined as the difference between the target and the response tempo, expressed in number of steps (semitempo units) on the scale described by (2). In this task, therefore, correct responses are simply responses that match the target labels. In the production task, the error was likewise defined as the difference between the target and the response tempo, again expressed in semitempo units, as the result of $\log_{\sqrt[12]{2}}(\text{response bpm} / \text{target bpm})$, such that, for instance, a 118 bpm response to the 101 bpm target corresponds to an error equal to 2.7 semitempi. We then considered as correct all responses falling within ± 1 semitempo from the target, corresponding to a bpm shift of ± 6%. We chose this range for several reasons. First, this range matches empirically observed precisions in tempo perception and production. Second, our chosen range corresponds to a bpm change of ± 6% and is a conservative estimate [75] that is adopted in most studies on absolute pitch, where it corresponds to a resolution of one semitone [76,77,78]. Finally, given that the steps in scale (3) are separated by intervals of 2 semitempi (a resolution of 12% between contiguous steps), our chosen range represents the smallest possible error in the identification task. This implies that this range allows the most meaningful comparison between accuracies in the two tasks.
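The error measures described above can be illustrated with a short Python sketch. It is hedged in two respects: the formula for the produced tempo is an assumption (the number of inter-beat intervals in the window divided by the window duration), because the exact expression is not reproduced in the text, and all function names are hypothetical. The production error is computed as 12 times the base-2 logarithm of the response-to-target ratio, which is equivalent to a logarithm in base $\sqrt[12]{2}$.

```python
import math


def produced_bpm(onsets_s: list[float]) -> float:
    """Mean produced tempo from beat onset times in seconds.

    Assumption: the first and last beats are dropped, and the tempo is the
    number of intervals between the remaining beats over the elapsed time.
    """
    inner = onsets_s[1:-1]
    n_intervals = len(inner) - 1
    duration_s = inner[-1] - inner[0]
    return 60.0 * n_intervals / duration_s


def production_error(response_bpm: float, target_bpm: float) -> float:
    """Signed error in semitempo units; positive means faster than the target."""
    return 12 * math.log2(response_bpm / target_bpm)


def identification_error(response_label: int, target_label: int) -> int:
    """Signed error in semitempo units; adjacent labels are 2 semitempi apart."""
    return 2 * (response_label - target_label)


def is_correct(response_bpm: float, target_bpm: float, tolerance: float = 1.0) -> bool:
    """A production counts as correct within +/- 1 semitempo of the target."""
    return abs(production_error(response_bpm, target_bpm)) <= tolerance


if __name__ == "__main__":
    # Worked example from the text: a 118 bpm response to the 101 bpm target.
    print(round(production_error(118, 101), 1))
```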
Bivariate distributions of response and target tempi were very similar between training groups (columns), whereas they differed clearly between tasks (rows). The bivariate distributions reveal two additional features characterizing this difference. First, the association between response and target tempi was slightly weaker in the production task (r = .82 and .73, for musicians and non-musicians, respectively) than in the identification task (r = .90 and .86). This feature is of limited interest as it is likely to reflect the different constraints on the response in the two tasks. For this reason, we will not discuss it further. Second, linear fits indicated that both training groups were reasonably accurate in both tasks. Linear fit parameters on the identification data yielded slopes = 0.87 ± 0.04 and 0.87 ± 0.05 and intercepts = 12.60 ± 4.40 and 12.63 ± 5.42 for musicians and non-musicians, respectively. Similar fits on the production data yielded slopes = 1.13 ± 0.08 and 1.07 ± 0.10 and intercepts = -10.49 ± 8.23 and -9.79 ± 10.43. Thus, performance was always close to the expectation that average response tempo = target tempo for each target tempo value, although there was a slight tendency to underestimate in identification and a similar tendency to overestimate in production. Table 1 presents percentages of correct responses by musicians and non-musicians in the two tasks. The corresponding marginal distributions are summarized by the box-plots in Fig 2. Raw data are included in Supporting Information file S1 Data. The distributions reveal substantial overlap between the two training groups, with the musicians' median only slightly larger than that of non-musicians. In contrast, there is a clear difference between the two tasks.
Given that the distributions were reasonably consistent with the assumption of multivariate normality, Shapiro-Wilk test W = 0.98, p = .53, and homogeneity of variance, Bartlett's homoskedasticity test χ²(1) = 0.12, p = .73, we subjected these data to a 2 × 2 mixed-model ANOVA with training (musicians, non-musicians) as the between-participants factor, task (identification, production) as the within-participants factor, and number of correct responses as the dependent variable. This analysis revealed a significant main effect of task, F(1, 28) = 11.68, p = .001, ηp² = .37, whereas the main effect of training, F(1, 28) = 2.76, p = .102, ηp² = .05, and the interaction, F(1, 28) < 1, ηp² = .002, did not prove significant.

Errors
Mean errors (difference between the response and the target tempo) and the corresponding standard deviations are reported in Table 2. Note that errors are expressed in semitempo units, that is, unity corresponds to a 6% deviation relative to the target bpm and to roughly half the perceived difference between adjacent tempi in the graded series of our stimuli (assuming, as we have, that our series is approximately equally spaced psychologically, see Stimuli section). We observed that 48.6% of responses in identification and 22.0% in production fell within ± 1 semitempo from target, and 87.6% of responses in identification and 47.1% in production fell within ± 2 semitempi (± 12%) from target. This is represented in Fig 1B and 1D by the position of the data points relative to the marked areas that identify regions within one (light grey) and two semitempi (dark grey) from the line of perfect accuracy. The mean error is negative in each of the four conditions; this indicates a tendency to underestimate. Standard deviations are greater in non-musicians and in production.

Comparison with chance performance
These results indicate that the pattern of responses was not random, but depended both on target tempo and on its ordinal position in the learning set. This in turn suggests that some participants were occasionally able to encode the presented tempo and retrieve it without a reference, that is, they might possess a form of absolute tempo. However, to determine how many participants may be assumed to possess this ability and to evaluate whether musical training modulates its prevalence, we need a criterion to identify participants who performed above chance. We defined this criterion as a threshold number T of correct responses, such that the probability P of achieving at least that number of correct responses by chance is < 0.05. Choosing T in the identification task is straightforward. The probability of a random correct response in a trial is 1/7. Using the binomial distribution, we can compute the vector of probabilities P of at least k random correct responses in 7 trials (see below). Inspecting these probabilities shows that T = 4 satisfies the criterion. In the production task, chance level is lower because there are more than seven possible alternatives for each response; in this case, the choice of T is harder since there are several viable ways to calculate the probability of randomly producing a correct response. We compared two methods. In the first method, we computed repeated random permutations of the participants' 210 productions, and assigned them as putative responses to the test. The number of correct responses after 100 permutation cycles was 2,600, corresponding to an estimated probability of a single correct random response p = .12. Using the binomial distribution, we find that the probability of 3 or more correct guesses is P = .042 whereas the probability of 2 or [...] and p2 is the probability that the bpm produced in this range is the correct response, or p2 = 1/7 because any bpm in this range is a potentially correct response.
The composite probability of giving the correct response by chance is thus in reasonable agreement with the estimated probability p = .12 calculated with the first method.
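The permutation estimate used in the first method can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the function names, the scoring rule `within_tolerance` with its ±6% window, the intermediate values of the tempo scale, and the synthetic productions are all hypothetical, because neither the scoring rule nor the raw data are given in this section.

```python
import random

def permutation_chance(productions, targets, is_correct, cycles=100, seed=0):
    """Estimate the chance probability of a correct response by shuffling.

    On each cycle the produced tempi are randomly reassigned to the target
    slots, and the reassigned responses are scored against the targets.
    """
    rng = random.Random(seed)
    shuffled = list(productions)
    hits = 0
    for _ in range(cycles):
        rng.shuffle(shuffled)
        hits += sum(is_correct(p, t) for p, t in zip(shuffled, targets))
    return hits / (cycles * len(targets))

def within_tolerance(produced_bpm, target_bpm, tolerance=0.06):
    # Hypothetical scoring rule: correct if within +/-6% of the target tempo.
    return abs(produced_bpm - target_bpm) <= tolerance * target_bpm

if __name__ == "__main__":
    # Synthetic data only: 30 participants x 7 targets = 210 productions.
    # Only 71, 101 and 142 bpm are stated in the text; the other scale
    # values are placeholders.
    rng = random.Random(1)
    scale = [71, 80, 90, 101, 113, 127, 142]
    targets = scale * 30
    productions = [rng.uniform(60, 160) for _ in targets]
    print(round(permutation_chance(productions, targets, within_tolerance), 3))
```

With the figures reported above, 2,600 hits over 100 cycles of 210 reassigned responses give 2,600 / 21,000 ≈ .12.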
Having calculated the probability of getting just one correct response by chance, and again using the binomial distribution, we compute the probability of 2 or more guesses as P = .044 and the probability of 3 or more guesses as P = .0038. Thus, encouraged by the agreement between the two methods, we set the threshold for performance 'above chance' at T = 3 for the production task.
Fig 3 plots the number of correct responses for each participant in each task. The dotted lines correspond to the chosen values of T and divide the graph into four quadrants: chance performance in both tasks (bottom left), above chance in both tasks (top right), chance performance in identification but above chance in production (top left), and chance performance in production but above chance in identification (bottom right). We can see that five participants (three musicians and two non-musicians) performed above chance in both tasks. Nine participants (seven musicians and two non-musicians) performed above chance in identification, but not in production. Two participants (both non-musicians) performed above chance in production, but not in identification. Thus, more than half of the participants (16 of 30, 53.3%) were able to perform above chance in at least one of the two tasks. The majority of these were musicians (10 of 16), whereas the majority of participants performing at chance in both tasks were non-musicians (nine out of fourteen).
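The threshold criterion described above can be checked numerically. The following is a minimal sketch, assuming 7 independent trials per task and a fixed single-trial chance probability; the function names `upper_tail` and `threshold` are illustrative, not taken from the study.

```python
from math import comb

def upper_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

def threshold(n, p, alpha=0.05):
    """Smallest number of correct responses T with P(X >= T) < alpha."""
    for k in range(n + 1):
        if upper_tail(n, p, k) < alpha:
            return k
    return None  # no count satisfies the criterion

if __name__ == "__main__":
    n_trials = 7
    cases = [("identification", 1 / 7), ("production, first method", 0.12)]
    for label, p in cases:
        tails = [round(upper_tail(n_trials, p, k), 4) for k in range(n_trials + 1)]
        print(label, tails, "T =", threshold(n_trials, p))
```

Under these assumptions the sketch gives T = 4 for p = 1/7 and T = 3 for p = .12, in line with the thresholds adopted in the text.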
Finally, Fig 4 plots the average number of correct responses as a function of their ordinal position in the learning set. The curves suggest that the two tasks were affected in dramatically different ways by ordinal position. (An alternative possibility is that the tasks were affected by the items themselves. Although this seems unlikely, in principle it cannot be ruled out, as the items were always presented in the same order during the learning phase.) In identification, the curve was approximately U-shaped, such that the initial and final tempi were identified best, whereas the central value (101 bpm) was the hardest. Out of 30 participants, only 7 (23%) correctly identified the central tempo, whereas these frequencies increased to 19, 15, 12, 12, 16, and 21 for the other six tempi (in order from 71 to 142 bpm, skipping 101 bpm). A chi-square test of independence comparing the frequencies of correct and incorrect responses for the central tempo with those for all other tempi revealed a significant association, χ²(1) = 8.92, p = .003, ϕ = .21. In production, the curve was instead approximately an inverted U, such that the central value was produced most accurately and the initial and final tempi less accurately. Out of 30 participants, as many as 11 (37%) correctly produced the central tempo, whereas these frequencies decreased to 6, 8, 8, 5, 3, and 5 for the other six tempi (in order from 71 to 142 bpm, skipping 101 bpm). Again, a chi-square test of independence revealed a significant association, χ²(1) = 4.46, p = .035, ϕ = .15. Thus, the curves in Fig 4 revealed a dissociation between tempo identification and production when performance in these two tasks was evaluated as a function of item ordinal position. This finding may stem from a previously unreported difference in the memory encoding of tempo and in its later retrieval under the conditions of our identification and production tasks. We will return to our interpretation of the dissociation in the final discussion.
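The two chi-square tests can be recomputed from the reported frequencies. The sketch below assumes that each test compared the central tempo with the six other tempi pooled, in a 2 x 2 table of correct versus incorrect responses, and that no continuity correction was applied; these are assumptions, and the function names are illustrative.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p_value, phi). The p-value uses the identity
    P(chi2 > x) = erfc(sqrt(x / 2)), valid for 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, erfc(sqrt(chi2 / 2)), sqrt(chi2 / n)

def central_vs_others(central_correct, other_correct, n_participants=30):
    """Build the correct/incorrect table for the central tempo vs the rest."""
    others = sum(other_correct)
    others_total = n_participants * len(other_correct)
    return chi_square_2x2(
        central_correct, n_participants - central_correct,
        others, others_total - others,
    )

if __name__ == "__main__":
    ident = central_vs_others(7, [19, 15, 12, 12, 16, 21])
    prod = central_vs_others(11, [6, 8, 8, 5, 3, 5])
    print("identification:", [round(v, 3) for v in ident])
    print("production:    ", [round(v, 3) for v in prod])
```

Under these assumptions the sketch agrees with the reported statistics to the precision given in the text.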

A note on distractors
The mean tempo in the music excerpts used as distractors was quarter note = 82 bpm. This value is therefore very similar to that of the second experimental tempo. It is known that when a finger-tapping task is accompanied by a distractor sequence, participants unconsciously tend to synchronize with the distractor sequence [34]. Our participants, however, do not appear to have synchronized with the distractor's tempo: there is no evidence in the data of improved performance on the second item, or of a shift of produced tempi toward 82 bpm.

Discussion
These results provide evidence that some individuals have the ability to retrieve the temporal rate of an acoustic event without a reference (absolute tempo, AT). When compared with the estimated prevalence of absolute pitch (AP) found in the literature (about 0.01%, see [48][49][50][51][52][53][54][55]), the number of individuals that performed better than chance in our tasks may be taken as support for the hypothesis that AT is more common than AP. Also, in contrast with AP, which is generally considered to be relatively rare and strongly related to musical training [48], our results may be interpreted as evidence that AT is present in both musicians and non-musicians, although there is some evidence that musical training improves performance on tempo identification. It should be noted, however, that no accepted criterion exists for categorizing individuals as possessing AT. In the present study, as a first step in this direction, we proposed a criterion based on a specific definition of chance performance. The current interpretation could, however, change if a different and presumably better criterion is defined in future work.
Although our tasks did not differentiate sharply between musicians' and non-musicians' accuracies, we found a clear difference in performance between the identification and production tasks. Musicians showed better performance in identification in comparison to production and to non-musicians. This is especially surprising given that Western modern music is grounded in tonality, the systematic arrangement of pitches around a referential pitch class (the tonic), whereas there is no stable system of tempi. Our results are consistent with those of Pauws [62], who found absolute memory for tempo, but not for pitch, independent of singing ability. Participants were generally more accurate in identification, as one would expect given the nature of the two tasks. In the current data, approximately one in every two participants (14 of 30) performed above chance in identification, whereas only about one in four (7 of 30) did so in production. Interestingly, when comparing performance against chance predictions, the two tasks were affected in different ways by musical training. In the identification task, most musicians (10 of 15) were able to perform above chance, whereas the proportion of non-musicians that did so (4 of 15) was the same as the corresponding proportion in the production task. In the production task, most participants failed to perform above chance and, among those who did, musicians and non-musicians were present in approximately equal proportions (3 and 4, respectively). Surprisingly, musicians did not necessarily perform better than non-musicians in the production task. This suggests that the ability to perform above chance in production is not related to musical training.
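The proportions discussed above follow directly from the participant counts reported in the Results. The tally below is a small worked check; the dictionary layout and names are illustrative, and the musicians' "neither" count is inferred from the group size of 15 rather than reported separately.

```python
# Participants by group and outcome, as reported for Fig 3.
# "neither" for musicians is inferred as 15 - (3 + 7 + 0) = 5.
counts = {
    "musicians": {
        "both": 3, "identification_only": 7, "production_only": 0, "neither": 5,
    },
    "non_musicians": {
        "both": 2, "identification_only": 2, "production_only": 2, "neither": 9,
    },
}

def above_chance(task):
    """Number of participants per group who were above chance in `task`."""
    only = task + "_only"
    return {group: c["both"] + c[only] for group, c in counts.items()}

total = sum(sum(c.values()) for c in counts.values())
ident = above_chance("identification")
prod = above_chance("production")
at_least_one = total - sum(c["neither"] for c in counts.values())

print("identification:", ident, sum(ident.values()), "of", total)
print("production:    ", prod, sum(prod.values()), "of", total)
print("at least one task:", at_least_one, f"({100 * at_least_one / total:.1f}%)")
```

The tally gives 14 of 30 participants above chance in identification and 7 of 30 in production, which is the basis for the "one in two" and "one in four" figures.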
Presumably, tempo production involves more 'natural' abilities than tempo identification, as these abilities seem related to aspects of music cognition that are innate or learned very early [2,79] and to motor processes [80,29]. Music is not associated with a fixed semantic system but is, by essence, perceptually driven [11]. Perceptual learning from incidental exposure to the music of a culture provides the listener with implicit musical knowledge (automatically applied and not always available to conscious thought) of the structural pattern of that music [81]. Music is generally regarded as a product of human culture but core musical abilities are rooted in biological mechanisms [82]. For instance, a core mechanism enables most humans, independent of musical training, to sing a melody, to move in time with music, and to feel emotions when hearing music [83]; learning and singing a popular song are basic tasks that most of us can readily accomplish [82]. Peretz and Coltheart [83] describe these core mechanisms as a system of modules dedicated to the analysis or processing of different aspects of music. A modular account of music processing implies some degree of domain-specific processing and innateness [84]. Data on memory for tempo in one-week old infants [85] and the ability of newborns to perceive the temporal regularity of beats [86] also provide support for such innate components. However, it is prudent to consider that more general perceptual mechanisms may also account for the perceptual foundation of music [84].
Although our error analysis revealed that participants were generally accurate (see association between response and target tempi), the distribution of errors is also instructive. If there were no absolute memory for tempo, we would expect errors to be uniformly distributed. In contrast, we observed a clustering near the correct tempo (zero error); participants mainly made small errors, on average less than one semitempo. Finally, we observed a general tendency to give slower responses; this result is not consistent with the overestimation of tempo found by Levitin and Cook [18].
Performance on the central value of the learning set
Finally, we found that 101 bpm, the central value of the learning set, was the best-produced and worst-identified tempo (Fig 4). Both our identification and production tasks required the conversion of tempo / label into label / tempo representations, entailing a mapping of the ordered series of tempi onto an ordered series of names and vice-versa [80]. In the identification task, the response, a conversion of a stimulus (tempo) to a name, is a cognitive process, a selection / competition among many names that are placed on an ordinal scale. In the production task, the response, consisting in the conversion of a name into a produced tempo, is a process that generates a motor program. We suggest that these features of the two tasks are presumably the reason for the observed two-pronged effect on the central value.
Identification. Our results in the identification task show the characteristic bow effect (also called the edge or end effect) observed in absolute identification tasks when accuracy, the proportion of correct responses, is plotted as a function of the ordered set of stimuli [87,88]. Performance on stimuli that are either at the beginning or at the end of the range is better than performance on stimuli towards the middle of the range. To our knowledge, this is the first investigation that reports a bow effect in absolute identification tasks with tempo in the auditory domain.
Most existing models of absolute identification assume that the magnitude of the stimulus is compared with a long-term representation of the magnitude of each stimulus from the set or of particular anchor values [87]. For instance, in Thurstonian models, long-term absolute magnitude information is represented in the positioning of criteria along a perceptual continuum [89,90,91]. In exemplar models, long-term absolute magnitude is represented in the stored stimulus-magnitude, stimulus-label pairs [92,93,94]. In connectionist models, long-term absolute magnitude is represented in the mapping between stimulus and response nodes [88]. In anchor models, finally, long-term absolute magnitude is represented as the memory for anchors at the edges of the stimulus range [95,96] (for the empirical literature cf., among others, Stewart, Brown & Chater [87]; Lacouture & Marley [88]). In contrast to these models, the relative judgment model (RJM) does not assume long-term representations of absolute magnitudes. Instead, it assumes that responses are generated by comparing the current stimulus to the previous one, in conjunction with feedback from the previous trial [87,97]. Proponents of the RJM assume that limits in performance are not perceptual in nature but relate to the judgment, and that judgments are relative to the previous stimulus, not absolute. According to the RJM, a primary explanation of the bow effect is that for the first and last stimuli the opportunity to make mistakes is restricted (responses can be wrong only in one direction, being respectively larger or smaller than the correct response), whereas for the stimulus in the middle of the range, wrong responses can be either smaller or larger than the correct one. This limited possibility of error causes the peaks at each end of the range. Thus, absolute models assume substantial knowledge of the complete set of stimuli; relative models require only partial knowledge.
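The restricted-opportunity argument can be illustrated with a deliberately simple toy calculation. This is not an implementation of the RJM or of any model cited above: the slip probability, the clipping of out-of-range responses to the nearest label, and the function name are all assumptions made only to show why end items gain an advantage when errors can go in one direction only.

```python
def accuracy_by_position(n_items=7, p_slip=0.2):
    """Exact proportion correct per ordinal position in a toy error model.

    Each response equals the target position plus a slip of -1, 0 or +1
    (probabilities p_slip, 1 - 2 * p_slip, p_slip). A response that would
    fall outside the scale is clipped to the nearest valid label, so at
    either end a slip away from the scale still yields the correct label.
    """
    slips = ((-1, p_slip), (0, 1 - 2 * p_slip), (1, p_slip))
    accuracy = []
    for position in range(n_items):
        p_correct = 0.0
        for slip, prob in slips:
            response = min(max(position + slip, 0), n_items - 1)
            if response == position:
                p_correct += prob
        accuracy.append(p_correct)
    return accuracy

if __name__ == "__main__":
    print([round(a, 2) for a in accuracy_by_position()])
```

In this toy case accuracy is higher at the two ends than in the interior, but flat across interior positions; it captures only the edge advantage, not the graded U shape of a full bow effect.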
The present study was not designed to distinguish between these two classes of models. Further work is therefore needed to investigate the observed, and unexpected, bow effect. One interesting possibility in this respect might be to track responses in blocks with and without feedback. When feedback is omitted in absolute identification, as in our study, participants use their previous response as the best estimate of the correct answer on which to base a relative judgment [87]. If the RJM holds, therefore, in blocks without feedback we would expect error rates to vary systematically as a function of the correctness of previous responses, whereas in blocks with feedback this effect should disappear.
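The comparison proposed above could be set up along the following lines. This is only a sketch of one possible analysis: the function name and the example sequence are hypothetical, and in practice the function would be applied separately to blocks with and without feedback.

```python
def error_rate_by_previous_outcome(correct):
    """Error rate on each trial, split by whether the previous trial was correct.

    `correct` is a list of booleans, one per trial, in presentation order.
    Returns a dict with keys "after_correct" and "after_error"; a value is
    None when no trial falls in that category.
    """
    errors = {"after_correct": 0, "after_error": 0}
    trials = {"after_correct": 0, "after_error": 0}
    for previous, current in zip(correct, correct[1:]):
        key = "after_correct" if previous else "after_error"
        trials[key] += 1
        errors[key] += not current  # counts 1 for an error, 0 otherwise
    return {k: (errors[k] / trials[k] if trials[k] else None) for k in trials}

if __name__ == "__main__":
    # Hypothetical block of six trials, not data from the study.
    block = [True, False, False, True, True, False]
    print(error_rate_by_previous_outcome(block))
```

A systematic difference between the two rates in no-feedback blocks, absent in feedback blocks, would be the pattern the RJM predicts.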
Production. In the production task we did not observe the bow effect. On the contrary, the central value of the learning set was, over the group of participants, the best produced. This result is not consistent with Fine and Bull, who found that the middle of three tempi (110 bpm) was reproduced significantly worse than the first and last tempo [66]. We suggest that, in the production task, motor information implicated in the response generation has a specific link with spontaneous tempo or tactus. Several neuroscience studies suggest that there is a link between auditory and motor systems in rhythm processing (for a review of cognitive neuroscience literature see [98]); the motor system is activated not only during beat production, but also during beat perception. An auditory-motor model of rhythm perception was proposed by Todd and Lee [99], who considered two tempo-dependent components: the Time domain and the Frequency domain processes, carrying out temporal segmentation and periodicity analysis, respectively. A third source of tempo dependency is imposed by sensory-motor processes, a representation of dynamic properties of the motor system that is necessary to plan an action in advance. Sensory-motor components operate as a filter on the perceived rhythm; we may describe them as two dynamic systems associated with two types of motion: spontaneous foot tapping, which has a natural frequency of about 100 bpm [37], and the natural body sway, which has a natural frequency of about 12 bpm [98,99]. The periodicity that is the nearest to the foot-tapping resonance will be the one favoured to select the tactus [98,99].
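Because the rates above are quoted in bpm while the Introduction also describes tempo in terms of frequency and inter-onset interval, a short conversion may help in comparing them. The helper names below are illustrative.

```python
def bpm_to_ioi_ms(bpm):
    """Inter-onset interval in milliseconds for a tempo in beats per minute."""
    return 60_000 / bpm

def bpm_to_hz(bpm):
    """Beat frequency in Hz for a tempo in beats per minute."""
    return bpm / 60

if __name__ == "__main__":
    for label, bpm in [
        ("foot tapping", 100),
        ("body sway", 12),
        ("central tempo", 101),
    ]:
        print(f"{label}: {bpm} bpm = {bpm_to_hz(bpm):.2f} Hz, IOI = {bpm_to_ioi_ms(bpm):.0f} ms")
```

On this conversion, foot tapping at about 100 bpm corresponds to one beat every 600 ms, and body sway at about 12 bpm to one cycle every 5 s.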
Given the strong relationship between musical and physical motion [98,99,27], we might conclude that what we observed in our results is not, presumably, a memory effect, but a consequence of sensory-motor integration whereby the role of the body (motor system) affects the choice/production of tempo [29]. In the learning set, the tempo nearest to the periodicity of spontaneous tempo was 101 bpm, the central value. This is a knowledge-free competence, not affected by musical training [100], and could be a reasonable explanation for why 101 bpm was the best-produced tempo and why in the production task musicians did not perform better than non-musicians.
An alternative interpretation, plausible though partially speculative, takes into account the nature of the inter-trial distractor audio-visual sequence at test and its compatibility with the requirements of the tasks. It is commonly accepted (e.g., [101,102]) that music shares important features with spoken language. For instance, both language and music involve the production and the organization of perceptually discrete elements into hierarchically structured sequences in accordance with syntactic principles [103,104]. In addition, both need precise sequential timing, with audition playing a central role. Lastly, musical tasks share features with tasks used in motor learning, such as those involving movements of the hands and fingers with no verbal component. "From a listener's perspective, music is a complex structured sequence of sounds, but from a performer's perspective, it is also a long, complex sequence of motor acts" ([101] p. 52). In our task, the distractor sequence was introduced to prevent participants from comparing tempi between trials. However, being auditory in nature, it may have affected the identification and production tasks differently, since they relied on auditory recognition and motor reproduction, respectively. Thus, in the identification task, the distractor sequence may have prevented auditory rehearsal of the tempi, inducing reliance on their distinctiveness. The finding that in identification we observed the typical U-shaped serial position curve (i.e., the slowest and the fastest tempi were recognized better) is consistent with previous studies documenting ordinal position effects in auditory memory (e.g., [105,106]). However, this is the first investigation that reports such effects in memory for tempo and, most importantly, shows that the effects reverse when participants are required to reproduce the encoded tempi motorically.
We speculate that the auditory distractor task did not suppress motor memory, leaving kinaesthetic information available. Using such information, participants may have implicitly rehearsed motor movements using the spontaneous tempo (about 100 bpm) as a reference. Using this central value in this fashion would cause the serial position curve to take an inverted U-shape. Though speculative, this interpretation calls for more specific manipulations of the conditions for encoding and retrieval in tempo memory tasks. An obvious comparison in this respect might involve comparing conditions whereby participants are explicitly encouraged to move their hand to encode the tempo with conditions whereby they perform a different movement. Other investigations might consider stimuli not centred on 100 bpm to evaluate whether the statistics of the stimulus array, rather than an internal reference, may provide constraints on accuracy. Exploring these issues may open interesting avenues for future investigations of this phenomenon.