Quantifying Auditory Temporal Stability in a Large Database of Recorded Music

“Moving to the beat” is both one of the most basic and one of the most profound means by which humans (and a few other species) interact with music. Computer algorithms that detect the precise temporal location of beats (i.e., pulses of musical “energy”) in recorded music have important practical applications, such as the creation of playlists with a particular tempo for rehabilitation (e.g., rhythmic gait training), exercise (e.g., jogging), or entertainment (e.g., continuous dance mixes). Although several such algorithms return simple point estimates of an audio file’s temporal structure (e.g., “average tempo”, “time signature”), none has sought to quantify the temporal stability of a series of detected beats. Such a method, a “Balanced Evaluation of Auditory Temporal Stability” (BEATS), is proposed here, and is illustrated using the Million Song Dataset (a collection of audio features and music metadata for nearly one million audio files). A publicly accessible web interface is also presented, which combines the thresholdable statistics of BEATS with queryable metadata terms, fostering potential avenues of research and facilitating the creation of highly personalized music playlists for clinical or recreational applications.


Introduction
With the proliferation of back-end warehouses of music metadata (e.g., AllMusic, Gracenote, Last.fm, MusicBrainz, The Echo Nest [1]), front-end online music stores (e.g., Amazon MP3, Google Play Music, iTunes, 7digital, Xbox Music [2]), and streaming music services (e.g., Deezer, MySpace Music, Napster, Rdio, Rhapsody, Spotify [3]) come heretofore unparalleled opportunities to change the way music can be personalized for and delivered to target users with varying needs.
One need, shared by both rehabilitation professionals and exercise enthusiasts, is the ability to create music playlists which facilitate the synchronization of complex motor actions (e.g., walking) with an auditory beat. Auditory-motor synchronization has been deemed a human cultural universal [4] and a ''diagnostic trait of our species'' [5]. Even infants show perceptual sensitivity to [6] and coordinated motor engagement with [7] musical rhythms. The phenomenon of auditory entrainment (the dynamic altering of an ''internal'' periodic process or action generated by an organism in the presence of a periodic acoustic stimulus) remains an active topic for the field of music cognition [8][9][10][11][12][13][14].

Physical Isochrony versus Perceptual Stability
A basic requirement for the music used in auditory-motor rehabilitation paradigms is that it possess a stable tempo (i.e., the rate at which beats or pulses are perceived to occur), thereby facilitating motor synchronization to the beat. This requirement is typically satisfied through the use of a digital metronome, either in isolation or superimposed on top of computer-generated music (e.g., [51]), ensuring a precisely isochronous inter-beat interval (IBeI). However, a slightly more relaxed requirement could be proposed: that the sequence of IBeIs in the music stimulus need not be physically isochronous, but rather, be perceptually stable.
Although these conditions are well-controlled experimentally, they do not necessarily generalize to performed music. That is, absent a digitally produced rhythm track, IBeIs in performed music would be expected to exhibit some degree of ''natural'' variability in tempo (or, perhaps less pejoratively, ''flexibility of tempo''). However, an important question that follows from this assumption (namely, ''How much physical variability in an IBeI sequence results in the perceptual instability of tempo?'') has not been clearly asked, or answered. By contrast, studies seeking to quantify listeners' perceptions of tonal stability (e.g., [72,73]), or overall ''musical stability'' (e.g., [74]), are more frequent.

Beat Tracking and Tempo Extraction Algorithms
Accurately estimating the tempo of recorded music is an important topic within the field of music information retrieval (e.g., [75][76][77]), and numerous algorithms have been developed to accomplish this (for summaries, see [78][79][80][81]). Two broad categories of algorithms can be defined. Beat tracking algorithms return a time series of detected IBeIs along with a point estimate of ''average'' tempo in beats per minute (bpm). Tempo extraction algorithms return only the latter.
An important goal for beat tracking algorithms is to identify the temporal location of each beat accurately (i.e., with respect to listeners' ''ground truth'' perceptions) in the face of changes, drifts, fluctuations, or expressive variations in tempo within an audio file. The ability of a beat tracking algorithm to accurately identify the precise location of each beat in the face of a fluctuating temporal surface, however, is independent from its ability to meaningfully quantify how much temporal instability is actually present in the series of detected beats. Similarly, the ability of a tempo extraction algorithm to provide a point estimate (e.g., ''tempo = 90 bpm'') that agrees with human perception (e.g., the average inter-tap interval when listeners were instructed to tap to the beat) reveals nothing about whether that estimate is stable across the entire audio file, nor, if not, over what time indices of the file that estimate is stable. (The accuracy of any point estimate is of course dependent upon the manner in which it was computed, as will be illustrated in Section 4 of the Methods.)
To our knowledge, no current software algorithm, front-end interface, or back-end metadata service provider has offered any statistic explicitly designed to quantify the amount of beat-to-beat temporal instability within an IBeI series.
To address this issue, we expand upon our previous conference paper [82] and present a novel analysis tool: a ''Balanced Evaluation of Auditory Temporal Stability'' (BEATS). BEATS itself does not perform beat tracking, but instead takes beat and barline (i.e., downbeat) onsets estimated by an independent beat tracking algorithm as input. For its initial release, BEATS has been optimized to the data structure of the ''Million Song Dataset'' [83] (MSD; http://labrosa.ee.columbia.edu/millionsong/), a publicly available collection of computed acoustic features (e.g., individual beat and barline onsets; average tempo; estimated time signature) and music metadata (e.g., artist, album, and genre information) associated with nearly one million audio files processed using the proprietary ''Analyze'' algorithm [84] developed by The Echo Nest (www.echonest.com). Compatibility with this data structure has scalable advantages, as the full Echo Nest library contains over 35 million analyzed audio files.
For each analyzed audio file, BEATS computes nine Summary Statistics that quantify some characteristic of the inter-beat or inter-bar interval data. These statistics can in turn serve as input to search engines for which tempo is a key query feature (e.g., [75, 85-87]).
By providing a more comprehensive quantitative analysis of both tempo and tempo stability, and incorporating those statistics as filterable features within an online resource (''iBEATS'', described in Section 3 of the Results), BEATS becomes a further step towards a solution that provides users with access to music that has been tailored to their (or their patients') recreation or rehabilitation needs.

Platform
BEATS is implemented in Matlab (version ≥ 7.8), supplemented by a few publicly available functions associated with the Million Song Dataset [88] and Matlab Central (http://www.mathworks.com/matlabcentral).

Raw Data
For each metadata file, BEATS pulls four Echo Nest fields: beats_start and bars_start (the estimated onsets of successive beats and barlines, respectively); and tempo and time_signature (point estimates directly provided by Echo Nest). Next, beats_start and bars_start are transformed into an inter-beat interval (IBeI) series and an inter-bar interval series, respectively, by taking the first-order difference of each timestamp vector.
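As a minimal sketch of this step (the onset values below are hypothetical), the transformation is a single call to Matlab's diff:

    % Convert Echo Nest onset timestamps into interval series via
    % first-order differencing (example values only).
    beats_start = [0.50 1.00 1.51 2.00 2.49];  % estimated beat onsets (s)
    bars_start  = [0.50 2.49];                 % estimated barline onsets (s)
    ibei = diff(beats_start);                  % inter-beat interval series
    ibai = diff(bars_start);                   % inter-bar interval series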

Initialization Thresholds
BEATS requires the user to specify three Initialization Thresholds: the Local Stability Threshold (θ_Local), the maximum percentage deviation (relative to the IBeI series' central tendency, or between successive IBeIs) at which an IBeI is still considered temporally stable; the Run Duration Threshold (θ_Run), the minimum duration a string of stable IBeIs must have to count as a Run; and the Gap Duration Threshold (θ_Gap), the maximum duration of a string of unstable IBeIs that may be bridged when assembling the Stable Segment.
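For illustration, the three thresholds might be collected as follows (a sketch using the values adopted in the Implementation section; the struct and field names are our own, not part of BEATS' actual interface):

    % Illustrative Initialization Thresholds (names assumed).
    thresholds.Local = 5.0;  % theta_Local: maximum percentage deviation (%)
    thresholds.Run   = 10;   % theta_Run: minimum Run duration (s)
    thresholds.Gap   = 2.5;  % theta_Gap: maximum bridgeable Gap duration (s)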

Internal Calculations
The first statistic calculated by BEATS is an estimate of an IBeI series' central tendency, or location, l. Common measures of l include the mean, median, and mode. However, obtaining an optimal value for l can be more complicated than simply taking the mean, median, or mode of a series. Consider the hypothetical 80-element IBeI series S shown in Figure 1A, which exhibits two tempo changes (at the 21st and 41st elements). Visual inspection of the Matlab-derived mean, median, and mode reveals that all are clearly inadequate measures of the ''true'' central tendency of S (i.e., ≈ 1.0). One widely used method of obtaining a more accurate value for the central tendency of a dataset (specifically, the mode) has been the use of kernel density estimation (KDE) techniques, first proposed in the 1960s [89]. Figure 1B plots the estimated probability density of the distribution of values in S, using various values for the kernel bandwidth (i.e., the smoothing parameter). The mode of S is defined simply: the x-axis value at which the highest probability density (y-axis) occurs. As can be appreciated from Figure 1B, the bandwidth plays a strong role in the resultant mode: too narrow, and the mode will default to the series' most frequent value; too wide, and the density estimate will ''smooth over'' distinct features (in this case, time-varying features) within the data set, such as the presence of multiple modes.
To circumvent this problem, and thus provide a more ''representative'' value for l, BEATS makes use of a recent implementation of adaptive (variable-bandwidth) Gaussian KDE [90,91], which optimizes the bandwidth so as to return a valid density estimate even in the presence of multiple modes. Using this approach (shown as the blue density estimate in Figure 1B), l is calculated as 1.0002: a far more representative value.
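For readers wishing to experiment, a fixed-bandwidth approximation of this step can be sketched with Matlab's ksdensity (Statistics Toolbox); note that BEATS itself uses the adaptive-bandwidth implementation of [90,91], which this sketch does not reproduce:

    % KDE-based mode estimation with a rule-of-thumb (fixed) bandwidth.
    [f, xi] = ksdensity(S);   % estimated density f over evaluation points xi
    [~, iMax] = max(f);       % index of the peak density
    ell = xi(iMax);           % the mode: x value at which density is highest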
Having determined l, the longest ''Stable Segment'' within the IBeI series is then identified. The first step in this process is to identify the locations of ''stable'' IBeIs, where stability is operationalized in two ways: stability of each IBeI relative to l, and stability between successive IBeIs. The first type of stability is quantified via a ''percentage deviation from l'' (PDL) transformation:

S_PDL,i = |(S_i − l) / l| × 100.

The second type of stability is quantified via a ''successive percentage change'' (SPC) transformation between IBeIs i and i+1:

S_SPC,i = |(S_i+1 − S_i) / S_i| × 100.

(Both S_PDL and S_SPC are expressed as relative percentages so as to facilitate comparisons across IBeI sequences in different tempo ranges.) These two equations are used in sequence to identify the locations of temporally stable IBeIs. First, an initial determination of stability is made for each IBeI:

S_Stable,i = 1 if S_PDL,i ≤ θ_Local, and 0 otherwise,

where ''1'' indicates a stable IBeI relative to l. Next, for all pairs of elements {i, i+1} for which S_Stable has the value {1, 1}, S_Stable,i+1 is then revised:

S_Stable,i+1 = 1 if S_SPC,i ≤ θ_Local, and 0 otherwise.

[Figure 1 caption (partial): … Figure 1A, using various bandwidth values. The most accurate measure of central tendency was obtained using adaptive Gaussian KDE [90,91].]
A ''Run'' (i.e., a string of 1s) within S_Stable thus indicates both temporal stability relative to l as well as between successive IBeIs; a ''Gap'' (i.e., a string of one or more 0s) indicates temporal instability. The Stable Segment is defined as the longest consecutive sequence of adjacent Runs-plus-Gaps (e.g., {Run_j, Gap_j, Run_j+1}), where each Run has a duration ≥ θ_Run and each Gap a duration ≤ θ_Gap.
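A compact sketch of the labeling procedure, assuming (as reconstructed above) that θ_Local thresholds both transformations, and with S as a row vector and ell and thetaLocal as defined:

    % Stability labeling: deviation from ell, then successive change.
    S_PDL = abs((S - ell) ./ ell) * 100;       % percentage deviation from ell
    S_SPC = abs(diff(S) ./ S(1:end-1)) * 100;  % successive percentage change
    S_Stable = double(S_PDL <= thetaLocal);    % 1 = stable relative to ell
    for i = 1:numel(S) - 1                     % revise pairs of {1, 1} elements
        if S_Stable(i) == 1 && S_Stable(i+1) == 1
            S_Stable(i+1) = double(S_SPC(i) <= thetaLocal);
        end
    end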

E. Summary Statistics
(F) ''Estimated Meter'': a more precise operationalization of meter than the usual integer value (e.g., ''4 beats-per-bar''). Specifically, for a Stable Segment with a bar timestamp series {r_i, r_i+1, …} and beat timestamp series {b_j, b_j+1, …}, let n_i be the number of beat timestamps for which r_i ≤ b_j < r_i+1. Estimated Meter is then taken as the mean of all n_i (see the sketch following this list). Only in the case when all n_i have the same value will a true integer result (e.g., 4.0), providing an easy way to identify audio files that have an unstable meter within the Stable Segment.
(G) ''Maximum of Percentage Tempo Drift'' (PTD_max): quantifies ''short-term drift'' in tempo across all Runs, expressed as a percentage, and calculated as follows. First, within each Run, a series of 10-s windows is defined, with each successive window overlapping half of the previous window. Second, within each window, the best-fitting slope (i.e., linear tempo drift) through the IBeIs is found using least-squares linear regression (Matlab's polyfit; highlighted in red in the two example IBeI series shown in Figure 2). Third, for each calculated regression slope, the y-axis endpoints within window w are found, and expressed as a percentage change (i.e., a ''percentage of tempo drift'', PTD). In Figure 2A, for example, the best-fit slope in the 0 to 10 s window rises from y = 0.4997 to y = 0.5029 (yielding PTD = 0.65%), whereas the best-fit slope in the 10 to 20 s window falls from y = 0.5064 to y = 0.4897 (yielding PTD = −3.30%). Finally, PTD_max is taken as the largest absolute value of all PTDs across all Runs. For the IBeI series in Figure 2A, PTD_max = 3.30%.
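Returning to item (F), the Estimated Meter calculation might be sketched as follows, with r and b holding the Stable Segment's bar and beat timestamps (variable names assumed):

    % Estimated Meter: count the beats falling within each bar.
    n = zeros(1, numel(r) - 1);
    for i = 1:numel(r) - 1
        n(i) = sum(b >= r(i) & b < r(i+1));  % beats with r_i <= b_j < r_(i+1)
    end
    estMeter = mean(n);  % non-integer values flag an unstable meter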
Importantly, PDL_max, SPC_max, and PTD_max quantify partially independent aspects of temporal instability. The IBeI series in Figure 2B is in fact simply a random reshuffling of the IBeI series in Figure 2A, meaning that the two have identical means (= 0.50), standard deviations (= 0.005), and PDL_max (= 2.69%) statistics. Their SPC_max and PTD_max statistics, however, are markedly different (by a factor of 4 and 3, respectively). Quantifying these three aspects of temporal instability provides a richer description of each IBeI sequence, as well as how IBeI sequences differ from one another.
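For concreteness, the windowed-slope procedure behind PTD_max (item G above) might be sketched for a single Run as follows (ibei is that Run's IBeI series as a row vector; the loop bounds and variable names are our own):

    % PTD within one Run: 10-s windows with 50% overlap, per the text.
    t = cumsum([0 ibei(1:end-1)]);            % onset time of each IBeI (s)
    winLen = 10; hop = winLen / 2;
    ptd = [];
    for w0 = 0:hop:(t(end) - winLen)
        idx = t >= w0 & t < w0 + winLen;      % IBeIs falling in this window
        if nnz(idx) < 2, continue; end
        p = polyfit(t(idx), ibei(idx), 1);    % least-squares linear drift
        y0 = polyval(p, w0);                  % slope endpoint at window start
        y1 = polyval(p, w0 + winLen);         % slope endpoint at window end
        ptd(end+1) = 100 * (y1 - y0) / y0;    %#ok<AGROW> percentage tempo drift
    end
    PTDmax = max(abs(ptd));                   % largest |PTD| in this Run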

F. Implementation
To illustrate its various features, BEATS was run on the full Million Song Dataset using Initialization Thresholds of θ_Local = 5.0%, θ_Run = 10 s, and θ_Gap = 2.5 s. (The values of these thresholds, especially θ_Local, should be considered illustrative rather than prescriptive; more will be said about this point in Section 1 of the Discussion.)

Individual Examples
In Figure 3A, the entire audio file consists of a repeating (looped) four-beat percussion riff. The IBeI series is highly regular, with nearly all successive IBeI differences being less than 2 ms. This audio file represents an ''ideal'' case: near-perfect isochrony from the first beat to the last, yielding very low values for the three Summary Statistics that quantify IBeI variability (PDL_max, SPC_max, and PTD_max), as well as excellent agreement between BEATS' Estimated Tempo and Echo Nest's tempo estimate (a difference of less than one-tenth of 1%).
In Figure 3B, the audio file begins with a complex rhythm, to which a simple drum-and-cymbal rhythm (at approximately 150 bpm) at a higher frequency (pitch) and intensity (loudness) is added at the 13-s mark. This simple rhythm is removed at the 110-s mark, reintroduced at the 116-s mark, and remains in place until the end of the file at 199 s. It is this simple rhythm that drives the output of the Analyze beat detection algorithm. As such, the 94-s Stable Segment (identified by BEATS) is the longer of the two segments at that same tempo (the other being roughly 83 s). Within the Stable Segment, most IBeIs differ by only a few ms (similar to Figure 3A), yielding low values for the IBeI variability statistics. However, although the estimates of tempo by BEATS and Echo Nest again show excellent agreement, using the entire audio file in a motor synchronization paradigm (rather than just the Stable Segment) may prove challenging for some patients.
In Figure 3C, the Stable Segment comprises four distinct Runs bridged across three Gaps (at roughly 40 s, 77 s, and 160 s) that emerge as a consequence of unexpected syncopations in the voice (Gaps 1 and 2) or electric bass (Gap 3). PDL_max and SPC_max both have higher values than in the previous two examples, which might be expected, as this audio file was recorded in a studio with session musicians (as opposed to synthesized on a computer, like the excerpts highlighted in Figures 2A and 2B) [92].
In Figure 3D, the accelerando for which the piece is famous is clearly visible in the IBeI plot; such an acoustic feature would, in theory, make for poor temporal stability. BEATS, however, was able to identify a 61-s Stable Segment where the tempo accelerated in less than 5% increments (as quantified by the ''Maximum of Percentage Tempo Drift'' statistic, PTD_max).
Another feature of this IBeI series is notable. Although the perceptual tempo of the audio file continues to accelerate throughout its second half, the detected IBeI series (which had been tracking the quarter-note pulse) dramatically shifts from 0.42 s (at the 113-s mark) to 0.74 s (by the 116-s mark). Listening to the recording itself reveals a prominent change in timbre and intensity with the introduction of the chorus (and its strong accents on alternating quarter notes) at this point in the musical score (i.e., bar 49 in [93]). Although this musical event falls outside the Stable Segment, it raises an important point about the intimate dependency of BEATS on the beat tracking algorithm from which it takes its input data, a point detailed further in Section 1 of the Discussion.

Static Presentation of Summary Statistics
Figure 4 presents a histogram (with log_2 spacing along the y-axis for visual clarity) for each Summary Statistic. The number of files actually summarized in Figure 4 is 971,278; the remaining files (i.e., 2.9% of the full MSD) did not have an identifiable Stable Segment which satisfied the Run Duration Threshold (i.e., were found to have less than 10 s of temporal stability).
An immediate question of interest concerns the agreement in ''average'' tempo as estimated by BEATS (T_B) and Echo Nest (T_E). As revealed in Figure 4E, this match was generally quite high: 95% of all ETM percentage values fell within the interval [−2.20, 1.69]. That a vast majority of T_B values differed from their T_E counterparts by less than the just-noticeable difference for changes in tempo in isochronous IBeI sequences (cf. Section 1 of the Introduction) would seem, at first blush, to eliminate the need for BEATS entirely. Critically, however, agreement in terms of ''average'' tempo is only one piece of the puzzle, as it does not address whether (and over what portion of the audio file) that tempo is stable, and thus whether that value is statistically valid and experimentally useful.
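A sketch of the Estimated Tempo Mismatch (ETM) calculation implied here, assuming ETM is the signed percentage difference of BEATS' Estimated Tempo (tB) relative to Echo Nest's estimate (tE); the exact formula and variable names are assumptions:

    % ETM as a signed percentage difference between the two tempo estimates.
    etm = 100 * (tB - tE) ./ tE;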
In fact, Stable Percentage values (i.e., the percentage of each file's duration that consisted of temporally stable Runs separated by temporally unstable Gaps of no more than 2.5 s) varied widely across the MSD, as revealed in Figure 4B. Less than 22% of MSD files (N = 214,540) yielded a Stable Percentage = 100 (i.e., indicating temporal stability from the first detected beat to the last). This result has important consequences for ''unsupervised'' tempo-based playlist generation algorithms (e.g., [52]-[54]): only a fraction of audio files actually maintain their nominal tempo (i.e., their Echo Nest tempo estimate) over their entire duration.
By contrast, if a user simply requires music that is temporally stable over a minimum duration (say, 90 s; useful for short gait training episodes or bouts of rhythmic exercise between rest periods) rather than its entire duration, a more optimistic picture emerges. As highlighted in Figure 4A, 61% of MSD files (N = 609,676) have a Stable Duration ≥ 90 s: nearly three times the number of MSD files that have a Stable Percentage = 100. Allowing BEATS to identify the Stable Segment within each audio file (rather than using the entire audio file a priori) yields a greater number of files that could be utilized in tempo-based playlists.
With respect to meter, agreement between BEATS and Echo Nest was very high, as highlighted in Figure 4F: for 99.6% of files (N = 967,226), the two estimates matched exactly (e.g., time_signature = 4 and Estimated Meter = 4.0). An unexpected result, however, also emerged: a substantial number of audio files (N = 21,412) yielded an Estimated Meter = 7.0. (This number was reduced to 11,164 when excluding audio files with a Stable Duration of less than 60 s.) This ''odd'' result was corroborated by the time_signature statistic (i.e., Echo Nest's own meter estimation) for these files; agreement was found in all cases. A cursory listening of these audio files revealed that the Estimated Meter value was, not surprisingly, inaccurate. Identifying misclassifications such as these will provide important ''grist'' to refine future beat tracking algorithms, a point further elaborated upon in Section 2 of the Discussion.
A final question pertains to correlations among the three Summary Statistics that most directly quantify the stability of an IBeI series: IBeI deviations from l (PDL_max), successive changes between IBeIs (SPC_max), and IBeI drift within Runs (PTD_max). Figure 5 provides the answer, using scatter plots to visualize pairwise relationships between these three variables for the 609,676 MSD files with a Stable Duration ≥ 90 s. (This threshold was applied so that the scatter plot relationships would be less biased by Summary Statistics calculated from short excerpts of music.) Although the correlation between each pair of variables is positive (and ''very'' statistically significant given the large number of observations), it is clear that any one variable captures only a portion of what it means to be ''temporally stable''.

Interactive Exploration of Summary Statistics
To more effectively interact with (and benefit from) the full set of Summary Statistics, an interactive tool is required. To this end, a LAMP-based (Linux, Apache, MySQL, PHP) web interface was developed. This interface, termed iBEATS (with a permanent URL at http://ibeats.smcnus.org/), integrates the full output of BEATS with three other valuable pieces of metadata: artist name, album release year, and descriptive genre tags.
For each item in the MSD, album release year was obtained by querying the 7digital application programming interface (API) (http://developer.7digital.com) using the MSD variable release_7digitalid. This yielded a total of 930,852 matches, a significant improvement upon the 515,576 files with a non-zero value in the MSD year variable [83].
For each unique artist in the MSD, a set of descriptive terms was pulled (MSD variable artist_terms) covering both high-level genres (e.g., ''rock'', ''electronic'', ''heavy metal'') and specific subgenres (e.g., ''garage rock'', ''deep house'', ''progressive metal''), as well as broad geographic descriptors (''brazilian'', ''french'', ''swedish'') and specific regional influences (e.g., ''brazilian pop'', ''french rap'', ''swedish hip hop''); up to 10 terms with an artist_terms_weight ≥ 0.5 for that particular artist were retained. The weight statistic, with values ranging from 0 to 1, reflects how descriptive a given term is with respect to the artist in question (as proprietarily determined by Echo Nest; cf. [94]), similar to a term frequency-inverse document frequency statistic. Table 1 lists the 20 most frequently encountered artist terms in the MSD, tallying the number of artists and the number of songs associated with each term.
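A sketch of this term-selection rule (terms as a cell array of strings and weights as a numeric vector, as pulled from artist_terms and artist_terms_weight; variable names assumed):

    % Retain at most 10 sufficiently descriptive terms per artist.
    keep = find(weights >= 0.5);                         % weight threshold
    [~, ord] = sort(weights(keep), 'descend');           % most descriptive first
    topTerms = terms(keep(ord(1:min(10, numel(ord)))));  % cap at 10 terms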
(The Spearman correlation between these two item counts is r = .966 for the 1080 terms associated with at least 10 unique artists in the MSD.) The final number of MSD items which had valid tag data, year data, and a Stable Segment of at least 10 s was 902,081.
Figure 6 presents a screenshot of an iBEATS query. The nine Summary Statistics are visualized using histograms, similar to Figure 4, and can be re-thresholded at liberty. To facilitate users' ability to navigate musical space, 952 distinct artist terms were mapped onto one of two browsable, two-level hierarchies: one covering genre/style (with organization derived in part from www.allmusic.com/genres; e.g., ''garage rock'' is mapped to Rock → Psychedelic/Garage), and the other covering geography (roughly corresponding to continent and country; e.g., the term ''suomi rock'' is mapped to Europe, Northern → Finland). Additionally, specific artist names may be retrieved using text-based auto-completion (e.g., ''ab'' retrieves both ABBA and Abbott & Costello as options).
[Figure 6 caption (partial): … (2), and/or direct input to the Artist Name field (3). Filtering (4) reveals the number of candidate songs satisfying the query, which may then be further examined (5) and an audio sample previewed (6). The candidate playlist may then be exported (7) for subsequent use by a streaming music service (e.g., Spotify).]
In the example shown in Figure 6, a playlist has been created for a hypothetical patient about to begin a gait rehabilitation paradigm. The following input parameters were used: all Rock genre songs from 1950 to the present, with a Stable Duration ≥ 90 s, Estimated Tempo between 115 and 125 bpm, Estimated Meter = 4.0, and PDL_max, SPC_max, and PTD_max all ≤ 5.0%. 19,725 audio files from the MSD satisfy this query, and are returned in a pop-up window; where available, 30-s audio previews are provided by making use of Echo Nest's integration with 7digital audio previews [95]. (Note that the number of available files for a particular query is scalable: as BEATS expands further into the 35-million-item Echo Nest catalog of metadata, so too does the number of candidate songs satisfying that query.) The final, customized playlist (including, importantly, the starting and stopping time indices demarking the Stable Segment) may then be exported for subsequent handling by a streaming music player (e.g., Spotify; www.spotify.com), as described further in Section 2 of the Discussion.
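Setting aside the genre and year filters, the Summary Statistics portion of this query reduces to logical indexing of the kind sketched below; all field and variable names here are assumptions, not iBEATS' actual schema:

    % The Figure 6 thresholds as logical indexing over arrays of
    % Summary Statistics (one element per MSD file).
    q = stats.StableDuration >= 90 & ...
        stats.EstTempo >= 115 & stats.EstTempo <= 125 & ...
        stats.EstMeter == 4.0 & ...
        stats.PDLmax <= 5.0 & stats.SPCmax <= 5.0 & stats.PTDmax <= 5.0;
    candidateIDs = trackIDs(q);   % the candidate playlist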

Discussion
Although many widely used beat tracking or tempo extraction algorithms, front-end software interfaces, and back-end metadata service providers offer point-estimate statistics for the ''average'' tempo of an audio file, none has sought to systematically quantify the amount of temporal instability within an inter-beat interval (IBeI) series. Such an analysis is, we propose, acutely necessary to accurately design playlists for motor rehabilitation or rhythmic exercise paradigms, for which a stable beat is a prerequisite feature.
The proposed analysis tool, a ''Balanced Evaluation of Auditory Temporal Stability'' (BEATS), seeks to fill this need. The ultimate utility of BEATS, however, rests on (at least) two important caveats. The first caveat concerns the accuracy of the beat tracking algorithm; the second concerns the choice of thresholds used to define the Stable Segment.

Caveats
A first caveat, as noted in the Introduction, is that BEATS possesses no beat tracking capabilities itself; its raw material is a vector of beat and barline timestamps previously detected by an external algorithm. For this reason, the idiosyncrasies of a particular beat tracking algorithm (or a systematic difference between two ''competing'' algorithms) will necessarily be reflected in whether and where BEATS identifies a Stable Segment of IBeIs. An algorithm's beat tracking performance can be affected by both temporal (e.g., a complex rhythm loop) and non-temporal (e.g., recording quality) features of an audio file; examples of this were highlighted in Figure 3 and detailed in Section 1 of the Results.
Although this fact may make BEATS conservative (in that some audio files will be deemed to lack a Stable Segment of a ''useful'' minimum duration if many Gaps are present), such conservativeness may be beneficial in practice, as it will exclude pieces of music that may in fact be too challenging for listeners to synchronize with. (An ever-larger library of processed audio files will, of course, mitigate this conservativeness.) Indeed, the relationship between how a beat tracking algorithm performs and how listeners themselves perform when given a beat tracking task continues to drive developments in the field [79, 96-99]. The more closely an algorithm mimics human perception with respect to how it responds to temporal instability, the higher the quality of the Summary Statistics calculated by BEATS.
A second caveat is that the output of BEATS depends heavily on the choice of its Initialization Thresholds (cf. Section 3 of the Methods): the Local Stability Threshold (θ_Local), Run Duration Threshold (θ_Run), and Gap Duration Threshold (θ_Gap). Of these three, θ_Local perhaps has the strongest influence over the likelihood of finding a Stable Segment with a ''useable'' duration (e.g., ≥ 90 s). In the present report, a value of θ_Local = 5.0% was selected. This value was chosen after a careful examination of the literature exploring just-noticeable differences (JNDs) within and between auditory temporal patterns (cf. Section 1 of the Introduction), and after determining that no prior reported threshold satisfied the constraints of the current project. Thus, the pattern of Summary Statistics obtained using θ_Local = 5.0% should be taken as illustrative rather than prescriptive. A conservative θ_Local value (e.g., 1.0%) would certainly decrease the number of available audio files with a useable Stable Duration, but at the same time increase the confidence that any audio files that ''made the cut'' were truly perceptually stable. Ultimately, adjusting both the Initialization Thresholds and the musical content (genre, artist, decade) to suit the needs and preferences of each target user (and the goals of the accompanying motor task) would seem the most prudent choice.

Future Directions
The primary aim of BEATS and iBEATS is to provide accurate statistics about tempo stability in a large collection of audio files, and to make that information easily accessible to users. Increasing the size of BEATS' library (via access to Echo Nest metadata) to provide a greater collection of potential music stimuli is planned for the immediate future. Additionally, as noted by a reviewer, the manner in which genre/style terms are made available to a user by iBEATS may be as important as the statistics a user is hoping to obtain from iBEATS. Providing additional tools for musical ''navigation'' would offer enhanced accessibility and, in turn, widen the potential user base.
Although iBEATS itself is not viable as a means of delivering a rhythmic auditory cueing paradigm, we plan to author a mobile application that would (1) take a user's input (artist, genre, tempo range, tempo stability thresholds, etc.); (2) query BEATS and obtain a candidate playlist; and (3) deliver that playlist using existing APIs authored by licensed streaming music services such as Deezer (http://developers.deezer.com/), Rdio (http://www.rdio.com/developers/), or Spotify (https://developer.spotify.com/). The ability to pair iBEATS with other mobile applications would offer novel ways to discover music; for example, by identifying a segment of audio using a music identification service (e.g., Shazam; http://www.shazam.com/) and then using BEATS to find music with similar temporal characteristics (a form of ''query by example''; cf. [100]), or by utilizing a touchscreen-based ''query by tapping'' (cf. [101]) to more intuitively capture the desired movement rate.
In another vein, concurrent work from our laboratory [102] has sought to validate a mobile application to quantify the basic temporal dynamics of human gait in both healthy adults and Parkinson's patients. A subject's cadence (i.e., number of steps per minute) could then itself be used as an input parameter, creating a ''query by walking'' paradigm (which, although proposed previously [87], has yet to be explored within the music information retrieval literature).

Current Applications
Besides these future enhancements for ''front end'' users, current researchers may already benefit from BEATS. For researchers seeking to improve beat tracking algorithms, for example, BEATS could be used to identify audio files with ''strange'' IBeI patterns (e.g., Figure 3D) that may reflect an inherent limitation of a certain beat tracking algorithm, or to find those audio files with a sizable Estimated Tempo Mismatch (cf. Figure 4E).
BEATS could also prove useful with respect to identifying an algorithm's misclassifications of meter (e.g., [103]) or tempo ''octave'' (e.g., [104]). Because the Stable Segment identified by BEATS within a given audio file possesses, by definition, a repeating acoustic pattern at some rhythmic level (e.g., eighth note), only a brief portion of the Stable Segment should be necessary for a human annotator to (1) indicate (i.e., tap) the pulse level (e.g., eighth note, quarter note, half note) they felt was most natural and (2) indicate whether the meter estimated by the algorithm (e.g., 3, 4) agreed with their own perceptions. This ''accelerated'' annotation process would greatly reduce the labor required to confirm these important statistics and identify misclassifications (e.g., the suspiciously high number of audio files with an ''Estimated Meter = 7.0'', as noted in Section 2 of the Results). Such audio files would provide an immediate set of diagnostic stimuli that could be used to compare how beat tracking algorithms (particularly those informed by computational, psychological, and neurobiological models of how human listeners track patterns in time; for recent comprehensive reviews, see [12-14, 105, 106]) perform relative to listeners' ground-truth tapping annotations. Fusing ''bottom-up, data-driven'' retrieval methods with ''top-down, knowledge-based'' models of human perception, cognition, and emotion remains a key focus for the field of music information retrieval (e.g., [43, 83-86]).

Conclusion
We present a novel tool to quantify auditory temporal stability in recorded music (BEATS). An important departure that BEATS makes from other methods is that it seeks to identify the most temporally stable segment within an audio file's inter-beat interval (IBeI) series, rather than derive a point estimate of tempo for the entire IBeI series. This increased flexibility enables BEATS to identify a greater number of candidate audio files for use in tempo-based music playlists. An online interface for this analysis tool, iBEATS (http://ibeats.smcnus.org/), offers straightforward visualizations, flexible parameter settings, and text-based query options for any combination of artist name, album release year, and descriptive genre/style terms. Together, BEATS and iBEATS aim to provide a wide user base (clinicians, therapists, caregivers, and exercise enthusiasts) with a new means to efficiently and effectively create highly personalized music playlists for clinical (e.g., gait rehabilitation) or recreational (e.g., rhythmic exercise) applications.