The Global Jukebox: A public database of performing arts and culture

Standardized cross-cultural databases of the arts are critical to a balanced scientific understanding of the performing arts, and their role in other domains of human society. This paper introduces the Global Jukebox as a resource for comparative and cross-cultural study of the performing arts and culture. The Global Jukebox adds an extensive and detailed global database of the performing arts that enlarges our understanding of human cultural diversity. Initially prototyped by Alan Lomax in the 1980s, its core is the Cantometrics dataset, encompassing standardized codings on 37 aspects of musical style for 5,776 traditional songs from 1,026 societies. The Cantometrics dataset has been cleaned and checked for reliability and accuracy, and includes a full coding guide with audio training examples (https://theglobaljukebox.org/?songsofearth). Also being released are seven additional datasets coding and describing instrumentation, conversation, popular music, vowel and consonant placement, breath management, social factors, and societies. For the first time, all digitized Global Jukebox data are being made available in open-access, downloadable format (https://github.com/theglobaljukebox), linked with streaming audio recordings (theglobaljukebox.org) to the maximum extent allowed while respecting copyright and the wishes of culture-bearers. The data are cross-indexed with the Database of Peoples, Languages, and Cultures (D-PLACE) to allow researchers to test hypotheses about worldwide coevolution of aesthetic patterns and traditions. As an example, we analyze the global relationship between song style and societal complexity, showing that they are robustly related, in contrast to previous critiques claiming that these proposed relationships were an artifact of autocorrelation (though causal mechanisms remain unresolved).


S1.1.2. The Contemporary View
While not universally welcomed, systematic comparative research is reentering the domains of the arts and humanities [6]. Lomax's sweeping pronouncements could class him as a "grand theorist." Nonetheless, he and his colleagues were engaged in a modern endeavor, using cross-cultural analysis in a series of experiments delving into questions which are admittedly broad but defined and delimited by strict protocols of analysis, data points, and probability. Recent studies deal with musical universals, perception and taste, cultural and musical evolution, and the role of aesthetics in society [7][8][9][60]. To effectively investigate such questions, it is useful to supplement theory underpinning or arising from detailed studies of a single society or expressive phenomenon with approaches from other methodological vantage points, including comparative and cross-cultural analysis. In contemporary archaeology there is a call for multiple approaches to research problems, ranging from world systems and historical perspectives to critical qualitative studies and comparative and quantitative analyses [10]. Variation on the global scale is not random; there are myriad global patterns that call for explanation.
While each society is unique and interesting in its own right, it is also necessary to explain the broad cross-cultural patterns we see in culture and, in this case, music [11][12][13]. Lomax (1980) summarized the main methods and findings from the Cantometrics Project, which were further summarized and reviewed by Savage (2018) and Wood (2018a, 2018b, 2021). Here we provide a condensed summary; please see these other publications for further details.

S1.2.2. Original project: clusters of musical style
Second, 10-14 broad but distinctive clusters of musical style emerged from factor analysis (Fig 1; Table S2). Although a new study finds that music is weakly related to genetic distance, language (basic vocabulary), and geographic proximity [14], the musical style clusters are roughly consistent with findings on human settlement made by geneticists and archeologists, and subgroupings within these regions matched their cultural geography; Kubik, for example, noted that Cantometrics' subregional clusters of African musical traditions agreed with his own findings [16].

Table S2. Cantometrics Song Style Cluster Descriptions [17,18]

1) African Gatherer: Harmonious, inclusive, integrated music. "Group singing is not only contrapuntal but polyrhythmic, a playful weaving of four and more strands of short, flowing, canon-like melodies (each voice imitating the melody of the others), sounding wordless streams of vowels in clear, bell-like yodelling voices" (Lomax, 1976, p. 38).

5) Nuclear America: "diffuse, highly individualized choralizing... use of polyphony that often veers toward vocal heterophony. Frequency of irregular and one-beat meters, wide melodic intervals, and guttural vocalizing link the Nuclear American style with that of the Sibero-American hunters--but a pattern of soft [...]" example, between C. African and S. African hunter-gatherers [20,21], and between Tupian-speaking Amazonian peoples and Melanesians), which were confirmed through ancient DNA analyses decades later [22, see also 21].

S1.2.3. Recent similar outcomes
Several unrelated studies have found seven to fourteen regions in which coded variable states group together geographically [23][24][25]. These match Lomax's geographies of style, which for the most part match regions of (mostly) ancient settlement and migration found in genetic and archeological studies of old populations. A modern cluster analysis of 4,714 musical samples coded in Cantometrics, using latent class analysis, found that, in tests modeling from one to 31 solutions, the 14-cluster solution and 2-3 of its nearest neighbors were the best "fit" to the data [26].

S1.2.4. Original correlation analyses
The societies included in the Ethnographic Atlas had been coded by Murdock for social and cultural characteristics documented by ethnographic research; these data have recently been validated [27]. The sociocultural variables were ethnographically documented between about 1650 and the 1960s (with a few much earlier exceptions).
Correlation analyses were run on every performance variable against many Ethnographic Atlas variables. This process helped identify the principal environmental and socio-cultural variables that varied with performance. A number of these were included in a set of 38 socio-cultural variables, some with modifications [19]. The societies in all the studies were then coded for these variables, and the codings were tested against performance traits. Only statistically significant correlations (p < .05) were reported in publications. Questions about control for multiple comparisons and for autocorrelation were raised by Erickson (1976), but his reevaluation confirmed the cluster analysis and one of the earliest Cantometrics correlations, between pinched voice and nasality and restrictions on female sexuality [17,28], finding that sexual restrictiveness predicts less sonority in singing.
This made it possible to intercorrelate musical and socioeconomic factors, and to distinguish two larger independent systems of interrelated musical factors, labeled "Differentiative" (Information-Productivity) and "Integrative" (Labor Organization-Gender Role), with factors of dynamics, tension, ornament, melody, and rhythm in a dependent and variable relationship to both. Plotting the two large systems (Integrative and Differentiative) on the vertical axis against a horizontal axis bearing the regional cultural systems, left to right (Fig 1; Table S2), shows how the Integrative system varies its trajectory in response to the Differentiative system, and how both correlate with cultural systems (Fig S1).
Fig S1. Covariance of song style plotted against cultural systems, from [15].
Lomax concluded that musical style is an indirect response to societal fundamentals, such as mode of subsistence and productivity, social organization (including stratification), the organization and gendered division of labor, and climate. The covariance of the two main systems driving musical style portrayed the elaboration of performance style as generally following socio-economic complexity.

S1.2.5. Independent outcomes.
Lomax et al. found that polyphonic vocal organization is most frequent in complementary socio-economies such as those of warm latitude gatherers, or gardeners with a few domestic animals, where women make an equal contribution to subsistence [17]. The negative association between polyphony and plow agriculture has been retested [29], yielding higher coefficient values than those originally obtained. These findings are supported by an exhaustive study by Alesina et al. [30] on the relationship between plow agriculture and women's loss of status. Consonant-vowel syllables, which reflect a regular rhythm in language, are related to the degree of baby-holding, a theory derived from Barbara Ayres' work with Cantometrics, which found regular rhythm in music significantly associated with baby-holding [28,31]; other independent analyses have confirmed some of these findings.
Regardless, both correlations and cluster analyses will be reanalyzed with the far more sensitive techniques now available.

2. Minutage. This study of breath management and phrasing in indigenous and folk song investigated the ways that melody and song structure are articulated through the breath.
Minutage is a way of understanding song structure as it is performed. It considers the framework of a style in terms of the habits and small decisions concerning when to end an expulsion of sound, for example, when and how long to pause, who pauses, when to let a note ring out, whether to emphasize inhalations or to cover them up.

Line 21 Axis - Mid Back - Low Central
Prevalence of oscillations between mid back and low central vowels.

Line 22 Axis - High Front - Low Back
Prevalence of oscillations between high front and low back vowels.

Line 23 Axis - Mid Central - Low Central
Prevalence of oscillations between mid central and low central vowels.

Line 24 Axis - High Front - Low Central
Prevalence of oscillations between high front and low central vowels.

Line 25 Axis - Mid Front - Low Central
Prevalence of oscillations between mid front and low central vowels.

Line 26 Axis - Mid Central - Low Front
Prevalence of oscillations between mid central and low front vowels.

Line 27 Axis - Mid Front - Mid Central
Prevalence of oscillations between mid front and mid central vowels.

Line 28 Axis - High Front - Mid Central
Prevalence of oscillations between high front and mid central vowels.

Line 29 Prolongation - High Front
Prominence of prolonged high front vowels.

Line 30 Prolongation - High Central
Prominence of prolonged high central vowels.

4. Ensembles and 5. Instruments. Two related studies of instruments and ensembles based on bibliographic information and the Sachs-Hornbostel system, which classifies instruments by the way their sound is produced rather than by how they are constructed. Instruments are also classified according to their gender and other symbolic functions.

Line 1 Overall pattern of interaction
How speaking time is divided between two people in a conversation.

Line 2 Pattern of interaction in stretches
Pattern of turn-taking in stretches of conversation, regardless of which speaker assumes the leading role in each stretch.

Line 3 Speaker Similarity
Similarity of the speakers' voice qualities and dynamics.

Line 4 Principal Type of Relationship
How transitions between speakers occur.

Line 5 Stability of Role Relation
Whether speakers' relationship to each other is constant or changeful throughout the course of the conversation.

Line 6 Longest speech
Length of the longest stretch any one speaker continues, even with interruptions or interjections, before another speaker takes over.

Line 7 Number of interventions in longest speech
Number of utterances by the speaker who does not hold the floor during the longest speech measured in Line 6.

Line 8
Longest speech burst without pause/interruption Length of the longest stretch of speech in the conversation by a single speaker, without pauses or interruptions.
Line 9A Dominant speech burst 1 Most prominent length of speech bursts (stretches of speech by a single speaker without pause or interruption).
Line 9B Dominant speech burst 2 Second most prominent speech burst length.
Line 9C Dominant speech burst 3 Third most prominent speech burst length.
Line 10 Stability of Timing Degree to which the durations of speech bursts have regular patterning.

Line 11 Attempted Interventions
Frequency of brief energetic speech bursts by one speaker during activity by another (attempts to take over the leading position).
Line 12A Number interchanges per minute 1 Number of speaker changes during one minute of conversation.
Line 12B Number interchanges per minute 2 Number of speaker changes during a second minute of conversation.
Line 13 Inter-speaker Transition Degree of abruptness of the transitions between speakers.

Line 14 Gabble
Frequency of two speakers' utterances coinciding and masking each other to produce a noisy effect.

Line 15 Interjections as Responses
Frequency of interjected responses by one speaker to another.

Line 16 Murmured interjections
Frequency of quiet, brief interjections by speaker B during A's main activity that support this activity (e.g., um-hum, aha, etc.)

Line 17 Intoned Vocal Segregates
Frequency of vocal segregates with clear, musical pitch.

Line 18 Echo
Frequency with which one speaker repeats the speech of another.

Line 19 Repeats
Frequency with which a speaker repeats his or her own utterances.

Line 20 Inter-speaker Pauses
Presence and frequency of pauses between each speaker's speech.

Line 21 Intra-speaker Pauses
Presence and frequency of pauses within the speech of one speaker.

Line 22 Vocal Stance
Degree to which each speaker's vocal stance is stable or changeful.

Line 23 Laughter
Frequency of giggling, chuckling, snickering or laughing.

Line 24 Humorous
Degree of humor and laughter-filled tone in the interaction.

Line 25 Tender
Degree of tenderness in the conversation.

Line 26 Supportive
Degree to which the relationship between the speakers is supportive.

Line 27 Excited
Level of tension and excitement in the interchange.

Line 28 Competitive
Degree to which the interchange is competitive or aggressive.

Line 43A Prolongation - Vocal Prominence of prolonged and/or very short notes in the vocal part.
Line 43B Prolongation -Orchestra Prominence of prolonged and/or very short notes in the instrumental part.
Line 44A Syncopation -Vocal Degree to which vocalists anticipate or delay the beat.
Line 44B Syncopation -Orchestra Degree to which instrumentalists anticipate or delay the beat.

Line 45
Vocal Tone Special ways that singers may approach their vocal tone.

Line 46
Vocal Features Special approaches to the musical treatment of the vocal part.

Line 47
Orchestral Tone Special ways that instrumentalists may approach their tone.

Line 48 Orchestral Features
Special approaches to the musical treatment of the instrumental part.

Line 49
Orchestral Type Special types of relationships within the instrumental group, and between instrumentalists and singers.

Line 50
Polyphonic Type Orch Manner in which instrumentalists produce simultaneous intervals other than unison or the octave, if polyphony is present.
Line 51A Harmonic Type -Vocal Prominent type of harmonic relationship in the vocal part.
Line 51B Harmonic Type -Orchestra Prominent type of harmonic relationship in the instrumental part.

Line 52
Rasp -Orchestra Degree of raspy, buzzy, scratchy, non-harmonic tone in the orchestra.

Line 53
Volume -Orchestra Loudness of instrumental part.

Line 54
Tempo -Orchestra Tempo of instrumental part.

Line 55
Orchestral Colors Special approaches to timbre and production in the instrumental part.

Line 56 Collage Components
Presence of layered or collaged recordings, or musical quotes.

Line 57 Melodic Form -Orchestra
Form of the instrumental part of the song, considering the degree that melodic material is repeated (litany, strophe, or through-composed), the complexity of the form, and the degree of variation in each repeated section.

S1.3.2. Performance Metadata
Metadata for individual performances (recorded songs in Cantometrics, Phonotactics, and Minutage; recorded conversations in Parlametrics; bibliographic sources in Instruments and Ensembles) is summarized in Table S10.

Song Texts
Main themes and preoccupations appearing in song texts.

Vocal Qualities
Indicators of the emotional content of singing.

S1.4.2. Society Metadata
Metadata for each society in the Societies dataset include: a unique identification number; geographic coordinates based on a representative location that considers the modal location of our primary data sources for each society as well as additional research using secondary sources; focal years (date a given performance was recorded); sample size (by dataset) for each culture's representative data in our collection; Köppen climate and terrain designations and code [34]; language, language family, Glottocode [35] and language ISO code; country; Pre-industrial style cluster [17]; Pre-industrial subsistence taxon [33]; identifying information for matching societies in other D-PLACE datasets [36] (Ethnographic Atlas, Standard Cross-Cultural Sample, Binford Hunter-Gatherer, Jorgensen Western North American Indian) and eHRAF [37]; the Murdock World Sampling Province [38]; and alternative culture names. Societies will also be notated by rainfall (annual and mean distribution) and gender complementarity [19].
In our society metadata we include Murdock's "World Sampling Provinces" [38] as a carefully thought out design for grouping societies, especially for purposes of statistical analysis, which many researchers still use. In the original project, modal profiles of coded singing, dancing, etc. were created and factor-analyzed under this rubric as well as under society. We have retained provinces (encompassing societies, which in turn encompass performances) (a) for the convenience of other researchers working with provinces, and (b) because these groupings have been productive for analyzing data in the past. Although nationality became broadly consonant with culture only rather recently, we added "country" as a lookup field for the convenience of visitors and researchers relying on this point of reference.
Lomax's publications sometimes grouped peoples and song styles into regions such as "Old High Culture" derived from his research. For classification purposes, more specificity was needed for the sake of a wide public looking to connect with familiar names and places.
We implemented a four-tiered geographic scheme for grouping societies: Region, Division,

S1.5.2. Version history of the Cantometrics dataset
The 5,776-song Cantometrics dataset we are publishing represents a substantial expansion from the 2,527-song set that was originally analyzed, but not released, in Lomax 1968 [20], a subset of ~1,800 songs from 148 societies used in previous Cantometric analyses [4,15,17], and the sample of approximately 4,000 songs from 400 societies mentioned in some of Lomax's writing [43]. These data have not been released to the public until now.
The first 4,062 songs in the sample were all recorded before 1973, while the remaining ~2,000 songs were added after 1991. In 2007 Victor Grauer added 800 coded songs from hunter-gatherer societies and Taiwanese indigenous peoples that had been included in a project investigating the prehistoric migration of song styles and genes from Taiwan into Oceania using Cantometrics [44][45][46]. Miriam El Hajli added 50 Latin American folk songs, which are currently being coded. Many of the newly added songs are from societies represented by one or only a handful of songs. We are including all available data to allow researchers to access and analyze the sample in ways appropriate for their purposes--focusing on subsamples with larger numbers of performances per society, or using higher-level groupings for analysis and including songs from societies with small samples.

S1.6. Data normalization

S1.6.1. Scaling and multiple coding
Each song in Cantometrics is coded for each of the 37 variables using a single number.
Scales for each variable range from 1 to 13, although in most cases they do not contain all intermediate values. It is also possible for a song to have more than one code for a particular feature.
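The scheme above can be sketched in code. This is a hypothetical representation (the released data files may encode multiple codings differently): a cell holds one or more scale points, which we parse and range-check against the 1-13 scale.

```python
# Minimal sketch of handling Cantometrics-style codings, assuming a
# hypothetical cell format where multiple codes are space-separated.
def parse_coding(cell):
    """Parse a coding cell into a list of integer scale points (1-13)."""
    values = [int(v) for v in str(cell).split()]
    for v in values:
        assert 1 <= v <= 13, f"scale point out of range: {v}"
    return values

# A song row: 37 variables, most with a single code, some with several.
song = {"CV1": "6", "CV12": "4 7"}  # hypothetical values
parsed = {cv: parse_coding(cell) for cv, cell in song.items()}
print(parsed)  # {'CV1': [6], 'CV12': [4, 7]}
```

The range check catches "impossible codings" of the kind discussed in S1.6.4.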

S1.6.2. Correcting coding criteria for CV-30 and 1
The original version of Cantometrics included two different codings of "solo singer" for CV-1: 2 ("One singer, whether accompanied by instruments or not"), and 3 ("One singer with an audience whose dancing, shouting, etc. can be heard. In practice we omitted this point and coded all solos '2'"). To make the codings consistent with this description, we recoded two songs coded as "3" for CV-1 as "2".

S1.6.3. Data exclusion
Prior to publication, we removed audio recordings containing only instrumental music without vocal song, since Cantometrics was intended to measure and compare songs, not instrumental music. These recordings had originally been coded with the idea of potentially expanding the method/sample to include instrumental music, but this was never followed through systematically. Accordingly, ~100 samples coded as "missing data" or "no singers" for CV-1 (Cantometrics Variable 1) were excluded. We also excluded any popular songs added to the database, to ensure consistency with the original sampling criterion of restricting samples to traditional songs. Finally, we excluded 107 songs that were only partially coded (missing codings for 7 or more variables).
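The three exclusion rules can be expressed as a simple filter. This is a hedged sketch with invented column names and codes (the released data files are structured differently), kept only to make the logic concrete:

```python
# Hypothetical sketch of the exclusion criteria described above; the
# field names "CV1", "genre", and the sentinel strings are invented.
def keep_song(row, n_variables=37):
    """Apply the three exclusion rules: vocal content, traditional
    songs only, and sufficiently complete codings."""
    if row.get("CV1") in ("missing", "no singers"):  # instrumental-only / uncoded
        return False
    if row.get("genre") == "popular":                # original sampling criterion
        return False
    n_missing = sum(1 for cv in range(1, n_variables + 1)
                    if row.get(f"CV{cv}") is None)
    return n_missing < 7                             # drop if 7+ variables missing

row = {"CV1": 2, "genre": "traditional"}
row.update({f"CV{i}": 1 for i in range(2, 38)})
print(keep_song(row))  # True
```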

S1.6.4. Correcting "impossible codings"
Around 30 impossible coding values--codes that did not correspond with a defined scale point--were identified and corrected. Digitization errors accounted for all of the impossible coding values that were found amongst songs coded in the original sample, and these were corrected by referring to the original hand-written Cantometrics coding sheets. Nineteen out of the 30 impossible codes were associated with songs that had been digitally coded in a later addition to the Cantometrics sample, with no hard copies to reference in order to identify the source of the errors. In these cases, ACE staff listened to the recordings and re-coded the appropriate variables.

S1.6.5. Missing, incomplete or poorly recorded audio files
Over 1,200 audio files were missing and many others were curtailed or of poor quality. In collaboration with the Curator of the Alan Lomax Collection at the Library of Congress, ACE researchers found nearly a thousand of these missing audio files on LPs and 7" tape reels at the Library of Congress, at Indiana University, and through Discogs. We also identified the tracks with the worst audio problems and those with better versions available, and had the worst audio tracks restored. Efforts to recover missing audio have recently resumed, and sound restoration will be resumed when possible.

S1.7. Coding reliability

S1.7.1. Inter-rater reliability

Two coders independently coded a random sample of 30 songs (excluding songs coded by Lomax himself, to avoid testing inter-rater reliability against what may be his own codings). They coded the songs independently once, then compared their codings with one another and revised them appropriately over the course of several iterations into a combined "consensus" set of codings agreed on by both (this consensus was agreed on prior to unblinding and analyzing the data). After running the analyses, it was discovered that one of the 30 randomly selected songs was one of the uncoded songs with missing data that was eventually excluded from the dataset. This song was excluded from the inter-rater reliability analyses (note that this exclusion was not included in the pre-registration because we had not anticipated this possibility). Table S12 shows the results of the inter-rater analysis. As predicted in our pre-registration, mean reliability for all 37 Cantometric variables was significantly greater than chance (mean K = 0.54, t = 12.5, df = 36, p = 6.3x10^-15), and our results with the Cantometric dataset were significantly correlated with our pilot analyses based on a dataset of training codings by 6 members of the CompMusic Lab (r = .72, df = 35, p = 6.5x10^-7; Fig S8).
While there is no general agreement about interpreting K values, our observed mean value of .54 has been described as "moderate" [53] and is higher than the threshold of .4 recommended as a minimum for clinical applications [54,55]. It is also higher than the mean level of 0.40 found in our pilot analyses using a different set of 30 songs coded independently by 6 members of the Keio University SFC CompMusic Lab, and higher than mean levels ranging from 0.24-0.47 found in similar analyses using Cantometrics or related schemes [9,45,51,52].
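For readers unfamiliar with the statistic, a per-variable Cohen's kappa of the kind averaged above can be computed from two raters' codings as follows. This is an illustrative from-scratch sketch with toy codings, not the study's actual analysis script:

```python
# Unweighted Cohen's kappa between two lists of categorical codings.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independent marginal distributions
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (p_observed - p_expected) / (1 - p_expected)

a = [1, 4, 4, 7, 13, 1, 4, 7]  # toy codings for one variable
b = [1, 4, 7, 7, 13, 1, 4, 4]
print(round(cohens_kappa(a, b), 3))  # 0.652
```

Repeating this for each of the 37 variables gives the distribution whose mean is tested against chance (kappa = 0) in the t-test reported above.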

S1.7.2. Inter-rater accuracy
Any measurement comes with an associated error. While the Kappa statistic is useful as a measure of how well raters agree with each other, it falls short of an error measurement.

S1.8.1. Overview
We tried three separate, objective approaches to evaluate the reliability of the Cantometrics data set. First, we note the proportion of recordings for which codings are absent. We then apply computational tools to assess Cantometrics criteria such as the presence or absence of instruments and the number of singers. Finally, we exploit the few instances of redundancy in the Cantometrics system to check for consistency.

S1.8.2. Missing codings
For each recording, there should be at least one value coded for each variable. Upon checking the data, we found that 300 out of 5,776 recordings (5%) are missing at least one variable (Table S13). In total, out of 5,776 songs and 37 variables, only 408 out of 213,712 codings (0.2%) are missing. These can be added in future releases by listening to and recoding the audio.
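The arithmetic behind these percentages is simple enough to verify directly:

```python
# Sanity check of the missing-coding figures reported above.
n_songs, n_vars = 5776, 37
total_codings = n_songs * n_vars
missing_codings = 408
songs_with_missing = 300

print(total_codings)                                    # 213712
print(round(100 * missing_codings / total_codings, 2))  # 0.19 (~0.2%)
print(round(100 * songs_with_missing / n_songs, 1))     # 5.2 (~5%)
```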

S1.8.3. Comparison with computational analyses
We compare Cantometrics codings with results from computational analyses in such a way that we can list the recordings in order of how likely they are to have errors. We then manually check the items on the list, starting from the most error-prone end, and calculate a moving average (window size of 20) of the fraction of recordings with errors (Fig S10).
This average decreases approximately linearly with the number of recordings checked. Thus, we consider that we have found the majority of each type of error when the moving average hits zero. We use a neural network (NN) classifier that distinguishes between speech, solo singing, group singing, and instrumental music [56]. Using this classifier, we estimate what fraction of each recording fits into these four categories. We then separately identify recordings as Solo Vocal (SV; no instrument), Group Vocal (GV; no instrument), and Contains Instrument (CI; can also include singing) according to the Cantometrics codings shown in Table S14. For SV recordings, we list recordings starting with those estimated to have the smallest fraction of solo singing; likewise for GV / CI we start with those estimated to have the smallest fraction of group singing / instrumental music.
We also use the fact that the human vocal range is typically constrained to about two octaves. This enables us to check for potential errors in recordings that are labeled SV according to Cantometrics codings. We use the pYIN algorithm to estimate the fundamental frequency throughout the recording [57], and list those recordings that are labeled monophonic vocal in descending order of the estimated melodic range in cents.
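The screening loop can be sketched as follows. The classifier outputs here are invented for illustration; only the ordering-plus-moving-average logic mirrors the procedure described above:

```python
# Order recordings by the classifier's estimated fraction of the expected
# content, then track a moving average (window 20) of the error rate.
def moving_average(xs, window=20):
    return [sum(xs[i:i + window]) / window
            for i in range(len(xs) - window + 1)]

# (recording_id, estimated fraction of solo singing) for recordings coded
# Solo Vocal; low fractions are the most suspicious, so check those first.
estimates = [("rec_%03d" % i, i / 100) for i in range(100)]  # simulated
ordered = sorted(estimates, key=lambda r: r[1])  # most error-prone first

# 1 = manual check found an error, 0 = coding was correct (simulated:
# errors concentrated at the error-prone end of the list).
checked = [1 if i < 10 else 0 for i in range(len(ordered))]
ma = moving_average(checked, window=20)
print(ma[0], ma[-1])  # 0.5 0.0  -> stop once the average hits zero
```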
We manually checked and corrected the errors by referring to the original hand-written coding sheets as well as to the metadata and source audio, to identify whether the error occurred during the digitization process of the data, metadata, or audio. Digitization errors in the data were corrected by changing the codes to match those on the hand-written sheets, and mislabelled, incorrect, or incomplete audio was corrected by retrieving the correct audio from the Library of Congress and other sources, and updating the metadata where necessary to reflect this. Songs that were coded during a later addition to the Cantometrics sample and did not have hard copy coding sheets, but whose metadata and audio appeared otherwise correct, were manually re-coded by ACE staff.
The NN classification algorithm appears to have incorrectly labeled recordings largely due to poor-quality recordings. However, certain types of singing are underrepresented in the data used to train the NN algorithm [56]. As a result, the algorithm sometimes incorrectly labels older singers and chant singing as speech, and female singers as instrumental. Additionally, the NN algorithm cannot handle overlapping categories, so soft sounds like clapping can be ignored when heard along with singing. The pYIN algorithm flagged many recordings as having a large melodic range, but the Cantometrics codings were typically correct; these erroneous flags were mostly due to background noise, poor-quality recordings, or cases where a male speaker introduces a female singer. The numbers of recordings checked and errors found are shown in Fig S10. In the coding logic notation, "CV" means Cantometrics Variable, followed by the number of the variable in question and the code in question; e.g., CV2-1 is Cantometrics variable 2, code 1.

S1.8.4. Consistency of codings
Cantometrics has some degree of redundancy (e.g., there are multiple codings which indicate the absence of instruments), which allows us to check the codings for consistency. Checking these codings against one another, we found 179 instances where they disagreed. The Cantometrics guide also specifies that if CV11-13 is present then CV26-1 must also be present, and if CV13-13 is present then CV27-1 must also be present; we found 181 instances where this was not true.
In total we found 349 instances of inconsistency in codings. Corrections are in progress and will be uploaded to Zenodo in new versions of the codings.
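Implication rules of this kind are easy to check mechanically. The sketch below encodes the two rules named above (if CV11 = 13 then CV26 must be 1; if CV13 = 13 then CV27 must be 1) and runs them over two invented song rows:

```python
# Consistency check for Cantometrics implication rules.
# Each rule: (variable A, required value of A, variable B, required value of B).
RULES = [("CV11", 13, "CV26", 1), ("CV13", 13, "CV27", 1)]

def inconsistencies(song):
    """Return the (antecedent, consequent) pairs this song violates."""
    return [(a, b) for a, va, b, vb in RULES
            if song.get(a) == va and song.get(b) != vb]

songs = [
    {"CV11": 13, "CV26": 1, "CV13": 4, "CV27": 5},   # consistent
    {"CV11": 13, "CV26": 7, "CV13": 13, "CV27": 2},  # violates both rules
]
violations = [v for s in songs for v in inconsistencies(s)]
print(len(violations))  # 2
```

Counting such violations over the full dataset is how totals like those above can be tallied.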

S1.8.5 Summary
Through a comprehensive screening process we identified several types of errors in the Cantometrics data set. In particular, we note that 300 out of 5,776 recordings (5%) are missing one or more codings (this amounts to a 0.2% missing-variable rate). A computational screening approach identified 60 recordings (out of 284) that were incorrectly labeled with respect to instruments / number of singers; while we cannot rule out more of the same type of error in the rest of the recordings, this appears to be close to the limit of the number of errors that are identifiable using this method. Finally, we found a total of 349 instances of inconsistencies between coded variables.
It is difficult, given the various screening methods and consistency criteria, to generalize these results to the rest of the data set. What we can say is that, of the errors corrected so far, 59% were due to coding errors and 41% to various errors arising from the digitization process. We think that these errors cover the majority of the objective errors (absence/presence of instruments and multiple singers). More errors undoubtedly remain, but they may be more subjective and harder to find. We can estimate the number of remaining errors by assuming the rate is constant across coding variables. We checked 92,304 = 16 x 5,769 codings in a systematic way (the 16 variable/code combinations investigated, i.e., CV2-1, CV3-1, CV4-4, CV4-13, CV5-1, CV7-1, CV8-1, CV9-1, CV12-1, CV13-1, CV13-13, CV14-1, CV16-13, CV22-1, CV26-1, CV27-1), and out of these we found 409 errors. Since we do not know whether we found all of the errors in the 16 x 5,769, the resulting extrapolation is a lower bound for these variables. On the other hand, these variables may be more likely to have errors than others due to their explicit and somewhat complex coding instructions, so the error rate in other variables may well be lower. We therefore treat the estimated lower-bound rate of 0.4% errors per variable per song identified for these variables as a reasonable estimate of the overall error rate throughout the data set, though the actual rate is more likely to be somewhat higher than lower (we speculate that 0.4-1% may be a reasonable range).
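The extrapolation arithmetic, in code form (a rough lower bound, as noted above):

```python
# Error-rate extrapolation from the systematically checked codings.
n_variables_checked = 16
n_songs_checked = 5769
codings_checked = n_variables_checked * n_songs_checked
errors_found = 409

rate = errors_found / codings_checked
print(codings_checked)       # 92304
print(round(100 * rate, 2))  # 0.44 -> the ~0.4% per-variable-per-song rate
```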
Although we may never correct all errors, we have added a "Comments and Feedback" function to the website where users may report coding or metadata errors or give other feedback. This function can be accessed by clicking the icon at the top of the Cantometric coding/metadata sheet for a given song (the heart-shaped icon next to the "Song Description" text near the top of Fig 2), or via the 'Comments' option in the menu at the bottom of each page.

S1.9. Current full analyses: song style correlates with social complexity (controlling for autocorrelation)
Both the musical and social data are subset and standardized prior to statistical analysis. For each hypothesis, the data were subset to complete cases (i.e., no missing data). The count of data for each model is shown in Table S17. All musical variables are standardized to a 0-1 scale according to the procedure described in S1.6.1. All social variables are divided by their maximum value, so they also lie on a 0-1 scale. Data on the linguistic and geographical categorization of societies were determined from society metadata held within the Global Jukebox. The code that creates the data for the statistical models can be found in make_modeldata.R.
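A minimal sketch of the divide-by-maximum scaling described for the social variables (the data values are hypothetical, not Global Jukebox data):

```python
def scale_by_max(values):
    """Divide each value by the variable's maximum so all values lie on a 0-1 scale."""
    m = max(values)
    return [v / m for v in values]

# Hypothetical jurisdictional-hierarchy codes for five societies
hierarchy = [1, 2, 3, 4, 5]
print(scale_by_max(hierarchy))  # [0.2, 0.4, 0.6, 0.8, 1.0]
```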
Note that our primary goal was to test for a correlation between musical PC1 and social PC1, so we report one-tailed, uncorrected p-values. We also report one-tailed, uncorrected p-values for the bivariate correlations below, for comparison with Lomax's original analyses (which also did not correct for multiple comparisons). These analyses were not formally pre-registered, but they were designed to replicate and extend Lomax's previously published findings, as described in the text.

S1.9.2. Principal component analyses
We use principal component analysis (PCA) to extract the latent variation within the musical and social variables described above. Principal components are extracted to represent the latent diversity of the musical and social variables separately. For each set of variables (musical or social), we analyze a subset of the data that includes only societies with data for all variables (i.e., no missing data), then inspect the scree plot to determine a sensible number of principal components. Finally, we extract principal components for the social and musical data.
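A minimal numpy sketch of this procedure on simulated data (the data, seed, and induced correlation are hypothetical stand-ins, not the Global Jukebox variables):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-in for five variables across 200 societies (rows: societies)
X = rng.normal(size=(200, 5))
X[:, 1] += 0.8 * X[:, 0]          # induce some shared (latent) variation

# Complete-case PCA on the correlation matrix (variables standardized)
Z = (X - X.mean(axis=0)) / X.std(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
eigvals = eigvals[::-1]           # descending order, as read off a scree plot

# Kaiser rule: retain components with eigenvalue > 1
n_components = int(np.sum(eigvals > 1))
pc1 = Z @ eigvecs[:, -1]          # scores on the first principal component
```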

S1.9.3. Musical PCA
We perform a PCA on the musical variables: CV7, CV10, CV21, CV23, and CV37. We use the resulting eigenvalues and scree plot (Fig S11) to determine how many dimensions should be used to represent the latent diversity. Typically, an eigenvalue greater than 1 indicates that a principal component contains more variation than any single variable (the Kaiser rule). The first principal component has an eigenvalue of 2.06, and the second component has an eigenvalue of 1.003. Since the second component effectively explains no more than any single variable, we opt to use a single principal component to represent musical diversity. The single musical component explains 45% of the variance. Table S15 shows the loadings of each variable onto the principal component.

Linguistic history is determined using a Glottolog taxonomy [35]. As in [59], all language families are given a life of 6,000 years and language families are separated by a further 54,000 years, giving the tree a maximum time depth of 60,000 years. Branch lengths within language families are determined via the Grafen branch-length transformation. Linguistic covariance is accommodated in the regression using a Pagel's lambda transformation:

Vλ = λV + (1 − λ)I,

where V is the variance-covariance matrix derived from the phylogenetic tree, I is the identity matrix, and λ scales the strength of the phylogenetic covariance.
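A minimal numpy sketch of Pagel's lambda transformation, using a hypothetical three-society covariance matrix. In the general form shown here the diagonal is left unchanged, which reduces to the lambda formula with the identity matrix when the tree is scaled so that diag(V) = I:

```python
import numpy as np

def pagel_lambda(V: np.ndarray, lam: float) -> np.ndarray:
    """Scale the shared-history (off-diagonal) covariances by lambda,
    keeping each society's own variance unchanged."""
    return lam * V + (1 - lam) * np.diag(np.diag(V))

# Hypothetical covariance from a small three-society tree of unit depth:
# societies 0 and 1 share most of their history, society 2 is distant.
V = np.array([[1.0, 0.6, 0.1],
              [0.6, 1.0, 0.1],
              [0.1, 0.1, 1.0]])
print(pagel_lambda(V, 0.5))  # off-diagonals halved, diagonal unchanged
```

At lam = 1 the full phylogenetic covariance is retained; at lam = 0 the societies are treated as independent.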
Spatial relationships were modeled using mixed-effects models with spatial random effects. Locations are used at the society level (as opposed to the song recording location) and are determined from the Global Jukebox metadata. Spatial covariance between societies is defined by the Matérn correlation function, whose rate-of-decay and smoothness parameters are estimated from the data. In all models the smoothness parameter was estimated to be 0.5, which approximates exponential decay in spatial covariance.
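With smoothness 0.5, the Matérn correlation reduces to simple exponential decay in distance, which can be sketched as follows (the distances and range parameter are hypothetical):

```python
import numpy as np

def matern_nu_half(d, rho):
    """Matérn correlation with smoothness 0.5: exponential decay exp(-d/rho)."""
    return np.exp(-d / rho)

d = np.array([0.0, 100.0, 500.0, 2000.0])  # pairwise distances, e.g. in km
print(matern_nu_half(d, rho=500.0))        # correlation decays from 1 toward 0
```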
To compare the performance of each model variation, we use AIC comparison. Following a common rule of thumb, we treat a difference of greater than 2 between models as indicating a meaningful difference. The AIC results are displayed in Table S17. Next we explore the model output from the best-performing generalized linear mixed model for each bivariate model, and present a plot of the data. Each plot shows the raw data and a regression line determined by the coefficients of the best-fitting model, as determined by the lowest AIC in Table S17.

S1.9.6. Musical organization of the orchestra (CV7) vs Jurisdictional hierarchy
As expected, we find a significant positive relationship between the musical organization of the orchestra (CV7) and jurisdictional hierarchy. The best-performing model also controlled for geographic region (β = 0.17, p < 0.001). This suggests that as the number of levels in the jurisdictional hierarchy increases, there is also an increase in the complexity of musical texture within the orchestra (Fig S14).
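The AIC selection rule used in these model comparisons can be sketched with hypothetical values (the model names and AICs below are illustrative only):

```python
def delta_aic(aics):
    """Differences from the best (lowest-AIC) model; a gap of more than 2
    is the rule-of-thumb threshold for preferring one model over another."""
    best = min(aics.values())
    return {name: aic - best for name, aic in aics.items()}

# Hypothetical AIC values for three variants of one bivariate model
aics = {"baseline": 2110.4, "+language": 2101.9, "+geography": 2099.3}
print(delta_aic(aics))  # "+geography" is best (delta 0); the others differ by > 2
```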

S1.9.7 Repetition of Text (CV10) vs Subsistence
We find a significant negative relationship between text repetition (CV10) and subsistence type. A model that also controlled for geography explained the data best (β = -0.40, p < 0.001). This suggests that as reliance on agriculture increases, we see less text repetition within a society, in line with Lomax's prediction (Fig S15).

Fig S15.
Scatter plot of the significant negative relationship between Text repetition (CV10) and Subsistence. Subsistence is an aggregated variable, whose calculation is described in Table S25.

S1.9.8. Interval size (CV21) vs community size (EA031)
We find a non-significant negative relationship between interval size and community size in the best-fitting model, which controlled for language (β = -0.12, p = 0.011).

S1.9.9. Embellishment (CV23) vs class + caste + slavery
The next hypothesis examines the relationship between embellishment (CV23) and three measures of social hierarchy (class, caste, and slavery), also known as social layering. Previous correlations aggregated these variables, but a PCA of the three variables suggested they were largely orthogonal, meaning aggregation would be inappropriate. These effects are reflected in the significant caste effect and significant slavery effect, which we discuss next.

Musical Variables

Musical organization of the orchestra (CV7), selected codes:
Heterophony. Each instrument plays the same melody in a slightly different manner. The variation is usually rhythmic, with some instruments trailing behind, others pushing forward, or with some instruments more rhythmically active than others.
13. Polyphony or polyrhythm. The use of simultaneously produced intervals other than the unison or the octave. Two-part intervals of this kind are considered polyphony, as are harmonies of greater complexity.

Repetition of text (CV10):
1. Little or no repetition (wordy). A continuous stream of dissimilar sung syllables, words, and phrases, with little or no repetition or use of non-lexical utterances. In such songs (epics, ballads, songs of prayer and supplication, and much of Western and Eurasian song) the text is of paramount importance.
4. Some repetition. Some repetition and/or use of non-lexical utterances; about one fourth repeated text.
7. Half repetition. A substantial amount of repetition and/or non-lexical utterances that more or less equals the flow of unrepeated words.
10. Quite repetitious. Considerably more than half (about two thirds) of the sung performance is accounted for by repetition and/or non-lexical utterances.
13. Extreme repetition. The text seems to be almost entirely composed of repetition of some kind and/or non-lexical utterances.
Embellishment (CV23), selected codes:
7. Medium or considerable embellishment.
13. Little or no embellishment.
Note: Embellishment was reverse coded for the correlation analyses, meaning smaller values equate to less embellishment. Shown above is the original code.

Enunciation:
1. Very precise enunciation. Highly articulated consonants and syllables. This is generally typical of the storytelling singers of Eurasian polities.
4. Precise enunciation. Clearly articulated consonants in sung texts. Here one listens to the whole consonantal range and makes certain that all consonants are easily discernible.
7. Moderate enunciation. A moderate degree of enunciation.
10. Softened enunciation. Consonants are hard to distinguish and syllables are run together to some degree.
13. Very softened enunciation. Consonants are absent or nearly absent from the text, and/or syllables are run together.
Note: Enunciation was reverse coded for the correlation analyses. Shown above is the original code.

Social Variables

Jurisdictional hierarchy, selected codes:
1. No political authority beyond the community (e.g., autonomous bands and villages).
5. Four levels (e.g., large states).

Community size (EA031), selected codes:
6. More than 1,000 persons in the absence of indigenous urban aggregations.
7. One or more indigenous towns of more than 5,000 inhabitants but none of more than 50,000.

Subsistence (an aggregated variable whose calculation is described in Table S25), selected code:
2. Hunting and/or fishing outweigh collection and/or agriculture (EA004 < 4 and EA005 < 4 and EA004 + EA005 < 6 and the greatest value of (EA001, EA002, EA003) > the greater value of (EA004, EA005)) and the conditions for code 1 are not satisfied.

Slavery, selected code:
Hereditary slavery present and of at least modest social significance.

S1.10. The Global Jukebox and cultural equity
The Global Jukebox advances cultural equity and social justice by sharing its resources with researchers and culture bearers alike, and by evaluating them not by Western standards but according to criteria used variably by singers and musicians worldwide. It is a living repository of the world's musical cultures, a platform for all forms of musical expression, and a source of inspiration for young musicians. Many of the musical traditions it contains are endangered or no longer practiced; the Global Jukebox validates and sustains these historical musical practices and keeps their memory present. It is continually expanding as new items of music and dance are added and eventually coded, and as new apps, educational offerings, Journeys, visualized analyses, and ways of looking at the data are added. Assuring its longevity is an interactive world music course using Cantometrics and based on deep listening, which trains researchers to code, sensitizes students and casual users to the aesthetic preferences of the world's peoples, and familiarizes them with the myriad ways the "musical voice" is handled throughout the world.