Diffusion of cultural innovation depends on demography: testing an epidemiological model of cultural diffusion with a global dataset of rock art sites and climate-based estimates of ancient population densities

Demographic models of human cultural evolution have high explanatory potential but weak empirical support. Here we use a global dataset of rock art sites and climate and genetics-based estimates of ancient population densities to test an epidemiological model, where the spread of an innovation requires population density beyond a critical value. We show that the model has stronger empirical support than a null model, where rock art detection rates and population density are independent, or a proportional model where detection is directly proportional to population density. Comparisons between results for different geographical areas and periods yield qualitatively similar results, supporting the robustness of the model. Re-analysis of the rock art data, using a second set independent population estimates, again yields similar results. We conclude that a minimum population density is a necessary condition for the spread of rock art. Methods similar to those described can be used to test our model for other classes of archaeological artifact and to compare the epidemiological model against other demographic models.


Introduction
It is widely accepted that the complexity and diversity of human cultures are a result of Cumulative Cultural Evolution (CCE), enabled by humans' unique neuroanatomy and cognitive capabilities, especially their skills in "cultural learning" [1][2][3] . However, the old idea that modern human capabilities emerged suddenly as a result of a highly advantageous mutation some 50,000 years ago [4,5] is no longer accepted and many allegedly modern features of human behavior and cognition are now believed to be more ancient than previously Walker et al., 2019 2/16 suspected [6][7][8][9]. There is, furthermore, not the slightest evidence that variations in brain size, brain morphology or innate capabilities explain any aspect of the spatiotemporal patterning of human cultural evolution over the last 50,000 years. Against this background, demographic models of CCE [10] assign a determining role to variations in population size, density and structure. Such models suggest that larger, denser, better connected populations produce more innovations than smaller ones [11], are less vulnerable to stochastic loss of unique skills [12,13], and are more capable of exploiting beneficial transmission errors during social learning [14]. They also suggest that well-connected metapopulations produce faster cultural innovation than metapopulations with weaker connections among subpopulations [15] .
Demographic models could potentially provide valuable explanations for spatiotemporal patterning in the global archaeological record and several authors have used them for such purposes e.g. [14,16]. However, the assumptions and conclusions of the models are hotly contested [17][18][19]. Empirical studies are sparse and inconclusive, with some supporting an important role for demography e.g. [14,15,20,21], while others find no evidence for such a role [22][23][24][25][26].
In general, previous empirical studies have used data for historical or recently extinct huntergatherer populations from specific geographical regions. Here, for the first time, we test a demographic model of cultural evolution against fine-grained estimates of prehistoric population densities covering the whole globe, for a period of 46,000 years. Specifically, we test the ability of the model to predict the global, spatiotemporal distribution of parietal rock art sites over this period.

Results and discussion
An epidemiological model of cultural diffusion CCE involves the inception, diffusion and selective retention of cultural innovations such as new behaviors and practices, new beliefs, and new technologies. In this study, we consider the case of parietal rock art. The model tested here (see Fig 1), inspired by SIR models in epidemiology [27][28][29], focuses on cultural diffusion [30], treated as analogous to the spread of a disease. The underlying assumption is that an innovation that does not spread beyond the population where it originates is unlikely to contribute to CCE or to leave a trace in the archaeological record. Consider the emergence of rock art in a metapopulation comprising N subpopulations or communities, where I and S are respectively the proportion of infected communities (communities that have adopted the innovation), and susceptible communities (communities that have not). The innovation spreads from infected to susceptible communities at rate #. By analogy with empirical findings in [31] (see Methods), rates of encounter between communities are assumed proportional to the square root of population density. Infected communities recover (here: lose the ability to produce rock art) at rate $.
In this model (see The value of ! * is determined by the equilibrium between the rate of transmission, # , and the rate of recovery, $. Below ! * , there are no infected communities; above ! * , + * is an increasing function of population density. However, + * is not directly observable in the archaeological record. To test our model, we therefore consider the site detection rate i.e. the small probability, P, that a territorial cell of defined area contains at least one recorded instance of rock art. Since the proportion of cells where geology, climate and research effort allow the creation, preservation and recording of rock art is small, where 6 reflects the joint effects of these factors (see SI). The model predicts that for ! ≤ ! * , detection rates will be zero and that at higher densities it will rise approximately in direct proportion to !. As a result, the shape of the distribution of sites by population density will differ significantly from the distribution of globals and median population density for cells containing sites will be automatically higher than for the complete set of cells.

Testing the model Strategy
To test the predictions of the model, we collated a dataset containing the locations, dates and characteristics of 133 scientifically dated rock art sites (see Fig 2A, SI Table 1, Methods). Ancient population densities were estimated by combining data from two published models, informed by global genetic variation, and climate-based estimates of Net Primary Productivity [32,33] (See Methods). Population densities were inferred for all cells containing rock art site (sites) and for all cells (globals) (see Fig 2B). Sites and globals with inferred population densities of zero were eliminated from the subsequent analysis. We then computed site detection rates (number of sites / number of globals) as a function of population density for all latitudes between 20-60°N and 10-40°S, and all dates more recent than 46,000 years ago -the only latitudes and dates with significant numbers of sites in our sample. After exclusion of 4 sites at equatorial latitudes, 1 site older than 46,000 years and 8 sites with inferred population densities of zero, our final analysis included 119 sites and 9.8 million globals, with the age distribution shown in Fig 2C (see Methods). Empirical support for the model

Full dataset
As a first test of the model, we analyzed population densities for our complete rock art dataset. As predicted, the distribution of population densities for sites differed significantly from the  Table 1 for detailed results).
Using a Bayesian statistical framework, we estimated the most plausible parameter values for the epidemiological model given the empirical observations (Fig 3A). Posterior distributions for model parameters were tightly constrained (see SI Fig 1). The inferred critical population density for the model with the highest likelihood, was 13.18 individuals/100km 2 (95% CI: 6.66 -17.00). Comparison of the epidemiological model against a constant model (equal site frequency at all population densities) strongly supported rejection of the alternative model.
Comparison against a proportional model, where detection rates were directly proportional to population density (i.e. to the number of potential inventors in the population) (compare with [11]), again supported the superiority of the epidemiological model. (for detailed results see Table 1). Taken together, these findings strongly support the notion that the emergence of rock art requires population density above a critical value.

Sites with exact dates obtained by direct methods
Seven sites in the rock art dataset were located in cells with estimated population densities below the critical value inferred from the model and the empirical data. As described in the Methods section, some of the dates in our rock art dataset were minimum or maximum dates, and some were estimated using indirect methods (e.g. dating of organic materials stratigraphically associated with the art). Since these dates do not necessarily reflect the actual dates at which the art appeared for the first time, and correct inference of population densities requires correct dates, we repeated our analysis using only sites with exact dates, obtained with direct methods. With this filtered dataset (55 sites), as in the original analysis, the distributions of population densities for sites and globals were significantly different (Two sample KS test D=0.61, p<0.00001), median population density for sites was higher than for globals (Sites: 30.25; Globals 15.31, Mann-Whitney U=242, p<0.0001) and, empirical support for the epidemiological model was much stronger than for alternative models (see Table 1). The inferred value of the critical population density (14.50 individuals/100km 2 , 95% CI: 7.95-17.89) overlapped with the previous estimate (see Table 1).

Alternative population estimates
Another potential source of errors was the model used to generate our population estimates. We therefore repeated our analyses using more recent population estimates [34], based on a different climate model and different assumptions (see Methods). As in the earlier analysis, there were significant differences between the population density distributions for sites and globals, (two sample KS-test D=0.57, p<0.001), median inferred population was much higher for sites than for globals (sites: 25.02; globals: 18.04, Mann-Whitney U=243, p<0.01) and empirical support for the epidemiological model was much stronger than for the alternative null or proportional models (see Table 1). The CI for the inferred critical population density (95% CI: 13.80-22.15) overlapped with the CI for the original analysis (see Fig 3B,

Potential sample bias
A comprehensive survey of the vast rock art literature was beyond the scope of our study. As a result, large geographical areas in Central/South America, Central Africa, East and South-East Asia are practically unrepresented in our dataset (see Fig 2A). Moreover, 60.9% of the sites in our full dataset date from less than 10,000 years ago, and only 3.8% from before 40,000 years ago (see Fig 2C). All this suggests the possibility of systematic bias. Suggestions that the literature itself is biased by ecological and taphonomic factors and by geographical variations in research effort [35] make the presence of such biases even more plausible.
To test the potential impact of such biases, we repeated our analysis for data from two culturally unrelated geographical areas (France-Spain and Australia) and for two distinct periods (from the present until 10,000 years ago, and from 10,000 until 46,000 years ago (see Methods). In the case of France-Spain, the predicted effects were attenuated by high estimated population densities even in areas with no sites. However, in all four cases, we found different density distributions for sites and globals, higher median population densities for sites than for globals, and stronger support for the epidemiological model than for alternative models (Table 1). Support for the existence of a critical population density for each of the periods and geographical areas examined is evidence that our results are not due to sampling biases (differences in sampling rates between periods or areas).

Population density as a proxy for social contacts
Another potential problem was the use of population density as a proxy for contacts between subpopulations. Grove suggests that individual encounter rates depend not on population density but on the product of density and mobility [31], and shows empirically that modern hunter gatherers' mobility is inversely proportional to the square root of population density. If this is correct, the encounter-rate will be directly proportional to the square root of population density. This is the assumption on which we based our model. However, testing with a generalized model (results not shown) demonstrated the robustness of our qualitative findings to the exact specification of this relationship (Methods).

Conclusions
This study models one of the key mechanisms required for CCE ("diffusion"), and tests the predictions of the model for the case of parietal rock art. In all our analyses, (Fig 3B, Table 1), the distribution of population densities for sites differs significantly than the distribution for globals and median population densities for sites (25-30 individuals/km 2 ) are consistently higher. In all cases, empirical support for the epidemiological model and a critical population density is much stronger than for alternative proportional or null models. Taken together, these results provide robust grounds to reject the null hypothesis that the emergence of rock art is unrelated to demography, and strong support for the epidemiological model i.e. for the notion that population density above a critical value is a necessary condition for its emergence.
Importantly, nothing in our model or empirical results suggests that a minimum level of population density is a sufficient condition for the emergence of rock art. There is little doubt, in fact, that, the creation, preservation and discovery of rock art depends on a multitude of factors, including variations in research effort as well as climatic and geological factors creating or limiting opportunities for its creation and preservation. It is not surprising, therefore, that many areas of the globe (e.g. in equatorial Africa) with high inferred population densities for the relevant periods, have little or no reported rock art. It is plausible, furthermore, that rock art production depends not just on the population density of individual cells, but also on the overall size of so-called Culturally Effective Populations, spanning larger geographical areas, as posited by the demographic models reviewed in the Introduction. We consider the epidemiological model as complementary to these models.
The use of population density as the independent variable is, in fact, a key feature of our study: in archaeological contexts where a local population's cultural links to other populations are unknown, the size of "culturally effective populations" [12] is difficult to measure. A second key feature is the choice of "detection frequency" as the dependent variable: in the majority of current studies the dependent variable is "cultural complexity", usually quantified as the size of the population's toolkit [36]. The method used here circumvents the difficulty of applying such a concept in archaeological settings, where complete toolkits are rarely available. These methodological features of our study will facilitate the application of our model and methods outside the case of rock art and in contexts where other constructs are difficult to measure.
Our rock art dataset, population estimates, and software tools are publicly available at https://github.com/rwalker1501/cultural-epidemiology.git. Other researchers are encouraged to use them to replicate our findings, to test our hypotheses for other classes of artifact and with other population estimates, and to explore their own models.

Derivation of the model
The model describes the diffusion of a cultural innovation through a Culturally Effective Population (CEP) [12] i.e. a closed network of communicating individuals or subpopulations. Notation for the "disease dynamics" follows the conventions established in [29].
Consider a CEP with stable number of subpopulations, N. Assume that the population is divided into infected and susceptible (unaffected) subpopulations, with proportions I and S, respectively, such that I + S = 1. If the population is "well-mixed" (i.e. every subpopulation is in social contact with every other subpopulation), any infected subpopulation can transmit the innovation to any susceptible subpopulation. Let # be the rate of transmission per time step and $ the rate of recovery of the infected subpopulation (here: the rate at which it loses the ability to produce rock art). Thus, the rate of change in the proportion of infected subpopulations is given by In a partially connected population, infected subpopulations can only transmit an innovation to the susceptible subpopulations with which it is in contact. Assuming that on average each subpopulation is in contact with k other sub-populations, we can substitute k for N in Eq. (1). Using S+I=1, we thus obtain Rearranging, solving for 78 79 = 0, and imposing + * ≥ 0, we obtain the stable proportion of infected subpopulations: If social contact numbers k are proportional to the square root of density [31], i.e. k=JK!, In short, there exists a critical population density, ρ * , such that at densities below ρ * there are no infected subpopulations, and at densities above ρ * , the number of infected individuals is greater than zero.
At ρ * , 1 − D LF √ P = 0. Thus: To reduce the number of parameters in the model, we normalize the rate of transmission, setting J# to 1. We thus obtain the simplified expressions: and In this simplified version of the model, the equilibrium size of the infected population and the critical value of ! are entirely dependent on population density and a single parameter, $.
Finally, we assume that each infected subpopulation has an equal probability, R, of producing an artifact that produces a trace in the archaeological record. On this assumption, the probability P, that the n infected subpopulations will generate at least one such trace, is given by: Assuming z is small, we obtain The value of N for a cell of standard area, with fixed mean community size, is proportional to the population density for the cell: Grouping V and z in a single variable 6 we obtain In the setting of our empirical study, inferred population sizes may contain large errors, especially when sites dates are minimum dates or indirect estimates. We, therefore, include an error term W, representing the probability that a site is attributed to a cell with an incorrect date, whose population will thus be different from that of the cell with the correct date. Thus:

10/16
This is the model we tested in our empirical study.

Rock art dataset
The analysis presented here is based on a dataset of parietal rock art, generated through a literature search with Google Scholar. We are aware that the database contains only a small proportion of all rock painting sites in the world, and that it may be subject to systematic biases. These are discussed in the Results and Discussion.
For the purposes of our survey, parietal rock art was defined to include all figurative and nonfigurative pictograms (paintings and drawings) and petroglyphs (engravings) using rock wall as a canvas. "Art mobilier" (e.g. figurines) and geoglyphs (i.e. large designs or motifs on the ground) were excluded from the analysis.
The survey was seeded using the query: ("rock art" OR "parietal art" OR petroglyphs OR pictographs) [AND] (radiocarbon OR AMS OR luminescence OR Uranium).
We read the top 300 articles found by the query that were accessible through the EPFL online library, together with other relevant papers, cited within these articles. Sites where drawings, paintings and engravings were reported, were systematically recorded. Sites with no radiocarbon, optical luminescence or Uranium-Thorium date were excluded.
For each dated site, we recorded the longitude and latitude of the site (where reported), its name, the earliest and latest dates of "art" found at the site (converted to years before 1950 or years BP), the name of the modern country where the site was located, the journal reference, the method(s) used to produce the date, the nature of the dating procedure (direct dating, indirect dating), the nature of the data provided (exact data, minimum date, maximum date, mixed) a descriptor of the artefacts found (paintings, drawings, petroglyphs etc.), and a flag showing disputed dates. Where different authors reported different dates for a site, without disputing dates proposed by other authors, we systematically used the dates from the most recent report.
In cases where the article did not provide a latitude and longitude, online resources were used to locate the information. The main such resources were D. Zwiefelhofer's FindLatitudeAndLongitude web site [37], which is based on Google Maps, and Austarch, a database of 14C and Luminescence Ages from Archaeological Sites in Australia, managed by A. Williams and Sean Ulm [38] .
The survey generated 190 records. Records with identical latitudes and longitudes and overlapping dates were merged (5 records eliminated). Duplicate records (12), modern sites (1), sites which did not meet the inclusion criteria (13), sites where the source was deemed unreliable (5), sites where reliable geographical coordinates were unavailable (12), sites with disputed or doubtful dates (8) and 1 site described in a retracted article were excluded. These procedures left a total of 133 records.
All except 5 of these records referred to sites located between 20°-60°N and 10° -40°S and with dates more recent than 46,000 years ago. Since the data analysis (see below) compares the distribution of population densities for sites against the distribution of globals defined as all cells, within a comparable range of latitudes, inclusion of these sites would have required the inclusion of an unrepresentative set of globals, potentially distorting the results of the analysis. They were therefore excluded from the subsequent analysis. For similar reasons, one 11/16 very early site (Blombos Cave, South Africa) (77,000 years ago) was also excluded. After these exclusions the dataset comprised 127 records. Subsequent analysis (see below) showed that 8 records referred to sites with estimated population densities less than 1 individual / 100km 2 at the date corresponding to the earliest rock art at the site. These sites were also excluded from the later phases of the analysis. Thus, the analysis was based on a final dataset of 119 records.

Estimates of population density
The population density estimates used in our paper combine results from a climate-informed spatial genetic model [32] with an improved model reported in [33]. Briefly, models combine climate estimates for the last 120,000 years, based on the Hadley Centre model HadCM3, with data on patterns of modern genetic variability from the human genome diversity panel-Centre d'Etude du Polymorphisme Humain (HGDP-CEPH), and a mathematical model of local population expansion and dispersal. Estimates of past precipitation and temperature are used to estimate Net Primary Productivity (NPP), and hence maximum human carrying capacity, for each cell in a hexagonal lattice with equal area cells 100 km wide, for all dates from 120,000 years ago to the present using time steps of 25 years. The carrying capacity is a continuous function of NPP governed by two NPP threshold values and a maximum carrying capacity parameter, K. The carrying capacity is zero below the lower NPP threshold, increases linearly from zero to K between the two NPP thresholds, and is constant and equal to K for NPP above the upper NPP threshold value. Human expansion out of Africa is simulated using a model where populations that have reached the maximum carrying capacity of a cell expand into neighboring cells. Approximate Bayesian Computing is used to compare models with a broad range of parameter values and starting values. Model evaluation is based on their ability to predict regional estimates of pairwise time to most recent common ancestor (TRMCA). We generated population size estimates using parameters from the high-ranking scenario described in Fig 2 and Movie S1 in [32] and NPP values from [33]. These contain a more accurate model of the American ice sheet dynamics and therefore give more accurate estimates of the colonization of the Americas by humans.
These population estimates were compared against the results using a second set of population estimates reported in [34]. The data refer to the early exit scenario (Scenario A) described in the paper. As in [32], human population density estimates are based on a climate model (LOVECLIM) combined with a reaction-diffusion Human Dispersal Model. Unlike the model in [32], the estimates do not take account of genetic data. A second difference concerns the population estimator, which is based not just on NPP but also on temperature and predicted "desert faction" and incorporates ad hoc modeling hypotheses absent in the previous model. These include accelerated human dispersal along coastlines ("a coastal express") and a special Gaussian decay function, modeling the probability of island hopping as a function of distance. For clarity of presentation, population estimates from the two models are both expressed in terms of effective population/ 100km 2 .

Data analysis
In many sites in our dataset, the rock art reported in the literature spanned a broad range of dates. For the purposes of our analysis, we considered only the oldest date for each site. This procedure reduced our sample size and the statistical power of our analysis, but also the risk of artifacts due to dependencies in the data. Where confidence intervals were given, we used the midpoint of the interval. Each site was associated with the estimated population density for the cell in the population model whose date and center point were closest to the date and location of the site. Globals were defined according to the needs of the individual analysis.

Analysis Definition of globals World
All cells with non-zero population with dates in the range 0-46,000 years ago and locations between 60°N and 20°N or between 10°S and 40°S Periods (1) All cells with non-zero population with age between 0 and 9,999 years. (2) All cells with non-zero population with age between 10,000 and 46,000 years.

France and Spain
Derived from the maximum and minimum latitudes and longitudes for sites in our dataset with locations in modern France or Spain. All cells with non-zero population lying in a "rectangle" with NW corner (lat: 50.84°N, lon: 10°) and SE corner (lat: 35.00°N, lon: 7.00°E) Australia Derived from the maximum and minimum latitudes and longitudes for sites in our dataset with locations in modern Australia. All cells with non-zero population lying in a "rectangle" with NW corner (lat: 25.27°S, lon: 133.77°E) and SE corner (lat: 39.16°S, lon: 154.86E) Table 2: Definition of globals used in the different analyses Data for sites and globals were binned by population density (300 bins). Numbers of sites and globals and frequency of sites (sites/globals) were computed for each bin.

Testing model predictions
To test the predictions of the model for a specific dataset, we compared the distribution of sites by population density against the distribution of globals. Using the two-sided, two sample Kolmogorov-Smirnov test, we tested the null hypothesis that both were drawn from the same distribution, The difference between the medians of the two distributions was tested using the less sensitive Mann-Whitney U test.
Empirical support for the model was estimated using a Bayesian framework.  Table 1.

Analysis by period and geographical area
Subsets of "sites" and "globals" belonging to specific date bands (0 -9,999 years ago, 10,000 -46,000 years ago) and specific geographical areas with high numbers of sites (France/Spain, Australia) were subjected to the same analyses applied to the full dataset.

Sensitivity to the relationship between population density and encounter rate
To test the sensitivity of our model to the assumed inverse quadratic relationship between population density and encounter rate, we formulated a generalized version of our original model, where the critical population density and the encounter rates are power functions of population density. Thus: Replication of our analysis using this generalized model (results not shown) showed that our qualitative empirical findings are robust to the exact specification of the relationship between population density and encounter rate.

Software
Data extraction and analysis was based on custom code written in Python 2.7, using the Anaconda development environment.

Software availability
Python source code for the software used to perform the analyses is available under a GPL 3.0 license, at https://github.com/rwalker1501/cultural-epidemiology.git. The software includes (i) the script used to generate the figures and tables shown in this paper; (ii) methods to run additional data analyses and to produce figures not shown in the paper; (iii) a menu driven program providing easy to use access to these functions; (iv) documented source code for the statistical calculations and plots used in the paper and the additional analyses.

Data
The GIT repository includes the full rock art database used for the analysis, and copies of the population estimates from the Timmermann and the Eriksson models (reproduced with the permission of the authors). The original population estimates from Eriksson can be found at: https://osf.io/dafr2/ The original population estimates from Timmermann can be found at https://climatedata.ibs.re.kr/grav/data/human-dispersal-simulation.

Author contributions
RW developed the original epidemiological model and wrote a prototype version of the software.
AE contributed the population data used in the analysis, substantially contributed to the development of the epidemiological model, and developed the concepts and analysis tools used to validate the model. RW and AE jointly wrote the manuscript. CR rewrote the prototype data extraction and analysis software from RW and AE, and developed the publication version of the tool. TH contributed to the formal derivation of the epidemiological model. FC contributed to the statistical analysis.

Supplementary Information
A B C

SI Fig 1:
Posterior distributions for the !, ", and # parameters for the epidemiological model as shown in Fig 3A (observed site detection rates, for the full archaeological dataset and the combined population estimates in [28,29]) ,