Discontinuities in quinoa biodiversity in the dry Andes: An 18-century perspective based on allelic genotyping

History and environment shape crop biodiversity, particularly in areas with vulnerable human communities and ecosystems. Tracing crop biodiversity over time helps understand how rural societies cope with anthropogenic or climatic changes. Exceptionally well preserved ancient DNA of quinoa (Chenopodium quinoa Willd.) from the cold and arid Andes of Argentina has allowed us to track changes and continuities in quinoa diversity over 18 centuries, by coupling genotyping of 157 ancient and modern seeds by 24 SSR markers with cluster and coalescence analyses. Cluster analyses revealed clear population patterns separating modern and ancient quinoas. Coalescence-based analyses revealed that genetic drift within a single population cannot explain genetic differentiation among ancient and modern quinoas. The hypothesis of a genetic bottleneck related to the Spanish Conquest also does not seem to apply at a local scale. Instead, the most likely scenario is the replacement of preexisting quinoa gene pools with new ones of lower genetic diversity. This process occurred at least twice in the last 18 centuries: first, between the 6th and 12th centuries—a time of agricultural intensification well before the Inka and Spanish conquests—and then between the 13th century and today—a period marked by farming marginalization in the late 19th century likely due to a severe multidecadal drought. While these processes of local gene pool replacement do not imply losses of genetic diversity at the metapopulation scale, they support the view that gene pool replacement linked to social and environmental changes can result from opposite agricultural trajectories.

Tucumán). It comprises a stratigraphic sequence 30 cm thick made of two layers of anthropic origin, separated into three extractions in each case: Layer 1 (1st, 2nd and 3rd extractions) and Layer 2 (1st, 2nd and 3rd extractions). The excellent natural conditions of preservation allowed the recovery of a wide variety of archaeological remains of both inorganic and organic origin. Cueva de Los Corrales 1 has been defined as a multiple activity site with emphasis on processing, consumption and disposal of animal and plant food resources [62]. Quinoa seeds analyzed in this paper come from Layer 2 (1st

Section C. DNA EXTRACTION, SIMPLE SEQUENCE REPEAT GENOTYPING, ANCIENT DNA QUALITY CONTROL, AND POPULATION GENETIC ANALYSIS
Prevention of contamination. Following recommendations to minimize the risk of exogenous DNA contamination and ensure the reliability of the results [71], dissections, extractions and pre-PCR processing of ancient and modern seeds were rigorously separated in time and space. We first processed ancient seeds in a specific laboratory dedicated to ancient DNA under sterile conditions. In the ancient DNA laboratory, we purchased and used new consumables and extraction kits, with the room cleaned and exposed to UV overnight after each DNA extraction cycle, in order to destroy possible traces of DNA between successive extractions. We wore protective clothing and footwear.
Once all the extractions and amplifications of archaeological seeds were completed, we then proceeded to the extraction and amplification of modern seeds in a distant laboratory, without any spatial connection with the previous one.
Genotyping. We worked on 81 modern and 144 ancient quinoa seeds for DNA extraction according to the below-described procedures. In modern as well as ancient seeds, we dissected seeds under a stereomicroscope (Leica MZ 16, Leica camera DFC 280) to separate the embryo from the central perisperm. We next extracted total DNA from embryos. Quinoa embryos (ca 1-3 mm length, 1 mm thick) were dissected one seed after the other, using sterile dissection equipment and binoculars.
DNA extraction was successful for all the modern seeds, while we recovered well-preserved DNA from only 76 ancient seeds (53%). This level of ancient DNA recovery reflects the high preservation of genetic material in dry environments as pointed out by [72], conditions still improved in the dry Andes by cold temperatures and oxygen scarcity at high altitude [73,74]. Despite these favorable conditions, there appears to be a time limit for the preservation of quinoa seeds in archaeological contexts [75].   Base assignment was made with GeneMapper V3.0 software (Applied Biosystems). Phred quality score was settled at 20 to assure 99% of base call accuracy, as a measure of the quality of the identification of the nucleobases generated by automated DNA sequencing. Sequences were aligned using CLUSTALW [78] followed by minor manual modifications. We analyzed amplification products by comparison with the public sequence databases as nucleotide using BLASTN. Following the criteria of Meyers et al. [79], a sequence was classified as a known element when retrieved with a BLAST Evalue of less than 10 -5 .
Sequence analysis revealed amplification products corresponding to what was expected for both microsatellite loci. After BLAST search, we found modern sequences to have 100% homology with Chenopodium quinoa clones of microsatellite sequences with E-values of virtually zero. At locus QAAT024, we selected 6 ancient and 4 modern genotypes, 2 ancient samples failed to give positive results. Finally, we obtained 8 consensus sequences, from 4 modern and 4 ancient genotypes (Fig A).
In the microsatellite region, a six-base pair InDel was present between these modern genotypes, according to the size of the expected fragment. The ancient sequences revealed variations at a total of seven nucleotide positions, representing 97% of sequence similarity with modern sequences.
Variation at nucleotide position 70 was on only one single base in one ancient genotype. The other 6 mutations discriminated between modern and ancient sample groups, as well as among ancient genotypes (nucleotide position 229), which confirms the absence of contaminating sequences. At locus QAAT087, we obtained 4 consensus sequences, from 3 ancient and 1 modern genotype (the remaining samples failed to give positive results) (Fig B). Sequences from locus QAAT087 have a lower quality than locus QAAT024, not only in archeological samples but also in modern ones. Alignment analyses revealed sequence variations at a total of 7 nucleotide positions, representing 97% of sequence similarity with modern sequences. Genetic diversity indexes. We measured the genetic diversity of each sample using the allelic richness Nall-rar [80] and the expected heterozygosity He. In predominantly selfing populations, we expect a strong deviation compared to Hardy-Weinberg expectations. We estimated the inbreeding fixation coefficient FIS for each sample and FST for each pair of samples according to Weir & Cockerham [81]. FIS and FST measure genetic differentiation within and among populations respectively; both coefficients range from zero (no differentiation) to one (complete differentiation).
Analyses were performed in R using the packages adegenet and hierfstat [63,64] and the program ADZE for rarefaction analyses [66].
We expect quinoa populations to be highly selfing and therefore to display a limited number of repeated multilocus genotypes (thereafter MLG). We used the package poppr to count the number of MLGs present in each population (nbMLG) [65]. Populations with no more than two individuals sampled were removed from this analysis. We then used the rarefaction method (ADZE) to estimate the expected MLG richness (eMLG) corrected for the differences in sample size. Finally, within each population, the composition in MLGs can either be balanced, or highly biased with one predominant MLG. We measured this using the Simpson diversity index (λ) that is equivalent to a multilocus expected heterozygosity.
Selfing rate estimation. Two independent estimates of selfing rates were calculated: either directly as s(FIs)=2 FIs/(1+FIs) [82], or as s(LnL) using the program RMES (robust multilocus estimate of selfing), based on the distribution of multilocus heterozygosity, which allowed calculating a confidence interval at P=95% [83]. We used the maximum likelihood estimation with a precision of 0.00001, a maximum number of generations of selfing set to 10 and 100000 iterations. In some populations, the high degree of homozygosity (#1,3,6) or the low sample size (samples #10,11,16,19) prevented the estimation of s(LnL). Results of allelic diversity, heterozygosity and selfing rates in the 19 studied quinoa samples are shown in Table E.   [84] (Fig 1B in main text). To avoid over-fitting the model, we used the cross validation method to choose the optimum number of principal components to include in the model. We retained 15 principal components and 4 discriminant factors in the final model, which explained 57% of the sample variability. To find the number of genetic groups better fitting our sample, we used the k-means algorithm for a number of groups k=1-25. We ran 10 9 iterations with 2,000 starting points. The Bayesian Information Criterion (BIC) minimized at k=12 for the whole sample analysis (Figs C and D).  Table B. Vertical axis shows the assignment probabilities for values of k from 7 to 15. Colors for k=12 are the same as in Fig 1B in main text.
We also used a Bayesian clustering approach implemented in the program STRUCTURE v2.3.4 using the correlated-frequencies model without admixture [67] (Fig E). We assessed the structure of the sample for a number of clusters k=7-15. For each value of k, we ran 10 repetitions of 2×10 6 iterations of the MCMC algorithm after discarding 5×10 5 iterations as burn-in. Finally, we performed a Principal Component Analysis with the adegenet package [64) (Fig F).  show the genetic groups identified with STRUCTURE for k=12 (see Fig E).
Coalescent-based analyses. An approximate Bayesian computation (ABC) approach using random forests [68,69] was used to characterize the local demographic history of quinoa found around Antofagasta de la Sierra. The coalescence-based modeling of genetic relationships between modern and ancient samples from Antofagasta examined six scenarios (Fig 2 in main text): i) direct chronological filiation between all samples, mixing cultivated and wild forms (Fig 2A), ii) replacement of all ancient quinoas by modern quinoas (Fig 2B), iii) filiation between modern quinoas and intermediate ancient quinoas, both replacing the oldest quinoas (Fig 2C), iv) successive replacement of the three groups of quinoas: modern, intermediate, ancient (Fig 2D), v) same scenario as previously but differentiating between cultivated and wild forms (Fig 2E), vi) same scenario as previously but with admixture between cultivated and wild forms (Fig 2F).
Coalescent simulations and calculation of summary statistics were performed with DIYABC [85]. Reference tables were exported and ABC model choice and parameter estimation was performed with the random forest approach implemented in the R package abcrf [86]. The scripts in R employed are available at the Zenodo open access repository (https://zenodo.org/). All singlesample and two-sample summary statistics available at DIYABC and admixture summary statistics (from a reduced number of relevant sample trios) were used to grow random forests for ABC. Table   G presents prior and posterior probability distributions for parameters of the models. For each scenario 60 000 simulations were performed. Random forests of 800 trees were grown for model choice. For the best model, 140 000 additional simulation were run and random forests of 1000 trees were grown for parameter estimation.