Temporal Fluctuation in North East Baltic Sea Region Cattle Population Revealed by Mitochondrial and Y-Chromosomal DNA Analyses

Background
Ancient DNA analysis offers a way to detect changes in populations over time. To date, most studies of ancient cattle have focused on their domestication in prehistory, while only a limited number of studies have analysed later periods. In contrast, the genetic structure of modern cattle populations is well known from several molecular and population genetic studies.

Results
Bones and teeth from ancient cattle populations from the North-East Baltic Sea region dated to the Prehistoric (Late Bronze and Iron Age, 5 samples), Medieval (14), and Post-Medieval (26) periods were investigated by sequencing 667 base pairs (bp) of the mitochondrial DNA (mtDNA) and 155 bp of intron 19 in the Y-chromosomal UTY gene. Comparison of maternal (mtDNA haplotype) genetic diversity in ancient cattle (45 samples) with modern cattle populations in Europe and Asia (2094 samples) revealed 30 ancient mtDNA haplotypes, 24 of which were shared with modern breeds, while 6 were unique to the ancient samples. Of seven Y-chromosomal sequences determined from ancient samples, six were of the Y2 haplotype and one was of the Y1 haplotype. Combined data including Swedish samples from the same periods (64 samples) were compared with the occurrence of Y-chromosomal haplotypes in modern cattle (1614 samples).

Conclusions
The diversity of haplogroups was highest in the Prehistoric samples, where many haplotypes were unique. The Medieval and Post-Medieval samples also show a high diversity with new haplotypes. Some of these haplotypes have become frequent in modern breeds in the Nordic countries and North-Western Russia, while other haplotypes have remained in only a few local breeds or seem to have been lost. A temporal shift in Y-chromosomal haplotypes from Y2 to Y1 was detected that corresponds with the appearance of new mtDNA haplotypes in the Medieval and Post-Medieval periods. This suggests a replacement of the Prehistoric mtDNA and Y-chromosomal haplotypes by new types of cattle.

Likelihood (ML) tree under the Bayesian Information Criterion (BIC), Akaike Information Criterion (AIC) and corrected AIC (AICc) was determined with jModelTest v2.1 [6]. The Hasegawa-Kishino-Yano model with invariant sites (I) and gamma-distributed rates (G) (HKY+I+G), with a gamma value of 0.6420, was the best-fit model according to BIC and was also suggested as the second-best model by AIC and AICc. According to AIC alone, the best-fit model of DNA sequence evolution was a Kimura 3-parameter model with unequal base frequencies (uf), invariant sites, and gamma-distributed rates (TPM1uf+I+G), with a gamma value of 0.6800.
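For reference, the three criteria compared above have the following standard textbook definitions. The symbols are not taken from the source: \hat{L} is the maximized likelihood of a candidate model, k is its number of free parameters, and n is the sample size. What is counted as n (for example, alignment length) is a setting of the analysis that is not stated here, so this block is a reminder of the general formulas rather than a record of the exact computation.

```latex
\begin{align}
\mathrm{AIC}  &= 2k - 2\ln\hat{L} \\
\mathrm{AICc} &= \mathrm{AIC} + \frac{2k(k+1)}{n-k-1} \\
\mathrm{BIC}  &= k\ln n - 2\ln\hat{L}
\end{align}
```

Because the BIC penalty per parameter, \ln n, exceeds the AIC penalty of 2 once n is larger than about 7, BIC tends to prefer a model with fewer parameters. This is consistent with BIC selecting HKY+I+G while AIC favoured the slightly more parameterized TPM1uf+I+G.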
Genetic distances for the 45 ancient cattle mtDNA sequences were estimated using ML analysis with an HKY+I+G substitution model and a gamma parameter of 0.6420. To generate an initial tree for the ML analysis, a model-averaged phylogeny was calculated in jModelTest v2.1 from 88 candidate models using BIC as the selection criterion, a confidence interval of 1.00, and a 50% majority rule within the 1.00 confidence interval. ML bootstrap values were calculated with 1000 replicates using PhyML 3.0 [7]. The tree was drawn using The ML tree topology was confirmed by Bayesian Markov Chain Monte Carlo (MCMC) analyses using MrBayes 3.2 [8]. Bayesian MCMC analyses were conducted using the HKY+I+G model. Three million generations in four independent MCMC analyses were run with sampling every 100 generations, and the first 25% of the samples was discarded as burn-in. Markov chain stationarity was considered to be reached when the potential scale reduction factor values approached 1.0 and the average standard deviation of split frequencies fell close to 0.01.
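The burn-in and convergence checks described above can be illustrated with a minimal sketch. It implements the generic Gelman-Rubin potential scale reduction factor for one scalar parameter sampled in several chains. It is not MrBayes' own code, and the function names `discard_burn_in` and `psrf` are hypothetical helpers introduced only for this illustration.

```python
import math
from statistics import mean, variance


def discard_burn_in(samples, fraction=0.25):
    """Drop the first `fraction` of a chain's samples (25% in the text above)."""
    start = int(len(samples) * fraction)
    return samples[start:]


def psrf(chains):
    """Gelman-Rubin potential scale reduction factor for equal-length chains.

    `chains` is a list of lists, one list of sampled values per chain.
    Values near 1.0 suggest the chains sample the same distribution.
    Assumes at least one chain has non-zero variance.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # mean within-chain variance W
    between = n * variance(chain_means)            # between-chain variance B
    pooled = (n - 1) / n * within + between / n    # pooled variance estimate
    return math.sqrt(pooled / within)


# Sampling every 100 generations over 3,000,000 generations gives this many
# sampled states per run (some programs also record the initial state).
n_samples = 3_000_000 // 100
n_burn_in = int(n_samples * 0.25)
```

With these settings, 30,000 sampled states per run are reduced to 22,500 after burn-in. Chains that have drifted apart give a factor well above 1, while chains that agree give a value close to 1 for long runs.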

Post-Mortem degradation and contamination
The post-mortem degradation of DNA by endogenous nucleases, as well as physical oxidative and hydrolytic damage, may cause errors in PCR, especially in very old specimens (tens of thousands of years) [9]. Samples included in the current study derived mostly from the last few hundred years, with the oldest samples from the Late Bronze Age (800-600 BC). All the ancient samples included in the statistical analyses gave repeatable results with no signs of deamination or contamination. The ancient cattle sequences showed reasonable molecular behaviour, as they were assigned to the previously described taurine T and Q mega-haplogroups, with most of the haplotypes having counterparts in modern cattle (discussed in the main text). In addition, post-mortem degradation of DNA should affect the nucleotides and the sugar-phosphate backbone of the DNA strand [9] over the whole length of the strand, not only at the diagnostic motifs. Therefore, it is unlikely that the haplotypes detected in the current study are a result of post-mortem degradation.

Sampling effect, newborn haplotypes, selection or migration?
Great care was taken to obtain a representative sample of ancient N-EBSR cattle and to minimize the possibility of sampling close relatives (Table A in S1 File). Ancient samples from all excavation sites where more than one individual was analysed showed variation in mtDNA haplotypes. The ancient bulls in this study were from different excavation sites and time periods, and thus it is unlikely that they were close relatives. The exception is a pair of Post-Medieval bulls from Pietarsaari, Finland (BtPie1 and BtPie2, Table A in S1 File), but in this case both paternal lines, Y1 and Y2, were detected.
We realize that the sampling effect and the heterochronous nature of the data may affect the results. However, the time span of the sample cohorts in this study was quite narrow: around 300 years for the Medieval and Post-Medieval periods, corresponding to around 60 generations, and 1900 years for the Prehistoric period. The longer time period in the Prehistoric cohort is due to the inclusion of two Bronze Age samples. As all the haplotypes in the Prehistoric cohort were different (Hd=1, Table 1 in main text), a summary of the genetic diversity gave similar statistical results for the total Prehistoric cohort as for the 300-year-long Iron Age period. Moreover, the time period from the end of the Post-Medieval period to the present is around 200 years. Thus, when sampling the same N-EBSR, the modern cohort reflects changes that occurred during the last 200 years. These time frames are too short for a major accumulation of new haplotypes by mutation, as shown in the median-joining network (Fig. 1 in main text), where the new haplotypes detected at later time periods do not form star-like patterns around any older haplotypes, as would be expected if they had been produced by new mutations. In addition, the bias between uncorrected and heterochrony-corrected nucleotide diversity estimates was marginal (Table 1 in main text).
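The value Hd=1 for the Prehistoric cohort follows directly from the usual definition of haplotype diversity (Nei's unbiased estimator) when every sampled individual carries a different haplotype. The sketch below assumes that this standard estimator is the one behind Table 1. The function name and the haplotype labels are placeholders, not identifiers from the study.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's haplotype diversity: Hd = n / (n - 1) * (1 - sum(p_i ** 2)).

    `haplotypes` holds one haplotype label per sampled individual;
    p_i is the sample frequency of haplotype i and n is the sample size.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sequences are required")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Five samples, all with different haplotypes (placeholder labels),
# mirroring the structure of the Prehistoric cohort.
prehistoric_like = ["h1", "h2", "h3", "h4", "h5"]
hd = haplotype_diversity(prehistoric_like)
```

For five distinct haplotypes each frequency is 1/5, so the sum of squared frequencies is 1/5 and Hd = (5/4)(4/5) = 1. Any subset of such a cohort also has Hd=1, which is why restricting the Prehistoric cohort to its Iron Age samples leaves this statistic unchanged.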
A sampling effect may drop out rarer haplotypes in each temporal cohort, including modern samples, which may result in stochastic fluctuations in the distribution of haplotypes between temporal periods. The expected result of a sampling effect is therefore a decrease in the diversity of each sample, especially in the smallest sample sets. In this data, a χ²-test was performed to test statistically the possibility that the results were due to a sampling effect. Significant results rejected the null hypothesis of equal frequencies between periods, and in fact revealed the opposite pattern. Due to a temporal increase in the proportion of one haplotype and a decrease in others (Fig. 2b in main text, see results), the smaller data sets of the ancient cohorts displayed higher diversity, with more haplogroups observed, than the modern N-EBSR cattle population (see results in main text). Selection of breeding animals or stochastic events in breeding populations may result in an increase of certain haplotypes. In this data, artificial selection may have affected the haplotype diversity of the modern cohort, as the selection for specialised breeds started around 1900 when the herd books were established. Families with desired characters may be linked to certain haplotypes. In historical periods, selection of breeding animals may have happened more by chance (i.e. genetic drift). For example, the most severe famine in N-EBSR history, in 1695-1697 AD, likely caused a strong population bottleneck in all domestic animals. During these three exceptionally cold years, one third of the local human population died of starvation and a large proportion of the livestock was eaten [10]. After this kind of stock loss, any surviving animal was likely used for breeding. Whether the selection is intended or caused by drift, the signatures detected by population genetics would be similar, e.g. an increase in the frequency of certain haplotype(s), as detected in this study (see the main text).
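The χ²-test of equal haplotype frequencies between periods can be sketched as a test of homogeneity on a periods-by-haplotypes contingency table. This is a generic illustration and not the study's actual computation. The function name is hypothetical, and the counts in the usage lines are made up purely to show the arithmetic.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows (e.g. one row per period), each a list of
    observed counts (e.g. one column per haplotype class). Every row and
    column total must be non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the null hypothesis of equal frequencies.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Made-up counts for illustration only (NOT data from this study):
# rows = two periods, columns = two haplotype classes.
illustrative_counts = [[10, 20], [20, 10]]
stat, dof = chi_square_statistic(illustrative_counts)
```

For this illustrative table every expected count is 15, so the statistic is 4 × 25/15 ≈ 6.67 with one degree of freedom. That exceeds the 5% critical value of 3.84, so equal frequencies would be rejected. With cohorts as small as the ancient ones, expected counts can fall below the level at which the χ² approximation is reliable, so an exact or permutation test may be a safer choice in practice.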
After considering the stochastic fluctuation due to a sampling effect and the possibility of new haplotypes evolving by mutation within the time frame of this data, the most plausible explanation for the observed temporal changes is migration. The haplotypes detected in each period likely arrived in the N-EBSR from other locations rather than evolving there. The temporal increase in the frequency of certain haplotypes may be due to bottlenecks and/or artificial selection of breeding animals.

Table A. Ancient samples studied in this article.
Identity codes used in the present aDNA analysis (Sample ID) and in radiocarbon analyses (Hela-codes), archaeological site (Site), town where the samples were excavated (Location), museum ID, bone type, and sex according to osteological analysis. Radiocarbon date (Radiocarbon date BP (± 1σ)) and calibrated date (with a 95% confidence interval) or dating by the context (DBC), and the corresponding historical period (Dating). Only unclear contexts were radiocarbon dated (see text for details). Success rate in sequence analysis of the mtDNA D-loop (mtDNA) and the Y-chromosomal UTY19 marker (UTY19). Total number of extractions (N extracts), PCR reactions (PCR), amplicons (N amplicons), and aDNA laboratories (aDNA laboratory) for each sample. Samples not analysed (-).