Selection signatures in two oldest Russian native cattle breeds revealed using high-density single nucleotide polymorphism analysis

Native cattle breeds can carry specific signatures of selection reflecting their adaptation to the local environmental conditions and response to the breeding strategy used. In this study, we comprehensively analysed high-density single nucleotide polymorphism (SNP) genotypes to characterise the population structure and detect the selection signatures in Russian native Yaroslavl and Kholmogor dairy cattle breeds, which have been little influenced by introgression with transboundary breeds. Fifty-six samples of pedigree-recorded purebred animals, originating from different breeding farms and representing different sire lines, of the two studied breeds were genotyped using a genome-wide bovine genotyping array (Bovine HD BeadChip). Three statistical analyses—calculation of fixation index (FST) for each SNP for the comparison of the pairs of breeds, hapFLK analysis, and estimation of the runs of homozygosity (ROH) islands shared in more than 50% of animals—were combined for detecting the selection signatures in the genome of the studied cattle breeds. We confirmed nine and six known regions under putative selection in the genomes of Yaroslavl and Kholmogor cattle, respectively; the flanking positions of most of these regions were elucidated. Only two of the selected regions (localised on BTA 14 at 24.4–25.1 Mbp and on BTA 16 at 42.5–43.5 Mb) overlapped in Yaroslavl, Kholmogor and Holstein breeds. In addition, we detected three novel selection sweeps in the genome of Yaroslavl (BTA 4 at 4.74–5.36 Mbp, BTA 15 at 17.80–18.77 Mbp, and BTA 17 at 45.59–45.61 Mbp) and Kholmogor breeds (BTA 12 at 82.40–81.69 Mbp, BTA 15 at 16.04–16.62 Mbp, and BTA 18 at 0.19–1.46 Mbp) by using at least two of the above-mentioned methods. We expanded the list of candidate genes associated with the selected genomic regions and performed their functional annotation. We discussed the possible involvement of the identified candidate genes in artificial selection in connection with the origin and development of the breeds. Our findings on the Yaroslavl and Kholmogor breeds obtained using high-density SNP genotyping and three different statistical methods allowed the detection of novel putative genomic regions and candidate genes that might be under selection. These results might be useful for the sustainable development and conservation of these two oldest Russian native cattle breeds.

Introduction National livestock genetic resources are the most important carriers of the unique forms of variability; they are necessary to ensure the sustainability of local agricultural production systems [1]. In Russia, over a long period, various cattle breeds have formed, which are characterised by good adaptation to the local environmental conditions and better suitability for the development of geographic production systems [2]. Among the Black-and-White cattle breeds, Kholmogor and Yaroslavl dairy cattle have remarkable importance for animal husbandry in the Central and Northern regions of European Russia. Both breeds originated from small-stature Great Russian Cattle (height at withers is 105-110 cm, body weight up to 325-390 kg [3,4]. The first records of the presence of cattle on the islands in the upper reaches of the Northern Dvina River near the Kholmogory town in Northern Russia date back to the sixteenth century (the breed was named after this town) [5]. Over a long history, the good pastures and hayfields in the Kholmogor breeding zone have contributed to the development of the tall cattle breed well suited for dairy production.
In the 18th-19th centuries, the Kholmogor cattle were well-known in Russia and abroad owing to their high productivity and excellent quality of dairy products. Cattle breeding began to develop in the Yaroslavl region (near Moscow), based on which the breed is named, in the seventeenth and eighteenth centuries [5]. The Yaroslavl cattle were selected for adaptability, high feed efficiency and good reproductive ability even in poor forage conditions during winter. Moreover, this breed was in great demand among peasants and small cattle holders [4].
Most researchers suggested that foreign breeds, including Holsteins contributed little in the development of the Yaroslavl and Kholmogor cattle populations [6,7], which was subsequently confirmed by molecular genetics studies [8][9][10]. The studies of the historical specimens of Yaroslavl and Kholmogor cattle, dated from the end of 19 th to the first half of 20 th century using microsatellites, indicated that Russian breeds and Holsteins were developed from different ancestral populations [11]. In the 1960s, the number of Kholmogor and Yaroslavl cattle was close to million individuals (993 and 951 thousand, respectively) each; in contrast, in 2015, their census population size remarkably decreased to 222.9 and 50.5 thousand heads, respectively [12]. During the last decade, along with the remarkable decrease in population size, crossbreeding has been actively used in the remaining purebred animals, which could lead to the loss of genetic distinctness.
Several molecular genetic studies have attempted to characterise the demographic history and population structure of the modern populations of Yaroslavl and Kholmogor breeds and their relationship to other taurine breeds [9, 10, 13,14], but their results were, in some cases, controversial. By using microsatellite-based clustering, Li and Kantanen [13] suggested the composite origin of these breeds. Whole-genome studies performed using medium-density SNP chips showed that Yaroslavl and Kholmogor cattle are valuable national genetic resources, which have been little influenced by introgression with non-Russian breeds [9, 10]. The maintenance of a visible part of historical components in the modern populations both of Yaroslavl and Kholmogor breeds was confirmed by the study of historical specimens [11]. Further elucidation of the genetic architecture of these two breeds and more precise identification of genomic regions that are affected by putative selection may be achieved by using more powerful molecular genetic tools such as bovine high-density SNP BeadChip (Illumina Inc., USA) which contains 777,962 SNPs.
Several studies have explored the genome diversity and adaptation of cattle breeds by using high-density DNA arrays [15][16][17]. However, the patterns of genetic variation within Russian cattle breeds have not yet been assessed using this powerful genetic tool. Identification of the genome regions exhibiting evidence of being subjected to selective pressure during the breeding process is of particular interest [18]. Detailed analysis of these 'signatures of selection' can help to elucidate the breed history and to identify genes responsible for the adaptive and economically significant traits in livestock populations [19].
Studies have highlighted the importance of utilising multiple methodologies for detecting regions of the genome that have been the target of selection [20], including fixation index (F ST ), hapFLK analysis and identification of runs of homozygosity (ROH) islands. F ST quantifies the differences in allele frequencies between populations [21,22] and is routinely used for identifying highly differentiated alleles [23,24]. The hapFLK statistics is a haplotype-based method for the detection of positive selection in multiple population data [25]. ROH islands are genomic regions with high homozygosity around the selected locus that might harbour targets of positive selection [26][27][28].
Several studies have focused on identifying signatures of selection in transboundary and local cattle populations by using high-throughput SNP genotypes and F ST [29-31], hapFLK, and ROH statistics [32]. However, only a limited number of studies have thus far attempted to identify signatures of selection in Russian cattle breeds, and they were based on the mediumdensity SNP genotypes [33,34].
In this study, we performed the comprehensive analysis of high-density SNP genotypes to characterise the population structure and detect the signatures of selection in two old Russian cattle breeds (Yaroslavl and Kholmogor), by using Holsteins as the reference. The three methods (F ST , hapFLK, and ROH) were implemented to detect the genome regions that could be under putative selection. We identified the regions revealing signatures of selection in each of the two breeds, annotated the genes localised within or close to the selected regions, and performed a functional study of these genes. Our results might be useful for the genetic improvement of Yaroslavl and Kholmogor cattle breeds and developing programs for the conservation of their biodiversity.

Ethics statement
This study does not involve any endangered or protected species. Samples were derived in part from the Bioresource Collection of the L.K. Ernst Federal Science Center for Animal Husbandry, supported by the Russian Ministry of Science and Higher Education. Blood samples (5 ml of whole blood) were collected for routine diagnostic purposes by trained personnel under strict veterinary rules. Sperm samples were provided by the artificial insemination (AI) stations according to specific scientific collaboration agreements.
The genotyping data supporting the results of this study are deposited in the Dryad Digital Repository (https://doi.org/10.5061/dryad.vt4b8gtq9).

Sampling and DNA extraction
The pedigree records were analysed to identify the purebred animals of Yaroslavl (YRSL) and Kholmogor (KHLM) breeds (S1 Fig). Only individuals having purebred ancestors on the both sides of pedigrees at least in three generations were selected for the analyses. To cover more breed variability, we sampled the animals originating from different breeding farms and representing different sire lines for each of the two breeds. Blood or semen samples were collected from 56 animals, including 31 samples of Yaroslavl breed and 25 samples of Kholmogor breed. DNA was extracted using Nexttec columns (Nexttec Biotechnology GmbH, Germany), according to manufacturer's instructions. The concentrations of dsDNA solutions were measured using a Qubit 3.0 fluorometer (Thermo Fisher Scientific, Wilmington, DE, USA). The purity of DNA samples was checked by determining the OD 260/280 ratio by using NanoDrop-2000 (Thermo Fisher Scientific, Wilmington, DE, USA). In addition, the DNA quality was checked using 1% agarose gel electrophoresis.

SNP genotyping and quality control
The Illumina Bovine HD BeadChip (Illumina, San Diego, CA, USA) was used for genotyping. High-density SNP genotypes for European Holsteins (n = 25) obtained from Bahbahani et al. [16,35] were included in the dataset as the reference. R software [36] was used to create input files. Valid genotypes for each SNP were determined by setting the cut-off of 0.5 for the Gen-Call (GC) and GenTrain (GT) scores [37]. PLINK 1.9 software [38] was used to perform SNP quality control. All animals were subjected to filtering for genotyping efficiency (-mind 0.1). The SNPs genotyped in less than 90% of the samples (-geno 0.1) and those located on sex chromosomes were excluded from the analysis. The final data set used for analysis included 650,134 autosomal SNPs. Additional filters for linkage disequilibrium (LD) and minor allele frequency (MAF) were used to calculate several statistical parameters (see below). The positions of SNPs were assigned according Bos taurus genome assembly UMD 3.1.1 (https://www. ncbi.nlm.nih.gov/assembly/GCF_000003055.6).

Genetic diversity
Within-population genetic diversity was evaluated using 133,685 SNPs after LD-based pruning by using PLINK 1.9 [38]. One locus from each SNP pair where the LD (r 2 ) exceeded 0.5 within 50 SNP windows was removed (-indep-pairwise 50 5 0.5 flag, where 50 is the size of the sliding window, 5 is the number of SNPs shifted in each step, and 0.5 is the r 2 threshold). We calculated the observed heterozygosity (H O ), unbiased expected heterozygosity ( U H E ) [39], rarefied allelic richness (A R ), and inbreeding coefficient ( U F IS ) based on the unbiased expected heterozygosity by using R package "diveRsity" [40]. The genomic inbreeding coefficient based on ROH (F ROH ) was computed as the ratio of the sum of the length of all ROHs per animal to the total autosomal SNP coverage (2.51 Gb; for ROH estimation, see section 'Runs of homozygosity' below).

Effective population size
Trends of effective population size were estimated from LD by using the SNeP tool, as described by Barbato et al. [41]. We applied default parameters, except for sample size correction, mutation occurrence (α = 2.2) [42], and recombination rate between a pair of genetic markers, according to Sved and Feldman [43]. To determine the rate of N E changes in the 50 most recent generations, we determined the N E changing ratio (N E C) by calculating the slope of each segment that links a pair of neighbouring N E estimates and normalising the values by using the median of the most recent 50 N E estimates. Only SNPs with an MAF of >0.05 were used to estimate the effective population size.

Runs of homozygosity
A window-free method for consecutive SNP-based detection (consecutive runs method [44] implemented in R package "detectRUNS" [45] was used for the estimation of ROHs. We allowed one SNP with missing genotype and up to one possible heterozygous genotype in one run to avoid the underestimation of the number of ROHs that were longer than 8 Mb [46]. The minimum ROH length was 1000 kb. For minimising false-positive results, we calculated the minimum number of SNPs (l) as was initially proposed by Lencz  l ¼ logeða=ns�niÞÞ=ðlogeð1 À hetÞ; where ns = the number of genotyped SNPs per individual; ni = the number of genotyped individuals; α = the percentage of false-positive ROHs (set to 0.05 in our study), and het = the mean heterozygosity across all SNPs. In our case, the minimum number of SNPs was 31.

Principal component analysis, Neighbor-Net, and admixture
To characterize the relationship within and between populations at the genomic level and to assess the individuals with regard to their assignment to the corresponding breed, we performed principal component analysis (PCA), Neighbor-Net clustering, and model-based clustering (admixture). PLINK v1.9 software was used to perform PCA, and R package "ggplot2" was used to visualise the results [49]. A Neighbor-Net tree was constructed on the basis of pairwise identical-by-state (IBS) distances in SplitsTree4 [50]. We employed Admixture v1. 3 [51] for genetic admixture and R package "pophelper" [52] for plotting the results. The number of assumed ancestral populations (K) was estimated on the basis of the lowest cross-validation (CV) error compared to other K values by using a standard admixture cross-validation procedure [53].

Selection signature analysis
We used three different statistics for detecting the signatures of selection in the genome of the studied cattle populations: calculation of the fixation index (F ST ) for each SNP for comparison of the pairs of breeds, hapFLK analysis, and estimation of the ROH islands, which were overlapped among different animals within each breed.
We calculated pairwise genetic differentiation (F ST ) [53] between all SNPs in pairs of the studied cattle breeds (KHLM/HLST, YRSL/HLST, and KHLM/YRSL) by using PLINK 1.9. We applied a low threshold for MAF of less than 5% (-maf 0.05), because the filtering of SNPs based on MAF may affect the probability of identifying alleles related to selection [33]. The data set used for F ST analysis included 567,206 autosomal SNPs. The top 0.1% F ST values were used to represent the selection signature, as was previously considered by Kijas et al. [18] and Zhao [24].
The hapFLK method considers the haplotype structure of the population [25]. We used hapFLK 1.4 program [25] to detect the signatures of selection through haplotype differentiation among the studied populations. The number of haplotype clusters per chromosome was determined in fastPHASE by using cross-validation-based estimation and was set at 25 [54].
For detailed analyses, we selected the hapFLK regions containing at least one SNP with p-value threshold of 0.001 (-log10(p) > 3). The hapFLK regions carrying at least one SNP with p-value threshold of 0.00001 (-log10(p) > 5) were defined as the strongest hapFLK regions. To limit the number of false positives, we calculated q-values by using R package qvalue [55]. We applied a q-value threshold of 0.05 (corresponds to a false discovery rate of 5%).
We used R package "detectRUNS" [45] for ROH estimation. We selected the overlapping ROH segments with the minimal ROH length of 0.3 Mb shared by more than 50% of the samples as an indicator of possible ROH islands in the genome. Special attention was paid to the ROH islands shared by more than 70% of the samples, which were defined as the strongest ROH islands.
The results obtained by the three different methods (F ST , hapFLK, and ROH) were compared, and genomic regions selected by at least two different methods were identified. Moreover, we compared our results with the data previously obtained by Yurchenko et al.
[34] by using de-correlated composite of multiple signals (DCMS) statistics [56] for the presence of the overlapping genomic regions. Besides, we compared the identified genomic regions with the meta-assembly of selection signature in cattle [57].

Identification of genes and gene function within selected regions of the genome
The F ST statistic windows of 0.4 Mb (0.2 Mb upstream and 0.2 Mb downstream of the top 0.1% SNPs) were investigated to select overlapping gene segments. For hapFLK and ROH analyses, the genes were selected if they were localised entirely or in part within the identified genomic regions. Gene identification was performed using Bos_taurus_UMD_3.1.1 genome assembly (https://www.ncbi.nlm.nih.gov/assembly/GCF_000003055.6). The selected genes were considered as candidate genes. In addition, genes were compared with quantitative trait locus (QTL) regions included in the Cattle QTL database (http://www.animalgenome.org/ cgi-bin/QTLdb/index) [58]. Phenotypes that are known to be associated with the identified candidate genes were inferred from the most recent literature, by integrating PubMed database of the National Center of Biotechnological Information (https://pubmed.ncbi.nlm.nih.gov/).
The information for the annotation of some specific genes was sourced from Gene Cards (http://www.genecards.org/).

Genetic diversity
The estimation of the average heterozygosity within each breed indicated significant genetic diversities among the studied breeds ( Table 1).
The average level of U H E for the Yaroslavl breed was lower, whereas that for the Kholmogor breed was higher than that of Holsteins. The calculation of U F IS indicated heterozygote excess

Effective population size
The Yaroslavl and Kholmogor breeds had higher effective population size (N E5 = 111 and 130, respectively) than that of Holsteins (N E5 = 95) ( Table 1). The N E values for Kholmogor breed declined over time, whereas the lowest N E value was observed for Yaroslavl breed 5 generations ago (N E = 111) with the subsequent increase to 118 at 3 generation ago ( Fig 1A, S2 Fig).
All the three studied breeds had a similar pattern of the N E slope changes up to 7 generations ago. We observed 3 common peaks in N E changes characterised by accelerated decline of the effective population size 23, 15, and 7 generations ago. The most recent peak in N E for Kholmogor breed was found 4 generations ago ( Fig 1B).

Estimates of ROHs and F ROH
The ROH segments were found in all the studied breeds, with an average number of 71.84 ± 3.27 segments in Yaroslavl breed and 58.96 ± 2.38 segments in Kholmogor breed and mean ROH length of 3.59 and 2.50 Mb, respectively. In Holsteins, the values of the abovementioned estimations were 74.60 ± 2.28 segments and 3.62 Mb, respectively. Short segments (ROH 1-2 Mb ), caused by common ancestors around 25 to 50 generations ago, were the most distributed through the genome and accounted for 38.73% in Yaroslavl breed and 57.26% in Kholmogor breed of all ROHs identified, although the proportion of the genome covered by ROHs of this length class was relatively small (15.0% and 31.1%, respectively). In Holsteins, 44.97% of ROHs were characterised by the shortest ROH segments, which covered 17.2% of the genome (Fig 2, S1 Table).
The proportion of ROH segments with the greatest length (>16 Mb), typically caused by inbreeding to a very recent ancestor, as in parent-offspring, half-sib mating, or first cousin mating, was extremely low in the Yaroslavl and Holstein breeds (0.65% and 0.10% of the total number of ROHs, respectively). The genome coverage by the longest ROHs was the lowest and averaged 9.6% in the Yaroslavl breed. We did not find long ROH segments (>16 Mb) in the Kholmogor breed.

Breed relationship and admixture
The results of PCA revealed well-separated clusters corresponding to each of the three breeds. The first component, which explained 7.02% of the genetic variability, split the Yaroslavl breed from the Kholmogor breed and Holsteins, suggesting an early divergence of Yaroslavl from old Friesians; in contrast, the Kholmogor breed was separated from Holsteins by the second component, which was responsible for 5.03% of diversity (S3A Fig).

F ST statistics
We observed variation in genetic differentiation between breeds, based on F ST through the genome (Fig 3A-3C).

Identification of ROH islands
Overlapping ROH islands observed in more than 50% of samples within each breed were found across the genome in all the studied breeds (S5 Table,  We identified 17 and 20 ROH islands which were distributed among 12 chromosomes in the genome of Kholmogor and Yaroslavl cattle breeds, respectively. In most cases, the genomic distribution of ROH islands was specific for each breed both in length and localisation across chromosomes. The lengths of ROHs varied from 304.8 to 1276.9 kb in Kholmogor breed and from 297.2 to 1640.6 kb in Yaroslavl breed and were averaged 638.1 ± 59.4 and 691.0 ± 71.6 kb, respectively. For comparison, in Holsteins, 29 ROH islands were identified, which were distributed among 16 autosomes. The mean ROH length was 735.6 ± 85.8 kb with variation from 304.8 to 2741.3 kb. We identified several overlapping ROHs, which were common for two or three breeds. The . One strongest ROH region common for Kholmogor and Holstein breeds and one region common for Yaroslavl and Holstein breeds was found on BTA14 and BTA16 (S5 Table).
Comparison of the genomic localisation of the regions identified in our study by using F ST , hapFLK, and ROH analyses, as well as in the previous study performed by Yurchenko et al.
[34] by using DCMS statistic [56] showed the presence of several genomic regions under putative selection in the studied breeds identified at least by two different methods (Table 2).
Twelve such regions were detected in Yaroslavl breed and nine in Kholmogor breed compared to 17 regions in Holsteins.

Identification of candidate genes within selected regions of the genome
We identified candidate genes within which the SNPs selected by F ST analysis were localised (S2 Table), as well as the genes entirely or in part localised within the detected hapFLK regions and ROH islands (S6 Table). For more thorough analysis, we selected the genes, localised within genomic regions, which revealed the signature of selection identified by hapFLK and/or carried the strongest ROHs (ROH islands observed in more than 70% of animals) or were detected at least by two different methods (Table 3).   Table).

Genetic diversity, effective population size, and breed relationship
To our knowledge, this is the first comprehensive genome-wide study of putative signatures of selection in the genomes of Yaroslavl and Kholmogor breeds performed on the basis of the analysis of high-density SNP genotypes (650,134 autosomal SNPs) generated using Bovine SNP HD BeadChip (Illumina, USA). In our recent studies, we showed that these two oldest Russian native breeds have been little influenced by introgression with transboundary breeds  and maintained their historical genomic components [9, 11]; thus, they may carry specific signatures of selection, reflecting their adaptation to the local environmental conditions and response to the breeding strategy used. Adaptation to the changing environment is increasingly important [59]; thus, native cattle breeds might become valuable sources of genetic  TICAM1, FEM1A, MIR7, DPP9, TNFAIP8L1, SEMA6B, PLIN5, LRG1, PLIN4, HDGF2UBX, N6, CHAF1A,  SH3GL1, STAP2, MPND, FSD1, TMIGD2, SHD, CCDC94, EBI3, ANKRD24, SIRT6, CREB3L3 variability, which will be necessary for developing sustainable animal production systems in the future.
Considering the relatively small sample size, we were very thoughtful when choosing animals for research. The selected individuals represented different sire lines and different breeding farms, and unveiled the low amount of Holstein ancestral components (S3 Fig). Besides, the small sample sizes appear to be sufficient to obtain the reliable results using high-density SNP markers [56].
We found that 9.36% of variability was attributed to genetic differences between breeds (F ST = 0.094), and the remaining 90.64% was the result of allelic variation within breeds. We observed slightly higher level of genetic variability in Yaroslavl ( U H E = 0.349, A R = 1.967) and Kholmogor ( U H E = 0.361, A R = 1.977) breeds (Table 1) than was previously detected using medium-density DNA BeadChip [9, 10]. Variability was significantly lower in the Yaroslavl breed than in the Kholmogor breed (p < 0.001), probably because of more drastic decline in the census population size. During the last 60 years, the number of Yaroslavl cattle has decreased by practically 20 times compared to that of Kholmogor breed, which decreased by 4.5 times [12]. Our data agreed with the findings of Xu et al. [15], who performed the genotyping of three commercial taurine breeds by using high-density DNA array and observed similar level of genetic variability (H E = 0.337-0.356). In contrast, Stronen et al. [17] detected lower genetic variability in taurine breeds (H E = 0.226 and 0.298); this was probably because of the small population size of the studied native breeds. We observed slight excess of heterozygotes in both the Russian breeds ( U F IS = -0.022 and -0.013 for Kholmogor and Yaroslavl breeds, respectively), which agrees with the findings obtained using DNA chips of lower density [9, 10]. The detected inbreeding coefficient in Holsteins ( U F IS = -0.010) was similar or lower than one which was discovered in recent reports (F IS = −0.004 [60], F IS = 0.0526 [15]), which might reflect the differences in sample origins and population sizes.
The N E showed an overall decline for the studied breeds over generations (Fig 1A, S2 Fig). The most recent N E values observed for Yaroslavl (N E5 = 111) and Kholmogor (N E5 = 130) cattle were higher than those in Holsteins (N E5 = 95) and exceeded the critical threshold of N E = 100 estimated for long-term viability of discrete livestock breeds [61]. We observed four periods of accelerated decline in N E during the last 45 generations (Fig 1B). The more ancient periods were observed between 28 and 23 as well as 18 and 15 generations ago, which probably reflect the World wars crisis. The subsequent acceleration was detected between 8 and 7 generations ago, which was most likely the result of the implementation of artificial insemination, which is associated with the decrease in bull number. The recent accelerated decline in N E was found between 5 and 4 generations ago for Kholmogor breed and between 6 and 5 generations ago for Yaroslavl breed, when the N E decreased from 130 to 115 and 113 to 111, respectively. The possible explanation for this could be the large decrease in the population size of the purebred cattle from the mid-1990s to the mid-2000s owing to the general decline of industry in the first decade of the post-Soviet period.
We observed visible differences in the autozygosity level among the studied breeds. The average ROH length (258.3 Mb) in Yaroslavl breed was similar to that in Holsteins (270.1 Mb) and exceeded that in the Kholmogor breed (147.6 Mb). The higher total ROH length in Yaroslavl breed might be attributed to the higher inbreeding owing to the lower population size. Among the different length classes, short ROH segments (1-2 Mb) were prevalent (Fig 2, S1 Table), which agreed with the findings obtained in different cattle populations [48, 62,63].
The results of the PCA plot, Neighbor-Net analysis, and admixture clustering (S3 Fig) suggest a low contribution of Holstein ancestors (probably Holland cattle) in the formation of the architecture of the Yaroslavl and Kholmogor breeds, which has been proposed by several researchers [5, 9-11].

Identification of genomic regions under putative selection in the Kholmogor and Yaroslavl breeds
We applied three different statistical methods (selection of top 0.1% SNPs by F ST value for pair-wise breed comparison, identification of ROH islands shared by more than 50% of animals within each breed, and hapFLK analysis) to identify the genomic regions and genes that are affected by selection (S2 and S6 Tables). Considering a possible impact of genetic drift or recent inbreeding on ROHs [64], we selected the ROH islands to be under selection pressure, if they were confirmed by at least one more method or were observed in more than 70% of animals. Results obtained using multiple analytical approaches typically have higher informative power, furthermore, methods based on different approaches may also complement each other [20,65]. Fifteen of seventeen regions, identified in Holsteins were overlapped with genomic regions, which were described by Yurchenko et al. [34] and eleven were overlapped with regions, which were validated by meta-assembly of selection signatures (S8 Table) [57]. We confirmed nine regions under putative selection in the genome of Yaroslavl cattle and six regions in the genome of Kholmogor cattle, which were previously identified by Yurchenko et al.
[34] by using DCMS analysis [56] of medium-density SNP genotypes; in this study, the flanking positions of most of these regions were expanded (Table 3). In addition, we detected three new putative genomic regions affected by selection by using at least two different methods in the genome of Yaroslavl ( An additional overlapping region with Holsteins (localised on BTA 13 at 5.0-6.8 Mbp) was found in the Yaroslavl breed. Seventeen of twenty-one regions, found in the Yaroslavl and Kholmogor breeds were overlapped with the signature of selections, which were identified in other European breeds by meta-assembly [57], while two genomic regions in each breed were novel (S8 Table).

Functional annotation of candidate genes localised within the selected regions
We performed functional annotation of genes localised within the putative regions carrying signals of selection in the genome of the studied breeds. Along with genes, which were previously described for Yaroslavl and Kholmogor cattle [34], we expanded the list of candidate genes by elucidating the known genomic regions and detecting additional selection sweeps in the genome of the studied breeds (Table 3, S6 Table).
We found two regions overlapping with the Holstein genomic region under selection pressure in both Yaroslavl and Kholmogor breeds (  PPP1R16A, FOXH1, CYHR1, SLC39A4, CPSF1, ADCK5, FBXL6, BOP1, SCRT1, DGAT1,  GPAA1, EXOSC4, PARP10, HSF1, OPLAH, and GRINA), associated with milk production traits in Kholmogor cattle was found on BTA 14. These genes were shown to be involved in regulation of milk fatty acid composition [113][114][115][116], milk yield [117,118], milk fat yield, and milk fat percentage [119][120][121][122][123]. In contrast to Kholmogor cattle, we identified only two candidate genes on BTA 5, which were associated with milk production traits in Yaroslavl cattle, including TMCC3 and DDX10 [124,125]. The possible explanation for the presence of numerous milk trait genes affected by selection in Kholmogor cattle could be that one of the breeding targets for this breed from the middle of the 19 th century was to supply the Petersburg and Moscow citizens with high-quality dairy products [5]. This stimulated the selection of highproducing cows with excellent milk quality, suitable for butter and cheese production, and thus led to the alterations in the allele frequencies of genes which are involved in the regulation of milk production and composition.
We identified a few more candidate genes associated with reproduction traits, including GSK3B on BTA 1 [126], ADAD1 on BTA 17 [127,128], and UBE3B on BTA 21 [129,130], in the Yaroslavl breed. These genes are associated mainly with female fertility traits, that can reflect the unique capacity of cows of Yaroslavl breed to produce healthy calves even with severe weight losses because of poor feeding in winter [4]. The genes detected in Kholmogor breed, including WDR19 on BTA 6 [131], NME5 on BTA 7 [132,133], NT5E on BTA 9 [134], PIWIL4 on BTA 15 [135], TUBB3 and UQCRFS1 on BTA 18 [136,137], are mainly related to male fertility. Since the Kholmogor breed had become the breed of choice for the growing market of Petersburg and Moscow in the 19 th century, the use of bulls with increased fertility was preferred.
We detected selection pressure on genes involved in lipid metabolism in Yaroslavl cattle, including PLIN4, PLIN5, and CREB3L3 on BTA 7 [138][139][140][141], MFN2 on BTA 16 [112], and NDUFV2 on BTA 24 [142]. This observation is probably associated with the capacity of cattle population of Central Russia to accumulate sufficient subcutaneous fat during the pasture period that helped them to survive under poor nutrition conditions in winter and to maintain pregnancy [4].
In Yaroslavl breed, we found a putative signature of selection around the KIT gene on BTA 6, which is a functional candidate for coat coloration in cattle, including white spotting patterns in the Holstein [143] and Fleckvieh cattle breeds [144]. According to the QTL database [58], the region on BTA 6 between 71.4 and 72.1 Mbp, where the KIT gene is localised, contains the genes associated with eye area pigmentation in Fleckvieh cattle [145]. The black pigmentation of the eye area is one of the well-known distinguishing features of Yaroslavl cattle (S1 Fig). In summary, our study provides deeper insight into the genomic architecture of Yaroslavl and Kholmogor cattle breeds and allowed the identification of the genomic regions and genes that were affected by selection during the century-long history of breed formation. Our research results will be useful for the sustainable development and conservation of these two oldest Russian native cattle breeds.