Runs of Homozygosity and NetView analyses provide new insight into the genome-wide diversity and admixture of three German cattle breeds

Angler (RVA) and Red-and-White dual-purpose (RDN) cattle were in the past decades crossed with influential Red Holstein (RH) sires. However, genome-wide diversity studies in these breeds are lacking. The objective of the present study was to elucidate the genome-wide diversity and population structure of the three German cattle breeds. Using 40,851 single nucleotide polymorphism markers scored in 337 individuals, runs of homozygosity (ROH) were analysed in each breed. Clustering and a high-resolution network visualisation analyses were performed on an extended dataset that included 11 additional (outgroup) breeds. Genetic diversity levels were high with observed heterozygosity above 0.35 in all three breeds. Only RVA had a recent past effective population size (Ne) estimate above 100 at 5 generations ago. ROH length distribution followed a similar pattern across breeds and the majority of ROH were found in the length class of >5 to 10 Mb. Estimates of average inbreeding calculated from ROH (FROH) were 0.021 (RVA), 0.045 (RDN) and 0.053 (RH). Moderate to high positive correlations were found between FROH and pedigree inbreeding (FPED) and between FROH and inbreeding derived from the excess of homozygosity (FHOM), while the intercept of the regression of FROH on FPED was above zero. The population structure analysis showed strong evidence of admixture between RVA and RH. Introgression of RDN with RH genes was minimally detected and for the first time, the study uncovered Norwegian Red Cattle ancestry in RVA. Highly heterogeneous genetic background was found for RVA and RH and as expected, the breeds of the extended dataset effectively differentiated mostly based on geographical origin, validating our findings. The results of this study confirm the impact of RH sires on RVA and RDN populations. Furthermore, a close monitoring is suggested to curb further reduction of Ne in the breeds.


Introduction
Following their domestication, cattle became adapted to specific environments into which they were introduced by their human capturers. The complexity of their origin, coupled with both artificial and natural selection culminated into the development of numerous breeds that exhibit variable phenotypes linked to different local conditions across regions of the world [1,2]. In the past few decades, however, factors including advances in breeding methods, industrialisation of cattle husbandry and the globalization of human society have resulted in the development of specialised high yielding breeds, which are often used at the expense of local breeds [1,3]. This calls for urgent actions at regional, national and international levels to save local breeds from extinction.
In Schleswig-Holstein in the northern part of Germany, two local cattle breeds namely, Angler (RVA) and Red-and-White dual-purpose (RDN) cattle are regionally well acknowledged and kept in commercial enterprises. RVA is popular for its high milk fat percentage and though a dual-purpose breed, kept primarily for milk production. It has contributed to the genetics of many local red breeds of central and eastern European countries and in the Baltic region [4,5]. Similarly, RDN was used to improve the performance of the Belgian native Redand-White breed some decades ago [6]. The performance RVA and RDN in terms of milk yield falls below that of Red Holstein (RH) cattle in the same region. Therefore, breeders attempted to improve milk yield of the local breeds by the introduction of genes from high yielding breeds. Bennewitz and Meuwissen [7] highlighted the introgression of RVA with genetics of other breeds (e.g. RH and Swedish Red), which happened in the 1960s. For RDN, a pedigree of the breed typically harbours RH genes reaching about 25% [8]. Like other local breeds, RVA and RDN are small in numbers and their respective herdbook numbers are declining. To our knowledge, genetic diversity studies in these breeds are very limited. For instance, a pedigree-based assessment of genetic diversity in the RVA and RDN populations revealed some influential RH sires being key ancestors of these populations [9]. The influence of same sires in both populations makes it interesting to study genome-wide diversity across these three breeds. In an independent study [10], a close relationship (genome based kinship coefficient = 0.039) was found between RVA and RH although principal component analysis put the two breeds apart.
Genome-based genetic diversity studies are nowadays the method of choice given the recent advances in SNP chips technology and the developments of advanced computational methodologies in population genetics [11][12][13]. Medium to high density SNP panel has been used especially in cattle [14][15][16][17] and to a lesser extent in sheep [18][19][20], goats [21,22] and horses [23,24]. In most of these studies, principal component analysis (PCA) and the modelbased admixture analysis [25] were effectively used to infer population structure between breeds. More recently, a high-definition network visualisation tool (NetView) was introduced [26] and further developed [27] to investigate population differentiation and produce finescale population structures without prior knowledge of individual ancestry. For studies involving within breed diversity, an inbreeding concept based on runs of homozygosity (ROH) is gaining wider acceptance [28]. These methodologies provide new approaches that can be applied in local breeds to guide conservation decision. Therefore, the objective of the current study was to use various methods in the assessment of whole-genome diversity of three German cattle breeds (RVA, RDN and RH) and furthermore, investigate admixture and gene flow events amongst them. The study also investigated the usefulness of ROH in these breeds. Findings of the research will provide insight into the composition and diversity of the breeds.

Data and data preparation
Genotyping data on 338 animals including 169 RVA, 69 RDN and 100 RH individuals were obtained from the official computation centre responsible for breeding value estimation (vit, Verden, Germany) for the analyses. The vit also provided pedigree data on RVA and RDN. All animals were genotyped with the Illumina BovineSNP50 BeadChip: 207 with chip type v2 featuring 54,609 SNPs and 131 with v1 (54,001 SNPs). The data were merged in PLINK [29] after the removal of all non-autosomal and non-annotated SNPs. The merged dataset, consisting of 51,105 SNPs, was filtered to keep individuals with 95% of their SNPs called and SNPs with 95% call rate across all samples. SNPs with minor allele frequencies (MAF) lower than 5% or that deviate from Hardy Weinberg expectation (P value � 0.0001) were excluded. After filtering, 337 individuals remained, all having the same 40,851 SNPs with an average distance of 60.496 kb (minimum, 0.065 kb; maximum, 937.787 kb) and a mean r 2 value of 0.186 between adjacent SNPs (S1 Table). Using quality filtered data, the "-genome" function in PLINK [29] was used to derive genomic relationship coefficients between all individuals for each breed. Subsequently, from the pairs of highly related individuals (relationship coefficient > 0.30), one animal was randomly discarded to ensure an appropriate representation of the diversity of each breed. This step was necessary to minimize within breed clustering in population structure analysis. Finally, 106 RVA, 47 RDN and 88 RH individuals remained and were used for all calculations except in the detection of ROH where all 337 individuals were used. Besides the original data for this study as describes, genotype information on 274 individuals available publicly at the "Web-Interfaced next generation Database dedicated to genetic Diversity Exploration" (WIDDE) [30] were included as outgroup data in the population structure analyses. These cover 11 different breeds: 10 genotyped with the Illumina BovineSNP50 BeadChip and 1 (i.e. Angus) with the Illumina HD chip ( Table 1). The outgroup data were pruned applying previously mentioned filtering criteria and were subsequently merged with the quality controlled original dataset. The resulting dataset had 32,566 SNPs on 515 individuals.

Heterozygosity and genomic effective population size
Observed heterozygosity (H o ) was estimated for each breed using the "-het" command in PLINK [29]. This was calculated as a difference between the number of homozygotes and the number of non-missing genotypes, expressed as a proportion of non-missing genotypes. The recent past demographic history of each breed was studied by estimating effective population  [32] sizes (N e ) through linkage disequilibrium (LD) applying the SNeP package [33]. SNeP predicts N e following Corbin et al. [34]: where N T(t) is the effective population size t generations ago calculated as t = (2f(c t )) −1 [35], c t is the recombination rate for a specific physical distance between SNPs estimated using Sved and Feldman approximation [36], r 2 adj is the LD value adjusted for sample size and α is a correction for the occurrence of mutations. Default settings of the programme were used apart from a specification of sample size and a setting of the maximum distance between SNPs to 10 Mb. The chosen distance allowed the estimation of Ne related to 5 generations back (Ne 5Gen ), which was considered as the most recent estimate following Deniskova et al. [19]. For much recent generations, LD-based Ne tends to be unreliable [34]. The extent of LD decay in each breed was analysed from PLINK [29] generated LD values which were calculated for markers on the same chromosome that are less than 2 Mb apart.

Runs of homozygosity and inbreeding
For each of the 337 individual, ROH were detected using PLINK [29] and by applying a sliding window of 50 SNPs, allowing one possible heterozygous genotype, up to two missing genotypes, a minimum SNP density of 1 SNP every 100 kb and a maximum gap of 1 Mb between consecutive homozygous SNPs. Next, the abundance and size distribution of ROH were investigated. ROH were categorized based on their physical length into 1 to 5 Mb, >5 to 10 Mb, >10 to 15 Mb, >15 to 20 Mb, >20 to 25 Mb, >25 to 30 Mb and >30 Mb as in previous studies [21,22]. Furthermore, the total number of ROH and the sum of all ROH segments were calculated for all animals per breed and per ROH length category. Genomic inbreeding coefficient based on ROH (F ROH ) was calculated for each animal following McQuillan et al. [28]., i.e. F ROH = ∑L ROH /L AUTO , where ∑L ROH refers to the total length of all ROH in the genome of an individual and L AUTO is the length of the autosomal genome coverage by SNPs in the analysis (2,499,219 kb). Following Mészáros et al. [37], F ROH was calculated for three different minimum lengths of ROH including 4, 8 and 16 Mb representing F ROH>4 , F ROH>8 and F ROH>16, respectively. An alternative genomic inbreeding coefficient (F HOM ) was calculated for each animal based on the excess in the observed number of homozygous genotypes within an individual relative to the mean number of homozygous genotypes expected under random mating [38]. The observed and expected number of homozygous genotypes were obtained using the "-het" PLINK [29] command. Pedigree-based inbreeding coefficients (F PED ) were calculated for 155 RVA and 65 RDN genotyped individuals with 3,375 and 2,204 available records of ancestors, respectively. F PED was calculated following the algorithm of Meuwissen and Luo [39] as implemented in PEDIG [40]. Average pedigree completeness at the fifth parental generation was 88% and 89% for the 155 and 65 individuals, respectively. Finally, F ROH, F HOM and F PED were compared using linear regression and Pearson's correlation coefficient (r p ).

Population structure
To characterise the population structure, LD pruning was performed on the quality filtered data using the "-indep 50 5 2" PLINK [29] command. Only 19,971 SNPs remained for this analysis. In the first step, PLINK [29] was used to generate genome-wide pairwise identity-bystate (IBS) distances between individuals as well as eigenvectors on which basis the relationships between breeds were studied using PCA. Secondly, the programme ADMIXTURE [25], which uses a model-based clustering algorithm, was used to determine the optimal number of k clusters (i.e. the most likely number of contributing populations), and to describe individual ancestry in terms of these clusters. Using the "cv-flag" of the programme, a 20-fold cross-validation procedure was performed for a range of k between 1 and 40, and the k with the lowest cross-validation error was considered as the optimal number of clusters for the data. Cluster assignments ranging from k = 2 to k = 20 were graphically presented using POPHELPER [41]. Thirdly, a NetView analysis [26] which uses an unsupervised clustering method known as Super Paramagnetic Clustering (Spc) was applied to detect fine-scale population structures based on genetic distances. Network construction and the clustering of individuals were implemented using the R version of NetView [26,27]. The programme requires a specification of the maximum number of nearest neighbours (K-NN) an individual can have. K-NN was set to 10, 35 and 100 to investigate the characteristics of both fine-and large-scale genetic structures. Following clustering, population networks were visualised using Cytoscape v3 [42]. Finally, admixture results at the optimal k clusters were included in the network construction such that node colour represents individual level of admixture at the selected number of clusters. Table 2 summarises genetic diversity parameters including relationship coefficients, heterozygosity and effective population size (estimates). Average genomic relationship coefficients (Ø gRel) were 0.028 in RH, 0.031 in RVA and 0.033 in RDN. Estimates of H o were high in all breeds and reached 0.374 for RVA. Average LD decreased along growing inter-marker distances, with a slight variation across breeds (Fig 1). RVA (green) displayed the fastest LD decay while RH (red) showed the slowest. The estimate of Ne 5Gen was largest in RVA (113) and smallest in RDN (67). For all breeds, N e declined over time (Fig 2A). RVA maintained the largest N e across all generations. For several generations, N e trend in RDN was slightly above that in RH until about 30 generation ago when the trajectories became the same and the opposite happened then after ( Fig 2B).

Runs of homozygosity and inbreeding
The number and length of ROH segments differed considerably among animals and across breeds (Table 3). ROH segments were identified in all but 14 RVA individuals. Averaging the total number of ROH segments over the number of animals with identified ROH, RH had the highest number of ROH segments (10.25), whilst RVA had the lowest number (4.61). RH and RDN had greater proportion of their autosome, 0.53 (132.51 Mb) and 0.45 (112.91 Mb), respectively, covered with ROH. The proportion of autosomal genome length covered by ROH in RVA was 0.23 and comparatively small. Variation existed in the distribution of the various ROH length classes, but a common pattern was followed across breeds (Fig 3). The majority of ROH segments (~47 to 49%) were found in the length class >5 to 10 Mb for all breeds. Within the second highest represented length class, >10 to 15 Mb, the proportion of ROH was highest in RVA (28%) and about the same in RDN and RH (~24%). Longer ROH segments (>30 Mb) were very few (~5%) but the proportion was even smaller for length class 1 to 5 Mb. Regardless of the minimum length of ROH threshold or method of computation, inbreeding was generally lower in RVA, intermediate in RDN and highest in RH (Table 4). When the minimum length of ROH was set to 4 Mb, average F ROH varied from 2.1% (RVA) to 5.3% (RH). The estimates varied from 1.8% to 4.4% and 0.9% to 2.3% for minimum length setting of 8 and 16 Mb, respectively. F ROH within breed also varied (see standard deviations) with values reaching over 11% in some RDN and RH individuals. Overall, F ROH decreased with increased minimum ROH length threshold. Negative values were obtained for F HOM in most individuals. Estimation of F PED was only possible for RVA and RDN since pedigree records were not available for the RH genotypes. The completeness of pedigree in RVA and RDN are provided in S1 Fig.
The pedigree-based average inbreeding coefficients estimated were generally lower than those computed from ROH. In Fig 4, the linear regression plots of F ROH on F PED (left) and F ROH on F HOM (right) for RVA and RDN are provided for the different ROH length thresholds (A, B and C). Within and across breeds, moderate to high correlations significantly different from 0 (p-value < 2.2e-16) were found for all comparisons. Between F ROH and F PED, r p was strongest for the 4 Mb threshold (0.701, R 2 = 0.491), followed by 8 Mb (0.690, R 2 = 0.476) and then 16 Mb (0.620, R 2 = 0.385). The correlations between the two genomic inbreeding measures (i.e. F ROH vs. F HOM ) were slightly stronger and followed the afore-mentioned pattern, thus 4 Mb (0.752, R 2 = 0.565), followed by 8 Mb (0.740, R 2 = 0.548) and then 16 Mb (0.660, R 2 = 0.435). An investigation of the relationship between F HOM and F PED showed a comparatively weaker correlation (0.511, R 2 = 0.261) between the two inbreeding measures, although significant at p-value = 5.246e-16 (S2 Fig). The intercept of the regression of F ROH on F PED was high (0.011) at the minimum ROH length threshold of 4 Mb and close to 0 (0.002) when the minimum length was increased to 16 Mb.

Population structure
The first principal component (PC1) explained 21.2% while the second (PC2), third (PC3) and fourth (PC4) explained 17.9%, 13.4% and 9.2%, respectively of total variation in the data ( Fig  5). Contrasting PC1 vs. PC2, the West African breed (BAO) was clearly separated from the rest     African breed (BAO) from the European ones (predominantly red background). As the value of k increased, BAO maintained its unique ancestry while the European breeds differentiated mostly based on geographical origin. Consistent with the findings of PC1 vs. PC2, the admixture results at k = 5, showed a near identical genetic background for JER and GNS (steel blue) and for ANG and RGU (green). Additionally, the Northern breeds (RVA, RH, HOL, RDN and NRC) showed varying degrees of admixture (predominantly red) in a similar fashion as the Swiss and French breeds (predominantly pink). At k = 9, the French breeds (magenta) differentiated almost entirely from BSW; GNS (yellow) differentiated from JER; and RDN (blue) and NRC (forest green) showed higher degree of distinction from the rest of the Northern breeds. Here, NRC shared more of its ancestry with RVA while genetic background of RH and HOL was still almost identical. At k = 13, ABO and CHA differentiated from MON and some degree of dissimilarity of ancestor fractions was observed between RH and HOL for the first time. At the optimal number of k clusters (k = 19), all three French breeds were effectively differentiated although specks of admixture among them was evident. At this cluster level, there was persistency of the varying levels of admixture among the Northern breeds. RVA showed the highest level of admixture with RH and HOL and besides, these breeds, especially RVA and RH had highly mixed genetic background.
The network visualization analysis provided a fine-tuned resolution of connectedness within and between breeds. In Fig 7, network graphics are presented for K-NN = 10 (A) and K-NN = 100 (B). At a K-NN value of 10, there exists a massive interconnection of networks between the Northern European breeds (HOL, RH, RVA and NRC). This, however, excluded RDN that formed a separate and single cluster. Samples of the ANG and RGU were also highly connected although a displacement of four ANG individual was observed. All the remaining breeds formed separate clusters and for ABO, BAO, GNS and NRC at least one individual was displaced from the breed cluster. Using a K-NN value of 100, the Channel Island breeds formed a single cluster that loosely connects to the rest of the European breeds. Consistent with both the PCA and Admixture results, there was no connection between BAO and the European breeds even at this high K-NN value. The use of an intermediate number of nearest neighbours (i.e. K-NN = 35) resulted in the formation of clusters and networks completely based of geographical origin (Fig 8). Notably, RDN was only loosely connected to the rest of the Northern breeds. By including the results of admixture analysis at the optimal number of k = 19 clusters, the proportions of genetic background of each sample (illustrated with node colours), alongside the sample's connectedness with other genotypes are displayed (in Box). Generally, samples within a cluster had similar ancestry and for most breeds this tend to be uniform overall, while those from RVA and RH were most diverse.

Discussion
Knowledge of the genetic diversity within and between breeds is essential for management decisions that aim at breed preservation. This is especially true for local breeds that are often small in numbers due to deliberate use of high yielding breeds in most commercial settings. The primary objective of this study was to assess whole-genome diversity of two local cattle (RVA and RDN) and a high yielding breed (RH) and subsequently, investigate patterns of admixture of these breeds applying different methods. Our study represents the first investigation of heterozygosity, molecular based N e and ROH in RVA and RDN. Besides, admixture among the three breeds has never been reported.

Genetic diversity and demographic changes
Average genomic relationship within breed was similar (0.031 vs. 0.033) for RVA and RDN and slightly lower (0.028) for RH. These estimates are much lower than the estimates reported by Signer-Hasler et al. [17] (0.044 to 0.155) for several cattle breeds. Among the breeds in the previous study [17], average genomic relationship for RH was 0.060. The sampling of individuals for a study can influence the overall genomic relationship among chosen animals. The data used in our study were already available and to mimic a randomly sampled population, a cap Genome-wide diversity study of three German cattle of 0.3 was put on the genomic relationship between any pair of individuals within a breed, i.e., following Burren et al. [22]. This could partly explain the observance of lower estimates of average genomic relationship in our study. Thus, the sampling of whole families for instance, could result in an overestimation of relatedness within populations. Observed heterozygosity has been used as a measure of within breed genetic diversity in several studies [15][16][17] in which analyses were essentially based on the Illumina BovineSNP50 BeadChip. Estimates of H o in this study (0.356 to 0.374) fall within the range of values found in the previous studies: 0.357 to 0.399 in nine Swiss dairy cattle populations [17], 0.297 to 0.358 in thirty-two Italian cattle breeds [15] and 0.026 to 0.33 in fifty-three cattle breeds from around the world [16]. In the latter study [16], very low genetic diversity estimates (below 0.20) were found for some breeds, especially those from outside Europe. The authors attributed the differences in estimates to population demographic dynamics and to ascertainment bias [31,32] that tends to favour breeds that were included in the discovery of the BovineSNP50 chip assay. Similar to most European breeds, all the breeds in our study had a high estimate of heterozygosity. The H o estimates are consistent with estimates of N e in most recent generations. From our results, RVA maintained the highest genetic diversity among the breeds with an estimate of Ne 5Gen above 100. Smaller N e estimates in RDN (67) and RH (86) may suggest limited gene flow from other breeds during the developmental history of these breeds. Trends in N e however, revealed larger estimates for all breeds in earlier generations. Reduction in N e across generations in this study suggests the occurrence of a past genetic bottleneck in all breeds and this is consistent with findings of many other studies [3,15,43,44]. The 2018 annual report on cattle production in Germany provided census numbers including 595,929 (RH), 110,000 (RDN) and 30,411 (RVA) [45], however, the number of herdbook animals is much lower. Thus, the census numbers can be deceptive, as they do not accurately inform of the genetic diversity present in the breeds. RVA has a smaller census number; nevertheless, it maintained the highest genetic diversity probably due to high levels of introgression activity. This is to some extent supported by the observed trend in LD decay. RVA showed a faster LD decay indicative of high within breed diversity. In RDN and RH, decay was slower probably due to the presence of long haplotypes. Therefore, average inbreeding is expected to be higher in these latter breeds.

Usefulness of ROH inbreeding
Wright's [46] pedigree inbreeding concept has a long history of widespread and routine use in genetics. Pedigree inbreeding estimation assumes a base population in which individuals are unrelated and non-inbred and it has the tendency to underestimate realised relationships. Nowadays, the increased availability of cost effective genomic data in the form of SNPs has made it possible to develop methods for the estimation of genomic inbreeding. In a previous study, F ROH and F HOM were found to be more precise and often less biased than F PED [47]. Among the genomic estimators, F ROH presents some advantages that can make it the method of choice. For instance, ROH length can be used to distinguish between recent and ancient inbreeding. Thus, long ROH are indicative of recent inbreeding whereas short ROH predict inbreeding of ancient origin [38,47]. F ROH is also not affected by allele frequencies [48] and it Genome-wide diversity study of three German cattle only takes values between zero (0) and one (1). Consequently, it provides the best premise for comparison with F PED . ROH detection parameters in this study were somewhat adapted to the thresholds used by Purfield et al. [49] for the Bovine SNP50 panel in one of the most extensive studies of ROH in cattle. For all our breeds, the majority of ROH were observed in the length class >5 to 10 Mb. This is consistent with the finding of Signer-Hasler et al. [17] but contrary to other findings [49]. In the latter, Purfield et al. [49] found ROH of length class 1 to 5 Mb to be more frequent in using the high density SNP panel (777,972 SNPs). Our observation of very few ROHs (less than 4%) in the length class 1 to 5 Mb may be an inherent property of the Illumina BovineSNP50 BeadChip. Validating the panel's reliability to infer ROH that had been previously detected with greater accuracy using HD array, Purfield et al. [49] reported a failure of the reduced panel to detect about 27% of ROH in the length class 1 to 5 Mb. However, the proportion of ROH in this length class reported by Signer-Hasler et al. [17] was still high (about 18 to 35%) though the overall distribution was similar to what we found. In fact, all ROH detected in our study were above 4 Mb even when no restriction was put on the minimum ROH length. Hence, it is not surprising to have the smallest proportion of ROH in the length class 1 to 5 Mb. Undoubtedly, old inbreeding contributed to overall inbreeding in our breeds, however, a substantial proportion of this contribution arose from ancestors in generations not too distant in time, probably around 12 generations ago under the assumption that 1 cM = 1 Mb. Thus, inbreeding coefficients calculated from ROH with minimum length higher than 4, 8 and 16 Mb are expected to correspond to the reference ancestral population being remote approximately 12, 6 and 3 generations, respectively [37]. This actually forms the basis for calculating F ROH>4 , F ROH>8 and F ROH>16 in this study. The estimated F ROH>16 showed that recent inbreeding is more a problem in RDN and RH than in RVA. This is not different for the other minimum ROH length measures. These finding are consistent with the H o and N e estimates reported. Red Holstein is a high yielding breed whose performance is next to the Black-and-white Holstein in the region (Schleswig-Holstein). Average milk yield of the breed recorded in 2017 was 8,208 kg compared to that of RVA (7,638 kg) and RDN (6,738 kg) [45]. Use of reproductive techniques such as sexed semen in first inseminations is more intense and has a much longer history in RH [45]. Such techniques promote wide spread use of few animals with the consequence of increased inbreeding levels.
Negative average values of F HOM were found for all our breeds and this was not surprising since the estimation procedure made use of the current sample of individual as reference [50]. Purcell et al. [13] warned that strongly negative values of this parameter could stem from sample contamination events. However, Wang [50] explained that such negative values can be legitimate and must not be interpreted as probabilities, rather, the expected relative deficit of the occurrences of homologous genes that are identical in state within individuals due to the relative deficit of shared ancestry. Inbreeding estimates based on the pedigree method were found to be generally lower than those from F ROH and this is also the case for many other studies in cattle [14,17,48]. Similar to our finding, the magnitude of the difference between F ROH and F PED was about a double in Zhang et al.'s study [48] when a reduced SNP panel was used and even larger in using sequenced data. In Signer-Hasler et al.'s study [17] however, only a slight difference was observed between the two inbreeding estimators, for instance, estimates of 0.045 and 0.042 were obtained for F ROH and F PED , respectively, in RH. Lower estimates of F PED can be the results of incomplete and shallow-depth pedigree. In the current study, average pedigree completeness at the fifth parental generation was 88% for RVA and 89% for RDN. At that same parental generation in Signer-Hasler et al.'s study [17], pedigree completeness was above 99% for all breeds whose F PED estimates did not deviate much from the estimates of F ROH. It is not surprising that the authors found an extremely large difference between the two estimates for the breed (Evolèner) that had a pedigree completeness of 36%.
Nevertheless, our regression analysis revealed a linear and positive relationship between all the inbreeding estimators. F ROH>4 tend to correlate highly with F PED and F HOM than did either F ROH>8 or F ROH>16 . Notice is to be made that F ROH>4 is equivalent to a no restriction on minimum ROH length (i.e. F ROH>4 = F ROH ) in this study. Our correlation estimates (0.701, F ROH , F PED ; 0.752, F ROH , F HOM and 0.511, F HOM , F PED ) were in the range of estimates reported in previous studies [14,17,49,51,52] for cattle. Signer-Hasler et al. [17] reported values of 0.701 (F ROH , F PED ) and 0.677 (F HOM , F PED ) in using pedigrees with average completeness of at least 95%. In humans genetic studies, a higher value (r p = 0.86) was reported between F ROH and F PED when the minimum length of ROH was restricted to 1.5 Mb [28]. In Swiss goat breeds, Burren et al. [22] also reported correlation coefficient of 0.808 between F ROH and F PED from the use of pedigree information subjected to routine parentage testing. Meanwhile, the intercept of the regression line revealed a real advantage of F ROH over F PED . When F ROH>16 was regressed on F PED , the intercept was close to zero (0.002). Regressing F ROH>4 and F ROH>8 on F PED on the other hand resulted in higher estimates of intercept (i.e. 0.011 and 0.008, respectively). Recalling that shorter ROH predict ancient inbreeding, F ROH>4 captured information on past inbreeding that was neither detected by F ROH>16 nor by F PED . Similar findings were reported for analyses involving several cattle breeds [49] and also in human genetic studies [28].

Population structure
Based on 19,971 autosomal SNPs, all the three measures of population structure analysis effectively separated the 14 breeds based on their geographical origin and this is important in validating the results of our study. Further differentiation as was observed for 1) JER and GNS, 2) MON, CHA and ABO, 3) NRC and HOL and 4) ANG and RGU at higher k values of admixture are consistent with findings of a previous study [1] from which outgroup data were obtained. One of the aims of the current study was to investigate admixture and gene flow events amongst RVA, RH and RDN. Our study provided genomic evidence of common genetic background between these breeds. Variation, however, existed in the degree of ancestor sharing amongst the breeds. RVA hugely resemble RH at the optimal level of k (Fig 6) and it harboured a fraction of the unique background (green-yellow) from the Norwegian breed. The latter suggests an introgression of RVA with NRC genetics and although this is also true for both PCA and NetView findings, no such report exist to our knowledge. There has been the mentioning of introgressive hybridization to improve the performance of RVA [7] and RDN [8] in previous studies. RVA is known to have been crossed with Danish Red cattle to improve body size and milk yield around 1928 and with Red Holstein, Swedish and Finnish cattle after the 1960s [53]. According to Bennewitz and Meuwissen [7], the original RVA breed is close to final extinction. In RDN, Andresen et al. [8] mentioned the harbouring of up to 25% RH genes in the breed's pedigree. Compared to RVA, the proximity of RDN to RH seemed quite distant (Figs 5 and 7). However, even at high k values of the admixture analysis, low levels of introgression from RH could be detected in RDN. Apparently, studies that investigate relatedness between our breeds are lacking. The one study available [10], reported a separation (though somewhat close) of RVA and RH into different clusters using principal component analysis. In the same study, however, a high kinship coefficient value was found between the two breeds.
The NetView results supported the Admixture and PCA findings and provided deeper insight into the interconnectedness within and between breeds. The choice of K-NN value is an important factor that influences the topological structure of network graphs [12]. A K-NN value of 10 was suggested and successfully used in previous studies [24,26,27,54]. Usually, larger values of K-NN aid the investigation of admixture between populations whereas smaller values of the parameter investigates fine-scale genetic substructure. Thus, a K-NN value of 10 has been shown to be effective in detecting both large-and fine-scale genetic structures [42]. In the current study, the use of K-NN value of 10 separated most of the breeds into individual breed clusters except those that had high level of resemblance ( Fig 7A). Additionally, individuals that tended to show some level of dissimilarity to the breed group were distinguished. K-NN = 35 (Fig 8) elucidated regional differences while K = 100 (Fig 7B) provided insight into continental breed differences. By including the results of Admixture at the optimal level of k in the network visualisation analysis, the level of admixture as indicated by node colour (Fig 8,  Box) was highest in RVA and RH. This corroborates the admixture analysis at k = 19. For RVA, a highly mixed genetic background may reflects an artificial increase in heterozygosity consistent with the breed's history and large effective population size.

Conclusion
In this study, all three measures of population structure analysis confirmed high levels of introgressive gene flow from RH to RVA. These breeds also showed highly heterogeneous genetic background. The study for the first time uncovered the presence of Norwegian Red cattle genes in RVA suggestive of past gene flow event between the two breeds. Introgression of RDN with RH genes was comparatively limited and detectable only through Admixture analysis. Unlike the use of herdbook number of male and female animals for the determination of endangerment status as in the case of our breeds of interest, genomic measures of diversity provide a powerful premise beyond the limitations of pedigree analysis on which basis conservation decisions can be made. The study emphasised the advantages of ROH inbreeding over pedigree inbreeding and showed the highest level of genetic diversity in RVA. Evidently, the inbreeding levels in our breeds were low and did not raise alarming concerns about the within breed genetic diversity. However, the constantly decreasing effective population size across generations should inform decisions on conservation strategies for a proper management of the breeds.
Supporting information S1 Table. Description of the number of informative SNPs, length covered by SNPs, average, minimum and maximum distances and average r 2 (linkage disequilibrium) between adjacent markers of the 29 autosomes.