A genetically unique Chinese cattle population shows evidence of common ancestry with wild species when analysed with a reduced ascertainment bias SNP panel

In Hong Kong, there is a cattle population of ~1,200 individuals of uncertain origin and genetic diversity. This population shows heterogeneous morphology, both in body type and pigmentation. Once used as draught animals by the local farmers, they were abandoned around the 1970s due to changes in the economy, and since then have lived as feral populations. To explore the origins of these cattle, we analysed ~50k genotype data of 21 Hong Kong feral cattle, along with data from 703 individuals of 36 cattle populations of European, African taurine, and Asian origin, the wild x domestic hybrid gayal, plus two wild bovine species, gaur and banteng. To reduce the effect of ascertainment bias ~4k loci that are polymorphic in the two wild species were selected for further analysis. The stringent SNP selection we applied resulted in increased heterozygosity across all populations studies, compared with the full panel of SNP, thus reducing the impact of ascertainment bias and facilitating the comparison of divergent breeds of cattle. Our results showed that Hong Kong feral cattle have relatively high levels of genetic distinctiveness, possibly due to the low level of artificial selection, and a likely common ancestry with wild species. We found signs of a putative taurine introgression, probably dating to the import of north European breeds during the British colonialism of Hong Kong. We showed that Hong Kong feral cattle, are distinct from Bos taurus and Bos indicus breeds. Our results highlight the distinctiveness of Hong Kong feral cattle and stress the conservation value of this indigenous breed that is likely to harbour adaptive genetic variation, which is a fundamental livestock resource in the face of climate change and diversifying market demands.

Introduction Livestock domestication started~12,000 years ago (YA) and marked the most significant transition in human history, providing nutrients, traction, fertilizer, leather, fuel and provisions. To date, more than 8,800 breeds of livestock have been reported. However, the actual number of extant breeds is likely to be larger as a significant proportion of indigenous livestock populations, which are present in the developing world, have yet to be described at genotypic and phenotypic level [1,2]. Indigenous domestic populations (often referred to as native breeds) are typically unmanaged genetically, or managed through traditional husbandry. These native breed generally show high levels of phenotypic variation [3][4][5][6][7] and are better adapted to local environments than specialised dairy and beef breeds, which are mostly of European taurine origin [8][9][10]. Indigenous populations have been reported for the major livestock species, including sheep, goat, pig, and cattle [7,[11][12][13].
Cattle are among the most important livestock species worldwide, with almost~1.5 billion individuals in 2012, of which almost a quarter were present in India and China [2]. Two main cattle domestication events have been described, the first~10,000 YA in the Fertile Crescent, which gave rise to taurine cattle (Bos taurus), and the second~2,000 years later in the Indus Valley, giving rise to indicine cattle (Bos indicus) [14,15]. After domestication, cattle were spread across the world following human migration and trade. In particular, taurine cattle spread into Europe, Africa, and Central and North-East Asia, following agriculture expansion [16]. Indicine cattle moved into South East Asia~3,000 YA and Africa~4,000 YA, where they were crossed with local taurine cattle to create Sanga populations [16][17][18]. Introgression between taurine and indicine genomes was also observed in Central Asia, along the fringe of colonization of B. indicus in the South and B. taurus in the North, and in Southern Europe where genomic analyses have identified components of both African taurine and indicine ancestry in several breeds [19,20].
A cattle population of over 1,200 head exists in the proximities of Hong-Kong and Southern China [21], but little is known of its ancestry or genetic diversity, with only anecdotal reports suggesting a relationship with Indonesian wild cattle (S1 Text). Prior to 1970s cattle were widely used by local farmers as draught animals, but as the economy shifted towards service industries the cattle were abandoned. Currently, the Hong Kong cattle survive as feral populations in the less urbanised areas of the city and are characterised by substantial phenotypic heterogeneity in terms of horn shape and orientation, coat colour, hump size and shape, tail length, and conformation (Fig 1).
To date no genetic assessment has been undertaken of Hong Kong feral cattle. Here we used genomic tools to investigate the genetic diversity of Hong Kong feral cattle (HKF) and compared them with taurine, indicine, and composite cattle breeds.

Samples and genotyping
Hair samples of 21 HKF individuals of both sexes were collected in the Chong Hing in Sai Kung Country Park and the Ta Kwu Ling Government Farm. The procedure followed the specific conditions of a licence granted under the Animals (Control of Experiments) Ordinance Chapter 340 the Laws of Hong Kong (Licence Nos (16-47 to 50) in DH/HA&P/8/2/5 Pt 5) and with City University of Hong Kong Ethical Committee approval. Animals sampled are part of population of approximately 1,200 head of feral cattle located mainly on Lantau Island, Sai Kung and the East and North New Territories of Hong Kong [21]. DNA was extracted from hair follicles using a commercial kit (PureLink™ Genomic DNA Mini Kit from ThermoFisher) according to the manufacturer's instructions and was genotyped with the Illumina Bovi-neSNP50 v2 BeadChip array. For comparison, we collected genotypes of 703 individuals from 36 cattle breeds, including ten European taurine and two African taurine, two African Sanga, four Asian indicine breeds, 15 Asian local breeds including one Indonesian and 14 cattle breeds from Central and Southern China, two Asian wild Bos species: gaur (Bos gaurus) and banteng (Bos javanicus), and the semi-wild gayal (Bos frontalis), obtained from public repositories [19,[22][23][24][25].
To ensure a high-quality genotype dataset, only loci with less than 10% missing data and minor allele frequency (MAF) higher than 1% were used. Loci with unknown map position or located on the sex chromosomes were also removed. Genotype data filtering was performed using Plink v1.9 [26]. As the Illumina 50k SNP panel was selected from genome sequences of taurine origin the SNP are predominantly those commonly found in taurine breeds [27,28]. Such a bias is proportional to the degree of divergence between the discovery and the study populations [29] and affects single locus statistics more than multilocus or haplotype-dependent analyses [29][30][31]. To reduce the impact of ascertainment bias in this study we used the following approach. First those SNPs which are polymorphic in banteng or gaur were selected. We did not use the hybrid gayal, which is a gaur x zebu hybrid [32]. As the divergence time between taurine cattle and banteng with gaur is 2.6 and 3 million years, respectively [33], polymorphisms shared between either of these two wild species and domestic cattle are likely to be ancestral rather than new variations that occurred after species divergence. Then, we pruned the SNPs with high values of linkage disequilibrium (LD) from the ancestral SNP set, which has been shown to reduce the impact of ascertainment bias [31]. LD pruning was performed using Plink (--indep-pairwise 2000kb 10 0.2).

Genetic diversity and population structure
Heterozygosity (H o ) and inbreeding (F) coefficients were computed using the R-Bioconductor package snpStats v1.30.0 [34]. To evaluate the ascertainment bias reduction, 1) the same genetic diversity analyses were performed using the whole dataset and the set of ancestral polymorphic loci identified in the wild cattle species, 2) the non-random effect of the ancestral SNPs on H o was assessed through resampling by permutation. In brief, we sampled without replacement the same number of SNPs as our ancestral SNP set and computed the heterozygosity at the population level; this algorithm was run 1,000 times and the distribution of H o obtained for each population was compared with the same index computed using the full and ancestral SNP set. A p-value was calculated for each H o result obtained using the ancestral panel following Davison and Hinkley [35]: pval = (1 + r)/(1 + n), where r is the number of permutations that produced an H o value greater than or equal to that calculated for the ancestral panel and n is the total number of permutations.
Unsupervised hierarchical clustering analysis of population structure was performed using ADMIXTURE v1.23 [36]. The number of clusters computed ranged K values from 2 to 10. Supervised clustering analyses were carried out using ADMIXTURE with prior population information for European taurine (ANG, HOL, HFD, both as a meta-group and separating each breed), African taurine (MUT and non-admixed NDA individuals), indicine (GIR, THA, LOH), and banteng, gaur and gayal. Admixture plots were generated using AIDmixture v0.1 (https://github.com/barbatom/AIDmixture). To investigate the ordinal relationships between populations and individuals, model-free clustering was performed using principal component analysis (PCA) as implemented in Plink; a PCA on the HKF population alone was also performed. A Neighbour-net graph using Reynolds' distances, calculated with a custom script, was generated using SplitsTree v4.13.1 [37].
The occurrence of gene flow was further investigated using Treemix v1.12 [38]. This software models the relationship among the sample populations and their ancestral population using genome-wide allele frequency data and a Gaussian approximation of genetic drift [38]. First, we generated maximum-likelihood-based phylogenetic tree of all cattle populations, and iteratively, we added one migration edge to the previously generated graph with "m" migration edge [20]. We rooted the graphs using banteng as outgroup. Up to 20 possible gene flow vertices were computed and the proportion of information added by each 'm' was assessed using the f statistics as implemented in Treemix. All graphs were generated using the statistical software R [39]. All data generated during this study are available as supplementary material (S1 Dataset).

Results
After pruning for missing data, MAF and unmapped variants 31,482 autosomal SNPs were retained, of these 3,812 SNPs were polymorphic in either banteng or gaur, and were selected as the ancestral SNP dataset. Using this ancestral SNP dataset H o in the HKF was 0.268. Among the European taurine breeds, H o ranged from 0.305 to 0.345, with Hereford and Piedmontese having the lowest and highest values, respectively (Table 1). Heterozygosity for indicine breeds ranged from 0.227 in Tharparkar to 0.264 in Nellore (Table 1) HKF had an inbreeding value F = 0.350, which was comparable with indicine breeds F values ranging from 0.340 (Nellore) to 0.430 (Tharparkar). Most of the European taurine had lower F values, with Hereford having the highest and Piedmontese the lowest (0.240 and 0.140 respectively; Table 1). Asian local populations showed great variation in inbreeding values, We compared the genetic diversity results obtained in the ancestral dataset (~4k SNPs) with those obtained using the whole dataset (~32k SNPs) to evaluate the reduction in ascertainment bias. When the ancestral dataset was used, H o values for each cattle type were higher than those obtained with the full dataset (see S1 Table). The HKF, indicine, local Chinese and wild populations showed the greatest increase in heterozygosity with respect to the full dataset, with an increase in median H o values three-fold higher than that obtained for taurine-breeds, whereas median H o for Sanga and Asian local breeds increased in two-fold (S1 Table and Table).
Admixture analysis at K = 2 discriminated Asian ancestry (green in Fig 2), which was shared by indicine populations, Asian local cattle, the two Asian wild species, and European-African populations with taurine ancestry (brown in Fig 2).
At K = 2 Sanga, Asian local breeds, and HKF showed mixed proportions of Asian and taurine ancestral components. Among the Asian local breeds, those from north China had <30% Asian ancestry and the other Asian breeds showed >60%. Southern European breeds such as Chianina, Romagnola and Marchigiana, and the Yanbian from north-west China had around 10% Asian ancestry, whereas the two African taurine breeds (Muturu and N'Dama) were predominantly European-African, but showed some of the Asian component (Fig 2). At K = 3 the wild and the indicine populations were discriminated, whereas K = 4 separated the European and African taurine ancestry. At K = 4 HKF had~30% of wild Asian ancestry, whereas the remaining~70% was assigned to a combination of European taurine and Asian indicine ancestry (~60 and~30%, respectively; Fig 2). A small amount of the cluster component characterising the wild Asian populations were present in the north European taurine breeds, specifically Hereford, Angus, and Holstein, and <10% of those characterising indicine and taurine breeds were found in banteng. Both Sanga breeds showed comparable proportions of African taurine and indicine ancestry; however, some European taurine ancestry was observed, particularly in Nganda. The African taurine N'Dama was predominantly African taurine but the 11 animals showed differing levels of indicine ancestry. African taurine ancestry was also seen in the majority of European breeds, particularly those from Southern Europe. At K = 5 gayal and gaur separated as a distinct group, with only a small amount of this ancestry present in Hereford, and to a lesser extent in Angus and Holstein. At K = 7 Asian local breeds (excluding Yabian) shared a genetic component (purple in Fig  2) which mostly characterised the south-east China population Wannan and Wenling (>90% ;  Fig 2). At higher K values HKF remained as a unique cluster (Fig 2), although a small proportion of the cluster component (<10%) is present at K = 8 in the north European breeds Holstein, Angus and Hereford, and at K = 10 this component is found exclusively in Holstein. Supervised admixture analysis assigned~60% European taurine,~20% indicine,~10% gaur and~10% banteng ancestry to HKF. The same analysis performed with each European taurine population as a separate prior population assigned~100% Holstein ancestry to HKF (S4 Fig). The first principal component (PC) of the PCA accounted for 12% of the variance and discriminated taurine and indicine origins (Fig 3, x-axis), mirroring Admixture results from K = 2. The second PC accounted for 6.8% of the variance and reflected Admixture results for K = 3, discriminating wild and HKF from other populations (Fig 3, y-axis). The first two PC combined identified six distinct clusters corresponding to the main cattle types analysed in this work (Fig 3, Table 1). HKF formed a distinct tight cluster in an intermediate position between indicine and taurine populations for the first PC and between the wild animals and the domestic cattle for the second PC.
The Neighbour-net analysis of pairwise Reynolds' distances clearly discriminated among cattle types and geographic origin, with Asian indicine and European taurine breeds at opposite  Table 1. sides of the network (Fig 4), with the African breeds positioned in between. The pure African taurine breeds were closer to the European taurine branch and the crossbred Sanga were closer to the Asian indicine branch. The Asian local populations were split, such that the branch with the Northern breeds was closer to the European taurine group, and the remaining populations were located closer to the Asian indicine cluster. In particular, the Aceh, Dehong and Banna breeds showed more connections with the Asian indicine than the taurine group. The wild species and HKF clustered in a distinct branch, confirming the distinctiveness of HKF compared with other cattle, and closer genetic relationship of this population with the wild species.
A maximum likelihood assessment of population history with overlaid gene flow was performed using Treemix (Fig 5). From the f statistics the first migration edge was found to be the most informative, and suggested gene flow from the node representing the Asian indicine breeds to the node representing the Sanga breeds. The second-less informative-migration edge suggested gene flow from the Holstein to HKF (Fig 5).

Discussion
We used genome-wide genotypic data to assess the genetic diversity and structure of a feral cattle population from the Hong Kong area, and compared it with breeds representative of  Table 1. worldwide cattle diversity, along with 15 local breeds and two wild Bos species from southern Asia. Our results showed that HKF had higher levels of genetic diversity compared with local cattle breeds from south Asia and most likely common ancestry or introgression from wild cattle.

Reduction of ascertainment bias
The SNPs selected to develop the BovineSNP50 BeadChip were identified in taurine cattle breeds and it is likely that they underestimate diversity in non-taurine populations [27,40]. To reduce this ascertainment bias we selected ancestral polymorphisms that are present in banteng and gaur as well as cattle, and pruned the SNPs in high LD. Of the initial 48k SNP about 12% passed the initial quality filters leaving 32k high quality cattle SNP; of these~4k SNP were polymorphic in the wild species and were not in LD (S5 Fig). A similar proportion (~10%) 50k SNP on the Illumina array was found to be polymorphic in bison, yak, or banteng [33]. The ancestral loci are less likely to show population bias, which is reflected in the higher heterozygosity of the reduced set of SNP in the non-taurine populations compared with that observed  Table 1.
https://doi.org/10.1371/journal.pone.0231162.g004 using the full dataset (S1 Table and S2 Fig) [31,40]. Using ancestral SNPs and pruning for LD has been shown to significantly reduce ascertainment bias in other studies [31], although finescale patterns of diversity among closely related breeds may be missed [31,41]. The data presented here, for the full SNP set, showed a decrease in heterozygosity proportional to the phylogenetic distance between the cattle breeds used for SNP discovery and the target population, whereas the reduced set significantly reduced this bias across all the populations studied (S1 Table and S2 and S6 Figs). As expected, when we tested the ability of the ancestral SNP panel to reduce ascertainment bias via resampling by permutation, the distance from the H o computed using the ancestral set and the mean the H o distribution for each bread was proportional to the phylogenetic distance between that breed and the taurine breeds used for SNP discovery (S3 Fig). Noticeably, some of the taurine breeds in our dataset, which were among those used in the 50k SNP chip discovery panel, recorded the highest p-values (S2 Table), confirming that the strategy implemented here removed those SNPs having the largest ascertainment bias impact. Importantly, the H o values of the reference breeds obtained using our SNP selection approach were comparable with those of the Illumina BovineHD array, which is still today considered the most robust SNP array for cattle [24,42]. Further, the population structure analyses performed using the reduced dataset provided results in line with the know relationship among the reference populations in our dataset, whereas the same analysis performed using the full SNP was distorted by the percentage of taurine ancestry in a given breed (S6 Fig).

Diversity and structure
We observed high levels of genetic variation in HKF, with values of heterozygosity comparable or higher than those seen for indicine cattle and the local breed from southern Asia, and only slightly lower than those of African and European taurine ( Table 1).
Hybridisation between taurine and indicine breeds could have contributed to HKF diversity. Both taurine and indicine ancestry have contributed to Chinese cattle genetics, with Northern Chinese breeds having a higher taurine component than Southern breeds which have a greater indicine component (Figs 2-4) [22,43,44]. This is also reflected in mitochondrial data, [22,[45][46][47]. The remarkable observation in the present study is that HKF are genetically distinct from pure taurine and indicine breeds and from indicus x taurine crossbred populations. We found no evidence of gene flow from any of the Asian local breeds to the HKF, despite the geographical proximity. The genetic divergence of the HKF from other cattle may be due to drift, resulting from a severe bottleneck or prolonged isolation, which would increase genetic distance from the original population. This type of founder effect has been identified in Chinese taurine cattle and a bottleneck in the indicine populations in the south-eastern part of China [22]. Another contributing factor could be geographical isolation. The Qinling Mountains traverse the Shaanxi province from east to west, creating a natural barrier between North and South-China which have restricted the expansion of taurine cattle southward and the spread of indicine northwards [46], and potentially isolated the Hong Kong cattle. However, we did not find any shared genomic ancestry between HKF and any of the Asian local populations.
Admixture analysis at K = 2-4, PCA, and Neighbour-net analyses identified the presence of wild, in addition to indicine and European taurine ancestry in the HKF (Figs 2-4). The second principal component clustered HKF, banteng, gayal and gaur separately from the other cattle populations (Fig 3), and the Neighbour-net analysis suggested that the HFK and the wild populations stemmed from the same independent branch (Fig 4). This supports the anecdotal records which suggest the contribution of wild bovine species to HKF. B. javanicus and B. t. indicus ancestry has been reported in several Asian cattle populations from genome-wide SNP and microsatellite analyses [19,48]. Introgression from banteng and gayal into domestic cattle breeds of Southern China has been identified from analysis of SNP and whole genome sequence [22,44]. Our analysis of admixture performed with Treemix positioned the HKF branch near the branches of the wild cattle and separated from the domestic cattle, but did not identify any evidence of gene flow from the wild species (Fig 5). Unsupervised admixture results at K = 5 and 7 suggest banteng as putative source of wild ancestry (Fig 2), whereas supervised admixture assigned equal proportions of banteng and gaur (S4 Fig). Hence, it is possible that the wild ancestry component to HKF sources from a different un-sampled wild population or it may carry admixture of more than one wild species [22,49].
Treemix and admixture analyses suggested a north European cattle contribution to HKF (Figs 2 and 5). This finding aligns with historic records mentioning that milk producing cattle from Holland where brought to Hong Kong around the time of the first British colonisation [50]. Supervised admixture analysis performed with the distinct north-European taurine breeds as prior ancestry clustered HKF exclusively with Holstein ( S4 Fig). Although the association of HKF with Holstein aligns with Treemix and admixture results, the assigned ancestry proportion is clearly overestimated and possibly due to the reduced set of markers coupled with the restriction of the supervised model based assignment.
The diverse genetic origins of the HKF population would, at least partially, explain the phenotypic diversity among HKF population in which morphological traits span from zebu to taurine type (Fig 1).

Population homogeneity
The HKF cattle are characterised by a striking phenotypic diversity, especially considering the genetic homogeneity (see Fig 1) seen by the tight PCA clustering and unique genetic profile from admixture analyses (Figs 2 and 3). The HKF PCA cluster showed no greater dispersion than highly selected cattle breeds. Low levels of breeding management have been associated with a large phenotypic variability in other indigenous cattle populations [3]. There was little phenotypic selection practiced for the HKF before being released to feral life which may explain the large phenotypic diversity seen [21]. However, the distinct genetics of this population is intriguing. It would be interesting to hypothesise that HKF are a remnant of a distinct cattle domestication. Indeed, recent results from whole genome sequence analyses of Asian cattle suggested the putative domestication of a genetically differentiated wild B. t. indicus population to explain the highly divergent ancestry of Southern Asian zebu breeds compared with other cattle populations [44].

Conclusions
We applied stringent SNP selection to identify a panel of~4k ancestral polymorphic loci to reduce the effect of ascertainment bias and facilitate the comparison of divergent breeds of cattle. Using this ancestral SNP set we identified the HKF population as genetically distinct from other taurine, indicine and crossbred cattle populations, and showed evidence of a significant contribution of wild bovine species to the genetics of the HKF. Further, we identified signals of putative introgression from north European cattle into HKF, possibly due to the import of high productive cattle during the British colonisation. Domesticated local breeds, such as HKF are likely to have adapted genetic variation to match the local environments as they have had low selection pressure for production traits. This is particularly important as globalization and productivity oriented breeding programs are homogenising the genetics and reducing variability among cattle populations. Preserving local genetic resources is therefore required to maintain a pool of variants which developed as a response to environmental pressures such as disease and parasite tolerance, heat tolerance, and adaptation to local feed resources. The loss of local breeds such as HKF will significantly reduce our ability to face and rapidly adapt to a changing environment. While showing that the HKF are genetically different from other cattle populations, additional unbiased data, including mtDNA, Y chromosome and whole genome sequences are necessary to better define the origins of the HKF cattle and explore whether they may be traced to an independent domestication event.
Supporting information S1 Text. Additional historical information on HKF.