Population demographic history and population structure for Pakistani Nili-Ravi breeding bulls based on SNP genotyping to identify genomic regions associated with male effects for milk yield and body weight

The domestic Nili-Ravi water buffalo (Bubalus bubalis) is the best dairy animal contributing 68% to total milk production in Pakistan. In this study, we identified genome-wide single nucleotide polymorphisms (SNPs) to estimate various population genetic parameters such as diversity, pairwise population differentiation, linkage disequilibrium (LD) distribution and for genome-wide association study for milk yield and body weight traits in the Nili-Ravi dairy bulls that they may pass on to their daughters who are retained for milking purposes. The genotyping by sequencing approach revealed 13,039 reference genome-anchored SNPs with minor allele frequency of 0.05 among 167 buffalos. Population structure analysis revealed that the bulls were grouped into two clusters (K = 2), which indicates the presence of two different lineages in the Pakistani Nili-Ravi water buffalo population, and we showed the extent of admixture of these two lineages in our bull collection. LD analysis revealed 4169 significant SNP associations, with an average LD decay of 90 kb for these buffalo genome. Genome-wide association study involved a multi-locus mixed linear model for milk yield and body weight to identify genome-wide male effects. Our study further illustrates the utility of the genotyping by sequencing approach for identifying genomic regions to uncover additional demographic complexity and to improve the complex dairy traits of the Pakistani Nili-Ravi water buffalo population that would provide the lot of economic benefits to dairy industry.


Introduction
The domestic Nili-Ravi water buffalo (Bubalus bubalis) is a well-used dairy animal in livestock in Pakistan because it contributes about 68% of the total milk production in Pakistan [1]. Besides this, it is more adaptive to environment such as heat, disease and environmental stress and a better converter of poor quality roughages into nutritious and quality food [1,2]. There is a high variation in phenotypic attributes, which indicates that further enhancement might be possible. Milk yield of the Nili-Ravi breed is commonly associated with many influential factors; however, the breed has high milk potential as compared with other buffalo species in Asia [2]. This variability indicates the possibility of using genomic selection to improve milk traits in future breeding programs. Some bulls excel in certain economic traits but not all. All the selected bulls for this study have been made available for artificial insemination, and so these bulls were genetically tested for various traits deemed important in milk production and body weight [3].
Several genetic studies for milk yield exist and a meta-analysis suggested that a majority of female traits that contributed to milk yield in dairy cattle tend to be lowly heritable (0.02 to 0.04) when compared to the studies involving male animals with mean heritability estimates between 0.05 and 0.22 [4]. Though superior genetic effects in both males and females are fundamental to breed for high milk production, male genetics is more informative as breeding values of AI bulls can be estimated with lesser bias [5].
Advances in high-throughput sequencing technologies provide the ability to identify genome-wide diversity and bottlenecks in domestication. Superior gene pools can be enriched to reproduce buffalos efficiently with the existing genomic resources and plan future conservation strategies [2,6]. Genetic control of any trait specifies the interaction of environmental and genetic variance [7]. Inbreeding could cause further loss of genetic diversity and could accumulate deleterious alleles in the genomes, thereby resulting in deterioration of the breed, reduced milk yield, calving performance, udder health, fertility and survivability [8,9]. Inbreeding gives rise to continuous segments of homozygous genotypes that are linked to inbreeding depression [10]. For example, one percent increase in F ROH (association between increased inbreeding based on ROH) was associated with 20 kg decrease in 205-days milk yield [11], and also with 0.4-days extended insemination interval in heifers [12]. Informed decisions based on molecular knowledge could result in selecting diverse and superior animals for sustaining milk production.
With next generation sequencing (NGS), genome-wide single nucleotide polymorphisms (SNPs) can be developed affordably and can be effectively used for genome analysis of various traits in unexplored livestock such as water buffalo of Pakistan. A normal dam typically conceives on average up to 5 or 6 times in whole life-span but once an elite bull is effectively identified, its ability to produce a large number of offspring in life-time can be maximized by using artificial insemination. The detection and development of genome-wide SNP markers are essential tools for mapping and analyses of quantitative traits for superior bull selection [13][14][15].
Recently, many genotyping by sequencing (GBS) studies have been conducted on dairy breeds because it is a highly multiplexed and simple approach to construct NGS libraries [16][17][18][19][20][21][22][23]. De Donato et al. [24] used the GBS approach for genotyping 47 cattle breeds from Africa and the United States, producing 1.4 million unique reads per animal, and identified 63,697 SNPs. The authors concluded that the GBS approach is flexible, novel, sufficiently highthroughput and capable of providing acceptable genetic markers for genomic selection and genome-wide association study (GWAS). Ibeagha et al. [25] used the GBS approach to identify population-specific SNPs for use in improving complex dairy traits. Akanno et al. [26] reported seven SNPs with significant dominance associations for birth, weaning and yearling weight. Surya et al. [27] annotated the genome-wide SNPs of B. bubalis and reported 436 SNPs in 38 genes affecting fertility and 483 SNPs in 66 genes affecting milk production. Gao et al. [20] used SNP genotyping and identified two novel SNPs associated with body measurement traits of cattle.
The present study aimed to identify the chromosome-based distribution of SNPs, resolve the genetic diversity, and examine linkage disequilibrium (LD) and population structure for use in GWAS of body weight and milk yield traits in Nili-Ravi buffalo bulls from Pakistan to determine the quantity of specific traits that they may pass to their daughters. In addition, we resolved chromosome-based population demography and bottlenecks.

Materials and methods
The animal study was reviewed and approved by institutional guidelines of ethical review committee, University of Veterinary and Animal Sciences, Lahore, Pakistan.

Population and phenotypes
In total, 167 bulls were randomly selected for this study comprising single generation. Out of 167 breeding bulls, 155 bulls were from Semen Production Unit, Qadirabad, District Sahiwal, Punjab, Pakistan and other animals were from various other private dairy farms of Punjab in Pakistan. Animals selected for the study had no history of genital infections [28] and had diverse physical parameters with reference to the length from shoulder to pin-bone, wither height, height at sacrum, width between angles of the hip, and girth and width of the face above the eyes. In addition, bulls were selected based on reproductive parameters (scrotal circumference and scrotal length). All animals were dewormed and vaccinated regularly for footand-mouth disease, black quarter and haemorrhagic septicaemia. Blood samples were collected by a competent veterinary surgeon after the regular quarantine of animals. Source and phenotypic details of bulls in the study are in S1 Table. Genotyping by sequencing Genomic DNA for genotyping assay was isolated from blood samples of Nili-Ravi breeding bulls by using Qiagen columns (QIAamp DNA minikit, Qiagen, Hilden, Germany). The quantity of extracted DNA was measured by using the Qubit 3 Fluorometer with a double-stranded DNA HS assay kit (Invitrogen by Thermo Fischer Scientific, USA). GBS was performed as described [29,30]. Briefly, total DNA from all samples was digested with ApeKI to reduce genome complexity. Digested fragments were ligated to common and barcode adapters with appropriate overhangs. Ligated products were then further pooled and amplified with compatible primers. Amplified sample pools constituted a sequencing "library". After PCR reaction, pooled samples were gel-purified; a GBS "library" distribution of fragment sizes was checked by using BioAnalyzer (Agilent Technologies, USA). The GBS library was considered appropriate for sequencing reaction if the number of adapter dimers (approximately 128 bp in length) was minimal and the length of most of the genomic DNA fragment lengths was 170 to 350 bp. Templates were then quantified and diluted perfectly for sequencing reactions by using Next-Seq 500 (Illumina, San Diego, CA).
Raw sequence reads were trimmed with use of Sickle (https://github.com/najoshi/sickle), adopting a minimum PHRED quality threshold of 20 in a sliding window. Then reads were demultiplexed and barcodes were removed. Trimmed reads were aligned to the B. bubalis reference sequences (www.ncbi.nlm.nih.gov/biosample/SAMN08640746) with Burrows-Wheeler Aligner [29]. Aligned reads were processed to identify SNPs by using SAMtools [31, 32] and barcode reads were handled and interpreted into a distinct sequence tag set, with a single Tag-Counts file generated per input file of FASTQ. We used TASSEL-GBS, an efficient bioinformatics pipeline considered for processing raw sequence dataset into a variant genotype file [33].

Genetic diversity and population structure analysis
Expected nucleotide diversity (θ per bp) and Tajima's D for various chromosomes were estimated with sliding-window analysis by using TASSEL v5.0 as described [34]. Estimation of fixation index (F ST ) was based on Wright's F statistic [35] with use of SNP and Variation Suite (SVS) v8.1.5. For quantitative assessment of the number of groups in the panel, we used a Bayesian clustering analysis with a model-based approach implemented in STRUCTURE v2. 2 [36]. This approach involves use of multi-locus genotypic data to assign individuals to clusters or groups (K) without prior knowledge of their population affinities. The program was run with SNP markers for K values 1 to 9 (hypothetical number of subgroups), with 100,000 burnin iterations, followed by 500,000 Markov Chain Monte Carlo iterations for accurate parameter estimates with a high-performance cluster. To verify the consistency of the results, we used 3 independent runs for each K. An admixture model with correlated allele frequencies was used. The optimal K value was determined by use of an ad-hoc statistic, ΔK [37]. The number of Ks in each dataset was evaluated by ΔK values estimated with the software Structure Harvester, a website and program for visualizing STRUCTURE output and implementing the Evanno method. In a second approach, we used principal component analysis (PCA) with SVS v7.7.6 (Golden Helix, Inc., Bozeman, MT, www.goldenhelix.com). Female genetics was not included in the present study, which is in similar lines with several reports [3,[38][39][40][41][42].

Linkage disequilibrium (LD) analysis
For GBS study, we considered only SNPs that were successfully mapped to the B. bubalis whole genome sequence (WGS) draft to avoid spurious LD and incorrect association mapping. Before analyzing LD distribution, a block of the haplotypes was called for all adjacent SNPs by using SVS v8.1.5. Adjacent and pairwise measurements of LD were calculated separately for each chromosome. For computing LD, we used an expectation-maximization (EM) algorithm, formalized by Dempster et al. [43], an iterative technique for obtaining maximum likelihood estimates of sample haplotype frequencies.

Identification of runs of homozygosity (ROH)
ROHs were estimated using SNP and Variation Suite (SVS) that uses a sliding window to identify both ROHs and consecutive stretches that contain a minimum number of homozygous SNPs, at a minimum pre-specified distance. With this method, the software carries out basic detections of homozygous stretches identified by the sliding window, and the user only needs to define the parameter "minimum size" of segments to be identified [44].
annotation terms for the SNP-containing sequences for each trait were identified with the WGS draft of the B. bubalis genome assembly.

Genotyping of water buffalo collection and variation identification
We genotyped 167 water buffalo samples by GBS, which generated approximately 3.71 billion reads of 75 bp long. The median number of reads for each sample was 1.5 million (rangẽ 100K to 11.0 million). Good barcoded tags with at least 10 read counts, corresponding tõ 1.75 billion reads, were obtained and used for SNP calling. The 1.75-billion GBS reads were aligned to the B. bubalis reference genome (version 1; https://www.ncbi.nlm.nih.gov/genome/ 791?genomeassemblyid=374666), with 98% of the unique tags mapped to the genome. From the alignments, we identified 114,760 SNPs. After retaining SNPs with �5 0% missing data (representing at least 617 accessions) and minor allele frequency (MAF) � 0.01 (i.e., SNPs present in at least 7 accessions), we obtained 24,319 SNPs distributed across the B. bubalis reference genome, with an average of one SNP per 10.6 kb. Only eight regions >500 kb in the reference genome were not covered by SNPs, because these eight regions were centromeric or pericentromeric. The MAF distribution of these SNPs is shown in S1 Fig. Among

Genetic diversity
Using the final SNP dataset, we performed PCA, which illustrated two clusters and some outliers (Fig 1 and S2 Table). Our results are consistent with those from Lu et al. [49], who also classified water buffalos into two primary groups, with some outliers. To validate the results of PCA, we analyzed the population structure of the water buffalo with the Bayesian clustering algorithm implemented in the STUCTURE program.
ΔK analysis showed that two populations (K = 2) represented the best number of clusters for these 167 bulls (Fig 2 and S1 Fig). As shown in Fig 2, at K = 2, bulls were grouped into two groups, which indicated the presence of two lineages in Pakistan and the extent of admixture of these two lineages (S2 Fig). The population structure at this optimal K was consistent with the PCA results; all suggested two primary clusters in the Pakistan water buffalo bull collection. These two lineages indicated that Nili-Ravi is crossbred buffalo originated from the Nili and River-type buffalos.

Nucleotide diversity
Because of the strong population structure, we assessed patterns of expected nucleotide diversity (θ) and Tajima's D separately for each chromosome to understand genome-wide bottleneck effects. Such information will be of immense use when inferring the evolutionary dynamics of domestication across the water buffalo genome. Water buffalo domestication is often associated with "population bottlenecks" because of the limited number of founding individuals experiencing domestication events. The frequency of segregating SNPs, as reflected by various chromosomal measures of mean θ and Tajima's D, is presented in Fig 3. For water buffalo, chromosomes 5, 10 and 14 showed reduced nucleotide diversity as compared with the remaining chromosomes. For chromosome 10, Tajima's D was also reduced along with mean θ, which indicates purifying selection as compared with the other chromosomes. Chromosome 21 showed increased nucleotide diversity and Tajima's D, which indicates the accumulation of rapid mutations on this chromosome.

LD and recombination rate analysis across the chromosomes
Distribution of LD is essential to understand patterns of historical recombination and the non-random association of loci. We performed LD analysis for the whole dataset of water buffalo breeding bulls, involving all the adjacent pairs of SNPs mapped to various chromosomes (Fig 4). For computing LD, we used the EM algorithm [43] as an iterative technique for  Table. https://doi.org/10.1371/journal.pone.0242500.g001 obtaining maximum likelihood estimates of sample haplotype frequencies. The number of associations mean LD block and maximum LD block sizes for various chromosomes are presented in Table 1. As expected, the largest LD block was in chromosome Y, because the Y chromosome in mammals is fixed and will not undergo recombination. In this study, we identified 4169 adjacent SNP pairs in LD, with an average LD decay at 90 kb for the Pakistan water buffalo genome.

Distribution of ROH and haplotype blocks
The genome-wide pattern of ROH was determined to examine local homozygosity in Pakistan water buffalo bulls (Fig 5 and S3 Table). ROH regions can be critically analyzed to identify susceptibility/deleterious loci across the genome where there has been significant evolutionary pressure. ROH regions form as a result of selective sweeps around the gene candidates that have evolutionary significance. Large regions of homozygous SNPs can be common between subpopulations without a direct common lineage, and these regions can be areas of functional significance [50]. In this study, we resolved ROH regions on chromosomes 1, 3, 5, 10, 16 and Y of varied lengths. The Y chromosome showed the highest ROH islands, which is expected, as in other mammals [51].

Characterization of selection sweeps and domestication signature
A highly significant pairwise F ST (P<0.001) distribution with low and high milk yield and body weight for the buffalo bulls is presented in Fig 6 and S4 Table. These sliding windows allowed for identifying differential evolutionary signatures across the genome for milk yield and body weight of the bulls. Furthermore, the patterns of F ST variation indicated genomic areas with the signatures of positive selection and sweeps across various chromosomes. The pairwise analysis involving low and high bull groups showed higher genetic differentiation relative to low and high milk yield rather than body weight. Selection signatures detected loci with larger effects under strong selection on chromosomes 10, 18 and 20 for milk yield with 0.4 cut-off value ( Fig 6A). The sliding window of pairwise F ST for low and high body-weight bulls showed higher positive selection signals on chromosomes 7 and 18 with 0.35 cut-off value ( Fig 6B).

GWAS for milk yield and body weight
A Multi-Locus Mixed Linear model was implemented by using the EMMAX algorithm that involves stepwise regression and Byes Information Criteria (BIC) [45] to identify significantly associated SNPs for milk yield and body weight. Variance plots (Figs 7A and 8A) include partitioned error, genetic variance, variance explained in the current model and variance explained by covariates. Partition of variance explained in current study for milk yield and body weight was in the range of 70% to 80%. The-log10 P-values for the tested SNPs from GWAS analyses for both traits are shown as Manhattan plots (Figs 7B and 8B). All significant SNPs identified by GWAS showed cumulative phenotypic variance of 14% for milk yield and 20% for body weight. On chromosome 10, 14 SNPs were strongly associated with both milk yield and body weight. These 14 SNPs formed a tag, which is a representative in a region of the genome with high linkage disequilibrium. Such tag SNPs would reduce time and expense of mapping the genome areas associated with milk yield and body weight, because they eliminate the need to study each SNP [52]. The tag we identified on chromosome 10 explained the phenotypic variance of 14% and 22% for milk yield and body weight, respectively. P-values, regression beta and minor and major allelic effects are presented for milk yield and body weight in Tables 2 and 3. Level of significance, allelic effects and phenotypic variance explained for the other milk yield-associated SNPs on chromosomes 1, 3, 5, 8, 9, 12, 13, 16, 18, X and Y are also presented. A total of 14% and 85% of the significant SNPs exhibited positive contribution for milk yield and body weight, respectively. For milk yield, significant SNPs present on chromosomes 3 and 8 participated in positive contribution. For body weight,  Table. https://doi.org/10.1371/journal.pone.0242500.g006

Discussion
Nili-Ravi bulls are currently classified as a half herd and are an important source of milk in Pakistan [2,53]. Because of the narrow gene pool among this herd, transferring superior alleles to progeny is important. Previous genotyping methods using low density marker systems had limitations in understanding admixture and genome-wide diversity [54]. This is the first report of the use of genome-wide SNPs generated using GBS to estimate population structure and demographic history across various chromosomes in Nili-Ravi bulls. In addition, this study is important to breeders because we identified several genomic regions associated with milk yield and body weight that could help in genetic improvement of these important economic traits. Boison et al.
[55] summarized various advantages of using genomic methods for breeding buffalo bulls.
Among the 167 animals included in this study, we noted two different lineages based on admixture analysis. Similar findings were also found by Lu et al. [49] for Chinese indigenous buffalo breeds. We further noted highly variable nucleotide diversity and Tajima's D across various chromosomes, which also agrees with previous studies of Asian buffalo breeds that used microsatellite [56-58] and mitochondrial markers [59, 60]. For example, chromosome 10 showed reduced nucleotide diversity and Tajima's D, and chromosome 21 indicated higher nucleotide diversity and Tajima's D.
Parameters such as diversity, Tajima's D pattern, LD decay and ROH address various aspects of population demography and help in understanding the dynamics of genome-wide We performed GWAS using genome-wide SNPs for several novel candidate genes related to milk yield and body weight traits that are significant breeding goals for dairy industry. Gamma-secretase subunit APH-1B pseudogene/Rho guanine nucleotide exchange factor 26 gene (Gene ID: S1_158054912) present on chromosome 1 encodes a large protein that functions as a GDP-to-GTP exchange factor and is reported to be a significant candidate gene for the maternal effect on weaning weight and milk yield traits in cattle [74][75][76]. In this study, chromosome 3 harbored at least two genes that affect milk yield. The first is 26S proteasome non-ATPase regulatory subunit 5 (Gene ID: S3_172820617). The second is Coilin (Gene ID: S3_55703175) and is one of the main molecular components of Cajal bodies. Both of these gene members are involved in protein-mediated degradation found in mammary glands of cows, and this mechanism is involved in milk production of dairy cows [77,78]. The formation of various cellular bodies and small nuclear ribonucleoproteins depends on specific structural proteins, such as Coilin in Cajal bodies [79,80]. A subset of nuclear bodies uses specific  (8) is one of the major enzymes that participates early in lactation in dairy cattle. An lncRNA (LOC112586670) fragment is a candidate gene associated with buffalo milk production [49, 74, 83-86]. From functional analysis of liver microarray data from mid-lactating Holstein cows, delta(7)-isomerase pseudogene (Gene ID: S8_2495291, S8_2495296, S8_2495349) is involved in the cholesterol biosynthesis pathway [87,88]. This finding suggests that the candidate gene may contribute to milk metabolic pathways. Many studies have reported the kelch-like family member 29 (Gene ID: S12_75087502) involved in growth [89] and milk production in dairy cattle and buffalos [78, 90, 91]. Recently, it was also identified as the closest gene mapped by novel QTL-lead SNP for milking speed in French Holstein cattle, and T was an effect allele with a −0.08 allelic substitution effect of the SNP [92]. Multidrug resistance-associated protein 4-like (Gene ID: S13_18088155) is a member of C subfamily of ATP-binding cassette (ABC) transporters. ABCC4 acts in basic metabolic pathways and is highly polymorphic, with potential effects in a variety of phenotypes, and has also been detected in milk during early lactation [74, 91, [93][94][95]. The genetic effects of ABCG2 polymorphism have been observed for milk yield traits in Chinese Holstein cattle [96,97]. Alim et al. [98] reported that the A allele was the more frequent in Chinese Holstein cattle (0.53), followed by the C allele (0.47). The authors identified that the CC genotype was associated with increased milk yield and protein percentage during the first and second lactation, whereas the C-A genotype decreased protein yield, so the C allele is associated with superior milk performance [97,98], which is in partial agreement with our findings. Multidrug resistance-associated protein 4-like may be a novel candidate gene associated with milk yield.
In the bovine, fatty acid desaturase 2-like protein FADS2P1/olfactory receptor 5G3-like (Gene ID: S16_3831231) is a potential genetic marker for fatty acid composition in cattle milk [99][100][101]. In Jersey cattle, the major allele was A, and significant associations were recorded for the milk lauric, behenic, lignoceric, oleic, eicosatrienoic, and docosadienoic fatty acids. In Polish Holstein-Friesian cows, the G allele was prevalent, and significant associations were observed for erucic and docosahexaenoic acids [99]. MARVEL domain-containing 3/U6 spliceosomal RNA (Gene ID: S18_39186829) could be a novel marker related to milk trait because on relative expression analysis, its closest members (RNU6B, RNU5A and RNU1A) were used to assess the suitability of a combination of small nuclear RNAs as a reference RNA in milk somatic cells of lactating yaks [91, 102,103]. MAPK-interacting and spindle-stabilizing protein-like (Gene ID: S24_1645) is involved in regulating milk synthesis [91, [104][105][106]. EF-hand domain-containing 2 (Gene ID: S25_42968291) is one of the most common calcium-binding structural motifs, and a well-defined helix-loop-helix structural domain, present in many calcium-binding proteins, has been identified in milk [107][108][109][110]. Our results support these important insights into the synthesis of milk proteins and potential targets for the future improvement of milk quality.
We identified alpha-N-acetylgalactosaminide alpha-2,6-sialyltransferase 3 (Gene ID: S6_67621520) as a novel marker for body weight, and it also contributes to protein glycosylation for milk synthesis [111][112][113]. U6 spliceosomal RNA/Bardet-Biedl syndrome 7 (Gene ID: S7_114945813) is associated with a rare hereditary disorder that has several phenotypic features including retinopathy, obesity, polydactyly and hypogenitalism. Among them, obesity is the major clinical finding and the incidence is reported to be 72% to 86% in the Bardet-Biedl syndrome population [114,115]. From previous findings, Bardet-Biedl syndrome 7 gene could be considered a good marker for body weight. FAM172A family with sequence similarity 172 member A (Gene ID: S9_17523778) plays an important role in cell proliferation and facilitates S-phase entry. Thus, this gene is important as a cell growth regulator [116,117]. FAM172A is also reported as a candidate gene for CHARGE syndrome (heart defects, atresia of choanae, retardation of growth and genital abnormalities) [118]. Glutaredoxin-related protein 5, mitochondrial (Gene ID: S14_83340556) is required for normal cell growth [119,120]. Loss of glutaredoxin 3 impedes mammary lobuloalveolar development during pregnancy and lactation [121]. Our study adds further support to the importance of this gene in relation to the body weight trait. Spermatid nuclear transition protein 3-like (Gene ID: S25_110079376) was found a new marker for body mass. Large testes in relation to body mass (relative testes mass) is a strong predictor of high sperm competition levels in many taxa [122]. This gene family member is essential in chromatin condensation during spermiogenesis and hence, the family members are candidate genes for identifying sperm motility markers [123] and for semen quality traits [124]. Bombesin receptor subtype 3/Protein CXorf40A-like (Gene ID: S25_115449301) has been studied in animals as an important regulator of body weight, energy expenditure and glucose homeostasis [125][126][127].
Several female fertility traits in dairy cattle showed reduced genetic correlations among milk production traits because of low or moderate heritabilities [4,128]. The male lines received the greatest attention in selection because of accurate prediction of sire's heritability [129]. Estimated breeding values of AI bulls are considered to be more reliable for various predictions [130]. For this reason, many GWAS studies in dairy cattle has been carried out based on bulls' performance for use in genomic selection programs. The use of female data in GS programs has raised some debatable issues such as readjustment of phenotypes for comparison with the bulls [131]. A study that identified 105 genome-wide significant SNPs for milk production traits was through a paternal transmission disequilibrium test that explored associations from the sire families [132]. In similar lines to the current study, Nayeri and Stothard [86] utilized 3,729 North American Holstein bulls for GWAS and a genome scan with 4280 Nordic Red Cattle bulls was used to identify loci affecting milk yields Touru et al. [133].
We also performed pairwise population F ST (P<0.001) differentiation for low and high bulls for milk yield and body weight to understand pattern of selection [134]. All significant SNPs present on chromosomes 10 and 18, including MARVEL domain-containing 3/U6 spliceosomal RNA (S18_39186829), had high F ST values indicating positive selection at these regions. Flori et al. [135] indicated reduced F ST values for GHR gene in milk yield of cattle and also identified 13 highly significant genomic regions with varied pairwise F ST subjected to recent and/or strong positive selection. For body weight in our study, chromosomes 7 and 18 had high positive selection signals.
Defining ROH in a complex pedigree can be a way to examine homozygosity levels across the genome that has resulted by inbreeding and selection [136]. In this study, we observed ROH regions across the buffalo genome on chromosomes 1, 3, 5, 10, 16 and Y, with varied lengths. Chromosome 10 showed the longest ROH indicating functional significance of several important milk yield associated loci. Similar results were reported in several cattle breeds: shorter ROH results from the crosses with distant haplotype relatedness, and recombination has had more time to trim down ROH that is the target of selection sweeps, whereas greater ROH length is mostly caused by recent inbreeding [137,138]. We identified two significant SNPs (S5_76106561; S10_104237148) in ROH that were negatively associated with the milk yield trait, so those could be deleterious alleles and fixed in the population. Three other body weight associated SNPs (S10_104303781; S10_104392764; S25_110079376) are in the ROH region, S10_104303781 was negatively associated with the weight. Longer ROH had stronger linear relationship with the number of individuals that carry deleterious homozygous mutations reducing the biological fitness [51]. For example, a large number of genomic regions that contain long ROH manifested unfavorable associations because of deleterious mutations with milk yield in Holstein cattle [139].

Conclusion
Our study demonstrates the utility of the GBS approach for genetic analysis of complex dairy traits and for understanding the population structure of Pakistani Nili-Ravi buffalos. Our SNP analysis revealed that Nili-Ravi bulls are admixed with two lineages. Our GWAS for milk yield and body weight identified SNPs located in candidate genes previously identified in other dairy breeds. Results of this study suggested that male effects contribute strong genetic effects for milk yield. In future, favorable allele combinations from this study could be used to generate Nili-Ravi breeding bulls with high milk yield and body weight by selecting the bull with higher genetic merit is tantamount that will produce a daughter, who in-turn produces the greatest net revenue for dairy industry in terms of improved or superior commercial traits. Performance of their daughters will later be used to update these measures. Since these dairy bull data were publicly available, individual bull efficiency results can be disseminated to potential semen buyers.