Candidate Gene Approach for Parasite Resistance in Sheep – Variation in Immune Pathway Genes and Association with Fecal Egg Count

Sheep chromosome 3 (Oar3) has the largest number of QTLs reported to be significantly associated with resistance to gastro-intestinal nematodes. This study aimed to identify single nucleotide polymorphisms (SNPs) within candidate genes located in sheep chromosome 3 as well as genes involved in major immune pathways. A total of 41 SNPs were identified across 38 candidate genes in a panel of unrelated sheep and genotyped in 713 animals belonging to 22 breeds across Asia, Europe and South America. The variations and evolution of immune pathway genes were assessed in sheep populations across these macro-environmental regions that significantly differ in the diversity and load of pathogens. The mean minor allele frequency (MAF) did not vary between Asian and European sheep reflecting the absence of ascertainment bias. Phylogenetic analysis revealed two major clusters with most of South Asian, South East Asian and South West Asian breeds clustering together while European and South American sheep breeds clustered together distinctly. Analysis of molecular variance revealed strong phylogeographic structure at loci located in immune pathway genes, unlike microsatellite and genome wide SNP markers. To understand the influence of natural selection processes, SNP loci located in chromosome 3 were utilized to reconstruct haplotypes, the diversity of which showed significant deviations from selective neutrality. Reduced Median network of reconstructed haplotypes showed balancing selection in force at these loci. Preliminary association of SNP genotypes with phenotypes recorded 42 days post challenge revealed significant differences (P<0.05) in fecal egg count, body weight change and packed cell volume at two, four and six SNP loci respectively. In conclusion, the present study reports strong phylogeographic structure and balancing selection operating at SNP loci located within immune pathway genes. Further, SNP loci identified in the study were found to have potential for future large scale association studies in naturally exposed sheep populations.


Introduction
Assessment of livestock health conditions in developing countries for identification of priority diseases to be targeted for control, revealed helminth infections as one of the most important problems in sheep and goat [1,2]. Gastro-intestinal parasitic infestations such as Haemonchus contortus, Teledorsagia circumcincta, Trichostrongyles, Nematodirus sp. impose severe constraints on sheep and goat production especially those reared by marginal farmers under low external input system. These parasites incur heavy losses to farmers in terms of body weight loss, direct cost of anthelminthic drugs, loss due to mortality, etc. For example, annual treatment cost for Haemonchus contortus alone had been estimated to be 26 million USD in Kenya, 46 million USD in South Africa and 103 million USD in India [3]. Emergence of strains resistant to anthelminthic drugs has further complicated the management of parasitic diseases in small ruminants [4,5]. Breeding programs with the goal of enhancing host resistance to parasites may help to alleviate this problem in the long term. Genetic variation in host resistance exists for the major nematode species affecting sheep: Haemonchus contortus, Trichostrongylus colubriformis, Teledorsagia circumcincta and various Nematodirus species. Considerable variation has been reported among sheep breeds on their ability to resist gastro-intestinal nematodes (GIN). For example, indigenous sheep breeds like Red Maasai [6], Garole [7], Gulf Coast Native [8], Rhon [9] and Barbados Black Belly [10] were found to have relatively better resistance against GINs. Similarly, within-breed genetic variation has also been demonstrated in diverse sheep populations including Merino [11], Romney [12], Scottish Blackface [13], feral Soay sheep [14], etc. Estimation of genetic parameters revealed low to moderate heritability in different sheep populations (h 2 = 0.149, Avikalin sheep [15] to h 2 = 0.41, Armidale sheep [16]).
Exploration of genetic variation either within specific regions of genome or more specifically in candidate genes involved in innate and adaptive immune pathways may help to identify a set of DNA markers significantly associated with parasite resistance characteristics. The former approach in terms of quantitative trait locus (QTL) analysis is a powerful method to understand genotypephenotype relationship. Several QTL studies on parasite resistance characteristics have been reported in sheep. A quick evaluation of Animal QTL database [17] revealed a total of 753 QTLs reported for various economic traits in sheep. Among these, 81 were found to be related to parasite resistance characteristics and distributed in all sheep chromosomes except chromosomes 5 and 19. However, such QTLs related to parasite resistance were found to be more concentrated in chromosome 3 (16 QTLs) followed by chromosome 14 (7 QTLs). Among different parasites, 44 of 81 QTLs have been reported on resistance to Haemonchus spp., 20 on Trichostrongyles spp., 11 on Nematodirus spp. and six on Strongyles spp (Figure S1a-e). Thus the complexity of this analysis is evident from the fact that multiple, significant QTL regions have been reported across the entire genome, but the identification of candidate causative genes has remained elusive. The lack of consensus overlap among reported QTLs has hindered the identification of candidate genes and genetic markers for selection in sheep [18][19][20][21].
One of the important objectives of QTL studies is to identify underlying causative gene polymorphisms associated with the trait. Different QTLs reported in chromosome 3 for parasite resistance characteristics were found to be distributed all over the chromosome with varying overlapping regions. Hence, different candidate genes within chromosome 3 along with genes involved in immune related KEGG pathways (KEGG-Kyoto Encyclopedia of Genes and Genomes) could be important targets for establishing underlying causative variations. It is expected that the potential causative polymorphisms within candidate genes are members of the same overarching KEGG pathway that lead to the phenotypic expression on parasite resistance characteristics in each population. Further, the extent of genetic diversity and population substructure at such polymorphic loci are critical for such a genotypephenotype association study. Population stratification has been demonstrated to result in false positive associations in various species including humans [22,23], dogs [24] and cattle [25,26]. Considering the significance of genetic basis of parasite resistance in sheep, the Joint FAO/IAEA Division of International Atomic Energy Agency initiated a coordinated research project to document the phenotypic differences and underlying genetic variations in indigenous sheep and goat breeds of 12 countries from Asia, Africa and South America through on-farm artificial challenge and natural exposure under farmers' field conditions respectively. The ultimate goal of this project is to identify a common set of genetic markers that significantly influence parasite characteristics across different indigenous populations of sheep and goats. The present study thus aimed at exploring genetic variations within genomic regions containing significant QTLs and different candidate genes involved in immune pathways, identification of single nucleotide polymorphisms (SNPs) and genetic diversity analysis in sheep populations evolved under different environmental conditions. A preliminary study was performed on association of genotypes with host resistance characteristics against gastro-intestinal nematodes (i.e. fecal egg count, body weight change and packed cell volume measured in response to artificial challenge with infective L3 larvae of Haemonchus contortus parasite).

Animal Ethics Statement
All procedures for artificial challenge experiment at different locations in Argentina and Indonesia were approved respectively by the Institutional Committee for Care and Use of Experimental Animals of the National Institute of Agricultural Technology (CICUAE-INTA), Argentina (protocol number 35/2010) and Institutional Research Animal Facility, Bogor Agricultural University, Indonesia following the guidelines described in their institutional manuals. Experimental animals were challenged with infective L3 larvae of Haemonchus contortus and blood samples were collected from jugular vein under the supervision of qualified veterinarians for extraction of DNA and assessment of blood parameters. 42 days post challenge, animals were dewormed to clear parasites from the gut after the experiment. The experimental challenge in both locations did not involve animals from any endangered or protected species/breeds. Blood sample collection for DNA extraction and genotyping from remaining breeds were performed by local veterinarians in respective countries following good animal practice.

DNA samples for diversity analysis and association studies
A total of 713 unrelated sheep from 22 different breeds/ populations were utilized for diversity analysis in the present study. The sheep breeds/populations were distributed in three macroenvironmental geographical locations including Asia (tropics), Europe (temperate) and South America (tropics) with the assumption that level of parasite load and infections vary significantly across these regions [27,28] (17), Kajli (13) and Junin (45). The location of each of the sheep breeds/ populations under study are presented in Figure 1. Artificial challenge experiments were carried out in two sheep breeds each from Argentina (Pampinta and Corriedale) and Indonesia (Indonesian Fat Tailed and Indonesian Thin Tailed) to generate phenotypes and DNA samples from these experimental animals were utilized for association study. Under artificial challenge trial, animals were dewormed before challenging with infective L3 larvae. Four to six weeks after deworming, blood samples were collected for DNA extraction and estimation of anemic parameters and experimental animals were challenged with a dose of 5000 infective L3 larvae cultured under in vitro conditions. All animals were maintained together during the entire trial in dry lot or under conditions of minimum additional parasite challenge. Body weight (BW), fecal egg count (FEC) and packed cell volume (PCV) were recorded at 0, 28, 35 and 42 days after artificial infection. 136 animals from Corriedale (66), Pampinta (34), Indonesian Fat Tail (17) and Indonesian Thin Tail breeds of sheep (19) were utilized for artificial challenge at experimental stations located in Argentina and Indonesia respectively.
Targeted re-sequencing, SNP identification and Genotyping A total of 39 candidate genes were identified for targeted resequencing and to detect SNPs. The candidate genes were selected based on analysis of global list of sheep Entrez Gene IDs in bovine KEGG database using KEGGARRAY to identify candidate genes involved in pathways related to immune system. Oligonucleotide primers were designed for PCR amplification of partial regions of genes in a panel of eight or 16 unrelated animals from different breeds located in major geographical regions under study. Sequences generated from both ends were edited using Codon Code Aligner version 3.7.1 and secondary peaks were called to ascertain SNPs. The forward and reverse sequences were assembled to generate contigs using BioEdit version 7.1.3 (http://www.mbio. ncsu.edu/bioedit/bioedit.html). 44 novel SNPs were identified within the candidate genes under study for which competitive allele specific PCR (KASPar) assay based on FRET chemistry were developed for genotyping (KBiosciences, LGC Genomics, UK). Briefly, two forward primers one specific to each allele were designed with the respective proprietary tail sequence complementing the FAM or HEX fluorescence reporting system. A common reverse primer was designed for each genotyping assay. Thermal cycling parameters and recycling conditions were followed as per manufacturer's recommendations and are available on request. Endpoint allele discrimination module incorporated within the BioRad CFX96 (BioRad, USA) was utilized for calling the genotypes based on fluorescent intensity recorded for each of the two alleles. The emission data of all the samples in the plate were plotted in X and Y axis respectively for each allele and the genotypes were called based on distinct clustering. Quality of allele calling was confirmed by comparing the genotypes derived from KASPar assay with the available sequence data on individuals from the panel of unrelated animals. 38 out of 44 assays passed quality control and were subsequently utilized for genotyping large number of animals. Additionally, ten toll like receptor (TLR) genes were selected for in silico mining of SNP variations from sequences available at NCBI-GenBank database. A total of 14 non-synonymous SNPs within coding DNA regions of TLR genes were identified for development of genotyping assays (Details of SNPs identified and reference sequences used from NCBI-GenBank are given in Table S1). However, only three of these SNPs (within TLR5, TLR7 and TLR8 genes) were found to be polymorphic, while the remaining 11 were monomorphic in the populations under present study. A total of 41 SNPs were finally utilized for diversity analysis and association with parasite resistance characteristics.

Statistical Analysis
Basic diversity indices like allele frequency, genotype frequency, expected heterozygosity and test for Hardy Weinberg equilibrium were calculated using PEAS (Package for Elementary Analysis of SNP data) [29]. Allele sharing genetic distances (based on identical by state (IBS)) among pairs of individuals within and across different sheep breeds/populations were estimated using PEAS. Pair-wise allele sharing distance across different populations was utilized to construct the radial tree following UPGMA algorithm using PHYLIP version 3.5 [30]. Global F-statistics and pair-wise F ST among different sheep breeds were computed using FSTAT software [31]. To investigate the sub-population structure of sheep breeds, pair-wise F ST values among different sheep breeds were utilized to perform principal component analysis (PCA) using SPSS version 13.0. The first three principal components were used to draw the scattergram so as to understand underlying genetic structure and relationship of different breeds in three dimensional geometric space. The extent of population sub-structure was further explored using STRUCTURE with the assumption of different clusters, K = 1-15, 20, 25 and 30. Five replicate runs were performed for each K under admixture model without a priori population information. The number of burn in periods and MCMC repeats used for all the runs were 50000 and 100000 respectively. To identify the optimal 'K', the second order rate of change of L(K) with respect to 'K' was calculated by following the procedure reported elsewhere [32]. The results of STRUCTURE analysis were visualized using DISTRUCT [33].
Thirteen SNP loci within candidate genes involved in different immune related KEGG pathways and located in chromosome 3 were used to reconstruct haplotypes from unphased genotypic data. Reconstruction of haplotypes and estimation of haplotype   frequencies were performed using PHASE for Windows, version 2.1 (www.stat.washington.edu/stephens) [34,35]. The haplotype diversity and tests for departure from neutrality, Tajima's D, Fu and Li's D and Fu and Li's F were computed using DnaSP, version 4.10 [36]. The phased haplotypes were also utilized to perform analysis of molecular variance (AMOVA) and generate pair-wise F ST values using ARLEQUIN version 3.1 [37]. Reduced Median network of haplotypes was constructed using NETWORK 4.5.1.2 with reduction threshold (r) of 10.0 [38].
A complete fixed effect model was employed for association of different genotypes at each SNP locus with fecal egg count (FEC), body weight change (BWC) and packed cell volume change (PCVC) measured 42 days post infective L3 larvae challenge. The data on FEC were subjected to log transformation before applying the program of least squares, LSMLMW [39]. The model included location of experimental stations and genotypes as fixed effects along with linear regression of breed effect on fecal egg count.

Immune pathway genes and SNP Discovery
A total of 243 sequences were generated by targeted resequencing of selected candidate genes (accession numbers are presented in Table 1) to identify 41 novel SNPs. Various details of SNPs including SNP ID, candidate gene name, chromosome location, genomic location, functional domain of the gene, alleles at each locus, SNP type, and strand genotyped are presented in Table 1. Among 41 SNPs identified, 27 were located in chromosome 3, the chromosome with maximum number of QTLs related to parasite resistance characteristics in sheep. Out of the remaining 14 SNPs, three were located in each of chromosomes 1 and 12, two in each of chromosomes 11, 16 and 27 and one in chromosome 8 and 13 respectively. The candidate genes selected for the study are involved in at least 18 KEGG pathways related to immune system ( Table 2). The number of genes within each of these pathways varied from one to 14. JAK-STAT signaling pathway consisted a maximum of 14 genes followed by cytokine-cytokine receptor interaction and PIK3-AKT signaling pathways with eight genes each. The other major signaling pathways were Toll like receptor signaling pathway (5 genes) and chemokine signaling pathway (5 genes) followed by T cell receptor signaling pathway (4 genes). In a recent study, analysis of QTL and gene expression datasets following systems genetic approach revealed 14 KEGG pathways to be significant for parasite resistance in ruminants [21]. More than 50% of these immune related pathways have been included for candidate gene analysis in the present study. Altogether, 27 SNPs out of 41 SNP loci were within the candidate genes involved in immune pathways including 13 SNPs in chromosome 3. The location of SNPs in different functional domains of candidate genes varied considerably with 16 in 39untranslated regions, 14 in exonic regions, 10 in intronic regions and one in 59 flanking region upstream to start codon. Among the 14 SNPs located within exonic regions, eight were found to be non-synonymous mutations resulting in change of amino acid sequences while the remaining six were synonymous mutations.

Minor allele frequency and genetic diversity within sheep breeds
The basic diversity measures estimated for different breeds under study: minor allele frequency, observed heterozygosity, expected heterozygosity and number of SNP loci deviating from Hardy-Weinberg equilibrium are presented in Table 3. The global Table 1. Cont. minor allele frequency (MAF) across 41 SNP loci varied from 0.028 to 0.494 with a mean of 0.273. The power to detect genetic effect in a given study depends to a great extent on MAF of the tested alleles. Specifically, loci with a low MAF (,10%) have significantly lower power to detect weak genotype-phenotype associations than loci with a high MAF (.40%) [40,41]. Further, previous studies have demonstrated that rare genotypes are more likely to result in spurious findings due to relatively higher standard error (within each test) and higher false discovery rate (under multiple testing procedures for many loci) [42]. In the present study, the global minor allele frequency was more than 0.10 in all but three SNP loci (FGD6_519, SMCR7L_517, TLR8_1045) thus indicating their suitability for association study. Examination of SNP loci within each breed revealed presence of both alleles in more than 90% of SNP loci, thus indicating high degree of polymorphism and possibility of these loci predating the radiation of sheep breeds under study. In addition, 49.8% of SNP loci showed MAF$0.20 while 29.7% showed MAF$0.30, suggesting the SNP set identified in the present study will likely have high utility for association analysis in different populations. The mean MAF within breeds varied from 0.167 (Pampinta) to 0.238 (Bangladeshi), although no significant difference in MAF was observed across different geographical regions: Asia, Europe and South America (Table 3). This is in contrast to earlier findings [43] which reported Asian and African breeds having excess of low MAF SNP (,0.10) compared to European populations. This reflects the absence of any ascertainment bias in the present study as the diversity panel was adequately represented with Asian sheep breeds for SNP discovery.
The mean global observed and expected heterozygosities were 0.287 and 0.366 respectively. The mean observed heterozygosity within breeds varied from 0.230 (Pampinta) to 0.315 (Hamdani) while the mean expected heterozyosity varied from 0.237 (Pampinta) to 0.315 (Indonesian Thin Tail). Among different geographical regions, mean observed heterozygosity was highest in South West Asian sheep populations (0.309) followed by European populations [0.296] while South East Asian populations had the least mean observed heterozygosity (0.270). This is consistent with the fact that the diversity remains higher around the centre of domestication while decreasing with increasing geographic distance [44]. Among the European sheep breeds, Texel, the northern Europe originated sheep breed was having the lowest mean observed heterozygosity (0.260) as compared to other South or South Eastern European breeds [45]. Further, the overall mean observed heterozygosity of South West Asian and European sheep populations was found to be higher than gene diversity, although similar case was observed with respect to most South Asian sheep populations except Bangladeshi, Kachi and Karakul. The test for HWE showed significant deviations with a mean number of loci 7.6, 5, 4, 7.4 and 6.3 in South Asian, South East Asian, South West Asian, European and South American sheep populations. Among all the sheep populations, Hamdani was found to be in equilibrium at all the SNP loci except one further reiterating its high degree of genetic diversity.

Genetic distance within and between sheep breeds
Allele sharing distance was calculated for all pair-wise combinations of individuals both within and across populations by subtracting average proportion of alleles shared from one [46]. The mean inter-individual allele sharing distance of all pair-wise combinations within breeds was 0.236 (SD = 0.06; n = 16907), while it ranged from 0.197 (Pampinta) to 0.280 (Indonesian Thin Tail). Across different geographical regions, the mean distance within breeds was lowest in South American populations (0.232) while it was highest in South East Asian populations (0.265). The mean distance between individuals derived from different breeds was 0.325 (SD = 0.07; n = 236921). Although the observed values were found to be higher than that reported for cattle, they were  -STAT signaling pathway  14  CSF2RB,IL10,IL20RA,IL2RA,IL2RB,IL6R,  LEPR,PIK3CD,PIK3R3,PRLR,PTPN6,STAT2,STAT3,STAT5B   bta04151  PI3K-Akt signaling pathway  8  IL2RA,IL2RB,IL6R,ITGA5,ITGB7,PIK3CD,PIK3R3,PRLR   bta04060  Cytokine-cytokine receptor interaction  8  CSF2RB,IL10,IL20RA,IL2RA,IL2RB,IL6R,LEPR, PRLR   bta04620  Toll-like receptor signaling pathway  5  PIK3CD,PIK3R3,TLR5,TLR7,TLR8   bta04062  Chemokine signaling pathway  5  PIK3CD,PIK3R3 lower compared to the previous report in sheep [47]. The distribution of inter-individual distance values was found to be normal both within and across breeds ( Figure S2). There was considerable overlap between inter-individual distances within and across breeds with almost equal proportion of pairwise combinations around the range of 0.28 to 0.30. This indicates that some individuals were found to be more closely related to individuals from other breed than from members of the same breed. To further investigate the genetic differentiation among different sheep breeds, pairwise allele sharing distance pair-wise F ST (Table  S2) and global F-statistics (Table S3) Figure 2). However, the three South American sheep breeds formed a sub-cluster together and interestingly found to be more closely related to Southern Europe sheep breeds than North European sheep. Similarly Karakul and Bangladeshi breeds were found to be clustering together with South West Asian sheep than with other South Asian breeds.

Genetic structure of sheep breeds
The pairwise F ST were subjected to principal components analysis and the first three principal components (PCs) were plotted on a three dimensional scattergram to evaluate the genetic structure of sheep breeds (Figure 3a). The first, second and third principal components explained 42.01%, 38.12% and 7.15% of total genetic variation respectively. The clustering of sheep breeds followed their geographical origin and broadly differentiated into European and Asian groups. European Mouflon, which is more feral in nature was distinct from both these groups. However, analysis of a subset of SNP data at 27 loci located within immune pathway genes resulted in an additional distinct group with all the four South Indian sheep breeds clustering together closely (Figure 3b). In order to further understand the phylogeographic structure, analysis of molecular variance was performed to assess the SNP variation as a function of both breed membership and geographic origin (Table 4). Two types of groupings were assumed; the first grouping was with three major geographical groups Asia, Europe and South America; the second one with five geographical groups South Asia, South East Asia, South West Asia, Europe and South America. With grouping I, 14.16% of variation was due to differences in geographical groups and 11.54% due to between breed differences. In case of grouping II, among group variation marginally increased to 14.64% while between breed differences decreased to 9.51%. Further, with the analysis of subset of immune pathway SNP data at 27 loci, among group variation increased to 15.92% with grouping II showing a strong phylogeographic structure (Table 3). This is in contrast to earlier reports based on microsatellite genotypes [45], SNP genotypes [47] and mitochondrial DNA haplotypes [49], where much less variation was explained by grouping breeds into geographical regions. Similarly weak phylogeographic structure had been reported in domestic goats also [50]. All these studies concluded that the weak phylogeographic structure exhibited by domestic sheep and goat might be due to their small size and versatility enabling transportation and subsequent introgression in concert with human migration [47,51]. The relatively strong phylogeographic structure observed in the present study is interesting from the fact that most of these SNP loci are within candidate genes involved in different immune pathways. The geographical locations of sampled individuals vary widely with respect to diversity and load of pathogens resulting in differences in the magnitude of natural selection pressure across these regions [27,28]. Consequently, evolution of genes involved in immune system may either be highly optimized by natural selection process (purifying selection) or continue to evolve under low selection pressure (balancing selection) [52]. The genetic structure of sheep populations exhibited by a set of SNPs located in immune pathway genes could thus be different from those revealed by microsatellite or mitochondrial or genome wide SNP variations. In order to further clarify the breed demography and selection history, all the 41 SNP loci were subjected to Ewens-Watterson neutrality test to investigate whether the loci were influenced by selective forces within various sheep breeds/populations. 18 out of 41 SNP loci were found to deviate from selection neutrality in at least one of the breeds under study while the remaining 23 SNP loci were found to be selectively neutral (Table S4). The two subsets of data (23 neutral SNP loci and 18 non-neutral loci) were subjected to analysis of molecular variance. Among group variance at nonneutral SNP loci was found to be significant and higher (17.13%) than variance among populations within group as compared to selectively neutral SNP loci (11.01%) indicating the basis for strong phylogeographic structure observed in the present study (Table  S5). Bayesian clustering was performed without prior population information using STRUCTURE program, and the second order rate of change of average likelihood at each K was calculated (K = 1-15, 20, 25, 30). DK reached its peak at K = 5, suggesting optimal K value appropriate for the dataset. When K = 2 was assumed, most of the individuals from Europe and South America were assigned to cluster 1, while all the four Indian breeds were assigned to cluster 2 ( Figure 4). Individuals belonging to other breeds from South Asia (Kachi, Kajli, Karakul and Thalli), South    West Asia (Hamdani, Shal) and South East Asia (Indonesian Fat Tail and Indonesian Thin Tail) were admixed and observed to be assigned in both the clusters. When K = 3 was assumed, European and South American sheep were mostly assigned to cluster 1, Indian sheep to cluster 2 and the South West Asian sheep along with Karakul to cluster 3. Indonesian sheep and other South Asian sheep were found to be admixed between cluster 2 and 3. With K = 4, the Indian sheep population got subdivided into two clusters while with K = 5, the subdivision of European cluster was evident. Further, Bayesian analysis was also performed with the subsets of genotypes at non-neutral and neutral SNP loci ( Figure  S3a and S3b respectively). The results revealed relatively better and more precise geographical clustering of animals with the nonneutral subset than the neutral subset of genotype data.

Haplotype reconstruction and test for neutrality
To investigate the influence of natural selection processes, unphased diploid genotypes at SNP loci located in chromosome 3 were utilized to reconstruct the haplotypes. Out of 27 SNP loci located in chromosome 3, 13 loci within immune pathway genes were used for haplotype phasing. Data on all 713 animals were utilized to reconstruct a total of 1426 haplotypes, of which 389 were found to be singletons. Predicted haplotype phases with best pair probabilities for each individual were retained for further analysis. Table 5 provides the results of tests for selective neutrality using three different statistics: Tajima's D, Fu and Li's D and Fu and Li's F. Significant deviations were found in Corriedale, Junin, Krainer Steinschaf, Texel and Nellore sheep breeds. All the three statistics showed significant deviation from neutrality in Junin breed, while two statistics in each of corriedale and Texel sheep breeds and one statistics in each of Krainersteinschaf and Nellore sheep breeds showed significant deviations from selective neutrality. When tested at regional level, Tajima's D and Fu and Li's F statistics revealed significant deviations in European, South American and South Asian populations. Similarly, the Indian sheep which clustered distinctly when analyzed at immune SNP loci, showed significant deviation from neutrality under Tajima's D and Fu and Li's F tests. However, Fu and Li's D statistic did not detect any significant deviation from neutrality in all these populations. All the test statistics that showed significant deviation both at breed and regional levels were found to be positive indicating balancing selection in force, probably with low selection pressure. In order to further examine this process, haplotype networks were constructed for each of the geographical regions under study. Star contraction is expected under strong purifying selection while few haplotypes with moderate frequencies and short branches are expected under a weak purifying selection. On the contrary, balancing selection is expected to retain multiple lineages with high and low frequency clusters and long branches [53]. Reduced Median (RM) networks were constructed for haplotypes derived from populations in different geographical regions ( Figure 5). All the populations showed multiple lineages with several nodes of different sizes and long branches, thus confirming balancing selection in force as revealed by different tests for neutrality. This is further evident from the fact that many breeds under study were found to have either heterozygosity excess or near equal observed and expected heterozygosities. Although little information is available in sheep on immune gene polymorphisms across distinct geographical regions, few reports are available in human and cattle. In a study on innate immune genes including TLRs and defensins in Indian, European-American and African-American human populations, strong purifying selection was found to operate resulting in conservation of recognition motifs across a broad range of pathogens [52]. Similar observation was found with respect to TLR10 gene in a study on Bos taurus and Bos indicus cattle [54]. However, in case of adaptive immune genes like major histocompatibility complex, balancing selection with high genetic diversity and heterozygote advantage was found to be common [55,56]. To the best of our knowledge, the present study is the first to report balancing selection forces operating in immune pathway genes of sheep. However, it needs to be noted that there may be local variations in the nature of selection as it could be modulated by local differences in pathogen diversity and load [52].

Association of immune pathway gene polymorphisms with fecal egg count
To evaluate the potential utility of the SNP loci for future association study on a large number of samples, a pilot analysis was performed with the phenotypes generated in four breeds after artificial challenge with infective L3 larvae of Haemonchus contortus. Phenotypic data (fecal egg count, body weight change and change in packed cell volume 42 days post challenge) on 136 animals was used for least squares analysis under a complete fixed effect model. The effect of location of experimental stations (Corriedale and Pampinta in South America; Indonesian Thin Tail and Indonesian Fat Tailed sheep in South East Asia) was not found to have significant influence on fecal egg count and packed cell volume while significant effect was observed with respect to body weight change (P,0.01). Although the locations are wide apart geographically, uniform protocol was followed across different experimental stations in terms of age of lambs selected for experiment, deworming and data recording schedule, however some differences did exist in terms of quality of pasture available for grazing, etc. Higher observed body weight change in animals challenged at Aguil Experimental Station (AES), Argentina was due to better growth performance of Pampinta lambs. Higher body weight achieved by Pampinta lambs were due to their genetic differences in growth rate (average pre-weaning weight gain of 295 g/day) and weight gain (average weaning weight of 33.4 kg) as compared to other breeds like Corriedale (218 g/day and 24.6 kg) [57] and Indonesian Fat Tailed sheep (47.7 g/day and 9.7 kg) [58]. The fixed effect on body weight change was adjusted before phenotype-genotype association analysis. Among different breeds, lowest mean fecal egg count (mean log FEC 3.2360. 16) and packed cell volume change (21.57%) was observed in Indonesian fat tailed sheep while Corriedale showed highest mean values for these traits (mean logFEC 3.5860.079 and PCV of 24.97%), although the differences were not statistically significant (P.0.05). Among the SNP loci examined, genotypes at two loci, NAV3_591 and GLI1_576, both located in chromosome 3 and within exonic regions of the respective genes (Neuron navigator and GLI family zinc finger 1) were found to have significant differences in their fecal egg count (Table 6). Among these, GLI1_576 locus was a non-synonymous change from Asparagine (T allele) to Histidine (G allele). The mean log transformed fecal egg count at NAV3 locus was 3.435, 3.717 and 3.453 for GG, CC and GC genotype groups respectively. Similarly, the mean log transformed fecal egg count at GLI1_576 locus was 3.724, 3.364 and 3.749 for TT, GG and TG respectively ( Figure 6). Apart from these two loci, genotypes at ZBTB39_51, IL20RA_422, PIK3CD_433 and TLR7_2491 showed weak differences in their mean log transformed fecal egg count, although statistically not significant (P,0.10). However, it needs to be mentioned that none of these loci were found to have significant association when multiple testing correction factors were applied using Benjamini-Hochberg false discovery rate (FDR corrected P-value.0.10 for all the SNP loci). Similarly, with respect to body weight change, four loci (ACVRL1_445, GPR84_520, TARBP2_97 and SMCR7L_517) located in chromosome 3 were found to have  SNP loci showing significant or weak association with phenotypes (including fecal egg count, body weight change and packed cell volume change), some showed heterozygous advantage while few others had no heterozygous advantage. Unlike haplotype analysis conducted on many breeds within each region that showed balancing selection and heterozygous advantage, association analysis was performed in few selected breeds. Considering the results from neutrality tests, selection influence at a particular SNP locus vary across breeds and hence the balancing selection observed in haplotype networks within a particular region might be due to different alleles being favoured within and across loci in various breeds. However, it has to be noted that the preliminary association in the present study is based on relatively fewer number of animals and expected heterozygous advantage has to be tested in large populations.
In conclusion, the present study reports strong phylogeographic structure in sheep across Asia, Europe and South America and balancing selection operating at SNP loci located within immune pathway genes. Although the present association analysis is preliminary in nature, the SNP loci on chromosome 3 and those within immune pathway genes indicated their potential for future large scale association studies in naturally exposed populations. Figure S1