Clonal Evolution of Enterocytozoon bieneusi Populations in Swine and Genetic Differentiation in Subpopulations between Isolates from Swine and Humans

Enterocytozoon bieneusi is a widespread parasite with high genetic diversity among hosts. Its natural reservoir remains elusive and data on population structure are available only in isolates from primates. Here we describe a population genetic study of 101 E. bieneusi isolates from pigs using sequence analysis of the ribosomal internal transcribed spacer (ITS) and four mini- and microsatellite markers. The presence of strong linkage disequilibrium (LD) and limited genetic recombination indicated a clonal structure for the population. Bayesian inference of phylogeny, structural analysis, and principal coordinates analysis separated the overall population into three subpopulations (SP3 to SP5) with genetic segregation of the isolates at some geographic level. Comparative analysis showed the differentiation of SP3 to SP5 from the two known E. bieneusi subpopulations (SP1 and SP2) from primates. The placement of a human E. bieneusi isolate in pig subpopulation SP4 supported the zoonotic potential of some E. bieneusi isolates. Network analysis showed directed evolution of SP5 to SP3/SP4 and SP1 to SP2. The high LD and low number of inferred recombination events are consistent with the possibility of host adaptation in SP2, SP3, and SP4. In contrast, the reduced LD and high genetic diversity in SP1 and SP5 might be results of broad host range and adaptation to new host environment. The data provide evidence of the potential occurrence of host adaptation in some of E. bieneusi isolates that belong to the zoonotic ITS Group 1.

The objectives of this study were to characterize 101 pig E. bieneusi isolates that belong to nine genotypes in zoonotic Group 1 at the subtype level at five genetic loci, to assess the population structure and substructures, and to compare them with similar data from E. bieneusi subpopulations SP1 and SP2 in primates to examine the occurrence and extent of host segregation in different E. bieneusi subpopulations.

Ethics statement
This study was performed in accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the Ministry of Health, China. Prior to experiments, the protocol of the current study was reviewed and approved by the Institutional Animal Care and Use Committee of Northeast Agricultural University (approved protocol number SRM-08). For specimen collection, we obtained permission from animal owners. No specific permits were required for the described field studies, and the locations where we sampled are not privately owned or protected in any way. The field studies did not involve endangered or protected species.

Study area and parasite sampling
Isolates of E. bieneusi were obtained from pigs in cities Changchun, Daqing, Harbin, and Qiqihar in northeast China. All of them were previously genotyped by PCR and sequence analysis of the ITS locus [17,29]. A total of 101 isolates of ITS genotypes CHN7, CS-4, EbpA, EbpB, EbpC, Henan-I, Henan-IV, O, and PigEBITS3 in zoonotic Group 1 were selected for population genetic analysis in this study. The number of isolates and their ITS genotype designations by city are shown in Table 1. For comparative purposes, population genetic data from 101 E. bieneusi isolates in humans from India, Peru, and Nigeria and 5 isolates in captive baboons from Kenya were included in this analysis (Table 1) [26,27].

PCR and sequencing
Genomic DNAs extracted from 101 pig fecal specimens were used for PCR amplification of a minisatellite marker MS4 and three microsatellite markers (MS1, MS3, and MS7) as described [25]. The secondary PCR products with the expected sizes (bp) of approximately 676 for MS1, 537 for MS3, 885 for MS4, and 471 for MS7 were sequenced in both directions at the Beijing Genomics Institute, China. Raw sequences were assembled and edited with the software Chromas Pro version 1.33 (Technelysium Pty. Ltd., Helensvale, Queensland, Australia). The sequences obtained were compared to the sequence data of each target available in GenBank [26,27] using the software MAFFT version 7.300 (http://mafft.cbrc.jp/alignment/software/) [30].

Genetic analysis
The software DnaSP version 5.10.01 (http://www.ub.edu/dnasp/) was used to determine E. bieneusi genotypes at each of the five markers and the MLGs with consideration of both single base nucleotide substitutions and short insertions and deletions (indels) polymorphisms [31]. Intragenic LD and recombination rates for individual locus and concatenated multilocus data set were estimated from the segregating sites without consideration of indels using DnaSP [31]. Recombination rates were further assessed using the methods GENECONV, MaxChi, and SiScan implemented in the software RDP version 3.44 (http://darwin.uvigo.es/rdp/rdp.html) [32]. Tests for genetic diversity and neutrality (Fu's Fs and Tajima's D) were run on the concatenated contigs using DnaSP (based on segregating sites) and the software Arlequin 3.5.1.2 (http://cmpg.unibe.ch/software/arlequin35/; based on both segregating sites and indels) [31,33]. The different nucleotide sequences (considering both substitutions and indels) were assigned as distinct alleles and the alleles at each of the five loci defined the allelic profile or sequence type. We measured the pairwise intergenic LD on annotated allelic profiles using the exact test and Markov chain parameters implemented in Arlequin [33]. Values of standardized index of association (I S A ) were calculated with LIAN 3.5 (http://guanine.evolbio.mpg.de/cgibin/lian/lian.cgi.pl/query) on five-loci haplotypes [34].
We assessed population structure of E. bieneusi by analyzing the intragenic and intergenic LD, I S A , neutrality, and recombination events (Rms). Wright's fixation index (F ST ) calculated using Arlequin and gene flow (Nm) calculated using DnaSP were applied to evaluate the degree of genetic differentiation between E. bieneusi populations [31,33].

Phylogenetic and structure sub-clustering analysis
A Bayesian analysis implemented in the software MrBayes version 3.2.1 (http://mrbayes. sourceforge.net/) was used in clustering nucleotide sequences using Markov chain Monte Carlo (MCMC) methods [35]. The general time reversible model (GTR+G) was determined to be the best-fit nucleotide substitution model with the program ModelTest 3.7 (http://www. molecularevolution.org/software/phylogenetics/modeltest) [36]. An MCMC-based analysis of phylogeny was conducted using the GTR+G model and the default parameters settings as described [26]. The maximum clade credibility tree generated by these analyses was visualized and edited using the software FigTree version 1.3.1 (http://tree.bio.ed.ac.uk/software/ figtree/). Pairwise distance matrices among nucleotide sequences of MLGs were calculated using eachgap calculating method with the dist.seqs command in the software MOTHUR version 1.24.1 (http://www.mothur.org/wiki/Download_mothur) [37]. A principal coordinates analysis (PCoA) via covariance matrix with data standardization was performed on the generated matrices with the software GENALEX version 6.501 (http://biology-assets.anu. edu.au/GenAlEx) [38]. A Bayesian cluster analysis was performed on the allelic profile data using the software STRUCTURE version 2.3.1 (http://pritch.bsd.uchicago.edu/software. html) to assess the presence of distinct subpopulations [39]. We also constructed haplotype networks using the median-joining method implemented in the software Network version 4.6.1.1 (http://www.fluxus-engineering.com/sharenet_rn.htm) to estimate the genetic segregation and evolutionary trend of E. bieneusi isolates [40].

Genotyping and LD at individual loci
Sequence analysis of 101 pig E. bieneusi isolates identified 9, 15, 5, 15, and 11 genotypes at the loci ITS, MS1, MS3, MS4, and MS7, respectively. Single nucleotide polymorphisms (SNPs) were the only source of genetic diversity at the ITS locus, while genetic variation at the other four loci included the lengths of trinucleotide TAC and TAA repeats at MS1, dinucleotide TA repeats at MS3, tetranucleotide GGTA repeats at MS4, and TAA repeats at MS7 and the SNPs outside the tandem repeat regions. Some MS4 fragments also carried isostructural GG to AA substitutions in the first tetranucleotide repeat. Gene diversity (Hd) at individual loci was calculated using DnaSP and is shown in Table 2. The number of genotypes and Hd value of each locus were also measured for 106 primate E. bieneusi isolates (Table 2). Generally, markers MS1 and MS4 had higher resolution than the other ones ( Table 2). The intragenic LD among segregating sites for each locus was calculated based on a linear regression analysis in DnaSP. The markers MS1, MS3, and MS7 had complete LD (LD = 1) when only the pig E. bieneusi isolates were analyzed, whereas only MS1 and MS3 had complete LD in the analysis of all 207 isolates (Table 3). Table 3 displays the number of pairwise comparisons and the number of significant pairwise comparisons after Fisher's exact test and Bonferroni correction. The occurrence of intragenic recombination was assessed using DnaSP. As shown in Table 3, genetic recombination was only detected in markers with incomplete LD.

Multilocus sequence typing (MLST) and analysis
To investigate the genetic diversity and population characteristics of E. bieneusi, the five loci were concatenated into a single multilocus contig of 2,128 bp in length. The contigs from 101 pig isolates include 159 polymorphic sites (41 segregating sites and 118 indel sites). Due to the difference in substitution rate between SNPs and indels, two methods were used to estimate genetic diversity. The finite population genetic variance estimates that consider both SNPs and indels allowed identification of a total of 44 MLGs with a Hd value of 0.95 and a nucleotide diversity (Pi) value of 0.0197 (Table 4). In contrast, the use of infinite population genetic variance estimates that consider only SNPs led to reduced genetic diversity (MLGs = 37, Hd = 0.93, Pi = 0.0065) ( Table 4). The genetic diversity was also estimated for each of the four cities in northeast China (Table 4). Tests for intragenic LD and recombination among segregating sites were performed on combined multilocus contigs. The overall population and individual populations of E. bieneusi in pigs from four geographic locations all had significant but incomplete LD ( Table 4). The  negative slope returned by LD score regression is indicative of LD index declines with increasing nucleotide distance, implying the potential occurrence of recombination. A varying number of Rms were detected in the overall and individual pig E. bieneusi populations from four cities using DnaSP ( Table 4). The occurrence of Rms was also confirmed using the GENECOV, MaxChi, and SiScan methods in RDP4 (S1 Table). The neutrality tests conducted with DnaSP (using SNPs only) and Arlequin (using both SNPs and indels) were both significant, rejecting the null hypothesis of a neutral population at mutation-drift equilibrium ( Table 4). The negative Fs and D values (highlighted in bold in the center of Table 4) obtained in the tests of the pig E. bieneusi populations implied an excess of low frequency polymorphisms, as would be expected from a recent population expansion. We also calculated I S A and compared the values of V D (variance of pairwise differences) and L (the 95% critical value for V D relative to the null hypothesis of panmixia) to assess the population structure of E. bieneusi using allelic profile data [34]. As presented in Table 5, significant positive I S A values (at least 0.2733 in Qiqihar, P MC < 0.001) were obtained for the overall and individual pig E. bieneusi populations. The value of V D was greater than that of L for each data set as well. Thus, the populations tested were all in strong LD. Calculation of I S A and comparison of V D and L were also applied for the analysis of MLGs to avoid the possibility that LD would result from a clonal expansion of one or more MLGs which might mask the underlying equilibrium. The analysis showed the overall pig E. bieneusi population retained LD (I S A = 0.1441, P MC < 0.001; V D > L) when the isolates with the same MLG were treated as one individual (Table 5). Pairwise intergenic analysis of the five loci using the allelic profile data revealed strong LD as well (S2 Table). We also estimated the effective migration rate (Nm) using the F ST method. Pairwise analysis between geographic populations yielded F ST values ranging from 0.327 to 0.506 and Nm values ranging from 0.24 to 0.48 (Table 6). Thus, geographic segregation of the isolates and limited gene flow occurred among the pig E. bieneusi populations from four cities.
Population genetic analysis was also conducted when the MLST data from 106 E. bieneusi isolates from humans and baboons were included. Significant LD and limited recombination were obtained in the analysis of a total of 207 isolates (Tables 4 and 5). The test of selective neutrality revealed a nonneutral structure for the population ( Table 4). The negative Fs and D statistics (highlighted in bold at the bottom of Table 4) obtained from this test signified potential epidemic expansion of some MLGs and genetic subdivision.

Subpopulations and population genetic analysis
A Bayesian method was used to infer phylogenetic relationships among E. bieneusi isolates from pigs, humans, and baboons ( Fig 1A). The isolates were divided into two major phylogenetic clusters (one includes 44 MLGs from pigs and 1 MLG from a Peruvian adult with HIV infection and the other one includes 56 MLGs from humans and 4 MLGs from baboons) ( Fig  1A). Additional subdivision within the two clusters led to the formation of five genetically isolated subgroups (Fig 1A). The same clustering patterns appeared in the 3D image of PCoA with two main clusters (blue balls represent the isolates from humans and baboons and red balls from pigs) and five genetic subdivisions generated (Fig 1C). Considering the high concordance of the grouping patterns of MLGs formed in Bayesian inference and PCoA, we defined two known primate E. bieneusi subpopulations as SP1 and SP2 and three novel pig E. bieneusi subpopulations as SP3 to SP5. Subpopulations SP1 to SP5 contained 51, 9, 18, 14, and 13 MLGs derived from 66, 39, 35, 41, and 26 E. bieneusi isolates, respectively (Tables 2 and 4, Fig  1D). In general, subpopulations SP2 to SP4 had higher frequency of MLGs than SP1 and SP5 (Fig 1D).
We also performed substructure analysis based on allelic profile data using STRUCTURE. The initial run with K = 2 identified two major subclusters (Fig 1B). One subcluster in red included all isolates from pigs and one isolate from a human and the other subcluster in green contained the isolates from humans and baboons (Fig 1B), corresponding to the two major substructures generated in Bayesian inference and PCoA. The following runs at K = 3 and 4 yielded various intermediate patterns of population subdivision. The run at K = 5 showed the presence of five clear and robust subpopulations (Fig 1B). The isolates in each of the five subpopulations agreed with those in SP1 to SP5 except that several isolates from SP4 were clustered into SP5 (Fig 1). We also measured the alpha (α) values generated in the substructure analysis. When there are firm subdivisions in a population, the α values are held constant and commonly range from 0 to 0.2 in different runs. The runs at K = 2 to 5 in this study yielded consistent α values around 0.03, which supported the robustness of the substructure formation. However, when the analysis was performed at K = 6 or more, a fairly mixed and confused scene of clustering was observed (Fig 1B). Taken together, these results suggest that the run with K = 5 provided the best fit to our data.
Based on the results from Bayesian phylogeny, PCoA, and substructure analysis, it was apparent that five genetically isolated subdivisions were present in the total population. Distribution preference of the pig E. bieneusi isolates from Changchun and Daqing in SP3 and SP5 and those from Harbin and Qiqihar in SP4 suggested the presence of genetic segregation at some geographic level (S3 Table). Genetic diversity, intragenic LD, and Rms were measured based on multilocus sequences for each of the five subpopulations. Pairwise LD comparisons among the subpopulations showed that the clonality of E. bieneusi isolates in SP2 to SP4 was stronger than that in SP1 and SP5 (Table 4). In agreement with this, higher I S A values were generated in the analysis of SP2 to SP4 than in SP1 and SP5 ( Table 5). The measurement of population divergence among SP1 to SP5 was performed by the analysis of F ST and gene flow ( Table 6). SP1 and SP5 were shown to have a close genetic relationship (F ST = 0.185 and Nm = 1.10) (Table 6). Nevertheless, in comparative analysis of SP1 or SP5 to any other three Median-joining networks were used to infer the relationships between MLGs (Fig 2). The analysis showed the existence of five clusters marked in blue, green, yellow, purple, and red, which corresponded to SP1 to SP5, respectively (Fig 2). As central haplotypes are generally considered possible ancestors of the peripheral ones [26,41], the MLGs in SP2 to SP4 might have derived from the central ones in SP1 and SP5 (Fig 2). In addition, high dimensional networks in SP1 suggested the presence of significant recombination (Fig 2).

Discussion
Despite advances in defining E. bieneusi ITS genotypes from different hosts and geographical regions, the relationship between genotypes and phenotypic traits such as host specificity and zoonotic potential remains unclear [2,11]. Analysis of population structure can help us understand the epidemiology and evolution of parasites [42]. MLST analysis has provided substantial new insights into the population genetics of E. bieneusi in humans and non-human primates [26][27][28]. Herein, we evaluated intragenic and intergenic LD, I S A , neutrality, and Rms to determine the genetic structure and substructures in E. bieneusi populations from pigs using both sequence and allelic profile data from five genetic loci. The presence of strong LD and very limited recombination supported the existence of significant clonal structure in the overall and individual pig E. bieneusi populations from four cities, northeast China. In addition, as illustrated in Fig 2, the results of several neutrality tests suggest that selection acting in pig E. bieneusi populations has led to the expansion of several dominant MLGs, which might play a role in enhancing their adaptation to specific hosts. Two recent studies described that two other microsporidian species known to infect honeybees (Nosema apis and Nosema ceranae) were also under selective pressure and experienced a population expansion [43,44]. The estimates of F ST and Nm and the distribution of the pig E. bieneusi isolates revealed the existence of geographic segregation among the cities surveyed.
Pigs are considered a potential reservoir for human microsporidiosis based on genotypic features of E. bieneusi isolates at the ITS locus [2,17,23]. The ITS genotypes of pig E. bieneusi isolates used for MLST analysis in this study all belong to phylogenetic Group 1 with zoonotic potential [2,17,23]. The ITS genotypes of primate E. bieneusi isolates used for comparative analysis are also Group 1 members [26,27]. These Group 1 isolates formed several genetically isolated subpopulations (two existing primate SP1 and SP2 and three novel pig SP3 to SP5) in MLST analysis of five genetic loci. SP1 to SP5 probably have different phenotypic traits as reflected in the distribution of E. bieneusi isolates in these subpopulations. As observed in Fig  1A and S4 Table, SP1 and SP5 are comprised mainly of the isolates pertaining to zoonotic ITS genotypes D, IV, and EbpC that are found in a wide range of hosts and regions around the world. In contrast, SP2 to SP4 contain mainly isolates belonging to ITS genotypes A, EbpA, and EbpB that have narrow host and geographic ranges.
The stronger LD and higher occurrence of specific MLGs observed in SP2 to SP4 than SP1 and SP5 suggest the presence of higher clonality of E. bieneusi isolates in the former three subpopulations. The high diversity of E. bieneusi isolates in SP1 and SP5 may enable responses to environmental challenges and adaptations to new hosts [45]. Thus, MLGs in SP1 and SP5 might be responsible for cross-species E. bieneusi infections and have zoonotic potential. This is supported by the broad host range of E. bieneusi ITS genotypes D, IV, and EbpC in the two subpopulations.
Reduced gene flow between primate SP1 and SP2 and between pig SP5 and SP3 might promote the emergence of advantageous haplotypes in SP2 and SP3 and allow these haplotypes to Median-joining network for inferring intraspecific phylogenies of 207 Enterocytozoon bieneusi isolates from pigs in China, humans in India, Nigeria, and Peru, and baboons in Kenya. The size of the circles is proportional to the frequency of each of the 76 multilocus genotypes obtained based on segregating sites. The red, black, blue, yellow, and green colors in circles represent the isolates from China, India, Kenya, Nigeria, and Peru, respectively. ITS genotypes are labeled besides the circles. The black branches connecting multilocus genotypes have a length proportional to the number of single-nucleotide polymorphisms (SNPs), while the red branches having pairwise differences greater than 12 SNPs are shortened for better presentation. remain intact despite the possibility of recombination [46]. These processes might enable adaptation to specific host niches and initiate allopatric speciation [46]. In particular, this may have led to the adaptation of ITS genotype A in SP2 to humans and genotypes EbpA and EbpB in SP3 to pigs. Thus, MLGs in SP2 and SP3 are probably involved in host-specific colonization. SP2 was previously reported to have an epidemic population structure [26,27]. Likewise, genetic structure of SP3 with host-adapted features can also be considered to be epidemic. SP4 consists of E. bieneusi isolates belonging to the ITS genotypes CS-4, Henan-I, Henan-IV, PigE-BITS3, and EbpC. The former four genotypes were shown to infect a very limited number of host species as shown in S4 Table, while EbpC has a wide host range. Thus, the subpopulation probably represents an evolutionary intermediate between SP3 with host-adapted traits and SP5 with cross-species capability. The placement of one human E. bieneusi isolate with ITS genotype EbpC in SP4 supported the zoonotic nature of some E. bieneusi isolates.
Population genetic analysis is an indirect but powerful way to assess reproductive modes that are often difficult to uncover in some microorganisms [42]. Sexual and asexual reproduction alternates in many protozoan species under environmental pressure [47]. It is generally accepted that microsporidia undergo sexual reproduction, whereas some species or genotypes possibly have switched to obligate asexuality [48]. This is supported by the existence of clonal population structure in SP1 and SP5 and epidemic population structure or host-adapted traits in SP3 to SP4. Although a sexual phase might be rare or virtually absent in some microsporidian species, there are footprints of recombination in their genomes [47,48]. This could be responsible for the small number of recombination inferred in subpopulations SP1 through SP5. Although limited recombination would not constitute sufficient evidence for the presence of sexuality in SP1 and SP5, the weakened LD and high levels of MLG diversity might facilitate E. bieneusi to cope with host variations and environmental challenges. The haplotype network suggests that clusters SP3/SP4 may have been derived from SP5 and that SP2 may have arisen from SP1. Thus, SP2 to SP4 with host-specific features might serve as recurrent transitions to asexuality from otherwise SP1 and SP5 with sexual potential and some of the isolates in SP2 to SP4 might have outcompeted or partially displaced their relatives in SP1 and SP5 under certain ecological conditions. This process would limit host range and result in the emergence of highly successful E. bieneusi genotypes. It has been suggested that some asexual microsporidian populations might originate independently several times from their sexual ancestors [48,49]. These results indicate that E. bieneusi might have a sexual phase in its life cycle, sex could be lost or cryptic, and the parasite could switch to obligate asexuality when the population structure becomes epidemic.
In conclusion, we have shown an overall clonal population structure of E. bieneusi in pigs. Combined with the MLST data from primate E. bieneusi isolates, five distinct subpopulations have been defined. Among them, the very strong LD and low genetic recombination are indicative of the epidemic or host-adapted characteristics of SP2 to SP4, whereas the weakened LD and higher genetic diversity in SP1 and SP5 may represent higher potential for cross-species transmission of E. bieneusi infections. These data demonstrate the existence of genetic structure within E. bieneusi ITS Group 1 and the evolutionary potential for adaptation to host species in some of the subpopulations. Nevertheless, additional MLST data from other hosts including humans in China are needed for in-depth assessment of the potential for zoonotic transmission, host adaptation, and population differentiation of E. bieneusi isolates in different hosts.
Supporting Information S1