Genetic Structure Analysis of Spirometra erinaceieuropaei Isolates from Central and Southern China

Background Sparganosis caused by invasion of the plerocercoid larvae (spargana) of Spirometra erinaceieuropaei have increased in recent years in China. However, the population genetic structure regarding this parasite is still unclear. In this study, we used the sequences of two mitochondrial genes cytochrome b (cytb) and cytochrome c oxidase subunit I (cox1) to analyze genetic variation and phylogeographic structure of the S. erinaceieuropaei populations. Methodology/Principal Findings A total of 88 S. erinaceieuropaei isolates were collected from naturally infected frogs in 14 geographical locations of China. The complete cytb and cox1 genes of each sample was amplified and sequenced. Total 61 haplotypes were found in these 88 concatenated sequences. Each sampled population and the total population have high haplotype diversity (Hd), accompanied by very low nucleotide diversity (Pi). Phylogenetic analyses of haplotypes revealed two distinct clades (HeN+HuN+GZ-AS clade and GX+HN+GZ-GY clade) corresponding two sub-networks yielded by the median-joining network. Pairwise F ST values supported great genetic differentiation between S. erinaceieuropaei populations. Both negative Fu’s F S value of neutrality tests and unimodal curve of mismatch distribution analyses supported demographic population expansion in the HeN+HuN+GZ-AS clade. The BEAST analysis showed that the divergence time between the two clades took place in the early Pleistocene (1.16 Myr), and by Bayesian skyline plot (BSP) an expansion occurred after about 0.3 Myr ago. Conclusions S. erinaceieuropaei from central and southern China has significant phylogeographic structure, and climatic oscillations during glacial periods in the Quaternary may affect the demography and diversification of this species.


Introduction
Spirometra erinaceieuropaei (Cestoidea: Pseudophyllidea: Diphyllobothriidae) is one of the most important species of tapeworms [1]. Its plerocercoid larvae (spargana) can lodge in the subcutaneous tissues and sometimes invade the abdominal cavity, eye, and central nervous system of humans causing a serious parasitic zoonosis known as sparganosis [2]. Human infection results mainly from ingesting raw flesh of frogs and snakes infected with the plerocercoids, drinking raw water contaminated with cyclops harboring procercoids, or placing frog or snake flesh as poultice on open wound for treatment of skin ulcers or eye inflammations [3,4]. Sparganosis is distributed worldwide, but most cases occur in Eastern and Southeastern Asia [5,6]. China has the largest number of sparganosis cases in the world since 1999, with a total of approximately 1,000 instances of human sparganosis being reported in 27 out of 34 provinces, autonomous regions, or municipal districts [7]. In addition, the local cases have increased in recent years and sparganosis has even been termed as emerging enzootic diseases in several districts of China [8,9].
Sparganosis not only poses a serious threat to human health, but also causes significant economic losses [10,11]. Consequently, knowledge regarding the distribution of the pathogen of this disease, the genetic characteristics of its populations in relation to local environmental conditions is valuable for the prevention and control of sparganosis in humans. Unfortunately, insufficient studies on population genetics of S. erinaceieuropaei have been carried out to date. Thus, it becomes important to analyze the population genetics and demographic history of this tapeworm, so that we can get valuable clues about the population changes and genetic variation affecting the pathogenicity [12,13].
Due to high and rapid mutational rate, mitochondrial DNA (mtDNA) remains one of the most powerful and reliable tools for detecting population structure and inferring population differences [14,15]. Previous studies showed that within mtDNA, there are regions that diverge rapidly, while other regions that are highly conserved, making the different regions suitable for analysis of different taxonomic levels [16]. The structure and function of cytochrome b (cytb) and cytochrome c oxidase subunit I (cox1) genes have been verified in mtDNA sequences of cestodes and maintain a moderate evolutionary speed. Thus, cytb and cox1 have been used to study the population structure and genetic differentiation of several tapeworm species [17][18][19][20]. The aim of this study was to investigate the genetic variability, population structure and divergence pattern among S. erinaceieuropaei populations from central and southern China based on cytb and cox1 genes of mitochondrial DNA.

Ethics Statement
The performance of this study was strictly according to the recommendations of the Guide for the Care and Use of Laboratory Animals of the Ministry of Health, China, and our protocol was reviewed and approved by the Institutional Animal Care and Use Committee (IACUC) of Zhengzhou University (Permission No. SYXK 2012-0009). All the frog samples were collected from paddy fields after the permission of farm owners, with no specific permits being required by the authority for the collection of frog samples.

Sample collection
A total of 88 S. erinaceieuropaei isolates were collected from naturally infected frogs caught from a field site in fourteen geographical locations of China during May 2012 to August 2013 ( Fig. 1 and S1 Table). Briefly, frogs were euthanized using ethyl-ether anesthesia, weighed, and skinned. Skeletal muscles and subcutaneous tissues were carefully and visually observed for the presence of spargana. These were removed from muscles or subcutaneous tissues and placed in a Petri dish containing physiological saline.

DNA extraction, amplification and sequencing
Total genomic DNA was extracted from individual sample of plerocercoid using the Tiangen DNeasy Blood and Tissue Kit (Tiangen, China) following the manufacturer's protocol. The whole sequences of cytb and cox1 genes were amplified with Cob-F/R and Cox1-F/R two sets of primers respectively according to Yanagida et al. [21]. Polymerase chain reaction (25μl) was performed in 2mM MgCl2, 2.5μM of each primer, 2.5μl 10× rTaq buffer, 0.5mM of each deoxyribonucleoside triphosphate (dNTP), 1.25U of rTaq DNA polymerase (Takara, China), and 1μl of DNA sample in a thermocycler under the following conditions: after an initial denaturation at 94°C for 1 min, then 94°C for 30 s (denaturation); 58°C for 30 s (annealing); 72°C for 1 min (extension) for 30 cycles, followed by a final extension at 72°C for 5 min. These optimized amplification conditions for the specific and efficient amplification of individual DNA fragments were obtained after varying annealing and extension temperatures. One microlitre (5-10 ng) of genomic DNA was added to each PCR reaction. Samples without genomic DNA (no-DNA controls) were included in each amplification run, and in no case were amplicons detected in the no-DNA controls (not shown). PCR products were purified using the High Pure PCR Product Purification Kit (Takara, China) and sequenced in both directions using an automated sequencer (ABI Prism 3730 XL DNA Analyzer; ABI Prism, Foster City, CA) at the Genwiz Company (Beijing, China).

Nucleotide polymorphism
Sequences of cytb and cox1 genes were aligned using the computer program Clustal X 2.0 [22]. Molecular Evolutionary Genetics Analysis (MEGA) software version 5.0 [23] was employed to analyze the nucleotide composition, conserved sites, variable sites, parsimony-informative sites and singleton sites. The number of haplotypes, calculation of haplotype diversity (Hd) and nucleotide diversity (Pi) were performed using the program DnaSP v5.10.01 [24].

Phylogenetic analysis
The phylogenetic relationship among haplotypes was estimated through three methods of maximum parsimony (MP), neighbor-joining (NJ) and Bayesian inference (BI), respectively. MP analysis was performed in PAUP Ã 4b10 [25] using heuristic searches with TBR (tree bisection-reconnection) branch swapping and 2,000 random addition sequences. Confidence in each node was assessed by boot-strapping (2000 pseudo-replicates, heuristic search of 20 random addition replicates with TBR option). NJ analysis was also performed in PAUP Ã 4b10 [25] using the Kimura two-parameter distance selected by Modeltest v3.7 [26] under the Akaike information criterion and the 'heuristics' search option with the 'simple' addition sequence and TBR swapping. BI analyses were performed in MrBayes v3.1 [27] with 5,000,000 generations, sampling trees every 100 generations. Stationarity was assessed using a convergence diagnostic. An average standard deviation of the split frequencies (ASDSF) < 0.03 were used as criteria of convergence between both runs. Four Diphyllobothrium species: Diphyllobothrium nihonkaiense (Genbank accession number of cytb/cox1: AB508837/AB015755), D. latum (AB522608/AB511963), D. dendriticum (AB522613/KC812045) and D. ditremum (AB522617/ FM209182) were used as outgroup to root the resulting trees. We also used NETWORK v4.5.0.2 [28] to draw a median-joining network to analyze the relationships among detected haplotypes.

Analyses of genetic structure
The partitions of genetic diversity within and among populations were analyzed through analysis of molecular variance (AMOVA) [29] using the software Arlequin v3.5.1.2 [30]. To estimate levels of genetic differentiation among the populations, a pairwise comparison test was performed based on Slatkin's linearised F ST [31] using Arlequin v3.5.1.2 [30]. The significance of F ST evaluated was based on 10 000 random permutations (significance levels p = 0.05).

Divergence time estimation
The approximate divergence times were estimated for the lineages of S. erinaceieuropaei with an uncorrelated log-normal relaxed molecular-clock model using the software BEAST v1.6.1 [32]. The substitution models were HKY+G for cytb and HKY+G+I for cox1 following model selection by Modeltest v3.7 [26]. For the tree prior, a basic coalescent model assuming a constant population size over the time period considered was chosen according to the reduced effective population size resulting from the selfing mode of reproduction of S. erinaceieuropaei. Posterior distributions of the parameters, including the tree, were estimated via Markov chain Monte Carlo (MCMC) sampling. Two replicate MCMC runs were performed, with the tree and parameter values sampled every 1 000 steps over a total of 1 × 108 steps. The diagnostic software Tracer v1.5 [33] was used to assess convergence between runs, to estimate an appropriate number of samples to be discarded as burn-in, and to ensure that effective sample sizes (i.e., >500) were sufficient to provide reasonable estimates of model parameter variance. Log-Combiner v1.6.1 [32] was used to combine both runs. The sampled tree with the maximum product of clade credibilities was viewed using FigTree v1.3.1 [34]. The molecular evolutionary rate were fixed to 0.0195 and 0.0225 substitutions per site per million year (Myr) for cytb and cox1 respectively, according to the substitution rates for cytb and cox1 calculated based on the Taenia tapeworms [19,35].

Demographic history of population
We applied Neutrality tests through the program Arlequin v3.5.1.2 [30] as an assessment of possible population expansion. Under the assumption of neutrality, a population expansion produces a large negative value of Fu's F S test [36] and Tajima's D [37]. Tajima's D and Fu's F S are sensitive to bottleneck effects or population expansion, causing these values to be more significantly negative [38]. Fu's F S is particularly sensitive to recent population growth [36]. Population expansion events were determined through mismatch analysis [39] using Arlequin v3.5.1.2 [30] with the number of bootstrap replicates set to 5000. The validity of the expansion model was tested by using the sum of squared deviations (SSD) and Harpending's raggedness index (Rag) between observed and expected mismatches. The Bayesian skyline plot (BSP) was used to estimate the demographic history of S. erinaceieuropaei using the program BEAST v1.6.1 [32]. A piecewise-constant skyline model was selected, and a relaxed uncorrelated lognormal molecular clock was used with the mutation rates of 1.95%/Myr for cytb and 2.25%/ Myr for cox1 as suggested by Hoberg et al. [35] and Michelet et al. [19]. Tracer1.5 was used to reconstruct the demographic history through time.

Genetic variation
The concatenated sequence alignment contained 88 sequences and 2676 positions, of which 1110 bp were sequenced for the cytb gene and 1566 bp for the cox1 gene. The average base compositions of cytb and cox1 were 46.1% and 46.0% (T), 18.6% and 18.3% (A), 23.5% and 23.6% (G), 11.8% and 12.1% (C) respectively, with AT-richness in the sequences. No insertions or deletions were detected. A total of 161 polymorphic sites were found, of which 150 were parsimony-informative and 11 were singleton-variable. These polymorphic sites identified 61 haplotypes within 88 isolates from fourteen localities (Table 1 and S1 Dataset.). Each sampled population and the total population have high Hd, accompanied by very low Pi ( Table 1). The genetic divergence of cytb and cox1 sequences of S. erinaceieuropaei collected from fourteen geographical locations ranged from 0 to 3.6% and 0 to 4.9%, respectively.

Phylogenetic diversity
The 61 haplotypes of S. erinaceieuropaei formed two distinct clades (clade I and clade II) in all phylogenetic analyses based on three methods: maximum parsimony (Fig. 2), neighbor-joining (S1 Fig) (Fig. 2, S1 and S2 Figs). In the haplotype median-joining network, the 61 haplotypes of S. erinaceieuropaei observed in the dataset also generated two subnetworks: clade I and clade II (Fig. 3).

Population structure
Analysis of molecular variance indicated that most of the observed genetic variation occurs between the two clades (70.57%), whereas differentiation among fourteen endemic populations (HeN-XX, HeN-ZZ, HeN-KF, HeN-ZK, HeN-LH, HuN- (Table 2). As described above, pairwise F ST analyses were performed between specified regions (Table 3). In all estimated 105 pairwise F ST values, only 9 were statistically insignificant. Pairwise F ST values of regions within clade I (HeN+HuN+GZ-AS) and clade II (GX +HN+GZ-GY) were much lower than those between the two groups.

Divergence time
According to the relaxed molecular clock analysis of the concatenated sequences of cytb and cox1 genes (Fig. 4)

Demographic analysis
The results of neutral test analyses were submitted in the Table 4 (Fig. 6). However, both mismatch distribution analyses and the neutrality tests rejected a sudden population expansion in the clade II (Table 4 and Fig. 5).

Discussion
In our study, the AT-richness of cytb and cox1 genes exceeded 64.7% and 64.3% respectively, which were consistent with the results of studies of other tapeworms such as the Taenia species [19,40] and the Diphyllobothrium species [21]. As suggested by Neigel and Avise [41], the nucleotide diversity (Pi) value is an important index to measure the level of genetic diversity, and Pi > 0.01 is considered to indicate a comparatively large variation in most animals. However, most Pi values were lower than 0.01 in this study, suggesting that there was low genetic variation of 88 isolates from 14 regions. The average genetic distances among 88 isolates from central and southern China are lower than 0.05. This is accord with the range of intraspecific genetic distance (0-0.05) reported by Nei et al. [42]. Some pioneering scientists have been concentrated their studies on the phylogeny of S. erinaceieuropaei. Okamoto et al. [43] evaluated the intraspecific variation of S. erinaceieuropaei and its phylogenetic relationship with Diphyllobothrium using partial sequences of the cox1 gene, Dai et al. [44] examined sequence variability among and within three cestodes, S. erinaceieuropaei, Taenia multiceps and T. hydatigena, from different geographical origins in China based on partial cox1, nad4 and ITS genes, Liu et al. [45] explored sequence variability in three mtDNA regions (cox3, nad1 and nad4) in S. erinaceieuropaei spargana from different geographical regions in Hunan province of China, and Liang et al. [46] investigated sequence variability in S. erinaceieuropaei from dogs in Hunan province in China using partial cox1 and a    [47]. The result obtained from AMOVA showed weak genetic variation within populations. The explanation may be that gene flow interrupts between the S. erinaceieuropaei isolates collected from different geographical locations [29]. Meanwhile, most pairwise F ST values between S. erinaceieuropaei populations were higher than 0.25, indicating great genetic differentiation [48]. This evidenced that a long-term interruption of gene flow between two clades [49].
Estimation of divergence time from molecular data requires the selection of appropriate calibration information, such as fossil record or geological evidence [50]. Given a lack of fossils or geological events for S. erinaceieuropaei, we tended to use calibrating information based on substitution rates obtained from Taenia tapeworms for which there is a good fossil record or strong geological evidence dating a vicariance event [19,35]. Our data suggest that the S. erinaceieuropaei lineage separated 1.16 Myr ago, during early Pleistocene. This date is earlier than the divergence time between Asian lineage and African-American lineage of Taenia solium (in the middle Pleistocene) calculated using cytb and cox1 genes [19,51]. The average age estimates of most of the haplotypes imply that they originated within a 0.  Fig. 4), most of haplotypes appeared.
The divergence patterns of species can be driven by climatic fluctuations [52]. Tremendous climatic changes, particularly the Quaternary glaciations have made many plants and animals posterior density interval for the main nodes. Numbers above branches represent the Bayesian posterior probabilities. Only posterior probabilities above 0.6 are shown. Numbers in the square frames indicate the estimated age and 95% confidence intervals (shown in parenthesis). Circled Roman numbers 'I' and 'II' refer to two main clades discussed in the text. Dotted lines with lower case letters refer to divergence times discussed in the text. extinct and influenced the evolution and distribution of many plants and animals in China and its neighboring areas [53]. In our study, the neutrality test Fu's F S result of clade I was significantly negative, and mismatch distribution statistic support the assumption of population expansion of HeN+HuN+GZ-AS group in the past. The BEAST analysis showed that most of haplotypes appeared during the middle and late Pleistocene, coinciding with the Bayesian skyline plot (BSP) analysis: a population expansion occurred after about 0.3 Myr ago (in the middle Pleistocene). The climatic oscillations during glacial periods in the Quaternary may affected the demography and diversification of S. erinaceieuropaei from Henan and Hunan provinces in China. However, the demography and diversification pattern of GX+HN+GZ-GY group requires further research to clarify the definitive reason.
In conclusion, Spirometra erinaceieuropaei from central and southern China can be divided into two populations: the HeN+HuN+GZ-AS group and the GX+HN+GZ-GY group, and the two groups diverged from each other during the early Pleistocene. The climatic fluctuations in the Quaternary probably impacted the demography and diversification of the HeN+HuN+GZ-AS group.