Phylogenetic and Evolutionary Analysis of Chinese Leishmania Isolates Based on Multilocus Sequence Typing

Leishmaniasis is a debilitating infectious disease that has a variety of clinical forms. In China, visceral leishmaniasis (VL) is the most common symptom, and L. donovani and/or L. infantum are the likely pathogens. In this study, multilocus sequence typing (MLST) of five enzyme-coding genes (fh, g6pdh, icd, mpi, pgd) and two conserved genes (hsp70, lack) was used to investigate the phylogenetic relationships of Chinese Leishmania strains. Concatenated alignment of the nucleotide sequences of the seven genes was analyzed and phylogenetic trees were constructed using neighbor-joining and maximum parsimony models. A set of additional sequences from 25 strains (24 strains belong to the L. donovani complex and one strain belongs to L. gerbilli) were retrieved from GenBank to infer the molecular evolutionary history of Leishmania from China and other endemic areas worldwide. Phylogenetic analyses consolidated Chinese Leishmania into four groups: (i) one clade A population comprised 13 isolates from different foci in China, which were pathogenic to humans and canines. This population was subdivided into two subclades, clade A1 and clade A2, which comprised sister organisms to the remaining members of the worldwide L. donovani complex; (ii) a population in clade B consisted of one reference strain of L. turanica and five Chinese strains from Xinjiang; (iii) clade C (SELF-7 and EJNI-154) formed a population that was closely related to clade B, and both isolates were identified as L. gerbilli; and (iv) the final group, clade D, included Sauroleishmania (LIZRD and KXG-E) and was distinct from the other strains. We hypothesize that the phylogeny of Chinese Leishmania is associated with the geographical origins rather than with the clinical forms (VL or CL) of leishmaniasis. To conclude, this study provides further molecular information on Chinese Leishmania isolates and the Chinese isolates appear to have a more complex evolutionary history than previously thought.


Introduction
Leishmaniasis, a geographically widespread disease, is caused by infection with protozoan parasites of the genus Leishmania. The clinical forms are grouped into localized skin lesions of cutaneous leishmaniasis (CL), destructive metastatic nasopharyngeal lesions of mucocutaneous leishmaniasis (MCL), and progressive systemic visceral leishmaniasis (VL), which is fatal if left untreated [1]. The incidence of leishmaniasis is increasing, approximately 0.2 to 0.4 million VL cases and 0.7 to 1.2 million CL cases occur each year, with an estimate of 20,000 to 40,000 deaths as a result of leishmaniasis each year [2]. In recent years, Leishmania-HIV coinfection has been described as an emerging disease [3].
Leishmaniasis is also found in China, especially in the west frontier regions [4][5][6]. According to previous studies, L. donovani and/or L. infantum are the most common etiological pathogens in China [7,8]. A national control program has largely brought the disease under control in eastern China, while leishmaniasis is considered to be endemic or with sporadic outbreaks in western China, including Xinjiang Uygur Autonomous Region, Gansu, Sichuan and Inner Mongolia [9,10]. In 2008-2009, following an outbreak of the desert subtype of zoonotic VL, the number of infant VL infections dramatically increased in Jiashi County, Xinjiang [11].
The Leishmania species in China are complex. Various clinical forms exist, ranging from cutaneous to fatal VL, and they are caused by human/canine pathogenic Leishmania species. Some species infect gerbils and lizards, but these are nonpathogenic to humans [12]. According to different epidemiological characteristics, leishmaniasis in China has been classified into three types [13]: plain, hill and desert foci. The different hosts and vectors for the parasites in distinct foci make unveiling the relationship between Leishmania strains difficult. For example, L. donovani is generally considered to be the pathogen of plain type leishmaniasis, and only sporadic cases of this infection have been reported since the 1980s. In Gansu and Sichuan provinces, hill type cases of leishmaniasis are prevalent, and most patients are children aged less than 10 years. In this instance, domestic dogs are the key reservoir. Xinjiang is a mixed area with two epidemiological types, plain and desert [9,11]. The plain type is endemic in the oases of the plains of Kashgar City, where children aged less than 5 years are susceptible. The desert type is distributed in the desert regions of Xinjiang. Two species were found in this area, one is L. donovani complex that causes autochthonous kala-azar in infants, and the other species is L. turanica, which is nonpathogenic to humans.
Research groups have investigated the phylogenetic relationship of the Chinese Leishmania strains using internal transcribed spacer 1 (ITS1) sequencing [14] and kinetoplast cytochrome oxidase II (COII) gene sequencing [15]. Some studies have classified Chinese isolates as L. infantum [16,17], while other research groups suggest that Chinese Leishmania isolates do not form a monophyletic group (such as L. donovani or L. infantum) due to the presence of a novel, undescribed species of Leishmania [14,15,18]. However, most of these studies included single species isolates or isolates from a limit number of geographic regions. Therefore, the taxonomic diversity and phylogenetic relationship of Chinese Leishmania strains remain unclear.
As morphological distinction of Leishmania species is impractical, the immunological and DNA-based criteria are used to identify the different strains. The universally accepted standard procedure for characterizing and identifying strains of Leishmania is multilocus enzyme electrophoresis (MLEE) [19]. However, the drawbacks of MLEE have restricted its usage [20,21] and this method has been challenged by many alternative methods for species discrimination using molecular markers. These include restriction fragment length polymorphism (RFLP) analysis of the specific gene or gene fragment, such as cpB/gp63 [22], multilocus microsatellite typing (MLMT) fragment analysis [16,17,23], and multilocus sequencing typing (MLST) which targets conserved genes [24,25].
The introduction of molecular approaches, particularly the numerous typing methodologies with various targets, has called for a revision of Leishmania taxonomy [26,27]. Multilocus sequence typing (MLST) have been used to provide new insights into the population genetics, taxonomy and evolutionary history of Leishmania, and multiple specific MLST targets are available for Leishmania species [24,25,28].
Based on the published targets for the sub-genus of Leishmania, we combined five enzyme-coding genes (fh, g6pdh, icd, mpi and pgd) and two conserved genes (hsp70 and lack) to differentiate the Chinese Leishmania isolates and to investigate their phylogenetic relationships.

Ethics Statement
Details of the WHO codes, sources, geographical origins and clinical manifestations of the 28 Leishmania isolates used in this study are listed in Table 1. Fig. 1 shows the different locations from where the Chinese strains had been isolated. We included 22 Leishmania strains isolated from different foci in China and 6 WHO Leishmania reference strains. Two strains, MCAN/CN/11/1101 and MCAN/CN/11/1102, were isolated from two dogs in China in 2011. The other 26 strains included in this study were kindly provided by Professor Wang Junyun at the Shanghai Municipal Center for Disease Control and Prevention, and has been previously analyzed by Wang Y et al. [29]. For the isolation of MCAN/CN/11/1101 and MCAN/CN/11/1102, oral informed consent was obtained from the guardians of the two dogs. The original animal work that produced the samples was conducted in accordance with guidelines of the Chinese Council on Animal Care, as approved by the Animal Care Committee of West China Hospital, Sichuan University.

Parasite Culture and Genomic DNA Preparation
The parasites were isolated as promastigotes on Novy-MacNeal-Nicolle biphasic culture medium and grown in Medium 199 supplemented with 15,20% heat-inactivated fetal bovine serum at 22,24uC. The promastigotes were harvested at room temperature by centrifugation at 4000 rpm for 10 min and washed three times with NET buffer (50 mM Tris-HCl, 125 mM EDTA, 50 mM NaCl). Total genomic DNA was extracted immediately using a proteinase K-phenol/chloroform method [30]. DNA was suspended in TE buffer (10 mM Tris-HCl, 1 mM EDTA) and stored at -20uC until required.

PCR Amplification and Gene Sequencing
Five enzyme-coding genes [24,25] Table 2. After an initial denaturation step for 10 min at 95uC, DNA samples were processed through 30 cycles of 1 min at 95uC, 1 min at the annealing temperature (T A ) indicated in Table 2, 90 s at 72uC, followed by a terminal elongation step of 10 min at 72uC. Amplification products were examined by electrophoresis on 1.5% agarose gels and stained with ethidium bromide.
PCR products of the expected size were sequenced directly in both strands by ABI 3730 DNA Sequencers (Applied Biosystems) using the same primer pairs in the initial PCR reactions. The sequences were assembled with the aid of ContigExpress (http:// www.contigexpress.com/), and a consensus sequence was generated for each strain. Apart from strain WDD23 for g6pdh, no gaps or insertions was detected in the sequences obtained from the strains used in this study. The gene sequences were deposited in GenBank (accession numbers shown in Table 1).

Nucleotide Sequence Analysis
The sequences were aligned using ClustalW program in the MEGA 5.0 package [33]. The aligned matrix from this procedure was manually rechecked and minor adjustments were made if required. The transition/transversion ratios were calculated using the Kimura-2-Parameter and the synonymous and nonsynonymous substitution rates of selection were calculated using the modified Nei-Gojobori method (Jukes-Cantor distance model) with the MEGA 5.0 package.

Protein Sequence Analysis
Molecular weight, overall charge, and charge variation according to theoretical isoelectric point (pI) for each partial protein sequence were determined using online resources (http://web. expasy.org/compute_pi/).

Phylogenetic Analysis
Phylogenetic trees were constructed with each single gene sequence of fh, g6pdh, icd, mpi, pgd, hsp70 and lack with one or no heterozygous genes. The DNA sequences of strains with one or more heterozygous sites were deconstructed into two (or several) individual sequences. As the similarity in sequences produced poor resolution result of a single gene tree, the concatenated alignment of the nucleotide sequences for the five enzyme-coding genes or all seven genes were analyzed. Prior to phylogenetic analyses, the most appropriate model of evolution, GTR+I+G, was determined using jModelTest v.0.1.1 [34] under the Akaike Information Criterion (AIC). Different statistical methods (for example, the maximum likelihood tree, UPGMA tree and neighbor-joining tree) were calculated in MEGA 5.0 [33]. The support of monophyletic groups was assessed by the bootstrap method with 1,000 replicates. Simultaneously, the PAUP (version 4.0b) [35] program was used to construct phylogenetic trees by maximum parsimony analyses and to perform bootstrap with 1,000 replications.
A set of additional sequences of 25 strains (shown in Table 1, 24 strains belong to the L. donovani complex and one strain belong to L. gerbilli) were retrieved from GenBank for the global analyses. The retrieved sequences were from the studies of Mauricio et al. [24] and Zemanova et al. [25]. Alignment simulations were performed using the ClustalW program. Phylogenetic and molecular evolutionary analyses were conducted using MEGA 5.0 and PAUP 4.0b as mentioned above.

PCR Amplification of the Seven Genes
Using the primers listed in Table 2, a single product of the expected size for each gene was amplified from each of the Chinese Leishmania strains and the WHO reference strains. Three strains (LIZRD, KXG-E and M2903) only yielded the gene sequences of hsp70 and lack. As only the region between the primers was sequenced, only partial section of the entire coding region of each gene was analyzed. The lengths of the sequences, the nucleotide positions of the start and the end of the sequences analyzed for the genes, their G+C contents and the transition/ transversion ratios (Ts/Tv) were given in Table 2. The nonsynonymous/synonymous substitution ratios (dN/dS) varied, though they were all below 1.00 for seven genes. On a Z-test for selection, all the genes were neutral.

DNA Sequences
Two major groups of aligned sequences were noted. The first group included 13 Chinese isolates and a WHO L. donovani reference strain DD8. The second group included five Chinese isolates (strain KXG-Y, KXG-R, KXG-57, KXG-11 and QITAI-15) and a WHO L. turanica reference strain 3720 as reported in a previous study [29]. When the two groups were compared, several nucleotide differences at numerous sites were found, so the DNA sequences were analyzed for the two groups, respectively. Table 3 shows the polymorphisms within the group of 13 Chinese isolates. The number of single nucleotide polymorphic (SNP) sites varied among the genes. For example, there were seven mutations in the 1671 bp (7/1671) section of the fh gene, of which four were silent and three resulted in altered amino acid residues. Six SNPs (6/1578) were observed in g6pdh, and no alternative amino acid was identified. Two mutations were detected in both the icd and pgd genes, no amino acid change was observed in icd, and one position was involved with an amino acid change in pgd. Five diversity nucleotides were displayed in mpi, with one change in an amino acid residue. A low level of polymorphism was revealed for hsp70 among the isolates. There were three heterozygous sites for strain 801. A high degree of similarity was observed among the lack gene sequences, and no amino acid changes were detected.
For the six strains (3720, KXG-Y, KXG-R, KXG-57, KXG-11 and QITAI-15) of L. turanica mentioned above, the sequences had a high degree of similarity, apart from the synonymous mutations that occurred at a few mutant sites. Two strains LIZRD and KXG-E did not amplify the five enzyme-coding genes (fh?g6pdh?icd?mpi and pgd), so there is no accession numbers in GenBank. c For the 6 WHO reference strains, we did not submit the corresponding sequences obtained here to GenBank because the sequences have been studied and submitted to GenBank by other researchers. Meanwhile, we aligned the sequences of the enzyme-coding genes and hsp70 gene for the 6 reference strains we determined with those sequences retrieved from GenBank, the sequences for the genes we obtained were the same with those in GenBank, respectively. d Strains whose sequences taken from GenBank were studied using MLST by Mauricio

Phylogenetic Trees
The initial tree constructed based on individual gene comparisons did not produce discrepancy tree topologies (data not shown). The tree shown in Fig.2 was constructed based on the concatenated sequences of hsp70 and lack using the neighborjoining method in MEGA5.0 package. The phylogenetic analysis of hsp70 and lack showed that 13 Chinese Leishmania isolates grouped as a population in clade A, which were well separated from other strains. Clade A was further divided into two subpopulations, clade A1 and clade A2. Five Chinese strains formed a group (clade B) with WHO L. turanica reference strain 3720, as reported previously [29]. Two distinct groups were also produced, one including SELF-7 and EJNI-154 (clade C) which were closely related to clade B in the tree, and the other population (clade D), consisting of LIZRD and KXG-E, was separated from the other Chinese isolates. Phylogenetic tree was also constructed based on the concatenated sequences of the seven genes (Supplement Fig.1). Only 25 isolates were included in the analysis: three isolates, LIZRD, KXG-E and M2903, were excluded as only hsp70 and lack genes were amplified. The tree topologies calculated by using different statistical methods or substitution models did not differ, so only the tree constructed by the neighbor-joining method based on Kimura 2-Parameter distance matrices in the MEGA 5.0 package (Supplement Fig.1) was presented, bootstraps with 1,000 replicates were applied. Three major clades (clades A, B and C) were also recognized in the neighbor-joining tree based on the combined sequences. Clade A included a WHO L. donovani reference strain DD8 and Chinese isolates which were pathogenic to humans and canines, and the Clade A can be subdivided into two general subclades: subclade A1 included the majority of strains from humans and sandflies; subclade A2 contained the strains isolated from humans and canines. The coexistence of different disease phenotypes (VL and CL) in the same species complex (clade A) was noted in this study, more strikingly, the same genotype from different disease phenotypes. In a previous study, incongruence between the Leishmania genotypes and the VL/CL disease phenotype of the isolates had also been noted [7]. Clade B included five Chinese Leishmania strains and a WHO reference L. turanica strain 3720. A distinct group, clade C, was detected for strain SELF-7 and EJNI-154, which were identified as L. gerbilli.
We also investigated the phylogenetic relationships between Chinese Leishmania isolates in conjunction with isolates from other geographical regions. The trees generated by the entire dataset are displayed in Fig. 3 (and Supplement Fig.2). In both the neighborjoining and maximum parsimony trees, 13 Chinese isolates formed two subclades which were separate from other strains of the L. donovani complex isolated from Europe, India and Africa. The branches that clustered in the tree reflected the geographical origins of the Leishmania strains. The six isolates from Xinjiang, Sichuan and Shandong Province in China comprised the subclade A1, the other seven isolates from Xinjiang, Sichuan and Gansu provinces in China supported the subclade A2. Additionally, the strains in clade A1 were close to the Indian and Africa isolates, while strains in clade A2 were close to the European isolates.

Discussion
Multilocus sequence typing (MLST) has the potential to provide a substantial contribution to the understanding of the epidemiol- Figure 2. Neighbor-joining unrooted tree constructed from hsp70 and lack genes for 28 isolates in this study. The Kimura-2-parameter method was used. Numbers above branches correspond to bootstrap valued based on 1,000 replicates. The strains were designated by their names (See Table 1  ogy, transmission and phylogenetics of infectious diseases, and the data can then be shared by depositing or withdrawing information from the GenBank, EMBL or DDBJ databases [36]. In this study, seven gene targets (fh, g6pdh, icd, mpi, pgd, hsp70 and lack) were described for Leishmania strains in China, and the phylogenetic relationships and evolution of Chinese Leishmania isolates were investigated.
Despite the low level of SNP variation observed in our study, the existing differences of concatenated sequences reveal significant genetic diversity. The neighbor-joining or maximum parsimony methods identified several clusters when gene sequences of Chinese Leishmania strains were aligned with those of the WHO reference strains. In fact, these clusters were also observed in other tree topologies using different models (data not shown), which indicates that the derived groups were robust and not dependent on the choice of evolutionary methods underlying the various treebuilding algorithms. Within the Leishmania subgenus, there was correlation between intraspecies divergence and the geographic distribution of the species. Within clade A, isolates from Shandong and Karamay of Xinjiang (clade A1) apparently differed from the majority of the closely related strains isolated from Gansu and Kashgar of Xinjiang (clade A2). This further demonstrated that the geographic origin of a strain was a more important predictor of genetic relationship than the type of disease presentation [37].
It has been proposed that environmental changes of human origin may be attributed to population structure and phylogenetic diversity [38]. The hosts play important roles in determining the outcome of leishmaniasis. The parasites in different hosts may develop different escape responses that result in variable mutation rates. Minor differences exist between strain SC6 from a patient and SC9 from a canine may have been a result of the host/ reservoir origins. More isolates from humans and canines in Sichuan Province are needed to illustrate the phenomenon. . Phylogenetic trees constructed based on the sequences of the fh, g6pdh, icd, mpi and pgd genes for 25 isolates in this study and 24 isolates of the Leishmanaia donovani complex from other studies. Neighbor-joining tree was constructed from five enzyme genes for the 49 isolates using MEGA 5.0 software. The Kimura-2-parameter method was used. Numbers above branches correspond to bootstrap values based on 1,000 replicates. The strains were designated by their names (See Table 1  In clade B, five isolates from Xinjiang, an autonomous region of China, were highlighted by substantial genetic differentiations from L. donovani complex at the species level. The result of seven conserved genes indicated that the five strains were closely related to strain 3720, which was classified as L. turanica. Our results support the conclusion of previous studies [29] that these five strains were grouped as L. turanica. Two strains (SELF-7 and EJNI-154) in clade C have been previously identified as L. gerbilli [29]. The sequences of g6pdh, icd and mpi for strain SELF-7 and EJNI-154 were aligned with the WHO L. gerbilli reference strain, MRHO/CN/1960/Gerbilli, in the analysis, and strain SELF-7, EJNI-154 were corroborated as L. gerbilli. Additionally, five enzyme genes did not amplify from three isolates (LIZRD, KXG-E and M2903). Strain LIZRD and KXG-E were grouped into clade D in Fig. 2 when the hsp70 and lack genes were analyzed. The strain KXG-E was genetically similar to Sauroleishmania strain LIZRD described by Wang et al. [29]. As Sauroleishmania and Leishmania are genetically significantly different [39], the changes in the primer annealing sites for strains LIZRD and KXG-E may have caused the failure in amplification. New primers located in the intragenic region of the corresponding enzyme-coding genes were designed and only part of the sequence for the single enzymecoding gene was amplified. For strain M2903 of L. braziliensis, a New World species that belongs to a different sub-genus (Viannia), it was found to be distantly related to the Old World species. The enzyme-coding genes of this strain were not amplified with the primers used in this study as a result of the inappropriate primers.
Based on the trees constructed from our database and other sequences of the L. donovani complex worldwide, the results demonstrate that Chinese Leishmania isolates (clade A1 and A2) are sister to other members of the L. donovani complex. L. donovani complex species from China were segregated into two groups: clade A1 was most closely related to L. donovani strains from India and Africa, whereas clade A2 species were found to be related to the European strains of L. infantum. In the tree shown in Fig. 3, the clade A1 strains were placed between the Leishmania species from India and Africa. Previous finding has shown that Indian and African L. donovani strains differ considerably, and our results were in accordance with the previous study [40]. Thus, we further provide an evidence of a robust grouping of Chinese Leishmania when compared with the worldwide isolates, which provides a strong indication that the isolates from China have a more complex evolutionary history than previously thought. This study provides a basis for a further understanding of the phylogeny and evolutionary history of Leishmania parasitic species in China.
Whether Chinese Leishmania isolates are actually grouped as an undescribed species on the basis of new insights is a matter of debate. In the present study, sampling bias may exist for phylogenies, even though we tried to overcome this problem by selecting strains from diverse geographic origins and hosts. In addition, sequence variants may exist among different cells in a given population when uncloned cultures are used for DNA isolation. The direct DNA isolation of Leishmania from clinical samples would avoid this potential complication in the analysis. Further studies based on the current dataset of Chinese Leishmania isolates should be undertaken to gain further insight into the classification and evolution of Leishmania parasites. A systematic comparison of a statistically significant number of worldwide isolates would be an ideal way to elucidate the evolutionary profile of Leishmania species. Figure S1 Phylogenetic trees constructed based on sequences of the fh, g6pdh, icd, mpi, pgd, hsp70 and lack genes for 25 Leishmania isolates in this study. The neighbor-joining unrooted tree was constructed using MEGA 5.0 software. The Kimura-2-parameter method was used. Numbers above branches correspond to bootstrap values based on 1,000 replicates. The strains were designated by their names (see Table 1 for more details). (TIF) Figure S2 Phylogenetic trees constructed based on the sequences of the fh, g6pdh, icd, mpi and pgd genes for the 25 isolates in this study and 24 isolates of the Leishmania donovani complex from other studies. Maximum parsimony tree constructed with the sequences of five enzyme genes for 49 isolates using the PAUP 4.0b program. The trees were rooted with L. tropica (K27) and L. major (5ASKH). Numbers above branches correspond to the bootstrap values based on 1,000 replicates. The strains were designated by their names (see Table 1 for more details). (TIF)

Author Contributions
Conceived and designed the experiments: CZ XL YM. Performed the experiments: CZ XL LS YM. Analyzed the data: CZ XL XD JJ YM. Contributed reagents/materials/analysis tools: XL YM. Wrote the paper: CZ YM.