The Genetic Structure of Leishmania infantum Populations in Brazil and Its Possible Association with the Transmission Cycle of Visceral Leishmaniasis

Leishmania infantum is the etiologic agent of visceral leishmaniasis (VL) in the Americas, Mediterranean basin and West and Central Asia. Although the geographic structure of L. infantum populations from the Old World have been described, few studies have addressed the population structure of this parasite in the Neotropical region. We employed 14 microsatellites to analyze the population structure of the L. infantum strains isolated from humans and dogs from most of the Brazilian states endemic for VL and from Paraguay. The results indicate a low genetic diversity, high inbreeding estimates and a depletion of heterozygotes, which together indicate a predominantly clonal breeding system, but signs of sexual events are also present. Three populations were identified from the clustering analysis, and they were well supported by F statistics inferences and partially corroborated by distance-based. POP1 (111 strains) was observed in all but one endemic area. POP2 (31 strains) is also well-dispersed, but it was the predominant population in Mato Grosso (MT). POP3 (31 strains) was less dispersed, and it was observed primarily in Mato Grosso do Sul (MS). Strains originated from an outbreak of canine VL in Southern Brazil were grouped in POP1 with those from Paraguay, which corroborates the hypothesis of dispersal from Northeastern Argentina and Paraguay. The distribution of VL in MS seems to follow the west-east construction of the Bolivia-Brazil pipeline from Corumbá municipality. This may have resulted in a strong association of POP3 and Lutzomyia cruzi, which is the main VL vector in Corumbá, and a dispersion of this population in this region that was shaped by human interference. This vector also occurs in MT and may influence the structure of POP2. This paper presents significant advances in the understanding of the population structure of L. infantum in Brazil and its association with eco-epidemiological aspects of VL.


Introduction
Leishmania infantum Nicole 1908 is a causative agent of visceral leishmaniasis (VL) in endemic countries in Southern Europe, North Africa, West and Central Asia and the Americas [1]. This species belongs to the L. donovani complex [2,3]. Recently, a study that compared microsatellite profiles [4] strongly supported the hypothesis that L. infantum was introduced in the New World during or after the Colonial period, and this introduction may have occurred more than once, as previously proposed [5,6,7,8,9].
Initially, AVL was associated with rural and poor peri-urban areas, but this disease is displaying a new pattern, and urban cases are becoming common [1,14,15,16,17,18,19]. Dogs are the main reservoirs of AVL, and shelters of domestic animals likely act as breeding sites or provide a niche for the maintenance of the vector population close to residences [14,20,21,22,23].
In Brazil, the dispersion of AVL has been associated with the intense human migrations that occurred from the Northeast, where 82.5% of the Brazilian cases occur [16], to the urban centers of the Central West and Southeast Regions (São Paulo [SP] and Minas Gerais States). The Leishmaniases are known to follow anthropic events, such as demographic expansion and migration, hydroelectric plants, road construction or gas pipelines [1,17,18,24,25,26,27].
Recently, the South Region experienced its first outbreak of canine visceral leishmaniasis, which may have been imported from neighboring endemic countries [28].
In most endemic areas of American countries, Lu. longipalpis is the main vector of L. infantum [29]. However, the relevance of secondary vectors has been discussed. In Colombia, the sand fly Lu. evansi was reported to be an alternate vector of L. infantum [30]. Recently, Lu. migonei was indicated as a putative vector in Pernambuco (PE), Brazil, based on the observation of its natural infection with L. infantum and the absence of Lu. longipalpis in the area [31]. In some Brazilian states from the West Region, such as Mato Grosso (MT) and Mato Grosso do Sul (MS), Lu. cruzi has been reported infected by L. infantum [32,33,34]. This sand fly species belongs to the Lu. longipalpis complex, and their overlapping distributions in many areas of the Central West [35,36,37,38] might have important implications for the evolutionary process of vector speciation [39,40,41].
In the Old World, there are many vector species of L. infantum (WHO, 2010). This Leishmania species is known to infect many sand fly vectors, even when strains from the same zymodeme (MON-1) were considered [42]. The L. infantum zymodeme 1 seems to be predominant in the Neotropics; however, based on a set of microsatellite markers, some Central American strains have been shown to be more similar to the European non-MON-1 Figure 1. The geographic origin of Leishmania infantum strains and populations of STRUCTURE analysis. The yellow dots represent the locations of the collections of each analyzed strain. The graphics indicate the proportion numbers of the strains (Y axis) in each population (X axis). The assignment of the strains to a population was performed in the STRUCTURE analysis that was based on the profiles of 14 microsatellite markers. POP1 is a widespread population, and it is predominant in most of the foci. POP2 [4]. This approach, which is known as multi-locus microsatellite typing (MLMT), is based on the polymorphism of short tandem repeats that have been widely applied in different organisms as a result of advances in fragment-size detection [43,44,45,46]. Sets of multilocus microsatellite markers have been developed for Leishmania species, and they have proven to be highly sensitive for assessing genetic variation on an intra-zymodeme level [47,48,49,50,51,52]. These sets of markers have improved studies on the population genetics and molecular epidemiology of VL in Mediterranean areas and in the Middle East [52,53,54,55,56,57,58,59,60,61].
Previous microsatellite-based studies from New World L. infantum indicated the existence of two main populations; one population contains strains from Paraguay, Brazil, Colombia and Honduras and is composed of zymodeme MON-1, and a second population contains strains from Venezuela, Panama, Costa Rica and Honduras and is related to non-MON-1 zymodemes [4]. Substructures within the New World MON-1 population were also detected but failed to reach statistical significance. Furthermore, when the Neotropical and Old World strains were analyzed together, the first population resembled those from Portugal and Spain, from which the New World MON-1 strains likely originated [4,62].
In fact, MLMT has been demonstrated to successfully evaluate the population structure of L. infantum, which can help to elucidate epidemiological aspects, such as the spread of the parasites across endemic areas and the origin of VL outbreaks [4,57,62].
In this study, we described the L. infantum population structure from a representative sample of Brazilian endemic areas. We increased the sampling from a previous study [4] and included strains from outbreaks of exclusive CVL in South Brazil and from re-emerging foci in Central West Brazil. The sample locations also include areas in which vectors other than Lu. longipalpis may participate on the transmission cycle. In addition, some geographical hypotheses were corroborated by the observed MLMTypes. Human influence on the dispersion of genotypes and expectations for further VL studies were also discussed.

Sample set
The samples were obtained from the Leishmania collection of the Oswaldo Cruz Institute (CLIOC -http://clioc.fiocruz.br). We included 162 strains from most Brazilian states and 11 from Asunción, Paraguay ( Figure 1) that were typed by MLEE as L. infantum zymodeme 1. Twenty strains included here were previously analyzed by Kuhls et al. [4] (Table S1). Eighty-four of the samples were from humans, 87 were from dogs, one was from opossum (Didelphis marsupialis) and one was from fox (Cerdocyon sp.). Table S1 provides the sample information, including the international code [1], CLIOC code (IOC/L) and geographic origin.

DNA extraction and PCR amplification
The DNA was isolated and extracted using the Wizard TM Genomic DNA Purification System (Promega, Madison, WI, USA) following the manufacturer's protocol; the DNA was stored at 4uC. The PCRs were performed with 14 primer pairs ( Table 1) that were previously described for studies of species from the L. donovani complex [59]. The forward primers of each pair were conjugated with 6FAM or HEX fluorophores. Fragment-length screening was performed using Mega BACE 1000 (GE Healthcare) or ABI 3100XL (Applied Biosystems) from the Fragment Analysis Platform of the Oswaldo Cruz Institute (Plataforma Genômica -Sequenciamento de DNA -PDTIS/FIOCRUZ; RPT01A) using ET-ROX 400 (GE Healthcare) and GeneScan 500 ROX (Applied Biosystems), respectively, as the size standards.
The length analyses were performed on the Genetic Profile 2.2 (GE Healthcare) and Peak Scanner TM Software v1.0 (available at http://www.appliedbiosystems.com accessed on October 14, 2011). The results from both software packages were standardized based on strains of known sizes.

Data analysis
The STRUCTURE 2.3 software [63] was used to cluster individuals into populations with a model-based approach. The assigned proportion of each individual belonging to each population (membership coefficient Q) was estimated using Bayesian statistics and Markov Chain Monte Carlo simulations with a 100,000 iterations burn-in period that was followed by further 1,000,000 steps. For each value of K (from 1 to 10), 10 iterations were performed to estimate values of DK [64], which were implemented on Structure Harvester v0.6.1 (Earl, 2011; available at http://taylor0.biology.ucla.edu/structureHarvester/ accessed on October 14, 2011). The software CLUMPP 1.1.2 [65] was used to deal with the ''label switching'' performing alignments of the Q values for the chosen number of populations. The barplots of the CLUMPP outfiles were visualized using the Excel software (Microsoft). This clustering approach was used as a base for the population genetics analyses by assigning individuals to the single cluster in which they exhibited the highest Q value.
The distance-based analysis was performed with Populations 1.2. 30 (available at http://bioinformatics.org/tryphon/ populations/accessed on October 14, 2011) using the chord distance of Cavalli-Sforza & Edwards [66] that is considered to be appropriate for closely related groups [67]. The matrix of genetic distances between individuals was used to construct a phylogenetic network with the Neighbor-net (N-net) method [68] on SplitsTree 4.11.3 [69]. The factorial correspondence analysis (FCA) implemented on GENETIX software [70] which places strains in a three-dimensional space according to its relation of genotypes.
Descriptive statistics were calculated for the assigned population using the GDA software (available at http://hydrodictyon.eeb. uconn.edu/people/plewis/software.php accessed on October 14, 2011) to calculate the mean number of alleles (A), the observed (H o ) and expected (H e ) heterozygosity, and the inbreeding coefficient F IS . The significance of F IS was tested in GENETIX software [70] with 1,000 bootstraps.  (6) (1)  The F ST estimates were calculated using MSA software [71] with significance (p) tested with a 1000 permutations.

Descriptive analysis
In this study, we analyzed 162 L. infantum strains from 17 Brazilian states and 11 from one locality in Paraguay (Figure 1; Table S1). The MLMT profiles revealed the presence of 67 genotypes; among these, 45 were unique (Table S1). Unique genotypes were observed for strains that were obtained from the different hosts, including humans, dogs, fox and opossum. The genotype 'TYPE 10' was sampled 52 times from 14 Brazilian states and Paraguay, and it represented 30% of the strains, including both human and dog isolates (Table S1; Table 2). The second predominant genotype was 'TYPE 22', which was mainly composed of MS strains; it included 47% (17/36) of the strains from this state and only one strain from CE. Ten genotypes were shared between humans and dogs ( Table 2; Table S1).
The maximum number of alleles per locus was 8 (mean 2.93; Table 1), and three loci were monomorphic (KLIST7031, Li71-5/ 2, CS20). With the exception of two loci (Lm2TG and Lm4TA), the most frequent allele represented more than 92% of the strains. The overall H o for each locus ranged from 0 to 0.072 (mean = 0.013), and the values were always lower than H e , which ranged from 0 to 0.495 (mean = 0.102) ( Table 1). Overall, the inbreeding coefficient was 0.869, and it ranged from 20.003 to 0.939 (Table 1). The F IS for the two loci that presented negative values were not significantly different from zero (Table 1).

Population structure analyses
The DK values indicated that the variation of the dataset is better explained when three populations are considered ( Figure  S1). The result of K = 3 indicates a population composition of 111 strains for POP1 (36 genotypes), 31 strains for POP2 (19 genotypes) and 31 strains for POP3 (12 genotypes) (Figure 2; Table S1). Analyses with another clustering method implemented on software BAPS 5 [72] showed the same clusters when K = 3, showing the consistence of the populations observed on our dataset (data not shown). The geographic distribution of each population was not homogeneous (Figure 1). POP1 was observed in 17 of the studied localities (16 Brazilian states and Paraguay), and in some areas, it was markedly predominant. Despite the large geographic distribution of POP2 (10 Brazilian states), it was predominant only in MT, where 85% of the strains belong to this population. Seventy-five percent of POP3 was composed of MS strains and the remaining 25% was composed of four other strains, each of which originated from different states.
Even though the DK values favored K = 3 ( Figure S1), we also examined other K values ( Figure S2). We found that at K = 2, the first cluster corresponded to POP1 but displayed a lower level of admixture. The second cluster was composed of POP2 strains, which included samples with high degrees of admixture, and strains from POP3, which had a low probability of admixture as observed for K = 3 ( Figure S2). The POP3 strains formed a cluster at all of the K levels (from 3 to 8) ( Figure S2). Ten strains of POP1 from Espírito Santo State also formed a cluster for K.4 (Q.95% for K = 3; Figure S2). Two additional clusters (with 14 and 16 strains) were observed for K values between five and eight, but they included strains from mixed geographic origins. For some foci (Amazonas [1], Minas Gerais [1], Pará [1] and Rio de Janeiro [3]), few strains were analyzed; although they did not present any specific characteristic, an increase in sample size may provide a more reliable indication of their relationship with other foci. These strains were within POP1, with the exception of that from Pará, which was grouped with the POP2 strains. Furthermore, the strain from Manaus, Amazonas State, displayed dubious anamnesis; this area is not considered to be endemic for VL, and the history of the patient could not be confirmed.

Genetic diversity and phylogenetic analyses
Analyses of the genetic diversity within and between the populations were performed assuming the three populations obtained by the STRUCTURE analysis (POP1, POP2, and POP3; Figure 2; Figure S1). The H o values per locus within the populations ranged from zero to 0.26, and H e ranged from zero to 0.58. H e was always higher than H o . The inbreeding coefficients were high in all but three of the loci (two from POP1 and one from POP3). The overall F IS estimates for each population were 0.888, 0.676 and 0.603 for POP1, POP2 and POP3, respectively. The descriptive statistics of the three populations are shown in Table  S2.
These three populations were supported by significant F ST estimates (p,0.001). The pairwise comparisons (F ST ) were higher when they involved POP3 (0.66 and 0.52 with POP1 and POP2, respectively) than between POP1 and POP2 (0.35).
The three-dimensional FCA confirmed the occurrence of the three populations predicted on STRUCTURE analysis and confirmed by F ST estimates (Figure 3).
The splits graphic (N-net) essentially corroborates the other analyses. However, the presence of large splits indicates incongruences for the positions of some genotypes (Figure 4). POP1 formed a group of genotypes that were separated by short splits, and the other groups were more dispersed and divergent. Some POP2 genotypes were closely related to another population (POP1 or POP3) but do not constitute a distinct cluster. POP3 formed a cluster with simple splits among genotypes.

Discussion
The VL dynamics have changed in recent decades; outbreaks have challenged the public health system, and new scenarios have intrigued researchers. Advances in typing techniques have gradually enhanced the knowledge of the natural history of this widespread disease and have contributed to epidemiologic studies.  Table S1). The distribution of the splits shows the same populations that were determined by the STRUCTURE analysis ( Figure 2); some genotypes from POP2 are closer to POP1, and others are closer to POP3. doi:10.1371/journal.pone.0036242.g004 This study provides new insights into aspects of the population genetics of L. infantum from Brazil and its relation to the ecoepidemiologic aspects of this parasite species in some of its endemic areas in the Americas. The strains from six states were newly sampled with the MLMT approach, including Amazonas, Maranhão, Minas Gerais, São Paulo, Mato Grosso and strains from a recent focus of CVL in South Brazil (Rio Grande do Sul [RS] State).
The MLMT profiles of 173 L. infantum strains, which all belong to the same zymodeme, IOC/Z1, revealed 67 genotypes, which is the same proportion of polymorphic strains (39%) that was observed in a previous study that analyzed L. infantum strains from Central and South America [4]. In general, the H o values were lower than the H e values even within populations, which reflects a lack of heterozygotes, a result that is consistent with other microsatellite-based studies of Leishmania populations [4,47,48,53,54,55,59,60]. Both estimates (H e and H o ) were lower than in the Old World MON-1 strains, as previously described; this corroborates the hypothesis of recent (less than 500 years) introduction of L. infantum in the Neotropics [4]. The lower genetic diversity of the L. infantum populations from the New World when compared to those from the Old World and the high proportion of the predominant allele for each locus support the hypothesis of the introduction and later expansion of a few clones in the Americas since the Colonial period; the expansion likely followed the paths of human migration from the endemic areas of Spain and Portugal [4,62].
Interestingly, the F IS estimates for the POP1 across two loci and one in POP3 were negative, but not statistically different from zero. The F IS estimates for the other polymorphic loci were high, which indicates a depletion of heterozygotes and high endogamy, which is commonly observed in Leishmania species [4,47,48,55,59,60,73,74]. It was previously suggested that the variation in the heterozygosity levels among the loci in Leishmania (Viannia) is associated with genetic exchange among parasites [75]. Another explanation for these high F IS estimates may be the Wahlund effect, which is indicated by small consistent clusters that are observed in the hierarchical structure analysis ( Figure S2).
The use of H e as a measure of genetic diversity was proposed for sexual diploid organisms [76]. In theory, low genetic diversity is expected for strictly clonal organisms and, in absence of recombination, variation would only be generated through mutation events. In this case, the pairs of alleles accumulate mutations and become divergent which would lead to an excess of heterozygotes (Meselson effect) [77,78]. Furthermore, if all of the individuals were heterozygotes, F IS should be equal to 21 and any event of sexual reproduction could easily reduce heterozygosity [79,80,81,82]. The high inbreeding estimates and the depletion of heterozygotes that were observed in Brazilian L. infantum populations suggest a predominantly clonal breeding system; however, even at very low frequencies, sexual reproduction should have occurred, as reported in other studies [4,47,48,55,60,73,74].
Furthermore, the presence of strains that contain traces of more than one population (Figure 2) also indicates the occurrence of gene flow because this mixed ancestry may be interpreted as either migrant descents or recombinants; both of these outcomes are the result of sexual breeding. Understanding the breeding structure of pathogens is not only an academic challenge and it has also epidemiological importance because it might affect the distribution of advantageous mutations (such as those conferring drug resistance) among strains or populations [83,84]. A reproductive system where clonality is associated with low levels of sex could favor the rapid increase in frequency of hybrids with higher fitness [79]. This hypothesis however must be properly tested in Leishmania.
Although the molecular variability in our sampling was low, the clustering analysis detected the existence of three groups (POP1, POP2 and POP3; Figure 2) that were confirmed by FCA and distance-based methods (Figures 3 and 4) and were supported by F statistics. The signs of a population structure among the MON-1 strains from the New World that were analyzed by Kuhls et al. [4] were confirmed in the current work. The absence of statistical support for the New World genetic structure that was observed by Kuhls et al. [4] was likely a result of the lower sample size. POP1 and POP3 represent very similar populations of Sub-Pop1A-INF NW and Sub-Pop1B-INF NW , respectively (Figure 2 in [4]), but they were supported by the F ST estimate in the current study. However, the STRUCTURE analysis showed that many strains can be descendants of migrants, which may also lead to the occurrence of recombination as a consequence of sexual reproduction, as previously described [4,59,60,85].
Interestingly, POP1 contains strains from 16 states and Paraguay, which accounted for 64% of the total sample set and 54% of the genotypes. POP1 includes few samples from MS and MT, where POP3 and POP2 were respectively predominant ( Figure 1). This result may mean that, for some reason, the POP2 and POP3 genotypes possess higher fitness in Central West Brazil, perhaps due to adaptation to a different transmission cycle (see below).
The outbreaks of VL in MS and MT began in the late 1990s and early 2000s [35,86]. A ''historical series'' has shown that in MS, the spread of VL epidemics occurs from west to east, following the construction of a road and gas pipeline [24]. POP3 is composed primarily of strains from MS, where VL has exhibited particularities such as the participation of Lu. cruzi on the transmission [34] and the urbanization of the disease [19,25]. The capital of MS, Campo Grande, accounted for almost 50% of the cases in that state from 2001 to 2006, and these were primarily located in urban areas [86]. Urban cases were also predominant in MT [35]. Although MT and MS share some VL characteristics, they are influenced by different ecoregions. Most strains from MT (86%) were composed of POP2, whereas those in MS were predominantly composed of POP3. Surprisingly, POP2 was more closely related genetically to the widespread POP1 than to the geographic neighbor POP3, which can be seen from the STRUCTURE ( Figure 2) and pairwise F ST analyses.
Despite the presence of Lu. longipalpis in most of the foci from MS and MT, the sympatry with Lu. cruzi seems to be less prevalent in MS, where the latter species commonly accounts for a small percentage of collected sand flies or was simply not recorded in the same area [35,36,37,86,87]. The occurrence of Lu. cruzi is more relevant in a few areas, such as Corumbá (in western MS), one of the first localities on the west-east route of VL in MS. This suggests that this vector probably played an important role in shaping the genetic structure of L. infantum. Perhaps the POP3 genotypes were initially favored by the presence of Lu. cruzi in Corumbá, and then, following human interference, they dispersed through the eastern portion of the state. However, we did not include strains from any areas of high prevalence of Lu. cruzi, which prevents strong conclusions. The expansion of VL from west-east reached western SP [24] and has possibly influenced the distribution of the disease and L. infantum populations in this state. However, this possibility cannot be ruled out by our data because only four strains from Imbú das Artes (eastern SP) were analyzed.
Another way to explain the genetic structure of L. infantum in MS and MT is to assume that the recent spread of VL in Central West Brazil may be a consequence of the expansion from a few strains that were imported from other states, such as those from the Northeast Brazil. In fact, POP3 is predominantly observed in MS, but it is also present in four northeastern states in which VL is highly endemic. Furthermore, 70% of the POP3 MS strains were included in the same genotype 'TYPE 22' (Table S1), which was also recorded for one strain in Ceará State. We cannot exclude the possibility that this Ceará strain does not represent an autochthonous case. However, another explanation for this result is that this genotype entered MS from Northeast Brazil, where a highly successful adaptation facilitated a clonal expansion. It would be interesting to evaluate the possibility of an influence from Eastern Bolivia, although in that country, the only endemic area that has been reported is Los Yungas, La Paz, which does not seem to be connected to the Brazilian foci [88,89].
The clustering of samples from Paraguay and RS in POP1 may indicate that the presence of VL in these areas is not a consequence of VL expansion from MS. It may be difficult to explain how Paraguay and RS are genetically closer to the Southeast and Northeast strains when considering a VL-distribution gap from southern SP to northern RS. Thus, it is conceivable that the dispersion of VL was not linear and may have been carried by asymptomatic hosts [90] from Southeast Brazil to some point in Northeastern Argentina or Southern Paraguay and then expanded to South Brazil. However, it is more likely that the few POP1 strains in MS may represent a dispersal link between this population and those areas.
The L. infantum RS samples that were studied here were isolated from dogs of three foci; two were isolated from areas that share geographic limits with Northeastern Argentina, where new cases were recently described [91] and from which they could be linked.
The recent expansion of VL to South Brazil underscores the importance of understanding the transmission cycle in outbreak areas, such as Argentina and Paraguay, and the genetic relationships between the parasites and the disease source. The Leishmania source should be investigated through molecular epidemiology tools, as was previously performed for L. donovani in Ethiopia, to determine the origin of outbreak strains [57].
In addition to the recent evidence regarding the introduction of L. infantum to the New World [4,62], it is important to understand the dynamics and dispersion of AVL. The geographic population structure that is observed here does not discard the relationship with human migration, urban cycles and new endemic areas, and it also indicates the influence of vector species on the population structure of parasites. Further analyses are needed to elucidate how different vectors shape the variability in L. infantum populations. Moreover, an enhanced coverage and sampling area for some foci and micro-geographical studies could reveal interesting features of the natural history of this still-neglected disease. Figure S1 Delta K (DK) values of Evanno's method calculated with Structure Harvester v0.6.1. The most probable K value K that explains the variation in the microsatellite data set was three.

Supporting Information
(TIF) Figure S2 The distribution of Q values of K populations. The distribution of the alignments for the Q values determined with the CLUMPP software for the chosen number of populations (2,K,8). The DK values indicated that the most probable number of populations is 3 ( Figure S1). The squares show constant clusters with low degrees of admixture. (TIF)