Multilocus Sequence Analysis for Leishmania braziliensis Outbreak Investigation

With the emergence of leishmaniasis in new regions around the world, molecular epidemiological methods with adequate discriminatory power, reproducibility, high throughput and inter-laboratory comparability are needed for outbreak investigation of this complex parasitic disease. As multilocus sequence analysis (MLSA) has been projected as the future gold standard technique for Leishmania species characterization, we propose a MLSA panel of six housekeeping gene loci (6pgd, mpi, icd, hsp70, mdhmt, mdhnc) for investigating intraspecific genetic variation of L. (Viannia) braziliensis strains and compare the resulting genetic clusters with several epidemiological factors relevant to outbreak investigation. The recent outbreak of cutaneous leishmaniasis caused by L. (V.) braziliensis in the southern Brazilian state of Santa Catarina is used to demonstrate the applicability of this technique. Sequenced fragments from six genetic markers from 86 L. (V.) braziliensis strains from twelve Brazilian states, including 33 strains from Santa Catarina, were used to determine clonal complexes, genetic structure, and phylogenic networks. Associations between genetic clusters and networks with epidemiological characteristics of patients were investigated. MLSA revealed epidemiological patterns among L. (V.) braziliensis strains, even identifying strains from imported cases among the Santa Catarina strains that presented extensive homogeneity. Evidence presented here has demonstrated MLSA possesses adequate discriminatory power for outbreak investigation, as well as other potential uses in the molecular epidemiology of leishmaniasis.


Introduction
Leishmaniasis, a vector-borne disease caused by protozoan parasites of genus Leishmania [1], represents one of the highest disease burdens among the neglected tropical diseases in developing nations [2]. While not often fatal like the visceral form, the cutaneous form of the disease contributes substantially to leishmaniasis disease burden as it requires a lengthy and costly treatment regimen, results in apparent scarring, and can progress to a severely disfiguring mucosal form [1]. In recent years, leishmaniasis outbreaks have been described with increasing frequency [3][4][5], including those in sub-tropical regions or regions not previously endemic across the global [6][7][8]. In Brazil, beginning in 2005, an outbreak of human cutaneous leishmaniasis occurred in the southern Brazilian state of Santa Catarina, where the disease had not been observed previously as endemic. Overtime, cutaneous leishmaniasis has emerged in the region with evidence of a continued transmission cycle [9]. The species responsible for this outbreak has been incriminated as Leishmania (Viannia) braziliensis [9], the most widely distributed Leishmania species in Brazil to date [10,11]. However, many questions still remain regarding the outbreak, such as: is one main strain or various strains responsible for the outbreak; is the emergence of L. (V.) braziliensis in the region a recent event; and how are Santa Catarina strains related to other strains in Brazil? A wide range of molecular tools are available for the investigation of molecular epidemiology of leishmaniasis, but choosing which method and/or markers to use continues to be a challenge [12]. Particularly for New World species, open access databases based on gold-standard genetic markers have not been developed. Currently, outbreak investigation of leishmaniasis, mainly conducted for visceral leishmaniasis outbreaks caused by L. (Leishmania) donovani species complex [13,14], commonly employs multilocus microsatellite typing (MLMT). This technique has been proven to discriminate at the intra-species level [15] with high discriminatory power and is useful for determining outbreak strain origin when a database of MLMT strains is available for the Leishmania species of interest [13]. At the present moment, an open access MLMT database for L. (V.) braziliensis, has not been developed. The high discriminatory power of this technique has its drawbacks depending on the type of epidemiological question or analysis. In some cases, almost 20 ''different'' genotypes can be identified in one focus [13,16,17]. Dividing the isolates into many different genotypes reduces the statistical power of analyses involving epidemiological variables, such as clinical and demographic characteristics of the patient. Such reductions in statistical power greatly reduce the ability of researcher to conclude the relationship of factors like clinical form and disease virulence with a particular genotype.
Thus, epidemiological tools with appropriate discriminatory power, increased reliability and inter-laboratory reproducibility and comparability urgently are required. With these characteristics in mind, the method of multilocus sequence analysis (MLSA) provides a promising alternative. Projected as the future gold standard species typing method [12], MLSA involves sequencing a panel of house-keeping gene loci based on the panel of enzymes used in MLEE [18]. Several markers of these conserved regions have already been described, including ten markers for L. (L.) donovani [19,20], and six markers for New World species [18,21]. However, for L. (Viannia) species, these studies have mainly focused on interspecies discrimination and phylogenetic/taxonomic analysis and have employed only up to four markers. Given the challenges described above, we propose a panel of six gene loci, including three new markers described here for the first time, as an epidemiological tool for investigation of L. (V.) braziliensis outbreaks. In the present study, the recent outbreak in Santa Catarina is used to demonstrate the applicability of this technique in outbreak settings. The overarching objective of this work will be to generate interest in the community of leishmaniasis investigators to create an international sequence database based on these gene markers, as well as other markers from the original MLEE panel, for a more comprehensive and unified investigation into the distribution and epidemiological characteristics of Leishmania species.

Data and sample collection
Leishmania (Viannia) braziliensis strains from eleven Brazilian states (n = 53) were obtained from the Leishmania Collection of the Oswaldo Cruz Institute (Coleção de Leishmania do Instituto Oswaldo Cruz-CLIOC) in Rio de Janeiro, Brazil, and strains from Santa Catarina (n = 33) were obtained from the cryobank of the Laboratório de Protozoologia of the Universidade Federal de Santa Catarina (UFSC), Florianópolis, Santa Catarina, Brazil. Patient data from Santa Catarina used in this study were investigated as part of routine reportable disease surveillance and collection procedures have been previously described in [9]. Santa Catarina isolates were deposited in CLIOC and subjected to MLEE characterization, according to routine procedures employed by CLIOC.

PCR amplification and sequencing
Leishmania promastigotes were cultured at 25uC in Schneider's medium supplemented with 20% heat-inactivated fetal bovine serum. DNA extraction was conducted using the Wizard DNA purification Kit (Promega, Madison, USA), according to manufacturer's instructions.
Amplification was performed for a panel of six housekeeping gene loci listed in Table 1. Primers and PCR conditions have been previously described for 6-phosphogluconate dehydrogenase (6pgd), manose-6-phosphate isomerase (mpi), isocitrate dehydrogenase (icd) [18] and for the heat shock protein 70 (hsp70) [22,23]. Primers for mitochondrial malate dehydrogenase (mdhmt) and nuclear malate dehydrogenase (mdhnc) are described here for the first time. Both follow the reaction condition: for 50 ml, 0,2 mM of each primer, 100 mM Tris-HCl, pH 8.8; 500 mM KCl, 1% Triton X-100; 15 mM MgCl 2 , 0.25 mM deoxyribonucleotide triphosphate (dNTPs), 0.025 U FideliTaq/GoTaq polymerase and 50 ng DNA. Amplification conditions were 94uC for 2 min, followed by 34 cycles at 94uC for 30 s, 52uC for 30 s and 72uC for 1 min, with a final extension at 72uC for 5 min. PCR products were purified and subsequently sequenced with the same primers used in the PCR.

Author Summary
Molecular epidemiology of infectious diseases, which uses pathogen genetics to determine risk factors in the human population, is commonly employed to assist in outbreak investigation. While definitive genetic markers and techniques have been developed for several other bacterial, viral, and parasitic pathogens, the scientific community has yet to agree on an international standard for inter-and intra-species differentiation of Leishmania, the parasite that causes the disease leishmaniasis. As leishmaniasis represents one of the highest disease burdens among the neglected tropical diseases, development of molecular techniques, which allow for inter-laboratory comparability through international sequence databases, is imperative for moving forward with disease control. Based on the current standard technique employed for bacteria, the authors propose a panel of six genetic markers for multilocus sequence analysis (MLSA) for intraspecific differentiation of Leishmania braziliensis, the most widely distributed of the Leishmania species in Brazil. Using strains from a recent outbreak in the sub-tropical nonendemic southern Brazil in comparison with strains from eleven other Brazilian states, the authors provide a practical example of how this technique can be applied in a real world outbreak situation. Consensus sequences were obtained and edited in the software package Phred/Phrap/Consed Version: 0.020425.c (University of Washington, Seattle, WA, USA) and only those with Phred values above 20 were used as contigs. Analyzed sequence fragment lengths for each marker are provided in Table 1. Contigs of all strains were mounted and aligned in MEGA4 (Molecular Evolutionary Genetics Analysis version 4) [24]. Ambiguous sites were divided into two of the possible alleles for all markers using the PHASE algorithm in DnaSP5 [25].

Determination of clonal complexes
Clonal complexes (CC) were defined through BURST analysis in the software eBURSTv3 [26]. The BURST algorithm identified groups of mutually exclusive genotypes associated with a MLSA population and the founding genotype sequence within each group. Then, the algorithm provided the predicted descent from the founding genotype for all other genotypes [26,27]. For this analysis, criterion for CC formation was fixed at the most stringent level with at least five identical alleles for the six loci defining a CC. Sequences which were not able to be grouped into a clonal complex remained in the analysis as unique sequences.

Analysis of population structure
Haploid sequences rebuilt from the PHASE algorithm in DNAsp containing homozygous and heterozygous alleles were imported into STRUCTURE 2.3.4 (University of Chicago, Chicago, IL, USA) to investigate the population structure of the 86 samples of L. (V.) braziliensis based on the six MLSA loci. Using a Bayesian statistical approach, STRUCTURE applies a modelbased clustering method to infer population structure and assign individuals to clusters based on multilocus genotype data [28]. Genetically distinct clusters (K) are identified based on the frequency of alleles, attributing the fraction of each genotype for each sample. In STRUCTURE, runs were performed using a burn-in period of 200,000 iterations followed by 600,000 running iterations. Runs were repeated three times to obtain data suitable for estimating the value of DK (defined as the rate of variation of the log likelihood of data between successive values of K), which provides the most likely K value for the data to be used in STRUCTURE HARVESTER [29]. STRUCTURE HAR-VESTER generates graphs for the change in the log of k and calculation of DK of STRUCTURE results, which were compared for choosing the K that best fit the data. Next, CLUMPP version 1.1.2 [30] was employed to align the multiple replicate analyzes of the same data set. Hierarchical analysis of two to seven K clusters was performed to define the assignment of borderline strains.
Based on clusters found in STRUCTURE, we used Microsatellite Analyser (MSA) [31] to estimate F ST values and Genetic Data Analysis (GDA) version 1.1 [32] to calculate expected heterozygosity (H e ), observed heterozygosity (H o ), and inbreeding coefficient (F IS ). Recombination analysis was performed in Recombination Detection Program (RDP) [33].

Development of median-joining network
To view genetic relationships (phylogenetic network) among strains and differentiation provided by the six markers, the median-joining network was mounted in the program SplitsTree 4.0 [34]. The median-joining network was constructed using concatenated character nucleotide sequences with ambiguous sites for all loci and strains. Nodes of the network, representing individual or groups of strains, were labeled by size, color and/or year/location to reflect epidemiological variables associated with the patient from whom the strain was isolated.

Statistical analysis and mapping
Associations between genetic and epidemiological variables were analyzed in Stata SE 13 (StataCorp LP, College Station, TX, USA). Chi-squared test, or Fisher's exact test when appropriate, was used to assess the relationships between categorical variables. Maps were created in ArcGIS 10 (ESRI, Redlands, CA, USA).

Results
BURST analysis identified three clonal complexes (CC) among the 86 strains of L. (V.) braziliensis, with over half (54.7%, 47/86) of the strains not belonging to any of the three CCs and remaining separate as unique sequence types (Supporting Information S1). A total of 76 distinct sequence types were observed among strains. The analysis was heavily weighted by the homogeneity and large number of strains from Santa Catarina included in the analysis, with the large majority (84.8%, 28/33) of Santa Catarina strains being grouped into one nearly exclusive clonal complex (CC1). Five out of six strains from Santa Catarina that did not group with CC1 were registered as imported cases in the epidemiological investigation. No association was found between CC and clinical form (p = 0.660). Figure 1 shows the geographical distribution of the CCs by state, revealing proportionally higher genetic variation in states from the Amazon biome (94.1% (n = 16/17) unique sequence types) (Supporting Information S1).
Through calculation of DK in the STRUCTURE analysis, the L. (V.) braziliensis strains included in the present study from 12 Brazilian states were found to best fit into three clusters (POP) (Supporting Information S1). Overall, 41.9% (36/86) of strains belonged to POP1, 40.7% (35/86) to POP2, and 16.3% (14/86) to POP3. As in the BURST analysis, the large majority (87.9%, 29/ 33) of Santa Catarina strains formed their own cluster (POP2), which also included four strains from Pernambuco, one from Mato Grosso and one from Bahia ( Figure 2). The four Santa Catarina strains that did not cluster with POP2 were registered as imported cases in the epidemiological analysis. These four strains were the same strains from imported cases that did not cluster in the BURST analysis. Complete strain information can be found in Supporting Information S2.
As shown in Figure 3, POP1 demonstrated the most extensive geographical distribution, including strains from all states analyzed in this study. A distinction can be made between the genetic variation and genetic structure of coastal states, which contain Atlantic forest, and northern states, which are located in the Amazon basin. States of the Amazon region were predominately comprised of POP1 strains, while strains of POP2 and POP3 were mainly found in coastal states.
A significant association between the genetic cluster designated by STRUCTURE and leishmaniasis clinical form of the patient from which the strain was isolated was observed (p = 0.030) ( Table 2). Most strains from cases presenting the mucocutaneous clinical form (4/7) belonged to POP3, including one case from Rio de Janeiro State, one from Pernambuco and two from Bahia.
Based on the scale for the interpretation of F ST suggested by Wright (1978), the estimates showed significant genetic differentiation among the STRUCTURE clusters (  (Table 4). All loci were polymorphic for POP1  Results of the BURST and STRUCTURE analysis were found to be significantly associated (p,0.001) ( Table 5). The majority (37/48) of unique sequences in the BURST analysis were forced into their own population (POP1) in the STRUCTURE analysis, representing mainly strains from the Amazon regions.
Recombination events were detected by seven algorithms in RDP software (p,0.05). However, neither the beginning nor ending breakpoints could be identified, which may have resulted in recombinant misidentification. Nonetheless, one sample from  Santa Catarina (185) and one sample from Bahia (IOC/L 2871) were indicated as potentially parental or recombinant. Thirty-one samples from Santa Catarina had sequences with partial evidence of the same recombination event.
The median-joining network was created from concatenated sequences of the six gene loci for the 86 strains of L. (V.) braziliensis from Brazil. Majority of Santa Catarina strains presented as an evident cluster. Other strains close to the Santa Catarina cluster were from Pernambuco (n = 2), Rio de Janeiro (n = 1), and Pará (n = 1). When the nodes of Santa Catarina strains were highlighted by case origin, all cases not clustered with the principal cluster were imported cases, with the exception of strain 605 (Figure 4). This 605 strain also was grouped within the main Santa Catarina CC and POP in both the BURST and STRUCTURE analyses.
When the strains from cases of mucocutaneous and disseminated clinical form were highlighted, those from Bahia were clustered, while mucocutaneous cases from other Brazilian states appeared closer to the main cluster of Santa Catarina ( Figure 5). When the median-joining network was reduced to only strains from Santa Catarina, the resulting network presented three principal branches. Marked by year and city of leishmaniasis case diagnosis, a main cluster can be observed in the center of the network, representing the epicenter of the outbreak which occurred in 2006 in the municipality of Blumenau ( Figure 6). From this main epicenter, autochthonous cases branched separately, appearing to evolve over time and space to the neighboring municipality of the capital municipality of Florianópolis. The map in Figure 6 shows this main cluster of related Santa Catarina strains was distributed over a distance of 140 km in four years from Blumenau to Florianópolis.

Discussion
Multilocus sequencing analysis (MLSA) was successful in detecting epidemiological patterns among L. (V.) braziliensis strains from twelve Brazilian states. Additionally, the technique was able to detect intra-species variation compatible with epidemiological characteristics within a specific outbreak focus, demonstrating the potential of this technique as a molecular tool for outbreak investigation.
In the BURST analysis, strains were found to group into three clonal complexes. Samples from the Amazon region presented largely as unique sequence types, demonstrating a proportionally higher level of heterogeneity in comparison to coastal states. This distinction is particularly apparent when compared to Santa Catarina. Since the BURST analysis permitted samples not to be grouped into a specific CC and remain as unique sequences, the STRUCTURE analysis was observed to force these unique sequences to form a genetic cluster. This was also evident in the significant association between the two analyses (p,0.001). POP1 was comprised of almost entirely unique sequence types. This high heterogeneity is characteristic of strains from the Amazon biome, as previously observed in other studies, and reflects the large variety of vectors and hosts in the region [10,35,36]. Furthermore, the emergence of L. (V.) braziliensis in the state of Santa Catarina, Brazil appears to be a recent event, given the high homogeneity observed among the analyzed strains. This conclusion is based on the assumptions of the Hardy-Weinberg equilibrium model of populations, which states if no evolutionary pressure mechanism, such as migration in or out of the population or mutation over a long period of time, is acting upon a given population, then the genetic frequencies will remain unaltered [37,38]. Therefore, during the period from which the samples were collected during the outbreak, the strains were largely uninfluenced by outside strains, remaining as their own apparently unique population. This could also be caused by a specific transmission cycle in which other Leishmania strains or species were not easily incorporated. Specific vector-parasite relationships would remove the possibility of recombination given the major selective force on Leishmania populations occurs in the vector hosts during the development of the parasite [39].
MLSA utilizing the panel of six markers was able to distinguish epidemiological characteristics among L. (V.) braziliensis strains. In all three analyses (BURST, STRUCTURE, and median-joining), MLSA results were compatible with case origin evaluated in the epidemiological investigation of Santa Catarina strains. These results demonstrate the potential of this method for use in future outbreak investigations and surveillance. Despite being registered as an imported case in the epidemiological investigation, strain 605 was grouped within the main Santa Catarina cluster in all three analyses, pointedly suggesting this patient was most likely an autochthonous case. In such cases, the molecular characteriza-  tion proved to be a more reliable and precise tool than the epidemiological interview to determine if a case acquired the infection locally or outside of a given region. Results also show the methodology possesses discriminatory power to differentiate imported and autochthonous cases at state macroregion levels.
Knowledge on the origin of a case is important for predicting case  outcome and treatment course, since several studies have shown a relationship between specific characteristics of the infecting parasite and geographical location with the outcome of the patient [9,40,41]. Along the same lines, MLSA showed a significant association between clusters in the STRUCTURE analysis and patient clinical form among the samples analyzed in the present study (p = 0.0296). However, the current study only included seven cases of the mucocutaneous form. A study involving a large representative sample of these cases with controls is necessary to validate these findings. Identification of a genetic marker of Leishmania virulence has not been identified at the present moment [42], and the identification of such a marker would have important clinical and pharmacological significance. Despite the limited number of samples in this study, this methodology could be promising for the identification of a specific L. (V.) braziliensis cluster predisposed to the mucocutaneous form, and therefore, warrants further investigation.
Recombination is often difficult to detect within species because of low inter-strain diversity and/or apparent low diversity due to inappropriate sampling [43]. However, RDP results of the present study were able to reveal recombination occurring between the L. braziliensis strains. This suggests the strains from Santa Catarina may be the result of a clonal expansion from a recombinant event, and the resulting strains then encountered proper conditions to propagate in the state. Previous studies on recombination, including a study on population genetics for inbreeding [44] and a previous MLSA phylogenetic study [18] specifically for L. braziliensis, were able to detect recombination signals as well. In these situations, homologous recombination may have been the responsible mechanism. This phenomenon also may have produced the well-structured clonal complexes in Leishmania in the present study which allowed for the epidemiological inferences to be made.
As no definitive set of markers for MLSA has been defined for the study of populations within a given species of Leishmania, the markers evaluated here could be defined as potential candidates in the panel used for this type of study. Interestingly, the three new markers, hsp70, mdhnc and mdhmt, were the most polymorphic of the six markers, suggesting their addition provided the increase in discriminatory power that allowed for intra-species differentiation. Taken together, these six markers provided adequate discriminatory power to answer epidemiological questions surrounding genetic clusters of a single species. An important benefit of MLSA is the ability to create and store sequences in an international database for global comparison of Leishmania species and strains [15]. The next step will be to determine the viability and discriminatory power of this six loci panel for other species of Leishmania and increase the number of markers and strains sequenced. Four of the markers (6pgd, mpi, icd and hsp70) have already proven to be discriminatory among species of the Leishmania subgenus Viannia, including L.  [18,21].
With the recent increase in development of genetic markers and new statistical methods for analyzing them, the choice of which software is most adequate to your specific analysis is becoming increasingly difficult. No definitive guidelines currently exist [45]. For this reason, we opted to evaluate our MLSA results from three different perspectives, using diverse software (BURST, STRUC-TURE and Splitstree) to arrive at our inferences regarding the genetic structure among the L. (V.) braziliensis included in the present study. The BURST analysis, which is commonly used for MLSA of haploid organisms, such as bacteria, permitted a better comprehension of the genetic variability among the samples using conservative parameters for differentiating clonal complexes. As almost all Santa Catarina strains fit into one clonal complex and the remaining strains were mainly unique sequences, we can conclude the cluster in this state is highly homogeneous in comparison to other states. However, with over half of the strains not grouped in a clonal complex, comparison of genotypes with epidemiological factors was not possible. The STRUCTURE analysis forced all strains into a cluster, resulting in the grouping of all of these unique sequences into their own cluster. This phenomenon shows that, despite the high diversity among the samples from the Amazon region, strains from Santa Catarina continue to be genetically distinct from other Brazilian strains analyzed here. In other words, the diverse genetic clusters within POP1 of the Amazon region, as a whole, are still genetically more distinct from Santa Catarina strains than within themselves, as also shown by F ST and F IS . Interestingly, our study found high positive F IS values (high inbreeding coefficients) among the populations of L. (V.) braziliensis, which negates the hypothesis of strictly clonal reproduction among Leishmania species. High F IS values have also been observed in various MLMT studies for Leishmania, including a study on L. (V.) braziliensis in Bolivia and Peru [46], a study on L. (L.) infantum in Old World and New World strains [47] and a study on L. (L.) donovani in Ethiopia [48]. In these studies, possible explanations of these high F IS values were the presence of considerable inbreeding and/or sub-structuring of the population, reflecting a possible Wahlund effect.
Despite being too complex for comparing all strains among themselves, the median-joining network was the best visual representation for comparing Santa Catarina strains with all other strains from Brazil. This type of analysis is most applicable in an outbreak situation in which strains from a specific area can be compared to other reference strains, allowing for the distinguishing of imported cases and other epidemiological differences. Overall, until software capable of addressing specific genetic Leishmania characteristics, such as infrequent recombination, is created, use of all three types of genetic analyses can be used as an alternative to provide a robust MLSA analysis.
This information on the genetic variability of circulating strains is important for public health and control efforts. Considering drug resistance and complications in treatment have not been observed in Santa Catarina cases, control of leishmaniasis in Santa Catarina where the parasite strains are genetically homogeneous would be expected to be much more efficient than in regions where the parasite presents genetic heterogeneity and a more complex transmission cycle. This factor emphasizes the need for more urgent and active control methods to prevent further introduction of Leishmania strains and/or species, as well as geographical spread of the disease.

Conclusion
MLSA revealed epidemiological patterns among L. (V.) braziliensis strains from twelve Brazilian states, even within the state of Santa Catarina where the strains presented extensive homogeneity. The addition of three markers, hsp70, mdhnc and mdhmt to the previously described panel of markers increased the discriminatory power of the technique, permitting the identification of three genetic clusters within L. (V.) braziliensis strains. All three analyses (BURST, STRUCTURE and median-joining network) provided a complementary and integral part in the interpretation of the MLSA results. When used in tandem with MLMT, these two methods could provide a more robust approach to the molecular epidemiology of leishmaniasis and increased validity of the population structure model. A prospective study design that seeks to include a representative sample of the patient population and active collection of their Leishmania strains is needed to validate this method as a molecular epidemiology tool. However, the present study has provided sufficient evidence of the effectiveness of this method for pursuing further validation of MLSA for leishmaniasis outbreak investigation.

Supporting Information
Supporting Information S1 Summary of clonal complex and genetic cluster of strains by state. (XLS) Supporting Information S2 Strains by sequence type, with data on sex, clinical outcome, origin, and state of origin provided for Santa Catarina strains only. Doubled columns present alternative alleles for those sequences presenting ambiguous sites. (XLS)