Comparison of Two Multilocus Sequence Based Genotyping Schemes for Leptospira Species

Background Several sequence based genotyping schemes have been developed for Leptospira spp. The objective of this study was to genotype a collection of clinical and reference isolates using the two most commonly used schemes and compare and contrast the results. Methods and Findings A total of 48 isolates consisting of L. interrogans (n = 40) and L. kirschneri (n = 8) were typed by the 7 locus MLST scheme described by Thaipadungpanit et al., and the 6 locus genotyping scheme described by Ahmed et al., (termed 7L and 6L, respectively). Two L. interrogans isolates were not typed using 6L because of a deletion of three nucleotides in lipL32. The remaining 46 isolates were resolved into 21 sequence types (STs) by 7L, and 30 genotypes by 6L. Overall nucleotide diversity (based on concatenated sequence) was 3.6% and 2.3% for 7L and 6L, respectively. The D value (discriminatory ability) of 7L and 6L were comparable, i.e. 92.0 (95% CI 87.5–96.5) vs. 93.5 (95% CI 88.6–98.4). The dN/dS ratios calculated for each locus indicated that none were under positive selection. Neighbor joining trees were reconstructed based on the concatenated sequences for each scheme. Both trees showed two distinct groups corresponding to L. interrogans and L. kirschneri, and both identified two clones containing 10 and 7 clinical isolates, respectively. There were six instances in which 6L split single STs as defined by 7L into closely related clusters. We noted two discrepancies between the trees in which the genetic relatedness between two pairs of strains were more closely related by 7L than by 6L. Conclusions This genetic analysis indicates that the two schemes are comparable. We discuss their practical advantages and disadvantages.


Introduction
Leptospirosis is a common zoonotic disease worldwide, with a particularly high prevalence in warm humid countries [1][2][3][4]. About 350,000 severe cases of leptospirosis are estimated to occur annually, with case fatality reports up to 50% [5][6][7]. Reported cases are likely to be a gross under-estimate of global incidence rates, the result of a combination of factors including lack of surveillance, diagnostics and notification in those countries with the highest disease burden. Leptospirosis is currently considered a globally re-emerging disease, with frequent outbreaks in South East Asia (including Thailand, India, The Philippines and Sri Lanka) as well as in Latin America [3,[8][9][10][11][12][13][14]. International travel also leads to presentation of leptospirosis cases in settings where incidence is low and clinicians are unfamiliar with its clinical manifestations [7,15].
Multilocus sequencing typing (MLST) has been widely adopted for the study of bacterial evolution and population biology of a large number of microbial species [38], and represents the leading molecular method for bacterial genotyping. MLST based on 7 housekeeping loci has been developed for Leptospira [39], and is supported by a publically accessible database by which genotypes can be readily assigned as known or new sequence types. An alternative sequence based genotyping scheme of 6 loci including housekeeping genes, a 16S rRNA gene and genes encoding surface-expressed proteins has also been developed and used by several groups. This has led to uncertainty as to which scheme should be adopted. The aim of the current study was to compare the two schemes in terms of their discriminatory ability, both within and between species, by generating data using both schemes for a single set of isolates. We also discuss the practical aspects relating to each scheme.

Leptospira isolates and DNA isolation
The Leptospira isolates used in this study and their providers are shown in Table 1. Genomic DNA was extracted from laboratory bacterial cultures as described previously [39,40].

Genotyping
All isolates were evaluated using both genotyping schemes [39,40]. The MLST scheme described by Thaipadungpanit et al. (2007), is based on pntA, sucA, pfkB, tpiA, mreA, glmU and fadD [39], and the scheme described by Ahmed et al. (2006) is based on adk, icdA, secY, rrs2, lipL41, and lipL32 [40]. The terms 7L and 6L have been adopted throughout to refer to the 7 and 6 gene schemes, respectively. No modifications were made to the published primers or cycling conditions of 7L. Table 2 lists the primer pairs used for 6L. Four of the 12 primers (adk-F, adk-R, secY-R and icdA-R) were modified compared with the published 6L scheme, and used in a repeat PCR reaction in the event that the original primers failed to generate an amplicon. Cycling conditions were as described previously for 6L, with the exception that reactions using the four new 6L primers had a reduced annealing temperature of 54uC. Sequence data were edited using SeqMan software contained within the DNASTAR package (DNASTAR Inc., Wisconsin, USA). The region of sequence used to define each locus of 7L was as described previously [39], but the region used to define each locus of 6L was altered as follows. Three loci (secY, lipL32 and lipL41) were changed because the published PCR product and the region of sequence used to define the locus were either identical (secY and lipL32) or different by just two bp [40]. This meant that we were unable to obtain high quality sequence traces for the first 10-20 bases of the amplicon, and so trimmed the sequence in frame by approximately 20 bp at either end for all three genes. The other 3 published loci of 6L (adk, icdA and rrs2), were trimmed by one or two bases to put them in frame, which simplifies the analysis of synonymous and non-synonymous substitutions. The sequence start and end points for the 6 loci of 6L are shown in Table 2.
The alleles at each of the 7L loci were assigned and the sequence type (ST) defined using the publically accessible Leptospira MLST website (http://leptospira.mlst.net/). Allelic numbers, profiles and STs were not generated for the 6L data.

Sequence analysis
Sequence alignment, nucleotide diversity and reconstruction of phylogenetic trees were performed using Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0 [41]. Mean pairwise distances (p distance) were calculated using the Kimura Two Parameter nucleotide substitution model. Synonymous (dS) and non-synonymous (dN) nucleotide substitutions were calculated based on the Modified Nei-Gojobori method with Jukes Cantor correction using MEGA 4. Neighbor joining trees were reconstructed based on concatenated sequences of each scheme using the Kimura Two-Parameter substitution model. Gene order of the concatenated sequences were glmU, pntA, sucA, fadD, tpiA, pfkB, and mreA for 7L, and adk, icdA, lipL32, lipL41, rrs2, and secY for 6L. Discriminatory ability (D value) and 95% confidence intervals (CI) were estimated as described previously [42,43]. These values were verified using the LIAN web tool housed on pubmlst.org [44]. A sliding window analysis of withinand between-species variation was carried out using DNAsp v. 5.0 [45]. An initial ''window'' of 400-bp was selected, as this is roughly equivalent to a single allele. The first window was thus from base 1 to base 400 of the concatenated sequence. From this we took each species in turn and calculated the average number of nucleotide differences per site over all pairwise comparisons (p), to give the within species polymorphism. Similarly, we calculated the number of fixed differences between species (substitutions) per site to gauge the divergence between L. interrogans and L. kirschneri. The window region was then moved along 50-bp and these parameters recalculated. GenBank accession numbers of 6L generated sequences are JF509178-JF509357.

Discriminatory power of the two genotyping schemes
A total of 48 strains and isolates belonging to L. interrogans (n = 40) and L. kirschneri (n = 8) were included in this study, of which 17 were reference strains and 31 were clinical isolates -

Author Summary
Two independent multilocus sequence based genotyping schemes (denoted here as 7L and 6L for schemes with 7 and 6 loci, respectively) are in use for Leptospira spp., which has led to uncertainty as to which should be adopted by the scientific community. The purpose of this study was to apply the two schemes to a single collection of pathogenic Leptospira, evaluate their performance, and describe the practical advantages and disadvantages of each scheme. We used a variety of phylogenetic approaches to compare the output data and found that the two schemes gave very similar results. 7L has the advantage that it is a conventional multi-locus sequencing typing (MLST) scheme based on housekeeping genes and is supported by a publically accessible database by which genotypes can be readily assigned as known or new sequence types by any investigator, but is currently only applicable to L. interrogans and L. kirschneri. Conversely, 6L can be applied to all pathogenic Leptospira spp., but is not a conventional MLST scheme by design and is not available online. 6L sequences from 271 strains have been released into the public domain, and phylogenetic analysis of new sequences using this scheme requires their download and offline analysis. further referred to as strains (Table 1). Nine strains had been evaluated previously by both schemes [39,40], and 39 strains typed previously by only one of the two schemes were typed by the other scheme during this study. Two strains (a Thai clinical isolate strain L1207 of unknown serovar and a reference strain of serovar Kuwait strain 136/2/2) could not be typed using 6L as both had a deletion of three nucleotides in the lipL32 sequence. These two strains were excluded from further analysis. 7L resolved the 46 strains into 21 STs, shown in Table 1. 6L data were analysed off line, and the alleles at the six loci given arbitrary allelic numbers to construct an allelic profile and determine the number of genotypes. This demonstrated a total of 30 genotypes (data not shown). Overall levels of diversity (D) were comparable for the 7L and 6L schemes (92.0 (95% CI 87.5-96.5) and 93.5 (95% CI 88.6-98.4), respectively). The discriminatory ability per locus ranged from 59% (sucA) to 87% (glmU and mreA) for 7L and 66% (rrs2) to 92% (secY) for 6L (Table 3). All D values were verified using the LIAN web tool housed at pubmlst.org and found to be identical to the values shown. The majority of alleles of both schemes were species specific (that is, found in either L. interrogans or L. kirscheri but not both). There were three exceptions where alleles were found in both species, as follows: 7L, allele 1 of sucA; 6L, one allele of lipL32 and one allele of rrs2.

Nucleotide diversity of genetic loci
Overall nucleotide diversity (based on concatenated sequences) for the 46 isolates was 3.6% and 2.3% for 7L and 6L, respectively ( Table 3). The diversity within L. interrogans was lower than that within L. kirschneri (0.5% and 1.1% for 7L, and 0.4% and 0.8% for 6L, respectively). Table 3 also details the nucleotide diversity by locus. This ranged from 3.6% to 6.1% for 7L, and 0.5% to 6.7% for 6L. The lowest diversity was observed for lipL32 and rrs2 of 6L. The dN/dS ratios calculated for each locus indicated that none were under positive selection (that is, all values were lower than 1) ( Table 3).
A sliding window analysis of the concatenated sequences was performed to provide a visual comparison of the degree of polymorphism within both species, and the level of divergence between them. This revealed a generally higher level of variation within L. kirschneri compared to L. interrogans, particularly at sucA (7L) and to a lesser extent lipL41 (6L), although the sample size for the former species was very small (n = 8) (Figure 1). This analysis confirmed that the degree of within species polymorphism showed very little difference between the 7L and 6L scheme. However, 7L tended to provide better resolution between species, which was largely accounted for by the low level of divergence for lipL32 and rrs2 of 6L.

Relatedness of Leptospira spp. inferred from the two genotyping schemes
Neighbor joining trees were reconstructed for 7L and 6L based on the concatenated sequences of their respective loci ( Figure 2). Both trees showed two distinct groups corresponding to L. interrogans and L. kirschneri. There were also several obvious Table 2. Primers for 6 locus genotyping scheme used during this study [39]. similarities within L. interrogans between the two trees. For example, the clonal structure of ST34 and ST46 as defined by 7L was maintained by 6L. A common finding, however, was that 6L had a tendency to split single STs as defined by 7L into closely related clusters. For example, the three isolates designed as ST49 by 7L were split into three different genotypes by 6L. Further examples of splitting of a clone by the 6L scheme were 7L ST42, ST37, ST68 and ST17. A number of discrepancies were noted between the two trees. Two strains of L. kirschneri (strains Moskva V and Kipod 179) were designated by 7L as ST110, but these were resolved into different genotypes by 6L. These two strains differed by 9 nucleotides over 3 loci, with secY accounting for 7 of these. A difference was also noted for L. interrogans strain 654 (a Thai clinical isolate), which was closely related to L. interrogans strain Hard- Figure 1. Sliding window analysis of concatenated sequence of all 13 loci. Sliding window analysis of concatenated sequence of all 13 loci, carried out using DNAsp v 5 using a window size of 400-bp, a step size of 50-bp, and points based on the mid-point of each window (i.e. the first point is at position 200). The names of the individual loci are shown. Three plots are given to represent the level of polymorphism within each of the two species, and the level of diversity between them. In terms of the within species variation, there is little difference between the two schemes and both point to generally higher levels of variation within L. kirschneri than L. interrogans. However, there are two loci used in the 6L scheme that are highly conserved between species (lipL32 and rrs2), which means that in general the 7L scheme provides better between-species resolution. doi:10.1371/journal.pntd.0001374.g001 joprajitno by 6L (differing by only 1 nucleotide), but was more distantly related by 7L (differing by 11 nucleotides over 6 loci).

Discussion
The authors of this paper include representatives of the scientific groups that reported two independent genotyping schemes for Leptospira spp. Here, we provide the scientific community with the findings of a study that compared and contrasted the two schemes, together with a discussion of the practical aspects related to undertaking each.
The two schemes are unrelated and different by design. 7L was founded on a conventional strategy for MLST of selecting 7 housekeeping genes that were distributed around the genome and were not under positive selection. The design of 6L varied from this in that 6 loci were selected from different functional categories. For example, lipL41 and lipL32 encode surface expressed proteins that would be expected to be under positive selection as a result of being immunogenic and a target for the host response. At the other end of the spectrum, rrs2 is one of two 16S rRNA genes that would be predicted to be highly conserved.
Contrary to our expectations, we did not find that any of the 6L genes were under positive selection. More genotypes were resolved by 6L than by 7L, in part a function of the high number of alleles for secY. Analysis of genetic diversity indicated that there was little difference in within-species variation difference between the two schemes, both pointing to generally higher levels of variation within L. kirschneri than L. interrogans. The conserved nature of two loci used in 6L (lipL32 and rrs2), resulted in the finding on sliding window analysis that 7L provided better between-species resolution. Interestingly we noted that rrs2 of 6L showed a higher D value than the housekeeping gene sucA of 7L. Although this is an exception to the general rule that housekeeping metabolic genes provide more discrimination than conserved genes such as those encoding ribosomal RNA, such an observation is not unprecedented [46]. 6L has been applied to six pathogenic Leptospira spp. [40], which compares favorably with 7L which was designed for the two closely related species L. interrogans and L. kirschneri. However, this disadvantage of 7L will be resolved within the next 12 months; the scheme has already been extended to L. borpetersenii (manuscript in preparation), and the laboratory work to extend this to all pathogenic species is now completed. These improvements will be made publicly available by the end of 2011.
Conversely, the 6L scheme does not conform to the original concept of MLST as it includes a non-housekeeping gene (rrs2), and genes that encode cell surface proteins. Furthermore, the sequence start and stop sites used to define the allele for each locus were not provided in the original description of 6L scheme and so could not be performed based on the published methodology alone, although these have been detailed in this study. Minor changes were necessary to the start and stop sites, but we think it unlikely that this led to a change in the performance of the scheme.
The 6L scheme is not associated with a publically accessible website that allows an investigator to compare new data with existing sequence data. 6L has recently been applied to an Comparison of Two MLST Schemes for Leptospires www.plosntds.org extended set of strains and isolates (n = 271) encompassing a wide diversity of hosts and geographic regions [47], providing a rich source of sequence data that has been released into the public domain (GenBank). Comparative phylogenetic analysis by individual investigators will require downloading and storage of these data. In contrast, a website for 7L was launched at the time of publication and is regularly maintained and curated. At least one representative of each ST is recorded in a downloadable spreadsheet, providing a mechanism by which a picture of global bacterial diversity can be developed over time. This is easy to use, provides tools for comparison of a given strain with all of the other strains in the database, is more suited to investigators with limited phylogenetic training and experience, and so has the power to reach a wider audience.
In conclusion, we have provided detailed comparisons of two major genotyping schemes for Leptospira spp., and have described their advantages and disadvantages. 7L complies with the philosophy of MLST (housekeeping genes only supported by website), but will not be ready for use for the study of all pathogenic Leptospira spp. until the end of 2011. In the meantime, a bioinformatics analysis of the discriminatory power of 4 genes (three of which are not present in either scheme) as well as a new scheme with 7 loci both limited to L. interrogans and L. kirschneri have been reported [48,49], adding further diversity to the tools available for the phylogenetic study of Leptospira spp. There is a pressing need for consensus within the leptospirosis community as to the preferred genotyping scheme, an essential step if the wealth of knowledge gathered for other bacterial species based on detailed analysis within a single scheme is to be replicated for Leptospira spp. Both schemes contain highly discriminative and less discriminative loci. While it is feasible to formulate a consensus MLST combining the most discriminative housekeeping genes from both schemes, we have resisted the temptation of presenting an interim scheme that has not been extensively validated. Instead, we aim to expedite the release of the 7L MLST scheme for all the major pathogenic species, and recommend its use for the study of the global epidemiology of pathogenic Leptospira spp.