Molecular typing of Legionella pneumophila isolates from environmental water samples and clinical samples using a five-gene sequence typing and standard Sequence-Based Typing

The standard Sequence-Based Typing (SBT) scheme has inadequate discriminatory power to distinguish between L. pneumophila isolates, especially those belonging to prevalent disease-related sequence types (STs) such as ST1, ST36 and ST47. In this study, we developed a multilocus sequence typing (MLST) scheme based on two non-virulence loci (trpA, cca) and three virulence loci (icmK, lspE, lssD), used it to genotype 110 L. pneumophila isolates from various natural and artificial water sources in Guangdong Province, China, and compared it with the SBT. The isolates were assigned to 33 STs by the SBT and to 91 new sequence types (nSTs) by the MLST. The indices of discrimination (IODs) of the SBT and MLST were 0.920 and 0.985, respectively. Maximum likelihood trees of the concatenated SBT and MLST sequences both showed distinct phylogenetic relationships between the isolates from the two environments. More intragenic recombination events were detected among nSTs than among STs, and in both cases they were more abundant in natural water isolates. The MLST had a high discriminatory ability for the disease-associated ST1 isolates: 22 ST1 isolates were assigned to 19 nSTs. Furthermore, we assessed the discrimination of the MLST for 29 reference strains (19 clinical and 10 environmental). The clinical strains were assigned to eight STs and ten nSTs. The MLST could also subtype the prevalent clinical ST36 and ST47 strains: eight ST36 strains were subtyped into three nSTs and two ST47 strains were subtyped into two nSTs. We found different distribution patterns of nSTs between the environmental and clinical ST36 isolates, and between the outbreak and sporadic clinical ST36 isolates. Together, these results indicate that the MLST scheme could be used as part of a typing scheme to increase discrimination when necessary.


Introduction
Legionella pneumophila (L. pneumophila) is a gram-negative bacterium found worldwide in rivers and lakes as well as in many artificial water systems [1]. It is the major causative agent of Legionnaires' disease (LD), which manifests as atypical pneumonia, Pontiac fever or a self-limited flu-like illness [2,3]. Several molecular typing schemes have been used to investigate L. pneumophila epidemiology, including amplified fragment length polymorphism (AFLP), restriction fragment length polymorphism (RFLP), pulsed-field gel electrophoresis (PFGE), random amplified polymorphic DNA (RAPD) and Sequence-Based Typing (SBT). Some laboratories still use them as part of a combined approach [4][5][6][7][8]. The SBT, a scheme analogous to multilocus sequence typing (MLST), was proposed by the European Working Group for Legionella Infections (EWGLI, now the ESCMID Study Group for Legionella Infections, ESGLI). It is essentially a seven-locus sequence typing method performed by sequencing and comparing seven loci (flaA, pilE, asd, mip, mompS, proA, and neuA), and appears to be a powerful tool for global epidemiology [9,10]. The MLST approach with non-selective housekeeping genes has been well documented [11], while a combination of selective targets could produce sufficient discrimination to allow epidemiological typing of L. pneumophila [9]. Gaia first chose seven genes, four non-selective (acn, groES, groEL, and recA) and three selective (flaA, proA, and mompS), to assess their usefulness in investigating LD outbreaks caused by L. pneumophila [12]. A modified six-gene (flaA, proA, mompS, asd, mip, and pilE) sequence typing scheme was then developed to improve on the previous method [9]. In 2007, neuA was added to the six-gene scheme, which increased the discriminatory ability of the consensus sequence-based scheme for typing L. pneumophila and eventually formed the SBT scheme [10].
Although SBT is the current "gold standard" typing method for the investigation of LD outbreaks, some investigations remain unresolved because common sequence types (STs) such as ST1, ST47, and ST36 cause many infections [13]. For example, subtyping isolates that belong to the same prevalent ST has required a combined approach, including PFGE, AFLP, monoclonal antibody-based (MAb) subgrouping and other genome sequence-based typing schemes [14][15][16]. Because a large proportion of LD cases is caused by a small number of common STs (e.g., ST1), the SBT can lack discriminatory power [13,17,18]. Therefore, further research on and improvement of molecular typing methods for L. pneumophila are desirable.
As an opportunistic bacterium that inhabits aquatic environments, L. pneumophila has an intra-amoebal lifestyle. Free-living amoebae in natural water environments are the reservoir and shelter for L. pneumophila. From natural water, it can colonize artificial water environments such as cooling towers and hot-water systems and then spread in aerosols, infecting susceptible people [19,20]. Person-to-person transmission of L. pneumophila has rarely been reported; LD is mainly acquired by inhaling Legionella-containing aerosols [21,22]. Thus, aquatic environments can serve as potential sources of Legionella infection, and the epidemiological study of environmental isolates is of great importance. In a previous study, we examined the genetic diversity of clinical, artificial water and natural water isolates at the non-virulence gene and virulence gene levels [23]. Five gene loci were studied: two non-virulence loci (the tryptophan synthase α subunit-encoding gene, trpA, and the tRNA nucleotidyltransferase gene, cca), which are common to a set of bacterial genomes, and three virulence loci (icmK, lspE, and lssD) belonging to different secretion systems. The allelic diversity of these loci in our environmental isolates implied that an MLST scheme based on them could yield high discriminatory ability for these isolates. Therefore, we developed a five-gene (cca, trpA, lspE, lssD, and icmK) MLST scheme. The aims of this study were:
1. To evaluate the discriminatory power of the MLST scheme in genotyping 110 L. pneumophila isolates from various natural and artificial water sources in Guangdong Province, China, and to compare it with the SBT scheme. This would show whether the MLST could provide higher discrimination for environmental isolates.
2. To investigate the diversity of the L. pneumophila isolates from natural and artificial water sources based on the distributions of STs and new sequence types (nSTs, the sequence types of the MLST). The phylogeny and molecular evolution of these isolates based on SBT and MLST sequences were also investigated to probe the possible mechanisms that shaped the ST and nST distributions in different water sources. This would allow the genetic types determined by SBT to be compared with those derived by MLST, and the correspondence between the two schemes to be analyzed.
3. To determine the potential of the MLST scheme in genotyping reference clinical and environmental L. pneumophila strains, especially strains with prevalent STs. We sought to find out whether the distribution patterns of nSTs differed between environmental and clinical isolates, and between outbreak and sporadic clinical isolates.

Ethics statement
The local Centers for Disease Control and Prevention (CDC) and the hotel managers authorized the collection of cooling tower water from the hotels. No specific permissions were required for the collection of water samples from lakes, rivers, and ponds because these were public areas open to citizens. Our study did not involve endangered or protected species.

L. pneumophila isolates
Our environmental collection included 51 artificial water isolates and 59 natural water isolates. They were isolated from ponds, rivers, lakes and air conditioning cooling towers at 14 different sites in Guangdong Province, China, between October 2003 and September 2007. The details of the isolates, including the locations where they were isolated, the geographic coordinates, and the collection dates, are summarized in S1 Table. These isolates were used to investigate the discriminatory ability of the MLST scheme for environmental L. pneumophila isolates, and to investigate the diversity, phylogeny and molecular evolution of the isolates from natural and artificial water sources. All identified Legionella isolates were grown on buffered charcoal yeast extract (BCYE) agar plates at 37 °C with 5% CO2 for three days, and the bacterial cultures were then harvested. Genomic DNA was extracted as described in our previous report [24]. Besides our environmental isolates, we used 19 reference clinical strains belonging to prevalent STs to investigate the genotyping potential of the MLST scheme. Ten reference environmental strains belonging to a prevalent ST (ST36) were also used to assess the discriminatory ability of the MLST for isolates with the same ST but from different sources (clinical and environmental). The details of these strains are shown in Table 1.

Five-gene MLST and SBT schemes
All the environmental isolates were selected for sequencing of partial cca, trpA, lspE, lssD and icmK genes. We selected the most variable regions through a sequence alignment with known sequences in the NCBI database (including sequences from reference L. pneumophila strains such as Thunder Bay, ATCC43290, Lens, Alcoy, and Corby) in order to capture maximum genetic variability and to represent the allelic diversity of these genes. The genes, reference gene IDs in the NCBI database, primers, fragment sizes of the PCR products, gene regions used for the analysis, and number of alleles found during this study are shown in S2 Table. DNA fragments were amplified by PCR using a 2×EasyPfu PCR SuperMix (Transgene Biotech, Beijing) containing 0.1 U Pfu polymerase/μl, 500 μM of each dNTP, 50 mM Tris-HCl (pH 8.7), 20 mM KCl, and 4 mM MgCl2 in a ready-to-use formulation. Primers were added at a final concentration of 200 nM in a total volume of 25 μl. PCR was carried out using the GeneAmp PCR system (MJ Research PTC-200) with the following thermal conditions: 95 °C for 3 min, followed by 35 cycles of 95 °C for 20 s, 60 °C for 20 s and 72 °C for 30 s (lspE, lssD, and icmK loci) or 70 s (cca and trpA loci), and a final extension at 72 °C for 5 min. For confirmation, each PCR reaction was performed with a positive control (L. pneumophila strain ATCC33152 genomic DNA as the PCR template) and a negative control (sterile water as the PCR template). PCR products were purified with an Easy-Pure Quick Gel Extraction Kit (Transgene Biotech, Beijing) and then sent to Guangzhou IGE Biotechnology Ltd for sequencing. The quality of DNA sequencing was manually checked with Chromas (http://technelysium.com.au). The gene regions assembled to form a concatenated MLST sequence are shown in S2 Table. An nST was defined as a new allele of the concatenated MLST sequence. The STs were determined using the ESGLI protocol with seven gene fragments (flaA, pilE, asd, mip, mompS, proA, and neuA) according to the standard process shown on the L. pneumophila SBT website (http://www.hpa-bioinformatics.org.uk/legionella/legionella_sbt/php/sbt_homepage.php). The sequences of the SBT loci and MLST loci of the 29 reference L. pneumophila strains were obtained from the NCBI database. Their nSTs and STs were determined by analyzing the concatenated MLST and SBT sequences (Table 1).

Table 1 footnotes. a: Sequence type was derived from the genome sequence data. b: Strain Lens has two non-identical copies of the mompS locus (354 nt) in its genome, and its ST was defined according to Moran-Gilad's report [16]. https://doi.org/10.1371/journal.pone.0190986.t001
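The definition of an nST as a distinct allele of the concatenated five-locus sequence can be illustrated with a minimal Python sketch. The locus order, the function name `assign_nsts`, the numbering by first appearance, and the toy sequences are all assumptions made for illustration; they are not the trimmed regions or the nST numbering used in the study (those are given in S2 Table and Table 3).

```python
from typing import Dict

# Locus order is an assumption for this sketch; the study lists the five loci
# but the concatenation order is defined in S2 Table, not here.
MLST_LOCI = ["cca", "trpA", "lspE", "lssD", "icmK"]


def assign_nsts(isolates: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Give every distinct concatenated MLST sequence its own nST label.

    `isolates` maps an isolate name to {locus: trimmed sequence}.
    Labels are numbered in order of first appearance (nST1, nST2, ...),
    which is an illustrative convention only.
    """
    seen: Dict[str, str] = {}    # concatenated sequence -> nST label
    result: Dict[str, str] = {}  # isolate name -> nST label
    for name, loci in isolates.items():
        concatenated = "".join(loci[locus].upper() for locus in MLST_LOCI)
        if concatenated not in seen:
            seen[concatenated] = f"nST{len(seen) + 1}"
        result[name] = seen[concatenated]
    return result


# Hypothetical toy sequences, far shorter than real loci.
toy_isolates = {
    "iso1": {"cca": "ACGT", "trpA": "GGCA", "lspE": "TTAA", "lssD": "CCGG", "icmK": "ATAT"},
    "iso2": {"cca": "ACGT", "trpA": "GGCA", "lspE": "TTAA", "lssD": "CCGG", "icmK": "ATAT"},
    "iso3": {"cca": "ACGT", "trpA": "GGCA", "lspE": "TTAA", "lssD": "CCGG", "icmK": "ATAC"},
}
toy_nsts = assign_nsts(toy_isolates)
```

In this toy input, a single nucleotide change in icmK is enough to place `iso3` in a different nST from `iso1` and `iso2`, which share an identical concatenated sequence.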

Population genetic analysis
The indices of discrimination (IODs) of the SBT and MLST for the isolate collection were calculated using Simpson's index of diversity, as first described by Hunter and Gaston [25]. DnaSP 5.10.01 was used to perform genetic diversity analyses of the concatenated MLST and SBT sequences of the environmental isolates [26,27]. The proportion of each nST or ST was compared between the natural and artificial water isolates using Fisher's exact test or the Chi-Square test (SPSS 16.0, SPSS Inc., USA). Analysis of molecular variance (AMOVA) for the concatenated MLST and SBT sequences was performed with Arlequin Ver3.5.2 [28]. We defined the hierarchical subdivision of the environmental isolates at three levels. At the upper level, two groups were defined by the city where the isolates were collected (the Guangzhou and Jiangmen groups, consisting of 66 and 44 isolates, respectively). At the intermediate level (populations within groups), isolates from the same environment were treated as a subpopulation, so the Guangzhou and Jiangmen groups were each split into two subgroups (natural and artificial water subpopulations). The third level corresponded to the different haplotypes found within these four subgroups.
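The index of discrimination described by Hunter and Gaston is D = 1 - [1 / (N(N - 1))] Σ n_j(n_j - 1), where N is the number of isolates and n_j is the number of isolates of the j-th type. A minimal Python sketch follows; the function name and the toy labels are assumptions for illustration, and the study's own values (S3 Table) were not recomputed here.

```python
from collections import Counter
from typing import Iterable


def hunter_gaston_index(type_labels: Iterable[str]) -> float:
    """Simpson's index of diversity in the Hunter-Gaston form.

    Returns the probability that two isolates drawn at random without
    replacement belong to different types.
    """
    labels = list(type_labels)
    n_isolates = len(labels)
    if n_isolates < 2:
        raise ValueError("at least two isolates are required")
    counts = Counter(labels)
    # Ordered pairs of distinct isolates that share a type.
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n_isolates * (n_isolates - 1))


# Hypothetical toy collection: four isolates, three types.
toy_iod = hunter_gaston_index(["ST1", "ST1", "ST2", "ST3"])
```

For the toy collection, the only same-type pair is the two ST1 isolates, so D = 1 - 2/12 ≈ 0.833. A scheme that splits the two ST1 isolates into separate types would reach D = 1.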

Phylogenetic analysis
Phylogenetic analysis was conducted with the MEGA7 package [29]. Maximum likelihood (ML) trees were obtained for the concatenated MLST and SBT sequences separately, based on the Kimura 2-parameter model [30]. Initial trees were obtained automatically by applying the Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach. ML tree nodes were evaluated by bootstrapping with 1000 replications.
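The Kimura 2-parameter model treats transitions and transversions as occurring at different rates. Its pairwise-distance form is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences. The Python sketch below computes only this pairwise distance as an illustration of the model; it is not the ML tree search that MEGA7 performs, and the function name and toy sequences are assumptions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term_ts = 1.0 - 2.0 * p - q
    term_tv = 1.0 - 2.0 * q
    if term_ts <= 0 or term_tv <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term_ts) - 0.25 * math.log(term_tv)


# Hypothetical 10-nt alignment with one transition (A->G) and one transversion (A->C).
toy_distance = k2p_distance("AAAAAAAAAA", "GAAAAAAAAC")
```

With P = Q = 0.1 the corrected distance is about 0.234 substitutions per site, slightly above the uncorrected proportion of 0.2 because the model allows for multiple substitutions at the same site.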

Molecular evolution analysis
The neighbor-net analysis was performed and converted to a splits graph using the drawing algorithms implemented in SplitsTree4 software (version 4.14.4) [31,32]. A reticulate network tree was prepared to show the relationships among different STs or nSTs and to visualize possible recombination events.
The concatenated MLST and SBT sequences of our environmental isolates were screened using RDP4 to detect intragenic recombination [33]. Six methods implemented in RDP4 were used: RDP [34], GENECONV, BootScan [35], MaxChi [36], Chimaera [37], and SiScan [38]. Potential recombination events (PREs) were defined as those identified by at least two methods, following Coscolla's report [39]. The common settings for all methods were: sequences treated as linear, statistical significance set at P < 0.05 with Bonferroni correction for multiple comparisons, phylogenetic evidence required, and breakpoints polished.
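The acceptance rule for PREs (support from at least two of the six methods at a Bonferroni-corrected P < 0.05) can be sketched as a simple filter. RDP4 applies these settings internally, so the data layout, the function names, the event identifiers, the P-values and the number of comparisons below are all hypothetical and serve only to make the rule concrete.

```python
from typing import Dict, List, Optional

ALPHA = 0.05      # significance level stated in the text
MIN_METHODS = 2   # an event must be supported by at least two methods


def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Per-comparison threshold under Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be positive")
    return alpha / n_comparisons


def supported_events(
    events: Dict[str, Dict[str, Optional[float]]], n_comparisons: int
) -> List[str]:
    """Return the events detected by at least MIN_METHODS methods.

    `events` maps an event id to {method name: raw P-value, or None if the
    method did not detect the event}.
    """
    threshold = bonferroni_threshold(ALPHA, n_comparisons)
    kept = []
    for event_id, p_by_method in events.items():
        n_support = sum(
            1 for p in p_by_method.values() if p is not None and p < threshold
        )
        if n_support >= MIN_METHODS:
            kept.append(event_id)
    return kept


# Hypothetical raw P-values for two candidate events and 100 comparisons.
toy_events = {
    "event1": {"RDP": 1e-6, "GENECONV": 2e-5, "MaxChi": None},
    "event2": {"RDP": 1e-6, "GENECONV": 0.01, "MaxChi": None},
}
toy_kept = supported_events(toy_events, n_comparisons=100)
```

Here the corrected threshold is 0.0005, so `event1` is retained (two methods below the threshold) while `event2` is dropped (only one).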

L. pneumophila five-gene MLST and SBT for environmental isolates
The 110 isolates were assigned to 33 STs by the SBT and to 91 nSTs (designated nST1, nST2, nST3, etc.) by the MLST (Table 2 and Table 3). The most dominant ST was ST1, which accounted for 20% (22/110) of all L. pneumophila isolates and mostly came from artificial water sources (Table 2). ST1, the most prevalent and disease-associated ST worldwide, was also the most abundant in the EWGLI SBT database, followed by ST23 and ST47 [40]. ST1048, another dominant ST identified in this study, constituted 11.82% (13/110) of all isolates. Sixteen STs included only one isolate. The proportions of ST1 and ST1054 isolates were significantly higher in artificial environments (Fisher's exact test, P < 0.001 and P = 0.043, respectively), while the proportions of ST1048, ST739, and ST1267 isolates were higher in natural environments (Fisher's exact test, P = 0.006, P = 0.014, and P = 0.029, respectively). These findings reinforced the evidence that the distribution of STs differs between natural and artificial environments [41]. nST50 and nST39 were the prevalent nSTs in this study (Table 3), but constituted only 4.55% (5/110) of all isolates. Most of the nSTs included only one isolate (90.11%, 82/91). The proportion of nST39 was significantly higher in artificial environments (Fisher's exact test, P = 0.019). The number of alleles at the seven SBT loci (flaA, pilE, asd, mip, mompS, proA, and neuA) in these isolates ranged from 9 to 17, while the number of alleles at the five MLST loci ranged from 12 to 18 at the cca, trpA, lssD and lspE loci and reached 83 at the icmK locus (S2 Table). Resolving the 110 isolates into 91 nSTs rather than 33 STs implied a higher discriminatory power of the MLST than of the SBT (IOD = 0.985 vs. IOD = 0.920, S3 Table). David and colleagues studied the diversity of 79 epidemiologically unrelated L. pneumophila isolates. The IODs for these isolates were 0.972, 0.991 and 0.940 using a 53 ribosomal-gene MLST (rMLST), a 100 core-gene MLST (cgMLST), and the SBT, respectively [13]. The discriminatory power of the five-gene MLST scheme might therefore be similar to that of the 100 core-gene cgMLST scheme [13].
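The comparison of ST1 proportions between the two environments can be illustrated with a two-sided Fisher's exact test on a 2×2 table. The counts used below are taken from the text (22 ST1 isolates, 20 of them from artificial water; 51 artificial and 59 natural water isolates in total). The implementation is a plain hypergeometric sketch written for illustration, not the SPSS procedure used in the study, so the exact P-value may differ in rounding from the reported one.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * tolerance
    )


# Rows: artificial, natural water isolates. Columns: ST1, other STs.
p_st1 = fisher_exact_two_sided(20, 31, 2, 57)
```

The resulting P-value is well below 0.001, in line with the significance level reported for ST1.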

Diversity of the L. pneumophila isolates from natural and artificial water sources based on the MLST and the SBT schemes
Table 2 and Table 3 show the ST and nST compositions of the L. pneumophila isolates recovered from natural and artificial water sources. The 59 isolates from natural water sources were grouped into 52 nSTs, and the 51 artificial water isolates were grouped into 41 nSTs, whereas they were grouped into 23 STs and 17 STs, respectively. The diversity of nSTs was higher in the isolates from natural water sources than in those from artificial ones (IOD = 0.973 vs. IOD = 0.902, S3 Table). Similarly, the diversity of STs was also higher in the isolates from natural water sources (IOD = 0.914 vs. IOD = 0.807, S3 Table). Many studies have demonstrated that the diversity of isolates from natural water sources is higher than that of isolates from artificial water sources, but these studies were based on ST distributions [42,43]. In the present study, we obtained similar results based on both ST and nST distributions, and the diversity of nSTs in these isolates was higher than that of STs, indicating that the MLST scheme was efficient in determining the diversity of L. pneumophila isolates from different water sources. Moreover, we analyzed the genetic diversity of these isolates based on the concatenated SBT and MLST sequences. Genetic diversity parameters such as the number of haplotypes, haplotype diversity, nucleotide diversity, and nucleotide differences were higher in the isolates from natural water sources (Table 4). Most of these parameters were also higher when derived from the MLST sequences than from the SBT sequences. This result agreed with our observations on the diversity of nSTs and STs and implied that the five-gene MLST scheme had higher discriminatory ability than the SBT scheme.
Besides the IOD comparison, we also performed a hierarchical AMOVA to study the genetic variation of the concatenated MLST and SBT sequences in these isolates. The largest proportion of the genetic variation was found within populations: this level accounted for 89.78% of the total variation in the MLST sequences and 89.75% of the total variation in the SBT sequences (Table 5). The fixation indices among groups (FCT) were -0.0444 (MLST sequences) and -0.05784 (SBT sequences), and the variation did not differ significantly among the groups (P = 1.00), indicating no genetic differentiation between the isolates from the two cities. In contrast, the fixation indices among populations within groups (FSC) were 0.14038 (MLST sequences) and 0.10249 (SBT sequences), and the genetic variation differed significantly among populations within groups (P < 0.01, Table 5). These results supported the notion that genetic differentiation existed between the isolates from natural and artificial water sources, and that L. pneumophila isolates from natural water sources were genetically more diverse.
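The relationship between the AMOVA variance components and the fixation indices can be made explicit with a short worked example. The sketch assumes the standard three-level AMOVA definitions: FCT = σa²/σT², FSC = σb²/(σb² + σc²) and FST = (σa² + σb²)/σT², where σa², σb² and σc² are the among-group, among-population-within-group and within-population components. Only the MLST figures quoted in the text are used; the among-population percentage is derived here by subtraction (an assumption, since Table 5 is not reproduced in this section).

```python
from typing import Dict


def fixation_indices(
    pct_among_groups: float, pct_among_pops: float, pct_within_pops: float
) -> Dict[str, float]:
    """Fixation indices from AMOVA variance components given as percentages."""
    total = pct_among_groups + pct_among_pops + pct_within_pops
    return {
        "F_CT": pct_among_groups / total,
        "F_SC": pct_among_pops / (pct_among_pops + pct_within_pops),
        "F_ST": (pct_among_groups + pct_among_pops) / total,
    }


# MLST sequences: values quoted in the text.
within_pops = 89.78    # % of variation within populations
among_groups = -4.44   # % among groups, from the reported FCT of -0.0444
# Derived by subtraction so that the three components sum to 100%.
among_pops = 100.0 - within_pops - among_groups

mlst_indices = fixation_indices(among_groups, among_pops, within_pops)
```

Under these assumptions the derived among-population component is 14.66%, which gives FSC ≈ 0.140, consistent with the reported 0.14038. Negative among-group components, as here, are usually read as indicating no structure at that level.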

Phylogeny of environmental L. pneumophila isolates based on concatenated MLST sequences of the 91 nSTs and SBT sequence of the 33 STs
The ML tree of the concatenated MLST sequences of the 91 nSTs showed five main groups: forty-three nSTs formed nST group 1, and the isolates within this group were mainly from artificial water sources (68.75%, 33/48, P < 0.001, Chi-Square test), while 32 nSTs formed nST group 2, and the isolates within this group were mainly from natural water sources (76.19%, 32/42, P < 0.001, Chi-Square test) (S4 Table, Fig 1). We found a comparable result in the ML tree of the concatenated SBT sequences of the 33 STs (Fig 2). ST1788, which included only one natural isolate (N67, an nST62 isolate), constituted a group by itself. Five STs constituted group 2, and the isolates of this group were mainly from natural water sources (95.24%, 20/21, P < 0.001, Fisher's exact test). In contrast, nine STs constituted group 4, and the isolates of this group were mainly from artificial water sources (68.75%, 33/48, P < 0.01, Chi-Square test). These results showed distinct phylogenetic patterns between the isolates from the two environments. The topologies of the two inferred trees were not congruent: most isolates had different relationships with each other depending on whether the concatenated SBT or MLST sequences were used (Figs 1 and 2, S1 Table). However, we still found partial agreement between STs and nSTs on their respective trees. For example, the isolates A5, A189, and A195 were clustered into a clade in the ST tree (ST1778, ST160 and ST19, group 1 of the ST tree, Fig 2, S1 Table). They were also situated in a clade in the nST tree (nST5 and nST36, group 5 of the nST tree, Fig 1, S1 Table). N71 and N220 were both ST45 isolates. They belonged to nST66 and nST91, and were clustered into a clade in the nST tree (Fig 1). N36, N37, N38, N39, N40, N41, and N43 were all ST739 isolates, but they belonged to nST44, nST45, nST46, nST47, nST48, and nST49, respectively. These twelve isolates and their respective branches were clustered into group 1 of the ST tree (Fig 2), while their respective nST branches were distributed among three groups (Fig 1). These results showed different phylogenetic relationships between L. pneumophila isolates from natural and artificial water sources, demonstrated the partial correspondence of the MLST with the SBT, and implied a higher discriminatory ability of the MLST scheme for environmental L. pneumophila isolates.

Recombinations in environmental L. pneumophila isolates
Many studies have reported recombination in L. pneumophila isolates. Costa detected recombination in the L. pneumophila virulence-related effector sidJ within L. pneumophila subsp. pneumophila strains [21]. Recombination is an important mechanism that has shaped L. pneumophila genomes [44]. In this study, the bootstrap values for some branches in the ML trees of STs and nSTs were less than 50%, implying incongruent phylogeny of the tested nSTs and STs and possible recombination events in the population (Figs 1 and 2) [43]. We obtained reticulate network trees of the concatenated sequences of STs and nSTs using the neighbor-net algorithm of SplitsTree4 (version 4.14.4) [32]. In a reticulate network, a purely clonal population has no side edges, whereas we found many side edges in the reticulate network trees of the 33 STs and 91 nSTs (Figs 3 and 4). This result indicated that recombination events might exist within the population [45]. We therefore tested for intragenic recombination in the concatenated SBT and MLST sequences separately using RDP4. Thirteen PREs among STs and 14 PREs among nSTs were identified, each supported by at least two of the six analysis methods (Table 6 and Table 7). Among the 41 resulting recombinant nSTs, three (nST22, nST23, nST39) were found exclusively in the isolates from artificial water sources, and 38 were found exclusively in the isolates from natural ones. Similarly, among the 20 recombinant STs, thirteen were found exclusively in natural water sources, and five were found exclusively in artificial ones. Together, these results showed a higher frequency of recombination in the isolates from natural water sources, which was consistent with the higher diversity of these isolates.
Although early analysis based on multilocus enzyme electrophoresis (MLEE) described the population structure of this species as clonal, many recent reports have suggested that recombination also contributed to shaping variation across its genome [21,44,[46][47][48][49]. Coscolla reported that recombination among L. pneumophila isolates from natural water sources is common and not restricted to already described pathogenicity islands or other genome constituents, which provides the genome with high plasticity [46]. Recombination was also found in outbreak-related L. pneumophila isolates [47]. Our results based on nSTs and STs, together with previous reports, supported the notion that L. pneumophila undergoes recombination, especially in isolates from natural water sources. Recombination was a relevant factor in shaping the molecular population genetic structure of this bacterium, and might contribute to the higher diversity of nSTs than of STs observed in our environmental isolate collection.

Five-gene MLST scheme to subtype the major and abundant disease-associated ST1 isolates
In this study, we obtained 22 ST1 isolates from water sources in Guangdong Province, China. They were mainly from artificial water sources (90.91%, 20/22, Table 2), and could be subtyped into 19 nSTs (S5 Table), indicating that the five-gene MLST discriminated well among environmental ST1 isolates. Many studies have reported that cgMLST could provide high resolution in subtyping ST1 isolates [13,16,50], but these schemes sequenced thousands of core genes shared by different L. pneumophila strains. The MLST scheme reported in the present study sequenced only five loci, with a concatenated sequence length comparable to that of the SBT (2876 bp vs. 2501/2498 bp), yet provided notable resolution. As shown in Fig 5A, the 22 ST1 isolates could be clustered into two main groups (group A and group B). The ST1 isolates from natural water sources (N208 and N209) also formed a subgroup. This result suggested that these two ST1 isolates were phylogenetically closer to each other, and that genetic differences might exist between ST1 isolates from natural and artificial water sources. The reticulate network tree of the ST1 isolates showed many side edges, indicating that recombination in the MLST sequences also occurred within these isolates (Fig 5B).

Five-gene MLST scheme to genotype reference clinical and environmental L. pneumophila strains
As shown in Table 1, the nineteen clinical strains were assigned to 10 nSTs and 8 STs. The IODs of the SBT and MLST for this strain collection were 0.770 and 0.781, respectively, suggesting the MLST scheme was also more discriminatory for clinical strains. The nSTs and STs of the reference clinical strains were not found in our environmental isolate collection, except nST20 and ST1. The initial LD isolate, Philadelphia-1, is an ST36 (also called the Philadelphia sequence type) strain and was discovered during the Philadelphia LD outbreak in 1976 [51]. After that, many ST36 isolates were found in outbreak investigations and sporadic cases in the USA [52]. ST36 was the ST most frequently associated with LD outbreaks in the USA between 1982 and 2012 [52]. It was also prevalent in both clinical and environmental isolates distributed over 25 countries. The first clinical strain isolated in the Chinese mainland also belonged to ST36 [53]. The MLST scheme could subtype the eight clinical ST36 strains into three nSTs (nST92, nST98, and nST99) (Fig 6A). The outbreak ST36 isolates (C3_O, C7_O, and Philadelphia-1) could be subtyped into nST92 and nST98, and the sporadic ST36 isolates (C1_S, C2_S, C9_S, and C10_S) could be subtyped into nST92 and nST99 (Table 1). These clinical ST36 isolates were situated in one clade of the ML tree of nSTs (Fig 6A). Interestingly, nST98 was found exclusively in an outbreak ST36 isolate (C7_O), while nST99 was found exclusively in a sporadic ST36 isolate (C10_S), and the two nSTs were phylogenetically distinct (Fig 6A). A nine-nucleotide difference in the trpA locus was found between nST98 and nST92, while only a single nucleotide difference in the icmK locus was found between nST99 and nST92, and all of these nucleotide differences were found between nST98 and nST99 (data not shown). This illustrates that some sporadic isolates and outbreak isolates were genetically different. We found that two ST187 strains, Thunder Bay and ATCC43290, shared nST92 with the clinical ST36 isolates, including C1_S, C2_S, C3_O, C5_P, C7_O, C9_S, C10_S, and Philadelphia_1 (Fig 6B). The allelic profiles of ST36 and ST187 were 3, 4, 1, 1, 14, 9, 1 and 3, 10, 1, 28, 14, 9, 3, respectively. The two STs differed at three loci (pilE, mip, and neuA), which contributed 18 nucleotide differences, implying incongruent phylogenetic relationships between the SBT and MLST sequences in the clinical isolates, as also observed in our environmental isolate collection (Figs 1 and 2).

Fig 1. Bootstrap support values (1000 replicates) for nodes higher than 50% are indicated next to the corresponding node. Five main groups of the branches could be found. Different colors of the branches indicate distinct groups of the nSTs, and branches with the same color are clustered into a group. The blocks indicate the strains of the corresponding nSTs. A indicates artificial isolates, and N indicates natural isolates. https://doi.org/10.1371/journal.pone.0190986.g001

Fig 2. Phylogenetic tree of concatenated SBT sequences (2501/2498 bp) of the 33 STs in this study. Bootstrap support values (1000 replicates) for nodes higher than 50% are indicated next to the corresponding node. Five main groups of the branches could be found. Different colors of the branches indicate distinct groups of the STs. Branches with the same color are clustered into a group. The relative size of the solid circles indicates the number of isolates in the selected group; the red sector indicates artificial water isolates, while the blue sector indicates natural water isolates. The blocks indicate the isolates of the corresponding STs. A indicates artificial water isolates, and N indicates natural water isolates. https://doi.org/10.1371/journal.pone.0190986.g002
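The ST36 and ST187 allelic profiles quoted in this section can be compared locus by locus with a few lines of Python. The profiles and the locus order (flaA, pilE, asd, mip, mompS, proA, neuA) are taken from the text; the function name is an assumption for illustration.

```python
from typing import List, Sequence

# SBT locus order as listed in the text.
SBT_LOCI = ["flaA", "pilE", "asd", "mip", "mompS", "proA", "neuA"]

# Allelic profiles reported in the text.
ST36_PROFILE = [3, 4, 1, 1, 14, 9, 1]
ST187_PROFILE = [3, 10, 1, 28, 14, 9, 3]


def differing_loci(profile_a: Sequence[int], profile_b: Sequence[int]) -> List[str]:
    """Return the SBT loci at which two allelic profiles carry different alleles."""
    if len(profile_a) != len(SBT_LOCI) or len(profile_b) != len(SBT_LOCI):
        raise ValueError("an SBT profile must contain seven allele numbers")
    return [
        locus
        for locus, allele_a, allele_b in zip(SBT_LOCI, profile_a, profile_b)
        if allele_a != allele_b
    ]


st36_vs_st187 = differing_loci(ST36_PROFILE, ST187_PROFILE)
```

The comparison returns pilE, mip and neuA, the three loci named in the text. Allele numbers are arbitrary labels, so this check counts differing loci only and says nothing about how many nucleotides differ within each locus.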
We also used ten additional reference environmental ST36 strains to study the discriminatory ability of the MLST for isolates belonging to a same ST (ST36) but from different sources (clinical and environmental) ( Table 1). The ten environmental ST36 isolates could also be subtyped into three nSTs (nST3, nST92, and nST101). NST92 was found in both clinical and environmental ST36 isolates, and was the most prevalent nSTs of the eighteen ST36 isolates (13/18, 72.22%). NST3 and nST101 were exclusively found in environmental isolates, while nST98 and nST99 were exclusively found in clinical isolates, indicating different distribution patterns of nSTs between environmental and clinical ST36 isolates. Phylogenetic analysis of these ST36 isolates showed two main groups. NST98 (C7_O) was situated on its own distinct branch, separated from other four nSTs (nST3, nST92, nST99, and nST101) (S1 Fig). These results suggested that the MLST scheme could also subtype the prevalent ST36 isolates, and the phylogenetic relationships among ST36 isolates from clinical and environmental sources might be different, which was supported by Mercante and colleague [51]. ST47 was most frequently isolated from patients in many countries such as Netherlands and France [54,55]. In this study, two ST47 strains, Lorraine and LP_617 could be subtyped into two nSTs: nST94 and nST100. The phylogenetic tree of the concatenated MLST sequences showed these isolates were closely related to each other and clustered into a clade (Fig 6A). We have found similar phylogenetic relationship between Lorraine and LP_617 in a pilot study of rapid whole-genome sequencing for the investigation of a Legionella outbreak, in which single-nucleotide polymorphism (SNP)-based (also known as mapping-based) approach was performed, and it showed that LP_617 was only 56 SNPs different from Lorraine in the genome, and thus the two ST47 strains could be distinguished [56]. 
This fact highlighted the possibility that the MLST scheme also had discriminatory ability for some strains with very small genetic differences. As we know, traditional background mutation, gene deletion, episomal loss/acquisition, and horizontal gene transfer have led to varying degrees of genetic divergence in a related subpopulation of L. pneumophila [57]. Furthermore, we found more PREs in nSTs than in STs in our environmental isolates (Table 5 and Table 6). We supposed that these factors might contribute to accelerated evolution of the MLST loci compared with the SBT loci and lead to the generation of new allelic profiles of nSTs, as it is widely believed that clinical L. pneumophila is a small specific subset of all genotypes existing in nature, perhaps representing an especially adapted group of clones [39].
Two ST1 strains, OLDA and Paris, were both assigned to nST20. One environmental ST36 isolate (E8_O), which was proved to be associated with an LD outbreak, was assigned to nST3. In our environmental collection of L. pneumophila, an ST1 isolate (A31) and an ST630 isolate (A23) were both nST20, and an ST242 isolate (A3) was nST3. However, most of our environmental isolates typed as ST1 were characterized by different nSTs, and only nST20 and nST3 were found in the clinical strains or strains associated with an LD outbreak. Furthermore, we found higher discriminatory power of the MLST for the environmental isolates than for the clinical isolates. In light of these findings, the role of environmental sources as a potential reservoir of distinct pathogens could be reinforced [58]. ML trees of the ten nSTs and eight STs of the clinical isolates both showed two main groups. However, the isolates that constituted these groups were different. NST93 (ERS1434278 and 130b) and nST95 (Lens) constituted a distinct clade in the ML tree of nSTs, while ST187 (Thunder Bay and ATCC43290) and ST36 (C1_S, C2_S, C3_O, C5_P, C7_O, C9_S, C10_S, and Philadelphia_1) constituted a distinct clade in the ML tree of STs (Fig 6). We also found a relatively longer phylogenetic distance among the MLST sequences than among the SBT sequences within the clinical isolates. For example, the phylogenetic distance between ERS1434278 and Lens was longer in the nST tree than in the ST tree (Fig 6). These results together suggested that the MLST scheme was a more discriminatory means for epidemiological investigation of clinical and environmental L. pneumophila isolates. It is well known that the major advantage of SBT is the ease of exchanging data between different laboratories, but the evidence that a large proportion of cases is caused by a small number of common STs (e.g., ST1 and ST47) indicates that this scheme lacks discriminatory power [13]. Thus, the five-gene MLST scheme proposed here might be used as a supplementary method for epidemiological investigation of L. pneumophila.

Table 6. Intragenic recombination in the 33 STs by using six different methods implemented in RDP software. (Columns: Recombinant STs; Major parent; Minor parent; Detection methods implemented in RDP software.)
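Discriminatory power of a typing scheme is commonly quantified with the Hunter-Gaston index (a form of Simpson's index of diversity). The following Python sketch assumes that the index of discrimination reported in this study is of that kind, which is not stated in this section. The type labels below are hypothetical toy data, not the study's isolates.

```python
from collections import Counter


def index_of_discrimination(type_labels):
    """Hunter-Gaston index: D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)).

    N is the number of isolates and n_j is the number of isolates
    assigned to the j-th type.
    """
    n = len(type_labels)
    if n < 2:
        raise ValueError("at least two isolates are required")
    counts = Counter(type_labels)
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))


# Hypothetical example: six isolates typed by a coarse and a finer scheme.
coarse = ["A", "A", "A", "A", "B", "B"]
fine = ["a1", "a2", "a3", "a3", "b1", "b2"]

print(round(index_of_discrimination(coarse), 3))  # 0.533
print(round(index_of_discrimination(fine), 3))    # 0.933
```

In this toy example the finer scheme splits the dominant type into several types, so fewer pairs of isolates share a type and the index rises, which mirrors the kind of comparison made between the SBT and the MLST.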

Conclusions
Although there have been many studies probing new typing methods for L. pneumophila, such as the SNP-based approach [56], whole-genome mapping (WGM) [17], cgMLST [13,50] and rMLST [59], these schemes require sequencing a large number of gene loci, and the cost and bioinformatics infrastructure might be issues in some laboratories. In this study, we reported a five-gene MLST scheme for genotyping L. pneumophila isolates from environmental water samples and clinical samples, and compared it with the SBT. Our results showed higher discriminatory power of the MLST for our environmental isolate collection. We have described the differences in ST and nST distributions and diversities of L. pneumophila isolates from natural and artificial water sources in Guangdong province of China. We found that intragenic recombination might be an important mechanism contributing to the higher discrimination of the MLST and to the higher diversities of STs and nSTs in natural water isolates. The MLST scheme also showed an extraordinary resolution in subtyping environmental ST1 isolates and high discriminatory power in genotyping clinical L. pneumophila strains. In addition, the MLST scheme could subtype the clinical isolates belonging to prevalent STs (ST36 and ST47). We found different distribution patterns of nSTs between environmental and clinical ST36 isolates, and between the outbreak clinical ST36 isolates and the sporadic clinical ST36 isolates.
These results together suggested that the MLST scheme could be used as part of a typing scheme that increased discrimination when necessary.
Supporting information S1