Genetic diversity of Leptospira isolates in Lao PDR and genome analysis of an outbreak strain

Background
Although Southeast Asia is one of the regions most afflicted by leptospirosis, little is known about the diversity and molecular epidemiology of the causative agents of this widespread and emerging zoonotic disease.

Methodology/Principal findings
We used whole-genome sequencing to examine genetic variation in 75 Leptospira strains isolated from patients in the Lao PDR (Laos) between 2006 and 2017. Eleven serogroups from 4 Leptospira species and 43 cgMLST-defined clonal groups (CGs) were identified. The most prevalent CG was CG272 (n = 18, 26.8%), composed of L. interrogans serogroup Autumnalis isolates. This genotype was recovered throughout the 12-year period and was associated with deaths and with a large outbreak in neighbouring Thailand. Genome analysis reveals that the CG272 strains form a highly clonal group that has, for as yet unknown reasons, recently spread in Laos and Thailand. Additionally, accessory genes clearly discriminate CG272 strains from other Leptospira strains.

Conclusions/Significance
The present study reveals a high diversity of Leptospira genotypes in Laos, thus extending our current knowledge of the pan- and core-genomes of these life-threatening pathogens. Our results demonstrate that the CG272 strains belong to a unique clonal group, which probably evolved through clonal expansion following niche adaptation. Additional epidemiological studies are required to better evaluate the spread of this genotype in Southeast Asia. To further investigate the key factors driving the virulence and spread of these pathogens, more intensive genomic surveillance, combined with detailed clinical and epidemiological data, is needed.


Introduction
It is estimated that one million patients suffer from severe leptospirosis each year, with nearly 60,000 deaths, mostly in developing tropical countries [1]. The global burden of leptospirosis in terms of disability-adjusted life years (DALYs) is comparable to, or even higher than, that of dengue, rabies, schistosomiasis, leishmaniasis and lymphatic filariasis [2]. This burden is very likely underestimated because of misdiagnosis and the inadequate surveillance systems in place in most countries, particularly where other diseases with similar non-specific presentations, such as dengue and malaria, are prevalent. Leptospirosis is likely to become even more prevalent due to (i) global climate change resulting in more frequent and severe flooding [3], and (ii) the growing population residing in urban slums [4]. Pathogenic Leptospira colonize the proximal renal tubules of reservoir hosts and are excreted in urine into the environment. Infections usually occur through contact with water or soil contaminated with the urine of infected animals. Leptospires are highly motile spirochetes that penetrate abraded skin and mucous membranes and rapidly disseminate through the bloodstream, causing fever, Weil's disease or pulmonary hemorrhage syndrome [5].
Leptospira is a highly heterogeneous bacterial genus, divided into 64 species [6], 17 of which are potentially infectious to both humans and animals, and subdivided into nearly 300 serovars. However, for as yet unknown reasons, a limited number of Leptospira serovars are much more likely to cause severe disease than others [7-9]. Recently, we provided insights into the evolution of virulence in Leptospira spp. [6,10,11], identifying a group of species within the pathogens/subclade P1 that is most often associated with severe infections [6]. A better understanding of the diversity of Leptospira strains is important to (i) identify strains or genotypes responsible for severe infections, (ii) evaluate the accuracy of current diagnostic tools and whole-cell Leptospira vaccines, and (iii) develop control and prevention strategies for Leptospira serovars associated with particular animal reservoirs. For this purpose, many different molecular typing schemes have been developed, including Pulsed-Field Gel Electrophoresis (PFGE) [12], Multi-Locus Variable Number Tandem Repeat (VNTR) analysis [13], and several Multi-Locus Sequence Typing (MLST) methods [14,15]. This diversity of typing methods, applied to different sample sets, has fragmented our epidemiological knowledge of leptospirosis. Recently, we proposed a universal core genome MLST (cgMLST) scheme that allows high-resolution genotyping of isolates across the entire Leptospira genus [16].
Leptospirosis is endemic in most countries of South and Southeast Asia [1,17-22], and these regions appear to bear the highest estimated global burden of leptospirosis, with an estimated 266,000 cases and 14,200 deaths annually [1]. Laos is a land-linked country of seven million people bordering China, Vietnam, Cambodia, Thailand and Myanmar, and little information is available on the Leptospira strains circulating there. Serological surveys in Laos have shown evidence of past leptospiral infection in 19% to 45% of the population [23,24]. Pathogenic Leptospira spp. are among the leading bacterial causes of fever [24-26], central nervous system infections [27] and acute jaundice [28,29] in Laos. We isolated strains from patients in Laos over 12 years and performed whole-genome sequencing of 68 Leptospira isolates, which we analyzed together with 7 previously described Lao genomes [30], to better understand the evolutionary dynamics of Leptospira.

Ethics statement
The study protocols were approved by the National Ethics Committee for Health Research, Government of the Lao PDR (134/2007), the Ethical Review Committee of Research MoH (2000) and the Oxford Tropical Research Ethics Committee, UK (006-07 and 015-10), and were conducted in compliance with the Declaration of Helsinki. All participants provided written informed consent. Written consent was obtained from the parent/guardian of each participant under 16 years of age. In Lao health care, people aged 16 years and above are considered adults for health care decisions. The approved Lao national ethics clearance for this study specified that patients aged 16 years and above are considered adults, and therefore parent/guardian consent was not required for them.

Study design and study population
The isolates used here were obtained by culturing blood clots left over after centrifugation to collect sera for investigations of the aetiology of fever. Blood clots from patients of any age and either sex admitted to Mahosot Hospital and Friendship Hospital, Vientiane City [31], with suspected community-acquired bacteremia were included (2006-2017), provided the patients gave written informed consent. In addition, blood clots from inpatients and outpatients at Luang Namtha Provincial Hospital and Salavan Provincial Hospital (2010-2015) were included if the patients were aged 5-49 years, gave written informed consent, were eligible for malaria rapid diagnostic testing or microscopy according to Lao national guidelines, had no obvious cause of fever (abscess or severe diarrhoea), had had fever for <8 days, and had an admission tympanic temperature of >38°C [25]. History and clinical examination findings were recorded on standardised forms. Over the 12 years, 36,495 patient blood samples were cultured for Leptospira and a total of 158 isolates were obtained, of which 75 were successfully subcultured for this analysis. Epidemiological, clinical and outcome features of the patients are shown in S1 Table.
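The culture yield described above can be expressed as two simple rates. The sketch below is illustrative only: the isolation rate and the subculture recovery rate it prints are derived here from the counts quoted in the text and are not values reported by the study.

```python
# Counts quoted in the text; the rates below are derived, not reported values.
CULTURED = 36_495    # patient blood samples cultured for Leptospira, 2006-2017
ISOLATED = 158       # primary Leptospira isolations
SUBCULTURED = 75     # isolates successfully subcultured and analysed


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage, rounded to two decimals."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, 2)


# Share of cultured blood samples that yielded a Leptospira isolate.
isolation_rate = percent(ISOLATED, CULTURED)
# Share of primary isolates that could be recovered on subculture.
recovery_rate = percent(SUBCULTURED, ISOLATED)

if __name__ == "__main__":
    print(f"isolation rate: {isolation_rate}%")       # isolation rate: 0.43%
    print(f"subculture recovery: {recovery_rate}%")   # subculture recovery: 47.47%
```

In other words, fewer than 1 in 200 cultured samples grew Leptospira, and roughly half of the primary isolates survived subculture.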

Culture and Leptospira strains
Whole non-anticoagulated blood samples were collected from inpatients with suspected leptospirosis during 2006-2017 at the four hospitals (Mahosot Hospital and Friendship Hospital in Vientiane, Luang Namtha Provincial Hospital, and Salavan Provincial Hospital) [25,31].
Whole blood was centrifuged, serum removed and clots were incubated in Ellinghausen, McCullough, Johnson and Harris (EMJH) medium [32,33] overnight. The EMJH medium was then separated from the blood clot and incubated at 30˚C for up to 12 weeks with checks for growth by dark-field microscopy every 2 weeks at the Microbiology Laboratory, Mahosot Hospital. Serotyping was performed at the National Reference Centre for Leptospirosis (Institut Pasteur, Paris, France) as previously described with a panel of rabbit antisera representing 24 serogroups [34,35].

Whole-genome sequencing
Genomic DNA was extracted from 68 exponential-phase cultures using a MagNA Pure 96 instrument (Roche, Meylan, France). Next-generation sequencing (NGS) was performed using the Nextera XT DNA Library Preparation kit and the NextSeq 500 sequencing system (Illumina, San Diego, CA, USA) at the Mutualized Platform for Microbiology (P2M) at Institut Pasteur. CLC Genomics Workbench 9 software (Qiagen, Hilden, Germany) was used for analyses. The generated contig sequences, together with the sample metadata, are available in the BIGSdb database hosted at the Institut Pasteur (https://bigsdb.pasteur.fr/leptospira/). We also downloaded 7 additional genome sequences of Leptospira isolates from Laos from the NCBI database (S1 Table); these isolates originated from the same sources as detailed above [30]. Genome features of the 75 isolates are indicated in S1 Table.
PacBio single-molecule real-time (SMRT) sequencing (Pacific Biosciences) was also performed for L. interrogans serogroup Autumnalis strain id779, a representative Lao strain belonging to clonal group 272. Genomic DNA was extracted from a 35 ml culture with the Genomic-tip 100/G kit (Qiagen, Hilden, Germany) according to the manufacturer's protocol. PacBio sequencing was performed at the Génome Québec Innovation Centre (McGill University, Montreal, Canada) using a PacBio RS II system. The sequencing reads were assembled de novo using Unicycler [36]. The complete genome of L. interrogans serogroup Autumnalis strain id779 can be visualized in the MicroScope platform (https://mage.genoscope.cns.fr/microscope/home/index.php). Sequences were deposited at NCBI under BioSample accession SAMN18642399.
Reference-guided assembly of the other samples was performed as described previously [48], using the de novo assembled id779 genome as the reference.

Genome analysis
Percentage Of Conserved Proteins (POCP) values were determined using GET_HOMOLOGUES version 08042020 [49] with the OMCL algorithm [50]. Functional annotation was carried out using eggNOG-mapper v2 (http://eggnog-mapper.embl.de) [51]. The representation of each functional category among clonal groups was expressed as a proportion of the total number of predicted proteins in each genome, and an unpaired parametric t-test was used to compare the two independent groups. The core genome and pangenome of Leptospira were also evaluated using the same tool. Graphs and statistical analyses were produced with GraphPad Prism 6. Circular maps were generated using CGView [52].
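POCP is usually computed as (C1 + C2) / (T1 + T2) × 100, where C1 and C2 are the numbers of conserved proteins in each of two genomes and T1 and T2 are their total numbers of predicted proteins. The minimal sketch below computes it from a table of orthologue clusters. The input format (one dictionary per cluster mapping a genome name to its number of member proteins) is hypothetical, and counting every protein of a shared cluster as conserved is an assumption, not necessarily how GET_HOMOLOGUES or the authors handled paralogues.

```python
from typing import Iterable, Mapping


def pocp(conserved_1: int, conserved_2: int, total_1: int, total_2: int) -> float:
    """Percentage Of Conserved Proteins: (C1 + C2) / (T1 + T2) * 100."""
    if not (0 <= conserved_1 <= total_1 and 0 <= conserved_2 <= total_2):
        raise ValueError("conserved counts must lie between 0 and the genome totals")
    if total_1 + total_2 == 0:
        raise ValueError("the two genomes contain no proteins")
    return 100.0 * (conserved_1 + conserved_2) / (total_1 + total_2)


def pocp_from_clusters(clusters: Iterable[Mapping[str, int]], a: str, b: str) -> float:
    """POCP between genomes a and b from orthologue clusters.

    Each cluster maps a genome name to the number of its proteins in that cluster
    (hypothetical input format). Proteins in clusters shared by both genomes are
    counted as conserved (assumption: paralogues are not collapsed).
    """
    conserved_a = conserved_b = total_a = total_b = 0
    for cluster in clusters:
        n_a, n_b = cluster.get(a, 0), cluster.get(b, 0)
        total_a += n_a
        total_b += n_b
        if n_a and n_b:  # cluster present in both genomes
            conserved_a += n_a
            conserved_b += n_b
    return pocp(conserved_a, conserved_b, total_a, total_b)


if __name__ == "__main__":
    toy = [{"A": 1, "B": 1}, {"A": 1}, {"B": 2}, {"A": 1, "B": 1}]  # hypothetical clusters
    print(round(pocp_from_clusters(toy, "A", "B"), 1))  # 57.1
```

Because numerator and denominator pool both genomes, POCP is symmetric: swapping the two genomes gives the same value.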
Core genome multilocus sequence typing (cgMLST) profiles were determined using BIGSdb as described previously [16]. Briefly, the sequences of 545 core genes were extracted, concatenated and analyzed to determine the cgST (core genome sequence type) and the cgMLST clonal group (CG). A CG is defined as a group of cgMLST allelic profiles differing by no more than 40 allelic mismatches out of the 545 gene loci. Phylogenetic trees were generated with MEGA 6 [53] using the Tamura-Nei model and 100 pseudorandom bootstrap replicates. To compare the data obtained from Laos with all available sequence data from neighboring Southeast Asian countries, we extracted from our raw data the seven genes (glmU, pntA, sucA, tpiA, pfkB, mreA, caiB) that make up MLST scheme 1 [30] and supplemented our sample set with additional sequences stored in the PubMLST database (https://pubmlst.org/) (S2 Table). The open-access, curated MLST (https://pubmlst.org/) and cgMLST (https://bigsdb.pasteur.fr/) databases for Leptospira were used as the source of strain metadata (reservoir, geographic location, etc.).
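The clonal-group rule above (profiles differing by at most 40 of 545 alleles) can be sketched as a single-linkage grouping of allelic profiles. This is a minimal illustration, not BIGSdb's implementation: the use of single linkage, the convention that allele number 0 marks an uncalled locus, and the decision to ignore such loci when counting mismatches are all assumptions.

```python
from itertools import combinations
from typing import Dict, List, Sequence

MISSING = 0  # assumption: allele number 0 marks a locus that was not called


def allelic_mismatches(p: Sequence[int], q: Sequence[int]) -> int:
    """Count loci where both profiles have a called allele and the alleles differ."""
    return sum(1 for a, b in zip(p, q) if a != MISSING and b != MISSING and a != b)


def clonal_groups(profiles: Dict[str, Sequence[int]], threshold: int = 40) -> List[List[str]]:
    """Group profiles by single linkage: any two profiles within `threshold`
    mismatches are joined, and groups are the connected components."""
    parent = {name: name for name in profiles}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if allelic_mismatches(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)  # merge the two components

    groups: Dict[str, List[str]] = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Hypothetical 4-locus profiles with a 1-mismatch threshold for illustration.
    toy = {"s1": [1, 1, 1, 1], "s2": [1, 1, 1, 2], "s3": [1, 1, 2, 2], "s4": [3, 3, 3, 3]}
    print(clonal_groups(toy, threshold=1))  # [['s1', 's2', 's3'], ['s4']]
```

Note that with single linkage s1 and s3 fall in the same group although they differ at two loci, because each is within the threshold of s2; a complete-linkage rule would split them.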
The population dynamics of the identified genotypes were visualized with a Muller diagram generated in R (version 3.6.1) using the MullerPlot package [54]. Statistical inference was also performed in R (version 3.6.1).
Among the 75 patients, most were male (71.2%), and the median (range) age was 30.0 (4-73) years. The Lao Loum, who represent 75% of the population of Laos, were the main ethnic group represented (88.4%), followed by the Lao Thung (5.8%) and Lao Sung (5.8%) (S1 Table). Demographic, clinical and occupational characteristics of the study population are shown in Table 1 and S1 Table. Patients presented after a median (range) of 3.9 (1-7) days of acute illness, and most had fever, headache and myalgia. Five patients died. Mortality was higher among females than males (33.3% vs 7.1%, p = 0.0228). However, women with leptospirosis were also older (median (range) age 34 (10-65) years) than men (mean (range) age 21.5 (4-73) years) (p = 0.0271). The vast majority of patients (95.5%) reported having seen rodents in the 2 weeks before attending hospital (S1 Table).

Clonal groups and clusters of Lao Leptospira isolates
To investigate the genetic diversity of the 75 clinical isolates, we used a core genome MLST (cgMLST) scheme based on 545 genes [16]. cgMLST divided our sample set into four Leptospira species and 43 clonal groups (CGs) (Fig 1A). The most prevalent CG was CG272 (n = 18, 26.9%), a novel CG composed of isolates belonging to L. interrogans serogroup Autumnalis. CG28 (n = 10, 13.3%), corresponding to L. interrogans serogroup Canicola, has been isolated from patients, rats, dogs, pigs and calves, suggesting that this genotype is not restricted to a specific reservoir but is a widespread generalist genotype. CG40 (n = 4, 5.3%), composed of strains of L. interrogans serogroup Bataviae, has previously been described in a patient from Thailand. The remaining CGs are each represented by fewer than three strains and have not been found in other regions of the world, with the exception of CG25 (n = 2), which has already been isolated in Malaysia and Thailand. Among the Lao clinical isolates, most CGs (37/43, 86%) were detected in only a single year, while other CGs declined and disappeared over a period of years. The only CG detected in almost every year (10/12 years) was CG272, associated with L. interrogans serogroup Autumnalis (Fig 1B).
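Persistence statements such as "detected in only a single year" or "detected in 10 of 12 years" reduce to counting distinct isolation years per clonal group. The sketch below assumes hypothetical (clonal group, isolation year) records; the toy data are invented for illustration and are not the study isolates.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def years_per_group(isolates: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Number of distinct isolation years for each clonal group."""
    seen: Dict[str, Set[int]] = defaultdict(set)
    for cg, year in isolates:
        seen[cg].add(year)
    return {cg: len(years) for cg, years in seen.items()}


def single_year_groups(isolates: Iterable[Tuple[str, int]]) -> Tuple[int, int]:
    """Return (number of CGs seen in only one year, total number of CGs)."""
    counts = years_per_group(isolates)
    return sum(1 for n in counts.values() if n == 1), len(counts)


if __name__ == "__main__":
    # Hypothetical toy records, not the study data.
    toy = [("CG272", 2006), ("CG272", 2007), ("CG28", 2010), ("CG28", 2010), ("CG40", 2012)]
    print(years_per_group(toy))     # {'CG272': 2, 'CG28': 1, 'CG40': 1}
    print(single_year_groups(toy))  # (2, 3)
```

Using sets means that several isolates of the same CG from one year (CG28 in the toy data) count as a single year of detection.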
Since few genomes of Leptospira strains isolated in Southeast Asia are available, we converted the whole-genome data of these Lao clinical isolates into profiles of the widely used MLST scheme 1 [30]. We then compared the genetic data obtained in Laos with the sequence types (STs) available from neighboring countries (Fig 2 and S2 Table). Strains belonging to CG272 clustered by MLST with ST34 isolates, forming the largest cluster (71 out of 258; 27.5%). ST34 strains were described as responsible for an outbreak of leptospirosis in northeast Thailand between 1999 and 2003 [59] and have also been isolated from bandicoot rats (Bandicota indica and Bandicota savilei) in Thailand [59].
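Converting assembled genomes into MLST profiles amounts to finding which known allele of each locus is present in the contigs and then looking up the resulting allele combination in an ST table. The sketch below uses exact string matching on both strands as a deliberate simplification; dedicated tools typically use BLAST-based searches, and the scheme, allele sequences and ST table in the example are hypothetical toy inputs, not PubMLST data.

```python
from typing import Dict, List, Optional, Sequence, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def call_allele(contigs: Sequence[str], alleles: Dict[int, str]) -> Optional[int]:
    """First allele whose sequence occurs exactly, on either strand, in a contig."""
    for number, allele_seq in alleles.items():
        rc = revcomp(allele_seq)
        if any(allele_seq in contig or rc in contig for contig in contigs):
            return number
    return None  # novel allele, truncated locus or assembly gap


def call_profile(contigs: Sequence[str],
                 scheme: Dict[str, Dict[int, str]]) -> Dict[str, Optional[int]]:
    """Allele number for every locus of the scheme (None if not found)."""
    return {locus: call_allele(contigs, alleles) for locus, alleles in scheme.items()}


def sequence_type(profile: Dict[str, Optional[int]], loci: List[str],
                  st_table: Dict[Tuple[int, ...], int]) -> Optional[int]:
    """Look up the ST for a complete allelic profile; None if incomplete or unknown."""
    key = tuple(profile[locus] for locus in loci)
    if None in key:
        return None
    return st_table.get(key)


if __name__ == "__main__":
    # Hypothetical two-locus scheme and ST table for illustration.
    scheme = {"locA": {1: "ATCCA", 2: "ATGGA"}, "locB": {1: "CCCCC", 5: "AATTT"}}
    profile = call_profile(["GGGATCCAAATTT"], scheme)
    print(profile)                                                   # {'locA': 1, 'locB': 5}
    print(sequence_type(profile, ["locA", "locB"], {(1, 5): 34}))    # 34
```

Searching both strands matters because assemblers output contigs in arbitrary orientation.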

Genomic features of the outbreak strain CG272 / ST34 and comparative genomics with other L. interrogans strains
As no whole-genome sequence of the ST34 outbreak strain was available in public databases, we obtained the complete genome of one representative ST34/CG272 Lao strain using PacBio single-molecule real-time (SMRT) sequencing. The genome of L. interrogans serogroup Autumnalis strain id779, isolated from a student who died of leptospirosis (S1 Table), is composed of two chromosomes and one plasmid, with a total length of 4,727,163 bp and a total of 4,373 CDSs (Fig 3A). This isolate possesses an array of known virulence genes [60], including genes encoding more than 100 membrane-associated lipoproteins such as LipL32 and LigA/B, 17 leucine-rich proteins, 5 hemolysins and 3 sphingomyelinases. Moreover, this strain shares more than 85% (minLrap ≥ 0.8; maxLrap ≥ 0; identity ≥ 35%) of the CDSs found in the highly virulent strains L. interrogans serovar Copenhageni strain Fiocruz L1-130 and L. interrogans serovar Manilae strain UP-MMC-NIID LP, as determined using the MaGe web interface (http://www.genoscope.cns.fr/agc/mage). Interestingly, strain id779 contains a 60-kb circular plasmid on which most genes (>80%) encode hypothetical proteins of unknown function (Fig 3A). Screening the other Lao strains for these plasmid genes revealed that the plasmid is highly conserved within the CG272 strains and is also present in other Autumnalis (CG297 and CG30) and Icterohaemorrhagiae (CG290, CG280, CG289) isolates (Fig 3B).
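The shared-CDS criterion above combines a percent-identity cut-off with two length ratios. The sketch below applies those thresholds to pre-computed best hits. We assume here that Lrap is the length of the aligned region divided by the length of each protein, with minLrap and maxLrap the smaller and larger of the two ratios; this reflects our reading of MaGe's definitions and should be checked against the platform's documentation. The `Hit` record and `shared_fraction` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Hit:
    """Best alignment of a query CDS against the other genome (hypothetical record)."""
    query_len: int     # query protein length (aa)
    subject_len: int   # subject protein length (aa)
    aln_len: int       # aligned region length (aa)
    identity: float    # percent identity over the alignment


def passes(hit: Hit, min_identity: float = 35.0,
           min_minlrap: float = 0.8, min_maxlrap: float = 0.0) -> bool:
    """Apply identity and Lrap thresholds (assumed Lrap = aligned length / protein length)."""
    lraps = (hit.aln_len / hit.query_len, hit.aln_len / hit.subject_len)
    return (hit.identity >= min_identity
            and min(lraps) >= min_minlrap
            and max(lraps) >= min_maxlrap)


def shared_fraction(best_hits: Dict[str, Hit], total_cds: int) -> float:
    """Fraction of the query genome's CDSs with a qualifying hit."""
    return sum(1 for h in best_hits.values() if passes(h)) / total_cds


if __name__ == "__main__":
    hits = {
        "cds1": Hit(100, 100, 90, 50.0),  # passes: Lraps 0.9/0.9, identity 50%
        "cds2": Hit(100, 200, 90, 50.0),  # fails: minLrap 0.45
        "cds3": Hit(100, 100, 95, 30.0),  # fails: identity below 35%
    }
    print(shared_fraction(hits, total_cds=4))  # 0.25
```

Requiring minLrap ≥ 0.8 excludes hits that cover only a domain of a much longer protein, which a pure identity threshold would accept.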
To investigate genes that may give ST34/CG272 strains an advantage over other strains in environmental adaptation, host transmission, persistence or virulence, we compared the deduced proteomes of two groups of L. interrogans strains (CG272 vs non-CG272 strains). To this end, we first determined the percentage of conserved proteins (POCP) [61] among L. interrogans strains. The complete set of POCP values for all L. interrogans isolates is given in S4 Table. The POCP values for pairwise comparisons of the CG272 strains are all ≥92%, confirming that they form a homogeneous group of closely related strains. Other L. interrogans strains exhibited POCP values ranging from 53-90% when compared with L. interrogans serogroup Autumnalis strain id779. Interestingly, higher genetic relatedness (POCP values of 88-90%) was found between strain id779 and L. interrogans strains belonging to serogroups Grippotyphosa, Canicola, Autumnalis and Australis (S4 Table).

Fig 2. Prevalence of the ST34-like strains in Laos and neighbouring countries. Phylogenetic relatedness of pathogenic Leptospira strains (n = 258) circulating in Southeast Asia. Maximum-likelihood phylogeny based on 994 SNVs found in 3,113 bp-long concatenated sequences of the glmU, pntA, sucA, tpiA, pfkB, mreA and caiB loci characterized by MLST scheme 1 [30], together with the samples analyzed in this study. Isolates belonging to ST34 (CG272) are highlighted in red. Metadata associated with the isolates not listed in S1
Among the Leptospira strains from Laos, we identified a pangenome of 11,748 unique protein-coding sequences. A large majority of genes (81%) in the pangenome belong to the accessory genome, which comprises the shell and cloud genes. Similarly, the pangenome analysis of L. interrogans non-CG272 strains shows a strong enrichment (≈5×) of gene clusters that are unique to one species (5,681) compared with gene clusters in the core genome (1,211). In contrast, the core genome of L. interrogans CG272 strains accounted for 72% of the genes present in their pangenome (S1 Fig). As a second approach to explore signs of adaptation at the genomic level, orthology prediction was used for the functional annotation of the genomes included in this study, yielding clusters of orthologous groups (COGs) of proteins (S5 Table). COG categories present in CG272 strains were compared with those of the other clonal groups considered as a whole. To eliminate interspecies variability, only L. interrogans strains were included. Overall, categories related to information storage and processing, such as replication, transcription and translation, show little or no difference between groups. The only exception within these categories is "Chromatin structure and dynamics", which shows a major difference, with the highest representation in CG272 strains (Fig 4). Interestingly, the categories comprising the transport and metabolism of diverse molecules were over-represented in the CG272 strains, suggesting an enrichment in metabolic processes.
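The core/shell/cloud split used above can be illustrated by classifying each gene cluster according to how many genomes carry it. In this sketch, "core" means present in every genome, while the cloud cut-off (`cloud_max`, present in at most two genomes) is a hypothetical threshold chosen for illustration; GET_HOMOLOGUES derives its own class boundaries, which may differ. The presence/absence input is likewise a hypothetical format.

```python
from typing import Dict, Set


def partition_pangenome(presence: Dict[str, Set[str]], n_genomes: int,
                        cloud_max: int = 2) -> Dict[str, Set[str]]:
    """Split gene clusters into core, shell and cloud by how many genomes carry them.

    core: present in every genome; cloud: present in at most `cloud_max` genomes
    (hypothetical cut-off); shell: everything in between.
    """
    parts: Dict[str, Set[str]] = {"core": set(), "shell": set(), "cloud": set()}
    for cluster, genomes in presence.items():
        k = len(genomes)
        if k == n_genomes:
            parts["core"].add(cluster)
        elif k <= cloud_max:
            parts["cloud"].add(cluster)
        else:
            parts["shell"].add(cluster)
    return parts


def accessory_fraction(parts: Dict[str, Set[str]]) -> float:
    """Fraction of the pangenome in the accessory (shell + cloud) genome."""
    total = sum(len(s) for s in parts.values())
    return (len(parts["shell"]) + len(parts["cloud"])) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical presence/absence table for four genomes.
    presence = {
        "c1": {"A", "B", "C", "D"}, "c2": {"A", "B", "C"}, "c3": {"A"},
        "c4": {"B", "C"}, "c5": {"A", "B", "C", "D"},
    }
    parts = partition_pangenome(presence, n_genomes=4)
    print(accessory_fraction(parts))  # 0.6
```

For reference, the enrichment quoted in the text corresponds to 5,681 / 1,211 ≈ 4.7 strain-specific clusters per core cluster.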
To further resolve the population structure of pathogenic Leptospira strains in Laos, we reconstructed nearly complete genomes of the L. interrogans strains investigated in this study, using the complete genome of L. interrogans serogroup Autumnalis strain id779 as a reference (S6 Table). While achieving higher resolution, the ML phylogeny based on whole-genome sequences (Fig 5A) recapitulated the branching of the ML phylogeny based on the 545 core genes used in cgMLST (Fig 1A). L. interrogans strains clustered into three clades, with CG272 strains (n = 17) forming a separate monophyletic group (Fig 5A). The average nucleotide distance among the CG272 isolates forming clade 2 was strikingly low (0%) compared with that within clade 1 and clade 3 (0.4% and 0.3%, respectively). In fact, only 62 SNPs were found across the whole-genome sequences of the 17 CG272/clade 2 strains isolated over a period of nearly 12 years. As revealed by the minimum spanning tree (GrapeTree), CG272 was a dominant central genotype forming a star-like topology from which the other genotypes radiated (S2 Fig and S7 Table).
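The two diversity measures above, the number of SNP sites in a whole-genome alignment and the average pairwise nucleotide distance, can be computed as sketched below. This is an illustration under stated assumptions: the alignment is a hypothetical dictionary of equal-length aligned sequences, ambiguous bases (anything other than A/C/G/T, such as N) are ignored, and the distance is the uncorrected p-distance, which may differ from the model-based distances reported by phylogenetic software.

```python
from itertools import combinations
from typing import Dict

BASES = frozenset("ACGT")


def variable_sites(alignment: Dict[str, str]) -> int:
    """Count alignment columns carrying more than one unambiguous base (SNP sites)."""
    count = 0
    for column in zip(*alignment.values()):
        if len(set(column) & BASES) > 1:
            count += 1
    return count


def p_distance(a: str, b: str) -> float:
    """Uncorrected distance over positions that are unambiguous in both sequences."""
    pairs = [(x, y) for x, y in zip(a, b) if x in BASES and y in BASES]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def mean_pairwise_distance(alignment: Dict[str, str]) -> float:
    """Average p-distance over all pairs of aligned genomes."""
    dists = [p_distance(a, b) for a, b in combinations(alignment.values(), 2)]
    return sum(dists) / len(dists) if dists else 0.0


if __name__ == "__main__":
    # Hypothetical 10-bp alignment; s3 has an ambiguous base (N) at position 5.
    aln = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAA", "s3": "ACGTNCGTAC"}
    print(variable_sites(aln))                     # 1
    print(round(mean_pairwise_distance(aln), 4))   # 0.0704
```

Over genomes of several megabases, 62 SNP sites among 17 strains yields a mean pairwise distance that rounds to 0%, consistent with the value reported for clade 2.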
Fig 3. … Fig 1A. The heatmap was obtained by a TBLASTN analysis of the plasmid sequences against the genomic sequences of the strains. A gene was considered present if the hit had an e-value lower than 1e-10 and at least 50% similarity. Light green represents sequences with a similarity of 50% to 79%, and dark green sequences with a similarity of 80% to 100%. Color bar on the left: red, L. interrogans; blue, L. weilii; green, L. borgpetersenii; yellow, L. kirschneri.

To identify markers that could discriminate between ST34/CG272 isolates and other circulating isolates, we tested the most frequently used diagnostic and/or typing PCR assays (S3 Table). In silico PCR showed that all primers targeting lfb1, lipL32 and secY are able to specifically bind to all isolates in this study, with at most 2 mismatches per primer sequence. To assess the resolving power of the assays, we compared the ML phylogeny built from the sequences of the PCR products with the core genome phylogeny. Sequencing of the PCR products of the lfb1, lipL32 and two secY assays distinguished 32%, 7%, 32.5% and 49% of the clonal groups identified by cgMLST, respectively. Most importantly, sequencing of the 549-bp secY DNA fragment [57], the assay with the highest resolving power (49%), was the only assay able to straightforwardly distinguish the CG272 strains from other strains.
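The in silico PCR criterion above (primer binding with at most 2 mismatches per primer) can be sketched as an ungapped mismatch scan on both strands. This is a simplified illustration, not the tool used in the study: it ignores gaps, the greater importance of mismatches at the primer 3' end, melting temperature and any maximum amplicon length, and the primer and template sequences in the example are hypothetical.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def binding_sites(primer: str, template: str, max_mm: int) -> List[int]:
    """Start positions where the primer aligns ungapped with <= max_mm mismatches."""
    k = len(primer)
    return [
        i for i in range(len(template) - k + 1)
        if sum(p != t for p, t in zip(primer, template[i:i + k])) <= max_mm
    ]


def in_silico_pcr(fwd: str, rev: str, template: str, max_mm: int = 2) -> List[str]:
    """Predicted amplicons: a forward-primer site followed downstream by a site
    matching the reverse complement of the reverse primer, on either strand."""
    rev_rc = revcomp(rev)
    products = []
    for strand in (template, revcomp(template)):
        for f in binding_sites(fwd, strand, max_mm):
            for r in binding_sites(rev_rc, strand, max_mm):
                if r >= f + len(fwd):  # reverse site must lie downstream, no overlap
                    products.append(strand[f:r + len(rev_rc)])
    return products


if __name__ == "__main__":
    # Hypothetical template and primers; "ACGATG" carries one mismatch to the template.
    template = "AAAAACGTTGCCCCCCGATTACTTTT"
    print(in_silico_pcr("ACGTTG", "GTAATC", template))            # includes 'ACGTTGCCCCCCGATTAC'
    print(in_silico_pcr("ACGATG", "GTAATC", template, max_mm=0))  # []
```

With real primers of about 20 nt, the same scan over each assembled genome indicates which isolates would amplify, and the predicted amplicon sequences can then be compared across isolates to estimate each assay's resolving power.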

Discussion
We used genome sequences of 75 Leptospira isolates from patients in Laos and the metadata associated with these isolates. This is, to our knowledge, the largest genomic study of pathogenic Leptospira isolates circulating in a single country. We also included MLST data of clinical strains from other Southeast Asian countries (Thailand, Indonesia, Philippines, Malaysia) to explore the underlying diversity of Leptospira and to understand the dynamics of epidemic and endemic Leptospira.
Core genome multilocus sequence typing (cgMLST) of the 75 strains identified 43 different core genome clonal groups (cgCGs), revealing a high diversity of Leptospira genotypes in Laos. As described previously [16], one serogroup can be subdivided into distinct CGs, but strains that are part of the same CG belong to the same serogroup. Among the 43 cgCGs, only 3 (CG272, n = 18; CG28, n = 10; CG40, n = 4) are composed of more than 3 strains. Analyses to identify associations between Leptospira genotypes and particular epidemiological variables, and specifically to test whether some genotypes are predictors of disease outcome, cannot be performed with such a small sample size. Analysis of a larger number of strains would have more power to identify associations with clinical characteristics. Multiple definitions of severe infection are currently in use [7,62-65], and a consensus definition would help in finding associations between clinical severity and genotypes. Additional information, such as the epidemiology of reservoir hosts, modes of transmission, patient comorbidities and treatment, would also facilitate more wide-ranging analysis.
We show that a single clone of L. interrogans serogroup Autumnalis, previously identified as ST34 in Thailand [59], was responsible for a substantial proportion of infections in Laos from 2006 to 2017. This group of strains has been predominant in Thailand, where a large outbreak of ST34 strains was reported between 1998 and 2003, but investigations did not find any specific epidemiological factor linked to this outbreak [59]. Whether ST34 was present in Laos before 2006 is unknown. A comparison of the fitness of this outbreak strain with that of non-sympatric outbreak strains under different environmental conditions (pH levels, temperatures, water sources) did not find any significant difference [66]. Several genomic features suggest that ST34/CG272 strains are clinically important clones that require more attention: (i) CG272 was a dominant central genotype forming a star-like topology from which the other genotypes radiated, suggestive of clonal expansion of these strains; (ii) CG272 was the only clonal group detected in all but 2 years; (iii) CG272 forms a well-defined monophyletic clade and these strains are highly clonal (only 62 SNPs were found across the 17 genomes over a period of nearly 12 years); and finally (iv) the average nucleotide distances within CG272 were low, and the POCP values high, compared with other phylogenetic clades. CG272 is therefore a clinically important and highly clonal sublineage of L. interrogans in this region. The low genome diversity among CG272 strains may reflect adaptation of this clonal group to a specific niche, such as bandicoot rats, which were identified as a likely maintenance host of ST34 [59]. Notably, the secY allele of ST34 has also been recovered from a wild mouse, Mus cookii, in northern Thailand [67]. By contrast, strains occupying different niches, such as the non-CG272 isolates, are expected to experience a wider range of selection pressures, which may drive selection for increased genome diversity [68].
Although the mechanisms of pathogenesis of pathogenic Leptospira spp. are not completely understood, virulence-associated factors have been identified, including motility, adhesion, stress response and evasion of the immune response [69]. To investigate whether ST34/CG272 strains possess genes that may confer a phenotypic advantage over other L. interrogans, we looked for genetic differences between Lao ST34/CG272 strains and non-ST34/CG272 strains. CG272 strains showed an enrichment of several COG categories, mostly associated with metabolic processes, which may play important roles in the adaptation of CG272 strains to specific ecological niches. The 60-kb plasmid, in which a large proportion of genes encode hypothetical proteins, was not specific to CG272 strains, as it was also found across different CGs and serogroups. We failed to identify any other accessory genomic elements that might have influenced the spread of ST34/CG272 strains. We are thus still far from understanding the precise nature of this clonal success.
Concerted control efforts targeting CG272/ST34 isolates specifically could reduce epidemic and endemic risks of leptospirosis in Southeast Asia. Leptospira are fastidious and slow-growing bacteria, and culture isolation from biological samples is still challenging. Here, we show that secY can be used as a discriminative marker to distinguish CG272/ST34 isolates from other circulating strains. Leptospira genotyping directly on biological samples should allow the epidemiological follow-up of circulating strains, as previously shown in leptospirosis patients in French Polynesia [70]. Large-scale epidemiological studies in Laos and neighboring countries should better identify the prevalence over time and the geographic spread of CG272/ST34 isolates in both patients and potential reservoir hosts. Leptospira genotyping should also help identify the extent and mode of transmission of epidemic clone(s) in case of outbreaks and the impact of control interventions on disease transmission.
In conclusion, genome analysis associated with detailed epidemiological and clinical data should lead to major insights into the evolution, biology and pathogenesis of this emerging pathogen. Whole-genome based typing should become available as a routine tool because of the continued decrease in costs, thus improving surveillance methodology and outbreak investigations as it has for the COVID-19 pandemic. Although the isolation of Leptospira strains from biological samples remains challenging, recent advances such as the use of a new cocktail of antibiotics [71] and development of a novel culture medium for fastidious Leptospira [72] should facilitate expanded culture isolation.
Supporting information
S1 Table. COG categories of genes in the L. interrogans genomes. From the files generated by eggNOG-mapper v2, the columns corresponding to "gene name" and "COG categories" for each L. interrogans genome were copied into a new Excel file named "All_genomes_COG_analysis.xlsx". The information for each genome is contained in a separate sheet named after the corresponding genome. The comparative analysis focused on differences in categories related to metabolic pathways, since most of them showed highly significant differences between the groups (CG272 vs others): [G] Carbohydrate transport and metabolism, [E] Amino acid transport and metabolism, [F] Nucleotide transport and metabolism, [H] Coenzyme transport and metabolism, [I] Lipid transport and metabolism, [P] Inorganic ion transport and metabolism. Genes related to these pathways were therefore filtered, and the gene lists for each genome were then compared using the online tool https://www.molbiotools.com/listcompare.html.
Minimum spanning tree created using GrapeTree for visualization of core genomic relationships [1]. Each tree node represents the core genome of a single sample; the cgMLST clonal groups are indicated by the numbers inside the tree nodes, and the geographic origin is indicated by colors. Strains are listed in S6 Table. The base layer of the map is from outline-world-map.com. (DOCX)