Whole Genome Analysis of Leptospira licerasiae Provides Insight into Leptospiral Evolution and Pathogenicity

The whole genome analysis of two strains of the first intermediately pathogenic leptospiral species to be sequenced (Leptospira licerasiae strains VAR010 and MMD0835) provides insight into their pathogenic potential and deepens our understanding of leptospiral evolution. Comparative analysis of eight leptospiral genomes shows the existence of a core leptospiral genome comprising 1547 genes and 452 conserved genes restricted to infectious species (including L. licerasiae) that are likely to be pathogenicity-related. Comparisons of the functional content of the genomes suggest that L. licerasiae retains several proteins related to nitrogen, amino acid and carbohydrate metabolism, which might help to explain why these Leptospira grow well in artificial media compared with pathogenic species. L. licerasiae strains VAR010T and MMD0835 possess two prophage elements. While one element is circular and shares homology with LE1 of L. biflexa, the second is cryptic and homologous to a previously identified but unnamed region in L. interrogans serovars Copenhageni and Lai. We also report a unique O-antigen locus in L. licerasiae comprising a 6-gene cluster that is unexpectedly short compared with L. interrogans, in which analogous regions may include >90 such genes. Sequence homology searches suggest that these genes were acquired by lateral gene transfer (LGT). Furthermore, seven putative genomic islands ranging in size from 5 to 36 kb are present, also suggestive of antecedent LGT. How Leptospira become naturally competent remains to be determined, but considering the phylogenetic origins of the genes comprising the O-antigen cluster and other putative laterally transferred genes, L. licerasiae must be able to exchange genetic material with non-invasive environmental bacteria. The data presented here demonstrate that L. licerasiae is genetically more closely related to pathogenic than to saprophytic Leptospira and provide insight into the genomic bases for its infectiousness and its unique antigenic characteristics.


Introduction
Leptospirosis is a globally important infectious disease that takes a disproportionate toll in tropical regions [1]. Caused by more than 250 serovars of spirochetes distributed among nine species of pathogenic Leptospira and at least five known species of intermediate Leptospira [2], the burden of leptospirosis falls predominantly on people living in poverty and under inadequate sanitary conditions [3]. Yet, pathogenic mechanisms in leptospirosis remain poorly understood [2]. Reasons for the varying potentials of different varieties of Leptospira to cause human disease have not been explored. Mechanisms of leptospiral tropisms for different mammalian reservoir hosts are unknown. Lateral transfer of DNA has been observed in Leptospira but mechanisms for such transfer have yet to be defined [4][5][6]. The present study was designed to gain insight into the evolution of intermediate Leptospira with the highest degree of resolution currently possible, using comparative whole genome analysis, and to explore the degree to which evidence might link this leptospiral clade to an evolutionary position between pathogenic and saprophytic Leptospira clades as suggested by phylogenetic analysis of 16S rRNA gene sequences [7][8][9].
DNA-relatedness and phylogenetic analyses have resolved the genus Leptospira into three distinct lineages [8,10,11,12,13,14] comprising 20 species: nine pathogens, five intermediates and six saprophytes. Pathogenic Leptospira are capable of infecting and causing disease in humans and animals; intermediate Leptospira are able to infect humans and animals and cause a variety of clinical manifestations [8,15,16], although less frequently; saprophytic Leptospira are environmental bacteria that do not infect mammals at all. Genome sequencing efforts have so far focused on pathogenic (L. interrogans [17,18] and L. borgpetersenii [19]) and saprophytic species (L. biflexa [20]). Genomic comparisons indicate that while the L. biflexa genome is relatively stable, the genomes of pathogenic species have undergone considerable insertion sequence-mediated rearrangement [19,20]. It has been shown that there is considerable genomic plasticity even within the same species. For example, a ~54 kb genomic island and a large inversion in Chromosome I differentiate the L. interrogans sv. Lai and Copenhageni genomes [18], whose coding sequences are ~99% similar at the amino acid level. A comparison of in vitro growth characteristics also indicates that the third lineage of Leptospira, which includes L. licerasiae, occupies an intermediate position between the pathogenic and saprophytic species. Despite reference to intermediate Leptospira as "saprophytic intermediates" [9], convincing clinical data confirm the pathogenicity of these Leptospira [8,13]. Knowledge of the genomic content of these intermediate species is necessary to complete our understanding of leptospiral evolution.
In this study, we sequenced and annotated genomes of L. licerasiae sv. Varillal strains VAR010 and MMD0835, the first intermediate species to be sequenced. In view of the range of stresses encountered by pathogenic bacteria during the course of infection, it is becoming apparent that in addition to virulence factors such as hemolysins there are additional proteins or contributory (pathogenicity-associated) factors involved in stress management strategies that are essential for successful infection. Genomic comparisons of the infectious species L. licerasiae, L. interrogans, L. borgpetersenii and the non-infectious saprophyte L. biflexa have provided much needed insight into these contributory factors, leptospiral virulence and pathogenicity.

Methods
Bacterial strain and genomic DNA extraction
L. licerasiae sv. Varillal type strain VAR010T (human isolate) and strain MMD0835 (Philander isolate) were originally isolated in Iquitos, Peru [8]. The type strain has been deposited in the American Type Culture Collection (ATCC BAA-1110T). L. licerasiae sv. Varillal str. MMD0835 is available through BEI Resources (http://www.beiresources.org/). Both strains were grown in liquid Ellinghausen-McCullough-Johnson-Harris (EMJH) medium under standard culture conditions to a density of ~10^8 organisms/mL. Cells were harvested from 10 mL of culture (10^9 Leptospira) by centrifugation and genomic DNA (gDNA) was extracted using TRIzol (Invitrogen Life Technologies, USA) following the manufacturer's directions. To remove RNA, extracted gDNA was then treated with an RNase cocktail (Roche, USA) containing RNase A and H.

Genome sequencing and assembly
The genome of L. licerasiae sv. Varillal type strain VAR010T was sequenced using a combination of 454 FLX Titanium and Illumina Solexa Genome Analyzer IIX. Paired-end libraries were constructed with fragment sizes ranging from 2000 to 4000 bp for 454 and 200 to 300 bp for Illumina. A total of 2,272,294 reads (454:Illumina ratio of 1:4.26) were assembled using the Celera Assembler version 7.0beta [21]. The genome assembled into 14 contigs (4 scaffolds) at 58-fold sequence coverage, with 99.93% of the genome at more than 19-fold coverage. L. licerasiae sv. Varillal str. MMD0835 was sequenced using only the Illumina Genome Analyzer II platform. A single paired-end library with a fragment size of 300-500 bp was constructed. A total of 1,112,438 reads were used by the CLC bio de novo assembler (CLC NGS Cell v. 3.20.50819, http://www.clcbio.com) to generate 48 contigs at 25-fold sequence coverage, with 78.0% of the genome above 19-fold coverage (99.9% above 4-fold coverage).

Deposition of Genome Sequence Data
The nucleotide sequences and the corresponding automated annotations for the genomes of L. licerasiae str. VAR010T and MMD0835 were submitted to GenBank, with accession numbers AHOO01000000 and NZ_AFLO00000000, respectively.

Annotation
Genomes were run through the JCVI automated annotation pipeline v10.0. Ab initio gene predictions were generated using Glimmer3 [22] in an iterative fashion. The initial set of gene predictions was then used to train a second round of Glimmer3 analysis to produce the final set of gene predictions. All predicted genes were subsequently translated into all six reading frames and searched against a non-redundant amino-acid database using BLASTP. Each query protein-coding region was extended by 300 nucleotides in an attempt to extend the alignment through regions of low similarity and through different frames and stop codons using Blast-Extend-Repraze (BER, http://ber.sourceforge.net/). All putative protein coding sequences (CDS) were then searched against Pfam [23] and TIGRFAM [24] protein family models with HMMER3 [25]. Coding sequences that scored well to these models were assumed to share the function modeled by the HMM. All predicted proteins were then searched against the NCBI Protein Clusters Database (PRK) [26]. The remaining evidence types used in the automated functional annotation of gene products were SignalP [27], which detects the presence of putative signal sequences, and TmHMM [28], which predicts membrane-spanning regions.
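The six-frame translation step mentioned above can be illustrated with a minimal sketch. It is not the JCVI pipeline code; the function names are hypothetical, and the standard codon assignments are assumed (bacterial translation table 11 differs from the standard table only in its start codons, which are not modeled here).

```python
from itertools import product

# Standard codon assignments in TCAG order (assumption: alternative start
# codons of the bacterial table are not handled in this sketch).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to translations."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


frames = six_frame_translations("ATGGCCTAA")
```

For the toy sequence ATGGCCTAA, frame +1 reads ATG GCC TAA and frame -1 reads the reverse complement TTA GGC CAT.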
The autoAnnotate program weighed the evidence obtained from the searches from a ranked list of evidence types to make a preliminary annotation, including name, gene symbol, Enzyme Commission (EC) [29] number, JCVI role category [30], and Gene Ontology (GO) [31] terms to each protein in the genome. Each protein was assigned a descriptive common name coming from an HMM name, a JCVI database of experimentally characterized proteins (CharProtDB) [32], or from a best BER match protein. Proteins predicted to encode enzymes were assigned EC numbers, JCVI role categories, GO terms and gene

Author Summary
Leptospirosis is one of the most common diseases transmitted by animals worldwide and is important because it is a major cause of febrile illness in tropical areas and also occurs in epidemic form associated with natural disasters and flooding. The mechanisms through which Leptospira cause disease are not well understood. In this study we have sequenced the genomes of two strains of Leptospira licerasiae isolated from a person and a marsupial in the Peruvian Amazon. These strains were thought to be able to cause only mild disease in humans. We have compared these genomes with other leptospires that can cause severe illness and death and another leptospire that does not infect humans or animals. These comparisons have allowed us to demonstrate similarities among the disease-causing Leptospira. Studying genes that are common among infectious strains will allow us to identify genetic factors necessary for infecting, causing disease and determining the severity of disease. We have also found that L. licerasiae seems to be able to take up and incorporate genetic information from other bacteria found in the environment. This information will allow us to begin to understand how Leptospira species have evolved.

Comparative genomics
Regions of pairwise synteny between the Leptospira genomes were identified by first finding the maximal unique matches with a minimum length of five amino acids using PROmer [22,43], followed by visualization of the data using MUMmerplot (http://mummer.sourceforge.net/) and Gnuplot 4.0 (http://www.gnuplot.info/) as previously described [44]. QuartetS [45] was used to identify orthologous protein sequences among the eight Leptospira genomes used in this study. QuartetS uses an approximate phylogenetic analysis of quartet gene trees to infer the occurrence of duplication events and discriminate paralogous from orthologous genes [45]. The QuartetS pipeline was run with default parameters. To be considered orthologs, the bi-directional best hit pairs had to satisfy the following conditions: (i) the alignment region had to cover at least 50% of the length of each sequence and (ii) the e-value of the pair-wise alignment had to be below 1e-5.
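The bi-directional best hit criteria described above can be sketched as a simple filter. This is an illustrative approximation only: it does not reproduce the quartet-tree analysis that QuartetS performs, the `Hit` record and function names are hypothetical, and the default e-value cutoff is an assumed parameter that should be set to the threshold actually used.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One pairwise protein alignment (hypothetical record layout)."""
    query: str
    subject: str
    evalue: float
    bitscore: float
    query_cov: float    # fraction of the query length covered by the alignment
    subject_cov: float  # fraction of the subject length covered


def passes_filters(hit, min_cov=0.5, max_evalue=1e-5):
    """Apply the coverage and e-value conditions (cutoffs are assumptions)."""
    return (
        hit.query_cov >= min_cov
        and hit.subject_cov >= min_cov
        and hit.evalue <= max_evalue
    )


def best_hits(hits, **cutoffs):
    """Map each query to its highest-scoring subject among filtered hits."""
    best = {}
    for hit in hits:
        if not passes_filters(hit, **cutoffs):
            continue
        if hit.query not in best or hit.bitscore > best[hit.query].bitscore:
            best[hit.query] = hit
    return {query: hit.subject for query, hit in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a, **cutoffs):
    """Return sorted (a, b) pairs that are each other's best filtered hit."""
    forward = best_hits(a_vs_b, **cutoffs)
    reverse = best_hits(b_vs_a, **cutoffs)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy data: only a1/b1 satisfies both conditions in both directions.
a_vs_b = [
    Hit("a1", "b1", 1e-50, 200.0, 0.9, 0.9),
    Hit("a1", "b2", 1e-10, 80.0, 0.8, 0.8),
    Hit("a2", "b2", 1e-30, 150.0, 0.4, 0.9),  # fails query coverage
    Hit("a3", "b3", 1e-3, 40.0, 0.9, 0.9),    # fails e-value
]
b_vs_a = [
    Hit("b1", "a1", 1e-50, 200.0, 0.9, 0.9),
    Hit("b2", "a2", 1e-30, 150.0, 0.9, 0.4),  # fails subject coverage
    Hit("b3", "a3", 1e-3, 40.0, 0.9, 0.9),    # fails e-value
]
pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
```

Requiring coverage of both sequences guards against pairing proteins that share only a single domain.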
To better understand the functional differences between pathogenic, intermediate and saprophytic Leptospira, each of the annotated genomes was uploaded to the RAST (Rapid Annotation using Subsystem Technology) server [46] retaining the original gene calls. Subsystems predicted to be active within each genome were then compared. A subsystem is a generalization of the concept of a biochemical pathway, extended to include ancillary components and alternative reactions reflecting functional variants found in various species.

Prophage detection
Prophages were identified using Phage_Finder [47] version 2.0, which now utilizes HMMER3 [25,48], drastically improving the speed of the HMM searches. Predicted prophage regions were identified using default settings and under strict (-S) mode. To facilitate identification of prophages in Leptospira genomes, Bacteriophage LE1 [49,50,51] was added to the BLAST database used for prophage identification. Phage_Finder version 2.0 is available at http://sourceforge.net/projects/phage-finder/files/phage_finder_v2.0/ under the GNU General Public License.

Lipopolysaccharide preparation and gas chromatography mass spectrometry (GC-MS) analysis
A three-day culture of L. licerasiae str. VAR010 (~10^8 cells/mL) was harvested by centrifugation at 4000 rpm for 90 min at room temperature. Cells were washed thrice with 1× PBS then treated with 50% aqueous phenol for 30 min at 65°C with continuous stirring. The cells were immediately immersed in an ice-water bath to reduce the temperature to 10°C, then centrifuged at 4000 rpm for 40 min at 10°C. The top layer (phenol-saturated aqueous layer) and bottom layer (water-saturated phenol layer) were removed and dialyzed extensively against ddH2O to remove phenol (three days, with the water changed twice per day); the phenol layer was analyzed by GC-MS and polyacrylamide gel electrophoresis. The dialyzed lipopolysaccharide (LPS) was lyophilized then resuspended in 500 µL of water; 200 µL was used for sugar composition analysis. For GC-MS, samples were silylated with trimethylsilyl (TMS) groups. First, samples were methanolyzed using 1 M MeOH-HCl at 80°C for 16 h, followed by re-N-acetylation and TMS derivatization using Tri-Sil TP reagent (Thermo Scientific) according to the manufacturer's directions. The derivatives were subjected to GC-MS analysis and the data quantified using an internal inositol standard. LPS isolation and GC-MS analysis were done by the Glycotechnology Core Resource at the University of California, San Diego.

Results and Discussion
Assembly and annotation details of two draft L. licerasiae genomes
454 pyrosequencing and Illumina sequencing of str. VAR010T yielded 2,272,294 reads that were assembled into 14 contigs (4 scaffolds) with 4,211,147 high-quality, mostly contiguous bases. These contigs had an average length of 300.8 kb, an N50 of 522.9 kb and a maximum length of 1.67 Mb. The str. MMD0835 genome was assembled into 48 contigs with 4,198,811 contiguous bases (N50 of 463.5 kb; max. length of 1.07 Mb). The overall characteristics of the draft L. licerasiae genomes are summarized in Table 1. Gaps in genome coverage were not filled in with manual sequencing given resource constraints. This approach is consistent with de novo sequencing and publication of other pathogen genomes, given that the length of the draft genomes was consistent with other sequenced leptospiral genomes (Table 1) and that the two strains whose genome sequences are reported here are highly similar. Gaps are typically caused by large (greater than the library "insert" size) fragments, which tend to be rRNA operons, large mobile elements or duplicated regions, and likely do not materially detract from the quality of the data analysis presented here.
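For readers unfamiliar with the assembly statistics quoted above, the following minimal sketch shows how N50 and mean contig length are conventionally computed. The contig lengths in the toy example are hypothetical, since individual contig lengths are not listed here; only the total of 4,211,147 bases in 14 contigs is taken from the text.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50() requires at least one contig length")


def mean_length_kb(total_bases, n_contigs):
    """Mean contig length in kilobases, rounded to one decimal place."""
    return round(total_bases / n_contigs / 1000, 1)


# Hypothetical contig lengths (arbitrary units) for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_contigs)

# Totals reported for str. VAR010T: 4,211,147 bases in 14 contigs.
var010_mean_kb = mean_length_kb(4_211_147, 14)
```

With the reported totals, the mean contig length works out to 300.8 kb, matching the value given for VAR010T.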

General genome features of L. licerasiae str. VAR010 and MMD0835
Non-coding RNA (ncRNA) genes and regulatory elements. The L. licerasiae genomes were examined for the presence of riboswitches [52] and other ncRNA regulatory elements. Riboswitch predictions in the finished leptospiral genomes were confirmed by an online search of the Rfam database [36]. Only candidates passing Rfam trusted cutoffs and therefore very likely to be true ncRNAs are presented. All infectious Leptospira contain at least two copies of the cobalamin (vitamin B12) riboswitch. As in other bacteria, both riboswitches appear to regulate expression of genes necessary for transport and biosynthesis of vitamin B12. The first, LEP1GSC185_0331, is immediately upstream of a gene encoding a TonB-dependent ligand-gated channel with similarity to the outer membrane cobalamin transport protein, BtuB, and the second, LEP1GSC185_3336, is immediately upstream of two genes encoding a putative cobalt transporter (cbtA-LEP1GSC185_3338; LlicsVM_010100017167 and cbtB-LEP1GSC185_3337; LlicsVM_010100017162) and the adjacent cobalamin biosynthesis (cob) operon. The lack of a cobalamin riboswitch and an incomplete cob operon in the saprophyte L. biflexa (see below) suggest that the ability to respond to cobalamin levels and synthesize B12 de novo from simpler metabolites is restricted to infectious Leptospira. Interestingly, the L. licerasiae genes encoding CbtA and CbtB, which share homology with Pseudomonas syringae proteins, may have been acquired via lateral gene transfer (LGT) since these genes are uncommon in Leptospira; homologs of both proteins are also present in L. broomii, L. inadai and L. kmetyi.
All of the genomes studied here possess a single thiamine pyrophosphate (TPP; LEP1GSC185_0557) riboswitch (Table 1) that in L. licerasiae is directly upstream of thiC (LEP1GSC185_0556). The thiamine biosynthesis protein, ThiC, converts 5'-phosphoribosyl-5-aminoimidazole to 4-amino-5-hydroxymethyl-2-methylpyrimidine, an important intermediate in the synthesis of TPP. A putative cis-regulatory element unique to L. licerasiae was also identified, ydaO-yuaA (LEP1GSC185_1591). This element is thought to be triggered during osmotic shock, leading to activation of ydaO, a predicted amino acid transporter gene, and members of the yuaA-yubG operon, which encode the KtrA and KtrB K+ transporters [53]. While a role in L. licerasiae is yet to be established, this element is found immediately upstream of a universal stress family protein (LEP1GSC185_1590; LlicsVM_010100003660), which has homology at the C-terminus to a family of universal stress proteins (USPs) and Na+/H+ exchangers (NHEs). USPs are small cytoplasmic bacterial proteins whose expression increases when the cell is exposed to stress agents such as DNA-damaging agents [54]. These proteins are thought to enhance survival during prolonged exposure to such conditions [54]. Indeed, one such protein, UspA, is upregulated in Leptospira at physiological temperature, implying a role during in vivo growth [55]. NHEs are found in both prokaryotes and eukaryotes and are believed to be crucial for cell volume homeostasis [56]. Thus, it is possible that in L. licerasiae, the ydaO-yuaA element responds to and permits survival during periods of osmotic stress. This mechanism could allow for survival in environmental waters.

Prophages
Prophages can be important drivers of microbial evolution by providing fitness factors for their host [57,58], by facilitating movement of DNA through transduction of the host chromosome or packaging of pathogenicity islands [59] and altering serotype through lysogenic conversion [60,61]. To explore any of these possibilities in any of the available Leptospira genomes, Phage_Finder [47] was run under strict (-S) mode to identify prophage regions. Phage_Finder identified two prophage regions in the genomes of both L. licerasiae strains.
The first region in each strain was located on ~103 kb contigs (AHOO02000007 in VAR010 and NZ_AFLO01000023 in MMD0835) with best BLASTP matches to bacteriophage LE1 of L. biflexa. LE1 was previously shown to be of circular topology, to form intracellular particles consistent with phage, and to replicate like a plasmid [51]. Given this information and that a large portion of each contig was predicted to be prophage, it was reasonable to believe these phage-like contigs in L. licerasiae also represented linear forms of circular phage genomes like LE1. There was significant overlap in the sequence of the ends, also suggesting a circular form. To demonstrate circular topology, outward-facing primers were designed and used in PCR reactions. The PCR produced 300 bp products, indicating that both LE1-like phages are indeed circular in L. licerasiae strains VAR010 and MMD0835 (Figure 1). Comparisons between LE1 and these prophages at the protein level indicated that the latter three quarters of the L. licerasiae prophage proteins match some portion of LE1 (Figure 1), albeit at a low percent identity (average ~30% identity). A comparison between the two LE1-like prophages revealed that they are identical at the amino acid level (Figure 1). We propose naming the L. licerasiae LE1-like prophages vB-LliZ_VAR010-LE1 and vB-LliZ_MMD0835-LE1 using a previously suggested systematic bacteriophage nomenclature [62]. vB-LliZ_VAR010-LE1 encodes 102 predicted proteins and has a G+C content of 37.8%, which is lower than the average for the entire L. licerasiae genome (41.6%). These L. licerasiae prophage elements possess ~22 kb of unique sequence that LE1 lacks as well as several unique predicted open reading frames interspersed among the LE1 homologs. A comparison of this ~22 kb region to other Leptospira genomes identified multiple efflux pumps in the infectious L. licerasiae that may function in adaptation to the mammalian host. 
Further, this amino acid similarity to bacterial efflux pumps suggests phage-mediated gene transfer between the L. licerasiae chromosome and LE1-like prophage. While the presence of these efflux pumps in the genomes of other infectious species would also suggest a role in pathogenicity, BLASTP searches against the non-redundant protein database (nr) indicate that these proteins have homologs in the non-pathogen Leptonema illini DSM 21528. Why L. licerasiae and not L. biflexa have maintained copies of these genes is unclear. Also within this region, the predicted L. licerasiae protein LEP1GSC185_3887 is notable in that it shares homology with a TolC/IS1533 transposase fusion protein. It has been suggested that the mobile genetic element (MGE) IS1533, has mediated LGT resulting in the antigenic switch of sv. Copenhageni to sv. Hardjo [63].
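The observation that the ends of the LE1-like contigs overlap, which prompted the PCR test for circularity, can be illustrated in silico. The sketch below is a hypothetical helper, not the procedure used in this study: it looks only for an exact match between the start and end of a contig, whereas real assemblies may need an alignment that tolerates mismatches.

```python
def terminal_overlap(seq, min_overlap=20):
    """Return the length of the longest exact overlap (>= min_overlap)
    between the end and the start of seq, or 0 if none is found.

    A non-zero result is consistent with a linear contig representing a
    circular molecule; it is not proof of circular topology.
    """
    seq = seq.upper()
    for k in range(len(seq) // 2, min_overlap - 1, -1):
        if seq[-k:] == seq[:k]:
            return k
    return 0


def trim_circular(seq, min_overlap=20):
    """Remove the duplicated terminal copy so each base appears once."""
    k = terminal_overlap(seq, min_overlap)
    return seq[:-k] if k else seq


# Toy contig: a 20-base sequence repeated at both ends of a 30-base body.
repeat = "ATGCGTACGTTAGCCTAGGA"
toy_contig = repeat + "C" * 30 + repeat
overlap_len = terminal_overlap(toy_contig)
unit_len = len(trim_circular(toy_contig))
```

Outward-facing primers placed near each contig end would be expected to amplify across such a junction only if the molecule is circular (or tandemly repeated), which is why the PCR result provides the experimental confirmation.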
The second prophage region was only partially detected in the L. licerasiae genomes by Phage_Finder (Figure 2), but is adjacent to a cryptic prophage region expressed in L. interrogans sv. Lai and is presumably associated with pathogenicity [64]. The region detected by Phage_Finder is located at nucleotide position 210203..191954 of VAR010 and 71814..108770 of MMD0835, but after comparison to the above-mentioned unnamed prophage element in L. interrogans sv. Lai, could be extended to include coordinates 210203..171583 of VAR010 and 71814..110434 of MMD0835 (Figure 2). Presumably this region was truncated by Phage_Finder because of a lack of sufficient homology in the BLAST database used and/or the lack of a head morphogenesis region, which is required by Phage_Finder to label a region as "prophage" under strict mode. Since this region lacks an identifiable head morphogenesis region yet retains tail-like proteins, it may be functionally analogous to phage tail-type bacteriocins, called pyocins in Pseudomonas aeruginosa [65] and monocins in Listeria [66].
Comparison of the pathogenic, intermediately pathogenic and saprophytic leptospiral genomes
L. licerasiae str. VAR010 causes mild disease in humans and has been isolated from peridomestic and wild rodents and marsupials in Peru [8]. Although phenotypic differences between VAR010 and MMD0835 have yet to be described, VAR010 (3931 total CDS) has 185 non-orthologous CDS relative to strain MMD0835 (3885 total CDS), whereas strain MMD0835 has 140 non-orthologous CDS relative to strain VAR010, reminiscent of another environmental pathogen with a plastic genome, Burkholderia pseudomallei [67]. The majority of these non-orthologous genes encode hypothetical proteins. Both strains share 3,745 CDS with an average pair-wise amino acid similarity of 99.98%. Of these, 1211 have no orthologs in the other genomes used in this study. A putative function could be assigned to 632, with the remainder (579) comprising hypothetical proteins (Table S1).
Considering only those genes common to both strains of each species, L. licerasiae shares 2,237 (~57%) with L. interrogans, 2,077 (~53%) with L. borgpetersenii and 1,898 (~48%) with L. biflexa. A total of 1,547 orthologs (~39% of the predicted L. licerasiae CDS) were present in all genomes compared (Figure 3) and likely represent a substantial proportion of the core genome of Leptospira. As shown in Figure 4, the gene order is more conserved in the intermediate and pathogenic branches. Surprisingly, L. licerasiae had the highest average protein identity with L. interrogans sv. Lai (2,278 proteins with an average pairwise identity of ~67%). Taken together, these observations suggest that L. licerasiae is more closely related to the pathogenic branch of infectious Leptospira than to the saprophyte, L. biflexa. This was unexpected since 16S rRNA phylogeny suggests that L. licerasiae occupies an intermediate position between the pathogens and saprophytes [8]. Table 2 shows the subsystem distribution of predicted CDS in L. licerasiae, L. interrogans, L. borgpetersenii and L. biflexa. Based on these data it would seem that intermediate Leptospira retain several proteins related to nitrogen, amino acid and carbohydrate metabolism that have likely been lost by the pathogenic sub-branch. For example, L. licerasiae (LEP1GSC185_2652) and L. biflexa (LEPBI_I1590) both possess ilvA, which encodes threonine ammonia-lyase, an enzyme that catalyzes the conversion of threonine to 2-oxobutanoate, while neither L. borgpetersenii nor L. interrogans appears to do so. That L. licerasiae and perhaps the other intermediates do well in artificial culture media might be related to the retention of these metabolic functions.
It is a commonly accepted concept that genes unique to pathogenic microorganisms are likely to be necessary for infection (pathogenesis). To identify potentially pathogenicity-associated genes, we compared the genome content of three infectious leptospiral species, L. licerasiae (2 strains), L. interrogans (2 strains) and L. borgpetersenii (2 strains), with that of the non-infectious saprophyte, L. biflexa (2 strains). These comparisons identified 452 conserved pathogen-specific proteins (Figure 3). Based on domain homology searches, 315 were assigned a putative function (Table S2). Infectious Leptospira species share a number of proteins predicted to participate in environmental signaling and processing and metabolism (Table S2).
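The logic of the comparison above, in which a core set is present in every genome and an infectious-specific set is present in all infectious genomes but absent from the saprophyte, can be expressed with set operations on ortholog clusters. The sketch below is illustrative only; the cluster identifiers and the `presence` mapping are hypothetical and do not correspond to the actual data in Figure 3.

```python
def core_and_group_specific(presence, ingroup, outgroup):
    """Return (core, specific) sets of ortholog cluster IDs.

    presence: dict mapping genome name -> set of cluster IDs it contains.
    core:     clusters found in every ingroup and outgroup genome.
    specific: clusters found in every ingroup genome and in no outgroup
              genome. Both groups must be non-empty.
    """
    everyone = list(ingroup) + list(outgroup)
    core = set.intersection(*(presence[g] for g in everyone))
    in_all_ingroup = set.intersection(*(presence[g] for g in ingroup))
    in_any_outgroup = set.union(*(presence[g] for g in outgroup))
    return core, in_all_ingroup - in_any_outgroup


# Hypothetical presence/absence data for four genomes.
presence = {
    "licerasiae": {"c1", "c2", "c3", "c5"},
    "interrogans": {"c1", "c2", "c3"},
    "borgpetersenii": {"c1", "c2", "c6"},
    "biflexa": {"c1", "c4", "c6"},
}
core, infectious_specific = core_and_group_specific(
    presence,
    ingroup=["licerasiae", "interrogans", "borgpetersenii"],
    outgroup=["biflexa"],
)
```

In this toy example only cluster c1 is universal, and only c2 is shared by all three infectious genomes while being absent from the saprophyte.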
That the infectious species studied here appear to possess a complete vitamin B12 biosynthesis operon and a novel regulatory mechanism is perhaps the most notable metabolic difference between infectious and non-infectious Leptospira. Indeed, the absence of these genes from the L. biflexa and recently sequenced Leptonema genomes would indicate that the ability to synthesize B12 was acquired after the speciation event giving rise to the infectious branch of Leptospira, predating the separation of the intermediate and pathogenic sub-branches. The genomes of over 100 infectious Matthias and J. Vinetz manuscript in preparation), supporting the belief that these elements are essential for pathogenicity. As in other bacteria, the availability of different nutrients inside and outside the mammalian host requires changes in the metabolic capacity of Leptospira. Published data have firmly established that Leptospira have an absolute requirement for B12 for growth at 37°C [68]. Much like iron, B12 is sequestered in vivo. Hence, for survival in vivo, leptospiral pathogens need to synthesize B12 de novo or scavenge B12 from the host. Whether leptospires are fully capable of synthesizing the highly complex B12 molecule from simpler precursors de novo is not known. However, cobI (LEP1GSC185_3345; LIC20129), which encodes an enzyme involved in cobalamin biosynthesis, is ~30-fold upregulated during mammalian infection, consistent with a role in vivo in replication and/or pathogenicity (J. Lehmann, J. Vinetz, and M. Matthias manuscript in preparation). In addition, although all leptospiral genomes sequenced to date, including L. biflexa, encode the enzyme cob(I)yrinic acid a,c-diamide adenosyltransferase, which catalyzes the first step in the conversion of cobinamide to B12, all infectious Leptospira, including L. licerasiae, L. interrogans, L. borgpetersenii, L. santarosai, L. noguchii and L. weilii, encode at least one additional homolog. The reason for this is unclear, but it may be that these pathogen-specific homologs are required for B12 biosynthesis in vivo. While L. interrogans, L. borgpetersenii and L. licerasiae appear to be able to use either L-glutamate or cobinamide to synthesize B12, it would seem that this is not a universal feature of infectious Leptospira.
Leptospira encode four essential B12-dependent enzymes: B12-dependent methionine synthase, two B12-dependent methylmalonyl-CoA mutase related proteins and a B12-dependent ribonucleotide reductase. Methionine synthase transfers a methyl group from methyl-tetrahydrofolate to homocysteine as the final step in the synthesis of methionine; ribonucleotide reductases generate the deoxyribonucleotides needed for DNA synthesis and allow the production of DNA in the absence of oxygen; methylmalonyl-CoA mutase interconverts (R)-methylmalonyl-CoA and succinyl-CoA in the terminal step of β-oxidation of fatty acids/catabolism of cholesterol. A role for B12 in leptospiral pathogenicity has yet to be established. However, B12 synthesis has been linked to fatty acid metabolism and survival of the intracellular pathogen Mycobacterium tuberculosis in vivo [69]. As humans do not synthesize B12, these genes may represent novel drug targets.

Unique genomic features of the L. licerasiae O-antigen locus
Previously published immunological data from Peru indicate that the L. licerasiae O-antigen is antigenically unique [8]. Comparative analysis of all extant Leptospira spp. genomic data, including the new data presented here, explains this antigenic uniqueness at a genomic level. In contrast to the complex LPS O-antigen biosynthetic loci found in the published L. interrogans, L. borgpetersenii and L. biflexa genomes, which contain 91, 76 and 56 genes respectively, the L. licerasiae O-antigen locus we propose comprises a modest 6-gene operon, LEP1GSC185_2122-2127 (Figure 5, Table 3). The genes in this cluster have no apparent orthologs in the already sequenced L. interrogans, L. borgpetersenii and L. biflexa genomes. We are confident that this operon is the true L. licerasiae O-antigen locus based on the following observations: 1) There are only two wzx O-antigen transporter homologs in the genome. One of these (LEP1GSC185_0029) is not in an operon with any other genes of types typically associated with O-antigen biosynthesis. The other, LEP1GSC185_2124, is part of the proposed O-antigen locus. 2) Of the 29 putative polysaccharide glycosyltransferases we could identify in the L. licerasiae genome, while none are orthologs of genes in the O-antigen regions of the other sequenced Leptospira genomes, 22 are bidirectional best hits (that is, candidate orthologs) with non-O-antigen related genes from one or more of these genomes. Of the remaining 7 genes, one (LEP1GSC185_3401) is associated with a glycogen-related operon, two (LEP1GSC185_1696 and _2304) are part of short operons with genes of unknown function, and one (LEP1GSC185_2985) is proximal to flagellin genes. The remaining three, interrogans, which suggests a complex history of differential genome rearrangement and LGT events in these two species. 
Indeed, the six genes of the operon do not correspond to any syntenic blocks in any sequenced genome; the most similar genes to each are present in entirely disjoint sets of bacterial and archaeal strains (Table 3). This would seem to imply either 1) that potential en bloc LGT source genomes with similarly constructed O-antigens have yet to be sequenced, or 2) that a series of LGT events from different sources have accumulated these genes in the L. licerasiae lineage to create a novel O-antigen cluster. Since the O-antigen cluster is not predicted to reside on a GI, the latter seems more likely. By analogy to extant knowledge of how E. coli O-antigen operons are assembled [87], we can hypothesize that the L. licerasiae O-antigen consists of a repeating unit with at least 4 sugars (corresponding to the primer sugar and the products of the three glycosyltransferase enzymes), that these sugars are bioavailable from the core Leptospira metabolome (i.e. glucose, galactose, mannose, etc., since no blocks of sugar biosynthesis or sugar modification genes are present in the operon) and that at least one of them is modified by (probably) pyruvoyl and/or acetyl groups.
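The bidirectional-best-hit criterion used above to nominate candidate orthologs can be illustrated with a minimal sketch. It assumes that pairwise similarity searches between two genomes have already been reduced to (query, subject, score) tuples; the gene names and scores below are hypothetical toy values, not data from this study, and the actual pipeline may have applied additional thresholds that are not described here.

```python
from typing import Dict, Iterable, List, Tuple

# One similarity-search hit: (query gene, subject gene, bit score).
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Return the highest-scoring subject gene for each query gene."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        # Keep the first hit seen on ties; replace only on a strictly higher score.
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(
    a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]
) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is the best hit of a AND a is the best hit of b."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical toy data: genome A searched against genome B, and the reverse.
a_vs_b = [("a1", "b1", 250.0), ("a1", "b2", 90.0),
          ("a2", "b2", 120.0), ("a3", "b1", 80.0)]
b_vs_a = [("b1", "a1", 245.0), ("b2", "a2", 118.0), ("b2", "a1", 95.0)]

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy case gene a3 has a best hit in genome B (b1), but b1 points back to a1, so a3 is not reported as a candidate ortholog. This is the situation described for the glycosyltransferases that lack bidirectional best hits.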
The chemical composition of the polysaccharide component of leptospiral LPS has been examined in a few serovars [88][89][90]. The proportion of the major component sugars rhamnose, galactose, arabinose and xylose was shown to vary between strains. The composition of the LPS derived from L. licerasiae sv. Varillal is consistent with previously published data. However, our GC-MS analysis indicates that L. licerasiae LPS (Figure 6) is composed primarily of arabinose (~61.6%), with xylose (~12.8%), mannose (~11.5%), rhamnose (~9.3%), galactose (~4.0%) and glucose (~1%). The relative proportion of arabinose and rhamnose (6:1) in the LPS of L. licerasiae is significantly different from that (1:3) reported in L. interrogans sv. Copenhageni [89], which might help to explain why there is absolutely no serological cross-reactivity between sv. Varillal and Copenhageni [8]. The presence of rhamnose in the purified VAR010 LPS is surprising since the genome does not appear to encode a complete pathway for the synthesis of dTDP-L-rhamnose shown to be present in L. interrogans and L. borgpetersenii; the enzymes that catalyze the final two steps of the pathway, rmlC and rmlD, are absent. Because both L. licerasiae genomes are unfinished, it is possible that these genes reside on unsequenced regions of the genome. But, since the other intermediate strains sequenced to date, L. broomii and L. inadai, and the saprophyte L. biflexa also seem to lack rmlC and rmlD homologs, it is also biologically plausible that L. licerasiae truly lacks both genes. A TBLASTN search against the L. licerasiae genomes failed to produce any significant alignments, thus it does not seem that the genes were missed. L. licerasiae does possess the enzymes necessary to synthesize GDP-D-rhamnose from GDP-D-mannose, gmd (LEP1GSC185_1627; LlicsVM_010100003480) and rmd (LEP1GSC185_1109; LlicsVM_010100011485).
Although rare, other pathogens such as Pseudomonas aeruginosa have been shown to produce LPS containing D-rhamnose [91]; therefore, it is possible that L. licerasiae produces D-rhamnose, but this needs to be confirmed experimentally.
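As a quick check of the arithmetic behind the arabinose-to-rhamnose comparison, the short sketch below recomputes the ratio from the approximate percentages quoted above. The variable names are illustrative; the Copenhageni value is taken simply as the 1:3 ratio cited in the text, and glucose, a trace component, is left out.

```python
# Approximate sugar percentages for L. licerasiae LPS as quoted in the text.
composition = {
    "arabinose": 61.6,
    "xylose": 12.8,
    "mannose": 11.5,
    "rhamnose": 9.3,
    "galactose": 4.0,
}


def sugar_ratio(table: dict, numerator: str, denominator: str) -> float:
    """Ratio of two sugars' percentages in a composition table."""
    return table[numerator] / table[denominator]


# Arabinose:rhamnose in L. licerasiae, from the percentages above.
licerasiae_ratio = sugar_ratio(composition, "arabinose", "rhamnose")

# Arabinose:rhamnose of 1:3 as cited for L. interrogans sv. Copenhageni.
copenhageni_ratio = 1 / 3

# How many times larger the L. licerasiae ratio is.
fold_difference = licerasiae_ratio / copenhageni_ratio

print(f"L. licerasiae arabinose:rhamnose = {licerasiae_ratio:.1f}:1")
print(f"Fold difference versus Copenhageni = {fold_difference:.0f}")
```

The computed ratio is a little above the rounded 6:1 given in the text, and it differs from the Copenhageni ratio by well over an order of magnitude.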
The polymerase (wzy) and chain length determinant (wzz) genes are not observed in the proposed O-antigen locus, but may be located elsewhere in the L. licerasiae genome. These genes may be difficult to identify by homology due to their membrane protein nature. There are two identified wzy homologs in L. licerasiae with candidate orthologs in L. interrogans and L. borgpetersenii. There are no obvious wzz homologs in the L. licerasiae genome. The formal possibility exists that this O-antigen consists of only a single repeat, obviating the need for wzy and wzz genes, but this would be unprecedented if true.
Seven putative genomic islands in L. licerasiae ranging in size from 5 kb to ~36 kb (Table 4) were identified, the longest of which coincides with the previously mentioned cryptic prophage in sv. Lai and Copenhageni [64]. In addition, we found 28 putative type II toxin-antitoxin systems (TASs) in the VAR010 genome (Table 5). TASs belong to the prokaryotic mobilome as they are extensively, if not preferentially, spread via plasmid-mediated LGT [92]. Like many, if not most, of the mobilome members, the TASs are not simply mobile, but appear to behave like selfish elements. If a mobile genetic element encoding a TAS is lost during cell division, the concentration of the labile antitoxin rapidly decreases, allowing the toxin, which is more stable, to kill the cell. Thus, TASs contribute to the stable maintenance and dissemination of plasmids and genomic islands in bacterial populations despite the associated fitness cost. In M. tuberculosis, 37% of these systems are located on genomic islands [93]. In L. licerasiae, 36% (10/28) of the putative type II TASs reside on putative genomic islands, and thus appear to have been acquired by LGT. Of the L. licerasiae type II TASs, chpK/chpI (Table 5) has been confirmed in L. interrogans [94] and appears to be unique to infectious species [95]. L. interrogans encodes another four TASs [96]. By contrast, L. biflexa strains Ames and Paris possess several TASs (22 and 20 TASs, respectively [97]), much like L. licerasiae.
As additional independent evidence of lateral transfer, more than half of the L. licerasiae-specific CDS have no or poor homology with other leptospiral proteins. These include phosphate, chromium and molybdate transport systems. Of these proteins, most have homology with non-invasive environmental bacteria including Sorangium cellulosum [6 proteins], Bdellovibrio bacteriovorus [6 proteins] and Haliscomenobacter hydrossis [5 proteins]. While IS elements appear to be major contributors to genomic diversification in pathogenic Leptospira, which may possess more than 20 insertion sequence (IS) elements [19], the relative lack of IS elements in the L. licerasiae and L. biflexa genomes would suggest that genomic diversity, where it exists, is a result of different mechanisms. The phylogenetic origins of the laterally transferred genes suggest that L. licerasiae is able to exchange genetic material with non-invasive environmental bacteria; whether this species can become naturally competent remains to be determined.

Conclusion
This study bridges a major gap in our knowledge of leptospiral biology and addresses a key question in the field regarding the pathogenic potential of the intermediate clade of Leptospira [9]. The data presented here 1) demonstrate that L. licerasiae is more closely related to pathogenic than to saprophytic Leptospira; 2) provide insight into the genomic bases for its infectiousness and unique antigenic characteristics; and 3) support the denomination of the intermediate clade as 'intermediately pathogenic' and its consideration as a transitional group between saprophytes and pathogens.
Future comparative genomic analysis of the complete set of Leptospira species will provide deeper large-scale insights into the evolution, biology and evolution of virulence of this genus of spirochetes, and guide new experimental directions.