Whole genome sequence analysis reveals broad distribution of the RtxA type 1 secretion system and four novel type 1 secretion systems throughout the Legionella genus

Type 1 secretion systems (T1SSs) are broadly distributed among bacteria and translocate effectors with diverse functions across the bacterial cell membrane. Legionella pneumophila, the species most commonly associated with legionellosis, encodes a T1SS at the lssXYZABD locus, which is responsible for the secretion of the virulence factor RtxA. Many investigations have failed to detect lssD, the gene encoding the membrane fusion protein of the RtxA T1SS, in non-pneumophila Legionella, suggesting that this system is a conserved virulence factor in L. pneumophila. Here we identified RtxA and its associated T1SS in a novel Legionella taurinensis strain, which led us to ask whether this system is more widespread than previously thought. Through a bioinformatic analysis of publicly available data, we classified five T1SSs, the RtxA T1SS and four novel T1SSs, and determined their distribution among diverse Legionella spp. The ABC transporter of the novel Legionella repeat protein secretion system (LRPSS) shares structural similarity with those of diverse T1SS families, including the alkaline protease T1SS in Pseudomonas aeruginosa. The Legionella bacteriocin 1–3 secretion systems (LB1SS–LB3SS) are novel putative bacteriocin-transporting T1SSs, as their ABC transporters include C-39 peptidase domains in their N-terminal regions. LB2SS and LB3SS likely constitute nitrile hydratase leader peptide transport T1SSs, whereas LB1SS is more closely related to the colicin V T1SS in Escherichia coli. Of the 45 Legionella spp. whole genomes examined, 19 (42%) were determined to possess lssB and lssD homologs. Of these 19, only 7 (37%) are known pathogens. There was no significant difference in the proportions of disease-associated and non-disease-associated species that possessed the RtxA T1SS (p = 0.4), contrary to the current consensus regarding the RtxA T1SS. These results call into question the nature of RtxA and its T1SS as a genetic virulence determinant.

In another class, such as the HlyA secretion system in E. coli (10), the ABC transporters contain an N-terminal C-39 peptidase-like domain (CLD), which lacks the catalytic histidine (9) (Figure 1B). A third class of T1SSs is composed of ABC transporters that lack both the C-39 peptidase domain and the CLD. These systems typically secrete smaller substrates, including epimerases and proteases in Azotobacter vinelandii and Pseudomonas aeruginosa, respectively (11,12) (Figure 1C).
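The three classes differ only in the N-terminal region of the ABC transporter, so assigning a transporter to a class reduces to a two-feature decision rule. The sketch below assumes that domain presence and conservation of the catalytic histidine have already been determined upstream (for example by a domain search, not shown); the enum and function names are hypothetical. The description of the first class is truncated in this excerpt, so it is taken here to be transporters whose N-terminal C-39 peptidase domain retains its catalytic residue, consistent with the C-39 peptidase-containing LB1SS–LB3SS transporters described in the abstract. The text names only the missing catalytic histidine as the CLD-defining feature, so the sketch keys on that alone.

```python
from enum import Enum


class T1SSClass(Enum):
    """ABC-transporter classes of T1SS, distinguished by the N-terminal region."""
    C39_PEPTIDASE = "N-terminal C-39 peptidase domain with catalytic histidine"
    CLD = "N-terminal C-39 peptidase-like domain lacking the catalytic histidine"
    NO_N_TERMINAL_DOMAIN = "neither a C-39 peptidase domain nor a CLD"


def classify_abc_transporter(has_c39_fold: bool, has_catalytic_his: bool) -> T1SSClass:
    """Assign a T1SS class from two precomputed (hypothetical) features.

    has_c39_fold: an N-terminal C-39 peptidase or peptidase-like domain was detected.
    has_catalytic_his: the catalytic histidine is conserved in that domain.
    """
    if not has_c39_fold:
        # No N-terminal peptidase-type domain at all: third class
        return T1SSClass.NO_N_TERMINAL_DOMAIN
    if has_catalytic_his:
        # Domain present and catalytic residue retained: peptidase class
        return T1SSClass.C39_PEPTIDASE
    # Domain present but catalytic histidine missing: CLD class (e.g. HlyA system)
    return T1SSClass.CLD


# Example calls with hypothetical feature values
print(classify_abc_transporter(True, False).name)   # prints "CLD"
print(classify_abc_transporter(False, False).name)  # prints "NO_N_TERMINAL_DOMAIN"
```

Keeping the domain detection separate from the classification step means the same rule can be reused regardless of which domain-search tool produced the features.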

In Legionella pneumophila, the prototypical T1SS is encoded at the lssXYZABD locus, with lssB and lssD encoding the ABC transporter and membrane fusion protein, respectively (13,14).

This complex is responsible for secreting the virulence factor RtxA, which is associated with adherence, pore formation, cytotoxicity, and entry into host cells (15, 16). The RtxA T1SS belongs to a subset of the CLD-type T1SSs whose substrates possess an N-terminal retention module. The most well-characterized system of this type is the one secreting the adhesin LapA in Pseudomonas […] is a key virulence determinant unique to L. pneumophila. In the present study, we report the broad distribution of the RtxA T1SS and four novel T1SSs throughout the Legionella genus and examine their occurrence among strains of disease-associated Legionella.

Isolation and phylogenetic identification of a novel Legionella taurinensis strain containing four type 1 secretion systems

During a survey of municipal and well waters in Genesee County, Michigan, endemic L. pneumophila were targeted for isolation using standard culture methods, and isolates were subjected to whole-genome sequencing (Garner et al., in review). Sequence and phylogenetic analysis revealed the four isolates obtained from municipal water sourced from an aquifer to be novel Legionella taurinensis strains (21), a species that had no reference genome at that time (Table S1). These genome sequences have been deposited in DDBJ/ENA/GenBank under accession PRJNA450138. While analyzing the genomes of the isolates for virulence factors, we found that the strain possesses the lssXYZABD locus, believed to be absent from non-pneumophila […] (Table S2). L. taurinensis possesses the lssXYZABD locus, including a gene encoding an LssD homolog that is 60% identical to L. pneumophila LssD (Figure S1).
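
The 60% figure, like the ≥40% cutoff used for the genus-wide survey, is a percent identity taken from a pairwise protein alignment. The sketch below shows one common way to compute that value, assuming gapped alignment strings are already available (the alignment step itself is not shown). It counts identical residue columns and divides by the full alignment length with gap columns included, which is the convention BLAST uses for its `pident` field; the text does not state which convention underlies its own figures, so treat this as illustrative. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent identity of a pairwise alignment.

    Convention assumed here: identical residue columns divided by the total
    alignment length, with gap columns ("-") counted in the denominator.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    # A column counts as identical only if both residues match and neither is a gap
    identical = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-"
    )
    return 100.0 * identical / len(aligned_query)


# Toy alignment (hypothetical residues, not the LssD sequence)
query = "MKTAYIAKQR"
subject = "MKSAYLAKQ-"
print(percent_identity(query, subject))  # prints 70.0
```

Other conventions (e.g. dividing by the length of the shorter sequence) give different numbers for the same alignment, which matters when a hit sits near a fixed cutoff.
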
Following this observation, we examined the whole-genome sequences of 45 Legionella species for the RtxA T1SS by comparing the amino acid sequences of L. pneumophila LssB and LssD against the predicted proteomes of Legionella spp. using blastp (Table S2). A species was considered to encode the RtxA T1SS if its genome encoded homologs of both LssD and LssB with ≥40% amino acid sequence identity (Table S2) and if its ABC transporter was monophyletic with L. pneumophila LssB (Figure 3, Figure S3). These two proteins were chosen as they constitute two-[…] (Table 1). This is noteworthy, as a previous bioinformatic investigation reported the absence of this locus in its entirety in L. moravica DSM19234 (22). This study reports using the […] (Table 1). Therefore, […] (Figure S3, Figure S5).
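
The inclusion criterion combines two identity cutoffs with a phylogenetic check, and all three must hold. The sketch below encodes that rule, assuming the best blastp identities and the monophyly result have already been computed per species. The dataclass, its field names, and the example species and values are hypothetical and do not come from the study's data.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

IDENTITY_CUTOFF = 40.0  # percent amino acid identity, as stated in the text


@dataclass(frozen=True)
class RtxaEvidence:
    """Per-species evidence; field names are illustrative, not from the study."""
    species: str
    lssB_identity: Optional[float]  # best blastp hit to L. pneumophila LssB, None if no hit
    lssD_identity: Optional[float]  # best blastp hit to L. pneumophila LssD, None if no hit
    lssB_monophyletic: bool         # ABC transporter groups with L. pneumophila LssB


def encodes_rtxa_t1ss(ev: RtxaEvidence) -> bool:
    """Both homologs must pass the cutoff AND the LssB homolog must be monophyletic."""
    if ev.lssB_identity is None or ev.lssD_identity is None:
        return False
    return (
        ev.lssB_identity >= IDENTITY_CUTOFF
        and ev.lssD_identity >= IDENTITY_CUTOFF
        and ev.lssB_monophyletic
    )


def summarize(evidence: List[RtxaEvidence]) -> Tuple[int, int]:
    """Return (number of positive species, rounded percentage of all species)."""
    if not evidence:
        raise ValueError("no species supplied")
    positives = sum(encodes_rtxa_t1ss(ev) for ev in evidence)
    return positives, round(100 * positives / len(evidence))


# Hypothetical species and values, for illustration only
survey = [
    RtxaEvidence("species_A", 62.0, 58.5, True),   # passes all three checks
    RtxaEvidence("species_B", 45.0, 37.0, True),   # LssD homolog below cutoff
    RtxaEvidence("species_C", 48.0, 44.0, False),  # transporter outside the LssB clade
    RtxaEvidence("species_D", None, None, False),  # no hits at all
]
print(summarize(survey))  # prints (1, 25)
```

The same rounding applied to the genus-wide counts in the abstract gives 19 of 45 species (42%) and, among those, 7 known pathogens (37%).
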

For this system, the name Legionella bacteriocin 2 secretion system (LB2SS) is proposed.

In conclusion, we report that the RtxA T1SS and four novel T1SSs discovered in L.

Determining Epidemiological Features of Legionella spp.

We considered a Legionella species to be "disease-associated" if any strain of the species has ever been isolated from a patient. To determine this, we referenced the […] for incomplete query cover) with one of the query sequences. This criterion was used for all genes except lb1ssD, for which many species displayed <40% amino acid identity to L. taurinensis lb1ssD (Table S2). Nevertheless, these genes were consistently found to be co-localized with lb1ssB homologs sharing ≥40% identity with L. taurinensis lb1ssB (for instance, WP_012979428.1/WP_012979429.1 in L. longbeachae strain NSW150). Finally, a protein phylogeny of the ABC transporters was inferred to validate the assignments suggested by the BLAST results. This analysis led to the renaming of several T1SSs whose sequence identities were near the 40% threshold (Table S2).
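The lb1ssD exception turns a pure identity threshold into a two-step rule: accept a hit at ≥40% identity, otherwise accept a weaker hit only if it is co-localized with a qualifying lb1ssB homolog. The sketch below assumes "co-localized" means lying within a small number of genes of each other on the same contig; the text does not define the distance, so `max_gene_distance`, the `gene_index` field, and all names and values in the example are hypothetical. The query-cover criterion is truncated in this excerpt and is therefore not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional

IDENTITY_CUTOFF = 40.0  # percent identity to the L. taurinensis query, from the text


@dataclass(frozen=True)
class Hit:
    protein_id: str  # e.g. a RefSeq WP_ protein accession
    gene_index: int  # hypothetical: ordinal gene position, assumed on the same contig
    identity: float  # percent identity to the L. taurinensis query


def call_lb1ssD(d_hits: List[Hit], b_hits: List[Hit],
                max_gene_distance: int = 1) -> Optional[Hit]:
    """Return the accepted lb1ssD homolog, or None.

    Standard rule: accept the best hit at >= 40% identity.
    Exception (lb1ssD only): accept a weaker hit if it lies within
    max_gene_distance genes of an lb1ssB homolog that passes the cutoff.
    """
    ranked = sorted(d_hits, key=lambda h: h.identity, reverse=True)
    if ranked and ranked[0].identity >= IDENTITY_CUTOFF:
        return ranked[0]
    # Only lb1ssB homologs that pass the cutoff can rescue a weak lb1ssD hit
    anchors = [b for b in b_hits if b.identity >= IDENTITY_CUTOFF]
    for d in ranked:
        if any(abs(d.gene_index - b.gene_index) <= max_gene_distance for b in anchors):
            return d
    return None


# Hypothetical hits, for illustration only
b = Hit("prot_B", gene_index=120, identity=52.0)       # qualifying lb1ssB homolog
d_near = Hit("prot_D", gene_index=121, identity=31.0)  # below cutoff, adjacent to prot_B
d_far = Hit("prot_X", gene_index=900, identity=31.0)   # below cutoff, unlinked
print(call_lb1ssD([d_near], [b]).protein_id)  # prints prot_D
print(call_lb1ssD([d_far], [b]))              # prints None
```

Restricting the rescue to hits next to a strong lb1ssB homolog keeps the relaxed threshold from admitting unrelated, weakly similar proteins elsewhere in the genome.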

Legionella T1SS sequences, with names suggested on the basis of the protein phylogeny and the sequence identity analysis, are compiled in Table S2.

Reference genomes from 45 Legionella species were downloaded from NCBI (Table S3).

A set of core marker genes was identified using PhyloSift (36) and HMMER (37) […]

Table S1. Sequence analysis of conserved loci from the novel Legionella taurinensis strains reveals similarity between the novel strains and previously sequenced loci in L. taurinensis.