Adding function to the genome of African Salmonella ST313

Salmonella Typhimurium ST313 causes invasive nontyphoidal Salmonella (iNTS) disease in sub-Saharan Africa, targeting susceptible HIV+, malarial or malnourished individuals. An in-depth genomic comparison between the ST313 isolate D23580, and the well-characterized ST19 isolate 4/74 that causes gastroenteritis across the globe, revealed extensive synteny. To understand how the 856 nucleotide variations generated phenotypic differences, we devised a large-scale experimental approach that involved the global gene expression analysis of strains D23580 and 4/74 grown in sixteen infection-relevant growth conditions. Comparison of transcriptional patterns identified virulence and metabolic genes that were differentially expressed between D23580 versus 4/74, many of which were validated by proteomics. We also uncovered the S. Typhimurium D23580 and 4/74 genes that showed expression differences during infection of murine macrophages. Our comparative transcriptomic data are presented in a new enhanced version of the Salmonella expression compendium SalComD23580: bioinf.gen.tcd.ie/cgi-bin/salcom_v2.pl. We discovered that the ablation of melibiose utilization was caused by 3 independent SNP mutations in D23580 that are shared across ST313 lineage 2, suggesting that the ability to catabolise this carbon source has been negatively selected during ST313 evolution. The data revealed a novel plasmid maintenance system involving a plasmid-encoded CysS cysteinyl-tRNA synthetase, highlighting the power of large-scale comparative multi-condition analyses to pinpoint key phenotypic differences between bacterial pathovariants.


102
To maximise the functional insights to be gained from a transcriptomic analysis, a well-annotated 103 genome is required. The published annotation for D23580 dates back to 2009 [19], and lacked certain 104 essential bacterial genes such as the two outer membrane proteins lppA and lppB [31]. Accordingly, 105 we searched for important non-annotated bacterial genes, and used D23580 transcriptomic data 106 (described below) to cross-reference the locations of transcripts with the location of coding genes (S1 107 Text). This analysis allowed us to update the published annotation of D23580 by adding 86 new coding 108 genes, 287 sRNAs, and correcting the start or end locations of 13 coding genes (S2 Table). The re-109 sequenced and re-annotated S. Typhimurium D23580 genome is subsequently referred to as 110 D23580_liv (accession: XXXXXXXXXX).

111
The S. Typhimurium D23580 and 4/74 genomes are 95% identical 112 C a n a l s e t a l . P a g e | 5 Previously the D23580 genome had been compared with the attenuated laboratory S. Typhimurium 113 LT2 strain [19,32,33]. To assess the similarities and differences between the ST313 strain D23580 and 114 a virulent ST19 isolate, a detailed comparative genomic analysis was performed against the ST19 strain 115 4/74 (S1 Text). 4/74 is a prototrophic S. Typhimurium ST19 strain that is highly virulent in four animal 116 models [34], and is the parent of the widely-used SL1344 auxotrophic strain [35]. D23580 and 4/74 117 share 92% and 95% of coding genes and sRNAs, respectively (S2 Table). Genetic differences included 118 788 SNPs, 3 multi-nucleotide polymorphisms (MNPs), 65 indels, as well as 77 D23580-specific 119 pseudogenes that have been listed elsewhere [19]. Analysis of the SNPs, using the 4/74 annotation as 120 a reference, showed that 379 were non-synonymous, 255 were synonymous, six were located in 121 sRNAs, nine generated stop codons in coding genes and seven inactivated stop codons in D23580 (S3 122   Table). The final 132 SNPs were in intergenic regions.

158
The RNA-seq-derived sequence reads were mapped to the D23580_liv chromosome, and the pSLT-  Table). To achieve a complete 163 transcriptomic comparison, we first re-analysed our 4/74 transcriptomic data [37,39]

171
C a n a l s e t a l . P a g e | 7 4674 orthologous coding genes and sRNAs shared by strains D23580 and 4/74 were expressed in at 172 least one growth condition in both strains.

173
A small minority (117) of orthologous genes were expressed in at least one condition in strain 4/74, 174 but not in any of the conditions in D23580, with most showing low levels of expression (close to the 175 threshold TPM = 10) (S5 Table). In contrast, we identified 82 orthologous coding genes and sRNAs that 176 were expressed in at least one of the 17 growth conditions for D23580, but not expressed in 4/74 (S5 177   Table).  The data confirmed that S. Typhimurium reacts to particular infection-relevant stresses with a series of 181 defined transcriptional programmes that we detailed previously [37]. By comparing the transcriptomic

184
A complementary analytical approach was used to identify the transcriptional differences that relate 185 to the distinct phenotypes of the ST313 and ST19 pathovariants (S1 Table). Overall, 1031 of the 186 orthologous coding genes and sRNAs were differentially expressed (≥ 3 fold-change) between strains 187 D23580 and 4/74, in at least one growth condition (Fig 2A). Transcriptional differences are highlighted 188 in S3 Fig.   189 The terms "D23580-upregulated" and "D23580-downregulated" refer to genes that are either more 190 or less expressed in D23580, compared to 4/74. Three coding genes were D23580-upregulated and 191 six genes were D23580-downregulated in almost all growth conditions (Fig 2B and 2C

196
Three of the genes that were D23580-downregulated in most conditions (pSLT043-5) were located  resistance island was inserted between the mig-5 promoter region and the pSLT043-5 genes, we 199 hypothesise that the differential expression reflects transcriptional termination mediated by the Tn21 200 C a n a l s e t a l .

221
Additionally, RNA was extracted from two additional biological replicates of intra-macrophage S.

223
Following RNA-seq, the sequence reads were mapped to the D23580 and 4/74 genomes using our  Table. To ensure that biologically meaningful gene expression differences were reported, we used very 226 conservative cut-offs to define differential expression (Materials and Methods).

227
Following RNA-seq analysis of the three biological replicates of D23580 and 4/74 in five growth 228 conditions, differential expression analysis of orthologous genes and sRNAs was performed with a 229 C a n a l s e t a l . P a g e | 9 rigorous statistical approach (Materials and Methods, S6 Table). We identified 677 genes and sRNAs 230 that showed ≥ 2 fold-change (FDR ≤ 0.001) in at least one growth condition ( Fig 3A). Between 6% 231 (anaerobic growth) and 2% (InSPI2 condition) of orthologous genes and sRNAs were differentially 232 expressed between the two strains ( Fig 3A and 3B).

233
The ability to swim in semi-solid agar is a key phenotypic difference between D23580 and 4/74 234 [13,16]. We confirmed that D23580 was less motile than 4/74 (S5 Fig   . We speculate that the downregulation of the flagellar regulon in NonSPI2 could be due to a 256 significant upregulation (3.5 fold-change) of rflP in this low-nutrient environmental condition. This 257 differential expression was not seen in the InSPI2 growth condition which only differs from NonSPI2 by 258 a lower pH (5.8 versus 7.4) and a reduced level of phosphate [37].

259
C a n a l s e t a l . P a g e | 10 We identified six genes and sRNAs that were D23580-upregulated in most growth conditions, 260 specifically pgtE, nlp, ydiM, STM2475 (SL1344_2438), the ST64B prophage-encoded SL1344_1966, 261 and the sRNA STnc3750 ( Fig 3C). Just four genes were D23580-downregulated in all conditions, 262 namely pSLT043-5 (SLP1_0062-4), and cysS ( Fig 3D). These findings confirmed that biologically 263 significant information can be extracted from the initial 17-condition experiment (Fig 2B and 2C) as 264 similar genes were up/down-regulated across the multiple conditions of the replicated experiment.

265
The transcriptomic data were interrogated to identify virulence-associated genes that were 266 differentially expressed between D23580 and 4/74. Coding genes and sRNAs located within the 267 Salmonella pathogenicity islands SPI-1, SPI-2, SPI-5, SPI-12 and SPI-16 showed differential    C a n a l s e t a l . P a g e | 11 The key features of the transcriptional signature of D23580 included the differential expression of 288 the flagellar and associated genes, genes involved in aerobic and anaerobic metabolism, and iron-289 uptake genes. Specifically, the aerobic respiratory pathway cyoABCDE was D23580-upregulated in the 290 anaerobic growth condition, and anaerobic-associated pathways pdu, cbi and tdc operons were 291 D23580-downregulated. Importantly, genes associated with the acquisition of iron through production 292 and uptake of siderophores were D23580-downregulated in the intra-macrophage environment. In    Table). A label-free quantification approach identified 66 differentially 310 expressed orthologous proteins (≥ 2 unique peptides, ≥ 2 fold-change, p-value < 0.05) (Fig 4), including 311 54 D23580-upregulated proteins and 12 D23580-downregulated proteins. The most highly D23580-

316
C a n a l s e t a l . P a g e | 12 To identify genes that were differentially expressed at both the transcriptional and translational 317 levels, the quantitative proteomic data were integrated with the transcriptomic data. Eight D23580-318 upregulated proteins (YciF, SopA, PgtE, STM2475, RipC, RibB, Nlp, and STM3775) were significantly 319 up-regulated in the transcriptomic data (≥ 2 fold-change, FDR ≤ 0.001). Four differentially expressed 320 proteins (pSLT043, CysS, YgaD, MelA) were D23580-downregulated at the transcriptomic level (≥ 2 321 fold-change, FDR ≤ 0.001) (Fig 5). Overall, 12 genes were differentially expressed at both the 322 transcriptional and protein levels.   Table). All three SNPs were found to be monophyletic, allowing us to infer 335 the temporal order in which they arose and representing an accumulation of SNPs in melibiose 336 utilization genes over evolutionary time. The first SNP, melB I466V, was present in all 258 ST313 strains 337 tested and therefore arose first. The second SNP, in melR, was present in all ST313 lineage 2 and UK-338 ST313 genomes, suggesting that it appeared prior to the divergence of these phylogenetic groups [5].

339
The final SNP, melB P398S, is present in all ST313 lineage 2 and a subset of UK-ST313 genomes, 340 consistent with this being last of the three mutations to arise (Fig 6C). ST313 strains can therefore be 341 classified into groups of strains containing one, two or three SNPs in melibiose utilization genes.

342
It has been reported that D23580 did not ferment melibiose whereas a ST313 lineage 1 isolate 343 (A130), S. Typhimurium SL1344 and S. Typhi Ty2 were able to utilize melibiose as a sole carbon source 344 [18]. MelB catalyzes the symport of melibiose with Na + , Li + , or H + [51]. We confirmed that ST19 strains, 345 C a n a l s e t a l . P a g e | 13 and strains belonging to the ST313 lineage 1, were positive for alpha-galactosidase activity. In contrast, 346 isolates representing the UK-ST313 lineage and the ST313 lineage 2 were unable to utilize melibiose.

347
To determine the biological role of the SNPs in the melB and melR genes, we employed a genetic 348 approach. Single nucleotide engineering was used to generate isogenic strains that reflect all three 349 melibiose gene SNP states for determination of the role of the SNP differences between ST313 lineage 350 2 and ST19 in the alpha-galactosidase (MelA)-mediated phenotypic defect (Fig 6D). Melibiose 351 utilization in D23580 was rescued by nucleotide exchange of the three SNP mutations (D23580 melB + 352 melR + ) (Fig 6E). D23580 recovered its ability to grow with melibiose as the sole carbon source after 353 exchanging only the melR SNP with 4/74 (D23580 melR + ). In contrast, D23580 did not grow in the same 354 medium when the exchange only involved the two melB SNPs (D23580 melB + ). 4/74 lost its ability to 355 utilize melibiose as sole carbon source when we introduced the D23580 melR SNP (4/74 melRand 356 4/74 melB -melR -). However, an exchange of the two nucleotides in melB did not eliminate the ability of 357 4/74 to grow in minimal medium with melibiose (4/74 melB -). These data correlated with the alpha-358 galactosidase activity of the mutants, although a slight difference was observed between strains 359 D23580 melR + (light green) and D23580 melB + melR + (green), and between strains 4/74 (green) and 360 4/74 melB -(light green) (Fig 6F) suggesting an altered efficiency of melibiose utilization between the 361 two strains. To completely restore alpha-galactosidase activity in D23580, the reversion of the non-362 synonymous SNPs in both the melR and melB genes was required. Our data suggest that the melR 363 SNP is critical for the loss of function of the melibiose utilization system.

364
In a chicken infection model, the S. Typhimurium melA transcript is more highly expressed in the  not required for growth in rich medium (Fig 7B). We searched for S. Typhimurium D23580 genes that 379 encoded a cysteinyl-tRNA synthetase, and identified the pBT1-encoded gene, pBT1-0241 (cysS pBT1 ), 380 which the TIS data suggested to be 'required' for growth in rich medium (Fig 7B).

381
To investigate cysteinyl-tRNA synthetase function in D23580, individual knock-out mutants were 382 constructed in the chromosomal cysS gene (cysS chr ), and the cysS pBT1 gene. These genes were 89% 383 identical at the amino acid level and 79% at the nucleotide level. The cysS pBT1 mutant was whole 384 genome sequenced to confirm the absence of secondary unintended mutations. The pBT1 plasmid was 385 also cured from D23580. We determined the relative fitness of the two cysS mutants and the pBT1-386 cured strain. The D23580 wild type, D23580 ΔcysS chr and D23580 ΔpBT1 mutants grew at similar rates 387 in LB, whilst the D23580 ΔcysS pBT1 mutant showed an extended lag phase (Fig 8A). The D23580

388
ΔcysS pBT1 mutant showed a more dramatic growth defect in minimal medium with glucose as the sole 389 carbon source (Fig 8B).

390
To determine whether the presence of the pBT1 plasmid was linked to the decrease in cysS chr 391 expression, RNA from two biological replicates was isolated from the pBT1-cured strain in the ESP 392 growth condition. Differential expression analysis between this mutant and the wild-type D23580 strain 393 showed a significant increase in expression of cysS chr , with TPM levels close to those seen in 4/74 (Fig   394   8C, S5 Table). These results suggested the pBT1 plasmid is responsible for the down-regulation of 395 cysS chr expression in D23580.

396
The conservation of pBT1 was studied among 233 ST313 strains and compared to the presence of 397 the pSLT-BT plasmid which was found in all lineage 2 isolates (Fig 8D, S8 Table). Approximately 37% 398 of ST313 lineage 2 isolates carried the pBT1 plasmid. The pBT1 plasmid has rarely been seen 399 previously, but did show significant sequence similarity to five plasmids found in Salmonella strains 400 isolated from reptiles and elsewhere (98 to 99% nucleotide identity over 92 to 97% of the plasmid 401 sequence, accessions JQ418537, JQ418539, CP022141, CP022036 and CP022136, S1 Text).

402
C a n a l s e t a l .

P a g e | 15
Examples of essential bacterial genes located on plasmids are rare and this phenomenon has been 403 previously explored [59]. We conclude that the essentiality of the cysS pBT1 gene provides a novel 404 strategy for plasmid maintenance in a bacterial population.

418
To discover the genetic differences that impact upon the biology of S. Typhimurium ST313, we used a 419 functional transcriptomic approach to show that the two S. Typhimurium pathovariants shared many 420 responses to environmental stress.

421
By investigating global gene expression in multiple infection-relevant growth conditions, we 422 discovered that 677 genes and sRNAs were differentially expressed between strains D23580 and 4/74.

423
A parallel proteomic approach confirmed that many of the gene expression differences led to alterations 424 at the protein level. The differential expression of 199 genes and sRNAs within macrophages allowed 425 us to predict functions of African S. Typhimurium ST313 that are modified during infection. The 426 comparative gene expression data were used to predict key phenotypic differences between the 427 pathovariants which are summarized in S1 Table. The power of our experimental approach is 428 highlighted by our discovery of the molecular basis of the melibiose utilization defect of D23580, and a 429 novel bacterial plasmid maintenance system that relied upon a plasmid-encoded essential gene.

430
C a n a l s e t a l .

P a g e | 16
In the future, functional transcriptomics could shed light on the factors responsible for the phenotypic 431 differences that distinguish pathovariants of many bacterial pathogens.

464
Draft sequences of the pBT2 and pBT3 plasmids were provided by Robert A. Kingsley [19], and 465 were used to design oligonucleotides for primer walking sequencing (all primer sequences are listed in 466 S10 and Rv-pBT2-1, Fw-pBT2-2 and Rv-pBT2-2; and, for pBT3, the following oligonucleotides were used:

503
The remaining reads of each library were aligned to the corresponding genomes using Bowtie2

511
The complete RNA-seq pipeline used for this study is described in https://github.com/will-512 rowe/rnaseq.

513
C a n a l s e t a l .

P a g e | 19
Two strain-specific browsers were generated for the visualization of the transcriptional data in a

522
the expression cut-off was set as TPM > 10 for genes and sRNAs [37].

536
For RNA-seq analysis of the D23580 ΔpBT1 strain grown in ESP, only two RNA-seq biological 537 replicates were used for differential expression analysis with the three biological replicates of D23580

539
Sample processing for proteomics 540 C a n a l s e t a l .

565
To assess alpha-galactosidase activity, strains were grown on Pinnacle™ Salmonella ABC

566
(chromogenic Salmonella medium, LabM). Bacteria that are able to produce alpha-galactosidase in the 567 absence of beta-galactosidase appear as green colonies on this medium due to the hydrolysis of X-568 C a n a l s e t a l . P a g e | 21 alpha-Gal. This enzymatic activity was correlated to the ability to grow in M9 minimal medium with 569 melibiose as the sole carbon source.

574
The methodology followed the same strategy used for λ Red recombination explained below. After 575 electroporation of the ssDNA oligonucleotide into D23580 carrying the pSIM5-tet plasmid, screening for 576 D23580 recombinants was performed using a PCR with a stringent annealing temperature and primers

581
The second strategy for constructing scarless SNP mutants followed a previously described

598
C a n a l s e t a l .    Table).  Fig 8D (S8 Table).

655
Statistical analysis for phenotypic studies 656 C a n a l s e t a l .

P a g e | 24
Unpaired two-tailed Student's t-tests were performed using GraphPad Prism 6.0 (GraphPad Software

694
695 C a n a l s e t a l .