Robustness encoded across essential and accessory replicons in an ecologically versatile bacterium

Bacterial genome evolution is characterized by gains, losses, and rearrangements of functional genetic segments. The extent to which genotype-phenotype relationships are influenced by large-scale genomic alterations has not been investigated in a high-throughput manner. In the symbiotic soil bacterium Sinorhizobium meliloti, the genome is composed of a chromosome and two large extrachromosomal replicons (pSymA and pSymB, which together constitute 45% of the genome). Massively parallel transposon insertion sequencing (Tn-seq) was employed to evaluate contributions of chromosomal genes to fitness in both the presence and absence of these extrachromosomal replicons. Ten percent of chromosomal genes from diverse functional categories are shown to genetically interact with pSymA and pSymB. These results demonstrate the pervasive robustness provided by the extrachromosomal replicons, which is further supported by constraint-based metabolic modelling. A comprehensive picture of core S. meliloti metabolism was generated through a Tn-seq-guided in silico metabolic network reconstruction, producing a core network encompassing 726 genes. This integrated approach facilitated functional assignments for previously uncharacterized genes, while also revealing that Tn-seq alone misses over a quarter of wild type metabolism. This work highlights the strong functional dependencies and epistatic relationships that may arise between bacterial replicons and across a genome, while also demonstrating how Tn-seq and metabolic modelling can be used together to yield insights not obtainable by either method alone.


INTRODUCTION
containing the entire genome, and a strain with both the pSymA and pSymB replicons removed 135 (designated RmP3496 or ∆pSymAB; strains described previously in [30]). Transposon library 136 sizes were skewed to compensate for the difference in genome sizes, resulting in nearly identical 137 insertion site density for each library (Table S1). Both libraries were passed through selective 138 growth regimens in either complex BRM broth (rich medium) or minimal VMM broth (defined 139 medium) in duplicates. Following approximately nine generations of growth, the location of the 140 transposon insertions in the population was determined, a gene essentiality index (GEI) was 141 calculated for all chromosomal genes, and each gene was classified into one of five fitness 142 categories (Table 1) using the procedure described in the Materials and Methods. Four genes 143 (pdxJ, fumC, smc01011, smc03995), including two of unknown function, were independently 144 mutated in the wild-type background, and in all cases, the mutations yielded the expected no-145 growth phenotype ( Figure S4), supporting the accuracy of the Tn-seq output. All Tn-seq data is 146 available as Data Set S1. 147 A strong correlation was observed between the number of insertions per gene in each set 148 essential: the DNA replication apparatus, the four RNA polymerase subunits, the housekeeping 158 sigma factor, the general transcriptional termination factor Rho, 40 out of 55 of the annotated 159 ribosomal protein subunits, 18 out of 20 of the annotated aminoacyl-tRNA synthetases, and 6 out 160 of 10 of the annotated ATP synthase subunits. Considering genes classified as essential plus 161 those genes whose mutation resulted in a large growth defect (Groups I and II in Table 1), a core 162 growth promoting genome of 489 genes, representing ~ 15% of the chromosome, was identified 163 ( Figure 2B). This expanded list includes 51 out of 55 of the annotated ribosomal protein 164 subunits, 19 out of 20 of the annotated aminoacyl-tRNA synthetases, and 9 out of 10 of the 165 annotated ATP synthase subunits These 489 genes appeared to be mostly dispersed around the 166 chromosome, although there was a bias for these genes to be found in the leading strand ( Figure  167 2A). Based on published RNA-seq data for S. meliloti grown in a glucose minimal medium, 168 these 489 genes tend to be highly expressed, with a median expression level above the 90% 169 percentile ( Figure S7). Compared to the entire chromosome (Fisher exact test, p-value < 0.05 170 following a Bonferroni correction for 18 tests), this set of 489 genes was enriched for genes 171 involved in translation (5.2-fold), lipid metabolism (2.7-fold), cofactor metabolism (3.3-fold), equally to growth in both media ( Figure 2D). Forty genes were identified as more important 182 during growth in rich medium than in defined medium, and these genes had a median SMI score 183 of 7 (values of 1 and -1 are neutral). Only translation functions (5.8-fold) displayed a statistically 184 significant enrichment in these genes, which may reflect the faster growth rate in the rich 185 medium ( Figure S8), while there was also a non-statistically significant enrichment in signal 186 transduction (5.1-fold) ( Figure 2C). The extent of specialization for growth in the defined 187 medium was more pronounced; 93 genes were more important during growth in the defined 188 medium with a median SMI score of -20. These genes were enriched (statistically significant) in 189 amino acid (9.0-fold) and nucleotide (6.7-fold) metabolism presumably due to the requirement of 190 their biosynthesis, and carbohydrate metabolism (3.6-fold) likely as the sole carbon source was a 191 carbohydrate ( Figure 2C). The same overall pattern was observed between media for the 192 ∆pSymAB strain ( Figure S9). 193

Mutant fitness phenotypes are strongly influenced by their genomic environment. 194
The Tn-seq data sets for the wild-type and the ∆pSymAB strains were compared to 195 evaluate the robustness of the observed fitness phenotypes in response to changes in the gene's 196 genomic environment. Similar results were observed for both growth media, suggesting that the 197 results were generalizable and not medium specific. Depending on the medium, either 484 or 488 198 genes had an equal contribution to growth in both strains, 81 or 89 genes led to stronger growth 199 impairment when mutated in wild-type cells, and either 250 or 251 genes led to stronger growth 200 impairment when mutated in ∆pSymAB cells (Figures 2E, 2F, and Table 2). Only minor 201 functional bias was observed in the genes that displayed larger fitness defects in the ∆pSymAB 202 background ( Figure 2C); in both media, only electron transport (3-fold) and oxidoreductases 203 An in silico analysis of the genes identified as specifically essential in R. leguminosarum on the 250 basis of the Tn-seq data was not performed as only two of these genes were present in the R. 251 leguminosarum model. 252 In silico analyses support a high potential for genetic redundancy in the S. meliloti genome. 253 The results of the previous two sections are consistent with a strong genomic 254 environment effect on the phenotypic consequences of gene mutations. One possible explanation 255 is the presence of widespread genetic redundancy, at the gene and/or pathway level. In support of 256 this, ~ 14% of chromosomal genes had a Blast-BBH hit when the chromosomal proteome was 257 compared against the combined pSymA/pSymB proteome (Data Set S3). Therefore, this 258 phenomenon was further explored using a constraint-based metabolic modelling approach. 259 We first tested the in silico effect of chromosomal single gene deletions on growth rate in 260 the presence and absence of pSymA/pSymB ( Figure 4A). This analysis identified 67 genes (~ 261 7% of all chromosomal model genes) as having a more severely impaired growth phenotype 262 when deleted in the absence of pSymA/pSymB genes, 38 of which were lethal. This appeared to 263 be due to a combination of direct functional redundancy of the gene products as well as through 264 metabolic bypasses, as deletion of 50 reactions dependent on chromosomal genes had a more 265 severe phenotype in the absence of pSymA/pSymB, 42 of which were lethal ( Figure S12). 266 Next, a double gene deletion analysis was performed to examine the effect on growth rate 267 of deleting every possible pair of model genes. This analysis suggested that 49 chromosomal 268 genes had a more significant impact on growth than expected when simultaneously deleted with 269 a single pSymA or pSymB gene ( Figure 4B). Additionally, synthetic negative phenotypes were 270 observed for 97 chromosomal genes when simultaneously deleted with another chromosomal 271 gene ( Figure 4C). Overall, 14% of chromosomal genes were predicted to have a synthetic metabolic robustness being encoded by the S. meliloti genome, and with a significant influence 274 of the genomic environment on the fitness phenotype of gene mutations. 275 A consolidated view of core S. meliloti metabolism through Tn-seq-guided in silico 276 metabolic reconstruction. 277 The results described in the previous sections made it evident that a Tn-seq approach 278 alone is insufficient to elucidate all processes contributing to growth in a particular environment. 279 This is especially true if also considering non-essential metabolism that is nevertheless actively 280 present in wild type cells, such as exopolysaccharide production. Moreover, it is difficult to fully 281 comprehend the core functions of a cell by simply examining a list of essential genes and their 282 predicted functions. We therefore attempted to overcome these limitations by using the Tn-seq 283 data to guide a manual in silico reconstruction of the core metabolic processes of S. meliloti. A 284 detailed description of this process is provided in the Materials and Methods. In brief, the 285 existing metabolic model iGD1575 was treated as a database of reactions and gene-reaction 286 associations. Each pathway involved in central carbon metabolism or the production of essential 287 or non-essential biomass components (Table S3) were then rebuilt in a new (initially empty) 288 reconstruction drawing from the reactions present in iGD1575. At the same time, the genes 289 associated with each reaction were compared to the Tn-seq data and published literature to 290 confirm the linkage of the correct gene(s) to each reaction. 291 The resulting model, termed iGD726 and included as in SBML format in File S2, is 292 summarized in Figure 5 and Table 3, and the entire model including genes, reaction formulas, 293 and references is provided as an easy to read Excel table in Data Set S4. The process of 294 integrating the Tn-seq data with in silico metabolic reconstruction resulted in a major refinement of the core metabolism compared to the existing genome-scale model: 228 new reactions were 296 added, 115 new genes were added, and the genes associated with 135 of the 432 reactions 297 common to both reconstructions were updated. In addition to improving the metabolic 298 reconstruction, this process significantly expanded the view of core S. meliloti metabolism 299 compared to that gained solely through the application of Tn-seq. The genes associated with 300 approximately one third of the iGD726 reactions were not detected as growth promoting in the 301 Tn-seq datasets ( Figure 5, Table 3). While many of the additional reactions present in iGD726 302 are due to the inclusion of non-essential biomass components, which are part of the wild type 303 cell but are nonetheless dispensable for growth, others are from essential metabolic pathways 304 (Figures 5, S13). Overall, the combined approach of integrating Tn-seq data and in silico 305 metabolic modelling allowed for the development of a high-quality representation of core S. 306 meliloti metabolism in a way that neither approach alone was capable of accomplishing. 307 Tn-seq-guided in silico metabolic reconstruction facilitates novel gene annotation. 308 Over 20 of the reactions of the core metabolic reconstruction initially had no gene 309 attributed with producing the enzyme responsible for its catalysis. Similarly, many genes with no 310 clear biological function were found to be essential in the Tn-seq screen. By attempting to fill the 311 gaps in the in silico model with the uncharacterized essential genes, we were able to assign 312 putative functions to eight previously uncharacterized genes (Table S4). Two of these genes were 313 chosen for further characterization, smc01361 and smc04042. The smc01361 gene was annotated 314 as encoding a dihydroorotase, and mutation of smc01361 resulted in pyrimidine auxotrophy 315 ( Figure S14). Given its location next to pyrB, and the presence of an essential PyrC 316 dihydroorotoase encoded elsewhere in the genome (Data Set S1), we propose that smc01361 317 encodes an inactive dihydroorotase (PyrX) required for PyrB activity as has been observed in rhizobia lack a gene encoding a classical L-histidinol-phosphate phosphohydrolase, and it was 321 suggested an inositol monophosphatase family protein may fulfill this function instead [37]. 190,000 unique insertion locations was largely consistent, but not identical, with that previously 335 reported [41]; however, the specificity appeared to extend past the 9 base pair region that is 336 duplicated during Tn5 insertion ( Figure S3). While this bias is unlikely to have a significant 337 influence on the results in species with high GC content genomes, such as S. meliloti, accounting 338 for this bias may be important when applying Tn5 mutagenesis to species with low GC content 339 genomes. similar to S. meliloti, that is, with at least two large DNA replicons [42,43]. Several studies have 342 revealed that, in many ways, each replicon acts as a functionally and evolutionarily distinct entity 343 (for a review, refer to [43]); yet, there can also be regulatory cross-talk However, the large number of chromosomal genes-across many functional groups ( Figure 2)-349 that became conditionally essential following the removal of pSymA and pSymB indicate the 350 presence of many genes whose products can perform essential metabolic capabilities but that 351 remain cryptic due to inter-replicon epistatic interactions. It was also interesting to note that the 352 strength of the correlation between duplicates ( Figure S5 Figure 2). This observation was not growth medium-dependent, was not unique to a specific 373 gene functional class, and was not simply due to an overall reduced fitness of the ∆pSymAB 374 strain as the findings could be largely replicated in silico (Figure 4). 375 The majority of the genes whose fitness phenotype was dependent on the genomic 376 environment became more important for fitness following the genome reduction. In many cases, 377 this is expected to reflect a loss of functional redundancy; the increased importance of the 378 chromosomal cytochrome genes likely reflects a compensation for the loss of the pSymA/pSymB 379 encoded cytochrome complexes ( Figure 6). In other cases, it may reflect newly activated 380 pathways that must compensate for the loss of a normal housekeeping pathway. The specific 381 essentiality of proline biosynthesis, and the second half of histidine biosynthesis, in the 382 ∆pSymAB strain during growth in rich medium presumably reflects the inability of these strains 383 to transport these compounds and must therefore synthesize them de novo ( Figure 6). Indeed, 384 previous metabolomics work is consistent with the ∆pSymAB strain being unable to transport specifically essential in the ∆pSymAB strain in rich medium ( Figure  Somewhat surprisingly, approximately a quarter of the genes with a genomic 399 environment effect had a greater fitness defect in the wild type strain. In some cases this may 400 have been due to the reduced nutrient demand of the ∆pSymAB strain as a result of the smaller 401 genome content. For example, mutations of genes for arginine biosynthesis and the biosynthesis 402 of AICAR and UMP, common precursors in the synthesis of purines and pyrimidines, 403 respectively, had fitness defects in rich medium specifically in the wild-type ( Figure 6). This 404 may reflect that in this environment, the uptake of these nutrients is growth limiting to the wild-405 type in the absence of their de novo synthesis, whereas this is not the case in the ∆pSymAB 406 strain due to the reduced genome size, and thus lower nutrient requirement, and the already 407 reduced growth rate ( Figure S8). Another possibility is that removal of pSymAB evokes phenotypes that are epistatic to many of those brought about by chromosomal mutations. For 409 example, the removal of pSymB is expected to have resulted in alterations of the cell membrane 410 [49,54,55]; our observation that many mutations causing greater relative fitness defects in wild-411 type cells are associated with lipid metabolism, such as biosynthesis of the lipopolysaccharide 412 core oligosaccharide (Figure 6) may be a result of those mutations being phenotypically masked 413 in the absence of pSymB. 414 Our work in integrating the Tn-seq data with in silico metabolic modelling made it 415 evident that Tn-seq alone is insufficient to identify the entire core metabolism of an organism; 416 almost a third of the reactions present in the core metabolic reconstruction were not supported by 417 Tn-seq data ( Figure 5 and Table 3). Similarly, the large number of changes made in the gene-418 reaction relationships when producing the core model illustrated the limitations in the quality of 419 metabolic reconstructions when high-throughput mutagenesis data are lacking. In some cases, 420 the gaps in the Tn-seq data were due to genomic environment effects, such as genetic 421 redundancy, in other cases it was due to the inclusion of reactions that are non-essential but that 422 are nonetheless required for production of 'wild type' cells, and sometimes the gene associated 423 with a reaction is simply unknown. A fourth possibility is phenotypic complementation through 424 cross-feeding. Given that Tn-seq involves growth of a population of mutants, a mutant unable to 425 produce an essential metabolite may still grow if the metabolite is excreted and transferred to the 426 mutant from the rest of the population.  resulting DNA fragments were appended with short 3' homopolymer (oligo-dCTP) tails using 514 terminal deoxynucleotidyl transferase (NEB #M0315S), and this sample was used as the 515 template for a two-round PCR that gave rise to the final Illumina-ready libraries. In the first 516 round, a transposon end-specific primer (1TN) and oligo-G primer (1GG) were used (all primer 517 sequences can be found in Table S6). After clean-up, a portion of the first-round product was 518 used as the template for the second-round reaction employing a nested transposon-specific 519 primer (2TNA-C) and a reverse index-incorporating primer (2BAR01-08). The series of three 520 Illumina Hi-Seq instrument as 50-bp single-end reads. Raw reads were used as input into a 526 custom-built Tn-seq analytical pipeline, which was recently described [57]. 527

Calculation of gene and synthetic indexes. 528
For calculation of Gene Essentiality Index (GEI) scores, a pseudo count of one was first 529 added to all gene read counts for each replicate. GEI were then calculated by summing the 530 number of reads that mapped to the gene in both replicates, and dividing this number by the 531 nucleotide length of the gene. GEI scores were calculated for each gene separately in each 532 medium and in each strain. All GEI values are available in Data Set S1. 533 Synthetic Media Index (SMI) scores were calculated to represent the difference in GEI 534 scores between the two media for the same strain. Raw SMI scores were determined by dividing 535 the GEI of the gene in defined medium by the GEI of the gene in rich medium. Processed SMI 536 scores, those shown throughout the manuscript, were determined as follows. If the raw value was 537 above one, the processed SMI and the raw SMI are the same. Raw SMI scores that were below 538 one were converted to processed SMI scores through the transformation, "1 / raw SMI score", 539 and presenting the value as a negative number. 540

Raw and processed Synthetic Rich Index (SRI) and Synthetic Defined Index (SDI) scores 541
were calculated to represent the difference in the GEI scores of a gene between the wild-type and 542 ∆pSymAB strains when grown in rich or defined medium, respectively. SRI and SDI indexes 543 were calculated using the same procedure as described for the SMI scores above. All synthetic 544 index scores are provided in Data Set S1. [68]. All genes belonging to an apcluster grouping that contained an essential gene, as 558 determined in any of the previous steps, were re-annotated as essential. Additionally, all genes 559 belonging to an apcluster grouping that spanned the border of two Mclust goups were transferred 560 to the same classification, based on which cluster the genes had a higher median probability of 561 being derived from in the Mclust analysis. Finally, genes that were classified as 'essential' in one 562 medium and 'large growth impairment' in the second medium, but that were identified as having 563 no medium specificity based on their SMI scores, were considered as essential in both media. 564 Genes with GEI scores significantly different between conditions were determined as 565 follows. The synthetic indexes (SMI, SDI, SRI) scores were imported into R and log 566 transformed, and the following clustering performed independently for each index. The log 567 transformed synthetic scores were clustered using Mclust and apcluster in R as described above 568 for the GEI scores. In the case of the SMI scores, three clusters were produced 'Little to no classified as 'Large difference' were considered to display a medium specificity. In the case of 571 SDI and SRI scores, only two clusters were produced: 'Little to no difference' and 'Difference 572 between strains'.  Table S7. In silico 614 analysis of redundancy in the S. meliloti genome was performed using the iGD1575b metabolic reconstruction, whose development is described in the following section. Single and double gene 616 deletion analyses were performed using the singleGeneDeletion and doubleGeneDeletion 617 functions, respectively, using the Minimization of Metabolic Adjustment (MOMA) method. All 618 Matlab scripts used in this work are provided as File S3. 619 For all deletion mutants, the growth rate ratio (grRatio) was calculated (growth rate of 620 mutant / growth rate of wild type). Single gene deletion mutants were considered to have a 621 growth defect if the grRatio was < 0.9. For the double gene deletion analysis, if the grRatio of 622 the double mutant was less than 0.9 the expected grRatio (based on multiplying the grRatio of 623 the two corresponding single mutants), the double deletion was said to have a synthetic negative 624 phenotype. 625

Development of iGD1575b. 626
For in silico analysis of redundancy in the S. meliloti genome, the previously published S. 627 meliloti genome-scale metabolic model iGD1575 [34] was modified slightly. As indicated in 628 Table S8, the biomass composition was updated to include 31 additional compounds at trace 629 concentrations, including vitamins, coenzymes, and ions, in order to ensure the corresponding 630 transport or biosynthetic pathways were essential. However, the original model iGD1575 was 631 unable to produce vitamin B12 and holo-carboxylate. To rectify this, the reversibility of 632 rxn00792_c0 was changed from 'false' to 'true', and the reactions rxn01609, rxn06864, and 633 rxnBluB were added to the model. However, no new genes were included in the model. This 634 updated model was termed iGD1575b and is available in SBML and Matlab format in File S2. 635 Simulating the removal of pSymA and pSymB in silico. 636 Several modifications to iGD1575b were required in order to produce a viable model 637 following the deletion of all pSymA and pSymB genes. As described previously [34], reaction relationships) were added to the reactions 'rxn01675_c0', 'rxn01997_c0', 640 'rxn02000_c0', and 'rxn02003_c0' in order to allow the continued production of the full LPS 641 molecule, as well as to 'rxn00416_c0' to allow asparagine biosynthesis. Additionally, 'gapfill' 642 GPRs were added to the reactions 'rxn03975_c0' and 'rxn03393_c0' so that removal of pSymA 643 and pSymB did not prevent biosynthesis of vitamin B12 and ubiquinone-8, respectively. Finally, 644 a glycerol export reaction via diffusion (rxnBLTPcpd00100b) was added to remove the glycerol to be essential, but the corresponding reaction for the gene was associated with multiple 673 alternative genes, all but the essential gene were removed from the reaction. Similarly, if a non-674 essential gene was associated with an essential reaction, a second gene or an Unknown was 675 added to reflect the apparent redundancy in the genome. Where possible, unknowns in the gene 676 associations were replaced with genes whose gene product may catalyze the reaction. cofactors, and ions were added to the biomass composition at trace concentrations to ensure that their biosynthesis or transport was essential. The complete biomass composition is provided in 683 Table S3. 684 The necessary metabolic and transport reactions to allow the model to growth with 685 sucrose, glucose, or succinate were included in the reconstruction. Once the model was capable 686 of producing all biomass components using any of the three carbon sources, the list of model

975
The lack of insertions within the phoU coding region is therefore consistent with the non-polar nature of the 976 transposon.