Differential strengths of molecular determinants guide environment specific mutational fates

Under the influence of selection pressures imposed by natural environments, organisms maintain competitive fitness through underlying molecular evolution of individual genes across the genome. For molecular evolution, how multiple interdependent molecular constraints play a role in determination of fitness under different environmental conditions is largely unknown. Here, using Deep Mutational Scanning (DMS), we quantitated empirical fitness of ∼2000 single site mutants of Gentamicin-resistant gene (GmR). This enabled a systematic investigation of effects of different physical and chemical environments on the fitness landscape of the gene. Molecular constraints of the fitness landscapes seem to bear differential strengths in an environment dependent manner. Among them, conformity of the identified directionalities of the environmental selection pressures with known effects of the environments on protein folding proves that along with substrate binding, protein stability is the common strong constraint of the fitness landscape. Our study thus provides mechanistic insights into the molecular constraints that allow accessibility of mutational fates in environment dependent manner. Author Summary Environmental conditions play a central role in both organismal adaptations and underlying molecular evolution. Understanding of environmental effects on evolution of genotype is still lacking a depth of mechanistic insights needed to assist much needed ability to forecast mutational fates. Here, we address this issue by culminating high throughput mutational scanning using deep sequencing. This approach allowed comprehensive mechanistic investigation of environmental effects on molecular evolution. We monitored effects of various physical and chemical environments onto single site mutants of model antibiotic resistant gene. Alongside, to get mechanistic understanding, we identified multiple molecular constraints which contribute to various degrees in determining the resulting survivabilities of mutants. Across all tested environments, we find that along with substrate binding, protein stability stands out as the common strong constraints. Remarkable direct dependence of the environmental fitness effects on the type of environmental alteration of protein folding further proves that protein stability is the major constraint of the gene. So, our findings reveal that under the influence of environmental conditions, mutational fates are channeled by various degrees of strengths of underlying molecular constraints.

role in determination of fitness under different environmental conditions is largely unknown. 23 Here, using Deep Mutational Scanning (DMS), we quantitated empirical fitness of ~2000 single 24 site mutants of Gentamicin-resistant gene (GmR). This enabled a systematic investigation of 25 effects of different physical and chemical environments on the fitness landscape of the gene. 26 Molecular constraints of the fitness landscapes seem to bear differential strengths in an 27 environment dependent manner. Among them, conformity of the identified directionalities of the 28 environmental selection pressures with known effects of the environments on protein folding 29 proves that along with substrate binding, protein stability is the common strong constraint of the 30 fitness landscape. Our study thus provides mechanistic insights into the molecular constraints 31 that allow accessibility of mutational fates in environment dependent manner. Alongside, to get mechanistic understanding, we identified multiple molecular constraints which 43 contribute to various degrees in determining the resulting survivabilities of mutants. Across all 44 tested environments, we find that along with substrate binding, protein stability stands out as the 45 common strong constraints. Remarkable direct dependence of the environmental fitness effects 46 on the type of environmental alteration of protein folding further proves that protein stability is 47 Introduction 54 Continuously fluctuating natural environments are known to a play central role in natural 55 selection by constituting environment to phenotype interactions (E2P). Such interactions can 56 predispose a particular genotype to alternative fates through diverted evolutionary trajectories 57 (1-5). At lower scales, molecular evolution of cellular genes, underlie the emergent organismal 58 adaptations. Cellular responses to the environmental conditions are well known to be often 59 enabled by maintenance of normal proteostasis (6). Therefore, an investigation of whether and 60 to how much extent protein folding and stability play a role in guiding mutational fates would be 61 a key factor in the efforts to elucidate molecular level E2P interactions. 62 Considering low rates of spontaneous mutations across variety of organisms (7), single site 63 mutations provide a view of immediate next fates of a gene. DMS approach has enabled 64 scanning of large scale of mutations in a high throughput manner (8,9); allowing comprehensive 65 analysis of sequence-space of genes. Resultant Distributions of Fitness Effects (DFE) provide a 66 continuous series of fitness effects ranging from strongly deleterious to beneficial which is a 67 valuable resource for quantitative genetics (10). In recent years, exploration of environmental 68 effects with a large-scale genotype to phenotype (G2P) data (11,12) has resulted in the 69 identification of environment specific differential mutational sensitivity. However, qualitative and 70 quantitative identification of determinants of fitness effects has been a challenging task (13). In 71 addition to improve much needed mechanistic understanding (14) of E2P interactions, such 72 information can potentially increase the robustness in current approaches of prediction of 73 phenotypes from genotypic information (15,16). 74 On a fitness landscape of a gene, levels of fitness of mutations may depend on number of 75 molecular constraints which would shape the landscape. Here, to systematically investigate the 76 respect to that in unselected pool serve as proxies for competitive fitness of mutants. Next, we 123 implement a strategy adopted in previous study on similar mutational scanning of antibiotic 124 resistant gene (21), to classify statistically neutral mutations by applying a cutoff of µ±2σ (µ: 125 mean, σ: standard deviation) obtained from fitness of unselected pool (S1 Fig 3). Accordingly, 126 mutants were classified as relatively enriched (F i >µ+2σ) and depleted (F i <µ-2σ). Fitness levels 127 of synonymous mutations across different environments are found to strongly correlate 128 (Pearson's r > 0.88, S1 Fig 4) with that fitness of synonymous mutations under reference 129 environment (37 0 C, 12.5 μg/mL Gm); thus diminishing the influence of codon bias associated 130 with G2P interactions of GmR. Therefore, unless specified, in the subsequent analysis of 131 mutational data, we primarily utilize non-synonymous mutants. This way, from a co-culture bulk 132 competition under reference conditions (37 0 C, 12.5 μg/mL Gm), we obtained empirical fitness of 133 2004 non-synonymous mutants (n) (S1 Fig 5, S2 Table). Among them, 609 mutants are 134 enriched, 262 mutants are depleted and 1133 are classified as statistically neutral (S1 Table1). 135 We compared pairs of DFEs obtained for various test environments by a metric defined as a 136 difference in average fitness scores of mutants between test and reference condition (∆F) which 137 indicates the relative increase or decrease in fitness of surviving mutants in test condition with 138 respect to reference environment. Additionally, number of mutants having significantly higher or 139 lower fitness in test environment are measured (n pos and n neg respectively). Here, to account for 140 the inherent experimental and biological noise, a cutoff is assigned for defining significant 141 increase or decrease in the fitness over the inherent dispersion within replicates (Materials and 142 Methods). 143 DFE of GmR obtained at 25 μg/mL Gm (37 0 C) reveals a depletion of fitness with a skew 144 towards the deleterious fitness levels as compared to DFE obtained at 12.5 μg/mL Gm (Fig 1B). 145 The lowered average fitness (∆F = -0.38) is accompanied by lowered survival of mutants in this 146 condition (∆n = -246). Such dosage dependent deleterious fitness effects are expected from 147 increased stringency for in vivo catalytic functionality and survivability of mutants which is 148 consistent with previous reports on mutational scannings of other antibiotic resistant genes (21-149 23). In terms of distribution of enriched and depleted mutants per position, mutations at sites 150 with low evolutionary rate, such as binding sites, show depleted levels of fitness while mutations 151 at sites with high evolutionary rate, specifically at N-terminal region, exhibit high fitness levels 152 (S1 Fig 6). Collectively, dosage dependence and consistent trends with conservation scores 153 support that the observed empirical fitness values are representatives of the in vivo 154 functionalities of GmR mutants. This further allows us to ascribe empirically quantified fitness 155 values to the in vivo functionalities of the mutants. 156 Here, all the bulk competitions were carried out at purifying selection of 12.5 μg/mL Gm.  We utilize prior information gained from growth kinetics of the model system i.e. wild type GmR 188 harboring E. coli under different test environments to establish the strengths of selection 189 pressures relative to reference environment. As compared to normal growth temperature of E. 190 coli (37 0 C), both lower (30 0 C) and elevated temperatures (42 0 C) show non-optimal growth 191 (shown in S1 Fig 7A). At elevated temperature, the amplitude of the growth kinetics decreases 192 by ~40%, while at low temperature there is more than two fold increase in the lag phase of the 193 growth kinetics. Both the chemical environments evidently show to suboptimal growth rates (S1 194  Table) supporting known 219 mechanistic difference in aiding protein folding (18,19). Further, at stringent Gm selection (25 220 μg/mL), compared to minimal Gm selection, TMAO treatment is found to exert mutational 221 robustness (S1 Fig 9A). Remarkably, glycerol treatment exerts mutational robustness compared 222 stringent (S1 Fig 9A) as well as minimal Gm selection (S1 Fig 9B). Collectively, among 223 individual treatment of environments, low temperature and chemical chaperones induce 224 mutational robustness while, elevated temperature exert deleterious fitness effects. 225

Fitness effects of combinations of environments 226
Phenotypic effects of simultaneous action of multiple environments have been historically 227 illusive to decipher (29). Using our experimental system, we set out to elucidate the effects of 228 combinations of different sets of environments having prominent but contrasting effects i.e. 229 elevated temperature and chemical chaperones. From growth kinetics of the model system, 230 under combination of environments, it is apparent that TMAO is able to counteract the effects of 231 elevated temperature while in case of treatment of glycerol, compared to growth kinetics at 232 elevated temperature, the kinetics drastically slows down (S1 Fig 7C). Co-culture bulk 233 competitions under simultaneous action of environmental conditions i.e. 42 0 C + TMAO and 42 0 C 234 + glycerol reveal drastic decrease in survivability of mutants. Interestingly, average fitness of 235 mutants is in case of 42 0 C + TMAO condition is higher than that for 42 0 C + glycerol condition. 236 Contrasting effects of the combinations of environments are consistent with the low 237 predictabilities fitness effects associate with simultaneous action of multiple environments. 238 In order to deduce relative influence of individual environments in the combinations, we 239 quantitated their selection metrics against individual constituent environments. Compared 240 against 42 0 C alone (Fig 2A), both 42 0 C + TMAO and 42 0 C + glycerol conditions show weaker 241 differences (Fig 2A) than when compared against chemical environments (Fig 2B and C). This Fitness effects would also depend upon constraints of multiple aspects such as structural, 272 folding and binding. Here, we used a set of molecular constraints describing multiple aspects of 273 gene fitness (S3 Table). Apart from measure of perturbation of stability by mutations (∆∆G), we Alongside the classification of molecular constraints, Euclidean clustering (Fig 3) along the axis 315 of environments reveals a distinct classification which in turn aligns with their effects on 316 survivabilities i.e. ∆n (Fig 3). Notably, correlation with conservation score is the highest in case 317 of reference condition (37 0 C) than other environments; implying that the mutational tolerances in 318 this environment closely resemble to that existed in evolutionary history of the gene. This 319 supports our primary assumption to regard 37 0 C as a reference environment. Molecular features 320 accounting for folding constraints i.e. ∆∆G and residue depth are best correlated in case of 321 environments with conserved (least changed) survivabilities suggesting that the environments 322 may induce allow survival of mutants with fitness changes that are aligned with the folding 323 constraints of the protein. In other words, folding constraint is likely to be stronger in such 324 environments than the environments with compromised survivabilities. Contrastingly, in terms of 325 distance from active site and dimer interface, fitness scores of mutants from environments with 326 lowered survivabilities are better correlated. This suggests that these factors particularly the 327 best correlated between the two i.e. distance from active site is potentially a major fitness 328

constraint. 329
Residue flexibility, interestingly, is negatively correlated with fitness scores of mutants from 330 environments with lowered survivabilities and positively correlated with majority of rest of the 331 environments. This implies that for environments that confer strongly deleterious fitness effects, 332  Next, we calculated subset wise metrics of selection i.e. ∆n and ∆F with respect to reference 373 environment (37 0 C, 12.5 µg/mL Gm). Firstly, in terms of survivability i.e. ∆n, except for 374 treatments of chemical chaperones at 12.5 µg/mL Gm and treatment of glycerol at 25 µg/mL 375 Gm, for rest all the environments, survivability of the mutants is drastically decreased (Fig 4A). 376 Notably, for environments with decrease in levels of survivabilities, across subsets, ∆n seem to 377 show a pattern which is closely dependent on the binding and folding constraints. Survivability 378 of the mutants seem to decrease in a following order FB > cFB > FcB > cFcB. The greater 379 decrease in survivabilities with the compromise in folding and binding supports our primary 380 basis for classification of mutants and indicates that fitness scores of mutants are closely 381 dependent on folding and binding constraints. 382 In terms of change in average fitness (∆F), with a clear exception of 37 0 C Gm 25µg/mL, for 383 majority of environments average fitness of surviving mutants is greater than reference 384 environment (Fig 4B). Stringent Gm selection in addition to decreasing survivabilities of 385 mutants, seem to lower the fitness levels of mutants. Among subsets, surviving mutants from 386 subset FcB in particular have substantially lowered fitness scores potentially due increased 387 dependence on substrate binding and catalytic activity because of increased level of purifying 388 selection. Combination of environments too show lowered average fitness across subsets; 389 effect of which is particularly exemplified in FcB subset. For other environments, the average 390 fitness of surviving mutants seem to be increasing in the order FB < cFB < FcB < cFcB. This 391 suggests that the compromise of folding and binding allow exclusive survival of mutants with 392 fitness advantage. While mutants at the core of the protein and near to active site (cFcB) have 393 lowered survival, the survived mutants tend to possess relatively increased fitness advantage. 394 Elevated temperature conditions is an interesting example wherein the survivability of mutants is 395 drastically decreased (Fig 4A) and the environmental section only allowed survival of mutants 396 with fitness advantage (Fig 4B). Overall, FB subset is least affected while contrastingly cFcB is 397 the most affected subset by across all environments. Mutational robustness by TMAO and 398 glycerol is apparent by the retention survivability of mutants ( Fig 4A) and increased fitness of 399 the mutants (Fig 4B). In combination with elevated temperature, both survivability and average

Unique reshaping of environment specific fitness landscape
In order to better contextualize the contributions of coupling between folding and binding fitness landscape of GmR by lining it with the folding and binding components which evidently act as central constraints of mutational fitness (Fig 5A). So, r For reference environment, as expected from corresponding folding and binding states, we find 421 that majority of high fitness mutations occupy FB regime whereas cFcB regime is majorly 422 occupied by the deleterious mutants (Fig 5A). Notably, this pattern is conserved across all the 423 test environments (Fig 5B-J). For reference environment, folding constraint produces a 424 pronounced fitness cliff whereby mutants with above a threshold of ∆∆G (~2 kcal/mol) are highly 425 likely to show deleterious fitness. In case of selection at stringent Gm concentration, 426 corroborating with dosage dependent effects, mutations close to the active site show a 427 prominent decrease in fitness ( Fig 5B). Here, the imposed higher load of Gm, seem to generate 428 additional pronounced fitness cliff along the binding axis. The substantial change in the fitness 429 landscape as a function of selection levels is consistent with known central role of purifying 430 selection in molecular evolution (21). 431 Among physical environments, fitness landscape corresponding to low temperature condition 432 show no peculiar regime wise distinction from that of reference environment ( Fig 5C); capturing 433 earlier noted weaker selection pressure (Fig 1C). Elevated temperature conditions -in 434 accordance with imposed negative selection pressure -show selective survival of only mutants 435 with enriched fitness associated with lowered survival of mutants from cFB and cFcB subsets 436 ( Fig 5D). Among chemical environments, mutational robustness imposed by TMAO and glycerol 437 is evident in the form of close similarity with the fitness landscape of reference environment (Fig  438   5E,G). At stringent Gm concentration, chemical chaperone seem to increase the fitness of 439 survived mutants lying at cFcB regime (Fig 5F,H) which are otherwise highly depleted (Fig 5B). robustness. This shows that fitness effects in each environment on fitness landscape for GmR is 477 dependent on kind of environmental alteration of cellular protein folding. 478 In our study, coupled factors, folding and binding constraints distinctly emerge as strong 504 determinants of fitness effects; and through their spandrel like property (34). To understand the 505 trade-off between the two, we visualized fitness of GmR mutants in a form of fitness landscape 506 lined by folding (ΔΔG) and binding axes (distance from active site) (Fig 5A). Across all 507 environments, corroborating with known trade-off in case of antibiotic resistant genes between 508 folding and binding (35), combinations of folding and binding states seem to underlie resulting 509 fitness effects. For instance, combination of weaker folding and binding constraints (FB) is 510 associated with largely enriched fitness levels while stringent folding and binding constraints 511 (cFcB) are associated with deleterious fitness levels of mutants ( Fig 5A). In case of reference 512 environment, folding constraint introduces a prominent limiting fitness cliff (at ∆∆G=~2 kcal/mol). 513 Whereas for test environments, although at variable degrees, common existence of limiting 514 fitness cliffs along folding axis (Fig 5) underscores central role of protein folding and stability in 515 molecular evolution (41). The universal dependence on folding component especially at minimal 516 purifying selection also explains the conformity of evident fitness effects with known effects of 517 environments on protein folding and proteostasis. Alongside, this study therefore reveals a 518 unique possibility of controlling mutational fates based on environmental alteration of major 519 constraints of fitness landscape. Notably, binding constraint imposes a limiting fitness cliff which 520 is more pronounced at stringent concentration of Gm (25 µg/mL) (shown in Fig 5B) and from our 521 data, it seems to be a feature dependent on level of purifying selection levels. 522 Collectively, from a simple experimental system consisting of a conditionally essential gene with 523 evidently weaker constraints imposed by events prior to protein expression, we identify 524 environment dependent differential fitness of mutations which are dependent on relative 525 strengths of underlying molecular constraints. Major implication of this information lies in the 526 improvement of our understanding of influence of environmental conditions on G2P interactions 527 at molecular level. Further, environmentally induced variability can potentially contribute a 528 requisite nuance in predictive models used for challenging task of inferring phenotypic outcomes 529 from genomic variants(15,16); making them more robust. In future, such study of 530 comprehensive environment specific fitness landscapes can be potentially extended to multiple 531 mutations to monitor combined effects of epistasis -arguably the major predictive factor in 532 molecular evolution (42), -more variety of environments as well as assessing fitness 533 landscapes of multiple proteins in gene regulatory networks. 534

Minimal inhibitory concentration (MIC) assays 536
Primary culture was prepared by inoculating (1% v/v) E. coli (K-12) in culture media (Luria-537 Bertani (LB) broth (HiMedia) containing 100 g/mL, ampicilin (Sigma) and 0.1% Arabinose 538 (Sigma)) and incubating at 37 0 C for 18 hrs. The primary culture was inoculated at OD 600 of 539 0.025 in culture media containing a range of Gm (Sigma) concentrations from 6.25 to 400 g/mL 540 with 2 fold increase at each increment (in 96-well storage plates). The assay plates were 541 incubated at 37 0 C for 18 hrs before measuring growth (OD 600 ) in Tecan microwell plate reader. 542 Growth assays 543 E. coli (K-12) harboring pBAD-GmR is grown in culture media (LB media containing 100 g/mL 544 and ampicilin 0.1% Arabinose) for ~18 hr. Primary culture was used as an inoculum (~0.01 OD) 545 for the growth assays. Growth assays in different environments were carried out using 546 Bioscreen C kinetic growth reader. The growth parameters were obtained by fitting absorbance 547 data to five parameter Logistic equation. pool) (as shown in Fig 1A) was carried out. An independent bulk competition was carried out at 563 37 0 C in the absence of Gm (unselected pool) which serves as a reference for calculating 564 preferential enrichments. 565

566
At the end of bulk competition assays, cells are pelleted and plasmid is purified. Amplicons were 567 generated by a short PCR (initial denaturation: 95ºC for 3 min, denaturation: 95ºC for 1 min, 568 annealing: 60ºC for 15 sec, extension: 72ºC for 1 min, final extension: 72ºC for 10 min) using 569 high fidelity KAPA HiFi DNA polymerase (cat. no. KK2601). High template concentration (1 570 ng/μl) and 20 cycles were used to reduce potential PCR bias. Multiplexing was carried out using 571 flanking barcoded primers (4 forward, 4 reverse, sequences in S1 Table 2). Amplicons of 572 barcoded samples were grouped in equimolar concentration and gel purified. A dual index 573 library for each such set was prepared using Truseq PCR-free DNA HT kit (Illumina Inc. Cat no. 574 F-121-3003) and sequenced using paired end (300 X 2) chemistry on Illumina Miseq platform. 575 Raw sequencing data is available upon request. 576

Estimation of fitness scores from deep sequencing data 577
Analysis of sequencing data was carried out by using dms2dfe (43) -a comprehensive analysis 578 pipeline exclusively designed for analysis of deep mutational scanning data. Through dms2dfe 579 workflow, output files from the sequencer (.fastq) were demultiplexed using ana0_fastq2dplx 580 module of dms2dfe. Average read depth of each demultiplexed sample was ~1X10 5 . Next, 581 though dms2dfe's modules namely ana0_fastq2sbam, sequence alignment was carried out 582 using Bowtie2 (44) followed by variant calling through ana1_sam2mutmat module which utilizes 583 pysam libraries (45). A variant is called only if average Q-score of the read and that of the 584 mutated codon is more than 30. Additionally a cut off of 3 reads per variants is used to filter out 585 anomalous low counts. As a result a codon level mutation matrix of counts of mutations is 586 generated. Codon level mutation matrix is then translated to amino acid level (based on the 587 codon usage bias of the E. coli). For each experimental condition, counts of ~2000 individual 588 mutants were quantified (S2 Table). Raw sequencing data is available at Sequence Read 589 Archive (SRA) as a BioProject: PRJNA384918. 590 Through ana2_mutmat2fit module of dms2dfe, counts of mutants are first normalized by the 591 depth of sequencing at each position of the gene. Then preferential enrichments which are log 592 (base 2) fold change of counts of the mutants in pool selected in presence of Gm against 593 unselected (0 µg/mL Gm) reference pool are estimated. Here, preferential enrichment of a 594 mutant serves as a proxy for its relative fitness and hence we simply refer it as 'fitness'. 595 Alongside significance levels of the preferential enrichments are obtained by Wald test through 596 DeSeq2 workflow (46) (S2 Table). 597

Classification of mutants based on comparison of DFEs 598
In order to assess the difference in fitness scores in a given comparison of DFEs of two 599 environments, we assigned a statistical threshold to reduce the influence of experimental and 600 biological noise. The mutants with higher fitness scores in test environments than control 601 environment are classified as 'positive' and while mutants with lower fitness scores in test 602 environments than control environment are classified as 'negative'. We use replicates of test 603 and control environments to get thresholds of difference in fitness scores such that the 604 differences in fitness scores between the environments ('signal') is greater than the differences 605 within the replicates ('noise'). From difference between fitness scores of replicates, two 606 distributions corresponding to replicates of test and control environment are obtained. µ test and 607 µ control are means and σ test and σ control are standard deviations of such distributions for test and 608 control environments respectively. So, the thresholds to segment the mutations into 'positive' or 609 'negative' is determined to be equal to ± 2 * . Number of mutants 610 undergoing positive effects is denoted as n pos, while number of mutants with negative effects is 611 denoted as n neg . 612