High-Resolution Phenotypic Landscape of the RNA Polymerase II Trigger Loop

The active sites of multisubunit RNA polymerases have a “trigger loop” (TL) that multitasks in substrate selection, catalysis, and translocation. To dissect the Saccharomyces cerevisiae RNA polymerase II TL at individual-residue resolution, we quantitatively phenotyped nearly all TL single variants en masse. Three mutant classes, revealed by phenotypes linked to transcription defects or various stresses, have distinct distributions among TL residues. We find that mutations disrupting an intra-TL hydrophobic pocket, proposed to provide a mechanism for substrate-triggered TL folding through destabilization of a catalytically inactive TL state, confer phenotypes consistent with pocket disruption and increased catalysis. Furthermore, allele-specific genetic interactions among TL and TL-proximal domain residues support the contribution of the funnel and bridge helices (BH) to TL dynamics. Our structural genetics approach incorporates structural and phenotypic data for high-resolution dissection of transcription mechanisms and their evolution, and is readily applicable to other essential yeast proteins.


Introduction
RNA polymerase II (Pol II) synthesizes all eukaryotic mRNAs. Structural studies of Saccharomyces cerevisiae (Sce) Pol II have illuminated mechanisms of transcription [1-6], especially RNA synthesis. RNA synthesis occurs through iterative nucleotide addition cycles (NACs): selection of the correct substrate nucleoside triphosphate (NTP), catalysis of phosphodiester bond formation, and enzyme translocation to the next template position. These steps of the NAC appear to be coordinated by a critical, conserved domain within the Pol II active site: the trigger loop (TL). TL functions are underpinned by its mobile and flexible nature (Fig 1A). The primary function of the TL is kinetic selection of correct NTP substrates while balancing transcription speed and fidelity, and this function is highly conserved based on studies of RNAPs from Escherichia coli (Eco) [7,8], Thermus aquaticus (Taq) [9], the archaea Pyrococcus furiosus (Pfu) [10] and Methanocaldococcus jannaschii (Mja) [11], and eukaryotic Pol II from Sce [12,13] and human [14]. In a simplified two-step model, correct NTP binding appears to facilitate TL movement such that a bound, matched NTP shifts the TL from the "open" state to the "closed" state [4,15-18], allowing capture of the matched NTP in the Pol II active site and promotion of phosphodiester bond formation [4,17,19]. The subsequent release of the byproduct, pyrophosphate, allows a conformational shift of the TL from the "closed" state back to the "open" state [15,20,21]. TL opening has been proposed to be critical for enzyme translocation relative to the DNA template, an essential step for the next nucleotide addition cycle [8,13,15,22-25]. Furthermore, additional TL states have been implicated in transcriptional pausing from studies in E. coli [17,22,26], backtracking from structural observations [27,28], and, although controversial, intrinsic cleavage [7,29-32]. Thus, distinct TL conformations or interactions are linked to different functions in transcription, with delicate control of TL dynamics promoting proper transcription elongation while possibly incorporating signals from the rest of Pol II or Pol II-bound factors [17,33-36].
Genetic and biochemical studies have revealed TL functions in the NAC. First, the nucleotide interacting region (NIR, Rpb1 1078-1085) discriminates matched rNTPs from 2'-dNTPs and non-complementary rNTPs [12,13]. Substitutions of NIR residues observed to interact with rNTPs widely conferred lethality. Where viable, substitutions reduced catalytic activity in vitro and were termed partial loss-of-function (LOF) [7-10,12,37]. Second, a TL C-terminal mutant, E1103G, conferred increased catalytic activity in vitro, which we termed gain-of-function (GOF) [12,13,38]. Fast kinetics experiments revealed that E1103G may bias TL dynamics towards the catalytically active "closed" state [13], consistent with infidelity and compromised translocation in addition to increased catalysis [12,13,23,39,40]. Furthermore, prior work points to a complex functional network within the Pol II TL [37]. Finally, context dependence for TL residue function has been observed, wherein analogous mutations in a conserved TL residue showed opposite effects in Sce Pol I and Pol II, suggesting different rate-limiting steps for the two enzymes [42]. Together, the intricate intra-TL functional network and the context dependence of TL properties suggest the importance of the extensive residue-residue interactions within and outside the TL.
The possible multifunctional nature of each TL residue complicates interpretations of functions if interpretations are based on a limited number of mutants. This is because the phenotype of any given mutant could result from removal of the wild type side-chain or additional functions of the substituted residue. Furthermore, different substitutions may have distinct effects on particular TL conformations [37,43]. In the TL, different substitutions in the same residue can confer distinct phenotypes, so limiting mutational analyses to a single substitution at a particular position may mislead about residue function [13,37]. Deep mutational scanning is an emerging technique for studying large sets of mutants by assessing the enrichment or depletion of variants after a strict selection process [44]. Different selection approaches have been designed such that a specific protein property (sensitivity to substitutions [45], thermostability [46], protein stability [47], etc) can be studied. Notably, our established genetic phenotypes (Table 1) were well correlated with altered transcription elongation rates in vitro and specific transcription defects in vivo [37,41], thus providing a powerful phenotypic framework for studying TL function. In this work, we have defined the fitness and phenotypic landscape of the conserved, essential S. cerevisiae Pol II TL. We have found three distinct classes of transcriptionally defective TL mutants that are associated with differential stress response profiles, allowing the determination of functional contributions of each TL residue. We have examined the mechanisms by which proximal Pol II domains communicate with the TL, while identifying examples of inter-residue epistasis, which are the likely drivers of incompatibility of RNAP evolutionary variants when placed in the Pol II context.

Strategy for studying in vivo effects of TL variant library
A comprehensively mutagenized TL variant library (Rpb1 1076-1106), excepting some previously well-characterized variants [12,37], was synthesized using the Slonomics technology [48,49] and validated by deep sequencing (Fig 1B). Synthesis conditions were such that single substitution mutants would predominate. Our TL mutant library showed an even distribution of substitutions across all positions and substitution types (S1A and S1B Fig), with generally very low frequencies for excluded mutants, as expected (Fig 1B). We first sought evidence that the measured allele frequencies reflected the real allele frequency distribution, because PCR fidelity for highly similar amplicons is often compromised by template switching [50,51]. We spiked in five excluded single substitution variants (H1085Y, H1085Q, F1086S, G1097D, E1103G) as controls. Double mutant variants composed of these single substitution spike-in variants would not be present in our library, so if observed they would presumably be the result of template switching between spike-ins. We prepared TL amplicons from a subset of conditions using both standard PCR and emulsion PCR (emPCR), which can suppress template switching [50,51]. First, double mutants derived from spike-in controls were found at a significantly lower frequency than the relevant single substitution variants. Second, emPCR further suppressed the template switching frequency for all possible double mutants derived from spike-in single variants (Fig 2A, left), by about 2.5-fold on average (Fig 2A, right). We conclude that template switching is likely not extensive in our reactions, but the further reduction by emPCR led us to employ emPCR for our studies.
Table 1 (recovered entries). Suppressor of Ty (Spt-): lys2-128∂; reports on chromatin defects and start site selection [52]. Specific class of GOF Pol II mutants [37,41,53]. Resistance to drug. Sensitivity to drug for GOF mutants, relative resistance for LOF mutants. Moderate sensitivity to galactose (GalS).
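The spike-in control logic described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names, the read-count data structure, and the toy counts are all hypothetical, and reads are assumed to be already genotyped into sets of substitutions.

```python
from itertools import combinations


def position(substitution):
    """Extract the residue number from a name such as 'H1085Y'."""
    return substitution[1:-1]


def switching_frequencies(read_counts, spike_ins):
    """Frequency of each spike-in double mutant among all reads.

    read_counts maps a frozenset of substitution names to a read count
    (the empty frozenset stands for WT). Double mutants built from two
    spike-ins are absent from the library, so reads carrying them are
    attributed to template switching (an assumption of this sketch).
    """
    total = sum(read_counts.values())
    freqs = {}
    for a, b in combinations(sorted(spike_ins), 2):
        if position(a) == position(b):
            continue  # two substitutions at one residue cannot share a read
        pair = frozenset((a, b))
        freqs[pair] = read_counts.get(pair, 0) / total
    return freqs


def mean_fold_suppression(standard, emulsion):
    """Average standard/emulsion frequency ratio over pairs seen in both."""
    ratios = [standard[p] / emulsion[p]
              for p in standard
              if standard[p] > 0 and emulsion.get(p, 0) > 0]
    if not ratios:
        raise ValueError("no double mutant was observed in both libraries")
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    spikes = ["H1085Y", "H1085Q", "F1086S", "G1097D", "E1103G"]
    # Hypothetical read counts, for illustration only.
    standard_pcr = {frozenset(): 9000,
                    frozenset({"E1103G"}): 500,
                    frozenset({"H1085Y"}): 450,
                    frozenset({"E1103G", "H1085Y"}): 30}
    emulsion_pcr = {frozenset(): 9000,
                    frozenset({"E1103G"}): 518,
                    frozenset({"H1085Y"}): 470,
                    frozenset({"E1103G", "H1085Y"}): 12}
    f_std = switching_frequencies(standard_pcr, spikes)
    f_em = switching_frequencies(emulsion_pcr, spikes)
    print(mean_fold_suppression(f_std, f_em))
```

Averaging per-pair ratios (rather than pooling all double-mutant reads) is one design choice among several; it weights each spike-in pair equally regardless of its abundance.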
We have developed an experimental pipeline to examine mutations in an essential gene using a plasmid shuffling strategy, and have applied it to study the TL variant library (Fig 1C). To validate our pipeline and to isolate novel TL alleles, we performed a traditional genetic screen for mutants with transcriptional defects (Table 1). We have shown previously that these phenotypes correlate with Pol II biochemical activity in vitro [12,37,41]. The transcription-related phenotypes employed include, first, the Suppressor of Ty (Spt-) phenotype, derived from a transposon insertion into the 5' end of the LYS2 gene (lys2-128∂ allele) [52,53]. The transposable element insertion renders wild-type cells Lys-. A subset of Pol II TL mutants allow a normally silent promoter within the transposable element to express a truncated but functional LYS2 transcript, conferring the Spt- phenotype by allowing cells to become Lys+. Spt- mutants in the TL correlate with biochemical GOF phenotypes and their related genetic interaction and gene expression signatures [37,41,53]. Second, we employed suppression of the galactose-induced toxicity conferred by the gal10Δ56 allele of GAL10 (GalR) [54,55]. gal10Δ56 contains a deletion in the major GAL10 polyadenylation signal, allowing transcription readthrough and interference with the downstream GAL7 gene [54,55]. This readthrough/interference alters the ratio of metabolic enzymes in the galactose-utilization pathway, causing the buildup of a toxic intermediate and resulting in galactose sensitivity (GalS). Mutations in transcription elongation factors and Pol II subunits can alter these transcription defects and suppress gal10Δ56 galactose sensitivity [37,41,55]. Third, we employed mycophenolic acid (MPA) sensitivity. 
Sensitivity to MPA for examined Pol II TL mutants derives from altered transcription initiation at the IMD2 promoter [37,56], whose transcription is controlled through use of multiple start sites [57,58], and whose expression is required for cell resistance to MPA [59]. We have linked Pol II catalytic activity to the ability to induce IMD2. Increased activity Pol II alleles (GOF) fail to induce IMD2 in the presence of MPA due to aberrant transcription start site selection [37,56]. By screening for these three transcription-related phenotypes, we isolated 1166 candidate mutants (S1 Table), which included 154 singly-substituted and 386 multiply-substituted variants.
To further distinguish mutants, we examined 50 single substitution variants under various stress conditions to screen for conditions that could induce allele-specific phenotypes (Fig 2B, S2 and S3 Figs). We observed that media containing caffeine, hydroxyurea, MnCl2, formamide, cycloheximide, or NaOH induced allele-specific sensitivity or resistance, while media containing ethanol, benomyl, HCl or NaCl showed fewer allele-specific effects (Fig 2B, S2 and S3 Figs). Therefore, in our high-throughput approach, we phenotyped the TL variant library under our established conditions (medium lacking lysine (Spt-), medium containing MPA (MPAS) or medium containing galactose (GalR)) and on appropriate media for the stress conditions empirically determined to discriminate among our pilot alleles. Phenotypic scores were estimated from the change in allele frequency normalized to WT, as is standard in mutational scanning studies [44-47]. Quantitative phenotypic scores of the 50 mutants from the high-throughput phenotyping were consistent with semi-quantitative growth scores derived from standard phenotyping (Fig 2C, S2-S4 Figs), validating our approach.
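A score of this kind, the change in allele frequency normalized to WT, is commonly computed as a log2 ratio of ratios. The sketch below shows that common formulation; the exact formula, the log base, and the pseudocount are assumptions of this example and are not specified in the text above. The function name and arguments are hypothetical.

```python
import math


def phenotype_score(mut_before, mut_after, wt_before, wt_after,
                    pseudocount=0.5):
    """log2 change in a variant's read count relative to the WT change.

    Raw read counts are sufficient: library sizes cancel because each
    library total appears once in the numerator and once in the
    denominator of the ratio of ratios. The pseudocount (an assumption
    of this sketch) avoids division by zero for variants that drop out
    under selection.
    """
    mut_ratio = (mut_after + pseudocount) / (mut_before + pseudocount)
    wt_ratio = (wt_after + pseudocount) / (wt_before + pseudocount)
    return math.log2(mut_ratio / wt_ratio)


if __name__ == "__main__":
    # Hypothetical counts: the variant is depleted 4-fold relative to WT.
    print(phenotype_score(100, 25, 1000, 1000, pseudocount=0))
```

Under this convention a score of 0 means the variant behaves like WT in that condition, negative scores indicate relative depletion (sensitivity), and positive scores indicate relative enrichment (resistance).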

The Pol II TL fitness landscape
The TL is highly conserved, especially in the NIR, the loop tip residue (Rpb1 G1088) and several TL C-terminal residues (Fig 3A). Highly conserved residues are predicted to be critical for protein function, thus substitutions during evolution are expected to confer fitness defects and be selected against. We first sought to evaluate general fitness defects of observed TL singly-substituted variants (termed the "fitness landscape"), both in the presence of WT RPB1 (Fig 3B) and upon the removal of WT RPB1 (Fig 3C). Notably, TL NIR and loop tip substitutions conferred large fitness defects in general, while most perturbations in the similarly conserved C-terminal residues did not confer severe growth defects (Fig 3B and 3C). This observation highlights that conservation does not necessarily reflect sensitivity to perturbations, and that the TL fitness landscape can further distinguish extremely highly conserved TL residues, as discussed below. First, substitutions in the NIR (Rpb1 1078-1085) generally conferred both fitness defects (Fig 3C) and apparent dominance (Fig 3B). Observed fitness defects were consistent with previous observations that several NIR mutants render Pol II slow in elongation in vitro and cause fitness defects in vivo [12,37]. The observed dominance for many NIR variants was consistent with TL variants being assembled into Pol II complexes that interfere with WT Pol II function, likely through clashes with WT Pol II on genes in vivo. Second, substitutions within the alanine-glycine linker (Rpb1 1087-1088) almost universally conferred lethality or severe growth defects. A Pol II structure with a closed TL [4] reveals that A1087 and G1088 are in a tight pocket between the funnel and bridge helices, presumably necessitating small side-chain residues (S5A Fig). To determine the extent of spatial constraint, we individually assessed the fitness of AG swapping variants and small hydrophobic valine substitutions (Fig 3D). 
Notably, all the swapping variants (A1087G, G1088A and A1087G/G1088A) were lethal (Fig 3D). While G1088V is lethal, A1087V is severely sick but viable (Fig 3D), suggesting extremely high but differential spatial constraints on the two residues. This pocket/TL interaction is observed only in the closed TL [4] and not in any of the open states [60], suggesting a function in stabilizing the active, closed TL conformation to promote catalysis. Consistent with disruption of the pocket/TL interaction and the closed TL state, we observed genetic LOF phenotypes for A1087V (GalR, slight MPAR) (Fig 3E). Finally, substitutions in the conserved C-terminal helix, though not strongly defective in general fitness, are likely to have transcription defects, based on our prior studies, and were further characterized (discussed below).

Novel TL NIR mutants allow mechanistic insights
The TL fitness landscape identified residues highly sensitive to perturbations, while also revealing variants in NIR residues previously known to be difficult to viably substitute. We highlight L1081 and H1085 as two examples. L1081 directly interacts with the nucleobase moieties of matched NTPs [4], and equivalent residues in Eco, Taq and Pfu RNAPs are important for substrate selection or catalysis [7,9,10]. L1081 is the most sensitive residue to perturbations among the hyper-conserved NIR. All previously tested L1081 variants were lethal [37], though viable substitutions were identified for all other NIR residues of interest. Furthermore, the GOF allele E1103G can generally suppress lethal substitutions for most NIR residues, but could not for tested L1081 substitutions [37]. In our TL fitness landscape, almost all L1081 variants were indeed predicted to be lethal based on our fitness threshold ( Fig 3C). L1081M conferred a severe growth defect, but was predicted to be just above the viable threshold ( Fig 3C). To validate this prediction, we constructed L1081M for direct analysis, and found that L1081M was indeed viable yet severely sick ( Fig 3D). Furthermore, L1081M conferred Gal R and slight MPA R phenotypes, consistent with other LOF mutants ( Fig 3E). Eukaryotic multi-subunit RNA Polymerases share a stringent evolutionary requirement for L at this TL position, while bacterial and archaeal lineages show both M and L variants. Consistent with evolutionary tolerance of variation within bacterial and archaeal lineages, the Taq RNAP M1238L variant shows near WT activity for substrate selection and catalysis in vitro [9]. The severe growth defect of L1081M highlights epistasis within Sce Pol II and likely eukaryotic RNAP lineages, which imposes a stringent requirement for Leucine at this position.
H1085 interacts with the β-phosphate of the matched NTP [4], and has been implicated in substrate selection, catalysis, intrinsic cleavage and PPi release [29,61]. We previously constructed several H1085 variants (A/N/D/F were lethal, K/R/W/Y caused severe growth defects, Q caused a slight growth defect [12,37,41]), suggesting that some polar or positively charged residues, but not a hydrophobic phenylalanine or alanine, could partially complement loss of the histidine [37]. Here, we found that H1085L was viable and healthy in the fitness landscape (Fig 3C), and validated it with phenotypic analyses of a reconstructed H1085L allele (Fig 3D). While H1085L conferred slight MPAR and GalR phenotypes, consistent with other LOF mutants (Fig 3E), it also conferred a slight Spt- defect, suggesting distinct defects from most other NIR mutants and all known LOF mutants [37]. This observation alters our understanding of the likely bounds of active site chemistry (see Discussion).

There are at least three distinguishable TL mutant classes
The overall TL fitness landscape revealed the viability of almost all single substitution TL variants in standard growth medium, but could not indicate the nature of their transcriptional defects, as we had previously found that both LOF and GOF alleles conferred growth defects. Therefore, we sought to determine the phenotypic outcome of the TL variants for the transcription-related GalR, MPAS and Spt- phenotypes and a variety of allele-distinguishing stress conditions (investigated earlier in Fig 2B). Here, we term this response profile the "phenotypic landscape", as it distinguishes TL mutants with presumably distinct transcription defects, in contrast to the general "fitness landscape" described above.
Hierarchical clustering of the phenotypic landscape for 412 TL variants passing fitness filters revealed three major mutant classes with distinct features (Fig 4A and 4B). Class 1 mutants generally conferred a strong GalR phenotype yet were Spt+, and in some cases were also slightly MPAR relative to WT, consistent with previously characterized LOF mutants. We also identified high formamide sensitivity as a new signature phenotype for Class 1 mutants. Class 2 mutants showed generally weaker GalR, slight formamide resistance, and did not confer strong phenotypes otherwise, representing a novel TL mutant class yet to be biochemically characterized. Class 3 mutants generally conferred GalR, Spt- and MPAS phenotypes, consistent with previously characterized GOF mutants. Mn2+ hypersensitivity (MnS) was correlated broadly with Spt- and MPAS phenotypes, suggesting a relationship among these phenotypes, and consistent with previous in vitro biochemical and in vivo phenotypic data for a subset of known GOF mutants [62,63]. Notably, our spike-in LOF (F1086S, H1085Q and H1085Y) and GOF mutants (E1103G and G1097D) co-clustered with Class 1 and Class 3 mutants, respectively.
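To make the clustering step concrete, the following is a minimal sketch of agglomerative hierarchical clustering over per-variant phenotype score vectors. The distance metric (Euclidean), the linkage rule (average), the variant labels, and the score values are all assumptions chosen for illustration; the text above does not state which settings were used for Fig 4.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length score vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster_variants(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    profiles maps a variant label to its vector of phenotype scores,
    one entry per growth condition.
    """
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], profiles)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


if __name__ == "__main__":
    # Hypothetical scores; columns: galactose, -Lys, MPA, formamide, Mn2+.
    toy = {
        "lof_a": (3.0, 0.0, 0.5, -3.0, 1.0),
        "lof_b": (2.5, 0.0, 0.4, -2.5, 0.8),
        "mid_a": (1.0, 0.0, 0.0, 0.5, 0.0),
        "mid_b": (0.8, 0.0, 0.1, 0.6, 0.1),
        "gof_a": (1.0, 3.0, -3.0, 0.5, -3.0),
        "gof_b": (1.2, 2.5, -2.5, 0.4, -2.5),
    }
    for members in cluster_variants(toy, 3):
        print(sorted(members))
```

This pure-Python version is quadratic per merge and is meant only to show the logic; for several hundred variants a library implementation would normally be preferred.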

Functional contribution of TL residues in different states and substrate-induced TL closing mechanism
The distributions within different mutant classes predict distinct functional contributions of TL residues to TL dynamics. Perturbations predicted to bias the TL towards the active, closed TL state have been shown to result in GOF, whereas destabilization of the closed TL state generally leads to LOF [8,12,13,17,37]. Therefore, distributions of Class 1 (LOF) and Class 3 (GOF) mutants predict alterations to TL dynamics, as follows. Class 1 (LOF) mutants included most variants from F1086, V1089, V1094 and P1099 (Fig 4C, left), suggesting important functions of these residues in stabilizing the closed TL. F1086 and V1089 are both proximal to multiple funnel helix residues when the TL is closed [4,18], while F1086 was proposed to orient H1085 for correct substrate interaction [18]. Therefore, alteration of these interactions may disrupt the closed TL state and result in LOF. P1099 appears positioned close to F1086 to form a hydrophobic interaction when the TL is partially closed, suggesting that this side-chain interaction may be important for particular TL states (S5B Fig), though it was not discussed in previous molecular dynamics (MD) studies [18]. Furthermore, V1094 was observed to be proximal to the BH residue K830 in the closed TL state [4]. An interaction between the K830 and V1094 side-chains could seem counter-intuitive and may be undervalued. However, neutralization of lysine's positive charge through ionic interactions (such as with D836) can promote hydrophobicity of the lysine side chain [64], supporting the observed K830-V1094 interactions in the TL closed state (S5C Fig). Most variants in V1094 are LOF (Fig 4B), consistent with disruption of the K830-V1094 interaction and concomitant destabilization of the closed, active TL conformation.
Models for NTP substrate-induced TL closing remain largely untested [4,[15][16][17][18]. A recent Pol II structure [60] exhibiting an open TL state led to explicit implication of a hydrophobic pocket formed by TL residues (A1076, M1079, T1080, G1097 and L1101) and other TL proximal residues (I837, L841, V1352, V1355 and I1356) in substrate-induced TL-folding (S5D Fig). Q1078 recognition of the 2'-OH of a matched NTP substrate was proposed to promote release of the adjacent residue M1079 from the hydrophobic pocket, triggering TL closing [60,65]. Consistent with disruption of this observed pocket and concomitant destabilization of the inactive open TL state, A1076T, a pocket variant previously isolated as genetically GOF, conferred increased transcription activity in vitro ( Fig 5B). Notably, GOF phenotypes were observed for a large number of variants in pocket residues. Among them, we observed almost universal GOF phenotypes for G1097 variants, but not the extreme fitness defects found for the previously observed GOF variant G1097D. We individually phenotyped ten G1097 variants from the traditional screening and confirmed this observation (S5E Fig). Together, these results are consistent with the hydrophobic pocket stabilizing the inactive, open TL and providing a plausible mechanism for substrate-induced TL closing. A single residue, M1079, can act as a linchpin for the entire TL through a network of interactions.

Identification of stress conditions that alter transcription in vivo
GOF and LOF TL variant classes have distinct phenotypic profiles. In general, compared to LOF variants, GOF mutants are more sensitive to Mn2+, caffeine and cycloheximide yet generally resistant to hydroxyurea and formamide (Fig 4D). The allele-specific Mn2+ response extended our previous observation that the GOF allele E1103G was highly sensitive to Mn2+ while the LOF allele H1085Y was resistant to, or even slightly suppressed by, Mn2+ (while the Mn2+ effects on both mutants were suppressed by Mg2+ supplementation) [61]. The TL phenotypic landscape showed that this Mn2+ response was general and class-specific for GOF and LOF mutants (Fig 4D). To validate this observation, we individually analyzed seven additional variants (two LOF and five GOF) for Mn2+ sensitivity in the presence or absence of Mg2+ supplementation. Notably, all tested LOF mutants conferred Mn2+ resistance while all tested GOF mutants conferred Mn2+ hypersensitivity (Fig 4E). Allele-specific Mn2+ responses could be suppressed by Mg2+ supplementation (Fig 4E). Mn2+ has been shown to stimulate transcriptional activity while compromising fidelity in vitro [62,63]. Our observations suggested that Mn2+ may suppress LOF mutants by stimulating transcriptional activity yet exacerbate GOF mutants by further decreasing their already compromised transcriptional fidelity in vivo [12,13]. Increased Pol II catalytic activity correlates strongly with upstream transcription start site (TSS) shifts in vivo [37,41]; therefore we assayed for TSS alterations upon Mn2+ treatment. Primer extension analysis at ADH1 revealed that Mn2+ treatment shifted the TSS distribution upstream, and further exacerbated the upstream shift conferred by E1103G (Fig 4F). Deletion of PMR1, the Golgi Mn2+ export channel, causes accumulation of cytosolic Mn2+ [66,67], and can be used to alter Mn2+ levels apart from supplementation of the medium. 
Our prior high-throughput genetic interaction analyses of Pol II mutants showed that pmr1Δ strongly interacts with Pol II mutants in a highly allele-specific fashion [41], suggesting an intimate relationship between increased cellular Mn2+ levels and altered transcription activity. Here we find that pmr1Δ also shifted ADH1 TSSs upstream (Fig 4F). While Mn2+ may have other indirect effects on Pol II mutants, these observations support direct effects of Mn2+ on Pol II transcription activity in vivo, raising the possibility that other allele-specific stress conditions (e.g. formamide) may also directly alter transcription in vivo.

Functional contributions of the TL tip region
The TL tip region (Rpb1 1090-1096) is a random-coil region that forms an α-helical structure when the TL is closed, and helical formation has been proposed to assist TL closing [8,18,43]. Mejia et al characterized two Eco RNAP TL tip mutants I1134V and G1136S (Equivalent to Sce Pol II V1094 and S1096) with decreased or increased transcription activity, respectively [43]. These results were interpreted as I1134V and G1136S substitutions decreasing or increasing helical propensity and thus disfavoring or favoring TL closing [43]. Sce Pol II contains each of these variants as the WT residue, therefore individual substitutions to the E. coli variants (V1094I and S1096G) would be predicted to confer opposite phenotypes under the helical propensity model. However, V1094I and S1096G did not confer phenotypes clearly consistent with either GOF or LOF (Fig 4B), failing to support the helical propensity model. We asked if the proposed correlation from Eco RNAP studies was a general property for TL substitutions in this region, if extended to more than two substitutions. Our data, calculated from 122 variants, fail to support a general correlation between helical propensity and predicted catalytic activity for Pol II substitutions in this region (S8A Fig). As discussed above, V1094 may be involved in interaction with BH residue K830, and LOF in most V1094 variants may result from disrupted BH/TL coordination. Therefore, we repeated the analyses excluding V1094 variants, yet still failed to observe a correlation (S8A Fig). We cannot rule out contributions of helical propensity in this region to TL function; however, we did not find compelling or widespread evidence for it.
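The test described above, asking whether helical propensity of the substituted residue tracks the phenotype of TL tip variants and repeating the analysis without V1094 variants, can be sketched as a simple correlation. This is an illustrative sketch only: the choice of Pearson correlation, the function names, and the caller-supplied per-residue propensity scale are assumptions, and no real propensity values or measured scores are included.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def propensity_correlation(scores, propensity, exclude_positions=()):
    """Correlate substituted-residue helical propensity with phenotype.

    scores maps variant names such as 'V1094I' to a phenotype score;
    propensity maps one-letter amino acid codes to a helical propensity
    value on whatever scale the caller chooses (hypothetical here).
    Variants at positions listed in exclude_positions are skipped.
    """
    xs, ys = [], []
    for variant, score in scores.items():
        pos, new_residue = variant[1:-1], variant[-1]
        if pos in exclude_positions:
            continue
        xs.append(propensity[new_residue])
        ys.append(score)
    return pearson_r(xs, ys)
```

Calling `propensity_correlation(scores, scale)` and then `propensity_correlation(scores, scale, exclude_positions=("1094",))` mirrors the two analyses in the text; a coefficient near zero in both cases would correspond to the reported lack of a general correlation.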
A number of recent studies have suggested potential functions of the TL tip region in regulating TL dynamics [18,60,68]. In a simulated TL closing process, positively charged K1092 and K1093 were predicted to interact with several TL-proximal residues, and some of the predicted interactions were validated by subsequent Pol II crystal structures with alternative open TL states (Fig 5A). These interactions were proposed to stabilize the open, inactive TL state, and thus alanine (K1092A, K1093A) or charge reversing substitutions (K1092D/E, K1093D/E) were predicted to disrupt the inactive TL open state and result in GOF [18]. Contrary to this prediction, none of the above substitutions conferred GOF (Fig 4B). Networks of residue-residue interactions near the TL tip were observed [18,60], some of which may be functionally overlapping or redundant, adding complexity to simple models. Our previous point mutant epistatic miniarray profile (p-EMAP) studies predicted two TL-proximal mutants (S713P and I1327V) to be GOF, which we confirm here (Fig 5B), suggesting that perturbation near the TL may interfere with native interactions, or create new ones, to destabilize the open TL. The tested variants here also extend the correlation between genetically predicted GOF and increased activity in vitro ( Fig 5B). Additionally, several TL tip variants with bulky side chains (K1092W, K1093Y, K1093M) conferred GOF phenotypes (Fig 4B). Given the complexity and observation of both GOF/LOF phenotypes, we wished to further assess the functions of these residue-residue interactions.
Functional interactions among residues can be explored by the similarity between single substitution variants and the phenotypes of double mutants. We first sought evidence that variants in potential TL interaction partners could confer similar GOF or LOF phenotypes. In the simulation, K1092 switched interaction partners between two funnel helix residues D716 and E712 [18], and other charged residues were either observed or simulated to interact with S1091, K1092 or K1093 (Fig 5A). Therefore, we constructed a panel of mutants in the residues D716, E712, R1281, E1307, and D1309 for phenotypic analyses. Notably, we observed GOF phenotypes (Mn S and MPA S ) in E1307K but not E1307A, suggesting that E1307K gained an interfering interaction to destabilize the open TL state. Furthermore, we observed the Gal R phenotype in D716A (Fig 5D, S8F Fig), consistent with LOF. D716K and E712A were lethal ( Fig 5D, S8B and S8C Fig), and their defects were further explored by double mutant analyses (discussed below). Together, both GOF and LOF variants were observed in the TL tip proximal residues, consistent with roles in regulating TL dynamics.
To further dissect functional relationships, we phenotyped double mutants from potential interaction partners, and observed a number of genetic interactions (Fig 5D, S9 Fig). First, GOF and LOF mutants were mutually suppressive when combined, and most TL mutants from the same biochemical class (GOF/GOF or LOF/LOF) showed additive effects (synthetically sick or lethal). The observed class-specific genetic interactions are similar to the previously reported intra-TL genetic interactions [37], consistent with alteration of TL function in TL tip proximal variants. Furthermore, K1092A/D single substitutions did not confer transcription-related phenotypes, but were able to suppress the E1307K GOF phenotypes. This observed epistasis suggested that loss of K1092 relieved a putative gain of interaction in E1307K (discussed above). Finally, E712A lethality was fully suppressed by K1092A, K1092D or K1093M, adding a further instance of epistasis. A model to explain this complex genetic relationship is that loss of the native E712-K1092 interaction redirected K1092 towards an alternative interaction, or strengthened an existing interaction with D716, causing lethality. Alteration of TL tip interaction potential through K1092/K1093 substitutions relieves this allele-specific effect. Taken together, the observed allele-specific and epistatic interactions between the TL tip and proximal residues suggest a highly complex genetic network of residues controlling TL dynamics, and illustrate how individual residues might constrain or allow diversification of the TL through evolution.
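The double mutant logic above (suppression when a double mutant is healthier than its sicker single mutant, exacerbation when it is sicker) can be written as a small classifier. This is a hedged sketch: the growth phenotypes in the text were scored qualitatively, so the numeric fitness scale (WT = 1, lethal = 0), the tolerance, the labels, and the function name are all assumptions introduced for illustration.

```python
def classify_interaction(fit_a, fit_b, fit_ab, tol=0.1):
    """Compare a double mutant with its sicker single mutant.

    Fitness values are on a hypothetical scale where WT is 1.0 and
    lethal is 0.0. Differences within tol are treated as no change.
    """
    worst_single = min(fit_a, fit_b)
    if fit_ab > worst_single + tol:
        return "suppression"    # double grows better than the sicker single
    if fit_ab < worst_single - tol:
        return "exacerbation"   # synthetic sickness or lethality
    return "no_change"          # double resembles the sicker single


if __name__ == "__main__":
    # Hypothetical values: a lethal single rescued by a healthy single.
    print(classify_interaction(0.0, 1.0, 0.9))
    # Hypothetical values: two sick singles that are lethal together.
    print(classify_interaction(0.6, 0.6, 0.0))
```

Comparing against the sicker single mutant is the simplest reference point; quantitative interaction studies often instead compare against an expected value such as the product of the single mutant fitnesses.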

Functional interplay of the TL and Bridge helix (BH) domains
The BH is a strikingly conserved structural domain of multi-subunit RNA polymerases spanning the wide central cleft between polymerase "jaws", adjacent to the active site and proximal to the TL [1,69,70]. Although the BH is a straight helix in most published structures [1][2][3][4][5][6], some Thermus thermophilus RNAP structures revealed a bent BH conformation proposed to support translocation [69]. This BH bending mechanism was supported by a number of simulation studies but has never been directly tested [1,11,25,69,70]. In the archaeal Mja RNAP, proline substitutions at two hinge-proximal residues M808 and S824 (equivalent to Sce Rpb1 M818 and T834) resulted in GOF, suggesting kinking by the proline substitution results in increased translocation or catalysis [11,71]. Furthermore, Mja GOF TL and BH mutants were not additive when combined, suggesting mutual dependence on BH and TL functions [11].
To explore the functional consequence of BH kinking in Sce Pol II, we constructed and phenotyped BH mutants analogous to the characterized GOF and LOF variants in Mja RNAP. Notably, Sce T834 and other BH C-terminal hinge substitutions conferred in vivo phenotypes consistent with the altered transcriptional activities in Mja RNAP (S10F Fig), and we directly confirmed the altered activity of T834 variants in vitro (Fig 6A). In contrast, substitutions in M818, a predicted BH N-terminal hinge, showed defects deviating from expected conservation of function. M818P caused lethality, and could not be suppressed by any tested TL variants, precluding us from classifying it (S10A Fig). Furthermore, M818S and M818Y, although viable, did not confer any clear phenotypes (S10F Fig). Therefore, we further assessed the functional interplay between BH and TL by double mutant analyses, including BH variants (M818S/Y, T834A/P) and TL substitutions covering a range of altered transcriptional activities (Fig 6B-6E). Notably, the GOF BH variant T834P, along with M818S and M818Y, were mutually suppressive with biochemically strong LOF TL variants (Fig 6B, 6C and 6E), revealing both additive behavior between BH and TL for some combinations, and cryptic phenotypes for M818S/Y in others. The LOF BH variant T834A also suppressed GOF TL variants (Fig 6D). However, the additive interactions (exacerbation, synthetic lethality) we observed for GOF BH and TL double mutants were in contrast to the epistasis observed for Mja RNAP [11].
Multiple lines of evidence suggested additional, specific defects exist in BH mutants, beyond simple cooperation with the TL. First, M818P lethality could not be suppressed by any tested TL variants (S10A Fig), which cover a wide range of transcriptional activities. Second, suppression between BH and TL mutants of different biochemical classes (GOF/LOF) was partial and not as strong as the previously observed intra-TL suppression. Third, GOF M818S, M818Y and T834P variants appeared to exhibit activity-dependent genetic interactions with TL variants. BH GOF variants suppressed strong LOF TL variants Q1078S and H1085Y but failed to suppress, or even slightly exacerbated, LOF TL variants H1085Q and F1086S (Fig 6B, 6C and 6E), consistent with conditional epistasis, where GOF activity of BH variants can suppress either specific TL variants or otherwise exert their effects in specific contexts. Finally, recent modeling studies predicted that the BH residue Y836 assists Pol II forward translocation [72] by interacting with the DNA:RNA hybrid. Y836A/H conferred GalR phenotypes, consistent with LOF and compromised translocation (S10F Fig). Notably, GOF T834P was suppressed by Y836A/H (Fig 6E, S12B Fig), consistent with T834P conferring a TL-independent fast translocation defect, suppressible by Y836A/H.
Fig 6 (caption, partially recovered). [...] (Fig 6B), M818Y suppressed (yellow lines) the strong LOF TL variants (dark blue) but not the slight and moderate LOF TL variants (light blue), and showed synthetic sickness (red lines) with GOF TL variants (green). (D) Genetic interactions between BH T834A and TL substitutions. T834A suppressed (yellow lines) the GOF TL variants and was synthetic lethal with all the tested LOF TL variants (blue). (E) Genetic interactions between BH T834P and TL or BH. Similar to M818 variants (Fig 6B, 6C), T834P suppressed strong and moderate LOF TL variants (dark blue) but was synthetic sick with weak LOF TL variants (light blue), while synthetically lethal with GOF TL variants (green). T834P was also suppressed (yellow line) by two LOF BH mutants Y836A/H. doi:10.1371/journal.pgen.1006321.g006

Context dependence of TL function
We previously observed that E1103G, a GOF allele in Sce Pol II, caused LOF in Pol I, highlighting divergent contributions of active site residues in different enzymatic contexts [42]. We also observed that the Pol I TL [42] and L1081M (this study) were functionally impaired in the Pol II context. We next sought to determine the functional compatibility of other evolutionary TL variants in the Sce Pol II context, using our fitness and phenotypic landscape (Fig 7). Most tested evolutionary TL variants did not confer fitness defects, with several exceptions (Fig 7A). Furthermore, some variants, although compatible for general growth, conferred transcription-related phenotypes and could be further classified by our phenotypic landscape (Fig 7B). These observations further suggest that the evolution of TL function is shaped by likely epistasis between the TL and proximal domains.
We next asked what substitutions might underlie the large difference in compatibility of the Sce Pol I TL (versus the Sce Pol III TL) within Pol II [42]. From our phenotypic landscape, although many individual Sce Pol I and Pol III TL substitutions appeared to be compatible, functionally impairing variants were identified (Fig 7B). The yeast Pol III TL contains Pol II GOF (A1076G) and LOF (N1082K) variants, both of which hypothetically could be mutually suppressive, resulting in close to WT activity in the Pol II context [42]. The Pol I TL contains three Pol II LOF substitutions (V1089H, A1090G and S1091A). The net incompatibility of Pol I TL is consistent with additive defects of the three LOF variations, given that most TL LOF combinations show additive effects [37]. Since three evolutionarily observed variants with LOF phenotypes were all localized in the TL tip, we examined the difference between Pol I and Pol II structures for the TL tip proximal domains [60,73]. The Pol I funnel helix appears to impose less constraint than the Pol II funnel helix (Fig 7C), suggesting that Pol I controls its TL with a distinct network of interactions. In all, our mutational data, together with the recent Pol I crystal structure, reveal enzyme-specific mechanisms to control a highly conserved domain at the heart of eukaryotic transcription.

Discussion
The ability of the TL to fold into multiple conformations and the dynamic conversion between these states are critical for its functions. Previous studies from our group and others demonstrate that TL function is delicately balanced, such that perturbations result in either increased or decreased catalytic activity and altered translocation dynamics. Distinct consequences for transcriptional activity manifest in vivo as what we term LOF and GOF phenotypes. In this study, we have advanced our genetic framework for dissecting Pol II mechanisms. From our phenotypic landscape, we assessed the functional contributions of almost all TL residues to fitness in S. cerevisiae under multiple conditions. Our data indicate that both intra-TL interactions and TL interactions with nearby domains (e.g. BH and funnel helices) are critical for TL function. This conclusion is also supported by recent work on Rpb9 organizing the TL indirectly through an Rpb1 TL-adjacent α-helix 21 (one of the funnel helices, discussed below) [68], interactions between the TL and F-loop regions in bacteria [31], and predictions of TL-proximal variants as GOF from our previous pEMAP analysis [41] (validated in this study).
Our system allows efficient analysis of a large number of variants to evaluate accumulating computational [18,24,25,74] and structural [4,5,27,60] predictions for interactions within the TL and from without.
The major function of the TL is to link substrate recognition to catalysis, while it is also proposed to gate translocation such that translocation probability is linked to phosphodiester bond formation. Critical to this recognition is that a substrate be positioned correctly by basepairing to the DNA template, and that the 2'-OH allows NTPs to be selected over 2'-dNTPs by the TL residue Q1078 [4,9,28]. We have proposed that the Q1078-substrate interaction releases the adjacent M1079 from its intra-TL hydrophobic pocket to trigger TL closing [60]. In this study, we find that a large number of variants at the pocket residues A1076, M1079, G1097 and L1101 cause GOF phenotypes, providing evidence that disruption of the hydrophobic pocket destabilizes the open, inactive TL state. Additionally, while the TL shows incredibly high evolutionary conservation for a number of residues, prior work indicated alteration of ultra-conserved residues (e.g. E1103 in Pol II, E1224 in Pol I) in different RNA polymerases could have distinct effects, suggesting the importance of the evolved context within each enzyme [12,37,42]. Here, we evaluate many evolutionarily observed eukaryotic TL variants in the Sce Pol II system, and discover a number of functionally impaired TL variants. Our results highlight that TL proximal domains may impose constraint and also allow functional diversification in the molecular evolution of the highly conserved TL by epistatic interactions.
Fig 7 (caption, partially recovered). [...] TL across Pol I, II, III evolution including 38 Pol II, 42 Pol I and 42 Pol III amino acid variants relative to Sce Pol II. (B) Evolutionary TL variants in three mutant classes from the TL phenotypic landscape (Fig 4A and 4B). Existing variants from Sce Pol I are colored in blue, and existing variants from Sce Pol III are colored in red. Sce Pol I has three substitutions (V1089H, A1090G and S1091A) that cause LOF in the Pol II context; Sce Pol III has one substitution (A1076G) classified as GOF and one substitution (N1082K) classified as LOF. (C) Difference in positioning of funnel helices (relative to TL) in Pol I and Pol II. Cartoon representation of TL/funnel helices from Pol I and Pol II are shown in cyan and yellow, respectively (PDB: 5C4J and 2VUM).
One example of a proximal region, the so-called "funnel helices" (Rpb1 α-20 and α-21) or "rim helices" in the bacterial RNAP literature, shows both evolutionary conservation and functional diversification. Funnel helices are both surface exposed and proximal to the TL [60,75]. Multiple pieces of evidence from three mutations in α-21 suggest roles for funnel helices in controlling TL function. First, the C4 allele of Drosophila melanogaster, corresponding to R726H in Sce Rpb1, confers a slow elongation rate in both Drosophila (in vitro) and human Pol II enzymes (in cells) [76,77]. The molecular mechanism of this allele is not currently known, but based on another α-21 substitution (G730D) identified in yeast, we speculate that C4 enzymes have altered TL dynamics. Second, rpb1-G730D was identified in yeast twice, in independent genetic screens [78][79][80]. rpb1-G730D is catalytically slow [81] and confers a severe growth defect, but can be suppressed by a GOF mutant, rpb9Δ [68,78]. In fact, rpb1-G730D behaves as if it is incompatible with Rpb9 [68]. Recent work from the Peterson lab strongly supports a model where Rpb9 normally coordinates a loop of Rpb1, the "anchor loop", to appropriately interact with the TL [68]. When Rpb9 is removed, anchor loop-TL interactions are disrupted, and the open conformation of the TL is destabilized. In rpb1-G730D, structural perturbations are proposed to alter Rpb9-Rpb1 interactions such that they interfere with the TL; therefore rpb1-G730D is incompatible with Rpb9. Removal of Rpb9 or alteration of specific Rpb9 residues that organize the Rpb1 anchor loop relieves the incompatibility between rpb1-G730D and the TL. Third, we previously identified rpb1-S713P, a substitution just proximal to the anchor loop (between α-20 and α-21), as conferring gene expression, genetic interaction, and initiation phenotypes indistinguishable from GOF TL mutants [41]. 
Here we show that rpb1-S713P also confers increased biochemical activity, similar to both TL GOF alleles and anchor loop GOF alleles. We propose that rpb1-S713P, through constraints of the proline on structure, alters the anchor loop and therefore TL dynamics. It is conceivable that, given that the secondary channel and funnel helices are accessible to factors, factor binding might also be communicated to the TL from distal sites. In addition to the three previously identified mutants, we utilized a new set of TL mutants to assess genetic interactions between the TL and the funnel helix α-21, and discovered epistasis between K1092A/D (TL) and a lethal mutant E712A (funnel helix) along with multiple allele-specific genetic interactions (Fig 5D). We have suggested a more relaxed control mechanism in Pol I compared to Pol II (Fig 7C). Taken together, funnel helices may serve as a regulatory hotspot for direct or allosteric control of the Pol II active site through the TL. While structurally conserved, evolutionary diversification of sequence may allow distinct interactions with the TL in different msRNAPs.
The characterization of the unexpectedly healthy H1085L variant clouds the issue of how H1085 functions in substrate selection and catalysis. H1085 interacts with the substrate NTP through a salt bridge and a hydrogen bond [4], and previous simulations with a limited set of H1085 variants predicted the hydrogen bonding to be critical for maintaining substrate interaction [74]. The discovery of H1085L argues that productive substrate interactions may be supported by entirely different chemistry, although we cannot rule out the possibility that H1085L redirects substrate interactions to an alternative residue. Furthermore, H1085 variants may have multiple defects in the NAC, such as in substrate selection [12], catalysis [12,61], intrinsic cleavage [61] and PPi release [20,21], and whether or not H1085 or analogous residues act as a general acid remains controversial in different RNAPs [4,7,9,61,82]. The function of H1085L in all of these steps remains to be determined, but the H1085L phenotype suggests that the function of H1085 as a general acid may be entirely bypassed.
The established TL phenotypic landscape can be further explored to study intra- and inter-TL epistasis. First, whether individual TL residues work collaboratively or independently to ensure balanced TL dynamics and proper function is an open question. Some TL residues can be functionally overlapping and act at similar steps, or functionally discrete, acting at distinct steps. For example, combination of LOF mutations in Q1078, N1082 and a TL-proximal residue N479 resulted in non-additive genetic interaction, suggesting functionally overlapping roles for these residues. In contrast, combination of variants from Q1078 (or N1082) and H1085 resulted in exacerbation or synthetic lethality, suggesting independent functions [37]. Coupled with structures of partially folded TL states, these genetic studies support the functional distinction between NIR residues and a multi-step TL folding model for the promotion of catalysis [37]. Here, we have identified many more predicted GOF and LOF TL variants (Fig 4B), some of which are predicted to confer epistatic interactions (e.g. F1086 and V1089). We expect the phenotypic landscape of a multiply-substituted TL library to be extremely informative for understanding functional relationships between TL residues.
Second, the TL phenotypic landscape is an extremely sensitive readout for assessing active site re-arrangement. Transcription is under control by many factors, some of which may alter the Pol II active site conformations, though few studies directly address these possibilities. Initiation factors and Pol II TL mutants confer similar alterations in transcription start site selection, consistent with initiation factors functioning through the Pol II active site and altering the efficiency of Pol II catalysis during initiation [41,61,83]. Furthermore, TL may communicate with other Pol II sites, such as the RNA exit channel or clamp domain [36], or in direct competition with external factors, such as TFIIS [33]. Perturbations of this communication may alter TL dynamics and cause allele-specific genetic interactions (Figs 5 and 6). Specifically, an external perturbation by a relevant factor or Pol II TL distant domain may show epistasis or synergy only with specific TL alleles of a class (either LOF or GOF), whereas a non-interacting factor may not. Finally, similar perturbation of the TL phenotypic landscape by different factors would suggest functional similarity between them, thus clustering of phenotypic landscape changes upon different perturbations is expected to provide valuable insight.
The TL phenotypic landscape, along with our previous work [37], illustrates a strategy of utilizing in vivo genetic reporters or stress response profiles to distinguish a large number of mutants with distinct in vivo defects. As discussed above, the phenotypic landscape sheds light on functional contribution of TL residues to its dynamics, to the mechanism of catalysis and to the evolutionary constraints of the TL sequence and function. The phenotypic landscape strategy expands the current scope of existing deep mutational scanning studies [44][45][46][47], and can be generalized to study most, if not all, of the yeast proteins.
A detailed description of plasmids is in S2 Table, and complete sequences of plasmids are available upon request. For studies involving individual analyses of Pol II mutants, site-directed mutagenesis was performed via the QuikChange strategy from Stratagene. All mutagenized regions were verified by sequencing before sub-cloning into pRS315-derived plasmids, as previously described [37].

Genetic and biochemical analyses of individual Pol II mutants
Phenotypic analyses of individual Pol II mutants were performed by plasmid shuffling assays, with viable mutants further subjected to standard plate phenotyping. Each mutant in a pRS315-derived plasmid (CEN LEU2) was transformed into CKY283 (rpb1Δ::CLONATMX, pRP112 RPB1 CEN URA3). Transformants (Leu+) were patched on SC-Leu plates and subsequently replica plated to SC-Leu+5FOA (1 mg/mL) to assay complementation ability upon loss of the RPB1 CEN URA3 plasmid. Experimental details are as previously described [12,37]. Saturated cultures from single colonies of viable and shuffled Pol II mutants were subjected to 10-fold serial dilution and spotting on the indicated phenotyping media, as described in previous reports [12,37].
Pol II enzymes were purified via a tandem-affinity tag (TAP) protocol derived from [85] with modifications described in [12]. Transcription elongation reactions were performed with Pol II elongation complexes assembled on a nucleic acid scaffold, in a procedure described in [12] with slight modifications in the amount of Pol II and nucleic acids as described in [60]. For each enzyme, elongation assays were performed with 25 μM, 125 μM, 500 μM and 750 μM NTPs (each of ATP, GTP, CTP, UTP), and maximal elongation rates were extracted exactly as previously described [12].
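The extraction procedure itself is the one described in [12] and is not restated here. As an illustration only, the sketch below assumes that the measured elongation rate saturates hyperbolically with NTP concentration and estimates the maximal rate with a Hanes-Woolf linearization. The function name, the example rates and the choice of a linearized least-squares fit are assumptions made for this sketch, not the authors' method.

```python
def fit_max_rate(ntp_uM, rates):
    """Estimate (max_rate, k_half) assuming rate = max_rate * S / (k_half + S).

    Hanes-Woolf form: S / rate = S / max_rate + k_half / max_rate,
    which is a straight line in S, fitted here by ordinary least squares.
    """
    ys = [s / v for s, v in zip(ntp_uM, rates)]
    n = len(ntp_uM)
    mean_x = sum(ntp_uM) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in ntp_uM)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ntp_uM, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    max_rate = 1.0 / slope          # slope = 1 / max_rate
    k_half = intercept * max_rate   # intercept = k_half / max_rate
    return max_rate, k_half


# NTP concentrations (uM) from the text. The rates are made-up values
# generated from max_rate = 30 and k_half = 100, only to exercise the code.
ntp = [25.0, 125.0, 500.0, 750.0]
rates = [30.0 * s / (100.0 + s) for s in ntp]
max_rate, k_half = fit_max_rate(ntp, rates)
```

With real data, a non-linear regression would normally be preferred because the linearization weights the points unevenly.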
ADH1 transcription start site selection was analyzed by primer extension. In brief, indicated strains were grown in YPD until mid-log phase (~1×10⁷ cells/mL), and diluted with YPD with 10 mM MnCl₂ or an equal volume of H₂O. Total RNA was extracted as described [86], and 30 μg of total RNA was subjected to primer extension analysis, following a protocol derived from [87] with modifications described in [37].

High-throughput phenotypic analyses of the TL variants library
The TL variant library was synthesized by Sloning Biotechnology (now MorphoSys) with well-characterized TL variants excluded (specified in Fig 1B) using a building block approach [48,49]. The TL variant library was transformed into CKY283 via a gap-repair strategy as previously described [41]. In brief, the amplified TL variant library with flanking sequence was transformed into CKY283 together with a linearized pRS315-derived plasmid (CEN LEU2) containing rpb1 deleted for the TL (TLΔ) and linearized at the deletion junction, allowing in vivo homologous recombination. Homologous recombination produced a library of complete rpb1 genes containing TL variants. The gap-repaired TL variants (Leu+) were titered and plated at 200-300 colonies per plate to reduce inter-colony growth competition, and Leu+ colonies were first replica-plated to SC-Leu+5FOA (1 mg/mL), and subsequently to additional selective and control media. Three independent biological replicate screens were performed. In each replicate, we pooled 6000 to 12000 colonies. Each cell pool was subjected to genomic DNA extraction and TL amplification by emulsion PCR. Amplification of the TL region was performed using the Micellula DNA Emulsion & Purification (ePCR) Kit (Chimerx) per the manufacturer's instructions. To minimize amplification bias, each sample was amplified in a 15-cycle ePCR reaction, purified and subjected to additional 13-15 cycle scale-up ePCR reactions. The two-step ePCR amplification protocol ensured sufficient yield of DNA for NGS while minimizing perturbation of the allele distribution in the DNA pool. The amplified samples were subjected to Illumina HiSeq 2500 sequencing, and on average over 2 million reads were obtained from each replicate of a sample, with high reproducibility and minimal perturbation of the mutant distribution within the TL variant library (S1D Fig). Allele frequency was subsequently measured by deep sequencing of the TL amplicons. 
All the sequencing data (FASTQ format) for the reported analyses are deposited and available under the NCBI bioproject PRJNA340979. To identify the mutations that were present for each set of paired-end reads, a codon-based alignment algorithm was developed to align each paired-end read set in which the overlapping substrings from both flanking regions matched the WT sequence perfectly. The purpose of our approach was to distinguish real variants from sequencing errors, using the expected set of mutant codons used in the programmed library synthesis. A dynamic programming algorithm was applied so that an exact match of three letters was assigned a positive score, a mismatch of at least one letter in a codon was assigned a negative score, and the insertion or deletion of either one, two or three letters was assigned a constant negative score. The allele frequency was subsequently calculated from the mapped reads, and the phenotypic score of each TL variant was calculated from the allele frequency change (normalized to WT) under each condition. Mutants with fewer than 200 reads in the transformed pool (SC-Leu) and allele frequency changes assessed from fewer than 50 reads from both conditions were excluded from further analyses. Median values from three independent biological replicates were used for fitness and phenotype scoring. The fitness score cutoff for lethality was estimated based on fitness scores (on SC-Leu and 5FOA) of 163 known viable and 16 known lethal TL mutants. Hierarchical clustering for generating the phenotypic landscape was performed with Gene Cluster 3.0 using centered correlation [88]. Figures displaying structural information were generated using PyMOL (https://www.pymol.org/).
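The read-calling and scoring steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the numeric values of the match, mismatch and gap scores are invented (the text gives only their signs), the phenotypic score is assumed to be a log2 ratio of WT-normalized allele frequencies, and the read-count filter encodes one possible reading of the stated cutoffs. All function names are hypothetical.

```python
import math
from functools import lru_cache
from statistics import median

# Assumed scoring constants; the text specifies only positive vs. negative.
MATCH, MISMATCH, GAP = 2, -1, -3


def align_codons(read, ref):
    """Return (best score, substituted codons) for a read vs. the WT reference.

    Moves: compare one codon (3 letters) of each sequence, or take a
    constant-penalty gap of 1-3 letters in either sequence.
    """
    @lru_cache(maxsize=None)
    def best(i, j):
        if i == len(read) and j == len(ref):
            return 0, ()
        options = []
        if i + 3 <= len(read) and j + 3 <= len(ref):
            score, subs = best(i + 3, j + 3)
            r, w = read[i:i + 3], ref[j:j + 3]
            if r == w:
                options.append((score + MATCH, subs))
            else:
                # Record (codon index in reference, WT codon, read codon).
                options.append((score + MISMATCH, ((j // 3, w, r),) + subs))
        for k in (1, 2, 3):
            if i + k <= len(read):   # k extra letters in the read
                score, subs = best(i + k, j)
                options.append((score + GAP, subs))
            if j + k <= len(ref):    # k letters missing from the read
                score, subs = best(i, j + k)
                options.append((score + GAP, subs))
        return max(options, key=lambda o: o[0])

    return best(0, 0)


def passes_read_filters(pool_reads, cond_a_reads, cond_b_reads):
    """Assumed reading of the cutoffs: at least 200 reads in the transformed
    pool, and not fewer than 50 reads in both compared conditions."""
    if pool_reads < 200:
        return False
    return not (cond_a_reads < 50 and cond_b_reads < 50)


def phenotype_score(mut_sel, wt_sel, mut_unsel, wt_unsel):
    """Assumed form: log2 of the mutant/WT read ratio after selection
    divided by the mutant/WT read ratio before selection."""
    if mut_sel == 0:
        return float("-inf")  # variant absent after selection
    return math.log2((mut_sel / wt_sel) / (mut_unsel / wt_unsel))


# Toy example: one substituted codon, three made-up replicate read counts.
score, subs = align_codons("ATGGGTAAA", "ATGGCTAAA")
replicates = [phenotype_score(m, 1000, 200, 1000) for m in (50, 60, 40)]
summary = median(replicates)
```

In this sketch a called substitution would still need to be checked against the expected set of programmed mutant codons before being counted, which is the step that separates real variants from sequencing errors.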

Evolutionary analyses
Eukaryotic RNA polymerase large subunit sequences were obtained from BLAST using Sce Rpb1 (Pol II), Sce Rpa190 (Pol I), and Sce Rpo31 (Pol III) sequences as queries. Sequences were assigned to Pol I, II, or III based on highest similarity when compared to each of the three query sequences, with prokaryotic sequences further filtered out. Multiple sequence alignments (MSAs) were generated by first applying CD-HIT [89] to cluster sequences so that the identity between sequences in different clusters was less than 90%, then applying MUSCLE [90] to obtain an alignment that contains one representative sequence from each cluster. The TL conservation score was generated using Jalview 2.8 version 14.0 [91] and plotted as a heatmap using Gene-E (http://www.broadinstitute.org/cancer/software/GENE-E/index.html).
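The two bookkeeping steps above, assigning each sequence to the polymerase class of its most similar query and keeping one representative per cluster at a 90% identity threshold, can be sketched as below. This is a simplified illustration rather than a reimplementation of BLAST or CD-HIT: the identity measure is a toy position-wise comparison that assumes pre-aligned sequences, the greedy longest-first clustering only mimics the general strategy, and all function names and example values are hypothetical.

```python
def assign_class(similarity_by_query):
    """Pick the class (e.g. 'Pol I', 'Pol II', 'Pol III') with the highest
    similarity to its query sequence."""
    return max(similarity_by_query, key=similarity_by_query.get)


def identity(a, b):
    """Toy identity: fraction of matching positions over the shorter of two
    non-empty, pre-aligned sequences."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n


def greedy_cluster(seqs, threshold=0.9):
    """Greedy clustering: process sequences longest first; a sequence joins
    the first representative it matches at >= threshold, otherwise it
    becomes a new representative."""
    clusters = {}  # representative -> members (insertion-ordered)
    for s in sorted(seqs, key=len, reverse=True):
        for rep in clusters:
            if identity(s, rep) >= threshold:
                clusters[rep].append(s)
                break
        else:
            clusters[s] = [s]
    return clusters


# Made-up inputs purely to exercise the functions.
label = assign_class({"Pol I": 0.41, "Pol II": 0.77, "Pol III": 0.52})
clusters = greedy_cluster(["AAAAAAAAAA", "AAAAAAAAAT", "TTTTTTTTTT"])
representatives = list(clusters)
```

The representatives would then be passed to a multiple sequence aligner, so that every pair of aligned sequences shares less than the threshold identity.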
Supporting Information S1