The structural basis of mRNA recognition and binding by yeast pseudouridine synthase PUS1

The chemical modification of RNA bases represents a ubiquitous activity that spans all domains of life. Pseudouridylation is the most common RNA modification and is observed within tRNA, rRNA, ncRNA and mRNAs. Pseudouridine synthase or ‘PUS’ enzymes include those that rely on guide RNA molecules and others that function as ‘stand-alone’ enzymes. Among the latter, several have been shown to modify mRNA transcripts. Although recent studies have defined the structural requirements for RNA to act as a PUS target, the mechanisms by which PUS1 recognizes these target sequences in mRNA are not well understood. Here we describe the crystal structure, at 2.4 Å resolution, of yeast PUS1 bound to an RNA target that we identified as a hot spot for PUS1 interaction within a model mRNA. The enzyme recognizes and binds both strands in a helical RNA duplex, and thus guides the RNA containing the target uridine to the active site for subsequent modification of the transcript. The study also allows us to show the divergence of related PUS1 enzymes and their corresponding RNA target specificities, and to speculate on the basis by which PUS1 binds and modifies mRNA or tRNA substrates.


Introduction
The cellular transcriptome throughout all domains of life displays a highly complex regulatory network of more than 150 known post-transcriptional RNA modifications that modulate RNA biogenesis, function, specificity, and stability [1][2][3]. Pseudouridine (Ψ), a C5-glycosidic isomer of uridine, was discovered in 1951 and soon after termed the fifth nucleoside [4][5][6]. Almost 20 years later, the first pseudouridine synthase gene, TruA, was identified and found to modify tRNA in bacteria [7,8]. More recent advances in transcriptome-wide mapping of Ψ revealed widespread pseudouridylation of mRNAs in eukaryotes at levels comparable to N6-methyladenosine (m6A) modifications [9][10][11][12][13][14]. Despite the abundance of Ψ in mRNA, little is known about a) the purpose of that modification, b) whether the locations of pseudouridine modifications in mRNA are random or functionally important, and c) whether they are installed by specific PUS enzymes.
Two distinct classes of pseudouridine synthase enzymes, which are jointly responsible for pseudouridylation of RNA, differ in the way that they target their RNA substrates. Guide RNA-dependent PUS enzymes in eukaryotes and archaea (such as Cbf5 in yeast and dyskerin in humans) are part of a box H/ACA ribonucleoprotein complex (so named based on two closely related sequence elements in the guide RNA called "box H" and "box ACA") that utilizes small nucleolar RNAs (snoRNAs) as guides that recognize and base-pair with their substrates, which are mostly non-coding RNAs (ncRNA) [15]. Alternatively, an RNA recognition mechanism that does not rely on guide RNA factors is employed by stand-alone PUS enzymes that independently recognize and modify their targets (reviewed in [16]). In contrast to the H/ACA snoRNP PUS enzymes, most stand-alone PUS enzymes are conserved throughout eukaryotes and bacteria. These PUS enzymes are further divided into six families, which differ in N- or C-terminal extensions flanking a conserved catalytic core domain. That domain is present in all H/ACA snoRNP and stand-alone PUS enzymes and contains a central catalytic motif corresponding to an antiparallel β-sheet that is flanked by α-helices [17,18]. A strictly conserved catalytic aspartate residue in that motif is required for the rearrangement of uridine to its C-glycoside isomer Ψ in a mechanism that is not yet clear [19,20].
Eukaryotes generally contain multiple unique stand-alone PUS enzymes (PUS1-PUS10 and mitochondrial RPUSD), which differ in their substrate preference and localization in the cell. Each of them modifies specific sites in tRNAs, snRNAs, and rRNAs by targeting a sequence and/or structural element in their respective substrates [16,20]. The crystal structures of many PUS enzymes have been solved, including structures of eukaryotic/archaeal box H/ACA snoRNPs with guide and substrate RNAs [21][22][23][24][25][26][27][28] and several bacterial stand-alone PUS enzymes bound to non-coding RNAs, most of them revealing specific interactions of PUS enzymes with their respective targets [29][30][31][32].
It remains somewhat unclear which PUS enzymes are responsible for the pseudouridylation of messenger RNA (mRNA). Various studies have indicated that more than half of the Ψ modifications in mRNA are catalyzed by the tRNA-specific PUS4 (a member of the divergent TruB family of PUS enzymes) and PUS7 (TruD family) through recognition of tRNA-like structures (which are their reported primary targets) [10,[12][13][14]]. Like PUS7, PUS1 (TruA family) has been reported to modify multiple structurally diverse positions in tRNA, U2 and U6 snRNAs, suggesting less restricted selection of their RNA targets by these enzymes compared to PUS4 [33][34][35][36]. More recent data have suggested that PUS1 is the predominant PUS to modify uridines in mRNA [36][37][38][39]. Although PUS1 sites in mRNA show little sequence similarity, high-throughput pseudouridylation studies have implicated a structure-dependent mRNA target recognition mechanism and suggested that modulation of the RNA structure may play a role in the regulation of mRNA pseudouridylation [14,40]. However, it is still unclear how a tRNA-specific pseudouridine synthase recognizes and modifies mRNA substrates, mostly due to the lack of available structural information on PUS1 enzymes bound to mRNA targets.
Only a few structures of eukaryotic stand-alone PUS enzymes have been solved: the catalytic domains of human PUS1 and PUS10 [41][42][43], as well as structures of human and yeast PUS7 [44,45]. The only available structures of a stand-alone PUS enzyme bound to its target RNA are those of E. coli TruA (the closest bacterial homologue of PUS1) and TruB, each bound to tRNA [30,46]. Despite the structural similarities between TruA and PUS1, modeling and docking studies of tRNA and the core domain of human PUS1 suggest a significantly different orientation of the tRNA when bound to PUS1 than in the E. coli tRNA-TruA complex [41].
Because visualization of how PUS1 binds to mRNA might provide new insight into the basis of its action on such substrates, we generated a pair of crystal structures of wildtype S. cerevisiae PUS1 and a catalytically inactive PUS1 mutant in complex with short mRNA fragments. The structures, from two unrelated crystal forms, represent the first structures of a eukaryotic stand-alone PUS enzyme bound to a target RNA. The structures illustrate how PUS1 recognizes and binds a helical RNA duplex and enable the identification of RNA-contacting amino acid residues that make extensive contacts with each RNA strand to orient and guide the RNA into the active site of the enzyme. Additional examination and comparison with previously described structural and biochemical studies indicate that a) while PUS1 and TruA use the same protein surface to interact with their respective target RNAs, the position and identity of the corresponding contact residues and the position of the bound RNAs differ significantly from one another, and b) PUS1 likely binds and acts on at least one tRNA target in a manner that is closely related to how it engages with its docking site in mRNA.
To generate an untagged version of the PUS1 enzyme for crystallography trials, PUS1 D134A lacking the N-terminal HIS-tag was subcloned into expression vector pET21d (EMD Biosciences) using a Gibson Assembly cloning kit and protocol (NEB # E5510S) and sequence verified. A clone encoding tagless wild type PUS1 was then generated from PUS1 D134A by converting alanine 134 back to wild type aspartate 134 using a QuickChange II XL kit and protocol (Agilent).
Bacterial pellets from a liter of culture were resuspended in 30 mL of Buffer A (50 mM NaCl, 10 mM Tris/Cl, pH 7.5 at 4˚C, 0.1 mM EDTA, 5% glycerol, 1 mM DTT), and 0.2 mM PMSF and 900 U of Benzonase were added. Cells were lysed on ice with a Misonix S-4000 sonicator operating at 70% power; the cell suspension was subjected to 80 seconds of total sonication time over the course of four cycles (each applying a 20 second sonication burst followed by a 60 second cooling period). The cell lysate was clarified by centrifugation in an SS-34 rotor (Sorvall) at 30597 x g for 20 minutes at 4˚C, followed by hand filtration through a 5 μm filter. The lysate was loaded onto a 5 mL HiTrap Heparin HP column (Cytiva) and the column was washed with 25 mL of Buffer A. A gradient from 100% Buffer A to 100% Buffer B (Buffer A augmented with 1.0 M NaCl) was run with a total elution volume of 100 mL and 5 mL fractions were collected. Peak fractions were combined, concentrated, filtered, and loaded onto a HiLoad 16/60 Superdex 200 gel filtration column (Cytiva) equilibrated in Buffer SEC (Buffer A augmented with 200 mM NaCl). Peak fractions were combined and concentrated. The final purification of the protein, including the size exclusion chromatography elution profile of the purified protein, is shown in S1 Fig.

In vitro transcription of Fluc mRNA fragments
A DNA duplex containing the T7 promoter followed by firefly luciferase (Fluc) mRNA (S2 Table) covering positions 787-845 was ordered from Integrated DNA Technologies (5'-gcgaaattaatacgactcactatagggATTCCGGATACTGCGATTTTAAGTGTTGTTCCATTCCATCACGGTTTTGGAATGTTTAC-3'). 1 μg of the duplex was in vitro transcribed using the HiScribe™ T7 Quick High Yield RNA Synthesis Kit (NEB # E2050S) following the standard IVT protocol and purified using Monarch RNA Cleanup columns (NEB # T2030).
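As a quick sanity check on the template quoted above, the following minimal sketch extracts the Fluc-derived region and converts it to RNA. It assumes that the lowercase letters in the quoted sequence mark the T7 promoter region and the uppercase letters mark Fluc positions 787-845; that reading of the letter case, and the function name, are illustrative assumptions rather than part of the published protocol.

```python
# Template as quoted in the text (the line-break space inside the sequence removed).
# Assumption: lowercase = T7 promoter region, uppercase = Fluc positions 787-845.
TEMPLATE = (
    "gcgaaattaatacgactcactataggg"
    "ATTCCGGATACTGCGATTTTAAGTGTTGTTCCATTCCATCACGGTTTTGGAATGTTTAC"
)


def fluc_region_as_rna(template):
    """Return the uppercase (Fluc-derived) part of the template as an RNA string."""
    dna = "".join(base for base in template if base.isupper())
    return dna.replace("T", "U")


rna = fluc_region_as_rna(TEMPLATE)
print(len(rna))        # number of Fluc-derived nucleotides
print(rna.count("U"))  # uridines in that region
```

The length of the uppercase region matches the stated span (845 - 787 + 1 = 59 nt), which supports reading the quoted sequence as one continuous template.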

PUS1 activity assays
210 pmol (2.1 μM final concentration) PUS1 was incubated with 84 pmol (0.84 μM final concentration) synthetic RNA oligos (Integrated DNA Technologies, S3 Table) or a short in vitro transcript (IVT) in 1x NEB buffer 1.1 for 90 min at 30˚C in a 100 μL reaction. Reactions with full length Fluc mRNA contained 2.1 μM PUS1 and 8.3 nM Fluc mRNA. Reactions were stopped by adding 0.8 units of Proteinase K (NEB # P8107S) followed by incubation for 10 min at 37˚C. The modified RNA was subsequently column-purified (NEB # T2030), 2 pmol were digested to single nucleosides using the Nucleoside Digestion Mix (NEB # M0649), and the ratio of Ψ to uridine was determined via tandem quadrupole mass spectrometry (Fig 1A).
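The amounts and final concentrations quoted for the activity assay can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that the full length Fluc mRNA reactions used the same 100 μL volume as the oligo reactions, which the text does not state explicitly.

```python
def micromolar(amount_pmol, volume_ul):
    """Concentration in uM; 1 pmol/uL is numerically equal to 1 uM."""
    return amount_pmol / volume_ul


def picomoles(conc_nm, volume_ul):
    """Amount in pmol from a nM concentration; 1 nM equals 0.001 pmol/uL."""
    return conc_nm * volume_ul / 1000.0


REACTION_UL = 100.0  # reaction volume stated for the oligo/IVT assays

pus1_um = micromolar(210, REACTION_UL)
oligo_um = micromolar(84, REACTION_UL)
print(f"PUS1: {pus1_um:.2f} uM, RNA oligo: {oligo_um:.2f} uM")
print(f"enzyme:oligo molar ratio: {pus1_um / oligo_um:.1f}")

# Assumption: same 100 uL volume for the full length Fluc mRNA reactions.
print(f"Fluc mRNA per reaction: {picomoles(8.3, REACTION_UL):.2f} pmol")
```

Under these conditions the enzyme is in molar excess over the oligo substrate, and in much larger excess over the full length mRNA.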

Crystallization and data collection
RNA targets were complexed with purified protein, with the RNA present in a 1.4-molar excess over the protein. Complexes were screened for crystal growth in 96 well plate formats, with 200 nanoliter drop volumes of equal parts complex and reservoir solution equilibrated against 100 microliter reservoirs, against multiple commercial crystallization screens, using a mosquito robot (TTP LabTech). Drops that generated visible crystals were then used to set up subsequent screening and expansion trays by hand, with 2 microliter drops equilibrating against 1000 microliter reservoir volumes.

Phasing and refinement
Structures were phased via the molecular replacement method, using the catalytic domain of human PUS1 (PDB ID: 4J37) [41] as a search model. Molecular replacement searches were conducted using program PHASER [51] within the Phenix crystallographic computational suite [52]. The structure was rebuilt and refined using programs COOT [53] and PHENIX.REFINE [54]. Initial refinements were performed using phenix.refine defaults (standard Cartesian plus real-space refinements) with the addition of secondary structure restraints. During the final rounds of rebuilding and refinement, weight optimization (both geometry and B-factor restraints) was added. Simulated annealing was used periodically throughout to verify accuracies when fragments were built from scratch. NCS is not present in the asymmetric unit, and refinements with TLS and/or riding hydrogens did not improve the structure so were not used.

Identification of a PUS1 mRNA substrate
Since we were interested in the ability of PUS1 to modify uridines in full length mRNA, we tested the activity of purified PUS1 on in vitro transcribed Fluc mRNA (1769 nt, S2 Table), a standard model mRNA in our lab. Using LC-MS/MS to detect and quantify the number of uridines that were pseudouridylated by PUS1, we were surprised to find that approximately 14% of all uridines in Fluc mRNA were pseudouridylated after 5 hours of incubation with PUS1 (Fig 1B). These values suggested that approximately 64 of the total 458 uridines in Fluc had been modified by PUS1. These data did not provide information about whether the same uridines at specific positions in Fluc were modified, or if the pseudouridylation occurred in a non-targeted fashion and was distributed randomly across the mRNA. However, this number was significantly higher than the less-than-one Ψ per transcript that has been reported for cellular mRNAs previously [14].
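The estimate of roughly 64 modified uridines follows directly from the measured fraction and the uridine count. A minimal sketch of that arithmetic is shown below; the function name is hypothetical, and the calculation says nothing about whether the modifications sit at defined positions or are spread across the population of transcripts.

```python
def estimated_modified_uridines(fraction_modified, n_uridines):
    """Average number of pseudouridines per transcript implied by a bulk Psi/U measurement."""
    return fraction_modified * n_uridines


# Values stated in the text: about 14% of 458 uridines in Fluc mRNA.
per_transcript = estimated_modified_uridines(0.14, 458)
print(round(per_transcript))
```

Because LC-MS/MS measures the bulk ratio over all molecules, this value is an average per transcript, not a count of distinct sites.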
Using a sliding window approach, dividing the Fluc mRNA first into 300 nt and then into 60 nt fragments, we identified a sequence spanning positions 787 to 845 to be highly pseudouridylated by PUS1 (S2A Fig). To facilitate subsequent crystallographic studies, we further tested multiple short synthetic RNA oligos covering Fluc positions 760-845 to find the shortest possible mRNA substrate for PUS1 (Fig 1C). We found that most oligos comprising the Fluc sequences between positions 760 and 796 displayed robust pseudouridylation, while oligos with sequences downstream of position 787 were only weakly modified or not modified at all (Fig 1D and S2B Fig). Concurrent with our work, the Gilbert lab combined computational prediction and mutational analysis to reveal the determinants for an RNA target to be recognized by PUS1. Their study suggested that the ideal PUS1 RNA substrate contains a uridine as part of a weak H-R-U motif ("H" = adenine/cytosine/thymine; "R" = guanine/adenine) at positions -2, -1, 1 at the base of a 13 base pair stem-loop structure with a small internal bulge to increase flexibility [40]. Interestingly, when we predicted the secondary structure of each of the RNA oligo substrates, we found that oligos that were modified by PUS1 all formed a suboptimal, but similar, PUS1 target structure (S2C Fig). The pseudouridylated oligos R167, R168, and R169 all contained at least one uridine at the base of a stem-loop structure. However, R167 and R168 did not contain an H-R-U motif, and the stem length (a critical component in PUS1 target preference) was shorter than optimal in all oligos.
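The sequence component of the target definition described above, a uridine preceded by H and then R, can be located with a simple pattern search. The sketch below is an illustration only and is not the prediction pipeline used in [40]: it writes the "thymine" of the motif definition as U in the RNA alphabet, the function name is hypothetical, and it ignores the stem-loop context that the text describes as essential for efficient modification.

```python
import re

# H = A, C or U (anything but G); R = A or G; the final U is the candidate target.
# A lookahead is used so that overlapping motifs are all reported.
HRU = re.compile(r"(?=([ACU][AG]U))")


def hru_target_positions(rna):
    """Return 1-based positions of uridines that complete an H-R-U motif."""
    rna = rna.upper().replace("T", "U")
    return [match.start() + 3 for match in HRU.finditer(rna)]


print(hru_target_positions("AGU"))
print(hru_target_positions("GGU"))
print(hru_target_positions("AAUGU"))
```

A sequence hit is only a candidate site; whether it sits at the base of a sufficiently long stem would have to be assessed separately with a secondary structure prediction.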
Intrigued by these results, we hypothesized that we could engineer an optimized stem-loop containing RNA substrate following the parameters described by the Gilbert lab [40] but based on the Fluc mRNA sequence (R397) that would (1) be efficiently modified by PUS1, and (2) form stable complexes with PUS1 for crystallography (Fig 1E). When we incubated the engineered Fluc stem-loop substrate with PUS1 in vitro, we identified pseudouridylation in an RNA fragment encompassing the 5'-end of the substrate via UHPLC-MS/MS (Fig 1H, panels 1 and 2). To confirm the position of the pseudouridine site, we incubated PUS1 with a similar substrate (R444) featuring reversed positions of the U and A at the base of the stem (Fig 1F). This substrate was not pseudouridylated by PUS1 (Fig 1H, panel 3), confirming that, as predicted, PUS1 only targeted the uridine in the H-R-U motif at the base of the stem. In addition, we designed an artificial RNA substrate (R398) that fused the R397 Fluc-mRNA stem loop with the known PUS1 target PFY1-U290 mutant stem 1 loop that had previously been shown to be pseudouridylated by PUS1 in vitro [40]. As expected, we detected pseudouridylation in the fragments containing the H-R-U motif at the base of either stem (Fig 1D and 1H, panels 3 and 4).

An active site mutant of PUS1 is catalytically inactive
We generated a catalytically inactive PUS1 variant, with the intention of using such a construct for structural studies without the complication of a heterogeneous mixture of substrate, reaction intermediates or products forming during the crystallization process. To do so, we exploited the observation that all PUS enzymes, whether stand-alone or part of the H/ACA snoRNP complexes, contain a conserved aspartate residue (D146 in human PUS1) that is strictly required for activity [41,55]. Substituting the aspartate residue 134 with alanine (D134A) in yeast PUS1 generated a catalytically inactive PUS1 variant (PUS1 D134A) that was unable to pseudouridylate RNA (S3 Fig).

PUS1 binds to an RNA duplex
Even though both optimized RNA substrates R397 and R398 were recognized and pseudouridylated by PUS1 at the predicted positions, attempts to co-crystallize PUS1 with either substrate were unsuccessful, most likely due to steric interference of the long RNA extensions with crystal lattice formation. We thus attempted crystallization with shorter RNA oligo substrates that resembled suboptimal PUS1 target structures that we had identified as being efficiently modified (R167, R168, R169). Unfortunately, none of these oligos formed crystals with wild type PUS1 or PUS1 D134A. We then focused on RNA oligo R263 (Fluc positions 782-799; sequence provided in S2B

PUS1 interacts with both strands of a helical RNA duplex
PUS1-RNA co-crystals that diffracted to 2.4 Å resolution, containing the R263 RNA construct and the PUS1 D134A inactive enzyme, were successfully grown and found to belong to space group C2 (PDB: 7R9G; Table 1). Modeling and refinement of the enzyme-RNA complex yielded values for the crystallographic R-factors (R work and R free) of 0.222 and 0.265 respectively, with tight protein geometry (rmsd bonds and angles 0.003 Å and 0.58˚; 95.73% of residues in favored Ramachandran regions). The average B-factor values for the protein and the bound RNA were comparable (65.48 and 62.74 Å², respectively). The electron density maps from molecular replacement and subsequent rounds of refinement displayed well-ordered density for most of the protein chain (the first 70 residues, last 49 residues, and four subsequent surface loops ranging in length from 2 to 19 residues were disordered) and the first 12 bases of the RNA substrate (the final 6 bases were also disordered).
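For readers less familiar with the refinement statistics quoted above, the conventional definition of the crystallographic R-factor is given below. This is the standard textbook form rather than anything specific to this study; the symbols are introduced here for illustration.

```latex
R = \frac{\sum_{hkl} \bigl|\, |F_{\mathrm{obs}}(hkl)| - |F_{\mathrm{calc}}(hkl)| \,\bigr|}
         {\sum_{hkl} |F_{\mathrm{obs}}(hkl)|}
```

R work is evaluated over the reflections used in refinement, while R free is evaluated in the same way over a small test set of reflections that is excluded from refinement, so a modest gap between the two (here 0.222 versus 0.265) is expected.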
The contents of the crystallographic asymmetric unit correspond to a single protein subunit and a single strand of the bound RNA ligand (Fig 2A). That complex was observed to be part of a higher-order assembly, composed of an RNA duplex and two bound copies of PUS1, that is generated via application of a crystallographic two-fold symmetry axis (Fig 2B). Rather than binding to PUS1 as a monomer in the predicted short stem-loop formation, two R263 oligos formed a helical duplex. The base pairs within the RNA duplex (involving positions 2 through 10 in each strand) display both Watson-Crick and non-Watson-Crick interactions with their counterparts, including two central G:G reverse Hoogsteen base pairs. The 5'- and 3'-most modeled bases on each strand are not engaged in base-paired interactions but are also not flipped out of the strand's duplex conformation (Fig 2C). Immediately proximal to the 5' end of each RNA strand, at a distance of approximately 5.5 angstroms, a well-occupied and tightly coordinated sulfate ion is observed. This corresponds to a location and distance appropriate to represent an additional backbone phosphate group if the RNA were extended by an additional base at its 5' end (Fig 2D). A sulfate ion was also observed and modeled at the same position in the previously described structure of the human PUS1 apo enzyme (PDB 4J37) [41,55].
In contrast to the extensive base-paired contacts between the two crystallographically related RNA strands, the corresponding pair of bound protein molecules do not display significant contacts with one another; the few contacts between the two bound protein subunits are limited to two surface-exposed loops (residues 86 to 89 and 95 to 98) in the N-terminal region of the enzyme. Therefore, we believe that the two copies of PUS1 that are associated with opposite sides and ends of the symmetric RNA duplex are bound as individual monomers, independently of one another. That conclusion agrees with the solution behavior of the purified enzyme, which eluted from a size exclusion column as a monomeric species at high micromolar concentrations of protein (S1 Fig). Within the complex between a single bound protein subunit and the RNA duplex, at least thirteen amino acid side chains contact numerous atoms on each of the two RNA strands (Fig 3A and 3B). The residues involved in substrate recognition and binding fall largely into two separate clusters within the enzyme's sequence and structure. The first cluster of seven RNA-contacting amino acids is distributed between residues 89 and 188; they collectively contact one end of the RNA duplex and the adjacent sulfate ion, positioning the first base at the 5'-end of an RNA strand near the entrance to the active site. All but one of those residues are conserved across eukaryotic PUS1 homologues from yeast to humans (S5 Fig). A second cluster of four additional RNA-contacting amino acids is distributed between residues 362 and 394; they contact a series of bases and backbone atoms further downstream on one of the two RNA strands and are conserved across eukaryotic PUS1 enzymes (Fig 3A and 3B, S5 Fig). Finally, two additional residues (K277 and Y459) contact the RNA backbone near the opposite end of the bound RNA duplex. The yeast specific K277 is not conserved across eukaryotic PUS1 enzymes. While the tyrosine in position 459 is yeast specific, other eukaryotic PUS1 variants contain a conserved threonine in the corresponding position. All the protein-RNA contacts described above are duplicated via symmetry by the second RNA-bound protein subunit (Fig 3C); each independently bound protein monomer makes multiple contacts to nucleotide bases and to backbone phosphate groups on both RNA strands.
To further investigate if the identified protein-RNA contacts are important for PUS1 function, we generated alanine mutants of residues in either cluster A (H89, R132) or cluster B (R362, K363) that are in close proximity to the RNA strands. While replacing histidine 89 with alanine only had a weak effect on PUS1 activity and/or specificity when tested with the R169 RNA oligo (~29% reduction of activity), mutating arginine 132 to alanine reduced pseudouridylation of the R169 RNA oligo by approximately 87%, suggesting an important functional role of this amino acid side chain in coordinating the RNA in the active site of the enzyme (Fig 4A  To ensure that the differences in activity were not caused by varying PUS1 concentrations, we showed that each reaction indeed contained similar PUS1 concentrations (S8B Fig). We then tested the wild type and mutant PUS1 variants with the engineered R397 substrate that contained only one Ψ-site. In agreement with the results above, we observed a dramatic reduction of pseudouridylation of the R397 RNA substrate with PUS1 mutants R132A and R362A, while PUS1 H89A and PUS1 K363A still modified their substrate (Fig 4B and 4C).
To test whether mutations that significantly reduced PUS1 activity affected substrate binding, we performed EMSAs of wildtype and mutant PUS1 enzymes with the RNA substrate used to generate the crystal structures (R263) as described in Materials & Methods. None of the catalytically impaired cluster A and B mutants showed a reduction of substrate binding (S6C Fig). While these assays may not provide the resolution to accurately compare binding affinities, they indicate that the detrimental effect of these mutations on enzyme activity cannot be explained by reduced binding. Together, these data imply that the formation and recognition of an RNA duplex near the active site is a mechanistically relevant feature of target selection.
In the structure of the catalytically inactive PUS1 D134A bound to the R263 duplex, the unpaired 5' base (A1) is positioned within 4.5 angstroms of residue 134 (measured from C-alpha of residue 134 to C4' of A1) and appears to be close enough to rotate into the enzyme active site (Fig 3B). To further examine the interactions of bound RNA with a catalytically competent version of the enzyme (in which the catalytic aspartate at position 134 was restored), we altered the RNA ligand by substituting a 5-fluoro-uracil (5-FU) base at position 1, reasoning that it might be captured in a suicide complex by the enzyme after being flipped into the active site and undergoing nucleophilic attack by the carboxylate of D134. RNAs harboring such substitutions have previously been demonstrated to act as mechanism-based inhibitors of pseudouridine synthase enzymes and have been used to demonstrate the structural mechanism by which the target base gains access to the enzyme's active site [56].
Crystals of wild type PUS1 in complex with the new R340 RNA construct (5'-[5FU]AA UCG GGA UUC CGG AUA-3') belonged to a different space group (P6₁22) and diffracted to 2.9 Å resolution (PDB 7R9F). Although the space group and corresponding lattice packing arrangement of the protein-RNA complex were completely unrelated to the packing of the C2 space group of the PUS1 D134A:RNA complex, the same RNA duplex and the same positions of two independently bound protein subunits were observed (S7A Fig). The RNA duplex and second bound protein molecule are again generated by the application of crystallographic two-fold rotation symmetry on the contents of the asymmetric unit (which again corresponds to a single protein subunit and a single RNA chain). Structural superposition of an RNA-bound monomer or dimer from the two crystal structures produced rmsd values of 0.63 Å and 1.47 Å, respectively. The reproducibility of all structural observations described above, in two different crystallization conditions and crystal forms, reinforces the conclusion that the formation and presence of an RNA duplex near the target uridine, and recognition of elements within each RNA strand by residues from each protein subunit, is a reproducible and mechanistically relevant feature of substrate recognition and activity.
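The rmsd values quoted for the superpositions follow the usual definition, shown below for reference. The notation is introduced here for illustration, and the text does not specify which atoms were paired for these particular superpositions, so the atom selection is left open.

```latex
\mathrm{rmsd} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \lVert \mathbf{x}_i - \mathbf{y}_i \rVert^{2}}
```

Here N is the number of equivalent atom pairs, and x_i and y_i are the coordinates of the i-th pair in the two structures after optimal rigid-body superposition. The larger value for the dimer than for the monomer (1.47 Å versus 0.63 Å) is consistent with small differences in the relative placement of the two subunits adding to the per-subunit deviation.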
Beyond providing an independent confirmation of the binding of two copies of the enzyme to an RNA duplex in a symmetric arrangement, the resulting electron density map, after several initial rounds of rebuilding and refinement, displayed a significant feature of positive difference density in the active site that might be indicative of low occupancy (ultimately estimated at less than 10% in refinement) by the 5' 5FU nucleobase of the RNA (S7B Fig). However, the combination of lower resolution and mixture of density features surrounding that base prevented us from unambiguously modeling the bound RNA in a state corresponding to a trapped catalytic complex. Interestingly, our attempts to modify the RNA substrate to trap either the target uridine or 5FU in the active site by adding different H-R-U motifs 5' of the R263 sequence captured in our structures failed, despite designing the sequence to ensure that the H-R part was single stranded.
The structure of the RNA-bound PUS1 enzyme was found to be similar to both the previously solved structure of human PUS1 in the absence of bound RNA (PDB 4J37) and the structure of bacterial tRNA pseudouridine synthase TruA (PDB 2NR0), with an rmsd across all comparable alpha carbons of approximately 2 Å in both pairwise superpositions (Fig 5). PUS1 and TruA employ the same protein surface, spanning the majority of their primary sequences, to contact their respective RNA targets; however, the position and identity of the corresponding contact residues and the conformation and orientation of their bound RNAs differ significantly from one another (S8 Fig). An alignment of yeast PUS1 and E. coli TruA and their respective RNA substrates revealed that the orientation of the TruA-bound leucyl tRNA results in a clash of the tRNA with a large yeast specific insertion, spanning residues S206 to approximately L279 (Figs 5 and 6).

Discussion
Previous biochemical studies have provided solid evidence indicating how PUS enzymes might target and interact with their ncRNA substrates [13,14,40]. However, it is still unknown how stand-alone PUS enzymes interact with mRNA. In addition, it is still unclear if PUS1-mediated mRNA pseudouridylation is site specific and functional, or non-targeted and mostly a side product of PUS1 non-specifically recognizing tRNA-like structures in mRNA. Our data present crystal structures of a guide RNA-independent eukaryotic PUS enzyme bound to an mRNA fragment and shed light on the molecular mechanism underlying mRNA recognition and binding by such enzymes.
While bacterial TruA has been reported to crystallize as a homodimer (with each subunit binding an individual tRNA), other PUS enzymes, including human PUS1 [41,42], function as monomers. Therefore, the recruitment of two PUS1 molecules to the RNA duplex was initially surprising. However, our data and the crystal structures (solved in two different space groups) collectively indicate that yeast PUS1 does in fact interact with RNA substrates as a monomer, and that the observation of two enzyme subunits bound to the RNA duplex in a symmetric manner is a (fortuitous) biochemical and crystallographic artifact. First, the PUS1 enzyme is a monomer in solution as confirmed by size-exclusion chromatography (SEC) at high protein concentrations (S1 Fig). Second, despite binding to opposing ends and sides of the highly symmetric RNA duplex used for crystallization, the PUS1 subunits do not display an extensive interface that would be expected for a functional protein dimer. However, we cannot exclude the possibility that binding of PUS1 to a single stranded RNA is required for or supports the formation of a stable RNA duplex for some RNA substrates. A question that immediately arises from our analyses is whether the base-paired RNA duplex observed in the crystal structures represents a physiological and/or mechanistic state that influences PUS1 specificity and activity on mRNA. The Schwartz laboratory recently reported the importance of double-stranded stem RNA structures for mRNA pseudouridylation by TruB1 and PUS7 through high-throughput Ψ mapping [10]. Their data indicate that both enzymes modify structure motifs in mRNA that match important motifs in their preferred tRNA substrates, suggesting a similar mechanism of site recognition.
In contrast, PUS1 and its bacterial counterpart TruA are promiscuous in their target selection, modifying multiple RNAs with divergent sequences in the region of modification [57]. Illumina-based mapping of Ψ installed by PUS1 in yeast and human cells revealed a weak H-R-U sequence motif, which the authors concluded was not sufficient to explain PUS1 specificity [14]. However, a subsequent study by the same laboratory reported that an RNA duplex that forms a stem-loop structure in combination with the H-R-U motif is required for efficient pseudouridylation [40]. They showed that features like stem length and stability regulate the pseudouridylation rate, which agrees with previous data showing that a loop flanked by stem structures is preferably targeted by human PUS1 [42]. It also agrees with our results testing two stem-loop substrates. As predicted, only the uridines located at the base of the stem were pseudouridylated in both tested substrates. However, neither of the substrates formed crystals with PUS1. Despite adding a bulge in the stem region to increase conformational flexibility, the loop connecting both RNA strands may restrict the flexibility and the ability of the substrate to adopt its most stable orientation when forming a complex with PUS1 under the conditions used for crystallization. Interestingly, we observed PUS1 activity on short RNA oligos that were predicted to fold into structures with sub-optimal PUS1 target sites. In the R167 and R168 substrates, 11% and 13% of the 10 uridines, respectively, were pseudouridylated by PUS1 in vitro, as were 17% of the 12 uridines in R169. Assuming that pseudouridylation occurred at specific sites rather than randomly throughout the oligo, this suggests that either one (R167/R168) or two specific uridines (R169) were modified by PUS1. Accordingly, PUS1 substrate selectivity, at least in vitro, may be less restricted than recently reported.
The crystal structure presented here provides mechanistic insight into why PUS1 requires a double-stranded RNA stem for target recognition, binding, and activity. Our data show that each individual PUS1 subunit makes extensive contacts to both strands in the RNA duplex. Those RNA-contacting residues, which are well conserved across PUS1 homologues, appear to enable binding of an RNA double helix and position the 5' flanking region of the bound RNA strand close to the active site in PUS1. We find that alanine mutations of PUS1 residues arginine 132 and arginine 362 significantly reduce PUS1 activity. Interestingly, these two residues contact opposite strands in the RNA double helix, further supporting the hypothesis that an RNA double helix structure is important for binding of the RNA substrate and/or the positioning of the pseudouridylation site in the active site of the enzyme.
One explanation for yeast PUS1's less restricted mode of structure recognition could be the location of its thumb loop (Tyr83-Thr99). In the PUS1:mRNA structure, the loop is ordered and points away from the active site. This orientation results in a significantly wider opening of the active site cleft compared to TruA or human PUS1 [41], which may allow for the binding of more diverse RNA substrates.
In agreement with our finding that PUS1 does not significantly modify the R263 substrate, none of the uridines within the crystallization oligonucleotide are located near the enzyme's active site in the crystal structures; instead, they are involved in duplex formation via base-pairing interactions between the RNA strands. The structure suggests that an RNA duplex with a minimum length of 10 to 12 nt is required for all RNA:PUS1 interactions to form stably and to position the RNA correctly in the active site. This agrees with the 11 bp median length of double-stranded stem structures found in PUS1 mRNA targets [40]. A stem of this length allows all the PUS1-RNA interactions to form, which in turn positions the target uridine at the base of the stem in the catalytic center of the enzyme. Carlile and colleagues discuss the possibility that a cap at the entrance of the RNA binding channel, formed by a bundle of three α-helices in a yeast PUS1 structure model, may restrict the length of the RNA stem that can be accommodated in the enzyme [40,41]. While our structures clearly indicate the presence of such a helical bundle, they do not indicate any clashes of the RNA with PUS1 components. Instead, we hypothesize that longer structures may not be efficiently bound, stabilized, and oriented in the active center by the PUS1-RNA interactions.
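The stem-length argument above can be illustrated with a small sketch. It assumes that a predicted secondary structure is available in dot-bracket notation (the format emitted by common RNA folding tools) and counts the base pairs in each uninterrupted helix. The function names, the example structures, and the 10 bp default threshold (the lower bound quoted above) are illustrative assumptions and were not part of the original analysis.

```python
# Illustrative sketch only; names, inputs and threshold are assumptions.

def base_pairs(dot_bracket):
    """Parse dot-bracket notation into a sorted list of (i, j) base pairs."""
    stack, pairs = [], []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return sorted(pairs)


def stem_lengths(dot_bracket):
    """Return the length in base pairs of each uninterrupted helix, 5' to 3'."""
    pairs = set(base_pairs(dot_bracket))
    lengths = []
    for i, j in sorted(pairs):
        if (i - 1, j + 1) in pairs:
            continue  # this pair is stacked inside a helix already counted
        n = 0
        while (i + n, j - n) in pairs:
            n += 1
        lengths.append(n)
    return lengths


def has_min_stem(dot_bracket, min_bp=10):
    """True if any uninterrupted helix has at least min_bp base pairs."""
    return any(length >= min_bp for length in stem_lengths(dot_bracket))


hairpin = "(" * 12 + "...." + ")" * 12  # hypothetical 12 bp stem, 4 nt loop
print(stem_lengths(hairpin), has_min_stem(hairpin))
```

Bulges and interior loops split a stem into separate helices in this sketch; whether PUS1 tolerates such interruptions is not something the code can decide.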
Comparison of the structures solved in this study with that of bacterial tRNA pseudouridine synthase TruA [30] (a representative of the structural family from which PUS1 is derived) bound to its tRNA substrate indicates how RNA recognition mechanisms can diverge, and have diverged, dramatically. Both enzymes display contacts to their respective RNA targets involving approximately 15 residues distributed across their N- and C-terminal domains, and in doing so position their active sites near bases that are potential sites of modification (individual bases located 5' of the bound RNA duplex for PUS1; bases 38, 39 and 40 of the tRNA anticodon stem loop for TruA). Of the residues involved in direct RNA contacts from each enzyme, several equivalent positions in each enzyme's N-terminal region contact RNA atoms but involve quite different protein residues (H89GMQYNPPN97 in PUS1; Y24GWQRQNEV32 in TruA) that contact unique nucleotide identities and conformations in their corresponding RNA targets. In addition, the orientation of the TruA-bound leucyl tRNA causes a significant clash with two helices in a yeast-specific insert (S206-L279), providing further evidence for divergent evolution of these enzymes.
Conversely, a separate comparison of the structures solved in this study against a tRNA Ile from Staphylococcus aureus (PDB ID: 1QU3), which in eukaryotes is known to be modified within its anticodon loop by PUS1 [36,41,58], indicates that the enzyme may be able to do so through a binding mode and interactions with that substrate's anticodon stem loop that are similar to those observed in our crystal structure. Superposition of the tRNA Ile substrate, via its anticodon stem loop, onto the RNA duplex from the crystal structure positions the tRNA anticodon loop and the uridines at positions 34 and 36 adjacent to the enzyme's active site (Fig 7). The corresponding docked position of the tRNA (generated by aligning the anticodon arm of the tRNA, which is known to be modified by PUS1, with the helix of the RNA in the structure) further places the previously identified RNA-contacting residues at similar distances from, and in position for potential interactions with, atoms along the tRNA backbone. The superposition does not indicate significant clashes between the C-terminal helical region of the PUS1 enzyme and the remainder of the tRNA molecule.
The structures presented here provide the first comprehensive mechanistic insight into the interaction between a eukaryotic stand-alone PUS enzyme and its mRNA substrates. However, some questions remain unanswered. First, is the underlying RNA sequence within a given RNA stem loop in an mRNA important for pseudouridylation activity? Or is any such structure, regardless of base pair identity, sufficient for enzyme activity? Our and others' data suggest that while the length and stability of the stem modulate the pseudouridylation rate, a uridine at the base of a stem-loop structure may be sufficient for PUS1 to recognize it as a target. Second, if guide RNA-independent PUS enzymes recognize and bind to defined RNA structural motifs, and if such motifs are indeed responsible for the majority of mRNA pseudouridylation, are these motifs specifically placed in mRNA to have regulatory functions? Ψs are not randomly distributed within mRNAs and have been shown to be enriched in the 3'-UTRs and coding regions [14], suggesting a functional role. The structural basis for target recognition by PUS1 presented here is a first step towards understanding how PUS enzymes select their targets in mRNA, and if and how this process may be regulated.

Fig 1. Determination of a suitable RNA substrate for crystallization with PUS1. (a) Schematic representation of the PUS1 activity assay and analysis (the chemical structures of U and Ψ are provided to the left). (b) PUS1 was incubated with firefly luciferase (Fluc) mRNA. The graph shows the percent of uridine-to-Ψ conversion (% pseudoU) over time in minutes from two independent experimental replicates, fitted with a non-linear four-parameter logistic (4PL) sigmoidal curve. Reactions were stopped after 30/60/90/120/300 minutes and analyzed via LC-MS/MS. (c) Schematic view of synthetic RNA oligos and a short in vitro transcribed RNA covering Fluc positions 760 to 845. The names of the substrates are shown inside the bars. (d) Percent of uridine-to-Ψ conversion of the RNA oligos by PUS1 after incubation for two hours, as determined by LC-MS/MS. Each data set contains at least two replicates. (e-g) RNA secondary structure predictions [48] of RNA substrates R397 (e), R444 (f), and R398 (g). The predicted pseudouridylation site, the positions of the H (-2) and R (-1) nucleotides in relation to the pseudouridylation site, and RNA structure features are indicated. The nucleotides are colored according to the type of structure that they are in (green: stems (canonical helices), yellow: interior loops, blue: hairpin loops, orange: 5' and 3' unpaired region). (h) Heatmaps of the number of oligonucleotide spectra identified by LC-MS/MS analysis in an RNase T1 digest of the RNA substrates indicated above the heatmaps after incubation with PUS1. Shown are the characteristic 207.04 m/z Ψ nucleoside signature ion (bottom) or the 211.00 m/z ribose phosphate ion (top). https://doi.org/10.1371/journal.pone.0291267.g001

Fig) as potential substrate for co-crystallization with PUS1. While this short 18 nt oligo only displayed very low levels of pseudouridylation by PUS1 in vitro (Fig 1D), it was predicted to form a short stem-loop structure with a uridine at the 5' base of the stem, suggesting that it may still be recognized by PUS1 (S2C Fig). When tested for PUS1 binding, R263 formed stable complexes with PUS1 wt and PUS1 D134A in electrophoretic mobility shift assays (EMSA) (S4B Fig). We therefore decided to attempt crystallization of both PUS1 enzymes in the presence of R263.

Fig 2. Model and representative electron density for RNA-bound PUS1. (a) The contents of the asymmetric unit, for both structures that were solved, correspond to a single protein subunit (colored as a spectrum, from the blue N-terminal end of the refined model to the red C-terminal end) bound to a single R263 RNA oligonucleotide (black bases). The model of the catalytically inactive D134A enzyme is shown in two orientations related by a 90° rotation around the x-axis (PDB ID: 7R9G). (b) In both structures, the application of a crystallographic 2-fold rotation axis generates a dimeric complex in which two subunits are independently bound to an RNA duplex. The second protein subunit and second RNA strand are colored in dark teal and pale blue, respectively. (c) Representative simulated annealing composite omit 2Fo-Fc electron density contoured across the RNA duplex and (d) at the region of protein-RNA contacts observed at the 5' end of one RNA strand. The structural features illustrated in this figure are replicated for the wild type enzyme bound to a closely related RNA complex, which was solved in an unrelated crystallographic space group and lattice (S7 Fig). The position of the bound sulfate ion mirrors a similarly placed sulfate in the previously described structure of unbound human PUS1. https://doi.org/10.1371/journal.pone.0291267.g002

Fig 3. Protein-RNA contacts. (a) The first twelve nucleotides of each RNA strand (R263) are visible in the crystal structure, while the last six nucleotides are disordered and unobservable. Each protein subunit displays identical contacts, related by two-fold symmetry, to RNA bases and backbone atoms, and to a tightly coordinated sulfate ion immediately upstream of the upstream base in each bound RNA (contacts made by only one protein subunit are displayed for clarity). Residues that contact the RNA either near the active site (cluster A, blue boxes) or further downstream (cluster B, orange boxes) are indicated. (b) Distribution of RNA-contacting residues in the protein-subunit interface for one PUS1 subunit. Brackets indicate the residues that contact the RNA close to the active site (cluster A, blue) or further downstream (cluster B, orange). (c) Distribution of contacts around the RNA duplex for both enzyme subunits in the dimeric assemblage. RNA-contacting residues in both PUS1 proteins are labeled either black (rainbow colored PUS1) or red (turquoise colored PUS1). https://doi.org/10.1371/journal.pone.0291267.g003

and S6A Fig). Despite not being in the direct vicinity of the enzyme's active center, residues in cluster B also proved important for PUS1 activity: changing arginine 362 to alanine resulted in a dramatic reduction of pseudouridylation by approximately 75%, while mutation of the adjacent lysine 363 led to a moderately reduced activity (~47%; Fig 3C and S8A Fig).

Fig 4. Mutation of RNA-contacting amino acid residues adversely affects PUS1 activity. (a) The rate of pseudouridylation (% pseudoU) of PUS1 wildtype and the indicated PUS1 mutants with R169 substrate is shown. Data from two replicates are shown. (b) Heatmaps of the number of oligonucleotide spectra identified by LC-MS/MS analysis in an RNase T1 digest of RNA substrate R397 after incubation with wildtype PUS1 and cluster A mutants, showing either a characteristic 207.04 m/z Ψ nucleoside signature ion (bottom) or a 211.00 m/z ribose phosphate ion (top). The fragments containing Ψ are highlighted with red boxes below the heatmaps and red circles in the RNA structure (right). The nucleotides are colored according to the type of structure that they are in (green: stems (canonical helices), yellow: interior loops, blue: hairpin loops, orange: 5' and 3' unpaired region). (c) LC-MS/MS analysis of RNA substrate R397 after incubation with wildtype PUS1 and cluster B mutants, showing the characteristic 207.04 m/z Ψ nucleoside signature ion and the 211.00 m/z ribose phosphate ion. Ψ-containing fragments are highlighted with red boxes. https://doi.org/10.1371/journal.pone.0291267.g004

Fig 5. Comparison of yeast PUS1 to E. coli TruA and human PUS1. (a) Human PUS1 apo-enzyme (4J37). The structure includes two bound sulfate ions, one of which aligns with a single sulfate ion near the enzyme active site that was also observed in RNA-bound S. cerevisiae PUS1. (b) S. cerevisiae PUS1 D134A bound to duplex RNA. A large insertion in the yeast enzyme, spanning residues S206 to approximately L279 (indicated by the oval), is unique as compared to its homologues in other eukaryotes (S5 Fig). It contains two RNA-contacting residues that are unique to the yeast enzyme. (c) E. coli TruA (2NR0) bound to tRNA. TruA utilizes an equivalent surface to bind its respective target but in a considerably different manner from PUS1. https://doi.org/10.1371/journal.pone.0291267.g005

Fig 7. Superposition of the crystal structure of PUS1 bound to the RNA duplex described in this study with a tRNA Ile known to be modified by the same enzyme within its anticodon loop. (a) The protein is colored; the RNA duplex from the crystal structure is dark grey; the tRNA substrate that is docked onto the crystal structure is light grey. The position of the active site D134 residue is indicated with light blue spheres; the positions of the sites of uracil modification are indicated with red sticks. (b) RNA structure prediction [48] of the tRNA Ile from Staphylococcus aureus (PDB ID: 1QU3) [58]. The locations of the conserved Ψ34 and Ψ36 in the anticodon loop are indicated. The nucleotides are colored according to the type of structure that they are in (green: stems (canonical helices), yellow: interior loops, blue: hairpin loops, orange: 5' and 3' unpaired region). (c) When docked onto PUS1, Ψ34 and Ψ36 (red) are located close to the active site (D134; blue sphere). https://doi.org/10.1371/journal.pone.0291267.g007