Plasmodium apicoplast tyrosyl-tRNA synthetase recognizes an unusual, simplified identity set in cognate tRNATyr

The life cycle of Plasmodium falciparum, the agent responsible for malaria, depends on both cytosolic and apicoplast translation fidelity. Apicoplast aminoacyl-tRNA synthetases (aaRS) are bacterial-like enzymes devoted to organellar tRNA aminoacylation. They are all encoded by the nuclear genome and are translocated into the apicoplast only after cytosolic biosynthesis. Apicoplast aaRSs contain numerous idiosyncratic sequence insertions: An understanding of the roles of these insertions has remained elusive and they hinder efforts to heterologously overexpress these proteins. Moreover, the A/T rich content of the Plasmodium genome leads to A/U rich apicoplast tRNA substrates that display structural plasticity. Here, we focus on the P. falciparum apicoplast tyrosyl-tRNA synthetase (Pf-apiTyrRS) and its cognate tRNATyr substrate (Pf-apitRNATyr). Cloning and expression strategies used to obtain an active and functional recombinant Pf-apiTyrRS are reported. Functional analyses established that only three weak identity elements in the apitRNATyr promote specific recognition by the cognate Pf-apiTyrRS and that positive identity elements usually found in the tRNATyr acceptor stem are excluded from this set. This finding brings to light an unusual behavior for a tRNATyr aminoacylation system and suggests that Pf-apiTyrRS uses primarily negative recognition elements to direct tyrosylation specificity.


Introduction
Plasmodium falciparum causes the most severe form of malaria in humans. Rapid constitutive growth and expansion of the parasite are highly dependent on the continuous synthesis of proteins in the cytosol and organellar compartments [1]. Indeed, Plasmodium contains three genomes: nuclear, apicoplast (a relic chloroplast) and mitochondrial. These genomes require dedicated translation machineries to function, even if translation has not yet been explicitly demonstrated in the mitochondria. The Plasmodium apicoplast is essential and is involved in the synthesis of fatty acids, isoprenoid precursors and heme [2]. It has a 35 kb circular genome encoding 30  demonstrates that the identity elements in Pf-apitRNA Tyr are unusually reduced in strength and number. These results reveal that the identity elements of the apicoplast tyrosine aminoacylation system are both distinct and minimalistic in comparison to those that have been conserved evolutionarily elsewhere.

Cloning and purification of P. falciparum apicoplast TyrRS
The genomic sequence of Pf-apiTyrRS was retrieved from PlasmoDB [20] by sequence homology with the human mitochondrial TyrRS (EAW88518.1, Hs-mitoTyrRS) and Thermus thermophilus TyrRS (AEG33811.1) [6]. The gene (PF3D7_1117500) codes for a 561 amino acid protein. The Pf-apiTyrRS gene was amplified by PCR from a P. falciparum cDNA library (provided by Dr. H. Vial, Montpellier), sequenced, and cloned into pQE30 (Qiagen) to produce a protein with a 6-histidine fusion tag at its N-terminus. This plasmid expresses the Pf-apiTyrRS recombinant protein without its N-terminal apicoplast targeting signal and covers amino acids 25 to 561 (S1 Fig).
In addition to cloning the endogenous P. falciparum nucleotide sequence of Pf-apiTyrRS (Endo), two additional nucleotide sequences, both encoding the same amino acid chain, were designed (S2 Fig): an optimized version (Opt) of the Pf-apiTyrRS gene based on Escherichia coli codon usage (designed by GenScript), and a harmonized version (Harm) of the gene designed with the ANACONDA software [21,22]. Both genes were synthesized by GenScript (https://www.genscript.com) and cloned into the pQE30 plasmid. Furthermore, a truncated variant of the Harm Pf-apiTyrRS gene, lacking the C-terminal S4-like domain (residues 461 to 561), was cloned into pQE70 with a C-terminal 6-His tag.
Overexpression of all recombinant proteins was performed at 18˚C overnight in LB medium containing 0.1 mg/mL ampicillin and 1 mM IPTG (isopropyl β-D-1-thiogalactopyranoside), and the 6-His-tagged proteins were purified on Ni-NTA resin according to the manufacturer's instructions (Qiagen). Purified enzymes were dialyzed against 25 mM HEPES-KOH pH 7.5, 25 mM KCl, 50% glycerol and were kept at -20˚C until use. Proteins were quantified through absorbance measurements and their enzymatic activities were assessed by in vitro aminoacylation of native E. coli tRNA Tyr. Gel filtration analysis was performed on a Superdex 200 Increase 10/300 GL column (GE Healthcare) in 50 mM potassium phosphate buffer pH 7.5, 150 mM KCl, 10% glycerol and 1 mM EDTA.

Sequence analysis
Sequence alignments were computed with T-Coffee [23] and CLUSTALW [24]. The secondary structure of Pf-apiTyrRS was predicted with the PredictProtein software [25]. The PlasmoAP algorithm [20] confirmed the presence of an apicoplast targeting signal and predicted the cleavage site. The E. coli codon usage database was from [26].

Procedures for structural analysis of free and TyrRS-complexed tRNAs
Lead and enzymatic probing were performed as in [28], with the following details. Lead probing: 1 μM of 5'-labeled Pf-apitRNA Tyr wild-type transcript (80,000 cpm) was incubated in 50 mM Tris-acetate pH 7.5, 5 mM magnesium acetate, 50 mM potassium acetate. A solution of Pb(OAc)2, freshly prepared in H2O, was added to reach final concentrations of 1, 3, 6 and 10 mM. The samples were incubated for 6 min at 25˚C.
All reactions were stopped by the addition of 20 μl of Stop Mix (0.6 M NaOAc, 4 mM EDTA, 0.1 mg/mL total tRNA, and 1 μg glycogen) and ethanol precipitated. The pellets were washed twice with 70% ethanol, vacuum-dried, dissolved in gel loading mix (90% formamide, 0.5% EDTA, 0.1% xylene cyanol and 0.1% bromophenol blue), heated 2 min at 90˚C, and then loaded on a 12% denaturing gel. In parallel, T1 nuclease and alkaline hydrolysis reactions were performed under denaturing conditions to accurately assign the bands in each gel.
Footprinting assays (10 μL) were performed under the same conditions as above in the absence or presence of 5.7 μM Pf-apiTyrRS. The tRNA/TyrRS complex was incubated for 6 min at 25˚C, before 0.2 U T1 or 5.4 U S1 (supplemented with 1 mM ZnCl2 for S1 cleavage) were added. Incubation was continued for 8 min at 25˚C and the reactions were stopped by phenol extraction. After precipitation, the pellets were treated as described above.

Identification of the P. falciparum apicoplast TyrRS gene
We identified the gene coding for the 561 amino acid Plasmodium Pf-apiTyrRS (PF3D7_1117500) containing an N-terminal catalytic domain, an anticodon binding domain at its C-terminus, and a putative signal sequence for apicoplast targeting [6] (Fig 1A). The Rossmann-fold-containing catalytic domain (amino acids 1-328) presents both class I aaRS-specific motifs (HNGL and KYSKS). As expected, the C-terminal domain of Pf-apiTyrRS presents the typical α-helical domain (amino acids 329-460) and the S4-like region (amino acids 461-561) that are both specific features found only in bacterial and mitochondrial TyrRSs (reviewed in [30]).
Fig 1. (A) The overall organization of Pf-apiTyrRS, compared to TyrRSs from humans (mitochondria and cytosol), bacteria, and archaea. Each structural domain is given in a specific color: blue for mitochondrial and apicoplast targeting signals, black for catalytic domains, green for the canonical α-helical anticodon-binding domains, purple for anticodon-binding domains homologous to TrpRS [31], and grey for additional C-terminal domains (S4-like in bacteria and organelles; EMAPII-like in vertebrates [32]). Plasmodium-specific idiosyncratic insertions are shown in yellow. Red dots indicate the position of the signature sequences present in the catalytic domains of class I aaRSs. (B) Coomassie-stained gel (loading control) and the corresponding Western blot show Pf-apiTyrRS expressed from the Plasmodium wild-type nucleotide sequence (Endo), the optimized gene sequence for E. coli expression (Opt), and the harmonized gene sequence (Harm). T stands for Total extract, and S stands for Soluble extract (supernatant of the centrifuged total extract). On the Coomassie-stained gel, overexpression of the Harm Pf-apiTyrRS is indicated with a red asterisk. Additional bands potentially correspond to degradation products. On the Western blot, 6-His tagged proteins were specifically detected with a mouse anti-
The PlasmoAP algorithm [20] that predicts apicoplast targeting signals indicated that the N-terminal extremity of Pf-apiTyrRS "very likely" contains an apicoplast targeting signal, and secondary structure predictions positioned this targeting sequence within an α-helix (S1 Fig). This information led us to delete the first 24 amino acids from the N-terminus in the recombinant Pf-apiTyrRS.
Pf-apiTyrRS is longer than its prokaryotic homologs because it contains two insertions of 19 and 56 amino acids in the catalytic and the anticodon-binding domains, respectively (Fig 1A). Sequence alignments with seven other Plasmodium apicoplast TyrRSs (P. reichenowi, P. vivax, P. knowlesi, P. gallinaceum, P. yoelii, P. chabaudi, P. berghei) revealed conserved locations for these insertions, while their sizes and sequences vary significantly (S1 Fig). The insertion located in the anticodon-binding domain is characterized by single amino acid repeats [33]. Indeed, this insertion is composed of 30% asparagine residues.

Production of a functional recombinant P. falciparum apicoplast TyrRS
Pf-apiTyrRS 25-561 could be expressed in E. coli directly from the P. falciparum wild-type nucleotide sequence (Endo), but the affinity-purified yield of soluble protein was poor (<1.5 mg protein per liter culture). Alternative strategies were used to improve the production of soluble Pf-apiTyrRS. Two different synthetic gene sequences, both encoding the same wild-type Pf-apiTyrRS, were cloned (S2 Fig). In one case (Opt), the coding DNA sequence was changed using standard codon optimization rules for expression in E. coli (GenScript). In the second case, the DNA sequence was "harmonized" (Harm) using ANACONDA, a bioinformatics application for gene primary structure analysis. This program uses statistical methods to analyze not only the codon usage but also the codon context (degree of association, context, and clustering) on a genomic scale [21,22]. In other words, it takes into account the rules governing the evolution of codon bias in P. falciparum to design a new nucleotide sequence adapted to the E. coli translational apparatus. The main differences between Opt and Harm were at the level of leucine codons, which were all substituted with CTA in the harmonized gene (S2 Fig); CTA is the rarest amongst the six leucine codons (0.385%) in E. coli [26].
Gene expression and protein purification using the Opt gene did not change the expression of the protein significantly compared to the endogenous Plasmodium DNA sequence (Endo); however, the solubility of Pf-apiTyrRS was reduced (Fig 1B) and purification yields were low (<0.4 mg protein per liter culture). The best expression yields were obtained with the Harm gene, which yielded nearly 3-fold more Pf-apiTyrRS after affinity purification (about 4 mg protein per liter culture) (S3A and S3B Fig). Furthermore, comparative aminoacylation assays (Fig 1C) using native E. coli tRNA Tyr demonstrated that the enzymes produced from these three plasmids were not functionally equivalent. Indeed, the Harm Pf-apiTyrRS is significantly more efficient in aminoacylation than the other two preparations. This observation indicates that gene harmonization not only increased the solubility and hence the purification yields of Pf-apiTyrRS, but also improved the correct folding of the recombinant protein.
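The leucine-codon difference described above can be illustrated with a minimal Python sketch. It only counts the leucine codons of an in-frame coding sequence and rewrites them all to CTA. It is not the ANACONDA algorithm, which also models codon context on a genomic scale, and the function names and the toy sequence are assumptions made for illustration, not data from this study.

```python
from collections import Counter

# The six standard leucine codons (DNA alphabet).
LEU_CODONS = {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"}


def split_codons(cds):
    """Split an in-frame coding sequence into a list of codons."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def leucine_codon_counts(cds):
    """Count how often each leucine codon occurs in the sequence."""
    return Counter(c for c in split_codons(cds) if c in LEU_CODONS)


def recode_leucines(cds, target="CTA"):
    """Replace every leucine codon with `target`; other codons are untouched."""
    return "".join(target if c in LEU_CODONS else c for c in split_codons(cds))


# Hypothetical toy sequence (Met-Leu-Lys-Leu-Leu-stop), not from Pf-apiTyrRS.
toy = "ATGTTAAAATTGCTGTAA"
print(leucine_codon_counts(toy))
print(recode_leucines(toy))
```

Because only synonymous codons are exchanged, the encoded amino acid chain is unchanged, which is the property shared by the Endo, Opt and Harm genes.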
Indeed, the purified Harm Pf-apiTyrRS 25-561 protein elutes as a major peak of about 160 kDa on a gel filtration column, suggesting that it forms the expected 118 kDa homodimer (S3C Fig). Thus, Pf-apiTyrRS 25-561 expressed from the harmonized construct was used to determine the kinetic parameters of all Pf-apitRNA Tyr mutants.
Pf-apiTyrRS displays an S4-like domain at its C-terminus, specific to bacterial and mitochondrial TyrRSs (Fig 1A). In general, the elimination of this domain increases solubility while decreasing the enzyme's affinity for tRNA Tyr (e.g. [34]), since recognition of the tRNA variable region is disrupted [35]. In contrast to these prokaryotic-type TyrRSs, the truncation of the S4-like domain of Pf-apiTyrRS led to an inactive enzyme that does not catalyze the first step of the aminoacylation reaction (tyrosine activation in the presence of ATP, as measured by ATP/PPi exchange assays (S3D Fig)). This result suggests that, unlike all other known TyrRSs, deletion of the C-terminal domain of Pf-apiTyrRS destabilizes the folding of the N-terminal catalytic site.

Sequence peculiarities in P. falciparum apicoplast tRNA Tyr
The natural substrate for Pf-apiTyrRS is the only tRNA Tyr encoded by the apicoplast genome (Fig 2A and 2B). Pf-apitRNA Tyr is composed of 68% A/U residues (both in stems and loops), reflecting the A/T-rich composition of the P. falciparum genome [3]. Pf-apitRNA Tyr (i) displays the phylogenetically conserved A 73 discriminator base; (ii) contains an α3-β4 D-loop with a noncanonical G 13 -A 22 base pair in the D-stem; and (iii) is a class 2 tRNA (like bacterial tRNA Tyr s), defined as a tRNA with a large variable region (Fig 2B). Interestingly, Pf-tRNA Tyr is characterized by the presence of an A 1 -U 72 base pair at the top of the acceptor stem. This base pair is conserved in Plasmodium apicoplast tRNA Tyr sequences referenced in EupathDB [20] (some of which are displayed in Fig 2A), while tRNA Tyr from bacteria, mitochondria, and chloroplasts are characterized by a G 1 -C 72 base pair and tRNA Tyr from archaea and eukarya contain a C 1 -G 72 base pair (Fig 2C).
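A figure such as the 68% A/U content quoted above is a simple base count over the tRNA sequence. The sketch below shows one way this could be computed; the function name and the 10-nucleotide example are hypothetical, and the real Pf-apitRNA Tyr sequence is not reproduced here.

```python
def au_fraction(rna):
    """Return the fraction of A and U residues in an RNA sequence."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    if not rna:
        raise ValueError("empty sequence")
    return sum(base in ("A", "U") for base in rna) / len(rna)


# Hypothetical 10-nt fragment, used only to exercise the function.
example = "AUUAGCAUUA"
print(f"{au_fraction(example):.0%}")  # prints 80%
```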

Cloverleaf folding of P. falciparum apicoplast tRNA Tyr
The 87 nucleotide Pf-apitRNA Tyr was produced as a transcript lacking modified bases. Probing experiments were performed with Pb(OAc)2 and nucleases (Fig 3A) to verify that the high proportion of A and U residues, together with the absence of posttranscriptional modifications, does not hinder the formation of the canonical cloverleaf fold. S1 and T1 nucleases and Pb(OAc)2 are specific for single-stranded regions, whereas the V1 nuclease recognizes double-stranded and highly structured regions. In Fig 3A, the similarities between RNase T1 profiles in native (T1) and denaturing (G) conditions suggest that the Pf-tRNA Tyr transcript is flexible. This hypothesis was confirmed by the pattern of lead cleavage positions, which occur throughout the sequence, with the strongest cuts concentrated in the loops and the variable region.
As expected for a cloverleaf fold, the strongest S1 and V1 cuts are only found in the anticodon-loop and the anticodon-stem, respectively, indicating that this portion of the transcript is indeed folded. Moreover, the moderate accessibilities of A 57 and U 20A to V1 cleavage confirm the presence of tertiary interactions between the T- and D-loops, whereas conflicting V1 and S1 cuts in the variable region suggest a fluctuating structure in this domain. Altogether, these probing data are in agreement with a cloverleaf fold and indicate intrinsic structural plasticity, reminiscent of what has been observed for some mitochondrial tRNAs [28,45]. Moreover, the correct folding of Pf-apitRNA Tyr was also confirmed by aminoacylation assays with the homologous Harm Pf-apiTyrRS, since tyrosylation of Pf-apitRNA Tyr occurred with the same catalytic efficiency as with the native E. coli tRNA Tyr (Table 1).

Extensive recognition of P. falciparum apicoplast tRNA Tyr by its cognate TyrRS
S1 and V1 nucleases were used in footprinting experiments (Fig 3B) to detect the protected regions of Pf-apitRNA Tyr in the presence of Pf-apiTyrRS. The anticodon-loop and the variable domain of the Pf-apitRNA Tyr transcript are both strongly protected from RNase cleavage, consistent with what has been observed in the crystallographic structure of the T. thermophilus TyrRS/tRNA Tyr complex [35]. However, protection patterns detected in the Pf-apitRNA Tyr D-domain are dissimilar to the bacterial recognition pattern [35]. The addition of Pf-apiTyrRS protected the Pf-apitRNA Tyr D-loop from nuclease cleavage, as a consequence either of a direct contact with the synthetase, or of an indirect effect due to a conformational change in the tRNA bound to the enzyme.
Fig 2. ... is indicated. Notice the non-canonical G 13 -A 22 base pair located at the end of the D-arm. The tRNA is numbered according to [36]. (C) 2D schematic structures of tRNA Tyr showing the residues involved in tyrosylation in different phyla [19,30,37-44]. The residues involved in tyrosine identity are explicitly given in uppercase. The strengths of the tyrosine identity elements are indicated by colors: red (loss in catalytic efficiency >100-fold compared to the wild-type transcript), orange (loss between 10- and 100-fold) and green (loss between 5- and 10-fold). Lowercase letters are given to highlight conservation of the residues in the anticodon triplet and at positions 1-72 and 73 in the acceptor stem, despite their exclusion from the identity set. The question mark shows that the importance of position 36 has not been tested in bacteria. https://doi.org/10.1371/journal.pone.0209805.g002
Fig 3. ... [36]. Nucleotides at the 3'- and 5'-ends, which cannot be analyzed, are indicated by dotted lines on the tRNA structure. (B) Autoradiogram corresponding to footprinting experiments: the 5'-labeled Pf-apitRNA Tyr transcript was incubated with S1 and T1 nucleases in the absence (-) or presence (+) of Pf-apiTyrRS. The strongest RNase protections, which confirm the sites of interaction between the tRNA Tyr transcript and Pf-apiTyrRS, are framed in orange. Controls (C) were performed without nucleases. G and A indicate T1 and alkaline ladders, respectively. Pf-apitRNA Tyr residues that are protected from nucleases in the presence of Pf-apiTyrRS are indicated in orange (strong protection) and yellow (weak protection) on the Pf-apitRNA Tyr cloverleaf structure. https://doi.org/10.1371/journal.pone.0209805.g003

Looking for identity elements in P. falciparum apicoplast tRNA Tyr
TyrRS displays species-specific tRNA recognition (summarized in Fig 2C with the references therein). The tRNA Tyr A 73 discriminator base and the G 34 anticodon nucleotide are universally important for TyrRS recognition. Anticodon nucleotides U 35 and A 36 also contribute to tyrosylation identity, yet with varying strengths in eukarya, bacteria, and mitochondria, and are marginal in archaea. Notably, the G 1 -C 72 identity base-pair, located at the top of the acceptor stem in bacterial tRNA Tyr, is replaced by a C 1 -G 72 identity base pair in eukaryal tRNA Tyr. Finally, the large variable region is unique to bacterial tRNA Tyr and is essential for tyrosylation.
We chose to elucidate the identity determinant set for Pf-apitRNA Tyr with Pf-apiTyrRS 25-561. Eighteen mutants were designed to test the extremity of the acceptor stem (discriminator base 73 and the 1-72 base-pair), the anticodon triplet, and the variable region for their importance for tyrosylation (Fig 4). The kinetic parameters for tyrosylation were determined and compared to those obtained for the wild-type Pf-apitRNA Tyr transcript (Table 1).
Acceptor stem. Unexpectedly, replacement of the A 73 discriminator base by G, C or U reduced tyrosylation only 2.2 to 4.2-fold, suggesting that these mutations do not affect the recognition of tRNA Tyr by Pf-apiTyrRS, but slightly modulate the structure of the tRNA near the catalytic site. Indeed, it has been shown that a pyrimidine residue at position 73 might influence the stability of the acceptor stem extremity [46]. Unlike the vast majority of tRNA Tyr isoacceptors, where the first base pair in the acceptor stem is G 1 -C 72 (in bacteria) or C 1 -G 72 (in archaea and eukarya), all Plasmodia apicoplast tRNA Tyr contain a conserved A 1 -U 72 (Fig 2A). A 1 -U 72 was therefore changed to G-C and C-G (Fig 4 and Table 1). Pf-apiTyrRS 25-561 aminoacylates the wild-type transcript, the G 1 -C 72 and the C 1 -G 72 variants with similar kinetic values, indicating that the first base pair of the acceptor stem is not part of this system's tyrosine identity set, providing the second example (with the human mitochondrial TyrRS) of a TyrRS lacking specificity to base pair 1-72 [44]. We hypothesized that a different base pair in the acceptor stem might have replaced 1-72 in the identity set, so we mutated positions 2-71 and 3-70 (Fig 4 and Table 1). The recombinant Pf-apiTyrRS aminoacylated these variants with kinetic parameters similar to those of the wild-type transcript, with a loss in efficiency of only 2.6-fold. Together, these data show the absence of tyrosine identity elements in the acceptor stem and suggest that Pf-apiTyrRS only recognizes the ribose-phosphate backbone in this region of the tRNA molecule.
In the anticodon triplet, only the variant where C 34 replaced G 34 showed a substantial loss in tyrosylation (36-fold). Mutations at position 35 affected catalytic efficiency by 1 to 7.7-fold and mutations at position 36 had virtually no effect on activity (3.0-fold) (Table 1). Moreover, converting the tyrosine anticodon to a serine GCU anticodon (mutant Ser C 35 U 36) only leads to a moderate but significant decrease of 19.1-fold (Table 1).
Variable region and D-loop. Shortening the variable region (ΔVr) decreased the catalytic efficiency by a factor of 7.1, compared to wild-type Pf-apitRNA Tyr (Table 1). However, insertion of the variable region of Pf-apitRNA Ser (SerVr) reduced this effect to only a 2.7-fold reduction in activity, suggesting that our ΔVr deletion caused a change in tRNA structure rather than a direct impact on enzyme recognition. In T. thermophilus tRNA Tyr, A 20B, located in the D-loop, interacts with U 47i in the variable domain, which provides a precise orientation of the variable region for its optimal recognition by TyrRS [35]. Since the Pf-apitRNA Tyr sequence displays both A 20B and U 47i, the same tertiary interaction could form (Fig 3A). Replacement of A 20B by U 20B in the Pf-apitRNA Tyr D-loop should, therefore, eliminate this interaction. However, this mutant showed no loss in tyrosylation activity (1.6-fold) (Table 1). Finally, both the D-loop and the Pf-apitRNA Tyr variable region were changed to the Pf-apitRNA Ser (SerVr+D-l) sequences. In contrast to the long variable region in Pf-apitRNA Tyr, the long variable region of Pf-apitRNA Ser GCU exhibits eight base pairs and potentially a specific tertiary interaction between G 45 -C 48a and U 20B, determining its spatial orientation as in T. thermophilus [35] (S4 Fig). This replacement had no significant effect on tyrosylation efficiency (2.1-fold). Together these mutants demonstrate that, in the apicoplast, the presence of a long variable region plays a weak but significant role in tyrosylation identity (ΔVr, 7.1-fold), but neither the sequence nor the orientation of this long variable region is involved.
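The fold losses quoted in this section can be read against the color thresholds of the Fig 2C legend. The sketch below is only an illustration of that reading. It assumes the usual definition of the loss as the catalytic efficiency (kcat/KM) of the wild-type transcript divided by that of the mutant, and the function names and the labels "strong", "moderate" and "weak" are assumptions introduced here rather than terms defined in Table 1.

```python
def fold_loss(eff_wt, eff_mutant):
    """Loss in catalytic efficiency: wild-type kcat/KM over mutant kcat/KM."""
    if eff_wt <= 0 or eff_mutant <= 0:
        raise ValueError("catalytic efficiencies must be positive")
    return eff_wt / eff_mutant


def identity_strength(loss):
    """Bin a fold loss with the thresholds given in the Fig 2C legend."""
    if loss > 100:
        return "strong (red)"
    if loss >= 10:
        return "moderate (orange)"
    if loss >= 5:
        return "weak (green)"
    return "not an identity element"


# Fold losses reported in the text for selected Pf-apitRNA Tyr variants.
reported = {"C34": 36.0, "Ser C35 U36": 19.1, "dVr": 7.1, "SerVr": 2.7, "U20B": 1.6}
for name, loss in reported.items():
    print(f"{name}: {loss}-fold, {identity_strength(loss)}")
```

Under these thresholds no variant tested here reaches the >100-fold class, which is the sense in which the apicoplast identity set is described as weak.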

Expression of Pf-apiTyrRS
Plasmodium aaRSs are longer than their homologs because they contain many peculiar, sequence-repetitive insertions [3]. Neither the synthesis nor the functions of these insertions are understood [33]; the presence of long single amino acid repeats often reduces the solubility of the recombinant protein, but they cannot be removed without yielding an inactive protein (e.g. [47]). Insertions are more frequent and more extended in apicoplast than in cytosolic Plasmodium aaRSs. Moreover, it has been suggested that the translation of these additional sequences locally slows ribosomes and could be used to regulate cotranslational folding of proteins [6]. The presence of these insertions and the challenges they introduce may explain why apicoplast aaRSs are poorly studied despite their interest for the development of new anti-malarial drugs [13][14][15]. To date, only four apicoplast aaRSs from P. falciparum have been cloned, expressed, and characterized, namely the lysyl- (LysRS) [48], glutamyl- (GluRS) [49], tryptophanyl- (TrpRS) [15], and dual-targeted cysteinyl- (CysRS) [8] tRNA synthetases. CysRS and GluRS do not contain insertions; thus LysRS and TrpRS are the only insertion-containing apicoplast aaRSs that have been studied to date.
Translation is influenced by the choice of synonymous codons, which specify the same amino acid but differ in their decoding properties [50]. Thus, the primary structure of mRNA contains information that affects translation efficiency. The dominant model is that some codons or codon combinations reduce the decoding rate of ribosomes and thereby isolate the synthesis and folding of well-defined protein domains (e.g. [51][52][53]). The availability of tRNAs that decode synonymous codons, their requirement for wobble decoding, as well as interactions between adjacent codons play fundamental roles in this model. Codon usage and the number of tRNA genes in Plasmodia are very different from those of E. coli and thus make the expression of Plasmodium multidomain proteins challenging in the E. coli heterologous expression system. In this study, expression of recombinant Pf-apiTyrRS directly from the Plasmodium mRNA sequence was indeed ineffective. The optimization of synonymous codons involves the selection of optimal codons decoded only by abundant tRNAs in the expression host and thus the simultaneous minimization of rare codons [50,54]. This approach further reduced the solubility of the produced Pf-apiTyrRS. However, the use of harmonized codons, designed by the ANACONDA algorithm [21,22], increased the synthesis, solubility, and enzymatic activity of the purified recombinant protein. The main difference between both the Endo and the Opt genes compared to the Harm gene is the systematic replacement of leucine codons (TTA, TTG, CTG, CTT, and CTC) with the rarest leucine codon used in E. coli translation (CTA). As the ribosome slows when it encounters rare codons it may help the protein to fold appropriately, thereby increasing the yield of soluble proteins. Here, the Harm gene, containing a combination of fast and slow codons, facilitates co-translational folding and thus the production of a biologically active Pf-apiTyrRS. 
This result suggests that such an approach could be used to overcome the difficulties encountered when expressing Plasmodium multidomain proteins.

Evolution of tyrosine identity
Experimental work on tyrosylation systems from different species has established the evolution of the tyrosine identity set (Fig 2C). The A 73 discriminator base and the G 34 and U 35 anticodon bases were determined as common identity elements in tRNA Tyr of bacteria, archaea, eukarya, and mitochondria (summarized in [30]). In addition, the 1-72 base-pair at the end of the acceptor stem is critical to archaeal and eukaryal tyrosylation systems, whereas the long variable region is required only for the correct recognition of bacterial tRNAs (Fig 2C).
From our mutational analysis, the Plasmodium apicoplast tyrosylation system retains only one moderate (G 34 ) and two weak (U 35 and the long variable region) identity elements to ensure specific aminoacylation. Unlike other tyrosylation systems, this identity set does not include residues in the tRNA acceptor arm. Indeed, despite its conservation in all Plasmodium apitRNA Tyr , the first A 1 -U 72 base pair was not involved in tyrosylation, a situation already observed for the human mitochondrial tyrosine system [44]; and the A 73 discriminator base, common to all tyrosylation identity sets, does not influence tyrosylation in the Plasmodium apicoplast. Besides, neither the structure nor the orientation of the variable region is sufficient to prevent apicoplast tyrosylation. In the Plasmodium apicoplast system, insertion of the tRNA Ser variable region into Pf-apitRNA Tyr (mutants SerVr and SerVr+D-l) does not significantly reduce its recognition by Pf-apiTyrRS (2.7 and 2.1-fold, Table 1), while swapping the sequence of the variable region of E. coli tRNA Tyr with that of E. coli tRNA Ser decreases tyrosylation by more than 300-fold [40].
The only critical effect in the anticodon was obtained when G 34 was mutated, which led to a loss in efficiency of only 36-fold, an unprecedented situation amongst tyrosylation systems. However, the identity of tRNAs is not only dictated by the presence of sets of positive identity elements allowing recognition by cognate synthetases, but also by negative signals that prevent the interaction of tRNAs with non-cognate synthetases. This scenario could play an important role in the Plasmodium apicoplast. Of the 27 tRNA gene sequences encoded by this genome, four contain a G 34 T 35 sequence (Pf-apitRNA Tyr GTA, tRNA Asn GTT, tRNA Asp GTC, tRNA His GTG) and two contain a G 34 and a long variable region (Pf-apitRNA Tyr GTA and tRNA Ser GCT) (S4 Fig). Thus, the non-cognate asparagine, aspartate, histidine, and serine tRNAs must display features prohibiting recognition and tyrosylation by Pf-apiTyrRS. The transcription method used in the present study yields tRNAs lacking modified nucleotides, which may be a disadvantage if post-transcriptional modifications of native tRNAs play such a negative role in identity [17]. We can only predict that some modifications may be present in the apicoplast when the modification enzymes have been annotated in the Plasmodium genome [20]. For example, three putative queuine tRNA-ribosyltransferases are found in EupathDB, one of which (PF3D7_1242200) is predicted to be targeted to the apicoplast. Queuosine and its derivatives are found in bacterial and eukaryal tRNAs with a G 34 [55], and guarantee fidelity and efficiency of translation [56]. The presence of this modifying enzyme in the apicoplast suggests that local tyrosine, histidine, aspartate, and asparagine tRNAs can be modified at position 34. However, nothing is known about post-transcriptional modifications of Plasmodium apicoplast tRNAs, or whether idiosyncratic modification patterns can control aminoacylation specificities.
We propose that the high A/T content of the Plasmodium apicoplast genome significantly reduces the potential for identity nucleotide combinations in apicoplast tRNAs. In the specific case of Pf-apitRNA Tyr, this led to the conservation of a minimal identity set with only three weak identity features positively recognized by the Pf-apiTyrRS. It is reasonable to ask whether these elements are sufficient to drive tyrosylation efficiently in vivo. Tyrosylation specificity could be mainly maintained by the presence of negative determinants (sequence/structural features or post-transcriptional modifications), which prevent mischarging of other Pf-apitRNAs by Pf-apiTyrRS.
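The overlap argument above can be made explicit with a short sketch that lists which of the named apicoplast tRNAs share a positive tyrosine feature with Pf-apitRNA Tyr. Only the five tRNAs cited in the text are included, with their anticodons written as gene sequences. The record layout and variable names are assumptions made for illustration, and the long-variable-region flags follow the statement that only the Tyr and Ser tRNAs combine a G 34 with a long variable region.

```python
from typing import NamedTuple


class TRNA(NamedTuple):
    name: str
    anticodon: str       # gene (DNA) sequence of positions 34-36
    long_variable: bool  # True if the tRNA has a long variable region


# Only the five apicoplast tRNAs named in the text; the other 22 are omitted.
trnas = [
    TRNA("Tyr", "GTA", True),
    TRNA("Asn", "GTT", False),
    TRNA("Asp", "GTC", False),
    TRNA("His", "GTG", False),
    TRNA("Ser", "GCT", True),
]

# tRNAs carrying G34 followed by T35 in the gene sequence.
g34_t35 = [t.name for t in trnas if t.anticodon.startswith("GT")]
# tRNAs carrying G34 together with a long variable region.
g34_long_vr = [t.name for t in trnas if t.anticodon[0] == "G" and t.long_variable]
# Non-cognate tRNAs sharing at least one of these feature pairs with tRNA Tyr.
needs_negative_signal = sorted((set(g34_t35) | set(g34_long_vr)) - {"Tyr"})

print(g34_t35)
print(g34_long_vr)
print(needs_negative_signal)
```

The last list corresponds to the asparagine, aspartate, histidine and serine tRNAs, which the text argues must carry features that prevent tyrosylation.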

S1 Fig. Multiple sequence alignments of eight Plasmodium apiTyrRSs and comparison with the human mitochondrial TyrRS. Protein sequences are from EupathDB: P. falciparum_3D7 (PF3D7_1117500), P. reichenowi_CDC (PRCDC_1115900), P. vivax_P01 (PVP01_0918100), P. knowlesi_strain_H (PKNH_0915200), P. gallinaceum_8A (PGAL8A_00344100), P. yoelii_yoelii_YM (PYYM_0931900), P. chabaudi_chabaudi (PCHAS_0913800) and P. berghei_ANKA (PBANKA_0930500). The color code follows that of Fig 1A: residues belonging to the catalytic domain are in black with class I signature motifs highlighted in red; residues from the anticodon-binding domain are in green; the S4-like domain is in grey; and the two Plasmodium-specific insertions are in yellow. The starting position of recombinant Pf-apiTyrRS is indicated in cyan. Alignments were performed with T-Coffee [23], and the β-sheets and α-helices of Pf-apiTyrRS predicted by the PredictProtein software [25] are indicated with green arrows and red rectangles, respectively. (DOCX)
S2 Fig. DNA sequences encoding Pf-apiTyrRS. Alignment of nucleotide sequences corresponding to the endogenous (Endo), optimized (Opt) and harmonized (Harm) gene sequences encoding Pf-apiTyrRS. The amino acid sequence of the protein is in bold. All leucine (L) codons are highlighted: red indicates codons whose usage in E. coli is higher than 1% (TTG, TTA, CTG, CTT and CTC) and green indicates the only rare leucine codon (CTA, 0.385%) [26].
[57], in the presence of tyrosine (2 mM) and radiolabeled [32P]PPi (20 cpm/pmol, this high specific activity was used to detect low exchange activities) and 0.8 μM Pf-apiTyrRS 25