Synthetic Biology of Proteins: Tuning GFPs Folding and Stability with Fluoroproline

Background: Proline residues affect protein folding and stability via cis/trans isomerization of peptide bonds and via the Cγ-exo or Cγ-endo puckering of their pyrrolidine rings. Both peptide bond conformation and puckering propensity can be manipulated by a suitable choice of ring substituents, e.g. Cγ-fluorination. Synthetic chemistry has routinely exploited ring-substituted proline analogs to change, modulate or control the folding and stability of peptides.
Methodology/Principal Findings: To transfer this synthetic strategy to complex proteins, the ten proline residues of enhanced green fluorescent protein (EGFP) were globally replaced by (4R)- and (4S)-fluoroprolines (FPro). By this approach, we expected to affect the cis/trans peptidyl-proline bond isomerization and pyrrolidine ring puckering, which are responsible for the slow folding of this protein. Both protein variants were expressed at levels comparable to the parent protein, but (4R)-FPro-EGFP yielded irreversibly unfolded protein in inclusion bodies, whereas (4S)-FPro-EGFP yielded a soluble fluorescent protein. After thermal denaturation, this variant refolds at significantly higher rates than the parent EGFP. Comparative inspection of the X-ray structures of EGFP and (4S)-FPro-EGFP allowed us to correlate the significantly improved refolding with the Cγ-endo puckering of the pyrrolidine rings, which is favored by 4S-fluorination, and to a lesser extent with the cis/trans isomerization of the prolines.
Conclusions/Significance: We discovered that the folding rates and stability of GFP are affected to a lesser extent by cis/trans isomerization of the proline bonds than by the puckering of the pyrrolidine rings. In the Cγ-endo conformation the fluorine atoms are positioned in the structural context of GFP such that a network of favorable local interactions is established. From these results, the combined use of synthetic amino acids together with detailed structural knowledge and existing protein engineering methods can be envisioned as a promising strategy for the design of complex tailor-made proteins, and even cellular structures, with properties superior to those of the native forms.


Introduction
Enhanced green fluorescent protein (EGFP) is the Phe64Leu/Ser65Thr mutant of GFP [1] (Fig. 1A) and one of the most widely used autofluorescent tags in molecular and cell biology [2]. GFPs are frequently used as reporters for both in vitro and in vivo protein folding, but their (re)folding rates are known to be very slow (10-1000 s) [2]. Therefore, an improvement of the folding properties still represents a challenge for the design and engineering of fast folding autofluorescent proteins. GFPs contain ten proline residues in their primary sequence. These prolines affect the folding rates in a decisive manner because of their known slow cis/trans isomerization [3][4][5][6][7][8][9]. We have therefore focused the present study on the role of these proline residues in the GFP folding process.
Among the twenty naturally occurring amino acids, proline occupies a special place. Its five-membered pyrrolidine structure causes an exceptional conformational rigidity, which is responsible for the α-helix or β-sheet disrupting properties of this residue in proteins. More importantly, cis/trans isomerization of peptidyl-proline bonds is one of the rate-determining steps in protein folding [6,10]. The pyrrolidine ring of proline adopts two alternative conformations that differ in the position of the Cγ atom relative to the plane of the ring. These are referred to as either Cγ-exo or Cγ-endo pucker ([11,12] and references therein). The cis and trans peptidyl-proline bond conformation and the Cγ-exo and Cγ-endo pucker of the pyrrolidine ring are correlated properties in proteins [11,12], which can be affected by appropriate ring substituents such as Cγ (C-4) fluorine atoms. Indeed, (2S, 4R)-4-fluoroproline ((4R)-FPro) (Fig. 1B) favors by stereoelectronic effects the trans conformation and Cγ-exo puckering, while the epimeric (2S, 4S)-4-fluoroproline ((4S)-FPro) (Fig. 1B) promotes the cis conformation and Cγ-endo puckering [13,14]. These properties were exploited for the synthesis of hyperstable collagen triple helices by replacing the hydroxyproline residues with (4R)-FPro [15,16]. Conversely, the folding rates of the pseudo-wildtype barstar C40A/C82A/P27A mutant [17] were enhanced and its structure stabilized by residue-specific replacement of the single Pro48 residue with (4S)-FPro, which favors its cis conformation [13,18]. Similarly, the folding rates of the N-terminal domain of minicollagen from Hydra nematocysts, which contains a single cis Pro bond, were significantly and oppositely affected by (4R)- or (4S)-FPro [19].
Based on these earlier findings, it was reasonable to expect a marked effect of the two stereochemically distinct fluoroprolines, (4R)-FPro and (4S)-FPro, on the folding and stability of EGFP, in which 9 of the 10 Pro residues are involved in trans peptide bonds and only one (Pro89) in a cis peptide bond [20]. Upon replacement of all Pro residues in EGFP by either (4R)-FPro or (4S)-FPro we were not only able to control protein folding, but also to dissect the contributions of various factors to the folding of a complex protein molecule.
For a comparative analysis of the folding properties of EGFP and (4S)-FPro-EGFP, the proteins were unfolded in boiling 8 M urea and then refolded at room temperature after 100-fold dilution into buffer. Refolding kinetics were monitored fluorometrically over a time period of at least 30 min, and the refolding efficiency was assessed after 24 h incubation under non-denaturing conditions. (4S)-FPro-EGFP recovered more than 95% of the fluorescence it had before denaturation (Fig. 3A), whereas the parent EGFP regained only up to 60% of its initial fluorescence (Fig. 3B). The refolding kinetics of (4S)-FPro-EGFP and EGFP (Fig. 3C) show an initial fast phase with rate constants of 3.01 × 10⁻² s⁻¹ and 1.41 × 10⁻² s⁻¹, respectively, that is followed by a slower refolding phase (rate constants 0.36 × 10⁻² s⁻¹ and 0.15 × 10⁻² s⁻¹, respectively; Fig. 3C). Surprisingly, (4S)-FPro-EGFP exhibited superior refolding properties when compared to the parent EGFP, as its rate constants are more than twice those of the parent EGFP in both phases (2.1-fold in the fast phase). A 'superfolder' GFP mutant has been reported by Pédelacq et al. [24], which likewise refolds after thermal denaturation in a two-step process, with an initial fast rate of 5.0 × 10⁻¹ s⁻¹, about one order of magnitude faster than that of (4S)-FPro-EGFP. This GFP mutant was developed from an already well-folding 'cycle 3' GFP mutant, and relative to this parent mutant a 3.5-fold enhanced folding rate was achieved.
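The rate constants quoted above can be compared directly. The following Python sketch computes the acceleration of each phase and the corresponding half-times, and evaluates a generic two-phase recovery curve. The double-exponential form and the equal phase amplitudes are illustrative assumptions made here; the equations actually used for fitting are those of ref. [24] and are not reproduced in this sketch.

```python
import math

# Rate constants (s^-1) as quoted in the text for the two refolding phases.
RATE_CONSTANTS = {
    "(4S)-FPro-EGFP": {"fast": 3.01e-2, "slow": 0.36e-2},
    "EGFP": {"fast": 1.41e-2, "slow": 0.15e-2},
}


def half_time(k):
    """Half-time (s) of a first-order process with rate constant k (s^-1)."""
    return math.log(2) / k


def fold_acceleration(phase):
    """Ratio of the variant's rate constant to that of the parent EGFP."""
    variant = RATE_CONSTANTS["(4S)-FPro-EGFP"][phase]
    parent = RATE_CONSTANTS["EGFP"][phase]
    return variant / parent


def recovered_fraction(t, k_fast, k_slow, fast_amplitude=0.5):
    """Fraction of the final fluorescence regained after t seconds.

    ASSUMPTION: two parallel first-order phases. fast_amplitude=0.5 is a
    hypothetical placeholder, not a value reported in the study.
    """
    slow_amplitude = 1.0 - fast_amplitude
    return (1.0
            - fast_amplitude * math.exp(-k_fast * t)
            - slow_amplitude * math.exp(-k_slow * t))


if __name__ == "__main__":
    for phase in ("fast", "slow"):
        print(f"{phase} phase: {fold_acceleration(phase):.2f}-fold faster")
    for name, k in RATE_CONSTANTS.items():
        print(f"{name}: t1/2 fast = {half_time(k['fast']):.0f} s, "
              f"t1/2 slow = {half_time(k['slow']):.0f} s")
```

With the quoted constants, the ratio of the rate constants comes out at about 2.1 for the fast phase and 2.4 for the slow phase.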
From the results it is evident that, in contrast to our expectation, incorporation of (4R)-FPro into EGFP, which should favor the trans conformation of 9 Pro bonds, interferes with correct protein folding, whereas (4S)-FPro, with its opposite effect, obviously does not. (4S)-FPro is accommodated by the protein structure at least as well as Pro and it enhances EGFP folding rates, whereas incorporation of (4R)-FPro apparently leads to effects that exceed the plasticity of the protein structure, and thus to irreversibly unfolded protein. A similar observation was reported recently by the group of Tirrell [25]. Indeed, global replacement of the leucine residues in GFP with 5,5,5-trifluoroleucine resulted in unfolded protein, and proper folding could only be restored after evolution of a mutant GFP that was able to accommodate the fluorinated residues [25]. In contrast to our variant and Pédelacq's superfolding GFP, the physical and spectroscopic properties of the fluorinated GFP mutant evolved by Tirrell's group were not superior to those of the parent GFP [25].
An inspection of the EGFP structure (PDB entry: 1EMG [20]) reveals that all Pro residues except Pro89 are involved in trans peptide bonds. However, the resolution of the structure does not allow unambiguous assignment of the proline puckers. Indeed, in the process of 3D-structure elucidation the Cγ-exo or Cγ-endo puckering attracts little attention due to its low relevance for the overall crystallographic data quality. In folded proteins ~5% of peptidyl-Pro bonds are in the cis conformation, as derived by inspection of the protein structure database ([26] and references therein). In unstructured polypeptide chains the content of cis peptidyl-Pro bonds can reach significantly higher values, particularly with aromatic amino acids directly preceding the Pro residues [9,26,27]. It is well known that trans/cis isomerization dramatically affects the folding kinetics of proteins with native cis peptidyl-Pro bonds [5][6][7][8][9][28]. Correspondingly, the fast initial refolding phase of EGFP should involve molecules with the Met88-Pro89 bond already in the cis conformation, whereas the slow phase should originate from denatured protein molecules with this peptide bond in trans [29]. In our (4S)-FPro-EGFP variant, the cis conformation of the Met88-(4S)-FPro89 bond is favored, which should accelerate folding. At the other nine positions, however, the (4S)-FPro residues were expected to disfavor the native trans conformation and thus to impair refolding, in contrast to the experimental findings. Obviously, the enhanced refolding rates of (4S)-FPro-EGFP have to originate from other factors.
The crystal structure of (4S)-FPro-EGFP (PDB entry: 2Q6P) solved at 2.1 Å resolution (Fig. 2; refinement statistics are reported in Table 1; for details on crystallization conditions see Materials and Methods) confirmed that incorporation of the 10 (4S)-FPro residues did not affect the overall protein fold. All (4S)-FPro residues display Cγ-endo puckered pyrrolidine rings apart from Pro56, which adopts a Cγ-exo configuration. Indeed, the fluoroprolines are well defined and characterized by low B-factors indicating rigid local conformations in the protein matrix. As outlined above, (4S)-fluorination of Pro promotes Cγ-endo puckering. Apparently, such spatial display with preferred Cγ-endo puckering of 9 out of 10 Pro residues dramatically improves the folding properties.
Among the 10 Pro residues, five (13, 75, 89, 192 and 211) are surface-exposed in EGFP, one is partially exposed (187) and the other residues are buried in the protein core (54, 56, 58 and 196). Fluorination of the buried residues increases their hydrophobicity and thus stabilizes the folded protein. We observed that the fluorinated EGFP was less prone to aggregation over time and that the corresponding samples crystallized faster (overnight) than those of the parent protein (a few days). Three of the buried proline residues are located in the characteristic proline-rich pentapeptide (4S)-FPro54-Val55-(4S)-FPro56-Trp57-(4S)-FPro58 (PVPWP; Fig. 4). The average B-factors for the prolines in the PVPWP motif as well as of the chromophore atoms are generally low (~12 Å² in (4S)-FPro-EGFP). Furthermore, in the crystalline state of (4S)-FPro-EGFP the residues neighboring the fluorinated PVPWP motif exhibit lower average B-factors (~5-7 Å²) as well.
The presence of the PVPWP pentapeptide in the GFP sequence has long been recognized [30]; however, its significance is still unclear. Searching different protein databases (SwissProt, NCBI databases) we found the PVPWP motif in proteins as different as cytochromes and eukaryotic voltage-activated potassium channels. Furthermore, we observed that the PVPWP motif is crucial for EGFP function, since site-directed mutagenesis of Val55 abolishes protein fluorescence (unpublished data). Similarly, Trp57 cannot be replaced by any of the other 19 amino acids [31]. Thus, we speculate that the function of this proline-rich pentapeptide in GFP is to control the spatial orientation of the relatively bulky hydrophobic Val55 and Trp57 side chains. This orientation is required for protecting the fluorophore from collisional quenching, e.g., by oxygen or other diffusible ligands.
The (4S) H→F replacement in Pro residues endows the pyrrolidine rings with large dipole moments because of the highly polar C-F bonds. This may promote strong dipole interactions in the local environments with polar groups such as amide, hydroxy or carbonyl groups. Indeed, 12 new interactions were detected in the (4S)-FPro-EGFP structure that were not present in EGFP (see Fig. 5). The majority of the fluorine atoms in (4S)-FPro-EGFP are involved in interactions with hydrogen atoms from neighboring backbone -NH- groups on their 'own' strand or on strands in the near vicinity (Fig. 5A-F). Only for the 4S-fluorine atoms at positions 187 and 192 could direct interactions not be detected. As outlined above, (4S)-FPro56 is the only fluoroproline having a Cγ-exo pucker. This puckering directs the (4S)-fluorine atom towards an unfavorable position, as it is involved in a repulsive interaction (3.07 Å) with the backbone carbonyl group of Asn153 on the neighboring strand (Fig. 5B). However, the destabilizing effect of this repulsion is apparently largely outweighed by the other stabilizing interactions. The crystallographic distances detected in the (4S)-FPro-EGFP structure are well compatible with C-F···H-N/O electrostatic interactions, which are more favorable in the endo than they would be in the exo pucker conformation. We are well aware that hydrogen bonding to organic fluorine is a matter of considerable controversy [32]; thus, higher resolution three-dimensional structures of fluorinated proteins are required to shed more light on this disputed matter.
It is obvious that fluorination of the Pro residues in EGFP is the main source of the superior refolding rates, which may originate from several synergistic effects. The energy difference between the cis and trans Pro bond conformation is significantly larger for the exo than for the endo pucker, and the activation energy for the cis→trans isomerization of (4S)-FPro is almost identical to that of Pro [14]. Correspondingly, the cis/trans isomerization of the proline bonds affects folding of the EGFP variant to a significantly lesser extent than the preferred endo puckering of the (4S)-FPro pyrrolidine ring. In the structural context, this generates stabilizing interactions of the fluorine atoms in (4S)-FPro-EGFP that are absent in the parent EGFP. Conversely, the "superfolding GFP" of Pédelacq et al. [24] was generated by random mutagenesis of amino acids that likewise resulted in intramolecular interaction networks not present in native GFP. Although our study reports a serendipitous discovery, we are convinced that the insights we gained here can be generally useful for the design and engineering of proteins where proline residues play decisive roles. We think that the stability of a protein can be rationally manipulated by choosing the appropriate amino acid to fit into the 3D fine structure of the target protein. Alternatively, the structure of a target protein can be optimized for the incorporation of synthetic amino acids by guided evolution [25]. Thus, detailed protein models and classical engineering methods in combination with an expanded genetic code could open a new era of synthetic biology.

Crystallization and Structure Elucidation
The (4S)-FPro-EGFP was crystallized under the same conditions as the parent protein: (4S)-FPro-EGFP (16 mg/ml) was crystallized in 0.2 M Mg²⁺ acetate, 0.1 M sodium cacodylate, pH 6.5, and 13% (w/v) polyethylene glycol 8000 using the sitting drop vapor diffusion method. 2 µl of protein solution were mixed with 1 µl of precipitant solution at 20 °C. The structure of the EGFP variant was solved by the molecular replacement technique using 1EMG [20] as a model. The data set was collected at 100 K on an X-ray image plate system (Mar Research, Hamburg, Germany) using CuKα radiation generated by a Rigaku rotating anode. Crystals were transferred to their mother solution containing 20% (v/v) glycerol as a cryoprotectant and shock-frozen in a nitrogen stream.
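As a rough orientation, the starting composition of the crystallization drop follows from the 2:1 mixing ratio of protein to precipitant solution. The Python sketch below works this out under two assumptions made here: volumes are additive, and buffer components of the protein stock (not specified in the text) are ignored. The function name is hypothetical.

```python
def initial_drop_concentration(stock, stock_volume, other_volume):
    """Concentration of a component right after mixing two solutions.

    ASSUMPTION: ideal, additive volumes; the component is present in only
    one of the two solutions. Units of the result follow those of `stock`.
    """
    total = stock_volume + other_volume
    if total <= 0:
        raise ValueError("total volume must be positive")
    return stock * stock_volume / total


if __name__ == "__main__":
    protein_parts, precipitant_parts = 2.0, 1.0  # mixing ratio from the text

    protein = initial_drop_concentration(16.0, protein_parts, precipitant_parts)
    print(f"protein: {protein:.1f} mg/ml")

    # Precipitant components, stock values as stated in the text.
    precipitant = {
        "Mg acetate (M)": 0.2,
        "sodium cacodylate (M)": 0.1,
        "PEG 8000 (% w/v)": 13.0,
    }
    for name, stock in precipitant.items():
        value = initial_drop_concentration(stock, precipitant_parts, protein_parts)
        print(f"{name}: {value:.3f}")
```

These are starting values only: in vapor diffusion the drop subsequently loses water to the reservoir, so all components become more concentrated during equilibration.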
Reflections were integrated with the program DENZO, and scaled and reduced using SCALEPACK [35]. Model building and refinement were performed with CNS [36]. The initial model was refined by alternating automatic minimization protocols in CNS with visual inspection of the electron density maps and manual adjustment using the program O [37]. Except for a small part of the N- and the C-terminus, the whole model (G4-I229) could be built. The data collection and refinement statistics are presented in Table 1. Accession codes: Protein Data Bank: Coordinates and structure factor amplitudes were deposited with accession code 2Q6P. Fluorescence spectra were excited at 488 nm using excitation/emission slits of 5.0 nm and were recorded on a Perkin-Elmer spectrometer (LS50B) equipped with digital software. Protein concentrations were determined as described elsewhere [34].

Denaturation and refolding of the different GFP variants
GFPs are generally conformationally very stable proteins once their structures have formed. Their denaturation only occurs under extremely harsh conditions, e.g., strong denaturants in combination with high temperature. Denaturation of purified (4S)-FPro-EGFP and EGFP (30 µM each) was performed in PBS containing 8 M urea and 5 mM DTT for 5 min at 95 °C. Urea-denatured samples were renatured at room temperature by 100-fold dilution into PBS with 5 mM DTT but without urea. Protein refolding was monitored for 30 min by fluorescence recovery at 509 nm using the 'Timedrive' option of the Perkin-Elmer spectrometer (LS50B) with an interval of 3 s and a slit of 2.5 nm. The concentrations of denatured proteins were adjusted so that the dilution yielded about 0.3 µM protein. Raw data were imported into Origin 6.1 (OriginLab Corporation, Northampton, MA) and normalized before plotting. Data were fitted with Sigma Plot (Systat Software Inc., San Jose, CA) using equations as described elsewhere [24].
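The arithmetic behind this protocol is simple and can be checked with a short Python sketch. It is only a bookkeeping aid with hypothetical function names: the protein concentration is treated as a plain number in whatever unit is quoted for the denatured stock, and the count of readings assumes one reading at time zero plus one per interval.

```python
def after_dilution(concentration, fold):
    """Concentration after an n-fold dilution (same unit as the input)."""
    if fold <= 0:
        raise ValueError("dilution factor must be positive")
    return concentration / fold


def number_of_readings(duration_s, interval_s):
    """Readings in a time drive, ASSUMING one at t = 0 and one per interval."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return duration_s // interval_s + 1


if __name__ == "__main__":
    dilution_factor = 100

    # Protein: 30 concentration units in the denatured stock.
    print("protein after dilution:", after_dilution(30.0, dilution_factor))

    # Residual urea carried over from the 8 M denaturation buffer (in M).
    print("residual urea (M):", after_dilution(8.0, dilution_factor))

    # 30 min of monitoring at one reading every 3 s.
    print("readings:", number_of_readings(30 * 60, 3))
```

DTT is not diluted in this scheme, because both the denaturation and the refolding buffer contain 5 mM DTT.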
In order to assess the end point fluorescence recovery of EGFP and (4S)-FPro-EGFP, fluorescence spectra were recorded before denaturation and after renaturation at room temperature for 24 h.