Synthesis of Non-linear Protein Dimers through a Genetically Encoded Thiol-ene Reaction

Site-specific incorporation of bioorthogonal unnatural amino acids into proteins provides a useful tool for the installation of specific functionalities that will allow for the labeling of proteins with virtually any probe. We demonstrate the genetic encoding of a set of alkene lysines using the orthogonal PylRS/PylTCUA pair in Escherichia coli. The installed double bond functionality was then applied in a photoinitiated thiol-ene reaction of the protein with a fluorescent thiol-bearing probe, as well as a cysteine residue of a second protein, showing the applicability of this approach in the formation of heterogeneous non-linear fused proteins.


Introduction
Covalent attachment of proteins to ligands, polymers, and surfaces creates macromolecules combining specific biological function with favorable physical and chemical properties. For example, studying biological processes in their native environment often requires the addition of reporter tags to proteins [1]. To date, the mainstay tagging strategy for imaging of proteins involves genetic fusions of fluorescent proteins [2,3]. However, the large size of fluorescent proteins can interfere with the folding and activity of the targeted protein [4]. Alternatively, tag-mediated labeling methods have been exploited, including self-labeling proteins, such as HaloTag, SNAP-tag, and CLIP-tag, and enzyme-mediated labeling [5,6]. Although these methods allow for smaller reporter tags, limitations with regard to the position and the structure of the label remain, and the presence of an enzyme is required.
Bioorthogonal reactions have been applied in a variety of site-specific modifications of proteins such as fluorescent labeling, PEGylation, biotinylation, post-translational modification mimics, and surface immobilization [7,9,45,46,47]. Another area of interest for which bioconjugation reactions have been explored is the generation of non-linear protein fusions. In biological systems, proteins often bind to other proteins to gain stability, affinity, and higher specificity to perform specific cellular functions such as signal transduction, transcriptional regulation, and DNA repair [48][49][50]. Elucidation of many of these processes has led to the generation of chemical and biosynthetic methods to create non-linear protein linkages post-translationally for the control and performance of a number of functions, as well as protein trafficking and isolation. Methods that have been explored include native chemical ligation [51][52][53][54], enzyme-based strategies [55,56], and conjugation employing reactions with UAA residues [57][58][59][60][61][62][63]. The introduction of UAAs at a specific position allows for greater topological diversity with minimal protein modification [7,8,9,46]. Here, we apply the site-specific genetic incorporation of alkenes into proteins to the direct, spacer-free generation of non-linear protein fusions.
The thiol-ene reaction involves a radical-mediated addition of a thiol to an alkene that occurs upon UV irradiation (365-405 nm) [64,65]. The reaction offers the possibility of using light to control, in both space and time, the formation of a stable thioether bond. As a result of its specificity for alkenes and compatibility with aqueous environments, the thiol-ene reaction is a bioorthogonal reaction that has been applied in polymer and material synthesis [66][67][68][69][70][71][72], carbohydrate modification [73,74], and peptide and protein modification [21,22,29,40,41,42,43,44]. Recently, orthogonal thiol-ene bioconjugations applying alkenyl UAAs and synthetic organic reaction partners have been reported [21,22]. In order to expand the chemical diversity of these orthogonal handles, we demonstrate the synthesis and incorporation of alkene lysines and their use in protein heterodimer formation under alternative thiol-ene reaction conditions.
To investigate whether the synthesized alkene lysines are substrates for the wild-type MbPylRS, the incorporation efficiencies of 1-9 into myoglobin were evaluated by protein expression in E. coli. Cells were grown in the absence of a UAA and in the presence of 1-9. The amino acids 1, 2, and 3 have been previously described and incorporated into proteins using wild-type PylRS and/or PylRS mutants [22,85]. Here we found that additional analogs can be efficiently incorporated into myoglobin by the MbPylRS. The obtained incorporation efficiencies and ESI-MS results are listed in Figure 1B and the corresponding SDS-PAGE analysis is shown in Figure S1.
Previous crystallographic studies of PylRS have indicated that the synthetase holds a large hydrophobic pocket, capable of accommodating bulky and hydrophobic moieties [87,88]. In addition, it has been found that the carbamate moiety at the lysine side-chain is an essential discriminator for substrate recognition. For instance, the oxygen atom adjacent to the side-chain carbonyl group in 1 interacts via a water-mediated hydrogen bond with the side-chain carbonyl group of Asn346, a key residue in establishing substrate recognition in PylRS [85,87]. We found that the amino acid binding pocket of MbPylRS exhibited sufficient flexibility to accommodate substrates 1, 2, 3, 6, and 7, with amino acids 1 and 7 showing the highest incorporation efficiency, which could be explained by their smaller size. In contrast, amino acids 4 and 5 were not efficiently incorporated into protein, which we attribute to their longer carbon chains.
The successful incorporation of 1 and 7 into protein, together with the inefficient substrate recognition of 8 and 9 by MbPylRS, suggests that the presence of an oxygen atom adjacent to the side-chain carbonyl group favors more efficient establishment of the hydrogen-bond network. We hypothesize that the recognition of 7 by MbPylRS may be possible by re-directing the necessary interactions of the synthetase's binding pocket to the Oε-position. Moreover, we have previously observed a preference for the carbamate moiety over an amide group to drive the efficient genetic encoding of ε-N-propargyloxycarbonyl-lysine by the wild-type MbPylRS/PylTCUA pair, while its amide analog ε-N-pentynoyl-lysine was not accepted as a substrate [17]. Although analogs that bear a side-chain amide moiety have been incorporated into proteins by wild-type PylRS, so far only structures up to four bonds in length from the amide ε-amino group have been tolerated by the enzyme's binding pocket [89,90]. Since our amino acid 8 is one bond longer, we can speculate that the carbamate functionality in 1, compared to 8, increases the efficiency of substrate recognition by MbPylRS, as the enzyme also tolerated the lengthier amino acids 2 and 3. The amino acid 9, which bears a urea linkage, seemed to be slightly favored by MbPylRS compared to the amide 8. However, amino acid 9 is still a poor substrate compared to 1. Our findings suggest that the replacement of the oxygen atom on the carbamate by a carbon or nitrogen atom may be enough to discriminate between the very similar substrates 1, 8, and 9, possibly due to weaker interactions with the amide or urea functionalities, thus disfavoring efficient binding of 8 or 9 in the MbPylRS amino acid pocket.
With amino acids 1-3 showing good incorporation efficiency, we site-specifically incorporated these amino acids into superfolder Green Fluorescent Protein (sfGFP) as a second model protein in E. coli. We found that alkene lysines 1-3 were successfully introduced at position Y151 in sfGFP (Figure 2A) and that sfGFP yields were obtained at 32-70 mg/L, an approximately 10-fold increase of incorporation efficiency compared to myoglobin bearing the same amino acids 1-3. ESI-MS analysis of purified sfGFP shows molecular weights corresponding to the site-specific incorporation of 1, 2, and 3 (Figure 2B).

sfGFP labeling via the thiol-ene reaction
To verify that the thiol-ene reaction is suitable for labeling the alkene-bearing sfGFP, dansyl-thiol (10) was used as a fluorescent probe (Figure 3A and Scheme S5). Wild-type sfGFP and modified sfGFPs carrying 1 or 2, which showed the highest incorporation efficiency, were subjected to a thiol-ene reaction with 10 by irradiating the reaction mixture with 365 nm UV light in the presence of the photoinitiator I2959 for 5 min. The samples were then analyzed by SDS-PAGE and in-gel fluorescence imaging. Figure 3B shows that the alkene-containing sfGFPs modified with 1 and 2 were both selectively labeled with 10 after UV irradiation, while the wild-type sfGFP was not fluorescently labeled. These results demonstrate that a thiol-containing fluorescent probe can be site-specifically conjugated to sfGFP bearing an alkene functional group.
In order to show the potential of the thiol-ene reaction in protein chemistry, we hypothesized that cysteine residues in another protein could also be used as a reaction partner, leading to the formation of a non-linear protein heterodimer (Figure 3A). Lysozyme is a small protein containing 8 cysteine residues within 129 amino acids [91]. The cysteines form 4 disulfide bonds and can be reduced to release free thiol groups. Analysis of the bioconjugated proteins by SDS-PAGE revealed bands of the expected molecular weight, as the bands corresponding to sfGFP shifted from 28 kDa to 44 kDa upon conjugation to lysozyme after UV exposure in the presence of the photoinitiator I2959 for 10 min (Figure 3C). This result indicates that the majority of the observed products are sfGFP-lysozyme heterodimers, since lysozyme was supplied in 4-fold excess compared to alkenyl sfGFP. Without UV irradiation, no significant mobility shift was observed. As expected, wild-type sfGFP did not undergo a thiol-ene reaction with lysozyme. Overall, successful protein-protein heterodimer formation via thiol-ene conjugation of an alkene-containing protein was achieved.
In both bioconjugation strategies we found that the addition of sodium dodecyl sulfate (SDS) was necessary for an efficient and specific conjugation reaction to alkene-labeled proteins within 5-10 min, in contrast to previously reported 1-2 h reaction times [22,29,43], thus significantly reducing UV exposure. We found that under our experimental conditions, lysozyme is (at least partially) denatured [92,93], as confirmed by circular dichroism (CD) spectroscopy (Figure S2). This may result in more accessible cysteine residues and facilitate the thiol-ene bioconjugation reaction. Moreover, as a well-known surfactant, SDS has been proposed to form micelles in thiol-ene reactions for water-based polymerization reactions [94,95]. It is possible that the association of the proteins with micelles may increase their local concentration, thus further facilitating the reaction.

Conclusions
In conclusion, we have synthesized a collection of alkene lysines of varying length and ε-linkages and demonstrated their site-specific, genetically encoded incorporation into proteins in E. coli by the wild-type MbPylRS/PylTCUA pair. The alkene-containing amino acids 1-3 showed the highest incorporation efficiencies into myoglobin, and protein yields decreased with increasing side-chain length, hinting at the limitations of the wild-type synthetase's binding pocket in accommodating sterically demanding amino acids. Among these amino acids, we also successfully incorporated the amino acid 7 with an inverted carbamate functionality at the ε-position of lysine. Replacement of the carbamate motif with an amide or urea failed to provide efficient incorporation into protein, once again suggesting that the carbamate moiety at the lysine side-chain can be an essential discriminator for substrate recognition by wild-type PylRS.
Next, the alkene amino acids 1, 2, and 3 were successfully incorporated into sfGFP, with 1 and 2 exhibiting the highest incorporation efficiency. Utilizing the thiol-ene reaction, alkene-bearing sfGFP was site-specifically bioconjugated to a dansyl-thiol fluorophore (10) after only 5 min of irradiation with 365 nm UV light in the presence of photoinitiator I2959. In addition, we applied the site-specific genetic incorporation of alkene-bearing amino acids into proteins to the direct, spacer-free synthesis of a non-linear protein fusion of sfGFP and lysozyme. All components are recombinantly expressed and no post-translational introduction of functional groups is required. The work described herein demonstrates for the first time the assembly of a protein heterodimer by means of a light-induced thiol-ene ligation using genetically encoded alkene-bearing UAAs. This approach may become a promising tool to create non-linear proteins directly, with minimal synthetic effort, by creating direct protein-to-protein conjugations.

Synthesis of alkene lysines: general considerations
Unless otherwise stated, all reagents were obtained from commercial sources and used without purification, and reactions were performed under nitrogen using flame-dried glassware. The 1H NMR and 13C NMR spectra were recorded on a 300 MHz or 400 MHz Varian NMR spectrometer. The amino acid 1 was purchased from Chem-Impex International, Inc. For synthesis schemes of 2-10, please refer to Schemes S1-S5.

Myoglobin Expression in E. coli
Plasmids pMyo4TAGpylT and pBKpylS were co-transformed into E. coli Top10 cells as previously described [98] and selected with 25 µg/mL tetracycline and 50 µg/mL kanamycin. A single colony was used to inoculate 2 mL of LB medium containing the same antibiotics and grown overnight. Next, 500 µL of the overnight culture was used to seed 50 mL of LB culture containing 1 mM of the corresponding UAA and antibiotics. The pH was adjusted to 7 with 10 M NaOH immediately before inoculation. Cells were then cultivated to OD600 = 0.6 and 100 µL of 20% arabinose solution was added to induce arabinose promoter-driven expression. The cells were cultivated in a shaker at 37 °C overnight and harvested by centrifugation at 3000 g in standard 50 mL conical tubes. Cells were lysed by re-suspending the cell pellets in standard Ni-NTA phosphate lysis buffer with lysozyme and 0.1% Triton X-100. After 1 hour of incubation at 4 °C, cells were sonicated on ice to release the soluble fraction and debris was removed by centrifugation. The cleared lysates were incubated with 100 µL of Qiagen Ni-NTA agarose slurry at 4 °C to bind His-tagged myoglobin. The mixture was then centrifuged at 1000 g for 5 min and the agarose beads were collected and transferred to microcentrifuge filter columns. Beads were washed three times with 400 µL of Ni-NTA lysis buffer and once with 400 µL of Ni-NTA wash buffer. The protein was eluted with 400 µL of elution buffer. The eluted sample was mixed with SDS loading buffer, heated at 95 °C for 5 min, loaded onto a 10% SDS-PAGE gel of 1.5 mm thickness, and run at 150 V for 50 min. The gel was stained overnight with Coomassie blue solution (0.1% Coomassie blue, 10% acetic acid, 40% ethanol), then de-stained (10% acetic acid, 40% ethanol) and analyzed (Figure S1). The protein was dialyzed against 1 L of 20 mM ammonium acetate buffer for mass spectrometry analysis.

sfGFP Expression in E. coli
The plasmid pMyo4TAGpylT [98] was modified by replacing the myoglobin coding sequence with the sfGFP gene carrying an amber stop codon mutation at position Y151, located on the outer beta sheet domain. The co-transformation was the same as described above, but a condensed culture protocol was used to maximize UAA yields. The 2 mL overnight culture was scaled up to 400 mL of culture in 2 × 1 L Erlenmeyer flasks and grown to OD600 = 0.6. Cells were harvested in 4 × 50 mL conical tubes and re-suspended in 50 mL of LB medium containing 1 mM of the corresponding UAA, antibiotics, and 0.1% arabinose. Cells were re-suspended by incubating in a rotary shaker at 37 °C for 10 min and collected in a 250 mL Erlenmeyer flask. The cells were induced for 4 h and harvested by centrifugation. Cell pellets were first suspended in 3.6 mL of 50 mM Tris-HCl pH 8.0, supplemented with 2.4 mL of 4 M ammonium sulfate, and extracted by the three-phase partitioning method [99] with 6 mL of t-butanol and vigorous shaking. The aqueous bottom layer containing sfGFP was removed and dialyzed against 1 L of Ni-NTA lysis buffer for 1-2 h to remove most of the ammonium sulfate. The dialyzed samples were filtered through a 0.45 µm disc filter before loading onto a Ni-NTA gravity column containing 0.5 mL bed volume. The proteins were bound and washed with 12 mL bed volume of lysis buffer and 6 mL bed volume of wash buffer containing 50 mM imidazole, and eluted with Ni-NTA elution buffer. Samples were analyzed by SDS-PAGE, dialyzed against PBS pH 7.4 for the subsequent labeling reaction, and then dialyzed against 20 mM ammonium acetate for mass spectrometry analysis.

Protein MS Analysis
Protein MS was measured at the Genomics and Proteomics Core Laboratories, University of Pittsburgh. The protein solution was adjusted to 5 pmol/µL in 80% acetonitrile and 0.1% aqueous formic acid. The sample was injected into a Bruker micrOTOF coupled to an Ultimate 3000 HPLC. The results were deconvoluted to calculate the molecular weight using HyStar.
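The deconvolution step can be illustrated with a minimal sketch of the underlying charge-state arithmetic. This is not the algorithm implemented in the vendor software; it only shows how a neutral mass follows from adjacent peaks of an ESI charge envelope. The function names and the example mass are hypothetical and are not measured values from this work.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def mz_for_charge(mass, z):
    """m/z of an [M + zH]^z+ ion for a neutral mass in Da."""
    return (mass + z * PROTON_MASS) / z


def deconvolute_pair(mz_low, mz_high):
    """Infer charge and neutral mass from two adjacent peaks.

    mz_low is assumed to carry charge z + 1 and mz_high charge z.
    Returns (z, neutral mass in Da).
    """
    z = round((mz_low - PROTON_MASS) / (mz_high - mz_low))
    mass = z * (mz_high - PROTON_MASS)
    return z, mass


def deconvolute_series(peaks):
    """Average the neutral mass over all adjacent peak pairs."""
    ordered = sorted(peaks)
    masses = [deconvolute_pair(lo, hi)[1]
              for lo, hi in zip(ordered, ordered[1:])]
    return sum(masses) / len(masses)


# Hypothetical protein mass used only to generate a synthetic envelope.
example_mass = 27800.0
synthetic_peaks = [mz_for_charge(example_mass, z) for z in range(20, 26)]
recovered_mass = deconvolute_series(synthetic_peaks)
```

Real spectra contain noise, adducts, and overlapping envelopes, so dedicated deconvolution software remains the appropriate tool for reported masses.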

Thiol-ene Reactions with sfGFP
A reaction buffer containing 30 µL of 1 M Tris-HCl pH 6.8 (120 mM), 50 µL of 10% SDS (2%), 50 µL of 10 mM TCEP (2 mM), and 120 µL of water was prepared. In a separate Eppendorf tube, a solution of 10 mM photoinitiator I2959 in 50% DMSO in water was prepared. Then 62.5 µL of the I2959 solution was added to 250 µL of reaction buffer just before labeling. Next, 2.5 µL of the reaction buffer/photoinitiator mix was added to 20 µL of sfGFP (2400 ng/µL) and incubated at room temperature for 10 min. A 50X dansyl-thiol (10) substrate solution was prepared with 10 µL of 100 mM TCEP (10 mM), 20 µL of 25 mM dansyl-thiol in DMSO, and 70 µL of deionized water. Next, 16.8 µL of this solution was added to the reaction mixture and incubated at room temperature for another 10 min. Subsequently, 250 µL of 1X SDS loading buffer containing 2-mercaptoethanol and an additional 50 µL of 100 mM DTT was prepared for stopping the reaction. Then 12 µL of the reaction samples were aliquoted into 200 µL PCR tubes. Samples were placed on a standard UV transilluminator at 365 nm for 5 min, and the reaction was stopped by adding 12 µL of 1X SDS loading buffer to the mixture and heating at 95 °C for 5 min. Next, 5 µL of the samples were loaded onto a 10% SDS-PAGE gel of 1.5 mm thickness and run at 150 V for 50 min. After electrophoresis, gels were rinsed briefly with deionized water and imaged. Gels were then stained with Coomassie blue and scanned to visualize the protein bands.
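The final concentrations given in parentheses for the reaction buffer can be checked with a short dilution calculation. The sketch below takes the listed component volumes in microlitres; the helper name and dictionary layout are illustrative choices, not part of the protocol.

```python
def final_concentrations(components):
    """Return (total volume, final concentration per component).

    components maps a label to (stock concentration, volume in µL).
    Each final concentration is stock * volume / total volume and
    keeps the unit of the stock concentration.
    """
    total = sum(volume for _, volume in components.values())
    finals = {name: stock * volume / total
              for name, (stock, volume) in components.items()}
    return total, finals


# Reaction buffer: stock concentrations in mM (Tris-HCl, TCEP) or % (SDS).
reaction_buffer = {
    "Tris-HCl (mM)": (1000.0, 30.0),
    "SDS (%)": (10.0, 50.0),
    "TCEP (mM)": (10.0, 50.0),
    "water": (0.0, 120.0),
}
buffer_total_ul, buffer_conc = final_concentrations(reaction_buffer)

# Dansyl-thiol substrate solution, stock concentrations in mM.
substrate_solution = {
    "TCEP (mM)": (100.0, 10.0),
    "dansyl-thiol (mM)": (25.0, 20.0),
    "water": (0.0, 70.0),
}
substrate_total_ul, substrate_conc = final_concentrations(substrate_solution)
```

With these inputs the buffer totals 250 µL at 120 mM Tris-HCl, 2% SDS, and 2 mM TCEP, matching the values stated in the text, and the substrate solution comes to 10 mM TCEP and 5 mM dansyl-thiol.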
For protein heterodimer formation, the thiol-ene conjugation was carried out with a denatured and reduced lysozyme solution. A reaction buffer containing 120 µL of 1 M Tris-HCl pH 6.8, 200 µL of 10% SDS, and 1 mL of 10 mM TCEP was prepared. In 1.32 mL of the reaction buffer, 19 mg of lysozyme was dissolved (1 mM). The solution was sealed in a 2 mL microcentrifuge tube with a rubber septum and purged with nitrogen for 30 min. The tube containing the protein solution was then heated at 75 °C for 30 min. Photoinitiator I2959 was diluted to 10 mM in a solution of 50% DMSO in water. sfGFP solutions were adjusted to a concentration of 23 µM (650 ng/µL), and 20 µL of this solution was mixed with 2 µL of reduced lysozyme and 0.5 µL of I2959 in the dark. The PCR tube containing this mixture was then placed on a standard UV transilluminator at 365 nm for 10 min. A solution containing 120 µL of 1 M Tris-HCl pH 6.8, 20 µL of 10% SDS, 10 µL of 1 M TCEP, and 200 µL of glycerol was prepared, and 20 µL of it was added to the reaction solution immediately after irradiation. Then 15 µL of the resulting sample was loaded onto a native or 10% SDS-PAGE gel following standard procedures.
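The stated concentrations and the roughly 4-fold excess of lysozyme over sfGFP can be reproduced with a short calculation. This sketch assumes a lysozyme molecular weight of about 14.3 kDa (a typical value for hen egg-white lysozyme, which is an assumption here), takes 28 kDa for sfGFP from the gel band reported above, and reads the small volumes and the sfGFP mass concentration in microlitre units.

```python
LYSOZYME_MW = 14_300.0  # g/mol, assumed (typical for hen egg-white lysozyme)
SFGFP_MW = 28_000.0     # g/mol, from the ~28 kDa sfGFP band

# Lysozyme stock: 19 mg dissolved in 1.32 mL of reaction buffer.
# mg / (g/mol) gives mmol; mmol / mL equals mol/L; x1000 converts to mM.
lysozyme_mM = 19.0 / LYSOZYME_MW / 1.32 * 1000.0

# sfGFP stock: 650 ng/µL is numerically 0.650 g/L.
# (g/L) / (g/mol) gives mol/L; x1e6 converts to µM.
sfgfp_uM = 0.650 / SFGFP_MW * 1e6

# Amounts in the reaction: µL x mM gives nmol; µL x µM gives pmol.
lysozyme_nmol = 2.0 * lysozyme_mM
sfgfp_nmol = 20.0 * sfgfp_uM / 1000.0

molar_excess = lysozyme_nmol / sfgfp_nmol
```

Under these assumptions the lysozyme stock is close to 1 mM, the sfGFP stock is close to 23 µM, and the molar ratio is a little above 4, consistent with the excess described in the text.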

Circular dichroism analysis of lysozyme
CD experiments were performed on an Olis Circular Dichroism Spectrophotometer using 0.1 cm quartz cuvettes. A solution containing 19 mg of lysozyme in 1.32 mL of 10 mM phosphate buffer pH 7.4 was prepared. SDS was added to a final concentration of 2%. The lysozyme was diluted to 20 µM for the CD experiment, and CD spectra were collected from 195 to 260 nm in 1 nm increments with an integration time of 5 s and a bandwidth of 2 nm. Increased intensity in the far-UV spectrum with the addition of SDS (Figure S2) is in agreement with previous observations [100].
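For comparison across instruments and concentrations, raw CD signals are commonly normalized to mean residue ellipticity. The sketch below applies the standard conversion using the path length, concentration, and lysozyme length given in this paper; whether this normalization was used for Figure S2 is not stated, the residue count is used as the normalization factor by assumption (some conventions use the number of peptide bonds), and the example reading is a hypothetical value, not measured data.

```python
def mean_residue_ellipticity(theta_mdeg, conc_molar, path_cm, n_residues):
    """Convert an observed ellipticity in mdeg to deg cm^2 dmol^-1 per residue.

    Uses [theta] = theta_mdeg / (10 * c * l * n), with c in mol/L and l in cm.
    """
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_residues)


PATH_CM = 0.1      # cuvette path length
CONC_M = 20e-6     # 20 µM lysozyme
N_RESIDUES = 129   # lysozyme length

# Hypothetical raw reading of -10 mdeg at a single wavelength.
example_mre = mean_residue_ellipticity(-10.0, CONC_M, PATH_CM, N_RESIDUES)

# Scan layout: 195-260 nm inclusive in 1 nm steps, 5 s integration per point.
wavelengths_nm = list(range(195, 261))
scan_time_s = len(wavelengths_nm) * 5
```

The scan described above therefore covers 66 wavelength points, or 330 s of integration time per spectrum.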