Codon Usage in Signal Sequences Affects Protein Expression and Secretion Using Baculovirus/Insect Cell Expression System

By introducing synonymous mutations into the coding sequences of the GP64sp and FibHsp signal peptides, the influences of mRNA secondary structure and codon usage of signal sequences on protein expression and secretion were investigated using the baculovirus/insect cell expression system. The results showed that mRNA structural stability of the signal sequences was not correlated with the protein production and secretion levels, and FibHsp was more tolerant of codon changes than GP64sp. Codon bias analyses revealed that codons for GP64sp were well de-optimized and contained more non-optimal codons than FibHsp. Synonymous mutations in GP64sp substantially increased its average codon usage frequency and resulted in a dramatic reduction of the activity and secretion of luciferase. Protein degradation inhibition assay with MG-132 showed that higher codon usage frequency in the signal sequence increased the production as well as the degradation of luciferase protein, indicating that the synonymous codon substitutions in the signal sequence caused misfolding of luciferase instead of slowing down the protein production. Meanwhile, we found that introduction of more non-optimal codons into FibHsp could increase the production and secretion levels of luciferase, which suggested a new strategy to improve the production of secretory proteins in insect cells.


Introduction
Synonymous codons are different triplets that code for the same amino acid. All amino acids, except Met and Trp, are encoded by two to six codons. Increasing evidence has clearly shown that synonymous codons are not used equally in many species. This phenomenon is termed codon usage bias [1][2][3][4][5]. For instance, highly expressed genes in Escherichia coli and Saccharomyces cerevisiae have been found to selectively use optimal codons with high usage frequency [6,7]. Considering that the genomic codon frequency is usually correlated with the abundance of the cognate tRNAs, the abundant tRNAs for their optimal codons can transport enough amino acids for the rapid decoding of these highly expressed genes by ribosomes. Therefore, a plausible explanation for the correlation of the usage of optimal codons with the production level of individual genes is that highly expressed genes are translated at higher speed than other genes [8]. Experiments have demonstrated that substitution of synonymous codons can affect protein translation level [9,10], as well as other co-translational or post-translational processes including protein folding [8,10,11,12], aggregation [11,13], and translocation [12,14].
Signal peptides, the short peptides that direct newly synthesized proteins to the secretory pathway in both prokaryotic and eukaryotic cells, are mostly located at the N terminus of secretory proteins. Although the sequences of signal peptides vary greatly, they all contain basic amino acids in the N-terminal region, followed by a middle hydrophobic core region and a C-terminal region containing polar amino acids. Different signal sequences can be interchanged between different proteins or even between proteins from different organisms. Changing the amino acids in a signal peptide is also a feasible way to improve the expression and secretion of some proteins [15][16][17][18]. Notably, genome-wide analyses have revealed high incidences of non-optimal codons at the N-terminal region of secretory protein genes in E. coli [19] and Streptomyces coelicolor [20]. Introducing synonymous codons with different usage frequency into signal peptides can provide new insights into the expression and secretion of secretory proteins. Using this strategy, it has been shown that non-optimal codons in the signal sequence of maltose binding protein (MBP) and β-lactamase are of great importance for the correct folding and export of the proteins [12,21].
Codon substitutions are always coupled with the change of mRNA secondary structure, and the latter also acts as an important regulatory factor on gene expression. It is always hard to determine whether the codon usage or the mRNA structure is the determinant of protein expression and translocation. Generally, mRNA folding with higher free energy tends to have less secondary structure; conversely, that holding lower folding energy tends to form more stable secondary structure. Additional energy is required to unfold stable secondary structure, and this will obviously hinder ribosome from translation initiation or moving along the mRNA during protein synthesis [22,23]. The stability of mRNA secondary structure adds complexity to protein expression regulation [24]. Recently, a stable mRNA secondary structure in the region of 30-80 nt downstream of the translation start codon was identified by computational analyses. The structural stability in this region was thought to be correlated with the translocation of secreted proteins [25][26][27]. This provides an interesting hypothesis that secondary structure at the N-terminus of the mRNAs for secretory proteins promotes ribosomal pause at early stage of elongation, benefitting the protein localization [26].
In this report, we investigated whether mRNA secondary structure stability or codon usage frequency in signal peptides affected protein expression and secretion using baculovirus/insect cell expression system. Two signal peptides, GP64sp from GP64 of Autographa californica multiple nucleopolyhedrovirus (AcMNPV) and FibHsp from the heavy chain of fibroin of Bombyx mori, were recoded with synonymous codons and fused to the firefly luciferase reporter gene. The expression and secretion of luciferase directed by these signal peptides were quantified and compared. We found that the correct folding and stability of the passenger protein was correlated with the non-optimal codon usage instead of the mRNA secondary structure stability in the signal sequences.

Time course for the expression and secretion of luciferase
To express firefly luciferase using the baculovirus/insect cell expression system, the reporter gene was cloned into pBac-5, a vector containing modified gp64 tandem early and late promoters which can initiate protein expression immediately after infection and also drive continued protein expression in the late phase of infection. In a previous study, we reported that the expression of fluorescent proteins under the control of these tandem promoters can be detected as early as 8 hours post-infection (hpi) [28]. Here, the expression and secretion levels of luciferase fused with FibHsp, a signal peptide of Fibroin heavy chain from Bombyx mori, were examined at 12, 36 and 60 hpi (Fig 1). The results showed that the enzyme activity of luciferase was detectable, but at low levels, in both Sf9 cells and the cell culture media at 12 hpi, and the protein levels increased dramatically at 36 hpi. Even though the protein expression levels increased about 30 fold from 12 hpi to 60 hpi, the average secretion ratios remained between 35% and 40%. Similar results were obtained for luciferase fused with signal peptide GP64sp (data not shown). Therefore, in all further studies in this report, the reporter protein was examined at 36 hpi, when the protein expression had reached a reasonably high level and the virus-infected cells were still in good condition.

Stability of mRNA secondary structure in GP64sp was not correlated with the protein expression and secretion
Secondary structure of mRNA has been considered to be an important factor that participates in translation regulation. In a previous report, we found a high structural stability region (HSR) in the 30-80 nt region of mRNAs encoding secretory proteins by bioinformatics [26]. Interestingly, a significant correlation between structural stability and protein localization was revealed based on genome analysis. In order to study whether the high stability of this region plays a role in protein expression and secretion, here, we computationally introduced synonymous substitutions in GP64sp to acquire mutants with altered mRNA stability. For each mutant, the minimal free energy (mfe), an indicator widely used in representing the secondary structure stability of mRNA, was calculated. Among all the possible mutants, 5101 and 5102 used in this study had the lowest mfe, corresponding to the reduced structural stability. 5201 had a middle mfe close to the wild type GP64sp, and 5211 had the highest mfe. The predicted mRNA secondary structure and mfe for GP64sp and its four mutants used in this study are shown in Fig 2A. The mutated signal sequences and wild type GP64sp were then fused upstream of the luciferase gene to direct the secretion of luciferase. The luciferase activity of their secreted and nonsecreted protein products was then examined in parallel with the enzyme expressed without signal peptide. As expected, luciferase without signal peptide (FNCOI in Fig 2B) was not secreted but was produced at a much higher level than those with signal peptide, probably because the secretion process caused translation elongation arrest and slowed down the protein synthesis. Interestingly, in all of the four mutants, we found that both the enzyme activity and secretion ratio of the reporter decreased to very low levels, regardless of the mRNA structural stability (Fig 2B).
Our results suggested that the stability of mRNA secondary structure in GP64sp had little effect on the expression and secretion of its passenger protein. Given that a strong codon usage bias was found in GP64sp, it is more likely that codon usage bias rather than structural stability of GP64sp affected the secretion and activity of the reporter protein.

FibHsp is more stable than GP64sp in guiding protein expression and secretion
GP64sp contains 5 low-frequency codons (codon usage frequency <10 per thousand) among its 21 codons, and the dramatic reduction of luciferase activity and secretion ratio caused by the synonymous mutations implied that the codons for this signal sequence had been optimized during evolution. To further investigate the connection between the mRNA structural stability and protein expression and secretion, we introduced single synonymous codon substitutions into another signal sequence, FibHsp. FibHsp contained fewer low-frequency codons than GP64sp and might be more tolerant of codon changes. To test this possibility, the mfe for every possible single FibHsp mutant was calculated. Among them, Fib1 and Fib2 with the highest mfe, and Fib3 and Fib4 with the lowest mfe, were fused with the luciferase gene (Fig 3A). We found that the enzyme activity and secretion ratio of the single mutants were comparable to the wild type FibHsp (Fig 3B), confirming that mRNA secondary structure in the signal peptide had no significant effect on protein expression and secretion.
Compared with the four GP64sp mutants, which possessed 11 to 17 synonymous codon substitutions and disrupted the reporter expression and secretion, single mutations in FibHsp caused comparable mfe changes but did not affect the reporter expression and secretion. To investigate whether this discrepancy was due to the difference in the number of codon substitutions, two more FibHsp mutants, Fib76 and Fib389, which respectively contained 14 and 15 synonymous codon substitutions and folded into different secondary structures with less stability than FibHsp (Fig 3C), were obtained and fused to the luciferase gene. We found that the total enzyme activity, including the intracellular and extracellular reporter led by the mutated signal peptides, was at similar levels to FibHsp, but the reporter's secretion levels were moderately elevated by the mutated signal sequences (Fig 3D). The high protein expression and secretion levels under the direction of Fib76 and Fib389 further confirmed that protein expression and secretion were not correlated with the mRNA stability or structure of the signal sequence, and also suggested that FibHsp could function more stably than GP64sp.
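The enumeration of every possible single synonymous mutant of a signal sequence can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard genetic code, the function names are hypothetical, and the three-codon example sequence is a toy input rather than FibHsp itself. The mfe of each variant would still have to be computed separately with a folding program.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(seq):
    """Split a DNA coding sequence (spaces allowed) into codons."""
    seq = seq.replace(" ", "").upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate a coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[c] for c in split_codons(seq))


def single_synonymous_mutants(seq):
    """Return (codon_index, new_codon, mutant_sequence) for every
    variant that differs from seq by exactly one synonymous codon."""
    codons = split_codons(seq)
    mutants = []
    for i, codon in enumerate(codons):
        aa = CODON_TABLE[codon]
        for alt, alt_aa in CODON_TABLE.items():
            if alt_aa == aa and alt != codon:
                mutant = codons[:i] + [alt] + codons[i + 1:]
                mutants.append((i, alt, "".join(mutant)))
    return mutants


# Toy input (Met-Val-Ser); not a sequence from this study.
toy_mutants = single_synonymous_mutants("ATG GTG TCT")
```

For the toy input, Met has no synonymous alternative, Val has three and Ser has five, so eight single mutants are produced, all encoding the same peptide.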

Synonymous mutations in GP64sp resulted in the inactivation of the reporter protein
To investigate whether the different reporter activities were regulated at the transcriptional level or translational level, real-time PCR and western blot were carried out to detect the mRNA and protein products of the reporter. Surprisingly, in the group of GP64sp, only the transcription level of 5201 was not significantly higher than the wild type, while the other three mutants obviously produced more mRNAs (Fig 4A). At the protein level, the mutants were also comparable with GP64sp (Fig 4B). As the enzyme activities of the mutants were shown to drop more than a thousand-fold (Fig 2B), these results suggested that the synonymous codon substitutions in the signal sequence impaired luciferase function instead of slowing down the protein production.
As for the group of FibHsp, the single and multiple mutations did not result in significant changes of the mRNA production (Fig 4A). By western blot, most of the single mutants, except Fib2, were detected at similar levels to the wild type, but the multiple mutants Fib76 and Fib389 were found to be produced more abundantly than the other secreted proteins (Fig 4B). The increased production of protein directed by Fib76 and Fib389 may account for the strengthened secretion of the protein observed in Fig 3D.

Non-optimal codons in the signal sequences benefited the secretion of luciferase
Previous reports have revealed that secreted proteins contain more non-optimal codons at their N-terminal region, and the usage of non-optimal codons in the signal sequence plays an important role for the correct folding and export of the secreted proteins in prokaryotic expression systems [12,21]. To investigate the codon bias in the signal sequences used in this study and its connection with the protein production and secretion, the usage frequencies of the codons in these signal sequences are listed in Fig 5, based on the data from Spodoptera frugiperda (http://www.kazusa.or.jp/codon/). The average codon frequencies in both of the wild type signal sequences, especially GP64sp, are obviously lower than that of the first 21 or 22 codons for non-secreted luciferase (FNCOI), consistent with the observation of more non-optimal codons in secretory proteins in E. coli [19] and Streptomyces coelicolor [20].
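The per-sequence summary used in this kind of analysis (average codon usage frequency, plus counts of low- and high-frequency codons) can be sketched as below. The function name is hypothetical, the thresholds of 10 and 20 per thousand follow the cut-offs mentioned in the text, and the usage table in the example contains made-up placeholder numbers, not Spodoptera frugiperda values; a real table would have to be taken from the codon usage database.

```python
def codon_usage_summary(seq, usage_per_thousand, low=10.0, high=20.0):
    """Summarize codon usage of a coding sequence.

    usage_per_thousand maps each codon to its frequency per thousand
    codons in the host organism (supplied by the caller).
    """
    seq = seq.replace(" ", "").upper()
    if not seq or len(seq) % 3 != 0:
        raise ValueError("sequence must be non-empty and a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    freqs = [usage_per_thousand[c] for c in codons]
    return {
        "n_codons": len(codons),
        "mean_frequency": sum(freqs) / len(freqs),
        "n_low": sum(1 for f in freqs if f < low),    # below 10 per thousand
        "n_high": sum(1 for f in freqs if f > high),  # above 20 per thousand
    }


# Placeholder frequencies for illustration only (NOT real host data).
toy_usage = {"ATG": 25.0, "GTG": 30.0, "TCT": 8.0}
toy_summary = codon_usage_summary("ATG GTG TCT", toy_usage)
```

Comparing such summaries for a wild type signal sequence and its recoded variants shows how far a set of synonymous substitutions has shifted the sequence toward or away from frequently used codons.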
In GP64sp, the codons are well de-optimized, and the mutations in the four mutants substantially increased the average codon usage frequency in the signal sequences; the number of codons used at a frequency higher than 20 is doubled or even tripled in the mutants. Fewer non-optimal codons in the mutants may speed up the translation of the protein but at the same time result in more misfolded proteins, and this may contribute to the lower enzyme activity detected in Fig 2 and explain why the protein abundance determined by western blot was not reduced in Fig 4. For FibHsp, which contains 11 codons with usage frequency higher than 20 and 2 codons with minimum usage frequency of 10, the average codon usage frequency is obviously higher than GP64sp although it is still lower than FNCOI. In Fib76 and Fib389, the mutations respectively introduced 3 more codons with usage frequency lower than 20 and 2 or 3 more codons with usage frequency lower than 10. The introduction of these non-optimal codons did not reduce the production of the protein but benefited the protein production (Fig 4B) and secretion (Fig 3D). The results obtained from both GP64sp and FibHsp groups suggested that non-optimal codons in the signal sequences could play an important role in the correct folding and export of the reporter protein.

Luciferase directed by signal sequences with higher codon usage frequencies is more sensitive to the proteasome-dependent degradation
In Fig 2, luciferase activity was observed to decline dramatically in the GP64sp mutants, suggesting that luciferase could be misfolded when it was directed by the mutated signal sequences containing more high-frequency codons. During protein synthesis, proteins that are unfolded or misfolded in the endoplasmic reticulum tend to be tagged with ubiquitin and then degraded in the proteasome [29].
To investigate whether protein misfolding contributed to the decrease of luciferase activity directed by the GP64sp mutants, MG-132, a cell-permeable proteasome inhibitor widely used for reducing the degradation of ubiquitin-conjugated proteins in eukaryotic cells [30], was used in this study to inhibit misfolded luciferase from being degraded by proteasome. Several doses of MG-132 (5, 10, 20 and 40 μM) were assessed for the cytotoxicity of the drug. The cell viability assay showed that the doses at 5 and 10 μM did not statistically affect cell viability compared to the untreated control cells in 72 h of drug treatment (Fig 6A). To maintain sufficient cell viability for the baculovirus infection and protein expression, MG-132 was used at the concentration of 5 μM in subsequent protein degradation inhibition assays.
Western blot was then carried out to analyze luciferase expression level, which was normalized to the baculovirus protein GP64 in each infected sample (Fig 6B). By densitometric scanning of the bands from three independent experiments, the luciferase expression levels of the mutants were calculated relative to the protein level with the wild type signal peptide and are shown in the figure. Given that Western blot is only a semi-quantitative method, enzyme-linked immunosorbent assay (ELISA) was further carried out for more accurate measurement of the luciferase expression with or without MG-132 treatment (Fig 6C). The ELISA data were consistent with the Western blot results in that the relative expression levels of luciferase fused with mutated GP64 signal peptides (5101, 5102, 5201 and 5211) significantly increased in the presence of MG-132. In the FibHsp group, Fib76 and Fib389, the two mutants containing more non-optimal codons and expressed at higher levels in the absence of MG-132, did not produce more proteins than FibHsp in the presence of MG-132. These results confirmed our postulation that more optimal codons in signal sequences could promote translation speed but result in more folding deficiency.

Discussion
Using computational methods, we have found that mRNAs in the 30-80 nucleotide intervals for secretory proteins have significantly higher stability than other regions of secretory proteins and the region for non-secretory proteins [26]. In this study, by introducing synonymous codons into GP64sp and FibHsp, we investigated the influence of mRNA secondary structure stability of signal sequences on protein expression and secretion using baculovirus/insect cell expression system. The results show that mfe of HSR in GP64sp and FibHsp have no correlation with the protein expression and secretion, suggesting that the structural stability of the signal sequences is not the determinant for the production and translocation of secretory proteins although this structural region has undergone selection pressure to maintain high stability.
GP64sp, a signal sequence from GP64 of baculovirus AcMNPV, has been widely used for the expression of secretory proteins in the baculovirus/insect cell expression system. Here we find that this signal sequence has been well de-optimized and that synonymous substitutions in this region can drastically affect the enzyme activity and secretion of luciferase as its passenger protein. By western blots, ELISA and protein degradation inhibition assay with MG-132, we show that introducing optimal codons in the signal sequence can increase the production as well as the degradation of luciferase protein. In prokaryotic cells, it has been shown that non-optimal codons in signal peptides play an important role in the correct folding of secretory MBP and β-lactamase [12,21]. A reasonable explanation for these observations is that the non-optimal codons in the signal sequences may slow down the translation elongation in this region and this could be important for the correct folding and secretion of proteins.
Previous studies have suggested that optimized codons can contribute to the fast movement of ribosome, but it can also impair the activity of proteins or result in the proteolysis as the change of codon usage can affect the co-translational folding of protein [10,31]. Data from synonymous substitution of signal sequences also suggest that high frequency usage of non-optimal codons in signal sequences probably plays a similar role in the regulation of translation in bacterial cells [12,21]. The mechanism of protein translocation is still far from being completely understood. We speculate that non-optimal codons in signal sequences are required for the correct folding and binding of the nascent signal peptide to signal recognition particle. Misfolding of signal peptide caused by the fast decoding of the substituted higher frequency codons may interrupt the interaction of signal peptide with the signal recognition particle, and therefore block the on-going secretion and result in a disorder of the passenger protein translation. Further work is needed to verify this speculation.
Another interesting discovery from this study is that the codons for FibHsp, the signal peptide for the most abundant secretory protein in Bombyx mori, are not as well de-optimized as those for GP64sp, and the sequence is more tolerant of synonymous codon substitutions. Introduction of non-optimal codons in Fib76 and Fib389 resulted in higher production and secretion levels of the reporter protein. FibHsp is a potent signal peptide for the production of secreted proteins using the baculovirus/insect cell expression platform. It is worth investigating whether introducing more non-optimal codons into FibHsp will further improve the production and secretion of passenger proteins.
To the best of our knowledge, this is the first study directly showing that synonymous codon substitutions of signal peptide have influence on passenger protein expression and secretion in eukaryotic cells. Further studies on codon bias in signal peptides may give us new insights into the protein co-translational process, including protein folding and translocation.

Secondary structure prediction
The prediction of mRNA secondary structure and calculation of minimum free energy (mfe) were achieved online using the website (http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi). Considering the optimal temperature for the growth of insect cell is 28°C, we set the parameter of temperature as 28°C, and other parameters were default settings.

Plasmids Construction
The gene of Firefly luciferase, abbreviated as Fluc, was amplified from plasmid pGL3-Basic (Promega) by PCR, using primers FlucF and FlucR. Fluc was then cloned into vector pBac-5 (Novagen) between the HindIII and XhoI sites, and the generated construct was named as pBac-Fluc.
All the signal sequences used in this study are listed in Table 1.

Protein expression and detection
The transfer vectors were co-transfected with linearized BAC10:KO1629 DNA into Spodoptera frugiperda (Sf9) cells (Invitrogen) using Fugene HD Transfection Reagent (Roche). Cells were maintained at 27°C in SFX-INSECT medium (Thermo Scientific HyClone) with 1% fetal bovine serum (Thermo Scientific HyClone). Recombinant baculoviruses generated by homologous recombination were harvested at 5 days post transfection, and then Sf9 cells were infected with 100 μl of the culture medium containing recombinant baculoviruses.
For the detection of secreted luciferase, cell media were harvested at 12, 36 and/or 60 hours post infection (hpi). To detect the protein levels in Sf9 cells, cells in the plate were lysed with 5× cell lysis reagent (Promega) and then PBS was added to the same volume as the cell culture media. The enzyme activity of secreted (v) and unsecreted (w) firefly luciferase was determined with a Glomax 20/20 Luminometer (Promega), using the Luciferase Assay System (Promega). All of the samples were analyzed in triplicate, and each sample measurement was repeated three times. Secretion ratio (u) of the protein was calculated as below:
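A minimal sketch of this calculation, assuming the secretion ratio is the secreted share of the total luciferase activity, u = v / (v + w) × 100%. This reading is inferred from the definitions of v and w and from the reported ratios of 35% to 40%; the function name and the example readings are illustrative, not values from this study.

```python
def secretion_ratio(v, w):
    """Secretion ratio u (in percent), assuming u = v / (v + w) * 100.

    v: luciferase activity measured in the culture medium (secreted)
    w: luciferase activity measured in the cell lysate (unsecreted)
    """
    if v < 0 or w < 0:
        raise ValueError("activities must be non-negative")
    total = v + w
    if total == 0:
        raise ValueError("total activity is zero; ratio is undefined")
    return 100.0 * v / total


# Illustrative readings in arbitrary luminescence units.
example_u = secretion_ratio(35.0, 65.0)
```

Because the lysate is brought to the same volume as the medium before measurement, v and w can be compared directly without a volume correction.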

Western blot analysis
To detect the protein level in Sf9 cells, infected cells were harvested at 36 hpi and lysed with SDS loading buffer (2% SDS, 100 mM DTT, 0.1% bromophenol blue, 10% glycerol, 50 mM Tris-HCl pH 6.8). After boiling for 5 min, the cell lysates were analyzed by 12% SDS polyacrylamide gel electrophoresis (SDS-PAGE), transferred onto polyvinylidene difluoride (PVDF) membranes, and then blocked with 5% non-fat milk overnight at 4°C. The membranes were incubated for 1 hour at room temperature with anti-GP64 monoclonal antibody (Santa Cruz) for the normalization of protein samples, and with anti-His monoclonal antibody (CoWin Biotech, China) for the evaluation of luciferase protein level. After three washes with TBS containing 0.1% Tween-20 (TBST), the membranes were incubated with HRP-conjugated goat anti-mouse antibody (CoWin Biotech, China) for 1 hour at room temperature, and washed again with TBST three times. The proteins were then visualized by enhanced chemiluminescence using eECL Western Blot Kit (CoWin Biotech, China). Densitometry of Western blots was performed using Image Lab.

Quantitative RT-PCR
To extract RNA, infected cells from a 6-well plate were lysed with 1 mL/well of TRIzon (Beijing CoWin Biotech). 5 ng of total RNA from each sample was subjected to reverse transcription, using PrimeScript 1st cDNA Synthesis Kit (Takara). Quantitative real time polymerase chain reaction (qPCR) was performed on CFX96 Real Time PCR System (BIO-RAD, USA), using SYBR Premix Ex Taq II (TaKaRa). To detect the transcription level of luciferase, specific forward primer (5′-CTGGAGACATAGCTTACTGGGACG-3′) and reverse primer (5′-GGTGTTGGAGCAAGATGGATTC-3′) were used for the qPCR. AcMNPV gp64 mRNA, measured using specific forward primer (5′-TATGTGCTTTTGGCGGCGGC-3′) and reverse primer (5′-GCATACGCCTGGTAGTACCC-3′), served as the internal reference for the total RNA level as well as the baculovirus infection efficiency. The reactions were carried out at 95°C for 10 min, followed by 40 cycles of 95°C for 10 s and 60°C for 30 s. Relative levels of luciferase mRNA were calculated using the 2^-ΔΔCt method of relative quantification with gp64 as the reference. All assays described here were repeated three times, and all of the measurements were made in triplicate.

Table 1 (partial). Signal sequences used in this study.
… GTA AGC GCT ATT GTT TTA TAT GTG CTT TTG GCG GCG GCG GCG CAT TCT GCC TTT GCG GCG
5101: ATG GTG TCT GCG ATA GTC TTG TAC GTC CTT CTG GCG GCT GCC GCG CAC TCG GCG TTC GCC GCA
5102: ATG GTG AGC GCG ATT GTG CTA TAT GTG CTC TTG GCG GCT GCC GCC CAT TCG GCC TTT GCA GCT
5201: ATG GTG TCC GCG ATC GTG CTG TAC GTG CTT CTG GCT GCC GCC GCT CAT TCC GCC TTC …
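The 2^-ΔΔCt relative quantification named in the Quantitative RT-PCR paragraph can be sketched as follows. The sketch assumes roughly 100% amplification efficiency for both primer pairs and takes the construct with the wild type signal peptide as the calibrator; that choice, the function name and the Ct values are illustrative assumptions rather than details given in the text.

```python
def relative_mrna_level(ct_luc_sample, ct_gp64_sample,
                        ct_luc_calibrator, ct_gp64_calibrator):
    """Relative luciferase mRNA level by the 2^-ddCt method.

    dCt  = Ct(luciferase) - Ct(gp64 reference) within each sample
    ddCt = dCt(sample) - dCt(calibrator)
    Assumes both amplicons double every cycle.
    """
    d_ct_sample = ct_luc_sample - ct_gp64_sample
    d_ct_calibrator = ct_luc_calibrator - ct_gp64_calibrator
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)


# Made-up Ct values: the sample crosses threshold two cycles earlier
# than the calibrator once both are normalized to gp64.
example_fold = relative_mrna_level(20.0, 18.0, 22.0, 18.0)
```

Normalizing to gp64 corrects for differences in both total RNA input and infection efficiency, since gp64 is expressed from the same virus in every sample.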

Cell viability assay
Sf9 cells were seeded in 96-well cell culture plates at approximately 3×10^4 cells/well, and treated with 0, 5, 10, 20 or 40 μM MG-132 for 24 to 72 hours. Cell viability was analyzed by measuring the succinate dehydrogenase level with an MTT Cell Proliferation and Cytotoxicity Assay Kit (Beyotime, Beijing, PR China) according to the manufacturer's instructions.

Protein degradation inhibition assay
Sf9 cells were seeded in 12-well cell culture plates and infected with recombinant baculoviruses. 5 μM MG-132 was added to block the proteolytic activity of proteasome complex 9, and the cells were harvested at 36 hpi. Western blots were then carried out to detect the expression levels of luciferase, using baculovirus GP64 as a control for the virus infection and protein loading. Densitometry of the signals was performed using Image Lab. Relative expression levels of luciferase were normalized to GP64. The results shown are representative of at least two independent experiments.
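The normalization described above can be sketched as a two-step ratio: each luciferase band is divided by the GP64 band of the same lane, and the result is then expressed relative to the wild type signal peptide. The function name, the lane labels and the band intensities below are hypothetical.

```python
def relative_expression(luc_intensity, gp64_intensity, reference):
    """Normalize luciferase band intensities to GP64, then express
    each lane relative to the lane named by `reference`.

    luc_intensity, gp64_intensity: dicts mapping lane name to the
    densitometric signal of the luciferase and GP64 bands.
    """
    normalized = {
        lane: luc_intensity[lane] / gp64_intensity[lane]
        for lane in luc_intensity
    }
    ref_value = normalized[reference]
    return {lane: value / ref_value for lane, value in normalized.items()}


# Made-up densitometry readings in arbitrary units.
example_levels = relative_expression(
    luc_intensity={"wild_type": 100.0, "mutant": 150.0},
    gp64_intensity={"wild_type": 50.0, "mutant": 50.0},
    reference="wild_type",
)
```

Dividing by GP64 first means that a lane with more infected cells or more loaded protein does not appear to express more luciferase than it actually does.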

ELISA
Sf9 cells were seeded in 12-well plates and infected with recombinant baculoviruses expressing the luciferase. Cells infected with a baculovirus not expressing the luciferase were used for the baseline correction. 5 μM MG-132 was added to block the proteolytic activity of proteasome complex 9. The same volume of DMSO was added in parallel as the untreated control. The cells were harvested and lysed with cell lysis reagent (Promega) at 36 hpi. ELISA plates were coated with 30 μg/mL cell lysate diluted in 100 μL of 50 mM sodium carbonate buffer (pH 9.6) at 4°C overnight. The plates were washed three times with PBS containing 0.05% Tween-20 (PBST) and blocked with 5% nonfat milk in TBST buffer for 1 hour at 37°C. Anti-Luciferase polyclonal antibody (Promega) at a dilution of 1:250 was added and incubated at 37°C for 1 hour. After washing three times, 100 μL HRP-conjugated rabbit anti-goat IgG (diluted 1:2,000 in blocking buffer) was added to each well and incubated at 37°C for 1 hour. After washing with TBST, 50 μL of TMB was added and incubated in the dark at 37°C for 25 min. The reaction was stopped by adding 50 μL of 2 M H2SO4, and the absorbance was read at 450 nm. All measurements were repeated three times.