Sets of Covariant Residues Modulate the Activity and Thermal Stability of GH1 β-Glucosidases

The statistical coupling analysis of 768 β-glucosidases from the GH1 family revealed 23 positions in which the amino acid frequencies are coupled. The roles of these covariant positions in terms of the properties of β-glucosidases were investigated by alanine-screening mutagenesis using the fall armyworm Spodoptera frugiperda β-glycosidase (Sfβgly) as a model. The effects of the mutations on the Sfβgly kinetic parameters (k cat/K m) for the hydrolysis of three different p-nitrophenyl β-glycosides and structural comparisons of several β-glucosidases showed that eleven covariant positions (54, 98, 143, 188, 195, 196, 203, 398, 451, 452 and 460 in Sfβgly numbering) form a layer surrounding the active site of the β-glucosidases, which modulates their catalytic activity and substrate specificity via direct contact with the active site residues. Moreover, the influence of the mutations on the transition temperature (T m) of Sfβgly indicated that nine of the coupled positions (49, 62, 143, 188, 223, 278, 309, 452 and 460 in Sfβgly numbering) are related to thermal stability. In addition to being preferentially occupied by prolines, structural comparisons indicated that these positions are concentrated at loop segments of the β-glucosidases. Therefore, due to these common biochemical and structural properties, these nine covariant positions, even without physical contacts among them, seem to jointly modulate the thermal stability of β-glucosidases.


Introduction
In recent years, the search for new enzymes using improvements in sequencing technologies has resulted in a large collection of protein sequences. For example, the glycoside hydrolase family 1 (GH1) groups have more than 5,000 b-glucosidase sequences in the CAZY database, only 274 of which have been marked as characterized to date [1]. Structural data -41 crystallographic structures are available -and biochemical characterization of GH1 b-glucosidases revealed these enzymes share the same fold, the (b/ a) 8 barrel, and that their catalytic activity depends on a pair of glutamate residues, which act as acid/base and nucleophile in a double substitution mechanism [1]. Moreover, their companions, R and Y residues, are involved in the modulation of the pK a of the catalytic glutamates. These catalytic residues are highly conserved among b-glucosidases, except for the myrosinases [2,3]. Additionally, a network of hydrogen bonds formed by Q, H, W and E residues, which are placed and conserved in the active site of the bglucosidases, modulates their substrate glycone specificity [4][5][6][7]. Finally, a set of variable residues, for which only the structural placement is conversed, forms the aglycone binding region for different b-glucosidases [8][9][10].
In parallel with the increase in sequence data, new methods to characterize the correlation between functions and structures of proteins have been developed that use special approaches to globally analyze protein sequences, and have revealed groups of residues that are jointly involved in determining functional properties. One of these methods is statistical coupling analysis (SCA), which, through covariation analysis of large multiple sequence alignments, is capable of identifying sets of residues that are important for protein folding [11], allostery [12], enzymatic activity and thermal stability [13]. Moreover, it was recently demonstrated that sets of covariant residues, termed sectors, are important starting points for protein engineering [14].
Among the characterized GH1 enzymes, the b-glucosidase from the fall armyworm Spodoptera frugiperda (Sfbgly) has been extensively studied, including biochemical and site-directed mutagenesis of the active site residues involved in pH optimum modulation and substrate specificity [2,4,10,15,16]. Additionally, residues outside of the active site that affect Sfbgly enzymatic activity via indirect contacts have been identified [17]. These findings make Sfbgly an excellent model for analyzing the function of covariant residues of the GH1 family.
We applied SCA to an alignment containing 768 b-glucosidase sequences and identified 23 covariant positions. Using Sfbgly as an experimental model, alanine residues were introduced at 18 of these covariant positions, and these single mutants were characterized for their thermal stability (T m ) and kinetic parameters (k cat / K m ) for the hydrolysis of three different chromogenic substrates. Based on the results, a set of 11 covariant positions, which are related to the enzymatic activity and form a layer surrounding the active site of the b-glucosidases, was identified. In addition, a second set of 9 covariant positions related to enzyme thermal stability, consisting of amino acid residues mostly at the loop regions of this (b/a) 8 barrel structure, was identified.

Materials and Methods
Identification of the covariant positions in b-glucosidases SCA was performed as described previously [11,18] using a multiple sequence alignment containing 768 non-redundant bglucosidases from the GH1 family, which were retrieved from the PFAM server (http://pfam.sanger.ac.uk). Site conservation (DG stat ) and positional statistical coupling (DDG stat ) parameters were calculated as previously described [19] using our own programs written in C/C ++ [18]. Perturbations were considered significant when differences were present in at least 25% of the sequences in the alignment. A set of residue positions containing correlated conservations was obtained by clustering the DDG stat matrix using the Matlab (MathWorks) software package.

Site-directed mutagenesis
Site-directed mutants were constructed using the QuikChange site-directed mutagenesis kit (Stratagene, La Jolla, CA, USA) following the manufacturer's instructions. Wild-type Sfbgly cloned into the pAE vector [20] was used as a template for PCR reactions performed with mutagenic primer pairs, which are presented on the Table S1. The pAE vector coding for wild-type Sfbgly, previously available, contains a T7 promoter, an ampicillin resistance mark and the insert was cloned in the NdeI and XhoI sites. Mutation incorporation was verified by DNA sequencing.

Expression and purification of recombinant and wildtype Sfbgly
NovaBlue (DE3) competent cells (EMD Millipore, Billerica, MA, USA) were transformed with pAE plasmids encoding wildtype or mutant Sfbgly, plated on LB-agar containing ampicillin (50 mg/mL) and grown at 37uC for 16 h. Single colonies were grown at 20uC in LB broth containing ampicillin (50 mg/mL) until they reached an attenuance of 0.500 at 600 nm. Next, 0.4 mM (final concentration) isopropyl b-D-1-thiogalactopyranoside (IPTG) was added for 16 h to induce recombinant protein expression, after which cells were harvested by centrifugation (4,0006g, 20 min, 4uC) and frozen at 280uC. The pelleted cells were resuspended in 100 mM sodium phosphate pH 7.4 containing 200 mM NaCl, 60 mM imidazole and 10% (v/v) glycerol and were lysed with 3 ultrasound pulses (15 s, output 10 using a Branson Sonifer 250 adapted with a microtip) with 1 min intervals in ice to avoid heating the samples. The supernatants were recovered by centrifugation (13,200 g, 20 min, 4uC), and soluble recombinant proteins were incubated with 200 mL Ni-NTA Agarose (4uC, 1 h) (Qiagen, Hilden, Germany). The resin was pelleted and washed 5 times with 100 mM sodium citrate-sodium phosphate pH 6.0 containing 200 mM NaCl and 60 mM imidazole. Protein elution was performed using the same buffer containing 500 mM imidazole. Protein purity was verified by SDS-PAGE [21], and purified proteins were desalted using a minitrap G-25 (GE Healthcare, Upsala, Sweden) and stored at 4uC.
Protein concentration was determined by measuring the absorbance at 280 nm in 20 mM sodium phosphate pH 6.0 containing 6 M guanidinium hydrochloride. Extinction coefficients (e 280 nm ) were calculated based on the primary sequences of the wild-type and mutant Sfbgly proteins using the ProtParam server at ExPaSy (http://web.expasy.org/protparam/). The extinction coefficients ranged from 2.149 to 2.053 (Table S2).

Kinetic and thermal characterization of the mutant and wild-type Sfbgly
Enzyme kinetic parameters (k cat , K m and k cat /K m ) were determined for purified enzymes by measuring the initial rates (v 0 ) of hydrolysis of at least 10 different concentrations of substrates, including p-nitrophenyl-b-D-glucopyranoside, p-nitrophenyl-b-D-fucopyranoside and p-nitrophenyl-b-D-galactopyranoside (Sigma, St. Louis, MO, USA) prepared in 100 mM sodium citrate -sodium phosphate buffer pH 6.0. Experiments were performed at 30uC. The hydrolysis of these substrates was detected following the formation of p-nitrophenolate by absorbance at 420 nm after addition of 250 mM sodium carbonate -sodium bicarbonate buffer pH 11 to the reaction samples. Kinetic parameters K m and k cat were determined by fitting the v 0 and [S] data to the Michaelis-Menten equation using the Enzfitter software (Elsevier-Biosoft, Cambridge, UK).
Differential scanning fluorimetry (DSF) experiments of both wild-type and mutant Sfbgly were performed using SYPROH Orange solution (500-fold dilution) (Sigma, St. Louis, MO, USA). Melting studies were performed in optical tubes using a 7500 Real-Time System (Applied Biosystems, Foster City, CA, USA). The temperature gradient ranged from 25uC to 95uC with a slope of 0.5% per step. The melting data were fitted according to recent literature [22], resulting in a theoretical T m . Fitting processes were performed using EnzFitter software.

Tridimensional modeling and computational structure comparison
The tridimensional structure of Sfbgly was homology-modeled based on the crystallographic structure of the b-glucosidase from Neotermes koshunensis (PDB, 3VIK) using the Phyre2 software [23]. The sequences had 49% identity and 66% similarity. The structural models were visualized using the PyMOL molecular graphics system v1.1 (Schrödinger, LLC). The distances between amino acid pairs were calculated using DeepView/SwissPDB-Viewer v3.7 software [24]. The topology schemes of different bglucosidases were manually drawn using information regarding the secondary structures visualized in PyMOL.
Because SCA is based on the concept that covariant positions of a primary protein structure should be occupied by amino acid residues that are jointly involved in determining the same functional property [13], an experimental approach was designed to search the identified set of covariant positions for subgroups linked to the enzymatic activities or thermal stabilities of the GH1 b-glucosidases. Therefore, Sfbgly (GenBank code: AF 052729), a digestive enzyme from the fall armyworm Spodoptera frugiperda that has been extensively studied [4,10,15], was chosen as a representative b-glucosidase, and residues at its 23 covariant positions were separately replaced with alanine using a site-directed mutagenesis technique. Eighteen of these mutant enzymes were successfully  expressed as recombinant proteins in NovaBlue (DE3) bacteria and purified ( Figure S1). The E451A mutant of Sfbgly has been previously studied [4]. Mutant enzymes with replacements at positions 176, 329, 449 and 456 were not studied due to poor solubility. The enzyme kinetic parameters for the hydrolysis of three different substrates catalyzed by wild-type and the 18 mutant Sfbgly proteins were determined (Table 1 and Figure S2). Additionally, the transition temperatures (T m ) for the denaturation of the mutant and wild-type Sfbgly proteins were evaluated in Differential Scanning Fluorimetry (DSF) experiments (Table 2 and Figure S3).
Mutations resulting in at least a 4-fold change in the k cat /K m for the hydrolysis of at least two different substrates were considered replacements of residues relevant for the enzymatic activity because such variation corresponds to a DDG { higher than 3 kJ/ mol, which is equivalent to the disruption or formation of one hydrogen bond [25]. Based on these criteria, only 4 mutations had no significant effects on the enzymatic activity, namely K49A, M57A, N112A and S445A (Table 3).
Similarly, the mutational effect on the thermal stability of Sfbgly was considered significant for T m changes higher than 2.5 K, given that in the DSF experiments, this variation range corresponds to four-fold changes in the ratio of native to denatured wild-type Sfbgly. Based on that threshold, the mutations K49A, P62A, W143A, P188A, H223A, P278A, P309A, W452A and F460A were identified as replacements at positions involved in the thermal stability of Sfbgly (Table 3).

SC positions related to the catalytic activity in b-glucosidases
Although no structural information was used in the SCA, visual inspection of the Sfbgly structural model revealed that residues at 11 positions related to the enzymatic activity are directly connected to active site residues through covalent or non-covalent bonds. Therefore, those 11 residues form a single group indirectly connected to each other through the active site residues and their non-covalent interactions with the substrate. This group was labeled sector A ( Figure 1; Table 4). This observation is similar to previous SCAs of different protein families, which also showed groups of residues forming chains of interactions that connect distant points of their structures [19,26]. In the particular case of serine proteases, two sets of covariant positions were identified. The first set contained residues involved in the catalytic mechanism (including the catalytic triad), and the second set contained residues present in the S1 pocket and was involved in substrate specificity [13].
Structural comparison showed that as in Sfbgly, the placement of sector A positions in close contact to the active site is also observed for different b-glucosidases from the GH1 family ( Figure 1; Table 4). In brief, residues at sector A positions contact two glutamate residues (essential for catalysis), a pair of residues (tyrosine and arginine) involved in the modulation of the pK a of the catalytic glutamate, and a group of residues that bind the substrates glycone and aglycone (Table 4). Thus, sector A residues might modulate several b-glucosidase properties, from catalytic activity to substrate specificity by affecting the positioning and properties of their active site residues. Moreover, data presented here suggest that the active site of the b-glucosidases is formed by a ''layer'' of highly conserved residues that interacts directly with the substrate and promotes its hydrolysis and is surrounded by a second ''layer'' formed by residues of the sector A positions (Table 4; Figure 1).
Evidence that joint variation of residues at sector A positions is involved in substrate binding and specificity is found when comparing b-glucosidases and 6-phospho-b-glucosidases, both groups belonging to the GH1 family. Indeed, 99% of the sequences exhibiting Y at position 460 also present S at position 451, and among these sequences, several were previously characterized as 6-phospho-b-glucosidases. Similarly, 99% of the sequences presenting E at position 451 present F at position 460, and several have b-glucosidase activity. Thus, the identities of residues at positions 451 and 460 are linked and directly connected to enzyme specificity. In 6-phospho-b-glucosidases, the replacement of E by S creates space in their active sites for the binding of substrates containing a phosphate group linked to the 6-OH of the glycone and also favors the formation of a hydrogen bond with this substrate. An additional hydrogen bond with the 6-phosphate group is formed by Y460 [31,32] (Figure S4). Therefore, changing the substrate specificity of these two groups of b-glucosidases depends on replacements at both positions 451 and 460. A similar result was observed for the joint mutations C42A, C58A/V and S195T at trypsin covariant residues, which converted this enzyme from a serine to threonine protease [13].
Finally, five residues, P62, G195, Y196, P278 and P309, ( Table 3) were not included in sector A because no direct or even indirect interactions connecting them to the active site were identified. Thus, the replacements of those residues affected the Sfbgly activity through currently unknown mechanisms.  The numbering of the sector A positions was based on the Sfbgly sequence. * identifies catalytic glutamic acids; p -indicates residues involved in the modulation of the pK a of the catalytic glutamic acids; bg -shows residues involved in the binding of the substrate glycone; abr -indicates residues that form the aglycone binding region. Data regarding the role of individual residues in substrate binding and catalysis were retrieved from the literature [5,27,28,29,9,30]. The coupled positions in which replacements significantly affect Sfbgly T m (Table 3) are mostly occupied by prolines (4 out 9 positions) or positively charged residues (K49 and H223). Conversely, only one proline residue was identified among the eight coupled positions not related to Sfbgly thermal stability, and no charged residue is present among them. In addition, sequence comparison shows that positions 62, 188, 278 and 309 (Sfbgly numbering) are predominantly occupied by proline residues among family 1 b-glucosidase sequences (frequencies higher than 50%), whereas positively charged residues are dominant at positions 49 and 223 (90% and 63%, respectively). Moreover, structural comparison shows that positions related to thermal stability are concentrated at the loop segments of the bglucosidases (8 out 9 for Sfbgly), whereas positions not related to thermal stability are mainly located in their a-helices and b-strands (5 out 8 for Sfbgly). Naturally, this distribution mainly results from the properties of the proline residues, which favor loop segments [33]. Indeed, these data are in agreement with previous observations demonstrating that proline and charged residues and loop length and mobility are related to the thermal stability of proteins [34,35]. Thus, based on their similarities, coupled positions related to the thermal stability of b-glucosidases were labeled sector S ( Figure 2). Therefore, in contrast to sectors observed in the serine protease [13] and sector A of b-glucosidases (discussed above), which are composed of residues contacting each other, sector S of the bglucosidases is formed by positions with no physical interactions but is characterized by the prevalence of proline residues. Because of their placement in loops, sector S positions are jointly related to the thermal stability of b-glucosidases. Figure S1 SDS-PAGE of the purified wild-type and mutant Sfbgly proteins. Mutant enzymes are identified by the residue and number of the covariant position followed by A, representing the replacement that they contain. Wt stands for wild type Sfbgly. Details regarding the protein purification procedures are found in the Materials and Methods section. The gels (12% polyacrilamide) were stained using Coomassie blue R-250. Approximately 2 mg of protein was loaded in each lane. MW, molecular weight markers. (TIF) Figure S2 The effect of substrate concentration on the activity of the wild-type and mutant Sfbgly proteins.   Figure S3 Differential scanning fluorimetry of the wildtype and mutant Sfbgly proteins. Samples (10 ng) were incubated with SYPRO stain, and the fluorescence was recorded at different temperatures. Mutant enzymes are identified by the residue and number of the covariant position in which A was introduced. Dots are the experimental data, whereas lines represent the calculated values produced based on the best fitting. Fitting process was performed using the software Enzfitter and the equation deduced on [22]. (TIF) Figure S4 Structural comparison between the active site of b-glucosidases (A) and 6-phospho b-glucosidases (B) from family GH1. A glucose unit (gray and red; A) is represented in the glycone subsite of the b-glucosidases Sfbgly (green; Phyre2 model), Neotermes koshunensis (blue; PDB 3VIF) and wheat Triticum aestivum (yellow; PDB 3AIQ), whereas a 6-phospho glucose unit (gray and red, B) is presented for the 6-phospho bglucosidases from Streptococcus pneumonia (cyan, PDB 4IPN) and Lactococcus lactis (formely know as Streptococcus lactis) (majenta, PDB 4PBG). Residues numbering was based on Sfbgly. Dotted lines represent hydrogen bonds. 69-P-OH indicates the phosphate group bound to the glucose hydroxyl 6. Based on [31] and [32]. (TIF)