Histidine 352 (His352) and Tryptophan 355 (Trp355) Are Essential for Flax UGT74S1 Glucosylation Activity toward Secoisolariciresinol

Flax secoisolariciresinol diglucoside (SDG) lignan is a natural phytoestrogen for which a positive role in metabolic diseases is emerging. Until recently, however, little was known about the biosynthesis of SDG and its monoglucoside (SMG). Flax UGT74S1 was recently identified and characterized as an enzyme that sequentially glucosylates secoisolariciresinol (SECO) into SMG and SDG when expressed in yeast. However, the amino acids critical for UGT74S1 glucosyltransferase activity were unknown. 3D structural modeling and docking, site-directed mutagenesis of five amino acids in the plant secondary product glycosyltransferase (PSPG) motif, and enzyme assays were conducted. UGT74S1 appeared to be structurally similar to the Arabidopsis thaliana UGT72B1 model. Ligand docking predicted that Ser357 and Trp355 bind the phosphate and hydroxyl groups of UDP-glucose, whereas Cys335, Gln337 and Trp355 were predicted to bind the 7-OH, 2-OCH3 and 17-OCH3 of SECO. Site-directed mutagenesis of Cys335, Gln337, His352, Trp355 and Ser357, together with enzyme assays, revealed an alteration of these binding sites and a significant reduction of UGT74S1 glucosyltransferase catalytic activity towards SECO and UDP-glucose in all mutants. UGT74S1 activity was completely abolished when Trp355 was substituted with Ala or Gly, or when His352 was changed to Asp, and an altered metabolite profile was observed in the Cys335Ala, Gln337Ala, and Ser357Ala mutants. This study provides the first evidence that Trp355 and His352 are critical for UGT74S1's glucosylation activity toward SECO and suggests the possibility of producing SMG in vitro.


Introduction
Lignans are a class of diphenolic nonsteroidal phytoestrogens with a wide variety of purported health benefits [1][2][3][4][5][6][7][8]. Different types of lignans have been reported in various plant species and include secoisolariciresinol diglucoside (SDG) encountered mainly in flax (Linum usitatissimum L.) seed [9][10][11][12][13][14]. Flax lignans are usually found glycosylated in oligomeric chains [15]; the aglycone (SECO, MW = 362.4 g/mol) and monoglucoside (SMG) forms are not naturally accumulated in planta. Recently, During et al. [16] reported a linearly increased uptake of lignan aglycone forms (pinolariciresinol -PINO, SECO, and enterolactone -ENL) by human intestinal Caco-2 cells through simple diffusion or by low affinity transporter. Only <0.1% SDG uptake was observed compared to 2% SECO, 2% PINO, and 7% ENL uptake by Caco-2 cells, evidencing the effect of glucosylation on absorption and bioavailability [16]. Due to its lower molecular weight (MW = 524.6 g/mol), it is reasonable to anticipate that SMG may be more prone to uptake through diffusion by Caco-2 cells compared to SDG (MW = 686.7 g/mol). In planta, glycosylation is a key mechanism that determines the chemical complexity and diversity of plant natural products [17,18], ensuring their chemical stability and water solubility while reducing chemical reactivity or toxicity [19], and facilitating their sorting, intercellular transport, storage and accumulation in plant cells [20][21]. Glycosylation is catalyzed by carbohydrate active enzymes (CAZymes), which include the glycosyltransferase (GT) superfamily [22]. Members of the GT superfamily have been classified into 94 families, with family 1 referred to as uridine glycosyl transferases (UGTs) [23,24]. Plant UGTs are characterized by a 44 amino acid signature motif known as the plant secondary product glycosyltransferase (PSPG) box [15,24,25].
UGTs transfer UDP-activated sugar moieties, including UDP-glucose, to specific acceptor molecules [26] and contribute to their structural diversity. Based on sequence homology, more than 120 UGTs have been reported in Arabidopsis and were grouped into 30 sub-families, classified from UGT71 to UGT100 [22]. Recently, Barvkar et al. [27] reported 137 functionally uncharacterized flax UGTs from the flax draft genome [28]. Concomitantly to Barvkar's study [27], we have cloned and characterized five family 1 UGT genes belonging to four UGT families and to five sub-families referred to as UGT74S1, UGT74T1, UGT89B3, UGT94H1, UGT712B1. The functional characterization of the five UGTs identified UGT74S1 as the only one using SECO as substrate, forming SECO monoglucoside (SMG) and then SDG in a sequential manner [29]. Nonetheless, its catalytic mechanisms were not known. Many plant UGTs have been shown to be more regiospecific than substrate specific [30,31]. In flax, UGT74S1 was shown to glucosylate only SECO among the different aglycones tested [29]. However, its strict substrate specificity is unknown and its regiospecificity cannot be ruled out.
Despite the large amount of plant UGT sequence data available in databases, the crystal structures of only a few have been reported so far [36]. The gap between the number of existing sequences and structures has driven computational methods for predicting protein structures [32]. Homology modeling and structure-based protein-ligand docking analyses have become powerful tools for predicting functional residues in proteins, guiding the development of functional hypotheses [32][33][34][35], rationalizing experimental data, and designing directed mutagenesis experiments [37]. As such, 3D structure modeling and docking of UGT85B1 and UGT94B1 proteins with ligands have been reported in several plant species including Sorghum bicolor and Bellis perennis [38,39]. Whereas the general role of the PSPG motif in substrate recognition and catalytic activity is widely accepted [18,39], the specific role of individual amino acids within this motif is still a matter of debate and extensive investigation. Structure-based rational mutagenesis studies have helped in the identification of key amino acids involved in substrate binding and catalysis of UGTs, and have generated mutants with altered regiospecificity, compromised activity or turnover [41]. In Medicago truncatula, Phe 148 and Tyr 202 were found to control regiospecificity for quercetin glucosylation by UGT71G1 [30]. In soybean, Glu 392 was identified as critical for GmlF7GT primary catalysis [42]. Currently, little is known about the key amino acids playing an essential role in the flax UGT74S1 glucosylation activity toward SECO. The objective of this study was to determine the role played by five amino acids located within the PSPG motif of the flax UGT74S1 in SECO glucosylation.
Using 3D structural modeling of UGT74S1 protein, ligand docking, site-directed mutagenesis, heterologous expression and enzyme assays, we showed that Gln 337 and Ser 357 are essential for SMG conversion to SDG and that Trp 355 and His 352 are key critical amino acids within the PSPG motif and are determinant for UGT74S1 glucosylation activity toward SECO in vitro.

Molecular modeling and docking
Secondary and 3D protein structures were predicted using the Protein Homology/analogy Recognition Engine V2.0 (Phyre2) software [32]. Briefly, the hidden Markov model (HMM) of UGT74S1 (JX011632; AGD95005) was constructed by iteratively detecting homologs in a protein database using the PSI-BLAST search engine. These homologs were scanned against experimentally solved structures present in an HMM protein library. The 3D protein models were constructed based on the alignments between the HMM of the UGT74S1 sequence and the HMMs of known structures. The UGT74S1 structure was finally determined using the structure of the Arabidopsis thaliana UGT72B1 (Q9M156; PDB ID, d2vcha1; X-ray diffraction with a resolution of 1.45 Å) [43] as the closest search model. The generated protein model of wild type UGT74S1 was then subjected to beta testing using Phyre2 Investigator [32], interactively examining the pocket detection. The ZINC database codes [44] for three different conformers of SECO (2020114, 14694365, 14694366) and one for UDP-glucose (30320665) were used individually with the designed 3D protein structure of UGT74S1 for docking in SwissDock [34]. The predicted binding sites between each SECO structure or UDP-glucose and the UGT74S1 protein were clustered and visualized using UCSF Chimera [45]. Only the clusters and models showing the lowest energy and the highest number of intermolecular H bonds were selected, using default parameters (2 Å bond length and 0.4 Å relaxation), and the same procedure was applied for the rest of the study to predict docking and ligand binding sites in the mutant variants.
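The pose selection rule described above (lowest energy, highest number of intermolecular H bonds) can be illustrated with a minimal Python sketch. The text does not state how the two criteria were combined, so the ordering used here (energy first, H-bond count as tie-break) is an assumption, and the `Pose` class, `rank_poses` function and example values are hypothetical, not output from SwissDock or UCSF Chimera.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """One docked ligand pose (hypothetical record, not a SwissDock format)."""
    cluster: int     # cluster index assigned by the docking run
    energy: float    # estimated binding energy, kcal/mol (more negative is better)
    h_bonds: int     # number of intermolecular hydrogen bonds detected


def rank_poses(poses):
    """Sort poses by lowest energy, then by highest H-bond count.

    Assumption: energy is the primary criterion and the H-bond count
    only breaks ties; the source does not specify the combination rule.
    """
    return sorted(poses, key=lambda p: (p.energy, -p.h_bonds))


# Hypothetical example values, for illustration only.
EXAMPLE_POSES = [
    Pose(cluster=0, energy=-8.1, h_bonds=1),
    Pose(cluster=1, energy=-9.3, h_bonds=2),
    Pose(cluster=2, energy=-9.3, h_bonds=4),
]

if __name__ == "__main__":
    best = rank_poses(EXAMPLE_POSES)[0]
    print(f"selected cluster {best.cluster}: {best.energy} kcal/mol, {best.h_bonds} H bonds")
```

A different weighting (for example, H-bond count first) would only require changing the sort key.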

Site-directed mutagenesis
Cloning of the full length UGT74S1 cDNA and its protein expression have been previously described [29]. The pYES2/NT C plasmid construct harboring the full length cDNA for UGT74S1 was used as a template for all the site-directed mutagenesis reactions. Amino acids targeted for mutation were those predicted to bind the sugar donor or the acceptor in the molecular docking analysis of wild type UGT74S1. The highly conserved His 352 was also included as a target. These amino acids were substituted with smaller hydrophobic amino acids such as alanine or glycine, or with the charged aspartic acid, based on Phyre2 Investigator server prediction. This server predicts the effects of amino acid substitutions using the SuSPect method, a standalone web server that generates a mutational graph analysis by modelling the effect of specific amino acid mutations within the UGT74S1 protein sequence. Gene-specific primers containing the desired mutations were designed in such a way that they were 100% complementary to one another, had no overhangs, and the mutation site was centrally located on both primers (Table S1). The sequences of the wild type UGT74S1 used for designing the mutagenesis forward and reverse primers are presented in S1 Fig. These primer sets were used to mutate the 5 targeted amino acids by performing a DNA methylation and mutagenesis reaction as instructed by the GeneArt site-directed mutagenesis system (Invitrogen, Carlsbad, CA, USA). Briefly, the final methylation and mutagenesis reaction mixture consisted of 5 µL of 10X AccuPrime Pfx reaction mix, 5 µL of 10X Enhancer, 1.5 µL of primer mix (10 µM each), 20 ng of plasmid DNA, 1.5 µL of DNA Methylase (4 U/µL), 2.0 µL of 25X SAM, 0.4 µL of AccuPrime Pfx polymerase (2.5 U/µL), and PCR grade water (Life Technologies, Carlsbad, CA, USA), for a final volume of 50 µL.
The methylation reactions were activated at 37˚C for 20 min and the inverted mutagenesis PCR cycling conditions consisted of an initial denaturation at 94˚C for 2 min followed by 18 cycles of 94˚C for 20 s, 57˚C for 30 s, and 68˚C for 3 min. The final extension was carried out at 68˚C for 5 min. After the reactions, 5 µL of the PCR products were visualized on a 0.8% agarose gel and the remaining products were purified using a PCR purification kit (Qiagen, Hilden, Germany).
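As a quick sanity check on the cycling conditions above, the program can be written down as data and its total programmed hold time computed. This is only an illustrative sketch: the variable and function names are hypothetical, and instrument ramp times between temperatures are ignored.

```python
# Each step is (label, temperature in deg C, hold time in seconds),
# taken from the methylation and mutagenesis PCR conditions in the text.
METHYLATION = ("methylation", 37, 20 * 60)
INITIAL_DENATURATION = ("initial denaturation", 94, 2 * 60)
CYCLE = [
    ("denaturation", 94, 20),
    ("annealing", 57, 30),
    ("extension", 68, 3 * 60),
]
N_CYCLES = 18
FINAL_EXTENSION = ("final extension", 68, 5 * 60)


def total_hold_seconds():
    """Sum of all programmed hold times, excluding ramping between steps."""
    per_cycle = sum(duration for _, _, duration in CYCLE)
    return (
        METHYLATION[2]
        + INITIAL_DENATURATION[2]
        + N_CYCLES * per_cycle
        + FINAL_EXTENSION[2]
    )


if __name__ == "__main__":
    seconds = total_hold_seconds()
    print(f"programmed hold time: {seconds} s ({seconds / 60:.0f} min)")
```

Under these assumptions the programmed holds add up to 96 min, of which 69 min are the 18 amplification cycles.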

In vitro DNA recombination and bacterial transformation
The in vitro DNA recombination reaction mixture consisted of 4 µL of 5× reaction buffer, 2 µL of 10× Enzymer mix (Invitrogen), 8 µL of purified PCR product from the mutagenesis reaction, and 6 µL of PCR grade water, for a final volume of 20 µL. The mixture was incubated at room temperature for 10 min and the recombination reaction was stopped by adding 1 µL of 0.5 M EDTA prior to transformation into E. coli strain DH5α-T1R. For the transformation, 5 µL of recombination product and 50 µL of DH5α-T1R cells were used. A total of 100 µL of the 10-fold diluted transformation reaction was spread on pre-warmed LB agar plates containing 50 µg/mL of ampicillin and incubated at 37˚C for 16-20 h. From each of the six transformation events, 10 colonies were analyzed by colony PCR using the UGT74S1 gene-specific forward and reverse primers as well as the pYES2/NT C plasmid vector-specific T7 forward and CYC reverse primers. Plasmid DNA from three positive clones of each transformation event was sequenced by the Sanger method using the same UGT74S1 gene-specific and pYES2/NT C vector-specific primer pairs for confirmation of mutations.

Heterologous expression
The pYES2/NT C plasmid constructs harbouring the cDNA of wild type UGT74S1 and those of the 6 mutant UGT74S1 genes were used to transform yeast INVSc1 strains, as described by Ghose et al. [29]. Briefly, single transformant INVSc1 yeast colonies were inoculated into 15 mL of Saccharomyces cerevisiae minimal media without uracil (SC-U), prepared as recommended by the supplier (Invitrogen), supplemented with 2% raffinose, and grown for 3 days under shaking at 250 rpm at 30˚C until the OD600 reached 2.0. The culture was diluted in 50 mL of induction medium (SC-U supplemented with 1% raffinose and 2% galactose) to achieve an initial OD600 of 0.4 and further incubated under shaking at 30˚C for 8 h. The induced yeast cells were harvested by centrifugation at 1,500 g for 5 min at 4˚C. The cells were washed using 500 µL of sterile cold distilled water, centrifuged, and the pellets washed with 500 µL of lysis buffer (50 mM sodium phosphate, pH 7.4, supplemented with 5% glycerol and 1 mM PMSF) at 4˚C. After centrifugation, the cells were mechanically disrupted by vortexing for 30 s in the presence of an equal volume of 425-600 µm acid-washed glass beads (Sigma-Aldrich, St. Louis, MO, USA) and incubated on ice for 30 s. The vortexing and incubation cycle was repeated 4 times to ensure complete cell lysis. The lysates were centrifuged at 18,620 g for 10 min at 4˚C and the supernatant was collected. The polyhistidine-containing recombinant proteins were purified using the ProBond (Invitrogen) purification system following the manufacturer's instructions. The purified enzymes were concentrated using a 0.5 mL Ultracel-10k Amicon membrane column (Millipore, Billerica, MA, USA). Protein concentration was determined using the Bradford protein assay kit (Bio-Rad Laboratories, Hercules, CA, USA).
The stability and expression of all six mutant proteins were monitored by Western blotting in which 40 µg of total protein was separated on a 12% polyacrylamide gel and transferred onto Immuno-blot PVDF membranes (Bio-Rad) using the Trans-Blot SD semi-dry Transfer System (Bio-Rad). The membrane was first blocked in 5% nonfat dry milk blotting grade blocker (Bio-Rad) in TBST and incubated overnight with goat Anti-Xpress monoclonal antibody (1:5,000 dilution, Life Technologies). The blots were washed with 0.05% TBST and incubated for one hour with HRP-conjugated rabbit anti-goat IgG secondary antibody (1:6,000 dilution, Bio-Rad). After washing with 0.05% TBST, protein bands were developed using the Immunostar Western C chemiluminescence kit Western blotting detection reagent (Bio-Rad) and photographed with a ChemiDocXRS molecular imaging system (Bio-Rad).

Enzyme assays and UPLC metabolite profiling
The purified recombinant proteins obtained from the yeast cultures harboring the wild type or the six mutant variants of UGT74S1 were reacted with the acceptor substrate SECO (Chromadex, Irvine, CA, USA) in the presence of UDP-glucose following the optimized conditions reported in [29]. Alternative aglycones including quercetin, kaempferol, coumaric acid, and caffeic acid were also tested as substrates with each mutant and the wild type of UGT74S1. The 100 µL reaction mixture consisted of a reaction buffer (50 mM sodium phosphate, 1 mM PMSF, 5% glycerol, pH 7.4), 280 µM aglycone SECO substrate (acceptor for glucosylation), and 1.64 mM UDP-glucose (sugar donor) (Sigma-Aldrich). The reaction mixtures were pre-incubated at 30˚C for 10 min and the reactions were initiated by addition of 80 µg of enzyme. After incubation at 30˚C for 30 min, the reactions were stopped with 100 µL of 0.5% trifluoroacetic acid in acetonitrile. The reaction mixtures were purified using 0.2 µm membrane filters (Pall Life Sciences, Port Washington, NY, USA) to remove any particulates that might form during the reaction. The separation and identification of the reactants and resulting products were carried out using a Waters H-Class Acquity UPLC system (Waters, Milford, MA, USA) equipped with a TQD tandem mass spectrometer (Waters) using a Waters CSH C18 column (100 mm × 2.1 mm, 1.8 µm particle size). The formation of glucosylated products was monitored by examining the parent m/z masses and the principal fragments of eluted peaks via ESI-mass spectrometry [29]. Two parallel MS2 scans were performed ranging from 120-800 a.m.u., using 15 and 45 V cone voltages. Selected ion recording (SIR) spectra were also collected to enhance the sensitivity of detection for SECO, SMG and SDG. The capillary voltage was set to 3 kV, and the extractor and RF lens to 0.1 V.
Chromatographic conditions consisted of a ternary gradient system composed of water (A), acetonitrile (B), and 10% formic acid in water (C), varied according to the following gradient: t0, A = 68%, B = 2%, C = 30%; t1 = 4.4 min, A = 0%, B = 70%, C = 30%; held isocratically until 6 min, afterward ramped down to starting conditions at 7 min, then held isocratically for 1 min to equilibrate before the next injection. Peaks were detected at 280 nm, indicative of phenolic lignan compounds, and were validated using authentic standards (SECO and SDG) purchased from Chromadex (Irvine, CA, USA). An in-house standard of SMG was prepared by partial acid hydrolysis of SDG, purified and validated by LC-MS and NMR; SDG itself was generated and purified by alkaline hydrolysis of bulk flaxseed lignans [29]. All the reactions were carried out in three replicates and the data presented are the means ± standard deviations. A one-tailed Student's t-test was performed to compare metabolite production levels among UGT74S1 variants.
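The ternary gradient described above (A = 68%, B = 2%, C = 30% at t0; A = 0%, B = 70%, C = 30% at 4.4 min; hold to 6 min; return to starting conditions at 7 min; hold to 8 min) can be expressed as a small lookup that returns the solvent composition at any time point. This is a sketch that assumes linear ramps between the stated time points, which the text does not explicitly confirm; `GRADIENT` and `composition` are hypothetical names.

```python
# (time in min, (%A water, %B acetonitrile, %C 10% formic acid in water))
GRADIENT = [
    (0.0, (68.0, 2.0, 30.0)),   # starting conditions
    (4.4, (0.0, 70.0, 30.0)),   # end of the ramp
    (6.0, (0.0, 70.0, 30.0)),   # end of the isocratic hold
    (7.0, (68.0, 2.0, 30.0)),   # back to starting conditions
    (8.0, (68.0, 2.0, 30.0)),   # end of the 1 min re-equilibration
]


def composition(t):
    """Return (%A, %B, %C) at time t (min), assuming linear ramps."""
    if not GRADIENT[0][0] <= t <= GRADIENT[-1][0]:
        raise ValueError(f"time {t} min is outside the 0-8 min program")
    for (t_start, c_start), (t_end, c_end) in zip(GRADIENT, GRADIENT[1:]):
        if t_start <= t <= t_end:
            fraction = (t - t_start) / (t_end - t_start)
            return tuple(a + fraction * (b - a) for a, b in zip(c_start, c_end))


if __name__ == "__main__":
    for minute in (0.0, 2.2, 4.4, 5.0, 6.5, 8.0):
        a, b, c = composition(minute)
        print(f"{minute:4.1f} min: A={a:5.1f}%  B={b:5.1f}%  C={c:5.1f}%")
```

Channel C stays at 30% throughout, so only the water to acetonitrile ratio changes during the run.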

Wild and mutant UGT74S1 kinetic parameters
The kinetic parameters of the wild type and two UGT74S1 mutants were determined using a range of concentrations for the sugar acceptor substrate (70-1650 µM SECO with fixed 1.64 mM UDP-glucose) or a range of concentrations for the sugar donor (0.82-8.2 mM UDP-glucose with fixed 280 µM SECO) for 30 min, under the optimum conditions described by Ghose et al. [29]. A total of 80 µg of protein was used to determine the apparent Vmax and Km values for SECO and UDP-glucose from the Lineweaver-Burk plots. The kcat was determined by dividing Vmax by the molar concentration of the enzyme.
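The Lineweaver-Burk procedure can be sketched as follows. The double-reciprocal form of the Michaelis-Menten equation, 1/v = (Km/Vmax)(1/[S]) + 1/Vmax, is a straight line whose intercept gives 1/Vmax and whose slope gives Km/Vmax. The Python below fits that line by ordinary least squares; the function names are hypothetical, no measured rates from this study are used, and the enzyme molarity helper assumes the 80 µg of protein is dissolved in the 100 µL assay volume with the 56.4 kDa mass of the tagged protein.

```python
def lineweaver_burk(substrate, velocity):
    """Estimate (Vmax, Km) from a least-squares fit of 1/v against 1/[S].

    substrate and velocity are equal-length sequences of positive numbers;
    Vmax is returned in the units of velocity, Km in the units of substrate.
    """
    x = [1.0 / s for s in substrate]
    y = [1.0 / v for v in velocity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                    # Km / Vmax
    intercept = mean_y - slope * mean_x  # 1 / Vmax
    vmax = 1.0 / intercept
    km = slope / intercept
    return vmax, km


def enzyme_molarity(mass_g, molar_mass_g_per_mol, volume_l):
    """Molar enzyme concentration from mass, molar mass and reaction volume."""
    return mass_g / molar_mass_g_per_mol / volume_l


def kcat(vmax, enzyme_conc):
    """Turnover number: Vmax divided by the molar enzyme concentration."""
    return vmax / enzyme_conc


if __name__ == "__main__":
    # Synthetic rates generated from assumed Km = 200 and Vmax = 5 (arbitrary units).
    conc = [70.0, 140.0, 280.0, 560.0, 1120.0, 1650.0]
    rate = [5.0 * s / (200.0 + s) for s in conc]
    print(lineweaver_burk(conc, rate))
    # Assumed: 80 ug of a 56.4 kDa protein in a 100 uL reaction.
    print(enzyme_molarity(80e-6, 56400.0, 100e-6))
```

Because the reciprocal transform amplifies error at low substrate concentrations, a nonlinear fit of the untransformed data would be a reasonable cross-check, although the study itself reports Lineweaver-Burk estimates.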

Results
Modeling and molecular docking of wild type UGT74S1 with UDP-glucose and SECO
To determine key catalytic amino acids involved in UGT74S1 glucosylation mechanisms, 3D molecular modeling and ligand docking were conducted. By modeling wild type UGT74S1 against the PDB molecule UDP-glucosyltransferase (PDB ID, d2vcha1, Q9M156) using Phyre2 tools, 458 amino acid residues, representing 98% of the total amino acid sequence, were successfully modeled with more than 90% accuracy. The generated 3D structure (S2 Fig.) had 1.45 Å resolution with 100% confidence. The PSI-BLAST search showed a 25% identity between the wild type UGT74S1 and the PDB target molecule. Phyre2 Investigator 2D prediction detected the amino acids potentially located in the enzyme pocket, as well as amino acids with the highest mutational sensitivity, mainly found clustered within the PSPG region (S3 Fig.).
Using the UGT74S1's 3D structure and its ligand (SECO or UDP-glucose) ZINC codes, the enzyme-ligand binding sites were predicted (Table 1; Fig. 1). Two hydrogen bonds were found to be involved in binding UDP-glucose to the UGT74S1 protein. The first bond occurred between the serine at position 357 (Ser 357 ) and the oxygen atom 8 located on the phosphate group 2 within the UDP-glucose moiety, forming a 2.207 Å hydrogen bond (Table 1; Fig. 1). The second bond occurred between the tryptophan at position 355 (Trp 355 ) and the oxygen atom 13 of the hydroxyl group located at position 13 of UDP-glucose, forming a 2.145 Å hydrogen bond (Table 1; Fig. 1). These UDP-glucose binding sites were located within the HCGWNS region of PSPG. Similarly, the docking analysis predicted three amino acid residues to be involved in four hydrogen bond formations with SECO. First, the oxygen atom 1 of the methoxy-ester group at position 2 of SECO formed a 2.512 Å bond with the glutamine at position 337 (Gln 337 ). Secondly, the hydrogen 9 of the hydroxyl group located at position 7 of the SECO formed a 1.802 Å bond with cysteine at position 335 (Cys 335 ). Thirdly, the oxygen atom 2 of the hydroxyl group at position 7 of the SECO formed a 2.566 Å bond with cysteine at position 335 (Cys 335 ), and fourthly, the oxygen atom 4 of the methoxy-ester group at position 17 of SECO formed a 2.486 Å bond with tryptophan at position 355 (Trp 355 ) of UGT74S1 (Table 1, Fig. 1). Based on

In vitro site-directed mutagenesis
Considering all 20 possible amino acids at each position of the query protein (wild type UGT74S1), the SuSPect method predicted the likelihood that each substitution would impair enzyme function. Substitutions of Cys 335, Gln 337, Trp 355, Ser 357, and His 352 by Ala, Gly or Asp were predicted to have a high likelihood of impairing UGT74S1 function (S4 Fig.), thus supporting the selection of the targeted amino acids based on docking data. Using six pairs of primers targeting the 5 amino acids in site-directed mutagenesis, 6 mutated full length cDNA sequences were generated and confirmed by sequencing (S5 Fig.). These 6 mutated cDNAs were translated in silico into 6 mutant proteins referred to as the Cys335Ala, Gln337Ala, Ser357Ala, Trp355Ala, Trp355Gly, and His352Asp mutants.

Comparative 2D structures of the wild and mutant UGT74S1 proteins
To gain more insight into the structural changes that may affect the UGT74S1 glycosyltransferase activity in vitro, a secondary (2D) protein structural analysis of the mutant and the wild type proteins was performed. Mutations induced changes in the number of α-helices and β-strands and a high diversity in the β-strand number (S6 Fig.; S2 Table). The mutant Ser357Ala protein carrying an alanine mutation and the wild type showed the same number of α-helices and β-strands. In contrast, whereas all other mutants displayed 17 α-helices, the mutant protein Trp355Ala showed 16 α-helices. To further assess whether these structural changes affected the protein conformations and binding features, 3D molecular models of the wild type and mutants of UGT74S1 were built (S7 Fig.) and their homology-based modeling matched the same UGT template with 100% confidence. The template and targets showed 24-28% identity, and overall, 458-463 amino acid residues were successfully modeled with more than 90% accuracy (S3 Table).

Molecular docking of the six UGT74S1 mutants with UDP-glucose and SECO
To further assess whether site-directed mutagenesis affected the predicted UGT74S1 binding characteristics, a molecular modeling and docking study was conducted using each mutant variant as performed for the wild type UGT74S1. Gains, losses, or changes of amino acid binding sites were observed for all the six UGT74S1 mutant variants for UDP-glucose and SECO (Table 2, 3). Using UDP-glucose as a ligand, two hydrogen bonds were found to be involved in binding UDP-glucose to the mutant Cys335Ala protein (Table 2; S8 Fig.). The first bond occurred between the oxygen atom 8 on phosphate group 2 of UDP-  Fig.).
Using the sugar acceptor SECO as a ligand, the docking of each of the Cys335Ala and Gln337Ala mutant proteins predicted asparagine 356 (Asn 356) forming two hydrogen bonds with SECO (Table 3; S8 Fig.). With both Cys335Ala and Gln337Ala mutant proteins, the interactions occurred between the Asn 356 residue and oxygen atom 4 of the methoxy-ester group located at position 17, and oxygen atom 5 of the hydroxyl group at position 16 of SECO (Table 3; S8 Fig.). The three mutant proteins Ser357Ala, Trp355Gly and His352Asp were predicted to bind SECO through three hydrogen bonds involving the Cys 335 and Gln 337 amino acid residues and the oxygen atom 2 of the hydroxyl group at position 7, the hydrogen 9 of the hydroxyl group at position 7, and the oxygen 1 of the methoxy-ester group at position 2 of SECO. The mutant protein Trp355Ala was predicted to interact with SECO through a single bond involving the Ser 180 residue and the hydrogen 23 of the hydroxyl group at position 16 of SECO. The bond lengths between the three proteins and SECO were fairly similar (Table 3; S8 Fig.). None of the hydroxyl groups at positions 3 and 6, the potential targets for glucosylation, were involved in hydrogen bond formation with the proteins (S8 Fig.).

In vitro heterologous protein expression and enzyme activity
To assess the expression and functionality of the different mutant proteins, the full length cDNAs for the wild type UGT74S1 and that of each of the 6 UGT74S1 mutants were expressed in vitro in yeast. Similar to the wild type UGT74S1, all six mutants produced, along with the Histidine-Tag, a discrete protein band of 56.4 kDa (Fig. 2).
To determine the biological effects of the predicted ligand binding site alterations induced by site-directed mutagenesis, enzyme assays were performed using the purified proteins from the wild type UGT74S1 and each of the six mutant versions. A significant reduction of UGT74S1 glucosyltransferase activity towards SECO was observed in all mutants when compared to the wild type (Fig. 3). Neither the wild type nor any of the UGT74S1 mutants glucosylated any of the other aglycone substrates tested, and only the substrate peaks were observed on chromatograms (data not shown). When Trp 355 was substituted by either Ala or Gly, a complete abolition of activity was observed. Likewise, no glucosylation activity was observed when His 352 was substituted by Asp. Mutation of Cys 335, Gln 337 and Ser 357 altered the lignan profiles in the reactions, with SDG production significantly reduced or abolished, while the SMG intermediate was still produced. Mutant Cys335Ala produced a significantly (P < 0.001) lower level of SDG compared to the wild type, but produced a significantly higher (P < 0.001) amount of SMG. Mutants Gln337Ala and Ser357Ala produced only SMG, in amounts significantly (P < 0.001) lower than that produced by the wild type. No SDG was produced by these two mutants (Fig. 3).

Wild and mutant UGT74S1 kinetic parameters
Under the optimal conditions (80 µg enzyme, 1 mM NaCl, pH 8.0, 30˚C), the apparent Km values of the wild type and mutant (Cys335Ala and Gln337Ala) UGT74S1 proteins toward SECO or UDP-glucose were determined for SMG production (Table 4). The UGT74S1 mutants showed lower affinity (higher Km) for SECO and UDP-glucose compared to the wild type. The wild type UGT74S1 had a higher catalytic efficiency (kcat/Km), reacting 10 and 112 times faster with SECO than Cys335Ala and Gln337Ala, respectively, and 1.64 and 6 times faster with UDP-glucose than the same mutants, respectively. These kinetic parameters demonstrate that the UGT74S1 mutants were less efficient in converting the substrate into SMG.

Discussions
Prediction of protein-ligand interactions has paved the way to rational amino acid residue mutagenesis as an approach for a better understanding of their biological roles in protein activity and catalysis [46][47][48][49]. In this study, molecular docking, site-directed mutagenesis, and enzyme activity assays were conducted to determine the role played by five amino acid residues located within the PSPG motif of the recently characterized flax UGT74S1 [29]. The 3D structure of the UGT74S1 protein was constructed and its binding sites to ligands were predicted. Assessment of the wild type and mutant proteins' activities experimentally substantiated these predictions. Mutation of Trp 355 and His 352 completely abolished UGT74S1 enzyme activity toward SECO. Although the HCGWNS motif appears to be essential for UGT activity, binding to substrates, and product formation (based on the work of several other groups), this study is the first to report on a two-step glucosylation enzyme and raises the possibility of producing a non-naturally occurring metabolite following mutagenesis within the motif. The findings not only confirmed our previous reports [29], but provided the first evidence that Trp 355 and His 352 are critical for UGT74S1 glucosylation activity toward SECO.
Using Phyre homology-based modeling, a 3D structure for UGT74S1 was produced with 90% accuracy and 100% confidence. The identity of UGT74S1 to known proteins was 25% and permitted an accurate modeling. Indeed, despite low primary sequence similarity, the secondary and tertiary structures of GTs are highly conserved [18], and templates sharing less than 20% identity are commonly used for homology modeling by exploiting the 2D and 3D structure conservation between a query protein and a template protein of known structure [18,32]. By docking the UGT74S1 3D structure with ligands, UDP-glucose and SECO were fitted in a pocket located in the PSPG region. UDP-glucose binds to Trp 355 and Ser 357 at the C-terminal portion of the PSPG, while SECO binds primarily to Cys 335 and Gln 337 at the N-terminal portion of this conserved motif, with one hydrogen bond established with Trp 355 at the C-terminal region, consistent with previous reports [18,35,50]. These predicted binding sites served for rational targeted mutagenesis that produced six UGT74S1 mutants, unprecedented in flax. Site-directed mutagenesis modified the protein secondary structure and binding sites to ligands. The wild type UGT74S1 consisted of 17 α-helices and 13 β-strands. In contrast, a slightly higher number (14-15) of β-strands was observed in all mutants, except for mutant Ser357Ala, which was made of 13 β-strands, as was the wild type UGT74S1. All mutants displayed 17 α-helices except the mutant Trp355Ala, which was made of 16 α-helices and 15 β-strands, thus being different from the wild type and all other mutants. Hence, wild type UGT74S1 and its mutant variants displayed only a slight 2D structure variation. Previous studies have described 14 α-helices and 12 β-strands in UGT85B1 [38] whereas 16 α-helices and 13 β-strands were reported in UGT94B1 [36].
These observations are in agreement with the concept that 2D and 3D structures of query proteins and templates of known structures are conserved despite variations in the primary structures [18,32,41] and are further indications of the accuracy of our modeling. Changes in the 2D protein structures after site-directed mutagenesis were also accompanied by changes in the predicted ligand binding sites of the UGT74S1 variants, although their protein expression patterns in yeast were not different. Alterations of the predicted ligand binding sites in mutants were substantiated by enzyme assays. Reduced SECO glucosylation activity and alterations in the amounts and type of end products were observed in mutants carrying Cys335Ala, Gln337Ala, and Ser357Ala proteins, as evidenced by their lower catalytic efficiency (kcat/Km) when compared with the wild type. Mutant Cys335Ala produced both SDG and SMG, with the latter being synthesized at a level more than three times that of SDG. Mutants Gln337Ala and Ser357Ala produced only SMG, albeit in small amounts. Both mutant proteins were predicted to bind UDP-glucose at Trp 355, suggesting its important role in SMG production. That only SMG was observed in these two mutants may be attributable to their extremely low catalytic efficiency toward both SECO and UDP-glucose, which would leave them unable to achieve the second glucosylation step into SDG. Some UGTs such as UGT78D2 and UGT71C2 have been reported to glycosylate only the 3-OH or 7-OH positions, respectively, whereas other UGTs such as UGT88A1 were capable of recognizing multiple positions [31]. UGT74S1 is part of the latter group as reported by Ghose et al. [29], and mutation of its Gln 337 and Ser 357 led to glucosylation of a single position as shown in mutants Gln337Ala and Ser357Ala.
Hence, Gln 337 and Ser 357 can be engineered and used to produce a single metabolite (SMG) in vitro, as this metabolite may be more bioavailable than SDG because of inter-individual differences in the gut microflora ensuring the deglucosylation of polyphenolics prior to absorption. The substitution of Trp 355 by the hydrophobic amino acids Ala or Gly, and of His 352 by the negatively charged Asp, resulted in the complete abolition of enzyme activity. It seems that these three mutations induced unsuitable UDP-glucose binding positions in the proteins and thus prevented catalysis. This assumption is in line with the extremely low catalytic efficiency observed with the mutants Gln337Ala and Ser357Ala, which showed much reduced activity. It is well known that a prerequisite and crucial point for activity is the position of the acceptor -OH, -NH2 or -SH functional groups amenable to glycosylation [18]. For UGTs, the accepting functional group needs to be positioned near the 1′C (C10 and C20 in this study; S8 Fig.) of the sugar-donor glucose and near the amino acid that acts as a general base to facilitate the deprotonation of the acceptor. In most plant UGTs, this deprotonating amino acid is a histidine residue [40,43-52]. In this study, mutation of His 352 located in the PSPG abolished UGT74S1 activity. Thus, it is reasonable to assume that His 352 of UGT74S1 may be responsible for the deprotonation of the acceptor SECO and that its mutation to Asp impaired this process, despite UDP-glucose and SECO being positioned in the pocket along the PSPG motif. Mutation of Trp 355 also abolished UGT74S1 activity. This amino acid residue interacted with both SECO and UDP-glucose in the wild type UGT74S1; these interactions were altered in the mutants Trp355Ala and Trp355Gly and no activity occurred, providing evidence of its critical role in ligand binding and catalysis.
Wild type UGT74S1 and three of its mutant versions showed glucosylation activity toward SECO only among the aglycone substrates tested, suggesting that this enzyme is specific for SECO and that His352 and Trp355 likely control the regiospecificity/regioselectivity of UGT74S1 for SECO glucosylation in flax.
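The catalytic efficiency comparisons used in this discussion rest on the ratio kcat/Km. As a minimal sketch of that calculation, the Python fragment below computes kcat/Km and expresses a mutant's efficiency as a percentage of the wild type. The kinetic values are hypothetical placeholders, not measurements from this study, and the function names are illustrative.

```python
def catalytic_efficiency(kcat, km):
    """Return kcat/Km. With kcat in s^-1 and Km in mM, the result is in mM^-1 s^-1."""
    if km <= 0:
        raise ValueError("Km must be positive")
    return kcat / km


def percent_of_wild_type(mutant, wild_type):
    """Express a mutant's kcat/Km as a percentage of the wild-type kcat/Km.

    Each argument is a (kcat, Km) tuple.
    """
    return 100.0 * catalytic_efficiency(*mutant) / catalytic_efficiency(*wild_type)


# Hypothetical (kcat, Km) pairs for illustration only; not data from this study.
wild_type = (10.0, 2.0)
mutant = (1.0, 4.0)

print(f"wild type kcat/Km: {catalytic_efficiency(*wild_type):.2f}")
print(f"mutant kcat/Km:    {catalytic_efficiency(*mutant):.2f}")
print(f"mutant efficiency: {percent_of_wild_type(mutant, wild_type):.1f}% of wild type")
```

Because the ratio combines turnover and apparent substrate affinity, a mutant can lose efficiency through a lower kcat, a higher Km, or both, as in the placeholder values above.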

Conclusion
Flax UGT74S1 sequentially glucosylates SECO into SMG and SDG, as previously reported [29]. Here we provide, for the first time, evidence that Gln337 and Ser357 are crucial for the conversion of SMG to SDG by this enzyme and that Trp355 and His352 are essential for UGT74S1 glucosylation activity toward SECO. Since SMG does not accumulate in planta and is not available commercially, and since two of the mutants described in this study produced only this metabolite, we believe that tools and resources are now available to produce SMG in a fermentation setting.

Supporting Information
The coordinates used to generate each structure are indicated below the given structure. The α-helix, β-strand, and coils are colored in red, yellow and green, respectively. The PSPG region is indicated in blue; amino acids targeted for site-directed mutagenesis are shown in pink. The mutated amino acids in mutants are shown in cyan blue. doi:10.1371/journal.pone.0116248.s007 (TIF)
S8 Fig. Comparative molecular docking of the wild type UGT74S1 and the six mutants using SECO and UDP-glucose as ligands. The structures of the ligands SECO and UDP-glucose are presented on top of the docking models, where atoms are numbered following the UCSF Chimera numbering system. Mutant proteins are named using their one-letter amino acid codes. W, Trp; C, Cys; Q, Gln; A, Ala; S, Ser; H, His; D, Asp. Only the portion of the protein interacting with ligands is shown. The α-helix, β-strand, and coil are colored in red, yellow, and green, respectively; the hydrogen bonds between amino acids and ligands and their respective lengths (Å) are indicated in black. The PSPG region is indicated in blue; amino acid residues involved in binding the sugar donor UDP-glucose or sugar acceptor SECO are colored in pink; the mutated amino acids are shown in cyan blue; the ligand oxygen atoms involved in hydrogen bond formation are circled in orange; the ligand hydrogen atoms involved in hydrogen bond formation are circled in grey; the SECO ligand oxygen atoms targeted for glucosylation are circled in blue. doi:10.1371/journal.pone.0116248.s008 (TIF)
S1 Table. Forward and reverse primers used for site-directed mutagenesis of UGT74S1. doi:10.1371/journal.pone.0116248.s009 (DOCX)
S2 Table. Description of α-helices and β-strands found in wild type and mutants of UGT74S1. The number of α-helices and β-strands, the percentage of amino acids involved in the α-helices and β-strands, as well as the number of disordered structures are shown. doi:10.1371/journal.pone.0116248.s010 (DOCX)
S3