Functional and Structural Analysis of a β-Glucosidase Involved in β-1,2-Glucan Metabolism in Listeria innocua

Despite the presence of β-1,2-glucan in nature, few β-1,2-glucan degrading enzymes have been reported to date. Recently, the Lin1839 protein from Listeria innocua was identified as a 1,2-β-oligoglucan phosphorylase. Since the adjacent lin1840 gene in the gene cluster encodes a putative glycoside hydrolase family 3 β-glucosidase, we hypothesized that Lin1840 is also involved in β-1,2-glucan dissimilation. Here we report the functional and structural analysis of Lin1840. A recombinant Lin1840 protein (Lin1840r) showed the highest hydrolytic activity toward sophorose (Glc-β-1,2-Glc) among β-1,2-glucooligosaccharides, suggesting that Lin1840 is a β-glucosidase involved in sophorose degradation. The enzyme also rapidly hydrolyzed laminaribiose (β-1,3), but not cellobiose (β-1,4) or gentiobiose (β-1,6) among β-linked gluco-disaccharides. We determined the crystal structures of Lin1840r in complexes with sophorose and laminaribiose as productive binding forms. In these structures, Arg572 forms many hydrogen bonds with sophorose and laminaribiose at subsite +1, which seems to be a key factor for substrate selectivity. The opposite side of subsite +1 from Arg572 is connected to a large empty space appearing to be subsite +2 for the binding of sophorotriose (Glc-β-1,2-Glc-β-1,2-Glc) in spite of the higher Km value for sophorotriose than that for sophorose. The conformations of sophorose and laminaribiose are almost the same on the Arg572 side but differ on the subsite +2 side that provides no interaction with a substrate. Therefore, Lin1840r is unable to distinguish between sophorose and laminaribiose as substrates. These results provide the first mechanistic insights into β-1,2-glucooligosaccharide recognition by β-glucosidase.

GH3 is, along with GH1, one of the major families containing BGLs. BGLs form a large subgroup of GH3 that is widely distributed in animals, plants, and microorganisms; the family also contains N-acetyl-β-D-glucosaminidases, α-L-arabinofuranosidases, and β-D-xylopyranosidases [16]. The substrate recognition residues for non-reducing end glucosides and their structural positions are highly conserved among GH3 BGLs. On the other hand, the sites that recognize the other moieties of substrates are highly diversified within the family, which leads to a variety of substrate and chain length specificities [17].
Lin1840 forms a phylogenetic clade with closely related homologs [18]. Only two enzymes in the clade have been characterized and/or are structurally available. A metagenomic GH3 β-glucosidase from compost (JMB19063), which is the only structurally available enzyme, was reported to act on cellooligosaccharides (β-1,4-glucooligosaccharides, Cel n s) [18], and Flavobacterium meningosepticum GH3 β-glucosidase (FmBGL) has been identified as an aryl β-glucosidase [19]. The amino acid sequence identities of these two enzymes with Lin1840 are 42% and 40%, respectively. Arg587, which is one of the main residues comprising subsite +1 in JMB19063, is highly conserved among the closely related homologs (S1 Fig). The presence of a conserved arginine residue important for substrate recognition led us to expect a similar function among Lin1840, JMB19063, and FmBGL. Cel n s were the only natural compounds tested as substrates in the previous characterization of JMB19063 and FmBGL. However, both enzymes have very low activity on Cel n s, whereas they show high activity toward p-nitrophenyl-β-D-glucopyranoside (pNP-β-Glc), an artificial substrate [18,19]. Thus, the previous reports on FmBGL and JMB19063 neither support nor refute the hypothesis that Lin1840 is involved in β-1,2-glucan metabolism in L. innocua. In this study, we describe the characteristics and structure-function relationship of Lin1840 to provide the first mechanistic insight into recognition of β-1,2-glucooligosaccharides (Sop n s) by BGLs.

Preparation of recombinant Lin1840 and mutants
Gene cloning and overexpression of lin1840 from L. innocua Clip11262 and purification of recombinant Lin1840 (Lin1840r) were described in our previous paper [20]. Briefly, the protein fused with a C-terminal His 6 -tag was purified from the cell extract of the transformant using a HisTrap FF crude column (5 ml; GE Healthcare, Buckinghamshire, England), and was then buffer-exchanged into 50 mM 3-(N-morpholino)propanesulfonic acid (Mops) buffer (pH 7.0) using Amicon Ultra 30,000 molecular weight cut-off (Millipore, Billerica, MA, USA) for enzyme assay. The protein was further purified using Superdex 200 (Hiload 16/60; GE Healthcare) for crystallization. The molecular weight of Lin1840r in solution was estimated from the retention time. Ovalbumin (44 kDa), conalbumin (75 kDa), aldolase (158 kDa), ferritin (440 kDa), and thyroglobulin (669 kDa; GE Healthcare) were used as standard proteins. Blue dextran 2000 (2000 kDa; GE Healthcare) was used to determine the void volume of the column. Protein concentration was determined by UV absorbance at 280 nm (the extinction coefficient [21] and theoretical molecular weight of Lin1840r are 67770 cm −1 M −1 and 80532.2 Da, respectively). Construction of plasmids for expression of the D270A, E473A, and R572K mutants was performed based on the protocol for a KOD-Plus-Mutagenesis Kit (Toyobo, Osaka, Japan) using pET-28a inserted with lin1840 as a template, KOD Plus (Toyobo), and the primers described below. The primer pairs used for amplification of the D270A, E473A, and R572K mutant genes were 5'-TGGGGCGCTGTTGCCGAAGTAATTAATCAC-3' and 5'-CGCAGAAATAAGTACACCGTCAAACTCCA-3', 5'-CCCGCCCCATTCATTTTTTTCACCTAGCGC-3' and 5'-GCGGCAGGAAGTCTTGCTACTATTCG-3', and 5'-GAGCGCCACAAACACCGGAAAATAAAGG-3' and 5'-CAGTGCGTAAATGATTATAATAAACTGG-3' (mutated nucleotides are underlined), respectively. Production and purification of the mutant enzymes were performed in the same way as for the WT.
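The concentration determination from absorbance at 280 nm can be illustrated with a minimal sketch of the Beer-Lambert calculation. The extinction coefficient and molecular weight are the values given above; the function names, the 1 cm path length default, and the example absorbance are assumptions made only for illustration.

```python
# Beer-Lambert estimate of Lin1840r concentration from A280.
# Constants are taken from the text; everything else is illustrative.
EXTINCTION_M1_CM1 = 67770.0    # extinction coefficient, M^-1 cm^-1
MOLECULAR_WEIGHT_DA = 80532.2  # theoretical molecular weight, g/mol


def molar_concentration(a280, path_cm=1.0):
    """Return the molar concentration (mol/L) for a given A280 reading."""
    if a280 < 0 or path_cm <= 0:
        raise ValueError("absorbance must be >= 0 and path length > 0")
    return a280 / (EXTINCTION_M1_CM1 * path_cm)


def mass_concentration_mg_per_ml(a280, path_cm=1.0):
    """Return the mass concentration; mol/L * g/mol = g/L = mg/ml."""
    return molar_concentration(a280, path_cm) * MOLECULAR_WEIGHT_DA


# Hypothetical reading: A280 = 0.6777 in a 1 cm cuvette.
example_mg_per_ml = mass_concentration_mg_per_ml(0.6777)
```

With these constants, an absorbance of 1.0 in a 1 cm cuvette corresponds to roughly 1.19 mg/ml of Lin1840r.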
Kinetic analysis using pNP-β-Glc and β-linked gluco-oligosaccharides
To determine the kinetic parameters of pNP-β-Glc hydrolysis, 70 μl of a reaction mixture comprising 0.5-30 mM pNP-β-Glc, 15% glycerol and 0.42 μg of the enzyme in 50 mM Mes buffer (pH 6.0) was incubated at 20°C for 10 min. Ten μl aliquots of the reaction mixture were taken every 2 min and mixed with 90 μl of 0.2 M Na 2 CO 3 to stop the reaction. Activity was determined as described above. The kinetic parameters were determined by non-linear regression of the data using an equation expressing substrate inhibition (equation 1): $v/[E_0] = k_{cat}[S]/(K_m + [S] + [S]^2/K_i)$, where v is the initial velocity of pNP release, [E 0 ] the enzyme concentration, [S] the substrate concentration, and K i the substrate inhibition constant. The kinetic analysis regarding β-linked gluco-oligosaccharides was performed using Sop n s [14], β-1,2-glucans (average DP 25) [22], Lam n s (BIOCON (JAPAN) LTD, Aichi, Japan), Cel 2 (Wako Pure Chemical Industries, Osaka, Japan), and Gen 2 (Tokyo Chemical Industry, Tokyo, Japan) as substrates. The reaction mixtures (70 μl) comprising various concentrations of substrates, the enzyme, and 15% glycerol in 50 mM Mes buffer (pH 6.0) were incubated at 20°C for 10 min. At intervals of 2 min, 10 μl aliquots of the mixtures were taken and heated at 99°C to stop the reaction. The samples were then mixed with 90 μl of GOPOD FORMAT KIT (Megazyme, Wicklow, Ireland) and incubated at 45°C for 20 min. When the substrates were Sop n s and Lam n s, 6 μl aliquots of the samples and 114 μl of GOPOD FORMAT KIT were mixed. The amounts of Glc released from oligosaccharides were calculated from the absorbance at 510 nm using 100 μl aliquots of the solutions. The concentrations of Glc released from disaccharides were halved, since two Glc molecules are released on hydrolysis of one substrate molecule.
The kinetic parameters were determined by fitting to equation 1 or the normal Michaelis-Menten equation.
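The two rate laws used for fitting can be written out as a short sketch. It assumes the standard substrate-inhibition form v/[E0] = kcat[S]/(Km + [S] + [S]^2/Ki) for equation 1, and includes the halving of released Glc for disaccharide substrates. The function names and the Ki value used in the example are hypothetical and are not results from this study.

```python
import math


def michaelis_menten_rate(s, kcat, km):
    """v/[E0] (s^-1) for the normal Michaelis-Menten equation."""
    return kcat * s / (km + s)


def substrate_inhibition_rate(s, kcat, km, ki):
    """v/[E0] (s^-1) for the assumed standard substrate-inhibition form."""
    return kcat * s / (km + s + s * s / ki)


def optimal_substrate_concentration(km, ki):
    """[S] at which the substrate-inhibition rate is maximal: sqrt(Km*Ki)."""
    return math.sqrt(km * ki)


def substrate_hydrolyzed(glc_released, is_disaccharide):
    """Convert released Glc to hydrolyzed substrate.

    One disaccharide molecule yields two Glc molecules, so the measured
    Glc concentration is halved for disaccharides.
    """
    return glc_released / 2.0 if is_disaccharide else glc_released


# Illustration with kcat and Km in the range reported for pNP-beta-Glc
# and a purely hypothetical Ki of 100 mM.
KCAT, KM, KI_HYPOTHETICAL = 48.0, 3.1, 100.0
rate_mm = michaelis_menten_rate(3.1, KCAT, KM)
rate_si = substrate_inhibition_rate(3.1, KCAT, KM, KI_HYPOTHETICAL)
s_opt = optimal_substrate_concentration(KM, KI_HYPOTHETICAL)
```

In practice these functions would be passed to a non-linear least-squares routine; the fitting step itself is omitted here because the raw velocity data are not part of the text.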

Temperature and pH profiles
The reaction conditions and the method for measuring released pNP were the same as described in the kinetic analysis section. The substrate used was 5 mM pNP-β-Glc. The optimum temperature and pH were determined by measuring the activity at various temperatures (0-50°C) and in various pH ranges in the following buffers: sodium acetate (pH 4.0-5.5), Mes (pH 5.5-6.5), Mops (pH 6.5-7.5), 4-(2-hydroxyethyl)piperazine-1-(2-hydroxypropanesulfonic acid) (pH 7.5-8.5), and N-cyclohexyl-2-aminoethanesulfonic acid (pH 8.5-9.0). The thermal and pH stabilities were determined from the residual hydrolytic activity at 20°C after incubation of Lin1840r (0.1 mg/ml) at various temperatures in 50 mM Mes buffer (pH 6.0) for 1 h, and in 50 mM various buffers with the pH ranges described above at 20°C for 1 h, respectively. The incubated enzyme solutions were diluted at least 20 times with the reaction solution.

Crystallography
Crystallization of the WT and D270A mutant was performed by the hanging-drop vapor-diffusion method. As described previously [20], 1 μl of 10 mg/ml Lin1840r in 5 mM Mops (pH 7.0) mixed with 1 μl of reservoir solution comprising 15% (v/v) glycerol, 0.17 M Li 2 SO 4 , 0.085 M Tris-HCl (pH 9.0), and 25.5% (w/v) PEG4000 was incubated at 25°C for 1 week for crystallization. Crystals were soaked in the reservoir solution supplemented with each ligand for 1 h to obtain complex structures. Crystals of WT were soaked in 150 mM Glc, 5 mM GDL, or 5 mM IFG. D270A crystals were soaked in 100 mM Sop 2 , 100 mM Sop 3 , 50 mM Lam 2 , 300 mM Cel 2 , or 300 mM Gen 2 . The crystals were cooled and then kept at 100 K in a nitrogen-gas stream during data collection. A set of X-ray diffraction data for the crystal was collected using a CCD detector (ADSC Quantum 210r) on a beamline AR-NW12A at Photon Factory (Tsukuba, Japan). The diffraction data set was processed using iMosflm [23]. A model structure of Lin1840r was predicted using SWISS-MODEL (http://swissmodel.expasy.org/) [24] based on the A chain of HvExoI (PDB code; 1EX1), and then used as a search model for molecular replacement. Molecular replacement was performed using MOLREP [25] to determine initial phases. Automated model building was performed using ARP/wARP [26]. Manual model building and refinement were performed using Coot [27] and Refmac5 [28], respectively. Quality check of the structures was performed using wwPDB validation server (http://wwpdbvalidation.wwpdb.org/validservice/). The figures were prepared using PyMOL (DeLano Scientific; http://www.pymol.org). The buried surface area was calculated with the protein-protein interaction interface server (PISA; http://www.ebi.ac.uk/msd-srv/prot_int/pistart.html) [29].

General properties of Lin1840r
The amino acid sequence of Lin1840 includes no predicted N-terminal signal peptide, as judged from SignalP 4.0 analysis (http://www.cbs.dtu.dk/services/SignalP/) [30], suggesting that the enzyme is localized in the cytosol. On size-exclusion chromatography, recombinant Lin1840 eluted at a position corresponding to 150 kDa, suggesting that it is dimeric. The enzyme did not lose significant catalytic activity on incubation at up to 20°C in the pH range of 5.0-9.0, but lost most of its activity at 40°C. The enzyme exhibited maximal catalytic activity at 37°C and pH 6.0. Asp270 and Glu473 in Lin1840r are predicted to be the catalytic nucleophile and the catalytic acid/base, respectively, according to primary sequence alignment with FmBGL [31,32].
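The inference of a dimer from the size-exclusion result is simple arithmetic, sketched below. The apparent mass (150 kDa) and the theoretical monomer mass (80532.2 Da) come from the text; the function name and the rounding rule are assumptions for illustration.

```python
MONOMER_KDA = 80.5322  # theoretical molecular weight of Lin1840r, kDa


def oligomeric_state(apparent_kda, monomer_kda=MONOMER_KDA):
    """Return (ratio, nearest whole number of subunits) for an apparent mass."""
    if apparent_kda <= 0 or monomer_kda <= 0:
        raise ValueError("masses must be positive")
    ratio = apparent_kda / monomer_kda
    return ratio, max(1, round(ratio))


# Apparent mass from size-exclusion chromatography: about 150 kDa.
ratio, subunits = oligomeric_state(150.0)
```

The ratio is about 1.9, which rounds to two subunits and agrees with the dimer observed in the crystal structure.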

Substrate specificity
The substrate specificity of Lin1840r as to glycone was determined using several p-nitrophenyl (pNP)-β-D-monosaccharides. The enzyme showed no activity toward any of those examined (less than 0.01 U/mg) except for pNP-β-Glc, indicating that it specifically acts upon β-glucosides. The kinetic parameters of the enzyme toward pNP-β-Glc were k cat = 48 ± 3 (s −1 ), K m = 3.1 ± 0.3 (mM), and k cat /K m = 15 ± 1 (mM −1 s −1 ). To investigate the linkage position specificity, kinetic parameters for β-linked gluco-disaccharides were determined (Table 1). Lin1840r showed large k cat values for Sop 2 and laminaribiose (Glc-β-1,3-Glc, Lam 2 ), while the k cat values for Cel 2 and gentiobiose (Glc-β-1,6-Glc, Gen 2 ) were less than 1% of those for Sop 2 and Lam 2 . The K m values for Sop 2 and Lam 2 were similarly small, while the K m values for Cel 2 and Gen 2 were approximately 7 and 13 times higher than that for Sop 2 , respectively. Consequently, the enzyme exhibited comparable k cat /K m values for Sop 2 and Lam 2 , while the values for Cel 2 and Gen 2 were both less than 0.05% of that for Sop 2 . The kinetic parameters for Sop 2 and Lam 2 were similar to those for pNP-β-Glc. Kinetic parameters for Sop n s were then determined. The k cat value for Sop 3 was similar to that for Sop 2 , but the k cat values for Sop 4 and Sop 5 were less than one-tenth of that for Sop 2 . The K m values for Sop 3-5 were 5 or more times higher than that for Sop 2 . Consequently, the catalytic efficiency decreased markedly with increasing DP. In the case of β-1,3-glucooligosaccharides (Lam n s), in contrast, the k cat , K m , and k cat /K m values did not change markedly with increasing DP. The enzyme did not show significant hydrolytic activity toward β-1,2-glucan (reaction velocity, less than 0.01 U/mg in the presence of 1 mM substrate).

Inhibition kinetics
The inhibition modes and constants for six inhibitors as to the hydrolytic activity toward pNP-β-Glc are summarized in Table 2. Three glucosidase inhibitors, GDL, IFG, and DNJ, consistently exhibited competitive-type inhibition with K i values of less than 1 mM. IFG showed the strongest inhibition among all the inhibitors examined, the K i value being 4.1 μM. While the K i value of Lin1840r for IFG was similar to that of Aspergillus aculeatus BGL1 (AaBGL1) (14 μM), the K i value of Lin1840r for DNJ (290 μM) was over 100 times higher than that of AaBGL1 (2.4 μM) [33]. On the other hand, Glc, Cel 2 , and Gen 2 consistently showed mixed-type inhibition (Table 2). The K i and K i ' values for Cel 2 were more than 5 times higher than those for Glc and Gen 2 .

Table 1. Kinetic parameters of Lin1840r for β-linked gluco-oligosaccharides and pNP-β-Glc.

Overall structure
The crystal structure of apo wild-type (WT) Lin1840r was determined at 1.8 Å resolution (S1 Table). The structure showed that Lin1840r forms a dimer of identical subunits each composed of three domains (Fig 1A). The structure of Lin1840r was compared with those of two GH3 enzymes, JMB19063 and barley GH3 β-glucan-exohydrolase (HvExoI), exhibiting the highest similarity based on the Z-score with the Dali server (http://ekhidna.biocenter.helsinki.fi/dali_server/) [34] (S2 Table). Lin1840r shows a similar overall structure to that of JMB19063. A loop (amino acids 568-598) in Lin1840r extends into the active site of the other subunit to form a part of the substrate pocket, as in the case of the corresponding loop of JMB19063 (Fig 1 and S2A Fig). Therefore, the formation of the dimer is predicted to be important for catalysis. In contrast, HvExoI is a monomeric enzyme whose catalytic site is composed of a single subunit [35]. Lin1840r is composed of three domains based on Pfam (http://pfam.xfam.org/) [36].
The metal ion is clearly six-coordinated by the side chain of Asp648, the backbone carbonyl of Thr650, and four water molecules. The metal ion is suggested to be Mg 2+ according to the server for checking metal ion binding (http://csgid.org/csgid/metal_sites/, [37]) (S2 Table), although no Mg 2+ ion was added to Lin1840r in any step of sample preparation.

Subsite −1 of Lin1840r
The active center of Lin1840r is located at the interface of domains 1 and 2, as in the cases of known GH3 BGLs. The two predicted catalytic residues, Asp270 and Glu473, occupy similar positions to the corresponding Asp residues (catalytic nucleophile) and Glu residues (catalytic acid/base), respectively, of the known enzymes (S3 Fig). In fact, the D270A and E473A mutants showed no detectable hydrolytic activity toward pNP-β-Glc. This result supports the assignment of Asp270 and Glu473 as the catalytic nucleophile and catalytic acid/base residues, respectively. The distance between the side chain carboxyl oxygen atoms of Asp270 and Glu473 is approximately 6.0 Å, suggesting that the enzyme follows a retaining mechanism. The Lin1840r-Glc complex structure was determined to understand the substrate recognition at subsite −1. Six residues (Asp91, Arg149, Lys191, His192, Asp270, and Glu473) constitute subsite −1 and form hydrogen bonds with the Glc molecule (S3 Fig). These residues can be well superimposed on the corresponding residues of the known structures of GH3 BGLs.

Complexes with inhibitors
To clarify the binding modes of inhibitors, Lin1840r-inhibitor complex structures were determined. In complex structures with IFG and GDL, which are Glc analogs, the ligands are located at the same position as Glc (Fig 2A, 2B and S3 Fig), suggesting that IFG and GDL compete with substrates for subsite −1. In the GDL complex, electron densities of glycerol molecules were observed between the aromatic rings of Tyr583 and Trp271 (Fig 2A). A GDL molecule can be fitted to a large electron density near Trp409 (outside the active center) and undergo hydrophobic stacking with Trp409, but the orientation of the molecule is obscure (Fig 2A). The structure of the GDL complex is not substantially different from those of the apo WT and Glc complex. In the case of the IFG complex structure, on the other hand, an unwound helix (amino acids 36-47) in domain 1 is inserted between the aromatic rings of Tyr583 and Trp271. Thr40 in this loop forms a hydrogen bond with the 6-OH group of IFG (Fig 2B). This loop might be involved in substrate binding in solution.

Complex structures with Sop 2 and Lam 2
In order to understand the recognition mechanism of Sop 2 and Lam 2 , the complex structures with these disaccharides were determined using the nucleophile D270A mutant of Lin1840r. The electron density of Sop 2 was clearly observed in the D270A structure on soaking in 100 mM Sop 2 and fitted only with the β-anomer (Fig 3A left). The significant electron density of a ligand was also observed on soaking in 50 mM Lam 2 and clearly fitted with both anomers of Lam 2 (Fig 3B left). The Glc moieties of both Sop 2 and Lam 2 at subsite −1 are located at almost    Table), suggesting that these three residues constitute subsite +1. The fact that Tyr583 and Arg572 are derived from a loop in the adjacent subunit indicates the participation of both subunits in formation of the catalytic pocket.
Analysis with a Cremer-Pople ring pucker parameter calculator (http://www.ric.hiho.ne.jp/asfushi/) [38] showed that the pyranose rings of Sop 2 and Lam 2 at subsite −1 are 4 H 3 and 1 S 3 , respectively. Since the proposed itinerary of ring conformation leading to the substrate-enzyme intermediate in HvExoI is 4 C 1 (pre-Michaelis complex) ⬄ 1 S 3 (Michaelis complex) ⬄ 4 E ( 4 H 3 ) (transition state) ⬄ 4 C 1 (intermediate) [39,40], the conformation of Lam 2 is considered to correspond to the Michaelis complex. The Sop 2 complex is also considered to be a Michaelis complex in spite of its conformation, since Sop 2 superimposes not on glucophenylimidazole (PheGlcIm), known as a transition state-like analog in HvExoI [40], but on ligands known to form Michaelis complexes, such as 2',4'-dinitrophenyl-2-deoxy-2-fluoro-β-D-cellobioside (Cel 2 F-DNP) in GH5 Bacillus agaradhaerens endo-β-1,4-glucanase (BaCel5A), a retaining enzyme [41] (Fig 3C). In addition, the Sop 2 molecule does not possess the trigonal anomeric center required for it to be judged as a transition state. In the Sop 2 and Lam 2 complexes, the angles defined by the oxygen atom of the glycosidic bond, the anomeric carbon atoms in the ligands, and the carboxyl group in the catalytic nucleophile are 165° and 164°, respectively, as in the case of GlcNAc-MurNAc in a nucleophile mutant of GH3 B. subtilis N-acetyl-β-D-glucosaminidase and Cel 2 F-DNP in BaCel5A [41,42]. The dihedral angles of Sop 2 and Lam 2 (O2-C1-O5-C5 in the non-reducing end) are 93.4° and 88.3°, respectively, implying that the lone pairs on both endocyclic oxygen atoms are almost antiperiplanar to the scissile bonds. These facts suggest that the nucleophile is able to mount an in-line attack on the anomeric carbon (Fig 3A and 3B right) [43].
In the Sop 2 and Lam 2 complexes, the distances between the catalytic nucleophile and the anomeric carbon at subsite −1 are 3.2 Å and 3.0 Å, respectively, and the distances between the catalytic acid-base and the oxygen atoms of the glycosidic bond are 3.4 Å and 3.2 Å, respectively (Fig 3A and 3B right). These distances are within the possible range for the reaction [44]. These facts suggest that the ligands are positioned as they would be in catalytically active complexes. This is the first report of a Michaelis complex for GH3 BGLs, though pre-Michaelis complexes have been reported for HvExoI [39,40].
The anomeric hydroxy group of reducing end glucosides in Sop 2 is exposed to the solvent, in the vicinity of Asp91. In the case of Lam 2 , the hydroxy group is exposed outside the substrate pocket, this being consistent with similar activity among Lam n s.
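The ring conformations discussed in this section were assigned with a web-based Cremer-Pople calculator. As a rough illustration of what such a calculator computes, the following sketch derives the puckering amplitude Q and the angles θ and φ for a six-membered ring from Cartesian coordinates. It is not the tool used in this study; the function name is hypothetical, the atom order (commonly O5, C1, C2, C3, C4, C5 for pyranoses) is an assumption, and the idealized chair and boat coordinates are synthetic test inputs rather than ligand coordinates from the structures.

```python
import math


def cremer_pople_six(coords):
    """Cremer-Pople parameters (Q, theta_deg, phi_deg) for a 6-membered ring.

    coords: six (x, y, z) tuples given in ring order.
    """
    n = len(coords)
    if n != 6:
        raise ValueError("exactly six ring atoms are required")
    # Translate the ring so that its geometric centre is at the origin.
    centre = [sum(p[k] for p in coords) / n for k in range(3)]
    ring = [tuple(p[k] - centre[k] for k in range(3)) for p in coords]
    # Mean-plane normal from R' x R'' (Cremer-Pople definition).
    r1 = [sum(p[k] * math.sin(2 * math.pi * j / n) for j, p in enumerate(ring))
          for k in range(3)]
    r2 = [sum(p[k] * math.cos(2 * math.pi * j / n) for j, p in enumerate(ring))
          for k in range(3)]
    normal = [r1[1] * r2[2] - r1[2] * r2[1],
              r1[2] * r2[0] - r1[0] * r2[2],
              r1[0] * r2[1] - r1[1] * r2[0]]
    length = math.sqrt(sum(c * c for c in normal))
    normal = [c / length for c in normal]
    # Out-of-plane displacement of each ring atom.
    z = [sum(p[k] * normal[k] for k in range(3)) for p in ring]
    q2_cos = math.sqrt(2 / n) * sum(
        z[j] * math.cos(4 * math.pi * j / n) for j in range(n))
    q2_sin = -math.sqrt(2 / n) * sum(
        z[j] * math.sin(4 * math.pi * j / n) for j in range(n))
    q3 = math.sqrt(1 / n) * sum((-1) ** j * z[j] for j in range(n))
    q2 = math.hypot(q2_cos, q2_sin)
    total_q = math.hypot(q2, q3)
    theta = math.degrees(math.atan2(q2, q3))           # 0 or 180: chair
    phi = math.degrees(math.atan2(q2_sin, q2_cos)) % 360.0
    return total_q, theta, phi


def ideal_ring(z_values, radius=1.0):
    """Synthetic ring: regular hexagon with the given out-of-plane offsets."""
    return [(radius * math.cos(math.pi * j / 3),
             radius * math.sin(math.pi * j / 3),
             z_values[j]) for j in range(6)]


chair = ideal_ring([0.25 * (-1) ** j for j in range(6)])
boat = ideal_ring([0.25 * math.cos(2 * math.pi * j / 3) for j in range(6)])
chair_q, chair_theta, _ = cremer_pople_six(chair)
boat_q, boat_theta, _ = cremer_pople_six(boat)
```

Chairs fall at the poles of the puckering sphere (θ near 0° or 180°), boats and skew-boats such as 1 S 3 on the equator (θ near 90°), and half-chairs and envelopes such as 4 H 3 and 4 E in between. Mapping a computed (θ, φ) pair to a named conformer additionally depends on the atom numbering convention, which this sketch does not attempt.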

Complexes with Sop 3 , Cel 2 , and Gen 2
Soaking of the D270A mutant crystal in 100 mM Sop 3 resulted in clear observation of electron density corresponding to the middle Glc moiety of Sop 3 at subsite +1 (S4A Fig). The electron densities of glycoside moieties at both the non-reducing and reducing ends might be derived from Sop 3 , but they were so weak that interpretation was difficult. No electron density of a Glc moiety was observed at subsite −1. In the Cel 2 and Gen 2 complex structures, electron densities of Glc moieties were also both observed at subsite +1 (S4B and S4C Fig). These Glc moieties were thought to be the non-reducing ends of Cel 2 and Gen 2 due to the orientation of the hydroxy group participating in the glycosidic linkage (S4B and S4C Fig), suggesting that Cel 2 or Gen 2 binds to the enzyme non-productively. In both complexes, Glc molecules were observed only in molecule A. This might be due to Glc present as an impurity in the substrates, which were soaked at quite high concentrations. In the Cel 2 complex, Trp409 stacks on another Glc moiety, unlike in the cases of the Sop 3 and Gen 2 complexes (S4 Fig).

Kinetic analysis of the R572K mutant
Arg572 appears to be an important residue for the recognition of Sop 2 , since it forms multiple hydrogen bonds with substrates at subsite +1 (Fig 3A and 3B left). We therefore characterized the R572K mutant. The kinetic parameters of the R572K mutant enzyme as to β-linked gluco-disaccharides are summarized in Table 1. The K m value of the mutant enzyme for Sop 2 was 10 times higher than that of the WT enzyme. The mutant enzyme also showed increased K m values for the other substrates, and 1.5-3.0 times smaller k cat values for Sop 2 , Lam 2 , and Cel 2 . As a result, the k cat /K m value of the R572K mutant for Sop 2 was over 25 times smaller than that of the WT, while the k cat /K m value for Lam 2 was reduced approximately 6-fold by the mutation. This indicates that Arg572 is important for substrate binding, especially for Sop 2 .

Comparison of substrate pockets
In order to investigate the importance of Arg572 and subsite +2, the substrate pocket of Lin1840r was compared with those of JMB19063, a homolog in the same clade as Lin1840, and homologs in clades different from that of Lin1840. Trp271, Tyr583, and Arg572 constitute subsite +1 in Lin1840r. They can be well superimposed on the corresponding residues in JMB19063 (Tyr262, Phe598, and Arg587, respectively) (Fig 4A). Tyr583 and Arg572 are derived from the other subunit, as observed for the corresponding residues of JMB19063. In contrast, in the other structurally known GH3 BGLs, each subsite +1 is formed by a single subunit.
At the position corresponding to the Tyr583 residue, most structurally known GH3 BGLs have an aromatic amino acid, while the TnBgl3B structure lacks an aromatic residue, probably due to disorder of the corresponding region. The Tyr583 residue and the corresponding residues of JMB19063, KmBglI, AaBGL1, and TrCel3A (Phe598, Phe508, Tyr511, and Trp443, respectively) are consistently located on loops extending from the acid-base catalyst residue side (Fig 4A and 4C-4E). On the other hand, Trp434 in HvExoI is on the loop extending from the opposite side (Fig 4B). Lin1840r and JMB19063 consistently lack this loop, and instead have empty spaces sufficiently large for glycoside binding (Fig 4A). This space is considered to be subsite +2, since the anomeric hydroxy group of reducing end glucosides in Sop 2 faces the space (Fig 4A right). The spaces are filled with loops in KmBglI, AaBGL1, and TrCel3A (Fig 4C-4E).
The Arg572 residue in the substrate pocket is located at a similar position to Arg291 of HvExoI (Fig 4B). However, Arg291 does not correspond with the Arg572 residue in terms of primary sequence, and HvExoI does not have any corresponding loop for the Arg572 residue. The Arg291 residue is localized farther away from the center of the catalytic pocket than Arg572 (Fig 4B). KmBglI, AaBGL1, TrCel3A, and TnBgl3B possess no residue that corresponds to Arg572 or Arg291 (Fig 4C-4F). Gly57 in the loop of HvExoI (amino acids 54-66) participates in substrate recognition on the opposite side of subsite +1 from Arg291 [48] (Fig 4B). KmBglI, AaBGL1, and TrCel3A possess Phe55, Arg98, and Arg67, respectively, at the position corresponding to Gly57 (Fig 4C-4E). On the other hand, Lin1840r has no corresponding loop (Fig 4A).

Discussion
In this study, the hydrolytic activity of Lin1840r toward Sop n s was determined since the lin1839 gene in the same gene cluster as lin1840 encodes an enzyme specific to β-1,2-glucan. Lin1840r showed obvious preference for Sop 2 among Sop n s ( Table 1). Considering that Lin1839 phosphorolyzes β-1,2-glucan with DP 3 or more to produce G1P but does not act on Sop 2 [14], Lin1840 and Lin1839 cooperatively metabolize β-1,2-glucan in the cytosol. Since phosphorylation of glucose without the use of ATP is beneficial for energy acquisition, it is advantageous that BGLs show strong preference for substrates on which phosphorylases do not act. These facts suggest that Lin1840 is a BGL for Sop 2 degradation.
Nevertheless, structural analysis showed that Lin1840r possesses a sufficiently large space, which appears to be subsite +2, for access of Sop 3 . This space is needed only for binding of Sop n s, as judged from the orientations of the anomeric hydroxy groups of Sop 2 and Lam 2 in Lin1840r structures (Fig 3), and thioCel 2 (Fig 4B) [35] and methyl β-thiogentiobiose (PDB ID, 3WLP) in HvExoI structures. The corresponding spaces are filled in other structurally available GH3 BGLs except JMB19063 (Fig 4A-4E). However, the presence of Sop 3 accessible space at subsite +2 cannot explain the kinetic result that Lin1840r showed a much higher K m value toward Sop 3 than that toward Sop 2 . The conformation of Sop 3 is likely related to the difference in the K m values. According to molecular dynamics simulation, Sop 3 adopts a stable conformation that includes an intramolecular hydrogen bond between the 3-OH group of the reducing end glucoside and the oxygen atom of the pyranose ring or 6-OH group of the non-reducing end glucoside [49]. These intramolecular hydrogen bonds in Sop 3 have to be removed and the glycosidic bond of Sop 3 at the reducing end also has to be distorted for productive binding of Sop 3 . Overall, subsite +2 might be an evolutionary relic of specialization of Lin1840 from a Sop n s degrading enzyme to one specialized at Sop 2 degradation.
Despite the proposed function of Lin1840, Lin1840r shows activity toward Lam 2 comparable to that toward Sop 2 . The presence of Arg572 at subsite +1 and the space of subsite +2 are important features for substrate recognition. Arg572 forms many hydrogen bonds with Sop 2 or Lam 2 at subsite +1, and thereby likely compensates for the lack of any hydrogen-bond interaction with the substrates on the subsite +2 side (Figs 3A and 5A). The binding modes of Sop 2 and Lam 2 are apparently very similar in the Lin1840r complexes. The structures of the bound Sop 2 and Lam 2 molecules differ only in the substituent groups of reducing end glucosides on the subsite +2 side (Fig 5). This observation thus indicates that Lin1840r is not able to distinguish between Sop 2 and Lam 2 . BGLs from C. arvensicola and Acremonium sp. 15 induced by β-1,2-glucan show similar substrate specificities for β-linked gluco-disaccharides to Lin1840 [12,13]. They might have similar structural features to Lin1840r, though their amino acid sequences are unavailable.
Arg572 effectively excludes Cel 2 , unlike Lam 2 , as a substrate. This can be explained by comparison of the D270A-Sop 2 and HvExoI-thiocellobiose (thioCel 2 ) complex structures. Arg291 in HvExoI, which corresponds to Arg572 in Lin1840r, is located farther from the center of the catalytic pocket than Arg572 (Figs 4B and 5B). The 6-OH group of the reducing end glucoside of thioCel 2 forms a hydrogen bond with Arg291. However, Arg572 would be so close to the 6-OH of the glucoside in thioCel 2 as to cause steric hindrance (Fig 5B). Indeed, Cel 2 binds to Lin1840r in the non-productive form. In addition, other structurally available GH3 BGLs (TnBgl3B, TrCel3A, AaBGL1 and KmBglI), which lack a residue corresponding to Arg572 (Fig 4C-4F), consistently show sufficient hydrolytic activity toward Cel 2 [45,46,47,50]. These observations suggest that Arg572 accounts for the much higher K m value for Cel 2 than those for Sop 2 and Lam 2 . Considering that the Sop 2 and Lam 2 complexes mimic the Michaelis complex, Arg572 is also related to the differences in k cat values between Cel 2 and Sop 2 /Lam 2 . Thus, Arg572 is important for substrate specificity in Lin1840r.
The Arg572 residue is highly conserved among closely related homologs such as JMB19063 and FmBGL. Nevertheless, JMB19063 is thought to be involved in cellulose degradation and FmBGL is described as an aryl β-glucosidase. JMB19063 shares structural features important for substrate specificity with Lin1840. Therefore, we compared the substrate specificities of the three enzymes. The specific activity of JMB19063 toward Cel 2 (0.1 mM Cel 2 at 50°C) estimated from the substrate consumption is only 0.077 (U/mg), whereas JMB19063 shows high hydrolytic activity toward pNP-β-Glc (11.8 U/mg on 0.1 mM pNP-β-Glc) [18]. The specific activities of Lin1840r for pNP-β-Glc and Cel 2 (0.1 mM at 20°C) are 1.12 (U/mg) and 8.9 × 10 −4 (U/mg), respectively. In addition, FmBGL shows similar levels of kinetic parameters for pNP-β-Glc (k cat = 39.1 s −1 , K m = 0.49 mM) to Lin1840r but at least 100 times lower activity toward Cel 2 , as judged from preliminary results, than toward pNP-β-Glc [19,32]. Thus, the three enzymes show a similar tendency in substrate specificity as to pNP-β-Glc and Cel 2 . It should be noted that Lin1840r shows comparable activity toward Sop 2 and Lam 2 as glycoside hydrolases, while the activities of JMB19063 and FmBGL toward Sop 2 and Lam 2 have not been reported. Moreover, the structure of the Lin1840r-Cel 2 complex was compared with that of the JMB19063-Cel 5 complex, which contains Glc in subsite −1 and Cel 2 outside subsite −1 [18] (Fig 4A). This complex is the same as the Cel 2 complex of Lin1840r in that subsites +1 and −1 are filled with separate molecules (S4B Fig). Moreover, while ThioCel 2 in the HvExoI-thioCel 2 complex forms the intrinsically stable conformation of Cel 2 , glucoside at subsite +1 in the Cel 5 -soaked JMB19063 complex is inverted compared to that of ThioCel 2 (Figs 4A and 5A). Thus, the Cel 5 -soaked JMB19063 complex seems to be in the non-productive form, as in the case of the Cel 2 -Lin1840r complex. 
Overall, it would not be surprising if JMB19063 and FmBGL show similar substrate specificity to Lin1840r.
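The specific activities quoted in this comparison can be related to the kinetic parameters by a unit conversion, sketched below. The sketch assumes that 1 U corresponds to 1 μmol of product per minute (the unit definition is not restated in this section) and expresses turnover per subunit using the theoretical molecular weight of Lin1840r. The function names are hypothetical, and substrate inhibition is neglected because the substrate concentration (0.1 mM) is far below K m .

```python
MOLECULAR_WEIGHT_DA = 80532.2  # theoretical molecular weight of Lin1840r


def specific_activity_to_turnover(u_per_mg, mw_da=MOLECULAR_WEIGHT_DA):
    """Convert U/mg (assumed umol/min/mg) to turnover per subunit (s^-1).

    1 mg of enzyme contains 1000 / mw_da umol of subunits, so
    turnover = (umol product/min/mg) / (umol enzyme/mg) / (60 s/min).
    """
    return u_per_mg * (mw_da / 1000.0) / 60.0


def michaelis_menten_rate(s, kcat, km):
    """v/[E0] (s^-1) predicted by the Michaelis-Menten equation."""
    return kcat * s / (km + s)


# Lin1840r on 0.1 mM pNP-beta-Glc: 1.12 U/mg reported in the text.
observed = specific_activity_to_turnover(1.12)
# Prediction from kcat = 48 s^-1 and Km = 3.1 mM at [S] = 0.1 mM.
predicted = michaelis_menten_rate(0.1, 48.0, 3.1)
# Lin1840r on 0.1 mM Cel2: 8.9e-4 U/mg reported in the text.
observed_cel2 = specific_activity_to_turnover(8.9e-4)
```

Under these assumptions the observed and predicted turnovers for pNP-β-Glc both come to about 1.5 s −1 , and the turnover on Cel 2 is roughly three orders of magnitude lower.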
This study strongly suggests that Lin1840 is a BGL for Sop 2 degradation, and that Arg572 and subsite +2 are the key factors for its substrate specificity. This is not only a significant finding in the field of β-1,2-glucan metabolizing enzymes but also calls for reevaluation of the functions of closely related homologs of Lin1840 that possess an arginine residue corresponding to Arg572.