Structural Evidence for Inter-Residue Hydrogen Bonding Observed for Cellobiose in Aqueous Solution

The structure of the disaccharide cellulose subunit cellobiose (4-O-β-D-glucopyranosyl-D-glucose) in solution has been determined via neutron diffraction with isotopic substitution (NDIS), computer modeling and nuclear magnetic resonance (NMR) spectroscopic studies. This study shows direct evidence for an intramolecular hydrogen bond between the reducing ring HO3 hydroxyl group and the non-reducing ring oxygen (O5′) that has been previously predicted by computation and NMR analysis. Moreover, this work shows that hydrogen bonding to the non-reducing ring O5′ oxygen is shared between water and the HO3 hydroxyl group with an average of 50% occupancy by each hydrogen-bond donor. The glycosidic torsion angles φH and ψH from the neutron diffraction-based model show a fairly tight distribution of angles around approximately 22° and −40°, respectively, in solution, consistent with the NMR measurements. Similarly, the hydroxymethyl torsional angles for both reducing and non-reducing rings are broadly consistent with the NMR measurements in this study, as well as with those from previous measurements for cellobiose in solution.


Introduction
Conversion of plant cellulose into ethanol has been industrially achievable since the late nineteenth century. However, the near-insolubility of cellulose in aqueous solvents initially required physical separation of pure cellulose from plant material and harsh acid hydrolysis to produce the glucose used in fermentation. [1] Today, interest lies in using complete lignocellulosic biomass as an ethanol feedstock, but the challenge of cellulosic recalcitrance to aqueous solvation, and the resulting need for extensive physical and/or chemical pretreatment of the biomass, remains. This is despite the active study of cellulose's chemical structure, which dates back to the beginnings of modern molecular structure analysis. [2][3][4] While great progress has been made in understanding cellulose in the solid state, [5][6][7] cellulosic recalcitrance largely prevents structural studies of cellulose-water interactions. An example of the difficulty of studying cellulose in aqueous environments comes from the field of NMR spectroscopy. Using solution-state NMR experiments it is possible to determine protein structures with accuracies matching that of crystallography, but only cello-oligomers in the range of 2-6 glucose subunits have been extensively characterized in water due to poor solubility. [8] Small-angle neutron and X-ray scattering experiments have revealed the bulk morphology of cellulose fibers with varying degrees of hydration, [9,10] and hydrogen bonding in and among cellulose chains in dry and aqueous suspensions of microcrystals, [11][12][13] but a structural description of cellulose-water or cello-oligomer-water interactions on the atomic length scale (1-10 Å) in solution has yet to be attained.
Unlike higher cello-oligosaccharides, cellobiose, with two glucose subunits, exhibits considerable solubility in water, making it an ideal model molecule for investigations in solution. Cellobiose and methyl cellobioside in aqueous solutions have been studied extensively by NMR spectroscopy, where various coupling constants [14][15][16][17][18][19] have been determined. More recent studies have sought to quantify the populations of hydrogen bonds across the β-(1→4) linkages in solutions of disaccharides. [18][19][20][21] Cellobiose has also been investigated extensively by computation, from early stereochemical approaches [22] and molecular dynamics (MD) simulations [23][24][25] to modern quantum mechanical methods. [26,27] More recently, cellobiose analogues have been studied by spectroscopic methods combined with Car-Parrinello-type simulations. [28] Despite these extensive studies, there is still debate about the structure of cellobiose in solution, specifically with respect to its conformation about the glycosidic linkage as well as its hydrogen-bonding structure. There is a particular question concerning whether or not an internal hydrogen bond between the non-reducing ring oxygen (O5′) and the adjacent reducing ring hydroxyl group (HO3; see Fig. 1) [17][18][19][25][26][27]29,30] is present in cellobiose or methyl cellobioside, and whether the persistence of this intramolecular hydrogen bond contributes to the low solubility of cellulose in water. Part of the reason for this continuing uncertainty in the solution structure of cellobiose is that hydrogen bonding in solution is difficult to measure by most experimental techniques.
In this work, NDIS augmented by computer simulation has been used in combination with NMR spectroscopy to determine the structure of cellobiose in aqueous solution. NDIS can measure the hydrogen bonding of molecules in aqueous solutions on the atomic-length scale and has been one of the premier techniques for structural determinations of hydrogen-containing liquids due to the 'sensitivity' of the neutron to hydrogen and deuterium. NDIS also has the advantage of being a direct structural technique, analogous to crystallography, which does not rely on structural interpretation of dynamical data as is the case with spectroscopy. By combining neutron scattering and computation with NMR spectroscopy, which provides an assessment of the cellobiose conformations, a rigorous structural assessment of cellobiose hydrogen bonding in water can be realized.

NDIS
NDIS has been used to investigate the structure of hydrogen-containing liquids such as anhydrous HF, [31] water [32] and organic solvents [33][34][35][36] as well as aqueous solutions of polar and ionic species [37][38][39][40][41] including several biological molecules. [42][43][44][45][46][47][48][49] Unlike X-rays, where the scattering intensity is proportional to atomic size, neutrons are scattered by virtue of a neutron-nucleus interaction. This interaction is independent of atomic size and is of the same order of magnitude for both light and heavy atoms. As such, hydrogen atoms scatter neutrons with a relatively large intensity compared to hydrogen scattering from X-rays. Furthermore, because of the nuclear dependence, neutron scattering intensities vary for isotopes of the same element. [50] In the case of hydrogen, the scattering intensity difference between hydrogen and deuterium is relatively large and can be exploited by measuring a series of isotopically unique yet chemically equivalent samples, yielding multiple distinct measurements of the same chemical system. The quantity measured in a neutron diffraction experiment is the total scattering structure factor, F(Q),

$$F(Q) = \sum_{\alpha,\,\beta \ge \alpha} \left(2 - \delta_{\alpha\beta}\right) c_\alpha c_\beta b_\alpha b_\beta \left[ S_{\alpha\beta}(Q) - 1 \right] \qquad (1)$$

where $c_i$ and $b_i$ are the relative concentration and scattering length of atom $i$, [50] respectively; $\delta_{\alpha\beta}$ is the Kronecker delta function introduced to avoid double counting of like-atom pairs, and $S_{\alpha\beta}(Q)$ is the partial structure factor for the α-β atom pair. Q is the magnitude of the change in the wave vector of the neutrons when they are scattered from the sample, where $Q = 4\pi \sin(\theta)/\lambda$ with 2θ being the scattering angle relative to the incident neutron beam and λ the incident neutron wavelength. Eq. 1 describes the sum of all the partial structure factors, of which there are m(m+1)/2 for m structurally unique atoms in the system.
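The pair weighting in the total structure factor and the definition of Q can be illustrated with a short calculation. The following is a minimal sketch, not part of the original analysis: it assumes the standard weighting $(2-\delta_{\alpha\beta})c_\alpha c_\beta b_\alpha b_\beta$, uses tabulated coherent scattering lengths for H, D and O, and treats pure H2O and D2O as hypothetical two-component examples. The function names are introduced here for illustration only.

```python
import math
from itertools import combinations_with_replacement

# Tabulated coherent neutron scattering lengths in fm.
B_FM = {"H": -3.739, "D": 6.671, "O": 5.803}


def momentum_transfer(two_theta_deg, wavelength_angstrom):
    """Q = 4*pi*sin(theta)/lambda, where 2*theta is the full scattering angle."""
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / wavelength_angstrom


def partial_weights(composition):
    """Weight (2 - delta_ab) * c_a * c_b * b_a * b_b (fm^2) for every unique pair.

    `composition` maps an atom label to its count per formula unit.
    """
    total = sum(composition.values())
    conc = {atom: count / total for atom, count in composition.items()}
    weights = {}
    for a, b in combinations_with_replacement(sorted(conc), 2):
        delta = 1 if a == b else 0
        weights[(a, b)] = (2 - delta) * conc[a] * conc[b] * B_FM[a] * B_FM[b]
    return weights


# Hypothetical example systems: heavy and light water.
d2o = partial_weights({"D": 2, "O": 1})
h2o = partial_weights({"H": 2, "O": 1})

for pair in d2o:
    print("D2O", pair, round(d2o[pair], 3))
for pair in h2o:
    print("H2O", pair, round(h2o[pair], 3))
```

With m = 2 atom types the dictionaries each hold m(m+1)/2 = 3 entries. The O-H weight is negative and the O-D weight positive, which is the contrast that H/D substitution exploits.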
The Fourier transform of $S_{\alpha\beta}(Q)$ gives the distribution of atomic separations (distances) in real space on an atomic scale (Å) as a radial distribution function (RDF),

$$S_{\alpha\beta}(Q) = 1 + \frac{4\pi\rho}{Q} \int_0^{\infty} r \left[ g_{\alpha\beta}(r) - 1 \right] \sin(Qr)\, dr \qquad (2)$$

where ρ is the atomic number density of the sample and $g_{\alpha\beta}(r)$ is the RDF for the α-β atom pair. To characterize the average local structure of a liquid, the RDF for an atom pair can be integrated, yielding the average coordination number of atoms β around atoms α, $n_\alpha^\beta$, over the distance range $r_1$ to $r_2$, namely

$$n_\alpha^\beta = 4\pi\rho\, c_\beta \int_{r_1}^{r_2} g_{\alpha\beta}(r)\, r^2\, dr \qquad (3)$$

The coordination number is usually calculated from atom α at the origin ($r_1$ = 0) to the distance of the first minimum ($r_2$) after the first obvious maximum in the RDF.
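The coordination-number integral can be evaluated numerically from a tabulated RDF. The sketch below is illustrative only: it assumes the usual form $n_\alpha^\beta = 4\pi\rho c_\beta \int_{r_1}^{r_2} g_{\alpha\beta}(r)\, r^2\, dr$ and a simple trapezoidal rule, and the grid, density and concentration used in the check are hypothetical values rather than data from this study.

```python
import math


def coordination_number(r, g, rho, c_beta, r1, r2):
    """Trapezoidal estimate of 4*pi*rho*c_beta * integral of g(r)*r^2 from r1 to r2.

    r, g   : equal-length sequences; r must be strictly increasing (angstrom)
    rho    : atomic number density (atoms per cubic angstrom)
    c_beta : atomic fraction of the coordinating species
    """
    if len(r) != len(g) or len(r) < 2:
        raise ValueError("r and g must be equal-length grids with at least two points")
    integral = 0.0
    for i in range(len(r) - 1):
        lo, hi = r[i], r[i + 1]
        a, b = max(lo, r1), min(hi, r2)
        if b <= a:
            continue  # this grid interval lies outside [r1, r2]
        # Linear interpolation of g at the (possibly clipped) interval ends.
        slope = (g[i + 1] - g[i]) / (hi - lo)
        g_a = g[i] + slope * (a - lo)
        g_b = g[i] + slope * (b - lo)
        integral += 0.5 * (g_a * a * a + g_b * b * b) * (b - a)
    return 4.0 * math.pi * rho * c_beta * integral


# Sanity check with a structureless fluid, g(r) = 1 (hypothetical inputs):
# the result should approach 4*pi*rho*c_beta*(r2**3 - r1**3)/3.
r_grid = [0.01 * i for i in range(501)]
g_flat = [1.0] * len(r_grid)
n_flat = coordination_number(r_grid, g_flat, rho=0.1, c_beta=0.5, r1=0.0, r2=2.46)
print(round(n_flat, 3))
```

For a real RDF, `r2` would be set to the first minimum after the first peak, as described in the text.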

Empirical Potential Structure Refinement (EPSR)
With NDIS it is only possible to extract all of the RDFs experimentally from systems with a small number of unique atoms such as HF [31] or water. [32] In practice, it is not feasible to measure a complete set of chemically equivalent yet isotopically unique samples due to (a) the limited availability of isotopes with significantly different neutron scattering lengths and (b) the difficulty of selectively labeling each unique atom in the system. In order to obtain a full set of RDFs from complex systems, a model-based approach known as EPSR can be used to augment the NDIS experiment by calculating a full set of RDFs that are consistent with the measured diffraction data. [51] The EPSR method begins with a standard Monte Carlo simulation of the sample structure based on a set of reference potentials. EPSR perturbs these initial reference potentials thus creating new potentials whose magnitudes are proportional to the difference between the measured total structure factors and those calculated from the model. EPSR then uses these new empirical potentials in the Monte Carlo simulation. This refinement process proceeds iteratively and results in a model that is consistent with a set of measured diffraction data. It should be noted that, while EPSR provides a model consistent with the data, it is not necessarily a definitive model of the systems in question. As is true with any modeling technique which determines the structure present in a liquid, as much knowledge as possible about the solution must be introduced into the EPSR model such as charges, molecular structures and overlap restrictions (as is done in the present case), in order to provide a physically realistic model of the physical structure. RDFs can be calculated for each unique atom pair from the resulting model. 
Additionally, spherical harmonic expansion of the calculated RDFs can be performed to generate spatial density functions (SDFs) that describe the location of molecules or portions of molecules relative to one another in three dimensions. A more detailed description of EPSR is given elsewhere, [51] and further details relevant to the derivation of SDFs (see below) are provided in the Supporting Information File S1.

NDIS Experiments
Neutron diffraction measurements were performed using the Small Angle Neutron Diffractometer for Amorphous and Liquid Samples (SANDALS) located at the ISIS facility (Rutherford Appleton Laboratory, STFC, UK) on an isotopomeric series of cellobiose-water solutions, each at a molecular ratio of 1:63 cellobiose:water (~0.88 M), at standard ambient temperature and pressure (298 ± 0.1 K, 1 bar). Isotopic H/D substitution was performed on the cellobiose hydroxyl groups and on the water solvent. Samples were prepared from cellobiose (β-D-glucopyranosyl-(1→4)-D-glucopyranose, 99%, Sigma-Aldrich) and ultrapure water (Millipore) or from a sample of cellobiose previously lyophilized from deuterium oxide (D₂O) (99.9% D, Cambridge Isotopes) and fresh D₂O (99.9%, Sigma-Aldrich). Seven isotopically unique cellobiose-water solutions were studied; the isotopic composition of each sample is listed in the Supporting Information File S1.
The solutions were prepared by weight and transferred to sample cans constructed from a Ti/Zr alloy with a flat plate geometry and 1 mm sample thickness. Ti/Zr cans give very little scattering, thus simplifying data analysis, due to cancellation from the positive and negative coherent neutron scattering lengths of zirconium and titanium, respectively. [52] Data acquisition for each sample was conducted for ~1500 μA proton current (8-10 h) to give adequate statistics for the total structure factors. Raw data were obtained for the samples, empty containers, instrument background, and a vanadium standard in order to ensure accurate background subtraction and normalization. SANDALS is also equipped with a neutron transmission monitor that measures the neutron cross-sectional area of samples relative to the incident beam, allowing for an absolute measure of scattering from the sample. The scattering observed for each isotopically labeled solution was within 10% of the expected theoretical level. Data were converted to F(Q) after appropriate corrections for neutron absorption, multiple scattering, and inelasticity effects using the program GUDRUN, which is based on the ATLAS suite of programs available at the ISIS facility. [53]

NMR Spectroscopy
NMR studies were conducted to probe the molecular conformation of cellobiose in aqueous solution. One-dimensional (1D) ¹H NMR spectroscopy and two-dimensional (2D) NMR correlation spectroscopy were performed using a Varian Inova spectrometer operating at 600 MHz for ¹H and 150 MHz for ¹³C. The instrument was equipped with a triple-channel cryoprobe and z-axis gradients. 1D ¹³C NMR spectroscopy was performed using a Bruker AMX spectrometer operating at 100 MHz for ¹³C and equipped with a broadband X-channel inverse geometry probe. The sample temperature was regulated at 298 ± 0.5 K for all measurements. ¹H and ¹³C resonances were interpreted according to Roslund et al. [54] Offline spectral processing, measurement of coupling constants, and 1D and 2D resonance integration were performed using MestReC 4.9 (MestReLab Research, SL).
Cellobiose exists in two anomeric forms in aqueous solution at an equilibrium ratio of 38:62 α-:β-cellobiose (Fig. 1). [54] Possible shifts in this equilibrium due to concentrations or isotopic substitutions similar to those used in the NDIS experiments were investigated by 1D ¹³C spectroscopy. Samples of cellobiose lyophilized from D₂O were prepared in fresh D₂O at concentrations of 15 mM and 0.88 M. These two concentrations were chosen to represent a "typical" NMR measurement (15 mM) and the concentration measured by NDIS (0.88 M), thus allowing a direct comparison of results from the two methods. ¹³C NMR spectra were acquired using inverse-gated ¹H decoupling over six hours (~8,000 transients) to obtain signal-to-noise ratios exceeding 30:1, thus allowing for accurate integration of the signals arising from the anomeric carbons. Anomeric ratios of ~40:60 α-:β-cellobiose were observed at both concentrations.
J-coupling-modulated ¹H,¹³C gHMBC (J-mod gHMBC) experiments were performed to determine the magnitudes of the inter-glycosidic coupling constants $^3J_{\mathrm{H1',C4}}$ and $^3J_{\mathrm{H4,C1'}}$. The J-mod gHMBC experiment [55] yields a 2D spectrum showing heteronuclear correlations through two or more bonds, and the absolute intensity of each correlation cross-peak is proportional to $\sin(\pi\,{}^nJ_{\mathrm{H,X}}\,\tau)$, where τ represents a variable mixing time for polarization transfer. Performing a series of J-mod gHMBC experiments with varied mixing times and integrating the 1D projections of the cross-peaks of interest from each spectrum produces a set of intensity values as a function of τ that can be fit by non-linear regression to extract the value of $^nJ_{\mathrm{H,X}}$. J-mod gHMBC spectra of 15 mM and 0.88 M cellobiose in D₂O were recorded for τ values ranging from 20-200 ms. Coupling constants derived by non-linear regression were related to the torsion angles φH (H1′-C1′-O4-C4) and ψH (C1′-O4-C4-H4), illustrated in Fig. 2.
Determinations of the cellobiose hydroxymethyl (CH₂OH) rotamer populations were made using the method reported by Serianni and co-workers [57] for measured values of $^3J_{\mathrm{H5,H6R}}$ and $^3J_{\mathrm{H5,H6S}}$. This method assumes that the measured coupling constants are weighted averages resulting from the population of three preferred conformations of the ω dihedral angle (O5-C5-C6-O6 = +65°, −65° or ±180°). The Karplus-type relationships for $^3J_{\mathrm{H5,H6R}}$ and $^3J_{\mathrm{H5,H6S}}$ (Eqs. 5 and 6) are solved for each of the preferred conformations, and the calculated coupling constants are used to factor out the contributions of the preferred conformations to the measured, average coupling constant.

EPSR
An EPSR modeling box was constructed using 20 cellobiose molecules (8 α-cellobiose and 12 β-cellobiose) and 1260 water molecules for a 1:63 cellobiose:water molecular ratio at a density of 0.103018 atoms Å⁻³.
Initial atomic coordinates, bond distances, and angles for the cellobiose molecules were taken from the β-cellobiose crystal structure [58] with the anomeric configuration inverted at C1 for α-cellobiose molecules and all O-H bond distances increased to 1.0 Å. Several non-bonding distance constraints were added to the cellobiose molecules in order to reproduce the energetically preferred ⁴C₁ chair conformation of the hexopyranose rings. [59][60][61] These additional constraints were introduced through specifying values for backbone torsion angles and creating non-bonding energy potentials between atoms one and four of each torsion. The constraints were necessary since the relatively weak neutron scattering intensity from carbon atoms [50] compared with hydrogen or deuterium, together with identical isotopic labeling of numerous atomic sites in the experiment, reduced the conformational information available to guide the EPSR simulation process. It should be noted that these constraints did not include any X-X-O-H torsions that would influence hydroxyl group orientation, nor did they constrain the intramolecular (or inter-residue) flexibility in the cellobiose molecules. The constraints are listed in full in the Supporting Information File S1. Constraints specific to the glycosidic linkage were also added to guide the EPSR model with values derived from the J-mod gHMBC experiments (see foregoing Experimental section, NMR Spectroscopy). [55] Finally, a non-bonding distance constraint of 2.205 Å between the H1′ and H4 atoms (Fig. 1) was also introduced to further stabilize the conformation of the glycosidic linkage.
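The composition of the modeling box can be cross-checked with simple arithmetic. The sketch below is a consistency check rather than part of the EPSR procedure: it uses the molecule counts and atomic number density quoted above, the molecular formula C12H22O11 for cellobiose, and assumes a cubic box, which the text does not state explicitly.

```python
# Counts and density quoted for the EPSR modeling box.
N_ALPHA, N_BETA, N_WATER = 8, 12, 1260
RHO_ATOMS = 0.103018  # atoms per cubic angstrom

ATOMS_PER_CELLOBIOSE = 12 + 22 + 11  # C12H22O11
ATOMS_PER_WATER = 3                  # H2O

n_cellobiose = N_ALPHA + N_BETA
n_atoms = n_cellobiose * ATOMS_PER_CELLOBIOSE + N_WATER * ATOMS_PER_WATER

water_per_cellobiose = N_WATER / n_cellobiose  # should reproduce the 1:63 ratio
alpha_fraction = N_ALPHA / n_cellobiose        # should reproduce the ~40:60 ratio

box_volume = n_atoms / RHO_ATOMS        # cubic angstrom
box_side = box_volume ** (1.0 / 3.0)    # angstrom, ASSUMING a cubic box

print(n_atoms, water_per_cellobiose, alpha_fraction)
print(round(box_volume, 1), round(box_side, 2))
```

Under these assumptions the box holds 4680 atoms, and the 8:12 split of anomers matches the ~40:60 α:β ratio observed by ¹³C NMR.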
The EPSR reference potentials used were the Single Point Charge/Extended (SPC/E) model for water molecules (Ow and Hw), [62] and Lennard-Jones potentials and atomic charges from a modified CHARMM force-field for cellobiose molecules. [60] Values for parameters of the potentials are listed in the Supporting Information File S1. The atomic labels for cellobiose in the EPSR model are shown in Fig. 1 along with the IUPAC-recommended nomenclature for cellobiose atoms, which is used herein except where explicitly stated otherwise. [63] The simulation was conducted until ~50,000 unique configurations of the minimized structural model were accumulated. The corrected neutron diffraction data, the EPSR fitted total structure factors, and the residuals between the data and fit are shown in the Supporting Information File S1.
Constraints were imposed in the EPSR model as described above to keep the glycosidic conformation (φH and ψH; Fig. 2) similar to that observed by NMR measurements. However, despite the constraints, these angles remained flexible in the model in order to accurately reproduce the NDIS data. Table 1 compares values of φH and ψH obtained via crystallography, [58] previous NMR measurements, [16,64] and density functional theory (DFT) studies [27] with the average values from the NMR measurements in this study and the EPSR model of the measured NDIS data. The EPSR average has been performed in two ways: (a) by taking the average of the two peak heights as seen in Fig. 3, and (b) by fitting these peaks to a Gaussian function in order to obtain a peak maximum. The results of both methods are reported in Table 1.

Cellobiose Conformation in the EPSR Model
On the whole, the EPSR averages for φH and ψH (Fig. 3) agree with the values obtained from current and previous investigations, with the cellobiose molecules adopting a syn-φH/ψH conformation in solution. The best agreement with previous measures is for the ψH torsion, [16,64] and, interestingly, the φH value is in better agreement with recent DFT studies of solvated cellobiose [27] than with other previous measures. This is in opposition to recent vibrational spectroscopy measurements on the microhydration of phenyl β-cellobioside (which has a phenyl group in place of the hydrogen on the O1 oxygen of the reducing ring (Fig. 1)) in the gas phase, where the molecules adopt a syn-φH/anti-ψH conformation. [28] The NMR-derived torsional constraints for φH and ψH maintained the torsions of the glycosidic linkage in the EPSR model of the neutron diffraction data in the syn-φH/ψH conformational state. Fig. 3 shows a fairly tight distribution for each measured angle with the exception of the φH torsion of α-cellobiose, which populated a second conformation (approximately 12% of the α-molecules; Fig. 3). This range of torsions shows that the EPSR model reproduced both the well-defined average glycosidic conformation and the flexibility characteristic of this linkage.
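Torsion distributions such as those for the glycosidic linkage are built by evaluating a dihedral angle for four atoms in every stored configuration. The following is a minimal sketch of that geometric step, assuming Cartesian coordinates and the IUPAC sign convention; the coordinates in the example are hypothetical test points, not atoms from the EPSR model, and the function name is introduced here for illustration.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral_deg(p1, p2, p3, p4):
    """Signed torsion angle p1-p2-p3-p4 in degrees, in the range (-180, 180]."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal to the plane p1-p2-p3
    n2 = _cross(b2, b3)  # normal to the plane p2-p3-p4
    b2_len = math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), b2) / b2_len
    return math.degrees(math.atan2(y, x))


# Hypothetical test geometry: central bond along z, first atom along +x.
p1, p2, p3 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
print(dihedral_deg(p1, p2, p3, (1.0, 0.0, 1.0)))   # eclipsed (syn) arrangement
print(dihedral_deg(p1, p2, p3, (0.0, 1.0, 1.0)))   # quarter turn, positive sense
print(dihedral_deg(p1, p2, p3, (0.0, -1.0, 1.0)))  # quarter turn, negative sense
```

For φH the four points would be H1′, C1′, O4 and C4, and for ψH they would be C1′, O4, C4 and H4, following the definitions used in this study.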
Cellobiose also has preferred hydroxymethyl orientations (ω and ω′; Fig. 2) in solution, as shown by the Newman projections for these conformations in Fig. 4. The three staggered orientations are generally referred to as gauche-trans (gt, ω = +65°), gauche-gauche (gg, ω = −65°) and trans-gauche (tg, ω = ±180°) due to the orientations of O5 and C4 relative to O6. Previous NMR studies of glucose, [65] glucose derivatives, [66] and cellobiose [54] have established the relative conformational distribution as $P_{gt} \approx P_{gg} > P_{tg}$ in aqueous solution, which is opposite to the tg conformation in the crystal structure of cellobiose [58] and the $P_{tg} \gg P_{gg} \approx P_{gt}$ distribution observed for native cellulose polymorphs. [6] Figure 5 shows the population distribution of hydroxymethyl conformations from the EPSR model, and Table 2 shows the calculated relative populations for each of the gt, gg and tg conformers compared with current NMR measurements and previous investigations of glucose [65] and cellobiose [54] in aqueous solutions.
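Relative rotamer populations can be estimated from a set of sampled hydroxymethyl torsion angles by assigning each angle to the nearest staggered orientation. The sketch below illustrates one simple way to do this; the nearest-center assignment rule is an assumption made here for illustration and is not necessarily how the populations in Table 2 were derived, and the sample angles are hypothetical.

```python
# Idealized staggered orientations of the O5-C5-C6-O6 torsion, in degrees.
ROTAMER_CENTERS = {"gt": 65.0, "gg": -65.0, "tg": 180.0}


def angular_distance(a, b):
    """Smallest absolute separation between two angles, in degrees (0 to 180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def classify(omega):
    """Label of the staggered orientation closest to the torsion angle omega."""
    return min(ROTAMER_CENTERS,
               key=lambda name: angular_distance(omega, ROTAMER_CENTERS[name]))


def populations(omegas):
    """Fraction of angles assigned to each of gt, gg and tg."""
    if not omegas:
        raise ValueError("at least one torsion angle is required")
    counts = {name: 0 for name in ROTAMER_CENTERS}
    for omega in omegas:
        counts[classify(omega)] += 1
    return {name: count / len(omegas) for name, count in counts.items()}


# Hypothetical sample of torsion angles (degrees).
sample = [60.0, 70.0, -60.0, 180.0]
print(populations(sample))
```

Because angles near ±180° wrap around, the distance is computed on the circle, so −175° and +170° are both assigned to tg. Broad distributions that straddle two orientations make such assignments sensitive to the chosen boundaries.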
The EPSR rotamer populations are in broad agreement with those from NMR studies, with the tg rotamer being the least populated and the gg and gt rotamers showing proportionally higher populations. This is consistent with prior NDIS/MD investigations of glucose in aqueous solution, where a predominance of the gt and gg conformations was also observed. [44] Compared with previous and current NMR measurements, even at the slightly higher concentration measured here, EPSR gives the opposite trend for gg and gt rotamers: the relative populations from EPSR are $P_{gt} > P_{gg} \gg P_{tg}$ compared with $P_{gg} > P_{gt} \gg P_{tg}$ from NMR measurements for ω, and $P_{gg} > P_{gt} \gg P_{tg}$ for ω′ compared with $P_{gt} > P_{gg} \gg P_{tg}$ observed by NMR spectroscopy. For ω, the higher population of gt compared with gg is evident in Fig. 5; however, each rotamer gives a fairly broad distribution of angles around each orientation in the EPSR fits to the neutron data for both α- and β-cellobiose.
The variance between the NMR and the neutron diffraction data in ω′ appears to be due to the tg rotamer being slightly more populated in the EPSR model, as $P_{gg}$ is similar from both NMR and EPSR (Table 2). However, observation of the relative ω′ rotamer populations in Fig. 5 indicates that this discrepancy may be due to the fact that EPSR gives a distribution of rotamers, while NMR rotamer population assignment is via coupling constants and only gives a single averaged value. As a result, when assessing a distribution of rotamer angles from EPSR, the exact number of hydroxymethyl groups in any one given orientation is not always clear as the distributions can be broad, as is the case for the tg rotamer for the ω′ torsion in Fig. 5. Moreover, the NMR results reported here are from the less concentrated sample, as attempts to measure the higher concentration (0.88 M) led to spectra with poor resolution, and as such the relative hydroxymethyl rotamer populations could not be accurately determined. It is possible that with a higher concentration of cellobiose in solution these rotamers have slightly different populations, which are observed in the EPSR model. These results and those from NDIS studies on glucose [44] support the importance of carbohydrate-water hydrogen bonding in establishing the conformation of the hydroxymethyl groups, as has been suggested by spectroscopic [66] and quantum- and molecular-mechanics studies, [67] since NDIS is particularly sensitive to hydrogen-bonding interactions.
Figure 6 shows the RDFs for cellobiose-cellobiose interactions that might occur in solution. The RDFs in this figure are only shown for possible hydrogen-bonding interactions between different cellobiose molecules in solution. The other putative non-hydrogen-bonding association would be via C-H interactions with atoms on other cellobiose molecules. The RDFs for these intermolecular interactions were virtually non-existent (see Supporting Information File S1). As is clear from Fig. 6, the intermolecular hydrogen-bonding interactions are very slight, indicating that there are very few interactions between individual cellobiose molecules even in these concentrated solutions. There are only very small peaks in both the H-H and O-H functions, which show potential hydrogen bonding between -OH groups on the different molecules, signifying very limited interactions. For instance, the coordination number of the small peak at ~2 Å in the O-H RDFs is 0.01 in each case. This indicates that only about 1% of the molecules show any hydrogen bonding between their OH groups. The H-H RDF shows slightly higher coordination, with 8% of these molecules being associated in the solutions. Given the high concentration of molecules in solution, this is likely due to random interactions rather than any significant association of these molecules in this solution. The ring and linkage oxygen (C-O-C) to OH hydrogen RDFs (bottom panels, Fig. 6) show even less association, with no distinct peaks being present in these RDFs.

Cellobiose Hydroxyl Group-Water Hydrogen Bonding
Hydrogen-bonding interactions between cellobiose hydroxyl groups and water are easily distinguished via the RDFs from the hydroxyl group-water pairs (Fig. 7). The RDFs in this figure are averages from both cellobiose anomers with the contributions weighted according to the 40:60 α-:β-cellobiose anomeric ratio. The only exceptions to this anomeric weighting are the RDFs for O4-X interactions, where these atoms were labeled distinctly for α and β anomers in the EPSR model due to simulation routine requirements. The cellobiose hydroxyl groups show similar hydration with respect to both oxygen and hydrogen atoms with the exception of the HO3 group, where fewer hydrogen bonds are donated to water ($n_{\mathrm{HO3}}^{\mathrm{Ow}} \approx 0.6$). Integration of $g_{\mathrm{H(')-Ow}}(r)$ from 0-2.46 Å shows an average of ~0.8 hydrogen bonds being donated from cellobiose to water. Similar integration of $g_{\mathrm{O(')-Hw}}(r)$ over the same distance range (0-2.46 Å) gives $n_{\mathrm{O(')}}^{\mathrm{Hw}} \approx 0.9$-1.0 for all hydroxyl groups, showing that each of the hydroxyl oxygens accepts, on average, one hydrogen bond from water. These results are consistent with earlier predictions for monosaccharides in aqueous solution, where it was hypothesized that two hydrogen bonds per hydroxyl group are formed with the surrounding water solvent due to the strong downfield shifts of saccharide hydroxyl group NMR signals that indicate hydrogen bonding involving both the O and H atoms. [66] Furthermore, from an observed absence of both distinct saccharide ν(OH) bands and water ν(OH) shifts in infrared spectra, it has been suggested that water is both a donor and acceptor of hydrogen bonds from the hydroxyl group. [68]

Cellobiose O4- and O5-Water Hydrogen Bonding
The glycosidic linkage oxygen atom-water RDFs for both α-cellobiose and β-cellobiose oxygens (O4a and O4b; Fig. 1) as well as the RDFs for water-oxygen interactions for both reducing and non-reducing ring oxygens are shown in Fig. 8.
Both the glycosidic linkage oxygen and the non-reducing ring C-O-C oxygens accept fewer hydrogen bonds from water compared with the cellobiose hydroxyl groups. Integration of $g_{\mathrm{O4-Hw}}(r)$ (0-2.46 Å) gives ~0.6-0.7 hydrogen bonds, and similar integration of $g_{\mathrm{O5'-Hw}}(r)$ gives roughly the same number of bonds. This is similar to previous studies that have shown similar under-hydration of X-O-X (X ≠ H) when compared to hydroxyl, carbonyl or carboxylate groups in water. [43,46,48] Intriguingly, the RDFs for the O5(′)-Hw and O5(′)-Ow atom pairs (Fig. 8, lower panel) show a decrease in the hydration of the non-reducing ring oxygen at $n_{\mathrm{O5'}}^{\mathrm{Hw}} \approx 0.7$ and $n_{\mathrm{O5'}}^{\mathrm{Ow}} \approx 0.7$ when compared to the reducing ring oxygen at $n_{\mathrm{O5}}^{\mathrm{Hw}} \approx 0.9$ and $n_{\mathrm{O5}}^{\mathrm{Ow}} \approx 1.2$. This reduced hydration of the non-reducing ring oxygen O5′, taken with similar reduced hydrogen-bond donation to water by HO3, suggests the population of an intramolecular O5′···HO3 hydrogen bond, which has been predicted by DFT calculations, both in the presence and absence of solvation, [26,27] by MD simulations [23] on cellobiose and by NMR spectroscopy of methyl α-cellobioside. [17] This is in opposition to gas phase microhydrated structures, where no O5′···HO3 hydrogen-bonding interactions were observed but rather the cellobioside structure was stabilized by water-mediated O6′···HO3 interactions, [28] and to previous NMR/MD investigations on methyl β-cellobioside, where it was concluded that an O5′···HO3 hydrogen bond was unlikely in water. [30]
Given the differences observed in the reducing and non-reducing ring O5 and O5′-water RDFs, the spatial density functions (SDFs) for the distributions of water around O5 and O5′ were calculated from the EPSR modeling box. SDFs (Figs. 9A-B) give the most probable location in three dimensions of water molecules around the C1(′)-O5(′)-C5(′) fragments of cellobiose in solution. Specifically, in Figs. 9A-B the cellobiose O5′ or O5 atom is placed at the center of the laboratory axes, and the distribution of Ow atoms around that atom is shown as a blue probability shell. Additional atoms from cellobiose are plotted to aid in visually orienting the relevant C-O-C fragment, but the atomic coordinates do not represent the average conformation of cellobiose in the EPSR simulation. The SDFs are averaged over the α:β anomeric distribution, and mathematical details of the spherical harmonic expansion calculation are given in the Supporting Information File S1. In Figs. 9A-B, the blue shells represent the most probable location of 80% of the water molecules within 0-3.0 Å of O5′ and O5, respectively. For both O5 and O5′ the water molecules are located predominantly in the positive z-direction above the cellobiose molecules, with an absence of density for water in the xy-plane.
The 3D SDF of water around the O59 and O5 atoms do not give the total number of water molecules present in the surrounding blue shells in Fig. 9, but rather give the highest probability of finding a water molecule at these distances. Although the absolute number of waters in these shells cannot be precisely determined, a cross-sectional projection of these SDFs can demonstrate the relative number of water molecules in the probability shells of one oxygen compared with the other. Figures 9C-D display these 2D projections onto the yz-plane and show that the distribution of waters around the O59 (Fig. 9C) is reduced in intensity relative to waters around the reducing-ring oxygen (O5; Fig. 9D), indicating that there are slightly more water molecules present around the O5 oxygen compared to the nonreducing ring oxygen. Steric hindrance of O59 hydration, given its proximity to the glycosidic linkage, could, in part, be responsible for the reduced spatial extent of the hydration shell compared with that for O5 (Fig. 9D). However, the reduced presence of water along the defined z-axis again suggests competition between water and HO3 in forming hydrogen bonds with O59. Figure 10 shows the RDF for the O59-HO3 atom pair for both aand b-cellobiose. The broad peak with a maximum at 2.28 Å is consistent with an O59???HO3 hydrogen bond; however, the distance range of this peak (,1.5-3.1 Å ) shows that this hydrogenbonding interaction is less well defined than the O59???Hw water hydrogen bond whose corresponding RDF peak is considerably sharper (Fig. 7). Previous DFT studies of solvated cellobiose predict a similar distance for this intramolecular bond ranging from 1.93 to 2.48 Å . [27] Integration of g O5'-HO3 (r) curve (Eq. 3) in Fig. 10 gives an intramolecular coordination number of ,0.46 indicating that this hydrogen bond is populated approximately 50% of the time when cellobiose is in aqueous solution. Similar to the water SDF around O59 in Fig. 9A, Fig. 
11A shows the SDF for the most probable location of HO3 atoms around O5′ with the O5′ on the central axis, where in this figure the yellow shell represents the most probable 25% of HO3 locations around O5′ at an O5′-HO3 distance of 1.5-3.0 Å, corresponding to the minimum and maximum distances of the RDF peak in Fig. 10. Again, similar to Figs. 9A-B, the cellobiose molecule is plotted only for clarity and to guide the reader, and as such does not represent the average orientation of cellobiose molecules in the solution. In Fig. 11, the HO3 hydroxyl group is generally located in the positive xy plane in a distribution consistent with rotations about the C3-O3 bond and conformational flexibility about the glycosidic linkage. Analogous to Fig. 9C, the 2D projection of this SDF onto the yz plane (Fig. 11B) clearly shows that this hydroxyl group overlaps with the O5′ hydration shell (Fig. 9C), thus contributing to the reduced hydration of O5′.
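The coordination number quoted for the O5′-HO3 pair is obtained by integrating the RDF over the range of its first peak. Eq. 3 is not reproduced in this section, so the sketch below assumes the standard form n = 4πρ ∫ r² g(r) dr, where ρ is the number density of the partner atom. The function name `coordination_number`, the density value, and the uniform test g(r) are hypothetical and serve only to show the arithmetic; they are not the measured data.

```python
import numpy as np

def coordination_number(r, g, rho, r_min, r_max):
    """Integrate 4*pi*rho*r^2*g(r) from r_min to r_max with the trapezoidal rule.

    r   : radial grid in Angstrom (ascending)
    g   : g(r) sampled on that grid
    rho : number density of the partner atom in atoms per cubic Angstrom
    """
    r = np.asarray(r, dtype=float)
    g = np.asarray(g, dtype=float)
    mask = (r >= r_min) & (r <= r_max)
    rr, gg = r[mask], g[mask]
    integrand = 4.0 * np.pi * rho * rr**2 * gg
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(rr)))

# Synthetic check with a structureless fluid, g(r) = 1 everywhere (assumed data).
# The result should equal rho times the volume of a sphere of radius 3 Angstrom.
r = np.linspace(0.0, 3.0, 3001)
g = np.ones_like(r)
rho = 0.0334  # assumed density, roughly that of oxygen atoms in bulk water
n_uniform = coordination_number(r, g, rho, 0.0, 3.0)
```

For a real RDF the integration limits would be set to the bounds of the peak of interest, for example the 1.5-3.0 Å range used for the O5′-HO3 peak in Fig. 10.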

Conclusions
The structure of cellobiose in aqueous solution has been investigated with a combination of structural techniques: NMR spectroscopy and NDIS augmented with EPSR computer modeling. Using this combination of experimental and computational techniques, an atomic-level structure of cellobiose in solution has been determined that most notably gives firm evidence of the intramolecular hydrogen bond between the non-reducing ring O5′ and the reducing ring HO3 group. The existence of this O5′···HO3 hydrogen bond has been previously predicted by NMR spectroscopic studies on methyl α-cellobioside [17] in aqueous solution, and this bond has also been shown to be persistent for the same molecule in DMSO solvent. [18] The existence of this bond has also been determined by computational studies [25][26][27] and, importantly, in DFT investigations that are compared directly with NMR coupling constant measurements. Conversely, other NMR investigations on methyl β-cellobioside in both water and methanol/water solutions concluded that the existence of any O5′···HO3 bonds was "insignificant". [29,30] Both methyl β-cellobioside and methyl α-cellobioside differ slightly from the cellobiose measured here, in that both have an -OCH3 group (fixed in the β- or α-position on C1, respectively) instead of an -OH on the C1 carbon atom, which is free to mutarotate (Fig. 1). It is nevertheless unlikely that this methoxy substitution would have much effect on the existence of the intramolecular hydrogen bond between the O5′ and HO3 groups in solution.
Diffraction techniques provide a direct determination of the structure present in solution, as opposed to spectroscopies, which can only infer structure because they directly measure only the dynamical aspects of molecules in the system. Here the measured NDIS data, interpreted through use of EPSR simulation, provide a physically reasonable model of the structure of cellobiose in water that is consistent with both the neutron data and the NMR experiments. Previous findings on similar carbohydrate molecules noted that molecular geometry was 'dictated' by inter-residue hydrogen bonding. [19] This is consistent with the present work; the O5′···HO3 bond in the EPSR model was reproduced only by virtue of constraints to the average glucose ring conformations and a single non-bonding distance across the glycosidic linkage. NDIS techniques are particularly useful as they can also quantify the average coordination observed in a liquid. Importantly, the neutron data are not only consistent with the presence of an inter-residue O5′···HO3 bond, but also show that hydrogen bonding to the non-reducing ring O5′ oxygen is shared between water and the HO3 hydroxyl group, with an average of 50% occupancy by each hydrogen-bond donor.
Other potential water hydrogen-bonding sites, namely the hydroxyl groups and ether oxygen atoms in the cellobiose molecule, have also been assessed. It was found that, on average, each hydroxyl hydrogen donates ~0.8 hydrogen bonds to water, with the exception of the HO3 group, which shows a smaller number of H-Ow interactions of 0.6 hydrogen bonds to water. The C-O-C oxygen atoms (both the glycosidic linkage and the ring oxygens) accept on average ~0.7 Hw-O hydrogen bonds from water, with the reducing ring oxygen (O5; Fig. 1) showing a slightly larger hydration of ~0.9 hydrogen bonds accepted from the surrounding water solvent. The reduced hydration of the glycosidic linkage oxygen and the non-reducing ring oxygen reflects the O5′···HO3 hydrogen-bonding interaction, as the presence of this bond reduces the number of water molecules that may bind to either of these oxygen atoms in cellobiose.
The conformational aspects of the cellobiose structure are also delineated, both from NMR and from the EPSR model of the neutron diffraction data. The glycosidic torsion angles φH and ψH from the neutron data show a fairly tight distribution of angles around approximately 22° and −40°, respectively, syn-φH/ψH or 'trans' in solution, consistent with NMR measurements here and those previously reported, [16,64] as well as DFT studies. [27] Similarly, hydroxymethyl torsional angles for both reducing and non-reducing rings from the neutron diffraction measurements are broadly consistent with the NMR data measured here as well as with previous measurements of cellobiose in solution. [54]

Supporting Information
File S1 Supporting Information. (DOC)