Compatible topologies and parameters for NMR structure determination of carbohydrates by simulated annealing

The use of NMR methods to determine the three-dimensional structures of carbohydrates and glycoproteins is still challenging, in part because standard protocols are lacking. To make structure determination more convenient, the topology and parameter files for carbohydrates in the program Crystallography & NMR System (CNS) were examined, and new files were developed that are compatible with the standard simulated annealing protocols for proteins and nucleic acids. Recalculation of published structures of protein-carbohydrate complexes and glycosylated proteins gave results comparable to the published structures, which had been obtained with more complex calculation procedures. Integrating the new carbohydrate parameters into the standard structure calculation protocol will facilitate three-dimensional structural studies of carbohydrates and glycosylated proteins by NMR spectroscopy.


Introduction
Carbohydrates are important molecules for energy storage and metabolism in all organisms. Carbohydrate-binding proteins play roles in carbohydrate biosynthesis, transport, metabolism, and degradation. Further, glycosylation is an important protein post-translational modification that affects protein stability, conformation, and signaling function. NMR structures of proteins and nucleic acids account for about 10% of all structures in the Protein Data Bank (PDB) [1], but NMR structures of carbohydrate-protein complexes, glycosylated proteins, and isolated carbohydrates represent only a small fraction of these. As of June 2017, the PDB contained more than 11,000 solution NMR structures, of which only about 50 contained saccharides. High molecular weight and the poor dispersion of sugar NMR signals can make NMR structure determination difficult. Beyond these spectroscopic difficulties, however, a key factor behind the low number of carbohydrate-containing structures is likely the lack of a simple, convenient protocol, including topologies and parameters, for determining the NMR structures of carbohydrate molecules.
NMR methods have been used to investigate saccharides and provide rich structural information, such as anomeric configuration, molecular weight, conformation, and dynamics. Because carbohydrates are flexible and their chemical shifts are poorly dispersed, NMR structure determination is often carried out in conjunction with molecular modelling [2]. Considerable effort has gone into developing force fields for carbohydrates, of which GLYCAM is the most widely used [3]; most popular molecular dynamics programs, such as Amber, GROMACS, and CHARMM, also have their own carbohydrate force fields [4][5][6]. For NMR structure determination of biomacromolecules, however, specialized programs are often used to combine the experimental restraints with chemical structure constraints (i.e., structural parameters). These constraints are defined in topology and parameter files. The topology file defines residues (the units of biopolymers) and patches, which apply specific chemical modifications to residues (such as disulfide bonds). The topology definition includes four types of terms: BOND (the length of the chemical bond between two atoms); ANGLE (the angle between two bonds that share one atom); DIHEDRAL (the torsion angle defined by four atoms, generally about a rotatable bond that takes different values in different conformations); and IMPROPER (a dihedral angle defined by four atoms that need not correspond to a rotatable bond; in NMR structure calculations it is used to maintain the chirality of a tetrahedral center or the planarity of a conjugated group, and is therefore generally held rigid). The parameter file contains the equilibrium values and force constants (referred to here as energies) for each type of term in the topology file. NMR programs sacrifice some force-field accuracy in order to fold a compact three-dimensional structure from a linear extended structure within a reasonable computation time. Their calculation protocols are generally convenient and widely used, and they provide reliable and comparable structures.
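To make the four topology terms concrete, the sketch below computes each geometric quantity from Cartesian coordinates and evaluates a simple energy for it. It assumes harmonic forms k(x - x0)^2 for BOND, ANGLE, and IMPROPER (angles in radians) and a CHARMM-style periodic form k(1 + cos(n*phi - delta)) for DIHEDRAL. These are illustrative choices rather than a transcription of the CNS source code, and all function names are hypothetical.

```python
import math


def sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def norm(a):
    return math.sqrt(dot(a, a))


def bond_length(a, b):
    """BOND: distance between two bonded atoms."""
    return norm(sub(a, b))


def bond_angle(a, b, c):
    """ANGLE: angle a-b-c in radians, with b the shared atom."""
    u, v = sub(a, b), sub(c, b)
    cos_t = dot(u, v) / (norm(u) * norm(v))
    return math.acos(max(-1.0, min(1.0, cos_t)))


def torsion(a, b, c, d):
    """Signed torsion angle a-b-c-d in radians (geometry of DIHEDRAL and IMPROPER)."""
    b0, b1, b2 = sub(a, b), sub(c, b), sub(d, c)
    b1n = [x / norm(b1) for x in b1]
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = [b0[i] - dot(b0, b1n) * b1n[i] for i in range(3)]
    w = [b2[i] - dot(b2, b1n) * b1n[i] for i in range(3)]
    return math.atan2(dot(cross(b1n, v), w), dot(v, w))


def e_harmonic(k, x, x0):
    """Harmonic penalty k * (x - x0)**2 (assumed form for BOND and ANGLE)."""
    return k * (x - x0) ** 2


def e_improper(k, phi, phi0):
    """Harmonic improper penalty, angle difference wrapped to [-pi, pi)."""
    d = (phi - phi0 + math.pi) % (2 * math.pi) - math.pi
    return k * d ** 2


def e_dihedral(k, phi, n, delta):
    """Periodic torsion energy k * (1 + cos(n*phi - delta)), CHARMM-style (assumed)."""
    return k * (1 + math.cos(n * phi - delta))


# A C-C bond stretched by 0.01 A with the protein-style BOND energy of 1000.0
print(e_harmonic(1000.0, bond_length((0, 0, 0), (1.53, 0, 0)), 1.52))  # about 0.1
```

The wrap in `e_improper` matters because an improper near +180° and one near -180° describe almost the same geometry; without it, a harmonic penalty would treat them as nearly a full turn apart.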
The most widely used programs for NMR structure calculation include Crystallography & NMR System (CNS) [7], XPLOR/Xplor-NIH [8], and DYANA/CYANA [9,10]. CNS and XPLOR/Xplor-NIH use almost identical file formats for topology and parameters, and both include carbohydrate files designed for X-ray structure determination that have not been well validated for NMR structure calculation. DYANA/CYANA does not include topology and parameter files for carbohydrates. The literature on NMR structures of carbohydrates describes several ways to work around this lack. One is to perform the refinement stage with generic molecular dynamics software that can handle carbohydrates [11,12], such as Amber [4]. Another is to use parameters generated by software or web servers designed to parameterize arbitrary ligands, such as XPLO2D [13] and the PRODRG server [14]. Other approaches modify or extend the original topology/parameter files of CYANA, CNS, or XPLOR/Xplor-NIH in combination with other programs [15][16][17]. However, the reported procedures often lack the detail needed for other researchers to reproduce them. In addition, these combined methods complicate the structure determination process and sometimes require specialist protocols that differ from the well-established protocols for proteins and nucleic acids [18]. There is therefore a need for convenient carbohydrate topology and parameter files that are compatible with the well-established protocols for protein and nucleic acid structure calculation.
In this study, the carbohydrate topology and parameter files of CNS were checked and modified. The modifications were kept to a minimum and were verified to be compatible with the simulated annealing protocols for proteins and nucleic acids. They comprise: setting the energies of the BOND, ANGLE, DIHEDRAL, and IMPROPER terms to the corresponding values in the protein/nucleic acid parameters; revising the IMPROPER term definitions in the topology file for the chiral centers of the sugar ring; adding patches for O- and S-glycosylation; and adding some missing patches for L-saccharides as well as other missing parameters. The new topology and parameter files were validated by recalculating previously published structures, including protein-carbohydrate complexes and glycosylated proteins.

Topology and parameter files
The original topology and parameter files in CNS version 1.3, carbohydrate.top and carbohydrate.param, were modified as follows. First, three missing parameters for the IMPROPER term of the NAG N2-C7 amide bond were added; without them, CNS terminated with an error during the structure calculation. Second, the energies (force constants) of the BOND, ANGLE, IMPROPER, and DIHEDRAL terms were set to 1000.0, 500.0, 500.0, and 2.0, respectively, the same as the corresponding values in the protein/nucleic acid parameter files (protein-allhdg5-4.top and protein-allhdg5-4.param). Third, the IMPROPER terms of all chiral sugar carbons were redefined using the four atoms at the vertices of the tetrahedron. Fourth, patches were constructed for O- and S-glycosylation with α- or β-linkages at C1, for both L- and D-saccharides. The new topology and parameter files (carbohydrate-nmr.top and carbohydrate-nmr.param) are provided in S1 File.
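The first two modifications lend themselves to scripting. The sketch below assumes a simplified CNS-like parameter syntax in which each BOND, ANGLe, DIHEdral, or IMPRoper statement occupies one line as keyword, atom types, force constant, then further values, with `!` starting a comment. Real CNS files can also contain MULTiple dihedrals, wildcard atom types, and brace comments; this sketch skips the first and does not model the others. The helper names and the atom-type labels in the example are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Force constants ("energies") given in the text, keyed by the first four keyword letters.
TARGET_K = {"BOND": 1000.0, "ANGL": 500.0, "IMPR": 500.0, "DIHE": 2.0}
# Number of atom-type fields preceding the force constant in each statement type.
N_TYPES = {"BOND": 2, "ANGL": 3, "IMPR": 4, "DIHE": 4}


def set_force_constants(param_text: str) -> str:
    """Return param_text with each simple BOND/ANGL/IMPR/DIHE force constant replaced."""
    out = []
    for line in param_text.splitlines():
        code, bang, comment = line.partition("!")
        tokens = code.split()
        key = tokens[0][:4].upper() if tokens else ""
        idx = N_TYPES.get(key, 0) + 1  # token position of the force constant
        # Leave other statements and MULTiple dihedrals (different layout) untouched.
        if key in TARGET_K and len(tokens) > idx and "MULT" not in code.upper():
            tokens[idx] = f"{TARGET_K[key]:.1f}"
            line = " ".join(tokens) + (f" {bang}{comment}" if bang else "")
        out.append(line)
    return "\n".join(out)


def improper_types(param_text: str) -> Set[Tuple[str, ...]]:
    """Collect the atom-type quadruples that have IMPRoper parameters."""
    quads = set()
    for line in param_text.splitlines():
        tokens = line.partition("!")[0].split()
        if len(tokens) >= 5 and tokens[0][:4].upper() == "IMPR":
            quads.add(tuple(t.upper() for t in tokens[1:5]))
    return quads


def missing_impropers(
    topology_impropers: List[Tuple[str, str, str, str]],
    atom_type: Dict[str, str],
    param_text: str,
) -> List[Tuple[str, ...]]:
    """Type quadruples used by topology IMPRoper terms but absent from the parameters.

    A reversed quadruple counts as a match; wildcard (X) matching is not modelled.
    """
    known = improper_types(param_text)
    missing = []
    for names in topology_impropers:
        quad = tuple(atom_type[n].upper() for n in names)
        if quad not in known and quad[::-1] not in known and quad not in missing:
            missing.append(quad)
    return missing


# Hypothetical atom types; real types come from carbohydrate.top.
params = ("BOND CA CB 250.0 1.52 ! ring\n"
          "ANGLe CA CB OH 45.0 109.5\n"
          "IMPRoper HA CA OH CB 50.0 0 35.26\n"
          "NBONds\n")
topo = [("HA", "C1", "O1", "C2"), ("C7", "N2", "C2", "O7")]
types = {"HA": "HA", "C1": "CA", "O1": "OH", "C2": "CB", "C7": "CC", "N2": "NH", "O7": "OC"}
print(set_force_constants(params))
print(missing_impropers(topo, types, params))
```

Checking for missing IMPROPER types before a run catches the kind of error that made CNS terminate on the NAG amide bond; the rewritten parameter text should still be compared against the original by hand.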

Structure calculation
The experimental restraints for the saccharide-containing structures were downloaded from the BioMagResBank (http://www.bmrb.wisc.edu) [19] or the Protein Data Bank [1]. The standard CNS input files generate_seq.inp and generate_extended.inp were used to generate a linear extended structure as the starting structure for simulated annealing. The structures were recalculated using the default simulated annealing protocol of CNS version 1.3 (in the file anneal.inp). For each entry, 100 structures were calculated and the 20 lowest-energy structures were selected for analysis. The structures were visualized and analyzed with PyMOL (http://www.pymol.org; Schrödinger, LLC). Carbohydrate Ramachandran plots were generated using the CARP server (http://www.glycosciences.de/tools/carp/) [20].
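Selecting the 20 lowest-energy models out of 100 is a simple ranking step. The sketch below assumes that the total energy of each model has already been extracted from the CNS output (the parsing is not shown, since the exact output layout is not given here); the function names and model names are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def lowest_energy(energies: Dict[str, float], n: int = 20) -> List[Tuple[str, float]]:
    """Return the n (model, total energy) pairs with the lowest energy, lowest first."""
    if n > len(energies):
        raise ValueError("asked for more models than were calculated")
    return sorted(energies.items(), key=lambda item: item[1])[:n]


def summarize(selected: List[Tuple[str, float]]) -> Tuple[float, float]:
    """Mean and sample standard deviation of the selected energies."""
    values = [energy for _, energy in selected]
    return mean(values), stdev(values)


# Synthetic energies for 100 hypothetical models (not real CNS output).
energies = {f"model_{i}": float((i * 37) % 100) for i in range(100)}
best = lowest_energy(energies)
print(len(best), summarize(best))
```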

Results and discussion
To check the original carbohydrate topology and parameter files in CNS, I recalculated a previously published structure of CCL2 in complex with a glycan (PDB 2LIQ) [11]. The published structure was calculated with CYANA and refined in Amber using NOE-derived distance restraints, hydrogen-bond distance restraints, and backbone dihedral angle restraints. I used the same restraints with the default simulated annealing protocol of CNS. CNS terminated with an error because the original files carbohydrate.top and carbohydrate.param lacked three IMPROPER term parameters for the NAG N2-C7 amide bond. After adding the missing parameters, I found that the energies of the BOND, ANGLE, IMPROPER, and DIHEDRAL terms in the carbohydrate parameter file differed substantially from those in the protein/nucleic acid parameter files. Some ANGLE energies in the carbohydrate parameter file were smaller than the energies of the experimental restraints in the default CNS simulated annealing protocol, which allows bond angles to deviate substantially from ideal geometry during the calculation. I therefore recalculated the structures after setting the energies of the BOND, ANGLE, IMPROPER, and DIHEDRAL terms in the carbohydrate parameter file to the values used in the protein/nucleic acid parameter files. However, inspection of the carbohydrate conformation revealed incorrect geometry at some chiral carbon atoms of the sugar ring (Fig 1a). At each of these carbons, the bonded hydrogen atom was misplaced because it had flipped through the carbon center along the line of the C-H bond. Because the chirality of each carbon in the original topology file carbohydrate.top is defined by an IMPROPER term involving the central carbon and the surrounding heavy (non-hydrogen) atoms (Fig 1b), no IMPROPER energy acts on the hydrogen atom.
The position of the hydrogen atom was therefore restrained only by the BOND and ANGLE terms, which in a global energy minimization should be sufficient to drive it to the correct position. However, in the absence of an IMPROPER term on the hydrogen atom, when the hydrogen moves along the line connecting the central carbon and the ideal tetrahedral vertex during simulated annealing, the only energy term that changes is the BOND energy, which depends only on the non-directional distance between the two atoms. The flipped position is therefore a local energy minimum in which the molecule can become trapped, which explains the problem observed in the simulated annealing of the carbohydrate. The problem does not arise in crystallography, because hydrogen atoms are not included at low resolution, and at high resolution the electron density provides additional constraints that prevent the flip.
The protein part of the structure showed no hydrogen flipping at chiral carbons during simulated annealing. In the protein topology file, the chirality of each chiral carbon is defined by an IMPROPER term over the four tetrahedral vertex atoms surrounding the central carbon (Fig 1c). This definition treats all four substituents, including the hydrogen, equivalently, so there is no local energy minimum at the flipped position. I therefore modified the carbohydrate topology and parameter files, replacing the IMPROPER definitions of the chiral carbons of the sugar ring with terms over the four tetrahedral vertex atoms, including hydrogen atoms (Fig 1d). After recalculation with the new topology and parameter files, the CCL2-glycan structures showed correct bonding at the chiral carbons and were very similar to the published structures (Fig 2a and 2b). The carbohydrate Ramachandran plots of the original and recalculated structures show similar phi-psi torsion angle distributions in the favored region (Fig 3a and 3b; Table 1). This indicates that the new topology and parameter files are compatible with the standard simulated annealing protocol for NMR structure calculation, and that the results are comparable with those from molecular dynamics software such as Amber. Note that the DIHEDRAL energy is much smaller than the other three term energies in both the original and the modified parameter files because, unlike bond lengths, bond angles, and impropers, dihedral angles vary over a wide range between conformations. The uniform, small DIHEDRAL energy is therefore a qualitative simplification of the parameters and is not critical for the structure calculation.
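The difference between the two IMPROPER definitions (Fig 1b versus Fig 1c and 1d) can be checked numerically. The sketch below places a chiral carbon at the center of an idealized tetrahedron, with a hydrogen and three heavy-atom substituents at the vertices, and then inverts the hydrogen through the carbon. The improper built on the central carbon plus three heavy atoms never involves the hydrogen, so it cannot detect the flip, whereas the improper built on the four vertex atoms changes sign. The coordinates, atom ordering, and function names are illustrative assumptions, not taken from carbohydrate.top.

```python
import math


def sub(p, q):
    return [p[i] - q[i] for i in range(3)]


def dot(p, q):
    return sum(p[i] * q[i] for i in range(3))


def torsion_deg(a, b, c, d):
    """Signed torsion angle a-b-c-d in degrees."""
    b0, b1, b2 = sub(a, b), sub(c, b), sub(d, c)
    length = math.sqrt(dot(b1, b1))
    b1n = [x / length for x in b1]
    v = [b0[i] - dot(b0, b1n) * b1n[i] for i in range(3)]
    w = [b2[i] - dot(b2, b1n) * b1n[i] for i in range(3)]
    n = [b1n[1] * v[2] - b1n[2] * v[1],
         b1n[2] * v[0] - b1n[0] * v[2],
         b1n[0] * v[1] - b1n[1] * v[0]]  # b1n x v
    return math.degrees(math.atan2(dot(n, w), dot(v, w)))


# Idealized tetrahedral carbon at the origin (illustrative coordinates).
C = (0.0, 0.0, 0.0)
H = (1.0, 1.0, 1.0)                                   # correct hydrogen position
X1, X2, X3 = (1.0, -1.0, -1.0), (-1.0, 1.0, -1.0), (-1.0, -1.0, 1.0)  # heavy atoms
H_flipped = tuple(2 * c - h for c, h in zip(C, H))    # hydrogen inverted through C


def center_improper(h):
    """Original-style definition: central carbon + three heavy atoms (h is unused)."""
    return torsion_deg(C, X1, X2, X3)


def vertex_improper(h):
    """Protein-style definition: the four vertex atoms, hydrogen included."""
    return torsion_deg(h, X1, X2, X3)


for label, h in (("correct", H), ("flipped", H_flipped)):
    print(label, round(center_improper(h), 1), round(vertex_improper(h), 1))
```

Run as written, it should print approximately `correct -35.3 -70.5` and `flipped -35.3 54.7`. With a harmonic IMPROPER term targeting the correct vertex value, the flipped hydrogen would therefore carry a large penalty, while the center-based term assigns both positions the same energy.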
The original carbohydrate topology and parameter files have no patches for O- or S-glycosylation, although O-glycosylation is a frequent post-translational modification of proteins. Some patches for linkages of L-saccharides are also absent. In my new topology file, I constructed the patches A1S and A1T for α-linkages from carbohydrate C1 to Ser OG and Thr OG1; B1S and B1T for β-linkages from carbohydrate C1 to Ser OG and Thr OG1; A1C and B1C for α- and β-linkages from carbohydrate C1 to Cys SG; B13 and B16 for β(1,3)- and β(1,6)-linkages; and B13L, B16L, A1SL, B1SL, A1TL, B1TL, A1CL, and B1CL for the linkages of L-saccharides. The new O-glycosylation patches were tested by recalculating the structure of the glycoprotein GalNAcα-IFNα2a (PDB 2LMS), which contains an O-glycosylated threonine [16]. In the GalNAcα-IFNα2a structure, the carbohydrate is more flexible than most of the protein. There are only 17 restraints between the carbohydrate A2G and the protein, all involving the A2G H1 and H8 atoms. In the recalculated structures, all carbohydrate carbons have correct bonding and chirality. The sugar rings in about half of the structures adopt a chair conformation, while the others adopt boat or envelope conformations. In the published structures all the sugars are in the chair conformation, but the conformational restraints (NOE or dihedral) available for the structure calculation are insufficient to enforce it. Nevertheless, the chair is generally the most stable pyranose conformation, so the sugar is usually assumed to adopt it in the absence of experimental evidence to the contrary [18,21]. Alternatively, J-coupling analysis can be used to determine the sugar ring conformation experimentally [2]. In either case, additional dihedral restraints on the sugar ring can then be introduced into the structure calculation.
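Whether a calculated sugar ring is a chair can be checked from its six endocyclic torsion angles: in a chair they alternate in sign and none is close to zero, whereas boat and envelope forms contain near-zero torsions. The sketch below applies that criterion with an assumed 30° cut-off. The ring coordinates are idealized test geometries rather than atoms from 2LMS, and distinguishing boat, envelope, and skew forms would need a fuller puckering analysis (for example, Cremer-Pople coordinates).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def torsion_deg(a: Point, b: Point, c: Point, d: Point) -> float:
    """Signed torsion angle a-b-c-d in degrees."""
    sub = lambda p, q: [p[i] - q[i] for i in range(3)]
    dot = lambda p, q: sum(p[i] * q[i] for i in range(3))
    b0, b1, b2 = sub(a, b), sub(c, b), sub(d, c)
    length = math.sqrt(dot(b1, b1))
    b1n = [x / length for x in b1]
    v = [b0[i] - dot(b0, b1n) * b1n[i] for i in range(3)]
    w = [b2[i] - dot(b2, b1n) * b1n[i] for i in range(3)]
    n = [b1n[1] * v[2] - b1n[2] * v[1],
         b1n[2] * v[0] - b1n[0] * v[2],
         b1n[0] * v[1] - b1n[1] * v[0]]  # b1n x v
    return math.degrees(math.atan2(dot(n, w), dot(v, w)))


def ring_torsions(ring: Sequence[Point]) -> List[float]:
    """Endocyclic torsions of a ring given in bonded order
    (for a pyranose, e.g. C1, C2, C3, C4, C5, O5)."""
    n = len(ring)
    return [torsion_deg(ring[i], ring[(i + 1) % n], ring[(i + 2) % n], ring[(i + 3) % n])
            for i in range(n)]


def is_chair(ring: Sequence[Point], min_abs: float = 30.0) -> bool:
    """Chair: every torsion is far from zero and consecutive torsions alternate in sign."""
    t = ring_torsions(ring)
    if any(abs(x) < min_abs for x in t):
        return False
    return all(t[i] * t[(i + 1) % len(t)] < 0 for i in range(len(t)))


def ideal_ring(z: Sequence[float], r: float = 1.0) -> List[Point]:
    """Six atoms on a hexagon of radius r with the given out-of-plane heights."""
    return [(r * math.cos(math.radians(60 * k)), r * math.sin(math.radians(60 * k)), z[k])
            for k in range(6)]


chair = ideal_ring([0.25, -0.25, 0.25, -0.25, 0.25, -0.25])
boat = ideal_ring([0.25, 0.0, 0.0, 0.25, 0.0, 0.0])
print(is_chair(chair), is_chair(boat))
```

Counting how many members of an ensemble pass such a check is one way to decide whether dihedral restraints on the ring are needed.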
I performed a new calculation of PDB 2LMS with two additional dihedral restraints on the sugar ring, which yielded structures very similar to the published ones (Fig 2c and 2d). The carbohydrate Ramachandran plots of the original and recalculated structures also show similar phi-psi torsion angle distributions (Fig 3c and Table 1). This demonstrates that the topology and parameters presented here can be used to determine the structures of glycosylated proteins with the standard simulated annealing protocol in CNS. These topology and parameter files were also used in the recent structure determination of several O-glycosylated carbohydrate-binding modules, providing structural insight into the stabilizing effect of O-glycosylation [22].
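The patch names listed earlier follow a compact scheme: A or B for an α- or β-anomer, 1 for the sugar C1, then the acceptor (S, T, or C for Ser OG, Thr OG1, or Cys SG, or a digit n for On of the adjacent sugar), with a trailing L for L-saccharides. The decoder below is a hypothetical helper that reads this scheme and is not part of CNS; it assumes that names without the L suffix refer to D-saccharides.

```python
from typing import Dict

ANOMER = {"A": "alpha", "B": "beta"}
AMINO_ACID_ACCEPTOR = {"S": ("Ser", "OG"), "T": ("Thr", "OG1"), "C": ("Cys", "SG")}


def decode_patch(name: str) -> Dict[str, str]:
    """Describe a patch name such as 'B1T', 'A1CL', or 'B16' (hypothetical helper)."""
    code = name.strip().upper()
    series = "L" if code.endswith("L") else "D"  # assumption: no suffix means D-sugar
    core = code[:-1] if series == "L" else code
    if len(core) != 3 or core[0] not in ANOMER or core[1] != "1":
        raise ValueError(f"unrecognized patch name: {name!r}")
    acceptor = core[2]
    if acceptor in AMINO_ACID_ACCEPTOR:
        residue, atom = AMINO_ACID_ACCEPTOR[acceptor]
        target = f"{residue} {atom}"
    elif acceptor.isdigit():
        target = f"O{acceptor} of the adjacent sugar"  # (1,n) glycosidic linkage
    else:
        raise ValueError(f"unrecognized acceptor in patch name: {name!r}")
    return {"anomer": ANOMER[core[0]], "donor": "C1", "acceptor": target, "sugar": series}


for patch in ("A1S", "B1T", "B1CL", "B13", "B16L"):
    print(patch, decode_patch(patch))
```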
The topology and parameter files were further tested by recalculating most of the protein-carbohydrate complexes and glycosylated proteins available in the PDB. These comprise four protein-carbohydrate complexes, seventeen glycosylated proteins/peptides, and one carbohydrate (Table 2). These calculations demonstrate that the new carbohydrate topology and parameter files work well with the standard CNS simulated annealing protocol for a variety of protein and carbohydrate molecules. As with PDB 2LMS, the sugar rings in a calculated ensemble may adopt different conformations if the carbohydrate is flexible or lacks sufficient distance restraints; in some previous studies, additional dihedral angle restraints were introduced to hold the sugar ring in the theoretical lowest-energy conformation or in the conformation observed in crystal structures [21,23,24]. For most of the tested structures, the RMSDs of the recalculated ensembles are larger than those of the original ensembles. The larger RMSDs suggest that the recalculated structures are more "flexible" than the originals, which may be caused by differences in the force field or calculation protocol, and/or by the absence of a refinement stage in the recalculation. However, some extremely large differences indicate that some published structures were over-refined with an MD program/force field. For example, in the structure 2KR2, the sugar (maltose) has a very small RMSD (0.10 Å) after refinement with the COSMOS force field [15], yet it binds to a partially flexible loop region with an RMSD of 2.5 Å. There are some NOE restraints between the sugar and the protein, but no intramolecular NOE restraints for the sugar were used. The extremely low RMSD of the sugar therefore arises from the refinement, which resulted in a single conformation in the published structures.
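Ensemble RMSD values such as those quoted for 2KR2 depend on how the models are superposed and averaged. As one common convention (an assumption here, since the exact procedure behind the tabulated values is not spelled out), the sketch below computes the mean pairwise RMSD over an ensemble after optimal rigid-body superposition with the Kabsch algorithm, fitting on the same atoms that are measured. It uses NumPy; coordinates are (N, 3) arrays in Å, and the function names are hypothetical.

```python
import itertools

import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal superposition of P onto Q
    (translation plus proper rotation; reflections are excluded)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                                     # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # avoid improper rotations
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T         # rotation taking P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def mean_pairwise_rmsd(models) -> float:
    """Average Kabsch RMSD over all pairs of models in an ensemble."""
    pairs = list(itertools.combinations(range(len(models)), 2))
    if not pairs:
        raise ValueError("need at least two models")
    return sum(kabsch_rmsd(models[i], models[j]) for i, j in pairs) / len(pairs)


# Synthetic check: a rotated and translated copy has (numerically) zero RMSD.
rng = np.random.default_rng(0)
P = rng.normal(size=(12, 3))
a = np.radians(40.0)
Rz = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([1.0, -2.0, 3.0])
print(kabsch_rmsd(P, Q), mean_pairwise_rmsd([P, Q, P]))
```

For a ligand such as the 2KR2 maltose, fitting on the protein and then measuring the sugar atoms without refitting would give a value at least as large as this self-fitted RMSD, so the fitting atoms should be stated whenever such numbers are compared.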
In the recalculated 2KR2 structures, the sugar RMSD is 2.06 Å, which is more reasonable for a sugar bound to loops with an RMSD of ~2.5 Å.
In summary, I have presented a suite of CNS topology and parameter files for carbohydrates and demonstrated that they are compatible with the standard simulated annealing protocols for proteins and nucleic acids. Because CNS and XPLOR/Xplor-NIH share a similar topology/parameter file format and the modifications are simple, these files could easily be integrated into future releases of these programs. Integrating the carbohydrate topology and parameters into the standard structure calculation protocol will facilitate three-dimensional structural studies of carbohydrates and glycosylated proteins by NMR spectroscopy.

Supporting information
S1 File. Revised carbohydrate topology and parameter files. (ZIP)