REDCRAFT: A Computational Platform Using Residual Dipolar Coupling NMR Data for Determining Structures of Perdeuterated Proteins Without NOEs

Nuclear Magnetic Resonance (NMR) spectroscopy is one of the two primary experimental means of characterizing macromolecular structures, including protein structures. Structure determination by NMR spectroscopy has traditionally relied heavily on distance restraints derived from nuclear Overhauser effect (NOE) measurements. While structure determination of proteins from NOE-based restraints is well understood and broadly used, it requires an increasing quantity of data for analysis, raises the cost of structure determination, and NOE data are less available in the study of perdeuterated proteins. Over the past decade, Residual Dipolar Couplings (RDCs) have been investigated as an alternative source of data for structural elucidation of proteins by NMR. Several methods have been reported that utilize RDCs in addition to NOEs, and a few utilize RDC data alone. While these methods have individually demonstrated some successes, none of them has exposed the full potential of protein structure determination from RDCs. To date, structure determination of proteins from RDCs has been limited to small proteins (less than 8.5 kDa) using RDC data from many alignment media (>3) that cannot be collected from larger proteins. Here we present the latest version of the REDCRAFT software package designed for structure determination of proteins from RDC data alone. We have demonstrated the success of REDCRAFT in structure determination of proteins ranging in size from 50 to 145 residues using experimentally collected data and large proteins (145 to 573 residues) using simulated RDC data that can be collected from perdeuterated proteins. Finally, we demonstrate the accuracy of structure determination of REDCRAFT from RDCs alone in application to the structurally novel PF.2048 protein.
The RDC-based structure of PF.2048 exhibited 1.0 Å of BB-RMSD with respect to the NOE-based structure while using only a small number of backbone RDCs (∼3 restraints per residue) compared to what is required by other approaches.

Author Summary

Residual Dipolar Couplings have the potential to reduce the cost and the time needed to characterize protein structures. In addition, RDC data have been demonstrated to concurrently elucidate the structure of proteins, perform assignment of resonances, and characterize the internal dynamics of proteins. Given all the advantages associated with the study of proteins from RDC data, it is surprising that, based on the statistics provided by the Protein Databank (PDB), only 124 proteins (out of nearly 150,000 proteins) have utilized RDCs as part of their structure determination. An even smaller subset of these proteins (approximately 7) have utilized RDCs as the primary source of data for structure determination. The impeding factor in the use of RDCs is the challenging computational and analytical aspects of this source of data. In this report, we demonstrate the success of the REDCRAFT software package in structure determination of proteins using RDC data that can be collected from small and large proteins in a routine fashion. REDCRAFT accomplishes the challenging task of structure determination from RDCs by introducing a unique search and optimization technique that is both robust and computationally tractable. Structure determination from routinely collectable RDC data using REDCRAFT can lead to faster and cheaper study of larger and more complex proteins by NMR spectroscopy in solution state.


Introduction

Nuclear Magnetic Resonance Spectroscopy is a well-recognized and utilized approach to structure determination of macromolecules including proteins. NMR spectroscopy has contributed to structural characterization of nearly 11,649 proteins based on statistics reported by the Protein DataBank (PDB)(1-3). Although NMR studies may in general be more time consuming and costly than X-ray crystallography, they provide the unique benefit of observing macromolecules in their native aqueous state, which provides a better understanding of molecular interactions and internal dynamics at various timescales and resolutions.

Despite the changes that NMR spectroscopy has observed over the years, the methodology

In the following sections we present three sets of results, all of which demonstrate structure determination of proteins from RDCs alone to reduce the overall cost of structure determination.

In the first set, we used REDCRAFT to determine the structure of every protein for which sufficient experimental RDC data had been deposited in the BMRB database. In each of these exercises, we used a substantially smaller set of RDC data than the previously reported studies. In the second set of results, we investigated the success of REDCRAFT in structure determination of large proteins using synthetically generated RDCs. The structure of these proteins had been previously characterized by distance constraints while including a very small subset of

C'} in five alignment media. Two previous studies used this full set of RDC data to determine a structure of GB3 to within 1 Å of the corresponding X-ray structure(32, 33). In a previous REDCRAFT study(34), the structure of GB3 was determined using {N-HN, Cα-Hα} RDCs in two alignment media. However, the collection of {Cα-Hα} RDCs is uncommon due to sample preparation requirements and added complexity in NMR data interpretation. Using these RDCs, REDCRAFT was able to reconstruct the structure to within 0.6-2.4 Å of the NMR structure. For the purposes of this study, the RDC data were reduced to the vectors {N-C', N-HN} since they can be collected from perdeuterated protein samples. Utilization of these RDCs is more challenging than the previous set because the two vectors lie in the same peptide plane. Using these vectors, we were able to calculate a structure of this protein with a BB-RMSD of less than 2.5 Å.

Rubredoxin - Previously, the structure of Rubredoxin was characterized to within 1.81 Å of the X-ray structure using the following RDC vectors: {N-C', N-HN, C'-H, Cα-Hα, HN-Hα, Hα-HN} in two alignment media(10). Again, to simulate an RDC set that could be collected from a perdeuterated protein, that experimental RDC set was reduced to {N-HN, C'-H} from two alignment media. BB-RMSDs of 1.12 Å and 1.02 Å were obtained in relation to the NMR and X-ray structures, respectively.

Table S1. Overall, both structures (with and without RDCs) were of high quality, with excellent structure quality scores. The RDC Q-factors for the two alignment media M1 and M3 are 0.340 ± 0.020 and 0.320 ± 0.031, respectively, for the models generated without RDCs, and 0.275 ± 0.015 and 0.280 ± 0.028, respectively, for models generated using RDC data as restraints. The DP scores, assessing how well the models fit the unassigned NOESY peak list data, are 0.905 and 0.905 for the structures modeled without and with RDC data, respectively.

Table 2. It was shown that the Q-factor for M2 was slightly improved. Note that the Q-factor for M1 incurred a slight increase during minimization due to the correction of van der Waals collisions in the computed structure.

Figure 2. Results for PF2048.1 (in red) A) before minimization and B) after minimization are shown superimposed on the NMR structure without RDCs (in blue) and the NMR structure with RDCs (in yellow).

The results of structure calculation for large proteins using synthetic RDCs are shown in Table 3 and Figure 3. Although the structure of ChR145 was characterized by REDCRAFT using experimental data (reported in Table 1), here we have repeated the structure determination of this protein with synthetic data to illustrate the possibility of full structure determination (instead of a fragmented study) if adequate RDCs were collected. In this study, ChR145 was characterized in one full continuous segment with a BB-RMSD of 1.45 Å with respect to the reference structure. In addition, the resulting structure had excellent Q-factors.

In the cases of Lpg1496 and Enzyme 1, fragmented study was performed due to space. In such instances, the structure determination was augmented with short refinement of each fragment followed by their integration using Xplor-NIH. For Lpg1496, the largest contiguous fragment characterized was 138 residues in length, displaying a BB-RMSD of 1.73 Å. Additional fragments ranged from 50 to 75 residues in length. All fragments reported Q-factors indicative of reliable structure in each alignment medium as well as low BB-RMSDs to the reference structure.
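The Q-factors quoted in these results measure how well back-calculated RDCs agree with measured ones. The text does not spell out the formula, so the following is a minimal sketch assuming the commonly used definition, the root-mean-square difference between observed and computed RDCs divided by the root-mean-square of the observed RDCs. The function name `rdc_q_factor` is hypothetical and is not part of REDCRAFT.

```python
import math


def rdc_q_factor(observed, computed):
    """Return rms(observed - computed) / rms(observed) for one alignment medium.

    Assumption: this is the common Q-factor definition; the source text
    reports Q-factors but does not state which normalization it uses.
    """
    if len(observed) != len(computed):
        raise ValueError("observed and computed RDC lists must have equal length")
    if not observed:
        raise ValueError("at least one RDC is required")
    # The 1/N factors of the two rms terms cancel, so plain sums suffice.
    sum_sq_diff = sum((o - c) ** 2 for o, c in zip(observed, computed))
    sum_sq_obs = sum(o ** 2 for o in observed)
    if sum_sq_obs == 0.0:
        raise ValueError("observed RDCs are all zero; Q-factor is undefined")
    return math.sqrt(sum_sq_diff / sum_sq_obs)
```

A value of 0 indicates perfect agreement between the structure and the data; lower values indicate better fit, which is why the drop from roughly 0.34 to 0.28 reported for the RDC-restrained models is read as an improvement.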

The longest fragment for Enzyme 1 was 208 residues, which exhibited a BB-RMSD of 1.78 Å. All

is therefore of great importance for any RDC-based structure determination technique to be able to characterize structures from this subset of data. In this report we will demonstrate the success of REDCRAFT in calculation of protein structures under these sparse data conditions.

REDCRAFT is also the only publicly available software package that is developed using a

with an existing structure. Theoretically, the torsion angles adjoining any two peptide planes with the best fitness to the RDC data should constitute the correct geometry, and structure determination would therefore be complete. In practice, however, the globally optimal geometry will almost never be ranked first (due to experimental or structural noise), necessitating a more

of REDCRAFT (version 4.0) has also adopted NEF compliance in data import/export procedures, and has incorporated an advanced decimation process that has allowed for successful structure calculation of proteins with as much as ±4 Hz of experimental noise(67, 69).
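The ranking problem noted above, where noise keeps the correct geometry from scoring best at a given step, can be illustrated with a generic toy sketch. It is not REDCRAFT's algorithm, whose details are not given in this text. It only shows why retaining several top-ranked partial solutions at each extension step can recover a solution that a strictly greedy choice discards. All names (`search`, `toy_score`, `FIRST_STEP_COST`, `PAIR_COST`) and all numbers are hypothetical.

```python
import heapq

# Toy costs: noise makes option 0 look slightly better at the first step,
# although the best complete solution starts with option 1.
FIRST_STEP_COST = {0: 0.1, 1: 0.2}
PAIR_COST = {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 0.5}


def toy_score(partial):
    """Lower is better; stands in for a fitness to RDC data."""
    total = FIRST_STEP_COST[partial[0]]
    if len(partial) == 2:
        total += PAIR_COST[partial]
    return total


def search(n_steps, options, score, keep):
    """Extend partial solutions step by step, keeping the `keep` best each time."""
    candidates = [()]
    for _ in range(n_steps):
        extended = [p + (opt,) for p in candidates for opt in options]
        candidates = heapq.nsmallest(keep, extended, key=score)
    return candidates


greedy = search(2, (0, 1), toy_score, keep=1)  # commits to option 0 first
wider = search(2, (0, 1), toy_score, keep=2)   # retains both first-step options
```

In this toy case the greedy run ends on a solution starting with option 0, while the run that keeps two candidates per step ends with `(1, 0)` ranked first. The trade-off is the usual one: a larger `keep` costs more evaluations per step in exchange for tolerance to misleading early rankings.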

During the first phase of our experiment, we selected the target proteins (shown in Table 4) based on the availability of RDC data in BMRB or PDB, structural diversity, and existence of an NMR or X-ray structure. RDC data for all the proteins except SR10 were obtained from the BMRB(30), while the RDC data for SR10 were obtained from the SPINE database(35). More detailed information regarding the exact RDCs can be found in Table S1 of the Supplementary Material. Table 4 provides some self-explanatory information for each protein, including the final column that highlights the average backbone similarity between the X-ray and NMR structures.

The protein GB1 has been previously studied in depth(28, 74) and represents an ideal candidate to be used as a "proof of concept" case. GB3, an analog of GB1, was also investigated in this study using a different set of RDCs. The RDCs for GB3 were previously collected(32,

The protein was purified using a standard two-step chromatography method, Ni affinity followed by size exclusion, as previously reported (85

Tables S1 and S2). In addition, 132 backbone dihedral angle restraints

Under certain circumstances, structure determination by REDCRAFT is recommended to be conducted in discrete fragments. One such instance arises from a gap in the experimental data. In comparison to NOE-based structure determination, this can be a very powerful feature.

Fragmented study of a protein allows direct study of a certain region of interest in a protein and therefore reduces the overall cost of data acquisition. A second instance relates to structural noise. The effect of structural noise sometimes accumulates enough to become noticeable for fragments larger than one hundred amino acids. Any existing gaps of less than 6 amino acids can easily be filled during the process of structural refinement. In this study we have used XPLOR-NIH(15) to address variation from ideal peptide geometries and fill the gaps. More specifically, during the final refinement process, each structure was subjected to 30,000 steps of Powell minimization that included the same set of RDCs used during the structure calculation with REDCRAFT. Aside from completion of the missing residues, these minimizations normally resulted in structural variation of less than 0.5 Å.
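The fragment bookkeeping described above can be sketched as follows: residues that have RDC data are grouped into contiguous fragments, and gaps shorter than 6 residues are bridged on the assumption that refinement will fill them. This is a minimal illustration, not REDCRAFT code. The function name `split_into_fragments` and the `max_gap` parameter are hypothetical, and only the 6-residue threshold comes from the text.

```python
def split_into_fragments(residues_with_data, max_gap=5):
    """Group residue numbers into (start, end) fragments.

    A run of missing residues no longer than `max_gap` is bridged, since the
    text states that gaps of less than 6 amino acids can be filled during
    refinement. Longer gaps start a new fragment.
    """
    fragments = []
    for res in sorted(set(residues_with_data)):
        # Number of residues missing between the current fragment end and `res`.
        if fragments and res - fragments[-1][1] - 1 <= max_gap:
            fragments[-1][1] = res
        else:
            fragments.append([res, res])
    return [(start, end) for start, end in fragments]
```

For example, data on residues 1-10, 14-20 and 30-40 yields two fragments, 1-20 and 30-40: the three missing residues 11-13 are bridged, while the nine missing residues 21-29 force a split. A cap on fragment length (for instance to limit accumulated structural noise beyond about one hundred residues) is not modeled here.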

Funding for this project was provided by NIH grants P20 RR-016461 (to HV) and

application in a systematic search algorithm for determining protein backbone structure. Journal of biomolecular NMR. 2004;29:223-42.