Revisiting the NMR Structure of the Ultrafast Downhill Folding Protein gpW from Bacteriophage λ

GpW is a 68-residue protein from bacteriophage λ that participates in virus head morphogenesis. Previous NMR studies revealed a novel α+β fold for this protein. Recent experiments have shown that gpW folds in microseconds by crossing a marginal free energy barrier (i.e., downhill folding). These features make gpW a highly desirable target for further experimental and computational folding studies. As a step in that direction, we have re-determined the high-resolution structure of gpW by multidimensional NMR on a construct that eliminates the purification tags and unstructured C-terminal tail present in the prior study. In contrast to the previous work, we have obtained a full manual assignment and calculated the structure using only unambiguous distance restraints. This new structure confirms the α+β topology, but reveals important differences in tertiary packing. Namely, the two α-helices are rotated along their main axis to form a leucine zipper. The β-hairpin is orthogonal to the helical interface rather than parallel, displaying most tertiary contacts through strand 1. There also are differences in secondary structure: longer and less curved helices and a hairpin that now shows the typical right-hand twist. Molecular dynamics simulations starting from both gpW structures, and calculations with CS-Rosetta, all converge to our gpW structure. This confirms that the original structure has strange tertiary packing and strained secondary structure. A comparison of NMR datasets suggests that the problems were mainly caused by incomplete chemical shift assignments, mistakes in NOE assignment and the inclusion of ambiguous distance restraints during the automated procedure used in the original study. The new gpW corrects these problems, providing the appropriate structural reference for future work. Furthermore, our results are a cautionary tale against the inclusion of ambiguous experimental information in the determination of protein structures.


Introduction
The protein gpW from the Escherichia coli bacteriophage l is a component of the viral particle that localizes in the connector between the head and tail [1]. Biochemical studies suggest that the role of gpW is to impede the exit of the pre-packaged DNA and organize the formation of the tail during virus assembly [2]. To perform these functions gpW is thought to participate in proteinprotein and protein-DNA interactions. Such functional versatility makes gpW an interesting case example for studying macromolecular interactions, especially considering its small size (68 residues). The gpW 3D structure was originally determined by NMR using automatic assignment procedures [3]. This structure exhibits some peculiar features. For instance, the NMR study showed that gpW folds into an a+b topology consisting of two ahelices placed on top of a b-hairpin. Out of the 68 residues encoded by the gene only 50 form part of the folded structure, whereas three residues at the N-terminus and 15 at the C-terminus are disordered. The unstructured C-terminal fragment does not participate in stabilizing the native structure, but it is critical for connector assembly [1]. Another interesting structural property emerging from the NMR study is the conformation of the b-hairpin, which exhibits no significant twist and has both strands involved in tertiary contacts with the two helices, defining a single well-packed hydrophobic core according to the authors [3]. These features led to the classification of the gpW structure as a novel fold [3], status still valid today as manual and computational database searches fail to find any structural homologues.
The structural characteristics of gpW also make it an attractive candidate for folding studies. In fact, experimental studies of gpW's folding properties at the thermodynamic and kinetic level have been reported [1,4]. Equilibrium denaturation experiments of gpW have shown that the same construct used for structural studies -i.e. includes a conservative ValRThr mutation in position 2 and a C-terminal FLAG epitope followed by a hexa-histidine tag -is biologically active and folds-unfolds reversibly in a simple process compatible, at first glance, with the two-state folding mechanism [1]. A subsequent, more detailed, thermodynamic study of the gpW unfolding process using multiple structural probes and calorimetry demonstrated, however, that the a-helices of gpW melt at slightly higher temperature than the tertiary contacts defining the native environment of the sole tyrosine [4]. That is, the thermal unfolding of gpW is not concerted. In agreement with this observation, quantitative analysis of the DSC thermogram with the variable barrier model indicated that gpW folds over a marginal free energy barrier of ,1 RT, which places this protein within the downhill folding regime [4]. Finally, kinetic analysis with the laser-induced temperature jump technique revealed that gpW is also an ultrafast folder, with a relaxation time of a few microseconds at its denaturation temperature [4]. These experimental studies have elicited the interest of theoreticalcomputational groups, which have simulated gpW folding using coarse-grained models [5,6]. GpW has also become target for the extra-long molecular dynamics simulations performed by the Shaw group (K. Lindorff-Larsen, personal communication).
From the experimental side, the ultrafast folding kinetics and non-concerted unfolding behavior of gpW are the two exact properties required for performing an atom-by-atom analysis of protein folding [7]. In this analysis the thermal unfolding behavior of hundreds of individual atoms in the protein are monitored by NMR leading to a map of the folding interaction network of the protein [8]. However, before performing such analysis it is important to revisit the structural characterization of native gpW by NMR. This is so for several reasons. First, there are some differences between the original construct and that which was used for the multiprobe thermodynamic and kinetic studies. Particularly, the latter studies used a construct in which the original clone (including the ValRThr mutation) was modified to remove the FLAG epitope, the histidine tag, and the last 6 C-terminal residues of the gpW gene, which were unstructured and faraway from the folded domain in the original structure [3]. The modifications are inconsequential in terms of thermal stability, as revealed by simple comparison between the unfolding curves monitored by far-UV CD on the two constructs [1,4]. Nevertheless, it is useful to determine the NMR structure of the shorter construct for proper referencing of the atom-by-atom analysis. Second, the determination of the structure by multidimensional NMR using standard manual assignment would offer an opportunity to inspect the performance of the automatic methods that are being used in structural genomics projects [9]. Third, it is important to revisit the 3D structure of gpW given its novel fold and peculiar packing features.
Here we report the determination of the high-resolution structure of gpW without the C-terminal tags and unstructured residues using multidimensional NMR. We see that this structure conserves the overall a+b fold observed in the original study. However, the new structure shows clear differences in the packing of the b-hairpin against the two helices. In our structure the bhairpin strands display the characteristic twist observed in other protein structures. The a-helices are less curved and rotated ,40 degrees from one another relative to the original structure, thus forming a typical leucine zipper configuration. Further differences are found in tertiary packing, with the hairpin packing against the helices in an orthogonal rather than parallel orientation. These differences originate from the pattern of tertiary contacts observed among the aliphatic residues that conform the hydrophobic core. Comparison between the NMR datasets suggests that the structural discrepancies are caused by wrong long-range NOE assignments in the original study together with the inclusion of large sets of ambiguous NOEs in the automated structure calculation protocol. This interpretation is confirmed by molecular dynamics simulations in explicit solvent starting from both structures, and structure prediction calculations from the two sets of backbone chemical shift assignments using CS-Rosetta [10]. In fact, all these calculations converge onto a consensus structural ensemble for gpW that maintains the general structural features of our newly determined 3D structure. Therefore, this new structure should be used from now on as reference for future experimental and computational folding studies as well as for the interpretation of gpW's biological function.

Results and Discussion
New three-dimensional structure of gpW by NMR We performed all NMR experiments on the same gpW construct that was used before for the multiprobe thermodynamic and kinetic analysis [4]. This construct was derived from the clone used in the original NMR study [1]. Thus, both proteins bear the same T2RV mutation that was included in the original study (note that this is the case even though the 1HYW pdb file, which corresponds to the prior structure, shows a threonine in position 2). The difference lies on the C-terminus, which has been shortened here to remove the FLAG epitope, the histidine tag, and six unstructured residues (Fig. 1). The new three-dimensional structure of gpW was determined with 723 unambiguous NOEderived distances, together with 94 dihedral and 22 hydrogen bond restraints (coordinates deposited with the PDB accession code 2L6Q). The ensemble of 20 lowest energy conformers ( Fig. 2A) does not show distance or angle restraint violations greater than 0.3 Å and 5u, respectively ( Table 1). The ensemble of structures does not show significant deviations from covalent geometry and is well defined by the NMR data, as illustrated by the low root mean squared deviation (RMSD) calculated for the backbone and all heavy atoms ( Fig. 2A) ( Table 1). The Cterminal segment (residues 55-62) does not show NOE cross-peaks connecting them to the rest of the structure or NOE characteristic of secondary structure elements. In addition, residues 55-62 show chemical shifts close to random coil values. Altogether, the data indicate that this region is disordered, in agreement with the original structural study. The quality of the new gpW structure is high, with 91.2% of the residues (all those within the 4-54 structured segment) residing in the most favored regions of the Ramachandran plot for the whole ensemble ( Table 1). The resolution of this NMR structure for gpW according to the program PROCHECK-NMR [11] is equivalent to an X-ray structure with a resolution between 1.0 Å and 1.8 Å (data not shown).
In terms of secondary structure, we see that the first helix of gpW starts in residue 4 and extends up to residue 19. A 3-residue turn connects helix 1 to the first b-strand (residues 23-28) followed by a b-turn and the second b-strand (residues 31-36), conforming a b-hairpin. Another 3-residue turn connects the second strand with a-helix 2 (residues 40-54) (Fig. 2B). In this gpW structure the b-hairpin displays the right-hand twist that is characteristic of antiparallel b-structures. Multiple long-range NOE cross-peaks connecting residues of the two helices were observed resulting in a very good definition of their relative position ( Fig. 2A). These contacts (summarized in Tables 2, 3) mostly involve amino acids on the hydrophobic face of the helices (Fig. 3A, Table 2). In contrast, fewer contacts were found between helical residues and the two strands. Helices 1 and 2 show several connectivities with strand 1 ( Table 3), but only a few NOE cross-peaks were observed between strand 2 and either helix (most of them involve F35; Fig. 3B, Table 3). As a result, the new structure of gpW does not show a single well-packed hydrophobic core, but two orthogonal packing interfaces. One interface corresponds to the interactions that bring the two helices together to form a leucine zipper. The other interface corresponds to the interactions of the helices with only b-strand 1 of the b-hairpin, whereas strand 2 sits somewhat behind and is thus too far away to participate in tertiary contacts (Fig. 3). The electrostatic surface of the new gpW structure highlights the overall shape and charge distribution of the protein ( Fig. 4). This calculation reveals that the slight opening between the helices and the b-hairpin is important to facilitate exposure to the solvent of several negatively charged side chains that would be buried on a more compact structure.

Differences between the new and previous NMR structures
The two NMR structures of gpW are at considerable variance, and thus result in large backbone RMSD when superimposed (,3 Å , Fig. 5). Specifically, the helices show different curvature and are rotated from one another as consequence of different interaction interfaces (Fig. 5A,B). In the new structure the   hydrophobic side chains of the two helices are interlaced to form a leucine zipper (Fig. 3A) whereas in the original structure the hydrophobic side chains point towards the hairpin. The difference in b-hairpin orientation is also significant (Fig. 5A). All these differences originate from the fact that the pattern of tertiary contacts is drastically different in the two structures. The Table 3. Long-range (.i, i+5) NOEs connecting a-helices and b-hairpin in gpW * .  dissimilarity is particularly striking for the leucines in the helices: Leu 7 and Leu 14 in helix 1, and Leu 43 in helix 2 (Fig. 5B). Other residues in the helices, such as Ala 8, Ala 9, Ala 12 and Ala 13 in helix 1 and Ala 48 in helix 2 also adopt substantially different orientation. Altogether, these differences result in a different overall distribution of charges in the protein surface as illustrated in Fig. 4.
To investigate the source of these structural differences we need to consider several factors. A first consideration is the chemical changes between the constructs used in the two NMR studies (Fig. 1). In the previous work, a FLAG epitope followed by a hexa-histidine tag was added to the C-terminal of the gpW protein bearing the T2RV mutation. In this work the mutation is also present, but the tags are not and the sequence does not include the last 6 amino-acids encoded by the gene. However, according to the previous NMR studies, the unstructured natural tail, the FLAG epitope and the histidine tag do not show NOE contacts with the structured region. Accordingly, the coordinates deposited in the Protein Data Bank (PDB ID 1HYW) only span residues 1-58. Thus, the structural influence of the FLAG, histidine tag and the last C-terminal 10 residues should not be significant.
A second factor to consider is that the structures were determined under slightly different experimental conditions. There is some uncertainty in the experimental conditions used in the previous NMR study. The 1HYW PDB file sets the temperature employed for acquiring the NMR experiments at 25uC, whereas in the original article the experimental temperature is said to be 30uC. In the current work we determined the structure using NMR experiments acquired at 21uC. As a test, we acquired 15 N-HSQC spectra [12] of gpW at 25uC and 30uC, and compared them with the chemical shift assignments of the previous study. We requested these assignments directly from Karen L. Maxwell and Alan R. Davidson because the Biological Magnetic Resonance Bank entry 3227 cited in the original NMR work [3] is empty. The comparison demonstrates that the changes in amide 1 H N chemical shifts are minimal, ruling out temperature as the source of the structural differences. The pH was identical (pH 6.5), but there is a difference in ionic strength: the previous NMR work used 200 mM sodium chloride and 10 mM phosphate buffer, whereas in this work the sample was prepared in 20 mM phosphate buffer. To investigate whether the difference in ionic strength could cause structural changes in gpW, we recorded a 4D-[ 1 H-13 C]-HMQC-NOESY-HSQC [13] experiment in a sample including 200 mM sodium chloride. This experiment reveals the exact same NOE pattern in the absence and in the presence of 200 mM salt (Fig. S1). Therefore, altogether these considerations indicate that the differences between gpW structures are not due to changes in experimental conditions. The significant changes observed in the spatial arrangement of the secondary structure elements can only originate from substantially different sets of NOE-derived distance restraints. Comparing the NOE information from the current and previous (data also provided by Maxwell and Davidson) works we can note numerous differences, particularly relative to long-range contacts ( Tables 2, 3). Correct NOE cross-peak assignment crucially depends on the extent to which 1 H, 13 C and 15 N chemical shifts are unambiguously assigned. In this regard, here we achieved 99% versus only 89%, in the previous work (not including the assignment of 13 C carbonyl groups and side chain 15 NH 2 groups of the four Gln). In addition, we checked all chemical shift assignments with 4D-[ 1 H-13 C]-HMQC-NOESY-HSQC experiments [13], obtaining 98% of methyl 13 C and 1 H assignments in comparison to only 75% in the previous work. The correct assignment of methyl groups is essential to properly define the structure of the protein core.
For instance, in the previous NMR studies there are many NOEs assigned as involving Leu 14 with residues in the b-hairpin ( Table 3) that are equally consistent with sequential or local NOEs. Specifically, the previous dataset includes NOEs connecting the C d H 3 groups of Leu 14 with the H N of Ala 24 and the H N of Phe35 (in front of Ala 24 in the hairpin). However, the chemical shifts of the C c H 3 groups of Val 23 are nearly identical to those of the C d H 3 moieties of Leu 14. It is not possible to distinguish between these different NOE assignments (i.e. Leu 14-Ala 24 or Val 23-Ala 24 (sequential), Leu 14-Phe 35 or Val 23-Phe 35 (local)) in the 3D 15 N-edited-NOESY-type [12,13] experiment performed by the authors of the previous studies. In such situation it is important to solve the ambiguity using, for example, 4D NOESY experiments, or alternatively to be conservative and avoid assigning ambiguous long-range NOE data. A similar scenario is found for several side chain to side chain long-range NOEs between the According to the NOE data provided by the authors of the previous study, both sets of potential NOE assignments (sequential and long-range) were included in the structure calculation listed as unambiguous. Moreover, the previous structure was calculated adding on top many other NOEs classified by the authors as ambiguous, making up to ,25% of the total distance restraints in the final structure calculation round [3]. In this work we have used the standard manual procedure and have used only unambiguous NOE information from 3D and 4D experiments, resulting in a total of 723 distance restraints. As a result of the different NMR datasets and the two strategies for structure calculation the quality of the previous and new structures is drastically different. This is easily demonstrated by comparing the scores of standard structure quality  checks. For example, MolProbity [14] indicates that the new gpW structure (PDB ID 2L6Q) is in the 27 th percentile, whereas the previous one (PDB ID 1HYW) is only in the 1 st percentile.

Computational tests on the 1HYW and 2L6Q structures of gpW
From an NMR viewpoint, the new structure (PDB ID 2L6Q) is the one that best represents the native conformational ensemble of gpW, and the one that should be used from now on because it is based on a much more complete experimental dataset and does not use ambiguous NOE-derived restraints. However, the differences between the two structures raise an interesting surrogate question: which of these structures is more in accordance with what we know about protein three-dimensional structure, stability and folding? This is a particularly interesting question given the novel fold features embodied in the gpW structures. To address this question we performed molecular dynamics simulations in explicit solvent as well as structure-prediction calculations from backbone chemical shifts using the CS-Rosetta program [10].
To have sufficient statistical sampling we performed four independent molecular dynamics simulations starting from the atomic coordinates of either 1HYW or 2L6Q. These simulations were carried out using the CHARMM27 force-field with the TIP3P water model and were run for 20 ns to produce a total of 80 ns simulation time for each of the two starting structures. The four 20 ns trajectories starting from 2L6Q resulted in a fast (,1 ns) relaxation to an ensemble of structures with ,1.5 Å backbone RMSD (computed for residues 3-53) and stayed within that ensemble for the rest of the trajectory (blue in Fig. 6). The average backbone RMSD for the last 5 ns of the four trajectories was 1.42 Å , indicating that the MD ensemble is for all practical purposes equivalent to the initial 2L6Q structure. This result indicates that 2L6Q is a stable and robust structure for gpW according to the CHARMM27 force-field. We observed a quite different behavior for the simulations starting from 1HYW (red in Fig. 6). In this case, the fast ,1 ns relaxation resulted in significantly larger deviations from the starting structure, leading to over 2.5 Å backbone RMSD (again computed for residues 3-53). In one of the four simulations we also observed a second structural transition at ,16 ns to an ensemble ,4 Å RMSD away from the starting structure. The remaining three simulations did not show this transition, but since this transition happened near the end of the simulation it is possible that the others would have undergone the same change had their simulation time been longer. In any event, the average backbone RMSD relative to 1HYW for the last 5 ns of the four trajectories is 2.75 Å , which is much higher than what we observed in the 2L6Q simulations. This result indicates that the 1HYW structure is more energetically strained.
However, the most interesting result emerging from the 1HYW MD trajectories is that, as they wander off the initial structure, the ensemble becomes more similar to the 2L6Q structure. The average backbone RMSD for the last conformation of the four trajectories is 2.31 Å relative to the 2L6Q structure, compared to 2.98 Å relative to the starting 1HYW structure. In other words, the MD simulations relax the strain contained in the 1HYW structure leading to a final ensemble that converges onto the new 2L6Q structure. The representation of the secondary structure as function of time for the 1HYW simulations shows that the structural changes occurring during the first 2 ns involve mostly the hairpin and the N-terminal helix (Fig. 7). The central region of the hairpin changes from a bend to a more typical b-turn (as seen in 2L6Q), and the strands adopt a twisted conformation in which the second strand moves away from the helices (Fig. S2). Moreover, the N-terminal a-helix becomes longer, as it is observed in the 2L6Q structure (Fig. 7). The same overall conclusion emerges from the structure prediction calculations using the backbone chemical shift assignments as input data for CS-Rosetta [10]. In this case, the predicted native structure for gpW is nearly identical, whether we introduce the original or our new backbone chemical shift table as input data. The fact that the two predictions are essentially the same structure confirms that the differences between the two experimentally determined structures do not originate from discrepant backbone assignments, but are caused by the list of long-range NOEs used as distance restraints, as discussed in the previous section. The two predicted structures are very similar to 2L6Q (1.27 and 1.47 Å backbone RMSD) and much less to 1HYW (2.56 and 2.55 Å backbone RMSD). Moreover, pairwise structural comparisons show that the CS-Rosetta predicted structures share the same distinctive structural characteristics of 2L6Q (Fig. 8). The b-hairpin has a clear right-hand twist and is placed orthogonally to the a-helical interface so that only strand one is in direct contact with it. In contrast, in 1HYW the b-hairpin is flat and packing in parallel to the helical interface (upper row in Fig. 8). By the same token, the two helices are straighter, follow a leucine zipper arrangement, and the N-terminal helix is longer, exactly as observed in 2L6Q (lower row in Fig. 8).
MD simulations and CS-Rosetta predictions provide complementary computational tests for the structures because the two calculations are based on completely different input data and methodology. MD simulations use the atomic coordinates from the PDB file whereas CS-Rosetta uses the chemical shift data directly. Thus, for one calculation the input data are affected by the distance restraint information and structure determination procedure, but for the other calculation are not. Another critical difference is that the MD simulations provide dynamical information by solving Newton's equation of motion. CS-Rosetta, on the other hand, is a prediction algorithm based only on energy minimization. Finally, the CHARMM27 and Rosetta force-fields are drastically different. Therefore, the fact that both methods produce structures-ensembles that converge to the main characteristics of the 2L6Q structure regardless of the dataset that is used as input in the calculation indicates that our gpW structure is in accord with what is known about protein structure, folding and stability. These computational tests confirm that the 2L6Q structure corrects the packing issues that are present in 1HYW and which led to a distorted hairpin conformation and helix-helix interaction interface, as well as the partial burial of negative charges (Fig. 4).

Gene Cloning and plasmid construction
Residues 1-62 of the original gene of gpW provided by Alan R. Davidson was subcloned into the expression vector pBAT [15]. The new construct bears the same T2RV mutation present in the original construct, but lacks the purification tags (FLAG epitope and Histidine tag) and the last six unstructured residues of gpW ( Fig. 1).

Protein Expression and Purification
The 62-residue protein gpW was expressed in BL21(DE3) E. Coli strain (Novagen). Bacteria were grown at 37uC. After reaching an OD 600 of 0.6-0.7, protein expression was induced by adding 1 mM IPTG (isopropyl-b-D-thiogalacto-pyranoside). Expression continued for 4 h at 37uC. The cells were harvested by centrifugation and resuspended in a 20 mM phosphate buffer at pH 6.0, containing 0.2 mM NaCl and 0.1 mM protease inhibitor cocktail (Sigma). Cells were then lysed by sonication at 4uC and centrifuged at 25000 rpm for 30 minutes. The supernatant was purified by cation exchange chromatography using an SP sepharose Fast Flow column (GE Healthcare). A second purification step was necessary using reverse phase chromatography with a gradient of 0-95% water/acetonitrile and 0.1% trifluoroacetic acid. The purified protein solution was lyophilized and analyzed by SDS-polyacrylamide gel electrophoresis and electrospray mass spectrometry. Uniformly 15 N-and 13 C-labeled gpW was produced using 13 C 6 -D-glucose and 15 [12,13]. Side chain 1 H, 15 N and 13 C assignments were obtained from 3D HBHA(CO)NH, 3D CC(CO)NH and 3D HCCH-TOCSY [12,13]. NOE data were obtained from 3D- 15   (110 ms mixing time) [12,13]. All experiments were processed with NMRPipe [16] and analyzed with PIPP [17].

Structure calculation
Peak intensities from NOESY experiments were translated into a continuous distribution of interproton distances. Errors of 25% of the distances were applied to obtain lower and upper distance limits. A total of 723 unambiguous interproton distance restraints were used. 22 hydrogen bond distance restraints (r NH-O = 1.9-2.8 Å , r N-O = 2.8-3.4 Å ) were defined according to the experimentally determined secondary structure of the protein. The program TALOS+ [18] was used to obtain 94 w and Q backbone torsion angle constraints for those residues with statistically significant predictions. Structure calculations were performed with and without intra-residue distance restrains. However, the resulting structures did not show any substantial differences. Structures were calculated with the program X-PLOR-NIH 2.16.0 [19]. The starting structure was heated to 3000 K and cooled in 30,000 steps of 0.002 ps during simulated annealing. The minimized target function includes a harmonic potential for experimental distance restraints, a quadratic van der Waals repulsion term for the non-bonded contacts, a square potential for torsion angles and a torsion angle database potential of mean force. The final ensemble of 20 NMR structures was selected based on lowest energy and no restraint-violation criteria. These conformers have no distance restraint violations and no dihedral angle violations greater than 0.3 Å and 5u, respectively. Structures were validated using PROCHECK-NMR [11] and MolProbity [14], which show that the family of 20 structures is of considerably high quality in terms of geometry and side chain packing. Structures were analyzed with PYMOL [20]. Coordinates were deposited in the Protein Data Bank with accession code 2L6Q and chemical shifts were deposited in the Biological Magnetic Resonance Bank (BMRB code 17321).

Molecular Dynamics Simulations
Molecular dynamics simulations were performed using GRO-MACS [21] version 4.5.3 using the CHARMM27 force field [22]. The protein structures were placed in a cubic unit cell with a minimum distance of 1.0 nm to the box edge. Steepest descent minimization was performed followed by an addition of ions in order to neutralize the system, yielding a final system of about 6,800 TIP3P water molecules and 1 Cl 2 ion for 1HYW and 10,000 TIP3P water molecules and 4 Cl 2 ions for 2L6Q. The simulation box of 2L6Q is larger due to the presence of a longer C-terminal tail. Another steepest descent minimization was performed to the solvated system. Non-bonded interactions were evaluated using a twin range cutoff of 0.9 and 1.4 nm together with a reaction field (RF) correction [23] to compensate for the neglect of electrostatic interactions beyond the longer range cutoff (e RF = 78.0). Interactions within the shorter-range cutoff were evaluated at every step (2 fs), whereas interactions within the longer-range cutoff were evaluated every 5 steps (10 fs). Temperatures were maintained using the Berendsen thermostat [24] with a coupling time of 0.1 ps. The Berendsen barostat was used with a coupling time of 1 ps and an isotropic compressibility of 4.5610 25 bar 21 to maintain a constant pressure of 1bar. All bonds were constrained using the LINCS algorithm [25]. Periodic boundary conditions were applied and simulations were run using a 2 fs time step, with neighbor list updates every 10 fs. Each system was first energy-minimized and then simulated for 2.5 ps with position restraints on all heavy atoms to relax the system. After a further 5 ps of simulation without restraints, 20 ns production runs were performed. Four 20 ns-long simulations were performed for both 1HYW and 2L6Q.

Structure Predictions using CS-Rosetta
Calculations using CS-Rosetta [10] were performed using the experimentally determined 13 C a , 13 C b , 15 N, 1 H a and 1 H N NMR chemical shifts. For the calculations we used either the chemical shifts from the original NMR study (provided directly by K. L. Maxwell and A. R. Davidson), or the ones determined by us in this study (available in BMRB 17321). 1200 structural models were generated with CS-Rosetta for each chemical shift dataset and the lowest energy structure was chosen as the final prediction. Figure S1 Slices of 4D-[ 1 H-13 C]-HMQC-NOESY-HSQC spectra of gpW. Spectra were acquired at pH 6.5, 20 mM phosphate (blue spectrum) and 200 mM NaCl (red spectrum). The corresponding residue and 13 C-1 H pair are shown on top of each panel. The NOE cross-peak assignments are indicated. Empty squares in the bottom panels indicate the position of NOE crosspeaks corresponding to several distance restraints used in the previous structure calculation (Tables 2,3) that were not observed in our spectra. (TIF) Figure S2 GpW structures from NMR and MD simulations. Backbone superposition of the previous gpW NMR structure (red; PDB ID 1HYW), the new (blue; PDB ID 2L6Q) and the structures resulting from the four molecular dynamics simulations on PDB ID 1HYW shown in Fig. 6 (all in orange). ( )