Conceived and designed the experiments: GDF JM TK. Performed the experiments: GDF. Analyzed the data: GDF JM TK. Contributed reagents/materials/analysis tools: NAL CG. Wrote the paper: GDF TK.

The authors have declared that no competing interests exist.

Conformational ensembles are increasingly recognized as a useful representation to describe fundamental relationships between protein structure, dynamics and function. Here we present an ensemble of ubiquitin in solution that is created by sampling conformational space without experimental information using “Backrub” motions inspired by alternative conformations observed in sub-Angstrom resolution crystal structures. Backrub-generated structures are then selected to produce an ensemble that optimizes agreement with nuclear magnetic resonance (NMR) Residual Dipolar Couplings (RDCs). Using this ensemble, we probe two proposed relationships between properties of protein ensembles: (i) a link between native-state dynamics and the conformational heterogeneity observed in crystal structures, and (ii) a relation between dynamics of an individual protein and the conformational variability explored by its natural family. We show that the Backrub motional mechanism can simultaneously explore protein native-state dynamics measured by RDCs, encompass the conformational variability present in ubiquitin complex structures and facilitate sampling of conformational and sequence variability matching those occurring in the ubiquitin protein family. Our results thus support an overall relation between protein dynamics and conformational changes enabling sequence changes in evolution. More practically, the presented method can be applied to improve protein design predictions by accounting for intrinsic native-state dynamics.

Knowledge of protein properties is essential for enhancing the understanding and engineering of biological functions. One key property of proteins is their flexibility—their intrinsic ability to adopt different conformations. This flexibility can be measured experimentally but the measurements are indirect and computational models are required to interpret them. Here we develop a new computational method for interpreting these measurements of flexibility and use it to create a model of flexibility of the protein ubiquitin. We apply our results to show relationships between the flexibility of one protein and the diversity of structures and amino acid sequences of the protein's evolutionary family. Thus, our results show that more accurate computational modeling of protein flexibility is useful for improving prediction of a broader range of amino acid sequences compatible with a given protein. Our method will be helpful for advancing methods to rationally engineer protein functions by enabling sampling of conformational and sequence diversity similar to that of a protein's evolutionary family.

It has long been known that a protein's native state is best represented as an ensemble of conformations rather than as a single structure

Two related concepts characterizing and interpreting properties of protein conformational ensembles have been proposed: The first suggests a correspondence between the conformational heterogeneity present in crystal structures and the native-state dynamics of proteins observed in simulations and using nuclear magnetic resonance (NMR) measurements. Several studies provide support for this idea. Zoete et al. concluded that the conformational changes present in a large number of crystal structures of HIV-1 protease reflect the inherent flexibility of the protein

The second concept proposes a link between the dynamics of a single protein and the conformational variability explored within its family of homologous proteins. This link was suggested based on the similar conformational variability observed in an MD simulation of myoglobin and in structures of different members of the globin family

To combine the two concepts outlined above, here we ask whether conformational ensembles reflecting variability observed in protein crystal structures of a single sequence can be simultaneously related to experimentally determined native-state solution dynamics of an individual protein, and to the conformational and sequence variability of the protein's family. To address these questions, we investigate two related hypotheses using ubiquitin as a model system: First, we test whether ensembles generated using the Backrub motional mechanism (“Backrub ensembles”), a model inspired by heterogeneity observed in experimental protein structures

Supporting our hypotheses, we find Backrub ensembles that are simultaneously consistent with native-state dynamics reflected in RDC measurements, the conformational variability observed in ubiquitin complex structures, and the conformational and sequence diversity of ubiquitin homologs. As an additional validation of our approach, we show that Backrub ensembles give similar agreement with the RDC data as ensembles generated from RDC-restrained MD simulations

We set out to investigate the hypothesized relations between conformational changes reflecting observed heterogeneity in protein crystal structures, native-state protein dynamics and evolutionarily sampled conformational and sequence diversity in two steps (

First, to test relation 1, we generated ensemble descriptions of ubiquitin dynamics using the Rosetta scoring function and several parameterizations of the Backrub motional model (described below) without using experimental restraints. Subsequently we selected ensembles according to their agreement with RDC measurements (Test 1). This approach is significantly different from many of the methods applied earlier to find ensembles compatible with NMR restraints

Second, we use the insight gained from the comparison of Backrub ensembles with characteristics of solution-state dynamics to evaluate relation 2 (

To test relation 1, our approach first uses unrestrained conformational sampling with the Backrub motional model to generate a large set of initial conformations, starting from the ubiquitin crystal structure (Protein Data Bank (PDB) code 1UBQ). We use a Monte Carlo protocol consisting of rotamer changes and Backrub moves. Backrub moves involve selection of a random peptide segment, followed by a rigid body rotation of all atoms in that segment about an axis defined by the endpoint C-alpha atoms
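The geometric core of a Backrub move, as described above, can be sketched as a rigid-body rotation about the axis through the two endpoint C-alpha atoms. The sketch below is illustrative only and is not the Rosetta implementation: it rotates C-alpha coordinates alone (the actual move rotates all atoms in the segment), and the function names `rotate_about_axis` and `backrub_move` are hypothetical.

```python
import math

def rotate_about_axis(point, a, b, angle):
    """Rotate `point` by `angle` (radians) about the axis through a and b,
    using Rodrigues' rotation formula."""
    axis = [b[i] - a[i] for i in range(3)]
    norm = math.sqrt(sum(c * c for c in axis))
    k = [c / norm for c in axis]                      # unit axis vector
    v = [point[i] - a[i] for i in range(3)]           # point relative to a
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    k_cross_v = [k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0]]
    k_dot_v = sum(k[i] * v[i] for i in range(3))
    return tuple(a[i] + v[i] * cos_t + k_cross_v[i] * sin_t
                 + k[i] * k_dot_v * (1.0 - cos_t) for i in range(3))

def backrub_move(ca_coords, start, end, angle):
    """Return a copy of the trace in which the atoms strictly between the
    endpoint indices `start` and `end` are rotated about the start-end axis.
    Assumption: only C-alpha atoms are modelled in this toy version."""
    a, b = ca_coords[start], ca_coords[end]
    moved = list(ca_coords)
    for i in range(start + 1, end):
        moved[i] = rotate_about_axis(ca_coords[i], a, b, angle)
    return moved

# Toy tripeptide-like segment: the middle atom swings around the axis
# defined by its two neighbours while both endpoints stay in place.
trace = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 0.0), (3.0, 1.0, 0.0)]
moved = backrub_move(trace, 0, 2, math.pi)
```

Because the rotation axis passes through both endpoints, the endpoint positions and all distances from the rotated atoms to the endpoints are preserved, which is what keeps the move local.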

Backrub moves for (A) tripeptide segments and (B) segments of arbitrary length from 2 through 12 residues. (C) Flowchart of the process used to select ensembles to match the RDC measurements.

Subsequently we select ensembles from the resulting structures based on their agreement to the RDC measurements as measured by the Q-factor (

An ensemble selection approach similar to the one described above has been successfully applied to model relaxation order parameters using snapshots from MD trajectories

To validate our approach, we compare the Backrub-generated conformational ensembles to reference methods such as snapshots from an MD simulation in explicit solvent

We first tested whether Q-factors of Backrub ensembles selected according to the strategy described in

(A) Increasing Backrub ensemble size improves the agreement with the RDCs. Maximum segment length of 12 with kT = 1.2. (B) Q factors vs. RMSD of the RDC-optimized Backrub ensemble with the lowest Q factor at each simulation temperature for maximum segment length = 12. Error bars display Q_{experimental_uncertainty} (see

The RDC-optimized Backrub ensemble described above has a Q-factor of 0.086 over regions of regular secondary structure (see

R_{free} values for these ensembles were 18.0% and 21.3%, respectively (

The structural variability of the ensemble is illustrated in

(A) Structures of the C-alpha backbone traces of a RDC-optimized 50-member ensemble of maximum segment length of 12 with kT = 1.2. (B and D) Mean C-alpha difference distance values of indicated ensembles mapped onto the 1UBQ X-ray structure. (C) Theoretical B-factors from a Gaussian Network Model. Color coding for B, C and D: Green: 0–25% of the max value in the non-loop regions; Yellow: 25–50% of the max; Orange: 50–75% of the max; Red: 75–100% of the max; Grey: loop regions that were not included in the fit to the RDC measurements.

We compared the Q-factor of the RDC-optimized Backrub ensemble to the Q-factors from various other ubiquitin ensembles (R_{free} values from cross-validation: 18.0%, 16.1%, 20.0%, 17.8%, and 23.3%, respectively, for the RDC-optimized Backrub ensemble, the EROS ensemble, the 1D3Z structures, the ubiquitin X-ray ensemble and the ensemble of MD snapshots (

One important criterion with which the various ensembles of ubiquitin can be assessed, as mentioned above, is whether an ensemble matches the RDCs better than any single structure within it. If this is the case, dynamical information contained in the experimental measurements can be interpreted by analyzing the conformational variability in the ensemble. The RDC-optimized Backrub ensemble, the MD-EAR ensembles (1XQQ, 2NR2 and EROS PDB code: 2K39), the ubiquitin X-ray ensemble and the ensemble of MD structures have improved Q-factors over the best single structure (

The three sets of NMR structures (1D3Z, 1UD7, and 1G6J) do not show an improvement in the Q-factor over the best single structure. For the 1D3Z NMR structures, a subset of the RDCs was used in the refinement and, as a result, the Q-factor (Q = 0.107; calculated over all 23 datasets used in this paper) is lower than for the other NMR structures. The lowest Q-factor among the single 1D3Z NMR structures indicates that the 1D3Z NMR structure is a good representation of the average structure.

We also used the strategy described in

To characterize the conformational variability of different regions of the protein in our ensembles, we calculated C-alpha difference distance matrices (see

Supporting relation 1, the pattern of motion of the ubiquitin X-ray ensemble and the RDC-optimized Backrub ensemble show substantial similarities. In both these ensembles the most flexible regions are the C-terminal end of the helix and the N-terminal end of beta strand 2. This result is consistent with the suggestion of Lange et al.

We also investigated the differences between the RDC-optimized Backrub and the ubiquitin X-ray ensemble flexibilities in light of the errors in the calculated RDC values in these regions (

As a final point of comparison, we applied a Gaussian network model (GNM)

We showed above that our RDC-optimized Backrub ensemble (i) gives similar Q-factors to reference ensembles including an RDC-restrained MD ensemble (EROS)

Distributions are shown for the best RDC-optimized Backrub ensemble with maximum segment length of 12 and kT = 1.2, as well as modeled and experimental relaxation order parameters corresponding to these chi angles (chi1 and chi2 correspond to the Cγ and Cδ methyl groups, respectively). The Leucine Cδ methyl group relaxation order parameters were averaged.

Ubiquitin has several hotspots shown to be important in recognition of different binding partners: Ile 44, Asp 58, and His 68. These were identified as rigid in the order parameters of the EROS ensemble

Our results above provide support for the hypothesis of a correspondence between the properties of Backrub-derived conformational ensembles, solution-state dynamics reflected in NMR measurements and a conformational ensemble of 46 experimental crystal structures of ubiquitin. To broaden this result and shed light more generally on a link between protein dynamics and evolution, we next ask whether there is also a correspondence between the dynamics of a single protein sequence and the conformational variability explored in its protein family to accommodate sequence changes during evolution (relation 2;

To test the correspondence of the conformational variability of an individual protein and that of its family, we constructed an ensemble from the available structures of proteins in a multiple sequence alignment of the UBQ subfamily (see

The resulting UBQ subfamily ensemble shows high variability in the C-terminus of the helix and in the N-terminus of beta strand 2, which is strikingly similar to the regions of high flexibility in the RDC-optimized Backrub ensemble. Thus, we find similar conformational variability in the structures of the ubiquitin homologs and in an ensemble fit to the solution state dynamics of ubiquitin. This correspondence in pattern of flexibility holds despite the different motional amplitudes of these ensembles: 2.0 Å and 0.9 Å pair-wise RMSD to the 1UBQ X-ray structure, respectively, for the UBQ subfamily ensemble and the RDC-optimized Backrub ensemble.

We proposed in hypothesis 2 and showed above that there are similarities in the conformational variability of a single protein and that of its homologs. Here we extend this idea to ask whether the sequences compatible with a structural ensemble describing the dynamics of a single protein are similar to the sequences of the natural family members. We first tested whether there is a difference between the sequence spaces consistent with the RDC-optimized and non-RDC-optimized Backrub ensembles. We performed computational protein design with Rosetta

To compare the sequence space coverage of the various ensembles, we used the BLOSUM62 matrix

The sequence spaces sampled by the RDC-optimized and non-RDC-optimized Backrub ensembles with optimal Backrub parameters (maximum segment length of 12 and kT = 1.2) are very similar (

(A) Designed sequences on non-RDC-optimized (light blue), and RDC-optimized (dark blue) Backrub ensembles of maximum segment length of 12 with kT = 1.2. (B) and (C): Low-scoring designed sequences on the fixed backbone of the X-ray structure 1UBQ (orange); on non-RDC-optimized Backrub ensembles with maximum segment length of 12 with kT = 0.3 (green), kT = 1.2 (blue), and kT = 4.8 (cyan); and (B) low-scoring designed sequences on the ubiquitin X-ray ensemble (red), or (C) sequences from the UBQ subfamily (brown). (Note that the dimensions shown in the plots are selected to maximize the variation of the points in each plot and will differ between plots).

Next we compared the 2-D sequence space of designs on various non-RDC-optimized Backrub ensembles to the sequence space of designs on the ubiquitin X-ray ensemble. Different non-RDC-optimized Backrub ensembles of maximum segment length of 12 with varying amplitude (kT = 0.3, 1.2 and 4.8) sample largely separate sets of sequences (

Finally, to test whether there exists a link between the conformational heterogeneity of solution dynamical ensembles and the sequence space compatible with these ensembles (Test 4), we compared the 2-D sequence space of designs on various Backrub ensembles to the sequence space of the UBQ subfamily of the ubiquitin αβ roll subfold (

The sequence logo representations in

Sequence logo plots for (A) the UBQ subfamily, and low-scoring designed sequences on (B) the 1UBQ fixed backbone, (C) the non-RDC-optimized ensemble created with maximum segment length of 12 and kT = 0.3, and (D) the non-RDC-optimized and (E) RDC-optimized ensembles with maximum segment length of 12 and kT = 1.2. Designed sequences on (F) non-RDC-optimized and (G) RDC-optimized ensembles from a molecular dynamics trajectory of 100-nanoseconds. (H) Designed sequences on the EROS ensemble. Figure created using WebLogo

Taken together, our results thus indicate that the conformational sampling methods we use here to match RDC dynamics produce variability similar to the conformational heterogeneity of X-ray ensembles (both using different ubiquitin structures as well as structures from the UBQ subfamily) and may lead to significant overlap between sequences consistent with modeled ensembles and the sequence space covered by the natural family. Additionally, it appears from the similarity of sequences from RDC-optimized and non-RDC-optimized ensembles that the RDCs have led us to determine optimal Backrub sampling parameters (

In this work, we describe the application of the Backrub motional model to create ensembles of structures consistent with RDC measurements and to sample the conformational and sequence space of the UBQ subfamily.

The main new aspect of our work is that we link the conformational dynamics of a single sequence, as reflected by both RDC data and Backrub ensembles, to conformational diversity observed in crystal structures of ubiquitin and its family, and to evolutionarily sampled sequence diversity. We achieve this by applying computational protein design to select low-energy sequences consistent with Backrub ensembles. The fact that low-Q-factor Backrub ensembles sample a similar sequence space to that of the ubiquitin X-ray ensemble extends results by other groups demonstrating the correspondence of solution-state dynamics and crystallographic heterogeneity

We find that RDC-optimized ensembles created with only certain Backrub sampling parameters were able to reach the lowest Q-factors, indicating that the conformational space sampled by these Backrub parameters is the most similar (compared to other parameters) to the conformations giving rise to the RDC measurements. However, while we see significant improvements in Q-factors during the selection protocol, we also find substantial similarities between RDC-optimized and non-RDC-optimized Backrub ensembles in patterns of C-alpha RMSD, order parameters and designed sequence space. This somewhat surprising observation could mean that the selection procedure primarily optimizes for subtle differences in NH-vector orientations (R_{free} for RDC-optimized over non-RDC-optimized ensembles, indicating that other aspects of the peptide plane orientation are better represented in the RDC-optimized ensembles. Notably, there are defined Backrub parameters that simultaneously give the best agreement with the RDC data (after selection) and the best sequence space overlap with the natural family, irrespective of whether we apply selection or not. This could indicate that it is primarily the mechanism and amplitude of motions that are important, and that, as long as the amplitude is in the correct range defined by the appropriate sampling parameters, the Backrub motional model can sample relevant motions without requiring RDC data. Hence, the Backrub motional model may be useful (i) to predictively sample conformations similar to ensembles of bound conformations and (ii) to use with design to sample the sequence space of the natural family. Such sampling of sequences likely to be accommodated by a given protein fold may help improve engineering of new protein structures, functions and interactions. For example, coupling backbone ensemble generation and sequence design may be useful to computationally predict sequence libraries enriched in functional members

There are several potential limitations of the Backrub method, as applied here. As we implement Backrub in a Monte Carlo protocol, the timescale of conformational transitions is not taken into account. Also, the method used here limits the backbone conformational space sampled to those conformations accessible with the Backrub mechanism, a restriction which can be alleviated for example with the addition of small phi/psi changes to the method or by using analytical methods for local loop closure

As necessitated by the scarcity of proteins with sufficient RDC data, we limit our study here to one protein and further work is needed to extend modeling of protein native state dynamics and tolerated sequence space to more proteins. However, the usefulness of the Backrub mechanism for modeling protein motions is supported by several studies

Analysis of the generated ubiquitin Backrub ensembles allows several fundamental insights on the relationship between structure, function, sequence and dynamics. The ubiquitin core flexibility and a binding mechanism by conformational selection have been pointed out previously

In conclusion, we have tested a method for sampling conformational diversity using Backrub conformational changes and shown that it can generate ensembles consistent with millisecond-timescale measurements of protein dynamics. This method is computationally more efficient than molecular dynamics-based methods, allowing it to be applied to a variety of protein modeling tasks such as sequence design. Notably, we find that the method recapitulated many of the structural properties of the RDC-optimized Backrub ensembles even when the RDC measurements were not incorporated in the sampling procedure, despite the fact that the RDCs were necessary to determine the amplitudes of motion in the Backrub ensembles. We additionally find that the sequence diversity tolerated by non-RDC-optimized Backrub ensembles is similar to that of both the ubiquitin X-ray ensemble and the UBQ subfamily X-ray ensemble. This result needs to be tested on more proteins and, if validated, should be useful in making prospective predictions for numerous applications, such as protein-protein or protein-small-molecule docking, protein interface design, and enzyme design.

The dataset of RDCs we use here consists of measurements in 23 alignment media as described in Lakomek et al.

For all X-ray structures, explicit hydrogen atoms were added according to standard geometry using Rosetta, and the positions of hydrogens with rotatable bonds were optimized

To generate protein conformational ensembles, we ran “Backrub” Monte Carlo simulations, as described in

We first ran a Backrub Monte Carlo simulation at kT = 0.1 for 10,000 steps from the starting PDB conformation, with a maximum segment length of 3 or 12 to match the segment length used later. The starting conformation was 1UBQ, which has the highest resolution (1.8 Å) of the unbound ubiquitin structures; similar results were obtained for a maximum segment length of 3 with PDB entries 1UBI and 1CMX, and worse Q factors were obtained for PDB entries 1FXT, 1AAR, 1F9J, and 1TBE. The lowest energy structure from this simulation is used as the starting conformation for 10,000 randomly seeded Backrub simulations at one of 5 different temperatures (kT = 0.3, 0.6, 1.2, 2.4, or 4.8), each run for an additional 10,000 steps. The last structure from each of these simulations is used to form the starting set of 10,000 structures.

From this initial set of 10,000 structures, ensembles are selected to match the RDCs by minimizing the Q-factor of the ensemble. First, structures are randomly chosen to create a starting ensemble of a given size (2, 3, 5, 10, 20, 50 or 100 structures), and the Q-factor of the ensemble is calculated (see below). Next, a random structure in this ensemble is chosen and replaced with a randomly chosen structure from the initial ensemble of 10,000 structures; then the new Q-factor of the ensemble is calculated. If the new Q-factor is lower than before the replacement, the change is kept, otherwise it is reverted. These structure replacements are repeated until the Q-factor changes by less than 0.001 in 5000 steps. By repeating this method 1000 times, 1000 RDC-optimized Backrub ensembles are created. There are a very large number of possible subsets of a given size. For example, there are 4×10^{61} different sub-ensembles of size 20 from the initial ensemble of size 10,000, too many to be evaluated. The approach described here does not guarantee that the ensemble with the lowest Q-factor will be found, but it starts from many random starting points to broadly sample the space of possible sub-ensembles and the selection process converges to a low Q-factor solution within 10,000 Backrub-generated structures for all Backrub Monte Carlo temperatures (except kT = 4.8; see
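The selection loop described above can be sketched as a greedy replacement search. This is a minimal illustration under several assumptions that are not taken from the source: each pool structure is represented by a precomputed list of predicted couplings, the ensemble prediction is their plain average (the actual protocol averages projection angles and refits the alignment tensor), the Q-factor uses the commonly used normalization sqrt(sum((D_exp − D_calc)^2) / sum(D_exp^2)), a structure may appear more than once in an ensemble, and the stopping rule is one reading of "changes by less than 0.001 in 5000 steps". All function names are hypothetical.

```python
import math
import random

def q_factor(d_exp, d_calc):
    """Assumed Q-factor definition; the normalization used in the paper may differ."""
    num = sum((e - c) ** 2 for e, c in zip(d_exp, d_calc))
    den = sum(e * e for e in d_exp)
    return math.sqrt(num / den)

def ensemble_q(pool, members, d_exp):
    """Q-factor of the member-averaged couplings (simplifying assumption)."""
    n = len(members)
    avg = [sum(pool[m][j] for m in members) / n for j in range(len(d_exp))]
    return q_factor(d_exp, avg)

def select_ensemble(pool, d_exp, size, rng, patience=5000, tol=0.001):
    """Swap random members for random pool structures, keeping only swaps
    that lower Q; stop once Q has improved by less than `tol` over
    `patience` consecutive steps."""
    members = [rng.randrange(len(pool)) for _ in range(size)]
    q = ensemble_q(pool, members, d_exp)
    q_ref, stalled = q, 0
    while stalled < patience:
        slot, candidate = rng.randrange(size), rng.randrange(len(pool))
        old = members[slot]
        members[slot] = candidate
        q_new = ensemble_q(pool, members, d_exp)
        if q_new < q:
            q = q_new              # keep the improving replacement
        else:
            members[slot] = old    # revert
        if q_ref - q >= tol:
            q_ref, stalled = q, 0  # meaningful progress: restart the count
        else:
            stalled += 1
    return members, q

# Number of distinct sub-ensembles of size 20 drawn from 10,000 structures.
n_subsets = math.comb(10000, 20)
```

Running `select_ensemble` from many random starts and keeping the lowest-Q results mirrors the repeated-restart strategy in the text; `n_subsets` is on the order of 10^61, which is why exhaustive enumeration is not an option.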

RDCs are calculated from a single structure or from an ensemble of structures in two steps. The alignment tensor is first determined with the equation T = A^{−1} D_{exp}, where T is the alignment tensor, A^{−1} is the Moore-Penrose inverse of the matrix A of projection angles for the amide bonds (or averaged projection angles for a set of structures), and D_{exp} is the vector of experimental couplings. The predicted couplings are then calculated with the equation D_{calc} = AT, where A is the same matrix of projection angles from above and D_{calc} is the vector of calculated couplings.
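The two-step calculation above (fit the alignment tensor, then back-calculate the couplings) can be sketched in plain Python. The following assumptions are mine rather than the source's: the tensor is parameterized by five independent elements (Syy, Szz, Sxy, Sxz, Syz) of a traceless symmetric matrix, any dipolar scaling constant is absorbed into T, and the least-squares solution is obtained from the normal equations instead of an explicit Moore-Penrose inverse (the two agree when A has full column rank). Function names are hypothetical.

```python
def projection_row(u):
    """Row of A for a unit bond vector u = (x, y, z)."""
    x, y, z = u
    return [y * y - x * x, z * z - x * x, 2 * x * y, 2 * x * z, 2 * y * z]

def averaged_projection_matrix(ensemble):
    """Average the rows of A over ensemble members.
    ensemble[m][i] is the unit vector of bond i in member m."""
    n = len(ensemble)
    rows = []
    for i in range(len(ensemble[0])):
        per_member = [projection_row(member[i]) for member in ensemble]
        rows.append([sum(r[c] for r in per_member) / n for c in range(5)])
    return rows

def solve(matrix, rhs):
    """Solve a small square linear system by Gaussian elimination
    with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def fit_tensor(a, d_exp):
    """Least-squares T from (A^T A) T = A^T D_exp."""
    ata = [[sum(row[i] * row[j] for row in a) for j in range(5)] for i in range(5)]
    atd = [sum(row[i] * d for row, d in zip(a, d_exp)) for i in range(5)]
    return solve(ata, atd)

def back_calculate(a, t):
    """D_calc = A T."""
    return [sum(row[c] * t[c] for c in range(5)) for row in a]
```

For an ensemble, `averaged_projection_matrix` is built once from all members and a single tensor is fitted to the averaged matrix, matching the "averaged projection angles for a set of structures" wording above.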

Q-factors were calculated for all RDC measurements with the equation:

Errors between experimental and predicted RDCs were calculated with:

Loop residues (i.e. those with DSSP

There are several sources of error in our analysis to consider when assessing the significance of the results. First, there is error in the RDC measurements due to experimental uncertainty. The uncertainty in these values is estimated to be 0.3 Hz, which corresponds to Q_{experimental_error} = 0.036.

A second source of error results from not finding the ensemble with the lowest possible Q-factor from a given initial structure set. We estimated this error by repeating the selection procedure many times and evaluating the variance in the resulting Q-factors. We take explicit steps to minimize this error by enforcing two convergence criteria on the optimization: 1) ensemble selection is not finished until 5000 steps have passed without a change in Q of more than 0.001, and 2) enough RDC-optimized ensembles are generated from random starting structures such that the difference in the Q-factors of the best and 10th best RDC-optimized ensemble is not more than 0.005. Thus, this Q_{optimization_error} is on the order of 0.005.

A third important source of error is insufficient sampling of conformational space by the Backrub Monte Carlo protocol and by the 10,000 structures from which we select ensembles. We estimated this Q_{sampling_error} by running the structure generation protocol at each temperature 10 times, thus creating 10 sets of 10,000 Backrub-generated structures at each temperature. For a maximum segment length of 12, the standard deviations of the minimum Q-factors over these 10 sets of 10,000 structures are 0.0151, 0.0104, 0.0025, 0.0039, and 0.0049 for kT = 0.3, 0.6, 1.2, 2.4 and 4.8, respectively. The standard errors of the mean of these values are 0.0048, 0.0033, 0.0008, 0.0012, and 0.0015, respectively.
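The standard errors quoted above follow from the reported standard deviations by dividing by the square root of the number of repeats (10 per temperature). The short check below reproduces that arithmetic; the variable names are illustrative.

```python
import math

# Standard deviations of the minimum Q-factor per temperature (from the text).
sd_by_kt = {0.3: 0.0151, 0.6: 0.0104, 1.2: 0.0025, 2.4: 0.0039, 4.8: 0.0049}
n_repeats = 10  # independent sets of 10,000 structures per temperature

# Standard error of the mean = SD / sqrt(number of repeats).
sem_by_kt = {kt: sd / math.sqrt(n_repeats) for kt, sd in sd_by_kt.items()}

for kt, sem in sem_by_kt.items():
    print(f"kT = {kt}: SEM = {sem:.4f}")
```

Rounded to four decimal places, these values agree with the standard errors listed in the text.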

Gaussian-distributed noise was added to the experimental RDCs with 1000 Monte-Carlo samples. The RDC uncertainty of each measurement was 0.3 Hz. The resulting Q_{experimental_uncertainty} is 0.036 with a standard deviation of 0.00102 over the 1000 samples.
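A noise estimate of this kind can be sketched as follows. The sketch rests on assumptions not stated in the source: the Q-factor is computed between the original couplings and their noise-perturbed copies, with the commonly used normalization by the sum of squared original couplings, and the couplings themselves are synthetic stand-ins, so the printed number is not expected to reproduce 0.036.

```python
import math
import random
import statistics

def q_factor(d_ref, d_other):
    """Assumed Q-factor definition, normalized by the reference couplings."""
    num = sum((r - o) ** 2 for r, o in zip(d_ref, d_other))
    den = sum(r * r for r in d_ref)
    return math.sqrt(num / den)

def noise_q(d_exp, sigma=0.3, n_samples=1000, seed=0):
    """Mean and standard deviation of Q over noise-perturbed copies of d_exp."""
    rng = random.Random(seed)
    qs = []
    for _ in range(n_samples):
        noisy = [d + rng.gauss(0.0, sigma) for d in d_exp]
        qs.append(q_factor(d_exp, noisy))
    return statistics.mean(qs), statistics.stdev(qs)

# Synthetic couplings in Hz (hypothetical values, not the ubiquitin data).
_gen = random.Random(1)
synthetic_rdcs = [_gen.uniform(-15.0, 15.0) for _ in range(60)]
mean_q, sd_q = noise_q(synthetic_rdcs)
print(f"Q from noise alone: {mean_q:.4f} +/- {sd_q:.4f}")
```

Under this definition the noise floor is approximately the measurement uncertainty divided by the root-mean-square coupling, which gives a quick sanity check on any value obtained this way.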

Order parameters were calculated with the equation

We used the 100-nanosecond AMBER trajectory of ubiquitin in TIP4Pw/e water from Wong and Case

To estimate the sequence space compatible with different structures and ensembles, we used Rosetta computational protein design to generate 1000 low-energy sequences for each single structure or 20 sequences per ensemble member for ensembles of size 50. To find a low-scoring sequence, each design simulation consists of 20 rounds of Monte Carlo simulated annealing with the number of steps in each round equal to the number of rotamers created for the simulation. The backbone of each structure or ensemble member is kept fixed during the design simulations and all positions were allowed to vary to any of the 20 naturally occurring amino acids, adding extra conformers at one standard deviation around the mean rotamer for chi 1 and 2 dihedral angles. The scoring function used was the Rosetta all-atom scoring function

Distances between sequences were calculated as in

The procedure was repeated with the sequences of core residues only, where core residues were defined by counting the number of neighbor residues with C-beta atoms within 10 Å of the C-beta atom of the residue of interest (or C-alpha atoms for glycine). The cutoff value used (greater than or equal to 18) was chosen so that approximately one third of the residues fell into the core category (excluding the flexible C-terminus), resulting in 21 buried positions: residues 3, 5, 17, 21, 23, 25, 26, 27, 30, 41, 43, 45, 50, 55, 56, 59, 61, 65, 67, 68, and 69.
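The core definition above amounts to a neighbor count with a distance cutoff. A minimal sketch follows, assuming one representative coordinate per residue (C-beta, or C-alpha for glycine) has already been extracted, that the residue itself is not counted as its own neighbor, and that "within 10 Å" includes the boundary; the function names are hypothetical.

```python
import math

def neighbor_counts(coords, radius=10.0):
    """Number of other residues whose representative atom lies within
    `radius` (in Angstroms) of each residue's representative atom."""
    counts = []
    for i, p in enumerate(coords):
        counts.append(sum(1 for j, q in enumerate(coords)
                          if j != i and math.dist(p, q) <= radius))
    return counts

def core_positions(coords, min_neighbors=18, radius=10.0):
    """1-based residue numbers with at least `min_neighbors` neighbors."""
    counts = neighbor_counts(coords, radius)
    return [i + 1 for i, c in enumerate(counts) if c >= min_neighbors]

# Toy example: a compact 3 x 3 x 3 grid (3 Angstrom spacing) plus one
# distant point that has no neighbors and is therefore not "core".
grid = [(3.0 * x, 3.0 * y, 3.0 * z)
        for x in range(3) for y in range(3) for z in range(3)]
toy_coords = grid + [(100.0, 100.0, 100.0)]
toy_core = core_positions(toy_coords)
```

With real coordinates, the `min_neighbors` threshold is the tunable quantity; the text chose 18 so that roughly one third of the residues are classified as core.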

First, for each structure, we calculated the matrix of distances between all C-alpha atoms. Then, for each pair of structures, we calculated the distance difference matrix as the absolute value of the difference of the distance matrices of the structures. These distance difference matrices were averaged to give the C-alpha difference distance matrix of the ensemble
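The three steps above translate directly into code. The sketch below assumes each structure is given as a list of C-alpha coordinates with identical residue ordering across the ensemble; the function names are hypothetical.

```python
import math
from itertools import combinations

def distance_matrix(ca_coords):
    """All-against-all C-alpha distance matrix for one structure."""
    return [[math.dist(p, q) for q in ca_coords] for p in ca_coords]

def difference_distance_matrix(ensemble):
    """Average, over all pairs of structures, of the absolute difference
    between their distance matrices."""
    mats = [distance_matrix(s) for s in ensemble]
    n = len(mats[0])
    pairs = list(combinations(range(len(mats)), 2))
    total = [[0.0] * n for _ in range(n)]
    for a, b in pairs:
        for i in range(n):
            for j in range(n):
                total[i][j] += abs(mats[a][i][j] - mats[b][i][j])
    return [[v / len(pairs) for v in row] for row in total]

# Two toy three-atom structures that differ only in the last atom's position.
s1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
s2 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
ddm = difference_distance_matrix([s1, s2])
```

Because only internal distances enter, the result does not depend on how the structures are superimposed, which makes it convenient for comparing ensembles from different sources.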

Theoretical B-factors were calculated by applying the online Gaussian Network Model (oGNM) tool at

To create a structural ensemble for the UBQ subfamily we took the highest resolution X-ray structure for each protein listed in Table 1 of Kiel et al.

We performed cross-validation by using the alignment tensor calculated from the NH RDC datasets to calculate RDCs for four datasets of NC′ RDC couplings and four datasets of HC′ couplings. These “free” data were not included in the selection process and are reported as R_{free} factors, calculated as in Lange et al. by combining the free datasets, each with its own number of measurements and Q-factor Q_{i}. For RDC-optimized Backrub ensembles, the R_{free} values are averaged over the five lowest-Q factor ensembles.

Supplementary results & supplementary methods.

(0.06 MB PDF)

RDC_{error} and Q-factors of different ensembles. (A) Error in the calculated RDCs. (B) Same data as

(0.58 MB EPS)

Stereochemistry of Backrub and other ensembles.

(0.27 MB EPS)

C-alpha difference distance matrices. (A) C-alpha difference distance matrices of various ensembles. (B) Mean C-alpha difference distance values for various ensembles. Red dashed lines: anchor residues 44, 58 and 68. (C) Normalized C-alpha difference distance values and RDC errors over sequence for the ubiquitin X-ray ensemble and the RDC-optimized Backrub ensemble. (The C-alpha difference distance values were normalized to the maximum and minimum values in the secondary structure regions longer than 3 residues.)

(4.29 MB TIF)

C-alpha RMSD and amide order parameter traces of Backrub ensembles. C-alpha RMSD traces of the best five RDC-optimized (grey) and one non-RDC-optimized (black) Backrub ensembles for maximum segment length of 3 with (A) kT = 0.3, (B) kT = 2.4, and (C) kT = 4.8 and maximum segment length of 12 with (D) kT = 0.3, (E) kT = 1.2, and (F) kT = 4.8. (G) Amide order parameters for the RDC-optimized and non-RDC-optimized Backrub ensembles, the SCRM description, the relaxation experiments, and the EROS ensemble.

(1.42 MB EPS)

Chi angle distributions of various residues. For the DER ensemble (1XQQ), RDC-optimized and non-RDC-optimized Backrub ensembles with maximum segment length of 12 with kT = 1.2. Also included are the order parameters for the RDC-optimized ensemble, the MD trajectory and the experimental relaxation measurements, where available.

(0.54 MB EPS)

Sampling of sequence space by computational design for both core only and aligned residues. Low-scoring designed sequences on the fixed backbone of the X-ray structure 1UBQ (orange); on non-RDC-optimized Backrub ensembles with maximum segment length of 12 with kT = 0.3 (green), kT = 1.2 (blue), and kT = 4.8 (cyan); and sequences from the UBQ family (brown) for (A) aligned and (B) only core residues; or low-scoring designed sequences on the 100 ns MD ensemble (red) for (C) aligned and (D) only core residues.

(1.35 MB EPS)

Amide vector orientations. Angle difference between the average amide vector orientation of the 1D3Z NMR ensemble and the average amide vector orientations in RDC-optimized and non-RDC-optimized Backrub ensembles (A) maximum segment length of 12 with kT = 1.2 and (B) maximum segment length of 3 with kT = 2.4. The angle difference of the average amide vector orientation of the 1D3Z ensemble is also compared to the orientation of amide vectors in two X-ray structures (with hydrogens added using Rosetta). (C) The difference in the angle differences from (A) and (B) for non-RDC-optimized minus RDC-optimized ensembles in secondary structure regions. (D) Angle differences of the two (E) RDC-optimized and (F) non-RDC-optimized Backrub ensembles plotted relative to each other for residues in secondary structure regions.

(1.46 MB EPS)

Convergence of Q factors in the optimization protocol.

(0.17 MB EPS)

Cross-validation analysis.

(0.04 MB DOC)

Q-factors of RDC-optimized ensembles at various simulation temperatures and maximum segment lengths.

(0.03 MB DOC)

We would like to thank Mariana Babor for computational design simulations, Ryan Ritterson for initial results on individual and family conformational variability, Korvin F.A. Walter for help with the measurements of RDCs, Chris Saunders for sharing data on ubiquitin alignments and modeled sequence diversity, David Case for providing the 100-nanosecond molecular dynamics trajectory of ubiquitin, and Ian Davis and Vincent Chen for help with MolProbity.