The Role of Local Backrub Motions in Evolved and Designed Mutations

Amino acid substitutions in protein structures often require subtle backbone adjustments that are difficult to model in atomic detail. An improved ability to predict realistic backbone changes in response to engineered mutations would be of great utility for the blossoming field of rational protein design. One model that has recently grown in acceptance is the backrub motion, a low-energy dipeptide rotation with single-peptide counter-rotations, that is coupled to dynamic two-state sidechain rotamer jumps, as evidenced by alternate conformations in very high-resolution crystal structures. It has been speculated that backrubs may facilitate sequence changes equally well as rotamer changes. However, backrub-induced shifts and experimental uncertainty are of similar magnitude for backbone atoms in even high-resolution structures, so comparison of wildtype-vs.-mutant crystal structure pairs is not sufficient to directly link backrubs to mutations. In this study, we use two alternative approaches that bypass this limitation. First, we use a quality-filtered structure database to aggregate many examples for precisely defined motifs with single amino acid differences, and find that the effectively amplified backbone differences closely resemble backrubs. Second, we directly apply a provably-accurate, backrub-enabled protein design algorithm to idealized versions of these motifs, and discover that the lowest-energy computed models match the average-coordinate experimental structures. These results support the hypothesis that backrubs participate in natural protein evolution and validate their continued use for design of synthetic proteins.


Introduction
Proteins routinely incorporate amino acid changes over evolutionary time by adapting their conformation to the new sidechain. However, it remains a difficult task to predict such a conformational response, especially when subtle backbone adjustments are involved. This issue is of central importance to the burgeoning field of computational protein design, which has recently enjoyed a string of exciting developments [1][2][3][4].
A number of descriptions of backbone motion have been implemented for the purposes of protein design in the past, each with its own set of advantages and disadvantages. Anticorrelated ''crankshaft'' adjustments of the y(i21) and Q(i) torsions [5] are evident from order parameters derived from molecular dynamics (MD) simulations, but unrealistically distort the ends of the peptide if employed in isolation. Helical parameters [6] and normal mode analysis [7] enable efficient exploration of conformational space near the starting model, but are only useful for a small subset of protein architectures: respectively, coiled-coils and structures for which a small number of motional modes dominates conformational diversity. Peptide fragments [8] implicitly reflect local protein energetics because they are extracted from experimental structures, but can be computationally inefficient because most random fragment insertion attempts are incompatible with a given local structural context and will therefore be rejected. This being the case, it may be prudent to let nature inform our notion of backbone motion by using a move set based on empirical observations, which may encode aspects of protein energetics and sidechain/backbone coupling that are difficult to handle explicitly.
One such model is the backrub (Figure 1), a highly localized backbone motion tightly coupled to sidechain rotamer jumps, initially characterized by examining alternate conformations in ultra-high-resolution crystal structures [9]. A simple geometrical model of the backrub consists of a small (,15u) rotation of a dipeptide about the axis between the first and third C a atoms. Resulting strain in the N-C a -C bond angle t of all three residues may be partially alleviated and backbone H-bonding maintained with small counter-rotations of the two individual peptides. Note that this C a formulation is a simplified but very close approximation of the real molecular mechanism, which probably involves a computationally unwieldy set of small shifts in 6-10 backbone torsion angles, as discussed in [9]. Backrubs were seen for 3% of the total residues in that previous study, and for 2/3 of the alternate conformations with a change in C b position -far exceeding the next most common shifts, which are either peptide flips or local shear in a turn of helix.
Several studies have successfully used the backrub approach to expand the search space of protein design efforts and improve agreement between computed sidechain dynamics and nuclear magnetic resonance (NMR) measurements [10][11][12][13][14]. Recent work has shown that computational design of backbone structures generated by backrub sampling can recapitulate much of the sequence diversity found in the natural ubiquitin protein subfamily [15] and by phage display experiments [16]. However, the backrub has only been empirically demonstrated to accompany dynamic rotamer changes, not actual changes in amino acid identity. Importantly, no direct experimental evidence has been presented to support the assumption implicit in these studies that a dynamic, low-energy motion on the pico-to-nanosecond timescale is relevant on an evolutionary timescale. The contribution of this current study is to address in atomic detail the specific mechanisms by which backrubs accommodate amino acid changes during processes like subfamily evolution. We use a data set of 5200 highresolution, high-quality crystal structures to examine differences in local backbone conformation between well-defined motif populations related by a single amino acid difference, and find that the backrub motion explains the majority of the mainchain movement. Furthermore, we demonstrate that a provably-accurate flexible-backbone design algorithm allowing backrubs at those positions, in conjunction with a common molecular mechanics force field, accurately recapitulates such mutation-coupled backbone changes. These findings validate inclusion of the empirically observed backrub motion as part of the repertoire of ''moves'' for protein design and other modeling efforts.

Backrubs of a-Helix N-caps
The N-cap or C-cap position of a helix is defined as the residue half-in and half-out of the helix: the peptide on one side of the cap makes standard helical backbone interactions, while the peptide on the other side has quite non-helical position and interactions [17]. a-helix N-cap residues can make several types of interactions that stabilize or specify the structural transition from loop into a-helix, the most common and dominant of which is a sidechainmainchain hydrogen-bond to the i+3 amide [17][18][19]. The N-cap Figure 1. The backrub move for mutation-coupled local protein backbone adjustment. (A) A theoretical mutation in ideal b-sheet, from Leu in the mt rotamer (green) to Val in the m rotamer (blue) [43], changes the interactions of the sidechain with its surroundings. Hydrogen atoms are shown in gray. (B) The primary backrub rotation angle h 1,3 (red dotted circle) rotates the dipeptide of interest around the C a1 -C a3 axis (red line). As a result, the sidechain of residue 2 (the central residue) swings in a hinge-like manner. In the theoretical example shown, the space occupied by the new Val sidechain is now more similar to the space originally occupied by the Leu sidechain. (C) The secondary peptide rotation angles h 1,2 and h 2,3 (blue dotted circles) counter-rotate the individual peptides around the C a1 -C a2 and C a2 -C a3 axes (blue lines) to alleviate any strain introduced into the flanking t 1 and t 3 bond angles, respectively, and to restore H-bonding of the two peptides' amides and carbonyls, if necessary. The rotation angles, including the primary backrub angle h 1,3 , define a motion, not a structure, and thus are meaningful only in reference to a pair of conformations (e.g. before vs. after or mutant vs. wildtype). doi:10.1371/journal.pcbi.1002629.g001

Author Summary
Protein design has the potential to generate useful molecules for medicine and chemistry, including sensors, drugs, and catalysts for arbitrary reactions. When protein design is carried out starting from an experimentally determined structure, as is often the case, one important aspect to consider is backbone flexibility, because in response to a mutation the backbone often must shift slightly to reconcile the new sidechain with its environment. In principle, one may model the backbone in many ways, but not all are physically realistic or experimentally validated. Here we study the ''backrub'' motion, which has been previously documented in atomic detail, but only for sidechain movements within single structures. By a twopronged approach involving both structural bioinformatics and computation with a principled design algorithm, we demonstrate that backrubs are sufficient to explain the backbone differences between mutation-related sets of very precisely defined motifs from the protein structure database. Our findings illustrate that backrubs are useful for describing evolutionary sequence change and, by extension, suggest that they are also appropriate for rational protein design calculations.
H-bond enhances protein stability by compensating for the loss of a mainchain H-bond at the helix start relative to the middle of a helix. Note that the sidechain cannot reach this H-bonding position if the residue has helical w,y, so this interaction also specifies the exact helix start position and the direction from which the backbone can enter [20]. Asn, Asp, Ser, and Thr are especially favored at N-caps because their sidechains have the proper chemical character and shape to mimic the helical backbone interactions (which Gln and Glu are too long to do).
Notably, Asn/Asp sidechains are longer than Ser/Thr sidechains by one covalent bond, yet their H-bond distances (N-cap sidechain O to i+3 amide H) are only slightly shorter (2.0160.18 vs. 2.1760.18 Å ) based on a survey of all N-caps with i+3 H-bonds in the Top5200 database (described below and in Methods). This means the backbone must slightly adjust to maintain similar Hbond geometry in both cases.
With this motivation, we wished to confirm the appropriateness of the backrub model for this case of mutational rather than rotamer change. However, backbone coordinate shifts due to backrubs are very small -on the order of the coordinate differences between crystal structures of the same protein [21,22], thus obscuring differences between genuine shifts and experimental noise. The initial description of the backrub bypassed this problem by comparing alternate conformations within single structures [9]. Our approach here, in contrast, was to use the collective weight of many examples to ensure that observed local conformational differences were in fact genuine.
N-cap crystal structures. First, we needed to determine which subset of helix N-cap conformations would be likely to undergo backrubs and thus merited further examination. To do so, we compared amino acid preferences for a-helix N-caps and 3 10 -helix N-caps relative to general-case protein structure, using a non-redundant, quality-filtered set of structures, the Top5200 database (see Methods). Asn/Asp/Ser/Thr were indeed found to be strongly preferred (by factors of 2.5-3) at a-helix N-caps relative to protein structure in general ( Figure 2). Gly is next most common, but cannot form or be influenced by an N-cap H-bond. Pro is the outstandingly preferred residue at 3 10 N-caps ( Figure 2), while for a-helix Pro is disfavored at the N-cap but occurs preferentially at N-cap i+1. These comparative preferences are detailed further in Text S1, with the conclusion that helix Ntermini might well respond to a deletion in the preceding loop by transforming from a classic a-helix with Pro at N-cap i+1 to a tighter-wound 3 10 -helix with the Pro as N-cap ( Figure S1 in Text S1). As an additional factor in our choice of examples, a 3 10 -helix N-cap would have an i+2 sidechain-backbone H-bond rather than the i+3 common at a-helix N-caps, introducing another complicating issue. Therefore, we decided to focus on Asn/Asp vs. Ser/ Thr a-helix N-caps with i+3 sidechain-backbone H-bonds in this study.
With this specific motif in mind, we sought to obtain a preponderance of evidence for a backrub relationship in the form of numerous examples. Therefore, we performed a stringent structural motif search of the Top5200, resulting in identification of 429 Asn/Asp N-caps and a matching sample choice of 500 Ser/ Thr N-caps (out of 3208; see Methods).
The backbone conformations differ consistently: the longer Asn/Asp sidechains rotate the first turn's backbone away from residue i+3, while the shorter Ser/Thr sidechains pull the first turn's backbone toward i+3 in order to form the N-cap H-bond successfully ( Figure 3, Table 1). The Dataset S1 kinemage shows the full sets of Asn/Asp and Ser/Thr N-caps and animates between them. When average Asn/Asp and Ser/Thr structures are superimposed using the C a s surrounding the N-cap in the first turn (N-cap i21 and i+1 to i+3), all C a s match well except the Ncap C a itself ( Table 1). The conformational difference at the N-cap position is well modeled by a backrub rotation of about 11u, similar to shifts typical of rotamer-change backrubs. Furthermore, for both N/D and S/T, the C b deviations [23] and the C a -C b -C c bond angle distribution at N-caps are close to the general case distribution ( Figure S2 in Text S1; further details in Text S1). This means the observed C b shifts and further leveraged sidechain shifts can be attributed primarily to backbone motion rather than altered covalent sidechain geometry.
The median Asn N-cap sidechain built onto the average Ser Ncap backbone results in a steric clash in which the van der Waals radii of the N-cap sidechain O d 1 and the i+3 backbone amide H overlap by .0.45 Å . No x dihedral adjustments can alleviate this clash without abolishing the N-cap sidechain-mainchain H-bond or introducing clashes to other nearby atoms whose positions are fixed for this motif (e.g. the i+4 C b ). Therefore, the backrub away from the first turn by Asn N-caps is indeed forced by steric repulsion.
We also examined two control cases with similar backbone geometry but different sidechain-mainchain interactions. First, we identified 538 a-helix N-caps with any amino acid type except Asn/Asp/Ser/Thr, in which case the i+3 sidechain-backbone Hbond is absent. Second, we chose 500 examples of mid-a-helix structure flanked by at least four a-helical residues in both directions, in which case the i+3 sidechain-backbone H-bond of an N-cap is satisfied by a usual i+4 backbone-backbone a-helical Hbond. The Dataset S1 kinemage shows that the average C a atoms for both control categories are in between the average C a atoms for the Asn/Asp and Ser/Thr categories at the N-cap (or structurally equivalent) residue. This confirms that Asn/Asp and Ser/Thr N-caps are backrub-mediated excursions in opposite directions from equilibrium N-cap/helix structure.
N-cap BRDEE computation. We next wished to test whether a simple energy function based on molecular-mechanics terms from Amber [24] and a solvation term from EEF1 [25] would recapitulate these empirically observed changes, given the chance to access them via a backrub. This question is important for solidifying the connection between natural protein evolution and computational protein design. We therefore turned to Backrub Dead-End Elimination (BRDEE) [10], a protein design algorithm that incorporates backrubs in a provably-accurate deadend elimination framework. BRDEE is freely available as part of the Open Source Protein Redesign for You (OSPREY) [26] suite of protein design software.
As input to the algorithm, we prepared two versions of an idealized helix N-cap motif (see Methods), one with a short sidechain (Ser) and another with a long sidechain (Asn). We then used BRDEE to compute the lowest-energy model for each template, allowing backrubs and rotamer changes at the N-cap as well as small i+3 peptide rotations (see Methods).
The lowest-energy Ser N-cap shifted ''forward'' whereas the lowest-energy Asn N-cap shifted ''backward'' in order to establish comparable hydrogen bonds (Table 1) in a manner remarkably similar to the empirically observed structures ( Figure 3). In particular, the computed and observed C a and C b shifts and inferred backrub angles are of similar magnitude and directionality ( Table 1).
The changes in flanking N-C a -C angles (t) for the BRDEE lowest-energy conformations recapitulate the changes for the crystal structures in terms of directionality: Dt,0 for i21 and Dt.0 for i+1 from Ser/Thr to Asn/Asp ( Table 1). The magnitudes are somewhat exaggerated for the computed models, but this is expected because the backrub paradigm somewhat unrealistically redirects strain to these angles as a simplification. That said, we find some evidence of actual systematic bond angle adjustment in the Dt of almost 3u between the two crystal structure populations (Table 1). There is precedent for context-dependent t variation with Q,y [27,28]; here the variation appears to be dependent on the presence of a local motif where sidechainmainchain H-bonding may energetically counteract the bond angle strain. At any rate, actual structures almost certainly  undergo more complicated backbone changes that propagate to the i62 residues and alleviate i61 t angles. However, those additional changes are quite small, and thus the backrub remains an efficient yet remarkably accurate model for conformational changes at N-caps and elsewhere.
Additional comparison and contrast of the computed models and experimental structures can be found in Text S1.
The BRDEE results recapitulate the average crystal structures, confirming the hypothesis that Ser/ThrRAsn/Asp mutations at N-caps are well modeled by a backrub relationship. More generally, this implies that the backrub may reasonably accompany mutations during natural evolution or in silico protein engineering.

Backrubs of Aromatics in Antiparallel b-Sheet
Aromatic residues often pair with glycine in antiparallel b-sheet by adopting rotamers with x 1 <+60u, which places the aromatic ring directly over a Gly on the adjacent strand across a narrow pair of backbone H-bonds [29]. Aromatic-glycine pairings in antiparallel b-sheet have been demonstrated to yield a synergistic thermodynamic benefit [30]. If the opposite residue is changed to anything other than Gly, a sidechain including at least a C b atom is now present, which would sterically clash with the aromatic in its original conformation. However, the ''plus x 1 '' aromatic rotamer will still be compatible with some rotamers of the opposite sidechain, provided that the aromatic may shift slightly to reoptimize packing of its ring against the opposite residue's C b hydrogens. Here we investigate whether backrubs enable this relaxation by excursions in both directions from a ''neutral'' bsheet conformation. The leverage provided by such backbone motions could lean the aromatic residue forward/backward to maintain close inter-strand contact when the identity of the opposite residue is changed to/from Gly. b aromatic crystal structures. A stringent structural motif search, similar to that described for N-caps above, identified 321 Phe/Tyr residues with ''plus'' x 1 rotamers in antiparallel b-sheet (see Methods). Aromatics are about three-fold as common in antiparallel vs. parallel b-sheet, and are about twice as likely to adopt a plus x 1 rotamer when they do occur in antiparallel vs. parallel b-sheet (data from Top5200), so we focused on antiparallel b-sheet in this study.
In 72 examples the amino acid on the opposite strand is a Gly, in which case the aromatic sidechain moves downward to contact the Gly C a H. In the other 249 examples the C b H atoms of the amino acid on the opposite strand push the aromatic ring upward ( Figure 4, Table 2, Dataset S2 kinemage). Pro cannot provide both b H-bonds, but all other non-Gly residues are equivalent in this role, since their sidechains must avoid the aromatic ring and present only C b H atoms toward it. The average C b deviation from ideality (0.05-0.06 Å ) is far less than the outlier threshold (0.25 Å ) [23], and the average change in aromatic C a -C b -C c bond angle is very small (0.6u, ,1s) ( Figure S3 in Text S1), resulting in a ,0.05 Å shift of the C f contact point. These observations argue against the possibility that this large movement of the planar aromatic group is produced just by a bond-angle ''hinge'' with C a or C b as the pivot. Rather, a dipeptide backrub rotation of about 11u (presumably 5-6u from neutral in each direction) almost perfectly interrelates the two average conformations.
b aromatic BRDEE computation. Fewer examples, and more variation for b-sheet than for a-helix conformation, prevented the ideal-start calculation used for the N-cap case. Instead, low-energy conformations were computed by BRDEE for several examples judged to be appropriately representative of their respective type (across from Gly or across from other) (see Methods). In all cases, the lowest-energy conformation appears to match the average crystal structure very well, whether across from Gly or from some other amino acid with a C b atom ( Figure 4). Computed shifts in C a , C b , C f position between the computed wildtype and mutant structures are similar to shifts between the two crystal structure populations, albeit systematically a bit smaller (Table 2, Text S1). As with N-caps, though to a slightly lesser extent here in extended b conformation, in silico backrubs place more strain into the i61 t angles than do real structures (Table 2); however, the discrepancies are on the same order as the standard deviations of the experimental t distributions, which are around 3u (data not shown).
These results confirm both that BRDEE in conjunction with a simple force field reproduces natural conformations well and also that backrubs model the relationship well.

Backrubs and Natural Protein Evolution
It is known that backrubs relate conformations that interchange dynamically [9]. In this study we further show that, at least for certain specific motifs, backrubs relate conformations that ''interchange'' based on point mutations. For the b aromatic motif, local restraints on steric packing influence the aromatic residue's backbone indirectly via the other altered sidechain. This is in contrast to helix N-caps where the sequence change and the backrub occur at the same residue, as seen previously for rotamer  changes [9]. Taken together, these findings support the intriguing idea that backrubs may ''foster'' mutations, easing them into the structure and promoting their survival into future generations. In this paradigm, backrubs enable individual mutations that provide the raw material for natural selection.
The two specific motifs analyzed here represent only about 0.5% of the protein residues in our Top5200 data set, and thus in one sense the scope of this study is relatively narrow. However, a tight focus was necessary to substantiate the idea of mutationcoupled backrubs with sufficient certainty, due to the coordinate  Distances are after superposition into the same reference frame using 5 C a s (aromatic i22, i21, i+1, i+2 and opposite i). Distances for BRDEE for atoms at or beyond Ca i61 are shown, as opposed to the N-cap case, because superposition into the same reference frame is subtly affected by allowing backrubs at the opposite C a in BRDEE. error problem in the alternative approach of comparing individual wildtype and mutant crystal structures directly. These two cases were chosen as common, well-defined motifs where the primary interaction environment of the changing sidechain is provided by local secondary structure and is therefore consistent across hundreds of examples. For the general case of an individual mutation, the potential interaction environment is also the same before and after; however, it is seldom simple enough to be closely repeated in numerous proteins. Furthermore, as shown in previous work at ultra-high resolution [9], 2/3 of alternate conformations that move C b demonstrate backrubs between rotamers of the same amino acid. All in all, therefore, it is reasonable to assume that the general prevalence of backrub accommodation at sites of mutation is significantly higher than the ''lower bound'' provided in this study.
The backbone shift considered on its own is continuous and low-energy, without a barrier, while the two-state behavior is contributed by the sidechain switch between rotamers, between Hbond partners, or between amino acids. In the dynamics case of jumps between distinct sidechain rotamers or H-bonds [9], both conformations are quite favorable, but backbone and sidechain must change together. In the evolutionary case, such as the single amino acid changes between stable motifs illustrated in this paper, the two amino acid types cannot coexist in the same molecule. The sidechain-backbone coupling shifts the energy landscape for the backbone [31], stabilizing a different choice within a shallow energy well. From a modeling perspective, examination of existing fast-timescale structural dynamism may illuminate other possibilities for mutations on an evolutionary timescale [32].
Mutation-coupled backrubs are small local changes, which presumably mediate neutral drift much more often than they aid large-scale structural rearrangements or changes in function. However, the accumulation of changes via neutral drift over time may in fact enable future large-scale changes by subtly altering the native state energy landscape such that eventually a tipping point is reached. Recent analysis of the evolution of an ancient protein confirms that some function-altering mutations required structural pre-stabilization by earlier ''permissive'' mutations [33]; backrubs may facilitate such preemptive sequence changes by shifting the backbone such that the functionally neutral amino acids can fit. Backrub-related sequence changes could also sometimes enable functional change based on a purely local adaptation when they occur in active sites, either directly or by first enhancing functional promiscuity.
Note that we do not directly address true evolutionary relationships between proteins in this study. Rather, we substantiate the idea that backrubs enable single amino acid changes at specific motifs, which could aid actual evolution within a protein family [15].

Backrubs and Computational Protein Design
It is only natural to segue from the role of backrubs in protein evolution to their utility for protein design -essentially a computational analog of molecular evolution. Our results indicate that, despite the relative simplicity of their functional form, molecular-mechanics-based force fields like Amber plus EEF1 that are commonly used for protein design can in fact accurately recapitulate empirically-observed backbone conformation for multiple specific structural motifs, given the chance to access them via a backrub. (Note that the cases presented here were dominated by single interactions such as H-bonds or steric packing; a higher-cost energy function might be needed to maintain similar accuracy if different interactions are competing and need to be compared quantitatively.) Thus, predicting the conformational consequences of a sequence change in computational protein design is in large part a search problem: if the appropriate regions of protein conformational space are searched efficiently, in many cases low-cost energy functions can do the rest. Unfortunately, that space is vast indeed even for a single sequence, as we know from Levinthal's famous thought experiment [34]. The additional consideration of combinatorial mutations (even when conformational changes are restricted to simple sidechain rotamer alternatives) creates a space that is even more difficult to search, as shown by the proof that protein design is NP-hard [35]. Of course, backbone flexibility further enlarges the search space.
However, flexible-backbone design algorithms like BRDEE are excellent candidates for this task in many cases because (1) they are based on empirically demonstrated types of flexibility and (2) they come with mathematical guarantees of their accuracy with respect to the input parameters. Other algorithms that search over amino acid and rotamer identities, then minimize over backrub degrees of freedom post facto are not guaranteed to identify the global minimum energy conformation (GMEC) given the input model (starting structure, rotamer library, energy function). An advantage of BRDEE is that it incorporates backrub minimization awareness directly into the amino acid and rotamer comparison stages of dead-end elimination, and thus is guaranteed to identify the GMEC given the input model. Because BRDEE avoids becoming trapped in local minima, it effectively decouples the often intertwined issues of conformational search and scoring. Therefore, as a result of using BRDEE, this paper gives the limit of how well any algorithm can perform given our input model. In the future we plan to implement additional empirically-validated small backbone motions, such as peptide flips and tripeptide shears [9,14], to improve coverage of conformational space.

Conclusion
Overall, we have demonstrated that the backrub, a model of local backbone motion previously only documented for dynamic rotamer changes, also applies to local sequence changes. This finding is an important direct validation for the application of the backrub to the study of natural protein evolution and to continuing efforts in computational protein design.

Creation of Structure Database
To identify numerous examples of the desired motifs, we used a ''Top5200'' database of high-quality protein structures. The rapid growth of the Protein Data Bank (PDB) [36] in recent years enabled the creation of a high-quality database with an order of magnitude increase in size relative to the previously described Top500 [23] while maintaining similar standards of resolution and structure quality. However, due to sheer logistics it also necessitated a more automated selection protocol.
We included at most one protein chain per PDB 70% sequencesimilarity cluster as of April 5, 2007. We chose the representative for each such cluster as the chain with the best average of resolution and MolProbity score [37] where resolution is ,2 Å . MolProbity score is an estimate of the resolution at which a structure's steric clashes, rotamer quality, and Ramachandran quality would be average; thus the average of resolution and MolProbity score is a combined experimental and statistical indicator of structural quality [37]. The homology filter prevents redundancy and thus over-representation of certain motifs or substructures.
To calculate the MolProbity score for each chain, first hydrogens were added with the program Reduce [38]. The -flip flag was used in order to allow Asn/Gln/His flips throughout the structure, including at interfaces where multimer partners may participate in hydrogen-bonding networks. All protein chains with at least 37 residues were then extracted, along with any ''het'' atoms or waters with the same chain identifier, and MolProbity score was calculated for each chain.
Two ''post-processing'' steps were required. First, we removed four chains whose PDB structures had been obsoleted and replaced them with updated structures where possible (1sheAR2pk8A, 1wt4AR2v1tA, 2eubAR2pl1A, 2f4dARno replacement). Finally, we removed two chains with incomplete or unclear PDB files (1c53A had only C a atoms, 3ctsA had only ''UNK'' unknown residue types). The resulting 5199 protein chains make up about a million residues.

Identification of Motifs
Identification of a-helix N-caps. We first noticed that backrubs may explain backbone adjustments upon mutation between short and long N-caps sidechains while examining Ncaps in T4 lysozyme. Visual analysis using the BACKRUB tool in KiNG [39] revealed that a modest backrub (about 7u) nicely models the relationship between the Thr59 N-cap in the wildtype structure (2lzm) and the Thr59RAsn N-cap in the mutant structure (1lyg) [40]. Intriguingly, as well as being one of the most common N-caps, Ser is the most common amino-acid type for backrubs between alternate conformations in crystal structures [9], perhaps because it has many distinct possibilities for sidechainbackbone H-bonding.
N-caps were identified using a helix recognition algorithm based loosely on DSSP [41] that combined consecutive a-like (i+4 mainchain-mainchain H-bond) and/or 3 10 -like (i+3 mainchainmainchain H-bond) helical turns. To approximate the visually determined definition of an N-cap as the first residue within the C a cylinder of the helix [17] but extend it to a much larger data set, residues were added to the N-terminus as putative N-caps if the distance from their C a to the i+3 C a was less than 5.9 Å . We chose this distance cutoff in order to include true N-caps at helix ends that were somewhat but not excessively stretched, i.e. with relatively weak mainchain-mainchain H-bonds but within the helix C a cylinder and close enough to contribute with a sidechainmainchain capping interaction. No amendments to the DSSP algorithm were made for C-caps, which we did not specifically examine in this study. Helices were required to include at least five residues including N-and C-caps.
We also took precautions to help offset any leniency in helix extension introduced in the steps described above. To ensure that a tightly defined subset of N-caps was being examined, H-bond pseudo-energies were computed as in DSSP with the standard 20.5 kcal/mol cutoff. N-caps with an i+4 but not an i+3 mainchain-mainchain H-bond were labeled a, N-caps with an i+3 but not an i+4 mainchain-mainchain H-bond were labeled 3 10 , and N-caps with both i+3 and i+4 mainchain-mainchain H-bonds were labeled bifurcated a/3 10 and were ignored in this study.
For N-cap sequence propensity analysis, all a and 3 10 N-caps of each amino acid type were compared to the general case of all residues in the Top5200 regardless of backbone conformation. For N-cap backrub analysis, on the other hand, only a N-caps with i+3 but not i+2 sidechain-backbone H-bonds were used. These N-caps are found in the ''extended'' region of Ramachandran space, with two peaks of occurrence near (2150u,170u) and (280u,170u) [42]. We required Q,y values near the ''right-most'' of these peaks, i.e. between (2100u,155u) and (260u,180u), because it is the more populated of the two peaks for both the Asn/Asp and Ser/Thr groups and because the restriction provides a consistent depiction of the subset of N-caps that could undergo backrubs to relate to each other.
As described in Results, we also created two control categories. For the ''other N-caps'' category, the criteria were the same as for Asn/Asp or Ser/Thr N-caps, except we required some other amino acid identity at the N-cap position. For the mid-helix category, we required strictly a-helical Q,y values between (265u,245u) and (255u,235u) for the central residue of interest, at least four residues labeled ''H'' for a-helix by DSSP [41] in each direction, and at least 13 such residues total in the helix. For the Ser/Thr, Asn/Asp, and other N-cap categories, the N-cap residue was required to have maximum mainchain or sidechain atom Bfactor ,40 [23]. For the mid-helix category, the maximum Bfactor was 20. To attain roughly similar sample sizes for all four categories, we randomly selected 500 examples for Ser/Thr (out of 3,208) and for mid-helix (out of 17,047), to match the 429 Asn/ Asp and 538 ''other'' N-caps.
Examples of the N-cap motif were superposed onto one another using the N-cap i21 and i+1 to i+3 C a s; the N-cap i C a itself was not used for superposition to avoid biasing its final average position. For the mid-helix control category, only the i+1 to i+3 C a s were used, because there is no structural equivalent to the Ncap i21 residue. Root mean square deviation (RMSD)#1 Å relative to an ideal reference N-cap (see BRDEE input below) for Asn/Asp, Ser/Thr, and other N-caps, or relative to an ideal poly-Ala helix (similar to BRDEE input) for mid-helix, was required to keep each example.
Average motifs were then generated by taking the mean position of all backbone atoms (including hydrogens) and C b atoms from N-cap i21 to i+3. A Ser rotamer with x 1 = +65u and an Asn rotamer with x 1 = +66u and x 2 = +30u were added to reflect median dihedral angles from the data set, while maintaining the average C b positions. Finally the average structures were superposed onto one another, again using the N-cap i21 and i+1 to i+3 C a s (i+1 to i+3 C a s for mid-helix).
Using the same protocol, we also analyzed 388 Asn/Asp and 976 Ser/Thr examples from the second Ramachandran peak, near (2150u,170u) instead of (280u,170u) as mentioned above. The i21 C a in the secondary cluster is displaced by about 1 Å relative to the primary cluster due to the differing w values (2150u vs. 280u) before entering the helix; thus, given the magnitude of the backrub motion we hoped to capture, it was unreasonable to compute an average structure for the combined set. However, separate analysis of the secondary cluster revealed a very similar backrub relationship as seen in the primary cluster (Dataset S1 kinemage, PDB files in Dataset S3).
Identification of aromatics in antiparallel b-sheet. For b aromatics, we identified Phe/Tyr residues with ''plus'' x 1 (0-120u) in antiparallel b-sheet. The ''opposite'' residue is in the direction of the aromatic sidechain, between the narrow pair of mainchain Hbonds. Both residues were required to have maximum mainchain or sidechain atom B-factor less than 40, and to have at least one additional b residue (according to the custom DSSP-based secondary structure identification algorithm described above) in each direction along their respective strands.
To avoid irregularities from strand ends, we defined the ''fray'' parameter f: where t N (N-ward twist) is the dihedral between C a i21 and i on the aromatic strand and C a i and i+1 on the opposite strand; t C (C-ward twist) is the dihedral between C a i and i+1 on the aromatic strand and C a i and i21 on the opposite strand; p a (aromatic pleat) is the angle between C a i21, i, and i+1 at the aromatic residue; and p o (opposite pleat) is the angle between C a i21, i, and i+1 at the opposite residue. A difference in twist indicates that the two strands locally ''pull apart'' from each other; any difference in amount of pleating between the two strands is subtracted as a correction term. Fray was constrained to less than 10u for our motif search. Examples of the b aromatic motif were superposed onto one another using the aromatic i22, i21, i+1, i+2 C a s and the C a opposite i as an ''anchor'' point; the aromatic i C a itself was not used for superposition to avoid biasing its final average position. RMSD#1 Å relative to a reference example was required to keep each example. Average motifs were then generated by taking the mean position of all backbone atoms (including hydrogens) and C b atoms from residues i62 on both strands. Average Phe rotamers with x 1 = +65u and x 2 = +80u for the across-from-Gly category and with x 1 = +66u and x 2 = +84u for the across-from-other category were added to reflect median dihedral angles from the data set, while maintaining the average C b positions. Finally the average structures were superposed onto one another, again using the aromatic i22, i21, i+1, i+2 C a s and the C a opposite i.

Computational Design
For all BRDEE calculations, we used the same input parameters as described previously [10] for energy function, rotamer library, t filter, etc. Briefly, the energy function consists of the Amber electrostatic and van der Waals terms [24] plus the EEF1 pairwise solvation energy term [25]. The t filter prevents large strain at the only bond angle allowed to change in the calculations [10].
Computational design of a-helix N-caps. For N-cap calculations, we started from an ideal helix (Q,y 260u,240u) with poly-Pro conformation (Q,y 280u,170u) for the N-cap and its preceding residue. All residues were Ala except for the N-cap, which was either Asn or Ser. At the N-cap position, primary backrub rotation angles from 215u to +15u in increments of 0.5u were allowed, and the default e factor of 0.7 was used for backrotation of the single peptides. At the N3 position, backrubs were not allowed as part of the main search. However, using KiNG [39] we manually determined that single-peptide rotation angles from 25u to +15u of the N3 peptide could help optimize the H-bonding angle of its backbone amide to the N-cap sidechain H-bonding angle and were viable in terms of sterics, Ramachandran dihedrals [23], and t angles; therefore we included such a rotation in increments of 1u.
Computational design of aromatics in antiparallel bsheet. For b aromatic calculations, the strand twist and curl vary too much to construct averaged cases. Instead, we repeated the computational experiment several times with individual examples of the motif from different experimental structures as input coordinates. Representative examples such as 1khbA Phe144-Gly157 and 1z84A Phe171-Gln188 were chosen because they were visually relatively ''middle of the pack'' and were not obvious outliers in terms of geometric parameters like fray. Primary backrub rotation angles from 215u to +15u in increments of 0.5u were allowed at both residue positions, with the default e factor of 0.7. (Backrubs at the opposite position were of less interest for this study, and indeed turned out to be much smaller than at the aromatic position.) If the original opposite residue was Gly, BRDEE was run for both the original coordinates and a GlyRAla mutant. Likewise, if the original opposite residue was anything but Gly, BRDEE was run for both the original coordinates and an XaaRGly mutant.

Visualization
The KiNG graphics program [39] was used both to study the superposition results and to produce the figures. Dataset S3 provides raw PDB coordinate files for N-cap and aromatic crystal structure examples, average crystal structures, and BRDEE lowestenergy models.

Supporting Information
Dataset S1 This is a plain text kinemage graphics file allowing interactive, three-dimensional exploration of the N-cap conformations from Figure 3. It can be viewed locally with the free program KiNG (http://kinemage.biochem.duke.edu/software/ king.php) either by renaming the file to have a ''.kin'' file extension or by keeping the current filename but using the ''All files'' format option. Alternatively, it can be viewed on the web by uploading to MolProbity (http://molprobity.biochem.duke.edu/) as a kinemage file and selecting ''View in KiNG''.

(TXT)
Dataset S2 This is a plain text kinemage graphics file allowing interactive, three-dimensional exploration of the aromatic conformations from Figure 4. It can be viewed locally with the free program KiNG (http://kinemage.biochem.duke.edu/software/ king.php) either by renaming the file to have a ''.kin'' file extension or by keeping the current filename but using the ''All files'' format option. Alternatively, it can be viewed on the web by uploading to MolProbity (http://molprobity.biochem.duke.edu/) as a kinemage file and selecting ''View in KiNG''.

(TXT)
Dataset S3 This is a compressed tar archive containing lists of all individual examples and PDB coordinates for average and calculated structures (a README file explains the naming conventions) for both the crystal-structure ensembles and the BRDEE calculated structures.

(GZ)
Text S1 This file provides a discussion of the differences between a-helix N-caps and 3 10 -helix N-caps, and compares and contrasts in more detail the average crystal structures and low-energy BRDEE models for both the N-cap and b aromatic cases. (DOC)