Four Small Puzzles That Rosetta Doesn't Solve

A complete macromolecule modeling package must be able to solve the simplest structure prediction problems. Despite recent successes in high resolution structure modeling and design, the Rosetta software suite fares poorly on small protein and RNA puzzles, some as small as four residues. To illustrate these problems, this manuscript presents Rosetta results for four well-defined test cases: the 20-residue mini-protein Trp cage, an even smaller disulfide-stabilized conotoxin, the reactive loop of a serine protease inhibitor, and a UUCG RNA tetraloop. In contrast to previous Rosetta studies, several lines of evidence indicate that conformational sampling is not the major bottleneck in modeling these small systems. Instead, approximations and omissions in the Rosetta all-atom energy function currently preclude discriminating experimentally observed conformations from de novo models at atomic resolution. These molecular “puzzles” should serve as useful model systems for developers wishing to make foundational improvements to this powerful modeling suite.


Introduction
The Rosetta modeling suite has enabled macromolecule structure prediction and molecular design with unprecedented accuracy and functionality (see, e.g., [1,2,3,4,5,6,7] and refs. therein). Nevertheless, Rosetta's algorithms continue to face limitations. Despite several near-atomic-resolution successes in favorable cases, rigorous tests in blind trials indicate that the general de novo prediction of even small proteins remains out of reach [8,9]. Similarly, the precise sculpting of polar environments or energetic balances necessary for designing efficient catalysts or allosteric switches remains challenging [4,5,10]. How can we achieve mastery over three-dimensional modeling and engineering?
Historically, Rosetta publications have focused on positive new developments, and rightly so, considering there are many. Nevertheless, this special collection of PLoS One manuscripts gives the Rosetta community an opportunity to present and dissect negative results. Individual labs have discovered many such results in recent years and found them useful for reflection; but these problems have not been disseminated widely and often come as surprises to new users.
In this short paper, I will take this opportunity to argue that the current Rosetta codebase has not yet achieved a critical step in its maturation into a general modeling tool: a confident and predictive understanding of the simplest and smallest macromolecule structure problems. If well-defined systems as small as four residues are not solvable by Rosetta, why have any confidence that 150-residue domains (or their massive complexes) are appropriate for de novo 3D modeling and eventually precision engineering? I will present four classes of simple but still perplexing small puzzles that provide tangible entry points -and, perhaps, shared model systems -for current and future enthusiasts hoping to establish a confident and rigorous foundation for Rosetta modeling.
Before describing the puzzles, some historical perspective is in order. It may seem self-evident that one should start modeling with the very smallest known sequences that take on well-defined 3D structures. For several reasons, such a completely reductionist approach has not been the mainstream strategy in the Rosetta community. First, in early days, most cases of naturally occurring ultra-small proteins (say, 30 residues or below) were considered irregular and perhaps ill-defined, lacking the clear a or b secondary structure and hydrophobic cores that are the hallmarks of larger protein domains. Thus, initial Rosetta studies from the mid-1990s focused on 50-to 100-residue protein sequences that formed regular, clearly well-defined structures [11,12], and these challenges were passed down from developer to developer as ''inhouse'' benchmark sets [1,13]. Second, the global folds of smaller macromolecules may be less robust to inaccuracies in assumed energy function than larger systems, as there are fewer key packing interactions that specify small folds. Third, early modeling work involved coarse-grained molecule representations and energy functions; at this medium resolution (4-8 Å ), most compact conformations of a small protein segment are indistinguishable from each other. Fourth, the focus on medium-resolution folds of larger systems has given useful insights, e.g., into assigning protein functions that have been subsequently validated by functional studies [14].
These historical reasons to avoid small systems are no longer as relevant. First, since the late 1990s, several very small protein systems have been discovered or engineered and then clearly demonstrated to attain precisely defined 3D structures (see, e.g., [15,16,17]). The expanding database of RNA and non-natural polymers (and the development of Rosetta code to model them [18,19]) further increases the number of such small, well-defined puzzles. In addition, de novo modeling of short irregular loops and small proteins regularly appear as sub-problems in blind prediction targets and in the design of catalytic sites, conformationally switchable segments, and structured peptides. Predicting the structural features of these small systems and sub-systems -and even modeling the fine energetic balance between alternative structures -is no longer something to be avoided but instead a key goal of many Rosetta developers.
Most importantly, many Rosetta developers are now striving for predictive power with Angstrom-level resolution [20], although fruitful insights and methods development at medium resolution continue (see, e.g., [21,22,23]). The new focus on high-resolution modeling is motivated by a shared belief: atomic accuracy appears necessary for a deep understanding of catalysis, drug design, and evolution. In a few, favorable cases, Angstrom-level modeling has been achieved [1,3,8], sometimes with limited experimental data [24,25,26]. Nevertheless, high-resolution success is rare rather than the norm, and this lack of general predictive power is typically blamed on the difficulty of complete conformational sampling at the 1-2 Å scale. At present, only small model systems offer the prospect of comprehensive sampling and are therefore the most stringent tests of Rosetta's assumptions and energy functions at high resolution. Due to their ubiquity, their functional importance, and their unique ability to test modeling at atomic resolution, small macromolecule modeling problems are an important and unsolved frontier for Rosetta modeling.

Results
There are at least four kinds of problems at this ''small puzzle'' frontier, from mini-proteins without and with disulfides to protein loops and RNA motifs. As illustration, the following descriptions present a single model system of each kind.
Each of the selected systems has been extensively characterized by numerous experimental structural and energetic methods. In particular, in each case, the free energy associated with the experimental conformation has been measured to be at least 3 kcal/mol more stable than the ensemble of unstructured states at room temperature. (The expected energy gap between the experimental conformation and any individual conformation of the competing ensemble is therefore expected to be (much) greater than 3 kcal/mol.) Note that the focus herein will be on recovering high-resolution features of the experimental models; thus an acceptable Ca RMSD should be 1 Å or lower, comparable to the differences between structures solved in different crystallographic space groups or with different binding partners. Further, the puzzle descriptions include discussion of side-chain conformations deemed experimentally stable and important for each molecule's fold and function.
I will summarize prior data and recent Rosetta modeling runs on these puzzles, using at least two different Rosetta conformational search strategies for each case. As per the guidelines for this Special Collection on RosettaCon 2010 science, an extensive methods section gives modeling details, including Rosetta command-lines and protocol capture, that will permit new developers to rapidly reproduce and assess this work.
A. Mini-proteins: the Trp cage ''Mini-proteins'' with sizes well under 30 residues are ideal systems for testing modeling tools and, indeed, are widely studied in the molecular dynamics (MD) community. The Trp cage is a particularly well-characterized mini-protein with a length of 20 residues, engineered by truncating and optimizing exendin-4 from gila monster saliva [16]. Several MD studies have recovered lowest-energy Trp cage conformations de novo that agree with the experimental NMR structure (see, e.g., [27,28,29,30]).
Rosetta de novo modeling, on the other hand, fails to solve this problem. While occasionally sampling a near-native conformation, Rosetta's fragment-assembly/all-atom-refinement protocol (''abrelax'') favors a tight cluster of structures with a backbone within 2 Å Ca RMSD of the native conformation but with the molecule's central tryptophan side-chain in an incorrect rotamer (Figs. 1A & 2A). An independent sampling approach, Stepwise Assembly (see Methods), which does not make use of fragments or a coarsegrained search phase, yields the same conformations as the lowest energy solution ( Fig. 2A). Extensive optical thermodynamic characterization and NMR spectroscopy of Trp cage and several dozen variants [30] have revealed no evidence for this discrete alternative state.

B. Structured peptides: the marine snail toxin GI
Peptides in snake, spider, and other venoms; mammalian and plant defensins; and extracellular signaling molecules form a second rich set of modeling puzzles. These molecules share folds and likely evolutionary lineage and have been optimized by evolution for high stability, precise folds, and, most importantly, small size. By making use of disulfide bonds, structured peptides can reach lengths smaller than those of (disulfide-free) miniproteins. The a-conotoxin GI, isolated from the fish-hunting marine snail Conus geographicus, was one of the first of these tiny but potent sequences [31]. Despite containing only 13 residues, the peptide forms a highly stable fold with two disulfides, whose structure has been determined with diffraction data to 1.20 Å [32].
Even with a ''cheat'' -tight distance constraints that enforce the molecule's native disulfide pairing -Rosetta de novo modeling (abrelax) gives low energy models that disagree with the crystal structure in all non-helical regions (not shown). The Stepwise Assembly algorithm yields even lower energy models that are still highly discrepant (Figs. 1B & 2B; 2.8 Å Ca RMSD over 13 residues).

C. Protein loops: the chymotrypsin inhibitor
The de novo building of loops excised from crystal structures offer another set of well-defined toy puzzles, with relevance to realworld practical problems such as comparative modeling or loop design. Some of these tests are surprisingly challenging. The chymotrypsin inhibitor from barley seeds displays a 10-residue protease-binding loop that appears visually irregular but is highly structured [33,34]-even in the absence of docking to the protease target site, this region is positioned with atomic precision by hydrogen bonds to two arginines extended from the molecule's main body.
Excision of this segment and subsequent Rosetta de novo loop modeling, leveraging kinematic closure (KIC) strategies [35], gives excellent convergence, with the 10 lowest energy models giving the same loop conformations (within 0.3 Å RMSD of each other). Unfortunately, this structure is an incorrect, collapsed loop with incorrectly positioned arginines (Fig. 1C). [Additional calculations with StepWise Assembly converge to similarly collapsed loops; these data are not shown here due to current differences in modeled degrees of freedom in KIC versus StepWise protocols.] D. RNA motifs: the most stable tetraloop, UUCG A final, rich source of simple modeling puzzles comes from structured RNAs. These molecules fold back on themselves to form numerous double helices interconnected by so-called noncanonical motifs [36]. Many of these motifs are quite small, including the ubiquitous 4-residue tetraloops that cap off double helices to form hairpin folds [37]. Rosetta fares quite poorly in modeling an UUCG tetraloop hairpin de novo. (The sequence studied here differs in the stem sequence from a UUCG hairpin modeled previously [19] and shows lower energy non-native conformations.) The lowest energy models derived from Fragment Assembly of RNA with Full Atom Refinement (FARFAR) achieve none of the non-canonical base-pairing geometries or basestacking interactions known from crystal structures [38]. Even lower energy solutions are uncovered by StepWise Assembly (Fig. 1D and Fig. 2D), still with poor all-atom RMSD (3.3 Å ).

Discussion
What is going wrong with Rosetta? Computational optimization procedures can give poor solutions due to either poor conformational search strategies or inaccurate optimization functions. Most recent Rosetta modeling papers have emphasized conformational search over hundreds of degrees of freedom as a critical, shared bottleneck in many problems [7,8,9,19,25]. However, conformational search is not the issue for the four small puzzles described above. In some cases, classic and novel search strategies produce nearly identical incorrect models ( Fig. 2A); and, in fragment assembly approaches, independent modeling runs converge well ( Figs. 2A, C, & D). The strongest evidence for efficient conformational sampling is that de novo models achieve lower energies than native models that have themselves undergone extensive optimization (involving equal amounts of high-performance computation; see Fig. 2, Table 1, and Methods).
If the Rosetta energy function gave an accurate portrait of in vitro folding, these non-native low-energy conformations would be expected to give energies at least 3 kcal/mol higher (instead of lower) than the optimized native models. Thus, at least for these four small puzzles, it is the poor discrimination of the Rosetta all-atom energy function that emerges as the critical problem. In previous studies, the difficulty of conformational sampling for larger problems as well the greater energy gaps attained in those problems likely masked flaws in the Rosetta energy function (see, however, [35]). Nevertheless, developers have recognized many shortcomings of the energy function (see also the Perspective in this issue), and inspection of puzzles A-D confirms a sizeable fraction of this list of problems. Rosetta's solvation model neglects many-body effects, nontrivial solvation structure oriented around polar groups, and ''second shell'' water effects; the model only weakly disfavors buried unsatisfied polar groups (Fig 1A) that are seldom observed in experimental structures [39]. Rosetta's hydrogen bond (H-bond) potential neglects the effects of charged atoms, (anti-) cooperativity within H-bond networks (Figs. 1B & C), and includes, by default, a dependence of H-bond strength on burial that is better modeled in the solvation term (Fig. 1C). Further, Rosetta ignores electrostatic interactions (besides H-bonds) and their screening, a likely  Fig. 2  important factor in stabilizing the UUCG hairpin (Fig. 1D) [40]. Finally, Rosetta does not currently permit rigorous estimation of a model's free energy, which has been suggested to be important for, e.g., the Trp cage structure [29].
How can these and additional issues be fixed? While there has been a long history of fine-tuning individual terms of the Rosetta energy function (much of it unpublished), these efforts have led to few substantial improvements in benchmarks or actual changes in Figure 2. All-atom energy vs. RMSD plots for de novo modeling of the four puzzles and for optimizing experimental (''native'') conformations. Panels correspond exactly to panels in Fig. 1. In protein cases (A)-(C), the default Rosetta all-atom energy function for de novo protein modeling (score12) is plotted against Ca RMSD. In the RNA case (D), the FARFAR energy function (which contains torsional terms for RNA, an orientation-dependent solvation function, and a carbon-hydrogen-bond model [19]) is plotted against all-heavy-atom RMSD. The conformational sampling algorithms (ABRELAX, SWA, etc.) used in the runs are denoted in the figure and described in detail in Methods. doi:10.1371/journal.pone.0020044.g002 the main codebase. One potential barrier is that the physics of solvation, H-bonds, and screened electrostatic interactions are strongly coupled to each other, and indeed are reflected in partly unified terms in most other molecular modeling force fields (see, e.g., [41,42]). Any fully consistent fix will require the guidance ofand perhaps a complete rewrite by -an expert developer with a comprehensive understanding of the Rosetta codebase and its hidden quirks. Further, some of the potential fixes, such as nonpair-wise energy function terms, may not be compatible with core features of the Rosetta package, such as the packer used for rapid side-chain optimization and design. Nevertheless, there is hope. The recent refactoring of Rosetta into object-oriented code greatly facilitates the creation and testing of novel energy functions [43]. The reorganization may also permit incorporation of libraries such as OpenMM [44] that implement independently developed energy functions containing physics (such as polarizable atoms [41]) missing in Rosetta. As a final point, it is important to mention that the four puzzles described herein are not outliers but, rather, representative of inaccurate results that Rosetta finds for many small modeling problems. Rosetta similarly fails to achieve high resolution predictions for other mini-proteins, such as the pinWW domain and TrpZip [17]; for other well-characterized disulfide-stabilized peptides such as the sea anemone toxin BGK [45]; for other functional loops, including the highly stable trypsin-binding loop from the jumping cucumber E. elaterium [46]; and other functional RNA motifs, including the bulged-G motif [47] (unpubl. results, R.D.). This paper has focused on four particular cases as specific illustrations for new Rosetta contributors, but future solutions to any of these four puzzles should also be validated on more extensive benchmarks including analogous mini-protein, loop, or RNA motif cases.
A complete macromolecule modeling package must necessarily be able to address the smallest structure prediction problems. It is both bad and good news that Rosetta fails at what appear to be the simplest high-resolution puzzles. The bad news is that Rosetta, perhaps the leading software package for 3D modeling and design, has fundamental limitations. But this is excellent news for current and future Rosetta developers; attaining confident solutions for the smallest modeling problems is an important goal whose pursuit involves interrogating the most basic rules of molecular self-assembly. The four puzzles presented herein offer well-defined entry points to developers who are interested in pursuing this fundamental path.

Command-lines for Rosetta
We summarize sequences and Rosetta command-lines for each of the four puzzles herein. Most of the calculations in the paper were carried out with Rosetta release 3.2; and all calculations will be implemented in the next Rosetta release. Remaining models were generated with the Rosetta codebase in the Das lab branch (revision number 40197), available to Rosetta developers (in the Rosetta Subversion repository at https://svn.rosettacommons.org/ source/branches/das_lab/); this code will be gladly provided to other academic users upon request.
(A) Trp cage. The modeled sequence was for the most stable variant of the Trp cage: DAYAQWLKDGGPSSGRPPPS. De novo modeling was carried out with the following ABRELAX and NATIVE_RELAX command lines (see, e.g., [8]), using fragment files obtained from the Robetta fragment modeling server [48] with the ''no homologs'' option: A total of 20,000 models were generated for both de novo and native optimization runs. Much larger runs (up to 1,000,000 models; unpub. data, D. Baker & RD) did not give significantly lower energies. Both of these standard command lines are also executable with Rosetta release 3.2.

out [SWA]
A complete directed acyclic graph (DAG) of the rebuild and clustering steps, along with associated commands in Condor format, was automatically generated by a master Python script. The script and a resulting example DAG are provided via Rosetta protocol capture (see below). The DAG was computed via DAGMAN with the Condor computing platform or with inhouse Python scripts (also provided by protocol capture) on the LSF queuing platform on 200 to 400 cores on Stanford's BioX 2 resource. Optimized native conformations were also estimated with the StepWise Assembly method. To ensure a fair comparison, the entire calculation was repeated, but using Rosetta atom-pair constraints (with the Rosetta smoothed step function ''fade'') to keep models with inter-residue Ca-Ca distances within 61 Å and the tryptophan rotamer in the native conformation. Explicitly, an example command line is: stepwise_protein_test.,exe. -database ,path to rosetta_database. -rebuild -out:file:silent_struct_ type binary -fasta 2jof.fasta -n_sample 18 -nstruct 100 -cluster:radius 0.100 -extrachi_cutoff 0 -ex1 -ex2 -score:weights score12.wts -pack_weightspack_no_hb_ env_dep.wts -add_peptide_plane -native 2jof.pdb -mute all -silent1 region_4_5_sample.cluster.out -tags1 S_0 -input_res1 4 5 -sample_res 3 4 -out:file:silent REGION_3_ 5/START_FROM_REGION_4_5_DENOVO_S_0/region_3_5_sample. out-cst_file 2jof_native_CA_CA_trp.cst [SWA NATIVE] and the constraint file 2jof_native_CA_CA_trp.cst is provided by Rosetta protocol capture.
(B) a-conotoxin GI. The modeled sequence was: ECCNPACGRHYSC. The methods for de novo modeling aconotoxin were essentially the same as for Trp cage. However, the following command lines need to be run from the Das lab branch, which disables complications in disulfide input/output and scoring in the Rosetta release 3.2.
StepWise Assembly methods were also applied to this case, but could not be directly compared to the KIC results because of difference in which degrees of freedom were optimized (the KIC protocol samples N-Ca-C bond angles, for example, whereas the StepWise Assembly code keeps all bond angles fixed).

Protocol capture
All files, including fragments, sequence files (.fasta), native conformations (.pdb), as well as example logs are being provided via ''protocol capture'' in the Rosetta Subversion repository: https://svn.rosettacommons.org/source/trunk/RosettaCon2010/ protocol_capture/rhiju_four_small_puzzles. The directory will be gladly provided to readers without access to the repository upon request.