Rosetta FunFolDes – A general framework for the computational design of functional proteins

doi:10.1371/journal.pcbi.1006623

Fig 1.

Rosetta FunFolDes—method overview.

FunFolDes was devised to tackle a wide range of functional protein design problems, giving users greater control over the simulation parameters while lowering the level of expertise required. FunFolDes can transfer single- and multi-segment motifs (light blue) together with the target partner (grey) by exploiting Rosetta’s FoldTree framework (top row). A wider range of information can be extracted from the template (wheat) to shift the final conformation towards a more productive design space (middle row), including targeted distance constraints, generation of structure-based fragments, motif insertion in sites of different residue length, and the presence of the binding target to bias the folding stage. The bottom row showcases the most typical application of the FunFolDes protocol. Implementation in RosettaScripts allows users to tailor FunFolDes behavior, and seamless integration with other protocols and complex selection logic can be added to address the needs of each design task.

Fig 2.

Benchmark test set to evaluate FunFolDes structural sampling.

A) Structural representation of the 14 targets used in the benchmark. For each target, the motif (red) and query (blue) regions are highlighted, together with the positions from which distance constraints were generated (light blue). Conformations of the motif and query regions, as found in the template structures, are shown superimposed in light grey. B) Full-structure RMSD (Overall RMSD) and local RMSD for the query region (FunFolDes–Query Region) are presented for four targets (full dataset presented in S1 Fig). Overall RMSD compares results for the two simulation modes (FunFolDes vs. constrained ab initio (cst-ab initio)) and the two fragment generation methods (structure-based (blue) vs. sequence-based (green) fragments) against their original target. FunFolDes more frequently samples RMSDs closer to the conformation of the target structure. Generally, structure-based fragments contribute to lower mean overall RMSDs. The FunFolDes–Query Region RMSD distributions show that the choice of fragment set has little influence on the structural recovery of the query region.
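The distinction between overall and query-region RMSD in panel B can be illustrated with a minimal sketch. It assumes the two structures have already been superimposed and are given as equal-length lists of per-residue (x, y, z) coordinates; the function name `rmsd`, the toy coordinates, and the query indices are hypothetical and not taken from the benchmark.

```python
import math


def rmsd(coords_a, coords_b, indices=None):
    """RMSD between two pre-superimposed coordinate lists.

    coords_a, coords_b: equal-length lists of (x, y, z) tuples, one per residue.
    indices: optional residue indices to restrict the calculation to
             (for example, a query region); all residues are used if omitted.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    selected = list(range(len(coords_a))) if indices is None else list(indices)
    if not selected:
        raise ValueError("no positions selected")
    squared = 0.0
    for i in selected:
        # Squared Euclidean distance between equivalent atoms.
        squared += sum((p - q) ** 2 for p, q in zip(coords_a[i], coords_b[i]))
    return math.sqrt(squared / len(selected))


# Toy example (hypothetical coordinates): the last two residues are displaced by 2 Å.
target = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
decoy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 2.0, 0.0), (11.4, 2.0, 0.0)]

overall_rmsd = rmsd(target, decoy)                # all residues
query_rmsd = rmsd(target, decoy, indices=[2, 3])  # hypothetical query region
```

Restricting the calculation to a subset of positions is what lets a local deviation stand out even when the overall RMSD is small. A real analysis would first superimpose the structures, a step this sketch leaves out.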

Fig 3.

Assessment of FunFolDes sequence sampling quality.

A) HMM Sequence Recovery measures the percentage of generated decoys that can be assigned to the original HMM from the CATH superfamily. FunFolDes consistently outperforms cst-ab initio, in agreement with the structural recovery metrics. B) Core Residues Sequence Recovery shows the sequence recovery between the core residues of the design set and the target. Recovery is measured in terms of sequence identity and sequence similarity (as assigned through BLOSUM62). Core sequence identity and similarity were assessed over the structure-based fragment set. According to this metric, FunFolDes outperforms cst-ab initio in every instance, reaching, for some populations, levels of conservation similar to those found in more restrained flexible-backbone design approaches [10].
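The identity and similarity measures in panel B can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a position counts as similar when the residues are identical or when their substitution score is positive, the matrix is a small excerpt of BLOSUM62 values rather than the full table, and the sequences, core positions, and function names are hypothetical.

```python
# Small excerpt of BLOSUM62 substitution scores, for illustration only.
# A real analysis would load the complete matrix.
BLOSUM62_EXCERPT = {
    ("I", "V"): 3,
    ("I", "L"): 2,
    ("D", "E"): 2,
    ("K", "R"): 2,
    ("S", "T"): 1,
    ("A", "W"): -3,
    ("G", "L"): -4,
}


def substitution_score(a, b, matrix):
    """Look up a symmetric substitution score for the residue pair (a, b)."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    if (b, a) in matrix:
        return matrix[(b, a)]
    raise KeyError(f"no score available for pair {a}/{b}")


def core_recovery(design, target, core_positions, matrix):
    """Return (identity %, similarity %) over the given core positions.

    Assumption: similarity counts identical residues plus substitutions
    with a positive score.
    """
    identical = 0
    similar = 0
    for pos in core_positions:
        d, t = design[pos], target[pos]
        if d == t:
            identical += 1
            similar += 1
        elif substitution_score(d, t, matrix) > 0:
            similar += 1
    total = len(core_positions)
    return 100.0 * identical / total, 100.0 * similar / total


# Hypothetical sequences; every position is treated as core here.
target_seq = "IDKSAG"
design_seq = "VDRTWL"
identity, similarity = core_recovery(
    design_seq, target_seq, range(len(target_seq)), BLOSUM62_EXCERPT
)
```

In this toy case one of six positions is identical and three more are conservative substitutions, so similarity is considerably higher than identity.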

Fig 4.

Target-biased design of a protein binder and performance assessment based on saturation mutagenesis.

A) Depiction of the initial design task: a single-segment binding motif (BIM-BH3), shown in light blue cartoons, with its target (BHRF1), shown in gray surface, is used by FunFolDes to generate an ensemble of designs compatible with the binding mode shown in light orange cartoons. B) Conformational difference between the initial template (PDB ID: 3LHP), shown in light brown, and the previously designed binder (BINDI), shown in violet cartoons. Helix 3 requires a subtle but necessary shift (2.6 Å) to avoid steric clashes with the target. C-G) Scoring metrics for design populations according to the simulation mode. no_target: FunFolDes was used without the target protein; static: target present, no flexibility allowed; pack: target allowed to repack its side chains; packmin: side-chain repacking plus minimization and backbone minimization allowed for the target. Target flexibility was allowed during the relax-design cycles of FunFolDes. C) Structural drift observed for design and target binder, measured as the RMSD between pre- and post-minimization conformations. D) Structural recovery of the conformation observed in the BINDI-BHRF1 complex, assessed over the three helical segments of the bundle. E) Rosetta energy for the designs in the unbound state generated by the different simulation modes. F) Interaction energy (ΔΔG) between the designs and the target. G) Deep-sequencing score distribution for each design population, computed as the mean score of each sequence after applying a position score matrix based on the deep-sequencing data. The pack population slightly outperforms the other simulation modes. H) Per-residue scoring comparison of the no_target and pack populations according to the deep-sequencing data. Although the behavior is similar overall, pack outperforms no_target at multiple positions, several of which are highlighted (black dots) as interfacial contacts or second-shell residues close to the binding site.
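The scoring described for panel G, where each sequence receives the mean of its per-position scores from a position score matrix, can be sketched as below. The matrix values, sequences, and function name are hypothetical, and treating residues absent from the matrix as 0.0 is an assumption of this sketch; only the population labels `no_target` and `pack` come from the caption.

```python
from statistics import mean


def sequence_score(sequence, position_scores, default=0.0):
    """Mean per-position score of a sequence.

    position_scores: list with one dict per position, mapping residue -> score.
    default: score used for residues missing from the matrix (an assumption).
    """
    if len(sequence) != len(position_scores):
        raise ValueError("sequence and score matrix differ in length")
    return mean(
        position_scores[i].get(residue, default)
        for i, residue in enumerate(sequence)
    )


# Hypothetical three-position score matrix (not real deep-sequencing data).
toy_matrix = [
    {"A": 1.0, "L": -0.5},
    {"K": 0.5, "E": -1.0},
    {"W": 2.0},
]

# Hypothetical design populations, keyed by simulation mode.
populations = {
    "no_target": ["AEW", "LKG"],
    "pack": ["AKW", "AEW"],
}

# One score per sequence, grouped by population, ready for a distribution plot.
scores = {
    mode: [sequence_score(seq, toy_matrix) for seq in seqs]
    for mode, seqs in populations.items()
}
```

Collecting one score per sequence for each population gives the distributions that a plot like panel G would compare across simulation modes.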

Fig 5.

Functional design of a distant structural template.

A) Structural representation of 1kx8. The insertion region is colored in light red and the two disulfide bonds are labeled (CYD). B) Structural comparison between the insertion region of 1kx8 and the site II epitope (light red-filled silhouette). The local RMSD between the two segments is 2.37 Å. C) Superposition of the 1kx8_d2 design model (blue with red motif) and the 1kx8 template (wheat with light red insertion site). Multiple conformational shifts are required throughout the structure to accommodate the site II epitope. D) CD spectrum of 1kx8_d2 showing a typical alpha-helical pattern with ellipticity minima at 208 nm and 220 nm. E) 1kx8_d2 shows a melting temperature (T_m) of 43.4°C. F) Binding affinity determined by SPR. 1kx8_d2 shows a K_D of 1.14 nM. Experimental sensorgrams are shown in black and the fitted curves in red. G) Per-position evaluation of structural (top) and sequence (bottom) divergence between the design model 1kx8_d2 and the starting template 1kx8. The largest structural differences are observed in the epitope insertion region; the overall difference between the two structures is 2.25 Å (dashed line). Sequence divergence was evaluated using the BLOSUM62 score matrix, yielding a total of 13.5% identity and 38.5% similarity. The epitope region is colored in light red. Identical positions between 1kx8_d2 and 1kx8 are labeled with the one-letter residue code, while positively scored changes are labeled with a plus (+).

Fig 6.

Functionalization of the functionless de novo fold TOP7.

A) Structure of TOP7 with the insertion region highlighted in light red. B) Structural comparison between 101F and TOP7’s insertion region shows a 2.1 Å RMSD. C) TOP7_full model (blue, with the motif in red) superimposed on the TOP7 crystal structure. 101F’s insertion is structurally compensated mostly by the first pairing beta strand and a shift of the first alpha helix. D) CD spectrum showing a broad ellipticity signal between 210 nm and 222 nm, representative of mixed alpha and beta secondary structures. E) The T_m for TOP7_full was 54.5°C. F) Binding affinity determined by SPR. TOP7_full shows a K_D of 24.2 nM. Experimental sensorgrams are shown in black and the fitted curves in red. G) Per-position evaluation of structural (top) and sequence (bottom) divergence between the design model TOP7_full and the starting template TOP7. The largest structural differences are observed in the region downstream of the site IV epitope; the overall difference between the two structures is 1.5 Å (dashed horizontal line). The connecting loop between the strand that holds the epitope and the adjacent strand was also shortened to obtain a tighter connection between the two strands (dashed vertical region). Sequence divergence was evaluated by applying the BLOSUM62 score matrix to the sequences, yielding a total of 27.7% identity and 52.2% similarity. The epitope region is colored in light red. Identical positions between TOP7_full and TOP7 are displayed as their residue types, while positively scored changes according to BLOSUM62 are labeled with a plus (+).

Table 1.

Targets included in the conformational and sequence recovery benchmark.

For each benchmark target, the CATH superfamily and the representatives used in the simulations are indicated. (#) indicates the number of segments in the target protein that are considered motif. Motif range indicates the residues considered motif according to the PDB numbering.
