Rosetta:MSF: a modular framework for multi-state computational protein design

doi:10.1371/journal.pcbi.1005600

Fig 1.

Software architecture of MSF.

This framework consists of two strictly separated modules: the sequence optimizer and the evaluator. The evaluator executes the chosen Rosetta protocol for each combination of a state s_j and a sequence seq_i. The resulting scores are processed by the fitness function and passed to the sequence optimizer. Initially, the user has to specify a number of states s₁, …, s_n and a set of initial sequences seq₁, …, seq_m. MSF uses a genetic algorithm (GA) to optimize the sequences according to their fitness. To use a single-state design (SSD) protocol in a multi-state design (MSD) environment, the user has to adapt the protocol to the evaluator and specify a fitness function.
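The division of labor between evaluator, fitness function and sequence optimizer can be illustrated with a minimal Python sketch. It is not MSF's implementation: evaluate is a hypothetical toy stand-in for running a Rosetta protocol on a state (here a state is represented by a preferred sequence and the score counts mismatches), fitness is an assumed mean over all states, and the selection, crossover and mutation operators are generic GA choices rather than the ones MSF uses.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def evaluate(state, seq):
    """Hypothetical evaluator: toy 'energy' of seq in a given state.

    A real evaluator would run a Rosetta protocol; here a state is just a
    preferred sequence and the score is the number of mismatches to it.
    """
    return sum(a != b for a, b in zip(state, seq))


def fitness(scores):
    """Assumed fitness function: mean score over all states (lower is better)."""
    return sum(scores) / len(scores)


def score(seq, states):
    # Evaluator step: one score per (state, sequence) pair, reduced to a fitness.
    return fitness([evaluate(st, seq) for st in states])


def mutate(seq, rate, rng):
    # Replace each residue with a random amino acid with probability `rate`.
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq)


def crossover(a, b, rng):
    # Single-point crossover between two parent sequences of equal length.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def run_ga(states, population, generations, rng, elite=2, rate=0.05):
    for _ in range(generations):
        ranked = sorted(population, key=lambda s: score(s, states))
        # Sequence optimizer: keep the elites unchanged, breed the rest
        # from the fitter half of the population.
        parents = ranked[: max(2, len(ranked) // 2)]
        children = ranked[:elite]
        while len(children) < len(population):
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rate, rng))
        population = children
    return min(population, key=lambda s: score(s, states))
```

Because the best sequences are carried over unchanged (elitism), the best fitness in the population never gets worse from one generation to the next. Swapping evaluate for a call into a real design protocol would leave the optimizer untouched, which mirrors the strict separation of the two modules.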

Fig 2.

Performance of SSD and MSD on the NMR ensemble hIFABP.

enzdes (blue lines) was executed for 1000 runs i for each of the ten conformations in the ensemble. For each number of runs i, the value (dotted line) is the mean of the ten lowest-energy sequences (Eq 6). The corresponding value (solid line) is the mean recovery value deduced from the same sequences (Eq 5). MSF:GA:enzdes (orange lines) was carried out for 800 generations j on the whole ensemble using a population of 210 sequences. For each generation j, the value (dotted line) is the mean of the ten lowest-energy sequences of the corresponding population (Eq 7). The corresponding value (solid line) is the mean recovery value deduced from the same sequences (Eq 5).
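The two curves plotted per step (the mean energy of the ten lowest-energy sequences and the mean recovery of those same sequences) can be sketched as follows. Eqs 5 to 7 are not reproduced in this excerpt, so the sketch assumes plain sequence identity over the design positions as the recovery measure; the similarity-aware nssr used in the paper would replace the recovery function. The names top_k_stats and recovery are hypothetical.

```python
def recovery(seq, native):
    """Assumed recovery: fraction of design positions identical to the native residue."""
    return sum(a == b for a, b in zip(seq, native)) / len(native)


def top_k_stats(designs, native, k=10):
    """designs: list of (energy, sequence) pairs.

    Returns (mean energy, mean recovery) of the k lowest-energy designs;
    both means are taken over the same k sequences.
    """
    best = sorted(designs, key=lambda d: d[0])[:k]
    mean_energy = sum(e for e, _ in best) / len(best)
    mean_rec = sum(recovery(s, native) for _, s in best) / len(best)
    return mean_energy, mean_rec


# Example with k=2: averages over the designs with energies -10.0 and -5.0.
print(top_k_stats([(-10.0, "AAAA"), (-5.0, "AAAB"), (0.0, "BBBB")], "AAAA", k=2))
```

For the SSD curve, designs would hold all runs collected up to run i; for the MSD curve, the current GA population of generation j.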

Fig 3.

Convergence of SSD and MSD algorithms on the benchmark set BR_EnzBench.

enzdes (blue lines) was executed for 1000 runs i on all 20 conformations of each prot(k) from BR_EnzBench. For each number of runs i, the value (dotted line) is the mean of the twenty lowest-energy sequences (Eq 9). The corresponding value (solid line) is the mean recovery value deduced from the same sequences (Eq 8). MSF:GA:enzdes (orange lines) was carried out for 600 generations j on all ensembles using a population of 210 sequences. For each generation j, the value (dotted line) is the mean of the five lowest-energy sequences of each of the four protein-specific ensembles (Eq 13). The corresponding value (solid line) is the mean recovery value deduced from the same sequences (Eq 12).

Table 1.

Performance of SSD and MSD for individual proteins from BR_EnzBench.

Fig 4.

Performance of enzdes and MSF:GA:enzdes on a distinct grouping of conformations.

Each of the sets ES₁ to ES₄ contains a quarter of the conformations from BR_EnzBench, grouped according to their nssr_MSD values (Eq 16). ES₁ contains the ensembles with the lowest recovery values and ES₄ those with the highest. For each set ES_i, the corresponding nssr_SSD (ES_i) and nssr_MSD (ES_i) values are represented by two boxplots. Left: performance of enzdes (blue boxplots); right: performance of MSF:GA:enzdes (orange boxplots). Whiskers indicate the lowest and the highest datum still within 1.5 times the interquartile range.
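The whisker rule stated above is the usual Tukey convention: each whisker ends at the most extreme data point that still lies within 1.5 × IQR of the nearer quartile (matplotlib's boxplot default, whis=1.5, behaves this way). A minimal sketch using Python's statistics module follows; quartile interpolation methods differ between tools, and the plotting software used for the figure is not stated, so exact whisker ends may differ slightly.

```python
import statistics


def tukey_whiskers(values):
    """Return (low, high) whisker ends using the 1.5 * IQR rule."""
    # Quartiles with linear interpolation between data points ("inclusive").
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the most extreme data still inside the fences;
    # anything beyond them would be drawn as an outlier.
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    return min(inside), max(inside)


print(tukey_whiskers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))  # 100 lies outside the upper fence
```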

Fig 5.

Recovery of design shell residues from BR_EnzBench by means of enzdes and MSF:GA:enzdes.

The distributions nssr_SSD (aa_j) (blue bars) and nssr_MSD (aa_j) (orange bars) give, for each amino acid aa_j, the nssr value (Eq 3), each deduced from 13440 design sequences that were created by enzdes or MSF:GA:enzdes, respectively, for the benchmark proteins of BR_EnzBench. nssr takes into account the recovery of all residues that are similar to the native aa_j.
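A per-amino-acid, similarity-aware recovery in the spirit of nssr (aa_j) can be sketched as below. Eq 3 is not part of this excerpt, so both the similarity groups in SIMILARITY_GROUPS and the counting scheme (for each native amino acid, the fraction of designed residues at its positions that fall into the same group, identical residues included) are assumptions for illustration, not the paper's definition.

```python
from collections import defaultdict

# Hypothetical similarity groups; Eq 3 may use a different scheme.
SIMILARITY_GROUPS = ["AVILM", "FWY", "KRH", "DE", "STNQ", "G", "P", "C"]
GROUP_OF = {aa: i for i, grp in enumerate(SIMILARITY_GROUPS) for aa in grp}


def similar(a, b):
    """Two residues count as similar if they share a (hypothetical) group."""
    return GROUP_OF[a] == GROUP_OF[b]


def nssr_per_aa(native, designs):
    """Fraction of designed residues similar to the native one, per native amino acid.

    Assumes every design is aligned position by position with `native`.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for seq in designs:
        for nat, des in zip(native, seq):
            totals[nat] += 1
            hits[nat] += similar(nat, des)  # bool adds as 0 or 1
    return {aa: hits[aa] / totals[aa] for aa in totals}


print(nssr_per_aa("AKD", ["AKD", "VRE", "GAD"]))
```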

Fig 6.

Recovery of two striking binding pockets by means of enzdes and MSF:GA:enzdes.

(a) The 3D structure of the binding pocket of ARL3-GDP is shown on the right; the ligand GDP is colored light blue. The residues at the corresponding design positions are shown on the left (labeled “Native”). The sequence logos labeled enzdes and MSF:GA:enzdes show, for each design position, the distribution of residues generated by the corresponding protocol. Residues that are similar to the native ones are colored green. In the native sequence, residues are colored teal if the outcomes of the two protocols differ drastically. (b) The 3D structure of the binding pocket of the glucose binding protein is shown on the right; the bound glucose is colored light blue. Native residues and sequence logos are shown on the left and were prepared and colored as described for panel (a).

Fig 7.

Single-state designability of MD_EnzBench conformations.

Each of the 100 boxplots on the right represents 16 × 10 nssr values, resulting from the ten conformations generated by the MD simulation within a 100 ps interval for each of the 16 prot(k). As a control, the 16 × 20 nssr values of single-state enzdes designs generated for the 20 protein-specific conformations from BR_EnzBench are summarized in the boxplot on the left (labeled Backrub). Whiskers indicate the lowest and the highest datum still within 1.5 times the interquartile range.

Fig 8.

Mutations introduced into the IGPS scaffold to design retro-aldolase activity.

(a) Overview of all mutations introduced in the 42 previous designs that make up the set RA*, which are listed in S3 Text. Blue spheres indicate mutated residue positions; sphere diameters are proportional to how frequently each position is mutated relative to the native IGPS sequence. (b) The same for the nine RA_MSD* designs, with mutations visualized as orange spheres.
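The sphere sizes encode how often each position is mutated relative to the native IGPS sequence. Assuming all designs are gap-free and aligned position by position with the native scaffold, that per-position frequency could be computed as in the following sketch; the function name mutation_frequency and the 1-based numbering are illustrative choices, and sphere diameters would then be drawn proportional to the returned values.

```python
def mutation_frequency(native, designs):
    """Map 1-based position -> fraction of designs mutated there.

    Only positions mutated in at least one design are returned, i.e. the
    positions that would receive a sphere. Assumes aligned, gap-free designs.
    """
    n = len(designs)
    freqs = {}
    for i, nat in enumerate(native):
        count = sum(d[i] != nat for d in designs)
        if count:
            freqs[i + 1] = count / n
    return freqs


print(mutation_frequency("MKT", ["MKT", "MRT", "ARS"]))
```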

Table 2.

MSD proteins and their retro-aldolase activity.
