Fig 1.
This framework consists of two strictly separated modules, the sequence optimizer and the evaluator. The evaluator executes the chosen Rosetta protocol for each combination of a state sj and a sequence seqi. The resulting scores are processed by the fitness function and transferred to the sequence optimizer. Initially, the user has to specify a number of states s1, …, sn and a set of initial sequences seq1, …, seqm. MSF uses a GA to optimize the sequences according to their fitness. To use an SSD protocol in an MSD environment, the user has to adapt the protocol to the evaluator and specify a fitness function.
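The interplay of the two modules can be illustrated with a minimal sketch. It is not the MSF implementation: `toy_evaluator` is a hypothetical stand-in for a Rosetta protocol run, the fitness function (mean score over all states) is an assumption, and the GA is reduced to point mutation plus truncation selection.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_evaluator(state, seq):
    """Hypothetical stand-in for one protocol run on a (state, sequence) pair.
    Here the 'score' is the number of mismatches to a per-state target string."""
    return sum(a != b for a, b in zip(seq, state))

def fitness(scores):
    """Assumed fitness function: mean score over all states (lower is better)."""
    return sum(scores) / len(scores)

def evaluate(states, seq):
    """Evaluator module: score the sequence on every state, then combine."""
    return fitness([toy_evaluator(s, seq) for s in states])

def mutate(seq, rng):
    """Replace one randomly chosen position with a random amino acid."""
    pos = rng.randrange(len(seq))
    return seq[:pos] + rng.choice(AMINO_ACIDS) + seq[pos + 1:]

def optimize(states, population, generations, rng):
    """Sequence optimizer module: bare-bones GA that keeps the fittest sequences."""
    size = len(population)
    for _ in range(generations):
        offspring = [mutate(seq, rng) for seq in population]
        pool = population + offspring
        # Sort by fitness; the sequence itself breaks ties deterministically.
        pool.sort(key=lambda s: (evaluate(states, s), s))
        population = pool[:size]
    return population

rng = random.Random(0)
states = ["ACDEFG", "ACDEFG", "ACDKFG"]  # toy states s1..s3
initial = ["".join(rng.choice(AMINO_ACIDS) for _ in range(6)) for _ in range(8)]
final = optimize(states, initial, generations=50, rng=rng)
```

Because the optimizer only sees fitness values, the evaluator can be exchanged without touching the GA, which mirrors the strict separation described in the caption.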
Fig 2.
Performance of SSD and MSD on the NMR ensemble hIFABP.
enzdes (blue lines) was executed for 1000 runs i for each of the ten conformations in the ensemble. For each number of runs i, the energy value (dotted line) is the mean over the ten lowest-energy sequences (Eq 6), and the corresponding recovery value (solid line) is the mean recovery deduced from the same sequences (Eq 5). MSF:GA:enzdes (orange lines) was carried out for 800 generations j on the whole ensemble using a population of 210 sequences. For each generation j, the energy value (dotted line) is the mean over the ten lowest-energy sequences of the corresponding population (Eq 7), and the corresponding recovery value (solid line) is the mean recovery deduced from the same sequences (Eq 5).
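The two curve types can be illustrated with a short sketch. It assumes each design is available as an (energy, recovery) pair; the function name `summarize_lowest` and the toy numbers are hypothetical, and the exact definitions remain those of the equations cited in the caption.

```python
def summarize_lowest(designs, k=10):
    """Return (mean energy, mean recovery) of the k lowest-energy designs.

    designs: list of (energy, recovery) pairs; lower energy is better.
    """
    best = sorted(designs, key=lambda d: d[0])[:k]
    mean_energy = sum(energy for energy, _ in best) / len(best)
    mean_recovery = sum(recovery for _, recovery in best) / len(best)
    return mean_energy, mean_recovery

# Toy data: four designs, summarized over the two lowest-energy ones.
designs = [(-10.0, 0.5), (-12.0, 0.75), (-8.0, 1.0), (-11.0, 0.25)]
mean_energy, mean_recovery = summarize_lowest(designs, k=2)
```

Calling the function on the designs accumulated up to run i (or on the population of generation j) yields one point of the dotted and one point of the solid curve.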
Fig 3.
Convergence of SSD and MSD algorithms on the benchmark set BR_EnzBench.
enzdes (blue lines) was executed for 1000 runs i on all 20 conformations of each prot(k) from BR_EnzBench. For each number of runs i, the energy value (dotted line) is the mean over the twenty lowest-energy sequences (Eq 9), and the corresponding recovery value (solid line) is the mean recovery deduced from the same sequences (Eq 8). MSF:GA:enzdes (orange lines) was carried out for 600 generations j on all ensembles using a population of 210 sequences. For each generation j, the energy value (dotted line) is the mean over the five lowest-energy sequences of each of the four protein-specific ensembles (Eq 13), and the corresponding recovery value (solid line) is the mean recovery deduced from the same sequences (Eq 12).
Table 1.
Performance of SSD and MSD for individual proteins from BR_EnzBench.
Fig 4.
Performance of enzdes and MSF:GA:enzdes on a distinct grouping of conformations.
Each of the sets ES1–ES4 contains a quarter of the conformations from BR_EnzBench, which were grouped according to their nssrMSD values (Eq 16). ES1 contains all ensembles with the lowest and ES4 those with the largest recovery values. For each set ESi, the corresponding nssrSSD(ESi) and nssrMSD(ESi) values are represented by two boxplots. Left: performance of enzdes (blue boxplots); right: performance of MSF:GA:enzdes (orange boxplots). Whiskers indicate the lowest and the highest datum still within 1.5 times the interquartile range of the lower and upper quartile, respectively.
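The whisker convention can be made concrete with a small sketch. It assumes the common Tukey definition with linearly interpolated quartiles; the function name `whiskers` and the sample values are illustrative and not taken from the benchmark.

```python
import statistics

def whiskers(values):
    """Return (low, high): the extreme data points still within
    1.5 x IQR below the lower quartile and above the upper quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return min(inside), max(inside)

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
low, high = whiskers(sample)  # 50 lies beyond the upper fence and is an outlier
```

For this sample the quartiles are 3.25 and 7.75, so the fences are -3.5 and 14.5 and the whiskers end at the data points 1 and 9.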
Fig 5.
Recovery of design shell residues from BR_EnzBench by means of enzdes and MSF:GA:enzdes.
The distributions nssrSSD(aaj) (blue bars) and nssrMSD(aaj) (orange bars) represent for each amino acid aaj the nssr value (Eq 3) deduced from 13440 design sequences. These sequences were created by enzdes and MSF:GA:enzdes, respectively, for the benchmark proteins of BR_EnzBench. nssr takes into account the recovery of all residues that are similar to the native aaj.
Fig 6.
Recovery of two striking binding pockets by means of enzdes and MSF:GA:enzdes.
(a) The 3D structure of the binding pocket of ARL3-GDP is shown on the right; the ligand GDP is colored light blue. The residues of the corresponding design positions are shown on the left (labeled “Native”). The sequence logos labeled enzdes and MSF:GA:enzdes represent for each design position the distribution of residues as generated by the corresponding protocol. Residues that are similar to the native ones are colored green. In the native sequence, residues are colored teal if the outcome of the two protocols differs drastically. (b) The 3D structure of the binding pocket of the glucose binding protein is shown on the right; the bound glucose is colored light blue. Native residues and sequence logos are shown on the left and were prepared and colored as described for panel (a).
Fig 7.
Single-state designability of MD_EnzBench conformations.
Each of the 100 boxplots on the right represents 16 × 10 nssr values resulting from ten conformations generated by the MD simulation in a 100 ps interval for each of the 16 prot(k). As a control, the 16 × 20 values of (single) enzdes designs generated for 20 protein-specific conformations from BR_EnzBench were summarized in a boxplot shown on the left (label Backrub). Whiskers indicate the lowest and the highest values still within 1.5 times the interquartile range.
Fig 8.
Mutations introduced into the IGPS scaffold to design retro-aldolase activity.
(a) An overview of all mutations introduced in 42 previous designs, which are subsumed in the set RA* and listed in S3 Text. Blue spheres indicate residue positions, and sphere diameters are proportional to the frequency of the mutations in comparison to the native IGPS sequence. (b) The same representation for nine RA_MSD* designs; mutations are visualized by means of orange spheres.
Table 2.
MSD proteins and their retro-aldolase activity.