Fig 1.
Pseudocode describing the implementation of the RECON algorithm.
Fig 2.
Schematic showing proposed energy landscape of forced vs. encouraged sequence convergence in MSD.
By allowing each state to maintain its own sequence and explore sequence space independently, RECON is able to provide an intermediate solution in an MSD problem, enabling more rapid determination of a low-energy solution. Dashed lines represent forced convergence, where both states must adopt the same sequence (either AB or BA), whereas the solid line represents encouraged convergence, where state 1 can adopt sequence AB while state 2 adopts BA. This creates a lower-energy intermediate state, leading to more rapid adoption of the optimal solution, sequence BB.
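The contrast between the dashed and solid paths can be illustrated with a toy calculation. The sketch below is not taken from the paper: the energy values are invented, the names `ENERGY` and `fitness` are hypothetical, and fitness is assumed to be the simple sum of state energies.

```python
# Toy illustration of forced vs. encouraged convergence.
# All energies are invented for illustration (arbitrary units); they are
# NOT values from the paper. Fitness is assumed to be the sum over states.
ENERGY = {
    "state1": {"AA": 0.0, "AB": -1.0, "BA": 3.0, "BB": -4.0},
    "state2": {"AA": 0.0, "AB": 3.0, "BA": -1.0, "BB": -4.0},
}


def fitness(seq1, seq2):
    """Total energy when state 1 carries seq1 and state 2 carries seq2."""
    return ENERGY["state1"][seq1] + ENERGY["state2"][seq2]


# Forced convergence: both states must share one intermediate sequence.
forced_ab = fitness("AB", "AB")
forced_ba = fitness("BA", "BA")

# Encouraged convergence: each state keeps its own preferred intermediate.
encouraged = fitness("AB", "BA")

print("forced AB/AB:", forced_ab)
print("forced BA/BA:", forced_ba)
print("encouraged AB/BA:", encouraged)
print("final BB/BB:", fitness("BB", "BB"))
```

With these made-up numbers, either forced intermediate sits above the starting point (AA/AA), while the mixed intermediate sits below it, which is the qualitative picture the schematic proposes.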
Table 1.
Complexes used in common germline antibody benchmark.
Table 2.
Complexes used in promiscuous protein benchmark.
Fig 3.
Native/germline sequence recovery of designed complexes.
For each benchmark, 100 designs were generated using RECON, with both fixed backbone (FBB) and backbone minimized (BBM) protocols, and using MPI_MSD. Sequences of the top 10% of models were compared to either the native sequence or, in the case of common germline-derived antibodies, to the germline sequence. See Methods for details of native sequence recovery calculations.
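A minimal sketch of a recovery calculation of this kind follows. The exact definition is given in the paper's Methods and may differ; here it is assumed that recovery is the percentage of designed positions matching the reference sequence, that lower scores are better, and that the result is averaged over the top 10% of models. All function names are hypothetical.

```python
def sequence_recovery(design, reference, positions):
    """Percent of designed positions at which the design matches the reference."""
    if not positions:
        raise ValueError("no designed positions given")
    matches = sum(1 for i in positions if design[i] == reference[i])
    return 100.0 * matches / len(positions)


def top_fraction(models, fraction=0.10):
    """Return the lowest-scoring fraction of (score, sequence) pairs, at least one."""
    keep = max(1, int(len(models) * fraction))
    return sorted(models, key=lambda m: m[0])[:keep]


def mean_recovery(models, reference, positions, fraction=0.10):
    """Average recovery over the best-scoring fraction of models."""
    best = top_fraction(models, fraction)
    total = sum(sequence_recovery(seq, reference, positions) for _, seq in best)
    return total / len(best)
```

For example, `mean_recovery(models, germline, designed_positions)` would take a list of `(score, sequence)` pairs and zero-based position indices; both inputs are placeholders for whatever the design run actually produces.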
Table 3.
Results of common germline gene multi-specificity design benchmark.
Table 4.
Results of promiscuous protein multi-specificity design benchmark.
Fig 4.
Encouraging sequence convergence in RECON can avoid high-energy sequence intermediates.
A. An example design trajectory of RECON in the FI6v3 benchmark through four design rounds is shown. Sequences tend to diverge in early rounds, when convergence restraints are kept low, whereas in later rounds, when restraints are increased, states are encouraged to adopt a single solution. The figure displays one example from the fixed backbone design protocol, with convergence restraints removed before reporting fitness. The two states showed different preferences for residues highlighted in red. B. Residues highlighted in panel A were applied to the opposing state to analyze the energetic barrier of forced sequence convergence. The energy of these three residues was analyzed when the sequence favored by state 1 (TSY) was applied to state 2, and vice versa with the sequence QQW (intermediate sequence, black/red lines). This was compared to the three-residue fitness when each state was allowed to adopt its own preferred sequence (intermediate sequence, blue line). Energies were compared to the final, “compromised” sequence (QQY). These three amino acids occurred at positions 28, 30, and 53, respectively.
Table 5.
Comparison of CPU runtimes for multi-specificity design using different algorithms.
Fig 5.
Recapitulation of evolutionary sequence profiles by multi-specificity design.
A. For each protein in the benchmark set, an evolutionary sequence profile (top) was calculated and compared to the sequences generated by MSD (bottom). A similarity score was calculated for each position and averaged over designed positions to measure how well design searches biologically relevant sequence space. Highlighted are example positions where designed sequences either agreed (blue) or disagreed (red) with naturally occurring sequences. The figure displays the designed amino acid profile for a subset of positions in the VH5-51 benchmark set. See Methods for details on percent similarity calculation. Amino acids are colored according to chemical properties. B. RECON-generated designs were more similar to observed evolutionary sequence profiles than those produced by MPI_MSD. Percent similarity was averaged over designed positions that had been mutated by any design method. Plotted are mean and SEM values. Design protocols are colored as in panel D. C. Improvement in recapitulating evolutionary sequence profiles of RECON increases with the number of designed positions. For each benchmark set, the number of designed positions is plotted against the difference in evolutionary sequence similarity between RECON backbone minimized and MPI_MSD. Least-squares linear fit is shown, with an R value of 0.61 and P value of 0.02. D. Difference in recapitulation of evolutionary sequence profile for the four largest benchmark sets by designs generated by RECON using fixed backbone (FBB) or backbone minimization (BBM) protocols, or MPI_MSD. P values were calculated using a paired two-tailed t test.
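The per-position comparison in panel A can be sketched as follows. The paper's actual percent similarity score is defined in its Methods; the version below is a stand-in that assumes a simple overlap of amino acid frequencies (the sum of the smaller frequency for each residue type), and every function name is hypothetical.

```python
from collections import Counter


def column_profile(sequences, position):
    """Amino acid frequencies at one alignment position."""
    counts = Counter(seq[position] for seq in sequences)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def column_similarity(p, q):
    """Percent overlap of two frequency profiles (assumed measure, 0 to 100)."""
    residues = set(p) | set(q)
    return 100.0 * sum(min(p.get(aa, 0.0), q.get(aa, 0.0)) for aa in residues)


def mean_similarity(natural, designed, positions):
    """Average per-position similarity over the designed positions."""
    if not positions:
        raise ValueError("no designed positions given")
    scores = [
        column_similarity(column_profile(natural, i), column_profile(designed, i))
        for i in positions
    ]
    return sum(scores) / len(scores)
```

An overlap measure treats all mismatched residues as equally different; a score that credits chemically similar substitutions would need a substitution matrix, which this sketch deliberately leaves out.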
Table 6.
Comparison of design-generated sequences to evolutionary sequence profiles of input proteins.
Fig 6.
Structural analysis of sequence preferences of RECON and MPI_MSD.
At positions 32 (A), 33 (B), and 74 (C), RECON and MPI_MSD showed consistent differences in sequence preference in the VH3-23 benchmark. Circled in red are positions that differ between the two structures. Shown in parentheses are per-residue energy scores in REU summed across all post-minimization states. Shown above are post-minimization structures from designs generated by RECON and MPI_MSD. Structures shown in panels A and C are from the 1S78 complex, and those in panel B are from the 3BN9 complex.
Fig 7.
Incorporation of backbone motion into RECON recapitulates evolutionary sequence profiles in un-minimized structures.
Multi-specificity design using RECON was repeated on structures that had not been previously energy minimized to evaluate the benefit of incorporating backbone movements. Designs were generated using either a fixed backbone protocol (Fixed BB), alternating rounds of φ, ψ, and χ angle minimization (Minimize), or using backrub motions (Backrub). P values were calculated by a paired two-tailed t test.
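For reference, the test statistic behind a paired t test can be computed as below. This is a generic sketch, not the authors' analysis script: the function name is hypothetical, and the inputs are assumed to be matched per-benchmark values for two protocols. It returns only the t statistic and degrees of freedom; the two-tailed P value then comes from the t distribution, for which a library routine such as `scipy.stats.ttest_rel` is the usual choice.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    if len(a) < 2:
        raise ValueError("need at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

Pairing matters here because each benchmark set is designed under every protocol, so the test is applied to within-benchmark differences rather than to two independent groups.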