Fig 1.
Overview of the DeepFold pipeline.
Starting from a query amino acid sequence, DeepMSA2 is used to search the query against multiple whole-genome and metagenome sequence databases to create a multiple sequence alignment (MSA). The MSA is then used by DeepPotential to derive input features based on co-evolutionary analyses for the deep ResNet training. DeepPotential outputs the probability distribution of the Cβ-Cβ/Cα-Cα contact and distance maps as well as the inter-residue orientations. These restraint potentials along with the inherent statistical energy function are used to guide the L-BFGS folding simulations for final full-length structure model construction.
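As a rough illustration of how a predicted distance distribution can act as a restraint potential, the sketch below converts per-bin probabilities into energies with a negative logarithm. The caption does not give the functional form, so the negative-log form, the function name `distance_potential`, and the pseudocount `eps` are all assumptions made for illustration.

```python
import math


def distance_potential(bin_probs, eps=1e-4):
    """Convert per-bin distance probabilities into per-bin energies.

    Assumption: energy = -log(p + eps). The pseudocount eps keeps the
    energy finite for bins with zero predicted probability.
    """
    if any(p < 0 for p in bin_probs):
        raise ValueError("probabilities must be non-negative")
    return [-math.log(p + eps) for p in bin_probs]


# Hypothetical distribution over four distance bins for one residue pair.
probs = [0.05, 0.70, 0.20, 0.05]
energies = distance_potential(probs)
# The most probable bin receives the lowest energy.
best_bin = min(range(len(energies)), key=energies.__getitem__)
```

Under this assumption, a gradient-based minimizer such as L-BFGS is pulled toward the distance bins that the network considers most likely.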
Fig 2.
Contribution of the various spatial restraints and energy terms to the DeepFold modeling accuracy. The violin plot shows the TM-scores of DeepFold models built with different combinations of energy terms/restraints on the 221 test proteins.
Fig 3.
Illustrative folding examples from DeepFold.
A) Case study from SCOPe protein d1jqpa1 that demonstrates the importance of inter-residue orientations for folding β-proteins, where the native structure is shown in yellow, and the superposed predicted models built without (left) and with (right) orientation restraints are shown in blue. B) Case study from SCOPe protein d1xsza2, which highlights the importance of the general energy function for improving the physical quality of the models. The models built without (left) and with (right) the general physical energy function are depicted in rainbow coloring, where the clashing region is shown in the inset on the left and the clashes have been resolved in the model built with the general energy function on the right.
Table 1.
Summary of the structure modeling results by DeepFold and the control methods on the 221 test proteins.
The p-values were calculated between DeepFold and the control methods using paired, two-sided Student’s t-tests.
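For readers unfamiliar with the paired test, the following minimal sketch computes the paired t statistic from per-protein scores of two methods. The function name `paired_t_statistic` and the example scores are hypothetical and are not data from the study.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test.

    Each position in the two lists must refer to the same protein.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical TM-scores for three proteins from two methods.
t_value, dof = paired_t_statistic([0.6, 0.7, 0.8], [0.5, 0.5, 0.5])
```

The two-sided p-value follows from the t-distribution with n-1 degrees of freedom. In practice a library routine such as `scipy.stats.ttest_rel` returns the statistic and the p-value together.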
Fig 4.
Head-to-head TM-score comparisons between DeepFold and other protein structure prediction methods: A) I-TASSER; B) C-I-TASSER; C) DMPfold; D) trRosetta; E) AlphaFold. (A-D) are based on the 221 Hard benchmark proteins, while (E) is based on 31 FM targets from CASP13.
Fig 5.
Dependence of the simulation time and TM-score on protein length.
A) Simulation runtime in minutes for QUARK, trRosetta, and DeepFold plotted against the protein length. B) A close-up of the runtime vs. protein length for DeepFold and trRosetta. C) Analysis of the average TM-score for DeepFold, QUARK, and Rosetta across different protein length ranges.
Fig 6.
Comparison of DeepFold and QUARK modeling results.
A) Evaluation of the modeling accuracy of QUARK and DeepFold guided by different numbers of spatial restraints, where the top n*L distances were selected by sorting the Cβ distances according to their predicted probabilities. B) Analysis of the conformational changes that occurred during the QUARK fragment assembly simulations. The figure plots the TM-score of the decoy at REMC cycle i compared to the decoy at the previous cycle i-1. The right-hand side shows the final QUARK model in red superposed with the native structure in cyan. C) Analysis of the conformational changes that occurred during the DeepFold simulations. The figure plots the TM-score of the decoy at L-BFGS step i compared to the decoy at the previous step i-1, where the right-hand side shows the final DeepFold model in red superposed with the native structure in cyan. D) Comparison between the DeepFold model at L-BFGS step 100 (blue) with the model at step 1100 (red) and the experimental structure (cyan). The insets show the areas of the structure that changed the most after the 100th L-BFGS step.
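The restraint selection described for panel A can be sketched as follows. The tuple layout `(i, j, probability)`, the function name `select_top_restraints`, and the truncation of n*L to an integer are assumptions made for illustration, not details taken from the source.

```python
def select_top_restraints(predictions, seq_len, n):
    """Keep the top n*L residue-pair restraints ranked by probability.

    predictions: list of (i, j, probability) tuples for residue pairs.
    seq_len:     protein length L.
    n:           multiplier applied to L (may be fractional).
    """
    if seq_len <= 0 or n < 0:
        raise ValueError("seq_len must be positive and n non-negative")
    k = int(n * seq_len)  # assumption: truncate n*L to an integer count
    ranked = sorted(predictions, key=lambda r: r[2], reverse=True)
    return ranked[:k]


# Hypothetical predictions for a 4-residue toy sequence.
preds = [(0, 5, 0.9), (1, 7, 0.2), (2, 9, 0.6), (3, 8, 0.4)]
top = select_top_restraints(preds, seq_len=4, n=0.5)
```

Increasing n admits lower-confidence restraints, which is the quantity varied along the horizontal axis in this kind of analysis.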