Improved protein complex prediction with AlphaFold-multimer by denoising the MSA profile

doi:10.1371/journal.pcbi.1012253

Fig 1.

a) The AFProfile method. Starting with MSAs generated by the default AlphaFold-multimer pipeline, sequences are sampled and MSA features are created. These features are used to predict the structure with AlphaFold-multimer; among the most important is the cluster profile. We learn a residual to this profile, the cluster bias, which we suggest effectively denoises the MSA profile into a representation that generates a higher confidence score and a more accurate structure. This is analogous to sharpening a blurry image. b) Average predicted model confidence vs time in hours on one NVIDIA A100 GPU using gradient descent for the 7 CASP15 targets (H1134, T1123, T1173, H1141, H1144, H1140 and T1187). A learning rate (lr) of 1e-4 with the Adam optimiser and 20 recycles was used here (Methods). Example models for T1123 (green) are shown at different points of predicted confidence, superposed on the top-ranked CASP15 model (grey). In total, 100 optimisation steps were performed per target (n = 700). c) Confidence vs MMscore across the 7 CASP15 targets (H1134, T1123, T1173, H1141, H1144, H1140 and T1187). For each target, 100 optimisation steps were performed (n = 700). The Spearman R is 0.68, with the lr set to 1e-4 and the number of recycles set to 10 (Methods). A density plot using all samples (n = 700) and a running mean using a step of 0.05 confidence are shown. d) Examples from CASP15 with the best prediction in grey and the AlphaFold-multimer (AFM) and AFProfile models in green for targets H1144, T1123 and T1173. For T1173, the MMscore improves from 0.49 with AFM to a perfect 1.0 with AFProfile. For H1144, one of the chains is in the wrong orientation with AFM, while the right configuration is found with AFProfile (MMscore = 0.84). For T1123, both chains are slightly wrong with AFM (MMscore = 0.55), while AFProfile improves the score to 0.77 (a model is considered accurate at MMscore>0.75).
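The idea in panel a, learning an additive bias on a profile by gradient steps with Adam so that a confidence score rises, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `toy_confidence` is a hypothetical stand-in for the network's predicted confidence (in the real method the gradient would come from backpropagation through AlphaFold-multimer), the `clean` profile exists only to define that toy score, and the Adam hyperparameters other than the learning rate of 1e-4 are common defaults rather than values taken from the paper.

```python
import math

def toy_confidence(features, clean):
    """Hypothetical confidence: 1 minus mean squared distance to a 'clean' profile."""
    n = len(features)
    return 1.0 - sum((f - c) ** 2 for f, c in zip(features, clean)) / n

def toy_confidence_grad(features, clean):
    """Analytic gradient of toy_confidence with respect to the features."""
    n = len(features)
    return [-2.0 * (f - c) / n for f, c in zip(features, clean)]

def learn_bias(profile, clean, lr=1e-4, steps=100,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """Learn an additive bias on `profile` with Adam, maximising the toy confidence."""
    size = len(profile)
    bias = [0.0] * size   # the learned residual, initialised to zero
    m = [0.0] * size      # Adam first-moment estimate
    v = [0.0] * size      # Adam second-moment estimate
    history = [toy_confidence(profile, clean)]
    for t in range(1, steps + 1):
        features = [p + b for p, b in zip(profile, bias)]
        grad = toy_confidence_grad(features, clean)
        for i, g in enumerate(grad):
            m[i] = beta1 * m[i] + (1 - beta1) * g
            v[i] = beta2 * v[i] + (1 - beta2) * g * g
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            # Plus sign: we ascend the confidence instead of descending a loss.
            bias[i] += lr * m_hat / (math.sqrt(v_hat) + eps)
        history.append(toy_confidence([p + b for p, b in zip(profile, bias)], clean))
    return bias, history

# Made-up three-column profile, for illustration only.
bias, history = learn_bias([0.2, 0.5, 0.3], [0.1, 0.8, 0.1])
print(history[0], history[-1])
```

Only the bias is updated; the input profile itself is left unchanged, which mirrors the description of the cluster bias as a residual.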


Table 1.

Prediction results on 7 CASP15 targets for the Wallner group (AFsample), AFM and AFProfile.

The AFsample MMscores are higher on average, 0.97 vs 0.76 for AFProfile. Compared to AFM, AFProfile increases the MMscore by 0.12 on average (0.76 vs 0.64), making 3 additional targets successful (MMscore>0.75). The successful models (MMscore>0.75) are marked in bold.


Fig 2.

a) MMscores of difficult targets that have low ranking confidence (RC, Eq 1) with AFM (n = 487), compared to AFProfile. In total, 33% of the failed examples can be rescued and selected (MMscore>0.75 and RC>0.8) with AFProfile. b) MMscore vs confidence. The density represents all predictions (100 per target, n = 48700) and the line is the running mean using a step size of 0.01 confidence. c) Density plot of the confidence vs iteration of gradient descent with AFProfile (n = 48700). At higher iterations, the density concentrates at high confidence. According to panel b, this region is more likely to have high MMscores.
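The rescue criterion in panel a can be written as a short calculation. This is a minimal sketch under stated assumptions: a target is treated as "failed" when its initial MMscore is below 0.75, and as "rescued and selected" when the optimised model has MMscore>0.75 and RC>0.8. The function name, the tuple layout and the example values are hypothetical and are not data from the paper.

```python
def rescued_fraction(results, mm_cut=0.75, rc_cut=0.8):
    """Fraction of initially failed targets that are rescued and selected.

    results: list of (initial_mmscore, final_mmscore, final_rc), one per target.
    """
    # Assumption: "failed" means the initial prediction is below the MMscore cutoff.
    failed = [r for r in results if r[0] < mm_cut]
    if not failed:
        return 0.0
    # Rescued: accurate after optimisation AND confident enough to be selected.
    rescued = [r for r in failed if r[1] > mm_cut and r[2] > rc_cut]
    return len(rescued) / len(failed)

# Made-up values for illustration only.
example = [
    (0.30, 0.90, 0.85),  # rescued and selected
    (0.50, 0.80, 0.60),  # accurate, but RC too low to be selected
    (0.40, 0.45, 0.30),  # still inaccurate
    (0.60, 0.78, 0.82),  # rescued and selected
]
print(rescued_fraction(example))
```

Requiring both conditions matters: a model that becomes accurate but keeps a low RC could not be picked out without knowing the native structure.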


Fig 3.

a) Final MMscore vs structural change during the optimisation procedure as measured by the change in MMscore (ΔMMscore). The density and points represent all predictions (n = 487) and the line is the running mean with a step size of 0.05 ΔMMscore. This shows that the MMscore is unlikely to decrease during optimisation and that a large structural change relative to the initial prediction (measured by ΔMMscore) is likely to yield a higher MMscore. This is expected, as the structures predicted with AFM are inaccurate (MMscore<0.75). b) Initial and AFProfile-optimised structures of PDB IDs 6nnw (https://www.rcsb.org/structure/6NNW) and 6ya2 (https://www.rcsb.org/structure/6ya2). The native structure is in grey and the predicted ones are in blue. During the optimisation, the MMscores increased 0.44→0.96 and 0.52→0.93 and the confidences 0.22→0.90 and 0.24→0.85, respectively. c) Number of sequences in the MSA input representation generated by the AFM pipeline (MSA depth) vs the final MMscore after optimisation with AFProfile. The density represents all predictions (n = 487) and the line is the running mean with a step size of 500 in MSA depth.
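The running means in these panels are described only by their step size. One plausible reading, assumed here and not confirmed by the captions, is the mean of the y values within consecutive x bins of width `step`. The sketch below implements that reading; the function name and example values are hypothetical.

```python
import math
from collections import defaultdict

def binned_mean(xs, ys, step):
    """Mean of ys within consecutive bins of width `step` along xs.

    Returns a list of (bin_left_edge, mean_y), sorted by bin, skipping empty bins.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y in zip(xs, ys):
        b = math.floor(x / step)  # index of the bin that contains x
        sums[b] += y
        counts[b] += 1
    return [(b * step, sums[b] / counts[b]) for b in sorted(sums)]

# Made-up points for illustration: x could be ΔMMscore (step 0.05)
# or MSA depth (step 500); y the final MMscore.
xs = [0.01, 0.02, 0.06, 0.09, 0.12]
ys = [0.2, 0.4, 0.5, 0.7, 0.9]
for left_edge, mean_y in binned_mean(xs, ys, step=0.05):
    print(left_edge, mean_y)
```

Values that fall exactly on a bin edge can land in either neighbouring bin because of floating-point division, so integer-valued x (such as MSA depth) bins more predictably than fractional x.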


Fig 4.

MMscore vs confidence for all 7 complexes using 100 iterations per complex (n = 700) from the CASP15 set.

The learning rate (lr) increases row-wise and the number of recycles column-wise. The Spearman correlations (R) are displayed in the figure legend. Some correlations are negative. The best correlation (0.68) is obtained by using lr = 0.0001 and 10 recycles.
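For readers unfamiliar with the statistic, the Spearman correlation is the Pearson correlation of the ranks of the two variables, which is why it can be negative when confidence and MMscore are ordered in opposite ways. The following is a minimal pure-Python sketch with average ranks for ties; the function names and example values are illustrative assumptions, not the code or data used to produce the figure.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def spearman(x, y):
    """Spearman R: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up confidence and MMscore values for illustration only.
confidence = [0.1, 0.2, 0.3, 0.4, 0.5]
mmscore = [0.2, 0.1, 0.4, 0.3, 0.5]
print(spearman(confidence, mmscore))
```

Because only the ordering enters the calculation, a setting with a high Spearman R is one where ranking models by confidence tends to rank them by MMscore as well.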
