Protein loop structure prediction by community-based deep learning and its application to antibody CDR H3 loop modeling

doi:10.1371/journal.pcbi.1012239

Fig 1.

The overall workflow of ComMat.

At each cycle, a community of size N, represented by single, pair, and structure features, progresses through updates facilitated by communication via cross-over and pair feature update.

Fig 2.

Performance comparison between the ‘With cross-over’ and ‘No cross-over’ prediction setups on the IgFold set.

(a) Structure prediction errors measured by loop RMSD for best sampled and top scored models with and without cross-over (N = 32). Box plots display the median, interquartile range (IQR) bounds, whisker length of 1.5 × IQR, and outliers beyond the 1.5 × IQR range. (b) Success rates of best sampled and top scored structures with and without cross-over. (c) Best RMSD obtained by the two algorithms for individual targets. (d) Example case (PDB ID 7WPH) where best sampled and top scored models with cross-over (RMSD = 1.34 Å and 0.93 Å, respectively) outperform those without cross-over (3.06 Å and 2.75 Å, respectively).
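The box-plot convention described in panel (a) can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `box_plot_stats`, the linear-interpolation quartile method, and the choice to draw whiskers to the most extreme data point inside the 1.5 × IQR fences are all assumptions, and the input values are toy numbers rather than reported RMSDs.

```python
from statistics import quantiles


def box_plot_stats(values):
    """Summarize a sample the way a standard box plot does.

    Assumption: quartiles use linear interpolation over the sorted data
    (method="inclusive"), and whiskers stop at the most extreme points
    that still lie within 1.5 x IQR of the box.
    """
    data = sorted(float(v) for v in values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1

    # Fences mark the farthest a whisker may reach.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr

    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]

    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Toy RMSD-like values in angstroms (illustrative only).
stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 20])
print(stats)
```

With these toy values the box spans 3 to 7, the upper whisker stops at 8, and 20 is drawn as an outlier because it lies beyond the upper fence at 13.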

Table 1.

Median CDR H3 loop RMSD of the best sampled structures for the test complexes in the IgFold set for different methods.

Table 2.

Median CDR H3 loop RMSD of the top-ranking structures for the test complexes in the IgFold set for different methods.
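The difference between the "best sampled" statistic of Table 1 and the "top-ranking" statistic of Table 2 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' evaluation code: the function names, the dictionary layout, the hypothetical target labels, and the convention that a higher score is better are all introduced here, and the numbers are toy values.

```python
from statistics import median


def best_sampled_rmsd(rmsds):
    """Lowest RMSD among all sampled models for one target."""
    return min(rmsds)


def top_scored_rmsd(rmsds, scores, higher_is_better=True):
    """RMSD of the model that the scoring function ranks first.

    Assumption: one score per model; higher is better by default.
    """
    pick = max if higher_is_better else min
    top_index = pick(range(len(scores)), key=lambda i: scores[i])
    return rmsds[top_index]


def summarize(targets):
    """Return (median best-sampled RMSD, median top-scored RMSD)."""
    best = [best_sampled_rmsd(t["rmsd"]) for t in targets.values()]
    top = [top_scored_rmsd(t["rmsd"], t["score"]) for t in targets.values()]
    return median(best), median(top)


# Hypothetical targets with toy RMSDs (angstroms) and model scores.
toy_targets = {
    "target_a": {"rmsd": [2.0, 1.0, 3.0], "score": [0.9, 0.5, 0.1]},
    "target_b": {"rmsd": [4.0, 2.5], "score": [0.2, 0.8]},
    "target_c": {"rmsd": [0.8, 1.6, 1.2], "score": [0.3, 0.4, 0.9]},
}

median_best, median_top = summarize(toy_targets)
print(median_best, median_top)
```

The best sampled value requires knowing the experimental structure to pick the model, so it bounds what sampling can achieve, while the top-scored value reflects what a user would obtain from the scoring function alone.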

Fig 3.

Comparison of predictions by ComMat with AF2Rank (N = 32) against (a) ImmuneBuilder and (b) AlphaFold-Multimer 2.2 for individual prediction targets, illustrating that different methods excel on different targets. (c) An example case (PDB ID: 7S0B) in which ComMat achieved higher prediction accuracy than both ImmuneBuilder and AlphaFold-Multimer 2.2.

Fig 4.

t-SNE visualization of the loop structure sampling trajectory for an antibody that binds the glucosyltransferase domain of Clostridium difficile toxin B (PDB ID 7SO5, H3 loop length = 13, maximum sequence identity to the training set = 38.5%) by (a) a single-inference sampling with ComMat trained with cross-over (N = 32) and (b) 32 independent sampling runs with ComMat without cross-over.

The structures obtained through the eight cycles of ComMat are indicated with round dots. Crystal structures, the best sampled and scored models by ComMat, and predictions by IgFold and AF-Multimer are denoted with different symbols. Comparison of the H3 loop structures is presented in (c), (d), and (e).

Fig 5.

t-SNE visualization of the loop structure sampling trajectory for the SARS-CoV-2-binding antibody (PDB ID 7SN1, H3 loop length = 14, maximum sequence identity to the training set = 50.0%) by (a) a single-inference sampling with ComMat trained with cross-over (N = 32) and (b) 32 independent sampling runs with ComMat without cross-over.

The structures obtained through the eight cycles of ComMat are indicated with round dots. Crystal structures, the best sampled and scored models by ComMat, and predictions by IgFold and AF-Multimer are denoted with different symbols. Comparison of the H3 loop structures is presented in (c), (d), and (e).

Fig 6.

Dependence of the RMSD of the best sampled loops on community size for different loop length ranges.
