Paraplume: A fast and accurate antibody paratope prediction method provides insights into repertoire-scale binding dynamics

doi:10.1371/journal.pcbi.1013981

Fig 1.

(A) Antibody (blue) binding to an antigen (green), illustrated using the structure of the variable domain (Fv region) of the mouse anti-lysozyme antibody (PDB code: 1BVK).

Amino acids are represented by their alpha-carbon (Cα) atoms, and the paratope is colored red. An amino acid is labeled as belonging to the paratope if any of its non-hydrogen atoms lies within 4.5 Å of a non-hydrogen antigen atom. (B) The pipeline used for paratope prediction. The antibody sequence is given as input to several protein language models (PLMs); the last-layer embeddings of these PLMs are concatenated and fed to a multi-objective multilayer perceptron (MLP), which outputs, for each amino acid, the probability that it belongs to the paratope.
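The two steps in this caption, the 4.5 Å contact rule that defines the paratope labels and the concatenate-then-MLP prediction pipeline, can be sketched as follows. This is a minimal illustration under stated assumptions, not Paraplume's code: the inclusive distance boundary, the ReLU hidden layer, the single output head (the real MLP is multi-objective), the embedding sizes, and the helper names `label_paratope` and `predict_paratope` are all hypothetical, and the weights are random rather than trained.

```python
import numpy as np

CONTACT_CUTOFF = 4.5  # Å, heavy-atom contact distance from the Fig 1 caption


def label_paratope(antibody_residues, antigen_coords, cutoff=CONTACT_CUTOFF):
    """Return a 0/1 paratope label for each antibody residue.

    antibody_residues: list of (n_i, 3) arrays of non-hydrogen atom
        coordinates, one array per residue (hydrogens already removed).
    antigen_coords: (m, 3) array of non-hydrogen antigen atom coordinates.
    The boundary is treated as inclusive (<= cutoff), an assumption.
    """
    labels = []
    for coords in antibody_residues:
        # All pairwise distances between this residue's atoms and the antigen atoms.
        dists = np.linalg.norm(coords[:, None, :] - antigen_coords[None, :, :], axis=-1)
        labels.append(int(np.any(dists <= cutoff)))
    return np.array(labels)


def predict_paratope(plm_embeddings, w1, b1, w2, b2):
    """Toy single-head MLP over concatenated per-residue PLM embeddings.

    plm_embeddings: list of (L, d_k) arrays, the last-layer embeddings of one
        L-residue sequence from each PLM. Returns L probabilities.
    """
    x = np.concatenate(plm_embeddings, axis=1)  # (L, sum of d_k)
    h = np.maximum(0.0, x @ w1 + b1)            # hidden layer with ReLU (assumed)
    logits = h @ w2 + b2                        # (L,)
    return 1.0 / (1.0 + np.exp(-logits))        # sigmoid: per-residue probability


# Usage with random, untrained weights and made-up embedding sizes (8 and 4).
rng = np.random.default_rng(0)
L = 10
embeddings = [rng.normal(size=(L, 8)), rng.normal(size=(L, 4))]
w1, b1 = rng.normal(size=(12, 16)), np.zeros(16)
w2, b2 = rng.normal(size=16), 0.0
probs = predict_paratope(embeddings, w1, b1, w2, b2)
```

Concatenation keeps each PLM's features side by side, so the MLP can learn how much to rely on each model at every residue.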

Fig 2.

Visualization of the average Paraplume prediction probability (top) and PLM raw importance score (bottom) across IMGT positions for 218 unseen sequences.

IMGT positions are labeled every two residues for clarity. Predictions are made on the Paragraph test set using Paraplume trained on the Paragraph training set. The importance scores were computed using Shapley values, as described in Methods [21]. Paratope predictions localize primarily to the CDRs, where general-purpose PLMs contribute less than antibody-specific PLMs. Notably, Paraplume identifies binding residues within the non-canonical FW3 DE loop, a region inaccessible to CDR-restricted methods.
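The caption defers the details of the importance scores to Methods [21]. As a generic sketch of the underlying technique, exact Shapley values can be computed by averaging each player's marginal contribution over all coalitions of the other players. Here the players are assumed to be the PLMs, and `value` is a hypothetical coalition score (for example, validation performance using only those PLMs); the paper may instead attribute importance per IMGT position or use an approximation.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player for a coalition value function.

    players: list of player names (here, hypothetically, the PLMs).
    value: callable mapping a frozenset of players to a score, e.g. the
        validation performance of a predictor restricted to those PLMs.
    Cost grows as 2**n, which is manageable for a handful of PLMs.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # size of the coalition that p joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical toy value function: plm_a and plm_b only help together.
def toy_value(coalition):
    return 1.0 if {"plm_a", "plm_b"} <= coalition else 0.0


scores = shapley_values(["plm_a", "plm_b", "plm_c"], toy_value)
```

In the toy example the credit for the joint effect is split equally between `plm_a` and `plm_b`, while `plm_c` receives zero, which is the symmetry and null-player behavior that makes Shapley values a principled attribution.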

Table 1.

Comparison of methods that use sequences as inputs.

Paragraph and PECAN model the 3D structures from the sequences with ABodyBuilder (ABB) [10], while MIPE uses AlphaFold2 (AF2) and requires both antibody and antigen sequences. All other methods operate directly on sequences without requiring structural modeling. The table reports performance metrics (PR AUC, ROC AUC, F1 score, and MCC) together with additional model characteristics (structure-modeling-free and antigen-agnostic) for models evaluated on the PECAN, Paragraph, and MIPE datasets. The highest value in each column is in bold; the second highest is underlined.
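For readers less familiar with the four metrics, the sketch below computes them from per-residue labels and predicted probabilities with plain NumPy. It is an illustration, not the evaluation code behind the table: the 0.5 decision threshold for F1 and MCC is a hypothetical choice, and PR AUC is estimated here as average precision, one of several common estimators (the caption does not say which was used).

```python
import numpy as np


def confusion(y_true, y_pred):
    """Counts of true/false positives/negatives for binary labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    return tp, fp, fn, tn


def f1_score(y_true, y_pred):
    tp, fp, fn, _ = confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; 0 when any marginal count is zero."""
    tp, fp, fn, tn = confusion(y_true, y_pred)
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """P(random paratope residue outscores random non-paratope residue); ties count 1/2."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise PR AUC estimate: mean precision at the rank of each positive."""
    y_true = np.asarray(y_true, dtype=bool)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = y_true[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(precision_at_k[hits].sum() / hits.sum())


# Toy residues: 1 = paratope, 0 = non-paratope.
y_true = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
y_pred = [s >= 0.5 for s in scores]  # hypothetical 0.5 decision threshold
```

PR AUC and ROC AUC are threshold-free, whereas F1 and MCC depend on the chosen threshold; MCC also rewards correct negatives, which matters because most antibody residues are not in the paratope.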

Table 2.

Comparison of methods that use experimentally determined structures as inputs.

The table reports performance metrics (PR AUC, ROC AUC, F1 score, and MCC) together with an additional model characteristic (antigen-agnostic) for models evaluated on the PECAN, Paragraph, and MIPE datasets. The highest value in each column is in bold; the second highest is underlined.

Fig 3.

(A) Dataset curation for the epitope asymmetry analysis, with the number of PDB structures at each stage.

(B) Cartoon of antibody-antigen complexes with symmetric and asymmetric paratopes and epitopes. A paratope on the antibody side binds an epitope on the antigen side. In the asymmetric case, two identical antibody sequences bind different epitopes using different paratopes. (C) Normalized paratope asymmetry correlates strongly with normalized epitope asymmetry (Pearson correlation coefficient); each point represents a distinct antibody-antigen complex.
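The caption does not define how asymmetry is normalized. The sketch below assumes, purely for illustration, that the asymmetry between the two copies of an identical antibody is the Jaccard distance between their sets of contact residues (0 for identical sites, 1 for disjoint sites), applied the same way to paratopes and epitopes, and then correlates the two with the Pearson coefficient named in the caption. The complexes are invented toy data and `normalized_asymmetry` is a hypothetical helper.

```python
import numpy as np


def normalized_asymmetry(sites_1, sites_2):
    """Jaccard distance |A xor B| / |A or B| between two sets of residue indices.

    0 means the two copies use identical sites, 1 means disjoint sites.
    This normalization is an assumption, not the paper's definition.
    """
    a, b = set(sites_1), set(sites_2)
    union = a | b
    return len(a ^ b) / len(union) if union else 0.0


# Made-up toy complexes: each has two copies of the same antibody.
paratope_pairs = [({1, 2, 3}, {1, 2, 3}), ({1, 2, 3}, {2, 3, 4}),
                  ({1, 2}, {5, 6}), ({1, 2, 3, 4}, {1, 2, 3})]
epitope_pairs = [({10, 11}, {10, 11}), ({10, 11, 12}, {11, 12, 13}),
                 ({10}, {20}), ({10, 11, 12}, {10, 11})]

paratope_asym = [normalized_asymmetry(a, b) for a, b in paratope_pairs]
epitope_asym = [normalized_asymmetry(a, b) for a, b in epitope_pairs]
r = np.corrcoef(paratope_asym, epitope_asym)[0, 1]  # Pearson correlation coefficient
```

Normalizing by the union keeps the score comparable between complexes with small and large interfaces.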

Table 3.

F1 score and MCC for paratope prediction for the Upper Bound and Paraplume conditions across the PECAN, Paragraph, and MIPE datasets.

Fig 4.

(A) Correlation between the average change in affinity and the average change in the predicted probability that an amino acid belongs to the paratope, across the 16 mutated positions of bnAb 9114.

Averages are computed across all antibody variants with measurable affinity in [29] for each of the H1, H3, and FluB antigens. (B) Paratope size as a function of amino acid mutation count for three groups of binders and non-binders, based on experimental affinity measurements from [29]. Non-binders are defined as sequences with no measurable affinity to any of the three strains. For (A) and (B), linear correlation was quantified using Pearson’s correlation coefficient (r), and the p-value was computed with a two-sided hypothesis test (see pearsonr documentation). (C) Normalized paratope size density for a repertoire of IgG antibodies from mice immunized with tetanus toxoid [30] and sorted for antigen binding, compared with naive antibodies from the same mouse species [31]. The p-value was computed using a two-sided Mann–Whitney U test. (D) Comparison of paratope size between antibody sequences and their inferred germline sequences in the antibody repertoires of naive mice (left) and immunized mice (right). (E) Paratope size of observed antibody sequences and their germline sequences across amino acid mutation count bins for naive mice (left) and immunized mice (right). The mutation count is the number of amino acid differences between an antibody sequence and its germline; each germline sequence is therefore assigned the mutation count of its corresponding antibody sequence.
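The statistics named in this caption are available in SciPy as `scipy.stats.pearsonr` and `scipy.stats.mannwhitneyu`. The sketch below assumes that paratope size is the number of residues whose predicted probability exceeds a cutoff; the 0.5 value and the `paratope_size` helper are hypothetical (summing the probabilities would be an equally plausible definition), and all numbers are made up rather than taken from [29] to [31].

```python
import numpy as np
from scipy.stats import mannwhitneyu, pearsonr


def paratope_size(probs, threshold=0.5):
    """Count residues whose predicted paratope probability exceeds `threshold`.

    The 0.5 cutoff is a hypothetical choice; summing the probabilities would
    be an equally plausible definition.
    """
    return int(np.sum(np.asarray(probs, dtype=float) > threshold))


# (A-B) style analysis: made-up mutation counts and paratope sizes.
mutation_counts = [0, 2, 4, 6, 8]
sizes = [10, 11, 13, 14, 16]
r, p_pearson = pearsonr(mutation_counts, sizes)  # two-sided by default

# (C) style analysis: made-up sizes for sorted binders vs naive antibodies.
immunized = [14, 15, 16, 17]
naive = [10, 11, 12, 13]
u_stat, p_mwu = mannwhitneyu(immunized, naive, alternative="two-sided")
```

The Mann–Whitney U test compares the two size distributions by ranks, so it does not require paratope sizes to be normally distributed.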

Fig 5.

Effect of hypermutations on paratope size in human repertoires.

Analysis for donor 326651 from [32]. (A) 2D histogram showing the relationship between the paratope size of observed antibody sequences and that of their inferred germline counterparts. (B) Paratope sizes of observed and germline sequences grouped by amino acid mutation count bins. (C) Density of the average increase in paratope size within lineages, shown for different lineage size bins. Each density curve is fitted using all lineages in the corresponding size range, and the black line indicates the median average increase in paratope size for each bin. (D) Median average increase in paratope size when averaging only over sequences with a fixed number of mutations within each lineage.
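The lineage-level quantities in panels (B) and (C) can be sketched as follows, assuming pre-aligned, equal-length amino acid sequences, one inferred germline shared by all members of a lineage, and "average increase" meaning the mean of (observed paratope size minus germline paratope size) over the lineage. These definitions, the half-open bin convention, and the helper names are assumptions for illustration; the paratope sizes are invented.

```python
import numpy as np


def aa_mutation_count(seq, germline):
    """Number of amino acid differences between two pre-aligned, equal-length sequences."""
    if len(seq) != len(germline):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq, germline))


def lineage_average_increase(observed_sizes, germline_size):
    """Mean of (observed paratope size - germline paratope size) over one lineage."""
    return float(np.mean(np.asarray(observed_sizes, dtype=float) - germline_size))


def median_increase_by_size_bin(lineages, bin_edges):
    """lineages: list of (observed_sizes, germline_size), one per lineage.

    Groups lineages by their number of sequences into bins [lo, hi) and
    returns the median lineage-average increase in each bin.
    """
    medians = {}
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        vals = [
            lineage_average_increase(obs, germ)
            for obs, germ in lineages
            if lo <= len(obs) < hi
        ]
        medians[(lo, hi)] = float(np.median(vals)) if vals else float("nan")
    return medians


# Made-up paratope sizes for three lineages (observed sizes, germline size).
lineages = [([12, 14, 13], 11), ([11, 11, 12, 12], 11), ([15, 13, 14, 16, 15, 14], 12)]
medians = median_increase_by_size_bin(lineages, bin_edges=[1, 5, 20])
```

Averaging within each lineage first prevents large lineages from dominating the per-bin median.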

Fig 6.

(A) Comparison of the paratope-weighted and unweighted embeddings across the six protein language models (PLMs) used in Paraplume.

Performance is evaluated using the F1 score of a regression model trained to classify binders versus non-binders, based on sequences from [29]. Two-fold cross-validation was performed on two distinct sets, resulting in the 12 data points. (B) The same analysis as in (A), but with a regression model trained to classify antibodies into epitope classes using sequences from [35]. A paired Wilcoxon signed-rank test showed that paratope-weighted embeddings yielded statistically significant improvements for both tasks (p = 0.007 for binder classification and p = 0.004 for epitope binning).
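The caption does not define the paratope-weighted embedding. One natural reading, assumed here, is a weighted average of the per-residue PLM embeddings with the predicted paratope probabilities as weights, compared against a plain mean over residues; the paired comparison of scores then uses `scipy.stats.wilcoxon`. The helper names are hypothetical and the F1 values are invented for illustration, not taken from the figure.

```python
import numpy as np
from scipy.stats import wilcoxon


def unweighted_embedding(residue_embeddings):
    """Mean-pool (L, d) per-residue embeddings into a single (d,) vector."""
    return np.asarray(residue_embeddings, dtype=float).mean(axis=0)


def paratope_weighted_embedding(residue_embeddings, paratope_probs):
    """Weighted mean of per-residue embeddings, weights = predicted paratope probabilities."""
    return np.average(np.asarray(residue_embeddings, dtype=float), axis=0,
                      weights=paratope_probs)


# Made-up paired F1 scores (one pair per model/split), not the paper's data.
f1_unweighted = [0.70, 0.65, 0.72, 0.60, 0.68, 0.66]
f1_weighted = [0.71, 0.67, 0.75, 0.64, 0.73, 0.72]
stat, p_value = wilcoxon(f1_weighted, f1_unweighted)  # paired, two-sided
```

Weighting by paratope probability concentrates the sequence-level vector on residues likely to contact the antigen, which is a plausible reason it helps binder and epitope classification; the signed-rank test is appropriate because each weighted score is paired with the unweighted score from the same model and split.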
