Fig 1.
Schematic overview describing the steps required to generate fingerprints for Deep Neural Network image analysis.
We used the Rosetta Antibody software to generate multiple 3-D models of a particular Ab, or of one of its antigen-binding fragments (Fab), using the light- and heavy-chain sequences as input data. For each 3-D model, we used PyMOL to produce a fine grid perpendicular to the main axis of the Ab that intersects the Ab binding-site region. We selected the amino acid residues of the model that lie within 20 Å of the grid, projected their atoms onto the 2-D grid, and displayed them using a “dot” representation. The image was then colored according to the desired color scheme, using either a charge-based or an amino-acid-property-based representation, and stored as an image file. Transforming the sequence into an image allowed us to train DNN models for Ab classification using collections of fingerprint sets from multiple Abs.
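The projection step described in this legend can be illustrated with a minimal NumPy sketch. It is not the PyMOL procedure used in the study: the function name `project_atoms_to_grid`, the per-atom `charges` input, and the `half_width` and `pixels` parameters are hypothetical choices made for illustration. Only the 20 Å cutoff and the idea of a plane perpendicular to the main axis come from the text.

```python
import numpy as np

def project_atoms_to_grid(coords, charges, origin, normal,
                          cutoff=20.0, half_width=32.0, pixels=64):
    """Project atoms lying near a plane onto a 2-D image (illustrative only).

    coords  : (N, 3) atom coordinates in angstroms
    charges : (N,) per-atom values used to "color" the image (assumed input)
    origin  : a point on the grid plane
    normal  : direction of the main axis; the plane is perpendicular to it
    """
    coords = np.asarray(coords, dtype=float)
    charges = np.asarray(charges, dtype=float)
    normal = np.asarray(normal, dtype=float)
    normal = normal / np.linalg.norm(normal)

    # Build two orthonormal in-plane axes (u, v) perpendicular to the normal.
    helper = np.array([1.0, 0.0, 0.0])
    if abs(normal @ helper) > 0.9:
        helper = np.array([0.0, 1.0, 0.0])
    u = np.cross(normal, helper)
    u /= np.linalg.norm(u)
    v = np.cross(normal, u)

    rel = coords - np.asarray(origin, dtype=float)
    dist = rel @ normal                 # signed distance of each atom to the plane
    keep = np.abs(dist) <= cutoff       # atoms within the cutoff of the grid

    # In-plane coordinates of the retained atoms.
    x = rel[keep] @ u
    y = rel[keep] @ v

    # Map coordinates in [-half_width, half_width) to pixel indices.
    scale = pixels / (2.0 * half_width)
    col = np.floor((x + half_width) * scale).astype(int)
    row = np.floor((y + half_width) * scale).astype(int)
    inside = (col >= 0) & (col < pixels) & (row >= 0) & (row < pixels)

    image = np.zeros((pixels, pixels), dtype=float)
    # Accumulate the value of every atom that lands in a pixel.
    np.add.at(image, (row[inside], col[inside]), charges[keep][inside])
    return image, keep
```

In this sketch the sign of each pixel stands in for the charge coloring; a real fingerprint would map residue properties to RGB colors and write the result to an image file.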
Table 1.
Summary of examined Tasks, the number of Abs, DNN models, and fingerprint images used in each Task.
Fig 2.
Variation of the loss function for DNN models with the number of learning cycles.
The blue line represents the loss per epoch averaged over 10 DNN models during training. The top and bottom of the gray area correspond to the maximum and minimum loss at each epoch across the ten models. After about 30 epochs the loss function no longer improved, so we typically terminated training at 30 epochs.
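As an illustration of how such a curve can be summarized, the sketch below computes the mean, minimum, and maximum loss per epoch over several training runs and locates the epoch after which the mean loss stops improving. The function names and the `min_delta` and `patience` parameters are assumptions introduced here; the legend states only that the loss stopped improving after about 30 epochs.

```python
import numpy as np

def summarize_losses(losses):
    """Return per-epoch mean, min, and max for an (n_models, n_epochs) array."""
    losses = np.asarray(losses, dtype=float)
    return losses.mean(axis=0), losses.min(axis=0), losses.max(axis=0)

def plateau_epoch(mean_loss, min_delta=1e-3, patience=5):
    """Return the 1-based epoch of the last meaningful improvement.

    An epoch counts as an improvement when the loss drops by more than
    min_delta below the best value seen so far. The scan stops once
    `patience` epochs pass without an improvement (hypothetical criterion).
    """
    best = np.inf
    best_epoch = 0
    for epoch, value in enumerate(mean_loss, start=1):
        if value < best - min_delta:
            best, best_epoch = value, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_epoch
```

The mean corresponds to the plotted line and the min/max pair to the shaded envelope; `plateau_epoch` is one simple way to pick a stopping point from such a curve.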
Table 2.
Summary of 10 DNN models trained to differentiate between fingerprints belonging to pairs of antibodies.
Fingerprints were generated using the charge coloring scheme shown in Fig 1.
Fig 3.
Schematic diagram of the allocation of fingerprints into training, validation, and testing sets.
Antibody assignment: Antibodies are randomly split into two fractions: training/validation and testing. Fingerprint assignment: The fingerprint images of an Ab selected for testing are added to a common pool in the test set. If the Ab was selected for training/validation, its fingerprints are divided into two fractions, and each fraction is added to the pool associated with the Ab class in the training set or the validation set, respectively.
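The two-level allocation in this legend can be sketched as follows, using only the standard library. The function name, the `class_of` mapping, and the default fractions are hypothetical; the legend does not give the split ratios used in the study.

```python
import random
from collections import defaultdict

def split_fingerprints(fingerprints_by_ab, class_of,
                       test_fraction=0.2, val_fraction=0.2, seed=0):
    """Split at the antibody level first, then at the fingerprint level.

    fingerprints_by_ab : dict mapping Ab name -> list of fingerprint images
    class_of           : dict mapping Ab name -> class label (assumed input)
    """
    rng = random.Random(seed)
    ab_names = sorted(fingerprints_by_ab)
    rng.shuffle(ab_names)

    # Antibody assignment: hold out whole antibodies for testing.
    n_test = max(1, round(test_fraction * len(ab_names)))
    test_abs = set(ab_names[:n_test])

    train = defaultdict(list)   # class label -> training fingerprints
    val = defaultdict(list)     # class label -> validation fingerprints
    test = []                   # common pool of test fingerprints

    # Fingerprint assignment.
    for ab in ab_names:
        images = list(fingerprints_by_ab[ab])
        if ab in test_abs:
            test.extend(images)
            continue
        rng.shuffle(images)
        n_val = round(val_fraction * len(images))
        val[class_of[ab]].extend(images[:n_val])
        train[class_of[ab]].extend(images[n_val:])
    return dict(train), dict(val), test, test_abs
```

Holding out entire antibodies, rather than individual images, keeps every fingerprint of a test Ab unseen during training.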
Fig 4.
Set of four antibodies associated with one family lineage.
(A) The graph highlights the amino acid substitutions in the heavy-chain CDR3 region of the Abs with respect to the germline gene. Abs ADI-15912 and ADI-15843 share the same CDR3 sequence. (B) Each column shows three fingerprints for one Ab of the family, illustrating how the amino acid substitutions listed in (A) and conformational changes in the models affect the fingerprints.
Table 3.
Training of DNNs for recognition of ten lineages.
Statistical summary for 80 DNN models used for classification of 28 antibodies belonging to ten family lineages using fingerprints colored according to the charge coloring code.
Table 4.
Detection of a specific lineage family.
Summary statistics of 80 DNN models used for classification of 28 antibodies belonging to ten family lineages using fingerprints colored according to the charge coloring code.
Fig 5.
Prediction accuracy of DNNs trained to detect Ab family lineage.
Plot of the “F1-score local”-metric as a function of the family lineage for two types of DNN models that we trained with Ab fingerprints generated by two alternative coloring schemes, i.e., by residue charge (black circles) or by reduced-amino-acid alphabet (grey squares).
Fig 6.
Main regions of the EBOV GP trimer for Ab recognition.
(A) Structural model of the EBOV GP trimer recognized by anti-EBOV Abs. Abs are colored according to the region of the trimer that they bind: (B) the base of the trimer, (C) the α-helical heptad repeat 2 (HR2) region, and (D) the glycan cap domains.
Table 5.
Classification of Abs recognizing two epitopes at either the base (ADI-15734) or the HR2/MPER region (ADI-15974) of the EBOV-GP trimer.
We colored the fingerprints for the DNN models according to the charge coloring code.
Table 6.
Classification of Abs (ADI-15734 and ADI-15878) recognizing two epitopes at the base of the EBOV-GP trimer.
We colored the fingerprints for the DNN models according to the charge coloring code.
Table 7.
Classification of 30 Abs that bind exclusively to one out of three possible epitopes.
Two binding sites, recognized by Abs from Set1 (ADI-15734 competitors) and Set2 (ADI-15878 competitors), are located at the base of the EBOV-GP trimer, and the third epitope, recognized by Abs from Set3 (ADI-15974 competitors), is located at the HR2/MPER region. All DNN models use fingerprint images colored according to the charge coloring code.
Table 8.
Classification of 30 Abs that bind exclusively to one out of three possible epitopes in the EBOV GP trimer.
Summary statistics of 20 DNN models trained using fingerprints colored according to the charge coloring code.
Fig 7.
Antibody recognition sites in HIV GP120/GP41.
Two main Ab binding regions based on the structural complex of the HIV-1 surface glycoprotein GP120/GP41 with anti-HIV-1 Abs. Site 1 encompasses a structural overlay of 18 different site-specific Abs, whereas Site 2 contains 10 different Abs.
Table 9.
Classification of Abs recognizing two distinct epitopes on the HIV-1 GP120/GP41 protein complex.
Statistical summary of five experiments in which we trained 50 DNN models using fingerprints colored according to the charge coloring code.
Fig 8.
Representative Ab fingerprints from normal and outlier classes.
(A) Exemplar images of Ab fingerprints classified as belonging to lineage 1 by one DNN model. The images from panel A are ordered according to scores assigned by the neural network. They are organized in rows of 10 images, with scores decreasing from left to right. (B) Exemplar images of Ab fingerprints classified as belonging to outliers (Abs from lineages other than 1) by the DNN model. The images from panel B are ordered according to scores assigned by the neural network. They are organized in rows of 10 images, with scores increasing from left to right. These results correspond to the DNN model listed as “3” in Table 10.
Table 10.
Statistical summary of 12 DNN models trained to distinguish Abs from a specific family lineage using the Robust Convolutional Autoencoder one-class classification method.
Table 11.
Statistical summary of 13 DNN models trained to distinguish Abs from a specific family lineage from a large Ab set using the Robust Convolutional Autoencoder (RCAE) one-class classification method.
Fig 9.
Detection of clonally diverse antibodies using the OCC method RCAE.
a This number corresponds to the ranking assigned to the 100 DNN models based on the AUROC score computed on the testing set. b Image reconstruction errors ranked from low to high. Gray circles are associated with fingerprints from anomalous Abs. Colored circles highlight clusters of errors for fingerprints of the Abs from the “normal” class. Note that the graphs display the reconstruction errors of only 120 fingerprints from each testing set. c The test sets used to evaluate the DNN models below contained only Abs that do not compete with KZ52, in an attempt to detect false positives (i.e., the Ab representing the normal class was a decoy).
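To make the link between reconstruction errors and the AUROC score concrete, the sketch below ranks errors from low to high and computes the AUROC directly from pairwise comparisons. It assumes, as is common for autoencoder-based one-class classification, that a larger reconstruction error indicates a more anomalous fingerprint; the function names are hypothetical and this is not the RCAE implementation used in the study.

```python
import numpy as np

def rank_errors(errors):
    """Return the indices that sort reconstruction errors from low to high."""
    return np.argsort(np.asarray(errors, dtype=float), kind="stable")

def auroc_from_errors(errors, is_anomaly):
    """AUROC when the reconstruction error is used as the anomaly score.

    Equals the probability that a randomly chosen anomalous fingerprint has
    a larger error than a randomly chosen normal one (ties count one half).
    """
    errors = np.asarray(errors, dtype=float)
    is_anomaly = np.asarray(is_anomaly, dtype=bool)
    pos = errors[is_anomaly]     # errors of anomalous fingerprints
    neg = errors[~is_anomaly]    # errors of "normal" fingerprints
    if pos.size == 0 or neg.size == 0:
        raise ValueError("both normal and anomalous samples are required")
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)
```

An AUROC of 1.0 means every anomalous fingerprint has a larger error than every normal one, whereas 0.5 corresponds to no separation between the two groups.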
Fig 10.
LIME analysis evaluating the reliability of predictions from a trained DNN model.
a The column lists images of arbitrary fingerprints associated with the Abs listed under the “Abs Set” column. b The column contains images generated as the superposition of three elements: a) green pixels, which represent the most relevant pixels used by the DNN to generate the prediction, i.e., those shown in the image from column 3; b) bright red pixels, which have the most negative contribution to the prediction; and c) the remaining pixels from the original fingerprint image. c This column contains heatmap images describing the contribution of each pixel of the fingerprint to the prediction generated by the DNN model. The color scale ranges from dark blue for the most relevant contributions to dark red for the most negative ones. The color scale is selected independently for each heatmap based on the scores assigned by LIME.
Fig 11.
LIME analysis evaluating the reliability of predictions of a DNN model trained for classification of HIV Abs based on their binding preference.
a The column lists images of arbitrary fingerprints associated with HIV Abs binding to SITE1 and SITE2 as defined in Fig 7. b Images in this column show the most relevant pixels from the analyzed fingerprint used by the DNN model to generate the association with the correct Ab. c The column contains images generated as the superposition of three elements: a) green pixels, which represent the most relevant pixels used by the DNN to generate the prediction, i.e., those shown in the image from column 3; b) bright red pixels, which have the most negative contribution to the prediction; and c) the remaining pixels from the original fingerprint image. d This column contains heatmap images describing the contribution of each pixel of the fingerprint to the prediction generated by the DNN model. The color scale ranges from dark blue for the most relevant contributions to dark red for the most negative ones. The color scale is selected independently for each heatmap based on the scores assigned by LIME.
Fig 12.
Training of DNN for recognition of Abs from ten lineages.
The validation accuracy (Aval) is a metric associated with the quality of the DNN model; it measures the accuracy on the validation set and is potentially prone to overfitting. The green horizontal line at κ equal to 0.4 divides the predictions on independent test sets of fingerprints into significant (κ ≥ 0.4) and non-significant (κ < 0.4).
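The thresholding described here can be sketched under the assumption that κ denotes Cohen's kappa, which this legend does not state explicitly. The function names are hypothetical; only the 0.4 threshold comes from the text.

```python
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between labels, corrected for chance."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: fraction of matching labels.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Expected agreement if predictions were independent of the true labels.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        # Kappa is undefined when only one class occurs; this sketch returns 0.0.
        return 0.0
    return (observed - expected) / (1.0 - expected)

def is_significant(kappa, threshold=0.4):
    """Apply the kappa >= 0.4 criterion marked by the horizontal line."""
    return kappa >= threshold
```

For example, eight correct predictions out of ten on a 4/6 class split with matching predicted proportions give κ ≈ 0.58, which would fall above the line.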
Fig 13.
Using multiple Ab models to account for CDR flexibility and variations in side-chain orientation.
(A) Superposition of 3-D models of seven EBOV Abs (ADI-15974, ADI-15756, ADI-15758, ADI-15999, ADI-15820, ADI-15848, ADI-16061) that target the stalk region of EBOV GP. For simplicity, the Abs are represented using grey ribbon models, with positively and negatively charged residues associated with the CDRs shown in a ‘stick’ representation in blue and red, respectively. The light gray fragments of the ribbon models highlight the positions of the light-chain CDR3s (CDR3-L) and heavy-chain CDR3s (CDR3-H). (B) Superposition of ten 3-D models of Ab ADI-15974 shown in the same orientation as those in (A) and using the same color scheme. Variations in the PDB templates used by Rosetta Antibody for 3-D model generation can lead to differences in the CDRs and variations in the fingerprint patterns. In addition, for one of the models, we display the remaining positively and negatively charged residues of the Ab using cyan and orange colors, respectively. Note that projections of the latter set of residues may also contribute to the fingerprint patterns. (C) Same models as in panel B, viewed after a 90° rotation around the horizontal axis.
Fig 14.
Schematic diagram of the allocation of Ab fingerprints into training, validation, and testing sets for one-class classification.
See text for an explanation of antibody and fingerprint assignment. Note: the Ab labels have been simplified, with “A” standing for “ADI-”.