Figure 1.
Asparagine and aspartate degradation pathways.
Deamidation of asparagine or dehydration of aspartic acid occurs by nucleophilic attack of the α-amino group of the C-terminally flanking amino acid. This leads to formation of a metastable succinimide (cyclic imide) intermediate, which hydrolyzes to a mixture of aspartyl and iso-aspartyl linkages. Alternatively, nucleophilic attack by the backbone carbonyl oxygen results in a cyclic isoimide intermediate, which yields only aspartyl residues after hydrolysis, regardless of where the incoming water molecule attacks. Asparagine residues can also deamidate to Asp by direct water-assisted hydrolysis. Standard amino acids (Asn, Asp) are outlined with black boxes.
Table 1.
Experimental Asn and Asp hotspot collection.
Figure 2.
Occurrence of Asn and Asp amino acid motifs in the CDRs of a therapeutic mAb collection and a set of naturally occurring antibodies (IMGT).
Black triangles show the percentages of hotspots within the Asn and Asp motifs of the experimental collection of 37 mAbs. Bars represent the percentages of the depicted sequence motifs among all Asn or Asp residues in the CDRs only. Filled bars represent the non-redundant collection of the 37 analytically assessed therapeutic mAbs; bars striped in light grey represent a collection of 9990 V-D-J- and 6296 V-J regions of naturally occurring antibodies from the IMGT database. (A) Asn sequence motifs, (B) Asp sequence motifs.
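The bar heights described above are motif frequencies among all Asn (or Asp) residues in the CDRs. A minimal sketch of such a count is given below. The function name `motif_percentages` and the input format (a list of CDR sequences as one-letter strings) are assumptions for illustration, not the authors' implementation. As a simplification, a residue at the last position of a CDR fragment is skipped because its successor lies outside the fragment.

```python
from collections import Counter


def motif_percentages(cdr_sequences, residue):
    """Percentage of each two-residue motif starting with `residue`.

    Hypothetical helper: counts `residue` followed by its C-terminal
    neighbor (position n+1) and reports each motif as a percentage of
    all counted occurrences of `residue`.
    """
    counts = Counter()
    for seq in cdr_sequences:
        for i, aa in enumerate(seq):
            # Skip terminal residues: their n+1 neighbor is not in the fragment.
            if aa == residue and i + 1 < len(seq):
                counts[aa + seq[i + 1]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {motif: 100.0 * n / total for motif, n in counts.items()}


# Toy input, not data from the mAb or IMGT collections.
example = motif_percentages(["ANGS", "NSN"], "N")
```

Calling the same function with `"D"` would give the corresponding Asp motif percentages for panel B.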
Figure 3.
Parameters characterizing Asn and Asp residues in a structural environment outlined at an exemplary Asp residue.
Parameters describing the carboxyl/amino group leaving tendency, the transition state accessibility, the Nn+1 nucleophilicity, and the structural environment are depicted in pink, light blue, purple, and dark blue, respectively. Parameter names are used as in Table S1.
Figure 4.
ROC plots comparing 3D classifiers with sequence-based prediction show a significant decrease in false-positive rates.
Different statistical classification methods are compared with prediction based on sequence alone. For the statistical classification methods, the numbers of false-positive and false-negative Asn/Asp residues are averages over 40 rounds of Monte Carlo cross-validation. TPR (true positive rate) = number of true positives divided by number of positives. FPR (false positive rate) = number of false positives divided by number of negatives. Tree, rpart, PP (Pipeline Pilot) tree, and RandomForest are recursive partitioning algorithms; svm and ksvm are support vector machine algorithms; rda is a regularized discriminant analysis algorithm; nnet is a neural network; sequence-based corresponds to prediction based on the sequence motifs NG, NS, NT, and DG, DS, DT, DD, DH. The Pipeline Pilot tree at pruning level 4, shown as a yellow circle, was selected as the prediction algorithm. A: Asp model; B: Asn model. Panels C and D show zoomed views of panels A and B, respectively. The numerical values shown in these graphs can be found in Table S3.
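The sequence-based baseline and the two rate definitions above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the toy sequence, and the hotspot labels are hypothetical, and only the motif list and the TPR/FPR definitions are taken from the legend.

```python
# Motifs used for sequence-based prediction, as listed in the legend.
HOTSPOT_MOTIFS = {"NG", "NS", "NT", "DG", "DS", "DT", "DD", "DH"}


def predict_by_motif(sequence, position):
    """Predict a hotspot if the residue and its n+1 neighbor form a listed motif."""
    return sequence[position:position + 2] in HOTSPOT_MOTIFS


def rates(predicted, actual):
    """Return (TPR, FPR) for paired lists of boolean predictions and labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    positives = sum(actual)
    negatives = len(actual) - positives
    tpr = tp / positives if positives else float("nan")
    fpr = fp / negatives if negatives else float("nan")
    return tpr, fpr


# Toy example (hypothetical sequence and labels, not data from the study).
sequence = "ANGDSNADK"
positions = [i for i, aa in enumerate(sequence) if aa in "ND"]
predicted = [predict_by_motif(sequence, i) for i in positions]
actual = [True, False, False, False]  # assumed experimental hotspot labels
tpr, fpr = rates(predicted, actual)
```

In this toy case the motif rule flags both NG and DS, so it finds the one assumed hotspot but also raises one false positive among three negatives. This is the kind of over-prediction that the structure-based classifiers in the figure are meant to reduce.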
Figure 5.
ROC plot for comparison of different pruning levels of decision trees.
Decision trees were pruned automatically as implemented in Pipeline Pilot. The numbers of false-positive and false-negative Asn/Asp residues are averages over 40 rounds of Monte Carlo cross-validation. TPR (true positive rate) = number of true positives divided by number of positives. FPR (false positive rate) = number of false positives divided by number of negatives. Trees 1-3 and 5-6 are shown as spheres, tree 4 as a black triangle. Tree 1 is the unpruned tree model. Tree 4 was selected for prediction.
Figure 6.
Final aspartate (A) and asparagine (B) decision trees.
The outline of each node and leaf is colored by the weighted majority of the class that is present (red: hotspots, green: non-hotspots). The filling level of the bar on the right-hand side of each node/leaf refers to the fraction of the data set it contains. The fraction of each class at a node/leaf is shown by the colored fraction of the circle. The number of members of each node/leaf is indicated above it.