Exploring the potential of structure-based deep learning approaches for T cell receptor design
Fig 3
Occurrence of amino acids at CDR3 interface positions in native and designed sequences.
(A) Frequency of each amino acid at the designed CDR3 positions in native sequences and in sequences designed by ProteinMPNN (left panel) or ESM-IF1 (right panel). A higher frequency indicates that the amino acid was observed more often at the CDR3 interface in the analyzed test cases. Only non-redundant generated sequences are considered in the analysis. The x axis is ordered by the BLOSUM62 amino acid grouping: [A, G, S], [C], [D, E, P, T], [Q, N, H, R, K], [I, L, M, V], [F, Y, W]. (B) Heat map of the frequency of amino acid substitutions in the designed sequences. The x axis represents the amino acid in the native sequence and the y axis represents the corresponding substitution in the designed sequence. For instance, a hypothetical frequency of 100% for alanine in the native sequences and leucine in the designed sequences would indicate that alanine was changed to leucine in all design cases. (C) Distribution of the generation probabilities (Pgen) of CDR3 sequences, estimated using OLGA [28]. Pgen was estimated for each CDR3 (α or β) generated by ProteinMPNN (in purple) or ESM-IF1 (in red) in two design scenarios: design of only the CDR3 interface positions (upper panel) or design of all CDR3 positions (bottom panel). For comparison, Pgen was also estimated for the CDR3 sequences of the native test case structures (in green). This analysis considered only designs from human TCR test cases bound to MHC-1, and only Pgen values greater than 0 are shown in the density plot.
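The quantities plotted in panels A and B can be illustrated with a minimal Python sketch. This is not the authors' analysis code. The function names, the input format (equal-length strings holding only the designed positions), and the choice to normalize each substitution by the count of its native amino acid are assumptions made for illustration. The normalization was chosen to match the alanine-to-leucine example in the caption.

```python
from collections import Counter, defaultdict

# x-axis order from the caption's BLOSUM62 grouping:
# [A,G,S], [C], [D,E,P,T], [Q,N,H,R,K], [I,L,M,V], [F,Y,W]
AA_ORDER = "AGSCDEPTQNHRKILMVFYW"


def aa_frequencies(sequences):
    """Percent frequency of each amino acid over all given positions (panel A)."""
    counts = Counter("".join(sequences))
    total = sum(counts[aa] for aa in AA_ORDER)
    if total == 0:
        return {aa: 0.0 for aa in AA_ORDER}
    return {aa: 100.0 * counts[aa] / total for aa in AA_ORDER}


def substitution_frequencies(pairs):
    """Percent of designed residues observed for each native residue (panel B).

    `pairs` is an iterable of (native, designed) strings of equal length.
    Assumption: frequencies are normalized per native amino acid.
    """
    counts = defaultdict(Counter)
    for native, designed in pairs:
        if len(native) != len(designed):
            raise ValueError("native and designed sequences must be aligned")
        for nat_aa, des_aa in zip(native, designed):
            counts[nat_aa][des_aa] += 1
    return {
        nat_aa: {des_aa: 100.0 * n / sum(ctr.values()) for des_aa, n in ctr.items()}
        for nat_aa, ctr in counts.items()
    }
```

With the toy input `[("AG", "LG"), ("AS", "LT")]`, every native alanine is replaced by leucine, so the alanine-to-leucine entry is 100%, as in the hypothetical case described for panel B.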