Exploring the potential of structure-based deep learning approaches for T cell receptor design
Fig 4
Sequence recovery analysis of interface CDR3 amino acids in terms of identity, similarity, identity at buried positions and identity at hotspot positions for ProteinMPNN (left panel) and ESM-IF1 (right panel).
In both panels, each point of the box plot is the percentage of sequence recovery for one unique (non-redundant) design sequence from the MHC-I test cases. Identity counts a position as recovered only when the designed residue matches the native amino acid (AA), whereas similarity also counts substitutions within the same AA physicochemical class (see Methods). Buried AA identity is the identity computed only over buried positions (estimated by relative solvent accessibility, see Methods), and hotspot AA identity is the identity computed only over interface CDR3 hotspot positions, predicted by computational alanine scanning with Rosetta (see Methods). Pairwise statistical comparisons between identity (the reference) and each of the other metrics were performed using the Mann-Whitney test with the R ggpubr package. Significance is indicated above each box plot (****, ** and * correspond to a p-value below 0.0001, 0.01 and 0.05, respectively, while ‘ns’ means not significant (p-value ≥ 0.05)). A detailed view of the same metrics per PDB test case is presented in S13 Fig.
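As an illustration of how the four recovery metrics relate to one another, the following is a minimal Python sketch, not the authors' implementation. The physicochemical grouping in `AA_CLASSES`, the function name `recovery`, the example sequences and the example position indices are all assumptions made for this sketch; the classes and the buried and hotspot positions actually used are those defined in the Methods.

```python
from typing import Dict, Iterable, Optional

# Hypothetical physicochemical grouping, for illustration only.
# The classes actually used in the paper are defined in its Methods.
_GROUPS = {
    "hydrophobic": "AVLIMC",
    "aromatic": "FWY",
    "polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
    "special": "GP",
}
AA_CLASSES: Dict[str, str] = {
    aa: label for label, members in _GROUPS.items() for aa in members
}


def recovery(
    native: str,
    design: str,
    positions: Optional[Iterable[int]] = None,
    by_class: bool = False,
) -> float:
    """Percentage of positions at which the design recovers the native residue.

    positions: 0-based indices to score (e.g. buried or hotspot positions);
               None scores every position.
    by_class:  False gives identity (exact match); True gives similarity
               (match of physicochemical class).
    """
    if len(native) != len(design):
        raise ValueError("native and design must have the same length")
    idx = list(range(len(native))) if positions is None else list(positions)
    if not idx:
        raise ValueError("no positions to score")
    if by_class:
        hits = sum(AA_CLASSES[native[i]] == AA_CLASSES[design[i]] for i in idx)
    else:
        hits = sum(native[i] == design[i] for i in idx)
    return 100.0 * hits / len(idx)


# Made-up CDR3-like sequences and made-up "buried" indices, for illustration.
native = "CASSLGQAYEQYF"
design = "CASSIGQGYEQFF"
buried = [4, 5, 6, 7]

identity = recovery(native, design)
similarity = recovery(native, design, by_class=True)
buried_identity = recovery(native, design, positions=buried)
print(f"identity={identity:.1f}% similarity={similarity:.1f}% "
      f"buried identity={buried_identity:.1f}%")
```

Hotspot identity would be obtained the same way, by passing the hotspot indices as `positions`. Because similarity accepts every exact match plus within-class substitutions, it can never be lower than identity over the same positions.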