OneProt: Towards multi-modal protein foundation models via latent space alignment of sequence, structure, binding sites and text encoders
Fig 5
Cosine similarity distributions for ESM-2, ProTrek-35M, ProTrek-650M, OneProt-4, and OneProt-5.
The plot shows the cosine similarity between the representation of a given protein and those of three groups: its 50 most evolutionarily similar proteins, its 50 most evolutionarily divergent sequences, and 1000 unrelated sequences. All models partially capture evolutionary relationships, but OneProt separates the three groups distinctly, demonstrating its ability to generate meaningful sequence representations.
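A minimal sketch of the comparison this figure summarizes, assuming each protein has already been encoded into a fixed-length embedding vector. The function names (`cosine_similarity`, `similarity_by_group`) and the toy 3-dimensional vectors are hypothetical stand-ins, not the authors' evaluation code or real model outputs, and the group sizes (50, 50, 1000) are not reproduced. The sketch only shows how per-group similarity distributions are formed. A model that separates the groups well yields distributions whose means are clearly ordered.

```python
import math
from statistics import mean


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_by_group(query, groups):
    """Map each group name to the list of cosine similarities between the
    query embedding and every embedding in that group."""
    return {
        name: [cosine_similarity(query, emb) for emb in embeddings]
        for name, embeddings in groups.items()
    }


# Toy embeddings (hypothetical): a query protein and three groups.
query = [1.0, 0.0, 0.0]
groups = {
    "similar": [[0.9, 0.1, 0.0], [1.0, 0.2, 0.0]],
    "divergent": [[0.5, 0.5, 0.0], [0.4, 0.6, 0.2]],
    "unrelated": [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
}

sims = similarity_by_group(query, groups)
# Summarize each distribution by its mean; the figure plots full distributions.
means = {name: mean(values) for name, values in sims.items()}
print(means)
```

With real embeddings, one would plot the full `sims` lists per group (for example as histograms or violin plots) rather than only their means.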