Zero-shot segmentation using embeddings from a protein language model identifies functional regions in the human proteome

doi:10.1371/journal.pcbi.1012929


Fig 3

Segment Embedding Evaluation and Visualization of IDRs, Domains, and Compositional Biases.

(A, B) Normalized confusion matrices for the 1-nn assessment of (A) ProRule domains compared to MobiDB IDRs (disorder consensus predictions) and (B) compositional biases from MobiDB. In each normalized confusion matrix, 1-nn precision is reported along the diagonal, and n gives the number of ZPS segments for each label (see S2 Table for confidence intervals). (C, D) Two-dimensional UMAP projections of segment embeddings labelled with (C) ProRule domains compared to MobiDB disorder consensus and (D) compositional biases from MobiDB. Each point is a protein segment, and every segment shown in a UMAP was used in the corresponding 1-nn assessment.
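As a rough illustration of how a 1-nn assessment with precision on the diagonal could be computed, the sketch below assigns each segment the label of its nearest other segment and normalizes the resulting confusion matrix by column (predicted label). It is not the authors' code: the leave-one-out setup, the Euclidean distance, and the function name `one_nn_confusion` are assumptions made for this example only.

```python
import numpy as np

def one_nn_confusion(embeddings, labels):
    """Leave-one-out 1-nn confusion matrix (hypothetical helper, not from the paper).

    Rows are true labels, columns are predicted labels. Dividing each column
    by its total puts per-label precision on the diagonal.
    """
    X = np.asarray(embeddings, dtype=float)
    y = np.asarray(labels)
    classes = np.unique(y)

    # Pairwise squared Euclidean distances (assumed metric).
    dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)  # a segment may not be its own neighbour

    # Each segment is predicted to carry the label of its nearest neighbour.
    predicted = y[dist.argmin(axis=1)]

    index = {c: i for i, c in enumerate(classes)}
    counts = np.zeros((len(classes), len(classes)))
    for true_label, pred_label in zip(y, predicted):
        counts[index[true_label], index[pred_label]] += 1

    # Column-normalize; columns with no predictions are left as zeros.
    col_totals = counts.sum(axis=0, keepdims=True)
    normalized = np.divide(counts, col_totals,
                           out=np.zeros_like(counts), where=col_totals > 0)
    return classes, counts, normalized
```

Calling `one_nn_confusion(segment_embeddings, segment_labels)` on an array of segment embeddings and a matching list of labels returns the label order, the raw counts, and the column-normalized matrix. The brute-force distance matrix is only practical for small inputs; a nearest-neighbour index would be needed at proteome scale.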

doi: https://doi.org/10.1371/journal.pcbi.1012929.g003